Markov chain Monte Carlo methods

Show Summary Details

Quick Reference

Methods, making heavy use of computers, that are very useful in the analysis of large arrays of correlated observations, such as satellite images of the Earth. Suppose we wish to calculate θ, the expected value of a function h of r correlated random variables. Let X be the r×1 vector of random variables. A standard Monte Carlo method would generate a sequence of numerical values, X1, X2,…, Xn, from which θ could be estimated by θ̂, given by . However, it can be difficult to generate a vector Xj when the joint distribution of the r variables involves a complex correlation structure—for example, when the variables refer to the values displayed in an array of neighbouring pixels in a satellite image of the Earth's surface.

Let Xjk be the kth random variable in Xj. MCMC methods aim to generate r Markov chains, where the kth chain is the sequence of values x1k, x2k,…. An arbitrary set of values is chosen for the first chain, subsequent chains being generated from their predecessors by Monte Carlo methods. In calculating θ̂ it is necessary to ignore the early chains in the sequence, since these will be affected by the initial choice of values. This is called the burn-in period.

A popular method for obtaining the required Markov chains, which need to be stationary, is the Metropolis–Hastings algorithm. Let the stationary probability for the random variable Xj being in state m be pm and let Q be a known matrix with non-negative elements. The transition probability matrix governing the sequence X1, X2,…is defined by , where ajk is given by , and each cj is chosen so that Σkpjk=1.

The most used version of the Metropolis–Hastings algorithm incorporates the Gibbs sampler. The aim is to generate a random vector X with elements satisfying a specified relationship. Denote the initial values in X by x1(0), x2(0),…. A single element of the vector (element j, say) is chosen at random, and a potential new value, x, is generated by random selection from the conditional distribution of Xj given the values of the remaining variables, {Xk, k≠j}. If the new value is in accord with the specified relationship, then it replaces the previous value so that xj(1)=x; otherwise the previous value is retained: xj(1)=xj(0). This process is repeated until approximate equilibrium has been reached.

The procedure was originally proposed in the context of statistical mechanics by Metropolis and others in 1953, and was introduced into statistics in 1970 by Hastings.

Subjects: Probability and Statistics.

Reference entries

Users without a subscription are not able to see the full content. Please, subscribe or login to access all content.