Random variables can converge in several different ways. Here is a brief introduction, highlighting examples of the different behaviors that can occur. Many of the examples below are taken from Stoyanov (2013).

There are at least six different notions of convergence for random variables.

- \(X_n\) converges to \(X\) **pointwise** if \(X_n(\omega)\to X(\omega)\) for all \(\omega\).
- \(X_n\) converges to \(X\) **almost surely** or **almost everywhere** if \(X_n(\omega)\to X(\omega)\) for almost all \(\omega\), i.e., for all \(\omega\) in a set of probability 1. Thus, almost sure convergence requires \(\mathsf{Pr}(\{\omega \mid X_n(\omega)\to X(\omega) \})=1\).
- \(X_n\) converges to \(X\) in **probability** if for any \(\epsilon>0\) we have \(\mathsf{Pr}(|X_n(\omega)-X(\omega)|>\epsilon)\to 0\) as \(n\to\infty\).
- \(X_n\) converges to \(X\) in **distribution** or **law** or **weakly** or in the **weak\* topology** if \(F_n(x)\to F(x)\) for all \(x\) at which \(F\) is continuous, where \(F_n,F\) are the distribution functions of \(X_n,X\).
- \(X_n\) converges to \(X\) in the **\(L^1\)-norm** if \(\int |X_n(\omega) - X(\omega)| \,\mathsf{Pr}(d\omega)\to 0\) as \(n\to\infty\). More generally, for \(p\ge 1\) there is convergence in the **\(L^p\)-norm** if \(\int |X_n(\omega) - X(\omega)|^p\, \mathsf{Pr}(d\omega)\to 0\).
- \(X_n\) converges to \(X\) in the **sup-norm** if \(\sup_\Omega |X_n - X| \to 0\).
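To make the definitions concrete, here is a small added example (not from Stoyanov): take \(\Omega=[0,1]\) with Lebesgue measure and \(X_n(\omega)=\omega^n\). Then \(X_n\to 0\) pointwise on \([0,1)\), hence almost surely, and in every \(L^p\)-norm, since \(\int_0^1 \omega^{pn}\,d\omega = 1/(pn+1)\to 0\); but it does not converge in sup-norm, since \(\sup_{[0,1]}|X_n|=1\) for every \(n\).

```python
# Added illustration: Omega = [0, 1] with Lebesgue measure, X_n(omega) = omega^n.
# X_n -> 0 almost surely and in every L^p norm, but not in sup-norm.

def lp_distance(n: int, p: float) -> float:
    """||X_n - 0||_p = (integral of omega^(p*n) d omega)^(1/p) = (1/(p*n + 1))^(1/p)."""
    return (1.0 / (p * n + 1)) ** (1.0 / p)

def sup_distance(n: int) -> float:
    """sup over [0, 1] of |omega^n - 0| equals 1 for every n."""
    return 1.0

for n in (1, 10, 100):
    # L^1 and L^2 distances shrink to zero while the sup distance stays at 1.
    print(n, lp_distance(n, 1), lp_distance(n, 2), sup_distance(n))
```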

Pointwise convergence is the strongest notion, and it obviously implies almost sure convergence. Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution. \(L^1\) convergence also implies convergence in probability, and \(L^p\) convergence implies \(L^r\) convergence for \(p\ge r\ge 1\). Sup-norm convergence can be regarded as the limiting case of \(L^p\) convergence as \(p\to\infty\). The figure lays out the relationships schematically. Notice that since probability spaces have total probability (measure) 1, only large values of \(X\) matter for integrability: random variables never fail to be integrable because of small values of \(X\). (On \([1,\infty)\) with Lebesgue measure, \(X(x)=1/x\) fails to be integrable, but \([1,\infty)\) has infinite measure and so is not a probability space.)

Convergence in distribution is special to probability theory. It is equivalent to a number of other conditions, spelled out in the *Portmanteau theorem*; see Billingsley (1986). In particular, on a standard probability space, convergence in distribution is equivalent to \(\mathsf{Pr}(X_n\in A)\to\mathsf{Pr}(X\in A)\) for all events \(A\) whose boundary has probability zero, and to \(\mathsf E[g(X_n)]\to \mathsf E[g(X)]\) for all bounded, continuous functions \(g\). The last condition partially explains the characterization of convergence in distribution via characteristic functions (Fourier transforms), since \(g(x)=e^{2\pi i x\theta}\) is bounded and continuous for fixed \(\theta\).
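The expectation condition is easy to check numerically. If \(X_n\) is uniform on the grid \(\{0, 1/n, \dots, (n-1)/n\}\) and \(X\) is uniform on \([0,1]\), then \(\mathsf E[g(X_n)]\) is a left Riemann sum for \(\mathsf E[g(X)]=\int_0^1 g\). A quick added sketch, using \(g=\cos\) as the bounded continuous test function:

```python
import math

def E_g_grid(g, n: int) -> float:
    """E[g(X_n)] for X_n uniform on {0, 1/n, ..., (n-1)/n}: a left Riemann sum."""
    return sum(g(k / n) for k in range(n)) / n

g = math.cos           # bounded and continuous on [0, 1]
exact = math.sin(1.0)  # E[g(X)] = integral of cos over [0, 1] for X ~ U[0, 1]
for n in (10, 100, 1000):
    print(n, abs(E_g_grid(g, n) - exact))  # error shrinks like O(1/n)
```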

The relationships between the different modes of convergence are best understood by considering examples.

- Convergence in probability but not almost surely.
- \(X_n\) independent with \(\mathsf{Pr}(X_n=1)=1/n\) and \(\mathsf{Pr}(X_n=0)=1-1/n\); \(X=0\). \(X_n(\omega)\to 0\) requires that all \(X_n\) for \(n\ge N\) equal zero, which has probability \(\prod_{n\ge N}(1-\frac{1}{n})=0\) (take logs and use \(\log(1-1/n)<-1/n\) and the fact \(\sum_n 1/n\) diverges).
- (Typewriter sequence.) Each integer \(n\ge 1\) can be written uniquely as \(n=2^m+k\) for \(0\le k < 2^m\). Let \(X_n(\omega)=1\) if \(\omega\in [k2^{-m}, (k+1)2^{-m}]\) and 0 otherwise, and let \(X=0\). Then \(X_n\) converges to \(X\) in probability (since \(\mathsf{Pr}(X_n=1)=2^{-m}\to 0\)) but not almost surely: for any given \(\omega\), \(X_n(\omega)=1\) for one \(k\) at each level \(m\) and is zero otherwise, hence \(X_n(\omega)\) takes the values 0 and 1 infinitely many times and so does not converge for any \(\omega\).
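The probability computation in the first example can be checked directly: the product \(\prod_{n=N}^{M}(1-1/n)\) telescopes, since \(1-1/n=(n-1)/n\), to \((N-1)/M\), which tends to 0 as \(M\to\infty\). A quick added sketch:

```python
from math import prod

def prob_all_zero(N: int, M: int) -> float:
    """Pr(X_n = 0 for every N <= n <= M) with independent X_n ~ Bernoulli(1/n)."""
    return prod(1 - 1 / n for n in range(N, M + 1))

# Telescoping: prod_{n=N}^{M} (n-1)/n = (N-1)/M, so the probability vanishes as M grows.
for M in (100, 1000, 10000):
    print(M, prob_all_zero(10, M))  # approaches 9/M, hence 0
```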

- Convergence in distribution but not in probability.
- Let \(X_n=X\) be Bernoulli with parameter \(1/2\) and \(Y=1-X\). Then \(X_n\) tends to \(X\) and to \(Y\) in distribution (they have the same distribution) but not in probability, because \(\mathsf{Pr}(X_n=Y)=\mathsf{Pr}(X=Y)=0\). Just as law invariant risk measures do not *see* the actual events, convergence in distribution does not consider explicit events.
- The same example works if \(X\) is any non-trivial, symmetric random variable and \(Y=-X\).
- Let \(X_n\) be uniform on \(\{k/n \mid k=0,1,\dots,n-1\}\) and \(X\) be uniform on \([0,1]\). Then \(X_n\) converges to \(X\) in distribution (the distribution function of \(X_n\) is a finer and finer stair-step converging to that of \(X\)) but not in probability (the distribution of \(X_n\) is supported on the rationals, which have probability zero under \(X\)).
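The stair-step convergence in the last example can be made quantitative: for the grid variable, \(\sup_x|F_n(x)-F(x)|\) is exactly \(1/n\). A small added sketch approximating the sup over a grid of evaluation points:

```python
def F_n(x: float, n: int) -> float:
    """CDF of X_n uniform on {0, 1/n, ..., (n-1)/n}: a stair-step with jumps of 1/n."""
    if x < 0:
        return 0.0
    return min((int(n * x) + 1) / n, 1.0)

def sup_gap(n: int, grid: int = 10001) -> float:
    """Approximate sup_x |F_n(x) - x| over [0, 1]; the exact value is 1/n."""
    return max(abs(F_n(i / (grid - 1), n) - i / (grid - 1)) for i in range(grid))

for n in (10, 100, 1000):
    print(n, sup_gap(n))  # shrinks roughly like 1/n
```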

- \(L^1\) convergence or almost sure but not both.
- \(X_n(\omega)=n\) if \(\omega<1/n\) and 0 otherwise converges to \(X=0\) almost surely but not in \(L^1\), since \(\int X_n=1\) for all \(n\) but \(\int X=0\). Note \(X_n\) is unbounded; if \(X_n\) is dominated by an integrable function then Lebesgue's dominated convergence theorem ensures \(L^1\) convergence.
- The typewriter sequence has \(L^1\) convergence but not almost sure convergence, since \(\int X_n\to 0\). In fact, it converges in \(L^p\) for all \(p<\infty\). It does not converge in \(L^\infty\) since \(\sup X_n=1\not=\sup X=0\).
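The first example is easy to make concrete (an added sketch): \(\mathsf E[X_n]=n\cdot\mathsf{Pr}(\omega<1/n)=1\) for every \(n\), yet each fixed sample path is eventually zero.

```python
def X(omega: float, n: int) -> float:
    """X_n(omega) = n if omega < 1/n else 0, on [0, 1] with Lebesgue measure."""
    return float(n) if omega < 1 / n else 0.0

# E[X_n] = n * (1/n) = 1 for every n, so X_n does not converge to 0 in L^1,
# but for fixed omega > 0 the path hits 0 once n > 1/omega and stays there.
omega = 0.015
print([X(omega, n) for n in (10, 50, 100, 1000)])  # [10.0, 50.0, 0.0, 0.0]
```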

- Equivalent formulations for convergence in distribution.
- The test function \(g\) must be continuous. Let \(X_n=1/n\) with probability \(1-1/n\) and \(1\) otherwise. \(X_n\) converges to 0 in probability (for all \(\epsilon>0\), \(\mathsf{Pr}(X_n>\epsilon)\to 0\) as \(n\to \infty\)). Let \(g(x)=0\) for \(x\le 0\) and \(g(x)=1\) for \(x>0\). For all \(n\), \(g(X_n)=1\), but \(g(0)=0\).
- Test sets \(A\) must have a boundary of probability zero. Apply the discrete uniform example above to \(A=\mathbb Q\cap [0,1]\), the rationals in \([0,1]\). \(\mathsf{Pr}(X_n\in A)=1\) for all \(n\), but \(\mathsf{Pr}(X\in A)=0\). In this case the boundary of \(A\) is all of \([0,1]\) (the closure of \(A\) is \([0,1]\) and its interior is empty), which has probability 1. (The rationals have probability zero: they can be covered by an open set of arbitrarily small probability by putting an open interval of width \(\epsilon/2^{n+1}\) around the \(n\)th rational.)
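The covering argument is just a geometric series; an added check: intervals of width \(\epsilon/2^{n+1}\) around the \(n\)th rational (\(n=1,2,\dots\)) have total length at most \(\epsilon/2<\epsilon\).

```python
eps = 0.1
# Total length of open intervals of width eps / 2^(n+1) around the n-th rational,
# enumerating the rationals from n = 1. The geometric series sums to eps / 2.
total = sum(eps / 2 ** (n + 1) for n in range(1, 60))
print(total)  # just under eps/2 = 0.05
```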

- The strong law of large numbers states that the sample mean converges to the true mean almost surely. For an iid sequence it holds if and only if \(\mathsf E[|X_1|]<\infty\).
- The weak law of large numbers states that the sample mean converges in probability, which holds under weaker conditions that do not require the mean to exist; see Feller (1971).
- The central limit theorem is a statement about convergence in distribution: the suitably normalized sample mean converges in distribution to a normal as the sample size increases.
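A simulation sketch (added; uses NumPy and an arbitrary seed) illustrating the law of large numbers and the central limit theorem for Exponential(1) draws, which have mean and variance both equal to 1:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# Law of large numbers: running sample means of Exponential(1) draws approach 1.
draws = rng.exponential(scale=1.0, size=100_000)
running_mean = draws.cumsum() / np.arange(1, draws.size + 1)
print(running_mean[[99, 9_999, 99_999]])  # drifts toward 1.0

# Central limit theorem: the normalized mean of n draws is approximately N(0, 1).
n, reps = 1_000, 5_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) * np.sqrt(n)  # variance of Exp(1) is 1
print(z.mean(), z.std())  # close to 0 and 1
```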

Billingsley, Patrick, 1986, *Probability and Measure*, second edition (John Wiley & Sons).

Feller, William, 1971, *An Introduction to Probability Theory and Its Applications, Volume 2*, second edition (John Wiley & Sons).

Stoyanov, Jordan M., 2013, *Counterexamples in Probability*, third edition (Dover).

posted 2022-01-20 | tags: mathematics, probability
