Central Limit Theorems
In this post, we will explore the Central Limit Theorem (CLT), a fundamental result in probability theory that describes the limiting behavior of the sample mean of a large number of independent random variables. The CLT states that, under certain conditions, the distribution of the suitably standardized sample mean approaches a normal distribution as the sample size increases. There are several versions of the CLT, each with its own set of assumptions and conclusions. We will discuss the Lindeberg–Lévy CLT (Classical CLT), the Lyapunov CLT, and the Lindeberg CLT.
Lindeberg–Lévy CLT (Classical CLT)
Theorem (Lindeberg–Lévy CLT). Let \(X_1, X_2, \ldots\) be i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\). Define the sample mean as \(\bar{X}_n = \dfrac{1}{n} \sum_{i=1}^{n} X_i\). Then, as \(n \to \infty\),
\[\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1),\]where \(\xrightarrow{d}\) denotes convergence in distribution.
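The convergence can be illustrated with a short simulation, sketched here with NumPy. The choice of Exponential(1), which has \(\mu = \sigma = 1\), is arbitrary; any distribution with finite variance works.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0   # Exponential(1) has mean 1 and variance 1
n, reps = 2000, 5000   # sample size and number of repetitions

# Draw `reps` independent samples of size n and standardize each sample mean
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# If the CLT holds, z should look approximately N(0, 1): its empirical mean
# is near 0 and its empirical standard deviation is near 1
```

Even though the underlying data is heavily skewed, the standardized sample means are already close to standard normal at this sample size.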
Proof. To prove the Lindeberg–Lévy CLT, we can use the method of characteristic functions. The characteristic function of a random variable \(X\) is defined as \(\varphi_X(t) = \mathbb{E}[e^{itX}]\). The following theorem connects convergence in distribution of the sequence of random variables with pointwise convergence of their characteristic functions.
Lévy’s Continuity Theorem. Let \(X_n\) be a sequence of random variables and \(X\) be a random variable. Then \(X_n \xrightarrow{d} X\) if and only if \(\varphi_{X_n}(t) \to \varphi_X(t)\) for all \(t \in \mathbb{R}\). (No continuity restriction on \(t\) is needed here, since characteristic functions are continuous everywhere.)
Define the standardized random variables \(Z_i = \dfrac{X_i - \mu}{\sigma}\). It follows that \(Z_i\) are i.i.d. with \(\mathbb{E}[Z_i] = 0\) and \(\mathrm{Var}(Z_i) = 1\). Then, we can express \(\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\) as the sum of the standardized random variables, i.e.,
\[\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i =: S_n.\]Now we consider the characteristic function of \(S_n\). Since \(Z_i\) are i.i.d. with mean 0 and variance 1, the characteristic function of \(S_n\) is given by
\[\varphi_{S_n}(t) = \left( \varphi_{Z_1}(t/\sqrt{n}) \right)^n.\]For any fixed \(t\), by the Taylor expansion of the characteristic function \(\varphi_{Z_1}(t)\) around 0, we have
\[\begin{align*} \varphi_{Z_1}(t/\sqrt{n}) &= \mathbb{E}\left[e^{\frac{itZ_1}{\sqrt{n}}}\right] \\ &= 1 + \frac{it}{\sqrt{n}}\mathbb{E}[Z_1] - \frac{t^2}{2n}\mathbb{E}[Z_1^2] + o\left(\frac{1}{n}\right) \\ &= 1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right), \end{align*}\]where the second-order expansion is justified because \(\mathbb{E}[Z_1^2] < \infty\), and the last line uses \(\mathbb{E}[Z_1] = 0\) and \(\mathbb{E}[Z_1^2] = 1\). Therefore, the limiting characteristic function of the sum is
\[\lim_{n \to \infty} \varphi_{S_n}(t) = \lim_{n \to \infty} \left( 1 - \frac{t^2}{2n} + o(\frac{1}{n}) \right)^n = e^{-\frac{t^2}{2}}.\]Since \(e^{-\frac{t^2}{2}}\) is the characteristic function of the standard normal distribution \(\mathcal{N}(0, 1)\), by Lévy’s Continuity Theorem, we conclude that \(\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)\) as \(n \to \infty\). This completes the proof.
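The limit \(\left(1 - \frac{t^2}{2n} + o(\frac{1}{n})\right)^n \to e^{-t^2/2}\) can also be checked numerically. A minimal sketch (the value \(t = 1.5\) is an arbitrary choice):

```python
import numpy as np

t = 1.5
target = np.exp(-t**2 / 2)  # limiting characteristic function at t

# (1 - t^2/(2n))^n should approach exp(-t^2/2) as n grows
approx = [(1 - t**2 / (2 * n)) ** n for n in (10, 100, 10_000)]
errors = [abs(a - target) for a in approx]
# The approximation error shrinks monotonically as n increases
```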
Corollary (Lindeberg–Lévy CLT with unknown variance). Let \(X_1, X_2, \ldots\) be i.i.d. random variables with mean \(\mu\) and unknown variance \(\sigma^2 < \infty\). Define the sample mean as \(\bar{X}_n = \dfrac{1}{n} \sum_{i=1}^{n} X_i\) and the sample variance as \(\hat{\sigma}_{n}^{2} = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2\). Then, as \(n \to \infty\),
\[\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\hat{\sigma}_{n}} \xrightarrow{d} \mathcal{N}(0, 1).\]Proof. To prove this corollary, we use Slutsky’s Theorem and the Continuous Mapping Theorem.
Slutsky’s Theorem. Let \(X_n\) and \(Y_n\) be sequences of random variables, let \(X\) be a random variable, and let \(c\) be a constant. If \(X_n \xrightarrow{d} X\) and \(Y_n \xrightarrow{p} c\), then \(X_n + Y_n \xrightarrow{d} X + c\), \(X_n Y_n \xrightarrow{d} cX\), and, if \(c \neq 0\), \(\frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{c}\).
Continuous Mapping Theorem. Let \(X_n\) be a sequence of random variables and \(X\) be a random variable. If \(X_n \xrightarrow{d} X\) and \(g\) is continuous at every point of a set \(C\) with \(\mathbb{P}(X \in C) = 1\), then \(g(X_n) \xrightarrow{d} g(X)\). The same holds for convergence in probability and almost sure convergence.
By the Lindeberg–Lévy central limit theorem,
\[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0,1).\]Thus it remains to show that
\[\hat{\sigma}_n \xrightarrow{p} \sigma.\]Once this is established, Slutsky’s theorem will imply that
\[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\hat{\sigma}_n} = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \cdot \frac{\sigma}{\hat{\sigma}_n} \xrightarrow{d} \mathcal{N}(0,1).\]We now prove the consistency of \(\hat{\sigma}_n^2\).
First, use the identity
\[\sum_{i=1}^n (X_i-\bar{X}_n)^2 = \sum_{i=1}^n (X_i-\mu)^2 - n(\bar{X}_n-\mu)^2.\]Therefore,
\[\hat{\sigma}_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\mu)^2 - \frac{n}{n-1}(\bar{X}_n-\mu)^2.\]We now analyze the two terms separately. Since the random variables \((X_i-\mu)^2\) are i.i.d. and satisfy \(\mathbb{E}[(X_i-\mu)^2] = \sigma^2 < \infty,\) the weak law of large numbers gives
\[\frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2 \xrightarrow{p} \sigma^2.\]Because \(\frac{n}{n-1} \to 1,\) it follows that
\[\frac{1}{n-1}\sum_{i=1}^n (X_i-\mu)^2 = \frac{n}{n-1}\cdot \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2 \xrightarrow{p} \sigma^2.\]Next, by the weak law of large numbers, \(\bar{X}_n \xrightarrow{p} \mu.\) Hence, by the continuous mapping theorem,
\[(\bar{X}_n-\mu)^2 \xrightarrow{p} 0.\]Since also \(\frac{n}{n-1} \to 1,\) we obtain
\[\frac{n}{n-1}(\bar{X}_n-\mu)^2 \xrightarrow{p} 0.\]Combining the two convergence results above, we conclude that \(\hat{\sigma}_n^2 \xrightarrow{p} \sigma^2.\) Because \(\sigma^2>0\) (implicit in the theorem, since we divide by \(\sigma\)) and the square-root function is continuous on \((0,\infty)\), the continuous mapping theorem yields \(\hat{\sigma}_n \xrightarrow{p} \sigma.\) Equivalently, \(\frac{\sigma}{\hat{\sigma}_n} \xrightarrow{p} 1.\)
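Both ingredients of this argument, the algebraic identity and the consistency of \(\hat{\sigma}_n\), can be checked numerically. A minimal sketch (the Normal(3, 2) data is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 3.0, 2.0

# 1) The decomposition: sum (X_i - Xbar)^2 = sum (X_i - mu)^2 - n (Xbar - mu)^2
x = rng.normal(mu, sigma, size=50)
xbar = x.mean()
lhs = np.sum((x - xbar) ** 2)
rhs = np.sum((x - mu) ** 2) - len(x) * (xbar - mu) ** 2
# lhs and rhs agree up to floating-point rounding

# 2) Consistency: the sample sd (ddof=1 gives the 1/(n-1) normalization)
#    approaches sigma as the sample size grows
sd_estimates = [rng.normal(mu, sigma, size=n).std(ddof=1) for n in (100, 100_000)]
```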
Finally, by Slutsky’s theorem,
\[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\hat{\sigma}_n} = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \cdot \frac{\sigma}{\hat{\sigma}_n} \xrightarrow{d} Z \cdot 1 = Z,\]where \(Z \sim \mathcal{N}(0,1)\). Therefore, \(\frac{\sqrt{n}(\bar{X}_n-\mu)}{\hat{\sigma}_n} \xrightarrow{d} \mathcal{N}(0,1).\) This completes the proof.
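The corollary itself can be simulated: replacing the unknown \(\sigma\) by \(\hat{\sigma}_n\) still yields an approximately standard normal statistic. A sketch using Uniform(0, 1) data, with \(\mu = 1/2\), as an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.5           # mean of Uniform(0, 1)
n, reps = 500, 4000

x = rng.uniform(size=(reps, n))
# Studentized statistic: sigma is replaced by the sample sd (ddof=1)
t_stat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

# By the corollary, t_stat should be approximately N(0, 1): empirical mean
# near 0 and empirical standard deviation near 1
```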