The Central Limit Theorem

The Central Limit Theorem states that the sample mean of iid random variables with finite variance is approximately normal for large sample sizes, regardless of the shape of the underlying distribution. This lesson develops the statement of the CLT, shows how to compute probabilities for sample means, applies the normal approximation to the binomial, and handles sums of iid random variables.


Tutorial

Statement of the Central Limit Theorem

Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed (iid) random variables with mean $\mu$ and finite variance $\sigma^2$. The sample mean is

$$\bar{X}_n = \dfrac{X_1 + X_2 + \cdots + X_n}{n}.$$

The Central Limit Theorem (CLT) states that for large $n$, the sample mean is approximately normally distributed:

$$\bar{X}_n \;\approx\; N\!\left(\mu,\; \dfrac{\sigma^2}{n}\right).$$

That is, $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$ (both hold exactly for every $n$), and for large $n$ its distribution is approximately normal. The remarkable feature is that this holds regardless of the distribution of the $X_i$, which could be skewed, discrete, or bimodal, as long as $\sigma^2 < \infty$.
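The claim can be checked numerically. The following is a minimal simulation sketch, not part of the original lesson: it assumes an Exponential(1) population (strongly right-skewed, with $\mu = 1$ and $\sigma^2 = 1$), and the sample size, repetition count, seed, and the helper name `simulate_sample_means` are all illustrative choices.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from n iid Exponential(1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]

# Assumed population: Exponential(1), which has mean 1 and variance 1.
mu, sigma2 = 1.0, 1.0
n, reps = 50, 20_000  # illustrative sizes

means = simulate_sample_means(n, reps)

emp_mean = statistics.fmean(means)      # should be close to mu
emp_var = statistics.pvariance(means)   # should be close to sigma2 / n
sd = (sigma2 / n) ** 0.5

# Under a normal distribution, about 68.3% of values lie within one sd of the mean.
within_one_sd = sum(abs(m - mu) <= sd for m in means) / reps

print(f"mean of sample means:     {emp_mean:.4f} (theory {mu})")
print(f"variance of sample means: {emp_var:.5f} (theory {sigma2 / n})")
print(f"share within one sd:      {within_one_sd:.3f} (normal: about 0.683)")
```

Even though individual exponential draws are far from normal, the simulated sample means should cluster around $\mu$ with spread close to $\sigma/\sqrt{n}$, and the one-standard-deviation share should land near the normal value.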

A common rule of thumb is that $n \geq 30$ is large enough for the approximation to be useful, although strongly skewed or heavy-tailed populations may need a larger $n$. (For sample means drawn from a normal population, the result is exact for every $n$.)
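To see why a sample-size threshold matters, one can track how much skewness is left in the sample mean as $n$ grows. The sketch below is an illustration under assumptions that are not in the lesson: the population is Exponential(1), for which the skewness of $\bar{X}_n$ is $2/\sqrt{n}$, and the function name `sample_mean_skewness`, the repetition count, and the seed are hypothetical choices.

```python
import random
import statistics

def sample_mean_skewness(n, reps=20_000, seed=1):
    """Estimate the skewness of the mean of n iid Exponential(1) draws by simulation."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]
    center = statistics.fmean(means)
    sd = statistics.pstdev(means)
    # Skewness is the average cubed standardized deviation; it is 0 for a normal.
    return statistics.fmean(((m - center) / sd) ** 3 for m in means)

skew_by_n = {n: sample_mean_skewness(n) for n in (5, 30)}

for n, skew in skew_by_n.items():
    theory = 2 / n ** 0.5  # skewness of the mean of n Exponential(1) draws
    print(f"n = {n:2d}: simulated skewness {skew:.3f}, theory {theory:.3f}")
```

The skewness shrinks as $n$ increases but is still visibly nonzero at $n = 30$ for this population, which is why the threshold is a rule of thumb and not a guarantee.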

For example, suppose the daily number of customers at a coffee shop has mean $\mu = 200$ and variance $\sigma^2 = 144$. Then the average daily count over $n = 36$ days has approximate distribution

$$\bar{X}_{36} \;\approx\; N\!\left(200,\; \dfrac{144}{36}\right) = N(200,\, 4),$$

so the standard deviation of $\bar{X}_{36}$ is $\sqrt{4} = 2$.
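The arithmetic in this example can be reproduced with Python's standard library. This is a small sketch, not part of the original lesson: the values of $\mu$, $\sigma^2$, and $n$ come from the example, while the threshold of 203 customers is a hypothetical value chosen only to show how a probability for the sample mean would be computed.

```python
from statistics import NormalDist

mu = 200.0      # mean daily customers (from the example)
sigma2 = 144.0  # variance of daily customers (from the example)
n = 36          # number of days averaged (from the example)

var_mean = sigma2 / n        # variance of the sample mean
sd_mean = var_mean ** 0.5    # standard deviation of the sample mean

# CLT approximation: the 36-day average is roughly N(mu, sigma2 / n).
approx = NormalDist(mu=mu, sigma=sd_mean)

threshold = 203.0                  # hypothetical cutoff, not from the lesson
z = (threshold - mu) / sd_mean     # standardized distance from the mean
p_exceed = 1.0 - approx.cdf(threshold)

print(f"variance of the sample mean: {var_mean}")
print(f"standard deviation of the sample mean: {sd_mean}")
print(f"z-score of {threshold}: {z}")
print(f"approximate P(average > {threshold}): {p_exceed:.4f}")
```

Note that `NormalDist` takes the standard deviation, not the variance, so the square root must be taken before constructing the distribution.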
