Supplement: Probability Inequalities and Limit Theorems

This supplement reinforces the first part of the course: how sample summaries behave as random variables and why probability inequalities lead naturally to convergence theorems.

Sample Mean and Sample Variance

Let $X_1,\ldots,X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$.

The sample mean is

\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.\]

It is itself a random variable. Its mean and variance are

\[E[\bar{X}_n] = \mu, \qquad \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.\]

The sample variance is

\[S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X}_n)^2.\]

The factor $1/(n-1)$ makes $S_n^2$ unbiased, $E[S_n^2]=\sigma^2$. The main lesson is that averaging reduces variance: the sample mean is centered at the population mean, and its variance shrinks like $1/n$, so its standard deviation shrinks like $1/\sqrt{n}$.
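
These identities are easy to check by simulation. Below is a minimal sketch using numpy; the exponential population, sample size, and seed are arbitrary illustrative choices rather than anything prescribed by the notes.

```python
import numpy as np

# Check E[X_bar] = mu, Var(X_bar) = sigma^2/n, and E[S^2] = sigma^2 by simulation.
# Population: exponential with scale 2, so mu = 2 and sigma^2 = 4 (an arbitrary choice).
rng = np.random.default_rng(0)
mu, sigma2 = 2.0, 4.0
n, reps = 50, 20_000                      # sample size and number of replications

samples = rng.exponential(scale=2.0, size=(reps, n))
xbar = samples.mean(axis=1)               # one sample mean per replication
s2 = samples.var(axis=1, ddof=1)          # sample variance with the 1/(n-1) factor

print("mean of X_bar:", xbar.mean(), "  target mu:", mu)
print("Var(X_bar):   ", xbar.var(), "  target sigma^2/n:", sigma2 / n)
print("mean of S^2:  ", s2.mean(), "  target sigma^2:", sigma2)
```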

Markov’s Inequality

Theorem (Markov’s Inequality): Let $Y$ be a nonnegative random variable. For any $a > 0$,

\[P(Y \geq a) \leq \frac{E[Y]}{a}.\]

Markov’s inequality converts information about an average into an upper bound on a tail probability. It is often loose, but it is a powerful first tool because it makes very few assumptions.

Proof Sketch. For a continuous nonnegative random variable with density $p(y)$, $$ E[Y] = \int_0^\infty y p(y)\,dy \geq \int_a^\infty y p(y)\,dy \geq \int_a^\infty a p(y)\,dy = aP(Y \geq a). $$ Dividing by $a$ gives the result.
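
A quick numerical sanity check of the bound is below; the exponential distribution and thresholds are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of Markov's inequality P(Y >= a) <= E[Y]/a
# for a nonnegative variable: exponential with mean 1 (an arbitrary choice).
rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0, size=200_000)

for a in (1.0, 2.0, 4.0):
    tail = (y >= a).mean()        # estimate of P(Y >= a); exact value is exp(-a)
    bound = y.mean() / a          # Markov bound E[Y]/a
    print(f"a = {a}: P(Y >= a) ~ {tail:.4f} <= {bound:.4f}")
```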

Chebyshev’s Inequality

Theorem (Chebyshev’s Inequality): If $Y$ has mean $\mu$ and finite variance $\sigma^2$, then for any $k > 0$,

\[P(|Y-\mu| \geq k\sigma) \leq \frac{1}{k^2}.\]

Chebyshev’s inequality applies Markov’s inequality to the nonnegative random variable $(Y-\mu)^2$.

Proof. By Markov's inequality applied to $(Y-\mu)^2$, $$ P(|Y-\mu| \geq k\sigma) = P((Y-\mu)^2 \geq k^2\sigma^2) \leq \frac{E[(Y-\mu)^2]}{k^2\sigma^2} = \frac{1}{k^2}. $$
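
The same kind of numerical check works here; the uniform population and values of $k$ below are illustrative choices.

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality P(|Y - mu| >= k*sigma) <= 1/k^2
# for a Uniform(0, 1) variable, which has mu = 1/2 and sigma^2 = 1/12.
rng = np.random.default_rng(2)
y = rng.uniform(0.0, 1.0, size=200_000)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

for k in (1.5, 2.0, 3.0):
    tail = (np.abs(y - mu) >= k * sigma).mean()
    print(f"k = {k}: P(|Y - mu| >= k*sigma) ~ {tail:.4f} <= {1.0 / k**2:.4f}")
```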

Moment Generating Functions and Chernoff Bounds

The moment generating function of $X$ is

\[M_X(t) = E[e^{tX}],\]

provided the expectation is finite for $t$ in a neighborhood of $0$.

Chernoff Bound: For any threshold $a$ and any $t>0$ at which $M_Y(t)$ exists,

\[P(Y \geq a) \leq e^{-ta}M_Y(t).\]

The useful part of the Chernoff bound is that we may minimize the right-hand side over $t>0$ to get the tightest available upper bound of this exponential form.
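
For example, if $Y \sim N(0,1)$, then $M_Y(t) = e^{t^2/2}$ and the bound reads

\[P(Y \geq a) \leq e^{-ta + t^2/2}.\]

For $a>0$ the exponent is minimized at $t=a$, which gives

\[P(Y \geq a) \leq e^{-a^2/2},\]

an exponential rate that is far sharper for large $a$ than the $1/a^2$ rate supplied by Chebyshev's inequality.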

Laws of Large Numbers

The weak law of large numbers says that sample averages converge to the population mean in probability:

\[\bar{X}_n \xrightarrow{P} \mu.\]

The strong law gives a stronger conclusion, almost sure convergence:

\[\bar{X}_n \xrightarrow{a.s.} \mu.\]

In words, averages stabilize. The more independent observations we collect, the harder it becomes for the sample mean to remain far from the population mean.
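
A minimal sketch of this stabilization tracks the running mean of a single long Bernoulli sequence; the success probability, sequence length, and seed are arbitrary illustrative choices.

```python
import numpy as np

# Running means of one long iid Bernoulli(0.3) sequence drift toward mu = 0.3,
# illustrating how the sample mean stabilizes as n grows.
rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n}: running mean = {running_mean[n - 1]:.4f}  (mu = 0.3)")
```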

Deriving the Weak Law from Chebyshev

For iid $X_i$ with finite variance, Chebyshev gives the whole proof:

\[P(|\bar{X}_n-\mu|\geq \epsilon) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}.\]

The right-hand side goes to zero as $n\to\infty$, so

\[\bar{X}_n \xrightarrow{P}\mu.\]

This is a useful template: first control a probability with an inequality, then show the bound vanishes.
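
The bound can also be compared with simulated tail probabilities; the Poisson population, choice of $\epsilon$, and sample sizes below are illustrative.

```python
import numpy as np

# Compare the simulated probability P(|X_bar - mu| >= eps) with the Chebyshev
# bound sigma^2 / (n * eps^2), using a Poisson(3) population (mu = sigma^2 = 3).
rng = np.random.default_rng(3)
mu, sigma2, eps, reps = 3.0, 3.0, 0.5, 20_000

for n in (25, 100, 400):
    xbar = rng.poisson(lam=3.0, size=(reps, n)).mean(axis=1)
    prob = (np.abs(xbar - mu) >= eps).mean()   # simulated tail probability
    bound = sigma2 / (n * eps**2)              # bound from the display above
    print(f"n = {n}: probability ~ {prob:.4f} <= bound {bound:.4f}")
```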

Central Limit Theorem

The central limit theorem describes the distribution of the remaining estimation error:

\[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \xrightarrow{d} N(0,1).\]

This theorem explains why Gaussian approximations appear throughout statistical inference, even when the original population is not Gaussian.
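
A small simulation shows the effect for a markedly skewed population; the exponential choice, sample size, and seed are illustrative.

```python
import numpy as np

# Standardized means of an exponential(1) population (mu = sigma = 1) should be
# approximately N(0, 1); compare a few empirical quantiles with -1.645, 0, 1.645.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 1.0, 200, 50_000

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma      # the standardized quantity in the CLT

print(np.quantile(z, [0.05, 0.50, 0.95]))   # roughly [-1.645, 0.0, 1.645]
```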

Approximate Confidence Calculations

For large $n$,

\[\bar{X}_n \approx N\left(\mu,\frac{\sigma^2}{n}\right).\]

If $\sigma$ is known, then an approximate 95 percent confidence interval for $\mu$ is

\[\bar{X}_n \pm 1.96\frac{\sigma}{\sqrt{n}}.\]

If $\sigma$ is unknown, it is common to plug in $S_n$:

\[\bar{X}_n \pm 1.96\frac{S_n}{\sqrt{n}},\]

with the approximation justified by Slutsky’s theorem, since $S_n\xrightarrow{P}\sigma$.
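
A coverage check of the plug-in interval is a useful exercise; the gamma population, sample size, and seed below are illustrative choices, not prescribed by the notes.

```python
import numpy as np

# Fraction of simulated samples whose interval X_bar +/- 1.96 * S_n / sqrt(n)
# contains the true mean; a gamma(shape=3, scale=2) population has mu = 6.
rng = np.random.default_rng(5)
mu, n, reps = 6.0, 100, 10_000

samples = rng.gamma(shape=3.0, scale=2.0, size=(reps, n))
xbar = samples.mean(axis=1)
sn = samples.std(axis=1, ddof=1)
half_width = 1.96 * sn / np.sqrt(n)

covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
print("empirical coverage:", covered.mean())   # should be close to 0.95
```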

Delta Method

If

\[\sqrt{n}(\hat{\theta}_n-\theta) \xrightarrow{d} N(0,\sigma^2),\]

and $g$ is differentiable at $\theta$, then

\[\sqrt{n}(g(\hat{\theta}_n)-g(\theta)) \xrightarrow{d} N(0, [g'(\theta)]^2\sigma^2).\]

The delta method lets us transfer asymptotic normality through smooth transformations.

Example: Delta Method for the Log Mean

Suppose $\bar{X}_n$ estimates a positive mean $\mu>0$ and

\[\sqrt{n}(\bar{X}_n-\mu)\xrightarrow{d}N(0,\sigma^2).\]

For $g(x)=\log x$, $g'(\mu)=1/\mu$. Therefore,

\[\sqrt{n}(\log \bar{X}_n-\log \mu) \xrightarrow{d} N\left(0,\frac{\sigma^2}{\mu^2}\right).\]

The derivative controls how uncertainty changes under the transformation.
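
The predicted limiting variance $\sigma^2/\mu^2$ can be checked by simulation; the Poisson population, sample size, and seed are illustrative choices.

```python
import numpy as np

# Simulated variance of sqrt(n) * (log X_bar - log mu) versus the delta-method
# prediction sigma^2 / mu^2, using a Poisson(4) population (mu = sigma^2 = 4).
rng = np.random.default_rng(6)
mu, sigma2, n, reps = 4.0, 4.0, 500, 20_000

xbar = rng.poisson(lam=4.0, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * (np.log(xbar) - np.log(mu))

print("simulated variance:     ", scaled.var())
print("delta-method prediction:", sigma2 / mu**2)   # 4 / 16 = 0.25
```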

Student Takeaways

  • Sample statistics are random variables with their own distributions.
  • Probability inequalities give finite-sample probability bounds.
  • Laws of large numbers justify consistency.
  • The central limit theorem and delta method justify approximate uncertainty calculations.