Basics in Linear Models (Part 2)
Inference in Linear Models
In the previous post, we showed that linear models can be fitted using the ordinary least squares (OLS) method. Recall that a linear model can be written as
\[\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon},\]where \(X \in \mathbb{R}^{n \times d}\) is a full column-rank design matrix, \(\mathbf{y} \in \mathbb{R}^n\) is the response vector, and \(\boldsymbol{\epsilon}\) is the noise vector. The OLS estimate is given by
\[\begin{equation} \label{eq:ols} \hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}. \end{equation}\]
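As a quick sanity check, here is a minimal NumPy sketch (the design matrix, coefficients, and noise level below are made up for illustration) that computes the OLS estimate by solving the normal equations \(X^{\top} X \hat{\boldsymbol{\beta}} = X^{\top} \mathbf{y}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
# Design matrix with an intercept column and two random predictors (illustrative values)
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, -0.5])           # true coefficients, chosen arbitrarily
sigma = 0.7                                 # true noise standard deviation
y = X @ beta + rng.normal(scale=sigma, size=n)

# Solve the normal equations instead of explicitly inverting X^T X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                             # should be close to the true beta
```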
Test for a Single Coefficient (t-test)

The OLS estimate is unbiased even without assuming normality of the noise, i.e., \(\mathbb{E}[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}\). If we further assume that the noise is normally distributed, i.e., \(\boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}_n)\), then we can derive the distribution of the OLS estimates.
The OLS estimates \(\hat{\boldsymbol{\beta}}\) are normally distributed with mean \(\boldsymbol{\beta}\) and variance \(\sigma^2 (X^{\top} X)^{-1}\), i.e.,
\[\hat{\boldsymbol{\beta}} \sim \mathcal{N} \left(\boldsymbol{\beta}, \sigma^2 (X^{\top} X)^{-1}\right).\]To see this, we can rewrite the OLS estimates \eqref{eq:ols} as
\[\begin{aligned} \hat{\boldsymbol{\beta}} &= (X^{\top} X)^{-1} X^{\top} \mathbf{y} \\ &= (X^{\top} X)^{-1} X^{\top} (X \boldsymbol{\beta} + \boldsymbol{\epsilon}) \\ &= (X^{\top} X)^{-1} X^{\top} X \boldsymbol{\beta} + (X^{\top} X)^{-1} X^{\top} \boldsymbol{\epsilon} \\ &= \boldsymbol{\beta} + (X^{\top} X)^{-1} X^{\top} \boldsymbol{\epsilon}. \end{aligned}\]Since \(\boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}_n)\), we have
\[\begin{aligned} (X^{\top} X)^{-1} X^{\top} \boldsymbol{\epsilon} &\sim \mathcal{N} \left(0, \sigma^2 (X^{\top} X)^{-1} X^{\top} \mathbf{I}_n X (X^{\top} X)^{-1}\right) \\ &= \mathcal{N} \left(0, \sigma^2 (X^{\top} X)^{-1}\right). \end{aligned}\]It is now clear that the OLS estimates \(\hat{\boldsymbol{\beta}}\) are normally distributed with mean \(\boldsymbol{\beta}\) and variance \(\sigma^2 (X^{\top} X)^{-1}\).
Or equivalently, we have
\[\frac{1}{\sigma} \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \sim \mathcal{N} \left(0, (X^{\top} X)^{-1}\right).\]Recall the unbiased estimator of the variance \(\sigma^2\) is given by
\[\begin{equation} \label{eq:sigma2} \hat{\sigma^2} = \frac{1}{n - d} \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \frac{1}{n - d} \lVert \mathbf{y} - X \hat{\boldsymbol{\beta}} \rVert_2^2. \end{equation}\]
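As an illustration of why the denominator is \(n - d\) rather than \(n\), the following sketch (simulated data with arbitrarily chosen parameters) checks empirically that the estimator above is unbiased for \(\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 50, 3, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, -0.5])

sigma2_hats = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hats.append(resid @ resid / (n - d))   # divide by n - d, not n

print(np.mean(sigma2_hats), sigma**2)             # the average should be close to sigma^2
```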
We also want to derive the distribution of \(\hat{\sigma^2}\). We can rewrite the residuals as
\[\begin{aligned} \mathbf{y} - X \hat{\boldsymbol{\beta}} &= \mathbf{y} - H \mathbf{y} \\ &= (\mathbf{I}_n - H) \mathbf{y} \\ &= (\mathbf{I}_n - H) (X \boldsymbol{\beta} + \boldsymbol{\epsilon}) \\ &= (\mathbf{I}_n - H) \boldsymbol{\epsilon}, \end{aligned}\]where \(H = X (X^{\top} X)^{-1} X^{\top}\) is the hat matrix and the last equality uses \(H X = X\). To derive the distribution of the resulting quadratic form, we need the following proposition.

Proposition. Let \(\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_p)\) and let \(A\) be an orthogonal projection matrix with \(\text{rank}(A) = r \leq p\). Then the quadratic form \(Y = \mathbf{x}^{\top} A \mathbf{x}\) follows a chi-squared distribution with \(r\) degrees of freedom, that is, \(Y \sim \chi^2_r\).
Proof. Since \(A\) is an orthogonal projection matrix, \(A\) is symmetric and idempotent, i.e., \(A = A^{\top}\) and \(A^2 = A\). Due to the symmetry of \(A\), we can diagonalize \(A\) as
\[A = Q \Lambda Q^{\top},\]where \(Q\) is an orthogonal matrix and \(\Lambda\) is a diagonal matrix containing the eigenvalues of \(A\). Since \(A\) is idempotent, its eigenvalues are either 0 or 1. Without loss of generality, order them so that \(\lambda_1 = \cdots = \lambda_r = 1\) are the non-zero eigenvalues and \(\lambda_{r+1} = \cdots = \lambda_p = 0\) are the zero eigenvalues.
Define \(\mathbf{z} = Q^{\top} \mathbf{x}\). Then \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, Q^{\top} \mathbf{I}_p Q) = \mathcal{N}(\mathbf{0}, \mathbf{I}_p)\), a consequence of the rotational (unitary) invariance of the standard normal distribution.
Thus, we can rewrite the quadratic form as
\[Y = \mathbf{x}^{\top} A \mathbf{x} = \mathbf{z}^{\top} \Lambda \mathbf{z} = \sum_{i=1}^r z_i^2,\]where \(z_i\) are the components of the vector \(\mathbf{z}\). Since \(z_i \sim \mathcal{N}(0, 1)\), we have \(z_i^2 \sim \chi^2_1\). Therefore,
\[Y = \sum_{i=1}^r z_i^2 \sim \chi^2_r,\]since the sum of \(r\) independent \(\chi^2_1\) random variables follows a \(\chi^2_r\) distribution. This completes the proof.
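The proposition is easy to check numerically. The sketch below (with an arbitrary rank-4 projection in \(\mathbb{R}^{10}\)) builds a projection matrix onto a random subspace and compares the empirical distribution of the quadratic form with \(\chi^2_r\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, r = 10, 4
# Projection onto the column space of a random p x r matrix B (rank r almost surely)
B = rng.normal(size=(p, r))
A = B @ np.linalg.inv(B.T @ B) @ B.T          # symmetric and idempotent

x = rng.normal(size=(20000, p))               # rows are draws from N(0, I_p)
Y = np.einsum("ij,jk,ik->i", x, A, x)         # quadratic forms x^T A x

print(Y.mean())                               # should be close to r = 4 (mean of chi^2_r)
print(stats.kstest(Y, stats.chi2(df=r).cdf))  # should not reject chi^2_r
```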
Now we can apply the proposition to derive the distribution of \(\hat{\sigma^2}\). Since \(\mathbf{I}_n - H\) is an orthogonal projection matrix with rank \(n - d\) and \(\boldsymbol{\epsilon} / \sigma \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)\), we have
\[\begin{aligned} \frac{(n-d)\hat{\sigma^2}}{\sigma^2} &= \frac{1}{\sigma^2} \lVert \mathbf{y} - X \hat{\boldsymbol{\beta}} \rVert_2^2 \\ &= \frac{1}{\sigma^2} \lVert (\mathbf{I}_n - H) \boldsymbol{\epsilon} \rVert_2^2 \\ &= \frac{1}{\sigma^2} \boldsymbol{\epsilon}^{\top} (\mathbf{I}_n - H)^{\top} (\mathbf{I}_n - H) \boldsymbol{\epsilon} \\ &= \frac{1}{\sigma^2} \boldsymbol{\epsilon}^{\top} (\mathbf{I}_n - H) \boldsymbol{\epsilon} \\ &\sim \chi^2_{n - d}, \end{aligned}\]where the second-to-last equality uses the symmetry and idempotence of \(\mathbf{I}_n - H\). We also want to show that \(\frac{1}{\sigma} \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)\) is independent of \(\frac{(n-d)\hat{\sigma^2}}{\sigma^2}\). To show this, it suffices to show that \(\mathbf{y} - X\hat{\boldsymbol{\beta}}\) is independent of \(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\). We have \(\mathbf{y} - X\hat{\boldsymbol{\beta}} = (\mathbf{I}_n - H) \boldsymbol{\epsilon}\) and \(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = (X^{\top} X)^{-1} X^{\top} \boldsymbol{\epsilon}\).
Then we can compute the covariance between \(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\) and \(\mathbf{y}-X\hat{\boldsymbol{\beta}}\) as follows:
\[\begin{aligned} & \mathbb{E} \left[ \left( \hat{\boldsymbol{\beta}}-\boldsymbol{\beta} \right) \left( \mathbf{y}-X\hat{\boldsymbol{\beta}} \right)^{\top} \right] \\ &= \mathbb{E} \left[ (X^{\top} X)^{-1} X^{\top} \boldsymbol{\epsilon} \boldsymbol{\epsilon}^{\top} (\mathbf{I}_n - H) \right] \\ &= (X^{\top} X)^{-1} X^{\top} \, \mathbb{E}\left[ \boldsymbol{\epsilon} \boldsymbol{\epsilon}^{\top} \right] (\mathbf{I}_n - H) \\ &= \sigma^2 (X^{\top} X)^{-1} X^{\top} (\mathbf{I}_n - H) \\ &= 0, \end{aligned}\]where the last equality holds because \(X^{\top} (\mathbf{I}_n - H) = 0\). Since \(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\) and \(\mathbf{y}-X\hat{\boldsymbol{\beta}}\) are jointly normally distributed, zero covariance implies that they are independent. Therefore, \(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\) and \(\frac{(n-d)\hat{\sigma^2}}{\sigma^2}\) are independent.
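As a quick empirical check consistent with this independence (a simulation sketch with arbitrary parameters, not part of the derivation), the sample correlation between \(\hat{\beta}_k\) and \(\hat{\sigma^2}\) across repeated datasets should be close to zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 50, 3, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, -0.5])

beta_hats, sigma2_hats = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    beta_hats.append(beta_hat[1])                 # track one coefficient, e.g. k = 1
    sigma2_hats.append(resid @ resid / (n - d))

# The correlation between beta_hat_k and sigma2_hat should be approximately zero
print(np.corrcoef(beta_hats, sigma2_hats)[0, 1])
```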
Definition (Student’s t-distribution) The Student’s t-distribution with \(\nu\) degrees of freedom is defined as
\[T = \frac{Z}{\sqrt{V/\nu}} \sim t_{\nu},\]where \(Z \sim \mathcal{N}(0, 1)\) and \(V \sim \chi^2_{\nu}\) are independent.
Let \(Z = \frac{\hat{\beta}_k - \beta_k}{\sigma \sqrt{{(X^{\top} X)^{-1}}_{k,k}}}\), where \(\hat{\beta}_k\) is the \(k\)-th component of the OLS estimate \(\hat{\boldsymbol{\beta}}\) and \({(X^{\top} X)^{-1}}_{k,k}\) is the \(k\)-th diagonal element of the matrix \((X^{\top} X)^{-1}\). Since \(\hat{\beta}_k \sim \mathcal{N}\left(\beta_k, \sigma^2 {(X^{\top} X)^{-1}}_{k,k}\right)\), we have \(Z \sim \mathcal{N}(0, 1)\). Let \(V = \frac{(n-d)\hat{\sigma^2}}{\sigma^2}\). We have shown that \(Z \sim \mathcal{N}(0, 1)\) and \(V \sim \chi^2_{n-d}\) are independent. Therefore, we can conclude that
\[\begin{equation} \label{eq:student} T = \frac{Z}{\sqrt{V/(n-d)}} = \frac{\hat{\beta}_k - \beta_k}{\sqrt{\hat{\sigma^2} {(X^{\top} X)^{-1}}_{k,k}}} \sim t_{n-d}. \end{equation}\]We usually call the denominator \(\sqrt{\hat{\sigma^2} {(X^{\top} X)^{-1}}_{k,k}}\) the standard error of the OLS estimate \(\hat{\beta}_k\), denoted as \(\text{se}(\hat{\beta}_k)\).
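The following sketch (again on simulated data with made-up parameters) verifies the pivotal distribution \eqref{eq:student} empirically, comparing simulated \(T\) statistics for one coefficient with the \(t_{n-d}\) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, d, sigma, k = 30, 3, 0.7, 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

T_stats = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - d)
    se_k = np.sqrt(sigma2_hat * XtX_inv[k, k])        # standard error of beta_hat_k
    T_stats.append((beta_hat[k] - beta[k]) / se_k)

print(stats.kstest(T_stats, stats.t(df=n - d).cdf))   # should not reject t_{n-d}
```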
Now we can perform a hypothesis test for the \(k\)-th coefficient \(\beta_k\). The null hypothesis and the alternative hypothesis are given by
\[\begin{aligned} H_0: \beta_k &= c, \\ H_a: \beta_k &\neq c. \end{aligned}\]The test statistic is given by
\[T = \frac{\hat{\beta}_k - c}{\text{se}(\hat{\beta}_k)},\]which follows a \(t_{n-d}\) distribution under \(H_0\). We reject the null hypothesis at significance level \(\alpha\) if \(|T| > t_{n-d, \alpha/2}\), where \(t_{n-d, \alpha/2}\) is the upper \(\alpha/2\) quantile of the \(t_{n-d}\) distribution. The p-value can be computed as \(p = 2\, \mathbb{P}(t_{n-d} > |T_{\text{obs}}|)\), where \(T_{\text{obs}}\) is the observed value of the test statistic.
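Putting everything together, here is a minimal sketch of the t-test for each coefficient on simulated data (all parameters below are illustrative), using SciPy's \(t\) distribution for the p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, d = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, 0.0])               # the last coefficient is truly zero
y = X @ beta + rng.normal(scale=0.7, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - d)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))    # standard errors of all coefficients

c = 0.0                                        # H_0: beta_k = c = 0 for each coefficient
T = (beta_hat - c) / se
p_values = 2 * stats.t.sf(np.abs(T), df=n - d) # two-sided p-values
print(np.column_stack([beta_hat, se, T, p_values]))
```

In practice one would typically rely on an existing implementation, such as the coefficient table produced by statsmodels' OLS or R's lm, which reports the same estimates, standard errors, t statistics, and p-values.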
Test for Multiple Coefficients (F-test)