4 Normal Errors Regression
Thoughts
\[ Y_i = \beta_0 + \beta_1X_i + \epsilon_i \]
If we assume that the errors \(\epsilon_i\) are normally distributed, our p-values, critical values, and confidence intervals become more accurate, especially for smaller samples.
Recall our constraints:
1. \(E(\epsilon_i)=0\)
2. \(\epsilon_1...\epsilon_n\) have constant variance
3. \(\epsilon_1...\epsilon_n\) are uncorrelated
4. \(\epsilon_i\sim N\)
The normality condition holds when \(y_1...y_n\) are jointly normally distributed.
Theorem 4.1 If \(w \sim N(\mu,\sigma^2)\),
define \(Z = a +bw\), where \(a\) and \(b\) are constants.
Then \(E(Z) = E(a+bw) = a+bE(w) = a+b\mu\) and \(\sigma^2(Z) = b^2\sigma^2(w) = b^2\sigma^2\),
\(\therefore Z\sim N(a+b\mu, b^2\sigma^2)\)
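Theorem 4.1 can be checked numerically. The sketch below is a small simulation using only the Python standard library; the values of \(\mu\), \(\sigma\), \(a\), and \(b\) are illustrative assumptions, not taken from these notes. The sample mean and variance of \(Z\) should land near \(a+b\mu\) and \(b^2\sigma^2\).

```python
import random
import statistics

# Illustrative values only; these are assumptions, not from the notes.
mu, sigma = 10.0, 2.0
a, b = 3.0, 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
w = [rng.gauss(mu, sigma) for _ in range(100_000)]
z = [a + b * wi for wi in w]  # linear transformation Z = a + b*w

z_mean = statistics.fmean(z)
z_var = statistics.variance(z)

print(f"theory: mean={a + b * mu}, variance={b**2 * sigma**2}")
print(f"sample: mean={z_mean:.3f}, variance={z_var:.3f}")
```

A histogram of `z` would also look bell-shaped, which is the part of the theorem that the two moments alone do not show.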
Definition 4.1 With normal errors, we can say:
\[Y_i|X_i \sim N(\beta_0+\beta_1X_i, \sigma^2)\]
1. \(E(Y_i|X_i) = \beta_0+\beta_1X_i\)
2. \(\sigma^2(Y_i)\) is constant
3. \(Y_1...Y_n\) are uncorrelated
4. \(Y_1...Y_n \sim N\)
\(X_i\) is treated as fixed in the definition above, so all of the variance in the model must come from \(\epsilon_i\), the only random variable.
Why do we assume normality? Because it also determines the sampling distribution of \(b_1\), which is a linear combination of the \(y_i\):
\[ b_1 = \sum k_iy_i \]
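Because \(b_1\) is a linear combination of normal \(y_i\), its sampling distribution is exactly normal, which keeps p-values and confidence intervals correct even for smaller samples. Without normal errors we would have to rely on the central limit theorem, which only helps as \(n\) grows.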
If \(Z_1,...,Z_n\) have a joint normal distribution and we define \(Y= \sum_i(a_i+b_iZ_i)\) for constants \(a_i\) and \(b_i\), then \(Y \sim N\).
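To make the linear-combination form of \(b_1\) concrete, here is a minimal sketch. The weights \(k_i\) are not defined in this section, so the code assumes the usual least-squares weights \(k_i = (x_i - \bar x)/\sum_j (x_j - \bar x)^2\). The data are hypothetical, made up for illustration only.

```python
# Hypothetical (x, y) pairs for illustration; not the data from the notes.
x = [20.0, 29.0, 35.0, 42.0, 50.0]
y = [108.0, 150.0, 118.0, 125.0, 139.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Assumed least-squares weights: k_i = (x_i - x_bar) / sum((x_j - x_bar)^2)
k = [(xi - x_bar) / sxx for xi in x]

# Slope written as a linear combination of the y_i
b1_linear = sum(ki * yi for ki, yi in zip(k, y))

# Slope from the usual cross-product formula, for comparison
b1_usual = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

print(b1_linear, b1_usual)
```

The two slopes agree up to floating-point rounding. Since the \(k_i\) depend only on the fixed \(x_i\), \(b_1\) inherits normality from the \(y_i\).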
4.1 Predicted Values
Estimating the expected values of \(y_i\) conditional on \(x_i\):
\[ \hat y_h = b_0+b_1X_h \]
This allows us to estimate the mean response \(\hat{y}_h\) at \(x_h\), even if \(x_h\) isn't in the original dataset.
Given \(Y_h\sim N\), our estimators \(b_0\) and \(b_1\), and therefore \(\hat{Y}_h\), will also be normally distributed.
Definition 4.2 (Normal Population Variance) \[\sigma^2 (\hat{Y}_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar X)^2}{\sum(X_i-\bar X)^2}\right]\]
Definition 4.3 (Normal Sample Variance) \[s^2 (\hat{Y}_h) = MSE\left[\frac{1}{n} + \frac{(X_h - \bar X)^2}{\sum(X_i-\bar X)^2}\right]\]
Definition 4.4 (Normal Conditional CI) A \(1-\alpha\) confidence interval for \(E(Y_h|X_h)\) is
\[ \hat{Y}_h \pm t(1-\frac{\alpha}{2};n-2)s(\hat{Y}_h) \]
- | parameter | statistic |
---|---|---|
variance | \(\sigma^2(\epsilon_i) = 225\) | \(MSE=247\) |
slope | \(\beta_1 = 0.9\) | \(b_1= 0.8\) |
intercept | \(\beta_0 = 90\) | \(b_0 = 91.6\) |
Example 4.1 Construct a \(95\%\) confidence interval for the mean blood pressure of 30-year-olds, \(E(Y|X =30)\)
\[\begin{equation} \begin{split} s^2(\hat{Y_h}) &= 247[\frac{1}{20} + \frac{(30-33.15)^2}{6072.55}] \approx 12.75\\ L &= 115.6-2.10\sqrt{12.75} \approx 108\\ U &= 115.6+2.10\sqrt{12.75} \approx 123 \end{split} \end{equation}\]
This gives a 95% confidence interval of \((108,123)\). We are 95% confident that the true mean response \(E(Y|X=30)\) falls in this range.
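The arithmetic in Example 4.1 can be reproduced with a few lines of Python. This is a sketch that takes every number from the example and the table above. The critical value \(t(0.975; 18)\) is hard-coded as the 2.10 used in the notes rather than computed.

```python
import math

# Values taken from Example 4.1 and the table above
mse = 247.0
n = 20
x_h = 30.0
x_bar = 33.15
sxx = 6072.55      # sum of (x_i - x_bar)^2
b0, b1 = 91.6, 0.8
t_crit = 2.10      # t(0.975; 18), hard-coded as in the notes

y_hat = b0 + b1 * x_h                               # point estimate of E(Y|X=30)
s2_yhat = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # Definition 4.3
half_width = t_crit * math.sqrt(s2_yhat)

lower, upper = y_hat - half_width, y_hat + half_width
print(f"y_hat={y_hat:.1f}, s2={s2_yhat:.2f}, CI=({lower:.1f}, {upper:.1f})")
```

Rounded to whole numbers, the endpoints match the interval \((108, 123)\) reported above.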
Things to notice:
- When \(X_h = \bar X\), then
\[\begin{equation} \begin{split} \sigma^2 (\hat{Y_h}) &= \sigma^2\left[\frac{1}{n} + \frac{ (\bar X - \bar X)^2}{\sum(x_i-\bar X)^2}\right]\\ &= \frac{\sigma^2}{n} \end{split} \end{equation}\]
because \((\bar X,\bar Y )\) will always fall on the regression line, so the fitted value at \(\bar X\) is \(\bar Y\).
By extension:
\[ \sigma^2(\hat{Y_h}) = var(\bar y) = \frac{\sigma^2}{n} \]
- When the \(X\) values are more spread out, the mean response is estimated more precisely.
Our variances \(\sigma^2(\hat{Y_h})\) and \(s^2(\hat{Y_h})\) decrease as \(\sum(x_i-\bar x)^2\) increases.
- We have the least amount of variability near \(\bar{x}\). In the equation
\[ s^2(\hat{Y_h}) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar x)^2}{\sum(x_i-\bar x)^2}\right] \]
we expect \(s^2(\hat{Y_h})\) to decrease as \(|x_h - \bar x|\) decreases.
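The points above can be seen by evaluating \(s^2(\hat{Y_h})\) at several values of \(x_h\). The sketch below reuses the numbers from Example 4.1 and uses the squared deviation \((x_h - \bar x)^2\), as in that example. The function name `s2_yhat` and the chosen \(x_h\) values are illustrative assumptions.

```python
# Values from Example 4.1
mse, n, x_bar, sxx = 247.0, 20, 33.15, 6072.55


def s2_yhat(x_h):
    """Estimated variance of the fitted mean response at x_h."""
    return mse * (1 / n + (x_h - x_bar) ** 2 / sxx)


# The variance is smallest at x_bar and grows as x_h moves away from it.
for x_h in (x_bar, 30.0, 50.0, 70.0):
    print(f"x_h={x_h:6.2f}  s2={s2_yhat(x_h):6.2f}")
```

At \(x_h = \bar x\) the second term vanishes and the function returns \(MSE/n\), the sample analogue of \(\sigma^2/n\).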
There are two different intervals built around \(\hat{y}_h\):
- A confidence interval for \(E(Y|X_h)\), which gives a range of likely values for the parameter.
- A prediction interval, which gives a range of likely values for a new observation \(Y\) at a given \(X_h\).
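These notes do not state a formula for the prediction interval. The sketch below assumes the standard textbook form, in which the variance for a single new observation adds the error variance to the variance of the fitted mean: \(s^2(\text{pred}) = MSE + s^2(\hat{Y}_h)\). All numbers come from Example 4.1.

```python
import math

# Values from Example 4.1
mse, n, x_bar, sxx = 247.0, 20, 33.15, 6072.55
b0, b1, t_crit = 91.6, 0.8, 2.10
x_h = 30.0

y_hat = b0 + b1 * x_h
s2_mean = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # variance of the fitted mean
s2_pred = mse + s2_mean  # assumed textbook form: add error variance for one new Y

ci = (y_hat - t_crit * math.sqrt(s2_mean), y_hat + t_crit * math.sqrt(s2_mean))
pi = (y_hat - t_crit * math.sqrt(s2_pred), y_hat + t_crit * math.sqrt(s2_pred))

print(f"confidence interval: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"prediction interval: ({pi[0]:.1f}, {pi[1]:.1f})")
```

Under this assumption the prediction interval is always wider than the confidence interval, because a single new observation carries its own error on top of the uncertainty in the estimated mean.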
4.2 Shape
Example 4.2 Suppose we have the simulated data \((50,139),(29,150)\) and others, making up \(n=20\) observations, with parameters and estimates given in the table below:
- | parameter | statistic |
---|---|---|
variance | \(\sigma^2(\epsilon_i) = 225\) | \(MSE=247\) |
slope | \(\beta_1 = 0.9\) | \(b_1= 0.8\) |
intercept | \(\beta_0 = 90\) | \(b_0 = 91.6\) |
When we sample 10,000 30-year-olds and measure their blood pressures, what shape will the histogram be? What is our mean? Variance?
Answer: The histogram will be bell-shaped, with means and variances closely matching our true parameter values.
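The answer can be checked with a simulation. This sketch draws 10,000 blood pressures at \(X = 30\) from the true model in the table (\(\beta_0 = 90\), \(\beta_1 = 0.9\), \(\sigma^2 = 225\)) using only the standard library. The seed and variable names are arbitrary choices.

```python
import random
import statistics

beta0, beta1, sigma2 = 90.0, 0.9, 225.0  # true parameters from the table
x = 30.0

rng = random.Random(1)  # fixed seed so the run is repeatable
mean_y = beta0 + beta1 * x  # E(Y|X=30)
draws = [mean_y + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(10_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print(f"sample mean={sample_mean:.1f}, sample variance={sample_var:.1f}")
```

The sample mean and variance should fall close to 117 and 225, and a histogram of `draws` would show the bell shape.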
The shape of our normal distribution follows some general rules.
- When errors follow a normal distribution, so will our values. This is what gives us the bell-curved shape.
\[\begin{equation}\epsilon_i \sim N \Rightarrow Y_i \sim N\end{equation}\]
- The center of our curve will be the mean, unless otherwise adjusted. Because we assume \(E(\epsilon_i) = 0\), the mean of \(Y\) at \(X = 30\) is:
\[\begin{equation} \begin{split} E(Y|X=30) &= 90+0.9(30)\\ &=117 \end{split} \end{equation}\]
- Similarly, for the variance: \(\beta_0+\beta_1X\) is constant given \(X\), so \(Y\) inherits the variance of \(\epsilon_i\):
\[\begin{equation} \begin{split} \sigma^2(\epsilon_i) &= 225\\ \sigma^2(Y|X) &= 225 \end{split} \end{equation}\]
With these general rules for shape established, suppose we want to look at the distribution of \(Y_h\) at \(X_h = 30\) on the interval \((a,b)\), where
\[ E(Y_h|X_h = 30) =117 \]
We can state a probability:
\[ P(a < Y_h < b ) = 0.95 \]
To convert to a standard normal distribution (\(Z\sim N(0,1)\)), we make the following adjustments:
1. Center at zero by subtracting \(\mu\)
2. Scale to unit variance by dividing by \(\sigma\)
When
\[ Z = \frac{Y_h-\mu}{\sigma}, \quad Z \sim N (0,1) \]
then
\[ P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma} \right) = 0.95 \]
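As a worked instance of the standardization, the sketch below uses `statistics.NormalDist` from the Python standard library with \(\mu = 117\) and \(\sigma = \sqrt{225} = 15\) from Example 4.2. The notes do not fix \(a\) and \(b\), so choosing the interval to be symmetric about the mean is an assumption.

```python
from statistics import NormalDist

mu, sigma = 117.0, 15.0  # E(Y|X=30) and sqrt(225) from Example 4.2

z = NormalDist()           # standard normal, N(0, 1)
z_crit = z.inv_cdf(0.975)  # upper 2.5% point, about 1.96

# Assumed: a symmetric interval (a, b) around the mean
a = mu - z_crit * sigma
b = mu + z_crit * sigma

# The same probability on the original scale and on the Z scale
y_dist = NormalDist(mu, sigma)
p_y = y_dist.cdf(b) - y_dist.cdf(a)
p_z = z.cdf((b - mu) / sigma) - z.cdf((a - mu) / sigma)

print(f"a={a:.1f}, b={b:.1f}, P(a<Y<b)={p_y:.3f}, P on Z scale={p_z:.3f}")
```

Both probabilities equal 0.95, which shows that standardizing changes the endpoints but not the probability.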