Thursday, April 14, 2011
Simple Linear Regression
Predicting linear relationships between two variables \(x\) and \(Y\):

\[ Y = \beta_0 + \beta_1 x + \varepsilon, \]

where the error term is a random variable \(\varepsilon \sim N(0, \sigma^2)\).
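To make the model concrete, here is a minimal Python sketch that simulates data from it. The parameter values \(\beta_0 = 1\), \(\beta_1 = 2\), \(\sigma = 0.5\) are made up for illustration.

```python
import numpy as np

# Illustrative (made-up) parameter values.
beta0, beta1, sigma = 1.0, 2.0, 0.5

rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 20)             # fixed, known x values
eps = rng.normal(0.0, sigma, size=x.size)  # error rvs: eps ~ N(0, sigma^2)
y = beta0 + beta1 * x + eps                # observed responses
```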
Goal: Find the line such that the sum of squared distances (squaring keeps all distance values positive) between the predicted and observed values is minimized.
What the line tells us:
- Estimates/Predicts values (interpolation within range; extrapolation outside of range)
- Describes the rate of change between the two variables (slope)

Regression Model Equation
- \(\beta_0\) and \(\beta_1\) are unknown fixed parameters; \(x\) is a fixed, known value.
- \(\varepsilon\) is the error variable: \(\varepsilon \sim N(0, \sigma^2)\), where \(\sigma^2\) is unknown.
- Therefore \(Y \sim N(\beta_0 + \beta_1 x,\ \sigma^2)\).
We have observations of \(Y\) (sample data), but we need estimates for \(\beta_0\) and \(\beta_1\).
Least Squares
Given points \((x_1, y_1), \dots, (x_n, y_n)\), our regression model is:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Squared distance between an observed point and the line is given by \(\big(y_i - (\beta_0 + \beta_1 x_i)\big)^2\). Therefore the sum of squared differences is:

\[ f(\beta_0, \beta_1) = \sum_{i=1}^{n} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2 \]
Find \(\hat{\beta}_0\) and \(\hat{\beta}_1\) such that \(f(\beta_0, \beta_1)\) is minimized. (Take partial derivatives, equate to zero, and solve.)
Estimated Solutions: (both have a normal distribution regardless of sample size)

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
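A minimal Python sketch of these estimators, using a small made-up dataset:

```python
import numpy as np

# Made-up sample points (x_i, y_i), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

x_bar, y_bar = x.mean(), y.mean()
Sxy = np.sum((x - x_bar) * (y - y_bar))  # S_xy
Sxx = np.sum((x - x_bar) ** 2)           # S_xx

beta1_hat = Sxy / Sxx                    # slope estimate
beta0_hat = y_bar - beta1_hat * x_bar    # intercept estimate
print(beta0_hat, beta1_hat)
```

For comparison, `np.polyfit(x, y, 1)` computes the same least-squares coefficients (slope first).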
Therefore, the final equation becomes

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
The fitted (predicted) value of \(y_i\) is \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).

Residuals (observed − fitted; \(e_i = y_i - \hat{y}_i\)) can be plotted to check how well the line fits.
All computer stat systems will return a coefficient of determination \(r^2 = 1 - \frac{SSE}{SST}\) (SSE and SST appear in the ANOVA table below). This value is always between 0 and 1, and \(r^2\) close to 1 implies a better-fitting line.
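Continuing with the same made-up data, a sketch that computes fitted values, residuals, and \(r^2\):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

# Least-squares fit (estimators derived above).
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x   # fitted (predicted) values
resid = y - y_hat                   # residuals: observed - fitted

sse = np.sum(resid ** 2)            # error sum of squares
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
r2 = 1.0 - sse / sst                # coefficient of determination
print(r2)
```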
Tests for β1
Since we don't know \(\sigma\), we can substitute the estimate \(s\) (see below); then the standardized statistic

\[ T = \frac{\hat{\beta}_1 - \beta_1}{s / \sqrt{S_{xx}}} \]

has a t distribution with \(n - 2\) degrees of freedom.
Therefore, we can construct the confidence interval \(\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot \frac{s}{\sqrt{S_{xx}}}\) and run a t-test for \(H_0\colon \beta_1 = \beta_{10}\) using the test statistic

\[ t = \frac{\hat{\beta}_1 - \beta_{10}}{s / \sqrt{S_{xx}}} \]
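A sketch of the interval and test on the same made-up data, assuming \(H_0\colon \beta_1 = 0\) and \(\alpha = 0.05\) for the example; \(s\) is computed as in the next section, and `scipy` supplies the t quantile:

```python
import numpy as np
from scipy.stats import t

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = x.size

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

sse = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
s = np.sqrt(sse / (n - 2))                           # estimate of sigma (next section)
se_beta1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))  # standard error of beta1_hat

alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 2)              # t critical value
ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)

beta10 = 0.0                                         # H0: beta1 = 0 (no linear trend)
t_stat = (beta1_hat - beta10) / se_beta1
p_value = 2 * t.sf(abs(t_stat), df=n - 2)            # two-sided p-value
print(ci, t_stat, p_value)
```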
Estimating σ²
The minimum value of \(f(\beta_0, \beta_1)\) is the Error Sum of Squares:

\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

To obtain an estimate for \(\sigma^2\), we divide SSE by the remaining degrees of freedom (after estimating \(\beta_0\) and \(\beta_1\)), namely \(n - 2\):

\[ s^2 = \hat{\sigma}^2 = \frac{SSE}{n - 2} \]
ANOVA for Regression
\[ SST = SSR + SSE \]
| Source     | Sum of Squares | Degrees of Freedom | Mean Square | f statistic |
|------------|----------------|--------------------|-------------|-------------|
| Regression | SSR            | \(1\)              | MSR         | MSR / MSE   |
| Error      | SSE            | \(n - 2\)          | MSE         |             |
| Total      | SST            | \(n - 1\)          |             |             |
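A sketch that fills in this table for the same made-up data, obtaining SSR via the identity above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = x.size

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum((y - y_hat) ** 2)      # error sum of squares
ssr = sst - sse                     # regression sum of squares

msr = ssr / 1                       # mean square, 1 df for regression
mse = sse / (n - 2)                 # mean square error
f_stat = msr / mse                  # f statistic for H0: beta1 = 0
print(f_stat)
```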
Predicting with Regression
Suppose we have a new value \(x^*\) at which we want to estimate the response.

Two possible ways to calculate this:

- mean response (consider only the value of the regression line): \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x^*\)
- prediction (include the error variable \(\varepsilon\)): \(Y = \hat{\beta}_0 + \hat{\beta}_1 x^* + \varepsilon\)
Basically, we plug a new \(x^*\) value into the model equation to obtain a \(\hat{y}\) estimate.
Mean Response
We want to narrow the interval and reduce variance as much as possible; since we are estimating only the mean value of \(Y\) at \(x^*\), we disregard the variance of \(\varepsilon\).
Calculate a 100(1 − α)% confidence interval for the mean response with (see the sketch after the Prediction section):

\[ \hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \]
Prediction
Because the regression model includes the \(\varepsilon\) term, a single new observation has extra variability, so the prediction interval is wider:

\[ \hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \]
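A sketch computing both 95% intervals at an illustrative new value \(x^* = 3.5\); note the extra `1 +` under the square root for the prediction interval:

```python
import numpy as np
from scipy.stats import t

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = x.size

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
sse = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
s = np.sqrt(sse / (n - 2))
Sxx = np.sum((x - x.mean()) ** 2)

x_star = 3.5                             # illustrative new x value
y_star = beta0_hat + beta1_hat * x_star  # point estimate
t_crit = t.ppf(0.975, df=n - 2)          # for a 95% interval

# Confidence interval for the mean response at x*.
half_ci = t_crit * s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / Sxx)
# Prediction interval for a single new observation at x* (extra "1 +").
half_pi = t_crit * s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / Sxx)

print((y_star - half_ci, y_star + half_ci))
print((y_star - half_pi, y_star + half_pi))
```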
Correlation Coefficient
Shows how strongly two random variables are related.
Note: if \(X\) and \(Y\) are independent, then the correlation is 0 (but zero correlation does not imply independence).

\[ \rho = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} \]
where the covariance is

\[ \operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E[XY] - \mu_X \mu_Y \]
For a population where we do not know the joint pdf \(f(x, y)\), we can compute the sample correlation coefficient using

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}}\, \sqrt{S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
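A sketch of the sample correlation coefficient on the same made-up data; `np.corrcoef` returns the same value:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)

r = Sxy / np.sqrt(Sxx * Syy)         # sample correlation coefficient
print(r, np.corrcoef(x, y)[0, 1])    # both give the same value
```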
