STAT 211 Topic 9
Tuesday, April 12, 2011
Topic 8 covered comparison of 2 populations based on:
- population mean
- population proportion
- population variance
ANOVA
ANalysis Of VAriance: an extension of the pooled t-test to more than two populations.
Suppose we have many samples from many populations. Each population has its own mean… Are all population means equal?
Assumptions:
- All populations have normal distributions
- All samples are independent
- All population variances are equal
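We use the F-test because the test statistic is a ratio of two mean squares (sums of squares divided by their degrees of freedom), and such a ratio follows an F-distribution when H0 is true.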
Procedure
Set up the following hypothesis test:
- H0: $\mu_1 = \mu_2 = \cdots = \mu_I$ (all $I$ population means are equal)
- Ha: H0 is not true. (at least one μ is different)
1. Sums of Squares
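- Total: $SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{\cdot\cdot})^2$
- Treatment: $SSTr = J \sum_{i=1}^{I} (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$
- Error: $SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{i\cdot})^2$
Where:
- $x_{ij}$ represents the data of the j-th sample subject in the i-th population (treatment).
- $j = 1, \dots, J$, where $J$ represents the number of subjects in each sample ($\bar{x}_{i\cdot}$ is the average of each population's sample, and $\bar{x}_{\cdot\cdot}$ is the average of all observations)
- $i = 1, \dots, I$, where $I$ represents the number of treatments or populations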
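The three sums of squares can be computed directly from raw data. The following is a minimal sketch, assuming equal group sizes; the function name `sums_of_squares` and the toy data are made up purely for illustration and do not come from the lecture.

```python
def sums_of_squares(groups):
    """Return (SST, SSTr, SSE) for a list of equal-sized samples."""
    num_treatments = len(groups)          # I
    per_group = len(groups[0])            # J
    if any(len(g) != per_group for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (num_treatments * per_group)
    group_means = [sum(g) / per_group for g in groups]

    # Total: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # Treatment: each group mean against the grand mean, weighted by J
    sstr = per_group * sum((m - grand_mean) ** 2 for m in group_means)
    # Error: every observation against its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return sst, sstr, sse


# Made-up illustration data: 3 treatments, 3 subjects each
toy = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
sst, sstr, sse = sums_of_squares(toy)
print(sst, sstr, sse)
```

For this toy data the total splits into a treatment part and an error part, which is a quick way to check a hand calculation.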
2. Mean Squares
We can show that $SST = SSTr + SSE$. Dividing each sum of squares by its degrees of freedom gives a mean square:
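- Treatment Mean Square
- $MSTr = \frac{SSTr}{df_{Tr}}$, where $df_{Tr}$ is the treatment degrees of freedom: $df_{Tr} = I - 1$
- Error Mean Square
- $MSE = \frac{SSE}{df_E}$, where $df_E$ is the error degrees of freedom: $df_E = I(J - 1)$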
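When H0 is true, the treatment and error mean squares both estimate the common population variance: $E(MSTr) = E(MSE) = \sigma^2$. When H0 is false, $E(MSTr) > \sigma^2$.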
3. Test Statistic
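Think about the ratio:
$f = \frac{MSTr}{MSE}$
(when H0 is true, f follows an F-distribution with numerator and denominator df $I - 1$ and $I(J - 1)$, respectively)
If H0 is true, then f ≈ 1. Values of f much larger than 1 are evidence against H0.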
4. ANOVA Table
Source | DF | SS | MS | f |
---|---|---|---|---|
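Treatment | $I - 1$ | $SSTr$ | $MSTr = SSTr / (I - 1)$ | $MSTr / MSE$ |
Error | $I(J - 1)$ | $SSE$ | $MSE = SSE / (I(J - 1))$ | |
Total | $IJ - 1$ | $SST$ |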
Perform the F-test and reject H0 if $f \ge F_{\alpha, I-1, I(J-1)}$.
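The steps from sums of squares to a decision can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `anova_f_test` is hypothetical, the sums of squares passed in are made-up numbers, and the critical value 5.14 for $F_{0.05, 2, 6}$ is read from a standard F table rather than computed.

```python
def anova_f_test(sstr, sse, num_treatments, per_group, f_crit):
    """Fill in the ANOVA table entries and apply the rejection rule."""
    df_tr = num_treatments - 1                  # I - 1
    df_e = num_treatments * (per_group - 1)     # I(J - 1)
    mstr = sstr / df_tr
    mse = sse / df_e
    f = mstr / mse
    return {
        "df_treatment": df_tr,
        "df_error": df_e,
        "df_total": df_tr + df_e,               # IJ - 1
        "MSTr": mstr,
        "MSE": mse,
        "f": f,
        "reject_H0": f >= f_crit,
    }


# Hypothetical inputs: SSTr = 54, SSE = 6, with I = 3 treatments and J = 3 subjects each.
# f_crit = 5.14 is the tabulated F(0.05; 2, 6) value (an assumption of this sketch).
result = anova_f_test(54.0, 6.0, num_treatments=3, per_group=3, f_crit=5.14)
print(result)
```

Passing the critical value in as an argument keeps the sketch free of external libraries; a statistics package could supply it instead.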
Example
Study effects of diet pills (four different brands):
- Randomly assign 20 women to each of 5 different groups: one for each diet pill and one placebo
- Let women take pills for a month and record weight loss (in pounds):
Group | 1 | 2 | 3 | 4 | 5 (placebo) |
---|---|---|---|---|---|
Average Loss ($\bar{x}_{i\cdot}$) | 14 | 12 | 10 | 8 | 6 |
Standard Deviation ($s_i$) | 1.3 | 1.5 | 0.8 | 1.0 | 1.7 |
To assess the effectiveness of a single brand, we could do a pooled t-test between that group (say group i) and group 5 (placebo):
- H0: $\mu_i = \mu_5$
- Ha: $\mu_i > \mu_5$
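As a rough illustration of such a comparison, the pooled t statistic for group 1 against the placebo can be computed from the summary statistics in the table above. The helper name `pooled_t` is hypothetical, and the sketch stops at the test statistic; looking up the corresponding critical value or P-value is left out.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Group 1 (mean 14, sd 1.3) versus group 5, the placebo (mean 6, sd 1.7), 20 women each
t_stat, t_df = pooled_t(14, 1.3, 20, 6, 1.7, 20)
print(t_stat, t_df)
```

This handles only one pair of groups at a time, which is why a single test covering all five groups is more convenient.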
ANOVA Table (comparing all five groups at once):
Source | DF | SS | MS | f |
---|---|---|---|---|
Treatment | 4 | 800 | 200 | 118.0638 |
Error | 95 | 160.93 | 1.694 | |
Total | 99 | 960.93 |
The critical values are $F_{0.05, 4, 95} = 2.47$ and $F_{0.01, 4, 95} = 3.52$. Since f = 118.0638 far exceeds both, the P-value is almost 0 and we reject H0. Therefore, we find enough evidence to conclude that not all of the means are equal.
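The table entries can be reproduced from the group means and standard deviations alone. The sketch below assumes equal group sizes (so the grand mean is the plain average of the group means) and uses the fact that each group's sum of squared deviations is $(n - 1)s_i^2$; the variable names are chosen here for illustration.

```python
n = 20                                   # women per group
means = [14, 12, 10, 8, 6]               # average loss per group
sds = [1.3, 1.5, 0.8, 1.0, 1.7]          # standard deviation per group
num_groups = len(means)

# With equal group sizes the grand mean is the average of the group means
grand_mean = sum(means) / num_groups

sstr = n * sum((m - grand_mean) ** 2 for m in means)
# Each group's within-group sum of squares equals (n - 1) * s_i^2
sse = (n - 1) * sum(s ** 2 for s in sds)

df_tr = num_groups - 1
df_e = num_groups * (n - 1)
mstr = sstr / df_tr
mse = sse / df_e
f = mstr / mse

print(df_tr, sstr, mstr, f)
print(df_e, sse, mse)
print(df_tr + df_e, sstr + sse)
```

Comparing `f` with the tabulated critical values then gives the same decision as the table.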
Multiple Comparison
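ANOVA only tells us whether at least one population mean differs from the others, not which ones. To find out which ones are different, we could perform a t-test on every pair, but each test carries its own chance of a Type I error, so the overall significance level would end up well above the α we chose.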
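To get a feel for how quickly the error rate grows, the sketch below counts the pairwise comparisons among the groups and computes the chance of at least one false rejection. It assumes the tests are independent, which pairwise t-tests sharing the same groups are not, so the result is only a rough indication; the function name `familywise_error` is made up for this example.

```python
from math import comb


def familywise_error(alpha, num_groups):
    """Number of pairwise tests and the chance of at least one Type I error.

    Assumes independent tests, which is a simplification.
    """
    num_tests = comb(num_groups, 2)
    return num_tests, 1 - (1 - alpha) ** num_tests


# Five groups tested pairwise at alpha = 0.05
num_tests, fwer = familywise_error(0.05, 5)
print(num_tests, fwer)
```

With five groups there are ten pairs, so the overall error rate is far larger than the per-test level.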
Tukey's Procedure
Only works if all treatments contain the same number of observations.
- Select α and find $Q_{\alpha, I, I(J-1)}$ (Q-distribution, i.e. the Studentized range distribution)
- Determine $w = Q_{\alpha, I, I(J-1)} \sqrt{MSE / J}$, where $J$ is the number of observations per treatment.
- List sample means in increasing order. Underline pairs that differ by less than $w$.
- Any pairs not underlined by the same line are significantly different
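The procedure can be sketched as follows, using the diet pill group means and the MSE of 1.694 from the example. The critical value is passed in as a parameter; the value 3.93 used below is only an approximation of $Q_{0.05, 5, 95}$ interpolated from a Studentized range table and should be checked before being relied on. The function name `tukey_pairs` is hypothetical.

```python
import math
from itertools import combinations


def tukey_pairs(means, mse, per_group, q_crit):
    """Return w and, for each pair of treatments, whether they differ significantly."""
    w = q_crit * math.sqrt(mse / per_group)
    # List the sample means in increasing order
    ordered = sorted(means.items(), key=lambda item: item[1])
    results = []
    for (name_a, mean_a), (name_b, mean_b) in combinations(ordered, 2):
        # Pairs differing by less than w would share an underline (not significant)
        significant = abs(mean_a - mean_b) >= w
        results.append((name_a, name_b, significant))
    return w, results


group_means = {"1": 14, "2": 12, "3": 10, "4": 8, "5 (placebo)": 6}
# q_crit = 3.93 is an approximate table value (assumption), not taken from the notes
w, pairs = tukey_pairs(group_means, mse=1.694, per_group=20, q_crit=3.93)
print(w)
for name_a, name_b, significant in pairs:
    print(name_a, name_b, significant)
```

Under that assumed critical value, adjacent means in the example differ by 2, which is larger than `w`, so no pair would be underlined.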