STAT 211 Topic 9

Lecture 18 Notes

Tuesday, April 12, 2011


Topic 8 covered comparison of 2 populations based on:

  • population mean
  • population proportion
  • population variance


ANOVA

ANalysis Of VAriance: an extension of the pooled t-test to more than two populations.

Suppose we have many samples from many populations. Each population has its own mean… Are all population means equal?

Assumptions:

  1. All populations have normal distributions
  2. All samples are independent
  3. All population variances are equal

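We use the F-test because the test statistic is a ratio of two mean squares (sums of squares divided by their degrees of freedom); under H0 and the assumptions above, that ratio follows an F-distribution.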


Procedure

Set up the following hypothesis test:

  • H0: $\mu_1 = \mu_2 = \cdots = \mu_I$ (all $I$ population means are equal)
  • Ha: H0 is not true. (at least one μ is different)


1. Sums of Squares

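Total
    $SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{\cdot\cdot})^2$
Treatment
    $SSTr = J \sum_{i=1}^{I} (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$
Error
    $SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{i\cdot})^2$

Where:

  • $x_{ij}$ represents the data of the j-th sample subject in the i-th population (treatment).
  • $j = 1, \ldots, J$, where $J$ represents the number of subjects in each sample ($\bar{x}_{i\cdot}$ is the average of the i-th population's sample, and $\bar{x}_{\cdot\cdot}$ is the average of all observations)
  • $i = 1, \ldots, I$, where $I$ represents the number of treatments or populations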
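
As a rough illustration, the total, treatment, and error sums of squares can be computed directly from raw data. The sketch below is plain Python; the function name `sums_of_squares` and the data values are made up for illustration and do not come from the lecture. It assumes every treatment has the same number of observations.

```python
def sums_of_squares(groups):
    """Return (SST, SSTr, SSE) for a list of equal-sized samples."""
    I = len(groups)        # number of treatments (populations)
    J = len(groups[0])     # observations per treatment
    assert all(len(g) == J for g in groups), "equal sample sizes assumed"

    group_means = [sum(g) / J for g in groups]
    grand_mean = sum(sum(g) for g in groups) / (I * J)

    # Total: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # Treatment: each group mean against the grand mean, weighted by J
    sstr = J * sum((m - grand_mean) ** 2 for m in group_means)
    # Error: every observation against its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return sst, sstr, sse


# Hypothetical data: I = 3 treatments, J = 4 observations each
data = [
    [5.0, 6.0, 7.0, 6.0],
    [8.0, 9.0, 7.0, 8.0],
    [4.0, 5.0, 3.0, 4.0],
]
sst, sstr, sse = sums_of_squares(data)
print("SST =", sst, "SSTr =", sstr, "SSE =", sse)
```

For this made-up data the treatment and error parts add up to the total, which is a quick sanity check on any hand computation.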


2. Mean Squares

We can show that the total sum of squares splits into a treatment part and an error part: $SST = SSTr + SSE$

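Treatment Mean Square
    $MSTr = \frac{SSTr}{df_{Tr}}$, where $df_{Tr}$ is the treatment degrees of freedom: $df_{Tr} = I - 1$ ($I$ = number of treatments)
Error Mean Square
    $MSE = \frac{SSE}{df_{E}}$, where $df_{E}$ is the error degrees of freedom: $df_{E} = I(J - 1)$ ($J$ = number of observations per treatment)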

Therefore, the degrees of freedom add up to the total: $(I - 1) + I(J - 1) = IJ - 1$


3. Test Statistic

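Think about the ratio of the two mean squares:

$f = \frac{MSTr}{MSE}$

(f follows an F-distribution with numerator and denominator df $I - 1$ and $I(J - 1)$, respectively, where $I$ is the number of treatments and $J$ the number of observations per treatment)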

If H0 is true, then f ≈ 1.
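
A minimal sketch of the mean squares and the ratio f, assuming equal sample sizes and the usual one-way ANOVA degrees of freedom (I − 1 for treatments and I(J − 1) for error, with I treatments and J observations per treatment). The function name `f_statistic` and the input sums of squares are hypothetical, chosen only to show the arithmetic.

```python
def f_statistic(sstr, sse, I, J):
    """Return (f, df_treatment, df_error) from the two sums of squares."""
    df_tr = I - 1           # treatment degrees of freedom
    df_e = I * (J - 1)      # error degrees of freedom
    mstr = sstr / df_tr     # treatment mean square
    mse = sse / df_e        # error mean square
    return mstr / mse, df_tr, df_e


# Hypothetical sums of squares for I = 3 treatments, J = 4 observations each
f, df_tr, df_e = f_statistic(sstr=32.0, sse=6.0, I=3, J=4)
print("f =", f, "with df", df_tr, "and", df_e)
```

A value of f far above 1, as here, suggests the variation between treatment means is larger than the variation within treatments can explain.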

4. ANOVA Table

Source      DF         SS      MS                       f
Treatment   I - 1      SSTr    MSTr = SSTr / (I - 1)    f = MSTr / MSE
Error       I(J - 1)   SSE     MSE = SSE / (I(J - 1))
Total       IJ - 1     SST

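Perform the F-test and reject H0 if $f \ge F_{\alpha,\, I-1,\, I(J-1)}$, the upper-α critical value of the F-distribution with $I - 1$ and $I(J - 1)$ degrees of freedom.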


Example

Study effects of diet pills (four different brands):

  1. Randomly assign 20 women to each of 5 different groups: one for each diet pill and one placebo
  2. Let women take pills for a month and record weight loss (in pounds):
Group                                1      2      3      4      5 (placebo)
Average Loss ($\bar{x}_{i\cdot}$)    14     12     10     8      6
Standard Deviation ($s_i$)           1.3    1.5    0.8    1.0    1.7

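If we only wanted to see the effectiveness of one brand, we could do a pooled t-test between that group (group i) and group 5 (placebo):

  • H0: $\mu_i = \mu_5$
  • Ha: $\mu_i > \mu_5$ (brand i gives a greater mean weight loss than the placebo)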

ANOVA Table (comparing all five groups at once):

Source      DF    SS        MS       f
Treatment   4     800       200      118.0638
Error       95    160.93    1.694
Total       99    960.93

The critical values are F0.05, 4, 95 = 2.47 and F0.01, 4, 95 = 3.52. Since f = 118.0638 is far larger than both, the P-value is below 0.01 (practically 0), so we reject H0. We find enough evidence to conclude that not all of the means are equal.
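
The ANOVA table above can be reproduced from the summary statistics alone. The sketch below is one way to do the arithmetic in plain Python, not necessarily how it was done in lecture. It relies on two standard facts that hold for equal sample sizes: the grand mean is the average of the group means, and SSE equals (J − 1) times the sum of the sample variances. The variable names are illustrative.

```python
means = [14, 12, 10, 8, 6]         # average weight loss per group
sds = [1.3, 1.5, 0.8, 1.0, 1.7]    # sample standard deviation per group
J = 20                             # women per group
I = len(means)                     # number of groups (5)

grand_mean = sum(means) / I        # valid because all groups have size J

sstr = J * sum((m - grand_mean) ** 2 for m in means)
sse = (J - 1) * sum(s ** 2 for s in sds)
sst = sstr + sse

df_tr, df_e = I - 1, I * (J - 1)
mstr = sstr / df_tr
mse = sse / df_e
f = mstr / mse

print(f"Treatment  df={df_tr}  SS={sstr:.2f}  MS={mstr:.3f}  f={f:.4f}")
print(f"Error      df={df_e}  SS={sse:.2f}  MS={mse:.3f}")
print(f"Total      df={df_tr + df_e}  SS={sst:.2f}")
```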


Multiple Comparison

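ANOVA only tells us whether at least one population mean differs from the others; it does not tell us which ones. To find out which means are different, we could perform a t-test on every pair, but each extra test adds another chance of a false rejection, so the overall significance level would end up much larger than the α used for each individual test.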
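
To get a feel for how much repeated t-tests inflate the error rate, the sketch below computes the chance of at least one false rejection across all pairwise tests. It assumes, as a simplification, that the tests are independent (pairwise tests that share groups are not, so the number is only a rough guide). The group count of 5 matches the diet pill example; the variable names are illustrative.

```python
from math import comb

alpha = 0.05                # significance level of each individual t-test
I = 5                       # number of groups being compared
n_tests = comb(I, 2)        # number of distinct pairs: 10 for 5 groups

# Probability of at least one Type I error if the tests were independent
overall_error = 1 - (1 - alpha) ** n_tests

print(n_tests, round(overall_error, 4))   # overall_error is about 0.40
```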

Tukey's Procedure

This version of the procedure only works if all treatments contain the same number of observations.

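  1. Select α and find $Q_{\alpha,\, I,\, I(J-1)}$, the critical value of the Q-distribution (studentized range distribution) for $I$ treatments and $I(J-1)$ error degrees of freedom
  2. Determine $w = Q_{\alpha,\, I,\, I(J-1)} \sqrt{MSE / J}$, where $J$ is the number of observations per treatment.
  3. List sample means in increasing order. Underline pairs that differ by less than $w$.
  4. Any pairs not underlined by the same line are the ones that are significantly different.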
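
The steps above can be sketched in plain Python, applied here to the diet pill example (MSE = 1.694, 20 observations per group). The function name `tukey_pairs` is hypothetical, and the critical value is passed in as a parameter because it has to be looked up in a studentized range table. The value 3.94 used below is only an approximate, interpolated placeholder for α = 0.05 with 5 treatments and 95 error degrees of freedom, not a figure from the lecture.

```python
from itertools import combinations
from math import sqrt


def tukey_pairs(means, mse, J, q_crit):
    """Return (w, pairs), where pairs lists the group labels whose
    sample means differ by at least w (significantly different)."""
    w = q_crit * sqrt(mse / J)
    # Label groups 1..I and sort by sample mean, as in step 3
    ordered = sorted(enumerate(means, start=1), key=lambda t: t[1])
    different = [
        (a, b)
        for (a, mean_a), (b, mean_b) in combinations(ordered, 2)
        if abs(mean_a - mean_b) >= w   # not underlined together
    ]
    return w, different


# q_crit = 3.94 is an assumed table lookup; replace it with the exact value
w, different = tukey_pairs(
    means=[14, 12, 10, 8, 6], mse=1.694, J=20, q_crit=3.94
)
print(round(w, 3), different)
```

With these inputs w comes out a little above 1, while adjacent sample means differ by 2 pounds, so no two means would share an underline and every pair of groups is flagged as different.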