STAT 211 Topic 9

Lecture 18 Notes

Tuesday, April 12, 2011

Topic 8 covered comparison of 2 populations based on:

population mean
population proportion
population variance

ANOVA

ANalysis Of VAriance: an extension of the pooled t-test to more than two populations.

Suppose we have samples from several populations, each with its own mean. Are all the population means equal?

Assumptions:

All populations have normal distributions
All samples are independent
All population variances are equal

We use an F-test built from sums of squares. With only two populations it reduces to the pooled t-test, since $t^{2}=F$.

Procedure

Set up the following hypothesis test:

H₀: $\mu _{1}=\mu _{2}=\dots =\mu _{I}$
H_a: H₀ is not true (at least one $\mu _{i}$ differs from the others)

1. Sums of Squares

Total: $SS_{tot}=\sum _{i,j}(x_{ij}-{\bar {x}})^{2}$
Treatment: $SS_{trt}=\sum _{i,j}({\bar {x}}_{i}-{\bar {x}})^{2}=J\sum _{i}({\bar {x}}_{i}-{\bar {x}})^{2}$
Error: $SS_{err}=\sum _{i,j}(x_{ij}-{\bar {x}}_{i})^{2}$

Where:

$x_{ij}$ is the observation for the j-th subject in the sample from the i-th population (treatment).
${\bar {x}}_{i}={\tfrac {1}{J}}\textstyle \sum _{j=1}^{J}x_{ij}$ is the mean of the i-th sample, where $J$ is the number of subjects in each sample.
${\bar {x}}={\tfrac {1}{IJ}}\textstyle \sum _{i,j}x_{ij}$ is the grand mean of all observations, where $I$ is the number of treatments (populations).
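The three sums of squares can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function name `sums_of_squares` and the data are made up for illustration, and it assumes every sample has the same size $J$.

```python
def sums_of_squares(groups):
    """Return (ss_tot, ss_trt, ss_err) for equal-sized samples.

    groups: list of I lists, each holding J observations.
    """
    n_groups = len(groups)          # I
    n_per_group = len(groups[0])    # J
    if any(len(g) != n_per_group for g in groups):
        raise ValueError("every sample must have the same size J")

    group_means = [sum(g) / n_per_group for g in groups]
    grand_mean = sum(sum(g) for g in groups) / (n_groups * n_per_group)

    ss_tot = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # The sum over j repeats the same term J times, hence the factor J.
    ss_trt = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ss_err = sum((x - m) ** 2
                 for g, m in zip(groups, group_means) for x in g)
    return ss_tot, ss_trt, ss_err


# Made-up data: I = 3 treatments, J = 4 subjects each.
data = [[4.0, 5.0, 6.0, 5.0],
        [7.0, 8.0, 6.0, 7.0],
        [9.0, 10.0, 11.0, 10.0]]
ss_tot, ss_trt, ss_err = sums_of_squares(data)
print(ss_tot, ss_trt, ss_err)
```

For this made-up data the error sum of squares is 6 and the total equals treatment plus error, up to floating-point rounding.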

2. Mean Squares

It can be shown that $SS_{tot}=SS_{trt}+SS_{err}$. Dividing each sum of squares by its degrees of freedom gives the mean squares:

Treatment Mean Square: $MS_{trt}={\frac {SS_{trt}}{DF_{trt}}}$ , where $DF_{trt}$ is the treatment degrees of freedom: $I-1$
Error Mean Square: $MS_{err}={\frac {SS_{err}}{DF_{err}}}$ , where $DF_{err}$ is the error degrees of freedom: $I(J-1)$

Therefore, $DF_{tot}=DF_{trt}+DF_{err}=IJ-1$
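One way to see the decomposition of the total sum of squares is to add and subtract the sample mean ${\bar {x}}_{i}$ inside each squared deviation. The sketch below uses only the symbols already defined in these notes:

```latex
\begin{aligned}
SS_{tot} &= \sum_{i,j}\bigl[(x_{ij}-\bar{x}_{i})+(\bar{x}_{i}-\bar{x})\bigr]^{2} \\
         &= \sum_{i,j}(x_{ij}-\bar{x}_{i})^{2}
          + \sum_{i,j}(\bar{x}_{i}-\bar{x})^{2}
          + 2\sum_{i}(\bar{x}_{i}-\bar{x})\sum_{j}(x_{ij}-\bar{x}_{i}) \\
         &= SS_{err} + SS_{trt} + 0
\end{aligned}
```

The cross term vanishes because $\sum _{j}(x_{ij}-{\bar {x}}_{i})=0$ for every $i$. The degrees of freedom add up the same way: $(I-1)+I(J-1)=IJ-1$.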

3. Test Statistic

Think about the ratio:

$f={\frac {MS_{trt}}{MS_{err}}}\sim F_{DF_{trt},\ DF_{err}}$

(f follows an F-distribution with numerator and denominator degrees of freedom $DF_{trt}$ and $DF_{err}$, respectively)

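If H₀ is true, then both mean squares estimate the common population variance, so f ≈ 1. A value of f much larger than 1 is evidence against H₀.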

4. ANOVA Table

Source	DF	SS	MS	f
Treatment	$I-1$	$SS_{trt}$	$MS_{trt}={\frac {SS_{trt}}{I-1}}$	$f={\frac {MS_{trt}}{MS_{err}}}$
Error	$I(J-1)$	$SS_{err}$	$MS_{err}={\frac {SS_{err}}{I(J-1)}}$
Total	$IJ-1$	$SS_{tot}$

Perform the F-test: reject H₀ at significance level α if $f>F_{\alpha ,DF_{trt},DF_{err}}$
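The table and the decision rule can be sketched in a few lines of Python. The helper names `anova_table` and `reject_h0`, and the input numbers, are made up for illustration. The critical value is passed in by the caller because it has to be read from an F table or computed by statistical software; the value 4.26 used here is the tabulated $F_{0.05,2,9}$.

```python
def anova_table(ss_trt, ss_err, n_groups, n_per_group):
    """Fill in the ANOVA table from the two sums of squares.

    n_groups is I and n_per_group is J (equal sample sizes assumed).
    """
    df_trt = n_groups - 1
    df_err = n_groups * (n_per_group - 1)
    ms_trt = ss_trt / df_trt
    ms_err = ss_err / df_err
    return {
        "df_trt": df_trt,
        "df_err": df_err,
        "df_tot": df_trt + df_err,      # equals I*J - 1
        "ss_tot": ss_trt + ss_err,
        "ms_trt": ms_trt,
        "ms_err": ms_err,
        "f": ms_trt / ms_err,
    }


def reject_h0(f, f_crit):
    """Decision rule: reject H0 when f exceeds the critical value."""
    return f > f_crit


# Made-up inputs: I = 3 treatments, J = 4 subjects each.
table = anova_table(ss_trt=50.0, ss_err=9.0, n_groups=3, n_per_group=4)
decision = reject_h0(table["f"], f_crit=4.26)  # F_{0.05, 2, 9} from a table
print(table["f"], decision)
```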

Example

Study effects of diet pills (four different brands):

Randomly assign 20 women to each of 5 different groups: one for each diet pill and one placebo
Let women take pills for a month and record weight loss (in pounds):

Group	1	2	3	4	5 (placebo)
Average Loss ( ${\bar {x}}$ )	14	12	10	8	6
Standard Deviation ( $s$ )	1.3	1.5	0.8	1.0	1.7

To assess the effectiveness of a single brand, we could do a pooled t-test between that group and group 5 (placebo). For brand 1, for example:

H₀: $\mu _{1}=\mu _{5}$
H_a: $\mu _{1}>\mu _{5}$
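As an illustration of such a single comparison, the sketch below computes the pooled t statistic for group 1 against the placebo group from the summary statistics in the table above. The function name `pooled_t` is made up for this example, and the standard pooled-variance formula is assumed.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Group 1 (mean 14, s = 1.3) versus group 5, the placebo (mean 6, s = 1.7),
# with 20 women in each group.
t_stat, df = pooled_t(14, 1.3, 20, 6, 1.7, 20)
print(round(t_stat, 2), df)
```

The statistic comes out near 16.7 on 38 degrees of freedom, far in the upper tail, so brand 1 on its own clearly outperforms the placebo.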

To compare all five groups at once, we use ANOVA instead. ANOVA table:

Source	DF	SS	MS	f
Treatment	4	800	200	118.0638
Error	95	160.93	1.694
Total	99	960.93

With $F_{0.05,4,95}=2.47$ and $F_{0.01,4,95}=3.52$, the observed $f=118.0638$ far exceeds both critical values, so the P-value is almost 0. Therefore, we find enough evidence to conclude that not all of the means are equal.
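The entries of this ANOVA table can be reproduced from the group means and standard deviations alone, because all groups have the same size. The sketch below is one way to do the arithmetic; the variable names are chosen for this example, and it relies on $SS_{trt}=J\sum _{i}({\bar {x}}_{i}-{\bar {x}})^{2}$ and $SS_{err}=(J-1)\sum _{i}s_{i}^{2}$.

```python
means = [14, 12, 10, 8, 6]          # average loss per group
sds = [1.3, 1.5, 0.8, 1.0, 1.7]     # sample standard deviation per group
n_per_group = 20                    # J
n_groups = len(means)               # I

# With equal group sizes the grand mean is the mean of the group means.
grand_mean = sum(means) / n_groups

ss_trt = n_per_group * sum((m - grand_mean) ** 2 for m in means)
# Each s_i^2 is the within-group sum of squares divided by (J - 1).
ss_err = (n_per_group - 1) * sum(s ** 2 for s in sds)

df_trt = n_groups - 1
df_err = n_groups * (n_per_group - 1)
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err
f = ms_trt / ms_err

print(df_trt, round(ss_trt, 2), round(ms_trt, 3), round(f, 4))
print(df_err, round(ss_err, 2), round(ms_err, 3))
print(df_trt + df_err, round(ss_trt + ss_err, 2))
```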

Multiple Comparison

ANOVA only tells us whether at least one population mean differs from the others. To find out which ones differ, we could perform multiple t-tests, but that would inflate the overall probability of a Type I error beyond the chosen significance level.

Tukey's Procedure

Only works if all treatments contain the same number of observations.

Select α and find $Q_{\alpha ,\,I,\,DF_{err}}$, the critical value of the studentized range distribution (Q-distribution).
Determine $w=Q_{\alpha ,I,\,DF_{err}}\cdot {\sqrt {\frac {MS_{err}}{J}}}$ , where $J$ is the number of observations per treatment.
List the sample means in increasing order. Underline every pair that differs by less than $w$ .
Any pair not joined by a common underline is significantly different.
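The steps above can be sketched in Python for the diet pill example. Comparing every pair of means against $w$ gives the same result as the underlining method. The function names are made up for this example, and `q_crit = 3.94` is an assumed, approximate table value for $Q_{0.05,5,95}$; look up the exact value before relying on it. The conclusion here does not depend on it, since the closest means are 2 pounds apart.

```python
import math
from itertools import combinations


def tukey_w(q_crit, ms_err, n_per_group):
    """Yardstick w = Q * sqrt(MS_err / J)."""
    return q_crit * math.sqrt(ms_err / n_per_group)


def significant_pairs(means, w):
    """Pairs of group labels whose sample means differ by at least w.

    means: dict mapping group label to sample mean.
    """
    return [(a, b) for a, b in combinations(sorted(means), 2)
            if abs(means[a] - means[b]) >= w]


group_means = {1: 14, 2: 12, 3: 10, 4: 8, 5: 6}
ms_err = 160.93 / 95    # from the ANOVA table of the example
q_crit = 3.94           # ASSUMED approximate value of Q_{0.05, 5, 95}

w = tukey_w(q_crit, ms_err, n_per_group=20)
pairs = significant_pairs(group_means, w)
print(round(w, 2), pairs)
```

Under this assumed critical value, $w$ is a little over 1 pound, so no two means can be underlined together and all ten pairs of groups differ significantly.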