STAT 211 Topic 8

Lecture 16 Notes

Tuesday, April 5, 2011

Comparing 2 Samples

In Topic 7, we built confidence intervals and hypothesis tests for a single sample drawn from one population. Now we compare two samples with each other.

The two samples ( $X_{1..m}$ and $Y_{1..n}$ ) must be independent and come from separate populations.

We use the difference in sample means ${\bar {X}}-{\bar {Y}}$ as an estimate of the difference of population means $\mu _{X}-\mu _{Y}$ and compare it with the proposed difference $\Delta _{0}$ :

{\begin{aligned}E({\bar {X}}-{\bar {Y}})&=E({\bar {X}})-E({\bar {Y}})=\mu _{X}-\mu _{Y}\\V({\bar {X}}-{\bar {Y}})&=V({\bar {X}})+V({\bar {Y}})={\frac {\sigma _{X}^{2}}{m}}+{\frac {\sigma _{Y}^{2}}{n}}\end{aligned}}
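
The variances add even though the means are subtracted. A short derivation, assuming only the standard rule $V(aU+bW)=a^{2}V(U)+b^{2}V(W)$ for independent $U$ and $W$ (this is where the independence of the two samples is used):

$V({\bar {X}}-{\bar {Y}})=(1)^{2}V({\bar {X}})+(-1)^{2}V({\bar {Y}})={\frac {\sigma _{X}^{2}}{m}}+{\frac {\sigma _{Y}^{2}}{n}}$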

Cases 1&2: Normal Population or Large Sample

When both populations are Normal, ${\bar {X}}-{\bar {Y}}$ is exactly Normal; when the sample sizes are large, it is approximately Normal by the Central Limit Theorem.

Use the same methods as in STAT 211 Topic 7, substituting $\mu =\mu _{X}-\mu _{Y}$ , $\mu _{0}=\Delta _{0}$ and $\sigma ^{2}={\tfrac {\sigma _{X}^{2}}{m}}+{\tfrac {\sigma _{Y}^{2}}{n}}$ .

{\begin{aligned}{\bar {X}}-{\bar {Y}}&\sim \mathrm {Normal} \left(\mu _{X}-\mu _{Y},\ {\frac {\sigma _{X}^{2}}{m}}+{\frac {\sigma _{Y}^{2}}{n}}\right)\\z&={\frac {{\bar {X}}-{\bar {Y}}-\Delta _{0}}{\sqrt {{\frac {\sigma _{X}^{2}}{m}}+{\frac {\sigma _{Y}^{2}}{n}}}}}\end{aligned}}
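
The z statistic above can be sketched in Python using only the standard library. The function name `two_sample_z` and the summary statistics are hypothetical, chosen only to make the arithmetic easy to follow; the p-value shown is for the upper-tailed alternative $\mu _{X}-\mu _{Y}>\Delta _{0}$.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, var_x, var_y, m, n, delta0=0.0):
    """Return (z, upper-tail p-value) for H0: mu_X - mu_Y = delta0."""
    se = sqrt(var_x / m + var_y / n)       # standard error of Xbar - Ybar
    z = (xbar - ybar - delta0) / se
    p_upper = 1.0 - NormalDist().cdf(z)    # P(Z > z) under H0
    return z, p_upper


# Hypothetical summary statistics (not from the lecture)
z, p = two_sample_z(xbar=105.0, ybar=100.0, var_x=100.0, var_y=225.0, m=100, n=100)
print(round(z, 3), round(p, 4))
```

For a lower-tailed or two-sided alternative, the p-value would instead be `NormalDist().cdf(z)` or twice the smaller tail area.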

Example

A realtor from the Northeast claims that houses there are more valuable (sell at a higher price) than anywhere else in the US.

$\mu _{1}$ : average sales price in Northeast
$\mu _{2}$ : average sales price anywhere but Northeast
$H_{0}:\mu _{1}-\mu _{2}=0$
$H_{a}:\mu _{1}-\mu _{2}>0$
$z={\tfrac {{\bar {X}}-{\bar {Y}}-0}{\sqrt {s_{X}^{2}/m+s_{Y}^{2}/n}}}$

Case 3: Small Sample from Normal Population

The procedure depends on whether the two population variances (not necessarily known) are equal:

{\begin{cases}\sigma _{X}^{2}=\sigma _{Y}^{2}&{\text{pooled }}t{\text{-test}}\\\sigma _{X}^{2}\neq \sigma _{Y}^{2}&{\text{unpooled }}t{\text{-test}}\end{cases}}

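As a rule of thumb, if the smaller σ is more than half of the larger σ, we treat the two variances as equal. Writing $\sigma _{1}$ for the smaller and $\sigma _{2}$ for the larger (in practice judged from the sample standard deviations):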

\sigma _{1}>{\frac {\sigma _{2}}{2}}

Pooled Sample Variance

$s_{p}^{2}={\frac {(m-1)s_{X}^{2}+(n-1)s_{Y}^{2}}{n+m-2}}$

Use the usual t-test with $\nu =n+m-2$ degrees of freedom:

t={\frac {{\bar {X}}-{\bar {Y}}-\Delta _{0}}{s_{p}{\sqrt {{\frac {1}{m}}+{\frac {1}{n}}}}}}
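
A minimal sketch of the pooled statistic in Python, assuming raw data are available as lists. The function name `pooled_t` and the data are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; the result would then be compared with a critical value from a t table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, delta0=0.0):
    """Return (t, df) for the pooled two-sample t-test."""
    m, n = len(x), len(y)
    # Pooled sample variance: weighted average of the two sample variances
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y) - delta0) / (sqrt(sp2) * sqrt(1 / m + 1 / n))
    return t, m + n - 2


# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = pooled_t(x, y)
print(round(t, 4), df)
```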

Unpooled Sample Variance

The t-test proceeds as usual, except that the standard error uses the two sample variances separately and the degrees of freedom are more complicated (this is why the pooled test is more common):

{\begin{aligned}t&={\frac {{\bar {X}}-{\bar {Y}}-\Delta _{0}}{\sqrt {{\frac {s_{X}^{2}}{m}}+{\frac {s_{Y}^{2}}{n}}}}}\\\nu &={\frac {\left({\frac {s_{X}^{2}}{m}}+{\frac {s_{Y}^{2}}{n}}\right)^{2}}{{\frac {\left(s_{X}^{2}/m\right)^{2}}{m-1}}+{\frac {\left(s_{Y}^{2}/n\right)^{2}}{n-1}}}}\end{aligned}}
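
A sketch of the unpooled computation in Python, with the standard error taken as $\sqrt {s_{X}^{2}/m+s_{Y}^{2}/n}$ and both terms in the denominator of $\nu$ squared. The function name `unpooled_t` and the data are hypothetical. The degrees of freedom are generally not an integer; rounding down before consulting a t table is a common convention, assumed here rather than taken from the notes.

```python
from math import sqrt
from statistics import mean, variance


def unpooled_t(x, y, delta0=0.0):
    """Return (t, nu) for the unpooled two-sample t-test."""
    m, n = len(x), len(y)
    vx = variance(x) / m                   # s_X^2 / m
    vy = variance(y) / n                   # s_Y^2 / n
    t = (mean(x) - mean(y) - delta0) / sqrt(vx + vy)
    nu = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, nu


# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t, nu = unpooled_t(x, y)
print(round(t, 4), round(nu, 3))
```

With equal sample sizes, as in this example, the t value matches the pooled one; only the degrees of freedom differ.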

Paired Data

Data are paired when the two samples are linked, observation by observation, through a third variable (e.g. two children of the same mother, or one student who takes two exams).

Use paired t-test:

calculate differences between samples: $D_{i}=X_{i}-Y_{i}$
calculate average and standard deviation of the differences
use the regular one-sample t-test on $t={\tfrac {{\bar {D}}-\Delta _{0}}{s_{D}/{\sqrt {n}}}}$ with $n-1$ degrees of freedom, where $n$ is the number of pairs

Confidence Interval

{\bar {D}}\pm t_{\alpha /2,\ n-1}{\frac {s_{D}}{\sqrt {n}}}
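
The paired procedure and its confidence interval can be sketched as follows. The function names `paired_t` and `paired_ci` and the data are hypothetical. Since the standard library has no t distribution, the critical value is passed in by hand; 3.182 is the tabulated $t_{0.025,\ 3}$.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Return (d_bar, s_d, t, df) for the paired t-test."""
    d = [xi - yi for xi, yi in zip(x, y)]  # differences D_i = X_i - Y_i
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    t = (d_bar - delta0) / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1


def paired_ci(d_bar, s_d, n, t_crit):
    """Return the interval d_bar +/- t_crit * s_d / sqrt(n)."""
    half = t_crit * s_d / sqrt(n)
    return d_bar - half, d_bar + half


# Hypothetical paired measurements (not from the lecture)
x = [12.0, 15.0, 11.0, 14.0]
y = [10.0, 12.0, 10.0, 11.0]
d_bar, s_d, t, df = paired_t(x, y)
lo, hi = paired_ci(d_bar, s_d, n=len(x), t_crit=3.182)  # 95% interval, df = 3
print(d_bar, round(t, 3), df, (round(lo, 3), round(hi, 3)))
```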

Comparing Two Population Proportions

Lecture 17 Notes

Suppose we have two sample proportions ${\hat {p}}_{X}$ and ${\hat {p}}_{Y}$ from two different populations, based on samples of sizes $m$ and $n$.

We are interested in the difference between the two proportions, which is approximately Normal for large samples. We can therefore standardize and perform a z-test as follows:

H₀: $p_{X}-p_{Y}=\Delta _{0}$
H_a: ${\begin{cases}p_{X}\neq p_{Y}\\p_{X}>p_{Y}\\p_{X}<p_{Y}\end{cases}}$
Test Statistic: $z={\frac {{\hat {p}}_{X}-{\hat {p}}_{Y}-\Delta _{0}}{\sqrt {{\hat {p}}(1-{\hat {p}})\left({\frac {1}{m}}+{\frac {1}{n}}\right)}}}$

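Under H₀ with $\Delta _{0}=0$ the two populations share a common proportion $p$, which is unknown, so the test statistic estimates it with the pooled proportion ${\hat {p}}$ , a weighted average of the two sample proportions:

${\hat {p}}={\frac {m}{m+n}}{\hat {p}}_{X}+{\frac {n}{m+n}}{\hat {p}}_{Y}$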
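
A sketch of the two-proportion z-test for the case $\Delta _{0}=0$, which is the case in which pooling the two samples makes sense. The function name `two_proportion_z` and the counts are hypothetical; the p-value shown is two-sided.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_x, m, successes_y, n):
    """Return (z, pooled p-hat, two-sided p-value) for H0: p_X = p_Y."""
    p_x, p_y = successes_x / m, successes_y / n
    # Weighted average of the sample proportions (total successes / total trials)
    p_pool = (m * p_x + n * p_y) / (m + n)
    se = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    z = (p_x - p_y) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_pool, p_two_sided


# Hypothetical counts: 60 successes out of 100 versus 40 out of 100
z, p_pool, p_value = two_proportion_z(60, 100, 40, 100)
print(round(z, 3), p_pool, round(p_value, 4))
```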

Review

(See STAT 211 Topic 3→)

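In general, the exact distribution of each success count is Binomial, and each sample proportion is that count divided by its sample size (subscripts 1 and 2 correspond to the X and Y samples):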

${\begin{aligned}{\hat {p}}_{1}&={\frac {\textstyle \sum _{i=1}^{m}X_{i}}{m}}\\{\hat {p}}_{2}&={\frac {\textstyle \sum _{i=1}^{n}Y_{i}}{n}}\end{aligned}}$	${\begin{aligned}\textstyle \sum _{i=1}^{m}X_{i}\sim \mathrm {Bin} (m,p_{1})\\\textstyle \sum _{i=1}^{n}Y_{i}\sim \mathrm {Bin} (n,p_{2})\end{aligned}}$
${\begin{aligned}E\left(\textstyle \sum _{i=1}^{m}X_{i}\right)&=mp_{1}\\E\left(\textstyle \sum _{i=1}^{n}Y_{i}\right)&=np_{2}\end{aligned}}$	${\begin{aligned}V\left(\textstyle \sum _{i=1}^{m}X_{i}\right)&=mp_{1}(1-p_{1})\\V\left(\textstyle \sum _{i=1}^{n}Y_{i}\right)&=np_{2}(1-p_{2})\end{aligned}}$

${\hat {p}}_{1}-{\hat {p}}_{2}\sim \mathrm {Normal} \left(p_{1}-p_{2},\ {\frac {p_{1}(1-p_{1})}{m}}+{\frac {p_{2}(1-p_{2})}{n}}\right)$
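
The variance in this Normal approximation follows from the Binomial variances by dividing each count by its own sample size (a constant factor $1/m$ or $1/n$ scales the variance by its square) and adding the two variances, which is valid because the samples are independent:

${\begin{aligned}V({\hat {p}}_{1})&={\frac {1}{m^{2}}}V\left(\textstyle \sum _{i=1}^{m}X_{i}\right)={\frac {mp_{1}(1-p_{1})}{m^{2}}}={\frac {p_{1}(1-p_{1})}{m}}\\V({\hat {p}}_{2})&={\frac {1}{n^{2}}}V\left(\textstyle \sum _{i=1}^{n}Y_{i}\right)={\frac {np_{2}(1-p_{2})}{n^{2}}}={\frac {p_{2}(1-p_{2})}{n}}\\V({\hat {p}}_{1}-{\hat {p}}_{2})&=V({\hat {p}}_{1})+V({\hat {p}}_{2})={\frac {p_{1}(1-p_{1})}{m}}+{\frac {p_{2}(1-p_{2})}{n}}\end{aligned}}$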

Comparing Two Variances

Instead of the $\chi ^{2}$ distribution, we use the $F$ distribution (F-test).

Suppose we have two independent samples $X_{1},\ldots ,X_{m}$ and $Y_{1},\ldots ,Y_{n}$ :

H₀: $\sigma _{X}^{2}=\sigma _{Y}^{2}$
Test statistic: $F={\frac {s_{X}^{2}}{s_{Y}^{2}}}$

Rejection Region:

H_a	Reject H₀ if
$\sigma _{X}^{2}>\sigma _{Y}^{2}$	$F>F_{\alpha ,\ m-1,\ n-1}$
$\sigma _{X}^{2}<\sigma _{Y}^{2}$	$F<F_{1-\alpha ,\ m-1,\ n-1}$
$\sigma _{X}^{2}\neq \sigma _{Y}^{2}$	$F>F_{\alpha /2,\ m-1,\ n-1}$ or $F<F_{1-\alpha /2,\ m-1,\ n-1}$
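
A minimal sketch of the two-sided F-test in Python. The function names `f_statistic` and `reject_two_sided` and the data are hypothetical. The standard library has no F distribution, so the critical values are supplied by hand: the upper value 9.60 is the tabulated upper 2.5% point for (4, 4) degrees of freedom, and the lower value uses the identity that a lower-tail critical value is the reciprocal of the upper-tail value with the degrees of freedom swapped.

```python
from statistics import variance


def f_statistic(x, y):
    """Return (F, numerator df, denominator df) with F = s_X^2 / s_Y^2."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1


def reject_two_sided(f, upper_crit, lower_crit):
    """Reject H0 when F falls in either tail of the rejection region."""
    return f > upper_crit or f < lower_crit


# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
f, df1, df2 = f_statistic(x, y)

upper = 9.60          # upper 2.5% point for (4, 4) df, from an F table
lower = 1 / 9.60      # lower 2.5% point via the reciprocal identity
decision = reject_two_sided(f, upper, lower)
print(f, df1, df2, decision)
```

Here the ratio falls between the two critical values, so this sketch does not reject equality of the variances at the 5% level.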