Tuesday, April 5, 2011
Comparing 2 Samples
In Topic 7, we built confidence intervals and ran hypothesis tests that compare a single sample with its population. Now we compare two samples with each other.
- Two samples ($X_1, \dots, X_m$ and $Y_1, \dots, Y_n$) must be independent and from separate populations
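Look at the difference in sample averages, $\bar{X} - \bar{Y}$, as an estimate of the difference of population means, $\mu_1 - \mu_2$, and compare it with the proposed difference $\Delta_0$:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$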
Cases 1&2: Normal Population or Large Sample
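When the populations are Normal or the sample sizes are large, the distribution of $\bar{X} - \bar{Y}$ is also (at least approximately) Normal.
Use the same methods as in STAT 211 Topic 7, substituting $\bar{X} - \bar{Y}$ for $\bar{X}$, $\Delta_0$ for $\mu_0$, and $\sqrt{s_X^2/m + s_Y^2/n}$ for $s/\sqrt{n}$.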
Example
A realtor from the Northeast claims that houses there are more valuable (have a higher sales price) than anywhere else in the US.
$\mu_1$: average sales price in Northeast
$\mu_2$: average sales price anywhere but Northeast
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 > 0$$
$$z = \frac{\bar{X} - \bar{Y} - 0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
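As a rough illustration of the realtor example, the sketch below computes the large-sample z statistic and a one-sided p-value from summary statistics. The function names and all of the numbers are hypothetical and chosen only to show the arithmetic; they are not data from the course.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, s_x, s_y, m, n, delta0=0.0):
    """Large-sample z statistic for H0: mu_1 - mu_2 = delta0."""
    standard_error = sqrt(s_x**2 / m + s_y**2 / n)
    return (xbar - ybar - delta0) / standard_error


def upper_tail_p_value(z):
    """P(Z >= z), matching the one-sided alternative Ha: mu_1 - mu_2 > delta0."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical summary statistics (thousands of dollars), not real data.
z = two_sample_z(xbar=310.0, ybar=295.0, s_x=40.0, s_y=50.0, m=100, n=100)
p = upper_tail_p_value(z)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```

A small p-value would support the realtor's claim; for a two-sided alternative the tail probability would be doubled instead.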
Case 3: Small Sample from Normal Population
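We do different things depending on how the two population variances compare (they are not necessarily known, so in practice the sample standard deviations stand in for them):
If the smaller σ is greater than half of the bigger σ, then we treat the two variances as equal and pool them:
Pooled Sample Variance
$$s_p^2 = \frac{(m-1)\,s_X^2 + (n-1)\,s_Y^2}{m+n-2}$$
Use the t-test as normal, with
$$t = \frac{\bar{X} - \bar{Y} - \Delta_0}{s_p\sqrt{\frac{1}{m} + \frac{1}{n}}}$$
and df $= m + n - 2$.
Unpooled Sample Variance
The t-test works as expected, with
$$t = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
but the degrees of freedom are more complicated (this is why the pooled test is more common):
$$\nu = \frac{\left(\frac{s_X^2}{m} + \frac{s_Y^2}{n}\right)^2}{\frac{(s_X^2/m)^2}{m-1} + \frac{(s_Y^2/n)^2}{n-1}}$$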
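The two small-sample variants can be sketched side by side in Python. This assumes the standard textbook formulas for the pooled variance and for the approximate (Welch-Satterthwaite) degrees of freedom, and it applies the half-of-the-bigger-σ rule to sample standard deviations as a stand-in for the unknown σ values. Function names and inputs are hypothetical.

```python
from math import sqrt


def pooled_t(xbar, ybar, s_x, s_y, m, n, delta0=0.0):
    """Pooled two-sample t statistic and its degrees of freedom (m + n - 2)."""
    df = m + n - 2
    s_p_squared = ((m - 1) * s_x**2 + (n - 1) * s_y**2) / df
    t = (xbar - ybar - delta0) / sqrt(s_p_squared * (1 / m + 1 / n))
    return t, df


def unpooled_t(xbar, ybar, s_x, s_y, m, n, delta0=0.0):
    """Unpooled t statistic with Welch-Satterthwaite approximate df."""
    v_x, v_y = s_x**2 / m, s_y**2 / n
    t = (xbar - ybar - delta0) / sqrt(v_x + v_y)
    df = (v_x + v_y) ** 2 / (v_x**2 / (m - 1) + v_y**2 / (n - 1))
    return t, df


def choose_test(s_x, s_y):
    """Rule of thumb: pool if the smaller SD exceeds half of the larger SD."""
    return "pooled" if min(s_x, s_y) > 0.5 * max(s_x, s_y) else "unpooled"


# Hypothetical summary statistics, for illustration only.
stats = dict(xbar=20.0, ybar=17.0, s_x=4.0, s_y=3.0, m=10, n=12)
choice = choose_test(stats["s_x"], stats["s_y"])
t_pooled, df_pooled = pooled_t(**stats)
t_unpooled, df_unpooled = unpooled_t(**stats)
print(choice, round(t_pooled, 3), df_pooled, round(t_unpooled, 3), round(df_unpooled, 2))
```

The unpooled degrees of freedom are generally not an integer; a common convention is to round down before looking up a critical value in a t table.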
Paired Data
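Paired data arise when the two samples are related to each other by a third variable (e.g. the mother of two children, a student who takes two exams, etc.), so the observations come in natural pairs and the samples are not independent.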
Use paired t-test:
- calculate differences between samples: $D_i = X_i - Y_i$
- calculate average $\bar{D}$ and standard deviation $s_D$ of the differences
- use regular t-test on the differences, with $n - 1$ degrees of freedom:
$$t = \frac{\bar{D} - \Delta_0}{s_D/\sqrt{n}}$$
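The paired t-test steps above can be sketched in a few lines of Python. The exam scores are hypothetical and the function name `paired_t` is only illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Paired t statistic for H0: mean difference = delta0; df = n - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # D_i = X_i - Y_i
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation (divides by n - 1)
    t = (d_bar - delta0) / (s_d / sqrt(n))
    return t, n - 1


# Hypothetical scores for five students on two exams.
exam1 = [78, 85, 90, 66, 72]
exam2 = [74, 80, 91, 60, 70]
t, df = paired_t(exam1, exam2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Once the differences are formed, the problem is an ordinary one-sample t-test, which is why only one sample size $n$ appears.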
Confidence Interval
$$\bar{D} \pm t_{\alpha/2,\,n-1}\,\frac{s_D}{\sqrt{n}}$$
Comparing Two Population Proportions
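Given two sample proportions $\hat{p}_X$ (from a sample of size $m$) and $\hat{p}_Y$ (from a sample of size $n$) from two different populations
We are interested in the difference between the two proportions, $\hat{p}_X - \hat{p}_Y$, which is approximately Normal for large samples. Therefore, we can standardize it and perform a z-test with the following parameters: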
- H0: $p_X - p_Y = \Delta_0$
- Ha:
$$\begin{cases}p_X \neq p_Y\\ p_X > p_Y\\ p_X < p_Y\end{cases}$$
- Test Statistic:
$$z = \frac{\hat{p}_X - \hat{p}_Y - \Delta_0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}}$$
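The only problem is that we don't know $p_X$ and $p_Y$, so when testing $p_X = p_Y$ (that is, $\Delta_0 = 0$) we estimate their common value with the pooled sample proportion:
$$\hat{p} = \frac{m\,\hat{p}_X + n\,\hat{p}_Y}{m + n}$$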
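A minimal Python sketch of the two-proportion z-test follows. It assumes the null hypothesis $p_X = p_Y$ and estimates the common proportion by pooling the two samples (total successes over total trials). The counts and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_x, m, successes_y, n):
    """Pooled z statistic for H0: p_X = p_Y (that is, Delta_0 = 0)."""
    p_hat_x = successes_x / m
    p_hat_y = successes_y / n
    p_hat = (successes_x + successes_y) / (m + n)  # pooled estimate
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    return (p_hat_x - p_hat_y) / standard_error


# Hypothetical counts: 60 successes out of 200 versus 45 out of 200.
z = two_proportion_z(60, 200, 45, 200)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p-value = {p_two_sided:.4f}")
```

For a one-sided alternative, only the matching tail of the Normal distribution would be used rather than twice the upper tail.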
Review
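(See STAT 211 Topic 3.)
In general, the exact distributions of the success counts behind our sample proportions are Binomial:
$$m\,\hat{p}_X \sim \mathrm{Bin}(m,\ p_X), \qquad n\,\hat{p}_Y \sim \mathrm{Bin}(n,\ p_Y)$$
so that $E(\hat{p}_X) = p_X$ and $\mathrm{Var}(\hat{p}_X) = p_X(1-p_X)/m$, and likewise $E(\hat{p}_Y) = p_Y$ and $\mathrm{Var}(\hat{p}_Y) = p_Y(1-p_Y)/n$.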
Comparing Two Variances
Instead of the distributions used for means and proportions, we use the $F$ distribution (F-test).
If we have two samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$:
- H0: $\sigma_X^2 = \sigma_Y^2$
- Test statistic: $F = \dfrac{s_X^2}{s_Y^2}$
Rejection Range (where $F_{\alpha,\,m-1,\,n-1}$ is the upper-tail critical value of the $F$ distribution with $m-1$ and $n-1$ degrees of freedom):
| Ha | Reject H0 if |
|---|---|
| $\sigma_X^2 > \sigma_Y^2$ | $F \geq F_{\alpha,\,m-1,\,n-1}$ |
| $\sigma_X^2 < \sigma_Y^2$ | $F \leq F_{1-\alpha,\,m-1,\,n-1}$ |
| $\sigma_X^2 \neq \sigma_Y^2$ | $F \geq F_{\alpha/2,\,m-1,\,n-1}$ or $F \leq F_{1-\alpha/2,\,m-1,\,n-1}$ |
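The F statistic itself is simple to compute from raw data. The sketch below uses Python's standard library for the sample variances and reports the numerator and denominator degrees of freedom; the critical value still has to come from an F table or a statistics package. The measurements and the function name are hypothetical.

```python
from statistics import variance


def f_statistic(x, y):
    """F = s_X^2 / s_Y^2 with (m - 1, n - 1) degrees of freedom."""
    f = variance(x) / variance(y)  # sample variances (divide by size - 1)
    return f, (len(x) - 1, len(y) - 1)


# Hypothetical measurements, for illustration only.
x = [12.1, 11.4, 13.0, 12.7, 10.9, 12.3]
y = [11.8, 12.0, 12.2, 11.9, 12.1]
f, (df_num, df_den) = f_statistic(x, y)
print(f"F = {f:.2f} on ({df_num}, {df_den}) degrees of freedom")
```

Because the order of the degrees of freedom matters for the F distribution, the pair is returned as (numerator, denominator) to match $s_X^2$ over $s_Y^2$.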
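Confidence Interval
A $100(1-\alpha)\%$ interval for the ratio $\sigma_X^2/\sigma_Y^2$, with $F_{\alpha/2,\,\nu_1,\,\nu_2}$ the upper-tail critical value:
$$\left(\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F_{\alpha/2,\,m-1,\,n-1}},\ \ \frac{s_X^2}{s_Y^2}\cdot F_{\alpha/2,\,n-1,\,m-1}\right)$$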