Tuesday, April 5, 2011
Comparing 2 Samples
In Topic 7, we built confidence intervals and hypothesis tests that compare a single sample with its population. Now we compare two samples with each other.
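- The two samples (sample 1 of size $n_1$ and sample 2 of size $n_2$) must be independent and drawn from separate populations
Use the difference in sample averages, $\bar{x}_1 - \bar{x}_2$, as the estimate of the difference in population means, $\mu_1 - \mu_2$, and compare it with the proposed difference $\Delta_0$:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$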
Cases 1&2: Normal Population or Large Sample
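When the sample sizes are large, the distribution of $\bar{X}_1 - \bar{X}_2$ is also (approximately) Normal.
Use the same methods as in STAT 211 Topic 7, substituting $\bar{x}_1 - \bar{x}_2$ for $\bar{x}$, $\Delta_0$ for $\mu_0$, and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ for $\sigma/\sqrt{n}$ (with $s_1, s_2$ in place of $\sigma_1, \sigma_2$ when the samples are large and the population standard deviations are unknown):
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$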
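As a rough illustration of this substitution, the sketch below computes the two-sample z statistic and an upper-tail p-value from summary statistics. The function name `two_sample_z` and all numbers are made up for illustration; they are not from the course notes.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sd1, sd2, n1, n2, delta0=0.0):
    """Return (z, upper-tail p-value) for H0: mu1 - mu2 = delta0."""
    # Standard error of the difference in sample means
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((xbar1 - xbar2) - delta0) / se
    # P(Z >= z), appropriate for Ha: mu1 - mu2 > delta0
    p_upper = 1.0 - NormalDist().cdf(z)
    return z, p_upper

# Hypothetical summary statistics (illustration only)
z, p = two_sample_z(xbar1=105.0, xbar2=100.0, sd1=12.0, sd2=9.0, n1=36, n2=36)
print(f"z = {z:.3f}, upper-tail p = {p:.4f}")
```

For a lower-tail or two-sided alternative, only the p-value line changes; the statistic itself is the same.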
Example
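A realtor from the Northeast claims that houses there are more valuable (sell for a higher price) than anywhere else in the US.
- $\mu_1$: average sales price in the Northeast
- $\mu_2$: average sales price anywhere but the Northeast
The claim corresponds to testing $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 > 0$.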
Case 3: Small Sample from Normal Population
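The procedure depends on how the two population variances (not necessarily known) compare.
Rule of thumb: if the smaller standard deviation is more than half of the larger one (comparing the sample values $s_1$ and $s_2$ when the $\sigma$'s are unknown), we treat the two variances as equal, $\sigma_1^2 = \sigma_2^2$: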
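The rule of thumb above is simple enough to write as a one-line check. This is only a sketch of the heuristic as stated in these notes; the helper name `variances_look_equal` is hypothetical.

```python
def variances_look_equal(s1, s2):
    """Rule of thumb: the smaller sd is more than half of the larger sd."""
    smaller, larger = min(s1, s2), max(s1, s2)
    return smaller > 0.5 * larger

# Hypothetical sample standard deviations (illustration only)
print(variances_look_equal(3.0, 5.0))  # 3.0 > 2.5, so pooling looks reasonable
print(variances_look_equal(2.0, 5.0))  # 2.0 < 2.5, so use the unpooled test
```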
Pooled Sample Variance
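$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Use the t-test as usual, with $t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{s_p\sqrt{1/n_1 + 1/n_2}}$ and $df = n_1 + n_2 - 2$.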
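A minimal sketch of the pooled computation, assuming the standard pooled-variance formula (a weighted average of the two sample variances with $n_1 + n_2 - 2$ degrees of freedom). The function name `pooled_t` and the summary statistics are invented for illustration.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Return (t, df, pooled variance) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = ((xbar1 - xbar2) - delta0) / se
    return t, df, sp2

# Hypothetical summary statistics (illustration only)
t, df, sp2 = pooled_t(xbar1=20.0, xbar2=17.5, s1=3.0, s2=4.0, n1=25, n2=25)
print(f"t = {t:.3f}, df = {df}, pooled variance = {sp2:.3f}")
```

The resulting `t` is compared with a critical value from the t table at `df` degrees of freedom.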
Unpooled Sample Variance
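The t-test works as usual, with $t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, but the degrees of freedom are more complicated (this is why the pooled test is more common):
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$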
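For the unpooled case, the sketch below uses the usual Welch-Satterthwaite approximation for the degrees of freedom, which is assumed here to be the formula these notes intend. The name `welch_t` and the numbers are illustrative only.

```python
from math import sqrt

def welch_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Return (t, approximate df) for the unequal-variance two-sample t-test."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-sample variance of the mean
    t = ((xbar1 - xbar2) - delta0) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; generally not an integer
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (illustration only)
t, df = welch_t(xbar1=50.0, xbar2=46.0, s1=2.0, s2=6.0, n1=10, n2=12)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The approximate df always lies between the smaller of $n_1 - 1$, $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$; when reading a t table it is commonly rounded down.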
Paired Data
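Data are paired when the two samples are linked, observation by observation, through a common unit (e.g. a mother of two children, a student who takes two exams, etc.).
Use the paired t-test:
- calculate the differences between the paired observations: $d_i = x_i - y_i$
- calculate the average $\bar{d}$ and standard deviation $s_d$ of the differences
- use the regular one-sample t-test on the differences: $t = \frac{\bar{d} - \Delta_0}{s_d/\sqrt{n}}$ with $df = n - 1$, where $n$ is the number of pairs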
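The three steps above can be sketched as follows. The function name `paired_t` and the exam scores are invented for illustration; they are not data from the course.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y, delta0=0.0):
    """Return (t, df, mean difference, sd of differences) for paired samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]    # step 1: differences
    n = len(d)
    dbar, sd = mean(d), stdev(d)             # step 2: mean and sample sd
    t = (dbar - delta0) / (sd / sqrt(n))     # step 3: one-sample t on d
    return t, n - 1, dbar, sd

# Hypothetical scores of five students on two exams (illustration only)
exam1 = [78.0, 85.0, 90.0, 66.0, 72.0]
exam2 = [74.0, 80.0, 91.0, 60.0, 70.0]
t, df, dbar, sd = paired_t(exam1, exam2)
print(f"t = {t:.3f}, df = {df}, dbar = {dbar:.2f}, s_d = {sd:.3f}")
```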
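Confidence Interval
For the mean difference $\mu_d$, with $\bar{d}$ and $s_d$ the mean and standard deviation of the $n$ paired differences:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$$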
Comparing Two Population Proportions
Given two sample proportions $\hat{p}_1$ and $\hat{p}_2$ from two different populations:
We are interested in the difference between the two proportions, $\hat{p}_1 - \hat{p}_2$, which is approximately Normal for large samples. Therefore, we can standardize it and perform a z-test with the following parameters:
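- H0: $p_1 - p_2 = 0$
- Ha: $p_1 - p_2 \neq 0$ (or $> 0$, or $< 0$)
- Test Statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
The only problem is that we don't know the common proportion $p$ under H0, so we estimate it with the second equation, the pooled proportion $\hat{p}$, where $x_1$ and $x_2$ are the numbers of successes in the two samples.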
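A short sketch of this z-test, assuming the unknown common proportion is estimated by pooling the successes of both samples. The function name `two_proportion_z` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical counts: 60 of 100 successes vs. 40 of 100 (illustration only)
z, p = two_proportion_z(x1=60, n1=100, x2=40, n2=100)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```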
Review
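(See STAT 211 Topic 3)
In general, the exact distribution of the success counts behind our sample proportions is Binomial:
$X_i \sim \text{Binomial}(n_i, p_i)$ | $\hat{p}_i = X_i / n_i$
$E[X_i] = n_i p_i$ | $E[\hat{p}_i] = p_i$
$Var(X_i) = n_i p_i (1 - p_i)$ | $Var(\hat{p}_i) = p_i (1 - p_i) / n_i$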
Comparing Two Variances
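Instead of the Normal or $t$ distribution used above, we use the $F$ distribution (F-test).
If we have two independent samples of sizes $n_1$ and $n_2$ with sample variances $s_1^2$ and $s_2^2$:
- H0: $\sigma_1^2 = \sigma_2^2$
- Test statistic: $F = s_1^2 / s_2^2$, which follows an $F$ distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom under H0 (assuming Normal populations)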
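As a small illustration, the F statistic is just the ratio of the two sample variances, with numerator and denominator degrees of freedom $n_1 - 1$ and $n_2 - 1$. The function name `f_statistic` and the data below are made up; critical values would still come from an F table (or, if SciPy is available, from something like `scipy.stats.f.ppf`).

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return (F, numerator df, denominator df) for H0: equal variances."""
    f = variance(sample1) / variance(sample2)   # ratio of sample variances
    return f, len(sample1) - 1, len(sample2) - 1

# Hypothetical data (illustration only)
sample1 = [1.0, 2.0, 3.0, 4.0, 5.0]
sample2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
f, df1, df2 = f_statistic(sample1, sample2)
print(f"F = {f:.4f} with ({df1}, {df2}) degrees of freedom")
```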
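Rejection Range (with $F_{\alpha,\,\nu_1,\,\nu_2}$ denoting the upper-tail critical value, $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$):
Ha | Reject H0 if
$\sigma_1^2 > \sigma_2^2$ | $F \ge F_{\alpha,\,\nu_1,\,\nu_2}$
$\sigma_1^2 < \sigma_2^2$ | $F \le F_{1-\alpha,\,\nu_1,\,\nu_2}$
$\sigma_1^2 \ne \sigma_2^2$ | $F \ge F_{\alpha/2,\,\nu_1,\,\nu_2}$ or $F \le F_{1-\alpha/2,\,\nu_1,\,\nu_2}$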
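Confidence Interval
For the ratio $\sigma_1^2 / \sigma_2^2$, with upper-tail critical values of the $F$ distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom:
$$\left( \frac{s_1^2 / s_2^2}{F_{\alpha/2,\,n_1-1,\,n_2-1}},\ \frac{s_1^2 / s_2^2}{F_{1-\alpha/2,\,n_1-1,\,n_2-1}} \right)$$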