STAT 211 Topic 1
Lecture 1
Tuesday, January 18, 2011
Introduction
Please do the following as soon as possible:
- Print and sign syllabus; bring on Thursday, January 20, 2011
- Send email with Bio and Picture to stat211.jun@gmail.com;
- Create account on and upload same picture to http://dl.stat.tamu.edu/dostat
- Verify Textbook: Miller and Freund’s Probability and Statistics for Engineers (8th ed)
What is Statistics
Example: M&Ms
- Number of candies in each bag; in particular, how many red?
Statistics: the science of collecting, classifying, and interpreting data
Vocabulary
- population
- entire group of interest (normally very big, and potentially more than one!)
- EX: all M&Ms
- sample
- subset of population selected for analysis
- EX: M&Ms purchased by students
- parameter
- fixed unknown number that describes population (what we're trying to figure out)
- EX: avg. number of red M&Ms in total production
- statistic
- number computed from a sample that estimates a parameter
- estimating parameters from sample statistics is the goal of statistics in general
- EX: avg. number of red M&Ms in sample
- variable
- any characteristic whose value may change from one object to another in the population
- EX: number of red M&Ms in each bag
Interpreting Data: Histograms
bar graph drawn across the x-axis, where the area of each bar represents the relative frequency of the results:
Height of bar = density
If you add up all the areas in a histogram, the result is 1 (100%)
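The area rule above can be checked numerically. The following is a minimal sketch, assuming equal-width bins and using made-up counts of red M&Ms; the helper name `density_histogram` and the data are illustrative, not from the lecture.

```python
from collections import Counter

def density_histogram(data, bin_width, start):
    """Return (left_edge, density) pairs for equal-width bins.

    density = relative frequency / bin width, so that
    area = density * bin width = relative frequency of the bin.
    """
    n = len(data)
    # Bin k covers the interval [start + k*bin_width, start + (k+1)*bin_width).
    counts = Counter(int((x - start) // bin_width) for x in data)
    bars = []
    for k in sorted(counts):
        rel_freq = counts[k] / n
        bars.append((start + k * bin_width, rel_freq / bin_width))
    return bars

# Hypothetical number of red M&Ms in 10 bags (made-up values).
reds = [3, 5, 4, 6, 5, 7, 4, 5, 8, 6]
bin_width = 2
bars = density_histogram(reds, bin_width=bin_width, start=2)

# Adding up (height * width) over all bars should give 1, i.e. 100%.
total_area = sum(density * bin_width for _, density in bars)
print(bars)
print(round(total_area, 10))
```

Dividing by the bin width is what makes the bar height a density rather than a raw relative frequency; with unequal bin widths the same idea applies bar by bar.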
Lecture 2
Thursday, January 20, 2011
Histograms (cont'd)
Plotting a fit-line over a histogram reveals its general shape, which is described with terms such as:
- symmetric
- similar on both sides
- unimodal
- 1 maximum
- bimodal
- 2 maxima
- multimodal
- 3+ maxima
- positively skewed
- long tail on the right side (frequencies are low on the right)
- example: income
- negatively skewed
- long tail on the left side (frequencies are low on the left)
Histograms can be described using more than one term.
Measures of Location
Summarizing data with one number
"center" values
- mean/average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (sum of all values divided by the number of values)
- median: "data point in middle"
- sort the data
- if odd number of data, use data[(n+1)/2] as ordered value
- if even number of data, use average of data[n/2] and data[n/2+1]
Example
Data set: { 1, 3, 10, 4, 6 }
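- mean: $\bar{x} = (1 + 3 + 10 + 4 + 6)/5 = 24/5 = 4.8$
- median: sorted data is { 1, 3, 4, 6, 10 }, so the median is the 3rd value, 4
Suppose we add 100:
- mean: $\bar{x} = 124/6 \approx 20.67$
- median: sorted data is { 1, 3, 4, 6, 10, 100 }, so the median is $(4 + 6)/2 = 5$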
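The example can be reproduced with a short sketch that follows the median rule from the notes (middle value for odd n, average of the two middle values for even n). The function names are illustrative; note that the notes use 1-based positions while Python lists are 0-based.

```python
def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Median using the ordered-position rule from the notes."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2, shifted by one for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of 1-based positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

data = [1, 3, 10, 4, 6]
print(mean(data), median(data))            # 4.8 4

with_outlier = data + [100]
print(round(mean(with_outlier), 2), median(with_outlier))   # 20.67 5.0
```

Adding the single value 100 moves the mean from 4.8 to about 20.67, while the median only moves from 4 to 5.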
Mean vs. Median
Any data point that is very large or very small compared to the surrounding values is called an outlier
- mean is more sensitive to outliers
- median is robust in that it is not sensitive to outliers
Going back to histograms
- symmetric & unimodal: mean = median
- positively skewed: mean > median
- negatively skewed: mean < median
In a roughly symmetric, unimodal histogram, the median falls near the peak (maximum).
Percentiles and Quartiles
Scoring at the 90th percentile of SAT scores means that 90% of people who took the SAT scored below you and 10% scored above.
Quartiles (robust):
- Q1 (First Quartile) is 25th percentile
- Q2 (Second Quartile) is 50th percentile (=Median)
- Q3 (Third Quartile) is 75th percentile
- IQR (Interquartile Range or Fourth Spread) = Q3 - Q1
more precise definition of outliers:
- any observation that lies more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3
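Calculation of pth percentile:
- Order n values from smallest to largest
- calculate product (n*p)/100
- if product is not an integer, round up to the next integer (ceil()) and take the value in that position
- if product is an integer k, average the values in positions k and k+1 (this matches the median rule for even n)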
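The percentile steps, together with the quartile and outlier definitions earlier in this section, can be sketched as follows. The handling of an integer product (averaging positions k and k+1) is an assumption chosen to agree with the median rule for even n; the function name `percentile` is illustrative.

```python
import math

def percentile(data, p):
    """p-th percentile (0 < p < 100) using the ordered-position rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    product = n * p / 100
    if product != int(product):
        # Not an integer: round up and take that (1-based) position.
        return ordered[math.ceil(product) - 1]
    # Integer product k (assumption): average 1-based positions k and k+1.
    k = int(product)
    return (ordered[k - 1] + ordered[k]) / 2

values = [1, 3, 10, 4, 6, 100]
q1 = percentile(values, 25)   # 6*25/100 = 1.5 -> position 2
q2 = percentile(values, 50)   # 6*50/100 = 3   -> average of positions 3 and 4
q3 = percentile(values, 75)   # 6*75/100 = 4.5 -> position 5
iqr = q3 - q1

# Outliers: more than 1.5 * IQR below Q1 or above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < low_fence or x > high_fence]

print(q1, q2, q3, iqr)   # 3 5.0 10 7
print(outliers)          # [100]
```

Software packages use several slightly different percentile conventions, so results from a library function may not match this rule exactly on small data sets.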
Variables
Quantitative
Recall that a variable is a characteristic or quantity to be measured.
quantitative variables take numerical values that we can manipulate arithmetically
Categorical
Places a unit into one of several categories:
- EX: Gender, race, political party
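Think of a sample proportion: $\hat{p} = x/n$, where $x$ is the number of sampled units in the category and $n$ is the sample size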
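For a categorical variable, the natural summary is the proportion of the sample in each category. A minimal sketch, using made-up M&M colors (the data and the helper name `sample_proportions` are hypothetical):

```python
from collections import Counter

def sample_proportions(labels):
    """Map each category to (count in category) / (sample size)."""
    n = len(labels)
    counts = Counter(labels)
    return {category: count / n for category, count in counts.items()}

# Hypothetical colors of 8 sampled M&Ms (made-up values).
colors = ["red", "blue", "red", "green", "red", "blue", "brown", "red"]
props = sample_proportions(colors)
print(props["red"])          # 0.5
print(sum(props.values()))   # 1.0
```

The proportions across all categories add up to 1, since every unit falls into exactly one category.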
Variance
How is the data spread out?
range
- Difference between maximum and minimum (max − min)
- very sensitive to outliers
sample variance
(for entire data set)
- deviation from mean of each item in data set: $x_i - \bar{x}$
- calculate sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (don't forget to square units!)
- Sample standard deviation (the ± deviation): $s = \sqrt{s^2}$
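The three steps above can be sketched directly. This assumes the usual sample-variance definition with divisor n − 1; the function names are illustrative, and the data set is the one from the mean/median example.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]          # step 1: deviations
    return sum(d * d for d in deviations) / (n - 1)  # step 2: variance

def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))         # step 3

data = [1, 3, 10, 4, 6]
print(round(sample_variance(data), 4))   # 11.7
print(round(sample_std(data), 4))
```

The variance is in squared units of the data, which is why the standard deviation (same units as the data) is usually the value reported next to the mean.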
Effects of math on Mean and Variance
if a professor added 5 points to everyone's test
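- the mean would increase by 5 points ($\bar{y} = \bar{x} + 5$)
- deviation and variance would not change
if a company raises the salary by 10%
- mean would be multiplied by 1.1 ($\bar{y} = 1.1\,\bar{x}$)
- variance would be multiplied by $1.1^2$ ($s_y^2 = 1.1^2 s_x^2 = 1.21\, s_x^2$)
- standard deviation would be multiplied by 1.1 ($s_y = 1.1\, s_x$)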
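Both cases can be verified numerically. The sketch below uses made-up test scores and salaries (hypothetical values) and the n − 1 sample variance; adding a constant shifts the mean only, while multiplying by a constant c scales the mean by c and the variance by c².

```python
def mean(data):
    return sum(data) / len(data)

def sample_variance(data):
    """Sample variance with divisor n - 1."""
    xbar = mean(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

# Case 1: add 5 points to every score (hypothetical scores).
scores = [70, 80, 90, 60]
curved = [x + 5 for x in scores]
print(mean(scores), mean(curved))                        # 75.0 80.0
print(sample_variance(scores) == sample_variance(curved))  # True

# Case 2: raise every salary by 10% (hypothetical salaries).
salaries = [40000, 50000, 65000, 45000]
raised = [x * 1.1 for x in salaries]
mean_ratio = mean(raised) / mean(salaries)
var_ratio = sample_variance(raised) / sample_variance(salaries)
print(round(mean_ratio, 6), round(var_ratio, 6))         # 1.1 1.21
```

The standard deviation ratio is the square root of the variance ratio, so it is 1.1 in the second case.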
Box Plots
Way to show center and spread of data set
Box extends from Q1 to Q3 with a line drawn inside at the median. Whiskers extend from both ends of the box by a length of 1.5 × IQR (each can be truncated at the max/min of the data).
Very useful for comparing two variables
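The numbers needed to draw a box plot can be computed without any plotting library. This is a minimal sketch that follows the description above literally (whiskers of length 1.5 × IQR, truncated at the data's min and max); the percentile rule for an integer product and all function names are assumptions for illustration.

```python
import math

def percentile(data, p):
    """p-th percentile (0 < p < 100) using the ordered-position rule."""
    ordered = sorted(data)
    product = len(ordered) * p / 100
    if product != int(product):
        return ordered[math.ceil(product) - 1]
    k = int(product)  # integer case (assumption): average positions k, k+1
    return (ordered[k - 1] + ordered[k]) / 2

def box_plot_summary(data):
    """Box edges, median line, whisker ends, and points beyond the whiskers."""
    q1, med, q3 = (percentile(data, p) for p in (25, 50, 75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # Whiskers reach 1.5 * IQR from the box, truncated at min/max.
        "whisker_low": max(min(data), low_fence),
        "whisker_high": min(max(data), high_fence),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

summary = box_plot_summary([1, 3, 10, 4, 6, 100])
print(summary)
```

Many plotting tools instead end each whisker at the most extreme data point inside the fence, so a drawn box plot may differ slightly from these values.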
Lecture 3
Tuesday, January 25, 2011
Experiments
Types of data collection
- Observational study
- observe group and measure quantities. Very passive / non-invasive (i.e. does not influence the group)
- all terms studied in Lecture 1 are for this group
- Experiment
- deliberately expose a group to certain environments/treatments and observe responses.
New Vocabulary
- Experimental Group
- experimental units subjected to a real treatment
- Control Group
- experimental units subjected to the same conditions as experimental groups, but no treatment is imposed
- Confounding effects
- differences between the control group and experimental groups, other than the treatment itself, that can also affect the response
- should be avoided when possible.
- Blinding
- If people in control group know that they are in the control group, then data can be affected
- e.g. subjects in a medical study are given either the treatment or a placebo without being told which
- Double-blinding
- All other people involved in experiment do not know whether subject is in control or experimental group
- e.g. doctors in medical survey are not told whether patient is experimental or control