STAT 211 Topic 1


Lecture 1

Lecture 1 Notes

Tuesday, January 18, 2011

Introduction

Please do the following as soon as possible:

What is Statistics

Example: M&Ms

Number of candies in each bag; in particular, how many red?

Science of collecting, classifying, and interpreting data


Vocabulary

population
entire group of interest (normally very big, and potentially more than one!)
EX: all M&Ms
sample
subset of population selected for analysis
EX: M&Ms purchased by students
parameter
fixed unknown number that describes population (what we're trying to figure out)
EX: avg. number of red M&Ms in total production
statistic
number produced from a sample that estimates parameter
this is the goal of statistics in general
EX: avg. number of red M&Ms in sample
variable
any characteristic whose value may change from one object to another in the population
EX: number of red M&Ms in each bag


Interpreting Data: Histograms

bar graph drawn across the x-axis, where the area of each bar represents the relative frequency of the results:

Height of bar = density = relative frequency / bar width

If you add up all the areas in a histogram, the result is 1 (100%)

Note: inclusion of endpoints is very important! The entire shape of a histogram can change depending on whether each bin is defined as a ≤ x < b or as a < x ≤ b
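
The density and endpoint rules can be sketched in a few lines of Python. This is only an illustration: the counts of red M&Ms and the bin edges are made up, and `density_histogram` is a hypothetical helper, not something from the lecture.

```python
def density_histogram(data, edges, left_closed=True):
    """Return (relative_frequencies, densities), one entry per bin.

    left_closed=True  counts x in a bin when lo <= x < hi
    left_closed=False counts x in a bin when lo < x <= hi
    """
    n = len(data)
    rel_freqs, densities = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if left_closed:
            count = sum(1 for x in data if lo <= x < hi)
        else:
            count = sum(1 for x in data if lo < x <= hi)
        rel = count / n
        rel_freqs.append(rel)
        densities.append(rel / (hi - lo))  # height = relative frequency / bar width
    return rel_freqs, densities


# Hypothetical sample: number of red M&Ms in each of 10 bags
reds = [4, 5, 5, 6, 7, 8, 8, 8, 10, 11]
edges = [3, 6, 9, 12]  # hypothetical bin edges

rel_left, dens_left = density_histogram(reds, edges, left_closed=True)
rel_right, dens_right = density_histogram(reds, edges, left_closed=False)

# Area of each bar = height * width; the areas should add up to 1
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
total_area = sum(d * w for d, w in zip(dens_left, widths))

print(rel_left)   # [0.3, 0.5, 0.2]
print(rel_right)  # [0.4, 0.4, 0.2]
```

The same ten bags give a different shape under the two endpoint conventions because the bag with 6 red candies sits exactly on a bin edge.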



Lecture 2

Lecture 2 Notes

Thursday, January 20, 2011

Histograms (cont'd)

Plotting a fit-line over a histogram reveals one or more of the following general shapes:

symmetric
similar on both sides
unimodal
1 maximum
bimodal
2 maxima
multimodal
3+ maxima
positively skewed
long tail on the right side (bars are low on the right)
example: income
negatively skewed
long tail on the left side (bars are low on the left)

Histograms can be described using more than one term.


Measures of Location

Summarizing data with one number

"center" values

  • mean/average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • median: "data point in middle"
    1. sort the data
    2. if the number of data points n is odd, use the ordered value data[(n+1)/2] (counting from 1)
    3. if n is even, use the average of the ordered values data[n/2] and data[n/2+1]

Example

Data set: { 1, 3, 10, 4, 6 }

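  • mean: $\bar{x} = (1 + 3 + 10 + 4 + 6)/5 = 24/5 = 4.8$
  • median: the sorted data is 1, 3, 4, 6, 10, so the median is 4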

Suppose we add 100:

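  • mean: $\bar{x} = (1 + 3 + 10 + 4 + 6 + 100)/6 = 124/6 \approx 20.67$
  • median: the sorted data is 1, 3, 4, 6, 10, 100, so the median is $(4 + 6)/2 = 5$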
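
The example can be checked with a short Python sketch. The function names `mean` and `median` are chosen here for illustration; `median` follows the three steps listed above, shifting the 1-based positions to Python's 0-based indexing.

```python
def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(data) / len(data)


def median(data):
    """Median following the steps in the notes (positions there are 1-based)."""
    ordered = sorted(data)              # step 1: sort the data
    n = len(ordered)
    if n % 2 == 1:                      # step 2: odd n, take position (n+1)/2
        return ordered[(n + 1) // 2 - 1]
    # step 3: even n, average positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


data = [1, 3, 10, 4, 6]
with_outlier = data + [100]

print(mean(data), median(data))                  # 4.8 4
print(mean(with_outlier), median(with_outlier))  # mean is about 20.67, median is 5.0
```

Adding the single value 100 moves the mean from 4.8 to about 20.67 but moves the median only from 4 to 5.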

Mean vs. Median

Any data point that is very large or very small compared to the surrounding values is called an outlier

mean is more sensitive to outliers
median is robust in that it is not sensitive to outliers

Going back to histograms

  • symmetric & unimodal: mean = median
  • positively skewed: mean > median
  • negatively skewed: mean < median

For a roughly symmetric, unimodal histogram, the median falls near the maximum (peak) of the histogram.

Percentiles and Quartiles

Scoring at the 90th percentile on the SAT means that 90% of the people who took the SAT scored below you and 10% scored above.

Quartiles (robust):

  1. Q1 (First Quartile) is 25th percentile
  2. Q2 (Second Quartile) is 50th percentile (=Median)
  3. Q3 (Third Quartile) is 75th percentile
  4. IQR (Interquartile Range or Fourth Spread) = Q3 - Q1

more precise definition of outliers:

any observation that lies more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3

Calculation of pth percentile:

  1. Order n values from smallest to largest
  2. calculate the product (n*p)/100
  3. if the product is not an integer, round up to the next integer (ceil()) and take the ordered value at that position
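
A minimal Python sketch of the percentile steps, combined with the 1.5 × IQR outlier rule. The notes do not say what to do when the product is an integer; the sketch assumes the common textbook convention of averaging that ordered value with the next one, and that part should be read as an assumption. The name `percentile` is hypothetical.

```python
import math


def percentile(data, p):
    """p-th percentile (0 < p < 100) using the position rule from the notes."""
    ordered = sorted(data)              # step 1: order the n values
    n = len(ordered)
    product = n * p / 100               # step 2: compute n*p/100
    if product != int(product):         # step 3: not an integer, round up
        k = math.ceil(product)
        return ordered[k - 1]           # k is a 1-based position
    # ASSUMPTION (not stated in the notes): for an integer product k,
    # average the k-th and (k+1)-th ordered values.
    k = int(product)
    return (ordered[k - 1] + ordered[k]) / 2


data = [1, 3, 10, 4, 6, 100]
q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, outliers)  # 3 10 7 [100]
```

With this rule the 50th percentile of the same data is 5.0, which agrees with the median computed from the even-n rule.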


Variables

Quantitative

Recall that a variable is a characteristic or quantity to be measured.

quantitative variables take numerical values that we can manipulate arithmetically

Categorical

Places a unit into one of several categories:

EX: Gender, race, political party

Think of a sample proportion: $\hat{p} = \frac{\text{number of units in the category}}{n}$


Variance

How is the data spread out?

range

Difference between maximum and minimum (max − min)
very sensitive to outliers

sample variance

(for entire data set)

deviation from mean of each item in data set: $x_i - \bar{x}$
calculate sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (don't forget to square units!)
Sample standard deviation (the ± deviation): $s = \sqrt{s^2}$
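
The three steps can be sketched in Python as follows, assuming the usual n − 1 divisor for the sample variance. The function names are illustrative, and the data set is the one from the mean/median example.

```python
import math


def sample_variance(data):
    """Sample variance s^2: sum of squared deviations divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    deviations = [x - x_bar for x in data]   # deviation of each item from the mean
    return sum(d ** 2 for d in deviations) / (n - 1)


def sample_std(data):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(sample_variance(data))


data = [1, 3, 10, 4, 6]
s2 = sample_variance(data)   # about 11.7 (in squared units)
s = sample_std(data)         # about 3.42 (in the original units)
print(round(s2, 2), round(s, 2))
```

The variance is in squared units, while the standard deviation is back in the units of the data, which is why it is the one usually quoted as the ± spread.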


Effects of math on Mean and Variance

if a professor added 5 points to everyone's test

  • the mean would increase by 5 points ($\bar{x}_{new} = \bar{x} + 5$)
  • deviation and variance would not change

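if a company raises every salary by 10%

  • mean would change by a factor of 1.1 ($\bar{x}_{new} = 1.1\,\bar{x}$)
  • variance would change by a factor of $1.1^2 = 1.21$ ($s^2_{new} = 1.21\,s^2$)
  • standard deviation would change by a factor of 1.1 ($s_{new} = 1.1\,s$)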
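
Both cases can be checked numerically. The sketch below uses Python's standard `statistics` module; the test scores and salaries are made-up values for illustration only.

```python
import math
import statistics

scores = [62, 75, 80, 88, 95]            # hypothetical test scores
curved = [x + 5 for x in scores]         # professor adds 5 points to everyone

salaries = [40000, 52000, 61000, 75000]  # hypothetical salaries
raised = [x * 1.1 for x in salaries]     # company raises every salary by 10%

# Adding a constant shifts the mean and leaves the spread alone
shift_in_mean = statistics.mean(curved) - statistics.mean(scores)
same_variance = statistics.variance(curved) == statistics.variance(scores)

# Multiplying by a constant c scales the mean and std dev by c, the variance by c^2
mean_ratio = statistics.mean(raised) / statistics.mean(salaries)
var_ratio = statistics.variance(raised) / statistics.variance(salaries)
std_ratio = statistics.stdev(raised) / statistics.stdev(salaries)

print(shift_in_mean, same_variance)
print(math.isclose(mean_ratio, 1.1),
      math.isclose(var_ratio, 1.21),
      math.isclose(std_ratio, 1.1))
```

The ratios are compared with `math.isclose` rather than `==` because multiplying by 1.1 introduces small floating-point rounding.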


Box Plots

Male and Female Height

Way to show center and spread of data set

Box extends from Q1 to Q3 with a line drawn inside at the median. Whiskers extend from both sides of the box by a length of 1.5 × IQR (can be truncated to the max/min of the data)

Very useful for comparing two variables
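
The numbers behind a box plot can be computed with a short Python sketch. Two assumptions to flag: the heights are invented, and the quartiles come from `statistics.quantiles` with its default method, which can differ slightly from the n*p/100 rule given earlier. `box_plot_summary` is a hypothetical helper name.

```python
import statistics


def box_plot_summary(data):
    """Box edges, median line, whisker ends, and points beyond the whiskers."""
    q1, med, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    # Whiskers reach 1.5 * IQR past the box, truncated at the min/max of the data
    low_whisker = max(min(data), q1 - 1.5 * iqr)
    high_whisker = min(max(data), q3 + 1.5 * iqr)
    outliers = [x for x in data if x < low_whisker or x > high_whisker]
    return {
        "q1": q1, "median": med, "q3": q3,
        "low_whisker": low_whisker, "high_whisker": high_whisker,
        "outliers": outliers,
    }


# Hypothetical heights in inches
heights = [61, 63, 64, 65, 66, 67, 68, 69, 70, 72, 80]
summary = box_plot_summary(heights)
print(summary)
```

Computing one summary per group (for example, one for male heights and one for female heights) gives the side-by-side comparison that box plots are used for.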


Tuesday, January 25, 2011


Experiments

Types of data collection

Observational study
observe group and measure quantities. Very passive / non-invasive (i.e. does not influence the group)
all terms studied in Lecture 1 are for this group
Experiment
deliberately expose a group to certain environments/treatments and observe responses.

New Vocabulary

Experimental Group
experimental units subjected to a real treatment
Control Group
experimental units subjected to the same conditions as experimental groups, but no treatment is imposed
Confounding effects
differences between the control group and experimental groups, other than the treatment itself, that could also explain a difference in responses
should be avoided when possible.
Blinding
Subjects are not told which group they are in; if people in the control group know that they are in the control group, the data can be affected
e.g. subjects in a medical study can be given either the treatment or a placebo without being told which
Double-blinding
In addition to the subjects, all other people involved in the experiment do not know whether a subject is in the control or experimental group
e.g. doctors in a medical study are not told whether a patient is in the experimental or control group