1 4 where do we get the data

Post on 13-Dec-2014



Where do we get the data?

• Census vs sample

• Observations

– “Watching” real activity and collecting data

– Opinion polls

• Experiments

– Running the activity and measuring the results

– Relatively easy to control

For Example

TV watching and test scores

• Observation

– Use a survey that asks your sampled students about their TV-watching habits and their test scores.

• Experiment

– Design varied TV-watching schedules for your samples

– Design and/or administer a test to measure learning

Car crashworthiness and make

• Observation

– Collect accident data and auto repair data

• Experiment

– Deliberately crash cars and measure the results

Live Example

Movie popularity

• Observation

• Experiment

Cell Phone Reception

• Observation

• Experiment

Variables

• Variable refers to any characteristic that could affect an outcome being tested.

– Variables have to be measurable

• What characteristics affect SAT scores?

• What characteristics affect car crashworthiness?

Varying and Controlling

• In a statistics study, we test whether one variable really has an effect on the outcome.

• We will vary the test variable

– Change the value to see if the outcome also changes

• To prevent confounding, we will control the other variables

– Confounding: The effects of two or more variables cannot be distinguished

– Control: Samples with similar values for the other variables may be grouped

Treatment

• When running an experiment that tests a variable:

– The sample will be split into groups

– Each group will be administered one level of the variable

– Who or what is assigned to each group is randomly determined.

• In some experiments the test variable is all or none.

– E.g., a drug

– One group, the treatment group, receives all (called the treatment)

– The other group, the control group, receives nothing or a pretend treatment called a placebo
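The random split into a treatment group and a control group can be sketched in a few lines of Python. This is only a minimal illustration: the function name `assign_groups`, the subject labels, and the fixed seed are assumptions made for the example, not part of the original slides.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)      # a seed makes the split reproducible
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)          # chance, not the experimenter, decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subjects, labeled only for illustration.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because the shuffle decides who lands in which group, the experimenter cannot steer particular subjects toward the treatment or the placebo.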

Placebo Effect

• Subjects, especially those in the control group, might think they are being given the treatment and start to act accordingly.

• If the experiment is blinded, the subjects are not told whether they are receiving the real treatment or a placebo.

– The subjects should also not be told the outcome

• If the experiment is double blinded, the people administering the experiment are also not told.

Sampling

• Sampling: picking a subset of a population

• Sample’s characteristics should reflect the population’s in the same proportion

• E.g., our school’s demographic break-down is

         Frosh   Sophomore   Junior   Senior
Male     13%     12%         12%      13%
Female   13%     13%         11%      13%
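As a rough sketch of what "the same proportion" means in practice, the Python below turns the percentages from the table into head counts for a sample. The sample size of 200 and the helper name `proportional_quota` are assumptions chosen for illustration.

```python
# Population shares from the school demographic table, in whole percent.
population_share = {
    ("Male", "Frosh"): 13, ("Male", "Sophomore"): 12,
    ("Male", "Junior"): 12, ("Male", "Senior"): 13,
    ("Female", "Frosh"): 13, ("Female", "Sophomore"): 13,
    ("Female", "Junior"): 11, ("Female", "Senior"): 13,
}

def proportional_quota(shares, sample_size):
    """Return how many sample members each group needs to mirror the population."""
    total = sum(shares.values())
    return {group: round(sample_size * pct / total) for group, pct in shares.items()}

# Hypothetical sample size; rounding may not sum exactly for other sizes.
quota = proportional_quota(population_share, 200)
for group, count in quota.items():
    print(group, count)
```

With a sample of 200, a group that makes up 13% of the school should contribute 26 students, and the 11% group should contribute 22.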

Sample Scheme Characteristics

• Random sample

– Each member of the population has an equal chance to be selected

• Simple random sample

– Each subset of the population of a given size has an equal chance of being selected.
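One way to see the simple random sample property is to list every possible subset of a tiny population and check that repeated draws land on each subset about equally often. The sketch below assumes a made-up four-member population and uses Python's standard `random.sample`; the seed and the number of draws are arbitrary choices.

```python
import random
from itertools import combinations

population = ["A", "B", "C", "D"]   # hypothetical population
sample_size = 2

# Every possible subset of size 2: AB, AC, AD, BC, BD, CD.
possible_samples = list(combinations(population, sample_size))

rng = random.Random(0)
counts = {subset: 0 for subset in possible_samples}
for _ in range(6000):
    # random.sample draws without replacement; sort so ("B", "A") counts as ("A", "B").
    drawn = tuple(sorted(rng.sample(population, sample_size)))
    counts[drawn] += 1

for subset, count in counts.items():
    print(subset, count)
```

Each of the six subsets should appear roughly 1000 times out of 6000 draws, which is what "each subset has an equal chance" looks like in practice.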

Sampling Strategies

• Self-selected

– Population members volunteer

– E.g., Call-in phone lines

– Easy to implement

– Difficult to get a proportional sample

– Susceptible to bias

• Convenience sampling

– Whoever happens by

– E.g., Mall surveys

– Also susceptible to bias

Sampling Strategies

• Random sample

– Each member of the population is selected at random

– E.g., Generate random student IDs

• Systematic sampling

– Population is put into some order

– Select some starting point, then select every nth individual in a population

– The starting point and maybe the interval (n) are picked at random
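Systematic sampling is easy to sketch with list slicing. In the Python below, the student IDs, the interval of 10, and the function name `systematic_sample` are assumptions for illustration; only the starting point is picked at random here.

```python
import random

def systematic_sample(ordered_population, interval, seed=None):
    """Pick a random starting point, then take every `interval`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)            # random start within the first interval
    return ordered_population[start::interval]

# Hypothetical ordered population: 100 consecutive student IDs.
student_ids = list(range(1000, 1100))
sample = systematic_sample(student_ids, interval=10, seed=1)
print(sample)
```

Once the start is chosen, every other pick is fixed, so the ordering of the population matters: a pattern in the order that repeats every 10 students would show up in the sample.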

More Sampling Selection and Collection

• Stratified sampling

– Divide the population into groups.

• Groups are determined by control variables

– Randomly sample within each group

• Cluster sample

– Divide the population into clusters, randomly pick a cluster, then sample all (or most) members of the cluster
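The difference between the two schemes can be sketched side by side in Python. Everything in the example is hypothetical: the student records, the choice of grade as the stratifying variable, homerooms as clusters, and the helper names `stratified_sample` and `cluster_sample`.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Group members by stratum, then randomly sample within every group."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    return {name: rng.sample(members, min(per_stratum, len(members)))
            for name, members in strata.items()}

def cluster_sample(clusters, rng):
    """Randomly pick one cluster and take all of its members."""
    chosen = rng.choice(sorted(clusters))
    return chosen, list(clusters[chosen])

rng = random.Random(7)

# Hypothetical students: 10 per grade, spread over 5 homerooms.
grades = ["Frosh", "Sophomore", "Junior", "Senior"] * 10
students = [{"id": i, "grade": grade, "homeroom": f"room_{i % 5}"}
            for i, grade in enumerate(grades)]

# Stratified: 3 random students from each grade.
by_grade = stratified_sample(students, lambda s: s["grade"], 3, rng)

# Cluster: one random homeroom, everyone in it.
homerooms = defaultdict(list)
for s in students:
    homerooms[s["homeroom"]].append(s)
room, members = cluster_sample(homerooms, rng)
print(room, len(members))
```

Stratified sampling guarantees every group is represented, while cluster sampling is cheaper to collect but depends on the chosen cluster resembling the whole population.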

Example: Student Opinion Poll

• Self-selecting

• Random sampling

• Systematic sampling

• Convenience sampling

• Stratified sampling

• Cluster sample

Example: Crashworthiness

• Self-selecting

• Random sample

• Systematic sampling

• Convenience sampling

• Stratified sampling

• Cluster sample

Bias

• Sampling members of a population…

– With a specific characteristic

– That will give a specific outcome

– “Rigging the game”

• Selection and undercoverage bias

– E.g., FOX news and health care

• Non-response bias

– Counting non-response as one answer

• Voluntary response bias

More on Bias

• If I want my test to support the claim that watching too much TV hurts SAT scores, how do I rig the sample?

• If I want my test to support the claim that US cars are safer than Japanese cars, how do I rig the sample?