Chapter 12 Sample Surveys Producing Valid Data “If you don’t believe in random sampling, the next time you have a blood test tell the doctor to take it all.”
Transcript
Slide 1
Slide 2
Chapter 12 Sample Surveys Producing Valid Data
"If you don't believe in random sampling, the next time you have a
blood test tell the doctor to take it all."
Slide 3
The election of 1948
                 The Predictions               The Results
The Candidates   Crossley   Gallup   Roper
Truman           45         44       38        50
Dewey            50         50       53        45
Slide 4
Beyond the Data at Hand to the World at Large
We have learned ways to display, describe, and summarize data, but
have been limited to examining the particular collection of data we
have.
We'd like (and often need) to stretch beyond the data at hand to the
world at large.
Let's investigate three major ideas that will allow us to make this
stretch.
Slide 5
3 Key Ideas That Enable Us to Make the Stretch
Slide 6
Idea 1: Examine a Part of the Whole
The first idea is to draw a sample. We'd like to know about an
entire population of individuals, but examining all of them is
usually impractical, if not impossible. We settle for examining a
smaller group of individuals, a sample, selected from the population.
Slide 7
Examples
1. Think about sampling something you are cooking: you taste
(examine) a small part of what you're cooking to get an idea about
the dish as a whole.
2. Opinion polls are examples of sample surveys, designed to ask
questions of a small group of people in the hope of learning
something about the entire population.
Slide 8
Sampling methods
Convenience sampling: Just ask whoever is around.
Example: "Man on the street" survey (cheap, convenient, often quite
opinionated or emotional, and so now very popular in TV journalism).
Which people, and on which street? Ask about gun control or
legalizing marijuana on the street in Berkeley or in some small town
in Idaho and you would probably get totally different answers. Even
within an area, answers would probably differ if you did the survey
outside a high school or a country western bar.
Bias: Opinions limited to individuals present.
Slide 9
Voluntary Response Sampling
Individuals choose to be involved. These samples are very
susceptible to bias because different people are motivated to
respond or not. Often called public opinion polls. These are not
considered valid or scientific.
Bias: Sample design systematically favors a particular outcome.
Example: Ann Landers, summarizing responses of readers: 70% of
(10,000) parents wrote in to say that having kids was not worth it;
if they had to do it over again, they wouldn't.
Bias: Most letters to newspapers are written by disgruntled people.
A random sample showed that 91% of parents WOULD have kids again.
Slide 10
CNN on-line surveys
Bias: People have to care enough about an issue to bother replying.
This sample is probably a combination of people who hate wasting the
taxpayers' money and animal lovers.
Slide 11
Bias
Bias is the bane of sampling, the one thing above all to avoid.
There is usually no way to fix a biased sample and no way to salvage
useful information from it. The best way to avoid bias is to select
individuals for the sample at random. The value of deliberately
introducing randomness is one of the great insights of Statistics
(Idea 2).
Slide 12
Idea 2: Randomize
Randomization can protect you against factors that you know are in
the data. It can also help protect against factors you are not even
aware of. Randomizing protects us from the influences of all the
features of our population, even ones that we may not have thought
about. Randomizing makes sure that on the average the sample looks
like the rest of the population.
Slide 13
Idea 2: Randomize (cont.)
Individuals are randomly selected. No one group should be
overrepresented. Sampling randomly gets rid of bias. Random samples
rely on the absolute objectivity of random numbers. There are tables
and books of random digits available for random sampling.
Statistical software can generate random digits (e.g., =RAND() in
Excel, or the Ran# key on a calculator).
Slide 14
Idea 2: Randomize (cont.)
Not only does randomizing protect us from bias, it actually makes it
possible for us to draw inferences about the population when we see
only a sample.
Slide 15
Example: selecting a random sample
Listed in the table are the names of the 20 pharmacists on the
hospital staff. Use the random numbers listed below to select three
of them to be in the sample.
04905 83852 29350 91397 19994 65142 05087 11232
Slide 16
Idea 3: It's the Sample Size!!
How large a random sample do we need for the sample to be reasonably
representative of the population? It's the size of the sample, not
the size of the population, that makes the difference in sampling.
The fraction of the population that you've sampled doesn't matter.
It's the sample size itself that's important.
Exception: If the population is small enough and the sample is more
than 10% of the whole population, the population size can matter.
Slide 17
Example
i) In the city of Chicago, Illinois, 1,000 likely voters are
randomly selected and asked who they are going to vote for in the
Chicago mayoral race.
ii) In the state of Illinois, 1,000 likely voters are randomly
selected and asked who they are going to vote for in the Illinois
governor's race.
iii) In the United States, 1,000 likely voters are randomly selected
and asked who they are going to vote for in the presidential
election.
Which survey has more accuracy? All the surveys have the same
accuracy.
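The claim that three samples of 1,000 are equally accurate can be checked with the usual margin-of-error formula for a sample proportion, including the finite population correction. The sketch below is only an illustration: the population counts are rounded, made-up figures, and the choices p = 0.5 and z = 1.96 (a 95% margin) are assumptions, not values from the slides.

```python
import math

def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion from an SRS.

    The second factor is the finite population correction; it is close
    to 1 unless the sample is a sizable share of the population.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc

# Hypothetical, rounded counts of likely voters (illustration only).
populations = {
    "Chicago": 1_000_000,
    "Illinois": 5_000_000,
    "United States": 150_000_000,
}

margins = {name: margin_of_error(1000, size) for name, size in populations.items()}
for name, moe in margins.items():
    print(f"{name}: +/- {100 * moe:.2f} percentage points")

# A small population, where n = 1,000 is well over 10% of the whole.
small_town = margin_of_error(1000, 5000)
print(f"Town of 5,000: +/- {100 * small_town:.2f} percentage points")
```

Under these assumptions the three margins agree to within a few thousandths of a percentage point, while the town of 5,000 gets a visibly smaller margin, which is the 10% exception at work.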
Slide 18
Idea 3: It's the Sample Size!!
Chicken soup
Blood samples
Slide 19
Does a Census Make Sense?
Why bother worrying about the sample size? Wouldn't it be better to
just include everyone and sample the entire population? Such a
special sample is called a census.
Slide 20
Does a Census Make Sense? (cont.)
There are problems with taking a census:
Practicality: It can be difficult to complete a census; there always
seem to be some individuals who are hard to locate or hard to
measure.
Timeliness: Populations rarely stand still. Even if you could take a
census, the population changes while you work, so it's never
possible to get a perfect measure.
Expense: Taking a census may be more complex than sampling.
Accuracy: A census may not be as accurate as a good sample due to
data entry error, inaccurate (made-up?) data, tedium.
Slide 21
Population versus sample
Population: The entire group of individuals in which we are
interested but can't usually assess directly.
Example: All humans, all working-age people in California, all
crickets.
A parameter is a number describing a characteristic of the
population.
Sample: The part of the population we actually examine and for which
we do have data. How well the sample represents the population
depends on the sample design.
A statistic is a number describing a characteristic of a sample.
Slide 22
Sample Statistics Estimate Parameters
Values of population parameters are unknown; in addition, they are
unknowable.
Example: The distribution of heights of adult females (at least 18
yrs of age) in the United States is approximately symmetric and
mound-shaped with mean μ. μ is a population parameter whose value is
unknown and unknowable.
The heights of 1500 females are obtained from a sample of government
records. The sample mean x̄ of the 1500 heights is calculated to be
64.5 inches. The sample mean x̄ is a sample statistic that we use to
estimate the unknown population parameter μ.
Slide 23
We typically use Greek letters to denote parameters and Latin
letters to denote statistics.
Slide 24
Slide 25
Simple Random Sample
A simple random sample (SRS) of size n consists of n units from the
population chosen in such a way that every set of n units has an
equal chance to be the sample actually selected.
Slide 26
Simple Random Samples (cont.)
To select a sample at random, we first need to define where the
sample will come from. The sampling frame is a list of individuals
from which the sample is drawn.
E.g., to select a random sample of students from a college, we might
obtain a list of all registered full-time students.
When defining the sampling frame, we must deal with the details of
defining the population: are part-time students included? How about
current study-abroad students?
Once we have our sampling frame, the easiest way to choose an SRS is
with random numbers.
Slide 27
Warning!
If some members of the population are not included in the sampling
frame, they cannot be part of the sample!! (e.g., using a telephone
book as the sampling frame)
Population: Wal-Mart shoppers. Sampling frame?
Slide 28
Example: simple random sample
An academic dept wishes to randomly choose a 3-member committee from
the 28 members of the dept.
00 Abbott      07 Goodwin      14 Pillotte    21 Theobald
01 Cicirelli   08 Haglund      15 Raman       22 Vader
02 Crane       09 Johnson      16 Reimann     23 Wang
03 Dunsmore    10 Keegan       17 Rodriguez   24 Wieczoreck
04 Engle       11 Lechtenberg  18 Rowe        25 Williams
05 Fitzpatk    12 Martinez     19 Sommers     26 Wilson
06 Garcia      13 Nguyen       20 Stone       27 Zink
Slide 29
Solution
Use a random number table; read 2-digit pairs until you have chosen
3 committee members, skipping any pair that is not a label from 00
to 27 or that has already been chosen.
For example, if a row of a random number table is
76509 47069 86378 41797 11910 49672 88575
the committee is Rodriguez (17), Lechtenberg (11), Engle (04).
Your calculator generates random numbers; you can also generate
random numbers using Excel.
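The table-reading procedure can be sketched in a few lines of Python. The function name `select_from_digit_row` is made up for this illustration; the names and the digit row are the ones on the slides (with "Fitzpatk" kept as abbreviated there). The last two lines show the software alternative, where `random.sample` draws an SRS directly.

```python
import random

def select_from_digit_row(row, labels, k):
    """Read two-digit pairs left to right and keep each pair that is a
    valid label not already chosen, stopping once k are selected."""
    digits = row.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        pair = digits[i:i + 2]
        if pair in labels and pair not in chosen:
            chosen.append(pair)
        if len(chosen) == k:
            break
    return chosen

names = [
    "Abbott", "Cicirelli", "Crane", "Dunsmore", "Engle", "Fitzpatk",
    "Garcia", "Goodwin", "Haglund", "Johnson", "Keegan", "Lechtenberg",
    "Martinez", "Nguyen", "Pillotte", "Raman", "Reimann", "Rodriguez",
    "Rowe", "Sommers", "Stone", "Theobald", "Vader", "Wang",
    "Wieczoreck", "Williams", "Wilson", "Zink",
]
# Labels "00" through "27", matching the numbering on the slide.
labels = {f"{i:02d}": name for i, name in enumerate(names)}

row = "76509 47069 86378 41797 11910 49672 88575"
picked = select_from_digit_row(row, labels, 3)
committee = [labels[p] for p in picked]
print(committee)  # ['Rodriguez', 'Lechtenberg', 'Engle']

# Software alternative: let the random number generator pick the SRS.
software_committee = random.sample(names, 3)
print(software_committee)
```

Most pairs in the row (76, 50, 94, ...) fall outside 00 to 27 and are simply skipped, which is why a single row is enough for only three picks.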
Slide 30
Sampling Variability
Suppose we had used the row
19689 90332 04315 21358 97248 11188 39062
Our sample would have been 19 Sommers, 03 Dunsmore, 04 Engle.
Slide 31
Sampling Variability
Samples drawn at random generally differ from one another. Each draw
of random numbers selects different people for our sample. These
differences lead to different values for the variables we measure.
We call these sample-to-sample differences sampling variability.
Variability is OK; bias is bad!!
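A small simulation makes sampling variability visible. Everything in this sketch is made up for illustration: the population of 10,000 heights, its mean and spread, the sample size of 100, and the seed. It draws five simple random samples from the same population and compares their means.

```python
import random
import statistics

rng = random.Random(12)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in inches (made-up values).
population = [rng.gauss(64.5, 2.5) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw several simple random samples of the same size and compare means.
sample_means = [
    statistics.mean(rng.sample(population, 100)) for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```

The five sample means differ from one another, yet they scatter around the population mean rather than leaning to one side. That scatter is variability; a consistent lean would be bias.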
Slide 32
Slide 33
Stratified Random Sampling
This sampling procedure separates the population into mutually
exclusive sets (strata), and then selects simple random samples from
each stratum.
Examples of strata:
Sex: male, female
Age: under 20, 20-30, 31-40, 41-50
Occupation: professional, clerical, blue-collar
Slide 34
Stratified Random Sampling
With this procedure we can acquire information about:
the whole population
each stratum
the relationships among strata.
Slide 35
There are several ways to build the stratified sample. For example,
keep the proportion of each stratum in the population. A sample of
size 1,000 is to be drawn.
Stratum   Income            Population proportion   Stratum size
1         under $15,000     25%                     250
2         $15,000-29,999    40%                     400
3         $30,000-50,000    30%                     300
4         over $50,000      5%                      50
Total                                               1,000
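The proportional allocation in the table is just each stratum's population share times the total sample size. A minimal sketch, using the percentages from the table; the function name `proportional_allocation` is made up for this illustration.

```python
def proportional_allocation(percent_by_stratum, sample_size):
    """Give each stratum a share of the sample equal to its share of
    the population. With awkward percentages the rounded sizes may not
    add up exactly to sample_size and would need a manual adjustment."""
    return {
        stratum: round(percent * sample_size / 100)
        for stratum, percent in percent_by_stratum.items()
    }

# Population proportions (in percent) from the income table.
percent_by_stratum = {
    "under $15,000": 25,
    "$15,000-29,999": 40,
    "$30,000-50,000": 30,
    "over $50,000": 5,
}

sizes = proportional_allocation(percent_by_stratum, 1000)
for stratum, size in sizes.items():
    print(f"{stratum}: {size}")
print("Total:", sum(sizes.values()))
```

A simple random sample of the computed size is then drawn separately within each stratum.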
Slide 36
Cluster Sampling
Sometimes stratifying isn't practical and simple random sampling is
difficult. Splitting the population into similar parts or clusters
can make sampling more practical. Then we could select one or a few
clusters at random and perform a census within each cluster. This
sampling design is called cluster sampling. If each cluster fairly
represents the full population, cluster sampling will give us an
unbiased sample.
Slide 37
Cluster Sampling
Cluster sampling is useful when:
it is difficult and costly to develop a complete list of the
population members (making it difficult to develop a simple random
sampling procedure), e.g., all items sold in a grocery store;
the population members are widely dispersed geographically, e.g.,
all Toyota dealerships in North Carolina.
Slide 38
Mean length of sentences in our course text
We would like to assess the reading level of our course text based
on the length of the sentences.
Simple random sampling would be awkward: number each sentence in the
book?
Better way: choose a few pages at random (the pages are the
clusters, and it's reasonable to assume that each page is
representative of the entire text), then count the length of the
sentences on those pages.
Slide 39
Cluster sampling - not the same as stratified sampling!!
We stratify to ensure that our sample represents different groups in
the population, and sample randomly within each stratum. Strata are
homogeneous (e.g., male, female) but differ from one another.
Clusters are more or less alike, each heterogeneous and resembling
the overall population. We select clusters to make sampling more
practical or affordable. We conduct a census on or select an SRS
from each selected cluster.
Slide 40
Multistage Sampling
Sometimes we use a variety of sampling methods together. Sampling
schemes that combine several methods are called multistage samples.
Most surveys conducted by professional polling organizations and
government agencies use some combination of stratified and cluster
sampling as well as simple random sampling.
Slide 41
Example: The American Community Survey
The American Community Survey (ACS) is an ongoing survey.
Information from the survey generates data that help determine how
more than $400 billion in federal and state funds are distributed
each year.
The data are combined into statistics that are used to help decide
everything from school lunch programs to new hospitals.
http://www.census.gov/acs/www/
Slide 42
Mean length of sentences in our course text, cont.
In attempting to assess the reading level of our course text:
we might worry that it starts out easy and gets harder as the
concepts become more difficult;
we want to avoid samples that select too heavily from early or from
late chapters.
Suppose our course text has 5 sections, with several chapters in
each section.
Slide 43
Mean length of sentences in our course text, cont.
We could:
i) randomly select 1 chapter from each section
ii) randomly select a few pages from each of the selected chapters
iii) if altogether this makes too many sentences, we could randomly
select a few sentences from each page.
So what is our sampling strategy?
i) we stratify by section of the book
ii) we randomly choose a chapter to represent each stratum (section)
iii) within each chapter we randomly choose pages as clusters
iv) finally, we choose an SRS of sentences within each cluster
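The four stages can be traced in code on a toy book. This is only a sketch under stated assumptions: the book's shape (5 sections, 4 chapters each, 20 pages per chapter, 15 sentences per page), the sentence lengths, the function name `multistage_sample`, and the choice of 3 pages and 5 sentences are all made up for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical book: section -> chapter -> page -> list of sentence
# lengths in words. All values are made up.
book = {
    section: {
        chapter: {
            page: [rng.randint(5, 40) for _ in range(15)]
            for page in range(1, 21)
        }
        for chapter in range(1, 5)
    }
    for section in range(1, 6)
}

def multistage_sample(book, rng, pages_per_chapter=3, sentences_per_page=5):
    """Stratify by section, pick one chapter per section, pick pages as
    clusters, then take an SRS of sentences within each chosen page."""
    lengths = []
    for section, chapters in book.items():              # strata
        chapter = rng.choice(list(chapters))            # one chapter per stratum
        pages = rng.sample(list(chapters[chapter]), pages_per_chapter)  # clusters
        for page in pages:
            sentences = chapters[chapter][page]
            lengths.extend(rng.sample(sentences, sentences_per_page))   # SRS
    return lengths

sample = multistage_sample(book, rng)
print(len(sample), "sentences sampled")
print(f"mean sentence length: {sum(sample) / len(sample):.1f} words")
```

Because every section contributes the same number of sentences, the sample cannot lean too heavily on early or late chapters.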
Slide 44
Systematic Sampling
Sometimes we draw a sample by selecting individuals systematically.
For example, you might survey every 10th person on an alphabetical
list of students.
To make it random, you must still start the systematic selection
from a randomly selected individual.
When there is no reason to believe that the order of the list could
be associated in any way with the responses sought, systematic
sampling can give a representative sample.
Systematic sampling can be much less expensive than true random
sampling.
When you use a systematic sample, you need to justify the assumption
that the systematic method is not associated with any of the
measured variables.
Slide 45
Systematic Sampling - example
You want to select a sample of 50 students from a college dormitory
that houses 500 students.
On a list of all students living in the dorm, number the students
from 001 to 500.
Generate a random number between 001 and 010, and start with that
student.
Every 10th student in the list becomes part of your sample.
Questions:
1) does each student have an equal chance to be in the sample?
2) what is the chance that a student is included in the sample?
3) is this an SRS?
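One way to explore these questions is to list every sample the procedure can produce. The sketch below follows the dormitory example (500 students, every 10th, random start from 1 to 10); the function name `systematic_sample` and the seed are made up for illustration.

```python
import random
from collections import Counter
from math import comb

def systematic_sample(population_size, step, rng):
    """Pick a random start in 1..step, then take every step-th label."""
    start = rng.randint(1, step)
    return list(range(start, population_size + 1, step))

rng = random.Random(2024)  # fixed seed so the run is repeatable
sample = systematic_sample(500, 10, rng)
print(len(sample), sample[:5])

# Enumerate every sample this procedure can possibly produce.
possible_samples = [list(range(start, 501, 10)) for start in range(1, 11)]

# Count how many of the possible samples contain each student.
appearances = Counter(s for smp in possible_samples for s in smp)
print(set(appearances.values()))  # each student is in exactly one of the 10

# An SRS of 50 from 500 must allow every possible set of 50 students.
print(len(possible_samples), "possible systematic samples")
print(comb(500, 50) > len(possible_samples))
```

Each student appears in exactly one of the ten equally likely samples, so every student has the same 1-in-10 chance of selection. But only ten sets of 50 can ever be chosen (students 001 and 002 are never sampled together), so not every set of 50 has an equal chance, and the design does not meet the definition of an SRS.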