Date post: | 13-Apr-2017 |
Category: | Education |
Upload: | jason-edington |
Chapter 1 (Part 2): Sampling and Data
DOT PLOT (FIGURE 1.2, PAGE 6)
This is an example of a Dot Plot, which is a simple graph used to quickly organize and look at data.
What does this Dot Plot tell you about this sample?
PIE CHART (FIGURE 1.3, PAGE 13)
This is an example of a Pie Chart featuring qualitative data.
Why is this data qualitative?
HISTOGRAM (FIGURE 1.4, PAGE 13)
Here is an example of a Histogram. The class boundaries are 10 to less than 13, 13 to less than 16, 16 to less than 19, 19 to less than 22, and 22 to less than 25.
Approximately how many students have completed 16 or fewer credit hours?
WHICH GRAPH EXPLAINS IT BEST? (FIGURES 1.5, 1.6, PAGES 14, 15)
It is a good idea to look at a variety of graphs to see which is the most helpful in displaying data.
Take a look at these two graphs. Both are made from the same information.
Which graph (pie or bar) do you think displays the comparisons better?
DISTRIBUTION OF SPRING 2016 ENROLLED STUDENTS BY ETHNICITY FOR MENDOCINO COLLEGE
Spring 2016 Students                  Frequency   Percent
American Indian/Alaskan Native              227      5.3%
Asian                                       125      2.9%
Black, Non-Hispanic                         117      2.7%
Hispanic                                  1,346     31.6%
Native Hawaiian/Pacific Islander             15      0.4%
White, Non-Hispanic                       2,397     56.3%
Multi-Ethnicity                               9      0.2%
Unknown/No Response                          21      0.5%
Total                                     4,257    100.0%
CHART SORTED ALPHABETICALLY
Here, the chart is sorted alphabetically according to ethnicity.
[Bar chart: Ethnicity of Students, Spring 2016. Bars in alphabetical order: American Indian/Alaskan Native 5.3%; Asian 2.9%; Black, Non-Hispanic 2.7%; Hispanic 31.6%; Multi-Ethnicity 0.2%; Native Hawaiian/Pacific Islander 0.4%; Unknown/No Response 0.5%; White, Non-Hispanic 56.3%. Vertical axis: 0.0% to 60.0%.]
PARETO CHART
Here, we have the same information, but sorted from largest to smallest, making it easier to read and interpret.
[Pareto chart: Ethnicity of Students, Spring 2016. Bars from largest to smallest: White, Non-Hispanic 56.3%; Hispanic 31.6%; American Indian/Alaskan Native 5.3%; Asian 2.9%; Black, Non-Hispanic 2.7%; Unknown/No Response 0.5%; Native Hawaiian/Pacific Islander 0.4%; Multi-Ethnicity 0.2%. Vertical axis: 0.0% to 60.0%.]
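The reordering behind a Pareto chart can be sketched in a few lines of Python. This is only an illustration: the frequencies are copied from the enrollment table in this section, and the variable names are made up for the example.

```python
# Frequencies copied from the Spring 2016 enrollment table in this section.
enrollment = {
    "American Indian/Alaskan Native": 227,
    "Asian": 125,
    "Black, Non-Hispanic": 117,
    "Hispanic": 1346,
    "Native Hawaiian/Pacific Islander": 15,
    "White, Non-Hispanic": 2397,
    "Multi-Ethnicity": 9,
    "Unknown/No Response": 21,
}
total = sum(enrollment.values())

# Pareto order: largest category first.
pareto_order = sorted(enrollment.items(), key=lambda item: item[1], reverse=True)
for group, count in pareto_order:
    print(f"{group:<34}{count:>6,}{count / total:>8.1%}")
```

Sorting by frequency instead of by name is the only difference between the alphabetical chart and the Pareto chart; the data are the same.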
PIE CHART
Again, the chart is sorted alphabetically.
[Pie chart: Ethnicity of Students, Spring 2016. Wedges and legend in alphabetical order: American Indian/Alaskan Native 5.3%; Asian 2.9%; Black, Non-Hispanic 2.7%; Hispanic 31.6%; Multi-Ethnicity 0.2%; Native Hawaiian/Pacific Islander 0.4%; Unknown/No Response 0.5%; White, Non-Hispanic 56.3%.]
PIE CHART
Here, the legend is alphabetical, but the pie wedges are organized by size, making it a more visually informative graph.
[Pie chart: Ethnicity of Students, Spring 2016. Wedges ordered by size: White, Non-Hispanic 56.3%; Hispanic 31.6%; American Indian/Alaskan Native 5.3%; Asian 2.9%; Black, Non-Hispanic 2.7%; Unknown/No Response 0.5%; Native Hawaiian/Pacific Islander 0.4%; Multi-Ethnicity 0.2%.]
Statistics
• Last class, we covered a lot of the basic words used in descriptive statistics.
  • The part of statistics where we describe things.
  • Organizing and summarizing data
    • Graphing
    • Using numbers (finding an average, for example)
• We’ll also be looking at Inferential Statistics
  • Formal methods used for drawing conclusions from ‘good’ data.
• “If you thoroughly grasp the basics of statistics, you can be more confident in the decisions you make in life.”
Statistics
• We’re interested in more than just what is, and what data we’ve been able to gather.
  • We want to draw conclusions and make predictions in general.
• Suppose we know that 460 of 1000 people we asked said they are going to vote yes on Proposition X.
  • Can we say that 46% of the entire population will vote yes for this proposition?
  • The data set consisting of the 1000 yes’s and no’s that we collected is called a sample.
  • Any number that describes this sample (for instance that 46% are yes’s) is called a statistic.
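As a quick check of the Proposition X numbers, the statistic can be computed directly. This is a minimal sketch; the variable names are made up for illustration.

```python
# Proposition X poll from the slide: 460 "yes" answers out of 1000 people asked.
yes_count = 460
sample_size = 1000

# A statistic describes the sample; here it is the sample proportion.
sample_proportion = yes_count / sample_size
print(f"Sample proportion of yes votes: {sample_proportion:.0%}")
```

The result describes only the 1000 people who were asked. Whether it also describes the whole population is the question inferential statistics tries to answer.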
Inference
• In the case of Proposition X, the population is the entire potential collection of yes’s and no’s.
• A parameter is a number that is a property of the population.
• Samples have Statistics; Populations have Parameters.
• It might seem tempting to say that they’re equal, but they have totally different significances.
  • Thinking of our 46%: Samples and Statistics are realities (we counted them); Populations and Parameters are guesses (we’re guessing how many of the entire population will vote yes).
Inference
• An inference is a conclusion that you draw from information you receive.
• Inferential statistics takes information from a sample and applies it to the population.
  • Population – a collection of persons, things, or objects under study.
  • Sample – a portion (or subset) of the population, used to study and gain information about the population.
  • Data are the result of sampling from a population.
Probability
• How likely is it that 46% of the entire electorate actually votes yes on Proposition X?
  • That is, assuming we didn’t ask our question at the headquarters of ‘vote yes on Proposition X’ or ‘vote no on Proposition X’.
  • Or how close to this 46% can we expect the actual vote to be?
• Probability is a mathematical tool used to study randomness.
  • It deals with the chance (or likelihood) of an event occurring.
  • Probability is the tool used to transition from descriptive statistics to inferential statistics.
Probability
• If you toss a fair coin four times, the outcomes may not be two heads and two tails.
• However, if you toss the same coin 4,000 times, the outcomes will be close to half heads and half tails.
• The expected theoretical probability of heads in any one toss is ½ or 0.5.
• The theory of probability began with the study of games of chance such as poker.
• In your study of statistics, you will use the power of mathematics through probability calculations to analyze and interpret your data.
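The coin-toss claim can be tried out with a small simulation. This is a sketch, not a proof: the function name fraction_heads and the seed are made up for the example, and a different seed will give different numbers.

```python
import random

def fraction_heads(num_tosses, rng):
    """Toss a fair coin num_tosses times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

rng = random.Random(2017)  # arbitrary seed, chosen only for repeatability
print("4 tosses:   ", fraction_heads(4, rng))
print("4000 tosses:", fraction_heads(4000, rng))
```

With 4 tosses the fraction can only be 0, 0.25, 0.5, 0.75, or 1, so it often misses 0.5. With 4,000 tosses it typically lands very near 0.5.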
Other Key Terms
• Data – actual values of the variable
• Datum – a single value
• Mean – the average (sum of data divided by quantity of data)
• Proportion or ratio –
Sampling
• Gathering information about an entire population often costs too much or is not possible.
  • Instead, we use a sample of the population.
• A sample should have the same characteristics as the population it is representing.
  • Most statisticians use various methods of random sampling in an attempt to achieve this goal.
Sampling Techniques
• Simple Random Sample – every member of a population has an equal chance of being selected for the sample.
  • Suppose you want to form a study group with three other people.
  • To choose a simple random sample of size three from the other members of your class, assign each member of the class a number from 0 to 30 (assuming 31 classmates, not counting yourself).
Sampling Techniques
• Now, using your TI-83 or TI-84, go to MATH (left side, three buttons down)
  • Use your left arrow to go to PRB (probability)
  • Either scroll down to 5:randInt( or simply type 5.
  • Now, enter 0, 30) and hit ENTER 3 times. (If the same number comes up more than once, hit ENTER again.)
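The same selection can be sketched in Python for anyone without a calculator at hand. This is a minimal sketch, assuming the classmates are numbered 0 through 30 as described for the study-group example; the variable names are made up for illustration.

```python
import random

# Classmates numbered 0 through 30 (31 people, not counting yourself).
classmates = list(range(31))

# random.sample draws without repeats, so no number can come up twice.
study_group = random.sample(classmates, 3)
print(study_group)
```

Every classmate has the same chance of being picked, which is what makes this a simple random sample.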
Sampling Techniques
• There are other ways to do this sort of simple random sample
  • There are tables of random numbers
  • You could put each person’s name on a slip of paper and put them into a hat
Sampling Techniques
• Simple Random Sample – every member of a population has an equal chance of being selected for the sample.
  • This is the preferred method of sampling, but how do you actually accomplish that?
  • Say you’re doing a poll.
    • You assign a random number to each registered voter, select a certain number of these numbers, and contact those voters.
    • Some of them aren’t home
    • Some of them have had their phones disconnected
    • Some of them won’t talk to you
    • Sometimes the phone is answered by a three-year-old.
Sampling Techniques
• Another technique is called systematic sampling.
  • You could pick every tenth number in the phone book, for instance.
  • However, even if every person answered the phone and your question, you still wouldn’t have a truly random sample.
    • People with the same name would have their chances of being selected greatly decreased, since they are listed together and you are taking only every tenth entry.
    • Cell phones
    • People with no phones at all
  • Still, it’s less work than generating and assigning random numbers, and not at all a bad way to get close to a random sample.
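Systematic sampling is easy to sketch in code. The function systematic_sample and the phone_book list are hypothetical names for this example, and starting from a random position within the first ten entries is an assumption the slide does not spell out.

```python
import random

def systematic_sample(items, step, rng=random):
    """Pick a random starting point, then take every step-th item after it."""
    start = rng.randrange(step)
    return items[start::step]

phone_book = [f"listing {i}" for i in range(100)]  # hypothetical listings
print(systematic_sample(phone_book, 10))
```

Notice that two neighbouring listings can never both be chosen, which is the weakness described for people who share a name.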
Sampling Techniques
• Another technique is called stratified sampling.
  • Think of strata, or layers, of a rock.
  • Say you want to know how many units students are taking, but you want to be sure that you get a fairly equal number of men and women in your sample.
    • Instead of getting a random sample from the whole group, first split the group by sex and then get a random sample from each group.
  • You can do this with more than two groups as well
    • Suppose you want to make sure you get a fairly equal number of people born in the 1960’s, the 1970’s, the 1980’s, and the 1990’s.
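Here is a minimal sketch of stratified sampling using the birth-decade example. The roster is invented for illustration, and stratified_sample is a hypothetical helper, not a standard library function.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng=random):
    """Split the population into strata, then draw a simple random sample from each."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(group, per_stratum) for name, group in strata.items()}

# Hypothetical roster: (student id, decade of birth)
decades = ["1960s", "1970s", "1980s", "1990s"]
students = [(i, decades[i % 4]) for i in range(200)]

sample = stratified_sample(students, lambda s: s[1], per_stratum=5)
for decade, members in sample.items():
    print(decade, [student_id for student_id, _ in members])
```

Because the split happens first, every decade is guaranteed the same number of people in the sample, which a simple random sample cannot promise.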
Sampling Techniques
• Another technique is called cluster sampling.
  • This is actually how unemployment figures are estimated, and it’s often used in evaluating medical techniques.
  • Select, at random, a bunch of neighborhoods, or hospitals, and get the information about every person or procedure in the selected parts.
  • This is the reverse of stratifying, where you split first and then randomly select.
    • Here, you randomly select the groups and then try to do a census of the groups chosen.
• Census – the information about the entire population.
  • Why not just do a Census? Well, it’s very expensive, and not always possible.
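Cluster sampling can be sketched the same way. The neighborhoods and households below are invented for the example, and cluster_sample is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical neighborhoods and the households in each
neighborhoods = {
    f"neighborhood {n}": [f"household {n}-{h}" for h in range(4)]
    for n in range(10)
}

picked = cluster_sample(neighborhoods, 3)
for name, households in picked.items():
    print(name, households)
```

The randomness is in which groups are chosen. Inside a chosen group, everyone is included.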
Sampling Techniques
• What you’d like to avoid is a convenience sample.
  • Using results that are readily available or easily gathered.
  • You go online to see how much it’s going to rain in the next week. At the bottom of the page is a poll asking ‘Do you believe that the government can stop climate change?’
  • The problem here is, who is going to answer?
    • Self Selecting
    • Care about the question/answer
    • Triggered by the idea of climate change, or government intervention
  • We won’t know which groups, the yes’s or no’s, will respond more, but we know it’s not in any way a random response.
Replacement
• True random sampling is done with replacement.
  • Once a member is picked, that member goes back into the population and thus may be chosen once more.
• However, for practical reasons, in most populations, simple random sampling is done without replacement.
  • Most samples are taken from large populations and the sample tends to be small in comparison to the population.
  • Since this is the case, sampling without replacement is approximately the same as sampling with replacement because the chance of picking the same individual more than once with replacement is very low.
• Sampling without replacement becomes a mathematical issue only when the population is small.
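How low is "very low"? The chance of a repeat when sampling with replacement can be computed directly. This is a sketch; prob_of_repeat is a made-up name, and the population and sample sizes are arbitrary choices for illustration.

```python
def prob_of_repeat(population_size, sample_size):
    """Chance that sampling WITH replacement picks some individual more than once."""
    prob_all_distinct = 1.0
    for k in range(sample_size):
        # The (k+1)-th pick must avoid the k individuals already chosen.
        prob_all_distinct *= (population_size - k) / population_size
    return 1.0 - prob_all_distinct

print("Large population (1,000,000), sample of 10:", prob_of_repeat(1_000_000, 10))
print("Small population (30), sample of 10:       ", prob_of_repeat(30, 10))
```

For the large population a repeat almost never happens, so the two ways of sampling behave nearly the same. For the small population a repeat is more likely than not, which is why replacement matters there.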
Which of these are representative samples?
• To find the average GPA of all students, use all honor students as the sample.
• To find out the most popular cereal among young people under the age of ten, stand outside a large supermarket for three hours and speak to every twentieth child under age ten who enters the store.
• To determine the proportion of people taking public transportation to work, survey 20 people in New York City. Conduct the survey by sitting on a bench in Central Park and interviewing every person who sits next to you.
• To determine the average cost of a two-day stay in a hospital in Massachusetts, survey 100 hospitals across the state using simple random sampling.
Variation in Data
• Variation is present in any set of data.
• Suppose you have a six pack of 12 ounce beverages.
• If you were to actually measure the amount in each can, you might get the following amounts (in ounces):
  • 11.8, 12.1, 11.2, 10.8, 12, 12.3
• Measurements may vary because different people make the measurements, or because the exact amount, 12 ounces, was not put into the cans.
• Manufacturers regularly run tests to determine if the amount of beverage in a 12-ounce can falls within the desired range. (Quality Control.)
Variation in Data
• Your data may vary from someone else’s data.
  • This is natural.
  • If the results are very different, you both should probably reevaluate your data-taking methods and your accuracy.
Variation in Samples
• Two or more samples from the same population, taken randomly, and having close to the same characteristics as the population will likely be different from each other.
• Suppose you and your study partner each decide to study the average amount of time Mendocino College students study per week.
  • You decide that you will only choose from full time students.
  • Even if you use the same sampling technique, and the same sample size (say, 100 students), it is highly likely that your samples will be different.
  • Neither would be wrong, though.
Variation in Samples
• What contributes to making your samples different?
• If you both took larger samples (say, 200 students), your sample results (the average amount of time a student studies) might be closer to the actual population average.
• Still, your samples would likely be different from each other.
  • This variability in samples cannot be stressed enough.
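A small simulation shows how two people sampling the same population get different answers. Everything here is invented for illustration: the population of 3,000 students, the study hours, and the seed are assumptions, not Mendocino College data.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability

# Hypothetical population: weekly study hours for 3,000 full-time students
population = [max(0.0, rng.gauss(15, 5)) for _ in range(3000)]

for size in (100, 200):
    mine = statistics.mean(rng.sample(population, size))
    partner = statistics.mean(rng.sample(population, size))
    print(f"n={size}: my mean {mine:.1f}, partner's mean {partner:.1f}")

print(f"population mean {statistics.mean(population):.1f}")
```

Neither sample mean is wrong. Both are estimates of the same population mean, and larger samples tend to land closer to it.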
Size of a Sample
• The size of a sample (also called number of observations) is important.
  • Samples of only a few hundred observations, or even smaller, are sufficient for many purposes.
  • In polling, samples that are from 1,200 to 1,500 observations are considered large enough and good enough if the survey is random and done well.
  • We will learn more about this when we study confidence intervals.
• Just because a sample is large does not mean it is ‘good’.
  • Many large samples are biased.
  • Example: Call-in surveys
    • People choose to call in, causing bias.
Sampling Errors and Bias
• When you analyze data, it is important to be aware of sampling errors and nonsampling errors.
  • The actual process of sampling causes sampling errors.
    • One example is that the sample may not be large enough.
  • Factors not related to the sampling process cause nonsampling errors.
    • A defective counting device can cause a nonsampling error.
• In reality, a sample will never be exactly representative of the population so there will always be some sampling error.
  • As a rule, the larger the sample, the smaller the sampling error.
• Sampling Bias is created when a sample is collected from a population and some members of the population are not as likely to be chosen as others.
  • This can cause incorrect conclusions to be drawn about the population being studied.
Critical Evaluation
• Problems with samples
  • A sample must be representative of the population
  • A sample that is not representative of the population is biased.
  • Biased samples that are not representative of the populations give results that are inaccurate and not valid.
• Self Selected Samples
  • Responses only by people who choose to respond are often unreliable
    • Call-in Surveys
    • Website Surveys
Critical Evaluation
• Sample Size Issues
  • Samples that are too small may be unreliable
  • Larger samples are better, if possible
  • In some situations, having small samples is unavoidable and can still be used to draw conclusions.
    • Examples: crash testing cars, medical testing for rare conditions
• Undue Influence
  • Collecting data or asking questions in a way that influences the response
    • Example: How much do you think ObamaCare has hurt the economy?
Critical Evaluation
• Non-response or refusal of subject to participate
  • The collected responses may no longer be representative of the population
  • Often, people with strong positive or negative opinions may answer surveys
  • This can affect the results
• Causality
  • A relationship between two variables does not mean that one causes the other to occur
  • They may be related (correlated) because of their relationship through a different variable
  • They may also just look correlated with no connection whatsoever
Critical Evaluation
• Self-funded or self-interest studies
  • A study performed by a person or organization in order to support their claim.
  • Is the study impartial?
  • Read the study carefully to evaluate the work.
  • Do not automatically assume that the study is good, but do not automatically assume it is bad, either.
  • Evaluate it on its merits and the work done
Critical Evaluation
• Misleading use of data
  • Improperly displayed graphs, incomplete data, or lack of context
• Confounding
  • When the effects of multiple factors on a response cannot be separated
  • Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.
Answers and Rounding Off
• Carry your answer out to one more decimal place than was present in the original data
  • Example: The mean of 5, 7, and 11 is 7.666666…
  • So, state that the mean is 7.7
• Round off only at the final answer
• You try it: find the sum of the means of the following two data sets
  • 5, 7, 9
  • 2, 3, 9
  • (Find the mean of each, and then add together)
  • (When you get your answer, check with your neighbor.)
  • (When you agree, check with another pair of people.)
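The rounding rule can be checked in Python using the 5, 7, 11 example. This is a small sketch with made-up variable names, and it deliberately leaves the "You try it" exercise unanswered.

```python
from statistics import mean

data = [5, 7, 11]

exact_mean = mean(data)          # keep full precision while calculating
reported = round(exact_mean, 1)  # the data have 0 decimal places, so report 1

print(exact_mean)
print(reported)
```

Rounding happens once, at the very end. Any intermediate values should be carried at full precision.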