Date post: | 14-Dec-2015 |
Upload: | estefany-points |
Introduction to Statistics
Topics 1 - 5 Nellie Hedrick
Statistics
Statistics is the study of data; it is the science of reasoning from data.
What is meant by the term data?
You will find that data vary, and variability abounds in everyday life.
• Observational units – the objects (people or things) described by a set of data.
• Variability – the phenomenon of a variable taking on different values or categories from observational unit to observational unit.
• Quantitative variables – take numerical values for which numerical operations (such as averaging) make sense, e.g. height, weight, time, …
• Categorical variables – place an individual into one of several groups or categories, e.g. gender, cities in Oklahoma, states in the USA, …
• Binary variables – categorical variables that can take only two possible values, e.g. male/female, yes/no, …
• Research question – often looks for patterns in a variable, compares a variable across different groups, or looks for a relationship between variables.
More on Observational Units and Variables:
• The distinction between categorical and quantitative variables is very important because it determines which statistical tools to use for analyzing a given data set.
• Determine whether data are measured quantitatively or categorically:
  • How many hours you slept in the past 24 hours (quantitative)
  • Whether you have slept for at least 7 hours in the past 24 hours (categorical, binary)
• Recognize variables that take numerical values that are really just category labels, such as zip code, …
• Watch out:
  • To determine whether something is actually a variable, ask yourself whether it represents a question that can be asked of each observational unit, and
  • whether the values can potentially vary from observational unit to observational unit.
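The classification rules above can be sketched in a few lines of Python. This is only a rough heuristic, not a rule from the course: the function name `classify_variable` and the three students' responses are made up for illustration, and zip codes are deliberately stored as text because they are labels rather than amounts.

```python
def classify_variable(values):
    """Classify a variable from its recorded values (a rough heuristic)."""
    # Numbers that support arithmetic are treated as quantitative.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    # Exactly two distinct labels: a binary categorical variable.
    if len(set(values)) == 2:
        return "binary"
    return "categorical"


# Made-up responses from three students (the observational units).
hours_slept = [6.5, 8, 7]
slept_at_least_7 = ["no", "yes", "yes"]
zip_code = ["73019", "74074", "73071"]  # stored as text: labels, not amounts

for name, values in [("hours_slept", hours_slept),
                     ("slept_at_least_7", slept_at_least_7),
                     ("zip_code", zip_code)]:
    print(name, "->", classify_variable(values))
```

If the zip codes were stored as integers, this heuristic would wrongly call them quantitative, which is exactly the trap the "category labels" bullet warns about.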
Wrap up:
• Statistics is the science of data.
• Data are not mere numbers.
• Data are collected with a purpose and have meaning in some context.
• The fundamental concept of statistics is variability.
• As we go through the course you will learn to classify variables and determine which statistical tools to apply to the data.
• Always consider data in context and anticipate reasonable values for the data collected and analyzed.
• A variable is a characteristic that varies from one observational unit (for example, one person) to another.
• Identify variables as quantitative or categorical (and, if categorical, whether they are binary).
Activity 1-6 page 11
Activity 1-9 page 12
Activity 1-13 page 12
Topic 2 – Data and Distributions and the Graphing Calculator
Picturing Distributions with Graphs
The distribution of a variable tells us what values it takes and how often it takes these values. We are looking for patterns of variation.
• Categorical variables – place an individual into one of several groups or categories.
• Quantitative variables – take numerical values for which numerical operations make sense.
• Distribution of a variable – what values it takes and how often.
Graphical Representations of Data: Categorical Variable
• Bar Chart
[Figure: bar chart "Distribution of Percent of Students Attended Training by Class"; horizontal axis: class (Freshman, Sophomore, Junior, Senior); vertical axis: percent of students, 0 to 100]
Class       Frequency (%)
Freshman    14.3
Sophomore   42.9
Junior      7.1
Senior      35.7
Total       100
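As a rough illustration of how the class percentages above turn into a bar chart, the sketch below draws the bars as text. The percentages come from the table; the function name `text_bar_chart` and the 50-character scale are assumptions made for this example.

```python
# Percentages from the table: students who attended training, by class.
percent_by_class = {
    "Freshman": 14.3,
    "Sophomore": 42.9,
    "Junior": 7.1,
    "Senior": 35.7,
}


def text_bar_chart(percentages, width=50):
    """Return one text line per category; bar length is proportional to the percent."""
    lines = []
    for label, pct in percentages.items():
        bar = "#" * round(pct / 100 * width)  # width characters represent 100%
        lines.append(f"{label:<10} {bar} {pct:.1f}%")
    return lines


chart = text_bar_chart(percent_by_class)
print("\n".join(chart))

# The categories cover every student, so the percentages should total 100.
total = sum(percent_by_class.values())
print(f"Total: {total:.1f}%")
```

Checking that the percentages add up to 100 is a quick way to confirm that every observational unit was placed in exactly one category.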
Activity 2-2 hand washing (page 17)
In August 2005, researchers for the American Society for Microbiology and the Soap and Detergent Association monitored the behavior of more than 6300 users of public restrooms. They observed people in public venues such as Turner Field in Atlanta and Grand Central Station in New York City. They found that 2393 of 3206 men washed their hands, compared to 2802 of 3130 women.
a. What proportion of the men washed their hands? What proportion of the women washed their hands?
b. Are these proportions consistent with the following pair of bar graphs?
c. Comment on what your calculations and the bar graph reveal about whether or not one gender is more likely to wash their hands after using a public restroom.
d. For each city, estimate the proportion of people who washed their hands as accurately as you can from the graph. Atlanta: Chicago: New York: San Francisco:
e. Comment on what the bar graphs reveal about how these cities compare with regard to hand washing.
Activity 2-2 hand washing (page 17)
Studying whether people wash their hands after using the restroom:
• We can look at the percentage of all people observed who washed their hands.
• Look at the variation between men and women.
• Look at the variation between people in different cities in whether or not they wash their hands.
• Look at the variation between men and women in each city.
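Part (a) of Activity 2-2 is a direct calculation from the counts quoted in the activity. A minimal sketch, in which the variable names are the only assumptions:

```python
# Counts quoted in Activity 2-2.
washed = {"men": 2393, "women": 2802}
observed = {"men": 3206, "women": 3130}

# Proportion = number who washed / number observed, computed per group.
proportions = {group: washed[group] / observed[group] for group in washed}

for group, p in proportions.items():
    print(f"{group}: proportion = {p:.3f}, percent = {100 * p:.1f}%")
# men: proportion = 0.746, percent = 74.6%
# women: proportion = 0.895, percent = 89.5%
```

The same number is reported both ways here because a proportion lies between 0 and 1 while a percentage lies between 0% and 100%.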
Activity 2-4: Buckle Up (page 19)
The National Highway Traffic Safety Administration (NHTSA) reports the percentage of residents in each state who regularly wear a seatbelt in a car and also whether the state has a primary or secondary type of seatbelt law. A primary law means that motorists can be stopped based solely on belt usage, while a secondary law means that the motorist can be stopped only for another reason. The 2005 data appear in the next table (s = secondary, p = primary, and * = not known):
a. What are the observational units for these data?
b. Classify each of the variables in the table as categorical (also binary) or quantitative.
c. What would you estimate is a typical usage percentage for a state with a primary-type seatbelt law? How about a state with a secondary-type law? (Do not perform any calculations; base your answers on a casual reading of the dotplots.) Primary: Secondary:
d. Does a state with a primary law always have a higher usage percentage than a state with a secondary law? Explain. If not, identify a pair of states for which the state with a primary law has a lower usage percentage than the state with a secondary law.
e. Do states with a primary law tend to have higher usage percentages than states with a secondary law? Explain how you can tell from the dotplots.
f. Do the data seem to support the contention that tougher (primary) laws lead to more seatbelt usage? Can you draw this conclusion definitively? Explain.
Activity 2-4: Buckle Up (page 19)
• What type of variable?
• Create a visual display: a dotplot is a useful method for displaying small data sets of a quantitative variable.
• Label the axes, especially if there is more than one group.
• A bar graph or dotplot is usually more illuminating when we are comparing the distribution of a variable between two or more groups.
• Statistical tendency – applies when comparing two or more groups or analyzing a data set.
  • Use words like "tend to", "on average", "lead to" to express the results.
Watch out and In Brief
• A bar graph or dotplot is usually more illuminating when we are comparing the distribution of a variable between two or more groups.
• Statistical tendency – a tendency seen when comparing two or more groups is not a hard-and-fast rule; this holds for both categorical and quantitative variables. Be careful with your language. The same caution applies to cause-and-effect conclusions.
• Label your graphs.
• Be careful to notice whether a question asks for a proportion (0 to 1) or a percentage (0% to 100%).
• Bar graphs are easier to compare than raw data.
• Always relate your comments to the context of the data and, ideally, to the question of interest.
Watch out and Wrap up continued
• Consistency refers to how variable, or spread out, the values in a data set are for a quantitative variable.
• When describing a distribution, refer to both center (tendency) and spread (consistency).
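To see why a description needs both center and spread, the sketch below compares two made-up groups that share the same center but differ in consistency. The values are invented for illustration and are not the seatbelt data; the median, range, and standard deviation are just one possible choice of summaries.

```python
import statistics

# Two made-up groups of percentages (hypothetical values, not from the activity).
group_a = [78, 80, 81, 82, 84]
group_b = [65, 74, 81, 88, 97]


def describe(values):
    """Summarize center (median) and spread (range and standard deviation)."""
    return {
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
    }


summary_a = describe(group_a)
summary_b = describe(group_b)
print("A:", summary_a)
print("B:", summary_b)
```

Both groups have a median of 81, yet group B is far less consistent, so reporting the center alone would hide the main difference between them.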
Exercises 2-9 page 27
Exercises 2-16 page 30
Exercises 2-12 page 28
Topic 3: Drawing Conclusions from Studies
• Data give you insight into interesting questions.
• The idea is to generalize the results of a study to a larger group than those you used in the study itself.
• Population – the entire group of people or objects (observational units) of interest in a study.
• Sample – a (typically small) part of the population from whom or about which data are gathered to learn about the population. If the sample is selected carefully (so that it is representative of the population), you can learn a lot about the population.
• Sample size – the number of observational units (people or objects) studied in a sample.
• Sampling bias – a sampling procedure is biased if it tends systematically to overrepresent certain segments of the population and underrepresent others.
More Definitions – Activity 3-1 page 35
• Convenience sample – a sample selected because its members are conveniently available.
• Voluntary response – a sample selected in such a way that members of the population decide for themselves whether or not to be part of the study.
• Non-response – a problem that can arise when selected observational units do not respond to the study.
• Sampling frame – the list from which the sample is selected; bias can arise when the frame does not represent all of the variation in the population.
• Parameter – a number that describes the population (P-P).
• Statistic – a number that describes the sample (S-S).
Activity 3-1 page 34
Elvis Presley is reported to have died in his Graceland mansion on August 16, 1977. On the 12th anniversary of this event, a Dallas record company wanted to learn the opinions of all adult Americans on the issue of whether Elvis was really dead. But of course they could not ask every adult American this question, so they sponsored a national call-in survey. Listeners of more than 100 radio stations were asked to call a 1-900 number (at a charge of $2.50) to voice an opinion concerning whether Elvis was really dead. It turned out that 56% of the callers thought that Elvis was alive. This scenario is very common in statistics: wanting to learn about a large group based on data from a smaller group.
Activity 3-1 page 34 (cont)
In 1936, Literary Digest magazine conducted the most extensive (to that date) public opinion poll in history. They mailed out questionnaires to over 10 million people whose names and addresses they had obtained from telephone books and vehicle registration lists. More than 2.4 million people responded, with 57% indicating they would vote for Republican Alf Landon in the upcoming presidential election. (Incumbent Democrat Franklin Roosevelt won the actual election, carrying 63% of the popular vote.)
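The Literary Digest story suggests that a very large sample drawn from a biased sampling frame can do worse than a small random sample. The simulation below is a sketch of that idea only: the population, the 30% frame coverage, and the support rates are all hypothetical assumptions chosen for illustration, not the 1936 figures.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Hypothetical population: each person is (in_frame, supports_incumbent).
population = []
for _ in range(100_000):
    in_frame = rng.random() < 0.30          # assumed: 30% appear on the mailing list
    p_support = 0.40 if in_frame else 0.70  # assumed: listed people lean the other way
    population.append((in_frame, rng.random() < p_support))


def support_rate(people):
    """Proportion of the given people who support the incumbent."""
    return sum(supports for _, supports in people) / len(people)


parameter = support_rate(population)               # describes the whole population
frame = [person for person in population if person[0]]
biased_stat = support_rate(rng.sample(frame, 20_000))       # huge sample, biased frame
random_stat = support_rate(rng.sample(population, 1_000))   # small simple random sample

print(f"parameter (population): {parameter:.3f}")
print(f"statistic, n=20,000 from the frame: {biased_stat:.3f}")
print(f"statistic, n=1,000 simple random sample: {random_stat:.3f}")
```

Under these assumptions the 20,000-person sample lands far from the parameter while the 1,000-person random sample lands close to it, because sample size cannot repair a biased selection method.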
More Definitions – Activity 3-4 page 39
• Explanatory variable – the variable whose effect you want to study.
• Response variable – the variable that you suspect is affected by the explanatory variable.
• Observational study – a study in which researchers passively observe and record information about the observational units.
• Lurking variable – a variable that is not recorded in the study but may affect the response. An observational study cannot rule out the possible effects of such unrecorded variables.
• Confounding variable – a lurking variable whose effects on the response variable are indistinguishable from the effects of the explanatory variable.
• Activity 3-4 page 39
• Exercise 3-8 page 46
Wrap Up: Key questions to consider
• What two things can prevent you from drawing certain conclusions from a study?
  • Bias and confounding
• To what population can you reasonably generalize the results of a study?
  • It depends on how you selected your sample.
• Can you reasonably draw a cause-and-effect connection between the explanatory and response variables?
  • It depends on whether or not the explanatory variable was assigned to the observational units.
Topic 4 – Random Sampling
• One way to avoid a biased sampling method is to give every member of the population the same chance of being selected for the sample.
• Your selection method should ensure that every possible sample (of the desired sample size) has an equal chance of being the sample ultimately selected.
• Such a sampling method is called simple random sampling (SRS).
• Unbiased – a statistic is said to provide unbiased estimates of a population parameter if the values of the statistic from different random samples are centered at the actual parameter value.
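A simple random sample and the meaning of "unbiased" can be sketched with Python's standard `random.sample`, which draws without replacement so that every subset of the requested size is equally likely. The population of 2,000 values below is hypothetical, and the sample size of 10 and the 2,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical population of 2,000 numerical values (made up for illustration).
population = [rng.randint(1, 20) for _ in range(2_000)]
parameter = statistics.mean(population)  # the population mean


def srs_mean(sample_size):
    """Mean of one simple random sample drawn without replacement."""
    return statistics.mean(rng.sample(population, sample_size))


# Repeat the sampling many times and look at where the statistics are centered.
sample_means = [srs_mean(10) for _ in range(2_000)]
center_of_statistics = statistics.mean(sample_means)

print(f"population mean (parameter): {parameter:.2f}")
print(f"center of 2,000 sample means: {center_of_statistics:.2f}")
```

Individual sample means miss the parameter, sometimes by a lot, but their center sits very close to it, which is what unbiasedness describes.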
Definition
• Sampling variability – the fact that the values of sample statistics vary from sample to sample.
• Precision – the precision of a sample statistic refers to how much its values vary from sample to sample.
  • Statistics from larger samples are more precise: their values lie closer together than those from smaller samples.
  • When the sampling method is unbiased, a more precise statistic tends to provide a more accurate estimate of the corresponding population parameter.
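The link between sample size and precision can be checked by simulation. The sketch below assumes a hypothetical population of 10,000 units in which exactly half have some trait, and measures precision as the standard deviation of the sample proportion over 1,000 repeated simple random samples; these sizes are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(2015)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 units; exactly half have the trait (coded 1).
population = [1] * 5_000 + [0] * 5_000


def spread_of_sample_proportions(sample_size, repetitions=1_000):
    """Standard deviation of the sample proportion across repeated random samples."""
    proportions = [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repetitions)
    ]
    return statistics.stdev(proportions)


spreads = {n: spread_of_sample_proportions(n) for n in (10, 40, 160)}
for n, spread in spreads.items():
    print(f"sample size {n:>3}: spread of sample proportions = {spread:.3f}")
```

In this setup each fourfold increase in sample size cuts the spread roughly in half, so precision improves with sample size but with diminishing returns.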
Activity 4-1 page 54
Activity 4-2 page 57
Exercise 4-18 page 73
Wrap up
• Do not confuse the sample size with the number of samples taken in a study.
• Taking many samples is crucial for assessing how a sample statistic varies from sample to sample, but the number of samples does not affect the sampling variability; the sample size does.
• As long as the population is large relative to the sample size (at least 10 times as large), the precision of a sample statistic depends on the sample size and not on the population size.
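The final wrap-up point, that precision depends on the sample size rather than the population size, can also be explored by simulation. This sketch assumes two hypothetical populations (10,000 and 1,000,000 units, each with exactly half coded 1), both at least 10 times the sample size of 100; the function name and all sizes are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(73)  # fixed seed so the run is repeatable


def spread_for_population(population_size, sample_size=100, repetitions=1_000):
    """Spread of the sample proportion when sampling from a population of the given size."""
    half = population_size // 2
    population = [1] * half + [0] * (population_size - half)  # hypothetical 50/50 population
    proportions = [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repetitions)
    ]
    return statistics.stdev(proportions)


spread_small_population = spread_for_population(10_000)
spread_large_population = spread_for_population(1_000_000)

print(f"population of    10,000: spread = {spread_small_population:.3f}")
print(f"population of 1,000,000: spread = {spread_large_population:.3f}")
```

With the sample size held at 100, the two spreads come out nearly identical even though one population is 100 times larger than the other.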
Topic 5: Designing Experiments
• SELF STUDY
• QUIZ – Assignment 1