URBP 204A QUANTITATIVE METHODS I
Statistical Analysis Lecture II
Gregory Newmark
San Jose State University
(This lecture accords with Chapters 6, 7, and 8 of Neil Salkind’s Statistics for People Who (Think They) Hate Statistics)
Populations and Samples
• Populations
  – All the people in a specified group of people
    • The population of students at SJSU
    • The population of students in Urban Planning at SJSU
    • The population of students in 204A this semester
• Samples
  – A portion of a larger population selected for study
    • A 500-person sample of students at SJSU
    • A 50-person sample of students in Urban Planning
    • A 15-person sample of students in 204A this semester
Populations and Samples
• Ideally, research covers entire populations
  – “Medicine X always cures the common cold”
• Financially, research is expensive
  – “We can’t afford to test Medicine X on everyone”
• Practically, we test samples of a population
  – “We can afford to test Medicine X on 1,000 people”
• Hopefully, those samples represent the actual population well
  – “For our results to be generalizable, our 1,000 people should approximate the characteristics of everyone”
Populations and Samples
• Sampling Error
  – A measure of how well a sample approximates the characteristics of the larger population
  – The difference between a sample statistic (i.e., a value computed from the sample) and a population parameter (i.e., the corresponding value in the population)
  – Low sampling error means higher precision
  – Higher precision means more generalizability
  – Valuable research has a high degree of generalizability
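As a rough illustration of sampling error, the sketch below draws samples of different sizes from a simulated population and reports the difference between each sample mean and the population mean. The population (10,000 made-up heights in inches), the seed, and the function name `sampling_error` are assumptions for illustration only, not course data.

```python
import random
import statistics

def sampling_error(sample, population):
    """Sample statistic minus population parameter, here using the mean."""
    return statistics.mean(sample) - statistics.mean(population)

# Hypothetical population: 10,000 simulated heights in inches (not real data).
rng = random.Random(204)
population = [rng.gauss(66, 3) for _ in range(10_000)]

# Draw samples of increasing size without replacement and report the error.
for n in (15, 50, 500):
    sample = rng.sample(population, n)
    print(n, round(sampling_error(sample, population), 3))
```

Larger samples tend to show smaller errors, although any single draw can buck that trend by chance.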
Questions and Hypotheses
• Research Questions (Problem Statements)
  – What you are trying to investigate
• Hypotheses
  – Translate the research question into a testable form
Hypotheses
• Null Hypothesis
  – Assumption that no relationship exists in the population
  – Statements of equality
  – Examples
    • “There is no relationship between reaction time and problem solving ability”
    • “There is no difference in the average GRE scores of women and men”
  – Purposes (the null hypothesis cannot be tested directly)
    • Starting point for research
      – Until you prove a difference, you have to assume none exists
    • Benchmark to compare observations
      – Defines a range within which an observed difference may be due to chance
Hypotheses
• Research Hypothesis
  – Definitive statement that a relationship exists in a sample
  – Statements of inequality
  – Examples
    • “There is a positive relationship between reaction time and problem solving ability”
    • “There is a difference in the average GRE scores of women and men”
  – Two Types
    • Non-directional – there is a difference but its direction is unspecified
    • Directional – there is a difference and its direction is specified
  – Purpose – to provide a hypothesis for direct testing
Hypotheses
• Should be stated in a clear, forceful, declarative form
  – “Students who complete all assignments will get higher grades in 204A than those who do not.”
• Should be expressed succinctly
  – Avoid excessive verbiage that can confuse your readers
• Should posit an expected relationship between variables
  – This will focus the research and avoid a ‘scattershot’ approach
• Should reflect theory or literature
  – This ensures that the researcher has investigated the issue in advance
• Should be testable
  – One can actually carry out the research
  – Defines how measurement will happen
Hypotheses Quotes
• The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact.
  – Thomas H. Huxley (1825 - 1895)
• There are two possible outcomes: If the result confirms the hypothesis, then you've made a measurement. If the result is contrary to the hypothesis, then you've made a discovery.
  – Enrico Fermi (1901 - 1954)
• It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast. It keeps him young.
  – Konrad Lorenz (1903 - 1989)
• For every fact there is an infinity of hypotheses.
  – Robert M. Pirsig (1928 - )
Inferential Statistics
• Descriptive Statistics describe a data set
  – “The average height in this class is 5’6” with a standard deviation of 3”.”
• Inferential Statistics are used to make inferences from sample data to populations
  – “Based on our class data, we infer that the average height at SJSU is 5’6” with a standard deviation of 3”.”
The Normal Curve
• Visual representation of a distribution of scores with the following characteristics
  – Mean, median, and mode are the same
  – Symmetry around the mean (or mode or median)
  – Tails of curve approach zero asymptotically
The Normal Curve
• We can exploit these properties of the normal curve to compare distributions with different means and standard deviations, by putting them into standard scores based on the standard deviation
• Basically, we can compare curves by expressing scores in units of their standard deviations
Z-Scores
• A commonly used standardized score
• Represent the number of standard deviations a raw score falls from the mean
• Result of dividing the amount that a raw score differs from the mean of a distribution by the standard deviation of that distribution
• z = (X - Xbar) / s
• z = z score; X = individual score; Xbar = mean; s = standard deviation
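The z score definition can be sketched in a few lines of Python. The heights below are hypothetical, the function name `z_score` is illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption about which version of s is intended.

```python
import statistics

def z_score(x, scores):
    """Number of sample standard deviations that x lies from the mean of scores."""
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return (x - mean) / s

heights = [62, 64, 65, 66, 66, 67, 68, 70]  # hypothetical heights in inches
print(round(z_score(70, heights), 2))  # mean is 66, s is sqrt(6), so about 1.63
```

A score equal to the mean gets z = 0, and scores equally far above and below the mean get z scores of equal size and opposite sign.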
Z-Scores
• Characteristics
  – Z scores above the mean are:
    • Positive
    • To the right of the mean
    • In the upper half of the distribution
  – Z scores below the mean are:
    • Negative
    • To the left of the mean
    • In the lower half of the distribution
  – Z scores have associated probabilities
Z-Scores
• Every z score has an associated probability
• We can use that property to test hypotheses
• This property enables inferential statistics
• We can assess whether an event is due to chance or reflects some research finding
• Typically, we reject the null hypothesis if an event would have less than a 5% chance of occurring if the null hypothesis were true
• In that case, the research hypothesis likely makes more sense
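The probability associated with a z score can be read from a table or computed directly. The sketch below uses Python's standard-library `NormalDist` and assumes the scores are normally distributed; the function name `upper_tail_probability` and the one-sided 5% cutoff are illustrative choices.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def upper_tail_probability(z):
    """Probability of a z score at least this large under a standard normal curve."""
    return 1 - standard_normal.cdf(z)

# Apply the 5% rule of thumb to a few example z scores.
for z in (0.0, 1.0, 1.645, 2.0):
    p = upper_tail_probability(z)
    verdict = "reject the null" if p < 0.05 else "stick with the null"
    print(f"z = {z:5.3f}  p = {p:.4f}  {verdict}")
```

The same function gives the share of a normally distributed class expected to be taller than a person with a given z score.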
Class Lab
• Have everyone report their height in inches
• Determine class mean
• Determine class standard deviation
• Calculate z score for your height
• What percentage of the class is taller than you? (see chart in back of book or online)
• Have everyone move the data into SPSS and repeat the experiment
The Normal Curve
The Normal Law
by W.J. Youden (1900 - 1971)
THE
NORMAL
LAW OF ERROR
STANDS OUT IN THE
EXPERIENCE OF MANKIND
AS ONE OF THE BROADEST
GENERALIZATIONS OF NATURAL
PHILOSOPHY ... IT SERVES AS THE
GUIDING INSTRUMENT IN RESEARCHES
IN THE PHYSICAL AND SOCIAL SCIENCES AND
IN MEDICINE, AGRICULTURE, AND ENGINEERING.
IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT
Statistical Significance
• Refers to whether or not an observed effect is due to chance or to systematic influence.
  – “There is a positive statistically significant relationship between GDP and average life span.”
  – Statistical significance makes the null hypothesis a less attractive explanation than the research hypothesis
• Ideally, research would control for all other factors, but in practice there will be uncontrolled error.
  – “There is a chance that a low GDP nation will have a higher average life span, due to unaccounted for factors.”
• Researchers ultimately define the level of certainty they are willing to accept in determining significance.
  – “There is a 1 in 20 chance that the observed effect is not due to the hypothesized reason, and we can live with that.”
  – This is called the significance level (or critical p-value).
Significance Levels can Vary
Statistical Significance
• To review:
  – First, hypothesize a relationship
    • Null Hypothesis means no relationship (often implied)
    • Research Hypothesis means there is a relationship
  – Second, test the research hypothesis
    • Define your significance level
    • Do your experiment
  – Third, based on your findings either:
    • Reject the null and accept the research hypothesis
    • Accept the null and reject the research hypothesis
Statistical Significance
• Data and Dating
  – Is this enough to reject the null hypothesis?
Statistical Significance
• Null Hypotheses can be either true or false
  – If true, there is an equality
  – If false, there is an inequality
• The Null Hypothesis cannot be directly tested
  – This presents a problem because one might reject the null when it is true (Type I) or accept it when it is false (Type II)
  – Four options:
    • No Problem: Accept the Null Hypothesis when there is truly no difference between groups
    • Type I Error (False Positive): Reject the Null Hypothesis when there is truly no difference between groups
    • Type II Error (False Negative): Accept the Null Hypothesis when there truly are differences between groups
    • No Problem: Reject the Null Hypothesis when there truly are differences between groups
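A small simulation can make the Type I error concrete. The sketch below repeatedly compares two groups drawn from the same population, so the null hypothesis is true by construction, and counts how often a two-tailed z test at the 5% level rejects it anyway. The group size, number of trials, seed, known standard deviation of 1, and the name `false_positive_rate` are all assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cutoff, about 1.96

def false_positive_rate(trials, n, rng):
    """Share of experiments that reject a null hypothesis that is actually true."""
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so any difference is chance.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        # z test for a difference in means with known standard deviation 1.
        z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
        if abs(z) > CRITICAL_Z:
            rejections += 1
    return rejections / trials

rate = false_positive_rate(trials=2000, n=30, rng=random.Random(1))
print(f"Type I error rate: {rate:.3f}")
```

The observed rate should land near the chosen significance level, which is the sense in which the researcher sets the accepted risk of a false positive in advance.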
Significant vs. Meaningful
• Statistically significant does not always imply the finding is meaningful
  – “There is a statistically significant ¼ inch difference in the heights of women and men.”
  – “There is a statistically significant $0.50 difference in the per capita tax returns of married couples versus singles.”
• Large samples will almost always find statistically significant differences.
• The researcher needs to assess the meaning of the outcomes by considering their context.
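To see why large samples tend to produce significant results, the sketch below holds a small difference fixed and varies only the sample size. It assumes a two-group z test with a known, equal standard deviation; the ¼ inch difference echoes the example above, while the 3 inch standard deviation, the group sizes, and the name `two_tailed_p` are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(difference, sd, n_per_group):
    """Two-tailed p-value for a difference in two group means (z test, known sd)."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: a 1/4 inch difference, 3 inch standard deviation per group.
for n in (50, 500, 5000, 50000):
    print(n, round(two_tailed_p(0.25, 3, n), 4))
```

The difference itself never changes; only the p-value shrinks as the groups grow, which is why significance alone says little about whether a finding matters.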
Statistical Significance Revisited
• Steps:
  – State hypothesis
  – Set significance level associated with null hypothesis
  – Select statistical test (we will learn these soon)
  – Computation of obtained test statistic value
  – Computation of critical test statistic value
  – Comparison of obtained and critical values
    • If obtained > critical, reject the null hypothesis
    • If obtained < critical, stick with the null hypothesis
Statistical Significance Revisited
• One Tailed Test
Statistical Significance Revisited
• Two Tailed Test
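The comparison of obtained and critical values, and the effect of choosing a one-tailed or two-tailed test, can be sketched with a z test. The z test stands in for whichever statistical test is selected, and the function names, the default 5% level, and the obtained value of 1.80 are illustrative assumptions.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical z value for a significance level, for a 1- or 2-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha between both tails of the curve.
    return NormalDist().inv_cdf(1 - alpha / tails)

def reject_null(obtained_z, alpha=0.05, tails=2):
    """Compare the obtained value with the critical value."""
    # A one-tailed test here assumes the predicted direction is positive.
    obtained = abs(obtained_z) if tails == 2 else obtained_z
    return obtained > critical_z(alpha, tails)

print(round(critical_z(0.05, 1), 3), round(critical_z(0.05, 2), 3))
print(reject_null(1.80, tails=1), reject_null(1.80, tails=2))
```

At the same significance level, the one-tailed cutoff is lower than the two-tailed one, so an obtained value between the two is significant only under a directional hypothesis.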
Inferential Statistics Revisited
• Inference allows decisions to be made about populations based on information about samples.
• Steps:
  – Take a representative sample
  – Test each member of the sample
  – Analyze data to determine if variation is due to chance (accept null hypothesis) or statistically significant (accept research hypothesis)
  – Conclusions inferred about population