Post on 11-Jan-2016
transcript
Chapter 9: Statistical Data Analysis
An Introduction to Scientific Research Methods in Geography
Montello and Sutton
Data Analysis
Data analysis helps us achieve the four scientific goals of description, prediction, explanation, and control
Statistical data analysis: three primary reasons geographers treat data in a statistical fashion
http://rlv.zcache.com/knowledge_is_power_do_statistics_stats_humor_flyer-p2440846222778564182dwj5_400.jpg
Statistical Description
Descriptive Statistics
Parameters
Central Tendency
Mode, Median, Mean (arithmetic mean)
When would you use the median or the mode instead of the mean?
Symbols: $\bar{X}$ (sample mean), $\mu$ (population mean)
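A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The income figures are hypothetical and not from the chapter. One extreme value pulls the mean upward while the median and mode stay put, which is one common reason to prefer them for skewed data.

```python
import statistics

# Hypothetical household incomes in thousands of dollars.
# The last value is a deliberate outlier.
incomes = [30, 32, 32, 35, 38, 41, 250]

mean_income = statistics.mean(incomes)      # arithmetic mean
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income, median_income, mode_income)
```

Here the mean lands well above every value except the outlier, so it describes no typical household.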
Descriptive Statistics
Variability
Range = largest value – smallest value
Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
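The variability measures can be sketched in a few lines of Python. This follows the population formulas (dividing by N); the scores and the function names are hypothetical, chosen only for illustration.

```python
import math

def population_variance(values):
    """Mean squared deviation from the population mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

# Hypothetical scores, treated as a complete population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # largest value minus smallest value
variance = population_variance(scores)
std_dev = population_std(scores)

print(value_range, variance, std_dev)
```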
Descriptive Statistics
Form
Modality
Skewness: Positive, Negative
Symmetry
Unimodal, bell-shaped: Normal Distribution
http://people.eku.edu/falkenbergs/images/skewness.jpg
Descriptive Statistics
Derived Scores
Percentile Rank
Highest: 99th percentile
Where is the median?
Z-score: standard deviation units above or below the mean
$$z = \frac{x - \mu}{\sigma}$$
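A short sketch of both derived scores. The scores, the mean of 5, and the standard deviation of 2 are hypothetical. Percentile rank has several conventions; this one counts the share of values strictly below the score, which is an assumption rather than the chapter's definition.

```python
def z_score(x, mu, sigma):
    """Number of standard deviation units that x lies above or below the mean."""
    return (x - mu) / sigma

def percentile_rank(x, values):
    """Percentage of values strictly below x (one of several conventions)."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical scores with population mean 5 and standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

z = z_score(9, mu=5, sigma=2)      # two standard deviations above the mean
rank = percentile_rank(7, scores)  # six of the eight scores fall below 7

print(z, rank)
```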
Descriptive Statistics
Relationship
Linear Relationship: Positive, Negative
Relationship Strength: weak, strong, no relationship
Correlation Coefficient: between -1 and 1; 0 means no linear relationship
Regression Analysis: criterion variables (Y), predictor variables (X)
http://hosting.soonet.ca/eliris/remotesensing/LectureImages/correlation.gif
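As a rough illustration of the correlation coefficient and a one-predictor regression, the sketch below computes Pearson's r and a least-squares line by hand. The elevation and temperature values are invented, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def least_squares_line(xs, ys):
    """Slope and intercept of the line predicting Y from X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / ss_x
    return slope, mean_y - slope * mean_x

# Hypothetical data: predictor X is elevation, criterion Y is temperature.
elevation_km = [0.0, 0.5, 1.0, 1.5, 2.0]
temperature_c = [20.0, 17.0, 13.5, 10.5, 7.0]

r = pearson_r(elevation_km, temperature_c)
slope, intercept = least_squares_line(elevation_km, temperature_c)

print(r, slope, intercept)
```

The coefficient comes out close to -1, a strong negative linear relationship: temperature falls as elevation rises.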
“Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” - XKCD
http://xkcd.com/552/
Correlation – Causation?
Statistical Inference
Inferential Statistics
Statistics
Sampling error
Given our sample statistics, we infer our parameters
Assign probabilities to our guesses
The power and difficulty of inferential statistics come from deriving probabilities about how likely it is that sample patterns reflect population patterns
Inferential Statistics
Sampling distribution
Ex: the sampling distribution of means shows the probability that a single sample would have a mean within some given RANGE of values
Central limit theorem: for sufficiently large samples, the sampling distribution of sample means will be approximately normal, with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of the sample size
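The central limit theorem can be checked with a small simulation. This is a sketch under stated assumptions: the population is an invented uniform distribution (clearly not normal), and the sample size of 30 and the 2,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, clearly non-normal population: uniform between 0 and 100.
population = [random.uniform(0, 100) for _ in range(20_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples and record each sample mean.
sample_size = 30
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(2_000)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = pop_sd / math.sqrt(sample_size)  # sigma / sqrt(n)

print(pop_mean, mean_of_means)
print(predicted_se, sd_of_means)
```

The mean of the sample means should sit near the population mean, and their standard deviation should sit near the population standard deviation divided by the square root of the sample size.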
Inferential Statistics
Generation of sampling distributions
Assumptions
Distributional assumptions
Nonparametric
Parametric
Normality
Homogeneity of variance
Independence of scores
Correct specification of models
Estimation and Hypothesis Testing
Estimation
Point estimation
Confidence Interval: usually 95%
Hypothesis Testing
Null hypothesis: a hypothesis about the exact (point) value of a parameter or set of parameters
Use sample statistics to make an inference about the probable truth of our null hypothesis
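A minimal sketch of interval estimation for a mean, assuming the population standard deviation is known so that a z critical value applies. The commute figures are hypothetical, and 1.96 is the critical value for the usual 95% level.

```python
import math

def mean_confidence_interval(sample_mean, sigma, n, z_critical=1.96):
    """Confidence interval for a population mean when sigma is known.

    z_critical=1.96 corresponds to the usual 95% confidence level.
    """
    margin = z_critical * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical: 100 sampled residents commute 12 km on average, sigma = 4 km.
point_estimate = 12.0
lower, upper = mean_confidence_interval(point_estimate, sigma=4.0, n=100)

print(point_estimate, lower, upper)
```

The point estimate is the single best guess; the interval attaches a margin to it that shrinks as the sample size grows.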
Hypothesis Testing
Alternative Hypothesis: the hypothesis that the parameter does not equal the exact value hypothesized in the null
A range rather than an exact value
Modus Tollens
Useful for disconfirming
Not confirming!
If A is true, then B is true
B is not true. Therefore, A is not true
B is true. Therefore, ???
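The asymmetry between disconfirming and confirming can be checked by brute force over a truth table. This is only an illustrative sketch; the helper names `implies` and `is_valid` are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Truth value of 'if A is true, then B is true'."""
    return (not a) or b

def is_valid(premises, conclusion):
    """An argument is valid if the conclusion holds whenever all premises hold."""
    return all(
        conclusion(a, b)
        for a, b in product([True, False], repeat=2)
        if all(premise(a, b) for premise in premises)
    )

# Modus tollens: if A then B; B is not true; therefore A is not true.
modus_tollens = is_valid(
    premises=[implies, lambda a, b: not b],
    conclusion=lambda a, b: not a,
)

# The "???" case: if A then B; B is true; therefore A is true.
affirming_consequent = is_valid(
    premises=[implies, lambda a, b: b],
    conclusion=lambda a, b: a,
)

print(modus_tollens, affirming_consequent)
```

The first argument holds in every case; the second fails when A is false and B is true, so observing B cannot confirm A.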
Example
From a recent nationwide study it is known that the typical American watches 25 hours of television per week, with a population standard deviation of 5.6 hours. Suppose 50 Denver residents are randomly sampled with an average viewing time of 22 hours per week and a standard deviation of 4.8. Are Denver television viewing habits different from nationwide viewing habits?
Step 1: State your null and alternative hypotheses
$$H_0: \bar{X} = 25$$
$$H_A: \bar{X} \neq 25$$
What is this saying?
Example
Step 2: Determine your appropriate test statistic and its sampling distribution assuming the null is true
We are testing a sample mean where n > 30, and so a z distribution can be used
Step 3: Calculate the test statistic from your sample data
Sample: $\bar{X} = 22$, $s = 4.8$, $n = 50$
Population: $\mu = 25$, $\sigma = 5.6$
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{22 - 25}{5.6 / \sqrt{50}} = -3.79$$
Step 4: Compare the empirically obtained test statistic to the null sampling distribution
P value: $p = .0001$
OR Critical value at .05 significance level: $z = \pm 1.96$
Decision: Reject the null hypothesis
-3.79 is less than -1.96: reject
The p value is very small, less than .05 and even .01: reject
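The Denver example can be reproduced with the standard library. This is a sketch rather than a definitive procedure: the function name is hypothetical, and the two-tailed p value is an assumption chosen to match the "not equal" alternative hypothesis.

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and a two-tailed p value for a sample mean."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-tailed tail area of the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Values from the example: sample mean 22, n = 50, mu = 25, sigma = 5.6.
z, p_value = one_sample_z_test(sample_mean=22, pop_mean=25, pop_sd=5.6, n=50)

# Critical-value decision at the .05 significance level.
reject_null = abs(z) > 1.96

print(round(z, 2), p_value, reject_null)
```

Both routes agree: the statistic falls beyond the critical value and the p value is far below .01, so the null hypothesis is rejected.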
Error
You have made either a correct inference or a mistake
Type I error: its probability is the rejection level, p (or α)
Type II error: its probability is β
http://www.mirrorservice.org/sites/home.ubalt.edu/ntsbarsh/Business-stat/error.gif
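A simulation sketch of the Type I error rate: when the null hypothesis really is true, a test at the .05 level should still reject in roughly 5% of samples. The population values are borrowed from the television example only for convenience; the seed and the 2,000 trials are arbitrary assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def rejects_null(sample, null_mean, sigma, z_critical=1.96):
    """Two-tailed z test at the .05 level; True means the null is rejected."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / math.sqrt(n))
    return abs(z) > z_critical

# The null is TRUE here: every sample is drawn from a population with mean 25.
null_mean, sigma, n, trials = 25.0, 5.6, 50, 2000

false_rejections = sum(
    rejects_null([random.gauss(null_mean, sigma) for _ in range(n)],
                 null_mean, sigma)
    for _ in range(trials)
)
type_i_rate = false_rejections / trials

print(type_i_rate)
```

Every rejection counted here is a mistake, since the null was true by construction.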
Data in Space and Place
Spatiality is a focus in geography, unlike other disciplines
Spatial autocorrelation
First Law of Geography: Everything is related to everything else, but near things are more related than distant things
Positive v. negative spatial autocorrelation
A violation of the important statistical assumption of independence
Ex: If it’s raining in my backyard, I can say with a high degree of confidence that it’s raining in my neighbor’s backyard, but my level of confidence that it is raining across town is lower, and 300 miles away even lower
Variogram
http://www.innovativegis.com/basis/Papers/Other/ASPRSchapter/Default_files/image023.png
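A rough sketch of an empirical semivariogram along a one-dimensional transect. The rainfall readings are invented, and evenly spaced stations are an assumption; real variograms bin pairs by distance in two dimensions.

```python
def empirical_semivariance(values, lag):
    """Half the mean squared difference between values `lag` steps apart."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Hypothetical rainfall (mm) at evenly spaced stations along a transect.
rainfall_mm = [10, 11, 13, 12, 15, 18, 17, 20]

semivariances = {
    lag: empirical_semivariance(rainfall_mm, lag) for lag in (1, 2, 3)
}

for lag, gamma in semivariances.items():
    print(lag, gamma)
```

Semivariance grows with the lag here, which is the signature of positive spatial autocorrelation: near stations are more alike than distant ones.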
Data in Space and Place
“Spatial data are special”: a special difficulty
Which areal units should be used to analyze geographic data?
Modifiable Areal Unit Problem
Gerrymandering
Geographic phenomena are often scale dependent
Must identify the scale of a phenomenon and collect and organize data in units of that size
Data aggregation issues
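A toy illustration of the modifiable areal unit problem in its gerrymandering form. The nine voters and both zonings are hypothetical; the point is only that the same individual data give different aggregate results when the units are redrawn.

```python
from collections import Counter

def district_winners(votes, zoning):
    """Count how many districts each party wins under a given zoning."""
    winners = []
    for district in zoning:
        tally = Counter(votes[i] for i in district)
        winners.append(tally.most_common(1)[0][0])
    return Counter(winners)

# Hypothetical electorate: party A has 4 voters, party B has 5.
votes = ["A", "A", "B", "A", "A", "B", "B", "B", "B"]

# Two ways of grouping the same nine voters into three districts.
zoning_1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
zoning_2 = [[0, 2, 6], [1, 5, 7], [3, 4, 8]]

result_1 = district_winners(votes, zoning_1)
result_2 = district_winners(votes, zoning_2)

print(result_1, result_2)
```

Party A wins two of three districts under the first zoning despite having fewer voters, while party B wins two of three under the second.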
Discussion Questions
What measure of central tendency is best for nominal data?
When pollsters tell you that a candidate is favored by 44% of likely voters, plus or minus 3 percent, what is the 44% and what is the plus/minus 3%?
A survey of all users of a park in 1980 found the average number of people per party to be 3.5. In a random sample of 35 parties in 2000 the average was 2.9. If you wanted to test if the number of persons per party in 2000 was different from the number in 1980, what would your null and alternative hypotheses be?
In the United States, we presume that someone is innocent. If a guilty person were found to be not guilty, what type of error would this be?
A researcher finds that a particular learning software has an effect on students’ test scores, when actually it does not. What type of error is this?