Hypothesis Testing
• Correct models?
• Data similar?
  – Use one series to predict another
• Has something changed in the data?
  – Quality control, portfolio strategies
Outline
• Introduction (Basketball)
• Proportion changes (Political polls)
• Difference in means (Airline arrivals, Firestone)
• Testing a distribution (die)
• Causality
• Multiple comparisons and data snooping
• Statistical power
Hypothesis Testing
• Null hypothesis
  – Assumption about how the world works
  – Assume this is true
  – Could the data have come from this machine/theory/conjecture?
  – Do you need more/other data?
Basketball and Larry Bird
• Facts
  – Bird normally makes 48 percent of his shots
  – Bird has just finished a series of games where he made only 20 of 57 shots
  – Question: Is this the usual Larry Bird, or has something changed?
  – Is he in a slump?
  – On to Matlab (bird1.m)
Hypothesis Testing Terms
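• Null hypothesis
  – Assumption about the world
• Test statistic
  – Observed statistic (random variable)
• p-value
  – Probability, assuming the null is true, of a test statistic at least as extreme as the one observed (not the probability that the null is true)
  – Bird example: Prob( shots <= 20 ) out of 57 when he makes 48 percent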
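The Bird question can be answered by treating the null hypothesis as a machine and running it many times. The sketch below is in Python rather than Matlab and is not the course's bird1.m. It assumes every shot is an independent make with probability 0.48; the function names and the number of simulations are illustrative choices.

```python
import math
import random

def exact_p_value(made, shots, p_null):
    """P(X <= made) for X ~ Binomial(shots, p_null)."""
    return sum(
        math.comb(shots, k) * p_null**k * (1 - p_null) ** (shots - k)
        for k in range(made + 1)
    )

def simulated_p_value(made, shots, p_null, n_sims=20000, seed=0):
    """Share of simulated stretches of `shots` attempts with `made` or fewer makes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One simulated stretch under the null: each shot goes in with prob p_null.
        makes = sum(rng.random() < p_null for _ in range(shots))
        if makes <= made:
            hits += 1
    return hits / n_sims

p_exact = exact_p_value(20, 57, 0.48)
p_sim = simulated_p_value(20, 57, 0.48)
print(f"exact: {p_exact:.4f}  simulated: {p_sim:.4f}")
```

The two numbers should agree closely; the simulated version is the one that generalizes to test statistics with no convenient formula.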
Political Poll
• Gore/Bush coded 0/1
• Two polls (100 people each)
  – First: 50/50
  – Second: 55/45
• How likely is a shift this large if nothing has changed in the population?
• Matlab: pollchange.m
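One way to look at the poll question is to simulate pairs of polls drawn from a single unchanged population and count how often the second poll leads the first by 5 or more. This Python sketch is not pollchange.m. It assumes 100 independent respondents per poll and uses the pooled share (105 of 200) as the null; both are assumptions, and the function name is made up for illustration.

```python
import random

def poll_change_p_value(n=100, first=50, second=55, n_sims=10000, seed=0):
    """One-sided p-value for the rise between two independent polls of size n."""
    rng = random.Random(seed)
    p_pooled = (first + second) / (2 * n)  # null: one unchanged population share
    observed = second - first
    extreme = 0
    for _ in range(n_sims):
        a = sum(rng.random() < p_pooled for _ in range(n))  # simulated first poll
        b = sum(rng.random() < p_pooled for _ in range(n))  # simulated second poll
        if b - a >= observed:
            extreme += 1
    return extreme / n_sims

p_poll = poll_change_p_value()
print(f"P(rise of 5 or more | no change) is about {p_poll:.3f}")
```

With samples this small, a 5-point move is not unusual under the null, which is the point of the example.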
Differences in Means
• Two samples
• Different means
• Could they be drawn from the same population?
• Examples: has something changed?
  – Flights (time)
  – Tires (Firestone)
Flight Delays
• Two series (minutes late)
  – Before the mechanics' threat of delays
  – After the mechanics' threat of delays
• More delays after threat
• Compare to pooled data
• Null = two series are the same
• Could the mean difference between the two come from the pooled series?
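The pooled comparison can be sketched as a resampling test: under the null both series come from the same pool, so draw two fake series from the pooled data and see how often their mean difference is as large as the observed one. The delay numbers below are invented placeholders, not the lecture's airline data, and the function name is hypothetical.

```python
import random
import statistics

def pooled_bootstrap_p_value(before, after, n_sims=5000, seed=0):
    """One-sided p-value for mean(after) - mean(before) under a pooled null."""
    rng = random.Random(seed)
    observed = statistics.fmean(after) - statistics.fmean(before)
    pooled = list(before) + list(after)
    extreme = 0
    for _ in range(n_sims):
        # Draw both fake series, with replacement, from the same pooled data.
        fake_before = rng.choices(pooled, k=len(before))
        fake_after = rng.choices(pooled, k=len(after))
        diff = statistics.fmean(fake_after) - statistics.fmean(fake_before)
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_sims

# Invented placeholder delays in minutes (NOT the lecture's data).
before = [3, 0, 12, 7, 5, 0, 9, 15, 2, 6, 4, 8]
after = [10, 14, 6, 22, 9, 18, 12, 5, 25, 11, 16, 13]
observed_diff, p_delay = pooled_bootstrap_p_value(before, after)
print(f"observed difference: {observed_diff:.2f} min, p-value: {p_delay:.4f}")
```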
Firestone
• Overall, tires have a failure rate of 5 in 1000
• In a sample of 10,000 tires you have observed 60 failures (50 expected at the overall rate)
• Is something wrong with Firestone tires?
• Matlab: firestone.m
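For the tire question the null can be evaluated directly: if each of the 10,000 tires fails independently with probability 0.005, how often would 60 or more fail? The Python sketch below is not firestone.m; independence between tires is an assumption, and the helper names are illustrative. The probabilities are computed in logs to avoid overflow in the binomial coefficients.

```python
import math

def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

def upper_tail(k_obs, n, p):
    """P(X >= k_obs), computed as one minus the lower tail P(X <= k_obs - 1)."""
    lower = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(k_obs))
    return 1.0 - lower

p_tires = upper_tail(60, 10_000, 0.005)
print(f"P(60 or more failures | rate 5 in 1000) = {p_tires:.4f}")
```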
Testing a Die
• Problem:
  – You’ve observed the following counts in 6000 rolls of a die
    1: 1014, 2: 958, 3: 986, 4: 995, 5: 1055, 6: 992
  – Could this have come from a fair die with probability 1/6 for each side?
Dietest.m
• Method:
  – Think up a test statistic
  – Roll 6000 dice with sample
  – Check how the value of the test statistic from the original data compares with the distribution from the simulations
• Matlab: dietest.m
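The method above can be sketched in Python as follows. This is not dietest.m: the slides leave the choice of test statistic open, and the sketch assumes the chi-square-style sum of squared deviations from the expected count of 1000, which is one common choice among several. Function names and the number of simulations are illustrative.

```python
import random
from collections import Counter

observed = [1014, 958, 986, 995, 1055, 992]  # counts from the slide

def chi_square_stat(counts):
    """Sum over faces of (observed - expected)^2 / expected for a fair die."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def simulated_die_p_value(counts, n_sims=500, seed=0):
    """Share of fair-die simulations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    n_rolls = sum(counts)
    stat_obs = chi_square_stat(counts)
    faces = range(len(counts))
    extreme = 0
    for _ in range(n_sims):
        tally = Counter(rng.choices(faces, k=n_rolls))  # one run of fair rolls
        sim_counts = [tally[f] for f in faces]
        if chi_square_stat(sim_counts) >= stat_obs:
            extreme += 1
    return stat_obs, extreme / n_sims

stat, p_die = simulated_die_p_value(observed)
print(f"statistic: {stat:.2f}, simulated p-value: {p_die:.3f}")
```

For the slide's counts the statistic works out to 5.27. Any other statistic (for example, the largest single deviation) can be swapped in by replacing `chi_square_stat`.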
Causality
• Stock returns and weather
• Are returns higher when it is sunny?
• Given some data on weather and returns, test this hypothesis
• On to Matlab: sunny.m
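A label-shuffling (permutation) test is one way to ask whether sunny-day returns are higher than the rest. The Python sketch below is not sunny.m, and the data are synthetic placeholders generated with no weather effect at all, purely so the code runs. The function name and data layout (one return and one sunny flag per day) are assumptions.

```python
import random
import statistics

def permutation_p_value(returns, sunny, n_sims=2000, seed=0):
    """One-sided p-value for mean(sunny returns) - mean(other returns)."""
    rng = random.Random(seed)

    def mean_gap(flags):
        sun = [r for r, s in zip(returns, flags) if s]
        cloud = [r for r, s in zip(returns, flags) if not s]
        return statistics.fmean(sun) - statistics.fmean(cloud)

    observed = mean_gap(sunny)
    flags = list(sunny)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(flags)  # break any link between weather and returns
        if mean_gap(flags) >= observed:
            extreme += 1
    return observed, extreme / n_sims

# Synthetic placeholder data: 250 days, weather unrelated to returns by design.
data_rng = random.Random(1)
sunny = [data_rng.random() < 0.5 for _ in range(250)]
returns = [data_rng.gauss(0.0005, 0.01) for _ in sunny]
gap, p_sunny = permutation_p_value(returns, sunny)
print(f"sunny minus cloudy mean return: {gap:.5f}, p-value: {p_sunny:.3f}")
```

Even a small p-value here would only show that returns and sunshine move together in the sample; it would not by itself show that the weather causes the returns.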
Multiple Tests and Data Snooping
• In the search for patterns you often look at many different things
  – Different trading rules
  – Different regression runs
  – Different drugs
• Each is often tested alone
• Then you get excited when one is significant
Data Snooping and Trading Strategies
• Efficient markets world (no predictability)
• Someone claims to have a buy/sell (short/long) strategy which generates significantly large returns
• They pretested 10 strategies and chose the best out of the 10
• Returns in the sample are independent and normally distributed
Questions
• What is the likelihood that some “best” strategy beats a buy and hold benchmark?
• What if this strategy were tested to see if it was “significant” using traditional statistical tests, ignoring that it had been snooped?
• Matlab: snooptest.m
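Both questions can be explored by simulation. The Python sketch below is not snooptest.m. It assumes each of the 10 strategies has excess returns over buy and hold that are independent standard normal draws with true mean zero (the no-predictability world), 120 observations per strategy, and a one-sided 5 percent t-test with an approximate critical value of 1.658. All of these numbers and names are illustrative assumptions.

```python
import math
import random

def snooping_rates(n_strategies=10, n_obs=120, n_trials=1000, t_crit=1.658, seed=0):
    """Return (share of trials where the best strategy beats the benchmark,
    share where the best strategy passes a naive one-sided t-test)."""
    rng = random.Random(seed)
    best_beats = 0
    best_significant = 0
    for _ in range(n_trials):
        best_mean, best_t = -math.inf, -math.inf
        for _ in range(n_strategies):
            excess = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # true mean is 0
            mean = sum(excess) / n_obs
            var = sum((x - mean) ** 2 for x in excess) / (n_obs - 1)
            t_stat = mean / math.sqrt(var / n_obs)
            if mean > best_mean:  # "chose the best out of the 10"
                best_mean, best_t = mean, t_stat
        best_beats += best_mean > 0
        best_significant += best_t > t_crit
    return best_beats / n_trials, best_significant / n_trials

beat_rate, naive_sig_rate = snooping_rates()
print(f"best beats benchmark: {beat_rate:.3f}")
print(f"best looks 'significant' at 5%: {naive_sig_rate:.3f}")
```

The second rate should land far above the nominal 5 percent, since the test ignores that the strategy was the winner of ten tries.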
Other Applications
• Many other trading strategies
  – More later
• Multiple regressions
  – Run 20 regressions of y = a + bx for different x
  – Report only those with significant b
  – Common economist sin
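As a rough guide to why selective reporting misleads, suppose (these are assumptions, not stated on the slide) that the 20 tests are independent, each is run at the 5 percent level, and every true slope b is zero. The chance that at least one regression still looks significant is then:

```latex
P(\text{at least one significant } b) = 1 - (1 - 0.05)^{20} = 1 - 0.95^{20} \approx 0.64
```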
Hypothesis Tests Again
• Significance level (size)
  – Probability of rejecting the null hypothesis given that it is true (Type I error)
  – Reject the null when the p-value falls below this level
Hypothesis Tests Again
• Type II error
  – Probability of accepting (failing to reject) the null hypothesis given that it is false
Hypothesis Tests Again
• Power
  – Probability of rejecting the null hypothesis when it is false
  – Probability of catching a deviation
  – Power = 1 - Prob(Type II error)
Type I and Type II errors: Which do you prefer?
• Mushroom/Toadstool (poison) test
  – Null = Mushroom
  – Type I: Reject mushroom given mushroom
  – Type II: Accept mushroom given toadstool
• Makes a difference
Hypothesis Tests: Final Word
• Traditional goals
  – Correct size
  – Maximum power
• Specific situations
  – Costs of Type II error (mushrooms)
  – Finance:
    • Using incorrect model
    • Missing risks (LTCM)
Problems for Monte-Carlo Tests of Power
• Test a null hypothesis under some alternative
• Need to commit to which alternative
• Power(alternative)
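A Monte Carlo power calculation can be sketched with the Larry Bird setup: fix the rejection rule under the null (48 percent shooting, 57 shots), then simulate under a chosen alternative and count rejections. The alternatives below (40, 35, and 30 percent) are hypothetical values picked for illustration, as are the 5 percent size and the function names; the slides do not specify them.

```python
import math
import random

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def critical_value(n, p_null, size=0.05):
    """Largest c with P(X <= c | p_null) <= size; reject the null when X <= c."""
    c = -1
    while binom_cdf(c + 1, n, p_null) <= size:
        c += 1
    return c

def monte_carlo_power(p_alt, n=57, p_null=0.48, size=0.05, n_sims=5000, seed=0):
    """Share of samples simulated under p_alt that fall in the rejection region."""
    rng = random.Random(seed)
    c = critical_value(n, p_null, size)
    rejections = 0
    for _ in range(n_sims):
        makes = sum(rng.random() < p_alt for _ in range(n))
        if makes <= c:
            rejections += 1
    return rejections / n_sims

# The first row (alternative equal to the null) estimates the test's actual size.
for p_alt in (0.48, 0.40, 0.35, 0.30):
    print(f"true shooting rate {p_alt:.2f}: rejection rate {monte_carlo_power(p_alt):.3f}")
```

Each alternative gives a different number, which is why power has to be reported as Power(alternative) rather than as a single figure.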