STAT 2 Lecture 32: Still more testing
Transcript
  • STAT 2 Lecture 32: Still more testing

  • Recall: the two-sample z-test

    In 1998, a survey of an SRS of 200 Berkeley students found that they, on average, played hackeysack 4 hours a week (with an SD+ of 8 hours)

    In 2008, a survey of an SRS of 200 Berkeley students found that they, on average, played hackeysack 2 hours a week (with an SD+ of 6 hours)

    Has the average number of hours of hackeysack played per week by all Berkeley students changed from 1998 to 2008?

  • Setting up the two-sample z-test

    Null hypothesis: the 1998 population average is the same as the 2008 population average (so the data are like draws from two boxes with the same average)

    Alternative: the averages are different

    Test statistic: the z-statistic for the difference in sample averages

    Assume the 1998 and 2008 samples are independent

  • Calculations

    Observed difference in averages = 2 hours

    Expected difference in averages, assuming the null hypothesis = 0

    SE of 1998 sample average = 8/sqrt(200) = 0.566

    SE of 2008 sample average = 6/sqrt(200) = 0.424

    SE of difference in sample averages = sqrt(0.566² + 0.424²) = 0.707

  • Calculations

    z-statistic = (2 − 0)/0.707 = 2.83

    From tables: P(Z < −2.83) = P(Z > 2.83) = 0.23%

    Two-tailed P-value is 0.23% + 0.23% = 0.46%

    There is a statistically significant difference; the average hours of hackeysack played has changed
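
    As a check, the same calculation is easy to script. A minimal sketch in Python, using the numbers from the slides (SciPy is assumed to be available):

      from math import sqrt
      from scipy.stats import norm

      se_1998 = 8 / sqrt(200)                    # 0.566
      se_2008 = 6 / sqrt(200)                    # 0.424
      se_diff = sqrt(se_1998**2 + se_2008**2)    # 0.707

      z = (4 - 2 - 0) / se_diff                  # observed minus expected difference, over SE
      p = 2 * norm.sf(abs(z))                    # two-tailed P-value
      print(z, p)                                # ~2.83, ~0.0047 (the table gives 0.46%)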

  • Today

    Testing results of experiments

    The chi-square test

  • I

    Experimental averages

  • Example: vitamin C

    A study of the effect of vitamin C on cold resistance is performed. 200 participants are randomly assigned to the treatment group of 100, which receives vitamin C pills, or the control group of 100, which receives placebos. Doctors and participants don't know who's in which group.

  • Example: vitamin C

    The treatment group averaged 2.3 colds (SD 3.1)

    The control group averaged 2.6 colds (SD 2.9)

    Is this difference significant? Does vitamin C prevent colds?

  • A box model for controlled experiments

    First imagine a box containing 200 tickets: one for each participant

    Each ticket has TWO numbers written on it: a treatment number and a control number

    If a participant is randomly assigned to the treatment group, we observe the treatment number; if they're randomly assigned to the control group, we observe the control number

  • A box model for controlled experiments

    Null hypothesis: the average of all 200 of the treatment numbers is equal to the average of all 200 of the control numbers

    Alternative: the treatment and control averages are not equal

  • Try the two-sample z-test

    Let's do the two-sample z-test for now, and think about whether it's appropriate later

    Test statistic = treatment group average − control group average

    Observed value = 2.3 − 2.6 = −0.3

    Expected value under the null = 0

  • Standard errors

    SE of treatment group average (ignoring correction factor) = 3.1/sqrt(100) = 0.31

    SE of control group average = 2.9/sqrt(100) = 0.29

    SE of difference = sqrt(0.31² + 0.29²) = 0.42

  • Try the two-sample z-test

    z-statistic = (−0.3 − 0)/0.42 = −0.71

    P(Z < −0.71) = 23.89%, so the P-value is about 24%

    The difference is explainable by chance: there's not enough evidence to show vitamin C prevents colds (even for this sample)
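
    The same recipe in Python, sticking with the slide's numbers (a sketch; SciPy assumed):

      from math import sqrt
      from scipy.stats import norm

      se_treat = 3.1 / sqrt(100)                   # 0.31
      se_ctrl  = 2.9 / sqrt(100)                   # 0.29
      se_diff  = sqrt(se_treat**2 + se_ctrl**2)    # ~0.42

      z = (2.3 - 2.6 - 0) / se_diff                # ~ -0.71
      print(norm.cdf(z))                           # ~0.24: chance is a plausible explanation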

  • Was the two-sample z-test appropriate?

    But wait! We drew 100 tickets from a box containing 200 tickets, so shouldn't we have used the correction factor?

    Our standard error calculation assumed the two samples were independent, but they're not. The numbers you draw for the treatment tickets affect the numbers you get for the control tickets: if you're in the treatment group, you can't be in the control group!

    IS IT ALL BALDERDASH?

  • Why the two-sample z-test still works for experiments

    The two errors we stated on the previous page will magically (almost) cancel out

    The two-sample z-test will give us a P-value that's marginally too large (conservative), but this is usually better than being too small

    For a proof, see page 33 of the textbook appendix, if you like algebra.

  • Another example: are people rational?

    This experiment was performed on 167 doctors in a summer course at Harvard

    The doctors were asked to evaluate information presented on surgery and radiation treatment for cancer

    Doctors were randomly divided into two groups; the groups were presented the information in different ways

  • Group A

    Of 100 people having surgery:
      10 will die during treatment
      32 will have died by one year
      66 will have died by five years

    Of 100 people having radiation therapy:
      None will die during treatment
      23 will die by one year
      78 will die by five years

  • Group B

    Of 100 people having surgery:
      90 will survive the treatment
      68 will survive one year or longer
      34 will survive five years or longer

    Of 100 people having radiation therapy:
      All will survive the treatment
      77 will survive one year or longer
      22 will survive five years or longer

  • The doctors' results

    In Group A, 40 favoured surgery, 40 favoured radiation (50% vs 50%)

    In Group B, 73 favoured surgery, 14 favoured radiation (83.9% vs 16.1%)

    Can this difference be explained by chance?

  • Setting up the test

    We'll use (Group B percentage − Group A percentage) as our test statistic

    Null hypothesis: difference in percentages for all the doctors is zero

    Alternative: difference is not zero

  • Calculations

    Observed difference = 83.9% − 50% = 33.9%

    Expected difference under the null = 0%

    SE of Group A percentage = sqrt(0.5 × 0.5 / 80) × 100% = 5.6%

    SE of Group B percentage = sqrt(0.839 × 0.161 / 87) × 100% = 3.94%

    SE of difference = sqrt(3.94² + 5.6²) = 6.84%

    Note: some statisticians would pool both samples to get one estimate of the box SD (instead of one for each group). It doesn't make much difference here, though.

  • Carrying out the test

    z-statistic = (33.9 − 0)/6.84 = 4.96

    This is so big it's not even on our normal table, so the P-value will be minuscule (from a computer, the P-value is 1/14000 of 1%)

    These doctors were not rational (this doesn't necessarily mean that the whole population of doctors is irrational, but it might)
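
    For the record, here is the whole calculation as a Python sketch (the group sizes 80 and 87 come from the counts on the slides; SciPy assumed):

      from math import sqrt
      from scipy.stats import norm

      p_a, n_a = 40 / 80, 80                # Group A: 50% favoured surgery
      p_b, n_b = 73 / 87, 87                # Group B: ~83.9% favoured surgery

      se_a = sqrt(0.5 * 0.5 / n_a)          # per-group SEs, unpooled as on the slide
      se_b = sqrt(p_b * (1 - p_b) / n_b)
      se_diff = sqrt(se_a**2 + se_b**2)     # ~0.0684

      z = (p_b - p_a - 0) / se_diff         # ~4.96
      print(2 * norm.sf(z))                 # ~7e-7, i.e. about 1/14000 of 1%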

  • When can we use the two-sample z-test?

    When we are examining the difference between two independent samples, OR

    When we are examining the difference between two groups from a randomised experiment, but NOT

    When we are examining the difference between two dependent samples that aren't from a randomised experiment

    In any case, we require a reasonably large sample size

  • Example

    I have both the midterm and final scores for a large sample of past Stat 2 students. What test should I do for a difference between midterm and final scores?

    Not an experiment

    Dependent: midterm and final scores for the same student are obviously related, so the data are paired

    Can't do a two-sample z-test

    Instead, find the difference between final and midterm scores for each student, and compare the average difference to zero using a one-sample z-test (see the sketch below)
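
    A minimal sketch of that paired approach in Python (the score arrays here are hypothetical placeholders, not real Stat 2 data, and a real application would need a large sample):

      import numpy as np
      from scipy.stats import norm

      # hypothetical paired scores, one entry per student
      midterm = np.array([55.0, 72.0, 64.0, 81.0, 47.0])
      final   = np.array([60.0, 70.0, 71.0, 85.0, 52.0])

      diffs = final - midterm                        # reduce paired data to one sample
      se = diffs.std(ddof=1) / np.sqrt(len(diffs))   # SE of the average difference
      z = (diffs.mean() - 0) / se                    # null: average difference is zero
      print(2 * norm.sf(abs(z)))                     # two-tailed P-value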

  • Aside: the two-sample t-test

    Just as there's a two-sample z-test, there's a two-sample t-test for the difference between the averages of two moderate-sized normal samples

    We don't teach it because it's theoretically fraught; however, it's commonly used (inappropriately)

  • II

    The chi-square test

  • So far

    We've seen:

    z-test: average/percentage/total/count, large sample

    t-test: average/percentage/total/count, normal data

    Two-sample z-test: difference between averages of independent samples or of experimental groups

  • Now

    All the tests we've seen have examined parameters of some model

    What if we want to test an entire chance model? Specifically, how well does the data we've observed fit a particular chance model?

    Such tests are goodness-of-fit tests

  • Example: is this die loaded?

    To see if a die is loaded, I roll it 60 times. I expect ten 1's, ten 2's, ten 3's etc. I get:

    four 1's, six 2's, seventeen 3's, sixteen 4's, eight 5's, nine 6's

    Is the die loaded?

  • Why not do a z-test?

    We could do a z-test to check that the sample average is what it should be (3.5)

    However, there are some things this test won't pick up

    e.g. the die has a 30% chance of a 1, a 30% chance of a 6, and a 10% chance of any other number

    The mean will still be 3.5 (0.3 × 1 + 0.1 × (2 + 3 + 4 + 5) + 0.3 × 6 = 3.5), even though the die is loaded

  • The chi-square statistic

    Again we want to compare what we observe to what we expect. The way we do this is by finding the chi-square statistic:

    Calculate

      (observed frequency − expected frequency)² / expected frequency

    for each outcome, then take the sum of these

    Notes: Frequency just means the count in each category. Don't divide by the SE; that was the z- or t-statistic.

  • Calculating the chi-square statistic

    For ones: (4 − 10)²/10 = 3.6
    For twos: (6 − 10)²/10 = 1.6
    For threes: (17 − 10)²/10 = 4.9
    For fours: (16 − 10)²/10 = 3.6
    For fives: (8 − 10)²/10 = 0.4
    For sixes: (9 − 10)²/10 = 0.1

    χ² = 3.6 + 1.6 + 4.9 + 3.6 + 0.4 + 0.1 = 14.2
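
    The same arithmetic in a couple of lines of Python (a sketch using the counts above):

      observed = [4, 6, 17, 16, 8, 9]    # counts of 1's through 6's
      expected = [10] * 6                # 60 rolls of a fair die

      chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
      print(chi_sq)                      # 14.2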

  • Is this significant?

    If we have used the true expected values, then χ² has an approximate chi-square distribution

    Actually, like the t-distribution, the chi-square distribution is really a whole family of distributions

    Here degrees of freedom = number of categories − 1 = 6 − 1 = 5

    So if the null is true, our χ² statistic has a chi-square distribution with 5 degrees of freedom

  • Is this significant?

    We look up the chi-square distribution in tables or on a computer

    We find that the probability of getting a χ² (with 5 degrees of freedom) of more than 14.2 is 1.4%

    This means the P-value is 1.4%

    The die is loaded

    Note: chi-square tests are almost always one-tailed
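
    On a computer this lookup is one function call. A sketch with SciPy (scipy.stats.chisquare assumes equal expected counts when none are given, which is exactly our fair-die null):

      from scipy.stats import chi2, chisquare

      print(chi2.sf(14.2, df=5))                  # ~0.014: the one-tailed P-value

      stat, p = chisquare([4, 6, 17, 16, 8, 9])   # statistic and P-value in one step
      print(stat, p)                              # 14.2, ~0.014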

  • When do we use the chi-square test?

    We have data in categories (or that we can put into categories)

    We have a (box) model for how many data points fall into each category

    We want to see if our observed data matches this model

    Note: for the test to be accurate, we need the expected number in each category to be at least 5

  • The structure of the chi-square test

    Null hypothesis: often easiest to express in terms of tickets in a box. For the die, the null hypothesis was that rolls were like draws from the box [ 1 2 3 4 5 6 ]

    Alternative: data are not like draws from this box

  • The chi-square statistic

    Draw a table of observed and expected frequencies for each category

    Calculate

      (observed frequency − expected frequency)² / expected frequency

    for each category

    Add up the results; this is χ²

  • The chi-square statistic

    A high value of χ² means the data is different from what we expect

    The exact distribution of χ² depends on the degrees of freedom (number of categories − 1)

    We calculate a P-value using tables or a computer
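
    The whole procedure fits in a short function. A generic sketch (SciPy assumed; the function name is ours for illustration, not a library's):

      from scipy.stats import chi2

      def chi_square_test(observed, expected):
          """Goodness-of-fit test: returns the chi-square statistic and P-value."""
          stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
          df = len(observed) - 1            # number of categories minus one
          return stat, chi2.sf(stat, df)    # one-tailed, as chi-square tests usually are

      print(chi_square_test([4, 6, 17, 16, 8, 9], [10] * 6))   # (14.2, ~0.014)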

  • Recap

  • Recap: the two-sample z-test

    We use the two-sample z-test to test a null hypothesis about the difference in the averages of two boxes (often, that the averages are the same)

    The test statistic is

      z = (observed difference − expected difference) / SE of difference

  • Recap: the distribution of a difference

    If the null hypothesis is that the true averages are the same, the expected difference is zero

    If the SEs of the sample averages are a and b, the SE of their difference is sqrt(a² + b²)

    This requires either independent samples, or a randomised experiment

    With large samples, we can compare z to a standard normal to get a P-value

  • Recap: the chi-square test

    If we have data in categories, and we wish to know whether this data fits a null model, we calculate the chi-square statistic:

      χ² = sum of (observed frequency − expected frequency)² / expected frequency

    The degrees of freedom is the number of categories minus one

    We find the P-value from a chi-square distribution using a table or computer

  • Monday

    The chi-square test for independence

    Problems with statistical tests
