Date post: 04-Jul-2020
Page 1: Stascal inference on proporons - emeyers.scripts.mit.eduemeyers.scripts.mit.edu/emeyers/wp-content/uploads/CS149_slides/class16.pdfPaul the Octopus In the 2010 World Cup, Paul the

Statistical inference on proportions

Outline for today

Review of the binomial, normal distributions
Bayesian inference
Hypothesis tests for proportions

Announcement

Opening day was yesterday

The Red Sox beat the Pirates 5-3
Off to a good start!

Announcement 2

Cathy O’Neil speaking on “Weapons of Math Destruction”

When: Saturday April 8th, at 10:30am
Where: Hooker Auditorium, Clapp Laboratory, Mount Holyoke

Questions about worksheet 7?

Review: Discrete probability models

Discrete distributions: random variable X takes on discrete numbers

•  Examples?

Bernoulli Distribution:

•  What does it model?
   •  Probability of success on a single trial (coin flip)

•  What is the sample space?
   •  X is in {0, 1}

•  What are the parameters?
   •  π: probability of success
   •  i.e., Pr(X = 1)

Review: Binomial distribution

What does it model?
•  It models the probability of k successes out of n trials

What are the parameters?
•  n: number of trials
•  π: probability of success on each trial
•  k: number of successes (the outcome being counted, rather than a parameter of the distribution)

In R:
•  Pr(X = k): dbinom(x = k, size = n, prob = pi)
•  Pr(X ≤ a): pbinom(a, size = n, prob = pi)
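As a rough illustration of the two R calls above, the sketch below evaluates them for one concrete case. The values n = 10 and π = .333 are assumptions chosen for illustration, and the variable names `n.trials` and `p.success` are hypothetical (the name `pi` is avoided because R already defines it as 3.14159…).

```r
n.trials  <- 10      # number of trials (illustrative value)
p.success <- 0.333   # probability of success on each trial (illustrative value)

# Pr(X = 3): probability of exactly 3 successes
dbinom(x = 3, size = n.trials, prob = p.success)

# Pr(X <= 3): probability of 3 or fewer successes
pbinom(3, size = n.trials, prob = p.success)

# pbinom is the running sum of dbinom, so this gives the same number
sum(dbinom(0:3, size = n.trials, prob = p.success))
```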

Continuous probability distributions

Continuous distributions have probability density functions that can be used to calculate the probability that an event is in some interval [a, b]
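One way to see this in R is to integrate a density over an interval and compare the result with the cumulative distribution function. The sketch below assumes a standard normal density and the interval [-1, 1]; both are illustrative choices, not values from the slides.

```r
# Probability that a continuous variable lands in [a, b] is the
# area under its density curve between a and b.
a <- -1
b <- 1

# Numerically integrate the standard normal density over [a, b]
area <- integrate(dnorm, lower = a, upper = b)
area$value

# Same probability from the cumulative distribution function
pnorm(b) - pnorm(a)
```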

Normal Density Curve

A normal distribution follows a bell-shaped curve
•  Useful for describing the mean of many random outcomes

There are two parameters that characterize normal curves, which are:

•  The mean: μ
•  The standard deviation: σ

Notation: X ~ N(μ, σ)

Normal curves with different means

[Figure: density curves for N(0, 1) and N(2, 1)]

Normal curves with different variances

[Figure: density curves for N(0, 1), N(0, .5), and N(0, 2)]
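Curves like the ones on this slide could be redrawn with base R graphics. The sketch below is an assumption about how the figure might be produced, not the original plotting code, and it reads the second argument of N(·, ·) as a standard deviation, following the notation X ~ N(μ, σ) used earlier.

```r
x <- seq(-6, 6, length.out = 400)

# Draw the narrowest curve first so the y-axis is tall enough for all three
plot(x, dnorm(x, mean = 0, sd = 0.5), type = "l", lty = 1,
     xlab = "x", ylab = "density")
lines(x, dnorm(x, mean = 0, sd = 1), lty = 2)
lines(x, dnorm(x, mean = 0, sd = 2), lty = 3)
legend("topright", legend = c("N(0, .5)", "N(0, 1)", "N(0, 2)"), lty = 1:3)
```

A smaller standard deviation gives a taller, narrower curve, since the total area under each curve is always 1.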

Normal probability

Example: IQ scores are normally distributed with μ = 100, sd = 10

What is the probability someone would have an IQ less than 88?

R: pnorm(88, 100, 10)
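The IQ example can be worked through in two equivalent ways: calling `pnorm` with the mean and standard deviation, or converting 88 to a z-score first. This is a small sketch using the values from the example; the variable names are hypothetical.

```r
mu    <- 100   # mean from the example
sigma <- 10    # standard deviation from the example

# Pr(IQ < 88) directly from the normal CDF
p.direct <- pnorm(88, mean = mu, sd = sigma)

# Same probability after converting 88 to a z-score
z <- (88 - mu) / sigma        # 1.2 standard deviations below the mean
p.zscore <- pnorm(z)

p.direct    # about 0.115
p.zscore
```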

Why the normal distribution?

Central limit theorem: the mean (x̅) of a large number of random variables independently drawn from the same distribution (with finite variance) is approximately normal
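A quick simulation can make the central limit theorem concrete. The sketch below assumes rolls of a fair six-sided die as the underlying distribution, with 50 rolls per sample and 5,000 samples; these choices and the seed are illustrative, not from the slides.

```r
set.seed(149)
n.draws <- 50      # rolls averaged in each sample (assumed)
n.means <- 5000    # number of sample means to collect (assumed)

# Each sample mean averages n.draws rolls of a fair six-sided die
sample.means <- replicate(n.means,
                          mean(sample(1:6, n.draws, replace = TRUE)))

# The histogram should look roughly bell-shaped even though
# a single die roll is uniform on 1..6
hist(sample.means, breaks = 30, main = "Means of 50 die rolls")

# Theory: centre 3.5, spread = sd of one roll / sqrt(n.draws)
mean(sample.means)
sd(sample.means)
sqrt(35 / 12) / sqrt(n.draws)
```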

Example: Binomial distribution becomes normal as n increases

Binomial distribution is the sum of Bernoulli outcomes
•  e.g., binomial distribution says if I toss a coin n times, what is the probability I get k heads

By the central limit theorem, it should become like a normal distribution as n increases

Example: Binomial distribution becomes normal as n increases (here π = .3)
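To put numbers on this, the sketch below compares the exact binomial probabilities at π = .3 with a normal curve that has the same mean, nπ, and standard deviation, sqrt(nπ(1 − π)). The values of n are assumptions chosen for illustration.

```r
p.success <- 0.3              # the π used on the slide
n.values  <- c(5, 20, 100)    # illustrative numbers of trials

max.diff <- sapply(n.values, function(n.trials) {
  k <- 0:n.trials
  exact <- dbinom(k, size = n.trials, prob = p.success)
  normal.curve <- dnorm(k, mean = n.trials * p.success,
                        sd = sqrt(n.trials * p.success * (1 - p.success)))
  # Largest gap between the exact probabilities and the normal curve
  max(abs(exact - normal.curve))
})

data.frame(n = n.values, max.diff = max.diff)
```

The largest gap should shrink as n grows, which is the pattern the slide's figure illustrates.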

Examples of normal distributions in baseball?

Can you think of any examples?
•  Height of players?
•  Weight of players?
•  Length of games?

Examining player heights…

Any questions about probability?

On to statistical inference!

Statistical inference

Statistical inference: use a sample of data to deduce properties of an underlying population, or stochastic process

In the context of baseball this usually means: looking at a player’s performance to tell something about the player’s ability
•  Ability: innate talent
•  Performance: outcomes from playing a number of games

Statistical inference

We’ve seen many cases of simulating a player’s performance based on pre-specified abilities (probability)

•  E.g., we can simulate a .333 OBP by rolling a die (or using an R function) to generate random data consistent with a .333 OBP

With statistical inference we go in the other direction: we take a collection of outcomes and estimate the probability model parameters

Hit, Out, Hit, Out, Out, Out, …

Hit, Out, Hit, Out, Out, Out, … → Estimate π_hit
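Both directions can be sketched in a few lines of R: first generate outcomes from a known ability, then estimate the ability back from those outcomes. The number of plate appearances (500), the seed, and the variable names are assumptions made for illustration.

```r
set.seed(333)
true.obp <- 0.333    # pre-specified ability
n.pa     <- 500      # number of plate appearances (assumed)

# Simulation direction: ability -> performance
on.base  <- rbinom(n.pa, size = 1, prob = true.obp)   # 1 = on base, 0 = out
outcomes <- ifelse(on.base == 1, "Hit", "Out")
head(outcomes)

# Inference direction: performance -> estimated ability
estimated.obp <- mean(on.base)
estimated.obp
```

The estimate will generally be close to, but not exactly, .333; with fewer plate appearances it tends to wander further from the true value.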

Bayesian Inference: Determining the ability of a player

Suppose there are 3 players with different true OBP abilities:
•  Player H1’s true OBP is .200    H1: π = .200
•  Player H2’s true OBP is .333    H2: π = .333
•  Player H3’s true OBP is .500    H3: π = .500

One player is selected at random and we observe the player for 10 plate appearances

Can we tell whether it was player H1, H2, or H3 who was picked?

Let’s simulate this with a 4, 6, and 10 sided die
•  1 or 2 is on base
•  Higher numbers are outs
•  I will roll the die 10 times…

Bayesian Inference: Determining the ability of a player

Suppose we got 5 on base events out of 10 plate appearances

Question: What die was chosen?
•  i.e., what value is π?

Determining the ability of a player

Here are the simulation results from 1000 simulations of rolling each of the different dice 10 times (rows: π; columns: number of hits):

π        0    1    2    3    4    5    6    7    8    9   10
0.200   95  271  315  199   80   35    5    0    0    0    0
0.333   17   81  206  270  226  121   59   16    4    0    0
0.500    7   44  111  206  252  205  122   47    6    0    0

Total number of simulations that produced 5 hits = 35 + 121 + 205 = 361

Pr(π = .200 | 5 hits) = 35/361 = .097
Pr(π = .333 | 5 hits) = 121/361 = .335
Pr(π = .500 | 5 hits) = 205/361 = .568
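A simulation along these lines can be rerun in R with `rbinom` standing in for the dice. This is a sketch under assumptions (the seed and variable names are hypothetical), so the counts will differ somewhat from the table above, but the posterior probabilities should land in the same neighborhood.

```r
set.seed(2010)
n.sims    <- 1000
abilities <- c(H1 = 0.200, H2 = 0.333, H3 = 0.500)

# For each ability, simulate 1000 sets of 10 plate appearances and
# count how many sets produced exactly 5 on-base events
counts.5 <- sapply(abilities, function(p) {
  sum(rbinom(n.sims, size = 10, prob = p) == 5)
})

# With each player equally likely to be picked, the posterior is each
# player's share of the simulations that produced 5 on-base events
posterior <- counts.5 / sum(counts.5)
posterior

# The same calculation using the counts reported on the slide
c(35, 121, 205) / 361
```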

Determining the ability of a player

Results based on the binomial distribution
•  What are n, k and π here?

π         0     1     2     3     4     5     6     7     8     9    10
0.200  0.11  0.27  0.30  0.20  0.09  0.03  0.01  0.00  0.00  0.00  0.00
0.333  0.02  0.09  0.20  0.26  0.23  0.14  0.06  0.02  0.00  0.00  0.00
0.500  0.00  0.01  0.04  0.12  0.21  0.25  0.21  0.12  0.04  0.01  0.00
sum    0.13  0.37  0.54  0.58  0.53  0.42  0.28  0.14  0.04  0.01  0.00

Binomial distribution results normalized

π         0     1     2     3     4     5     6     7     8     9    10
0.200  0.85  0.73  0.56  0.34  0.17  0.07  0.04  0.00  0.00  0.00
0.333  0.15  0.24  0.37  0.45  0.43  0.33  0.21  0.14  0.00  0.00
0.500  0.00  0.03  0.07  0.21  0.40  0.60  0.75  0.86  1.00  1.00
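The k = 5 column of the normalized table can be reproduced with `dbinom` and Bayes' rule. The sketch below assumes equal prior probabilities of 1/3, matching "one player is selected at random"; the variable names are hypothetical. Because the slide normalizes already-rounded values, its entries can differ from this calculation in the second decimal place.

```r
abilities <- c(H1 = 0.200, H2 = 0.333, H3 = 0.500)
prior     <- rep(1 / 3, 3)     # one player selected at random

# Likelihood of 5 on-base events in 10 plate appearances for each player
likelihood <- dbinom(5, size = 10, prob = abilities)

# Bayes' rule: posterior is proportional to prior times likelihood
posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 2)
```

Changing the 5 to another count gives the other columns of the table.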

Bayesian inference

Gives a probability distribution over ability (parameters) given that we have observed some performance (data)

Hypothesis tests

Often we want to test if a parameter is equal to a particular value
•  e.g., we might want to test if π = .350

To test if a parameter is equal to a particular value we can use hypothesis tests

Paul the Octopus

In the 2010 World Cup, Paul the Octopus (in a German aquarium) became famous for correctly predicting 11 out of 13 soccer games

Question: is Paul psychic?

Paul the Octopus

Question: If Paul was just guessing (not psychic), what proportion of games would we expect him to guess correctly?

•  Answer: π = .5

Question: How could we calculate the probability Paul would guess 11 or more games correctly?

•  Answer 1: We could flip a fair coin 13 times and see whether we get 11 or more heads. Then repeat this process 10,000 times and count how often that happens.

•  We can do this simulation in R using the following commands:
   > simulated.correct.guesses <- rbinom(10000, 13, .5)
   > num.sims.as.good.as.paul <- sum(simulated.correct.guesses >= 11)
   > proportion.as.good.as.paul <- num.sims.as.good.as.paul/10000

Paul the Octopus

What are some answers for how often a randomly guessing octopus would guess 11 or more out of 13 games correctly?

I got: 129 of 10,000 simulated trials have 11 or more correct guesses
•  If Paul was guessing, he would only get 11 or more right 129/10,000 ≈ 1.3% of the time

Paul the Octopus

Do you think Paul is psychic?

Paul the Octopus: second approach

Question: If Paul was just guessing (not psychic), what proportion of games would he have guessed correctly?

•  Answer: π = .5

Question: How can we calculate the probability he would guess 11 or more games correctly?

•  Answer 2: We could use the binomial distribution to calculate the probability of getting 11 or more heads

•  We can do this in R using:
   > sum(dbinom(11:13, 13, .5))  # sum Pr(X = 11) + Pr(X = 12) + Pr(X = 13)
   > 1 - pbinom(10, 13, .5)  # equivalently: 1 - Pr(X ≤ 10)
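R also ships an exact binomial test, `binom.test`, which packages this tail probability as a p-value. The slides do not use it, so the sketch below is offered only as a cross-check; the variable name `paul.test` is hypothetical.

```r
# Exact one-sided test of H0: π = .5 against the alternative π > .5,
# given 11 correct guesses in 13 games
paul.test <- binom.test(x = 11, n = 13, p = 0.5, alternative = "greater")
paul.test$p.value

# Matches the tail probability computed directly from the binomial
sum(dbinom(11:13, size = 13, prob = 0.5))   # 92/8192, about 0.011
```

The exact value, about 1.1%, is close to the 129/10,000 obtained from the simulation.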

