Practical Issues in Statistical Interpretation of Tevatron Data

6/5/12 T. Junk, SLAC Stats Progress

Thomas R. Junk, Fermilab

Progress on Statistical Issues in Searches Conference, SLAC National Accelerator Laboratory

June 5, 2012

• Multiple Parameters of Interest
• Handling Nuisance Parameters
• Overbinning, Smoothing, and Distributions that Ought Not to be Smoothed
• Model Validation with Multivariate Analyses
• ABCD Methods
• Look-Elsewhere

Proton-antiproton collisions for Run I and Run II

Run II: $\sqrt{s} = 1.96$ TeV ($p\bar{p}$)

[Figure: the Fermilab accelerator complex, with the Main Injector and Recycler, the antiproton source, and the Booster labeled]

Tevatron ring radius = 1 km. Commissioned in 1983.

Main Injector commissioned for Run II.

Recycler used as another antiproton accumulator.

Run II ended Sep. 30, 2011. 10 fb$^{-1}$ of analyzable data per experiment. 500+ papers per experiment and counting!

Two (or more) Parameters of Interest

From the 2011 PDG Statistics Review

http://pdg.lbl.gov/2011/reviews/rpp2011-rev-statistics.pdf

For quoting Gaussian uncertainties on single parameters. Ellipse is a contour of 2ΔlnL = 1.

For displaying joint estimation of several parameters.

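1D or 2D Presentation

[Figure: likelihood contours in the (Parameter 1, Parameter 2) plane. The 2ΔlnL = 1 contour gives 68% intervals on each parameter separately; the 2ΔlnL = 2.3 contour encloses 68% in 2D.]

When showing a 2D plot, I prefer to show the contours which cover in 2D. The 2ΔlnL = 1 contour only covers for the 1D parameters, one at a time.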
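
The two thresholds quoted above follow from the chi-squared distribution with one and two degrees of freedom. A minimal sketch, assuming the Gaussian (large-sample) limit in which 2ΔlnL is chi-squared distributed; the function names are illustrative only:

```python
import math

def chi2_cdf_1dof(x):
    """CDF of a chi-squared variable with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))

def chi2_cdf_2dof(x):
    """CDF of a chi-squared variable with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)

def threshold_1dof(coverage, tol=1e-12):
    """2*Delta(lnL) threshold for one parameter, found by bisection."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_1dof(mid) < coverage:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold_2dof(coverage):
    """2*Delta(lnL) threshold for two parameters jointly (closed form)."""
    return -2.0 * math.log(1.0 - coverage)

ONE_SIGMA = math.erf(1.0 / math.sqrt(2.0))  # two-sided 1-sigma coverage, 0.6827

t1 = threshold_1dof(ONE_SIGMA)           # close to 1.0
t2 = threshold_2dof(ONE_SIGMA)           # close to 2.30
joint_coverage_of_t1 = chi2_cdf_2dof(1.0)  # 2D coverage of the 2*Delta(lnL)=1 contour
print(t1, t2, joint_coverage_of_t1)
```

In this limit the 2ΔlnL = 1 contour contains only about 39% of outcomes jointly, which is why it should not be read as a 68% region in 2D.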

A Variety of Ways to Show 2D Fit Results

PRD 85, 071106; PRD 83, 032009

http://www-cdf.fnal.gov/physics/new/top/2011/WhelDil/index.html

68% and 95% contours

Hypothesis Testing with Two Parameters of Interest?

See, for example, D0’s evidence for t-channel single top production: Phys. Lett. B 705 (2011) 313-319

“... using the log-likelihood approach ... we compute for the first time the significance of the tqb cross section independently of any assumption on the production rate of tb.”

In this specific case the correlation is small, and this claim isn’t so bad. But to calculate a p-value p(λ ≥ λobs | signal = 0), one needs a sample space of pseudoexperiments, and thus an assumption of the s-channel rate.

Hypothesis Testing with Two Parameters of Interest?

Another issue: a variation on the LEE theme.

s-channel and t-channel single top events differ in their kinematic distributions.

For observation and measurement of the total single top production rate σ(s+t), we’d train our MVAs on the Standard Model mixture of both.

For separate measurement of σ(s) and σ(t), we have choices of training strategies. Single-tag events: t-channel. Double-tag events: s-channel.

Suppose we want to do a hypothesis test on just the t-channel? Re-optimize all MVAs with t-channel as the signal and s-channel as background. Redo this for the s-channel.

Ideally we’d pick the most sensitive MVA (highest expected significance) for the test we want to do, but there is a temptation to pick the one with the highest measured significance (we tell analyzers to pick the most sensitive).

If you have an excess of data events that could be s- or t-channel (can’t tell), this strategy may end up giving an observation of both (or neither), when we’re really only sure there’s at least one process present.

Several Analyses on the Same Data

• Different groups are interested in the same search/measurement using the same data.
• May have slightly different selection requirements (jet energies, lepton types, missing Et, etc.).
• Usually have different choices of MVA, or even training strategies for the same MVA.
• Always will give different results!

What to do?

• Pick one and publish it. Criterion: best sensitivity (median expected limit, median expected p-value, median expected measurement uncertainty). How to pick it if the result is 2D? Need a 1D figure of merit.
• Can check consistency with pseudoexperiments: a p-value using Δ(measurement) as a test statistic. What’s the chance of running two analyses on the same data and getting a result as discrepant as what we got?
• Combine MVAs into a super-MVA:
  - Keeps everyone happy and involved
  - Usually helps sensitivity
  - Requires coordination and alignment of each event in data and MC
  - Easiest when overlap in data samples is 100%. Otherwise have to break the sample up into shared and non-shared subsets and analyze them separately.

What not to do: pick the one with the “best” observed result. (LEE!)

An Example of Running Three Analyses on the Same Events in Monte Carlo Repetitions

Different questions can be asked: What’s the distribution of the maximum difference between the measurements of any two teams? What’s the quadrature sum of the pairwise differences? Condition on the sum? (Probably not.)
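
The first of these questions can be sketched with a toy model. Everything here is an assumption made for illustration: each team's result is modeled as a fluctuation shared by all teams (same data) plus a private Gaussian fluctuation (different selections and MVAs), and the widths are hypothetical. A real study would rerun the actual analyses on common pseudo-datasets.

```python
import random
from itertools import combinations

def max_pairwise_difference(values):
    """Largest absolute difference between any two measurements."""
    return max(abs(a - b) for a, b in combinations(values, 2))

def simulate_max_differences(n_pseudo, sigma_shared, sigma_private,
                             n_teams=3, seed=1):
    """Toy pseudoexperiments: shared + private fluctuation per team."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_pseudo):
        shared = rng.gauss(0.0, sigma_shared)  # common to all teams
        results = [shared + rng.gauss(0.0, sigma_private)
                   for _ in range(n_teams)]
        diffs.append(max_pairwise_difference(results))
    return diffs

def consistency_p_value(observed_max_diff, simulated_diffs):
    """Fraction of pseudoexperiments at least as discrepant as observed."""
    n_extreme = sum(1 for d in simulated_diffs if d >= observed_max_diff)
    return n_extreme / len(simulated_diffs)

toy_diffs = simulate_max_differences(5000, sigma_shared=1.0, sigma_private=0.5)
p_consistency = consistency_p_value(1.5, toy_diffs)
print(p_consistency)
```

Note that the shared fluctuation cancels in every pairwise difference, so only the non-shared part of the uncertainty enters the consistency test.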

Systematic Uncertainty Handling

For a very thorough review, see Luc Demortier’s note: “P Values: What They Are and How to Use Them”

http://www-cdf.fnal.gov/~luc/statistics/cdf8662.pdf

Plausible options:

1) Prior-predictive method
2) Supremum method
3) Confidence-level method
4) Plug-in p-values
5) Define all nuisance parameters to be parameters of interest
6) Define only the important nuisance parameters to be parameters of interest

The prior-predictive method is a mixture of Bayesian and frequentist reasoning.

The supremum method is very conservative and, I argue, not fully non-Bayesian. It also produces mixed results: one can have an outcome which is an excess over background when setting limits and a deficit when computing a p-value.

Treat Nuisance Parameters as Parameters of Interest!

• One person’s nuisance parameter is another’s parameter of interest.

• Really only good if you have one dominant source of systematic uncertainty, and you want to show your joint measurement of the nuisance parameter and the parameter of interest.

Example: top quark mass (parameter of interest) vs. CDF’s jet energy scale in all-hadronic $t\bar{t}$ events. Doesn’t follow my suggestion! But one parameter (JES) is not a parameter of “interest”. If it were, we’d use ΔlnL = 1.15 instead of 0.5.

Difficult to apply to cases with many nuisance parameters.

“Strong” Sideband Constraints

Another Strong Sideband Constraint Example

“Weak” Sideband Constraints

CDF’s Ωb observation paper: Phys. Rev. D 80 (2009) 072003

A Mixture of Theory and Data is Needed for a More Complicated Situation

H→ZZ→four leptons. Main backgrounds:

• pp→ZZ: (MC) × theory
• pp→Z+jets, ZW+jet(s), ...: data
• pp→$t\bar{t}$

Low-mass 4L side: off-shell Z’s, “radiative tail”, and other backgrounds.

Dependent on theoretical predictions of the shape of the dominant ZZ background.

With more data, replace “bad” systematics with “good” ones (theory replaced with data). But in the early stages, the “bad” systematic uncertainties are smaller!

Another Weak Sideband Constraint Example that Looks Like a Strong Sideband Constraint

Phys. Rev. D 85 (2012) 032005, e-Print: arXiv:1106.4782 [hep-ex]

Breaking the Flavor Degeneracy with a Tag Variable

No Sideband Constraints?

Example: counting experiment, only have a priori predictions of expected signal and background.

All test statistics are equivalent to the event count: they serve to order outcomes as more signal-like and less signal-like. More events == more signal-like.

Classical example: Ray Davis’s solar neutrino deficit observation. Comparing data (neutrino interactions on a chlorine detector at the Homestake mine) with a model (John Bahcall’s Standard Solar Model). Calibrations of the detection system were exquisite. But it lacked a standard candle.

How to incorporate systematic uncertainties? Fewer options left.

Another example: before you run the experiment, you have to estimate the sensitivity. No sideband constraints yet (except from other experiments).

The prior-predictive method then is equivalent to the profile method using the control samples to estimate nuisance parameters. And it’s more general in cases where the signal contamination of the sidebands is important.
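
For the counting experiment with only an a priori background prediction, a prior-predictive p-value can be sketched as the Poisson tail probability averaged over a prior for the background. The truncated-Gaussian prior, the sample size, and the function names are assumptions for illustration, not a prescription from the talk.

```python
import math
import random

def poisson_tail(n_obs, mean):
    """P(N >= n_obs) for a Poisson distribution with the given mean."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-mean)      # P(N = 0)
    cdf = term
    for k in range(1, n_obs):   # accumulate P(N = 1) ... P(N = n_obs - 1)
        term *= mean / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def prior_predictive_p_value(n_obs, b_nominal, b_sigma,
                             n_samples=20000, seed=7):
    """Average the Poisson tail over a Gaussian prior on b, truncated at 0."""
    rng = random.Random(seed)
    total = 0.0
    accepted = 0
    while accepted < n_samples:
        b = rng.gauss(b_nominal, b_sigma)
        if b < 0.0:
            continue            # reject unphysical background values
        total += poisson_tail(n_obs, b)
        accepted += 1
    return total / n_samples

p_no_syst = poisson_tail(10, 3.0)
p_with_syst = prior_predictive_p_value(10, 3.0, 1.0)
print(p_no_syst, p_with_syst)
```

The background uncertainty raises the p-value relative to the fixed-background case, because outcomes with an upward-fluctuated background make the observed count less surprising.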

Several “on’s”, Several “off’s”

More than one “off” sample

Conflicting estimates of background: what to do?

Very typical example: Pythia vs. Herwig (here the “off” samples are Monte Carlo).
• “Take the difference as a systematic”
• “Take the average and half the difference as a systematic”
• Try to learn something about which one is more reliable.

But you can invert more than one cut and have “conflicting” off samples in the data too. Really the extrapolation factors or the sample composition estimates are what’s wrong, not the actual data.

Lesson learned: try to do a better job with the predictions! Statistical methods won’t save us.

See Glen Cowan’s talk yesterday.

MC Statistics and “Broken” Bins

• Limit calculators/discovery tools cannot tell if the background expectation is really zero or just a downward MC fluctuation.
• Real background estimations are sums of predictions with very different weights in each MC event (or data event).
• Rebinning or just collecting the last few bins together often helps.
• Problem compounded by requiring shape uncertainties to be evaluated! Alternate shape MC samples are often even more thinly populated than the nominal samples. Validation of adequate preparation of results is necessary? (But what are the criteria?)

NDOF = ?

Overbinning is like overtraining a NN. s, b, and d can all be in different bins. A histogram can be partially overbinned, like this one here:

Some Very Early Plots from ATLAS Suffer from Limited Sample Sizes in Control Samples and Monte Carlo

Nearly all experiments are guilty of this, especially in the early days!

The left plot has adequate binning in the “uninteresting” region. Falls apart on the right-hand side, where the signal is expected. Suggestions: more MC, wider bins, transformation of the variable (e.g., take the logarithm). Not sure what to do with the right-hand plot except get more modeling events.

Data points’ error bars are not sqrt(n). What are they? I don’t know. How about the uncertainty on the prediction?

This Histogram is Probably Okay

[Figure: histogram of an “MVA mark” variable]

The binning is a little odd, though. You can get this kind of distribution from a decision tree or a likelihood MVA (forest of delta functions).

Watch out though! Smoothing and some kinds of interpolations (e.g., horizontal morphing à la Alex Read) are inappropriate for this distribution.

Sometimes distributions like these have natural causes: lepton φ distributions for detectors with many cracks, for example.

A More Common Example: muon coverage at high angles.

No smoothing/extrapolation allowed here!

Optimizing Histogram Binning

Two competing effects:

1) Separation of events into classes with different s/b improves the sensitivity of a search or a measurement. Adding events in categories with low s/b to events in categories with higher s/b dilutes information and reduces sensitivity.

Pushes towards more bins.

2) Insufficient Monte Carlo can cause some bins to be empty, or nearly so. This only has to be true for one high-weight contribution.

Need reliable predictions of signals and backgrounds in each bin.

Pushes towards fewer bins.

Note: it doesn’t matter that there are bins with zero data events; there’s always a Poisson probability for observing zero.

The problem is inadequate prediction. Zero background expectation and nonzero signal expectation is a discovery!
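
One way to look for inadequately predicted bins before running a limit or discovery tool is to compute the effective number of MC entries behind each background bin. This is a sketch under assumptions: the threshold of 10 effective entries and the function names are hypothetical choices, not criteria from the talk (which notes that the criteria are an open question).

```python
def effective_entries(weights):
    """Effective number of unweighted entries: (sum w)^2 / sum(w^2)."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (sum_w * sum_w / sum_w2) if sum_w2 > 0.0 else 0.0

def find_broken_bins(signal, background_weights, min_effective_entries=10.0):
    """Indices of bins with expected signal but a thinly populated background.

    signal: expected signal yield per bin.
    background_weights: per bin, the list of MC event weights that fell there.
    """
    broken = []
    for i, (s, weights) in enumerate(zip(signal, background_weights)):
        if s > 0.0 and effective_entries(weights) < min_effective_entries:
            broken.append(i)
    return broken

signal = [0.1, 0.5, 1.0, 0.3]
background_weights = [
    [0.2] * 50,            # well populated
    [0.2] * 20,            # adequately populated
    [],                    # no MC at all: zero background "prediction"
    [5.0] + [0.01] * 100,  # dominated by one high-weight event
]
broken_bins = find_broken_bins(signal, background_weights)
print(broken_bins)
```

The last bin illustrates the remark that the problem only has to occur for one high-weight contribution: 101 MC events are present, but the prediction rests almost entirely on one of them.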

Overbinning = Overlearning

A common pitfall: choosing selection criteria after seeing the data. “Drawing small boxes around individual data events.”

The same thing can happen with Monte Carlo predictions.

Limiting case: each event in signal and background MC gets its own bin. Fake perfect separation of signal and background!

Statistical tools shouldn’t give a different answer if bins are shuffled/sorted.

Try sorting by s/b, and collect bins with similar s/b together. Can get arbitrarily good performance from an analysis just by overbinning it.

Note: empty data bins are okay; just empty prediction is a problem. It is our job, however, to properly assign s/b to data events that we did get (and all possible ones).
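
The suggestion to sort by s/b and collect bins with similar s/b can be sketched as follows. The merging rule (accumulate from the highest s/b downward until a minimum background yield is reached) and the threshold are assumptions for illustration; other grouping rules are equally plausible.

```python
def merge_bins_by_s_over_b(signal, background, min_background):
    """Sort bins by s/b and merge neighbors until each has enough background.

    Returns a list of (signal, background) pairs, highest s/b first.
    Bins with zero predicted background are treated as having infinite s/b.
    """
    def s_over_b(i):
        return signal[i] / background[i] if background[i] > 0.0 else float("inf")

    order = sorted(range(len(signal)), key=s_over_b, reverse=True)
    merged = []
    s_acc = b_acc = 0.0
    for i in order:
        s_acc += signal[i]
        b_acc += background[i]
        if b_acc >= min_background:
            merged.append((s_acc, b_acc))
            s_acc = b_acc = 0.0
    if s_acc > 0.0 or b_acc > 0.0:
        # Leftover low-s/b bins: fold them into the last merged bin.
        if merged:
            s_last, b_last = merged.pop()
            merged.append((s_last + s_acc, b_last + b_acc))
        else:
            merged.append((s_acc, b_acc))
    return merged

signal = [1.0, 2.0, 3.0, 0.5]
background = [10.0, 5.0, 0.0, 1.0]   # third bin has an empty prediction
merged_bins = merge_bins_by_s_over_b(signal, background, min_background=5.0)
print(merged_bins)
```

Total signal and background yields are conserved by the merge; only the fake separation power of the thinly predicted bins is removed.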

Model Validation

• Not normally a statistics issue, but something HEP experimentalists spend most of their time worrying about.

• Systematic uncertainties on predictions are usually constrained by data predictions.

• Often discrepancies between data and prediction are the basis for estimating systematic uncertainty.

Checking Input Distributions to an MVA

• Relax selection requirements: show modeling in an inclusive sample (example: no b-tag required for the check, but require it in the signal sample).
• Check the distributions in sidebands (require zero b-tags).
• Check the distribution in the signal sample for all selected events.
• Check the distribution after a high-score cut on the MVA.

Example: Q(lepton) × η(untagged jet) in CDF’s single top analysis. Good separation power for t-channel signal.

Highest-|η| jet as a well-chosen proxy.

Phys. Rev. D 82:112005 (2010)

6/5/12 T. Junk, Statistics, ETH Zurich, 30 Jan-3 Feb

Checking MVA Output Distributions

• Calculate the same MVA function for events in sideband (control) regions.
• For variables that are not defined outside of the signal regions, put in proxies. (Sometimes just a zero for the input variable works well if the quantity really isn’t defined at all. Pick a typical value, not one way off on the edge of its distribution.)
• Be sure to use the same MVA function as for analyzing the signal data.

Example: CDF’s single-top NN validated using events with zero b-tags.

[Figure label: signal region]

Phys. Rev. D 82:112005 (2010)

A Comparison in a Control Sample that is Less than Perfect

CDF’s single top Likelihood Function discriminant checked in untagged events.

Phys. Rev. D 82:112005 (2010)

Strategy: assess a shape systematic covering the difference between data and MC; extrapolate the uncertainty from the control sample to the signal sample.

If the comparison is okay within statistical precision, do not assess an additional uncertainty (even/especially if the precision is weak). Barlow, hep-ex/0207026 (2002).

Another Validation Possibility: Train Discriminants to Separate Each Background

Phys. Rev. D 82:112005 (2010)

Same input variables as the signal LF. The LF has the property that the sum of these plus the signal LF is 1.0 for each event. Gives confidence. If the check fails, it’s a starting point for an investigation, and not a way to estimate an uncertainty.

Model Validation with MVAs

• Even though input distributions can look well modeled, the MVA output could still be mismodeled. Possible cause: correlations between two or more variables could be mismodeled.
• Checks in subsets of events can also be incomplete. A sum of distributions whose shapes are well reproduced by the theory can still be mismodeled if the relative normalizations of the components are mismodeled.

• Can check the correlations between variables pairwise between data and prediction.
• Difficult to do if some of the prediction is a one-dimensional extrapolation from control regions (e.g., ABCD methods).

• My favorite: check the MVA output distribution in bins of the input variables! We care more about the MVA output modeling than the input variable modeling anyway.
• Make sure to use the same normalization scheme as for the entire distribution; do not rescale to each bin’s contents.

Ideally, we’d try to find a control sample depleted in signal that has exactly the same kind of background as the signal region (usually this is unavailable).

The Sum of Uncorrelated 2D Distributions may be Correlated

[Figure: scatter plot in the (x, y) plane]

Knowledge of one variable helps identify which sample the event came from and thus helps predict the other variable’s value even if the individual samples have no covariance.
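
A toy illustration of this effect, assuming two hypothetical samples that are each an uncorrelated unit Gaussian in x and y but are centered at different points:

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def gaussian_blob(rng, center_x, center_y, n):
    """n points with independent unit-Gaussian x and y around a center."""
    xs = [rng.gauss(center_x, 1.0) for _ in range(n)]
    ys = [rng.gauss(center_y, 1.0) for _ in range(n)]
    return xs, ys

rng = random.Random(3)
x1, y1 = gaussian_blob(rng, -2.0, -2.0, 5000)   # sample 1
x2, y2 = gaussian_blob(rng, +2.0, +2.0, 5000)   # sample 2

cov_sample1 = covariance(x1, y1)             # consistent with zero
cov_sample2 = covariance(x2, y2)             # consistent with zero
cov_sum = covariance(x1 + x2, y1 + y2)       # clearly positive
print(cov_sample1, cov_sample2, cov_sum)
```

The covariance of the mixture comes entirely from the separation of the two sample means, so a mismodeled relative normalization changes the correlation even when each component is modeled well.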

“ABCD” Methods

CDF’s W Cross Section Measurement

Isolation fraction = (energy in a cone of radius 0.4 around the lepton candidate, not including the lepton candidate) / (energy of the lepton candidate)

Missing Transverse Energy (MET)

Want the QCD contribution to the “D” region where signal is selected.

Assumes:
• MET and ISO are uncorrelated sample by sample
• Signal contributions to A, B, and C are small and subtractable

ABCD methods are really just on-off methods where τ is measured using data samples.

“ABCD” Methods

Advantages
• Purely data based, good if you don’t trust the simulation
• Model assumptions are injected by hand and not in a complicated Monte Carlo program (mostly)
• Model assumptions are intuitive

Disadvantages
• The lack of correlation between MET and ISO assumption may be false, e.g., semileptonic B decays produce unisolated leptons and MET from the neutrinos.
• Even a two-component background can be correlated when the contributions aren’t by themselves.
• Another way of saying that extrapolations are to be checked/assigned sufficient uncertainty.
• Works best when there are many events in regions A, B, and C. Otherwise all the problems of low stats in the “Off” sample in the On/Off problem reappear here. Large numbers of events → Gaussian approximation to uncertainty in background in D.
• Requires subtraction of signal from data in regions A, B, and C, which introduces model dependence.
• Worse, the signal subtraction from the sidebands depends on the signal rate being measured/tested. A small effect if s/b in the sidebands is small. You can iterate the measurement and it will converge quickly.
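
The iteration mentioned in the last point can be sketched as below. The region convention (background in D = B × C / A, i.e., τ = A/C) and the per-region signal fractions taken from a signal model are assumptions for illustration, as are all the numbers.

```python
def abcd_background(n_a, n_b, n_c):
    """Background estimate in D, assuming B/D = A/C for the background."""
    return n_b * n_c / n_a

def abcd_signal_iterated(n_a, n_b, n_c, n_d, frac_a, frac_b, frac_c,
                         n_iter=20):
    """Signal yield in D with iterative sideband signal subtraction.

    frac_x is the expected signal in region x per unit of signal in D,
    taken from a signal model (this is where model dependence enters).
    """
    signal_d = 0.0
    for _ in range(n_iter):
        a = n_a - frac_a * signal_d
        b = n_b - frac_b * signal_d
        c = n_c - frac_c * signal_d
        signal_d = n_d - abcd_background(a, b, c)
    return signal_d

# Toy truth: background (A, B, C) = (1000, 500, 200), so 100 in D; signal 50 in D.
frac_a, frac_b, frac_c = 0.02, 0.05, 0.10
n_a = 1000.0 + frac_a * 50.0
n_b = 500.0 + frac_b * 50.0
n_c = 200.0 + frac_c * 50.0
n_d = 100.0 + 50.0

signal_one_pass = abcd_signal_iterated(n_a, n_b, n_c, n_d,
                                       frac_a, frac_b, frac_c, n_iter=1)
signal_converged = abcd_signal_iterated(n_a, n_b, n_c, n_d,
                                        frac_a, frac_b, frac_c)
print(signal_one_pass, signal_converged)
```

With a small sideband s/b, the first pass (no subtraction) is already close, and a few iterations recover the injected signal.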

Examples of ABCD Methods

• Sideband calibration of background under a peak. (“What if the background peaks also where the signal peaks?”)
• The on-off problem with τ = A/C. Very frequently samples A and C are in MC simulations, where we can be sure not to contaminate the background estimations with signal. Example: using the MC to estimate acceptance for a cut for background, to be scaled with a data control sample. But we pay the price of unknown MC mismodeling.

Uncorrelated variable assumption == assumption that τ is the same in the data and the MC (check modeling of the shape of the distribution in the MC).

Equivalent of previous problem: even if the background shapes are well modeled by the MC, if there are multiple background processes which contribute, they can have different fractional contributions, distorting the total shapes.

• Fitting an MVA shape to the data. Low-score MC = A, high-score MC = C, low-score data = B, high-score data = D.

An Approximate LEE Correction for Peak Hunting

See E. Gross and O. Vitells, Eur. Phys. J. C 70 (2010) 525-530.

Approximate formula applies to bump hunts on a smooth background.

Requires a few fully simulated pseudoexperiments with complete p-value calculations over the region of interest. Count upcrossings of a threshold. Extrapolates to higher thresholds assuming large-sample behavior; specifically, that the LR test statistic has a chi-squared distribution.

An interesting feature, specific to bump hunts but maybe more general:

As the expected significance goes up, so does the LEE correction.

This makes lots of sense: LEE depends on the number of separate models that can be tested. As we collect more data, we can measure the position of the peak more precisely.

So we can tell more peaks apart from each other, even with the same reconstruction resolution.

But: combine a poor-resolution, low-s/b search with a high-resolution, high-s/b but very tiny s and very tiny b search, and you may not get the right answer.
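
A sketch of the upcrossing extrapolation, assuming the one-degree-of-freedom form of the bound in the cited paper, p_global ≲ P(χ²₁ > c) + ⟨N(c0)⟩ exp(−(c − c0)/2), where ⟨N(c0)⟩ is the mean number of upcrossings of a low reference threshold c0 measured in a few pseudoexperiments. The numerical inputs are hypothetical.

```python
import math

def chi2_1dof_tail(c):
    """P(chi-squared with 1 dof > c)."""
    return math.erfc(math.sqrt(c / 2.0))

def global_p_value(c, c0, mean_upcrossings_c0):
    """Approximate global p-value for a test-statistic maximum of c."""
    bound = (chi2_1dof_tail(c)
             + mean_upcrossings_c0 * math.exp(-(c - c0) / 2.0))
    return min(1.0, bound)

def trials_factor(c, c0, mean_upcrossings_c0):
    """Ratio of the approximate global p-value to the local p-value."""
    return global_p_value(c, c0, mean_upcrossings_c0) / chi2_1dof_tail(c)

# Hypothetical input: 5 upcrossings on average at the reference level c0 = 1.
tf_3sigma = trials_factor(9.0, 1.0, 5.0)    # local 3 sigma
tf_5sigma = trials_factor(25.0, 1.0, 5.0)   # local 5 sigma
print(tf_3sigma, tf_5sigma)
```

In this approximation the trials factor grows with the local significance, in line with the remark above that the LEE correction goes up with the expected significance.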

CDF’s 2011 H→γγ Search

+2 other channels with smaller excesses

Insufficient sensitivity to a SM Higgs boson. Rate ruled out by other searches (gg→H→WW for example). So we know the bump is a stat fluctuation.

CDF+D0 Higgs Search Channels Combined

arXiv:1203.3774

>300 nuisance parameters

Tevatron Higgs Search LEE

Local p-value vs. Higgs boson mass

Bands show expectation assuming a signal is present (at each mH separately).

Cross section times branching ratio fits vs. mH

A complication: most search channels train their MVAs separately at each mH to optimize sensitivity.

Signal, Background, and Data in the ZH→llbb Search

Reconstructed mjj distribution

You can tell mH is on the high side of the range only by what’s missing and not what’s there!

[Figure: CDF II Preliminary (10 fb$^{-1}$). 95% CL limit/SM vs. Higgs mass (GeV/c$^2$), from 100 to 150 GeV/c$^2$, showing the expected limits with ±1σ and ±2σ bands and the expectation with an injected signal at mH = 115 GeV/c$^2$.]

CDF’s WH channel expectation (×3 luminosity to simulate the presence of other channels: llbb, METbb), with a 115 GeV signal injected.

Stirring it All Together: D0’s LLR Test

Assuming observed and expected +3 sigma excess, and median outcome. Resolution from -2ΔLLR = Δχ² = 1.

Resolution at 115 GeV: ±5 GeV
Resolution at 135 GeV: ~±10 GeV

An Interesting Bias Bill Murray Showed at The Next Stretch of the Higgs Magnificent Mile Conference

https://twindico.hep.anl.gov/indico/conferenceOtherViews.py?view=standard&confId=856

Seek a bump on a smooth background. Example: LHC (or Tevatron) H→γγ search.

Allow mH to float and pick the mH that maximizes the fitted cross section.

The fitted cross section will be biased upwards, and the position resolution of “lucky” outcomes will be worse than that of unlucky ones, even if a signal is truly present.

Why? A true bump can coalesce with a fluctuation either to the left or to the right of the bump (two chances to fluctuate upwards).

Effect can be substantial! Calibrate with simulated experimental outcomes (FC).
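
The upward bias can be seen in a deliberately crude toy, which is an assumption-laden sketch and not the study shown at the conference: a flat background of b events per bin, a signal of s events confined to a single bin, Gaussian fluctuations in place of Poisson, and the "fit" reduced to the excess over background in a bin.

```python
import math
import random

def toy_fitted_signals(n_toys, n_bins, true_bin, s, b, seed=11):
    """Mean fitted signal with the position fixed vs. floated.

    Returns (mean excess at the true bin, mean of the largest excess
    found anywhere in the histogram).
    """
    rng = random.Random(seed)
    sum_fixed = 0.0
    sum_floated = 0.0
    for _ in range(n_toys):
        excesses = []
        for i in range(n_bins):
            mean = b + (s if i == true_bin else 0.0)
            count = rng.gauss(mean, math.sqrt(mean))  # Gaussian approx. to Poisson
            excesses.append(count - b)
        sum_fixed += excesses[true_bin]
        sum_floated += max(excesses)  # position chosen to maximize the excess
    return sum_fixed / n_toys, sum_floated / n_toys

mean_fixed, mean_floated = toy_fitted_signals(
    n_toys=5000, n_bins=21, true_bin=10, s=20.0, b=100.0)
print(mean_fixed, mean_floated)
```

The fixed-position estimate averages to the injected signal, while the floated one is systematically larger, which is the kind of effect that a calibration with simulated outcomes would absorb.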

LEE for Limits?

[Figure: LEP CLs vs. mH (GeV/c$^2$), from 100 to 120 GeV/c$^2$, on a logarithmic scale from 10$^{-6}$ to 1, showing the observed curve and the expectation for background, with mass values 114.4 and 115.3 marked.]

No, but there is the opposite effect. We take the most conservative mass exclusion. If the CLs curve crosses several times, we quote the smallest (LEP).

Hard to say what the median expected limit is: the place where the median CLs crosses the line is higher than the median lowest limit.

LHC and Tevatron experiments quote multiple disjoint mH limits.

No LEE. Justification: each test at each mH is an independent search with its own error rate, assuming a particle is truly there at each mass, one at a time.

[Figure: Tevatron Run II Preliminary, L ≤ 10.0 fb$^{-1}$, February 27, 2012. 95% CL limit/SM vs. mH (GeV/c$^2$), from 100 to 200 GeV/c$^2$, showing the observed and expected limits with ±1 s.d. and ±2 s.d. expected bands, the SM = 1 line, and the LEP, Tevatron, ATLAS, CMS, and combined exclusion regions.]

LEE for Limits?

Multiple experiments searching for the same particle.

Multiple chances to falsely exclude a particle that’s actually there.

Very easy to take the union of excluded regions, but this does not have 95% coverage.

The best thing to do is to combine for a single interpretation.

But the limits are of secondary importance here.
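
A rough illustration of why the union undercovers, under the simplifying assumption of $N$ statistically independent experiments, each of which falsely excludes a true signal with probability exactly $\alpha = 0.05$ at a given mass:

```latex
P(\text{at least one false exclusion}) = 1 - (1 - \alpha)^{N},
\qquad
N = 2:\; 1 - 0.95^{2} = 0.0975,
\qquad
N = 4:\; 1 - 0.95^{4} \approx 0.185 .
```

Methods that overcover, such as CLs, have smaller per-experiment false exclusion rates, so these numbers illustrate the direction of the effect and are not a statement about the exclusions shown.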

Where is “Elsewhere?”

A collider collaboration is typically very large; >1000 Ph.D. students. ATLAS+CMS is another factor of two. (Four LEP collaborations, two Tevatron collaborations.)

Many ongoing analyses for new physics. The chance of seeing a bump somewhere is large. What is the LEE?

Do we have to correct our previously published p-values for a larger LEE when we add new analyses to our portfolio?

How about the physicist who goes to the library and hand-picks all the largest excesses? What is LEE then?

“Consensus” at the Banff 2010 Statistics Workshop: LEE should correct only for those models that are tested within a single published analysis. Usually one paper covers one analysis, but review papers summarizing many analyses do not have to put in additional correction factors.

For the Winter 2012 Higgs search analyses, we had several LEEs computed, depending on the mass range defined to be elsewhere.

Caveat lector.

Where is “Elsewhere?”

LEE is often hard enough to evaluate. Right way to do it: compute a p-value of p-values. Simulate the experiment assuming zero signal many times, and for each simulated outcome find the model with the smallest p-value. Multidimensional models are harder, and LEE is worse.
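
The p-value of p-values can be sketched with a toy in which each tested model has its own test statistic. The assumption that the statistics are independent unit Gaussians under the null is made only to keep the sketch short; in a real search neighboring mass hypotheses are correlated and the full analysis must be rerun on each pseudo-dataset.

```python
import math
import random

def local_p_value(z):
    """One-sided p-value for a Gaussian significance z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def smallest_local_p(rng, n_models):
    """Smallest local p-value among n_models tests in one null pseudoexperiment."""
    return min(local_p_value(rng.gauss(0.0, 1.0)) for _ in range(n_models))

def global_p_value(p_local_observed, n_models, n_pseudo=20000, seed=5):
    """Fraction of null pseudoexperiments whose best model looks at least
    as significant as the observed one."""
    rng = random.Random(seed)
    n_extreme = 0
    for _ in range(n_pseudo):
        if smallest_local_p(rng, n_models) <= p_local_observed:
            n_extreme += 1
    return n_extreme / n_pseudo

p_global_20 = global_p_value(0.01, n_models=20)
p_global_1 = global_p_value(0.01, n_models=1)
print(p_global_1, p_global_20)
```

For independent tests the result should agree with 1 − (1 − p)^N, which gives a quick check of the machinery before it is applied to a correlated, realistic case.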

Kane, Wang, Nelson, Wang, Phys. Rev. D 71, 035006 (2005)

ALEPH, DELPHI, L3, OPAL, and the LHWG, Phys. Lett. B 565 (2003) 61-75

ALEPH, DELPHI, L3, OPAL, and the LHWG, Eur. Phys. J. C 47 (2006) 547-587

Two excesses seen; proposed models explain both with two Higgs bosons. Combined local significance is greater, but LEE now is much larger (and unevaluated). Published plot grays out the region beyond experimental sensitivity.

Choosing a Region of Interest

• I do not have a foolproof prescription for this, just some thoughts.

• Analyses are designed to optimize sensitivity, but LEE dilutes sensitivity. There is a penalty for looking for many independently testable models. Can we optimize this?

• But you should always do a search anyway! If you expect to be able to test a model, you should.

• Testing previously excluded models? We do this anyway, just in case some new physics shows up in a way that evaded the previous test.

• There is no such thing as a model-independent search. Merely building the LHC or the Tevatron means we had something in mind. And the SM (or just our implementation of it) is wrong, but possibly not in a way that is both interesting and testable.

Look ElseWHEN

Running averages converge on the correct answer, but the deviations in units of the expected uncertainty have a random walk in the logarithm of the number of trials.

$$d_n = \frac{\frac{1}{n}\sum_{k=1}^{n} r_k}{1/\sqrt{n}}$$

The $r_k$ are IID numbers drawn from a unit Gaussian.

[Figure: $d_n$ vs. trial number $n$, for $n$ from 1 to $10^9$ on a logarithmic axis, with $d$ ranging from -2 to 2.]

It’s possible to cherry-pick a dataset with a maximum deviation. “Sampling to a foregone conclusion.”

Stopping rule: in HEP, we (almost always!) take data until our money is gone. We produce results for the major conferences along the way. Some will coincidentally stop when the fluctuations are biggest. We take the most recent/largest data sample result and ignore (or should!) results performed on smaller datasets. p-values are still distributed uniformly from 0 to 1.

A recipe for generating “effects that go away”.
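
A toy comparison of the two stopping behaviors, assuming unit-Gaussian trials as in the definition of $d_n$; the number of trials, the 2-sigma threshold, and the function names are illustrative choices.

```python
import math
import random

def deviation_path(n_trials, rng):
    """d_n = (running mean) / (1/sqrt(n)) for n = 1 .. n_trials."""
    total = 0.0
    path = []
    for n in range(1, n_trials + 1):
        total += rng.gauss(0.0, 1.0)
        path.append(total / math.sqrt(n))
    return path

def fraction_exceeding(threshold, n_trials, n_experiments, seed=2):
    """Fraction of null experiments with |d| >= threshold at the end of the
    run, and at any point during the run (cherry-picked stopping time)."""
    rng = random.Random(seed)
    n_at_end = 0
    n_anywhere = 0
    for _ in range(n_experiments):
        path = deviation_path(n_trials, rng)
        if abs(path[-1]) >= threshold:
            n_at_end += 1
        if max(abs(d) for d in path) >= threshold:
            n_anywhere += 1
    return n_at_end / n_experiments, n_anywhere / n_experiments

frac_end, frac_any = fraction_exceeding(2.0, n_trials=500, n_experiments=1000)
print(frac_end, frac_any)
```

Stopping at a predetermined point keeps the nominal error rate, while choosing the moment of largest deviation after the fact inflates it considerably.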

Look ElseWHEN

C. Paus, Implications Workshop, Mar. 27, 2012

https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=162621

Parameter Estimation: Marginalize or Profile?

[Figure, two panels. First panel: predicted and observed yield (0 to 30) vs. nuisance parameter (in units of its uncertainty, from -3 to 3), labeled “Predicted = 10+6” and “Observed = 15+3”. Second panel: likelihood (0 to 0.1) vs. the same nuisance parameter, showing a double peak.]

If Pred = $10^{-6}_{-3}$ and obs = 15, then the likelihood would have one maximum, but it would have a corner. MINUIT may quote inappropriate uncertainties as the second derivative isn’t well defined.

The corner can be smoothed out. See R. Barlow, http://arxiv.org/abs/physics/0406120, http://arxiv.org/abs/physics/0401042, http://arxiv.org/abs/physics/0306138.

But I know of no way to get rid of the double peak (nor should there be a way; it can be a real effect. See the LEP2 TGC measurements).

Analysis Optimization in Isolation or in Combination?

Typical situation:

A measurement has a statistical and a systematic uncertainty, where the statistical uncertainty includes “good” systematics that are constrained by the data, and the “bad” ones never get better constrained no matter how much data are collected.

We sometimes have a choice of how to analyze marked Poisson data:

1) Aggressive reconstruction making assumptions about particle distributions: more statistical power per event at the cost of introducing systematic uncertainty.

2) More model-independent analysis with fewer assumptions: less statistical power per event but better control over systematics.

Combination with other measurements (from other data runs or other collaborations) is like collecting more data. Method 1 hits the systematic limit and loses weight in the combination even though it may be the most powerful method by itself.

More general: with little data, we are more dependent on our assumptions; with more data we can relax the assumptions and constrain our models.

Recommendation: for combinations, optimize for the large luminosity case.

Correlations among Uncertainties: When is it Conservative, When Not?

• Within a channel, contributions that add together: including correlations usually weakens the sensitivity (always: sensitivity is expected).
• Between channels: accounting for correlations is not conservative. One channel’s observed data becomes another “off” sample for another’s. Have to trust all the τ factors, and even offsets from central predictions, in order to put in these correlations.

• Overestimating the impacts of systematic uncertainty on a prediction is not conservative if a correlation is taken into account. Can result in underestimated systematic error on a combined result.

Example (systematic uncertainty 1 is 100% correlated between the two measurements, and so is systematic uncertainty 2):

Measurement 1: m1 = 5 ± 1 (syst1) ± 1 (syst2)
Measurement 2: m2 = 5 ± 1 (syst1) ± 2 (syst2)

Combine with BLUE: mbest = 2 m1 - m2
mbest = 5 ± 1 (syst1) ± 0 (syst2)

Here accounting for correlation and an overestimated systematic uncertainty results in an aggressive result.
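
The BLUE weights in this example can be checked directly from the covariance matrix. A minimal sketch for two measurements, assuming each systematic source is fully correlated between them, as stated in the example; the helper name is illustrative.

```python
def blue_weights_2(cov):
    """BLUE weights for two measurements: w = C^-1 1 / (1^T C^-1 1)."""
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    inverse = [[c22 / det, -c12 / det],
               [-c21 / det, c11 / det]]
    row_sums = [inverse[0][0] + inverse[0][1],
                inverse[1][0] + inverse[1][1]]
    norm = row_sums[0] + row_sums[1]
    return [row_sums[0] / norm, row_sums[1] / norm]

m = [5.0, 5.0]        # central values of measurements 1 and 2
syst1 = [1.0, 1.0]    # effect of source 1 on each measurement
syst2 = [1.0, 2.0]    # effect of source 2 on each measurement

# Fully correlated sources: each contributes an outer product to the covariance.
cov = [[syst1[i] * syst1[j] + syst2[i] * syst2[j] for j in range(2)]
       for i in range(2)]

w = blue_weights_2(cov)
m_best = w[0] * m[0] + w[1] * m[1]
combined_syst1 = abs(w[0] * syst1[0] + w[1] * syst1[1])
combined_syst2 = abs(w[0] * syst2[0] + w[1] * syst2[1])
print(w, m_best, combined_syst1, combined_syst2)
```

The negative weight on the second measurement cancels source 2 exactly, which is only trustworthy if the quoted sizes of that systematic (1 and 2) are themselves correct.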

Extra Slides

Where is “Elsewhere?”

• Most searches for new physics have a “region of interest”.
• Definition is a choice of the analyzer/collaboration.
• Often bounded below by previous searches, bounded above by kinematic reach of the accelerator/detector.
• Limits the amount of work involved in preparing an analysis. Sometimes a 2D search involves lots of training of MVAs and checking sidebands and validation of inputs and outputs.

Example: a search for pair-produced stop quarks which decay to c + neutralino.

If m(stop) > m(W) + m(b) + m(neutralino), then another analysis takes over.

An Example: Double-Tag Methods

Dijet events at LEP1/SLD

Z → $u\bar{u}$, $d\bar{d}$, $s\bar{s}$, $b\bar{b}$, leptons, neutrinos

[Figure: a double-vertex-B-tagged event with a semileptonic decay, with the primary vertex marked; axes x and y]

B-tagging efficiencies (efficiency of finding the displaced vertex) are about 40%. We do not trust MC modeling of the b-tag efficiency. Would like to measure the B-tag efficiency and the Br(Z→$b\bar{b}$) branching fraction together in the same data. Count events with 0, 1, and 2 vertex tags. Enough information to solve for the Br and the efficiency.

x = b-tag of jet 1, y = b-tag of jet 2. Assume uncorrelated probabilities for tagging the jets. But the flavor of the jets is correlated! It is this flavor correlation that allows us to extract Br and tag efficiency.
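
A simplified version of the algebra, assuming (for illustration only) that non-b jets are never tagged and that the two jets in a b event are tagged independently with the same efficiency ε. With N events and b fraction R, the expected counts are n1 = 2NRε(1 − ε) single-tagged and n2 = NRε² double-tagged events, which can be inverted for ε and R. A real measurement would also include mistags from lighter flavors and tagging correlations between the jets.

```python
def solve_double_tag(n_events, n_single, n_double):
    """Solve for the tag efficiency and the b fraction from tag counts.

    Assumes no mistags and independent tagging of the two jets.
    """
    tagged_jets = n_single + 2.0 * n_double          # equals 2 * N * R * eps
    efficiency = 2.0 * n_double / tagged_jets
    b_fraction = tagged_jets ** 2 / (4.0 * n_double * n_events)
    return efficiency, b_fraction

# Toy counts generated from eps = 0.4 and R = 0.22 with N = 100000 events.
n_events = 100000.0
n_double = n_events * 0.22 * 0.4 ** 2
n_single = n_events * 0.22 * 2.0 * 0.4 * (1.0 - 0.4)

efficiency, b_fraction = solve_double_tag(n_events, n_single, n_double)
print(efficiency, b_fraction)
```

The solution exists only because both jets in an event share a flavor: the ratio of double to single tags fixes ε without reference to the simulation.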

On-Off Measurements: Averaged or Combined

• One global on-off measurement vs. breaking the data into subsamples.
• Assume the “off” data are collected along with the “on” data (control sample on the other side of a cut, for example).

• Global on-off measurement allows each data subsample’s off measurements to help measure each other data subsample’s on sample’s backgrounds. Assumption which may be false: you are allowed to do this. If the detector or accelerator changed partway through the run, then you may need to break the samples up.

• Breaking them apart allows only each subsample’s off measurements to calibrate the background in the corresponding on samples.

• Same for ABCD methods: averaging subsamples.

Single Top Production Mechanisms

[Figure: Feynman diagrams labeled “s-Channel”, “t-Channel”, and “NLO Contributions to t-Channel Production”]

Leveraging our Rate Measurements to Measure the Higgs Boson Mass

Assuming SM cross sections and branching fractions, measured rates are strong functions of mH. Example at mH = 115 GeV, assuming a +3 sigma excess, and a median outcome in both the bb, ττ channels and the WW channels:

Tau channels can contribute here, even with less precise m(rec) than the bb channels.

Interesting Behavior of CLs

CLs may not be a monotonic function of -2lnQ.

Tails in the -2lnQ distribution shared in the s+b and b-only hypotheses (fit failures).

Not really a pathology of the method, but rather a reflection that the test statistic isn’t always doing its job of separating s+b-like outcomes from b-like outcomes in some fraction of the cases.

CLs = 1 for -2lnQ < -15 or -2lnQ > +15.

Distributions are sums of two Gaussians each. The wide Gaussian is centered on zero.

Practical reason this could happen: every thousandth experimental outcome, the fit program “fails” and gives a random answer.

A Bump that Got Away

Dijet mass sum in e+e-→jjjj ALEPH Collaboration, Z. Phys. C71, 179 (1996)

“the width of the bins is designed to correspond to twice the expected resolution ... and their origin is deliberately chosen to maximize the number of events found in any two consecutive bins”

A Sample with Zero Covariance is Not Necessarily Uncorrelated

[Figure: points on the perimeter of a circle in the (x, y) plane]

Example: perimeter of a circle. Knowledge of x provides knowledge of y up to a 2-fold ambiguity. But the covariance of the sample vanishes!

Something to watch out for with Principal Components Analysis: it does not remove correlation, only covariance.
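
The circle example is easy to verify numerically. A minimal sketch using evenly spaced points on the unit circle (the number of points is arbitrary):

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

n_points = 1000
angles = [2.0 * math.pi * k / n_points for k in range(n_points)]
xs = [math.cos(a) for a in angles]
ys = [math.sin(a) for a in angles]

cov_xy = covariance(xs, ys)

# Knowing x fixes |y| exactly: |y| = sqrt(1 - x^2).
max_residual = max(abs(abs(y) - math.sqrt(max(0.0, 1.0 - x * x)))
                   for x, y in zip(xs, ys))
print(cov_xy, max_residual)
```

The covariance vanishes to machine precision even though y is determined by x up to its sign, so a decorrelating rotation leaves this dependence untouched.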

Cases to be Careful about Applying the LEE Approximation

Not all searches are bump hunts on a smooth background. Multivariate analyses are usually trained up at each mass separately, and there is not a single distribution we can look elsewhere in.

Statistical effects only. If there’s a systematic effect in the background modeling, a “signal” may grow in significance with additional data in a way that’s not described here.

The mismodeling may be concentrated in a small portion of the histogram (this is not a LEE effect but a more difficult question).

Background parameterization may grow in sophistication as data are collected.

Not all LR test statistic distributions are modeled well by chi-squared distributions.

Combine a large-data-sample bump hunt with a high-s/b, low-background (say, b = 1e-5) search and the distribution of the LR is a convolution of chi-squared and Poisson.