Multivariate analysis: BDT
• Selection strategy based on multivariate analysis: BDT
• Variables used for training the BDT (after muon selection):
- pT of the muons
- Missing Et
- Invariant Mass
- Dphi
- Transverse mass
- Dphi(mu,MET)
- Eta of the muons
- NTracks
- SumEt
[Plots: MET, Ptmin, Mtmax, Dphi(mu,MET) min; variables grouped as "Usual variables" and "Other variables"]
Number of tracks with some quality cuts:
- pT > 3 GeV
- N hits > 5
- |zTrack - zVer| < 0.4 cm
• Most important variables to reject the ttbar background
• Variables already incorporated into the BDT study for the CMS Note (see appendix)
• The definition of the Ntracks variable will be updated in 21X (new track quality cuts under study: Nhits > 8, |zTrack - zVer| < 0.2 cm).
• Study in detail jets with tracks, particle flow, jet energy corrections…
- Jet veto not applied in this BDT analysis: some variables providing jet information are used instead
Sum of the energy of all jets in the event with Et > 15 GeV, |eta| < 2.4
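A minimal sketch of how the two track/jet variables defined above could be computed for one event. The `Track` and `Jet` containers and their field names are hypothetical stand-ins, not CMSSW classes. The default cut values are the ones listed on this slide, and the tighter 21X track cuts under study can be passed as arguments. Following the wording above, the jet sum adds jet energies; if transverse energy is meant (the variable is called SumEt in the list), sum `j.et` instead.

```python
from dataclasses import dataclass


@dataclass
class Track:
    pt: float     # transverse momentum [GeV]
    n_hits: int   # number of hits on the track
    z: float      # track z position near the beam line [cm] (hypothetical field)


@dataclass
class Jet:
    et: float      # transverse energy [GeV]
    energy: float  # energy [GeV]
    eta: float     # pseudorapidity


def n_tracks(tracks, z_vertex, pt_min=3.0, min_hits=5, dz_max=0.4):
    """Count tracks with pT > pt_min, N hits > min_hits and |zTrack - zVer| < dz_max."""
    return sum(
        1
        for t in tracks
        if t.pt > pt_min and t.n_hits > min_hits and abs(t.z - z_vertex) < dz_max
    )


def sum_jet_energy(jets, et_min=15.0, eta_max=2.4):
    """Sum of the energy of all jets with Et > et_min and |eta| < eta_max."""
    return sum(j.energy for j in jets if j.et > et_min and abs(j.eta) < eta_max)
```

With the 21X cuts under study, the call would be `n_tracks(tracks, z_vertex, min_hits=8, dz_max=0.2)`.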
Plans for 21x
Overtraining checked by TMVA using independent samples
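TMVA's overtraining check compares the classifier output on the training sample with its output on an independent test sample using a Kolmogorov-Smirnov test. The sketch below computes only the two-sample KS distance D in plain Python (it is not TMVA code); a large D between the training and test BDT outputs of the same class points to overtraining. The function name is illustrative.

```python
import bisect


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only jump at sample points, so checking those points is enough.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)  # fraction of a with value <= x
        f_b = bisect.bisect_right(b, x) / len(b)  # fraction of b with value <= x
        d = max(d, abs(f_a - f_b))
    return d
```

Typical use would be `ks_distance(bdt_output_train_signal, bdt_output_test_signal)`, and the same for background.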
BDT ranking result (top variable is best ranked)
---------------------------------------------------
Rank : Variable : Variable Importance
---------------------------------------------------
1 : Dphill : 2.819e-01
2 : Mtmin : 1.573e-01
3 : etalepmin : 8.728e-02
4 : MET : 7.767e-02
5 : etalepmax : 6.933e-02
6 : InvMass : 6.621e-02
7 : ptlepmin : 4.916e-02
8 : DphiLepMetmin : 4.636e-02
9 : SumEt : 4.154e-02
10 : DphiLepMetmax : 3.520e-02
11 : Mtmax : 3.438e-02
12 : Ntracks : 3.290e-02
13 : ptlepmax : 2.069e-02
Example: training at mH=160 GeV
---------------------------------------------------
Rank : Variable : Variable Importance
---------------------------------------------------
1 : Ntracks : 2.591e-01
2 : SumEt : 2.091e-01
3 : Mtmin : 8.163e-02
4 : ptlepmin : 8.008e-02
5 : InvMass : 7.526e-02
6 : Dphill : 6.305e-02
7 : etalepmax : 4.709e-02
8 : DphiLepMetmin : 3.763e-02
9 : etalepmin : 3.607e-02
10 : Mtmax : 3.565e-02
11 : MET : 3.057e-02
12 : DphiLepMetmax : 2.759e-02
13 : ptlepmax : 1.722e-02
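To compare the two rankings quantitatively, the sketch below copies the importances exactly as printed by TMVA (the dictionary names are hypothetical: the first holds the first listing, the second the second listing) and reports the top variables and the share of the total importance carried by Ntracks and SumEt.

```python
# Variable importances as printed in the two TMVA rankings above (mH = 160 GeV).
FIRST_RANKING = {
    "Dphill": 2.819e-01, "Mtmin": 1.573e-01, "etalepmin": 8.728e-02,
    "MET": 7.767e-02, "etalepmax": 6.933e-02, "InvMass": 6.621e-02,
    "ptlepmin": 4.916e-02, "DphiLepMetmin": 4.636e-02, "SumEt": 4.154e-02,
    "DphiLepMetmax": 3.520e-02, "Mtmax": 3.438e-02, "Ntracks": 3.290e-02,
    "ptlepmax": 2.069e-02,
}
SECOND_RANKING = {
    "Ntracks": 2.591e-01, "SumEt": 2.091e-01, "Mtmin": 8.163e-02,
    "ptlepmin": 8.008e-02, "InvMass": 7.526e-02, "Dphill": 6.305e-02,
    "etalepmax": 4.709e-02, "DphiLepMetmin": 3.763e-02, "etalepmin": 3.607e-02,
    "Mtmax": 3.565e-02, "MET": 3.057e-02, "DphiLepMetmax": 2.759e-02,
    "ptlepmax": 1.722e-02,
}


def ranked(importances):
    """Variable names ordered from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


def share(importances, variables):
    """Fraction of the total importance carried by the given variables."""
    total = sum(importances.values())
    return sum(importances[v] for v in variables) / total


for name, imp in [("first", FIRST_RANKING), ("second", SECOND_RANKING)]:
    print(name, ranked(imp)[:3], f"Ntracks+SumEt share: {share(imp, ['Ntracks', 'SumEt']):.2f}")
```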
TMVA output
• The distributions of the variables for the main backgrounds, WW and ttbar, have different shapes
• 2 independent trainings tried: against WW and against ttbar
• Ntracks and SumEt are the most useful variables to reject tt
• Dphi is the most useful variable to reject WW
Training against ttbar
Training against WW
• The resulting functions are combined into a two-dimensional one
• This method also provides good rejection against other backgrounds like Z+jets, W+jets…
3 training points: mH = 130, 160, 190 GeV
$R = \sqrt{(x_0 - \mathrm{bdt}_1)^2 + (y_0 - \mathrm{bdt}_2)^2}$
First approach: circular cut around the region with the highest s/b
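The circular cut can be written directly from the definition of R. In this sketch, assumed for illustration, each event is a pair (bdt_ww, bdt_tt): bdt1 is the output of the BDT trained against WW (x axis) and bdt2 the output of the BDT trained against ttbar (y axis). The centre (x0, y0) and the radius `r_max` are free parameters to be placed on the region with the highest s/b; no values are given here.

```python
import math


def radial_distance(bdt_ww, bdt_tt, x0, y0):
    """R = sqrt((x0 - bdt1)^2 + (y0 - bdt2)^2) in the (BDT vs WW, BDT vs ttbar) plane."""
    return math.hypot(x0 - bdt_ww, y0 - bdt_tt)


def select_circular(events, x0, y0, r_max):
    """Keep events whose (bdt_ww, bdt_tt) point lies within r_max of (x0, y0)."""
    return [ev for ev in events if radial_distance(ev[0], ev[1], x0, y0) < r_max]
```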
[Figure: 2D BDT output for signal (mH = 160 GeV), WW, ttbar and DY. X axis: output of the BDT trained against WW; Y axis: output of the BDT trained against ttbar]
Distributions of some variables before and after applying the BDT cut (maximizing significance)
Example of using the BDT to estimate the cuts to be applied in a sequential cut analysis.
[Plots: MET, Ptmax, Ptmin, dPhi, InvMass, NTracks]
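The slide does not state which significance definition was maximized when choosing the BDT cut. Assuming S/sqrt(B) with per-event weights, a cut can be chosen by scanning thresholds on the BDT output, as in this minimal sketch (the function name and the (output, weight) event format are hypothetical).

```python
import math


def best_cut(signal, background, thresholds):
    """Scan cuts 'BDT output > t' and return (t, S/sqrt(B)) with the largest S/sqrt(B).

    signal, background: sequences of (bdt_output, weight) pairs.
    Thresholds that leave no background are skipped, since S/sqrt(B) is undefined there.
    """
    best_t, best_z = None, float("-inf")
    for t in thresholds:
        s = sum(w for x, w in signal if x > t)
        b = sum(w for x, w in background if x > t)
        if b <= 0:
            continue
        z = s / math.sqrt(b)
        if z > best_z:
            best_t, best_z = t, z
    return best_t, best_z
```

A fine, evenly spaced list of thresholds over the BDT output range is the simplest choice; with low background statistics the optimum found this way can fluctuate.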
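Example for mH = 160 GeV: cuts estimated from the BDT, with the cut values used in the sequential analysis in parentheses
- MET > 45 GeV (MET > 48 GeV)
- 30 < Ptmax < 50 GeV (28 < Ptmax < 50 GeV)
- Ptmin > 25 GeV (Ptmin > 25 GeV)
- InvMass < 50 GeV
- dPhi < 1.2 rad, i.e. 69° (dPhi < 57°)
- Ntracks < 10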
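The sequential analysis applies the cuts one after the other and quotes the efficiency of each cut relative to the previous one. A minimal sketch using the cut values in parentheses above; the event keys (`met`, `ptmax`, `ptmin`, `dphi_deg`) are hypothetical, with momenta in GeV and dPhi in degrees.

```python
# Cut values used in the sequential analysis for mH = 160 (GeV for momenta, degrees for dPhi).
CUTS = [
    ("MET", lambda ev: ev["met"] > 48.0),
    ("Ptmax", lambda ev: 28.0 < ev["ptmax"] < 50.0),
    ("Ptmin", lambda ev: ev["ptmin"] > 25.0),
    ("dPhi", lambda ev: ev["dphi_deg"] < 57.0),
]


def sequential_cutflow(events, cuts):
    """Apply cuts in order; return (name, events passing, efficiency relative to the previous step)."""
    flow = []
    remaining = list(events)
    for name, passes in cuts:
        n_before = len(remaining)
        remaining = [ev for ev in remaining if passes(ev)]
        # Relative efficiency is undefined (nan) once no events are left.
        rel = len(remaining) / n_before if n_before else float("nan")
        flow.append((name, len(remaining), rel))
    return flow
```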
5σ significance around mH = 160 GeV almost achieved with the mumu channel ONLY
Selection                    HWW(160 GeV)  WW      WZ      ZZ      tW      tt      W+jets  DY
Preselection && final state  143           459     226     184     461     5128    177     39810
BDT                          18            2.8     0.1     0       0.6     3       0.002   0
Cut based                    21            10      0.1     0       0.3     6       0.07    0.6
Numbers including fake rate estimation
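Summing the background columns of the table gives the total background left by each selection. The arithmetic, with the yields copied from the table (the dictionary layout is only illustrative):

```python
# Expected events after each selection (HWW with mH = 160 GeV; numbers include the fake-rate estimate).
BACKGROUNDS = ["WW", "WZ", "ZZ", "tW", "tt", "W+jets", "DY"]
YIELDS = {
    "BDT": {"HWW": 18, "WW": 2.8, "WZ": 0.1, "ZZ": 0, "tW": 0.6, "tt": 3, "W+jets": 0.002, "DY": 0},
    "Cut based": {"HWW": 21, "WW": 10, "WZ": 0.1, "ZZ": 0, "tW": 0.3, "tt": 6, "W+jets": 0.07, "DY": 0.6},
}


def total_background(row):
    """Sum of all background yields in one row of the table."""
    return sum(row[name] for name in BACKGROUNDS)


for selection, row in YIELDS.items():
    b = total_background(row)
    print(f"{selection}: S = {row['HWW']}, B = {b:.2f}, S/B = {row['HWW'] / b:.2f}")
```

With these yields the BDT leaves about 6.5 background events against about 17 for the cut-based selection, for 18 versus 21 signal events.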
[Figure: background vs. signal events for various cuts on the BDT function (mH = 160 GeV)]
Preliminary results with 21x samples (10 TeV)
HWW165
low statistics for tt
Selection                              mH = 165 GeV          ttbar                 Zmumu
                                       CMSSW 16x  CMSSW 21x  CMSSW 16x  CMSSW 21x  CMSSW 16x  CMSSW 21x
HLT Muon + Muon Selection + Isolation
BDT                                    13%        11%        0.058%     0%         0%         0%
CSA07 vs.:
HWW 165: RelVal (CMSSW_2_1_9)
TTbar: /TauolaTTbar/Summer08_IDEAL_V9_AODSIM_v1/AODSIM
Zmumu: /Zmumu/Summer08_IDEAL_V9_AODSIM_v1/AODSIM
[Plots: TTbar, Zmumu, HWW 165; CMSSW_1_6_12 vs. CMSSW_2_1_9]
16X: @14 TeV
21X: @10 TeV
Cut             mH = 165 GeV          ttbar                 Zmumu
                CMSSW 16x  CMSSW 21x  CMSSW 16x  CMSSW 21x  CMSSW 16x  CMSSW 21x
HLT Muon        41%        43%        19%        23%        98%        93%
Muon Selection  21%        21%        11%        9%         96%        88%
Iso Tracks      81%        83%        37%        34%        88%        88%
Iso Calo        96%        97%        88%        91%        97%        97%
Jet Veto        52%        54%        3%         4%         80%        86%
Met             71%        71%        64%        48%        0%         0%
Phi             65%        66%        22%        22%        11%        12%
InvMass         90%        88%        68%        59%        5%         2%
Ptmax           74%        79%        40%        66%        22%        75%
Ptmin           69%        68%        55%        0%         39%        100%
Analysis based on sequential cuts for 21x → Started
Kinematic variables performing well; the pre-selection is still to be included
Relative efficiencies w.r.t. the previous cut at each level
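Because each entry in the table is relative to the previous cut, the overall efficiency of a selection is the running product of the relative efficiencies. A short sketch; the list holds the HWW 165, CMSSW 16x column of the table as fractions (the names are illustrative).

```python
from itertools import accumulate
from operator import mul


def cumulative_efficiencies(relative):
    """Relative efficiencies (each w.r.t. the previous cut) -> cumulative efficiencies (w.r.t. the start)."""
    return list(accumulate(relative, mul))


# HWW 165, CMSSW 16x column: HLT Muon, Muon Selection, Iso Tracks, Iso Calo, Jet Veto,
# Met, Phi, InvMass, Ptmax, Ptmin.
HWW165_16X = [0.41, 0.21, 0.81, 0.96, 0.52, 0.71, 0.65, 0.90, 0.74, 0.69]
print(cumulative_efficiencies(HWW165_16X)[-1])
```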
HWW 165
BACKGROUND ESTIMATION WITH DATA
Estimation of the number of events in the 0-jet bin from data using soft muons(*) coming from semileptonic B decays in b jets produced in top decays. In short: estimate the top-antitop fraction from the "well measured" 2 and >=2 jet bins, then extrapolate it to the 0-jet bin.
(*) see talk by Dmytro Kovalsky http://indico.cern.ch/getFile.py/access?contribId=2&resId=1&materialId=slides&confId=42770
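One simple way to write such an extrapolation, given here as a hypothetical sketch and not necessarily the method of the referenced talk: correct the soft-muon-tagged yield in the control jet bins for non-top contamination and for the tagging efficiency, then scale it to the 0-jet bin with the tt ratio N(0 jet)/N(control) taken from simulation. All parameter names are illustrative.

```python
def estimate_tt_zero_jet(n_tagged_ctrl, contamination_frac, tag_eff, mc_tt_zero_jet, mc_tt_ctrl):
    """Hypothetical data-driven estimate of the tt yield in the 0-jet bin.

    n_tagged_ctrl:      data events with a soft muon in the control (jet) bins
    contamination_frac: fraction of those events from non-top processes
    tag_eff:            probability that a tt event in the control bins has a soft-muon tag (from MC)
    mc_tt_zero_jet, mc_tt_ctrl: tt yields in the 0-jet and control bins from simulation
    """
    if tag_eff <= 0 or mc_tt_ctrl <= 0:
        raise ValueError("tag_eff and mc_tt_ctrl must be positive")
    # tt events in the control bins, after removing contamination and correcting for tagging
    n_tt_ctrl = n_tagged_ctrl * (1.0 - contamination_frac) / tag_eff
    # extrapolate to the 0-jet bin with the simulated 0-jet/control ratio
    return n_tt_ctrl * (mc_tt_zero_jet / mc_tt_ctrl)
```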
Simple selection of events:
• HWW 2μ selection/isolation
• MET > 50 GeV
• |Mll - MZ| < 20 GeV
• Presence of a soft muon (pT > 5 GeV, no isolation required)
• With this simple selection, tW, W+jets, Z+jets and WW are still present in the most relevant jet bins (0-4), at the level of 10-15%.
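The simple selection above can be written as a single event predicate. The event fields are hypothetical; the |Mll - MZ| < 20 GeV condition is implemented exactly as written on the slide, with MZ set to the PDG value of the Z mass.

```python
MZ = 91.1876  # Z boson mass [GeV], PDG value


def passes_soft_muon_selection(event):
    """event: dict with 'passes_hww_mu_selection' (bool), 'met' [GeV], 'mll' [GeV],
    'soft_muon_pts' (list of soft-muon pT values [GeV]); no isolation is applied to soft muons."""
    return (
        event["passes_hww_mu_selection"]
        and event["met"] > 50.0
        and abs(event["mll"] - MZ) < 20.0
        and any(pt > 5.0 for pt in event["soft_muon_pts"])
    )
```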
[Figure: tt dimuons, chowder 1_6]
Next steps for 21x
• Better estimation of the fake rate using QCD samples (large statistics)
• Better estimation of the background contamination with data driven methods
• Study of muon identification and isolation
• Improve results in the low mass region
• ….