Multivariate analysis of H → bb̄ in associated production of H with a tt̄ pair using full simulation of the ATLAS detector
Sergey Kotov
MPI für Physik, München
DPG meeting, March 28, 2006
Sergey Kotov (MPI für Physik, München) Multivariate analysis of H → bb̄ in tt̄H production DPG meeting, March 28, 2006 1 / 16
1 Channel overview
2 General reconstruction strategy
3 Building and training of the neural network
4 Analysis results
5 Conclusions and plans
Low mass SM Higgs boson overview
LEP2 experimental bounds on Higgs mass
precision measurements of EW observables: mH = 117 +67/−45 GeV
direct searches: mH > 114 GeV
Signature channels for low mass SM Higgs
H → τ+τ− in vector boson fusion
H → γγ in gluon fusion
H → WW∗ → lνlν in vector boson fusion
H → ZZ∗ → 4l in gluon fusion
H → bb̄ in H associated production with tt̄
Channel description
Features of tt̄H, H → bb̄ channel
Complex final state
- 6 jets: 4 b-jets and 2 light jets
- 1 high-pt lepton (trigger)
- missing energy from the neutrino
- additional jets from ISR/FSR
Large backgrounds
- combinatorial, from mis-pairing of jets
- irreducible, from tt̄bb̄ events
- reducible, from tt̄ + jets events
Full reconstruction of the event and very good b-tagging are needed
tt̄H, H → bb̄ signal
tt̄bb̄ background
tt̄jj background
Expected number of events at LHC
Process                     | σLO, pb | BR   | LHC events, 30 fb−1 | LHC events, 100 fb−1 | MC generator | FastSim sample | FullSim sample
tt̄H → (blν)(bjj)(bb̄)        | 0.52    | 0.20 | 3.15k               | 10.5k                | Pythia       | 1M             | 42k
tt̄bb̄ → (blν)(bjj)bb̄ (QCD)   | 8.1 (a) | 0.29 | 70.5k               | 235k                 | AcerMC       | 1.8M           | 92k
tt̄bb̄ → (blν)(bjj)bb̄ (EW)    | 0.9     | 0.29 | 7.8k                | 26k                  | AcerMC       | 200k           | -
tt̄ → (blν)(bjj) + jets      | 500     | 0.29 | 4.3M                | 14.5M                | Pythia       | 4M             | 327k
(a) Strongly depends upon the factorization scale (up to a factor of 2). Here, µ0 = (mt + mH)/2
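The event counts in the table follow from N = σ × BR × L. A minimal sketch of that arithmetic, assuming the cross sections are in pb and the luminosity in fb−1; the function and dictionary names are illustrative only. With the rounded inputs from the table, the results agree with the quoted counts to within rounding.

```python
# Expected yield N = sigma * BR * L, using the rounded inputs from the table.
PB_TO_FB = 1000.0  # 1 pb = 1000 fb


def expected_events(sigma_pb: float, branching_ratio: float, lumi_fb_inv: float) -> float:
    """Number of produced events for a cross section in pb and luminosity in fb^-1."""
    return sigma_pb * PB_TO_FB * branching_ratio * lumi_fb_inv


# (sigma_LO in pb, BR) as listed in the table; names are hypothetical labels.
processes = {
    "ttH": (0.52, 0.20),
    "ttbb (QCD)": (8.1, 0.29),
    "ttbb (EW)": (0.9, 0.29),
    "tt + jets": (500.0, 0.29),
}

for name, (sigma, br) in processes.items():
    n30 = expected_events(sigma, br, 30.0)
    n100 = expected_events(sigma, br, 100.0)
    print(f"{name:12s} {n30:12.0f} {n100:12.0f}")
```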
Event reconstruction: preselection
[Figure: diagram of the tt̄H, H → bb̄ final state. Labels: H, t, t̄, four b quarks, W+, W−, l+, ν, q, q̄′]
Preselection cuts
≥ 1 isolated lepton
- pt > 20 GeV and |η| < 2.7
- Et < 10 GeV within the isolation cone of ∆R = 0.4
- e-Id: the EM cluster has a matched track in the ID and the cluster shape is consistent with the e-hypothesis
- µ-Id: the combined fit of the muon track has good quality
≥ 4 b-jets
- pt > 15 GeV and |η| < 3
- standard ATLAS b-tagging cut: jetWeight > 3
≥ 2 light jets
- pt > 15 GeV and |η| < 3
- b-tagging cut (anti-b-tag): jetWeight < 0.1
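The preselection cuts can be sketched as a simple event filter. This is only an illustration under stated assumptions: the `Lepton` and `Jet` containers and all function names are hypothetical, momenta are in GeV, and the electron/muon identification requirements are assumed to have been applied upstream.

```python
from dataclasses import dataclass


@dataclass
class Lepton:
    pt: float      # transverse momentum, GeV
    eta: float     # pseudorapidity
    iso_et: float  # Et inside the dR = 0.4 isolation cone, GeV


@dataclass
class Jet:
    pt: float      # transverse momentum, GeV
    eta: float     # pseudorapidity
    weight: float  # b-tagging jetWeight


def is_good_lepton(lep: Lepton) -> bool:
    """Kinematic and isolation cuts; lepton-ID flags are assumed applied upstream."""
    return lep.pt > 20.0 and abs(lep.eta) < 2.7 and lep.iso_et < 10.0


def in_jet_acceptance(jet: Jet) -> bool:
    return jet.pt > 15.0 and abs(jet.eta) < 3.0


def passes_preselection(leptons: list[Lepton], jets: list[Jet]) -> bool:
    """Require >= 1 isolated lepton, >= 4 b-jets and >= 2 light jets."""
    n_leptons = sum(is_good_lepton(lep) for lep in leptons)
    good_jets = [jet for jet in jets if in_jet_acceptance(jet)]
    n_b = sum(jet.weight > 3.0 for jet in good_jets)      # b-tag
    n_light = sum(jet.weight < 0.1 for jet in good_jets)  # anti-b-tag
    return n_leptons >= 1 and n_b >= 4 and n_light >= 2
```

Jets with a weight between 0.1 and 3 are counted in neither category, which is what the two separate cuts imply.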
Preselection efficiencies
Particle  | Kinematical acceptance, % | Reconstruction efficiency, %
e         | 82.8                      | 66.0
µ         | 82.3                      | 70.2
b-jet     | 93.4                      | 42.9
light jet | 48.6                      | 52.2
the 15 GeV pt cut on light jets, required by the anti-b-tagging algorithm, considerably decreases the kinematical acceptance
the b-tagging algorithm has 60% efficiency, but there are fewer reconstructed jets to tag than expected
Reconstruction efficiencies and resolutions: leptons
[Figure: electron and muon plots versus pt (MeV, 0 to 200×10³). For each lepton type: reconstructed pt spectrum (matched, fakes); efficiency and fake rate vs pt; pt resolution ∆pt/pt vs pt. Overall efficiency ε = 66.0% for electrons and ε = 70.2% for muons]
due to high jet activity the efficiencies are somewhat lower than expected
the pt resolutions are ∼ 4% for electrons and ∼ 5% for muons
Reconstruction efficiencies and resolutions: jets
[Figure: b-jet and light-jet plots versus pt (MeV). For b-jets: reconstructed pt spectrum (matched, fakes); efficiency and fake rate vs pt; pt resolution ∆pt/pt vs pt; overall efficiency ε = 42.9%. For light jets: reconstructed pt spectrum (matched); efficiency vs pt; pt resolution ∆pt/pt vs pt; overall efficiency ε = 52.2%]
in events with high jet multiplicity, overlapping jets considerably degrade the jet energy calibration and resolution
Event reconstruction: making combinations
Making 4 b-jet + 2 light jets + 1 lepton combinations and selecting the best one
use events which pass preselection criteria (1 lepton, 4 b-jets, 2 light jets)
determine pν from pl and pmiss using the mW constraint (if this fails, use the approximation pz(ν) = pz(l))
reconstruct the “leptonic” W → lν from the lepton and the neutrino
reconstruct the “hadronic” W → jj from jj combinations with |mjj − mW| < 35 GeV (the jet 4-momenta are rescaled to give the nominal W mass)
permute over all combinations of reconstructed Wlep, Whad, and 4 b-jets
calculate the evaluation parameter for each combination
from each event select the combination with the highest value of this parameter
plot invariant mass distributions from these best combinations and look for a Higgs peak
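The neutrino step above amounts to solving a quadratic equation for pz(ν). A minimal sketch, assuming massless lepton and neutrino, the missing transverse momentum taken as the neutrino pt, and mW = 80.4 GeV; the function name and signature are hypothetical. The slides do not say which of the two roots is kept, so both are returned.

```python
import math


def neutrino_pz(lep_px, lep_py, lep_pz, met_x, met_y, m_w=80.4):
    """Solve (p_l + p_nu)^2 = m_w^2 for the neutrino pz (momenta in GeV).

    Returns the two roots of the quadratic, or [lep_pz] as the fallback
    approximation pz(nu) = pz(l) when no real solution exists.
    """
    lep_pt2 = lep_px**2 + lep_py**2
    if lep_pt2 == 0.0:
        raise ValueError("lepton transverse momentum must be non-zero")
    lep_e = math.sqrt(lep_pt2 + lep_pz**2)  # massless lepton
    met2 = met_x**2 + met_y**2

    # mu = m_w^2 / 2 + (transverse dot product of lepton and neutrino)
    mu = 0.5 * m_w**2 + lep_px * met_x + lep_py * met_y
    disc = mu**2 - lep_pt2 * met2
    if disc < 0.0:
        return [lep_pz]  # constraint fails: use pz(nu) = pz(l)

    root = lep_e * math.sqrt(disc)
    return [(mu * lep_pz + root) / lep_pt2, (mu * lep_pz - root) / lep_pt2]
```

A negative discriminant typically arises when the measured missing momentum is mis-reconstructed, which is when the fallback is used.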
Various evaluation parameters of tt̄-pair reconstruction
ATLAS TDR: ∆mtt̄ = √((mblν − mt)² + (mbjj − mt)²)
tt̄-pair likelihood in ATL-PHYS-2003-024 analysis
this analysis uses neural network evaluation parameter
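For comparison with the neural network, the TDR evaluator can be sketched as follows. This illustrates the ∆mtt̄ criterion only, not the ANN selection used in this analysis. The nominal mt = 175 GeV is an assumption, and the function names and candidate tuple layout are hypothetical. With ∆mtt̄ the best combination is the one with the smallest value, whereas an ANN output is maximized.

```python
import math
from itertools import combinations, permutations


def delta_m_tt(m_blnu: float, m_bjj: float, m_top: float = 175.0) -> float:
    """TDR evaluator: distance of both reconstructed top masses from m_top (GeV)."""
    return math.hypot(m_blnu - m_top, m_bjj - m_top)


def b_jet_assignments(n_b: int):
    """All ways to give one b-jet to each top and a b-jet pair to the Higgs."""
    assignments = []
    for b_lep, b_had in permutations(range(n_b), 2):
        rest = [i for i in range(n_b) if i not in (b_lep, b_had)]
        for higgs_pair in combinations(rest, 2):
            assignments.append((b_lep, b_had, higgs_pair))
    return assignments


def select_best(candidates):
    """candidates: iterable of (label, m_blnu, m_bjj); smallest delta_m_tt wins."""
    return min(candidates, key=lambda c: delta_m_tt(c[1], c[2]))


print(len(b_jet_assignments(4)))  # 12 assignments for exactly 4 b-jets
```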
ANN variables: Full simulation vs Fast simulation signal samples
[Figure: normalized distributions of six ANN input variables in the signal samples: ∆mtt̄ (Delta2tMass, MeV), mjj (jjMass, MeV), ∆Rjj (jjDeltaRd), ∆R(b, Whad) (bWhDeltaRd), ∆R(b, Wlep) (bWlDeltaRd), mtt̄H (ttHMass, MeV). Curves: Fast, matched; Fast, not-matched; Full, matched; Full, not-matched]
most powerful discriminating variables are ∆mtt̄ and ∆Rjj
The neural network structure and performance
Due to the limited size of the full simulation sample, the fast simulation sample was used to train the ANN
ANN variables
TDR’s evaluator, ∆mtt̄ = √((mblν − mt)² + (mbjj − mt)²)
invariant mass of two light jets from Whad
invariant mass of tt̄-H system
∆R between two light jets from Whad
∆R between b-jet and Whad from the same t-quark
∆R between b-jet and Wlep from the same t-quark
∆R between tt̄ system and Higgs
[Figure: network diagram with input nodes bWlDeltaRdNN, jjMassNN, jjDeltaRdNN, bWhDeltaRdNN, Delta2tMassNN, HttMassNN, HttDeltaRdNN and output node ttHTruthKineTagNN]
[Figure: ANN output value distributions for fast simulation and for full simulation. Curves: not-matched tt̄ and H; matched tt̄, not-matched H; not-matched tt̄, matched H; matched tt̄ and H]
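The ANN inputs listed on this slide are invariant masses and ∆R separations built from reconstructed four-momenta. A minimal sketch of the two building blocks, assuming four-vectors stored as (E, px, py, pz) tuples in GeV; the helper names are hypothetical.

```python
import math


def add(*vectors):
    """Component-wise sum of (E, px, py, pz) four-vectors."""
    return tuple(sum(components) for components in zip(*vectors))


def invariant_mass(*vectors) -> float:
    e, px, py, pz = add(*vectors)
    m2 = e**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))  # clip small negative values from rounding


def eta_phi(vector):
    """Pseudorapidity and azimuth; requires non-zero transverse momentum."""
    _, px, py, pz = vector
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(v1, v2) -> float:
    eta1, phi1 = eta_phi(v1)
    eta2, phi2 = eta_phi(v2)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return math.hypot(eta1 - eta2, dphi)


# Two back-to-back massless jets of 40.2 GeV each
j1 = (40.2, 40.2, 0.0, 0.0)
j2 = (40.2, -40.2, 0.0, 0.0)
m_jj = invariant_mass(j1, j2)  # mass of the jj system
dr_jj = delta_r(j1, j2)        # separation between the two jets
```

For example, the ∆R between a b-jet and Whad would be `delta_r(b, add(j1, j2))`.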
Reconstructed tt̄-pair invariant mass distributions
[Figure: reconstructed top mass distributions (events vs mass in MeV, 100 to 260×10³), with Gaussian fits.
tt̄H signal (all and matched combinations), 364 entries:
- mbjj: mean 1.742e+05, RMS 1.84e+04; fit mean 1.736e+05 ± 1153, fit sigma 1.606e+04 ± 1333
- mblν: mean 1.725e+05, RMS 1.555e+04; fit mean 1.719e+05 ± 969, fit sigma 1.448e+04 ± 997
tt̄bb̄ background, 267 entries:
- mbjj: mean 1.737e+05, RMS 1.777e+04; fit mean 1.717e+05 ± 1436, fit sigma 1.636e+04 ± 1637
- mblν: mean 1.704e+05, RMS 1.72e+04; fit mean 1.721e+05 ± 1549, fit sigma 1.729e+04 ± 1877]
there’s a small shift of ∼3 GeV in the reconstructed mt
the width of the reconstructed mt is ∼16 GeV
Reconstructed Higgs invariant mass distributions
[Figure: reconstructed mbb distributions (events vs mbb in MeV, 50 to 300×10³).
tt̄H signal (all and matched combinations): 364 entries, mean 1.382e+05, RMS 5.693e+04; Gaussian fit mean 1.192e+05 ± 3241, fit sigma 2.384e+04 ± 5018; N = 152/71
tt̄bb̄ background: 267 entries, mean 1.482e+05, RMS 6.641e+04; N = 60/0]
Efficiencies
               | tt̄H sample      | tt̄bb̄ sample
Cut            | ε, %  | Events  | ε, %  | Events
Initial sample | 100   | 42882   | 100   | 96053
Wlep           | 55.5  | 23810   | 58.3  | 56038
Whad           | 32.3  | 13834   | 37.9  | 36455
4 b-jets       | 0.84  | 362     | 0.28  | 267
mH window      | 0.35  | 152     | 0.06  | 60
the shape of the irreducible background is reasonably flat
the reconstructed Higgs mass is close to the nominal value, with a width of ∼24 GeV
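The efficiencies in the table are cumulative: each is the number of events surviving up to that step divided by the initial sample size. A short sketch of that calculation using the event counts from the table; the dictionary layout and the label of the first row are illustrative.

```python
# Cumulative cut-flow efficiencies, eps = 100 * N_after_cut / N_initial.
cutflow = {
    "ttH": [("initial", 42882), ("Wlep", 23810), ("Whad", 13834),
            ("4 b-jets", 362), ("mH window", 152)],
    "ttbb": [("initial", 96053), ("Wlep", 56038), ("Whad", 36455),
             ("4 b-jets", 267), ("mH window", 60)],
}


def efficiencies(steps):
    """Return {cut name: efficiency in percent} relative to the first step."""
    n_initial = steps[0][1]
    return {name: 100.0 * n / n_initial for name, n in steps}


for sample, steps in cutflow.items():
    for name, eff in efficiencies(steps).items():
        print(f"{sample:5s} {name:10s} {eff:8.2f} %")
```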
Expected signal after 30 fb−1 of luminosity
[Figure: two panels of events/15 GeV vs Mbb (MeV, 0 to 300×10³) for L = 30 fb−1 and MH = 120 GeV. Left, “Signal and backgrounds”: tt̄H, tt̄H matched, tt̄bb̄ and tt̄jj, annotated with the signal significance S = 1.5. Right: “Real” data]
Signal significance estimate
                                   | tt̄H     | tt̄bb̄     | tt̄jj
Events in mH ± 30 GeV window       | 152     | 60       | 1
Final efficiency, %                | 0.35    | 0.06     | 0.0003
Events normalized to 30 (100) fb−1 | 11 (38) | 48 (160) | 13 (45)
Signal significance: 1.5 (2.7)
it’s hard to extract the signal unless the background shape is well known from MC
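A common simple estimate of the significance is S/√B, with B the sum of all backgrounds. The sketch below applies it to the rounded yields from the table; the slides do not spell out the exact definition used, so this formula is an assumption. With the rounded inputs it gives about 1.4 for 30 fb−1 and 2.7 for 100 fb−1, so the quoted 1.5 presumably comes from unrounded yields or a slightly different definition.

```python
import math


def simple_significance(signal: float, backgrounds: list[float]) -> float:
    """S / sqrt(B) with B the summed background yield (assumed definition)."""
    total_background = sum(backgrounds)
    if total_background <= 0.0:
        raise ValueError("total background must be positive")
    return signal / math.sqrt(total_background)


# Rounded yields from the table: ttH signal, then [ttbb, ttjj] backgrounds.
sig_30 = simple_significance(11.0, [48.0, 13.0])
sig_100 = simple_significance(38.0, [160.0, 45.0])
print(f"30 fb^-1: {sig_30:.2f}, 100 fb^-1: {sig_100:.2f}")
```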
Conclusions and plans
signal significance for the tt̄H, H → bb̄ channel in this study comes out quite low: S = 1.5 for 30 fb−1 of integrated luminosity (ATLAS TDR had S = 3)
it would be difficult to extract the H → bb̄ signal from data without a good understanding of the background shape
the ANN gives a small improvement (∼4%) in signal significance over the standard TDR evaluator
a lot can still be done to improve the signal significance
- smarter jet reconstruction algorithms are needed to deal with overlapping jets in high jet multiplicity events (smaller jet cone size, TopoCluster jets)
- b-jet reconstruction efficiency can be increased by loosening the b-tagging cut, at the cost of reduced suppression of the tt̄jj background → room for optimisation
- with more statistics, the neural network can be retrained on full simulation data
Distributions of the ANN variables in signal sample: Fast simulation
[Figure: normalized distributions of twelve variables in the fast simulation signal sample: bWlMass, bWlDeltaRd, jjMass, jjDeltaRd, bWhMass, bWhDeltaRd, Delta2tMass, ttMass, ttDeltaRd, ttSumPt, ttHMass, ttHDeltaRd. Curves in each panel: matched, not-matched, mc truth]
Neural network basics
Multilayer Perceptron
Input layer: combination input variables x_i
Hidden layer: h_j = f(Σ_i w_ij x_i + θ_j), with f(x) = 1/(1 + e^(−x))
Output layer: O = Σ_j w_j h_j + θ_o, the probability of being a signal combination
Error function: E = (1/N) Σ (O − T)²
The ROOT built-in class TMultiLayerPerceptron is used as the neural network (1 hidden layer with 10 nodes)
11500 matched and 12500 non-matched combinations were used to train the neural network
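The formulas on this slide describe a forward pass through a one-hidden-layer perceptron followed by a mean squared error. A plain-Python sketch of that structure, not the TMultiLayerPerceptron implementation: 7 inputs and 10 hidden nodes follow the slides, while the random weights, the seed and all names are illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, theta_hidden, w_out, theta_out) -> float:
    """h_j = f(sum_i w_ij x_i + theta_j); O = sum_j w_j h_j + theta_o."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + theta)
        for row, theta in zip(w_hidden, theta_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + theta_out


def mean_squared_error(outputs, targets) -> float:
    """E = (1/N) * sum (O - T)^2 over the training combinations."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


N_INPUTS, N_HIDDEN = 7, 10  # 7 ANN variables, 10 hidden nodes
rng = random.Random(1)      # untrained, randomly initialised weights
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
theta_hidden = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
theta_out = 0.0

x = [0.1] * N_INPUTS  # one combination with inputs already normalised
output = forward(x, w_hidden, theta_hidden, w_out, theta_out)
```

Training would adjust the weights and thresholds to minimise E, with target T = 1 for matched and T = 0 for non-matched combinations; that target convention is an assumption suggested by the output range of the ANN plots.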