Marie Lanfermann, Andrea Coccaro, Tobias Golling
DL1: A new Deep Neural Network-based higher level tagger for ATLAS Flavour Tagging
SPS & ÖPG Joint Annual Meeting, 23rd August 2017
DL1 Flavour Tagging in ATLAS Marie Lanfermann
Higher Level Tagger: What do we want?
• Best separation of b, c and light-flavour jets
  • b-tagging
  • c-tagging
• Robust tagger (data/MC comparison)
• Optimisation and generalisation
• Good performance over the full kinematic region
• Good for various physics searches
• As little total work as possible
Kinematics
Introduction to flavour tagging in ATLAS
• Pre-processing:
  • Reweight in 2D kinematics to the b-jet distribution, so that all flavours are treated on an equal footing
  • Default values:
    • Defaults are not placed far from the non-default values; they are set to the mean of the non-default values (or to a physics-motivated value)
    • Introduce binary default-check variables (to propagate the information that a value is a default)
• Training (hybrid ttbar/Z’ sample):
  • Interesting phase space up to O(1 TeV)
  • Available statistics: 5.1 M training jets, 1.3 M validation jets
  • Weights are used in the back-propagation update (training & validation set)
• Evaluation (separate pure samples of ttbar or Z’):
  • Available statistics: ttbar: 6.5 M jets; Z’: 4.3 M jets
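The default-value treatment listed under pre-processing can be sketched as follows. This is only a minimal illustration, not the ATLAS implementation: the sentinel value -99.0, the function name and the toy secondary-vertex mass column are assumptions made for the example.

```python
from statistics import mean

DEFAULT_SENTINEL = -99.0  # hypothetical placeholder marking "no value available"


def replace_defaults(values, sentinel=DEFAULT_SENTINEL):
    """Return (cleaned values, binary default-check flags) for one input variable."""
    valid = [v for v in values if v != sentinel]
    # Assumption: fall back to 0.0 if every entry of the variable is a default.
    fill = mean(valid) if valid else 0.0
    cleaned = [fill if v == sentinel else v for v in values]
    # The flag keeps the information that the value was a default.
    flags = [1 if v == sentinel else 0 for v in values]
    return cleaned, flags


# Toy column: two of the five jets have no reconstructed secondary vertex.
sv_mass = [1.5, -99.0, 2.5, -99.0, 2.0]
cleaned, is_default = replace_defaults(sv_mass)
print(cleaned)
print(is_default)
```

Setting defaults to the mean keeps them inside the bulk of the distribution, while the extra binary input still lets the network tell a default from a measured value.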
Protocol
Pre-processing -> Training incl. loss monitoring (hybrid ttbar/Z’ sample) -> Evaluation (pure samples of ttbar or Z’)
[Figure: normalised b-jet pT distribution, 1/N dN/dpT, for b-jet pT from 0 to 2000 GeV (logarithmic y-axis, 10^-7 to 10^-2). ATLAS Simulation Preliminary, √s = 13 TeV.]
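The 2D kinematic reweighting to the b-jet distribution could look roughly like the sketch below. The slides only say "2D kinematics"; using jet pT and |η| as the two variables, the bin edges and all names here are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(value, edges):
    """Index of the bin containing value; out-of-range values are clamped."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def kinematic_weights(jets, pt_edges, eta_edges, target="b"):
    """Per-jet weights mapping each flavour's 2D shape onto the target flavour.

    jets: list of (flavour, pt, abs_eta) tuples; the target flavour must be present.
    """
    counts, totals, bins = Counter(), Counter(), []
    for flavour, pt, abs_eta in jets:
        b = (bin_index(pt, pt_edges), bin_index(abs_eta, eta_edges))
        bins.append(b)
        counts[(flavour, b)] += 1
        totals[flavour] += 1
    weights = []
    for (flavour, _, _), b in zip(jets, bins):
        target_fraction = counts[(target, b)] / totals[target]
        own_fraction = counts[(flavour, b)] / totals[flavour]
        weights.append(target_fraction / own_fraction)
    return weights


# Toy sample: b-jets are softer than light-flavour jets.
jets = [("b", 50, 0.5), ("b", 50, 0.5), ("b", 300, 0.5),
        ("light", 50, 0.5), ("light", 300, 0.5), ("light", 300, 0.5)]
weights = kinematic_weights(jets, pt_edges=[0, 100, 1000], eta_edges=[0, 2.5])
print(weights)
```

In this toy sample the b-jets keep weight 1, while the soft light-flavour jet is weighted up and the hard ones are weighted down, so that the weighted light-flavour spectrum follows the b-jet one.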
DL1 - General Overview
• Input: higher level variables
• 6-10 hidden layers: mixtures of MaxoutDense and Dense layers, ReLU activation function
• 3 output nodes, softmax activation function
• Building the final DL1 discriminant (one for b-tagging, one for c-tagging): reducing the output dimensionality
  -> Increased flexibility:
  + Background weighting tuneable after training
  + Same training usable for b- and c-tagging
• NN config file size ~1 MB
• Training using the ttbar-Z’ hybrid sample
• Performance evaluation on pure ttbar and Z’ samples
[Diagram: feed-forward network with input layer, hidden layers 1 and 2, output layer, and a bias node per layer.]
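One way to reduce the three softmax outputs to a single discriminant with a background weighting that stays tuneable after training is a log-ratio with a background fraction, sketched below. This form is assumed here and should be checked against ATL-PHYS-PUB-2017-013; the function names, the toy logits and the fraction values are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the three output nodes (b, c, light)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dl1_b_discriminant(p_b, p_c, p_light, f_c):
    """b-tagging: c-jets enter the background with fraction f_c (assumed form)."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))


def dl1_c_discriminant(p_b, p_c, p_light, f_b):
    """c-tagging: b-jets enter the background with fraction f_b (assumed form)."""
    return math.log(p_c / (f_b * p_b + (1.0 - f_b) * p_light))


# Toy network output for one b-like jet; the logits are made up.
probs = softmax([2.0, 0.5, 0.1])
d_b = dl1_b_discriminant(*probs, f_c=0.1)
d_c = dl1_c_discriminant(*probs, f_b=0.1)
print(f"p_b, p_c, p_light = {probs}")
print(f"b-tagging discriminant = {d_b:.3f}, c-tagging discriminant = {d_c:.3f}")
```

Because the fractions only enter after the network has been evaluated, the same trained model can serve both b- and c-tagging, and the background composition can be changed without retraining.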
Optimisation Procedure
1. Grid search
2. Sanity checks:
   • #(training samples) > #(free parameters of the model), O(100k)
   • Loss development on training and validation set is monitored
3. Performance evaluated on test sets
4. Extend training for best performing configuration
   • Performance after different numbers of epochs evaluated on test sets (using Keras ModelCheckpoint callbacks)
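The first sanity check, more training samples than free parameters, can be done by hand for a given layer configuration. The sketch below counts weights and biases only; the layer sizes and the number of maxout pieces are hypothetical, and batch-normalisation parameters are ignored.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def maxout_params(n_in, n_out, n_pieces):
    """A maxout layer holds n_pieces independent affine maps and takes their maximum."""
    return n_pieces * dense_params(n_in, n_out)


def count_parameters(n_inputs, layers):
    """layers: list of ("dense", n_out) or ("maxout", n_out, n_pieces) tuples."""
    total, n_in = 0, n_inputs
    for layer in layers:
        if layer[0] == "dense":
            total += dense_params(n_in, layer[1])
        elif layer[0] == "maxout":
            total += maxout_params(n_in, layer[1], layer[2])
        else:
            raise ValueError(f"unknown layer type: {layer[0]}")
        n_in = layer[1]
    return total


# Hypothetical small configuration, not the DL1 architecture.
layers = [("maxout", 64, 5), ("dense", 64), ("dense", 32), ("dense", 3)]
n_parameters = count_parameters(n_inputs=30, layers=layers)
n_training_jets = 5_100_000  # training statistics quoted in the protocol
print(n_parameters, n_training_jets > n_parameters)
```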
[Figure: light-flavour-jet rejection versus b-jet rejection (logarithmic axes) for DL1 and MV2c(l)100 at 20%, 25%, 30% and 40% c-tagging efficiency. ATLAS Work in Progress, √s = 13 TeV, tt̄.]
State of the art c-tagging (BDT)
Significant improvements for c-tagging, using the same inputs:
• 40% c-tagging efficiency, at a b-jet rejection of 4: light-flavour-jet rejection 16 -> 27 (+68% improvement)
• 20% c-tagging efficiency, at a b-jet rejection of 18: light-flavour-jet rejection ~53 -> ~104 (+195% improvement)
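The relative improvements can be recomputed from the quoted rejection values. The helper below is a plain arithmetic sketch with an illustrative function name; it assumes "improvement" means the relative gain (new - old) / old.

```python
def relative_improvement(old, new):
    """Relative gain of new over old, in percent."""
    return 100.0 * (new - old) / old


quoted = [
    ("40% c-tagging efficiency", 16.0, 27.0),
    ("20% c-tagging efficiency", 53.0, 104.0),
]
gains = {label: relative_improvement(old, new) for label, old, new in quoted}
for label, gain in gains.items():
    print(f"{label}: {gain:+.1f}%")
```

With the rounded values, the 40% working point gives about +69%, in line with the quoted +68%. The approximate values 53 and 104 give about +96% (a ratio of about 1.96), so the quoted +195% presumably rests on unrounded rejections or on a different convention.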
Performance Results: DL1 w.r.t. state of the art BDT approach
Same NN evaluated on all-inclusive ttbar sample (ATL-PHYS-PUB-2017-013)

B-TAGGING (slightly more inputs)    light-jet rejection    c-jet rejection
77% b-tagging efficiency            ~same                  5.8 -> 6.3 (+9%)

C-TAGGING (same inputs)             light-jet rejection    b-jet rejection
40% c-tagging efficiency            16 -> 27 (+68%)        same
20% c-tagging efficiency            53 -> 104 (+195%)      same

Significant improvements!
[Figure: normalised distribution of the DL1MuRnn output (1/N dN/dD, logarithmic y-axis, x-axis from -4 to 10) with Data/MC ratio panel (0.6 to 1.4); legend: Data 2016, MC16 b jets, c jets, light-flavour jets. Two panels: tt̄ selection, √s = 13 TeV, 2.5 fb^-1, and Z → μμ + jets selection, √s = 13 TeV, 0.5 fb^-1. ATLAS Preliminary.]
Modelling check
• Good separation
• Simulation describes the data within 20%, with some localised differences at low and high values
• To be checked with more data
Output variable well modelled
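A check of the kind "simulation describes the data within 20%" amounts to comparing normalised histograms bin by bin. The following is a toy sketch with made-up bin contents and hypothetical function names; it ignores statistical and systematic uncertainties.

```python
def normalise(counts):
    """Scale a histogram to unit area (the 1/N dN/dD normalisation)."""
    total = float(sum(counts))
    return [c / total for c in counts]


def data_mc_ratio(data_counts, mc_counts):
    """Bin-by-bin ratio of the normalised data and simulation histograms."""
    data, mc = normalise(data_counts), normalise(mc_counts)
    return [d / m if m > 0 else None for d, m in zip(data, mc)]


def bins_outside(ratio, tolerance=0.20):
    """Indices of bins where data and simulation differ by more than the tolerance."""
    return [i for i, r in enumerate(ratio) if r is None or abs(r - 1.0) > tolerance]


# Made-up bin contents of a discriminant distribution.
data_counts = [70, 230, 400, 200, 100]
mc_counts = [200, 400, 800, 400, 200]
ratio = data_mc_ratio(data_counts, mc_counts)
print([round(r, 2) for r in ratio])
print("bins outside 20%:", bins_outside(ratio))
```

Here only the first bin falls outside the band, which mimics a localised difference at low values of the output.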
ATL-PHYS-PUB-2017-013
Conclusions
• Novel, highly flexible tagger ready to be used on 2017 data
• Only one training
• Tuneable after training
• Slight expected improvements for b-jet tagging
• Significant expected improvements for c-jet tagging
• Calibration analysis starting
References:
Deep Learning in the ATLAS experiment, ATL-PHYS-SLIDE-2017-477, http://cds.cern.ch/record/2274065.
Optimisation and performance studies of the ATLAS b-tagging algorithms for the 2017-18 LHC run, ATL-PHYS-PUB-2017-013, http://cds.cern.ch/record/2273281.
Identification of Jets Containing b-hadrons with Recurrent Neural Networks at the ATLAS experiment, ATL-PHYS-PUB-2017-003, http://cds.cern.ch/record/2255226.
BACKUP
The DL1 chain
[Diagram: the DL1 chain, marked "VALIDATED".]
ATL-PHYS-SLIDE-2017-477
Initial Calo-Jet Cuts
Akt4EMTopo jets
[Figure: normalised distribution of the IP3D b LLR input (1/N dN/dLLR, logarithmic y-axis, x-axis from -10 to 30) with Data/MC ratio panel (0.6 to 1.4); legend: Data 2016, MC16 b jets, c jets, light-flavour jets. Two panels: tt̄ selection, √s = 13 TeV, 2.5 fb^-1, and Z → μμ + jets selection, √s = 13 TeV, 0.5 fb^-1. ATLAS Preliminary.]
Input modelling check
ATL-PHYS-PUB-2017-013
Grid Search
• Varied:
  • Number of hidden layers, layer type sequencing, number of nodes, learning rate
  -> Approximately 100k trainable parameters
• General settings:
  • Keras sequential model
  • Theano backend
  • Adam optimiser
  • Minimise categorical cross-entropy loss
  • 3 output nodes
  • ReLU activation function (softmax for output layer)
  • Mixture of Maxout and Dense layers
  • BatchNormalisation
  • Dropout (training) for robustness
    • 1st layer: 10% of nodes masked
    • Other hidden layers: 20% masked
  • 100 training epochs
Information accessible after constructing the Keras model via: model.summary()
MO: Maxout layer
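The grid of varied settings can be enumerated with a few lines of Python. The candidate values below are placeholders rather than the grid actually scanned; only the 6-10 hidden-layer range and the dropout fractions come from the slides.

```python
from itertools import product

# Hypothetical candidate values for the four varied settings.
grid = {
    "n_hidden_layers": [6, 8, 10],
    "layer_sequence": ["maxout_first", "dense_only"],
    "nodes_first_layer": [48, 72],
    "learning_rate": [1e-3, 5e-4],
}


def configurations(grid):
    """Yield one dict per point of the Cartesian product of all settings."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def dropout_rates(n_hidden_layers, first=0.10, other=0.20):
    """Dropout per hidden layer: 10% in the first layer, 20% in the others."""
    return [first] + [other] * (n_hidden_layers - 1)


configs = list(configurations(grid))
print(len(configs), "configurations to train")
print(configs[0], dropout_rates(configs[0]["n_hidden_layers"]))
```

Each configuration would then be trained with the fixed general settings and compared on the test sets.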
Iso-efficiency curve = scan over the full range of the background fraction:
[Figure: b-tagging iso-efficiency curves, light-flavour-jet rejection (10 to 100) versus c-jet rejection (1 to 10), for DL1 at 70%, 77% and 80% b-tagging efficiency. ATLAS Simulation Preliminary, √s = 13 TeV, tt̄.]
Tuneable after training:
Background fraction of the final DL1 discriminant can be adapted for physics performance interests by moving along the lines
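Moving along an iso-efficiency line can be mimicked with a toy scan: for each background fraction the cut is chosen to give a fixed b-jet efficiency, and the c-jet and light-flavour-jet rejections are read off. Everything in this sketch is assumed: the log-ratio form of the discriminant, the toy probabilities, the symbol f_c and all function names.

```python
import math
import random


def discriminant(p_b, p_c, p_light, f_c):
    """Assumed log-ratio form with a tuneable c-jet background fraction f_c."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))


def toy_jets(n, favoured, rng):
    """n toy (p_b, p_c, p_light) triplets that favour one class (0, 1 or 2)."""
    jets = []
    for _ in range(n):
        raw = [rng.uniform(0.05, 1.0) for _ in range(3)]
        raw[favoured] += 0.5
        total = sum(raw)
        jets.append(tuple(r / total for r in raw))
    return jets


def iso_efficiency_point(b_jets, c_jets, light_jets, f_c, b_eff):
    """Cut at fixed b-jet efficiency; return (c-jet rejection, light-jet rejection)."""
    scores = sorted((discriminant(*jet, f_c) for jet in b_jets), reverse=True)
    n_pass = max(round(b_eff * len(scores)), 1)
    cut = scores[n_pass - 1]

    def rejection(jets):
        passed = sum(discriminant(*jet, f_c) >= cut for jet in jets)
        return len(jets) / passed if passed else float("inf")

    return rejection(c_jets), rejection(light_jets)


rng = random.Random(2017)
b_jets = toy_jets(2000, 0, rng)
c_jets = toy_jets(2000, 1, rng)
light_jets = toy_jets(2000, 2, rng)

fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [iso_efficiency_point(b_jets, c_jets, light_jets, f, 0.77) for f in fractions]
for f, (c_rej, light_rej) in zip(fractions, curve):
    print(f"f_c = {f:.2f}: c-jet rejection = {c_rej:.1f}, light-jet rejection = {light_rej:.1f}")
```

In this toy, raising the fraction trades light-flavour-jet rejection for c-jet rejection at the same b-jet efficiency, which is the kind of movement along the line that the slide describes.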
ATL-PHYS-PUB-2017-013