DL1: A new Deep Neural Network-based higher level tagger for ATLAS Flavour Tagging

Marie Lanfermann, Andrea Coccaro, Tobias Golling

SPS & ÖPG Joint Annual Meeting, 23rd August 2017


Higher Level Tagger: What do we want?

• Best separation of b, c and light-flavour jets
  • b-tagging
  • c-tagging
• Robust tagger (data/MC comparison)
• Optimisation and generalisation
• Good performance over the full kinematic region
• Good for various physics searches
• As little total work as possible

Kinematics

Introduction to flavour tagging in ATLAS



Protocol

Pre-processing -> Training incl. loss monitoring (hybrid ttbar/Z’ sample) -> Evaluation (pure samples of ttbar or Z’)

• Pre-processing:
  • Reweight in 2D kinematics to the b-jet distribution, so that all flavours are treated on an equal footing
  • Default values:
    • Defaults are not placed far from the non-default values; they are set to the mean of the non-default values (or to a physics-motivated value)
    • Binary default-check variables are introduced to propagate the information that a value is a default
• Training (hybrid ttbar/Z’ sample):
  • Interesting phase space up to O(1 TeV)
  • Available statistics: 5.1 M training jets, 1.3 M validation jets
  • Weights are used in the back-propagation update (training and validation set)
• Evaluation (separate pure samples of ttbar or Z’):
  • Available statistics: ttbar: 6.5 M jets; Z’: 4.3 M jets
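The two pre-processing steps can be illustrated with a minimal sketch. It assumes that the 2D kinematics are jet pT and |η|, and that defaults are marked with a sentinel value of -99. Neither is stated on the slide, and the bin edges, jets and function names are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(value, edges):
    """Index of the bin containing value; out-of-range values go to the edge bins."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def kinematic_weights(jets, pt_edges, eta_edges, target="b"):
    """Per-jet weights that reshape each flavour's 2D spectrum to the target flavour's."""
    def cell(jet):
        return (bin_index(jet["pt"], pt_edges), bin_index(abs(jet["eta"]), eta_edges))

    counts = {}
    for jet in jets:
        counts.setdefault(jet["flavour"], Counter())[cell(jet)] += 1
    # weight = (target-flavour jets in this cell) / (same-flavour jets in this cell)
    return [counts[target][cell(jet)] / counts[jet["flavour"]][cell(jet)] for jet in jets]


def fill_defaults(values, default_marker):
    """Replace default entries by the mean of the valid ones; also return check flags."""
    valid = [v for v in values if v != default_marker]
    mean = sum(valid) / len(valid) if valid else 0.0
    filled = [mean if v == default_marker else v for v in values]
    is_default = [1 if v == default_marker else 0 for v in values]
    return filled, is_default


# Hypothetical toy jets (pT in GeV) and a hypothetical sentinel of -99 for defaults.
jets = [
    {"flavour": "b", "pt": 50.0, "eta": 0.3},
    {"flavour": "b", "pt": 60.0, "eta": -0.4},
    {"flavour": "b", "pt": 300.0, "eta": 1.9},
    {"flavour": "light", "pt": 55.0, "eta": 0.1},
    {"flavour": "light", "pt": 310.0, "eta": 2.0},
    {"flavour": "light", "pt": 320.0, "eta": -2.1},
]
weights = kinematic_weights(jets, pt_edges=[20.0, 100.0, 2000.0], eta_edges=[0.0, 1.5, 2.5])
filled, flags = fill_defaults([1.5, -99.0, 2.5], default_marker=-99.0)
```

The b jets keep unit weight because they define the target distribution, and the flags carry the "this was a default" information as an extra input variable.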

[Figure: normalised b-jet pT distribution, 1/N dN/dpT, for b-jet pT from 0 to 2000 GeV on a logarithmic scale (10^-7 to 10^-2). ATLAS Simulation Preliminary, √s = 13 TeV.]



DL1 - General Overview

• Inputs: higher level variables
• 6-10 hidden layers: mixtures of MaxoutDense and Dense layers, ReLU activation function
• 3 output nodes, softmax activation function
• Training using the ttbar-Z’ hybrid sample
• Performance evaluation on pure ttbar and Z’ samples
• NN config file size ~1 MB

[Diagram: fully connected network with an input layer, hidden layers 1 and 2, an output layer and bias nodes.]

Building the final DL1 discriminant (one for b-tagging, one for c-tagging):

• Reducing output dimensionality
-> Increased flexibility:
  + Background weighting tuneable after training
  + Same training usable for b- and c-tagging
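How three softmax outputs can be reduced to one discriminant per use case is sketched below. The discriminant formulas are not reproduced in this text, so the sketch assumes the log-ratio form usually quoted for DL1, with a tuneable background fraction. The logits and the fraction value 0.08 are placeholders, not values from the talk.

```python
import math


def softmax(logits):
    """Turn three raw network outputs into probabilities (p_b, p_c, p_light)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dl1_b_discriminant(p_b, p_c, p_light, f_c):
    """Assumed b-tagging form: ln(p_b / (f_c * p_c + (1 - f_c) * p_light))."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))


def dl1_c_discriminant(p_b, p_c, p_light, f_b):
    """Assumed c-tagging form: ln(p_c / (f_b * p_b + (1 - f_b) * p_light))."""
    return math.log(p_c / (f_b * p_b + (1.0 - f_b) * p_light))


# Made-up network outputs for one jet; the fractions are illustrative only.
p_b, p_c, p_light = softmax([2.0, 0.5, -1.0])
d_b = dl1_b_discriminant(p_b, p_c, p_light, f_c=0.08)
d_c = dl1_c_discriminant(p_b, p_c, p_light, f_b=0.08)
```

Because the fractions enter only when the discriminant is built, they can be changed after training without retraining the network, and the same three outputs serve both b- and c-tagging.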



Optimisation Procedure

1. Grid search
2. Sanity checks:
   1. #(training samples) > #(free parameters of the model, O(100k))
   2. Loss development on the training and validation sets is monitored
3. Performance evaluated on test sets
4. Extend training for the best performing configuration
   • Performance after different numbers of epochs evaluated on test sets (using Keras ModelCheckpoint callbacks)
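The first sanity check can be made concrete by counting weights and biases for a given layer sequence. This is a rough sketch: the input count, layer widths and number of maxout pieces are hypothetical and not the DL1 configuration, and batch-normalisation parameters are ignored.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def maxout_params(n_in, n_out, n_pieces):
    """A maxout layer holds n_pieces parallel dense layers and takes their maximum."""
    return n_pieces * dense_params(n_in, n_out)


def count_parameters(n_inputs, layers):
    """layers: sequence of (kind, n_out, n_pieces) tuples, applied in order."""
    total, width = 0, n_inputs
    for kind, n_out, n_pieces in layers:
        if kind == "maxout":
            total += maxout_params(width, n_out, n_pieces)
        else:
            total += dense_params(width, n_out)
        width = n_out
    return total


# Hypothetical architecture: 6 hidden layers plus a 3-node output layer.
HYPOTHETICAL_LAYERS = [
    ("maxout", 64, 20),
    ("dense", 48, 1),
    ("dense", 32, 1),
    ("dense", 24, 1),
    ("dense", 16, 1),
    ("dense", 8, 1),
    ("dense", 3, 1),
]
N_INPUTS = 40                # hypothetical number of input variables
N_TRAINING_JETS = 5_100_000  # training statistics quoted in the protocol

n_params = count_parameters(N_INPUTS, HYPOTHETICAL_LAYERS)
enough_statistics = N_TRAINING_JETS > n_params
```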



[Figure: light-flavour-jet rejection versus b-jet rejection (logarithmic axes) for DL1 and MV2c(l)100 at 20%, 25%, 30% and 40% c-tagging efficiency. ATLAS Work in Progress, √s = 13 TeV, tt̄.]

c-tagging

DL1 compared with the state-of-the-art c-tagging (BDT), using the same inputs:

• 40% c-tagging efficiency, at a b-jet rejection of 4: light-flavour-jet rejection 16 -> 27 (+68% improvement)
• 20% c-tagging efficiency, at a b-jet rejection of 18: light-flavour-jet rejection ~53 -> ~104 (+195% improvement)

Significant improvements!



Performance Results: DL1 w.r.t. state of the art BDT approach

Same NN evaluated on all-inclusive ttbar sample (ATL-PHYS-PUB-2017-013)

B-TAGGING (slightly more inputs)
• 77% b-tagging efficiency: light-jet rejection ~same; c-jet rejection 5.8 -> 6.3 (+9%)

C-TAGGING (same inputs)
• 40% c-tagging efficiency: light-jet rejection 16 -> 27 (+68%); b-jet rejection same
• 20% c-tagging efficiency: light-jet rejection 53 -> 104 (+195%); b-jet rejection same

Significant improvements!
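For reference, a rejection is the inverse of the mistag efficiency for a background flavour, and the quoted gains are relative changes in rejection at fixed signal efficiency. A small sketch of that arithmetic, using two of the number pairs quoted above (the function names are illustrative):

```python
def rejection(mistag_efficiency):
    """Background rejection is the inverse of the background (mistag) efficiency."""
    return 1.0 / mistag_efficiency


def relative_gain_percent(baseline, new):
    """Relative change of a rejection with respect to the baseline tagger, in percent."""
    return 100.0 * (new - baseline) / baseline


# c-tagging at 40% efficiency: light-flavour-jet rejection 16 -> 27
light_gain_40 = relative_gain_percent(16.0, 27.0)
# b-tagging at 77% efficiency: c-jet rejection 5.8 -> 6.3
c_gain_77 = relative_gain_percent(5.8, 6.3)

print(f"light-jet rejection gain: {light_gain_40:.1f}%")
print(f"c-jet rejection gain:     {c_gain_77:.1f}%")
```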



Modelling check

[Figure: DL1MuRnn output distribution, 1/N dN/dD_DL1MuRnn over the range -4 to 10, with a Data/MC ratio panel (0.6 to 1.4), in two selections: tt̄ (√s = 13 TeV, 2.5 fb⁻¹) and Z→μμ + jets (√s = 13 TeV, 0.5 fb⁻¹). Data 2016 compared with MC16 simulation split into b jets, c jets and light-flavour jets. ATLAS Preliminary.]

• Good separation
• Simulation describes the data within 20%, with some localised differences at low and high values
• To be checked with more data

Output variable well modelled
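A statement such as "within 20%" corresponds to a per-bin check of the Data/MC ratio of the normalised distributions. The sketch below shows one way to do such a check. The bin contents are invented for illustration and are not read from the plots.

```python
def normalise(counts):
    """Scale bin contents to unit area (1/N dN/dD)."""
    total = float(sum(counts))
    return [c / total for c in counts]


def data_mc_ratio(data_counts, mc_counts):
    """Per-bin ratio of the normalised data and simulation histograms."""
    data, mc = normalise(data_counts), normalise(mc_counts)
    return [d / m if m > 0 else float("nan") for d, m in zip(data, mc)]


def bins_outside(ratios, tolerance=0.20):
    """Indices of bins whose ratio deviates from 1 by more than the tolerance.

    Bins with an undefined ratio (NaN) are not flagged.
    """
    return [i for i, r in enumerate(ratios) if abs(r - 1.0) > tolerance]


# Invented bin contents, for illustration only.
DATA = [115, 480, 900, 610, 240, 95, 55]
MC = [100, 500, 880, 640, 250, 90, 40]

ratios = data_mc_ratio(DATA, MC)
flagged = bins_outside(ratios)
```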




Conclusions

• Novel, highly flexible tagger ready to be used on 2017 data
• Only one training
• Tuneable after training
• Slight expected improvements for b-jet tagging
• Significant expected improvements for c-jet tagging
• Calibration analysis starting



References:

Deep Learning in the ATLAS experiment, ATL-PHYS-SLIDE-2017-477, http://cds.cern.ch/record/2274065.

Optimisation and performance studies of the ATLAS b-tagging algorithms for the 2017-18 LHC run, ATL-PHYS-PUB-2017-013, http://cds.cern.ch/record/2273281.

Identification of Jets Containing b-hadrons with Recurrent Neural Networks at the ATLAS experiment, ATL-PHYS-PUB-2017-003, http://cds.cern.ch/record/2255226.






BACKUP




The DL1 chain

VALIDATED



ATL-PHYS-SLIDE-2017-477



Initial Calo-Jet Cuts


Akt4EMTopo jets



Input modelling check

[Figure: IP3D b LLR distribution, 1/N dN/dLLR over the range -10 to 30 on a logarithmic scale, with a Data/MC ratio panel (0.6 to 1.4), in two selections: tt̄ (√s = 13 TeV, 2.5 fb⁻¹) and Z→μμ + jets (√s = 13 TeV, 0.5 fb⁻¹). Data 2016 compared with MC16 simulation split into b jets, c jets and light-flavour jets. ATLAS Preliminary.]




Grid Search

• Varied:
  • Number of hidden layers, layer type sequencing, number of nodes, learning rate
  -> Approximately 100k trainable parameters
• General settings:
  • Keras sequential model
  • Theano backend
  • 3 output nodes
  • Adam optimiser
  • Minimise categorical cross-entropy loss
  • ReLU activation function (softmax for output layer)
  • Mixture of Maxout and Dense layers
  • BatchNormalisation
  • Dropout (training) for robustness
    • 1st layer: 10% of nodes masked
    • Other hidden layers: 20% masked
  • 100 training epochs

Layer information is accessible after constructing the Keras model via model.summary() (MO: Maxout layer).
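A grid search over the varied quantities amounts to enumerating every combination on top of the fixed settings. The sketch below only builds the list of configurations. The candidate values and key names are hypothetical, and the training call itself is left out.

```python
from itertools import product

# Hypothetical candidate values for the four varied quantities.
SEARCH_SPACE = {
    "n_hidden_layers": [6, 8, 10],
    "layer_sequence": ["maxout_first", "dense_only", "alternating"],
    "n_nodes_first_layer": [48, 64, 96],
    "learning_rate": [1e-4, 5e-4, 1e-3],
}

# Settings kept fixed for every grid point, following the list above.
FIXED_SETTINGS = {
    "optimiser": "adam",
    "loss": "categorical_crossentropy",
    "hidden_activation": "relu",
    "output_activation": "softmax",
    "n_outputs": 3,
    "dropout_first_layer": 0.10,
    "dropout_other_layers": 0.20,
    "epochs": 100,
}


def grid_configurations(search_space, fixed):
    """Yield one full configuration dict per point of the hyperparameter grid."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        config = dict(fixed)
        config.update(zip(names, values))
        yield config


configs = list(grid_configurations(SEARCH_SPACE, FIXED_SETTINGS))
```

Each configuration would then be trained and compared on the test sets, as described in the optimisation procedure.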



b-tagging: tuneable after training

Iso-efficiency curve = scan over the full range of the background fraction in the final DL1 discriminant

[Figure: light-flavour-jet rejection versus c-jet rejection (logarithmic axes) for DL1 at 70%, 77% and 80% b-tagging efficiency. ATLAS Simulation Preliminary, √s = 13 TeV, tt̄.]

The background fraction of the final DL1 discriminant can be adapted to physics performance interests by moving along the lines.
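An iso-efficiency curve of this kind can be traced with a toy scan: for each background fraction, rebuild the discriminant, place the cut that gives the chosen b-tagging efficiency, and record the c-jet and light-flavour-jet rejections. The sketch assumes the log-ratio discriminant form and uses randomly generated toy probabilities, so the numbers it produces have no physical meaning.

```python
import math
import random


def discriminant(p_b, p_c, p_light, f_c):
    """Assumed b-tagging discriminant with tuneable c-jet background fraction f_c."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))


def threshold_for_efficiency(signal_scores, efficiency):
    """Cut value that keeps the requested fraction of signal jets (score >= cut)."""
    ordered = sorted(signal_scores, reverse=True)
    n_keep = max(1, int(round(efficiency * len(ordered))))
    return ordered[n_keep - 1]


def rejection(background_scores, cut):
    """Inverse of the fraction of background jets passing the cut."""
    n_pass = sum(1 for s in background_scores if s >= cut)
    return len(background_scores) / n_pass if n_pass else float("inf")


def iso_efficiency_curve(jets, b_efficiency, fractions):
    """Return (f_c, c-jet rejection, light-jet rejection) for each fraction."""
    curve = []
    for f_c in fractions:
        scores = {flavour: [discriminant(*p, f_c) for p in probs]
                  for flavour, probs in jets.items()}
        cut = threshold_for_efficiency(scores["b"], b_efficiency)
        curve.append((f_c, rejection(scores["c"], cut), rejection(scores["light"], cut)))
    return curve


def toy_jets(n_per_flavour, seed=1):
    """Toy softmax outputs: Gaussian logits with a bonus for the true flavour."""
    rng = random.Random(seed)
    jets = {}
    for index, flavour in enumerate(("b", "c", "light")):
        probs = []
        for _ in range(n_per_flavour):
            logits = [rng.gauss(0.0, 1.0) for _ in range(3)]
            logits[index] += 2.0
            exps = [math.exp(x) for x in logits]
            total = sum(exps)
            probs.append(tuple(e / total for e in exps))
        jets[flavour] = probs
    return jets


curve = iso_efficiency_curve(toy_jets(2000), b_efficiency=0.77,
                             fractions=[0.0, 0.25, 0.5, 0.75, 1.0])
```

Moving from a small to a large fraction trades light-flavour-jet rejection for c-jet rejection at the same b-tagging efficiency, which is the movement along the lines described above.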