Searching for Single Top Using Decision Trees
G. Watts (UW), for the DØ Collaboration
5/13/2005 – APSNW Particles I
Single Top Challenges
Overwhelming background!

• Straight cuts (and counting experiments)
– Difficulty taking advantage of correlations
• Multivariate cuts (and shape fitting)
– Designed to take advantage of correlations and irreducible backgrounds
Asymmetries in t-Channel Production
(Figure: t-channel single top production vs. top pair production)

Lots of variables give small separation (use ME, phase space, etc.)
Combine Variables!
• Multivariate likelihood fit
– 7 variables means 7 dimensions…
• Neural network
– Many inputs and a single output
– Trained on signal and background samples
– Well understood and mostly accepted in HEP
• Decision tree
– Many inputs and a single output
– Trained on signal and background samples
– Used mostly in life sciences & business (MiniBooNE: physics/0408124)
Decision Tree
Flow: events → trained decision tree → binned likelihood fit on the tree output → limit
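To make the last two steps concrete, here is a minimal binned Poisson likelihood sketch in Python. The function and argument names (log_likelihood, signal_per_pb, background) are invented for illustration and are not the DØ fitting code.

    import math

    def log_likelihood(sigma, observed, signal_per_pb, background):
        # Poisson log-likelihood over the bins of the DT-output histogram.
        # sigma is the signal cross section being tested; signal_per_pb is
        # the expected signal yield per pb of cross section in each bin.
        ll = 0.0
        for n, s, b in zip(observed, signal_per_pb, background):
            mu = sigma * s + b                       # expected events in bin
            ll += n * math.log(mu) - mu - math.lgamma(n + 1)
        return ll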
Internals of a Trained Tree
Every event belongs to a single leaf node!
A “rooted binary tree”: “you can see a decision tree” (sketched below).
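A minimal sketch of the trained-tree data structure in Python; the Node fields and the classify helper are invented for illustration, not the analysis code.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        variable: Optional[str] = None    # cut variable, e.g. "HT" (internal node)
        cut: Optional[float] = None       # events with value < cut go left
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        purity: Optional[float] = None    # set only on leaves

    def classify(node: Node, event: dict) -> float:
        # Walk from the root; every event lands in exactly one leaf.
        while node.purity is None:
            node = node.left if event[node.variable] < node.cut else node.right
        return node.purity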
Training
• Determine a branch point
– Calculate the Gini improvement as a function of an interesting variable (HT in this case)
– Choose the cut point with the largest improvement
– Repeat for all interesting variables: HT, jet pT, angular variables, etc.
• The best improvement over all variables becomes this node’s decision (see the sketch after this list).
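In code, the scan looks roughly like this. It leans on the gini_improvement function defined on the next two slides, and every name here is illustrative rather than the actual implementation.

    def best_decision(events, variables):
        # Scan each interesting variable for the cut with the largest Gini
        # improvement; the best (variable, cut) pair is this node's decision.
        best_var, best_cut, best_gi = None, None, 0.0
        for var in variables:                    # HT, jet pT, angles, ...
            for cut in sorted(set(e[var] for e in events))[1:]:
                left = [e for e in events if e[var] < cut]
                right = [e for e in events if e[var] >= cut]
                gi = gini_improvement(events, left, right)
                if gi > best_gi:
                    best_var, best_cut, best_gi = var, cut, gi
        return best_var, best_cut, best_gi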
Gini
The training process requires a variable that measures separation.

Purity: P = W_s / (W_s + W_b)

where W_s is the total weight of signal events and W_b is the total weight of background events.

Gini: G = P (1 - P) (W_s + W_b)

G is zero for pure background or pure signal!
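A direct transcription into Python, assuming each event is a dict carrying a "weight" and an "is_signal" flag (that representation is invented for the sketch):

    def gini(events):
        # Weighted Gini: G = P(1 - P)(Ws + Wb); zero for a pure sample.
        ws = sum(e["weight"] for e in events if e["is_signal"])
        wb = sum(e["weight"] for e in events if not e["is_signal"])
        total = ws + wb
        if total == 0.0:
            return 0.0
        purity = ws / total                      # P = Ws / (Ws + Wb)
        return purity * (1.0 - purity) * total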
Gini Improvement
A cut splits the data S into two subsets, S1 and S2. For each node:

GI = G(S) - G(S1) - G(S2)

Repeat the process for each subdivision of the data.
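The corresponding one-liner, on top of the gini() sketch above:

    def gini_improvement(parent, left, right):
        # GI = G(S) - G(S1) - G(S2): how much the split purifies the sample.
        return gini(parent) - gini(left) - gini(right)

Because P(1 - P) is concave, GI is never negative; training simply keeps the split that maximizes it.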
And Cut…
Determine the purity of each leaf:

P = W_s / (W_s + W_b)

At some point, stop the process and generate a leaf; we used the statistical sample error (# of events) as the stopping criterion.

Use the tree as an estimator of purity (sketched below):
– Each event belongs to a unique leaf
– The leaf’s purity is the estimator for that event
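Putting the pieces together, a recursive build loop might look like this. The min_events parameter is an illustrative stand-in for the statistical-error stopping criterion, not the actual cutoff used.

    def make_leaf(events):
        ws = sum(e["weight"] for e in events if e["is_signal"])
        wb = sum(e["weight"] for e in events if not e["is_signal"])
        return Node(purity=ws / (ws + wb))       # leaf purity = the estimator

    def build(events, variables, min_events=100):
        # Grow the tree recursively; stop when the sample is too small for
        # its statistical error to be useful, or when no split helps.
        if len(events) < min_events:
            return make_leaf(events)
        var, cut, gi = best_decision(events, variables)
        if var is None:
            return make_leaf(events)
        left = [e for e in events if e[var] < cut]
        right = [e for e in events if e[var] >= cut]
        return Node(variable=var, cut=cut,
                    left=build(left, variables, min_events),
                    right=build(right, variables, min_events))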
DT in the Single Top Search
Two DTs, with separate DTs for the muon & electron channels (this part is identical to an NN-based analysis):
– DT_Wbb: trained on signal with Wbb as the background
– DT_tt: trained on signal with tt (lepton + jets) as the background

The 2D histogram of the two DT outputs is used in the binned likelihood fit (see the sketch after this list).

Backgrounds: W+jets, QCD (fake leptons), top pair production.
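As an illustration of how the two tree outputs combine, a sketch of filling the 2D histogram with numpy; dt_wbb, dt_tt, and fill_2d are invented names for this example.

    import numpy as np

    def fill_2d(events, dt_wbb, dt_tt, nbins=10):
        # Each event gets two purities, one per tree; the weighted 2D
        # histogram of those outputs is what the likelihood fit consumes.
        x = [classify(dt_wbb, e) for e in events]
        y = [classify(dt_tt, e) for e in events]
        w = [e["weight"] for e in events]
        hist, _, _ = np.histogram2d(x, y, bins=nbins,
                                    range=[[0.0, 1.0], [0.0, 1.0]], weights=w)
        return hist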
Results
• Expected limits: s-channel 4.5 pb (NN: 4.5), t-channel 6.4 pb (NN: 5.8)
• Observed limits: s-channel 8.3 pb (NN: 6.4), t-channel 8.1 pb (NN: 5.0)
• Expected results close to the NN
Future of the Analysis
• Use a single decision tree, trained against all backgrounds
• Pruning
– Train until each leaf has only a single event
– Recombine leaves (pruning) using a statistical estimator
• Boosting (see the sketch after this list)
– Combine multiple trees, each weighted
– Train each tree on an event sample in which misclassified events have their weights enhanced
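A minimal AdaBoost-style sketch of the boosting idea, reusing the build and classify helpers above; the loop structure and names are illustrative, not the planned DØ implementation.

    import math

    def boost(events, variables, n_trees=20):
        # AdaBoost-style loop: after each tree, enhance the weights of the
        # misclassified events so the next tree concentrates on them.
        trees, alphas = [], []
        for _ in range(n_trees):
            tree = build(events, variables)
            wrong = [e for e in events
                     if (classify(tree, e) > 0.5) != e["is_signal"]]
            err = (sum(e["weight"] for e in wrong)
                   / sum(e["weight"] for e in events))
            err = min(max(err, 1e-6), 1.0 - 1e-6)      # keep the log finite
            alpha = 0.5 * math.log((1.0 - err) / err)  # this tree's weight
            for e in wrong:
                e["weight"] *= math.exp(alpha)         # boost the misses
            trees.append(tree)
            alphas.append(alpha)
        return trees, alphas

The boosted estimator is then the alpha-weighted combination of the individual tree outputs.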
References & Introduction
MiniBooNE paper: physics/0408124
Jerome H. Friedman, “Recent Advances in Predictive (Machine) Learning,” conference proceedings
These and other references are linked on my web page:
http://d0.phys.washington.edu/~gwatts/research/conferences
Conclusions
• Decision trees are good…
– The model is transparent: it can be drawn as a rooted binary tree.
– Not as sensitive to outliers in the input data as other methods.
– Easily accommodate integer inputs (NJets) or missing variable inputs.
– Easy to implement (several months to go from scratch to working code).
• Decision trees aren’t so good…
– Well-understood input variables are a must (as for neural networks, of course).
– Minor changes in the input events can make for major changes in tree layout and results.
– The estimator is not a continuous function.
• Compared to neural networks…
– No hidden nodes, separate background training, or similar issues to deal with.