Searching for Single Top Using Decision Trees
G. Watts (UW), for the DØ Collaboration
5/13/2005 – APSNW Particles I
Single Top Challenges
Overwhelming background!

• Straight cuts (and counting experiments)
– Difficulty taking advantage of correlations
• Multivariate cuts (and shape fitting)
– Designed to take advantage of correlations and irreducible backgrounds
Asymmetries in t-Channel Production
(Figure: t-channel single top production vs. top pair production)

Lots of variables give small separation (use ME, phase space, etc.)
Combine Variables!
• Multivariate likelihood fit
– 7 variables means 7 dimensions…
• Neural network
– Many inputs and a single output
– Trained on signal and background samples
– Well understood and mostly accepted in HEP
• Decision tree
– Many inputs and a single output
– Trained on signal and background samples
– Used mostly in life sciences & business (MiniBooNE: physics/0408124)
Decision Tree
Flow: events → trained decision tree → binned likelihood fit on the tree output → limit
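To make the last two steps concrete, here is a minimal binned Poisson likelihood sketch in Python. The function and argument names (log_likelihood, signal_per_pb, background) are invented for illustration and are not the DØ fitting code.

    import math

    def log_likelihood(sigma, observed, signal_per_pb, background):
        # Poisson log-likelihood over the bins of the DT-output histogram.
        # sigma is the signal cross section being tested; signal_per_pb is
        # the expected signal yield per pb of cross section in each bin.
        ll = 0.0
        for n, s, b in zip(observed, signal_per_pb, background):
            mu = sigma * s + b                       # expected events in bin
            ll += n * math.log(mu) - mu - math.lgamma(n + 1)
        return ll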
Internals of a Trained Tree
Every event belongs to a single leaf node!
A “rooted binary tree”: “you can see a decision tree” (sketched below).
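A minimal sketch of the trained-tree data structure in Python; the Node fields and the classify helper are invented for illustration, not the analysis code.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        variable: Optional[str] = None    # cut variable, e.g. "HT" (internal node)
        cut: Optional[float] = None       # events with value < cut go left
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        purity: Optional[float] = None    # set only on leaves

    def classify(node: Node, event: dict) -> float:
        # Walk from the root; every event lands in exactly one leaf.
        while node.purity is None:
            node = node.left if event[node.variable] < node.cut else node.right
        return node.purity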
Training
• Determine a branch point
– Calculate the Gini improvement as a function of an interesting variable (HT in this case)
– Choose the cut point with the largest improvement
– Repeat for all interesting variables: HT, jet pT, angular variables, etc.
• The best improvement over all variables becomes this node’s decision (see the sketch after this list).
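In code, the scan looks roughly like this. It leans on the gini_improvement function defined on the next two slides, and every name here is illustrative rather than the actual implementation.

    def best_decision(events, variables):
        # Scan each interesting variable for the cut with the largest Gini
        # improvement; the best (variable, cut) pair is this node's decision.
        best_var, best_cut, best_gi = None, None, 0.0
        for var in variables:                    # HT, jet pT, angles, ...
            for cut in sorted(set(e[var] for e in events))[1:]:
                left = [e for e in events if e[var] < cut]
                right = [e for e in events if e[var] >= cut]
                gi = gini_improvement(events, left, right)
                if gi > best_gi:
                    best_var, best_cut, best_gi = var, cut, gi
        return best_var, best_cut, best_gi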
Gini
The training process requires a variable that measures separation.

Purity: P = W_s / (W_s + W_b)

where W_s is the total weight of signal events and W_b is the total weight of background events.

Gini: G = P (1 - P) (W_s + W_b)

G is zero for pure background or pure signal!
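A direct transcription into Python, assuming each event is a dict carrying a "weight" and an "is_signal" flag (that representation is invented for the sketch):

    def gini(events):
        # Weighted Gini: G = P(1 - P)(Ws + Wb); zero for a pure sample.
        ws = sum(e["weight"] for e in events if e["is_signal"])
        wb = sum(e["weight"] for e in events if not e["is_signal"])
        total = ws + wb
        if total == 0.0:
            return 0.0
        purity = ws / total                      # P = Ws / (Ws + Wb)
        return purity * (1.0 - purity) * total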
Gini Improvement
A cut splits the data S into two subsets, S1 and S2. For each node:

GI = G(S) - G(S1) - G(S2)

Repeat the process for each subdivision of the data.
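The corresponding one-liner, on top of the gini() sketch above:

    def gini_improvement(parent, left, right):
        # GI = G(S) - G(S1) - G(S2): how much the split purifies the sample.
        return gini(parent) - gini(left) - gini(right)

Because P(1 - P) is concave, GI is never negative; training simply keeps the split that maximizes it.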
And Cut…
Determine the purity of each leaf:

P = W_s / (W_s + W_b)

At some point, stop the process and generate a leaf; we used the statistical sample error (# of events) as the stopping criterion.

Use the tree as an estimator of purity (sketched below):
– Each event belongs to a unique leaf
– The leaf’s purity is the estimator for that event
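Putting the pieces together, a recursive build loop might look like this. The min_events parameter is an illustrative stand-in for the statistical-error stopping criterion, not the actual cutoff used.

    def make_leaf(events):
        ws = sum(e["weight"] for e in events if e["is_signal"])
        wb = sum(e["weight"] for e in events if not e["is_signal"])
        return Node(purity=ws / (ws + wb))       # leaf purity = the estimator

    def build(events, variables, min_events=100):
        # Grow the tree recursively; stop when the sample is too small for
        # its statistical error to be useful, or when no split helps.
        if len(events) < min_events:
            return make_leaf(events)
        var, cut, gi = best_decision(events, variables)
        if var is None:
            return make_leaf(events)
        left = [e for e in events if e[var] < cut]
        right = [e for e in events if e[var] >= cut]
        return Node(variable=var, cut=cut,
                    left=build(left, variables, min_events),
                    right=build(right, variables, min_events))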
DT in the Single Top Search
Two DTs, with separate DTs for the muon & electron channels (this part is identical to an NN-based analysis):
– DT_Wbb: trained on signal with Wbb as the background
– DT_tt: trained on signal with tt (lepton + jets) as the background

The 2D histogram of the two DT outputs is used in the binned likelihood fit (see the sketch after this list).

Backgrounds: W+jets, QCD (fake leptons), top pair production.
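As an illustration of how the two tree outputs combine, a sketch of filling the 2D histogram with numpy; dt_wbb, dt_tt, and fill_2d are invented names for this example.

    import numpy as np

    def fill_2d(events, dt_wbb, dt_tt, nbins=10):
        # Each event gets two purities, one per tree; the weighted 2D
        # histogram of those outputs is what the likelihood fit consumes.
        x = [classify(dt_wbb, e) for e in events]
        y = [classify(dt_tt, e) for e in events]
        w = [e["weight"] for e in events]
        hist, _, _ = np.histogram2d(x, y, bins=nbins,
                                    range=[[0.0, 1.0], [0.0, 1.0]], weights=w)
        return hist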
Results
• Expected limits: s-channel 4.5 pb (NN: 4.5), t-channel 6.4 pb (NN: 5.8)
• Observed limits: s-channel 8.3 pb (NN: 6.4), t-channel 8.1 pb (NN: 5.0)
• Expected results close to the NN
Future of the Analysis
• Use a single decision tree, trained against all backgrounds
• Pruning
– Train until each leaf has only a single event
– Recombine leaves (pruning) using a statistical estimator
• Boosting (see the sketch after this list)
– Combine multiple trees, each weighted
– Train each tree on an event sample in which misclassified events have their weights enhanced
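A minimal AdaBoost-style sketch of the boosting idea, reusing the build and classify helpers above; the loop structure and names are illustrative, not the planned DØ implementation.

    import math

    def boost(events, variables, n_trees=20):
        # AdaBoost-style loop: after each tree, enhance the weights of the
        # misclassified events so the next tree concentrates on them.
        trees, alphas = [], []
        for _ in range(n_trees):
            tree = build(events, variables)
            wrong = [e for e in events
                     if (classify(tree, e) > 0.5) != e["is_signal"]]
            err = (sum(e["weight"] for e in wrong)
                   / sum(e["weight"] for e in events))
            err = min(max(err, 1e-6), 1.0 - 1e-6)      # keep the log finite
            alpha = 0.5 * math.log((1.0 - err) / err)  # this tree's weight
            for e in wrong:
                e["weight"] *= math.exp(alpha)         # boost the misses
            trees.append(tree)
            alphas.append(alpha)
        return trees, alphas

The boosted estimator is then the alpha-weighted combination of the individual tree outputs.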
References & Introduction
MiniBooNE paper: physics/0408124
Jerome H. Friedman, “Recent Advances in Predictive (Machine) Learning,” conference proceedings
These and other references are linked on my web page:
http://d0.phys.washington.edu/~gwatts/research/conferences
Conclusions
• Decision trees are good…
– The model is transparent: it can be drawn as a rooted binary tree.
– Not as sensitive to outliers in the input data as other methods.
– Easily accommodate integer inputs (NJets) or missing variable inputs.
– Easy to implement (several months to go from scratch to working code).
• Decision trees aren’t so good…
– Well-understood input variables are a must (as for neural networks, of course).
– Minor changes in the input events can make for major changes in tree layout and results.
– The estimator is not a continuous function.
• Compared to neural networks…
– No hidden nodes, separate background training, or similar issues to deal with.