Helge Voss, Nikhef, 23rd - 27th April 2007, TMVA Toolkit for Multivariate Data Analysis: ACAT 2007
TMVA Toolkit for Multivariate Data Analysis
with ROOT
Helge Voss, MPI-K Heidelberg
on behalf of: Andreas Höcker, Fredrik Tegenfeld, Joerg Stelzer*
http://tmva.sourceforge.net/
arXiv: physics/0703039
Supply an environment to easily:
apply different sophisticated data selection algorithms
have them all trained, tested and evaluated
find the best one for your selection problem
and contributors:
A.Christov, S.Henrot-Versillé, M.Jachowski, A.Krasznahorkay Jr., Y.Mahalalel, X.Prudent, P.Speckmayer, M.Wolter, A.Zemla
Motivation/Outline
Outline:
introduction
the MVA classifiers available in TMVA
demonstration with toy examples
summary
ROOT is the analysis framework used by most (HEP) physicists
Idea: rather than just implementing new MVA techniques and making them somehow available in ROOT (as e.g. TMultiLayerPerceptron does):
have one common platform/interface for all MVA classifiers
make it easy to use and compare different MVA classifiers
train/test on same data sample and evaluate consistently
[Figure: classifier output y, with y(Bkg) → 0 and y(Signal) → 1]
Multivariate Event Classification
All multivariate classifiers condense (correlated) multi-variable input information into a single scalar output variable: $R^n \to R$
One variable to base your decision on
What is in TMVA
TMVA currently includes:
Rectangular cut optimisation
Projective and multi-dimensional likelihood estimators
Fisher discriminant and H-Matrix (χ² estimator)
Artificial Neural Networks (3 different implementations)
Boosted/bagged Decision Trees
Rule Fitting
Support Vector Machines
TMVA package provides training, testing and evaluation of the classifiers
each classifier provides a ranking of the input variables
classifiers produce weight files that are read by reader class for MVA application
all classifiers are highly customizable
integrated in ROOT (since release 5.11/03) and very easy to use!
support of arbitrary pre-selections and individual event weights
common pre-processing of input: de-correlation, principal component analysis
Commonly realised for all methods in TMVA (centrally in DataSet class):
Note that this "de-correlation" is complete only if:
the input variables are Gaussian distributed
the correlations are linear
In practice, the gain from de-correlation is often rather modest, and can even be harmful.
Preprocessing the Input Variables: Decorrelation
[Figure: variable distributions: original, SQRT de-correlated, PCA de-correlated]
Removal of linear correlations by rotating variables using the square-root of the correlation matrix
using the Principal Component Analysis
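As an illustration of the square-root de-correlation step (a hypothetical sketch for two variables, not TMVA's actual DataSet code), an event can be rotated with the inverse square root of the covariance matrix, so that the transformed variables have unit covariance; names such as `sqrtSym2x2` and `decorrelate` are made up here:

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Analytic square root of a symmetric positive-definite 2x2 matrix
// C = [[a, b], [b, c]]:  sqrt(C) = (C + sqrt(det(C)) * I) / t,
// with t = sqrt(a + c + 2*sqrt(det(C))).
void sqrtSym2x2(double a, double b, double c,
                double& ra, double& rb, double& rc) {
  const double s = std::sqrt(a * c - b * b);
  const double t = std::sqrt(a + c + 2.0 * s);
  ra = (a + s) / t; rb = b / t; rc = (c + s) / t;
}

// Transform an event: x' = sqrt(C)^{-1} * x  (removes linear correlations).
Vec2 decorrelate(const Vec2& v, double a, double b, double c) {
  double ra, rb, rc;
  sqrtSym2x2(a, b, c, ra, rb, rc);
  const double det = ra * rc - rb * rb;  // invert the 2x2 square root
  return { ( rc * v.x - rb * v.y) / det,
           (-rb * v.x + ra * v.y) / det };
}
```

This exactness holds only for linear correlations, which is precisely the caveat stated above.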
Simplest method: cut in a rectangular volume, i.e.
$x_{\mathrm{cut}}(i_{\mathrm{event}}) \in \{0,1\}$, with $x_{\mathrm{cut}} = 1$ if $x_v(i_{\mathrm{event}}) \in [x_{v,\mathrm{min}}, x_{v,\mathrm{max}}]$ for all variables $v$
scan the signal efficiency over [0,1] and maximise the background rejection
from this scan, the optimal working point in terms of S and B numbers can be derived
Technical problem: how to perform the optimisation. TMVA uses: random sampling, Simulated Annealing or a Genetic Algorithm
speed improvement in volume search:
training events are sorted in Binary Search Trees
Cut Optimisation
do this in normal variable space or de-correlated variable space
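A minimal sketch of the cut classifier described above (illustrative only, not TMVA code; the function names are invented): an event is accepted only if every variable lies inside its [min, max] window, and scanning the acceptance fraction for signal vs. background samples yields the efficiency/rejection curve.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// x_cut = 1 only if every variable lies inside its [lo, hi] window.
bool passesCuts(const std::vector<double>& x,
                const std::vector<double>& lo,
                const std::vector<double>& hi) {
  for (size_t v = 0; v < x.size(); ++v)
    if (x[v] < lo[v] || x[v] > hi[v]) return false;
  return true;
}

// Fraction of events accepted by a cut volume.  Evaluating this on signal
// and background samples for many candidate volumes gives the scan of
// background rejection versus signal efficiency.
double efficiency(const std::vector<std::vector<double>>& events,
                  const std::vector<double>& lo,
                  const std::vector<double>& hi) {
  if (events.empty()) return 0.0;
  int pass = 0;
  for (const auto& e : events)
    if (passesCuts(e, lo, hi)) ++pass;
  return static_cast<double>(pass) / events.size();
}
```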
Combine the probabilities from the different variables for an event to be signal- or background-like:

$y_{\mathcal{L}}(i_{\mathrm{event}}) = \dfrac{P_S(i_{\mathrm{event}})}{P_S(i_{\mathrm{event}}) + P_B(i_{\mathrm{event}})}$, with $P_S(i_{\mathrm{event}}) = \prod_{v=1}^{n_{\mathrm{var}}} p_v^{S}\big(x_v(i_{\mathrm{event}})\big)$

discriminating variables $x_v$
species: signal and background
likelihood ratio $y_{\mathcal{L}}$ for event $i_{\mathrm{event}}$
PDFs $p_v^{S}$, $p_v^{B}$
Projected Likelihood Estimator (PDE Approach)

Technical problem: how to implement the reference PDFs. Three ways:
counting: automatic and unbiased, but suboptimal
function fitting: difficult to automate
parametric fitting (splines, kernel estimators): easy to automate, but can create artefacts. TMVA uses: splines of order 0-5, kernel estimators

Optimal if there are no correlations and the PDFs are correct (known)
usually this is not true, hence the development of the other methods
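The likelihood ratio above can be sketched in a few lines (a simplified, hypothetical helper; TMVA's real implementation evaluates binned reference PDFs with spline or kernel smoothing). Given the per-variable PDF values already looked up for one event, the response is the product over variables:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// y_L = P_S / (P_S + P_B), where P_S (P_B) is the product of the
// per-variable signal (background) PDF values for this event.
double likelihoodRatio(const std::vector<double>& pdfS,   // p_v^S(x_v)
                       const std::vector<double>& pdfB) { // p_v^B(x_v)
  double pS = 1.0, pB = 1.0;
  for (size_t v = 0; v < pdfS.size(); ++v) {
    pS *= pdfS[v];
    pB *= pdfB[v];
  }
  return pS / (pS + pB);
}
```

An event whose variables all prefer the signal PDFs gets a response close to 1, and close to 0 in the background-like case.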
Carli-Koblitz, NIM A501, 576 (2003)
Generalisation of 1D PDE approach to Nvar dimensions
Optimal method – in theory – if “true N-dim PDF” were known
Practical challenges: derive N-dim PDF from training sample
TMVA implementation: Range search PDERS
count the number of signal and background events in the "vicinity" of a data event, using a fixed-size or adaptive volume (the latter = kNN-type classifiers)
Multidimensional Likelihood Estimator
[Figure: signal (S) and background (B) events in the (x1, x2) plane, with a test event and its search volume]
speed up range search by sorting training events in Binary Trees
use multi-D kernels (Gaussian, triangular, …) to weight events within a volume
volumes can be rectangular or spherical
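The counting step of the range search can be sketched as follows (an illustrative toy with invented names, not the PDERS implementation: the production code sorts the training events into binary search trees, while a linear scan is enough to show the idea):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Event { double x1, x2; bool isSignal; };

// Count signal and background training events inside a rectangular volume
// of half-width r around the test point; classify by the signal fraction.
double pdersResponse(const std::vector<Event>& training,
                     double x1, double x2, double r) {
  int nS = 0, nB = 0;
  for (const auto& e : training)
    if (std::fabs(e.x1 - x1) < r && std::fabs(e.x2 - x2) < r)
      (e.isSignal ? nS : nB)++;
  return (nS + nB) ? static_cast<double>(nS) / (nS + nB) : 0.5;
}
```

Replacing the flat counting by a kernel weight (Gaussian, triangular, ...) for each event in the volume gives the weighted variant mentioned above.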
Well-known, simple and elegant classifier:
determine linear variable transformation where:
linear correlations are removed
mean values of signal and background are “pushed” as far apart as possible
the computation of the Fisher response is very simple: a linear combination of the event variables with the Fisher coefficients
$y_{\mathrm{Fisher}}(i_{\mathrm{event}}) = \sum_{v=1}^{n_{\mathrm{var}}} F_v\, x_v(i_{\mathrm{event}})$, with the "Fisher coefficients" $F_v$
Fisher Discriminant (and H-Matrix)
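Evaluating the Fisher response for one event is then a single dot product (a sketch with an invented function name; the coefficients themselves come from the class covariance matrices determined during training, which is not shown here):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Fisher response: linear combination of the event variables with the
// pre-computed Fisher coefficients F_v.
double fisherResponse(const std::vector<double>& x,
                      const std::vector<double>& F) {
  double y = 0.0;
  for (size_t v = 0; v < x.size(); ++v) y += F[v] * x[v];
  return y;
}
```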
Feed-Forward Multilayer Perceptron

[Figure: network with 1 input layer, k hidden layers and 1 output layer; the $N_{\mathrm{var}}$ discriminating input variables feed the input layer, the output layer has 2 output classes (signal and background), and the layers are connected by weights $w_{ij}$]

$x_j^{(k)} = A\!\left( w_{0j}^{(k)} + \sum_{i=1}^{M_{k-1}} w_{ij}^{(k)}\, x_i^{(k-1)} \right)$, with inputs $x_i^{(0)}$, $i = 1 \ldots N_{\mathrm{var}}$, and outputs $x_{1,2}^{(k+1)}$

"Activation" function, e.g. the sigmoid $A(x) = \dfrac{1}{1 + e^{-x}}$
Get a non-linear classifier response by feeding linear combinations of the input variables to nodes with a non-linear activation function
Artificial Neural Network (ANN)
Training: adjust the weights using known events such that signal and background are best separated
Nodes (or neurons) are arranged in series
Feed-Forward Multilayer Perceptrons (3 different implementations in TMVA)
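The forward step of a single node can be sketched directly from the formula above (an illustrative toy with invented names, not TMVA's MLP class): a weighted sum of the previous layer's outputs, pushed through the sigmoid activation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// "Activation" function: the sigmoid A(x) = 1 / (1 + e^{-x}).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One node: x_j = A( w_0 + sum_i w_i * x_i ), with w[0] the bias weight
// and w[i+1] the weight of input i.
double nodeOutput(const std::vector<double>& in,
                  const std::vector<double>& w) {
  double s = w[0];
  for (size_t i = 0; i < in.size(); ++i) s += w[i + 1] * in[i];
  return sigmoid(s);
}
```

Chaining such nodes layer by layer gives the full network response; training then amounts to adjusting the weights w.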
Decision Trees
sequential application of "cuts" which split the data into nodes; the final nodes (leaves) classify an event as signal or background
Training: growing a decision tree
Start with Root node
Split training sample according to cut on best variable at this node
Splitting criterion: e.g., maximum “Gini-index”: purity (1– purity)
Continue splitting until min. number of events or max. purity reached
Bottom up Pruning:
remove statistically insignificant nodes to avoid overtraining
Classify leaf nodes according to the majority of events, or give a weight; unknown test events are classified accordingly
[Figures: decision tree before and after pruning]
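The Gini-index splitting criterion mentioned above can be made concrete (a minimal sketch with invented function names, not TMVA's tree-growing code): the Gini index of a node is purity × (1 − purity), and a split is chosen to maximise the weighted decrease of that index.

```cpp
#include <cassert>
#include <cmath>

// Gini index of a node with the given signal purity.
double gini(double purity) { return purity * (1.0 - purity); }

// Decrease of the event-weighted Gini index when a parent node is split
// into a left daughter (nSL signal / nBL background events) and a right
// daughter (nSR / nBR); larger gain means a better split.
double giniGain(double nSL, double nBL, double nSR, double nBR) {
  const double nL = nSL + nBL, nR = nSR + nBR, n = nL + nR;
  const double parent = gini((nSL + nSR) / n);
  return parent - (nL / n) * gini(nSL / nL) - (nR / n) * gini(nSR / nR);
}
```

A perfect split of a mixed node (pure daughters) has the maximal gain; a split that leaves both daughters as mixed as the parent gains nothing.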
Decision Trees: well known for a long time, but hardly used in HEP (although very similar to "simple cuts")
Disadvantage: instability: small changes in the training sample can give large changes in the tree structure
Boosted Decision Trees
Boosted Decision Trees (1996): combine several decision trees: forest
classifier output is the (weighted) majority vote of individual trees
trees derived from same training sample with different event weights
e.g. AdaBoost: wrong classified training events are given a larger weight
bagging (re-sampling with replacement) uses random event weights
Remark: bagging/boosting create a basis of classifiers
final classifier is a linear combination of base classifiers
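One AdaBoost step can be sketched as follows (a simplified toy with an invented function name, not TMVA's boosting code): misclassified events get their weight multiplied by α = (1 − err)/err, the weights are renormalised, and the tree's vote in the forest is weighted by ln α.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One AdaBoost iteration on the event weights w.  Returns the weight of
// this tree's vote, ln(alpha), for the weighted majority vote.
double boostWeights(std::vector<double>& w,
                    const std::vector<bool>& misclassified) {
  double err = 0.0, sum = 0.0;
  for (size_t i = 0; i < w.size(); ++i) {
    sum += w[i];
    if (misclassified[i]) err += w[i];
  }
  err /= sum;                                 // weighted error fraction
  const double alpha = (1.0 - err) / err;
  double newSum = 0.0;
  for (size_t i = 0; i < w.size(); ++i) {
    if (misclassified[i]) w[i] *= alpha;      // boost the hard events
    newSum += w[i];
  }
  for (double& wi : w) wi /= newSum;          // renormalise to unit sum
  return std::log(alpha);
}
```

Repeating this for every tree in the forest makes later trees concentrate on the events the earlier trees got wrong.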
Following RuleFit from Friedman-Popescu:
The classifier is a linear combination of simple base classifiers, called rules, which here are sequences of cuts:
The procedure is:
1. create the rule ensemble from a set of decision trees
2. fit the coefficients by "gradient directed regularization" (Friedman et al.)
Rule Fitting (Predictive Learning via Rule Ensembles)
$y_{\mathrm{RF}}(\mathbf{x}) = a_0 + \sum_{m=1}^{M_R} a_m\, r_m(\hat{\mathbf{x}}) + \sum_{k=1}^{n_{\mathrm{var}}} b_k\, \hat{x}_k$

rules $r_m$ (cut sequences: $r_m = 1$ if all cuts are satisfied, $= 0$ otherwise)
normalised discriminating event variables $\hat{x}_k$
RuleFit classifier = sum of rules + linear Fisher term
Friedman-Popescu, Tech Rep, Stat. Dpt, Stanford U., 2003
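Evaluating a rule and the full RuleFit response can be sketched as follows (an illustrative toy with invented names and types, not the RuleFit implementation): a rule is 1 only if all of its cuts are satisfied, and the response adds the weighted rules to a linear term.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Cut { int var; double lo, hi; };   // one cut on variable `var`

// r_m = 1 if all cuts of the rule are satisfied, 0 otherwise.
int ruleValue(const std::vector<double>& x, const std::vector<Cut>& rule) {
  for (const Cut& c : rule)
    if (x[c.var] < c.lo || x[c.var] > c.hi) return 0;
  return 1;
}

// y_RF = a0 + sum_m a_m * r_m(x) + sum_k b_k * x_k
double ruleFitResponse(const std::vector<double>& x,
                       double a0,
                       const std::vector<std::vector<Cut>>& rules,
                       const std::vector<double>& a,   // rule coefficients
                       const std::vector<double>& b) { // linear coefficients
  double y = a0;
  for (size_t m = 0; m < rules.size(); ++m) y += a[m] * ruleValue(x, rules[m]);
  for (size_t k = 0; k < x.size(); ++k)     y += b[k] * x[k];
  return y;
}
```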
Support Vector Machines
[Figure: separating hyperplane with maximal margin in the (x1, x2) plane]
Find hyperplane that best separates signal from background
best separation: maximum distance from the closest events (the support vectors) to the hyperplane
linear decision boundary
Non-linear cases:
transform the variables into a higher-dimensional feature space where a linear boundary (hyperplane) can separate the data
the transformation is done implicitly using kernel functions, which effectively introduce a metric for the distance measure that "mimics" the transformation
choose a kernel and fit the hyperplane
[Figure: data not separable in (x1) or (x1, x2) become linearly separable after mapping into the higher-dimensional space (x1, x2, x3)]
Available kernels: Gaussian, polynomial, sigmoid
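The Gaussian kernel, for example, can be written in a few lines (a sketch with an invented function name): it plays the role of an inner product in the implicit feature space, so the hyperplane is never constructed there explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gaussian (RBF) kernel: K(x, y) = exp(-|x - y|^2 / (2 sigma^2)).
double gaussianKernel(const std::vector<double>& x,
                      const std::vector<double>& y, double sigma) {
  double d2 = 0.0;
  for (size_t i = 0; i < x.size(); ++i)
    d2 += (x[i] - y[i]) * (x[i] - y[i]);
  return std::exp(-d2 / (2.0 * sigma * sigma));
}
```

K is 1 for identical points and decays with their distance, so it acts as the similarity measure that "mimics" the feature-space transformation.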
A Complete Example Analysis

void TMVAnalysis()
{
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   TFile *input = TFile::Open("tmva_example.root");
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
   factory->AddSignalTree    ( signal,     1. );
   factory->AddBackgroundTree( background, 1. );

   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3",      'F');
   factory->AddVariable("var4",      'F');

   factory->PrepareTrainingAndTestTree( "",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
create the Factory
give it the training/test trees
tell it which variables to use (the example uses expressions not directly available in the tree, e.g. "var1+var2")
select the MVA methods
train, test and evaluate
Example Application

void TMVApplication()
{
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3",      &var3 );
   reader->AddVariable( "var4",      &var4 );

   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");

   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA( "MLP method" ) << endl;
   }

   delete reader;
}
create the Reader
tell it about the variables
book the selected MVA method
set the tree variables (the example uses expressions not directly available in the tree)
event loop
calculate the MVA response
Use a data set with 4 linearly correlated, Gaussian-distributed variables:
---------------------------------------
Rank : Variable : Separation
---------------------------------------
   1 : var3     : 3.834e+02
   2 : var2     : 3.062e+02
   3 : var1     : 1.097e+02
   4 : var0     : 5.818e+01
---------------------------------------
A purely academic Toy example
Validating the Classifier Training
average no. of nodes before/after pruning: 4193 / 968
Validating the classifiers
TMVA GUI: projective likelihood PDFs, MLP training, BDTs, ...
TMVA output distributions:
Classifier Output
[Figure captions: "due to correlations" and "correlations removed"]
Likelihood PDERS Fisher
Neural Network Boosted Decision Trees Rule Fitting
TMVA output distributions for Fisher, Likelihood, BDT and MLP…
Evaluation Output
For this case, the Fisher discriminant provides the theoretically 'best' possible method
Same as the de-correlated Likelihood
Cuts and Likelihood without de-correlation are inferior
Note: About All Realistic Use Cases are Much More Difficult Than This One
Evaluation Output (taken from TMVA printout)
Evaluation results ranked by best signal efficiency and purity (area)
------------------------------------------------------------------------------
MVA          Signal efficiency at bkg eff. (error):      | Sepa-    Signifi-
Methods:       @B=0.01    @B=0.10    @B=0.30    Area     | ration:  cance:
------------------------------------------------------------------------------
Fisher       : 0.268(03)  0.653(03)  0.873(02)  0.882    | 0.444    1.189
MLP          : 0.266(03)  0.656(03)  0.873(02)  0.882    | 0.444    1.260
LikelihoodD  : 0.259(03)  0.649(03)  0.871(02)  0.880    | 0.441    1.251
PDERS        : 0.223(03)  0.628(03)  0.861(02)  0.870    | 0.417    1.192
RuleFit      : 0.196(03)  0.607(03)  0.845(02)  0.859    | 0.390    1.092
HMatrix      : 0.058(01)  0.622(03)  0.868(02)  0.855    | 0.410    1.093
BDT          : 0.154(02)  0.594(04)  0.838(03)  0.852    | 0.380    1.099
CutsGA       : 0.109(02)  1.000(00)  0.717(03)  0.784    | 0.000    0.000
Likelihood   : 0.086(02)  0.387(03)  0.677(03)  0.757    | 0.199    0.682
------------------------------------------------------------------------------
Testing efficiency compared to training efficiency (overtraining check)
------------------------------------------------------------------------------
MVA          Signal efficiency: from test sample (from training sample)
Methods:       @B=0.01         @B=0.10         @B=0.30
------------------------------------------------------------------------------
Fisher       : 0.268 (0.275)   0.653 (0.658)   0.873 (0.873)
MLP          : 0.266 (0.278)   0.656 (0.658)   0.873 (0.873)
LikelihoodD  : 0.259 (0.273)   0.649 (0.657)   0.871 (0.872)
PDERS        : 0.223 (0.389)   0.628 (0.691)   0.861 (0.881)
RuleFit      : 0.196 (0.198)   0.607 (0.616)   0.845 (0.848)
HMatrix      : 0.058 (0.060)   0.622 (0.623)   0.868 (0.868)
BDT          : 0.154 (0.268)   0.594 (0.736)   0.838 (0.911)
CutsGA       : 0.109 (0.123)   1.000 (0.424)   0.717 (0.715)
Likelihood   : 0.086 (0.092)   0.387 (0.379)   0.677 (0.677)
------------------------------------------------------------------------------
(better classifiers appear higher in the list)
Check for over-training
More Toys: Linear-, Cross-, Circular Correlations
Illustrate the behaviour of linear and nonlinear classifiers
Circular correlations (same for signal and background)
Weight Variables by Classifier Performance
Example: how do the classifiers deal with the correlation patterns?
Illustration: events weighted by MVA response:
[Figure: events weighted by classifier response.
Linear classifiers: Likelihood, de-correlated Likelihood, Fisher.
Non-linear classifiers: PDERS, Decision Trees]
Circular example
Final Classifier Performance
Background rejection versus signal efficiency curve:
More Toys: “Schachbrett” (chess board)
Performance achieved without parameter adjustments:
PDERS and BDT are best “out of the box”
After some parameter tuning, SVM and ANN (MLP) also perform well
Theoretical maximum
Event Distribution
Events weighted by SVM response
We (finally) have a Users Guide!
Available from tmva.sf.net
TMVA Users Guide: 78 pp., incl. code examples
arXiv: physics/0703039
TMVA unifies highly customizable and well-performing multivariate classification algorithms in a single user-friendly framework
Summary
This ensures objective classifier comparisons and simplifies their use
TMVA is available from tmva.sf.net and in ROOT (>5.11/03)
A typical TMVA analysis requires user interaction with a Factory (for classifier training) and a Reader (for classifier application)
a set of ROOT macros displays the evaluation results
We will continue to improve flexibility and add new classifiers
Bayesian Classifiers
“Committee Method” combination of different MVA techniques
C-code output for trained classifiers (for selected methods…)
More Toys: Linear-, Cross-, Circular Correlations
Illustrate the behaviour of linear and nonlinear classifiers
Linear correlations (same for signal and background)
Linear correlations (opposite for signal and background)
Circular correlations (same for signal and background)
Weight Variables by Classifier Performance
Linear correlations (same for signal and background)
Linear correlations (opposite for signal and background)
Circular correlations (same for signal and background)
How well do the classifiers resolve the various correlation patterns?
Illustration: events weighted by MVA response:
Final Classifier Performance
Background rejection versus signal efficiency curve:
Linear example | Cross example | Circular example
Stability with Respect to Irrelevant Variables
Toy example with 2 discriminating and 4 non-discriminating variables
use only the two discriminating variables in the classifiers
use all variables in the classifiers
Using TMVA in Training and Application
Can be ROOT scripts, C++ executables or python scripts (via PyROOT), or any other high-level language that interfaces with ROOT
Rectangular cuts? A linear boundary? A non-linear one?
[Figure: three choices of decision boundary between signal (S) and background (B) in the (x1, x2) plane: rectangular cuts, a linear boundary, a non-linear boundary]
Introduction: Event Classification
Different techniques use different ways to exploit (all) features
compare and choose
How to place the decision boundary?
Let the machine learn it from training events