NeuroBayes® Michael Feindt DESY Computing Seminar May 2008
Prof. Dr. Michael Feindt
KCETA - Centrum für Elementarteilchen- und Astroteilchenphysik (Center for Elementary Particle and Astroparticle Physics)
IEKP, Universität Karlsruhe, KIT
Phi-T GmbH, Karlsruhe
DESY Computing Seminar May 19, 2008
NeuroBayes® et al.: professional methods for optimised reconstruction algorithms and statistical analysis
NeuroBayes® principle
NeuroBayes® Teacher: learning of complex relationships from existing databases (e.g. Monte Carlo)
NeuroBayes® Expert: prognosis for unknown data
[Figure: processing chain with Input, Preprocessing, Significance control, Postprocessing and Output]
How it works: training and application
Historic or simulated data: data set a = ..., b = ..., c = ..., ..., t = …! → NeuroBayes® Teacher → expertise
Actual (new real) data: data set a = ..., b = ..., c = ..., ..., t = ? → NeuroBayes® Expert (expert system using the expertise)
Output f(t): probability that a hypothesis is correct (classification), or probability density for the variable t
NeuroBayes® task 1: Classifications
Classification: binary targets. Each single outcome will be “yes” or “no”. The NeuroBayes output is the Bayesian posterior probability that the answer is “yes” (given that the inclusive rates are the same in the training and test samples; otherwise a simple transformation is necessary).
Examples:
> This elementary particle is a K meson.
> This event is a Higgs candidate.
> Germany will become soccer world champion in 2010.
> Customer Meier will have liquidity problems next year.
> This equity price will rise.
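The “simple transformation” needed when the signal fraction differs between the training and application samples follows from Bayes' theorem. The network output fixes the likelihood ratio, and only the prior odds change. The following is a minimal sketch, assuming the output p is a calibrated posterior for the training-sample fraction. `reweight_posterior` is a hypothetical helper, not part of the NeuroBayes interface.

```python
def reweight_posterior(p: float, prior_train: float, prior_new: float) -> float:
    """Posterior for a sample whose signal fraction differs from training.

    p:           network output, read as P(yes | x) at the training prior
    prior_train: fraction of "yes" events in the training sample
    prior_new:   fraction of "yes" events expected in the new sample
    (hypothetical helper, not a NeuroBayes function)
    """
    if not (0.0 < prior_train < 1.0 and 0.0 < prior_new < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    if p <= 0.0 or p >= 1.0:
        return p  # a certain answer stays certain under any prior
    # Posterior odds = likelihood ratio * prior odds: divide out the old prior odds...
    likelihood_ratio = (p / (1.0 - p)) * ((1.0 - prior_train) / prior_train)
    # ...and multiply by the new prior odds.
    new_odds = likelihood_ratio * prior_new / (1.0 - prior_new)
    return new_odds / (1.0 + new_odds)


# Trained on a 50/50 sample, applied where only 10% of events are signal:
print(round(reweight_posterior(0.8, prior_train=0.5, prior_new=0.1), 4))  # → 0.3077
```

If the priors are equal, the output is returned unchanged. This matches the statement that no transformation is needed when the inclusive rates agree.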
NeuroBayes® task 2:Conditional probability densities
Probability density for real-valued targets: for each possible (real) value a probability (density) is given. From it, all statistical quantities such as the mean, median, mode, standard deviation, percentiles, etc. can be deduced.
Examples:
> Energy of an elementary particle (e.g. a semileptonically decaying B meson with a missing neutrino)
> Q value of a decay
> Lifetime of a decay
> Price change of an equity or option
> Company turnaround or earnings
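Once a density f(t|x) has been predicted for one event, the listed statistics reduce to simple sums. The following is a minimal sketch, assuming the density is available as values on an equidistant grid of t. The grid, the toy Gaussian and the helper name `density_summary` are illustrative assumptions, not the NeuroBayes output format.

```python
import math


def density_summary(t_grid, density):
    """Summary statistics of a conditional density f(t|x) sampled on an
    equidistant grid t_grid (the density need not be normalised)."""
    dt = t_grid[1] - t_grid[0]
    norm = sum(density) * dt
    mass = [f * dt / norm for f in density]  # probability per grid cell

    mean = sum(t * w for t, w in zip(t_grid, mass))
    std = math.sqrt(sum((t - mean) ** 2 * w for t, w in zip(t_grid, mass)))
    mode = t_grid[max(range(len(density)), key=density.__getitem__)]

    def percentile(q):
        # First grid point at which the cumulative probability reaches q.
        cumulative = 0.0
        for t, w in zip(t_grid, mass):
            cumulative += w
            if cumulative >= q:
                return t
        return t_grid[-1]

    return {"mean": mean, "median": percentile(0.5), "mode": mode,
            "std": std, "p05": percentile(0.05), "p95": percentile(0.95)}


# Toy prediction for one event: a Gaussian with mean 2.0 and width 0.5.
grid = [-2.0 + 0.01 * i for i in range(801)]
f = [math.exp(-0.5 * ((t - 2.0) / 0.5) ** 2) for t in grid]
summary = density_summary(grid, f)
print({k: round(v, 2) for k, v in summary.items()})
```

Percentiles are only as precise as the grid spacing. A confidence interval, such as the one quoted for the health-cost prognosis, is simply a pair of such percentiles.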
Prediction of the complete probability distribution: event-by-event unfolding
[Figure: predicted density f(t|x) versus t, annotated with the mode, expectation value, standard deviation (volatility), and deviations from a normal distribution, e.g. crash probability]
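The full density carries more information than its mean and width. The sketch below compares the probability of a large loss (t below a threshold) read off the full density with the value a Gaussian of the same mean and standard deviation would give. The two-component mixture used as the “prediction” is a hypothetical toy density, not NeuroBayes output, and `tail_probability` is an illustrative helper.

```python
import math
from statistics import NormalDist


def tail_probability(t_grid, density, threshold):
    """P(t < threshold) from a gridded density, together with the Gaussian
    approximation that has the same mean and standard deviation."""
    dt = t_grid[1] - t_grid[0]
    norm = sum(density) * dt
    mass = [f * dt / norm for f in density]
    mean = sum(t * w for t, w in zip(t_grid, mass))
    std = math.sqrt(sum((t - mean) ** 2 * w for t, w in zip(t_grid, mass)))
    p_full = sum(w for t, w in zip(t_grid, mass) if t < threshold)
    p_gauss = NormalDist(mean, std).cdf(threshold)
    return p_full, p_gauss


# Toy price-change distribution: 95% calm component, 5% crash-like component.
calm, crash = NormalDist(0.01, 0.02), NormalDist(-0.15, 0.05)
grid = [-0.4 + 0.0005 * i for i in range(1201)]
f = [0.95 * calm.pdf(t) + 0.05 * crash.pdf(t) for t in grid]
p_full, p_gauss = tail_probability(grid, f, threshold=-0.10)
print(f"P(t < -0.10): full density {p_full:.4f}, Gaussian approximation {p_gauss:.4f}")
```

For this toy density, the Gaussian approximation underestimates the crash probability several times over. This kind of deviation is what a mean and volatility alone cannot show.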
Conditional probability densities in particle physics
What is the probability density of the true B momentum in this semileptonic B candidate event taken with the CDF II detector, with these n tracks with those momenta and rapidities in the hemisphere, which form this secondary vertex with this decay length and probability, this invariant mass and transverse momentum, this lepton information, this missing transverse momentum, this difference in φ and θ between the momentum sum and the vertex topology, etc.?
[Figure: f(t|x), the density of the target t given the input vector x]
Naïve neural networks and criticism
"We've tried that, but it didn't give good results."
- Stuck in a local minimum
- Learning not robust
"We've tried that, but it was worse than our 100-person-year analytical high-tech algorithm."
- Selected too naive input variables
- Use your fancy algorithm as INPUT!
"We've tried that, but the predictions were wrong."
- Overtraining: the net learned statistical fluctuations
"Yeah, but how can you estimate systematic errors?"
- How can you do that with cuts when variables are correlated?
- Tests on data and data/MC agreement checks are possible and done.
Address all these topics and build a professional, robust and flexible neural network package for physics, insurance, banking and industry applications: NeuroBayes®
<phi-t>: founded out of the University of Karlsruhe, sponsored by the exist-seed programme of the Federal Ministry of Education and Research (BMBF)
History
2000-2002: NeuroBayes® specialisation for economic applications at the University of Karlsruhe
Oct. 2002: GmbH founded, first industrial projects
June 2003: move into a new 199 sqm office at IT-Portal Karlsruhe
May 2008: expansion to a 700 sqm office
Exclusive rights for NeuroBayes®
Staff: all physicists (almost all from HEP)
Customers (among others):
BGV and VKB (car insurance)
AXA and Central (health insurance)
Lupus Alpha Asset Management
dm drogerie markt (drugstore chain)
Otto Versand (mail order business)
Libri (book wholesale)
ThyssenKrupp (steel industry)
Main message:
NeuroBayes is a very powerful algorithm:
• robust: (unless fooled) does not overtrain, always finds a good solution, and is fast
• can automatically select significant variables
• output interpretable as Bayesian a posteriori probability
• can train with weights and background subtraction
• has the potential to improve many analyses significantly
• in density mode it can be used to improve resolutions (e.g. lifetime in semileptonic B decays)
NeuroBayes is easy to use:
• examples and documentation available
• good default values for all options: fast start!
• direct interface to TMVA available
• integration into ROOT planned
<phi-t> NeuroBayes®
> is based on 2nd-generation neural network algorithms, Bayesian regularisation, optimised preprocessing with transformations and decorrelation of input variables, and linear correlation to the output
> learns extremely fast thanks to 2nd-order methods and a 0-iteration mode
> is extremely robust against outliers
> is immune to learning statistical noise by heart
> tells you if there is nothing relevant to be learned
> delivers sensible prognoses even with small statistics
> has advanced boosting and cross-validation features
> is steadily being developed further
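The following sketch shows one common way to implement “transformations and decorrelation of input variables”. Each input is flattened through its empirical distribution function, mapped onto a standard normal, and then the inputs are jointly whitened with a Cholesky factor of their covariance matrix. It is an illustration under these assumptions, not NeuroBayes code. In particular, it does not reproduce how NeuroBayes achieves the linear correlation to the output. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def gaussianise(values):
    """Flatten one input via its empirical CDF (ranks), then map the flat
    distribution onto a standard normal. Ties are broken by input order."""
    n = len(values)
    out = [0.0] * n
    for rank, idx in enumerate(sorted(range(n), key=values.__getitem__)):
        out[idx] = NormalDist().inv_cdf((rank + 0.5) / n)
    return out


def covariance(columns):
    """Sample covariance matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def decorrelate(columns):
    """Whiten: z = L^-1 (x - mean), so the sample covariance of z is the identity."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    low = cholesky(covariance(columns))
    out = [[0.0] * n for _ in range(k)]
    for r in range(n):
        z = []
        for i in range(k):  # forward substitution for L z = x - mean
            x = columns[i][r] - means[i]
            z.append((x - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
        for i in range(k):
            out[i][r] = z[i]
    return out


def preprocess(columns):
    return decorrelate([gaussianise(c) for c in columns])


# Toy inputs: two correlated variables, one of them skewed.
rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(500)]
b = [0.8 * x + 0.6 * rng.expovariate(1.0) for x in a]
z = preprocess([a, b])
print([[round(v, 3) for v in row] for row in covariance(z)])
```

The rank-based first step makes the result insensitive to the scale and to extreme values of each input. This is one reason such preprocessing helps with robustness against outliers.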
Ramler-plot (extended correlation matrix)
Ramler-II-plot (visualize correlation to target)
Visualisation of single input-variables
Visualisation of correlation matrix
Variable 1: Training target
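The basic numbers behind such a correlation-matrix plot are Pearson correlation coefficients, with the training target as the first variable. The following is a minimal sketch on toy data. The Ramler plots show more than this, and `correlation_matrix` is a hypothetical helper.

```python
import math


def correlation_matrix(target, inputs):
    """Pearson correlation matrix. Index 0 is the training target (shown as
    "variable 1" on the slide); the input variables follow in order."""
    columns = [target] + list(inputs)
    n = len(target)
    means = [sum(col) / n for col in columns]
    dev = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in col)) for col in dev]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(dev, norms)]
            for di, ni in zip(dev, norms)]


target = [0, 0, 1, 1, 1, 0]            # toy binary target
x1 = [0.1, 0.3, 0.8, 0.9, 0.7, 0.2]    # strongly correlated input
x2 = [5.0, 3.0, 4.0, 2.0, 6.0, 1.0]    # weakly correlated input
corr = correlation_matrix(target, [x1, x2])
print("correlation to target:", [round(c, 3) for c in corr[0][1:]])
```

The first row (or column) gives the correlation of each input to the target. This is the quantity visualised on the Ramler-II plot.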
Visualisation of network performance
Purity vs. efficiency
Signal efficiency vs. total efficiency (lift chart)
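Both performance plots come from scanning a cut on the network output. The following is a minimal sketch, assuming events pass when their output exceeds the cut. The toy outputs and labels are illustrative, and `performance_curves` is a hypothetical helper.

```python
def performance_curves(outputs, is_signal, cuts):
    """For each cut c, select events with network output > c and return
    (cut, signal efficiency, purity, total efficiency)."""
    n_total = len(outputs)
    n_signal_total = sum(1 for s in is_signal if s)
    rows = []
    for cut in cuts:
        selected = [s for o, s in zip(outputs, is_signal) if o > cut]
        n_sel = len(selected)
        n_sig = sum(1 for s in selected if s)
        efficiency = n_sig / n_signal_total         # fraction of signal kept
        purity = n_sig / n_sel if n_sel else None   # undefined if nothing passes
        total_efficiency = n_sel / n_total          # fraction of all events kept
        rows.append((cut, efficiency, purity, total_efficiency))
    return rows


outputs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_signal = [True, True, False, True, False, False]
for cut, eff, pur, tot in performance_curves(outputs, is_signal, [0.0, 0.5, 0.85]):
    print(cut, round(eff, 3), pur if pur is None else round(pur, 3), round(tot, 3))
```

The purity-vs-efficiency curve plots the (efficiency, purity) pairs. The lift chart plots efficiency against total efficiency. A selection that ignores the network output would lie on the diagonal, where efficiency equals total efficiency.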
Visualisation of NeuroBayes network topology
Some applications in high energy physics
DELPHI (mainly predecessors of NeuroBayes, in BSAURUS):
• kaon, proton, electron ID
• optimisation of resolutions of inclusive B: E, φ, θ, Q value
• B**, Bs** enrichment
• B fragmentation function
• limit on Bs mixing
• B0 mixing
• B forward-backward asymmetry
• B → wrong-sign charm
CDF II:
• electron ID, muon ID, kaon/proton ID
• optimisation of resonance reconstruction in many channels (X, Y, D, Ds, Ds**, B, Bs, B**, Bs**)
• spin-parity analysis of X(3872)
• inclusion of NB output in likelihood fits
• B-tagging for high-pT physics (top, Higgs, etc.)
• single top quark production search
• Higgs search
• B-flavour tagging for mixing analyses (new combined tagging)
• B0, Bs lifetime, ΔΓ, mixing, CP violation
Very recently:
CMS: B-tagging
Belle: continuum suppression
More than 35 diploma and Ph.D. theses from the experiments DELPHI, CDF II, AMS and CMS used NeuroBayes® or its predecessors very successfully.
Many of these can be found at www.phi-t.de → Wissenschaft (Science) → NeuroBayes
Talks about NeuroBayes® and applications: www-ekp.physik.uni-karlsruhe.de/~feindt → Forschung (Research)
NeuroBayes soft electron identification for CDF II (Ulrich Kerzel, Michael Milnik, M.F.)
Thesis U. Kerzel: based on the Soft Electron Collection (much more efficient than a cut selection or JetNet with the same inputs). After clever preprocessing by hand and a careful choice of learning parameters, this could also be as good as NeuroBayes®.
Just a few examples…
NeuroBayes® selection
First observation of B_s1 and most precise measurement of B_s2*
Selection using NeuroBayes®
Nice new methods…
Training with weighted events (e.g. for J^PC determination)
Data-only training with sideband subtraction (i.e. negative weights)
Construction of weights for MC phase-space events such that they are distributed like real data
Brand new: interpreting the NeuroBayes output as a Bayesian a posteriori probability makes it possible to avoid cuts on the output variable and instead use it for
-- inclusion in likelihood fits (B mixing, CP violation)
-- usage with sPlot to produce “background-free” plots
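Instead of cutting on the output, each event can enter an unbinned likelihood with its own signal probability p_i, so that L = ∏ [p_i S(m_i) + (1 − p_i) B(m_i)]. The following is a minimal sketch of this idea for a mass fit. It assumes the output is a calibrated posterior that does not depend on the fitted mass. The Gaussian signal shape, flat background, fit window, and toy data are illustrative assumptions, not the CDF mixing or CP-violation likelihoods.

```python
import math
import random
from statistics import NormalDist

M_LO, M_HI = 5.30, 5.45  # hypothetical fit window (GeV/c^2)


def nll(mu, sigma, masses, probs):
    """-ln L with L = prod_i [p_i * S(m_i) + (1 - p_i) * B(m_i)]."""
    signal = NormalDist(mu, sigma)  # window is wide, so truncation is ignored
    flat = 1.0 / (M_HI - M_LO)      # flat background density in the window
    return -sum(math.log(p * signal.pdf(m) + (1.0 - p) * flat)
                for m, p in zip(masses, probs))


# Toy sample: signal events tend to have high network output, background low.
rng = random.Random(7)
masses, probs = [], []
for _ in range(300):
    masses.append(rng.gauss(5.367, 0.010))
    probs.append(rng.uniform(0.6, 0.95))
for _ in range(700):
    masses.append(rng.uniform(M_LO, M_HI))
    probs.append(rng.uniform(0.05, 0.4))

# Crude fit: scan the signal mean on a grid and keep the minimum of -ln L.
grid = [5.35 + 0.0005 * i for i in range(61)]
best_mu = min(grid, key=lambda mu: nll(mu, 0.010, masses, probs))
print(f"fitted mean: {best_mu:.4f} GeV/c^2")
```

No event is discarded, and background-like events still contribute, only with less weight. A real analysis would use a proper minimiser and fit the width and yields as well.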
Example for data-only training (on1.resonance) (scan through cuts on network output)
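Data-only training needs per-event weights that statistically remove the background under the peak. The following is a minimal sketch of classic sideband subtraction. It assumes a background that is linear in mass and equal-width sidebands placed symmetrically around the signal window, since only then does the subtraction cancel the background on average. The window boundaries in the example are hypothetical, and the slides do not specify how such weights enter the NeuroBayes training.

```python
def sideband_weights(masses, sig_lo, sig_hi, sb_ranges):
    """+1 for events in the signal window, a negative weight for sideband
    events, 0 elsewhere. The sideband weight is minus the ratio of the signal
    window width to the total sideband width, so the background cancels."""
    sig_width = sig_hi - sig_lo
    sb_width = sum(hi - lo for lo, hi in sb_ranges)
    w_sb = -sig_width / sb_width
    weights = []
    for m in masses:
        if sig_lo <= m < sig_hi:
            weights.append(1.0)
        elif any(lo <= m < hi for lo, hi in sb_ranges):
            weights.append(w_sb)
        else:
            weights.append(0.0)
    return weights


# Toy check: a pure, evenly spread background should sum to zero weight.
masses = [5.2005 + 0.001 * i for i in range(300)]
w = sideband_weights(masses, 5.34, 5.40, [(5.25, 5.31), (5.43, 5.49)])
print(f"sum of weights for pure background: {sum(w):.6f}")
```

With real data, the weighted sum inside the signal window estimates the number of signal events. Individual events, however, can carry negative weights, which is why the training must accept them.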
NeuroBayes Bs to J/ψ Φ selection without MC (Michal Kreps)
soft preselection, input to first NeuroBayes training
soft cut on net 1, input to second NeuroBayes training
cut on net 2
all data
[Figure: significance, S/B, number of signal events and number of background events as functions of the network output (NNout)]
N_signal = 757.4 ± 28.7, mass 5366.6 ± 0.4 MeV, lifetime (cτ) 432.3 ± 17.6 μm
No lifetime bias from the input variables or from NeuroBayes
Successful in competition with other data-mining methods
World's largest students' competition: Data-Mining-Cup
2005: fraud detection in internet trading
2006: price prediction in eBay auctions
2007: coupon redemption prediction
Phi-T: Applications of NeuroBayes® in Economy
> Medicine and pharma research: e.g. effects and side effects of drugs, early tumor recognition
> Banks: e.g. credit scoring (Basel II), financial time series prediction, valuation of derivatives, risk-minimised trading strategies, client valuation
> Insurance: e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, fairness in tariffs
> Retail chain stores: turnover prognosis for individual articles/stores
Necessary prerequisite: historic or simulated data must be available!
Individual risk prognoses for car insurance:
> accident probability
> cost probability distribution
> large damage prognosis
> contract cancellation probability
very successful at
Turnover prognosis for mail order business
Prognosis of individual health costs
Customer No. 00000
Male, 44
Tariff XYZ123 for approx. 17 years
Pilot project for a large private health insurance
Prognosis of the costs in the following year for each insured person, with confidence intervals
4 years of training data, tested on the following year
Results:
Probability density for each customer/tariff combination
Very good test results!
Has potential for a real and objective cost reduction in health management
VDI-Nachrichten, 9.3.2007
Prognosis of financial markets
NeuroBayes®-based risk-averse, market-neutral fund for institutional investors
Lupus Alpha NeuroBayes® Short Term Trading Fonds
The <phi-t> mouse game, or:
even your “free will” is predictable
//www.phi-t.de/mousegame
Bindings and Licenses
NeuroBayes® is commercial software. Special rates apply for public research; it is essentially free for high energy physics research.
License files are needed, separately for Expert and Teacher. Please contact Phi-T.
The NeuroBayes core code is written in Fortran. Libraries for many platforms (Linux, Windows, …) are available. Bindings exist for C++, Fortran, Java, Lisp, etc.
A code generator for easy usage exists for Fortran and C++.
New: interface to ROOT TMVA available (classification only)
Integration into ROOT planned
Documentation
Basics:
M. Feindt, A Neural Bayesian Estimator for Conditional Probability Densities, e-print archive physics/0402093
M. Feindt, U. Kerzel, The NeuroBayes Neural Network Package, NIM A 559 (2006) 190
Web sites:
www.phi-t.de (company web site, German & English)
www.neurobayes.de (English site on physics results with NeuroBayes & all diploma and PhD theses using NeuroBayes)
www-ekp.physik.uni-karlsruhe.de/~feindt (some NeuroBayes talks can be found here under -> Forschung (Research))