Page 1:

Bayesian Neural Networks

Pushpa Bhat, Fermilab

Harrison Prosper, Florida State University

Page 2:

Outline

Introduction
Bayesian Learning
Simple Examples
Summary

Page 3:

Multivariate Methods

Since the early 1990s, we have used multivariate methods extensively in particle physics. Some examples:

Particle ID and signal/background discrimination
Optimization of cuts for the top quark discovery at DØ
Precision measurement of the top quark mass
Searches for leptoquarks, technicolor, …

Neural network methods have become popular due to their ease of use, power, and successful applications.

Page 4:

Why Multivariate Methods?

They improve several aspects of analysis:

Event selection
Triggering, real-time filters, data streaming

Event reconstruction
Tracking/vertexing, particle ID

Signal/background discrimination
Higgs discovery, SUSY discovery, single top, …

Functional approximation
Jet energy corrections, tag rates, fake rates

Parameter estimation
Top quark mass, Higgs mass, SUSY model parameters

Data exploration
Knowledge discovery via data mining; data-driven extraction of information, latent structure analysis

Page 5:

Multi Layer Perceptron

[Diagram: inputs $x_1$, $x_2$ feed a hidden layer, producing output $y(x_1, x_2; \hat w)$]

A popular and powerful neural network model:

$$
y(x; w, \theta) = f\Big( \sum_j w_j \, f\Big( \sum_i w_{ji}\, x_i - \theta_j \Big) - \theta \Big),
\qquad
f(a) = \frac{1}{1 + e^{-a}}
$$

Need to find the $w$'s and $\theta$'s, the free parameters of the model.
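To make the model concrete, here is a minimal NumPy sketch of the forward pass of such a network; the function names, the (2, 3, 1) architecture, and the random parameters are illustrative assumptions, not from the talk.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation f(a) = 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_output(x, w_hidden, theta_hidden, w_out, theta_out):
    """Forward pass of a single-hidden-layer perceptron (hypothetical names).

    x            : input vector (x1, x2, ...)
    w_hidden     : input-to-hidden weight matrix, entries w_ji
    theta_hidden : hidden-node thresholds theta_j
    w_out        : hidden-to-output weights w_j
    theta_out    : output threshold theta
    """
    h = sigmoid(w_hidden @ x - theta_hidden)   # hidden-node responses
    return sigmoid(w_out @ h - theta_out)      # network output y(x; w, theta)

# Example: a (2, 3, 1) network with random parameters
rng = np.random.default_rng(0)
y = mlp_output(np.array([0.5, -1.2]),
               rng.normal(size=(3, 2)), rng.normal(size=3),
               rng.normal(size=3), rng.normal())
```

Training then amounts to choosing the $w$'s and $\theta$'s that minimize an error function on labeled events.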

Page 6:

The Bayesian Connection

The output of a feed-forward neural network can approximate the posterior probability $P(s|x_1, x_2)$.

[Diagram: inputs $x_1$, $x_2$ feed the network, producing output $y(x_1, x_2; \hat w)$]

$$
y(x, \hat w) \approx P(s|x) = \frac{r}{1 + r},
\qquad
r = \frac{p(x|s)\, p(s)}{p(x|b)\, p(b)}
$$

This is just Bayes' theorem for two classes:

$$
P(C_1|x) = \frac{P(x|C_1)\, P(C_1)}{\sum_i P(x|C_i)\, P(C_i)}
$$
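As a sanity check of this relation, the sketch below evaluates the target posterior $P(s|x) = r/(1+r)$ for two toy Gaussian classes with equal priors; the function name and the densities are illustrative assumptions, not part of the original analysis.

```python
from scipy.stats import norm

def nn_target(x, p_x_given_s, p_x_given_b, p_s=0.5, p_b=0.5):
    """Posterior P(s|x) that an ideal network output approximates."""
    r = (p_x_given_s(x) * p_s) / (p_x_given_b(x) * p_b)  # discriminant r
    return r / (1.0 + r)                                 # P(s|x) = r / (1 + r)

# Toy example: unit-width Gaussian signal (mean +1) and background (mean -1)
posterior = nn_target(0.8, norm(1.0).pdf, norm(-1.0).pdf)
```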

Page 7:

The Top Quark: Post-Evidence, Pre-Discovery!

Fisher analysis of the $t\bar{t} \to e\mu$ channel

One candidate event: S/B ($m_t$ = 180 GeV) = 18 w.r.t. $Z$, = 10 w.r.t. $WW$

NN analysis of the $t\bar{t} \to e$+jets channel

[Plot: NN output for $t\bar{t}$ (160 GeV), $W$+jets, and data]

P. Bhat, DPF94

Page 8:

Measuring the Top Quark Mass: Discriminant Variables

$m_t = 173.3 \pm 5.6\,(\text{stat.}) \pm 6.2\,(\text{syst.})\ \text{GeV}/c^2$

The Discriminants (DØ lepton+jets)

Fit performed in 2-D: ($D_{LB/NN}$, $m_{\text{fit}}$)

Page 9:

Higgs Discovery Reach

The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity w.r.t. conventional analysis. Improved $b\bar b$ mass resolution and b-tagging efficiency are crucial.

Run II Higgs study, hep-ph/0010338 (Oct 2000)
P. C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022

Page 10:

Limitations of “Conventional NN”

Training yields a single set of weights, or network parameters. One needs to look for the “best” network while avoiding overfitting.

Decisions on network architecture (inputs, number of hidden nodes, etc.) are heuristic.

There is no direct way to compute uncertainties.

Page 11:

Ensembles of Networks

[Diagram: input $x$ fed to networks NN$_1$, NN$_2$, …, NN$_M$, producing outputs $y_1, y_2, \ldots, y_M$]

$$
\bar y(x) = \sum_i a_i\, y_i(x)
$$

A decision made by averaging over many networks (a committee of networks) has a lower error than that of any individual network.
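A committee average is simple to realize in code; this minimal sketch assumes each trained network is available as a callable (all names are ours, not from the talk).

```python
import numpy as np

def committee_output(x, networks, a=None):
    """Committee average ybar(x) = sum_i a_i * y_i(x).

    networks : list of callables, each an independently trained NN
    a        : committee coefficients a_i (uniform weights if omitted)
    """
    y = np.array([nn(x) for nn in networks])
    if a is None:
        a = np.full(len(networks), 1.0 / len(networks))
    return np.asarray(a) @ y
```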

Page 12:

Bayesian Learning

The result of Bayesian training is a posterior density of the network weights, $p(w \mid \text{training data})$.

Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last $K$ points:

$$
\bar y(x_{\text{new}}) = \frac{1}{K} \sum_{k=1}^{K} y(x_{\text{new}}, w_k)
$$
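A minimal sketch of this averaging step, assuming an MCMC sampler has already produced a list of weight vectors; sampled_weights and net are hypothetical names.

```python
def bayesian_average(x_new, sampled_weights, net, K=100):
    """Approximate ybar(x_new) by averaging the network output over the
    last K weight vectors drawn from p(w | training data).

    sampled_weights : sequence of weight vectors from the Markov chain
    net             : function net(x, w) evaluating the network
    """
    last_k = sampled_weights[-K:]
    return sum(net(x_new, w) for w in last_k) / len(last_k)
```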

Page 13:

Bayesian Learning – 2

Advantages
Less prone to over-fitting, because of Bayesian averaging.
Less need to optimize the size of the network; one can use a large network. Indeed, the number of weights can be greater than the number of training events!
In principle, provides the best estimate of $p(t|x)$.

Disadvantages
Computationally demanding!

Page 14:

Bayesian Learning – 3

Computationally demanding because:

The dimensionality of the parameter space is typically large.
There can be multiple maxima in the likelihood function $p(t|x, w)$ or, equivalently, multiple minima in the error function $E(x, w)$.

Page 15:

Bayesian Neural Networks – 1

Basic idea: compute the posterior density of the network weights,

$$
p(w|t, x) = \frac{p(t|x, w)\; p(w)}{p(t|x)}
\qquad \text{(Likelihood $\times$ Prior / Evidence)}
$$

then estimate $p(t|x_{\text{new}})$ by averaging over NNs:

$$
y(x_{\text{new}}) = \int y(x_{\text{new}}, w)\, p(w|t, x)\, dw
$$

Page 16:

Bayesian Neural Networks – 2

Likelihood:

$$
p(t|x, w) = \prod_{i=1}^{N} y_i^{\,t_i} \left( 1 - y_i \right)^{1 - t_i}
$$

where $t_i = 0$ or $1$ for background/signal.

Prior:

$$
p(w) = \text{Gaussian}(w, \sigma^2)\; \text{Gamma}(\sigma^2, a, b)
$$
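This likelihood is the familiar Bernoulli (cross-entropy) form; a short NumPy sketch of its logarithm follows, with a numerical clip that is our addition.

```python
import numpy as np

def log_likelihood(y, t):
    """log p(t|x,w) = sum_i [ t_i log y_i + (1 - t_i) log(1 - y_i) ]

    y : network outputs y_i = y(x_i, w) for the N training events
    t : targets t_i (1 for signal, 0 for background)
    """
    y = np.clip(y, 1e-12, 1.0 - 1e-12)   # guard the logarithms
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
```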

Page 17:

Bayesian Neural Networks – 3

Computational method: generate a Markov chain of $N$ points $\{w\}$ from the posterior density $p(w|x)$ and average over the last $K$:

$$
\bar y(x_{\text{new}}) = \frac{1}{K} \sum_{k=1}^{K} y(x_{\text{new}}, w_k)
$$

Markov Chain Monte Carlo software by Radford Neal: http://www.cs.toronto.edu/~radford/fbm.software.html

Page 18:

Bayesian Neural Networks – 4

Treat sampling of the posterior density as a problem in Hamiltonian dynamics, in which the phase space $(p, q)$ is explored using Markov techniques:

$$
\Pr(p, q) \propto \exp[-H(p, q)],
\qquad
\exp[-H(p, q)] \propto p(w|t, x)\, \exp\Big( -\tfrac{1}{2} \sum_i p_i^2 \Big)
$$
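For illustration, here is a compact sketch of one hybrid (Hamiltonian) Monte Carlo update in NumPy, following the standard leapfrog-plus-Metropolis recipe. It is a generic textbook version under our own naming, not Neal's fbm implementation.

```python
import numpy as np

def hmc_step(q, log_post, grad_log_post, n_leapfrog=20, eps=0.01, rng=None):
    """One HMC update of the 'position' q (the network weights), with
    H(p, q) = -log p(q|data) + (1/2) sum_i p_i^2."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.standard_normal(q.shape)              # fresh momenta
    H0 = -log_post(q) + 0.5 * p @ p
    q_new = q.copy()
    p += 0.5 * eps * grad_log_post(q_new)         # initial half kick
    for _ in range(n_leapfrog - 1):
        q_new += eps * p                          # drift
        p += eps * grad_log_post(q_new)           # full kick
    q_new += eps * p                              # final drift
    p += 0.5 * eps * grad_log_post(q_new)         # final half kick
    H1 = -log_post(q_new) + 0.5 * p @ p
    # Metropolis accept/reject corrects for leapfrog discretization error
    return q_new if rng.random() < np.exp(H0 - H1) else q
```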

Page 19:

A Simple Example

Signal: $p\bar p \to tqb$ (… channel)

Background: $p\bar p \to Wb\bar b$

NN model: (1, 15, 1)

MCMC: 5000 $tqb$ + $Wb\bar b$ events; use the last 20 networks in a MC chain of 500.

Input variable: HT_AllJets_MinusBestJets (scaled)

[Plot: input distributions for $Wb\bar b$ and $tqb$]

Page 20:

A Simple Example: Estimate of Prob(s|HT)

Blue dots: $p(s|H_T) = H_{tqb} / (H_{tqb} + H_{Wbb})$

Curves: $y(H_T, w_n)$ (individual NNs)

Black curve: $\langle y(H_T, w) \rangle$
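The "blue dots" estimate is just a bin-by-bin histogram ratio; here is a NumPy sketch under our own naming.

```python
import numpy as np

def posterior_from_histograms(ht_sig, ht_bkg, bins=50, hist_range=None):
    """Bin-by-bin estimate p(s|HT) = H_tqb / (H_tqb + H_Wbb)."""
    # Common bin edges for both samples
    edges = np.histogram_bin_edges(np.concatenate([ht_sig, ht_bkg]),
                                   bins=bins, range=hist_range)
    h_sig, _ = np.histogram(ht_sig, bins=edges)
    h_bkg, _ = np.histogram(ht_bkg, bins=edges)
    total = h_sig + h_bkg
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(total > 0, h_sig / total, np.nan)   # NaN for empty bins
    return 0.5 * (edges[:-1] + edges[1:]), p             # bin centres, p(s|HT)
```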

Page 21:

Example: Single Top Search

Training data
2000 events (1000 $tqb$ + 1000 $Wb\bar b$)
Standard set of 11 variables

Network
(11, 30, 1): 391 parameters!

Markov Chain Monte Carlo (MCMC)
500 iterations, but use the last 100 iterations
20 MCMC steps per iteration
NN parameters stored after each iteration
10,000 steps in total
~1000 steps/hour (on a 1 GHz Pentium III laptop)

Page 22:

Signal/Background Distributions

Page 23:

Page 24:

Weighting with NN Output

Number of data events:

$$
d(x) = n_s\, s(x) + n_b\, b(x)
$$

Create weighted histograms of variables, weighting each event by the network output

$$
y(x) = \frac{s(x)}{s(x) + b(x)},
$$

which yields

$$
f(z) = \int \delta\big(z - z(x)\big)\, d(x)\, y(x)\, dx \approx n_s\, s(z)
$$
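Operationally, the weighted histogram is a one-line NumPy call; the sketch below, with names of our choosing, weights each data event by the network output.

```python
import numpy as np

def weighted_histogram(z, y, bins=50, hist_range=None):
    """Histogram of a variable z with each event weighted by the network
    output, approximating n_s * s(z).

    z : value of the variable for each data event
    y : network output y(x_i) for each event, used as the event weight
    """
    return np.histogram(z, bins=bins, range=hist_range, weights=y)
```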

Page 25:

Weighted Distributions

Magenta: weighting signal only. Blue: weighting signal and background. Black: unweighted signal distribution.

Page 26:

Summary

Bayesian learning of neural networks takes us another step closer to realizing optimal results in classification (or density estimation) problems. It allows a fully probabilistic approach with proper treatment of uncertainties.

We have started to explore Bayesian neural networks, and the initial results are promising, though the method is computationally challenging.

