+ All Categories
Home > Documents > Roofit Tutorial

Roofit Tutorial

Date post: 26-Oct-2015
Category:
Upload: debashis-basu
View: 45 times
Download: 5 times
Share this document with a friend
Description:
High energy paticle physics
47
RooFit Tutorial – Topical Lectures June 2007 Tutorial by Wouter Verkerke Presented by Max Baak
Transcript
Page 1: Roofit Tutorial

RooFit Tutorial – Topical Lectures June 2007

Tutorial by Wouter Verkerke

Presented by Max Baak

Page 2: Roofit Tutorial

RooFit: Your toolkit for data modeling

What is it?

• A powerful toolkit for modeling the expected distribution(s) of events in a physics analysis

• Primarily targeted to high-energy physicists using ROOT

• Originally developed for the BaBar collaboration by Wouter Verkerke and David Kirkby.

• Included with ROOT v5.xx

Documentation:

• http://root.cern.ch/root/Reference.html – for latest class descriptions. RooFit classes start with “Roo”.

• http://roofit.sourceforge.net – for documentation and tutorials

• Bug Wouter

Page 3: Roofit Tutorial

RooFit purpose - Data Modeling for Physics Analysis

Probability Density Function F(x; p, q)• Physical parameters of interest p

• Other parameters q to describe

detector effect (resolution,efficiency,…)

• Normalized over allowed range of the observables x w.r.t the parameters p and q

Distribution of observables x

Determination of p,q

Fit model to data

Define data model

Page 4: Roofit Tutorial

Implementation – Add-on package to ROOT

C++ command line interface & macros

Data management & histogramming

Graphics interface

I/O support

MINUIT

ToyMC dataGeneration

Data/ModelFitting

Data Modeling

Model Visualization

Shared library: libRooFit.so

Page 5: Roofit Tutorial

Data modeling - Desired functionality

Building/Adjusting Models

Easy to write basic PDFs ( normalization)

Easy to compose complex models (modular design)

Reuse of existing functions

Flexibility – No arbitrary implementation-related restrictions

Using Models

Fitting : Binned/Unbinned (extended) MLL fits, Chi2 fits

Toy MC generation: Generate MC datasets from any model

Visualization: Slice/project model & data in any possible way

Speed – Should be as fast or faster than hand-coded model

A n

a l y

s i s

c

y c

l e

Page 6: Roofit Tutorial

Data modeling – OO representation

• Mathematical objects are represented as C++ objects

variable RooRealVar

function RooAbsReal

PDF RooAbsPdf

space point RooArgSet

list of space points RooAbsData

integral RooRealIntegral

RooFit classMathematical concept

),;( qpxF

px,

x

dxxfx

xmax

min

)(

)(xf

kx

Page 7: Roofit Tutorial

Model building – (Re)using standard components

• RooFit provides a collection of compiled standard PDF classes

RooArgusBG

RooPolynomial

RooBMixDecay

RooHistPdf

RooGaussian

BasicGaussian, Exponential, Polynomial,…

Physics inspiredARGUS,Crystal Ball, Breit-Wigner, Voigtian,B/D-Decay,….

Non-parametricHistogram, KEYS

PDF Normalization• By default RooFit uses numeric integration to achieve normalization • Classes can optionally provide (partial) analytical integrals• Final normalization can be hybrid numeric/analytic form

Page 8: Roofit Tutorial

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

Model building – (Re)using standard components

• Most physics models can be composed from ‘basic’ shapes

RooAddPdf+

RooGaussian

Page 9: Roofit Tutorial

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

RooGaussian

Model building – (Re)using standard components

• Most physics models can be composed from ‘basic’ shapes

RooProdPdf*

Page 10: Roofit Tutorial

Model building – (Re)using standard components

• Building blocks are flexible– Function variables can be functions themselves

– Just plug in anything you like

– Universally supported by core code (PDF classes don’t need to implement special handling)

g(x;m,s)m(y;a0,a1)g(x,y;a0,a1,s)

RooPolyVar m(“m”,y,RooArgList(a0,a1)) ;RooGaussian g(“g”,”gauss”,x,m,s) ;

Page 11: Roofit Tutorial

Model building – Expression based components

• RooFormulaVar – Interpreted real-valued function– Based on ROOT TFormula class

– Ideal for modifying parameterization of existing compiled PDFs

• RooGenericPdf – Interpreted PDF– Based on ROOT TFormula class

– User expression doesn’t need to be normalized

– Maximum flexibility

RooBMixDecay(t,tau,w,…)

RooFormulaVar w(“w”,”1-2*D”,D) ;

RooGenericPdf f("f","1+sin(0.5*x)+abs(exp(0.1*x)*cos(-1*x))",x)

Page 12: Roofit Tutorial

Using models – Fitting options

• Fitting interface is flexible and powerful, many options supported

Goodness-of-fit measure

-log(Likelihood)

Extended –log(L)

Chi2

User Defined

(add custom/penalty terms to any of these)

InterfaceOne-line: RooAbsPdf::fitTo(…)

Interactive: RooMinuit class

OutputModifies parameter objects of PDF

Save snapshot of initial/final parameters, correlation matrix, fit status etc…

Sample interactive MINUIT sessionRooNLLVar nll(“nll”,”nll”,pdf,data) ;

RooMinuit m(nll) ;

m.hesse() ;

x.setConstant() ;

y.setVal(5) ;

m.migrad() ;

m.minos()

RooFitResult* r = m.save() ;

Data typeBinned

Unbinned

Weighted unbinned Access any of MINUITsminimization methods

Change and fix param. values,using native RooFit interface during fit session

Page 13: Roofit Tutorial

Using models – Fitting speed & optimizations

• Benefit of function optimization traditionally a trade-off between– Execution speed (especially in fitting)

– Flexibility/maintainability of analysis user code• Optimizations usually hard-code assumptions…

• Evaluation of –log(L) in fits lends it well to optimizations– Constant fit parameters often lead to higher-level constant PDF components

– PDF normalization integrals have identical value for all data points

– Repetitive nature of calculation ideally suited for parallelization.

• RooFit automates analysis and implementation of optimization– Modular OO structure of PDF expressions facilitate automated introspection

• Find and pre-calculate highest level constant terms in composite PDFs

• Apply caching and lazy evaluation for PDF normalization integrals

• Optional automatic parallelization of fit on multi-CPU hosts

– Optimization concepts are applied consistently and completely to all PDFs

– Speedup of factor 3-10 typical in realistic complex fits

• RooFit delivers per-fit tailored optimization without user overhead!

Page 14: Roofit Tutorial

Using models – Plotting

• RooPlot – View of 1 datasets/PDFs projected on the same dimension

Create the view on mesRooPlot* frame = mes.frame() ;

Project the data on the mes viewdata->plotOn(frame) ;

Project the PDF on the mes viewpdf->plotOn(frame) ;

Project the bkg. PDF componentpdf->plotOn(frame,Components(“bkg”))

Draw the view on a canvasframe->Draw() ;

Axis labels auto-generated

Page 15: Roofit Tutorial

Using models - Overview

• All RooFit models provide universal and complete fitting and Toy Monte Carlo generating functionality– Model complexity only limited by available memory and CPU power

• models with >16000 components, >1000 fixed parametersand>80 floating parameters have been used (published physics result)

– Very easy to use – Most operations are one-liners

RooAbsPdf

RooDataSet

RooAbsData

gauss.fitTo(data)

data = gauss.generate(x,1000)

Fitting Generating

Page 16: Roofit Tutorial

Advanced features – Task automation

• Support for routine task automation, e.g. goodness-of-fit study

Input model Generate toy MC Fit model

Repeat N times

Accumulatefit statistics

Distribution of- parameter values- parameter errors- parameter pulls

// Instantiate MC study managerRooMCStudy mgr(inputModel) ;

// Generate and fit 100 samples of 1000 eventsmgr.generateAndFit(100,1000) ;

// Plot distribution of sigma parametermgr.plotParam(sigma)->Draw()

Page 17: Roofit Tutorial

RooFit users tutorial

The basics

Probability density functions & likelihoods

The basics of OO data modeling

The essential ingredients: PDFS, datasets, functions

Page 18: Roofit Tutorial

Outline of the hands-on part

1. Guide you through the fundamentals of RooFit

2. Look at some sample composite data models1. Still quite simple, all 1-dimensional

3. Try to do at least one ‘advanced topic’1. Tutorial 8: Calculating the P-value of your analysis

– P-Value = How often does an equivalent data sample with no signal mimic the signal you observe

2. Tutorial 9: Fit the top mass

• Copy tutorial.tar.gz from /project/atlas/users/mbaak/TPL/– Untar tutorial.tar in your home directory

– Contents of the tutorial setup

tutorial/docs/roofit_tutorial.ppt /macros

http://root.cern.ch/root/html514/ClassIndex.html

This presentation Macros to be used in this tutorial

Open in your favorite browser

Page 19: Roofit Tutorial

Loading RooFit into ROOT

• >source setup.csh (in the tutorial/ directory)

• Make sure libRooFit.so is in $ROOTSYS/lib

• Start ROOT

• In the ROOT command line load the RooFit library

– Or start root in the tutorial/ directory.

gSystem->Load(“libRooFit”) ;

Page 20: Roofit Tutorial

Creating a variable – class RooRealVar

• Creating a variable object

– Every RooFit objects must have a unique name!

RooRealVar mass(“mass”,“m(e+e-)”,0,1000) ;

C++ nameName Title Allowed range

Page 21: Roofit Tutorial

Creating a probability density function

• First create the variables you need

• Then create a function object

– Give variables as arguments to link variables to function

RooRealVar mass(“mass”,“mass”,-10,10) ;

RooRealVar mean(“mean”,“mean”,0.0,-10,10) ;RooRealVar width(“width”,“sigma”,3.0,0.1,10.) ;

RooGaussian gauss(“gauss”,”Gaussian”, mass, mean, width) ;

Initial value

Allowed range

Allowed range

Page 22: Roofit Tutorial

Making a plot of a function

• First create an empty plot

– A frame is a plot associated with a RooFit variable (mass in this case)

• Draw the empty plot on a ROOT canvas

RooPlot* frame = mass.frame() ;

Plot range taken from limits of x

frame->Draw()

Page 23: Roofit Tutorial

Making a plot of a function (continued)

• Draw the (probability density) function in the frame

• Update the frame in the ROOT canvas

gauss->plotOn(frame) ;

Axis label from gauss title

Unit normalization

frame->Draw()

Page 24: Roofit Tutorial

Interacting with objects

• Changing and inspecting variables

• Draw another copy of gauss

sigma.getVal() ;(const Double_t) 3.00

sigma = 1.0 ;

sigma.getVal() ;(const Double_t) 1.00

gauss->plotOn(frame) ;frame->Draw()

Page 25: Roofit Tutorial

Inspecting composite objects

• Inspecting the structure of gauss

• Inspecting the contents of frame

gauss->printCompactTree() ;

0x10b95fc0 RooGaussian::gauss (gauss) [Auto] 0x10b90c78 RooRealVar::x (x) 0x10b916f8 RooRealVar::mean (mean) 0x10b85f08 RooRealVar::sigma (sigma)

frame->Print(“v”)

RooPlot::frame(10ba6830): "A RooPlot of "x"" Plotting RooRealVar::x: "x" Plot contains 2 object(s) (Options="L") RooCurve::curve_gaussProjected: "Projection of gauss" (Options="L") RooCurve::curve_gaussProjected: "Projection of gauss"

Page 26: Roofit Tutorial

Data

• Unbinned data is represented by a RooDataSet object

• Class RooDataSet is RooFit interface to ROOT class TTree

row x y

1 0.57 4.86

2 5.72 6.83

3 2.13 0.21

4 10.5 -35.

5 -4.3 -8.8

TTree

RooDataSet

RooRealVar xRooRealVar y

RooDataSet associatesa RooRealVar withcolumn of a TTree

Association by matching TTree Branch name with RooRealVarname

Page 27: Roofit Tutorial

Creating a dataset from a TTree

• First open file with TTree

• Create RooDataSet from tree

TFile f(“tut0.root”) ;f.ls() ;root [1] .lsTFile** tut1.root TFile* tut1.root KEY: TTree xtree;1 xtree

RooDataSet data(“data”,”data”,x,xtree) ;

RooFit Variable in dataset Imported TTree

macros/tut0.root

Page 28: Roofit Tutorial

Drawing a dataset on a frame

• Create new plot frame, draw RooDataSet on frame, draw frame

RooPlot* frame2 = x.frame() ;data->plotOn(frame2) ;frame2->Draw() ;

Note Poisson Error bars

Page 29: Roofit Tutorial

Overlaying a PDF curve on a dataset

• Add PDF curve to frame

pdf->plotOn(frame2) ;frame2->Draw() ;

Unit normalizedPDF automaticallyscaled to dataset

But shape is not right!Lets fit the curveto the data

Page 30: Roofit Tutorial

Fitting a PDF to an unbinned dataset

• Fit gauss to data

• Behind the scenes

1. RooFit constructs the Likelihood from the PDF and the dataset

2. RooFit passes the Likelihood function to MINUIT to minimize

3. RooFit extracts the result from MINUIT and stores in the RooRealVar objects that represent the fit parameters

• Draw the result

gauss->fitTo(*data) ;

gauss->plotOn(frame2) ;frame2->Draw() ;

Page 31: Roofit Tutorial

Looking at the fit results

• Look again at the PDF variables

– Results from MINUIT back-propagated to variables

sigma.Print() ;RooRealVar::sigma: 1.9376 +/- 0.043331 (-0.042646, 0.044033) L(-10 – 10)

mean.Print() ;RooRealVar::mean: -0.0843265 +/- 0.061273 (-0.061210, 0.061361) L(-10 - 10)

Adjusted value Symmetricerror

(from HESSE)

Asymmetricerror

(from HESSE)

Page 32: Roofit Tutorial

Putting it all together

• A self contained example to construct a model, fit it, and plot it on top of the data

void fit(TTree* dataTree) { // Define model RooRealVar x(“x”,”x”,-10,10) ; RooRealVar sigma(“sigma”,”sigma”,2,0.1,10) ; RooRealVar mean(“mean”,”mean”,-10,10) ; RooGaussian gauss(“gauss”,”gauss”,x,mean,sigma) ;

// Import data RooDataSet data(“data”,”data”,dataTree,x) ;

// Fit data gauss.fitTo(data) ;

// Make plot RooPlot* frame = x.frame() ; data.plotOn(frame) ; gauss.plotOn(frame) ; frame->Draw() ;}

macro/tut1.C

In macro/tut1.Cuncomment two lines below // Make plot and see what happens

Page 33: Roofit Tutorial

Building composite PDFS

• RooFit has a collection of many basic PDFS

RooArgusBG - Argus background shapeRooBifurGauss - Bifurcated GaussianRooBreitWigner - Breit-Wigner shapeRooCBShape - Crystal Ball functionRooChebychev - Chebychev polynomialRooDecay - Simple decay functionRooExponential - Exponential functionRooGaussian - Gaussian functionRooKeysPdf - Non-parametric data descriptionRooPolynomial - Generic polynomial PDFRooVoigtian - Breit-Wigner (X) Gaussian

HTML class documentation indoc/html/ClassIndex.html

(open with internet explorer)

Page 34: Roofit Tutorial

Building realistic models

• You can combine any number of the preceding PDFs to build more realistic models

RooRealVar x(“x”,”x”,-10,10)

// Construct background modelRooRealVar alpha(“alpha”,”alpha”,-0.3,-3,0) ;RooExponential bkg(“bkg”,”bkg”,x, alpha) ;

// Construct signal modelRooRealVar mean(“mean”,”mean”,3,-10,10) ;RooRealVar sigma(“sigma”,”sigma”,1,0.1,10) ;RooGaussian sig(“sig”,”sig”,x,mean,sigma) ;

// Construct signal+background modelRooRealVar sigFrac(“sigFrac”,”signal fraction”,0.1,0,1) ;RooAddPdf model(“model”,”model”,RooArgList(sig,bkg),sigFrac) ;

// Plot modelRooPlot* frame = x.frame() ;model.plotOn(frame) ;model.plotOn(frame,Components(bkg),LineStyle(kDashed)) ;frame->Draw() ;

macro/tut2.C

Page 35: Roofit Tutorial

Building realistic models

Page 36: Roofit Tutorial

Sampling ‘toy’ Monte Carlo events from model

• Just like you can fit models, you can also sample ‘toy’ Monte Carlo events from models

RooDataSet* mcdata = model->generate(x,1000) ;

RooPlot* frame2 = x.frame() ;mcdata->plotOn(frame2) ;

model->plotOn(frame2) ;frame2->Draw() ;

Try it ...

Page 37: Roofit Tutorial

RooAddPdf can add any number of models

RooRealVar x("x","x",0,10) ;

// Construct background model RooRealVar alpha("alpha","alpha",-0.7,-3,0) ; RooExponential bkg1("bkg1","bkg1",x,alpha) ;

// Construct additional background model RooRealVar bkgmean("bkgmean","bkgmean",7,-10,10) ; RooRealVar bkgsigma("bkgsigma","bkgsigma",2,0.1,10) ; RooGaussian bkg2("bkg2","bkg2",x,bkgmean,bkgsigma) ;

// Construct signal model RooRealVar mean("mean","mean",3,-10,10) ; RooRealVar width("width","width",0.5,0.1,10) ; RooBreitWigner sig("sig","sig",x,mean,width) ;

// Construct signal+2xbackground model RooRealVar bkg1Frac("bkg1Frac","signal fraction",0.2,0,1) ; RooRealVar sigFrac("sigFrac","signal fraction",0.5,0,1) ; RooAddPdf model("model","model",RooArgList(sig,bkg1,bkg2), RooArgList(sigFrac,bkg1Frac)) ;

RooPlot* frame = x.frame() ; model.plotOn(frame) ; model.plotOn(frame,Components(RooArgSet(bkg1,bkg2)),LineStyle(kDashed)) ; frame->Draw() ;

macros/tut3.C

Page 38: Roofit Tutorial

RooAddPdf can add any number of models

Try adding another signal term

Page 39: Roofit Tutorial

Extended Likelihood fits

• Regular likelihood fits only fit for shape– Number of coefficients in RooAddPdf is always one less than

number of components

• Can also do extended likelihood fit– Fit for both shape and observed number of events

– Accomplished by adding ‘extended likelihood term’ to regular LL

• Extended term automatically constructed in RooAddPdf if given equal number of coefficients & PDFS

)log()),(log()(log expexp NNNpxgpL obsD

i

Page 40: Roofit Tutorial

Extended Likelihood fits and RooAddPdf

• How to construct an extended PDF with RooAddPdf

• Fitting with extended model

// Construct extended signal+2xbackground model RooRealVar nbkg1(“nbkg1",“number of bkg1 events",300,0,1000) ; RooRealVar nbkg2(“nbkg2",“number of bkg2 events",200,0,1000) ; RooRealVar nsig( “nsig",“number of signal events",500,0,1000) ; RooAddPdf emodel(“emodel",“emodel",RooArgList(sig, bkg1, bkg2), RooArgList(nsig,nbkg1,nbkg2)) ;

Previous modelsigFracbkg1Frac

Add extended termsigFracbkg1Fracntotal

New representationnsignbkg1nbkg2

emodel.fitTo(data,”e”) ;

Include extended term in fit

macros/tut4.C

Look at sum, expected errors, and correlations between fitted event numbers

Page 41: Roofit Tutorial

Switching gears

• Hands-on exercise so far designed to introduce you to basic model building syntax

• Real power of RooFit is in using those models to explore your analysis in an efficient way

• No time in this short session to cover this properly, so next slide just gives you a flavor of what is possible

1. Multidimensional models, selecting by likelihood ratio

2. Demo on ‘task automation’ as mentioned in last slide of introductory slide

Page 42: Roofit Tutorial

Multi-dimensional PDFs

• RooFit handles multi-dimensional PDFs as easily as 1D PDFs– Just use class RooProdPdf to multiply 1D PDFS

• Case example: selecting B+ D0 K+– Three discriminating variables: mES, DeltaE, m(D0)

• Look at example model, fit, plots in

* *

* *

Signal Model

Background Model

macros/tut5.C

Page 43: Roofit Tutorial

Selecting by Likelihood ratio

• Plain projection of multi-dimensional PDF and dataset often don’t do justice to analyzing power of PDF– You don’t see selecting power of PDF in dimensions that are

projected out

– Possible solution: don’t plot all events, but show only events passing cut of signal,bkg likelihood ratios constructed from PDF dimensions that are not shown in the plot

macros/tut6.C

Plain projection of mESof previous excercise

Nsig = 91 ± 10

Result from 3D fit

Close to sqrt(N)

Page 44: Roofit Tutorial

Next topic: How stable is your fit

• When looking at low statistics fit, you’ll want to check explicitly– Is your fit stable and unbiased

• Check by running through large set of toy MC samples– Fit each sample, accumulate fit statistics and make pull

distribution

• Technical procedure– Generate toy Monte Carlo sample with desired number of events

– Fit for signal in that sample

– Record number of fitted signal events

– Repeat steps 1-3 often

– Plot distributions of Nsig, (Nsig), pull(Nsig)

• RooFit can do all this for you with 2 lines of code!– Try out the example in macros/tut7.C

Experiment with lowering number of signal events

Page 45: Roofit Tutorial

How often does background mimic your signal?

• Useful quantity in determining importance of your signal: the P-value

– P-Value: How often does a data sample of comparable statistics with no signal mimic the signal yield you observe

– Tells you how probable it is that your peak is the result of a statistical fluctuation of the background

• Procedure very similar to last exercise – First generate fake ‘data’, fit data to determine ‘data signal yield’

– Generate toy Monte Carlo sample with 0 signal events

– Fit for signal in that sample

– Record number of fitted signal events

– Repeat steps 1-3 often

– See what fraction of fits result in a signal yield exceeding your ‘observed data yield’

• Try out the example in macros/tut8.C

Page 46: Roofit Tutorial

Additional Exercises

• Set up you own fit!

• Fit the top quark mass distribution in

• For the top signal (around 160 GeV/c2), use a Gaussian.

• For the background, try out

– Polynomial (RooPolynomial)

– Chebychev polynomial (RooChebychev)

macros/tut9.C

Minumum number of terms needed?Which background description works better?Why? Look at correlation matrix.

Page 47: Roofit Tutorial

Advanced examples

• See the macros/examples/ directory.


Recommended