RooUnfold unfolding framework and algorithms
Tim Adye, Rutherford Appleton Laboratory
BaBar Statistics Working Group, BaBar Collaboration Meeting
13th December 2005
Outline
• What is Unfolding?
  • and why might you want to do it?
• Overview of a few techniques
  • Regularised unfolding
  • Iterative method
• RooUnfold package
  • Currently implements three methods with a common interface
• Status and Plans
• References
Unfolding
• In other fields known as “deconvolution”, “unsmearing”
• Given a “true” PDF in μ that is corrupted by detector effects, described by a response function, R, we measure a distribution in ν. In terms of histograms:
$$\nu_i = \sum_{j=1}^{M} R_{ij}\,\mu_j, \qquad i = 1 \ldots N$$
• This may involve
  1. inefficiencies: lost events
  2. bias and smearing: events moving between bins (off-diagonal $R_{ij}$)
• With infinite statistics, it would be possible to recover the original PDF by inverting the response matrix:
$$\mu = R^{-1}\,\nu$$
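The folding and inversion relations on this slide can be sketched numerically. The following is a minimal illustration in Python with NumPy, not RooUnfold code: the helper `gaussian_response`, the bin count, the smearing width, and the efficiency values are all assumptions made for the example, and a square matrix (N = M) is assumed so that the inverse exists.

```python
import numpy as np

def gaussian_response(n_bins, sigma_bins, efficiency):
    """Hypothetical response: R[i, j] = P(observed in bin i | true value in bin j)."""
    centres = np.arange(n_bins)
    # Gaussian smearing between bin centres gives the off-diagonal elements.
    smear = np.exp(-0.5 * ((centres[:, None] - centres[None, :]) / sigma_bins) ** 2)
    smear /= smear.sum(axis=0, keepdims=True)   # each true bin smears with total probability 1
    # Inefficiency: column j then sums to efficiency[j] instead of 1.
    return smear * efficiency[None, :]

n_bins = 10
efficiency = np.linspace(0.9, 0.6, n_bins)      # assumed variable efficiency
R = gaussian_response(n_bins, sigma_bins=0.5, efficiency=efficiency)

# "True" histogram mu and the expected measured histogram nu_i = sum_j R_ij mu_j.
mu = 1000.0 * np.exp(-0.5 * ((np.arange(n_bins) - 4.5) / 2.0) ** 2)
nu = R @ mu

# Without statistical fluctuations, mu = R^{-1} nu recovers the truth.
mu_back = np.linalg.solve(R, nu)
print("largest deviation after inversion:", np.abs(mu_back - mu).max())
```

The inversion works here only because `nu` is the exact expectation; no statistical fluctuations have been applied.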
Not so simple…
• Unfortunately, if there are statistical fluctuations between bins this information is destroyed
  • Since R washes out statistical fluctuations, $R^{-1}$ cannot distinguish between wildly fluctuating and smooth PDFs
  • Obtain large negative correlations between adjacent bins
  • Large fluctuations in reconstructed bin contents
• Need some procedure to remove wildly fluctuating solutions
  1. Give added weight to “smoother” solutions
  2. Solve for μ iteratively, starting with a reasonable guess, and truncate the iteration before it gets out of hand
  3. Ignore bin-to-bin fluctuations altogether
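A toy study makes the problem concrete. The sketch below (Python with NumPy) assumes a hypothetical response with Gaussian smearing of 1.5 bin widths and full efficiency, applies Poisson fluctuations to the expected measurement, and then inverts the matrix directly. The seed, bin count, and distribution are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)              # fixed seed so the toy is reproducible
n_bins = 20
centres = np.arange(n_bins)

# Hypothetical response: Gaussian smearing of 1.5 bin widths, full efficiency.
R = np.exp(-0.5 * ((centres[:, None] - centres[None, :]) / 1.5) ** 2)
R /= R.sum(axis=0, keepdims=True)

mu_true = 1000.0 * np.exp(-0.5 * ((centres - 9.5) / 4.0) ** 2)
nu_expected = R @ mu_true
nu_measured = rng.poisson(nu_expected).astype(float)   # statistical fluctuations

# Direct inversion with no regularisation.
mu_inverted = np.linalg.solve(R, nu_measured)

noise_in = np.abs(nu_measured - nu_expected).max()
noise_out = np.abs(mu_inverted - mu_true).max()
print("condition number of R:         ", np.linalg.cond(R))
print("largest fluctuation in data:   ", noise_in)
print("largest error after inversion: ", noise_out)
```

The fluctuations in the measured bins are of order the square root of the bin contents, but the inverted solution deviates from the truth by a far larger amount because the small singular values of R amplify them.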
What happens if you don’t smooth
True Gaussian, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a different Gaussian
Double Breit-Wigner, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a single Gaussian
So why don’t we always do this?
• If the true PDF and resolution function can be parameterised, then a Maximum Likelihood fit is usually more convenient
  • Directly returns parameters of interest
  • Does not require binning
• If the response function doesn’t include smearing (i.e. it’s diagonal), then apply bin-by-bin efficiency correction directly
• If result is just needed for comparison (e.g. with MC), could apply response function to MC
  • simpler than un-applying response to data
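The last two alternatives are simple enough to sketch in a few lines of Python with NumPy. All numbers below (efficiencies, measured counts, MC truth) are invented for illustration, and the response is assumed to be purely diagonal.

```python
import numpy as np

# Hypothetical diagonal response: efficiencies only, no migration between bins.
efficiency = np.array([0.5, 0.8, 0.9, 0.7])
R = np.diag(efficiency)

measured = np.array([50.0, 160.0, 270.0, 70.0])
measured_err = np.sqrt(measured)                # Poisson errors on the data

# Alternative 1: bin-by-bin efficiency correction of the data.
corrected = measured / efficiency
corrected_err = measured_err / efficiency

# Alternative 2: apply the response to the MC truth and compare in measured space.
mc_truth = np.array([110.0, 190.0, 310.0, 95.0])
mc_folded = R @ mc_truth
chi2 = np.sum(((measured - mc_folded) / measured_err) ** 2)

print("corrected data:", corrected)
print("folded MC:     ", mc_folded)
print("chi2 (data vs folded MC):", chi2)
```

The second alternative works unchanged for a non-diagonal R, which is why folding the MC is simpler than unfolding the data when only a comparison is needed.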
When to use unfolding
• Use unfolding to recover theoretical distribution where
  • there is no a-priori parameterisation
  • this is needed for the result and not just comparison with MC
  • there is significant bin-to-bin migration of events
Where could we use unfolding?
• Traditionally used to extract structure functions
  • Widely used outside PP for image reconstruction
• Dalitz plots
  • Cross-feed between bins due to misreconstruction
• “True” decay momentum distributions
  • Theory at parton level, we measure hadrons
  • Correct for hadronisation as well as detector effects
1. Regularised Unfolding
• Use Maximum Likelihood to fit smeared bin contents to measured data, but include a regularisation function
$$\ln L'(\mu) = \ln L(\mu) + \alpha\,S(\mu)$$
  where the regularisation parameter, α, controls the degree of smoothness (select α to, e.g., minimise the mean squared error)
• Various choices of regularisation function, S, are used
  • Tikhonov regularisation: minimise curvature
    • for some definition of curvature, e.g.
$$S(\mu) = -\sum_{i=2}^{M-1} \left[(\mu_{i+1} - \mu_i) - (\mu_i - \mu_{i-1})\right]^2$$
    • RooUnfHistoSvd by Kerstin Tackmann and Heiko Lacker
      • based on GURU by Andreas Höcker and Vakhtang Kartvelishvili
      • uses Singular Value Decomposition
    • RUN by Volker Blobel
  • Maximum entropy:
$$S(\mu) = -\sum_{i=1}^{M} \frac{\mu_i}{\mu_{\mathrm{tot}}} \ln\frac{\mu_i}{\mu_{\mathrm{tot}}}$$
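As a rough illustration of Tikhonov regularisation, the sketch below (Python with NumPy) replaces the full likelihood with a Gaussian least-squares approximation, so it minimises a χ² term plus `alpha` times the summed squared second differences. This is a simplification for the example and is not the algorithm of RooUnfHistoSvd, GURU, or RUN; `alpha` plays the role of α only up to a constant factor, and its value, the response, and the toy distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2005)
n_bins = 20
centres = np.arange(n_bins)

# Hypothetical response: Gaussian smearing of 1.5 bin widths, full efficiency.
R = np.exp(-0.5 * ((centres[:, None] - centres[None, :]) / 1.5) ** 2)
R /= R.sum(axis=0, keepdims=True)

mu_true = 1000.0 * np.exp(-0.5 * ((centres - 9.5) / 4.0) ** 2)
n = rng.poisson(R @ mu_true).astype(float)      # measured histogram with fluctuations

# Second-difference operator: (C @ mu)[k] = mu[k] - 2*mu[k+1] + mu[k+2].
C = np.diff(np.eye(n_bins), n=2, axis=0)

def curvature_S(mu):
    """Tikhonov regularisation function: minus the sum of squared second differences."""
    return -np.sum((C @ mu) ** 2)

def unfold_tikhonov(R, n, alpha):
    """Minimise (R mu - n)^T W (R mu - n) - alpha * S(mu) in closed form."""
    W = np.diag(1.0 / np.maximum(n, 1.0))       # inverse Poisson variances (approximate)
    A = R.T @ W @ R + alpha * (C.T @ C)
    return np.linalg.solve(A, R.T @ W @ n)

mu_reg = unfold_tikhonov(R, n, alpha=1e-4)      # alpha chosen by hand for this toy
mu_inv = np.linalg.solve(R, n)                  # unregularised inversion for comparison

print("max error, regularised:  ", np.abs(mu_reg - mu_true).max())
print("max error, plain inverse:", np.abs(mu_inv - mu_true).max())
```

Scanning `alpha` and comparing the error against the known truth in a toy like this is one way to see the trade-off between bias (large `alpha`) and variance (small `alpha`).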
2. Iterative method
• Uses Bayes’ theorem to invert
$$R_{ij} = P(\text{observed in bin } i \mid \text{true value in bin } j)$$
  and, using an initial set of probabilities, $p_i$ (e.g. flat), obtain an improved estimate
$$\hat{\mu}_i = \frac{1}{\varepsilon_i} \sum_{j=1}^{N} \frac{R_{ji}\,p_i}{\sum_k R_{jk}\,p_k}\; n_j$$
  where $n_j$ are the measured bin contents and $\varepsilon_i = \sum_j R_{ji}$ is the efficiency for true bin i
• Repeating with new $p_i$ from these new bin contents converges quite rapidly
  • Truncating the iteration prevents us seeing the bad effects of statistical fluctuations
• Fergus Wilson and I have implemented this method in ROOT/C++
  • Supports 1D, 2D, and 3D cases
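The iteration can be written compactly with NumPy. This is a minimal 1D sketch of the iterative Bayesian update, not the ROOT/C++ implementation mentioned on the slide: the function name `iterative_bayes`, the toy response, and the iteration counts are assumptions, and error propagation is omitted.

```python
import numpy as np

def iterative_bayes(R, n, n_iterations, prior=None):
    """Iterative Bayesian unfolding.

    R[j, i] = P(observed in bin j | true value in bin i); n = measured bin contents.
    Assumes every measured bin can be populated (no all-zero rows in R).
    """
    if n_iterations < 1:
        raise ValueError("need at least one iteration")
    n_true = R.shape[1]
    efficiency = R.sum(axis=0)                  # eps_i = sum_j R_ji
    p = np.full(n_true, 1.0 / n_true) if prior is None else prior / prior.sum()
    for _ in range(n_iterations):
        folded = R @ p                          # sum_k R_jk p_k for each measured bin j
        # Bayes' theorem: P(true i | observed j) = R_ji p_i / sum_k R_jk p_k
        posterior = R * p[None, :] / folded[:, None]
        mu_hat = (posterior.T @ n) / efficiency
        p = mu_hat / mu_hat.sum()               # new probabilities for the next pass
    return mu_hat

# Toy check with a hypothetical response (1 bin width of smearing, full efficiency).
centres = np.arange(20)
R = np.exp(-0.5 * ((centres[:, None] - centres[None, :]) / 1.0) ** 2)
R /= R.sum(axis=0, keepdims=True)
mu_true = 1000.0 * np.exp(-0.5 * ((centres - 9.5) / 3.0) ** 2)
n = R @ mu_true                                 # noise-free measurement

mu_1 = iterative_bayes(R, n, n_iterations=1)
mu_50 = iterative_bayes(R, n, n_iterations=50)
print("summed |error| after 1 iteration:  ", np.abs(mu_1 - mu_true).sum())
print("summed |error| after 50 iterations:", np.abs(mu_50 - mu_true).sum())
```

With real, fluctuating data the number of iterations acts as the regularisation parameter: too many iterations let the statistical fluctuations grow, so in practice the iteration is stopped early.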
2D Unfolding Example
2D Smearing, bias, variable efficiency, and variable rotation
RooUnfold Package
• Make these different methods available as ROOT/C++ classes with a common interface to specify
  • unfolding method and parameters
  • response matrix
    • pass directly or fill from MC sample
  • measured histogram
  • return reconstructed truth histogram and errors
    • full covariance matrix
• Easy to do with multiple dimensions (when supported)
• This should make it easy to try and compare different methods in your analysis
  • Could also be useful outside BaBar!
RooUnfold Classes
• RooUnfoldResponse
  • response matrix with various filling and access methods
  • create from MC, use on data (can be stored in a file)
• RooUnfold – unfolding algorithm base class
  • RooUnfoldBayes – Iterative method
  • RooUnfoldSvd – Interface to RooUnfHistoSvd package
  • RooUnfoldBinByBin – Simple bin-by-bin method
    • Trivial implementation, but useful to compare with full unfolding
• RooUnfoldExample – Simple 1D example
• RooUnfoldTest and RooUnfoldTest2D
  • Test with different training and unfolding distributions
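The class layout above (a response object filled from MC, plus interchangeable algorithms behind a common base class) can be mimicked in a few lines of Python. This is a design sketch only: the names `Response`, `Unfolder`, `BinByBin`, `Inversion`, `fill`, `miss`, and `reco` are hypothetical and are not the RooUnfold API, and the bin-by-bin definition used here (scale each bin by MC truth over MC measured) is an assumption.

```python
import numpy as np

class Response:
    """Hypothetical 1D response matrix, filled event by event from MC."""
    def __init__(self, n_bins, low, high):
        self.edges = np.linspace(low, high, n_bins + 1)
        self.counts = np.zeros((n_bins, n_bins))    # counts[measured bin, true bin]
        self.truth = np.zeros(n_bins)               # all generated events per true bin

    def _bin(self, x):
        index = int(np.searchsorted(self.edges, x, side="right")) - 1
        if not 0 <= index < len(self.truth):
            raise ValueError("value outside histogram range")
        return index

    def fill(self, x_measured, x_true):
        """MC event that was reconstructed."""
        i, j = self._bin(x_measured), self._bin(x_true)
        self.counts[i, j] += 1
        self.truth[j] += 1

    def miss(self, x_true):
        """MC event that was lost (inefficiency)."""
        self.truth[self._bin(x_true)] += 1

    def matrix(self):
        """R[i, j] = P(observed in bin i | true value in bin j)."""
        safe = np.where(self.truth > 0, self.truth, 1.0)
        return self.counts / safe[None, :]

class Unfolder:
    """Common interface: construct with a response and measured data, call reco()."""
    def __init__(self, response, measured):
        self.response = response
        self.measured = np.asarray(measured, dtype=float)

    def reco(self):
        raise NotImplementedError

class BinByBin(Unfolder):
    def reco(self):
        mc_measured = self.response.counts.sum(axis=1)
        factor = np.divide(self.response.truth, mc_measured,
                           out=np.zeros_like(mc_measured), where=mc_measured > 0)
        return self.measured * factor

class Inversion(Unfolder):
    def reco(self):
        return np.linalg.solve(self.response.matrix(), self.measured)

# Tiny hand-made MC sample: two bins covering [0, 2).
response = Response(n_bins=2, low=0.0, high=2.0)
for _ in range(3):
    response.fill(0.5, 0.5)
response.fill(1.5, 0.5)                             # migration from bin 0 to bin 1
for _ in range(4):
    response.fill(1.5, 1.5)
response.miss(1.5)                                  # one lost event in bin 1

measured = response.counts.sum(axis=1)              # unfold the MC's own measured histogram
for algorithm in (BinByBin, Inversion):
    print(algorithm.__name__, algorithm(response, measured).reco())
```

Because both algorithms share the constructor and the `reco()` call, swapping methods in an analysis is a one-line change, which is the point of the common interface.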
RooUnfold Status
• Available in CVS
  • Announced in Statistics HN
  • See README file for details of building and running
• Interface can still be adjusted based on comments
  • I already have an idea for simplifying use in multi-dimensional case
Plans and possible improvements
• So far this is mostly a programming exercise
  • Would be interesting to compare the different methods for some real analysis distributions
    • But YMMV
• Add common tools, useful for all algorithms
  • Inputs and results in different formats
    • already supports histograms and ROOT vectors/matrices
  • Automatic calculation of figures of merit (e.g. χ²)
    • can also use standard ROOT functions on histograms
  • Simplify selection of regularisation parameter
• More algorithms?
  • Maximum entropy regularisation
  • Simple matrix inversion without regularisation
    • perhaps useful with large statistics
References - Overview
• G. Cowan, A Survey of Unfolding Methods for Particle Physics, Proc. Advanced Statistical Techniques in Particle Physics, Durham (2002), http://www.ippp.dur.ac.uk/Workshops/02/statistics/
• G. Cowan, Statistical Data Analysis, Oxford University Press (1998), Chapter 11: Unfolding
• R. Barlow, SLUO Lectures on Numerical Methods in HEP (2000), Lecture 9: Unfolding, www-group.slac.stanford.edu/sluo/Lectures/Stat_Lectures.html
References - Techniques
• V. Blobel, Unfolding Methods in High Energy Physics, DESY 84-118 (1984); also CERN 85-02
• A. Höcker and V. Kartvelishvili, SVD Approach to Data Unfolding, NIM A 372 (1996) 469, www.lancs.ac.uk/depts/physics/staff/kartvelishvili.html
• K. Tackmann and H. Lacker, Unfolding the Hadronic Mass Spectrum in B->Xu lν Decays, BAD 894
• G. D’Agostini, A multidimensional unfolding method based on Bayes’ theorem, NIM A 362 (1995) 487