Date post: 11-Jan-2016
Physics data management tools: computational evolutions and benchmarks Mincheol Han 1 , Chan-Hyeung Kim 1 , Lorenzo Moneta 2 , Maria Grazia Pia 3 , Hee Seo 1 1 Hanyang University, Korea – 2 CERN, Switzerland 3 INFN Sezione Di Genova, Italy SNA + MC 2010 Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2010

Physics data management tools: computational evolutions and benchmarks

Mincheol Han¹, Chan-Hyeung Kim¹, Lorenzo Moneta², Maria Grazia Pia³, Hee Seo¹

¹ Hanyang University, Korea – ² CERN, Switzerland – ³ INFN Sezione Di Genova, Italy

SNA + MC 2010

Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2010


Physics data libraries

Data libraries
- Collections of experimental or theoretical tabulations of physics quantities, e.g. cross sections, form factors, nuclear and atomic parameters
- Distributed by data centres: RSICC (ORNL), NEA, NIST…

Essential ingredient of Monte Carlo simulation
- Use established data
- Speed up simulation with respect to using analytical formulae
- Common background for different Monte Carlo systems

ENDF/B, ENSDF, JENDL, CENDL, BROND, EEDL, EPDL, EADL…


Dealing with physics data

Data management
- Load (and store) data
- Retrieve data
- Use data: directly or by interpolation

Loading
- Usually in the simulation initialization phase
- Loading on demand

Retrieving
- In the course of the simulation (usually at each step)
- Can be a source of significant overhead
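The load, retrieve and interpolate steps listed above can be sketched in a few lines of C++. This is a minimal illustration only, not the Geant4 implementation: `DataTable`, `LoadTable` and `FindValue` are hypothetical names, the data come from memory rather than from library files, and linear interpolation with clamping at the grid edges is an assumption made for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal table: tabulated values on an increasing energy grid.
struct DataTable {
    std::vector<double> energies;  // sorted in increasing order
    std::vector<double> values;    // same length as energies
};

// "Load": in this sketch the data are handed over from memory;
// a real loader would parse the data library files instead.
inline DataTable LoadTable(const std::vector<double>& energies,
                           const std::vector<double>& values) {
    assert(energies.size() == values.size() && energies.size() >= 2);
    assert(std::is_sorted(energies.begin(), energies.end()));
    return DataTable{energies, values};
}

// "Retrieve": binary search for the bracketing bin, then linear interpolation.
// Energies outside the grid are clamped to the first or last tabulated value.
inline double FindValue(const DataTable& table, double energy) {
    if (energy <= table.energies.front()) return table.values.front();
    if (energy >= table.energies.back()) return table.values.back();
    // First grid point strictly above the requested energy.
    auto upper = std::upper_bound(table.energies.begin(), table.energies.end(), energy);
    std::size_t i = static_cast<std::size_t>(upper - table.energies.begin());
    double e1 = table.energies[i - 1], e2 = table.energies[i];
    double v1 = table.values[i - 1], v2 = table.values[i];
    return v1 + (v2 - v1) * (energy - e1) / (e2 - e1);
}
```

Since `FindValue` may be called at every simulation step, its cost (search plus interpolation) is what a retrieval benchmark would measure.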


Original design in Geant4

[Class diagram: electromagnetic data (Livermore library)]

G4VEMDataSet
  Operations: +FindValue(), +GetComponent(), +AddComponent(), +NumberOfComponents(), +GetEnergies(), +GetData(), +SetEnergiesData(), +LoadData()

G4EMDataSet
  Attributes: -z: G4int, -energies: G4DataVector, -data: G4DataVector, -log_energies: G4DataVector, -log_data: G4DataVector, -algorithm: G4VDataSetAlgorithm
  Operations: +FindValue(), +GetComponent(), +AddComponent(), +NumberOfComponents(), +GetEnergies(), +GetData(), +SetEnergiesData(), +LoadData()

G4CompositeEMDataSet
  Attributes: -components: std::vector<G4VEMDataSet*>, -algorithm: G4VDataSetAlgorithm
  Operations: +FindValue(), +GetComponent(), +AddComponent(), +NumberOfComponents(), +GetEnergies(), +GetData(), +SetEnergiesData(), +LoadData(), -CleanUpComponents()

G4DNACrossSectionDataSet
  Attributes: -z: G4int, -components: std::vector<G4VEMDataSet*>, -algorithm: G4VDataSetAlgorithm
  Operations: +FindValue(), +GetComponent(), +AddComponent(), +NumberOfComponents(), +GetEnergies(), +GetData(), +SetEnergiesData(), +LoadData(), -CleanUpComponents()

G4ShellEMDataSet
  Attributes: -z: G4int, -components: std::vector<G4VEMDataSet*>, -algorithm: G4VDataSetAlgorithm
  Operations: +FindValue(), +GetComponent(), +AddComponent(), +NumberOfComponents(), +GetEnergies(), +GetData(), +SetEnergiesData(), +LoadData(), #CleanUpComponents()

G4VDataSetAlgorithm
  Operations: +Calculate(), +Clone()

G4LinInterpolation, G4LinLogInterpolation, G4LinLogLogInterpolation, G4SemiLogInterpolation, G4LogLogInterpolation
  Operations (each class): +Calculate(), +Clone()

Composite Pattern: handle different data collections transparently
- Data for materials
- Data for atoms
- Data for shells

Strategy Pattern: handle interchangeable interpolation algorithms transparently
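The combination of Composite and Strategy described on this slide can be sketched as follows. The class names are hypothetical and deliberately do not reuse the Geant4 ones; the signatures, the two-argument-free `FindValue`, and the choice to make a composite return the sum of its components are assumptions of this sketch, not the Geant4 behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Strategy: interchangeable interpolation algorithms behind one virtual interface.
struct InterpolationAlgorithm {
    virtual ~InterpolationAlgorithm() = default;
    virtual double Calculate(double x, double x1, double x2, double y1, double y2) const = 0;
};

struct LinearInterpolation : InterpolationAlgorithm {
    double Calculate(double x, double x1, double x2, double y1, double y2) const override {
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1);
    }
};

struct LogLogInterpolation : InterpolationAlgorithm {
    double Calculate(double x, double x1, double x2, double y1, double y2) const override {
        double slope = std::log(y2 / y1) / std::log(x2 / x1);
        return y1 * std::pow(x / x1, slope);
    }
};

// Composite: a single tabulation and a collection of tabulations share one interface.
struct DataSet {
    virtual ~DataSet() = default;
    virtual double FindValue(double energy) const = 0;
};

// Leaf: one tabulation plus the algorithm used to interpolate it.
class TabulatedDataSet : public DataSet {
public:
    TabulatedDataSet(std::vector<double> energies, std::vector<double> values,
                     std::shared_ptr<const InterpolationAlgorithm> algorithm)
        : energies_(std::move(energies)), values_(std::move(values)),
          algorithm_(std::move(algorithm)) {
        assert(energies_.size() == values_.size() && energies_.size() >= 2);
    }
    double FindValue(double energy) const override {
        // Linear scan for the upper edge of the bin; the first or last bin
        // is used (extrapolation) when the energy lies outside the grid.
        std::size_t i = 1;
        while (i + 1 < energies_.size() && energies_[i] < energy) ++i;
        return algorithm_->Calculate(energy, energies_[i - 1], energies_[i],
                                     values_[i - 1], values_[i]);
    }
private:
    std::vector<double> energies_, values_;
    std::shared_ptr<const InterpolationAlgorithm> algorithm_;
};

// Composite node: e.g. an atom made of shells. Summing the components is an
// assumption of this sketch.
class CompositeDataSet : public DataSet {
public:
    void AddComponent(std::unique_ptr<DataSet> component) {
        components_.push_back(std::move(component));
    }
    std::size_t NumberOfComponents() const { return components_.size(); }
    double FindValue(double energy) const override {
        double sum = 0.0;
        for (const auto& component : components_) sum += component->FindValue(energy);
        return sum;
    }
private:
    std::vector<std::unique_ptr<DataSet>> components_;
};
```

Every `FindValue` and `Calculate` call here goes through a virtual table, which is the run-time cost that a design based on templates would avoid.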


Can we improve it?

Geant4 physics on a diet
- Leaner software design
- Improve computational performance
- Enhance clarity and transparency
- Facilitate testing
- Ease maintenance

CHEP 2009 R&D: physics models
- M.G. Pia et al., Design and performance evaluations of generic programming techniques in a R&D prototype of Geant4 physics

Monte Carlo + CHEP 2010 R&D: physics data
- Prototype to evaluate candidate solutions quantitatively

This talk: selection of preliminary results

Final and complete results will be documented in a dedicated publication


Test set-up

Test case: Livermore library data
- EEDL (Evaluated Electron Data Library): ionisation, Bremsstrahlung
- EPDL97 (Evaluated Photon Data Library): Compton and Rayleigh scattering, photoelectric effect, pair and triplet production
- EADL (Evaluated Atomic Data Library): atomic parameters

Computing environment
- Geant4 9.4-beta + G4EMLOW 6.13
- Intel® Core™ Duo CPU E8500 with 3.16 GHz processor, 4 GB RAM, Linux SLC5, gcc 4.3.5 compiler
- Intel® CPU U4100 with 1.30 GHz processor, 2 GB RAM, MS Windows XP SP3, MSVC++9 C++ compiler (with SP1)

Load test
- Loading data for a number of elements between 1 and 100
- Each experiment repeated 100 times, the whole series repeated 10 times

Retrieve test
- Finding the data associated with a randomly chosen atomic number
- Finding procedure repeated 10^6 times, whole experiment repeated 10 times
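A retrieve test of this kind could be timed along the following lines. This is a sketch under stated assumptions, not the harness used for the talk: `Timing`, `TimeRetrieval` and `RepeatRetrieval` are hypothetical names, a `std::map<int, double>` stands in for the loaded physics data, and the random number generation is included in the timed loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Result of one timed experiment.
struct Timing {
    double milliseconds;  // wall-clock time of the whole experiment
    double checksum;      // sum of retrieved values, so the loop has an observable result
};

// One experiment: look up the entry of a randomly chosen atomic number, nLookups times.
// Precondition: `table` holds an entry for every Z in [1, zMax].
inline Timing TimeRetrieval(const std::map<int, double>& table, int zMax,
                            std::size_t nLookups, unsigned seed) {
    assert(zMax >= 1);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickZ(1, zMax);
    double checksum = 0.0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < nLookups; ++i) {
        checksum += table.at(pickZ(rng));  // timed part: random draw plus lookup
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return Timing{elapsed.count(), checksum};
}

// Repeat the whole experiment, using a different seed for each repetition.
inline std::vector<Timing> RepeatRetrieval(const std::map<int, double>& table, int zMax,
                                           std::size_t nLookups, std::size_t nRepetitions) {
    std::vector<Timing> results;
    results.reserve(nRepetitions);
    for (std::size_t r = 0; r < nRepetitions; ++r) {
        results.push_back(TimeRetrieval(table, zMax, nLookups, static_cast<unsigned>(r)));
    }
    return results;
}
```

With the settings quoted on the slide, the call would use 1000000 lookups and 10 repetitions; the per-repetition timings can then be averaged or compared across container types.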


Data structure

Improve the physical design of the data library itself

Large tabulations split into individual files (one per element)

[Figure: excitation data. Time (ms) to load data vs. number of elements present in the experimental set-up (count of active Z); original data vs. split data.]
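With one file per element, a set-up only has to read the elements it actually contains. A minimal sketch of such on-demand loading follows; the file-naming scheme in `ElementFileName`, the `OnDemandStore` class and the loader callback are all hypothetical, and the callback replaces real file parsing so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using ElementData = std::vector<double>;
// Loader callback: returns the tabulation of one element (a real one would parse a file).
using ElementLoader = std::function<ElementData(int)>;

// Hypothetical naming scheme for a library split into one file per element.
inline std::string ElementFileName(const std::string& prefix, int z) {
    return prefix + "-" + std::to_string(z) + ".dat";
}

class OnDemandStore {
public:
    explicit OnDemandStore(ElementLoader loader) : loader_(std::move(loader)) {}

    // Return the data of element z, loading them the first time they are requested.
    const ElementData& Get(int z) {
        assert(z >= 1);
        auto it = cache_.find(z);
        if (it == cache_.end()) {
            it = cache_.emplace(z, loader_(z)).first;
            ++loadCount_;
        }
        return it->second;
    }

    // Number of elements actually loaded so far.
    std::size_t LoadCount() const { return loadCount_; }

private:
    ElementLoader loader_;
    std::map<int, ElementData> cache_;
    std::size_t loadCount_ = 0;
};
```

Elements that never appear in the set-up are never loaded, so the loading cost scales with the number of active elements rather than with the size of the whole library.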


Data structure

Large physics tabulations require large memory allocation for storing the data, and time to load them into memory and to search through them

Are all the data really necessary?

Reduce the amount of data

Given three consecutive data points A, B, C: suppress B if it can be interpolated with the same accuracy based on A and C

[Figure: Compton scattering functions, two panels. Number of data (rows) for each element vs. Z, original vs. reduced data. Time (ms) to load data vs. number of elements present in the experimental set-up, original vs. reduced data.]
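The reduction rule (drop B when A and C reproduce it) can be sketched as a single pass over the table. The details here are assumptions: linear interpolation, a relative tolerance as the accuracy criterion, and the hypothetical names `Point`, `Interpolate` and `ReduceTable`. To stay safe when several consecutive points are dropped, the sketch re-checks every dropped point against the widened interval.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x;  // e.g. energy or momentum transfer
    double y;  // tabulated value
};

// Linear interpolation between a and c, evaluated at x.
inline double Interpolate(const Point& a, const Point& c, double x) {
    return a.y + (c.y - a.y) * (x - a.x) / (c.x - a.x);
}

// Return a thinned copy of `points` (sorted by strictly increasing x).
// An interior point is dropped only if interpolation between the surviving
// neighbours reproduces it within the given relative tolerance.
inline std::vector<Point> ReduceTable(const std::vector<Point>& points, double tolerance) {
    assert(tolerance >= 0.0);
    if (points.size() < 3) return points;
    std::vector<Point> kept;
    kept.push_back(points.front());
    std::size_t a = 0;  // index of the last kept point (A)
    for (std::size_t c = 2; c < points.size(); ++c) {
        // Can every point strictly between A and C be recovered from A and C?
        bool recoverable = true;
        for (std::size_t b = a + 1; b < c; ++b) {
            double estimate = Interpolate(points[a], points[c], points[b].x);
            if (std::fabs(estimate - points[b].y) > tolerance * std::fabs(points[b].y)) {
                recoverable = false;
                break;
            }
        }
        if (!recoverable) {
            // The point just before C is needed: keep it and make it the new A.
            a = c - 1;
            kept.push_back(points[a]);
        }
    }
    kept.push_back(points.back());
    return kept;
}
```

For a table that is linear up to a kink, only the end points and the point before the kink survive; smooth data sets shrink the most.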


Use forthcoming C++ features

Current implementation uses STL map for most data, STL vector for a few data types

Evaluated unordered_map (AKA hash map)

Included in C++0x TR1
- <tr1/unordered_map> in gcc 4.3.x
- <unordered_map> in MSVC

[Figure: pair production cross sections. Time (ms) to load data vs. number of elements present in the experimental set-up (count of active Z); STL map vs. unordered_map.]
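The two containers expose nearly the same interface, so swapping one for the other can be a one-line change. The sketch below uses the standard `std::unordered_map` from `<unordered_map>` (C++11) rather than the TR1 header available at the time of the talk; `Tabulation`, `FindElement` and `MakeStore` are hypothetical names, and the stored values are placeholders.

```cpp
#include <cassert>
#include <map>
#include <unordered_map>
#include <vector>

using Tabulation = std::vector<double>;

using OrderedStore = std::map<int, Tabulation>;            // balanced tree: O(log n) lookup
using HashedStore  = std::unordered_map<int, Tabulation>;  // hash table: O(1) average lookup

// Generic lookup: works with any associative container keyed by atomic number.
// Returns nullptr when the element is not present.
template <typename Container>
const Tabulation* FindElement(const Container& data, int z) {
    auto it = data.find(z);
    return it == data.end() ? nullptr : &it->second;
}

// Fill any such container with placeholder tabulations for Z = 1..zMax.
template <typename Container>
Container MakeStore(int zMax) {
    assert(zMax >= 1);
    Container store;
    for (int z = 1; z <= zMax; ++z) {
        store[z] = Tabulation(4, static_cast<double>(z));
    }
    return store;
}
```

Because the calling code only depends on `find` and `end`, the container type can be benchmarked as a template parameter without touching the lookup logic.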


Caching pre-calculated data

Recent modification in Geant4 low energy electromagnetic package: cache pre-calculated log10 data

Credit to current Geant4 low energy electromagnetic group

Not to be credited to the authors of this talk

The authors of this talk
- Quantified the time for loading/retrieving
- Quantified the memory consumption to store additional (cached) data
- Reviewed the modified software design and implementation: flaws

~10% time saving with respect to on-the-fly log10 calculation

[Figure: two panels, loading and retrieving. Time (ms) to load and retrieve data vs. number of elements present in the experimental set-up (count of active Z); original vs. modified.]
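The trade-off behind caching can be illustrated with a small table that stores the logarithms next to the raw values. This is a sketch, not the Geant4 code: `CachedLogTable` and its member names are hypothetical, log-log interpolation is assumed, and all energies and data are assumed to be strictly positive.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

class CachedLogTable {
public:
    CachedLogTable(std::vector<double> energies, std::vector<double> data)
        : energies_(std::move(energies)), data_(std::move(data)) {
        assert(energies_.size() == data_.size() && energies_.size() >= 2);
        // Pay the log10 cost once, at load time, and double the stored data.
        for (double e : energies_) logEnergies_.push_back(std::log10(e));
        for (double d : data_) logData_.push_back(std::log10(d));
    }

    // Log-log interpolation from cached logarithms: one log10 call per lookup.
    double FindValueCached(double energy) const {
        std::size_t i = UpperEdge(energy);
        double slope = (logData_[i] - logData_[i - 1]) / (logEnergies_[i] - logEnergies_[i - 1]);
        return std::pow(10.0, logData_[i - 1] + slope * (std::log10(energy) - logEnergies_[i - 1]));
    }

    // Same interpolation with every logarithm computed on the fly: five log10 calls.
    double FindValueOnTheFly(double energy) const {
        std::size_t i = UpperEdge(energy);
        double le1 = std::log10(energies_[i - 1]), le2 = std::log10(energies_[i]);
        double ld1 = std::log10(data_[i - 1]), ld2 = std::log10(data_[i]);
        double slope = (ld2 - ld1) / (le2 - le1);
        return std::pow(10.0, ld1 + slope * (std::log10(energy) - le1));
    }

    // Number of additional doubles held in memory because of the cache.
    std::size_t CachedValues() const { return logEnergies_.size() + logData_.size(); }

private:
    // Index of the upper edge of the bin containing `energy`, clamped to a valid bin.
    std::size_t UpperEdge(double energy) const {
        auto it = std::upper_bound(energies_.begin(), energies_.end(), energy);
        std::size_t i = static_cast<std::size_t>(it - energies_.begin());
        return std::min(std::max<std::size_t>(i, 1), energies_.size() - 1);
    }

    std::vector<double> energies_, data_, logEnergies_, logData_;
};
```

Both lookups return the same value; the cached variant trades memory (twice as many stored numbers) and extra loading work for fewer `log10` calls at retrieval time.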


Generic programming techniques

[Class diagram: prototype design]

G4EMDataSet
  Attributes: -Z: int, -energies: vector<double>, -data: vector<double>, -log_energies: vector<double>, -log_data: vector<double>
  Operations: +GetEnergies(), +GetData(), +GetLogEnergies(), +GetLogData(), +LoadData(), +SetEnergiesData()

G4GenericCrossSectionHandler
  Template parameters: Interpmethod: typename, CONT: typename, LENGTH = 1
  Attributes: -dataMap: unordered_map<int, CONT>
  Operations: +FindValue(), +LoadData()

G4ShellEMDataSet
  Attributes: -Z: int, -components: vector<G4EMDataSet*>
  Operations: +AddComponent(), +GetComponent(), +GetEnergies(), +GetData(), +GetLogEnergies(), +GetLogData(), +LoadData(), #CleanUpComponents()

G4SpectrumEMDataSet
  Attributes: -Z: int, -components: vector<G4EMDataSet*>
  Operations: +AddComponent(), +GetComponent(), +GetEnergies(), +GetData(), +GetLogEnergies(), +GetLogData(), +LoadData(), #CleanUpComponents()

G4SpectrumShellEMDataSet
  Attributes: -Z: int, -components: vector<G4ShellEMDataSet*>
  Operations: +GetComponent(), +GetEnergies(), +GetData(), +GetLogEnergies(), +GetLogData(), +LoadData(), #CleanUpComponents()

Interpolation
  Template parameters: CalcEnginePolicy: typename
  Operations: +GetInterpolationValue(), +GetLogEnergy(), +GetLogData()

Lin, LinLog, SemiLog, LogLog, LinLogLog (<<CppStruct>>)
  Operations (each struct): +Calc(), +IneedLogEnergy(), +IneedLogData()

Polymorphic behavior of data sets and interpolation algorithms through dynamic binding is not necessary at runtime

OOAD iteration

Preliminary design

Templates eliminate the overhead due to the virtual table associated with inheritance

Contribution to improving execution speed
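The policy-based alternative can be sketched as follows. The struct and member names mirror those visible in the diagram (`Lin`, `LogLog`, `Calc`, `IneedLogEnergy`, `IneedLogData`, `Interpolation`, `GetInterpolationValue`), but all signatures and bodies are assumptions of this sketch, since the slide shows none of them.

```cpp
#include <cassert>
#include <cmath>

// Policies: plain structs with static functions, so no virtual table is involved.
struct Lin {
    static double Calc(double x, double x1, double x2, double y1, double y2) {
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1);
    }
    static bool IneedLogEnergy() { return false; }
    static bool IneedLogData() { return false; }
};

struct LogLog {
    static double Calc(double x, double x1, double x2, double y1, double y2) {
        double slope = std::log(y2 / y1) / std::log(x2 / x1);
        return y1 * std::pow(x / x1, slope);
    }
    static bool IneedLogEnergy() { return true; }
    static bool IneedLogData() { return true; }
};

// Host class: the interpolation algorithm is a template parameter, so the call
// to Calc is resolved at compile time and can be inlined.
template <typename CalcEnginePolicy>
struct Interpolation {
    static double GetInterpolationValue(double x, double x1, double x2,
                                        double y1, double y2) {
        assert(x1 != x2);
        return CalcEnginePolicy::Calc(x, x1, x2, y1, y2);
    }
};
```

Choosing the algorithm is then a matter of naming the type, for example `Interpolation<LogLog>::GetInterpolationValue(...)`; the price is that the choice is fixed at compile time.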


Effect of prototype design: loading

[Figure: two panels, Rayleigh scattering form factors and Bremsstrahlung cross sections. Time (ms) to load data vs. number of elements present in the experimental set-up (count of active Z); original vs. prototype.]

The extent of the improvement depends on the characteristics of the data

Original design: STL vectors, load all elements


Effect of prototype design: retrieving

[Figure: two panels, pair production cross sections and Bremsstrahlung spectrum data. Time (ms) to retrieve data vs. number of elements present in the experimental set-up (count of active Z); original design, prototype design, prototype design + unordered_map.]


Use vectors!

Some data sets in the original design do not require the use of STL map

Can be efficiently managed by using STL vectors

Not worthwhile to move them to unordered_map

[Figure: Rayleigh scattering form factors. Time (ms) to retrieve data vs. number of elements present in the experimental set-up (count of active Z); original design, prototype design (map), prototype design (unordered_map).]
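One case where a vector is sufficient is data keyed by atomic number, since the keys are small, dense integers that can serve directly as indices. A minimal sketch, with the hypothetical class `ElementVectorStore` and the assumption that an empty tabulation marks an element without data:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Tabulation = std::vector<double>;

// Data sets stored in a vector indexed directly by atomic number Z.
// Slot 0 is unused so that Z can be used as the index without an offset.
class ElementVectorStore {
public:
    explicit ElementVectorStore(int zMax)
        : slots_(static_cast<std::size_t>(zMax) + 1) {}

    void Set(int z, Tabulation data) {
        assert(InRange(z));
        slots_[static_cast<std::size_t>(z)] = std::move(data);
    }

    // Direct indexing: no tree traversal and no hashing.
    const Tabulation& Get(int z) const {
        assert(InRange(z));
        return slots_[static_cast<std::size_t>(z)];
    }

    // An element counts as present when its tabulation is not empty.
    bool Has(int z) const { return InRange(z) && !slots_[static_cast<std::size_t>(z)].empty(); }

private:
    bool InRange(int z) const {
        return z >= 1 && static_cast<std::size_t>(z) < slots_.size();
    }
    std::vector<Tabulation> slots_;
};
```

The cost is one (possibly empty) slot per atomic number up to `zMax`, which is negligible for about 100 elements.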


Conclusions

Prototype R&D on Geant4 physics data management

Investigated
- Data structure
- Software design
- Use of C++0x TR1 features

Results
- Leaner software
- Improved performance

RD44 (1994-1998): Geant4 R&D phase
- Cutting edge technology
- Rigorous software development process

Geant4 would profit from reenacting an R&D phase to exploit new technology, with the same spirit of scientific openness and rigour as RD44

Same conclusions at CHEP 2009 regarding physics modeling

Acknowledgment

Thanks to CERN Directorate for support

