
Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor...

Date post: 19-Dec-2015
Transcript
Page 1: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu.

Data-Intensive Science: Addressing common needs with shared tools

Christopher Stubbs

Professor

Department of Physics

Department of Astronomy

cstubbs@fas.harvard.edu

Page 2:

Storing, analyzing, and exploiting large data sets

Searching for dark matter and dark energy

Searching for new elementary particles

Detailed imaging of brain function

Page 3:

Some common threads

• Ambitious instruments, copious data
  • E.g. tens of TB per night from imminent astronomy surveys
• Loosely coupled computing
  • Don’t need linked analysis that uses all images
• Diverse applications from common data
• Simulations are an integral aspect
• Build apparatus here, run it elsewhere
• International collaborations
• Computer science aspects
  • World’s largest non-proprietary databases
  • Clustering, data mining, file system optimization…
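The "loosely coupled computing" thread above can be sketched in Python. This is a minimal illustration under stated assumptions, not any survey's actual pipeline: `process_image` and the toy pixel lists are hypothetical stand-ins. The only point is that each image is handled on its own, so the work can be handed to a pool of workers with no communication between tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(pixels):
    """Hypothetical per-image step: return the mean pixel value.

    It reads only its own image, so calls are independent of one another.
    """
    return sum(pixels) / len(pixels)


def process_all(images, workers=4):
    """Apply the per-image step to every image; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_image, images))


if __name__ == "__main__":
    toy_images = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
    print(process_all(toy_images))
```

For CPU-bound work, `ProcessPoolExecutor` from the same module or a batch queue could fill the same role; the structure of the loop would not change.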

Page 4:

Page 5:

27 km

CERN, outside Geneva

Page 6:

Seriously Big Toys.

Harvard involvement in ATLAS detector:

• J. DaCosta and G. Brandenberg at CERN now, in shakedown
• Built muon chambers here
• J. Huth plays leadership role in scientific computing for LHC

Page 7:

Event Simulations

>30 Million event simulations are typical

Pick an interaction

Propagate through model of the detector

Measure detection efficiencies
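The three steps above (pick an interaction, propagate it through a detector model, measure the efficiency) can be sketched as a toy Monte Carlo. Everything specific here is an assumption made for illustration: the uniform energy spectrum, the threshold, and the hit probability are invented, and a real detector simulation is far more detailed.

```python
import random


def simulate_event(rng):
    """Pick an interaction: here, just a toy energy in arbitrary units."""
    return rng.uniform(0.0, 100.0)


def detector_sees(energy, rng, threshold=20.0, hit_probability=0.9):
    """Assumed toy detector model: blind below threshold, else a fixed hit chance."""
    return energy >= threshold and rng.random() < hit_probability


def detection_efficiency(n_events, seed=0):
    """Fraction of simulated events that the toy detector records."""
    rng = random.Random(seed)
    detected = sum(detector_sees(simulate_event(rng), rng) for _ in range(n_events))
    return detected / n_events


if __name__ == "__main__":
    print(detection_efficiency(100_000))
```

For this toy model the estimate should settle near 0.8 × 0.9 = 0.72 as the number of events grows, which gives a quick sanity check on the simulation loop.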

Page 8:

On-the-fly event reconstruction

Find tracks and trigger/store if interesting

Precise track determination

Aggregate event statistics
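The select-then-aggregate flow above might look roughly like the following. This is only a sketch: the `Event` record, the momentum cut, and its value are hypothetical, and track finding itself is assumed to have already happened upstream.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    # Hypothetical: one momentum value per track already found in the event.
    track_momenta: list = field(default_factory=list)


def is_interesting(event, min_momentum=25.0):
    """Assumed toy trigger: keep events with at least one high-momentum track."""
    return any(p >= min_momentum for p in event.track_momenta)


def run_trigger(events):
    """Stream events once; store the interesting ones and count all of them."""
    stored = []
    stats = {"seen": 0, "stored": 0}
    for event in events:
        stats["seen"] += 1
        if is_interesting(event):
            stored.append(event)
            stats["stored"] += 1
    return stored, stats
```

Because the loop makes a single pass and keeps only the selected events, memory use is set by the stored subset rather than by the full event stream.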

Page 9:

ATLAS computing

• 5 million lines of code
• 200 developers, worldwide
• 200 collision events per second
• Automated event selection in firmware
• Selected subset of events to disk
• These selected events distributed worldwide to a hierarchy of data centers.

Page 10:

Sky Surveys in Astronomy

Optical: PanSTARRS

1.4 Gpix, 1.8m

Radio: Mileura Wide-Field Array

1 km array of 8000 custom antennas

128 gigabit/s computing challenge
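To put the two figures above in more familiar units, a short conversion helps. The rate and pixel count come from the slide; the 2 bytes per pixel is an assumption (16-bit raw samples), and the per-day figure supposes the 128 gigabit/s stream runs continuously.

```python
SECONDS_PER_DAY = 86_400


def gigabits_per_second_to_bytes_per_day(gbit_per_s):
    """Convert a sustained rate in gigabit/s (10**9 bits) to bytes per day."""
    bytes_per_second = gbit_per_s * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY


def exposure_size_bytes(gigapixels, bytes_per_pixel=2):
    """Raw size of one exposure; 2 bytes/pixel is assumed, not from the slide."""
    return gigapixels * 1e9 * bytes_per_pixel


if __name__ == "__main__":
    print(gigabits_per_second_to_bytes_per_day(128) / 1e15, "PB per day")
    print(exposure_size_bytes(1.4) / 1e9, "GB per exposure")
```

Under those assumptions, 128 gigabit/s is 16 GB/s, or about 1.4 PB per day, and a 1.4 Gpix exposure is about 2.8 GB before any compression.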

Page 11:

Our View of the Expanding Universe

Close, Recent

Far, Ancient

Expansion causes stretching of light, “redshift”

Expansion history can be mapped by measuring both distances and redshifts

Page 12:

(Hubble Space Telescope, NASA)

Supernovae are powerful cosmological probes

Distances to ~6% from brightness

Redshifts from features in spectra

Page 13:

Redshift = Δλ / λ

[Figure: distance to supernova (nearby to far away) versus redshift Δλ / λ, with axis ticks at 0.01, 0.1, 1.0. Schmidt et al, High-z SN Team]
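The redshift definition above is simple enough to write out directly. In this sketch λ is taken to be the rest (emitted) wavelength of a spectral feature and Δλ the shift between observed and rest wavelengths; the numbers in the example are invented for illustration and are not measurements.

```python
def redshift(observed_wavelength, rest_wavelength):
    """Return z = (observed - rest) / rest, i.e. delta-lambda over lambda."""
    if rest_wavelength <= 0:
        raise ValueError("rest wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


if __name__ == "__main__":
    # Illustrative only: a feature emitted at 500 nm and observed at 750 nm.
    print(redshift(750.0, 500.0))
```

Both wavelengths only need to share the same unit, since the ratio is dimensionless.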

Page 14:

Near Earth Asteroids

• Inventory of solar system is incomplete
• R=1 km asteroids are dinosaur killers
• R=300m asteroids in ocean wipe out a coastline
• Demanding project: requires mapping the sky down to 24th magnitude every few days, individual exposures not to exceed ~20 sec.
• PanSTARRS will detect NEAs to ~400m

Page 15:

Cosmic Cinematography: Challenges

The “static” sky:

• optimal co-adding of images
• database issues

The transient sky:

• variability classification
• asteroid association and orbits
• light curve analysis
• fusion with other data sets
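As one concrete reading of "co-adding of images" in the static-sky list above, the sketch below stacks aligned exposures with inverse-variance weights. The slide does not say which method is meant, so this choice is an assumption; the code also assumes the images are already registered to the same pixel grid and that each has a single, uniform noise variance.

```python
def coadd(images, variances):
    """Inverse-variance weighted mean of aligned, same-shape images.

    images: list of 2-D lists of pixel values, already registered.
    variances: one noise variance per image (assumed uniform across the image).
    Returns the stacked image and the variance of each stacked pixel.
    """
    if not images or len(images) != len(variances):
        raise ValueError("need one variance per image and at least one image")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    rows, cols = len(images[0]), len(images[0][0])
    stacked = [[0.0] * cols for _ in range(rows)]
    for image, weight in zip(images, weights):
        for r in range(rows):
            for c in range(cols):
                stacked[r][c] += weight * image[r][c]
    stacked = [[value / total_weight for value in row] for row in stacked]
    return stacked, 1.0 / total_weight
```

Noisier exposures contribute less to the stack, and the variance of the result is smaller than that of any single input, which is the reason for co-adding in the first place.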

Page 16:

A New Approach to Radio Astronomy Hardware

Page 17:

A Brief History of the Universe

• culmination of structure formation
• first luminous structures
• turning point after the Dark Ages

Era of Reionization

[Figure labels: ionized, neutral (H), ionized; z~6.2; “The Gap]

Page 18:

BOOLARDY

Page 19:

Lincoln Greenhill (CfA), MWA project

Page 20:

IIC affords us the opportunity to share resources, tools and know-how

• Shared hardware maximizes effectiveness
• Shared archival data storage, cooperatively
• Reap benefits of sophisticated system administrators and database professionals
  People are quantized, unaffordable for single group
• Learn from each other on technical topics of common interest
  Often large discrepancies across subfields, IIC raises all boats.

Page 21:

8K x 8K pixel array

16 independent amplifiers

Each is a 1024 x 2048 subimage

Page 22: