Digital Data Handling with Modern Cyberinfrastructure Scott Teige [email protected] October 2009

Digital Data Handling with Modern Cyberinfrastructure

Scott Teige [email protected]

October 2009

Scott Teige

Contents

• The trend toward “born digital” data
• The bad old days
• The new days
• Examples: the new, the old

Trends

• The US will produce 113 million medical images in the next year (CNN)
• CT and MRI scans are “born digital”
• Physics has a long tradition of digital data acquisition, which continues with, for example, the latest CERN experiments
• Chemistry, Biology, Geology, Communication and Culture, Anthropology and Economics are also producing increasing amounts of data
• Hard drives are down to $0.07 per gigabyte; 8 GB thumb drives are swag at conferences.

The bad old days (~1992)

The bad old days, part 2

• Data written from the instrument to 8mm video tape (loss of ~5%)
• Tapes carried from DAQ computers to analysis computers
• Tapes carried (courier) from instrument building to “storage” facility at BNL (Patty M. office bookshelves)
• 2nd pass analysis on BNL mainframes (loss ~5%)
• Tapes copied to DLT (loss ~10%)
• … years pass …
• DLT copied to HPSS (loss ~5%)
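The per-stage losses quoted on this slide compound. As a rough sketch, assuming each quoted loss is an independent fraction of whatever data survived the previous stage (the slide does not say how the losses were measured), the cumulative effect can be estimated as follows. The stage labels and the function name are illustrative only.

```python
# Loss fractions quoted on the slide, in pipeline order.
# Assumption: each loss applies to the data that survived the prior stage.
stage_losses = [
    ("instrument -> 8mm tape", 0.05),
    ("2nd pass analysis on mainframes", 0.05),
    ("8mm tape -> DLT", 0.10),
    ("DLT -> HPSS", 0.05),
]


def surviving_fraction(losses):
    """Multiply the per-stage survival rates (1 - loss) together."""
    frac = 1.0
    for _name, loss in losses:
        frac *= 1.0 - loss
    return frac


survived = surviving_fraction(stage_losses)
print(f"surviving: {survived:.1%}, lost: {1 - survived:.1%}")
# prints "surviving: 77.2%, lost: 22.8%"
```

Under that assumption, roughly a fifth of the data never reaches the archive, even though no single step loses more than about 10%.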

Almost there …

• USArray, locations of the transportable seismographs.

Almost there …

• Data written to a hard drive on the seismometer
• Data uplinked via cell phone or satellite to central location
• Researchers request specific portions of the data via web interface
• Data sent via e-mail (small request) or hard drive to researcher (“large” request)
• Once a year, or so, someone goes to the seismographs and retrieves the hard drives…

A modern case

• The electron microscope in Simon Hall

A modern case

• Images are digitized by the instrument
• The digitized images are written directly to the Data Capacitor
• The Data Capacitor appears as a local file system on the researcher's desktop computer, BigRed, Quarry and some other TeraGrid systems
• The researcher does quality checks, tuning, optimization, etc. on his local workstation.
• CPU intensive analysis is done on the large systems provided by IU or the TeraGrid
• Data is archived daily to the HPSS (via high bandwidth connection from DC to HPSS)

Infrastructure, The Data Capacitor

• >300 TeraBytes

Infrastructure, HPSS

• >3 PetaBytes

Infrastructure, CPU Resources

Big Red [TeraGrid System]
• 30 TFLOPS IBM JS21 SuSE Cluster
• 768 blades/3072 cores: 2.5 GHz PPC 970MP
• 8GB Memory, 4 cores per blade
• Myrinet 2000
• LoadLeveler & Moab

Quarry [Future TeraGrid System]
• 7 TFLOPS IBM HS21 RHEL Cluster
• 140 blades/1120 cores: 2.0 GHz Intel Xeon 5335
• 8GB Memory, 8 cores per blade
• 1Gb Ethernet (upgrading to 10Gb)
• PBS (Torque) & Moab

Infrastructure, Network

• 10 GigE to parts of campus, 1GigE to entire system
• 4x10GigE from BigRed to DC
• 48x1GigE from Quarry to DC
• 15x10 GigE from DC to HPSS
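To put the link counts in perspective, the short sketch below adds up the nominal aggregate bandwidth of each bundle and estimates how long it would take to push the Data Capacitor's contents to HPSS. It rests on several assumptions that are not in the slides: decimal units (1 Gb = 10^9 bits, 1 TB = 10^12 bytes), transfers running at full line rate with no protocol or storage overhead, and the ">300 TeraBytes" figure taken as exactly 300 TB. Real throughput would be lower.

```python
GBIT = 1e9    # bits per gigabit (decimal units, an assumption)
TBYTE = 1e12  # bytes per terabyte (decimal units, an assumption)

# (number of links, nominal Gb/s per link), as listed on the slide.
links = {
    "BigRed -> DC": (4, 10),
    "Quarry -> DC": (48, 1),
    "DC -> HPSS": (15, 10),
}


def aggregate_gbps(count, gbps_each):
    """Nominal aggregate bandwidth of a bundle of identical links."""
    return count * gbps_each


def transfer_hours(n_bytes, gbps):
    """Idealized transfer time in hours at full line rate."""
    return n_bytes * 8 / (gbps * GBIT) / 3600


for name, (count, speed) in links.items():
    print(f"{name}: {aggregate_gbps(count, speed)} Gb/s nominal")

dc_to_hpss = aggregate_gbps(*links["DC -> HPSS"])
hours = transfer_hours(300 * TBYTE, dc_to_hpss)
print(f"300 TB over DC -> HPSS: about {hours:.1f} hours at line rate")
```

Even as an upper bound, this suggests the DC-to-HPSS path is sized so that a full copy of the Data Capacitor fits within a single day, which is consistent with the daily archiving described earlier.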

What does this give you?

What does this give you? FAQ

• How much data can I have?
    • All of it, right now.
• Where is my data?
    • Everywhere.
• Where can I analyze my data?
    • Anywhere.
• How long can I keep my data?
    • Forever.
• Is there a backup?
    • Yes, two of them.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Numbers 0116050 and 0521433. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).

This work was supported in part by the Indiana Metabolomics and Cytomics Initiative (METACyt). METACyt is supported in part by Lilly Endowment, Inc.

This work was supported in part by the Indiana Genomics Initiative. The Indiana Genomics Initiative of Indiana University is supported in part by Lilly Endowment, Inc.

This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.

