Brussels Grid Meeting (Mar. 23, 2001)
Paul Avery 1
Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

Extending the Grid Reach in Europe
Brussels, Mar. 23, 2001
http://www.phys.ufl.edu/~avery/griphyn/talks/avery_brussels_23mar01.ppt
Global Data Grids: The Need for Infrastructure
Global Data Grid Challenge
“Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale.”
Data Intensive Science: 2000-2015
Scientific discovery increasingly driven by IT:
- Computationally intensive analyses
- Massive data collections
- Rapid access to large subsets
- Data distributed across networks of varying capability

Dominant factor: data growth (1 Petabyte = 1000 TB)
- 2000: ~0.5 Petabyte
- 2005: ~10 Petabytes
- 2010: ~100 Petabytes
- 2015: ~1000 Petabytes?

How to collect, manage, access, and interpret this quantity of data?
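The milestones above imply a remarkably steady compound growth rate. A small sketch checking this, using only the figures on this slide (2000x growth over 15 years):

```python
import math

# Milestones from the slide: year -> Petabytes
milestones = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}

# Implied compound annual growth over 2000-2015
factor = milestones[2015] / milestones[2000]   # 2000x in 15 years
annual_growth = factor ** (1 / 15)             # growth multiplier per year
print(round(annual_growth, 2))                 # ~1.66, i.e. ~66% per year

# Doubling time at that rate
doubling_years = math.log(2) / math.log(annual_growth)
print(round(doubling_years, 2))                # ~1.37 years
```

So the projection amounts to the data volume doubling roughly every 16 months for a decade and a half.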
Data Intensive Disciplines
- High energy & nuclear physics
- Gravity wave searches (e.g., LIGO, GEO, VIRGO)
- Astronomical sky surveys (e.g., Sloan Digital Sky Survey)
- Global "Virtual" Observatory
- Earth Observing System
- Climate modeling
- Geophysics
Data Intensive Biology and Medicine
- Radiology data
- X-ray sources (APS crystallography data)
- Molecular genomics (e.g., Human Genome)
- Proteomics (protein structure, activities, …)
- Simulations of biological molecules in situ
- Human Brain Project
- Global Virtual Population Laboratory (disease outbreaks)
- Telemedicine
- Etc.

Commercial applications are not far behind.
The Large Hadron Collider at CERN
[Figure: the "Compact" Muon Solenoid (CMS) detector at the LHC, with a standard man shown for scale]
LHC Computing Challenges
- Complexity of LHC environment and resulting data
- Scale: Petabytes of data per year (100 PB by 2010)
- Global distribution of people and resources

CMS Experiment: 1800 physicists, 150 institutes, 32 countries
Global LHC Data Grid Hierarchy
[Diagram: Tier 0 (CERN) at the top, fanning out to multiple Tier 1, Tier 2, Tier 3, and Tier 4 sites]
- Tier 0: CERN
- Tier 1: National Lab
- Tier 2: Regional Center at University
- Tier 3: University workgroup
- Tier 4: Workstation

GriPhyN: R&D, Tier 2 centers, unify all IT resources
Global LHC Data Grid Hierarchy
[Diagram: LHC data flow through the tier hierarchy]
- Experiment → Online System: ~PBytes/sec
- Online System → Tier 0 (+1), the CERN Computer Center (>20 TIPS): ~100 MBytes/sec
- Tier 0 → Tier 1 national centers (France, USA, Italy, UK): 2.5-10 Gb/sec
- Tier 1 → Tier 2 regional centers: ~622 Mbits/sec
- Tier 2 → Tier 3 institutes (~0.25 TIPS each, with physics data caches): 100-1000 Mbits/sec
- Tier 4: workstations, other portals

- Bunch crossings every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
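The ~100 MBytes/sec figure into the Tier 0 center follows directly from the trigger rate and event size. A quick sketch, using the slide's numbers plus one stated assumption (a nominal 10^7 seconds of running per year, a common accelerator rule of thumb not given on the slide):

```python
# Figures from the slide
event_size_mb = 1.0        # each event is ~1 MByte
trigger_rate_hz = 100      # 100 triggers (selected events) per second

# Bandwidth from the online system into the Tier 0 center
bandwidth_mb_per_s = event_size_mb * trigger_rate_hz
print(bandwidth_mb_per_s)  # 100.0 MBytes/sec, matching the diagram

# Yearly raw-data volume, assuming ~10^7 live seconds per year
seconds_per_year = 1e7
petabytes_per_year = bandwidth_mb_per_s * seconds_per_year / 1e9  # 1 PB = 10^9 MB
print(petabytes_per_year)  # 1.0 PByte/year of raw data per experiment
```

Several experiments plus derived and simulated data then push the total well beyond the raw petabyte per year.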
Global Virtual Observatory
- Source catalogs, image data
- Specialized data: spectroscopy, time series, polarization
- Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
- Discovery tools: visualization, statistics
- Standards

Multi-wavelength astronomy, multiple surveys
GVO: The New Astronomy
- Large, globally distributed database engines
  - Integrated catalog and image databases
  - Multi-Petabyte data size
  - GByte/s aggregate I/O speed per site
- High-speed (>10 Gbits/s) backbones
  - Cross-connecting, correlating the major archives
- Scalable computing environment
  - 100s-1000s of CPUs for statistical analysis and discovery
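The aggregate I/O figure sets the timescale for any full pass over the archive, which is why it appears alongside the data size. A small sketch under the slide's numbers (taking 1 PB as the low end of "multi-Petabyte" and 1 GByte/s aggregate I/O per site):

```python
# Slide figures: multi-Petabyte datasets, ~GByte/s aggregate I/O per site
dataset_pb = 1.0               # low end of "multi-Petabyte"
io_gb_per_s = 1.0              # aggregate I/O per site

dataset_gb = dataset_pb * 1e6  # 1 PB = 10^6 GB
scan_seconds = dataset_gb / io_gb_per_s
scan_days = scan_seconds / 86400
print(round(scan_days, 1))     # ~11.6 days for one full scan of 1 PB
```

Even a single full-archive correlation is a multi-day operation per petabyte, which motivates both the fast backbones and the large CPU pools listed above.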
Infrastructure for Global Grids
Grid Infrastructure
- Grid computing is sometimes compared to the electric grid
  - You plug in to get a resource (CPU, storage, …)
  - You don't care where the resource is located
- This analogy might have an unfortunate downside
  - You might need different sockets!
Role of Grid Infrastructure
- Provide essential common Grid infrastructure
  - Cannot afford to develop separate infrastructures
- Meet needs of high-end scientific collaborations
  - Already international and even global in scope
  - Need to share heterogeneous resources among members
  - Experiments drive future requirements
- Be broadly applicable outside science
  - Government agencies: national, regional (EU), UN
  - Non-governmental organizations (NGOs)
  - Corporations, business networks (e.g., supplier networks)
  - Other "virtual organizations"
- Be scalable to the global level
  - But EU + US is a good starting point
A Path to Common Grid Infrastructure
- Make a concrete plan
- Have clear focus on infrastructure and standards
- Be driven by high-performance applications
- Leverage resources & act coherently
- Build large-scale Grid testbeds
- Collaborate with industry
Building Infrastructure from Data Grids
Three Data Grid projects recently funded:
- Particle Physics Data Grid (US, DOE)
  - Data Grid applications for HENP
  - Funded 2000, 2001
  - http://www.ppdg.net/
- GriPhyN (US, NSF)
  - Petascale Virtual-Data Grids
  - Funded 9/2000 – 9/2005
  - http://www.griphyn.org/
- European Data Grid (EU)
  - Data Grid technologies, EU deployment
  - Funded 1/2001 – 1/2004
  - http://www.eu-datagrid.org/

Common threads: HEP; focus on infrastructure development & deployment; international scope.
Background on Data Grid Projects
- They support several disciplines
  - GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
  - PPDG: CS, HEP (LHC + current experiments), nuclear physics, networking
  - DataGrid: CS, HEP, earth sensing, biology, networking
- They are already joint projects
  - Each serving the needs of multiple constituencies
  - Each driven by high-performance scientific applications
  - Each has international components
  - Their management structures are interconnected
- Each project is developing and deploying infrastructure
  - US$23M funded (additional proposals for US$35M)
- What if they join forces?
A Common Infrastructure Opportunity
- GriPhyN + PPDG + EU-DataGrid + national efforts (France, Italy, UK, Japan)
  - Have agreed to collaborate and develop joint infrastructure
  - Initial meeting March 4 in Amsterdam to discuss issues
  - Future meetings in June, July
- Preparing management document
  - Joint management, technical boards + steering committee
  - Coordination of people, resources
  - An expectation that this will lead to real work
- Collaborative projects
  - Grid middleware
  - Integration into applications
  - Grid testbed: iVDGL
  - Network testbed (Foster): T3 = Transatlantic Terabit Testbed
iVDGL: International Virtual-Data Grid Laboratory
- A place to conduct Data Grid tests at scale
- A concrete manifestation of world-wide grid activity
- A continuing activity that will drive Grid awareness
- A basis for further funding

Scale of effort:
- National and international scale Data Grid tests and operations
- Computationally and data intensive computing
- Fast networks

Who:
- Initially US-UK-EU; other world regions later
- Discussions with Russia, Japan, China, Pakistan, India, South America
iVDGL Parameters
- Local control of resources is vitally important
  - Experiments, politics demand it
  - US, UK, France, Italy, Japan, ...
- Grid exercises
  - Must serve clear purposes
  - Will require configuration changes (not trivial)
  - "Easy", intra-experiment tests first (10-20% of resources, national, transatlantic)
  - "Harder" wide-scale tests later (50-100% of all resources)
- Strong interest from other disciplines
  - Our CS colleagues (wide-scale tests)
  - Other HEP + NP experiments
  - Virtual Observatory (VO) community in Europe/US
  - Gravity wave community in Europe/US/(Japan?)
  - Bioinformatics
Revisiting the Infrastructure Path
- Make a concrete plan
  - GriPhyN + PPDG + EU DataGrid + national projects
- Have clear focus on infrastructure and standards
  - Already agreed
  - COGS (Consortium for Open Grid Software) to drive standards?
- Be driven by high-performance applications
  - Applications are manifestly high-performance: LHC, GVO, LIGO/GEO/Virgo, …
  - Identify challenges today to create tomorrow's Grids
Revisiting the Infrastructure Path (cont)
- Leverage resources & act coherently
  - Well-funded experiments depend on Data Grid infrastructure
  - Collaborate with national laboratories: FNAL, BNL, RAL, Lyon, KEK, …
  - Collaborate with other Data Grid projects: US, UK, France, Italy, Japan
  - Leverage new resources: DTF, CAL-IT2, …
  - Work through Global Grid Forum
- Build and maintain large-scale Grid testbeds
  - iVDGL
  - T3
- Collaboration with industry (next slide)
- EC investment in this opportunity
  - Leverage and extend existing projects, worldwide expertise
  - Invest in testbeds
  - Work with national projects (US/NSF, UK/PPARC, …)
  - Part of the same infrastructure
Collaboration with Industry
- Industry efforts are similar, but only in spirit
  - ASP, P2P, home PCs, …
  - IT industry mostly has not invested in Grid R&D
  - We have different motives, objectives, timescales
- Still many areas of common interest
  - Clusters, storage, I/O
  - Low-cost cluster management
  - High-speed, distributed databases
  - Local and wide-area networks, end-to-end performance
  - Resource sharing, fault-tolerance, …
- Fruitful collaboration requires clear objectives
- EC could play an important role in enabling collaborations
Status of Data Grid Projects
- GriPhyN
  - US$12M funded by NSF/ITR 2000 program (5-year R&D)
  - 2001 supplemental funds requested for initial deployments
  - Submitting 5-year proposal ($15M) to NSF
  - Intend to fully develop production Data Grids
- Particle Physics Data Grid
  - Funded in 1999, 2000 by DOE ($1.2M per year)
  - Submitting 3-year proposal ($12M) to DOE Office of Science
- EU DataGrid
  - 10M Euros funded by EU (3 years, 2001 – 2004)
  - Submitting proposal in April for additional funds
Other projects?
Grid References
- Grid Book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- PPDG: www.ppdg.net
- EU DataGrid: www.eu-datagrid.org/
- GriPhyN: www.griphyn.org
Summary
- Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- Global Data Grids provide the challenges needed to build tomorrow's Grids
- We have a major opportunity to create common infrastructure
- Many challenges remain during the coming transition
  - New Grid projects will provide rich experience and lessons
  - Difficult to predict the situation even 3-5 years ahead