The Global Storage Grid
Or, Managing Data for “Science 2.0”
Ian Foster, Computation Institute
Argonne National Lab & University of Chicago
2
“Web 2.0”
• Software as services: data- & computation-rich network services
• Services as platforms: easy composition of services to create new capabilities (“mashups”), which may themselves be made accessible as new services
• Enabled by massive infrastructure buildout: Google projected to spend $1.5B on computers, networks, and real estate in 2006; dozens of others are spending substantially
• Paid for by advertising
Declan Butler, Nature
3
Science 2.0: E.g., Cancer Bioinformatics Grid
[Figure: a researcher or client application submits a <BPEL WorkflowDoc> and <WorkflowInputs> to a BPEL engine, which orchestrates links to a data service at uchicago.edu and analytic services at osu.edu and duke.edu, returning <WorkflowResults>. A schematic sketch of this composition pattern follows below.]
caBIG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
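The figure's composition pattern can be made concrete with a small sketch. The Python functions below are stand-ins for the remote caBIG services (the endpoints, names, and data are all hypothetical); a real client would issue SOAP/WSRF calls through a BPEL engine rather than call local functions.

    # Stand-in functions play the roles of the remote services; the
    # run_workflow function plays the role of the BPEL engine.

    def data_service_uchicago(query):
        # Hypothetical stand-in for the data service at uchicago.edu
        return [{"gene": g} for g in ("TP53", "BRCA1")]

    def analytic_service_osu(records):
        # Hypothetical stand-in for the analytic service at osu.edu
        return [dict(r, score=0.9) for r in records]

    def analytic_service_duke(records):
        # Hypothetical stand-in for the analytic service at duke.edu
        return [dict(r, annotation="candidate") for r in records]

    def run_workflow(query):
        # Sequence and join the service invocations, as a BPEL
        # workflow document would specify declaratively
        records = data_service_uchicago(query)
        scored = analytic_service_osu(records)
        return analytic_service_duke(scored)

    print(run_workflow("cancer-related genes"))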
4
Science 2.0: E.g., Virtual Observatories
[Figure: users connect through a gateway to data archives, analysis tools, and discovery tools. Figure: S. G. Djorgovski]
5
Science 1.0 → Science 2.0: For Example, Digital Astronomy
• “Tell me about this star” → “Tell me about these 20K stars”
• Support 1000s of users
• E.g., Sloan Digital Sky Survey, ~40 TB; others much bigger soon
6
Global Data Requirements
Service consumer:
• Discover data
• Specify compute-intensive analyses
• Compose multiple analyses
• Publish results as a service
Service provider:
• Host services enabling data access/analysis
• Support remote requests to services
• Control who can make requests
• Support time-varying, data- and compute-intensive workloads
7
Analyzing Large Data: “Move Computation to the Data”
But:
• Amount of computation can be enormous
• Load can vary tremendously
• Users want to compose distributed services
• Data must sometimes be moved, anyway
Fortunately:
• Networks are getting much faster (in parts)
• Workloads can have significant locality of reference (see the placement sketch below)
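A toy illustration of the resulting placement rule: run a job at the least-loaded site that already holds a replica of its input, and fall back to moving the data only when no replica exists. The site names, datasets, and load numbers below are invented for the example.

    # Prefer sites that already hold a replica of the input dataset.
    replicas = {
        "sdss-dr5": {"ncsa", "sdsc"},
        "ligo-s4": {"psc"},
    }
    site_load = {"ncsa": 0.7, "sdsc": 0.2, "psc": 0.9}

    def place_job(dataset, sites=site_load):
        local = replicas.get(dataset, set()) & sites.keys()
        if local:
            # Move computation to the data: least-loaded replica site
            return min(local, key=sites.get)
        # No replica available: the data must be moved anyway
        return min(sites, key=sites.get)

    print(place_job("sdss-dr5"))    # -> sdsc
    print(place_job("new-survey"))  # -> sdsc (least loaded overall)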
9
Highly Connected “Core”: For Example, TeraGrid
[Map: TeraGrid sites and network hubs: Starlight, LA, Atlanta, SDSC, TACC, PU, IU, ORNL, NCSA, ANL, PSC]
• 16 supercomputers: 9 different types, multiple sizes
• World’s fastest network
• Globus Toolkit and other middleware providing single login, application management, data movement, web services
• 30 Gigabits/s to large sites = 20-30 times major university links = 30,000 times my home broadband = 1 full-length feature film per second
• 75 Teraflops (trillion calculations/s) ≈ 12,500 times faster than all 6 billion humans on Earth each doing one calculation per second
11
The Two Dimensions of Science 2.0
• Decompose across the network: decouple resource & service providers
• Clients integrate dynamically: select & compose services; select “best of breed” providers; publish results as new services
[Figure: users, discovery tools, analysis tools, and data archives mapped along function and resource axes. Fig: S. G. Djorgovski]
12
Technology Requirements: Integration & Decomposition
• Service-oriented applications: wrap applications & data as services; compose services into workflows
• Service-oriented Grid infrastructure: provision physical resources to support application workloads
[Figure: users invoke and compose application services via workflows; provisioning maps those services onto underlying Grid infrastructure.]
“The Many Faces of IT as Service”, ACM Queue, Foster & Tuecke, 2005
13
Technology Requirements: Within the Core …
• Provide “service hosting services” that allow consumers to negotiate the hosting of arbitrary data analysis services
• Dynamically manage resources (compute, storage, network) to meet diverse computational demands
• Provide strong internal security, bridging to diverse external sources of attributes
14
Globus Software Enables Grid Infrastructure
• Web service interfaces for behaviors relating to integration and decomposition: primitives (resources, state, security); services (execution, data movement, …)
• Open source software that implements those interfaces, in particular the Globus Toolkit (GT4)
• All standard Web services: “Grid is a use case for Web services, focused on resource management”
15
Open Source Grid Software
Globus Toolkit v4 (www.globus.org) components, by category:
• Data Mgmt: GridFTP; Reliable File Transfer; Replica Location; Data Replication; Data Access & Integration
• Execution Mgmt: Grid Resource Allocation & Management; Community Scheduling Framework; Workspace Management; Grid Telecontrol Protocol
• Info Services: Index; Trigger; WebMDS
• Security: Authentication & Authorization; Community Authorization; Delegation; Credential Mgmt
• Common Runtime: Java Runtime; C Runtime; Python Runtime
Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005
16
GT4 Data Services
• Data movement: GridFTP (secure, reliable, performant); Reliable File Transfer (managed transfers); Data Replication Service (managed replication)
• Replica Location Service: scales to 100s of millions of replicas
• Data Access & Integration services: access to, and server-side processing of, structured data
[Chart: bandwidth (Mbps) vs. degree of striping (0 to 70) for 1, 2, 4, 8, 16, and 32 parallel streams; disk-to-disk on TeraGrid, reaching roughly 20,000 Mbps. A client-side sketch follows below.]
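For concreteness, parallel streams and striping are requested from the client side; a minimal sketch using GT4's globus-url-copy client (the endpoint URLs are placeholders, and striped transfer also requires striped servers at both ends):

    # Invoke the GT4 globus-url-copy client with parallel TCP
    # streams (-p) and striped transfer (-stripe).
    import subprocess

    subprocess.run([
        "globus-url-copy",
        "-p", "8",        # 8 parallel TCP streams
        "-stripe",        # stripe the transfer across data nodes
        "gsiftp://source.example.org/data/file.dat",
        "gsiftp://dest.example.org/data/file.dat",
    ], check=True)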
17
Security Services
• Attribute Authority (ATA): issues signed attribute assertions (incl. identity, delegation & mapping)
• Authorization Authority (AZA): makes decisions based on assertions & policy
[Figure: VO user B presents VO-A and VO-B attributes plus a delegation assertion from user A (“User B can use Service A”); VO ATAs, a mapping ATA, and a VO AZA mediate access to services in VOs A and B, drawing on resource admin and VO member attributes. A schematic of the two decision points follows below.]
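A schematic of the two decision points, with plain Python structures standing in for signed assertions. Everything here is illustrative (no real signatures or GT4 APIs): an ATA issues assertions, and an AZA grants a request only if every attribute the policy requires is asserted for the subject.

    def issue_assertion(issuer, subject, attribute):
        # Stand-in for a signed attribute assertion (no real crypto)
        return {"issuer": issuer, "subject": subject, "attr": attribute}

    def authorize(request, assertions, policy):
        # AZA decision: all policy-required attributes for this action
        # must be asserted for the requesting subject
        attrs = {(a["subject"], a["attr"]) for a in assertions}
        needed = policy.get(request["action"], set())
        return all((request["subject"], a) in attrs for a in needed)

    assertions = [
        issue_assertion("VO-ATA", "userB", "VO-member"),
        issue_assertion("userA", "userB", "may-use-service-A"),  # delegation
    ]
    policy = {"invoke-service-A": {"VO-member", "may-use-service-A"}}
    request = {"subject": "userB", "action": "invoke-service-A"}
    print(authorize(request, assertions, policy))  # -> True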
18
Service Hosting
[Figure: a client interacts with a resource provider through an interface, subject to policy: allocate/provision and configure an environment, then initiate, monitor, and control an activity within it.]
Interface implementations: WSRF (or WS-Transfer/WS-Man, etc.), Globus GRAM, Virtual Workspaces. A schematic of the interface follows below.
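The interface in the figure maps onto a small contract like the following sketch. The method names are my own shorthand for the slide's verbs, not WSRF or GRAM operation names.

    # Schematic hosting interface: allocate/provision and configure
    # an environment, then initiate, monitor, and control an activity.

    class HostingService:
        def allocate(self, cpus, storage_gb):
            return {"cpus": cpus, "storage_gb": storage_gb}

        def configure(self, environment, settings):
            environment.update(settings)

        def initiate(self, environment, activity):
            return {"name": activity, "state": "running", "env": environment}

        def monitor(self, activity):
            return activity["state"]

        def control(self, activity, command):
            if command == "cancel":
                activity["state"] = "cancelled"

    host = HostingService()
    env = host.allocate(cpus=4, storage_gb=100)
    host.configure(env, {"container": "GT4"})
    act = host.initiate(env, "blast-analysis")
    print(host.monitor(act))  # -> running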
19
Virtual OSG Clusters
[Figure: an OSG cluster runs as virtual machines on Xen hypervisors hosted by a TeraGrid cluster.]
“Virtual Clusters for Grid Communities,” Zhang et al., CCGrid 2006
20
Managed Storage: GridFTP with NeST (Demoware)
[Figure: a custom application performs file transfers against a GT4 GridFTP server via GSI-FTP, and lot operations etc. via chirp; the GridFTP server's NeST module speaks chirp to a NeST server managing disk storage.]
Bill Allcock, Nick LeRoy, Jeff Weber, et al.
21
Hosting Science Services
1) Integrate services from other sources: virtualize external services as VO services
2) Coordinate & compose: create new services from existing ones
[Figure: a community obtains content (services) from services providers and capacity from capacity providers.]
“Service-Oriented Science”, Science, 2005
22
Virtualizing Existing Services
• Establish a service agreement with the service, e.g., via WS-Agreement
• Delegate use to a community (“VO”) user
[Figure: user A and a VO admin establish agreements with existing services; VO user B then accesses them through the VO.]
23
The Globus-Based LIGO Data Grid
LIGO Gravitational Wave Observatory:
• Replicating >1 Terabyte/day to 8 sites
• >40 million replicas so far
• MTBF = 1 month
[Map: replication sites including Birmingham, Cardiff, and AEI/Golm]
www.globus.org/solutions
24
Data Replication Service
Pull “missing” files to a storage system:
• Data location: a Replica Location Index plus per-site Local Replica Catalogs record where files exist
• Data replication: the Data Replication Service compares the list of required files against the Local Replica Catalog
• Data movement: the Reliable File Transfer Service drives GridFTP transfers of the missing files, after which the new replicas are registered (see the sketch below)
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
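The “pull missing files” pattern reduces to a set difference between the required file list and the local replica catalog, followed by managed transfers. This sketch fakes the catalogs with in-memory sets; the real DRS queries the RLS and submits requests to RFT.

    # Compare required files against the local replica catalog, look
    # up sources for the missing ones, transfer, and re-register.

    required = {"f1", "f2", "f3", "f4"}
    local_catalog = {"f1", "f3"}                 # stands in for the LRC
    replica_index = {"f2": "gsiftp://siteA/f2",  # stands in for the RLI
                     "f4": "gsiftp://siteB/f4"}

    def transfer(url, dest):
        # Stand-in for a Reliable File Transfer request over GridFTP
        print(f"transferring {url} -> {dest}")

    missing = required - local_catalog
    for name in sorted(missing):
        transfer(replica_index[name], f"/storage/{name}")
        local_catalog.add(name)                  # register the new replica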
25
Data Replication Service: Dynamic Deployment
• Procure hardware: physical machine
• Deploy hypervisor/OS
• Deploy virtual machine (VM)
• Deploy container (JVM)
• Deploy services: DRS, GridFTP, LRC (the VO services)
State is exposed & accessed uniformly at all levels; provisioning, management, and monitoring at all levels (a schematic of the layering follows below).
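The layering reads naturally as an ordered pipeline in which each step builds on the one below it. A schematic sketch (the step names and messages are mine, not tool output):

    # Deploy the stack bottom-up: hardware, hypervisor/OS, VM,
    # container, then the services themselves.

    stack = [
        ("hardware",   "physical machine procured"),
        ("hypervisor", "Xen hypervisor/OS installed"),
        ("vm",         "virtual machine booted"),
        ("container",  "JVM + GT4 container started"),
        ("services",   "DRS, GridFTP, LRC deployed"),
    ]

    state = {}
    for layer, result in stack:
        state[layer] = result          # uniform state at every level
        print(f"{layer}: {result}")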
26
Decomposition Enables Separation of Concerns & Roles
• User to service provider: “Provide access to data D at S1, S2, S3 with performance P”
• Service provider to resource provider: “Provide storage with performance P1, network with P2, …”
[Figure: data D is delivered to sites S1, S2, and S3 via mechanisms such as a replica catalog and user-level multicast.]
27
Example: Biology
• Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
• Back office: analysis on the Grid; millions of BLAST, BLOCKS, etc. runs on OSG and TeraGrid
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
29
Example: Earth System Grid
• Climate simulation data: per-collection control; different user classes; server-side processing
• Implementation (GT): Portal-based User Registration (PURSE); PKI, SAML assertions; GridFTP, GRAM, SRM
• >2000 users; >100 TB downloaded
www.earthsystemgrid.org (DOE OASCR)
31
Example: Astro Portal Stacking Service
• Purpose: on-demand “stacks” of random locations within ~10 TB dataset (Sloan S4 data, via Web page or Web Service)
• Challenge: rapid access to 10-10K “random” files; time-varying load
• Solution: dynamic acquisition of compute & storage (see the stacking sketch below)
[Figure: many faint cutouts co-added into a single stacked image]
Joint work with Ioan Raicu & Alex Szalay
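“Stacking” means co-adding many small cutouts so a faint source emerges from the noise; the hard part for the service is fetching thousands of scattered files, not the arithmetic. A minimal NumPy sketch of the co-add itself, with random arrays standing in for real SDSS cutouts:

    import numpy as np

    # Random stand-ins for 1000 cutouts read from the ~10 TB dataset
    rng = np.random.default_rng(0)
    cutouts = [rng.normal(size=(32, 32)) for _ in range(1000)]

    # Co-add: per-pixel mean; noise shrinks roughly as 1/sqrt(N)
    stack = np.mean(cutouts, axis=0)
    print(stack.shape, round(float(stack.std()), 4))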
33
Example: CyberShake
Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast:
Rupture Forecast → Strain Green Tensor → Synthetic Seismogram → Spectral Acceleration → Hazard Curve → Hazard Map
Tom Jordan et al., Southern California Earthquake Center
34
Enlisting TeraGrid Resources
20 TB, 1.8 CPU-years
[Figure: a workflow scheduler/engine consults a VO service catalog, provenance catalog, and data catalog, and uses a VO scheduler to move work and data among SCEC storage, TeraGrid compute, and TeraGrid storage.]
[Chart: jobs and CPU-hours per day on a log scale over 23 days (10/19 to 11/10); 261,823 jobs total and 15,706 CPU-hours total (1.8 years).]
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute
35
http://dev.globus.org
dev.globus: Community Driven Improvement of Globus Software (NSF OCI)
• Guidelines (Apache)
• Infrastructure (CVS, email, bugzilla, Wiki)
• Projects include …
36
Summary
• “Science 2.0” (science as service, and service as platform) demands new infrastructure (service hosting), new technology (hosting & management), and new policy (hierarchically controlled access)
• Data & storage management cannot be separated from computation management, and increasingly become community roles
• A need for new technologies, skills, & roles: creating, publishing, hosting, discovering, composing, archiving, explaining … services
39
Acknowledgements
• Carl Kesselman, for many discussions
• Many colleagues, including those named on slides, for research collaborations and/or slides
• Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
• Globus Alliance members, for Globus software R&D
• DOE OASCR, NSF OCI, & NIH for support