Computer Science Research
Ian Foster
University of Chicago & Argonne National Laboratory
GriPhyN NSF Project Review
29-30 January 2003
Chicago
29 Jan 2003, Ian Foster, U.Chicago
Computer Science Research
Introduction & Context (Ian Foster: 30 mins)
– Vision: Virtual data as e-science enabler
– Organization: Structure & interactions
– Dissemination: Targets and mechanisms
– The nature of future challenges
Computer science research
– Virtual data (Mike Wilde: 15)
– Scheduling, planning (Ewa Deelman: 15)
– Execution (Mike Franklin: 15)
– Performance (Valerie Taylor: 15)
Technology delivery (Miron Livny: 15)
– Virtual Data Toolkit
Student presentations (60)
PetaScale Virtual Data Grids (1)
[Figure: layered architecture diagram. Labels: Production Team, Individual Investigator, Research group; Interactive User Tools; Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools; Resource Management Services; Security and Policy Services; Other Grid Services; Transforms; Raw data source; Distributed resources (code, storage, computers, and network). Performance scale: PetaOps, Petabytes.]
Petascale Virtual Data Grids (2)
[Figure: component diagram. Data Grid: storage elements, replica location service, data transport, storage resource mgmt, virtual data catalogs, virtual data index. Computing Grid: workflow planner, request planner, workflow executor (DAGman), request executor (Condor-G, GRAM), request predictor (Prophesy). Also shown: Grid Monitor, Grid Operations, Science Review. Actors: Production Manager, Researcher. Activities: planning, discovery, composition, simulation, analysis, sharing, derivation. Data: raw data from the detector, simulation data.]
Computer Science and GriPhyN
[Figure: relationship diagram. Nodes: Computer Science Research; Virtual Data Toolkit; Partner Physics Projects; Larger Science Community; Globus, Condor, NMI, EU DataGrid, PPDG Communities. Flows: Techniques & software; Requirements; Prototyping & experiments; Production Deployment; Tech Transfer.]
Other linkages:
- Work force
- CS researchers
- Industry
Computer Science Challenges (1)
Virtual data
– Representation, discovery, & manipulation of workflows and associated data & programs
Planning
– Mapping workflows in an efficient, policy-aware manner to distributed resources
Execution
– Executing workflows, including data movements, reliably and efficiently
Performance
– Monitoring aspects of system performance for scheduling & troubleshooting
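As a rough illustration of the planning challenge, the sketch below represents a workflow as a dependency graph and maps each task, in dependency order, to the least-loaded site that a policy permits. It is a toy under stated assumptions, not the project's planner: the task names, site names, the `allowed` policy table, and the `plan` function are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "analyze": {"reconstruct"},
}

# Hypothetical policy: the sites each task is permitted to run on.
allowed = {
    "simulate": {"site_a", "site_b"},
    "reconstruct": {"site_a", "site_b"},
    "analyze": {"site_b"},
}


def plan(workflow, allowed):
    """Return (execution order, task -> site assignment).

    Tasks are visited in dependency order; each goes to the permitted
    site with the fewest tasks assigned so far (ties broken by name).
    """
    load = {}
    assignment = {}
    order = list(TopologicalSorter(workflow).static_order())
    for task in order:
        candidates = allowed.get(task, set())
        if not candidates:
            raise ValueError(f"policy permits no site for task {task!r}")
        site = min(candidates, key=lambda s: (load.get(s, 0), s))
        assignment[task] = site
        load[site] = load.get(site, 0) + 1
    return order, assignment


order, assignment = plan(workflow, allowed)
for task in order:
    print(f"{task} -> {assignment[task]}")
```

A real planner would also weigh data location, queue times, and failures; this sketch only shows the shape of the problem (ordering plus policy-constrained placement).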
Computer Science Challenges (2)
Engage meaningfully with physics groups
Provide educational opportunities
Develop, package, deliver, and support quality software
Achieve outreach to groups outside partner physics experiments
GriPhyN Computer Science Team
U.Chicago: Dumitrescu, Foster, Iamnitchi, Milligan, Ranganathan, Ripeanu, Voeckler, Wilde
USC/ISI: Deelman, Kesselman, Mehta, Patil, Singh, Vahi
NWU -> TAMU: Taylor, Yin
UCB: Franklin, Liu
UCSD: Marzullo, Moore, Zhang, Jagatheesan
UW-Madison: Alderman, Arpaci-Dusseau, Arpaci-Dusseau, Bailey, Bent, Kosar, Livny, Roy, Stanley, Thain
UF: Arbee, George, Jiang, Katageri, Ranka, Rodriguez
UT Brownsville: Campanelli, Morris, Zamora
LBNL: Shoshani
Faculty/Staff, Student/Postdoc (underlined = present)
Computer Science Research: How Do We Work?
System architecture & virtual data toolkit as two overarching organizational mechanisms
Project activities all defined in relationship to these organizing principles:
– Research: Explore new techniques to guide evolution of the system architecture and VDT
– Development: Construct VDT software
– Evaluation: Apply and evaluate VDT software and/or new techniques in context of application challenges
Computer Science Research: How Are We Coordinated?
The activities of this large, multidisciplinary group are coordinated by frequent and multivalent communications
– Face-to-face meetings in large & small groups
– Formal and informal documents defining requirements, challenge problems, testbeds
– Email, phone calls, videoconferences
– Cooperation on challenge problems and technology and application demonstrations
– Cooperation on software releases
GriPhyN Architecture/VDT and CS Research Projects
[Figure: VDT architecture surrounded by related research projects.]
VDT
– Layers: Virtual Data, Planning, Execution
– Chimera Virtual Data System + Pegasus Planner
– DAGman Workflow
– Globus Toolkit, Condor, Ganglia, etc.
Research
– Partial queries (Liu, Franklin)
– Decentralized scheduling (Ranganathan)
– Fault-tolerant master-worker (Marzullo)
– Scalable replica location service (UC, ISI team)
– Policy-aware scheduling (Dumitrescu)
– Ontologies (Zhao)
– NeST storage mgmt (UW team)
– Virtual data language design (Voeckler, Wilde)
– AI planning (Deelman, Narang)
– Virtual data language applns (Milligan, Zhao)
– DAGman enhancements (UW team)
– Prophesy (Taylor, Yin)
– HP monitoring (George)
GriPhyN Arch/VDT and CS Research: Degree of Coupling
[Figure: the same VDT architecture and research projects as on the preceding slide, with each research project marked by its degree of coupling to the VDT. Legend: Already, Underway, Pending.]
Examples of Technology Injection: Chimera R&D Timeline
[Figure: timeline with axis 2002, 2003, 2004, pairing technology releases (TECH) with applications (APPS).]
TECH
Chimera-0
• Derivations only
• Grid exec environment (prototype)
• PERL & PostgreSQL
Chimera-1
• Java code & class model
• XML VDL
• TR/DV model
• Compound TRs
• General Grid exec env
• Optimized DB schema
Chimera-2
• Type model
• Dataset catalog
• Metadata
• Hyperlinks
• Instance tracking
• Performance data
Chimera-3
• Knowledge repr.
• Policy-driven planners
• VD browsers, composers
• …
APPS
• Sloan cluster finding
• CMS analysis prototype w/ROOT
• CMS official event simulation
• Sloan cluster-finding science
• CMS & ATLAS analysis w/ROOT, CLARENS, JAS
• LIGO pulsar search
• ATLAS events-on-demand
• CMS event simulation prototyping
• Sloan near-earth object
• BioGrid facility…
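The TR/DV model listed under Chimera-1 separates a transformation (TR, a program) from a derivation (DV, a recorded invocation of a transformation on specific inputs). The sketch below is a minimal in-memory illustration of that idea, assuming a dataset can be produced on demand by replaying its derivation. It is not Chimera's schema or VDL; the class names, method names, and the example datasets are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Transformation:
    """A named program (here, just a Python callable)."""
    name: str
    func: Callable[..., object]


@dataclass(frozen=True)
class Derivation:
    """A recorded invocation: which TR, on which inputs, producing which output."""
    transformation: str
    inputs: Tuple[str, ...]
    output: str


class VirtualDataCatalog:
    """Toy catalog: datasets are either materialized or derivable on request."""

    def __init__(self):
        self.transformations = {}
        self.derivations = {}   # keyed by output dataset name
        self.materialized = {}  # dataset name -> value
        self.runs = []          # outputs actually computed, in order

    def add_tr(self, tr):
        self.transformations[tr.name] = tr

    def add_dv(self, dv):
        self.derivations[dv.output] = dv

    def put(self, name, value):
        self.materialized[name] = value

    def request(self, name):
        """Return the dataset, deriving it (and its inputs) only if needed."""
        if name in self.materialized:
            return self.materialized[name]
        if name not in self.derivations:
            raise KeyError(f"no data and no derivation for {name!r}")
        dv = self.derivations[name]
        args = [self.request(i) for i in dv.inputs]
        value = self.transformations[dv.transformation].func(*args)
        self.materialized[name] = value
        self.runs.append(name)
        return value


catalog = VirtualDataCatalog()
catalog.put("raw", [3, 1, 2])
catalog.add_tr(Transformation("sort", sorted))
catalog.add_tr(Transformation("head", lambda xs: xs[:2]))
catalog.add_dv(Derivation("sort", ("raw",), "sorted"))
catalog.add_dv(Derivation("head", ("sorted",), "top2"))

result = catalog.request("top2")
print(result, catalog.runs)
```

Requesting `top2` a second time returns the stored value without rerunning either transformation, which is the reuse that recording derivations is meant to enable.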
Dissemination: Targets
Researchers and educators
– Facilitate creation of new knowledge
Computer science research community
– Contribute to knowledge
– Engage community in solving our problems
Open source community
– Contribute to open Grid technology base
Industry
– Contribute to vibrant commercial technology
Dissemination: Mechanisms
Software
– VDT: adoption by LHC Computing Grid
– Globus Toolkit and Condor systems
Publications and talks
– XX papers, YY tech reports, ZZ talks
Workshops and meetings
– E.g., “Data Derivation & Provenance”, Oct 02
Community activities
– E.g., advisory committees, GGF standards
Representative Publications
Annis, J., Zhao, Y., Voeckler, J., Wilde, M., Kent, S., Foster, I., Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey. SC'2002, 2002.
Bent, J., Venkataramani, V., LeRoy, N., Roy, A., Stanley, J., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Livny, M., Flexibility, Manageability, and Performance in a Grid Storage Appliance, HPDC’11, 2002.
Deelman, E., Blackburn, K., Ehrens, P., Kesselman, C., Koranda, S., Lazzarini, A., Mehta, G., Meshkat, L., Pearlman, L., Blackburn, K. and Williams, R., GriPhyN and LIGO: Building a Virtual Data Grid for Gravitational Wave Scientists, HPDC’11, 2002.
Foster, I., Voeckler, J., Wilde, M., Zhao, Y., Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation, SSDBM, 2002.
Iamnitchi, A., Ripeanu, M., Foster, I., Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. 1st Intl. Workshop on Peer-to-Peer Systems, 2002.
Raman, P., George, A., Radlinski, M., Subramaniyan, R., GEMS: Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems, Technical Report, UF, 2002.
Ranganathan, K. and Foster, I., Decoupling Computation and Data Scheduling in Distributed Data Intensive Applications, HPDC’11, 2002.
Ripeanu, M., Foster, I., Iamnitchi, A., Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. Internet Computing, 6 (1). 50-57. 2002.
The Nature of Future Challenges
GriPhyN R&D is proving very successful
– In terms of “new ideas”
– In terms of interest & adoption
Our major challenges as we move forward are to scale and sustain the effort
– Research scope: virtual data => KR; planning, execution => x1000 larger; …; …
– Software support: we need NMIx10!
– Infrastructure & application support
See Atkins cyberinfrastructure report!
Summary
CS has made significant contributions both to experiments and to knowledge, e.g.
– Virtual data concepts and technologies
– Scheduling in large-scale distributed systems
– DAGman workflow management & execution
– Scalable replica location services
VDT (& underlying Globus Toolkit & Condor systems) a good technology transfer vehicle
– Adoption by major science projects
– Adoption of Grid concepts within industry
Major challenge: exploiting opportunities