Uni Innsbruck Informatik - Uni Innsbruck Informatik - 11
Peer-to-Peer and Grid SystemsPeer-to-Peer and Grid Systems
Grid computingGrid computing
Michael Welzl Michael Welzl http://www.welzl.atDPS NSG Team DPS NSG Team http://dps.uibk.ac.at/nsgInstitute of Computer ScienceInstitute of Computer ScienceUniversity of Innsbruck, AustriaUniversity of Innsbruck, Austria
TU DarmstadtTU DarmstadtDarmstadt, GermanyDarmstadt, Germany
2-4 January 20072-4 January 2007
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 22
ComplexityComplexity
• Grid poses difficult problems– Heterogeneity and dynamicity of resources– Secure access to resources with different users in various roles,
belonging to VTs which belong to VOs• “run program X at site Y subject to community policy P,
providing access to data at Z according to policy Q”– Efficient assignment of data and tasks to machines (“scheduling“)
Clu
ster
wit
h de
dica
ted
Clu
ster
wit
h de
dica
ted
netw
ork
links
netw
ork
linksHeterogeneousHeterogeneous
distributeddistributed
systemssystems
MassivelyMassively
parallelparallel
systemssystems
SET
I@ho
me
SET
I@ho
me
GR
IDG
RID
Syst
ems
wit
h
Syst
ems
wit
hdi
stri
bute
d m
emor
y
dist
ribu
ted
mem
ory
Syst
ems
wit
h
Syst
ems
wit
hSh
ared
mem
ory
Shar
ed m
emor
y
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 33
Grid requirementsGrid requirements
• Computer scientists can tackle these problems– Grid application users and programmers are often not computer scientists
• Important goal: ease of use– Programmer should not worry (too much) about the Grid– User should worry even less– Ultimate goal: write and use an application as if using a single computer
(power grid metaphor)
• How do computer scientists simplify?
• Abstraction.
• We build layers.
• In a Grid, we typically have Middleware.
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 44
Grid MiddlewareGrid Middleware
Grid ArchitectureGrid Architecture
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 55
Grid computing without Grid computing without middlewaremiddleware
• Example manual Grid application execution1. scp code to 10 machines2. log in to the 10 machines via ssh and start “application > result“
everywhere3. Estimate running time, or let application tell you that it‘s done
(e.g. via TCP/IP communication in app code)4. retrieve result files via scp
• Tedious process - so write a script file
• Do this again for every application / environment?
• What if your colleagues need something similar?
• Standards needed, tools introduced
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 66
ToolkitsToolkits
• Most famous: Globus Toolkit– Evolution from GT2 via GT3 to GT4 influenced the whole Grid community– Reference implementation of Open Grid Forum (OGF) standards
• Other well-known examples– Condor
• Exists since mid-1980‘s• No Grid back then - system gradually evolved towards it• Traditional goal: harvest CPU power of normal user workstations
many Grid issues always had to be addressed anyway• Special interfaces now enable Condor-Globus communication (“Condor-G“)
– Unicore (used in D-Grid)– gLite (used in EGEE)
• Issues that these middlewares (should) address– Load Balancing, error management– Authentification, Authorization and Accounting (AAA)– Resource discovery, naming– Resource access and monitoring– Resource reservation and QoS management
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 77
How the tools are applied in How the tools are applied in practicepractice
WebBrowser
ComputeServer
DataCatalog
DataViewer
Tool
Certificateauthority
ChatTool
CredentialRepository
WebPortal
ComputeServer
Resources implement standard access & management interfaces
Collective services aggregate &/or
virtualize resources
Users work with client applications
Application services organize VOs & enable
access to other services
Databaseservice
Databaseservice
Databaseservice
SimulationTool
Camera
Camera
TelepresenceMonitor
RegistrationService
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 88
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
Web ServicesComponents
Non-WS Components
Pre-WSAuthenticationAuthorization
GridFTPPre-WS
Grid ResourceAlloc. & Mgmt
Pre-WSMonitoring& Discovery
C CommonLibraries
AuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
Java WS Core
CommunityAuthorization
ReplicaLocation
eXtensibleIO (XIO)
CredentialMgmt
CommunitySchedulingFramework
Delegation
Example: Globus Toolkit version 4 Example: Globus Toolkit version 4 ((GT4GT4))
DataReplication
TriggerC
WS Core
Python WS Core
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Contrib/Preview
Core
Depre-cated
Example
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 99
Grid Resource Allocation Manager Grid Resource Allocation Manager (GRAM)(GRAM)
• Globus tool for job execution– Unified, resource independent replacement for steps in “manual Grid“
example
• Unified way to set environment variables:Resource Specification Language (RSL) (stdout = x, arguments = y, ..)
• Steps 1-4 become– Blocking: “globus-job-run -stage hostname applicationname“
• -stage option copies code to remote machine• Different architectures: recompilation needed – but not supported!
– Nonblocking: scp code, then “globus-job-submit hostname applicationname“(staging not yet supported)
• Obtain unique URL, continuously use it to query job status• When done, use “globus-job-get-output URL stdout“ to retrieve stdout
• More complex systems are built on top of GRAM– E.g. Message Passing Interface (MPI) for the Grid: MPICH-G2
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1010
GRAM /2GRAM /2
• GRAM leaves a lot of questions unanswered– How to recompile application for different architectures?
(automatically + in a unified way)– What if your computer‘s IP address changes?– What if the 10 accessed computer‘s IP addresses change?– What if two of the computers becomes unavailable?– What if 3 other users start to work with 5 of the 10 computers?
• A tool for each problem...– General-purpose Architecture for Reservation and Allocation (GARA)
Integrated QoS via “advance reservation“ of resources (CPU, Disk, Network)– Monitoring and Discovery System (MDS) for locating and monitoring resources– Resource Broker (Globus: do it yourself; Condor: “matchmaker“) translates
requirement specification (CPU, memory, ..) into IP address
• Diversity of complex tools standardized + available in Globus,addressing some but not all of the issues need for an architecture
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1111
Open Grid Services Architecture Open Grid Services Architecture (OGSA)(OGSA)
• OGSA supports:– Creation,– Maintenance, and– Application of services– OGSA is based on a
layered architecture
• Two core components:1. OGSI provides base
infrastructure2. OGSA Platform provides
core services– Policy, logging, etc.
• Applications built on top of platform services
Domain-specific Services
OGSA Platform Services
Open Grid Service Infrastructure (OGSI)
Hosting Environment
Protocol
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1212
Resource vs. ServiceResource vs. Service
• Resource is something sharable• Resource has several interfaces for accessing it• Service realizes the interface
– With all message binding etc. for use of the client
Client
Service
Service
Resource
Interface
Interface
Interface
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1313
OGSA GoalsOGSA Goals
1. Identify use cases for OGSA platform components2. Identify and define OGSA platform components3. Define hosting and platform-specific bindings4. Define interoperable resource models
• Standardization in Open Grid Forum (OGF)– formerly Global Grid Forum (GGF)– modelled after IETF
• Lots of smaller goals under the above large goals:– Distributed resource management, seamless QoS, management
solutions, open interfaces, integration of existing solutions(e.g. web services), …
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1515
OGSA Core ComponentsOGSA Core Components
• Core: OGSA Platform and OGSI• Independent of layers above and below
CMM Logging PolicyData
Access Integration
Provisioning
Service Collection
OGSA Meta-OS Services OGSA Domain Services
OGSI
Notification HandleMapper Factory Manageability Registry Lifecycle Discovery
Domain-specific Services
Hosting Platform (= Native OS, …)
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1616
OGSIOGSI
• OGSI layer is based on web services– Services called grid services
• Every grid service is also a web service– Converse is not necessarily true
• OGSI specifies– How grid service instances are named– Common interfaces and behavior– How to specify additional interfaces
• Focus is on message-level interoperability– Common XML format
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1717
Grid Services vs. Web ServicesGrid Services vs. Web Services
• Web services already widely used and well specified– WSDL for descriptions, XML for message formats
• Web services typically stateless• Grid services typically long-running processes with state
• Two kinds of stateful services1. Service maintains state about itself
• In grid services, service state = state of the resource2. Interaction pattern between client and service is stateful
• OGSI tackles the first problem– Need to separate physical state from logical state– Logical state maintained by service– Service might not contain all of physical state
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1818
OGSI ModelOGSI Model
• Grid services layered on web services
• Grid services contain state
• Both communicate with XML messages
• Grid services described in GWSDL– Extension of WSDL
• Client programming is the same
• Transport selected at runtime– HTTP, SOAP, ...
Grid service
Web service
XML
StateService data
OGSI
Web service
GWSDLWSDLClient
WS + OGSI
WS
XML Messages
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 1919
TerminologyTerminology
• Web service– Software component identified by a URI and with an XML interface
• Stateful web service– Web service that maintains state between interactions
• Grid service– Stateful web service that conforms to OGSI
• Grid service description– Description of interface and behavior of a grid service– Defined in combination of WSDL and GWSDL
• Grid service instance– Instance of a grid service, identified with a grid service handle
(GSH)• Grid service references
– Description of a grid service endpoint• Service data element
– Publicly accessible state of a service
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2020
Common Management Model Common Management Model (CMM)(CMM)
• CMM is an abstract representation of real resources
• Key terminology:
• Manageable resource– Entity that has some state to which management operations can be
applied– For example, hardware, software, ...
• Manageability– Concept where resource defines information that can be used to
manage it
• Management– Actual process of managing a resource, monitoring, modifying,
making decisions, ...
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2121
CMMCMM
• Each management resource is a grid service
• Manageability Interface– Interfaces and behavior common to
all CMM services
• Domain-specific Interface– Additional, resource- and domain-
specific interfaces for managing the resource
• Three aspects of manageability– XML schema for modeling the
resource– Collection of manageability
portTypes– Guidelines for modeling the resource
Grid Service Facade to Managed Resource
Resource
Manageability Interface
Domain-specific
Interface
GSH
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2222
Resource ModelingResource Modeling
• Resources defined with CMM typically coarse-grained services– Services are self-contained, few relationships to other services
• Resource lifecycle modeling– Lifecycle is a set of states that a resource can have
• How to define a generic lifecycle model?
• CMM defines a lifecycle model with 5 states– Down– Starting– Up– Stopping– Failed
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2323
PolicyPolicy
• No clear definition; OGSA defines policy as“Definitive goal, course, or method of action based on a set of conditions to guide and determine present and future decisions”
• Policies are implemented and used in context– For example, security, workload, networking, etc. policies
• OGSA defines a framework for policies
• OGSA policy model is a collection of rules based on conditions and actions
• In general, policy is: if <condition> then <action>– For example: if <customer is boss> then <provide best QoS>
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2424
Levels of PolicyLevels of Policy
• Policies can be defined on several levels
• Business level– Typically SLA between
institutions
• Domain level– Canonical form defined by OGSA
policy framework
• Device level– Device-specific format– Used at enforcement points
Business Level
Domain Level
Device Level
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2525
Policy FrameworkPolicy Framework
• OGSA Policy Service Core
Policy Service Core
Repository Service
Policy Resolution
Service
Policy Validation
Service
Policy Transformation
Service
Policy Service Manager
Policy Service Agent
Policy AdminPolicy Tools
Policy Autonomic Manager
Policy Enforcement Point
Common Management
Model
Producer of Policies
Consumer of Policies
Canonicalpolicies
Canonicalpolicies
Non-canonical (changes to resource)
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2626
Policy Framework ComponentsPolicy Framework Components
• Policy Manager– Responsible for controlling
access to policy repository– Expects policies in canonical
format
• Policy Repository– Stores policy documents– Provides abstract interface– Any storage, accessed through
Data Access Interface Services
• Policy Enforcement Points– Execute policy– Work with Policy Service Agent
to resolve conflicts
• Policy Service Agent– Policy decision maker agents
• Policy Transformation Service– Transforms business objectives
and canonical policies to device-level configurations
• Policy Validation Service– Validate policy changes
• Policy Resolution Service– Evaluate policies in context of
business-level SLA
• Policy Tools and Autonomic Managers– Creation of policy documents
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2727
LoggingLogging
• OGSA defines distributed logging system
• Provides facilities for– Decoupling of log producers and consumers– Transforming logs to a common representation
• Uses XML and XSLT– Filtering and aggregation– Configurable persistency– Consumption patterns– Secure logging
• Requirements result in a publish/subscribe-type logging system
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2828
Data Access and ReplicationData Access and Replication
• Data Access and Integration Services (DAIS)
• DAIS group is working on a common data management and interface solution to data access
• OGSA requirements on data management:– Data access service
• Uniform access regardless of heterogeneity– Data replication
• Allows local copies of data for improved performance– Data caching service– Metadata catalog and services– Schema transformation services– Storage services
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 2929
Conceptual ModelConceptual Model
• Two kinds of resources– Resources external to OGSI– OGSI-compliant logical counterparts of the above
EDRM
EDR
EDR
DRM
DREDRM
EDRM
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3030
Conceptual ModelConceptual Model
• External Data Resource Manager (EDRM) and Data Resource Manager (DRM)– Data management system, e.g., file system– DRM represents EDRM
• External Data Resource (EDR) and Data Resource (DR)– EDR is data managed by EDRM, e.g. directory in a file system– DR is a grid service that binds to EDR – DR is the contact point to data and exposes metadata
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3131
Why Grid Security is hardWhy Grid Security is hard
• Resources may be valuable and the problems sensitive
• Resources often located in distinct administrative domains– Each resource has its own policies and procedures
• Set of resources used by a single computation may be large, dynamic, and unpredictable– Not just client/server– Requires delegation?
• Security must be broadly available and applicable– Standard, well-tested, well-understood protocols– Integrated with wide variety of tools
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3232
Grid Security RequirementsGrid Security Requirements
• User view1. Easy to use2. Single sign-on3. Run applications
• ftp,ssh,Web,…4. User based trust model5. Proxies/agents (delegation)
• Resource owner view1. Specify local access control2. Auditing, accounting, etc. 3. Integration w/ local system
• Kerberos, AFS, license mgr.
4. Protection from compromisedresources
• Developer view1. API/SDK with
authentication, flexible message protection,
2. Flexible communication, delegation, ...
3. Direct calls to various security functions
4. Security integrated into higher-level SDKs
• How to meet all these requirements?
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3333
Grid Security Infrastructure (GSI)Grid Security Infrastructure (GSI)
• Extensions to standard protocols & APIs– Standards: SSL/TLS, X.509 & CA, GSS-API– Extensions for single sign-on and delegation
• Globus Toolkit reference implementation of GSI– SSLeay/OpenSSL + GSS-API + SSO/delegation– Tools and services to interface to local security– Tools for credential management
• Login, logout, etc.• Smartcards• MyProxy: Web portal login and delegation• K5cert: Automatic X.509 certificate creation
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3434
Community Authorization ServiceCommunity Authorization Service
• Question: How does a large community grant its users access to a large set of resources?– Should minimize burden on both the users and resource
providers
• Community Authorization Service (CAS)– Community negotiates access to resources– Resource outsources some authorization to CAS– CAS handles user registration, group membership…– User who wants access to resource asks CAS for a capability
credential– Resources can also do local access control
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3535
Security SummarySecurity Summary
• GSI successfully addresses wide variety of Grid security issues
• Broad acceptance, deployment, integration with tools
• Standardization ongoing in OGF
• Community Authorization Service to address community-based allocation of resources– Continuing development
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3636
Architectural evolution of the GridArchitectural evolution of the Grid
• OGSI / OGSA: Open Grid Service Infrastructure / Architecture– OGSI failed: too complex, not compliant with Web Service
standards– There are different (standards compliant) ways to achieve the
same
• Move to WSRF is a general move towards service orientation (SOA) GT3
GT4
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3737
Resource VirtualisationResource Virtualisation
• Handle heterogeneous and low-level resources in a uniform and high-level manner using Web technologies
ServiceOrientation
ResourceOrientation
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3838
Compute Resource VirtualisationCompute Resource Virtualisation
PBS
Web Service-basedGRAM
Posix Fork ssh
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 3939
Service Oriented ArchitectureService Oriented Architecture
ServiceProvider
ServiceContract
Service
ServiceConsumer
Client
ServiceRegistry
RegisterFind
Bind
UDDIUDDI
WSDL
SOAP
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4040
Web Services StandardsWeb Services Standards
UDDIRegistry
ServiceConsumer
WebService
SOAP
WSDLpoints to description
findsservice
describesservice
communicates withXML messages
points to service
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4141
Web Services and the GridWeb Services and the Grid
• Standards are important– Stability– Interoperability– Tool support– Implementation re-use– Web standards are present world wide
• But very few standards in Web Services– Young technology– SOAP, WSDL, UDDI
• Are these three standards enough to build a service-oriented Grid infrastructure?
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4242
Business ApplicationsBusiness Applications
• Web services are designed for business applications
• Short lifetime– State is not really needed
• Reliability is first priority– Statelessness ensures easy recovery upon crashes– Transaction-based
• Persistent– Never shut-down– E.g. Amazon, google– Lifetime is not an issue
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4343
• Jobs are typical example of Grid resources– One implementation– Multiple instances that start and end
• Resources are transient– Limited lifetime (start and end times, not persistent)– Web Services implementations support multiple and shared
instances, but this is not portable
• Resources have state– Queued– Running– Terminated– Failed
Grid ResourcesGrid Resources
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4444
Web Services Lifetime ManagementWeb Services Lifetime Management
• Web services are persistent
• Lifetime management supported by different Web services implementations, but– This is not standardised– Implementation specific– No portablility and no interoperability
• Sample Web services lifecycle events– Deployment, Creation, Execution, Destruction, Undeployment,
Failure– All have different names and APIs– Axis WSDP ETTK Systinet …
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4545
Web Services StateWeb Services State
• A Web service is a remote procedure call over the Internet using XML messages– No words about state
• Stateless service implements message exchanges with no access or use of information not contained in the input message– Cannot remember information– Serve don’t care requests– E.g., compress and uncompress a file
• Requirements– Manage internal data and attributes across multiple
• Invocations• Clients• Other dependent services
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4646
Stateless Web Service InvocationStateless Web Service Invocation
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4747
Stateful Web Service InvocationStateful Web Service Invocation
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4848
Web Services Resource FrameworkWeb Services Resource Framework
Web Services
Domain-Specific Applications / Services
OGSA Core Services
Job Management
WS-GRAM
Data ServicesGridFTP
WS-Resource FrameworkW3COASIS
OGF
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 4949
WSRF ScopeWSRF Scope
• Models stateful resources– Using existing Web services standards (unmodified SOAP
& WSDL)– Defines new thin standards
WS-S
erv
ice
Gro
up
WS-RenewableReferences
WS-
Noti
fica
tion
ModelingStateful
Resources with Web Services
WS-B
ase
Faults
WS-ResourceProperties W
S-
Reso
urce
Lifetim
e
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5050
Web Service StandardsWeb Service Standards
Service Composition
Transports
Messaging
Description
Quality ofExperience(QoX)
HTTP/HTTPS SMTP RMI / IIOP
XSD WSDL
SOAPXML WS-Addressing WS-Renewable References
WS-Metadata ExchangeWS-Policy
WS-Service Group
WS-Resource Properties
JMS
WS-Security
WS-Reliable Messaging WS-Transaction
WS-Resource Lifetime
WS-Base Faults
WS-Notification BPEL4WS
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5151
WSRF StandardsWSRF Standards
• WS-Addressing– Reference and identification of stateful resources in a Web services
context
• WS-ResourceProperties– Modeling of state as an XML document.– Accessing state as WSDL defined interfaces
• WS-ResourceLifetime– Management of leases on resource access– Create, destroy, expire
• WS-ServiceGroups– Creating and managing aggregations of Web services
• WS-BaseFaults– Baseline for extensible fault framework– Ability to reproduce exception hierarchies, as in Java
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5252
Problem: performance demandProblem: performance demandvs. large stacksvs. large stacks
Source: http://img.dell.com/images/global/topics/power/ps1q02-broadcom1.gif
DoD
IP
TCP
HTTP
SOAP
Middleware
WS-RF
Grid apps
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5353
HTTP 1.1
Layering inefficiency example:Layering inefficiency example:loss of “connection“ semanticsloss of “connection“ semantics
IP
TCP
HTTP 1.0
SOAP
Stateless
Connection state
Stateless
Can‘t reuseconnections!
Connection state
Web Service
Grid Service
Doesn‘t care, can do both
Stateless
Stateful
Can‘t reuseconnections!
WS-RF
Could reuse connections, but doesn‘t!
Breaking the chain
Reuses connections
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5454
Research towards the power outletResearch towards the power outlet
Automatic parallelization
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5555
Current SoACurrent SoA
• Standards are only specified when mechanisms are known to work– Globus only includes such working elements
• Lots of important features missing
• Practical issues with existing middlewares– Submitting a Globus job is very slow (Austrian Grid: approx. 20 seconds)
significant granularity limit for parallelization!– Globus is a huge piece of software
• Currently, some confusion about right location of features– On top of middleware? (research on top of Globus)– In middleware? (other Middleware projects)– In the OS? (XtreemOS)
Upcoming slides concern mechanisms which are mostly on topand partially within middleware
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5656
Automatic parallelizationAutomatic parallelization
• Has been addressed in the past
• Microcode parallelism (pipelining in CPU)– Relatively easy: simple dependencies
• Instruction level parallelism– More complex dependencies– Can automatically be analyzed by compiler
• Reordering, loop unrolling, ..
for (i=1; i<100; i++) a[i] = a[i] + b[i] * c[i];
/* Thread 1 */for (i=1; i<50; i++) a[i] = a[i] + b[i] * c[i]; /* Thread 2 */for (i=50; i<100; i++) a[i] = a[i] + b[i] * c[i];(Intel C++ compiler)
Source: WIKIPEDIA
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5757
Automatic parallelization /2Automatic parallelization /2
• Parallel Computing: complete applications parallelized– Very complex dependencies– Decomposition methods + mapping of tasks onto processors: usually
not automatic (depends on problem and interconnection network)– Algorithm specific methods developed (matrix operations, sorting, ..)– Some parts can be automatized, but not everything
explicit parallelism (OpenMP) and even allocation (MPI) quite popular
• Some research efforts on half-automaticparallelization (“manual“ aid)– Programmer knows about problem-specific
locality needs (interacting code elements)– Examples:
• Java extensions such as JavaSymphony[Thomas Fahringer, Alexandru Jugravu]
• HPF+ HALO concept[Siegfried Benkner] Source: http://www.par.univie.ac.at/~sigi/aurora/project2/
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5858
Automatic parallelization in GridsAutomatic parallelization in Grids
• Scheduling; important issue for “power outlet“ goal– Automatic distribution of tasks and inter-task data transmissions =
scheduling
• Grid scheduling encompasses– Resource Discovery
• Authorization Filtering, Application Requirement Definition,Minimal Requirement Filtering
– System Selection• Dynamic Information Gathering• System Selection
– Job Execution• (optional) Advance Reservation• Job Submission• Preparation Tasks• Monitoring Progress• Job Completion• Clean-up Tasks
• So far, most scheduling efforts consider embarassingly parallelapplications - typically parameter sweeps (no dependencies)
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 5959
The Scheduling ProblemThe Scheduling Problem
• Hypothesis– N tasks– M machines– Execution time of each task on each machine
• Goal– Find the optimum mapping of the N tasks on the M machines that
produces the fastest completion time
• Solution– Finding the optimum requires to evaluate all search space points– There are MN possible mapping– This is impossible in real time
• NP-complete optimization problem– Cannot be solved in polynomial time– Requires heuristics to find approximate “good” solutions
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6060
Example algorithmsExample algorithms
• Minimum Execution Time (MET)– Schedule a task on the machine that executes it fastest– Do not consider queuing time– Fast machines will be overloaded and slow machines will be idle
• Minimum Completion Time (MCT)– The earliest completion time on a machine– I/O transfer time + MET + machine ready time
• Switching Algorithm (SA)– Use MET and MCT heuristics in a cyclic fashion– Use MET at the expense of load balance until a given threshold– Then use MCT to smooth the load across the machines
Note: needs predictions of this information!
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6161
Gantt ChartGantt Chart
M1
T1M2
T2
M3
T3 T8
T6
T5 T7
T4
Timeline
Mach
ine
Gantt Chart
• Scheduling chart• How tasks are mapped to
machines over time• A timetable of machine
usage
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6262
Condor case studyCondor case study
• Application name, parameters, etc. + requirements specified in ClassAds– “Requirements = Memory >= 256 && Disk > 10000; Rank = (KFLOPS*10000)
+ Memory“ only use computers which match requirements (else error), order them by rank
– Explicit support for parameter sweeps: loop variables
• Resources registered with description; “central manager“ checks pool against application ClassAds (“matchmaking“) every 5 minutes, assigns jobs
• Checkpointing in Condor: need to recompile applications,link with special library (redirects syscalls)– Save current state for fault tolerance or vacating jobs
• Because preempted by higher priority job, machine busy, or user demands it
• Used in Grid Application Development Software Project (GrADS) for rescheduling (dynamic scheduling) and metascheduling (negotiation between multiple applications); ClassAds language extended– e.g., aggregation functions such as Max, Min, Sum
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6363
Grid workflow applicationsGrid workflow applications
• Dependencies between applications (or large parts of applications) typically specified in Directed Acyclic Graph (DAG)– Condor: DAG manager (DAGMan) uses .dag file for simple
dependencies– “Do not run job ‘B’ until job ‘A’ has completed successfully”
• DAGMan scheduling: for all tasks do...– Find task with earliest starting time– Allocate it to processor with Earlierst Finish Time– Remove task from list
• GriPhyN (Grid Physics Network) facilitates workflow designwith “Pegasus“ (Planning for Execution in Grids) framework– Specification of abstract workflow: identify application components,
formulate workflow specifying the execution order, usinglogical names for components and files
– Automatic generation of concrete wf (map components to resources)– Concrete workflow submitted to Condor-G/DAGMan
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6464
Grid Workflow Applications /2Grid Workflow Applications /2
• Components are built, Web (Grid) Services are defined,Activities are specified
• Several projects (e.g. K-WF Grid) and systems (e.g. ASKALON) exist• Most applications have simple workflows
– E.g. Montage: dissects space image, distributes processing, merges results
Legacy codes
Components
Web Services
Grid Workflowsbased on activities
MPI HPF OMP
HPFOMPMPI
MPI Java Legacy Codes
Descriptor GenerationComponent Interaction Optimization, Adaptation
Service DescriptionDiscovery, SelectionDeployment, Invocation
Dynamic InstantiationService OrchestrationQuality of Service
Source: presentation by Thomas Fahringer
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6565
Source: http://www.dps.uibk.ac.at/projects/teuta/
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6666
Scheduling example: HEFT Scheduling example: HEFT algorithmalgorithmStep 1 - task prioritizingStep 1 - task prioritizing
• Rank of a task: longest “distance“ to the end(Mean processing + transfer costs)
• Tasks are sorted by decreasing rank order
Task
P1 P2
1 1 1
2 0.5 1.5
3 2 2
4 1.5 2.5
5 0.5 0.5
Task
Rank calculation
5 0.5
4 2+0.5+7=9.5
3 2+0.5+2=4.5
2 1+max(0.5+2+2+1, 0.5+7+2+2)=12.5
1 1+max(12.5+3, 0.5+7+2+4)=16.5
2
1
1
2
43
5
3
4
1
27
0.5
2
1
2
Task
Rank
1 16.5
2 12.5
4 9.5
3 4.5
5 0.5
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6767
Step 2 - processor selection (EFT)Step 2 - processor selection (EFT)
Task
P1 P2
1 1 1
2 0.5 1.5
4 1.5 2.5
3 2 2
5 0.5 0.5
1
2
43
5
3
4
1
27
3
2
1
2
1
2
1
2
3
4
FT(T1, P1) = 1FT(T1, P2) = 1
FT(T2, P1) = 1+0.5=1.5FT(T2, P2) = 1+3+1.5=5.5
FT(T4, P1) = 1.5+1.5=3FT(T4, P2) = 1.5+2+2.5=6
FT(T3, P1) = 3+2=5FT(T3, P2) = 1.5+1+2=4.5
FT(T5, P1) = 4.5+2+0.5=7FT(T5, P2) = 3+7+0.5=10.5
P1 P20
1
2
3
4
5
6
7
Processor idle + task readyData transferTask processing5
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6868
HEFT discussionHEFT discussion
• HEFT is not a solution, just a heuristic– problem is known to be NP-complete
• Outperformed competitors (DAGMan scheduling, genetic algorithm) in ASKALON real-life experiments– Still, many improvements possible
e.g., other functions than mean, and extension for rescheduling suggested
• Heterogeneous networkcapacities and trafficinteractions ignored
Tasks = {T1, T2, T3, T4}Resources = {R1, R2, R3, R4}Data transfers = {D1, D2, D3, D4}
Not detected!
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 6969
How far have we come?How far have we come?
• Remember: systems on last slides are still research– Not standardized, not part of reference middleware implementations– Right place (OS / Middleware / App) for some functions still undecided
• A lot is still manual– Basically three choices for deploying an application on the Grid
• Simply use it if it‘s a parameter sweep• “Gridify“ it (rewrite using customized allocation - e.g. MPICH-G2)• Utilize a workflow tool
• Convergence between P2P systems and Grids has only just begun
• Several issues and possible improvements– Large number of layers are a mismatch for high performance
demands– Network usage simplistic, no customized mechanisms
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7070
How far have we come? /2How far have we come? /2
• Strangely, parallel processing background seems to be ignored– E.g., work on task-processor mapping + P2P overlays such as
hypercube = ?
Granularity
Complexity(Dependencies)
Microcode
Instruction level
parallelism
Arbitrary parallel applications
Parameter
sweeps
Workflow application
s
Manual aidneeded
Untouched!
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7171
Grid InterNetworkingGrid InterNetworking
An example of ongoing research
Note: “InterNetworking“ refers to networking assuming infrastructure = public Internet(some different work was done for large dedicated networks(e.g. User Controlled LightPath, UCLP) )
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7272
Enriched with customisedEnriched with customisednetwork mechanismsnetwork mechanisms
Original Internet technologyOriginal Internet technology
Traditional Internet Traditional Internet applicationsapplications(web browser, ftp, ..)(web browser, ftp, ..)
Real-time multimedia Real-time multimedia applications (VoIP, applications (VoIP, video conference, ..)video conference, ..)
Today‘s Grid Today‘s Grid applicationsapplications
Driving a racing carDriving a racing caron a public roadon a public road
Applications with specialApplications with specialnetwork properties andnetwork properties andrequirementsrequirements
Bringing the Grid to its full potential !Bringing the Grid to its full potential !
EC-GINEC-GIN EC-GINEC-GIN enabled enabledGrid applicationsGrid applications
Research gap: Grid-specificResearch gap: Grid-specificnetwork enhancementsnetwork enhancements
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7373
Grid-network peculiaritiesGrid-network peculiarities• Special behavior
– Predictable traffic pattern - this is totally new to the Internet!– Web: users create traffic– FTP download: starts ... ends– Streaming video: either CBR or depends on content! (head movement, ..)
• Could be exploited by congestion control mechanisms– Distinction: Bulk data transfer (e.g. GridFTP) vs. control messages (e.g.
SOAP)– File transfers are often “pushed“ and not “pulled“– Distributed System which is active for a while
• overlay based network enhancements possible– Multicast– P2P paradigm: “do work for others for the sake of enhancing the whole
system (in your own interest)“ can be applied - e.g. act as a PEP, ...• sophisticated network measurements possible
– can exploit longevity and distributed infrastructure
• Special requirements– file transfer delay predictions
• note: useless without knowing about shared bottlenecks– QoS, but for file transfers only (“advance reservation“)
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7474
What is EC-GIN?What is EC-GIN?
• European project: Europe-China Grid InterNetworking– STREP in IST FP6 Call 6– 2.2 MEuro, 11 partners (7 Europe + 4 China)– Networkers developing mechanisms for Grids
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7575
Research ChallengesResearch Challenges
• Research Challenges:– How to model Grid traffic?
• Much is known about web traffic (e.g. self-similarity) - but the Grid is different!
– How to simulate a Grid-network?• Necessary for checking various environment conditions• May require traffic model (above)• Currently, Grid-Sim / Net-Sim are two separate worlds
(different goals, assumptions, tools, people)
– How to specify network requirements?• Explicit or implicit, guaranteed or “elastic“, various possible levels of
granularity
– How to align network and Grid economics?• Combined usage based pricing for various resources including the network
– What P2P methods are suitable for the Grid?• What is the right means for storing short-lived performance data?
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7676
NWS: The Network Weather ServiceNWS: The Network Weather Service
• Most common tool for performance prediction– Important for making good scheduling decisions
• Distributed system consisting of– Name Server (boring)– Sensor - actual measurement instance, regularly stores values in......– Persistent State– Forecaster (calculations based on data in Persistent State)
• Interesting parts:
– SensorMeasured resources: availableCpu, bandwidthTcp, connectTimeTcp, currentCpu, freeDisk, freeMemory, latencyTcp
– ForecasterApply different models for prediction, compare with actual measurement data, choose best match
Duration of a long TCP transfer
RTT of a small message
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7777
NWS critiqueNWS critique
• Architecture (splitting into sensors, forecaster etc.) seems reasonable;open source consider integrating new work in NWS
• Sensor– active measurements even though non-intrusiveness was an important
design goal - does not passively monitor TCP (i.e. ignores available data)– strange methodology:
(Large message throughput) “Empirically, we have observed that a message size of 64K bytes (..) yields meaningful results“
– ignores packet size ( = measurement granularity ) and path characteristics– trivial method - much more sophisticated methods available– point-to-point measurements: distributed infrastructure not taken into
account
• Forecaster– relies on these weird measurements, where we don‘t know much about the
distribution (but we do know some things about net traffic IFF properly measured)
– uses quite trivial models (but they may in fact suffice...)
ssthresh
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7878
NWS measurements (Austrian Grid)NWS measurements (Austrian Grid)Muhammad Murtaza Yousaf, Michael Welzl, Malik Muhammad Junaid: "Fog in the Network Muhammad Murtaza Yousaf, Michael Welzl, Malik Muhammad Junaid: "Fog in the Network Weather Service: A case for novel Approaches", MetroGrid workshop, co-located with GridNets Weather Service: A case for novel Approaches", MetroGrid workshop, co-located with GridNets 2007, Lyon, France, 19 October 2007.2007, Lyon, France, 19 October 2007.
• Salzburg-Linz (left): more than 20 MB needed to saturate link• Within Innsbruck (right), Gigabit link: around 100 MB needed
• NWS supposedly designed to be non-intrusive...
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 7979
The impact of shared bottlenecksThe impact of shared bottlenecks
• Example problem:– C allocates tasks to A and B (CPU, memory available); both send results to C– B hinders A - task of B should have been kept at C!
• Path changes are rare - thus, possible to detect potential problem in advance– generate test messages from A, B to C - identify signature from B in A‘s traffic
• Another issue in this scenario: how valid is a prediction that A obtains if a measurement / prediction system does not know about the shared bottleneck?
A
B
C
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 8080
EC-GIN Large File Transfer Scenario EC-GIN Large File Transfer Scenario (LFTS)(LFTS)
Multipath file transfer
(AB + ACB) beneficial
Multipath file transfer not beneficial
due to shared bottleneck
Questions: when does this make sense, how to expose this functionality, how to authenticate and authorize…?
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 8181
Shared bottleneck detection with Shared bottleneck detection with SVDSVDMuhammad Murtaza Yousaf, Michael Welzl, Bulent Yener (2007); under submissionMuhammad Murtaza Yousaf, Michael Welzl, Bulent Yener (2007); under submission
• Input: end-to-end forward delays of multiple flows• Analysis
– Multivariate Analysis Method SVD (Singular Value Decomposition)• Matrix operation which yields clustered values for correlating flows
– Calculate differences between values, consider changes between clusters as outliers, apply simple outlier detection method
• Output: clusters of flows which share a bottleneck– Very precise, easy to calculate, can cluster multiple flows at the same
time (other work uses pairwise cross correlation)
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 8282
Extending the Padhye equation to Extending the Padhye equation to NN flows flowsDragana Damjanovic, Werner Heiss, Michael Welzl: "An Extension of the TCP Steady-State Dragana Damjanovic, Werner Heiss, Michael Welzl: "An Extension of the TCP Steady-State Throughput Equation for Parallel TCP Flows", poster, ACM SIGCOMM 2007, 27-31 August, Throughput Equation for Parallel TCP Flows", poster, ACM SIGCOMM 2007, 27-31 August, Kyoto, Japan.Kyoto, Japan.
• Fair amount of work done, but so far, no (easily usable) approximation exists which also takes loss into account
• Useful in a Grid for multiple reasons:– Prediction of GridFTP throughput (multiple TCP flows)– Protocol with tunable aggression (MulTFRC)
• Because, if a Grid application uses two flows which share a bottleneck, flow 1 may be 3.7 times as important to it as flow 2 (e.g. if flow 2 is from replication)
• For fairness: if we take a break, we may earn “aggression points“
)484(][)44(
][)325326(
][)12164(][120
2343
22
34
pbnpbnpbnWEbnppbn
WEnbppbnnpbnp
WEbnpnpbpWbpE
we get E[W] depending on n, pand b
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 8383
Other current EC-GIN workOther current EC-GIN work
• Scheduling of advance reservations for bulk data transfers (using high-speed congestion control mechanisms)– another NP-complete scheduling problem
• Work package dedicated to modeling (and ns-2 simulation code)– currently collecting measurements from everywhere...– Grid-Net scheduling simulator using ns-2 already available
• High-speed SOAP engine– Tackling multiple parts of the SOAP stack: asynchronous I/O, pull
based XML parsing, data mapping template
• Extending P2P incentive + security mechanisms for the Grid
Uni Innsbruck Informatik - Uni Innsbruck Informatik - 8484
ConclusionConclusion
• Grids exist, they are working and growing– and complex– only briefly touched upon several aspects of the Grid here
• At the same time, still a lot of room for enhancement– e.g. Grid InterNetworking is a research gap
• We are still quite far away from the power outlet goal
• Recently, the term “Grid“ has become unpopular– replaced with Service Oriented [whatever] (SO-?)– New EC vision: “Service Oriented Knowledge Utility“ (SOKU)
• Web Services + Semantics + Dynamics …
• Terms may change, but the user demand and researcher / developer interest will not go away; Grid will stay, but evolution still ongoing