
Posted on 11-Jun-2020



Management and usage of large scale infrastructures

dacosta@irit.fr

2

Grid Computing and clouds

● Ian Foster on grids: “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”.

● Clouds: sharing of resources over a network to achieve coherence and economies of scale, similar to a utility (like the electricity grid)

Should be easy to use and could be easy to manage

3

User point of view

4

Several goals

● Low latency
– How long do I have to wait for my job to complete?

● High throughput
– How many jobs can I finish in a given timeframe?

● Low cost
– How much does it cost me?

● Low complexity
– How much time do I spend managing my jobs?

5

Reduce the complexity

● Standardization bodies
● Use of open protocols
● High-level QoS
● Abstraction levels
– Grid: everything is a resource
– Cloud: PaaS, IaaS, SaaS

Services!!!

6

Frontends

● First contact
– Website

● Dedicated
– Eclipse plugin: developers only
– Command line: experts only

7

Why make it simple?

● Framework for building submission websites

8

Cloud version: same complexity

9

Simple Job Submission

● Submit a job to a GRAM service: default factory EPR, job RSL generated for the default host (localhost)

● Command example:

    % globusrun-ws -submit -c /bin/touch touched_it
    Submitting job...Done.
    Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
    Termination time: 01/07/2005 22:55 GMT
    Current job state: Active
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
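The console output above reports each state the job passes through. A minimal sketch, assuming the output format is exactly the one shown (one "Current job state: X" report per line), shows how a wrapper could extract the job ID and the state sequence. `GramOutputParser` and its methods are hypothetical names, not part of the Globus Toolkit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the Globus Toolkit): extracts the job ID and
// the sequence of job states from globusrun-ws console output, assuming one
// state report per line as in the example above.
final class GramOutputParser {
    private static final Pattern JOB_ID =
            Pattern.compile("Job ID: (uuid:[0-9a-fA-F-]+)");
    private static final Pattern STATE =
            Pattern.compile("^Current job state: (\\w+)$", Pattern.MULTILINE);

    // Returns the job UUID, or null if the output contains none.
    static String jobId(String output) {
        Matcher m = JOB_ID.matcher(output);
        return m.find() ? m.group(1) : null;
    }

    // Returns the reported states in order of appearance.
    static List<String> states(String output) {
        List<String> result = new ArrayList<>();
        Matcher m = STATE.matcher(output);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "Submitting job...Done.",
                "Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5",
                "Termination time: 01/07/2005 22:55 GMT",
                "Current job state: Active",
                "Current job state: CleanUp",
                "Current job state: Done",
                "Destroying job...Done.");
        System.out.println(jobId(output));   // uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
        System.out.println(states(output));  // [Active, CleanUp, Done]
    }
}
```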

10

Simple Job Submission: Cloud version

11

Security is complex

● In clouds
– Isolation and no sharing
– Delegated to other layers

● In grids
– Virtual organization
– Cooperation between sites
– Trust mechanisms

12

Grid Security Infrastructure (GSI)

● Based on certificates
● Several CAs (Certificate Authorities)
● Trust relations are inherited from the CA
● Communications are based on SSL
● Coarse grained
– Not suited to reading a few bytes from a file
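The rule that trust is inherited from the CA can be illustrated with a toy model: a certificate is trusted if following its issuer chain reaches a trusted CA. This is neither GSI nor `java.security` code; the `Cert` class and the CA names are hypothetical, and real X.509 validation also checks signatures, validity dates and revocation.

```java
import java.util.Map;
import java.util.Set;

// Toy model of CA-based trust (hypothetical, not GSI or java.security code):
// a certificate only records its subject and its issuer.
final class TrustCheck {
    static final class Cert {
        final String subject;
        final String issuer;  // equal to subject for a self-signed root CA

        Cert(String subject, String issuer) {
            this.subject = subject;
            this.issuer = issuer;
        }
    }

    // Follows the issuer chain until it reaches a trusted CA (trusted),
    // or an unknown issuer / untrusted self-signed root (not trusted).
    static boolean isTrusted(String subject, Map<String, Cert> known, Set<String> trustedCAs) {
        String current = subject;
        // A valid chain cannot be longer than the number of known certificates.
        for (int step = 0; step <= known.size(); step++) {
            if (trustedCAs.contains(current)) {
                return true;
            }
            Cert cert = known.get(current);
            if (cert == null || cert.issuer.equals(current)) {
                return false;
            }
            current = cert.issuer;
        }
        return false;  // cycle in the issuer chain
    }

    public static void main(String[] args) {
        Map<String, Cert> known = Map.of(
                "alice", new Cert("alice", "SiteCA"),
                "SiteCA", new Cert("SiteCA", "GridRootCA"),
                "GridRootCA", new Cert("GridRootCA", "GridRootCA"),
                "mallory", new Cert("mallory", "RogueCA"),
                "RogueCA", new Cert("RogueCA", "RogueCA"));
        Set<String> trusted = Set.of("GridRootCA");
        System.out.println(isTrusted("alice", known, trusted));    // true
        System.out.println(isTrusted("mallory", known, trusted));  // false
    }
}
```

Trusting one root CA is enough to trust every certificate it (indirectly) issued, which is why cooperating grid sites mainly have to agree on a set of CAs.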

13

Grid Security Infrastructure (GSI)

14

Timing and methodology

● Clouds
– Everything by hand: you get what you pay for
– PaaS / SaaS / IaaS: deployment/development depends on what you buy

● Grids
– Standardized (everything is a resource)
– Can do everything, so everything is a pain

15

Example of Grid data communication

● Globus WSRF: Web Services Resource Framework

● Data access is a service

16

Provider point of view

17

Job flow in grids. Question: how many decisions?

18

Basic useful services

● VO Management Service: resource allocation to each Virtual Organization

● Resource Discovery and Management Service

● Job Management Service

● And much more: security (authentication, authorisation), data management…

● All services interact: for example, the Job Management Service needs Resource Discovery

● Interfaces to services need standardization. Example: JobSubmissionService has a submitJob() method
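The dependency between services can be sketched with plain Java interfaces. `JobSubmissionService` and `submitJob()` come from the slide; their parameters, `ResourceDiscoveryService`, `ComputeResource` and the selection rule are hypothetical illustrations, not the OGSA or Globus APIs.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical service interfaces: the job manager depends only on the
// discovery *interface*, so any implementation following it can be plugged in.
final class GridServices {
    static final class ComputeResource {
        final String name;
        final int freeCores;

        ComputeResource(String name, int freeCores) {
            this.name = name;
            this.freeCores = freeCores;
        }
    }

    interface ResourceDiscoveryService {
        List<ComputeResource> findResources(int minFreeCores);
    }

    interface JobSubmissionService {
        // Returns where the job was placed, or empty if no resource matches.
        Optional<String> submitJob(String executable, int cores);
    }

    static final class SimpleJobManager implements JobSubmissionService {
        private final ResourceDiscoveryService discovery;

        SimpleJobManager(ResourceDiscoveryService discovery) {
            this.discovery = discovery;
        }

        @Override
        public Optional<String> submitJob(String executable, int cores) {
            // Take the first matching resource; a real broker would rank them.
            return discovery.findResources(cores).stream()
                    .findFirst()
                    .map(r -> executable + "@" + r.name);
        }
    }

    public static void main(String[] args) {
        List<ComputeResource> catalog = List.of(
                new ComputeResource("siteA", 4), new ComputeResource("siteB", 16));
        ResourceDiscoveryService discovery = min -> catalog.stream()
                .filter(r -> r.freeCores >= min)
                .collect(Collectors.toList());
        JobSubmissionService jobs = new SimpleJobManager(discovery);
        System.out.println(jobs.submitJob("/bin/touch", 8));   // Optional[/bin/touch@siteB]
        System.out.println(jobs.submitJob("/bin/touch", 32));  // Optional.empty
    }
}
```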

19

Base infrastructure to implement the OGSA architecture?

OGSA: Open Grid Services Architecture

● Method invocation should also be standardized. CORBA? RMI? RPC? No: Web Services!!

● But we need stateful Web Services!

● WSRF: Web Services Resource Framework

20

The Web services WSDL/SOAP/HTTP pancake

In theory extensible and generic. In reality complex and monolithic.

21

Going more inside Web services invocations

You don’t have to program the stubs, nor the SOAP requests/responses, just like with CORBA and RMI.
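A client stub turns an ordinary method call into a request message. The sketch imitates that with the JDK's `java.lang.reflect.Proxy`; the envelope format is a simplified, hypothetical stand-in for SOAP and nothing goes over the network, whereas a real toolkit generates the stub from the WSDL.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Sketch of what a generated stub does, using a JDK dynamic proxy.
// The "outbox" list stands in for the HTTP transport.
final class StubSketch {
    // The service interface as the client programmer sees it.
    interface Calculator {
        void add(int a);
        void subtract(int a);
    }

    static Calculator createStub(List<String> outbox) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                throw new UnsupportedOperationException(method.getName() + " is not a remote operation");
            }
            // Serialize the call: operation name wrapping its arguments.
            StringBuilder body = new StringBuilder("<" + method.getName() + ">");
            if (args != null) {
                for (Object arg : args) {
                    body.append(arg);
                }
            }
            body.append("</").append(method.getName()).append(">");
            outbox.add("<Envelope><Body>" + body + "</Body></Envelope>");
            return null;  // both operations are void
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] {Calculator.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> outbox = new ArrayList<>();
        Calculator calc = createStub(outbox);
        calc.add(5);        // looks like a local call...
        calc.subtract(2);
        outbox.forEach(System.out::println);  // ...but produced two messages
    }
}
```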

22

From stateless to stateful WS

Using the concept of resources

23

WS-Resources

Web Service + Resource = WS-Resource. To address these, we need an endpoint reference that specifies the resource.

Think of how simple DNS or the RMI registry are... Nope, not here.
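The WS-Resource idea (service logic plus separately stored state, selected by the endpoint reference) can be sketched in plain Java. `EndpointReference`, `CounterResource` and `CounterService` are hypothetical names; in GT4 the EPR is a WS-Addressing XML structure and the lookup is handled by the middleware.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Plain-Java sketch of the WS-Resource pattern (hypothetical names, not the GT4 API):
// the operations carry no per-client state themselves; the state lives in resources
// kept in a separate map and selected by the key carried in the endpoint reference.
final class WsResourceSketch {
    // Endpoint reference = service address + key of the targeted resource.
    static final class EndpointReference {
        final String address;
        final String resourceKey;

        EndpointReference(String address, String resourceKey) {
            this.address = address;
            this.resourceKey = resourceKey;
        }
    }

    // The stateful part: one counter per resource.
    static final class CounterResource {
        int value;
    }

    static final class CounterService {
        private final String address;
        private final Map<String, CounterResource> resources = new HashMap<>();  // "resource home"

        CounterService(String address) {
            this.address = address;
        }

        // Factory operation: creates a resource and returns its EPR.
        EndpointReference createResource() {
            String key = UUID.randomUUID().toString();
            resources.put(key, new CounterResource());
            return new EndpointReference(address, key);
        }

        // Every operation is resolved against the resource named in the EPR.
        int add(EndpointReference epr, int a) {
            CounterResource r = resources.get(epr.resourceKey);
            if (r == null) {
                throw new IllegalArgumentException("Unknown resource: " + epr.resourceKey);
            }
            r.value += a;
            return r.value;
        }
    }

    public static void main(String[] args) {
        CounterService service = new CounterService("http://localhost:8080/wsrf/services/CounterService");
        EndpointReference first = service.createResource();
        EndpointReference second = service.createResource();
        service.add(first, 5);
        System.out.println(service.add(first, 2));   // 7
        System.out.println(service.add(second, 1));  // 1: independent state
    }
}
```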

24

Specification, WSRF and more

● WS-ResourceProperties: defined in the WSDL interface

● WS-ResourceLifetime: manages the lifecycle of WS-Resources

● WS-ServiceGroup: groups services or WS-Resources together
– allows finding the services in the group that have a particular property
– allows addressing all services of the group through one entry point

● WS-BaseFaults: fault reporting

● WS-Notification: producer/consumer model

● WS-Addressing: addressing WS-Resources
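WS-ResourceLifetime covers both immediate destruction and scheduled destruction once a termination time has passed (compare the "Termination time" line in the globusrun-ws output). The sketch models only that bookkeeping; `LifetimeSketch` and its methods are hypothetical and do not follow the specification's message formats.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of WS-ResourceLifetime bookkeeping: each resource key
// maps to its termination time.
final class LifetimeSketch {
    private final Map<String, Instant> terminationTimes = new HashMap<>();

    // Create a resource, or renew its lease, with termination time now + lifetime.
    void setTerminationTime(String key, Instant now, Duration lifetime) {
        terminationTimes.put(key, now.plus(lifetime));
    }

    // Immediate destruction requested by a client.
    void destroy(String key) {
        terminationTimes.remove(key);
    }

    // Scheduled destruction: removes expired resources, returns how many were destroyed.
    int sweep(Instant now) {
        int before = terminationTimes.size();
        terminationTimes.values().removeIf(t -> !t.isAfter(now));
        return before - terminationTimes.size();
    }

    boolean exists(String key) {
        return terminationTimes.containsKey(key);
    }

    public static void main(String[] args) {
        LifetimeSketch home = new LifetimeSketch();
        Instant t0 = Instant.parse("2005-01-07T22:00:00Z");
        home.setTerminationTime("job-1", t0, Duration.ofMinutes(55));
        home.setTerminationTime("job-2", t0, Duration.ofMinutes(10));
        System.out.println(home.sweep(t0.plusSeconds(30 * 60)));  // 1 (job-2 expired)
        System.out.println(home.exists("job-1"));                 // true
    }
}
```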

25

Grid middleware provides WS-R

Grid middleware IS WS-R

26

Writing a WSRF Web/Grid Service

Five steps, only!

1. Define the service’s interface. This is done with WSDL.

2. Implement the service. This is done with Java.

3. Define the deployment parameters. This is done with WSDD and JNDI.

4. Compile everything and generate a GAR file. This is done with Ant.

5. Deploy the service. This is also done with a GT4 tool.

27

An example service interface

    public interface Math {
        public void add(int a);
        public void subtract(int a);
        public int getValueRP();
    }

In Java or IDL, the description is simple…
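Step 2 ("Implement the service") comes down to ordinary Java that keeps the two resource properties declared in this service's WSDL, `Value` and `LastOp`. The sketch uses no Globus classes: the interface is renamed `MathService` (hypothetical) so it does not shadow `java.lang.Math`, the `LastOp` strings are illustrative, and a real GT4 implementation would additionally publish these fields as resource properties.

```java
// Same operations as the Math interface above, renamed so it does not
// shadow java.lang.Math.
interface MathService {
    void add(int a);
    void subtract(int a);
    int getValueRP();
}

// Plain-Java sketch of the service state: the two fields mirror the
// Value and LastOp resource properties.
final class MathServiceImpl implements MathService {
    private int value = 0;
    private String lastOp = "NONE";  // illustrative values

    @Override
    public synchronized void add(int a) {
        value += a;
        lastOp = "ADDITION";
    }

    @Override
    public synchronized void subtract(int a) {
        value -= a;
        lastOp = "SUBTRACTION";
    }

    @Override
    public synchronized int getValueRP() {
        return value;
    }

    public synchronized String getLastOp() {
        return lastOp;
    }

    public static void main(String[] args) {
        MathServiceImpl service = new MathServiceImpl();
        service.add(10);
        service.subtract(3);
        System.out.println(service.getValueRP() + " " + service.getLastOp());  // 7 SUBTRACTION
    }
}
```

The methods are `synchronized` because a container may serve several clients concurrently on the same state.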

28

WSDL service description

    <?xml version="1.0" encoding="UTF-8"?>
    <definitions name="MathService"
        targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance"
        xmlns="http://schemas.xmlsoap.org/wsdl/"
        xmlns:tns="http://www.globus.org/namespaces/examples/core/MathService_instance"
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
        xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
        xmlns:wsdlpp="http://www.globus.org/namespaces/2004/10/WSDLPreprocessor"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <wsdl:import
            namespace="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
            location="../../wsrf/properties/WS-ResourceProperties.wsdl" />


        <!-- ==== P O R T T Y P E ==== -->
        <portType name="MathPortType"
            wsdlpp:extends="wsrpw:GetResourceProperty"
            wsrp:ResourceProperties="tns:MathResourceProperties">
            <operation name="add">
                <input message="tns:AddInputMessage"/>
                <output message="tns:AddOutputMessage"/>
            </operation>
            <operation name="subtract">
                <input message="tns:SubtractInputMessage"/>
                <output message="tns:SubtractOutputMessage"/>
            </operation>
            <operation name="getValueRP">
                <input message="tns:GetValueRPInputMessage"/>
                <output message="tns:GetValueRPOutputMessage"/>
            </operation>
        </portType>
    </definitions>


        <!-- ====== M E S S A G E S ====== -->
        <message name="AddInputMessage">
            <part name="parameters" element="tns:add"/>
        </message>
        <message name="AddOutputMessage">
            <part name="parameters" element="tns:addResponse"/>
        </message>
        <message name="SubtractInputMessage">
            <part name="parameters" element="tns:subtract"/>
        </message>
        <message name="SubtractOutputMessage">
            <part name="parameters" element="tns:subtractResponse"/>
        </message>
        <message name="GetValueRPInputMessage">
            <part name="parameters" element="tns:getValueRP"/>
        </message>
        <message name="GetValueRPOutputMessage">
            <part name="parameters" element="tns:getValueRPResponse"/>
        </message>


        <!-- ====== T Y P E S ====== -->
        <types>
            <xsd:schema targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance"
                xmlns:tns="http://www.globus.org/namespaces/examples/core/MathService_instance"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
                <!-- REQUESTS AND RESPONSES -->
                <xsd:element name="add" type="xsd:int"/>
                <xsd:element name="addResponse">
                    <xsd:complexType/>
                </xsd:element>
                <xsd:element name="subtract" type="xsd:int"/>
                <xsd:element name="subtractResponse">
                    <xsd:complexType/>
                </xsd:element>
                <xsd:element name="getValueRP">
                    <xsd:complexType/>
                </xsd:element>
                <xsd:element name="getValueRPResponse" type="xsd:int"/>


                <!-- RESOURCE PROPERTIES -->
                <xsd:element name="Value" type="xsd:int"/>
                <xsd:element name="LastOp" type="xsd:string"/>
                <xsd:element name="MathResourceProperties">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element ref="tns:Value" minOccurs="1" maxOccurs="1"/>
                            <xsd:element ref="tns:LastOp" minOccurs="1" maxOccurs="1"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:schema>
        </types>
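To see what the `MathResourceProperties` element describes, the sketch parses a hypothetical instance document (the values are made up) with the JDK's `javax.xml.parsers` DOM API and reads `Value` and `LastOp`. It only illustrates the schema; a GT4 client would obtain these values through the GetResourceProperty operation instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Reads the Value and LastOp properties from a MathResourceProperties document.
final class MathRpReader {
    static final String NS =
            "http://www.globus.org/namespaces/examples/core/MathService_instance";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // needed for getElementsByTagNameNS
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of the first element with this local name in NS, or null if absent.
    static String property(Document doc, String localName) {
        Node node = doc.getElementsByTagNameNS(NS, localName).item(0);
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical instance document matching the schema above.
        String xml = "<tns:MathResourceProperties xmlns:tns=\"" + NS + "\">"
                + "<tns:Value>7</tns:Value>"
                + "<tns:LastOp>SUBTRACTION</tns:LastOp>"
                + "</tns:MathResourceProperties>";
        Document doc = parse(xml);
        int value = Integer.parseInt(property(doc, "Value"));
        System.out.println(value + " " + property(doc, "LastOp"));  // 7 SUBTRACTION
    }
}
```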

33

From stateless to stateful WS

Using the concept of resources

34

If you are still alive, you still have to:

● Actually write the code
● Configure the deployment with WSDD and JNDI
● Compile everything with the right libraries
● Generate a GAR file (Grid Archive)
● Deploy into a container

● And it was a simple service!

Most people just run code and forget about services

35

Behind the scene how does it work ?

36

Grids: Globus GRAM 4, everything is specified

[Figure: Globus GRAM 4 architecture. Components: client, GRAM services, Delegation, RFT, GridFTP, GRAM adapter, sudo, local scheduler, user job, compute element, compute element and service host(s), remote storage element(s). Interactions: job submit, delegate, transfer request, FTP control, FTP data, local job control.]

37

Clouds: OpenStack, somewhat specified

OpenStack: communication and metadata

38

Structure

● Monitoring
● Analyze
● Decision
● Implementation

MAPE-K loop

Concept view: actually several cooperative decisions
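One iteration of the loop can be sketched as: record a sample (Monitoring), derive a metric (Analyze), choose an action (Decision), apply it (Implementation/Execute), all sharing one knowledge base. The elastic-service scenario, thresholds, window size and action names are hypothetical choices for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical MAPE-K loop for an elastic service.
final class MapeKLoop {
    enum Action { SCALE_OUT, SCALE_IN, NONE }

    // Knowledge: state shared by all phases.
    private final Deque<Double> recentLoads = new ArrayDeque<>();
    private final int window = 3;
    private final double high = 0.8;
    private final double low = 0.2;
    private int instances = 1;

    // Monitoring: keep the last 'window' load samples (fraction of capacity used).
    void monitor(double load) {
        recentLoads.addLast(load);
        if (recentLoads.size() > window) {
            recentLoads.removeFirst();
        }
    }

    // Analyze: turn raw samples into a metric (moving average).
    double analyze() {
        return recentLoads.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Decision: compare the metric with the thresholds.
    Action decide(double averageLoad) {
        if (averageLoad > high) return Action.SCALE_OUT;
        if (averageLoad < low && instances > 1) return Action.SCALE_IN;
        return Action.NONE;
    }

    // Execute: apply the action (here only the instance count changes).
    void execute(Action action) {
        if (action == Action.SCALE_OUT) instances++;
        if (action == Action.SCALE_IN) instances--;
    }

    // One full iteration of the loop for a new sample.
    Action iterate(double load) {
        monitor(load);
        Action action = decide(analyze());
        execute(action);
        return action;
    }

    int instances() {
        return instances;
    }

    public static void main(String[] args) {
        MapeKLoop loop = new MapeKLoop();
        for (double load : new double[] {0.9, 0.0, 0.0, 0.0}) {
            System.out.println(loop.iterate(load) + " -> " + loop.instances() + " instance(s)");
        }
    }
}
```

The moving average in Analyze damps reactions to a single sample, one simple way to avoid oscillating between decisions.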

39

Monitoring

● Grid: integrated monitoring
– Ganglia
– NWS, Network Weather Service (adds prediction)
– Nagios

● Cloud
– Provider: integrated
– User: no access to provider data; if you want something, deploy it yourself

40

Monitoring example : Ganglia

● Goal: High performance

– Small messages to reduce network impact

– Hierarchical structure with aggregation nodes

– Scalability (a few thousand nodes)

● Several components

– XDR for portable non-intrusive communication

– RRDtool for data storage and manipulation

– XML for data format

● Open Source
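The hierarchical structure with aggregation nodes can be sketched as a tree where each node sends its parent one fixed-size summary (host count and load sum), so messages stay small whatever the number of hosts below. This is a hypothetical illustration of the idea, not Ganglia's gmond/gmetad code or its XDR format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hierarchical aggregation: summaries have constant size.
final class AggregationTree {
    static final class Summary {
        final int hosts;
        final double loadSum;

        Summary(int hosts, double loadSum) {
            this.hosts = hosts;
            this.loadSum = loadSum;
        }

        double averageLoad() {
            return hosts == 0 ? 0.0 : loadSum / hosts;
        }
    }

    interface Node {
        Summary summarize();
    }

    // Leaf: a monitored host reporting its own load.
    static final class Host implements Node {
        private final double load;

        Host(double load) {
            this.load = load;
        }

        @Override
        public Summary summarize() {
            return new Summary(1, load);
        }
    }

    // Aggregation node: merges its children's summaries into one.
    static final class Aggregator implements Node {
        private final List<Node> children = new ArrayList<>();

        Aggregator add(Node child) {
            children.add(child);
            return this;
        }

        @Override
        public Summary summarize() {
            int hosts = 0;
            double loadSum = 0.0;
            for (Node child : children) {
                Summary s = child.summarize();
                hosts += s.hosts;
                loadSum += s.loadSum;
            }
            return new Summary(hosts, loadSum);
        }
    }

    public static void main(String[] args) {
        Aggregator cluster1 = new Aggregator().add(new Host(0.5)).add(new Host(1.5));
        Aggregator cluster2 = new Aggregator().add(new Host(1.0));
        Summary grid = new Aggregator().add(cluster1).add(cluster2).summarize();
        System.out.println(grid.hosts + " hosts, average load " + grid.averageLoad());  // 3 hosts, average load 1.0
    }
}
```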

41

Analyze

Metrics: computed from raw monitoring data (e.g. energy consumption)

● Grid: usually performance

– How many jobs are running

– How many are waiting

– How far are the deadlines

– Everything is at 100%

– Energy does (not) matter
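As an example of a metric computed from raw monitoring data, energy can be obtained from timestamped power samples with the trapezoidal rule, E ≈ Σ (P_i + P_{i+1}) / 2 · (t_{i+1} − t_i). The sketch assumes power in watts and timestamps in seconds, giving joules (divide by 3.6e6 for kWh).

```java
// Energy from power samples with the trapezoidal rule.
final class EnergyMetric {
    // timesSec[i] is the timestamp (seconds) of powerWatts[i]; timestamps must be increasing.
    static double energyJoules(double[] timesSec, double[] powerWatts) {
        if (timesSec.length != powerWatts.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double energy = 0.0;
        for (int i = 0; i + 1 < timesSec.length; i++) {
            double dt = timesSec[i + 1] - timesSec[i];
            energy += 0.5 * (powerWatts[i] + powerWatts[i + 1]) * dt;
        }
        return energy;
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2};
        double[] p = {100, 200, 100};
        System.out.println(energyJoules(t, p) + " J");  // 300.0 J
    }
}
```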

42

Analyze

Metrics
● Cloud
– Abstract « performance » does not exist: only users (QoS)
– Provider has an infrastructure point of view: unused resources, cost (electricity & management)
– Some classical metrics (Question: for whom?): performance, energy, reliability, dynamism

43

Decision

● Grids: already covered
– Most important: where and when to run tasks

● Clouds
– User: optimize QoS (start new instances, modify the resource allocation of current instances)
– Provider: save money (and electricity) through consolidation and switching servers on/off

44

Grid example: backfilling

Question: if 5 is longer, can we move 4? What could be the negative impact?
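To make the backfilling question concrete, the sketch implements EASY backfilling, one common variant chosen here as an assumption (the slide does not name one). The first waiting job gets a reservation at its "shadow time"; a later job may start now only if it fits in the idle processors and either ends before the shadow time or uses only processors the reserved job will not need. Runtimes are user estimates, and all numbers are hypothetical.

```java
import java.util.Arrays;

// Sketch of the EASY backfilling test. Times are in arbitrary units.
final class EasyBackfill {
    // For the first waiting job (which cannot start now), returns {shadowTime, extraProcs}:
    // the earliest time enough processors are free for it, and how many will still be spare then.
    static long[] reservation(int freeProcs, long[] runningEnd, int[] runningProcs, int headProcs) {
        Integer[] order = new Integer[runningEnd.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(runningEnd[a], runningEnd[b]));
        int available = freeProcs;
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            available += runningProcs[i];
            // Release every job ending at the same time before deciding.
            boolean lastEndingNow = k + 1 == order.length || runningEnd[order[k + 1]] != runningEnd[i];
            if (available >= headProcs && lastEndingNow) {
                return new long[] {runningEnd[i], available - headProcs};
            }
        }
        throw new IllegalArgumentException("head job needs more processors than the machine has");
    }

    // A later job may start now if it fits in the idle processors and it either
    // ends before the shadow time or only uses processors the head job will not need.
    static boolean canBackfill(long now, int freeProcs, long shadowTime, long extraProcs,
                               int jobProcs, long jobRuntime) {
        if (jobProcs > freeProcs) {
            return false;
        }
        return now + jobRuntime <= shadowTime || jobProcs <= extraProcs;
    }

    public static void main(String[] args) {
        // 12 processors: 4 idle, job A (6 procs) ends at t=10, job B (2 procs) ends at t=4.
        long[] r = reservation(4, new long[] {10, 4}, new int[] {6, 2}, 10);
        System.out.println("shadow=" + r[0] + " extra=" + r[1]);   // shadow=10 extra=2
        System.out.println(canBackfill(0, 4, r[0], r[1], 3, 5));   // true: ends before t=10
        System.out.println(canBackfill(0, 4, r[0], r[1], 3, 20));  // false: would delay the head job
        System.out.println(canBackfill(0, 4, r[0], r[1], 2, 20));  // true: fits in the spare processors
    }
}
```

The check is only as good as the runtime estimates it receives.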

45

Cloud example: steps for consolidation
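Consolidation is often modeled as bin packing: place the VMs on as few servers as possible so the others can be switched off. The sketch uses first-fit decreasing on a single CPU dimension, a simplification chosen here for illustration that ignores memory and migration cost; the demands and capacity are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// First-fit decreasing on one resource dimension (CPU units).
final class Consolidation {
    // Returns one list of VM demands per powered-on server.
    static List<List<Integer>> firstFitDecreasing(int[] vmDemands, int serverCapacity) {
        int[] sorted = vmDemands.clone();
        Arrays.sort(sorted);  // ascending; iterated backwards below
        List<List<Integer>> servers = new ArrayList<>();
        List<Integer> load = new ArrayList<>();
        for (int i = sorted.length - 1; i >= 0; i--) {
            int demand = sorted[i];
            if (demand > serverCapacity) {
                throw new IllegalArgumentException("VM larger than a server: " + demand);
            }
            int target = -1;
            for (int s = 0; s < servers.size() && target < 0; s++) {
                if (load.get(s) + demand <= serverCapacity) {
                    target = s;
                }
            }
            if (target < 0) {  // no server fits: switch on a new one
                servers.add(new ArrayList<>());
                load.add(0);
                target = servers.size() - 1;
            }
            servers.get(target).add(demand);
            load.set(target, load.get(target) + demand);
        }
        return servers;
    }

    public static void main(String[] args) {
        int[] vms = {2, 5, 4, 7, 1, 3};
        System.out.println(firstFitDecreasing(vms, 10));  // [[7, 3], [5, 4, 1], [2]]
    }
}
```

Here 22 CPU units fit on 3 servers of capacity 10, the lower bound ⌈22/10⌉; the remaining servers could be switched off.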

46

Limits
● Consolidation
– Real servers don't switch off

– Service interruption (even if only a few ms)

– Isolation

● Scheduling in general

– Fairness

– QoS evaluation

– Multiple metrics for antagonistic objectives: « Performance », energy, resilience, dynamism
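Antagonistic objectives usually have no single best answer. One common way to reason about them, chosen here for illustration rather than taken from the slides, is Pareto dominance: keep only the schedules that no other schedule beats on every metric. The metrics (e.g. makespan and energy) and values are hypothetical, and every metric is treated as "lower is better".

```java
import java.util.ArrayList;
import java.util.List;

// Pareto front over candidate schedules, each described by a vector of metrics.
final class ParetoFront {
    // a dominates b if it is no worse on every metric and strictly better on at least one.
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    // Returns the indices of candidates not dominated by any other candidate.
    static List<Integer> front(double[][] candidates) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            boolean dominated = false;
            for (int j = 0; j < candidates.length && !dominated; j++) {
                dominated = j != i && dominates(candidates[j], candidates[i]);
            }
            if (!dominated) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // {makespan (s), energy (kWh)} for four hypothetical schedules.
        double[][] c = {{100, 50}, {120, 40}, {130, 60}, {90, 70}};
        System.out.println(front(c));  // [0, 1, 3]
    }
}
```

Choosing among the remaining schedules still requires a policy decision (weights, priorities), which is exactly where the objectives conflict.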

Question: how to manage reliability?

47

Execute

● User: depends on the application
– Reconfiguration
– Data migration (web server, database)
– Scalability of the application

● Provider
– Latency problems: switching a node on/off takes ~1 min
– Scale problem: switching 1000 nodes on/off causes power peaks
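The power-peak problem suggests staggering: switch nodes on in waves so that at most `batch` nodes boot at the same time. The sketch assumes every booting node draws a fixed extra power for the whole boot (about 1 minute, as above); the per-node wattage and the batch size are hypothetical parameters.

```java
// Back-of-the-envelope model of staggered power-on.
final class StaggeredPowerOn {
    // Number of boot waves needed to start 'nodes' machines, 'batch' at a time.
    static int waves(int nodes, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        return (nodes + batch - 1) / batch;  // ceiling division
    }

    // Peak extra power: at most one wave boots at any moment.
    static double peakBootPowerWatts(int nodes, int batch, double bootWattsPerNode) {
        return Math.min(nodes, batch) * bootWattsPerNode;
    }

    // Total time to bring every node up when waves run back to back.
    static int totalSeconds(int nodes, int batch, int bootSeconds) {
        return waves(nodes, batch) * bootSeconds;
    }

    public static void main(String[] args) {
        System.out.println(waves(1000, 100));                     // 10
        System.out.println(totalSeconds(1000, 100, 60));          // 600
        System.out.println(peakBootPowerWatts(1000, 100, 50.0));  // 5000.0
    }
}
```

The trade-off is visible directly: a smaller batch lowers the peak but multiplies the time before all nodes are available.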

48

What about peer-to-peer?

49

Control?

● Several types of peer-to-peer systems
– Corporate: distributed file systems, work stealing
– Cooperative: protein folding, Bitcoin

50

Distributed Hash Table

● Main point of contact: the DHT
● Manages metadata
– File systems
● Manages all data
– Work sharing
● Several protocols
– Kademlia
– Chord
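How a DHT maps a key to the responsible peer differs between the two protocols above: in Chord a key belongs to its successor, the first node identifier clockwise on the ring; in Kademlia the closest node is the one with the smallest XOR distance to the key. This sketch uses small integer identifiers and a global view of all nodes; real implementations hash keys (e.g. with SHA-1) and route through finger tables or k-buckets instead.

```java
import java.util.List;
import java.util.TreeSet;

// Key-to-node mapping rules of Chord and Kademlia, with a global view of node IDs.
final class DhtLookup {
    // Chord: the node responsible for a key is its successor on the identifier ring.
    static int chordSuccessor(TreeSet<Integer> nodeIds, int key) {
        Integer succ = nodeIds.ceiling(key);
        return succ != null ? succ : nodeIds.first();  // wrap around the ring
    }

    // Kademlia: the closest node is the one with the smallest XOR distance to the key.
    static int kademliaClosest(List<Integer> nodeIds, int key) {
        int best = nodeIds.get(0);
        for (int id : nodeIds) {
            if ((id ^ key) < (best ^ key)) {
                best = id;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(List.of(1, 8, 14, 21, 32, 38, 42, 48, 51, 56));
        System.out.println(chordSuccessor(ring, 10));                       // 14
        System.out.println(chordSuccessor(ring, 60));                       // 1 (wraps around)
        System.out.println(kademliaClosest(List.of(1, 8, 14, 21), 10));     // 8 (10 XOR 8 = 2)
    }
}
```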

51

Comparison with Grids and Clouds

● More specific
– Toward simple data management: distributed file sharing
– Toward computation on simple data: protein folding, Bitcoin, work stealing

● Some good properties
– Limited capabilities, but simple to implement
– Decentralized

Question: decisions?

52

Hype Cycle for Emerging Technologies, Gartner 2014

53

Bibliography

● Foster, I., Kesselman, C. The Grid 2: Blueprint for a New Computing Infrastructure.

● Sotomayor, B. The Globus Toolkit 4 Programmer’s Tutorial.

● Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., ... & Zaharia, M. A view of cloud computing.

● Sefraoui, O., Aissaoui, M., Eleuldj, M. OpenStack: toward an open-source solution for cloud computing.

● Milojicic, D. S., Kalogeraki, V., Lukose, R., Nagaraja, K., Pruyne, J., Richard, B., Rollins, S., Xu, Z. Peer-to-peer computing.