
Building a Virtualized Distributed Computing Infrastructure by Harnessing Grid and Cloud Technologies

Alexandre di Costanzo, Marcos Dias de Assunção, and Rajkumar Buyya
Grid Computing and Distributed Systems Laboratory
Department of Computer Science and Software Engineering
The University of Melbourne, Australia

 

Abstract  

In this article, we present the realization of a system, termed InterGrid, for interconnecting distributed computing infrastructures by harnessing virtual machines. The InterGrid aims to provide an execution environment for running applications on top of the interconnected infrastructures. The system uses virtual machines as the building blocks to construct execution environments that span multiple computing sites. An execution environment is a network of virtual machines created to fulfill the requirements of an application, thus running isolated from other execution environments. These environments can be extended to operate on Cloud infrastructure, such as Amazon EC2. The article provides an abstract view of the proposed architecture and its implementation; experiments show the scalability of an infrastructure managed by InterGrid and how the system can benefit from using Cloud infrastructure.

Introduction  

Over the last decade, the distributed computing area has been characterized by the deployment of large-scale Grids such as EGEE [1] and Grid'5000 [2]. Such Grids have provided the research community with an unprecedented number of resources, which have been used for various scientific endeavors. Several efforts have been made to enable interoperation between Grids by providing standard components and adapters for secure job submission, data transfer, and information queries [3]. Despite these efforts, the heterogeneity of hardware and software has increased the complexity of deploying applications on these infrastructures. Moreover, recent advances in virtualization technologies [4] have led to the emergence of commercial infrastructure providers, a model also known as Cloud computing [5]. Handling the ever-growing demands of distributed applications while addressing heterogeneity remains a challenging task that can require resources from both Grids and Clouds.

 

In previous work, we presented an architecture for resource sharing between Grids [6] inspired by the peering agreements established between Internet Service Providers (ISPs), through which ISPs agree to allow traffic into one another's networks. This work presents the realization of that architecture, termed InterGrid, which relies on InterGrid Gateways (IGGs) that mediate access to the resources of participating Grids. The InterGrid also aims to tackle the heterogeneity of hardware and software within Grids. The use of virtualization technology can ease the deployment of applications spanning multiple Grids, as it allows for resource control in a contained manner. In this way, resources allocated by one Grid to another are used to deploy virtual machines. Virtual machines also allow the use of resources from Cloud computing providers.

 

We first introduce the concepts of virtualization and Cloud computing. Then, we describe the InterGrid project (www.gridbus.org/intergrid), which aims to provide an infrastructure for deploying applications on several computing sites, or Grids, by using virtualization technologies. These applications run on networks of virtual machines, or execution environments, created on top of the physical infrastructure. Next, the system design and implementation are presented. To illustrate the different interactions among the system components, we describe the InterGrid at runtime. Finally, experimental results demonstrate the system's scalability and show how applications can combine resources from two Cloud provider sites.

Background and Context

Virtualization Technology and Infrastructure as a Service

The increasing ubiquity of Virtual Machine (VM) technologies has enabled the creation of customized environments atop physical infrastructure and the emergence of business models such as Cloud computing. The use of VMs brings several benefits, such as:

(i) server consolidation, allowing workloads of several under-utilized servers to be placed on fewer machines;

(ii) the ability to create VMs to run legacy code without interfering with other applications' APIs;

(iii) improved security through the creation of sandboxes for running applications with questionable reliability; and

(iv) performance isolation, thus allowing a provider to offer some guarantees and better QoS to customers' applications.

Existing virtual-machine based resource management systems can manage a cluster of computers within a site, allowing the creation of virtual workspaces [7] or virtual clusters [8]. They can bind resources to virtual clusters or workspaces according to a user's demand. These systems commonly provide an interface through which one can allocate VMs and configure them with the operating system and software of choice. These resource managers, also named Virtual Infrastructure Engines (VIEs), allow the user to create customized virtual clusters using shares of the physical machines available at the site.

As explained earlier, virtualization technology minimizes some security concerns inherent to the sharing of resources among multiple computing sites. We utilize virtualization software to realize the InterGrid architecture because existing cluster resource managers relying on VMs can provide us with the building blocks, such as availability information, required for the creation of virtual execution environments. In addition, relying on VMs eases the deployment of execution environments on multiple computing sites; the user application can have better control over the software installed on the allocated resources without compromising the operation of the hosts' operating systems at the computing sites.

Virtualization technologies have also facilitated the realization of Cloud computing services. Cloud computing includes three kinds of services accessible via the Internet: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). This work considers only IaaS, which aims to provide computing resources or storage as a service to users. One of the major players in Cloud computing is Amazon with its Elastic Compute Cloud (EC2, http://aws.amazon.com/ec2/), which comprises several data centers located around the world. EC2 allows users to deploy VMs on demand on Amazon's infrastructure and pay only for the computing, storage, and network resources they use.

Related Work

Existing work has shown how to enable virtual clusters that span multiple physical computer clusters. In VioCluster [1], a broker is responsible for managing a virtual domain (i.e. a virtual cluster) and can borrow resources from another broker. Brokers have borrowing and lending policies that define when machines are requested from other brokers and when they are returned, respectively.

Systems for virtualizing a physical infrastructure are also available. Montero et al. [2] investigated the deployment of custom execution environments using Open Nebula, studying the overhead of two distinct models for starting virtual machines and adding them to an execution environment. Rubio-Montero et al. [3] used GridWay to deploy virtual machines on a Globus Grid, where jobs are encapsulated as virtual machines; they showed that the overhead of starting a virtual machine is small for the application evaluated.

Several load-sharing mechanisms have been investigated in the distributed systems realm. Iosup et al. [4] proposed a matchmaking mechanism for enabling resource sharing across computational Grids. Surana et al. [5] addressed load balancing in DHT-based P2P networks.

[1] P. Ruth, P. McGachey, and D. Xu. VioCluster: Virtualization for dynamic computational domains. In IEEE International Conference on Cluster Computing (Cluster 2005), pages 1–10, Burlington, USA, September 2005. IEEE.

[2] R. S. Montero, E. Huedo, and I. M. Llorente. Dynamic deployment of custom execution environments in Grids. In 2nd International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP '08), pages 33–38, Valencia, Spain, September/October 2008. IEEE Computer Society.

[3] A. Rubio-Montero, E. Huedo, R. Montero, and I. Llorente. Management of virtual machines on Globus Grids using GridWay. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1–7, Long Beach, USA, March 2007. IEEE Computer Society.

[4] A. Iosup, D. H. J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny. Inter-operating Grids through delegated matchmaking. In 2007 ACM/IEEE Conference on Supercomputing (SC 2007), pages 1–12, New York, USA, November 2007. ACM Press.

[5] S. Surana, B. Godfrey, K. Lakshminarayanan, R. Karp, and I. Stoica. Load balancing in dynamic structured peer-to-peer systems. Performance Evaluation, 63(3):217–240, 2006.

InterGrid Concepts

As this work presents the realization of the InterGrid, this section provides an overview of the proposed architecture; details about the architecture are available in previous work [6].

[Figure: applications (a) run on execution environments created through InterGrid Gateways connected by peering arrangements (b), on top of the participating Grids G1–G6 (c).]

Figure 1: Abstract view of the software layers of the InterGrid.

Figure 1 depicts the scenario considered by the InterGrid. The InterGrid aims to provide a software system that allows the creation of execution environments for various applications (a) on top of the physical infrastructure provided by the participating Grids (c). The allocation of resources from multiple Grids to fulfill the requirements of the execution environments is enabled by peering arrangements established between gateways (b).

[Figure: an organisation's site, where an IGG sits on top of a Virtual Infrastructure Engine (VIE) managing physical resources, holds a peering arrangement with an IGG fronting a Cloud provider (a VIE over Cloud APIs); the user application triggers 1. a request for VMs, 2. the enactment of leases, and 3. application deployment.]

Figure 2: Application deployment.

A Grid has pre-defined peering arrangements with other Grids, managed by IGGs, through which the Grids coordinate the use of InterGrid resources. An IGG is aware of the terms of the peering with other Grids; selects suitable Grids able to provide the required resources; and replies to requests from other IGGs. Request redirection policies determine which peering Grid is selected to process a request and the price at which the processing is performed. An IGG is also able to allocate resources from a Cloud provider. Figure 2 illustrates a scenario where an IGG allocates resources from an organization's local cluster for deploying applications. Under peak demand, this IGG interacts with another IGG that can allocate resources from a Cloud computing provider.

Although applications can have resource management mechanisms of their own, we consider the case where the resources allocated to an application are used for the creation of a Distributed Virtual Environment (DVE), which is a network of virtual machines that runs isolated from other DVEs. The allocation and management of the acquired resources is therefore performed on behalf of the application by a component termed the DVE Manager.

InterGrid Realization

The InterGrid Gateway has been implemented in Java, and its main components are depicted in Figure 3.

[Figure: the InterGrid Gateway comprises Persistence (a database on Java Derby), a Communication Module (message-passing), Management & Monitoring (JMX), a Scheduler (provisioning policies and peering), and a Virtual Machine Manager that connects to an emulator, local resources, IaaS providers, and Grid middleware.]

Figure 3: Main components of the InterGrid Gateway.

Communication Module: is responsible for message-passing. This module receives messages from other entities and delivers them to the components registered as listeners for those types of messages. It also allows components to communicate with other entities in the system by providing the functionality to send messages. Message-passing helps make gateways loosely coupled and allows more fault-tolerant communication protocols to be built. In addition, senders and receivers are decoupled, making the whole system more resilient to traffic bursts. All the functionality of the module is handled by one central component, the Post Office. When the Post Office receives an incoming message, the message is associated with a thread and then forwarded to all listeners. Threads are provided by a pool; if the pool is empty, messages are put into a queue to wait for available threads. Listeners are message handlers that deal with messages. All listeners are notified when a new message arrives, and each listener can individually decide whether to process the message. As messages are asynchronous and sent/served in parallel, there are no ordering or delivery guarantees; it is the responsibility of the communicating components to handle these properties. For instance, the scheduler uses unique message identifiers to manage request negotiations.
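As a rough illustration of this dispatch scheme, the sketch below pairs a thread pool with a list of listeners. The class and method names are assumptions for illustration; the article does not show the gateway's actual code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of a Post Office-style dispatcher (names are assumptions).
public class PostOfficeSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final List<MessageListener> listeners = new CopyOnWriteArrayList<>();

    public void register(MessageListener listener) {
        listeners.add(listener);
    }

    // Each incoming message is handed to a pooled thread; when no thread is
    // free, the executor queues the delivery until one becomes available.
    public void deliver(Message message) {
        pool.execute(() -> {
            for (MessageListener l : listeners) {
                l.onMessage(message); // every listener decides whether to process it
            }
        });
    }
}

interface MessageListener {
    void onMessage(Message message);
}

class Message {
    final String id;      // unique identifier, e.g. to track request negotiations
    final String payload;
    Message(String id, String payload) { this.id = id; this.payload = payload; }
}
```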

Management and Monitoring: are performed via Java Management Extensions (JMX). JMX is a standard API for the management and monitoring of resources such as Java applications. It also includes remote secure access, so a remote program can interact with a running application for management purposes. The gateway exports, via JMX, management operations such as configuring peerings, connecting to or disconnecting from another gateway, shutting down the gateway, and managing the Virtual Machine Manager. All these operations are accessible via JConsole, the graphical client provided by Java to connect to any application using JMX. Moreover, we provide a command line interface that interacts with the components via JMX.
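The following sketch shows how operations of this kind could be exposed as a standard MBean. JMX itself is the standard API named above, but the interface, operation names, and object name used here are illustrative assumptions rather than the gateway's real management API; each top-level type would go in its own source file.

```java
// GatewayManagementMBean.java -- illustrative management interface (operation names assumed)
public interface GatewayManagementMBean {
    void connectPeer(String gatewayAddress);
    void disconnectPeer(String gatewayAddress);
    void shutdown();
}

// GatewayManagement.java -- standard MBean implementation plus a registration helper
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public class GatewayManagement implements GatewayManagementMBean {
    public void connectPeer(String gatewayAddress) { /* establish a peering */ }
    public void disconnectPeer(String gatewayAddress) { /* drop a peering */ }
    public void shutdown() { /* stop the gateway */ }

    // Register with the platform MBean server so that JConsole or a command
    // line JMX client can invoke these operations remotely.
    public static void register() throws Exception {
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new GatewayManagement(),
                new ObjectName("org.intergrid:type=GatewayManagement"));
    }
}
```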

Persistence: relies on a relational database for storing the information used by the gateway. Information such as peering arrangements and templates for virtual machines is persistently stored in the database. The database is provided by the Apache Derby project (http://db.apache.org/derby), a database implemented entirely in Java.
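As a minimal sketch of this kind of embedded persistence, the snippet below opens an embedded Derby database and stores one template row. The table layout and database name are assumptions for illustration, not the gateway's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Embedded Derby sketch: ";create=true" creates the database on first use.
// Running it a second time would fail on CREATE TABLE, since the table exists.
public class PersistenceSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                     DriverManager.getConnection("jdbc:derby:intergrid_db;create=true");
             Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE vm_template ("
                    + "name VARCHAR(64) PRIMARY KEY, "
                    + "cores INT, memory_mb INT, disk_image VARCHAR(256))");
            st.executeUpdate("INSERT INTO vm_template VALUES "
                    + "('ubuntu', 1, 128, 'images/ubuntu.img')");
        }
    }
}
```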

Scheduler: the component represented here by the Scheduler comprises other components, namely the resource provisioning policy, the peering directory, request redirection, and the enactment module. The scheduler interacts with the Virtual Machine Manager in order to create, start, or stop VMs to fulfill the requirements of the scheduled requests. The scheduler also maintains the availability information obtained by the VM Manager.

Virtual Machine Manager (VMM)

The Virtual Machine Manager (VMM) is the link between the gateway and the resources. As described previously, the gateway does not share physical resources directly, but relies on virtualization technology to abstract them. Hence, the actual resources used by the gateway are virtual machines. The VMM relies on a Virtual Infrastructure Engine (VIE) for managing VMs on a set of physical resources. Typically, VIEs are able to create and stop VMs on a physical cluster. At present, we use Open Nebula (http://www.opennebula.org) as a VIE. Open Nebula allows the user to submit VMs on physical machines using different kinds of hypervisors, such as Xen (http://www.xen.org). Xen is a hypervisor that allows several operating systems to run concurrently on the same host; it mediates the guest operating systems' access to the hardware, so the host can control and limit the usage of certain resources by the guests, such as memory or CPU. In addition, the VMM manages the deployment of VMs on Grids and IaaS providers.

Virtual Machine template: Open Nebula's terminology is used to explain the idea of templates for VMs. A template is analogous to a computer's configuration and contains a description of a virtual machine with the following information:

• the number of cores or processors to be assigned to the VM;
• the amount of memory required by the VM;
• the kernel used to boot the VM's operating system;
• the disk image containing the VM's file system; and
• optionally, the price of using a VM for one hour.

This information is static, i.e. described once and reusable, and is provided by the gateway administrator when the infrastructure is set up. The administrator can update, add, and delete templates at any time. In addition, the gateways of the InterGrid network have to agree on the templates in order to provide the same configuration at each site.
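A plain data class holding these static fields might look like the sketch below; the class is hypothetical, and the example values are taken from the template entries shown in Figure 4 (kernel, image path, and price are made up for illustration).

```java
// Hypothetical generic VM template carrying the static fields listed above.
public class VmTemplate {
    public final String name;         // template identifier agreed upon by all gateways
    public final int cores;           // number of cores or processors assigned to the VM
    public final int memoryMb;        // amount of memory required by the VM
    public final String kernel;       // kernel used to boot the VM's operating system
    public final String diskImage;    // disk image containing the VM's file system
    public final double pricePerHour; // optional price of using a VM for one hour

    public VmTemplate(String name, int cores, int memoryMb,
                      String kernel, String diskImage, double pricePerHour) {
        this.name = name;
        this.cores = cores;
        this.memoryMb = memoryMb;
        this.kernel = kernel;
        this.diskImage = diskImage;
        this.pricePerHour = pricePerHour;
    }

    public static VmTemplate exampleUbuntu() {
        // "ubuntu; 1 core; 128 MB" appears in Figure 4; the other values are illustrative.
        return new VmTemplate("ubuntu", 1, 128, "vmlinuz", "images/ubuntu.img", 0.0);
    }
}
```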

To deploy an instance of a given template, a descriptor is generated from the information in the template. The descriptor contains the same fields as the template plus additional information related to the specific VM instance. The additional fields are:

• the disk image that contains the file system for the VM;
• the address of the physical machine that hosts the VM;
• the network configuration of the VM; and
• for deployment on an IaaS provider, the required information on the remote infrastructure, such as account information for the provider.

Before an instance is started, the network configuration and the address of the host are given by the scheduler, which allocates a MAC address and/or an IP address for that instance. The disk image field is specified by the template but can be modified in the descriptor: to deploy several instances of the same template in parallel, each instance uses a temporary copy of the disk image specified by the template, and the descriptor contains the path to the copied disk image.
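The sketch below illustrates this descriptor generation for a local deployment: copy the template's fields, attach the host and network configuration chosen by the scheduler, and point the descriptor at a temporary copy of the disk image. Class and method names are assumptions (VmTemplate is the sketch introduced earlier), and IaaS-specific fields are omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of descriptor generation for one VM instance of a template.
public class DescriptorFactory {
    public static VmDescriptor create(VmTemplate template, String hostAddress,
                                      String macAddress, String ipAddress) throws IOException {
        // Each instance gets its own temporary copy of the template's disk image,
        // so several instances of the same template can run in parallel.
        Path copy = Files.createTempFile("vm-disk-", ".img");
        Files.copy(Path.of(template.diskImage), copy, StandardCopyOption.REPLACE_EXISTING);
        return new VmDescriptor(template, copy.toString(), hostAddress, macAddress, ipAddress);
    }
}

class VmDescriptor {
    final VmTemplate template;   // static fields: cores, memory, kernel, price, ...
    final String diskImagePath;  // temporary copy of the template's image
    final String hostAddress;    // physical machine that will host the VM
    final String macAddress;     // network configuration allocated by the scheduler
    final String ipAddress;

    VmDescriptor(VmTemplate template, String diskImagePath, String hostAddress,
                 String macAddress, String ipAddress) {
        this.template = template;
        this.diskImagePath = diskImagePath;
        this.hostAddress = hostAddress;
        this.macAddress = macAddress;
        this.ipAddress = ipAddress;
    }
}
```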

The fields of the descriptor can differ when deploying a VM on an IaaS provider. For example, the network information is not mandatory for Amazon, as EC2 automatically assigns a public IP to each instance. In addition, EC2 creates copies of the disk image seamlessly for running several instances in parallel. Before an instance is run, the disk image must be uploaded to Amazon EC2, thus ensuring that the template has its corresponding disk image.

Virtual Machine Template Directory: the gateway works with a repository of VM templates. That is, templates can be registered in the repository by the gateway administrator. In this way, a user can request instances of VMs whose templates are registered in the gateway's repository. In addition, the gateway administrator has to upload the images to Amazon if the gateway uses the Cloud as a resource provider. The user currently cannot submit her own templates or disk images to the gateway.

Virtual Machine Manager: allows the gateway to submit and deploy VMs on a physical infrastructure. It interacts with a VIE to create and stop virtual machines on a physical cluster. The implementation of the VMM is generic in order to connect with different VIEs. We have developed VMMs for Open Nebula, Amazon EC2, and Grid'5000. The connection with Open Nebula uses its Java client (http://opennebula.org/doku.php?id=ecosystem#java_api) to submit and stop VMs and transforms our generic VM template into the format recognized by Open Nebula. Open Nebula runs as a daemon service on a master node, so the VMM works as a remote user of Open Nebula.

We have also implemented a connector to deploy VMs on IaaS providers; at the moment, only Amazon is supported. The connector is a wrapper for the command line tool provided by Amazon. In addition, we have developed an emulated VIE for testing and debugging purposes; the emulator provides a list of fake machines, for which we can set the number of cores available for hosting VMs. The VMM for Grid'5000 is likewise a wrapper for that platform's command line tools (i.e. the OAR scheduler and the Kadeploy deployment tool). Figure 4 shows the interaction between the gateway, the template directory, and the VMM.

[Figure: the Virtual Machine Manager Service exposes an interface (e.g. VMInstance vm = vmms.submit(vmTemplate, host); vmInstance.shutdown()), converts the generic template into the virtual infrastructure engine's format, and drives the emulator, Open Nebula on the local physical infrastructure, Amazon EC2 through its public API, and Grid'5000 via OAR/Kadeploy. The template directory holds entries such as "opensuse; 1 core; 512 MB", "fedora; 2 cores; 256 MB", and "ubuntu; 1 core; 128 MB".]

Figure 4: Design and interactions of the Virtual Machine Manager.
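Figure 4 shows the calls vmms.submit(vmTemplate, host) and vmInstance.shutdown(); a generic VMM interface along those lines might be sketched as follows. Only those two calls appear in the article; the remaining names and methods are assumptions, and VmTemplate refers to the earlier sketch.

```java
// Sketch of a generic VMM interface suggested by Figure 4 (assumed, not the real API).
public interface VirtualMachineManagerService {
    // Convert the generic template into the engine's format and start a VM on the host.
    VmInstance submit(VmTemplate template, String host);

    // Availability information that the scheduler keeps track of.
    int availableCores();
}

interface VmInstance {
    String ipAddress();
    void shutdown(); // stop the VM through the underlying engine (Open Nebula, EC2, emulator, ...)
}
```

One implementation of such an interface per back-end (Open Nebula, Amazon EC2, the emulator, and Grid'5000 via OAR/Kadeploy) would match the connector design described above.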


Distributed Virtual Environment Manager (DVE Manager)

A DVE Manager interacts with the IGG by making requests for VMs and querying their status. The DVE Manager requests VMs from the gateway on behalf of the user application it represents.

When the reservation starts, the DVE Manager obtains the list of requested VMs from the gateway. This list contains a tuple of public IP/private IP addresses for each VM, which the DVE Manager uses to access the VMs (through SSH tunnels). With EC2, the VMs have public IPs, so the DVE Manager can access them directly without tunnels. The DVE Manager then handles the deployment of the user's application. We have evaluated the system with a bag-of-tasks application without dependencies between tasks.
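A bare-bones sketch of how a manager of this kind might run one task per VM over SSH is given below; it invokes the ssh command through ProcessBuilder, reads the application description (one command line per task), and leaves out tunnel set-up, parallel execution, and error handling. All names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Simplified task runner: args[0] is the application description file
// (one command line per task), the remaining args are VM addresses.
public class TaskRunnerSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> tasks = Files.readAllLines(Path.of(args[0]));
        List<String> vmAddresses = List.of(args).subList(1, args.length);

        for (int i = 0; i < tasks.size(); i++) {
            String vm = vmAddresses.get(i % vmAddresses.size()); // naive round-robin placement
            Process p = new ProcessBuilder("ssh", vm, tasks.get(i))
                    .inheritIO()   // show the task's output on the local console
                    .start();
            p.waitFor();           // a real manager would run independent tasks in parallel
        }
    }
}
```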

InterGrid Gateway at Runtime

Figure 5 shows the main interactions between the InterGrid's components. The user first requests VMs; the request is handled by a command line interface. The user must specify which VM template she wants to use and can also specify the number of VM instances, the ready time for her reservation, the deadline, the walltime, and the address of an alternative gateway. The client returns the identifier assigned to the submitted request by the gateway. Next, the user starts a DVE Manager with the returned identifier (or a list of identifiers) and her application as parameters. The application is described by a text file in which each line is one task to execute on a remote VM; a task is, in effect, a command line to be run via SSH. The DVE Manager waits until the request has been scheduled or refused. The local gateway first tries to obtain resources from the underlying VIEs. When that is not possible, the local gateway starts a negotiation with remote gateways in order to fulfill the request. When a gateway can fulfill the request, i.e. can schedule the VMs, it sends the access information for the assigned VMs to the requesting gateway. Once the requesting gateway has collected all the VM access information, this information is made available to the DVE Manager. Finally, the DVE Manager configures the VMs, sets up SSH tunnels, and executes the tasks on the VMs. In future work, we want to improve the description of applications to allow file transfers, dependencies between tasks, and VM configuration.
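For illustration, the parameters listed above could be captured in a request object like the following sketch; the class is hypothetical and only mirrors the fields named in the text.

```java
// Hypothetical shape of a reservation request sent by the command line client.
public class VmRequest {
    public final String templateName;       // which registered VM template to instantiate
    public final int numberOfInstances;     // how many VMs are requested
    public final long readyTimeMillis;      // earliest time the reservation may start
    public final long deadlineMillis;       // latest acceptable completion time
    public final long walltimeMillis;       // how long the VMs are needed
    public final String alternativeGateway; // optional address of another gateway

    public VmRequest(String templateName, int numberOfInstances, long readyTimeMillis,
                     long deadlineMillis, long walltimeMillis, String alternativeGateway) {
        this.templateName = templateName;
        this.numberOfInstances = numberOfInstances;
        this.readyTimeMillis = readyTimeMillis;
        this.deadlineMillis = deadlineMillis;
        this.walltimeMillis = walltimeMillis;
        this.alternativeGateway = alternativeGateway;
    }
}
```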

[Figure: the user submits an application with resource requests to the DVE Manager (1), which forwards the resource requests to the local gateway over the Internet (2); the gateway starts a negotiation with remote gateways (3), which find resources, possibly on Amazon EC2 (4), and send back the VM access information (5); the DVE Manager gets the VM list, e.g. addresses 1.2.3.1, 1.2.3.4, 1.2.3.6, 1.2.3.7 for tasks T1–T5 (6), and executes the application tasks on the VMs (7).]

Figure 5: The main interactions among the InterGrid components.

Under the peering policy considered in this work, each gateway's scheduler uses conservative backfilling to schedule the requests. When the scheduler cannot start a request immediately using local resources, a redirection algorithm will:

1) contact the remote gateways and ask for offers containing the earliest start time at which they would be able to serve the request, were it redirected;

2) for each offer received, check whether the request start time proposed by the peering gateway is earlier than that given by local resources; if so, redirect the request, otherwise check the next offer; and

3) if the request start time given by local resources is better than those proposed by remote gateways, schedule the request locally.

This strategy has been previously proposed and evaluated in a smaller environment comprising a local cluster and a Cloud computing provider [9].
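The redirection decision itself reduces to a comparison of start times, as in the sketch below. The Offer class and the method names are assumptions; only the comparison logic follows the description above.

```java
import java.util.List;

// Sketch of the redirection decision used with conservative backfilling.
public class RedirectionPolicySketch {
    /** Returns the offer chosen for redirection, or null to schedule the request locally. */
    public static Offer choosePeer(long localStartTime, List<Offer> offers) {
        for (Offer offer : offers) {
            // Redirect to the first peer whose proposed start time beats
            // what conservative backfilling can achieve with local resources.
            if (offer.earliestStartTime < localStartTime) {
                return offer;
            }
        }
        return null; // no peer does better: keep the request local
    }
}

class Offer {
    final String gatewayAddress;
    final long earliestStartTime; // earliest start time offered by the peering gateway
    Offer(String gatewayAddress, long earliestStartTime) {
        this.gatewayAddress = gatewayAddress;
        this.earliestStartTime = earliestStartTime;
    }
}
```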

Experiments  

This section describes two experiments. The first experiment evaluates the performance of allocation decisions by measuring how the gateways manage load via peering arrangements. The second experiment considers the effectiveness of InterGrid for deploying a bag-of-tasks application on Cloud providers.

Peering Arrangements

This experiment uses the French experimental Grid platform, Grid'5000, as both scenario and testbed. Grid'5000 comprises nine sites geographically distributed across France, currently featuring 4,792 cores. Each gateway created in this experiment represents one site of Grid'5000 and runs on that site. To prevent the gateways from interfering with real users of Grid'5000, we use the emulated VMM, which instantiates fictitious VMs. The number of emulated hosts is the number of real cores available at each site. Figure 6 illustrates the Grid'5000 sites and the evaluated gateway configurations.

[Figure: the nine Grid'5000 sites (Sophia, Orsay, Rennes, Lille, Lyon, Grenoble, Bordeaux, Toulouse, and Nancy) interconnected by 10 Gbps links, with per-site core counts ranging from 268 to 722; the evaluated configurations place peering IGGs at two sites (Sophia and Orsay), three sites (Sophia, Nancy, and Rennes), and four sites (Sophia, Orsay, Nancy, and Rennes).]

Figure 6: InterGrid testbed over Grid'5000.

The sites' workloads are generated using Lublin and Feitelson's model [10], here referred to as Lublin99. Lublin99 is configured to generate one-day-long workloads; the maximum number of VMs required by the generated requests is the number of cores in the site. To generate different workloads, the mean number of virtual machines required by a request (specified as a log2 value) is set to log2(m) − umed, where m is the maximum number of virtual machines allowed in the system. We randomly vary umed from 1.5 to 3.5. In addition, to simulate bursts of request arrivals and heavy loads, thus stressing the system, we multiply the inter-arrival time by 0.1.
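As a small worked illustration of this parameterisation (the 722-core figure is just an example taken from the largest site in Figure 6):

```java
import java.util.Random;

// Worked example of the workload parameters described above: the mean request
// size (in log2) is log2(m) - umed, with umed uniform in [1.5, 3.5], and
// inter-arrival times are multiplied by 0.1 to emulate bursts of requests.
public class WorkloadParameters {
    public static void main(String[] args) {
        int maxVms = 722;                                     // e.g. cores available at one site
        double umed = 1.5 + 2.0 * new Random().nextDouble();  // uniform in [1.5, 3.5]
        double meanLog2Size = Math.log(maxVms) / Math.log(2) - umed;
        double arrivalScale = 0.1;                            // stretch factor on inter-arrival times
        System.out.printf("mean request size ~ 2^%.2f VMs, arrival scale %.1f%n",
                meanLog2Size, arrivalScale);
    }
}
```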

 

Figure 7: Load characteristics under the four-gateway scenario.

Figure 7 shows the load characteristics under the four-gateway scenario. The blue bars indicate the load of each site when the sites are not interconnected; the red bars show the load when gateways redirect requests to one another; the green bars correspond to the amount of load accepted by each gateway from other gateways; and the purple bars represent the amount of load redirected. The results show that the policy used by the gateways balances the load across sites, making it tend to 1. Rennes, a site with heavy load, benefits from peering with other gateways, as a great share of its load is redirected to other sites.

Table 1 presents the job slowdown improvement achieved by interconnecting the gateways. Overall, job slowdown is improved by the interconnection. The sites with the heaviest loads (i.e. Rennes and Nancy) see the largest improvements. The job slowdown of sites with lower loads is in fact worsened; however, this impact diminishes as the number of gateways increases, which leads to the conclusion that lightly loaded sites suffer a smaller impact as more gateways are interconnected. The experiments demonstrate that peering is overall beneficial to the interconnected sites; the benefits derive from load balancing and the overall improvement in job slowdown.

Table 1: Job slowdown improvement under different gateway configurations.

Site      2 Gateways    3 Gateways    4 Gateways
Orsay     -0.00006      N/A            0.00010
Nancy     N/A            3.95780       4.30864
Rennes    N/A            7.84217      12.82736
Sophia     0.20168      -6.12047      -3.12708

 

Deploying a Bag-of-Tasks Application

This experiment considers Evolutionary Multi-criterion Optimization (EMO) [11], a bag-of-tasks application for solving optimization problems using a multi-objective evolutionary algorithm. Evolutionary algorithms are a class of population-based meta-heuristics that exploit the concept of population evolution to find solutions to optimization problems. The optimal solution is found through an iterative process that evolves a collection of individuals in order to improve the quality of the solution. Each task is an EMO process that explores a different set of populations.

Table 2: Experimental results with 1 and 2 gateways using resources from Amazon EC2.

Number of VMs    1 Gateway            2 Gateways, each providing 1/2 of the VMs
                 (time in seconds)    (time in seconds)
 5               4,780                -
10               3,017                3,177
15               2,407                -
20               2,108                2,070

 

Figure 8 shows the testbed used for the experiments. We carry out one experiment in two steps. First, we evaluate the execution time of EMO using a single gateway. Then, we force the InterGrid to provide resources from two gateways; in this case, we limit the number of cores available for running VMs, and the DVE Manager submits two requests. For 10 VMs, both gateways are limited to nine cores and the DVE Manager sends two requests for five VMs each. For 20 VMs, the gateways set the limit to 20 cores and the DVE Manager requests 10 VMs twice. The two gateways use resources from EC2; the requests demand a small EC2 instance running Windows 2003 Server. Table 2 reports the results of both experiments. They show that the execution time of the bag-of-tasks application does not suffer significant performance degradation with one or two gateways.

[Figure: two gateways, IGG-1 and IGG-2, negotiate resources with each other and get virtual machines from Amazon EC2 data centres in the USA (site 1 ... site n).]

Figure 8: Testbed used to run EMO on a Cloud computing provider.

Conclusion  

This article presented a system, the InterGrid, for deploying applications on multiple-site execution environments. The system relies on gateways inspired by the peering agreements between ISPs. We described the virtualization technology used by the InterGrid to abstract physical infrastructures and then presented the motivation for and realization of the InterGrid system.

 

As the management of the local resources is based on virtual infrastructure engines, we detailed how the gateway interacts with these engines to control virtual machines. In addition, the InterGrid can instantiate VMs on Amazon EC2 and deploy system images on Grids such as Grid'5000.

 

Experimental results obtained by emulating a real Grid showed the performance of the allocation decisions made by the gateways. We also presented a runtime scenario for deploying a parallel evolutionary algorithm that solves optimization problems. With that application, we validated the realization of the InterGrid by comparing the execution time of the application when obtaining resources from one gateway at a time against obtaining them from two gateways, both deploying VMs on Amazon EC2.

 

In future work, we plan to improve the VM template directory to allow users to submit their own VMs and to synchronize the available VMs between gateways. In addition, security aspects have not been addressed in this work because they are handled at the operating system and network levels; it would be interesting to address those aspects at the InterGrid level.

Acknowledgments  

We thank Mohsen Amini for helping with the system implementation. This work is supported by research grants from the Australian Research Council and the Australian Department of Innovation, Industry, Science and Research. Marcos' PhD research is partially supported by NICTA. Some experiments were carried out on the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER, several universities, and other funding bodies (see https://www.grid5000.fr).

References

[1] Enabling Grids for E-sciencE (EGEE) project, http://public.eu-egee.org (2005).

[2] F. Cappello, E. Caron, M. Dayde, F. Desprez, Y. Jegou, P. Primet, E. Jeannot, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Namyst, B. Quetier, and O. Richard. Grid'5000: a large scale and highly reconfigurable grid experimental testbed. In 6th IEEE/ACM International Workshop on Grid Computing, 2005.

[3] Grid Interoperability Now Community Group (GIN-CG), http://forge.ogf.org/sf/projects/gin (2006).

[4] X. Zhu, D. Young, B. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D. Gmach, R. Gardner, et al. 1000 Islands: Integrated capacity and workload management for the next generation data center. In International Conference on Autonomic Computing (ICAC '08), pages 172–181, 2008.

[5] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, February 2009.

[6] M. D. de Assunção, R. Buyya, and S. Venugopal. InterGrid: A case for internetworking islands of Grids. Concurrency and Computation: Practice and Experience (CCPE), 20(8):997–1024, 2008.

[7] K. Keahey, I. Foster, T. Freeman, and X. Zhang. Virtual workspaces: Achieving quality of service and quality of life in the Grid. Scientific Programming, 13(4):265–275, 2006.

[8] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle. Dynamic virtual clusters in a Grid site manager. In 12th IEEE International Symposium on High Performance Distributed Computing (HPDC 2003), page 90, Washington, DC, USA, 2003. IEEE Computer Society.

[9] M. D. de Assunção, A. di Costanzo, and R. Buyya. Evaluating the cost-benefit of using Cloud computing to extend the capacity of clusters. In Proceedings of the International Symposium on High Performance Distributed Computing (HPDC 2009), Munich, Germany, 11–13 June 2009.

[10] U. Lublin and D. G. Feitelson. The workload on parallel supercomputers: Modeling the characteristics of rigid jobs. Journal of Parallel and Distributed Computing, 63(11):1105–1122, 2003.

[11] M. Kirley and R. Stewart. Multiobjective evolutionary algorithms on complex networks. In Proceedings of the Fourth International Conference on Evolutionary Multi-Criterion Optimization, Lecture Notes in Computer Science 4403, pages 81–95, Springer, Berlin, Heidelberg, 2007.

 

Alexandre di Costanzo is a research fellow at the University of Melbourne. His research interests are distributed computing and especially Grid computing. Di Costanzo has a PhD in computer science from the University of Nice Sophia Antipolis, France. Contact him at [email protected].

Marcos Dias de Assunção is a PhD candidate at the University of Melbourne, Australia. His PhD thesis is on peering and resource allocation across Grids. His current topics of interest include Grid scheduling, virtual machines, and network virtualization. Contact him at [email protected].

Rajkumar Buyya is an Associate Professor and Reader of Computer Science and Software Engineering, and Director of the Grid Computing and Distributed Systems (GRIDS) Laboratory at the University of Melbourne, Australia. He also serves as CEO of Manjrasoft, Australia. Contact him at [email protected].

