The Ibis e-Science Software Framework

Henri Bal, Frank J. Seinstra, Jason Maassen, Niels Drost

High Performance Distributed Computing Group

Department of Computer Science

VU University, Amsterdam, The Netherlands

Introduction

● Distributed systems continue to change
    ● Clusters, grids, clouds, mobile devices
● Distributed applications continue to change
    ● e-Science, web, pervasive applications
● Distributed programming continues to be notoriously difficult

Distributed Systems: 1980s
Multiple PCs on a (local) network

● Networks of Workstations (NOWs)
● Collections of Workstations (COWs)
● Processor pools
● Condor pools
● Clusters

Distributed Systems: 1990s
Sharing wide-area resources

● Metacomputing (Smarr & Catlett, CACM)
● Flocking Condor (Epema)
● DAS (Distributed ASCI Supercomputer)
● Grid Blueprint (Foster & Kesselman)
● Desktop grids, SETI@home

Distributed Systems: 2000s

● Cloud computing
    ● Pay-on-demand
    ● Virtualization
● Hardware diversity / heterogeneous computing
● Green IT
● The Networked World
    ● Sensor networks
    ● Smart phones

Our approach

● Study fundamental underlying problems
● … hand-in-hand with realistic applications
● … integrate solutions in one system: Ibis
● Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT

[Figure: User and Distributed Systems]

Outline

● ‘Problem Solving’ vs. ‘System Fighting’
● Jungle Computing
● Example applications:
    ● Computational Astrophysics
    ● Multimedia Content Analysis
● The Ibis Software Framework
● The 3 Common Uses of Ibis
    ● ‘Master Key’ + ‘Glue’ + ‘HPC’
● Some current work: Green Clouds

Ibis: ‘Problem Solving’ vs. ‘System Fighting’

A Random Example: Supernova Detection

● DACH 2008, Japan
● Distributed multi-cluster system
    ● Heterogeneous
● Distributed database (image pairs)
    ● Large vs small databases/images
    ● Partial replication
● Image-pair comparison given (in C)
● Find all supernova candidates
    ● Task 1: As fast as possible
    ● Task 2: Idem, under system crashes

‘Problem Solving’ vs. ‘System Fighting’

● All participating teams struggled (1 month)
    ● Middleware instabilities…
    ● Connectivity problems…
    ● Load balancing…
● But not the Ibis team
    ● Winner (by far) in both categories
    ● Note: many Japanese teams with years of experience
        ● Hardware, middleware, network, C-code, image data…
    ● Focus on ‘problem solving’, not ‘system fighting’
        ● incl. ‘opening’ of black-box C-code

Ibis Results: Awards & Prizes

● 1st Prize: SCALE 2008
● AAAI-VC 2007: Most Visionary Research Award
● 1st Prize: DACH 2008 - BS
● 1st Prize: DACH 2008 - FT
● 3rd Prize: ISWC 2008
● 1st Prize: SCALE 2010

WebPIE: A Web-Scale Parallel Inference Engine
J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal

● Many domains; data/compute intensive, real-time...
● Winner Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra)

Ibis Users…

…and many more

Jungle Computing

Jungle Computing (Frank Seinstra)

● ‘Worst case’ computing as required by end-users
    ● Distributed
    ● Heterogeneous
    ● Hierarchical (incl. multi-/many-cores)

Why Jungle Computing?

● Scientists are often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to:
    ● Desire for scalability
    ● Distributed nature of (input) data
    ● Software heterogeneity (e.g.: mix of C/MPI and CUDA)
    ● Ad hoc hardware availability
    ● Energy consumption (use most energy-efficient resource)
    ● …
● Note: most users do not need ‘worst case’ jungle
    ● Ibis aims to apply to any subset

Example Application Domains

● Computational Astrophysics (Leiden)
    ● AMUSE: multi-model / multi-kernel simulations
    ● “Simulating the Universe on an Intercontinental Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010)
● Climate Modeling (Utrecht)
    ● CPL: multi-model / multi-kernel simulations
    ● Atmosphere, ocean, source rock formation, …

- hardware: (potentially) very diverse
- high resolution => speed & scalability
- …

Domain Example #1:

Computational Astrophysics

Domain Example #1: Computational Astrophysics

Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA (two weeks ago)

[Figure: AMUSE and its four model types: radiative transport, gravitational dynamics, hydro-dynamics, stellar evolution]

● The AMUSE system (Leiden University)
    ● Early Star Cluster Evolution, including gas
    ● Gravitational dynamics (N-body): GPU / GPU-cluster
    ● Stellar evolution: Beowulf cluster / Cloud
    ● Hydro-dynamics, Radiative transport: Supercomputer


Domain Example #2:

Multimedia Content Analysis

Multimedia Content Analysis (MMCA)

● Aim:
    ● Automatic extraction of ‘semantic concepts’ from image sets and video streams
● Depending on specific problem & size of data set:
    ● May take hours, days, weeks, months, years…
● Applications in (a.o.):
    ● Remote Sensing
    ● Security / Surveillance
    ● Medical Imaging
    ● Document Analysis
    ● Multimedia Systems
    ● Astronomy
● Application types:
    ● Real-time vs. off-line
    ● Fine-grained vs. coarse-grained
    ● Data-intensive / compute-intensive / information-intensive

Multimedia Content Analysis (MMCA)

Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog

Seinstra et al (IEEE Multimedia, Oct-Dec 2007)
Seinstra et al (AAAI’07: Most Visionary Research Award)

Successful…

● …but many fundamental problems unsolved!
    ● Scaling up to very large systems
    ● Platform independence
    ● Middleware independence
    ● Connectivity (a.o. firewalls, …)
    ● Fault-tolerance
    ● …
● Software support tool(s) urgently needed!
    ● Jungle-aware + transparent + efficient
    ● No progress until ‘discovery’ of Ibis

The Ibis Software Framework

The Ibis Software Framework

● Offers all functionality to efficiently & transparently implement & run Jungle Computing applications

● Designed for dynamic / hostile environments

● Modular and flexible
    ● Allow replacement of Ibis components by external ones, including native code
● Open source
    ● Download: http://www.cs.vu.nl/ibis/

Ibis Design

● Applications need functionality for
    ● Programming (as in programming languages)
    ● Deployment (as in operating systems)

[Figure: Programming (logical, likes math) vs. Deployment (practical, visual (GUI))]

Ibis Software Stack

JavaGAT

● Java Grid Application Toolkit
    ● High-level API for developing (Grid) applications independently of the underlying (Grid) middleware
    ● Use (Grid) services: file copy, resource discovery, job submission, …
● Note: SAGA API standardized by OGF
    ● Simple API for Grid Applications (a.o. with LSU)
    ● SAGA on top of JavaGAT (and vice versa)

Zorilla

● A prototype P2P middleware
    ● A Zorilla system consists of a collection of nodes, connected by a P2P network
    ● Each node is independent & implements all middleware functionality
    ● No central components
    ● Supports fault-tolerance and malleability
    ● Easily combines resources in multiple administrative domains

IbisDeploy

Ibis Portability Layer (IPL)

● Java-centric ‘run-anywhere’ communication library
    ● Sent along with your application
    ● “MPI for the Grid”
● Supports fault-tolerance and malleability
    ● Resource tracking (Join-Elect-Leave model)
    ● Open-world / Closed world
● Efficient
    ● Highly optimized object serialization
    ● Can use optimized native libraries (e.g. MPI, Infiniband)
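
The Join-Elect-Leave resource tracking named above can be pictured with a small in-memory sketch. This is not the IPL API: the class `JelRegistrySketch` and its `join`, `leave`, and `elect` methods are hypothetical names chosen for illustration, and a real registry would be a networked service. Members may join or leave at any time (open world), and an election gives a named role to the first candidate that asks for it while the role is vacant.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a Join-Elect-Leave membership model.
// None of these names are taken from the IPL itself.
final class JelRegistrySketch {
    private final Set<String> members = new LinkedHashSet<>();
    private final Map<String, String> elected = new HashMap<>();

    // A new instance joins the pool (open world: allowed at any time).
    synchronized void join(String id) {
        members.add(id);
    }

    // An instance leaves or is declared dead; any role it held becomes vacant.
    synchronized void leave(String id) {
        members.remove(id);
        elected.values().removeIf(winner -> winner.equals(id));
    }

    // Returns the current holder of the role. If the role is vacant,
    // the candidate wins it; otherwise the existing winner is returned.
    synchronized String elect(String role, String candidate) {
        if (!members.contains(candidate)) {
            throw new IllegalStateException(candidate + " has not joined");
        }
        return elected.computeIfAbsent(role, r -> candidate);
    }

    synchronized int size() {
        return members.size();
    }

    public static void main(String[] args) {
        JelRegistrySketch pool = new JelRegistrySketch();
        pool.join("node-a");
        pool.join("node-b");
        System.out.println(pool.elect("master", "node-a")); // node-a
        System.out.println(pool.elect("master", "node-b")); // node-a (role already taken)
        pool.leave("node-a");                               // the master disappears
        System.out.println(pool.elect("master", "node-b")); // node-b
    }
}
```

Because a departed member vacates its roles, the remaining members can simply re-run the election, which is one way an application can stay malleable and survive crashes.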

SmartSockets

● Robust connection setup
● Always establishes a connection in 30 different scenarios
● Problems:
    ● Firewalls
    ● Network Address Translation (NAT)
    ● Non-routed networks
    ● Multi-homing

Ibis Programming Models

● IPL-based programming models, a.o.:
    ● Satin:
        ● A divide-and-conquer model
    ● MPJ:
        ● The MPI binding for Java
    ● RMI:
        ● Object-oriented Remote Procedure Call
    ● Jorus:
        ● A ‘user transparent’ parallel model for multimedia applications
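
To give a feel for the divide-and-conquer style that Satin supports, here is a minimal sketch. It does not use Satin: the slides do not show Satin's API, so the example substitutes the JDK's fork/join framework on a single JVM as a stand-in. The class name `DivideAndConquerSketch` and the threshold value are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer Fibonacci using the JDK fork/join framework.
// A stand-in for the programming style only; this is not Satin code.
final class DivideAndConquerSketch extends RecursiveTask<Long> {
    // Below this size the problem is solved sequentially (assumed value).
    private static final int SEQUENTIAL_THRESHOLD = 10;
    private final int n;

    DivideAndConquerSketch(int n) {
        this.n = n;
    }

    static long fibSequential(int n) {
        return n < 2 ? n : fibSequential(n - 1) + fibSequential(n - 2);
    }

    @Override
    protected Long compute() {
        if (n <= SEQUENTIAL_THRESHOLD) {
            return fibSequential(n);
        }
        DivideAndConquerSketch left = new DivideAndConquerSketch(n - 1);
        DivideAndConquerSketch right = new DivideAndConquerSketch(n - 2);
        left.fork();                         // divide: subproblem may run on another worker
        long rightResult = right.compute();  // solve the other half in this thread
        return left.join() + rightResult;    // conquer: wait for the forked half and combine
    }

    static long fib(int n) {
        return ForkJoinPool.commonPool().invoke(new DivideAndConquerSketch(n));
    }

    public static void main(String[] args) {
        System.out.println(fib(30)); // 832040
    }
}
```

The structure (split, solve the parts independently, combine) is what makes such programs easy to load-balance, since idle workers can take over pending subproblems.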

The 3 Common Uses of Ibis

Ibis as ‘Master Key’ (or ‘Passepartout’)

● Use JavaGAT to access ‘any’ system
    ● Develop/run applications independently of available middlewares
    ● JavaGAT ‘adaptors’ required for each middleware
    ● ‘Intelligent dispatching’ even allows for transparent use of multiple middlewares
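
The adaptor idea can be sketched in plain Java. This is an illustration of the pattern only, not JavaGAT's implementation: the `FileAdaptor` interface, the `adaptor` helper, and the adaptor names are hypothetical, and the stand-in adaptors merely check the URI scheme instead of moving data. The dispatcher tries each adaptor in turn and returns the name of the first one that succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of adaptor dispatching; not JavaGAT code.
final class AdaptorDispatchSketch {
    // One adaptor per middleware; each knows how to copy files its own way.
    interface FileAdaptor {
        String name();
        void copy(String src, String dest) throws Exception;
    }

    // Builds a stand-in adaptor that only accepts URIs with the given scheme.
    static FileAdaptor adaptor(String name, String scheme) {
        return new FileAdaptor() {
            public String name() {
                return name;
            }

            public void copy(String src, String dest) throws Exception {
                if (!src.startsWith(scheme + "://")) {
                    throw new Exception("cannot handle this URI");
                }
                // A real adaptor would talk to its middleware here.
            }
        };
    }

    // Tries every adaptor in order; the first success wins.
    // Returns the name of the adaptor that handled the request.
    static String copy(String src, String dest, List<FileAdaptor> adaptors) {
        List<String> errors = new ArrayList<>();
        for (FileAdaptor a : adaptors) {
            try {
                a.copy(src, dest);
                return a.name();
            } catch (Exception e) {
                errors.add(a.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("all adaptors failed: " + errors);
    }

    public static void main(String[] args) {
        List<FileAdaptor> adaptors = Arrays.asList(
                adaptor("gridftp-like", "gsiftp"),
                adaptor("ssh-like", "ssh"));
        System.out.println(copy("ssh://host/a.txt", "ssh://host/b.txt", adaptors)); // ssh-like
    }
}
```

The application only calls `copy`; which middleware ends up doing the work is decided at run time, so the same program can run where only some of the middlewares are available.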

● Example: file copy
    ● JavaGAT vs. Globus
● Simple, portable, …
● SAGA API standardized

package org.gridlab.gat.io.cpi.rftgt4;

import java.net.MalformedURLException;
import java.net.URL;
import java.rmi.RemoteException;
import java.security.cert.X509Certificate;
import java.util.Calendar;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Vector;

import javax.xml.namespace.QName;
import javax.xml.rpc.ServiceException;
import javax.xml.rpc.Stub;
import javax.xml.soap.SOAPElement;

import org.apache.axis.message.addressing.EndpointReferenceType;
import org.apache.axis.types.URI.MalformedURIException;
import org.globus.axis.util.Util;
import org.globus.delegation.DelegationConstants;
import org.globus.delegation.DelegationException;
import org.globus.delegation.DelegationUtil;
import org.globus.gsi.GlobusCredential;
import org.globus.gsi.GlobusCredentialException;
import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;
import org.globus.gsi.jaas.JaasGssUtil;
import org.globus.rft.generated.BaseRequestType;
import org.globus.rft.generated.CreateReliableFileTransferInputType;
import org.globus.rft.generated.CreateReliableFileTransferOutputType;
import org.globus.rft.generated.DeleteRequestType;
import org.globus.rft.generated.DeleteType;
import org.globus.rft.generated.OverallStatus;
import org.globus.rft.generated.RFTFaultResourcePropertyType;
import org.globus.rft.generated.RFTOptionsType;
import org.globus.rft.generated.ReliableFileTransferFactoryPortType;
import org.globus.rft.generated.ReliableFileTransferPortType;
import org.globus.rft.generated.Start;
import org.globus.rft.generated.TransferRequestType;
import org.globus.rft.generated.TransferType;
import org.globus.transfer.reliable.client.BaseRFTClient;
import org.globus.transfer.reliable.service.RFTConstants;
import org.globus.wsrf.NotificationConsumerManager;
import org.globus.wsrf.NotifyCallback;
import org.globus.wsrf.ResourceException;
import org.globus.wsrf.WSNConstants;
import org.globus.wsrf.container.ContainerException;
import org.globus.wsrf.container.ServiceContainer;
import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType;
import org.globus.wsrf.encoding.DeserializationException;
import org.globus.wsrf.encoding.ObjectDeserializer;
import org.globus.wsrf.impl.security.authentication.Constants;
import org.globus.wsrf.impl.security.authorization.Authorization;
import org.globus.wsrf.impl.security.authorization.HostAuthorization;
import org.globus.wsrf.impl.security.authorization.IdentityAuthorization;
import org.globus.wsrf.impl.security.authorization.SelfAuthorization;
import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod;
import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod;
import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException;
import org.globus.wsrf.security.SecurityManager;
import org.gridlab.gat.CouldNotInitializeCredentialException;
import org.gridlab.gat.CredentialExpiredException;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.GATInvocationException;
import org.gridlab.gat.GATObjectCreationException;
import org.gridlab.gat.Preferences;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.cpi.FileCpi;
import org.gridlab.gat.security.globus.GlobusSecurityUtils;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.oasis.wsn.Subscribe;
import org.oasis.wsn.TopicExpressionType;
import org.oasis.wsrf.faults.BaseFaultType;
import org.oasis.wsrf.lifetime.SetTerminationTime;
import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse;
import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element;
import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType;

class RFTGT4NotifyCallback implements NotifyCallback {
    RFTGT4FileAdaptor transfer;
    OverallStatus status;

    public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) {
        super();
        this.transfer = transfer;
        this.status = null;
    }

    @SuppressWarnings("unchecked")
    public void deliver(List topicPath, EndpointReferenceType producer,
            Object messageWrapper) {
        try {
            ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper)
                    .getResourcePropertyValueChangeNotification();
            this.status = (OverallStatus) message.getNewValue().get_any()[0]
                    .getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE,
                            OverallStatus.class);
            if (status.getFault() != null) {
                transfer.setFault(getFaultFromRP(status.getFault()));
            }
            // RunQueue.getInstance().add(this.resourceKey);
        } catch (Exception e) {
        }
        transfer.setStatus(status);
    }

    private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) {
        if (fault == null) {
            return null;
        }

        if (fault.getDelegationEPRMissingFaultType() != null) {
            return fault.getDelegationEPRMissingFaultType();
        } else if (fault.getRftAuthenticationFaultType() != null) {
            return fault.getRftAuthenticationFaultType();
        } else if (fault.getRftAuthorizationFaultType() != null) {
            return fault.getRftAuthorizationFaultType();
        } else if (fault.getRftDatabaseFaultType() != null) {
            return fault.getRftDatabaseFaultType();
        } else if (fault.getRftRepeatedlyStartedFaultType() != null) {
            return fault.getRftRepeatedlyStartedFaultType();
        } else if (fault.getTransferTransientFaultType() != null) {
            return fault.getTransferTransientFaultType();
        } else if (fault.getRftTransferFaultType() != null) {
            return fault.getRftTransferFaultType();
        } else {
            return null;
        }
    }
}

@SuppressWarnings("serial")
public class RFTGT4FileAdaptor extends FileCpi {
    public static final Authorization DEFAULT_AUTHZ = HostAuthorization
            .getInstance();
    Integer msgProtectionType = Constants.SIGNATURE;
    static final int TERM_TIME = 20;
    static final String PROTOCOL = "https";
    private static final String BASE_SERVICE_PATH = "/wsrf/services/";
    public static final int DEFAULT_DURATION_HOURS = 24;
    public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE;
    public static final String DEFAULT_FACTORY_PORT = "8443";
    private static final int DEFAULT_GRIDFTP_PORT = 2811;
    NotificationConsumerManager notificationConsumerManager;
    EndpointReferenceType notificationConsumerEPR;
    EndpointReferenceType notificationProducerEPR;
    String securityType;
    String factoryUrl;
    GSSCredential proxy;
    Authorization authorization;
    String host;
    OverallStatus status;
    BaseFaultType fault;
    String locationStr;
    ReliableFileTransferFactoryPortType factoryPort;

    public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences,
            URI location) throws GATObjectCreationException {
        super(gatContext, preferences, location);
        if (!location.isCompatible("gsiftp")
                && !location.isCompatible("gridftp")) {
            throw new GATObjectCreationException("cannot handle this URI");
        }

        String globusLocation = System.getenv("GLOBUS_LOCATION");
        if (globusLocation == null) {
            throw new GATObjectCreationException("$GLOBUS_LOCATION is not set");
        }
        System.setProperty("GLOBUS_LOCATION", globusLocation);
        System.setProperty("axis.ClientConfigFile", globusLocation
                + "/client-config.wsdd");
        this.host = location.getHost();
        this.securityType = Constants.GSI_SEC_MSG;
        this.authorization = null;
        this.proxy = null;

        try {
            proxy = GlobusSecurityUtils.getGlobusCredential(gatContext,
                    preferences, "globus", location, DEFAULT_GRIDFTP_PORT);
        } catch (CouldNotInitializeCredentialException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (CredentialExpiredException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        this.notificationConsumerManager = null;
        this.notificationConsumerEPR = null;
        this.notificationProducerEPR = null;
        this.status = null;
        this.fault = null;
        factoryPort = null;
        this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT
                + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME;
        locationStr = setLocationStr(location);
    }

    String setLocationStr(URI location) {
        if (location.getScheme().equals("any")) {
            return "gsiftp://" + location.getHost() + ":" + location.getPort()
                    + "/" + location.getPath();
        } else {
            return location.toString();
        }
    }

    protected boolean copy2(String destStr) throws GATInvocationException {
        EndpointReferenceType credentialEndpoint = getCredentialEPR();

        TransferType[] transferArray = new TransferType[1];
        transferArray[0] = new TransferType();
        transferArray[0].setSourceUrl(locationStr);
        transferArray[0].setDestinationUrl(destStr);

        RFTOptionsType rftOptions = new RFTOptionsType();
        rftOptions.setBinary(Boolean.TRUE);
        // rftOptions.setIgnoreFilePermErr(false);
        TransferRequestType request = new TransferRequestType();
        request.setRftOptions(rftOptions);
        request.setTransfer(transferArray);
        request.setTransferCredentialEndpoint(credentialEndpoint);
        setRequest(request);

        while (!transfersDone()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
            }
        }
        return transfersSucc();
    }

    public void copy(URI dest) throws GATInvocationException {
        String destUrl = setLocationStr(dest);
        if (!copy2(destUrl)) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: file copy failed");
        }
    }

    public void subscribe(ReliableFileTransferPortType rft)
            throws GATInvocationException {
        Map<Object, Object> properties = new HashMap<Object, Object>();
        properties.put(ServiceContainer.CLASS,
                "org.globus.wsrf.container.GSIServiceContainer");
        if (this.proxy != null) {
            ContainerSecurityDescriptor containerSecDesc = new ContainerSecurityDescriptor();
            SecurityManager.getManager();
            try {
                containerSecDesc.setSubject(JaasGssUtil
                        .createSubject(this.proxy));
            } catch (GSSException e) {
                throw new GATInvocationException(
                        "RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, "
                                + e);
            }
            properties.put(ServiceContainer.CONTAINER_DESCRIPTOR,
                    containerSecDesc);
        }
        this.notificationConsumerManager = NotificationConsumerManager
                .getInstance(properties);
        try {
            this.notificationConsumerManager.startListening();
        } catch (ContainerException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: NotificationConsumerManager failed, "
                            + e);
        }
        List<Object> topicPath = new LinkedList<Object>();
        topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE);
        ResourceSecurityDescriptor securityDescriptor = new ResourceSecurityDescriptor();
        String authz = null;
        if (authorization == null) {
            authz = Authorization.AUTHZ_NONE;
        } else if (authorization instanceof HostAuthorization) {
            authz = Authorization.AUTHZ_NONE;
        } else if (authorization instanceof SelfAuthorization) {
            authz = Authorization.AUTHZ_SELF;
        } else if (authorization instanceof IdentityAuthorization) {
            // not supported
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: identity authorization not supported");
        } else {
            // throw an sg
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set authorization failed");
        }
        securityDescriptor.setAuthz(authz);
        Vector<Object> authMethod = new Vector<Object>();
        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            authMethod.add(GSISecureMsgAuthMethod.BOTH);
        } else {
            authMethod.add(GSITransportAuthMethod.BOTH);
        }
        try {
            securityDescriptor.setAuthMethods(authMethod);
        } catch (SecurityDescriptorException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: setAuthMethods failed, " + e);
        }

        RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this);
        try {
            notificationConsumerEPR = notificationConsumerManager
                    .createNotificationConsumer(topicPath, notifyCallback,
                            securityDescriptor);
        } catch (ResourceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: createNotificationConsumer failed, "
                            + e);
        }
        Subscribe subscriptionRequest = new Subscribe();
        subscriptionRequest.setConsumerReference(notificationConsumerEPR);
        TopicExpressionType topicExpression = null;
        try {
            topicExpression = new TopicExpressionType(
                    WSNConstants.SIMPLE_TOPIC_DIALECT,
                    RFTConstants.OVERALL_STATUS_RESOURCE);
        } catch (MalformedURIException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: create TopicExpressionType failed, "
                            + e);
        }
        subscriptionRequest.setTopicExpression(topicExpression);
        try {
            rft.subscribe(subscriptionRequest);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: subscription failed, " + e);
        }
    }

    protected EndpointReferenceType getCredentialEPR()
            throws GATInvocationException {
        this.status = null;
        URL factoryURL = null;
        try {
            factoryURL = new URL(factoryUrl);
        } catch (MalformedURLException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set factoryURL failed, " + e);
        }
        try {
            factoryPort = BaseRFTClient.rftFactoryLocator
                    .getReliableFileTransferFactoryPortTypePort(factoryURL);
        } catch (ServiceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set factoryPort failed, " + e);
        }
        setSecurityTypeFromURL(factoryURL);
        return populateRFTEndpoints(factoryPort);
    }

    protected void setRequest(BaseRequestType request)
            throws GATInvocationException {
        CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType();
        if (request instanceof TransferRequestType) {
            input.setTransferRequest((TransferRequestType) request);
        } else {
            input.setDeleteRequest((DeleteRequestType) request);
        }

        Calendar termTimeDel = Calendar.getInstance();
        termTimeDel.add(Calendar.MINUTE, TERM_TIME);
        input.setInitialTerminationTime(termTimeDel);
        CreateReliableFileTransferOutputType response = null;
        try {
            response = factoryPort.createReliableFileTransfer(input);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set createReliableFileTransfer failed, "
                            + e);
        }
        EndpointReferenceType reliableRFTEndpoint = response
                .getReliableTransferEPR();
        ReliableFileTransferPortType rft = null;
        try {
            rft = BaseRFTClient.rftLocator
                    .getReliableFileTransferPortTypePort(reliableRFTEndpoint);
        } catch (ServiceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, "
                            + e);
        }
        setStubSecurityProperties((Stub) rft);
        subscribe(rft);
        Calendar termTime = Calendar.getInstance();
        termTime.add(Calendar.MINUTE, TERM_TIME);
        SetTerminationTime reqTermTime = new SetTerminationTime();
        reqTermTime.setRequestedTerminationTime(termTime);
        try {
            rft.setTerminationTime(reqTermTime);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: setTerminationTime failed, " + e);
        }
        try {
            rft.start(new Start());
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: start failed, " + e);
        }
    }

    private void setSecurityTypeFromURL(URL url) {
        if (url.getProtocol().equals("http")) {
            securityType = Constants.GSI_SEC_MSG;
        } else {
            Util.registerTransport();
            securityType = Constants.GSI_TRANSPORT;
        }
    }

    private void setStubSecurityProperties(Stub stub) {
        ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();

        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            secDesc.setGSISecureMsg(this.getMessageProtectionType());
        } else {
            secDesc.setGSITransport(this.getMessageProtectionType());
        }

        secDesc.setAuthz(getAuthorization());

        if (this.proxy != null) {
            // set proxy credential
            secDesc.setGSSCredential(this.proxy);
        }

        stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc);
    }

    public Integer getMessageProtectionType() {
        return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION
                : this.msgProtectionType;
    }

    public Authorization getAuthorization() {
        return (authorization == null) ? DEFAULT_AUTHZ : this.authorization;
    }

    private EndpointReferenceType populateRFTEndpoints(
            ReliableFileTransferFactoryPortType factoryPort)
            throws GATInvocationException {
        EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort);
        EndpointReferenceType delegationEndpoint = delegate(delegationFactoryEndpoints[0]);
        return delegationEndpoint;
    }

    private EndpointReferenceType delegate(
            EndpointReferenceType delegationFactoryEndpoint)
            throws GATInvocationException {
        GlobusCredential credential = null;
        if (this.proxy != null) {
            credential = ((GlobusGSSCredentialImpl) this.proxy)
                    .getGlobusCredential();
        } else {
            try {
                credential = GlobusCredential.getDefaultCredential();
            } catch (GlobusCredentialException e) {
                throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
            }
        }

        int lifetime = DEFAULT_DURATION_HOURS * 60 * 60;

        ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();
        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            secDesc.setGSISecureMsg(this.getMessageProtectionType());
        } else {
            secDesc.setGSITransport(this.getMessageProtectionType());
        }
        secDesc.setAuthz(getAuthorization());

        if (this.proxy != null) {
            secDesc.setGSSCredential(this.proxy);
        }

        // Get the public key to delegate on.
        X509Certificate[] certsToDelegateOn = null;
        try {
            certsToDelegateOn = DelegationUtil.getCertificateChainRP(
                    delegationFactoryEndpoint, secDesc);
        } catch (DelegationException e) {
            throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
        }
        X509Certificate certToSign = certsToDelegateOn[0];

        // FIXME remove when there is a DelegationUtil.delegate(EPR, ...)
        String protocol = delegationFactoryEndpoint.getAddress().getScheme();
        String host = delegationFactoryEndpoint.getAddress().getHost();
        int port = delegationFactoryEndpoint.getAddress().getPort();
        String factoryUrl = protocol + "://" + host + ":" + port
                + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH;

        // send to delegation service and get epr.
        EndpointReferenceType credentialEndpoint = null;
        try {
            credentialEndpoint = DelegationUtil.delegate(factoryUrl,
                    credential, certToSign, lifetime, false, secDesc);
        } catch (DelegationException e) {
            throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
        }
        return credentialEndpoint;
    }

    public EndpointReferenceType[] fetchDelegationFactoryEndpoints(
            ReliableFileTransferFactoryPortType factoryPort)
            throws GATInvocationException {

        GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element();
        request
                .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY });
        GetMultipleResourcePropertiesResponse response;
        try {
            response = factoryPort.getMultipleResourceProperties(request);
        } catch (RemoteException e) {
            e.printStackTrace();
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e);
        }
        SOAPElement[] any = response.get_any();

        EndpointReferenceType epr1 = null;
        try {
            epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0],
                    EndpointReferenceType.class);
        } catch (DeserializationException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: ObjectDeserializer, " + e);
        }
        EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 };
        return endpoints;
    }

    synchronized void setStatus(OverallStatus status) {
        this.status = status;
    }

    public int transfersActive() {
        if (status == null) {
            return 1;
        }
        return status.getTransfersActive();
    }

    public int transfersFinished() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersFinished();
    }

    public int transfersCancelled() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersCancelled();
    }

    public int transfersFailed() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersFailed();
    }

    public int transfersPending() {
        if (status == null) {
            return 1;
        }
        return status.getTransfersPending();
    }

    public int transfersRestarted() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersRestarted();
    }

    public boolean transfersDone() {
        return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0);
    }

    public boolean transfersSucc() {
        return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0);
    }

    /*
     * private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) {
     * if (fault == null) { return null; }
     *
     * if (fault.getRftTransferFaultType() != null) { return
     * fault.getRftTransferFaultType(); } else if
     * (fault.getDelegationEPRMissingFaultType() != null) { return
     * fault.getDelegationEPRMissingFaultType(); } else if
     * (fault.getRftAuthenticationFaultType() != null) { return
     * fault.getRftAuthenticationFaultType(); } else if
     * (fault.getRftAuthorizationFaultType() != null) { return
     * fault.getRftAuthorizationFaultType(); } else if
     * (fault.getRftDatabaseFaultType() != null) { return
     * fault.getRftDatabaseFaultType(); } else if
     * (fault.getRftRepeatedlyStartedFaultType() != null) { return
     * fault.getRftRepeatedlyStartedFaultType(); } else if
     * (fault.getTransferTransientFaultType() != null) { return
     * fault.getTransferTransientFaultType(); } else { return null; } }
     */

    /*
     * private BaseFaultType deserializeFaultRP(SOAPElement any) throws
     * Exception { return getFaultFromRP((RFTFaultResourcePropertyType)
     * ObjectDeserializer .toObject(any, RFTFaultResourcePropertyType.class)); }
     */

    void setFault(BaseFaultType fault) {
        this.fault = fault;
    }
}

package tutorial;

import org.gridlab.gat.GAT;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.File;

public class RemoteCopy {
    public static void main(String[] args) throws Exception {
        GATContext context = new GATContext();

        URI src = new URI(args[0]);
        URI dest = new URI(args[1]);
        File file = GAT.createFile(context, src);

        file.copy(dest);
        GAT.end();
    }
}

Ibis as ‘Glue’

● Use IPL + SmartSockets, generally for wide-area communication

● Linking up separate ‘activities’ of an application
    ● Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…)
    ● Each typically running on a single GPU/node/Cluster/Cloud/…
● Automatically circumvent connectivity problems

● Example:

With SmartSockets: No SmartSockets:

Ibis as ‘HPC Solution’

● Use Ibis as replacement for e.g. C++/MPI code
● Benefits:
    ● (better) portability
    ● malleability (open world)
    ● fault-tolerance
    ● (run-time) task migration
● Downside:
    ● requires recoding
● Comparable speedups:

MMCA: Situation in 2004/2005

[Figure: Parallel Horus Client, Parallel Horus Server, Parallel Horus Client; layers: C++/MPI, Sockets + SSH Tunneling, SSH]

● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’

Phase 1: Ibis as ‘Master Key’ (2006)

[Figure: Parallel Horus Client, Parallel Horus Server, Parallel Horus Client; layers: C++/MPI, Sockets + SSH Tunneling, JavaGAT + IbisDeploy]

● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’

Phase 2: Ibis as ‘Glue’ (2006/2007)

[Figure: Parallel Horus Client, Parallel Horus Server, Parallel Horus Client; layers: C++/MPI, IPL + SmartSockets, JavaGAT + IbisDeploy]

● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’

Phase 3: Ibis as ‘HPC Solution’ (2008)

[Figure: Parallel Jorus Client, Parallel Jorus Server, Parallel Jorus Client; layers: Ibis/Java, IPL + SmartSockets, JavaGAT + IbisDeploy]

● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’

‘Master Key’ + ‘Glue’ + ‘HPC’

● Step-wise conversion to 100% Ibis / Java
    ● Phase 1: JavaGAT as ‘Master Key’
    ● Phase 2: IPL + SmartSockets as ‘Glue’
    ● Phase 3: Ibis as ‘HPC Solution’
    ● After each phase a fully functional, working solution was available!
● Eventual result:
    ● ‘wall-socket computing from a memory stick’
    ● Remember: the ‘Promise of the Grid’?
    ● Awards at AAAI 2007 and CCGrid 2008

100% Ibis Implementation (2008++)

Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner)
Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)

Some current work

Green Clouds

● NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA
● How to map high-performance applications onto hybrid distributed computing systems, taking both performance & energy consumption into account
● System-level approach to reduce HPC energy consumption

DAS-4: infrastructure for Green IT

Dual quad-core Xeon E5620
Various accelerators (GPUs, multicores, …)
Scientific Linux
Built by ClusterVision

[Figure: DAS-4 sites: VU (74), TU Delft (32), Leiden (16), UvA/MultimediaN (16/36), ASTRON (23), connected by SURFnet6 10 Gb/s lambdas]

Main ideas

● Adapt resources to application needs dynamically, accounting for computational & energy efficiency
    ● Using Ibis malleability support
● Exploit hardware diversity
    ● Graphics Processing Units (GPUs) have much higher FLOPS/Watt for many applications
● Use optical and photonic networks
● Build a knowledge base & semantic infrastructure description
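
One way to read "accounting for computational & energy efficiency" is as a selection rule: among the resources that are fast enough for the application, prefer the one with the best FLOPS/Watt. The sketch below illustrates only that rule. The `Resource` class, the `pick` method, and all performance and power figures are hypothetical and are not measurements from DAS-4 or from the Green Clouds project.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of energy-aware resource selection.
final class GreenSelectionSketch {
    static final class Resource {
        final String name;
        final double gflops; // sustained performance (made-up numbers)
        final double watts;  // power draw under load (made-up numbers)

        Resource(String name, double gflops, double watts) {
            this.name = name;
            this.gflops = gflops;
            this.watts = watts;
        }

        double gflopsPerWatt() {
            return gflops / watts;
        }
    }

    // Among the resources that meet the performance requirement,
    // returns the one with the highest GFLOPS/Watt, if any.
    static Optional<Resource> pick(List<Resource> candidates, double minGflops) {
        return candidates.stream()
                .filter(r -> r.gflops >= minGflops)
                .max(Comparator.comparingDouble(Resource::gflopsPerWatt));
    }

    public static void main(String[] args) {
        List<Resource> resources = Arrays.asList(
                new Resource("cpu-node", 80.0, 200.0),   // 0.4 GFLOPS/W
                new Resource("gpu-node", 500.0, 250.0),  // 2.0 GFLOPS/W
                new Resource("low-power", 10.0, 4.0));   // 2.5 GFLOPS/W
        System.out.println(pick(resources, 50.0).get().name); // gpu-node
        System.out.println(pick(resources, 5.0).get().name);  // low-power
    }
}
```

With malleability support, such a rule could be re-evaluated while the application runs, adding or releasing resources as requirements change.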

Other current PhD projects using Ibis

● Distributed reasoning over semantic web data
    ● WebPIE: Parallel reasoner on Web scale
    ● Written in Java, uses Hadoop (MapReduce)
● Graph applications (HiPG)
    ● E.g. for bioinformatics applications
    ● http://www.graph500.org/
● Games & distributed model checking
    ● Deal with large state space
● Distributed smart phone applications
    ● Computation & communication offloading to a cloud

Conclusions

● Ibis enables problem solving (avoids system fighting)

● Successfully applied in many domains
    ● Astronomy, multimedia analysis, climate modeling, remote sensing, semantic web, medical imaging, …
    ● Data intensive, compute intensive, real-time…
● Open source, download:
    ● www.cs.vu.nl/ibis/

Conclusions (2)

● Jungle Computing is hard

● High-Performance Jungle Computing even harder

● While research into efficient & green Jungle-aware programming models has only just begun…

● …Ibis provides the basic functionality to efficiently & transparently overcome most Jungle Computing complexities

Omitted

General Requirements

● Resource independence
    ● Transparent / easy deployment
● Middleware independence & interoperability
    ● Jungle-aware middleware
● Jungle-aware communication
    ● Robust connectivity
    ● System-support for malleability and fault-tolerance
    ● Globally unique naming
● Transparent parallelism & application-level fault-tolerance
● Easy integration with external software (legacy codes)
    ● MPI, CUDA, C++, Python, Java, Fortran, …
● Need for user-friendly programming tools
    ● Shield domain-experts from all complexities of parallel, distributed, heterogeneous, and hierarchical computing
    ● Familiar (sequential) programming model(s)

Solution: tool to make parallel & distributed computing transparent to user
- familiar programming
- easy execution

[Figure: User and Jungle Computing Systems]

Multimedia Content Analysis (MMCA)

“The Future of Ibis”

● Ibis/Constellation:
    ● Generalized programming framework for ‘all’ Jungle Computing applications
    ● Automatically maps any application activity (task) onto any appropriate executor (HW)
    ● By way of ‘contexts’, for example:
        ● Activity’s context: “I need a GPU”
        ● Executor’s context: “I represent a GPU”
● Note:
    ● Activities may represent any type of task:
        ● Also legacy codes, scripts, 3rd party software, …
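
The context idea can be illustrated with a few lines of Java. This is a sketch of the matching principle only, not the Constellation API: the `Activity` and `Executor` classes, the `map` method, and the use of plain string labels as contexts are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of context-based mapping of activities onto executors.
final class ContextMatchingSketch {
    static final class Executor {
        final String name;
        final String context; // what this executor represents, e.g. "GPU"

        Executor(String name, String context) {
            this.name = name;
            this.context = context;
        }
    }

    static final class Activity {
        final String name;
        final String context; // what this activity needs, e.g. "GPU"

        Activity(String name, String context) {
            this.name = name;
            this.context = context;
        }
    }

    // Returns the name of the first executor whose context equals the
    // activity's context, or an empty Optional if none matches.
    static Optional<String> map(Activity activity, List<Executor> executors) {
        return executors.stream()
                .filter(e -> e.context.equals(activity.context))
                .map(e -> e.name)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Executor> executors = Arrays.asList(
                new Executor("cpu-0", "CPU"),
                new Executor("gpu-0", "GPU"));
        Activity nbody = new Activity("n-body kernel", "GPU"); // "I need a GPU"
        System.out.println(map(nbody, executors).orElse("no executor available")); // gpu-0
    }
}
```

Because the activity states only what it needs, the same application description can be mapped onto whatever mix of hardware happens to be available.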

