Towards negotiable SLA-based QoS Support for Data Services
ESSI Seminar
UPC, 11th May 2010
Jesús Bisbal, Universitat Pompeu Fabra, Barcelona, Spain
Published in Proceedings of the 10th IEEE International Conference on Grid Computing (Grid 2010), pages 259-265.
Outline
Motivation for domain-specific data QoS
Quality of Service (QoS) – Service Level Objectives (SLOs)
QoS Model
QoS Negotiation and QoS SLAs
QoS Management in Data Mediation
Experimental Evaluation
Conclusions and future directions
QoS Scenario – Traditional Objectives
CFD client (medical practitioner): "I want to pay less than 10 €, I can start the simulation today at noon and I need the results by 3 pm"
Remote HPC facilities to be used by many different customers/clients
Guaranteed response times and price
Resource reservation
Capacity/resource estimation
Need to go beyond time and price guarantees: QoS in data services
[Figure: QoS-aware Grid service negotiation for a blood flow simulation]
Motivation – QoS on Biomedical Data
[Figure: biomedical data examples; recoverable labels: 3DRA, CFD, WSS, PC-MR vs. US flow rates, magnitude, phase]
@neurIST project – EU Integrated Project "Integrated Biomedical Informatics for the Management of Cerebral Aneurysms"
Service-oriented ICT infrastructure providing
  On-demand simulation, analysis and data-integration services
  Handling of multi-scale, multi-modal information at distributed resources
Improve decision-making processes by integrating all the available information to identify at-risk patients and reduce unnecessary treatment
Support computational design processes towards a next generation of smart flow-correcting implants and reduce current treatment costs
Support knowledge discovery for linking genetics to disease, vasospasm and blood clotting after cerebral hemorrhage
Support the integration of modeling, simulation and visualization of multimodal data
Support integration of and access to data and computing resources
[Figure labels: @neuEndo, @neuRisk, @neuInfo, @neuCompute, @neuFuse, @neuLink, @neurIST Apps]
Data Mediation approach
Data access and integration
  Virtualization of heterogeneous data sources as services
  Hierarchical composition of data services
  Integration of multiple data sources
Based on OGSA-DAI, the de facto standard for data access on the Grid
  Distributed Query Processing (DQP)
Data mediation services are set up manually
  Mapping schemas
  Large effort required
  Future: semantic mediation …
[Figure: a data service client queries a Data Mediation Service that exposes a virtual DB; the service builds on OGSA-DAI and OGSA-DQP with a GDMS mapping schema and GDMS transformation functions, and reaches relational, XML and CSV sources (DBS) through evaluation services and data services]
Mapping Schema overview
Global-as-View (GAV) mediation approach
1. Definition of Global Schema
2. Mapping rules between the global schema and the integrated schemas
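The two GAV steps above can be illustrated with a minimal sketch. It assumes two hypothetical sources with different column names and units; every table, column and function name here is invented for illustration and is not taken from GDMS or OGSA-DAI.

```python
# Hypothetical source data; table and column names are illustrative only.
SOURCE_A = [{"pid": 1, "aneurysm_mm": 4.2}, {"pid": 2, "aneurysm_mm": 7.9}]
SOURCE_B = [{"patient": "B-7", "size_cm": 0.6}]

# Step 1: the global schema that clients query.
GLOBAL_SCHEMA = ("patient_id", "aneurysm_size_mm", "provider")


# Step 2: GAV mapping rules. Each rule expresses the global relation
# as a view over one integrated source schema.
def map_source_a(row):
    return {"patient_id": f"A-{row['pid']}",
            "aneurysm_size_mm": row["aneurysm_mm"],
            "provider": "A"}


def map_source_b(row):
    return {"patient_id": row["patient"],
            "aneurysm_size_mm": row["size_cm"] * 10.0,  # cm -> mm
            "provider": "B"}


MAPPINGS = [(SOURCE_A, map_source_a), (SOURCE_B, map_source_b)]


def global_view():
    """Materialise the global relation as the union of all mapped sources."""
    rows = []
    for source, rule in MAPPINGS:
        for row in source:
            mapped = rule(row)
            # Every mapped row must expose exactly the global schema.
            assert tuple(mapped) == GLOBAL_SCHEMA
            rows.append(mapped)
    return rows


def query(predicate):
    """Answer a query posed against the global schema."""
    return [row for row in global_view() if predicate(row)]


large = query(lambda row: row["aneurysm_size_mm"] >= 5.0)
print([row["patient_id"] for row in large])
```

The client only ever sees `GLOBAL_SCHEMA`; adding a source means adding one mapping rule, which is the manual effort the mapping schemas capture.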
Data Mediation Architecture
Architecture of the Vienna Grid Environment (VGE)
QoS Management for data (new)
Data Mediation Engine and Distributed Query Processing (DQP) run on a service hosting environment (Tomcat + Axis)
Query Evaluation Services are set up on several hosts (DQP)
Data sources to be integrated run on separate hosts
[Figure: VGE data mediation architecture in four tiers: Client, Distributed Data Mediation Service, Evaluation Services, Mediated Data Sources]
Client: Client API (Upload(), Start(), Download), web browser, firewall
Distributed Data Mediation Service (DMZ): firewall, Apache, JK connector, Tomcat, Service Provisioning Environment, Deployment Tool, Logging, Security, Monitoring, QoS Management, Data Mediation Engine, Distributed Query Processing Engine, Query Compiler, Query Execution Engine, Mediation Schema, Transformation Functions, Estimation Models, Aggregation Functions
Evaluation Services: Host 1, Host 2, Host 3, … Host X
Mediated Data Sources: data services on Host DS1, Host DS2, … Host DSX
Data Mediation Practice
Follows a best-effort strategy for data services
  Queries all available services
  Applies mapping rules
  Compiles result
Recall that "The Grid ... uses standard, open, general purpose protocols and interfaces, coordinates resources that are NOT subject to centralized control, delivers non-trivial qualities of service" (Foster, Kesselman, 2002)
Explore the specificities of Qualities of Service within Data Mediation Services
  Common requirement for advanced scientific applications
  Defines a path to a business model for typical (scientific) usages
  Experimentation using the VGE-based data mediation middleware
  QoS Management prior to initiating data mediation and DQP
Usage of Data Grid Services
Virtualization of distributed and heterogeneous data sources as a single large virtual database (federation of data access)
Data is fragmented
Amount of relevant data
Cost of data access
Security/Privacy
[Figure: applications (Blood Flow, Data Browsing, …) access a virtual DB through the data mediator; each clinical site (clinical center/hospital) runs a BioIS: Barcelona (HCPB), Geneva (UNIGE), Rotterdam (NAT), Oxford (OXF), Sheffield (STH), …]
Why QoS for Data in this Context?
Biomedical research use-cases
  Data mining (epidemiology)
  Content-Based Information Retrieval (decision support)
  Atlas generation (population variability)
[Figure: applications (Blood Flow, Data Browsing) access the virtual DB through the data mediator, which federates the BioIS at the clinical sites]
QoS Objectives SLOs for Data
Adapt QoS management from computing services to data services
Service Level Objective (SLO): Description
Cost: Price of query execution, based on a pricing model (e.g. constant, function of result size)
Response Time: Guaranteed response time to retrieve all results; depends on the size of the query result
Data Cardinality:
  Cardinality of total subjects (e.g. tuples) returned
  Cardinality of reliable / quality (complete) subjects, or acceptable level of constraint satisfaction
  Cardinality of queried subjects
Data Diversity: Maintain a certain diversity of the data sources (providers) being queried, e.g. for epidemiology
Data Locality: Specify the locality of data access (legal constraints)
SLOs for Data: Monitoring
New SLOs require novel monitoring – SALMon to identify degradation
Identify response time degradation after SLAs have been accepted
Data-intensive scientific domains have QoS requirements beyond response time
Need to monitor the satisfaction of agreed SLAs for these other qualities of service
QoS Model for Data Services
Client-driven QoS negotiation with potential service providers
Client supplies QoS requirements (e.g. data quality) and the data request
  Request: SELECT x,y,z FROM TABLE A,B,C WHERE CONDITION
  QoS: Card > 100, Price < 1 €, Diversity > 3
Requests/offers are Web Service Level Agreements (WSLA)
Individual QoS management for each service (and data source)
Steps: ask for WSLA offers; providers offer WSLAs; client accepts/rejects offers
WSLA offered by Service Provider 1: Card 150, Cost 0,6 €, Diversity 4
WSLA offered by Service Provider N: Card 200, Price 0,8 €, Diversity 5
[Figure: client application with QoS Negotiator; Service Providers 1 … N, each with a data service, a QoS Manager and resources]
QoS Negotiation and WSLAs
Negotiation follows (multiple rounds of) Request-Offer and finally a confirmation
Based on Web Service Level Agreement (WSLA)
[Figure: sequence diagram between Data Client, Data Service 1 and Data Service 2: requestQoS, getEstimation (QoS models, estimation models over resources), confirmQoS]
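A single round of the Request-Offer-Confirmation workflow can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical Python stand-ins for requestQoS, getEstimation and confirmQoS, plain objects replace WSLA documents, and picking the cheapest acceptable offer is an assumed client policy. The numbers are the ones shown on the negotiation slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    cardinality: int  # number of result subjects
    price: float      # in euro
    diversity: int    # number of distinct providers


class DataService:
    """Hypothetical provider that answers a QoS request with one offer."""

    def __init__(self, name, estimate):
        self.name = name
        self._estimate = estimate  # stands in for the estimation models
        self.confirmed = None

    def request_qos(self, query, required):
        # A real service would run its estimation models on the query;
        # this sketch returns a fixed, precomputed estimate as the offer.
        return self._estimate

    def confirm_qos(self, offer):
        self.confirmed = offer


def satisfies(offer, required):
    """Check an offer against the requested service level objectives."""
    return (offer.cardinality >= required.cardinality
            and offer.price <= required.price
            and offer.diversity >= required.diversity)


def negotiate(query, required, services):
    """Request offers, keep acceptable ones, confirm the cheapest (assumed policy)."""
    offers = [(s, s.request_qos(query, required)) for s in services]
    acceptable = [(s, o) for s, o in offers if satisfies(o, required)]
    if not acceptable:
        return None  # a further negotiation round would start here
    service, offer = min(acceptable, key=lambda pair: pair[1].price)
    service.confirm_qos(offer)
    return service.name, offer


providers = [
    DataService("Service Provider 1", QoS(150, 0.6, 4)),
    DataService("Service Provider N", QoS(200, 0.8, 5)),
]
required = QoS(cardinality=100, price=1.0, diversity=3)
result = negotiate("SELECT x,y,z FROM TABLE A,B,C WHERE CONDITION",
                   required, providers)
print(result)
```

Only the selected provider receives a confirmation; rejected offers are simply dropped in this sketch.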
QoS Negotiation and WSLAs
[Figure: sequence diagram between Data Client, Data Service 1 (Data Provider 1) and Data Service 2 (Data Provider 2): requestQoS, getEstimation (QoS models), confirmQoS]
WSLA request (constraints set by the client):
<SLA xmlns="http://www.ibm.com/wsla" … >
  <Parties>
    <ServiceConsumer> <!-- from certificate --> </ServiceConsumer>
  </Parties>
  <ServiceDefinition … name="BioIS">
    <SLAParameter name="cost" ...>
    <SLAParameter name="cardinality" ...
    <SLAParameter name="diversity" ...
    ...
    <!-- Metrics for each SLA parameter -->
    ...
    ...
  </ServiceDefinition>
  <Obligations>
    <ServiceLevelObjective name="cost">
      ...
      <Expression><Predicate xsi:type="LessEqual">
        <SLAParameter>price</SLAParameter>
        <Value>1</Value> <!-- 1 Euro -->
      ...
    <ServiceLevelObjective name="cardinality">
      ...
      <Expression><Predicate xsi:type="GreaterEqual">
        <SLAParameter>cardinality</SLAParameter>
        <Value>100</Value> <!-- 100 result sets -->
    <!-- other objectives -->
    ...
  </Obligations>
</SLA>
QoS Negotiation and WSLAs
[Figure: sequence diagram between Data Client (Client), Data Service 1 (Data Provider 1) and Data Service 2 (Data Provider 2): requestQoS, getEstimation (QoS models), confirmQoS]
WSLA offer (values committed by the provider):
<SLA xmlns="http://www.ibm.com/wsla" … >
  <Parties>
    <ServiceConsumer>...
    <ServiceProvider>...
  </Parties>
  <ServiceDefinition … name="BioIS_UPF">
    <SLAParameter name="cost" ...>
    <SLAParameter name="cardinality" ...
    <SLAParameter name="diversity" ...
    <!-- Metrics for each SLA parameter -->
    ...
    <WSDLFile>https://datanode.upf.edu/.../ds?wsdl
    ...
    <!-- Definition of service operations -->
    ...
  </ServiceDefinition>
  <Obligations>
    <ServiceLevelObjective name="cost">
      ...
      <Expression><Predicate xsi:type="Equal">
        <SLAParameter>cost</SLAParameter>
        <Value>0,6</Value> <!-- 0,6 Euro -->
      ...
    <ServiceLevelObjective name="cardinality">
      ...
      <Expression><Predicate xsi:type="Equal">
        <SLAParameter>cardinality</SLAParameter>
        <Value>150</Value> <!-- 150 result sets -->
    <!-- other objectives -->
    ...
  </Obligations>
</SLA>
QoS Aggregation of Federated Data Services
Client aggregates QoS from several data providers to meet SLOs
Data mediation/federation services aggregate QoS offers
Scenario: Data Mediation Service (global schema) over Data Access Services DAS-1, DAS-2, DAS-3, DAS-4, DAS-5, DAS-6

SLO             Satisfaction condition   Aggregation function
Cost            ≤                        Σ cost(DASi)
Response time   ≤                        max resp(DASi)
Cardinality     ≥                        Σ card(DASi)
Diversity       ≥ or =                   Σ dive(DASi)
Locality        =                        ∧ loca(DASi)
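The aggregation table above can be read as a small computation: combine the per-service offers with the listed function, then compare the aggregate with the requested value using the listed condition. The sketch below makes that reading explicit. The `Offer` type and its field names are hypothetical; locality is assumed to be a boolean "meets the locality constraint" flag combined by logical AND, and the "≥ or =" condition for diversity is implemented as ≥.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    cost: float           # euro
    response_time: float  # seconds
    cardinality: int      # subjects returned
    diversity: int        # distinct providers covered
    locality_ok: bool     # assumed: True if the locality constraint holds


def aggregate(offers):
    """Combine per-service offers with the aggregation functions of the table."""
    return Offer(
        cost=sum(o.cost for o in offers),                    # sum
        response_time=max(o.response_time for o in offers),  # max (parallel access)
        cardinality=sum(o.cardinality for o in offers),      # sum
        diversity=sum(o.diversity for o in offers),          # sum
        locality_ok=all(o.locality_ok for o in offers),      # logical AND
    )


def satisfied(total, requested):
    """Apply the satisfaction condition of each SLO to the aggregate."""
    return (total.cost <= requested.cost
            and total.response_time <= requested.response_time
            and total.cardinality >= requested.cardinality
            and total.diversity >= requested.diversity
            and total.locality_ok == requested.locality_ok)


# Hypothetical offers from two data access services.
offers = [
    Offer(cost=0.5, response_time=12.0, cardinality=60, diversity=2, locality_ok=True),
    Offer(cost=0.25, response_time=30.0, cardinality=50, diversity=1, locality_ok=True),
]
requested = Offer(cost=1.0, response_time=60.0, cardinality=100,
                  diversity=3, locality_ok=True)

total = aggregate(offers)
print(total, satisfied(total, requested))
```

Neither offer alone reaches the requested cardinality of 100, but the aggregate does, which is the point of aggregating at the mediation level.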
QoS Management
Estimation model predicts one or more SLOs
  Data source specific (relational DBs vs. PACS/DICOM images)
Estimation models may depend on the prediction of another model
[Figure: the QoS Manager receives a QoS request (set of SLOs) and a request/query descriptor from the client, invokes the estimation models (total cardinality, reliable cardinality, diversity, cost, price, locality, time, …) and returns a QoS offer (set of SLOs) to the client]
QoS Management
Estimation models may depend on the prediction of another model
Challenge of orchestrating the models (directed acyclic graph of models)
  Brute force: executing all permutations of models (<5 SLOs)
  Topological sort to identify the model invocation sequence (>5 SLOs)
[Figure: QoS Manager taking a QoS request and a request/query descriptor from the client, invoking the total cardinality, reliable cardinality, diversity and cost models, …, and returning a QoS offer to the client]
Conflicting objectives, cyclic dependencies; potential solutions:
  Genetic algorithms
  Mixed integer programming and linear programming (MIP/LP)
  Answer set programming (ASP)
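The topological-sort orchestration mentioned on this slide can be sketched with Python's standard `graphlib` module (Python 3.9+). The dependency graph and the toy estimation models below are hypothetical; the slides name the models but do not say which model depends on which.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependencies: model -> models whose predictions it needs.
DEPENDS_ON = {
    "total_cardinality": set(),
    "reliable_cardinality": {"total_cardinality"},
    "diversity": {"total_cardinality"},
    "cost": {"reliable_cardinality"},
}

# Hypothetical estimation models. Each one receives the query descriptor
# and the estimates produced so far by the models it depends on.
MODELS = {
    "total_cardinality": lambda q, est: q["matching_rows"],
    "reliable_cardinality": lambda q, est: int(
        est["total_cardinality"] * q["complete_fraction"]),
    "diversity": lambda q, est: min(q["providers"], est["total_cardinality"]),
    "cost": lambda q, est: est["reliable_cardinality"] / 250,  # assumed pricing
}


def invocation_order(depends_on):
    """Return an order in which every model runs after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def estimate(query):
    """Run all models in dependency order and collect the predicted SLOs."""
    estimates = {}
    for name in invocation_order(DEPENDS_ON):
        estimates[name] = MODELS[name](query, estimates)
    return estimates


descriptor = {"matching_rows": 200, "complete_fraction": 0.75, "providers": 4}
print(invocation_order(DEPENDS_ON))
print(estimate(descriptor))

# Cyclic dependencies cannot be ordered; graphlib reports them explicitly.
try:
    invocation_order({"cost": {"cardinality"}, "cardinality": {"cost"}})
except CycleError:
    print("cyclic model dependencies: no valid invocation sequence")
```

A cycle is exactly the case the slide lists as needing other techniques (genetic algorithms, MIP/LP, ASP), since no invocation sequence exists.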
Experimental Evaluation
Sample queries against existing @neurIST (best-effort) data services
Executed with QoS constraints (cardinality 50 or 100) and without constraints
Measure query execution time
QoS support saves up to 60% of query execution time
Sample queries sorted by the size of their results, ranging from Q1 (a few KBytes) to Q20 (a few MBytes)
[Figure: query execution time (y-axis 0 to 250) for queries 1 to 20; series: Best effort (no QoS), QoS/50, QoS/100]
Experimental Evaluation (II)
Compare gain with respect to the "best effort" query execution policy
QoS guarantees the specified constraints (i.e. cardinality of results)
But... QoS/100 can be worse...
Thus efficient QoS Management and Negotiation remain challenging
[Figure: gain relative to best effort (y-axis -0,20 to 0,80) for queries 1 to 20; series: QoS/50, QoS/100]
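The slides do not define how the gain is computed. A natural reading, assumed here, is the fraction of best-effort execution time saved by the QoS-constrained run, so that a negative value means the QoS run was slower. The timings in the sketch are hypothetical and are not the measured results.

```python
def relative_gain(best_effort_time, qos_time):
    """Fraction of best-effort execution time saved by the QoS run.

    Assumed definition: (t_best_effort - t_qos) / t_best_effort.
    Positive means the QoS run was faster, negative means it was slower.
    """
    if best_effort_time <= 0:
        raise ValueError("best-effort time must be positive")
    return (best_effort_time - qos_time) / best_effort_time


# Hypothetical timings, same unit for both arguments.
print(relative_gain(200, 80))   # QoS run faster than best effort
print(relative_gain(100, 120))  # QoS run slower than best effort
```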
Conclusions
Domain-driven QoS approach, tested on @neurIST sources
QoS Negotiation
  Request-Offer-Confirmation workflow
  Aggregation of Service Level Objectives (SLOs)
QoS Management
  QoS Estimation Models
  Different orchestration approaches
Future Work
Identify synergies with Earth Observation applications (ESA, www.esa.int/esaEO) for SLOs for data services
Evaluate guarantee of other data SLOs (data diversity, quality, locality)
QoS support for more heterogeneous data resources (different image modalities, simulation results/models, genetics, etc.)
Investigation of more sophisticated QoS management models
  Evaluate resolution of conflicting objectives
Cloud infrastructure provision
Thank You
Questions?
The @neurIST Project
Integrated Biomedical Informatics for the Management of Cerebral Aneurysms
Project duration: 2006-2010 (FP 6)
30 Partners
Budget: ~17,5 MEuro
Objectives:
Development of a generic IT infrastructure for the management & processing of heterogeneous data associated with the diagnosis & treatment of cerebral aneurysms.
Transform the management of cerebral aneurysms by providing new insight, personalised risk assessment and methods for the design of improved medical devices and treatment protocols.
www.aneurist.org
Motivation – QoS on Biomedical Data
[Figure: hospitals, clinicians, eHealth researcher, ethical committee, general practitioner, patient, compute resource providers, data resource providers]
Generic processes:
  Obtaining relevant clinical information of patients (EHR – Electronic Health Record)
  Providing clinical decision support
  Offering simulation services
  Creating normalized population-based datasets
  Providing knowledge discovery services
Compute power for simulations
Patient data confidentiality
Data access and integration