http://www.ogsadai.org.uk
The OGSA-DAI Project
A generic framework for integrating data access and computation
– Uniform interface to relational, XML and flat-file data resources
– Using the grid to take specific classes of computation nearer to the data
– Kit of parts for building tailored access and integration applications
Investigations to inform DAIS-WG
One reference implementation for DAIS
Releases publicly available NOW
Project Partners
Powered by ….
Funded by the Grid Core Programme
Project Membership
[Figure: organisation chart with Principal Investigators, Project Manager, Programme Management Board Chair, Technical Review Board Chair, Research Team, EPCC Team, IBM Dissemination Team and IBM Development Team]
Members named on the chart: Charaka, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Brian, Dave, Patrick
Project Status
Current release 4.0
– Globus Toolkit 3.2 compliant
– Platform and language independent
  • Java 1.4
  • Document model
Work concentrated on data access
– Wraps data resources without hiding the underlying data model
– Provides a base for higher-level services
  • Distributed Query Processing (DQP)
  • Data federation services
Supported Data Resources
Relational: MySQL, DB2, Oracle, PostgreSQL, SQL Server
XML: Xindice, eXist
Other: Files, ?
Web Service Architecture
[Figure: the Service Provider publishes to the Service Registry; the Service Consumer discovers the provider through the registry and then binds to the Service Provider]
OGSA-DAI Service Architecture
[Figure: the same publish/discover/bind pattern, with the DAISGR as the registry and the GDSF/GDS as the service provider used by the Service Consumer]
OGSA-DAI Services
OGSA-DAI uses three main service types
– DAISGR (registry) for discovery
– GDSF (factory) to represent a data resource
– GDS (data service) to access a data resource
This will change
[Figure: the DAISGR locates the GDSF; the GDSF represents the Data Resource and creates the GDS; the GDS accesses the Data Resource]
GDSF and GDS
Grid Data Service Factory (GDSF)
– Represents a data resource
– Persistent service
  • Currently static (no dynamic GDSFs)
– Cannot instantiate new services to represent other/new databases
– Exposes capabilities and metadata
– May register with a DAISGR
Grid Data Service (GDS)
– Created by a GDSF
– Generally a transient service
– Required to access a data resource
– Holds the client session
DAISGR
DAI Service Group Registry (DAISGR)
– Persistent service
– Based on OGSI ServiceGroups
– GDSFs may register with a DAISGR
– Clients access the DAISGR to discover
  • Resources
  • Services (which may need specific capabilities)
    – e.g. support for a given portType or activity
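The registration and capability-based discovery described above can be pictured with a small in-memory sketch. It is not the OGSA-DAI or OGSI API: `ServiceGroupRegistry`, `register`, `find`, the handle strings and the activity names are all hypothetical, and a real DAISGR is a persistent grid service queried through service data, not a Java map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a service-group registry (NOT the OGSA-DAI API).
final class RegistrySketch {

    static final class ServiceGroupRegistry {
        // Insertion-ordered so that discovery results come back in a deterministic order.
        private final Map<String, Set<String>> entries = new LinkedHashMap<>();

        /** A factory registers its (hypothetical) handle together with the activities it supports. */
        void register(String handle, Set<String> activities) {
            entries.put(handle, Set.copyOf(activities));
        }

        /** A client asks for every registered handle that supports ALL of the required activities. */
        List<String> find(Set<String> required) {
            List<String> matches = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : entries.entrySet()) {
                if (e.getValue().containsAll(required)) {
                    matches.add(e.getKey());
                }
            }
            return Collections.unmodifiableList(matches);
        }
    }
}
```

For example, registering a "frogs" factory with `query` and `transform` and a "toads" factory with only `query`, then asking for `transform`, returns only the "frogs" handle; the client then binds to that factory directly.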
[Figure: the GDSF calls registerService on the Registry (DAISGR); an Analyst calls findServiceData on the DAISGR and on the Factory (GDSF)]
Data resource publication through the registry
Data location hidden by the factory
Data resource metadata available through Service Data Elements
Interaction Model: Start up
[Figure: two OGSI containers, hosting the GDSF and the DAISGR]
1. Start OGSI containers with persistent services.
2. Here the GDSF represents a Frog database.
Interaction Model: Registration
[Figure: the same two OGSI containers; the DAISGR now holds the entry "Frogs: GSH", the Grid Service Handle of the GDSF]
3. The GDSF registers with the DAISGR.
Interaction Model: Discovery
[Figure: a client asks "Frogs?"; a FindService: Frogs request to the DAISGR returns GSH: GDSF]
4. A client wants to know about frogs. It can:
(i) query the GDSF directly, if it is already known, or
(ii) identify a suitable GDSF through the DAISGR.
Interaction Model: Service Creation
[Figure: the client sends CreateService to the GDSF, which creates a GDS and returns GSH: GDS]
5. Having identified a suitable GDSF, the client asks it to create a GDS.
Interaction Model: Perform
[Figure: the client sends a Perform document to the GDS and receives a Response document]
6. The client interacts with the GDS by sending Perform documents.
7. The GDS responds with a Response document.
8. The client may terminate the GDS when finished, or let it die naturally when its lifetime expires.
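Steps 5 to 8 amount to a lifecycle: a persistent factory creates a transient service, that service holds the client session and answers requests, and it ends either on explicit termination or when its lifetime runs out. The sketch below models only that lifecycle, in memory. `DataServiceFactory`, `DataService`, `perform` and the `<response>` strings are hypothetical stand-ins, not OGSA-DAI's Perform/Response document formats, and time is passed in as plain milliseconds to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical factory / transient-service lifecycle (NOT the OGSA-DAI API).
final class LifecycleSketch {

    /** Persistent "factory": represents one data resource (a simple key/value table here). */
    static final class DataServiceFactory {
        private final Map<String, String> resource;

        DataServiceFactory(Map<String, String> resource) {
            this.resource = new TreeMap<>(resource);
        }

        /** Creates a transient data service that expires at nowMillis + lifetimeMillis. */
        DataService createService(long nowMillis, long lifetimeMillis) {
            return new DataService(resource, nowMillis + lifetimeMillis);
        }
    }

    /** Transient "data service": holds the client session and answers requests. */
    static final class DataService {
        private final Map<String, String> resource;
        private final long terminationTime;
        private final List<String> session = new ArrayList<>(); // session state lives in the service
        private boolean destroyed;

        private DataService(Map<String, String> resource, long terminationTime) {
            this.resource = resource;
            this.terminationTime = terminationTime;
        }

        /** Handles one request naming a key and returns a small response document. */
        String perform(String key, long nowMillis) {
            if (destroyed || nowMillis >= terminationTime) {
                throw new IllegalStateException("data service has terminated");
            }
            session.add(key);
            String value = resource.get(key);
            // No XML escaping is done here: this is an illustration only.
            return value == null
                    ? "<response status=\"notFound\"/>"
                    : "<response status=\"ok\">" + value + "</response>";
        }

        /** Explicit termination by the client (step 8). */
        void destroy() {
            destroyed = true;
        }

        int requestsInSession() {
            return session.size();
        }
    }
}
```

Keeping the factory persistent and the data service short-lived means a crashed or forgetful client leaves nothing behind for long: its service simply stops accepting requests once the termination time passes.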
Interaction Model: Summary
Only an access use case has been described
– Client not concerned with the connection mechanism
– A similar framework could accommodate service-to-service interactions
Discovery aspect is important
– Probably requires a human
– Needs an adequate definition of metadata
  • Definitions of ontologies and vocabularies - not something that OGSA-DAI is doing …
More Complex Behaviour
[Figure: a client and two containers, each hosting a GDS with a GDT in front of a data resource]
– Deliver data back to the client.
– Deliver data to a third party.
– Deliver data to another GDS.
And there's a lot more that you can do …
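The point of these delivery options is that the service running the query, not the client, pushes results to whichever destination the client names. A minimal sketch, assuming results are plain strings and a destination is just a `java.util.function.Consumer`; `DataService`, `queryAndDeliver` and `inputPort` are hypothetical names, and the prefix match stands in for a real query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical delivery routing (NOT the OGSA-DAI API): the data service pushes
// query results to a destination chosen by the client.
final class DeliverySketch {

    static final class DataService {
        private final List<String> rows;
        private final List<String> received = new ArrayList<>();

        DataService(List<String> rows) {
            this.rows = List.copyOf(rows);
        }

        /** Runs a trivial "query" (prefix match) and pushes each matching row to the sink. */
        void queryAndDeliver(String prefix, Consumer<String> sink) {
            for (String row : rows) {
                if (row.startsWith(prefix)) {
                    sink.accept(row);
                }
            }
        }

        /** Lets this service act as the delivery target of another service. */
        Consumer<String> inputPort() {
            return received::add;
        }

        List<String> received() {
            return List.copyOf(received);
        }
    }
}
```

The same call serves all three cases on the slide: pass a client-side collector to get data back, any third-party consumer to deliver elsewhere, or another service's `inputPort()` to chain services without routing the data through the client.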
Usage Patterns
[Figure: interaction diagrams for three usage patterns: Retrieve, Update/Insert, and Pipeline, in which GDS G1 acts as Producer (P) and GDS G2 as Consumer (C)]
Message key: Q - Query, D - Delivery, S - Status, R - Result, U - Update, I - Data id
Actor key: A - Analyst, C - Consumer, G - GDS, P - Producer; actors may be OGSI or non-OGSI processes
Arrow key: call, response, data flow
Projects Using OGSA-DAI
OGSA-DAI (http://www.ogsadai.org.uk)
AstroGrid (http://www.astrogrid.org/)
BioSimGrid (http://www.biosimgrid.org/)
BioGrid (http://www.biogrid.jp/)
Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
GEON (http://www.geongrid.org/)
IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
myGrid (http://www.mygrid.org.uk/)
N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
INWA (http://www.epcc.ed.ac.uk/)
Project classification
[Figure: OGSA-DAI at the centre, with user projects grouped into Biological Sciences, Physical Sciences, Commercial Applications and Computer Sciences]
Projects shown: FirstDig, INWA, Bridges, AstroGrid, BioSimGrid, BioGrid, eDiaMoND, myGrid, ODD-Genes, N2Grid, GEON, MCS, IU RGRBench, OGSA-WebDB, GeneGrid, GridMiner
Points to Note
Feedback from users is largely positive
– Good suggestions
– Fair criticisms
– How OGSA-DAI is being used
– Where it succeeds and where it fails
– Helping us to capture requirements
Hope to allow user contributions
– Plan to establish a policy/framework for this
Engage more with the user community
– Meetings scheduled for this year
  • OGSA-DAI mini-workshop at AHM 2004
  • OGSA-DAI tutorials at various meetings/locations
e-Digital MammOgraphy National Database
– Mammogram: X-ray of the breast
Built a prototype of a national database of mammographic images
– In support of the UK Breast Screening Programme
Employed Grid technologies to facilitate the process
Thanks to the eDiaMoND project and the Digital Database for Screening Mammography for this image.
Breast screening in the UK began in 1988
– Women aged 50-64 screened every 3 years
– Women aged 50-70 from 2004
– 1 view per breast → 2 views by 2003
The UK has
– Over 90 breast screening units
– Each dealing with about 45000 women on average p.a.
Each centre sees 5000-20000 images/year
In 2001-02 → 2002-03:
– Screened: 1.4M → 1.5M
– Recalled for assessment: 77911 → 79441
– Cancers detected: 10003 → 10467
– Lives saved per year: 300 → 1250 (by 2010)
A distributed team of doctors performs the analysis
[Figure: eDiaMoND architecture. Four sites (UCL, KCL, UED, CHU), each with DB2 Content Manager database files exposed through OGSA-DAI Core Services, combined via DB2 Federation; DataLoad and TrainingApp clients use the Core & Training API, with Training Services layered on OGSA-DAI]
eDiaMoND findings:
– OGSA-DAI provides a flexible framework
– The system can be configured dynamically through discovery
– Activities can operate at different levels of granularity
– Federation can be introduced at various levels
– Good documentation on how to extend the framework
  • Extended activities to access IBM DB2 Content Manager
– Changes between versions broke some things
  • Low-level XML issues
FirstDIG
Data mining with the First Transport Group, UK
– Example: “When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%”
– “The results of this exercise will revolutionise the way we do things in the bus industry.”, Darren Unwin, Divisional Manager, First South Yorkshire.
[Figure: a Data Mining Application uses an OGSA-DAI Client Application to access four OGSA-DAI services]
INWA
Innovation Node: Western Australia
– Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
Project
– Ran from Nov 2003 to Aug 2004
– Involved 10 partners (6 UK + 4 Australia)
Aim
– Data mine commercially sensitive data
– Security an absolute MUST
– Employ Grid technologies
– Need access to data and computational resources
Demonstrator using:
– OGSA-DAI
  • Incorporate data resources
– Sun DCG's TOG (Transfer-queue Over Globus)
  • Handle job submission to analyse microarray data
[Figure: INWA demonstrator linking Curtin, Australia and EPCC, UK. Each site runs Grid Engine, TOG and OGSA-DAI services over Bank and Telco data, alongside Australian and UK property data; users (user@australia, user@edinburgh) work through a Data Browser]
INWA: Lessons Learned
Performing data integration:
– Time-zone date problems
Security issues:
– Bugs in
  • Java CoG in GT3
  • OGSA-DAI could not switch security for Grid data transfers
  • TOG had no security option
– All of these have been fixed
Middleware not mature enough for commercial deployment
Why OGSA-DAI?
Why use OGSA-DAI over JDBC?
– Can embed additional functionality at the service end
  • Transformations, compression
  • Third-party delivery
  • The extensible activity framework
– Avoids unnecessary data movement
– Common interface to heterogeneous data resources
  • Relational and XML databases, and files
– Registry for service discovery
  • Dynamic service binding process
  • Provision of good metadata is necessary
– Language independence at the client end
  • No need to use Java
– Platform independence
  • No need to worry about connection technology, drivers, etc.
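As an illustration of embedding functionality at the service end, the sketch below chains three server-side steps (filter rows, convert them to CSV, GZIP-compress the result) so that only the compressed payload crosses the network. Only the `java.util.zip` and `java.io` classes are real library API; `runChain`, the two-column row layout and the status values are hypothetical, and this is not OGSA-DAI's activity framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical server-side processing chain (NOT OGSA-DAI's activity framework).
final class ActivityChainSketch {

    /** Compression step: GZIP-compresses a UTF-8 string. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed (and its trailer written) before the bytes are read.
        return bytes.toByteArray();
    }

    /** Client-side inverse, used here only to check the round trip. */
    static String gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the chain on rows of the assumed form {id, status}: filter, convert to CSV, compress. */
    static byte[] runChain(List<String[]> rows, String wantedStatus) {
        StringBuilder csv = new StringBuilder();
        for (String[] row : rows) {
            if (row[1].equals(wantedStatus)) {                  // filter step
                csv.append(String.join(",", row)).append('\n'); // transformation step
            }
        }
        return gzip(csv.toString());                            // compression step
    }
}
```

With JDBC the client would pull every row and do this work itself; here the filtering and compression happen next to the data, which is the "avoids unnecessary data movement" point above.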