Introduction to OGSA-DAI
Neil Chue Hong
15th February 2006, GGF16, Athens
Data Services: challenges
Scale: many sites, large collections, many uses
Longevity: research requirements outlive technical decisions
Diversity: no “one size fits all” solution will work
  Primary data, data products, metadata, administrative data, …
Many data resources: independently owned and managed, geographically distributed
And I haven’t even mentioned security yet!
Use Cases for Data Services
Data Filtering: single source producing large amounts of data, distributed to many sites downstream
Data Discovery: many sources, many query entry points in a linked system
Data Translation: source to sink, conversion of data model / structure
Data Federation: many sources, linked to provide a view as a single source
Data Replication: full or partial copies to improve throughput
Data Integration (model aggregation): e.g. integration of time-variant data, streams, files
Data Integration (knowledge expansion): forming links between databases to increase knowledge
Requirements on Data Services?
Common data model, e.g. RowSet
Common query language(s), e.g. XQuery, SQL
Standard access to:
  data resource schema information
  physical data resource information, for optimisation purposes
  data resource descriptive information, for discovery / integration
Single, seamless security model
Dynamic publication and discovery
Multiple, efficient delivery methods
Move computation towards data
Data aggregation functionality
Replication information
OGSA-DAI In One Slide
An engineered, extensible framework for data access and integration.
Expose heterogeneous data resources to a grid through web services.
Interact with data resources:
  Queries and updates.
  Data transformation / compression.
  Data delivery.
Customise for your project using:
  Additional Activities
  Client Toolkit APIs
  Data Resource handlers
A base for higher-level services:
  federation, mining, visualisation, …
OGSA-DAI Philosophy
We provide the basic, general functionality
  e.g. querying relational databases, delivery mechanisms, schema extractors
You add the specialist functionality
  e.g. map overlays
Several well-defined extension points:
  client toolkit
  activity plugins
  data resource accessor model
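[Figure: OGSA-DAI service architecture. An Application uses the Client Toolkit to talk to an OGSA-DAI service. Inside the service, the Engine runs Activities (SQLQuery, XPath, readFile, XSLT, GZip, GridFTP) against Data Resources (JDBC, XMLDB, File) for the underlying databases (MySQL, DB2, SQLServer, XIndice, SWISSPROT).]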
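The engine-and-activities model can be illustrated with a minimal sketch, assuming a pipeline of query, transformation and compression steps. All function names here (`sql_query`, `to_csv`, `gzip_compress`, `run_pipeline`) are hypothetical stand-ins written for this illustration; they are not the OGSA-DAI API, and an in-memory SQLite database stands in for a real data resource.

```python
import csv
import gzip
import io
import sqlite3

# Hypothetical stand-ins for activities; NOT the OGSA-DAI API.

def sql_query(connection, statement):
    """Query activity: run a statement, return (column names, rows)."""
    cursor = connection.execute(statement)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()

def to_csv(result):
    """Transformation activity: render a query result as CSV text."""
    columns, rows = result
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()

def gzip_compress(text):
    """Compression activity: gzip the text as UTF-8 bytes."""
    return gzip.compress(text.encode("utf-8"))

def run_pipeline(source, activities):
    """Engine: feed the output of each activity into the next one."""
    data = source
    for activity in activities:
        data = activity(data)
    return data

def demo():
    # An in-memory SQLite database plays the role of the data resource.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE protein (id INTEGER, name TEXT)")
    connection.executemany(
        "INSERT INTO protein VALUES (?, ?)", [(1, "P53"), (2, "BRCA1")]
    )
    query = lambda conn: sql_query(
        conn, "SELECT id, name FROM protein ORDER BY id"
    )
    return run_pipeline(connection, [query, to_csv, gzip_compress])
```

Adding specialist functionality then amounts to writing one more function and placing it in the list passed to `run_pipeline`, which is the spirit of the activity extension point.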
[Figure: MultipleSQL GDS. A single SQLQuery sent to an OGSA-DAI service (MultipleSQL GDS) is passed by the Engine as SQL to several JDBC data resources, including MySQL.]
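The fan-out idea behind a multiple-resource SQL query can be sketched as follows. This is an illustration under stated assumptions, not the MultipleSQL implementation: `multiple_sql_query` and `make_resource` are hypothetical names, the table and data are invented for the example, and in-memory SQLite databases stand in for JDBC resources.

```python
import sqlite3

def make_resource(rows):
    """Create an in-memory database standing in for one data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (site TEXT, reading INTEGER)")
    conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return conn

def multiple_sql_query(resources, statement, parameters=()):
    """Run the same statement on every resource and merge the rows.

    Each merged row is prefixed with the name of the resource it came from.
    """
    merged = []
    for name, conn in resources.items():
        for row in conn.execute(statement, parameters):
            merged.append((name,) + tuple(row))
    return merged

resources = {
    "siteA": make_resource([("A", 10), ("A", 25)]),
    "siteB": make_resource([("B", 30)]),
}
results = multiple_sql_query(
    resources,
    "SELECT reading FROM sample WHERE reading > ? ORDER BY reading",
    (20,),
)
```

The client issues one query and receives one merged result, without needing to know how many resources sit behind the service.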
Distributed Query Processing
Higher-level services building on OGSA-DAI
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
  Uses exchange operators
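[Figure: example partitioned query plan. Operators: table_scan(protein) and table_scan(proteinTerm) with predicate termID=S92, each followed by reduce; hash_join(proteinId); op_call(Blast); reduce. exchange operators separate the partitions, labelled 1, 2 and 3,4.]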
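Partitioned evaluation with exchange operators can be sketched as below. This is a single-process illustration under stated assumptions, not the DQP implementation: the operator functions reuse names from the query plan (`table_scan`, `reduce`, `exchange`, `hash_join`) but their bodies, the sample rows, and the use of threads in place of remote evaluators are all invented for the example. The `op_call(Blast)` step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

def table_scan(table, predicate=lambda row: True):
    """Scan a table, keeping rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]

def reduce_columns(rows, columns):
    """Projection ('reduce' in the plan): keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

def exchange(rows, key, partitions):
    """Route each row to a partition chosen by hashing its join key."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[hash(row[key]) % partitions].append(row)
    return buckets

def hash_join(left, right, key):
    """Build a hash index on the left input, then probe it with the right."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]

def partitioned_join(left, right, key, partitions=2):
    """Join matching partitions in parallel, then gather the results."""
    left_parts = exchange(left, key, partitions)
    right_parts = exchange(right, key, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        joined = pool.map(
            lambda pair: hash_join(pair[0], pair[1], key),
            zip(left_parts, right_parts),
        )
        return [row for part in joined for row in part]

# Invented sample data for illustration only.
protein = [
    {"proteinId": 1, "sequence": "MKV"},
    {"proteinId": 2, "sequence": "GAT"},
    {"proteinId": 3, "sequence": "TTC"},
]
protein_term = [
    {"proteinId": 1, "termID": "S92"},
    {"proteinId": 2, "termID": "S10"},
    {"proteinId": 3, "termID": "S92"},
]

left = reduce_columns(table_scan(protein), ["proteinId", "sequence"])
right = reduce_columns(
    table_scan(protein_term, lambda r: r["termID"] == "S92"), ["proteinId"]
)
result = sorted(
    partitioned_join(left, right, "proteinId"), key=lambda r: r["proteinId"]
)
```

Because both inputs are partitioned with the same hash function on the join key, rows that can match always land in the same partition, so each partition can be joined independently.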
DQP architecture
[Figure: DQP architecture. A Co-ordinator drives several Evaluators, which work with OGSA-DAI services. Query languages: SQL & OQL. Annotations: OGSA-DAI activity; WS-I only; using client toolkit; all interfaces that are supported by the toolkit.]
Map Retrieval: Integration
Using security and extensibility (overlay)
[Figure: map retrieval integration. Labels: Portlet, OGC, ODS 1 (Oracle, Census), ODS 2 (GIS, Oracle), ODS 3 (Application data), SO-OGC, JDBC, SQL/XML, NGS Authentication.]
Integrated service for Data & Metadata
[Figure: integrated data and metadata service. Components: Client, Metadata & Data Service, Naming Service, Storage Manager (BD messages), Metadata Manager (MD messages), and multiple Data Resources.]
MDS/GridFTP/GSI Integration
Can publish any OGSA-DAI resource property to a local MDS Index Service
  e.g. databaseSchema, activityTypes
  information is published on a per-resource basis, and can differ for each resource
Can transfer results via GridFTP rather than via SOAP
  still working on tuning options
Can use X.509 certificates to secure services
  but security is still coarse-grained by default
Future plans: overview
A new version of the OGSA-DAI Engine
  better support for concurrency, sessions, monitoring and notification
Implementing new DAIS specifications
Key things that we will be addressing:
  Performance (particularly format representation and transport)
  Security model which can be applied across platforms
  Transactions provision
  More data integration facilities
Integration with other components
  registries (e.g. GRIMOIRES)
  workflow editors (e.g. Taverna)
Working with new projects
  e.g. CancerGrid, iSpider, GEODE
Future plans: Performance
WebRowSet is not efficient
  aim to use ResultSet and CSV instead where possible
SOAP is not efficient
  aim to use SOAP w/Attachments, MTOM
[Figure: ResultSet to RowSet conversion. WebRowSet is larger; CSV scales better for output; conversion and validation take the time. Work in progress, Jan06.]
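The size difference between an XML row format and CSV can be illustrated with a rough sketch. The XML encoding below is a simplified stand-in loosely modelled on WebRowSet's per-row, per-column elements; it omits the properties and metadata sections of the real format, so real WebRowSet output would be larger still. The function names and sample rows are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Encode rows with one element per row and one element per value.

    Simplified stand-in for an XML row set; not the full WebRowSet schema.
    """
    root = ET.Element("data")
    for row in rows:
        current = ET.SubElement(root, "currentRow")
        for value in row:
            ET.SubElement(current, "columnValue").text = str(value)
    return ET.tostring(root, encoding="unicode")

def rows_to_csv(columns, rows):
    """Encode the same rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample data for illustration only.
columns = ["id", "name"]
rows = [(i, f"protein{i}") for i in range(1000)]

xml_size = len(rows_to_xml(rows))
csv_size = len(rows_to_csv(columns, rows))
print(f"XML: {xml_size} characters, CSV: {csv_size} characters")
```

The per-value markup is repeated on every row, so the XML overhead grows with the number of rows and columns, while CSV pays only one delimiter per value.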
From contribution to core
One of a group of projects moving to the GlobDev project (more later)
Hope to use this as a way of encouraging collaborations and contributions
Different levels of contribution:
  Based on OGSA-DAI?
  Works with OGSA-DAI?
  Part of OGSA-DAI?
Contributing to OGSA-DAI
Additional functionality:
  Provide activities which implement specific functionality
  Provide extra client functionality
  Provide different security mechanisms
  Provide higher-level components and applications
Further information
The OGSA-DAI Project Site: http://www.ogsadai.org.uk
The DAIS-WG site: http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list: [email protected]
Formal support for OGSA-DAI releases http://bugzilla.globus.org (OGSA-DAI)
OGSA-DAI training courses (live and online)