Data access and integration with OGSA-DAI:
OGSA-DQP
Steven Lynden
University of Manchester
Data access & integration with OGSA-DAI: GGF 17
Introduction
OGSA-DQP is a service-based distributed query processor.
It evaluates queries over distributed data sources wrapped by OGSA-DAI.
It is built using OGSA-DAI extensibility points.
People involved:
• University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
• University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson
OGSA-DQP prototype release 3.0 is available from the OGSA-DAI website:
• Install on OGSA-DAI WSRF/WS-I 2.1
OGSA-DQP high-level overview
OGSA-DQP uses a middleware approach.
It can be seen as a mediator over OGSA-DAI wrappers.
Usability: it is used like any other OGSA-DAI data service.
DQP plans and schedules distributed queries and executes them in parallel.
Calls to analysis (Web) services can be declared within queries and invoked by DQP.
[Figure: a query is sent to OGSA-DQP, which returns the results; OGSA-DQP mediates over two OGSA-DAI wrappers, each exposing the data held in a DBMS.]
Using OGSA-DQP
All interactions are client-server based:
• First, configure OGSA-DQP by specifying the data sources and analysis services to be used (administration).
• DQP creates a global schema, which can then be used to formulate queries.
• The user may then submit queries.
Infrastructural requirements:
- OGSA-DAI-wrapped relational databases
- Analysis services (optional)
- Evaluation infrastructure
OGSA-DQP architecture
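[Figure: the client sends a perform document to an OGSA-DAI data service with the DQP activities installed. This is the "OGSA-DQP service", the Grid Distributed Query Service (GDQS), also known as the "Coordinator". It distributes work to several evaluators (QE), also known as Grid Query Evaluation Services (GQES).]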
OGSA-DQP architecture
DQP evaluator services:
• Are plain Web services
• Implement the QueryEvaluation port type:
  • evaluate: the input is a query plan partition, which the evaluator then executes
  • receiveData: allows the evaluator to receive data from other evaluators
OGSA-DAI extensions:
• DQP resource: a resource that encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services, etc.). Implemented as a data resource accessor.
• OQL query statement activity: enables the submission of a query in Object Query Language (OQL)
• DQP factory activity: enables the creation and configuration of DQP resources
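The QueryEvaluation port type can be pictured as a Java interface. The sketch below is a hypothetical in-memory model, not OGSA-DQP code: the real port type is defined in WSDL and exchanges serialized plan partitions and tuples, which are reduced here to a string filter and lists of strings. The names `QueryEvaluation`, `PlanPartition` and `ToyEvaluator`, and the sample rows, are illustrative only. It shows the division of labour between the two operations: `evaluate` installs a partition, and `receiveData` pushes batches from upstream evaluators through it until the producer signals the end of the stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical Java view of the QueryEvaluation port type (the real one is WSDL). */
interface QueryEvaluation {
    /** Accept a plan partition for execution. */
    void evaluate(PlanPartition partition);

    /** Accept a batch of rows sent by another evaluator for one of our partitions. */
    void receiveData(String partitionId, List<String> rows, boolean endOfStream);
}

/** Hypothetical, heavily simplified plan partition: one filter feeding one sink. */
final class PlanPartition {
    final String id;
    final Predicate<String> filter; // stands in for the partition's operators
    final List<String> sink;        // stands in for the partition's consumer

    PlanPartition(String id, Predicate<String> filter, List<String> sink) {
        this.id = id;
        this.filter = filter;
        this.sink = sink;
    }
}

/** Toy evaluator: applies each installed partition to batches as they arrive. */
final class ToyEvaluator implements QueryEvaluation {
    private final Map<String, PlanPartition> partitions = new HashMap<>();
    private final Set<String> finished = new HashSet<>();

    @Override
    public void evaluate(PlanPartition partition) {
        partitions.put(partition.id, partition);
    }

    @Override
    public void receiveData(String partitionId, List<String> rows, boolean endOfStream) {
        PlanPartition p = partitions.get(partitionId);
        if (p == null) {
            throw new IllegalStateException("unknown partition: " + partitionId);
        }
        // Pipelined: each incoming row is filtered and forwarded immediately.
        for (String row : rows) {
            if (p.filter.test(row)) {
                p.sink.add(row);
            }
        }
        if (endOfStream) {
            finished.add(partitionId);
        }
    }

    boolean isFinished(String partitionId) {
        return finished.contains(partitionId);
    }
}

class EvaluatorDemo {
    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        ToyEvaluator qe = new ToyEvaluator();
        qe.evaluate(new PlanPartition("p1", row -> row.startsWith("YAL"), results));
        qe.receiveData("p1", List.of("YAL001C", "YBR002W"), false);
        qe.receiveData("p1", List.of("YAL003W"), true);
        System.out.println(results + " finished=" + qe.isFinished("p1"));
        // prints "[YAL001C, YAL003W] finished=true"
    }
}
```

Forwarding rows batch by batch suits a streaming operator like this filter; a blocking operator, such as the build side of a hash join, would instead have to wait for `endOfStream`.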
Example query
Given two DBMSs and one analysis tool (a Web service):
• goTerm: a Gene Ontology (GO) table in a remote MySQL database, exposed by an OGSA-DAI data service
• protein: a table in a protein sequence database, exposed by an OGSA-DAI data service
• Blast: a sequence alignment scoring Web service
We want to obtain alignment scores for the sequences of proteins of a certain kind (here, proteins annotated with GO term GO:0005942).
The user submits a single query referencing data stored at multiple sites. The author of the query need not be aware of how or where the data is stored. Queries are written in Object Query Language (OQL):
    select p.proteinId, Blast(p.sequence)
    from protein p, goTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
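To make the example concrete, the following single-process sketch computes what a distributed plan for this query computes: select the goTerm rows with the requested termId, join them with protein on proteinId using a hash join, and call Blast once per joined row. The record types, the sample rows and the length-based `fakeBlast` stand-in are all hypothetical. In OGSA-DQP the scans would run at the OGSA-DAI data services, the join and the Web service call would run on evaluators, and Blast would be a remote call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Single-process sketch of what the example query computes (hypothetical data). */
final class ExampleQuery {
    record Protein(String proteinId, String sequence) {}
    record GoTerm(String proteinId, String termId) {}
    record Result(String proteinId, int score) {}

    static List<Result> run(List<Protein> proteins, List<GoTerm> goTerms,
                            String termId, ToIntFunction<String> blast) {
        // Build phase: hash the goTerm rows satisfying t.termId = termId, keyed on proteinId.
        Map<String, List<GoTerm>> buildTable = new HashMap<>();
        for (GoTerm t : goTerms) {
            if (t.termId().equals(termId)) {
                buildTable.computeIfAbsent(t.proteinId(), k -> new ArrayList<>()).add(t);
            }
        }
        // Probe phase: join on p.proteinId = t.proteinId, calling Blast for every output row.
        List<Result> results = new ArrayList<>();
        for (Protein p : proteins) {
            for (GoTerm ignored : buildTable.getOrDefault(p.proteinId(), List.of())) {
                results.add(new Result(p.proteinId(), blast.applyAsInt(p.sequence())));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Protein> proteins = List.of(
            new Protein("P1", "MKV"), new Protein("P2", "MKVLA"), new Protein("P3", "MA"));
        List<GoTerm> goTerms = List.of(
            new GoTerm("P1", "GO:0005942"), new GoTerm("P2", "GO:0000001"),
            new GoTerm("P3", "GO:0005942"));
        // Stand-in for the Blast Web service: "scores" a sequence by its length.
        ToIntFunction<String> fakeBlast = String::length;
        System.out.println(run(proteins, goTerms, "GO:0005942", fakeBlast));
    }
}
```

Keeping a list per key, rather than a set, preserves SQL's bag semantics: a protein that appears twice with the same term yields two output rows.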
Background: OQL
Why?
• OGSA-DQP is based on a parallel distributed query processor for object databases (Polar*)
• The standard query language of object databases is OQL
Polar* is still used by DQP to parse, optimise and schedule queries
Instead of querying object databases, we are now querying relational databases
OQL queries are compiled by Polar* into distributed query plans.
During the execution of the query plan, DQP will query relational data sources using SQL.
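Because the sources are relational, each scan in a compiled plan ends up as SQL sent to an OGSA-DAI data service. The sketch below illustrates projection and selection push-down for the two scans of the example query; the `Scan` record and the exact SQL text are assumptions, since the slides do not show the SQL that DQP generates. The join and the Blast call cannot be evaluated at either source, so they remain in the plan for the evaluators.

```java
import java.util.List;

/**
 * Hypothetical description of one scan: which relational table to read, which
 * columns the rest of the plan needs, and any selection that can run at the source.
 */
record Scan(String table, List<String> columns, String predicate) {
    /** Renders the scan as the SQL statement a data service would be asked to run. */
    String toSql() {
        String sql = "SELECT " + String.join(", ", columns) + " FROM " + table;
        return predicate == null ? sql : sql + " WHERE " + predicate;
    }
}

class ScanDemo {
    public static void main(String[] args) {
        // Only the needed columns are fetched, and the termId selection is
        // evaluated by the goTerm database itself rather than by an evaluator.
        Scan proteins = new Scan("protein", List.of("proteinId", "sequence"), null);
        Scan goTerms = new Scan("goTerm", List.of("proteinId"), "termId = 'GO:0005942'");
        System.out.println(proteins.toSql()); // SELECT proteinId, sequence FROM protein
        System.out.println(goTerms.toSql());  // SELECT proteinId FROM goTerm WHERE termId = 'GO:0005942'
    }
}
```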
Client interaction with OGSA-DQP
Two main client/server interactions:
1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource
2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. It differs from typical OGSA-DAI data service resources, such as the relational data service resource.
DQP configuration
The client sends a perform document to the OGSA-DAI data service:
    <perform><DQPFactory>Evaluator URLs, OGSA-DAI data service resources, Web service URLs</DQPFactory></perform>
The DQP factory activity issues GetRP requests to the listed OGSA-DAI data services and creates a DQP data service resource (DSR) containing:
• The global schema of the imported DBs and analysis services
• The set of evaluators that can be used
• Physical DB metadata (used to optimise queries)
Result: the resource ID of the created DSR
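The factory step amounts to building a resource from the supplied configuration and registering it under a new ID that later requests use. This in-memory sketch models that idea only; `DqpResource`, `DqpFactory`, the `dqp-` ID prefix and the use of table cardinalities as the "physical DB metadata" are all hypothetical, and the real activity deploys an OGSA-DAI data service resource rather than filling a map.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory view of what a DQP data service resource holds. */
record DqpResource(
        Map<String, List<String>> globalSchema,  // extent name -> column names
        List<String> evaluatorUrls,              // evaluators the plans may use
        Map<String, Long> tableCardinalities) {} // stand-in for physical DB metadata

/** Hypothetical factory: registers each new resource under a fresh ID. */
final class DqpFactory {
    private final Map<String, DqpResource> registry = new ConcurrentHashMap<>();

    /** Registers a resource and returns its ID (the "Result" of configuration). */
    String create(DqpResource resource) {
        String id = "dqp-" + UUID.randomUUID(); // ID format is an assumption
        registry.put(id, resource);
        return id;
    }

    /** Later requests, such as queries, name the resource by its ID. */
    DqpResource lookup(String resourceId) {
        DqpResource resource = registry.get(resourceId);
        if (resource == null) {
            throw new IllegalArgumentException("no such resource: " + resourceId);
        }
        return resource;
    }

    public static void main(String[] args) {
        DqpFactory factory = new DqpFactory();
        DqpResource giga = new DqpResource(
                Map.of("goterms_goterms", List.of("id", "type", "name")),
                List.of("http://example.org/evaluator1"), // placeholder URL
                Map.of("goterms_goterms", 1000L));         // made-up cardinality
        String id = factory.create(giga);
        System.out.println(id + " -> " + factory.lookup(id).evaluatorUrls());
    }
}
```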
DQP query evaluation
The client sends a perform document to the OGSA-DAI data service:
    <perform><OQLQueryStatement><expression>OQL query</expression></OQLQueryStatement></perform>
The OQLQueryStatement activity uses the DQP DSR to plan the query and passes plan partitions to the evaluators (QE).
The evaluators exchange data with one another (transport), send perform documents to OGSA-DAI data services, and call analysis services.
Result: a WebRowSet XML stream
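When a client builds the query perform document by hand, the OQL text must be XML-escaped, because a comparison such as `<` would otherwise break the document. This sketch produces the simplified document on this slide; real OGSA-DAI perform documents also carry a namespace and activity name and output attributes that the slide omits, so treat the element layout as illustrative. `PerformDocs` is a hypothetical helper.

```java
/** Hypothetical helper that builds the simplified OQLQueryStatement perform document. */
final class PerformDocs {
    /** Escapes the characters that may not appear literally in XML element content. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps an OQL query in the perform/OQLQueryStatement/expression elements. */
    static String oqlQueryStatement(String oql) {
        return "<perform><OQLQueryStatement><expression>"
            + escapeXml(oql)
            + "</expression></OQLQueryStatement></perform>";
    }

    public static void main(String[] args) {
        // The '<' in the query becomes "&lt;" in the generated document.
        System.out.println(oqlQueryStatement(
            "select i.ORF1 from i in interaction_protein_interactions where i.repeats < 3;"));
    }
}
```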
Interacting with an OGSA-DQP service
Three options:
• A command-line client
  • Allows configuration and query submission via the execution of Apache Ant scripts
• Client toolkit classes
  • Allow you to integrate OGSA-DQP into your own applications
[The above utilities are part of the main OGSA-DQP download]
• A GUI client
Command-line client
Configuration example:
$ ant factory \
    -Ddqp.config.file=config.xml \
    -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
    -Dresource.id=dqp-factory
Querying the global schema (example):
$ ant getschemas \
    -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
    -Dresource.id=ogsadai-911acvd122
Command-line client
Query submission example:
$ ant query \
    -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
    -Dresource.id=ogsadai-911acvd122 \
    -Dclient.query="%print select i.id from i in go_goterms;" \
    -Dclient.output.file=results.xml
Results will be saved as a WebRowSet, the standard XML representation of relational results used by OGSA-DAI
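A WebRowSet file such as results.xml can be read with the JDK's standard DOM parser. The sketch below extracts the column names and the values of each `currentRow`, treating a `<null/>` child as SQL NULL (the convention used by the JDK's reference WebRowSet writer). `WebRowSetRows` is a hypothetical helper and the sample document is made up; `javax.sql.rowset.WebRowSet.readXml(Reader)` is the standard, more complete alternative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical minimal reader for the column names and current rows of a WebRowSet. */
final class WebRowSetRows {
    record Table(List<String> columns, List<List<String>> rows) {}

    static Table read(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // WebRowSet elements live in the JDBC namespace
        Document doc = factory.newDocumentBuilder().parse(source);

        List<String> columns = new ArrayList<>();
        NodeList names = doc.getElementsByTagNameNS("*", "column-name");
        for (int i = 0; i < names.getLength(); i++) {
            columns.add(names.item(i).getTextContent().trim());
        }

        List<List<String>> rows = new ArrayList<>();
        NodeList current = doc.getElementsByTagNameNS("*", "currentRow");
        for (int r = 0; r < current.getLength(); r++) {
            NodeList values = ((Element) current.item(r)).getElementsByTagNameNS("*", "columnValue");
            List<String> row = new ArrayList<>();
            for (int c = 0; c < values.getLength(); c++) {
                Element value = (Element) values.item(c);
                // SQL NULL is written as an empty <null/> child element.
                boolean isNull = value.getElementsByTagNameNS("*", "null").getLength() > 0;
                row.add(isNull ? null : value.getTextContent().trim());
            }
            rows.add(row);
        }
        return new Table(columns, rows);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<webRowSet xmlns=\"http://java.sun.com/xml/ns/jdbc\">"
            + "<metadata><column-count>1</column-count><column-definition>"
            + "<column-index>1</column-index><column-name>id</column-name>"
            + "</column-definition></metadata>"
            + "<data><currentRow><columnValue>GO:0000001</columnValue></currentRow>"
            + "<currentRow><columnValue><null/></columnValue></currentRow></data></webRowSet>";
        Table t = read(new InputSource(new StringReader(xml)));
        System.out.println(t.columns() + " " + t.rows()); // [id] [[GO:0000001], [null]]
    }
}
```

To read the file written by the command-line client, pass `new InputSource(java.nio.file.Files.newBufferedReader(java.nio.file.Path.of("results.xml")))`.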
Client toolkit classes
Client toolkit classes are provided for the activities contributed by OGSA-DQP:
• GDQSFactory class, used to construct DQPFactory activities
• OQLQuery class, used to construct OQLQueryStatement activities
The client toolkit allows DQP to be integrated with other applications and to interact seamlessly with the OGSA-DAI client toolkit.
The OGSA-DQP client toolkit is Java-only.
Query execution using client toolkit
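    // url, resourceID and query are Strings supplied by the caller
    GenericServiceFetcher fetcher = GenericServiceFetcher.getInstance();
    DataService service = fetcher.getDataService(url, resourceID);
    // Activity that evaluates the OQL query against the DQP data service resource
    OQLQuery oqlQuery = new OQLQuery(query);
    // Activity that takes the query's output as its input
    OutputStreamActivity outputStream = new OutputStreamActivity();
    outputStream.setInput(oqlQuery.getOutput());
    // Both connected activities are added to the same request
    ActivityRequest request = new ActivityRequest();
    request.add(oqlQuery);
    request.add(outputStream);
    service.perform(request);
    java.sql.ResultSet rs = outputStream.getResultSet();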
Demo: The GUI Client
The GUI allows you to:
• Interact with OGSA-DQP services. The GUI is pre-configured with the URL of an OGSA-DQP service we have deployed at EPCC.
• View the configuration parameters of DQP data service resources
• View the global schema maintained by a DQP data service resource
• Submit OQL queries to DQP data service resources
• View the results of queries
• View graphical and XML representations of query plans
Services @ Newcastle University
• giga01.ncl.ac.uk: evaluator service; OGSA-DAI data service (GO Term DB)
• giga02.ncl.ac.uk: evaluator service; OGSA-DAI data service (Protein interaction DB)
• giga03.ncl.ac.uk: evaluator service; OGSA-DAI data service (Protein Term DB)
• giga04.ncl.ac.uk: evaluator service; OGSA-DAI data service (Protein property DB)
• giga05.ncl.ac.uk: evaluator service; OGSA-DAI data service (Protein Sequence DB)
• giga06.ncl.ac.uk: evaluator service
• giga07.ncl.ac.uk: evaluator service
• giga08.ncl.ac.uk: evaluator service
• giga09.ncl.ac.uk: evaluator service
• Entropy analyser service
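This deployment appears to pair every data service with an evaluator on the same host and to leave giga06 to giga09 as evaluator-only hosts. One simple way a scheduler could exploit such a layout is to run each scan next to its data and rotate the remaining operators over the spare evaluators. The sketch below implements that round-robin policy as a hypothetical illustration only; the slides do not describe DQP's scheduler, and `Placement` is an invented name. The host and data source names come from the listing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical placement policy for the Newcastle deployment; not DQP's actual scheduler. */
final class Placement {
    private final Map<String, String> dataSourceHost; // data source -> host with co-located evaluator
    private final List<String> spareEvaluators;       // hosts running an evaluator only
    private int next = 0;

    Placement(Map<String, String> dataSourceHost, List<String> spareEvaluators) {
        this.dataSourceHost = dataSourceHost;
        this.spareEvaluators = spareEvaluators;
    }

    /** Scans go to the evaluator on the same host as the data service they read. */
    String placeScan(String dataSource) {
        String host = dataSourceHost.get(dataSource);
        if (host == null) {
            throw new IllegalArgumentException("unknown data source: " + dataSource);
        }
        return host;
    }

    /** Other operators (e.g. joins, analysis service calls) rotate over the spare evaluators. */
    String placeOther() {
        String host = spareEvaluators.get(next);
        next = (next + 1) % spareEvaluators.size();
        return host;
    }

    /** The hosts listed for Newcastle University. */
    static Placement newcastle() {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("GO Term DB", "giga01.ncl.ac.uk");
        sources.put("Protein interaction DB", "giga02.ncl.ac.uk");
        sources.put("Protein Term DB", "giga03.ncl.ac.uk");
        sources.put("Protein property DB", "giga04.ncl.ac.uk");
        sources.put("Protein Sequence DB", "giga05.ncl.ac.uk");
        return new Placement(sources, List.of(
            "giga06.ncl.ac.uk", "giga07.ncl.ac.uk", "giga08.ncl.ac.uk", "giga09.ncl.ac.uk"));
    }

    public static void main(String[] args) {
        Placement p = newcastle();
        System.out.println("scan GO Term DB -> " + p.placeScan("GO Term DB"));
        System.out.println("join -> " + p.placeOther());
        System.out.println("analysis call -> " + p.placeOther());
    }
}
```

Co-locating scans avoids shipping raw tables across the network before any filtering, while the round-robin step is merely the simplest way to spread the remaining work; a real scheduler would also weigh costs.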
Database tables
GO terms (extent name: "goterms_goterms")
    name              length   sqlTypeName
    id                32       varchar
    type              55       varchar
    name              255      varchar
Protein interactions (extent name: "interaction_protein_interactions")
    name              length   sqlTypeName
    ORF1              50       varchar
    ORF2              50       varchar
    baitProtein       50       varchar
    interactionType   5        varchar
    repeats           11       int
    experimenter      100      varchar
Protein terms (extent name: "protein_term_protein_goterm")
    name              length   sqlTypeName
    ORF               55       varchar
    GOTermIdentifier  32       varchar
Protein properties (extent name: "protein_property_protein_propertys")
    name              length   sqlTypeName
    ORF               55       varchar
    molecularWeight   12       float
    hydrophobicity    12       float
Protein sequences (extent name: "protein_sequence_protein_sequences")
    name              length   sqlTypeName
    ORF               50       varchar
    sequence          65535    text
DQP service @ EPCC
• An OGSA-DAI data service at test.ogsadai.org.uk exposes the DQP factory and the GIGA resource (resource ID ogsadai-1092f60c1e1)
• The GIGA resource encapsulates the distributed query environment deployed at Newcastle
Conclusion
OGSA-DQP is a service-based distributed query processor that is:
• Exposed as a service
• Implemented as an orchestration of services
It provides an example of how the OGSA-DAI extensibility points can be used:
• The activity extensibility points are used
• New data resource accessors are implemented
• Dynamic resource deployment is used during configuration to create new resources
Benefits:
• OGSA-DAI manages activity concurrency, so we did not need to write concurrent code
• OGSA-DQP can take advantage of the many delivery options provided by OGSA-DAI
• OGSA-DAI insulates OGSA-DQP from the differences between the platforms it supports (WS-I, WSRF)