Data Access & Integration in the ISPIDER Proteomics Grid
L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton
Overview
• The ISPIDER project
• Data Access & Integration of Proteomics Resources
  • Challenges
  • Middleware
  • Proteomics resources & global schema
  • System architecture & query processing
• Future Work
ISPIDER
• Project Goals:
• Build an integrated platform of proteomic resources
• Use existing resources – produce new ones
• Create clients for querying, visualisation, etc.
ISPIDER
• Objective: develop an integrated platform of proteome-related resources, using existing standards
• Benefits:
  • Access to increased breadth of information
  • More reliable analyses
  • Integration brings added value
Challenges
• Proteomics repositories in disparate locations
  → need for a distributed solution: common access, distributed query processing
  → need for integration: overlapping data, different representations
• Data and schemas are constantly updated and evolve
  → need for virtual or hybrid integration
  → need for schema evolution support
Middleware (1/2)
• OGSA-DAI: middleware exposing data sources on Grids via web services
  • open-source and extensible
  • uniform access to relational & XML data sources
  • supports a variety of operations, e.g. querying/updating, data transformation, data delivery
• OGSA-DQP: service-based distributed query processor
  • supports querying of relational OGSA-DAI data sources
  • offers implicit parallelism for data-intensive requests
Middleware (2/2)
• AutoMed: heterogeneous data transformation and integration system
  • subsumes traditional data integration approaches
  • handles various data models – easily extensible
  • virtual/materialised/hybrid integration
  • schema evolution
  • data warehousing tools
Data Integration Approaches
• Global-As-View (GAV) approach: describe global schema (GS) constructs with view definitions over local schema (LSi) constructs
• Local-As-View (LAV) approach: describe LSi constructs with view definitions over GS constructs
[Figure: a Global Schema connected by view definitions to three Local Schemas, over RDF, XML file and RDB sources]
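As a minimal illustration of the GAV idea above, the following Python sketch defines one global-schema construct as a view (here, a plain function) over two local sources. The source contents, field names and the `global_peptides` view are hypothetical and are not taken from the ISPIDER schemas.

```python
# Hypothetical local sources, each with its own representation of peptide hits.
LS1 = [{"pep_seq": "AAK", "score": 41.0}]
LS2 = [{"sequence": "GGR", "expect": 0.01}]

def global_peptides(ls1, ls2):
    """GAV: the global construct 'Peptide' is defined as a view over the local schemas."""
    rows = [{"sequence": r["pep_seq"], "source": "LS1"} for r in ls1]
    rows += [{"sequence": r["sequence"], "source": "LS2"} for r in ls2]
    return rows

def query_sequences(view_rows):
    """A query over the global schema is answered by unfolding the view definition."""
    return sorted(r["sequence"] for r in view_rows)

result = query_sequences(global_peptides(LS1, LS2))
print(result)
```

Under LAV the direction is reversed: each local construct would be described as a view over the global schema, so adding a source does not require rewriting the global view.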
Both-As-View (BAV) Approach
• Schema transformation approach
• For each pair (LSi, GS): incrementally modify LSi/GS to match GS/LSi
[Figure: a Global Schema connected by transformation pathways to three Local Schemas, over RDF, XML file and RDB sources]
BAV Example
• Transformation pathway consists of primitive transformations
• Pathway contains both GAV & LAV definitions
• Transformations are automatically reversible
• Metadata in AutoMed Repository
Pathway S1 → Sg (intermediate schemas I1 to I6):
  S1 → I1: add(C1,q1)
  I1 → I2: add(C2,q2)
  I2 → I3: add(C3,q3)
  I3 → I4: add(C4,q4)
  I4 → I5: rename(C5,C6)
  I5 → I6: delete(C7,q5)
  I6 → Sg: delete(C9,q6)
Reverse pathway Sg → S1:
  Sg → I6: add(C9,q6)
  I6 → I5: add(C7,q5)
  I5 → I4: rename(C6,C5)
  I4 → I3: delete(C4,q4)
  I3 → I2: delete(C3,q3)
  I2 → I1: delete(C2,q2)
  I1 → S1: delete(C1,q1)
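The reversibility shown in this example can be sketched in a few lines of Python. This is an illustrative model only, not AutoMed's API: a pathway is assumed to be a list of (operation, arguments) pairs, and only the three primitive transformations that appear on the slide (add, delete, rename) are handled.

```python
from typing import List, Tuple

# A primitive transformation: (operation name, arguments), named as on the slide.
Step = Tuple[str, Tuple[str, ...]]

def reverse_step(step: Step) -> Step:
    """Return the inverse of a single primitive transformation."""
    op, args = step
    if op == "add":
        return ("delete", args)   # add(C, q) is undone by delete(C, q)
    if op == "delete":
        return ("add", args)      # delete(C, q) is undone by add(C, q)
    if op == "rename":
        old, new = args
        return ("rename", (new, old))
    raise ValueError(f"unknown primitive transformation: {op}")

def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Invert each step and apply the steps in the opposite order."""
    return [reverse_step(s) for s in reversed(pathway)]

forward = [
    ("add", ("C1", "q1")), ("add", ("C2", "q2")),
    ("add", ("C3", "q3")), ("add", ("C4", "q4")),
    ("rename", ("C5", "C6")),
    ("delete", ("C7", "q5")), ("delete", ("C9", "q6")),
]
backward = reverse_pathway(forward)
for op, args in backward:
    print(f"{op}({','.join(args)})")
```

Because each add or delete step carries its query (q1, q2, ...), the reverse step can reuse the same query, which is why no extra information is needed to derive the pathway from Sg back to S1.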
Proteomics Resources
• PEDRo
  • collection of descriptions of experimental data sets in proteomics
  • has been used as a format for exchanging proteomics data
• gpmDB
  • contains a large number of proteins and peptide identifications
  • initially designed to assist in the validation of peptide MS/MS spectra and protein coverage patterns
• PepSeeker
  • developed as part of the ISPIDER project
  • comprehensive resource of peptide/protein identifications
• PRIDE
  • centralised, standards compliant, public proteomics repository
  • contains protein/peptide identifications + evidence supporting them
Global Schema
• Trade-off between:
  • being able to answer specific user queries
  • a full integration
• Properties:
  • Based on PEDRo’s peptide/protein identification section and …
  • expanded with information unique to other resources
  • Entities identified by LSIDs
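Since global schema entities are identified by LSIDs (Life Science Identifiers), a small parser may help make the identifier structure concrete. The sketch below assumes the standard LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the example identifier is hypothetical and does not come from an ISPIDER resource.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text: str) -> LSID:
    """Split an LSID into its components, rejecting malformed identifiers."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])  # authority, namespace and object must be non-empty
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])

# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:protein:P12345")
print(example)
```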
System Architecture
• Sources wrapped with OGSA-DAI
• AutoMed toolkit wraps OGSA-DAI resources
• Integration of OGSA-DAI resources
• Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP
[Figure: system architecture. The PEDRo, gpmDB and PepSeeker sources are each exposed through an OGSA-DAI GDS and wrapped by an AutoMed DAI wrapper as an AutoMed schema; transformation pathways, held in the AutoMed Metadata Repository, link these schemas to the global AutoMed schema. The AutoMed Query Processor sends IQL queries to the AutoMed DQP wrapper, which passes OQL queries to the OGSA-DQP distributed query processor (QDQS and QES services) and returns OQL results as IQL results.]
Query Processing
• Query is submitted to AutoMed’s GQP:
  • Reformulated
  • Optimised
• AutoMed-DQP Wrapper:
  • IQL → OQL
  • OGSA-DQP evaluates OQL queries
• OQL result → IQL result
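The flow of a query through these stages can be sketched as a chain of functions. Everything in this block is a placeholder: the dict-based query representation, the stage functions and the source name are hypothetical, and none of them reproduce real IQL, OQL, AutoMed or OGSA-DQP interfaces. The sketch only shows the order of the steps.

```python
from typing import Dict, List

# Placeholder query: a language tag plus the construct being queried.
Query = Dict[str, str]

def reformulate(q: Query, view_defs: Dict[str, str]) -> Query:
    """Rewrite a global-schema construct into a (hypothetical) source-level name."""
    return {**q, "construct": view_defs[q["construct"]]}

def optimise(q: Query) -> Query:
    """Stub: a real optimiser would rewrite the query; here it is the identity."""
    return q

def iql_to_oql(q: Query) -> Query:
    """Stand-in for the wrapper's IQL-to-OQL translation."""
    return {**q, "lang": "OQL"}

def evaluate_with_dqp(q: Query, sources: Dict[str, List[str]]) -> List[dict]:
    """Stand-in for distributed evaluation: look the rows up in an in-memory dict."""
    if q["lang"] != "OQL":
        raise ValueError("the distributed query processor expects an OQL query")
    return [{"lang": "OQL", "row": r} for r in sources[q["construct"]]]

def oql_result_to_iql(rows: List[dict]) -> List[dict]:
    """Stand-in for translating the OQL result back into an IQL result."""
    return [{**r, "lang": "IQL"} for r in rows]

def run(q: Query, view_defs: Dict[str, str], sources: Dict[str, List[str]]) -> List[dict]:
    q = optimise(reformulate(q, view_defs))
    return oql_result_to_iql(evaluate_with_dqp(iql_to_oql(q), sources))

view_defs = {"Peptide": "source1.peptidehit"}       # hypothetical mapping
sources = {"source1.peptidehit": ["AAK", "GGR"]}    # hypothetical data
answer = run({"lang": "IQL", "construct": "Peptide"}, view_defs, sources)
print(answer)
```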
Summary
• Proteomics repositories in disparate locations
  → need for a distributed solution
  → need for integration
• Data and schemas are constantly updated and evolve
  → need for virtual or hybrid integration
  → need to support schema evolution
Future Work
• Schema evolution
• Evaluation of AutoMed advantage
• Expose AutoMed functionality to the Grid
• AutoMed and Taverna integration
Future Work
• Taverna: tool for Web Service orchestration in workflows
  • Related services may be incompatible
  • Current solution involves writing custom code for every pair of WS
• Use AutoMed toolkit for semi-automatic integration of XML Web Services
  • mappings from WS to ontologies
  • automatic integration
[Figure: two-step integration of a WS producer format and a WS consumer format via RDFS. Step 1 (manual), Step 2 (automatic).]
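One way to read the two steps is sketched below, under the assumption that Step 1 produces a manual mapping from each service's fields to shared ontology terms and Step 2 derives the producer-to-consumer conversion from those mappings. The field names and `onto:` terms are hypothetical, and plain dictionaries stand in for the XML formats and the RDFS ontology.

```python
from typing import Dict

def derive_mapping(producer_to_onto: Dict[str, str],
                   consumer_to_onto: Dict[str, str]) -> Dict[str, str]:
    """Step 2 (automatic): join the two manual mappings on the shared ontology term."""
    term_to_consumer = {term: field for field, term in consumer_to_onto.items()}
    return {
        p_field: term_to_consumer[term]
        for p_field, term in producer_to_onto.items()
        if term in term_to_consumer
    }

def convert(message: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the producer's fields to the names the consumer expects."""
    return {mapping[k]: v for k, v in message.items() if k in mapping}

# Step 1 (manual): one mapping per service, written once (hypothetical names).
producer_to_onto = {"acc": "onto:Accession", "seq": "onto:Sequence"}
consumer_to_onto = {"accession_id": "onto:Accession", "sequence": "onto:Sequence"}

mapping = derive_mapping(producer_to_onto, consumer_to_onto)
converted = convert({"acc": "P12345", "seq": "AAK"}, mapping)
print(converted)
```

With n services, this needs n manual mappings to the ontology instead of custom code for every pair of services.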
ISPIDER Project Members
• Birkbeck College
  • Nigel Martin
  • Alex Poulovassilis
  • Lucas Zamboulis (R.A.)
  • Hao Fan (former R.A.)
• European Bioinformatics Institute
  • Rolf Apweiler
  • Henning Hermjakob
  • Weimin Zhu
  • Chris Taylor
  • Phil Jones
  • Nisha Vinod
• University of Manchester
  • Simon Hubbard
  • Steve Oliver
  • Suzanne Embury
  • Norman Paton
  • Carol Goble
  • Robert Stevens
  • Khalid Belhajjame (R.A.)
  • Jennifer Siepen (R.A.)
• U.C.L.
  • David Jones
  • Christine Orengo
  • Melissa Pentony (R.A.)