Date post: 15-Dec-2015
Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton
Transcript
Page 1:

Data Access & Integration in the ISPIDER Proteomics Grid

L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton

Page 2:

Overview

• The ISPIDER project

• Data Access & Integration of Proteomics Resources
  • Challenges
  • Middleware
  • Proteomics resources & global schema
  • System architecture & query processing

• Future Work

Page 3:

ISPIDER

• Project Goals:

• Build an integrated platform of proteomic resources

• Use existing resources – produce new ones

• Create clients for querying, visualisation, etc.

Page 4:

ISPIDER

• Objective: develop an integrated platform of proteome-related resources, using existing standards

• Benefits:
  • Access to increased breadth of information
  • More reliable analyses
  • Integration brings added value

Page 5:

Challenges

• Proteomics repositories are in disparate locations
  → need for a distributed solution:
    • common access, distributed query processing
  → need for integration:
    • overlapping data, different representations

• Data and schemas are constantly updated and evolve
  → need for virtual or hybrid integration
  → need for schema evolution support

Page 6:

Middleware (1/2)

• OGSA-DAI: middleware exposing data sources on Grids via web services
  • open-source and extensible
  • uniform access to relational & XML data sources
  • supports a variety of operations, e.g. querying/updating, data transformation, data delivery

• OGSA-DQP: service-based distributed query processor
  • supports querying of relational OGSA-DAI data sources
  • offers implicit parallelism for data-intensive requests

Page 7:

Middleware (2/2)

• AutoMed: heterogeneous data transformation and integration system
  • subsumes traditional data integration approaches
  • handles various data models and is easily extensible
  • virtual/materialised/hybrid integration
  • schema evolution
  • data warehousing tools

Page 8:

Data Integration Approaches

• Global-As-View (GAV) approach: describe GS (global schema) constructs with view definitions over LSi (local schema) constructs

• Local-As-View (LAV) approach: describe LSi constructs with view definitions over GS constructs

[Figure: a Global Schema linked by view definitions to three Local Schemas, over RDB, XML file and RDF sources]
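The difference between GAV and LAV can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the two toy sources `LS1` and `LS2`, their rows, and the shape of the global construct (accession, peptide, source) are hypothetical and are not taken from the ISPIDER schemas.

```python
# Toy local sources: lists of (accession, peptide) pairs.
# Hypothetical data, for illustration only.
LS1 = [("P12345", "AAK")]
LS2 = [("P12345", "LLR"), ("Q99999", "GGK")]


def gav_global(ls1, ls2):
    """GAV: the global construct is defined as a view over the local constructs.

    Here the view is a tagged union of the two sources.
    """
    return [(acc, pep, "LS1") for acc, pep in ls1] + [
        (acc, pep, "LS2") for acc, pep in ls2
    ]


def lav_local(global_rows, source):
    """LAV: a local construct is described as a view over the global construct.

    Here the view selects the global rows that originate from one source.
    """
    return [(acc, pep) for acc, pep, src in global_rows if src == source]


GS = gav_global(LS1, LS2)

# The two descriptions agree: applying the LAV view to the GAV-defined
# global construct gives back each local source.
assert lav_local(GS, "LS1") == LS1
assert lav_local(GS, "LS2") == LS2
```

In this toy case the two directions are consistent with each other, which is the property a pathway-based approach makes explicit by recording both directions together.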

Page 9:

Both-As-View (BAV) Approach

• Schema transformation approach

• For each pair (LSi, GS): incrementally modify LSi/GS to match GS/LSi

[Figure: a Global Schema linked by transformation pathways to three Local Schemas, over RDB, XML file and RDF sources]

Page 10:

BAV Example

• Transformation pathway consists of primitive transformations

• Pathway contains both GAV & LAV definitions

• Transformations are automatically reversible

• Metadata in AutoMed Repository

Pathway S1 → Sg, through intermediate schemas I1 to I6:

S1 → add(C1,q1) → I1 → add(C2,q2) → I2 → add(C3,q3) → I3 → add(C4,q4) → I4 → rename(C5,C6) → I5 → delete(C7,q5) → I6 → delete(C9,q6) → Sg

Reverse pathway Sg → S1:

Sg → add(C9,q6) → I6 → add(C7,q5) → I5 → rename(C6,C5) → I4 → delete(C4,q4) → I3 → delete(C3,q3) → I2 → delete(C2,q2) → I1 → delete(C1,q1) → S1
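The automatic reversal shown on this slide can be sketched in a few lines of Python. This is a minimal illustration and not AutoMed's API: the `Step` type and the function names are hypothetical, and only the three primitive transformations that appear in the example (add, delete, rename) are handled.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One primitive transformation in a pathway (hypothetical representation)."""
    op: str                # "add", "delete" or "rename"
    args: Tuple[str, str]  # (construct, query) for add/delete; (old, new) for rename


def reverse_step(step: Step) -> Step:
    """Return the inverse of a single primitive transformation."""
    if step.op == "add":
        return Step("delete", step.args)
    if step.op == "delete":
        return Step("add", step.args)
    if step.op == "rename":
        old, new = step.args
        return Step("rename", (new, old))
    raise ValueError(f"unsupported transformation: {step.op}")


def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Invert every step and apply the steps in the opposite order."""
    return [reverse_step(step) for step in reversed(pathway)]


# The S1 -> Sg pathway from the slide.
forward = [
    Step("add", ("C1", "q1")),
    Step("add", ("C2", "q2")),
    Step("add", ("C3", "q3")),
    Step("add", ("C4", "q4")),
    Step("rename", ("C5", "C6")),
    Step("delete", ("C7", "q5")),
    Step("delete", ("C9", "q6")),
]

backward = reverse_pathway(forward)
for step in backward:
    print(step.op, step.args)
```

Reversing the reversed pathway gives back the original, so a single stored pathway is enough to move between the two schemas in either direction.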

Page 11:

Proteomics Resources

• PEDRo
  • collection of descriptions of experimental data sets in proteomics
  • has been used as a format for exchanging proteomics data

• gpmDB
  • contains a large number of proteins and peptide identifications
  • initially designed to assist in the validation of peptide MS/MS spectra and protein coverage patterns

• PepSeeker
  • developed as part of the ISPIDER project
  • comprehensive resource of peptide/protein identifications

• PRIDE
  • centralised, standards compliant, public proteomics repository
  • contains protein/peptide identifications + evidence supporting them

Page 12:

Global Schema

• Trade-off between:
  • being able to answer specific user queries
  • a full integration

• Properties:
  • Based on PEDRo’s peptide/protein identification section and …
  • expanded with information unique to other resources
  • Entities identified by LSIDs
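An LSID (Life Science Identifier) is a URN of the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The sketch below shows how such an identifier could be split into its parts. The `LSID` type, the `parse_lsid` function and the example identifier are hypothetical and are not taken from the ISPIDER global schema.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn:lsid' prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Made-up identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:protein:P12345")
print(example.authority, example.namespace, example.object_id)
```

Because the authority and namespace are part of the identifier, entities coming from different resources can share one global schema without their local keys colliding.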

Page 13:

System Architecture

• Sources wrapped with OGSA-DAI

• AutoMed toolkit wraps OGSA-DAI resources

• Integration of OGSA-DAI resources

• Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

[Figure: system architecture. The data sources PEDRo, gpmDB and PepSeeker each sit behind an OGSA-DAI GDS and an AutoMed DAI wrapper with its own AutoMed schema. Transformation pathways link these schemas to the global AutoMed schema, with metadata held in the AutoMed Metadata Repository. The AutoMed Query Processor exchanges IQL queries and results with the AutoMed DQP wrapper, which exchanges OQL queries and results with the OGSA-DQP distributed query processor (labelled QDQS) and its OGSA-DQP QES instances.]

Page 18:

Query Processing

• Query is submitted to AutoMed’s GQP:
  • Reformulated
  • Optimised

• AutoMed-DQP Wrapper:
  • IQL → OQL
  • OGSA-DQP evaluates OQL queries
  • OQL result → IQL result
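The order of the query processing steps can be sketched as a simple pipeline. This is only an illustration of the control flow: each stage is a stand-in that records its name and wraps its input, and none of the function names or signatures come from AutoMed or OGSA-DQP.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def make_stage(name: str, trace: List[str]) -> Stage:
    """Build a stand-in stage that records its name and wraps its input."""
    def stage(value: str) -> str:
        trace.append(name)
        return f"{name}({value})"
    return stage


def process_query(iql_query: str, stages: List[Stage]) -> str:
    """Feed the query through each stage in order and return the final value."""
    value = iql_query
    for stage in stages:
        value = stage(value)
    return value


# Hypothetical stage names that follow the order given on the slide.
STAGE_NAMES = [
    "reformulate",               # AutoMed query processor
    "optimise",                  # AutoMed query processor
    "iql_to_oql",                # AutoMed-DQP wrapper
    "evaluate_with_dqp",         # OGSA-DQP
    "oql_result_to_iql_result",  # AutoMed-DQP wrapper
]

trace: List[str] = []
stages = [make_stage(name, trace) for name in STAGE_NAMES]
result = process_query("Q", stages)
print(result)
```

Keeping the two translation steps in the wrapper means the query processor only ever deals with IQL, and the distributed processor only ever deals with OQL.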

Page 20:

Summary

• Proteomics repositories are in disparate locations
  → need for a distributed solution
  → need for integration

• Data and schemas are constantly updated and evolve
  → need for virtual or hybrid integration
  → need for schema evolution support

Page 21:

Future Work

• Schema evolution

• Evaluation of AutoMed advantage

• Expose AutoMed functionality to the Grid

• AutoMed and Taverna integration

Page 22:

Future Work

• Taverna: tool for Web Service orchestration in workflows
  • Related services may be incompatible
  • Current solution involves writing custom code for every pair of WS

• Use AutoMed toolkit for semi-automatic integration of XML Web Services
  • mappings from WS to ontologies
  • automatic integration

[Figure: two panels, "Step 1 (manual)" and "Step 2 (automatic)", each showing a WS producer format and a WS consumer format, with two RDFS ontologies]
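The idea of replacing pairwise custom code with mappings to an ontology can be sketched as follows. This is a simplified illustration under stated assumptions, not the AutoMed mechanism: field names, ontology terms (`ex:accession`, `ex:sequence`) and both functions are hypothetical, and the formats are reduced to flat dictionaries.

```python
from typing import Dict

# Step 1 (manual): map each service's fields to terms of a shared ontology.
# All field names and ontology terms are made up for illustration.
producer_to_ontology = {"acc": "ex:accession", "seq": "ex:sequence"}
consumer_to_ontology = {"protein_id": "ex:accession", "sequence": "ex:sequence"}


def derive_mapping(
    producer_map: Dict[str, str], consumer_map: Dict[str, str]
) -> Dict[str, str]:
    """Step 2 (automatic): compose the two mappings through the ontology."""
    ontology_to_consumer = {term: field for field, term in consumer_map.items()}
    return {
        field: ontology_to_consumer[term]
        for field, term in producer_map.items()
        if term in ontology_to_consumer
    }


def convert(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the fields of a producer record into the consumer's format."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


mapping = derive_mapping(producer_to_ontology, consumer_to_ontology)
converted = convert({"acc": "P12345", "seq": "AAK"}, mapping)
print(converted)
```

With n services, this needs n manual mappings to the ontology instead of custom code for every pair of services.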

Page 23:

ISPIDER Project Members

• Birkbeck College
  • Nigel Martin
  • Alex Poulovassilis
  • Lucas Zamboulis (R.A.)
  • Hao Fan (former R.A.)

• European Bioinformatics Institute
  • Rolf Apweiler
  • Henning Hermjakob
  • Weimin Zhu
  • Chris Taylor
  • Phil Jones
  • Nisha Vinod

• University of Manchester
  • Simon Hubbard
  • Steve Oliver
  • Suzanne Embury
  • Norman Paton
  • Carol Goble
  • Robert Stevens
  • Khalid Belhajjame (R.A.)
  • Jennifer Siepen (R.A.)

• U.C.L.
  • David Jones
  • Christine Orengo
  • Melissa Pentony (R.A.)

