EBI is an Outstation of the European Molecular Biology Laboratory.
Bird's Eye View of ...
Molecular Interaction Standards: PSI-MI XML
PSI-MI Tool support (APIs, Validator)
ChEBI
APO-SYS workshop, 20-21 January 2009, Berlin
A gentle introduction to the PROTEOMICS STANDARDS INITIATIVE
Engineering 1850
• Nuts and bolts fit perfectly together, but only if they originate from the same factory
• Standardisation proposal in 1864 by William Sellers
• It was not generally accepted until after WWII, though …
Proteomics 2003
• Proteomics data are perfectly compatible, but only if they are from the same lab / database / software
• “Publish and vanish” by data producers
• Collecting all publicly available data requires huge effort
• Urgent need for standardisation
PSI-MI XML format
• Community standard for Molecular Interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Version 1.0 published in February 2004: "The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data." Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.
• Version 2.5 published in October 2007: "Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions." Samuel Kerrien et al., BioMed Central, 2007.
PSI-MI XML benefits
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.
Home page: http://www.psidev.info/MI
An overview of the PSI-MI CONTROLLED VOCABULARIES
Ontology Lookup Service
• Makes available OBO controlled vocabularies
• Web site allows searching and browsing their hierarchy
http://www.ebi.ac.uk/ontology-lookup
Ontology Lookup Service
• Each term has a definition as well as a literature reference
http://www.ebi.ac.uk/ontology-lookup
An overview of the PSI-MI XML 2.5 DATA MODEL
PSI-MI 2.5 Standards
Bird's eye view of PSI-MI XML 2.5
• Top level structure unchanged compared to PSI-MI 1.0
• Use of Id/Ref on main objects
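The Id/Ref idea can be sketched in a few lines of Python: main objects are declared once with an id, and interactions point back to them by reference. The XML fragment below is a simplified, namespace-free sample written for this illustration (the labels and ids are made up), and `resolve` and `label` are hypothetical helper names, not part of any PSI tool.

```python
import xml.etree.ElementTree as ET

# Simplified sample in the spirit of PSI-MI XML 2.5 (assumption: no namespace,
# made-up labels and ids). Experiments and interactors are declared once.
SAMPLE = """
<entry>
  <experimentList>
    <experimentDescription id="1"><names><shortLabel>exp-1</shortLabel></names></experimentDescription>
  </experimentList>
  <interactorList>
    <interactor id="2"><names><shortLabel>protein-a</shortLabel></names></interactor>
    <interactor id="3"><names><shortLabel>protein-b</shortLabel></names></interactor>
  </interactorList>
  <interactionList>
    <interaction id="4">
      <experimentList><experimentRef>1</experimentRef></experimentList>
      <participantList>
        <participant id="5"><interactorRef>2</interactorRef></participant>
        <participant id="6"><interactorRef>3</interactorRef></participant>
      </participantList>
    </interaction>
  </interactionList>
</entry>
"""


def label(element):
    """Return the short label of a main object."""
    return element.findtext("names/shortLabel")


def resolve(entry):
    """Replace every experimentRef / interactorRef by the label it points to."""
    experiments = {e.get("id"): e for e in entry.iter("experimentDescription")}
    interactors = {i.get("id"): i for i in entry.iter("interactor")}
    resolved = []
    for interaction in entry.iter("interaction"):
        exp_labels = [label(experiments[r.text]) for r in interaction.iter("experimentRef")]
        int_labels = [label(interactors[r.text]) for r in interaction.iter("interactorRef")]
        resolved.append((interaction.get("id"), exp_labels, int_labels))
    return resolved


RESOLVED = resolve(ET.fromstring(SAMPLE))
print(RESOLVED)
```

Declaring each object once and referring to it by id keeps files compact when the same experiment or interactor takes part in many interactions.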
Main objects - Experiment
Controlled by Ontologies
Literature references
Confidence measures
Main objects - Interactor
Generic interactor
Reference to a public database
Main objects - Interaction
Controlled by Ontology
Copyright
Experiment
Kinetics parameters
Confidence value
Basics – Controlled Vocabularies
• Why?
• Ensure data consistency
• Provide a reliable means for searching & filtering data
• How?
• By providing a reference to an ontology term, using Xref!
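A minimal sketch of why an Xref helps with filtering: experiments are selected by the accession of the controlled vocabulary term rather than by free text. The fragment is a simplified, namespace-free sample written for this illustration, and `experiments_using` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified sample (assumption: no namespace, only the attributes needed here).
SAMPLE = """
<experimentList>
  <experimentDescription id="1">
    <interactionDetectionMethod>
      <names><shortLabel>two hybrid</shortLabel></names>
      <xref><primaryRef db="psi-mi" id="MI:0018"/></xref>
    </interactionDetectionMethod>
  </experimentDescription>
  <experimentDescription id="2">
    <interactionDetectionMethod>
      <names><shortLabel>anti tag coip</shortLabel></names>
      <xref><primaryRef db="psi-mi" id="MI:0007"/></xref>
    </interactionDetectionMethod>
  </experimentDescription>
</experimentList>
"""


def experiments_using(root, accession):
    """Return ids of experiments whose detection method points to the given CV term."""
    hits = []
    for experiment in root.iter("experimentDescription"):
        ref = experiment.find("interactionDetectionMethod/xref/primaryRef")
        if ref is not None and ref.get("id") == accession:
            hits.append(experiment.get("id"))
    return hits


ROOT = ET.fromstring(SAMPLE)
print(experiments_using(ROOT, "MI:0018"))
```

Matching on the accession stays reliable even when data providers spell the human-readable label differently.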
Main objects - Participant
e.g. enzyme target
Interactor
e.g. bait, prey
Delivery method, expression level…
Interactor used experimentally
Building of Complex
An overview of the PSI-MI TAB DATA MODEL
PSIMITAB Standard Columns
Standard columns (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
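The 15 standard columns can be read with very little code. The sketch below assumes the usual MITAB conventions (tab-separated columns, `|` between multiple values, `-` for an empty column); the column keys, the helper name `parse_mitab_line`, and the identifiers in the sample line are made up for illustration.

```python
# Short keys for the 15 standard columns, in file order (names are our own).
COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab_line(line):
    """Split one MITAB line into a dict of column name -> list of values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(COLUMNS):
        raise ValueError("expected at least %d columns, got %d" % (len(COLUMNS), len(fields)))
    record = {}
    for name, raw in zip(COLUMNS, fields):
        # Naive split: assumes '|' never occurs inside a quoted value.
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Made-up sample line; identifiers are placeholders, not real entries.
SAMPLE_LINE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2008)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(intact)',
    "intact:EBI-0000000", "-",
])

RECORD = parse_mitab_line(SAMPLE_LINE)
print(RECORD["id_a"], RECORD["detection_method"])
```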
A quick look into INTACT EXTENDED MITAB
PSIMITAB Extended Columns
Standard columns (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
+
IntAct specific columns (+11):
• Experimental role(s) of interactors
• Biological role(s) of interactors
• Properties (CrossReference) of interactors
• Type(s) of interactors
• HostOrganism(s)
• Expansion method(s)
• Dataset name(s)
A hands-on introduction to the PSI-MI XML 2.5 JAVA API
PSI-MI XML Java API
• Uses Java 5
• Provides binding between XML and a Java object model
• Tools to read/write XML from/to file
• Reading can be done in 2 fashions:
  • Load a whole file into an EntrySet
    • Large files can only be loaded if you have enough memory
    • Easy to update content and write back to file
  • Index XML data and give access through an IndexedEntry
    • Memory efficient with large files
    • Allows browsing through interactions, experiments…
    • Trickier to write updated content (yet feasible)
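The trade-off between the two reading fashions can be illustrated outside Java. The sketch below is not the PSI-MI Java API; it is an analogy using Python's standard `xml.etree.ElementTree`, with a tiny made-up document and hypothetical function names `load_all` and `stream`.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up document standing in for a large PSI-MI XML file.
SAMPLE = b"""<entrySet><entry><interactionList>
<interaction id="1"/><interaction id="2"/><interaction id="3"/>
</interactionList></entry></entrySet>"""


def load_all(source):
    """Whole-file approach: the complete tree is held in memory."""
    root = ET.parse(source).getroot()
    return [interaction.get("id") for interaction in root.iter("interaction")]


def stream(source):
    """Streaming approach: handle one interaction at a time, then discard it."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "interaction":
            yield element.get("id")
            element.clear()  # drop the element's content once it has been handled


print(load_all(io.BytesIO(SAMPLE)))
print(list(stream(io.BytesIO(SAMPLE))))
```

The whole-file version is the easier one to modify and write back; the streaming version keeps memory use low but makes updates more awkward, which mirrors the trade-off listed above.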
A hands-on introduction to the PSI-MI TAB 2.5 JAVA API
PSI-MI TAB Java API
• Uses Java 5
• Provides binding between TAB and a Java object model
• Tools to read/write TAB from/to file
• You can read in 2 fashions:
  • Load a whole file into a Collection<BinaryInteraction>
    • Large files can only be loaded if you have enough memory
  • Load interactions one at a time using an Iterator<BinaryInteraction>
    • Memory efficient with large files
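The iterator style pays off when a summary is all that is needed. As a rough Python analogy (not the Java API), the sketch below counts interactions per detection method while holding only one row at a time; the function names and the sample rows are made up for illustration.

```python
import io
from collections import Counter


def make_row(method):
    """Build a made-up 15-column MITAB row with only the detection method filled in."""
    return "\t".join(["a", "b"] + ["-"] * 4 + [method] + ["-"] * 8)


SAMPLE = "\n".join([make_row("two hybrid"), make_row("two hybrid"), make_row("pull down")]) + "\n"


def iter_rows(handle):
    """Yield one row at a time, like an iterator over binary interactions."""
    for line in handle:
        if line.strip():
            yield line.rstrip("\n").split("\t")


def count_detection_methods(rows):
    """Consume rows lazily; column 7 (index 6) holds the detection method."""
    return Counter(row[6] for row in rows)


COUNTS = count_detection_methods(iter_rows(io.StringIO(SAMPLE)))
print(COUNTS)
```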
Summary
• PSI-MI XML is the de facto standard for molecular interactions
• The Java API makes it easy to handle
• We have code samples & exercises for both APIs! Let me know if you want access to them …
PSI-MI Home page: http://psidev.info/MI
API Download: http://www.psidev.info/index.php?q=node/60#tools
Data: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25
Quick introduction to R packages for PSI-MI
Rintact & RpsiXML
• Initiative from Wolfgang Huber's group at the EBI
• Reads PSI-MI XML data into R data structures
• Enables data analysis using existing packages such as: RBGL, ppiStats, apComplex, …
• Currently supports: IntAct, MINT, HPRD, DIP, BioGRID, MIPS/CORUM, MatrixDB, MPACT.
API Download: http://www.bioconductor.org/packages/2.1/bioc/html/Rintact.html
Documentation: http://www.bioconductor.org/packages/2.3/bioc/vignettes/RpsiXML/inst/doc/RpsiXML.pdf
Quick introduction to the PSI SEMANTIC VALIDATOR
The Framework (in the context of PSI)
The PSI validator framework automatically checks that experimental data reported using a specific XML format and various CVs comply with the overall MIAPE recommendations.
The semantic validator checks:
- the XML syntax
- that the appropriate CV terms are used in specific locations of a document
- miscellaneous consistency rules
Components of the Validator
[Diagram labels: Ontology Manager, Ontology Mapping Rule, Object Rule (each with a Config); Semantic Validator; Data Model; Data File; Messages; OBO; OLS]
The Ontology Manager
Declaration of ontologies or Controlled Vocabularies:
• location,
• format,
• retrieval method (local file or via web services)
Ontology Lookup Service
Currently 61 ontologies available
Web Service for easy access
CV Mapping Rules
An explicit specification of which CV terms may/should/must be used in a given location.
• crucial to bind a data model to a set of CVs
• necessary to enforce MIAPE guidelines
• allows CVs to be developed independently of a schema (necessary to comply with CV guidelines)
• this mapping is specified in an XML file
CV Mapping Rules – example with MzML
Exchange Format
Referenced ontologies and CVs
Resulting mapping file:
<CvMappingRule scopePath="/mzML/sampleList/sample" cvTermsCombinationLogic="OR" requirementLevel="MAY">
  <CvTerm termAccession="GO:0005575" useTerm="false" termName="cellular_component" allowChildren="true" isRepeatable="true" cvIdentifierRef="GO"/>
  <CvTerm termAccession="BTO:0000000" useTerm="false" termName="brenda source tissue ontology" allowChildren="true" isRepeatable="true" cvIdentifierRef="BTO"/>
</CvMappingRule>
<CvMappingRule scopePath="/mzML/instrumentList/instrument/componentList/analyzer" cvTermsCombinationLogic="AND" requirementLevel="MUST">
  <CvTerm termAccession="MS:1000443" useTerm="false" termName="data file checksum type" allowChildren="true" isRepeatable="true" cvIdentifierRef="MS"/>
  <CvTerm termAccession="MS:1000480" useTerm="false" termName="Mass Analyzer" allowChildren="true" isRepeatable="true" cvIdentifierRef="MS"/>
</CvMappingRule>
<CvMappingRule scopePath="/mzML/instrumentList/instrument/componentList/detector" cvTermsCombinationLogic="AND" requirementLevel="MUST">
  <CvTerm termAccession="MS:1000026" useTerm="false" termName="Detector Type" allowChildren="true" isRepeatable="false" cvIdentifierRef="MS"/>
  <CvTerm termAccession="MS:1000027" useTerm="false" termName="detector acquisition mode" allowChildren="false" isRepeatable="true" cvIdentifierRef="MS"/>
</CvMappingRule>
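How such a rule might be evaluated can be sketched in Python. This is not the validator's implementation: the semantics assumed here are that `useTerm` allows the term itself, `allowChildren` allows its descendants, `AND` requires every listed term to be matched and `OR` at least one. The `ANCESTORS` map is a toy stand-in for a real ontology lookup, and `term_matches` and `check_rule` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# First rule from the example above, reproduced as a string.
RULE_XML = """
<CvMappingRule scopePath="/mzML/sampleList/sample"
               cvTermsCombinationLogic="OR" requirementLevel="MAY">
  <CvTerm termAccession="GO:0005575" useTerm="false" termName="cellular_component"
          allowChildren="true" isRepeatable="true" cvIdentifierRef="GO"/>
  <CvTerm termAccession="BTO:0000000" useTerm="false"
          termName="brenda source tissue ontology"
          allowChildren="true" isRepeatable="true" cvIdentifierRef="BTO"/>
</CvMappingRule>
"""

# Toy hierarchy (assumption): term accession -> set of its ancestor accessions.
ANCESTORS = {"GO:0005634": {"GO:0005575"}}  # nucleus is a cellular_component


def term_matches(cv_term, accession, ancestors):
    """True if the accession is allowed by this CvTerm declaration."""
    target = cv_term.get("termAccession")
    if cv_term.get("useTerm") == "true" and accession == target:
        return True
    if cv_term.get("allowChildren") == "true" and target in ancestors.get(accession, set()):
        return True
    return False


def check_rule(rule, accessions, ancestors):
    """Return (satisfied, requirement level) for the terms found at the rule's scope."""
    matched = [
        any(term_matches(cv_term, accession, ancestors) for accession in accessions)
        for cv_term in rule.findall("CvTerm")
    ]
    if rule.get("cvTermsCombinationLogic") == "AND":
        satisfied = all(matched)
    else:  # treated as OR in this sketch
        satisfied = any(matched)
    return satisfied, rule.get("requirementLevel")


RULE = ET.fromstring(RULE_XML)
print(check_rule(RULE, ["GO:0005634"], ANCESTORS))
```

Under these assumptions, a document that uses the root term GO:0005575 itself does not satisfy the rule, because `useTerm` is false and only child terms are accepted.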
CV Mapping Rules – final thoughts
• A data model is not bound to a single mapping
• The PSI MI and MS workgroups each provide a mapping corresponding to their respective minimum reporting guidelines (MIAPE)
• Mapping can be customized by any end user of a standard to be more or less granular
The Object Rules
A list of consistency checks tailored to a specific data type
Examples:
- taxid is an existing entry at NCBI
- PubMed ID is an existing publication
- protein and DNA sequences are defined using the appropriate alphabet
- CV dependency rules
Note: These rules are to be programmed in Java
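The logic of one such check, the sequence alphabet test, is small enough to sketch. In the validator these rules are written in Java; the Python below only illustrates the idea, and the alphabets chosen (20 standard amino acids, four DNA bases, no ambiguity codes) are an assumption of this sketch, as is the helper name `invalid_symbols`.

```python
# Assumed alphabets: standard residues only, no ambiguity codes.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")
DNA_ALPHABET = set("ACGT")


def invalid_symbols(sequence, alphabet):
    """Return the sorted list of symbols in the sequence that are outside the alphabet."""
    return sorted(set(sequence.upper()) - alphabet)


print(invalid_symbols("acgtn", DNA_ALPHABET))
print(invalid_symbols("MKV", PROTEIN_ALPHABET))
```

An empty result means the sequence passes; any returned symbols could be reported back as validator messages.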
Fancy Building Your Own?
We are currently finalizing a tutorial to guide users in writing a validator based on their own data model. It provides:
• Additional explanation of the Validator's modules
• Examples of configuration files
• A working prototype based on a made-up data model
• Source code to get you started quickly.
http://psidev.info/validator
IntAct team
Rolf Apweiler
• Henning Hermjakob
• Sandra Orchard
• Jyoti Khadake
• Luisa Montecchi
• Dave Thorneycroft
• Cathy Derow
• Prem Achuthan
• Bruno Aranda
• Samuel Kerrien
IntAct is funded by the European Commission under FELICS, contract number 021902 (RII3)
PSI participants (direct contributors to the validator)
• Luisa Montecchi-Palazzi
• Florian Reisinger
• Lennart Martens
• Andy Jones
• Mathias Oesterheld
• Bruno Aranda
• Prem Achuthan
• Henning Hermjakob
Other PSI participants
• Juan A Vizcaino
• Chris Taylor
• Eric Deutsch
• Pierre Alain Binz
• Susanna Sansone
• Frank Gibson
• Zsuzsanna Bencsath
• Daniel Schober
• Trish Wetzel
• Pete Souda
Questions?