FGE-OM: Functional Genomics Experiment - Object Model
Andy Jones
Department of Computing Science
University of Glasgow
Overview
• Introduction to proteomics
• Motivation for shared standards
• FGE-OM
• Database implementation - RAPAD
• Current biological projects
Proteomics Workflow
[Figure: Sample origin → protein solubilisation → 2D-PAGE → image analysis (per-gel spot tables: ID, volume, X, Y) → multiple gel analysis → statistical analysis; mass spectrometry (MALDI MS/MS) → database search → protein identification]
Motivation for Shared Standards
• Data from large studies using multiple techniques can be compared more easily
• Proteomics standardisation can learn from past efforts of MGED (Microarray Gene Expression Data Society)
• Shared aspects of microarrays & proteomics:
  – Overview of experiment
  – Sample origin
  – Experimental protocols (similarity between RNA extraction & protein solubilisation)
  – Higher level analysis across multiple samples
  – RNA fluorescence signal, similar to protein volume on a 2-D gel
FGE-OM
[Figure: top level of the Functional Genomics Experiment - Object Model, divided into three namespaces]
• BioOM - components common to all functional genomics experiments (MAGE-OM derived)
• ArrayOM - microarray-specific components (MAGE-OM derived)
• ProteomicsOM - classes modelling proteomics technologies (PEDRo and Gla-PSI derived)
RAPAD
• A database for microarrays and proteomics
• Based on the RAD microarray database at Penn
• Additional tables to store proteomics data
• Interface based on the RAD Study-Annotator
RAD Study-Annotator: Manduchi et al. Bioinformatics 2003 (in press)
Proteomics Standards
• PEDRo - Proteomics Experiment Data Repository
  – Proposal for a standard covering sample origin, protein separation and mass spec
  – Accepted by the Proteomics Standards Initiative (PSI) as a draft standard
  – Published in Nature Biotech 21:247-254 (2003)
http://pedro.man.ac.uk
http://psidev.sourceforge.net/
• Gla-PSI - Glasgow proposal for PSI
  – More detailed coverage of: image analysis; multiple analysis of 2D gels; DIGE (difference gel electrophoresis); statistical analysis
  – Comparative and Functional Genomics 4:492-501 (2003)
Overview of BioOM packages
[Figure: BioOM packages (Experiment, Protocol, BioMaterial, Measurement, BioAssay, BioAssayData, BioEvent, Description, BioSequence, BQS, HigherLevelAnalysis, AuditAndSecurity) and shared base classes (Identifiable, Extendable, Describable)]
• BioAssay: Hybridization class moved into ArrayOM
• BioAssayData: BioDataCube and related classes moved into ArrayOM
• Other packages: unchanged from MAGE-OM
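The shared base classes can be pictured with a short Python sketch. It assumes the MAGE-OM pattern in which Extendable is the root, Describable adds descriptions and Identifiable adds an identifier and a name; the field names are simplified, Python-style stand-ins rather than the exact FGE-OM attributes, and material_type is a hypothetical field.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Extendable:
    """Root class: carries free-form name/value properties."""
    property_sets: Dict[str, str] = field(default_factory=dict)


@dataclass
class Describable(Extendable):
    """Adds human-readable descriptions to an Extendable object."""
    descriptions: List[str] = field(default_factory=list)


@dataclass
class Identifiable(Describable):
    """Adds a unique identifier and a display name."""
    identifier: str = ""
    name: str = ""


@dataclass
class BioMaterial(Identifiable):
    """Example BioOM class reusing the shared base classes.

    material_type is a hypothetical simplification (e.g. "protein").
    """
    material_type: str = ""
```

Every package class then inherits the same identification and annotation fields, which is what lets BioOM, ArrayOM and ProteomicsOM objects be described and audited in a uniform way.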
Overview of ArrayOM packages
[Figure: ArrayOM packages: Array, ArrayDesign, DesignElement, QuantitationType, ArrayBioAssay, ArrayBioAssayData]
• Array, ArrayDesign, DesignElement: describe the layout of an array
• QuantitationType: microarray-specific classes, e.g. Signal
• Standard statistical tests could, however, be incorporated into BioOM in the future
• ArrayBioAssay contains only the Hybridization class
• ArrayBioAssayData contains BioDataCube (data dimensions)
• Not directly applicable to proteomics or other experiments
BioOM:BioAssayData vs ArrayOM:ArrayBioAssayData
[Figure: class diagram split into BioOM and ArrayOM. Classes shown: BioAssayData, MeasuredBioAssayData, DerivedBioAssayData, BioAssayDimension, BioDataValues, BioDataTuples, BioAssayDatum, BioDataCube, BioAssayMap, BioAssayMapping, Transformation, QuantitationType, QuantitationTypeDimension, QuantitationTypeMap, QuantitationTypeMapping, DesignElementDimension, FeatureDimension, ReporterDimension, CompositeSequenceDimension, DesignElementMap, DesignElementMapping. Relationships between classes are the same as MAGE-OM.]
• Only the most generic classes are kept in BioOM
• The data model from MAGE does not fit proteomics
• Matching spots across gels is more complex than matching features across arrays of a common design
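One way to see why BioDataCube was moved out of BioOM is to compare it with tuple-based storage. The sketch below assumes the MAGE-OM design in which BioDataTuples holds one BioAssayDatum per measured value (bioassay, design element, quantitation type, value), while BioDataCube is a dense three-dimensional array. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BioAssayDatum:
    """One measured value, addressed by assay, element and quantitation type."""
    bioassay: str           # e.g. one gel
    design_element: str     # e.g. one spot (or one array feature)
    quantitation_type: str  # e.g. "volume"
    value: float


def tuples_to_cube(
    data: List[BioAssayDatum],
) -> Tuple[List[str], List[str], List[str], List[List[List[Optional[float]]]]]:
    """Pack sparse tuples into a dense assay x element x quantitation-type cube.

    Cells with no measurement are filled with None.
    """
    assays = sorted({d.bioassay for d in data})
    elements = sorted({d.design_element for d in data})
    qtypes = sorted({d.quantitation_type for d in data})
    a_idx = {a: i for i, a in enumerate(assays)}
    e_idx = {e: i for i, e in enumerate(elements)}
    q_idx = {q: i for i, q in enumerate(qtypes)}
    cube: List[List[List[Optional[float]]]] = [
        [[None for _ in qtypes] for _ in elements] for _ in assays
    ]
    for d in data:
        cube[a_idx[d.bioassay]][e_idx[d.design_element]][q_idx[d.quantitation_type]] = d.value
    return assays, elements, qtypes, cube
```

With made-up data where spot2 is detected only on gel2, the gel1 cell for spot2 stays None. Across many gels with partial matches, much of a cube can be empty, whereas the tuple list stores only what was observed.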
Overview of ProteomicsOM packages
[Figure: ProteomicsOM packages: ProteinSeparation, ProteomeBioAssay, ProteinData, MassSpecProtocol, MassSpecData, ProteinRecord]
• Packages derived from PEDRo and Gla-PSI
• Linked to classes in BioOM for adding generic descriptions and protocols
• Different design principles from MAGE-OM
• Classes have attributes that specify many of the datatypes to be captured
ProteomicsOM:ProteinSeparation package
[Figure: separation techniques (Gel2D, Column) are subclasses of BioOM BioAssayTreatment; separation products (PhysicalGelSpot, Fraction) are subclasses of BioOM BioMaterial; a source biomaterial enters through BioMaterialMeasurement. Legend: BioOM, ProteomicsOM]
• Separation techniques: subclasses of BioAssayTreatment
• Separation products: subclasses of BioMaterial
• The product of one separation technique can feed into another via BioMaterialMeasurement
• A generic protocol can be attached to BioAssayTreatment
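A minimal Python sketch of this pattern, assuming plain object references instead of database tables: separation techniques subclass BioAssayTreatment, separation products subclass BioMaterial, and BioMaterialMeasurement carries the output of one separation into the next. The produced_by back-link and the lineage() helper are hypothetical conveniences for tracing a gel spot back to its source extract, not parts of the published model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioMaterial:
    """BioOM: any physical material (extract, fraction, gel spot, ...)."""
    name: str
    # Hypothetical back-link to the treatment that produced this material.
    produced_by: Optional["BioAssayTreatment"] = None


@dataclass
class BioMaterialMeasurement:
    """BioOM: a BioMaterial used as input to a treatment, with an amount."""
    bio_material: BioMaterial
    amount: Optional[str] = None  # free text in this sketch, e.g. "50 ug"


@dataclass
class BioAssayTreatment:
    """BioOM: a treatment of input material, with a generic protocol."""
    name: str
    inputs: List[BioMaterialMeasurement] = field(default_factory=list)
    protocol: Optional[str] = None


# Separation techniques: subclasses of BioAssayTreatment.
class Gel2D(BioAssayTreatment):
    pass


class Column(BioAssayTreatment):
    pass


# Separation products: subclasses of BioMaterial.
class PhysicalGelSpot(BioMaterial):
    pass


class Fraction(BioMaterial):
    pass


def lineage(material: BioMaterial) -> List[str]:
    """Walk back from a product to its source, following the first input."""
    path = [material.name]
    while material.produced_by is not None and material.produced_by.inputs:
        treatment = material.produced_by
        material = treatment.inputs[0].bio_material
        path += [treatment.name, material.name]
    return path


if __name__ == "__main__":
    extract = BioMaterial("protein extract")
    lc = Column("LC", inputs=[BioMaterialMeasurement(extract)])
    frac = Fraction("fraction 3", produced_by=lc)
    gel = Gel2D("2-DE gel", inputs=[BioMaterialMeasurement(frac, "50 ug")],
                protocol="generic 2-DE protocol")
    spot = PhysicalGelSpot("spot 17", produced_by=gel)
    print(lineage(spot))
    # ['spot 17', '2-DE gel', 'fraction 3', 'LC', 'protein extract']
```

Because a fraction is just another BioMaterial, any number of separations can be chained without new link classes.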
ProteomicsOM:ProteomeBioAssay package
[Figure: BioAssayTreatment, PhysicalBioAssay, BioAssay, Image, Channel, ImageAcquisition, GelImageAnalysis, FeatureExtraction, MeasuredBioAssay, MeasuredBioAssayData, BioAssayData; associations labelled "target" and "treatment". Legend: BioOM, ProteomicsOM]
• GelImageAnalysis: analysis of 2-DE gels by specialist software
• Re-uses Image and ImageAcquisition from BioOM
• Linked by PhysicalBioAssay
[Screenshot callouts: Gel2D - 1st and 2nd dimension, stain protocols, operator, MW & pI range; ImageAcquisition, Image, Channel; GelImageAnalysis]
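The difference in design principles can be illustrated with the Gel2D attributes (1st and 2nd dimension, stain protocol, operator, MW and pI range). In this hypothetical sketch the explicit-attribute class can check its own values on entry, whereas a generic name/value container, loosely in the MAGE-OM spirit, can only report afterwards which expected parameters are missing. The field names, the use of kDa for MW and the validation rule are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Gel2D:
    """Explicit-attribute style: the class names the data to be captured.

    Field names and the kDa unit for MW are illustrative assumptions.
    """
    first_dimension: str               # e.g. isoelectric focusing details
    second_dimension: str              # e.g. SDS-PAGE details
    stain_protocol: str
    operator: str
    mw_range_kda: Tuple[float, float]  # (low, high)
    pi_range: Tuple[float, float]      # (low, high)

    def __post_init__(self) -> None:
        # Typed attributes make simple checks possible at data entry.
        for label, (low, high) in (("MW", self.mw_range_kda), ("pI", self.pi_range)):
            if low >= high:
                raise ValueError(f"{label} range must be (low, high), got {(low, high)}")


@dataclass
class GenericTreatment:
    """Generic style for comparison: content lives in name/value pairs."""
    parameters: Dict[str, str] = field(default_factory=dict)

    def missing(self, required: List[str]) -> List[str]:
        """Names of expected parameters that were never filled in."""
        return [name for name in required if name not in self.parameters]
```

The explicit class cannot be created without every Gel2D field, while the generic container accepts any subset and needs an external list of what is required.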
ProteomicsOM:ProteinData package
[Figure: GelImageAnalysis, FeatureExtraction, IdentifiedSpot, PhysicalGelSpot, BioMaterial, DIGESingleSpot, BioDataTuples, BioDataValues, MultipleAnalysis, MatchedSpots, PhysicalBioAssay, BioAssayData, BioAssayDimension, SpotRatio]
• IdentifiedSpot stores spot data, e.g. volume
• Subclass of PhysicalGelSpot and BioMaterial for capturing further treatments
• DIGESingleSpot captures a single channel
• BioAssayDimension captures spots matched across gels
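The spot classes can be sketched in Python as follows, assuming PhysicalGelSpot is itself a BioMaterial so that an IdentifiedSpot inherits from both, and modelling a MatchedSpots group as the same spot found on several gels (the role BioAssayDimension plays in the model). DIGESingleSpot is shown, as an assumption, as an IdentifiedSpot tagged with one dye channel. Names and fields are illustrative, not the FGE-OM definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioMaterial:
    """BioOM: any physical material; further treatments can be recorded."""
    name: str


@dataclass
class PhysicalGelSpot(BioMaterial):
    """A spot on a gel, treated as a BioMaterial."""
    gel: str = ""  # identifier of the gel the spot was detected on


@dataclass
class IdentifiedSpot(PhysicalGelSpot):
    """Spot data from image analysis: volume and position."""
    volume: float = 0.0
    x: float = 0.0
    y: float = 0.0


@dataclass
class DIGESingleSpot(IdentifiedSpot):
    """Assumption: one dye channel of a spot on a DIGE gel."""
    channel: str = ""  # e.g. "Cy3" or "Cy5"


@dataclass
class MatchedSpots:
    """The same spot matched across several gels (hypothetical helper)."""
    match_id: str
    spots: List[IdentifiedSpot] = field(default_factory=list)

    def volume_on(self, gel: str) -> Optional[float]:
        """Volume of the matched spot on one gel, or None if not matched there."""
        for spot in self.spots:
            if spot.gel == gel:
                return spot.volume
        return None
```

A statistical comparison across gels would then iterate over MatchedSpots groups and read one volume per gel, with None marking a gel where the spot was not found.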
ProteinData Package
• Search capabilities over protein name, range of pI, mass or spot volume
• Clicking a spot loads protein data pages
[Screenshot callouts: IdentifiedSpot, Protein]
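The spot search over protein name, pI range, mass and spot volume could be expressed as a single SQL query over a relational store such as RAPAD. The sketch below uses Python's built-in sqlite3 module with a hypothetical, much-simplified two-table schema (protein and identified_spot); it is not the RAPAD schema, and the demo data is made up.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified tables for illustration only (not the RAPAD schema).
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    pi         REAL,
    mw         REAL
);
CREATE TABLE identified_spot (
    spot_id    INTEGER PRIMARY KEY,
    protein_id INTEGER REFERENCES protein (protein_id),
    volume     REAL
);
"""


def search_spots(
    conn: sqlite3.Connection,
    name_like: str = "%",
    pi_range: Tuple[float, float] = (0.0, 14.0),
    mw_range: Tuple[float, float] = (0.0, 1e9),  # MW in Da in this sketch
    min_volume: float = 0.0,
) -> List[tuple]:
    """Return (spot_id, name, pI, MW, volume) rows passing every filter,
    largest spot volume first."""
    sql = """
        SELECT s.spot_id, p.name, p.pi, p.mw, s.volume
        FROM identified_spot AS s
        JOIN protein AS p USING (protein_id)
        WHERE p.name LIKE ?
          AND p.pi BETWEEN ? AND ?
          AND p.mw BETWEEN ? AND ?
          AND s.volume >= ?
        ORDER BY s.volume DESC
    """
    params = (name_like, pi_range[0], pi_range[1],
              mw_range[0], mw_range[1], min_volume)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)",
                     [(1, "protein A", 4.9, 50000.0), (2, "protein B", 6.1, 46000.0)])
    conn.executemany("INSERT INTO identified_spot VALUES (?, ?, ?)",
                     [(10, 1, 300.0), (11, 2, 120.0), (12, 1, 80.0)])
    print(search_spots(conn, pi_range=(4.5, 5.5)))  # spots 10 and 12 (protein A)
```

Parameter placeholders keep user-supplied search terms out of the SQL text; LIKE patterns such as "%kinase%" give the protein-name search.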
MassSpecProtocol and MassSpecData
[Figure: MassSpecExperiment linked to BioOM BioAssayTreatment and BioMaterialMeasurement; PEDRo-derived classes modelling the MS protocol (MassSpecProtocol package) and database searches (MassSpecData package); PeakList and Peak. Legend: BioOM, ProteomicsOM]
• MassSpecExperiment at top level
• BioAssayTreatment links to the source of material and the protocol (via BioEvent)
• Also links to specific classes for MS details, e.g. ion source
• Data stored as a list of peaks
• Classes for capturing database searches from PEDRo
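Because the spectrum is stored as a list of peaks, simple queries reduce to list operations. A minimal sketch with hypothetical Peak and PeakList classes holding m/z and intensity values; the method names are illustrative, not PEDRo attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Peak:
    mz: float         # mass-to-charge ratio
    intensity: float  # detector response (arbitrary units)


@dataclass
class PeakList:
    """Mass spec data kept as a plain list of peaks (hypothetical class)."""
    peaks: List[Peak] = field(default_factory=list)

    def in_range(self, low: float, high: float) -> List[Peak]:
        """Peaks whose m/z lies within [low, high]."""
        return [p for p in self.peaks if low <= p.mz <= high]

    def top(self, n: int) -> List[Peak]:
        """The n most intense peaks, most intense first."""
        return sorted(self.peaks, key=lambda p: p.intensity, reverse=True)[:n]
```

A list keeps the data model simple and independent of any instrument file format; each entry can later be related to database search results.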
ProteinRecord package
[Figure: Protein (species) with ProteinModification (location, modificationType), typed by BioOM OntologyEntry; Protein linked to BioOM DatabaseEntry. Legend: BioOM, ProteomicsOM]
• Proteins identified by MS and database searches
• Class Protein stores a single protein record
• Protein modifications stored using OntologyEntry
• Link to external records stored in DatabaseEntry
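A minimal Python sketch of the ProteinRecord pattern, assuming a modification is typed by an OntologyEntry (category and value) and that a DatabaseEntry stores an accession plus a database URL. Building the link by substituting the accession into a URL template is a hypothetical convention, and any URL or accession used with it here is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class OntologyEntry:
    category: str  # e.g. "ModificationType"
    value: str     # e.g. "phosphorylation"


@dataclass(frozen=True)
class DatabaseEntry:
    accession: str
    # Hypothetical convention: a URL template containing "{accession}".
    database_url: str

    def link(self) -> str:
        return self.database_url.replace("{accession}", self.accession)


@dataclass
class ProteinModification:
    modification_type: OntologyEntry
    location: Optional[int] = None  # residue position, if known


@dataclass
class Protein:
    """A single protein record identified by MS and database searches."""
    name: str
    species: str
    pi: float
    mw: float
    modifications: List[ProteinModification] = field(default_factory=list)
    database_entries: List[DatabaseEntry] = field(default_factory=list)

    def has_modification(self, value: str) -> bool:
        return any(m.modification_type.value == value for m in self.modifications)
```

Keeping modification types in ontology terms rather than free text means records from different studies can be compared on the same vocabulary.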
[Screenshots: protein pages for the MassSpecData and ProteinRecord packages]
• Protein: displays protein name, species, pI and MW
• ProteinModification: data about protein modifications observed
• ProteinHit, DBSearch: measures of quality of the match by MS; link to MASCOT results
• Protein, DatabaseEntry: link to the GeneDB record (parasite genome database); the accession and database URL are stored in the DatabaseEntry table
• Protein, DatabaseEntry: link to the GenBank record (or other database)
Proteomics Workflow
[Figure: the proteomics workflow mapped onto FGE-OM classes: Experiment; BioMaterial (MaterialType: DNA, RNA, Protein, Cell, ...) with Treatment; BioMaterialMeasurement; BioAssayTreatment subclasses Gel2D and LCColumn; PhysicalBioAssay; ImageAcquisition (AcquisitionProtocol); Image; FeatureExtraction; GelImageAnalysis; MeasuredBioAssay; MeasuredBioAssayData; MassSpecExperiment]
• Top level stores experiment description
• Extraction of protein mixture: BioMaterial and Treatment
• 2-DE and liquid chromatography: subclasses of BioAssayTreatment
• BioMaterialMeasurement used to link multiple separations together
• Image scan and image analysis - link to PhysicalBioAssay
• MeasuredBioAssay links to spot data and MS data
Current Project: Trypanosoma brucei
• Trypanosomes cause sleeping sickness and other diseases in Africa and Latin America
• Model organism for parasitology
Aims:
• Genome sequencing, microarrays and proteomics to find all the expressed genes and proteins - GeneDB at Sanger
• Proteomics component in Glasgow
• 2-DE and MS to find approx. 4000 proteins
• Find potential drug targets and improve genome annotation
Work In Progress
• Develop RAPAD prototype, store and query data from a range of experiment types
• Support Trypanosome project - future integration with microarray and genomics
• Tools for generating FGE-ML and XML Schema
• Incorporate the proteomics component into the database system at Penn (GUS, Genomics Unified Schema)
  – Add proteomics support to ToxoDB, PlasmoDB, GeneDB
Contact email: [email protected]
http://www.dcs.gla.ac.uk/~jonesa/FGE/fge.html
Bioinformatics Research Centre - www.brc.dcs.gla.ac.uk
The Functional Genomics Facility at Glasgow is supported by a Wellcome Trust grant. My research is supported by an MRC Bioinformatics PhD studentship.
Acknowledgements
This work is in collaboration with the CBIL at Penn, in particular Chris Stoeckert and Angel Pizarro. Trypanosome data is from studies by Mike Turner and Anne Faldas in IBLS at Glasgow.
PhD supervisors: Ela Hunt and Jonathan Wastling