Page 1: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009 Bioinformatics Cyber-infrastructure for Genomics and Proteomics in Systems Biology.

By Xianfeng (Jeff) Chen

Computational and Systems Biologist

May 7, 2009

Bioinformatics Cyber-infrastructure for Genomics and Proteomics in Systems Biology

Page 2:

Agenda Today

(1) Cyber-infrastructure and systems biology.

(2) High performance computing and software for peptide/protein identification and quantification, and for data mining/target discovery, on proteomics data generated by mass spectrometry.

(3) Relational database management systems, genome annotation methodology, systems biology data integration, and biology knowledge generation and augmentation.

Page 3:

Section One: Cyber-infrastructure and Systems Biology

Reductionist approach: one gene, one protein

Systems approach: multiple genes, network analysis

Cutting edge science and technology

Page 4:

Status of Technologies in Systems Biology

Page 5:

Cyber-infrastructure for Systems Biology

• “... build new types of scientific and engineering knowledge environments and organizations to pursue research in new ways and with increased efficacy.

• ... new NSF funding of $1 billion per year is needed to achieve critical mass ...”

2008: Awarded $50 million

http://www.communitytechnology.org/nsf_ci_report/

2004: Awarded $100 million

2004: Awarded $85 million

Page 6:

Supporting Cyber-infrastructure and Systems Biology Workflow

Historic strong area

Supporting

Page 7:

(DOE - Genomics: GTL Roadmap, p.52)

Cyber-knowledge System to Enable Genomics-based Predictive Medicine

Page 8:

System Integration at Systems Biology Center

Core Laboratory Facility: Data Generation

Core Computational Facility: Data Processing, Storage, and Dissemination

Cyber-infrastructure, Data Management, Data Analysis Pipeline, and Data Display

(1) LIMS for raw data & protocol

(2) Preprocessed data management

(3) High throughput computing

(4) Data validation and integration

(5) Knowledge representation

Data Mining and Knowledge Discovery

Page 9:

Cyber-infrastructure Component (1): High Performance Computing

--- Migration of Bio-Computing Capability

Start point: PC single-CPU computing (most labs)

Step 1: Unix multiple-CPU computing (5-10 biological labs in US)

Step 2: Cluster computing (2-4 biological labs)

For analysis of large data sets

Page 10:

Cyber-infrastructure Component (2): Integrated Knowledgebase System

--- Case Study of National Biodefense Proteomics Data Center

Page 11:

---- System Integration Case 1: UVa Proteomics Data Center

Public File Server

Private File Server

Oracle Relational Database

Database query, data upload over HTTP

Batch Processing

(1) Data uploading;

(2) Data validation;

(3) Data analysis;

(4) Data processing

Perl, Java

Web services: data exchange using XML-based SOAP

High Performance and Throughput Computing

Data Management

Section Two: High Performance Computing and Proteomics

Page 12:

Computational Proteomics Software and Algorithms

Protein Database Search Engines

Mascot (Matrix Science)

Sequest / Bioworks (Scripps/Thermo)

X! Tandem (the GPM)

Spectrum Mill (Agilent Technologies)

OMSSA (NCBI)

PEAKS (Bioinformatics Solutions Inc.)

Phenyx (GeneBio)

Statistical Validation and Quantitation

PeptideProphet (Institute for Systems Biology)

ProteinProphet (Institute for Systems Biology)

ASAPRatio, XPRESS, Libra (Institute for Systems Biology)

Scaffold Batch System (Proteome Software, Inc.)

SIEVE (Thermo)

Census (Scripps Research Institute)

Open Data Standards

FuGE and XAR (FHCRC, ICBC, ITMAT, & Manchester)

MIAPE (HUPO PSI and Collaborators)

mzXML, pepXML, protXML (Institute for Systems Biology)

MS1, MS2, SQT (Scripps Research Institute)

Many more ...

Page 13:

System Integration Case 2: National Biodefense Proteomics Data Center

http://www.proteomicsresource.org

Awarded $14 million

Page 14:

Proteomics Research Centers (PRC) and Their Major Data Types

PRC Organizations: Major Data Types

(1) University of Michigan: Microarray and mass spectrometry

(2) Caprion Pharmaceuticals: Mass spectrometry

(3) Harvard Proteomics Institute: Genomics and protein expression array

(4) Albert Einstein College of Medicine: Mass spectrometry

(5) PNNL: Mass spectrometry

(6) Scripps: NMR structural data, X-ray crystal diffraction data, and mass spectrometry

(7) Myriad Genetics: Yeast two-hybrid system

Page 15:

Proteomics Data Flow

PRCs, VBI, Public

Data Sources

2D gels, protein array, LC, immunoaffinity purification, Y2H, MS, MS/MS, NMR, X-ray cryoelectron microscopy, X-ray diffraction, etc.

Data Types

QA & QC (Quality Assurance & Quality Control)

Converting to Standard Format

Standard Format for Each Data Type

QA & QC (Quality Assurance & Quality Control)

Data Modeling / Decomposition

Relational Database

MIAME and MIAPE-like Standards/SOP for Data Submission

Page 16:

Proteomics Database Architecture

Page 17:

Search By Experiment/Sample

Databases in Proteomics Data Center

Page 18:

Strategies for Annotating Raw Data into Meaningful Knowledge

• Annotation improvement and interaction network analysis

(1) Non-homology-based methods: phylogenetic profiling, Rosetta stone pattern, operon analysis, co-expression profiling, gene neighboring, etc.

(2) Comparative genomics with reference genomes of model organisms: E. coli, yeast, Arabidopsis, etc.

• Identifying anchor points for data integration

(1) Known metabolic pathway;

(2) Known signal transduction pathway;

(3) Known gene regulation machinery;

(4) Known protein-protein interaction map.

BMC Bioinformatics 2006, 7 (Suppl 4):S18
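
Phylogenetic profiling, one of the non-homology-based methods listed on this slide, links genes whose presence/absence patterns across genomes are similar. The following is a minimal Python sketch of that idea, not the method used in the cited work: the Jaccard score, the 0.8 threshold, and the gene names are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def linked_pairs(profiles, threshold=0.8):
    """Return (gene1, gene2, score) for gene pairs with similar profiles.

    profiles maps a gene name to its presence (1) / absence (0) across the
    same ordered list of reference genomes. The threshold is an assumption.
    """
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= threshold:
            pairs.append((g1, g2, score))
    return pairs


# Hypothetical profiles over five reference genomes.
profiles = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (0, 1, 0, 0, 1),
}
candidate_links = linked_pairs(profiles)
```

Here geneA and geneB share an identical profile and are reported as a candidate functional link, while geneC is not linked to either.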

Page 19:

Qualitative Data Integration and Knowledge Augmentation Based on Network Biology

Page 20:

Quantitative Proteome Profiling

--- The field is 2-3 years old

Thermo SIEVE Scatter Plot of 14 UVa Raw Files for Validation of Data Quality and Absolute Quantification.

Scaffold's Proteome Spectral Count Capability for Semi-quantification.
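
Spectral counting compares how many MS/MS spectra are assigned to each protein in different samples. The sketch below shows one simple way to turn such counts into log2 ratios. It is an illustration only and not Scaffold's own procedure: the function name, the pseudocount, and the normalization by total spectra per sample are assumptions.

```python
import math


def spectral_count_log2_ratios(case_counts, control_counts, pseudocount=1.0):
    """Log2 ratio (case / control) of normalized spectral counts per protein.

    Counts are normalized by the total spectra in each sample; a pseudocount
    (an assumption) avoids division by zero for proteins seen in one sample.
    """
    proteins = sorted(set(case_counts) | set(control_counts))
    n = len(proteins)
    case_total = sum(case_counts.values()) + pseudocount * n
    control_total = sum(control_counts.values()) + pseudocount * n
    ratios = {}
    for protein in proteins:
        case_frac = (case_counts.get(protein, 0) + pseudocount) / case_total
        control_frac = (control_counts.get(protein, 0) + pseudocount) / control_total
        ratios[protein] = math.log2(case_frac / control_frac)
    return ratios


# Hypothetical spectral counts for two proteins in two samples.
case = {"proteinA": 30, "proteinB": 10}
control = {"proteinA": 10, "proteinB": 30}
ratios = spectral_count_log2_ratios(case, control)
```

A positive value suggests the protein is more abundant in the case sample; with so few spectra the estimate is semi-quantitative at best, as the slide notes.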

Page 21:

Search Engine Comparison at UVa Proteomics Data Center (1)

Few common annotations

Low annotation rates

Page 22:

Peptide/Protein Identifications with Various Protein Database Search Engines (2)

X!Tandem missed

OMSSA missed

Sequest over-predicted

Page 23:

UVaPDC, MS/MS Search Engine Comparison (3)

Spectra counts

Common annotations

Statistics on confidence values
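
The comparisons on the preceding slides (common annotations, identifications one engine missed, identifications only one engine reported) can be expressed as set operations on each engine's list of identified proteins. A minimal Python sketch follows; the function name and the protein identifiers are hypothetical, and the real comparison may have used score or confidence thresholds that are not shown in the slides.

```python
def compare_engines(hits_by_engine):
    """Summarize overlap between protein identifications from several engines.

    Returns three items:
      common: proteins identified by every engine
      only:   per engine, proteins no other engine identified
      missed: per engine, proteins every other engine identified but it did not
    """
    if len(hits_by_engine) < 2:
        raise ValueError("need at least two search engines to compare")
    hits = {engine: set(ids) for engine, ids in hits_by_engine.items()}
    common = set.intersection(*hits.values())
    only, missed = {}, {}
    for engine, ids in hits.items():
        others = [other_ids for name, other_ids in hits.items() if name != engine]
        only[engine] = ids - set().union(*others)
        missed[engine] = set.intersection(*others) - ids
    return common, only, missed


# Hypothetical identifications from three engines.
example_hits = {
    "Sequest": {"P1", "P2", "P3", "P4"},
    "X!Tandem": {"P1", "P2"},
    "OMSSA": {"P1", "P3"},
}
common, only, missed = compare_engines(example_hits)
```

In this toy input, P1 is the only common identification, P4 is reported by Sequest alone, and X!Tandem and OMSSA each miss one protein that the other two engines agree on.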

Page 24:

Statistics and Summarization Capability of Scaffold

--- The best feature of the software

Page 25:
Page 26:

Data Mining on Data Processed via Computational Approach

Knowledge-based Discovery

Page 27:

Identified

Identified

Rate-limiting step

Knowledge Inference

Knowledge Inference

Inference on Gene Network in Systems Biology

(1) Y2H, (2) MS pull down assay, (3) Co-expression assay.

Where are the significant regulatory steps impacting pathway expression?

Target/lead protein

Page 28:

Urinary Biomarker Identification: EGFR Pathway Related Bladder Cancer

----- Small-scale analysis

[Figure: EGFR pathway diagram. Labels: Raf, MAPK, EDH1, EPS8L1* or EPS8L2*, GDP, GTP, NRas*, EPS15, Mucin-4*, Gα*, Gγ, P, EGFR, Adenylate Cyclase, ATP, cAMP, Cell Proliferation, MP Formation.]

* Differentially expressed

[Figure: analysis workflow. Labels: Patient with Bladder Cancer, Healthy Individual, Urine, Urine Microparticles, Exosomes, Ectosomes, LC-MS/MS, SEQUEST, Spectral Count Analysis, Western Blotting, EPS8L2.]

Page 29:

Pattern Matching on Gene Signatures at Various Biological States

--- Large-scale analysis

*** Query signatures are compared to reference gene/protein expression signatures for known perturbations or disease phenotypes (many-to-many association analysis).
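
The many-to-many comparison described above can be sketched as scoring every query signature against every reference signature and ranking the references per query. The Python sketch below uses cosine similarity over log fold-change values; the choice of cosine similarity, the function names, and all signature and gene names are assumptions for illustration, since the slide does not state which similarity measure was used.

```python
import math


def cosine(sig_a, sig_b):
    """Cosine similarity of two signatures given as {gene: log fold change}.

    Genes missing from one signature are treated as 0 (an assumption).
    """
    genes = set(sig_a) | set(sig_b)
    dot = sum(sig_a.get(g, 0.0) * sig_b.get(g, 0.0) for g in genes)
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_signatures(queries, references):
    """For each query, rank all reference signatures by similarity (best first)."""
    return {
        q_name: sorted(
            ((r_name, cosine(q_sig, r_sig)) for r_name, r_sig in references.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        for q_name, q_sig in queries.items()
    }


# Hypothetical query and reference signatures.
queries = {"sample_1": {"gene1": 2.0, "gene2": 1.0}}
references = {
    "disease_X": {"gene1": 2.0, "gene2": 1.0},
    "perturbation_Y": {"gene1": -2.0, "gene2": -1.0},
}
ranked = match_signatures(queries, references)
```

A score near 1 suggests the query resembles the reference state, and a score near -1 suggests an opposing expression pattern.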

Page 30:

Section Three: Knowledge Base Establishment

Database Case 1: Soybean Upstream Regulatory Elements for Ongoing Regulatory Motif Annotation

Page 31:

115

89

Nominated Transcription Factor Involved in Stress Response

Group IX

Red Dot = Soybean ERF genes

Implicated in regulating wounding and jasmonate responses

Soybean Promoter: GmERFs, Gmubis, Gmcons, GmWRKYs, and more

10 promoters per month

Promoter

Page 32:

Ongoing Effort on Transcription Factor Binding Motifs

---- Identify genetic circuits of cell wall, starch, and lipid biosynthesis and degradation

Page 33:

Elucidation of Conserved Co-expression Networks via Data Integration with Expression Profiling Data
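
One common way to look for conserved co-expression is to build a correlation-based network in each species and keep the edges that are also present between the corresponding orthologs. The Python sketch below illustrates that idea under stated assumptions: the Pearson correlation measure, the 0.9 threshold, the ortholog map, and all gene names are hypothetical and are not taken from the slide.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Gene pairs whose absolute correlation meets the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if abs(pearson(expression[g1], expression[g2])) >= threshold
    ]


def conserved_edges(edges_a, edges_b, orthologs):
    """Edges of species A whose ortholog-mapped pair is also an edge in species B."""
    b_edges = {frozenset(edge) for edge in edges_b}
    return [
        (g1, g2)
        for g1, g2 in edges_a
        if g1 in orthologs and g2 in orthologs
        and frozenset((orthologs[g1], orthologs[g2])) in b_edges
    ]


# Hypothetical expression profiles (four conditions) for two species.
species_a = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [4, 1, 3, 2]}
species_b = {"b1": [1, 3, 5, 7], "b2": [2, 6, 10, 14], "b3": [5, 1, 4, 2]}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}

edges_a = coexpression_edges(species_a)
edges_b = coexpression_edges(species_b)
conserved = conserved_edges(edges_a, edges_b, orthologs)
```

In this toy input only the a1/a2 edge survives, because its ortholog pair b1/b2 is also co-expressed.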

Page 34:

Database Case 2: CGKB and TOBFAC Knowledge Bases

(1) BMC Bioinformatics. 2007, 8:129.

(2) BMC Bioinformatics. 2008, 9:53.

Page 35:

Genome Annotation Strategy (1): Homology-based Annotation

263,425 cowpea gene space sequences (GSS) in total.

High level of coding region detection!

BMC Genomics. 2008, 9:103.

Page 36:

Genome Annotation Strategy (2): Metabolic Pathway Integration

BMC Bioinformatics. 2007, 8:129.

Page 37:

Genome Annotation Strategy (3): GO Integration with Distribution of Function Assignments

BMC Genomics. 2008, 9:103.

Page 38:

Genome Annotation Strategy (4): Comparative Genomics at Genome-scale

---- Example of Medicago vs. cowpea

BMC Genomics. 2008, 9:103.

Page 39:

Genome Annotation Strategy (5): Comparison at Gene Family Level

--- WRKY and CONSTANS (CO) and CO-like Gene Families of Cowpea Transcription Factors

(1) BMC Genomics. 2008, 9:103.

(2) Plant Physiology. 2008, 147:280-295.

Page 40:

Genome Annotation Strategies: (6) Repeat, (7) Domain, (8) Gene Model

BMC Bioinformatics. 2007, 8:129.

Repeat

Domain

Gene Model

Page 41:

Genome Annotation Strategy (9): Comparative Genomics on Network for Conserved Protein Complexes

Comparative genome analysis

Conserved networks

Page 42:

Published Protein-Protein Interactions (PPI) in Organisms

Example of Yeast PPI

Page 43:

Genome Annotation Strategy (10): Functional Validation of Genes of Interest Through Reverse Genetics Program

My name

2008

Page 44:

Acknowledgement
