
STRING - Cross-species integration of known and predicted protein-protein interactions

Date post: 11-May-2015
Upload: lars-juhl-jensen
Description:
University of Southern California, Los Angeles, California, December 15, 2005
Page 1:

STRING - Cross-species integration of known and predicted protein-protein interactions

Lars Juhl Jensen, EMBL Heidelberg

Page 2:

STRING provides a protein network based on integration of diverse types of evidence

Genomic neighborhood

Species co-occurrence

Gene fusions

Database imports

Exp. interaction data

Microarray expression data

Literature co-mentioning

Page 3:

Inferring functional modules from gene presence/absence patterns

[Figure: the “cellulosome”. Labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance. © Trends Microbiol, 1999]

Page 4:

Genomic context methods

© Nature Biotechnology, 2004

Page 5:

Formalizing the phylogenetic profile method

Align all proteins against all

Calculate best-hit profile

Join similar species by PCA

Calculate PC profile distances

Calibrate against KEGG maps
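
The profile comparison at the core of these steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the STRING implementation: it uses binary presence/absence profiles instead of best-hit scores, skips the PCA step entirely, and substitutes a plain Hamming distance for the PC profile distance. The protein names and profiles are made up.

```python
from itertools import combinations

# Hypothetical presence/absence profiles: one entry per species (1 = homolog found).
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 1, 1, 1, 0, 1],
}

def hamming(p, q):
    """Number of species in which exactly one of the two proteins is present."""
    return sum(a != b for a, b in zip(p, q))

def profile_distances(profiles):
    """Distance for every unordered protein pair, smallest distance first."""
    pairs = [
        (hamming(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs)

distances = profile_distances(profiles)
for dist, a, b in distances:
    print(f"{a} - {b}: distance {dist}")
```

Pairs with a small distance (here protA and protB) occur in the same species and are candidates for a functional association; the raw distance would still need to be calibrated against a reference such as KEGG maps.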

Page 6:

Predicting functional and physical interactions from gene fusion/fission events

Calculate all-against-all pairwise alignments

Find in A genes that match the same gene in B

Exclude overlapping alignments

Calibrate against KEGG maps
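
The fusion search can be sketched as follows. This is a minimal illustration, assuming alignments are already available as tuples giving the aligned region on the gene in species B; the tuple layout, gene names and coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical alignments: (gene in species A, gene in species B, start, end on the B gene).
alignments = [
    ("a1", "b1", 1, 200),
    ("a2", "b1", 210, 450),
    ("a3", "b1", 150, 300),
    ("a4", "b2", 1, 100),
]

def overlaps(x, y):
    """True if two (start, end) intervals share at least one position."""
    return x[0] <= y[1] and y[0] <= x[1]

def fusion_candidates(alignments):
    """Pairs of A genes that match non-overlapping parts of the same B gene."""
    hits_by_target = defaultdict(list)
    for gene_a, gene_b, start, end in alignments:
        hits_by_target[gene_b].append((gene_a, (start, end)))
    candidates = []
    for gene_b, hits in hits_by_target.items():
        for (g1, span1), (g2, span2) in combinations(hits, 2):
            if g1 != g2 and not overlaps(span1, span2):
                candidates.append((g1, g2, gene_b))
    return candidates

print(fusion_candidates(alignments))
```

Only a1 and a2 are reported here: both match b1, and their aligned regions do not overlap, which is what one would expect if b1 were a fusion of the two genes.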

Page 7:

Inferring functional associations from evolutionarily conserved operons

Identify runs of adjacent genes with the same direction

Score each gene pair based on intergenic distances

Calibrate against KEGG maps

Infer associations in other species
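
The first two steps can be illustrated with a short Python sketch. It assumes genes are given as tuples sorted by start coordinate, and it reports only the raw gap between adjacent genes; how the gaps are turned into scores is not specified on the slide. Gene names and coordinates are made up.

```python
# Hypothetical genes on one chromosome: (name, start, end, strand), sorted by start.
genes = [
    ("g1", 100, 900, "+"),
    ("g2", 920, 1800, "+"),
    ("g3", 1850, 2500, "+"),
    ("g4", 2700, 3300, "-"),
    ("g5", 3310, 4000, "-"),
]

def same_direction_runs(genes):
    """Split the gene list into runs of adjacent genes on the same strand."""
    runs = []
    for gene in genes:
        if runs and runs[-1][-1][3] == gene[3]:
            runs[-1].append(gene)
        else:
            runs.append([gene])
    return runs

def intergenic_distances(run):
    """Gap in base pairs between each pair of neighbouring genes in a run."""
    return {
        (left[0], right[0]): right[1] - left[2]
        for left, right in zip(run, run[1:])
    }

for run in same_direction_runs(genes):
    print([g[0] for g in run], intergenic_distances(run))
```

Short gaps within a run (such as the 10 bp between g4 and g5) would support the two genes belonging to the same operon.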

Page 8:

Score calibration against a common reference

• Many diverse types of evidence
– The quality of each is judged by very different raw scores
– Quality differences exist among data sets of the same type

• Solved by calibrating all scores against a common reference
– Scores are directly comparable
– Probabilistic scores allow evidence to be combined

• Requirements for the reference
– Must represent a compromise of all the types of evidence
– Broad species coverage
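
A rough Python sketch of both ideas follows. The slide gives no formulas, so everything here is an assumption for illustration: calibration is done by binning raw scores and measuring the fraction of pairs confirmed by the reference, and independent probabilistic scores are combined as 1 minus the product of (1 - p). The function names, the bin count and the toy data are hypothetical.

```python
from math import prod

def calibrate(scored_pairs, same_pathway, n_bins=2):
    """Map raw scores to the fraction of reference-confirmed pairs per score bin.

    scored_pairs: list of (pair, raw_score)
    same_pathway: set of pairs that share a pathway in the reference
                  (a stand-in for KEGG maps)
    Returns a list of (lowest raw score in bin, fraction confirmed).
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1])
    size = max(1, len(ranked) // n_bins)
    table = []
    for i in range(0, len(ranked), size):
        chunk = ranked[i:i + size]
        confirmed = sum(pair in same_pathway for pair, _ in chunk)
        table.append((chunk[0][1], confirmed / len(chunk)))
    return table

def combine(probabilities):
    """Combine scores assumed to be independent: 1 - product of (1 - p)."""
    return 1 - prod(1 - p for p in probabilities)

scored = [(("a", "b"), 0.1), (("a", "c"), 0.2), (("b", "c"), 0.8), (("c", "d"), 0.9)]
reference = {("a", "b"), ("b", "c"), ("c", "d")}
print(calibrate(scored, reference))
print(combine([0.5, 0.5]))
```

Once every evidence type has been mapped onto the same probability scale in this way, scores from different sources can be compared directly and fed into the same combination step.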

Page 9:

Integrating physical interaction screens

Complex pull-down experiments

Yeast two-hybrid data sets are inherently binary

Calculate score from number of (co-)occurrences

Calculate score from non-shared partners

Calibrate against KEGG maps

Infer associations in other species

Combine evidence from experiments
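
The slide names the two raw scores but not their formulas. The sketch below shows one plausible form of each, purely as an assumption: a log-ratio of observed to expected co-occurrence for pull-down data, and a penalty on non-shared partners for two-hybrid data. The pairing of each score with its data type, the function names and the toy data are all illustrative.

```python
from math import log

def pulldown_score(purifications, p, q):
    """Log-ratio of observed to expected co-occurrence of p and q in purifications."""
    n = len(purifications)
    n_p = sum(p in members for members in purifications)
    n_q = sum(q in members for members in purifications)
    n_pq = sum(p in members and q in members for members in purifications)
    if n_pq == 0:
        return float("-inf")  # never purified together
    return log(n_pq * n / (n_p * n_q))

def two_hybrid_score(interactions, p, q):
    """Penalize pairs whose members have many partners they do not share."""
    partners = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    p_only = partners.get(p, set()) - partners.get(q, set()) - {q}
    q_only = partners.get(q, set()) - partners.get(p, set()) - {p}
    return -log((len(p_only) + 1) * (len(q_only) + 1))

purifications = [{"A", "B", "C"}, {"A", "B"}, {"C", "D"}, {"D", "E"}]
y2h = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(pulldown_score(purifications, "A", "B"))
print(two_hybrid_score(y2h, "A", "B"))
```

Both functions return raw scores on arbitrary scales, which is why the next step in the flow is calibration against a common reference.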

Page 10:

Mining microarray expression databases

Re-normalize arrays by modern method to remove biases

Build expression matrix

Combine similar arrays by PCA

Calculate pairwise linear correlation coefficients

Calibrate against KEGG maps

Infer associations in other species
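
The correlation step can be sketched in plain Python. This is a minimal example that assumes the expression matrix is a dictionary of equal-length profiles; normalization and the PCA step are omitted, and the gene names and values are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression matrix: gene -> expression values across arrays.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

for g1, g2 in combinations(sorted(expression), 2):
    print(g1, g2, round(pearson(expression[g1], expression[g2]), 3))
```

The resulting coefficients are raw scores; as with the other evidence types, they only become comparable after calibration against the reference.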

Page 11:

Evidence transfer based on “fuzzy orthology”

[Figure labels: Source species, Target species, “?”]

• Orthology transfer is tricky
– Correct assignment of orthology is difficult for distant species
– Functional equivalence cannot be guaranteed for in-paralogs

• These problems are addressed by our “fuzzy orthology” scheme
– Confidence scores for functional equivalence are calculated from all-against-all alignment
– Evidence is distributed across possible pairs according to confidence scores in the case of many-to-many relationships
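
One way to picture the many-to-many case is the sketch below. It assumes, purely for illustration, that the transferred score is the source score multiplied by the two equivalence confidences; the slide does not give the actual weighting, and the function name and values are hypothetical.

```python
def transfer_evidence(source_score, orthologs_of_a, orthologs_of_b):
    """Spread one source-species score over all candidate target-species pairs.

    orthologs_of_a / orthologs_of_b map target proteins to a confidence (0..1)
    that they are functionally equivalent to source proteins A and B.
    """
    transferred = {}
    for target_a, conf_a in orthologs_of_a.items():
        for target_b, conf_b in orthologs_of_b.items():
            if target_a != target_b:
                transferred[(target_a, target_b)] = source_score * conf_a * conf_b
    return transferred

# Source pair A-B scored 0.8; A has two candidate equivalents, B has one.
scores = transfer_evidence(0.8, {"a1": 0.9, "a2": 0.5}, {"b1": 1.0})
for pair, score in scores.items():
    print(pair, round(score, 2))
```

Under this weighting, the pair involving the more confident equivalent (a1) inherits more of the evidence than the pair involving the likely paralog (a2).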

Page 12:

Multiple evidence types from several species

Page 13:

Getting more specific – generally speaking

• Benchmarking against one common reference allows integration of heterogeneous data

• The different types of data do not all tell us about the same kind of functional associations

• It should be possible to assign likely interaction types from supporting evidence types

• The aim: to construct accurate, qualitative models of biological systems or processes

• The models should be accurate even at the level of individual interactions

• This allows specific, testable hypotheses to be made based on high-throughput experimental data

Page 14:

Getting the parts list

[Figure: Yeast culture, Microarrays, Gene expression, Expression profile (Cho & Spellman et al.), New analysis, The parts list]

600 periodically expressed genes (with associated peak times) that encode “dynamic proteins”

Page 15:

Constructing a reliable protein network

• The stickiness of an interaction was scored based on its local network topology

• We benchmarked these scores for each individual data set against a common reference

• Impossible interactions were eliminated based on subcellular localization data

• By restricting the network to a particular system the error rate is further reduced

Page 16:

Extracting a cell cycle interaction network

Raw data: cell cycle microarray data; physical PPI interactions with confidence scores

Node selection: list of periodically expressed proteins with peak time; expand the set of proteins to include non-periodic proteins that are strongly connected to periodic proteins

Interactions: require compatible compartments and high confidence

Extract cell cycle network
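
The extraction can be sketched as a small filter. This is an illustration only: the confidence threshold, the definition of "strongly connected" as a minimum number of periodic partners, and the order of the filtering steps are assumptions, and all protein names are made up.

```python
def extract_network(interactions, periodic, compartments,
                    min_confidence=0.7, min_links=2):
    """Return (nodes, edges) of a filtered network around periodic proteins.

    interactions: list of (protein1, protein2, confidence)
    periodic: set of periodically expressed proteins
    compartments: dict protein -> set of subcellular compartments
    The two thresholds are illustrative assumptions.
    """
    # Keep high-confidence pairs that share at least one compartment.
    edges = [
        (a, b) for a, b, conf in interactions
        if conf >= min_confidence
        and compartments.get(a, set()) & compartments.get(b, set())
    ]
    # Count how many periodic partners each non-periodic protein has.
    links = {}
    for a, b in edges:
        for node, partner in ((a, b), (b, a)):
            if node not in periodic and partner in periodic:
                links[node] = links.get(node, 0) + 1
    # Expand the node set with strongly connected non-periodic proteins.
    nodes = set(periodic) | {p for p, n in links.items() if n >= min_links}
    kept = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, kept

interactions = [
    ("P1", "S1", 0.9), ("P2", "S1", 0.8), ("P1", "S2", 0.9),
    ("P1", "P2", 0.5), ("P2", "S3", 0.95),
]
periodic = {"P1", "P2"}
compartments = {
    "P1": {"nucleus"}, "P2": {"nucleus"},
    "S1": {"nucleus", "cytoplasm"}, "S2": {"nucleus"}, "S3": {"mitochondrion"},
}
nodes, edges = extract_network(interactions, periodic, compartments)
print(sorted(nodes), edges)
```

In this toy example S1 is added because it has two reliable periodic partners, S2 is dropped for having only one, and the P2-S3 pair is rejected because the two proteins share no compartment.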

Page 17:

The temporal interaction network

Interacting proteins are expressed close in time

Two thirds of the dynamic proteins lack interactions but likely participate in transient interactions

Page 18:

Static proteins play a major role

Static proteins account for a third of the interactions at all times of the cell cycle

Their time of action can be predicted from interactions with dynamic proteins
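
The slide does not say how the time of action is predicted. One simple possibility, shown here only as an assumption, is to take the circular mean of the peak times of a static protein's dynamic partners, with peak times expressed as fractions of the cell cycle. The function name is hypothetical.

```python
from math import atan2, cos, pi, sin

def predicted_time_of_action(partner_peak_times):
    """Circular mean of peak times given as fractions (0 to 1) of the cell cycle."""
    angles = [2 * pi * t for t in partner_peak_times]
    x = sum(cos(a) for a in angles)
    y = sum(sin(a) for a in angles)
    return (atan2(y, x) / (2 * pi)) % 1.0

print(predicted_time_of_action([0.2, 0.3]))
print(predicted_time_of_action([0.95, 0.05]))
```

A circular mean is used because the cell cycle wraps around: partners peaking at 0.95 and 0.05 are close in time, and their average should fall at the cycle boundary, not at 0.5.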

Page 19:

Cdc28p and its interaction partners

Page 20:

Just-in-time synthesis vs. just-in-time assembly

Most dynamic proteins are expressed just before they are needed to carry out their function

Most complexes also contain static proteins

Just-in-time assembly of complexes appears to be the general principle

The time of assembly is controlled by synthesizing the last subunits just-in-time

Page 21:

Assembly of the pre-replication complex

Page 22:

The network as a discovery tool

The network enables us to place 30+ uncharacterized proteins in a temporal interaction context

Quite detailed hypotheses can be made concerning their function

The network also contains entire novel modules and complexes

Page 23:

Transcription is linked to phosphorylation

A genome-wide screen identified 332 Cdc28p targets, which include
– 6% of all yeast proteins
– 8% of the static proteins
– 27% of the dynamic ones

A similar correlation was observed with predicted PEST regions

This suggests a hitherto undescribed link between transcriptional and post-translational control

Page 24:

Conclusions

• Genomic context methods are able to infer the function of many prokaryotic proteins from genome sequences alone

• Integration of large-scale experimental data allows similar predictions to be made for eukaryotic proteins

• Benchmarking is a prerequisite for data integration

• It is possible to construct highly reliable models through careful integration of high-throughput experimental data

• Try STRING at http://string.embl.de

Page 25:

Acknowledgments

• The STRING team
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Daniel Jaeggi
– Steffen Schmidt
– Sean Hooper
– Julien Lagarde
– Mathilde Foglierini
– Peer Bork

• New context methods
– Jan Korbel
– Christian von Mering
– Peer Bork

• Cell cycle analysis
– Ulrik de Lichtenberg
– Thomas Skøt Jensen
– Anders Fausbøll
– Søren Brunak

Page 26:

Thank you!

