Announcement
• JOBIM007, the next French Bioinformatics meeting, will be held in Marseille in early July 2007
• The official announcement will be made through the bioinfo mailing list in December 2006
Bernard Jacq
IBDML Marseille
TAG 2006, Annecy
Analysis of protein-protein interaction networks : towards functional classifications of proteomes
• Often it is possible to understand the cellular functions of uncharacterized proteins through their linkages to characterized proteins. In broader terms, the networks of linkages offer a new view of the meaning of protein function, and in time should offer a deepened understanding of the functioning of cells.
David Eisenberg, Edward M. Marcotte, Ioannis Xenarios & Todd O. Yeates, Nature (2000), 405, 823-826
• A complete understanding of protein functionality will require information on many levels: knowledge of transcriptional, translational and posttranslational regulation, binding constants, structures, protein interactions and cellular networking …..
Chandra L. Tucker, Joseph F. Gera and Peter Uetz, Trends in Cell Biology (2001), 11, 102-105
Summary
• The notion of protein function(s) and its complicated relationships with protein structure
• Bioinformatics approaches to the study of protein function(s)
• The Bioinformatics study of protein-protein interaction networks as a powerful way to study, predict and compare protein function(s)
Multi-disciplinary approaches to study protein function
From Ognjenka Goga Vukmirovic & Shirley M. Tilghman, Nature (2000), 405, 820-822
Structure and function are the yin and the yang of biology
(Figure: a protein 3D structure and its function, "toxin, kills the cell", shown as yin and yang)
The Causal Relations between Structure and Function in Biology, E. Stanley Abbot, American Journal of Psychology, Vol. 27, No. 2 (Apr., 1916), pp. 245-250
The study of relationships between structure and function in biology is a very old question
Structure and function of biological objects
Structure --> Function, by decreasing size :
• Organ : kidney. The kidney filters wastes (especially urea) from the blood and excretes them, along with water, as urine
• Cell : tubular epithelial cell. Specialized cell of the kidney, involved in blood filtering
• Molecule : ion channel. An ion channel is an assembly of several integral membrane proteins which permits the passage of ions through the membrane
WHAT ARE PROTEIN STRUCTURE AND PROTEIN FUNCTION ?
STRUCTURE
• Proteins (amino-acid chains) fold in a specific manner in 3D space, thus adopting unique shapes.
• The structure of a protein corresponds to a representation of this physical object (primary, secondary, tertiary and quaternary structures)
• Even if this object is too small to be seen directly (even under a microscope), we have a very precise idea of its shape and organisation, thanks to X-ray or NMR techniques.
• Each structural type (primary, secondary,…) of a given protein can be described precisely and unambiguously, allowing its computational manipulation (e.g. a primary structure is described using a character string made of 20 possible characters only)
WHAT ARE PROTEIN STRUCTURE AND PROTEIN FUNCTION ?
FUNCTION
• The function(s) of a protein correspond to the effector properties of the structure, at the different biological levels described hereafter.
• In contrast to the case of protein structures, there is no unique and unambiguous way to describe protein function
• This situation precluded the use of bioinformatics in studying protein function … until recently.
BIOCHEMICAL FUNCTION : molecular activity of the gene product
Examples : ATPase, DNA-binding protein …
CELLULAR FUNCTION : cellular process in which the gene product is involved; integration of the biochemical function within a given process
Examples : DNA synthesis, nucleotide metabolism, protein traffic .....
EXAMPLE : THE FUNCTIONS OF THE YEAST RAP1 PROTEIN
It is essential to distinguish different functional levels
Biochemical functions : transcription factor, DNA-binding protein
Cellular functions : RNA polymerase II dependent transcription, chromatin/chromosome structure, carbohydrate metabolism
There are more than two possible functional levels
Different levels of functional integration
• Structural levels (increasing integration) : molecule; cells; tissues, organs; organisms; populations
• Functional levels (increasing integration) : biochemical function; pathways, interaction networks between molecules; physiological regulations; development, reproduction, aging; inter-species relationships, ecological equilibria; migrations, communications
Protein function can be defined at many structural levels
Protein function : a complex notion
• A function has to be defined in the context of a structural level
• A protein can have different functions, within one given structural level and/or at different structural levels
• Necessity of a common language to describe function in different organisms : the GO initiative (Gene Ontology)
How can we represent the function of a protein in a computer ?
• Description of a function with sentences (free text)
• Keywords
• Ontologies (EC Numbers, GO)
• Use raw, functional data :
- expression data (in situ hybridisation, microarrays)
- data from protein complexes
- binary interaction data (PP, P-DNA)
• Other, new ways ?
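As an illustration of the keyword option, here is a minimal sketch of how a function description could be stored at two functional levels. The class name `FunctionAnnotation`, its fields and the second protein `PROTEIN_X` are hypothetical and only serve the example; the RAP1 keywords are those of the yeast RAP1 slide.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionAnnotation:
    """Keyword description of a protein at two functional levels (hypothetical layout)."""
    protein: str
    organism: str
    biochemical: set = field(default_factory=set)  # molecular activities
    cellular: set = field(default_factory=set)     # cellular processes

    def shared_cellular_functions(self, other):
        """Return the cellular keywords that both proteins have in common."""
        return self.cellular & other.cellular


rap1 = FunctionAnnotation(
    protein="RAP1",
    organism="yeast",
    biochemical={"transcription factor", "DNA-binding protein"},
    cellular={
        "RNA polymerase II dependent transcription",
        "chromatin/chromosome structure",
        "carbohydrate metabolism",
    },
)

# PROTEIN_X is an invented protein, used only to show a comparison.
protein_x = FunctionAnnotation(
    protein="PROTEIN_X",
    organism="yeast",
    biochemical={"ATPase"},
    cellular={"chromatin/chromosome structure"},
)

common = rap1.shared_cellular_functions(protein_x)
```

Free keywords like these compare only by exact match, which is one reason why controlled vocabularies such as GO are listed as a separate option.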
What would we like to do with computer representations of protein function ?
• Describe protein function :
- be able to do it at different granularity levels
• Compare function(s) :
- for different proteins of a same organism
- for the « same » proteins of different organisms
- for different proteins of different organisms
• Predict function(s)
Functional prediction methods which make use of genomic data
Genomic functional prediction methods : inferences by correlation
• Gene organisation conservation between organisms : Rosetta Stone method (Marcotte et al. (1999), Science 285, 751-753)
• Gene order conservation between organisms : Neighbour genes method (Dandekar et al. (1998) TIBS 23, 324-328; Overbeek et al. (1999) PNAS 96, 2896-2901)
• Qualitative gene content variations between organisms : Phylogenetic profiles method (Pellegrini et al. (1999) PNAS 96, 4285-4288)
Combined methods for functional predictions
Example of a network of functional links between proteins
Marcotte et al., Nature 402, 83-6 (1999)
Functional inference methods using correlations in genomic data : Summary
• Propose the likely existence of functional links between proteins
• These functional links suggest :
- that the corresponding proteins participate in the same cellular process (same or related cellular function)
- that there possibly exist direct interactions between these proteins (protein-protein or protein-DNA interactions) or indirect ones (protein complexes, genetic interactions)
Limitations of genomic functional prediction methods
• Are often based upon inferences making use of structural data (sequence alignments, domain fusions, gene neighbors, phylogenetic profiles)
• Sequence/structure similarity does not always mean functional similarity
• Very often, these methods can be applied to a subset of a proteome only (e.g. Rosetta stone method)
• Are very dependent on annotation quality
• Usually need a complete genomic sequence
• Problems with automated annotation transfer between proteins (transitive catastrophes)
• None of these methods gives access to cellular (or upper level) functional predictions; predictions usually remain at the biochemical level
• NB: In any case, a prediction always has to be experimentally verified !
Functional predictions :
• Classical approaches : sequence, structure --> function
• New, genomic approaches : genome, transcriptome, proteome, interactome --> function
THE PROTEIN-PROTEIN INTERACTION NETWORK
A PPI NETWORK CAN BE REPRESENTED BY AN UNDIRECTED GRAPH IN WHICH NODES REPRESENT PROTEINS AND EDGES THE PHYSICAL INTERACTIONS BETWEEN THEM
HOW TO ANALYZE INTERACTION NETWORKS ?
SOME BIOLOGICAL QUESTIONS ASKED FROM A GRAPH THEORY POINT OF VIEW
Biology --> Graph theory
• The largest group of interacting proteins ? --> The largest connected component of the graph
• Proteins participating in the same complex ? --> The quasi-cliques
• Proteins involved in the same biological process ? Classification and comparison based on the cellular function ? --> Extraction of graph node classes using classification methods
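The first question in the list above can be sketched in a few lines: store the PPI network as an undirected graph and extract its largest connected component with a breadth-first search. This is only an illustrative sketch; the protein names P1 to P6 and the function names are invented for the example.

```python
from collections import defaultdict, deque


def build_graph(interactions):
    """Build an undirected graph (protein -> set of partners) from binary interactions."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def largest_connected_component(graph):
    """Return the largest set of proteins linked by paths of interactions."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy network: one chain of four proteins and one isolated pair.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
graph = build_graph(interactions)
largest = largest_connected_component(graph)
```

Quasi-cliques and node classification need more elaborate methods and are not covered by this sketch.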
Tucker, Gera and Uetz, Trends in Cell Biology, March 2001
(Figure: interaction network containing proteins A, B, C and D)
What can be inferred about the functional relationships between A and B on the one hand and C and D on the other ?
C and D interact directly and share several common interactors, whereas A and B do not
It is likely that the network (cellular) functions of C and D are related whereas those of proteins A and B are not
Development of a new functional classification method (ProDistIn)
The central idea :
Do not compare proteins themselves but…
… compare the lists of their interactors…
The PRODISTIN method : Objectives and approach
• Aim : Develop a new method able to extract functional information from the structure of a complex network, and visualise it in an intuitive way
• Hypothesis : for any two proteins :
- many common interaction partners => related functions
- few or no common partners => unrelated functions
• Approach : interaction graph --> distance matrix --> classification tree --> class identification (topology, GO Biological Process annotations) --> annotated tree
Distance matrix (example for proteins X, Y, Z and T) :
     T     Z     Y     X
T    -
Z    0.8   -
Y    0.6   0.6   -
X    0.7   0.5   0.4   -
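1- Czekanowski-Dice distance for protein pairs
(Figure: proteins X and Y with their interactors a to h)
$$D(X, Y) = \frac{X_{spec} + Y_{spec}}{|X \cup Y| + |X \cap Y|} = \frac{1 + 4}{8 + 3} \approx 0.45$$
where X spec and Y spec are the numbers of interactors specific to X and to Y, and X ∪ Y and X ∩ Y are the union and the intersection of the two interactor lists.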
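The distance of step 1 can be sketched as follows. The assignment of the interactors a to h to X and Y is an assumption chosen to reproduce the counts of the slide (1 interactor specific to X, 4 specific to Y, 3 shared, 8 in the union); the function name is illustrative, and returning 1.0 for two empty lists is a convention of this sketch.

```python
def czekanowski_dice(interactors_x, interactors_y):
    """Czekanowski-Dice distance between two interactor lists (given as sets)."""
    specific = len(interactors_x ^ interactors_y)  # X spec + Y spec
    union = len(interactors_x | interactors_y)
    shared = len(interactors_x & interactors_y)
    if union == 0:
        return 1.0  # assumption: two proteins without interactors are treated as unrelated
    return specific / (union + shared)


# Hypothetical assignment of the interactors a..h, matching the counts of the slide.
x = {"a", "b", "c", "e"}                 # 3 shared + 1 specific to X
y = {"a", "b", "c", "d", "f", "g", "h"}  # 3 shared + 4 specific to Y

d_xy = czekanowski_dice(x, y)  # (1 + 4) / (8 + 3)
```

The distance is 0 for identical interactor lists and 1 for lists with no common element, which matches the hypothesis that many common partners mean related functions.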
2- distance table for all possible pairs
(Figure: proteins Z and T with their interactors i to p)
In order to make a functional comparison between N proteins :
- calculate D for all pairwise comparisons of proteins
- fill in a distance matrix
     T     Z     Y     X
T    -
Z    0.84  -
Y    0.66  0.6   -
X    0.77  0.5   0.45  -
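Step 2 amounts to applying the pairwise distance to all N(N-1)/2 protein pairs. The sketch below does this for four proteins; the interactor lists are invented for the example (only the X and Y counts follow the slide), so the resulting values are not those of the matrix shown above.

```python
from itertools import combinations


def czekanowski_dice(a, b):
    """Czekanowski-Dice distance between two non-empty interactor sets."""
    return len(a ^ b) / (len(a | b) + len(a & b))


def distance_matrix(interactors):
    """Return {frozenset({p, q}): D(p, q)} for every pair of proteins."""
    return {
        frozenset((p, q)): czekanowski_dice(interactors[p], interactors[q])
        for p, q in combinations(sorted(interactors), 2)
    }


# Hypothetical interactor lists for proteins X, Y, Z and T.
interactors = {
    "X": {"a", "b", "c", "e"},
    "Y": {"a", "b", "c", "d", "f", "g", "h"},
    "Z": {"a", "i", "j"},
    "T": {"k", "l"},
}

matrix = distance_matrix(interactors)
```

Keying the matrix by unordered pairs stores each distance once, which is enough because D is symmetric.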
3- clusterisation and tree drawing
Apply a clusterisation method (e.g. NJ) and build a functional similarity tree
ProDistIn : the first three steps
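Step 3 can be illustrated with any distance-based clustering. The sketch below uses average-linkage agglomerative clustering (UPGMA) rather than NJ, purely because it fits in a few lines; it is a stand-in, not the method used in PRODISTIN. The distances are those of the 4-protein matrix of step 2, and the function name is illustrative.

```python
from itertools import combinations


def average_linkage_tree(labels, dist):
    """Build a tree (nested tuples) by repeatedly merging the two closest clusters.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for other in clusters:
            # Size-weighted average of the distances to the two merged clusters.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Distances taken from the matrix of step 2.
dist = {
    frozenset(("X", "Y")): 0.45,
    frozenset(("X", "Z")): 0.5,
    frozenset(("X", "T")): 0.77,
    frozenset(("Y", "Z")): 0.6,
    frozenset(("Y", "T")): 0.66,
    frozenset(("Z", "T")): 0.84,
}

tree = average_linkage_tree(["X", "Y", "Z", "T"], dist)
# tree == ("T", ("Z", ("X", "Y")))
```

With these values X and Y are grouped first, then Z, then T. NJ does not assume a constant rate along the branches and can give a different topology on the same matrix.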
Test on the yeast proteome
• A total of 2946 direct protein-protein interactions involving 2143 proteins
• Only proteins with at least 3 interactors are considered further
• => Classification of 602 yeast proteins (10% of the proteome)
Data from :
• Two-hybrid screens (Fromont-Racine et al., Uetz et al., Ito et al.)
• Literature (via MIPS and YPD)
• Information Extraction on Medline yeast abstracts
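The selection of proteins with at least 3 interactors can be sketched as a simple degree filter. The toy interaction list and the function name are invented for the example, and ignoring self-interactions is an assumption of this sketch, not something stated on the slide.

```python
from collections import defaultdict


def proteins_with_min_interactors(interactions, minimum=3):
    """Return the proteins that have at least `minimum` distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # assumption: self-interactions are not counted
        partners[a].add(b)
        partners[b].add(a)
    return {protein for protein, found in partners.items() if len(found) >= minimum}


# Hypothetical toy data: only P1 has three distinct partners.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P4"), ("P2", "P1"),
]

selected = proteins_with_min_interactors(interactions)
```

Storing partners in sets means that an interaction reported twice, or in both orientations, is counted once.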
RESULT : FUNCTIONAL PROXIMITY TREE FOR 602 YEAST PROTEINS
RNA MATURATION SUBTREE (classes shown : splicing, RNA degradation)
RNA METABOLISM GREAT TREE (classes shown : splicing, RNA degradation, maturation of the 3' extremity, translation, and one unidentified class "?")
Main conclusions
• Results correlate very well with current functional knowledge
• Statistically robust
• Allows prediction of protein function
• Prediction of new functional groups
• Provides an integrated functional view of a proteome
Publication (highly accessed) : Brun, Martin, Chevenet, Guénoche, Jacq, Genome Biol. 2003
Cell cycle
Since its establishment, ProDistIn has already been used to :
– Study functional classes and make functional predictions on more than 200 yeast proteins
– Study the evolutionary fate of yeast genes originating from an ancient genome duplication
– Study the relationship between sequence similarity and cellular function similarity
– Study the main Drosophila signaling pathways in a general PPI context
– Study the human interactome (under way)
– Study the interactions of viral proteins with the human proteome (under way)
Study of 9 Drosophila developmental signaling pathways from the interactome perspective
Objectives and approach
• Aim :
- Study Drosophila signaling pathways in the context of the cell proteome : how are they organised ?
- Propose the existence of new players in several classical pathways
• Approach :
- Constitute high-quality binary PP interaction lists for Drosophila
- Perform PRODISTIN classifications
- Other types of bioinformatic analyses to analyse communications (between pathways and with the rest of the interactome)
A surprising result !
(Figure: classification tree with the functional classes TOR-RAS2, WG2, HH1, WG1-N1, TOL3, TGF2, TOL1, INS2, SEV-RAS2-INS1, HH2-EGF, TGF1, TOL2, WG3-N2-TGF3, FGF)
• Pathways are not clustered together
• Each pathway is split into two to three classes (modules)
• Proteins from different pathways are often found in the same functional classes
Signaling pathways split
Example : the Wnt pathway
(Figure: gro, Wnt2, wg, fz, fz2, dsh, sgg, Axn, CkIa, arm, Apc2)
Functional classes localization
Polarization of signaling pathway modules : localization overrepresentation, corroborated by Molecular Functions
Pathways : TOL, TGF, NOTCH, WINGLESS, HEDGEHOG, EGF, SEVENLESS, INSULIN, TORSO
(Figure: functional classes TGF1, TGF2, TOL1, TOL2, INS2, N2 WG3 TGF3, WG2, HH1, WG1 N1, SEV RAS2 INS1, TOR RAS1, HH2 EGF placed along a membrane, cytoplasm, nucleus axis)
Drosophila wg pathway : new putative players
The functional classification makes it possible to propose the involvement of new partners in signaling pathways
• Proteins of the 'canonical' pathway : wg, Wnt2, Fz, Fz2, Dsh, arm, Apc2, Axn, sgg, CkIalpha, pan, nej, gro
• Prediction of involvement : CG3402, dlg1, raps, pk, Vang, mus309, SH3PX1
(Figure: both groups of proteins placed in the cytoplasm and the nucleus)
Main conclusions
- An alternative view of signal transduction : from linear signaling pathways to a modular, integrated signaling network
- Seems to be true for humans
- More communications within the signaling network than with the rest of the interactome
- Prediction of new components/regulators ?
  - in the Hedgehog pathway --> experimentally tested in P. Thérond's lab (Nice)
  - in the Wnt pathway --> discussion with R DasGupta (former Perrimon), NYU : predicted components observed in RNAi screens ?
Publication : in preparation
Recent developments, Projects
- The PRODISTIN method has been automated and is now accessible to the community through the Webdistin server
- We have developed another graph analysis method to find functional classes based on the density of edges
Publication : Brun et al, BMC Bioinformatics (2004)
- Development of a method which uses weighted graphs. It will make it possible to integrate other types of data (genetic interactions, transcriptome) by weighting the edges of a protein interaction graph (in preparation)
- Adaptation of PRODISTIN to Protein-DNA networks (classification of both proteins and genes)
Publication : Baudot et al, Bioinformatics (2006)
A change in our view of protein function
Classical view : the function of protein A is defined by its action on the transformation of substrate (S) into product (P)
New perspective : the function of protein A is defined by the context of its interactions with other products in the cell
Adapted from Eisenberg et al, Nature (2000)
Bioinformatics of Interactions and Regulations in Development group
Present Members :
• Anaïs Baudot (PhD Student)
• Christine Brun (Chargée de recherche)
• Carl Herrmann (Assistant Professor)
• Bernard Jacq (Research director)
• Pierre Mouren (Software Engineer)
• Loredana Martignetti (PhD Student)
• Wissem Souiai (PhD Student)
• Delphine Pothier (M2 Student)
Previous Members (2002-2006)
• Claudine Chaouyia (Assistant Professor)
• Aitor Gonzalez (PhD Student)
• Magali Lescot (Post-Doc)
• David Martin (PhD Student)
• Denis Thieffry (Professor)
+ 8 summer students
Collaboration: Alain Guénoche (IML)
How can we experimentally discover the function of a new gene/protein ?
1- The « classical approach » : from a gene-centered approach …
• Steps : mutant phenotype; gene cloning (one gene); sequencing, structure; inferred biochemical function; functional tests; proposal of a biochemical and a cellular function on the basis of experiments
• Disciplines involved : genetical analysis, molecular biology, bioinformatics, biochemistry
How can we discover the function of a new gene/protein ?
2- The « genomic approach » : … towards systems biology
• Genes/proteins are the elementary components of a system, the variations of which are being studied
• Determination of cellular function and access to high levels of functional integration
• Disciplines involved : functional genomics and proteomics, bioinformatics
The approach is changing, so the way of thinking should also change…
The « Rosetta stone » method
Principle : makes use of gene organisation conservation/differences between organisms and of the modularity of proteins
If, in genome 1, gene A is composed of module alpha only and gene B of module beta only,
and if, in genome 2, module alpha and module beta are found associated in a single gene C,
then A and B could be functionally related genes/proteins.
Marcotte et al., Science 285, 751-753 (1999)
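The principle can be sketched as a search for pairs of genes of genome 1 whose modules are found together in one gene of genome 2. Describing each gene by a set of module names is an assumption of this sketch (in practice the modules come from sequence comparisons), and all gene names, module names and function names below are illustrative.

```python
from itertools import combinations


def rosetta_stone_links(genome_1, genome_2):
    """Return (gene_a, gene_b, fused_gene) triples suggesting a functional link.

    Each genome maps a gene name to the set of modules that compose it.
    """
    links = set()
    for gene_a, gene_b in combinations(sorted(genome_1), 2):
        modules_a, modules_b = genome_1[gene_a], genome_1[gene_b]
        if not modules_a.isdisjoint(modules_b):
            continue  # the two genes must be made of different modules
        for fused_gene, modules_c in genome_2.items():
            if modules_a <= modules_c and modules_b <= modules_c:
                links.add((gene_a, gene_b, fused_gene))
    return links


# Hypothetical genomes following the slide: A = alpha, B = beta, C = alpha + beta.
genome_1 = {"A": {"alpha"}, "B": {"beta"}, "D": {"gamma"}}
genome_2 = {"C": {"alpha", "beta"}, "E": {"gamma"}}

links = rosetta_stone_links(genome_1, genome_2)
```

Gene D is not linked to A or B because no gene of genome 2 combines its module with theirs.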
What information, brought about by genomics, can be used to develop new functional prediction methods ?
WITHIN ONE ORGANISM
• Putative exons, introns, splice sites ….
• Presence of regulatory sequences near genes (promoters, enhancers ....)
• Gene content of an organism
BETWEEN ORGANISMS
• Sequence variation/conservation between organisms
• Qualitative gene content variations between organisms
• Gene order conservation between organisms
• Gene organisation conservation between organisms
« Rosetta Stone » method : examples (Marcotte et al., Science 285, 751-753 (1999))
Neighbour genes method
Principle : makes use of the variation of gene (or group of genes) order on the chromosomes
Dandekar et al. TIBS 1998; Overbeek et al. PNAS 1999
(Figure: order of genes A, B and C in genomes 1 to 4)
Genes A & B are functionally related
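A minimal sketch of this idea: count in how many genomes two genes are adjacent, and keep the pairs whose neighbourhood is conserved. The gene orders below are invented (they do not reproduce the figure), genes are assumed to carry the same name in every genome, and the function name and threshold are illustrative.

```python
from collections import Counter


def conserved_neighbours(genomes, min_genomes):
    """Return the gene pairs that are adjacent in at least `min_genomes` genomes.

    Each genome is given as the ordered list of its genes along the chromosome.
    """
    counts = Counter()
    for order in genomes.values():
        adjacent = {frozenset(pair) for pair in zip(order, order[1:])}
        counts.update(adjacent)  # each pair counted once per genome
    return {pair for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders: A and B stay neighbours, C moves around.
genomes = {
    "genome 1": ["A", "B", "C"],
    "genome 2": ["A", "B", "X", "C"],
    "genome 3": ["C", "A", "B"],
    "genome 4": ["A", "B", "C"],
}

linked = conserved_neighbours(genomes, min_genomes=4)
```

Here only the pair A-B is adjacent in all four genomes, so it is the only pair proposed as functionally related.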
BB, Borrelia burgdorferi; DR, Deinococcus radiodurans; CA, Clostridium acetobutylicum; BS, Bacillus subtilis; EF, Enterococcus faecalis; MP, Mycoplasma pneumoniae; MG, Mycoplasma genitalium; ML, Mycobacterium leprae; MT, Mycobacterium tuberculosis; CJ, Campylobacter jejuni; TP, Treponema pallidum; HP, Helicobacter pylori; ST, Streptococcus pyogenes; PN, Streptococcus pneumoniae.
Example : Gene functional groups in glycolysis
Overbeek et al. (1999) PNAS 96, 2896-2901
Phylogenetic profiles method
Principle : makes use of correlations (+ or -) in the qualitative variation of gene content between different organisms
Pellegrini et al. PNAS 96, 4285-4288 (1999)
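The principle can be sketched with presence/absence profiles: each protein is described by a tuple of 1 (gene present) and 0 (gene absent) across a fixed list of organisms. In this simplified sketch a positive correlation is taken to mean identical profiles and a negative one fully complementary profiles, which is stricter than a real correlation measure. The profiles and names are invented.

```python
def profile_links(profiles):
    """Return (identical, complementary) lists of protein pairs.

    `profiles` maps a protein to a tuple of 0/1 values, one per organism.
    """
    identical, complementary = [], []
    names = sorted(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            p, q = profiles[first], profiles[second]
            if p == q:
                identical.append((first, second))      # "+" correlation
            elif all(a != b for a, b in zip(p, q)):
                complementary.append((first, second))  # "-" correlation
    return identical, complementary


# Hypothetical profiles over four organisms.
profiles = {
    "P1": (1, 0, 1, 1),
    "P2": (1, 0, 1, 1),
    "P3": (0, 1, 0, 0),
    "P4": (1, 1, 0, 1),
}

identical, complementary = profile_links(profiles)
```

P1 and P2 are always present or absent together, whereas P3 is present exactly where they are absent; P4 shows neither pattern.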
IBDML :
• Christine Brun
• Anaïs Baudot
• Bernard Jacq
• Wissem Souiai
Collaborations :
• A. Guénoche, IML
• F. Chevenet (Montpellier)
• 1- Development of PRODISTIN, a generic method for functional classification of members (proteins, genes) of an interaction network
• 2- Application to the study of the functional evolutionary fate of yeast duplicated genes
• 3- Study of 9 Drosophila developmental signaling pathways from an interactome perspective
Structure and analysis of protein-protein interaction networks
How can we extend individual functional predictions to a complete interaction network ? >>> functional clusterisation
Example of the Prodistin method (PROtein DIStance based on INteractions)
Brun et al., Genome Biology (2003), R6
The use of “complete” data changes everything
In classical molecular biology, the main problem is to bring a functional answer concerning one gene without studying nearly all the other genes (99.9% of the genes)
In genomics, the clever part is to imagine what you can do when you « see » all the genes (or a majority of them)
It is therefore necessary to change the way of thinking about genes (groups of genes, modules...)
Projects
- Take advantage of the knowledge gained in the study of Drosophila developmental pathways to study the vertebrate PI3K pathway :
  - analyse the modular organisation of the PI3K pathway
  - continue the constitution of a human interactome from the literature (Internote tool + human validation)
  - from this analysis, propose new potential components/regulators of this pathway (to be submitted to experimental validation)
  - evolutionary study of the PI3K pathway
  Collaboration with E. Goillot’s (Lyon) and C. Brochier’s (Marseille) groups
- Use our human PPI list and newly obtained Y2H virus-human PPIs to study host-virus interactions
  Collaboration with V. Lotteau’s group (Lyon)