Shared components of protein complexes
Roland Krause
First Online EMBL PhD symposium
December 4th – 8th, 2006

About protein complexes
Proteins interact to form stable, functional units
  Can be easily observed experimentally
  Delineation of physical units defines our interpretation of the cellular mechanisms
Several proteins contribute to more than one complex
  Shared components
This presentation
  Focus on open issues for the upcoming online discussion
    Definition of protein complexes
    What do we know about shared components
  Focus on TAP-MS data from Saccharomyces cerevisiae

Helpful notes
References used in this presentation can be found in Connotea, tagged with "1VS".
www.connotea.org
Roland Krause
Max-Planck-Institute for Molecular Genetics, Berlin, Germany
Max-Planck-Institute for Infection Biology, Berlin, Germany
www.molgen.mpg.de/~krause
rolandroland..krause@[email protected]

Analyzing protein complexes

How to identify protein complexes
Tandem-affinity purification (TAP)
  Addition of the TAP-tag to the target gene (the bait)
  Isolation of the protein tag, including proteins (preys) binding the target protein
Two-step purification procedure
  Few contaminants due to different conditions
Separation of components of the complex
Mass spectrometry
  Identification of proteins, typically after tryptic digestion
[Figure: TAP schematic in two panels, each showing the tagged bait with Prey A, Prey B, Prey C and Prey D]
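The bait/prey relationship described on this slide maps onto a small data structure. The sketch below is illustrative only: the `Purification` class and both helper functions are hypothetical names, and the "spoke" and "matrix" readings are common conventions for turning co-purification data into protein pairs, not something prescribed by the slide.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Purification:
    """One TAP-MS experiment: a tagged bait and the preys identified with it."""
    bait: str
    preys: frozenset


def spoke_pairs(p):
    """Spoke reading: the bait is paired with each prey; preys are not paired."""
    return {frozenset((p.bait, prey)) for prey in p.preys if prey != p.bait}


def matrix_pairs(p):
    """Matrix reading: every pair of co-purified proteins is treated as linked."""
    members = sorted(set(p.preys) | {p.bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Placeholder names taken from the schematic, not real yeast proteins.
example = Purification("Bait", frozenset({"Prey A", "Prey B", "Prey C", "Prey D"}))
print(len(spoke_pairs(example)))   # 4 bait-prey pairs
print(len(matrix_pairs(example)))  # 10 pairs among 5 proteins
```

Neither reading recovers direct physical contacts: the spoke reading misses prey-prey contacts, and the matrix reading asserts pairs that may only be bridged by other proteins.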

Recent datasets
TAP-MS data for Saccharomyces cerevisiae
  Gavin et al, 2006
    1993 bait proteins
    2760 total proteins
    491 complexes
  Krogan et al, 2006
    2352 bait proteins
    4073 total proteins
    2708 in core set
    541 complexes
Literature curated reference data sets
  MIPS set of complexes
    State of the art since 2001
    ~200 complexes, hierarchically organized
  Reguly et al, 2006
    Includes protein complexes
    'jbiol36-s1.txt', Supp. 2
    256 complexes

Quality of the data
In addition to bona fide interactors of a given protein, all protein-protein interaction screening methods find many additional, seemingly unrelated proteins
  TAP-MS was benchmarked to have a reproducibility of 70%
  How to deal with abundant proteins/contaminants?
    Ssb1/Ssa2 are found in most purifications and are likely to be part of all
    Contaminants differ between screens and type of mass spectrometry
  No accepted method for evaluation exists
The individual experiments are performed under stable conditions
  The comparability between the experiments is probably superior to individual experiments under specific conditions
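One simple way to look at the abundant-protein problem is to count how often each protein turns up as a prey. The sketch below is a hypothetical frequency filter, not an accepted evaluation method (the slide notes that none exists). The function names, the 0.5 threshold and all bait and prey names except Ssb1 and Ssa2 are made up for illustration.

```python
from collections import Counter


def prey_frequencies(purifications):
    """Fraction of purifications in which each protein appears as a prey."""
    counts = Counter()
    for preys in purifications.values():
        counts.update(set(preys))
    total = len(purifications)
    return {protein: n / total for protein, n in counts.items()}


def flag_frequent(purifications, threshold=0.5):
    """Proteins seen in more than `threshold` of all purifications."""
    freqs = prey_frequencies(purifications)
    return {p for p, f in freqs.items() if f > threshold}


# Toy data (bait -> preys), invented for this example.
toy = {
    "BaitA": {"Ssb1", "Ssa2", "P1"},
    "BaitB": {"Ssb1", "P2"},
    "BaitC": {"Ssb1", "Ssa2", "P3"},
    "BaitD": {"Ssa2", "P1", "P4"},
}
print(sorted(flag_frequent(toy)))  # ['Ssa2', 'Ssb1']
```

A frequency cut-off of this kind cannot tell a contaminant from a genuinely promiscuous protein, which is the open question raised above.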

Examples of discoveries
Novel, confirmed complexes
  90S Pre-Ribosome
    Gives rise to the primordial, nucleolar ribosome
    One of the largest complexes in the yeast cell
    Established functionally by Grandi et al, 2002, Dragon et al, 2002
  COP9/Signalosome
    "Missing" complex known in human, fly, Arabidopsis
    Known to be related to the 19S regulatory part of the proteasome
    Shares components with the proteasome in yeast
Novel interactors for known complexes
  Iwr1 with RNA polymerase II (Krogan et al, 2006)
  YFL049w/Swp82 with SWI/SNF complex (Gavin et al, 2002)
Apparent underestimate of protein complexes in the reference literature

A negative list of properties
Little co-expression of complexes
  Only few proteins are co-expressed
    Ribosome
    Typically, only core elements of complexes
  Can we use transcription for benchmarks of complex predictions?
  Jensen et al, 2006 (Cell cycle)
No regulation of complex elements under the same promoter
  Simonis et al, 2004
  Recent ChIP-chip data
No regulation of abundance at the protein level
  Abundance data for all proteins in yeast proves otherwise (Ghaemmaghami et al. 2003)
Yeast two-hybrid does not resolve local structures of complexes
  Two proteins could be bridged by additional proteins
  Aloy, Russell
We can use TAP-MS to model local interactions
  Hernandez et al, 2006
Can we use data integration for accurate complex prediction?

Computational approaches to complex prediction
MCODE (Bader and Hogue, 2003)
  Identification of local densities
  Multiple assignment possible
Spirin and Mirny, 2003
  Superparamagnetic clustering
Krause et al, 2003
  Clustering of purifications, not proteins
  Works only with TAP-MS like data
  Preserves direct link to the source data
MCL - Markov clustering (Pereira-Leal et al, 2004)
  Used in Krogan et al 2006
  No overlapping components
Cost based clustering (King et al, 2004)
Aloy and Russell in Gavin et al. 2006
  Socioaffinity score includes a maximum of experimental information
  Relation of bait and prey
  Identifies cores, modules and attachments
Major differences of the results
  Number of complexes
  Number of proteins considered
Difficulties in the comparison
  No agreed benchmark procedure
  No consensus on a good solution
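To make the comparison problem concrete, the sketch below scores predicted complexes against a reference set using the Jaccard index. This is only one of many possible overlap measures and is not put forward as the missing benchmark procedure. The function names and all complex and protein identifiers are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two protein sets: size of intersection over union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(predicted, reference):
    """For each reference complex, the (score, name) of its best prediction."""
    result = {}
    for ref_name, ref_members in reference.items():
        scored = [(jaccard(ref_members, members), name)
                  for name, members in predicted.items()]
        result[ref_name] = max(scored) if scored else (0.0, None)
    return result


# Toy complexes with invented identifiers.
reference = {"R1": {"A", "B", "C"}, "R2": {"D", "E"}}
predicted = {"P1": {"A", "B", "C", "X"}, "P2": {"D"}, "P3": {"Y", "Z"}}
for ref, (score, pred) in sorted(best_matches(predicted, reference).items()):
    print(ref, pred, round(score, 2))
```

A best-match score per reference complex says nothing about predictions that match no reference (P3 here), and it does not handle proteins that legitimately belong to several complexes. This is part of why methods with and without overlapping components are hard to compare.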

Can we identify complexes reliably?
Unprecedented quality of the data
Confounding purification of 'odd' proteins
  Contaminants
  Weak interactions
  Is there a distinction between the two?
One of the problems in recognizing protein complexes lies in the complex biology
  Many proteins need to be assigned to more than one complex
  Shared components
For computational complex prediction
  Increased complexity of the task
  Can functional homogeneity be a useful criterion?
Community effort to start a generic, accepted comparison scheme

What do we know about shared components?

[Figure: MIPS data set of complexes (Brohée and van Helden, 2006)]

Shared components and hub proteins
Shared components: Proteins that contribute to more than one distinct complex
  Variant complexes: homologous complexes that retain unique parts
  Connectors
  Local high degree
Hub proteins: Proteins with a high degree in protein interaction networks
  Highly connected proteins are removed in typical analyses
  The remainder have been shown to have unusual properties
    Essentiality
    Conserved in evolution
    Interconnectivity
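The two definitions can be told apart computationally. The sketch below is a minimal illustration: shared components are read from complex membership, while degree is computed on a network in which all members of a complex are linked (an assumption made here for simplicity). The function names are hypothetical, and the two variants use the trehalose example from this deck.

```python
from collections import defaultdict
from itertools import combinations


def complexes_per_protein(complexes):
    """Map each protein to the set of complexes it belongs to."""
    membership = defaultdict(set)
    for name, members in complexes.items():
        for protein in members:
            membership[protein].add(name)
    return membership


def shared_components(complexes):
    """Proteins that are members of more than one complex."""
    return {p for p, cs in complexes_per_protein(complexes).items() if len(cs) > 1}


def degrees(complexes):
    """Degree of each protein when all members of a complex are linked."""
    neighbours = defaultdict(set)
    for members in complexes.values():
        for a, b in combinations(sorted(members), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {p: len(n) for p, n in neighbours.items()}


# Two variant complexes; the variant labels are made up.
variants = {
    "variant_Tsl1": {"Tps1", "Tps2", "Tsl1"},
    "variant_Tps3": {"Tps1", "Tps2", "Tps3"},
}
print(sorted(shared_components(variants)))  # ['Tps1', 'Tps2']
print(degrees(variants)["Tps1"])            # 3
```

In this toy case the shared components also have the highest degree, which illustrates the "local high degree" point above: membership in several complexes raises degree even without any hub-like role in the wider network.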

Reasons for the observations of shared components
Variant complexes
  Similar complexes that vary in few components
Aggregation into megacomplexes
Transient interactors
  Convenient excuse for missing features
Important for unicellular eukaryote, increasing importance for higher organisms
  Protein family specific expansion
  Alternative splicing

Trehalose 6-phosphate phosphatase
[Figure: complexes formed by Tps1, Tps2, Tps3 and Tsl1]
Catalytic activity of Tps1 and Tps2 (shared)
Regulatory function with Tsl1 or Tps3 (exclusive)

Protein phosphatase 2A
[Figure: four variant complexes (Cdc55-Pph21-Tpd3, Rts1-Pph21-Tpd3, Cdc55-Pph22-Tpd3, Rts1-Pph22-Tpd3) and the combined graph of Cdc55, Pph21, Tpd3, Rts1 and Pph22]
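The four variants in the figure are all combinations of two independent choices plus one constant member. The short sketch below enumerates them. The variable names are neutral placeholders because the slide does not state the roles of the subunits.

```python
from itertools import product

# Alternatives read off the figure: one protein from each pair is present,
# while Tpd3 occurs in every variant.
first_choice = ["Cdc55", "Rts1"]
second_choice = ["Pph21", "Pph22"]
constant = ["Tpd3"]

variants = [frozenset((a, b, *constant)) for a, b in product(first_choice, second_choice)]
for v in variants:
    print("-".join(sorted(v)))

# The protein common to all variants.
common = frozenset.intersection(*variants)
print(sorted(common))  # ['Tpd3']
```

Two binary choices already give four complexes. A flat list of complexes grows multiplicatively with each alternative, while a description in terms of alternatives stays compact.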

Histone acetylase complexes in yeast
Not shown: SAGA and several other complexes whose components overlap with those shown (Krause et al. 2004).

Modular decomposition of protein complexes to tackle variant complexes

Strong modules in undirected graphs
Identify nodes that have the same outside neighbours.
Series (*), "AND": All proteins grouped to achieve a common function. Obligatory interactors.
Parallel (||), "XOR": Alternative proteins that perform the same function. Often homologs.
Prime (P): Chain. Shift of function between elements of the prime module.
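The "same outside neighbours" criterion can be illustrated in its smallest case, a module of two nodes. The sketch below checks pairs only and is not a full modular decomposition (larger modules and prime modules need a dedicated algorithm). The function name is hypothetical, and the toy graph assumes that every pair among Tps1, Tps2, Tsl1 and Tps3 interacts except Tsl1 and Tps3.

```python
from itertools import combinations


def classify_pairs(adjacency):
    """Find node pairs that share all neighbours outside the pair.

    Adjacent pairs are reported as 'series', non-adjacent pairs as 'parallel'.
    """
    found = {}
    for a, b in combinations(sorted(adjacency), 2):
        outside_a = adjacency[a] - {a, b}
        outside_b = adjacency[b] - {a, b}
        if outside_a == outside_b:
            found[(a, b)] = "series" if b in adjacency[a] else "parallel"
    return found


# Assumed interaction graph for the trehalose example (node -> neighbours).
graph = {
    "Tps1": {"Tps2", "Tsl1", "Tps3"},
    "Tps2": {"Tps1", "Tsl1", "Tps3"},
    "Tsl1": {"Tps1", "Tps2"},
    "Tps3": {"Tps1", "Tps2"},
}
for pair, kind in sorted(classify_pairs(graph).items()):
    print(pair, kind)
```

On this graph the check reports Tps1 and Tps2 as a series pair and Tps3 and Tsl1 as a parallel pair, matching the "AND" and "XOR" readings above.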

Trehalose 6-phosphate phosphatase
[Figure: the two variant complexes (Tsl1-Tps2-Tps1 and Tps2-Tps3-Tps1), the combined graph of Tsl1, Tps2, Tps3 and Tps1, and the tree obtained by modular decomposition, which contains a parallel (||) node]
The tree describes our knowledge of the complex
  Tps1 and Tps2 combine with either Tsl1 or Tps3
See Gagneur et al, 2004 for details

The importance of the missing interaction
The only distinction that shows Tsl1 and Tps3 to be alternatives is their homology and the lack of a single interaction
When studying local, dense systems, the absence of binding between proteins allows us to identify variant complexes
TAP-MS is a superior method to reveal such interactions
However, when studying local interactions, how can we show the absence reliably?
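One hedged way to frame the reliability question is to separate absences that were actually tested from those that were not. The sketch below uses a bait-prey reading and treats an absent pair as "tested" only when both proteins were purified as baits and neither retrieved the other. This criterion, the function name and the toy purifications are assumptions made for illustration, not results from the yeast data.

```python
from itertools import combinations


def absent_pairs(purifications, proteins):
    """Pairs never seen as bait-prey, split by how well the absence is supported.

    `purifications` maps each bait to the set of preys it retrieved.
    """
    tested, untested = set(), set()
    for a, b in combinations(sorted(proteins), 2):
        seen = b in purifications.get(a, set()) or a in purifications.get(b, set())
        if seen:
            continue
        # Absence counts as tested only if both proteins were used as baits.
        if a in purifications and b in purifications:
            tested.add((a, b))
        else:
            untested.add((a, b))
    return tested, untested


# Invented purifications for the trehalose example.
toy = {
    "Tps1": {"Tps2", "Tsl1", "Tps3"},
    "Tsl1": {"Tps1", "Tps2"},
    "Tps3": {"Tps1", "Tps2"},
}
proteins = {"Tps1", "Tps2", "Tsl1", "Tps3"}
print(absent_pairs(toy, proteins))
```

With all three baits the only absent pair is Tps3 and Tsl1, and it counts as tested. If the Tps3 purification is removed, further pairs become absent but untested, so missing experiments can look like missing interactions. Repeated experiments would be needed before even a tested absence could be called reliable.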

Results on the yeast set
Only few complexes have purifications for all complex members
TAP-MS does not retrieve direct interactions
  No spoke model for modular decomposition
  Missing purifications increase the number of parallel modules
The algorithm is prone to contaminants
Ideally,
  Purifications for all baits under consideration
  Repetitions of individual experiments
Can we tackle mega-complexes?

Megacomplexes
Many protein complexes interact themselves to form larger units
  Considered as hierarchies in the MIPS complex data sets
Examples
  Ribosome
  Proteasome
  Transcriptional machinery (~700 proteins)
Depending on experimental conditions one retrieves either
  separate entities or
  the megacomplex.
Typically, a mixture is encountered

Transient interactions
Definitions by different researchers vary
  Co-expression, maintenance through the cell cycle
  Manual classification
    Stable complexes are obligatory interactors
    Multi-subunit enzymes are stable
    Receptor-ligand binding is unstable
Kinetic data
  Available from some binary interactions
  Largely unavailable for multi-component complexes

Improvements and challenges
Determining complexes
  Create a notion of protein complex
  Define complex benchmark sets
  Define standardized methods for complex comparison
Modular decomposition
  Could be used to create a benchmark set
  Importance of true negative interactions
Integrating interaction data
  Different types of data might help to find "obvious" stable modules
  Many of the intricate features are missed in such approaches
Aim to create a high confidence data set using a single method (TAP) rather than combining several one pass methods
How to generate funding to perform an x-fold coverage sampling of the yeast interactome?

Open questions for bioinformatic analyses of protein complexes
How to benchmark the predictions of protein complexes?
  Community efforts, public discussion
Shared components are a biological feature of protein complexes
  Hub proteins vs shared components?
  Will they emerge simply from better data?
  Are our data sources sufficiently fine grained?
  How to treat "transient interactions" and "contaminants"?
Build an ontologically precise definition of protein complexes
  Are semi-automated definitions feasible?

Thanks to …
Peer Bork, Georg Casari, Thomas Dandekar, Julien Gagneur, Anne-Claude Gavin, Bernhard Küster, Christian von Mering, Gitte Neubauer, Jens Rick, Rob Russell, Giulio Superti-Furga, Jörg Schultz

Structural arrangements for shared components

Connectors between complexes
Tethering of two complexes
Sus1
  Suggested to connect nuclear export and early gene expression

Re-use of existing structures
Lsm1-7 complex: cytoplasmic, mRNA decapping/degradation
Lsm2-8 complex: nuclear, U6 snRNP assembly

Exchange of the shared component
Signalling networks
  Flow of information
  Possibly across membranes
[Figure label: Sharing / Exchange]

Evolutionary scenarios
Histone acetylases, RNA polymerases
Major functional units are duplicated
Small proteins of the complex are kept
[Figure label: Partial duplication of complex]

Recruitment of factors
Shared component of two unrelated protein complexes
  No conserved element between the two complexes
Swd2: WD40-containing protein in the SET1 histone methylase complex and in polyadenylation
[Figure label: Recruitment]