Mapping Protein-Protein Interactions
MEDG 505 (Genome Analysis), 13 January 2005
• Morin:
  - Overview
  - IP-MS
  - Data integration
• Student presentations:
  - Y2H interactions
  - RNA vs. protein expression analysis
• Discussion:
  - Lessons
  - Application
Central Dogma
DNA → RNA → Protein → Function
Humans:
  - ~25,000 genes
  - 25-40% with functional annotations
General goal: annotation of the proteome
  - Identify disease-related proteins
  - Identify therapeutic targets
How do we identify protein functions?
Protein Function
The general purpose of proteins is to interact with other molecules:
  - Enzyme/substrate
  - Protein/protein
Cellular processes are governed by complex networks of interacting proteins:
  - Determining protein-protein interactions suggests functional hypotheses
Protein Annotation
Large-scale methods for annotation of protein function:
• Genetic
  - Mutational analysis in model organisms: verifies biological role; translation to humans is problematic; differences in biology cloud interpretation
  - Yeast 2-hybrid: can verify biological role; binary interactions; often uses protein fragments; high false positives; extensively employed
• Genomic
  - mRNA profiling: comprehensive and high-throughput; mRNAs used to infer the proteome; identifies expression changes; silent to PTMs; cause and effect difficult to infer; interactions difficult to predict
• Biochemical
  - MS analysis of purified protein complexes: identifies interactions directly; yields higher-order interactions; identifies PTMs; binding affinity can be employed; technically challenging
Lesson: All methods need to be employed to fully annotate the proteome.
Tagged Protein Structure
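[Figure: C-tagged and N-tagged constructs, each driven by a CMV promoter. C-tagged construct: ORF with a C-terminal FLAG tag, with lox sites. N-tagged construct: FLAG, lox, ORF, lox.]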
Properties of Immunoprecipitated Protein Complexes
Types of interacting proteins:
  • Background binding to bait/matrix/MS (filter?)
  • Proteins from throughout the bait's lifespan
  • Processing/transport/degradation proteins (filter?)
  • Weak affinity (less reproducible?)
  • Strong affinity
  • Primary interactors
  • Secondary interactors
  • High data volume
Experimental design and analysis should be tailored to these expectations.
Methodology for evaluation:
1- Experimental validation
2- Bioinformatic evaluation
3- Experimental reproducibility
  - transfection/IP protocols
Method Characterization
Characterization Project
1- 49 baits from diverse protein families
  - tag both N and C termini
2- IP-MS, repeated 4+ times
3- 190 preys; a prey counts as a hit if it is:
  - observed 2+ times
  - below 5% database frequency
4- Analyze
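The hit definition above can be written as a simple filter. This is a minimal sketch, assuming each prey record carries the number of IPs in which it was seen and its database (global) observation frequency; the `PreyRecord` type, its field names, and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreyRecord:
    """Hypothetical record for one prey seen with one bait."""
    bait: str
    prey: str
    times_observed: int        # number of IPs (trials) in which the prey was seen
    database_frequency: float  # fraction of all IPs in the database containing the prey


def is_hit(rec: PreyRecord, min_observations: int = 2, max_frequency: float = 0.05) -> bool:
    # Hit: observed 2+ times AND database frequency less than 5%.
    return rec.times_observed >= min_observations and rec.database_frequency < max_frequency


records = [
    PreyRecord("BaitA", "Prey1", 3, 0.01),  # reproducible and rare in the database
    PreyRecord("BaitA", "Prey2", 1, 0.01),  # seen only once
    PreyRecord("BaitA", "Prey3", 4, 0.20),  # frequent background binder
]
hits = [r.prey for r in records if is_hit(r)]
print(hits)  # ['Prey1']
```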
N- & C-Tag Hit Overlap
Category                             # hits   Fraction of total hits
Seen with N only                        110   0.68
Seen with C only                         29   0.18
Seen in both N & C                       15   0.09
Seen only when N + C are combined         8   0.05
Total                                   162
Lessons:
1) About 5 hits per bait.
2) N-tags interfere less than C-tags.
3) Both tags are needed to get good representation.
[Figure: fraction of total hits observed in an N-tag-only experiment (0.77) vs. a C-tag-only experiment (0.27). Sample: 33 baits.]
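The figure values appear to follow from the overlap table: a single-tag experiment would recover the hits seen with that tag alone plus those seen with both tags. A short check of that arithmetic, under that assumption (the dictionary keys are illustrative labels):

```python
# Hit counts from the N- & C-tag overlap table (33 baits).
counts = {
    "N only": 110,
    "C only": 29,
    "both N & C": 15,
    "only N + C combined": 8,
}
total = sum(counts.values())

# Assumption: a single-tag experiment sees its own tag-only hits plus the shared hits.
n_tag_fraction = (counts["N only"] + counts["both N & C"]) / total
c_tag_fraction = (counts["C only"] + counts["both N & C"]) / total
hits_per_bait = total / 33

print(total)                     # 162
print(round(n_tag_fraction, 2))  # 0.77
print(round(c_tag_fraction, 2))  # 0.27
print(round(hits_per_bait, 1))   # 4.9
```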
Prey Reproducibility
Sample: 42 baits, 190 preys
Note: ~50% of C-tag hits have a reproducibility rate of 1.0.
Lesson: Improve immunoprecipitation conditions.
Question: How many trials are needed to see a prey 2 times?
[Figure: Observed reproducibility rate. Fraction of hits (y-axis, 0 to 0.5) vs. reproducibility rate (x-axis, 0 to 1), with N, C, and average series. Labelled values: 0.01, 0.02, 0.07, 0.39, 0.01, 0.04, 0.17, 0.31 at rates 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0.]
[Figure: Number of trials needed to observe a prey 2+ times. Fraction of hit pool (y-axis, 0 to 1) vs. reproducibility rate (x-axis, 0.0 to 1.0), one curve each for 2, 3, 4, 5, and 6 trials.]
Note: if a hit requires 3+ observations, the probability drops to 0.125 (3 trials at rate 0.5).
Planning Trial Size
Theoretical probability of 2+ observations in X trials
Rate   X=2    X=3    X=4    X=5    X=6
0.0    0.00   0.00   0.00   0.00   0.00
0.1    0.01   0.03   0.05   0.08   0.11
0.2    0.04   0.10   0.18   0.26   0.34
0.3    0.09   0.22   0.35   0.47   0.58
0.4    0.16   0.35   0.52   0.66   0.77
0.5    0.25   0.50   0.69   0.81   0.89
0.6    0.36   0.65   0.82   0.91   0.96
0.7    0.49   0.78   0.92   0.97   0.99
0.8    0.64   0.90   0.97   0.99   1.00
0.9    0.81   0.97   1.00   1.00   1.00
1.0    1.00   1.00   1.00   1.00   1.00
(Rate = reproducibility rate; X = number of trials)
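[Figure: coin-toss illustration, rate = 0.5. 2 trials: HH, HT, TH, TT; 1 of 4 outcomes has 2+ heads (0.25). 3 trials: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT; 4 of 8 outcomes have 2+ heads (0.50).]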
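The coin-toss illustration can be reproduced by brute-force enumeration, which is practical for small numbers of trials. A minimal sketch, assuming every H/T sequence is equally likely (rate = 0.5); the function name `prob_at_least` is illustrative:

```python
from itertools import product


def prob_at_least(k: int, n: int) -> float:
    """Fraction of equally likely H/T sequences of length n containing k or more H."""
    outcomes = list(product("HT", repeat=n))
    favourable = [o for o in outcomes if o.count("H") >= k]
    return len(favourable) / len(outcomes)


print(prob_at_least(2, 2))  # 0.25 (HH only)
print(prob_at_least(2, 3))  # 0.5  (HHH, HHT, HTH, THH)
print(prob_at_least(3, 3))  # 0.125 (HHH only)
```

Enumeration grows as 2^n, so for larger trial counts the binomial formula is the practical route.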
p: prey observation frequency
n: number of trials
k: number of observations (summed from the required minimum of 2 up to n)
Binomial distribution equation:
$$P_{\mathrm{obs}} = \sum_{k=2}^{n} \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$
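The binomial equation can be evaluated directly to rebuild the Planning Trial Size table. A minimal sketch using Python's `math.comb`; the function name `p_obs` and the `k_min` parameter are illustrative:

```python
from math import comb


def p_obs(p: float, n: int, k_min: int = 2) -> float:
    """Binomial probability of k_min or more observations in n independent trials,
    where p is the per-trial observation frequency (reproducibility rate)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


trials = [2, 3, 4, 5, 6]
rates = [i / 10 for i in range(11)]

print("rate " + " ".join(f"X={n:<3}" for n in trials))
for p in rates:
    print(f"{p:<4.1f} " + " ".join(f"{p_obs(p, n):5.2f}" for n in trials))
```

Cells that fall exactly on a rounding boundary (for example 0.6875 and 0.8125 at rate 0.5) may print 0.01 away from the slide, because Python rounds exact ties to even.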
Predicted fraction of observed prey pool found in X trials
(Observed = fraction of the observed prey pool at each reproducibility rate)
Rate    Observed   X=2    X=3    X=4    X=5    X=6
0.0     0.00       0.00   0.00   0.00   0.00   0.00
0.1     0.00       0.00   0.00   0.00   0.00   0.00
0.2     0.01       0.00   0.00   0.00   0.00   0.00
0.3     0.02       0.00   0.00   0.01   0.01   0.01
0.4     0.07       0.01   0.02   0.03   0.04   0.05
0.5     0.39       0.10   0.19   0.27   0.31   0.34
0.6     0.01       0.00   0.01   0.01   0.01   0.01
0.7     0.04       0.02   0.03   0.04   0.04   0.04
0.8     0.17       0.11   0.15   0.16   0.17   0.17
0.9     0.00       0.00   0.00   0.00   0.00   0.00
1.0     0.31       0.31   0.31   0.31   0.31   0.31
Total   1.00       0.55   0.72   0.83   0.89   0.93
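The predicted fraction found in X trials appears to be the observed fraction at each reproducibility rate weighted by the binomial probability of 2+ observations. A sketch under that assumption, also computing the complement (fraction NOT found). It uses the rounded slide values, which sum to 1.02, so totals can differ from the slide by about 0.01 to 0.02. The names `observed`, `predicted_found`, and `predicted_missed` are illustrative.

```python
from math import comb


def p_obs(p: float, n: int, k_min: int = 2) -> float:
    # Binomial probability of k_min+ observations in n trials.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Observed fraction of the prey pool at each reproducibility rate (rounded slide values;
# rates with a fraction of 0.00 are omitted).
observed = {0.2: 0.01, 0.3: 0.02, 0.4: 0.07, 0.5: 0.39,
            0.6: 0.01, 0.7: 0.04, 0.8: 0.17, 1.0: 0.31}


def predicted_found(n: int) -> float:
    """Expected fraction of the prey pool reaching 2+ observations in n trials."""
    return sum(frac * p_obs(rate, n) for rate, frac in observed.items())


def predicted_missed(n: int) -> float:
    """Expected fraction of the prey pool NOT reaching 2+ observations in n trials."""
    return sum(frac * (1 - p_obs(rate, n)) for rate, frac in observed.items())


for n in range(2, 7):
    print(n, f"{predicted_found(n):.2f}", f"{predicted_missed(n):.2f}")
```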
Lessons:
• Identifies suspect data
• Improving the reproducibility rate reduces the number of trials needed.
False Negative Rate
[Figure: Predicted false negative rate. Fraction of hit pool (y-axis, 0 to 1) vs. reproducibility rate (x-axis, 0.0 to 1.0), one curve each for 2, 3, 4, 5, and 6 trials.]
Theoretical probability of NOT observing 2+ in X trials
Rate   X=2    X=3    X=4    X=5    X=6
0.0    1.00   1.00   1.00   1.00   1.00
0.1    0.99   0.97   0.95   0.92   0.89
0.2    0.96   0.90   0.82   0.74   0.66
0.3    0.91   0.78   0.65   0.53   0.42
0.4    0.84   0.65   0.48   0.34   0.23
0.5    0.75   0.50   0.31   0.19   0.11
0.6    0.64   0.35   0.18   0.09   0.04
0.7    0.51   0.22   0.08   0.03   0.01
0.8    0.36   0.10   0.03   0.01   0.00
0.9    0.19   0.03   0.00   0.00   0.00
1.0    0.00   0.00   0.00   0.00   0.00
(Rate = reproducibility rate; X = number of trials)
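For the 2+ threshold, the probability of a false negative (fewer than 2 observations in n trials) has a closed form, (1-p)^n + n·p·(1-p)^(n-1): the chance of zero observations plus the chance of exactly one. A small sketch comparing it with 1 minus the binomial sum; the function names are illustrative:

```python
from math import comb


def p_miss(p: float, n: int) -> float:
    """Probability of fewer than 2 observations in n trials (false negative for a 2+ hit rule)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


def p_miss_by_sum(p: float, n: int, k_min: int = 2) -> float:
    # Same quantity computed as 1 minus the binomial upper tail, as a cross-check.
    return 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for p in (0.3, 0.5, 0.8):
    print(p, [round(p_miss(p, n), 2) for n in range(2, 7)])
```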
Predicted fraction of prey pool NOT found in X trials
(Observed = fraction of the observed prey pool at each reproducibility rate)
Rate    Observed   X=2    X=3    X=4    X=5    X=6
0.0     0.00       0.00   0.00   0.00   0.00   0.00
0.1     0.00       0.00   0.00   0.00   0.00   0.00
0.2     0.01       0.00   0.00   0.00   0.00   0.00
0.3     0.02       0.01   0.01   0.01   0.01   0.01
0.4     0.07       0.05   0.04   0.03   0.02   0.01
0.5     0.39       0.29   0.19   0.12   0.07   0.04
0.6     0.01       0.01   0.00   0.00   0.00   0.00
0.7     0.04       0.02   0.01   0.00   0.00   0.00
0.8     0.17       0.06   0.02   0.01   0.00   0.00
0.9     0.00       0.00   0.00   0.00   0.00   0.00
1.0     0.31       0.00   0.00   0.00   0.00   0.00
Total   1.00       0.45   0.28   0.17   0.11   0.07
Lesson:
• 1 or 2 trials provide a highly incomplete dataset.
Predicted False Positive Rate
[Figure: Predicted false positive rate vs. database frequency. False positive frequency (y-axis, 0 to 1) vs. PathMap (global) observation frequency (x-axis, 0 to 1), one curve each for 2, 3, 4, 5, and 6 trials.]
Method:
  - Determine prey frequency in the database
  - Assume background proteins have a uniform random distribution
  - Assume background does not change with time or experimental conditions
  - Compare prey frequency to the predicted observation rate
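$$P_{\mathrm{risk}}(p) = \sum_{k=2}^{n} \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$
$$E_{\mathrm{falsepositive}}(\mathrm{cutoff}) = \sum_{p=0}^{\mathrm{cutoff}} P_{\mathrm{risk}}(p)\, \mathrm{Numhits}(p)$$
p: prey observation frequency
n: number of trials
k: number of observations (summed from the required minimum of 2 up to n)
P_risk(p): probability that a prey with frequency p is observed 2+ times by chance
E_falsepositive: expected number of false positives
cutoff: frequency cutoff
Numhits(p): number of hits at each prey observation frequency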
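The two false-positive equations can be sketched as below, assuming background proteins appear uniformly at random with their database frequency p. The `num_hits` distribution is invented for illustration only; real values would come from the PathMap database. Function names are illustrative.

```python
from math import comb


def p_risk(p: float, n: int, k_min: int = 2) -> float:
    """Chance that a background protein present in a fraction p of all IPs is seen
    k_min or more times in the n IPs of one bait, assuming uniform random background."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def expected_false_positives(num_hits: dict[float, int], n: int, cutoff: float) -> float:
    """E_falsepositive(cutoff): sum of P_risk(p) * Numhits(p) over frequencies p <= cutoff."""
    return sum(p_risk(p, n) * count for p, count in num_hits.items() if p <= cutoff)


# Risk for a prey right at the 5% database-frequency cutoff, for 2 to 6 trials.
for n in range(2, 7):
    print(n, round(p_risk(0.05, n), 4))

# Hypothetical Numhits(p): number of hits at each database frequency (illustrative only).
num_hits = {0.01: 60, 0.02: 50, 0.03: 40, 0.04: 25, 0.05: 15}
print(round(expected_false_positives(num_hits, n=4, cutoff=0.05), 2))
```

At p = 0.05, P_risk rises from 0.0025 with 2 trials to about 0.033 with 6 trials, which is consistent with the 5% cutoff lying in the < 0.05 region.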
Figure annotations: 5% frequency cutoff; false positive frequency < 0.05 (“safe” region).
Estimated Experimental False Positive Rate
Random sampling method:
  - Randomly reassign bait labels for each IP for all 49 baits; repeat
  - Obtain 3-, 4-, and 5-trial sets, 49 baits each, with preys randomly assigned to a bait (5% database frequency)
  - Assume a random distribution (no relation between baits)
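The random sampling method can be sketched as a label-shuffling experiment: IPs are pooled, randomly reassigned to pseudo-baits with n IPs each, and any prey seen 2+ times within a pseudo-bait counts as a false positive. A minimal sketch; the input format (each IP as a set of prey names), the function name, and the generated example data are all assumptions.

```python
import random
from collections import Counter


def shuffled_reproduced_hits(ips: list[set[str]], n_trials: int, n_baits: int = 49,
                             min_obs: int = 2, seed: int = 0) -> int:
    """Randomly assign IPs to n_baits pseudo-baits (n_trials IPs each) and count
    prey/pseudo-bait pairs observed min_obs or more times. With no real relation
    between IPs in a group, every such pair is a false positive."""
    needed = n_baits * n_trials
    if len(ips) < needed:
        raise ValueError(f"need at least {needed} IPs, got {len(ips)}")
    rng = random.Random(seed)
    pool = rng.sample(ips, needed)  # random relabelling, no IP used twice
    reproduced = 0
    for b in range(n_baits):
        group = pool[b * n_trials:(b + 1) * n_trials]
        counts = Counter(prey for ip in group for prey in ip)
        reproduced += sum(1 for c in counts.values() if c >= min_obs)
    return reproduced


# Hypothetical input: 49 baits x 8 IPs (N and C tags, 4 each), each IP listing
# 5 preys drawn from a pool of 190 that passed the 5% database-frequency filter.
gen = random.Random(1)
prey_pool = [f"prey{i}" for i in range(190)]
ips = [set(gen.sample(prey_pool, 5)) for _ in range(49 * 8)]
for n in (3, 4, 5):
    print(n, shuffled_reproduced_hits(ips, n))
```

Because this synthetic data really is random, it illustrates only the procedure. The observed counts below exceed the calculated ones precisely because real IPs are not independent.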
# trials   Observed reproduced hits (false positives)   % (n=190)   Calculated number of false positives
3          1                                            1%          < 1
4          6                                            3%          < 2
5          7                                            4%          < 3
Results:
  - False positive rate 2-3X greater than calculated
  - Non-uniform distribution
Reasons:
  - Not independent experiments
  - Non-random: baits are related
  - Cross-contamination; equipment contamination
Managing False Positives
1- Control subtraction
  - empty vector immunoprecipitation
  - irrelevant protein immunoprecipitation
2- Reproducibility
  - 2+ times
  - 3-4 biological replicates
3- Database frequency
  - observation frequency cutoff
4- Prioritization
  - annotation
5- Validation
  - reciprocal immunoprecipitation
  - co-expression
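Two of these strategies, control subtraction and reciprocal immunoprecipitation, can be sketched with plain set operations. This assumes hits are stored as a mapping from bait to the set of preys it pulled down; the protein names, including the background binder `BG1`, are hypothetical.

```python
def subtract_controls(hits: dict[str, set[str]], control_preys: set[str]) -> dict[str, set[str]]:
    """Drop preys that also appear in control IPs (e.g. empty vector or irrelevant protein)."""
    return {bait: preys - control_preys for bait, preys in hits.items()}


def reciprocal_pairs(hits: dict[str, set[str]]) -> set[frozenset[str]]:
    """Pairs confirmed in both directions: A pulls down B and B, used as bait, pulls down A."""
    return {
        frozenset((a, b))
        for a, preys in hits.items()
        for b in preys
        if a in hits.get(b, set())
    }


# Hypothetical data: bait -> preys observed; "BG1" stands in for a background binder
# that also appears in the empty-vector control.
hits = {"A": {"B", "C", "BG1"}, "B": {"A", "BG1"}, "C": {"D"}}
controls = {"BG1"}

clean = subtract_controls(hits, controls)
print(clean["A"] == {"B", "C"})  # True
print(reciprocal_pairs(clean))
```

Only the A/B pair is confirmed. C pulls down D, but D was never used as a bait, so that interaction stays unvalidated.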
Human Pathway Pilot Project
Contract design:
  - 20 baits, chosen by the customer (17 actually provided)
  - N & C FLAG tags, constructed by MDSP
  - Report all observed interactions
Additional design parameters:
  - Expressed and immunoprecipitated 4 times each
  - Report all interactions classified as hits
TNF pathway:
  - Proinflammatory cytokine expressed mainly by activated monocytes and macrophages
  - Highly studied
    - Pathway members provide ready availability of baits
    - Understanding incomplete, providing opportunity for discovery
  - Disease involvement: tumor progression and killing; diabetes; infection; inflammation
  - Pharmaceutical potential: find protein targets that perform isolated TNF functions without side effects
[Figure: TNFα Pathway: Inflammation/Cancer. Interaction network linking TNFα bait proteins, other TNFα pathway proteins, and prey proteins; labelled pathway proteins include TNFr, Fas, FADD, TRAF2, TRAF3, TRAF6, NIK, IKK-1, IKB, NF-kB, and A20. Preys are grouped by function (cell death, transcription, transcriptional regulation, protein sorting/targeting, protein transport, DNA repair/damage, endocytosis regulation, cell cycle control, NK cell function, others). Legend: TNFα bait protein; other TNFα pathway protein; prey protein; interactions with bait protein; activation; inhibition; causal (indirect) interactions with preys.]
- 17 baits
- Both N & C tags
- 4 immunoprecipitations
TNF Pathway Project Summary
Bait information                          Number   Comment
Baits                                     17
Membrane baits                            3
Expressed                                 14       2 not expected to express
Membrane baits expressed                  2
Baits with interactions                   13
Expressed baits with no TNF context       7
Bait/prey information                     Number   Comment
Preys                                     99
Known interactions                        13
New interactions                          86
Baits placed in context                   5
New bait/prey/bait linkages               4        also observed 1 known linkage
Prey information                          Number   Comment
Enzymes                                   37
Proteins in druggable families            20+      protease, GTPase, ATPase, kinase, phosphatase, receptor
Proteins with no known function           13       6 enzymes, 1 receptor
Hypothetical proteins                     4
Transmembrane (TM) domain proteins        15       7 TM: 1 (receptor?)
Potential plasma membrane proteins        8        others ER or mitochondrial
Potential antibody targets
Genes Regulating Cell Growth and Division
Paul Jorgensen, Joy L. Nishikawa, Bobby-Joe Breitkreutz, Mike Tyers. Systematic identification of pathways that couple cell growth and division in yeast. Science 297: 395-400, 2002.
Program in Molecular Biology and Cancer, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
Genetic Screen for Yeast Size Mutants
[Figure: cell-size profiles of wild type vs. large (lge) and small (whi) mutants; cell volume (fL) on the x-axis, 10 to 110.]
4812 strains (~2 yrs)
[Figure: SFP1-regulated genes. Expression in sfp1 and GAL-SFP1 vs. WT (color scale -5 to 5), grouped as: GAL genes (10), nucleotide biosynthesis (12), tRNA synthetases (6), ribosome biogenesis (21), RNA polymerases I and III (10), nucleolus (29), translation initiation and elongation (17), ribosomal protein genes (136).]
Yeast Interaction Map
Ho et al. (2002) Nature 415, 180-3. FLAG IP → LC-MS/MS
  - 725 bait attempts
  - 493 baits → 1578 preys
  - 646 unannotated preys
Protein interactions
Overlap of Genetic, Expression & Interaction Data
[Figure: Nucleolar network. Legend: common mRNA regulation; genetic interaction.]
Gene Regulation in Breast Cancer
98 breast tumors × 25,000 genes
[Figure: reporter gene sets of 430 (BRCA1), 231 (prognostic), and 2460 (ER) genes]
van’t Veer et al. (2002) Nature 415, 530-6.
“genes that are overexpressed in tumors with a poor prognosis profile are potential targets for the rational development of new cancer drugs”
Proteins in the functional pathways of disease-associated genes may identify additional or better therapeutic targets.
Overlap of PathMap and Breast Cancer Genes
van’t Veer et al. (2002) Nature 415, 530-6.
Reporter     Rosetta genes   MDSP primary: genes   %      enzymes   MDSP secondary: genes   enzymes
ER           2460            194                   8%     42        515                     87
BRCA1        430             27                    6%     7         208                     38
Prognostic   231             28                    12%    9         27                      4
Protein Networks in Prognosis Reporters
[Figure: protein networks around prognosis reporter genes. Legend: up-regulated, down-regulated, enzyme; labelled counts include 55, 35, 4, and 16.]
Interaction network provides context.
Integrated Genomic/Proteomic Breast Cancer Project
van’t Veer et al. (2002) Nature 415, 530-6.
Reporter     # of genes
ER           2460
BRCA1        430
Prognostic   231
• Profile gene expression changes during tumor progression
• Assemble experimental gene set:
  - genes with expression changes
  - genes suspected in breast cancer progression
• Perform IP-MS to determine interacting proteins
• Analyze for regulatory networks and critical pathways