The Critical Role of Mass Spectrometry
in a Proteomics Core Facility:
“Can you validate my western blot?”
J. Will Thompson
Sr. Laboratory Administrator
Duke Proteomics Core Facility
Duke Institute for Genome Sciences & Policy
Duke School of Medicine
“Can you Validate My Western Blot?”
One of the simplest, yet most important and impactful, roles of a mass spectrometry proteomics core in a Biochemistry/School of Medicine setting is to verify data acquired with classical techniques.
IL-28a?
IL-28A (Interferon λ-2)
MKLDMTGDCTPVLVLMAAVLTVTGAVPVARLHGALPDARGCHIAQFKSLSPQELQAFKRAKDALEESLLLKDCRCHSRLFPRTWDLRQLQVRERPMALEAELALTLKVLEATADTDPALVDVLDQPLHTLHHILSQFRACIQPQPTAGPRTRGRLHHWLYRLQEAPKKESPGCLEASVTFNLFRLLTRDLNCVASGDLCV
Red – peptides identified by Mascot
Blue – residues unique to IL-28A (versus IL-28B)
ELISA Standard (100 ng)
MW Markers
Carrier Proteins (BSA, etc.)
Duke Proteomics Core Facility
• Established Summer 2007 by Duke School of Medicine and Duke Institute for Genome Sciences & Policy (Arthur Moseley, Director)
• Now 5 full-time staff (3 Ph.D., 2 bachelor's)
• Major Hardware
– 5 nanoACQUITY UPLCs, 1 with 2D technology
– 3 QToFs (Global Ultima, Premier, Synapt HDMS)
– 1 Xevo TQ
– 1 LTQ Orbitrap XL (HHMI)
– Mesoscale Discovery 2400 Imager
• Informatics Infrastructure
– 28 Terabytes of NetApp storage
– 10-blade IBM Mascot server
– Dell R900 Server for Rosetta Elucidator
– Desktop workstations:
• PLGS 2.4
• Mascot Daemon
• Mascot Distiller
• Scaffold
• VerifyE
Erik, Arthur, Will, Laura, Meredith
Challenges and Opportunities for Mass-Spectrometry Based Proteomics
• Clinical Proteomics / Biomarker Discovery
– “Large” clinical-based studies where QC metrics must be tightly controlled, deliverables are well-defined and expected, and data must be of high quality
– “Discovery and Validation of a Serum Proteomic Signature of Response to Interferon Therapy in Chronic HCV Infection”
• Translational Research
– Medium to large-scale studies requiring cutting-edge but robust technology, with longer timelines and more flexible end deliverables
– “Spatial Proteomic Tissue Profiling of the Photoreceptor Rod Cell using Cryosectioning and Mass Spectrometry”
• Basic Research
– Highly collaborative small to medium-scale studies where new technologies can be tested and ultimately deployed, with loose timelines and where hypothesis generation is often a key goal
– “A Proteomics Approach to Dissect Lipid Droplet – Chlamydia Interactions”
Quantitative Serum Protein Mass Spectrometry in the Duke Proteomics Core Facility
Sample preparation and acquisition:
• Serum/Plasma Sample
• HAP Depletion (depleted proteome)
• Digest
• Nanoscale UPLC
• Q-ToF Mass Spectrometry (MSE or MS/MS): High Resolution Accurate Mass Measurements of Precursor Ions and Product Ions
• Automated data transfer to NetApp enterprise data storage
Quantitative Pipeline:
• Image conversion, image alignment and quantitative analysis (Rosetta Elucidator)
Qualitative Pipeline:
• Automated translation to DB searchable format (.xml, .mgf)
• Database search of product ion spectra (Matrix Science Mascot or Waters’ IdentityE)
• Peptide ID quality scoring and translating peptides to proteins (Rosetta Elucidator or Proteome Software’s Scaffold)
Integration of quantitative and qualitative data (Rosetta Elucidator)
Peptide and Protein Quantitation
Data Acquisition for Biomarker Discovery
Day 1 (+): Instrument Performance Checks, Column Conditioning, Preliminary database searches
Day 2: Data Collection
Day 3: Data Collection
Run order: Column Condition, QC, Sample 1 to Sample 10, QC 2, Sample 11, Sample 12, Sample 13, ...
Day X: Data Collection
Run order: QC X-1, Sample X-5, Sample X-4, Sample X-3, Sample X-2, Sample X-1, Sample X, QC X, ...
• Strategy is to maximize biological power by analyzing as many samples as possible
• Robust LC-MS platform allows singlicate analysis of each sample
• Data QC is performed by daily injections of a “standard” of the same biofluid (Bioreclamation, Inc.)
• Need a higher-throughput platform with the same or better analytical metrics
Association of LC/MS “Features (Isotopes)”
Treated / Control
Raw Data
Ratio Data
Ratio Builder
Combined Data
Combined Data Builder
Aligned Data
PeakTeller
PeptideTeller Results (Keller et al., ISB)
Peptide Annotation with Multiple Search Engines
Mascot Searches (semitryptic)
PLGS 2.4 Searches (tryptic)
A common score is assigned based on decoy database validation, allowing annotation with multiple search engines simultaneously with controlled FDR
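As an illustration of decoy-based validation, the sketch below estimates the false discovery rate (FDR) at a score threshold as the ratio of decoy hits to target hits, then picks the most permissive threshold that keeps the estimate under a chosen limit. It is a minimal sketch, not the PeptideTeller algorithm: the function names, the `(score, is_decoy)` input layout, and the example scores are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Each peptide-spectrum match is (score, is_decoy); a higher score is a better match.
# This layout and the example values below are hypothetical.
PSM = Tuple[float, bool]


def decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR among target hits scoring at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    candidates = [s for s, _ in psms if decoy_fdr(psms, s) <= max_fdr]
    return min(candidates) if candidates else None


example_psms = [
    (60, False), (55, False), (50, False), (45, True),
    (40, False), (35, False), (30, True), (25, False),
]
cutoff = score_cutoff_for_fdr(example_psms, max_fdr=0.2)
print(cutoff, decoy_fdr(example_psms, cutoff))
```

Because the estimate is computed from decoy counts rather than from an engine-specific score scale, the same procedure can in principle be applied to the output of each search engine separately, which is one way scores from different engines can be placed on a common footing.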
ProteinTeller Statistics
Merged Protein Annotation with PLGS 2.4 / Mascot v 2.2 searches
“Typical” Protein Annotation Metrics, Plasma Proteomics Study (~30+ patients)
• 3944 Peptides to 302 Proteins, single dimension of LC-MS analysis (2 hrs/sample)
– 3768 Peptides to 104 proteins (with 2+ peptides)
APOB_HUMAN, P04114
CO4B_HUMAN, P0C0L5
CERU_HUMAN, P00450
FINC_HUMAN, P02751
CFAH_HUMAN, P08603
CFAB_HUMAN, P00751
PLMN_HUMAN, P00747
ITIH2_HUMAN, P19823
CO5_HUMAN, P01031
APOA4_HUMAN, P06727
HEMO_HUMAN, P02790
VTDB_HUMAN, P02774
ITIH4_HUMAN, Q14624
AACT_HUMAN, P01011
AFAM_HUMAN, P43652
ANT3_HUMAN, P01008
A1BG_HUMAN, P04217
KNG1_HUMAN, P01042
ITIH1_HUMAN, P19827
THRB_HUMAN, P00734
Peptide Distribution
Metrics for file delivered for stats analysis:
• 33,862 total Isotope Groups
• 5065 annotated Isotope Groups
• 3944 unique peptide sequences
• 302 unique proteins
Reproducibility of Plasma Datasets
Panels: Analytical Variability; Analytical + Biological Variability
25% CV
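For reference, the coefficient of variation (CV) quoted for reproducibility is the standard deviation expressed as a percentage of the mean. The sketch below computes it for one feature measured across repeated injections; the function name and the intensity values are hypothetical and are not taken from the plasma dataset.

```python
import statistics
from typing import Sequence


def percent_cv(intensities: Sequence[float]) -> float:
    """Percent coefficient of variation: 100 * sample standard deviation / mean."""
    mean = statistics.mean(intensities)
    sd = statistics.stdev(intensities)  # sample standard deviation (n - 1)
    return 100.0 * sd / mean


# Hypothetical intensities of one isotope group across three QC injections.
qc_injections = [90.0, 100.0, 110.0]
print(round(percent_cv(qc_injections), 1))
```

Computing this per isotope group on the repeated QC injections gives the analytical variability, while the same calculation across patient samples reflects analytical plus biological variability.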
Hepatitis C Infection
75% Have Chronic infection
Eligible for Treatment (SOC = IFN/Ribavirin)
Responders / Non-responders (>50%)
Dyslipidemia, Chronic Insulin Resistance, Steatosis, Hepatic Fibrosis, Liver Cancer
Discovery Proteomics Focus, MURDOCK Horizon 1
“Start with an unmet clinical need”
Hepatitis C Virion
Clinical Biomarker / Clinical Diagnostic
3169 CHC patients, Duke Hepatology Database & Biorepository
The Classical Biomarker Discovery Paradigm: Application to Hepatitis C
• Biomarker Discovery: 10,000s of analytes, 10s of samples; Open Platform LC/MS
• Biomarker Verification: 10-100 analytes, 100-1,000 samples; LC/MS/MS (MRM), Antibody-based Assays
• Biomarker Validation: 10 analytes, 1,000s of samples; Antibody-based Assays, LC/MS/MS (MRM)
      G1   G2   G3
R     10    5    5
NR    10    -    -
Discovery cohort (n=30)
Discovery Cohort 2 (n=30)
Verification cohort 2 (n=250)
Verification cohort 3 (n=177, Industry Collaborator Clinical Trial)
Verification cohort 1 (n=41)
(March 2008)
(July 2008)
(August 2009)
Hypothesis Testing in Initial HCV 55-Patient Dataset
• Traditional t-Test/ANOVA, no statistically significant individual species
– Non-parametric test
– Minimum p-value: 7.5 x 10^-5 (not passing Bonferroni)
– Binary regression with best prediction
Model Fitting with Best Single Isotope Group Leave-One-Out Cross Validation
Traditional Hypothesis-Test is Not Powerful Enough to Extract Signal from Noise
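The Bonferroni statement can be checked with simple arithmetic: the per-test threshold is the family-wise significance level divided by the number of tests. The sketch below assumes a family-wise level of 0.05 and one test per isotope group, using the roughly 35,000 isotope groups reported for this dataset; both are assumptions about how the test was set up, not details given on the slide.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05               # assumed family-wise significance level
n_isotope_groups = 35_000  # assumed number of tests, one per isotope group
min_p_value = 7.5e-5       # minimum p-value reported above

threshold = bonferroni_threshold(alpha, n_isotope_groups)
passes = min_p_value <= threshold
print(f"threshold = {threshold:.2e}, passes = {passes}")
```

Under these assumptions the threshold is about 1.4 x 10^-6, so the best single isotope group misses it by more than an order of magnitude.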
Sparse Latent Factor Regression (Bayesian Factor Regression Modeling, BFRM)
35,000 Isotope Groups
Predictive Factor “Metaproteins”
Factor Score “Expression Value”
Statistical Analysis: Joe Lucas, PhD, Duke Institute for Genome Sciences and Policy
• Regression - Leads directly to prediction
• Sparsity – Many isotopes are irrelevant
• Latent Factors – let data determine important relationships
• Resulting model for prediction:
• 3 Metaproteins, 650 Isotope Groups
Latent Factors which contain Biological Information
Gender Differences ????? Differences
Transplant patient
Latent Factors which contain Biological Information
Ethnicity Drinking History
Cross-Validation Results, Predicting HCV Treatment Response (n=55)
Demographics: Race, Gender, Genotype and Viral Load
Metaprotein Factors
Metaprotein Factors + Demographics
AUROCs
• Demographic Factors: 0.69
• Metaprotein Factors: 0.84
• Both Factors: 0.89
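The AUROC values can be read as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The sketch below computes that quantity directly by pairwise comparison, counting ties as half; the function name, scores, and labels are hypothetical and do not reproduce the study data.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (label 1 = responder)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical cross-validated predictions for four patients.
example_scores = [0.9, 0.3, 0.8, 0.2]
example_labels = [1, 1, 0, 0]
print(auroc(example_scores, example_labels))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from non-responders.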
Independent Verification with 41 New HCV Patients
Meta-protein, training and verification cohort 1
Data Fit for Metaprotein Predictors
Alignment Challenges
• Model is based on 9160 isotope groups
• We must match these to new data
– Restrict to identical peptides with identical charge state
– 1997 matches
• Estimate factor scores
– Project the loadings of just these 1997
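The matching and projection steps above can be sketched as follows. Features are matched on an identical (peptide sequence, charge state) key, and factor scores for the new samples are then estimated by ordinary least squares using only the loadings of the matched features. This is a sketch under stated assumptions, not the BFRM projection used in the study: least squares is a stand-in for the actual estimator, and the keys, loadings, and data below are made up for illustration.

```python
import numpy as np


def match_features(model_keys, new_keys):
    """Return (model_index, new_index) pairs for identical (peptide, charge) keys."""
    new_index = {key: j for j, key in enumerate(new_keys)}
    return [(i, new_index[key]) for i, key in enumerate(model_keys) if key in new_index]


def project_factor_scores(loadings, model_keys, new_data, new_keys):
    """Estimate factor scores for new samples from the matched features only.

    loadings: (n_model_features, n_factors); new_data: (n_new_features, n_samples).
    Returns an array of shape (n_factors, n_samples).
    """
    pairs = match_features(model_keys, new_keys)
    model_idx = [i for i, _ in pairs]
    new_idx = [j for _, j in pairs]
    matched_loadings = loadings[model_idx, :]
    matched_data = new_data[new_idx, :]
    scores, *_ = np.linalg.lstsq(matched_loadings, matched_data, rcond=None)
    return scores


# Hypothetical model: four features, two factors.
model_keys = [("AAK", 2), ("CCR", 2), ("DDK", 3), ("EEK", 2)]
loadings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])

# Hypothetical new dataset: three features match, one ("ZZK") does not.
true_scores = np.array([[1.0, 2.0], [0.5, -1.0]])  # (factors, samples)
rows = {key: loadings[i] @ true_scores for i, key in enumerate(model_keys)}
new_keys = [("DDK", 3), ("AAK", 2), ("EEK", 2), ("ZZK", 2)]
new_data = np.vstack([rows[("DDK", 3)], rows[("AAK", 2)], rows[("EEK", 2)], [9.0, 9.0]])

estimated = project_factor_scores(loadings, model_keys, new_data, new_keys)
print(np.round(estimated, 3))
```

In this noise-free toy case the factor scores are recovered from three of the four features. With real data, dropping most of the features can be expected to make the estimates noisier.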
Accuracy after Projection
• Original model
• Use factors from projection onto 1997
• Same (training) samples
Discovery Cohort Predictions Entire Model
Discovery Cohort Predictions Only Peptides Available in Verification Data
Adapting to Projection
• Model averaging
– Stochastic search
– Models that work with the projections
– Throws out poorly performing models
• Use this limited set to predict new 41
Blinded Prediction of Treatment Response
• Sensitivity: .78
• Specificity: .8
• PPV: .89
• NPV: .67
Difficult to set cutoff due to “batch effects”
• Sensitivity: .92
• Specificity: .8
• PPV: .89
• NPV: .88
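For readers less familiar with these four metrics, the sketch below shows how each is derived from the counts of true and false positives and negatives once a cutoff has been applied to the predicted scores. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they do not reproduce the values reported above.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly predicted
        "specificity": tn / (tn + fp),  # non-responders correctly predicted
        "ppv": tp / (tp + fp),          # predicted responders who responded
        "npv": tn / (tn + fn),          # predicted non-responders who did not respond
    }


# Hypothetical counts for a 20-patient blinded set.
metrics = diagnostic_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Moving the cutoff trades sensitivity against specificity, which is why a batch-dependent shift in the predicted scores makes a fixed cutoff hard to set.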
Moving Past Biomarker Discovery (HCV)
Secondary Questions (and Strategy)
• Metaprotein predictors have been used to verify a predictive signature for HCV Tx response in an independent cohort
• Is this signature real and predictive?
– More samples; verification/validation cohorts
– Large cohort for validation and large number of peptides – 650
– Immunoassay (we know PTMs are important)
– MRM
– scientifically the best way forward – 650 peptides is an MRM challenge
• What are the Predictive Proteins/Peptides?
– Improve Peptide Annotation in Dataset
• pI Fractionation / Multidimensional LC
• Improvement in DB search algorithms
• Improvements in data alignment algorithms
Spatial Proteomic Tissue Profiling of the Photoreceptor Rod Cell using Cryosectioning and Mass Spectrometry
Tissue Sectioning, Lysis and Digestion
LC-MS data collection and processing (Rod Cell “reassembled” in-silico)
Western Blot Confirmation of Protein Trends
Boris Reidel, Nikolai Skiba, Vadim Arshavsky
Using Rosetta Elucidator to Find Matching Trends at Protein Level
(approximately 750 Proteins quantified, with over 3500 peptides)
Cellular Machinery of the Photoreceptor Cell (Proteins with specific Subcellular Localization)
Cellular Machinery of the Photoreceptor Cell (Protein Translocation)
Light Adjusted Retina / Dark Adjusted Retina
A Proteomics Approach to Dissect Lipid Droplet-Chlamydia Interactions
Hector A. Saka, Raphael Valdivia
Diagram labels: LD, RB, EB, Inclusion, Nucleus, Cytoplasm
LD: Lipid droplet
RB: Reticulate body (non-infectious, metabolically active)
EB: Elementary body (infectious, metabolically inactive)
Hypothesis:
- Chlamydia bacteria utilize lipid droplets to subvert the host immune response
Key Question:
- What are the proteomic changes in the lipid droplet as a function of infection?
Approach:
- Isolate LDs from infected/uninfected cells with density gradient centrifugation
- Analyze proteome
Peptide Level Expression Data
Vimentin shown independently to have its N-terminal domain processed by the bacterial protease CPAF
Kumar Y, Valdivia RH. Cell Host Microbe. 2008 4(2):159-69.
Key Point: Only by mining the data at the peptide level can one understand the underlying biology of this infection
- The N-terminal peptides are changing in expression
Recruitment and Processing of Host Proteins during Chlamydia trachomatis Infection Revealed ONLY at Peptide Level
• Chlamydia-induced subversion of host cell protein function leads to qualitative/quantitative changes in the lipid droplet proteome
– decrease in expression of N-terminal tryptic peptides
– increase in expression of N-terminal semi-tryptic peptides
• Chlamydia ‘co-opts’ the function of structural proteins via protease processing
– stabilizes the inclusion body; minimizes the exposure of the inclusion body contents to host immune-surveillance proteins
– Kumar and Valdivia, Cell Host Microbe. 2008 Aug 14;4(2):159-69.
*Method used to calculate absolute abundance adapted from Silva et al, Mol Cell Proteomics. 2006 Jan;5(1):144-56.
Using Absolute Quantification to Characterize Protein Abundance in Lipid Droplets
(Expression Levels of Top 50 most abundant Proteins shown)*
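The label-free approach of Silva et al. estimates a protein's absolute amount from the average signal of its three most intense tryptic peptides, scaled by the same average for an internal standard spiked at a known amount. The sketch below illustrates that calculation only. How the method was adapted here is not stated, and the protein names, intensities, and 50 fmol standard load are hypothetical.

```python
from typing import Dict, Sequence


def top3_mean(intensities: Sequence[float]) -> float:
    """Mean intensity of the three most intense peptides (fewer if under three exist)."""
    top = sorted(intensities, reverse=True)[:3]
    return sum(top) / len(top)


def absolute_amounts_fmol(
    peptide_intensities: Dict[str, Sequence[float]],
    standard_name: str,
    standard_fmol: float,
) -> Dict[str, float]:
    """Estimate on-column amounts (fmol) relative to a spiked internal standard."""
    # Signal per fmol, calibrated on the internal standard.
    response = top3_mean(peptide_intensities[standard_name]) / standard_fmol
    return {
        protein: top3_mean(values) / response
        for protein, values in peptide_intensities.items()
    }


# Hypothetical peptide intensities; the standard is assumed to be loaded at 50 fmol.
example_intensities = {
    "ADH1_YEAST": [1000.0, 900.0, 800.0, 100.0],
    "VIME_HUMAN": [1800.0, 1800.0, 1800.0, 200.0],
}
amounts = absolute_amounts_fmol(example_intensities, "ADH1_YEAST", standard_fmol=50.0)
print(amounts)
```

Ranking proteins by the resulting amounts gives an abundance ordering of the kind used to select the most abundant lipid droplet proteins.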
Novel Hypothesis Generation Using MSE and Absolute Quantitation
75
25
Sypro Orange
Discussion Points
• Software is generally undervalued with respect to how critical it is for success
• Robust analytical workflows and well-planned experiments (and/or well-curated clinical cohorts) are certainly a winning combination
• Vendor collaboration helps to decrease the time in which new developments can have impact
Key Colleagues and Funding Sources
• Duke Proteomics Core Facility
– Arthur Moseley, Director
– Laura Dubois
– Erik Soderblom
– Meredith Turner
• HCV Project Team
– John McHutchison, PI
– Jeanette McCarthy, co-PI
– Joe Lucas
– Keyur Patel
• Duke Eye Institute
– Vadim Arshavsky, PI
– Nikolai P. Skiba
– Boris Reidel
• Duke Department of Molecular Genetics & Microbiology
– Raphael Valdivia, PI
– Alex Saka
• Industry Colleagues
– Waters Corporation
• Scott Geromanos
• Martha Stapels
• Keith Fadgen
• Jim Langridge
– Rosetta Biosoftware
• Andrey Bondenrenko
• Cindy Chepanoske
• Jon Karakowski
• Andy Keller
• Funding
– Duke School of Medicine
• Sally Kornbluth, Vice-Dean of Research
– Duke Translational Research Institute
• Victoria Christian, COO DTRI
– Duke Comprehensive Cancer Center
– MURDOCK Study (DHMRI)
– NCRR Grant Number 1UL1 RR024128-01 (CTSA)