Top-down characterization of proteins in bacteria with unsequenced genomes
Colin Wynne, Catherine Fenselau
University of Maryland, College Park
Nathan Edwards
Georgetown University Medical Center
2
Microorganism Identification
Important application of mass spectrometry
  Match spectra with sequence for identity
Many bacteria will never be sequenced...
  Pathogen simulants, for example
...but many have: about 1000 to date.
Can we use the available sequence to identify proteins from unsequenced bacteria?
Yes, for some proteins in some organisms!
  Yersinia rohdei, Erwinia herbicola, Enterobacter cloacae
3
Intact protein LC-MS/MS
Crude cell lysate
Capillary HPLC, C8 column
LTQ-Orbitrap XL
Precursor scan: resolution 30,000 @ 400 m/z
Data-dependent precursor selection:
  5 most abundant ions
  10 second dynamic exclusion
  Charge state +3 or greater
CID product ion scan: resolution 15,000 @ 400 m/z
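The data-dependent selection rules above can be sketched in a few lines. This is only an illustration of the logic (top 5 by abundance, charge +3 or greater, 10 second dynamic exclusion), not the instrument's firmware; the `Peak` type, the `select_precursors` name, and the `mz_tol` matching window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    charge: int
    intensity: float


def select_precursors(peaks, scan_time, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.01):
    """Pick up to top_n precursors from one precursor scan.

    `excluded` maps a previously selected m/z to the time (s) it was
    selected; it is updated in place. `mz_tol` is an assumed window.
    """
    # Forget exclusions older than the dynamic exclusion window.
    for mz in [m for m, t in excluded.items() if scan_time - t >= exclusion_s]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - m) <= mz_tol for m in excluded)

    # Keep only sufficiently charged ions that are not currently excluded.
    eligible = [p for p in peaks
                if p.charge >= min_charge and not is_excluded(p.mz)]
    chosen = sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded[p.mz] = scan_time
    return chosen
```

Calling the function once per precursor scan with a shared `excluded` dictionary makes a second scan inside the 10 second window pick the next most abundant eligible ions.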
4
[Figure: total ion chromatograms for the yr_inclusion run (yrohdei, 3/11/2009), full MS (NL 1.66E8) and FTMS MS2 (NL 1.01E7); relative abundance vs. time (min), RT 19.04-25.39]
[Figure: FTMS MS2 spectrum, scans #1937-2437 (RT 19.45-24.36, AV 21, NL 4.80E4); relative abundance vs. m/z, 195-2000; annotation 756.70, +8, MW 6044.11]
CID Protein Fragmentation Spectrum from Y. rohdei
5
Enterobacteriaceae Protein Sequences
Exhaustive set of all Enterobacteriaceae protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and CMR
Plus Glimmer3 predictions on Enterobacteriaceae genomes from RefSeq
  Primary and alternative translation start sites
Filter for intact mass in range 1 kDa to 20 kDa
  253,626 distinct protein sequences, 256 species
Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
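The mass filter and de-duplication step might look like the sketch below. It assumes average (not monoisotopic) masses, an input of `(species, sequence)` pairs, and that sequences with ambiguous residues are skipped; the function names are hypothetical and this is not the RMIDb code.

```python
# Average residue masses (Da) for the 20 common amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(seq):
    """Average mass of an unmodified intact protein."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def filter_by_intact_mass(records, low=1000.0, high=20000.0):
    """Keep distinct sequences in the 1-20 kDa range.

    records: iterable of (species, sequence).
    Returns {sequence: set of species containing it}.
    """
    kept = {}
    for species, seq in records:
        # Assumption: skip sequences with ambiguous residues (X, B, Z, ...).
        if any(aa not in AVG_RESIDUE_MASS for aa in seq):
            continue
        if low <= average_mass(seq) <= high:
            kept.setdefault(seq, set()).add(species)
    return kept
```

Keying the result by sequence keeps one entry per distinct protein while remembering every species that shares it.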
6
ProSightPC 2.0
Product ion scan decharging
  Enabled by high-resolution fragment ion measurements
  THRASH algorithm implementation
Absolute mass search mode
  15 ppm fragment ion match tolerance
  250 Da precursor ion match tolerance
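The two tolerances play different roles: a wide precursor window lets a protein from a related species be considered even when its intact mass differs, while the tight fragment tolerance keeps individual fragment matches specific. A minimal sketch of that logic follows; the function names and the `(name, mass, fragment_masses)` database layout are hypothetical, and ProSightPC's actual scoring is not reproduced.

```python
def within_ppm(observed, theoretical, tol_ppm=15.0):
    """True if observed is within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * tol_ppm * 1e-6


def precursor_candidates(observed_mass, database, tol_da=250.0):
    """database: iterable of (name, theoretical_mass, fragment_masses)."""
    return [entry for entry in database
            if abs(observed_mass - entry[1]) <= tol_da]


def count_fragment_matches(observed_fragments, theoretical_fragments,
                           tol_ppm=15.0):
    """Number of observed fragment masses explained by the candidate."""
    return sum(
        any(within_ppm(o, t, tol_ppm) for t in theoretical_fragments)
        for o in observed_fragments
    )


def rank_candidates(observed_mass, observed_fragments, database):
    """Candidates in the precursor window, best fragment count first."""
    scored = [
        (count_fragment_matches(observed_fragments, entry[2]), entry[0])
        for entry in precursor_candidates(observed_mass, database)
    ]
    return sorted(scored, reverse=True)
```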
"Single-click" analysis of entire LC-MS/MS datafile.
7
[Figure: same total ion chromatograms and CID spectrum as above; annotation 756.70, +8, MW 6044.11]
CID Protein Fragmentation Spectrum from Y. rohdei
Match to Y. pestis 50S Ribosomal Protein L32
8
Identified E. herbicola proteins
30S Ribosomal Protein S19: m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007
Six proteins identified with |Δ| < 0.02
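As a worked example of reading these values, the m/z and charge imply an approximate neutral mass of z × (m/z − proton mass). The slide does not say whether the reported m/z is the monoisotopic or the most abundant isotopic peak, so the result is only approximate; the function name is hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da


def neutral_mass(mz, z):
    """Neutral mass implied by an [M + zH]z+ ion observed at m/z."""
    return z * (mz - PROTON)


# 30S Ribosomal Protein S19: m/z 686.39 at 15+ gives about 10280.7 Da.
s19_mass = neutral_mass(686.39, 15)
```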
9
Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Eight proteins identified with "large" |Δ|
10
Identified E. herbicola proteins
Use "Sequence Gazer" to find mass shift
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 1.91e-58, Δ 0.11
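The idea of localizing a mass shift can be illustrated by trying the shift at every residue and counting how many observed fragments are explained. This is a rough sketch, not ProSightPC's Sequence Gazer: it assumes decharged monoisotopic fragment masses, neutral b = sum of residues and y = sum of residues + water, a 15 ppm tolerance, and hypothetical function names.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def fragment_masses(seq, shift=0.0, site=None):
    """Neutral b- and y-type masses, with `shift` added at index `site`."""
    masses = [MONO[aa] for aa in seq]
    if site is not None:
        masses[site] += shift
    n = len(masses)
    b = [sum(masses[:k]) for k in range(1, n)]
    y = [sum(masses[n - k:]) + WATER for k in range(1, n)]
    return b + y


def count_matches(observed, theoretical, tol_ppm=15.0):
    """Observed masses within tol_ppm of any theoretical fragment."""
    return sum(
        any(abs(o - t) <= t * tol_ppm * 1e-6 for t in theoretical)
        for o in observed
    )


def localize_shift(seq, shift, observed, tol_ppm=15.0):
    """Return (best match count, residue indices achieving it)."""
    scores = [count_matches(observed, fragment_masses(seq, shift, i), tol_ppm)
              for i in range(len(seq))]
    best = max(scores)
    return best, [i for i, s in enumerate(scores) if s == best]
```

With sparse fragment coverage several adjacent sites usually tie, so the returned list is best read as a candidate region rather than a single residue.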
11
Identified E. herbicola proteins
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
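One possible reading of "supported by at least 3 b- or y-ions" is that an N-terminal residue counts as supported when at least three matched b-ions contain it, and likewise for C-terminal residues and y-ions. Under that assumption (the slide does not spell out the rule), the supported prefix length is the third-largest matched b-ion index. The function name and inputs are hypothetical.

```python
def supported_termini(seq, b_indices, y_indices, min_ions=3):
    """Return (N-terminal prefix, C-terminal suffix) of the matched sequence.

    b_indices / y_indices: lengths in residues of the matched b- and y-ions.
    A residue is kept when at least min_ions matched ions contain it.
    """
    def supported_length(indices):
        ranked = sorted(set(indices), reverse=True)
        # The min_ions-th longest ion bounds the region covered min_ions times.
        return ranked[min_ions - 1] if len(ranked) >= min_ions else 0

    n_len = min(supported_length(b_indices), len(seq))
    c_len = min(supported_length(y_indices), len(seq))
    return seq[:n_len], seq[len(seq) - c_len:]
```

For a made-up sequence `"MKTAYIAKQR"` with matched b-ions of length 2, 4, 5, and 7 and only two matched y-ions, this keeps the prefix `"MKTA"` and no C-terminal sequence.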
12
E. herbicola protein sequences
13
E. herbicola sequences found in other species
14
Phylogenetic placement of E. herbicola
Phylogram, Cladogram
phylogeny.fr: "One-Click"
15
Genome Annotation Correction
Serratia proteamaculans: CSR, RPS19
Citrobacter koseri: RPL32
Enterobacter sakazakii: RPS21
RPL30: Enterobacter sakazakii, Sodalis glossinidius, Photorhabdus luminescens*, Erwinia tasmaniensis, Enterobacter sp. 638
Some spectra match Glimmer predictions only!
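A mis-annotated translation start site is one way an annotated protein can disagree with the observed one. The sketch below enumerates the protein variants produced by in-frame alternative start codons within a single open reading frame. It is not Glimmer3's gene-finding algorithm; it assumes uppercase A/C/G/T input, the standard codon assignments (shared by the bacterial table 11), and ATG, GTG, and TTG as candidate starts. The function name is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon table built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed candidate start codons


def alternative_start_proteins(orf):
    """Proteins from every in-frame start codon before the first stop.

    orf: in-frame DNA beginning at the most upstream candidate start.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    stop = next((i for i, c in enumerate(codons) if CODON_TABLE[c] == "*"),
                len(codons))
    proteins = []
    for i in range(stop):
        if codons[i] in START_CODONS:
            # Any initiator codon is translated as methionine.
            body = "".join(CODON_TABLE[c] for c in codons[i + 1:stop])
            proteins.append("M" + body)
    return proteins
```

Each returned variant can then be treated as a separate database entry, so a spectrum that matches only a shorter or longer form points to the start site the data support.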
16
Conclusions
Protein identification for unsequenced organisms.
Identification and localization for sequence mutations and post-translational modifications.
Extraction of confidently established sequence suitable for phylogenetic analysis.
Genome annotation correction.
New paradigm for phylogenetic analysis?
17
Acknowledgements
Dr. Catherine Fenselau, Colin Wynne, Joe Cannon
  University of Maryland Biochemistry
Dr. Yan Wang
  University of Maryland Proteomics Core
Dr. Art Delcher
  University of Maryland CBCB
Funding: NIH/NCI
18
Shared "Biomarker" Proteins
19
Shared "Biomarker" Proteins