Top-down characterization of proteins in bacteria with unsequenced genomes


Colin Wynne, Catherine Fenselau

University of Maryland, College Park

Nathan Edwards

Georgetown University Medical Center

2

Microorganism Identification

Important application of mass spectrometry

Match spectra with sequence for identity

Many bacteria will never be sequenced...

Pathogen simulants, for example

...but many have – about 1000 to date.

Can we use the available sequence to identify proteins from unsequenced bacteria?

Yes, for some proteins in some organisms!

Yersinia rohdei, Erwinia herbicola, Enterobacter cloacae

3

Intact protein LC-MS/MS

Crude cell lysate

Capillary HPLC, C8 column

LTQ-Orbitrap XL

Precursor scan: resolution 30,000 @ 400 m/z

Data-dependent precursor selection: 5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater

CID product ion scan: resolution 15,000 @ 400 m/z
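The data-dependent selection rule on this slide (top 5 ions, charge state +3 or greater, 10 second dynamic exclusion) can be illustrated with a minimal sketch. This is not the instrument's acquisition software: the function name `select_precursors`, the peak tuple layout, and the 0.01 m/z exclusion window are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float, int]  # (m/z, intensity, charge state)


def select_precursors(
    peaks: List[Peak],
    now: float,
    exclusion: Dict[float, float],
    top_n: int = 5,
    min_charge: int = 3,
    exclusion_s: float = 10.0,
    mz_tol: float = 0.01,  # assumed exclusion window, not stated on the slide
) -> List[Peak]:
    """Pick up to top_n precursors from one survey scan.

    `exclusion` maps an excluded m/z to the time (s) at which its
    exclusion expires; it is updated in place.
    """
    # Forget exclusions that have expired.
    for mz in [m for m, expires in exclusion.items() if expires <= now]:
        del exclusion[mz]

    # Keep ions with sufficient charge that are not currently excluded.
    candidates = [
        p for p in peaks
        if p[2] >= min_charge
        and not any(abs(p[0] - m) <= mz_tol for m in exclusion)
    ]

    # Most abundant first, then take the top N.
    candidates.sort(key=lambda p: p[1], reverse=True)
    chosen = candidates[:top_n]

    # Exclude the chosen ions for the next exclusion_s seconds.
    for mz, _, _ in chosen:
        exclusion[mz] = now + exclusion_s
    return chosen
```

Calling the function on successive survey scans with the same `exclusion` dictionary lets less abundant ions be selected while the most abundant ones are temporarily excluded.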

4

[Figure: total ion chromatograms (TIC MS and FTMS MS2) for yr_inclusion, relative abundance vs. time (min), RT 19.04-25.39; averaged CID product ion spectrum, relative abundance vs. m/z (200-2000); precursor 756.70, +8, MW 6044.11]

CID Protein Fragmentation Spectrum from Y. rohdei

5

Enterobacteriaceae Protein Sequences

Exhaustive set of all Enterobacteriaceae protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and CMR

Plus Glimmer3 predictions on Enterobacteriaceae genomes from RefSeq

Primary and alternative translation start-sites

Filter for intact mass in range 1 kDa – 20 kDa

253,626 distinct protein sequences, 256 species

Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
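The intact-mass filter can be sketched as follows. This is an illustration only, not the RMIDb pipeline: it assumes unmodified sequences, uses standard average residue masses, skips sequences containing non-standard residue codes, and the function names `average_mass` and `filter_by_mass` are hypothetical.

```python
WATER = 18.01528  # average mass of H2O in Da

# Standard average residue masses in Da.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def average_mass(sequence):
    """Average mass (Da) of an unmodified protein, or None if the
    sequence contains a residue code outside the 20 standard ones."""
    try:
        return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER
    except KeyError:
        return None


def filter_by_mass(sequences, low=1000.0, high=20000.0):
    """Return the distinct sequences whose intact mass lies in [low, high] Da."""
    kept = set()
    for seq in sequences:
        mass = average_mass(seq)
        if mass is not None and low <= mass <= high:
            kept.add(seq)
    return kept
```

Returning a set mirrors the slide's count of distinct sequences: identical proteins reported by several databases or species collapse to one entry.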

6

ProSightPC 2.0

Product ion scan decharging

Enabled by high-resolution fragment ion measurements

THRASH algorithm implementation

Absolute mass search mode

15 ppm fragment ion match tolerance

250 Da precursor ion match tolerance

"Single-click" analysis of entire LC-MS/MS datafile.

7

[Figure: total ion chromatograms (TIC MS and FTMS MS2) for yr_inclusion, relative abundance vs. time (min), RT 19.04-25.39; averaged CID product ion spectrum, relative abundance vs. m/z (200-2000); precursor 756.70, +8, MW 6044.11]

CID Protein Fragmentation Spectrum from Y. rohdei

Match to Y. pestis 50S ribosomal protein L32

8

Identified E. herbicola proteins

30S Ribosomal Protein S19 m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007

Six proteins identified with |Δ| < 0.02
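The m/z and charge values reported for each identification relate to a neutral precursor mass, and Δ is read here as the difference between observed and database precursor mass, presumably in Da (the slides do not state the unit). A minimal sketch of that arithmetic, with hypothetical function names:

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass (Da) of an ion observed at m/z with z added protons."""
    return z * (mz - PROTON)


def delta(observed_mass, theoretical_mass):
    """Precursor mass difference, observed minus theoretical (Da)."""
    return observed_mass - theoretical_mass


# Values from the slide: m/z 686.39 at 15+ is roughly 10280.74 Da.
s19_mass = neutral_mass(686.39, 15)
```

Under this reading, a small |Δ| indicates that the database sequence accounts for the whole observed mass, while a large |Δ| points to a sequence difference or modification somewhere in the protein.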

9

DNA-binding protein HU-alpha m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Eight proteins identified with "large" |Δ|


10

DNA-binding protein HU-alpha m/z 732.71, z 13+, E-value 1.91e-58, Δ 0.11

Use "Sequence Gazer" to find mass shift


11

DNA-binding protein HU-alpha m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Extract N- and C-terminus sequence supported by at least 3 b- or y-ions


12

E. herbicola protein sequences

13

E. herbicola sequences found in other species

14

Phylogenetic placement of E. herbicola

Phylogram, Cladogram

phylogeny.fr – "One-Click"

15

Genome Annotation Correction

Serratia proteamaculans CSR, RPS19

Citrobacter koseri RPL32

Enterobacter sakazakii RPS21

RPL30: Enterobacter sakazakii, Sodalis glossinidius, Photorhabdus luminescens*, Erwinia tasmaniensis, Enterobacter sp. 638

Some spectra match Glimmer predictions only!

16

Conclusions

Protein identification for unsequenced organisms.

Identification and localization for sequence mutations and post-translational modifications.

Extraction of confidently established sequence suitable for phylogenetic analysis.

Genome annotation correction.

New paradigm for phylogenetic analysis?

17

Acknowledgements

Dr. Catherine Fenselau, Colin Wynne, Joe Cannon (University of Maryland Biochemistry)

Dr. Yan Wang (University of Maryland Proteomics Core)

Dr. Art Delcher (University of Maryland CBCB)

Funding: NIH/NCI

18

Shared "Biomarker" Proteins

