Comparative genomics and metabolic reconstruction of bacterial pathogens
Mikhail Gelfand
Institute for Information Transmission Problems, RAS
GPBM-2004
Metabolic reconstruction
• Identification of missing genes in complete genomes
• Search for candidates
– Analysis of individual genes to assign general function:
  • homology
  • functional patterns
  • structural features
– Comparative genomics to predict specificity:
  • analysis of regulation
  • positional clustering
  • gene fusions
  • phylogenetic patterns
Enzymes
• Identification of a gap in a pathway (universal, taxon-specific, or in individual genomes)
• Search for candidates assigned to the pathway by co-localization and co-regulation (in many genomes)
• Prediction of general biochemical function from (distant) similarity and functional patterns
• Tentative filling of the gap
• Verification by analysis of phylogenetic patterns:
– Absence in genomes without this pathway
– Complementary distribution with known enzymes for the same function
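The two verification checks above can be sketched as set operations over gene-presence profiles. This is a minimal illustration, not the authors' pipeline: it assumes each genome has already been reduced to a set of ortholog labels, and the genome names and the candidate label `candX` used in testing are hypothetical.

```python
from typing import Dict, Set


def check_phylogenetic_pattern(
    profiles: Dict[str, Set[str]],
    pathway_genomes: Set[str],
    known_enzyme: str,
    candidate: str,
) -> Dict[str, Set[str]]:
    """Compare a candidate gene's occurrence with the pathway and a known enzyme.

    profiles: genome name -> set of ortholog labels present in that genome.
    """
    without_pathway = set(profiles) - pathway_genomes
    return {
        # Expected empty: the candidate should be absent where the pathway is absent.
        "candidate_outside_pathway": {
            g for g in without_pathway if candidate in profiles[g]
        },
        # Expected empty: each pathway genome needs the known enzyme or the candidate.
        "gap_unfilled": {
            g for g in pathway_genomes
            if known_enzyme not in profiles[g] and candidate not in profiles[g]
        },
        # Ideally small: a complementary distribution means little overlap.
        "both_present": {
            g for g in pathway_genomes
            if known_enzyme in profiles[g] and candidate in profiles[g]
        },
    }
```

Overlap is reported rather than treated as a failure, since a genome may carry two non-orthologous enzymes for the same step.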
Transporters
• Identification of candidates assigned to the pathway by co-localization and co-regulation (in many genomes)
• Prediction of general function by analysis of transmembrane segments and similarity
• Prediction of specificity by analysis of phylogenetic patterns:
– End product: if present in genomes lacking this pathway (the transporter substitutes for the biosynthetic pathway of an essential compound)
– Input metabolite: if absent in genomes without the pathway (catabolic pathways; also precursors of biosynthetic pathways)
– Entry point in the middle: if the transporter substitutes for an upper or side part of the pathway in some genomes
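The three specificity rules can be written as a small decision procedure. This sketch assumes each genome is annotated with a hypothetical pathway status ("full", "partial" for an upper or side part missing, "none") and a flag for the transporter; the rule order is an illustrative choice, not taken from the talk.

```python
from typing import Dict, Tuple


def predict_transported_compound(genomes: Dict[str, Tuple[str, bool]]) -> str:
    """genomes: genome -> (pathway_status, transporter_present).

    pathway_status is one of the hypothetical labels "full", "partial", "none".
    """
    with_transporter = {status for status, has_t in genomes.values() if has_t}
    if "none" in with_transporter:
        # Present without the pathway: imports the end product instead of making it.
        return "end product"
    if "partial" in with_transporter:
        # Replaces a missing upper or side part: imports an intermediate.
        return "intermediate (entry point in the middle)"
    if with_transporter == {"full"}:
        # Found only together with the complete pathway: imports the input metabolite.
        return "input metabolite"
    return "undetermined"
```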
Missing link in fatty acid biosynthesis in Streptococci
[Figure: fatty acid biosynthesis pathway with genes acpP, fabD, accA, accD, accB, accC, fabH, fabF, fabG, fabZ, fabI]
Gene fabI of enoyl-ACP reductase (EC 1.3.1.9) is missing in the genome 12B and in a number of Streptococci.
fabI (enoyl-ACP reductase, EC 1.3.1.9) is the target of triclosan.
The enzymatic activity is known, but no corresponding gene has been identified in Streptococci.
Identification of a candidate by positional clustering
[Figure: positional clustering of fatty acid biosynthesis genes in Genome X, Genome Y, Clostridium acetobutylicum and Streptococcus pyogenes. The clusters contain fabH, acpP, fabG, fabF, accA, accD, accC, accB, fabZ and fabD together with a candidate regulator (TR?), hypothetical genes (hyp) and an unknown gene (?); other labels in Genome X and Genome Y: fabI, 6.3.4.15, 3.5.1.?, 2.1.1.79, 5.99.1.2, FRNS]
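Positional clustering, as used above, can be sketched as counting in how many genomes a non-pathway gene sits next to a known pathway gene. This assumes gene labels are ortholog groups comparable across genomes and ignores strand and intergenic distance; the labels `cogA`, `cogB`, `cogC` in testing are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def clustered_candidates(
    gene_orders: Dict[str, List[str]],
    pathway_genes: Set[str],
    window: int = 1,
) -> List[Tuple[str, int]]:
    """Rank non-pathway genes by the number of genomes in which they lie
    within `window` positions of a known pathway gene."""
    counts: Counter = Counter()
    for order in gene_orders.values():
        near: Set[str] = set()  # count each gene at most once per genome
        for i, gene in enumerate(order):
            if gene in pathway_genes:
                continue
            lo, hi = max(0, i - window), min(len(order), i + window + 1)
            if any(order[j] in pathway_genes for j in range(lo, hi) if j != i):
                near.add(gene)
        counts.update(near)
    return counts.most_common()
```

A gene that recurs next to the pathway in many genomes, like the unknown gene in the Streptococcus clusters, rises to the top of the ranking.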
Binding sites of FabR (“Tr?”, HTH)
[Figure: gene cluster HTH, fabK, fabH, acpP, fabG, fabF, accA, accD, accC, accB, fabZ, fabD, Fad (4.2.1.17)]
CONSENSUS               acTTTGAtwaTCAAAgt
E. faecalis HTH-1       AgTTTGggTATCAAAGT
E. faecalis HTH-2       AgTTTGAacATCAAAtg
E. faecalis HTH-3       GtTTTGATAATCAAAGT
E. faecium HTH-1        ACTTTGATAATCAAAaT
E. faecium HTH-2        AgTTTGAacATCAAAag
E. faecium HTH-3        gaTTTGATAATCAAAcT
S. pyogenes 4.2.1.17    GaTTTGATTATCAAAtg   (1)
S. pyogenes HTH-1       AaTTTGATTgTCAAAGT   (2)
S. pyogenes fabK-1      CtTTTGATAtTCAAAtT   (3)
S. pyogenes fabK-2      AgTTTGATTATCAAAtT   (4)
S. pneumoniae 4.2.1.17  ACTTTGAcAgTgAAAta
S. pneumoniae HTH-1     gtTTTGATTgTaAAAGT
S. pneumoniae fabK-1    AgTTTGAcTgTCAAAtT
S. mutans 4.2.1.17-1    ACTTTGATTtTCAAAcT
S. mutans 4.2.1.17-2    AaTTTGATTATCttAaT
S. mutans HTH-1         ACTTTGATAgTCAAAGT
S. mutans fabK-1        AgTTTGAcAtTCAAAtc
S. mutans fabK-2        AgTTTGAcTgTCAAAtT
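To compare the listed sites with the consensus, or to search a sequence for new ones, a simple mismatch count against the IUPAC consensus is enough for a sketch. This is an assumption: the talk does not say how sites were scored (a position weight matrix is the more usual choice), and the cut-off of two mismatches in `scan` is arbitrary. The consensus is its own reverse complement, so one strand suffices.

```python
# IUPAC codes that occur in the consensus; "W" means A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "W": "W", "N": "N"}

CONSENSUS = "acTTTGAtwaTCAAAgt"


def reverse_complement(seq: str) -> str:
    """Reverse complement, keeping lower case for weakly conserved positions."""
    out = []
    for ch in reversed(seq):
        comp = COMPLEMENT[ch.upper()]
        out.append(comp if ch.isupper() else comp.lower())
    return "".join(out)


def mismatches(site: str, consensus: str = CONSENSUS) -> int:
    """Number of positions where the site does not fit the consensus letter."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(
        1 for s, c in zip(site.upper(), consensus.upper()) if s not in IUPAC[c]
    )


def scan(sequence: str, max_mismatches: int = 2, consensus: str = CONSENSUS):
    """Return (position, mismatches) for every window close to the consensus.

    Only the given strand is scanned: the consensus equals its reverse
    complement, so the other strand would report the same windows."""
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        m = mismatches(sequence[i : i + k], consensus)
        if m <= max_mismatches:
            hits.append((i, m))
    return hits


assert reverse_complement(CONSENSUS) == CONSENSUS  # palindromic signal
```

For example, `mismatches("GtTTTGATAATCAAAGT")` (E. faecalis HTH-3) returns 2.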
Metabolic reconstruction of thiamin biosynthesis (new genes/functions shown in red)
[Figure: thiamin biosynthesis in Gram-positive and Gram-negative bacteria; labels: thiN (confirmed), transport of HMP, transport of HET, purine pathway]
Carbohydrate metabolism in Streptococcus and Lactococcus spp.
[Figure: carbohydrate utilization pathways by species.
Species: S. pneumoniae, S. pyogenes, S. equi, S. uberis, S. agalactiae, S. mutans, S. thermophilus, S. suis, L. lactis, L. casei, L. gasseri, L. delbrueckii, P. pentosaceus, L. brevis, L. mesenteroides, Oenococcus oeni.
Substrates: unknown, arabinose, arbutin, cellobiose, dextran, esculin, fructose, fucose, galactose, glucose, inulin, lactose, maltose, mannitol, mannose, melibiose, N-AcGlu, raffinose, ribose, salicin, sorbitol, sorbose, sucrose, tagatose, trehalose, xylose.]
Only biochemical data, genes unknown
Experimentally verified genes
Biochemical data and genomic predictions
Only genomic predictions
An uncharacterized locus in invasive species
[Figure: the carbohydrate utilization matrix of the previous slide (same species and substrates); species listed: S. pneumoniae, S. pyogenes, S. equi, S. agalactiae, S. suis]
Structure of the genome loci
[Figure: loci in S. pyogenes and S. agalactiae, S. equi, S. pneumoniae TIGR4, S. suis and S. pneumoniae R6; insertion sequences (IS) are marked]
Gene functions (pathway from 3-(4-deoxy-beta-D-gluc-4-enuronosyl)-N-acetyl-D-glucosamine to pyruvate + D-glyceraldehyde 3-phosphate): PTS transporter, hydrolase, isomerase, oxidoreductase, dehydrogenase, kinase, aldolase
Also shown: hyaluronidase (hyaluronate lyase), RegR, candidate regulatory signal
Structure of the genome loci - 2
[Figure: the same loci in S. pyogenes and S. agalactiae, S. equi, S. pneumoniae TIGR4, S. suis and S. pneumoniae R6; insertion sequences (IS) are marked]
Possible function
• Pathway exists in invasive species
• Sometimes co-localized with hyaluronidase
• Always co-regulated with hyaluronidase
Thus:
• Utilization of hyaluronate
• May be involved in pathogenesis
Comparative genomics of zinc regulons
Two major roles of zinc in bacteria:
• Structural role in DNA polymerases, primases, ribosomal proteins, etc.
• Catalytic role in metal proteases and other enzymes
Genomes and regulators
[Figure: genomes and their zinc regulators; labels: nZUR (FUR family), ???, AdcR ? (MarR family), pZUR (FUR family)]
Regulators and signals (nZUR-, nZUR-, AdcR, pZUR)
TTAACYRGTTAA
GATATGTTATAACATATC
GAAATGTTATANTATAACATTTC
GTAATGTAATAACATTAC
TAAATCGTAATNATTACGATTTA
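The zinc-repressor signals above are (nearly) palindromic, which can be checked by pairing each position with its mirror position and testing complementarity. This is a minimal sketch covering only the IUPAC codes that appear here; the central base of an odd-length signal pairs with itself and is skipped.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "R": "Y", "Y": "R", "W": "W", "S": "S", "N": "N"}


def asymmetric_pairs(signal: str) -> int:
    """Count mirror position pairs (i, n-1-i) that are not complementary.

    0 means the signal equals its own reverse complement.
    """
    n = len(signal)
    return sum(
        1 for i in range(n // 2) if COMPLEMENT[signal[i]] != signal[n - 1 - i]
    )
```

As transcribed on the slide, `GTAATGTAATAACATTAC` gives 1 (one non-complementary pair next to the centre), while the other four signals give 0.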
Transporters
• Orthologs of the AdcABC and YciC transport systems
• Paralogs of the components of the AdcABC and YciC transport systems
• Candidate transporters with previously unknown specificity
zinT: regulation
[Figure: genomic context of zinT, either isolated or fused (adcA-zinT), in Gamma-proteobacteria (E. coli, S. typhi, K. pneumoniae), Alpha-proteobacteria (A. tumefaciens, R. sphaeroides), the Bacillus group (B. subtilis, S. aureus) and the Streptococcus group (S. pneumoniae, S. mutans, S. pyogenes, L. lactis, E. faecalis)]
zinT is regulated by zinc repressors (nZUR-, nZUR-, pZUR)
adcA-zinT is regulated by zinc repressors (pZUR, AdcR) (ex. L.l.)
ZinT: protein sequence analysis
[Figure: domain architecture of ZinT (TM segment, Zn-binding domain, fused AdcA) in four groups: E. coli, S. typhi, K. pneumoniae, A. tumefaciens, R. sphaeroides, B. subtilis; L. lactis; Y. pestis, V. cholerae, B. halodurans; S. aureus, E. faecalis, S. pneumoniae, S. mutans, S. pyogenes]
ZinT: summary
• zinT is sometimes fused to the gene of a zinc transporter component, adcA
• zinT is expressed only in zinc-deplete conditions
• ZinT is attached to the cell surface (it has a TM segment)
• ZinT has a zinc-binding domain
ZinT: conclusions:
• ZinT is a new type of zinc-binding component of the zinc ABC transporter
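A TM segment like the one in ZinT can be flagged with a sliding-window hydropathy profile. This sketch uses the Kyte-Doolittle scale with the common 19-residue window and 1.6 threshold; that choice is an assumption, since the talk does not name the tool used for TM prediction.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(
    protein: str, window: int = 19, threshold: float = 1.6
) -> List[Tuple[int, int]]:
    """Return merged [start, end) regions whose windowed mean hydropathy
    reaches the threshold, i.e. candidate transmembrane segments."""
    segments: List[Tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        score = sum(KD[aa] for aa in protein[i : i + window]) / window
        if score >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend overlapping hit
            else:
                segments.append((start, end))
    return segments
```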
Zinc regulation of PHT (pneumococcal histidine triad) proteins of Streptococci
[Figure: loci of lmb and pht genes in S. pneumoniae (lmb, phtD, phtE, phtB, phtA), S. equi and S. agalactiae (lmb, phtD) and S. pyogenes (phtY, lmb, phtD); legend: zinc regulation shown in experiment]
Structural features of PHT proteins
• PHT proteins contain multiple HxxHxH motifs
• PHT proteins of S. pneumoniae are paralogs (65-95% identity)
• Sec-dependent hydrophobic leader sequences are present at the N-termini of PHT proteins
• Localization of PHT proteins from S. pneumoniae on bacterial cell surface has been confirmed by flow cytometry
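The HxxHxH motifs can be located with a regular expression; a lookahead is used so that overlapping occurrences are all reported. The sequence in the test is a toy example, not a real PHT protein.

```python
import re
from typing import List, Tuple

# Histidine triad motif HxxHxH; the lookahead reports overlapping matches too.
TRIAD = re.compile(r"(?=(H..H.H))")


def find_histidine_triads(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based position, matched motif) for every HxxHxH occurrence."""
    return [(m.start(), m.group(1)) for m in TRIAD.finditer(protein)]
```

For the toy sequence `MKHGDHYHYLLHGNHYHF`, this returns `[(2, 'HGDHYH'), (11, 'HGNHYH')]`.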
PHT proteins: summary
• PHT proteins are induced in zinc-deplete conditions
• PHT proteins are localized at the cell surface
• PHT proteins have zinc-binding motifs
A hypothesis:
• PHT proteins represent a new family of zinc transporters
… incorrect
• Zinc-binding domains in zinc transporters:
EEEHEEHDHGEHEHSH
HSHEEHGHEEDDHDHSHEEHGHEEDDHHHHHDED
DEHGEGHEEEHGHEH
(histidine-aspartate-glutamate-rich)
• Histidine triads in streptococci:
HGDHYHY 7 out of 21
HGDHYHF 2 out of 21
HGNHYHF 2 out of 21
HYDHYHN 2 out of 21
HMTHSHW 2 out of 21
(specific pattern of histidines and aromatic amino acids)
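The contrast between the two motif types can be made quantitative by residue composition: the transporter domains are rich in His, Asp and Glu, whereas the triads mix His with aromatic residues. A minimal sketch; the choice of these three fractions is an illustrative assumption.

```python
from collections import Counter

AROMATIC = set("FWY")


def composition(motif: str) -> dict:
    """Fraction of His, of His+Asp+Glu, and of aromatic residues in a motif."""
    counts = Counter(motif)
    n = len(motif)
    return {
        "H": counts["H"] / n,
        "HDE": (counts["H"] + counts["D"] + counts["E"]) / n,
        "aromatic": sum(counts[a] for a in AROMATIC) / n,
    }
```

`composition("EEEHEEHDHGEHEHSH")["HDE"]` is 0.875, with no aromatic residues, while `composition("HGDHYHY")` has an aromatic fraction of 2/7.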
Analysis of PHT proteins (cont’d)
• The phtD gene forms a candidate operon with the lmb gene in all Streptococcus species
– Lmb: an adhesin involved in laminin binding, adherence and internalization of streptococci into epithelial cells
• PhtY of S. pyogenes:
– phtY is regulated by AdcR
– PhtY consists of 3 domains: PHT (4 His triads), internalin (LRR) and an H-rich region (IRHDYNHNHTYEDEEGHAHEHRDKDDHDHEHED)
PHT proteins: summary 2
• PHT proteins are induced in zinc-deplete conditions
• PHT proteins are localized at the cell surface
• PHT proteins have structural zinc-binding motifs
• phtD forms a candidate operon with an adhesin gene
• PhtY contains an internalin domain responsible for streptococcal invasion
Hypothesis: PHT proteins are adhesins involved in the attachment of streptococci to epithelial cells, leading to invasion
Zinc and (paralogs of) ribosomal proteins
Proteins: L36, L33, L31, S14 (regulators: nZUR, pZUR, AdcR)
E. coli, S. typhi: – – – + –
K. pneumoniae: – – – – –
Y. pestis, V. cholerae: – – – + –
B. subtilis: – – + – – + – +
S. aureus: – – – – – – +
Listeria spp.: – – – – – +
E. faecalis: – – – – – – + –
S. pne., S. mutans: – – – – – –
S. pyo., L. lactis: – – – – – – +
Zn-ribbon motif (Makarova-Ponomarev-Koonin, 2001)
Proteins: L36, L33, L31, S14 (regulators: nZUR, pZUR, AdcR)
E. coli, S. typhi: (–) – (–) + –
K. pneumoniae: (–) – (–) – –
Y. pestis, V. cholerae: (–) – (–) + –
B. subtilis: (–) (–) + – (–) + (–) +
S. aureus: (–) (–) – – – (–) +
Listeria spp.: (–) (–) – – (–) +
E. faecalis: (–) (–) – – – (–) + –
S. pne., S. mutans: (–) (–) – – – (–)
S. pyo., L. lactis: (–) (–) – – – (–) +
Summary of observations:
• Makarova-Ponomarev-Koonin, 2001:
– L36, L33, L31, S14 are the only ribosomal proteins duplicated in more than one species
– L36, L33, L31, S14 are four out of seven ribosomal proteins that contain the zinc-ribbon motif (four cysteines)
– Out of two (or more) copies of the L36, L33, L31, S14 proteins, one usually contains the zinc ribbon, while the other has lost it
• Among genes encoding paralogs of ribosomal proteins, there is (almost) always one gene regulated by a zinc repressor, and the corresponding protein never has a zinc ribbon motif
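The observed rule (a zinc-repressed paralog never has the zinc ribbon) can be checked automatically once paralogs are annotated. This sketch uses a hypothetical, simplified zinc-ribbon pattern of two CxxC pairs separated by 8 to 30 residues; the spacing range is an assumption for illustration, not the definition used by Makarova, Ponomarev and Koonin, and the sequences in the test are toy examples.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical simplified zinc-ribbon pattern: two CxxC pairs separated by
# 8-30 residues. The spacing range is an assumption for illustration only.
ZN_RIBBON = re.compile(r"C..C.{8,30}C..C")


def has_zinc_ribbon(protein: str) -> bool:
    return ZN_RIBBON.search(protein) is not None


def inconsistent_paralogs(
    paralogs: Iterable[Tuple[str, str, bool]]
) -> List[str]:
    """paralogs: (name, sequence, regulated_by_zinc_repressor).

    Returns names of zinc-repressed paralogs that still carry a zinc ribbon,
    i.e. exceptions to the observed rule."""
    return [name for name, seq, zn_regulated in paralogs
            if zn_regulated and has_zinc_ribbon(seq)]
```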
Bad scenario
Zn-rich conditions
Zn-deplete conditions: all Zn is used by the ribosomes, none is left for Zn-dependent enzymes
Regulatory mechanism
[Figure: repressor (R), ribosomes and Zn-dependent enzymes under sufficient Zn and under Zn starvation]
Good scenario
Zn-rich conditions
Zn-deplete conditions: some ribosomes are built without Zn, so some Zn is left for the enzymes
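The difference between the two scenarios is a matter of zinc accounting, shown here as a toy budget. Every number is hypothetical, and the assumption that each zinc-ribbon ribosome binds exactly one unit of zinc is a deliberate simplification.

```python
def zinc_left_for_enzymes(total_zn: int, ribosomes: int, zinc_free: int = 0) -> int:
    """Toy zinc budget: each ribosome built with zinc-ribbon proteins binds one
    unit of zinc (a deliberate simplification); ribosomes assembled from the
    zinc-free paralogs bind none. Whatever remains is available to enzymes."""
    if not 0 <= zinc_free <= ribosomes:
        raise ValueError("zinc_free must be between 0 and ribosomes")
    bound_by_ribosomes = min(total_zn, ribosomes - zinc_free)
    return total_zn - bound_by_ribosomes
```

With 100 units of zinc and 100 ribosomes, the bad scenario leaves nothing for enzymes; if derepression lets 30 ribosomes use the zinc-free paralogs, 30 units remain.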
Prediction … (Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9912-7.)
… and confirmation (Mol Microbiol. 2004 Apr;52(1):273-83.)
• Andrei A. Mironov
• Anna Gerasimova
• Olga Kalinina
• Alexei Kazakov (hyaluronate)
• Ekaterina Kotelnikova
• Galina Kovaleva
• Pavel Novichkov
• Olga Laikova (hyaluronate)
• Ekaterina Panina (zinc) (now at UCLA, USA)
• Elizabeth Permina
• Dmitry Ravcheev
• Alexandra B. Rakhmaninova
• Dmitry Rodionov (thiamin)
• Alexey Vitreschak (thiamin) (on leave at LORIA, France)
• Howard Hughes Medical Institute
• Ludwig Institute of Cancer Research
• Russian Fund of Basic Research
• Programs “Origin and Evolution of the Biosphere” and “Molecular and Cellular Biology”, Russian Academy of Sciences
• Andrei Osterman (Burnham Institute, San-Diego, USA) (fatty acids)