SDPpred: a method for identification of amino acid residues that determine differences in functional specificity
of homologous proteins and application thereof to the MIP family
of membrane transporters
Olga V. Kalinina
Pavel S. Novichkov
Andrey A. Mironov
Mikhail S. Gelfand
Aleksandra B. Rakhmaninova
Large families of proteins: generally similar biochemical function but many different specificities… Example: ~800 transcription factors of the LacI family.
Average sequence identity 30%. Bind different effectors and operators.
Some effectors:
• lactose (LacI)
• D-fructose-6-phosphate (FruR)
• guanine, hypoxanthine (PurR)
• cytidine, adenosine (CytR)
• trehalose-6-phosphate (TreR)
• D-gluconate (GntR)
• D-galactose (GalR)
• D-ribose (RbsR)
• maltose (MalR)
• raffinose (RafR)
• …
• X??
Positions that account for specificity
Assignment of specificity to new proteins
Experiment
Testing on families that include proteins with resolved 3D structure
SDPpred
Description of specificity groups:
Group A: No. 1-10, 13…
Group B: No. 12, 14-16…
Group C: No. 17-45…
Alignment
Q9KDW9      ----------MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
Q8Y6Z1      ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
Q97JG6      ----------MTIFFAELVGTLLLILLGDGVVANVVLKNSK
GLPF_ECOLI  MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
Q8ZJK5      MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
GLPF_HAEIN  MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
GLPF_PSEAE  MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
AQPZ_BRUME  ---------MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
Q92NM3      ---------MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
Q8UJW4      ---------MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
AQPZ_ECOLI  ---------MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP
What are SDPs? (SDP = Specificity Determining Position)
• Specificity group = group of proteins that have the same specificity (experimental data, genome analysis, etc.)
• SDP = alignment position that is conserved within specificity groups but differs between them
An SDP is not equivalent to a functionally important position!
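Algorithm
• Mutual information I_p reflects the extent to which alignment position p tends to be an SDP:

$$I_p = \sum_{i=1}^{N} \sum_{\alpha=1}^{20} f_p(\alpha, i)\, \log \frac{f_p(\alpha, i)}{f_p(\alpha)\, f(i)}$$

where N is the number of groups, f(i) is the fraction of proteins in group i, f_p(\alpha, i) is the ratio of the number of occurrences of amino acid \alpha in group i at position p to the length of the whole alignment column, and f_p(\alpha) is the frequency of amino acid \alpha in the whole alignment column at position p.

• Statistical significance of I_p: the expected mutual information I_p^{exp} of an alignment column gives a Z-score (Mirny & Gelfand, 2002, J Mol Biol, 321(1)):

$$Z_p = \frac{I_p - \langle I_p^{exp} \rangle}{\sigma(I_p^{exp})}$$

• Smoothed amino acid frequencies: a leucine is more a methionine than a valine, and any arginine has a dash of lysine…

$$f(\alpha, i) = \frac{n(\alpha, i)}{n(i)} \;\longrightarrow\; \tilde{f}(\alpha, i) = \frac{n(\alpha, i) + \dfrac{\kappa}{\sqrt{n(i)}} \sum_{\beta=1}^{20} n(\beta, i)\, m(\beta \to \alpha)}{n(i) + \kappa \sqrt{n(i)}}$$

where n(\alpha, i) is the number of occurrences of amino acid \alpha in group i at the given position, n(i) is the size of group i, m(\beta \to \alpha) is the probability of the substitution \beta \to \alpha, and \kappa is a smoothing parameter.

• Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0? A Bernoulli estimator selects the proper number of SDPs. With the Z-scores sorted in decreasing order, Z_1 \ge Z_2 \ge … \ge Z_n:

$$k^* = \arg\min_k P(\text{there are at least } k \text{ observed Z-scores} \ge Z_k) = \arg\min_k \sum_{i=k}^{n} C_n^i\, p^i q^{n-i}$$

$$p = P(Z \ge Z_k) = \frac{1}{\sqrt{2\pi}} \int_{Z_k}^{\infty} \exp\left(-\frac{Z^2}{2}\right) dZ, \qquad q = 1 - p$$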
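The mutual information and Z-score steps can be illustrated with a minimal Python sketch. It is not the SDPpred implementation: the function names are hypothetical, gaps are treated as an ordinary symbol, no frequency smoothing is applied, and the expected mutual information is estimated by shuffling residues within the column, which is an assumed null model.

```python
import math
import random
from collections import Counter

def mutual_information(column, groups):
    """I_p of one alignment column.

    column: residues, one per sequence; groups: parallel list of group labels.
    Raw (unsmoothed) frequencies are used; this is a simplification.
    """
    n = len(column)
    joint = Counter(zip(column, groups))  # counts of (amino acid, group) pairs
    aa = Counter(column)                  # counts of each amino acid in the column
    grp = Counter(groups)                 # number of sequences in each group
    total = 0.0
    for (a, g), count in joint.items():
        f_ag = count / n
        total += f_ag * math.log(f_ag / ((aa[a] / n) * (grp[g] / n)))
    return total

def z_score(column, groups, n_shuffles=1000, seed=0):
    """Z-score of I_p against shuffled columns (assumed null model)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the link between residue and group
        samples.append(mutual_information(shuffled, groups))
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    if var == 0.0:
        return 0.0  # e.g. a fully conserved column: shuffling changes nothing
    return (observed - mean) / math.sqrt(var)

# A column that separates two groups perfectly versus a fully conserved one.
labels = ["A"] * 4 + ["B"] * 4
perfect = list("GGGGSSSS")
conserved = list("GGGGGGGG")
print(mutual_information(perfect, labels), mutual_information(conserved, labels))
```

A column that is conserved across the whole family carries no mutual information, which is why an SDP is not the same thing as a functionally important position.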
…
• Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2): 443-56.
• http://math.belozersky.msu.ru/~psn/
• Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.
Web interface
Input: multiple alignment of proteins divided into specificity groups
=== AQP ===
%sp|Q9L772|AQPZ_BRUME
-------------------------------------mlnklsaeffgtfwlvfggcgsailaa--afp-------elgigflgvalafgltvltmayavggisg--ghfnpavslgltviiilgsts------------------------------slap------------------qlwlfwvaplvgavigaiiwkgllgrd---------------------------------------
%sp|P48838|AQPZ_ECOLI
-------------------------------------mfrklaaecfgtfwlvfggcgsavlaa--gfp-------elgigfagvalafgltvltmafavghisg--ghfnpavtiglwalvihgatd------------------------------kfap------------------qlwffwvvpivggiiggliyrtllekrd--------------------------------------
%tr|Q92ZW9
-------------------------------------mfkklcaeflgtcwlvlggcgsavlas--afp-------qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglaviiilgsth------------------------------rrvp------------------qlwlfwiaplfgaaiagivwksvgeefrpvd-----------------------------------
=== GLP ===
%sp|P11244|GLPF_ECOLI
----------------------------msqt---stlkgqciaeflgtglliffgvgcvaalkvag---------a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwlglilaltd------------------------------dgn--------------g-vpr-flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl--------------
%sp|P44826|GLPF_HAEIN
----------------------------mdks-----lkancigeflgtalliffgvgcv
…
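Reading such an input can be sketched in Python as follows. This is a hedged illustration, not the server's parser: it assumes that each "=== name ===" header, each "%"-prefixed sequence name, and each stretch of sequence sits on its own line, and the function name parse_sdppred_input is hypothetical.

```python
def parse_sdppred_input(text):
    """Parse a grouped alignment into {group name: {sequence name: aligned sequence}}.

    Assumed layout (one item per line):
      === GROUP ===   starts a specificity group
      %name           starts a sequence
      other lines     aligned sequence, possibly wrapped over several lines
    """
    groups = {}
    current_group = None
    current_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            current_group = line.strip("=").strip()
            groups[current_group] = {}
            current_name = None
        elif line.startswith("%"):
            if current_group is None:
                raise ValueError("sequence name before any group header")
            current_name = line[1:].strip()
            groups[current_group][current_name] = ""
        else:
            if current_name is None:
                raise ValueError("sequence data before any sequence name")
            groups[current_group][current_name] += line
    return groups

# Toy input with made-up names and sequences, for illustration only.
example = """=== AQP ===
%seq1
--mfrk
%seq2
--mlnk
=== GLP ===
%seq3
msqt-k
"""
for group, seqs in parse_sdppred_input(example).items():
    print(group, sorted(seqs))
```

Keeping the group label next to every sequence is what allows each alignment column to be scored against the grouping afterwards.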
Web interface
Output:
Alignment of the family with the SDPs highlighted (Alignment view)
Detailed description of each SDP (List of SDPs)
Plot of probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view)
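The probabilities behind such a plot, and the cutoff chosen from them, can be sketched in Python. This is an illustration under stated assumptions rather than the server's code: Z-scores are taken to follow a standard normal distribution under the null model, all function names are hypothetical, and the direct binomial sum is only suitable for alignments of up to a few hundred columns.

```python
import math

def normal_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

def bernoulli_cutoff(z_scores):
    """Return (k_star, probs) for a list of per-column Z-scores.

    probs[k - 1] is the probability of seeing at least k Z-scores as high as
    the k-th largest one by chance; k_star minimizes that probability.
    """
    ranked = sorted(z_scores, reverse=True)
    n = len(ranked)
    probs = [binomial_tail(n, k, normal_tail(ranked[k - 1]))
             for k in range(1, n + 1)]
    k_star = min(range(1, n + 1), key=lambda k: probs[k - 1])
    return k_star, probs

# Made-up Z-scores: three columns stand out from the rest.
scores = [12.0, 11.5, 11.0, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5, -2.0]
k_star, probs = bernoulli_cutoff(scores)
print(k_star)
```

The k_star highest-scoring columns are then reported as SDPs, so the user does not have to pick a Z-score threshold by hand.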
Examples: the LacI family of bacterial transcription factors
• Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
– 44 SDPs
10 residues contact NPF (analog of the effector)
6 residues make up intersubunit contacts
7 residues contact the operator sequence
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
Examples: bacterial membrane channels of the MIP family
• Training set: 17 sequences, average length 280 amino acids, 2 specificity groups: aquaporins and aquaglyceroporins
– 21 SDPs
8 residues contact glycerol (substrate) (dmin < 5 Å)
8 residues oriented to the channel
5 residues make up contacts with other subunits
GlpF from E. coli
Why does the prediction make sense? LacI from E. coli
• Total 348 amino acids
• 44 SDPs
Contacting residues (distance to the DNA, effector, or the other subunit < 5 Å)
Contact zone (may be functional)
Non-contacting residues (distance to the DNA, effector, or the other subunit > 10 Å)
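The three structural classes used on these slides can be written down as a short Python sketch. The 5 Å and 10 Å thresholds come from the slides; the function names, the coordinate representation, and the handling of distances that fall exactly on a threshold are assumptions made for illustration.

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest distance (in Å) between two sets of (x, y, z) atom coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

def classify(d_min, contact=5.0, zone=10.0):
    """Assign a residue to a class from its minimal distance to the partner.

    The partner is the DNA, the effector/substrate, or another subunit.
    Values exactly equal to a threshold go to the more distant class (assumption).
    """
    if d_min < contact:
        return "contacting"
    if d_min < zone:
        return "contact zone"
    return "non-contacting"

# Made-up coordinates, for illustration only.
residue_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ligand_atoms = [(4.5, 4.0, 0.0), (12.0, 0.0, 0.0)]
d = min_distance(residue_atoms, ligand_atoms)
print(round(d, 2), classify(d))
```

In practice the coordinates would come from a resolved 3D structure of a family member, as in the LacI and GlpF examples.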
Why does the prediction make sense? GlpF from E. coli
• Total 281 amino acids
• 21 SDPs
Contacting residues (distance to the substrate or another subunit < 5 Å)
Contact zone (may be functional)
Non-contacting residues (distance to the substrate or another subunit > 10 Å)
GlpF from E. coli, a membrane channel from the MIP family:
SDPs either interact with the substrate or are located on the outer surface of the monomer
Structure of the GlpF monomer Predicted SDPs
Glycerol
SDPs located on the outer surface of the GlpF monomer form subunit contacts
Glu43 from all four subunits
20Leu, 24Phe, 108Tyr of one subunit, 193Ser from another subunit
SDPs located on the outer surface of the GlpF monomer (continued)
Subunit I Subunit II Subunit IV
Residue Atom Residue Atom Residue Atom Distance (Å)
Glu43 OE1 Ser38 O 4.8
Glu43 OE2 Glu43 OE2 4.1
Glu43 CG Trp42 CD1 3.7
Glu43 OE2 Glu43 OE2 4.1
Subunit I          Subunit II
Residue  Atom      Residue  Atom    Distance (Å)
Leu20    CD2       Ile158   CD1     4.3
Leu20    CD1       Leu162   CD2     4.5
Phe24    CZ        Ile158   CG2     3.9
Phe24    CZ        Leu186   CD1     3.9
Phe24    CE2       Val189   CG2     3.8
Phe24    CE2       Ile190   CG1     3.7
Phe24    CA        Ser193   CB      3.9
Phe24    O         Ser193   OG      4.2
Phe24    O         Ser193   CB      3.3
Gly27    O         Ser193   O       3.2
Cys28    CA        Ser193   CA      3.8
Tyr108   OH        Ser193   O       2.6
Tyr108   CE1       Met194   CE      3.7
Tyr108   CE1       Leu197   CD1     3.9
SDPs located on the outer surface of the GlpF monomer (continued)
Structure of contacts in the type A cluster
Structure of contacts in the type B cluster
Conclusions I. SDPpred: the SDP prediction method
• A method for identification of amino acid residues that account for differences in protein functional specificity
– Does not rely on the protein 3D structure
– Automatically determines the number of significant positions
– Considers substitutions according to the chemical properties of the substituted amino acids
• Results agree with available structural and experimental data
• Applicable to any protein family in a standard way
Conclusions II. SDPs for GlpF from E. coli
• In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits
• 5 “surface” SDPs in GlpF: 20Leu, 24Phe, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure
• Evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for the correct recognition of the substrate
• These residues form compact spatial clusters (“structural clasps”) for recognition of proper subunits
• Olga V. Kalinina
• Pavel S. Novichkov
• Andrey A. Mironov
• Mikhail S. Gelfand
• Aleksandra B. Rakhmaninova
– Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
– Institute for Information Transmission Problems RAS, Moscow, Russia
– State Scientific Center GosNIIGenetika, Moscow, Russia
• Acknowledgements
– Leonid A. Mirny
– Olga Laikova
– Vsevolod Makeev
– Roman Sutormin
– Shamil Sunyaev
– Aleksey Finkelstein