Investigating Alkaline Phosphatase and Ketosteroid Isomerase
by Rational Design
A thesis presented
by
Nicholas A. DeLateur
to
The Department of Chemistry and Chemical Biology
in partial fulfillment of the requirements for the degree of
Master of Science in the field of Chemistry
Northeastern University
Boston, Massachusetts
August 8, 2013
© Copyright 2013
Nicholas A. DeLateur
All rights reserved
Investigating Alkaline Phosphatase and Ketosteroid Isomerase
by Rational Design
by
Nicholas A. DeLateur
ABSTRACT OF THESIS
Submitted in partial fulfillment of the requirements for the degree
of Master of Science in Chemistry and Chemical Biology
in the College of Science of Northeastern University,
August 8, 2013
Abstract
Enzymes catalyze chemical reactions many orders of magnitude faster than the
uncatalyzed reaction and are capable of doing so at physiological pH and temperature. As
enzymes consist of hundreds of amino acids, the ability to identify which residues contribute to
catalysis with high recall and low false positive rates is of critical importance to characterizing
and engineering enzymes. Theoretical Microscopic Anomalous Titration Curve Shapes
(THEMATICS) and Partial Order Optimum Likelihood (POOL) are programs developed at
Northeastern University that can identify the residues contributing to catalysis. THEMATICS
finds anomalous titration behavior, which correlates with catalytic activity. POOL combines the
THEMATICS input with geometric and evolutionary predictions to rank each residue by the
likelihood of its importance for catalysis.
Alkaline phosphatase (AP) is a protein found in all domains of life which cleaves
phosphate groups from a broad range of substrates. Ketosteroid isomerase (KSI) performs an important
biological function in the metabolism of many bacteria by degrading steroids. THEMATICS and
POOL predict that alkaline phosphatase and ketosteroid isomerase contain most of their catalytic
power in the residues directly surrounding the reacting substrate molecule; there is very little
contribution from distal or remote residues of the protein. This is in
stark contrast to phosphoglucose isomerase (PGI) and nitrile hydratase (NH), where
THEMATICS and POOL predict a multi-layer active site, with residues in the second and third
shells contributing to activity. The predictions for KSI, PGI, and NH have been experimentally
validated.
Pseudomonas putida KSI (PpKSI) is strikingly efficient and selective. Three putative
KSIs identified from Structural Genomics were analyzed by THEMATICS and POOL and then
characterized in vitro to determine the presence of, or lack of, KSI activity. A putative KSI from
Mycobacterium tuberculosis (MtKSI) was predicted to have isomerase activity and biochemical
experiments reveal that the putative M. tuberculosis KSI does indeed possess KSI activity,
although with reduced efficiency compared to PpKSI.
To investigate this lower efficiency in the correctly annotated KSI, we engineered the
MtKSI active site to resemble more closely that of PpKSI under the hypothesis that these
mutations would increase the activity of MtKSI. However, we found that most of these mutations
alone or in tandem significantly lowered rather than increased activity. Variants S16Y, F111D,
S16Y/F64Y, S16Y/F111D, F64Y/F111D, and S16Y/F64Y/F111D lost catalytic power and were
essentially inactive. Variant F64Y retained catalytic power similar to the wild-type enzyme.
Although the active sites of MtKSI and PpKSI are similar, our attempts to increase the catalytic
efficiency by creating a more PpKSI-like active site of MtKSI were not successful.
Protein engineering relies on the ability to accurately predict sites of function. The best
predictor for active-site residues is POOL using THEMATICS, INTREPID, and ConCavity
inputs. We have shown that not only can POOL correctly predict the residues required for
catalysis, but these predictions can also be used to assign function to proteins whose function is
unknown or putatively assigned. Even if the residues required for catalysis are known, the ability
to engineer improved or novel function is still difficult and may require multiple approaches.
Acknowledgments
I am blessed with not one, but two advisors of extraordinary talent and patience. I am
forever grateful to Professor Penny Beuning for allowing me to begin work in her lab as a young
freshman with no experience in chemistry or biology. She has been an unending source of
mentoring and teaching. Professor Mary Jo Ondrechen has trusted me with project after project,
encouraging me to investigate and grow as a scientist, for which I will always be grateful.
Dr. Srinivas Somarowthu performed the herculean task of teaching me both the
computational and experimental aspects of THEMATICS/POOL, alkaline phosphatase, and
ketosteroid isomerase. I owe most of my practical knowledge in these areas to Sri, and am
thankful for the pleasure of meeting and working with him over these past years.
I want to thank the numerous past and present DNA and ORG lab members, with
emphasis towards Judith Hollander and Ramya Parasarum for graciously sharing bench space
and wisdom. Mark Naniong and Colleen Shea experimented on MtKSI as undergraduate
researchers and their impressive work contributed to the data contained in this thesis.
Neither this work—nor even my graduation—would be possible without Richard
Pumphrey, Cara Shockley, Andrew Bean, Jordan Keefe, and Katie Cameron assisting me
through the NU shuffle and my own shortcomings. Jeff Peterson, Professor Graham Jones,
Professor Carla Mattos, and Professor O’Doherty have provided me with immensely valuable
discussion and direction. I believe John Bottomy has forgiven me more than anyone on Earth; I
cherish his friendship and kindness.
I owe my inspiration and aptitude to my ever-supportive family, especially my parents
Sandra and Joe. They have been a never-ending source of love. Thank you so much Mom, Dad,
and Matt, along with Cole and Tiffany.
Funding that allowed these projects and my research to happen was provided by the
Office of the Provost at Northeastern University, the Matz Co-op Scholarship, and grants NSF:
MCB-0843603, CAREER MCB-0845033, and REU MCB-0843603.
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Protein Engineering
1.1. Proteins as catalysts
1.2. Design vs. Redesign; Directed Evolution vs. Rational Design
1.3. Functional Site Prediction with THEMATICS and POOL
1.4. Catalysis by remote residues
Chapter 2. Alkaline Phosphatase
2.1. Introduction
2.2. Computational Predictions
2.3. Materials and Methods
2.4. Results
2.5. Conclusions
Chapter 3. Ketosteroid Isomerase
3.1. Introduction
3.2. Computational Predictions
3.3. Materials and Methods
3.4. Results
3.5. Conclusions
Chapter 4. Future Work
4.1. POOL-rank cut-offs
Appendix A. Propagation of error in calculating catalytic efficiency
References
List of Figures
Figure 1.1. Alanine, aspartate, glutamate, and asparagine at pH 7.
Figure 1.2. Phenylalanine, tyrosine, and serine at pH 7.
Figure 1.3. A titration curve of mean net charge as a function of pH for select lysine residues in E. coli β-lactamase.
Figure 1.4. Diagram of a multi-layered active site.
Figure 2.1. The active site of alkaline phosphatase based on PDB ID: 1ALK.
Figure 2.2. Diagram of Evolutionary Trace and THEMATICS predictions for AP.
Figure 2.3. A POOL plot of POOL score vs. POOL rank for alkaline phosphatase.
Figure 2.4. The 2nd and 3rd shell residues predicted by THEMATICS.
Figure 2.5. Primers for site-directed mutagenesis of E. coli alkaline phosphatase.
Figure 2.6. Standard curve for 4-nitrophenol phosphate.
Figure 2.7. Michaelis-Menten plots for AP in 1 M Tris-HCl pH 8.0 buffer.
Figure 2.8. Catalytic efficiencies of wild-type and variant alkaline phosphatases.
Figure 2.9. AP residues investigated in this work.
Figure 2.10. A plot of Table 3 and Table 4 showing effects on catalytic efficiency based on POOL rank for AP.
Figure 3.1. Mechanism of KSI based on PpKSI numbering.
Figure 3.2. Primers for site-directed mutagenesis of MtKSI in plasmid pGST-Rv0760c.
Figure 3.3. Standard curve for 4-androstene-3,17-dione (4AND).
Figure 3.4. Michaelis-Menten plots for MtKSI WT and variants.
Figure 3.5. WT and F64Y individual Michaelis-Menten plots.
Figure 3.6. Single run of Michaelis-Menten plot for MtKSI F111D.
Figure 3.7. “Top-down” view of PpKSI.
Figure 3.8. Three residues of interest in PpKSI without surrounding secondary structure.
Figure 4.1. POOL plots for AP, KSI, PGI, NH, DnaE and DinB.
List of Tables
Table 2.1. POOL predictions for alkaline phosphatase.
Table 2.2. Kinetic assays for alkaline phosphatase.
Table 2.3. WT and variant AP kinetic parameters.
Table 2.4. Summary calculations for WT alkaline phosphatase and variants.
Table 2.5. 1st shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.
Table 2.6. 2nd and 3rd shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.
Table 3.1. SALSA alignment of POOL predicted residues for known KSI proteins and proteins annotated as putative KSIs.
Table 3.2. Kinetic assays for MtKSI.
Table 3.3. Vmax and KMapp for MtKSI WT and variants.
Table 3.4. Comparison between the WT MtKSI and F111D variant at 90 μM 5AND.
Table 3.5. Catalytic efficiency for MtKSI WT and variants.
Table 3.6. MtKSI WT and variants based on initial velocities at 30 μM 5AND.
Table A.1. Concentrations of enzymes used to gather kinetic data for alkaline phosphatase.
List of Abbreviations
% Percent
°C Degrees Celsius
4AND 4-androstene-3,17-dione
5AND 5-androstene-3,17-dione
Å Ångströms
AP Alkaline phosphatase
BSA Bovine Serum Albumin
cm Centimeter
Da Dalton
DinB DNA Polymerase IV
DNA Deoxyribonucleic acid
DTT Dithiothreitol
E. coli Escherichia coli
ET Evolutionary Trace
FPLC Fast protein liquid chromatography
GST Glutathione S-transferase
h Hours
HEPES 4-(2-Hydroxyethyl)-1-Piperazineethanesulfonic Acid
kcat First order rate constant
kDa Kilodalton
KM Michaelis constant
KSI Ketosteroid isomerase
L Liter
M Molar
min Minutes
mL Milliliters
Ml Mesorhizobium loti
mM Millimolar
mmol Millimoles
Mt Mycobacterium tuberculosis
NH Nitrile hydratase
nM Nanomolar
nm Nanometers
NTF2 Nuclear Transport Factor 2
OD Optical density
Pa Pectobacterium atrosepticum
PDB Protein Data Bank
PGI Phosphoglucose isomerase
PhoA Alkaline phosphatase
PNP para-nitrophenol
PNPP para-nitrophenyl phosphate
POOL Partial Order Optimum Likelihood
PSI Protein Structure Initiative
R² Regression coefficient
rcf Relative centrifugal force
RNA Ribonucleic acid
SALSA Structurally Aligned Local Sites of Activity
SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SG Structural Genomics
SVM Support vector machine
TEV Tobacco etch virus
THEMATICS Theoretical Microscopic Anomalous Titration Curve Shapes
TM Melting temperature
Tris-HCl 2-amino-2-hydroxymethyl-propane-1,3-diol hydrochloride
μ3 3rd central moment
μ4 4th central moment
v/v Volume by volume
V0 Initial velocity
Vmax Maximum velocity
WT Wild-type
YT Yeast extract and Bacto Tryptone
μL Microliter
μM Micromolar
σ Error
A Ala Alanine
C Cys Cysteine
D Asp Aspartic Acid
E Glu Glutamic Acid
F Phe Phenylalanine
G Gly Glycine
H His Histidine
I Ile Isoleucine
K Lys Lysine
L Leu Leucine
M Met Methionine
N Asn Asparagine
P Pro Proline
Q Gln Glutamine
R Arg Arginine
S Ser Serine
T Thr Threonine
V Val Valine
W Trp Tryptophan
Y Tyr Tyrosine
Chapter 1. Protein Engineering
1.1. Proteins as catalysts
All known life forms create polymers of various combinations of 20 different amino
acids. These polymers are known as proteins and frequently act as catalysts, in which case they
are then referred to as enzymes. The linear chain of amino acids (primary structure) folds to form
local order such as α-helices and β-strands (secondary structure). These helices, strands, loops,
and other local structures fold into a single overall arrangement (tertiary structure); multiple
chains can associate with each other (quaternary structure). Enzymes catalyze reactions under
physiological conditions, such as neutral pH and room temperature, with extreme specificity and
high efficiency. With few exceptions, enzymes are responsible for catalyzing every important
chemical reaction in biology, giving rise to statements such as Orgel’s First Law:
Whenever a spontaneous process is too slow or too inefficient
a protein will evolve to speed it up or make it more efficient.
Most proteins are on the order of 100 to 1000 amino acids. With 20 canonical amino
acids with which to build, the number of possible protein sequences quickly becomes
unfathomable. For a protein on the smaller end, with 100 residues, the number of possible
sequences is 20^100. This number, however, includes sequences that are nothing more than
100 prolines in a row, a sequence that would generally be considered non-functional.
Estimates put the fraction of “functional” folds at 1 in 10^77 [1].
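The size of this sequence space can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name is hypothetical and is not part of any tool discussed in this thesis.

```python
import math

AMINO_ACIDS = 20  # number of canonical amino acids


def log10_sequence_count(length: int) -> float:
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(AMINO_ACIDS)


# Compare the small and large ends of the typical protein size range.
for length in (100, 1000):
    print(f"{length} residues: about 10^{log10_sequence_count(length):.0f} sequences")

# Python integers are arbitrary precision, so 20**100 can also be written out exactly.
print(len(str(20 ** 100)), "digits in 20^100")
```

Working in logarithms keeps the numbers readable: 20^100 is roughly 10^130, far more sequences than could ever be sampled experimentally.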
How enzymes are capable of achieving the remarkable feats of chemistry required of
them represents a central area of research in biochemistry. While a protein may be composed of
hundreds of amino acids, generally only a small handful of those amino acids are directly
involved in the performance of catalysis. These residues compose the “active site” of the
enzyme. In 1946 Linus Pauling postulated that the catalytic power of enzymes lies in their ability
to lower the energy of the transition state between substrate and product[2], a theory that
is still essentially accepted today[3]. To explain the ability of enzymes to perform reactions only on
their specific cognate substrates, the “lock and key”[4] theory was proposed, eventually giving
way to a more nuanced “induced fit”[5] theory that takes into account the transition state geometry
and the realistic expectation of a dynamic system. The lock-and-key and induced fit models generally
assume a globular fold with a solvent-accessible active site. In many cases, however, the active site is
buried within the protein or within a protein cavity[6]. Recently, Jiri Damborsky has pursued a
“keyhole-lock-key” model to address this complication[6, 7].
The Central Dogma[8] of biology, that DNA is transcribed into RNA which is then
translated into protein, provides a natural scheme in which to probe hypotheses about protein
sequence-structure-function relationships. By manipulation of an organism’s DNA, a variant
protein product is produced, which can then either be examined at an in vitro functional level
after isolation, or kept in the organism and the phenotype of the organism observed under
varying conditions to elucidate in vivo function. The sequence-structure relationship is a folding
problem, and while interesting in its own right, will not be addressed here in favor of the
structure-function relationship. One reason is that many sequences result in the same overall
structure. Another reason is that the active site is a structural feature and is the focus of protein
engineering endeavors.
Two things were required for protein engineering to emerge as a field:
-A method to change the protein sequence, and thus structure, with exquisite control
-A falsifiable hypothesis of how a change in protein structure will change protein function
Both requirements were finally met in 1982, as exemplified by a foundational study on
tyrosyl-transfer RNA synthetase[9], after the advent of site-directed mutagenesis allowed the
researcher to specify controlled changes at the DNA level. Protein engineering as a
field today produces marvelous work that ranges from designing enzymes that catalyze
Diels-Alder[10] and Kemp[11] reactions, to building a fully functional enzyme from a 9-amino acid
alphabet[12], to creating a completely new fold never seen in nature[13].
1.2. Design vs. Redesign; Directed Evolution vs. Rational Design
Strictly speaking, protein engineering, or protein design, refers to the process of creating
a functional protein de novo (also referred to as “artificial” enzymes). Most protein engineering,
however, starts from already functional enzymes, such as those extracted from organisms of research
interest, and manipulates them to make them more functional, different in function, or
non-functional. These are examples of protein reengineering and are often
designated as such in the literature (for example, see the recent review by Hilvert[14]).
Presently, the protein engineering paradigm of creating mutations and examining the
resulting changes in function is well established. Implementation, on the other hand, is constrained
by the enormous sequence space that proteins occupy; it is impossible for an experimental
lab to investigate every residue in a protein, especially if multiple mutations at the same residue
are desired. There are two main approaches to deal with this dilemma: directed evolution and
rational design.
Directed evolution draws upon concepts from Darwinian evolution to discover mutations of
interest by iterating rounds of mutagenesis and selection. At its simplest, a gene encoding a
protein undergoes non-specific mutagenesis to introduce a large array of mutations, the
resulting library is expressed, and a certain phenotype is selected. The survivors of the first round of
selection return to the mutagenesis step, and the process repeats until a satisfactory level of function is
attained. Since its inception, directed evolution has proven to be a powerful technique in protein
engineering[15, 16] for developing new or improved function.
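The mutagenesis and selection cycle can be sketched as a toy simulation. Everything here is a hypothetical illustration: the scoring function stands in for a laboratory selection or screen by counting matches to an arbitrary target string, and the function names, library size, and number of rounds are assumptions rather than parameters taken from any cited study.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 canonical amino acids


def score(seq, target):
    """Hypothetical stand-in for a selection assay: positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng):
    """Non-specific mutagenesis: redraw the residue at one randomly chosen position."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]


def directed_evolution(parent, target, rounds=30, library_size=50, survivors=5, seed=0):
    """Iterate mutagenesis and selection; return the best sequence and score history."""
    rng = random.Random(seed)
    pool = [parent]
    history = []
    for _ in range(rounds):
        # Mutagenesis: build a library of variants from the current survivors.
        library = [mutate(rng.choice(pool), rng) for _ in range(library_size)]
        # Selection: rank survivors and library together, keep the top scorers.
        candidates = list(dict.fromkeys(pool + library))  # drop duplicates, keep order
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        pool = candidates[:survivors]
        history.append(score(pool[0], target))
    return pool[0], history


best, history = directed_evolution("AAAAAAAAAA", "MKVLHDEYCR")
print(best, history[-1])
```

Because the previous survivors compete alongside each new library, the best score in this toy model can never decrease from one round to the next. A real experiment selects on phenotype and has no known target sequence to compare against.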
Rational design represents the oldest method of protein engineering. Using hypotheses
about the roles of particular residues, specific mutations to specific amino acids are chosen and
created, and the resulting change (or lack thereof) is then examined. The residues of interest can be
chosen based on crystal structures, previous experiments, sequence comparison, structural
comparison, etc. To determine which residues in a protein contribute to catalysis, for example,
one would first identify the residues suspected of contributing and then create one or
more mutations that test this hypothesis.
Figure 1.1. Alanine, aspartate, glutamate, and asparagine at pH 7.
Alanine has only a methyl group as its side chain, whereas aspartic acid carries a short
carboxylic acid side chain. Glutamic acid is another acid, with a side chain longer than that of
aspartic acid by a single methylene, and asparagine contains an amide group instead of the
carboxyl group. A residue is often changed to alanine because of alanine’s simplicity: its side
chain is a single methyl group. This substitution approximates a loss of both functional group and
bulk. Charge and size are two of the most important characteristics to investigate for an amino
acid’s contribution. Figure 1.1 shows how an aspartic acid can be changed to asparagine to alter
charge, or to glutamic acid to alter size.
Often a residue contains multiple functionalities. For example, tyrosine contains
both a hydroxyl functional group and an aromatic functional group. To investigate the
contributions of these moieties as separately as possible, a series of mutations such as those shown
in Figure 1.2 could be made. A mutation from tyrosine to serine, while a drastic change in size,
would remove the aromatic functionality. A mutation from tyrosine to phenylalanine would
remove just the hydroxyl group, leaving the 6-membered aromatic ring intact.
Figure 1.2. Phenylalanine, tyrosine, and serine at pH 7.
Tyrosine provides both an aromatic ring and a hydroxyl group to the active site of an
enzyme. Phenylalanine provides only the aromatic moiety whereas serine adds only a hydroxyl
group without aromaticity. These mutations allow us to test hypotheses pertaining to an
enzyme’s stability, mechanism, selectivity, or efficiency by rational change of the protein. This
method of investigation underpins protein engineering as a powerful tool to investigate the active
sites of proteins.
1.3. Functional Site Prediction with THEMATICS and POOL
The active site of the protein is commonly described as “where the chemistry happens”. For
our purposes we sometimes use a stricter definition: “residues within 5 Å of the site of
reaction”. These residues interact with the substrate directly, whether by hydrophobic
interactions, π-π interactions, (de)protonation, hydrogen bonds, Coulomb forces, dipole-dipole
interactions, or covalent bonding. It is of great interest to predict accurately and quickly the
active site of a given protein structure. To that end, the active site prediction method Theoretical
Microscopic Anomalous Titration Curve Shapes (THEMATICS) was published in 2001[17].
THEMATICS uses computational methods to calculate a theoretical titration curve for every
ionizable residue (K, R, D, E, H, Y, C) in a protein structure. A small minority of these titration
curves will show behavior that significantly differs from the ideal Henderson-Hasselbalch
behavior (Figure 1.3). While a single outlier may be a fluke, a “cluster”, defined as two or more
residues with deviant behavior within 6 Å of each other, is considered a positive hit for
identifying the active site.
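The cluster criterion can be sketched as a grouping of flagged residues by distance. This is a minimal illustration, not the THEMATICS implementation: it assumes each residue is represented by a single hypothetical (x, y, z) position in Å, and it interprets “within 6 Å of each other” as connected groups in which every member has at least one flagged neighbour within the cut-off.

```python
import math
from itertools import combinations

CLUSTER_CUTOFF = 6.0  # Å, the cluster distance quoted above


def find_clusters(flagged, cutoff=CLUSTER_CUTOFF):
    """Group flagged residues that lie within `cutoff` Å of one another.

    `flagged` maps a residue label to one (x, y, z) position in Å.
    Returns the connected groups that contain two or more residues.
    """
    # Build neighbour sets from all pairwise distances.
    neighbours = {name: set() for name in flagged}
    for a, b in combinations(flagged, 2):
        if math.dist(flagged[a], flagged[b]) <= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Collect connected components with a depth-first search.
    clusters, seen = [], set()
    for start in flagged:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        if len(group) >= 2:  # a lone outlier does not count as a hit
            clusters.append(sorted(group))
    return clusters


# Made-up coordinates for four anomalous residues; res4 is an isolated outlier.
example = {"res1": (0, 0, 0), "res2": (4, 0, 0), "res3": (8, 0, 0), "res4": (30, 0, 0)}
print(find_clusters(example))
```

In this example res1 and res3 are 8 Å apart but are joined through res2, which is one consequence of the connected-group interpretation assumed here.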
THEMATICS exploits a defining property of a catalyst to help find active sites: a catalyst
must return to its original state at the end of a chemical reaction. The many enzymes
that donate or accept a proton face a fundamental problem: a residue acidic
enough to donate a proton to the substrate would be too weak a base to take the proton
back, and a residue basic enough to abstract a proton from the substrate would be too
weak an acid to return it[18].
If a residue could be both an acceptor and donor of a proton simultaneously, or near
simultaneously, the paradox would be resolved. The residue would have to be ionizable over a
wide range of pH values and not follow Henderson-Hasselbalch behavior; this is precisely the type
of behavior that THEMATICS calculates for residues of known catalytic importance.
Figure 1.3. A titration curve of mean net charge as a function of pH for select lysine residues in E. coli β-lactamase.
In Figure 1.3[19] the two filled symbols show the titration curve of two lysines (K146
and K215) that do not contribute to catalysis. The two unfilled symbols in Figure 1.3 show the
titration curves of active site lysines K73 and K234. Note the classic, sharp transition of charge
states as modeled by the Henderson-Hasselbalch equation for the non-catalytic lysines contrasted
to the perturbed, anomalous behavior of the curves for catalytic lysines.
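For reference, the ideal Henderson-Hasselbalch curve for an isolated ionizable group can be computed directly. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the pKa of 10.5 is a typical textbook value for a free lysine side chain, not a value computed by THEMATICS.

```python
def hh_mean_charge(pH: float, pKa: float, acidic: bool = False) -> float:
    """Mean net charge of an isolated ionizable group from Henderson-Hasselbalch."""
    protonated_fraction = 1.0 / (1.0 + 10.0 ** (pH - pKa))
    # Bases (e.g. lysine) carry +1 when protonated;
    # acids (e.g. aspartate) carry -1 when deprotonated.
    return protonated_fraction - 1.0 if acidic else protonated_fraction


LYSINE_PKA = 10.5  # assumed textbook value for a free lysine side chain

for pH in (8.5, 9.5, 10.5, 11.5, 12.5):
    print(f"pH {pH:4.1f}: mean charge {hh_mean_charge(pH, LYSINE_PKA):.3f}")
```

The ideal curve falls from about 0.91 to about 0.09 within one pH unit on either side of the pKa. That sharp transition is what the non-catalytic lysines show, and it is what the catalytic lysines deviate from.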
THEMATICS offers additional advantages beyond predicting active sites. Because the
criteria for prediction are based purely on chemical properties computed from the
three-dimensional coordinates of the query protein and are not dependent on homology,
THEMATICS remains immune to false positives due to homology or database misannotation. A
structure of an enzyme could be the only structure in existence, such as a novel or artificial fold,
and THEMATICS will still perform just as well. It has been shown that THEMATICS
works well using a homology model as an input rather than an experimental structure[20], and finds
both catalysis and recognition sites of enzymes[21]. Quantitation of the deviation from
Henderson-Hasselbalch behavior was implemented by examining the 3rd and 4th central moments
of the curves, which correspond to asymmetry and kurtosis, respectively[22]. Residues scoring
more than one standard deviation higher than the average residue of their type were considered positive
hits (Z > 1 for μ3 or μ4)[22]. The Z-score cut-off was later refined to Z > 0.99 for μ3 or μ4 after it
was found to improve performance on the reference data set[23].
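The moment-based scoring can be sketched as follows. How the moments are taken is not specified in this section, so the code makes a loudly stated assumption: the normalized change in the titration curve per pH step is treated as a probability mass, and μ3 and μ4 are the central moments of that distribution. All function names are hypothetical, and the “broad” curve is an invented example, not a computed titration curve.

```python
from statistics import mean, pstdev


def ideal_curve(pH_values, pKa):
    """Protonated fraction predicted by the Henderson-Hasselbalch equation."""
    return [1.0 / (1.0 + 10.0 ** (pH - pKa)) for pH in pH_values]


def central_moments(pH_values, curve):
    """Return (mu3, mu4) of the distribution defined by the slope of a titration curve."""
    # Assumption: the normalized change per pH step acts as a probability mass.
    mids = [(a + b) / 2 for a, b in zip(pH_values, pH_values[1:])]
    steps = [abs(c2 - c1) for c1, c2 in zip(curve, curve[1:])]
    total = sum(steps)
    probs = [s / total for s in steps]
    center = sum(p * x for p, x in zip(probs, mids))
    mu3 = sum(p * (x - center) ** 3 for p, x in zip(probs, mids))
    mu4 = sum(p * (x - center) ** 4 for p, x in zip(probs, mids))
    return mu3, mu4


def z_scores(values):
    """Z-score of each value relative to the mean and standard deviation of its group."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


pH_grid = [i * 0.1 for i in range(141)]  # pH 0 to 14
sharp = ideal_curve(pH_grid, 7.0)
# A hypothetical broadened curve: the average of two ideal transitions.
broad = [(a + b) / 2 for a, b in zip(ideal_curve(pH_grid, 5.0), ideal_curve(pH_grid, 9.0))]
print(central_moments(pH_grid, sharp))
print(central_moments(pH_grid, broad))
```

Under this assumption a curve that titrates over a wide pH range has a much larger μ4 than the ideal curve. A residue would then be flagged when the Z-score of its μ3 or μ4, computed against residues of the same type, exceeds the cut-off given above.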
Originally, THEMATICS titration curves were inspected manually for non-Henderson-Hasselbalch
behavior, which raised both resource-commitment and scientist-bias issues.
Automation[24] alleviated both of these concerns and paved the way for adding Support Vector
Machines (SVM) as a potential means of raising THEMATICS recall and precision even
higher[25].
Partial-Order Optimum Likelihood (POOL) combines THEMATICS with other
predictors to create the best functional site predictor to date[26]. Originally using CASTp for
geometric features and ConSurf for sequence-based features[26], POOL has since[27]
incorporated ConCavity[28] for geometric features and INTREPID[29] for sequence-based
phylogenetic features. POOL provides several advantages over THEMATICS: the ability to
predict non-ionizable residues, the inclusion of sequence and geometric information, and improved
performance. POOL allows non-ionizable residues to be predicted by assigning all residues an
environmental μ3 and μ4 based on the behavior of nearby residues. In addition to the 3rd and 4th
central moments, the buffer range[27] (BR) was added as a feature to quantitate the wide range
of buffering capability that is typically high for active site residues. POOL is publicly available
via the web at http://www.pool.neu.edu/wPOOL/[30].
1.4. Catalysis by remote residues
Earlier we defined the active site as “residues within 5 Å of the site of reaction”. Even
during the seminal work on tyrosyl-transfer RNA synthetase the concept of residues remote from
the site of chemical transformation contributing to catalysis seemed evident and was validated by
showing that T40 and H45 contributed to catalysis by binding of the tail phosphate groups of the
ATP moiety[9]. However, here a stricter definition of remote residues is adopted, and we
redefine the active site as “residues within 5 Å of the substrate”, regardless of whether that
particular residue is directly involved in chemical reactions. With this definition, residues such as
T40 and H45 in tyrosyl-transfer RNA synthetase would not be considered remote, but rather it
could be said that the active site of tyrosyl-transfer RNA synthetase is particularly large to
accommodate a particularly large substrate.
As soon as THEMATICS was created, it was noted that certain predictions by
THEMATICS included residues that were not in direct contact with the substrate[17]. These
residues were not only far away from the site of the reaction, but did not have any interaction
with the substrate. Whether these predictions were false positives or correct predictions yet to be
tested remained an open question[17, 18]. One could imagine an active site to be composed of
layers, or shells: the first shell comprises the residues in contact with the substrate, the second
shell comprises the residues in contact with, but behind, the first shell, and the third shell
comprises the residues in contact with, but behind, the second shell.
Figure 1.4 abstractly shows a multi-layered active site consisting of a 1st shell that
interacts with the substrate, a 2nd shell of residues interacting with the 1st shell, and a 3rd shell of
residues interacting with the 2nd shell. Each shell is approximately 5 Å in depth.
Figure 1.4. Diagram of a multi-layered active site.
These predicted residues were in the second, or even third, shell of the active site. Nitrile
Hydratase (NH), Phosphoglucose Isomerase (PGI) and DNA Polymerase IV (DinB) were
predicted to contain 2nd shell residues contributing to catalysis. Alternatively, there are some
enzymes such as Ketosteroid Isomerase (KSI) where no second-shell residues are predicted to be
important for catalysis. It was found that indeed NH[31], PGI[32], and DinB[33] all contain
remote residues contributing to catalysis, whereas KSI[32] possesses a mostly single-layered
active site.
These results show that many, but not all, enzymes contain active sites that are extended,
utilizing remote residues to contribute towards catalysis. THEMATICS and POOL accurately
predict the contributions of remote residues to catalysis by a wide range of enzymes. Thus, the
extent of an enzyme’s active site can be predicted using POOL and THEMATICS.
Chapter 2. Alkaline Phosphatase
2.1. Introduction
Alkaline phosphatase (AP) occurs across all domains of life, releasing phosphate groups
from a wide range of substrates. AP is of great interest for use in diagnostic assays, but the
bacterial enzyme is considered too slow compared to the mammalian enzyme, although the
temperature stability of the mammalian enzyme is much lower than that of the bacterial enzyme (65 °C
and 95 °C TM, respectively)[34]. Alkaline phosphatase has been a staple of enzymology studies
for decades[35] although it is under constant revision and further investigation as to its
mechanism[35-38]. Its thermostability, ubiquity in both nature and the chemical literature, and
ease of kinetic assay present an excellent learning opportunity. As such, the senior-level
Chemical Biology course at Northeastern University utilizes the site-directed mutagenesis and
Michaelis-Menten parameterization of alkaline phosphatase as a long-term lab experiment. Some
mutations in this work were designed by undergraduates taking this course.
E. coli AP is encoded by the phoA gene, which codes for a 471-amino-acid precursor
protein; the first 21 amino acids form a periplasmic signal sequence that is then
removed from the protein, resulting in a 450-amino-acid enzyme that naturally dimerizes in
solution. Each monomer contains its own active site with three metal ions: two zinc and one
magnesium[39]. These metal ions are held in place by various residues and with no substrate
present are coordinated with three water molecules[40]. The magnesium ion is held in place by
D51, D153, T155, and E322; the zinc1 ion is held in place by R166, D327, H331, and H412; the
zinc2 ion is held in place by D51, R166, D369 and H370[35, 39-49]. K328 interacts with the
phosphate moiety through a water molecule[49] and S102 performs the nucleophilic attack on
the substrate[40, 50, 51].
Figure 2.1. The active site of alkaline phosphatase based on PDB ID: 1ALK. Zinc: purple; magnesium: yellow;
phosphate: green/red.
2.2. Computational Predictions
Alkaline phosphatase as analyzed by THEMATICS was predicted to have mostly 1st shell
residues, with two predicted 2nd shell residues and one 3rd shell residue. In addition to analysis by
THEMATICS, analysis by Evolutionary Trace (ET)[52, 53] predicted a much larger population
of residues that fully included those predicted by THEMATICS.
Figure 2.2. Diagram of Evolutionary Trace and THEMATICS predictions for AP.
Each group in Figure 2.2 represents a shell of the active site for AP. Residues predicted
by ET but not predicted by THEMATICS could be non-ionizable (as in the case of S102)
and/or simply not predicted by THEMATICS.
POOL Rank   Residue   Raw POOL Score   Normalized POOL Score
1 ASP 51 2.06E-02 1.00E+00
2 ASP 369 1.42E-02 6.87E-01
3 HIS 370 1.06E-02 5.15E-01
4 ASP 327 1.06E-02 5.15E-01
5 GLU 322 7.89E-03 3.83E-01
6 HIS 412 3.57E-03 1.73E-01
7 HIS 331 5.18E-04 2.52E-02
8 ASP 101 3.57E-04 1.73E-02
9 HIS 372 3.36E-04 1.63E-02
10 ASP 153 1.54E-04 7.50E-03
11 ARG 166 1.10E-04 5.36E-03
12 GLU 341 8.08E-05 3.92E-03
13 LYS 328 7.37E-05 3.58E-03
14 HIS 86 6.85E-05 3.33E-03
15 PRO 156 4.25E-05 2.06E-03
16 GLY 52 3.80E-05 1.85E-03
17 GLU 57 2.20E-05 1.07E-03
18 THR 155 2.11E-05 1.03E-03
19 SER 102 2.11E-05 1.03E-03
20 HIS 162 1.77E-05 8.59E-04
21 MET 53 1.69E-05 8.18E-04
22 ASP 330 9.87E-06 4.79E-04
23 ASN 44 8.80E-06 4.27E-04
24 PHE 317 8.80E-06 4.27E-04
25 GLY 207 8.80E-06 4.27E-04
Table 2.1. POOL predictions for alkaline phosphatase. THEMATICS predictions include those colored. Blue: 1st
shell; yellow: 2nd shell
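The relationship between the two score columns of Table 2.1 can be checked with a short sketch. It assumes, as the table values suggest but the text does not state, that the normalized POOL score is the raw score divided by the highest raw score; the function name `normalize` is illustrative only.

```python
# Raw POOL scores for the five top-ranked AP residues, copied from Table 2.1.
raw_scores = {
    "ASP 51": 2.06e-02,
    "ASP 369": 1.42e-02,
    "HIS 370": 1.06e-02,
    "ASP 327": 1.06e-02,
    "GLU 322": 7.89e-03,
}


def normalize(scores):
    """Divide every raw score by the largest one (assumed normalization rule)."""
    top = max(scores.values())
    return {residue: score / top for residue, score in scores.items()}


normalized = normalize(raw_scores)
for residue, value in normalized.items():
    print(f"{residue:8s} {value:.3f}")
```

Small differences from the tabulated values are expected because the raw scores in the table are rounded to three significant figures.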
Shown here are only the 25 most highly ranked residues of 450. Residues in blue are
residues known to contribute to catalysis through interactions with the ligand or metal ions; residues in yellow are predicted by
THEMATICS to be 2nd or 3rd shell residues of interest. E341 helps form the dimer interface.
Figure 2.3. A POOL plot of POOL score vs. POOL rank for alkaline phosphatase.
The POOL plot in Figure 2.3 extends out towards a rank of 449, asymptotically
approaching a POOL score of 0. There are quite a few interesting predictions by POOL for
alkaline phosphatase (Table 2.1). It performs well in predicting the first shell of residues, as well
as the dimer-interface forming residue. Residues predicted by THEMATICS all reside in the top
22 (top 5%) of POOL ranking, including the 2nd and 3rd shell residues predicted by
THEMATICS. Threonine 155 and serine 102 are both essential for catalysis but only rank 18th
and 19th, respectively; because neither serine nor threonine is considered ionizable, THEMATICS
would not predict these residues directly. The computational predictions shown above suggest
that alkaline phosphatase may have a few 2nd and 3rd shell residues important for catalysis,
namely E57, D330, and H372 (Figure 2.4).
[Figure 2.3 axes: POOL score (0 to 1) vs. POOL rank (0 to 50)]
POOL discards the THEMATICS Boolean approach of assigning discrete yes/no values
to predictions of functional importance for residues in exchange for a ranking system (Figure
2.3), which has its own advantages and disadvantages (see discussion[26]). Traditionally, a
percentage-based cut-off, such as the top 8%, 10%, or as low as the top 5%, is used to determine which
residues the user should investigate as important for catalysis. However, the exact cut-off is still an area
of investigation (see Further Work) and can depend on the size and type of the protein of
interest.
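As an illustration of how a percentage-based cut-off translates into a number of residues, the sketch below computes the cut-off rank for the 450-residue AP monomer. Rounding down is an assumption, chosen because it reproduces the "top 22 (top 5%)" figure quoted above; the function name is hypothetical and is not part of POOL.

```python
def top_percent_cutoff(n_residues, percent):
    """Number of top-ranked residues kept by a percentage cut-off.

    Integer arithmetic is used so the result is rounded down without
    floating-point surprises (assumed rounding rule).
    """
    return (n_residues * percent) // 100


# Cut-offs mentioned in the text, applied to the 450-residue AP monomer.
for percent in (5, 8, 10):
    print(f"top {percent}%: first {top_percent_cutoff(450, percent)} residues")
```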
Figure 2.4. The 2nd and 3rd shell residues predicted by THEMATICS: (top) H372, (middle) D330, and (bottom) E57.
Zinc: purple; magnesium: yellow; phosphate: green/red.
2.3. Materials and Methods
To investigate these predictions by THEMATICS and POOL pertaining to the possible
outer-shell residues in alkaline phosphatase, we employed site-directed mutagenesis to construct
mutants, expressed and purified them, and assayed their activities in reference to the wild-type
protein.
2.3.1. Materials
QuikChange® site-directed mutagenesis kits (Agilent, CA) were used to make mutations
in pEK29[43], provided by E. Kantrowitz (Boston College), using the primers below; mutations were confirmed
by DNA sequencing (Massachusetts General Hospital DNA Core, Cambridge, MA).
M75T
5'-GGCGATGGGACGGGGGACTCGG-3'
5'-CCGAGTCCCCCGTCCCATCGCC-3'
H394D
5'-CTGATCACGCCGACGCCAGCCAG-3'
5'-CTGGCTGGCGTCGGCGTGATCAG-3'
H108L
5'-GGGCAATACACTCTCTATGCGCTG-3'
5'-CAGCGCATAGAGAGTGTATTGCCC-3'
E79Q
5'-GGACTCGCAAATTACTGCCGCACG-3'
5'-CGTGCGGCAGTAATTTGCGAGTCC-3'
D352N
5'-CGATAAACAGAATCATGCTGCCAATCC-3'
5'-GGATTCGCAGCATGATTCTGTTTATCG-3'
H394L
5'-CGCTGATCACGCCCTCGCCAGCCAG-3'
5'-CTGGCTGGCGAGGGCGTGATCAGCG-3'
M75A
5'-CTGATTGGCGATGGGGCAGGGGACTCG-3'
5'-CGAGTCCCCTGCCCCATCGCCAATCAG-3'
E172Q
5'-GTTTCTACCGCACAGTTGCAGGATG-3'
5'-CATCCTGCAACTGTGCGGTAGAAAC-3'
S127A
5'-GACTCGGCTGCAGCAGCAACCGCC-3'
5'-GGCGGTTGCTGCTGCAGCCGAGTC-3'
Q457E
5'-GGACTGACCGACGAGACCGATCTC-3'
5'-GAGATCGGTCTCGTCGGTCAGTCC-3'
Figure 2.5. Primers for site-directed mutagenesis of E. coli alkaline phosphatase. Codons manipulated are
underlined.
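Because each mutagenic primer pair is expected to consist of two complementary strands, a quick consistency check is to confirm that the second primer is the reverse complement of the first. The following is a minimal sketch using the M75T pair from Figure 2.5; the helper name `reverse_complement` is illustrative and is not part of any kit software.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# M75T primer pair as listed in Figure 2.5 (both written 5' to 3').
forward = "GGCGATGGGACGGGGGACTCGG"
reverse = "CCGAGTCCCCCGTCCCATCGCC"

is_pair = reverse_complement(forward) == reverse
print("M75T primers are reverse complements:", is_pair)
```

The same check can be applied to any other pair in the figure before ordering or re-ordering primers.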
SM547 cells, lacking a chromosomal phoA gene, were provided by E. Kantrowitz and
made competent by chemical treatment with CaCl2 and stored at -80 °C in aliquots. Primers were
hydrated to 100 μM concentration with sterile water, and a 5 μM stock was created by diluting
20-fold into sterile water.
2.3.2. Methods
For protein purification, plasmids to express either WT or variant AP were transformed
into SM547 competent cells and selected on LB agar containing 100 μg mL⁻¹ ampicillin. An
overnight culture of 50 mL YT medium containing 100 μg mL⁻¹ ampicillin grown at 37 °C was
sub-cultured to 1 L YT supplemented with 100 μg mL⁻¹ ampicillin and growth was continued for
12 hours at 37 °C. The cells were harvested, washed, and osmotically shocked as previously
described by Brockman & Heppel[54] and then precipitated, suspended, dialyzed, and purified
on a HiTrap FastFlow Q column (GE Healthcare) by FPLC as described by Chaidaroglou et
al.[43]. Purity of each fraction was determined by 10% SDS-PAGE and pure fractions were
stored at -20 °C. Concentration of protein was determined by Bradford assay (Bio-Rad) against a
bovine serum albumin standard.
Formation of para-nitrophenol from the cleavage of para-nitrophenyl phosphate (PNPP) was measured at 410 nm
at room temperature in High Tris buffer (1.0 M Tris-HCl pH 8.0) to calculate
initial velocities with an extinction coefficient of 1.42 x 10⁴ M⁻¹ cm⁻¹ (Figure 2.6). Non-linear
regression to calculate KM and kcat was performed using GraphPad Prism 5 version 5.02. At least
three independent trials were performed for each protein. Reactions were initiated by addition of enzyme, and data were
collected every 0.5 seconds starting at the 3rd second of the reaction and continuing for 2 min to construct the initial
velocities. PNPP was kept in the dark as much as possible, and stored in
light-resistant microcentrifuge tubes when aliquoted.
PNPP (μM)   Buffer (2X)   Water   PNPP (2 mM)   Enzyme (variable nM)   Total
1           500           483     2             15                     1000
2           500           481     4             15                     1000
5           500           475     10            15                     1000
10          500           465     20            15                     1000
20          500           445     40            15                     1000
50          500           385     100           15                     1000
100         500           285     200           15                     1000
200         500           85      400           15                     1000
Table 2.2. Kinetic assays for alkaline phosphatase. Bolded columns denote final concentrations, where all other
numbers refer to μL added to the cuvette.
2.4. Results
In order to determine initial velocities by monitoring production of the product
4-nitrophenol (4-PNP), a standard curve was constructed with dilutions of 4-PNP, giving a molar
extinction coefficient of 1.42 x 10⁴ M⁻¹ cm⁻¹, similar to the reported 1.62 x 10⁴ M⁻¹ cm⁻¹[43].
Figure 2.6. Standard curve for 4-nitrophenol
Each alkaline phosphatase variant was tested concurrently with wild-type alkaline
phosphatase on the same day. Initial velocities, V0, for each substrate concentration (1-200 μM
PNPP) were calculated by taking the slope of the product formation (in a.u. min⁻¹) and dividing by
the 4-PNP molar extinction coefficient to give μM PNP min⁻¹.
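The conversion from an absorbance slope to an initial velocity follows the Beer-Lambert law and can be sketched as below. The 1 cm path length and the example slope are assumptions for illustration (neither is stated in the text); the extinction coefficient is the value from the standard curve.

```python
EPSILON_M_CM = 1.42e4   # molar extinction coefficient from the standard curve, M^-1 cm^-1
PATH_LENGTH_CM = 1.0    # assumed cuvette path length; not stated in the text


def initial_velocity_um_per_min(slope_au_per_min,
                                epsilon=EPSILON_M_CM,
                                path_cm=PATH_LENGTH_CM):
    """Convert an absorbance slope (a.u. min^-1) to a rate in uM min^-1."""
    molar_per_min = slope_au_per_min / (epsilon * path_cm)  # Beer-Lambert: A = e * l * c
    return molar_per_min * 1e6                              # M -> uM


# Hypothetical slope, chosen so that the answer is easy to verify by hand.
example = initial_velocity_um_per_min(0.0142)
print(f"V0 = {example:.2f} uM min^-1")
```

With these numbers a slope of 0.0142 a.u. min⁻¹ corresponds to 1 μM min⁻¹, matching the standard-curve slope of 0.0142 a.u. per μM.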
[Figure 2.6 plot: absorbance (410 nm) vs. 4-nitrophenol (μM); linear fit y = 0.0142x + 0.0304, R² = 0.9993]
Figure 2.7. Michaelis-Menten plots for AP in 1 M Tris-HCl pH 8.0 buffer. Error bars represent standard error of at
least three independent trials.
AP Variant   Vmax (μM min⁻¹)   KM (μM)   R²
WT 4.8 (0.1) 26.7 (2.6) 0.94
M53A 8.9 (0.4) 26.2 (4.1) 0.96
M53T 1.7 (0.1) 22.7 (2.1) 0.98
E57Q 7.5 (0.3) 25.2 (3.5) 0.97
H86L 3.2 (0.1) 13.5 (0.9) 0.99
S105A 3.1 (0.2) 14.1 (3.1) 0.91
E150Q 5.2 (0.5) 33.2 (8.7) 0.90
D330N 2.0 (0.1) 22.0 (5.4) 0.87
H372D 3.0 (0.3) 53.1 (14.4) 0.91
H372L 4.1 (0.2) 10.6 (1.5) 0.95
Q435E 8.4 (0.1) 20.9 (1.1) 0.99
Table 2.3. WT and variant AP kinetic parameters. Standard errors are in parentheses and consist of at least three
independent trials.
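To make the fitted parameters concrete, the sketch below evaluates the Michaelis-Menten equation, V0 = Vmax[S] / (KM + [S]), at the substrate concentrations of Table 2.2 using the WT values from Table 2.3. It only evaluates the model; it does not reproduce the non-linear regression performed in GraphPad Prism.

```python
def michaelis_menten(substrate_um, vmax, km_um):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * substrate_um / (km_um + substrate_um)


# WT parameters from Table 2.3: Vmax in uM min^-1, KM in uM.
WT_VMAX = 4.8
WT_KM = 26.7

# Final PNPP concentrations (uM) used in the assays of Table 2.2.
substrate_series = [1, 2, 5, 10, 20, 50, 100, 200]

predicted = {s: michaelis_menten(s, WT_VMAX, WT_KM) for s in substrate_series}
for s, v in predicted.items():
    print(f"[PNPP] = {s:3d} uM -> V0 = {v:.2f} uM min^-1")
```

At [S] = KM the model gives exactly half of Vmax, which is a convenient sanity check on any fitted curve.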
Vmax is not proportional to kcat between enzymes due to the enzymes being at different
concentrations (Appendix A). None of the variants showed a dramatic decrease in activity. While
there are some small differences in individual kcat or KM values, the catalytic efficiencies are all
similar (Table 2.4).
PhoA Variant   POOL Rank   Å to PO4   Shell   kcat (s⁻¹)    KM (μM)    Catalytic Efficiency (10⁶ M⁻¹ s⁻¹)   Fold Decrease
WT             --          --         --      40 (7.3)      28 (9)     1.43 (0.53)                          --
H372D          9           6.7        2nd     27 (10)       63 (44)    0.43 (0.34)                          3.33 (2.91)
H372L          9           6.7        2nd     6.3 (0.1)     11 (2.9)   0.57 (0.15)                          2.49 (1.13)
H86L           14          11.2       2nd     6.3 (0.1)     14 (0.7)   0.45 (0.02)                          3.17 (1.19)
S105A          16          7.2        2nd     9.7 (0.7)     14 (3.2)   0.69 (0.17)                          2.06 (0.91)
E57Q           17          12.3       3rd     21 (0.8)      26 (4)     0.81 (0.13)                          1.77 (0.71)
M53A           21          14.6       3rd     25 (4.4)      27 (16)    0.93 (0.17)                          1.54 (0.64)
M53T           21          14.6       3rd     14 (0.8)      23 (3.4)   0.61 (0.10)                          2.36 (0.94)
D330N          22          11.0       2nd     17 (4.2)      25 (2.5)   0.68 (0.18)                          2.1 (0.96)
Q435E          44          11.1       2nd     17.4 (0.4)    21 (3)     0.83 (0.12)                          1.72 (0.68)
E150Q          136         10.2       2nd     13 (8.8)      34 (6.4)   0.38 (0.27)                          3.74 (2.97)
Table 2.4. Summary calculations for WT alkaline phosphatase and variants. Standard errors are in parentheses and
consist of at least three independent trials.
There is no correlation between catalytic efficiency and either POOL rank or distance (Å) to the phosphate
substrate for the residues tested, nor is there a significant difference in results between the 2nd
shell and the 3rd shell residues as groups. Distances from the PO4 (Å) are based on
PDB:1ALK[39] and measured from the tip of the residue side chain to the phosphorus atom.
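The derived columns of Table 2.4 can be reproduced from kcat and KM with a few lines. This sketch assumes the standard errors were propagated in quadrature for a ratio; that rule is an inference from the tabulated numbers, not something stated in the text, and the function name is illustrative.

```python
import math


def ratio_with_error(a, a_err, b, b_err):
    """Return a/b and its standard error, propagating relative errors in quadrature."""
    value = a / b
    relative = math.sqrt((a_err / a) ** 2 + (b_err / b) ** 2)
    return value, value * relative


# WT values from Table 2.4: kcat = 40 (7.3) s^-1, KM = 28 (9) uM.
# kcat in s^-1 divided by KM in uM gives uM^-1 s^-1, i.e. units of 10^6 M^-1 s^-1.
wt_eff, wt_eff_err = ratio_with_error(40, 7.3, 28, 9)

# Fold decrease of H372D relative to WT, using the tabulated efficiencies.
fold, fold_err = ratio_with_error(1.43, 0.53, 0.43, 0.34)

print(f"WT kcat/KM = {wt_eff:.2f} ({wt_eff_err:.2f}) x 10^6 M^-1 s^-1")
print(f"H372D fold decrease = {fold:.2f} ({fold_err:.2f})")
```

The large propagated errors on the fold decreases illustrate why differences of 2- to 4-fold among the variants are not treated as significant.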
Figure 2.8. Catalytic efficiencies of wild-type and variant alkaline phosphatases. Error bars represent standard error
over at least three independent trials.
[Figure 2.8 axes: catalytic efficiency (10⁶ M⁻¹ s⁻¹, 0.00 to 2.50) vs. AP variant (WT and the ten variants of Table 2.4)]
2.5. Conclusions
WT alkaline phosphatase is catalytically efficient with a kcat/KM of 1.5 x 10⁶ M⁻¹ s⁻¹,
which is similar to the literature values reported across various experiments[34, 43, 55, 56].
Mutations disrupting interactions of the active site at the first shell commonly decrease the
catalytic efficiency of alkaline phosphatase by many orders of magnitude. In contrast, throughout
this work and from the compiled literature of single-mutation variants, mutations in the second
or third shell have little to no effect on catalysis. Alkaline phosphatase seems to have a compact
active site comprising solely first-shell residues that contribute significantly to catalysis as
measured by single-point mutants.
Figure 2.9. AP residues investigated in this work. Zinc: purple; magnesium: yellow; phosphate: green/red.
It has been shown that the turnover rate of AP can be increased substantially by multiple
mutations, including 2nd shell mutations. A D153G/D330N double mutant was reported to have
over 50-fold higher kcat than the WT AP; however, the KM was also raised by about 30-fold, leaving the
enzyme with less than a 2-fold higher overall catalytic efficiency[34]. Similarly,
D101A gives 2-fold increases in both kcat and KM that negate each other[57]. D153A by itself, while
resulting in almost no change in catalytic efficiency, resulted in a 7-fold increase in both kcat and
KM[42]. D101, D153, and D330 are all predicted by POOL and rank 8th, 10th, and 22nd,
respectively. Multiple mutations in close proximity achieved modest 2- to 6-fold increases in kcat/KM,
including V99A, T100V, T100I, and D101S[58].
Figure 2.10. A plot of Table 2.5 and Table 2.6 showing effects on catalytic efficiency based on POOL rank for AP.
[Figure 2.10 axes: fold decrease over respective WT AP (log scale, 0.1 to 1,000,000) vs. POOL rank for AP (0 to 160)]
The largest losses of activity are seen for D327 and S102, with no mutations of residues
outside the top 20 residues predicted by POOL causing a large (more than one order of magnitude) decrease in
catalytic efficiency. It is important to note that this compilation only examines single mutations
where both subunits are affected. Alkaline phosphatase is known to show intragenic
complementation, where a heterodimer of variants A and B, AB, will have higher activity than
AA or BB[56]. While the two active sites per dimer are more than 30 Å apart, there seems to be
molecular communication between them.
Variant   Shell   POOL Rank   POOL percentile   (kcat/KM) wild-type / (kcat/KM) mutant   Reference
D51E 1 1 99 231 [44]
D369N 1 2 99 95 [56]
D327N 1 4 99 4350 [45]
D327N 1 4 99 100 [46]
D327A 1 4 99 >600,000 [45]
D327A 1 4 99 >1,000,000 [46]
E322K 1 5 99 1520 [56]
H412Y 1 6 99 >12,000 [56]
H412E 1 6 99 2237 [44]
H331E 1 7 98 972 [44]
D101S 1 8 98 0.2 [59]
D101A 1 8 98 1 [57]
D153G 1 10 98 0.2 [59]
D153H 1 10 98 1.1 [34]
D153H 1 10 98 3.5 [47]
D153E 1 10 98 1.3 [44]
D153A 1 10 98 1.1 [42]
D153N 1 10 98 1.1 [42]
R166A 1 11 98 313 [43]
R166S 1 11 98 125 [43]
R166Q 1 11 98 166 [48]
R166K 1 11 98 4 [48]
K328R 1 13 97 0.9 [58]
K328C 1 13 97 10 [51]
K328H 1 13 97 0.5 [34]
K328H 1 13 97 3.2 [49]
K328A 1 13 97 3.8 [49]
T155M 1 18 96 678 [56]
S102G 1 19 96 >300,000 [60]
S102A 1 19 96 >60,000 [60]
S102C 1 19 96 >19,000 [60]
Table 2.5. 1st shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.
Variant   Shell   POOL Rank   POOL percentile   (kcat/KM) wild-type / (kcat/KM) mutant   Reference
H372A 2 9 98 2.9 [61]
H372D 2 9 98 3.3 This Work
H372L 2 9 98 2.5 This Work
H86L 2 14 97 3.1 This Work
E57Q 2 17 96 1.8 This Work
M53A 2 21 95 1.5 This Work
M53T 2 21 95 2.4 This Work
D330N 2 22 95 0.2 [34]
D330N 2 22 95 2.1 This Work
Q435E 2 44 90 1.7 This Work
A103C 2 50 89 0.9 [58]
A103D 2 50 89 2.2 [58]
T100V 2 51 89 0.3 [58]
T100I 2 51 89 0.3 [58]
V99A 2 100 78 0.2 [58]
S105L 2 136 70 6.3 [56]
S105A 2 136 70 2.1 This Work
E150Q 3 106 76 3.7 This Work
E341K * 12 97 1407 [56]
T59A * 169 37 1.5 [62]
T59R * 169 37 >600,000 [62]
Table 2.6. 2nd and 3rd shell variants of AP and their catalytic efficiency under comparable conditions to our
experiments.
Chapter 3. Ketosteroid Isomerase
3.1. Introduction
Ketosteroid isomerase (KSI) moves a double bond to convert ∆⁵-3-ketosteroids to ∆⁴-3-ketosteroids
by cleavage of the C-H bond at C4 and reattachment of the proton at C6. This reaction is
characteristic of many biological processes of intramolecular abstraction and reprotonation
(Figure 3.1). Considering that some known KSI enzymes reach diffusion-limited rates of
reaction[63, 64], KSI is an attractive model for studying enzyme kinetics and active site
engineering[32, 63, 65]. There are two well-studied sources of ∆⁵-3-ketosteroid isomerase:
Pseudomonas putida (PpKSI) and Comamonas testosteroni (CtKSI). These two enzymes have
practically identical active sites and catalytic residue placement, while sharing only 34% amino
acid sequence identity[63]. This fold is not entirely uncommon in nature[66], being
superimposable on Nuclear Transcription Factor 2[67] despite a lack of sequence homology or
functional similarity[66, 68].
Figure 3.1. Mechanism of KSI based on PpKSI numbering.
The active site of KSI is particularly hydrophobic, which is reasonable for an enzyme that
binds steroid ligands[66]. The mechanism for KSI involves abstraction of a proton at the C4
position by D40 (PpKSI numbering) followed by stabilization of the intermediate by D103 and
Y16[32, 63, 65, 66, 69]. Regeneration of the catalyst is achieved by the C6 carbon abstracting
the hydrogen from D40. The ability of an aspartic acid to act as a base is of particular interest,
especially with a nearby aspartic acid requiring protonation to stabilize the resulting enolate ion.
With the advent of the Protein Structure Initiative many crystal structures are uploaded to
the Protein Data Bank with putative, predicted, or unknown function. These structures often have
function assignments based purely on sequence or structural similarity. With misannotation in
databases becoming an increasing problem[70], recently we have developed a method to help
assign function to structures without biochemical data called SALSA: Structurally Aligned Local
Sites of Activity[71]. Because THEMATICS and POOL allow the active site of any protein to be
predicted regardless of existing homology and based solely upon the tertiary structure of the
enzyme they are optimal for prediction of protein function that may be incorrectly annotated.
There are three putative KSI proteins from structural genomics centers from three
organisms: Mycobacterium tuberculosis (MtKSI), Pectobacterium atrosepticum (PaKSI), and
Mesorhizobium loti (MlKSI). Previous work in our group by Dr. Srinivas Somarowthu has
shown that of these three, only MtKSI possesses KSI activity. However, the catalytic efficiency
of MtKSI was found to be on the order of 10⁵ M⁻¹ s⁻¹, a thousand times lower than PpKSI’s
efficiency of 10⁸ M⁻¹ s⁻¹. This raises the question: what are the key differences that lead to this
loss of activity between MtKSI and PpKSI? Can the activity of MtKSI be brought to PpKSI
levels by making the MtKSI active site more PpKSI-like?
3.2. Computational Predictions
For each known KSI and putative KSI, POOL ranked each residue’s importance for
catalysis and the top 10% for each was used as a cut-off. The structures were aligned based on
their active sites and a structural alignment table was created (see Table 3.1). Nuclear
Transcription Factor 2 (NTF2) has a very similar overall fold without sharing any
function with KSI and thus was used as a negative control.
PDB Structurally aligned POOL predicted residues
PpKSI 1oh0 Y32 Y57 Y16 D40 W120 F56 G49 P41 D103 D35 G43 E39 M116
CtKSI 8cho F30 Y55 Y14 D38 F116 F54 G47 P39 D99 D33 G41 E37 M112
MtKSI 2z76 M32 F64 S16 D40 W128 F63 G56 P41 F111 D35 G43 E39 M124
MlKSI 3hx8 Y52 W76 F36 P60 S146 L75 G68 P61 Y125 D55 - F59 D142
PaKSI 3d9r Y35 Y59 Y19 G43 K131 V58 G51 P44 E110 D38 - M42 Y127
NTF2 1oun Y33 L56 Y18 W41 A122 K55 G48 E42 Q101 A36 - T40 D117
Table 3.1. SALSA alignment of POOL predicted residues for known KSI proteins and proteins annotated as putative
KSIs. Bold: POOL-predicted; underlined: literature annotated.
POOL predictions are based on the top 10% of rankings. The proteins in Table 3.1 are, in order from
top to bottom: two known KSIs, three structural genomics (SG) putative KSIs, and a nuclear transcription factor of
similar structure, shown for comparison. Of the three putative KSIs, only MtKSI has an active site that is
both predicted and similar to the known KSI active sites; neither MlKSI nor PaKSI has a
similar active site, nor are their residues in the same spatial positions as the KSI active site
predicted to be important for activity. The match between MtKSI and PpKSI / CtKSI is not
100%. While a tyrosine to phenylalanine mismatch is somewhat conservative, it is of note that
F64 of MtKSI is not predicted by POOL to be important for catalysis. The same
can be said for S16, where PpKSI and CtKSI have a tyrosine. The essential aspartic
acid at D40 is conserved, but curiously the other aspartic acid, PpKSI-D103 / CtKSI-D99,
which is thought to be essential, is replaced by a non-POOL-predicted F111 in MtKSI.
3.3. Materials and Methods
Wild-type MtKSI DNA was obtained in the form of a plasmid pGST-Rv0760c (Craig
Garen and Prof. Michael James, Department of Biochemistry, University of Alberta) encoding MtKSI with a
GST-tag, as well as an ampicillin resistance marker gene. Steroids were purchased from
Steraloids Inc, RI, USA. Primers were hydrated to 100 μM concentration with sterile water, and
a 5 μM stock was created by diluting 20-fold into sterile water. Codons manipulated are
underlined.
3.3.1. Methods
QuikChange (Agilent Technologies) site-directed mutagenesis was used to mutate the
wild-type KSI gene with the following mutations: S16Y, F64Y, F111D, S16Y/F64Y,
S16Y/F111D, F64Y/F111D, and S16Y/F64Y/F111D. Since the amino acids of interest are coded
by codons far enough apart, multiple mutations can be introduced using single-mutation primers
in succession.
MtKSI.F111D-F GGCGTGGACACCTACCGGGTG
MtKSI.F111D-R CACCCGGTAGGTGTCCACGCC
MtKSI.F64Y-F GGCGCCTTCTACGACACACAC
MtKSI.F64Y-R GTGTGTGTCGTAGAAGGCGCC
MtKSI.S16Y-F CGCAGTCGTACTGGCGGTGCG
MtKSI.S16Y-R CGCACCGCCAGTACGACTGCG
Figure 3.2. Primers for site-directed mutagenesis of MtKSI in plasmid pGST-Rv0760c.
BL21 DE3 pLysS competent cells were transformed with pGST-Rv0760c containing
either the WT or a mutated gene, and after streaking a transformed colony, a single colony was used to
inoculate 50 mL of LB liquid culture, which was grown overnight with 100 μg mL⁻¹ ampicillin.
The next day, the 50 mL culture was transferred to 500 mL of LB liquid culture with 100 μg mL⁻¹ ampicillin
and grown with shaking for 2 h at 37 °C. Once an OD of 0.5-0.8 at 600 nm was obtained, the
culture was brought to 0.5 mmol L⁻¹ IPTG to induce expression and agitated at room temperature
overnight. After overnight growth, the culture was harvested by centrifugation at 6000 RPM for
10 minutes, suspended in 1X Phosphate Buffered Saline (PBS) pH 7.3 with 1 mM DTT and ½ a
tablet of Roche Protease Inhibitor cocktail (Buffer A), and stored at -80 °C.
Frozen pellets from the -80 °C freezer were thawed overnight on ice. The suspended,
thawed cells were subjected to sonication for 2 min (multiple rounds of 10 sec on followed by 10
sec off) and then clarified by centrifugation at 14,000 rcf for 60 min. The supernatant was
collected and loaded onto a disposable 4B Sepharose GST column resin (GE Healthcare). The
column was washed with Buffer A extensively, and then the GST-tagged MtKSI gradually eluted
with 1 to 10 mM reduced glutathione. Fractions containing MtKSI determined by SDS-PAGE
were collected and combined with histidine-tagged TEV protease overnight and then dialyzed
against Buffer A to remove any reduced glutathione. The solution was then run through a 4B
Sepharose GST column, except this time the KSI was collected in the initial flow through, and
then filtered onto a Nickel FPLC column to remove the histidine-tagged TEV protease, and the
MtKSI was collected in the flow through. Fractions containing MtKSI were determined by SDS-
PAGE, and then concentrated using Viva-spin tubes with a 5000 Da Molecular Weight Cut Off
(SartoriusStedim biotech) while being exchanged into KSI storage buffer (50 mM NaCl, 10 mM
Tris-HCl, 1 mM DTT, pH 8.0). Purity was determined by SDS-PAGE and protein concentration
determined by Bradford Assay against a BSA standard.
Activity of MtKSI was determined by formation of 4-androstene-3,17-dione (4AND) by
isomerization of 5-androstene-3,17-dione (5AND), measured at 248 nm by a UV/Vis instrument
for a fixed enzyme concentration and varying substrate concentration between 30 and 300 μM
5AND while keeping the final methanol concentration at 3.3% v/v (Table 3.2; Figure 3.3). Enzyme
concentration was fixed at a final concentration of 10 nM from a 1.2 μM stock that was made by
diluting purified KSI with a dilution buffer (34 mM KCl, 2.5 mM EDTA, 1% BSA, pH 7.0).
Reactions were blanked with all reagents except the substrate, 5AND. 5AND was added and
mixed quickly and thoroughly, and then the absorbance at 248 nm was tracked for 60 seconds, starting
after 3 seconds, with readings every 0.5 seconds.
[5AND] (μM final)   [KSI] (nM)   2X Buffer   Water   Methanol   KSI (1200 nM)   5AND (3 mM)   5AND (10 mM)   Total
10                  10           1500        1375    90         25              10            -              3000
20                  10           1500        1375    80         25              20            -              3000
30                  10           1500        1375    70         25              30            -              3000
60                  10           1500        1375    40         25              60            -              3000
90                  10           1500        1375    10         25              90            -              3000
120                 10           1500        1375    64         25              -             36             3000
180                 10           1500        1375    46         25              -             54             3000
300                 10           1500        1375    10         25              -             90             3000
Table 3.2. Kinetic assays for MtKSI. Bolded columns denote final concentrations, where all other numbers refer to
μL added to the cuvette.
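The volumes in Table 3.2 can be checked against the stated final concentrations with a simple dilution calculation. The sketch below assumes that the 5AND stocks are dissolved in methanol, which the text does not state but which is consistent with the constant 3.3% v/v methanol; the function name is illustrative.

```python
TOTAL_UL = 3000  # total reaction volume in the cuvette, from Table 3.2


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Final concentration after dilution, in the same units as the stock."""
    return stock * volume_ul / total_ul


# Selected rows of Table 3.2: (methanol uL, 5AND stock in uM, 5AND volume in uL).
rows = [
    (90, 3000, 10),    # 3 mM stock  -> 10 uM final
    (10, 3000, 90),    # 3 mM stock  -> 90 uM final
    (64, 10000, 36),   # 10 mM stock -> 120 uM final
    (10, 10000, 90),   # 10 mM stock -> 300 uM final
]

for methanol_ul, stock_um, substrate_ul in rows:
    substrate_um = final_concentration(stock_um, substrate_ul)
    # Assumption: the 5AND stock is in methanol, so both volumes count as methanol.
    methanol_percent = 100 * (methanol_ul + substrate_ul) / TOTAL_UL
    print(f"5AND = {substrate_um:5.0f} uM, methanol = {methanol_percent:.1f}% v/v")

ksi_nm = final_concentration(1200, 25)  # 25 uL of the 1200 nM KSI stock
print(f"KSI = {ksi_nm:.0f} nM")
```

Under this assumption every row contains 100 μL of methanol-based solution, so the methanol fraction stays fixed while the substrate concentration varies.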
3.4. Results
[Figure 3.3 plot: absorbance (248 nm) vs. 4AND (μM); linear fits y = 0.0142x - 0.0088 (R² = 0.9994) and y = 0.0147x + 0.0076 (R² = 0.9971)]
Figure 3.3. Standard curve for 4-androstene-3,17-dione (4AND).
A molar extinction coefficient of 1.4 x 10⁴ M⁻¹ cm⁻¹ was used for kinetic analysis. Both
trials are shown in Figure 3.3.
Figure 3.4. Michaelis-Menten plots for WT MtKSI and variants. Error bars represent standard error of at least three
independent trials.
WT and F64Y KSI show increasing V0 with increasing substrate concentration, although
the V0 values do not approach Vmax because the poor solubility of the substrate limits the accessible
concentrations. Therefore, all KM values for KSI are reported as KMapp. Vmax can still be extrapolated
by non-linear regression, but with less accuracy, as reflected in lower regression coefficients and
higher standard errors.
Figure 3.5. WT KSI and F64Y individual Michaelis-Menten plots. Panels: MtKSI WT; MtKSI F64Y.
Any variant containing S16Y and/or F111D, however, does not show classic Michaelis-Menten
behavior and has significantly diminished activity. Purification of MtKSI-F111D was
problematic, with low yields and loss of protein after concentration. Only a small amount of
data could be obtained for F111D, but there seems to be no deviation from the behavior shown
by the other non-F64Y mutants.
MtKSI Variant       Vmax             KMapp           R²
WT                  63.67 (29.08)    453.6 (297.3)   0.7611
S16Y                10.04 (25.41)    1607 (4635)     0.5909
F64Y                57.83 (3.993)    231.1 (27.89)   0.98
S16Y/F64Y           18.65 (29.19)    2485 (4247)     0.8972
S16Y/F111D          4.094 (1.148)    273.4 (127)     0.8289
F64Y/F111D          0.3873 (0.104)   3.235 (20.3)    0.2783
S16Y/F64Y/F111D     3.821 (11.98)    1674 (5973)     0.5009
Table 3.3. Vmax and KMapp for WT MtKSI and variants. Standard errors are in parentheses and consist of at least three
independent trials.
          WT V0 (μM min⁻¹)   F111D V0 (μM min⁻¹)
Trial 1   11.8               0.68
Trial 2   4.9                0.70
Trial 3   11.8               0.88
Trial 4   13.8               0.85
Table 3.4. Comparison between the WT MtKSI and F111D variant at 90 μM 5AND.
Figure 3.6. Single experiment of Michaelis-Menten plot for MtKSI F111D.
MtKSI Variant       kcat (s⁻¹)   KMapp (μM)    Catalytic Efficiency (10³ M⁻¹ s⁻¹)   Fold decrease to WT
WT                  106 (48)     454 (297)     234 (187)                            --
S16Y                17 (42)      1607 (4635)   10 (40)                              23 (88)
F64Y                96 (6.7)     231 (28)      417 (58)                             0.6 (0.45)
F111D               1.5 (--)     70 (--)       36 (--)                              6.4 (5.15)
S16Y/F64Y           31 (49)      2485 (4247)   13 (29)                              18.7 (46)
S16Y/F111D          6.8 (1.9)    273 (127)     25 (14)                              9.4 (9.1)
F64Y/F111D          0.6 (0.2)    3.2 (20)      200 (1253)                           1.2 (7.4)
S16Y/F64Y/F111D     6.4 (20)     1674 (5973)   4 (18)                               62 (296)
Table 3.5. Catalytic efficiency for WT MtKSI and variants. Where available, standard errors are in
parentheses and consist of at least three independent trials.
For comparison with Table 3.5, PpKSI’s catalytic efficiency is 100,000 x 10³ M⁻¹ s⁻¹. F64Y
retained the same kcat while having a lower KMapp, giving it a higher catalytic efficiency than the
WT. All other mutants lack the signal-to-noise ratio required for an accurate analysis of their
Michaelis-Menten parameters or catalytic efficiency.
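The kcat and catalytic efficiency values in Table 3.5 follow from the fitted Vmax and KMapp in Table 3.3 and the enzyme concentration. The sketch below assumes that the Vmax values of Table 3.3 are in μM min⁻¹ (the table gives no units) and uses the 10 nM final enzyme concentration from the Methods; the function names are illustrative.

```python
def kcat_per_s(vmax_um_per_min, enzyme_nm):
    """Turnover number kcat = Vmax / [E], converted to s^-1."""
    vmax_um_per_s = vmax_um_per_min / 60.0
    enzyme_um = enzyme_nm / 1000.0
    return vmax_um_per_s / enzyme_um


def efficiency_1e3(kcat, km_um):
    """Catalytic efficiency kcat/KM in units of 10^3 M^-1 s^-1."""
    km_molar = km_um * 1e-6
    return kcat / km_molar / 1e3


ENZYME_NM = 10  # final KSI concentration in the assay, from the Methods

# Vmax (assumed uM min^-1) and KMapp (uM) from Table 3.3.
wt_kcat = kcat_per_s(63.67, ENZYME_NM)
wt_eff = efficiency_1e3(wt_kcat, 453.6)
f64y_kcat = kcat_per_s(57.83, ENZYME_NM)
f64y_eff = efficiency_1e3(f64y_kcat, 231.1)

print(f"WT:   kcat = {wt_kcat:.0f} s^-1, kcat/KM = {wt_eff:.0f} x 10^3 M^-1 s^-1")
print(f"F64Y: kcat = {f64y_kcat:.0f} s^-1, kcat/KM = {f64y_eff:.0f} x 10^3 M^-1 s^-1")
```

Under these assumptions the WT and F64Y rows of Table 3.5 are recovered, which supports the unit assumption for Vmax.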
For any variant tested besides F64Y, the Michaelis-Menten parameters KM and kcat
could not be calculated reliably, as evidenced by standard errors higher than the measurements themselves for
most of these variants. Enzyme efficiencies may still be compared without separating the KM and kcat
variables. If the concentration of substrate is negligible compared to the KM, the additive term of
substrate concentration in the denominator of the Michaelis-Menten equation can be dropped.
Assuming [S] << KM: V0 = kcat[E][S] / (KM + [S]) ≈ (kcat/KM)[E][S]
MtKSI Variant       V0            Fold Decrease to WT
WT                  3.87 (1.4)    --
S16Y                0.27 (0.24)   14.3 (13.7)
F64Y                6.80 (3.3)    0.6 (0.35)
F111D               0.41 (--)     9.4 (--)
S16Y/F64Y           0.26 (0.02)   14.8 (5.7)
S16Y/F111D          0.47 (0.28)   8.2 (5.8)
F64Y/F111D          0.41 (0.23)   9.4 (6.4)
S16Y/F64Y/F111D     0.42 (--)     9.2 (--)
Table 3.6. WT MtKSI and variants compared solely based on initial velocities at 30 μM 5AND. Where
available, standard deviations are in parentheses and represent at least three independent trials.
These results only report ratios of catalytic efficiency without examining kcat or KMapp
individually. F64Y results are similar between this method and full Michaelis-Menten kinetic
analysis.
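As a rough check on the low-substrate approximation behind this comparison, the following minimal sketch compares the exact Michaelis-Menten velocity with the simplified form, using the WT MtKSI values from Table 3.5, the 0.010 μM enzyme concentration stated in Appendix A, and 30 μM 5AND. The function names are illustrative only, and because the fitted WT KM carries a large standard error, the numbers should be read as an illustration rather than a result.

```python
def mm_velocity(kcat, km, enzyme, substrate):
    """Exact Michaelis-Menten initial velocity (all concentrations in the same unit)."""
    return kcat * enzyme * substrate / (km + substrate)


def low_substrate_velocity(kcat, km, enzyme, substrate):
    """Approximation for [S] << KM: v0 ~ (kcat/KM)[E][S]."""
    return (kcat / km) * enzyme * substrate


# WT MtKSI values from Table 3.5; enzyme and substrate concentrations from the text.
kcat, km = 106.0, 454.0          # s^-1, uM
enzyme, substrate = 0.010, 30.0  # uM, uM

exact = mm_velocity(kcat, km, enzyme, substrate)
approx = low_substrate_velocity(kcat, km, enzyme, substrate)

# The approximation overestimates v0 by a factor of exactly 1 + [S]/KM.
overestimate = approx / exact
print(f"exact = {exact:.4f} uM/s, approx = {approx:.4f} uM/s, ratio = {overestimate:.3f}")
```

The overestimate is the same factor, 1 + [S]/KM, for any enzyme, so ratios of initial velocities track ratios of kcat/KM most closely when every variant being compared has a KM well above the substrate concentration.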
3.5. Conclusions
Every MtKSI variant tested, whether a single mutation or a combination, had little to no KSI
activity on 5AND, with the exception of the F64Y variant, and proper Michaelis-Menten curves could
not be constructed for these low-activity variants. Why did a more “PpKSI-like” active site not
raise the catalytic efficiency toward that of PpKSI and CtKSI?
Figure 3.7. “Top-down” view of PpKSI (PDB ID: 1OHO; Red), CtKSI (PDB ID: 8CHO; Orange), and
MtKSI (PDB ID: 2Z76; Yellow).
The steroid-binding pocket and active site are at the front of Figure 3.7. In each group below,
residues are listed for PpKSI, CtKSI, and MtKSI, respectively. The left group of residues is Y57,
Y55, and F64. The top group is Y16, Y14, and S16. The right group is D103, D99, and F111. MtKSI-F64,
while spatially aligned with PpKSI-Y57 in many structural alignments, is actually swiveled almost
180° away from where the phenol group points in either PpKSI or CtKSI (Figure 3.7). There are few
candidate replacements for PpKSI-Y57 in MtKSI, and MtKSI-Y113 is too far away to take over its
job[68]. This seems to be a limitation of structural alignments more than of SALSA, but it calls
attention to the importance of human verification. In this respect, it makes sense for the F64Y
variant to have unmodified catalytic activity.
Figure 3.8. Three residues of interest in PpKSI (PDB ID: 1OHO; Red), CtKSI (PDB ID: 8CHO; Orange),
and MtKSI (PDB ID: 2Z76; Yellow) without surrounding secondary structure.
F111 and S16 from MtKSI overlap well with their SALSA partners in PpKSI and CtKSI
(Figure 3.8). However, mutations to make the side chains similar resulted in loss of activity. The
natural substrate for MtKSI is unknown. Because the position occupied by Y16 in PpKSI (Y14 in
CtKSI) is used in recognition of the steroid ligand[68], MtKSI could very well act on a different
steroid. If the natural substrate for MtKSI is a different steroid, this would explain both the
reduced catalytic efficiency on 5AND and the sensitivity to changes in the binding recognition pocket.
The identification of F111 in MtKSI as spatially equivalent to PpKSI-D103 does not
seem to be an alignment error; no residue in the MtKSI structure seems capable of
replicating the essential catalytic role of D99/D103. Indeed, the authors of the crystal
structure report used this to argue against Rv0760c having KSI activity and reported no activity
on 5AND[68].
The peculiarity of the POOL and SALSA predictions stands out after these results. What
does it mean for an enzyme not only to have a strikingly different residue at a catalytic position,
but also for that residue not to be predicted as active by POOL? Clearly it does not rule out a
particular functional activity, such as ketosteroid isomerization of 5AND, but it may correspond to
different substrate recognition, or even a different mechanism.
How many differences are required to declare two enzymes to have different functions,
and how many similarities must there be before they are declared similar? This is a current area
of investigation[71].
Chapter 4. Future Work
4.1. POOL-rank cut-offs
THEMATICS is a Boolean predictor giving either a yes or no for each residue in a
protein structure. In contrast, POOL assigns a ranking to every residue in a given protein
structure, and it is up to the user to determine what cut-off to implement for best results.
Traditionally, the top 5, 8, or even 10% of POOL predictions are considered to be positive
predictions[26, 27, 30, 32, 33, 71]. There remain two open questions:
1) Should POOL prediction be based on percentage or POOL score?
2) Can we use POOL to predict single-layer vs. multi-layer enzyme active sites?
Recent work has shown convincingly that a POOL normalized score cut-off is superior to
flat percentage cut-offs. By itself, a percentage cut-off makes the odd assumption that the number
of residues participating in an active site is directly proportional to the total number of
amino acids. Rather, assigning an absolute cut-off of normalized POOL score (such as 0.01)
seems more rational, and is currently being investigated as the next generation cut-off for POOL.
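The difference between the two kinds of cut-off can be illustrated with a minimal sketch. The scores below are made-up values, not POOL output, and the function names are hypothetical; the only assumption is that each residue carries a normalized score between 0 and 1. Two proteins of different length share the same four high-scoring residues, so a percentage cut-off selects more residues in the longer protein while an absolute score cut-off selects the same four in both.

```python
def top_percent_cutoff(scores, percent):
    """Return indices of the top `percent` of residues by score (at least one)."""
    n_keep = max(1, round(len(scores) * percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def score_cutoff(scores, threshold):
    """Return indices of residues with score >= threshold, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] >= threshold]


# Hypothetical normalized scores: the same four strong residues in both proteins,
# followed by a flat background of very low scores.
strong = [1.0, 0.6, 0.2, 0.05]
small = strong + [0.001] * 96    # 100-residue protein
large = strong + [0.001] * 396   # 400-residue protein

for name, scores in (("small", small), ("large", large)):
    by_percent = top_percent_cutoff(scores, 8)
    by_score = score_cutoff(scores, 0.01)
    print(f"{name}: top 8% selects {len(by_percent)}, score >= 0.01 selects {len(by_score)}")
```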
The second question remains much more difficult and is central to our work on active-site
catalysis, engineering, and understanding. Some enzymes have been shown to have multi-layer
active sites[31-33] and others single-layer active sites[32][this work]. How to
differentiate the two easily, ideally without examining each predicted residue individually, is
still an ongoing discussion. It has been proposed that the shape of the POOL plot itself may provide
predictive power regarding the extent of an enzyme’s active site. This hypothesis comes from an
empirical observation, across the few proteins studied so far, that POOL plots seem to drop
much more sharply for enzymes with single-layer active sites than for those with multi-layer
active sites (Figure 4.1).
The single-layer active-site proteins alkaline phosphatase (AP) and ketosteroid isomerase (KSI)
show sharp decreases immediately, flat-lining by the 10th residue for AP and by the 5th residue
for KSI. The multi-layer active-site proteins phosphoglucose isomerase (PGI), cobalt-type nitrile
hydratase (NH), the α subunit of pol III (DnaE), and pol IV (DinB) have extended tails on their
POOL plots and start flat-lining farther out than AP and KSI.
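One way such a comparison of plot shapes might be made quantitative is to record the rank at which the normalized score first falls below a small floor. The sketch below is only an illustration of that idea: the function name, the floor of 0.01, and both score profiles are assumptions invented for the example, not POOL output and not a method used in this work.

```python
def plateau_rank(scores_by_rank, floor=0.01):
    """1-based rank of the first residue whose score drops below `floor`.

    Returns len(scores_by_rank) + 1 if the score never drops below the floor.
    """
    for rank, score in enumerate(scores_by_rank, start=1):
        if score < floor:
            return rank
    return len(scores_by_rank) + 1


# Made-up normalized scores, ordered by rank, for illustration only.
sharp_drop = [1.0, 0.40, 0.08, 0.02, 0.005, 0.004, 0.003, 0.003]
long_tail = [1.0, 0.70, 0.50, 0.35, 0.22, 0.12, 0.05, 0.02, 0.008, 0.006]

print("sharp profile flattens at rank", plateau_rank(sharp_drop))
print("long-tailed profile flattens at rank", plateau_rank(long_tail))
```

A single threshold of this kind is sensitive to the choice of floor, so any real classifier of single-layer versus multi-layer sites would need to be validated against enzymes whose active-site extent is known experimentally.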
Figure 4.1. POOL plots for AP, KSI, PGI, NH, DnaE and DinB.
[Six panels: AP, KSI, PGI, NH, DnaE, and DinB POOL plots. x-axis: POOL rank (0 to 20); y-axis: normalized POOL score (0 to 1).]
Appendix A. Propagation of error in calculating catalytic efficiency
Both the Vmax value and the KM value are calculated with their respective standard errors
based on the inputs to the GraphPad Prism program. Vmax values are converted to kcat values by
the following transformation, where kcat is in s⁻¹, Vmax is in μM min⁻¹, and [enzyme] is in μM:
\[ k_{cat} = \frac{V_{max}}{60 \times [\mathrm{enzyme}]} \]
AP Variant    [enzyme] (μM)
WT            0.002
H372D         0.011
H372L         0.011
H86L          0.0083
S105A         0.0018
E57Q          0.006
M53A          0.006
M53T          0.002
D330N         0.002
Q453E         0.008
E150Q         0.0023
Table A.1. Concentrations of enzymes used to gather kinetic data for alkaline phosphatase.
All MtKSI kinetic experiments were done with 0.010 μM enzyme. Enzymes were diluted
from stock concentrations measured by Bradford assays using a BSA standard. Catalytic
efficiency is defined as the kcat divided by KM. To propagate the error in each measurement, I
used (where σx is the standard error of variable x):
\[ \sigma_Z = Z \sqrt{\left(\frac{\sigma_X}{X}\right)^2 + \left(\frac{\sigma_Y}{Y}\right)^2} \]
where, in our case, Z = X/Y is the catalytic efficiency, X is kcat, and Y is KM.
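The calculation described in this appendix can be sketched as follows. This is a minimal illustration, assuming the usual quotient rule for uncorrelated errors and a factor of 60 to convert min⁻¹ to s⁻¹; the function names are hypothetical, and the Vmax value in the usage lines is an invented number chosen to give the WT kcat, not a measured one. The kcat and KM inputs are the WT MtKSI values from Table 3.5.

```python
import math


def vmax_to_kcat(vmax_um_per_min, enzyme_um):
    """Convert Vmax (uM/min) to kcat (1/s) for a given enzyme concentration (uM)."""
    return vmax_um_per_min / (60.0 * enzyme_um)


def efficiency_with_error(kcat, kcat_se, km, km_se):
    """Return (kcat/KM, propagated standard error) using the quotient rule."""
    z = kcat / km
    rel = math.sqrt((kcat_se / kcat) ** 2 + (km_se / km) ** 2)
    return z, z * rel


# Hypothetical Vmax (uM/min) at 0.010 uM enzyme, chosen to give kcat = 106 1/s.
print(f"kcat = {vmax_to_kcat(63.6, 0.010):.0f} s^-1")

# WT MtKSI values from Table 3.5: kcat in 1/s, KM in uM.
eff, eff_se = efficiency_with_error(106.0, 48.0, 454.0, 297.0)

# kcat/KM is in 1/(uM s); multiplying by 1000 expresses it in units of 10^3 M^-1 s^-1.
print(f"{eff * 1000:.0f} ({eff_se * 1000:.0f}) x 10^3 M^-1 s^-1")
```

With these inputs the sketch gives a value within rounding of the 234 (187) reported for the WT in Table 3.5, which suggests the tabulated errors are consistent with this form of the propagation formula.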
References
1. Axe, D.D., Estimating the prevalence of protein sequences adopting functional enzyme
folds. J Mol Biol, 2004. 341(5): p. 1295-315.
2. Pauling, L., Molecular architecture and biological reactions. Chem. Eng. News, 1946.
24(10): p. 1375-1377.
3. Garcia-Viloca, M., et al., How enzymes work: analysis by modern rate theory and
computer simulations. Science, 2004. 303(5655): p. 186-95.
4. Fischer, E., Einfluss der Configuration auf die Wirkung der Enzyme. Berichte der
deutschen chemischen Gesellschaft, 1894. 27(3): p. 2985-2993.
5. Koshland, D.E., Application of a Theory of Enzyme Specificity to Protein Synthesis.
Proceedings of the National Academy of Sciences, 1958. 44(2): p. 98-104.
6. Damborsky, J. and J. Brezovsky, Computational tools for designing and engineering
biocatalysts. Curr Opin Chem Biol, 2009. 13(1): p. 26-34.
7. Gora, A., J. Brezovsky, and J. Damborsky, Gates of Enzymes. Chemical Reviews, 2013.
8. Crick, F., Central Dogma of Molecular Biology. Nature, 1970. 227(5258): p. 561-563.
9. Brannigan, J.A. and A.J. Wilkinson, Protein engineering 20 years on. Nat Rev Mol Cell
Biol, 2002. 3(12): p. 964-70.
10. Siegel, J.B., et al., Computational design of an enzyme catalyst for a stereoselective
bimolecular Diels-Alder reaction. Science, 2010. 329(5989): p. 309-13.
11. Rothlisberger, D., et al., Kemp elimination catalysts by computational enzyme design.
Nature, 2008. 453(7192): p. 190-5.
12. Walter, K.U., K. Vamvaca, and D. Hilvert, An active enzyme constructed from a 9-amino
acid alphabet. J Biol Chem, 2005. 280(45): p. 37742-6.
13. Kuhlman, B., et al., Design of a novel globular protein fold with atomic-level accuracy.
Science, 2003. 302(5649): p. 1364-8.
14. Hilvert, D., Design of protein catalysts. Annu Rev Biochem, 2013. 82: p. 447-70.
15. Turner, N.J., Directed evolution drives the next generation of biocatalysts. Nat Chem
Biol, 2009. 5(8): p. 567-73.
16. Jackel, C. and D. Hilvert, Biocatalysts by evolution. Curr Opin Biotechnol, 2010. 21(6):
p. 753-9.
17. Ondrechen, M.J., J.G. Clifton, and D. Ringe, THEMATICS: a simple computational
predictor of enzyme function from structure. Proc Natl Acad Sci U S A, 2001. 98(22): p.
473-8.
18. Shehadi, I.A., H. Yang, and M.J. Ondrechen, Future directions in protein function
prediction. Mol Biol Rep, 2002. 29(4): p. 329-35.
19. Shehadi, I.A., et al., Active site prediction for comparative model structures with
thematics. J Bioinform Comput Biol, 2005. 3(1): p. 127-43.
20. Shehadi, I.A., et al., THEMATICS is effective for active site prediction in comparative
model structures, in Proceedings of the second conference on Asia-Pacific bioinformatics
- Volume 29. 2004, Australian Computer Society, Inc.: Dunedin, New Zealand. p. 209-215.
21. Ringe, D., et al., Protein structure to function: insights from computation. Cell Mol Life
Sci, 2004. 61(4): p. 387-92.
22. Ko, J., et al., Prediction of active sites for protein structures from computed chemical
properties. Bioinformatics, 2005. 21 Suppl 1: p. i258-65.
23. Wei, Y., et al., Selective prediction of interaction sites in protein structures with
THEMATICS. BMC Bioinformatics, 2007. 8(1): p. 119.
24. Ko, J., et al., Statistical criteria for the identification of protein active sites using
theoretical microscopic titration curves. Proteins: Structure, Function, and
Bioinformatics, 2005. 59(2): p. 183-195.
25. Tong, W., et al., Enhanced performance in prediction of protein active sites with
THEMATICS and support vector machines. Protein Sci, 2008. 17(2): p. 333-41.
26. Tong, W., et al., Partial Order Optimum Likelihood (POOL): Maximum Likelihood
Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties.
PLoS Comput Biol, 2009. 5(1): p. e1000266.
27. Somarowthu, S., et al., High-performance prediction of functional residues in proteins
with machine learning and computed input features. Biopolymers, 2011. 95(6): p. 390-
400.
28. Capra, J.A., et al., Predicting Protein Ligand Binding Sites by Combining Evolutionary
Sequence Conservation and 3D Structure. PLoS Comput Biol, 2009. 5(12): p. e1000585.
29. Sankararaman, S. and K. Sjölander, INTREPID—INformation-theoretic TREe traversal
for Protein functional site IDentification. Bioinformatics, 2008. 24(21): p. 2445-2452.
30. Somarowthu, S. and M.J. Ondrechen, POOL server: machine learning application for
functional site prediction in proteins. Bioinformatics, 2012. 28(15): p. 2078-2079.
31. Brodkin, H.R., et al., Evidence of the participation of remote residues in the catalytic
activity of Co-type nitrile hydratase from Pseudomonas putida. Biochemistry, 2011.
50(22): p. 4923-35.
32. Somarowthu, S., et al., A tale of two isomerases: compact versus extended active sites in
ketosteroid isomerase and phosphoglucose isomerase. Biochemistry, 2011. 50(43): p.
9283-95.
33. Walsh, J.M., et al., Effects of non-catalytic, distal amino acid residues on activity of E.
coli DinB (DNA polymerase IV). Environ Mol Mutagen, 2012. 53(9): p. 766-76.
34. Muller, B.H., et al., Improving Escherichia coli alkaline phosphatase efficacy by
additional mutations inside and outside the catalytic pocket. Chembiochem, 2001. 2(7-8):
p. 517-23.
35. Coleman, J.E., Structure and mechanism of alkaline phosphatase. Annu Rev Biophys
Biomol Struct, 1992. 21: p. 441-83.
36. Lassila, J.K., J.G. Zalatan, and D. Herschlag, Biological Phosphoryl-Transfer Reactions:
Understanding Mechanism and Catalysis. Annu Rev Biochem, 2011. 80(1): p. 669-702.
37. Andrews, L.D., T.D. Fenn, and D. Herschlag, Ground State Destabilization by Anionic
Nucleophiles Contributes to the Activity of Phosphoryl Transfer Enzymes. PLoS Biol,
2013. 11(7): p. e1001599.
38. Andrews, L.D., H. Deng, and D. Herschlag, Isotope-edited FTIR of alkaline phosphatase
resolves paradoxical ligand binding properties and suggests a role for ground-state
destabilization. J Am Chem Soc, 2011. 133(30): p. 11621-31.
39. Kim, E.E. and H.W. Wyckoff, Reaction mechanism of alkaline phosphatase based on
crystal structures. Two-metal ion catalysis. J Mol Biol, 1991. 218(2): p. 449-64.
40. Stec, B., K.M. Holtz, and E.R. Kantrowitz, A revised mechanism for the alkaline
phosphatase reaction involving three metal ions. J Mol Biol, 2000. 299(5): p. 1303-11.
41. Murphy, J.E., X. Xu, and E.R. Kantrowitz, Conversion of a magnesium binding site into
a zinc binding site by a single amino acid substitution in Escherichia coli alkaline