Investigating Alkaline Phosphatase and Ketosteroid Isomerase

by Rational Design

A thesis presented

by

Nicholas A. DeLateur

to

The Department of Chemistry and Chemical Biology

in partial fulfillment of the requirements for the degree of

Master of Science in the field of Chemistry

Northeastern University

Boston, Massachusetts

August 8, 2013

© Copyright 2013


All rights reserved

Investigating Alkaline Phosphatase and Ketosteroid Isomerase

by Rational Design

by

Nicholas A. DeLateur

ABSTRACT OF THESIS

Submitted in partial fulfillment of the requirements for the degree

of Master of Science in Chemistry and Chemical Biology

in the College of Science of Northeastern University,

August 8, 2013

Abstract

Enzymes catalyze chemical reactions many orders of magnitude faster than the

uncatalyzed reaction and are capable of doing so at physiological pH and temperature. As

enzymes consist of hundreds of amino acids, the ability to identify which residues contribute to

catalysis with high recall and low false positive rates is of critical importance to characterizing

and engineering enzymes. Theoretical Microscopic Anomalous Titration Curve Shapes

(THEMATICS) and Partial Order Optimum Likelihood (POOL) are programs developed at

Northeastern University that can identify the residues contributing to catalysis. THEMATICS

finds anomalous titration behavior, which correlates with catalytic activity. POOL combines the

THEMATICS input with geometric and evolutionary predictions to rank each residue by the

likelihood of its importance for catalysis.

Alkaline phosphatase (AP) is a protein found in all domains of life which cleaves

phosphate groups from a broad range of substrates. Ketosteroid isomerase (KSI) performs an important

biological function in the metabolism of many bacteria by degrading steroids. THEMATICS and

POOL predict that alkaline phosphatase and ketosteroid isomerase contain most of their catalytic

power in the residues directly surrounding the reacting substrate molecule; there is very little

contribution from distal or remote residues of the protein. This behavior is in

stark contrast to phosphoglucose isomerase (PGI) and nitrile hydratase (NH), where

THEMATICS and POOL predict a multi-layer active site, with residues in the second and third

shells contributing to activity. The predictions for KSI, PGI, and NH have been experimentally

validated.

Pseudomonas putida KSI (PpKSI) is strikingly efficient and selective. Three putative

KSIs identified from Structural Genomics were analyzed by THEMATICS and POOL and then

characterized in vitro to determine the presence of, or lack of, KSI activity. A putative KSI from

Mycobacterium tuberculosis (MtKSI) was predicted to have isomerase activity and biochemical

experiments reveal that the putative M. tuberculosis KSI does indeed possess KSI activity,

although with reduced efficiency compared to PpKSI.

To investigate this lower efficiency in the correctly annotated KSI, we engineered the

MtKSI active site to resemble more closely that of PpKSI under the hypothesis that these

mutations would increase the activity of MtKSI. However, we found that most of these mutations

alone or in tandem significantly lowered rather than increased activity. Variants S16Y, F111D,

S16Y/F64Y, S16Y/F111D, F64Y/F111D, and S16Y/F64Y/F111D lost catalytic power and were

essentially inactive. Variant F64Y retained catalytic power similar to the wild-type enzyme.

Although the active sites of MtKSI and PpKSI are similar, our attempts to increase the catalytic

efficiency by creating a more PpKSI-like active site of MtKSI were not successful.

Protein engineering relies on the ability to accurately predict sites of function. The best

predictor for active-site residues is POOL using THEMATICS, INTREPID, and ConCavity

inputs. We have shown that not only can POOL correctly predict the residues required for

catalysis, but these predictions can also be used to assign function to proteins whose function is

unknown or putatively assigned. Even if the residues required for catalysis are known, the ability

to engineer improved or novel function is still difficult and may require multiple approaches.

Acknowledgments

I am blessed with not one, but two advisors of extraordinary talent and patience. I am

forever grateful to Professor Penny Beuning for allowing me to begin work in her lab as a young

freshman with no experience in chemistry or biology. She has been an unending source of

mentoring and teaching. Professor Mary Jo Ondrechen has trusted me with project after project,

encouraging me to investigate and grow as a scientist, for which I will be always grateful.

Dr. Srinivas Somarowthu performed the herculean task of teaching me both the

computational and experimental aspects of THEMATICS/POOL, alkaline phosphatase, and

ketosteroid isomerase. I owe most of my practical knowledge in these areas to Sri, and am

thankful for the pleasure of meeting and working with him over these past years.

I want to thank the numerous past and present DNA and ORG lab members, with

emphasis towards Judith Hollander and Ramya Parasarum for graciously sharing bench space

and wisdom. Mark Naniong and Colleen Shea experimented on MtKSI as undergraduate

researchers and their impressive work contributed to the data contained in this thesis.

Neither this work—nor even my graduation—would be possible without Richard

Pumphrey, Cara Shockley, Andrew Bean, Jordan Keefe, and Katie Cameron assisting me

through the NU shuffle and my own shortcomings. Jeff Peterson, Professor Graham Jones,

Professor Carla Mattos, and Professor O’Doherty have provided me with immensely valuable

discussion and direction. I believe John Bottomy has forgiven me more than anyone on Earth; I

cherish his friendship and kindness.

I owe my inspiration and aptitude to my ever-supportive family, especially my parents

Sandra and Joe. They have been a never-ending source of love. Thank you so much Mom, Dad,

and Matt, along with Cole and Tiffany.

Funding that allowed these projects and my research to happen was provided by the

Office of the Provost at Northeastern University, the Matz Co-op Scholarship, and grants NSF:

MCB-0843603, CAREER MCB-0845033, and REU MCB-0843603.

Table of Contents

Abstract

Acknowledgments

Table of Contents

List of Figures

List of Tables

List of Abbreviations

Chapter 1. Protein Engineering

1.1. Proteins as catalysts

1.2. Design vs. Redesign; Directed Evolution vs. Rational Design

1.3. Functional Site Prediction with THEMATICS and POOL

1.4. Catalysis by remote residues

Chapter 2. Alkaline Phosphatase

2.1. Introduction

2.2. Computational Predictions

2.3. Materials and Methods

2.4. Results

2.5. Conclusions

Chapter 3. Ketosteroid Isomerase

3.1. Introduction

3.2. Computational Predictions

3.3. Materials and Methods

3.4. Results

3.5. Conclusions

Chapter 4. Future Work

4.1. POOL-rank cut-offs

Appendix A. Propagation of error in calculating catalytic efficiency

References

List of Figures

Figure 1.1. Alanine, aspartate, glutamate, and asparagine at pH 7.

Figure 1.2. Phenylalanine, tyrosine, and serine at pH 7.

Figure 1.3. A titration curve of mean net charge as a function of pH for select lysine residues in E. coli β-lactamase.

Figure 1.4. Diagram of a multi-layered active site.

Figure 2.1. The active site of alkaline phosphatase based on PDB ID: 1ALK.

Figure 2.2. Diagram of Evolutionary Trace and THEMATICS predictions for AP.

Figure 2.3. A POOL plot of POOL score vs. POOL rank for alkaline phosphatase.

Figure 2.4. The 2nd and 3rd shell residues predicted by THEMATICS.

Figure 2.5. Primers for site-directed mutagenesis of E. coli alkaline phosphatase.

Figure 2.6. Standard curve for 4-nitrophenol phosphate.

Figure 2.7. Michaelis-Menten plots for AP in 1 M Tris-HCl pH 8.0 buffer.

Figure 2.8. Catalytic efficiencies of wild-type and variant alkaline phosphatases.

Figure 2.9. AP residues investigated in this work.

Figure 2.10. A plot of Table 3 and Table 4 showing effects on catalytic efficiency based on POOL rank for AP.

Figure 3.1. Mechanism of KSI based on PpKSI numbering.

Figure 3.2. Primers for site-directed mutagenesis of MtKSI in plasmid pGST-Rv0760c.

Figure 3.3. Standard curve for 4-androstene-3,17-dione (4AND).

Figure 3.4. Michaelis-Menten plots for MtKSI WT and variants.

Figure 3.5. WT and F64Y individual Michaelis-Menten plots.

Figure 3.6. Single run of Michaelis-Menten plot for MtKSI F111D.

Figure 3.7. “Top-down” view of PpKSI.

Figure 3.8. Three residues of interest in PpKSI without surrounding secondary structure.

Figure 4.1. POOL plots for AP, KSI, PGI, NH, DnaE and DinB.

List of Tables

Table 2.1. POOL predictions for alkaline phosphatase.

Table 2.2. Kinetic assays for alkaline phosphatase.

Table 2.3. WT and variant AP kinetic parameters.

Table 2.4. Summary calculations for WT alkaline phosphatase and variants.

Table 2.5. 1st shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.

Table 2.6. 2nd and 3rd shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.

Table 3.1. SALSA alignment of POOL predicted residues for known KSI proteins and proteins annotated as putative KSIs.

Table 3.2. Kinetic assays for MtKSI.

Table 3.3. Vmax and KMapp for MtKSI WT and variants.

Table 3.4. Comparison between the WT MtKSI and F111D variant at 90 μM 5AND.

Table 3.5. Catalytic efficiency for MtKSI WT and variants.

Table 3.6. MtKSI WT and variants based on initial velocities at 30 μM 5AND.

Table A.1. Concentrations of enzymes used to gather kinetic data for alkaline phosphatase.

List of Abbreviations

% Percent

°C Degrees Celsius

4AND 4-androstene-3,17-dione

5AND 5-androstene-3,17-dione

Å Ångströms

AP Alkaline phosphatase

BSA Bovine Serum Albumin

cm Centimeter

Da Dalton

DinB DNA Polymerase IV

DNA Deoxyribonucleic acid

DTT Dithiothreitol

E. coli Escherichia coli

ET Evolutionary Trace

FPLC Fast protein liquid chromatography

GST Glutathione S-transferase

h Hours

HEPES 4-(2-Hydroxyethyl)-1-Piperazineethanesulfonic Acid

kcat First order rate constant

kDa Kilodalton

KM Michaelis constant

KSI Ketosteroid isomerase

L Liter

M Molar

min Minutes

mL Milliliters

Ml Mesorhizobium loti

mM Millimolar

mmol Millimoles

Mt Mycobacterium tuberculosis

NH Nitrile hydratase

nM Nanomolar

nm Nanometers

NTF2 Nuclear Transport Factor 2

OD Optical density

Pa Pectobacterium atrosepticum

PDB Protein Data Bank

PGI Phosphoglucose isomerase

PhoA Alkaline phosphatase

PNP para-nitrophenol

PNPP para-nitrophenyl phosphate

POOL Partial Order Optimum Likelihood

PSI Protein Structure Initiative

R² Regression coefficient

rcf Relative centrifugal force

RNA Ribonucleic acid

SALSA Structurally Aligned Local Sites of Activity

SDS-PAGE Sodium dodecyl sulfate poly-acrylamide gel electrophoresis

SG Structural Genomics

SVM Support vector machine

TEV Tobacco etch virus

THEMATICS Theoretical Microscopic Anomalous Titration Curve Shapes

TM Melting temperature

Tris-HCl 2-amino-2-hydroxymethyl-propane-1,3-diol hydrochloride

μ3 3rd central moment

μ4 4th central moment

v/v Volume by volume

V0 Initial velocity

Vmax Maximum velocity

WT Wild-type

YT Yeast extract and Bacto Tryptone

μL Microliter

μM Micromolar

σ Error

A Ala Alanine

C Cys Cysteine

D Asp Aspartic Acid

E Glu Glutamic Acid

F Phe Phenylalanine

G Gly Glycine

H His Histidine

I Ile Isoleucine

K Lys Lysine

L Leu Leucine

M Met Methionine

N Asn Asparagine

P Pro Proline

Q Gln Glutamine

R Arg Arginine

S Ser Serine

T Thr Threonine

V Val Valine

W Trp Tryptophan

Y Tyr Tyrosine

Chapter 1. Protein Engineering

1.1. Proteins as catalysts

All known life forms create polymers of various combinations of 20 different amino

acids. These polymers are known as proteins and frequently act as catalysts, in which case they

are then referred to as enzymes. The linear chain of amino acids (primary structure) folds to form

local order such as α-helices and β-strands (secondary structure). These helices, strands, loops,

and other local structures fold into a single overall arrangement (tertiary structure); multiple

chains can associate with each other (quaternary structure). Enzymes catalyze reactions under

physiological conditions, such as neutral pH and room temperature, with extreme specificity and

high efficiency. With few exceptions, enzymes are responsible for catalyzing every important

chemical reaction in biology, giving rise to statements such as Orgel’s First Law:

Whenever a spontaneous process is too slow or too inefficient

a protein will evolve to speed it up or make it more efficient.

Most proteins are on the order of 100 to 1000 amino acids. With 20 canonical amino

acids with which to build, the number of possible protein sequences quickly becomes

unfathomable. For a protein on the smaller end, the number of possible sequences is 20^100. This

number, however, includes sequences that are nothing more than 100 prolines in a row, a

sequence that would generally be considered non-functional. Estimates put the fraction of

“functional” folds at 1 in 10^77[1].
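
The scale of these numbers is easier to appreciate when computed directly. The following is a minimal sketch, not part of the original analysis: it assumes a chain of exactly 100 residues and takes the quoted functional fraction at face value, which is itself an assumption because that estimate was not derived for this chain length.

```python
import math

AMINO_ACIDS = 20               # canonical amino acids
LENGTH = 100                   # a protein "on the smaller end" (assumed length)
FUNCTIONAL_FRACTION_EXP = -77  # quoted estimate: 1 functional fold in 10^77

# Python integers are arbitrary precision, so this is exact.
total = AMINO_ACIDS ** LENGTH
log10_total = LENGTH * math.log10(AMINO_ACIDS)

# Illustrative only: applies the quoted fraction to the whole sequence space.
log10_functional = log10_total + FUNCTIONAL_FRACTION_EXP

print(f"20^100 has {len(str(total))} digits (about 10^{log10_total:.1f})")
print(f"implied functional sequences: about 10^{log10_functional:.1f}")
```

Even under this rough reading, the functional sequences are a vanishing fraction of the total, which is why exhaustive experimental search is not an option.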

How enzymes are capable of achieving the remarkable feats of chemistry required of

them represents a central area of research in biochemistry. While a protein may be composed of

hundreds of amino acids, generally only a small handful of those amino acids are directly

involved in the performance of catalysis. These residues compose the “active site” of the

enzyme. In 1946 Linus Pauling postulated that the catalytic power of enzymes lies in their ability

to lower the energy of the transition state between substrate and product[2], a theory which

is essentially still accepted today[3]. To explain the ability of enzymes to perform reactions only on

their specific cognate substrates, the “lock and key”[4] theory was proposed, eventually giving

way to a more nuanced “induced fit”[5] theory taking into account the transition state geometry

and realistic expectation of a dynamic system. The lock-key and induced fit models generally

assume a globular fold with a solvent accessible active site. In many cases the active site is

buried within the protein or protein cavity[6]. Recently, Jiri Damborsky has pursued a “keyhole-

lock-key” model to address this complication[6, 7].

The Central Dogma[8] of biology, that DNA is transcribed into RNA which is then

translated into protein, provides a natural scheme in which to probe hypotheses about protein

sequence-structure-function relationships. By manipulation of an organism’s DNA, a variant

protein product is produced, which can then either be examined at an in vitro functional level

after isolation, or kept in the organism and the phenotype of the organism observed under

varying conditions to elucidate in vivo function. The sequence-structure relationship is a folding

problem, and while interesting in its own right, will not be addressed here in favor of the

structure-function relationship. One reason is that many sequences result in the same overall

structure. Another reason is that the active site is a structural feature and is the focus of protein

engineering endeavors.

Two things required for protein engineering as a field to emerge were:

-A method to change the protein sequence, and thus structure, with exquisite control

-A falsifiable hypothesis of how a change in protein structure will change protein function

This was finally accomplished in 1982, exemplified with a foundational study on tyrosyl-

transfer RNA synthetase[9], after the advent of site-directed mutagenesis which allowed specific,

controlled changes at the DNA level to be specified by the researcher. Protein engineering as a

field today produces marvelous work that ranges from designing enzymes to catalyze Diels-Alder[10]

and Kemp[11] reactions, to building a fully functional enzyme from a 9-amino acid

alphabet[12], to creating a completely new fold never seen in nature[13].

1.2. Design vs. Redesign; Directed Evolution vs. Rational Design

To be strict, protein engineering, or protein design, would refer to the process of creating

a functional protein de novo (also referred to as “artificial” enzymes). Most protein engineering

however utilizes already functional enzymes, such as those extracted from organisms of research

interest, and manipulates them in a way to make them more functional, different in function, or

to make them lose their functionality. These are examples of protein reengineering and can often

be seen designated as such in the literature (for example, see recent review by Hilvert[14]).

Presently, the protein engineering paradigm of creating mutations and examining

resulting changes in function is well established. Implementation, on the other hand, is constrained

by the enormous number of possible sequence permutations; it is impossible for an experimental

lab to investigate every residue in a protein, especially if multiple mutations at the same residue

are desired. There are two main approaches to deal with this dilemma: directed evolution and

rational design.

Directed evolution draws upon Darwinian evolution concepts to discover mutations of

interest by iterating rounds of mutagenesis and selection. At its simplest, a gene encoding a

protein undergoes non-specific mutagenesis to introduce a large array of mutations, and the

resulting library is expressed and a certain phenotype is selected. The survivors of the first round of

selection return to the mutagenesis step to repeat the process until a satisfying level of function is

attained. Since its inception, directed evolution has proven to be a powerful technique for protein

engineering[15, 16] for developing new or improved function.
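
The mutagenesis and selection cycle described above can be illustrated with a toy simulation. This is a sketch under loud assumptions: the "screen" is a hypothetical fitness function that counts matches to an arbitrary target string, and the library size, mutation rate, and number of rounds are invented for illustration. It is not a model of any real experiment.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 canonical amino acids

def mutate(seq, rate, rng):
    """Non-specific mutagenesis: each position may be replaced by a random residue."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)

def fitness(seq, target):
    """Hypothetical screen: number of positions that match an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))

def directed_evolution(parent, target, rounds=30, library_size=200,
                       keep=10, rate=0.05, seed=0):
    rng = random.Random(seed)
    survivors = [parent]
    for _ in range(rounds):
        # Mutagenesis step: build a library from the current survivors.
        library = [mutate(rng.choice(survivors), rate, rng)
                   for _ in range(library_size)]
        # Selection step: keep the best variants; previous survivors stay in the pool.
        pool = survivors + library
        survivors = sorted(pool, key=lambda s: fitness(s, target), reverse=True)[:keep]
    return survivors[0]

target = "MKTAYIAKQR"   # arbitrary illustrative target
parent = "AAAAAAAAAA"   # arbitrary starting sequence
best = directed_evolution(parent, target)
print(fitness(parent, target), "->", fitness(best, target))
```

Because the previous survivors remain in the selection pool, the best score can never decrease from one round to the next, which mirrors the iterative improvement the method relies on.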

Rational design represents the oldest method of protein engineering. Using hypotheses

about the roles of particular residues, specific mutations to specific amino acids are chosen,

created, and then the resulting change (or lack thereof) is examined. The residues of interest can be

chosen based on crystal structures, previous experiments, sequence comparison, structural

comparison, etc. To determine which residues in a protein contribute to catalysis, for example,

one would first identify the residues suspected of contributing and then create one or

more mutations that probe this hypothesis.

Figure 1.1. Alanine, aspartate, glutamate, and asparagine at pH 7.

Alanine contains a mere methyl group as its side-chain, whereas aspartic acid is a short

acid. Glutamic acid is another acid with a side chain longer than aspartic acid by a single

methylene, and asparagine contains an amide group instead of the carboxyl group. Often a

residue is changed to alanine due to alanine’s simple nature, consisting of a single methyl group

as its side chain. This approximates a loss of both functional group and bulk. Charge and

size are two of the most important characteristics to investigate for an amino acid’s contribution.

Figure 1.1 shows how an aspartic acid can be changed to asparagine to alter

charge, or to glutamic acid to alter size.

Many times a residue will contain multiple functionalities. For example, tyrosine contains

both a hydroxyl functional group and an aromatic functional group. To investigate the

contributions of these moieties as separately as possible, a series of mutations such as visualized

in Figure 1.2 could be made. A mutation from tyrosine to serine, while a drastic change in size,

would remove the aromatic functionality. A mutation from tyrosine to phenylalanine would

remove just the hydroxyl group, leaving the 6-membered aromatic ring intact.

Figure 1.2. Phenylalanine, tyrosine, and serine at pH 7.

Tyrosine provides both an aromatic ring and a hydroxyl group to the active site of an

enzyme. Phenylalanine provides only the aromatic moiety whereas serine adds only a hydroxyl

group without aromaticity. These mutations allow us to test hypotheses pertaining to an

enzyme’s stability, mechanism, selectivity, or efficiency by rational change of the protein. This

method of investigation underpins protein engineering as a powerful tool to investigate the active

sites of proteins.
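
The substitution logic above, together with the usual point-mutation notation (for example S16Y/F64Y/F111D), can be captured in a small helper. This is an illustrative sketch only: the `Mutation` type, the `PROBES` table, and both function names are hypothetical and introduced here for the example, and the rationale strings simply paraphrase the text.

```python
import re
from typing import List, NamedTuple

class Mutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number
    variant: str    # one-letter code of the replacement

# Rationale for the substitutions discussed in the text (hypothetical lookup table).
PROBES = {
    ("D", "N"): "changes charge, keeps size",
    ("D", "E"): "keeps charge, lengthens side chain by one methylene",
    ("Y", "F"): "removes hydroxyl, keeps aromatic ring",
    ("Y", "S"): "removes aromatic ring, keeps hydroxyl",
}

def parse_variant(name: str) -> List[Mutation]:
    """Parse names such as 'S16Y' or 'S16Y/F64Y/F111D' into Mutation records."""
    mutations = []
    for part in name.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", part.strip())
        if match is None:
            raise ValueError(f"not a point mutation: {part!r}")
        mutations.append(Mutation(match.group(1), int(match.group(2)), match.group(3)))
    return mutations

def describe(mutation: Mutation) -> str:
    """Return the design rationale for a substitution, if one is recorded."""
    if mutation.variant == "A":
        return "removes side-chain functional group and bulk"
    return PROBES.get((mutation.wild_type, mutation.variant), "no rationale recorded")

for mut in parse_variant("S16Y/F64Y/F111D"):
    print(mut, "->", describe(mut))
```

Substitutions that are not in the table fall through to "no rationale recorded", so the table can be extended as new hypotheses are formulated.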

1.3. Functional Site Prediction with THEMATICS and POOL

The active site of the protein is commonly termed “where the chemistry happens”. For

our purposes we sometimes use a stricter definition of “residues within 5 Å of the site of

reaction”. These residues interact with the substrate directly, whether by hydrophobic

interactions, π-π interactions, (de)protonation, hydrogen bonds, Coulomb forces, dipole-dipole

interactions, or covalent bonding. It is of great interest to predict accurately and quickly the

active site of a given protein structure. To that end, the active site prediction method Theoretical

Microscopic Anomalous Titration Curve Shapes (THEMATICS) was published in 2001[17].

THEMATICS uses computational methods to calculate a theoretical titration curve for every

ionizable residue (K, R, D, E, H, Y, C) in a protein structure. A small minority of these titration

curves will show behavior that significantly differs from the ideal Henderson-Hasselbalch

behavior (Figure 1.3). While a single outlier may be a fluke, a “cluster”, defined as two or more

residues with deviant behavior within 6 Å of each other, is considered a positive hit for

identifying the active site.
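
The cluster criterion can be sketched as a small connected-components search. This is only an illustration under stated assumptions: each residue is reduced to one representative coordinate, residues are linked whenever a pair lies within the cutoff, and the names and coordinates below are hypothetical. The text does not specify which atoms define the distance or how chains of neighbors are treated.

```python
import math
from itertools import combinations

def find_clusters(coords, cutoff=6.0):
    """Group residues linked by pairwise distances <= cutoff (in Å).

    Only groups of two or more residues are returned; isolated outliers are dropped.
    """
    names = list(coords)
    parent = {name: name for name in names}

    def root(name):
        # Follow parent links to the representative of this group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            parent[root(a)] = root(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(root(name), set()).add(name)
    return [group for group in groups.values() if len(group) >= 2]

# Hypothetical residues with anomalous titration curves (coordinates in Å).
anomalous = {
    "R1": (0.0, 0.0, 0.0),
    "R2": (4.0, 0.0, 0.0),
    "R3": (9.0, 0.0, 0.0),
    "R4": (30.0, 0.0, 0.0),
}
print(find_clusters(anomalous))
```

In this example R1, R2, and R3 form one cluster through neighboring pairs, while R4 is an isolated outlier and is not reported.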

THEMATICS exploits a defining property of a catalyst to help find active sites: a catalyst

must return to its original state at the end of a chemical reaction. Many enzymes donate or

accept a proton, and this creates a fundamental problem. A residue acidic enough to donate a

proton would be too weak a base to take that proton back, and a residue basic enough to

abstract a proton from the substrate would be too weak an acid to return it[18].

If a residue could be both an acceptor and donor of a proton simultaneously, or near

simultaneously, the paradox would be resolved. The residue would have to be ionizable over a

wide range of pH values and not follow Henderson-Hasselbalch behavior: the type of behavior

THEMATICS calculates for known residues of catalytic importance.

Figure 1.3. A titration curve of mean net charge as a function of pH for select lysine residues in E. coli β-lactamase.

In Figure 1.3[19] the two filled symbols show the titration curves of two lysines (K146

and K215) that do not contribute to catalysis. The two unfilled symbols in Figure 1.3 show the

titration curves of active site lysines K73 and K234. Note the classic, sharp transition of charge

states as modeled by the Henderson-Hasselbalch equation for the non-catalytic lysines contrasted

to the perturbed, anomalous behavior of the curves for catalytic lysines.
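
For reference, the ideal Henderson-Hasselbalch curve for a basic group can be computed in a few lines. This is a minimal sketch of the ideal case only; the pKa of 10.5 is a typical textbook value for a free lysine side chain and is an assumption here, not a value taken from the calculations in Figure 1.3.

```python
def hh_mean_charge_base(pH, pKa):
    """Mean net charge of an ideal, non-interacting basic group.

    The protonated form carries a charge of +1, so the mean charge equals
    the protonated fraction given by the Henderson-Hasselbalch equation.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

PKA_LYS = 10.5  # assumed textbook pKa for a free lysine side chain

for pH in (8.5, 9.5, 10.5, 11.5, 12.5):
    print(f"pH {pH:4.1f}: mean charge {hh_mean_charge_base(pH, PKA_LYS):.3f}")
```

The mean charge falls from about 0.91 to about 0.09 within one pH unit on either side of the pKa. This is the sharp transition that the non-catalytic lysines follow and that the catalytic lysines depart from.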

THEMATICS offers additional advantages beyond predicting active sites. Because the

criteria for prediction are based purely on computed chemical properties from the three-

dimensional coordinates for the query protein and are not dependent on homology,

THEMATICS remains immune to false positives due to homology or database misannotation. A

structure of an enzyme could be the only structure in existence, such as a novel or artificial fold,

and THEMATICS will still perform just as powerfully. It has been shown that THEMATICS

works well using a homology model as an input rather than empirical structures[20], and finds

both catalysis and recognition sites of enzymes[21]. Quantitation of the deviation from

Henderson-Hasselbalch behavior was implemented by examining the 3rd and 4th central moments

of the curves, which correspond to asymmetry and kurtosis, respectively[22]. Residues scoring

more than one standard deviation higher than the average residue of their type were considered positive

hits (Z >1 for μ3 or μ4)[22]. The Z-score cut-off was later refined to Z >0.99 for μ3 or μ4 after it

was found to improve performance on the reference data set[23].
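
The selection rule can be sketched as a Z-score filter. This is an illustration under assumptions, not the THEMATICS implementation: the μ3 and μ4 values are invented, all example residues are treated as one residue type, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z-score of each residue relative to the other residues of the same type."""
    average = mean(values.values())
    spread = stdev(values.values())  # sample standard deviation (assumption)
    return {name: (value - average) / spread for name, value in values.items()}

def thematics_hits(mu3, mu4, cutoff=0.99):
    """Flag residues whose mu3 or mu4 Z-score exceeds the cutoff."""
    z3, z4 = z_scores(mu3), z_scores(mu4)
    return sorted(name for name in mu3 if z3[name] > cutoff or z4[name] > cutoff)

# Hypothetical moments for five lysines in one protein.
mu3 = {"K1": 0.10, "K2": 0.20, "K3": 0.15, "K4": 0.10, "K5": 2.00}
mu4 = {"K1": 1.00, "K2": 3.00, "K3": 0.90, "K4": 1.00, "K5": 1.10}

print(thematics_hits(mu3, mu4))
```

Here K5 is flagged by μ3 and K2 by μ4, showing that exceeding the cutoff on either moment is sufficient. In practice the function would be applied separately to each ionizable residue type.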

Originally, THEMATICS titration curves were inspected manually for non-Henderson-

Hasselbalch behavior, which raises both resource-commitment and scientist-bias issues.

Automation[24] alleviated both of these concerns and paved the way to add Support Vector

Machines (SVM) as a potential way of raising THEMATICS recall and precision even

higher[25].

Partial-Order Optimum Likelihood (POOL) combines THEMATICS with other

predictors to create the best functional site predictor to date[26]. Originally using CASTp for

geometric features and ConSurf for sequence-based features[26], POOL has since[27]

incorporated ConCavity[28] for geometric features and INTREPID[29] for sequence-based

phylogenetics features. POOL provides several advantages over THEMATICS: the ability to

predict non-ionizable residues, the inclusion of sequence and geometric information, and improved

performance. POOL allows non-ionizable residues to be predicted by assigning all residues an

environmental μ3 and μ4 based on the behavior of nearby residues. In addition to the 3rd and 4th

central moments, the buffer range[27] (BR) was added as a feature to quantitate the wide range

of buffering capability that is typically high for active site residues. POOL is publicly available

via the web at http://www.pool.neu.edu/wPOOL/ [30].

1.4. Catalysis by remote residues

Earlier we defined the active site as “residues within 5 Å of the site of reaction”. Even

during the seminal work on tyrosyl-transfer RNA synthetase the concept of residues remote from

the site of chemical transformation contributing to catalysis seemed evident and was validated by

showing that T40 and H45 contributed to catalysis by binding of the tail phosphate groups of the

ATP moiety[9]. However, here a stricter definition of remote residues is adopted, and we

redefine the active site as “residues within 5 Å of the substrate”, regardless of whether that

particular residue is directly involved in chemical reactions. With this definition, residues such as

T40 and H45 in tyrosyl-transfer RNA synthetase would not be considered remote, but rather it

could be said that the active site of tyrosyl-transfer RNA synthetase is particularly large to

accommodate a particularly large substrate.

As soon as THEMATICS was created, it was noted that certain predictions by THEMATICS included residues that were not in direct contact with the substrate[17]. These residues were not only far from the site of the reaction, but also had no interaction with the substrate. Whether these predictions were false positives or correct predictions yet to be tested remained an open question[17, 18]. One could imagine an active site composed of shells: the first shell consists of the residues in contact with the substrate, the second shell consists of the residues in contact with, but behind, the first shell, and the third shell consists of the residues in contact with, but behind, the second shell.

Figure 1.4 abstractly shows a multi-layered active site consisting of a 1st shell that interacts with the substrate, a 2nd shell of residues interacting with the 1st shell, and a 3rd shell of residues interacting with the 2nd shell. Each shell is approximately 5 Å in depth.


Figure 1.4. Diagram of a multi-layered active site.

These predicted residues were in the second, or even third, shell of the active site. Nitrile

Hydratase (NH), Phosphoglucose Isomerase (PGI), and DNA Polymerase IV (DinB) were predicted to contain 2nd shell residues contributing to catalysis. In contrast, there are some

enzymes such as Ketosteroid Isomerase (KSI) where no second-shell residues are predicted to be

important for catalysis. It was found that indeed NH[31], PGI[32], and DinB[33] all contain

remote residues contributing to catalysis, whereas KSI[32] possesses a mostly single-layered

active site.

These results show that many, but not all, enzymes contain active sites that are extended,

utilizing remote residues to contribute towards catalysis. THEMATICS and POOL accurately

predict the contributions of remote residues to catalysis by a wide range of enzymes. Thus, the

extent of an enzyme’s active site can be predicted using POOL and THEMATICS.


Chapter 2. Alkaline Phosphatase

2.1. Introduction

Alkaline phosphatase (AP) appears across all domains of life releasing phosphate groups

from a wide range of substrates. AP is of great interest for use in diagnostic assays but the

bacterial enzyme is considered too slow compared to the mammalian enzyme, although the

temperature stability of the mammalian enzyme is much lower than the bacterial enzyme (65 °C

and 95 °C TM, respectively)[34]. Alkaline phosphatase has been a staple of enzymology studies

for decades[35] although it is under constant revision and further investigation as to its

mechanism[35-38]. Its thermostability, ubiquity in both nature and the chemical literature, and

ease of kinetic assay present an excellent learning opportunity. As such, the senior-level

Chemical Biology course at Northeastern University utilizes the site-directed mutagenesis and

Michaelis-Menten parameterization of alkaline phosphatase as a long term lab experiment. Some

mutations in this work were designed by undergraduates partaking in this course.

E. coli AP is encoded by the phoA gene, which encodes a 471 amino acid precursor protein; the first 21 amino acids form a periplasmic signal sequence that is then removed, resulting in a 450 amino acid enzyme that naturally dimerizes in

solution. Each monomer contains its own active site with three metal ions: two zinc and one

magnesium[39]. These metal ions are held in place by various residues and with no substrate

present are coordinated with three water molecules[40]. The magnesium ion is held in place by

D51, D153, T155, and E322; the zinc1 ion is held in place by R166, D327, H331, and H412; the

zinc2 ion is held in place by D51, R166, D369 and H370[35, 39-49]. K328 interacts with the


phosphate moiety through a water molecule[49] and S102 performs the nucleophilic attack on

the substrate[40, 50, 51].

Figure 2.1. The active site of alkaline phosphatase based on PDB ID: 1ALK. Zinc: purple; magnesium: yellow;

phosphate: green/red.


2.2. Computational Predictions

Alkaline phosphatase as analyzed by THEMATICS was predicted to have mostly 1st shell residues, with two predicted 2nd shell residues and one 3rd shell residue. In addition to analysis by THEMATICS, analysis by Evolutionary Trace (ET)[52, 53] predicted a much larger population of residues that fully included those predicted by THEMATICS.

Figure 2.2. Diagram of Evolutionary Trace and THEMATICS predictions for AP.


Each group in Figure 2.2 represents a shell of the active site for AP. Residues predicted by ET but not by THEMATICS could be non-ionizable (as in the case of S102) and/or simply not predicted by THEMATICS.

POOL Rank   Residue    Raw POOL Score   Normalized POOL Score
1           ASP 51     2.06E-02         1.00E+00
2           ASP 369    1.42E-02         6.87E-01
3           HIS 370    1.06E-02         5.15E-01
4           ASP 327    1.06E-02         5.15E-01
5           GLU 322    7.89E-03         3.83E-01
6           HIS 412    3.57E-03         1.73E-01
7           HIS 331    5.18E-04         2.52E-02
8           ASP 101    3.57E-04         1.73E-02
9           HIS 372    3.36E-04         1.63E-02
10          ASP 153    1.54E-04         7.50E-03
11          ARG 166    1.10E-04         5.36E-03
12          GLU 341    8.08E-05         3.92E-03
13          LYS 328    7.37E-05         3.58E-03
14          HIS 86     6.85E-05         3.33E-03
15          PRO 156    4.25E-05         2.06E-03
16          GLY 52     3.80E-05         1.85E-03
17          GLU 57     2.20E-05         1.07E-03
18          THR 155    2.11E-05         1.03E-03
19          SER 102    2.11E-05         1.03E-03
20          HIS 162    1.77E-05         8.59E-04
21          MET 53     1.69E-05         8.18E-04
22          ASP 330    9.87E-06         4.79E-04
23          ASN 44     8.80E-06         4.27E-04
24          PHE 317    8.80E-06         4.27E-04
25          GLY 207    8.80E-06         4.27E-04

Table 2.1. POOL predictions for alkaline phosphatase. THEMATICS predictions include those colored. Blue: 1st shell; yellow: 2nd shell.


Shown here are only the 25 most highly ranked residues of 450. Residues in blue are known to contribute to catalysis via the ligand or the metals; residues in yellow are predicted by THEMATICS to be 2nd or 3rd shell residues of interest. E341 helps form the dimer interface.

Figure 2.3. A POOL plot of POOL score vs. POOL rank for alkaline phosphatase.

The POOL plot in Figure 2.3 extends out towards a rank of 449, asymptotically approaching a POOL score of 0. There are several interesting predictions by POOL for alkaline phosphatase (Table 2.1). POOL performs well in predicting the first shell of residues, as well as the dimer-interface forming residue. Residues predicted by THEMATICS all reside in the top 22 (top 5%) of the POOL ranking, including the 2nd and 3rd shell residues predicted by THEMATICS. Threonine 155 and serine 102 are both essential for catalysis but rank only 18 and 19, respectively; because neither serine nor threonine is considered ionizable, THEMATICS would not predict these residues directly. The computational predictions shown above suggest that alkaline phosphatase may have a few 2nd and 3rd shell residues important for catalysis, namely E57, D330, and H372 (Figure 2.4).

[Figure 2.3 plot axes: POOL score (y-axis, 0 to 1) vs. POOL rank (x-axis, 0 to 50).]

POOL discards the THEMATICS Boolean approach of assigning discrete yes/no values

to predictions of functional importance for residues in exchange for a ranking system (Figure

2.3), complete with its own advantages and disadvantages (see discussion[26]). Traditionally a percentage-based cut-off, such as the top 8%, 10%, or as low as the top 5%, is used to determine which residues the user should investigate as important for catalysis. However, the exact cut-off is still an area of investigation (see Further Work) and can depend on the size and type of the protein of interest.
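As a minimal sketch of how a percentage-based cut-off translates into a residue count, the snippet below assumes a simple floor rule on the ranked list. This rounding rule, and the function names `cutoff_count` and `top_percent`, are illustrative assumptions; how POOL itself handles ties or rounding is not described here. The example input is the top of the ranking in Table 2.1.

```python
def cutoff_count(n_residues, percent):
    """Number of residues kept by a top-`percent` cut-off.

    Uses integer floor division (an assumed rounding rule).
    """
    return (n_residues * percent) // 100


def top_percent(ranked_residues, percent):
    """Return the top `percent` of a list already sorted from best to worst rank."""
    return ranked_residues[:cutoff_count(len(ranked_residues), percent)]


# Ten highest-ranked residues for alkaline phosphatase (Table 2.1), best first.
top_ranked = ["ASP 51", "ASP 369", "HIS 370", "ASP 327", "GLU 322",
              "HIS 412", "HIS 331", "ASP 101", "HIS 372", "ASP 153"]

# Residue counts selected from a 450-residue protein at common cut-offs.
for percent in (5, 8, 10):
    print(f"top {percent}% of 450 residues: {cutoff_count(450, percent)} residues")

# Applying a cut-off to a ranked list (50% of the ten residues above).
print(top_percent(top_ranked, 50))
```

Under this rule a 5% cut-off on 450 residues keeps 22 residues, which matches the "top 22 (top 5%)" figure quoted for alkaline phosphatase.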


Figure 2.4. The 2nd and 3rd shell residues predicted by THEMATICS: (top) H372, (middle) D330, and (bottom) E57. Zinc: purple; magnesium: yellow; phosphate: green/red.

2.3. Materials and Methods

To investigate these predictions by THEMATICS and POOL pertaining to the possible

outer-shell residues in alkaline phosphatase, we employed site-directed mutagenesis to construct

mutants, expressed and purified them, and assayed their activities in reference to the wild-type

protein.

2.3.1. Materials

QuikChange® site-directed mutagenesis kits (Agilent, CA) were used to make mutations in pEK29[43] (provided by E. Kantrowitz, Boston College) using the primers below, and the mutations were confirmed by DNA sequencing (Massachusetts General Hospital DNA Core, Cambridge, MA).

M75T

5'-GGCGATGGGACGGGGGACTCGG-3'

5'-CCGAGTCCCCCGTCCCATCGCC-3'

H394D

5'-CTGATCACGCCGACGCCAGCCAG-3'

5'-CTGGCTGGCGTCGGCGTGATCAG-3'

H108L

5'-GGGCAATACACTCTCTATGCGCTG-3'

5'-CAGCGCATAGAGAGTGTATTGCCC-3'

E79Q

5'-GGACTCGCAAATTACTGCCGCACG-3'

5'-CGTGCGGCAGTAATTTGCGAGTCC-3'

D352N

5'-CGATAAACAGAATCATGCTGCCAATCC-3'

5'-GGATTCGCAGCATGATTCTGTTTATCG-3'

H394L

5'-CGCTGATCACGCCCTCGCCAGCCAG-3'

5'-CTGGCTGGCGAGGGCGTGATCAGCG-3'

M75A

5'-CTGATTGGCGATGGGGCAGGGGACTCG-3'

5'-CGAGTCCCCTGCCCCATCGCCAATCAG-3'

E172Q

5'-GTTTCTACCGCACAGTTGCAGGATG-3'

5'-CATCCTGCAACTGTGCGGTAGAAAC-3'

S127A

5'-GACTCGGCTGCAGCAGCAACCGCC-3'

5'-GGCGGTTGCTGCTGCAGCCGAGTC-3'

Q457E

5'-GGACTGACCGACGAGACCGATCTC-3'

5'-GAGATCGGTCTCGTCGGTCAGTCC-3'

Figure 2.5. Primers for site-directed mutagenesis of E. coli alkaline phosphatase. Codons manipulated are

underlined.
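Mutagenic primer pairs of this kind are typically designed as exact reverse complements of each other. The sketch below shows one way to check a transcribed pair for that property, using the M75T pair from Figure 2.5 as input. The helper names are hypothetical, and the check is only a transcription sanity test, not part of the original protocol.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary_pair(forward, reverse):
    """True if `reverse` (5' to 3') is the exact reverse complement of `forward`."""
    return reverse_complement(forward) == reverse.upper()


# M75T primer pair as listed in Figure 2.5.
m75t_forward = "GGCGATGGGACGGGGGACTCGG"
m75t_reverse = "CCGAGTCCCCCGTCCCATCGCC"

print(is_complementary_pair(m75t_forward, m75t_reverse))
```

A pair that fails this check is not necessarily wrong at the bench, but it is worth comparing against the original primer order before reuse.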

SM547 cells, lacking a chromosomal phoA gene, were provided by E. Kantrowitz, made competent by chemical treatment with CaCl2, and stored at -80 °C in aliquots. Primers were hydrated to 100 μM with sterile water, and a 5 μM stock was created by diluting 20-fold into sterile water.

2.3.2. Methods

For protein purification, plasmids expressing either WT or variant AP were transformed into SM547 competent cells and selected on LB agar containing 100 μg mL⁻¹ ampicillin. An overnight culture of 50 mL YT medium containing 100 μg mL⁻¹ ampicillin grown at 37 °C was sub-cultured into 1 L YT supplemented with 100 μg mL⁻¹ ampicillin, and growth was continued for 12 hours at 37 °C. The cells were harvested, washed, and osmotically shocked as previously described by Brockman & Heppel[54], and the protein was then precipitated, suspended, dialyzed, and purified on a HiTrap FastFlow Q column (GE Healthcare) by FPLC as described by Chaidaroglou et al.[43]. Purity of each fraction was determined by 10% SDS-PAGE, and pure fractions were stored at -20 °C. Protein concentration was determined by Bradford assay (Bio-Rad) against a bovine serum albumin standard.

Formation of para-nitrophenol from the cleavage of para-nitrophenyl phosphate (PNPP) was measured at 410 nm at room temperature in High Tris buffer (1.0 M Tris-HCl pH 8.0), and initial velocities were calculated with an extinction coefficient of 1.42 × 10⁴ M⁻¹ cm⁻¹ (Figure 2.6). Non-linear regression to calculate KM and kcat was performed using GraphPad Prism 5 version 5.02. At least three independent trials were performed for each protein. Reactions were initiated by addition of enzyme, and data were collected every 0.5 seconds starting at the 3rd second of the reaction and continuing for 2 min to construct the initial velocities. PNPP was kept in the dark as much as possible and stored in light-resistant microcentrifuge tubes when aliquoted.


PNPP (μM)   Buffer (2X)   Water   PNPP (2 mM)   Enzyme (variable nM)   Total
1           500           483     2             15                     1000
2           500           481     4             15                     1000
5           500           475     10            15                     1000
10          500           465     20            15                     1000
20          500           445     40            15                     1000
50          500           385     100           15                     1000
100         500           285     200           15                     1000
200         500           85      400           15                     1000

Table 2.2. Kinetic assays for alkaline phosphatase. Bolded columns denote final concentrations, where all other numbers refer to μL added to the cuvette.

2.4. Results

To determine initial velocities by monitoring production of the product 4-nitrophenol (4-PNP), a standard curve was constructed from dilutions of 4-PNP, giving a molar extinction coefficient of 1.42 × 10⁴ M⁻¹ cm⁻¹, similar to the reported 1.62 × 10⁴ M⁻¹ cm⁻¹ [43].


Figure 2.6. Standard curve for 4-nitrophenol.

[Figure 2.6 plot: absorbance (410 nm) vs. 4-nitrophenol (μM); linear fit y = 0.0142x + 0.0304, R² = 0.9993.]

Each alkaline phosphatase variant was tested concurrently with wild-type alkaline phosphatase on the same day. Initial velocities, V0, for each substrate concentration (1-200 μM PNPP) were calculated by taking the slope of product formation (in a.u. min⁻¹) and dividing by the 4-PNP molar extinction coefficient to give μM PNP min⁻¹.
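The conversion from an absorbance slope to an initial velocity can be sketched as follows. The 1 cm path length is an assumption (it is implied by the units of the extinction coefficient but not stated explicitly), and the function name is illustrative.

```python
EPSILON_M = 1.42e4  # M^-1 cm^-1, from the 4-nitrophenol standard curve
PATH_CM = 1.0       # cuvette path length in cm (assumed)


def initial_velocity_uM_per_min(slope_au_per_min, epsilon=EPSILON_M, path_cm=PATH_CM):
    """Convert an absorbance slope (a.u. per min) into product formation (uM per min).

    Beer-Lambert law: A = epsilon * path * concentration, so
    d[product]/dt = (dA/dt) / (epsilon * path).
    """
    molar_per_min = slope_au_per_min / (epsilon * path_cm)
    return molar_per_min * 1e6  # M -> uM


# Example: a slope of 0.0142 a.u./min corresponds to about 1 uM of product per minute,
# in line with the standard-curve slope of 0.0142 a.u. per uM.
print(initial_velocity_uM_per_min(0.0142))
```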

Figure 2.7. Michaelis-Menten plots for AP in 1 M Tris-HCl pH 8.0 buffer. Error bars represent standard error of at

least three independent trials.

AP Variant   Vmax (μM min⁻¹)   KM (μM)        R²
WT           4.8 (0.1)         26.7 (2.6)     0.94
M53A         8.9 (0.4)         26.2 (4.1)     0.96
M53T         1.7 (0.1)         22.7 (2.1)     0.98
E57Q         7.5 (0.3)         25.2 (3.5)     0.97
H86L         3.2 (0.1)         13.5 (0.9)     0.99
S105A        3.1 (0.2)         14.1 (3.1)     0.91
E150Q        5.2 (0.5)         33.2 (8.7)     0.90
D330N        2.0 (0.1)         22.0 (5.4)     0.87
H372D        3.0 (0.3)         53.1 (14.4)    0.91
H372L        4.1 (0.2)         10.6 (1.5)     0.95
Q435E        8.4 (0.1)         20.9 (1.1)     0.99

Table 2.3. WT and variant AP kinetic parameters. Standard errors are in parentheses and are based on at least three independent trials.

Vmax values cannot be compared directly between enzymes as a proxy for kcat because the enzymes were assayed at different concentrations (Appendix A). None of the variants showed a dramatic decrease in activity. While there are some small differences in individual kcat or KM values, the catalytic efficiencies are all similar (Table 2.4).


PhoA Variant   POOL Rank   Å to PO4   Shell   kcat (s⁻¹)    KM (μM)    Catalytic Efficiency (10⁶ M⁻¹ s⁻¹)   Fold Decrease
WT             --          --         --      40 (7.3)      28 (9)     1.43 (0.53)                          --
H372D          9           6.7        2nd     27 (10)       63 (44)    0.43 (0.34)                          3.33 (2.91)
H372L          9           6.7        2nd     6.3 (0.1)     11 (2.9)   0.57 (0.15)                          2.49 (1.13)
H86L           14          11.2       2nd     6.3 (0.1)     14 (0.7)   0.45 (0.02)                          3.17 (1.19)
S105A          16          7.2        2nd     9.7 (0.7)     14 (3.2)   0.69 (0.17)                          2.06 (0.91)
E57Q           17          12.3       3rd     21 (0.8)      26 (4)     0.81 (0.13)                          1.77 (0.71)
M53A           21          14.6       3rd     25 (4.4)      27 (16)    0.93 (0.17)                          1.54 (0.64)
M53T           21          14.6       3rd     14 (0.8)      23 (3.4)   0.61 (0.10)                          2.36 (0.94)
D330N          22          11.0       2nd     17 (4.2)      25 (2.5)   0.68 (0.18)                          2.1 (0.96)
Q435E          44          11.1       2nd     17.4 (0.4)    21 (3)     0.83 (0.12)                          1.72 (0.68)
E150Q          136         10.2       2nd     13 (8.8)      34 (6.4)   0.38 (0.27)                          3.74 (2.97)

Table 2.4. Summary calculations for WT alkaline phosphatase and variants. Standard errors are in parentheses and are based on at least three independent trials.
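The derived columns of Table 2.4 can be reproduced approximately from kcat and KM. The sketch below assumes that uncertainties on a ratio are combined in quadrature as relative errors; this is an assumption, since the propagation method is not stated, although it does reproduce the tabulated errors for the WT efficiency and the H372D fold decrease. Function names are illustrative.

```python
import math


def ratio_with_error(num, num_err, den, den_err):
    """Return num/den and its uncertainty, combining relative errors in quadrature."""
    value = num / den
    relative = math.sqrt((num_err / num) ** 2 + (den_err / den) ** 2)
    return value, value * relative


def catalytic_efficiency(kcat, kcat_err, km_uM, km_err_uM):
    """kcat/KM in M^-1 s^-1, with kcat in s^-1 and KM given in uM."""
    return ratio_with_error(kcat, kcat_err, km_uM * 1e-6, km_err_uM * 1e-6)


# WT values from Table 2.4: kcat = 40 (7.3) s^-1, KM = 28 (9) uM.
wt_eff, wt_eff_err = catalytic_efficiency(40, 7.3, 28, 9)
print(f"WT kcat/KM = {wt_eff / 1e6:.2f} ({wt_eff_err / 1e6:.2f}) x 10^6 M^-1 s^-1")

# Fold decrease for H372D from the tabulated efficiencies (in units of 10^6 M^-1 s^-1).
fold, fold_err = ratio_with_error(1.43, 0.53, 0.43, 0.34)
print(f"H372D fold decrease = {fold:.2f} ({fold_err:.2f})")
```

Small differences in the last digit relative to the table are expected because the inputs here are the rounded values as printed.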

There is no correlation between catalytic efficiency and either POOL rank or distance (Å) to the phosphate substrate for the residues tested, nor is there a significant difference in results between the 2nd shell and the 3rd shell residues as groups. Distances from the PO4 (Å) are based on PDB:1ALK[39] and were measured from the tip of the residue side chain to the phosphorus atom.

Figure 2.8. Catalytic efficiencies of wild-type and variant alkaline phosphatases. Error bars represent standard error

over at least three independent trials.

[Figure 2.8 plot: catalytic efficiency (10⁶ M⁻¹ s⁻¹, 0 to 2.50) for WT and each AP variant.]

2.5. Conclusions

WT alkaline phosphatase is catalytically efficient with a kcat/KM of 1.5 × 10⁶ M⁻¹ s⁻¹, which is similar to the literature values reported across various experiments[34, 43, 55, 56].

Mutations disrupting interactions of the active site at the first shell commonly decrease the

catalytic efficiency of alkaline phosphatase by many orders of magnitude. In contrast, throughout

this work and from the compiled literature of single-mutation variants, mutations in the second

or third shell have little to no effect on catalysis. Alkaline phosphatase seems to have a compact

active site comprising solely first-shell residues that contribute significantly to catalysis as

measured by single-point mutants.

Figure 2.9. AP residues investigated in this work. Zinc: purple; magnesium: yellow; phosphate: green/red.

It has been shown that the turnover rate of AP can be increased substantially by multiple mutations, including 2nd shell mutations. A D153G/D330N double mutant was reported to have over 50-fold higher kcat than the WT AP; however, the KM was also raised by about 30-fold, leaving the enzyme with less than a 2-fold higher overall catalytic efficiency[34]. Similarly, D101A gives a 2-fold increase in both kcat and KM, which negate each other[57]. D153A by itself, while resulting in almost no change in catalytic efficiency, resulted in a 7-fold increase in both kcat and KM[42]. D101, D153, and D330 are all predicted by POOL and rank 8th, 10th, and 22nd, respectively. Multiple mutations in close proximity achieved modest 2- to 6-fold increases in kcat/KM, including V99A, T100V, T100I, and D101S[58].

Figure 2.10. A plot of Table 2.5 and Table 2.6 showing effects on catalytic efficiency based on POOL rank for AP.

[Figure 2.10 plot: fold decrease over respective WT AP (log scale, 0.1 to 1,000,000) vs. POOL rank for AP (0 to 160).]

The largest losses of activity are seen for variants of D327 and S102, with no mutations of residues outside the top 20 POOL-predicted residues causing a large (more than one order of magnitude) decrease in catalytic efficiency. It is important to note that this compilation only examines single mutations where both subunits are affected. Alkaline phosphatase has been known to show intragenic complementation, where a heterodimer of variants A and B, AB, will have higher activity than AA or BB[56]. While the two active sites per dimer are more than 30 Å apart, there seems to be molecular communication between them.

Variant   Shell   POOL Rank   POOL percentile   (kcat/KM) wild-type / (kcat/KM) mutant   Reference
D51E      1       1           99                231                                      [44]
D369N     1       2           99                95                                       [56]
D327N     1       4           99                4350                                     [45]
D327N     1       4           99                100                                      [46]
D327A     1       4           99                >600,000                                 [45]
D327A     1       4           99                >1,000,000                               [46]
E322K     1       5           99                1520                                     [56]
H412Y     1       6           99                >12,000                                  [56]
H412E     1       6           99                2237                                     [44]
H331E     1       7           98                972                                      [44]
D101S     1       8           98                0.2                                      [59]
D101A     1       8           98                1                                        [57]
D153G     1       10          98                0.2                                      [59]
D153H     1       10          98                1.1                                      [34]
D153H     1       10          98                3.5                                      [47]
D153E     1       10          98                1.3                                      [44]
D153A     1       10          98                1.1                                      [42]
D153N     1       10          98                1.1                                      [42]
R166A     1       11          98                313                                      [43]
R166S     1       11          98                125                                      [43]
R166Q     1       11          98                166                                      [48]
R166K     1       11          98                4                                        [48]
K328R     1       13          97                0.9                                      [58]
K328C     1       13          97                10                                       [51]
K328H     1       13          97                0.5                                      [34]
K328H     1       13          97                3.2                                      [49]
K328A     1       13          97                3.8                                      [49]
T155M     1       18          96                678                                      [56]
S102G     1       19          96                >300,000                                 [60]
S102A     1       19          96                >60,000                                  [60]
S102C     1       19          96                >19,000                                  [60]

Table 2.5. 1st shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.


Variant   Shell   POOL Rank   POOL percentile   (kcat/KM) wild-type / (kcat/KM) mutant   Reference
H372A     2       9           98                2.9                                      [61]
H372D     2       9           98                3.3                                      This Work
H372L     2       9           98                2.5                                      This Work
H86L      2       14          97                3.1                                      This Work
E57Q      2       17          96                1.8                                      This Work
M53A      2       21          95                1.5                                      This Work
M53T      2       21          95                2.4                                      This Work
D330N     2       22          95                0.2                                      [34]
D330N     2       22          95                2.1                                      This Work
Q435E     2       44          90                1.7                                      This Work
A103C     2       50          89                0.9                                      [58]
A103D     2       50          89                2.2                                      [58]
T100V     2       51          89                0.3                                      [58]
T100I     2       51          89                0.3                                      [58]
V99A      2       100         78                0.2                                      [58]
S105L     2       136         70                6.3                                      [56]
S105A     2       136         70                2.1                                      This Work
E150Q     3       106         76                3.7                                      This Work
E341K     *       12          97                1407                                     [56]
T59A      *       169         37                1.5                                      [62]
T59R      *       169         37                >600,000                                 [62]

Table 2.6. 2nd and 3rd shell variants of AP and their catalytic efficiency under comparable conditions to our experiments.


Chapter 3. Ketosteroid Isomerase

3.1. Introduction

Ketosteroid isomerase (KSI) moves a double bond to convert ∆⁵-3-ketosteroids to ∆⁴-3-ketosteroids by cleavage of the C-H bond at C4 and reattachment of the proton at C6. This reaction is characteristic of many biological processes involving intramolecular proton abstraction and reprotonation (Figure 3.1). Considering that some known KSI enzymes reach diffusion-limited rates of reaction[63, 64], KSI is an attractive model for studying enzyme kinetics and active site engineering[32, 63, 65]. There are two well-studied sources of ∆⁵-3-ketosteroid isomerase: Pseudomonas putida (PpKSI) and Comamonas testosteroni (CtKSI). These two enzymes have practically identical active sites and placement of catalytic residues, while sharing only 34% amino acid sequence identity[63]. This fold is not entirely uncommon in nature[66], being superimposable on Nuclear Transport Factor 2[67] despite a lack of sequence homology or functional similarity[66, 68].


Figure 3.1. Mechanism of KSI based on PpKSI numbering.

The active site of KSI is particularly hydrophobic, which is reasonable for an enzyme that binds steroid ligands[66]. The mechanism of KSI involves abstraction of a proton at the C4 position by D40 (PpKSI numbering) followed by stabilization of the intermediate by D103 and Y16[32, 63, 65, 66, 69]. Regeneration of the catalyst is achieved by the C6 carbon abstracting the hydrogen from D40. The ability of an aspartic acid to act as a base is of particular interest, especially with a nearby aspartic acid requiring protonation to stabilize the resulting enolate ion.

With the advent of the Protein Structure Initiative, many crystal structures are uploaded to the Protein Data Bank with putative, predicted, or unknown function. These structures often have function assignments based purely on sequence or structural similarity. With misannotation in databases becoming an increasing problem[70], we have recently developed a method called SALSA (Structurally Aligned Local Sites of Activity) to help assign function to structures without biochemical data[71]. Because THEMATICS and POOL predict the active site of any protein from the tertiary structure alone, regardless of existing homology, they are well suited to predicting the function of proteins that may be incorrectly annotated.

There are three putative KSI proteins from structural genomics centers, from three organisms: Mycobacterium tuberculosis (MtKSI), Pectobacterium atrosepticum (PaKSI), and Mesorhizobium loti (MlKSI). Previous work in our group by Dr. Srinivas Somarowthu has shown that of these three, only MtKSI possesses KSI activity. However, the catalytic efficiency of MtKSI was found to be on the order of 10⁵ M⁻¹ s⁻¹, a thousand times lower than PpKSI’s efficiency of 10⁸ M⁻¹ s⁻¹. This raises the question: what are the key differences that lead to this loss of activity between MtKSI and PpKSI? Can the activity of MtKSI be brought to PpKSI levels by making the MtKSI active site more PpKSI-like?


3.2. Computational Predictions

For each known KSI and putative KSI, POOL ranked each residue’s importance for catalysis, and the top 10% for each was used as a cut-off. The structures were aligned based on their active sites, and a structural alignment table was created (see Table 3.1). Nuclear Transport Factor 2 (NTF2) has a very similar overall fold without sharing any function with KSI and thus was used as a negative control.

Protein   PDB    Structurally aligned POOL predicted residues
PpKSI     1oh0   Y32   Y57   Y16   D40   W120   F56   G49   P41   D103   D35   G43   E39   M116
CtKSI     8cho   F30   Y55   Y14   D38   F116   F54   G47   P39   D99    D33   G41   E37   M112
MtKSI     2z76   M32   F64   S16   D40   W128   F63   G56   P41   F111   D35   G43   E39   M124
MlKSI     3hx8   Y52   W76   F36   P60   S146   L75   G68   P61   Y125   D55   -     F59   D142
PaKSI     3d9r   Y35   Y59   Y19   G43   K131   V58   G51   P44   E110   D38   -     M42   Y127
NTF2      1oun   Y33   L56   Y18   W41   A122   K55   G48   E42   Q101   A36   -     T40   D117

Table 3.1. SALSA alignment of POOL predicted residues for known KSI proteins and proteins annotated as putative KSIs. Bold: POOL-predicted; underlined: literature annotated.

POOL predictions are based on the top 10% of rankings. The proteins in Table 3.1 are, in order from top to bottom: two known KSIs, three structural genomics (SG) putative KSIs, and a nuclear transport factor of similar structure, shown for comparison. Of the three putative KSIs, only MtKSI has an active site that is both predicted and similar to the known KSI active sites; neither MlKSI nor PaKSI has a similar active site, nor are their residues in the same spatial positions as the KSI active site predicted to be important for activity. The match between MtKSI and PpKSI / CtKSI is not 100%. While a tyrosine to phenylalanine mismatch is somewhat conservative, it is notable that F64 of MtKSI is not predicted by POOL to be important for catalysis. The same can be said for S16, where PpKSI and CtKSI also have a tyrosine. The essential aspartic acid at D40 is conserved, but curiously the other aspartic acid, PpKSI-D103 / CtKSI-D99, which is thought to be essential, is replaced by a non-POOL-predicted F111 in MtKSI.


3.3. Materials and Methods

Wild-type MtKSI DNA was obtained in the form of the plasmid pGST-Rv0760c (Craig Garen and Prof. Michael James, Department of Biochemistry, University of Alberta), which encodes MtKSI with a GST-tag as well as an ampicillin resistance marker gene. Steroids were purchased from Steraloids Inc, RI, USA. Primers were hydrated to 100 μM with sterile water, and a 5 μM stock was created by diluting 20-fold into sterile water. Codons manipulated are underlined.

3.3.1. Methods

QuikChange (Agilent Technologies) site-directed mutagenesis was used to mutate the

wild-type KSI gene with the following mutations: S16Y, F64Y, F111D, S16Y/F64Y,

S16Y/F111D, F64Y/F111D, and S16Y/F64Y/F111D. Since the amino acids of interest are coded

by codons far enough apart, multiple mutations can be introduced using single-mutation primers

in succession.

MtKSI.F111D-F GGCGTGGACACCTACCGGGTG

MtKSI.F111D-R CACCCGGTAGGTGTCCACGCC

MtKSI.F64Y-F GGCGCCTTCTACGACACACAC

MtKSI.F64Y-R GTGTGTGTCGTAGAAGGCGCC

MtKSI.S16Y-F CGCAGTCGTACTGGCGGTGCG

MtKSI.S16Y-R CGCACCGCCAGTACGACTGCG

Figure 3.2. Primers for site-directed mutagenesis of MtKSI in plasmid pGST-Rv0760c.

BL21 DE3 pLysS competent cells were transformed with pGST-Rv0760c containing either the WT gene or the mutations, and after streaking a transformed colony, a single colony was used to inoculate 50 mL of LB liquid culture, which was grown overnight with 100 μg mL⁻¹ ampicillin. The next day, the 50 mL culture was transferred to 500 mL of LB liquid culture with 100 μg mL⁻¹ ampicillin and grown with shaking for 2 h at 37 °C. Once an OD of 0.5-0.8 at 600 nm was obtained, the culture was brought to 0.5 mmol L⁻¹ IPTG to induce expression and agitated at room temperature overnight. After overnight growth, the culture was harvested by centrifugation at 6000 RPM for 10 minutes, suspended in 1X Phosphate Buffered Saline (PBS) pH 7.3 with 1 mM DTT and ½ a tablet of Roche Protease Inhibitor cocktail (Buffer A), and stored at -80 °C.

Frozen pellets from the -80 °C freezer were thawed overnight on ice. The suspended, thawed cells were subjected to sonication for 2 min (multiple rounds of 10 sec on followed by 10 sec off) and then clarified by centrifugation at 14,000 rcf for 60 min. The supernatant was collected and loaded onto a disposable column of 4B Sepharose GST resin (GE Healthcare). The column was washed extensively with Buffer A, and the GST-tagged MtKSI was then gradually eluted with 1 to 10 mM reduced glutathione. Fractions containing MtKSI, as determined by SDS-PAGE, were collected and incubated with histidine-tagged TEV protease overnight and then dialyzed against Buffer A to remove any reduced glutathione. The solution was then run through a 4B Sepharose GST column again, except this time the KSI was collected in the initial flow-through; it was then passed over a nickel FPLC column to remove the histidine-tagged TEV protease, and the MtKSI was collected in the flow-through. Fractions containing MtKSI were determined by SDS-PAGE and then concentrated using Viva-spin tubes with a 5000 Da molecular weight cut-off (Sartorius Stedim Biotech) while being exchanged into KSI storage buffer (50 mM NaCl, 10 mM Tris-HCl, 1 mM DTT, pH 8.0). Purity was determined by SDS-PAGE, and protein concentration was determined by Bradford assay against a BSA standard.

Activity of MtKSI was determined by formation of 4-androstene-3,17-dione (4AND) by isomerization of 5-androstene-3,17-dione (5AND), measured at 248 nm by a UV/Vis instrument for a fixed enzyme concentration and substrate concentrations varying between 30 and 300 μM 5AND while keeping the final methanol concentration at 3.3% v/v (Table 3.2; Figure 3.3). Enzyme concentration was fixed at a final concentration of 10 nM from a 1.2 μM stock that was made by diluting purified KSI with a dilution buffer (34 mM KCl, 2.5 mM EDTA, 1% BSA, pH 7.0).

Reactions were blanked with all reagents except the substrate, 5AND. 5AND was added and mixed quickly and thoroughly, and the absorbance at 248 nm was then tracked every 0.5 seconds for 60 seconds, starting after 3 seconds.

[5AND] (μM final)   [KSI] (nM)   2X Buffer   Water   Methanol   KSI (1200 nM)   5AND (3 mM)   5AND (10 mM)   Total
10                  10           1500        1375    90         25              10            -              3000
20                  10           1500        1375    80         25              20            -              3000
30                  10           1500        1375    70         25              30            -              3000
60                  10           1500        1375    40         25              60            -              3000
90                  10           1500        1375    10         25              90            -              3000
120                 10           1500        1375    64         25              -             36             3000
180                 10           1500        1375    46         25              -             54             3000
300                 10           1500        1375    10         25              -             90             3000

Table 3.2. Kinetic assays for MtKSI. Bolded columns denote final concentrations, where all other numbers refer to μL added to the cuvette.
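The final concentrations in Table 3.2 follow from simple dilution arithmetic, sketched below for a few rows. The sketch assumes that the 5AND stocks are dissolved in methanol, so that the methanol and 5AND volumes together give the organic fraction; that assumption is inferred from the constant 100 μL sum and the stated 3.3% v/v, not stated outright. Function names are illustrative.

```python
TOTAL_UL = 3000  # total cuvette volume in uL


def final_concentration(stock_conc, volume_added_uL, total_volume_uL=TOTAL_UL):
    """Concentration after dilution, in the same units as `stock_conc`."""
    return stock_conc * volume_added_uL / total_volume_uL


def organic_percent(methanol_uL, substrate_stock_uL, total_volume_uL=TOTAL_UL):
    """Percent v/v of methanol, counting the substrate stock as methanol (assumed)."""
    return 100.0 * (methanol_uL + substrate_stock_uL) / total_volume_uL


# (methanol uL, 5AND stock concentration in uM, 5AND stock volume in uL)
rows = [(90, 3000, 10), (10, 3000, 90), (64, 10000, 36), (10, 10000, 90)]

for methanol_uL, stock_uM, stock_uL in rows:
    substrate = final_concentration(stock_uM, stock_uL)
    percent = organic_percent(methanol_uL, stock_uL)
    print(f"5AND = {substrate:.0f} uM, methanol = {percent:.1f}% v/v")

# Enzyme: 25 uL of a 1200 nM stock in 3000 uL.
print(f"KSI = {final_concentration(1200, 25):.0f} nM")
```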

3.4. Results

[Figure 3.3 plot: absorbance (248 nm) vs. 4AND (μM); linear fits y = 0.0142x - 0.0088 (R² = 0.9994) and y = 0.0147x + 0.0076 (R² = 0.9971).]

Figure 3.3. Standard curve for 4-androstene-3,17-dione (4AND).

A molar extinction coefficient of 1.4 × 10⁴ M⁻¹ cm⁻¹ was used for kinetic analysis. Two of two trials are shown in Figure 3.3.

Figure 3.4. Michaelis-Menten plots for WT MtKSI and variants. Error bars represent standard error of at least three

independent trials.

WT and F64Y KSI show increasing V0 with increasing substrate concentration, although V0 does not approach Vmax because of the poor solubility of the substrate. Therefore, all KM values for KSI are reported as KMapp. Vmax can still be extrapolated by non-linear regression, but with less accuracy, as reflected in lower regression coefficients and higher standard errors.

Figure 3.5. WT KSI and F64Y individual Michaelis-Menten plots. Panels: MtKSI WT; MtKSI F64Y.

Any variant containing S16Y and/or F111D, however, shows significantly diminished activity and does not show classic Michaelis-Menten behavior. Purification of MtKSI-F111D was problematic, with low yields and loss of protein after concentration. Only a small amount of data could be obtained for F111D, but there seems to be no deviation from the behavior shown by the other non-F64Y mutants.

MtKSI Variant      Vmax (μM min⁻¹)   KMapp (μM)       R²
WT                 63.67 (29.08)     453.6 (297.3)    0.7611
S16Y               10.04 (25.41)     1607 (4635)      0.5909
F64Y               57.83 (3.993)     231.1 (27.89)    0.98
S16Y/F64Y          18.65 (29.19)     2485 (4247)      0.8972
S16Y/F111D         4.094 (1.148)     273.4 (127)      0.8289
F64Y/F111D         0.3873 (0.104)    3.235 (20.3)     0.2783
S16Y/F64Y/F111D    3.821 (11.98)     1674 (5973)      0.5009

Table 3.3. Vmax and KMapp for WT MtKSI and variants. Standard errors are in parentheses and are based on at least three independent trials.

           WT V0 (μM min⁻¹)   F111D V0 (μM min⁻¹)
Trial 1    11.8               0.68
Trial 2    4.9                0.70
Trial 3    11.8               0.88
Trial 4    13.8               0.85

Table 3.4. Comparison between the WT MtKSI and F111D variant at 90 μM 5AND.

Figure 3.6. Single experiment of Michaelis-Menten plot for MtKSI F111D.


MtKSI Variant      kcat (s⁻¹)   KMapp (μM)     Catalytic Efficiency (10³ M⁻¹ s⁻¹)   Fold decrease to WT
WT                 106 (48)     454 (297)      234 (187)                            --
S16Y               17 (42)      1607 (4635)    10 (40)                              23 (88)
F64Y               96 (6.7)     231 (28)       417 (58)                             0.6 (0.45)
F111D              1.5 (--)     70 (--)        36 (--)                              6.4 (5.15)
S16Y/F64Y          31 (49)      2485 (4247)    13 (29)                              18.7 (46)
S16Y/F111D         6.8 (1.9)    273 (127)      25 (14)                              9.4 (9.1)
F64Y/F111D         0.6 (0.2)    3.2 (20)       200 (1253)                           1.2 (7.4)
S16Y/F64Y/F111D    6.4 (20)     1674 (5973)    4 (18)                               62 (296)

Table 3.5. Catalytic efficiency for WT MtKSI and variants. Where available, standard errors are in parentheses and are based on at least three independent trials.

For comparison with the values in Table 3.5, PpKSI’s catalytic efficiency is 100,000 × 10³ M⁻¹ s⁻¹. F64Y retained essentially the same kcat as the WT while having a lower KMapp, giving it a higher catalytic efficiency than the WT. All other mutants lack the signal-to-noise ratio required for an accurate analysis of their Michaelis-Menten parameters or catalytic efficiency.

For every variant tested besides F64Y, the Michaelis-Menten parameters KM and kcat could not be determined reliably, as evidenced by standard errors larger than the measurements themselves for most of these variants. Enzyme efficiencies may nevertheless be compared without separating KM and kcat. If the substrate concentration is negligible compared to KM, the additive substrate-concentration term in the denominator of the Michaelis-Menten equation can be dropped. Assuming [S] ≪ KM:

$$V_0 = \frac{k_{cat}[E][S]}{K_M + [S]} \approx \frac{k_{cat}}{K_M}[E][S]$$


MtKSI Variant      V0             Fold decrease relative to WT
WT                 3.87 (1.4)     --
S16Y               0.27 (0.24)    14.3 (13.7)
F64Y               6.80 (3.3)     0.6 (0.35)
F111D              0.41 (--)      9.4 (--)
S16Y/F64Y          0.26 (0.02)    14.8 (5.7)
S16Y/F111D         0.47 (0.28)    8.2 (5.8)
F64Y/F111D         0.41 (0.23)    9.4 (6.4)
S16Y/F64Y/F111D    0.42 (--)      9.2 (--)

Table 3.6. WT MtKSI and variants compared solely on the basis of initial velocities at 30 μM 5AND. Where available, standard deviations are in parentheses and represent at least three independent trials.

These results report only ratios of catalytic efficiency, without examining kcat or KMapp individually. The F64Y results are similar between this method and the full Michaelis-Menten kinetic analysis.
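As a worked illustration of this comparison: when [S] is well below KM, V0 is approximately (kcat/KM)[E][S], so at identical enzyme and substrate concentrations the ratio of two initial velocities approximates the ratio of the catalytic efficiencies. The sketch below recomputes a few fold decreases from the mean V0 values in Table 3.6. The function name is hypothetical, and the calculation assumes that 30 μM 5AND really is well below KMapp for each variant, which the data cannot confirm for every mutant.

```python
def fold_decrease(v0_wt, v0_variant):
    """WT-to-variant ratio of initial velocities at identical [E] and [S]."""
    return v0_wt / v0_variant

# Mean V0 values copied from Table 3.6 (30 uM 5AND)
V0 = {"WT": 3.87, "S16Y": 0.27, "F64Y": 6.80, "S16Y/F111D": 0.47}

folds = {name: fold_decrease(V0["WT"], v)
         for name, v in V0.items() if name != "WT"}

for name, fold in folds.items():
    print(f"{name}: {fold:.1f}-fold decrease relative to WT")
# prints:
# S16Y: 14.3-fold decrease relative to WT
# F64Y: 0.6-fold decrease relative to WT
# S16Y/F111D: 8.2-fold decrease relative to WT
```

A value below 1, as for F64Y, indicates a variant that is faster than the WT under these conditions.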


3.5. Conclusions

With the exception of F64Y, every mutation tested in MtKSI, alone or in combination, produced a variant with little to no KSI activity on 5AND, for which proper Michaelis-Menten curves could not be constructed. Why did a more “PpKSI-like” active site not raise the catalytic efficiency toward that of PpKSI and CtKSI?

Figure 3.7. “Top-down” view of PpKSI (PDB ID: 1OHO; Red), CtKSI (PDB ID: 8CHO; Orange), and

MtKSI (PDB ID: 2Z76; Yellow).

The steroid-binding pocket and active site are at the front of Figure 3.7. The left group of residues is Y57, Y55, and F64 (PpKSI, CtKSI, and MtKSI, respectively). The top group of residues is Y16, Y14, and S16, respectively. The right group of residues is D103, D99, and F111, respectively. MtKSI-F64, while spatially aligned with PpKSI-Y57 in many structural alignments, is actually swiveled almost 180° away from where the phenol group points in either PpKSI or CtKSI (Figure 3.7). There are few candidate replacements for PpKSI-Y57 in MtKSI, and MtKSI-Y113 is too far away to take over its role[68]. This seems to be a limitation of structural alignments more than of SALSA, but it calls attention to the importance of human verification. In this respect, it makes sense for the F64Y variant to have unmodified catalytic activity.

Figure 3.8. Three residues of interest in PpKSI (PDB ID: 1OHO; Red), CtKSI (PDB ID: 8CHO; Orange),

and MtKSI (PDB ID: 2Z76; Yellow) without surrounding secondary structure.

F111 and S16 from MtKSI overlap well with their SALSA partners in PpKSI and CtKSI (Figure 3.8). However, mutations to make the side chains similar resulted in loss of activity. The natural substrate for MtKSI is unknown. Because the PpKSI-Y16 / CtKSI-Y14 position is used in recognition of the steroid ligand[68], MtKSI could very well use a different steroid. If so, this would explain both the reduced catalytic efficiency on 5AND and the sensitivity to changes in the binding recognition pocket.

The identification of F111 in MtKSI as spatially equivalent to PpKSI-D103 does not seem to be an alignment error; there are no residues in the MtKSI structure that seem capable of replicating the essential catalytic role of D99/D103. Indeed, the authors who reported the crystal structure used this to argue against Rv0760c having KSI activity, and reported no activity on 5AND[68].

The peculiarity of the POOL and SALSA predictions stands out after these results. What does it mean for an enzyme not only to have a strikingly different residue at a catalytic position, but also for that residue not to be predicted as important for activity by POOL? Clearly this does not rule out a particular functional activity, such as ketosteroid isomerization of 5AND, but it may correspond to different substrate recognition, or even a different mechanism.

How many differences are required to declare two enzymes to have different functions,

and how many similarities must there be before they are declared similar? This is a current area

of investigation[71].


Chapter 4. Future Work

4.1. POOL-rank cut-offs

THEMATICS is a Boolean predictor giving either a yes or no for each residue in a

protein structure. In contrast, POOL assigns a ranking to every residue in a given protein

structure, and it is up to the user to determine what cut-off to implement for best results.

Traditionally, the top 5, 8, or even 10% of POOL predictions are considered to be positive

predictions[26, 27, 30, 32, 33, 71]. There remain two open questions:

1) Should POOL prediction be based on percentage or POOL score?

2) Can we use POOL to predict single-layer vs. multi-layer enzyme active sites?

Recent work has shown convincingly that a normalized POOL score cut-off is superior to flat percentage cut-offs. A percentage cut-off by itself makes the odd assumption that the number of residues participating in an active site is directly proportional to the total number of amino acids in the protein. Assigning an absolute cut-off on the normalized POOL score (such as 0.01) seems more rational, and is currently being investigated as the next-generation cut-off for POOL.
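The difference between the two kinds of cut-off can be sketched in a few lines of Python. Everything here is hypothetical: the rankings are made-up (residue, normalized score) pairs, not POOL output, and the helper names are illustrative. The example assumes two proteins of different length that share the same four high-scoring residues.

```python
def top_percent(ranked, percent):
    """Residue names in the top `percent` of a ranked list (at least one)."""
    n = max(1, round(len(ranked) * percent / 100))
    return [name for name, _ in ranked[:n]]

def above_score(ranked, cutoff):
    """Residue names whose normalized score is at least `cutoff`."""
    return [name for name, score in ranked if score >= cutoff]

def mock_ranking(n_residues, top_scores, background=0.001):
    """Hypothetical ranking: a few high scores followed by a flat background."""
    scores = list(top_scores) + [background] * (n_residues - len(top_scores))
    return [(f"res{i + 1}", s) for i, s in enumerate(scores)]

small = mock_ranking(100, [1.0, 0.6, 0.3, 0.05])
large = mock_ranking(400, [1.0, 0.6, 0.3, 0.05])

# A 5% cut-off scales with protein length...
print(len(top_percent(small, 5)), len(top_percent(large, 5)))        # prints: 5 20
# ...while an absolute score cut-off of 0.01 does not.
print(len(above_score(small, 0.01)), len(above_score(large, 0.01)))  # prints: 4 4
```

In this toy case the percentage cut-off labels sixteen background residues as positive in the larger protein, whereas the score cut-off returns the same four residues for both.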

The second question is much more difficult and is central to our work on active site catalysis, engineering, and understanding. It has been shown that some enzymes have multi-layer active sites[31-33] and others have single-layer active sites[32][this work]. How to differentiate the two easily, let alone without examining each predicted residue, is still under discussion. It has been proposed that the shape of the POOL plot itself may provide predictive power regarding the extent of an enzyme’s active site. This hypothesis comes from the empirical observation, across the few proteins studied so far, that POOL plots seem to drop much more sharply for enzymes with single-layer active sites than for those with multi-layer active sites (Figure 4.1).

The single-layer active site proteins alkaline phosphatase and ketosteroid isomerase show sharp decreases immediately, flat-lining by the 10th residue for AP and by the 5th residue for KSI. The multi-layer active site proteins phosphoglucose isomerase, cobalt-type nitrile hydratase, the α subunit of pol III (DnaE), and pol IV (DinB) have extended tails on their POOL plots and start flat-lining farther out than AP and KSI.
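One hypothetical way to put a number on where a POOL plot "flat-lines" is to report the first rank at which the normalized score falls below a small floor. The sketch below is only an illustration of that idea: the floor of 0.01, the function name, and the two synthetic geometric-decay curves are all assumptions, and none of the values are measured POOL scores.

```python
def flatline_rank(scores, floor=0.01):
    """1-based rank of the first score below `floor`, or None if never reached."""
    for rank, score in enumerate(scores, start=1):
        if score < floor:
            return rank
    return None

# Synthetic curves (assumptions): a steep decay and a shallow decay
sharp_drop = [0.3 ** i for i in range(30)]      # stands in for a compact site
extended_tail = [0.8 ** i for i in range(30)]   # stands in for an extended site

print(flatline_rank(sharp_drop))     # prints: 5
print(flatline_rank(extended_tail))  # prints: 22
```

A metric of this kind would have to be calibrated against enzymes whose active-site extent is known experimentally before it could be used as a predictor.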


Figure 4.1. POOL plots for AP, KSI, PGI, NH, DnaE and DinB. [Six panels: AP, KSI, PGI, NH, DnaE, and DinB POOL plots; x-axis: POOL rank (0 to 20); y-axis: normalized POOL score (0 to 1).]


Appendix A. Propagation of error in calculating catalytic efficiency

Both the Vmax value and the KM value are calculated, with their respective standard errors, by the GraphPad Prism program from the input data. Vmax values are converted to kcat values by the following transformation, where kcat is in s⁻¹, Vmax is in μM min⁻¹, and [enzyme] is in μM:

$$k_{cat} = \frac{V_{max}}{60 \times [\mathrm{enzyme}]}$$

AP Variant    [E] (μM)
WT            0.002
H372D         0.011
H372L         0.011
H86L          0.0083
S105A         0.0018
E57Q          0.006
M53A          0.006
M53T          0.002
D330N         0.002
Q453E         0.008
E150Q         0.0023

Table A.1. Concentrations of enzymes used to gather kinetic data for alkaline phosphatase.

All MtKSI kinetic experiments were done with 0.010 μM enzyme. Enzymes were diluted from stock concentrations measured by Bradford assays using a BSA standard. Catalytic efficiency is defined as kcat divided by KM. To propagate the error in each measurement, I used the following expression (where σx is the standard error of variable x):

$$\sigma_Z = Z\sqrt{\left(\frac{\sigma_X}{X}\right)^2 + \left(\frac{\sigma_Y}{Y}\right)^2}$$

where, in our case, Z is catalytic efficiency, X is kcat, and Y is KM.
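The calculation can be checked numerically with a short Python sketch. It assumes the standard quotient rule for uncorrelated errors (relative errors added in quadrature) and treats the enzyme concentration as exact; the function names are hypothetical. With the WT MtKSI fit values from Table 3.3 it reproduces the WT row of Table 3.5.

```python
import math

def kcat_from_vmax(vmax_um_min, enzyme_um):
    """Convert Vmax (uM min^-1) to kcat (s^-1) for enzyme_um (uM) of enzyme."""
    return vmax_um_min / (60.0 * enzyme_um)

def ratio_with_error(x, sx, y, sy):
    """Return Z = X / Y and its standard error (quotient rule, uncorrelated)."""
    z = x / y
    sz = abs(z) * math.sqrt((sx / x) ** 2 + (sy / y) ** 2)
    return z, sz

ENZYME_UM = 0.010               # MtKSI concentration used in all experiments
vmax, vmax_se = 63.67, 29.08    # WT values from Table 3.3 (uM min^-1)
km, km_se = 453.6, 297.3        # WT values from Table 3.3 (uM)

kcat = kcat_from_vmax(vmax, ENZYME_UM)
# The conversion is linear, so the standard error scales the same way
# (assumption: no uncertainty in the enzyme concentration).
kcat_se = kcat_from_vmax(vmax_se, ENZYME_UM)

eff, eff_se = ratio_with_error(kcat, kcat_se, km, km_se)   # uM^-1 s^-1

# 1 uM^-1 s^-1 equals 1000 x 10^3 M^-1 s^-1
print(f"kcat = {kcat:.0f} ({kcat_se:.0f}) s^-1")
print(f"kcat/KM = {eff * 1000:.0f} ({eff_se * 1000:.0f}) x 10^3 M^-1 s^-1")
# prints:
# kcat = 106 (48) s^-1
# kcat/KM = 234 (187) x 10^3 M^-1 s^-1
```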


References

1. Axe, D.D., Estimating the prevalence of protein sequences adopting functional enzyme

folds. J Mol Biol, 2004. 341(5): p. 1295-315.

2. Pauling, L., Molecular architecture and biological reactions. Chem. Eng. News, 1946.

24(10): p. 1375-1377.

3. Garcia-Viloca, M., et al., How enzymes work: analysis by modern rate theory and

computer simulations. Science, 2004. 303(5655): p. 186-95.

4. Fischer, E., Einfluss der Configuration auf die Wirkung der Enzyme. Berichte der

deutschen chemischen Gesellschaft, 1894. 27(3): p. 2985-2993.

5. Koshland, D.E., Application of a Theory of Enzyme Specificity to Protein Synthesis.

Proceedings of the National Academy of Sciences, 1958. 44(2): p. 98-104.

6. Damborsky, J. and J. Brezovsky, Computational tools for designing and engineering

biocatalysts. Curr Opin Chem Biol, 2009. 13(1): p. 26-34.

7. Gora, A., J. Brezovsky, and J. Damborsky, Gates of Enzymes. Chemical Reviews, 2013.

8. Crick, F., Central Dogma of Molecular Biology. Nature, 1970. 227(5258): p. 561-563.

9. Brannigan, J.A. and A.J. Wilkinson, Protein engineering 20 years on. Nat Rev Mol Cell

Biol, 2002. 3(12): p. 964-70.

10. Siegel, J.B., et al., Computational design of an enzyme catalyst for a stereoselective

bimolecular Diels-Alder reaction. Science, 2010. 329(5989): p. 309-13.

11. Rothlisberger, D., et al., Kemp elimination catalysts by computational enzyme design.

Nature, 2008. 453(7192): p. 190-5.

12. Walter, K.U., K. Vamvaca, and D. Hilvert, An active enzyme constructed from a 9-amino

acid alphabet. J Biol Chem, 2005. 280(45): p. 37742-6.

13. Kuhlman, B., et al., Design of a novel globular protein fold with atomic-level accuracy.

Science, 2003. 302(5649): p. 1364-8.

14. Hilvert, D., Design of protein catalysts. Annu Rev Biochem, 2013. 82: p. 447-70.

15. Turner, N.J., Directed evolution drives the next generation of biocatalysts. Nat Chem

Biol, 2009. 5(8): p. 567-73.

16. Jackel, C. and D. Hilvert, Biocatalysts by evolution. Curr Opin Biotechnol, 2010. 21(6):

p. 753-9.

17. Ondrechen, M.J., J.G. Clifton, and D. Ringe, THEMATICS: a simple computational

predictor of enzyme function from structure. Proc Natl Acad Sci U S A, 2001. 98(22): p.

12473-8.

18. Shehadi, I.A., H. Yang, and M.J. Ondrechen, Future directions in protein function

prediction. Mol Biol Rep, 2002. 29(4): p. 329-35.

19. Shehadi, I.A., et al., Active site prediction for comparative model structures with

thematics. J Bioinform Comput Biol, 2005. 3(1): p. 127-43.

20. Shehadi, I.A., et al., THEMATICS is effective for active site prediction in comparative

model structures, in Proceedings of the second conference on Asia-Pacific bioinformatics

- Volume 29. 2004, Australian Computer Society, Inc.: Dunedin, New Zealand. p. 209-215.

21. Ringe, D., et al., Protein structure to function: insights from computation. Cell Mol Life

Sci, 2004. 61(4): p. 387-92.


22. Ko, J., et al., Prediction of active sites for protein structures from computed chemical

properties. Bioinformatics, 2005. 21 Suppl 1: p. i258-65.

23. Wei, Y., et al., Selective prediction of interaction sites in protein structures with

THEMATICS. BMC Bioinformatics, 2007. 8(1): p. 119.

24. Ko, J., et al., Statistical criteria for the identification of protein active sites using

theoretical microscopic titration curves. Proteins: Structure, Function, and

Bioinformatics, 2005. 59(2): p. 183-195.

25. Tong, W., et al., Enhanced performance in prediction of protein active sites with

THEMATICS and support vector machines. Protein Sci, 2008. 17(2): p. 333-41.

26. Tong, W., et al., Partial Order Optimum Likelihood (POOL): Maximum Likelihood

Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties.

PLoS Comput Biol, 2009. 5(1): p. e1000266.

27. Somarowthu, S., et al., High-performance prediction of functional residues in proteins

with machine learning and computed input features. Biopolymers, 2011. 95(6): p. 390-

400.

28. Capra, J.A., et al., Predicting Protein Ligand Binding Sites by Combining Evolutionary

Sequence Conservation and 3D Structure. PLoS Comput Biol, 2009. 5(12): p. e1000585.

29. Sankararaman, S. and K. Sjölander, INTREPID—INformation-theoretic TREe traversal

for Protein functional site IDentification. Bioinformatics, 2008. 24(21): p. 2445-2452.

30. Somarowthu, S. and M.J. Ondrechen, POOL server: machine learning application for

functional site prediction in proteins. Bioinformatics, 2012. 28(15): p. 2078-2079.

31. Brodkin, H.R., et al., Evidence of the participation of remote residues in the catalytic

activity of Co-type nitrile hydratase from Pseudomonas putida. Biochemistry, 2011.

50(22): p. 4923-35.

32. Somarowthu, S., et al., A tale of two isomerases: compact versus extended active sites in

ketosteroid isomerase and phosphoglucose isomerase. Biochemistry, 2011. 50(43): p.

9283-95.

33. Walsh, J.M., et al., Effects of non-catalytic, distal amino acid residues on activity of E.

coli DinB (DNA polymerase IV). Environ Mol Mutagen, 2012. 53(9): p. 766-76.

34. Muller, B.H., et al., Improving Escherichia coli alkaline phosphatase efficacy by

additional mutations inside and outside the catalytic pocket. Chembiochem, 2001. 2(7-8):

p. 517-23.

35. Coleman, J.E., Structure and mechanism of alkaline phosphatase. Annu Rev Biophys

Biomol Struct, 1992. 21: p. 441-83.

36. Lassila, J.K., J.G. Zalatan, and D. Herschlag, Biological Phosphoryl-Transfer Reactions:

Understanding Mechanism and Catalysis. Annu Rev Biochem, 2011. 80(1): p. 669-702.

37. Andrews, L.D., T.D. Fenn, and D. Herschlag, Ground State Destabilization by Anionic

Nucleophiles Contributes to the Activity of Phosphoryl Transfer Enzymes. PLoS Biol,

2013. 11(7): p. e1001599.

38. Andrews, L.D., H. Deng, and D. Herschlag, Isotope-edited FTIR of alkaline phosphatase

resolves paradoxical ligand binding properties and suggests a role for ground-state

destabilization. J Am Chem Soc, 2011. 133(30): p. 11621-31.

39. Kim, E.E. and H.W. Wyckoff, Reaction mechanism of alkaline phosphatase based on

crystal structures. Two-metal ion catalysis. J Mol Biol, 1991. 218(2): p. 449-64.


40. Stec, B., K.M. Holtz, and E.R. Kantrowitz, A revised mechanism for the alkaline

phosphatase reaction involving three metal ions. J Mol Biol, 2000. 299(5): p. 1303-11.

41. Murphy, J.E., X. Xu, and E.R. Kantrowitz, Conversion of a magnesium binding site into

a zinc binding site by a single amino acid substitution in Escherichia coli alkaline