GFP as a tool to monitor membrane protein topology and overexpression in Escherichia coli David Eric Drew
2005
Doctoral thesis 2005 Department of Biochemistry and Biophysics Stockholm University, S-106 91 Stockholm Sweden ISBN 91-7155-160-3, pp. 1-65 Intellecta Docusys, Stockholm 2005 All previously published papers are reprinted with permission from the publisher.
Table of Contents
Abstract
Abbreviations
List of publications
1. Introduction
1.2 Membrane proteins
1.2.1 α-helical architecture
1.2.2 Membrane protein biogenesis
1.2.3 Membrane protein folding
1.2.4 Membrane proteins and lipids
2.1 Membrane protein topology
2.1.1 Topology prediction algorithms
2.1.2 Reliability of topology prediction
2.1.3 Experimental topology mapping
2.2 High-throughput topology mapping of E. coli membrane proteins
2.2.1 A consensus approach for generating topology models
2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli
2.2.3 Combining C-terminal orientation analysis with a consensus-prediction approach
2.2.4 The reliability of topologies generated by a consensus approach
2.2.5 Generating topology models by constraining TMHMM
2.2.6 Why does GFP work as a topology reporter?
2.2.7 Comparing 2D maps to 3D-structures
2.2.8 Summary of high-throughput membrane protein topology mapping
3.1 Membrane protein overexpression
3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane protein overexpression
3.1.2 ‘Trial-and-Error’
3.1.3 Choosing a membrane protein overexpression host
3.1.4 General strategies for membrane protein overexpression in E. coli
3.1.5 The BL21(DE3)pET-system
3.1.6 Membrane protein purification
3.2 High-throughput membrane protein overexpression in E. coli
3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent
3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus
3.2.3 GFP as a membrane protein folding indicator in whole cells
3.2.4 GFP-based screen to optimize membrane protein overexpression
3.2.5 In-gel GFP fluorescence
3.2.6 GFP-based purification pipeline
3.2.7 Recovery of membrane proteins from GFP fusions using a site specific protease
3.2.8 How does this GFP-based method compare to other high-throughput approaches?
3.2.9 Summary of high-throughput membrane protein overexpression
4. Characterization of the membrane protein YedZ
4.1.1 A test case for the GFP-based purification pipeline: YedZ
4.1.2 YedZ is a novel integral membrane flavocytochrome
4.1.3 The possible function of YedZ
5. Conclusions
References
Acknowledgements
Abstract
Membrane proteins are essential for life, and roughly one-quarter of all open
reading frames in sequenced genomes code for membrane proteins.
Unfortunately, our understanding of membrane proteins lags behind that of
soluble proteins, a gap best reflected by the fact that only 0.5% of the structures
deposited in the Protein Data Bank (PDB) are of membrane proteins. This
discrepancy has arisen because their hydrophobicity, which enables them to
exist in a lipid environment, has made them resistant to most traditional
approaches used to study their soluble counterparts. As
such, novel methods are required to advance our knowledge of
membrane proteins.
In this thesis a generic approach for rapidly obtaining information on
membrane proteins from the classic bacterial encyclopedia Escherichia coli is
described. We have developed a Green Fluorescent Protein C-terminal tagging
approach, with which we can acquire information as to the topology and
‘expressibility’ of membrane proteins in a high-throughput manner. This
technology has been applied to the whole E. coli inner membrane proteome, and
stands as an important advance for further membrane protein research.
Abbreviations
BiP         binding protein, Hsp70
C-terminal  carboxy-terminal
ER          endoplasmic reticulum
FRET        fluorescence resonance energy transfer
GFP         green fluorescent protein
GPCR        G-protein coupled receptor
HMM         hidden Markov model
IMAC        immobilized metal affinity chromatography
IPTG        isopropyl-β-D-thiogalactoside
Lep         signal peptidase I, leader peptidase
Mo-MPT      molybdenum-molybdopterin
N-terminal  amino-terminal
NR          nitrate reductase
ORF         open reading frame
PE          phosphatidylethanolamine
PhoA        alkaline phosphatase
Pmf         proton motive force
SRP         signal recognition particle
Tat         twin arginine translocation
TEV         tobacco etch virus
TMs         transmembrane segments
UPR         unfolded protein response
Amino acid designations
Alanine        Ala  A
Cysteine       Cys  C
Aspartic acid  Asp  D
Glutamic acid  Glu  E
Phenylalanine  Phe  F
Glycine        Gly  G
Histidine      His  H
Isoleucine     Ile  I
Lysine         Lys  K
Leucine        Leu  L
Methionine     Met  M
Asparagine     Asn  N
Proline        Pro  P
Glutamine      Gln  Q
Arginine       Arg  R
Serine         Ser  S
Threonine      Thr  T
Valine         Val  V
Tryptophan     Trp  W
Tyrosine       Tyr  Y
List of publications
This thesis is based upon the following publications:

Paper I. Drew D, Sjöstrand D, Nilsson J, Urbig T, Chin CN, de Gier JW, von Heijne G. Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A. 2002 Mar 5;99(5):2690-5.

Paper II. Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melen K, De Gier JW, von Heijne G. Experimentally based topology models for E. coli inner membrane proteins. Protein Sci. 2004 Apr;13(4):937-45.

Paper III. Drew D, von Heijne G, Nordlund P, de Gier JW. Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett. 2001 Oct 26;507(2):220-4.

Paper IV. Drew D, Slotboom D, Friso G, Reda T, Genevaux P, Rapp M, Meindl-Beinker N, Lambert W, Lerch M, Daley DO, van Wijk KJ, Hirst J, Kunji E, de Gier JW. A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci. 2005 Aug;14(8):2011-7.

Other Publications

Urbanus ML, Fröderberg L, Drew D, Bjork P, de Gier JW, Brunner J, Oudega B, Luirink J. Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem. 2002 Apr 12;277(15):12718-23.

Drew D, Fröderberg L, Baars L, de Gier JW. Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta. 2003 Feb 17;1610(1):3-10. Review.

Daley DO, Rapp M, Granseth E, Melen K, Drew D, von Heijne G. Global topology analysis of the Escherichia coli inner membrane proteome. Science. 2005 May 27;308(5726):1321-3.
1. Introduction
All cells are surrounded by a membrane, a barrier that separates the cell from the
environment it faces. The membrane of the cell is mainly composed of lipid and
protein at an average ratio of 1:1 (Boon and Smith, 2002). Lipids are amphipathic.
They consist of polar head groups that favor contact with water, and
hydrophobic tails, made up of acyl carbon chains, which avoid water.
The lipids pack into a fluid bilayer whereby the tails face each other and the head
groups, e.g., phosphate, are in contact with the surrounding water, Figure 1.
Figure 1. Schematic representation of a lipid bilayer; blue spheres represent polar head-groups, yellow sticks represent lipid tails, coloured cylinders represent membrane proteins, and attached sugars are represented by black antlers.
The driving force in the formation of a lipid bilayer is the spontaneous packing of
hydrophobic tails, as the entropy of water is increased during reduction of the
hydrated hydrophobic surface, i.e., the hydrophobic effect. The outcome is a
hydrophobic barrier that most molecules cannot cross without the
aid of proteins embedded in it. Not only are these ‘membrane
proteins’ required to facilitate the transport of various compounds either
passively or actively across the membrane, but they also, for example, impart structural
support, maintain voltage differences, enable interactions with other cells, and
transfer information from the outside to the inside of the cell. In other words
membrane proteins are essential for life, and roughly one-quarter of our genes
code for membrane proteins (Wallin and von Heijne, 1998). Strikingly, at least
~50% of all drugs manufactured today are targeted to membrane proteins
(Muller, 2000).
Unfortunately, our understanding of membrane proteins lags behind that
of soluble proteins, a gap best reflected by the fact that only 0.5% of the
structures deposited in the Protein Data Bank (PDB) are of membrane proteins
(White, 2004). This discrepancy has arisen because their hydrophobicity, which
enables them to exist in a lipid environment, has made them resistant to most
traditional approaches used to study their soluble
counterparts. As such, novel methods are required to advance our knowledge
of membrane proteins.
In this thesis a generic approach for rapidly obtaining information on
membrane proteins from the classic bacterial encyclopedia Escherichia coli is
described. We have developed a Green Fluorescent Protein (GFP) C-terminal
tagging approach, with which we can acquire information as to the topology and
‘expressibility’ of membrane proteins in a high-throughput manner. This
technology has been applied to the E. coli inner membrane proteome, and stands
as an important advance for further membrane protein research.
Before we discuss this work in detail a clearer understanding of membrane
proteins is required.
1.2 Membrane proteins
Like lipids, membrane proteins consist of hydrophobic and hydrophilic parts.
These parts come together to produce two types of membrane protein
architecture, α-helical membrane proteins and β-barrel membrane proteins,
Figure 2. β-barrel membrane proteins are composed of an even number of anti-
parallel β-strands which hydrogen bond laterally to each other in the formation
of the barrel (Schulz, 2003). Amino acid side chains of mixed polarity extend into
the aqueous pore, whilst amino acids with apolar side chains line the outside of
the barrel and project into the lipid bilayer. Because this class of membrane
proteins is restricted to outer membranes of Gram-negative bacteria,
mitochondria and the outer envelope membrane of chloroplasts, it is not further
discussed here.
Figure 2: Two types of membrane protein architecture; (a) an example of an α-helical membrane protein and (b) an example of a β-barrel membrane protein (Walian et al., 2004).
1.2.1 α-helical architecture
The majority of membrane proteins are α-helical membrane proteins (Wallin and
von Heijne, 1998); henceforth they will be referred to simply as ‘membrane
proteins’. α-helical secondary structure is stabilized by main-chain hydrogen
bonding between backbone amide and carbonyl groups four amino acids apart.
Amino acid side chains with different physicochemical properties can extend at
predominantly right angles from the helix, i.e., amino acids with apolar side
chains project into the hydrophobic core of the lipid bilayer. Three dimensional
(3D) structures confirm that α-helices typically span the full-width of the lipid
bilayer, and are often referred to as trans-membrane segments (TMs).
Statistically, TMs are around 20-25 amino acids long with an average tilt angle of
24° to the membrane normal (Ulmschneider et al., 2005), though this tilt can
change to accommodate the thickness of the lipid bilayer (Park and Opella, 2005).
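As a rough consistency check on these numbers, the sketch below estimates how many residues a straight, tilted helix needs in order to cross the hydrophobic core. It assumes the textbook rise of 1.5 Å per residue for an ideal α-helix and a core thickness of 30 Å (the ±15 Å core region defined under ‘Helix core’); the function name is illustrative only.

```python
import math

HELIX_RISE_A = 1.5  # rise per residue of an ideal alpha-helix, in angstrom (assumed textbook value)

def residues_to_span(core_thickness_a: float, tilt_deg: float) -> float:
    """Residues needed for a straight helix, tilted by tilt_deg from the
    membrane normal, to cross a slab of the given thickness."""
    # Only the component of the helix axis along the membrane normal
    # contributes to crossing the slab.
    rise_along_normal = HELIX_RISE_A * math.cos(math.radians(tilt_deg))
    return core_thickness_a / rise_along_normal

upright = residues_to_span(30.0, 0.0)
tilted = residues_to_span(30.0, 24.0)
print(f"{upright:.1f} residues at 0 degrees, {tilted:.1f} residues at 24 degrees")
```

Under these assumptions an upright helix needs 20 residues and a helix tilted by 24° needs about 22, which falls within the 20-25 residue range quoted above.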
- Helix core-
A distance of ±15 Å from the centre of the membrane defines the core region of
the lipid bilayer. This region has the lowest dielectric constant and, as such, charged
residues are uncommon in the middle of TMs (<6%), whereas the hydrophobic amino
acids leucine, valine, isoleucine, and alanine are abundant (~45%) (Ulmschneider et al.,
2005), Figure 3. Amino acids with small side chains, e.g., glycine and serine, are
also common (7% each), facilitating packing between TMs (see section 1.2.3).
Biophysical and biological scales are broadly consistent with statistical
analysis. Charged residues arginine, aspartate, glutamate, and lysine are clearly
disfavored in the middle of the helix (∆Gapp 2.5 to 3.5 kcal/mol) whereas
hydrophobic amino acids leucine, valine, isoleucine, phenylalanine are favoured
(∆Gapp of -0.5 to -0.3 kcal/mol) (Hessa et al., 2005a). Proline, while unfavorable, is
often found in TMs, where it induces a change in helix angle of some functional
significance (Senes et al., 2004); e.g., the sixth TM segment of the voltage-gated
potassium channel Kv1.2 contains a conserved Pro-X-Pro motif which forms a
receptor for its voltage sensor (Long et al., 2005a). Indeed, proline has one of the
largest phenotypic propensities in TM sequences from the Human Gene
Mutation Database (Senes et al., 2004).
Figure 3: Schematic representation of a TM segment in a lipid bilayer; residues with positional preference are indicated by their short-hand nomenclature, e.g., W= tryptophan (see abbreviations).
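To give a feeling for the size of the ∆Gapp values quoted above, the sketch below converts an apparent free energy into an inserted fraction. It assumes a simple two-state equilibrium between inserted and non-inserted forms, p = 1/(1 + exp(∆Gapp/RT)), at 298 K, and it treats each ∆Gapp as the total for a hypothetical segment; both the temperature and the example values are assumptions for illustration, not data from the cited studies.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def inserted_fraction(dg_app_kcal: float, temp_k: float = 298.0) -> float:
    """Fraction inserted for an assumed two-state inserted/non-inserted equilibrium."""
    return 1.0 / (1.0 + math.exp(dg_app_kcal / (R_KCAL * temp_k)))

# Illustrative values only: a favourable, a neutral and an unfavourable case.
for dg in (-0.5, 0.0, 3.0):
    print(f"dG_app = {dg:+.1f} kcal/mol -> inserted fraction {inserted_fraction(dg):.3f}")
```

In this simplified picture a ∆Gapp of zero corresponds to 50% insertion, and a penalty of about 3 kcal/mol reduces the inserted fraction to below 1%.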
- Interfacial regions-
Aromatic amino acids tryptophan and tyrosine have a clear preference (∆Gapp of
-0.6 kcal/mol) for the lipid interface (-25 to -15 Å and 15 to 25 Å), as these residues
can match their amphipathic side-chain character with that of the interfacial lipid
region, Figure 3 (Hessa et al., 2005a). The penalty of moving tryptophan or
tyrosine from the interface to the aqueous domain has been calculated to be 1.85
and 0.94 kcal/mol, respectively (Wimley and White, 1996). In addition, the
terminal placement of tryptophan in a model polyleucine TM segment is enough
to promote a C-terminal-in-orientation (Higy et al., 2004). In contrast,
phenylalanine has no positional preference for the interface (Hessa et al., 2005a;
Ulmschneider et al., 2005).
Charged residues make up one-fifth of the amino acids found in this
region (Ulmschneider et al., 2005). Along with polar residues, they often extend
their side-chains to the aqueous domain to help anchor TMs (Chamberlain et al.,
2004). This ‘snorkeling’ phenomenon is calculated to be stronger in the positively
charged residues lysine and arginine, because their side-chains are longer
and/or because they also interact favourably with negatively-charged
lipid head-groups (Strandberg and Killian, 2003). Snorkeling is also apparent for
the positively charged residues in interfacial helices which make up 30% of the
non-TM fold (Granseth et al., 2005b). Interestingly, lysine can make π-cation
interactions with tyrosine. This pairing promotes additional long-range
electrostatic interactions with negatively charged lipid head-groups (Gromiha
and Suwa, 2005).
As proline can destabilize helices, it is more likely to be found at either end
of the helix (interfacial region), with the C-terminal end tolerating it better than the
N-terminal end (Yohannan et al., 2004). The destabilizing effect of proline is
calculated to be stronger in straight TMs compared to angled TMs (Senes et al.,
2004). Proline may also aid protein folding by promoting the formation of
random coils (Ulmschneider et al., 2005), which make up 70% of the non-TM
segment fold found in this region (Granseth et al., 2005b).
-Non-membranous domains-
The hydrophilic membrane protein parts are composed of N- and C- terminal
tails and ‘loops’ that connect TMs. In all organisms, the frequency of positively
charged residues is higher in cytoplasmically localized non-membranous
domains, an observation that was coined the ‘positive-inside-rule’ (von Heijne,
1989). The preference of these positively charged residues for the cytoplasmic
domain influences the topology of connecting TMs accordingly. The ability of
positively charged residues to dictate the orientation of a TM segment seems to
depend on the overall hydrophobicity of the TM segment, and the distance of the
charged residues from it (Higy et al., 2004; Nilsson et al., 2005). The basis for the
rule still remains unclear. Although it was demonstrated some time ago that the
proton-motive force is required to establish this phenomenon in E.
coli (Andersson and von Heijne, 1994), this offers only a partial explanation as there
is no apparent electrochemical potential across the ER membrane. Recently, it
was reported that charged residues in the translocon itself may play a role by either
attracting or repelling charged amino acids, i.e., by promoting the orientation of
a TM segment before it inserts into the lipid bilayer (Goder et al., 2004).
Cytoplasmic N- and C- terminal tail orientations are predicted to be preferred in
all cells (Wallin and von Heijne, 1998). The percentage of E. coli membrane
proteins with both their N- and C-terminal ends in the cytoplasm was
experimentally measured at 60% (Daley et al., 2005). It appears that helices may
also have a preference for inserting into lipids as pairs (Hermansson and von
Heijne, 2003); the targeting to and insertion of membrane proteins into the
membrane is discussed below.
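The positive-inside rule described in this section can be sketched as a simple count of lysine and arginine residues on alternating sides of the membrane. The function names and the example loop sequences below are hypothetical, and the sketch ignores the effects of TM hydrophobicity and charge distance discussed above.

```python
def positive_charge_bias(loops: list[str]) -> int:
    """Lys+Arg count on the side of the N-terminal tail minus the count on the other side.

    `loops` lists the non-membranous segments in sequence order, starting
    with the N-terminal tail, so consecutive entries lie on alternating
    sides of the membrane.
    """
    def count_kr(segment: str) -> int:
        return segment.count("K") + segment.count("R")

    n_tail_side = sum(count_kr(s) for s in loops[0::2])
    other_side = sum(count_kr(s) for s in loops[1::2])
    return n_tail_side - other_side

def predict_n_terminus(loops: list[str]) -> str:
    """Place the more positively charged side in the cytoplasm ('in')."""
    bias = positive_charge_bias(loops)
    if bias > 0:
        return "in"
    if bias < 0:
        return "out"
    return "undetermined"

# Hypothetical tail and loop sequences for a protein with three TMs.
print(predict_n_terminus(["MKRK", "GS", "RRKL", "DE"]))
```

For the hypothetical example, all six positive charges lie on the side of the N-terminal tail, so the sketch places the N-terminus in the cytoplasm.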
1.2.2 Membrane protein biogenesis
Although soluble domains of membrane proteins can fold autonomously into the
aqueous milieu of the cell, hydrophobic TMs need to be actively assisted into the
lipid bilayer. This assistance surpasses the input of energy required to overcome
the insertion activation barrier imposed by the lipid bilayer, and prevents TMs
from aggregating in the cytoplasm.
How does this work? If a (presumably) α-helical and sufficiently
hydrophobic stretch of amino acids has exited the ribosome tunnel, it will be
interpreted by the cell as a ‘signal’ for targeting to the membrane (Batey et al.,
2000; Huber et al., 2005). This signal is often present at the N-terminus of the
membrane protein, and is typically recognized by the signal recognition particle
(SRP) (Luirink and Sinning, 2004). SRP binds to the polypeptide chain and, at least in
eukaryotes, halts further translation whilst targeting the nascent chain to the
lipid bilayer. Whether or not the ribosome can ‘prime’ SRP by sensing the
presence of a TM segment before it exits the ribosome is a matter of debate
(Houben et al., 2005; Woolhead et al., 2004). At the membrane, SRP makes
contact with the SRP receptor, and the nascent chain is subsequently transferred
to the Sec translocon, a multimeric protein-conducting channel embedded in the
lipid bilayer (Driessen et al., 2001; Van den Berg et al., 2004). Translation
then resumes, and if the targeted nascent chain and/or other segments
downstream are hydrophobic and long enough, they will pass laterally through an
opening in the Sec translocon (Rapoport et al., 2004). The degree of insertion
seems to depend solely on energetically favorable helix-lipid interactions (Hessa
et al., 2005b). In the lipid bilayer, TM folding can be aided early on by other
membrane bound chaperones, such as YidC in the cytoplasmic bacterial
membrane (Houben et al., 2005).
1.2.3 Membrane protein folding
Membrane protein structures can (almost) be considered as ‘inside-out’ soluble
proteins, as the exterior of a membrane protein is on average twice as hydrophobic as
its interior (Adamian et al., 2005). For membrane proteins with multiple TMs,
the TMs must come together to form a functional protein. From a global
perspective the hydrophobic effect drives the formation and subsequent
insertion of α-helices through the translocon - unfolding a 20 amino acid helix in
the lipid bilayer would cost ~40-80 kcal/mol (Schneider, 2004) - but what pushes
helices together?
Thermodynamic contributions of this process have been difficult to assess
because membrane proteins are difficult to purify and do not fold reversibly
under standard laboratory conditions (DeGrado et al., 2003), and reversible folding
is a requirement for measuring folding equilibria. A fully reversible system was, however, recently
established for the β-barrel membrane protein OmpA (Hong and Tamm, 2004).
Considerable understanding of this process has been based on helix dimerization
of glycophorin A (gpA), whereby physical association of GlyXXXGly (a
widespread motif in TMs, Senes et al., 2000) can be conveniently monitored, e.g.,
by analytical ultracentrifugation, fluorescence resonance energy transfer (FRET),
gel-electrophoresis, etc. (White and von Heijne, 2005).
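Locating GlyXXXGly motifs in a TM sequence is a simple pattern search, sketched below. The example sequence is hypothetical and is not the glycophorin A sequence; a lookahead is used so that overlapping motifs are also reported.

```python
import re

# Gly, any three residues, Gly; the lookahead allows overlapping matches.
GXXXG = re.compile(r"(?=G...G)")

def find_gxxxg(sequence: str) -> list[int]:
    """Return 0-based start positions of GlyXXXGly motifs in a one-letter sequence."""
    return [match.start() for match in GXXXG.finditer(sequence)]

# Hypothetical TM-like sequence used only for illustration.
print(find_gxxxg("AILLGAVIGLLGVVA"))
```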
The predominant view is that once inserted in the membrane, helix-helix
association is driven by the formation of favourable electrostatic interactions
between side chains of polar amino acids (Dawson et al., 2002). This is in line
with statistical analyses, as polar residues occupy 20% of all the residues found
in TMs (Dawson et al., 2003), and with the observation that in every TM segment
of every multispanning membrane protein structure solved so far, there is at
least one inter-helical hydrogen bond (Senes et al., 2004). Conceivably, the flip side
of promiscuous electrostatic interactions between TMs is that they could lead to
aggregation through the formation of erroneous hydrogen bonds (Schneider, 2004). Yet this is
not the case. Once driven together, helical packing is coordinated by close,
specific van der Waals interactions of non-polar residues, which often interlink
to build ‘knobs-into-holes’ packing (Engelman et al., 2003). As demonstrated for
TMs in mechanosensitive ion channels, this helix packing can be fine-tuned to
control the function of the protein in a most exquisite way (Edwards et al., 2005).
Mechanosensitive channels are force transducing molecules which move TMs to
open a channel in response to membrane tension (Kung, 2005). Mutations made
in a pore-forming TM segment of a bacterial mechanosensitive channel to
strengthen knobs-into-holes packing against interacting TMs lead to a loss of
function, as the channel does not open under the same magnitude of membrane
tension; in contrast, mutation to the polar amino acid serine makes
the channel easier to open as ‘wild-type’ helical packing is lost (Edwards et al.,
2005).
At short distances, main chain Cα-H···O hydrogen bonds
may also stabilize helices (Senes et al., 2001). Although each such interaction is weak,
there can be many of them, e.g., in photosystem I there are 34 TMs and
75 Cα-H···O hydrogen bonds (Jordan et al., 2001). Helix association can be
strong enough to maintain oligomerisation even in the absence of lipids and in
the presence of a harsh detergent, e.g., the potassium channel KcsA remains a
tetramer in SDS (Krishnan et al., 2005).
Beyond the two-stage model of membrane protein folding (single TM
insertion and packing), bringing helices together also depends on the insertion of
co-factors, the folding of extramembranous polypeptide segments, and the assembly of
membrane protein complexes (Engelman et al., 2003). Lastly, it has been
speculated that the lipids themselves might drive interactions between TMs, as
computational measurements postulate that lipid entropy increases as the
protein-lipid interface decreases (Helms, 2002).
1.2.4 Membrane proteins and lipids
Clearly membrane proteins and lipids go hand-in-hand; lipids define favorable
amino acid residues in helices and dictate the insertion and folding rate of TMs
through the translocon. Not only is it becoming increasingly clear that certain
lipids interact more favorably with some amino acids (e.g., lysine/tyrosine π-
cation long-range interactions to phosphate head groups, Gromiha and Suwa,
2005), or to some membrane proteins (e.g., cardiolipin in the purple bacterial
photosynthetic reaction centre, Fyfe et al., 2005), but lipids to some extent must
also supply different lateral pressure to different membrane proteins (Jensen et
al., 2004).
Lateral pressure in different membranes is increased by the addition of
lipids with unsaturated chains and/or non-bilayer head-groups, e.g.,
phosphatidylethanolamine (PE). One idea is that membrane proteins insert more easily
into a bilayer of lower curvature stress (e.g., as shown by in vitro folding studies
of bacteriorhodopsin into different liposomes), but that a certain degree of lateral
pressure is still needed to maintain a functional state (Booth, 2005). Related to this is
the match between membrane bilayer thickness and TM segment length, that is, the degree of
hydrophobic mismatch between the α-helices and lipid (Jensen and Mouritsen,
2004). As demonstrated in vitro with the bacterial melibiose transporter (there are
many analogous examples), maximum transport is only reached at specific acyl
carbon chain lengths (Jensen and Mouritsen, 2004). Indeed, to seemingly match
the thickness of the lipid bilayer, on average, TMs of Golgi membrane proteins
are five amino acids shorter than those of plasma membrane proteins (Munro,
1998). Lastly, it is clear that lipid composition can affect membrane protein
topology (see next section). For instance, in the absence of PE the first six TMs of
lactose permease (LacY) are inverted; addition of PE after assembly of this partly
inverted protein restores the correct topology (Bogdanov et al., 2002).
2.1 Membrane protein topology
It is envisaged that in the future more rules that govern the architecture of a
membrane protein will be resolved, eventually allowing the construction of
meaningful in silico membrane protein 3D-structure predictions from amino acid
sequence (White and von Heijne, 2005). At present, to bridge the void created by
the lack of membrane protein structures, one can formulate 2D-structure models
using computer algorithms. 2D-structures are commonly referred to as
‘topology’ models, and define the number, position, and orientation of TMs
relative to the membrane.
2.1.1 Topology prediction algorithms
The simplest topology models are produced solely by computer
algorithms. The five topology predictors used in this thesis are described below.
[1] The algorithm TopPred scans for a TM segment in a given amino acid
sequence by searching for ‘threshold’ hydrophobicity over a typical TM segment
length (trapezoid-shaped window of 21 amino acids). The positive-inside-rule is then used
to decide upon TM segment orientation (von Heijne, 1992).
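The window scan described for TopPred can be sketched as follows. This is a simplified illustration, not the TopPred implementation: it substitutes the Kyte-Doolittle hydropathy scale and a rectangular window for the scale and trapezoid window used by TopPred, and the threshold of 1.6 is an assumed value chosen only for demonstration.

```python
# Kyte-Doolittle hydropathy values (substituted here for TopPred's own scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_scores(sequence: str, window: int = 21) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        scores.append(sum(KD[aa] for aa in segment) / window)
    return scores

def candidate_tm_starts(sequence: str, window: int = 21, threshold: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy reaches the (assumed) threshold."""
    return [i for i, score in enumerate(window_scores(sequence, window)) if score >= threshold]

# Hypothetical sequence: a 21-residue polyleucine stretch flanked by charged residues.
test_sequence = "DDKKEE" + "L" * 21 + "RRKKDD"
scores = window_scores(test_sequence)
print("best window starts at position", scores.index(max(scores)))
```

A real predictor would additionally merge overlapping candidate windows into discrete segments and then apply the positive-inside rule to orient them.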
[2] The Memsat algorithm increases the number of states used in TopPred
from two (helix or loop) to five (inside loop, outside loop, inside helix cap, helix
core, and outside helix cap). The likelihood that each amino acid of an input
sequence belongs to each of these states is calculated based on a
database of membrane proteins with well-characterized topology. The most probable
outcome, i.e., the topology, is formulated by the statistical method ‘expectation
maximization’, and orientation/location is agreed upon by incorporating another
dynamic programming algorithm (Jones et al., 1994).
[3] The PHDhtm algorithm estimates only two states (helix or loop), but
unlike TopPred, improves the signal by feeding off a multiple sequence
alignment. Notably, the algorithm has been ‘trained’ using neural networks from
a set of membrane proteins with known topology (Rost et al., 1996).
[4 and 5] The latest generation topology prediction programs, HMMTOP
(Tusnady and Simon, 1998) and TMHMM (Krogh et al., 2001), combine the ‘best’
features of the aforementioned programs. Like Memsat, HMMTOP and
TMHMM take into account different states, five and seven respectively, and
analogous to PHDhtm they use machine-learning algorithms, in this case hidden
Markov models (HMM), to look for amino acid distribution patterns similar to
those defined in the training set. One advantage of TMHMM compared to the
other algorithms is that reliability scores are also generated. Recently, a newer
version of TMHMM was developed that, like PHDhtm, allows the input of multiple
sequence alignments; this improves the TMHMM prediction performance by ~8%
(Viklund and Elofsson, 2004).
2.1.2 Reliability of topology prediction
TMHMM is able to accurately predict the topology of 75% of the membrane
proteins used in training its HMM algorithm (Krogh et al., 2001). However, as
this training sample set is quite small, the predictive power is poorer for
previously unseen membrane proteins, 55-60% (Melen et al., 2003). The sample
set is also biased, as experimentally determined topologies have favored those
membrane proteins that were easier to analyze because they have
clearly defined topological features, i.e., unusually hydrophobic TM segments
and/or an obvious positive charge difference between inside and outside loops
(Melen et al., 2003). As many of the easy-to-analyze proteins are prokaryotic in
origin, eukaryotic membrane proteins are underrepresented in all training sets
(Ott and Lingappa, 2002). Thus, the predictive performance by TMHMM for
eukaryotic membrane proteins is slightly worse, ~50% (Melen et al., 2003).
Highly reliable topology models can be generated by combining the
aforementioned five prediction methods, TopPred, Memsat, PHDhtm,
HMMTOP, and TMHMM; when all methods agree the topology is virtually
certain to be correct, whereas the fraction of correct topologies decreases with
increasing disagreement between the methods (Nilsson et al., 2000).
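The idea of scoring agreement between the five methods can be sketched as a simple vote. Each prediction is reduced here to a pair (N-terminal location, number of TMs); this representation, the function name, and the example predictions are assumptions made for illustration and do not reproduce the procedure of Nilsson et al. (2000) in detail.

```python
from collections import Counter

def consensus_topology(predictions: dict[str, tuple[str, int]]):
    """Return the most common topology, its number of votes, and the number of methods.

    `predictions` maps a method name to (N-terminal location, number of TMs).
    """
    counts = Counter(predictions.values())
    topology, votes = counts.most_common(1)[0]
    return topology, votes, len(predictions)

# Hypothetical predictions for one protein; four of five methods agree.
example = {
    "TopPred": ("in", 6),
    "Memsat": ("in", 6),
    "PHDhtm": ("in", 5),
    "HMMTOP": ("in", 6),
    "TMHMM": ("in", 6),
}
topology, votes, total = consensus_topology(example)
print(f"consensus {topology} supported by {votes} of {total} methods")
```

The number of supporting methods then serves as a rough reliability indicator: full agreement corresponds to the most trustworthy class of models.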
An approach to improve the membrane protein topology prediction is to
bioinformatically anchor domains in a prediction which are 100% certain to lie on
either one or the other side of the membrane, e.g., a cytosolic tyrosine
phosphatase domain. In eukaryotic genomes such domains provide 11%
coverage (Bernsel and Von Heijne, 2005). Alternatively, one can experimentally
map the location of loops and tails in a membrane protein by a variety of
methods (explained below). Just determining the C-terminal tail location of E. coli
membrane proteins helps TMHMM to improve its overall prediction accuracy
from 55 to 70%, as this location can then be fixed in the topology prediction
(Melen et al., 2003).
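How a known C-terminal location narrows down the possible topologies can be illustrated with a post-hoc filter over ranked candidate models. This is only a sketch of the logic: it does not reproduce how TMHMM itself is constrained, and the candidate scores below are hypothetical.

```python
def c_terminus(n_terminus: str, n_tms: int) -> str:
    """Side of the C-terminus given the N-terminal side ('in' or 'out') and the TM count."""
    if n_tms % 2 == 0:
        return n_terminus  # an even number of crossings returns to the starting side
    return "out" if n_terminus == "in" else "in"

def best_consistent(candidates, observed_c_terminus: str):
    """Pick the highest-scoring candidate that agrees with the observed C-terminal side.

    `candidates` is a list of (score, N-terminal location, number of TMs).
    Returns None if no candidate is consistent with the observation.
    """
    consistent = [c for c in candidates if c_terminus(c[1], c[2]) == observed_c_terminus]
    return max(consistent, default=None)

# Hypothetical ranked models; the top-scoring one conflicts with a cytoplasmic C-terminus.
models = [(0.6, "in", 5), (0.3, "in", 6), (0.1, "out", 5)]
print(best_consistent(models, "in"))
```

In the example, the highest-scoring unconstrained model places the C-terminus outside, so the experimental constraint promotes the second-ranked model instead.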
2.1.3 Experimental topology mapping
Experimental approaches are often used to refine in silico topology models which
are not only biased, but (in general) are likely to miss details which are hard, if
not impossible, to predict, e.g., unanticipated inter- and intra- protein
interactions (Ott and Lingappa, 2002). One approach of obtaining information is
to use site-directed mutagenesis to introduce amino acids which are compatible
with different topology determination methods, e.g., cysteine scanning,
glycosylation mapping, and proteolytic cleavage.
For eukaryotic membrane proteins the most common method is
glycosylation mapping, which takes advantage of the fact that N-linked
glycosylation - the addition of ~2.5kDa worth of sugars to Asn-X-Ser/Thr
acceptor sequences - is possible only within the luminal compartment of the ER.
In practice, after adding glycosylation acceptor sequences into the predicted
soluble parts of the membrane protein by site-directed mutagenesis, the
membrane protein is transcribed and translated in vitro. The addition of sugars to
the membrane protein is distinguished from unglycosylated forms by the slight
difference in molecular weight after separation by SDS-PAGE (Nilsson and von
Heijne, 1993).
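Before engineering new acceptor sites, it is useful to list the Asn-X-Ser/Thr sequences already present in a protein. The sketch below does this with a regular expression; it additionally assumes the commonly stated restriction that X is not proline, which is not spelled out in the text above, and the example sequence is hypothetical.

```python
import re

# Asn, any residue except Pro (assumed restriction), then Ser or Thr.
# The lookahead allows overlapping acceptor sites to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_acceptor_sites(sequence: str) -> list[int]:
    """Return 0-based positions of the Asn in each Asn-X-Ser/Thr acceptor sequence."""
    return [match.start() for match in SEQUON.finditer(sequence)]

# Hypothetical sequence with two usable sites and one Asn-Pro-Ser that is skipped.
print(find_acceptor_sites("MANGTLLNPSAANAS"))
```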
Perhaps the most labor intensive, and yet the most informative and least
invasive topology mapping method is cysteine scanning. In this method
cysteines are recombinantly added to a cysteine-less membrane protein, and
their localization within the membrane protein mapped by membrane permeable
or impermeable thiol-reagents (Bogdanov et al., 2005). This is a powerful method
as it is possible to elucidate the local environment of a single amino acid. This
approach was nicely demonstrated for the secondary-active transporter LacY
(Frillingos et al., 1998).
Another approach for obtaining topology information is to fuse a reporter to all
of the predicted solvent-exposed domains in the membrane protein. The reporter
can be fused end-on, or ‘sandwiched’ into different loops (if the chimera retains
activity) such that the full-length membrane protein is always expressed
(van Geest and Lolkema, 2000). When produced in E. coli the two most common
reporters are enzymes that catalyze a reaction on either one or the other side of
the membrane, i.e., in the cytoplasm or the periplasmic space (see below).
[1] Alkaline phosphatase (PhoA) is a soluble bacterial protein that is only
folded and functional when exported to the periplasm of E. coli where it can form
essential disulfide bonds. It was one of the first topology reporters, and remains one of the
most commonly used. PhoA activity - the hydrolysis of
phosphoric esters - is measured easily with a substrate that changes colour upon
hydrolysis, e.g., p-nitrophenyl phosphate turns yellow. If PhoA remains in the
reducing environment of the cytoplasm it is sensitive to proteolysis because it
cannot form disulfide bonds (Manoil, 1991).
[2] β-galactosidase (LacZ) is a large tetrameric cytoplasmic enzyme, part of
the classic ‘lac operon’ which hydrolyzes lactose into galactose and glucose. It
complements PhoA as it is only active in the cytoplasm; when targeted to the
periplasm it becomes trapped in the membrane, and inactive. Its activity can also
be measured colorimetrically, as it turns the chromogenic substrate X-gal (5-
bromo-4-chloro-3-indolyl-β-D-galactoside) blue (Manoil, 1991).
To avoid false-negatives, reporter activity is usually normalized against
protein expression. Protein expression is typically measured by Western-blotting
or immunoprecipitations (IPs) (van Geest and Lolkema, 2000). Thus, analyzing
many fusions is often labor intensive. A disadvantage with LacZ is that it may
generate false-positives as a result of many artifacts, e.g., saturation of the export
machinery. In contrast, PhoA is reported to be more reliable because an active
fusion has to be successfully exported to the periplasm. In principle, a
combination of PhoA / LacZ reporters to the same sites in the membrane protein
is best. Unfortunately, ambiguous high LacZ and PhoA reporter activities to
identical fusion sites have been reported in many cases (van Geest and Lolkema,
2000).
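The normalization step described above can be illustrated with a minimal sketch. The function name, the arbitrary units, and the example numbers are assumptions made for illustration only; they are not taken from the cited studies.

```python
def normalized_activity(raw_activity, expression_level):
    """Return reporter activity per unit of expressed fusion protein.

    raw_activity: measured reporter activity (arbitrary units).
    expression_level: relative amount of fusion protein, e.g. a band
        intensity from a Western blot (arbitrary units, must be > 0).
    """
    if expression_level <= 0:
        raise ValueError("expression level must be positive")
    return raw_activity / expression_level


# Hypothetical fusions: B shows a lower raw activity than A only because
# it is expressed at a lower level; after normalization they are equal.
fusion_a = normalized_activity(raw_activity=200.0, expression_level=4.0)
fusion_b = normalized_activity(raw_activity=50.0, expression_level=1.0)
print(fusion_a, fusion_b)
```

Without this step, fusion B might be scored as a false negative even though its specific activity matches that of fusion A.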
Papers I and II
This thesis deals with the development of GFP as a high-throughput cytoplasmic
membrane protein topology reporter. GFP can be used in combination with the
periplasmic reporter PhoA, to rapidly establish the C-terminal tail orientation of
a membrane protein. The usefulness of combining this information with
bioinformatics to generate reliable topology models is shown.
2.2 High-throughput topology mapping of E. coli membrane proteins
High-throughput topology mapping requires a methodology that can
simultaneously handle many membrane proteins, is reliable, robust, and easy to
use. We have found that this is most easily accomplished for E. coli and
Saccharomyces cerevisiae membrane proteins in their respective hosts, by
combining topology prediction with minimal experimental information (Paper I;
Kim et al., 2003). Here we will focus only on the high-throughput topology
mapping of membrane proteins in E. coli. Topology prediction is best generated
by a ‘consensus approach’ or by constraining TMHMM (as explained in section
2.1.2). When analyzing many membrane proteins in E. coli, minimal experimental
information is best obtained using single end-to-end C-terminal reporter-protein fusions,
rather than the other approaches described in section 2.1.3.
2.2.1 A consensus approach for generating topology models
For about 80 out of the predicted 737 multispanning membrane proteins in E.
coli, five prediction programs (section 2.1.1) agree on the location of the N-
terminus, but disagree on the location of the C-terminus because their predictions differ by
one TM segment. When the analysis of such cases was applied to a
membrane protein test set of known topology, the correct topology could always
be inferred from either one of the two majority predictions (Nilsson et al., 2000).
Thus, the reliability of the prediction is very high when all the methods agree,
and the correct topology can be simply determined by assigning the C-terminal
tail location of the membrane protein.
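The logic of this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example protein are hypothetical, and the sketch relies on nothing more than the fact that each TM segment crosses the membrane once, so the parity of the TM count fixes the C-terminal location once the N-terminal location is known.

```python
def c_terminus(n_terminus, n_tm):
    """Side of the membrane on which the C-terminus lies.

    Each TM segment crosses the membrane once, so an even number of
    TMs leaves both termini on the same side and an odd number puts
    them on opposite sides.
    """
    if n_terminus not in ("in", "out"):
        raise ValueError("n_terminus must be 'in' or 'out'")
    if n_tm % 2 == 0:
        return n_terminus
    return "out" if n_terminus == "in" else "in"


def select_topology(n_terminus, candidate_tm_counts, observed_c_terminus):
    """Keep the candidate TM counts consistent with the observed C-terminus."""
    return [n for n in candidate_tm_counts
            if c_terminus(n_terminus, n) == observed_c_terminus]


# Hypothetical protein: all predictors place the N-terminus in the
# cytoplasm ("in") but are split between 5 and 6 TM segments.
print(select_topology("in", [5, 6], observed_c_terminus="in"))  # [6]
```

Because the two competing predictions differ by exactly one TM segment, they always imply opposite C-terminal locations, which is why a single experimental observation is enough to choose between them.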
2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli
Because historically used cytoplasmic reporters (e.g., LacZ) are prone to artifacts,
we decided that the development of a new topology reporter
would greatly benefit the C-terminal mapping of many membrane proteins in E.
coli. For this reason, we sought to establish if GFP could be used to monitor
membrane protein topology. GFP was selected because it is incorrectly folded
and does not fluoresce when targeted to the periplasm of E. coli with a Sec-type
signal peptide (Feilmeier et al., 2000). This finding suggested that it would be
likewise inactive when fused to periplasmic membrane protein segments.
Importantly, GFP is compatible with the aforementioned high-throughput
criteria: fluorescence from E. coli cells expressing membrane protein-GFP
fusions is easy to measure, and only the amount of protein that is membrane
embedded is fluorescent (Paper III). To test if GFP could be used to assign the C-
terminal tail orientation of a membrane protein, GFP was fused to the C-terminal
tail of the membrane protein leader peptidase (Lep/periplasmic C-terminus) and
to its positive charge rearrangement mutant, inverted leader peptidase
(Lepinv/cytoplasmic C-terminus). Lep/Lepinv-GFP fusions were expressed
under standard conditions (section 3.1.4).
Induced expression at a temperature of 37°C produced clear differences in
Lep and Lepinv GFP fluorescence. The mutant Lepinv with the cytoplasmic C-
terminus was ~10-fold more fluorescent in liquid culture than Lep (Paper I). At
the lower temperature of 25°C the difference was smaller; cells were therefore
always cultured at 37°C (Figure 4a). After Western blotting using antibodies
directed against either GFP or Lep, it was apparent that the Lep-GFP fusion was
degraded (Figure 4c). As a further control, other membrane protein-GFP fusions
with cytoplasmic C-terminal tails were tested (Figure 4b). Fusions to membrane proteins
with periplasmic C-terminal tails accumulate to lower levels, perhaps due to
degradation, and are consistently less fluorescent (Paper I).
Figure 4: GFP as an E. coli cytoplasmic topology reporter. A) Lep-GFP vs. Lepinv-GFP whole-cell fluorescence, B) ExbB-, SecF-, Lepinv-, Lep-, Sec- GFP whole-cell GFP fluorescence, C) Western blotting of Lep-GFP and Lepinv-GFP after induced expression at 25°C (lanes 2, 5) or 37°C (lanes 3, 6); decorated with either Lep antibody (top panel) or GFP antibody (bottom panel), D) Contrasting PhoA (top graph)/GFP (bottom graph) activities for 12 E. coli membrane proteins that adhere to the majority-vote criterion (Paper I).
2.2.3 Combining C-terminal orientation analysis with a consensus-prediction approach
PhoA and GFP C-terminal fusions were made to an initial set of 12 membrane
proteins, MarC, PstA, TatC, YaeL, YcbM, YddQ, YdgE, YedZ, YgjV, YiaB, YigG,
and YnfA, out of a possible 80 or so E. coli membrane proteins that adhered to
our consensus criterion.
After expression of fusions, as before, GFP and PhoA activities were
measured. Cut-off values for what was considered ‘high’ or ‘low’ GFP
fluorescence were arbitrarily decided based on the differences between Lepinv-
GFP (cytoplasmic C-terminus), and Lep-GFP (periplasmic C-terminus)
fluorescence (Paper I). A ‘high’ fluorescent signal over a certain threshold (12,000
units) allowed a cytoplasmic location to be tentatively assigned. A ‘low’
fluorescent signal was considered ambiguous, as it is impossible to distinguish
between poorly expressing membrane proteins and those with periplasmic C-
terminal tails. The location of the C-terminus was established when the
fluorescent activity was in agreement with the activity from the periplasmic
reporter PhoA, Figure 4d (section 2.1.3).
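The assignment rule described above can be written out as a short sketch. The GFP threshold of 12,000 units is the value quoted in the text; the PhoA cut-off, the function name, and the example readings are hypothetical and serve only to illustrate the decision logic.

```python
GFP_HIGH = 12_000   # fluorescence units; threshold quoted in the text
PHOA_HIGH = 100.0   # hypothetical PhoA cut-off, for illustration only


def assign_c_terminus(gfp, phoa, gfp_high=GFP_HIGH, phoa_high=PHOA_HIGH):
    """Assign a C-terminal location from contrasting GFP/PhoA activities.

    A location is assigned only when the two reporters agree: high GFP
    with low PhoA indicates a cytoplasmic C-terminus, high PhoA with
    low GFP a periplasmic one. Everything else is left ambiguous,
    because low GFP alone cannot distinguish poor expression from a
    periplasmic C-terminal tail.
    """
    gfp_is_high = gfp > gfp_high
    phoa_is_high = phoa > phoa_high
    if gfp_is_high and not phoa_is_high:
        return "cytoplasmic"
    if phoa_is_high and not gfp_is_high:
        return "periplasmic"
    return "ambiguous"


# Hypothetical readings (GFP fluorescence, PhoA activity)
print(assign_c_terminus(30_000, 10.0))   # cytoplasmic
print(assign_c_terminus(2_000, 400.0))   # periplasmic
print(assign_c_terminus(2_000, 10.0))    # ambiguous
```

Treating both "low/low" and "high/high" readings as ambiguous reflects the requirement that the two reporters give contrasting activities before a location is accepted.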
Only two of the 12 membrane proteins (YaeL, YigG) had insufficient
differences between the PhoA and GFP activities to be certain of the location of
the C-terminus. For these two membrane proteins and a control, truncated GFP
fusions were made to clarify the C-terminal tail orientation. The final C-terminal
tail locations were then used to ascertain the correct topology predictions (Paper
I).
2.2.4 The reliability of topologies generated by a consensus approach
Encouraged by the consistent contrasting PhoA/GFP activity profiles used to
map topologies of 12 E. coli membrane proteins, C-terminal PhoA/GFP fusions
were made to another 37 E. coli membrane proteins and analyzed (Paper II). A
few membrane proteins included in this test set had a known topology. The GFP
activity from these membrane proteins was used to refine the original ad hoc
cut-off values, derived from the contrasting Lep/Lepinv-GFP activities, for the
assignment of unambiguous C-terminal tail locations.
For 34 out of the 37 membrane proteins, contrasting PhoA and GFP
activities were sufficient to assign a C-terminal tail location. This brought the
total number of topologies mapped up to 46 (Paper II). After analyzing these 46
topologies it was clear that the majority prediction is most likely to offer the
correct topology; when 4 out of the 5 topology predictors agreed, the majority
prediction was correct with regard to the location of the C-terminus 90% of the
time.
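A majority vote over the five predictors can be sketched as follows. The function name and the example votes are hypothetical; the sketch only shows how a 4-out-of-5 agreement on the C-terminal location would be detected.

```python
from collections import Counter


def majority_prediction(predictions):
    """Return (most common prediction, number of predictors voting for it)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    location, votes = Counter(predictions).most_common(1)[0]
    return location, votes


# Hypothetical C-terminal locations predicted by five programs
votes = ["in", "in", "out", "in", "in"]
print(majority_prediction(votes))  # ('in', 4)
```

With an odd number of predictors and only two possible locations, a tie cannot occur, so the vote always yields a single majority prediction.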
How do these topology models compare to other topology studies? While
the topology prediction for TatC (an essential component of the TAT-translocase,
Palmer and Berks, 2003), with 6 TMs and cytoplasmic N- and C-termini, was later
interpreted to have only 4 TMs (Gouffi et al., 2002), other independent studies
have concurred with the topology prediction generated by our approach
(Behrendt et al., 2004). The topology determined for YaeL, a protein that belongs
to a family of membrane-embedded metalloproteases (Rudner et al., 1999), was
also the same as that previously determined for the related Bacillus subtilis protein
SpoIVFB as regards the location of the conserved HEXXH and NPDG motifs
relative to the inner membrane (Green and Cutting, 2000).
The consensus approach and the use of GFP as a topology reporter have
since been used by other researchers (Culham et al., 2003; Gandlur et al., 2004;
Jakubowski et al., 2004; McMurry et al., 2004; Severance et al., 2004).
2.2.5 Generating topology models by constraining TMHMM
Although the consensus approach is a useful strategy for generating reliable
topology models, it covers only ~10% of the α-helical membrane proteins in E.
coli. An alternative approach is to ‘feed’ into TMHMM the location of
experimentally determined amino acids, e.g., C-terminal tails. When this was
tested in silico, using a data set of 233 membrane proteins of known topology, the
overall prediction performance for TMHMM increased from ~70% unconstrained
to ~80% constrained (Melen et al., 2003). Somewhat unexpectedly, the prediction
performance actually gets worse if the residue to be fixed is not restricted to the
N- or C- terminus, but is chosen based on the "lowest probability loop residue"
selected from a TMHMM probability prediction profile. The main reason for this
is that loop regions predicted with greatest uncertainty, in fact, frequently
correspond to true transmembrane regions making this approach unfeasible
(Paper II).
To establish the C-terminal tail orientation, as before, dual PhoA/GFP
fusion reporters can be used (Papers I and II). The constraining of TMHMM for
generating improved topology models has been successfully applied to the entire
E. coli inner membrane proteome (Daley et al., 2005). Contrasting PhoA/GFP
activities were sufficient to assign unambiguous C-terminal tail locations for 75%
of the inner membrane proteome. Many of these proteins shared high homology
to another membrane protein in the genome. These membrane proteins were
used to assign C-terminal tail locations to membrane proteins not initially
mapped by this approach; the final coverage was ~90%. This topological
information has been extrapolated to assign topology maps to another 51,208
homologous membrane proteins in other bacterial genomes (Granseth et al.,
2005a).
2.2.6 Why does GFP work as a topology reporter?
Given that it is possible to export correctly folded GFP to many cellular
organelles (Tsien, 1998), including the periplasm of E. coli with a Sec-independent
TAT-signal peptide (Thomas et al., 2001), why is GFP not fluorescent in the
periplasm when targeted to this compartment with a Sec-type signal sequence?
As it is possible, after acid-base treatment, to refold periplasmic GFP so that it
becomes fluorescent, it suggests that Sec-exported GFP is simply incorrectly
folded (Feilmeier et al., 2000). Our results indicate that the misfolded GFP is
sensitive to proteolysis when fused to periplasmic membrane protein segments
(Papers I and II); similar degradation has been noted for a few soluble proteins
terminally fused to membrane protein segments (Pourcher et al., 1996). GFP and
PhoA have now been used to assign the C-terminal tail location of over 500 E. coli
membrane proteins. In 71 out of 72 of the cases where the C-terminus of the
membrane protein was convincingly established beforehand (i.e., 3D-structure or
biochemical analyses), the PhoA/GFP assignments were in total agreement
(Daley et al., 2005).
2.2.7 Comparing 2D maps to 3D-structures
How often do topology predictions get it right? This is difficult to address as
there are so few membrane protein structures. If we consider topology as the
number of TM segments and their orientation relative to the membrane, the
constrained TMHMM topology predictions, compared to structure, are more
than 80% correct; the most frequent error is to leave one TM out. If we include
identifying reentrant loops, interfacial helices, and the exact positioning of
helices, topology predictions are (presently) only a first-step towards
understanding structure-function relationships. Understanding structural details
to this level is typically only possible with a high-resolution structure; section 3
will expand on this challenge.
2.2.8 Summary of high-throughput membrane protein topology mapping
In the absence of a 3D structure, one way to gain structural information of any
membrane protein is to determine its topology, i.e., the number, position, and the
overall in-out orientation of TMs relative to the membrane. In E. coli, this step is
usually accomplished by using reporter enzymes such as PhoA or LacZ fused to
different portions of the membrane protein. Usually, the number of reporter
fusions that needs to be made and analyzed for a complete topology
determination is equal to or larger than the number of TMs in the membrane
protein, thus requiring significant experimental effort.
We have shown that a reliable membrane protein topology can be simply
and rapidly deduced from a combination of in silico topology predictions and
single C-terminal PhoA/GFP reporter-protein fusions (Paper I). Although this
approach might have been possible using classical PhoA and LacZ fusions, GFP
offers an attractive alternative; the assay requires little experimental set-up,
measurements are completed in seconds, and, as GFP fluorescence is proportional to
the amount of folded fusion protein, GFP activity (in contrast to that of enzymatically active fusions)
does not need to be normalized to (quantified) protein expression (Paper
I). Indeed, after ambiguous results with classical PhoA/LacZ fusions, GFP has
been used to clarify the topology of the ABC transporter, DrrB (Gandlur et al.,
2004).
After a few modifications, this approach was possible on a larger scale
format (Paper II), and was extended to determine C-terminal locations, and
subsequently constrained TMHMM topology models for the entire E. coli inner
membrane proteome (Daley et al., 2005). This proteome information has been
used to update the Swiss-Prot and NCBI databases.
3.1 Membrane protein overexpression
One of the main obstacles towards understanding membrane proteins is the
difficulty of obtaining pure material for biochemical and
structural analysis (Grisshammer and Tate, 1995). Most membrane proteins
overexpress very poorly - typically less than 1 mg/L - if they do at all. This is a
huge problem. Recently, in the magazine Nature it was stated that “… labs
around the world aim to add membrane proteins (structures) to international
databases over the next five years. But to do so, they must first be able to churn
out milligrams of easily purified protein” (Hoag, 2005).
3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane
protein overexpression
Why do membrane proteins overexpress poorly? Intuitively, it seems that there
might be a limit to the availability of membrane protein biogenesis components
and space available in the lipid bilayer. Not only does the overexpression of
membrane proteins require the availability of components like, e.g., SRP and the
Sec translocon, to faithfully target and insert multiple copies of a membrane
protein into a suitable lipid bilayer, but the lipid bilayer is also obliged to
accommodate this ‘extra’ protein without compromising the membrane integrity
of the cell (Drew et al., 2003). In support of this idea are the following
observations:
- upon overexpression of membrane proteins in E. coli, SRP is titrated (Valent et al., 1997);
- the overexpression of membrane proteins in yeast can lead to activation of the unfolded protein response (UPR) (Griffith et al., 2003), a mechanism against ER stress caused by unfolded protein (Kaneko and Nomura, 2003);
- by keeping expression levels low enough to reduce the UPR, one can increase the amount of functionally expressed membrane protein (Griffith et al., 2003);
- the functional expression of the serotonin transporter in insect cells can be enhanced nearly 3-fold by co-expressing the ER luminal folding chaperones calnexin and, to a lesser degree, calreticulin and BiP (Tate et al., 1999).
In terms of lipid capacity, it was shown that expressing GPCRs in the eye
of the fly - a membrane dedicated almost exclusively to the GPCR rhodopsin - is
highly successful (Eroglu et al., 2002), and that the bacterium Lactococcus lactis is
a suitable host for membrane protein overexpression perhaps because of the
small number of endogenous membrane proteins (Kunji et al., 2003). Lastly, E.
coli mutant strains with improved membrane protein overexpression
characteristics were isolated (Miroux and Walker, 1996). After overexpression of
a membrane protein, the cells were biochemically analyzed and visualized under
an electron microscope; it was clear that for one of these strains the cell had
proliferated extra internal membranes (Arechaga et al., 2000).
3.1.2 ‘Trial-and-Error’
As the focus of the majority of expression studies has been to obtain functionally
expressed membrane protein, rather than analyzing membrane protein
overexpression per se, we do not know how generic the aforementioned problems
are. What is clear is that this is not the whole story. There are many other case-
by-case examples of further factors which may influence the ability to obtain
well-expressed functional membrane protein:
- the membrane protein is susceptible to degradation, e.g., by the ATP-dependent integral membrane protease FtsH (Ito and Akiyama, 2005);
- the membrane protein is unstable if overexpressed without its complex partner(s), e.g., SecY, the pore-forming component of the translocon, is rapidly degraded if expressed without SecE (Ito and Akiyama, 2005; Kihara et al., 1995);
- the composition of the membrane is unsuitable (Freedman et al., 1999);
- the membrane protein needs to be post-translationally modified, e.g., by N-linked glycosylation, which is impossible in most bacterial expression systems (Tate and Blakely, 1994);
- the mRNA for the membrane protein is unstable (Afonyushkin et al., 2003; Arechaga et al., 2003).
In principle, by studying the expression of a large number of membrane
proteins one could find some correlation between membrane proteins that
‘express poorly’ to those that ‘express well’ (Drew et al., 2003), e.g., membrane
proteins with multiple TMs are thought to give lower expression than those
containing fewer TMs (Grisshammer and Tate, 1995). Unfortunately, von Heijne
and co-workers did not find any correlation in any amino acid sequence
parameter tested between poor vs. well expressing membrane proteins for more
than 300 E. coli membrane proteins expressed in E. coli, e.g., size, degree of
hydrophobicity, number of TMs (Daley et al., 2005).
Our current lack of understanding means that membrane protein
‘expressibility’ cannot be predicted prior to experimental testing.
3.1.3 Choosing a membrane protein overexpression host
There are many approaches used in the overexpression of membrane proteins. In
general, it is preferred to overexpress membrane proteins into the membrane, as
the success rate of refolding membrane proteins from inclusion bodies is very
low (Drew et al., 2003). For obvious reasons, one would like to overexpress
membrane proteins in their endogenous host. This is not always possible; the
higher the organism from which the membrane protein comes, the greater
the cost and time needed for successful overexpression in a comparable
host.
E. coli is often the first vehicle tested in the overexpression of both pro-
and eukaryotic membrane proteins; it is widely available, easy to work with,
versatile, and cheap to use. Because of these factors numerous
membrane protein structures have been solved from material overexpressed in E.
coli; transporters (Abramson et al., 2003; Huang et al., 2003; Hunte et al., 2005;
Locher et al., 2002; Ma and Chang, 2004; Reyes and Chang, 2005; Yamashita et al.,
2005), respiratory proteins (Abramson et al., 2000; Bertero et al., 2003), ion
channels (Chang et al., 1998; Doyle et al., 1998; Dutzler et al., 2002), and other
channels (Fu et al., 2000; Khademi et al., 2004; Savage et al., 2003; Van den Berg et
al., 2004).
Unfortunately, there is only one example of a eukaryotic membrane
protein structure elucidated from overexpressed material, i.e., the rat voltage-
gated shaker K+ channel Kv1.2 (Long et al., 2005a). In this case the material was
not obtained by expression in E. coli, but in the yeast Pichia pastoris. Other
eukaryotic membrane protein structures have been solved, but with a membrane
protein that was isolated from naturally abundant sources, e.g., rhodopsin from
the bovine eye (Palczewski et al., 2000). While eukaryotic membrane proteins can
express well in E. coli, see e.g., (Quick and Wright, 2002), expression levels are
typically several orders of magnitude lower than those of their bacterial counterparts (Tate,
2001). If we want to solve eukaryotic membrane protein structures it seems that
the development of new E. coli strains or the use of hosts other than E. coli is
required. Indeed, it is possible to overexpress functional eukaryotic membrane
proteins in yeast, insect and mammalian cells, e.g., GPCRs in yeast (Sarramegna
et al., 2002; Schiller et al., 2001), the serotonin transporter in Sf9 cells using the
baculovirus system (Tate et al., 1999), and the rat glutamate transporter in BHK cells
using the Semliki Forest virus system (Raunser et al., 2005). Interestingly, the Gram-
positive bacterium Lactococcus lactis has been shown to be a successful host for the
overexpression of eukaryotic mitochondrial carriers (Kunji et al., 2005; Kunji et
al., 2003).
3.1.4 General strategies for membrane protein overexpression in E. coli
There are many different strategies in each of the host systems used for
overexpression of a membrane protein. In general they involve adjusting the
type of promoter/plasmid system, culture conditions, and the protein itself by
truncations, mutations, and/or additions of various fusion tags. Here, we will
focus only on the expression of membrane proteins in E. coli.
In E. coli, the membrane protein to be expressed is usually cloned into a plasmid
under control of a tightly regulated and inducible promoter. The number of
plasmid copies per cell, the strength of the promoter, and the homogeneity of the
inducer across the cell population can all affect final yields. In general,
membrane protein overexpression strategies are the same as the ones used for
soluble proteins, see e.g., (Sorensen and Mortensen, 2005), with a few additional
points worth mentioning (outlined below).
Membrane proteins typically inhibit cell growth when overexpressed,
thus it is advisable to use a tight promoter system, e.g., the pBAD promoter
(Morgan-Kiss et al., 2002), or the T7-based promoter in combination with the
plasmid pLysS (Pan and Malcolm, 2000). As membrane proteins typically contain
an N-terminal targeting signal, fusing a soluble protein to the N-terminus of the
membrane protein might be problematic, particularly for membrane proteins
with periplasmic N-terminal tails; large N-terminal domains are almost
non-existent (ProW is a notable exception) in the E. coli inner membrane proteome (Daley
et al., 2005). If the membrane protein naturally has a large extra-cytoplasmic N-
terminal domain, like many GPCRs, an N-terminal signal sequence has
been shown to be required for functional expression (Grisshammer et al., 1993; Weiss
and Grisshammer, 2002; Yeliseev et al., 2005). Indeed, for all GPCRs in sequenced
genomes N-terminal tails longer than 60 amino acids are considerably more
likely to contain a signal peptide (Wallin and von Heijne, 1995). Membrane
proteins seem to express better in E. coli at a temperature of 20-25°C rather than
37°C. Although expression at lower temperatures is often successful for soluble
protein as well (Sorensen and Mortensen, 2005), membrane proteins may be more
sensitive to temperature as they fold co-translationally (section 1.2.2), i.e., over
this temperature range, the translation rate decreases linearly with temperature
(Farewell and Neidhardt, 1998). Plasmids with a cytoplasmic antibiotic resistance
marker (e.g., kanamycin) are recommended over periplasmic antibiotic resistance
markers (e.g., β-lactamase), as this may avoid any extra workload on the Sec
translocon (Ito and Akiyama, 1991).
3.1.5 The BL21(DE3) pET system
In this thesis, membrane proteins were overexpressed using the BL21(DE3) pET
system from a modified pET-28a plasmid (Waldo et al., 1999), which harbors a
kanamycin resistance marker. Protein expression in the pET vector is under the
control of the strong T7 promoter that in concert with the E. coli strain
BL21(DE3), is switched on in the presence of isopropyl-β-D-thiogalactoside
(IPTG), i.e., IPTG induces expression of the gene encoding the T7 RNA
polymerase, which is located on the chromosomally integrated λ prophage DE3
(Studier and Moffatt, 1986). As membrane protein overexpression can be toxic
(Miroux and Walker, 1996), the BL21(DE3) strain is typically used in combination
with a plasmid which constitutively expresses a T7 lysozyme gene, i.e., pLysS/E.
The T7 lysozyme has a low affinity for the T7 RNA polymerase, and dampens
‘leaky’ expression (Pan and Malcolm, 2000). Although some argue that the pET-
based system is not applicable to membrane proteins because it is too strong
(Wang et al., 2003), these plasmid and strain combinations have been used
extensively to successfully overexpress membrane proteins and to obtain
material for structure determination (Kastner et al., 2000; Miroux and Walker,
1996).
3.1.6 Membrane protein purification
Membrane proteins are routinely purified using a detergent (Seddon et al.,
2004). Above its critical micelle concentration, a detergent forms a hydrophobic pocket,
typically a spherical micelle that retains the integrity of the membrane protein as
it extracts it from the lipid. Solubilization of membranes with detergent results in
a mixture of detergent, protein and lipid, in which the amount of lipid is
progressively reduced as the membrane protein is purified in a buffer containing
detergent (Seddon et al., 2004), e.g. by immobilized metal affinity
chromatography (IMAC), anion/cation exchange, size-exclusion
chromatography, etc. Finding the right detergent that retains the function of the
membrane protein can be tricky. The use of shorter chain detergents, e.g., n-
octyl-β-D-glucopyranoside, to increase the number of protein-protein contacts for
protein crystallization, most often results in the membrane protein aggregating
instead. It is becoming increasingly clear that removal of too much lipid can be
detrimental (Fyfe et al., 2005; Long et al., 2005b). A number of structures have
revealed that certain lipids can play definitive functional and/or structural roles,
e.g., cardiolipin in the purple bacteria reaction centre (Fyfe et al., 2004).
Papers III and IV
One of the biggest obstacles towards understanding membrane protein
structure-function relationships is the difficulty of obtaining
milligram quantities of membrane protein. This thesis tackles this challenge by
developing GFP-based methodology to monitor membrane protein
overexpression in the E. coli membrane, and to use GFP as an aid in the
subsequent purification of membrane proteins.
3.2 High-throughput membrane protein overexpression in E. coli
Traditional membrane protein overexpression screening methods in E. coli are
quite laborious. In order to remove inclusion bodies, membranes are typically
first isolated from whole cells before the overexpressed protein is separated by SDS-PAGE
and detected by Coomassie staining and/or Western blotting, neither of which
is ideal for quantifying protein expression. Here, we present
an alternative, superior method. We show that the amount of GFP fluorescence
from E. coli cells expressing membrane protein-GFP fusions is a simple, fast, and
accurate estimate of expression. Not only does it complement the topology
mapping of membrane proteins (section 2.2), but it is easily transferable to many
laboratories, and enables the protein to be visualized during detergent
solubilization and purification.
3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent
Waldo and co-workers showed that a C-terminal GFP fusion could be used to
reliably estimate the overexpression of soluble proteins in E. coli (Waldo et al.,
1999). In short, if a soluble protein was expressed into inclusion bodies, GFP did
not fold and was not fluorescent. In contrast, if the soluble protein was correctly
folded, GFP did fold and was fluorescent. The use of GFP to monitor the
overexpression of soluble proteins in E. coli has been reinforced by others
(Hedhammar et al., 2005). GFP is ideal for this purpose as it requires no
substrates for its fluorescence, is stable, and is easy to measure and quantify
(Tsien, 1998). To ascertain the reliability of monitoring membrane protein
expression with a C-terminal GFP moiety, GFP was fused to the C-terminus of a
number of well-characterized pro- and eukaryotic membrane proteins (Paper
III). This was important to verify, as the folding pathway for membrane proteins
is very different to soluble proteins (section 1.2.3) (Drew et al., 2003). Two of the
test-set membrane proteins (rat olfactory GST-GPCR and M13-procoat) were
known to express into inclusion bodies (Kiefer et al., 1996; Krebber et al., 1997).
The GFP used to test this contains the folding (F64L) and chromophore (S65T)
mutations (Tsien, 1998), and has been evolved in E. coli to have 42-fold higher
(soluble) expression than ‘wild-type’ GFP (Crameri et al., 1996). Under typical
culture conditions in the E. coli strain BL21(DE3)pLysS, membrane protein-GFP
fusions were overexpressed essentially as described in section 3.1.5. This system
was also the same as that shown to be successful for monitoring the expression of
soluble protein-GFP fusions in E. coli (Waldo et al., 1999).
After overexpression, cells were lysed and fractionated by differential
centrifugation. The amount of membrane protein-GFP fusion in these fractions
was measured by a combination of fluorescence and quantitative immunoblotting
(Paper III). After analyzing membrane protein fractions (high-speed spin) it was
clear that GFP fluorescence from isolated membranes was a good estimate of
expression. In contrast, it was apparent that not all of the membrane protein-GFP
fusion left in the unbroken E. coli cells (low-speed spin) was fluorescent. This was
particularly obvious for the M13-GFP and GST-GPCR-GFP fusions, which were
previously shown to express into inclusion bodies. This was also
visualized by Western-blotting samples containing equivalent amounts of GFP
fluorescence from the low-speed and the high-speed spin fractions. Inclusion
bodies from M13-GFP and GST-GPCR-GFP accounted for approximately 50% and 90%
of the total expressed
protein, respectively. They were later isolated by a sucrose step gradient and
were not fluorescent (Paper III); we have since verified this with other
membrane protein-GFP fusions.
Therefore, if the overexpressed fusion protein ends up in the insoluble
fraction as inclusion bodies, GFP is not fluorescent; in contrast, if the fusion is
expressed in the cytoplasmic membrane, GFP does fold and is fluorescent, and
the amount of GFP fluorescence correlates with the amount of protein integrated
in the E. coli membrane (Papers III & IV). Recently, it has been shown that the
highest GFP fluorescence from LacS-GFP overexpression in E. coli (LacS is a
Streptococcus thermophilus lactose transporter) coincides with
maximum LacS-GFP transport activity and not with maximum LacS-GFP production
as judged by Western-blotting (Geertsma, 2005); GFP in this case only monitored
the amount of functionally expressed membrane protein.
3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus
As shown in section 2.2, GFP is inactive when targeted to the periplasm with a
Sec-type signal peptide (Paper I); thus, to use this approach, membrane proteins
must have their C-terminus localized to the cytoplasm. As most membrane
proteins adopt a Cin topology, this is only a minor drawback. The percentage of
multispanning membrane proteins with a Cin topology has been experimentally
measured at 80 and 83% in the E. coli and Saccharomyces cerevisiae genomes,
respectively (Daley et al., 2005; Kim, In preparation), and is predicted to be 70-
75% in all other sequenced genomes (Wallin and von Heijne, 1998).
3.2.3 GFP as a membrane protein folding indicator in whole cells
Since GFP is a slow-folding protein, with t1/2 ~30 min (Fukuda et al., 2000; Waldo et al.,
1999), it is conceivable that GFP works as a folding indicator, when placed at the
C-terminal end of a membrane protein, because there is sufficient time for the
membrane protein to misfold before GFP has folded. The misfolded membrane
protein is most likely degraded by the cell or retained as inclusion bodies (Chang
et al., 2005). Nevertheless, as GFP is very stable once folded, it can remain
fluorescent even if the membrane protein itself is later degraded. This is evident
from cytosolic GFP frequently found in the supernatant of recovered membranes
(Paper III). It seems that the amount of cytosolic GFP is proportional to the
stability of the membrane protein. Similar observations have been made for
membrane proteins fused to other soluble protein tags, e.g., PhoA (Danielsen et
al., 1995; Pourcher et al., 1996). This means that an overexpression estimate made
from whole cells can be misleading. For this reason, the most accurate way to
estimate overexpression is to measure GFP fluorescence in isolated membranes,
as cytosolic GFP is not recovered in this fraction (Paper III). However, as the
isolation of membranes is somewhat laborious, to be ‘high-throughput’ a reliable
estimate has to be possible from whole-cells rather than recovered membranes.
The reliability of whole-cell estimation was therefore
investigated more thoroughly. In short, 48 E. coli membrane protein-GFP fusions
were overexpressed, and 9 of these, chosen to cover a range of
overexpression levels, were purified. Reassuringly, there is a clear correlation
between the amount of whole-cell fluorescence and the amount of purified
membrane protein-GFP fusion (Paper IV).
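As an illustration of how such a correlation could be quantified, the following minimal sketch computes a Pearson correlation coefficient between whole-cell fluorescence and purified yield. The nine data points and the variable names are hypothetical placeholders chosen for illustration; they are not the measurements reported in Paper IV, and the choice of Pearson's r is an assumption rather than the analysis used there.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical readings for 9 fusions (NOT data from Paper IV).
whole_cell_fluorescence = [1200, 2500, 4100, 5200, 6800, 8300, 9900, 12000, 15000]
purified_mg_per_litre = [0.2, 0.4, 0.9, 1.0, 1.5, 1.6, 2.2, 2.4, 3.1]

r = pearson_r(whole_cell_fluorescence, purified_mg_per_litre)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would support using whole-cell fluorescence as a proxy for purified yield; with only nine points, the estimate should be read as indicative only.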
3.2.4 GFP-based screen to optimize membrane protein overexpression
For a number of membrane protein-GFP fusions the whole-cell GFP fluorescence
was measured from 1 ml and 1 L cultures. As there were no significant
differences in the amount of fluorescence per ml, 1 ml is a satisfactory culture
volume for overexpression screening, Figure 5 (Paper IV).
Figure 5: Comparison of GFP fluorescence from 13 membrane protein-GFP fusions cultured in either 1 ml or 1 L.
As the fluorescence from 5 ml cultures grown in a 24-well format is comparable
to that from 1 ml cultures, the expression of many membrane proteins can be rapidly
tested. This was demonstrated by the global analysis of the E. coli inner
membrane proteome (Daley et al., 2005).
Based on the fluorescence from membrane protein-GFP fusions it is
possible to quickly optimize overexpression of a single membrane protein.
Small changes to standard culture parameters can change overexpression yields
dramatically; e.g., inducing with IPTG at an OD600 of 0.4 rather than 0.6, or
with 0.1 mM rather than 0.4 mM IPTG, can almost double the yield
of the putative amino acid transporter YbaT (Drew, In preparation). Each
membrane protein can respond differently to these parameters in different BL21
strains, i.e., BL21(DE3), BL21(DE3)pLysS and the C41/C43 Walker strains (Miroux and
Walker, 1996). At present we are determining the parameters worth screening. So
far, the most consistent parameter for improving yields is to induce expression at
a temperature of 20-25°C instead of 37°C (Paper IV).
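A screen of this kind amounts to comparing whole-cell fluorescence across a small grid of culture conditions. The sketch below ranks conditions by fluorescence per ml and expresses each as a fold change over a chosen baseline. The `Condition` fields mirror the parameters discussed above, but the readings themselves and the helper names are hypothetical and serve only to show the bookkeeping.

```python
from collections import namedtuple

# One culture condition: OD600 at induction, IPTG concentration, temperature.
Condition = namedtuple("Condition", ["od600", "iptg_mM", "temp_C"])


def rank_conditions(readings):
    """Return (condition, fluorescence) pairs sorted from highest to lowest."""
    return sorted(readings.items(), key=lambda item: item[1], reverse=True)


def fold_over_baseline(readings, baseline):
    """Express every reading as a fold change relative to the baseline condition."""
    base = readings[baseline]
    if base <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return {cond: value / base for cond, value in readings.items()}


# Hypothetical whole-cell fluorescence per ml (arbitrary units), not measured data.
readings = {
    Condition(0.4, 0.1, 25): 9800.0,
    Condition(0.4, 0.4, 25): 7400.0,
    Condition(0.6, 0.1, 25): 6900.0,
    Condition(0.6, 0.4, 25): 5100.0,
    Condition(0.4, 0.4, 37): 3900.0,
    Condition(0.6, 0.4, 37): 3000.0,
}

baseline = Condition(0.6, 0.4, 37)
folds = fold_over_baseline(readings, baseline)
for cond, value in rank_conditions(readings):
    print(cond, value, f"{folds[cond]:.2f}x")
```

Because each membrane protein may respond differently, such a table would have to be generated per protein and per strain.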
3.2.5 In-gel GFP fluorescence
The monitoring of membrane protein expression from whole-cells can be further
improved by subjecting a whole-cell sample to standard SDS-PAGE. GFP
remains partially intact under these conditions (where most proteins are
denatured), and exposure of the polyacrylamide gel to UV-light enables
detection of the GFP with a CCD-camera (Drew, In preparation). Thus, the
amount of full-length membrane protein-GFP fusion can also be monitored,
Figure 6.
Figure 6. Verification of whole-cell fluorescence from liquid culture with an in-gel fluorescence assay. A. Expression of YedZ-GFP; quantification of fluorescent 'bands' correlates with whole-cell fluorescence. B. Optimizing expression of YciS-GFP in the BL21(DE3)pLysS strain by lowering the temperature after induction to 30 or 25°C and culturing cells for 4-22 hours after induction. (Drew, In preparation)
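One way to turn an in-gel fluorescence image into a number is to compare the intensity of the full-length fusion band with that of any free GFP band in the same lane. The sketch below is a minimal illustration under stated assumptions: that the two bands are resolved, that band intensity is linear in the amount of fluorescent GFP, and that a single background value applies to the lane. The function name and the intensities are hypothetical.

```python
def full_length_fraction(fusion_band, free_gfp_band, background=0.0):
    """Fraction of in-gel GFP fluorescence found in the full-length fusion band.

    All intensities are in the same arbitrary units; the background is
    subtracted from each band and negative results are clipped to zero.
    """
    fusion = max(fusion_band - background, 0.0)
    free = max(free_gfp_band - background, 0.0)
    total = fusion + free
    if total == 0:
        raise ValueError("no fluorescence above background in either band")
    return fusion / total


# Hypothetical band intensities from one lane (arbitrary units).
fraction = full_length_fraction(fusion_band=900.0, free_gfp_band=300.0, background=100.0)
print(f"full-length fusion: {fraction:.0%} of in-gel fluorescence")
```

Multiplying the whole-cell fluorescence by this fraction would give a rough estimate of the signal attributable to full-length fusion, subject to the same assumptions.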
3.2.6 GFP-based purification pipeline
To establish a generic purification procedure or ‘pipeline’, a His8 tag was fused to
the C-terminus of GFP, giving membrane protein-GFP-His8. After standard
membrane protein-GFP-His8 overexpression and isolation, a number of
membrane proteins were purified by a combination of IMAC and size-exclusion
chromatography. Milligram amounts of E. coli membrane protein fusions, at
levels similar to those reported by others, e.g., (Eshaghi et al., 2005), were
routinely purified from one-liter cultures (Paper IV).
The GFP moiety is also a useful tool in the purification of membrane proteins. With
GFP present one can monitor the ability of different detergents to extract an
overexpressed fusion protein from the membrane. Even though the final choice
of detergent will also depend on the ability to preserve the membrane protein in
a fully functional state, poorly extracting detergents can be quickly eliminated in
this step. The GFP moiety of the membrane protein-GFP fusion also enables the
purification to be followed visually, and the binding efficiency of a fusion to a
column can be seen directly. Lastly, the GFP moiety of the membrane protein-
GFP fusion means it is possible to quickly and accurately determine protein
concentrations (Paper IV).
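The concentration estimate mentioned above can be sketched as a standard-curve calculation: fluorescence from known amounts of purified GFP is fitted to a straight line, and the line is inverted for the sample. The standards, the sample reading and the fusion mass below are hypothetical; the GFP mass of about 27 kDa is an approximate literature value, and a linear fluorescence response over the measured range is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def gfp_concentration(fluorescence, slope, intercept):
    """Invert the standard curve: GFP concentration in the units of the standards."""
    return (fluorescence - intercept) / slope


def fusion_concentration(gfp_ug_per_ml, fusion_kda, gfp_kda=27.0):
    """Scale GFP mass concentration to the full fusion by the ratio of molecular masses."""
    return gfp_ug_per_ml * fusion_kda / gfp_kda


# Hypothetical purified-GFP standards (ug/ml) and their fluorescence readings.
standards_ug_per_ml = [0.0, 5.0, 10.0, 20.0, 40.0]
standard_readings = [50.0, 550.0, 1050.0, 2050.0, 4050.0]

slope, intercept = fit_line(standards_ug_per_ml, standard_readings)
gfp = gfp_concentration(1550.0, slope, intercept)   # hypothetical sample reading
fusion = fusion_concentration(gfp, fusion_kda=54.0)  # hypothetical 54 kDa fusion
print(f"GFP: {gfp:.1f} ug/ml, fusion: {fusion:.1f} ug/ml")
```

The conversion to fusion concentration assumes one folded, fluorescent GFP per fusion molecule.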
3.2.7 Recovery of membrane proteins from GFP fusions using a site specific protease
There are a few cases where purified membrane protein-GFP fusions have been
shown to be functional in vitro, e.g., (Quick and Wright, 2002). Membrane
protein-GFP fusions can also be active in vivo, as we showed for the essential E.
coli membrane protein YidC (see section 1.2.2), i.e., YidC-GFP is functional at
expression levels similar to the endogenous amount of YidC, and localizes to the E.
coli cell-poles (Urbanus et al., 2002). However, as GFP may interfere with the
function of the protein and hinder protein crystallization, a Tobacco Etch Virus
protease cleavage site (ENLYFQG/S) was added to clip off the GFP-His8 moiety
from the membrane protein-TEV-GFP-His8 fusion. TEV protease was chosen
because it is a non-commercial, specific protease, it is easily produced in large
quantities, and it is active in the presence of many detergents (Mohanty et al.,
2003).
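The effect of the cleavage site on a fusion sequence can be illustrated with a short sketch. TEV protease cuts the recognition sequence ENLYFQ(G/S) between the glutamine and the following glycine or serine, so the recovered membrane protein retains ENLYFQ at its C-terminus. The peptide used below is a made-up toy sequence, not an actual construct from Paper IV, and the function name is hypothetical.

```python
import re

# ENLYFQ followed by G or S; the lookahead keeps G/S out of the match so that
# the end of the match is exactly the scissile bond (between Q and G/S).
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(sequence):
    """Split a one-letter protein sequence at the first TEV site.

    Returns (n_terminal_product, c_terminal_product), or None if no site is found.
    """
    match = TEV_SITE.search(sequence)
    if match is None:
        return None
    cut = match.end()
    return sequence[:cut], sequence[cut:]


# Toy fusion: "protein" + TEV site + "tag" ending in His8 (illustrative only).
toy_fusion = "MKTAA" + "ENLYFQG" + "SKGEE" + "HHHHHHHH"
print(tev_cleave(toy_fusion))
```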
TEV protease was tested by incubating purified YbaT-TEV-GFP-His8 (a
putative amino acid transporter), GltP-TEV-GFP-His8 (a glutamate transporter)
(Wallace et al., 1990), and YedZ-TEV-GFP-His8 (a protein of unknown function)
with His10-TEV protease. After incubation, digestion was complete, and the
His10-TEV protease, undigested membrane protein-TEV-GFP-His8 fusion and
clipped-off GFP-His8 were easily removed by batch-binding the material to a metal
affinity resin (Paper IV). It was thus possible to recover intact, functional, full-length
membrane proteins from membrane protein-GFP fusions. Throughout, GFP
fluorescence could be used to monitor both the effectiveness of the TEV
digestion, and the purity of the recovered membrane protein.
Are isolated membrane proteins functional? Purified GltP was
reconstituted into lipid vesicles and its activity was compared to purified GltP-
His8. There was no difference in the glutamate uptake activity between GltP
recovered from GltP-TEV-GFP-His8 and purified GltP-His8 (Paper IV). For
further verification the YedZ protein was analysed in detail (section 4).
3.2.8 How does this GFP-based method compare to other high-throughput approaches?
Many high-throughput membrane protein overexpression initiatives have
estimated membrane protein expression by the quantification of ‘bands’ visible
on a polyacrylamide gel after Coomassie staining and/or Western-blotting
(Dobrovetsky et al., 2005; Korepanova et al., 2005); in these cases membranes are
first isolated to remove any inclusion bodies. However, Coomassie staining is
inaccurate and lacks sensitivity, and Western-blotting is time consuming and not
always reliable, i.e., membrane proteins of different hydrophobicity can bind
Coomassie or transfer to a semi-solid support inconsistently. To rapidly
judge (in a 96-well format) the expression of many His-tagged membrane
proteins in E. coli, Nordlund and co-workers developed a dot-blot detection
method which does not require an electrophoretic transfer step (Eshaghi et al.,
2005). This method is elegant, and is capable of simultaneously screening the
expression and detergent solubilization efficiency of numerous membrane
proteins. The main disadvantage of this method is that the amount of full-length
protein is estimated only after the binding and elution of overexpressed material
to 96-well coated Ni-NTA resin. This is expensive, and may be unaffordable for
many laboratories.
3.2.9 Summary of high-throughput membrane protein overexpression
One of the main obstacles towards understanding membrane proteins is the
difficulty of obtaining pure material for biochemical and
structural analysis (Grisshammer and Tate, 1995). Unfortunately, in comparison
to soluble proteins, overexpression of membrane proteins typically yields little
protein. Novel approaches are badly needed to identify ‘workable’ material.
In this thesis, we have shown that a simple C-terminal GFP fusion is a
reliable folding indicator for membrane proteins expressed in E. coli with a
cytoplasmic C-terminal tail (Paper III). By adding a His8 tag to
the C-terminus of GFP, together with a TEV protease site for clipping off the
GFP-His8 moiety, we show that an efficient, standardized purification protocol
can be used to purify protein at yields >1 mg per liter of culture (Paper IV).
As proof-of-principle of this purification pipeline, an E. coli membrane protein
of previously unknown function was characterized (next section).
4. Characterization of the membrane protein YedZ
YedZ belongs to a bacterial protein family of unknown function, UPF0191
(www.sanger.ac.uk). YedZ originally attracted our attention since cells
overexpressing YedZ-TEV-GFP-His8 were orange instead of green. Although its
orange colour suggested binding of some kind of cofactor, none of the Web-
based prediction tools used to analyze its amino acid sequence identified any
potential cofactor binding motifs.
4.1.1 A test case for the GFP-based purification pipeline: YedZ
Purification of the YedZ protein by our purification pipeline yields a protein that
is orange. Under both oxidizing and reducing conditions, optical spectra of the
purified YedZ protein were recorded (Paper IV). Under reducing conditions the
YedZ protein displayed an absorption spectrum characteristic of cytochrome
b5, with a maximum α-peak at 558 nm. This assignment was corroborated by
mass spectrometry of the purified sample. A major
monoisotopic peak was identified at 617 Da, a mass consistent with that of heme b. With
an assay for heme, the YedZ to heme ratio was calculated at 1:1. Because of an
atypical absorption peak in the 450-500 nm region, YedZ was also suspected to
bind flavin. This was confirmed by subjecting YedZ to reverse-phase liquid
chromatography. YedZ contains FMN rather than FAD, with a molar ratio of 0.7
FMN per YedZ molecule (Paper IV).
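The arithmetic behind these assignments can be sketched briefly. The block below computes the monoisotopic mass of neutral heme b from its molecular formula (C34H32FeN4O4, the standard formula for heme b, which is not stated in the text) and a cofactor-to-protein molar ratio. The isotope masses are standard rounded values; the nmol amounts are hypothetical and only chosen to reproduce a ratio of 0.7. How the calculated mass relates to the observed peak depends on the ion species detected, which is not specified here.

```python
# Monoisotopic masses (Da) of the most abundant isotopes, rounded.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Fe": 55.93493633,
}

# Heme b, C34H32FeN4O4 (assumed standard formula).
HEME_B = {"C": 34, "H": 32, "Fe": 1, "N": 4, "O": 4}


def monoisotopic_mass(formula):
    """Sum of monoisotopic element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def molar_ratio(cofactor_nmol, protein_nmol):
    """Moles of cofactor per mole of protein."""
    if protein_nmol <= 0:
        raise ValueError("protein amount must be positive")
    return cofactor_nmol / protein_nmol


heme_mass = monoisotopic_mass(HEME_B)
print(f"heme b monoisotopic mass: {heme_mass:.2f} Da")

# Hypothetical amounts: 7 nmol FMN recovered from 10 nmol YedZ.
print(f"FMN per YedZ: {molar_ratio(7.0, 10.0):.1f}")
```

The calculated neutral mass lies within about 1 Da of the reported peak.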
4.1.2 YedZ is a novel integral membrane flavocytochrome
How does YedZ bind the heme b? YedZ is a very hydrophobic membrane protein
(23% leucine and 7% valine), and consists of six TMs connected by very short
loops (Paper I).
The topology of YedZ is consistent with an alignment of bacterial
homologs, whereby putative loop regions fall into stretches of amino acid
residues with low similarity, Figure 7a. In contrast, the transmembrane segments
II-V are very well conserved in YedZ and its homologs. In helix III there is one
conserved histidine, and in helix V there are two. The two parallel histidines (H92
in helix III and H164 in helix V) close to the periplasmic face of the membrane are
clearly the most likely histidine pair for ligation of the heme b, Figure 7b. It is
plausible that these transmembrane segments form a four-helical bundle, and
coordinate heme b in a similar manner as in other cytochrome b containing
membrane proteins, see e.g., (Iwata et al., 1999).
How does YedZ bind FMN? The coordination of heme b by integral
membrane proteins has been well described, but binding of FMN in the plane of
the membrane is unprecedented. Based on our current knowledge of flavin-
binding protein structures it is difficult to envisage how the very short and
unconserved loops of YedZ could fold to bind FMN. To date, all examples of
flavin-binding protein structures deposited in the Protein Data Bank have a fold
of at least 100 amino acids that envelops the ligand, e.g., a TIM-barrel
or a Rossmann-type fold (Fraaije and Mattevi, 2000; Hefti et al., 2003). On the other
hand, the conserved W and Y residues at the start and in the middle of
transmembrane segment V, similar to those observed in globular flavodoxin
proteins (Lostao et al., 2003), could be the key residues for FMN binding in
YedZ. Thus, in light of
the topology for YedZ, the unknown YedZ protein was annotated as the first
integral membrane flavocytochrome (Paper IV).
Figure 7. YedZ alignment and membrane topology. A. Representative amino acid sequence alignment of YedZ bacterial homologs: ECOLI (E. coli), YERPE (Yersinia pestis), BRUME (Brucella melitensis), RHIME (Rhodospirillum rubrum), AGRT5 (Agrobacterium tumefaciens), CAUCR (Caulobacter crescentus), RALSO (Ralstonia solanacearum), PASMU (Pasteurella multocida), PSEAE (Pseudomonas aeruginosa), XANAC (Xanthomonas campestris) and DEIRA (Deinococcus radiodurans). The bottom sequence is the consensus, outlined in red. Predicted transmembrane segments for E. coli YedZ are marked with gray bars and numbered I-VI. Fully (100%) conserved amino acid residues are in red text and highlighted in yellow. B. YedZ consists of 6 hydrophobic transmembrane segments; it has cytoplasmic N- and C-terminal ends and short interconnecting loops.
[Figure 7A alignment: E. coli YedZ (YEDZ_ECOLI) aligned with ten bacterial homologs over alignment positions 1-215, with the consensus sequence on the bottom row and transmembrane segments I-VI marked beneath the sequences.]
4.1.3 The possible function of YedZ
To further understand YedZ, we analyzed the DNA surrounding the yedZ gene in
the E. coli genome. The gene encoding YedZ is in a putative operon together with
a gene encoding YedY, which is a soluble periplasmic protein. The putative
yedYZ operon possesses an upstream σ70 consensus sequence, suggesting that it is
likely constitutively expressed, Figure 8a. A region of YedY shares at least 25%
sequence identity to a molybdenum-molybdopterin (Mo-MPT) binding domain
that is present in soluble assimilatory nitrate reductases (NR) found in plants,
algae and fungi, Figure 8b. These enzymes catalyze the conversion of nitrate to
nitrite to assimilate inorganic nitrogen (Barber et al., 2002).
Figure 8. Analysis of the putative yedYZ operon. A. The putative yedYZ operon possesses a σ70 consensus sequence. The stop codon of yedY is immediately followed by the start codon of yedZ, and the ribosome binding site (RBS) of the yedZ gene lies at the end of the yedY coding region. B. YedY belongs to a protein family of oxidoreductase molybdopterin binding proteins (Pfam P11605), and shares 25% sequence identity with the Mo-MPT domain in eukaryotic assimilatory nitrate reductases. The example shown is tobacco NR NIA2. C. E. coli ∆tatC and control cells expressing YedY-HA were labeled with 35S-methionine. YedY-HA was immunoprecipitated with an antiserum against the HA-epitope tag and immunoprecipitates were analyzed by means of SDS-PAGE and fluorography (unpublished).
The amino acid signature for Mo-MPT in bacteria and eukaryotes is usually
different as bacteria further modify this domain to Mo-MGD (Campbell, 1996).
However, a recent 3D-structure confirmed that YedY contained the unmodified
cofactor Mo-MPT (Loschi et al., 2004). Like other periplasmic co-factor-containing
proteins, YedY also contains a twin-arginine motif (K-R-R-Q-V-L-K), which is a signal
for export via the TAT translocase; a proteinaceous channel that preferentially
translocates fully-folded proteins with cytosolically incorporated co-factors
(Gohlke et al., 2005), Figure 8c.
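A motif of this kind can be located in a sequence with a simple pattern search. The sketch below uses a deliberately loose, illustrative pattern (R-R, any residue, a hydrophobic residue, L, K), loosely modeled on the commonly cited twin-arginine consensus S/T-R-R-x-F-L-K; it is an assumption made for illustration and not a validated TAT signal predictor. The peptide is a toy sequence that merely embeds the motif quoted above, and the 40-residue search window is likewise an arbitrary choice.

```python
import re

# Illustrative pattern only: R-R, any residue, one hydrophobic residue, L, K.
TWIN_ARGININE = re.compile(r"RR.[FVLIM]LK")


def find_twin_arginine(sequence, window=40):
    """Search the first `window` residues for the illustrative twin-arginine pattern.

    Returns (zero_based_start, matched_residues) or None if nothing matches.
    """
    match = TWIN_ARGININE.search(sequence[:window])
    if match is None:
        return None
    return match.start(), match.group()


# Toy N-terminus embedding the motif K-R-R-Q-V-L-K (not the real YedY sequence).
toy_n_terminus = "MSTE" + "KRRQVLK" + "ALGLAAAA"
print(find_twin_arginine(toy_n_terminus))
```

A hit from such a pattern only flags a candidate; export via the TAT translocase would still need experimental support of the kind shown in Figure 8c.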
Eukaryotic assimilatory NR is a homodimer of two soluble ~100 kDa subunits,
each subunit containing 3 modular units in a 1:1:1 ratio of Mo-MPT, cytochrome
b5 and flavin, usually FAD. Each modular unit in NR is thought to have evolved
independently, and the three units are linked by highly variable hinge-like
sequences (Campbell, 2001; Hyde et al., 1991). Our analysis of the putative yedYZ
operon suggests that the membrane-bound YedZ protein could be equivalent to
the soluble cytochrome b5 and flavin-binding domains and, together with the
globular Mo-MPT containing protein YedY, could constitute an assimilatory
periplasmic NR, Figure 9.
Figure 9. YedYZ is a novel nitrate reductase. Schematic representation of the relationship between YedYZ proteins and eukaryotic assimilatory NR, the example shown is the NR NIA2 from tobacco (unpublished).
In support of this hypothesis we have found that the periplasmic NR capacity in a
yedYZ deletion mutant strain (MG1655 derivative) was slightly lower than in a
control grown aerobically (unpublished data). More striking was a clear increase
in NR activity for a single yedZ deletion. Presumably, uncoupling YedY allows
greater access of the artificial electron donor used in the NR assay (reduced
methyl viologen) to the Mo-MPT site (unpublished data).
5. Conclusions
In E. coli, membrane protein topology can be rapidly deduced from a
combination of computer predictions and single C-terminal PhoA/GFP reporter-
protein fusions (Paper I). Reporter fusions made to either the N- or C-terminal
ends of membrane proteins are more informative than fusions placed elsewhere
(Paper II), and are enough to appreciably improve membrane protein topology
predictions (Daley et al., 2005; Melen et al., 2003). This approach has been
applied to the entire E. coli inner membrane proteome (Daley et al., 2005). As
expected, ~80% of membrane proteins contain cytoplasmic C-termini, and thus,
can be further analyzed by a GFP-based overexpression and purification
‘pipeline’ (Papers III and IV). This pipeline allows highly overexpressed
membrane proteins to be identified rapidly and easily. As demonstrated for
the membrane proteins GltP and YedZ, this pipeline can recover intact,
full-length functional membrane proteins from membrane protein-GFP fusions
(Paper IV). The usefulness of combining the aforementioned topology
information with the GFP-based pipeline was brought to the fore with the
characterization of the membrane protein YedZ, the first identified integral
membrane flavocytochrome.
It is envisaged that this GFP-based methodology in combination with the
E. coli membrane protein-GFP library, will facilitate the characterization of the
many E. coli membrane proteins without a known function. As demonstrated by
the functional expression of the human KDEL receptor-GFP fusion in L. lactis
(Paper IV), this technology also holds great promise for the eagerly awaited
functional and structural characterization of eukaryotic membrane proteins.
References
Abramson, J., Riistama, S., Larsson, G., Jasaitis, A., Svensson-Ek, M., Laakkonen, L., Puustinen, A., Iwata, S. and Wikstrom, M. (2000) The structure of the ubiquinol oxidase from Escherichia coli and its ubiquinone binding site. Nat Struct Biol, 7, 910-917.
Abramson, J., Smirnova, I., Kasho, V., Verner, G., Kaback, H.R. and Iwata, S. (2003) Structure and mechanism of the lactose permease of Escherichia coli. Science, 301, 610-615.
Adamian, L., Nanda, V., DeGrado, W.F. and Liang, J. (2005) Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins, 59, 496-509.
Afonyushkin, T., Moll, I., Blasi, U. and Kaberdin, V.R. (2003) Temperature-dependent stability and translation of Escherichia coli ompA mRNA. Biochem Biophys Res Commun, 311, 604-609.
Andersson, H. and von Heijne, G. (1994) Membrane protein topology: effects of delta mu H+ on the translocation of charged residues explain the 'positive inside' rule. EMBO J, 13, 2267-2272.
Arechaga, I., Miroux, B., Karrasch, S., Huijbregts, R., de Kruijff, B., Runswick, M.J. and Walker, J.E. (2000) Characterisation of new intracellular membranes in Escherichia coli accompanying large scale over-production of the b subunit of F(1)F(o) ATP synthase. FEBS Lett, 482, 215-219.
Arechaga, I., Miroux, B., Runswick, M.J. and Walker, J.E. (2003) Over-expression of Escherichia coli F1F(o)-ATPase subunit a is inhibited by instability of the uncB gene transcript. FEBS Lett, 547, 97-100.
Barber, M.J., Desai, S.K., Marohnic, C.C., Hernandez, H.H. and Pollock, V.V. (2002) Synthesis and bacterial expression of a gene encoding the heme domain of assimilatory nitrate reductase. Arch Biochem Biophys, 402, 38-50.
Batey, R.T., Rambo, R.P., Lucast, L., Rha, B. and Doudna, J.A. (2000) Crystal structure of the ribonucleoprotein core of the signal recognition particle. Science, 287, 1232-1239.
Behrendt, J., Standar, K., Lindenstrauss, U. and Bruser, T. (2004) Topological studies on the twin-arginine translocase component TatC. FEMS Microbiol Lett, 234, 303-308.
Bernsel, A. and Von Heijne, G. (2005) Improved membrane protein topology prediction by domain assignments. Protein Sci, 14, 1723-1728.
Bertero, M.G., Rothery, R.A., Palak, M., Hou, C., Lim, D., Blasco, F., Weiner, J.H. and Strynadka, N.C. (2003) Insights into the respiratory electron transfer pathway from the structure of nitrate reductase A. Nat Struct Biol, 10, 681-687.
Bogdanov, M., Heacock, P.N. and Dowhan, W. (2002) A polytopic membrane protein displays a reversible topology dependent on membrane lipid composition. EMBO J, 21, 2107-2116.
Bogdanov, M., Zhang, W., Xie, J. and Dowhan, W. (2005) Transmembrane protein topology mapping by the substituted cysteine accessibility method (SCAM(TM)): application to lipid-specific membrane protein topogenesis. Methods, 36, 148-171.
Boon, J.M. and Smith, B.D. (2002) Chemical control of phospholipid distribution across bilayer membranes. Med Res Rev, 22, 251-281.
Booth, P.J. (2005) Sane in the membrane: designing systems to modulate membrane proteins. Curr Opin Struct Biol, 15, 435-440.
Campbell, W.H. (1996) Nitrate Reductase Biochemistry Comes of Age. Plant Physiol, 111, 355-361.
Campbell, W.H. (2001) Structure and function of eukaryotic NAD(P)H:nitrate reductase. Cell Mol Life Sci, 58, 194-204.
Chamberlain, A.K., Lee, Y., Kim, S. and Bowie, J.U. (2004) Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J Mol Biol, 339, 471-479.
Chang, G., Spencer, R.H., Lee, A.T., Barclay, M.T. and Rees, D.C. (1998) Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechanosensitive ion channel. Science, 282, 2220-2226.
Chang, H.C., Kaiser, C.M., Hartl, F.U. and Barral, J.M. (2005) De novo Folding of GFP Fusion Proteins: High Efficiency in Eukaryotes but Not in Bacteria. J Mol Biol, 353, 397-409.
Crameri, A., Whitehorn, E.A., Tate, E. and Stemmer, W.P. (1996) Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat Biotechnol, 14, 315-319.
Culham, D.E., Hillar, A., Henderson, J., Ly, A., Vernikovska, Y.I., Racher, K.I., Boggs, J.M. and Wood, J.M. (2003) Creation of a fully functional cysteine-less variant of osmosensor and proton-osmoprotectant symporter ProP from Escherichia coli and its application to assess the transporter's membrane orientation. Biochemistry, 42, 11815-11823.
Daley, D.O., Rapp, M., Granseth, E., Melen, K., Drew, D. and von Heijne, G. (2005) Global topology analysis of the Escherichia coli inner membrane proteome. Science, 308, 1321-1323.
Danielsen, S., Boyd, D. and Neuhard, J. (1995) Membrane topology analysis of the Escherichia coli cytosine permease. Microbiology, 141 (Pt 11), 2905-2913.
Dawson, J.P., Melnyk, R.A., Deber, C.M. and Engelman, D.M. (2003) Sequence context strongly modulates association of polar residues in transmembrane helices. J Mol Biol, 331, 255-262.
Dawson, J.P., Weinger, J.S. and Engelman, D.M. (2002) Motifs of serine and threonine can drive association of transmembrane helices. J Mol Biol, 316, 799-805.
DeGrado, W.F., Gratkowski, H. and Lear, J.D. (2003) How do helix-helix interactions help determine the folds of membrane proteins? Perspectives from the study of homo-oligomeric helical bundles. Protein Sci, 12, 647-665.
Dobrovetsky, E., Lu, M.L., Andorn-Broza, R., Khutoreskaya, G., Bray, J.E., Savchenko, A., Arrowsmith, C.H., Edwards, A.M. and Koth, C.M. (2005) High-throughput production of prokaryotic membrane proteins. J Struct Funct Genomics, 6, 33-50.
Doyle, D.A., Morais Cabral, J., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R. (1998) The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science, 280, 69-77.
Drew, D., Froderberg, L., Baars, L. and de Gier, J.W. (2003) Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta, 1610, 3-10.
Drew, D., Lerch, M., Kunji, E., Slotboom, D.J. and de Gier, J.W. (In preparation) Optimizing membrane protein overexpression and purification using GFP fusions. Nature Methods.
Drew, D., Sjostrand, D., Nilsson, J., Urbig, T., Chin, C.N., de Gier, J.W. and von Heijne, G. (2002) Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A, 99, 2690-2695.
Drew, D., Slotboom, D.J., Friso, G., Reda, T., Genevaux, P., Rapp, M., Meindl-Beinker, N.M., Lambert, W., Lerch, M., Daley, D.O., Van Wijk, K.J., Hirst, J., Kunji, E. and De Gier, J.W. (2005) A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci, 14, 2011-2017.
Drew, D.E., von Heijne, G., Nordlund, P. and de Gier, J.W. (2001) Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett, 507, 220-224.
Driessen, A.J., Manting, E.H. and van der Does, C. (2001) The structural basis of protein targeting and translocation in bacteria. Nat Struct Biol, 8, 492-498.
Dutzler, R., Campbell, E.B., Cadene, M., Chait, B.T. and MacKinnon, R. (2002) X-ray structure of a ClC chloride channel at 3.0 A reveals the molecular basis of anion selectivity. Nature, 415, 287-294.
Edwards, M.D., Li, Y., Kim, S., Miller, S., Bartlett, W., Black, S., Dennison, S., Iscla, I., Blount, P., Bowie, J.U. and Booth, I.R. (2005) Pivotal role of the glycine-rich TM3 helix in gating the MscS mechanosensitive channel. Nat Struct Mol Biol, 12, 113-119.
Engelman, D.M., Chen, Y., Chin, C.N., Curran, A.R., Dixon, A.M., Dupuy, A.D., Lee, A.S., Lehnert, U., Matthews, E.E., Reshetnyak, Y.K., Senes, A. and Popot, J.L. (2003) Membrane protein folding: beyond the two stage model. FEBS Lett, 555, 122-125.
Eroglu, C., Cronet, P., Panneels, V., Beaufils, P. and Sinning, I. (2002) Functional reconstitution of purified metabotropic glutamate receptor expressed in the fly eye. EMBO Rep, 3, 491-496.
Eshaghi, S., Hedren, M., Nasser, M.I., Hammarberg, T., Thornell, A. and Nordlund, P. (2005) An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins. Protein Sci, 14, 676-683.
Farewell, A. and Neidhardt, F.C. (1998) Effect of temperature on in vivo protein synthetic capacity in Escherichia coli. J Bacteriol, 180, 4704-4710.
Feilmeier, B.J., Iseminger, G., Schroeder, D., Webber, H. and Phillips, G.J. (2000) Green fluorescent protein functions as a reporter for protein localization in Escherichia coli. J Bacteriol, 182, 4068-4076.
Fraaije, M.W. and Mattevi, A. (2000) Flavoenzymes: diverse catalysts with recurrent features. Trends Biochem Sci, 25, 126-132.
Freedman, S.D., Katz, M.H., Parker, E.M., Laposata, M., Urman, M.Y. and Alvarez, J.G. (1999) A membrane lipid imbalance plays a role in the phenotypic expression of cystic fibrosis in cftr(-/-) mice. Proc Natl Acad Sci U S A, 96, 13995-14000.
Frillingos, S., Sahin-Toth, M., Wu, J. and Kaback, H.R. (1998) Cys-scanning mutagenesis: a novel approach to structure function relationships in polytopic membrane proteins. FASEB J, 12, 1281-1299.
Fu, D., Libson, A., Miercke, L.J., Weitzman, C., Nollert, P., Krucinski, J. and Stroud, R.M. (2000) Structure of a glycerol-conducting channel and the basis for its selectivity. Science, 290, 481-486.
Fukuda, H., Arai, M. and Kuwajima, K. (2000) Folding of green fluorescent protein and the cycle3 mutant. Biochemistry, 39, 12025-12032.
Fyfe, P.K., Hughes, A.V., Heathcote, P. and Jones, M.R. (2005) Proteins, chlorophylls and lipids: X-ray analysis of a three-way relationship. Trends Plant Sci, 10, 275-282.
Fyfe, P.K., Isaacs, N.W., Cogdell, R.J. and Jones, M.R. (2004) Disruption of a specific molecular interaction with a bound lipid affects the thermal stability of the purple bacterial reaction centre. Biochim Biophys Acta, 1608, 11-22.
Gandlur, S.M., Wei, L., Levine, J., Russell, J. and Kaur, P. (2004) Membrane topology of the DrrB protein of the doxorubicin transporter of Streptomyces peucetius. J Biol Chem, 279, 27799-27806.
Geertsma, E.R. (2005) What lies between: Functional interfaces in a dimeric transporter. Department of Biochemistry. University of Groningen, Groningen, p. 101.
Goder, V., Junne, T. and Spiess, M. (2004) Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol Biol Cell, 15, 1470-1478.
Gohlke, U., Pullan, L., McDevitt, C.A., Porcelli, I., de Leeuw, E., Palmer, T., Saibil, H.R. and Berks, B.C. (2005) The TatA component of the twin-arginine protein transport system forms channel complexes of variable diameter. Proc Natl Acad Sci U S A, 102, 10482-10486.
Gouffi, K., Santini, C.L. and Wu, L.F. (2002) Topology determination and functional analysis of the Escherichia coli TatC protein. FEBS Lett, 525, 65-70.
Granseth, E., Daley, D.O., Rapp, M., Melen, K. and von Heijne, G. (2005a) Experimentally constrained topology models for 51,208 bacterial inner membrane proteins. J Mol Biol, 352, 489-494.
Granseth, E., von Heijne, G. and Elofsson, A. (2005b) A study of the membrane-water interface region of membrane proteins. J Mol Biol, 346, 377-385.
Green, D.H. and Cutting, S.M. (2000) Membrane topology of the Bacillus subtilis pro-sigma(K) processing complex. J Bacteriol, 182, 278-285.
Griffith, D.A., Delipala, C., Leadsham, J., Jarvis, S.M. and Oesterhelt, D. (2003) A novel yeast expression system for the overproduction of quality-controlled membrane proteins. FEBS Lett, 553, 45-50.
Grisshammer, R., Duckworth, R. and Henderson, R. (1993) Expression of a rat neurotensin receptor in Escherichia coli. Biochem J, 295 ( Pt 2), 571-576.
Grisshammer, R. and Tate, C.G. (1995) Overexpression of integral membrane proteins for structural studies. Q Rev Biophys, 28, 315-422.
Gromiha, M.M. and Suwa, M. (2005) Structural analysis of residues involving cation-pi interactions in different folding types of membrane proteins. Int J Biol Macromol, 35, 55-62.
Hedhammar, M., Stenvall, M., Lonneborg, R., Nord, O., Sjolin, O., Brismar, H., Uhlen, M., Ottosson, J. and Hober, S. (2005) A novel flow cytometry-based method for analysis of expression levels in Escherichia coli, giving information about precipitated and soluble protein. J Biotechnol, 119, 133-146.
Hefti, M.H., Vervoort, J. and van Berkel, W.J. (2003) Deflavination and reconstitution of flavoproteins. Eur J Biochem, 270, 4227-4242.
Helms, V. (2002) Attraction within the membrane. Forces behind transmembrane protein folding and supramolecular complex assembly. EMBO Rep, 3, 1133-1138.
Hermansson, M. and von Heijne, G. (2003) Inter-helical hydrogen bond formation during membrane protein integration into the ER membrane. J Mol Biol, 334, 803-809.
Hessa, T., Kim, H., Bihlmaier, K., Lundin, C., Boekel, J., Andersson, H., Nilsson, I., White, S.H. and von Heijne, G. (2005a) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature, 433, 377-381.
Hessa, T., White, S.H. and von Heijne, G. (2005b) Membrane insertion of a potassium-channel voltage sensor. Science, 307, 1427.
Higy, M., Junne, T. and Spiess, M. (2004) Topogenesis of membrane proteins at the endoplasmic reticulum. Biochemistry, 43, 12716-12722.
Hoag, H. (2005) Expression of interest. Nature, 437, 164-165.
Hong, H. and Tamm, L.K. (2004) Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc Natl Acad Sci U S A, 101, 4065-4070.
Houben, E.N., Zarivach, R., Oudega, B. and Luirink, J. (2005) Early encounters of a nascent membrane protein: specificity and timing of contacts inside and outside the ribosome. J Cell Biol, 170, 27-35.
Huang, Y., Lemieux, M.J., Song, J., Auer, M. and Wang, D.N. (2003) Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science, 301, 616-620.
Huber, D., Boyd, D., Xia, Y., Olma, M.H., Gerstein, M. and Beckwith, J. (2005) Use of thioredoxin as a reporter to identify a subset of Escherichia coli signal sequences that promote signal recognition particle-dependent translocation. J Bacteriol, 187, 2983-2991.
Hunte, C., Screpanti, E., Venturi, M., Rimon, A., Padan, E. and Michel, H. (2005) Structure of a Na+/H+ antiporter and insights into mechanism of action and regulation by pH. Nature, 435, 1197-1202.
Hyde, G.E., Crawford, N.M. and Campbell, W.H. (1991) The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases. J Biol Chem, 266, 23542-23547.
Ito, K. and Akiyama, Y. (1991) In vivo analysis of integration of membrane proteins in Escherichia coli. Mol Microbiol, 5, 2243-2253.
Ito, K. and Akiyama, Y. (2005) Cellular functions, mechanism of action, and regulation of FtsH protease. Annu Rev Microbiol, 59, 211-231.
Iwata, M., Okada, K. and Iwata, S. (1999) [Structure of cytochrome bc1 complex from bovine heart mitochondria]. Tanpakushitsu Kakusan Koso, 44, 643-654.
Jakubowski, S.J., Krishnamoorthy, V., Cascales, E. and Christie, P.J. (2004) Agrobacterium tumefaciens VirB6 domains direct the ordered export of a DNA substrate through a type IV secretion system. J Mol Biol, 341, 961-977.
Jensen, M.O. and Mouritsen, O.G. (2004) Lipids do influence protein function: the hydrophobic matching hypothesis revisited. Biochim Biophys Acta, 1666, 205-226.
Jensen, M.O., Mouritsen, O.G. and Peters, G.H. (2004) The hydrophobic effect: molecular dynamics simulations of water confined between extended hydrophobic and hydrophilic surfaces. J Chem Phys, 120, 9729-9744.
Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038-3049.
Jordan, P., Fromme, P., Witt, H.T., Klukas, O., Saenger, W. and Krauss, N. (2001) Three-dimensional structure of cyanobacterial photosystem I at 2.5 A resolution. Nature, 411, 909-917.
Kaneko, M. and Nomura, Y. (2003) ER signaling in unfolded protein response. Life Sci, 74, 199-205.
Kastner, C.N., Dimroth, P. and Pos, K.M. (2000) The Na+-dependent citrate carrier of Klebsiella pneumoniae: high-level expression and site-directed mutagenesis of asparagine-185 and glutamate-194. Arch Microbiol, 174, 67-73.
Khademi, S., O'Connell, J., 3rd, Remis, J., Robles-Colmenares, Y., Miercke, L.J. and Stroud, R.M. (2004) Mechanism of ammonia transport by Amt/MEP/Rh: structure of AmtB at 1.35 A. Science, 305, 1587-1594.
Kiefer, H., Krieger, J., Olszewski, J.D., Von Heijne, G., Prestwich, G.D. and Breer, H. (1996) Expression of an olfactory receptor in Escherichia coli: purification, reconstitution, and ligand binding. Biochemistry, 35, 16077-16084.
Kihara, A., Akiyama, Y. and Ito, K. (1995) FtsH is required for proteolytic elimination of uncomplexed forms of SecY, an essential protein translocase subunit. Proc Natl Acad Sci U S A, 92, 4532-4536.
Kim, H., Melen, K. and von Heijne, G. (2003) Topology models for 37 Saccharomyces cerevisiae membrane proteins based on C-terminal reporter fusions and predictions. J Biol Chem, 278, 10208-10213.
Kim, H., Unby, M., Melén, K., Warringer, J., Blomberg, A. and von Heijne, G. (In preparation) A global topology map of the Saccharomyces cerevisiae membrane proteome.
Korepanova, A., Gao, F.P., Hua, Y., Qin, H., Nakamoto, R.K. and Cross, T.A. (2005) Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli. Protein Sci, 14, 148-158.
Krebber, C., Spada, S., Desplancq, D., Krebber, A., Ge, L. and Pluckthun, A. (1997) Selectively-infective phage (SIP): a mechanistic dissection of a novel in vivo selection for protein-ligand interactions. J Mol Biol, 268, 607-618.
Krishnan, M.N., Bingham, J.P., Lee, S.H., Trombley, P. and Moczydlowski, E. (2005) Functional Role and Affinity of Inorganic Cations in Stabilizing the Tetrameric Structure of the KcsA K+ Channel. J Gen Physiol, 126, 271-283.
Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol, 305, 567-580.
Kung, C. (2005) A possible unifying principle for mechanosensation. Nature, 436, 647-654.
Kunji, E.R., Chan, K.W., Slotboom, D.J., Floyd, S., O'Connor, R. and Monne, M. (2005) Eukaryotic membrane protein overproduction in Lactococcus lactis. Curr Opin Biotechnol, 16, 546-51.
Kunji, E.R., Slotboom, D.J. and Poolman, B. (2003) Lactococcus lactis as host for overproduction of functional membrane proteins. Biochim Biophys Acta, 1610, 97-108.
Locher, K.P., Lee, A.T. and Rees, D.C. (2002) The E. coli BtuCD structure: a framework for ABC transporter architecture and mechanism. Science, 296, 1091-1098.
Long, S.B., Campbell, E.B. and Mackinnon, R. (2005a) Crystal structure of a mammalian voltage-dependent Shaker family K+ channel. Science, 309, 897-903.
Long, S.B., Campbell, E.B. and Mackinnon, R. (2005b) Voltage Sensor of Kv1.2: Structural Basis of Electromechanical Coupling. Science, 309, 903-908.
Loschi, L., Brokx, S.J., Hills, T.L., Zhang, G., Bertero, M.G., Lovering, A.L., Weiner, J.H. and Strynadka, N.C. (2004) Structural and biochemical identification of a novel bacterial oxidoreductase. J Biol Chem.
Lostao, A., Daoudi, F., Irun, M.P., Ramon, A., Fernandez-Cabrera, C., Romero, A. and Sancho, J. (2003) How FMN binds to Anabaena apoflavodoxin: a hydrophobic encounter at an open binding site. J Biol Chem, 278, 24053-24061.
Luirink, J. and Sinning, I. (2004) SRP-mediated protein targeting: structure and function revisited. Biochim Biophys Acta, 1694, 17-35.
Ma, C. and Chang, G. (2004) Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli. Proc Natl Acad Sci U S A, 101, 2852-2857.
Manoil, C. (1991) Analysis of membrane protein topology using alkaline phosphatase and beta-galactosidase gene fusions. Methods Cell Biol, 34, 61-75.
McMurry, J.L., Van Arnam, J.S., Kihara, M. and Macnab, R.M. (2004) Analysis of the cytoplasmic domains of Salmonella FlhA and interactions with components of the flagellar export machinery. J Bacteriol, 186, 7586-7592.
Melen, K., Krogh, A. and von Heijne, G. (2003) Reliability measures for membrane protein topology prediction algorithms. J Mol Biol, 327, 735-744.
Miroux, B. and Walker, J.E. (1996) Over-production of proteins in Escherichia coli: mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J Mol Biol, 260, 289-298.
Mohanty, A.K., Simmons, C.R. and Wiener, M.C. (2003) Inhibition of tobacco etch virus protease activity by detergents. Protein Expr Purif, 27, 109-114.
Morgan-Kiss, R.M., Wadler, C. and Cronan, J.E., Jr. (2002) Long-term and homogeneous regulation of the Escherichia coli araBAD promoter by use of a lactose transporter of relaxed specificity. Proc Natl Acad Sci U S A, 99, 7373-7377.
Muller, G. (2000) Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach. Curr Med Chem, 7, 861-888.
Munro, S. (1998) Localization of proteins to the Golgi apparatus. Trends Cell Biol, 8, 11-15.
Nilsson, I.M. and von Heijne, G. (1993) Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J Biol Chem, 268, 5798-5801.
Nilsson, J., Persson, B. and von Heijne, G. (2000) Consensus predictions of membrane protein topology. FEBS Lett, 486, 267-269.
Nilsson, J., Persson, B. and von Heijne, G. (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins, 60, 606-616.
Ott, C.M. and Lingappa, V.R. (2002) Integral membrane protein biosynthesis: why topology is hard to predict. J Cell Sci, 115, 2003-2009.
Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong, I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M. and Miyano, M. (2000) Crystal structure of rhodopsin: A G protein-coupled receptor. Science, 289, 739-745.
Palmer, T. and Berks, B.C. (2003) Moving folded proteins across the bacterial cell membrane. Microbiology, 149, 547-556.
Pan, S.H. and Malcolm, B.A. (2000) Reduced background expression and improved plasmid stability with pET vectors in BL21 (DE3). Biotechniques, 29, 1234-1238.
Park, S.H. and Opella, S.J. (2005) Tilt angle of a trans-membrane helix is determined by hydrophobic mismatch. J Mol Biol, 350, 310-318.
Pourcher, T., Bibi, E., Kaback, H.R. and Leblanc, G. (1996) Membrane topology of the melibiose permease of Escherichia coli studied by melB-phoA fusion analysis. Biochemistry, 35, 4161-4168.
Quick, M. and Wright, E.M. (2002) Employing Escherichia coli to functionally express, purify, and characterize a human transporter. Proc Natl Acad Sci U S A, 99, 8597-8601.
Rapoport, T.A., Goder, V., Heinrich, S.U. and Matlack, K.E. (2004) Membrane-protein integration and the role of the translocation channel. Trends Cell Biol, 14, 568-575.
Rapp, M., Drew, D., Daley, D.O., Nilsson, J., Carvalho, T., Melen, K., De Gier, J.W. and Von Heijne, G. (2004) Experimentally based topology models for E. coli inner membrane proteins. Protein Sci, 13, 937-945.
Raunser, S., Haase, W., Bostina, M., Parcej, D.N. and Kuhlbrandt, W. (2005) High-yield expression, reconstitution and structure of the recombinant, fully functional glutamate transporter GLT-1 from Rattus norvegicus. J Mol Biol, 351, 598-613.
Reyes, C.L. and Chang, G. (2005) Structure of the ABC transporter MsbA in complex with ADP.vanadate and lipopolysaccharide. Science, 308, 1028-1031.
Rost, B., Fariselli, P. and Casadio, R. (1996) Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci, 5, 1704-1718.
Rudner, D.Z., Fawcett, P. and Losick, R. (1999) A family of membrane-embedded metalloproteases involved in regulated proteolysis of membrane-associated transcription factors. Proc Natl Acad Sci U S A, 96, 14765-14770.
Sarramegna, V., Talmont, F., Seree de Roch, M., Milon, A. and Demange, P. (2002) Green fluorescent protein as a reporter of human mu-opioid receptor overexpression and localization in the methylotrophic yeast Pichia pastoris. J Biotechnol, 99, 23-39.
Savage, D.F., Egea, P.F., Robles-Colmenares, Y., O'Connell, J.D., 3rd and Stroud, R.M. (2003) Architecture and selectivity in aquaporins: 2.5 A X-ray structure of aquaporin Z. PLoS Biol, 1, E72.
Schiller, H., Molsberger, E., Janssen, P., Michel, H. and Reilander, H. (2001) Solubilization and purification of the human ETB endothelin receptor produced by high-level fermentation in Pichia pastoris. Receptors Channels, 7, 453-469.
Schneider, D. (2004) Rendezvous in a membrane: close packing, hydrogen bonding, and the formation of transmembrane helix oligomers. FEBS Lett, 577, 5-8.
Schulz, G.E. (2003) Transmembrane beta-barrel proteins. Adv Protein Chem, 63, 47-70.
Seddon, A.M., Curnow, P. and Booth, P.J. (2004) Membrane proteins, lipids and detergents: not just a soap opera. Biochim Biophys Acta, 1666, 105-117.
Senes, A., Engel, D.E. and DeGrado, W.F. (2004) Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs. Curr Opin Struct Biol, 14, 465-479.
Senes, A., Gerstein, M. and Engelman, D.M. (2000) Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol, 296, 921-936.
Senes, A., Ubarretxena-Belandia, I. and Engelman, D.M. (2001) The Calpha ---H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci U S A, 98, 9056-9061.
Severance, S., Chakraborty, S. and Kosman, D.J. (2004) The Ftr1p iron permease in the yeast plasma membrane: orientation, topology and structure-function relationships. Biochem J, 380, 487-496.
Sorensen, H.P. and Mortensen, K.K. (2005) Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Microb Cell Fact, 4, 1-8.
Strandberg, E. and Killian, J.A. (2003) Snorkeling of lysine side chains in transmembrane helices: how easy can it get? FEBS Lett, 544, 69-73.
Studier, F.W. and Moffatt, B.A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol, 189, 113-130.
Tate, C.G. (2001) Overexpression of mammalian integral membrane proteins for structural studies. FEBS Lett, 504, 94-98.
Tate, C.G. and Blakely, R.D. (1994) The effect of N-linked glycosylation on activity of the Na(+)- and Cl(-)-dependent serotonin transporter expressed using recombinant baculovirus in insect cells. J Biol Chem, 269, 26303-26310.
Tate, C.G., Whiteley, E. and Betenbaugh, M.J. (1999) Molecular chaperones stimulate the functional expression of the cocaine-sensitive serotonin transporter. J Biol Chem, 274, 17551-17558.
Thomas, J.D., Daniel, R.A., Errington, J. and Robinson, C. (2001) Export of active green fluorescent protein to the periplasm by the twin-arginine translocase (Tat) pathway in Escherichia coli. Mol Microbiol, 39, 47-53.
Tsien, R.Y. (1998) The green fluorescent protein. Annu Rev Biochem, 67, 509-544.
Tusnady, G.E. and Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol, 283, 489-506.
Ulmschneider, M.B., Sansom, M.S. and Di Nola, A. (2005) Properties of integral membrane protein structures: derivation of an implicit membrane potential. Proteins, 59, 252-265.
Urbanus, M.L., Froderberg, L., Drew, D., Bjork, P., de Gier, J.W., Brunner, J., Oudega, B. and Luirink, J. (2002) Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem, 277, 12718-12723.
Valent, Q.A., de Gier, J.W., von Heijne, G., Kendall, D.A., ten Hagen-Jongman, C.M., Oudega, B. and Luirink, J. (1997) Nascent membrane and presecretory proteins synthesized in Escherichia coli associate with signal recognition particle and trigger factor. Mol Microbiol, 25, 53-64.
Van den Berg, B., Clemons, W.M., Jr., Collinson, I., Modis, Y., Hartmann, E., Harrison, S.C. and Rapoport, T.A. (2004) X-ray structure of a protein-conducting channel. Nature, 427, 36-44.
van Geest, M. and Lolkema, J.S. (2000) Membrane topology and insertion of membrane proteins: search for topogenic signals. Microbiol Mol Biol Rev, 64, 13-33.
Viklund, H. and Elofsson, A. (2004) Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci, 13, 1908-1917.
von Heijne, G. (1989) Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature, 341, 456-458.
von Heijne, G. (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol, 225, 487-494.
Waldo, G.S., Standish, B.M., Berendzen, J. and Terwilliger, T.C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol, 17, 691-695.
Wallace, B., Yang, Y.J., Hong, J.S. and Lum, D. (1990) Cloning and sequencing of a gene encoding a glutamate and aspartate carrier of Escherichia coli K-12. J Bacteriol, 172, 3214-3220.
Walian, P., Cross, T. and Jap, B.K. (2004) Structural genomics of membrane proteins. Genome Biology, 5, 215.1-8.
Wallin, E. and von Heijne, G. (1995) Properties of N-terminal tails in G-protein coupled receptors: a statistical study. Protein Eng, 8, 693-698.
Wallin, E. and von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci, 7, 1029-1038.
Wang, D.N., Safferling, M., Lemieux, M.J., Griffith, H., Chen, Y. and Li, X.D. (2003) Practical aspects of overexpressing bacterial secondary membrane transporters for structural studies. Biochim Biophys Acta, 1610, 23-36.
Weiss, H.M. and Grisshammer, R. (2002) Purification and characterization of the human adenosine A(2a) receptor functionally expressed in Escherichia coli. Eur J Biochem, 269, 82-92.
White, S.H. (2004) The progress of membrane protein structure determination. Protein Sci, 13, 1948-1949.
White, S.H. and von Heijne, G. (2005) Transmembrane helices before, during, and after insertion. Curr Opin Struct Biol, 15, 378-386.
Wimley, W.C. and White, S.H. (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol, 3, 842-848.
Woolhead, C.A., McCormick, P.J. and Johnson, A.E. (2004) Nascent membrane and secretory proteins differ in FRET-detected folding far inside the ribosome and in their exposure to ribosomal proteins. Cell, 116, 725-736.
Yamashita, A., Singh, S.K., Kawate, T., Jin, Y. and Gouaux, E. (2005) Crystal structure of a bacterial homologue of Na(+)/Cl(-)-dependent neurotransmitter transporters. Nature, 437, 215-223.
Yeliseev, A.A., Wong, K.K., Soubias, O. and Gawrisch, K. (2005) Expression of human peripheral cannabinoid receptor for structural studies. Protein Sci, 14, 2638-2653.
Yohannan, S., Yang, D., Faham, S., Boulting, G., Whitelegge, J. and Bowie, J.U. (2004) Proline substitutions are not easily accommodated in a membrane protein. J Mol Biol, 341, 1-6.
Acknowledgements

This will undoubtedly be the most read portion of the thesis. I hope I have managed to convey my heart-felt thanks in a way that is genuine. Enjoyable work has been the fruit of working with good people.

Jan-Willem de Gier: For giving me freedom to pursue my research, for always supporting me, and for probably being the most loyal supervisor you will find anywhere!
Gunnar von Heijne: For sharing your fountain of wisdom, for your patience, and for being a testimony to the fact that not all nice guys finish last!
Edmund Kunji: For welcoming me warmly into your laboratory and for your infectious excitement about membrane protein research.
Pär Nordlund: For your great hospitality, and for introducing me to some of those Swedish traditions.
Dan Daley: For all those great training runs, for all those wonderful cooked meals of Geneth’s and for your enormous encouragement, help and time … thanks mate!
Magnus Monne: For probably being one of the most generous Swedes anywhere, so much so that you never complained about the random location of my socks, and you allowed me to share your king-size bed with you for 3 months!!
Louise Baars: For your kindness, integrity, and intellect.
Joy Kim: For being so joyful, for our fruitful scientific discussions, and for all your kindness and consideration. Wow.
Mikaela Rapp: For being so much fun to work with - in the lab and around the coffee table - and for making me feel so welcome when I first arrived from NZ.
Mirjam Lerch: For many interesting scientific discussions, for buying me English tea, and for your passion for just about anything.
Tara Hessa: For your encouragement and for your ability to laugh when nothing is working.
Linda Fröderberg: For calling a spade a spade, for all that endless administration, and for breaking-in Jan-Willem for us!
Dirk Slotboom: For pushing me to use my brain and for your generosity.
David Wickström: For your fantastic working attitude, for your kindness, and for passing the ball to me playing football … even if I really suck.
Samuel Wagner: For all that chocolate you let me eat, and for the gift of earplugs.
Marie Unby: For your cheery demeanor, which makes working here so much fun.
IngMarie Nilsson: For your kindness, generosity, and constant hard work.
Marika Cassel: For your kindness, and patience when I am noisy in the office.
Filippa Stenberg / Carolina Lundin: The new girls on the block … who make sure that no one will forget to buy the cake and/or champagne.

Others (past and present) in DBB who make this a great environment for working, with special thanks to: Karl-Magnus, Gisela, Pavel, Shashi, Lotta, Pelle, Nadja, Bogos, Stefan, Inger, Kicki, and Anki.

The rest of the group in Cambridge and in New Zealand: Peter M., Ted, Heather, Judy, Torsten, Ka Wai, Lisa, Marilyn and Peter H.

To my good mates in Sweden: Anders, Lisa, Stig, Maria, Joel, Brenda, Bas, Sebastian, Geneth, and the wonderful Uhrnell family.

To my good mates in New Zealand (my extended whanau): Daniel, Jamie, Esther, Nathan, Tammy, Steve, Katie, Strahan, Rachel, Jeremy, and Peter Haebel (honorary kiwi) … save a few waves for me boys!!

To my family with whom life is very rich indeed: Mum, John, Dad, Jill, Marge, Geoffrey, Aaron, Carolina, Sofia, Paul, Jay, Jonathon, Lorraine, Johan, and my dearest twin, Natasha.

To my wife Anna Maria: … well, words are just not enough to describe how happy you make me!

To my Lord: for his agape love and faithfulness.