
GFP as a tool to monitor membrane protein topology and overexpression in Escherichia coli

David Eric Drew

2005


Doctoral thesis 2005
Department of Biochemistry and Biophysics
Stockholm University, S-106 91 Stockholm, Sweden

ISBN 91-7155-160-3, pp. 1-65
Intellecta Docusys, Stockholm 2005

All previously published papers are reprinted with permission from the publisher.


Table of Contents

Abstract
Abbreviations
List of publications
1. Introduction
1.2 Membrane proteins
1.2.1 α-helical architecture
1.2.2 Membrane protein biogenesis
1.2.3 Membrane protein folding
1.2.4 Membrane proteins and lipids
2.1 Membrane protein topology
2.1.1 Topology prediction algorithms
2.1.2 Reliability of topology prediction
2.1.3 Experimental topology mapping
2.2 High-throughput topology mapping of E. coli membrane proteins
2.2.1 A consensus approach for generating topology models
2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli
2.2.3 Combining C-terminal orientation analysis with a consensus-prediction approach
2.2.4 The reliability of topologies generated by a consensus approach
2.2.5 Generating topology models by constraining TMHMM
2.2.6 Why does GFP work as a topology reporter?
2.2.7 Comparing 2D maps to 3D-structures
2.2.8 Summary of high-throughput membrane protein topology mapping
3.1 Membrane protein overexpression
3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane protein overexpression
3.1.2 ‘Trial-and-Error’
3.1.3 Choosing a membrane protein overexpression host
3.1.4 General strategies for membrane protein overexpression in E. coli
3.1.5 The BL21(DE3)pET-system
3.1.6 Membrane protein purification
3.2 High-throughput membrane protein overexpression in E. coli
3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent
3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus
3.2.3 GFP as a membrane protein folding indicator in whole cells
3.2.4 GFP-based screen to optimize membrane protein overexpression
3.2.5 In-gel GFP fluorescence
3.2.6 GFP-based purification pipeline
3.2.7 Recovery of membrane proteins from GFP fusions using a site specific protease
3.2.8 How does this GFP-based method compare to other high-throughput approaches?
3.2.9 Summary of high-throughput membrane protein overexpression
4. Characterization of the membrane protein YedZ
4.1.1 A test case for the GFP-based purification pipeline: YedZ
4.1.2 YedZ is a novel integral membrane flavocytochrome
4.1.3 The possible function of YedZ
5. Conclusions
References
Acknowledgements


Abstract

Membrane proteins are essential for life, and roughly one-quarter of all open reading frames in sequenced genomes code for membrane proteins. Unfortunately, our understanding of membrane proteins lags behind that of soluble proteins, as reflected by the fact that only 0.5% of the structures deposited in the Protein Data Bank (PDB) are of membrane proteins. This discrepancy has arisen because their hydrophobicity, which enables them to exist in a lipid environment, has made them resistant to most of the traditional approaches used to study their soluble counterparts. Novel methods are therefore required to advance our knowledge of membrane proteins.

In this thesis, a generic approach for rapidly obtaining information on membrane proteins from the classic bacterial encyclopedia Escherichia coli is described. We have developed a Green Fluorescent Protein C-terminal tagging approach with which we can obtain information on the topology and ‘expressibility’ of membrane proteins in a high-throughput manner. This technology has been applied to the whole E. coli inner membrane proteome and stands as an important advance for further membrane protein research.


Abbreviations

BiP: binding protein, Hsp70
C-terminal: carboxy-terminal
ER: endoplasmic reticulum
FRET: fluorescence resonance energy transfer
GFP: green fluorescent protein
GPCR: G-protein coupled receptor
HMM: hidden Markov model
IMAC: immobilized metal affinity chromatography
IPTG: isopropyl-β-D-thiogalactopyranoside
Lep: signal peptidase I, leader peptidase
Mo-MPT: molybdenum-molybdopterin
N-terminal: amino-terminal
NR: nitrate reductase
ORF: open reading frame
PE: phosphatidylethanolamine
PhoA: alkaline phosphatase
Pmf: proton motive force
SRP: signal recognition particle
Tat: twin arginine translocation
TEV: tobacco etch virus
TMs: transmembrane segments
UPR: unfolded protein response


Amino acid designations

Alanine         Ala   A
Cysteine        Cys   C
Aspartic acid   Asp   D
Glutamic acid   Glu   E
Phenylalanine   Phe   F
Glycine         Gly   G
Histidine       His   H
Isoleucine      Ile   I
Lysine          Lys   K
Leucine         Leu   L
Methionine      Met   M
Asparagine      Asn   N
Proline         Pro   P
Glutamine       Gln   Q
Arginine        Arg   R
Serine          Ser   S
Threonine       Thr   T
Valine          Val   V
Tryptophan      Trp   W
Tyrosine        Tyr   Y


List of publications

This thesis is based upon the following publications:

Paper I. Drew D, Sjöstrand D, Nilsson J, Urbig T, Chin CN, de Gier JW, von Heijne G. Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A. 2002 Mar 5;99(5):2690-5.

Paper II. Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melen K, De Gier JW, von Heijne G. Experimentally based topology models for E. coli inner membrane proteins. Protein Sci. 2004 Apr;13(4):937-45.

Paper III. Drew D, von Heijne G, Nordlund P, de Gier JW. Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett. 2001 Oct 26;507(2):220-4.

Paper IV. Drew D, Slotboom D, Friso G, Reda T, Genevaux P, Rapp M, Meindl-Beinker N, Lambert W, Lerch M, Daley DO, van Wijk KJ, Hirst J, Kunji E, de Gier JW. A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci. 2005 Aug;14(8):2011-7.

Other Publications

Urbanus ML, Fröderberg L, Drew D, Bjork P, de Gier JW, Brunner J, Oudega B, Luirink J. Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem. 2002 Apr 12;277(15):12718-23.

Drew D, Fröderberg L, Baars L, de Gier JW. Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta. 2003 Feb 17;1610(1):3-10. Review.

Daley DO, Rapp M, Granseth E, Melen K, Drew D, von Heijne G. Global topology analysis of the Escherichia coli inner membrane proteome. Science. 2005 May 27;308(5726):1321-3.


1. Introduction

All cells are surrounded by a membrane, a barrier that separates the cell from the environment it faces. The membrane is composed mainly of lipid and protein, at an average ratio of 1:1 (Boon and Smith, 2002). Lipids have a dual nature: they consist of polar head groups that favour contact with water, and hydrophobic tails, made up of acyl carbon chains, that avoid water. The lipids pack into a fluid bilayer in which the tails face each other and the head groups, e.g., phosphate, are in contact with the surrounding water (Figure 1).

Figure 1. Schematic representation of a lipid bilayer; blue spheres represent polar head-groups, yellow sticks represent lipid tails, coloured cylinders represent membrane proteins, and attached sugars are represented by black antlers.

The driving force in the formation of a lipid bilayer is the spontaneous packing of hydrophobic tails: reducing the hydrated hydrophobic surface increases the entropy of water, i.e., the hydrophobic effect. The outcome is a hydrophobic barrier that most molecules cannot cross without the aid of proteins embedded in it. These ‘membrane proteins’ are required not only to transport various compounds across the membrane, either passively or actively, but also, e.g., to provide structural support, maintain voltage differences, enable interactions with other cells, and transfer information from the outside to the inside of the cell. In short, membrane proteins are essential for life, and roughly one-quarter of our genes code for membrane proteins (Wallin and von Heijne, 1998). Strikingly, about half of all drugs manufactured today are targeted to membrane proteins (Muller, 2000).

Unfortunately, our understanding of membrane proteins lags behind that of soluble proteins, as reflected by the fact that only 0.5% of the structures deposited in the Protein Data Bank (PDB) are of membrane proteins (White, 2004). This discrepancy has arisen because their hydrophobicity, which enables them to exist in a lipid environment, has made them resistant to most of the traditional approaches used to study their soluble counterparts. Novel methods are therefore required to advance our knowledge of membrane proteins.

In this thesis, a generic approach for rapidly obtaining information on membrane proteins from the classic bacterial encyclopedia Escherichia coli is described. We have developed a Green Fluorescent Protein (GFP) C-terminal tagging approach with which we can obtain information on the topology and ‘expressibility’ of membrane proteins in a high-throughput manner. This technology has been applied to the E. coli inner membrane proteome and stands as an important advance for further membrane protein research.

Before we discuss this work in detail, a clearer understanding of membrane proteins is required.


1.2 Membrane proteins

Like lipids, membrane proteins consist of hydrophobic and hydrophilic parts. These parts come together to produce two types of membrane protein architecture: α-helical membrane proteins and β-barrel membrane proteins (Figure 2). β-barrel membrane proteins are composed of an even number of antiparallel β-strands that hydrogen-bond laterally to each other to form the barrel (Schulz, 2003). Amino acid side chains of mixed polarity extend into the aqueous pore, whilst amino acids with apolar side chains line the outside of the barrel and project into the lipid bilayer. Because this class of membrane proteins is restricted to the outer membranes of Gram-negative bacteria and mitochondria and to the outer envelope membrane of chloroplasts, it is not discussed further here.

Figure 2: Two types of membrane protein architecture: (a) an example of an α-helical membrane protein and (b) an example of a β-barrel membrane protein (Walian et al., 2004).


1.2.1 α-helical architecture

The majority of membrane proteins are α-helical (Wallin and von Heijne, 1998); henceforth, they are referred to simply as ‘membrane proteins’. α-helical secondary structure is stabilized by main-chain hydrogen bonds between backbone amide and carbonyl groups four amino acids apart. Amino acid side chains, which differ in their physicochemical properties, extend outward predominantly at right angles to the helix axis, so that, e.g., apolar side chains can project into the hydrophobic core of the lipid bilayer. Three-dimensional (3D) structures confirm that α-helices typically span the full width of the lipid bilayer; such helices are referred to as transmembrane segments (TMs). Statistically, TMs are around 20-25 amino acids long, with an average tilt angle of 24° to the membrane normal (Ulmschneider et al., 2005), though this tilt can change to accommodate the thickness of the lipid bilayer (Park and Opella, 2005).
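As a rough check on how these numbers fit together, the distance a straight TM spans along the membrane normal is approximately its length multiplied by the cosine of its tilt. The sketch below assumes an ideal, straight α-helix with the textbook axial rise of about 1.5 Å per residue; that value is an assumption, not taken from the studies cited above.

```python
import math

# Assumption: axial rise of an ideal, straight alpha-helix (textbook value, Å per residue).
RISE_PER_RESIDUE_A = 1.5


def span_along_normal(n_residues: int, tilt_deg: float) -> float:
    """Approximate distance (Å) that a straight helix of n residues spans along
    the membrane normal when tilted by tilt_deg degrees from that normal."""
    helix_length = n_residues * RISE_PER_RESIDUE_A
    return helix_length * math.cos(math.radians(tilt_deg))


for n in (20, 25):
    # 24 degrees is the average tilt quoted above (Ulmschneider et al., 2005).
    print(n, "residues:", round(span_along_normal(n, 24.0), 1), "Å")
```

Under these assumptions, a 20-25 residue TM tilted by 24° spans roughly 27-34 Å, which is comparable to the width of the hydrophobic core of the bilayer (±15 Å from the membrane centre).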

- Helix core -

The core region of the lipid bilayer, defined as ±15 Å from the centre of the membrane, has the lowest dielectric constant. Consequently, charged residues are uncommon in the middle of TMs (<6%), whereas the hydrophobic amino acids leucine, valine, isoleucine and alanine are abundant (~45%) (Ulmschneider et al., 2005; Figure 3). Amino acids with small side chains, e.g., glycine and serine, are also common (7% each), facilitating packing between TMs (see section 1.2.3).
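To make these composition figures concrete, the following minimal sketch computes the fraction of residues in each class for a single TM segment. The residue groupings follow the text (L, V, I, A as hydrophobic; D, E, K, R as charged; G, S as small); the example sequence is hypothetical.

```python
def tm_composition(tm: str) -> dict[str, float]:
    """Fraction of residues in a TM segment that fall into each residue class."""
    tm = tm.upper()
    if not tm:
        raise ValueError("empty sequence")
    classes = {
        "hydrophobic": set("LVIA"),  # leucine, valine, isoleucine, alanine
        "charged": set("DEKR"),      # aspartate, glutamate, lysine, arginine
        "small": set("GS"),          # glycine, serine
    }
    return {
        name: sum(residue in members for residue in tm) / len(tm)
        for name, members in classes.items()
    }


# Hypothetical 20-residue segment, for illustration only.
print(tm_composition("LLVIAGLLSAVLGIVKLLAV"))
```

Averaging such fractions over many segments, restricted to the core region, is the kind of statistic behind the percentages quoted above; a single segment can of course deviate strongly from the average.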

Biophysical and biological scales are broadly consistent with this statistical analysis. The charged residues arginine, aspartate, glutamate and lysine are clearly disfavoured in the middle of the helix (∆Gapp of 2.5 to 3.5 kcal/mol), whereas the hydrophobic amino acids leucine, valine, isoleucine and phenylalanine are favoured (∆Gapp of -0.5 to -0.3 kcal/mol) (Hessa et al., 2005a). Proline, although unfavourable, is often found in TMs, where it induces a change in helix angle of some functional significance (Senes et al., 2004); e.g., the sixth TM segment of the voltage-gated potassium channel Kv1.2 contains a conserved Pro-X-Pro motif that forms a receptor for its voltage sensor (Long et al., 2005a). Indeed, proline has one of the largest phenotypic propensities among TM residues in the Human Gene Mutation Database (Senes et al., 2004).
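To give a feel for what these ∆Gapp values mean, the sketch below converts an apparent free energy into the fraction of molecules that insert. It assumes the definition used with the Hessa et al. insertion assay, ∆Gapp = -RT ln Kapp, with Kapp taken as the ratio of inserted to non-inserted molecules; the temperature of 30 °C is also an assumption.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_inserted(dg_app: float, temp_k: float = 303.15) -> float:
    """Fraction of molecules inserted for a given dG_app (kcal/mol).

    Assumes dG_app = -RT ln K_app with K_app = inserted / not inserted,
    so that the inserted fraction is K_app / (1 + K_app). Intended for the
    modest values (a few kcal/mol) discussed in the text.
    """
    k_app = math.exp(-dg_app / (R_KCAL * temp_k))
    return k_app / (1.0 + k_app)


for dg in (-1.0, 0.0, 1.0):
    print(f"dG_app = {dg:+.1f} kcal/mol -> {fraction_inserted(dg):.2f} inserted")
```

Because RT is only about 0.6 kcal/mol at this temperature, a shift of 1 kcal/mol moves the inserted fraction substantially, which is why per-residue differences of a few tenths of a kcal/mol can matter for marginally hydrophobic segments.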

Figure 3: Schematic representation of a TM segment in a lipid bilayer; residues with positional preference are indicated by their short-hand nomenclature, e.g., W= tryptophan (see abbreviations).

- Interfacial regions -

The aromatic amino acids tryptophan and tyrosine have a clear preference (∆Gapp -0.6 kcal/mol) for the lipid interface (-25 to -15 Å and 15 to 25 Å), as these residues can match their amphipathic side-chain character to that of the interfacial lipid region (Figure 3; Hessa et al., 2005a). The penalty for moving tryptophan or tyrosine from the interface to the aqueous phase has been determined to be 1.85 and 0.94 kcal/mol, respectively (Wimley and White, 1996). In addition, the terminal placement of tryptophan in a model polyleucine TM segment is enough to promote a C-terminal-in orientation (Higy et al., 2004). In contrast, phenylalanine has no positional preference for the interface (Hessa et al., 2005a; Ulmschneider et al., 2005).

Charged residues make up one-fifth of the amino acids found in this region (Ulmschneider et al., 2005). Along with polar residues, they often extend their side chains into the aqueous phase to help anchor TMs (Chamberlain et al., 2004). This ‘snorkeling’ is calculated to be stronger for the positively charged residues lysine and arginine, either because their side chains are longer or because they also interact favourably with negatively charged lipid head groups (Strandberg and Killian, 2003). Snorkeling is also apparent for positively charged residues in interfacial helices, which make up 30% of the non-TM fold (Granseth et al., 2005b). Interestingly, lysine can make π-cation interactions with tyrosine, and this pairing promotes additional long-range electrostatic interactions with negatively charged lipid head groups (Gromiha and Suwa, 2005).

Because proline can destabilize helices, it is more likely to be found at either end of the helix (the interfacial region), with the C-terminal end better tolerated than the N-terminal end (Yohannan et al., 2004). The destabilizing effect of proline is calculated to be stronger in straight TMs than in angled TMs (Senes et al., 2004). Proline may also aid protein folding by promoting the formation of random coils (Ulmschneider et al., 2005), which make up 70% of the non-TM fold found in this region (Granseth et al., 2005b).

- Non-membranous domains -

The hydrophilic parts of membrane proteins consist of the N- and C-terminal tails and the ‘loops’ that connect TMs. In all organisms, positively charged residues are more frequent in cytoplasmic non-membranous domains, an observation termed the ‘positive-inside rule’ (von Heijne, 1989). The preference of these positively charged residues for the cytoplasmic side influences the orientation of the connecting TMs accordingly. The ability of positively charged residues to dictate the orientation of a TM segment seems to depend on the overall hydrophobicity of the TM segment and on the distance of the charged residues from it (Higy et al., 2004; Nilsson et al., 2005). The basis for the rule remains unclear. Although the proton motive force was shown some time ago to be required for the rule in E. coli (Andersson and von Heijne, 1994), this offers only a partial explanation, as there is no apparent electrochemical potential across the ER membrane. Recently, it was reported that charged residues in the translocon itself may play a role by attracting or repelling charged amino acids in the nascent chain, thereby orienting a TM segment before it inserts into the lipid bilayer (Goder et al., 2004). Cytoplasmic N- and C-terminal tails are predicted to be preferred in all cells (Wallin and von Heijne, 1998), and 60% of E. coli membrane proteins have been experimentally shown to have both their N- and C-terminal ends in the cytoplasm (Daley et al., 2005). It also appears that helices may prefer to insert into the lipid bilayer as pairs (Hermansson and von Heijne, 2003); the targeting and insertion of membrane proteins are discussed below.
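The basic bookkeeping behind the positive-inside rule can be sketched as follows, assuming the TM boundaries are already known. Because consecutive loops are separated by one TM, the even-numbered and odd-numbered loops lie on opposite sides, and the set carrying more lysines and arginines is placed in the cytoplasm. This is a simplified illustration: it ignores the dependence on TM hydrophobicity and charge distance noted above, as well as refinements used by real predictors such as discounting long loops.

```python
def kr_count(seq: str) -> int:
    """Number of lysine (K) and arginine (R) residues in a sequence."""
    seq = seq.upper()
    return seq.count("K") + seq.count("R")


def n_terminal_side(loops: list[str]) -> str:
    """Predict the side of the N-terminal tail from the positive-inside rule.

    loops[0] is the N-terminal tail and loops[-1] the C-terminal tail; consecutive
    loops are separated by one TM and therefore lie on opposite sides.
    Returns 'in' (cytoplasmic), 'out', or 'undecided' if the K+R counts tie.
    """
    even = sum(kr_count(loop) for loop in loops[0::2])
    odd = sum(kr_count(loop) for loop in loops[1::2])
    if even == odd:
        return "undecided"
    return "in" if even > odd else "out"


# Hypothetical two-TM protein: N-terminal tail, connecting loop, C-terminal tail.
print(n_terminal_side(["MSKKRL", "GSDTEN", "RKKAE"]))
```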

1.2.2 Membrane protein biogenesis

Although the soluble domains of membrane proteins can fold autonomously in the aqueous milieu of the cell, hydrophobic TMs need to be actively assisted into the lipid bilayer. This assistance helps overcome the activation barrier to insertion imposed by the lipid bilayer and prevents TMs from aggregating in the cytoplasm.

How does this work? Once a (presumably) α-helical and sufficiently hydrophobic stretch of amino acids has exited the ribosome tunnel, it is interpreted by the cell as a ‘signal’ for targeting to the membrane (Batey et al., 2000; Huber et al., 2005). This signal is often present at the N-terminus of the membrane protein and is typically recognized by the signal recognition particle (SRP) (Luirink and Sinning, 2004). SRP binds to the polypeptide chain and, at least in eukaryotes, halts further translation while targeting the nascent chain to the lipid bilayer. Whether the ribosome can ‘prime’ SRP by sensing the presence of a TM segment before it exits the ribosome is a matter of debate (Houben et al., 2005; Woolhead et al., 2004). At the membrane, SRP makes contact with the SRP receptor, and the nascent chain is subsequently transferred to the Sec translocon, a multimeric protein-conducting channel embedded in the lipid bilayer (Driessen et al., 2001; Van den Berg et al., 2004). Translation then resumes, and if the targeted segment and/or other segments downstream are sufficiently hydrophobic and long, they pass laterally through an opening in the Sec translocon (Rapoport et al., 2004). The degree of insertion seems to depend solely on energetically favourable helix-lipid interactions (Hessa et al., 2005b). In the lipid bilayer, TM folding can be aided early on by membrane-bound chaperones such as YidC in the bacterial cytoplasmic membrane (Houben et al., 2005).

1.2.3 Membrane protein folding

Membrane protein structures can (almost) be considered ‘inside-out’ soluble proteins, as the exterior of a membrane protein is on average twice as hydrophobic as its interior (Adamian et al., 2005). For membrane proteins with multiple TMs, the TMs must come together to form a functional protein. From a global perspective, the hydrophobic effect drives the formation of α-helices and their subsequent insertion through the translocon (unfolding a 20-amino-acid helix in the lipid bilayer would cost ~40-80 kcal/mol; Schneider, 2004), but what pushes helices together?

The thermodynamic contributions to this process have been difficult to assess because membrane proteins are difficult to purify and do not fold reversibly under standard laboratory conditions (DeGrado et al., 2003), which is a requirement for measuring folding equilibria; a fully reversible system was, however, recently established for the β-barrel membrane protein OmpA (Hong and Tamm, 2004). Much of our understanding of this process comes from studies of helix dimerization of glycophorin A (gpA), in which association mediated by the GlyXXXGly motif (widespread in TMs; Senes et al., 2000) can be conveniently monitored, e.g., by analytical ultracentrifugation, fluorescence resonance energy transfer (FRET), or gel electrophoresis (White and von Heijne, 2005).
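Finding GlyXXXGly motifs in a TM sequence is a simple pattern search, sketched below with a regular expression. A lookahead is used so that overlapping motifs (e.g., GxxxGxxxG) are each reported. The query sequence is hypothetical, and the presence of the motif alone does not establish that a helix dimerizes.

```python
import re

# Lookahead makes overlapping motifs (e.g. GxxxGxxxG) report every start position.
GXXXG = re.compile(r"(?=G...G)")


def gxxxg_sites(tm: str) -> list[int]:
    """0-based start positions of every GxxxG motif in a TM sequence."""
    return [m.start() for m in GXXXG.finditer(tm.upper())]


print(gxxxg_sites("LIIFGVMAGVIGTILL"))  # hypothetical query sequence
```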

The predominant view is that, once the helices are inserted in the membrane, helix-helix association is driven by the formation of favourable electrostatic interactions between the side chains of polar amino acids (Dawson et al., 2002). This is in line with statistical analyses, as polar residues make up 20% of all residues found in TMs (Dawson et al., 2003), and with the observation that in every multispanning membrane protein structure solved so far, every TM segment forms at least one inter-helical hydrogen bond (Senes et al., 2004). Conceivably, the flip side of promiscuous electrostatic interactions between TMs is that they could lead to aggregation through the formation of erroneous hydrogen bonds (Schneider, 2004). Yet this does not appear to be the case. Once the helices are driven together, their packing is coordinated by close, specific van der Waals interactions between non-polar residues, which often interlock to form ‘knobs-into-holes’ packing (Engelman et al., 2003). As demonstrated for TMs in mechanosensitive ion channels, this helix packing can be fine-tuned to control the function of the protein in an exquisite way (Edwards et al., 2005). Mechanosensitive channels are force-transducing molecules that move their TMs to open a channel in response to membrane tension (Kung, 2005). Mutations in a pore-forming TM segment of a bacterial mechanosensitive channel that strengthen knobs-into-holes packing with interacting TMs lead to a loss of function, as the channel no longer opens under the same membrane tension; in contrast, a mutation to the polar amino acid serine makes the channel easier to open, as ‘wild-type’ helical packing is lost (Edwards et al., 2005).


At short distances, hydrogen bonds between main-chain Cα-H donors and O acceptors (Cα-H…O) may also stabilize helix packing (Senes et al., 2001). Although each such interaction is weak, there can be many of them; e.g., photosystem I contains 34 TMs and 75 Cα-H…O hydrogen bonds (Jordan et al., 2001). Helix association can be strong enough to maintain oligomerization even in the absence of lipids and in the presence of a harsh detergent; e.g., the potassium channel KcsA remains a tetramer in SDS (Krishnan et al., 2005).

Beyond the two-stage model of membrane protein folding (insertion of single TMs followed by their packing), the association of helices also depends on the insertion of cofactors, the folding of extramembranous polypeptide segments, and the assembly of membrane protein complexes (Engelman et al., 2003). Lastly, it has been speculated that the lipids themselves might drive interactions between TMs, as computational studies suggest that lipid entropy increases as the protein-lipid interface decreases (Helms, 2002).

1.2.4 Membrane proteins and lipids

Clearly, membrane proteins and lipids go hand in hand: lipids define which amino acid residues are favourable in helices and dictate the rate at which TMs insert through the translocon and fold. It is becoming increasingly clear not only that certain lipids interact more favourably with some amino acids (e.g., long-range lysine/tyrosine π-cation interactions with phosphate head groups; Gromiha and Suwa, 2005) or with some membrane proteins (e.g., cardiolipin in the purple bacterial photosynthetic reaction centre; Fyfe et al., 2005), but also that lipids must, to some extent, exert different lateral pressures on different membrane proteins (Jensen et al., 2004).

Lateral pressure in a membrane is increased by the addition of lipids with unsaturated chains and/or non-bilayer head groups, e.g., phosphatidylethanolamine (PE). One idea is that membrane proteins insert more easily into a bilayer of lower curvature stress (as shown, e.g., by in vitro folding studies of bacteriorhodopsin in different liposomes), but that a certain degree of lateral pressure is still needed to maintain a functional state (Booth, 2005). Closely linked to this is the match between bilayer thickness and TM segment length, i.e., the degree of hydrophobic mismatch between the α-helices and the lipids (Jensen and Mouritsen, 2004). As demonstrated in vitro with the bacterial melibiose transporter (there are many analogous examples), maximum transport is reached only at specific acyl chain lengths (Jensen and Mouritsen, 2004). Indeed, apparently to match the thickness of the lipid bilayer, TMs of Golgi membrane proteins are on average five amino acids shorter than those of plasma membrane proteins (Munro, 1998). Lastly, lipid composition clearly affects membrane protein topology (see next section). For instance, in the absence of PE, the first six TMs of lactose permease (LacY) are inverted; adding PE after assembly of this partly inverted protein restores the correct topology (Bogdanov et al., 2002).


2.1 Membrane protein topology

It is envisaged that more of the rules governing the architecture of membrane proteins will be resolved in the future, eventually allowing meaningful in silico prediction of membrane protein 3D structures from amino acid sequence (White and von Heijne, 2005). At present, to bridge the gap created by the lack of membrane protein structures, one can formulate 2D-structure models using computer algorithms. 2D structures are commonly referred to as ‘topology’ models; they define the number and position of TMs and their orientation relative to the membrane.

2.1.1 Topology prediction algorithms

The simplest topology models are produced solely by computer algorithms. The five topology predictors used in this thesis are described below.

[1] The algorithm TopPred scans a given amino acid sequence for TMs by searching for stretches whose hydrophobicity, averaged over a typical TM segment length (a trapezoid-shaped window of 21 amino acids), exceeds a threshold. The positive-inside rule is then used to decide the orientation of the TMs (von Heijne, 1992).

[2] The Memsat algorithm increases the number of states used in TopPred from two (helix or loop) to five (inside loop, outside loop, inside helix cap, helix core, and outside helix cap). The probability, or likelihood, that each amino acid of an input sequence belongs to each of these states is calculated from a database of membrane proteins of well-characterized topology. The most probable outcome, i.e., the topology, is formulated by the statistical method ‘expectation maximization’, and the orientation and location of TMs are agreed upon by incorporating an additional dynamic programming algorithm (Jones et al., 1994).

[3] The PHDhtm algorithm distinguishes only two states (helix or loop) but, unlike TopPred, improves the signal by using a multiple sequence alignment as input. Notably, the algorithm uses neural networks ‘trained’ on a set of membrane proteins of known topology (Rost et al., 1996).

[4 and 5] The latest-generation topology prediction programs HMMTOP (Tusnady and Simon, 1998) and TMHMM (Krogh et al., 2001) combine the ‘best’ features of the aforementioned programs. Like Memsat, HMMTOP and TMHMM take into account several states, five and seven respectively, and, analogous to PHDhtm, they use machine learning, in this case hidden Markov models (HMMs), to look for amino acid distribution patterns similar to those defined in the training set. One advantage of TMHMM over the other algorithms is that it also generates reliability scores. Recently, a newer version of TMHMM was developed that, like PHDhtm, accepts multiple sequence alignments as input; this improves TMHMM prediction performance by ~8% (Viklund and Elofsson, 2004).
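The hydrophobicity scan described for TopPred in [1] can be sketched as follows. This is a simplified illustration, not TopPred itself: the Kyte-Doolittle scale, the 11-residue core of the 21-residue trapezoid window, the cutoff of 1.6, and the greedy choice of non-overlapping windows are all assumptions made here, and the orientation step (the positive-inside rule) is omitted.

```python
# Kyte-Doolittle hydropathy scale (an assumption; TopPred can use other scales).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def trapezoid_weights(full: int = 21, core: int = 11) -> list[float]:
    """Weights that ramp up linearly over each flank and equal 1.0 across the core."""
    if (full - core) % 2:
        raise ValueError("full - core must be even")
    flank = (full - core) // 2
    ramp = [(i + 1) / (flank + 1) for i in range(flank)]
    return ramp + [1.0] * core + ramp[::-1]


def window_scores(seq: str, full: int = 21, core: int = 11) -> list[float]:
    """Weighted mean hydrophobicity of every window; list index = window start."""
    weights = trapezoid_weights(full, core)
    total = sum(weights)
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(w * values[start + i] for i, w in enumerate(weights)) / total
        for start in range(len(values) - full + 1)
    ]


def candidate_tms(seq: str, threshold: float = 1.6, full: int = 21, core: int = 11):
    """Greedily pick non-overlapping windows scoring at or above a hypothetical cutoff.

    Returns (start, end) pairs, 0-based and end-exclusive, sorted by position.
    """
    scores = window_scores(seq, full, core)
    picked: list[int] = []
    for start in sorted(range(len(scores)), key=scores.__getitem__, reverse=True):
        if scores[start] < threshold:
            break  # every remaining window scores lower
        if all(abs(start - p) >= full for p in picked):
            picked.append(start)
    return sorted((s, s + full) for s in picked)


# Hypothetical sequence: a 21-residue leucine stretch flanked by lysines.
print(candidate_tms("K" * 20 + "L" * 21 + "K" * 20))
```

The trapezoid shape down-weights residues at the window edges, so a candidate segment is scored mainly on its central residues while the flanks still contribute.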

2.1.2 Reliability of topology prediction

TMHMM accurately predicts the topology of 75% of the membrane proteins used to train its HMM (Krogh et al., 2001). However, as this training set is quite small, the predictive power is poorer for previously unseen membrane proteins, 55-60% (Melen et al., 2003). The training set is also biased, as experimentally determined topologies have favoured membrane proteins that were easier to analyze because they have clearly defined topological features, e.g., unusually hydrophobic TM segments and/or an obvious difference in positive charge between inside and outside loops (Melen et al., 2003). As many of the easy-to-analyze proteins are prokaryotic in origin, eukaryotic membrane proteins are underrepresented in all training sets (Ott and Lingappa, 2002). Thus, the predictive performance of TMHMM for eukaryotic membrane proteins is slightly worse, ~50% (Melen et al., 2003).

Highly reliable topology models can be generated by combining the five prediction methods described above (TopPred, Memsat, PHDhtm, HMMTOP, and TMHMM): when all methods agree, the topology is virtually certain to be correct, whereas the fraction of correct topologies decreases with increasing disagreement between the methods (Nilsson et al., 2000).
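The consensus idea can be sketched as counting how many predictors return the same topology, with the agreement count serving as a rough reliability measure. The string encoding of topologies and the example predictions are hypothetical; how Nilsson et al. compared predictions in detail is not reproduced here.

```python
from collections import Counter


def consensus(predictions: dict[str, str]) -> tuple[str, int]:
    """Return the most frequently predicted topology and how many methods agree on it.

    Each topology is encoded as a string; the encoding is an illustrative assumption.
    """
    if not predictions:
        raise ValueError("no predictions given")
    topology, n_agree = Counter(predictions.values()).most_common(1)[0]
    return topology, n_agree


# Hypothetical output for one protein: 'i'/'o' = loop inside/outside, '|' = one TM.
preds = {
    "TopPred": "i|o|i",
    "Memsat": "i|o|i",
    "PHDhtm": "o|i|o",
    "HMMTOP": "i|o|i",
    "TMHMM": "i|o|i",
}
topology, n_agree = consensus(preds)
print(f"{topology}: {n_agree}/{len(preds)} methods agree")
```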

One way to improve membrane protein topology prediction is to anchor, bioinformatically, domains that are certain to lie on one particular side of the membrane, e.g., a cytosolic tyrosine phosphatase domain. In eukaryotic genomes, such domains provide 11% coverage (Bernsel and von Heijne, 2005). Alternatively, the locations of the loops and tails of a membrane protein can be mapped experimentally by a variety of methods (explained below). Simply determining the location of the C-terminal tail of E. coli membrane proteins improves the overall prediction accuracy of TMHMM from 55 to 70%, as this domain can then be fixed in the topology prediction (Melen et al., 2003).
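Why a single experimentally fixed location constrains the whole model can be seen from simple parity: with n TMs, the two termini lie on the same side of the membrane when n is even and on opposite sides when n is odd. The sketch below uses this to discard candidate models, described hypothetically as (N-terminal side, number of TMs), that disagree with a measured C-terminal location. It illustrates only the bookkeeping, not how TMHMM incorporates a fixed location internally.

```python
def c_terminal_side(n_term: str, n_tms: int) -> str:
    """Side ('in' or 'out') of the C-terminus, given the N-terminal side and TM count."""
    if n_term not in ("in", "out"):
        raise ValueError("n_term must be 'in' or 'out'")
    flipped = {"in": "out", "out": "in"}
    # Each TM crosses the membrane once, so an odd number of TMs flips the side.
    return n_term if n_tms % 2 == 0 else flipped[n_term]


def consistent_models(candidates: list[tuple[str, int]], c_term_observed: str):
    """Keep hypothetical (n_term, n_tms) models whose C-terminus matches the measurement."""
    return [(n, k) for n, k in candidates if c_terminal_side(n, k) == c_term_observed]


# Hypothetical candidate models for one protein; the C-terminus was measured as 'in'.
print(consistent_models([("in", 4), ("out", 4), ("in", 5)], "in"))
```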

2.1.3 Experimental topology mapping

Experimental approaches are often used to refine in silico topology models, which are not only biased but also (in general) likely to miss details that are hard, if not impossible, to predict, e.g., unanticipated inter- and intra-protein interactions (Ott and Lingappa, 2002). One way of obtaining such information is to use site-directed mutagenesis to introduce amino acids or short sequences that can be detected by different topology determination methods, e.g., cysteine scanning, glycosylation mapping, and proteolytic cleavage.

For eukaryotic membrane proteins, the most common method is glycosylation mapping, which takes advantage of the fact that N-linked glycosylation (the addition of ~2.5 kDa worth of sugars to Asn-X-Ser/Thr acceptor sequences) is possible only within the luminal compartment of the ER. In practice, glycosylation acceptor sequences are introduced by site-directed mutagenesis into the predicted soluble parts of the membrane protein, which is then transcribed and translated in vitro in the presence of ER-derived microsomal membranes. Glycosylated forms of the membrane protein are distinguished from unglycosylated forms by their slightly higher molecular weight after separation by SDS-PAGE (Nilsson and von Heijne, 1993).
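Locating candidate acceptor sites and estimating the expected size shift are both simple to sketch. The pattern below excludes proline at the X position, a well-established requirement for N-linked glycosylation that the text above does not state; whether a given site is actually used also depends on factors such as its distance from the membrane, which is not modelled. The ~2.5 kDa per glycan is the value quoted above.

```python
import re

# Asn-X-Ser/Thr acceptor site with X != Pro; the lookahead reports overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")

GLYCAN_KDA = 2.5  # approximate mass added per glycan, as quoted in the text


def sequon_positions(seq: str) -> list[int]:
    """0-based positions of the Asn in every candidate acceptor site."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def expected_shift_kda(n_glycosylated: int) -> float:
    """Approximate apparent size increase on SDS-PAGE for n glycosylated sites."""
    return n_glycosylated * GLYCAN_KDA


sites = sequon_positions("AANGSAANPSNNTT")  # hypothetical sequence
print(sites, expected_shift_kda(len(sites)), "kDa if every site were used")
```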

Perhaps the most labor-intensive, yet the most informative and least invasive, topology mapping method is cysteine scanning. In this method, cysteines are introduced by mutagenesis into a cysteine-less version of the membrane protein, and their location is mapped with membrane-permeable or membrane-impermeable thiol reagents (Bogdanov et al., 2005). This is a powerful method, as it makes it possible to elucidate the local environment of a single amino acid. The approach was nicely demonstrated for the secondary active transporter LacY (Frillingos et al., 1998).
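The interpretation step of a cysteine-scanning experiment can be sketched as a small decision rule applied to each single-cysteine mutant. The sketch assumes labeling of intact cells, where a membrane-impermeable reagent reaches only cysteines on the periplasmic side and a permeable reagent reaches both sides; real protocols (e.g., with blocking steps and controls) are more involved, and the example results are hypothetical.

```python
def cysteine_location(impermeable_labels: bool, permeable_labels: bool) -> str:
    """Classify one single-cysteine mutant from two thiol-labeling results.

    Assumes intact cells: an impermeable reagent reaches only periplasm-facing
    cysteines, whereas a permeable reagent reaches both sides of the membrane.
    """
    if impermeable_labels:
        return "periplasmic"
    if permeable_labels:
        return "cytoplasmic"
    return "inaccessible"  # e.g., buried in the bilayer or the protein interior


# Hypothetical results keyed by the position of the introduced cysteine:
# (labeled by impermeable reagent, labeled by permeable reagent).
results = {12: (False, True), 48: (True, True), 75: (False, False)}
for position, (imperm, perm) in sorted(results.items()):
    print(position, cysteine_location(imperm, perm))
```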

Another approach for obtaining topology information is to fuse a reporter to each of the predicted solvent-exposed domains of the membrane protein. The reporter can be fused end-to-end or, if the chimera retains activity, ‘sandwiched’ into different loops so that the full-length membrane protein is always expressed (van Geest and Lolkema, 2000). In E. coli, the two most common reporters are enzymes that are active on only one side of the membrane, either in the cytoplasm or in the periplasmic space (see below).

[1] Alkaline phosphatase (PhoA) is a soluble bacterial protein that is folded and functional only when exported to the periplasm of E. coli, where it can form its essential disulfide bonds. It was one of the first, and remains one of the most commonly used, topology reporters. PhoA activity (the hydrolysis of phosphate esters) is easily measured with a substrate that changes colour upon hydrolysis, e.g., p-nitrophenyl phosphate, which turns yellow. If PhoA remains in the reducing environment of the cytoplasm, it cannot form disulfide bonds and is therefore sensitive to proteolysis (Manoil, 1991).


[2] β-galactosidase (LacZ) is a large tetrameric cytoplasmic enzyme, encoded by the classic ‘lac operon’, that hydrolyzes lactose into galactose and glucose. It complements PhoA because it is active only in the cytoplasm; when targeted to the periplasm, it becomes trapped in the membrane and is inactive. Its activity can also be measured colorimetrically, as it cleaves the chromogenic substrate X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) to give a blue product (Manoil, 1991).

To avoid false negatives, reporter activity is usually normalized to protein expression, which is typically measured by Western blotting or immunoprecipitation (IP) (van Geest and Lolkema, 2000). Analyzing many fusions is therefore often labor-intensive. A disadvantage of LacZ is that it may generate false positives through a number of artifacts, e.g., saturation of the export machinery. In contrast, PhoA is reported to be more reliable, because an active fusion must have been successfully exported to the periplasm. In principle, fusing both PhoA and LacZ to the same sites in the membrane protein is best. Unfortunately, ambiguous results, with both LacZ and PhoA activities high at identical fusion sites, have been reported in many cases (van Geest and Lolkema, 2000).
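The normalization and the PhoA/LacZ decision described above can be sketched in Python. The class and function names, the cut-off values and the example fusions are hypothetical; real cut-offs would be set from fusions of known location.

```python
from dataclasses import dataclass


@dataclass
class Fusion:
    site: str          # fusion junction, e.g. "after TM3"
    phoa: float        # PhoA activity (arbitrary units)
    lacz: float        # LacZ activity (arbitrary units)
    expression: float  # relative fusion level, e.g. from a Western blot


def call_side(f: Fusion, phoa_cutoff: float, lacz_cutoff: float) -> str:
    """Call which side of the membrane a fusion junction faces.

    Activities are normalized to expression first, so that a poorly
    expressed fusion is not mistaken for an inactive one.
    """
    if f.expression <= 0:
        return "not expressed"
    phoa_high = f.phoa / f.expression >= phoa_cutoff
    lacz_high = f.lacz / f.expression >= lacz_cutoff
    if phoa_high and not lacz_high:
        return "periplasmic"
    if lacz_high and not phoa_high:
        return "cytoplasmic"
    return "ambiguous"  # both high (as reported in many cases) or both low


fusions = [Fusion("site A", 900, 50, 1.0), Fusion("site B", 20, 600, 2.0),
           Fusion("site C", 500, 500, 1.0)]
for f in fusions:
    print(f.site, call_side(f, phoa_cutoff=100, lacz_cutoff=200))
```

Returning "ambiguous" rather than forcing a call keeps the double-high case visible instead of silently trusting one reporter.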

Papers I and II

This thesis deals with the development of GFP as a high-throughput cytoplasmic membrane protein topology reporter. GFP can be used in combination with the periplasmic reporter PhoA to rapidly establish the C-terminal tail orientation of a membrane protein, and combining this information with bioinformatics is shown to generate reliable topology models.


2.2 High-throughput topology mapping of E. coli membrane proteins

High-throughput topology mapping requires a methodology that can handle many membrane proteins simultaneously and that is reliable, robust, and easy to use. We have found that this is most easily accomplished for E. coli and Saccharomyces cerevisiae membrane proteins in their respective hosts by combining topology prediction with minimal experimental information (Paper I; Kim et al., 2003). Here we will focus only on the high-throughput topology mapping of membrane proteins in E. coli. Topology models are best generated by a ‘consensus approach’ or by constraining TMHMM (as explained in section 2.1.2). For analyzing many membrane proteins in E. coli, minimal experimental information is best obtained from single end-to-end C-terminal reporter-protein fusions, rather than from the other approaches described in section 2.1.3.

2.2.1 A consensus approach for generating topology models

For about 80 of the 737 predicted multispanning membrane proteins in E. coli, five prediction programs (section 2.1.1) agree on the location of the N-terminus but disagree on the location of the C-terminus, because their predictions differ by one TM segment. When such cases were analyzed in a test set of membrane proteins of known topology, the correct topology could always be inferred from one of the two majority predictions (Nilsson et al., 2000). Thus, the prediction is very reliable when the methods agree in this way, and the correct topology can be determined simply by assigning the location of the C-terminal tail.
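Representing a topology only by the side of its N-terminus and its number of TM segments (a deliberate simplification that ignores where the segments lie), the selection step described above can be sketched in Python. The names c_term_side and pick_topology, the ‘in’/‘out’ labels and the example predictions are hypothetical.

```python
from collections import Counter


def c_term_side(n_term: str, n_tm: int) -> str:
    """Each TM segment crosses the membrane once, so each flips the side."""
    if n_tm % 2 == 0:
        return n_term
    return "out" if n_term == "in" else "in"


def pick_topology(predictions, observed_c_term):
    """Pick the prediction consistent with the experimentally observed C-terminus.

    predictions: one (n_term_side, n_tm) tuple per predictor, with sides
    'in' (cytoplasmic) or 'out' (periplasmic).
    Among predictions that share the majority N-terminal location, the most
    frequent one that places the C-terminus on the observed side is returned.
    """
    majority_n = Counter(n for n, _ in predictions).most_common(1)[0][0]
    candidates = Counter(p for p in predictions if p[0] == majority_n)
    for (n_term, n_tm), _count in candidates.most_common():
        if c_term_side(n_term, n_tm) == observed_c_term:
            return (n_term, n_tm)
    return None  # no majority-consistent prediction fits the observation


# Hypothetical case: five predictors agree on N-in but split 3:2 on the TM count.
preds = [("in", 6)] * 3 + [("in", 5)] * 2
print(pick_topology(preds, "in"))   # ('in', 6)
print(pick_topology(preds, "out"))  # ('in', 5)
```

Because the two majority predictions differ by one TM, they place the C-terminus on opposite sides, which is why a single C-terminal measurement is enough to choose between them.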

2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli

Because historically used cytoplasmic reporters (e.g., LacZ) are prone to artifacts, we reasoned that a new cytoplasmic topology reporter would greatly benefit the C-terminal mapping of many membrane proteins in E.


coli. We therefore sought to establish whether GFP could be used to monitor membrane protein topology. GFP was selected because it is incorrectly folded and does not fluoresce when targeted to the periplasm of E. coli with a Sec-type signal peptide (Feilmeier et al., 2000), which suggested that it would likewise be inactive when fused to periplasmic membrane protein segments. Importantly, GFP meets the high-throughput criteria above: fluorescence from E. coli cells expressing membrane protein-GFP fusions is easy to measure, and only membrane-embedded fusion protein is fluorescent (Paper III). To test whether GFP could be used to assign the C-terminal tail orientation of a membrane protein, GFP was fused to the C-terminal tail of the membrane protein leader peptidase (Lep; periplasmic C-terminus) and to inverted leader peptidase (Lepinv; cytoplasmic C-terminus), a mutant in which the positively charged residues have been rearranged. The Lep-GFP and Lepinv-GFP fusions were expressed under standard conditions (section 3.1.4).

Induced expression at 37°C produced clear differences between Lep-GFP and Lepinv-GFP fluorescence: the Lepinv mutant, with its cytoplasmic C-terminus, was ~10-fold more fluorescent in liquid culture than Lep (Paper I). At the lower temperature of 25°C the difference was smaller; cells were therefore always cultured at 37°C (Figure 4a). Western blotting with antibodies against either GFP or Lep showed that the Lep-GFP fusion was degraded (Figure 4c). As a further control, other membrane protein-GFP fusions with cytoplasmic C-terminal tails were tested (Figure 4b). Membrane proteins with periplasmic C-terminal tails yield less fusion protein, perhaps due to degradation, and are consistently less fluorescent (Paper I).


Figure 4: GFP as an E. coli cytoplasmic topology reporter. A) Whole-cell GFP fluorescence of Lep-GFP vs. Lepinv-GFP; B) whole-cell GFP fluorescence of ExbB-, SecF-, Lepinv-, Lep- and Sec-GFP fusions; C) Western blots of Lep-GFP and Lepinv-GFP after induced expression at 25°C (lanes 2, 5) or 37°C (lanes 3, 6), probed with either Lep antibody (top panel) or GFP antibody (bottom panel); D) contrasting PhoA (top graph) and GFP (bottom graph) activities for 12 E. coli membrane proteins that meet the majority-vote criterion (Paper I).


2.2.3 Combining C-terminal orientation analysis with a consensus-prediction approach

PhoA and GFP C-terminal fusions were made to an initial set of 12 membrane proteins (MarC, PstA, TatC, YaeL, YcbM, YddQ, YdgE, YedZ, YgjV, YiaB, YigG, and YnfA), selected from the roughly 80 E. coli membrane proteins that met our consensus criterion.

After the fusions were expressed as before, GFP and PhoA activities were measured. Cut-off values for ‘high’ and ‘low’ GFP fluorescence were set ad hoc, based on the difference between Lepinv-GFP (cytoplasmic C-terminus) and Lep-GFP (periplasmic C-terminus) fluorescence (Paper I). A ‘high’ fluorescence signal above a threshold of 12,000 units allowed a cytoplasmic location to be tentatively assigned. A ‘low’ fluorescence signal was considered ambiguous, as it cannot distinguish poorly expressed membrane proteins from those with periplasmic C-terminal tails. The location of the C-terminus was considered established when the GFP activity agreed with the activity of the periplasmic reporter PhoA (Figure 4d; section 2.1.3).

Only two of the 12 membrane proteins (YaeL, YigG) showed differences between PhoA and GFP activities that were too small to be certain of the location of the C-terminus. For these two membrane proteins and a control, truncated GFP fusions were made to clarify the C-terminal tail orientation. The final C-terminal tail locations were then used to identify the correct topology predictions (Paper I).
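The decision rule of this section can be written as a small Python function. The 12,000-unit GFP cut-off is the one quoted in the text; the PhoA cut-off is not given there, so phoa_high is a hypothetical parameter, and the protein names in the example are placeholders.

```python
GFP_HIGH = 12_000  # whole-cell fluorescence units; cut-off quoted in the text


def assign_c_terminus(gfp: float, phoa: float, phoa_high: float) -> str:
    """Combine GFP fluorescence and PhoA activity for one C-terminal fusion pair.

    phoa_high is a hypothetical PhoA cut-off (not given in the text).
    """
    gfp_is_high = gfp > GFP_HIGH
    phoa_is_high = phoa > phoa_high
    if gfp_is_high and not phoa_is_high:
        return "cytoplasmic"
    if phoa_is_high and not gfp_is_high:
        return "periplasmic"
    # Both low: poor expression cannot be ruled out. Both high: contradictory.
    return "unassigned"


# Placeholder proteins: (GFP units, PhoA units).
readings = {"protein_1": (30_000, 10), "protein_2": (2_000, 500), "protein_3": (2_000, 10)}
for name, (gfp, phoa) in readings.items():
    print(name, assign_c_terminus(gfp, phoa, phoa_high=100))
```

Requiring the two reporters to agree is what turns a tentative ‘high GFP’ call into an assignment; unassigned proteins are the ones that need follow-up fusions.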

2.2.4 The reliability of topologies generated by a consensus approach

Encouraged by the consistently contrasting PhoA/GFP activity profiles used to map the topologies of 12 E. coli membrane proteins, we made and analyzed C-terminal PhoA/GFP fusions to another 37 E. coli membrane proteins (Paper II). A few membrane proteins in this test set had a known topology, and their GFP activities were used to refine the original ad-hoc


cut-off values, derived from contrasting Lep/Lepinv-GFP activity, that were used to assign unambiguous C-terminal tail locations.

For 34 of the 37 membrane proteins, contrasting PhoA and GFP activities were sufficient to assign a C-terminal tail location, bringing the total number of mapped topologies to 46 (Paper II). Analysis of these 46 topologies showed that the majority prediction is the most likely to be correct: when 4 of the 5 topology predictors agreed, the majority prediction was correct with respect to the location of the C-terminus 90% of the time.
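A figure such as ‘correct 90% of the time when 4 of 5 predictors agree’ is an accuracy tabulated per level of predictor agreement. A minimal Python sketch of that tabulation follows; the function name is hypothetical and the records are invented for illustration, not the Paper II data.

```python
from collections import defaultdict


def accuracy_by_agreement(records):
    """Tabulate majority-prediction accuracy per level of predictor agreement.

    records: iterable of (n_agreeing, predicted_c_term, observed_c_term).
    Returns {n_agreeing: fraction of proteins where prediction == observation}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_agree, predicted, observed in records:
        totals[n_agree] += 1
        hits[n_agree] += int(predicted == observed)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


# Invented records for illustration only (not the Paper II data).
records = [(5, "in", "in"), (5, "out", "out"), (4, "in", "in"),
           (4, "in", "out"), (3, "out", "out")]
print(accuracy_by_agreement(records))  # {3: 1.0, 4: 0.5, 5: 1.0}
```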

How do these topology models compare with other topology studies? While our topology model for TatC (an essential component of the Tat translocase; Palmer and Berks, 2003), with 6 TMs and cytoplasmic N- and C-termini, was later interpreted as having only 4 TMs (Gouffi et al., 2002), other independent studies have agreed with the topology generated by our approach (Behrendt et al., 2004). The topology determined for YaeL, a protein that belongs to a family of membrane-embedded metalloproteases (Rudner et al., 1999), was also the same as that previously determined for the related Bacillus subtilis protein SpoIVFB with regard to the location of the conserved HEXXH and NPDG motifs relative to the membrane (Green and Cutting, 2000).

The consensus approach and the use of GFP as a topology reporter have since been adopted by other researchers (Culham et al., 2003; Gandlur et al., 2004; Jakubowski et al., 2004; McMurry et al., 2004; Severance et al., 2004).

2.2.5 Generating topology models by constraining TMHMM

Although the consensus approach is a useful strategy for generating reliable topology models, it covers only ~10% of the α-helical membrane proteins in E. coli. An alternative approach is to ‘feed’ TMHMM with the experimentally determined location of particular residues, e.g., the C-terminal tail. When this was tested in silico on a data set of 233 membrane proteins of known topology, the


overall prediction performance of TMHMM increased from ~70% unconstrained to ~80% constrained (Melen et al., 2003). Somewhat unexpectedly, prediction performance actually gets worse if the residue to be fixed is not restricted to the N- or C-terminus but is instead chosen as the "lowest probability loop residue" from a TMHMM probability profile. The main reason is that the loop regions predicted with the greatest uncertainty frequently correspond to true transmembrane regions, which makes this approach unfeasible (Paper II).
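TMHMM applies such constraints inside its hidden Markov model decoding; the Python sketch below is a simplified stand-in, not TMHMM's algorithm. It assumes a handful of candidate per-residue labellings with precomputed scores (hypothetical values) and keeps the best-scoring one that is compatible with the experimentally fixed residues. The function name is hypothetical.

```python
def best_constrained(candidates, constraints):
    """Return the best-scoring labelling that satisfies every fixed residue.

    candidates: list of (score, labels), where labels has one character per
        residue: 'i' (cytoplasmic), 'M' (transmembrane), 'o' (periplasmic).
    constraints: dict mapping residue index -> required label; index -1 is
        the C-terminal residue, as usual for Python strings.
    """
    allowed = [
        (score, labels)
        for score, labels in candidates
        if all(labels[pos] == label for pos, label in constraints.items())
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c[0])[1]


# Hypothetical 12-residue protein with two competing labellings (log-scores).
candidates = [(-10.0, "iiMMMooMMMii"), (-11.5, "iiMMMooooooo")]
print(best_constrained(candidates, {}))         # iiMMMooMMMii
print(best_constrained(candidates, {-1: "o"}))  # iiMMMooooooo
```

Fixing only the C-terminal residue, as in the second call, is the kind of constraint the text reports as helpful; fixing an uncertain internal ‘loop’ residue can instead exclude the true labelling if that residue is actually transmembrane.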

As before, the C-terminal tail orientation can be established with dual PhoA/GFP fusion reporters (Papers I and II). Constraining TMHMM to generate improved topology models has been successfully applied to the entire E. coli inner membrane proteome (Daley et al., 2005). Contrasting PhoA/GFP activities were sufficient to assign unambiguous C-terminal tail locations for 75% of the inner membrane proteome. Many of these proteins share high sequence similarity with another membrane protein in the genome, and these homologs were used to assign C-terminal tail locations to membrane proteins not initially mapped by the reporter approach, bringing the final coverage to ~90%. This topological information has since been extrapolated to assign topology models to another 51,208 homologous membrane proteins in other bacterial genomes (Granseth et al., 2005a).
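The homology-based extension of coverage can be sketched in Python. This assumes that homologous proteins share their C-terminal location (a simplification, since homologs can differ by a TM segment); the propagation is a single, non-transitive pass, conflicting homologs leave a protein unassigned, and the function and protein names are hypothetical.

```python
def propagate(assigned, homolog_pairs):
    """Transfer C-terminal locations from mapped proteins to unmapped homologs.

    assigned: dict protein -> 'in' or 'out' from PhoA/GFP data.
    homolog_pairs: iterable of (protein_a, protein_b) pairs judged homologous.
    Returns a new dict; reporter-based assignments are never overwritten, and
    a protein whose mapped homologs disagree is left unassigned.
    """
    inferred = {}
    for a, b in homolog_pairs:
        for src, dst in ((a, b), (b, a)):
            if src in assigned and dst not in assigned:
                inferred.setdefault(dst, set()).add(assigned[src])
    result = dict(assigned)
    for protein, sides in inferred.items():
        if len(sides) == 1:
            result[protein] = sides.pop()
    return result


mapped = {"p1": "in", "p2": "out"}
pairs = [("p1", "p3"), ("p4", "p2"), ("p1", "p5"), ("p2", "p5")]
print(sorted(propagate(mapped, pairs).items()))
# [('p1', 'in'), ('p2', 'out'), ('p3', 'in'), ('p4', 'out')]
```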

2.2.6 Why does GFP work as a topology reporter?

Given that correctly folded GFP can be targeted to many cellular compartments (Tsien, 1998), including the periplasm of E. coli with a Sec-independent Tat signal peptide (Thomas et al., 2001), why is GFP not fluorescent in the periplasm when it is targeted to this compartment with a Sec-type signal sequence? Because periplasmic GFP can be refolded into a fluorescent form after acid-base treatment, Sec-exported GFP appears simply to be incorrectly folded (Feilmeier et al., 2000). Our results indicate that the misfolded GFP is


sensitive to proteolysis when fused to periplasmic membrane protein segments (Papers I and II); similar degradation has been noted for a few soluble proteins fused to the termini of membrane protein segments (Pourcher et al., 1996). GFP and PhoA have now been used to assign the C-terminal tail location of over 500 E. coli membrane proteins. In 71 of the 72 cases where the C-terminal location had been convincingly established beforehand (i.e., by 3D structure or biochemical analyses), the PhoA/GFP assignments agreed (Daley et al., 2005).

2.2.7 Comparing 2D maps to 3D-structure

How often do topology predictions get it right? This is difficult to address, as there are so few membrane protein structures. If we define topology as the number of TM segments and their orientation relative to the membrane, the constrained TMHMM topology predictions are more than 80% correct when compared with known structures; the most frequent error is to miss one TM segment. If we also require the identification of reentrant loops and interfacial helices, and the exact positioning of helices, topology predictions are presently only a first step towards understanding structure-function relationships. Understanding structural details at this level is typically only possible with a high-resolution structure; section 3 will expand on this challenge.

2.2.8 Summary of high-throughput membrane protein topology mapping

In the absence of a 3D structure, one way to gain structural information about a membrane protein is to determine its topology, i.e., the number and position of its TMs and their overall in-out orientation relative to the membrane. In E. coli, this is usually accomplished by fusing reporter enzymes such as PhoA or LacZ to different portions of the membrane protein. Usually, the number of reporter fusions that need to be made and analyzed for a complete topology


determination is equal to or larger than the number of TMs in the membrane

protein, thus requiring significant experimental effort.

We have shown that a reliable membrane protein topology can be deduced simply and rapidly from a combination of in silico topology predictions and single C-terminal PhoA/GFP reporter-protein fusions (Paper I). Although this approach might also have been possible using classical PhoA and LacZ fusions, GFP offers an attractive alternative: the assay requires little experimental set-up, measurements are completed in seconds, and, because GFP fluorescence is linearly proportional to the amount of folded GFP (in contrast to the activity of enzymatic fusions), GFP activity does not need to be normalized to separately quantified protein expression (Paper I). Indeed, after ambiguous results with classical PhoA/LacZ fusions, GFP has been used to clarify the topology of the ABC transporter DrrB (Gandlur et al., 2004).

After a few modifications, this approach was scaled up to a larger format (Paper II) and then extended to determine C-terminal locations, and hence constrained TMHMM topology models, for the entire E. coli inner membrane proteome (Daley et al., 2005). This proteome-wide information has been used to update the Swiss-Prot and NCBI databases.


3.1 Membrane protein overexpression

One of the main obstacles to understanding membrane proteins is the difficulty of obtaining pure material for biochemical and structural analysis (Grisshammer and Tate, 1995). Most membrane proteins overexpress very poorly, if at all, typically yielding less than 1 mg/L of culture. This is a huge problem. As recently stated in the journal Nature, “… labs around the world aim to add membrane proteins (structures) to international databases over the next five years. But to do so, they must first be able to churn out milligrams of easily purified protein” (Hoag, 2005).

3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane protein overexpression

Why do membrane proteins overexpress poorly? Intuitively, there might be a limit to the availability of both membrane protein biogenesis components and space in the lipid bilayer. Overexpression not only requires components such as SRP and the Sec translocon to faithfully target and insert multiple copies of a membrane protein into a suitable lipid bilayer; the lipid bilayer must also accommodate this ‘extra’ protein without compromising the membrane integrity of the cell (Drew et al., 2003). The following observations support this idea:

- upon overexpression of membrane proteins in E. coli, SRP is titrated (Valent et al., 1997);
- overexpression of membrane proteins in yeast can activate the unfolded protein response (UPR) (Griffith et al., 2003), a mechanism that counters ER stress caused by unfolded protein (Kaneko and Nomura, 2003);
- keeping expression levels low enough to reduce the UPR can increase the amount of functionally expressed membrane protein (Griffith et al., 2003);
- the functional expression of the


serotonin transporter in insect cells can be enhanced nearly 3-fold by co-expressing the ER-luminal folding chaperone calnexin and, to a lesser degree, calreticulin and BiP (Tate et al., 1999).

In terms of lipid capacity, it has been shown that expressing GPCRs in the fly eye, whose photoreceptor membranes are dedicated almost exclusively to the GPCR rhodopsin, is highly successful (Eroglu et al., 2002), and that the bacterium Lactococcus lactis is a suitable host for membrane protein overexpression, perhaps because of its small number of endogenous membrane proteins (Kunji et al., 2003). Lastly, E. coli mutant strains with improved membrane protein overexpression characteristics have been isolated (Miroux and Walker, 1996). When these strains were analyzed biochemically and by electron microscopy after membrane protein overexpression, it was clear that the cells of one of them had proliferated extra internal membranes (Arechaga et al., 2000).

3.1.2 ‘Trial-and-Error’

Because most expression studies have focused on obtaining functionally expressed membrane protein, rather than on analyzing membrane protein overexpression per se, we do not know how general the aforementioned problems are. What is clear is that they are not the whole story. There are many case-by-case examples of further factors that may influence the ability to obtain well-expressed, functional membrane protein:

- the membrane protein is susceptible to degradation, e.g., by the ATP-dependent integral membrane protease FtsH (Ito and Akiyama, 2005);
- the membrane protein is unstable if overexpressed without its complex partner(s), e.g., SecY, the pore-forming component of the translocon, is rapidly degraded if expressed without SecE (Ito and Akiyama, 2005; Kihara et al., 1995);
- the composition of the membrane is unsuitable (Freedman et al., 1999);
- the membrane protein needs post-translational modifications that are impossible in most bacterial expression systems, e.g., N-linked glycosylation (Tate and Blakely,


1994);
- the mRNA for the membrane protein is unstable (Afonyushkin et al., 2003; Arechaga et al., 2003).

In principle, by studying the expression of a large number of membrane proteins, one could find features that distinguish membrane proteins that ‘express poorly’ from those that ‘express well’ (Drew et al., 2003); e.g., membrane proteins with many TMs are thought to express at lower levels than those with fewer TMs (Grisshammer and Tate, 1995). Unfortunately, for more than 300 E. coli membrane proteins expressed in E. coli, von Heijne and co-workers found no correlation between expression level and any of the amino acid sequence parameters tested, e.g., size, degree of hydrophobicity, or number of TMs (Daley et al., 2005).

Our current lack of understanding means that membrane protein

‘expressibility’ cannot be predicted prior to experimental testing.

3.1.3 Choosing a membrane protein overexpression host

Many approaches are used for the overexpression of membrane proteins. In general, it is preferable to overexpress membrane proteins into the membrane, as the success rate of refolding membrane proteins from inclusion bodies is very low (Drew et al., 2003). Ideally, one would overexpress a membrane protein in its endogenous host, which provides its native biogenesis machinery and lipid environment. This is not always possible: the higher the organism from which the membrane protein originates, the greater the cost and time needed for successful overexpression in the most comparable host.

E. coli is often the first host tested for the overexpression of both pro- and eukaryotic membrane proteins: it is widely available, easy to work with, very versatile, and cheap to use. Consequently, numerous membrane protein structures have been solved from material overexpressed in E. coli, including transporters (Abramson et al., 2003; Huang et al., 2003; Hunte et al., 2005; Locher et al., 2002; Ma and Chang, 2004; Reyes and Chang, 2005; Yamashita et al.,


2005), respiratory proteins (Abramson et al., 2000; Bertero et al., 2003), ion channels (Chang et al., 1998; Doyle et al., 1998; Dutzler et al., 2002), and other channels (Fu et al., 2000; Khademi et al., 2004; Savage et al., 2003; Van den Berg et al., 2004).

Unfortunately, there is only one example of a eukaryotic membrane protein structure determined from overexpressed material, namely the rat voltage-gated Shaker-family K+ channel Kv1.2 (Long et al., 2005a), and in this case the material was obtained by expression not in E. coli but in the yeast Pichia pastoris. Other eukaryotic membrane protein structures have been solved, but from protein isolated from naturally abundant sources, e.g., rhodopsin from the bovine eye (Palczewski et al., 2000). While eukaryotic membrane proteins can express well in E. coli (see, e.g., Quick and Wright, 2002), their expression levels are typically several orders of magnitude lower than those of their bacterial counterparts (Tate, 2001). If we want to solve eukaryotic membrane protein structures, it seems that new E. coli strains or hosts other than E. coli are required. Indeed, it is possible to overexpress functional eukaryotic membrane proteins in yeast, insect and mammalian cells, e.g., GPCRs in yeast (Sarramegna et al., 2002; Schiller et al., 2001), the serotonin transporter in Sf9 cells using the baculovirus system (Tate et al., 1999), and a rat glutamate transporter in BHK cells using the Semliki Forest virus system (Raunser et al., 2005). Interestingly, the Gram-positive bacterium Lactococcus lactis has been shown to be a successful host for the overexpression of eukaryotic mitochondrial carriers (Kunji et al., 2005; Kunji et al., 2003).

3.1.4 General strategies for membrane protein overexpression in E. coli

There are many different strategies in each of the host systems used for

overexpression of a membrane protein. In general they involve adjusting the

type of promoter/plasmid system, culture conditions, and the protein itself by


truncations, mutations, and/or additions of various fusion tags. Here, we will

focus only on the expression of membrane proteins in E. coli.

In E. coli, the membrane protein to be expressed is usually cloned into a plasmid under the control of a tightly regulated, inducible promoter. The number of plasmid copies per cell, the strength of the promoter, and the homogeneity of induction across the cell population can all affect final yields. In general, membrane protein overexpression strategies are the same as those used for soluble proteins (see, e.g., Sorensen and Mortensen, 2005), with a few additional points worth mentioning, outlined below.

Membrane proteins typically inhibit cell growth when overexpressed, so it is advisable to use a tightly regulated promoter system, e.g., the pBAD promoter (Morgan-Kiss et al., 2002) or the T7-based promoter in combination with the plasmid pLysS (Pan and Malcolm, 2000). As membrane proteins typically contain an N-terminal targeting signal, fusing a soluble protein to the N-terminus of the membrane protein might be problematic, even more so for membrane proteins with periplasmic N-terminal tails; large periplasmic N-terminal domains are almost non-existent in the E. coli inner membrane proteome, ProW being a notable exception (Daley et al., 2005). If the membrane protein naturally has a large extra-cytoplasmic N-terminal domain, as do many GPCRs, an N-terminal signal sequence has been shown to be required for functional expression (Grisshammer et al., 1993; Weiss and Grisshammer, 2002; Yeliseev et al., 2005). Indeed, among GPCRs in sequenced genomes, N-terminal tails longer than 60 amino acids are considerably more likely to contain a signal peptide (Wallin and von Heijne, 1995). Membrane proteins seem to express better in E. coli at 20-25°C than at 37°C. Although expression at lower temperatures is often successful for soluble proteins as well (Sorensen and Mortensen, 2005), membrane proteins may be more sensitive to temperature because they fold co-translationally (section 1.2.2); over this temperature range, the translation rate decreases linearly with temperature


(Farewell and Neidhardt, 1998). Plasmids with a cytoplasmic antibiotic resistance marker (e.g., kanamycin resistance) are recommended over those with a periplasmic resistance marker (e.g., β-lactamase), as this may avoid placing an extra workload on the Sec translocon, which must export β-lactamase (Ito and Akiyama, 1991).

3.1.5 The BL21(DE3) pET system

In this thesis, membrane proteins were overexpressed using the BL21(DE3) pET system, from a modified pET-28a plasmid (Waldo et al., 1999) that harbors a kanamycin resistance marker. Protein expression from the pET vector is under the control of the strong T7 promoter, which in the E. coli strain BL21(DE3) is switched on in the presence of isopropyl-β-D-1-thiogalactopyranoside (IPTG): IPTG induces expression of the T7 RNA polymerase gene, which is carried on the chromosomally integrated λ prophage DE3 under the control of the lacUV5 promoter (Studier and Moffatt, 1986). As membrane protein overexpression can be toxic (Miroux and Walker, 1996), the BL21(DE3) strain is typically used in combination with a plasmid that constitutively expresses the T7 lysozyme gene, i.e., pLysS or pLysE. T7 lysozyme binds to and inhibits T7 RNA polymerase, thereby dampening ‘leaky’ expression (Pan and Malcolm, 2000). Although some argue that the pET-based system is not suitable for membrane proteins because it is too strong (Wang et al., 2003), these plasmid and strain combinations have been used extensively to overexpress membrane proteins successfully and to obtain material for structure determination (Kastner et al., 2000; Miroux and Walker, 1996).

3.1.6 Membrane protein purification

Membrane proteins are routinely purified in detergent (Seddon et al., 2004). Above its critical micelle concentration (CMC), a detergent forms micelles, typically spherical, whose hydrophobic interior shields the membrane-embedded surface of the protein and so retains its integrity as the detergent extracts it from the lipid bilayer. Solubilization of membranes with detergent results in


a mixture of detergent, protein and lipid, in which the amount of lipid is progressively reduced as the membrane protein is purified in detergent-containing buffer (Seddon et al., 2004), e.g., by immobilized metal affinity chromatography (IMAC), anion/cation exchange, or size-exclusion chromatography. Finding a detergent that retains the function of the membrane protein can be tricky. Short-chain detergents such as n-octyl-β-D-glucopyranoside are used to increase the number of protein-protein contacts for crystallization, but most often cause the membrane protein to aggregate instead. It is becoming increasingly clear that removing too much lipid can be detrimental (Fyfe et al., 2005; Long et al., 2005b). A number of structures have revealed that certain lipids can play specific functional and/or structural roles, e.g., cardiolipin in the purple bacterial reaction centre (Fyfe et al., 2004).
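Why micelles appear only above the CMC can be made concrete with the usual mass-balance approximation: detergent up to the CMC stays monomeric, and the excess assembles into micelles of roughly a fixed aggregation number. The Python sketch below uses that textbook approximation, which ignores the effects of lipid and protein on micellization; the function name and the example values are hypothetical and do not describe any particular detergent.

```python
def micelle_concentration_mM(total_mM: float, cmc_mM: float, aggregation_number: int) -> float:
    """Approximate concentration of micelles for a single detergent.

    Detergent up to the CMC stays monomeric; only the excess assembles into
    micelles, each containing about aggregation_number monomers.
    """
    excess = total_mM - cmc_mM
    if excess <= 0:
        return 0.0  # at or below the CMC: no micelles to host a membrane protein
    return excess / aggregation_number


# Hypothetical values, not those of any particular detergent.
print(micelle_concentration_mM(10.0, 0.2, 100))
print(micelle_concentration_mM(0.1, 0.2, 100))  # 0.0
```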

Papers III and IV

One of the biggest obstacles to understanding membrane protein structure-function relationships is the difficulty of obtaining milligram quantities of membrane protein. This thesis tackles this challenge by developing a GFP-based methodology to monitor membrane protein overexpression in the E. coli membrane, and by using GFP as an aid in the subsequent purification of membrane proteins.


3.2 High-throughput membrane protein overexpression in E. coli

Traditional membrane protein overexpression screening methods in E. coli are quite laborious. To remove inclusion bodies, membranes are typically first isolated from whole cells before the overexpressed protein is separated by SDS-PAGE and detected by Coomassie staining and/or Western blotting, neither of which is ideal for quantifying protein expression. Here, we present an alternative, superior method: we show that the amount of GFP fluorescence from E. coli cells expressing membrane protein-GFP fusions is a simple, fast, and accurate estimate of expression. Not only does it complement the topology mapping of membrane proteins (section 2.2), but it is also easily transferable to many laboratories and enables the protein to be visualized during detergent solubilization and purification.
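Turning a fluorescence reading into a yield requires some calibration, which this section does not describe. One hypothetical way, sketched in Python, is a linear standard curve of purified GFP; it assumes each fusion carries exactly one GFP that fluoresces like the standard, and that the reading comes only from folded, membrane-integrated fusion. The function names, the ~27 kDa GFP mass used as a default, and the example numbers are illustrative assumptions.

```python
def fit_line(amounts_ug, signals):
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts_ug)
    mean_x = sum(amounts_ug) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts_ug)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts_ug, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fusion_mg_per_litre(signal, slope, intercept, culture_ml,
                        mw_fusion_kda, mw_gfp_kda=27.0):
    """Estimate membrane protein-GFP fusion yield from a fluorescence reading.

    signal: fluorescence of cells harvested from culture_ml of culture.
    Each fusion is assumed to carry exactly one GFP, so the GFP mass read off
    the standard curve is scaled by the fusion/GFP molecular-weight ratio.
    """
    gfp_ug = (signal - intercept) / slope
    fusion_ug = gfp_ug * mw_fusion_kda / mw_gfp_kda
    return fusion_ug / culture_ml  # ug per mL is numerically mg per L


# Hypothetical standard curve: purified GFP (ug) vs. fluorescence (arbitrary units).
slope, intercept = fit_line([0, 1, 2, 4], [0, 1000, 2000, 4000])
print(fusion_mg_per_litre(3000, slope, intercept, culture_ml=1.0, mw_fusion_kda=54.0))
# 6.0
```

The molecular-weight scaling matters because fluorescence counts GFP molecules, while yields are usually quoted as mass of the membrane protein fusion.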

3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent

Waldo and co-workers showed that a C-terminal GFP fusion could be used to reliably estimate the overexpression of soluble proteins in E. coli (Waldo et al., 1999). In short, if a soluble protein was expressed into inclusion bodies, GFP did not fold and was not fluorescent; in contrast, if the soluble protein was correctly folded, GFP also folded and was fluorescent. The use of GFP to monitor the overexpression of soluble proteins in E. coli has since been confirmed by others (Hedhammar et al., 2005). GFP is ideal for this purpose, as it requires no substrates for its fluorescence, is stable, and is easy to measure and quantify (Tsien, 1998). To ascertain whether membrane protein expression can be reliably monitored with a C-terminal GFP moiety, GFP was fused to the C-terminus of a number of well-characterized pro- and eukaryotic membrane proteins (Paper III). This was important to verify, as the folding pathway of membrane proteins is very different from that of soluble proteins (section 1.2.3) (Drew et al., 2003). Two of the test-set membrane proteins (rat olfactory GST-GPCR and M13-procoat) were


known to express into inclusion bodies (Kiefer et al., 1996; Krebber et al., 1997).

The GFP used to test this contains the folding (F64L) and chromophore (S65T)

mutations (Tsien, 1998), and has been evolved in E. coli to have 42-fold higher

(soluble) expression than ‘wild-type’ GFP (Crameri et al., 1996). Under typical

culture conditions in the E. coli strain BL21(DE3)pLysS, membrane protein-GFP

fusions were overexpressed essentially as described in section 3.1.5. This is the same system that was used successfully to monitor the expression of soluble protein-GFP fusions in E. coli (Waldo et al., 1999).

After overexpression, cells were lysed and fractionated by differential

centrifugation. The amount of membrane protein-GFP fusion in these fractions

was measured by a combination of fluorescence and quantitative immunoblotting (Paper III). Analysis of the membrane fractions (high-speed spin) showed that GFP fluorescence from isolated membranes was a good estimate of expression. In contrast, not all of the membrane protein-GFP fusion recovered with unbroken E. coli cells in the low-speed spin was fluorescent. This was particularly obvious for the M13-GFP and GST-GPCR-GFP fusions, which had previously been shown to express into inclusion bodies. The discrepancy was also visualized by Western blotting equivalent amounts of GFP fluorescence from the low-speed and high-speed spin fractions. Inclusion bodies accounted for approximately 50% and 90% of the total expressed M13-GFP and GST-GPCR-GFP, respectively. When these inclusion bodies were isolated on a sucrose step gradient, they were not fluorescent (Paper III); we have since verified this with other membrane protein-GFP fusions.

Therefore, if the overexpressed fusion protein ends up in the insoluble fraction as inclusion bodies, GFP is not fluorescent; in contrast, if the fusion is inserted into the cytoplasmic membrane, GFP folds and is fluorescent, and the amount of GFP fluorescence correlates with the amount of protein integrated in the E. coli membrane (Papers III & IV). Recently, it has been shown that during overexpression of LacS-GFP in E. coli (LacS is a lactose transporter from Streptococcus thermophilus), the highest GFP fluorescence coincides with maximum LacS-GFP transport activity, and not with maximum LacS-GFP production as judged by Western blotting (Geertsma, 2005); in this case, GFP monitored only the amount of functionally expressed membrane protein.

3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus

As shown in section 2.2, GFP is not fluorescent when it is targeted to the periplasm with a Sec-type signal peptide (Paper I); to use this approach, membrane proteins must therefore have their C-terminus in the cytoplasm. Because most membrane proteins have a cytoplasmic C-terminus (C-in topology), this is only a minor drawback. The percentage of multispanning membrane proteins with a C-in topology has been experimentally measured at 80% and 83% in the E. coli and Saccharomyces cerevisiae genomes, respectively (Daley et al., 2005; Kim, In preparation), and is predicted to be 70-75% in other sequenced genomes (Wallin and von Heijne, 1998).
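Because only C-in proteins can be followed with a C-terminal GFP fusion, topology data can serve as a simple filter when choosing targets. The Python sketch below assumes a hypothetical record type holding a protein name and the location of its C-terminus ("in" for cytoplasmic, "out" for periplasmic), taken from topology mapping or prediction. It returns the candidates for C-terminal GFP tagging and the fraction of proteins with a C-in topology. It is an illustration only, not the procedure used in Papers I or IV.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical topology record: protein name and C-terminal location."""
    name: str
    c_terminus: str  # "in" = cytoplasmic, "out" = periplasmic


def gfp_taggable(records):
    """Names of proteins with a cytoplasmic (C-in) C-terminus, i.e. the
    candidates for a C-terminal GFP fusion."""
    return [r.name for r in records if r.c_terminus == "in"]


def fraction_c_in(records):
    """Fraction of records with a cytoplasmic C-terminus."""
    if not records:
        raise ValueError("no topology records given")
    return len(gfp_taggable(records)) / len(records)
```

Proteins with a periplasmic C-terminus are simply excluded here; the GFP approach gives no fluorescence signal for them, so they would need a different reporter.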

3.2.3 GFP as a membrane protein folding indicator in whole cells

Because GFP folds slowly (t1/2 ~30 min; Fukuda et al., 2000; Waldo et al., 1999), it is conceivable that GFP placed at the C-terminal end of a membrane protein works as a folding indicator: the membrane protein has sufficient time to misfold before GFP has folded. The misfolded membrane protein is most likely degraded by the cell or retained in inclusion bodies (Chang et al., 2005). However, because GFP is very stable once folded, it can remain fluorescent even if the membrane protein itself is later degraded. This is evident from the free, cytosolic GFP frequently found in the supernatant after membranes are recovered (Paper III). The amount of cytosolic GFP appears to depend on the stability of the membrane protein, with less stable proteins releasing more free GFP. Similar observations have been made for membrane proteins fused to other soluble protein tags, e.g., PhoA (Danielsen et al., 1995; Pourcher et al., 1996). An overexpression estimate made from whole cells can therefore be misleading. For this reason, the most accurate way to estimate overexpression is to measure GFP fluorescence in isolated membranes, as cytosolic GFP is not recovered in this fraction (Paper III). However, because isolating membranes is somewhat laborious, a high-throughput screen requires a reliable estimate from whole cells rather than from recovered membranes.

The reliability of whole-cell estimates was therefore investigated more thoroughly. Briefly, 48 E. coli membrane protein-GFP fusions were overexpressed, and 9 of them, chosen to cover different overexpression levels, were purified. There was a clear correlation between whole-cell fluorescence and the amount of purified membrane protein-GFP fusion (Paper IV).
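The statistical treatment behind this correlation is not described here. As one simple, assumed way to quantify agreement between whole-cell fluorescence and purified yield for the same set of fusions, the sketch below computes a Pearson correlation coefficient in plain Python; the function name and inputs are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. whole-cell fluorescence and purified yield for the same fusions."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

With a small set such as nine purified fusions, a few highly expressed proteins can dominate Pearson's r, so a rank-based measure (Spearman's coefficient) may be a useful complement.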

3.2.4 GFP-based screen to optimize membrane protein overexpression

For 13 membrane protein-GFP fusions, whole-cell GFP fluorescence was measured from 1 ml and 1 L cultures. As there were no significant differences in fluorescence per ml, 1 ml is a satisfactory culture volume for overexpression screening (Figure 5; Paper IV).

Figure 5: Comparison of GFP fluorescence from 13 membrane protein-GFP fusions cultured in either 1 ml or 1 L.
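One hedged way to formalize a comparison between culture scales is to compute, for each fusion, the relative difference between the per-ml fluorescence values at the two scales and to flag fusions that exceed a tolerance. The 20% tolerance and the dictionary layout below are arbitrary assumptions for illustration, not the criterion used in Paper IV.

```python
def relative_difference(a, b):
    """|a - b| divided by the mean of a and b."""
    mean = (a + b) / 2
    if mean == 0:
        raise ValueError("both values are zero")
    return abs(a - b) / mean


def scale_mismatches(per_ml_small, per_ml_large, tolerance=0.2):
    """Fusions whose fluorescence per ml differs between the two culture
    scales by more than `tolerance` (an assumed cutoff).
    Inputs map fusion name -> fluorescence per ml."""
    shared = per_ml_small.keys() & per_ml_large.keys()
    return sorted(
        name for name in shared
        if relative_difference(per_ml_small[name], per_ml_large[name]) > tolerance
    )
```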


Because 5 ml cultures grown in a 24-well format give results comparable to the 1 ml culture condition, the expression of many membrane proteins can be tested rapidly, as demonstrated by the global analysis of the E. coli inner membrane proteome (Daley et al., 2005).

Using the fluorescence of membrane protein-GFP fusions, overexpression of a single membrane protein can be optimized quickly. Small changes in standard culture parameters can dramatically change overexpression yields; for example, choosing an induction OD600 of 0.4 versus 0.6, or an IPTG concentration of 0.1 versus 0.4 mM, can almost double the yield of the putative amino acid transporter YbaT (Drew, In preparation). Each membrane protein can respond differently to these parameters in different BL21-derived strains, e.g., BL21(DE3), BL21(DE3)pLysS, and the C41(DE3)/C43(DE3) "Walker" strains (Miroux and Walker, 1996). We are currently determining which parameters are worth screening. So far, the most consistent way to improve yields is to induce expression at 20-25°C instead of 37°C (Paper IV).
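The parameters named above can be laid out as a full-factorial screening grid. The sketch below enumerates every combination of strain, OD600 at induction, IPTG concentration and post-induction temperature, and picks the condition with the highest whole-cell fluorescence. The individual values come from the text, but the grid as a whole and the function names are hypothetical; since the parameters worth screening are still being determined, the grid is only an example.

```python
from itertools import product

# Example grid built from the parameters named in the text; which of these
# are worth screening is still an open question.
GRID = {
    "strain": ["BL21(DE3)", "BL21(DE3)pLysS", "C41(DE3)", "C43(DE3)"],
    "od600_at_induction": [0.4, 0.6],
    "iptg_mM": [0.1, 0.4],
    "temperature_C": [20, 25, 37],
}


def conditions(grid):
    """Yield every combination of grid values as a dict (full factorial)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_condition(results):
    """results: iterable of (condition, whole-cell fluorescence) pairs.
    Return the pair with the highest fluorescence."""
    return max(results, key=lambda pair: pair[1])
```

This grid gives 48 conditions per protein (4 x 2 x 2 x 3), i.e. two 24-well plates; a reduced design could be substituted if that is too many.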

3.2.5 In-gel GFP fluorescence

Monitoring of membrane protein expression in whole cells can be further improved by subjecting a whole-cell sample to standard SDS-PAGE. GFP remains partially intact under these conditions (where most proteins are denatured), and exposing the polyacrylamide gel to UV light allows GFP to be detected with a CCD camera (Drew, In preparation). The amount of full-length membrane protein-GFP fusion can therefore also be monitored (Figure 6).


Figure 6. Verification of whole-cell fluorescence from liquid culture with an in-gel fluorescence assay. A. Expression of YedZ-GFP; quantification of the fluorescent bands correlates with whole-cell fluorescence. B. Optimizing expression of YciS-GFP in the BL21(DE3)pLysS strain by lowering the temperature after induction to 30 or 25°C and culturing cells for 4-22 hours after induction. (Drew, In preparation)
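In-gel fluorescence images are usually quantified by integrating band intensities along each lane. The sketch below assumes a one-dimensional intensity profile for one lane (for example, exported from image-analysis software), a constant background, and known pixel ranges for the full-length fusion band and a faster-migrating fluorescent band such as free GFP. It reports the fraction of in-gel fluorescence in the full-length band. The pixel ranges and the background handling are illustrative assumptions; the quantification method used for Figure 6 is not described here.

```python
def band_volume(profile, start, end, background=0.0):
    """Sum background-subtracted intensities for pixels start..end-1 of a
    lane profile; values below background are clipped to zero."""
    return sum(max(value - background, 0.0) for value in profile[start:end])


def full_length_fraction(full_length, other):
    """Fraction of in-gel fluorescence in the full-length fusion band."""
    total = full_length + other
    if total <= 0:
        raise ValueError("no signal in either band")
    return full_length / total
```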


3.2.6 GFP-based purification pipeline

To establish a generic purification procedure, or "pipeline", a His8 tag was fused to the C-terminus of GFP, giving membrane protein-GFP-His8 fusions. After standard overexpression and membrane isolation, a number of membrane protein-GFP-His8 fusions were purified by a combination of IMAC and size-exclusion chromatography. Milligram amounts of E. coli membrane protein fusions were routinely purified from 1 liter cultures (Paper IV), similar to the levels reported by others (e.g., Eshaghi et al., 2005).

GFP is also a useful tool in the purification of membrane proteins. With

GFP present one can monitor the ability of different detergents to extract an

overexpressed fusion protein from the membrane. Even though the final choice

of detergent will also depend on the ability to preserve the membrane protein in

a fully functional state, poorly extracting detergents can be quickly eliminated in

this step. The GFP moiety of the membrane protein-GFP fusion also enables the

purification to be followed visually, and the binding efficiency of a fusion to a

column can be seen directly. Lastly, the GFP moiety of the membrane protein-

GFP fusion means it is possible to quickly and accurately determine protein

concentrations (Paper IV).
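Concentration estimates from GFP fluorescence are commonly made against a standard curve. A minimal sketch, assuming a linear fluorescence response over the measured range and that GFP in the fusion fluoresces like a purified GFP standard: fit a least-squares line to standards of known GFP amount, convert a sample's signal to a GFP-equivalent amount, and scale by the ratio of fusion to GFP molecular weight (one GFP per fusion molecule). None of these function names come from Paper IV.

```python
def fit_standard_curve(amounts, signals):
    """Least-squares line signal = slope * amount + intercept, fitted to
    purified-GFP standards of known amount."""
    n = len(amounts)
    if n != len(signals) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must cover more than one amount")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve: GFP-equivalent amount for a sample signal."""
    return (signal - intercept) / slope


def fusion_mass_from_gfp(gfp_mass, fusion_mw_kda, gfp_mw_kda):
    """Scale a GFP-equivalent mass to fusion mass, assuming one GFP per
    fusion (equimolar); the output has the same mass units as gfp_mass."""
    return gfp_mass * fusion_mw_kda / gfp_mw_kda
```

Readings outside the range of the standards, or above the detector's linear range, should not be trusted with this approach.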

3.2.7 Recovery of membrane proteins from GFP fusions using a site-specific protease

There are a few cases in which purified membrane protein-GFP fusions have been shown to be functional in vitro, e.g., (Quick and Wright, 2002). Membrane protein-GFP fusions can also be active in vivo, as we showed for the essential E. coli membrane protein YidC (see section 1.2.2): YidC-GFP is functional at expression levels similar to the endogenous level of YidC, and localizes to the E. coli cell poles (Urbanus et al., 2002). However, as GFP may interfere with the function of the protein and hinder crystallization, a Tobacco Etch Virus (TEV) protease cleavage site (ENLYFQG/S) was added so that the GFP-His8 moiety can be clipped off the membrane protein-TEV-GFP-His8 fusion. TEV protease was chosen because it is specific, it can be easily produced in-house in large quantities rather than purchased, and it is active in the presence of many detergents (Mohanty et al., 2003).
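The TEV site can be checked in silico before cloning. The sketch below reads the site strictly as ENLYFQ followed by G or S, as written above, and cuts between Q and the following residue. It returns the upstream fragment (in the real construct, the membrane protein, which keeps the ENLYFQ residues) and the downstream fragment (starting with G or S, followed by GFP-His8). TEV protease tolerates some variation in its recognition sequence, so this strict pattern is a simplification, and the example sequences in the tests are arbitrary.

```python
import re

# Strict reading of the TEV site given in the text: ENLYFQ followed by G or S.
# The lookahead keeps the G/S residue in the downstream fragment.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(sequence):
    """Split a protein sequence at the first TEV site (between Q and G/S).
    Returns (upstream, downstream), or None if no site is found."""
    match = TEV_SITE.search(sequence)
    if match is None:
        return None
    cut = match.end()
    return sequence[:cut], sequence[cut:]
```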

TEV protease was tested by incubating purified YbaT-TEV-GFP-His8 (a

putative amino acid transporter), GltP-TEV-GFP-His8 (a glutamate transporter)

(Wallace et al., 1990), and YedZ-TEV-GFP-His8 (a protein of unknown function)

with His10-TEV protease. After incubation, digestion was complete, and the His10-TEV protease, any undigested membrane protein-TEV-GFP-His8 fusion, and the clipped-off GFP-His8 were easily removed by batch binding to metal-affinity resin (Paper IV). Intact, full-length membrane proteins could thus be recovered from membrane protein-GFP fusions. Throughout, GFP fluorescence could be used to monitor both the effectiveness of TEV digestion and the purity of the recovered membrane protein.

Are the recovered membrane proteins functional? Purified GltP was reconstituted into lipid vesicles, and its activity was compared with that of purified GltP-His8. There was no difference in glutamate uptake activity between GltP recovered from GltP-TEV-GFP-His8 and purified GltP-His8 (Paper IV). For further verification, the YedZ protein was analysed in detail (section 4).

3.2.8 How does this GFP-based method compare to other high-throughput approaches?

Many high-throughput membrane protein overexpression initiatives have estimated expression by quantifying "bands" on a polyacrylamide gel after Coomassie staining and/or Western blotting (Dobrovetsky et al., 2005; Korepanova et al., 2005); in these cases, membranes are first isolated to remove any inclusion bodies. However, Coomassie staining is inaccurate and insensitive, and Western blotting is time-consuming and not always reliable, because membrane proteins of different hydrophobicity bind Coomassie, or transfer to a blotting membrane, inconsistently. To rapidly judge (in a 96-well format) the expression of many His-tagged membrane proteins in E. coli, Nordlund and co-workers developed a dot-blot detection method that does not require an electrophoretic transfer step (Eshaghi et al., 2005). This method is elegant and can simultaneously screen the expression and detergent solubilization efficiency of numerous membrane proteins. Its main disadvantage is that the amount of full-length protein is estimated only after the overexpressed material has been bound to, and eluted from, Ni-NTA resin in a 96-well format. This is expensive, and may be unaffordable for many laboratories.

3.2.9 Summary of high-throughput membrane protein overexpression

One of the main obstacles to understanding membrane proteins is the difficulty of obtaining pure material for biochemical and structural analysis (Grisshammer and Tate, 1995). Unfortunately, compared with soluble proteins, overexpression of membrane proteins typically yields little protein. Novel approaches are badly needed to identify "workable" material.

In this thesis, we have shown that a simple C-terminal GFP fusion is a reliable folding indicator for membrane proteins with a cytoplasmic C-terminal tail expressed in E. coli (Paper III). By adding a His8 tag to the C-terminus of GFP, and a TEV protease site to clip off the GFP-His8 moiety, we have established an efficient, standardized protocol that purifies protein in yields of >1 mg per liter of culture (Paper IV). As a proof of principle for this purification pipeline, an E. coli membrane protein of previously unknown function was characterized (section 4).


4. Characterization of the membrane protein YedZ

YedZ belongs to a bacterial protein family of unknown function, UPF0191

(www.sanger.ac.uk). YedZ originally attracted our attention since cells

overexpressing YedZ-TEV-GFP-His8 were orange instead of green. Although its

orange colour suggested binding of some kind of cofactor, none of the Web-

based prediction tools used to analyze its amino acid sequence identified any

potential cofactor binding motifs.

4.1.1 A test case for the GFP-based purification pipeline: YedZ

YedZ purified with our pipeline is orange. Optical spectra of the purified protein were recorded under both oxidizing and reducing conditions (Paper IV). Under reducing conditions, YedZ showed an absorption spectrum characteristic of cytochrome b5, with an α-peak maximum at 558 nm. This assignment was corroborated by mass spectrometry of the purified sample: a major monoisotopic peak was identified at 617 Da, a mass equal to that of heme b. A heme assay gave a YedZ:heme ratio of 1:1. Because of an atypical absorption peak in the 450-500 nm region, YedZ was also suspected to bind flavin. This was confirmed by reverse-phase liquid chromatography, which showed that YedZ contains FMN rather than FAD, at a molar ratio of 0.7 FMN per YedZ molecule (Paper IV).
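The heme and FMN stoichiometries above are molar ratios of cofactor to protein. The assays used in Paper IV are not detailed here; as a generic sketch, assuming both concentrations are derived from absorbance via the Beer-Lambert law (c = A / (ε l)), the ratio follows directly. Extinction coefficients must be supplied by the user; the code assumes no values, and the numbers in the tests are arbitrary.

```python
def concentration_molar(absorbance, epsilon_per_molar_cm, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), epsilon in M^-1 cm^-1."""
    if epsilon_per_molar_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon_per_molar_cm * path_cm)


def cofactor_ratio(cofactor_molar, protein_molar):
    """Moles of cofactor per mole of protein."""
    if protein_molar <= 0:
        raise ValueError("protein concentration must be positive")
    return cofactor_molar / protein_molar
```

A ratio below 1, such as 0.7 FMN per YedZ, could reflect partial cofactor loss during purification or incomplete incorporation; the calculation itself cannot distinguish these.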

4.1.2 YedZ is a novel integral membrane flavocytochrome

How does YedZ bind heme b? YedZ is a very hydrophobic membrane protein (23% leucine and 7% valine) that consists of six transmembrane segments (TMs) connected by very short loops (Paper I).
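Composition figures such as 23% leucine are simple residue frequencies. The sketch below computes them from a one-letter sequence and, as an additional assumed measure of overall hydrophobicity, the GRAVY score (mean Kyte-Doolittle hydropathy per residue). The YedZ sequence is not reproduced in this text, so the test input is a placeholder.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each residue type in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(sequence):
    """Grand average of hydropathy (mean Kyte-Doolittle value); raises
    KeyError for non-standard residue letters."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```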

The topology of YedZ is consistent with an alignment of bacterial homologs, in which the putative loop regions fall into stretches of low sequence similarity (Figure 7A). In contrast, transmembrane segments II-V are very well conserved in YedZ and its homologs. Helix III contains one conserved histidine, and helix V contains two. The two parallel histidines (H92 in helix III and H164 in helix V), close to the periplasmic face of the membrane, are the most likely histidine pair to ligate heme b (Figure 7B). It is plausible that these transmembrane segments form a four-helix bundle and coordinate heme b in a manner similar to other cytochrome b-containing membrane proteins; see, e.g., (Iwata et al., 1999).
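Identifying fully conserved positions, such as the histidines in helices III and V, amounts to scanning alignment columns. A minimal sketch, assuming a gapped alignment given as equal-length strings ('-' for gaps) with the reference sequence (here E. coli YedZ) at index 0: it returns the residue numbers, in the ungapped reference, of histidines that are identical in every sequence. This is an illustration of the idea, not how Figure 7 was generated, and the test alignment is a toy example.

```python
def conserved_columns(alignment):
    """Map column index -> residue for columns in which every sequence has
    the same non-gap residue."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    result = {}
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            result[i] = column.pop()
    return result


def to_residue_number(aligned_seq, column):
    """1-based residue number in the ungapped sequence for an alignment column."""
    if aligned_seq[column] == "-":
        raise ValueError("column is a gap in this sequence")
    return column + 1 - aligned_seq[:column].count("-")


def conserved_histidines(alignment, reference=0):
    """Residue numbers (in the reference sequence) of fully conserved His."""
    ref = alignment[reference]
    return [to_residue_number(ref, col)
            for col, aa in sorted(conserved_columns(alignment).items())
            if aa == "H"]
```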

How does YedZ bind FMN? The coordination of heme b by integral membrane proteins has been well described, but binding of FMN in the plane of the membrane is unprecedented. Based on our current knowledge of flavin-binding protein structures, it is difficult to envisage how the very short and unconserved loops of YedZ could fold to bind FMN. To date, all flavin-binding protein structures deposited in the Protein Data Bank use a fold of at least 100 amino acids to envelop the ligand, e.g., a TIM barrel or a Rossmann fold (Fraaije and Mattevi, 2000; Hefti et al., 2003). On the other hand, the conserved Trp and Tyr residues at the start and in the middle of transmembrane segment V could be the key residues for FMN binding in YedZ, as such residues are in globular flavodoxins (Lostao et al., 2003). Thus, in light of its topology, YedZ was annotated as the first integral membrane flavocytochrome (Paper IV).


Figure 7. YedZ alignment and membrane topology. A. Representative amino acid sequence alignment of bacterial YedZ homologs: ECOLI (E. coli), YERPE (Yersinia pestis), BRUME (Brucella melitensis), RHIME (Rhizobium meliloti), AGRT5 (Agrobacterium tumefaciens), CAUCR (Caulobacter crescentus), RALSO (Ralstonia solanacearum), PASMU (Pasteurella multocida), PSEAE (Pseudomonas aeruginosa), XANAC (Xanthomonas axonopodis pv. citri) and DEIRA (Deinococcus radiodurans). The bottom sequence is the consensus, outlined in red. Predicted transmembrane segments of E. coli YedZ are marked with gray bars and numbered I-VI. Residues conserved in all sequences are shown in red text highlighted in yellow. B. YedZ consists of six hydrophobic transmembrane segments, with cytoplasmic N- and C-termini and short interconnecting loops.

[Figure 7A alignment (positions 1-215; transmembrane segments I-VI marked) and Figure 7B topology diagram not reproduced in text form.]


4.1.3 The possible function of YedZ

To further understand YedZ, we analyzed the DNA surrounding the yedZ gene in the E. coli genome. The yedZ gene lies in a putative operon together with yedY, which encodes a soluble periplasmic protein. The putative yedYZ operon has an upstream σ70 consensus sequence, suggesting that it is constitutively expressed (Figure 8A). A region of YedY shares at least 25% sequence identity with the molybdenum-molybdopterin (Mo-MPT) binding domain present in the soluble assimilatory nitrate reductases (NR) of plants, algae and fungi (Figure 8B). These enzymes catalyze the reduction of nitrate to nitrite, the first step in the assimilation of inorganic nitrogen (Barber et al., 2002).

Figure 8. Analysis of the putative yedYZ operon. A. The putative yedYZ operon possesses a σ70 consensus sequence. The stop codon of yedY is immediately followed by the start codon of yedZ, and the ribosome binding site (RBS) of yedZ lies at the end of the yedY coding region. B. YedY belongs to the oxidoreductase molybdopterin-binding protein family (Pfam P11605) and shares 25% sequence identity with the Mo-MPT domain of eukaryotic assimilatory nitrate reductases. The example shown is tobacco NR NIA2. C. E. coli ∆tatC and control cells expressing YedY-HA were labeled with [35S]methionine. YedY-HA was immunoprecipitated with an antiserum against the HA epitope tag, and the immunoprecipitates were analyzed by SDS-PAGE and fluorography (unpublished).
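How the yedYZ promoter was identified is not stated here, but a crude consensus search illustrates what a σ70 match means. The sketch below scans a DNA sequence for the E. coli σ70 consensus elements TTGACA (-35) and TATAAT (-10) separated by a 15-19 bp spacer, allowing a set total number of mismatches. The mismatch threshold is an arbitrary assumption, and real promoter prediction tools use weighted models rather than exact counts; the test sequence is synthetic.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70(dna, max_mismatches=2, spacers=range(15, 20)):
    """Scan DNA for a -35 box (TTGACA) and a -10 box (TATAAT) separated by a
    15-19 bp spacer. Returns (start_35, start_10, total_mismatches) tuples
    with 0-based starts, best matches first."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        m35 = mismatches(dna[i:i + 6], "TTGACA")
        if m35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(dna):
                break
            total = m35 + mismatches(dna[j:j + 6], "TATAAT")
            if total <= max_mismatches:
                hits.append((i, j, total))
    return sorted(hits, key=lambda hit: hit[2])
```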


The amino acid signature of the Mo-MPT domain usually differs between bacteria and eukaryotes, as bacteria further modify the cofactor to Mo-MGD (Campbell, 1996). However, a recent 3D structure confirmed that YedY contains the unmodified cofactor Mo-MPT (Loschi et al., 2004). Like other cofactor-containing periplasmic proteins, YedY also has a twin-arginine motif (K-R-R-Q-V-L-K), which signals export via the Tat translocase, a proteinaceous channel that preferentially translocates fully folded proteins whose cofactors have been incorporated in the cytoplasm (Gohlke et al., 2005; Figure 8C).
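The YedY twin-arginine motif can be compared with the E. coli Tat consensus, commonly given as (S/T)-R-R-x-F-L-K. The sketch below scans the N-terminal region (an assumed window of 40 residues) for RR pairs, extracts the seven-residue stretch starting one residue before each pair, and counts mismatches to the consensus; by this count, K-R-R-Q-V-L-K differs from the consensus at two positions. The window size and mismatch cutoff are illustrative assumptions, and dedicated Tat signal predictors are considerably more sophisticated.

```python
import re

# E. coli Tat consensus (S/T)-R-R-x-F-L-K; None marks the unconstrained x.
TAT_CONSENSUS = [{"S", "T"}, {"R"}, {"R"}, None, {"F"}, {"L"}, {"K"}]


def tat_motif_candidates(sequence, window=40, max_mismatches=3):
    """For each RR pair in the first `window` residues, compare the
    7-residue stretch starting one residue before the RR with the consensus.
    Returns (1-based start, motif, mismatches) tuples."""
    region = sequence[:window].upper()
    hits = []
    for match in re.finditer(r"(?=RR)", region):
        start = match.start() - 1
        if start < 0 or start + 7 > len(region):
            continue
        motif = region[start:start + 7]
        n_mismatch = sum(
            1 for residue, allowed in zip(motif, TAT_CONSENSUS)
            if allowed is not None and residue not in allowed
        )
        if n_mismatch <= max_mismatches:
            hits.append((start + 1, motif, n_mismatch))
    return hits
```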

Eukaryotic assimilatory NR is a homodimer of two soluble ~100 kDa subunits, each containing three modular units in a 1:1:1 ratio: Mo-MPT, cytochrome b5 and flavin (usually FAD). Each modular unit in NR is thought to have evolved independently, and the three units are linked by highly variable hinge-like sequences (Campbell, 2001; Hyde et al., 1991). Our analysis of the putative yedYZ operon suggests that the membrane-bound YedZ protein could be equivalent to the soluble cytochrome b5 and flavin-binding domains of NR and, together with the globular Mo-MPT-containing protein YedY, could constitute a periplasmic assimilatory NR (Figure 9).

Figure 9. YedYZ is a novel nitrate reductase. Schematic representation of the relationship between YedYZ proteins and eukaryotic assimilatory NR, the example shown is the NR NIA2 from tobacco (unpublished).


In support of this hypothesis, the periplasmic NR activity of an aerobically grown yedYZ deletion strain (an MG1655 derivative) was slightly lower than that of a control strain (unpublished data). More striking was a clear increase in NR activity in a yedZ single deletion strain. Presumably, uncoupling YedY from YedZ gives the artificial electron donor used in the NR assay (reduced methyl viologen) greater access to the Mo-MPT site (unpublished data).


5. Conclusions

In E. coli, membrane protein topology can be rapidly deduced from a

combination of computer predictions and single C-terminal PhoA/GFP reporter-

protein fusions (Paper I). Reporter fusions made to either the N- or C-terminal

ends of membrane proteins are more informative than fusions placed elsewhere

(Paper II), and are enough to appreciably improve membrane protein topology

predictions (Daley et al., 2005; Melen et al., 2003). This approach has been

applied to the entire E. coli inner membrane proteome (Daley et al., 2005). As

expected, ~80% of membrane proteins contain cytoplasmic C-termini, and thus,

can be further analyzed by a GFP-based overexpression and purification

‘pipeline’ (Papers III and IV). This pipeline allows rapid and easy screening for highly overexpressed membrane proteins. As demonstrated for the membrane proteins GltP and YedZ, this pipeline can recover intact, full-length, functional membrane proteins from membrane protein-GFP fusions (Paper IV). The value of combining the topology information described above with the GFP-based pipeline was highlighted by the characterization of YedZ, the first identified integral membrane flavocytochrome.

We envisage that this GFP-based methodology, in combination with the E. coli membrane protein-GFP library, will facilitate the characterization of the many E. coli membrane proteins of unknown function. As demonstrated by the functional expression of the human KDEL receptor-GFP fusion in L. lactis (Paper IV), this technology also holds great promise for the eagerly awaited functional and structural characterization of eukaryotic membrane proteins.


References

Abramson, J., Riistama, S., Larsson, G., Jasaitis, A., Svensson-Ek, M., Laakkonen, L., Puustinen, A., Iwata, S. and Wikstrom, M. (2000) The structure of the ubiquinol oxidase from Escherichia coli and its ubiquinone binding site. Nat Struct Biol, 7, 910-917.

Abramson, J., Smirnova, I., Kasho, V., Verner, G., Kaback, H.R. and Iwata, S. (2003) Structure and mechanism of the lactose permease of Escherichia coli. Science, 301, 610-615.

Adamian, L., Nanda, V., DeGrado, W.F. and Liang, J. (2005) Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins, 59, 496-509.

Afonyushkin, T., Moll, I., Blasi, U. and Kaberdin, V.R. (2003) Temperature-dependent stability and translation of Escherichia coli ompA mRNA. Biochem Biophys Res Commun, 311, 604-609.

Andersson, H. and von Heijne, G. (1994) Membrane protein topology: effects of delta mu H+ on the translocation of charged residues explain the 'positive inside' rule. EMBO J, 13, 2267-2272.

Arechaga, I., Miroux, B., Karrasch, S., Huijbregts, R., de Kruijff, B., Runswick, M.J. and Walker, J.E. (2000) Characterisation of new intracellular membranes in Escherichia coli accompanying large scale over-production of the b subunit of F(1)F(o) ATP synthase. FEBS Lett, 482, 215-219.

Arechaga, I., Miroux, B., Runswick, M.J. and Walker, J.E. (2003) Over-expression of Escherichia coli F1F(o)-ATPase subunit a is inhibited by instability of the uncB gene transcript. FEBS Lett, 547, 97-100.

Barber, M.J., Desai, S.K., Marohnic, C.C., Hernandez, H.H. and Pollock, V.V. (2002) Synthesis and bacterial expression of a gene encoding the heme domain of assimilatory nitrate reductase. Arch Biochem Biophys, 402, 38-50.

Batey, R.T., Rambo, R.P., Lucast, L., Rha, B. and Doudna, J.A. (2000) Crystal structure of the ribonucleoprotein core of the signal recognition particle. Science, 287, 1232-1239.

Behrendt, J., Standar, K., Lindenstrauss, U. and Bruser, T. (2004) Topological studies on the twin-arginine translocase component TatC. FEMS Microbiol Lett, 234, 303-308.

Bernsel, A. and von Heijne, G. (2005) Improved membrane protein topology prediction by domain assignments. Protein Sci, 14, 1723-1728.

Bertero, M.G., Rothery, R.A., Palak, M., Hou, C., Lim, D., Blasco, F., Weiner, J.H. and Strynadka, N.C. (2003) Insights into the respiratory electron transfer pathway from the structure of nitrate reductase A. Nat Struct Biol, 10, 681-687.

Bogdanov, M., Heacock, P.N. and Dowhan, W. (2002) A polytopic membrane protein displays a reversible topology dependent on membrane lipid composition. EMBO J, 21, 2107-2116.

Bogdanov, M., Zhang, W., Xie, J. and Dowhan, W. (2005) Transmembrane protein topology mapping by the substituted cysteine accessibility method (SCAM(TM)): application to lipid-specific membrane protein topogenesis. Methods, 36, 148-171.

Boon, J.M. and Smith, B.D. (2002) Chemical control of phospholipid distribution across bilayer membranes. Med Res Rev, 22, 251-281.

Booth, P.J. (2005) Sane in the membrane: designing systems to modulate membrane proteins. Curr Opin Struct Biol, 15, 435-440.

Campbell, W.H. (1996) Nitrate Reductase Biochemistry Comes of Age. Plant Physiol, 111, 355-361.


Campbell, W.H. (2001) Structure and function of eukaryotic NAD(P)H:nitrate reductase. Cell Mol Life Sci, 58, 194-204.

Chamberlain, A.K., Lee, Y., Kim, S. and Bowie, J.U. (2004) Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J Mol Biol, 339, 471-479.

Chang, G., Spencer, R.H., Lee, A.T., Barclay, M.T. and Rees, D.C. (1998) Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechanosensitive ion channel. Science, 282, 2220-2226.

Chang, H.C., Kaiser, C.M., Hartl, F.U. and Barral, J.M. (2005) De novo Folding of GFP Fusion Proteins: High Efficiency in Eukaryotes but Not in Bacteria. J Mol Biol, 353, 397-409.

Crameri, A., Whitehorn, E.A., Tate, E. and Stemmer, W.P. (1996) Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat Biotechnol, 14, 315-319.

Culham, D.E., Hillar, A., Henderson, J., Ly, A., Vernikovska, Y.I., Racher, K.I., Boggs, J.M. and Wood, J.M. (2003) Creation of a fully functional cysteine-less variant of osmosensor and proton-osmoprotectant symporter ProP from Escherichia coli and its application to assess the transporter's membrane orientation. Biochemistry, 42, 11815-11823.

Daley, D.O., Rapp, M., Granseth, E., Melen, K., Drew, D. and von Heijne, G. (2005) Global topology analysis of the Escherichia coli inner membrane proteome. Science, 308, 1321-1323.

Danielsen, S., Boyd, D. and Neuhard, J. (1995) Membrane topology analysis of the Escherichia coli cytosine permease. Microbiology, 141 (Pt 11), 2905-2913.

Dawson, J.P., Melnyk, R.A., Deber, C.M. and Engelman, D.M. (2003) Sequence context strongly modulates association of polar residues in transmembrane helices. J Mol Biol, 331, 255-262.

Dawson, J.P., Weinger, J.S. and Engelman, D.M. (2002) Motifs of serine and threonine can drive association of transmembrane helices. J Mol Biol, 316, 799-805.

DeGrado, W.F., Gratkowski, H. and Lear, J.D. (2003) How do helix-helix interactions help determine the folds of membrane proteins? Perspectives from the study of homo-oligomeric helical bundles. Protein Sci, 12, 647-665.

Dobrovetsky, E., Lu, M.L., Andorn-Broza, R., Khutoreskaya, G., Bray, J.E., Savchenko, A., Arrowsmith, C.H., Edwards, A.M. and Koth, C.M. (2005) High-throughput production of prokaryotic membrane proteins. J Struct Funct Genomics, 6, 33-50.

Doyle, D.A., Morais Cabral, J., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R. (1998) The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science, 280, 69-77.

Drew, D., Froderberg, L., Baars, L. and de Gier, J.W. (2003) Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta, 1610, 3-10.

Drew, D., Lerch, M., Kunji, E., Slotboom, D.J. and de Gier, J.W. (In preparation) Optimizing membrane protein overexpression and purification using GFP fusions. Nature Methods.

Drew, D., Sjostrand, D., Nilsson, J., Urbig, T., Chin, C.N., de Gier, J.W. and von Heijne, G. (2002) Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A, 99, 2690-2695.

Drew, D., Slotboom, D.J., Friso, G., Reda, T., Genevaux, P., Rapp, M., Meindl-Beinker, N.M., Lambert, W., Lerch, M., Daley, D.O., Van Wijk, K.J., Hirst, J., Kunji, E. and De Gier, J.W. (2005) A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci, 14, 2011-2017.

Drew, D.E., von Heijne, G., Nordlund, P. and de Gier, J.W. (2001) Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett, 507, 220-224.

Driessen, A.J., Manting, E.H. and van der Does, C. (2001) The structural basis of protein targeting and translocation in bacteria. Nat Struct Biol, 8, 492-498.

Dutzler, R., Campbell, E.B., Cadene, M., Chait, B.T. and MacKinnon, R. (2002) X-ray structure of a ClC chloride channel at 3.0 Å reveals the molecular basis of anion selectivity. Nature, 415, 287-294.

Edwards, M.D., Li, Y., Kim, S., Miller, S., Bartlett, W., Black, S., Dennison, S., Iscla, I., Blount, P., Bowie, J.U. and Booth, I.R. (2005) Pivotal role of the glycine-rich TM3 helix in gating the MscS mechanosensitive channel. Nat Struct Mol Biol, 12, 113-119.

Engelman, D.M., Chen, Y., Chin, C.N., Curran, A.R., Dixon, A.M., Dupuy, A.D., Lee, A.S., Lehnert, U., Matthews, E.E., Reshetnyak, Y.K., Senes, A. and Popot, J.L. (2003) Membrane protein folding: beyond the two stage model. FEBS Lett, 555, 122-125.

Eroglu, C., Cronet, P., Panneels, V., Beaufils, P. and Sinning, I. (2002) Functional reconstitution of purified metabotropic glutamate receptor expressed in the fly eye. EMBO Rep, 3, 491-496.

Eshaghi, S., Hedren, M., Nasser, M.I., Hammarberg, T., Thornell, A. and Nordlund, P. (2005) An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins. Protein Sci, 14, 676-683.

Farewell, A. and Neidhardt, F.C. (1998) Effect of temperature on in vivo protein synthetic capacity in Escherichia coli. J Bacteriol, 180, 4704-4710.

Feilmeier, B.J., Iseminger, G., Schroeder, D., Webber, H. and Phillips, G.J. (2000) Green fluorescent protein functions as a reporter for protein localization in Escherichia coli. J Bacteriol, 182, 4068-4076.

Fraaije, M.W. and Mattevi, A. (2000) Flavoenzymes: diverse catalysts with recurrent features. Trends Biochem Sci, 25, 126-132.

Freedman, S.D., Katz, M.H., Parker, E.M., Laposata, M., Urman, M.Y. and Alvarez, J.G. (1999) A membrane lipid imbalance plays a role in the phenotypic expression of cystic fibrosis in cftr(-/-) mice. Proc Natl Acad Sci U S A, 96, 13995-14000.

Frillingos, S., Sahin-Toth, M., Wu, J. and Kaback, H.R. (1998) Cys-scanning mutagenesis: a novel approach to structure function relationships in polytopic membrane proteins. FASEB J, 12, 1281-1299.

Fu, D., Libson, A., Miercke, L.J., Weitzman, C., Nollert, P., Krucinski, J. and Stroud, R.M. (2000) Structure of a glycerol-conducting channel and the basis for its selectivity. Science, 290, 481-486.

Fukuda, H., Arai, M. and Kuwajima, K. (2000) Folding of green fluorescent protein and the cycle3 mutant. Biochemistry, 39, 12025-12032.

Fyfe, P.K., Hughes, A.V., Heathcote, P. and Jones, M.R. (2005) Proteins, chlorophylls and lipids: X-ray analysis of a three-way relationship. Trends Plant Sci, 10, 275-282.

Fyfe, P.K., Isaacs, N.W., Cogdell, R.J. and Jones, M.R. (2004) Disruption of a specific molecular interaction with a bound lipid affects the thermal stability of the purple bacterial reaction centre. Biochim Biophys Acta, 1608, 11-22.

Gandlur, S.M., Wei, L., Levine, J., Russell, J. and Kaur, P. (2004) Membrane topology of the DrrB protein of the doxorubicin transporter of Streptomyces peucetius. J Biol Chem, 279, 27799-27806.

Geertsma, E.R. (2005) What lies between: Functional interfaces in a dimeric transporter. Department of Biochemistry. University of Groningen, Groningen, p. 101.

Goder, V., Junne, T. and Spiess, M. (2004) Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol Biol Cell, 15, 1470-1478.

Gohlke, U., Pullan, L., McDevitt, C.A., Porcelli, I., de Leeuw, E., Palmer, T., Saibil, H.R. and Berks, B.C. (2005) The TatA component of the twin-arginine protein transport system forms channel complexes of variable diameter. Proc Natl Acad Sci U S A, 102, 10482-10486.

Gouffi, K., Santini, C.L. and Wu, L.F. (2002) Topology determination and functional analysis of the Escherichia coli TatC protein. FEBS Lett, 525, 65-70.

Granseth, E., Daley, D.O., Rapp, M., Melen, K. and von Heijne, G. (2005a) Experimentally constrained topology models for 51,208 bacterial inner membrane proteins. J Mol Biol, 352, 489-494.

Granseth, E., von Heijne, G. and Elofsson, A. (2005b) A study of the membrane-water interface region of membrane proteins. J Mol Biol, 346, 377-385.

Green, D.H. and Cutting, S.M. (2000) Membrane topology of the Bacillus subtilis pro-sigma(K) processing complex. J Bacteriol, 182, 278-285.

Griffith, D.A., Delipala, C., Leadsham, J., Jarvis, S.M. and Oesterhelt, D. (2003) A novel yeast expression system for the overproduction of quality-controlled membrane proteins. FEBS Lett, 553, 45-50.

Grisshammer, R., Duckworth, R. and Henderson, R. (1993) Expression of a rat neurotensin receptor in Escherichia coli. Biochem J, 295 (Pt 2), 571-576.

Grisshammer, R. and Tate, C.G. (1995) Overexpression of integral membrane proteins for structural studies. Q Rev Biophys, 28, 315-422.

Gromiha, M.M. and Suwa, M. (2005) Structural analysis of residues involving cation-pi interactions in different folding types of membrane proteins. Int J Biol Macromol, 35, 55-62.

Hedhammar, M., Stenvall, M., Lonneborg, R., Nord, O., Sjolin, O., Brismar, H., Uhlen, M., Ottosson, J. and Hober, S. (2005) A novel flow cytometry-based method for analysis of expression levels in Escherichia coli, giving information about precipitated and soluble protein. J Biotechnol, 119, 133-146.

Hefti, M.H., Vervoort, J. and van Berkel, W.J. (2003) Deflavination and reconstitution of flavoproteins. Eur J Biochem, 270, 4227-4242.

Helms, V. (2002) Attraction within the membrane. Forces behind transmembrane protein folding and supramolecular complex assembly. EMBO Rep, 3, 1133-1138.

Hermansson, M. and von Heijne, G. (2003) Inter-helical hydrogen bond formation during membrane protein integration into the ER membrane. J Mol Biol, 334, 803-809.

Hessa, T., Kim, H., Bihlmaier, K., Lundin, C., Boekel, J., Andersson, H., Nilsson, I., White, S.H. and von Heijne, G. (2005a) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature, 433, 377-381.

Hessa, T., White, S.H. and von Heijne, G. (2005b) Membrane insertion of a potassium-channel voltage sensor. Science, 307, 1427.

Higy, M., Junne, T. and Spiess, M. (2004) Topogenesis of membrane proteins at the endoplasmic reticulum. Biochemistry, 43, 12716-12722.

Hoag, H. (2005) Expression of interest. Nature, 437, 164-165.

Hong, H. and Tamm, L.K. (2004) Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc Natl Acad Sci U S A, 101, 4065-4070.

Houben, E.N., Zarivach, R., Oudega, B. and Luirink, J. (2005) Early encounters of a nascent membrane protein: specificity and timing of contacts inside and outside the ribosome. J Cell Biol, 170, 27-35.

Huang, Y., Lemieux, M.J., Song, J., Auer, M. and Wang, D.N. (2003) Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science, 301, 616-620.

Huber, D., Boyd, D., Xia, Y., Olma, M.H., Gerstein, M. and Beckwith, J. (2005) Use of thioredoxin as a reporter to identify a subset of Escherichia coli signal sequences that promote signal recognition particle-dependent translocation. J Bacteriol, 187, 2983-2991.

Hunte, C., Screpanti, E., Venturi, M., Rimon, A., Padan, E. and Michel, H. (2005) Structure of a Na+/H+ antiporter and insights into mechanism of action and regulation by pH. Nature, 435, 1197-1202.

Hyde, G.E., Crawford, N.M. and Campbell, W.H. (1991) The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases. J Biol Chem, 266, 23542-23547.

Ito, K. and Akiyama, Y. (1991) In vivo analysis of integration of membrane proteins in Escherichia coli. Mol Microbiol, 5, 2243-2253.

Ito, K. and Akiyama, Y. (2005) Cellular functions, mechanism of action, and regulation of ftsh protease. Annu Rev Microbiol, 59, 211-231.

Iwata, M., Okada, K. and Iwata, S. (1999) [Structure of cytochrome bc1 complex from bovine heart mitochondria]. Tanpakushitsu Kakusan Koso, 44, 643-654.

Jakubowski, S.J., Krishnamoorthy, V., Cascales, E. and Christie, P.J. (2004) Agrobacterium tumefaciens VirB6 domains direct the ordered export of a DNA substrate through a type IV secretion System. J Mol Biol, 341, 961-977.

Jensen, M.O. and Mouritsen, O.G. (2004) Lipids do influence protein function: the hydrophobic matching hypothesis revisited. Biochim Biophys Acta, 1666, 205-226.

Jensen, M.O., Mouritsen, O.G. and Peters, G.H. (2004) The hydrophobic effect: molecular dynamics simulations of water confined between extended hydrophobic and hydrophilic surfaces. J Chem Phys, 120, 9729-9744.

Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038-3049.

Jordan, P., Fromme, P., Witt, H.T., Klukas, O., Saenger, W. and Krauss, N. (2001) Three-dimensional structure of cyanobacterial photosystem I at 2.5 Å resolution. Nature, 411, 909-917.

Kaneko, M. and Nomura, Y. (2003) ER signaling in unfolded protein response. Life Sci, 74, 199-205.

Kastner, C.N., Dimroth, P. and Pos, K.M. (2000) The Na+-dependent citrate carrier of Klebsiella pneumoniae: high-level expression and site-directed mutagenesis of asparagine-185 and glutamate-194. Arch Microbiol, 174, 67-73.

Khademi, S., O'Connell, J., 3rd, Remis, J., Robles-Colmenares, Y., Miercke, L.J. and Stroud, R.M. (2004) Mechanism of ammonia transport by Amt/MEP/Rh: structure of AmtB at 1.35 Å. Science, 305, 1587-1594.

Kiefer, H., Krieger, J., Olszewski, J.D., Von Heijne, G., Prestwich, G.D. and Breer, H. (1996) Expression of an olfactory receptor in Escherichia coli: purification, reconstitution, and ligand binding. Biochemistry, 35, 16077-16084.

Kihara, A., Akiyama, Y. and Ito, K. (1995) FtsH is required for proteolytic elimination of uncomplexed forms of SecY, an essential protein translocase subunit. Proc Natl Acad Sci U S A, 92, 4532-4536.

Kim, H., Melen, K. and von Heijne, G. (2003) Topology models for 37 Saccharomyces cerevisiae membrane proteins based on C-terminal reporter fusions and predictions. J Biol Chem, 278, 10208-10213.

Kim, H., Unby, M., Melén, K., Warringer, J., Blomberg, A. and von Heijne, G. (In preparation) A global topology map of the Saccharomyces cerevisiae membrane proteome.

Korepanova, A., Gao, F.P., Hua, Y., Qin, H., Nakamoto, R.K. and Cross, T.A. (2005) Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli. Protein Sci, 14, 148-158.

Krebber, C., Spada, S., Desplancq, D., Krebber, A., Ge, L. and Pluckthun, A. (1997) Selectively-infective phage (SIP): a mechanistic dissection of a novel in vivo selection for protein-ligand interactions. J Mol Biol, 268, 607-618.

Krishnan, M.N., Bingham, J.P., Lee, S.H., Trombley, P. and Moczydlowski, E. (2005) Functional Role and Affinity of Inorganic Cations in Stabilizing the Tetrameric Structure of the KcsA K+ Channel. J Gen Physiol, 126, 271-283.

Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol, 305, 567-580.

Kung, C. (2005) A possible unifying principle for mechanosensation. Nature, 436, 647-654.

Kunji, E.R., Chan, K.W., Slotboom, D.J., Floyd, S., O'Connor, R. and Monne, M. (2005) Eukaryotic membrane protein overproduction in Lactococcus lactis. Curr Opin Biotechnol, 16, 546-551.

Kunji, E.R., Slotboom, D.J. and Poolman, B. (2003) Lactococcus lactis as host for overproduction of functional membrane proteins. Biochim Biophys Acta, 1610, 97-108.

Locher, K.P., Lee, A.T. and Rees, D.C. (2002) The E. coli BtuCD structure: a framework for ABC transporter architecture and mechanism. Science, 296, 1091-1098.

Long, S.B., Campbell, E.B. and Mackinnon, R. (2005a) Crystal structure of a mammalian voltage-dependent Shaker family K+ channel. Science, 309, 897-903.

Long, S.B., Campbell, E.B. and Mackinnon, R. (2005b) Voltage Sensor of Kv1.2: Structural Basis of Electromechanical Coupling. Science, 309, 903-908.

Loschi, L., Brokx, S.J., Hills, T.L., Zhang, G., Bertero, M.G., Lovering, A.L., Weiner, J.H. and Strynadka, N.C. (2004) Structural and biochemical identification of a novel bacterial oxidoreductase. J Biol Chem.

Lostao, A., Daoudi, F., Irun, M.P., Ramon, A., Fernandez-Cabrera, C., Romero, A. and Sancho, J. (2003) How FMN binds to anabaena apoflavodoxin: a hydrophobic encounter at an open binding site. J Biol Chem, 278, 24053-24061.

Luirink, J. and Sinning, I. (2004) SRP-mediated protein targeting: structure and function revisited. Biochim Biophys Acta, 1694, 17-35.

Ma, C. and Chang, G. (2004) Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli. Proc Natl Acad Sci U S A, 101, 2852-2857.

Manoil, C. (1991) Analysis of membrane protein topology using alkaline phosphatase and beta-galactosidase gene fusions. Methods Cell Biol, 34, 61-75.

McMurry, J.L., Van Arnam, J.S., Kihara, M. and Macnab, R.M. (2004) Analysis of the cytoplasmic domains of Salmonella FlhA and interactions with components of the flagellar export machinery. J Bacteriol, 186, 7586-7592.

Melen, K., Krogh, A. and von Heijne, G. (2003) Reliability measures for membrane protein topology prediction algorithms. J Mol Biol, 327, 735-744.

Miroux, B. and Walker, J.E. (1996) Over-production of proteins in Escherichia coli: mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J Mol Biol, 260, 289-298.

Mohanty, A.K., Simmons, C.R. and Wiener, M.C. (2003) Inhibition of tobacco etch virus protease activity by detergents. Protein Expr Purif, 27, 109-114.

Morgan-Kiss, R.M., Wadler, C. and Cronan, J.E., Jr. (2002) Long-term and homogeneous regulation of the Escherichia coli araBAD promoter by use of a lactose transporter of relaxed specificity. Proc Natl Acad Sci U S A, 99, 7373-7377.

Muller, G. (2000) Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach. Curr Med Chem, 7, 861-888.

Munro, S. (1998) Localization of proteins to the Golgi apparatus. Trends Cell Biol, 8, 11-15.

Nilsson, I.M. and von Heijne, G. (1993) Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J Biol Chem, 268, 5798-5801.

Nilsson, J., Persson, B. and von Heijne, G. (2000) Consensus predictions of membrane protein topology. FEBS Lett, 486, 267-269.

Nilsson, J., Persson, B. and von Heijne, G. (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins, 60, 606-616.

Ott, C.M. and Lingappa, V.R. (2002) Integral membrane protein biosynthesis: why topology is hard to predict. J Cell Sci, 115, 2003-2009.

Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong, I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M. and Miyano, M. (2000) Crystal structure of rhodopsin: A G protein-coupled receptor. Science, 289, 739-745.

Palmer, T. and Berks, B.C. (2003) Moving folded proteins across the bacterial cell membrane. Microbiology, 149, 547-556.

Pan, S.H. and Malcolm, B.A. (2000) Reduced background expression and improved plasmid stability with pET vectors in BL21 (DE3). Biotechniques, 29, 1234-1238.

Park, S.H. and Opella, S.J. (2005) Tilt angle of a trans-membrane helix is determined by hydrophobic mismatch. J Mol Biol, 350, 310-318.

Pourcher, T., Bibi, E., Kaback, H.R. and Leblanc, G. (1996) Membrane topology of the melibiose permease of Escherichia coli studied by melB-phoA fusion analysis. Biochemistry, 35, 4161-4168.

Quick, M. and Wright, E.M. (2002) Employing Escherichia coli to functionally express, purify, and characterize a human transporter. Proc Natl Acad Sci U S A, 99, 8597-8601.

Rapoport, T.A., Goder, V., Heinrich, S.U. and Matlack, K.E. (2004) Membrane-protein integration and the role of the translocation channel. Trends Cell Biol, 14, 568-575.

Rapp, M., Drew, D., Daley, D.O., Nilsson, J., Carvalho, T., Melen, K., De Gier, J.W. and Von Heijne, G. (2004) Experimentally based topology models for E. coli inner membrane proteins. Protein Sci, 13, 937-945.

Raunser, S., Haase, W., Bostina, M., Parcej, D.N. and Kuhlbrandt, W. (2005) High-yield expression, reconstitution and structure of the recombinant, fully functional glutamate transporter GLT-1 from Rattus norvegicus. J Mol Biol, 351, 598-613.

Reyes, C.L. and Chang, G. (2005) Structure of the ABC transporter MsbA in complex with ADP.vanadate and lipopolysaccharide. Science, 308, 1028-1031.

Rost, B., Fariselli, P. and Casadio, R. (1996) Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci, 5, 1704-1718.

Rudner, D.Z., Fawcett, P. and Losick, R. (1999) A family of membrane-embedded metalloproteases involved in regulated proteolysis of membrane-associated transcription factors. Proc Natl Acad Sci U S A, 96, 14765-14770.

Sarramegna, V., Talmont, F., Seree de Roch, M., Milon, A. and Demange, P. (2002) Green fluorescent protein as a reporter of human mu-opioid receptor overexpression and localization in the methylotrophic yeast Pichia pastoris. J Biotechnol, 99, 23-39.

Savage, D.F., Egea, P.F., Robles-Colmenares, Y., O'Connell, J.D., 3rd and Stroud, R.M. (2003) Architecture and selectivity in aquaporins: 2.5 Å X-ray structure of aquaporin Z. PLoS Biol, 1, E72.

Schiller, H., Molsberger, E., Janssen, P., Michel, H. and Reilander, H. (2001) Solubilization and purification of the human ETB endothelin receptor produced by high-level fermentation in Pichia pastoris. Receptors Channels, 7, 453-469.

Schneider, D. (2004) Rendezvous in a membrane: close packing, hydrogen bonding, and the formation of transmembrane helix oligomers. FEBS Lett, 577, 5-8.

Schulz, G.E. (2003) Transmembrane beta-barrel proteins. Adv Protein Chem, 63, 47-70.

Seddon, A.M., Curnow, P. and Booth, P.J. (2004) Membrane proteins, lipids and detergents: not just a soap opera. Biochim Biophys Acta, 1666, 105-117.

Senes, A., Engel, D.E. and DeGrado, W.F. (2004) Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs. Curr Opin Struct Biol, 14, 465-479.

Senes, A., Gerstein, M. and Engelman, D.M. (2000) Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol, 296, 921-936.

Senes, A., Ubarretxena-Belandia, I. and Engelman, D.M. (2001) The Cα-H···O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci U S A, 98, 9056-9061.

Severance, S., Chakraborty, S. and Kosman, D.J. (2004) The Ftr1p iron permease in the yeast plasma membrane: orientation, topology and structure-function relationships. Biochem J, 380, 487-496.

Sorensen, H.P. and Mortensen, K.K. (2005) Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Microb Cell Fact, 4, 1-8.

Strandberg, E. and Killian, J.A. (2003) Snorkeling of lysine side chains in transmembrane helices: how easy can it get? FEBS Lett, 544, 69-73.

Studier, F.W. and Moffatt, B.A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol, 189, 113-130.

Tate, C.G. (2001) Overexpression of mammalian integral membrane proteins for structural studies. FEBS Lett, 504, 94-98.

Tate, C.G. and Blakely, R.D. (1994) The effect of N-linked glycosylation on activity of the Na(+)- and Cl(-)-dependent serotonin transporter expressed using recombinant baculovirus in insect cells. J Biol Chem, 269, 26303-26310.

Tate, C.G., Whiteley, E. and Betenbaugh, M.J. (1999) Molecular chaperones stimulate the functional expression of the cocaine-sensitive serotonin transporter. J Biol Chem, 274, 17551-17558.

Thomas, J.D., Daniel, R.A., Errington, J. and Robinson, C. (2001) Export of active green fluorescent protein to the periplasm by the twin-arginine translocase (Tat) pathway in Escherichia coli. Mol Microbiol, 39, 47-53.

Tsien, R.Y. (1998) The green fluorescent protein. Annu Rev Biochem, 67, 509-544.

Tusnady, G.E. and Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol, 283, 489-506.

Ulmschneider, M.B., Sansom, M.S. and Di Nola, A. (2005) Properties of integral membrane protein structures: derivation of an implicit membrane potential. Proteins, 59, 252-265.

Urbanus, M.L., Froderberg, L., Drew, D., Bjork, P., de Gier, J.W., Brunner, J., Oudega, B. and Luirink, J. (2002) Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem, 277, 12718-12723.

Valent, Q.A., de Gier, J.W., von Heijne, G., Kendall, D.A., ten Hagen-Jongman, C.M., Oudega, B. and Luirink, J. (1997) Nascent membrane and presecretory proteins synthesized in Escherichia coli associate with signal recognition particle and trigger factor. Mol Microbiol, 25, 53-64.

Van den Berg, B., Clemons, W.M., Jr., Collinson, I., Modis, Y., Hartmann, E., Harrison, S.C. and Rapoport, T.A. (2004) X-ray structure of a protein-conducting channel. Nature, 427, 36-44.

van Geest, M. and Lolkema, J.S. (2000) Membrane topology and insertion of membrane proteins: search for topogenic signals. Microbiol Mol Biol Rev, 64, 13-33.

Viklund, H. and Elofsson, A. (2004) Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci, 13, 1908-1917.

von Heijne, G. (1989) Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature, 341, 456-458.

von Heijne, G. (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol, 225, 487-494.

Waldo, G.S., Standish, B.M., Berendzen, J. and Terwilliger, T.C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol, 17, 691-695.

Wallace, B., Yang, Y.J., Hong, J.S. and Lum, D. (1990) Cloning and sequencing of a gene encoding a glutamate and aspartate carrier of Escherichia coli K-12. J Bacteriol, 172, 3214-3220.

Walian, P., Cross, T. and Jap, B.K. (2004) Structural genomics of membrane proteins. Genome Biology, 5, 215.1-8.

Wallin, E. and von Heijne, G. (1995) Properties of N-terminal tails in G-protein coupled receptors: a statistical study. Protein Eng, 8, 693-698.

Wallin, E. and von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci, 7, 1029-1038.

Wang, D.N., Safferling, M., Lemieux, M.J., Griffith, H., Chen, Y. and Li, X.D. (2003) Practical aspects of overexpressing bacterial secondary membrane transporters for structural studies. Biochim Biophys Acta, 1610, 23-36.

Weiss, H.M. and Grisshammer, R. (2002) Purification and characterization of the human adenosine A(2a) receptor functionally expressed in Escherichia coli. Eur J Biochem, 269, 82-92.

White, S.H. (2004) The progress of membrane protein structure determination. Protein Sci, 13, 1948-1949.

White, S.H. and von Heijne, G. (2005) Transmembrane helices before, during, and after insertion. Curr Opin Struct Biol, 15, 378-386.

Wimley, W.C. and White, S.H. (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol, 3, 842-848.

Woolhead, C.A., McCormick, P.J. and Johnson, A.E. (2004) Nascent membrane and secretory proteins differ in FRET-detected folding far inside the ribosome and in their exposure to ribosomal proteins. Cell, 116, 725-736.

Yamashita, A., Singh, S.K., Kawate, T., Jin, Y. and Gouaux, E. (2005) Crystal structure of a bacterial homologue of Na(+)/Cl(-)-dependent neurotransmitter transporters. Nature, 437, 215-223.

Yeliseev, A.A., Wong, K.K., Soubias, O. and Gawrisch, K. (2005) Expression of human peripheral cannabinoid receptor for structural studies. Protein Sci, 14, 2638-2653.

Yohannan, S., Yang, D., Faham, S., Boulting, G., Whitelegge, J. and Bowie, J.U. (2004) Proline substitutions are not easily accommodated in a membrane protein. J Mol Biol, 341, 1-6.

Acknowledgements

This will undoubtedly be the most-read portion of the thesis. I hope I have managed to convey my heartfelt thanks in a way that is genuine. Enjoyable work has been the fruit of working with good people.

Jan-Willem de Gier: For giving me freedom to pursue my research, for always supporting me, and for probably being the most loyal supervisor you will find anywhere!

Gunnar von Heijne: For sharing your fountain of wisdom, for your patience, and for being a testimony to the fact that not all nice guys finish last!

Edmund Kunji: For welcoming me warmly into your laboratory and for your infectious excitement about membrane protein research.

Pär Nordlund: For your great hospitality, and for introducing me to some of those Swedish traditions.

Dan Daley: For all those great training runs, for all those wonderful cooked meals of Geneth's, and for your enormous encouragement, help and time … thanks mate!

Magnus Monne: For probably being one of the most generous Swedes anywhere, so much so that you never complained about the random location of my socks, and you allowed me to share your king-size bed with you for 3 months!!

Louise Baars: For your kindness, integrity, and intellect.

Joy Kim: For being so joyful, for our fruitful scientific discussions, and for all your kindness and consideration. Wow.

Mikaela Rapp: For being so much fun to work with, in the lab and around the coffee table, and for making me feel so welcome when I first arrived from NZ.

Mirjam Lerch: For many interesting scientific discussions, for buying me English tea, and for your passion for just about anything.

Tara Hessa: For your encouragement and for your ability to laugh when nothing is working.

Linda Fröderberg: For calling a spade a spade, for all that endless administration, and for breaking in Jan-Willem for us!

Dirk Slotboom: For pushing me to use my brain and for your generosity.

David Wickström: For your fantastic working attitude, for your kindness, and for passing the ball to me when playing football … even if I really suck.

Samuel Wagner: For all that chocolate you let me eat, and for the gift of earplugs.

Marie Unby: For your cheery demeanor, which makes working here so much fun.

IngMarie Nilsson: For your kindness, generosity, and constant hard work.

Marika Cassel: For your kindness, and your patience when I am noisy in the office.

Filippa Stenberg / Carolina Lundin: The new girls on the block … who make sure that no one will forget to buy the cake and/or champagne.

Others (past and present) in DBB who make this a great environment for working, with special thanks to: Karl-Magnus, Gisela, Pavel, Shashi, Lotta, Pelle, Nadja, Bogos, Stefan, Inger, Kicki, and Anki.

The rest of the group in Cambridge and in New Zealand: Peter M., Ted, Heather, Judy, Torsten, Ka Wai, Lisa, Marilyn and Peter H.

To my good mates in Sweden: Anders, Lisa, Stig, Maria, Joel, Brenda, Bas, Sebastian, Geneth, and the wonderful Uhrnell family.

To my good mates in New Zealand (my extended whanau): Daniel, Jamie, Esther, Nathan, Tammy, Steve, Katie, Strahan, Rachel, Jeremy, and Peter Haebel (honorary kiwi) … save a few waves for me boys!!

To my family, with whom life is very rich indeed: Mum, John, Dad, Jill, Marge, Geoffrey, Aaron, Carolina, Sofia, Paul, Jay, Jonathon, Lorraine, Johan, and my dearest twin, Natasha.

To my wife Anna Maria: … well, words are just not enough to describe how happy you make me!

To my Lord: for his agape love and faithfulness.
