PROPEPTIDES OF PROTEASES

EVOLVED SENSORS TO

EXPLOIT ORGANELLAR PH

Johannes Elferich, Danielle M. Williamson, Bala Krishnamoorthy¶, and Ujwal Shinde*

Department of Biochemistry and Molecular Biology,

Oregon Health and Science University, 3181 SW Sam Jackson Park Road,

Portland, OR, 97239, USA

¶ Department of Mathematics, Washington State University

Pullman, WA, 99164, USA

*Corresponding Author:

Ujwal Shinde, [email protected] Phone (503)-494-8683 Facsimile: (503)-494-8393

Running title: Protease evolution to exploit organelle pH

SUMMARY:

Eukaryotic cells maintain strict control over protein secretion, in part by utilizing the pH-gradient

maintained within their secretory pathway. How eukaryotic proteins evolved from prokaryotic

orthologs to exploit the pH-gradient for biological function remains a fundamental question in

cell biology. We have previously demonstrated that protein domains located within precursor

proteins, propeptides, encode histidine-driven pH-sensors to regulate organelle-specific

activation of the eukaryotic proteases, furin and proprotein convertase-1/3. Using bioinformatics,

we analyzed over 10,000 unique proteases within evolutionarily unrelated families, and

established that eukaryotic propeptides are enriched in histidines when compared to prokaryotic

orthologs. On this basis, we propose that eukaryotic proteins evolved to contain histidines within

cognate propeptides to exploit the tightly controlled pH-gradient of the secretory pathway,

thereby directing activation within specific organelles. Enrichment of histidine in propeptides

may therefore be used to predict the presence of pH sensors in other proteases or even

protease substrates.

HIGHLIGHTS:

• Histidine residues in propeptides act as pH-sensors in furin, a eukaryotic protease

• Histidine is enriched in eukaryotic, but not prokaryotic, subtilase propeptides

• Histidine enrichment is found in protein families unrelated to subtilases

• We propose histidine enrichment as an evolutionary mechanism to sense organellar pH

INTRODUCTION:

Eukaryotes are descendants of distinct prokaryotic cells that united symbiotically to

subsequently evolve complex cellular compartments called organelles (Embley and Martin,

2006). Although both prokaryotes and eukaryotes are able to secrete proteins, only eukaryotes

employ multi-compartmental secretory and endocytotic pathways. These pathways maintain a

precise pH-gradient that acidifies from the endoplasmic reticulum (pH~7.2) to secretory vesicles

(pH~5.5). This gradient provides the unique environmental conditions essential for the optimal

structure and function of proteins within distinct biochemical pathways (Casey et al., 2010).

Since many secreted eukaryotic proteins have prokaryotic orthologs, how and when

eukaryotic proteins evolved the ability to regulate their activity within different organelles is a

central question germane to our understanding of protein trafficking. Comparing secreted

eukaryotic proteins with their bacterial orthologs may potentially provide information about

mechanisms by which protein activity is regulated during trafficking through the secretory

pathway.

Proteases hydrolyze peptide bonds and likely arose early during evolution as simple

catabolic catalysts that generated amino acid residues in primitive organisms (Lopez-Otin and

Bond, 2008). Due to their ubiquitous distribution within prokaryotes, eukaryotes, and archaea, the

three domains of life, proteases are well suited for analysis of selective pressures that drove

adaptation of eukaryotic proteins to the complex organelle trafficking system. Since uncontrolled

proteolysis has catastrophic consequences, cells appear to have evolved two distinct

mechanisms that maintain protease activities under exquisite spatiotemporal control (Lopez-

Otin and Bond, 2008). The first mechanism involves co-evolution of specific endogenous

inhibitors, typically within compartments distinct from those containing active enzymes. The

second mechanism involves proteases being synthesized as inactive precursors called

zymogens, which become active by limited intra- or intermolecular proteolysis. In some cases

the two regulatory mechanisms are combined; N-terminal propeptides co-evolved to facilitate

folding of cognate catalytic domains and act as potent inhibitors after cleavage from the catalytic

domain (Shinde and Inouye, 1993; Shinde and Thomas, 2011).

Subtilases – a ubiquitous super-family of serine proteases – represent an ideal group of

homologs to analyze protein adaptation to eukaryotic organelles, since they exist in all three

domains of life. Bacterial subtilisin and mammalian proprotein convertase (PC) sub-families

constitute the most extensively studied enzymes (Shinde and Thomas, 2011). Despite

evolutionary divergence, proteins in these subfamilies display common folds with conserved

catalytic triads. Subtilases are almost always expressed as zymogens, with amino and

occasionally carboxy propeptide extensions. They are classified into two sub-families:

Extracellular Serine Proteases (ESP) and Intracellular Serine Proteases (ISP) (Subbian et al.,

2004). ESPs have 80-100 residue long propeptides that catalyze folding and act as inhibitors

after cleavage, while ISPs have shorter propeptides that only act as inhibitors in the zymogen.

Catalytic domains and propeptides of mammalian PCs are closely related to protease domains

of ESPs and not ISPs (Shinde and Thomas, 2011). Similar to bacterial ESPs, propeptides of

PCs assist folding and require two ordered steps of proteolytic cleavage for activation. The two

proteolytic cleavages are precisely controlled within different organelles. The first cleavage

occurs rapidly after protein folding in the endoplasmic reticulum and results in a non-covalent

complex between the propeptide and the catalytic domain. Activation requires an additional

cleavage within the propeptide; in the case of furin this cleavage occurs only after the protein is

trafficked to a different organelle, the trans-Golgi network (TGN) (Anderson et al., 2002). Other

PCs are activated in a similar manner, but within different compartments (Seidah and Prat,

2012).

Experiments in vitro show that the pH of the TGN is sufficient to trigger the second

activating cleavage of furin (Anderson et al., 1997) and that a histidine residue in the propeptide

acts as a pH-sensor (Feliciangeli et al., 2006). We recently showed that propeptides of PCs

mediate the pH of activation, as swapping propeptides between PCs reassigned the pH of

activation (Dillon et al., 2012). We therefore hypothesized that propeptides of eukaryotic

subtilases evolved to sense organelle pH in order to direct activation. Such a broad hypothesis

is difficult to test experimentally, as it would require biochemical studies on a large number of

proteins. We overcame this problem by predicting properties of protein sequences based on our

hypothesis, and testing these against sequence databases using statistical methods. Histidine is

the only residue with an intrinsic pKa near the physiological range (~6.5) and is therefore likely

involved in pH-sensing mechanisms. In this paper we show that enrichment of histidines in

propeptides correlates with the requirement to sense pH for activation within the subtilase

family. Furthermore, we demonstrate similar enrichment in other protease families, indicating

that enrichment of histidines in propeptides is a common mechanism to regulate activity in the

secretory pathway.

RESULTS:

Propeptide sequences of subtilases are more divergent than cognate catalytic domains:

To identify conserved sequence elements unique to either prokaryotic or eukaryotic subtilases,

we performed an evolutionary conservation analysis using the ConSurf server (Ashkenazy et

al., 2010). The analysis of prokaryotic subtilisin and eukaryotic proprotein convertase families

was initiated using sequences of Subtilisin E and Proprotein Convertase 1/3 (PC1/3),

respectively. The resulting conservation scores were mapped on the crystal structure of the

propeptide:Subtilisin E complex (PDB: 1SCJ) and on a homology model of the propeptide:PC1

complex (based on PDB: 1P8J and 1KN6), respectively (Figure 1). Catalytic domains of

eukaryotic and prokaryotic subtilases depict a highly conserved core. On the contrary,

propeptides demonstrate less sequence conservation, with the dibasic cleavage motif at the C-

terminus of eukaryotic propeptides representing the only conserved region.

Since histidine 69 was demonstrated to function as a pH sensor in furin (Feliciangeli et

al., 2006), and given that propeptides of furin and PC1/3 alone are sufficient to impart organelle-

specific pH-dependent activation of cognate catalytic domains (Dillon et al., 2012), we analyzed

whether histidine residues demonstrate any sequence conservation within propeptides.

Although we could not identify absolutely conserved histidine residues in propeptides of

eukaryotic subtilases, several positions in our alignment contain a histidine residue in a

substantial fraction of sequences, especially at the position corresponding to histidine 69 in furin

(53.3% of sequences). In contrast, prokaryotic subtilases, which do not traverse the secretory

pathway, appear to encode fewer histidines within their propeptides. However, when catalytic

sequences are compared, we find strictly conserved histidine residues within prokaryotic and

eukaryotic sequences, and studies indicate that they play essential roles in catalysis or protein

stability (Carter and Wells, 1987). Hence, biased enrichment for histidine residues appears

localized within propeptides of eukaryotic subtilases.

The ConSurf analysis can only accommodate 150 sequences within each group, and a

search initiated using Subtilisin E and PC1/3 may introduce a selection bias based on input

sequences. Since subtilases encompass over 10,107 unique sequences, as per the PFAM

database family PF00082 (Punta et al., 2012), we developed a robust analysis of histidine

distribution within the available sequence data. Since PFAM employs a hidden Markov model of

only the catalytic domain in subtilases to scour through sequence databases, this method

avoids any selection bias for propeptide sequences. As PFAM families include only approximate

demarcations for start and stop positions for catalytic domains, we made the following three

suppositions to define propeptide and catalytic domain in each sequence: (i) the first 20

residues correspond to signal peptides (Hebert and Molinari, 2007), (ii) residues between

position 21 and the start of the catalytic domain correspond to the propeptide, and (iii) subtilases

with propeptides less than 50 residues represent ISP-like sequences that employ different

mechanisms for folding and activation (Subbian et al., 2004). This stringency provides a total of

6533 unique sequences from the PF00082 family for further analyses of a histidine bias.
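
As an illustration only, the three suppositions can be sketched in Python as follows. The function name `split_regions` and the 1-based, inclusive `cat_start` and `cat_end` coordinates are hypothetical stand-ins for the PFAM domain boundaries; they are assumptions made for this sketch and are not taken from the original analysis pipeline.

```python
from typing import NamedTuple, Optional

SIGNAL_PEPTIDE_LENGTH = 20  # supposition (i): first 20 residues are the signal peptide
MIN_PROPEPTIDE_LENGTH = 50  # supposition (iii): shorter propeptides are treated as ISP-like


class Regions(NamedTuple):
    propeptide: str
    catalytic: str


def split_regions(sequence: str, cat_start: int, cat_end: int) -> Optional[Regions]:
    """Split a precursor sequence into propeptide and catalytic domain.

    cat_start and cat_end are 1-based, inclusive positions of the catalytic
    domain (hypothetical inputs standing in for PFAM coordinates).
    Returns None when the sequence is excluded as ISP-like.
    """
    # Supposition (ii): residues 21 .. cat_start - 1 form the propeptide.
    propeptide = sequence[SIGNAL_PEPTIDE_LENGTH:cat_start - 1]
    if len(propeptide) < MIN_PROPEPTIDE_LENGTH:
        return None
    catalytic = sequence[cat_start - 1:cat_end]
    return Regions(propeptide, catalytic)
```

For a toy precursor with a 20-residue signal peptide, a 60-residue propeptide, and a catalytic domain starting at position 81, the function returns both regions; with a 30-residue propeptide it returns `None`, mirroring the exclusion of ISP-like sequences.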

Increased histidine content in propeptides of subtilases correlates with the requirement for pH-mediated activation:

We computed the abundance of histidine residues in propeptides ([His]Pro) and catalytic

domains ([His]Cat) for all sequences that met the above criteria. For comparison, we calculated

the difference in abundance in propeptides and catalytic domains within each protein (Δ[His] =

[His]Pro – [His]Cat). A positive value of Δ[His] indicates abundance of histidines in propeptides, a

negative value signifies abundance in catalytic domains, while near zero values imply equal

distribution. While Δ[His] values in individual proteins may be subject to random fluctuations, the

absence of any functional requirements would result in a distribution centered around zero. If

histidine residues in propeptides are required for the experimentally observed function of

sensing organelle specific pH, they would be selected during evolution, and one would expect

statistical bias for positive Δ[His] only within eukaryotic subtilases and near zero or negative

Δ[His] for prokaryotic subtilases.
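
The definition of Δ[His] can be written as a minimal Python sketch. The helper names `residue_percent` and `delta_his` are hypothetical, and abundance is assumed here to be the percentage of residues in a segment that are histidine, consistent with the percent values reported in the text.

```python
def residue_percent(segment: str, residue: str = "H") -> float:
    """Percentage of positions in `segment` occupied by `residue`."""
    if not segment:
        raise ValueError("segment must not be empty")
    return 100.0 * segment.upper().count(residue) / len(segment)


def delta_his(propeptide: str, catalytic: str) -> float:
    """Delta[His] = [His]Pro - [His]Cat, in percentage points."""
    return residue_percent(propeptide) - residue_percent(catalytic)
```

A propeptide with 4% histidine paired with a catalytic domain with 2% histidine gives Δ[His] = 2, a positive value indicating relative abundance in the propeptide.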

For initial assessments, we plotted Δ[His] on a phylogenetic tree generated by the PFAM

database (Figure 2A). The tree is consistent with the homology groups defined by Siezen and

coworkers (Siezen and Leunissen, 1997), with the largest clades representing subtilisin, kexin,

proteinase K, and pyrolysin, as well as the later characterized sedolisin family (Wlodawer et al.,

2003). Four of these five families contain eukaryotic and prokaryotic proteins, suggesting these

families diverged before speciation. Only the subtilisin family is exclusively found within

prokaryotes. Interestingly, we observed that three of these four families display a predominantly

positive Δ[His] in eukaryotes, but not in prokaryotes. Only sedolisins show positive Δ[His] values

in both prokaryotes and eukaryotes.

To validate that positive Δ[His] values are unique to eukaryotic sequences, we

constructed a tree based on NCBI taxonomic classification and plotted Δ[His] within all

subtilases (Figure 2B). The slightly negative mean values of Δ[His] in prokaryotic and archaeal

proteins imply that there is no functional requirement for histidines in prokaryotic propeptides.

In contrast, we observe predominantly positive Δ[His] values in eukaryotes, with a mean value

of 1.72%. This difference signifies a strong increase in histidine content in the propeptide

compared to the catalytic domain. We observed positive Δ[His] values in all 3 kingdoms of

higher eukaryotes. In bacteria, the difference of about -0.3% was consistent in the 3 most

represented phyla. Interestingly, the phylum of Acidobacteria had a mean difference of

2.13%, comparable to eukaryotes.

Although the above analysis provides a visual description, we wanted to analyze the

statistical significance of the observed bias towards positive Δ[His] values within eukaryotes.

First, we plotted distributions of [His]Pro and [His]Cat for subtilases in prokaryotes and eukaryotes

(Figure 2C). The catalytic domains in both species display a distribution centered on 2%, with

eukaryotes having slightly higher [His]Cat values than prokaryotes, as expected from the average

histidine content in the UniProt database. While the distribution of [His]Pro in prokaryotes is

shifted towards lower values with several propeptides completely lacking histidines, the [His]Pro

in eukaryotes is shifted to higher values, much greater than the catalytic domains. It is important

to note that the distribution of [His]Pro in eukaryotes displays a much higher deviation than that

for [His]Cat within both prokaryotes and eukaryotes, which is likely due to the shorter length of

propeptides. When we investigated distributions for every amino acid we found that this

enrichment exists only for histidine residues (Figure S1). To further analyze this bias we also

investigated the distribution of Δ[His] (Figure 2D), which clearly demonstrates the differences in

histidine bias in prokaryotes and eukaryotes. The Δ[His] distributions in both species are

positively skewed, with median values of -0.56% (mean = -0.34%) and 1.5% (mean = 1.7%) for

prokaryotes and eukaryotes, respectively. When differences in distribution were plotted for each

individual amino acid (Δ[AA]), only cysteine displays a difference between prokaryotes and

eukaryotes similar to histidine (Figure S2). The cysteine bias is likely due to higher prevalence

of disulfide bonds in eukaryotes than prokaryotes. To quantify this distribution difference

between species we employed a non-parametric Mann-Whitney test (Table 1). For several

amino acids, the test resulted in small p-values (

effect size of 0.5. As seen in Figure 2E, histidine shows the highest deviation from 0.5,

suggesting this bias is not by pure chance. Only cysteine deviates substantially (more than 0.15

units from 0.5), which is likely due to higher frequency of disulfide bonds in eukaryotes than

prokaryotes. The fact that deviation from 0.5 in the effect size for histidine is considerably

greater than that observed for cysteine suggests a biological significance for a histidine bias.
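
One common effect size for a Mann-Whitney comparison, for which 0.5 corresponds to no difference between two groups, is the U statistic divided by the product of the sample sizes. The Python sketch below computes that quantity. It is an assumption that this is the effect size meant here, since the definition is not given in this passage, and the function name `mann_whitney_effect_size` is hypothetical. The sketch does not compute p-values.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def mann_whitney_effect_size(x: Sequence[float], y: Sequence[float]) -> float:
    """Return U / (len(x) * len(y)).

    This equals the probability that a randomly chosen value from x exceeds
    a randomly chosen value from y, with ties counted as one half.
    A value of 0.5 means neither group tends to have larger values.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    y_sorted = sorted(y)
    u = 0.0
    for value in x:
        below = bisect_left(y_sorted, value)           # y values strictly smaller
        equal = bisect_right(y_sorted, value) - below  # y values tied with `value`
        u += below + 0.5 * equal
    return u / (len(x) * len(y))
```

Applied to Δ[His] values, `x` would hold the eukaryotic values and `y` the prokaryotic ones; results above 0.5 would then indicate that eukaryotic values tend to be larger.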

Since possible errors in database annotation and differences in length between

propeptides and catalytic domains may result in a false-positive bias, we developed a test that is

independent of the start annotation in the PFAM database. We calculated histidine content in a

20-residue sliding window from the beginning of the sequence to the end of catalytic domain for

all sequences. After normalization as described in Methods, we averaged the resulting histidine

content profiles for eukaryotic and prokaryotic proteins (Figure 2F). Eukaryotic but not

prokaryotic proteins show an increase in histidine content in the first 100 residues,

corresponding to the propeptide. Proteins from both species have increased histidine content at

positions 200-250 along with a small increase at the C-terminus of the catalytic domain, likely

due to presence of the catalytic histidine, and a conserved histidine at the C-terminus of the

catalytic domain. Changes in length of the sliding window do not change the overall profile

(Figure S3).
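
The sliding-window calculation can be sketched in Python as shown here. The normalization described in Methods is not reproduced; the `mean_profile` helper simply averages position by position over whichever profiles reach that position, which is a simplifying assumption made for this sketch. Both function names are hypothetical.

```python
from typing import List, Sequence


def sliding_window_percent(sequence: str, window: int = 20, residue: str = "H") -> List[float]:
    """Percent `residue` in each window of length `window`, sliding one residue at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    if len(seq) < window:
        return []
    count = seq[:window].count(residue)
    profile = [100.0 * count / window]
    for start in range(1, len(seq) - window + 1):
        if seq[start - 1] == residue:           # residue leaving the window
            count -= 1
        if seq[start + window - 1] == residue:  # residue entering the window
            count += 1
        profile.append(100.0 * count / window)
    return profile


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Average profiles position by position (assumption: no length normalization)."""
    longest = max((len(p) for p in profiles), default=0)
    means = []
    for i in range(longest):
        values = [p[i] for p in profiles if len(p) > i]
        means.append(sum(values) / len(values))
    return means
```

Changing the `window` argument corresponds to the variation in window length examined in Figure S3.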

To decipher correlations that may exist between the histidine bias and experimental

evidence of pH-dependent activation, we analyzed histidine contents in propeptides and

catalytic domains of individual proteins. For prokaryotes and archaea, we selected proteins that

displayed “reviewed status” in the UniProt database, and for eukaryotic sequences all homologs

in Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, and the model proteases

cucumisin and Proteinase K/R (Figure 2G). While most bacterial proteins show comparable

histidine content in propeptides and catalytic domain (approximately 2%), Kumamolisin and

Xanthomonalisin display histidine content >4% in their propeptides. Consistent with our

hypothesis, both proteins undergo activation at acidic pH in vitro (Oda et al., 1987; Oyama et al.,

2002), which is not surprising because their hosts display optimum growth under acidic

conditions. Since the intracellular pH within these cells is maintained near neutral, pH sensing is

an ideal mechanism for discerning intracellular and extracellular environments. Hence, it is not

surprising that the sedolisin family shows a histidine bias within propeptides of prokaryotic

sequences, given that this family is optimized to function at low pH. Interestingly, Acidobacteria

almost exclusively express sedolisins, which explains their positive Δ[His] values. On the other

hand, all eukaryotic propeptides display histidine content above 4%, with the exception of

Proteinase K/R and SKI-1. Recombinant expression of proteinase K/R in E. coli produces active

protease (Gunkel and Gassen, 1989) and SKI-1 loses its propeptide in the ER (Seidah et al.,

1999), suggesting these processes occur at neutral pH, relaxing the necessity for histidines.

In conclusion, our results demonstrate that the histidine bias in subtilase propeptides

generally correlates with host species as it is present in eukaryotes but not in prokaryotes

(Figure 2). These results are consistent with our hypothesis that during evolution subtilases

responded to the requirement of regulating their activation as per the pH of their environment by

encoding histidine residues in propeptides to sense pH and direct activation.

Cathepsin propeptides are enriched in histidine:

To investigate whether our hypothesis applies to other pH-activated, propeptide-

dependent proteases, we analyzed histidine content in cathepsins, a large family of cysteine

peptidases found mainly in lysosomes (Turk et al., 2012). Similar to subtilases, acidic pH

initiates activation of cathepsins by propeptide proteolysis. Due to these parallels, we

hypothesized that eukaryotic cathepsins should show a similar bias for histidine in their

propeptides.

We used the PFAM family PF00112 to obtain cathepsin sequences and analyzed them

in a manner identical to subtilases. Figure 3A shows the phylogenetic tree for various

cathepsins along with their Δ[His] values. While only a few cathepsin homologs exist in

prokaryotes, this paucity is not due to exclusion of sequences based on our criteria. Although

experimental data on prokaryotic cathepsins is scarce, we included them in the analysis for

comparison. The two major well-studied lysosomal cathepsin families are cathepsin L and

cathepsin B, both of which activate at low pH (Nishimura et al., 1988; Turk et al., 1993). Since

no experimental data regarding activation of cathepsin O, cathepsin F, and plant cathepsins

was found, we excluded them from further analysis. Nonetheless, the latter two cathepsins also

display the Δ[His] bias.

While the cathepsin L family shows bias towards positive Δ[His] values, the cathepsin B

family does not. The distributions of [His]Pro and [His]Cat in the cathepsin L family are similar to

those in eukaryotic subtilases (Figure 3B). However, the cathepsin B family displays increased

[His]Pro and [His]Cat values, leading to near-zero Δ[His] values (Figure 3C). Prokaryotic

cathepsins show distributions similar to those of prokaryotic subtilases. The small number of prokaryotic

sequences precludes a statistical comparison between species with robustness similar to

subtilases.

We next applied the sliding window analysis to validate the increased histidine content in

the propeptides of cathepsin L, and to map the specific location of increased histidine content in

the sequence of cathepsin B (Figure 3D). Prokaryotic cathepsins showed low histidine content

throughout the protein, with one peak between residues 250 and 300, which is due to the

catalytic histidine. Consistent with our hypothesis, an additional increase in histidine content

exists within the first 100 residues of cathepsin L. Interestingly, cathepsin B shows a moderate

increase in histidine content within the first 100 residues compared to prokaryotes, but a

substantial second peak corresponding to the occluding loop within the catalytic domain (Figure

3D and 3E). A comparison of the crystal structures of cathepsin L and cathepsin B (Figure 3E)

shows that the catalytic domains of the two families are similar. However, the cathepsin B

propeptide is truncated compared to cathepsin L, while the occluding loop in the catalytic

domain is longer in cathepsin B. Moreover, the cathepsin B occluding loop in the catalytic

domain extends into the region occupied by the cathepsin L propeptide to form direct contacts

with the cathepsin B propeptide. Notably, histidines within the occluding loop of cathepsin B

occupy similar spatial locations as histidine residues within the cathepsin L propeptide. This

suggests that the pH-sensing capability in cathepsin B is encoded not only within the

propeptide, but also in the occluding loop within the catalytic domain. Consistent with this

prediction, experimental data demonstrate that the occluding loop interacts with the propeptide

in a pH-dependent manner and mutation of histidines in the occluding loop to alanine blocks

activation (Quraishi et al., 1999). Moving pH-sensitivity from the propeptide into the catalytic

domain provides an evolutionary advantage to cathepsin B by enabling it to switch between an

endo- and exopeptidase in a pH dependent manner (Illy et al., 1997). In summary, these results

are consistent with our hypothesis, although subtle variations can exist within individual

propeptide-dependent protease families.

Propeptides in the cytosolic caspase family do not display a histidine bias:

Our hypothesis assumes that eukaryotic proteases require histidines in their propeptides to

sense the pH of the secretory pathway. Therefore, proteases that are expressed and function in

the cytosol would be expected to show no histidine bias within their propeptides. Caspases

constitute the most prominent, propeptide-dependent cytosolic protease family. They are

responsible for initiating apoptosis within eukaryotic cells (Creagh et al., 2003). Similar to

subtilases and cathepsins, caspases are expressed as inactive zymogens and activated by

proteolytic processing. Although apoptosis is linked with mild acidification of the cytosol, pH has

not been shown to be important for triggering caspase activation.

We used the PFAM family PF00656 to obtain caspase sequences and processed them

as described in Methods. The phylogenetic tree demonstrates that caspase homologs are found

in metazoans, fungi, and plants (Figure 4A). We exclude metacaspases (homologs in fungi and

plants) from our analysis because their propeptides contain histidine residues that are involved

in zinc binding (Tsiatsiani et al., 2011). Metazoan caspases demonstrate increased [His]Cat

values, while the [His]Pro values are similar to those of prokaryotic caspases (Figure 4B).

Consistently, Δ[His] values were slightly smaller for eukaryotic proteins (Figure 4C). The sliding

window analysis of prokaryotic and eukaryotic caspases shows that there is no substantial

histidine enrichment in the N-terminal residues (Figure 4D). Overall these results are consistent

with the assumption that the functional requirement of histidines in the propeptide is unique to

proteases that need to sense pH to direct their activation.

DISCUSSION:

We report a correlation of increased histidine content in propeptides with the

requirement to sense pH. But does a correlation imply causality? Histidine residues play

multiple unique roles in proteins because they can (i) function as proton exchangers in enzyme

catalysis (Dodson and Wlodawer, 1998), (ii) form complexes with soft metals such as iron and

zinc (Andreini et al., 2009), (iii) provide unique hydrogen bonding geometry, and (iv) alter protein

structure and interactions in a pH-dependent manner. Since propeptides are not part of the

active site that mediates proteolysis, and because the propeptides analyzed in this study do not

bind metal ions (Coulombe et al., 1996; Jain et al., 1998; Tangrea et al., 2002), one can exclude

the first two roles. It is also unlikely that propeptides in eukaryotes have a different requirement

for hydrogen bonding than their prokaryotic orthologs, thus endorsing their roles as pH-sensors

as the most likely explanation for the observed histidine bias.

Histidine residues have been demonstrated to function as pH sensors within various

proteins in prokaryotes and eukaryotes (Casey et al., 2010; Srivastava et al., 2007). In some

cases such as Na+/H+ antiporters, pH-regulation is found in prokaryotic and eukaryotic orthologs

(Slepkov et al., 2007). This is due to the common functional requirement to regulate intracellular

pH. Since in the case of the protease families investigated here, pH-sensing seems to be

unique to eukaryotic proteins (with the exception of sedolisins), one can surmise that this is

indeed due to a functional requirement that is unique to eukaryotes, such as regulation within

the secretory pathway.

It is important to note that electrostatic interactions within a protein family can migrate

during the course of evolution, even though their physiological functions such as pH sensing

can be conserved. This may appear through mutations that introduce a redundant charge pair,

which displays a pKa similar to the previous one, to allow the original charge to disappear during

subsequent steps of evolution while maintaining or subtly modifying its titration properties,

without requiring exquisite stereo chemistry (Harrison, 2008). In fact, we have observed that the

histidine residues in the propeptide of eukaryotic subtilases are not strictly conserved (Figure 1)

but are spatially “unconstrained” within their sequence. Based on this finding, we preferred to

analyze overall histidine content within propeptides and catalytic domains. Since histidine is

among the least abundant amino acids within proteins (~2.3%), small changes in their numbers

can significantly influence the overall content, especially in the rather short propeptides.

We selected three specific families for our analysis because we wanted proteins that

have (i) a well-distributed phylogeny in prokaryotes and eukaryotes, (ii) propeptide domains

necessary for chaperoning folding, (iii) the structures of both propeptides and catalytic domains

solved, and (iv) activation pathways which are experimentally well characterized. Subtilases,

cathepsins, and caspases conform to the above requirements, and represent more than 10,000

protease sequences.

It is astonishing that the histidine bias is found in families that have no evolutionary

relationship. Also, it is consistently found in subfamilies of subtilases, even though these likely

diverged before the speciation of eukaryotes and prokaryotes. This suggests that different

protease families may have reacted independently to the same evolutionary pressures to

converge to similar solutions, and therefore represent examples of convergent evolution. We

argue that there are two reasons why pH-sensing has independently evolved within propeptide

domains. First, propeptides appear to be under less evolutionary pressure than cognate

protease domains, which must maintain their catalytic function throughout evolution,

necessitating preservation of the active site geometry. As a consequence, propeptides are more

likely to develop new features that allow the modulation of the protease in different

environments. Secondly, since catalytic domains must often work in diverse pH-environments,

the introduction of elements that change protein conformation in a pH-dependent manner is

likely to be detrimental for protein stability and function; such elements are better “outsourced” to domains

of the protein that are no longer present in the mature protein. The notable exception is the

cathepsin B family, where the presence of the pH-sensitive occluding loop in the catalytic

domain allows a pH-dependent change in protease characteristics, which might be important for

the biological function (Nagler et al., 1997).

Further research must focus on the mechanisms by which histidines in propeptides

mediate activation. Since not every histidine in a protein acts as a pH-sensor, determinants of

pH-sensing other than the mere presence of histidines must exist. For example, the propeptides

of furin and PC1/3 both are enriched in histidine, but they show differences in pH-sensitivity.

How are such differences encoded within their sequences? Understanding the physical

principles will also better predict the functional significance of histidines within sequences, an

important challenge given the abundance of sequence data compared to experimental

evidence.

While details of the mechanism of pH-sensing in propeptide-dependent proteases are

unknown, we speculate that two general mechanisms are possible. First, protonation of

histidines due to lower pH could destabilize the interaction between propeptide and protease

domain, either by directly affecting charge or hydrophobic interactions, or by destabilizing the

propeptide structure, which is required for binding. Since subtilases and cathepsins can

autoactivate, the subsequent increase in the fraction of free protease leads to digestion of the

free propeptide and thereby prevents rebinding. A second potential mechanism is that

protonation of histidine leads to structural changes in the propeptides that make cleavage motifs

within the propeptide accessible for active proteases. While a mixture of both mechanisms is

likely responsible, we speculate that the second mechanism plays a prominent role, since a

histidine within the core and not at the interface to the catalytic domain was essential for pH-

mediated activation of furin (Feliciangeli et al., 2006). Also, the second mechanism was shown

to regulate maturation of the dengue virus within the secretory pathway, where lowering the pH

leads to drastic conformational changes in the capsid proteins, which expose cleavage sites to

control capsid processing (Yu et al., 2008). In this case the substrate, and not the protease,

controls activation, tempting one to speculate that such a mechanism may be found in other

proteins processed by proteases within the secretory pathway.
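
Both proposed mechanisms start from protonation of histidine as the pH drops. As a rough illustration of how strongly that protonation depends on pH, the sketch below applies the Henderson-Hasselbalch relation to a single titratable side chain. It is not part of the analysis in this study: the function name is hypothetical, the pKa of 6.0 is an assumed generic value (the pKa of a histidine in a folded protein depends on its environment), and the three pH values are illustrative inputs only.

```python
def fraction_protonated(ph, pka=6.0):
    """Henderson-Hasselbalch: fraction of side chains that carry the proton at a given pH.

    pka=6.0 is an assumed, generic value for a histidine side chain.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pH values only (not measurements from this study).
for ph in (7.2, 6.0, 5.5):
    print(f"pH {ph}: fraction protonated = {fraction_protonated(ph):.2f}")
```

Under these assumptions, a shift of about one pH unit around the pKa changes the protonated fraction several-fold, which is why a residue titrating near the pH of the secretory compartments is a plausible sensor.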

Since histidine enrichment correlates with pH-mediated activation in subtilases and cathepsins, we speculate that it can be used to predict proteins that use a similar mechanism for activation. A list of all human proteins with annotated propeptides in the UniProt database that have more histidines in their propeptides than expected, assuming a probability for histidine of 2.3% (Table S1), includes 52 proteins that are either secreted or targeted to the secretory or endocytic pathway. While this bias could be random or caused by other factors, such as zinc-binding sites, which could explain why metalloproteases like “A Disintegrin and metalloproteinase domain-containing protein” (ADAM) or matrix metalloproteases are frequent in the list, we propose that members of that list use the pH of the secretory pathway to regulate their proteolytic activation. While the lack of knowledge about the activation and functions of their propeptides, especially in prokaryotic homologs, did not allow us to include a detailed analysis, we note that several of these proteins show consistent enrichment of histidines in eukaryotic, but not prokaryotic, homologs (Table S2).
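
The enrichment criterion behind Table S1 (more histidines than expected at a per-position probability of 2.3%, reported as P(X>=k)) reads like a one-sided binomial test. The sketch below shows one way such a tail probability could be computed. It is an assumption about the procedure, not the authors' script, and the function names are hypothetical.

```python
def binomial_upper_tail(k, n, p=0.023):
    """P(X >= k) for X ~ Binomial(n, p), using a recursively updated pmf."""
    if k <= 0:
        return 1.0
    pmf = (1.0 - p) ** n  # P(X = 0)
    tail = 0.0
    for j in range(n + 1):
        if j >= k:
            tail += pmf
        # Step from P(X = j) to P(X = j + 1); becomes 0 after j = n.
        pmf *= (n - j) / (j + 1) * p / (1.0 - p)
    return tail


def histidine_enrichment_p(propeptide, p_his=0.023):
    """Chance of seeing at least the observed number of 'H' in a propeptide sequence."""
    return binomial_upper_tail(propeptide.count("H"), len(propeptide), p_his)
```

For the L-selectin entry of Table S1 (propeptide length 10, histidine content 20.00%, i.e. two histidines), `binomial_upper_tail(2, 10)` gives approximately 0.021, in line with the listed 2.11E-02.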

In summary, this study suggests a prominent role of the pH gradient in the secretory pathway in orchestrating the proteolytic processing of secreted proteins. Any disturbance in this gradient could therefore lead to dysregulation of protease activity. Dysregulation of proprotein convertases and cathepsins can have adverse effects and is associated with diseases like cancer, atherosclerosis and Dent’s disease (Reiser et al., 2010; Seidah and Prat, 2012). Since all these diseases are also associated with changes in cytosolic pH (Naghavi et al., 2002; Webb et al., 2011), studies are needed to determine whether the secretory pH-gradient is also affected, and thus whether pH dysregulation plays a role in disturbing the regulation of the secretory pathway.

EXPERIMENTAL PROCEDURE:

Conservation Analysis: Analysis of conserved residues was performed using the ConSurf

server with standard settings (Ashkenazy et al., 2010). The crystal structure of the

propeptide:subtilisin E complex (PDB: 1SCJ) was used as input for analyzing bacterial

subtilases, while a homology model for the catalytic domain of PC1 based on the crystal

structure of furin (PDB: 1P8J) and an NMR solution structure of the PC1 propeptide (PDB:

1KN6) docked onto the catalytic domain using the subtilisin structure as a reference, was used

for eukaryotic subtilases. Results were analyzed and plotted using the UCSF Chimera package

(Pettersen et al., 2004).

Data acquisition: The BioMart interface of the InterPro database (Hunter et al., 2012) was

used to download UniProt sequence identifiers, start and stop positions, and taxonomy

identifiers of annotations from the entries PF00082, PF00112, and PF00656 of the PFAM

database for subtilases, cathepsins, and caspases, respectively (Punta et al., 2012; The UniProt

Consortium, 2012). Protein sequences were downloaded from the UniProt database. Phylogeny

was downloaded from the PFAM database, and taxonomy was obtained from the NCBI

Taxonomy homepage.

Amino acid content calculations: Sequences with two annotated catalytic domains or those

marked as deprecated in the UniProt database were discarded. The catalytic domains were

defined as sequences between the start and stop annotations while propeptides were defined

as sequences between position 20 and the start annotation for subtilases and cathepsins.

The first 20 residues were not included since they represent the signal peptide. Since caspases

lack signal peptides, residues from position 1 to the start annotations were denoted as

propeptides. Sequences with propeptides shorter than 50 residues or longer than 300 residues

were discarded. For the remaining sequences the amino acid contents of the propeptides and

catalytic domains, [AA]Pro and [AA]Cat, were calculated by dividing the number of occurrences of

the amino acid AA in a sequence by the sequence length. The difference between [AA]Pro and

[AA]Cat was calculated as Δ[AA].
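
The definitions above can be sketched as a short routine. This is an illustrative reading of the procedure rather than the code used in the study: the function names are hypothetical, and the coordinates are assumed to be 1-based and inclusive, so that the propeptide spans residue 21 to the residue before the start annotation.

```python
def split_domains(seq, start, stop, has_signal_peptide=True, signal_len=20):
    """Return (propeptide, catalytic domain) for 1-based, inclusive annotations (assumed)."""
    first = signal_len if has_signal_peptide else 0  # caspases: no signal peptide
    propeptide = seq[first:start - 1]
    catalytic = seq[start - 1:stop]
    return propeptide, catalytic


def content(region, aa):
    """Occurrences of amino acid `aa` divided by the region length."""
    return region.count(aa) / len(region)


def delta_content(seq, start, stop, aa="H", has_signal_peptide=True):
    """Delta[AA] = [AA]Pro - [AA]Cat, or None if the sequence would be discarded."""
    pro, cat = split_domains(seq, start, stop, has_signal_peptide)
    if not 50 <= len(pro) <= 300 or not cat:
        return None  # propeptide shorter than 50 or longer than 300 residues
    return content(pro, aa) - content(cat, aa)
```

For a caspase, the same call with `has_signal_peptide=False` treats residues from position 1 to the start annotation as the propeptide.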

Tree construction: NCBI taxonomy based trees were constructed using taxonomy identifiers

as input for the iTOL tree generator (Letunic and Bork, 2011) and adding each protein as a node

of its species. Trees were plotted using the ‘ape’ package written in the R statistical computing

language (Paradis et al., 2004; R Core Team, 2012).

Statistical testing: A non-parametric Mann-Whitney test was performed to assess differences

in the distribution of Δ[AA] between prokaryotes and eukaryotes using the R statistical

computing language. The effect size was calculated as U/mn, by dividing the test statistic U by

the product of the two sample sizes (Newcombe, 2006).
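
As an illustration of the effect size U/mn, the sketch below computes U as the number of pairs in which a value from the first sample exceeds a value from the second, counting ties as one half, and divides by the product of the sample sizes. The study used R; this Python version and its function names are hypothetical, and the tie convention is assumed to match the statistic reported by R.

```python
from bisect import bisect_left, bisect_right


def mann_whitney_u(x, y):
    """U for sample x versus y: number of pairs with x > y, ties counted as 1/2."""
    y_sorted = sorted(y)
    u = 0.0
    for value in x:
        below = bisect_left(y_sorted, value)            # y values strictly smaller
        ties = bisect_right(y_sorted, value) - below    # y values equal to x value
        u += below + 0.5 * ties
    return u


def effect_size(x, y):
    """U/mn: estimated probability that a random x exceeds a random y."""
    return mann_whitney_u(x, y) / (len(x) * len(y))
```

A value of 0.5 means the two distributions are not shifted relative to each other. With the sample sizes given in Table 1 (2156 and 4256), the histidine test statistic U = 7048731 corresponds to 7048731 / (2156 × 4256) ≈ 0.77.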

Sliding window analysis: For each sequence, the number of histidines, #His(i,k), in a window of length k starting at position i was counted for i ranging from 1 to n–k+1, where n is the length of the sequence. To account for different sequence lengths, the starting sequence positions were normalized as follows:

$\#\mathrm{His}_{\mathrm{norm}}(i,k) = \#\mathrm{His}\left(\frac{n}{\tilde{n}} \cdot i,\; k\right)$

where $\tilde{n}$ is the median sequence length and the term $\frac{n}{\tilde{n}} \cdot i$ was rounded to the nearest integer. For each position i, the $\#\mathrm{His}_{\mathrm{norm}}(i,k)$ values were averaged over all sequences and then divided by k to obtain the average histidine content, #His(i), at that position. This method assumes that differences in length due to insertions and deletions are evenly distributed within the protein. Using a multiple sequence alignment for normalization would potentially account better for the position of insertions, but the number of sequences and the low quality of the alignment, especially in the propeptide region, made that impractical for this study.
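
The sliding window procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function names are hypothetical, normalized positions are taken to run from 1 to (median length − k + 1), rounding is half-up, and mapped positions that fall outside a sequence are clamped to its first or last window.

```python
from statistics import median


def window_counts(seq, k):
    """#His(i, k) for i = 1 .. n-k+1 (stored 0-based), using a running count."""
    n = len(seq)
    if n < k:
        return []
    count = seq[:k].count("H")
    counts = [count]
    for i in range(1, n - k + 1):
        # Slide by one residue: add the entering residue, drop the leaving one.
        count += (seq[i + k - 1] == "H") - (seq[i - 1] == "H")
        counts.append(count)
    return counts


def average_his_profile(seqs, k=20):
    """Average histidine content per length-normalized window start position."""
    seqs = [s for s in seqs if len(s) >= k]
    n_med = median(len(s) for s in seqs)
    n_positions = int(n_med) - k + 1
    totals = [0.0] * n_positions
    for seq in seqs:
        counts = window_counts(seq, k)
        scale = len(seq) / n_med                 # n / median length
        for i in range(1, n_positions + 1):
            j = int(scale * i + 0.5)             # round to nearest integer
            j = min(max(j, 1), len(counts))      # clamp to valid windows (assumption)
            totals[i - 1] += counts[j - 1]
    return [t / (len(seqs) * k) for t in totals]
```

Multiplying the returned values by 100 gives the histidine content in percent, the unit used on the axes of the sliding window plots.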

ACKNOWLEDGEMENTS:

REFERENCES:

FIGURE LEGENDS:

Figure 1: Propeptides are more divergent than cognate catalytic domains. Conservation scores mapped onto a ribbon representation of (A) Subtilisin E and (B) PC1/3. Thick tubes represent high divergence at a position, while thin tubes represent conservation. Color indicates the percentage of sequences that encode a histidine residue at a position, from 0% (grey) to 100% (blue).

Figure 2: Histidines are enriched in propeptides of eukaryotic, but not prokaryotic,

subtilases. (A) Phylogenetic tree of subtilases from the PFAM database. Bars on the outside

indicate the Δ[His] value of each sequence. A black circle represents 0%. Bars pointing outward

and inward represent positive and negative Δ[His] values, respectively. Dashed circles outside

and inside of the solid black circle represent Δ[His] values of ±1%. Eukaryotic, prokaryotic, and

archaeal sequences are colored red, blue, and green, respectively. Black arcs on the outside

mark the clades of major subtilase subfamilies. (B) Tree based on the NCBI taxonomy

classification. Annotation of Δ[His] and color coding are as above. Thick black arcs mark the

three super-kingdoms of life, while thin arcs denote kingdoms of eukaryotes and phyla of

prokaryotes. (C) Kernel density estimation of the distribution of [His]Pro and [His]Cat in

prokaryotes and eukaryotes. (D) Kernel density estimation of the distribution of Δ[His] for

prokaryotes and eukaryotes. (E) Effect size U/mn of the Mann-Whitney test for difference

between the distributions shown in panel D performed for all 20 natural amino acids. Figure S1

and S2 show the Kernel density estimations for all amino acids. (F) Sliding Window Analysis of

average histidine content in eukaryotic and prokaryotic subtilases using a window of 20

residues. The black dashed line indicates average histidine content in the UniProt database.

See methods for detailed explanation of normalization of the sequence length. Arrows indicate

relative position of annotations for the end of the propeptide domain and the catalytic histidine

residue according to subtilisin E and PC1/3. (G) Bar graph showing [His]Pro and [His]Cat values

for selected subtilases. Blue, red, and green shades represent prokaryotic, eukaryotic, and

archaeal sequences, respectively. Light shades indicate [His]Cat and dark shades indicate [His]Pro.

Figure 3: Histidine enrichment exists only in propeptide domains of the Cathepsin L

family, while it is also present in the occluding loop of the Cathepsin B family. (A)

Phylogenetic tree of cathepsins from the PFAM database. Bars on the outside indicate the

Δ[His] value of each sequence. A black circle represents 0%. Bars pointing outward and inward

represent positive and negative Δ[His] values, respectively. Dashed circles outside and inside of

the solid black circle represent Δ[His] values of ±1%. Eukaryotic, prokaryotic, archaeal, and viral

sequences are colored red, blue, green, and cyan, respectively. Black arcs on the outside mark

the clades of major cathepsin subfamilies, with the cathepsin L family shown in green and the

cathepsin B family shown in purple. (B) Kernel density estimation of the distribution of [His]Pro

and [His]Cat in cathepsin L and B families and in prokaryotes. (C) Kernel density estimation of

the distribution of Δ[His] in cathepsin L and B families and in prokaryotes. (D) Sliding Window

Analysis of average histidine content in cathepsin L and B families and in prokaryotes using a

window of 20 residues. The black dashed line indicates average histidine content in the UniProt

database. See methods for detailed explanation of normalization of the sequence length.

Arrows indicate relative position of annotations for the end of the propeptide domain and the

catalytic histidine residue according to Cathepsin L and B, as well as the occluding loop in

cathepsin B. (E) Structure superimposition of procathepsin L (PDB: 1BY8) and procathepsin B

(PDB: 1MIR). The catalytic domains are shown in grey ribbon, while propeptides are shown in

green and purple for cathepsin L and B, respectively. The occluding loop of cathepsin B is

colored in orange and the corresponding loop in cathepsin L is colored green. The sidechains of

histidine residues are depicted as stick representations. (F) A close up of interactions between

the occluding loop and the propeptide. Colors are as above. Structural depictions were created

using the UCSF Chimera Suite (Pettersen et al., 2004).

Figure 4: The cytosolic caspase family shows no histidine bias in propeptides. (A)

Phylogenetic tree of caspases from the PFAM database. Bars on the outside indicate the Δ[His]

value of each sequence. A black circle represents 0%. Bars pointing outward and inward

represent positive and negative Δ[His] values, respectively. Dashed circles outside and inside of

the solid black circle represent Δ[His] values of ±1%. Prokaryotic, metazoan, plant, fungal and

other eukaryotic sequences are colored blue, yellow, cyan, purple and red, respectively. Black

arcs on the outside depict the metazoan caspase and metacaspase families. (B) Kernel density

estimation of the distribution of [His]Pro and [His]Cat in prokaryotes and metazoans, shown in blue

and yellow, respectively. (C) Kernel density estimation of the distribution of Δ[His] in metazoan

and prokaryotic caspases shown in yellow and blue, respectively. (D) Sliding Window Analysis

of average histidine content in metazoan and prokaryotic caspases using a window of 20

residues. See methods for detailed explanation of normalization of the sequence length. Arrows

indicate relative position of annotations for the end of the propeptide domain and the catalytic

histidine residue according to Caspase 2.

TABLE LEGENDS:

Table 1: Results of Mann-Whitney tests to evaluate differences in distribution of Δ[AA]

between eukaryotes and prokaryotes. For each amino acid the following numbers are

reported: Median of Δ[AA] for eukaryotes and prokaryotes, test statistic of the Mann-Whitney

test, the resulting p-value, and the effect size U/mn. Sample sizes were 2156 and 4256 for

eukaryotes and prokaryotes, respectively.

TABLES:

Residue   Median [%]    Median [%]     U          p                U/mn
          Eukaryotes    Prokaryotes
A         -2.01         -0.29          3484667    6.3 x 10^-56     0.38
V         -0.03         -0.13          4692108    1.4 x 10^-1      0.51
L          1.27          1.53          4494450    1.8 x 10^-1      0.49
I         -0.29         -0.61          4845335    2.4 x 10^-4      0.53
M         -0.32         -0.44          4781449    5.7 x 10^-3      0.52
F          0.53          0.06          5019341    7.3 x 10^-10     0.55
Y         -0.27         -1.00          5564109    3.6 x 10^-44     0.61
W         -0.46         -0.92          5529692    3.2 x 10^-41     0.60
S         -0.47         -0.18          4329195    2.2 x 10^-4      0.47
T         -1.36         -0.07          3616488    9.2 x 10^-44     0.39
N         -1.63         -1.86          4339854    3.9 x 10^-4      0.47
Q          1.11          1.49          4198344    2.6 x 10^-8      0.46
C         -1.25         -0.28          2881834    7.5 x 10^-132    0.31
G         -5.2          -5.3           4576112    8.7 x 10^-1      0.50
P         -0.67          0.04          3852582    8.5 x 10^-26     0.42
D         -0.25         -1.52          5644738    1.9 x 10^-51     0.62
E          2.85          2.51          4864907    7.7 x 10^-5      0.53
H          1.53         -0.56          7048731    1.6 x 10^-270    0.77
K          1.45          1.58          4356812    9.6 x 10^-4      0.47
R          1.86          0.95          5376156    2.2 x 10^-29     0.59

REFERENCES:

Anderson, E.D., Molloy, S.S., Jean, F., Fei, H., Shimamura, S., and Thomas, G. (2002). The ordered and compartment-specific autoproteolytic removal of the furin intramolecular chaperone is required for enzyme activation. J Biol Chem 277, 12879-12890.
Anderson, E.D., VanSlyke, J.K., Thulin, C.D., Jean, F., and Thomas, G. (1997). Activation of the furin endoprotease is a multiple-step process: requirements for acidification and internal propeptide cleavage. Embo J 16, 1508-1518.
Andreini, C., Bertini, I., Cavallaro, G., Najmanovich, R.J., and Thornton, J.M. (2009). Structural analysis of metal sites in proteins: non-heme iron sites as a case study. J Mol Biol 388, 356-380.
Ashkenazy, H., Erez, E., Martz, E., Pupko, T., and Ben-Tal, N. (2010). ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic acids research 38, W529-533.
Carter, P., and Wells, J.A. (1987). Engineering enzyme specificity by "substrate-assisted catalysis". Science 237, 394-399.
Casey, J.R., Grinstein, S., and Orlowski, J. (2010). Sensors and regulators of intracellular pH. Nat Rev Mol Cell Biol 11, 50-61.
Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, J.S., and Cygler, M. (1996). Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment. Embo J 15, 5492-5503.
Creagh, E.M., Conroy, H., and Martin, S.J. (2003). Caspase-activation pathways in apoptosis and immunity. Immunological reviews 193, 10-21.
Dillon, S.L., Williamson, D.M., Elferich, J., Radler, D., Joshi, R., Thomas, G., and Shinde, U. (2012). Propeptides Are Sufficient to Regulate Organelle-Specific pH-Dependent Activation of Furin and Proprotein Convertase 1/3. J Mol Biol 423, 47-62.
Dodson, G., and Wlodawer, A. (1998). Catalytic triads and their relatives. Trends Biochem Sci 23, 347-352.
Embley, T.M., and Martin, W. (2006). Eukaryotic evolution, changes and challenges. Nature 440, 623-630.
Feliciangeli, S.F., Thomas, L., Scott, G.K., Subbian, E., Hung, C.H., Molloy, S.S., Jean, F., Shinde, U., and Thomas, G. (2006). Identification of a pH sensor in the furin propeptide that regulates enzyme activation. J Biol Chem 281, 16108-16116.
Gunkel, F.A., and Gassen, H.G. (1989). Proteinase K from Tritirachium album Limber. Characterization of the chromosomal gene and expression of the cDNA in Escherichia coli. European journal of biochemistry / FEBS 179, 185-194.
Harrison, S.C. (2008). The pH sensor for flavivirus membrane fusion. The Journal of cell biology 183, 177-179.
Hebert, D.N., and Molinari, M. (2007). In and out of the ER: protein folding, quality control, degradation, and related human diseases. Physiological reviews 87, 1377-1408.
Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T.K., Bateman, A., Bernard, T., Binns, D., Bork, P., Burge, S., et al. (2012). InterPro in 2011: new developments in the family and domain prediction database. Nucleic acids research 40, D306-312.
Illy, C., Quraishi, O., Wang, J., Purisima, E., Vernet, T., and Mort, J.S. (1997). Role of the occluding loop in cathepsin B activity. J Biol Chem 272, 1197-1202.
Jain, S.C., Shinde, U., Li, Y., Inouye, M., and Berman, H.M. (1998). The crystal structure of an autoprocessed Ser221Cys-subtilisin E-propeptide complex at 2.0 A resolution. J Mol Biol 284, 137-144.
Letunic, I., and Bork, P. (2011). Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic acids research 39, W475-478.
Lopez-Otin, C., and Bond, J.S. (2008). Proteases: multifunctional enzymes in life and disease. J Biol Chem 283, 30433-30437.
Naghavi, M., John, R., Naguib, S., Siadaty, M.S., Grasu, R., Kurian, K.C., van Winkle, W.B., Soller, B., Litovsky, S., Madjid, M., et al. (2002). pH Heterogeneity of human and rabbit atherosclerotic plaques; a new insight into detection of vulnerable plaque. Atherosclerosis 164, 27-35.
Nagler, D.K., Storer, A.C., Portaro, F.C., Carmona, E., Juliano, L., and Menard, R. (1997). Major increase in endopeptidase activity of human cathepsin B upon removal of occluding loop contacts. Biochemistry 36, 12608-12615.
Newcombe, R.G. (2006). Confidence intervals for an effect size measure based on the Mann-Whitney statistic. Part 1: general issues and tail-area-based methods. Statistics in medicine 25, 543-557.
Nishimura, Y., Kawabata, T., and Kato, K. (1988). Identification of latent procathepsins B and L in microsomal lumen: characterization of enzymatic activation and proteolytic processing in vitro. Archives of biochemistry and biophysics 261, 64-71.
Oda, K., Sugitani, M., Fukuhara, K., and Murao, S. (1987). Purification and properties of a pepstatin-insensitive carboxyl proteinase from a gram-negative bacterium. Biochimica et biophysica acta 923, 463-469.
Oyama, H., Hamada, T., Ogasawara, S., Uchida, K., Murao, S., Beyer, B.B., Dunn, B.M., and Oda, K. (2002). A CLN2-related and thermostable serine-carboxyl proteinase, kumamolysin: cloning, expression, and identification of catalytic serine residue. Journal of biochemistry 131, 757-765.
Paradis, E., Claude, J., and Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20, 289-290.
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. Journal of computational chemistry 25, 1605-1612.
Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., et al. (2012). The Pfam protein families database. Nucleic acids research 40, D290-301.
Quraishi, O., Nagler, D.K., Fox, T., Sivaraman, J., Cygler, M., Mort, J.S., and Storer, A.C. (1999). The occluding loop in cathepsin B defines the pH dependence of inhibition by its propeptide. Biochemistry 38, 5017-5023.
R Core Team (2012). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing).
Reiser, J., Adair, B., and Reinheckel, T. (2010). Specialized roles for cysteine cathepsins in health and disease. The Journal of clinical investigation 120, 3421-3431.
Seidah, N.G., Mowla, S.J., Hamelin, J., Mamarbachi, A.M., Benjannet, S., Toure, B.B., Basak, A., Munzer, J.S., Marcinkiewicz, J., Zhong, M., et al. (1999). Mammalian subtilisin/kexin isozyme SKI-1: A widely expressed proprotein convertase with a unique cleavage specificity and cellular localization. Proceedings of the National Academy of Sciences of the United States of America 96, 1321-1326.
Seidah, N.G., and Prat, A. (2012). The biology and therapeutic targeting of the proprotein convertases. Nature reviews Drug discovery 11, 367-383.
Shinde, U., and Inouye, M. (1993). Intramolecular chaperones and protein folding. Trends Biochem Sci 18, 442-446.
Shinde, U., and Thomas, G. (2011). Insights from bacterial subtilases into the mechanisms of intramolecular chaperone-mediated activation of furin. Methods Mol Biol 768, 59-106.
Siezen, R.J., and Leunissen, J.A. (1997). Subtilases: the superfamily of subtilisin-like serine proteases. Protein science : a publication of the Protein Society 6, 501-523.
Slepkov, E.R., Rainey, J.K., Sykes, B.D., and Fliegel, L. (2007). Structural and functional analysis of the Na+/H+ exchanger. The Biochemical journal 401, 623-633.
Srivastava, J., Barber, D.L., and Jacobson, M.P. (2007). Intracellular pH sensors: design principles and functional significance. Physiology (Bethesda) 22, 30-39.
Subbian, E., Yabuta, Y., and Shinde, U. (2004). Positive selection dictates the choice between kinetic and thermodynamic protein folding and stability in subtilases. Biochemistry 43, 14348-14360.
Tangrea, M.A., Bryan, P.N., Sari, N., and Orban, J. (2002). Solution structure of the pro-hormone convertase 1 pro-domain from Mus musculus. J Mol Biol 320, 801-812.
The UniProt Consortium (2012). Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic acids research 40, D71-75.
Tsiatsiani, L., Van Breusegem, F., Gallois, P., Zavialov, A., Lam, E., and Bozhkov, P.V. (2011). Metacaspases. Cell death and differentiation 18, 1279-1288.
Turk, B., Dolenc, I., Turk, V., and Bieth, J.G. (1993). Kinetics of the pH-induced inactivation of human cathepsin L. Biochemistry 32, 375-380.
Turk, V., Stoka, V., Vasiljeva, O., Renko, M., Sun, T., Turk, B., and Turk, D. (2012). Cysteine cathepsins: from structure, function and regulation to new frontiers. Biochimica et biophysica acta 1824, 68-88.
Webb, B.A., Chimenti, M., Jacobson, M.P., and Barber, D.L. (2011). Dysregulated pH: a perfect storm for cancer progression. Nature reviews Cancer 11, 671-677.
Wlodawer, A., Li, M., Gustchina, A., Oyama, H., Dunn, B.M., and Oda, K. (2003). Structural and enzymatic properties of the sedolisin family of serine-carboxyl peptidases. Acta biochimica Polonica 50, 81-102.
Yu, I.M., Zhang, W., Holdaway, H.A., Li, L., Kostyuchenko, V.A., Chipman, P.R., Kuhn, R.J., Rossmann, M.G., and Chen, J. (2008). Structure of the immature dengue virus at low pH primes proteolytic maturation. Science 319, 1834-1837.

Figure 1 [image not reproduced]. Panels A and B. Structure labels: Catalytic Domain, Propeptide, C-terminus propeptide.

Figure 2 [image not reproduced]. Panels A to G.
Panel A subfamily labels: Kexin/PCs, Subtilisin, Pyrolysin/Cucumolisin, Proteinase K, Sedolisin.
Panel B clade labels: Archaea (Δ[His] = −0.37%), Eukaryota (Δ[His] = 1.72%), Bacteria (Δ[His] = −0.34%), Animals (Δ[His] = 2.36%), Plants (Δ[His] = 2.08%), Fungi (Δ[His] = 1.61%), Acidobacteria (Δ[His] = 2.13%), Actinobacteria, Proteobacteria, Firmicutes.
Panel C: density versus [His] [%] for [His]Cat and [His]Pro of prokaryotes and eukaryotes.
Panel D: density versus Δ[His] [%] for prokaryotes and eukaryotes.
Panel E: U/mn (0 to 1) for each of the 20 amino acids.
Panel F: histidine content [%] versus residue position (0 to 500), with markers for the end of the propeptide and the catalytic histidine of PC1 and subtilisin E.
Panel G: His content [%] ([His]Pro and [His]Cat; prokaryotes, eukaryotes, archaea) for Pyrolysin, Halolysin, Cucumisin, ARA12, XSP1, Proteinase K, Proteinase R, Cerevisin, Kexin, Furin, PC2, PC1, PC6, PC7, PC4, PC5, P80146, Aqualysin, P42780, P16558, Xanthomonalisin, Pseudomonalisin, Alkaline protease, Subtilisin E, Protease epr, Bacillopeptidase F, P54423, P00783, Subtilisin NAT, Subtilisin BPN', Subt. Carlsberg, Subtilisin, P20724, Kumamolisin, P41363, Q45670, Alkaline protease, Subtilisin J, Protease nisP.

Figure 3 [image not reproduced]. Panels A to F.
Panel A clade labels: Cathepsin O (Δ[His] = 0.28%), Cathepsin B, Cathepsin L, Cathepsin F (Δ[His] = 1.14%), Plant Cathepsins (Δ[His] = 1.81%).
Panel B: density versus [His] [%] for [His]Cat and [His]Pro of prokaryotes, cathepsin L, and cathepsin B.
Panel C: density versus Δ[His] [%] for prokaryotes, cathepsin L, and cathepsin B.
Panel D: histidine content [%] versus residue position (0 to 300), with markers for the end of the propeptide and the catalytic histidine of cathepsin L and cathepsin B, and for the occluding loop of cathepsin B.

Figure 4 [image not reproduced]. Panels A to D.
Panel A clade labels: Metazoan Caspases, Metacaspases (Δ[His] = 0.83%).
Panel B: density versus [His] [%] for [His]Cat and [His]Pro of prokaryotes and metazoans.
Panel C: density versus Δ[His] [%] for prokaryotes and metazoans.
Panel D: histidine content [%] versus residue position (0 to 350), with markers for the end of the propeptide and the catalytic histidine of caspase 2.

Figure S1 [image not reproduced]. One density panel per amino acid (Alanine, Valine, Leucine, Isoleucine, Methionine, Phenylalanine, Tyrosine, Tryptophan, Serine, Threonine, Asparagine, Glutamine, Cysteine, Glycine, Proline, Histidine, Aspartate, Glutamate, Lysine, Arginine). X-axis: Content [%] (0 to 20); y-axis: Density. Legend: Bacteria Prot, Bacteria IMC, Eukaryota Prot, Eukaryota IMC. Panel subtitles report N = 4256.

Figure S2 [image not reproduced]. One density panel per amino acid (Alanine, Valine, Leucine, Isoleucine, Methionine, Phenylalanine, Tyrosine, Tryptophan, Serine, Threonine, Asparagine, Glutamine, Cysteine, Glycine, Proline, Histidine, Aspartate, Glutamate, Lysine, Arginine). X-axis: Content Difference [%] (−6 to 6); y-axis: Density. Legend: Bacteria Prot, Bacteria IMC, Eukaryota Prot, Eukaryota IMC. Panel subtitles report N = 4256.

Figure S3 [image not reproduced]. Five panels showing histidine content [%] versus residue position (0 to 500) for windows of 10, 20, 30, 40, and 50 residues, with markers for the end of the propeptide and the catalytic histidine of PC1 and subtilisin E.

SUPPLEMENTAL TABLES:

TABLE S2 (histidine content in %):

                                  Prokaryotes                          Eukaryotes
Protein Name                      Propeptide   Catalytic   Δ(His)      Propeptide   Catalytic    Δ(His)
Cathepsin D/E Family              1.34         2.02        -0.68       2.92         1.15         +1.75
Carboxypeptidase Y                1.48         1.95        -0.47       2.16         2.01         +0.15
alpha-lytic protease              0.92         0.99        -0.07       1.61         0.82         +0.79
Legumain                          2.19         1.5         +0.69       8.37         4.4          +3.97
Lysosomal Acid Lipase             4.80         3.14        +1.66       5.00         3.36         +1.64
Lysosomal α-glucosidase
Coagulation factor VII
Cadherin-I                        0.81         1.16        -0.35       8.78         1.9          +6.88
β-Hexosaminidase                  1.53         3.26        -1.73       3.33         1.98 (α)     +1.35
                                  2.27         2.62        -0.35       3.11         2.13 (β)     +0.98
BMP4                                                                   6.22         5.11         +1.11
Platelet derived growth factor

SUPPLEMENTAL FIGURE LEGENDS:

Figure S1: Distributions of [AA]Pro and [AA]Cat for all 20 amino acids in eukaryotic and prokaryotic subtilases.

Figure S2: Distribution of Δ[AA] for all 20 amino acids in eukaryotic and prokaryotic subtilases.

Figure S3: Sliding window analysis of histidine content in eukaryotic and prokaryotic subtilases using different window sizes.

SUPPLEMENTAL TABLE LEGENDS:

Table S1: List of human proteins with histidine enrichment in their propeptides. All human proteins with annotated propeptides in the UniProt database that have more histidine in their propeptides than expected, assuming a 2.3% probability of histidine at each sequence position, with a significance level of 5%.

Table S2: Histidine content of propeptide and catalytic domain of several protein families. Homologs in eukaryotes and prokaryotes were identified using BLAST. Propeptide regions were assigned by homology using multiple sequence alignments.

| UniProt identifier | Name | Propeptide length | Histidine content | P(X>=k) |
|---|---|---|---|---|
| P12821 | Angiotensin-converting enzyme | 74 | 6.76% | 2.80E-02 |
| O14672 | ADAM10 | 194 | 7.73% | 5.08E-05 |
| O75078 | ADAM11 | 202 | 4.46% | 4.54E-02 |
| Q9Y3Q7 | ADAM18 | 168 | 4.76% | 4.15E-02 |
| Q9P0K1 | ADAM22 | 197 | 6.09% | 2.22E-03 |
| O75077 | ADAM23 | 227 | 4.85% | 1.69E-02 |
| Q9UKF2 | ADAM30 | 171 | 4.68% | 4.52E-02 |
| Q9BZ11 | ADAM33 | 174 | 5.17% | 2.01E-02 |
| O15204 | ADAM-like protein decysin-1 | 175 | 6.29% | 2.61E-03 |
| Q9H324 | ADAM-TS10 | 208 | 5.29% | 9.33E-03 |
| P58397 | ADAM-TS12 | 215 | 6.05% | 1.59E-03 |
| Q8TE57 | ADAM-TS16 | 255 | 5.88% | 9.60E-04 |
| Q8TE60 | ADAM-TS18 | 237 | 5.91% | 1.34E-03 |
| P59510 | ADAM-TS20 | 232 | 4.74% | 1.96E-02 |
| Q9UKP5 | ADAM-TS6 | 223 | 8.52% | 1.31E-06 |
| Q9P2N4 | ADAM-TS9 | 269 | 4.09% | 4.88E-02 |
| O95972 | Bone morphogenetic protein 15 | 249 | 5.22% | 5.55E-03 |
| P12643 | Bone morphogenetic protein 2 | 259 | 5.79% | 1.12E-03 |
| P12644 | Bone morphogenetic protein 4 | 273 | 6.23% | 2.38E-04 |
| P18075 | Bone morphogenetic protein 7 | 263 | 5.70% | 1.30E-03 |
| P55287 | Cadherin-11 | 31 | 12.90% | 5.36E-03 |
| Q13634 | Cadherin-18 | 29 | 17.24% | 4.82E-04 |
| P12830 | Cadherin-1 | 132 | 6.06% | 1.17E-02 |
| P14091 | Cathepsin E | 34 | 8.82% | 4.29E-02 |
| P09668 | Cathepsin H | 85 | 8.24% | 3.52E-03 |
| P43235 | Cathepsin K | 99 | 6.06% | 2.71E-02 |
| P25774 | Cathepsin S | 98 | 8.16% | 1.97E-03 |
| Q6YHK3 | CD109 antigen | 25 | 12.00% | 1.92E-02 |
| P0CG37 | Cryptic protein | 65 | 7.69% | 1.70E-02 |
| Q14126 | Desmoglein-2 | 26 | 11.54% | 2.13E-02 |
| P12259 | Coagulation factor V | 836 | 3.59% | 1.27E-02 |
| P02765 | Alpha-2-HS-glycoprotein | 40 | 12.50% | 2.17E-03 |
| P09958 | Furin | 83 | 6.02% | 4.28E-02 |
| O60383 | Growth/differentiation factor 9 | 295 | 4.41% | 2.04E-02 |
| P07686 | Beta-hexosaminidase subunit beta | 79 | 6.33% | 3.58E-02 |
| P55103 | Inhibin beta C chain | 218 | 4.59% | 3.07E-02 |
| P58166 | Inhibin beta E chain | 217 | 5.07% | 1.25E-02 |
| P51460 | Insulin-like 3 | 47 | 14.89% | 9.55E-05 |
| P19827 | ITI heavy chain H1 | 246 | 4.47% | 2.84E-02 |
| Q99538 | Legumain | 110 | 9.09% | 2.40E-04 |
| P09848 | Lactase-phlorizin hydrolase | 847 | 3.78% | 5.15E-03 |
| P10253 | Lysosomal alpha-glucosidase | 42 | 9.52% | 1.56E-02 |
| P14151 | L-selectin | 10 | 20.00% | 2.11E-02 |
| Q9NRE1 | Matrix metalloproteinase-26 | 72 | 8.33% | 6.34E-03 |
| P16519 | PCSK2 | 84 | 8.33% | 3.29E-03 |
| P29122 | PCSK6 | 86 | 5.81% | 4.86E-02 |
| P01127 | PDGF subunit B | 112 | 5.36% | 4.53E-02 |
| Q96B86 | Repulsive guidance molecule A | 147 | 5.44% | 2.10E-02 |
| P10600 | Transforming growth factor beta-3 | 280 | 5.00% | 5.96E-03 |
| Q9BZD6 | Proline-rich Gla protein 4 | 32 | 9.38% | 3.68E-02 |
| Q8N2E6 | Prosalusin | 163 | 5.52% | 1.37E-02 |
| O43915 | Vascular endothelial growth factor D | 216 | 6.02% | 1.66E-03 |
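The P(X>=k) column of Table S1 can be illustrated with a short Python sketch. It assumes the column is the upper tail of a binomial distribution, where n is the propeptide length, k is the observed number of histidines, and p = 0.023 is the per-position histidine probability given in the Table S1 legend. The source gives the 2.3% probability and the 5% significance level but does not name the test, so the binomial model is an assumption. The function name `binomial_upper_tail` is hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(n: int, k: int, p: float = 0.023) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i); lgamma avoids huge integers
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_term)
    return total
```

Under this assumption, the L-selectin row (propeptide length 10, 20.00% histidine, so k = 2) gives `binomial_upper_tail(10, 2)` of roughly 0.021, in line with the tabulated 2.11E-02. Summing the upper tail directly, rather than subtracting the lower tail from 1, keeps small probabilities such as the ADAM-TS6 entry from being lost to rounding.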

Date post:	20-Oct-2020