  • PROPEPTIDES OF PROTEASES

    EVOLVED SENSORS TO

    EXPLOIT ORGANELLAR PH

    Johannes Elferich, Danielle M. Williamson, Bala Krishnamoorthy¶, and Ujwal Shinde*

    Department of Biochemistry and Molecular Biology,

    Oregon Health and Science University, 3181 SW Sam Jackson Park Road,

    Portland, OR, 97239, USA

    ¶ Department of Mathematics, Washington State University

    Pullman, WA, 99164, USA

    *Corresponding Author:

    Ujwal Shinde, [email protected] Phone (503)-494-8683 Facsimile: (503)-494-8393

    Running title: Protease evolution to exploit organelle pH

  • SUMMARY:

    Eukaryotic cells maintain strict control over protein secretion, in part by utilizing the pH-gradient

    maintained within their secretory pathway. How eukaryotic proteins evolved from prokaryotic

    orthologs to exploit the pH-gradient for biological function remains a fundamental question in

    cell biology. We have previously demonstrated that protein domains located within precursor

    proteins, propeptides, encode histidine-driven pH-sensors to regulate organelle-specific

    activation of the eukaryotic proteases, furin and proprotein convertase-1/3. Using bioinformatics,

    we analyzed over 10,000 unique proteases within evolutionarily unrelated families, and

    established that eukaryotic propeptides are enriched in histidines when compared to prokaryotic

    orthologs. On this basis, we propose that eukaryotic proteins evolved to contain histidines within

    cognate propeptides to exploit the tightly controlled pH-gradient of the secretory pathway,

    thereby directing activation within specific organelles. Enrichment of histidine in propeptides

    may therefore be used to predict the presence of pH sensors in other proteases or even

    protease substrates.

    HIGHLIGHTS:

    • Histidine residues in propeptides act as pH-sensors in furin, a eukaryotic protease

    • Histidine is enriched in eukaryotic, but not prokaryotic, subtilase propeptides

    • Histidine enrichment is found in protein families unrelated to subtilases

    • We propose histidine enrichment as an evolutionary mechanism to sense organellar pH

  • INTRODUCTION:

    Eukaryotes are descendants of distinct prokaryotic cells that united symbiotically to

    subsequently evolve complex cellular compartments called organelles (Embley and Martin,

    2006). Although both prokaryotes and eukaryotes are able to secrete proteins, only eukaryotes

    employ multi-compartmental secretory and endocytotic pathways. These pathways maintain a

    precise pH-gradient that acidifies from the endoplasmic reticulum (pH~7.2) to secretory vesicles

    (pH~5.5). This gradient provides the unique environmental conditions essential for the optimal

    structure and function of proteins within distinct biochemical pathways (Casey et al., 2010).

    Since many secreted eukaryotic proteins have prokaryotic orthologs, how and when

    eukaryotic proteins evolved the ability to regulate their activity within different organelles is a

    central question germane to our understanding of protein trafficking. Comparing secreted

    eukaryotic proteins with their bacterial orthologs may potentially provide information about

    mechanisms by which protein activity is regulated during trafficking through the secretory

    pathway.

    Proteases hydrolyze peptide bonds and likely arose early during evolution as simple

    catabolic catalysts that generated amino acid residues in primitive organisms (Lopez-Otin and

    Bond, 2008). Due to their ubiquitous distribution within prokaryotes, eukaryotes, and archaea, the

    three domains of life, proteases are well suited for analysis of selective pressures that drove

    adaptation of eukaryotic proteins to the complex organelle trafficking system. Since uncontrolled

    proteolysis has catastrophic consequences, cells appear to have evolved two distinct

    mechanisms that maintain protease activities under exquisite spatiotemporal control (Lopez-

    Otin and Bond, 2008). The first mechanism involves co-evolution of specific endogenous

    inhibitors, typically within compartments distinct from those containing active enzymes. The

    second mechanism involves proteases being synthesized as inactive precursors called

    zymogens, which become active by limited intra- or intermolecular proteolysis. In some cases

    the two regulatory mechanisms are combined; N-terminal propeptides co-evolved to facilitate

    folding of cognate catalytic domains and act as potent inhibitors after cleavage from the catalytic

    domain (Shinde and Inouye, 1993; Shinde and Thomas, 2011).

    Subtilases, a ubiquitous super-family of serine proteases, represent an ideal group of

    homologs to analyze protein adaptation to eukaryotic organelles, since they exist in all three

    domains of life. Bacterial subtilisin and mammalian proprotein convertase (PC) sub-families

    constitute the most extensively studied enzymes (Shinde and Thomas, 2011). Despite

    evolutionary divergence, proteins in these subfamilies display common folds with conserved

    catalytic triads. Subtilases are almost always expressed as zymogens, with amino- and

    occasionally carboxy-terminal propeptide extensions. They are classified into two sub-families:

    Extracellular Serine Proteases (ESP) and Intracellular Serine Proteases (ISP) (Subbian et al.,

    2004). ESPs have 80-100 residue long propeptides that catalyze folding and act as inhibitors

    after cleavage, while ISPs have shorter propeptides that only act as inhibitors in the zymogen.

    Catalytic domains and propeptides of mammalian PCs are closely related to protease domains

    of ESPs and not ISPs (Shinde and Thomas, 2011). Similar to bacterial ESPs, propeptides of

    PCs assist folding and require two ordered steps of proteolytic cleavage for activation. The two

    proteolytic cleavages are precisely controlled within different organelles. The first cleavage

    occurs rapidly after protein folding in the endoplasmic reticulum and results in a non-covalent

    complex between the propeptide and the catalytic domain. Activation requires an additional

    cleavage within the propeptide; in the case of furin this cleavage occurs only after the protein is

    trafficked to a different organelle, the trans-golgi-network (TGN) (Anderson et al., 2002). Other

    PCs are activated in a similar manner, but within different compartments (Seidah and Prat,

    2012).

    Experiments in vitro show that the pH of the TGN is sufficient to trigger the second

    activating cleavage of furin (Anderson et al., 1997) and that a histidine residue in the propeptide

    acts as a pH-sensor (Feliciangeli et al., 2006). We recently showed that propeptides of PCs

    mediate the pH of activation, as swapping propeptides between PCs reassigned the pH of

    activation (Dillon et al., 2012). We therefore hypothesized that propeptides of eukaryotic

    subtilases evolved to sense organelle pH in order to direct activation. Such a broad hypothesis

    is difficult to test experimentally, as it would require biochemical studies on a large number of

    proteins. We overcame this problem by predicting properties of protein sequences based on our

    hypothesis, and testing these against sequence databases using statistical methods. Histidine is

    the only residue with an intrinsic pKa near the physiological range (~6.5) and is therefore likely

    to be involved in pH-sensing mechanisms. In this paper we show that enrichment of histidines in

    propeptides correlates with the requirement to sense pH for activation within the subtilase

    family. Furthermore, we demonstrate similar enrichment in other protease families, indicating

    that enrichment of histidines in propeptides is a common mechanism to regulate activity in the

    secretory pathway.
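
To make the pKa argument concrete, the Henderson-Hasselbalch relation gives the fraction of a titratable group that is protonated at a given pH. The sketch below is only an illustration: it plugs in the approximate pKa (~6.5) and the two pH values (7.2 and 5.5) quoted in the text, and the function name `fraction_protonated` is an assumption. The pKa of an individual histidine in a folded protein can differ from this intrinsic value.

```python
def fraction_protonated(ph: float, pka: float = 6.5) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group in its protonated form.

    The default pKa of 6.5 is the approximate intrinsic value quoted in the text;
    it is an assumption for illustration, not a measured value for any protein.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # pH values quoted in the text for the two ends of the secretory pathway.
    for label, ph in (("endoplasmic reticulum", 7.2), ("secretory vesicles", 5.5)):
        print(f"{label} (pH {ph}): {fraction_protonated(ph):.2f} protonated")
```

Under these assumptions, roughly 17% of such a histidine is protonated at pH 7.2 and roughly 91% at pH 5.5, which is why a residue with this pKa can respond to the gradient.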

    RESULTS:

    Propeptide sequences of subtilases are more divergent than cognate catalytic domains:

    To identify conserved sequence elements unique to either prokaryotic or eukaryotic subtilases,

    we performed an evolutionary conservation analysis using the ConSurf server (Ashkenazy et

    al., 2010). The analysis of prokaryotic subtilisin and eukaryotic proprotein convertase families

    was initiated using sequences of Subtilisin E and Proprotein Convertase 1/3 (PC1/3),

    respectively. The resulting conservation scores were mapped on the crystal structure of the

    propeptide:Subtilisin E complex (PDB: 1SCJ) and on a homology model of the propeptide:PC1

    complex (based on PDB: 1P8J and 1KN6), respectively (Figure 1). Catalytic domains of

    eukaryotic and prokaryotic subtilases display a highly conserved core. In contrast,

    propeptides demonstrate less sequence conservation, with the dibasic cleavage motif at the C-

    terminus of eukaryotic propeptides representing the only conserved region.

    Since histidine 69 was demonstrated to function as a pH sensor in furin (Feliciangeli et

    al., 2006), and given that propeptides of furin and PC1/3 alone are sufficient to impart organelle-

    specific pH-dependent activation of cognate catalytic domains (Dillon et al., 2012), we analyzed

    whether histidine residues demonstrate any sequence conservation within propeptides.

    Although we could not identify absolutely conserved histidine residues in propeptides of

    eukaryotic subtilases, several positions in our alignment contain a histidine residue in a

    substantial fraction of sequences, especially at the position corresponding to histidine 69 in furin

    (53.3% of sequences). In contrast, prokaryotic subtilases, which do not traverse the secretory

    pathway, appear to encode fewer histidines within their propeptides. However, when catalytic

    sequences are compared, we find strictly conserved histidine residues within prokaryotic and

    eukaryotic sequences, and studies indicate that they play essential roles in catalysis or protein

    stability (Carter and Wells, 1987). Hence, biased enrichment for histidine residues appears

    localized within propeptides of eukaryotic subtilases.

    The ConSurf analysis can only accommodate 150 sequences within each group, and a

    search initiated using Subtilisin E and PC1/3 may introduce a selection bias based on input

    sequences. Since subtilases encompass 10,107 unique sequences, as per the PFAM

    database family PF00082 (Punta et al., 2012), we developed a robust analysis of histidine

    distribution within the available sequence data. Since PFAM employs a hidden Markov model of

    only the catalytic domain in subtilases to scour through sequence databases, this method

    avoids any selection bias for propeptide sequences. As PFAM families include only approximate

    demarcations for start and stop positions for catalytic domains, we made the following three

    suppositions to define propeptide and catalytic domain in each sequence: (i) the first 20

    residues correspond to signal peptides (Hebert and Molinari, 2007), (ii) residues between

    position 21 and the start of the catalytic domain correspond to the propeptide, and (iii) subtilases

    with propeptides shorter than 50 residues represent ISP-like sequences that employ different

    mechanisms for folding and activation (Subbian et al., 2004). This stringency provides a total of

    6533 unique sequences from the PF00082 family for further analyses of a histidine bias.
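
As an illustration, the three suppositions above could be applied to a single sequence roughly as follows. This is a minimal Python sketch, not the pipeline used in the study: the function name `split_regions`, the `Regions` container, and the 1-based inclusive `cat_start`/`cat_end` arguments (standing in for the PFAM catalytic-domain boundaries) are all assumptions made for the example.

```python
from typing import NamedTuple, Optional

SIGNAL_PEPTIDE_LENGTH = 20  # supposition (i): first 20 residues are the signal peptide
MIN_PROPEPTIDE_LENGTH = 50  # supposition (iii): shorter propeptides are treated as ISP-like


class Regions(NamedTuple):
    propeptide: str
    catalytic: str


def split_regions(sequence: str, cat_start: int, cat_end: int) -> Optional[Regions]:
    """Split a precursor sequence into propeptide and catalytic domain.

    cat_start and cat_end are 1-based, inclusive positions of the catalytic
    domain (hypothetical inputs). Returns None when the propeptide is shorter
    than MIN_PROPEPTIDE_LENGTH, i.e. the sequence is excluded as ISP-like.
    """
    # Supposition (ii): residues 21 .. cat_start-1 form the propeptide.
    propeptide = sequence[SIGNAL_PEPTIDE_LENGTH:cat_start - 1]
    catalytic = sequence[cat_start - 1:cat_end]
    if len(propeptide) < MIN_PROPEPTIDE_LENGTH:
        return None
    return Regions(propeptide, catalytic)
```

For a toy precursor with a 20-residue signal peptide, a 60-residue propeptide, and a catalytic domain starting at position 81, the function returns both regions; with a 30-residue propeptide it returns `None`.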

    Increased histidine content in propeptides of subtilases correlates with the requirement for

    pH-mediated activation:

    We computed the abundance of histidine residues in propeptides ([His]Pro) and catalytic

    domains ([His]Cat) for all sequences that met the above criteria. For comparison, we calculated

    the difference in abundance in propeptides and catalytic domains within each protein (Δ[His] =

    [His]Pro – [His]Cat). A positive value of Δ[His] indicates abundance of histidines in propeptides, a

    negative value signifies abundance in catalytic domains, while near zero values imply equal

    distribution. While Δ[His] values in individual proteins may be subject to random fluctuations, the

    absence of any functional requirements would result in a distribution centered around zero. If

    histidine residues in propeptides are required for the experimentally observed function of

    sensing organelle specific pH, they would be selected during evolution, and one would expect

    statistical bias for positive Δ[His] only within eukaryotic subtilases and near zero or negative

    Δ[His] for prokaryotic subtilases.
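
The definition Δ[His] = [His]Pro – [His]Cat can be written out directly. The sketch below assumes that abundance is expressed as a percentage of residues in each region, which matches the percent values reported in the text; the function names `histidine_percent` and `delta_his` are hypothetical.

```python
def histidine_percent(region: str) -> float:
    """Histidine abundance of a region, as a percentage of its residues."""
    if not region:
        raise ValueError("region must contain at least one residue")
    return 100.0 * region.upper().count("H") / len(region)


def delta_his(propeptide: str, catalytic: str) -> float:
    """Delta[His] = [His]Pro - [His]Cat, in percentage points.

    Positive values mean histidine is more abundant in the propeptide,
    negative values mean it is more abundant in the catalytic domain.
    """
    return histidine_percent(propeptide) - histidine_percent(catalytic)
```

For example, a 50-residue propeptide with 5 histidines (10%) paired with a 100-residue catalytic domain with 4 histidines (4%) gives a Δ[His] of 6 percentage points.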

    For initial assessments, we plotted Δ[His] on a phylogenetic tree generated by the PFAM

    database (Figure 2A). The tree is consistent with the homology groups defined by Siezen and

    coworkers (Siezen and Leunissen, 1997), with the largest clades representing subtilisin, kexin,

    proteinase K, and pyrolysin, as well as the later characterized sedolisin family (Wlodawer et al.,

    2003). Four of these five families contain eukaryotic and prokaryotic proteins, suggesting these

    families diverged before speciation. Only the subtilisin family is exclusively found within

    prokaryotes. Interestingly, we observed that three of these four families display a predominantly

    positive Δ[His] in eukaryotes, but not in prokaryotes. Only sedolisins show positive Δ[His] values

    in both prokaryotes and eukaryotes.

    To validate that positive Δ[His] values are unique to eukaryotic sequences, we

    constructed a tree based on NCBI taxonomic classification and plotted Δ[His] within all

    subtilases (Figure 2B). The slightly negative mean values of Δ[His] in prokaryotic and archaeal

    proteins imply that there is no functional requirement for histidines in prokaryotic propeptides.

    In contrast, we observe predominantly positive Δ[His] values in eukaryotes, with a mean value

    of 1.72%. This difference signifies a strong increase in histidine content in the propeptide

    compared to the catalytic domain. We observed positive Δ[His] values in all 3 kingdoms of

    higher eukaryotes. In bacteria, the difference of about -0.3% was consistent in the 3 most

    represented phyla. Interestingly, the phylum of Acidobacteria had a mean difference of

    2.13%, comparable to eukaryotes.

    Although the above analysis provides a visual description, we wanted to analyze the

    statistical significance of the observed bias towards positive Δ[His] values within eukaryotes.

    First, we plotted distributions of [His]Pro and [His]Cat for subtilases in prokaryotes and eukaryotes

    (Figure 2C). The catalytic domains in both species display a distribution centered on 2%, with

    eukaryotes having slightly higher [His]Cat values than prokaryotes, as expected from the average

    histidine content in the UniProt database. While the distribution of [His]Pro in prokaryotes is

    shifted towards lower values with several propeptides completely lacking histidines, the [His]Pro

    in eukaryotes is shifted to higher values, much greater than the catalytic domains. It is important

    to note that the distribution of [His]Pro in eukaryotes displays a much higher deviation than that

    for [His]Cat within both prokaryotes and eukaryotes, which is likely due to the shorter length of

    propeptides. When we investigated distributions for every amino acid we found that this

    enrichment exists only for histidine residues (Figure S1). To further analyze this bias we also

    investigated the distribution of Δ[His] (Figure 2D), which clearly demonstrates the differences in

    histidine bias in prokaryotes and eukaryotes. The Δ[His] distributions in both species are

    positively skewed, with median values of -0.56% (mean = -0.34%) and 1.5% (mean = 1.7%) for

    prokaryotes and eukaryotes, respectively. When differences in distribution for each individual

    amino acid were plotted (Δ[AA]), only cysteine displays a difference between prokaryotes and

    eukaryotes similar to histidine (Figure S2). The cysteine bias is likely due to higher prevalence

    of disulfide bonds in eukaryotes than prokaryotes. To quantify this distribution difference

    between species we employed a non-parametric Mann-Whitney test (Table 1). For several

    amino acids, the test resulted in small p-values (

    effect size of 0.5. As seen in Figure 2E, histidine shows the highest deviation from 0.5,

    suggesting this bias is not due to pure chance. Apart from histidine, only cysteine deviates substantially (more than 0.15

    units from 0.5), which is likely due to higher frequency of disulfide bonds in eukaryotes than

    prokaryotes. The fact that deviation from 0.5 in the effect size for histidine is considerably

    greater than that observed for cysteine suggests a biological significance for a histidine bias.
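
An effect size for which 0.5 means "no difference" is typically the probability of superiority, U / (n1 × n2), derived from the Mann-Whitney U statistic. The sketch below assumes that definition; the exact effect-size measure used in the study cannot be confirmed from the text, and the function name `probability_of_superiority` is hypothetical. A p-value for the same comparison could be obtained separately, for instance with `scipy.stats.mannwhitneyu`.

```python
from typing import Sequence


def probability_of_superiority(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mann-Whitney U for xs versus ys, scaled to [0, 1] as U / (n1 * n2).

    The result estimates the probability that a random value from xs exceeds
    a random value from ys (ties count one half). A value of 0.5 means the two
    samples are indistinguishable by rank. This is an assumed definition of
    the effect size, chosen because its null value is 0.5.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    u = 0.0
    # Direct pairwise comparison: O(n1 * n2), adequate for a few thousand values.
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u / (len(xs) * len(ys))
```

Applied to Δ[AA] values, `xs` would hold the eukaryotic values and `ys` the prokaryotic values for one amino acid; under this definition, the further the result lies from 0.5, the stronger the separation between the two groups.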

    Since possible errors in database annotation and differences in length between

    propeptides and catalytic domains may result in a false-positive bias, we developed a test that is

    independent of the start annotation in the PFAM database. We calculated histidine content in a

    20-residue sliding window from the beginning of the sequence to the end of catalytic domain for

    all sequences. After normalization as described in Methods, we averaged the resulting histidine

    content profiles for eukaryotic and prokaryotic proteins (Figure 2F). Eukaryotic but not

    prokaryotic proteins show an increase in histidine content in the first 100 residues,

    corresponding to the propeptide. Proteins from both species have increased histidine content at

    positions 200-250 along with a small increase at the C-terminus of the catalytic domain, likely

    due to presence of the catalytic histidine, and a conserved histidine at the C-terminus of the

    catalytic domain. Changes in length of the sliding window do not change the overall profile

    (Figure S3).
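
The sliding-window calculation can be sketched as follows. This is an illustrative Python version under stated assumptions: the window is advanced one residue at a time, content is reported as a percentage, and profiles are averaged position by position from the N-terminus. The normalization described in Methods is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence


def sliding_histidine_profile(sequence: str, window: int = 20) -> List[float]:
    """Histidine content (percent) in each window of `window` residues.

    Entry i covers residues i .. i+window-1 (0-based). Sequences shorter
    than the window yield an empty profile.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    if len(seq) < window:
        return []
    count = seq[:window].count("H")
    profile = [100.0 * count / window]
    for i in range(window, len(seq)):
        # Slide by one residue: add the entering residue, drop the leaving one.
        count += (seq[i] == "H") - (seq[i - window] == "H")
        profile.append(100.0 * count / window)
    return profile


def average_profiles(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Position-wise mean of several profiles aligned at the N-terminus.

    Positions beyond the end of a shorter profile are averaged only over the
    profiles that reach them (a simplification, not the study's normalization).
    """
    longest = max((len(p) for p in profiles), default=0)
    averaged = []
    for i in range(longest):
        values = [p[i] for p in profiles if i < len(p)]
        averaged.append(sum(values) / len(values))
    return averaged
```

Because each profile depends only on the raw sequence, this calculation does not rely on the annotated start of the catalytic domain, which is the property the test in the text is designed to have.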

    To decipher correlations that may exist between the histidine bias and experimental

    evidence of pH-dependent activation, we analyzed histidine contents in propeptides and

    catalytic domains of individual proteins. For prokaryotes and archaea, we selected proteins that

    displayed “reviewed status” in the UniProt database, and for eukaryotic sequences all homologs

    in Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, and the model proteases

    cucumisin and Proteinase K/R (Figure 2G). While most bacterial proteins show comparable

    histidine content in propeptides and catalytic domain (approximately 2%), Kumamolisin and

    Xanthomonalisin display histidine content >4% in their propeptides. Consistent with our

    hypothesis, both proteins undergo activation at acidic pH in vitro (Oda et al., 1987; Oyama et al.,

    2002), which is not surprising because their hosts display optimum growth under acidic

    conditions. Since the intracellular pH within these cells is maintained near neutral, pH sensing is

    an ideal mechanism for discerning intracellular and extracellular environments. Hence, it is not

    surprising that the sedolisin family shows a histidine bias within propeptides of prokaryotic

    sequences, given that this family is optimized to function at low pH. Interestingly, Acidobacteria

    almost exclusively express sedolisins, which explains their positive Δ[His] values. On the other

    hand, all eukaryotic propeptides display histidine content above 4%, with the exception of

    Proteinase K/R and SKI-1. Recombinant expression of proteinase K/R in E. coli produces active

    protease (Gunkel and Gassen, 1989) and SKI-1 loses its propeptide in the ER (Seidah et al.,

    1999), suggesting these processes occur at neutral pH, relaxing the necessity for histidines.

    In conclusion, our results demonstrate that the histidine bias in subtilase propeptides

    generally correlates with host species as it is present in eukaryotes but not in prokaryotes

    (Figure 2). These results are consistent with our hypothesis that during evolution subtilases

    responded to the requirement of regulating their activation as per the pH of their environment by

    encoding histidine residues in propeptides to sense pH and direct activation.

    Cathepsin propeptides are enriched in histidine:

    To investigate whether our hypothesis applies to other pH-activated, propeptide-

    dependent proteases, we analyzed histidine content in cathepsins, a large family of cysteine

    peptidases found mainly in lysosomes (Turk et al., 2012). Similar to subtilases, acidic pH

    initiates activation of cathepsins by propeptide proteolysis. Due to these parallels, we

    hypothesized that eukaryotic cathepsins should show a similar bias for histidine in their

    propeptides.

    We used the PFAM family PF00112 to obtain cathepsin sequences and analyzed them

    in a manner identical to subtilases. Figure 3A shows the phylogenetic tree for various

    cathepsins along with their Δ[His] values. While only a few cathepsin homologs exist in

    prokaryotes, this paucity is not due to exclusion of sequences based on our criteria. Although

    experimental data on prokaryotic cathepsins is scarce, we included them in the analysis for

    comparison. The two major well-studied lysosomal cathepsin families are cathepsin L and

    cathepsin B, both of which activate at low pH (Nishimura et al., 1988; Turk et al., 1993). Since

    no experimental data regarding activation of cathepsin O, cathepsin F, and plant cathepsins

    was found, we excluded them from further analysis. Nonetheless, the latter two cathepsins also

    display the Δ[His] bias.

    While the cathepsin L family shows bias towards positive Δ[His] values, the cathepsin B

    family does not. The distributions of [His]Pro and [His]Cat in the cathepsin L family are similar to

    those in eukaryotic subtilases (Figure 3B). However, the cathepsin B family displays increased

    [His]Pro and [His]Cat values, leading to near-zero Δ[His] values (Figure 3C). Prokaryotic

    cathepsins show similar distributions as prokaryotic subtilases. The small number of prokaryotic

    sequences precludes a statistical comparison between species with robustness similar to

    subtilases.

    We next applied the sliding window analysis to validate the increased histidine content in

    the propeptides of cathepsin L, and to map the specific location of increased histidine content in

    the sequence of cathepsin B (Figure 3D). Prokaryotic cathepsins showed low histidine content

    throughout the protein, with one peak between residues 250 and 300, which is due to the

    catalytic histidine. Consistent with our hypothesis, an additional increase in histidine content

    exists within the first 100 residues of cathepsin L. Interestingly, cathepsin B shows a moderate

    increase in histidine content within the first 100 residues compared to prokaryotes, but a

    substantial second peak corresponding to the occluding loop within the catalytic domain (Figure

    3D and 3E). A comparison of the crystal structures of cathepsin L and cathepsin B (Figure 3E)

    shows that the catalytic domains of the two families are similar. However, the cathepsin B

    propeptide is truncated compared to cathepsin L, while the occluding loop in the catalytic

    domain is longer in cathepsin B. Moreover, the cathepsin B occluding loop in the catalytic

    domain extends into the region occupied by the cathepsin L propeptide to form direct contacts

    with the cathepsin B propeptide. Notably, histidines within the occluding loop of cathepsin B

    occupy similar spatial locations as histidine residues within the cathepsin L propeptide. This

    suggests that the pH-sensing capability in cathepsin B is encoded not only within the

    propeptide, but also in the occluding loop within the catalytic domain. Consistent with this

    prediction, experimental data demonstrate that the occluding loop interacts with the propeptide

    in a pH-dependent manner and mutation of histidines in the occluding loop to alanine blocks

    activation (Quraishi et al., 1999). Moving pH-sensitivity from the propeptide into the catalytic

    domain provides an evolutionary advantage to cathepsin B by enabling it to switch between an

    endo- and exopeptidase in a pH dependent manner (Illy et al., 1997). In summary, these results

    are consistent with our hypothesis, although subtle variations can exist within individual

    propeptide-dependent protease families.

    Propeptides in the cytosolic caspase family do not display a histidine bias:

    Our hypothesis assumes that eukaryotic proteases require histidines in their propeptides to

    sense the pH of the secretory pathway. Therefore, proteases that are expressed and function in

    the cytosol would be expected to show no histidine bias within their propeptides. Caspases

    constitute the most prominent, propeptide-dependent cytosolic protease family. They are

    responsible for initiating apoptosis within eukaryotic cells (Creagh et al., 2003). Similar to

    subtilases and cathepsins, caspases are expressed as inactive zymogens and activated by

    proteolytic processing. Although apoptosis is linked with mild acidification of the cytosol, pH has

    not been shown to be important for triggering caspase activation.

    We used the PFAM family PF00656 to obtain caspase sequences and processed them

    as described in Methods. The phylogenetic tree demonstrates that caspase homologs are found

    in metazoans, fungi, and plants (Figure 4A). We excluded metacaspases (homologs in fungi and

    plants) from our analysis because their propeptides contain histidine residues that are involved

    in zinc binding (Tsiatsiani et al., 2011). Metazoan caspases demonstrate increased [His]Cat

    values, while the [His]Pro values are similar to those of prokaryotic caspases (Figure 4B).

    Consistently, Δ[His] values were slightly smaller for eukaryotic proteins (Figure 4C). The sliding

    window analysis of prokaryotic and eukaryotic caspases shows that there is no substantial

    histidine enrichment in the N-terminal residues (Figure 4D). Overall these results are consistent

    with the assumption that the functional requirement of histidines in the propeptide is unique to

    proteases that need to sense pH to direct their activation.

    DISCUSSION:

    We report a correlation of increased histidine content in propeptides with the

    requirement to sense pH. But does a correlation imply causality? Histidine residues play

    multiple unique roles in proteins because they can (i) function as proton exchangers in enzyme

    catalysis (Dodson and Wlodawer, 1998), (ii) form complexes with soft metals such as iron and

    zinc (Andreini et al., 2009), (iii) provide unique hydrogen bonding geometry, and (iv) alter protein

    structure and interactions in a pH-dependent manner. Since propeptides are not part of the

    active site that mediates proteolysis, and because the propeptides analyzed in this study do not

    bind metal ions (Coulombe et al., 1996; Jain et al., 1998; Tangrea et al., 2002), one can exclude

    the first two roles. It is also unlikely that propeptides in eukaryotes have a different requirement

    for hydrogen bonding than their prokaryotic orthologs, thus endorsing their roles as pH-sensors

    as the most likely explanation for the observed histidine bias.

    Histidine residues have been demonstrated to function as pH sensors within various

    proteins in prokaryotes and eukaryotes (Casey et al., 2010; Srivastava et al., 2007). In some

    cases such as Na+/H+ antiporters, pH-regulation is found in prokaryotic and eukaryotic orthologs

    (Slepkov et al., 2007). This is due to the common functional requirement to regulate intracellular

    pH. Since in the case of the protease families investigated here, pH-sensing seems to be

    unique to eukaryotic proteins (with the exception of sedolisins), one can surmise that this is

    indeed due to a functional requirement that is unique to eukaryotes, such as regulation within

    the secretory pathway.

    It is important to note that electrostatic interactions within a protein family can migrate

    during the course of evolution, even though their physiological functions such as pH sensing

    can be conserved. This may occur through mutations that introduce a redundant charge pair,

    which displays a pKa similar to the previous one, allowing the original charge to disappear during

    subsequent steps of evolution while maintaining or subtly modifying its titration properties,

    without requiring exquisite stereochemistry (Harrison, 2008). In fact, we have observed that the

    histidine residues in the propeptide of eukaryotic subtilases are not strictly conserved (Figure 1)

    but are spatially “unconstrained” within their sequence. Based on this finding, we preferred to

    analyze overall histidine content within propeptides and catalytic domains. Since histidine is

    among the least abundant amino acids within proteins (~2.3%), small changes in the number of histidines

    can significantly influence the overall content, especially in the rather short propeptides.

    We selected three specific families for our analysis because we wanted proteins that

    have (i) a well-distributed phylogeny in prokaryotes and eukaryotes, (ii) propeptide domains

    necessary for chaperoning folding, (iii) the structures of both propeptides and catalytic domains

    solved, and (iv) activation pathways which are experimentally well characterized. Subtilases,

    cathepsins, and caspases conform to the above requirements, and represent more than 10,000

    protease sequences.

    It is astonishing that the histidine bias is found in families that have no evolutionary

    relationship. Also, it is consistently found in subfamilies of subtilases, even though these likely

    diverged before the speciation of eukaryotes and prokaryotes. This suggests that different

    protease families may have reacted independently to the same evolutionary pressures to

    converge to similar solutions, and therefore represent examples of convergent evolution. We

    argue that there are two reasons why pH-sensing has independently evolved within propeptide

    domains. First, propeptides appear to be under less evolutionary pressure than cognate

    protease domains, which must maintain their catalytic function throughout evolution,

    necessitating preservation of the active site geometry. As a consequence, propeptides are more

    likely to develop new features that allow the modulation of the protease in different

    environments. Second, since catalytic domains must often work in diverse pH environments,

    the introduction of elements that change protein conformation in a pH-dependent manner is

    likely to be detrimental to protein stability and function, and such elements are better

    “outsourced” to domains of the protein that are no longer present in the mature protein. The

    notable exception is the cathepsin B family, where the presence of the pH-sensitive occluding

    loop in the catalytic domain allows a pH-dependent change in protease characteristics, which

    might be important for the biological function (Nagler et al., 1997).

    Further research must focus on the mechanisms by which histidines in propeptides

    mediate activation. Since not every histidine in a protein acts as a pH-sensor, determinants of

    pH-sensing other than the mere presence of histidines must exist. For example, the propeptides

    of furin and PC1/3 are both enriched in histidine, but they show differences in pH-sensitivity.

    How are such differences encoded within their sequences? Understanding the physical

    principles will also help predict the functional significance of histidines within sequences, an

    important challenge given the abundance of sequence data compared to experimental

    evidence.

    While details of the mechanism of pH-sensing in propeptide-dependent proteases are

    unknown, we speculate that two general mechanisms are possible. First, protonation of

    histidines at lower pH could destabilize the interaction between propeptide and protease

    domain, either by directly affecting charge or hydrophobic interactions, or by destabilizing the

    propeptide structure, which is required for binding. Since subtilases and cathepsins can

    autoactivate, the subsequent increase in the fraction of free protease leads to digestion of the

    free propeptide and thereby prevents rebinding. A second potential mechanism is that

    protonation of histidine leads to structural changes in the propeptides that make cleavage motifs

    within the propeptide accessible to active proteases. While a mixture of both mechanisms is

    likely responsible, we speculate that the second mechanism plays a prominent role, since a

    histidine within the core, and not at the interface with the catalytic domain, was essential for

    pH-mediated activation of furin (Feliciangeli et al., 2006). Also, the second mechanism was shown

    to regulate maturation of the dengue virus within the secretory pathway, where lowering the pH

    leads to drastic conformational changes in the capsid proteins, which expose cleavage sites to

    control capsid processing (Yu et al., 2008). In this case the substrate, and not the protease,

    controls activation, tempting one to speculate that such a mechanism may be found in other

    proteins processed by proteases within the secretory pathway.

    Since histidine enrichment correlates with the pH-mediated activation in subtilases and

    cathepsins, we speculate that it can be used to predict proteins that use a similar mechanism for

    activation. A list of all human proteins with annotated propeptides in the UniProt database

    that have more histidines in their propeptides than expected, assuming a probability for

    histidine of 2.3% (Table S1), includes 52 proteins that are either secreted or targeted to the

    secretory or endocytotic pathway. While this bias could be random or caused by other factors,

    such as zinc-binding sites, which could explain why metalloproteases like “A Disintegrin and

    metalloproteinase domain-containing protein” (ADAM) or matrix metalloproteases are frequent

    in the list, we propose that members of that list use the pH of the secretory pathway to regulate

    their proteolytic activation. While the lack of knowledge about the activation and functions of

    these propeptides, especially in prokaryotic homologs, did not allow us to include a detailed analysis,

    we note that several of these proteins show consistent enrichment of histidines in eukaryotic,

    but not prokaryotic, homologs (Table S2).
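    The screening criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the toy sequence is invented, and the binomial tail probability is an added measure that the text does not mention. Only the 2.3% background probability and the "more histidines than expected" rule come from the text.

```python
from math import comb

HIS_BACKGROUND = 0.023  # background histidine probability quoted in the text


def expected_histidines(length: int, p: float = HIS_BACKGROUND) -> float:
    """Expected number of histidines in a segment of the given length."""
    return length * p


def is_enriched(propeptide: str, p: float = HIS_BACKGROUND) -> bool:
    """True if the propeptide holds more histidines than the background predicts."""
    return propeptide.count("H") > expected_histidines(len(propeptide), p)


def binomial_tail(count: int, length: int, p: float = HIS_BACKGROUND) -> float:
    """P(X >= count) for X ~ Binomial(length, p).

    Added here for illustration only; the text does not describe such a test.
    """
    return sum(
        comb(length, j) * p**j * (1 - p) ** (length - j)
        for j in range(count, length + 1)
    )


if __name__ == "__main__":
    toy_propeptide = "MKHHAEHLLHSTQ" * 5  # hypothetical sequence, for illustration only
    observed = toy_propeptide.count("H")
    print(observed, expected_histidines(len(toy_propeptide)), is_enriched(toy_propeptide))
    print(binomial_tail(observed, len(toy_propeptide)))
```

    A tail probability of this kind would give one way to separate marginal cases from strongly enriched ones, which the simple threshold cannot do for short propeptides.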

    In summary, this study suggests a prominent role of the pH gradient in the secretory

    pathway in orchestrating the proteolytic processing of secreted proteins. Any disturbance in

    this gradient could therefore lead to dysregulation of protease activity. Dysregulation of

    proprotein convertases and cathepsins can have adverse effects and is associated with

    diseases like cancer, atherosclerosis, and Dent’s disease (Reiser et al., 2010; Seidah and Prat,

    2012). Since all these diseases are also associated with changes in cytosolic pH (Naghavi et

    al., 2002; Webb et al., 2011), studies are needed to determine whether the secretory

    pH gradient is also affected and whether pH dysregulation plays a role in

    disturbing the regulation of the secretory pathway.

  • EXPERIMENTAL PROCEDURE:

    Conservation Analysis: Analysis of conserved residues was performed using the ConSurf

    server with standard settings (Ashkenazy et al., 2010). The crystal structure of the

    propeptide:subtilisin E complex (PDB: 1SCJ) was used as input for analyzing bacterial

    subtilases, while a homology model for the catalytic domain of PC1 based on the crystal

    structure of furin (PDB: 1P8J) and an NMR solution structure of the PC1 propeptide (PDB:

    1KN6) docked onto the catalytic domain using the subtilisin structure as a reference, was used

    for eukaryotic subtilases. Results were analyzed and plotted using the UCSF Chimera package

    (Pettersen et al., 2004).

    Data acquisition: The BioMart interface of the InterPro database (Hunter et al., 2012) was

    used to download UniProt sequence identifiers, start and stop positions, and taxonomy

    identifiers of annotations from the entries PF00082, PF00112, and PF00656 of the PFAM

    database for subtilases, cathepsins, and caspases, respectively (Punta et al., 2012; The UniProt

    Consortium, 2012). Protein sequences were downloaded from the UniProt database. Phylogeny

    was downloaded from the PFAM database, and taxonomy was obtained from the NCBI

    Taxonomy homepage.

    Amino acid content calculations: Sequences with two annotated catalytic domains or those

    marked as deprecated in the UniProt database were discarded. The catalytic domains were

    defined as sequences between the start and stop annotations, while propeptides were defined

    as sequences between position 20 and the start annotations for subtilases and cathepsins.

    The first 20 residues were not included since they represent the signal peptide. Since caspases

    lack signal peptides, residues from position 1 to the start annotations were denoted as

    propeptides. Sequences with propeptides shorter than 50 residues or longer than 300 residues

    were discarded. For the remaining sequences the amino acid contents of the propeptides and

    catalytic domains, [AA]Pro and [AA]Cat, were calculated by dividing the number of occurrences of

    the amino acid AA in a sequence by the sequence length. The difference between [AA]Pro and

    [AA]Cat was calculated as Δ[AA] = [AA]Pro − [AA]Cat.
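    The content calculation above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' code: the function names are hypothetical, domain annotations are assumed to be 1-based and inclusive, and the propeptide is assumed to end at the residue just before the annotated domain start.

```python
from typing import Optional

SIGNAL_PEPTIDE_LENGTH = 20   # residues skipped for subtilases and cathepsins
MIN_PROPEPTIDE_LENGTH = 50   # shorter propeptides are discarded
MAX_PROPEPTIDE_LENGTH = 300  # longer propeptides are discarded


def content(segment: str, amino_acid: str) -> float:
    """Occurrences of one amino acid divided by the segment length."""
    return segment.count(amino_acid) / len(segment)


def delta_content(
    sequence: str,
    domain_start: int,
    domain_stop: int,
    amino_acid: str = "H",
    has_signal_peptide: bool = True,
) -> Optional[float]:
    """Return [AA]Pro - [AA]Cat, or None if the sequence fails the length filter.

    domain_start and domain_stop are assumed to be 1-based, inclusive positions.
    """
    skip = SIGNAL_PEPTIDE_LENGTH if has_signal_peptide else 0
    propeptide = sequence[skip:domain_start - 1]
    catalytic = sequence[domain_start - 1:domain_stop]
    if not MIN_PROPEPTIDE_LENGTH <= len(propeptide) <= MAX_PROPEPTIDE_LENGTH:
        return None
    if not catalytic:
        return None
    return content(propeptide, amino_acid) - content(catalytic, amino_acid)


if __name__ == "__main__":
    # Hypothetical toy protein: 20-residue signal peptide, 60-residue propeptide
    # with 6 histidines, 100-residue catalytic domain with 2 histidines.
    toy = "M" * 20 + "H" * 6 + "A" * 54 + "H" * 2 + "G" * 98
    print(delta_content(toy, domain_start=81, domain_stop=180))
```

    For caspases, which lack signal peptides, the same function would be called with has_signal_peptide=False so that the propeptide starts at position 1.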

    Tree construction: NCBI taxonomy based trees were constructed using taxonomy identifiers

    as input for the iTOL tree generator (Letunic and Bork, 2011) and adding each protein as a node

    of its species. Trees were plotted using the ‘ape’ package written in the R statistical computing

    language (Paradis et al., 2004; R Core Team, 2012).

    Statistical testing: A non-parametric Mann-Whitney test was performed to assess differences

    in the distribution of Δ[AA] between prokaryotes and eukaryotes using the R statistical

    computing language. The effect size was calculated as U/mn, by dividing the test statistic U by

    the product of the two sample sizes (Newcombe, 2006).
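    The effect size U/mn can be illustrated with a small pure-Python sketch. It assumes that U counts the pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half, so that U/mn estimates the probability that a randomly chosen value from the first sample is larger. The function names are hypothetical and the p-value is not computed here; in practice a statistics package (for example R's wilcox.test or SciPy's scipy.stats.mannwhitneyu) would supply both.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    y_sorted = sorted(y)
    u = 0.0
    for value in x:
        below = bisect_left(y_sorted, value)           # y values strictly smaller
        ties = bisect_right(y_sorted, value) - below   # y values equal to this x
        u += below + 0.5 * ties
    return u


def effect_size(x: Sequence[float], y: Sequence[float]) -> float:
    """U/mn: 0.5 means no shift, values near 1 mean x tends to exceed y."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical delta values, for illustration only.
    eukaryotes = [1.5, 2.0, 0.4, 3.1]
    prokaryotes = [-0.6, 0.1, -0.3, 0.4, -1.0]
    print(mann_whitney_u(eukaryotes, prokaryotes), effect_size(eukaryotes, prokaryotes))
```

    Sorting one sample and using binary search keeps the cost at O(m log n), which matters little for toy data but helps with samples of several thousand sequences.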

    Sliding window analysis: For each sequence the number of histidines, #His(i,k), in a window

    of length k starting at position i was counted, with i ranging from 1 to n–k+1, where n is the length of

    the sequence. To account for different sequence lengths, the starting sequence positions were

    normalized as follows:

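    $$\#\mathrm{His}_{\mathrm{norm}}(i,k) = \#\mathrm{His}\left(\frac{n}{\tilde{n}} \cdot i,\; k\right)$$

    where $\tilde{n}$ is the median sequence length and the term $\frac{n}{\tilde{n}} \cdot i$ was rounded to the nearest integer.

    For each position i, the $\#\mathrm{His}_{\mathrm{norm}}(i,k)$ values were averaged and then divided by k to obtain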

    the average histidine content, #His(i), at that position. This method assumes that differences in

    length due to insertions and deletions are evenly distributed within the protein. Using a multiple

    sequence alignment for normalization would potentially account better for the positions of

    insertions, but the number of sequences and the low quality of the alignment, especially in the

    propeptide region, made that impractical for this study.
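    The sliding window procedure can be sketched in Python as below. This is an illustration under several assumptions rather than the authors' implementation: a position i on the median-length scale is assumed to map to position round(n/ñ · i) in a sequence of length n (ñ being the median length), mapped positions that fall outside the valid window range are clamped to it, sequences shorter than the window are skipped, and Python's round() resolves exact halves to the nearest even integer. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def window_counts(sequence: str, k: int, residue: str = "H") -> List[int]:
    """counts[j] is the number of `residue` in the window of length k starting at 0-based j."""
    n = len(sequence)
    if n < k:
        return []
    count = sequence[:k].count(residue)
    counts = [count]
    for start in range(1, n - k + 1):
        # Slide by one: add the residue entering the window, drop the one leaving.
        count += (sequence[start + k - 1] == residue) - (sequence[start - 1] == residue)
        counts.append(count)
    return counts


def average_histidine_profile(sequences: Sequence[str], k: int = 20) -> List[float]:
    """Average histidine content (as a fraction) per normalized window start position."""
    usable = [s for s in sequences if len(s) >= k]
    n_median = median(len(s) for s in usable)
    n_positions = int(n_median) - k + 1
    totals = [0.0] * n_positions
    for seq in usable:
        counts = window_counts(seq, k)
        scale = len(seq) / n_median
        for i in range(1, n_positions + 1):
            # Map the normalized 1-based position onto this sequence, then clamp.
            source = min(max(round(scale * i), 1), len(counts))
            totals[i - 1] += counts[source - 1]
    return [total / len(usable) / k for total in totals]


if __name__ == "__main__":
    toy = ["H" * 5 + "A" * 25] * 3  # hypothetical sequences, for illustration only
    print(average_histidine_profile(toy, k=5)[:6])
```

    Multiplying the returned fractions by 100 gives percentages comparable to the histidine content axes of the sliding window panels.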

    ACKNOWLEDGEMENTS:

    REFERENCES:

  • FIGURE LEGENDS:

    Figure 1: Propeptides are more divergent than cognate catalytic domains. Conservation

    scores mapped onto a ribbon representation of (A) Subtilisin E and (B) PC1/3. Thick tubes

    represent high divergence at this position while thin tubes represent conservation. Color

    indicates percentage of sequences that encode a histidine residue at this position from 0%

    (grey) to 100% (blue).

    Figure 2: Histidines are enriched in propeptides of eukaryotic, but not prokaryotic,

    subtilases. (A) Phylogenetic tree of subtilases from the PFAM database. Bars on the outside

    indicate the Δ[His] value of each sequence. A black circle represents 0%. Bars pointing outward

    and inward represent positive and negative Δ[His] values, respectively. Dashed circles outside

    and inside of the solid black circle represent Δ[His] values of ±1%. Eukaryotic, prokaryotic, and

    archaeal sequences are colored red, blue, and green, respectively. Black arcs on the outside

    mark the clades of major subtilase subfamilies. (B) Tree based on the NCBI taxonomy

    classification. Annotation of Δ[His] and color coding are as above. Thick black arcs mark the

    three super-kingdoms of life, while thin arcs denote kingdoms of eukaryotes and phyla of

    prokaryotes. (C) Kernel density estimation of the distribution of [His]Pro and [His]Cat in

    prokaryotes and eukaryotes. (D) Kernel density estimation of the distribution of Δ[His] for

    prokaryotes and eukaryotes. (E) Effect size U/mn of the Mann-Whitney test for difference

    between the distributions shown in panel D performed for all 20 natural amino acids. Figure S1

    and S2 show the Kernel density estimations for all amino acids. (F) Sliding Window Analysis of

    average histidine content in eukaryotic and prokaryotic subtilases using a window of 20

    residues. The black dashed line indicates average histidine content in the UniProt database.

    See methods for detailed explanation of normalization of the sequence length. Arrows indicate

    relative position of annotations for the end of the propeptide domain and the catalytic histidine

    residue according to subtilisin E and PC1/3. (G) Bar graph showing [His]Pro and [His]Cat values

    for selected subtilases. Blue, red, and green shades represent prokaryotic, eukaryotic, and

    archaeal sequences, respectively. Light shades indicate [His]Cat and dark shades indicate [His]Pro.

  • Figure 3: Histidine enrichment exists only in propeptide domains of the Cathepsin L

    family, while it is also present in the occluding loop of the Cathepsin B family. (A)

    Phylogenetic tree of cathepsins from the PFAM database. Bars on the outside indicate the

    Δ[His] value of each sequence. A black circle represents 0%. Bars pointing outward and inward

    represent positive and negative Δ[His] values, respectively. Dashed circles outside and inside of

    the solid black circle represent Δ[His] values of ±1%. Eukaryotic, prokaryotic, archaeal, and viral

    sequences are colored red, blue, green, and cyan, respectively. Black arcs on the outside mark

    the clades of major cathepsin subfamilies, with the cathepsin L family shown in green and the

    cathepsin B family shown in purple. (B) Kernel density estimation of the distribution of [His]Pro

    and [His]Cat in cathepsin L and B families and in prokaryotes. (C) Kernel density estimation of

    the distribution of Δ[His] in cathepsin L and B families and in prokaryotes. (D) Sliding Window

    Analysis of average histidine content in cathepsin L and B families and in prokaryotes using a

    window of 20 residues. The black dashed line indicates average histidine content in the UniProt

    database. See methods for detailed explanation of normalization of the sequence length.

    Arrows indicate relative position of annotations for the end of the propeptide domain and the

    catalytic histidine residue according to Cathepsin L and B, as well as the occluding loop in

    cathepsin B. (E) Structure superimposition of procathepsin L (PDB: 1BY8) and procathepsin B

    (PDB: 1MIR). The catalytic domains are shown in grey ribbon, while propeptides are shown in

    green and purple for cathepsin L and B, respectively. The occluding loop of cathepsin B is

    colored in orange and the corresponding loop in cathepsin L is colored green. The sidechains of

    histidine residues are depicted as stick representations. (F) A close up of interactions between

    the occluding loop and the propeptide. Colors are as above. Structural depictions were created

    using the UCSF Chimera package (Pettersen et al., 2004).

    Figure 4: The cytosolic caspase family shows no histidine bias in propeptides. (A)

    Phylogenetic tree of caspases from the PFAM database. Bars on the outside indicate the Δ[His]

    value of each sequence. A black circle represents 0%. Bars pointing outward and inward

    represent positive and negative Δ[His] values, respectively. Dashed circles outside and inside of

    the solid black circle represent Δ[His] values of ±1%. Prokaryotic, metazoan, plant, fungal, and

    other eukaryotic sequences are colored blue, yellow, cyan, purple, and red, respectively. Black

    arcs on the outside depict the metazoan caspase and metacaspase families. (B) Kernel density

    estimation of the distribution of [His]Pro and [His]Cat in prokaryotes and metazoans shown in blue

    and yellow, respectively. (C) Kernel density estimation of the distribution of Δ[His] in metazoan

    and prokaryotic caspases shown in yellow and blue, respectively. (D) Sliding Window Analysis

    of average histidine content in metazoan and prokaryotic caspases using a window of 20

    residues. See methods for detailed explanation of normalization of the sequence length. Arrows

    indicate relative position of annotations for the end of the propeptide domain and the catalytic

    histidine residue according to Caspase 2.

  • TABLE LEGENDS:

    Table 1: Results of Mann-Whitney tests to evaluate differences in distribution of Δ[AA]

    between eukaryotes and prokaryotes. For each amino acid the following numbers are

    reported: median of Δ[AA] for eukaryotes and prokaryotes, the test statistic U of the Mann-Whitney

    test, the resulting p-value, and the effect size U/mn. Sample sizes were 2156 and 4256 for

    eukaryotes and prokaryotes, respectively.

  • TABLES:

    Residue   Median [%]    Median [%]     U          p                U/mn
              Eukaryotes    Prokaryotes
    A         -2.01         -0.29          3484667    6.3 x 10^-56     0.38
    V         -0.03         -0.13          4692108    1.4 x 10^-1      0.51
    L          1.27          1.53          4494450    1.8 x 10^-1      0.49
    I         -0.29         -0.61          4845335    2.4 x 10^-4      0.53
    M         -0.32         -0.44          4781449    5.7 x 10^-3      0.52
    F          0.53          0.06          5019341    7.3 x 10^-10     0.55
    Y         -0.27         -1.00          5564109    3.6 x 10^-44     0.61
    W         -0.46         -0.92          5529692    3.2 x 10^-41     0.60
    S         -0.47         -0.18          4329195    2.2 x 10^-4      0.47
    T         -1.36         -0.07          3616488    9.2 x 10^-44     0.39
    N         -1.63         -1.86          4339854    3.9 x 10^-4      0.47
    Q          1.11          1.49          4198344    2.6 x 10^-8      0.46
    C         -1.25         -0.28          2881834    7.5 x 10^-132    0.31
    G         -5.2          -5.3           4576112    8.7 x 10^-1      0.50
    P         -0.67          0.04          3852582    8.5 x 10^-26     0.42
    D         -0.25         -1.52          5644738    1.9 x 10^-51     0.62
    E          2.85          2.51          4864907    7.7 x 10^-5      0.53
    H          1.53         -0.56          7048731    1.6 x 10^-270    0.77
    K          1.45          1.58          4356812    9.6 x 10^-4      0.47
    R          1.86          0.95          5376156    2.2 x 10^-29     0.59
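    As a quick arithmetic check of the effect-size column, the histidine row can be recomputed from the sample sizes stated in the table legend (m = 2156 eukaryotic and n = 4256 prokaryotic sequences). The assignment of m and n to the two groups is an assumption made for this illustration and does not affect the product:

```latex
\frac{U}{mn} = \frac{7048731}{2156 \times 4256} = \frac{7048731}{9175936} \approx 0.77
```

    A value of 0.5 would indicate no shift between the two distributions, so 0.77 for histidine is the largest departure from 0.5 among the 20 amino acids in the table.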

  • REFERENCES:

    Anderson, E.D., Molloy, S.S., Jean, F., Fei, H., Shimamura, S., and Thomas, G. (2002). The ordered and compartment-specific autoproteolytic removal of the furin intramolecular chaperone is required for enzyme activation. J Biol Chem 277, 12879-12890.

    Anderson, E.D., VanSlyke, J.K., Thulin, C.D., Jean, F., and Thomas, G. (1997). Activation of the furin endoprotease is a multiple-step process: requirements for acidification and internal propeptide cleavage. Embo J 16, 1508-1518.

    Andreini, C., Bertini, I., Cavallaro, G., Najmanovich, R.J., and Thornton, J.M. (2009). Structural analysis of metal sites in proteins: non-heme iron sites as a case study. J Mol Biol 388, 356-380.

    Ashkenazy, H., Erez, E., Martz, E., Pupko, T., and Ben-Tal, N. (2010). ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic acids research 38, W529-533.

    Carter, P., and Wells, J.A. (1987). Engineering enzyme specificity by "substrate-assisted catalysis". Science 237, 394-399.

    Casey, J.R., Grinstein, S., and Orlowski, J. (2010). Sensors and regulators of intracellular pH. Nat Rev Mol Cell Biol 11, 50-61.

    Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, J.S., and Cygler, M. (1996). Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment. Embo J 15, 5492-5503.

    Creagh, E.M., Conroy, H., and Martin, S.J. (2003). Caspase-activation pathways in apoptosis and immunity. Immunological reviews 193, 10-21.

    Dillon, S.L., Williamson, D.M., Elferich, J., Radler, D., Joshi, R., Thomas, G., and Shinde, U. (2012). Propeptides Are Sufficient to Regulate Organelle-Specific pH-Dependent Activation of Furin and Proprotein Convertase 1/3. J Mol Biol 423, 47-62.

    Dodson, G., and Wlodawer, A. (1998). Catalytic triads and their relatives. Trends Biochem Sci 23, 347-352.

    Embley, T.M., and Martin, W. (2006). Eukaryotic evolution, changes and challenges. Nature 440, 623-630.

    Feliciangeli, S.F., Thomas, L., Scott, G.K., Subbian, E., Hung, C.H., Molloy, S.S., Jean, F., Shinde, U., and Thomas, G. (2006). Identification of a pH sensor in the furin propeptide that regulates enzyme activation. J Biol Chem 281, 16108-16116.

    Gunkel, F.A., and Gassen, H.G. (1989). Proteinase K from Tritirachium album Limber. Characterization of the chromosomal gene and expression of the cDNA in Escherichia coli. European journal of biochemistry / FEBS 179, 185-194.

    Harrison, S.C. (2008). The pH sensor for flavivirus membrane fusion. The Journal of cell biology 183, 177-179.

    Hebert, D.N., and Molinari, M. (2007). In and out of the ER: protein folding, quality control, degradation, and related human diseases. Physiological reviews 87, 1377-1408.

    Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T.K., Bateman, A., Bernard, T., Binns, D., Bork, P., Burge, S., et al. (2012). InterPro in 2011: new developments in the family and domain prediction database. Nucleic acids research 40, D306-312.

    Illy, C., Quraishi, O., Wang, J., Purisima, E., Vernet, T., and Mort, J.S. (1997). Role of the occluding loop in cathepsin B activity. J Biol Chem 272, 1197-1202.

    Jain, S.C., Shinde, U., Li, Y., Inouye, M., and Berman, H.M. (1998). The crystal structure of an autoprocessed Ser221Cys-subtilisin E-propeptide complex at 2.0 A resolution. J Mol Biol 284, 137-144.

    Letunic, I., and Bork, P. (2011). Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic acids research 39, W475-478.

    Lopez-Otin, C., and Bond, J.S. (2008). Proteases: multifunctional enzymes in life and disease. J Biol Chem 283, 30433-30437.

    Naghavi, M., John, R., Naguib, S., Siadaty, M.S., Grasu, R., Kurian, K.C., van Winkle, W.B., Soller, B., Litovsky, S., Madjid, M., et al. (2002). pH Heterogeneity of human and rabbit atherosclerotic plaques; a new insight into detection of vulnerable plaque. Atherosclerosis 164, 27-35.

    Nagler, D.K., Storer, A.C., Portaro, F.C., Carmona, E., Juliano, L., and Menard, R. (1997). Major increase in endopeptidase activity of human cathepsin B upon removal of occluding loop contacts. Biochemistry 36, 12608-12615.

    Newcombe, R.G. (2006). Confidence intervals for an effect size measure based on the Mann-Whitney statistic. Part 1: general issues and tail-area-based methods. Statistics in medicine 25, 543-557.

    Nishimura, Y., Kawabata, T., and Kato, K. (1988). Identification of latent procathepsins B and L in microsomal lumen: characterization of enzymatic activation and proteolytic processing in vitro. Archives of biochemistry and biophysics 261, 64-71.

    Oda, K., Sugitani, M., Fukuhara, K., and Murao, S. (1987). Purification and properties of a pepstatin-insensitive carboxyl proteinase from a gram-negative bacterium. Biochimica et biophysica acta 923, 463-469.

    Oyama, H., Hamada, T., Ogasawara, S., Uchida, K., Murao, S., Beyer, B.B., Dunn, B.M., and Oda, K. (2002). A CLN2-related and thermostable serine-carboxyl proteinase, kumamolysin: cloning, expression, and identification of catalytic serine residue. Journal of biochemistry 131, 757-765.

    Paradis, E., Claude, J., and Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20, 289-290.

    Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. Journal of computational chemistry 25, 1605-1612.

    Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., et al. (2012). The Pfam protein families database. Nucleic acids research 40, D290-301.

    Quraishi, O., Nagler, D.K., Fox, T., Sivaraman, J., Cygler, M., Mort, J.S., and Storer, A.C. (1999). The occluding loop in cathepsin B defines the pH dependence of inhibition by its propeptide. Biochemistry 38, 5017-5023.

    R Core Team (2012). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing).

    Reiser, J., Adair, B., and Reinheckel, T. (2010). Specialized roles for cysteine cathepsins in health and disease. The Journal of clinical investigation 120, 3421-3431.

    Seidah, N.G., Mowla, S.J., Hamelin, J., Mamarbachi, A.M., Benjannet, S., Toure, B.B., Basak, A., Munzer, J.S., Marcinkiewicz, J., Zhong, M., et al. (1999). Mammalian subtilisin/kexin isozyme SKI-1: A widely expressed proprotein convertase with a unique cleavage specificity and cellular localization. Proceedings of the National Academy of Sciences of the United States of America 96, 1321-1326.

    Seidah, N.G., and Prat, A. (2012). The biology and therapeutic targeting of the proprotein convertases. Nature reviews Drug discovery 11, 367-383.

    Shinde, U., and Inouye, M. (1993). Intramolecular chaperones and protein folding. Trends Biochem Sci 18, 442-446.

    Shinde, U., and Thomas, G. (2011). Insights from bacterial subtilases into the mechanisms of intramolecular chaperone-mediated activation of furin. Methods Mol Biol 768, 59-106.

    Siezen, R.J., and Leunissen, J.A. (1997). Subtilases: the superfamily of subtilisin-like serine proteases. Protein science : a publication of the Protein Society 6, 501-523.

    Slepkov, E.R., Rainey, J.K., Sykes, B.D., and Fliegel, L. (2007). Structural and functional analysis of the Na+/H+ exchanger. The Biochemical journal 401, 623-633.

    Srivastava, J., Barber, D.L., and Jacobson, M.P. (2007). Intracellular pH sensors: design principles and functional significance. Physiology (Bethesda) 22, 30-39.

    Subbian, E., Yabuta, Y., and Shinde, U. (2004). Positive selection dictates the choice between kinetic and thermodynamic protein folding and stability in subtilases. Biochemistry 43, 14348-14360.

    Tangrea, M.A., Bryan, P.N., Sari, N., and Orban, J. (2002). Solution structure of the pro-hormone convertase 1 pro-domain from Mus musculus. J Mol Biol 320, 801-812.

    The UniProt Consortium (2012). Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic acids research 40, D71-75.

    Tsiatsiani, L., Van Breusegem, F., Gallois, P., Zavialov, A., Lam, E., and Bozhkov, P.V. (2011). Metacaspases. Cell death and differentiation 18, 1279-1288.

    Turk, B., Dolenc, I., Turk, V., and Bieth, J.G. (1993). Kinetics of the pH-induced inactivation of human cathepsin L. Biochemistry 32, 375-380.

    Turk, V., Stoka, V., Vasiljeva, O., Renko, M., Sun, T., Turk, B., and Turk, D. (2012). Cysteine cathepsins: from structure, function and regulation to new frontiers. Biochimica et biophysica acta 1824, 68-88.

    Webb, B.A., Chimenti, M., Jacobson, M.P., and Barber, D.L. (2011). Dysregulated pH: a perfect storm for cancer progression. Nature reviews Cancer 11, 671-677.

    Wlodawer, A., Li, M., Gustchina, A., Oyama, H., Dunn, B.M., and Oda, K. (2003). Structural and enzymatic properties of the sedolisin family of serine-carboxyl peptidases. Acta biochimica Polonica 50, 81-102.

    Yu, I.M., Zhang, W., Holdaway, H.A., Li, L., Kostyuchenko, V.A., Chipman, P.R., Kuhn, R.J., Rossmann, M.G., and Chen, J. (2008). Structure of the immature dengue virus at low pH primes proteolytic maturation. Science 319, 1834-1837.

  • Figure 1 [image not reproduced]. Panels A and B; structure labels in each panel: Catalytic Domain, Propeptide, C-terminus of the propeptide.

  • Figure 2 [image not reproduced]. Panels A to G.

    Panel A clade labels: Kexin/PCs, Subtilisin, Pyrolysin/Cucumolisin, Proteinase K, Sedolisin.

    Panel B labels: Archaea (Δ[His] = −0.37%), Eukaryota (Δ[His] = 1.72%), Bacteria (Δ[His] = −0.34%), Animals (Δ[His] = 2.36%), Plants (Δ[His] = 2.08%), Fungi (Δ[His] = 1.61%), Acidobacteria (Δ[His] = 2.13%); the Δ[His] values for Actinobacteria, Proteobacteria, and Firmicutes are blank in the source.

    Panel C: Density versus [His] [%]; series: [His]Cat Prokaryotes, [His]Pro Prokaryotes, [His]Cat Eukaryotes, [His]Pro Eukaryotes.

    Panel D: Density versus Δ[His] [%]; series: Prokaryotes, Eukaryotes.

    Panel E: U/mn (0 to 1) for each of the 20 amino acids.

    Panel F: Histidine content [%] versus Residue; arrows: PC1 end of propeptide, Subtilisin E end of propeptide, PC1 catalytic histidine, Subtilisin E catalytic histidine.

    Panel G: His content [%] for Pyrolysin, Halolysin, Cucumisin, ARA12, XSP1, Proteinase K, Proteinase R, Cerevisin, Kexin, Furin, PC2, PC1, PC6, PC7, PC4, PC5, P80146, Aqualysin, P42780, P16558, Xanthomonalisin, Pseudomonalisin, Alkaline protease, Subtilisin E, Protease epr, Bacillopeptidase F, P54423, P00783, Subtilisin NAT, Subtilisin BPN', Subt. Carlsberg, Subtilisin, P20724, Kumamolisin, P41363, Q45670, Alkaline protease, Subtilisin J, Protease nisP; legend: Prokaryotes, Eukaryotes, Archaea; [His]Pro, [His]Cat.

  • Figure 3 [image not reproduced]. Panels A to F.

    Panel A clade labels: Cathepsin O (Δ[His] = 0.28%), Cathepsin F (Δ[His] = 1.14%), Plant Cathepsins (Δ[His] = 1.81%); the Δ[His] values for Cathepsin B and Cathepsin L are blank in the source.

    Panel B: Density versus [His] [%]; series: [His]Cat and [His]Pro for Prokaryotes, Cathepsin L, and Cathepsin B.

    Panel C: Density versus Δ[His] [%]; series: Prokaryotes, Cathepsin L, Cathepsin B.

    Panel D: % Histidine content versus Residue; arrows: Cathepsin L end of propeptide, Cathepsin B end of propeptide, Cathepsin B occluding loop, Cathepsin B catalytic histidine, Cathepsin L catalytic histidine.

    Panels E and F: structural depictions.

  • Figure 4 [image not reproduced]. Panels A to D.

    Panel A clade labels: Metazoan Caspases (Δ[His] value blank in the source), Metacaspases (Δ[His] = 0.83%).

    Panel B: Density versus [His] [%]; series: [His]Cat Prokaryotes, [His]Pro Prokaryotes, [His]Cat Metazoans, [His]Pro Metazoans.

    Panel C: Density versus Δ[His] [%]; series: Prokaryotes, Metazoans.

    Panel D: % Histidine content versus Residue; arrows: Caspase 2 end of propeptide, Caspase 2 catalytic histidine.

  • Figure S1

    [Figure residue: 20 density panels (N = 4256), one per amino acid: Alanine, Valine, Leucine, Isoleucine, Methionine, Phenylalanine, Tyrosine, Tryptophan, Serine, Threonine, Asparagine, Glutamine, Cysteine, Glycine, Proline, Histidine, Aspartate, Glutamate, Lysine, Arginine. x-axis: Content [%] (0-20); y-axis: Density. Legend: Bacteria Prot, Bacteria IMC, Eukaryota Prot, Eukaryota IMC.]

  • Figure S2

    [Figure residue: 20 density panels (N = 4256), one per amino acid: Alanine, Valine, Leucine, Isoleucine, Methionine, Phenylalanine, Tyrosine, Tryptophan, Serine, Threonine, Asparagine, Glutamine, Cysteine, Glycine, Proline, Histidine, Aspartate, Glutamate, Lysine, Arginine. x-axis: Content Difference [%] (-6 to 6); y-axis: Density. Legend: Bacteria Prot, Bacteria IMC, Eukaryota Prot, Eukaryota IMC.]

  • Figure S3

    [Figure residue: five panels, one per window size: 10, 20, 30, 40 and 50 residue windows. x-axis: Residue (0-500); y-axis: Histidine content [%] (0-7). Markers: PC1 end of propeptide, Subtilisin E end of propeptide, PC1 catalytic histidine, Subtilisin E catalytic histidine.]
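    Figure S3 plots histidine content over windows of 10 to 50 residues. The transcript does not say how the windows were computed or how sequences were pooled, so the following is only a minimal sketch for a single sequence, assuming a window that advances one residue at a time. The function name and the toy sequence are hypothetical and are not taken from the manuscript.

```python
def sliding_histidine_content(seq, window):
    """Percent histidine in every full window of `window` residues.

    Assumption: the window advances one residue at a time and only
    complete windows are reported (no padding at the sequence ends).
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        return []
    # Count histidines in the first window, then update the count
    # incrementally as the window slides along the sequence.
    count = seq[:window].count("H")
    contents = [100.0 * count / window]
    for i in range(window, len(seq)):
        count += (seq[i] == "H") - (seq[i - window] == "H")
        contents.append(100.0 * count / window)
    return contents


# Hypothetical toy sequence, for illustration only.
toy = "MHHKLAAHGSH"
for w in (2, 5):
    print(w, sliding_histidine_content(toy, w))
```

    Larger windows smooth the profile and smaller windows resolve local histidine clusters, which is presumably why several window sizes are compared in the figure.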

  • SUPPLEMENTAL TABLES:

    TABLE S2 (histidine content, %):

                                      Prokaryotes                        Eukaryotes
    Protein name                      Propeptide  Catalytic  Δ(His)      Propeptide  Catalytic  Δ(His)
    Cathepsin D/E family              1.34        2.02       -0.68       2.92        1.15       +1.75
    Carboxypeptidase Y                1.48        1.95       -0.47       2.16        2.01       +0.15
    alpha-lytic protease              0.92        0.99       -0.07       1.61        0.82       +0.79
    Legumain                          2.19        1.5        +0.69       8.37        4.4        +3.97
    Lysosomal acid lipase             4.80        3.14       +1.66       5.00        3.36       +1.64
    Lysosomal α-glucosidase
    Coagulation factor VII
    Cadherin-I                        0.81        1.16       -0.35       8.78        1.9        +6.88
    β-Hexosaminidase                  1.53        3.26       -1.73       3.33        1.98 (α)   +1.35
                                      2.27        2.62       -0.35       3.11        2.13 (β)   +0.98
    BMP4                                                                 6.22        5.11       +1.11
    Platelet-derived growth factor

    SUPPLEMENTAL FIGURE LEGENDS:
    Figure S1: Distributions of [AA]Pro and [AA]Cat for all 20 amino acids in eukaryotic and prokaryotic subtilases.
    Figure S2: Distribution of Δ[AA] for all 20 amino acids in eukaryotic and prokaryotic subtilases.
    Figure S3: Sliding-window analysis of histidine content in eukaryotic and prokaryotic subtilases using different window sizes.

    SUPPLEMENTAL TABLE LEGENDS:
    Table S1: List of human proteins with histidine enrichment in their propeptides. The table includes all human proteins with annotated propeptides in the UniProt database whose propeptides contain more histidine than expected, assuming a 2.3% probability of histidine at each sequence position, at a significance level of 5%.
    Table S2: Histidine content of the propeptide and catalytic domain of several protein families. Homologs in eukaryotes and prokaryotes were identified using BLAST. Propeptide regions were assigned by homology using multiple sequence alignments.
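    The Table S1 legend describes an enrichment criterion that reads like a one-sided binomial test: with a 2.3% chance of histidine at each position, P(X>=k) is the probability of seeing at least k histidines in a propeptide of a given length. The sketch below is written under that assumption; the function names are hypothetical, and the manuscript does not state which software was used or whether the 5% cutoff is strict.

```python
from math import comb


def binomial_tail(n, k, p=0.023):
    """P(X >= k) for X ~ Binomial(n, p).

    n: propeptide length, k: number of histidines observed,
    p: assumed per-position histidine probability (2.3% in the legend).
    """
    # Sum the lower tail P(X <= k - 1) and take the complement.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def is_enriched(n, k, p=0.023, alpha=0.05):
    """True if the tail probability falls below the significance level."""
    return binomial_tail(n, k, p) < alpha


# L-selectin row of Table S1: 10-residue propeptide with 20.00%
# histidine, i.e. 2 histidines.
print(f"{binomial_tail(10, 2):.3f}")  # prints 0.021
```

    For the L-selectin row this reproduces the listed value of 2.11E-02, which suggests the P(X>=k) column is a plain binomial tail. Taking the complement of the lower tail loses precision for very small probabilities, so this is only meant to illustrate the criterion.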

  • TABLE S1:

    UniProt identifier  Name                                  Length propeptide  Histidine content  P(X>=k)
    P12821              Angiotensin-converting enzyme         74                 6.76%              2.80E-02
    O14672              ADAM10                                194                7.73%              5.08E-05
    O75078              ADAM11                                202                4.46%              4.54E-02
    Q9Y3Q7              ADAM18                                168                4.76%              4.15E-02
    Q9P0K1              ADAM22                                197                6.09%              2.22E-03
    O75077              ADAM23                                227                4.85%              1.69E-02
    Q9UKF2              ADAM30                                171                4.68%              4.52E-02
    Q9BZ11              ADAM33                                174                5.17%              2.01E-02
    O15204              ADAM-like protein decysin-1           175                6.29%              2.61E-03
    Q9H324              ADAM-TS10                             208                5.29%              9.33E-03
    P58397              ADAM-TS12                             215                6.05%              1.59E-03
    Q8TE57              ADAM-TS16                             255                5.88%              9.60E-04
    Q8TE60              ADAM-TS18                             237                5.91%              1.34E-03
    P59510              ADAM-TS20                             232                4.74%              1.96E-02
    Q9UKP5              ADAM-TS6                              223                8.52%              1.31E-06
    Q9P2N4              ADAM-TS9                              269                4.09%              4.88E-02
    O95972              Bone morphogenetic protein 15         249                5.22%              5.55E-03
    P12643              Bone morphogenetic protein 2          259                5.79%              1.12E-03
    P12644              Bone morphogenetic protein 4          273                6.23%              2.38E-04
    P18075              Bone morphogenetic protein 7          263                5.70%              1.30E-03
    P55287              Cadherin-11                           31                 12.90%             5.36E-03
    Q13634              Cadherin-18                           29                 17.24%             4.82E-04
    P12830              Cadherin-1                            132                6.06%              1.17E-02
    P14091              Cathepsin E                           34                 8.82%              4.29E-02
    P09668              Cathepsin H                           85                 8.24%              3.52E-03
    P43235              Cathepsin K                           99                 6.06%              2.71E-02
    P25774              Cathepsin S                           98                 8.16%              1.97E-03
    Q6YHK3              CD109 antigen                         25                 12.00%             1.92E-02
    P0CG37              Cryptic protein                       65                 7.69%              1.70E-02
    Q14126              Desmoglein-2                          26                 11.54%             2.13E-02
    P12259              Coagulation factor V                  836                3.59%              1.27E-02
    P02765              Alpha-2-HS-glycoprotein               40                 12.50%             2.17E-03
    P09958              Furin                                 83                 6.02%              4.28E-02
    O60383              Growth/differentiation factor 9       295                4.41%              2.04E-02
    P07686              Beta-hexosaminidase subunit beta      79                 6.33%              3.58E-02
    P55103              Inhibin beta C chain                  218                4.59%              3.07E-02
    P58166              Inhibin beta E chain                  217                5.07%              1.25E-02
    P51460              Insulin-like 3                        47                 14.89%             9.55E-05
    P19827              ITI heavy chain H1                    246                4.47%              2.84E-02
    Q99538              Legumain                              110                9.09%              2.40E-04
    P09848              Lactase-phlorizin hydrolase           847                3.78%              5.15E-03
    P10253              Lysosomal alpha-glucosidase           42                 9.52%              1.56E-02
    P14151              L-selectin                            10                 20.00%             2.11E-02
    Q9NRE1              Matrix metalloproteinase-26           72                 8.33%              6.34E-03
    P16519              PCSK2                                 84                 8.33%              3.29E-03
    P29122              PCSK6                                 86                 5.81%              4.86E-02
    P01127              PDGF subunit B                        112                5.36%              4.53E-02
    Q96B86              Repulsive guidance molecule A         147                5.44%              2.10E-02
    P10600              Transforming growth factor beta-3     280                5.00%              5.96E-03
    Q9BZD6              Proline-rich Gla protein 4            32                 9.38%              3.68E-02
    Q8N2E6              Prosalusin                            163                5.52%              1.37E-02
    O43915              Vascular endothelial growth factor D  216                6.02%              1.66E-03

