  • Supporting Information

    Sobreira et al. 10.1073/pnas.1011223108

    SI Materials and Methods

    ALDH Sequences. The following genomes were examined for aldehyde dehydrogenase (ALDH) sequences: Aedes aegypti, Anopheles gambiae, Bos taurus, Branchiostoma floridae, Caenorhabditis elegans, Canis familiaris, Ciona intestinalis, Ciona savignyi, Drosophila melanogaster, Gallus gallus, Gasterosteus aculeatus, Homo sapiens, Macaca mulatta, Monodelphis domestica, Mus musculus, Nematostella vectensis, Ornithorhynchus anatinus, Oryzias latipes, Pan troglodytes, Phanerochaete chrysosporium, Phycomyces blakesleeanus, Populus trichocarpa, Rattus norvegicus, Saccharomyces cerevisiae, Strongylocentrotus purpuratus, Takifugu rubripes, Tetraodon nigroviridis, and Xenopus tropicalis. To these sequences, we added selected sequences from Saccoglossus kowalevskii EST and trace archives as well as ALDH sequences from various other animals (Apis mellifera, Bombyx mori, Drosophila pseudoobscura, Macaca fascicularis, Mesocricetus auratus, Oryctolagus cuniculus, Ovis aries, Pongo pygmaeus, Taeniopygia guttata, Tribolium castaneum, Xenopus laevis), plants (Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa, Populus trichocarpa, Secale cereale, Sorghum bicolor, Zea mays), and fungi (Chaetomium globosum, Cordyceps bassiana, Emericella nidulans, Magnaporthe grisea, Phaeosphaeria nodorum, Phanerochaete chrysosporium, Phycomyces blakesleeanus). To facilitate structural comparisons with published work, the numbering of ALDH amino acid residues was based on the classical numbering of residues in the mature human ALDH2 enzyme, which places the catalytic Cys at amino acid position 302 (1).
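The reference-based residue numbering described above can be illustrated with a short sketch. It assumes a pairwise alignment in which `-` marks a gap; the function name `reference_numbering`, the `first_number` parameter, and the toy sequences are hypothetical and are not taken from the original analysis.

```python
def reference_numbering(aligned_ref, aligned_target, first_number=1):
    """Map each target residue to the number of the reference residue it aligns with.

    Returns a list of (target_residue, reference_number) pairs. The number is
    None where the target residue sits opposite a gap in the reference.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    numbering = []
    ref_number = first_number - 1  # number of the last reference residue seen
    for ref_res, tgt_res in zip(aligned_ref, aligned_target):
        ref_is_residue = ref_res != "-"
        if ref_is_residue:
            ref_number += 1
        if tgt_res != "-":
            numbering.append((tgt_res, ref_number if ref_is_residue else None))
    return numbering


# Toy alignment (hypothetical sequences, not real ALDH data).
ref = "MKC-LA"
tgt = "-KCGLS"
result = reference_numbering(ref, tgt, first_number=300)
```

In this toy case the target Cys inherits number 302 from the reference, while the inserted Gly has no reference number.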

    Large-Scale Phylogenetic Analyses. ALDH sequences were obtained from www.aldh.org, whole genome data, EST databases, and trace archives by using both signature (InterPro IPR002086) (2) and BLAST searches (3). ALDH amino acid sequences were aligned by using MUSCLE (4) followed by manual refinement. Bayesian inference (BI) was carried out by using MrBayes 3.1 (5) with a WAG+I+Γ4 model predicted by ProtTest (6). Two runs of 5 million generations were computed for each tree. Convergence was verified, and the burn-in period was determined by plotting log likelihood versus time. Consensus trees and posterior probabilities were calculated by using the 50% majority rule. A maximum likelihood (ML) analysis was performed by using RAxML-VI-HPC with the PROTMIXWAG parameter (7) and with 100 bootstrap pseudoreplicates to assess node support. The phylogenetic analyses carried out with BI and ML resulted in essentially identical tree topologies.
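As a rough illustration of how posterior probabilities and a 50% majority-rule consensus follow from sampled trees, the sketch below counts clade frequencies after discarding a burn-in. It is not the MrBayes implementation: the representation of a tree as a set of clades, the function names, and the toy samples are all assumptions made for this example.

```python
from collections import Counter


def clade_support(sampled_trees, burn_in):
    """Return the frequency of each clade among the post-burn-in samples.

    Each sampled tree is a set of clades; each clade is a frozenset of taxa.
    """
    kept = sampled_trees[burn_in:]
    if not kept:
        raise ValueError("burn-in discards every sample")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep clades whose support exceeds the threshold (50% majority rule)."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy samples over four hypothetical taxa A-D; the first sample is burn-in.
ab, cd = frozenset("AB"), frozenset("CD")
ac, bd = frozenset("AC"), frozenset("BD")
samples = [{ac, bd}, {ab, cd}, {ab, cd}, {ac, bd}, {ab, cd}]
support = clade_support(samples, burn_in=1)
consensus = majority_rule(support)
```

Here the clades {A, B} and {C, D} appear in three of the four retained samples and are the only ones kept in the consensus.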

    ALDH Signatures. ALDH1 and ALDH2 sequence signatures were obtained by aligning human ALDH1A2 and human ALDH2 and confirmed in an exhaustive alignment containing vertebrate ALDH1 and ALDH2 sequences by using MultAlin (multalin.toulouse.inra.fr/multalin). Individual amino acid frequencies were obtained from weblogo.berkeley.edu.

    Ancestral Sequence Reconstruction. Ancestral protein sequences at internal nodes of the ALDH1/2 phylogeny were reconstructed by using the program PAML 3.15 (8), assuming a WAG+Γ4 model for a data matrix of 447 amino acid residues. To evaluate and limit the influence of fast evolving sequences on the ancestral sequence reconstruction, five separate datasets were analyzed (one including all available ALDH1/2 sequences, one excluding all sequences with long branches, one excluding all protostome sequences, one excluding all lophotrochozoan sequences, and one excluding all tunicate sequences). For each analysis, a corresponding ML tree was calculated by using RAxML-VI-HPC with the PROTMIXWAG parameter (7) with 100 bootstrap replicates to assess node support, which served as input tree for the ancestral sequence calculation. These control calculations, based on varying taxonomic sampling, successfully tested the robustness of the ancestral sequence reconstruction approach as well as the subsequent ancestral channel modeling and ancestral channel volume calculation.

    3D Modeling. ALDH1 and ALDH2 structures were modeled by homology using the 3D structure obtained from the Protein Data Bank (PDB) of sheep ALDH1A1 (PDB ID code 1BXS) or of human ALDH2 in complex with the dipsogenic inhibitor daidzin (PDB ID code 1OF7). Each target sequence was globally aligned with the template by using ClustalW. This alignment was used as input data for the modeling program Nest in Jackal (9). Parameters were set for refinement in all loops and secondary structure regions. The stereochemical quality of the 3D models was evaluated by using Procheck (10). Retinal binding to the substrate access channel was carried out by anchoring the aldehyde structure in a position equivalent to that of daidzin. The 3D structure of retinal was obtained from MSD Ligand Chemistry (www.ebi.ac.uk/msd-srv/msdchem). Graphical analyses were performed with VMD (Visual Molecular Dynamics) (11).

    Cluster Analysis. We have compiled data on 202 ALDH1/2 volumes derived from 3D structure studies. The TwoStep Cluster algorithm as implemented in SPSS 13.0 (SPSS for Windows, Rel. 13.0.2004) was used both to define the best number of clusters that accommodate the empirical data and to classify individual ALDH1/2s into their respective volume clusters. The best number of clusters was chosen as a function of the smallest Schwarz Bayesian Criterion (BIC).
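The idea of choosing the number of clusters by the smallest BIC can be sketched with the textbook form of the criterion, BIC = -2 ln L + k ln n. This is only an illustration: SPSS TwoStep uses its own formulation, and the log-likelihoods and parameter counts below are hypothetical values invented for the example. Only the sample size of 202 comes from the text.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln L + k ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_cluster_count(candidates, n_obs):
    """Return the cluster count with the smallest BIC, plus all scores.

    `candidates` maps a cluster count to (log_likelihood, n_params).
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for 1 to 4 clusters of one-dimensional volumes.
candidates = {1: (-1200.0, 2), 2: (-1130.0, 5), 3: (-1090.0, 8), 4: (-1087.0, 11)}
best, scores = best_cluster_count(candidates, n_obs=202)
```

With these made-up numbers the fourth cluster improves the likelihood too little to offset its penalty, so the three-cluster model has the smallest score.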

    Retinal Binding. Retinal binding to the substrate access channel was carried out by anchoring the aldehyde structure in a position equivalent to daidzin. The retinal 3D structure was obtained from MSD Ligand Chemistry (www.ebi.ac.uk/msd-srv/msdchem). The retinal atoms O1, C15, C14, C13, C12, C11, and C10 were superposed on the daidzin atoms O34, C31, C30, C29, C28, C26, and C25, respectively.

    Substrate Access Channel Volume Calculations. ALDH models were individually placed inside a grid box, in which x, y, and z coordinates were spaced by 0.8 Å. Void regions inside this box were determined by sequentially moving a probe molecule with a 1.4 Å radius through all grid points. At each point, the volume occupied by the probe plus an average atomic radius of 1.6 Å was scanned. If no protein atoms were detected, the point was considered to be in a void region. Multiple cavities were detected, and to isolate the substrate access channel from other spaces, the structure of human ALDH2 in complex with the dipsogenic inhibitor daidzin (PDB ID code 1OF7) was superposed on the boxed model. The void region in the boxed ALDH model matching the localization of daidzin inside the ALDH2 channel was isolated for volume calculation. The total cavity volume was determined as a sum of all distinct volume elements within the isolated void region. Each volume element was a cube defined by eight points enclosing a volume of 0.512 Å³. Modeled cavities were manually inspected to remove subsidiary spaces not associated with the substrate access channel. The van der Waals volumes for all amino acids have been described (12).
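The grid scan described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original program: it treats a grid point as void when no atom centre lies within 1.4 Å + 1.6 Å = 3.0 Å of it (one reading of the scanning rule), it counts a cubic volume element when all eight of its corner points are void, and it omits the daidzin-based isolation of the channel and the manual inspection. Function names and the toy inputs are hypothetical. The stated element volume is consistent with the grid spacing, since 0.8³ = 0.512.

```python
import itertools
import math

SPACING = 0.8       # grid spacing in Å (from the text)
PROBE_RADIUS = 1.4  # probe radius in Å (from the text)
ATOM_RADIUS = 1.6   # average atomic radius in Å (from the text)


def void_points(atoms, box_min, n_points):
    """Return grid indices (i, j, k) with no atom centre closer than
    PROBE_RADIUS + ATOM_RADIUS (assumed interpretation of the scan)."""
    cutoff = PROBE_RADIUS + ATOM_RADIUS
    voids = set()
    for idx in itertools.product(*(range(n) for n in n_points)):
        point = tuple(box_min[a] + idx[a] * SPACING for a in range(3))
        if all(math.dist(point, atom) >= cutoff for atom in atoms):
            voids.add(idx)
    return voids


def cavity_volume(voids):
    """Sum the cubic volume elements whose eight corner points are all void."""
    corners = list(itertools.product((0, 1), repeat=3))
    cubes = sum(
        1
        for (i, j, k) in voids
        if all((i + di, j + dj, k + dk) in voids for di, dj, dk in corners)
    )
    return cubes * SPACING ** 3


# Toy case: an empty 3 x 3 x 3 grid contains 8 cubes of 0.8^3 Å³ each.
empty_box = void_points([], (0.0, 0.0, 0.0), (3, 3, 3))
empty_volume = cavity_volume(empty_box)
```

A real calculation would read atom coordinates from the homology model and restrict the void set to the region overlapping the superposed daidzin before summing.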

    Sobreira et al. www.pnas.org/cgi/content/short/1011223108 1 of 14


  • Statistical Analyses. Substrate access channel volumes were analyzed in unpaired sets of eukaryote ALDH1/2 and ALDH1L enzymes as well as in paired sets of vertebrate ALDH1/2s (ALDH1As, ALDH1B1s, and ALDH2s) by nonparametric ANOVA using the Kruskal–Wallis test and Dunn’s multiple comparisons test.
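For readers unfamiliar with the Kruskal–Wallis test, the sketch below computes its H statistic from pooled ranks, using average ranks for ties and the standard tie correction. It is an illustration only: the function name and the toy groups are hypothetical, and neither the p-value nor Dunn’s post hoc comparisons are implemented.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    rank_of = {}   # average 1-based rank of each distinct value
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    return h / correction


# Toy data: three fully separated groups (hypothetical values).
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For the toy data the rank sums are 6, 15, and 24, which gives H = 12/90 × 279 − 30 = 7.2. Under the null hypothesis H is approximately chi-square distributed with (number of groups − 1) degrees of freedom.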

    Gene Expression Studies. Sexually mature adults of the Florida amphioxus (B. floridae) were collected by shovel and sieve in Tampa Bay, Florida. Gametes were obtained by electric stimulation of adults. After fertilization, embryos and larvae were raised in the laboratory and fixed at different developmental stages. Five cDNA libraries were used for cloning ALDH1 and ALDH2 genes from amphioxus. The cDNA libraries were made from RNA of unfertilized eggs, gastrulae, neurulae, early larvae, and mature adults (13). In situ hybridization experiments were carried out as described (14) by using increased hybridization temperatures (65–70 °C) and nonconserved regions of the different genes as templates for antisense riboprobe synthesis to ensure probe specificity. Control experiments were carried out with sense riboprobes to verify the specificity of the expression patterns. After in situ hybridization, embryos were mounted on glass slides, analyzed under the microscope, and photographed as whole mounts.

    C. intestinalis adults were obtained from M-Rep. Embryos and sperm were surgically removed and kept separate until in vitro fertilization. Fertilized eggs were dechorionated as described (15). Embryos were raised at 18 °C on gelatin/formaldehyde-coated dishes in artificial seawater. Embryos were collected and fixed in 4% paraformaldehyde at various developmental stages (5–8 h, 10–12 h). Template DNA plasmids were retrieved from the C. intestinalis Gene Collection Release 1 library (16). Digoxigenin-labeled antisense RNA probes for ALDH1a, ALDH1b, ALDH1d, and ALDH2 were synthesized from clones GC38i10, GC38i10, GC30b02, and GC07d10, respectively. The ALDH1c probe was generated by using PCR amplification of the last exon from C. intestinalis genomic DNA. Whole-mount in situ hybridization was essentially performed as described (17) by using an increased hybridization temperature of 65 °C. Control experiments were carried out with sense riboprobes. Embryos were mounted in Permount on glass slides and expression was analyzed by using a Leica DMR microscope.

    1. Moore SA, et al. (1998) Sheep liver cytosolic aldehyde dehydrogenase: The structure reveals the basis for the retinal specificity of class 1 aldehyde dehydrogenases. Structure 6:1541–1551.

    2. Mulder NJ, et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31:315–318.

    3. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.

    4. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797.

    5. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

    6. Abascal F, Zardoya R, Posada D (2005) ProtTest: Selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.

    7. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

    8. Yang Z (1997) PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–556.

    9. Petrey D, et al. (2003) Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins 53(Suppl):430–435.

    10. Laskowski RA, Moss DS, Thornton JM (1993) Main-chain bond lengths and bond angles in protein structures. J Mol Biol 231:1049–1067.

    11. Humphrey W, Dalke A, Schulten K (1996) VMD: Visual molecular dynamics. J Mol Graph 14:27–28, 33–38.

    12. Richards FM (1974) The interpretation of protein structures: Total volume, group volume distributions and packing density. J Mol Biol 82:1–14.

    13. Yu J-K, et al. (2007) Axial patterning in cephalochordates and the evolution of the organizer. Nature 445:613–617.

    14. Holland LZ, Holland PWH, Holland ND (1996) Revealing homologies between body parts of distantly related animals by in situ hybridization to developmental stages: Amphioxus versus vertebrates. In Molecular Zoology: Advances, Strategies, and Protocols, eds Ferraris JD, Palumbi SR (Wiley-Liss, New York), pp 267–282 and 473–483.

    15. Mita-Miyazawa I, Ikegami S, Satoh N (1985) Histospecific acetylcholinesterase development in the presumptive muscle cells isolated from 16-cell-stage ascidian embryos with respect to the number of DNA replications. J Embryol Exp Morphol 87:1–12.

    16. Satou Y, et al. (2002) A cDNA resource from the basal chordate Ciona intestinalis. Genesis 33:153–154.

    17. Christiaen L, et al. (2008) The transcription/migration interface in heart precursors of Ciona intestinalis. Science 320:1349–1352.

    18. Klyosov AA (1996) Kinetics and specificity of human liver aldehyde dehydrogenases toward aliphatic, aromatic, and fused polycyclic aldehydes. Biochemistry 35:4457–4467.


  • [Figure S1: phylogram. Clade labels: ALDH 3, ALDH 4, ALDH 5, ALDH 6, ALDH 7, ALDH 8, ALDH 9, ALDH 16, Metazoan ALDH 1, Metazoan ALDH 2, Metazoan ALDH 1L, Fungal ALDH 1/2, Eukaryotic ALDH 1/2. Scale bar: 0.3.]

    Fig. S1. Phylogram of a comprehensive phylogenetic analysis of the ALDH superfamily. Posterior probabilities are indicated at each node. Each individual ALDH family is supported by significant posterior probabilities (>0.95). Bf, Branchiostoma floridae; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Dm, Drosophila melanogaster; Hs, Homo sapiens; Nv, Nematostella vectensis; Sc, Saccharomyces cerevisiae; Sp, Strongylocentrotus purpuratus.


  • [Figure S2: phylogram. Clade labels: Metazoan ALDH 8, Fungal ALDH 1/2, Plant ALDH 1/2, Metazoan ALDH 1L, Metazoan ALDH 1, Metazoan ALDH 2, RALDH 1, RALDH 2, RALDH 3, RALDH 4. Scale bar: 0.2.]

    Fig. S2. Phylogram of a phylogenetic analysis of the ALDH1/2, ALDH1L, and ALDH8 families with the latter as outgroup. Posterior probabilities are indicated at each node. Each individual ALDH family is supported by significant posterior probabilities (>0.95). At, Arabidopsis thaliana; Bf, Branchiostoma floridae; Ct, Capitella teleta; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Cs, Ciona savignyi; Dm, Drosophila melanogaster; Hs, Homo sapiens; Lg, Lottia gigantea; Mm, Mus musculus; Nv, Nematostella vectensis; Nt, Nicotiana tabacum; Os, Oryza sativa; Pc, Phanerochaete chrysosporium; Pb, Phycomyces blakesleeanus; Pt, Populus trichocarpa; Rn, Rattus norvegicus; Sc, Saccharomyces cerevisiae; Sk, Saccoglossus kowalevskii; Sp, Strongylocentrotus purpuratus; Ssp, Schizosaccharomyces pombe; Zm, Zea mays.


  • [Figure S3: plot titled "Distribution clusters of ALDH1/2 volumes". Axis: Volumes (Å³).]

    Fig. S3. ALDH1/2 clusters according to substrate access channel volume distribution. Average and SD values for the three clusters are 592.29 ± 42.07 for large channels, 453.33 ± 27.39 for medium channels, and 354.79 ± 32.50 for small channels.

    [Figure S4: panels A, B, and C. Panel B lists, for each paralog, the residues at positions 124, 459, and 303 and the channel volume (Å³): ALDH1a, Gly, Phe, Thr, 640; ALDH2, Leu, Phe, Cys, 331. The ALDH1b, ALDH1c, and ALDH1d panels carry the values Ser, Phe, Cys, 514; Ile, Met, Cys, 467; and Gly, Leu, Cys, 371 (panel order not preserved in the transcript). Structure views are labeled Position 124, Position 459, Position 303, and Position 124 + β-ionone. Panel C columns: 5–8 hours and 10–12 hours.]

    Fig. S4. Ciona intestinalis ALDH1/2 duplicates. Phylogeny (A), channel structure (B), and developmental expression (C). Amino acid signatures of the substrate entry channel at positions 124 (the mouth), 459 (the neck), and 303 (the bottom) are indicated. For the expression analyses, early embryos (5–8 h) and tailbud stage embryos (10–12 h) are shown. (Scale bars: 50 μm.)


  • [Figure S5: phylogeny and channel structure views labeled Position 124, Position 459, Position 303, and Position 124 + β-ionone. Residues at positions 124, 459, and 303 and channel volume (Å³): ALDH1a, Gly, Phe, Thr, 546; ALDH1b, Ala, Met, Cys, 561; ALDH2, Leu, Phe, Cys, 321.]

    Fig. S5. Phylogenetic and structural analysis of ALDH1/2s from the ascidian tunicate Ciona savignyi. The two ALDH1 duplicates from C. savignyi display heterogeneous properties at their substrate entry channels. Although ALDH1a and ALDH1b both display bulky amino acids at position 459, corresponding to the channel neck, ALDH1a displays the smallest amino acid (Gly) at position 124, which corresponds to the channel mouth. In contrast, in ALDH1b, this residue is substituted by the slightly larger Ala, which creates an obstacle to accommodate the large β-ionone moiety of retinaldehyde. Moreover, in ALDH1a, the amino acid at position 303 is Thr, which is typical for ALDH1s, whereas in ALDH1b, this amino acid is Cys, which is typical for ALDH2s. This pattern suggests that in C. savignyi, ALDH1a is best adapted for retinoic acid synthesis and that ALDH1b incorporated structural features reminiscent of ALDH2s, which process smaller, toxic aldehydes.


  • [Fig. S6 panel labels.
    (A) Branchiostoma floridae. Scaffolds: Scaffold_560 (425 283 bp), Scaffold_550 (447 241 bp), Scaffold_21 (3 715 217 bp), Scaffold_31 (3 505 904 bp), Scaffold_118 (1 970 888 bp), Scaffold_155 (1 674 870 bp), Scaffold_444 (671 902 bp). Scale bar: 120 kb. Tree labels: ALDH2, ALDH1a, ALDH1b, ALDH1c, ALDH1d, ALDH1e, ALDH1f. Gene labels on scaffolds: ALDH1, ALDH2.
    (B) Ciona intestinalis. Scaffolds: Scaffold_18 (551 401 bp), Scaffold_184 (183 910 bp), Scaffold_112 (259 664 bp). Scale bar: 20 kb. Tree labels: ALDH1a Ciona intestinalis, ALDH1a Ciona savignyi, ALDH1b Ciona intestinalis, ALDH1b Ciona savignyi, ALDH1c Ciona intestinalis, ALDH1d Ciona intestinalis, ALDH2 Ciona intestinalis, ALDH2 Ciona savignyi. Gene labels on scaffolds: ALDH1, ALDH2.]

    Fig. S6. Genomic linkage of ALDH1 and ALDH2 genes in the cephalochordate amphioxus and in the ascidian tunicate Ciona intestinalis. The positions of predicted ALDH1 and ALDH2 sequences on assembled scaffolds of the amphioxus (A) and the C. intestinalis (B) genomes are shown. (A) The phylogenetic relationship between the six lineage-specific amphioxus ALDH1 duplicates and the single amphioxus ALDH2 is indicated. For ALDH2, the analysis identified two copies in the amphioxus genome, one on scaffold 550 and one on scaffold 118, representing a single copy of amphioxus ALDH2 on each of the two homologous chromosomes. For the amphioxus ALDH1 duplicates, the analysis suggests that at least some genes, like ALDH1c and ALDH1d (on scaffold 155), ALDH1a and ALDH1e (on scaffold 560), or ALDH1a, ALDH1e, and ALDH1f (on scaffold 31), are located on the same scaffold and might, hence, be linked on the chromosome. In sum, the phylogeny and synteny data suggest that the six amphioxus-specific ALDH1 duplicates evolved by tandem duplications from an ALDH1a-like ancestor. (B) The phylogenetic relationship between C. intestinalis and C. savignyi ALDH1s and ALDH2s is indicated. In the C. intestinalis genome, ALDH2 and ALDH1a are found on individual scaffolds (on scaffold 184 and 18, respectively), whereas the other three C. intestinalis ALDH1 duplicates, ALDH1b, ALDH1c, and ALDH1d, are clustered on a single scaffold (on scaffold 112), suggesting that these three genes are linked on the same chromosome and might thus have originated by tandem duplications.


  • [Fig. S7 panel labels. Structure views: Position 124, Position 459, Position 303, Position 124 + β-ionone.]

    Enzyme     Species          Pos 124   Pos 459   Pos 303   Vol (Å³)
    ALDH2      H. sapiens       Met       Phe       Cys       377
    ALDH1B1    H. sapiens       Glu       Val       Cys       453
    ALDH2a     S. purpuratus    Leu       Phe       Cys       356
    ALDH2b     S. purpuratus    Gly       Val       Cys       487

    Fig. S7. Plasticity of the substrate access channel in deuterostomes. Vertebrate and echinoderm ALDH2s are examples of the evolutionary plasticity of ALDH1/2 channels. In vertebrates, there are two members of the ALDH2 clade: the ALDH2s and the ALDH1B1s. Vertebrate ALDH2s have conserved the amino acid signatures (Met124, Phe459) for shaping a narrow entrance (the mouth) and a tight-fitting proximal third (the neck) of the substrate access channel. The ALDH1B1s incorporated smaller Glu124 and Val459, which offer less resistance for accommodating the β-ionone moiety of retinaldehyde and alleviate constriction at the channel neck, hence increasing the overall channel volumes (362 ± 30 Å³ versus 451 ± 30 Å³ for ALDH2 and ALDH1B1, respectively, P < 0.05). A similar trend is observed for sea urchin (Strongylocentrotus purpuratus) ALDH2s. S. purpuratus ALDH2a displays the typical ALDH2 pattern with bulky Leu124 and Phe459 at the channel mouth and neck, respectively. In contrast, S. purpuratus ALDH2b incorporated small amino acids (Gly124 and Val459), which leave the channel mouth wide open to accommodate the β-ionone moiety of retinaldehyde and alleviate constriction at the channel neck, thus increasing the volume from 356 Å³ in ALDH2a to 487 Å³ in ALDH2b.
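The volume differences quoted in the Fig. S7 caption can also be read as relative increases. The following is a minimal sketch that uses only the four volumes given in the caption; the helper name `relative_increase` is illustrative and is not part of the original analysis.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the fractional increase going from `before` to `after`."""
    return (after - before) / before


# Channel volumes (in cubic angstroms) as quoted in the Fig. S7 caption.
sea_urchin = relative_increase(356.0, 487.0)  # S. purpuratus ALDH2a -> ALDH2b
vertebrate = relative_increase(362.0, 451.0)  # mean ALDH2 -> mean ALDH1B1

print(f"S. purpuratus ALDH2a -> ALDH2b: {sea_urchin:.1%}")
print(f"Vertebrate ALDH2 -> ALDH1B1:    {vertebrate:.1%}")
```

With these inputs the sea urchin duplicate shows an increase of roughly 37% and the vertebrate ALDH1B1 mean an increase of roughly 25% over the corresponding ALDH2 value. The ± 30 Å³ spread reported for the vertebrate means is not propagated in this sketch.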


  • [Fig. S8 diagram labels. ALDH1: large channel, wide entrance. ALDH2: small channel, narrow entrance. Positions 124, 459, and 303 are marked on each channel. Label: "Functional shifts".]

    Fig. S8. Gene duplication and functional evolution in the ALDH1/2 family. Metazoan ALDH1s are generally characterized by a large substrate entry channel (SEC) with a wide channel entrance, whereas metazoan ALDH2s have a small SEC with a narrow channel entrance (the three channel signatures important for defining the ALDH1 and ALDH2 SECs are indicated: position 124 located at the channel mouth, position 459 at the neck of the channel, and position 303 at the channel bottom). After duplication, ALDH1/2 duplicates accumulated mutations in their SECs that led to smaller channel sizes in ALDH1 duplicates (e.g., amphioxus and ascidian tunicate ALDH1s) and to bigger channel volumes in ALDH2 duplicates (e.g., vertebrate ALDH1B1 and sea urchin ALDH2). Hence, duplicated ALDH1/2s experienced functional shifts of their SECs, which are indicative of changes in specificity toward aldehydes of different sizes.


  • Table S1. Sequence signatures of vertebrate ALDH1 and ALDH2 enzymes

    [The "Vertebrate signature" column of the original table contains motif pictograms, which are not reproduced here. The Cephalochordate and Tunicate columns give the "Cephalochordate signature" and "Tunicate signature" entries.]

    Substitution         Cephalochordate  Tunicate  Type of substitution                                       Structural location of the signature
    Thr-104-Ala          No               No        Polar to small aliphatic apolar                            Exposed surface (α helix-B)
    Gly-110-Asn          No               Yes       Small aliphatic apolar to polar                            Buried residue (α helix-B)
    Asn-116-Ile          No               Yes       Polar to aliphatic apolar                                  Buried residue (α helix-C)
    Gly-123-Asp          No               No        Polar to negatively charged                                Channel (α helix-C)
    Gly-124-Met          No               No        Small aliphatic apolar to large aliphatic                  Channel mouth (α helix-C)
    Thr-128-Cys          Yes              Yes       Polar to polar thiol-containing                            Buried residue (α helix-C)
    Phe-175-Gln          Yes              Yes       Bulky aromatic to polar                                    Buried residue (α helix-D)
    Ser-184-Ala          No               No        Polar to small aliphatic apolar                            Buried residue (α helix-D)
    Cys-185-Thr          Yes              Yes       Thiol-containing to polar                                  Buried residue (α helix-D)
    Thr-188-Val          Yes              Yes       Polar to aliphatic apolar                                  Buried residue (β sheet-8)
    Val-191-Met          Yes              Yes       Aliphatic apolar to large aliphatic apolar                 Buried residue (β sheet-8)
    Pro-193-Val          Yes              Yes       Apolar cyclic to aliphatic apolar                          Buried residue (β sheet-8)
    Lys-251-Arg/His      No               No        Positively charged to larger positively charged imidazole  Channel (α helix-G)
    Glu-255-Val          No               No        Negatively charged to aliphatic apolar                     Oligomerization surface (α helix-G)
    Ala-279-Ser          No               Yes       Small aliphatic apolar to polar                            Exposed surface (connecting loop between β sheet-12 and α helix-H)
    Asn-285-Trp          No               No        Polar to large aromatic                                    Exposed surface (α helix-H)
    Gln-292-Phe          Yes              Yes       Polar to aromatic                                          Exposed surface (α helix-H)
    Gly-293-Ala          No               No        Small aliphatic apolar to smaller aliphatic apolar         Buried residue (α helix-H)
    Ile-303-Cys          No               No        Aliphatic apolar to thiol-containing                       Channel bottom (connecting loop between α helix-H and β sheet-13)
    Glu-311-Gln          No               No        Negatively charged to polar                                Exposed surface (β sheet-13)
    Ser-313-Asp          No               No        Polar to negatively charged                                Exposed surface (connecting loop between β sheet-13 and α helix-I)
    Arg-320-Glu          No               No        Positively charged to negatively charged                   Exposed surface (α helix-I)
    Asp-355-Gly          No               No        Negatively charged to small aliphatic apolar               Exposed surface (α helix-J)
    Leu-356-Tyr          No               No        Aliphatic apolar to large aromatic                         Exposed surface (α helix-J)
    Glu-368-Leu          No               Yes       Negatively charged to large aliphatic apolar               Buried residue (β sheet-16)
    Ser-387-Gly          No               Yes       Polar to small aliphatic apolar                            Exposed residue (β sheet-14)
    Arg-394-Thr          No               No        Positively charged to polar                                Exposed residue (α helix-K)
    Gln-405-Met          No               No        Polar to large aliphatic apolar                            Buried residue (β sheet-16)
    Ile/Val/Leu-440-Asp  No               No        Aliphatic apolar to negatively charged                     Oligomerization surface (α helix-M)
    Thr-441-Tyr          Yes              No        Polar to large aromatic                                    Exposed surface (α helix-M)
    Ser-444-Gln          No               No        Polar to larger polar                                      Exposed surface (α helix-M)
    Ser-457-Asp          No               No        Polar to negatively charged                                Channel (connecting loop between β sheet-18 and α helix-N)
    Val-459-Phe          Yes              No        Aliphatic apolar to aromatic                               Channel neck (connecting loop between β sheet-18 and α helix-N)
    Ser-460-Gly          No               No        Polar to small aliphatic apolar                            Exposed surface (connecting loop between β sheet-18 and α helix-N)

    Signatures are displayed as motifs with 4–10 amino acids. Motifs are shown as pictograms with relative frequencies symbolized by amino acid height as determined in an alignment of 24 vertebrate ALDH1s and 12 vertebrate ALDH2s. Represented here are only amino acids that effectively distinguish vertebrate ALDH1s from ALDH2s. Species-specific and vertebrate ALDH1A paralog-specific signatures are not included. The amino acids that are not conserved between ALDH1s and ALDH2s are surrounded by a red box, and their localization along with the nature of the substitutions between ALDH1s and ALDH2s is displayed. The ability of each vertebrate signature to distinguish ALDH1s from ALDH2s in invertebrate chordates (amphioxus and ascidian tunicates) is also shown.
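As a rough way to summarize Table S1, the Yes/No entries can be tallied to see how many vertebrate signatures also distinguish ALDH1s from ALDH2s in invertebrate chordates. The sketch below is illustrative only. It assumes that the two Yes/No values in each row correspond, in order, to the cephalochordate and tunicate columns (the vertebrate column holding pictograms); the names `ROWS`, `records`, `ceph`, and `tuni` are hypothetical.

```python
# Table S1 rows: substitution, cephalochordate flag, tunicate flag.
# Assumption: the first Yes/No is the cephalochordate column and the
# second is the tunicate column.
ROWS = """
Thr-104-Ala No No
Gly-110-Asn No Yes
Asn-116-Ile No Yes
Gly-123-Asp No No
Gly-124-Met No No
Thr-128-Cys Yes Yes
Phe-175-Gln Yes Yes
Ser-184-Ala No No
Cys-185-Thr Yes Yes
Thr-188-Val Yes Yes
Val-191-Met Yes Yes
Pro-193-Val Yes Yes
Lys-251-Arg/His No No
Glu-255-Val No No
Ala-279-Ser No Yes
Asn-285-Trp No No
Gln-292-Phe Yes Yes
Gly-293-Ala No No
Ile-303-Cys No No
Glu-311-Gln No No
Ser-313-Asp No No
Arg-320-Glu No No
Asp-355-Gly No No
Leu-356-Tyr No No
Glu-368-Leu No Yes
Ser-387-Gly No Yes
Arg-394-Thr No No
Gln-405-Met No No
Ile/Val/Leu-440-Asp No No
Thr-441-Tyr Yes No
Ser-444-Gln No No
Ser-457-Asp No No
Val-459-Phe Yes No
Ser-460-Gly No No
"""

# Each record is [substitution, cephalochordate, tunicate].
records = [line.split() for line in ROWS.strip().splitlines()]
ceph = {sub for sub, c, _t in records if c == "Yes"}
tuni = {sub for sub, _c, t in records if t == "Yes"}

print(f"signatures listed:        {len(records)}")
print(f"hold in cephalochordates: {len(ceph)}")
print(f"hold in tunicates:        {len(tuni)}")
print(f"hold in both:             {sorted(ceph & tuni)}")
print(f"hold in neither:          {len(records) - len(ceph | tuni)}")
```

Under the stated column assumption, the tally separates signatures shared across chordates from those that are specific to vertebrates, such as the channel mouth (124) and channel bottom (303) signatures.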

    Table S2. Reconstruction of ancestral sequences in the ALDH1/2 family

    Node  Support*  Sequence description             P†     Signature‡   Volume (Å³)  Size category
    A     1/75      Eukaryote ALDH1/2 ancestor       0.915  Met,Phe,Cys  378.4        Small
    B     0.98/59   Fungal ALDH1/2 ancestor          0.918  Met,Phe,Cys  336.5        Small
    C     1/94      Plant/Metazoan ALDH1/2 ancestor  0.937  Met,Phe,Cys  287.4        Small
    D     1/100     Plant ALDH1/2 ancestor           0.912  Met,Phe,Cys  286.0        Small
    E     1/100     Plant ALDH1-like ancestor        0.914  Ala,Phe,Val  559.2        Big
    F     1/100     Plant ALDH2-like ancestor        0.917  Met,Phe,Cys  333.8        Small
    G     1/65      Metazoan ALDH1/2 ancestor        0.948  Met,Phe,Cys  368.8        Small
    H     1/69      Metazoan ALDH2 ancestor          0.966  Met,Phe,Cys  444.9        Medium
    I     1/92      Vertebrate ALDH1 ancestor        0.956  Gly,Leu,Thr  604.3        Large

    *Phylogenetic support for the node, where the ancestral sequence was reconstructed (posterior probability/bootstrap value).
    †Mean probability of the reconstructed ancestral sequence.
    ‡The order of the amino acids corresponds to the mouth, neck, and bottom signatures, respectively.
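To make the relationship between signature and channel volume in Table S2 easier to inspect, the rows can be grouped by signature. This is a minimal sketch built only from the values in the table; the variable names (`TABLE_S2`, `volumes_by_signature`) are illustrative, and no size thresholds are assumed because the table does not state them.

```python
from collections import defaultdict

# Rows of Table S2: (node, description, signature, volume in cubic angstroms, category).
TABLE_S2 = [
    ("A", "Eukaryote ALDH1/2 ancestor", "Met,Phe,Cys", 378.4, "Small"),
    ("B", "Fungal ALDH1/2 ancestor", "Met,Phe,Cys", 336.5, "Small"),
    ("C", "Plant/Metazoan ALDH1/2 ancestor", "Met,Phe,Cys", 287.4, "Small"),
    ("D", "Plant ALDH1/2 ancestor", "Met,Phe,Cys", 286.0, "Small"),
    ("E", "Plant ALDH1-like ancestor", "Ala,Phe,Val", 559.2, "Big"),
    ("F", "Plant ALDH2-like ancestor", "Met,Phe,Cys", 333.8, "Small"),
    ("G", "Metazoan ALDH1/2 ancestor", "Met,Phe,Cys", 368.8, "Small"),
    ("H", "Metazoan ALDH2 ancestor", "Met,Phe,Cys", 444.9, "Medium"),
    ("I", "Vertebrate ALDH1 ancestor", "Gly,Leu,Thr", 604.3, "Large"),
]

# Group reconstructed channel volumes by signature (mouth, neck, bottom).
volumes_by_signature = defaultdict(list)
for _node, _description, signature, volume, _category in TABLE_S2:
    volumes_by_signature[signature].append(volume)

for signature, volumes in volumes_by_signature.items():
    mean = sum(volumes) / len(volumes)
    print(
        f"{signature}: n={len(volumes)}, "
        f"min={min(volumes)}, max={max(volumes)}, mean={mean:.1f}"
    )
```

Grouped this way, all seven ancestors with the Met,Phe,Cys signature fall between 286.0 and 444.9 Å³, whereas the two ancestors with other signatures (nodes E and I) have the largest reconstructed channels.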

    Dataset S1. Sequences used for the comprehensive phylogenetic analysis of the ALDH superfamily

    Dataset S1 (XLS)

    Accession numbers and database information are given for each sequence.

    Dataset S2. Sequences used for the phylogenetic analysis of the ALDH1/2, ALDH1L, and ALDH8 families

    Dataset S2 (XLS)

    Accession numbers and database information are given for each sequence. In addition, channel volumes and channel size categories are indicated for ALDH enzymes with sufficient sequence information and structural compatibility with the crystal templates human ALDH2 and sheep ALDH1A1 (Protein Data Bank ID codes 1OF7 and 1BXS, respectively).


    http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011223108/-/DCSupplemental/sd01.xls
    http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011223108/-/DCSupplemental/sd02.xls

  • Movie S1. Six-nanosecond simulation of substrate behavior within the ALDH2 substrate entry channel. Acetaldehyde (shown in orange), a natural substrate of ALDH2, was guided into the catalytic position, 3.45 Å from the γ-sulfur (shown in yellow) of the catalytic Cys302 and 4.77 Å from the oxidized NAD (ball and stick configuration, bottom left) by using the crystal coordinates for ALDH2 plus crotonal (Protein Data Bank ID codes 1OF7 and 10O1). Amino acid signatures at the ALDH2 channel mouth, neck, and bottom are, respectively, represented in blue (Leu124), green (Phe459), and purple (Cys303) in van der Waals configurations. Phe465 (shown in white) is conserved in metazoan ALDHs and keeps acetaldehyde in position throughout the simulation. Phe465 is stabilized by Phe459, which is the vertebrate ALDH2 neck signature. Phe459 is latched in position by Cys303, the vertebrate ALDH2 bottom signature. Thus, concerted action of vertebrate ALDH2 signatures at channel bottom and neck, together with a conserved ALDH channel amino acid, keeps acetaldehyde close enough to Cys302 and the NAD acceptor to guarantee efficient catalysis. The dashed line highlighting the distance between the γ-sulfur of Cys302 and the C1 of acetaldehyde is obscured by the close proximity of these two atoms.

    Movie S1

    Movie S2. Six-nanosecond simulation of substrate behavior within the ALDH1 substrate entry channel. Acetaldehyde (shown in orange) was guided into position, 3.45 Å from the γ-sulfur (shown in yellow) of the catalytic Cys302 and 4.77 Å from the oxidized NAD (ball and stick configuration, bottom left) by using the ALDH1A1 (Protein Data Bank ID code 1BXS) and ALDH2 plus crotonal (Protein Data Bank ID codes 1OF7 and 10O1) crystal coordinates. Amino acid signatures at the ALDH1 channel mouth, neck, and bottom are, respectively, represented in blue (Gly124), green (Val459), and purple (Ile303) in van der Waals configurations. Phe465, conserved in metazoan ALDHs, overlies acetaldehyde and is represented in white. In ALDH1, acetaldehyde moves away from its position close to the catalytic Cys302 and the NAD acceptor. The acetaldehyde escape is preceded by a pendulum-like Phe465 swing toward the channel mouth, which releases the constraints on the underlying acetaldehyde. This swing is preceded by quick disengagement of the bottom and neck amino acids Ile303 and Val459, respectively, which are not bulky enough to reconstitute the interaction surfaces maintained by the typical ALDH2 amino acids Phe459 and Cys303. The concerted latching of Phe465 by bottom and neck channel amino acids is thus lost in ALDH1s, which fail to keep the substrate position stable enough for efficient catalysis, consistent with their large Km values for acetaldehyde. The dashed line highlights the distance between the γ-sulfur of Cys302 and the C1 of acetaldehyde.

    Movie S2


    http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011223108/-/DCSupplemental/sm01.avi
    http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011223108/-/DCSupplemental/sm02.avi

  • Movie S3. Six-nanosecond simulation of formaldehyde, the smallest (C1) aldehyde (shown in orange), inside the ALDH1 substrate entry channel. Formaldehyde was guided into position, 3.45 Å from the γ-sulfur (shown in yellow) of the catalytic Cys302 and 4.77 Å from the oxidized NAD (ball and stick configuration, bottom left) by using the ALDH1A1 (Protein Data Bank ID code 1BXS) and ALDH2 plus crotonal (Protein Data Bank ID codes 1OF7 and 10O1) crystal coordinates. Amino acid signatures at the ALDH1 channel mouth, neck, and bottom are, respectively, represented in blue (Gly124), green (Val459), and purple (Ile303) in van der Waals configurations. Phe465, conserved in metazoan ALDHs, overlies the formaldehyde and is represented in white. Note that formaldehyde cannot be kept inside the ALDH1 channel and, thus, migrates toward the channel mouth, which is consistent with the inability of ALDH1s to process formaldehyde. The dashed line highlights the distance between the γ-sulfur of Cys302 and the C1 of formaldehyde.

    Movie S3


    http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011223108/-/DCSupplemental/sm03.avi
