www.sciencemag.org/cgi/content/full/338/6111/1209/DC1

Supplementary Materials for

An Exon Splice Enhancer Primes IGF2:IGF2R Binding Site Structure and Function Evolution

Christopher Williams, Hans-Jürgen Hoppe, Dellel Rezgui, Madeleine Strickland, Briony E. Forbes, Frank Grutzner, Susana Frago, Rosamund Z. Ellis, Pakorn Wattana-Amorn,

Stuart N. Prince, Oliver J. Zaccheo, Catherine M. Nolan, Andrew J. Mungall, E. Yvonne Jones, Matthew P. Crump,* A. Bassim Hassan*

*To whom correspondence should be addressed. E-mail: [email protected] (A.B.H.); [email protected] (M.P.C.)

Published 30 November 2012, Science 338, 1209 (2012)

DOI: 10.1126/science.1228633

This PDF file includes:

Materials and Methods
Supplementary Text
Figs. S1 to S14
Tables S1 to S4
References

Other Supplementary Material for this manuscript includes the following: available at www.sciencemag.org/cgi/content/full/338/6111/1209/DC1

Movie S1

Materials and Methods

Protein expression and purification of multi-species IGF2R-Dom11 for NMR studies

The pPIC-11 vector, which contains human IGF2R domain 11, was used as described in (12). Opossum, chicken and zebrafish cDNA were provided by Dr. Catherine Nolan (University College Dublin), platypus genomic DNA was from Andrew J. Mungall (The Wellcome Trust Sanger Institute, Cambridge) and echidna RNA extracted from liver tissue was from Dr Frank Grützner (University of Adelaide). Domain 11 was amplified by PCR using primers that incorporated EcoRI and AvrII restriction sites. The products were then ligated into the pPIC-HIS vector to generate pPIC-11 for each species. Briefly, the pPIC-11 expression constructs were linearised within the HIS4 gene using SalI and targeted for integration into the HIS4 locus of the P. pastoris genome by electroporation at 1,500 V, 25 μF, 400 Ω. After selection for histidine prototrophs by growth on histidine-deficient plates, yeast colonies were grown in BMGY (1% yeast extract, 2% peptone, 1.34% yeast nitrogen base (YNB), 4 × 10⁻⁵% biotin, 1% (v/v) glycerol, 100 mM potassium phosphate, pH 6.0) and induced by transfer to BMMY (1% yeast extract, 2% peptone, 1.34% YNB, 4 × 10⁻⁵% biotin, 0.5% (v/v) methanol, 100 mM potassium phosphate, pH 6.0). To maintain induction, cultures were supplemented with methanol to a final concentration of 1% (v/v) every 24 h for two days. Supernatants, which contain the soluble IGF2R domain 11, were then collected and analysed by SDS-PAGE and western blot using an anti-His6 mouse monoclonal antibody conjugated to peroxidase (Roche Diagnostics, USA).

For expression in E. coli and NMR studies, DNA coding for IGF2R domain 11 of each species was subcloned from the pPIC vectors into pET26a (Novagen, Merck Chemicals Ltd, Nottingham, UK) for expression in E. coli BL21 (DE3). The proteins were refolded using existing protocols and purified by gel filtration. Double-labeled proteins were expressed in E. coli BL21 (DE3) grown in minimal media containing 15NH4Cl and 13C-glucose as the sole sources of nitrogen and carbon, respectively.

Nuclear magnetic resonance and structure calculations of the domain 11E4:IGF2 complex and multi-species domain 11.

Unlabeled and labeled IGF2R domain 11E4 were expressed and purified using existing protocols (11). Unlabeled IGF2 was purchased from Novozymes (Biopharma, AU). 13C/15N labeled IGF2 was prepared using previously reported protocols (27), also using TEV protease cleavage of NusA-IGF2 followed by reversed phase chromatography (28). The complex was formed by combining the dilute proteins in 20 mM sodium acetate (pH 4.2) before extensive dialysis into 5 mM sodium acetate-d3, 100 μM NaN3 (pH 4.2) and concentration by ultra-filtration. The NMR samples contained 0.5 mM of labeled protein and a 1.5-fold excess of unlabeled protein. Standard triple resonance experiments were then acquired at 37ºC with a cryoprobe-equipped Varian VNMRS operating at 600 MHz to assign the backbone and side-chain atoms. 15N and 13C NOESY-HSQC (nuclear Overhauser effect (enhancement) spectroscopy-heteronuclear single quantum correlation) and 2D 15N/13C filtered experiments were acquired at 600 and 900 MHz (University of Birmingham, UK) for distance restraints. For the multi-species IGF2R domain 11 proteins, standard triple resonance experiments were acquired at 25ºC on a Varian Unity INOVA spectrometer operating at 600 MHz to assign the backbone and side-chain atoms. 15N and 13C NOESY-HSQC experiments were acquired for distance restraints. NMR data processing and analysis were performed with NMRPipe (29) and CcpNmr Analysis version 2.1 (30). Structures were calculated iteratively with CNS 1.2 using the ARIA2.2 protocol before being water refined using the RECOORD protocol (31-33). Final structures were checked with iCing (version r765) (https://nmr.cmbi.ru.nl/icing/iCing.html). Surface accessibility and binding pocket analysis were performed using 3D-Surfer (34). Protein-protein interfaces were analysed using PDBsum (35). Figures were produced using PyMOL (http://www.pymol.org/).

The addition of extra data, such as residual dipolar couplings (RDCs), is often used to help refine NMR structures (36). However, for the IGF2:domain 11E4 complex many of the residues involved in binding are highly dynamic, as judged by NMR dynamics. The RDCs from such residues are motionally averaged and so are typically excluded from analysis to prevent artificial over-restraint, something which we were keen to avoid. Therefore, to maximize the number of intermolecular NOEs, we acquired NOESY spectra on a combination of samples with both labeled and unlabeled components, including a number of isotope-filtered NOESY spectra. This led to a well-defined complex, with the structures of both domain 11E4 and IGF2 in the complex being well-folded, having overall backbone root-mean-square deviations (RMSDs) of 0.52 Å and 0.62 Å, respectively.

The ensembles of NMR structures and associated NMR chemical shifts have been deposited with the Protein Data Bank and BioMagResBank with the following accession codes (PDB/BMRB): Chicken 2L21/17110; Echidna 2L5W/17287; Opossum 2L2G/17134; Domain 11E4 2L2A/17128; Domain 11E4/IGF2 complex 2L29/17127.

Expression and purification of mature, NusA fusion and biotinylated IGF2

The pETM-60 vector containing human mature IGF2 was used as described in (11). Mature IGF2 sequences of the different species were amplified by PCR from different sources using primers that incorporate NcoI and XhoI restriction sites, then ligated into the pETM-60 vector to generate IGF2 expression vectors for each species. Since the full echidna mature IGF2 sequence is unknown, it was amplified by PCR using a forward echidna primer and a reverse platypus primer and determined by bi-directional DNA sequencing. E. coli BL21 (DE3) cells and His Bind Chromatography Columns were from Novagen (Merck Chemicals Ltd, Nottingham, UK). The BiotinTag microbiotinylation kit was from Sigma-Aldrich (Dorset, UK). The anti-His6 mouse monoclonal antibody conjugated to peroxidase was purchased from Roche Diagnostics (West Sussex, UK). Surface plasmon resonance buffers, chips and consumables were from GE Healthcare (BIAcore, Chalfont, UK). The Phusion Site-directed Mutagenesis Kit was from Finnzymes (Espoo, Finland).

NusA-IGF2 expression and purification were performed as previously described (11). Briefly, E. coli BL21 (DE3) cells from Novagen (Merck Chemicals Ltd, UK) were transformed with pETM-60 plasmids containing the mature IGF2 of interest and high expression levels were achieved by induction with 1 mM IPTG at 25ºC. The fusion proteins were then recovered from the soluble fraction after cell lysis and sonication. Proteins were His-purified using His Bind Column Chromatography from Novagen (Merck Chemicals Ltd, UK), washed with 60 mM imidazole and eluted with 1 M imidazole. When needed, the NusA tag was removed by overnight incubation with ProTEV protease (Promega, UK) and cleaved IGF2 was separated from NusA by gel filtration using 20 mM Tris pH 8.5, 1 M urea, 0.15 M NaCl. The IGF2 gel filtration fractions were then concentrated and desalted on C4 resin before elution in 50% acetonitrile. To purify the IGF2 isoforms, the sample was loaded onto a Jupiter Proteo C4 column (Phenomenex, Cheshire, UK) running an acetonitrile/water solvent system and eluted over 20 min with a 15-45% gradient. The two isoforms, which eluted at ~26% and ~27%, respectively, were pooled and freeze-dried.

For immobilization on streptavidin sensor chips, the purified NusA-IGF2 fusion proteins were biotinylated using the BiotinTag microbiotinylation kit (Sigma-Aldrich, UK). Briefly, biotinylation was achieved by mixing 5 µl of freshly prepared 5 mg/ml BAC-SulfoNHS with 150 µl of 10 mg/ml NusA-IGF2 solution in 0.1 M sodium phosphate buffer, pH 7.2 (molar ratio 13:3) and incubating with gentle stirring for 30 min at room temperature. Un-reacted biotinylation reagent was then separated using Sephadex G-50 pre-packaged columns. Biotinylated NusA-IGF2 proteins were subjected to SDS-PAGE and analysed by Western blot using alkaline phosphatase conjugated streptavidin.

Species hybrid expression cassettes and protein purification

The chicken-echidna CD loop hybrid expression cassette was constructed using PCR products from both sequences digested with the type IIS restriction enzyme BspQI to allow ligation at the resulting three-base 3’ overhang, designed to code for a single amino acid that is conserved in both species. A C-terminal hexa-histidine tag was added via the 3’ oligonucleotide to facilitate purification and the construct was cloned into pLITMUS28i (New England Biolabs, USA) for sequencing before sub-cloning into pPIC9K (Invitrogen, UK) using EcoRI and AvrII cloning sites (12) (Table S4). The chicken-echidna AB, CD and FG loop constructs were produced from a Pichia pastoris codon-optimized synthetic gene (Eurofinsdna, UK) and cloned using EcoRI and AvrII restriction sites, similar to the chicken-echidna CD loop hybrid (Table S4). 10 µg of expression plasmid linearized with BglII and gel purified (Qiagen Gel Purification Kit, UK) was electroporated into P. pastoris cells (made competent according to the Pichia Manual, Invitrogen, UK) using a Bio-Rad Gene-Pulser (15 µF, 4-5 ms) and clones were grown on MD media plates for 2-3 days at 30ºC. Typically, 12 individual colonies were screened for the highest protein expression after 12 hours in 1 ml BMGY cultures (BMGY: 1% yeast extract, 2% peptone, 1.34% yeast nitrogen base (YNB), 4 × 10⁻⁵% biotin, 1% (v/v) glycerol, 100 mM potassium phosphate, pH 6.0) followed by 24 hours in BMMY induction media (BMMY: 1% yeast extract, 2% peptone, 1.34% yeast nitrogen base (YNB), 4 × 10⁻⁵% biotin, 0.5% (v/v) methanol, 100 mM potassium phosphate, pH 6.0). Selected clones were then plated on MD plates containing G418 (4 mg/ml) for 2 days to enrich multi-copy clones. Scale-up to batch-fed 2.4 l cultures was performed in six 2.5 l baffled Erlenmeyer flasks shaking at 250 rpm at 30ºC for 60 hours.
Supernatants were cleared by centrifugation (4000 rpm for 20 min at 4ºC in a Beckman-Coulter Avanti-J2 refrigerated centrifuge), made 1 mM NaN3, 2 mM EDTA and adjusted to pH 6 using 4 M HCl, and secreted proteins were adsorbed to 10 ml SP-Sepharose (GE Life Sciences, UK). Total protein was batch-eluted in a 5 ml solution of 0.8 M NaCl, 1 mM NaN3, 2 mM EDTA in 20 mM sodium phosphate buffer at pH 6. Subsequently, the eluate was diluted 10-fold in 20 mM sodium phosphate buffer pH 6.0, 1 mM NaN3, 2 mM EDTA and recombinant domain 11 proteins were purified using a MonoS column on an Äkta FPLC system (GE Life Sciences, UK). Typically, domain 11 hybrid proteins eluted at 400-550 mM NaCl and were dialysed at 4ºC against 50 volumes of HBS-EP BIAcore buffer (10 mM HEPES (pH 7.4), 150 mM NaCl, 3 mM EDTA, and 0.005% (v/v) surfactant P20) overnight. Protein concentration was determined spectrophotometrically (Nanodrop) and protein size and homogeneity were verified using 15% SDS-PAGE and Western blot using His-Probe (Ni-NTA conjugated HRP, Pierce, UK).

Mammalian expression and purification of domain 11

To facilitate rapid expression of domain 11 constructs in mammalian cells, the restriction sites of pHLsec (37) were altered using pHL-Mfe-F and pHL-EcoRI-Age-R followed by PCR with pHL-Mfe-BamNO-F and pHL-AvrII-XhoI-R, thus creating unique EcoRI and AvrII cloning sites for the domain 11 coding sequences (Table S4). To aid purification and detection, a 6-His 2xStreptag was inserted into the newly created EcoRI site using a PCR fragment with compatible sites (Table S4), and domain 11 constructs were inserted using EcoRI and AvrII digested fragments from the pPIC9K clones used in yeast, then sequenced.

Two 175 cm2 flasks of confluent 293T cells were used to seed each 500 ml Hyperflask (Corning, USA) and cells were grown for 2 days in 10% FCS, penicillin/streptomycin containing DMEM with 4.5 g/l glucose (all from PAA). For transfections, 16 ml of serum free DMEM with 4.5 g/l glucose were left to incubate with 360 µg of DNA and 1.4 mg PEI for 15 min at RT before being added to 500 ml of 1% FCS, Penicillin/Streptomycin containing DMEM with 4.5 g/l glucose, which was used to replace the media in the Hyperflasks. The media was harvested after 96 hours, spun at 4000g for 20 min and the supernatant filtered through a 0.22 µm filter. For purification, the supernatant was made 50 mM Tris pH 8.0, 250 mM NaCl, 5 mM EDTA, 0.001% polyoxyethylene-10-tridecyl ether (Sigma), and 5 µg/l avidin was added, and the solution passed at 5 ml/min over a 20 ml Streptactin resin in an Äkta Purifier FPLC system. After washing with 10 column volumes with 50 mM Tris pH 8.0, 250 mM NaCl, 5 mM EDTA, the column was washed into 10 mM HEPES pH7.4, 150 mM NaCl, 3 mM EDTA, 0.005% NP20 for 10 column volumes and bound protein eluted with 2 mM desthiobiotin in 10 mM HEPES pH7.4, 150 mM NaCl, 3 mM EDTA, 0.005% NP20. The eluate was dialysed overnight against three changes of a 100-fold excess of 10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% NP20 before protein concentration was determined using a Nanodrop 1000 and purity confirmed using SDS-PAGE followed by Coomassie Blue R-250 staining.

After cleavage from the NusA-IGF2 fusion protein, echidna IGF2 was purified by reversed phase chromatography. The freeze-dried peptide was dissolved in 10 mM HCl for 4 hours at 4°C and diluted in a 20-fold excess of 20 mM sodium borate pH 7.6, 150 mM NaCl. 10 µg of IGF2 was then biotinylated using the NHS-biotinylation kit (Pierce, UK) according to the manufacturer’s instructions, but using a 20-fold molar excess of biotin over IGF2 and incubating for 2 hours at 25°C. Reactions were stopped by adding 10 mM Tris pH 8. The peptides were then dialysed extensively against 4 changes of a 200-fold excess of HBS-EP using 3.5 kDa cut-off Slide-A-Lyzer G2 cassettes (Pierce, UK) at 4°C.

Surface plasmon resonance (SPR) and binding analysis

Kinetic analysis by SPR was conducted as previously described (12) using either a BIAcore 3000 or a T200 biosensor. Unless stated, all SPR experiments were performed at 25ºC in HBS-EP (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, and 0.005% (v/v) surfactant P20) at a flow rate of 25-40 µl/min. After pre-conditioning the sensor chip with three 1 min injections of 1 M NaCl, 50 mM NaOH, flow cells on sensor chip SA were saturated with biotinylated NusA-IGF2 fusion proteins by affinity capture to streptavidin. Kinetic experiments consisted of a 5 min injection of analyte (IGF2R domain 11 or human IGFBP3) followed by a 200-second dissociation phase in HBS-EP running buffer, after which the binding surface was regenerated with a 2 min injection of 2 M MgCl2. Kinetic parameters were determined by global fitting of sensorgrams to a two-state (conformational change) binding model using standard BIAevaluation software version 4.0.1 for the BIAcore 3000 and BIAcore T200 evaluation software (1.0) for the T200. In all cases, the minor component made an insignificant contribution to the overall binding affinity and, as such, only the kinetic parameters of the major binding component are shown (Table 1). For each positive binding interaction, the equilibrium dissociation constant (KD) was also calculated by fitting the response at equilibrium for each concentration to a steady-state affinity model using BIAevaluation software. Experiments were repeated at least four times unless stated (see Table 1). For characterisation of hybrid species forms of domain 11, biotinylated human and chicken IGF2, as well as biotinylated human IGF1 (GroPep, Adelaide, AU), were immobilized at 650, 600, and 450 RU, respectively, on a Biosensor Chip SA (GE Life Sciences) in a Biacore 3000 SPR Analyser (GE Life Sciences). SPR analysis was performed at 25ºC at a flow rate of 35 µl/min in HBS-EP buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, and 0.005% (v/v) surfactant P20).
None of the domain 11 constructs, including domain 11E4 and the echidna loop variants, bound immobilised IGF1.
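
As a rough illustration of the two fitting steps described above, the sketch below evaluates a steady-state 1:1 affinity model, Req = Rmax·C/(KD + C), and the overall KD implied by a two-state (conformational change) scheme A + B ⇌ AB ⇌ AB*. These are the standard textbook forms and are assumed here; the text does not spell out the equations used by the BIAevaluation software, and the function names, the grid-search fit and the example numbers are illustrative only.

```python
import math


def steady_state_response(conc, r_max, k_d):
    """Equilibrium response of a 1:1 interaction: Req = Rmax * C / (KD + C)."""
    return r_max * conc / (k_d + conc)


def fit_steady_state(concs, responses, points=2000):
    """Estimate (Rmax, KD) by least squares.

    KD is scanned over a log-spaced grid (an assumption made to keep the
    sketch dependency-free); for each KD the best Rmax has a closed form.
    """
    lo = math.log10(min(concs) / 100.0)
    hi = math.log10(max(concs) * 100.0)
    best = None
    for i in range(points + 1):
        k_d = 10 ** (lo + (hi - lo) * i / points)
        x = [c / (k_d + c) for c in concs]
        r_max = sum(a * b for a, b in zip(x, responses)) / sum(a * a for a in x)
        sse = sum((r_max * a - b) ** 2 for a, b in zip(x, responses))
        if best is None or sse < best[0]:
            best = (sse, r_max, k_d)
    return best[1], best[2]


def two_state_kd(ka1, kd1, ka2, kd2):
    """Overall KD for A + B <-> AB <-> AB*: (kd1 / ka1) * kd2 / (kd2 + ka2)."""
    return (kd1 / ka1) * kd2 / (kd2 + ka2)


# Hypothetical two-fold dilution series (M) and noise-free responses (RU).
concs = [4e-6 / 2 ** n for n in range(12)]
responses = [steady_state_response(c, 100.0, 5e-8) for c in concs]
r_max_fit, k_d_fit = fit_steady_state(concs, responses)
print(f"Rmax ~ {r_max_fit:.1f} RU, KD ~ {k_d_fit * 1e9:.1f} nM")
```

With real sensorgram data a proper non-linear least-squares routine would normally replace the grid scan; the scan is used here only so the example runs with the standard library.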

For T200 BIAcore (GE) analysis, a C1 chip was used. Chips were flushed twice for one minute with 100 mM glycine pH 12, 0.03% Triton X-100 followed by a 10 min wash with HBS-EP. Streptavidin was immobilized by priming the chip surface for 7 min at 5 µl/min with sodium acetate pH 5, activating the surface using a 1:1 mixture of 400 mM 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide and 100 mM N-hydroxysuccinimide for 7 min at 5 µl/min and then reacting it with 0.06 mg streptavidin in 100 µl sodium acetate pH 5 for 7 min at 5 µl/min. Unreacted sites were deactivated by injecting 1 M ethanolamine pH 8.5 for 2 min at 5 µl/min. The system was then primed twice with HBS-EP and IGF2 immobilized as follows. 50 ng/ml of biotinylated human IGF2 (GroPep, Aus), biotinylated echidna IGF2 (above) or biotinylated chicken IGF2 (GroPep) were mixed with 150 ng/ml of biotinylated ubiquitin and the mixtures, as well as biotinylated ubiquitin alone for FC1, were passed over the streptavidin-coated C1 chip surface at 5 µg/ml until approximately 50-100 RU of peptide were immobilized on each flow cell. 2 mM biotin in HBS-EP was then injected for 2 min at 25 µl/min to occupy the remaining biotin binding sites of the immobilized streptavidin molecules. For kinetic analysis, concentration series ranging from 4 µM to 1.5 nM were analysed for recombinant opossum, echidna, chicken and zebrafish domain 11 proteins at 25 µl/min at 25°C. T200 BiaEvaluation software (GE, UK) was used to fit the resulting curves according to a two-state conformational change model with RI set to a constant zero. For domains with detectable binding, steady-state affinities were determined from the same experiment, using a 5-second average of the steady binding response taken 5 seconds before the end of the injection.

IGF-1R binding affinities were measured in competition binding assays using europium-labelled IGF-I and immunocaptured IGF-1R from lysed P6 IGF-1R cells (BALB/c3T3 cells over-expressing the human IGF-1R) and increasing concentrations of unlabelled human or echidna IGF2, as previously described (13). Binding was expressed as a percentage of binding in the absence of competing ligand. BIAcore analysis of human IGFBP-2 (produced by Forbes laboratory) and IGFBP-3 (R&D systems, Aus) binding was performed as previously described (28). Human or echidna IGF2 (3.21 nM to 50 nM) were injected over IGFBP-2 or IGFBP-3 attached via amine coupling to sensor surfaces. Binding affinities were derived from a two state (conformational change) model. Binding analyses were performed in duplicate and the experiment was repeated twice.
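
The normalisation used for the competition assay ("percentage of binding in the absence of competing ligand") can be sketched as below. The optional background subtraction, the function name and the europium counts are assumptions made for illustration; they are not taken from the original protocol.

```python
def percent_of_control(signal, control_signal, background=0.0):
    """Binding as a percentage of the signal measured without competitor.

    `background` (non-specific signal) is an optional assumption; with the
    default of 0.0 this is simply 100 * signal / control_signal.
    """
    if control_signal == background:
        raise ValueError("control signal must differ from background")
    return 100.0 * (signal - background) / (control_signal - background)


# Hypothetical counts at increasing competitor concentrations (nM).
control = 20000.0
counts = {0.1: 19500.0, 1.0: 15000.0, 10.0: 6000.0, 100.0: 1200.0}
curve = {conc: percent_of_control(c, control) for conc, c in counts.items()}
for conc, pct in sorted(curve.items()):
    print(f"{conc:>6} nM competitor: {pct:.1f}% of control binding")
```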

Site-directed mutagenesis

Site-directed mutagenesis was performed using the Phusion Kit according to the manufacturer's protocols, using forward primers that contain the mismatch and 5'-phosphorylated reverse primers. The mutations were confirmed by DNA bi-directional sequencing performed by the sequencing service (School of Life Sciences, University of Dundee, Scotland).

Splice site analysis

Splice site scores were calculated according to the maximum entropy, Markov and hidden Markov models (38). Polypyrimidine tract and branch point consensus sequences (39) were analysed according to (40, 41) and branch points scored according to (42). Species-specific codon usage indices were taken from http://www.kazusa.or.jp/codon/ and Codon Adaptation Indices calculated according to (43) using http://genomes.urv.es/CAIcal/. Intronic Splice Enhancer and Regulatory Sequences were scored according to (44). Exonic Splice Enhancers were scored according to (45) using http://genes.mit.edu/burgelab/rescue-ese/ and Chasin (46) using http://cubweb.biology.columbia.edu/pesx/. LINE and SINE insertions were detected using RepeatMasker (A.F.A. Smit, R. Hubley & P. Green, RepeatMasker at http://repeatmasker.org) and verified using REPBASE (http://www.girinst.org/repbase/). For the analysis of the platypus CD loop coding sequence with respect to an ESE density enrichment, a Perl script was written to generate all 6144 possible combinations of codons coding for the echidna CD loop sequence FEGTGIKA:

    use strict;
    use warnings;

    # Codons considered for each residue of FEGTGIKA
    my @F  = ('TTT','TTC');
    my @E  = ('GAA','GAG');
    my @G  = ('GGT','GGC','GGA','GGG');
    my @T  = ('ACA','ACC','ACG','ACT');
    my @G2 = ('GGT','GGC','GGA','GGG');
    my @I  = ('ATT','ATA','ATC');
    my @K  = ('AAA','AAG');
    my @A  = ('GCT','GCC','GCA','GCG');

    my $z = 1;    # running sequence number
    foreach my $f (@F) {
    foreach my $e (@E) {
    foreach my $g (@G) {
    foreach my $t (@T) {
    foreach my $g2 (@G2) {
    foreach my $i (@I) {
    foreach my $k (@K) {
    foreach my $ala (@A) {
        # One numbered line per codon combination
        print $z, " $f $e $g $t $g2 $i $k $ala\n";
        $z++;
    }}}}}}}}

Each sequence was analysed using Rescue-ESE and the number of ESEs detected was scored. The ESE densities of a platypus codon-optimized sequence (using the Codon Usage Database at http://www.kazusa.or.jp/codon/) and of the sequence found at the CD loop of platypus IGF2R were compared and highlighted in Fig. S12.
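
The enumeration performed by the script can be cross-checked with a short Python sketch using `itertools.product`. The codon lists are copied from the script above (note that it lists three codons for isoleucine and omits the stop-adjacent or rarer alternatives it did not consider); the function and variable names are illustrative and not part of the original analysis.

```python
from itertools import product

# Codon choices per residue, as listed in the Perl script.
CODONS = {
    "F": ["TTT", "TTC"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "I": ["ATT", "ATA", "ATC"],
    "K": ["AAA", "AAG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}


def coding_sequences(peptide):
    """Yield every DNA sequence encoding `peptide` with the codons above."""
    for combo in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combo)


sequences = list(coding_sequences("FEGTGIKA"))
print(len(sequences))  # 6144 = 2 * 2 * 4 * 4 * 4 * 3 * 2 * 4
```

Each of the resulting 24-nucleotide strings could then be submitted to an ESE scoring tool, as described in the text.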

For analysis of ESE density in relation to intronic expansion (Fig. S12), the following was performed. All intron-exon pairs of BAC CLM1_102I6 (CR933560) (from Dr Andrew Mungall), which contains exon 34 used in this study, were compared to the genomic chicken sequence covering the same region (exons 29-48). All platypus exons preceded by introns with significant length increases (more than 6-fold) were analysed for ESE densities (combined SF2, SRP40, SRP35, SC35, Chasin ESE and Fairbrother ESE scores) (http://sroogle.tau.ac.il/) and compared to the values obtained for the same exons in chicken. Similarly, the two exons in platypus which saw the biggest length decreases were also analysed for ESE densities and compared.
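
One simple way to think about an "ESE density" is the fraction of overlapping hexamer windows in an exon that match a set of known ESE motifs. The sketch below implements only that simplified notion: the combined scores used in the study came from the web tools cited above, and the hexamer set and sequences here are placeholders, not real ESE data.

```python
def ese_density(sequence, ese_hexamers):
    """Fraction of overlapping 6-mers in `sequence` found in `ese_hexamers`."""
    seq = sequence.upper()
    windows = len(seq) - 5
    if windows <= 0:
        return 0.0
    hits = sum(1 for i in range(windows) if seq[i:i + 6] in ese_hexamers)
    return hits / windows


def density_ratio(seq_a, seq_b, ese_hexamers):
    """Density of seq_a relative to seq_b, or None if seq_b has no hits."""
    base = ese_density(seq_b, ese_hexamers)
    if base == 0.0:
        return None
    return ese_density(seq_a, ese_hexamers) / base


# Placeholder motif set and toy sequences (hypothetical, for illustration).
motifs = {"GAAGAA", "AAGAAG"}
exon_a = "GAAGAAGAAGTTTCCC"
exon_b = "GAAGAATTTCCCGGGA"
print(ese_density(exon_a, motifs), ese_density(exon_b, motifs))
print(density_ratio(exon_a, exon_b, motifs))
```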

In vivo splicing analysis was carried out using the pExontrap (47) plasmid and a PCR-generated chicken IGF2R mini-gene from the middle of intron 32 (base 37,724 of NC_006090) to 80 bp into intron 33 (base 38,116) and from the last 80 bp of intron 33 (base 38,906) to the middle of intron 34 (base 41,582). Chicken exon 34 was replaced by platypus exon 34 and ESE variants by PCR. 100 ng of plasmid was transfected into chicken DF-1 cells using FugeneHD (Roche, USA). RNA was harvested after 24 hours, DNase treated, and reverse transcribed using a specific primer for the pExontrap acceptor exon. RT-PCR reactions with chicken exon 33 and acceptor exon specific primers were analysed by electrophoresis and sequenced. Primers are available on request. In vitro splicing assays of dsx mini-genes were performed as follows. Exon 3, intron 3 and 66 bp of the D. melanogaster dsx gene were amplified from genomic DNA (J. Raff, Oxford) and cloned into the corresponding sites of pGEM-5zf+ (Promega, UK). ESE sequences were introduced in the PCR primers; for dsx-(AAG)7 (48) a synthetic gene was used. PCR products from 350 bp 5’ of the SP6 transcription start site to the EcoRV site of the pGEM plasmid were gel purified and transcribed in vitro using the Ribomax system (Promega), incorporating Ribo-m7G cap analog (49). RNA was DNase treated, purified from 5% urea-PAGE gels, electroeluted and ethanol precipitated. For each time point, 50 fmol of RNA were spliced in vitro (50, 51) at 30°C in 25 μl reactions containing 30% HeLa nuclear extract (Promega, UK), 1 mM ATP, 20 mM creatine phosphate, 3.2 mM MgCl2, 5 units RNasin, 1.15 mM dithiothreitol, 30 mM KCl, 12 mM Tris, 0.06 mM EDTA, 0.15 mM PMSF, 80 mM potassium glutamate, 3% w/v polyvinyl alcohol (30–70 kDa), 7.5% w/v glycerol and 12 mM HEPES (pH 7.9 at 25 °C). Reactions were stopped with 175 μl HelaStop solution (Promega, UK), digested with proteinase K, phenol/chloroform-extracted, and ethanol-precipitated prior to RT-PCR.
6-FAM labeled primers were used for PCR amplification (18 cycles) and relative fragment quantities were determined using capillary electrophoresis (52) (ABI 3730xl).
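
Relative fragment quantities from capillary electrophoresis are commonly summarised as the spliced fraction of the total signal at each time point. The sketch below shows that calculation under the assumption that peak areas for the spliced and unspliced products are available; the function name and the peak areas are hypothetical and do not come from the study.

```python
def fraction_spliced(spliced_area, unspliced_area):
    """Spliced product as a fraction of total (spliced + unspliced) signal."""
    total = spliced_area + unspliced_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return spliced_area / total


# Hypothetical peak areas: time (min) -> (spliced, unspliced).
time_course = {0: (0.0, 1000.0), 30: (150.0, 850.0), 60: (400.0, 600.0)}
fractions = {t: fraction_spliced(s, u) for t, (s, u) in time_course.items()}
for t in sorted(fractions):
    print(f"{t:>3} min: {100 * fractions[t]:.1f}% spliced")
```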

Supplementary Text

Imprinting mechanisms

Igf2 and Igf2r are located at separate chromosomal loci and appear to have divergent mechanisms of genomic regulation. For Igf2, methylation-dependent binding of CTCF on the maternal allele insulates Igf2 expression from the influence of endodermal and mesodermal enhancers, whilst promoting expression of the long non-coding RNA H19 and a microRNA (miR-675) (53). This mechanism appears evolutionarily conserved with respect to eutheria and metatheria (4, 54). For Igf2r, methylation-dependent expression of an anti-sense non-coding RNA (AIRN) arising from intron 2 is implicated in silencing of the paternal allele in eutheria, but this mechanism appears divergent from metatheria, even though the paternal allele is imprinted in both cases (55, 56).

The high resolution structure of the IGF2R domain 11E4 in complex with IGF2

Comparison of domain 11 of multi-species IGF2R and IGF2 proteins shows variations in domain 11 that are confined mainly to the structural loops that comprise the binding site for IGF2, and conservation of IGF2 residues known to interact with these motifs (Fig. S1). A 'coarse grain' model, the low-resolution (4.1 Å) crystal structure of IGF2 bound to domains 10-13, has dramatically increased our understanding of this binding interaction but is insufficient for a detailed interpretation of the side-chains at the interface (9). Following human domain 11 AB loop mutagenesis, we identified a new mutant (domain 11 E1544K, K1545S, L1546V, or E4) with an increase in binding affinity to IGF2 (KD 15 nM vs 46-64 nM for wild type domain 11, Table 1) that facilitated acquisition of high quality NMR datasets using 13C and 15N labeled IGF2 and domain 11E4 (12). The 1H-15N HSQC spectra of the domain 11E4 component of the complex and IGF2 are shown in Fig. S2 and Fig. S3, respectively. Samples of unlabeled IGF2 were titrated into 200 µM solutions of 15N labeled wild type human domain 11 and 1H-15N HSQC spectra recorded for the free protein and the complex (Fig. S2). A number of chemical shift changes in these spectra indicate that the two proteins interacted and the complex remained soluble at the concentrations required. When the residues displaying chemical shift changes were mapped to the known structure, we observed that AB, CD and FG loop residues were perturbed by the interaction. Although the data quality of free wild type domain 11 was excellent, close inspection (circled, Fig. S2) suggested that many signals from residues in the putative binding site were either broadened or had disappeared in the complex. This initially led to poor quality NOESY spectra for this complex and it was clear that a high resolution structure of the wild type complex that goes beyond our initial published models could not be achieved.
Domain 11 containing the single E1544K mutation offered the possibility of studying the protein complex with improved spectroscopic properties, owing to its higher affinity interaction with IGF2 (KD of 37 nM). Fig. S2c shows the 1H-15N HSQC spectrum of this construct, which is essentially identical to that of wild type. Titration of IGF2 into 15N-labelled E1544K (Fig. S2d) gives an improved spectrum with less evidence of exchange line broadening. A 1H-15N HSQC spectrum of domain 11E4 in the free form resembles the wild type spectrum (Fig. S2e). Upon complex formation with IGF2 at pH 4.2, however, we obtained an excellent, well dispersed spectrum that further improved upon the domain 11 E1544K data. The full structure of the IGF2:Domain 11E4 complex was completed using this construct. Likewise, the 1H-15N HSQC spectrum of 13C/15N labeled IGF2 shows excellent chemical shift dispersion and line-width homogeneity when compared to other published reports (57). The high quality of spectra for both components of the complex allowed the structure of the complex to be tackled using a mixed isotope labeling strategy for either component. The structure of domain11E4 in complex with IGF2 was calculated using 3080 distance restraints (2604 unambiguous, 414 ambiguous and 62 intermolecular), 288 dihedral angle restraints and 52 hydrogen bond restraints. The final structures were consistent with both experimental data and standard covalent geometry, displaying no violations greater than 0.5 Å for distance restraints or 5° for dihedral angles. The structural and quality statistics for the final ensemble are listed in supplementary Table S1. The core of domain 11E4 is well defined, with a backbone RMSD from the mean over the regular secondary structure of 0.52 Å. Due to its highly dynamic nature, IGF2 is less well defined, with a backbone RMSD of 0.62 Å; however, this is broadly similar to previously reported NMR structures of IGF2 (58, 59). Overall the structure of the complex was well defined, with an RMSD over the well-ordered regions of 0.44 ± 0.07 Å for the backbone atoms and 0.77 ± 0.09 Å for all heavy atoms. When just the domain 11 component was compared to published crystal structures of domain 11, a backbone RMSD of 0.90 Å was calculated to the unbound domain 11 (1GP0) and 0.87 Å to IGF2R domains 11-13 in complex with IGF2 (2V5P). Differences between the structures exist primarily in the loop regions, in particular the more flexible AB and FG loops of the binding pocket, which move inwards to interact with IGF2 as described in the main text. Comparison of the solution structure of IGF2 solved at pH 3.1 and 300 K to the bound IGF2 revealed a backbone RMSD of 1.32 Å over the secondary structure. Differences in the structures appear to arise from the α-helices of IGF2 being pushed together upon binding due to a number of interactions across the protein-protein interface between IGF2 and the AB and FG loops of domain 11.
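The "RMSD from the mean" values quoted above can be illustrated with a minimal sketch. It assumes the ensemble models have already been superimposed and that only the selected atoms (for example backbone atoms of the ordered regions) are passed in; the function names, the toy coordinates and the use of a population standard deviation are illustrative assumptions, not the procedure used by the authors' software.

```python
import math

def mean_structure(models):
    """Average each atom's coordinates over all models (assumed pre-superimposed)."""
    n_models = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(m[i][k] for m in models) / n_models for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))

def rmsd_to_mean(models):
    """Return (mean, population standard deviation) of per-model RMSDs to the mean structure."""
    ref = mean_structure(models)
    values = [rmsd(m, ref) for m in models]
    avg = sum(values) / len(values)
    var = sum((v - avg) ** 2 for v in values) / len(values)
    return avg, math.sqrt(var)

# Hypothetical two-model, two-atom ensemble (coordinates in Å), for illustration only.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
avg, sd = rmsd_to_mean(toy)
print(f"{avg:.2f} ± {sd:.2f} Å")
```

In practice a superposition step (for example a least-squares fit over the ordered residues) would precede this calculation; it is omitted here to keep the sketch short.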

Overall the orientation of IGF2 in the domain 11 binding pocket agrees with the previously reported domain 11-13:IGF2 crystal structure and confirms the role of F19 of IGF2 being the key anchoring residue at the interface (13). Binding of IGF2 to domain 11E4 buries ~640-760 Å² of solvent accessible surface area on domain 11 and ~600-820 Å² on IGF2. These results are similar to those observed for the low resolution crystal structure of IGF2 in complex with wild type domain 11-14 of IGF2R (710 Å², 750 Å²), which was refined using co-ordinates of IGF1 with side chains mutated to those found in IGF2.

The hydrophobic core in domain 11 is composed of Y1542, F1567, I1572, V1574, Y1606, L1626, L1629, L1636 and the side chain of K1631. L1626, L1636 and V1574 form a foundation of hydrophobic residues that support the second group of surface hydrophobes, Y1542, F1567, I1572, Y1606, L1629 and K1631. Together they form the extended binding pocket for IGF2. The complex reveals that three key residues, previously identified as being critical for binding, are the principal residues interacting with this hydrophobic pocket. These are T16 (that determines the specificity of IGF2 over IGF1), F19 and L53 (13). Mutation of residues within this extended binding interface such as IGF2R F1567A and I1572A, and IGF2 F19A, have been shown to abolish IGF2 binding. In addition, D52, shown to be important for binding, also forms numerous interactions with S1600 and E1544K and a hydrogen bond with K1601.

The high-resolution structure of the domain11E4:IGF2 complex can be used to rationally explain the dramatic increase in IGF2 affinity over the wild type receptor. On the AB loop, a single mutation, E1544K, resulted in a 6-fold increase in affinity over the wild type domain 11 (12), whilst domain11E4 improves this affinity still further to 15 nM (10-fold). In both cases these effects are primarily due to modification of the 'on' rate. In the domain11E4:IGF2 complex, D23 of IGF2 forms a favourable electrostatic interaction with E1544K, explaining the 6-fold increase in affinity in the single point mutation (Table 1). In the wild type protein structure of domains 11-13, E1544 forms a salt bridge to the fibronectin insert on domain 13, which is thought to prevent otherwise unfavourable interactions with D23 on IGF2. The E1544K mutation therefore removes the necessity for this domain to be present in order to achieve higher affinities. Continuing over the AB loop, this interaction is further enhanced by the K1545S mutation, which not only replaces the bulky positively charged lysine, reducing steric clashes with Q18 of IGF2, but also introduces a potential hydrogen bond donor. This group is within hydrogen bonding distance of the side chain of Q18 and the additional interactions further improve the binding affinity. The L1547V mutation is more difficult to rationalise, as this is a conservative change found outside the IGF2 binding site. However, this mutation in combination with the other mutations may restrict AB loop flexibility, allowing it to form more extensive and stronger interactions with IGF2, in effect mimicking the role of domain 13, which is speculated to play a similar role in the full receptor (10). The key interactions between IGF2 and domain 11 are displayed in Fig. 1, Fig. S4 and the supplementary movie S1.
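Differences in KD can be placed on a free energy scale with the standard relation ΔΔG = RT ln(KD,mutant / KD,wild type). The sketch below is illustrative only: it uses the KD values quoted in this Supplementary Text (15 nM for domain 11E4, 37 nM for E1544K, 46-64 nM for wild type domain 11), while the temperature of 298.15 K and the function name are assumptions that do not come from the source.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15     # assumed temperature; not stated in the text

def ddg_binding(kd_mutant_nM, kd_wildtype_nM, temperature=T_KELVIN):
    """Change in binding free energy (kcal/mol); negative means tighter binding."""
    return R_KCAL * temperature * math.log(kd_mutant_nM / kd_wildtype_nM)

# KD values quoted in the text (nM); the wild type range is 46-64 nM.
for label, kd_mut in (("E1544K", 37.0), ("E4", 15.0)):
    for kd_wt in (46.0, 64.0):
        ddg = ddg_binding(kd_mut, kd_wt)
        print(f"{label} vs wild type ({kd_wt:.0f} nM): ddG = {ddg:.2f} kcal/mol")
```

Because only the ratio of the two KD values enters the logarithm, the units cancel and any consistent concentration unit can be used.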

The CD loop is immobile but moves as a whole upon IGF2 binding due to changes in the adjoining 4. The CD loop contributes the important I1572 to the hydrophobic binding pocket. In addition, T1570, which gives a 10-fold reduction in binding when mutated to alanine, is positioned to hydrogen bond with D15 of IGF2 (12). Q1569, which also affects binding, can interact with Q18 of IGF2. Analysis of the flexible FG loop also allows the interpretation of previous mutagenesis data. Increasing the side chain length or changing the charge at position 52 in IGF2 results in a 20-fold reduction in binding compared to the wild type IGF21 (see also Fig. S4b). In the E4 complex this residue forms hydrogen bonds and salt bridges with S1600 and K1601, and a switch of charge or increase in chain length would disrupt this network. Contacts on the rigid HI loop appear to be minimal. A54 packs against the hydrophobic chain region of K1631, whilst E57 of IGF2 can form a favourable interaction with the positive charge of K1631.

Finally, the model helps to explain mutations that have a definite but smaller effect, such as E6R in IGF2, which results in a 2-fold increase in binding compared to IGF2 (9). The side chain of E6 is within hydrogen bonding distance of S1602 in the FG loop, and mutation may conserve this interaction and introduce further hydrogen bonds.

Structural evolution of the IGF2R domain 11 binding site

Overall the structures of chicken, echidna, opossum and human isolated domain 11s are similar, with root-mean-squared deviations between the structures ranging from 1.39 Å to 2.19 Å over the Cαs of the secondary structure elements. As these structures are unbound to IGF2, we must be cautious about inferring potential changes in the bound state. All the structures contain four conserved disulphide bonds that staple the β-barrel together. As detailed in Figure 2, IGF2 binding evolution coincides with an increase in hydrophobicity and an expansion in the size of the IGF2 binding pocket of domain 11. The IGF2 binding pocket in the chicken domain 11 structure appeared relatively small and shallow, with an estimated surface area of 714 Å² and a volume of 1523 Å³. Compared to the human form, the hydrophobic binding site is not correctly assembled, the underlying foundation residues do not support the binding site and a number of charged residues are present (Fig. 2c). Y1542, F1567, I1572 and Y1606, which form the hydrophobic binding pocket, are replaced (D1519, I1545, L1552 and H1586). These changes not only decrease the overall size and depth of the binding pocket but also reduce the overall hydrophobicity of the pocket, which is critical to drive docking. The prototherian domain 11s from echidna and platypus display a weak but consistent affinity for their cognate IGF2s. From a structural perspective this can be explained by the presence of a hydrophobic core in echidna similar to the therian domain 11s, with an expansion of estimated surface area to 990 Å² and, more importantly, an apparent dramatic increase in the volume to 2352 Å³. The binding pocket is composed of residues F1545, I1550, Y1584, L1604 and L1614, which are equivalent to the human residues F1567, I1572, Y1606, L1626 and L1636. The presence and correct orientation of F1567 and I1572 on the CD loop appear to be critical for IGF2 binding, and this is confirmed by alanine mutagenesis in the human domain 11, where mutation of these residues totally abolishes IGF2 binding (12).

By the time the metatheria split from the eutheria, high affinity binding of IGF2 had fully evolved in both branches, and the IGF2 binding pocket surface area and volume increase further to 1039 Å² and 2551 Å³, respectively, for human domain 11. We could only obtain an approximate estimate of surface area and volume for opossum domain 11 due to folding of the flexible FG loop over the binding site, but deletion of part of the loop gave comparable numbers to human domain 11. Comparison of the metatherian opossum NMR structure with human revealed that the critical residues forming the IGF2 binding site are structurally conserved, with the key hydrophobic residues Y1542 (Y1558), F1567 (F1583), I1572 (I1588) and Y1606 (Y1622) all adopting the same orientation despite minor sequence changes in these loop regions. This explains why the affinity of opossum domain 11 for its cognate IGF2 is so similar to that of the human system. Interestingly, the opossum residue equivalent to L1626 in human domain 11, which helps to form the hydrophobic core of the IGF2 binding pocket, is substituted with a phenylalanine, and this may account for differences observed in the volume between the human and opossum binding pockets. Moreover, L1626 is highly conserved among IGF2R sequences, and the phenylalanine substitution is found only in the few metatherian and sauropsid sequences deposited in the databases.
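The pocket expansion described above can be summarised as ratios relative to the chicken pocket. This is a small bookkeeping sketch using only the surface area and volume estimates quoted in the text; the dictionary layout and function name are illustrative assumptions, and opossum is omitted because only an approximate estimate is reported for it.

```python
# Estimated binding pocket surface area (Å^2) and volume (Å^3) quoted in the text.
POCKETS = {
    "chicken": (714.0, 1523.0),
    "echidna": (990.0, 2352.0),
    "human": (1039.0, 2551.0),
}

def relative_to(reference, pockets=POCKETS):
    """Return {species: (area ratio, volume ratio)} relative to the reference species."""
    ref_area, ref_volume = pockets[reference]
    return {
        species: (area / ref_area, volume / ref_volume)
        for species, (area, volume) in pockets.items()
    }

ratios = relative_to("chicken")
for species, (area_ratio, volume_ratio) in ratios.items():
    print(f"{species}: area x{area_ratio:.2f}, volume x{volume_ratio:.2f}")
```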

In addition to contributing key hydrophobic binding residues, the loop regions modulate the high-affinity IGF2 interaction through stabilizing salt bridges and electrostatic interactions. All four loops appear to display a number of key amino acid replacements that have evolved to complement eight principal surface residues of IGF2. The AB loop oscillates between correct and incorrect complementation. In chicken and echidna, the correct positively charged residue is present, H1521 and R1522 respectively. Changes in residue size are tolerated at this position in human domain 11, where E1544R gave a 7-fold increase in IGF2 binding versus a 6-fold increase for E1544K. Both side chains are long and flexible and may be able to interact correctly with D23 of IGF2, although the role of the fibronectin domain insert and its influence on the AB loop is unknown in these species. In human, E1544 interacts with R1922 of the fibronectin insert to reduce the negative influence of E1544 in the AB loop. The fibronectin insert is present in fish, birds, monotremes, marsupials and mammals. A BLAST search revealed twenty complete fibronectin (fn) insert sequences and showed that the equivalent arginine residue is conserved in all but Silurus glanis (catfish). Catfish and other fish species also have several residues inserted after the arginine residue that may affect interaction with domain 11. Importantly, the fn insert predates the IGF2 binding site acquisition of the CD loop in domain 11, and may have a function unrelated to IGF2.

The AB loop is flexible in human domain 11 and in the current study it shows sufficient flexibility in the domain11E4:IGF2 complex for it to accommodate IGF2 binding and for it to interact with the fibronectin domain in the full length receptor. Subtle differences in the dynamics of the AB loop of chicken domain 11 may also play a role in the ability of this region to accommodate IGF2.

The longer CD loop of chicken has two residues, D1547 and G1548, that substitute for Q1569 and T1570 in human, both of which reduce binding when mutated to an alanine residue and which form important binding interactions with E12, D15 and Q18 of IGF2. The three-dimensional structure of chicken domain 11 reveals, however, that the loop is not integral to the hydrophobic binding site as it is in echidna, opossum and human. Instead it is anchored by I1545 and L1552, while residues 1546-1551 point out away from the binding site. In contrast, the shorter CD loops of echidna, opossum and human (i.e. human CD is anchored by F1567 and I1572 with just four intervening residues) are held much closer to the binding site. The longer length of the CD loop and the presence of small or charged residues in the CD loop will both be contributory factors to the orientation observed in chicken domain 11. In addition, however, the BC loop, which does not form part of the binding site but packs against the CD loop, was not well defined in the chicken three-dimensional structure. In contrast, the echidna, opossum and human structures show a well defined BC loop. Previous [1H]-15N NOE analysis of human domain 11 revealed that the HI and CD loops were relatively rigid on a nanosecond-picosecond timescale whereas the AB and FG loops were more flexible (11). Therefore the correct structure and dynamics of the BC loop may help to rigidify the CD loop and help to form the binding pocket.

CD loop evolution and splicing mechanism

We noted that chicken exon 34 exhibits a weak splice site (Maximum Entropy Model 4.83, Exon 33=10.18, Exon 35=7.33) and that potential branch points (score >70) are located distant to the acceptor splice site of exon 34. In platypus, 2,774 bp of LINE/SINE insertions into intron 33 were found using RepeatMasker (and verified with REPBASE), replacing several Intronic Splicing Regulatory Elements (60) adjacent to the highest scoring potential branch point sequences found at those positions in the chicken intron 33, just 148 bp from the acceptor splice site of exon 34. The low correlation of codon usage with optimal codon use in the platypus CD loop suggests a lesser impact of protein driven evolutionary pressure, and the codon adaptability score for the amino acids of the CD loop does not differ significantly from the expected CAI score. Additionally, the codons of the CD loop in platypus harbour a large number of exonic splice enhancer sequences, both hexamers (38) and octamers (61), consistent with splicing-related, rather than protein-related, evolutionary pressures as likely reasons for the sequence changes observed from avian to monotremes. We speculate that successive LINE/SINE insertions at close proximity to the exon 34 acceptor splice site in intron 33 have resulted in increasing pressure on an already weak splice site to adapt, resulting in strong exonic splice enhancer sequences within 45 bp of the 5' end of exon 34, and ultimately a remodelling of the branch point sequences to yield a stronger, constitutive splice site in platypus. Once the splice site was strong enough, it is conceivable that exonic splice enhancers became less important and selection pressure shifted from splicing-related sequences to optimization of ligand binding sequences with protein-driven codon adaptation, as evident in the human CD loop sequence. This interpretation is supported by the circumstantial evidence in Fig. S12, where the greatest expansions and contractions of the preceding introns of platypus IGF2R correlate with the changes in the ESE densities. Moreover, the actual ESE density observed is high, with 97% of the possible synonymous coding sequences having lower ESE densities, and with the frequency of ESEs in the codon optimized sequence being relatively low (Fig. S12).
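The comparison against all synonymous coding sequences can be sketched as follows. The enumeration uses the standard genetic code for the platypus CD loop peptide FEGTGIKA named in Fig. S12, which gives 6144 possible coding sequences. The hexamer set below is a hypothetical two-motif placeholder and not the RESCUE-ESE list, and the helper names are assumptions made for illustration.

```python
from itertools import product

# Synonymous codons (standard genetic code) for the residues in FEGTGIKA.
SYNONYMOUS_CODONS = {
    "F": ["TTT", "TTC"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "I": ["ATT", "ATC", "ATA"],
    "K": ["AAA", "AAG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}

def coding_sequences(peptide):
    """Yield every DNA sequence that encodes the given peptide."""
    for codons in product(*(SYNONYMOUS_CODONS[aa] for aa in peptide)):
        yield "".join(codons)

def ese_count(seq, motifs, k=6):
    """Count overlapping k-mers in seq that belong to the motif set."""
    return sum(seq[i:i + k] in motifs for i in range(len(seq) - k + 1))

def fraction_below(observed, counts):
    """Fraction of sequences whose ESE count is strictly lower than the observed one."""
    return sum(c < observed for c in counts) / len(counts)

# Hypothetical placeholder motifs; substitute a real ESE hexamer list in practice.
TOY_MOTIFS = {"GAAGGA", "AAGAAG"}

seqs = list(coding_sequences("FEGTGIKA"))
counts = [ese_count(s, TOY_MOTIFS) for s in seqs]
print(len(seqs))  # 6144
```

Given the ESE count of the actual platypus coding sequence, `fraction_below(observed, counts)` returns the kind of percentile quoted in the text; the actual sequence is not reproduced here, so no percentile is computed.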

Transfection of HeLa cells with mini-gene (pET vector system, MoBiTec GmbH) constructs of chicken, platypus and human Exons 33 and 34 followed by RT-PCR found no unexpected splicing products, suggesting that the evolutionary pressure to improve the exon 34 acceptor splice site must have arisen within the prototherian branch (Fig. 3D), possibly concurrent with the invasion of transposable elements in intron 33.

In vertebrates, codon and amino acid usage near intron-exon boundaries appear to be selected for splicing rather than translational efficiency (62). In keeping with this hypothesis, selection for the retention of exonic splicing enhancers (ESEs) may explain the low rate of synonymous substitution, low SNP density, low protein evolution rates and the choice of amino acids enriched by ESEs in these regions (63, 64). The dependence of splicing efficiency on ESEs appears particularly important for intron rich genomes, as has been detected in the platypus (17, 65). In conjunction with alternative splice site usage, premature termination sequences may lead to purification of mRNA species depending on the efficiency of non-sense mediated decay (66). In this context, ESEs may have evolved because of the insertion of repeat elements within the prototherian genome, leading to a combination of efficient and correct splicing as well as alternative splice site competition.

Fig. S1. Amino-acid sequence alignment of IGF2R-domain 11, IGF2 and IGFBP3 from different vertebrates. Alignments using program ClustalW. The background-shadings represent related amino acids (if >50% of the total compared number of amino acids). a. secondary structure elements for human IGF2R-domain11 are displayed above the alignments. Residues that reduce IGF2 binding to domain11 are in red, residues that are critical for binding are in blue and residues at the binding interface are in yellow. Black stars indicate mutated residues in the domain 11 E4 mutant, black triangles residues that form the IGF2R-domain11:IGF2 binding interface. b. alignment of IGF2 from vertebrates. Shaded residues are identical, IGF2 residues for binding to domain 11 are blue. c. alignment of IGFBP-3. Orange shaded residues are identical, similar residues are shaded blue. PSIPred predicted secondary structure elements for human IGFBP-3 are displayed above the alignments (67).

Fig. S2. Monitoring domain 11:IGF2 complex formation by NMR. Samples of unlabeled IGF2 were titrated into 200 μM solutions of 15N labeled wild type human domain 11 and 1H-15N HSQC spectra were recorded for a. the free protein and b. the complex. c. 1H-15N HSQC spectrum of the domain 11 E1544K construct and d. titration of IGF2 into 15N-labelled E1544K. e. 1H-15N HSQC spectrum of domain 11E4 (KD of 15 nM) and f. complex formation with IGF2 at pH 4.0. Circled regions show an area of the spectrum used to assess signal quality in the free and bound forms of domain 11 mutants.

Fig. S3. 1H-15N TROSY-HSQC of 13C/15N labeled IGF2 in free form and in complex with Domain 11E4. a. 1H-15N TROSY-HSQC spectrum of free IGF2 at pH 4.2 acquired at 600 MHz. b. 1H-15N TROSY-HSQC spectrum of the complex with domain 11E4 formed by dissolving 200μM 13C/15N labeled IGF2 with 1.5 molar excess of unlabeled Domain11E4 at pH 4.2. Upon binding there are dramatic chemical shift changes observed in IGF2, with most residues giving sharp signals indicative of a single conformation. The boxed area shows a close-up view of the arginine side-chain region (Nε bound protons).

Fig. S4. Three-dimensional structure of the Domain11E4-IGF2 complex highlighting the loop interactions with IGF2. a. Close up view of the AB and CD loop polar and charged interactions with IGF2. IGF2 is coloured in yellow and the residues highlighted in Fig. 1 are shown for both IGF2 and domain11E4. b. Close up view of the FG and HI residue specific interactions with IGF2.

Fig. S5. Protein expression of recombinant species specific NusA-IGF2 and IGF2R domain 11 proteins. a. Schematic diagram of the NusA-IGF2 fusion proteins and His-tagged domain 11 and b. and c, western blot detection of the five expressed and biotinylated NusA-IGF2 proteins with streptavidin-alkaline phosphatase and domain 11 with an anti-His6 antibody, respectively. d. schematic diagram of echidna domain 11 and 11-13 proteins with a C-terminal His6 tag. e. western blot using an anti-His6 antibody and f-h. Coomassie Blue stained gels of echidna domain 11-13, 11, chicken domain 11 with echidna CD loop and chicken domain 11 with echidna AB, CD and FG loops, respectively. i. Schematic representation of the 6-His-2xStrept-tagged domain 11 expression constructs produced in 293T cells. j. SDS-PAGE 15% (Coomassie Blue) of purified domains 11 from human, echidna and chicken.

Fig. S6. SPR interactions between IGF2R-Domain 11 and NusA-IGF2 of the same species. a-f. Example sensorgrams depicting duplicate injections of domain 11 at 4096 nM to 8 nM binding to immobilised species NusA-IGF2. a. human, b. opossum, c. platypus, d. echidna, e. chicken, f. zebrafish. Red lines represent the global fitting of the data set to a two-state (conformational change) model. Inserts show steady state binding isotherms, with human showing both recombinant IGF2 (triangles) and NusA-IGF2 (circles). g, quantification of relative real time (two-state conformational change fit) and steady state kinetics for a single experiment.
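A steady state binding isotherm of the kind shown in the inserts can be sketched with a simple 1:1 model, Req = Rmax·C / (KD + C). This is an illustration only: the caption gives the concentration range (4096 nM to 8 nM) but not the dilution scheme, so the two-fold series, the 1:1 steady state model, the example KD and the function names are all assumptions.

```python
def twofold_series(top_nM, bottom_nM):
    """Two-fold dilution series from top_nM down to bottom_nM (inclusive)."""
    series = []
    conc = float(top_nM)
    while conc >= bottom_nM:
        series.append(conc)
        conc /= 2.0
    return series

def steady_state_response(conc_nM, kd_nM, rmax=1.0):
    """Equilibrium response for 1:1 binding at analyte concentration conc_nM."""
    return rmax * conc_nM / (kd_nM + conc_nM)

series = twofold_series(4096, 8)
example_kd = 100.0  # hypothetical KD in nM, for illustration only
for conc in series:
    print(f"{conc:7.0f} nM -> fractional response {steady_state_response(conc, example_kd):.3f}")
```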

Fig. S7. SPR kinetics of species specific NusA-IGF2 with human recombinant IGFBP3. a-e. representative sensorgrams with injections of human IGFBP3 16nM, 8nM, 4nM, 2nM, 1nM to 0.25nM (except in e., 0.5nM) over immobilised NusA-IGF2 from respective species; a., human, b., opossum, c., platypus and echidna, d., chicken and e., zebrafish. Red lines shown in this example indicate examples of global fitting of each data set with a 1:1 Langmuir binding model. f, real time kinetic parameters and dissociation affinity constants, including χ2. Note the relatively poor fits to this model with associated high χ2 values compared to Fig S8.

Fig. S8. Purification and bio-activity analysis of purified echidna IGF2. a. Following expression, the echidna NusA-IGF2 fusion was cleaved with TEV and IGF2 purified by gel filtration. The two isoforms were separated using a Jupiter Proteo C4 column (Phenomenex) running an acetonitrile/water solvent system over 20 min with a 15-45% gradient. The two isoforms, which eluted at 26% and 27%, respectively, were pooled and freeze-dried. b. Results of labeled IGF1 competition binding assays to immune-captured IGF-1R lysed from cells expressing human IGF-1R (see methods) using echidna and human IGF2. Mean ± SEM with each concentration measured in triplicate. c. Representative traces and fits of BIAcore analysis of amine coupled and immobilised IGFBP-2 and IGFBP-3 binding to increasing concentrations of echidna and human IGF2. Fits were analysed with a two-state conformational change model from two experiments. Affinity measurements with standard error of the mean (SEM) are shown in d. Binding affinities are expressed as a ratio (fold) of the EC50 or KD for echidna IGF2 over the affinity for human IGF2. The IGF-1R assay was performed twice with each data point measured in triplicate in each assay (i.e. n=6 data points per concentration). BIAcore data for IGFBP-2 was repeated three times (except for Peak 1, which was repeated twice only). BIAcore data for IGFBP-3 was repeated four times.

Fig. S9. SPR kinetics of IGF2 with recombinant echidna, chicken and zebrafish domain 11. Representative sensorgrams are shown for data presented in Table 1. Biotinylated IGF2 (human and chicken from Gropep, Aus) including purified echidna (peak 2) IGF2 were immobilized using streptavidin coated C1 chip and kinetics recorded with a BIACore T200 instrument. Echidna domain 11 bound all forms of purified IGF2 with similar kinetics and no binding was detected with either chicken or zebrafish domain 11.

Fig. S10. Three-dimensional structures of the multi-species IGF2R domain 11 loop regions. Cartoon representations of the AB (a), CD (b), FG (c) and HI loops (d) from different species; chicken (red), echidna (orange), opossum (green), human (magenta) and domain 11E4 (blue).

Fig. S11. Surface plasmon resonance kinetics of human IGF2 with chicken domain 11 with an echidna CD loop and echidna domains 11-13. Examples of quantitative binding analysis of human biotinylated IGF2 (GroPep, Aus) binding to echidna P. pastoris expressed domains. a. Kinetic example of SPR data from chicken domain 11 with the CD loop of echidna, data shown in Table 1, with steady-state binding isotherm shown in the insert. b. Example of echidna domains 11-13 binding analysis, without saturation kinetics (see Table 1).

Fig. S12. Splice site analysis of exon 34 of the chicken, platypus and human IGF2R genes. a. Splice site features at the intron 33 – exon 34 boundary in chicken, platypus and human, and evolution of the IGF2 binding site residues of the CD loop are illustrated. Potential branch points (>70 score, underlined) and polypyrimidine tract (purple) are indicated. Intronic splicing enhancers (ISE) in the region replaced by LINE/SINE elements in platypus (arrow) are indicated with blue lines, exon splice enhancers in the CD loop sequence are marked with green (Rescue-ESE) and red (Zhang and Chasin) lines. Codon adaptability scores are illustrated for the CD loop amino acids (See Supplementary discussion 3). b. Summary of splice site features. Intron lengths, LINE/SINE insertions and predicted ISEs and ESEs (depicted in a) are listed. Splice site strengths are detailed according to Maximum Entropy (ME), Markov Model (MM) and Hidden Markov Model (HM). The percentage increases over the expected CAI values (eCAI) are listed for the entire CD loop. c. Comparison of ESE densities found in exons in platypus with significant changes to preceding intron lengths. (Analysis using platypus genomic sequence contained in BAC CLM1_102I6 (CR933560), Dr A Mungall). Tabled details show ESE density changes with respect to the most altered preceding intron length, suggesting correlation between intron length in(de)creases and the following exon ESE densities. d. ESE densities found in all possible 6144 codon combinations coding for the Platypus CD loop protein sequence FEGTGIKA using RESCUE-ESE (see Supplementary methods). The number of ESEs found in the actual platypus coding sequence, as well as in the platypus codon optimized sequence, are indicated by red and blue arrows, respectively.

Fig. S13. Identification of splice products of IGF2R mini-genes in DF-1 cells by sequencing. a-e. Sequences of splice junctions found in RT-PCR products from Fig. 3b. Chicken exon 33 is spliced to exon 34 of chicken, platypus, chicken with platypus ESE and platypus with chicken ESE sequences. A minor splicing form of 116 bp is shown, which arises as a result of a cryptic splice acceptor site in chicken exon 34 (base 38,734 in NC_006090) and results in a stop codon at base 38,743. f, illustration of the position of the cryptic splice acceptor site identified in chicken exon 34, original base numbering (NC_006090).

Fig. S14 In-vitro splicing analysis in HeLa cell extracts. a. dsx mini-gene constructs with RT-PCR primer locations. b. Sybr-II green stained denaturing PAGE gel of in vitro transcribed RNA, c. ethidium bromide stained 3% (w/v) agarose gel of RT-PCR products of in vitro splicing reactions at time points indicated. Un-spliced and spliced RNAs are indicated. d. peakscanner 1.0 traces of ABI 3730xl capillary electrophoresis fragment analysis after 2h of in vitro splicing. Blue lines indicate 6-FAM labeled PCR products and orange lines represent internal standard (GeneScan 500-LIZ) with dots marking the standards at 35, 50, 75, 100, 139, 150, 160, 200, 250, 300, 340, 350, 400, 450, 490, and 500 bases. The relative amount of spliced product (194bp) is given for each RNA species as a percentage of total RT-PCR products (spliced and un-spliced).
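The relative amount of spliced product described in panel d is a simple ratio of peak signals. The sketch below shows that calculation under the assumption that one integrated peak area is available for the spliced product and one for the un-spliced product; the function name and the example areas are hypothetical and are not values from the figure.

```python
def percent_spliced(spliced_area, unspliced_area):
    """Spliced product as a percentage of total RT-PCR product (spliced + un-spliced)."""
    total = spliced_area + unspliced_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * spliced_area / total

# Hypothetical peak areas in arbitrary fluorescence units.
example = percent_spliced(spliced_area=25.0, unspliced_area=75.0)
print(f"{example:.1f}% spliced")
```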

Table S1. NMR and refinement statistics for IGF2R domain 11E4 alone and with IGF2. aCalculated with Molmol, and bCalculated with iCing.

| | Domain 11 E4 | Domain11E4-IGF2 complex | Domain11 E4 (component) | IGF2 (component) |
|---|---|---|---|---|
| NMR distance and dihedral restraints | | | | |
| Total NOE restraints | 2630 | 3080 | | |
| Unambiguous | 1645 | 2604 | 1840 | 764 |
| Ambiguous | 685 | 414 | 217 | 143 |
| Unambiguous Intermolecular NOEs | - | 58 | - | - |
| Ambiguous Intermolecular NOEs | - | 4 | - | - |
| Backbone Φ and ψ | 220 | 288 | 208 | 80 |
| Hydrogen bond restraints | 38 | 52 | 40 | 12 |
| Energies (kcal mol-1) | | | | |
| ENOE | 77.2 ± 8.1 | 103.3 ± 6.53 | | |
| EDIHE | 634.8 ± 3.20 | 955.3 ± 5.6 | | |
| EVDW | -1182.5 ± 12.1 | -1712.7 ± 23.4 | | |
| EELEC | -5009.1 ± 116.8 | -7328.4 ± 150.3 | | |
| Structure statistics for Complex | | | | |
| RMSD from ideal geometry | | | | |
| bonds (Å) | 0.011 ± 0.0003 | 0.011 ± 0.0002 | | |
| angles (°) | 1.261 ± 0.0326 | 1.370 ± 0.0230 | | |
| improper (°) | 1.481 ± 0.0965 | 1.461 ± 0.0438 | | |
| RMSD to mean for backbone atoms | 0.67 ± 0.09a | 0.44 ± 0.07 | 0.52 ± 0.11a | 0.62 ± 0.14a |
| RMSD to mean for heavy atoms | 1.08 ± 0.12a | 0.77 ± 0.09 | 0.81 ± 0.14a | 1.33 ± 0.23a |
| Ramachandran plot regionsb | | | | |
| in most favoured (%) | 78.7 | 77.5 | 77.5 | 77.2 |
| in additional allowed (%) | 19.1 | 19.9 | 20.7 | 18.6 |
| in generously allowed (%) | 1.8 | 1.5 | 1.2 | 1.9 |
| in disallowed (%) | 0.4 | 1.1 | 0.6 | 2.3 |
| Z scoresb | Unbound D11 | Complex | Triple | IGF2 |
| 1st Generation packing | -0.983 | -2.232 | -1.020 | -4.998 |
| 2nd Generation packing | -1.704 | -2.474 | -1.582 | -3.314 |
| Ramachandran plot | -3.910 | -3.984 | -3.336 | -5.284 |
| Chi1/Chi1 normality | -2.464 | -3.454 | -3.385 | -3.608 |
| Backbone conformation | -0.920 | -1.544 | -0.928 | -2.840 |
| Bond lengths | 1.097 | 1.100 | 1.061 | 1.043 |
| Bond angles | 0.561 | 0.585 | 0.608 | 0.633 |
| Omega angles | 0.685 | 0.664 | 0.649 | 0.691 |
| Side chain planarity | 1.514 | 1.452 | 1.479 | 1.372 |
| Improper dihedral distribution | 1.132 | 1.156 | 1.145 | 1.171 |
| Inside/outside distribution | 0.952 | 0.984 | 0.931 | 1.041 |

Table S2. NMR and refinement statistics for chicken, echidna and opossum IGF2R domain 11 structures. (a) Calculated with CNS v1.2/ARIA 1.2; (b) calculated with iCing.

| Statistic | Chicken | Echidna | Opossum |
|---|---|---|---|
| Total NOE restraints | 3375 | 2766 | 4986 |
| Intra residue NOEs | 1079 | 1187 | 2233 |
| Sequential/medium NOEs (residue i to i+1,2,3,4,5) | 1185 | 995 | 1525 |
| Long range NOEs | 1111 | 584 | 1228 |
| Dihedral angle restraints | | | |
| Backbone Φ and ψ | 294 | 150 | 276 |
| Hydrogen bond restraints | 32 | 50 | 46 |
| Energies (kcal mol⁻¹) | | | |
| E_NOE | 22.01 ± 11.38 | 1.48 ± 0.35 | 113 ± 20.64 |
| E_DIHE | 0.59 ± 0.41 | 0.13 ± 0.08 | 7.65 ± 11.69 |
| E_VDW | -1307.78 ± 25.50 | -1162.63 ± 20.878 | -1421.01 ± 14.04 |
| E_ELEC | -6175.72 ± 143.66 | -5679.45 ± 53.266 | -4599.07 ± 82.06 |
| RMSD from ideal geometry | | | |
| bonds (Å) | 0.002 ± 0.0001 | 0.001 ± 0.0001 | 0.003 ± 0.0001 |
| angles (°) | 0.336 ± 0.0110 | 0.259 ± 0.0141 | 0.467 ± 0.0230 |
| improper (°) | 0.228 ± 0.0130 | 0.124 ± 0.0194 | 0.424 ± 0.0300 |
| RMSD to mean for backbone atoms | 0.68 ± 0.11 (a) | 0.565 ± 0.059 (a) | 0.31 ± 0.11 (a) |
| RMSD to mean for heavy atoms | 2.09 ± 0.27 (a) | 1.601 ± 0.193 (a) | 0.56 ± 0.16 (a) |
| Ramachandran plot regions (c) | | | |
| in most favoured (%) | 75.2 | 84.9 | 76.2 |
| in additional allowed (%) | 19.5 | 13.4 | 21.4 |
| in generously allowed (%) | 2.3 | 0.8 | 1.6 |
| in disallowed (%) | 1.5 | 0.8 | 0.8 |
| Z scores (b) | | | |
| 1st Generation packing | -0.609 | -0.550 | 0.147 |
| 2nd Generation packing | -1.996 | -1.950 | -1.361 |
| Ramachandran plot | -4.536 | -3.290 | -4.636 |
| Chi1/Chi1 normality | -3.824 | -2.090 | -3.207 |
| Backbone conformation | -2.559 | -1.140 | -2.559 |
| Bond lengths | 1.189 | 1.140 | 1.245 |
| Bond angles | 0.578 | 0.470 | 0.594 |
| Omega angles | 0.150 | 0.110 | 0.149 |
| Side chain planarity | 1.049 | 0.750 | 0.915 |
| Improper dihedral distribution | 0.676 | 0.660 | 0.816 |
| Inside/outside distribution | 1.062 | 1.050 | 0.954 |
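As a quick consistency check on the restraint counts in Table S2, the short sketch below confirms that the intra-residue, sequential/medium and long-range NOEs add up to the stated total for each species. The dictionary keys and the function name are illustrative choices; only the numbers come from the table.

```python
# NOE restraint counts copied from Table S2.
noe_counts = {
    "Chicken": {"total": 3375, "intra": 1079, "seq_medium": 1185, "long_range": 1111},
    "Echidna": {"total": 2766, "intra": 1187, "seq_medium": 995, "long_range": 584},
    "Opossum": {"total": 4986, "intra": 2233, "seq_medium": 1525, "long_range": 1228},
}


def noe_sum_matches(row):
    """Return True if the three NOE classes add up to the reported total."""
    return row["intra"] + row["seq_medium"] + row["long_range"] == row["total"]


for species, row in noe_counts.items():
    print(species, "consistent" if noe_sum_matches(row) else "MISMATCH")
```

A check of this kind is mainly useful for catching transcription errors when a table has been re-keyed or extracted from a PDF.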

Table S3. IGF2 binding of chicken/echidna IGF2R domain 11s and chicken-echidna loop hybrids.

| Domain | CD Loop Sequence | KD (µM) |
|---|---|---|
| Chicken | CITDGPKTLNA | no binding |
| Chicken D11-ΔT1568 | CIDGPKTLNA | no binding |
| Chicken D11-ΔT1568,ΔT1573 | CIDGPK MNA | no binding |
| Chicken D11-ΔT1568,ΔT1573-P1569T | CIDGTK MNA | no binding |
| Chicken D11-ΔT1568,ΔT1573-K1570G | CIDGPG MNA | no binding |
| Chicken D11-EchidnaCD-F1567A | CAEGTG IKA | no binding |
| Chicken D11-EchidnaCD-I1572A | CFEGTG AKA | no binding |
| Chicken D11-EchidnaCD-F1567I | CIEGTG IKA | no binding |
| Chicken D11-EchidnaCD-I1572M | CFEGTG MKA | no binding |
| Chicken D11-EchidnaCD-E1568D | CFDGTG IKA | no binding |
| Chicken D11-EchidnaCD-T1570P | CFEGPG IKA | no binding |
| Chicken D11-EchidnaCD-G1571K | CFEGTK IKA | no binding |
| Chicken D11-EchidnaCD | CFEGTG IKA | 0.4 |
| Echidna D11 | CFEGTG IKA | 0.4 |
| Chicken D11 Echidna AB+CD+FG | CFEGTG IKA | 0.1 |

Additional loop sequences listed for Chicken D11 Echidna AB+CD+FG: AB-Loop KQESGYTISDIRKGSIRLGV; FG-Loop ANLHLKYKSVIS.
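The loop sequences listed for the AB+CD+FG construct can be cross-checked against the synthetic gene given in Table S4. The sketch below translates three fragments excerpted from that gene using the standard genetic code and compares them with the peptide sequences in Table S3. The fragment boundaries were chosen by hand for this illustration, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# In-frame fragments excerpted from the Table S4 synthetic gene.
loop_dna = {
    "AB": "AAGCAAGAATCTGGCTACACCATTTCTGATATCCGCAAAGGCTCCATTCGCTTAGGGGTG",
    "CD": "TGCTTCGAAGGAACTGGAATCAAAGCT",
    "FG": "GCAAATTTGCACCTCAAGTACAAAAGTGTTATTAGT",
}

# Peptide sequences from Table S3 (alignment gap in the CD loop removed).
expected = {
    "AB": "KQESGYTISDIRKGSIRLGV",
    "CD": "CFEGTGIKA",
    "FG": "ANLHLKYKSVIS",
}

for name, dna in loop_dna.items():
    protein = translate(dna)
    print(name, protein, "matches" if protein == expected[name] else "differs")
```

Building the codon table from the two strings avoids typing 64 entries by hand; the same `translate` function could be applied to the full synthetic gene if the whole open reading frame needs checking.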

Table S4. Primer sequences for supplementary material and methods

Purpose Primer ID Sequence (5’-3’) Primer ID Sequence (5’-3’)

chicken domain 11 protein scaffold

CD11-5’-EcoRI

GAGAATTCGGATCCAAAAGCAACGTGCAAAAT

CD11-3’AvrII

CGGCTCTTCTGCAGACAGCAGCTCCATTAGC

CD11-L2S1-R

CGGCTCTTCCGGAAAACTAAGTAAAACT

CD11-L2S2-F

GGCCTAGGTCAGTGATGGTGATGGTGATGCACCTCTTCCTCGCA

echidna domain 11 loop grafts

ED11-L2S1-F

CGGCTCTTCCTGCTTTGAAGGAACTGGAATC

ED11-L2S2-R

CGGCTCTTCTTCCAGCTTTGATTCCAGTTCC

pHLsec restriction site

pHL-Mfe-F

CCCAATTGAAGCTTGCCACCATGGGGTTCCTTCCCAGCCC

pHL-EcoRI-Age-R

GAACCGGTGGATCCGAATTCAGCTACGCAACCCATCAGCAGCACGGAG

pHL-Mfe-BAMNO-F PCR

CCCAATTGAAGCTTGCCACCATGGGGTTCCTTCCCAGCCC

pHL-AvrII-XhoI-R

CTAGTCTCGAGCCTAGGTTAGTGATGGTGATGGTGGTGCTTGG

6xHis 2xStreptag

MfeI- CATCATCATCATCATCACCATCATGGAGGATCCTGGTCACATCCACAATTCGAGAAGGGTGGTGGTAGTGGTGGTGGTTCTGGAGGATCTGCTTGGTCACATCCACAATTTGAAAAAGGATCTGAGAATATCTACTTCCAGGGA-EcoRI

Chicken domain 11 with echidna AB, CD and FG loops

Synthetic gene

(5’- GAATTCGGATCCAAAAGCAACGTGCAAAATGACTGCCGGGTAATGAATCCTGCAACAGGCCACCTCTTTGATTTGACATCACTGAAGCAAGAATCTGGCTACACCATTTCTGATATCCGCAAAGGCTCCATTCGCTTAGGGGTGTGTGCTGAAGCTAAAAGCTCATGTGCTAATGGAGCTGCTGTCTGCTTCGAAGGAACTGGAATCAAAGCTGGAAAACTAAGTAAAACTTTAACTTACGAGGATCAAGTGCTGAAGCTTGTTTATGAAGATGGGGACCCCTGCCCCGCAAATTTGCACCTCAAGTACAAAAGTGTTATTAGTTTCGTTTGCAAGTCTGATGCTGGAGATGACAGCCAGCCTGTTTTTCTGTCTTTTGATGAGCAGACATGCACTAGTTACTTCTCTTGGCATACTTCCTTGGCTTGCGAGGAAGAGGTGCATCACCATCACCATCACTGACCTAGG-3’)
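Several primer names in Table S4 refer to restriction enzymes. A minimal sketch for locating the corresponding recognition sites in a primer is shown below. The recognition sequences are the standard ones for these enzymes; the selection of primers and the function name are illustrative. All six sites are palindromic, so scanning a single strand is sufficient.

```python
# Standard recognition sequences (all palindromic).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "AvrII": "CCTAGG",
    "MfeI": "CAATTG",
    "AgeI": "ACCGGT",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
}


def find_sites(sequence):
    """Return the sorted names of all enzymes whose site occurs in the sequence."""
    sequence = sequence.upper()
    return sorted(name for name, site in SITES.items() if site in sequence)


# A selection of primer sequences copied from Table S4.
primers = [
    "GAGAATTCGGATCCAAAAGCAACGTGCAAAAT",
    "GGCCTAGGTCAGTGATGGTGATGGTGATGCACCTCTTCCTCGCA",
    "CCCAATTGAAGCTTGCCACCATGGGGTTCCTTCCCAGCCC",
    "GAACCGGTGGATCCGAATTCAGCTACGCAACCCATCAGCAGCACGGAG",
    "CTAGTCTCGAGCCTAGGTTAGTGATGGTGATGGTGGTGCTTGG",
]

for seq in primers:
    print(seq[:20] + "...", find_sites(seq))
```

Because the table is reproduced here from a flattened layout, comparing the sites found in each sequence with the enzyme named in each primer ID may also help confirm which name belongs to which sequence.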

Movie S1. The conformational dynamics induced in IGF2R domain 11 upon binding IGF2. The movie initially shows the solution structure of domain 11E4 in complex with IGF2 to illustrate the predominantly hydrophobic binding interface formed by the AB, CD, FG and HI loops. Colour coding: light blue, AB loop; wheat, CD loop; pale green, FG loop; light pink, HI loop. The surface of domain 11 is shown and IGF2 is rendered in red ribbons. The movie then shows the changes in the binding site loops that occur upon binding IGF2. In particular, the conformations of the AB and FG loops change significantly upon complex formation as they move to accommodate IGF2. These conformational changes help form a complementary hydrophobic surface and modulate the interaction through the formation of new hydrogen bonds and salt bridges. The movie frames were generated using PyMOL 1.3 and encoded using Mencoder r32510-4.1.2. The Yale morph server was used to morph the unbound IGF2R domain 11E4 structure to the bound form (68).

References and Notes

1. M. B. Renfree, T. A. Hore, G. Shaw, J. A. Marshall Graves, A. J. Pask, Evolution of genomic imprinting: Insights from marsupials and monotremes. Annu. Rev. Genomics Hum. Genet. 10, 241 (2009). doi:10.1146/annurev-genom-082908-150026 Medline

2. D. Haig, C. Graham, Genomic imprinting and the strange case of the insulin-like growth factor II receptor. Cell 64, 1045 (1991). doi:10.1016/0092-8674(91)90256-X Medline

3. J. K. Killian et al., M6P/IGF2R imprinting evolution in mammals. Mol. Cell 5, 707 (2000). doi:10.1016/S1097-2765(00)80249-X Medline

4. J. K. Killian et al., Monotreme IGF2 expression and ancestral origin of genomic imprinting. J. Exp. Zool. 291, 205 (2001). doi:10.1002/jez.1070 Medline

5. P. Ghosh, N. M. Dahms, S. Kornfeld, Mannose 6-phosphate receptors: new twists in the tale. Nat. Rev. Mol. Cell Biol. 4, 202 (2003). doi:10.1038/nrm1050 Medline

6. T. Kono et al., Birth of parthenogenetic mice that can develop to adulthood. Nature 428, 860 (2004). doi:10.1038/nature02402 Medline

7. M. M. Lau et al., Loss of the imprinted IGF2/cation-independent mannose 6-phosphate receptor results in fetal overgrowth and perinatal lethality. Genes Dev. 8, 2953 (1994). doi:10.1101/gad.8.24.2953 Medline

8. Z. Q. Wang, M. R. Fung, D. P. Barlow, E. F. Wagner, Regulation of embryonic growth and lysosomal targeting by the imprinted Igf2/Mpr gene. Nature 372, 464 (1994). doi:10.1038/372464a0 Medline

9. J. Brown et al., Structure and functional analysis of the IGF-II/IGF2R interaction. EMBO J. 27, 265 (2008). doi:10.1038/sj.emboj.7601938 Medline

10. J. Brown et al., Structure of a functional IGF2R fragment determined from the anomalous scattering of sulfur. EMBO J. 21, 1054 (2002). doi:10.1093/emboj/21.5.1054 Medline

11. C. Williams et al., Structural insights into the interaction of insulin-like growth factor 2 with IGF2R domain 11. Structure 15, 1065 (2007). doi:10.1016/j.str.2007.07.007 Medline

12. O. J. Zaccheo et al., Kinetics of insulin-like growth factor II (IGF-II) interaction with domain 11 of the human IGF-II/mannose 6-phosphate receptor: function of CD and AB loop solvent-exposed residues. J. Mol. Biol. 359, 403 (2006). doi:10.1016/j.jmb.2006.03.046 Medline

13. C. Delaine et al., A novel binding site for the human insulin-like growth factor-II (IGF-II)/mannose 6-phosphate receptor on IGF-II. J. Biol. Chem. 282, 18886 (2007). doi:10.1074/jbc.M700531200 Medline

14. Information on materials and methods is available on Science Online.

15. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

16. C. A. Yandell, A. J. Dunbar, J. F. Wheldrake, Z. Upton, The kangaroo cation-independent mannose 6-phosphate receptor binds insulin-like growth factor II with low affinity. J. Biol. Chem. 274, 27076 (1999). doi:10.1074/jbc.274.38.27076 Medline

17. W. M. Canfield, S. Kornfeld, The chicken liver cation-independent mannose 6-phosphate receptor lacks the high affinity binding site for insulin-like growth factor II. J. Biol. Chem. 264, 7100 (1989). Medline

18. T. Warnecke, C. C. Weber, L. D. Hurst, Why there is more to protein evolution than protein function: Splicing, nucleosomes and dual-coding sequence. Biochem. Soc. Trans. 37, 756 (2009). doi:10.1042/BST0370756 Medline

19. W. C. Warren et al., Genome analysis of the platypus reveals unique signatures of evolution. Nature 453, 175 (2008). doi:10.1038/nature06936 Medline

20. X. Xiao, Z. Wang, M. Jang, C. B. Burge, Coevolutionary networks of splicing cis-regulatory elements. Proc. Natl. Acad. Sci. U.S.A. 104, 18583 (2007). doi:10.1073/pnas.0707349104 Medline

21. C. N. Dewey, I. B. Rogozin, E. V. Koonin, Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics 7, 311 (2006). doi:10.1186/1471-2164-7-311 Medline

22. J. K. Killian et al., Divergent evolution in M6P/IGF2R imprinting from the Jurassic to the Quaternary. Hum. Mol. Genet. 10, 1721 (2001). doi:10.1093/hmg/10.17.1721 Medline

23. A. J. Pask et al., Analysis of the platypus genome suggests a transposon origin for mammalian imprinting. Genome Biol. 10, R1 (2009). doi:10.1186/gb-2009-10-1-r1 Medline

24. T. A. Hore, J. E. Deakin, J. A. Marshall Graves, The evolution of epigenetic regulators CTCF and BORIS/CTCFL in amniotes. PLoS Genet. 4, e1000169 (2008). doi:10.1371/journal.pgen.1000169 Medline

25. R. L. Trivers, Parent-offspring conflict. Integr. Comp. Biol. 14, 249 (1974). doi:10.1093/icb/14.1.249

26. A. Chess, Mechanisms and consequences of widespread random monoallelic expression. Nat. Rev. Genet. 13, 421 (2012). doi:10.1038/nrg3239 Medline

27. A. Denley et al., Structural determinants for high-affinity binding of insulin-like growth factor II to insulin receptor (IR)-A, the exon 11 minus isoform of the IR. Mol. Endocrinol. 18, 2502 (2004). doi:10.1210/me.2004-0183 Medline

28. F. E. Carrick et al., Interaction of insulin-like growth factor (IGF)-I and -II with IGF binding protein-2: Mapping the binding surfaces by nuclear magnetic resonance. J. Mol. Endocrinol. 34, 685 (2005). doi:10.1677/jme.1.01756 Medline

29. F. Delaglio et al., NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277 (1995). doi:10.1007/BF00197809 Medline

30. W. F. Vranken et al., The CCPN data model for NMR spectroscopy: Development of a software pipeline. Proteins 59, 687 (2005). doi:10.1002/prot.20449 Medline

31. A. T. Brünger, P. D. Adams, L. M. Rice, Recent developments for the efficient crystallographic refinement of macromolecular structures. Curr. Opin. Struct. Biol. 8, 606 (1998). doi:10.1016/S0959-440X(98)80152-8 Medline

32. A. J. Nederveen et al., RECOORD: A recalculated coordinate database of 500+ proteins from the PDB using restraints from the BioMagResBank. Proteins 59, 662 (2005). doi:10.1002/prot.20408 Medline

33. W. Rieping et al., ARIA2: Automated NOE assignment and data integration in NMR structure calculation. Bioinformatics 23, 381 (2007). doi:10.1093/bioinformatics/btl589 Medline

34. D. La et al., 3D-SURFER: Software for high-throughput protein surface comparison and analysis. Bioinformatics 25, 2843 (2009). doi:10.1093/bioinformatics/btp542 Medline

35. R. A. Laskowski, PDBsum: Summaries and analyses of PDB structures. Nucleic Acids Res. 29, 221 (2001). doi:10.1093/nar/29.1.221 Medline

36. R. S. Lipsitz, N. Tjandra, Residual dipolar couplings in NMR structure analysis. Annu. Rev. Biophys. Biomol. Struct. 33, 387 (2004). doi:10.1146/annurev.biophys.33.110502.140306 Medline

37. A. R. Aricescu, W. Lu, E. Y. Jones, A time- and cost-efficient system for high-level protein production in mammalian cells. Acta Crystallogr. D Biol. Crystallogr. 62, 1243 (2006). doi:10.1107/S0907444906029799 Medline

38. G. Yeo, C. B. Burge, Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377 (2004). doi:10.1089/1066527041410418 Medline

39. J. F. Abril, R. Castelo, R. Guigó, Comparison of splice sites in mammals and chicken. Genome Res. 15, 111 (2005). doi:10.1101/gr.3108805 Medline

40. G. Kol, G. Lev-Maor, G. Ast, Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum. Mol. Genet. 14, 1559 (2005). doi:10.1093/hmg/ddi164 Medline

41. S. H. Schwartz et al., Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res. 18, 88 (2008). doi:10.1101/gr.6818908 Medline

42. F. O. Desmet et al., Human Splicing Finder: An online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37, e67 (2009). doi:10.1093/nar/gkp215 Medline

43. P. Puigbò, I. G. Bravo, S. Garcia-Vallve, CAIcal: A combined set of tools to assess codon usage adaptation. Biol. Direct 3, 38 (2008). doi:10.1186/1745-6150-3-38 Medline

44. R. B. Voelker, J. A. Berglund, A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing. Genome Res. 17, 1023 (2007). doi:10.1101/gr.6017807 Medline

45. W. G. Fairbrother, R. F. Yeh, P. A. Sharp, C. B. Burge, Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007 (2002). doi:10.1126/science.1073774 Medline

46. X. H. Zhang, L. A. Chasin, Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 18, 1241 (2004). doi:10.1101/gad.1195304 Medline

47. D. Auch, M. Reth, Exon trap cloning: Using PCR to rapidly detect and clone exons from genomic DNA fragments. Nucleic Acids Res. 18, 6743 (1990). doi:10.1093/nar/18.22.6743 Medline

48. K. Tanaka, A. Watakabe, Y. Shimura, Polypurine sequences within a downstream exon function as a splicing enhancer. Mol. Cell. Biol. 14, 1347 (1994). Medline

49. I. Edery, N. Sonenberg, Cap-dependent RNA splicing in a HeLa nuclear extract. Proc. Natl. Acad. Sci. U.S.A. 82, 7590 (1985). doi:10.1073/pnas.82.22.7590 Medline

50. J. D. Dignam, R. M. Lebovitz, R. G. Roeder, Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 11, 1475 (1983). doi:10.1093/nar/11.5.1475 Medline

51. B. J. Lam et al., Enhancer-dependent 5′-splice site control of fruitless pre-mRNA splicing. J. Biol. Chem. 278, 22740 (2003). doi:10.1074/jbc.M301036200 Medline

52. I. Girault et al., Altered expression pattern of alternatively spliced estrogen receptor beta transcripts in breast carcinoma. Cancer Lett. 215, 101 (2004). doi:10.1016/j.canlet.2004.05.006 Medline

53. A. Gabory et al., H19 acts as a trans regulator of the imprinted gene network controlling growth in mice. Development 136, 3413 (2009). doi:10.1242/dev.036061 Medline

54. G. Smits et al., Conservation of the H19 noncoding RNA and H19-IGF2 imprinting mechanism in therians. Nat. Genet. 40, 971 (2008). doi:10.1038/ng.168 Medline

55. J. R. Weidman, D. C. Dolinoy, K. A. Maloney, J. F. Cheng, R. L. Jirtle, Imprinting of opossum Igf2r in the absence of differential methylation and Air. Epigenetics 1, 49 (2006). doi:10.4161/epi.1.1.2592 Medline

56. I. Y. Yotova et al., Identification of the human homolog of the imprinted mouse Air non-coding RNA. Genomics 92, 464 (2008). doi:10.1016/j.ygeno.2008.08.004 Medline

57. I. R. Chandrashekaran et al., The N-terminal subdomain of insulin-like growth factor (IGF) binding protein 6. Structure and interaction with IGFs. Biochemistry 46, 3065 (2007). doi:10.1021/bi0619876 Medline

58. H. Terasawa et al., Solution structure of human insulin-like growth factor II; recognition sites for receptors and binding proteins. EMBO J. 13, 5590 (1994). Medline

59. A. M. Torres et al., Solution structure of human insulin-like growth factor II. Relationship to receptor and binding protein interactions. J. Mol. Biol. 248, 385 (1995). doi:10.1016/S0022-2836(95)80058-1 Medline

60. S. J. Culler, K. G. Hoff, R. B. Voelker, J. A. Berglund, C. D. Smolke, Functional selection and systematic analysis of intronic splicing elements identify active sequence motifs and associated splicing factors. Nucleic Acids Res. 38, 5152 (2010). doi:10.1093/nar/gkq248 Medline

61. X. H. Zhang, T. Kangsamaksin, M. S. Chao, J. K. Banerjee, L. A. Chasin, Exon inclusion is dependent on predictable exonic splicing enhancers. Mol. Cell. Biol. 25, 7323 (2005). doi:10.1128/MCB.25.16.7323-7332.2005 Medline

62. J. V. Chamary, L. D. Hurst, Biased codon usage near intron-exon junctions: Selection on splicing enhancers, splice-site recognition or something else? Trends Genet. 21, 256 (2005). doi:10.1016/j.tig.2005.03.001 Medline

63. J. L. Parmley, L. D. Hurst, Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol. Biol. Evol. 24, 1600 (2007). doi:10.1093/molbev/msm104 Medline

64. J. L. Parmley, A. O. Urrutia, L. Potrzebowski, H. Kaessmann, L. D. Hurst, Splicing and the evolution of proteins in mammals. PLoS Biol. 5, e14 (2007). doi:10.1371/journal.pbio.0050014 Medline

65. T. Warnecke, J. L. Parmley, L. D. Hurst, Finding exonic islands in a sea of non-coding sequence: Splicing related constraints on protein composition and evolution are common in intron-rich genomes. Genome Biol. 9, R29 (2008). doi:10.1186/gb-2008-9-2-r29 Medline

66. Z. Zhang et al., Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC Biol. 7, 23 (2009). doi:10.1186/1741-7007-7-23 Medline

67. K. Bryson et al., Protein structure prediction servers at University College London. Nucleic Acids Res. 33, W36 (2005). doi:10.1093/nar/gki410 Medline

68. W. G. Krebs, M. Gerstein, The morph server: A standardized system for analyzing and visualizing macromolecular motions in a database framework. Nucleic Acids Res. 28, 1665 (2000). doi:10.1093/nar/28.8.1665 Medline

