Evolution of alternative splicing
Mikhail Gelfand
Institute for Information Transmission Problems, Russian Academy of Sciences
Workshop “Gene Annotation Analysis and Alternative Splicing”, Berlin, December 2004
Overview
• Exon-intron structure of orthologous genes
  – human–mouse
  – Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Evolution of splicing and regulatory sites
• Alternative splicing and protein structure
Alternative splicing of human (and mouse) genes
5% Sharp, 1994 (Nobel lecture)
35% Mironov-Fickett-Gelfand, 1999
38% Brett-…-Bork, 2000 (ESTs/mRNA)
22% Croft et al., 2000 (ISIS database)
55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)
42% Modrek et al., 2001 (HASDB)
~33% CELERA, 2001
59% Human Genome Consortium, 2001
28% Clark and Thanaraj, 2002
all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)
41% (mouse) FANTOM & RIKEN, 2002
60% (mouse) Zavolan et al., 2003
Data
• known alternative splicing
  – HASDB (human, ESTs+mRNAs)
  – ASMamDB (mouse, mRNAs+genes)
• additional variants
  – UniGene (human and mouse EST clusters)
• complete genes and genomic DNA
  – GenBank (full-length mouse genes)
  – human genome
Methods
• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
• BLASTN (human mRNAs against genome)
• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)
• Pro-Frame (spliced alignment, proteins against genomic DNA)
  – confirmation of orthology
    • same exon-intron structure
    • >70% identity over the entire protein length
  – analysis of conservation of alternative splicing
    • conservation of exons or parts of exons
    • conservation of sites
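The two orthology criteria above can be sketched as a simple check. This is a minimal illustration, not the Pro-Frame procedure itself: the representation of introns as (aligned protein position, phase) pairs, the treatment of gaps as mismatches, and all function names are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed representation: (aligned protein position, intron phase).
Intron = Tuple[int, int]


def same_exon_intron_structure(a: List[Intron], b: List[Intron]) -> bool:
    """True when both genes have introns at the same aligned positions and phases."""
    return sorted(a) == sorted(b)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over the whole alignment length; gap columns ('-') count as mismatches."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def is_confirmed_ortholog(aligned_a: str, aligned_b: str,
                          introns_a: List[Intron], introns_b: List[Intron],
                          min_identity: float = 70.0) -> bool:
    """Both slide criteria: same exon-intron structure and >70% identity."""
    return (same_exon_intron_structure(introns_a, introns_b)
            and percent_identity(aligned_a, aligned_b) > min_identity)
```

The 70% threshold is the one stated on the slide; how "same exon-intron structure" was actually tested is not specified there, so the intron comparison is only one plausible reading.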
166 gene pairs
Known alternative splicing: human 126, mouse 124
(42 in human only, 84 in both, 40 in mouse only)
Elementary alternatives
Cassette exon
Alternative donor site
Alternative acceptor site
Retained intron
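As an illustration of the four elementary alternatives, the sketch below compares two isoforms given as exon coordinate lists. It is a hypothetical helper, not the method used in the study: it assumes plus-strand genes, inclusive coordinates and non-empty isoforms, and the function names are invented for the example.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive; plus strand assumed


def overlaps(e: Exon, f: Exon) -> bool:
    return e[0] <= f[1] and f[0] <= e[1]


def _one_way(a: List[Exon], b: List[Exon]) -> Set[str]:
    """Events seen when walking the exons of isoform a against isoform b."""
    events: Set[str] = set()
    b_start, b_end = b[0][0], b[-1][1]
    for e in a:
        hits = [f for f in b if overlaps(e, f)]
        if not hits:
            # Exon absent from b and lying inside b's span: skipped in b.
            if b_start < e[0] and e[1] < b_end:
                events.add("cassette exon")
        elif len(hits) >= 2:
            # One exon of a covers two exons of b and the intron between them.
            events.add("retained intron")
        else:
            f = hits[0]
            if sum(overlaps(f, g) for g in a) != 1:
                continue  # f spans several exons of a; reported from the other side
            if e[0] == f[0] and e[1] != f[1]:
                events.add("alternative donor site")
            elif e[1] == f[1] and e[0] != f[0]:
                events.add("alternative acceptor site")
    return events


def elementary_alternatives(iso1: List[Exon], iso2: List[Exon]) -> Set[str]:
    """Symmetric comparison of two isoforms of the same gene."""
    a, b = sorted(iso1), sorted(iso2)
    return _one_way(a, b) | _one_way(b, a)
```

For example, `elementary_alternatives([(1, 100), (201, 300), (401, 500)], [(1, 100), (401, 500)])` reports a cassette exon, because the middle exon has no counterpart in the second isoform.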
Human genes
                    mRNA                EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons      56     25           74     26
Alt. donors         18     7            16     10
Alt. acceptors      13     5            19     15
Retained introns    4      3            5      0
Total               96     30           114    51
Total genes         45     28           41     44
Conserved elementary alternatives: 69% (EST) - 76% (mRNA)
Genes with all isoforms conserved: 57 (45%)
Mouse genes
                    mRNA                EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons      70     5            39     9
Alt. donors         24     6            17     6
Alt. acceptors      15     6            16     9
Retained introns    8      7            10     4
Total               117    24           82     28
Total genes         68     22           30     26
Conserved elementary alternatives: 75% (EST) - 83% (mRNA)
Genes with all isoforms conserved: 79 (64%)
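The "conserved elementary alternatives" percentages for human and mouse follow directly from the Total rows of the two tables. A minimal sketch of that arithmetic, using only the Total-row counts (the variable names are illustrative):

```python
# (conserved, non-conserved) counts taken from the "Total" rows of the
# human and mouse tables.
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}


def conserved_percent(conserved: int, non_conserved: int) -> float:
    """Share of conserved elementary alternatives, in percent."""
    return 100.0 * conserved / (conserved + non_conserved)


percentages = {key: round(conserved_percent(*counts)) for key, counts in totals.items()}

for (species, source), pct in percentages.items():
    print(f"{species} {source}: {pct}% conserved, {100 - pct}% non-conserved")
```

The complements of these values give the non-conserved fractions per species and data source.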
Real or aberrant non-conserved AS?
• 24-31% of human vs. 17-25% of mouse elementary alternatives are not conserved
• 55% of human vs. 36% of mouse genes have at least one non-conserved variant
• denser coverage of human genes by ESTs:
  – picks up rare (tissue- and stage-specific) => younger variants
  – picks up aberrant (non-functional) variants
• 17-24% of mRNA-derived elementary alternatives are non-conserved (compared to 25-31% of EST-derived ones)
smoothelin
[Figure: human, common and mouse isoforms; human-specific donor site; mouse-specific cassette exon]
autoimmune regulator
[Figure: human, common and mouse isoforms; retained intron; downstream exons read in two frames]
Na/K-ATPase gamma subunit (Fxyd2)
[Figure: human, common and mouse isoforms; (deleted) intron; alternative acceptor site within (inserted) intron]
Comparison to other studies. Modrek and Lee, 2003: skipped exons
• 98% constitutive exons are conserved
• 98% major form exons are conserved
• 28% minor form exons are conserved
• inclusion level is a good predictor of conservation
• inclusion level of conserved exons in human and mouse is highly correlated
Minor non-conserved form exons are errors? No:
• minor form exons are supported by multiple ESTs
• 28% of minor form exons are upregulated in one specific tissue
• 70% of tissue-specific exons are not conserved
• splicing signals of conserved and non-conserved exons are similar
Thanaraj et al., 2003: extrapolation from EST comparisons
• 61% (47-86%) alternative splice junctions are conserved
• 74% (71-78%) constitutive splice junctions are conserved
• the former number is consistent with other studies, whereas the latter seems to be an underestimate
Regulation of alternative splicing: introns
• Brudno et al., 2001: UGCAUG is over-represented downstream of tissue-specific exons (brain, muscle).
• Sorek and Ast, 2003: Enhanced conservation (between human and mouse) in intronic sequences flanking alternatively spliced exons. UGCAUG is over-represented in conserved regions.
Fruit fly and mosquito
• Technically more difficult than human-mouse:
  – incomplete genomes
  – difficulties in alignment, especially at gene termini
  – changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Filtering of the dataset
• FlyBase: alternatively spliced fruit fly gene and all its protein isoforms
  – Non-canonical sites: exclude isoform
• Pro-Frame alignment of all isoforms with the fruit fly genome
  – Frameshift or in-frame stop for at least one isoform: exclude gene
  – No constitutive segments inside gene: exclude gene
• List of filtered fruit fly genes
• ENSEMBL: mosquito genes; list of orthologous pairs
• Pro-Frame alignment of all fruit fly isoforms with the mosquito genome
  – Similarity for all isoforms <30%: exclude orthologous pair
  – Poly-N within aligned region in the mosquito genome for at least one isoform: exclude orthologous pair
• Set of filtered orthologous pairs
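The exclusion rules of this filtering scheme can be written as two predicates. The sketch below is hypothetical: the record layout (dictionary keys such as `non_canonical_sites`) and the function names are invented for illustration, and the order in which the rules are applied is an assumption based on the flowchart.

```python
from typing import Dict, List, Optional


def filter_fly_gene(gene: Dict) -> Optional[Dict]:
    """Apply the fruit fly-side rules; return the filtered gene, or None if excluded."""
    # Non-canonical sites: exclude the isoform, keep the gene.
    isoforms = [i for i in gene["isoforms"] if not i["non_canonical_sites"]]
    if not isoforms:
        return None  # assumption: a gene with no remaining isoforms is dropped
    # Frameshift or in-frame stop for at least one isoform: exclude the gene.
    if any(i["frameshift_or_inframe_stop"] for i in isoforms):
        return None
    # No constitutive segments inside the gene: exclude the gene.
    if not gene["has_constitutive_segment"]:
        return None
    return {**gene, "isoforms": isoforms}


def keep_orthologous_pair(isoform_alignments: List[Dict],
                          min_similarity: float = 30.0) -> bool:
    """Apply the mosquito-side rules to the alignments of all fruit fly isoforms."""
    # Similarity for all isoforms below 30%: exclude the pair.
    if all(a["similarity"] < min_similarity for a in isoform_alignments):
        return False
    # Poly-N within the aligned mosquito region for at least one isoform: exclude.
    if any(a["poly_n_in_aligned_region"] for a in isoform_alignments):
        return False
    return True
```

Only the 30% similarity threshold and the exclusion conditions come from the slide; everything else is scaffolding for the example.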
Classification of exons and coding segments
• for each pair of isoforms define: mutually Exclusive exon, Cassette exon, retained Intron, alternative Acceptor site, alternative Donor site; then merge these definitions over all pairs for a gene
[Figure: isoforms 1-3 of an example gene, with exons, alternative coding segments and constitutive coding segments marked. Exons and segments carry five-position labels built from the capitalized letters above (for example --I--, ----D, ---AD, ---A-, -C---, EC---, E----; ----- for a constitutive exon) and are grouped into left marginal, internal and right marginal exons and coding segments.]
How to define conservation of fruit fly alternative exons
• Alignment of an exon may depend on the isoform. In the cases listed below, shorter exons are assumed to be conserved, whereas longer ones are considered missing
[Figure: isoforms 1 and 2, with missing exons marked by asterisks. Legend: (1) similarity in alignments of all isoforms including this segment was less than 35%; (2) similarity in alignment of at least one isoform including this segment was greater than 35%.]
Conservation of fruit fly coding segments in the mosquito genome. Small (curated) sample
Type of segment                 Missing      Conserved    Total
left marginal (alternative)     46 (77%)     14 (23%)     60 (12%)
internal alternative            22 (55%)     18 (45%)     40 (8%)
internal constitutive           83 (24%)     264 (76%)    347 (69%)
right marginal (alternative)    31 (56%)     24 (44%)     55 (11%)
Total                           182 (36%)    320 (64%)    502 (100%)
Conservation of fruit fly coding segments in the mosquito genome. Large (non-curated) sample
Type of segment                 Missing       Conserved     Total
left marginal (alternative)     858 (57%)     639 (43%)     1497 (23%)
internal alternative            215 (55%)     178 (45%)     393 (6%)
internal constitutive           903 (23%)     2999 (77%)    3902 (59%)
right marginal (alternative)    414 (53%)     369 (47%)     783 (12%)
Total                           2390 (36%)    4185 (64%)    6575 (100%)
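To compare alternative and constitutive segments as two groups, the three alternative rows of the large sample can be pooled. The sketch below only re-derives numbers from the table; the pooling itself is an illustrative choice, not a calculation shown on the slides.

```python
# (missing, conserved) counts from the large (non-curated) sample.
large_sample = {
    "left marginal (alternative)": (858, 639),
    "internal alternative": (215, 178),
    "internal constitutive": (903, 2999),
    "right marginal (alternative)": (414, 369),
}


def conserved_share(rows):
    """Percent of conserved segments over the given (missing, conserved) rows."""
    missing = sum(m for m, _ in rows)
    conserved = sum(c for _, c in rows)
    return 100.0 * conserved / (missing + conserved)


alternative_rows = [v for k, v in large_sample.items() if "alternative" in k]
constitutive_rows = [v for k, v in large_sample.items() if "constitutive" in k]

alt_pct = conserved_share(alternative_rows)
const_pct = conserved_share(constitutive_rows)
total_segments = sum(m + c for m, c in large_sample.values())

print(f"alternative: {alt_pct:.1f}% conserved")
print(f"constitutive: {const_pct:.1f}% conserved")
```

Pooled this way, the constitutive segments are conserved markedly more often than the alternative ones, in line with the per-row percentages.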
Classification of slice events for fruit fly exons
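• divided exon
• joined exon
• exactly conserved exon
• mixed
[Figure: Drosophila (Dr) and Anopheles (An) gene structures with slices and exons marked; exons are labelled d, j, e and m.]
Different types of events for the same exon dependent on an isoform
[Figure: the same Dr exon aligned to An in isoform 1 and in isoform 2, with slices and exons marked; the labels (d, j, e) differ between the two isoforms.]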
Types of elementary alternatives and conservation of fruit fly exons in the mosquito genome. Large (non-curated) sample, internal exons
                   missing      mixed       joined       divided      exact
constitutive       728 (23%)    212 (7%)    754 (23%)    407 (13%)    1356 (42%)
Donor site         229 (50%)    21 (5%)     52 (11%)     47 (10%)     130 (28%)
Acceptor site      390 (43%)    45 (5%)     133 (15%)    124 (14%)    250 (28%)
retained Intron    37 (70%)     3 (6%)      2 (4%)       8 (15%)      6 (11%)
Cassette exon      90 (59%)     4 (3%)      9 (6%)       6 (4%)       50 (33%)
Exclusive exon     10 (15%)     1 (1%)      1 (1%)       1 (1%)       55 (82%)
[Figure: stacked bar chart of the same data (large non-curated sample, internal exons). For each exon type (constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon) the bars show the share of exact, divided, joined, mixed and missing exons; vertical axis 0-100%.]
Fruit fly and mosquito
• The general results are the same as for the human-mouse comparison: more conservation of constitutive segments than alternative ones:
  – 75% const. and 45% alt. segments are conserved
  – constitutive exons: >50% conserved exactly, ~25% intron in Drosophila, ~8% intron in Anopheles
  – conservation of alternatives: 36% cassette exons, 51% donor sites, 63% acceptor sites, 83% mutually exclusive exons
Concatenates of constitutive and alternative regions in all genes: different evolutionary rates
Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end
[Figure: two bar charts; values read in column order]
                       Constitutive   N-end alternative   Internal alternative   C-end alternative
dN/dS                  0.176          0.199               0.187                  0.301
Amino-acid identity    0.886          0.874               0.878                  0.807
• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Less amino acid identity in alternative regions
Genes with length of both const. and alt. reg. > 80 nt
• Horizontal axis: difference in dN/dS in const. and alt. regions
• Vertical axis: number of genes
• Violet: dN/dS in const. regions > dN/dS in alt. regions
• Yellow: dN/dS in const. regions < dN/dS in alt. regions
[Figure: histogram of gene counts (vertical axis 0-900) over difference bins 0.0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, >0.5. Bar values for the two series: 658, 207, 79, 27, 19, 27 and 773, 333, 140, 71, 44, 58.]
279 proteins from SwissProt+TREMBL with “varsplic” features
              constitutive    alternative    % alt. to all
length        199270          66054          25%
all SNPs      1126            368            25%
synonymous    576 (51%)       167 (45%)      22%
benign        401 (36%)       141 (38%)      26%
damaging      149 (13%)       60 (16%)       29%
Again, there is some evidence of positive selection towards diversity. This is not due to aberrant ESTs (only protein data are considered).
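The "% alt. to all" column is the share of each row that falls in alternative regions. A small sketch reproducing it from the constitutive and alternative counts (variable names are illustrative):

```python
# (constitutive, alternative) values from the SNP table.
rows = {
    "length": (199270, 66054),
    "all SNPs": (1126, 368),
    "synonymous": (576, 167),
    "benign": (401, 141),
    "damaging": (149, 60),
}

# Share of each row located in alternative regions, in percent.
shares = {name: 100.0 * alt / (const + alt) for name, (const, alt) in rows.items()}

for name, share in shares.items():
    print(f"{name:<12}{share:5.1f}%")
```

Alternative regions make up about a quarter of the sequence length, so a share above that baseline (as for damaging SNPs) indicates enrichment in alternative regions, and a share below it (as for synonymous SNPs) indicates depletion.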
Data
• Alternatively spliced genes (proteins) from SwissProt
  – human
  – mouse
• Protein structures from PDB
• Domains from InterPro
  – SMART
  – Pfam
  – Prosite
  – etc.
[Figure, panel a): stacked bars, Expected vs. Observed, vertical axis 0-100%. Categories: non-domain functional units partially; domains partially; no annotated unit affected; non-domain functional units completely; domains completely. Bar labels as extracted: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%.]
Alternative splicing avoids disrupting domains (and non-domain units)
Control:
fix the domain structure; randomly place alternative regions
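The control described here (fixed domains, randomly placed alternative regions) can be sketched as a small simulation. This is an illustration under stated assumptions, not the procedure from the study: intervals are half-open residue ranges, a region that fully covers at least one domain is counted as "complete" even if it also clips another, placement is uniform, and the region is assumed to be no longer than the protein. All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in residue coordinates


def classify(region: Interval, domains: List[Interval]) -> str:
    """Classify how an alternative region affects the annotated domains."""
    start, end = region
    touched = [d for d in domains if d[0] < end and start < d[1]]
    if not touched:
        return "none"
    if any(start <= d[0] and d[1] <= end for d in touched):
        return "complete"
    return "partial"


def expected_distribution(protein_len: int, domains: List[Interval],
                          region_len: int, trials: int = 10000,
                          seed: int = 0) -> Dict[str, float]:
    """Expected class frequencies when the region is placed uniformly at random."""
    rng = random.Random(seed)
    counts = {"none": 0, "partial": 0, "complete": 0}
    for _ in range(trials):
        start = rng.randint(0, protein_len - region_len)
        counts[classify((start, start + region_len), domains)] += 1
    return {k: v / trials for k, v in counts.items()}
```

Comparing such expected frequencies with the observed ones, per protein and summed over the dataset, gives an Expected vs. Observed contrast of the kind plotted on these slides.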
… and this is not simply a consequence of the (disputed) exon-domain correlation
[Figure: AS and exon boundaries and SMART domains. Ratio (observed/expected), vertical axis 0-1, of boundaries inside domains vs. outside domains, for mouse and human, in three groups: nonAS_Exons, AS_Exons, AS.]
Positive selection towards domain shuffling (not simply avoidance of disrupting domains)
[Figure: panel a) repeated from the earlier slide; panel b) Expected vs. Observed, with categories: domains completely; non-domain units completely; no annotated units affected.]
Short (<50 aa) alternative splicing events within domains target protein functional sites
[Figure: panel a) repeated from the earlier slide; panel c) Expected vs. Observed, with categories: Prosite patterns unaffected; Prosite patterns affected; FT positions unaffected; FT positions affected.]
An attempt at integration
• AS is often young (as opposed to degenerating)
• young AS isoforms are often minor and tissue-specific
• … but still functional
  – although unique isoforms may be the result of aberrant splicing
• AS regions show evidence for positive selection
  – excess damaging SNPs
  – excess non-synonymous codon substitutions
What to do
• Each isoform (alternative region) can be characterized:
  – by conservation (between genomes)
  – if conserved, by selection (positive vs negative)
    • human-mouse, also add rat; compare species of Drosophila and Caenorhabditis
  – pattern of SNPs (synonymous, benign, damaging)
  – tissue-specificity
    • in particular, whether it is cancer-specific
  – degree of inclusion (major/minor)
  – functionality (for isoforms)
    • whether it generates a frameshift
    • how bad it is (the distance between the stop-codon and the last exon-exon junction)
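The two functionality measures in the last item can be computed from simple isoform coordinates. A minimal sketch, assuming mRNA coordinates are 0-based, exon lengths are given in transcript order, and the alternative region is a pure insertion or deletion; the sign convention (positive when the stop codon lies upstream of the last junction) is chosen for the example.

```python
from typing import List


def causes_frameshift(alt_region_len_nt: int) -> bool:
    """An inserted or skipped region shifts the frame unless its length is a multiple of 3."""
    return alt_region_len_nt % 3 != 0


def stop_to_last_junction(exon_lengths: List[int], stop_codon_start: int) -> int:
    """Distance in nt from the stop codon to the last exon-exon junction.

    Positive values mean the stop codon lies upstream of the last junction;
    negative values mean it lies in the last exon.
    """
    if len(exon_lengths) < 2:
        raise ValueError("at least two exons are needed to have a junction")
    last_junction = sum(exon_lengths[:-1])  # mRNA offset where the last exon begins
    return last_junction - stop_codon_start
```

For an isoform with exons of 100, 150 and 200 nt, a stop codon starting at position 120 lies 130 nt upstream of the last junction, while one starting at position 400 lies in the last exon.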
What to expect
• Cancer-specific isoforms will be less functional and more often non-conserved
• Set of non-conserved isoforms will contain a larger fraction of non-functional isoforms; and this may influence evolutionary conclusions on the sequence level
• Still, after removal of non-functional isoforms, one would see positive selection in alternative regions (more non-synonymous substitutions compared to constant regions etc.), especially in tissue-specific ones
References
Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics 12: 1313-1320.
Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S (2003) Increase of functional diversity by alternative splicing. Trends in Genetics 19: 124-128.
Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG (2001) Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Research 29: 2338-2348.
Mironov AA, Fickett JW, Gelfand MS (1999) Frequent alternative splicing of human genes. Genome Research 9: 1288-1293.
Acknowledgements
• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)
• Support
  – Ludwig Institute of Cancer Research
  – Howard Hughes Medical Institute
  – Russian Fund of Basic Research
  – Russian Academy of Sciences
Authors
• Andrei Mironov (Moscow State University) – spliced alignment
• Ramil Nurtdinov (Moscow State University) – human/mouse comparison
• Irena Artamonova (Institute of Bioorganic Chemistry, now Institute of Bioinformatics, GSF) – human/mouse comparison, MAGEA family
• Dmitry Malko (GosNIIGenetika) – Drosophila/Anopheles comparison
• Inna Dubchak (Lawrence Berkeley Lab) – sites
• Michael Brudno (UC Berkeley, now Stanford) – sites
• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions
• Vasily Ramensky (Institute of Molecular Biology) – SNPs
• Eugenia Kriventseva (EBI, now BASF) – protein structure
• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure