Alternative Splicing

Post on 25-Feb-2016


Alternative Splicing

As an introduction to microarrays

Human Genome

• 90,000 human proteins; the number of genes was initially assumed to be close to that (initial estimates: 153,000)

• The 1,000-cell roundworm Caenorhabditis elegans has 19,500 genes; corn has 40,000 genes

• Current estimates are 25,000 or fewer genes

• Alternative splicing allows different tissue types to perform different functions with the same gene assortment

Implications

• 75% of human genes are subject to alternative splicing

• Faulty gene splicing can lead to cancer and congenital diseases

• Gene therapy can use splicing

Application

• We talked before about apoptosis, when the cell determines it can't be repaired

• Bcl-x, a regulator of apoptosis, is alternatively spliced to produce either Bcl-x(L), which suppresses apoptosis, or Bcl-x(S), which promotes it

Spliceosome

• Five snRNA molecules (U1, U2, U4, U5, U6) combine with as many as 150 proteins to form the spliceosome

• It recognizes the sites where introns begin and end

– Cuts introns out of the pre-mRNA

– Joins the exons

Spliceosome

• The 5’ splice site is at the beginning of the intron, the 3’ site is at the end

• The average human protein-coding gene is 28,000 nucleotides long, with 8.8 exons separated by 7.8 introns

• Exons are about 120 nucleotides long, while introns range from 100 to 100,000 nucleotides
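
The figures on this slide imply how little of a typical gene is exon. A minimal sketch of that arithmetic in Python follows; it uses only the slide's averages and assumes, as a simplification, that every exon is exactly 120 nucleotides long.

```python
# Rough arithmetic from the slide's averages (not measured data).
GENE_LENGTH_NT = 28_000   # average protein-coding gene length
EXONS_PER_GENE = 8.8      # average exon count per gene
EXON_LENGTH_NT = 120      # ASSUMPTION: every exon has this length


def exon_fraction(gene_len, n_exons, exon_len):
    """Return (total exon nucleotides, fraction of the gene that is exon)."""
    exon_total = n_exons * exon_len
    return exon_total, exon_total / gene_len


exon_total, fraction = exon_fraction(GENE_LENGTH_NT, EXONS_PER_GENE, EXON_LENGTH_NT)
intron_total = GENE_LENGTH_NT - exon_total

print(f"exon nucleotides:   {exon_total:.0f}")
print(f"intron nucleotides: {intron_total:.0f}")
print(f"exon fraction:      {fraction:.1%}")
```

Under these averages only about 4% of the gene is exon; everything else is intron that the spliceosome has to remove.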

Splicing errors

• Familial dysautonomia results from a single-nucleotide mutation that causes a gene to be alternatively spliced in nervous system tissue

• The resulting decrease in the IKBKAP protein leads to abnormal nervous system development (half of patients die before age 30)

• More than 15% of the gene mutations that cause genetic diseases and cancers do so through splicing errors

Why splicing

• Each gene generates 3 alternatively spliced mRNAs

• Why so much intron (1-2% of genome is exons)?

• Mouse and human differences are almost all splicing

• Half of the human genome is made up of transposable elements, Alus being the most abundant (1.4 million copies)

– They continue to multiply and insert themselves into the genome at the rate of one insertion per 100 human births

• Mutations in the Alu can create a 5’ or 3’ site in an intron, causing it to be an exon

• This mutation doesn’t impact existing exons

• It only has effect when it is alternatively spliced in

Microarrays For Alt. Splicing

• Use short oligonucleotides

• Get a guess at the rate of expression of the oligo

[Figure: gene model with Exon 1, Exon 2, Exon 3, Exon 4, and Exon 5]

Affymetrix Microarrays For Alt. Splicing

[Figure: gene model with Exon 1 to Exon 5, and two isoforms built from it]

Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5

Isoform 2: Exon 1, Exon 3, Exon 5

Probe types: Constitutive, Junction, Exon, Unique (“Cassette”)

Ideal Microarray Readings

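Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5

Isoform 2: Exon 1, Exon 3, Exon 5

Probe types: Constitutive, Junction, Exon, Unique (“Cassette”)

[Figure: isoform diagram with probes labeled a, b, c, d, e, and a plot of Expression (y-axis) against Probe (x-axis) for probes a to e]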
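
To make the idea of an ideal reading concrete, here is a minimal Python sketch. It assumes a noise-free array on which each probe's signal is simply the summed abundance of every isoform containing its target. The two isoforms come from the slide; the probe names, the choice of targets, and the abundances are hypothetical and do not correspond to the slide's probes a to e.

```python
# Ideal (noise-free) probe readings for the two isoforms on the slide.
# Probe names, targets, and abundances below are hypothetical.
ISOFORMS = {
    "isoform1": [1, 2, 4, 5],
    "isoform2": [1, 3, 5],
}


def junctions(exons):
    """Adjacent exon pairs that are joined in a spliced transcript."""
    return set(zip(exons, exons[1:]))


def ideal_signal(probe, abundance):
    """Sum the abundance of every isoform that contains the probe's target.

    probe is ("exon", n) or ("junction", (a, b)).
    """
    kind, target = probe
    total = 0.0
    for name, exons in ISOFORMS.items():
        if kind == "exon":
            present = target in exons
        else:
            present = target in junctions(exons)
        if present:
            total += abundance.get(name, 0.0)
    return total


PROBES = {
    "exon1 (constitutive)": ("exon", 1),
    "exon2 (cassette)": ("exon", 2),
    "exon3 (cassette)": ("exon", 3),
    "junction 1-2": ("junction", (1, 2)),
    "junction 1-3": ("junction", (1, 3)),
}

# Hypothetical tissue: 3 units of isoform 1 and 1 unit of isoform 2.
abundance = {"isoform1": 3.0, "isoform2": 1.0}
readings = {name: ideal_signal(p, abundance) for name, p in PROBES.items()}
for name, value in readings.items():
    print(f"{name:22s} {value:.1f}")
```

In this toy model the constitutive probe reports the total (4.0), while cassette and junction probes report only the isoform that contains them (3.0 or 1.0). That contrast is what lets an array tell isoforms apart.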

Motivation

• Why alternatively splice?

• How does it affect the resulting proteins?

• Look at domains:

– High level summary of protein

– ~80% of eukaryotic proteins are multi-domain

– Domains are big relative to an exon

Some Previous Work

• Signatures of domain shuffling in the human genome. Kaessmann, 2002.

– Intron phase symmetry around domain boundaries

• The Effects of Alternative Splicing On Transmembrane Proteins in the Mouse Genome. Cline, 2004.

– Half of TM proteins studied affected by alt-splicing.

Method

• Predict Alternative Splicing

• Predict Protein Domains

• Look for effects of Alt-Splicing on predicted domains

– “Swapping”

– “Knockout”

– “Clipping”

Microarray Design

• Genes based on mRNA and EST data in mouse

• Mapped to Feb. 2002 mouse genome freeze

• ~500,000 probes (~66,000 sets)

• ~100,000 transcripts

• ~13,000 gene models

Technical work

[Diagram: transcripts, probes, and gene models laid out in genome space, with overlaps between them; labels: Provided data, Generated data, Probe to transcript mapping]

Example entries shown on the slide:

E@NM_021320 cc-chr10-000017.82.0

G6836022@J911445 cc-chr10-000017.91.1

G6807921@J911524_RC cc-chr10-000018.4.0
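
The overlap step in the diagram can be illustrated with a small Python sketch. This is not the pipeline used for the array: it assumes probes and transcript exons are already placed on the genome as half-open intervals, and every identifier and coordinate below is hypothetical.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def map_probes_to_transcripts(probes, transcripts):
    """Return {probe_id: [transcript_id, ...]} for probes overlapping an exon.

    probes:      {probe_id: (chrom, start, end)}
    transcripts: {transcript_id: (chrom, [(exon_start, exon_end), ...])}
    """
    mapping = defaultdict(list)
    for probe_id, (p_chrom, p_start, p_end) in probes.items():
        for tx_id, (t_chrom, exons) in transcripts.items():
            if p_chrom != t_chrom:
                continue
            if any(overlaps(p_start, p_end, s, e) for s, e in exons):
                mapping[probe_id].append(tx_id)
    return dict(mapping)


# Hypothetical coordinates, for illustration only.
transcripts = {
    "tx_A": ("chr10", [(100, 200), (300, 400), (500, 600)]),
    "tx_B": ("chr10", [(100, 200), (500, 600)]),
}
probes = {
    "probe_1": ("chr10", 150, 175),   # in an exon shared by both transcripts
    "probe_2": ("chr10", 320, 345),   # in an exon only tx_A has
    "probe_3": ("chr10", 220, 245),   # intronic, maps to nothing
}

mapping = map_probes_to_transcripts(probes, transcripts)
print(mapping)
```

The nested loop is quadratic and only suitable for a toy example; at the scale of roughly 500,000 probes, a sorted sweep or an interval tree would be the usual choice. Junction probes, which span two exons, would need to be matched in transcript coordinates instead.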

Predicting Alternative Splicing

• Using mouse alt-splicing microarrays

• Data from Manny Ares

– 8 tissues

– 3 replicates of each tissue

Predicting Alternative Splicing

• General Approach: Clustering, then Anti-Clustering

[Figure: 107 clusters, with a detail view]

Gene Expression Measurement

• mRNA expression represents dynamic aspects of the cell

• mRNA expression can be measured with the latest technology (microarrays)

• mRNA is isolated and labeled with a fluorescent tag

• The labeled mRNA is hybridized to the probes on the array; a laser excites the label, and the measured light emission corresponds to the level of hybridization

Gene Expression Microarrays

The main types of gene expression microarrays:

• Short oligonucleotide arrays (Affymetrix)

• cDNA or spotted arrays (Brown/Botstein)

• Long oligonucleotide arrays (Agilent Inkjet)

• Fiber-optic arrays

• ...

Affymetrix Microarrays

[Figure: raw image of the array, with scale labels 50 µm and 1.28 cm]

~10^7 oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM)

Raw gene expression is the intensity difference: PM - MM
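
The PM - MM difference can be sketched in a few lines of Python. The slide gives only the per-pair difference; averaging it over the probe pairs of one probe set is an assumption made here for illustration, and the intensities are hypothetical.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one probe set.

    ASSUMPTION: pairs are summarized by a plain mean; the slide only
    states the per-pair difference PM - MM.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for one probe set of four probe pairs.
pm = [1200.0, 980.0, 1500.0, 1100.0]
mm = [300.0, 260.0, 410.0, 1250.0]   # last pair: MM is brighter than PM

diffs = [p - m for p, m in zip(pm, mm)]
signal = average_difference(pm, mm)
print(diffs)
print(signal)
```

The last pair shows a known weakness of the raw difference: when the mismatch probe is brighter than the perfect match, PM - MM goes negative, so a single bad pair can drag down the summary for the whole set.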

Microarray Potential Applications

• Biological discovery

– new and better molecular diagnostics

– new molecular targets for therapy

– finding and refining biological pathways

• Recent examples

– molecular diagnosis of leukemia, breast cancer, ...

– appropriate treatment for genetic signature

– potential new drug targets

Microarray Data Analysis Types

• Gene Selection

– find genes for therapeutic targets

– avoid false positives (FDA approval ?)

• Classification (Supervised)

– identify disease

– predict outcome / select best treatment

• Clustering (Unsupervised)

– find new biological classes / refine existing ones

– exploration

• …

Microarray Data Mining Challenges

• Too few records (samples), usually < 100

• Too many columns (genes), usually > 1,000

• Too many columns are likely to lead to false positives

• For exploration, a large set of all relevant genes is desired

• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed

• Model needs to be explainable to biologists
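
The false-positive problem can be put in numbers with a short Python sketch. It assumes each gene is tested independently at a fixed significance level and that no gene is truly different. The Bonferroni cutoff is included as one standard remedy; the slides do not name it.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing a per-gene test at level alpha
    purely by chance, assuming every null hypothesis is true."""
    return n_genes * alpha


def bonferroni_threshold(n_genes, alpha):
    """Per-gene p-value cutoff that keeps the family-wise error rate at alpha.

    ASSUMPTION: Bonferroni is used here only as a familiar example of a
    multiple-testing correction.
    """
    return alpha / n_genes


n_genes = 1000   # the slide's "usually > 1,000" columns
alpha = 0.05     # conventional per-test significance level (assumed)

false_hits = expected_false_positives(n_genes, alpha)
cutoff = bonferroni_threshold(n_genes, alpha)
print(false_hits)
print(cutoff)
```

With 1,000 genes and a 0.05 threshold, about 50 genes would look significant by chance alone, which is comparable to the number of samples typically available. That is why gene selection needs an explicit correction or a held-out validation set.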