Whole genome transcriptome variation in Arabidopsis thaliana

Xu Zhang

Borevitz Lab

Post on 22-Dec-2015



Arabidopsis thaliana has adapted to highly variable environments

Transcription and splicing

[Figure: chromosomal DNA is transcribed into nuclear RNA (Exon 1, Intron 1, Exon 2, Intron 2, Exon 3); RNA splicing then yields messenger RNA, either Exon 1 + Exon 2 + Exon 3 or Exon 1 + Exon 3.]

Whole genome tiling array

Genetic hybridization polymorphisms could affect the estimation of gene expression

High density and resolution: 1.6M unique probes at 35 bp spacing

Without bias toward known transcripts

The experiment

Crosses: Col ♀ x Col ♂, Van ♀ x Van ♂, Col ♀ x Van ♂, Van ♀ x Col ♂

Parental strains and reciprocal F1 hybrids

Samples: mRNA from total RNA; genomic DNA

Double-stranded random labeling

Random reverse transcription

Double-stranded cDNA

Random priming


Outline

Sequence polymorphisms

Gene expression variation

Splicing variation

A functional network of differentially spliced genes

HMM for de novo transcription profiling

Sequence polymorphisms

Single Feature Polymorphisms (SFPs) and indels

[Figure: examples of SFPs and of a deletion or duplication in Van.]

Sequence polymorphisms

SFPs and indels (>200 bp) were removed before gene expression analysis

SFPs

FDR      Col > Van   Van > Col   Total
11.82%   135769      14934       150703
7.66%    126443      9479        135922
5.22%    118381      6662        125043
3.88%    110861      4979        115840
3.15%    104115      3820        107935

Indels

Model selection   Deletion   Duplication   Total
BIC               518        22            540
AIC               1645       136           1781

Deletions vs duplications

Distribution of indels along chromosomes

Gene expression variation

Additive, dominant and maternal effects of gene expression

The linear model

Gene probe intensity ~ additive + dominant + maternal + ε

[Figure: probe intensity (y-axis) by genotype (x-axis: Col, Van, F1c, F1v), illustrating the additive, dominant, and maternal effects.]
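A minimal sketch of how the linear model above could be fitted for one gene by ordinary least squares. The contrast coding is an assumption, since the slides do not give it: additive is +1 for Col and -1 for Van, dominant is 1 for both F1 hybrids, and maternal is +1 for F1c and -1 for F1v (F1c is taken here to be the hybrid with a Col mother). The names `CONTRASTS` and `fit_effects` are hypothetical.

```python
import numpy as np

# Hypothetical contrast coding (not stated on the slides):
#   additive: Col = +1, Van = -1, F1 hybrids = 0
#   dominant: F1 hybrids = 1, parents = 0
#   maternal: F1c = +1, F1v = -1, parents = 0
CONTRASTS = {
    "Col": (1.0, 0.0, 0.0),
    "Van": (-1.0, 0.0, 0.0),
    "F1c": (0.0, 1.0, 1.0),
    "F1v": (0.0, 1.0, -1.0),
}

def fit_effects(genotypes, intensity):
    """Least-squares fit of intensity ~ intercept + additive + dominant + maternal."""
    # One design row per sample: intercept column followed by the three contrasts.
    design = np.array([(1.0, *CONTRASTS[g]) for g in genotypes])
    y = np.asarray(intensity, dtype=float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(("intercept", "additive", "dominant", "maternal"), coef))

# Toy data: two replicates per genotype, no noise.
genotypes = ["Col", "Col", "Van", "Van", "F1c", "F1c", "F1v", "F1v"]
intensity = [10.0, 10.0, 8.0, 8.0, 10.5, 10.5, 9.5, 9.5]
effects = fit_effects(genotypes, intensity)
```

Under this coding the intercept is the mid-parent value, the additive term is half the Col minus Van difference, the dominant term is the deviation of the F1 mean from the mid-parent value, and the maternal term is half the difference between the reciprocal hybrids.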

Gene expression variation between genotypes

Effect     Delta   Sig+   Sig-   Total   False   FDR
additive   0.5     4911   3967   8878    901     10.15%
           1       2674   1736   4410    215     4.88%
           1.5     1626   923    2549    70      2.76%
           1.8     1249   676    1925    39      2.03%
           2.5     690    334    1024    13      1.24%
dominant   0.5     1511   3190   4701    767     16.31%
           1       405    1521   1926    186     9.65%
           1.5     157    811    968     67      6.93%
           1.8     92     575    667     40      5.99%
           2.5     41     270    311     14      4.65%
maternal   0.5     5998   95     6093    735     12.06%
           1       2046   8      2054    151     7.37%
           1.5     480    0      480     49      10.29%
           1.8     163    0      163     28      17.33%
           2.5     41     0      41      9       22.84%
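The FDR column in this table is consistent with the expected number of false calls divided by the total number of significant calls. The sketch below reproduces three rows on that assumption; the function name `estimated_fdr` is hypothetical, and small mismatches in other rows suggest the False counts shown are rounded.

```python
def estimated_fdr(false_calls, total_calls):
    """Estimated FDR: expected false calls divided by total significant calls."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return false_calls / total_calls

# (delta, total, false) copied from the additive rows of the table.
additive_rows = [(0.5, 8878, 901), (1.0, 4410, 215), (1.8, 1925, 39)]
fdr_percent = [round(100 * estimated_fdr(f, t), 2) for _, t, f in additive_rows]
```

Because this is a ratio of an expected count to an observed count, the estimate can exceed 100% when the expected false calls outnumber the significant calls.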

The pattern of gene expression inheritance

[Figure: mean gene intensity across genotypes (Col, Van, F1v, F1c), with panels for Van dominant, Col dominant, over dominant, F1v dominant, F1c dominant, and maternal/paternal patterns.]

Enrichment in GO functional categories

GO enrichment for additive, dominant, and maternal effect genes

Defense response genes are highly expressed in F1 hybrid lines, while many growth-related pathways are down-regulated

Splicing variation

Default expression status of exon and intron

Exons: correction for gene expression

corrected by gene mean

corrected by gene median

splicing index (Mean_exon / Mean_gene)

Introns: direct comparison

Exon/intron probe Intensity ~ additive + dominant + maternal + ε
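The three exon corrections listed above can be sketched as follows. This assumes probe intensities are on a log scale, so that "corrected by" means subtracting the gene-level summary from each exon probe; the slides do not state this. The splicing index is computed as the ratio of means, as written on the slide. The function name `exon_corrections` is hypothetical.

```python
import numpy as np

def exon_corrections(exon_probes, gene_probes):
    """Return the three exon-level summaries for one exon of one gene.

    exon_probes: intensities of the probes inside the exon
    gene_probes: intensities of all probes in the gene (including the exon)
    """
    exon = np.asarray(exon_probes, dtype=float)
    gene = np.asarray(gene_probes, dtype=float)
    return {
        # Assumed: subtraction on a log scale removes gene-level expression.
        "mean_corrected": exon - gene.mean(),
        "median_corrected": exon - np.median(gene),
        # Ratio of mean exon intensity to mean gene intensity.
        "splicing_index": exon.mean() / gene.mean(),
    }

result = exon_corrections([8.0, 10.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

The corrected values would then enter the same additive + dominant + maternal model in place of the raw exon probe intensity.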

Differential exon splicing

Exon probe Intensity ~ additive + dominant + maternal + ε

Correction                 Delta   Sig+   Sig-   Total   False   FDR
Corrected by gene mean     0.3     287    190    477     559     117%
                           0.4     177    129    306     205     67.0%
                           0.5     127    109    236     97      41.0%
                           0.6     92     86     178     55      30.8%
                           0.7     77     69     146     34      23.4%
Corrected by gene median   0.3     523    280    803     556     69.2%
                           0.4     328    172    500     203     40.6%
                           0.5     223    120    343     96      28.0%
                           0.6     154    76     230     54      23.5%
                           0.7     123    52     175     34      19.3%
Splicing index             0.3     407    235    642     425     66.0%
                           0.4     292    175    467     132     28.0%
                           0.5     230    143    373     50      13.0%
                           0.6     178    104    282     21      7.50%
                           0.7     148    86     234     10      4.30%

Differential intron splicing

Intron probe Intensity ~ additive + dominant + maternal + ε

Delta   Sig+   Sig-   Total   False   FDR
0.3     561    1034   1595    332     20.8%
0.4     405    523    928     85      9.17%
0.5     316    352    668     28      4.26%
0.6     239    220    459     12      2.61%
0.7     202    155    357     7       1.91%
0.8     176    120    296     5       1.53%

Differential exon splicing is predominantly additive in F1 hybrids

Some dominant effect in differential intron splicing in F1 hybrids

Comparison for enrichment in known alternatively spliced exons

                                                    Threshold 1           Threshold 2
Correction                        Exon class        Called   Not called   Called   Not called
Corrected by gene mean            Known             28       991          7        1012
                                  Not known         397      55145        90       55452
                                  Fold enrichment   3.92                  4.26
                                  p-value           5.97E-09              1.90E-03
Corrected by gene median polish   Known             24       995          6        1013
                                  Not known         430      55112        85       55457
                                  Fold enrichment   3.09                  3.86
                                  p-value           3.60E-06              6.14E-03
Splicing index                    Known             24       1093         5        1112
                                  Not known         537      72328        88       72777
                                  Fold enrichment   2.96                  3.72
                                  p-value           6.84E-06              1.36E-02
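The fold enrichment values in this table match the odds ratio of each 2 x 2 block (called vs. not called, known vs. not known). The sketch below computes that ratio and, as one plausible test, a one-sided hypergeometric tail probability. The slides do not say which test produced the reported p-values, so the p-value function is an assumption and is not expected to reproduce them exactly. Both function names are hypothetical.

```python
from math import comb

def odds_ratio(known_called, known_not, other_called, other_not):
    """Odds of being called for known exons divided by the odds for other exons."""
    return (known_called * other_not) / (known_not * other_called)

def enrichment_p_value(known_called, known_not, other_called, other_not):
    """One-sided hypergeometric tail: probability of seeing at least
    known_called known exons among the called exons by chance."""
    known = known_called + known_not
    other = other_called + other_not
    called = known_called + other_called
    upper = min(known, called)
    # Exact integer arithmetic; converted to float only in the final division.
    tail = sum(comb(known, k) * comb(other, called - k)
               for k in range(known_called, upper + 1))
    return tail / comb(known + other, called)

# "Corrected by gene mean" block of the table.
fold_t1 = odds_ratio(28, 991, 397, 55145)
fold_t2 = odds_ratio(7, 1012, 90, 55452)
p_t1 = enrichment_p_value(28, 991, 397, 55145)
```

Rounded to two decimals, `fold_t1` and `fold_t2` agree with the 3.92 and 4.26 shown for the first block.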

AT1G21350

AT1G34180

AT1G76170

AT1G29120

AT1G51350

AT1G80960

AT1G07350

Experimentally determined FDR for differential splicing

Method                       Significant calls   Estimated FDR   Tested   Confirmed   Experimental FDR
Exon (corrected by mean)     477                 117%            45       22          51.1%
                             111                 20.8%           18       10          44.4%
Exon (corrected by median)   500                 40.6%           40       21          47.5%
                             103                 15.60%          17       10          41.2%
Exon (splicing index)        642                 66.0%           50       23          54.0%
                             102                 1.00%           20       10          50.0%
Intron                       459                 2.61%           65       38          41.5%
                             195                 1.15%           58       33          43.1%
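The experimental FDR values in this table are consistent with the fraction of tested calls that were not confirmed. A short check on two rows, with a hypothetical function name:

```python
def experimental_fdr(tested, confirmed):
    """Fraction of experimentally tested calls that failed validation."""
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need tested > 0 and 0 <= confirmed <= tested")
    return (tested - confirmed) / tested

# (tested, confirmed) from the first exon row and the first intron row.
exon_mean_fdr = round(100 * experimental_fdr(45, 22), 1)
intron_fdr = round(100 * experimental_fdr(65, 38), 1)
```

The experimental rate stays in the range of roughly 41% to 54% in every row, whatever the estimated FDR at that threshold.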

A functional network of differentially spliced genes

Enrichment of differentially spliced genes in chloroplast thylakoid

[Figure: enrichment of differentially spliced genes in the chloroplast thylakoid.]

Differentially spliced genes located in the chloroplast thylakoid

Photosynthesis related genes

AT5G38660 APE1 (Acclimation of Photosynthesis to Environment): mutant has altered acclimation responses

Splicing regulators tend to be differentially spliced

AT1G07350 transformer serine/arginine-rich ribonucleoprotein, putative

AT1G55310 SC35-like splicing factor, 33 kD (SCL33)

AT2G29210 splicing factor PWI domain-containing protein

AT5G04430 KH domain-containing protein NOVA, putative

HMM for de novo transcription profiling

Generalized tiling array HMM

3-state HMM

Discrete distribution for emission probability

Transition probability accounts for probe spacing

Baum-Welch parameter estimation

(by Jake Byrnes)
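A minimal sketch of how a 3-state HMM with discrete emissions and spacing-aware transitions could produce per-probe posterior probabilities. It is not the implementation used for the talk. Two choices are assumptions: the transition matrix for a gap between probes is taken as a power of a per-35 bp matrix, and parameters are supplied by hand rather than estimated by Baum-Welch. The function names are hypothetical.

```python
import numpy as np

def spaced_transition(base, gap_bp, unit_bp=35):
    """Transition matrix across a gap, as a power of the per-unit matrix (assumed form)."""
    steps = max(1, int(round(gap_bp / unit_bp)))
    return np.linalg.matrix_power(base, steps)

def posterior_states(obs, gaps_bp, start, base_trans, emit):
    """Scaled forward-backward: P(state at probe t | all observations).

    obs        : (T,) integer emission symbols
    gaps_bp    : (T-1,) distance in bp between consecutive probes
    start      : (S,) initial state distribution
    base_trans : (S, S) transition matrix for one 35 bp step
    emit       : (S, K) discrete emission probabilities
    """
    T, S = len(obs), len(start)
    trans = [spaced_transition(base_trans, g) for g in gaps_bp]
    fwd = np.zeros((T, S))
    bwd = np.ones((T, S))
    fwd[0] = start * emit[:, obs[0]]
    fwd[0] /= fwd[0].sum()
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans[t - 1]) * emit[:, obs[t]]
        fwd[t] /= fwd[t].sum()  # rescale to avoid underflow
    for t in range(T - 2, -1, -1):
        bwd[t] = trans[t] @ (emit[:, obs[t + 1]] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

# Toy example: states 0, 1, 2 with a run of symbol 2 in the middle.
base = np.array([[0.90, 0.10, 0.00], [0.05, 0.90, 0.05], [0.00, 0.10, 0.90]])
emit = np.array([[0.80, 0.15, 0.05], [0.10, 0.80, 0.10], [0.05, 0.15, 0.80]])
start = np.array([0.1, 0.8, 0.1])
obs = np.array([1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1])
gaps = np.full(len(obs) - 1, 35)
post = posterior_states(obs, gaps, start, base, emit)
```

Raising the matrix to a power makes a state change more likely across a wide gap than between adjacent probes, which is one simple way to let the transition probability depend on probe spacing.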

An example of HMM detected segments

A good model also needs a better array

Array density is not enough to distinguish exon/intron boundaries

Probe quality

Differential segments

≥3 contiguous probes with posterior probability > 0.99

Differentially expressed genes

Annotated genes for which ≥33% of their probes reside within the observed differential segments

Differentially spliced genes

Annotated genes for which <33% of probes reside within a differential segment, or annotated genes containing ≥2 differential segments with different states

Novel gene boundaries

Differential segments with ≥5 probes extending beyond an annotated gene boundary

Novel transcripts

Differential segments with ≥5 probes lying outside any annotated gene boundary
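The segment and gene rules above can be sketched as follows. Genes and segments are represented as half-open ranges of probe indices, which is an assumption made for illustration. The order in which the two splicing rules are checked is also an assumption, since the slides do not say which takes precedence. The function names are hypothetical.

```python
def call_segments(posterior, state, min_probes=3, cutoff=0.99):
    """Return (start, end, state) runs of at least min_probes contiguous
    probes whose posterior for the given state exceeds cutoff (end exclusive)."""
    segments, run_start = [], None
    for i, p in enumerate(posterior):
        if p > cutoff:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_probes:
                segments.append((run_start, i, state))
            run_start = None
    # Close a run that reaches the last probe.
    if run_start is not None and len(posterior) - run_start >= min_probes:
        segments.append((run_start, len(posterior), state))
    return segments

def classify_gene(gene_start, gene_end, segments, min_fraction=0.33):
    """Classify one annotated gene from the differential segments it overlaps."""
    n_probes = gene_end - gene_start
    covered, states = 0, set()
    for seg_start, seg_end, state in segments:
        overlap = min(gene_end, seg_end) - max(gene_start, seg_start)
        if overlap > 0:
            covered += overlap
            states.add(state)
    if covered == 0:
        return "no call"
    # Assumed precedence: segments in different states imply splicing.
    if len(states) >= 2:
        return "differentially spliced"
    if covered / n_probes >= min_fraction:
        return "differentially expressed"
    return "differentially spliced"

posterior = [0.5, 0.999, 0.999, 0.999, 0.2, 0.999, 0.999, 0.1]
segments = call_segments(posterior, "Col>Van")
```

In this toy input the second run has only two probes above the cutoff, so it is not called as a segment.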

Length distribution of segments called by HMM

Comparison of annotation-based analysis and HMM

Method       Category                          Col > Van   Van > Col   Total
Annotation   differential expression           1626        923         2549
             differential exonic splicing      287         190         477
             differential intronic splicing    202         155         357
HMM          differential expression           1654        962         2616
             differential splicing             874         530         1404
             un-annotated transcript           34          42          76
             un-annotated 5'                   30          19          49
             un-annotated 3'                   28          8           36

Comparison of annotation-based analysis and HMM

 

Rows: annotation-based calls. Columns: HMM calls. "-" marks an empty cell.

Annotation             Total   Expression   Expression   Splicing    Splicing
                               (Col>Van)    (Van>Col)    (Col>Van)   (Van>Col)
HMM total                      1654         962          921         550
Expression (Col>Van)   1626    1270         -            225         -
Expression (Van>Col)   923     -            727          -           132
Splicing (Col>Van)     441     181          -            47          -
Splicing (Van>Col)     300     -            90           -           38

Acknowledgements

Justin Borevitz

Yan Li

Christos Noutsos

Geoff Morris

Andy Cal

Jake Byrnes

Josh Rest