RNA-Seq plant data analysis

K. Górczak, K. Klamecka, A. Szabelska, J. Zyprych-Walczak, I. Siatkowski

Department of Mathematical and Statistical Methods, Poznan University of Life Sciences

July 03, 2014

K. Górczak, K. Klamecka, A. Szabelska, J. Zyprych-Walczak, I. Siatkowski. RNA-Seq plant data analysis

Outline

1 What is RNA-Seq?

2 Steps of an RNA-Seq experiment

3 Methods for differential analysis

4 Normalization

5 Differential expression analysis

6 Graphical presentation of the results

7 Conclusions

What is RNA-Seq?

RNA-Seq is a high-throughput sequencing technology that sequences cDNA in order to obtain information about the RNA content of a sample.

Analysis of gene expression

Detection of alternative splicing events

Gene fusion transcripts

Cancer research

Disease diagnosis

Cellular processes in plants or animals

What is RNA-Seq?

Platforms:

Applied Biosystems’ SOLiD:

based on sequencing by ligation

Illumina’s Genome Analyzer:

based on sequencing by synthesis

Roche’s 454 Life Sciences:

based on pyrosequencing

Ion Torrent:

based on ion semiconductor sequencing

Pacific Biosciences:

based on single-molecule sequencing

What is RNA-Seq?

Figure: Next-generation sequencing platforms.

Steps of an RNA-Seq experiment

An RNA-Seq experiment takes a sample of purified RNA, shears it, converts it to cDNA and sequences it, producing millions of short reads (Oshlack et al. 2010). The analysis then proceeds through three stages:

a low-level stage (base calling, read mapping and alignment)

a high-level stage (normalization, expression quantification and differential expression)

and, finally, biological insight.

Steps of an RNA-Seq experiment

Figure: An RNA-Seq experiment design (Oshlack et al. 2010).

Methods for differential analysis

Based on transcriptome: Cuffdiff; RSEM and EBSeq; BitSeq

Based on genome: edgeR; DESeq; SAMseq; NOISeq; EBSeq; baySeq

Methods for differential analysis

We assume that the read counts $K_{ij}$ follow a negative binomial (NB) distribution:

$K_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi),$

where

$K_{ij}$ is the observed count for gene $i = 1, \ldots, G$ in sample $j = 1, \ldots, m$

$\mu_{ij}$ is the mean

$\phi$ is the dispersion

the mean and variance are related by $\sigma^2_{ij} = \mu_{ij} + \mu^2_{ij}\phi$

$\mu_{ij} = \lambda_{ij} N_j$, where $N_j$ is the library size of sample $j$ and $\lambda_{ij}$ is the expression level of gene $i$ in sample $j$

The null hypothesis tested for each gene is

$H_0\colon \lambda_{iA} = \lambda_{iB},$

where $\lambda_{iA}$ and $\lambda_{iB}$ are the mean expression levels of gene $i$ in conditions A and B.
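The mean-variance relation can be checked numerically. The sketch below is a minimal Python version; the slides' own computations were done in R. It assumes the common size/probability parametrization $r = 1/\phi$, $p = r/(r + \mu)$, which is one standard way to obtain $\mathrm{Var}(K) = \mu + \phi\mu^2$. The values of $\lambda$, $N_j$ and $\phi$ are hypothetical.

```python
import math


def nb_pmf(k: int, mu: float, phi: float) -> float:
    """P(K = k) for K ~ NB(mu, phi), parametrized so that Var(K) = mu + phi * mu**2.

    Uses size r = 1/phi and success probability p = r / (r + mu).
    """
    r = 1.0 / phi
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_moments(mu: float, phi: float, k_max: int = 5000) -> tuple:
    """Mean and variance from the pmf summed over k = 0..k_max (a truncated series)."""
    probs = [nb_pmf(k, mu, phi) for k in range(k_max + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    second_moment = sum(k * k * pk for k, pk in enumerate(probs))
    return mean, second_moment - mean ** 2


# Hypothetical values for one gene in one sample
lam = 1e-5            # expression level lambda_ij
lib_size = 1e6        # library size N_j
phi = 0.1             # dispersion
mu = lam * lib_size   # mu_ij = lambda_ij * N_j, about 10 expected reads
mean, var = nb_moments(mu, phi)
print(mean, var, mu + phi * mu ** 2)
```

The summed mean reproduces $\mu_{ij}$ and the summed variance reproduces $\mu_{ij} + \phi\mu_{ij}^2$. With $\phi > 0$ the counts are therefore overdispersed relative to a Poisson model, whose variance would equal the mean.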

Normalization

Normalization is an essential step in the analysis of differentially expressed genes. It makes expression levels comparable between samples by removing technical effects of sequencing, such as differences in library size (sequencing depth). Several normalization methods are used for count-based differential analysis (Dillies et al. 2012):

Reads per Kilobase per Million reads (RPKM)

Total count (TC)

Trimmed mean of M-values (TMM)

Median

Quantile

Upper Quartile

Relative log expression (RLE)

Normalization

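Let $N_j = \sum_{i=1}^{G} K_{ij}$ denote the library size of sample $j$, and let $r$ denote the reference sample. The methods (name: formula) are:

dRPKM: $d^{\mathrm{RPKM}}_{ij} = \dfrac{10^9 K_{ij}}{N_j L_i}$, where $L_i$ is the length of gene $i$ in bases

dTC: $d^{\mathrm{TC}}_j = \dfrac{\sum_{i=1}^{G} K_{ij}}{\sum_{i=1}^{G} K_{ir}}$

dTMM: $\log_2\left(d^{\mathrm{TMM}}_j\right) = \dfrac{\sum_{i \in G'} w_{ij} M_{ij}}{\sum_{i \in G'} w_{ij}}$, with $M_{ij} = \log_2 \dfrac{K_{ij}/N_j}{K_{ir}/N_r}$ and $w_{ij} = \left(\dfrac{N_j - K_{ij}}{N_j K_{ij}} + \dfrac{N_r - K_{ir}}{N_r K_{ir}}\right)^{-1}$, where $G'$ is the set of genes left after trimming the most extreme M-values and average log-expression (A) values

dmed: $d^{\mathrm{med}}_j = \operatorname{median}_i \dfrac{K_{ij}}{\left(\prod_{v=1}^{m} K_{iv}\right)^{1/m}}$

dQ: $d^{\mathrm{Q}}_j = 10^{\log_{10} Q_j - \frac{1}{m}\sum_{v=1}^{m} \log_{10} Q_v}$, where $Q_j$ is a quantile of the counts in sample $j$

dUQ: $d^{\mathrm{UQ}}_j = \dfrac{\sum_{i=1}^{G} K_{ij}}{UQ_j}$, where $UQ_j$ is the upper quartile of the counts in sample $j$

dsam: $d^{\mathrm{sam}}_j = \dfrac{\sum_{i \in G''} K_{ij}}{\sum_{i \in G''} \sum_{v=1}^{m} K_{iv}}$, where $G''$ is the set of genes whose $GOF_i$ lies between the 0.25 and 0.75 quantiles of all $GOF_i$ values, with $GOF_i = \sum_{j=1}^{m} \dfrac{(K_{ij} - \hat{K}_{ij})^2}{\hat{K}_{ij}}$, $\hat{K}_{ij} = \dfrac{N_j}{\sum_{v=1}^{m} N_v} K_{i\cdot}$ and $K_{i\cdot} = \sum_{j=1}^{m} K_{ij}$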
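Several of the factors above can be computed in a few lines. The following is a minimal, standard-library Python sketch covering dTC, RPKM, dmed and a simplified TMM. It is not the R packages the authors used.

Its assumptions:

- Counts are stored as a list of gene rows with one column per sample.
- The first sample is the reference.
- TMM trimming is done by rank, with defaults mirroring edgeR's 30% M-value and 5% A-value trims.

Ties, zero counts and other edge cases are handled more crudely than in edgeR's `calcNormFactors`, so the numbers need not match it exactly. dQ, dUQ and dsam are not implemented.

```python
import math
import statistics
from typing import List, Set

# counts[i][j] = reads of gene i in sample j (genes in rows, samples in columns)
Counts = List[List[int]]


def library_sizes(counts: Counts) -> List[int]:
    """N_j: total number of reads in each sample (column sums)."""
    return [sum(col) for col in zip(*counts)]


def tc_factors(counts: Counts, ref: int = 0) -> List[float]:
    """dTC_j = N_j / N_r for the reference sample r."""
    n = library_sizes(counts)
    return [nj / n[ref] for nj in n]


def rpkm(counts: Counts, lengths: List[int]) -> List[List[float]]:
    """RPKM_ij = 1e9 * K_ij / (N_j * L_i), with L_i the length of gene i in bases."""
    n = library_sizes(counts)
    return [[1e9 * k / (n[j] * lengths[i]) for j, k in enumerate(row)]
            for i, row in enumerate(counts)]


def median_ratio_factors(counts: Counts) -> List[float]:
    """dmed_j = median over genes of K_ij / (geometric mean of gene i over samples).

    Genes with a zero count in any sample have a zero geometric mean and are skipped.
    """
    m = len(counts[0])
    rows = [row for row in counts if all(k > 0 for k in row)]
    geo = [math.exp(sum(math.log(k) for k in row) / m) for row in rows]
    return [statistics.median(row[j] / g for row, g in zip(rows, geo))
            for j in range(m)]


def _kept_by_rank(values: List[float], trim: float) -> Set[int]:
    """Indices left after dropping a fraction trim/2 of the values from each tail, by rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    cut = int(len(values) * trim / 2)
    return set(order[cut:len(values) - cut])


def tmm_factor(counts: Counts, j: int, ref: int = 0,
               m_trim: float = 0.30, a_trim: float = 0.05) -> float:
    """Simplified TMM factor of sample j relative to the reference sample ref."""
    n = library_sizes(counts)
    m_vals, a_vals, weights = [], [], []
    for row in counts:
        kj, kr = row[j], row[ref]
        if kj == 0 or kr == 0:  # M and A are undefined for zero counts
            continue
        pj, pr = kj / n[j], kr / n[ref]
        m_vals.append(math.log2(pj / pr))        # M_ij
        a_vals.append(0.5 * math.log2(pj * pr))  # A_ij, average log expression
        var = (n[j] - kj) / (n[j] * kj) + (n[ref] - kr) / (n[ref] * kr)
        weights.append(1.0 / var)                # w_ij, inverse approximate variance
    keep = _kept_by_rank(m_vals, m_trim) & _kept_by_rank(a_vals, a_trim)
    num = sum(weights[g] * m_vals[g] for g in keep)
    den = sum(weights[g] for g in keep)
    return 2.0 ** (num / den)


# Hypothetical data: 9 unchanged genes and 1 gene with 10x more reads in sample 1
example = [[100, 100]] * 9 + [[100, 1000]]
print(tc_factors(example))       # [1.0, 1.9]
print(tmm_factor(example, j=1))  # about 0.526, i.e. 1000/1900
```

In this example, total-count scaling treats the single up-regulated gene's extra reads as extra sequencing depth. TMM trims that gene away, so the effective library size of sample 1 (1900 * 0.526, about 1000) matches the reference.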

Differential expression analysis

Computations

All computations were performed in the R environment (Ihaka and Gentleman 1996). Four R packages were used to compute normalization factors: DESeq (Anders and Huber 2010), edgeR (Robinson et al. 2010), EBSeq (Leng et al. 2013) and SAMseq (Li and Tibshirani 2011). DESeq, edgeR and EBSeq are freely available from the Bioconductor repository (www.bioconductor.org); SAMseq is distributed in the samr package on CRAN. The edgeR package was used to find differentially expressed genes.

Data

The data take the form of a rectangular table of integer counts, with genes in rows and samples in columns. Each cell gives the number of reads mapped to a given gene in a given sample. The dataset used here comes from the NBPSeq package: an RNA-Seq dataset from a pilot study of the defense response of Arabidopsis to bacterial infection.
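The count table described above is straightforward to build once each mapped read has been assigned to a gene. The Python sketch below assumes a hypothetical input of one (sample, gene) pair per mapped read, as might come out of a read-counting step. Real pipelines derive these assignments from alignment files (e.g. BAM) with dedicated tools, which this sketch does not attempt to reproduce.

```python
from typing import Iterable, List, Tuple


def count_table(assignments: Iterable[Tuple[str, str]],
                genes: List[str], samples: List[str]) -> List[List[int]]:
    """Build a genes x samples matrix of read counts.

    assignments: one (sample, gene) pair per mapped read (hypothetical input format).
    Reads assigned to a gene or sample that is not listed are ignored.
    """
    gene_row = {g: i for i, g in enumerate(genes)}
    sample_col = {s: j for j, s in enumerate(samples)}
    table = [[0] * len(samples) for _ in genes]
    for sample, gene in assignments:
        i, j = gene_row.get(gene), sample_col.get(sample)
        if i is not None and j is not None:
            table[i][j] += 1
    return table


reads = [("s1", "g1"), ("s1", "g1"), ("s2", "g2"), ("s2", "g3")]
table = count_table(reads, genes=["g1", "g2"], samples=["s1", "s2"])
print(table)  # [[2, 0], [0, 1]]
```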

Differential expression analysis

Illumina platform

26222 genes

6 samples

2 treatment groups

Figure: Arabidopsis (source: http://www.abcam.com/index.html?pageconfig=resourcerid=11682pid=5).

Graphical presentation of the results

Figure: Venn diagram for differentially expressed genes.
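The region sizes in such a Venn diagram are simple set counts. This Python sketch assumes each set holds the IDs of genes called differentially expressed under one setting of the comparison. Which gene lists the figure actually compares is not stated here, so the set names "TMM" and "RLE" and the gene IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def venn_regions(de_sets: Dict[str, Set[str]]) -> Counter:
    """Count genes in every region of a Venn diagram.

    Each key of the result is the sorted tuple of set names a gene belongs to,
    e.g. ("RLE", "TMM") is the overlap of exactly those two sets.
    """
    names = sorted(de_sets)
    regions = Counter()
    for gene in set().union(*de_sets.values()):
        regions[tuple(n for n in names if gene in de_sets[n])] += 1
    return regions


# Hypothetical DE gene lists obtained after two different normalizations
regions = venn_regions({"TMM": {"g1", "g2", "g3"}, "RLE": {"g2", "g3", "g4"}})
print(regions[("RLE", "TMM")], regions[("TMM",)], regions[("RLE",)])  # 2 1 1
```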

Graphical presentation of the results

Figure: MA-plots of differentially expressed genes.

Graphical presentation of the results

[fig a]

[fig b]

Figure: Boxplots of baseMean and log2FC [fig a] and overlap of differentially expressed genes [fig b].
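The quantities in [fig a] are computed per gene before the boxplots are drawn. This sketch follows the DESeq convention, as we understand it:

- baseMean is the mean of size-factor-normalized counts over all samples.
- log2FC is the log2 ratio of the two condition means.

The slide does not define these quantities, so treat the definitions, the group labels "A"/"B" and the input values as assumptions.

```python
import math
from typing import Sequence, Tuple


def base_mean_log2fc(row: Sequence[int], size_factors: Sequence[float],
                     groups: Sequence[str], a: str = "A", b: str = "B") -> Tuple[float, float]:
    """baseMean over all samples and log2 fold change of condition b versus a.

    Raises ZeroDivisionError if the mean of condition a is zero.
    """
    norm = [k / s for k, s in zip(row, size_factors)]  # K_ij / s_j
    base_mean = sum(norm) / len(norm)
    mean_a = sum(x for x, g in zip(norm, groups) if g == a) / groups.count(a)
    mean_b = sum(x for x, g in zip(norm, groups) if g == b) / groups.count(b)
    return base_mean, math.log2(mean_b / mean_a)


print(base_mean_log2fc([10, 20, 40, 80], [1.0, 2.0, 1.0, 2.0], ["A", "A", "B", "B"]))  # (25.0, 2.0)
```

Applying the function to every gene gives the two distributions that the boxplots summarize.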

Conclusions

RNA-Seq provides new knowledge of the range of gene expression levels

Normalization of count data... sequencing biases:

within-sample bias

between-sample bias

R packages... useful in assessing the results of the RNA-Seq experiment

Gene expression... for understanding the impact of the genes on certain diseases and cellular processes

Graphical presentation facilitates the evaluation of the results

References

Oshlack A., Robinson M.D., Young M.D. (2010). From RNA-seq reads to differential expression results. Genome Biology 11: 220.

Dillies M., Rau A., Aubert J., Hennequet-Antier C., Jeanmougin M., Servant N., Keime C., Marot G., Castel D., Estelle J., Guernec G., Jagla B., Jouneau L., Laloe D., Le Gall C., Schaeffer B., Le Crom S., Guedj M., Jaffrezic F. (2012). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics. doi:10.1093/bib/bbs046.

Ihaka R., Gentleman R. (1996). R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 5(3): 299-314. doi:10.1080/10618600.1996.10474713.

Anders S., Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology 11: R106.

Robinson M., McCarthy D., Smyth G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140. doi:10.1093/bioinformatics/btp616.

Leng N., Dawson J., Thomson J.A., Ruotti V., Rissman A.I., Smits B.M.G., Haag J.D., Gould M.N., Stewart R.M., Kendziorski Ch. (2013). EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. doi:10.1093/bioinformatics/btt087.

Li J., Tibshirani R. (2011). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research 22(5): 519-36. doi:10.1177/0962280211428386.

Di Y., Schafer D.W., Cumbie J.S., Chang J.H. (2011). The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq. Statistical Applications in Genetics and Molecular Biology 10(1).

Bullard J.H., Purdom E., Hansen K.D., Dudoit S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11: 94.

Thank you for your attention!
