Training courses on MicroScope platform

NGS data exploration and other tools

NGS data exploration

MicroScope – Extensions to handle NGS Data

3 major tools relying on the same structure (Pipeline, Database, User Interface)

PALOMA (SNPs, Indels)
   Pipeline: PALOMA_db, PALOMA_call
   Database: EvolGenomes (MySQL)
   User Interface: Microscope-PALOMA

MicroScope (Annotations)

TAMARA (Transcriptomic)
   Pipeline: Mapping pipeline, DESeq pipeline
   Database: RNASeq (MySQL)
   User Interface: Microscope-TAMARA

PALOMA - Polymorphism Analyses in Light Of MAssive DNA sequencing

TAMARA - Transcriptome Analyses based on MAssive sequencing of RnAs

Mapping

Using NGS for micro-variations (SNPs)

Reference ≠ mapped reads

•  Error in the reference sequence => correction of the genome

•  SNPs/Indels => mutations in an evolved strain

[Figure: reads mapped on the annotated reference sequence …ACGTGGTCAGCTGCAGGCACGCCAGTTC…; extracted label: AAAATAAA]
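
The idea that a variant is a position where the mapped reads disagree with the reference can be sketched in a few lines of Python. This is only an illustration under simple assumptions (gap-free alignments, majority vote, a hypothetical `min_depth` threshold); it is not the PALOMA pipeline. The reference fragment is the one shown on the slide, and the three reads are invented for the example.

```python
from collections import Counter

def call_variants(reference, aligned_reads, min_depth=2):
    """Report positions where the majority read base differs from the reference.

    aligned_reads: list of (start, sequence) pairs, 0-based, gap-free alignments.
    Returns tuples (position, reference_base, read_base, support, depth).
    """
    # One base counter per reference position (a minimal "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[pos]:
            variants.append((pos, reference[pos], base, support, depth))
    return variants

# Reference fragment from the slide; the reads below are invented.
reference = "ACGTGGTCAGCTGCAGGCACGCCAGTTC"
reads = [(4, "GGTCAGTTGC"), (8, "AGTTGCAGGC"), (10, "TTGCAGGCAC")]
print(call_variants(reference, reads))
```

Whether such a difference is an error in the reference or a true mutation in an evolved strain cannot be decided from the pileup alone; that interpretation depends on the experimental design.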

A dedicated interface: “SNPs/Indels”

Principle: It offers tools to explore all the mutations predicted by PALOMA in the different evolved strains within lineages

3 modes of analysis, 3 focuses

•  The results table of the “COMPARATIVE” tool:

Genomic Objects

➢  Evolved clones are grouped by lineage and ordered according to their timepoint in each lineage.

➢  Mutations are placed in a genomic and functional context:

[Figure: a mutation (Mut) located relative to the upstream gene and the downstream gene (5’/3’, begin/end)]

➢  The dynamics of genomic changes can easily be drawn over the studied evolutionary time
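
The grouping described above (clones grouped by lineage, ordered by timepoint, with the changes followed over time) can be sketched as follows. The record layout, clone names and mutation labels are hypothetical and only serve the illustration; the MicroScope interface does this for you.

```python
from collections import defaultdict

# Hypothetical records: (lineage, timepoint, clone name, set of mutation labels)
clones = [
    ("L2", 500, "L2_500", {"geneX:C/A"}),
    ("L1", 1000, "L1_1000", {"geneA:C/A", "geneB:A/-"}),
    ("L1", 500, "L1_500", {"geneA:C/A"}),
    ("L2", 1000, "L2_1000", {"geneX:C/A", "geneY:-/T"}),
]

def mutation_dynamics(clones):
    """Group clones by lineage, order them by timepoint, and list what is new at each step."""
    by_lineage = defaultdict(list)
    for lineage, timepoint, name, mutations in clones:
        by_lineage[lineage].append((timepoint, name, mutations))

    dynamics = {}
    for lineage in sorted(by_lineage):
        seen = set()  # mutations already observed earlier in this lineage
        rows = []
        for timepoint, name, mutations in sorted(by_lineage[lineage], key=lambda r: r[0]):
            newly_seen = sorted(mutations - seen)
            seen |= mutations
            rows.append((timepoint, name, len(mutations), newly_seen))
        dynamics[lineage] = rows
    return dynamics

for lineage, rows in mutation_dynamics(clones).items():
    for timepoint, name, total, newly_seen in rows:
        print(lineage, timepoint, name, total, newly_seen)
```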

A dedicated interface: “SNPs/Indels” – Comparative analysis

A dedicated interface: “SNPs/Indels” – Graphical analysis

•  Global visualisation of the mutations of an evolved organism in comparison to a reference genome

•  Detection of potential mutation “hot spots”

[Figure labels: C/A, A/-, -/T, T/A]
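
One simple way to look for mutation hot spots is to bin mutation positions into fixed-size windows along the genome and keep the windows that hold many mutations. The sketch below assumes 0-based positions; the window size and the `min_count` threshold are arbitrary illustrative values, and this is not necessarily how the graphical tool works.

```python
def mutation_hot_spots(positions, genome_length, window=1000, min_count=3):
    """Return (window_start, count) for fixed windows holding at least min_count mutations."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < genome_length:
            raise ValueError(f"position {pos} is outside the genome")
        counts[pos // window] += 1
    return [(i * window, c) for i, c in enumerate(counts) if c >= min_count]

# Invented mutation positions on a 10 kb toy genome.
positions = [120, 4300, 4350, 4720, 4990, 9100]
print(mutation_hot_spots(positions, genome_length=10000))
```

Fixed windows can split a cluster that straddles a boundary; overlapping (sliding) windows avoid this at the cost of reporting the same cluster several times.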

Using NGS for Transcriptome analysis (RNA-seq)

➢  A typical RNA-seq experiment:

➢  RNA-seq basic protocol/workflow: Design experiment → RNA preparation → Prepare libraries → Sequence → Analysis

RNA-seq definition

RNA-seq (RNA Sequencing) is a technology that uses the capabilities of NGS to reveal a snapshot of RNA presence and quantity at a given moment.

➢  An RNA-seq experiment involves making:

▪  A library: a collection of cDNA fragments flanked by specific constant sequences (adapters) that are necessary for sequencing.

▪  This library is then sequenced using short-read sequencing, which produces millions of short sequence reads that correspond to individual cDNA fragments.

▪  Biological replicates

Questions addressed by transcriptome analysis (prokaryote)

Quantitative data (~ expression)

➢  Transcript abundance

➢  Quantify gene expression

➢  Quantify differential gene expression:

▪  At a given time for different experimental conditions

▪  At different times for the same experimental condition

Qualitative data (~ structure)

➢  Find new genes

➢  Small RNA identification

➢  Transcriptome reconstruction (operon structure)

➢  Identify Transcription Start Sites (TSS)

➢  Mutations: Indels/SNP

[Figure, qualitative: Gene A, Gene B, new gene?]

[Figure, quantitative: Gene A in cond1, count = 4; Gene A in cond2, count = 8; fold = 2?]

transcripts → reverse transcription, fragmentation, amplification, sequencing (of a random subset) → reads → reads mapped to reference genome

Functional, Process level annotation

•  Principle

•  Mapping

➢  Make a connection between two objects of the same kind

➢  Application: map reads on the reference sequence
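
To make the mapping principle concrete, here is a deliberately naive sketch that locates a read on a reference by exact string matching, on both strands. Real mappers use indexed data structures and tolerate mismatches and gaps; this toy version only shows what "mapping a read" means. The reference fragment is taken from an earlier slide and the reads are invented.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def map_read(reference, read):
    """Return all exact-match hits as (0-based position, strand)."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start, strand))
            start = reference.find(query, start + 1)
    return hits

reference = "ACGTGGTCAGCTGCAGGCACGCCAGTTC"
print(map_read(reference, "GGTCAGC"))  # read from the forward strand
print(map_read(reference, "GAACTGG"))  # read from the reverse strand
print(map_read(reference, "TTTTTTT"))  # unmapped read
```

A read with several hits is ambiguous (multi-mapped), and a read with no hit is unmapped; both categories typically appear in mapping statistics.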

RNA-seq data: mapping statistics

[Screenshot labels: CDS?, Experiment list]

Coverage

Integrative Genomics Viewer (IGV): http://www.broadinstitute.org/igv/

•  Relatively easy to use and set up

•  Able to display alignment, coverage and annotation data

•  Integrated using Java Web Start

IGV: Analyze the coverage of genomic objects

Tools: Visualizing coverage in IGV
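
The coverage track displayed by a genome viewer is, in essence, the number of reads overlapping each position. A minimal sketch of that computation, assuming alignments given as 0-based half-open intervals (the function names and the toy intervals are illustrative only):

```python
def coverage_profile(genome_length, alignments):
    """Per-base depth from (start, end) alignments, 0-based, end exclusive."""
    # Difference array: +1 where a read starts, -1 just after it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth

def mean_coverage(depth, start, end):
    """Mean depth over a genomic object spanning [start, end)."""
    return sum(depth[start:end]) / (end - start)

depth = coverage_profile(12, [(0, 5), (2, 8), (4, 10)])
print(depth)
print(mean_coverage(depth, 2, 8))
```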

•  Principle

[Figure: transcripts → reads → raw read count per gene, for four samples (Cond. A, Cond. A, Cond. B, Cond. B). Extracted counts: 2 4 1; 3 17 1; 4 15 2; 3 5 1]

RNA-seq data: Read Count analysis

Number of hits on genomic objects

Direct access to tools

[Screenshot labels: Genomic Objects features, Tags Count]

Beware! Here “sense” means in the same orientation (strand) as the genomic object!
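
The notion of "sense" used in the read count table can be illustrated with a small counting sketch: a read overlapping a genomic object is counted as sense when it lies on the same strand as that object, and as antisense otherwise. The gene coordinates and reads below are invented, and the sketch assumes the read strand already reflects the transcript strand (depending on the library protocol, it may need to be flipped first).

```python
from collections import namedtuple

# Coordinates are 0-based, end exclusive; strand is "+" or "-".
Gene = namedtuple("Gene", "name start end strand")

def count_reads(genes, reads):
    """Count sense and antisense reads per gene; reads are (start, end, strand)."""
    counts = {g.name: {"sense": 0, "antisense": 0} for g in genes}
    for r_start, r_end, r_strand in reads:
        for g in genes:
            overlaps = r_start < g.end and g.start < r_end
            if overlaps:
                key = "sense" if r_strand == g.strand else "antisense"
                counts[g.name][key] += 1
    return counts

genes = [Gene("geneA", 100, 400, "+"), Gene("geneB", 500, 900, "-")]
reads = [
    (120, 170, "+"), (350, 400, "+"), (380, 430, "-"),
    (600, 650, "-"), (880, 930, "-"), (450, 490, "+"),
]
print(count_reads(genes, reads))
```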

DESeq tool: normalization for library size and test for differential expression

Gene      Mean normalized read count   Fold change B/A   P-value
Gene A    3                            1.4               0.5
Gene B    10.25                        3.5               0.01
Gene C    1.25                         1.5               0.8

•  Principle: raw read count per gene → DESeq (Anders & Huber, 2010, Nature Precedings)
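
The library-size normalization step can be sketched with the median-of-ratios size factors described by Anders & Huber (2010): each sample's counts are compared to the per-gene geometric mean across samples, and the median ratio becomes that sample's size factor. The sketch below is a plain-Python illustration, not the DESeq package, and it leaves out the statistical test entirely. It assumes that each count triplet extracted from the earlier figure is one sample and that its three values are Gene A, Gene B and Gene C; that reading is an assumption.

```python
import math
from statistics import median

def size_factors(counts):
    """counts[sample][gene] -> one size factor per sample (median of ratios)."""
    n_samples = len(counts)
    n_genes = len(counts[0])

    # Log geometric mean of each gene across samples; genes with a zero count are skipped.
    log_geo_means = []
    for g in range(n_genes):
        column = [counts[s][g] for s in range(n_samples)]
        if all(c > 0 for c in column):
            log_geo_means.append(sum(math.log(c) for c in column) / n_samples)
        else:
            log_geo_means.append(None)

    factors = []
    for s in range(n_samples):
        log_ratios = [
            math.log(counts[s][g]) - m
            for g, m in enumerate(log_geo_means)
            if m is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    """Divide every raw count by the size factor of its sample."""
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]

# Assumed layout: one row per sample, columns = Gene A, Gene B, Gene C.
counts = [[2, 4, 1], [3, 17, 1], [4, 15, 2], [3, 5, 1]]
factors = size_factors(counts)
print(factors)
print(normalize(counts, factors))
```

Using the median rather than the total read count keeps a few very highly expressed genes from dominating the normalization.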

RNA-seq data: differential expression

RNA-seq data: differential expression result

MeV: http://www.tm4.org/mev/

RNA-seq data: Differential expression statistics

Graphical representation of differential expression

Hierarchical clustering

Genes list

Exercises

Exercise 1: Read count analysis tool

➢  Find the top 10 genes in terms of read count for each condition

Exercise 2: Differential Expression analysis

➢  Find differentially expressed genes in IMP conditions in comparison to the WT control condition (FDR cutoff <= 0.05 and abs(L2FoldChange) >= 4).

➢  What comments can you make about the comparisons?
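
For readers who export the result tables and want to reproduce the two selections outside the interface, the logic can be sketched as below. The data layout and the field names (`gene`, `log2FoldChange`, `padj`) are hypothetical and do not describe the actual export format; the toy values are invented.

```python
def top_genes(read_counts, n=10):
    """read_counts: {condition: {gene: count}} -> {condition: [(gene, count), ...]}."""
    return {
        condition: sorted(genes.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for condition, genes in read_counts.items()
    }

def differentially_expressed(results, fdr_cutoff=0.05, min_abs_l2fc=4):
    """Keep genes with adjusted p-value <= fdr_cutoff and |log2 fold change| >= min_abs_l2fc."""
    return [
        r["gene"]
        for r in results
        if r["padj"] is not None
        and r["padj"] <= fdr_cutoff
        and abs(r["log2FoldChange"]) >= min_abs_l2fc
    ]

# Invented toy data.
read_counts = {
    "WT": {"g1": 5, "g2": 50, "g3": 20},
    "IMP": {"g1": 90, "g2": 45, "g3": 1},
}
results = [
    {"gene": "g1", "log2FoldChange": 4.2, "padj": 0.001},
    {"gene": "g2", "log2FoldChange": -0.2, "padj": 0.9},
    {"gene": "g3", "log2FoldChange": -4.3, "padj": 0.2},
    {"gene": "g4", "log2FoldChange": -5.0, "padj": 0.05},
    {"gene": "g5", "log2FoldChange": 6.0, "padj": None},
]
print(top_genes(read_counts, n=2))
print(differentially_expressed(results))
```

Note that abs(L2FoldChange) >= 4 is a strict filter: a log2 fold change of 4 corresponds to a 16-fold difference in either direction.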

Using TAMARA: Acinetobacter baumannii ATCC 17978 (Chang et al, 2014, BMC Genomics 15:815)