Training courses on MicroScope platform
NGS data exploration and other tools
NGS data exploration
MicroScope – Extensions to handle NGS Data
3 major tools relying on the same structure (Pipeline, Database, User Interface):
• MicroScope (Annotations)
• PALOMA (SNPs, Indels): pipeline PALOMA_db, PALOMA_call; database EvolGenomes (MySQL); user interface Microscope-PALOMA
• TAMARA (Transcriptomic): Mapping pipeline and DESeq pipeline; database RNASeq (MySQL); user interface Microscope-TAMARA
PALOMA - Polymorphism Analyses in Light Of MAssive DNA sequencing
TAMARA - Transcriptome Analyses based on MAssive sequencing of RnAs
Using NGS for micro-variations (SNPs)
Mapping: reads are mapped on the annotated reference sequence. Reference ≠ mapped reads can mean:
• Error in the reference sequence => correction of the genome
• SNPs/Indels => mutations in an evolved strain
[Figure: reads mapped on the annotated reference sequence …ACGTGGTCAGCTGCAGGCACGCCAGTTC…]
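The "reference ≠ mapped reads" comparison can be illustrated with a toy pileup: stack the mapped reads position by position and report positions where most reads disagree with the reference. This is a minimal sketch, not PALOMA's algorithm: it handles substitutions only (no indels), assumes reads are already placed on the reference without gaps, and the thresholds min_depth and min_fraction are hypothetical values chosen for illustration.

```python
from collections import Counter

def call_substitutions(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy pileup comparison of mapped reads against the reference (substitutions only).

    aligned_reads: list of (start, sequence) pairs, 0-based, already placed on
    the reference without gaps. min_depth and min_fraction are hypothetical
    thresholds chosen for illustration.
    Returns (position, reference_base, read_base, supporting_reads, depth).
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos, bases in enumerate(pileup):
        depth = sum(bases.values())
        if depth < min_depth:
            continue  # not enough reads to decide
        base, n = bases.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            calls.append((pos, reference[pos], base, n, depth))
    return calls

# Reference fragment from the slide; reads carry a hypothetical C -> T change at position 10.
reference = "ACGTGGTCAGCTGCAGGCACGCCAGTTC"
reads = [(5, "GTCAGTTGCA"), (8, "AGTTGCAGG"), (2, "GTGGTCAGT")]
print(call_substitutions(reference, reads))
```

Whether such a difference is a sequencing error in the reference or a real mutation in an evolved strain cannot be decided from one sample alone; comparing several strains against the same reference helps to separate the two cases.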
A dedicated interface: “SNPs/Indels”
Principle: it offers tools to explore all the mutations predicted by PALOMA in the different evolved strains within lineages.
3 modes of analysis, 3 focuses
• The results table of the « COMPARATIVE » tool:
– Genomic objects: evolved clones are grouped by lineage and ordered according to their timepoint in each lineage.
– Mutations are placed in a genomic and functional context.
[Figure: a mutation (Mut) shown relative to the end of the upstream gene and the begin of the downstream gene (5’/3’ orientation, distance x)]
– The dynamics of genomic changes can easily be traced over the studied evolutionary time.
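Placing a mutation in its genomic context amounts to checking whether its position falls inside an annotated gene and, if not, finding the closest gene on each side. The sketch below assumes genes are given as hypothetical (name, begin, end) records with 1-based inclusive coordinates, and takes "upstream/downstream" as left/right along the reference coordinates; a strand-aware definition, as MicroScope may use, would differ.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    begin: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def genomic_context(genes, position):
    """Place a mutation relative to annotated genes.

    Upstream/downstream are taken here as left/right along the reference
    coordinates (an assumption; a strand-aware definition would differ).
    """
    for g in genes:
        if g.begin <= position <= g.end:
            return {"type": "intragenic", "gene": g.name, "offset": position - g.begin}
    left = [g for g in genes if g.end < position]
    right = [g for g in genes if g.begin > position]
    up = max(left, key=lambda g: g.end) if left else None
    down = min(right, key=lambda g: g.begin) if right else None
    return {
        "type": "intergenic",
        "upstream": up.name if up else None,
        "dist_from_upstream_end": position - up.end if up else None,
        "downstream": down.name if down else None,
        "dist_to_downstream_begin": down.begin - position if down else None,
    }

# Hypothetical annotation
genes = [Gene("geneA", 100, 400), Gene("geneB", 500, 900)]
print(genomic_context(genes, 450))
print(genomic_context(genes, 150))
```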
A dedicated interface: “SNPs/Indels” – Comparative analysis
A dedicated interface: “SNPs/Indels” – Graphical analysis
• Global visualisation of the mutations of an evolved organism in comparison to a reference genome
• Detection of potential mutation « hot spots »
[Figure: mutations plotted along the genome, labelled e.g. C/A, A/-, -/T, T/A]
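Reading the figure labels as "reference/variant" (an assumption: "-" marks a missing base), each label gives the mutation type, and a simple way to flag candidate hot spots is to count mutations in fixed, non-overlapping genome windows. The window size and minimum count below are hypothetical; this is an illustration, not the method used by the MicroScope graphical tool.

```python
from collections import Counter

def mutation_type(label):
    """Classify a 'reference/variant' label such as 'C/A', 'A/-' or '-/T' (assumed notation)."""
    ref, alt = label.split("/")
    if ref == "-":
        return "insertion"
    if alt == "-":
        return "deletion"
    return "substitution"

def hot_spots(positions, window=1000, min_count=3):
    """Fixed, non-overlapping windows holding at least min_count mutations.

    Returns (window_start, window_end, count), coordinates inclusive.
    """
    bins = Counter(p // window for p in positions)
    return sorted(
        (b * window, (b + 1) * window - 1, n) for b, n in bins.items() if n >= min_count
    )

print([mutation_type(m) for m in ["C/A", "A/-", "-/T", "T/A"]])
print(hot_spots([10, 20, 999, 1500, 5000, 5001, 5002]))
```

Fixed windows are the simplest choice; a sliding window would avoid splitting a cluster that straddles a window boundary.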
Using NGS for Transcriptome analysis (RNA-seq)
RNA-seq definition: RNA-seq (RNA Sequencing) is a technology that uses the capabilities of NGS to reveal a snapshot of RNA presence and quantity at a given moment.
• A typical RNA-seq experiment involves making:
– A library: a collection of cDNA fragments flanked by specific constant sequences (adapters) that are necessary for sequencing.
– This library is then sequenced using short-read sequencing, which produces millions of short reads, each corresponding to an individual cDNA fragment.
– Biological replicates.
• RNA-seq basic protocol/workflow: Design experiment → RNA preparation → Prepare libraries → Sequence → Analysis
Questions addressed by transcriptome analysis (prokaryote)
Quantitative data (~ expression):
• Transcript abundance
• Quantify gene expression
• Quantify differential gene expression:
– At a given time, for different experimental conditions
– At different times, for the same experimental condition
Qualitative data (~ structure):
• Find new genes
• Small RNA identification
• Transcriptome reconstruction (operon structure)
• Identify Transcription Start Sites (TSS)
• Mutations: Indels/SNPs
[Figure: reads between Gene A and Gene B may reveal a new gene; Gene A has Count = 4 in cond1 and Count = 8 in cond2, Fold = 2 ?]
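The question mark after "Fold = 2" matters: a raw count ratio of 8/4 only means a two-fold change if both libraries were sequenced to the same depth. The sketch below contrasts the raw ratio with a ratio after dividing each count by its library size. The library sizes are hypothetical, and simple total-count scaling is used here for illustration only; DESeq uses a different normalization.

```python
def raw_fold(count_cond1, count_cond2):
    """Ratio of raw read counts (cond2 / cond1)."""
    return count_cond2 / count_cond1

def scaled_fold(count_cond1, total_cond1, count_cond2, total_cond2):
    """Ratio after dividing each count by its library size (total mapped reads)."""
    return (count_cond2 / total_cond2) / (count_cond1 / total_cond1)

print(raw_fold(4, 8))                           # counts from the slide
print(scaled_fold(4, 1_000_000, 8, 2_000_000))  # hypothetical library sizes
```

If cond2 was simply sequenced twice as deeply, the apparent two-fold change disappears once library size is taken into account.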
• Principle: transcripts → reverse transcription, fragmentation, amplification, sequencing (of a random subset) → reads → reads mapped to the reference genome → functional, process-level annotation
• Mapping
– Make a connection between two objects of the same kind
– Application: map reads on the reference sequence
RNA-seq data: mapping statistics
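Mapping and mapping statistics can be illustrated with a toy exact-match mapper that records every position where a read occurs in the reference, then counts reads mapped once, several times, or not at all. This is only a sketch: real read mappers tolerate mismatches and gaps, search both strands and use indexed search, and the read names and sequences here are hypothetical.

```python
def map_reads(reference, reads):
    """Toy mapper: every exact-match start position (0-based) of each read, forward strand only."""
    hits = {}
    for name, seq in reads.items():
        positions = []
        start = reference.find(seq)
        while start != -1:
            positions.append(start)
            start = reference.find(seq, start + 1)
        hits[name] = positions
    return hits

def mapping_stats(hits):
    """Count reads mapped once, mapped several times, or not mapped."""
    unique = sum(1 for p in hits.values() if len(p) == 1)
    multiple = sum(1 for p in hits.values() if len(p) > 1)
    return {"total": len(hits), "unique": unique, "multiple": multiple,
            "unmapped": len(hits) - unique - multiple}

reference = "ACGTGGTCAGCTGCAGGCACGCCAGTTC"  # fragment shown on the SNP slide
reads = {"r1": "GTCAGC", "r2": "GCA", "r3": "TTTT"}  # hypothetical reads
hits = map_reads(reference, reads)
print(hits, mapping_stats(hits))
```

Reads mapping at several positions (such as r2) are the reason mapping statistics usually report unique and multiple hits separately.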
Tools: Visualizing coverage in IGV
IGV: analyze the coverage of genomic objects
Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv/
• Relatively easy to use and set up
• Able to display alignment, coverage and annotation data
• Integrated using Java Web Start
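The coverage track displayed by a viewer such as IGV is the number of mapped reads covering each base. The sketch below computes it from read alignments using a difference array (add 1 where a read starts, subtract 1 where it ends, then take a running sum) and derives the mean coverage of one genomic object. It assumes alignments as hypothetical (start, end) pairs in 0-based, half-open coordinates; it is not how IGV itself is implemented.

```python
def coverage(genome_length, alignments):
    """Per-base depth from (start, end) alignments, 0-based half-open."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # a read begins covering here
        diff[end] -= 1    # ... and stops covering here
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth

def mean_coverage(depth, begin, end):
    """Mean depth over a genomic object spanning [begin, end) (0-based, half-open)."""
    span = depth[begin:end]
    return sum(span) / len(span)

depth = coverage(10, [(0, 4), (2, 6), (2, 4)])  # hypothetical alignments
print(depth)
print(mean_coverage(depth, 0, 6))
```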
RNA-seq data: Read Count analysis
• Principle: reads mapped to the reference genome → raw read count per gene
Example raw read counts (two replicates per condition):
Gene     Cond. A   Cond. A   Cond. B   Cond. B
Gene A   2         3         4         3
Gene B   4         17        15        5
Gene C   1         1         2         1
Number of hits on genomic objects
[Screenshot: table of genomic object features with their tag counts]
!Beware! Here « sense » means on the same strand as the genomic object!
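Counting reads per genomic object, with "sense" meaning the same strand as the object, can be sketched as an overlap test between each read and each feature. The records below are hypothetical (0-based, half-open coordinates). For simplicity a read overlapping two features is counted for both; dedicated counting tools apply stricter rules to such ambiguous reads.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"

def count_reads(features, reads):
    """Count reads overlapping each feature, split by strand relative to the feature.

    reads: list of (start, end, strand). A read is "sense" if it lies on the
    same strand as the feature, "antisense" otherwise.
    """
    counts = {f.name: {"sense": 0, "antisense": 0} for f in features}
    for r_start, r_end, r_strand in reads:
        for f in features:
            if r_start < f.end and f.start < r_end:  # half-open intervals overlap
                key = "sense" if r_strand == f.strand else "antisense"
                counts[f.name][key] += 1
    return counts

features = [Feature("geneA", 100, 200, "+"), Feature("geneB", 300, 400, "-")]
reads = [(110, 160, "+"), (150, 210, "+"), (120, 170, "-"), (310, 360, "-"), (250, 280, "+")]
print(count_reads(features, reads))
```

Note that the read on the "-" strand counts as sense for geneB, which is itself on the "-" strand.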
RNA-seq data: differential expression
• Principle: raw read count per gene → DESeq (Anders & Huber, 2010, Nature Precedings)
DESeq tool: normalization for library size and test for differential expression
Gene     Mean normalized read count   Fold change B/A   P-value
Gene A   3                            1.4               0.5
Gene B   10.25                        3.5               0.01
Gene C   1.25                         1.5               0.8
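The library-size normalization described by Anders & Huber (2010) computes one size factor per sample as the median, over genes, of the ratio between the gene's count and its geometric mean across all samples. The sketch below applies that idea to the example raw count table (gene-to-column assignment as reconstructed there). It does not reproduce the DESeq package: there is no negative-binomial test, and its numbers are not expected to match the illustrative values on the slide.

```python
import math
from statistics import median

# Raw read counts from the example: rows = genes, columns = samples
# (Cond. A rep 1, Cond. A rep 2, Cond. B rep 1, Cond. B rep 2).
COUNTS = {
    "Gene A": [2, 3, 4, 3],
    "Gene B": [4, 17, 15, 5],
    "Gene C": [1, 1, 2, 1],
}

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # geometric mean of the gene across all samples (pseudo-reference)
        geo_mean = math.exp(sum(math.log(k) for k in row) / n_samples)
        for j, k in enumerate(row):
            ratios[j].append(k / geo_mean)
    return [median(r) for r in ratios]

def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return {g: [k / s for k, s in zip(row, factors)] for g, row in counts.items()}

sf = size_factors(COUNTS)
norm = normalize(COUNTS, sf)
for gene, row in norm.items():
    cond_a = sum(row[:2]) / 2
    cond_b = sum(row[2:]) / 2
    print(gene, [round(x, 2) for x in row], "fold B/A =", round(cond_b / cond_a, 2))
```

Using the median makes the factors robust to a few strongly differentially expressed genes, which would distort a simple total-count scaling.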
RNA-seq data: differential expression result
MeV (MultiExperiment Viewer) http://www.tm4.org/mev/
RNA-seq data: differential expression statistics
Graphical representation of differential expression
• Hierarchical clustering
• Genes list
Exercises
Exo 1: Read count analysis tool
• Find the top 10 genes in terms of read count for each condition.
Exo 2: Differential expression analysis
• Find differentially expressed genes in the IMP conditions in comparison to the WT control condition (FDR cutoff <= 0.05 and abs(L2FoldChange) >= 4).
• What comments can you make about the comparisons?
Using TAMARA: Acinetobacter baumannii ATCC 17978 (Chang et al., 2014, BMC Genomics 15:815)
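The selection criteria in the exercises can be written as two small filters. Note that abs(L2FoldChange) >= 4 means a change of at least 2^4 = 16-fold, up or down. The sketch assumes results exported as hypothetical (gene, log2 fold change, FDR) tuples and read counts as a gene-to-count mapping; the actual TAMARA export columns may differ, and the values below are invented for illustration, not taken from the A. baumannii data set.

```python
def select_de_genes(rows, fdr_cutoff=0.05, min_abs_log2fc=4.0):
    """Keep genes with FDR <= cutoff and |log2 fold change| >= threshold.

    rows: (gene, log2_fold_change, fdr) tuples; fdr may be missing (None),
    as can happen in exported results, and such genes are skipped.
    """
    return [gene for gene, l2fc, fdr in rows
            if fdr is not None and fdr <= fdr_cutoff and abs(l2fc) >= min_abs_log2fc]

def top_n_by_count(counts, n=10):
    """Genes with the highest read counts for one condition, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical values, not taken from the A. baumannii data set
rows = [("g1", 5.2, 0.001), ("g2", -4.0, 0.04), ("g3", 3.9, 0.0001),
        ("g4", 6.0, 0.2), ("g5", -7.1, None)]
print(select_de_genes(rows))
print(top_n_by_count({"a": 5, "b": 20, "c": 1}, n=2))
```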