
Nadia Davidson - Introduction to rna-seq

Date post: 10-May-2015
Upload: australian-bioinformatics-network
Description:
The central dogma of genetics is that the genome, composed of DNA, encodes many thousands of genes that can be transcribed into RNA. Following this, the RNA may be translated into amino acids, giving a functional protein. While the genome of an individual will be identical for each cell throughout their body, the number of transcribed copies of each gene, as RNA, will differ due to the different functional requirements of each tissue type. An important area of research within genetics is to study the genome in action, through RNA, for example by comparing the quantities of each gene’s RNA between different tissue types, through development, in disease or in different environments. This is known as differential gene expression analysis. RNA-Seq, or high-throughput RNA sequencing, has accelerated research in this area. The technology works by reverse transcribing the RNA back into DNA, shearing it into smaller fragments, then reading each fragment’s sequence in parallel to give millions of short “reads”, each approximately 50 to 200 bases in length. With this data comes a computational and statistical challenge, because the biology must be inferred from millions of short sequences. Along with technical biases, there is true biological variability between samples of the same type, which must be accounted for. In this talk I discuss the applications of RNA-Seq, its challenges and some of the bioinformatics strategies being employed to analyse this complex data. In particular, I will focus on the steps involved in differential gene expression analysis, both for model organisms, like human, and for more exotic organisms without a sequenced genome. First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/
Nadia Davidson Murdoch Childrens Research Institute Introduction to RNA-Seq Winter School in Mathematical and Computational Biology 2014
Transcript
Page 1: Nadia Davidson -  Introduction to rna-seq

Nadia Davidson Murdoch Childrens Research Institute

Introduction to RNA-Seq

Winter School in Mathematical and Computational Biology 2014

Page 2: Nadia Davidson -  Introduction to rna-seq

The  central  dogma  of  molecular  biology  

Image from Wikipedia

Page 3: Nadia Davidson -  Introduction to rna-seq

Alternative splicing

DNA   RNA  

Page 4: Nadia Davidson -  Introduction to rna-seq

Transcriptional abundance

DNA: 2 copies

RNA: multiple copies, different “splice” variants

Page 5: Nadia Davidson -  Introduction to rna-seq

Transcriptional abundance

RNA – cell type A   RNA – cell type B

Different quantities, different “splice” variants

Page 6: Nadia Davidson -  Introduction to rna-seq

[Diagram, labels A and G] Which copy is expressed more?

[Diagram, DNA and RNA, label G] Base change after transcription

[Diagram, DNA and RNA, Gene A and Gene B] Structural rearrangement in the genome fuses Gene A to Gene B

Benefits and opportunities of RNA-seq
• Differential expression
 – Comparing the expression between different samples
• Whole transcriptome sequencing
 – Annotation of new exons, transcribed regions, genes or non-coding RNAs
 – The ability to look at alternative splicing
 – Allele specific expression
 – RNA editing
 – Fusion genes in cancer
 – Etc.

Page 7: Nadia Davidson -  Introduction to rna-seq

RNA-Seq

@HWI-ST945:93:c02g4acxx
GGAAAAGGCAGAGGGTGGACTAAATGCTCAATCATGGGATTCTAATCTGG
+
CCCFFFFFHHHHHJJFGIIJJJJJJJJJJJJJGJJJJJGIIJJJJJJJJJJJIHIJJJJJIIJJJ

Millions to billions of these
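
Each read like the one above is stored as a four-line FASTQ record: an identifier, the bases, a "+" separator, and one quality character per base. The following is a minimal sketch of reading such records in Python. It assumes the common Phred+33 quality encoding (the slide does not state the encoding) and uses a made-up toy record with a hypothetical read name, not the record from the slide.

```python
import io

def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from a FASTQ file handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality  # drop the leading '@'

def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

# Toy record for illustration only (hypothetical read name).
toy = io.StringIO("@read1\nGGAAAAGG\n+\nCCCFFFFF\n")
records = list(read_fastq(toy))
name, seq, qual = records[0]
scores = phred_scores(qual)
print(name, seq, scores)
```

With millions to billions of records, a generator like this keeps only one read in memory at a time.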

Page 8: Nadia Davidson -  Introduction to rna-seq

RNA-Seq data analysis

• Whole transcriptome sequencing:
 – What were the original full length transcript sequences?
 – This Talk
• Differential expression:
 – Do we have more blue transcripts in one cell type than another?
 – Next Talk

Page 9: Nadia Davidson -  Introduction to rna-seq

What  were  the  original  full  length  transcript  sequences…  

 if  we  have  a  reference  genome?  

Page 10: Nadia Davidson -  Introduction to rna-seq

The reference annotation
• Model organisms have a reference annotation
• E.g. ENSEMBL, RefSeq, UCSC, GENCODE all provide the position of known genes in the reference genome
• Often, we assume these are the full set of transcripts of a gene
• But how do we know which gene a read came from?
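
Once a read has a position in the genome, one simple way to answer that last question is to look for annotated genes whose interval overlaps the read. The following is a minimal sketch, assuming a toy annotation with hypothetical gene names and coordinates. Real annotations are read from files, and real tools also consider strand, exon structure and reads that overlap more than one gene.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "name chrom start end")

# Hypothetical annotation, 1-based inclusive coordinates.
annotation = [
    Gene("geneA", "chrX", 1000, 5000),
    Gene("geneB", "chrX", 4500, 9000),
    Gene("geneC", "chr1", 200, 800),
]

def genes_for_read(chrom, read_start, read_end, genes):
    """Return names of genes whose interval overlaps the read's interval."""
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= read_end and read_start <= g.end]

print(genes_for_read("chrX", 4800, 4850, annotation))  # read inside two overlapping genes
print(genes_for_read("chrX", 100, 150, annotation))    # read outside any annotated gene
```

The first toy read illustrates why the question is not trivial: it falls inside two overlapping genes, so position alone does not identify a single gene of origin.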

 

[UCSC screen shot: chrX (q13.2), hg19, approximately 72,800,000 to 72,900,000 with a 50 kb scale bar. Track "Ensembl Gene Predictions - Ensembl 75" showing transcripts ENST00000602584, ENST00000438453, ENST00000421245, ENST00000373504, ENST00000373502, ENST00000498407 and ENST00000498318.]

Page 11: Nadia Davidson -  Introduction to rna-seq

Mapping  reads  to  the  genome  

Cole Trapnell & Steven L Salzberg, Nature Biotechnology 27, 455-457 (2009)

• Some reads can be mapped wholly to the genome (grey)
• Other reads need to be ‘split’ across splice sites (blue)
• Software: Tophat, STAR, Subread
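
In SAM-format alignments, a read that is split across a splice site is commonly recorded with an N operation in its CIGAR string, marking the skipped intron. The following is a minimal sketch of recovering the aligned reference blocks from such a string. The position and CIGAR strings are invented for illustration, and the function is not taken from any of the tools listed above.

```python
import re

def aligned_blocks(pos, cigar):
    """Return [(start, end)] reference blocks (1-based, inclusive) for a read.

    pos is the leftmost mapped reference position. An N operation closes the
    current block and skips over the intron.
    """
    blocks = []
    block_start = pos
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=XD":
            ref += length  # consumes reference inside the current block
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref - 1))
            ref += length  # jump over the intron
            block_start = ref
        # I, S, H and P do not consume reference bases
    if ref > block_start:
        blocks.append((block_start, ref - 1))
    return blocks

print(aligned_blocks(100, "50M"))         # read mapped wholly to the genome
print(aligned_blocks(100, "30M200N20M"))  # read split across one splice site
```

A read with no N operation yields a single block, while a split read yields one block per exon segment it covers.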

Page 12: Nadia Davidson -  Introduction to rna-seq

What  were  the  original  full  length  transcript  sequences…  

 if  we  have  a  reference  genome  but  want  to  find  something  novel?  

Page 13: Nadia Davidson -  Introduction to rna-seq

Map  reads  

Graph  splicing  events  

Traverse  the  graph  

Genome  guided  assembly  

Gene function? e.g. BLAST against the protein database or a related species (Blast2GO)

Jeffrey A. Martin & Zhong Wang, Nature Reviews Genetics 12, 671-682 (October 2011)

Software: Cufflinks, Scripture
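
The "graph splicing events" and "traverse the graph" steps can be pictured with a toy splice graph in which nodes are exons and edges are junctions supported by split reads. The sketch below simply enumerates every path from the first exon to the last as a candidate transcript. The exon names are hypothetical, and real assemblers select a subset of paths using further evidence rather than listing them all.

```python
def enumerate_paths(graph, node, sink, path=None):
    """Depth-first enumeration of all paths from node to sink in an acyclic graph."""
    path = (path or []) + [node]
    if node == sink:
        return [path]
    paths = []
    for nxt in graph.get(node, []):
        paths.extend(enumerate_paths(graph, nxt, sink, path))
    return paths

# Hypothetical gene in which exon2 can be skipped.
splice_graph = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3"],
    "exon3": [],
}

candidates = enumerate_paths(splice_graph, "exon1", "exon3")
for transcript in candidates:
    print(" - ".join(transcript))
```

The number of paths can grow very quickly with the number of alternative exons, which is one reason exhaustive enumeration is only suitable as an illustration.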

Page 14: Nadia Davidson -  Introduction to rna-seq

What  were  the  original  full  length  transcript  sequences…  

 if we don’t have a reference genome?

Page 15: Nadia Davidson -  Introduction to rna-seq

De novo transcriptome assembly
• Like genome assembly
• But also needs to deal with:
 – Splicing
 – Non-uniform coverage
• Software: (Trinity, Oases, TransAbyss)
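
Assemblers of this kind are typically built around a de Bruijn graph of k-mers taken from the reads. The following is a minimal sketch of building such a graph, with toy reads and an arbitrarily small k chosen for illustration. It has no error correction and no handling of coverage or splicing, which are the parts that make transcriptome assembly hard.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Two overlapping toy reads.
toy_reads = ["ATGGCG", "GGCGTA"]
graph = de_bruijn(toy_reads, k=4)
for prefix in sorted(graph):
    print(prefix, "->", sorted(graph[prefix]))
```

Because the two toy reads share the k-mer GGCG, their paths join in the graph, and walking it from ATG spells out the combined sequence ATGGCGTA.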

[Figure 3 from Francis et al., BMC Genomics 2013. Eight panels, each plotted against Reads (Millions) from 0 to 80: (A) Number of transcripts, (B) Mean transcript length (bp), (C) Median transcript length (bp), (D) N50 (bp), (E) Number of loci, (F) Loci per million reads, (G) Transcripts per million reads, (H) Average transcripts per locus. Samples: C. multidentata, H. californensis, P. robusta, H. imbricata, S. similis, D. gigas, Mouse.]

Francis et al., BMC Genomics 2013
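
One of the assembly statistics reported in this figure is N50. It is commonly defined as the length L such that sequences of length L or longer contain at least half of all assembled bases. The following is a minimal sketch with made-up transcript lengths.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

# Made-up transcript lengths in bp.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

N50 is dominated by the longest sequences, so it can look healthy even when an assembly also contains many short fragments.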

• Challenges:
 – Accuracy
 – Computational requirements
 – Lots of transcripts. Need to filter and cluster transcripts into genes (e.g. with Corset, CD-HIT-EST, assembler information etc.)

Page 16: Nadia Davidson -  Introduction to rna-seq

What  were  the  original  full  length  transcript  sequences…  

 if we have a reference genome but it’s not very good?

Page 17: Nadia Davidson -  Introduction to rna-seq

More  common  than  you  may  think  

 – Non-model organisms:
   • A badly assembled genome
   • No reference genome, but one of a related species
 – Model organisms:
   • Cancer
   • Poorly assembled regions in an otherwise good reference genome
 – No standard approach

Page 18: Nadia Davidson -  Introduction to rna-seq

Example - Annotating the chicken W sex chromosome

Chicken is a model organism, but the sequenced reference W chromosome is poorly assembled, with missing sequence.

Motivation: The mechanism for sex determination in birds has not been proven. Are there any novel W genes which could be involved?

Source: http://mac122.icu.ac.jp/gen-ed/mendel-gifs/13-sex-chromosomes.JPG

Page 19: Nadia Davidson -  Introduction to rna-seq

Experiment and analysis

Extracted and sequenced mRNA from the gonads of 4 female and 4 male embryonic chickens

1.4 billion 100bp paired-end reads

Re-assembled the reference annotation sequences (Ensembl) with a genome guided assembly (Cufflinks) and a de novo assembly (Abyss)

Identified W genes as those with female specific expression

Discovered 2 novel W genes and, for 1/3 of the known W gene sequences that were previously incomplete, found the full length sequences.

Some W candidates were followed up in the lab for sex determination studies
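
The "female specific expression" criterion can be illustrated as a filter on a table of read counts: keep genes with reads in every female sample and essentially none in the males. The following is a minimal sketch in which the gene names, the counts and both thresholds are invented for illustration. They are not the values or the method used in the study.

```python
# Hypothetical read counts per gene: 4 female samples, then 4 male samples.
counts = {
    "geneW1": ([120, 98, 143, 110], [0, 0, 1, 0]),
    "geneZ1": ([200, 180, 210, 190], [390, 410, 400, 380]),
    "geneA1": ([55, 60, 48, 52], [50, 58, 61, 49]),
}

def female_specific(counts, min_female=10, max_male=2):
    """Return genes with at least min_female reads in every female sample
    and at most max_male reads in every male sample (assumed thresholds)."""
    return [gene for gene, (females, males) in counts.items()
            if min(females) >= min_female and max(males) <= max_male]

print(female_specific(counts))
```

Allowing a small non-zero count in males is a deliberate choice in this sketch, since a few mis-assigned reads would otherwise exclude a genuinely female-specific gene.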

Page 20: Nadia Davidson -  Introduction to rna-seq

An  example  of  one  W  gene  

[Figure from Ayers et al., 2013: one W gene shown as tracks for the Reference Annotation, Genome, Genome guided assembly and De novo assembly, with read Coverage (0 to 194) in Blastoderm and Gonads plotted against base position in the transcript (0 to 2500). Legend: on the W chromosome in the reference chicken genome; on “Unknown” contigs in the reference chicken genome; on an autosome in the reference chicken genome.]

Take home message: All approaches have their strengths and limitations

Page 21: Nadia Davidson -  Introduction to rna-seq

Summary
• RNA-seq is very powerful!
 – It allows both the transcript sequence and the relative quantities to be measured.
 – It has numerous applications:
   • It complements DNA sequencing by telling us how the genome is actually used in a particular cell type.
   • In some cases (e.g. non-model organisms) it can circumvent the need for DNA sequencing.
 – There are standard pipelines for some applications, but many require a problem-specific solution. Challenging but fun!

Page 22: Nadia Davidson -  Introduction to rna-seq

Acknowledgements

MCRI Bioinformatics
The (Alicia) Oshlack Lab

This research was partly conducted within the Poultry CRC, established and supported under the Australian Government’s Cooperative Research Centres Program.


Red  Jungle  Fowl  (credit:  NHGRI)  

Chicken W genes:
MCRI Comparative Development
Craig Smith
Katie Ayers

Feel free to email me with questions: [email protected]

Page 23: Nadia Davidson -  Introduction to rna-seq

More information
• General:
 – Wang et al., RNA-Seq: a revolutionary tool for transcriptomics, Nature Reviews Genetics 2009
• Differential Expression Pipelines and Reviews:
 – Alicia Oshlack et al., From RNA-seq reads to differential expression results, Genome Biology 2010
 – Anders et al., Count-based differential expression analysis of RNA sequencing data using R and Bioconductor, Nature Protocols, 2013
 – http://bioinf.wehi.edu.au/RNAseqCaseStudy/
• Assembly Pipelines and Reviews:
 – Jeffrey A. Martin & Zhong Wang, Next-generation transcriptome assembly, Nature Reviews Genetics 2011
 – https://code.google.com/p/corset-project/wiki/Example
 – Haas et al., De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis, Nature Protocols, 2013
• The human transcriptome (ENCODE):
 – Sarah Djebali et al., Landscape of transcription in human cells, Nature 2012

 

