
Linux and RNA-Seq read alignment - Oregon...

Date post: 05-Mar-2018
Page 1: Linux and RNA-Seq read alignment - Oregon Statepeople.oregonstate.edu/~knausb/rna_seq/rnaseq_align_v2.pdf · Linux and RNA-Seq read alignment Brian J. Knaus USDA Forest Service Pacific

Linux and RNA-Seq read alignment

Brian J. Knaus
USDA Forest Service
Pacific Northwest Research Station



Outline

•Intro to Linux
•Reference types
•Read filtering
•Short read alignment


The Linux operating system

•Many ‘flavors’ (distributions) of Linux (Ubuntu, Fedora, CentOS, openSUSE, Slackware).

•Frequently includes a GUI (GNOME, KDE).

•Its strength is the shell: a programmer’s OS.

•Permissions (read, write, and execute rights for a file's owner, group, and others).

•Multiple shells (bash, tcsh, ksh).

•Text editors (gedit, vi, emacs).

•Finding help.
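Linux shows a file's permissions in the first column of `ls -l` as a ten-character string, for example `-rwxr-xr--`. The first character is the file type. It is followed by three read/write/execute triplets: one for the owner, one for the group, and one for all other users. `chmod` also accepts these permissions in octal form, where each triplet is the sum r=4, w=2, x=1. Below is a minimal Python sketch of that conversion. It assumes a plain mode string and ignores special bits such as setuid and sticky (`s`, `t`). The function name `mode_string_to_octal` is hypothetical.

```python
def mode_string_to_octal(mode: str) -> str:
    """Convert an `ls -l` mode string like '-rwxr-xr--' to octal like '754'.

    Assumes 10 characters: a file-type character followed by three rwx
    triplets (owner, group, others). Special bits (s, t) are not handled.
    """
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    weights = {"r": 4, "w": 2, "x": 1}
    digits = []
    for start in (1, 4, 7):  # owner, group, others
        triplet = mode[start:start + 3]
        digits.append(str(sum(weights[c] for c in triplet if c in weights)))
    return "".join(digits)


if __name__ == "__main__":
    # owner rwx (7), group r-x (5), others r-- (4)
    print(mode_string_to_octal("-rwxr-xr--"))  # prints 754
```

So `chmod 754 filename.txt` and the symbolic string `-rwxr-xr--` describe the same permissions.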


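Interacting with a server (PC options)

PuTTY (SSH client for Windows): http://www.chiark.greenend.org.uk/~sgtatham/putty/

Xming (X Window server for Windows, used to display graphical programs such as gedit that run on the server): http://www.straightrunning.com/XmingNotes/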


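Shell commands

ls
ls -lh                  # long listing, human-readable sizes
cd ~                    # home directory
cd ..                   # up one directory
pwd                     # print working directory
mv
cp
mkdir
df                      # free disk space
rm
rmdir                   # remove an empty directory
rm -rf                  # Will delete everything without asking.
cat filename.txt
head filename.txt       # first 10 lines
less filename.txt
gedit filename.txt &    # open in gedit, in the background
top                     # running processes
chmod u+x filename.txt  # make executable by the owner
tar -xvzf file.tar.gz   # extract a gzipped tar archive

(Google ‘linux cheat sheet’)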
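Two of the commands above have close equivalents in Python's standard library, which helps once these steps move into scripts. Below is a minimal sketch assuming Python 3. `os.chmod` with `stat.S_IXUSR` mirrors `chmod u+x`, and `tarfile` mirrors `tar -xvzf`. The helper names `add_user_execute` and `extract_tarball` are hypothetical.

```python
import os
import stat
import tarfile


def add_user_execute(path: str) -> None:
    """Like `chmod u+x path`: add the owner's execute bit, keep other bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IXUSR)


def extract_tarball(archive: str, dest: str = ".") -> list:
    """Roughly `tar -xvzf archive` run in `dest`: extract a gzip-compressed
    tar file and return the member names (what -v would list).
    Only extract archives from sources you trust."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names
```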


Shell commands

Tab completion (press Tab to complete command and file names)
history (list previously entered commands)


Finding help with Linux

$ man command

$ info command

Google ‘Linux’ plus whatever you need help with.

O’Reilly books (http://oreilly.com/).


Reference types

•From a genome project (model organisms).
•De novo or from cDNA.

Are all isoforms present?

How will exon skipping affect inference of regulation?


What’s in a name?

•Bowtie truncates reference names at the first space.
•Some characters don’t mix well with Sequence Ontology formats such as GFF3. http://www.sequenceontology.org/resources/gff3.html

Note the difference between the Sequence Ontology and the Gene Ontology. http://www.geneontology.org/
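Bowtie reports only the part of a reference name before the first space. Renaming FASTA headers before running `bowtie-build` therefore keeps the names in the SAM output identical to the names in the reference file. Below is a minimal Python sketch assuming plain FASTA input. The function `sanitize_fasta_headers` is hypothetical. Replacing every character outside the set GFF3 allows unescaped in sequence IDs (letters, digits and `.:^*$@!+_?-|`) with `_` is one possible policy, not a requirement of either tool. Truncation can make two different headers identical, so the sketch stops with an error in that case.

```python
import re

# Characters GFF3 allows unescaped in a seqid; anything else becomes "_".
# (The replace-with-underscore policy is an assumption of this sketch.)
_UNSAFE = re.compile(r"[^A-Za-z0-9.:^*$@!+_?|-]")


def sanitize_fasta_headers(lines):
    """Yield FASTA lines with each header cut at the first whitespace and
    unsafe characters replaced; raise ValueError on empty/duplicate names."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = _UNSAFE.sub("_", fields[0]) if fields else ""
            if not name or name in seen:
                raise ValueError(f"empty or duplicate reference name: {name!r}")
            seen.add(name)
            yield ">" + name + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

For example, `open("rna_ref.fa")` could be passed through this generator and the result written to a new file before indexing.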


Fastq file

@HWI-EAS121:1:1:0:952#0/1
CGTTNCCACTTCCTCCATCATGTCATCATGTGCGACAGGA
+HWI-EAS121:1:1:0:952#0/1
aab^D\babbbabbbbabbaaaabaabaaa_`aaaaa]PY
@HWI-EAS121:1:1:0:405#0/1
CGTTNTAAAGGTGCACCAGGGATCAAATCAATGGAATGCT
+HWI-EAS121:1:1:0:405#0/1
aa^[DVa^`^_Y`a^a`[\^\Z^aaYZ`a`X__]ZZ_]`_
@HWI-EAS121:1:1:0:724#0/1
CGTTNCATGCCCTTCTTTAATTTTTACACATGGTTCTTCT
+HWI-EAS121:1:1:0:724#0/1
aa`[D^aa`aaaaaaaaa_R`aaaaaaaa`aa`Y`aa``a
@HWI-EAS121:1:1:0:666#0/1
TTGTNAAAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
+HWI-EAS121:1:1:0:666#0/1
a`bOD[]R]`a__aT^YX\a`aMXaa[a[_a\HT\_``\[
@HWI-EAS121:1:1:0:1591#0/1
TTGTNCTCACCTATAATTTGACTTTGACATGCTACCTAGC
+HWI-EAS121:1:1:0:1591#0/1
aaaYD[aaa`aaaaWZaaaaa``_aaaa`aa`_V_``Y[a

40-mer sequences
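Each read in the file above takes up four lines:

- an `@` header
- the sequence
- a `+` separator (here repeating the header)
- one quality character per base

Below is a minimal Python reader for this layout. It assumes unwrapped four-line records as shown; FASTQ files that wrap long sequences over several lines are not handled. `read_fastq` and `FastqRecord` are hypothetical names.

```python
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header without the leading "@"
    sequence: str
    quality: str   # one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file made of unwrapped 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)
```

Typical use: `with open("../fastq/sample1.fq") as fh:` followed by `for rec in read_fastq(fh): ...`.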


Read filtering

•Adapter dimers.

•Fastq quality format (Phred, Illumina pre-1.3, Illumina post-1.3). http://maq.sourceforge.net/qual.shtml

•Poly(A).

•Non-target organism.
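The three quality formats differ in ASCII offset and, for the oldest, in scale:

- Phred (Sanger) qualities use offset 33.
- Illumina post-1.3 uses Phred scores with offset 64.
- Illumina pre-1.3 uses Solexa scores with offset 64, which differ from Phred at low qualities.

The qualities in the FASTQ example above decode to sensible values only with offset 64. The sketch below decodes a quality string and then applies two of the filters listed above. It is an illustration, not the author's pipeline. It assumes the adapter begins with `AGATCGGAAGAGC`, a common Illumina adapter prefix that appears in the fourth read of the FASTQ example. The minimum length and poly(A) cut-offs are arbitrary, hypothetical values, as are the function names.

```python
import math

# ASCII offsets for the encodings named on the slide.
OFFSETS = {"phred": 33, "illumina_pre1.3": 64, "illumina_post1.3": 64}


def solexa_to_phred(q_solexa: float) -> float:
    """Qphred = 10*log10(10**(Qsolexa/10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_quality(qual: str, encoding: str) -> list:
    """Return per-base Phred-scale qualities for a FASTQ quality string."""
    scores = [ord(c) - OFFSETS[encoding] for c in qual]
    if encoding == "illumina_pre1.3":  # Solexa scale, convert to Phred
        scores = [solexa_to_phred(s) for s in scores]
    return scores


ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumed adapter start; check your library


def filter_read(seq: str, min_len: int = 20, min_polya: int = 8):
    """Cut at the adapter, strip a trailing poly(A) run of at least
    `min_polya` bases, and return None if fewer than `min_len` bases remain.
    Both thresholds are illustrative assumptions."""
    cut = seq.find(ADAPTER_PREFIX)
    if cut != -1:
        seq = seq[:cut]  # adapter dimers shrink to (almost) nothing
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) >= min_polya:
        seq = stripped
    return seq if len(seq) >= min_len else None
```

Quality trimming and removal of non-target reads (e.g. by aligning against a contaminant reference) would be further steps in the same spirit.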


Alignment software

•Bowtie: http://bowtie-bio.sourceforge.net/index.shtml

Persistent index

Heterogeneous read lengths

•BWA: http://bio-bwa.sourceforge.net/

Persistent index

Heterogeneous read lengths

Gapped alignment

•CASHX: http://jcclab.science.oregonstate.edu/?q=node/view/56095

Non-Smith-Waterman alignment

•SAMtools: http://samtools.sourceforge.net/

Manipulate SAM files.


Index creation - Bowtie

mkdir btindex
mv rna_ref.fa btindex
cd btindex
bowtie-build rna_ref.fa rna_ref   # build index files with the prefix rna_ref
bowtie-inspect -s rna_ref         # print a summary of the indexed references
cd ..


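Read alignment - Bowtie

mkdir btout
cd btout

bowtie -q -n 2 -S ../btindex/rna_ref ../fastq/sample1.fq > sample1.sam   # -q FASTQ input, -n 2 seed mismatches, -S SAM output

samcounter.pl -a ../btindex/rna_ref.fa -b sample1.sam

samtools view -b -S sample1.sam > sample1.bam   # SAM to BAM

samtools sort sample1.bam sample1   # old syntax (output prefix); newer samtools expects -o

samtools pileup -f ../btindex/rna_ref.fa sample1.bam > sample1-pileup.txt   # newer samtools replaces pileup with mpileup

samtools index sample1.bam   # writes sample1.bam.bai for random access (e.g. tview)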
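`samtools sort` orders alignments by position so that later steps such as `pileup` and `index` can walk each reference from start to end. The Python sketch below shows what "sorted by coordinate" means on SAM text. It is only an illustration, not how SAMtools handles BAM files internally. It orders alignments first by reference, following the order of the `@SQ` header lines, then by leftmost position, and puts unmapped reads last. `sort_sam_lines` is a hypothetical name.

```python
def sort_sam_lines(lines):
    """Return SAM lines with headers first, then alignments sorted by
    (reference order from @SQ lines, leftmost position); unmapped last."""
    headers, alignments = [], []
    for line in lines:
        (headers if line.startswith("@") else alignments).append(line)

    # Reference order follows the @SQ header lines.
    ref_order = {}
    for h in headers:
        if h.startswith("@SQ"):
            for field in h.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def key(line):
        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            return (len(ref_order) + 1, "", 0)  # unmapped reads go last
        # References missing from @SQ sort after known ones, by name.
        return (ref_order.get(rname, len(ref_order)), rname, pos)

    return headers + sorted(alignments, key=key)
```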


Read alignment - Bowtie

#!/bin/tcsh

set index='../btindex/rna_ref'
set reads='../fastq/sample1.fq'
set samp='sample1'

##### ##### ##### ##### ###### Main.

bowtie -q -n 2 -S $index $reads > $samp.sam
samcounter.pl -a $index.fa -b $samp.sam
samtools view -b -S $samp.sam > $samp.bam
samtools sort $samp.bam $samp
samtools pileup -f $index.fa $samp.bam > $samp-pileup.txt
samtools index $samp.bam

##### ##### ##### ##### ###### EOF.
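The script's variables (`index`, `reads`, `samp`) are what make it reusable for other samples. The Python sketch below applies the same idea by building the identical command lines for any sample name, so they can be checked before anything is run. It assumes the directory layout used in these slides (`../btindex`, `../fastq/<sample>.fq`). `pipeline_commands` is a hypothetical helper, and `sample2` is a hypothetical second sample.

```python
INDEX = "../btindex/rna_ref"  # Bowtie index prefix, as in the script above


def pipeline_commands(samp: str, index: str = INDEX) -> list:
    """Return the script's shell commands for one sample, in order."""
    reads = f"../fastq/{samp}.fq"  # assumes one FASTQ per sample in ../fastq
    return [
        f"bowtie -q -n 2 -S {index} {reads} > {samp}.sam",
        f"samcounter.pl -a {index}.fa -b {samp}.sam",
        f"samtools view -b -S {samp}.sam > {samp}.bam",
        f"samtools sort {samp}.bam {samp}",
        f"samtools pileup -f {index}.fa {samp}.bam > {samp}-pileup.txt",
        f"samtools index {samp}.bam",
    ]


if __name__ == "__main__":
    for sample in ["sample1", "sample2"]:  # sample2 is hypothetical
        print("\n".join(pipeline_commands(sample)))
```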


Read alignment - Bowtie

#!/bin/tcsh

set index="../btindex/rna_ref"
set reads="../fastq/sample1.fq"
set samp="sample1"

##### ##### ##### ##### ###### Main.

easyqsub.pl -a "bowtie -q -n 2 -S $index $reads > $samp.sam"

easyqsub.pl -a "samcounter.pl -a $index.fa -b $samp.sam"

easyqsub.pl -a "samtools view -b -S $samp.sam > $samp.bam"

easyqsub.pl -a "samtools sort $samp.bam $samp"

easyqsub.pl -a "samtools pileup -f $index.fa $samp.bam > $samp-pileup.txt"

easyqsub.pl -a "samtools index $samp.bam"

##### ##### ##### ##### ###### EOF.


Alignment viewer - SAMtools

samtools tview sample1.bam ../btindex/rna_ref.fa


Index creation - BWA

mkdir bwaindex
cp btindex/rna_ref.fa bwaindex/
cd bwaindex
bwa index -a is -p rna_ref rna_ref.fa   # -a is: algorithm for references under 2 GB; -p: output prefix


Read alignment - BWA

cd ..
mkdir bwaout
cd bwaout

bwa aln -o 0 ../bwaindex/rna_ref ../fastq/sample1.fq > sample1.sai   # -o 0: no gap opens (ungapped)

bwa samse ../bwaindex/rna_ref sample1.sai ../fastq/sample1.fq > sample1.sam   # single-end .sai to SAM

samcounter.pl -a ../btindex/rna_ref.fa -b sample1.sam

samtools view -b -S sample1.sam > sample1.bam

samtools sort sample1.bam sample1

samtools pileup -f ../btindex/rna_ref.fa sample1.bam > sample1-pileup.txt

samtools index sample1.bam


Alignment viewer - SAMtools

samtools tview sample1.bam ../btindex/rna_ref.fa


SAM file format

@HD VN:1.0 SO:sorted
@PG TopHat VN:1.0.13 CL:/local/cluster/bin/tophat -p 4 --solexa1.3-quals ../indexes/psme_ref ../psme_seqs.fq
ILLUMINA-3AB384_0001:6:24:19059:8781#GATT 0 0_54_255 1 255 80M * 0 0 TCTTCTTCATGTTTGGCACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTATCATTATTGGTCAGTT CCCCCCCCCCCCCCCCDCCCCCCCC@CBCBBCCBCCCCCCCCCCCCCCCCCCCDCD@C@CCCC4=CCBCCCCAC>B>BBC NM:i:1
HWI-EAS121_0024_FC61F8DAAXX:7:101:7452:15154#CTGT 0 0_54_255 17 255 76M * 0 0 CACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTGTCATTATTGGTCAGTTATGACCTTAATT GGGGGGGGGGFEGFFGFEEFFBEECEFFFFFGGDGFDDGE:FBBFEGFFD?DEDEFB=DDD=ECCC=EAACDEDC= NM:i:0

@HD header line: file format version (VN) and sort order (SO).
@PG header line: program that created the file (here TopHat), with its version and command line.

Alignment fields (tab-delimited):
1 Query (read) name
2 Bitwise flag
3 Reference name
4 Leftmost mapping position (1-based)
5 Mapping quality
6 CIGAR string
7 Reference name of mate
8 Position of the mate
9 Template length
10 Fragment sequence
11 Fragment quality
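Optional tags follow the 11 mandatory columns listed above. For example, `NM:i:1` gives the edit distance to the reference. Below is a minimal Python sketch of three steps:

- splitting one alignment line into its fields
- decoding a few FLAG bits
- using the CIGAR string to find the last reference base the read covers

Only CIGAR operations that consume reference bases (M, D, N, =, X) count toward that span. The function names are hypothetical.

```python
import re

FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
          "rnext", "pnext", "tlen", "seq", "qual"]


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its 11 mandatory fields plus tags."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    rec["tags"] = cols[11:]
    return rec


def decode_flag(flag: int) -> dict:
    """A few commonly used FLAG bits (the SAM specification lists all)."""
    return {
        "paired": bool(flag & 0x1),
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
        "secondary": bool(flag & 0x100),
    }


def reference_end(pos: int, cigar: str) -> int:
    """1-based position of the last reference base covered by the alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    consumed = sum(int(n) for n, op in ops if op in "MDN=X")
    return pos + consumed - 1
```

For the second read in the example above (position 17, CIGAR `76M`), `reference_end(17, "76M")` gives 92.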


Gene cb_a cb_b yk_a yk_b
isotig18613_gene=isogroup07808_length=677_numContigs=1 17 18 139 159
isotig01880_gene=isogroup00225_length=652_numContigs=4 11 10 162 56
isotig07160_gene=isogroup01638_length=3698_numContigs=4 31 81 276 226
isotig06362_gene=isogroup01321_length=1396_numContigs=4 32 31 149 91
isotig06005_gene=isogroup01197_length=1204_numContigs=4 52 68 169 198
isotig06363_gene=isogroup01321_length=1470_numContigs=4 21 27 73 100
contig29123_gene=isogroup00629_length=686 30 15 75 161
isotig30058_gene=isogroup19254_length=1101_numContigs=1 31 36 75 400
contig50604_gene=isogroup01657_length=1247 272 405 1153 724
contig21101_gene=isogroup01657_length=559 47 96 264 165
isotig05419_gene=isogroup01011_length=1938_numContigs=4 32 49 103 126
contig03433_gene=isogroup00629_length=496 21 10 55 71
isotig05877_gene=isogroup01156_length=2570_numContigs=4 91 70 154 762
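Each row of the count table above is an isotig or contig. Its name records the isogroup (gene) it belongs to, so one gene can span several rows: isogroup01321, isogroup00629 and isogroup01657 each appear twice. The Python sketch below sums the counts per isogroup. It assumes the whitespace-separated layout shown, a header line starting with `Gene` followed by sample columns such as `cb_a cb_b yk_a yk_b`, and row names containing `gene=isogroupNNNNN`. `counts_by_isogroup` is a hypothetical name.

```python
import re
from collections import defaultdict

_GENE = re.compile(r"gene=(isogroup\d+)")


def counts_by_isogroup(lines):
    """Sum per-sample read counts over all isotigs/contigs of each isogroup.

    Expects a header line ('Gene' followed by sample names) and then one
    whitespace-separated row per isotig/contig.
    """
    samples = lines[0].split()[1:]
    totals = defaultdict(lambda: [0] * len(samples))
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        name, *counts = line.split()
        match = _GENE.search(name)
        gene = match.group(1) if match else name  # fall back to the row name
        for i, c in enumerate(counts):
            totals[gene][i] += int(c)
    return samples, dict(totals)
```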


Strand specificity

Parkhomchuk et al. 2009. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Research 37(18):e123
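With a strand-specific library, the orientation of each alignment carries information. In SAM, bit 0x10 of the FLAG marks a read aligned to the reverse strand of the reference. The Python sketch below counts sense and antisense alignments per reference. It assumes single-end reads aligned to transcript sequences whose forward strand is the sense strand. Which orientation counts as "sense" depends on the library protocol; in dUTP-based libraries, for example, single-end reads are typically reverse-complementary to the transcript. The `sense_is_reverse` parameter must be set to match. `strand_counts` is a hypothetical name.

```python
from collections import Counter


def strand_counts(sam_lines, sense_is_reverse: bool = True) -> dict:
    """Count sense and antisense alignments per reference from SAM text.

    Assumes single-end reads aligned to transcripts whose forward strand
    is the sense strand; `sense_is_reverse` must match the library protocol.
    """
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.split("\t")
        flag, rname = int(cols[1]), cols[2]
        if flag & 0x4:
            continue  # unmapped read
        on_reverse = bool(flag & 0x10)
        label = "sense" if on_reverse == sense_is_reverse else "antisense"
        counts.setdefault(rname, Counter())[label] += 1
    return counts
```

A large antisense fraction on a transcript can point either to genuine antisense transcription or to a wrong assumption about the protocol, so the setting is worth confirming on well-known genes.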
