Date post: 15-Jan-2020
The CCBR RNA-Seq Pipeline Fathi Elloumi, Ph.D NCI CCBR 3/20/2017
Page 1: The CCBR RNA-SeqPipeline - National Cancer Institute · Base quality distribution Warning if the lower quartile for any base is less than 10, or if the median for any base is less

The CCBR RNA-Seq Pipeline

Fathi Elloumi, Ph.D.

NCI CCBR
3/20/2017

Page 2

Agenda

• Introduction
  • Data analysis workflow
  • Review main steps

• CCBR RNA-Seq pipeline
  • Workflow overview
  • Quality Control reports
  • Principal Component Analysis (PCA) and differential expression reports
  • Downstream analysis after running the pipeline

• Running the CCBR pipeline
  • Use case and demo

Page 4

RNA-Seq Applications

• Differential Gene Expression
• Differential Transcript Expression
• Still confined to known transcripts/isoforms

• Transcript Discovery / Whole Transcriptome Profiling
  • Interest is in looking for new isoforms or unannotated genes

• Others
  • SNP / Somatic Variant / Gene Fusion Detection

Page 5

RNA-Seq project overview

[Figure: flow diagram with the stages Hypothesis, Experimental Design, Prepare Samples, RNA Sequencing, and QC and Data Analysis; samples are shown as Group 1 and Group 2]

- RNA extraction protocol
- Depth
- Library type (SE/PE)
- Number of replicates
- …

Page 6

Best Practices

• Factor in at least 3 replicates (absolute minimum), but 4 if possible (optimum minimum). Biological replicates are recommended rather than technical replicates.

• Always process your RNA extractions at the same time. Extractions done at different times lead to unwanted batch effects.

• There are 2 major considerations for RNA-Seq libraries:
  • If you are interested in coding mRNA, you can select to use the mRNA library prep. The recommended sequencing depth is between 10-20M paired-end (PE) reads. Your RNA has to be high quality (RIN > 8).
  • If you are interested in long noncoding RNA as well, you can select the total RNA method, with sequencing depth ~25-60M PE reads. This is also an option if your RNA is degraded.

• Ideally, to avoid lane batch effects, all samples would need to be multiplexed together and run on the same lane. This may require an initial MiSeq run for library balancing. Additional lanes can be run if more sequencing depth is needed.

• If you are unable to process all your RNA samples together and need to process them in batches, make sure that replicates for each condition are in each batch so that the batch effects can be measured and removed bioinformatically.

https://bioinformatics.cancer.gov/content/rna-seq

Page 8

Typical RNA-Seq analysis workflow

[Figure: workflow flowchart starting at the sequencing facility. Recoverable elements:]

• Steps: QC raw data, Trimming, Alignment, QC aligned data, Expression quantification, Differential expression analysis, Clustering & visualization
• Data: raw reads (fastq files), trimmed reads (fastq files), BAM files, gene and transcript counts
• Output of both QC steps: QC metrics and plots
• Decision point: Good QC?

Page 9

Quality control (QC) of raw data

• Detect issues related to sample collection, library preparation or sequencing
• Need to check
  • Base quality score
  • Sequence quality
  • Sequence duplication level
  • GC content level
  • Presence of contaminants
    • bacteria or virus
  • Adaptor presence

Page 10

Alignment & quantification

[Figure: tools named on the slide include HTSeq and Subread]

Page 11

Post-alignment QC

• % mapped and uniquely mapped reads: 70-90%
• Uniformity of read coverage over gene body
• Read distribution
• Check for read strandedness
• Biotype composition (check for rRNA)

Page 12

Differential expression analysis

• What are the genes or transcripts that are differentially expressed between two or more groups?
  • Do a statistical test:
    • t-test
    • Empirical Bayes (moderated t-test)
    • ANOVA (>2 groups)
    • …
  • Adjust for multiple testing (FDR, …)
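
A minimal Python sketch of the multiple-testing adjustment step, using the Benjamini-Hochberg procedure as one common way to control the FDR. The slide does not say which FDR method is applied, so both the choice of Benjamini-Hochberg and the function name `benjamini_hochberg` are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative helper (not part of the CCBR pipeline).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy per-gene p-values (made up for the example)
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below the chosen FDR level (0.05 is a frequent choice) would then be reported as differentially expressed.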

Page 13

Known differential expression detection methods

Comparison of software packages for detecting differential expression in RNA-seq studies. Briefings in Bioinformatics, vol. 16, no. 1, 59-70.

Page 14

Normalization using scaling methods: overall gene expression is assumed to be the same across all samples

Method: Description

Total count (TC): Gene counts are divided by the total number of mapped reads (or library size) associated with their sample and multiplied by the mean total count across all the samples of the dataset.

Upper Quartile (UQ): Very similar in principle to TC; the total counts are replaced by the upper quartile of counts different from 0 in the computation of the normalization factors.

Median (Med): Also similar to TC; the total counts are replaced by the median of counts different from 0 in the computation of the normalization factors.

DESeq: A scaling factor for a given sample is the median of the ratio, for each gene, of its read count over its geometric mean across all samples.

Trimmed Mean of M-values (TMM): A scaling factor is computed as the weighted mean of log ratios between the sample and the reference, after exclusion of the most expressed genes and the genes with the largest log ratios.
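
To make two of the definitions above concrete, here is a small Python sketch of the total count (TC) normalization and the DESeq-style median-of-ratios scaling factor. It follows the descriptions in the table only; the function names and the toy counts are assumptions for illustration, and this is not the code used by DESeq2 or edgeR.

```python
import math
import statistics


def total_count_normalize(counts):
    """TC: divide by the sample's library size, multiply by the mean library size.

    counts: dict mapping sample name -> list of gene counts (same gene order).
    """
    totals = {s: sum(c) for s, c in counts.items()}
    mean_total = statistics.mean(list(totals.values()))
    return {s: [x / totals[s] * mean_total for x in c] for s, c in counts.items()}


def deseq_size_factors(counts):
    """DESeq-style factor: median over genes of count / geometric mean across samples.

    Genes with a zero count in any sample are skipped (their geometric mean is 0).
    Assumes at least one gene is non-zero in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = statistics.mean([math.log(v) for v in values])
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lg for g, lg in log_geo_means.items()]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors


if __name__ == "__main__":
    toy = {"A": [10, 20, 30], "B": [20, 40, 60]}  # sample B sequenced twice as deep
    print(total_count_normalize(toy))
    print(deseq_size_factors(toy))
```

In the toy data sample B is simply sample A at twice the depth, so both methods bring the two samples onto the same scale.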

Page 15

Principal Component Analysis

• Method for dimension reduction to identify patterns (thousands of genes = thousands of dimensions)

The eigenvector of the covariance matrix with the largest eigenvalue (the largest variance) is the first principal component. The eigenvector with the second largest eigenvalue gives the direction of the second largest variance.
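
As a toy illustration of the eigenvector statement, the Python sketch below finds the first principal component of a tiny data set by power iteration on the sample covariance matrix. The function name, the start vector and the iteration count are assumptions; real RNA-Seq data with thousands of genes would normally go through an SVD-based routine rather than this explicit covariance matrix.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the first principal component.

    rows: list of samples, each a list of feature values.
    Assumes the data are not all identical (non-zero covariance).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary start vector, assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along v)
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue


if __name__ == "__main__":
    # Four toy samples lying exactly on the line y = 2x
    direction, variance = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
    print(direction, variance)
```

Because the toy points lie on one line, all of the variance is captured by the first component, whose direction is proportional to (1, 2).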

Page 16

Hierarchical Clustering

Dendrogram/tree

• branching diagram representing a hierarchy of categories based on degree of similarity
• can be drawn for genes and/or samples

[Figure labels: root, branches, leaves]

Algorithms for clustering:

Bottom-up: agglomerative

Heatmap

Page 18

RNA-Seq Pipeline workflow

STEP 1: INITIAL QC

STEP 2: COUNTING & DEG

Page 19

RNA-Seq: Initial QC workflow

- Trimmomatic: just adaptor clipping
- STAR 2-pass mode: for most sensitive novel junction discovery

Page 20

Use case: 4 samples from SEQC study

• Mixture of biological sources and a set of synthetic RNAs from the External RNA Controls Consortium (ERCC)
  • 2 samples from group A: Stratagene Universal Human Reference RNA (UHRR), from 10 human cell lines
  • 2 samples from group B: Ambion Human Brain Reference RNA (HBRR)
  • Illumina HiSeq 2000, 100 bp

Page 21

Base quality (Q score)

Q = -10 log10(P), where P is the base-calling error probability
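
The formula can be checked with a few lines of Python. This is only a sketch of the conversion in both directions; the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Return the base-calling error probability P for a Phred score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Return Q = -10 * log10(P) for an error probability P in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("P must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: P = {phred_to_error_prob(q):g}")
```

A score of Q20 therefore corresponds to 1 wrong call in 100 bases, and Q30 to 1 in 1000.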

Page 22

Sample QC report

Page 23

Base quality distribution

Warning if the lower quartile for any base is less than 10, or if the median for any base is less than 25. Failure if the lower quartile for any base is less than 5 or if the median for any base is less than 20.

Common reasons for warnings
- General degradation of quality over the duration of long runs
- Loss of quality earlier in the run (bubbles in flowcell)
- Reads of different length
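
The warning and failure thresholds quoted on this slide can be written as a small Python check. This is a sketch assuming the per-position quality scores are already available as lists; the function name and input layout are made up, and the quartile is computed with Python's `statistics.quantiles`, which may differ slightly from how the QC tool computes it.

```python
import statistics


def per_base_quality_status(qualities_by_position):
    """Classify per-base quality as 'pass', 'warn' or 'fail'.

    qualities_by_position: list of lists; each inner list holds the Phred
    scores observed at one read position (at least two scores each).
    """
    status = "pass"
    for scores in qualities_by_position:
        lower_quartile = statistics.quantiles(scores, n=4)[0]
        median = statistics.median(scores)
        # Failure thresholds from the slide: lower quartile < 5 or median < 20
        if lower_quartile < 5 or median < 20:
            return "fail"
        # Warning thresholds from the slide: lower quartile < 10 or median < 25
        if lower_quartile < 10 or median < 25:
            status = "warn"
    return status


if __name__ == "__main__":
    good = [[38] * 10, [37] * 10]
    fading = [[38] * 10, [22] * 10]  # quality drops at the second position
    print(per_base_quality_status(good), per_base_quality_status(fading))
```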

Page 24

Tile issues (bubble, smudge or debris in lane)

Flowcell tile heatmap showing deviation from the average quality for each tile

Failure if any tile shows a mean Phred score more than 5 less than the mean for that base across all tiles

A good plot should be all blue!
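
The failure rule for tiles can be sketched in Python as follows. The input layout (a mapping from tile ID to mean Phred score per base position), the tile IDs and the function name are assumptions for illustration only.

```python
def failing_tiles(tile_means, max_drop=5):
    """Return the tiles whose mean Phred score at any base position is more
    than `max_drop` below the mean for that position across all tiles.

    tile_means: dict mapping tile ID -> list of mean Phred scores per position.
    """
    tiles = list(tile_means)
    n_positions = len(tile_means[tiles[0]])
    failing = set()
    for pos in range(n_positions):
        overall = sum(tile_means[t][pos] for t in tiles) / len(tiles)
        for t in tiles:
            if overall - tile_means[t][pos] > max_drop:
                failing.add(t)
    return sorted(failing)


if __name__ == "__main__":
    # Hypothetical tile IDs; tile 1102 loses quality at the second position
    example = {"1101": [36, 36], "1102": [36, 20], "1103": [36, 37]}
    print(failing_tiles(example))
```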

Page 25

Check proportion of sequences with low quality values

Failure if the most frequently observed mean quality is below 20

For a bi-modal or complex distribution, check against the per-tile qualities

Page 26

Per base sequence content should be uniform

Biased fragmentation

RNA-Seq libraries produce biased sequence composition at the start of the read (10-12 bp); this does not affect downstream analysis

Page 27

GC content should be a normal distribution

Contaminant issue (adapter dimers = pairs of ligated adapters with no insert sequence). Need to check overrepresented sequences.

Page 28

No-call distribution

Biased sequence composition

Expected / check with base quality

Page 29

All sequences should have the same length

Page 30

High duplication level should be carefully assessed

- Technical duplicates (PCR over-amplification)
- Biological duplicates
  - Small RNA library
  - Over-sequencing of highly expressed transcripts in order to observe low-expressed ones

Page 31

Check for adapter sequence

If insert sizes are shorter than the read length -> need to remove adapter sequence

Page 32

Check for contamination in over-represented sequences:

Error if any sequence is found to represent more than 1% of the total
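
A minimal Python sketch of the 1% rule, assuming the reads are available as a list of strings. The function name and the toy reads are illustrative; an actual QC tool may sample or truncate reads before counting.

```python
from collections import Counter


def overrepresented(sequences, threshold=0.01):
    """Return {sequence: fraction} for sequences above `threshold` of the total."""
    counts = Counter(sequences)
    total = len(sequences)
    return {seq: n / total for seq, n in counts.items() if n / total > threshold}


if __name__ == "__main__":
    # 20 copies of one made-up sequence among 1000 reads (2% of the total)
    reads = ["AGATCGGAAGAGC"] * 20 + [f"READ{i}" for i in range(980)]
    print(overrepresented(reads))
```

Any sequence returned here would then be compared against known adapters or contaminants.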

Page 33

FastqScreen: look for bacteria/virus contamination

Page 34

MultiQC report

Page 35

MultiQC: Multiple samples report

Page 36

MultiQC report: Mapping stats

Number of mapped reads
Mapping rate: 70-90%

Page 37

MultiQC report: Picard duplication rate by paired reads

Page 38

MultiQC report: Picard

Page 39

MultiQC report: RNA quality check

Degraded RNA showing 3' bias in coverage

Page 40

MultiQC report: RSeQC

Page 41

MultiQC report: Exons coverage

Page 42

MultiQC report: Count check

Checking unassigned rate for overlapping regions and multi-mapping reads

Page 43

RNA-Seq: Differential expression workflow

Page 44

RNA-Seq: PCA report

Page 45

RNA-Seq: edgeR DEG report (limma and DESeq2 also available)

Page 46

RNA-Seq: edgeR DEG report

Page 47

Page 48

EdgeR_deg_HBRR_vs_UHRR.txt

Page 49

What is the method to use?

DEG Venn diagram

No clear answer!

Compare results:

- PCA
- Sample clustering
- DEG results

Page 50

Visualization and enrichment analysis

• Cluster the samples based on the top-ranked genes (sd, MAD, IQR, …)
• Pathway enrichment (GSEA, IPA, …)
• Easy use of DEG files
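
Selecting the top-ranked genes before clustering can be sketched as below, here using the median absolute deviation (MAD) as the variability measure. The function names, gene names and values are made up for the example; sd or IQR could be swapped in the same way.

```python
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_variable_genes(expression, n_top):
    """Return the `n_top` gene names with the largest MAD across samples.

    expression: dict mapping gene name -> list of per-sample expression values.
    """
    ranked = sorted(expression, key=lambda g: mad(expression[g]), reverse=True)
    return ranked[:n_top]


if __name__ == "__main__":
    toy = {
        "geneA": [5, 5, 5, 5],   # flat across samples
        "geneB": [1, 9, 1, 9],   # strongly variable
        "geneC": [4, 5, 6, 5],   # mildly variable
    }
    print(top_variable_genes(toy, 2))
```

The selected genes would then be passed to a clustering or heatmap routine.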

Page 51

Dealing with batch effect

• Incorporate the batch effect as a covariate in the model

Page 52

Viewing RNA-Seq data

• Integrative Genomics Viewer (IGV)
  • Read alignments
  • Splice junctions

Page 54

CCBR Pipeliner

• Currently offers 3 NGS data workflows: RnaSeq, ExomeSeq and GenomeSeq.
• Each workflow:
  ✓ is version-aware
  ✓ is modular and extensible
    • Multiple options/programs can be selected for a task.
  ✓ is reproducible
    • uses a config file
  ✓ maintains an audit trail (as a log file)
  ✓ runs on the NIH cluster and uses the queue system
  ✓ informs the user, via email, once the run is complete

Page 55

Data preparation / Input

• Pipeliner takes in raw paired-end NGS data: fastq.gz files
• Fastq naming convention:
  • <samplename>.R1.fastq.gz
  • <samplename>.R2.fastq.gz
• Pipeliner can convert file names to the desired naming convention
  • labels.txt: two-column text file
    • SampleA_R1_001.fastq.gz TumR1_Batch1.R1.fastq.gz
• For DEG, you need to know the phenotype/group for the samples and the contrasts for differential analysis
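
Before launching a run it can help to check a labels.txt mapping against the naming convention above. The Python sketch below does this under two assumptions not stated on the slide: the two columns are separated by whitespace, and the sample name may contain any characters. The regular expression and function name are illustrative and are not taken from Pipeliner.

```python
import re

# Assumed pattern for <samplename>.R1.fastq.gz / <samplename>.R2.fastq.gz
NAME_PATTERN = re.compile(r"^(?P<sample>.+)\.R(?P<read>[12])\.fastq\.gz$")


def parse_labels(text):
    """Parse two-column labels text into {original name: new name}.

    Raises ValueError if a new name does not follow the naming convention.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        old_name, new_name = line.split()
        if not NAME_PATTERN.match(new_name):
            raise ValueError(f"{new_name!r} does not match <samplename>.R1/R2.fastq.gz")
        mapping[old_name] = new_name
    return mapping


if __name__ == "__main__":
    labels = "SampleA_R1_001.fastq.gz TumR1_Batch1.R1.fastq.gz\n"
    print(parse_labels(labels))
```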

Page 56

“groups.tab” file

Sample name    Group      Sample label
sample1        treat      treat1
sample2        treat      treat2
sample3        treat      treat3
sample4        control    ctrl1
sample5        control    ctrl2
sample6        control    ctrl3
…              …          …

Mandatory fields (without labels)

Only one factor (you can simulate a multifactor variable)

Page 57

“contrasts.tab” file

Group 1    vs. Group 2
treat      control
…          …
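
A quick consistency check between the two files can catch typos before a DEG run. The Python sketch below assumes both files are whitespace-separated and carry no header row (the column titles shown on the slides are taken to be annotations only); the function names are made up for illustration.

```python
def parse_groups(text):
    """Return {sample name: group} from groups.tab-style text (no header row)."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            sample, group, _label = line.split()
            groups[sample] = group
    return groups


def check_contrasts(groups_text, contrasts_text):
    """Return the (group1, group2) contrasts, raising if a group is not in groups.tab."""
    known = set(parse_groups(groups_text).values())
    contrasts = []
    for line in contrasts_text.splitlines():
        if line.strip():
            group1, group2 = line.split()
            missing = {group1, group2} - known
            if missing:
                raise ValueError(f"unknown group(s) in contrast: {sorted(missing)}")
            contrasts.append((group1, group2))
    return contrasts


if __name__ == "__main__":
    groups_tab = "sample1 treat treat1\nsample4 control ctrl1\n"
    contrasts_tab = "treat control\n"
    print(check_contrasts(groups_tab, contrasts_tab))
```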

Page 58

CCBR RNA-Seq Pipeline (Initial QC)

Working directory: /data/<user>/…

Data directory: /scratch/elloumif/SEQC4/

Page 59

CCBR RNA-Seq Pipeline (DEG Analysis)

Working directory: /data/<user>/…

Data directory: /scratch/elloumif/SEQC4/

Page 60

RNA-Seq Output: Main directories

• rawQC: FastQC results on raw data
• Trim: trimmed data (adaptor cut)
• QC: FastQC results on trimmed data
• FQscreen: FastqScreen results (trimmed data)
• Reports: contains the MultiQC report and the main log file of the pipeline (snakemake.log)
• DEG_genes: DEG results based on gene count + HTML reports
• DEG_genejunctions: DEG results based on junction gene count + HTML reports

Page 61

DEG directory output files

• Limma* files (txt, png, html)
• Deseq2* files
• edgeR* files

Page 62

RNA-Seq Output: Main files (main working directory)

• Bam files (*.bam)
• Raw count data (3 methods):
  • Gene: RawCountFile_gene.txt and RawCountFile_genes_filtered.txt
  • Gene normalized data: CPM_TMM_counts.txt
  • RSEM results:
    • <sample>.rsem.genes.results
    • <sample>.rsem.isoforms.results
  • EBSEQ results:
    • <sample>isoform..EBSeq
    • <sample>.isoform.EBSeq.normalized_data_matrix
    • <sample>.isoform.EBSeq.counts.matrix
• Run.json: configuration file (run settings)

Page 63

Configuration file

Page 64

Setup before running ccbrpipeliner

• Helix and Biowulf accounts
• X11 client (Windows: PuTTY, NoMachine; Mac: XQuartz, NoMachine)
• Space:
  • Biowulf home directories have a default allocation of 100GB: not enough to run NGS pipelines.
  • Best option: have a lab-wide /data/labname storage allocation, with higher storage
• Basic knowledge of Unix commands (ssh, mkdir, vi)

Page 65

CCBR Pipeliner availability

✓ https://github.com/CCBR/Pipeliner
✓ via module “ccbrpipeliner” at Biowulf

Page 66

CCBR Pipeliner documentation

https://github.com/CCBR/Pipeliner/blob/master/PipelinerVer1.0_documentation.pdf

Page 67

Demo

Page 68

Use case: 4 samples from SEQC study

• Mixture of biological sources and a set of synthetic RNAs from the External RNA Controls Consortium (ERCC)
  • 2 samples from group A: Stratagene Universal Human Reference RNA (UHRR), from 10 human cell lines
  • 2 samples from group B: Ambion Human Brain Reference RNA (HBRR)
  • Illumina HiSeq 2000, 100 bp

Page 69

Input files

• Fastq files
• Labels.txt
• Groups.tab
• Contrasts.tab

Page 70

Output files

• FastQC report
• MultiQC report
• PCA report
• EdgeR report
• Raw count files
• Normalized data files

Page 71

Q&A