Integrative Biology Program, Istituto Nazionale di Genetica Molecolare, Italy
Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data
Raoul J.P. Bonnal
July 15, 2011 BOSC, Vienna, Austria
Co-authors: Francesco Strozzi, Valeria Ranzani, Toshiaki Katayama
Bio-Gem
• A software generator for creating BioRuby plugins
• Last year (@BOSC 2010) it was an idea and a prototype
• Features:
– Extend BioRuby
– Modularity
– Easy: sharing, packaging, publishing
– Just code!
authors: Raoul J.P. Bonnal, Pjotr Prins, Toshiaki Katayama
• bio-assembly (0.1.0)
• bio-blastxmlparser (0.6.1)
• bio-bwa (0.2.2)
• bio-cnls_screenscraper (0.1.0)
• bio-emboss_six_frame_nucleotide_sequences (0.1.0)
• bio-gem (0.2.2)
• bio-genomic-interval (0.1.2)
• bio-gff3 (0.8.6)
• bio-graphics (1.4)
• bio-hello (0.0.0)
• bio-isoelectric_point (0.1.1)
• bio-kb-illumina (0.1.0)
• bio-lazyblastxml (0.4.0)
• bio-logger (0.9.0)
• bio-nexml (0.0.1)
• bio-ngs (0.2.1)
• bio-octopus (0.1.1)
• bio-samtools (0.2.4)
• bio-sge (0.0.0)
• bio-tm_hmm (0.2.0)
• bio-ucsc-api (0.1.0)
Dev: https://github.com/helios/bioruby-gem
Install: gem install bio-gem
Bio-NGS
• An Application
• A Software Development Framework
• A Project Environment
Application
• Stand alone
– Auto install everything it needs
– Sandbox/isolation
– System-wide or user (RVM, Ruby Version Manager)
• Multi-platform
– Linux, OS X
– MRI, JRuby
• Command line
– Thor: a simple and efficient tool for building self-documenting command line utilities
• Common syntax to different applications
• Collection of tasks
– Basic, Advanced
RVM: https://rvm.beginrescueend.com/
Thor: https://github.com/wycats/thor
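The "common syntax" idea can be illustrated without Thor itself. The sketch below is a plain-Ruby stand-in, not Thor's API: a hypothetical `TaskRegistry` maps colon-separated names such as `bwa:index:short` to blocks and prints a help listing grouped by namespace. The task bodies and file names are placeholders.

```ruby
# Plain-Ruby stand-in for a namespaced task runner (NOT Thor's API).
# TaskRegistry and its methods are hypothetical names for illustration.
class TaskRegistry
  Task = Struct.new(:name, :usage, :action)

  def initialize(program)
    @program = program
    @tasks = {}
  end

  # Register a task under a colon-separated name, e.g. "bwa:index:short".
  def register(name, usage = "", &action)
    @tasks[name] = Task.new(name, usage, action)
  end

  # Run a task by name, passing the remaining command-line words to it.
  def run(name, *args)
    task = @tasks.fetch(name) { raise ArgumentError, "unknown task: #{name}" }
    task.action.call(*args)
  end

  # Group usage lines by top-level namespace, like a help listing.
  def help
    @tasks.values.group_by { |t| t.name.split(":").first }.map do |ns, tasks|
      lines = tasks.map { |t| "  #{@program} #{t.name} #{t.usage}".rstrip }
      ([ns, "-" * ns.length] + lines).join("\n")
    end.join("\n\n")
  end
end

registry = TaskRegistry.new("biongs")
registry.register("bwa:index:short", "[FASTA]") { |fasta| "index #{fasta} (short)" }
registry.register("quality:trim", "FASTQ") { |fastq| "trim #{fastq}" }

puts registry.help
puts registry.run("quality:trim", "reads.fastq")
```

Every tool is reached through the same `program namespace:task ARGS` shape, which is the property the slide highlights.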
Software Development Framework
• Expand BioRuby's functionalities to NGS
• API + consistent namespace
• Integrate third-party tools
– Wrapping: quick, easy support, increases productivity
– Binding: low-level functionalities
• Modular, reuses other plug-ins
– BioBwa (binding)
– BioSamtools (binding)
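As a rough illustration of what "wrapping" means here, the sketch below runs an external program from Ruby and captures its output with the standard `open3` library. The helper name `run_wrapped` is hypothetical, and the Ruby interpreter itself stands in for an NGS tool so that the example runs without any third-party binary installed.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run an external program with arguments and
# return its standard output together with a success flag.
def run_wrapped(program, *args)
  stdout, status = Open3.capture2(program, *args)
  [stdout, status.success?]
end

# The Ruby interpreter stands in for a real tool; it just echoes its arguments.
out, ok = run_wrapped(RbConfig.ruby, "-e", "print ARGV.join(' ')", "aln", "reads.fastq")
puts out
puts ok
```

A binding, by contrast, calls the tool's library functions directly instead of spawning a process, which is why the slide associates it with low-level functionality.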
Project Environment
• Directory scaffold
• Customize
– Tasks: Thor or Rake (Ruby DSL)
– Configurations: YAML
• History
• Embedded DB
– SQLite3
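A minimal sketch of the configuration and history ideas, using only Ruby's standard `yaml` library. The configuration keys and the `record` helper are hypothetical, not the bio-ngs schema, and an in-memory array replaces the embedded SQLite3 database so the example has no gem dependencies.

```ruby
require "yaml"

# Hypothetical project configuration; keys are illustrative only.
config_text = <<~YAML
  project: rnaseq_test
  threads: 6
  tools:
    tophat:
      output-dir: results/tophat
YAML

config = YAML.safe_load(config_text)

# Hypothetical history: one record per executed task.
# An array stands in for the embedded database mentioned on the slide.
def record(history, task, params)
  history << { "id" => history.size + 1, "task" => task, "params" => params }
end

history = []
record(history, "rna:tophat", config["tools"]["tophat"])

puts config["threads"]
puts history.to_yaml
```

Storing the task name with its parameters is what makes it possible to recall and rerun an earlier analysis.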
Tools
• Bowtie/BWA
• Quant
• FASTX-Toolkit
• More…
Bio-NGS
• Primary: Pre-Processing
– Conversion, Filter, FASTX-Toolkit
– More…
• Secondary: Alignment
– TopHat, Bowtie/BWA, Samtools
– More…
• Tertiary: Knowledge
– Cufflinks, Quant, Differential Expression
– Ontology
– More…
• Data formats: Illumina bcl, FASTQ, BAM
• Execution: Local, Distributed/Parallel
Wrapper
module Bio
  module Ngs
    module Cufflinks
      class Compare
        include Bio::Command::Wrapper

        set_program Bio::Ngs::Utils.binary("cufflinks/cuffcompare")
        use_aliases

        add_option "outprefix", :type => :string, :aliases => '-o', :default => "Comparison"
        add_option "gtf_combine_file", :type => :string, :aliases => '-i'
        add_option "gtf_reference", :type => :string, :aliases => '-r'
        add_option "only_overlap", :type => :boolean, :aliases => '-R'
        add_option "discard_transfrags", :type => :boolean, :aliases => '-M'
      end
    end
  end
end
irb(main):001:0> require 'bio-ngs'
irb(main):001:1> cuffcompare = Bio::Ngs::Cufflinks::Compare.new
irb(main):001:2> cuffcompare.params = {….}
irb(main):001:3> cuffcompare.run(:arguments=>[…])
=> #<Bio::Ngs::Cufflinks::Compare:0x0000000c1630f8 @program="/usr/local/lib/ruby/gems/1.9.1/gems/bio-ngs-0.2.1/lib/bio/ngs/ext/bin/linux/cufflinks/cuffcompare", @options={}, @params={}>
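To make the declarative style of the wrapper more concrete, here is a hypothetical plain-Ruby re-creation of an `add_option`-style DSL. It is not the `Bio::Command::Wrapper` source: the module name `MiniWrapper`, the `to_argv` method, and the keyword-argument signature are assumptions made for illustration, and the file names are placeholders. It only shows how declared options and user parameters could be turned into a command-line array.

```ruby
# Hypothetical re-creation of an add_option-style DSL in plain Ruby.
# This is NOT the Bio::Command::Wrapper implementation.
module MiniWrapper
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declared options, kept in declaration order.
    def options
      @options ||= {}
    end

    def add_option(name, type:, aliases: nil, default: nil)
      options[name] = { type: type, aliases: aliases, default: default }
    end
  end

  attr_writer :params

  def params
    @params ||= {}
  end

  # Turn declared options plus user params into a command-line array.
  def to_argv(arguments = [])
    argv = []
    self.class.options.each do |name, spec|
      value = params.fetch(name, spec[:default])
      next if value.nil? || value == false
      argv << (spec[:aliases] || "--#{name}")
      # Boolean options are bare flags; other types carry a value.
      argv << value.to_s unless spec[:type] == :boolean
    end
    argv + arguments
  end
end

class Compare
  include MiniWrapper
  add_option "outprefix", type: :string, aliases: "-o", default: "Comparison"
  add_option "gtf_reference", type: :string, aliases: "-r"
  add_option "only_overlap", type: :boolean, aliases: "-R"
end

cmp = Compare.new
cmp.params = { "gtf_reference" => "genes.gtf", "only_overlap" => true }
p cmp.to_argv(["sample.gtf"])
```

Declaring options once at class level means the same metadata can drive both the Ruby API and a generated command-line task.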
Tasks
No binary found with this name: setupBclToQseq.py
No binary found with this name: fastq_quality_boxplot_graph.sh
No binary found with this name: blastn
No binary found with this name: blastx
WARNING: no program is associated with BCLQSEQ task, does not make sense to create a thor task.
WARNING: no program is associated with BLASTN task, does not make sense to create a thor task.
WARNING: no program is associated with BLASTX task, does not make sense to create a thor task.

bwa
---
biongs bwa:aln:long [FASTQ] --file-out=FILE_OUT --prefix=PREFIX
biongs bwa:aln:short [FASTQ] --file-out=FILE_OUT --prefix=PREFIX
biongs bwa:index:long [FASTA]
biongs bwa:index:short [FASTA]
biongs bwa:sam:paired --fastq=one two three --file-out=FILE_OUT --prefix=PREFIX --sai=one two three
biongs bwa:sam:single [SAI] --fastq=FASTQ --file-out=FILE_OUT --prefix=PREFIX

convert
-------
biongs convert:bam:extract_genes BAM GENES --ensembl-release=N -o, --output=OUTPUT
biongs convert:bam:merge -i, --input-bams=one two three
biongs convert:bam:sort BAM [PREFIX]
biongs convert:bcl:qseq:convert RUN OUTPUT [JOBS]
biongs convert:illumina:de:gene DIFF GTF
biongs convert:illumina:de:isoform DIFF GTF
biongs convert:illumina:de:rename_qs DIFF_FILE NAMES
biongs convert:illumina:fastq:trim_b FASTQ
biongs convert:illumina:humanize:build_compare_kb GTF
biongs convert:illumina:humanize:isoform_exp GTF ISOFORM
biongs convert:qseq:fastq:by_file FIRST OUTPUT
biongs convert:qseq:fastq:by_lane LANE OUTPUT
biongs convert:qseq:fastq:by_lane_index LANE INDEX OUTPUT
biongs convert:qseq:fastq:samples_by_lane SAMPLES LANE OUTPUT

history
-------
biongs history:8 # Task convert:illumina:de:isoform PARAMETERS: /Users/bonnalraoul/Desktop/RRep16giugno/DE_lane1-2-3-4-6-8/DE_lane1-2-3-4-6-8/isoform_exp.diff /Users/bonnalraoul/Desktop/RRep16giugno/COMPARE_lane1-2-3-4-6-8/COMPA...

homology
--------
biongs homology:convert:blast2text [XML FILE] --file-out=FILE_OUT
biongs homology:convert:go2json
biongs homology:db:export [TABLE] --fileout=FILEOUT
biongs homology:db:init
biongs homology:download:all
biongs homology:download:goannotation
biongs homology:download:uniprot
biongs homology:load:blast [FILE]
biongs homology:load:goa
biongs homology:report:blast

ontology
--------
biongs ontology:db:export [TABLE] --fileout=FILEOUT
biongs ontology:db:init
biongs ontology:download:all
biongs ontology:download:go
biongs ontology:download:goslim
biongs ontology:load:genego [FILE]
biongs ontology:load:go [FILE]
biongs ontology:report:go

project
-------
biongs project:new [NAME]
biongs project:update [TYPE]

quality
-------
biongs quality:boxplot FASTQ_QUALITY_STATS
biongs quality:fastq_stats FASTQ
biongs quality:illumina_b_profile_raw FASTQ --read-length=N
biongs quality:illumina_b_profile_svg FASTQ --read-length=N
biongs quality:reads FASTQ
biongs quality:reads_coverage FASTQ_QUALITY_STATS
biongs quality:scatterplot EXPR1 EXPR2 OUTPUT
biongs quality:trim FASTQ

rna
---
biongs rna:compare GTF_REF OUTPUTDIR GTFS_QUANTIFICATION
biongs rna:idx2fasta INDEX FASTA
biongs rna:mapquant DIST INDEX OUTPUTDIR FASTQS
biongs rna:quant GTF OUTPUTDIR BAM
biongs rna:tophat DIST INDEX OUTPUTDIR FASTQS

sff
---
biongs sff:extract [FILE]
• No binary: task disabled
• Basic, Advanced
• Recall an old analysis
• Reporting
• Keep everything organized
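The "no binary, task disabled" behaviour seen in the listing can be sketched as follows. This is an assumption about the general approach, not the bio-ngs code: `find_binary` and `enabled_tasks` are hypothetical helpers that search a `PATH`-style string for an executable and keep only the tasks whose program is present. A temporary directory with a fake `bwa` script is used so the example runs anywhere.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: look for an executable in a PATH-style string.
def find_binary(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Keep only the tasks whose program is installed; warn about the others.
# `tasks` maps a task name to the program it needs.
def enabled_tasks(tasks, path = ENV.fetch("PATH", ""))
  tasks.select do |_task, program|
    found = find_binary(program, path)
    warn "No binary found with this name: #{program}" unless found
    found
  end.keys
end

Dir.mktmpdir do |dir|
  # Create a fake executable so the example does not depend on installed tools.
  fake = File.join(dir, "bwa")
  File.write(fake, "#!/bin/sh\n")
  FileUtils.chmod(0o755, fake)
  p enabled_tasks({ "bwa:index" => "bwa", "blastn" => "definitely_missing_tool" }, dir)
end
```

Checking at load time keeps the help listing honest: only tasks that can actually run are offered.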
Tasks
class Rna < Thor
  desc "mapquant DIST INDEX OUTPUTDIR FASTQS", "map and quantify"
  method_option :paired, :type => :boolean, :default => false,
    :desc => 'Are reads paired? If you chose this option pass just the basename of the file without forward/reverse and .fastq'
  def mapquant(dist, index, outputdir, fastqs)
    # tophat
    invoke :tophat, [dist, index, outputdir, fastqs], :paired => options.paired
    # cufflinks quantification on gtf
    invoke :quant, ["#{index}.gtf", File.join(outputdir, "quantification"), File.join(outputdir, "accepted_hits_sort.bam")]
  end
  …
end
class Rna < Thor
  # you'll end up with 3 accept file, regular, sorted, sorted-indexed
  desc "tophat DIST INDEX OUTPUTDIR FASTQS", "run tophat as from command line, default 6 processors and then create a sorted bam indexed."
  method_option :paired, :type => :boolean, :default => false,
    :desc => 'Are reads paired? If you chose this option pass just the…'
  Bio::Ngs::Tophat.new.thor_task(self, :tophat) do |wrapper, task, dist, index, outputdir, fastqs|
    wrapper.params = task.options # merge passed options to the wrapper.
    wrapper.params = {"mate-inner-dist" => dist, "output-dir" => outputdir, "num-threads" => 6, "solexa1.3-quals" => true}
    fastq_files = task.options[:paired] ? ["#{fastqs}_forward.fastq", "#{fastqs}_reverse.fastq"] : ["#{fastqs}"]
    wrapper.run :arguments => [index, fastq_files].flatten, :separator => "="
    accepted_hits_bam_fn = File.join(outputdir, "accepted_hits.bam")
    task.invoke "convert:bam:sort", [accepted_hits_bam_fn] # call the sorting procedure.
  end
end
class Rna < Thor
  desc "quant GTF OUTPUTDIR BAM ", "Genes and transcripts quantification"
  Bio::Ngs::Cufflinks::Quantification.new.thor_task(self, :quant) do |wrapper, task, gtf, outputdir, bam|
    wrapper.params = task.options
    wrapper.params = {"num-threads" => 6, "output-dir" => outputdir, "GTF" => gtf}
    wrapper.run :arguments => [bam], :separator => "="
  end
end
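The file-name conventions that tie these tasks together can be restated in plain Ruby, independent of Thor and bio-ngs. The function names below are hypothetical, and `hg19`, `sample1`, and `out` are placeholder values; the expressions themselves follow the slide code: paired runs expand a basename into forward and reverse FASTQ files, and `mapquant` derives the GTF, the quantification directory, and the sorted BAM from its own arguments.

```ruby
# Plain-Ruby restatement of the file-name logic used by the tasks.
# Function names are hypothetical; the path rules follow the slide code.

# Paired runs expand a basename into forward/reverse files;
# single-end runs use the given name unchanged.
def fastq_files(fastqs, paired)
  paired ? ["#{fastqs}_forward.fastq", "#{fastqs}_reverse.fastq"] : [fastqs.to_s]
end

# Positional arguments handed to the TopHat wrapper: index first, then reads.
def tophat_arguments(index, fastqs, paired)
  [index, fastq_files(fastqs, paired)].flatten
end

# Arguments mapquant passes to the quant task after alignment and sorting.
def quant_arguments(index, outputdir)
  ["#{index}.gtf",
   File.join(outputdir, "quantification"),
   File.join(outputdir, "accepted_hits_sort.bam")]
end

p tophat_arguments("hg19", "sample1", true)
p quant_arguments("hg19", "out")
```

Because `mapquant` relies on `accepted_hits_sort.bam` existing in the output directory, the `tophat` task has to finish its `convert:bam:sort` step before quantification starts.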
Next?
• Support more software, not only NGS
• Wrap EMBOSS on the fly reading acd files
• Tune according to hardware
• Share tasks
– Thor & Rake
• Improve JRuby compatibility
• Contributions
• Scalability
– Cloud? BioLinux
– BioHub: distribute tasks using messaging
• ActiveMQ
• Stomp
• ActiveMessaging
• Adapters for Queuing Systems
Acknowledgments
Serena Curti, Debora Mascheroni, Valeria Parente, Valeria Ranzani (1), Anna Ripamonti, Grazisa Rossetti, Riccardo L. Rossi, Roberto Sciarretta
Massimiliano Pagani
Francesco Strozzi (1, 3)
Alessandra Stella
Groningen Bioinformatics Centre: Pjotr Prins (2)
Laboratory of Genome Database: Toshiaki Katayama (1, 2)
Dan MacLean (4)
The Genome Analysis Centre: Ricardo Ramirez-Gonzalez (4)
(1) bio-ngs, (2) bio-gem, (3) bio-bwa, (4) bio-samtools
Questions?
INFO
E-mail: [email protected] / [email protected]
Dev: http://github.com/helios/bioruby-ngs
Docs: https://github.com/helios/bioruby-ngs/blob/master/README.rdoc
Wiki: http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
BioRuby-ML: http://lists.open-bio.org/mailman/listinfo/bioruby
IRC: #bioruby (irc.freenode.org)