Date post: | 13-Jul-2015 |
Category: |
Technology |
Upload: | jeremy-leipzig |
Taming Snakemake
1/27/14
Why Make?
What are Make's advantages (over Perl and shell scripts)?
Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of variables, conditionals, and loops. In Shell you are forced to think like a caveman.
Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.
Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.
Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.
Make allows you to add new input files without worrying about overwriting old ones.
Make is well supported. There are 1333 Make questions on Stack Overflow alone.
When people see a Makefile, they immediately know how to run it.
Make does not force you to wrap shell statements in quotes.
Make is a DSL. It will attempt to validate your syntax.
Make is ancient, ubiquitous, and reliable.
Make can parallelize with --jobs.
Make recipes encourage reuse.
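The "knows what's changed" point above comes down to comparing file modification times. A minimal sketch of that check in Python, assuming a hypothetical helper named `needs_rebuild` and made-up file names (this illustrates the idea, it is not Make's actual implementation):

```python
import os
import tempfile


def needs_rebuild(target, inputs):
    """Return True if target is missing or older than any of its inputs."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(path) > target_mtime for path in inputs)


# Demo with throwaway files; the names are illustrative only.
workdir = tempfile.mkdtemp()
reads = os.path.join(workdir, "sample.fastq")
aligned = os.path.join(workdir, "sample.bam")

open(reads, "w").close()
missing_target = needs_rebuild(aligned, [reads])   # target does not exist yet

open(aligned, "w").close()
os.utime(reads, (100, 100))                        # input older than output
os.utime(aligned, (200, 200))
up_to_date = not needs_rebuild(aligned, [reads])

os.utime(reads, (300, 300))                        # input touched after output
stale_target = needs_rebuild(aligned, [reads])
```

Because the decision depends only on what is on disk, an interrupted run can be restarted and finished outputs are left alone.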
https://share.chop.edu/pages/viewpage.action?pageId=138478819
Make review
http://github.research.chop.edu/BiG/err_chip_seq/blob/master/Makefile
Pipelines and Workflows
Other pipelines
Ruffus
GKNO
Queue
Why Snakemake?
Addresses Makefile weaknesses without throwing out the good stuff
Difficult to implement control flow
No cluster support
Inflexible wildcards
Too much reliance on sentinel files
No reporting mechanism
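On the "inflexible wildcards" point: a Make pattern rule has a single `%` stem, whereas Snakemake file patterns can carry several named wildcards such as `{sample}` and `{pair}`. The sketch below shows roughly how named wildcards can be pulled out of a file name with a regular expression. The function names, the pattern, and the `.+` matching choice are assumptions for illustration, not Snakemake's internals.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'mapped/{sample}.{pair}.bam' into a regex with named groups.

    Assumes each wildcard name appears only once in the pattern.
    """
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)        # literal text
        else:
            regex += f"(?P<{part}>.+)"      # named wildcard
    return re.compile(regex + "$")


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the file does not match."""
    match = pattern_to_regex(pattern).match(filename)
    return match.groupdict() if match else None


hit = match_wildcards("mapped/{sample}.{pair}.bam", "mapped/NA12878.R1.bam")
miss = match_wildcards("mapped/{sample}.{pair}.bam", "mapped/NA12878.bam")
```

With a single stem, the same file name would have to be taken apart by hand inside the recipe.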
Johannes Köster
Syntax
Make Snakemake
Variables
Targets
Rules
Utilities
Logs - wire them up manually
Cluster support pretty decent
Cores/jobs/resources
source /nas/is1/leipzig/martin/variome-env/bin/activate
snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j 16
source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate
snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j 16
Useful stuff
dry-runs
keep-going
touch
version changes
workflow diagrams
Python legal
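To make the dry-run and keep-going items above concrete, here is a toy job runner in Python. It is a sketch under simplifying assumptions: every job is treated as independent, and `run_jobs` and the job names are hypothetical, not part of Snakemake.

```python
def run_jobs(jobs, dry_run=False, keep_going=False):
    """Run (name, action) pairs in order.

    Returns (completed_or_planned, failed) as two lists of job names.
    """
    completed, failed = [], []
    for name, action in jobs:
        if dry_run:
            # Only report what would happen; nothing is executed.
            print(f"would run: {name}")
            completed.append(name)
            continue
        try:
            action()
            completed.append(name)
        except Exception:
            failed.append(name)
            if not keep_going:
                break                      # default: stop at the first failure
    return completed, failed


def broken_step():
    raise RuntimeError("simulated failure")


jobs = [("align", lambda: None), ("call", broken_step), ("report", lambda: None)]

default_run = run_jobs(jobs)
keep_going_run = run_jobs(jobs, keep_going=True)
planned = run_jobs(jobs, dry_run=True)
```

In a real workflow a failed job also blocks whatever depends on it, so keep-going only continues with the jobs that are still runnable.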
Client websites with Jekyll
Jekyll is a templating engine for blogs that accepts Markdown
Layouts use the Liquid markup
http://mitomap.org/martin-rna-seq/
A workflow that reports itself
Avoiding Sweave-Hell
The bad way
Caching chunks?
R/Snakemake integration
git submodule add [email protected]:BiG/rna-seq-common-functions.git common/rna-seq
Leave a paper trail
Reproducible Checklist
repository github.research.chop.edu
workflow of some kind from beginning to end
website at mybic.chop.edu
Ties that bind