Date post: | 10-May-2015 |
Upload: | australian-bioinformatics-network |
1
Bioinformatic Alchemy 101
Transmuting dark script
matter into reusable tools
Ross Lazarus
BakerIDI
2
Context: bioinformatic analyses
Big data; complex analyses
Repeatable, automated pipelines
Reproducibility real goal
Reproducibility is hard
3
Frameworks
Eg VGL
Local SOPs for biologists
Tools, canned workflows
Minimise opportunities for error
Maximise reproducibility
4
In real life
90/10 rule
Need to tweak SOPs
Trivial 'disposable' scripts
Not documented or curated
Not reliably available to re-run
“Dark script matter”
5
Dark Script Matter
Outside usual VCS/pipelines
Manual =/= reproducible
Necessary evil?
Platform extensions complex
Eg Galaxy – hours of work
6
Plan
Context: Reproducible analyses
Frameworks vs Dark Scripts
Alchemy: script to Galaxy tool
Demonstration
Summary
Conclusions
7
Galaxy Tool Factory
An installable Galaxy tool
Runs scripts: Python, R, Perl, sh
Generates new Galaxy tools
Tool code wraps the script
Minutes – not hours
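To illustrate what "tool code wraps the script" can mean, here is a minimal sketch of a wrapper that runs a pasted script with two positional parameters, input path then output path. The helper name `run_script` and the demo script are hypothetical, chosen for illustration only; this is not the code the Tool Factory actually generates.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(interpreter, script_text, in_path, out_path):
    """Run a pasted script as: <interpreter> <script> <in_path> <out_path>.

    Hypothetical helper for illustration; not the Tool Factory's own wrapper.
    Returns the exit code and anything the script wrote to stderr.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script_file = Path(workdir) / "user_script"
        script_file.write_text(script_text)
        result = subprocess.run(
            [interpreter, str(script_file), str(in_path), str(out_path)],
            capture_output=True,
            text=True,
        )
    return result.returncode, result.stderr


# A tiny demo script: copy the input file to the output file in upper case.
demo_script = (
    "import sys\n"
    "with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:\n"
    "    dst.write(src.read().upper())\n"
)
```

Calling `run_script(sys.executable, demo_script, "in.txt", "out.txt")` would run the demo script under the current Python interpreter; a non-zero exit code signals that the script needs fixing before a tool is generated from it.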
8
Galaxy Tool Shed
Separate server
Stores/serves Galaxy tools
Admin can install to Galaxy
Mercurial VCS archives
Explicit tool versioning
Sharing and reproducibility
Demo 1: Install the Tool Factory
Demo 2: Create a new tool
11
Prepare script
Python; R; Perl; Sh
Parse command-line params: 1 = input, 2 = output
Typically workflow transformations
Arbitrary complexity
Simple example
Write transpose of a tabular file
12
Prepare/upload test data
SMALL sample input
Becomes functional test case
h1 h2 h3 h4
r11 r12 r13 r14
r21 r22 r23 r24
r31 r32 r33 r34
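The sample above can double as a functional test case by pairing it with the output a transpose should produce. The sketch below assumes the file is tab-separated (the slide shows the columns with spaces) and uses a hypothetical comparison helper, `outputs_match`; Galaxy's own functional test machinery is not shown here.

```python
# Sample input from the slide, written as tab-separated text (an assumption:
# "tabular" is taken to mean tab-delimited).
SAMPLE_INPUT = (
    "h1\th2\th3\th4\n"
    "r11\tr12\tr13\tr14\n"
    "r21\tr22\tr23\tr24\n"
    "r31\tr32\tr33\tr34\n"
)

# What a transpose of that sample should look like: columns become rows.
EXPECTED_OUTPUT = (
    "h1\tr11\tr21\tr31\n"
    "h2\tr12\tr22\tr32\n"
    "h3\tr13\tr23\tr33\n"
    "h4\tr14\tr24\tr34\n"
)


def outputs_match(actual_text, expected_text):
    """Hypothetical check: compare line by line, ignoring trailing whitespace."""
    actual = [line.rstrip() for line in actual_text.splitlines()]
    expected = [line.rstrip() for line in expected_text.splitlines()]
    return actual == expected
```

Keeping the sample this small makes the expected output easy to verify by eye, which is the point of the SMALL sample input.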
13
# R: transpose a tabular input file and write the result as
# a tabular output file
ourargs = commandArgs(TRUE)  # trailing command-line arguments only
inf = ourargs[1]             # parameter 1: input file path
outf = ourargs[2]            # parameter 2: output file path
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
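Since the Tool Factory also accepts Python, the same transformation can be sketched in Python using the same convention of parameter 1 as the input path and parameter 2 as the output path. The function name `transpose_file` is illustrative, and the sketch assumes every row has the same number of tab-separated fields.

```python
import sys


def transpose_file(in_path, out_path):
    """Write the transpose of a tab-separated file (illustrative sketch)."""
    with open(in_path) as src:
        # Skip blank lines; split each remaining line into its tab-separated fields.
        rows = [line.rstrip("\n").split("\t") for line in src if line.strip()]
    with open(out_path, "w") as dst:
        # zip(*rows) yields one tuple per input column. It stops at the
        # shortest row, so ragged input would be silently truncated.
        for column in zip(*rows):
            dst.write("\t".join(column) + "\n")


# Command-line use: python transpose.py <input> <output>
if __name__ == "__main__" and len(sys.argv) == 3:
    transpose_file(sys.argv[1], sys.argv[2])
```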
14
Demo part 1
As an admin, test run the code
Can't make a new tool until it works!
Admin-only real-time scripting in Galaxy.
Overrides ALL other security.
Generated tools run with normal security.
15
Use Redo button; Generate
When working right
Use Redo to save retyping
Select Generate option
Provide tool ID, help text
Execute
Expect a toolfactory.gz in history
Copy link (floppy disk icon)
16
What's in the toolshed.gz?
A gzip'd Mercurial tool repository (!)
Auto-generated tool XML file
Auto-generated tool Python wrapper
Functional test case - the sample data
Familiar Galaxy tool for all users
Executes your script over their data
Interoperably inside Galaxy
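One way to check what a generated archive contains before uploading it is to list its members. The sketch below assumes the archive is a gzipped tar file, which the slides do not state explicitly; the file names in the demo are hypothetical placeholders, not the names the Tool Factory produces.

```python
import io
import tarfile


def list_archive(fileobj):
    """Return the sorted names of regular files in a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def build_demo_archive(files):
    """Build an in-memory gzipped tar from {name: text}, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            data = text.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# Hypothetical contents mirroring the list above: tool XML, wrapper, test data.
demo = build_demo_archive({
    "mytool/mytool.xml": "<tool/>",
    "mytool/mytool.py": "# wrapper",
    "mytool/test-data/sample_in.tabular": "h1\th2\n",
})
```

For a downloaded file, `list_archive(open(path, "rb"))` would give the same kind of listing.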
17
Upload TS gzip to new repository
Upload to any tool shed
Create new repo; sensible name!
Choose Upload files to new repo
Paste URL (floppy disk save icon)
New tool ready to install
18
Install and Test New Tool
Back to Galaxy admin interface
Browse local tool shed
Choose new tool
Install to local Galaxy
Try it out
Run functional test
19
Summary
GTF = script to tool in minutes
Integrated with Galaxy and TS
Simple workflow components
If needed, generate simple tool
Then add parameters manually
20
Tool Factory Operation Guide
Script (Python, R, Perl, sh)
Upload/paste sample input for functional test
Galaxy Tool Factory
Tool form; paste script;
Test run; check outputs; rerun/fix;
Generate TS gzip;
Copy download link for pasting
Tool Shed
Create new repository.
Upload files: paste TS gzip link and upload
Install new tool from tool shed from Galaxy admin page;
Test; functional test
21
GALAXY
http://usegalaxy.org
22
Generate a new Galaxy tool
From a Python, R, Perl or bash script
# transpose a tabular input file and write as a tabular output file
ourargs = commandArgs(TRUE)
inf = ourargs[1]
outf = ourargs[2]
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
Using a Galaxy tool: the Galaxy Tool Factory
Via a Tool Shed