Biopackages.net
Operating System Packages for Bioinformatics
Allen Day
2005.05.17
What is a package?
  Software, config files, documentation, and/or data encapsulated in a single file
  Metadata describing:
    Version, license, package “category”
    Dependencies
    What the package provides
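The metadata fields listed above can be pictured as a small record. The following is a minimal sketch, not the RPM format itself: the class name `PackageMetadata`, the helper `satisfies`, and all example values (version, license string, the provided capability) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageMetadata:
    """Hypothetical record of the metadata a package carries."""
    name: str
    version: str
    license: str
    category: str
    requires: tuple = ()   # dependencies: names or capabilities this package needs
    provides: tuple = ()   # capabilities this package offers to other packages


def satisfies(provider: PackageMetadata, requirement: str) -> bool:
    """A package satisfies a requirement by its own name or by a 'provides' entry."""
    return requirement == provider.name or requirement in provider.provides


# Illustrative values only; the version, license and capability are made up.
bioperl = PackageMetadata(
    name="perl-bioperl",
    version="0.0",
    license="(example)",
    category="Libraries",
    provides=("perl(Bio::SeqIO)",),
)
```

Separating `requires` from `provides` lets a dependency be expressed as a capability rather than a package name, so more than one package could in principle satisfy it.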
GMOD target audience
  Small MODs (model organism databases)
Package Dependency Graph
[Figure: package dependency graph; the edges are not recoverable from the text.
 Labels: Dependencies; What the package provides.
 Nodes: chado, chado-Hsa, genome-Hsa-nib, ucsc-blat, genome-Hsa-annotation-affymetrix, genome-Hsa-annotation-gene, postgresql-AffxSeq, postgresql-server, perl-bioperl, obo-core, perl-go-perl]
Dependencies
  Build Dependency
  Installation Dependency
What is a Package Manager?
  Tools to manage installation, upgrade, uninstallation of packages
  Verify package integrity (checksums)
  Maintain system integrity
    Transactional
      Allow rollbacks
    Dependency checking
      Dependency graph recursion
  Allow software customization (patches)
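Dependency graph recursion can be illustrated with a short depth-first walk. This is a sketch under stated assumptions, not how RPM or yum actually resolve dependencies: the package names come from the slides, but the edges in `DEPENDS_ON` (which package requires which) are assumed, and `install_order` is a hypothetical helper.

```python
# Hypothetical dependency edges. The package names appear in the slides,
# but which package requires which is assumed here for illustration.
DEPENDS_ON = {
    "chado-Hsa": ["chado", "genome-Hsa-nib"],
    "chado": ["postgresql-server", "perl-bioperl", "perl-go-perl"],
    "perl-go-perl": ["obo-core"],
    "genome-Hsa-nib": [],
    "postgresql-server": [],
    "perl-bioperl": [],
    "obo-core": [],
}


def install_order(package, depends_on):
    """Return packages ordered so every dependency precedes its dependents."""
    order = []            # finished packages, dependencies first
    done = set()          # packages already placed in `order`
    in_progress = set()   # packages on the current recursion path

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name}")
        if name not in depends_on:
            raise KeyError(f"unknown package: {name}")
        in_progress.add(name)
        for dep in depends_on[name]:
            visit(dep)    # recurse into the dependency graph
        in_progress.remove(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order
```

Calling `install_order("chado-Hsa", DEPENDS_ON)` yields a list that ends with `chado-Hsa` and places each dependency before the package that needs it. A real package manager additionally has to handle versions, conflicts, and capabilities provided by more than one package.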
Why bioinformatics packages?
  Consistency of installation process
    Bioinformatics package installs vary wildly and commonly lack documentation
  Automatic dependency installation
    Perl modules are especially bad: bioperl has 60+ modules in its dependency tree
  Integrity/auditing of system state
    Know that an installed package works, which version it is, and how to replicate the system setup
  Tighter integration with the operating system
    Daemons, config & log file locations, etc.
What’s available?
  RPM packages only right now
  Primary focus on Fedora Core 2
  Some RPMs also available for
    Fedora Core 3
    RedHat 9
    Cygwin
What’s available?
  Three primary foci
    Applications
    Libraries
    Data sets
  Applications
    Gbrowse
    Textpresso
    BLAT daemon
    NCBI Toolkit (BLAST, etc.)
    HMMer
What’s available?
  Libraries
    Bioperl
    R & Bioconductor
    Squid
    EMBOSS
What’s available?
  Data sets
    Genome & protein sequence
    Sequence features
    Ontologies
    All installed using a common directory structure
What’s available?
  UCSC tools (utilities, BLAT system service, CGI scripts)
  Bioperl
  R / Bioconductor
  GMOD apps (Gbrowse, Textpresso, …)
  Data packages
    Genome sequence (fa, nib, blastdb)
    Genome features (Affy probeset alignments, mRNA, etc.)
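GMOD Components Available
[Figure: GMOD component packages; the edges are not recoverable from the text.
 Nodes: chado-Hsa, gbrowse, textpresso, gmod-web-Hsa, turnkey, chado, das2-Hsa, apollo-Hsa, cmap-Hsa, ucsc-BLAT, genome-Hsa-nib]
  ‘Hsa’ can be replaced with the code for your organism
  Currently built for ‘Cel’, ‘Hsa’, ‘Sce’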
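The naming convention, with the organism code embedded in the package name, can be sketched as simple template expansion. The template list below is taken from the package names on the slide; whether every one of these packages is actually built for every organism is an assumption, and `package_names` is a hypothetical helper.

```python
# Name templates taken from the slide; "{org}" marks the organism code.
TEMPLATES = [
    "chado-{org}",
    "gmod-web-{org}",
    "das2-{org}",
    "apollo-{org}",
    "cmap-{org}",
    "genome-{org}-nib",
]

# Organism codes the slide lists as currently built.
BUILT_ORGANISMS = ("Cel", "Hsa", "Sce")


def package_names(org):
    """Expand the name templates for one organism code."""
    if org not in BUILT_ORGANISMS:
        raise ValueError(f"no packages built for organism code {org!r}")
    return [template.format(org=org) for template in TEMPLATES]
```

For example, `package_names("Cel")` starts with `chado-Cel`, mirroring the `chado-Hsa` package shown for human.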
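More details…
[Figure: package dependency graph in more detail; the edges are not recoverable from the text.
 Nodes: chado, chado-Hsa, genome-Hsa-nib, ucsc-blat, perl-go-perl, genome-Hsa-annotation-affymetrix, genome-Hsa-annotation-gene, postgresql-AffxSeq, postgresql-server, perl-bioperl, …]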
Gene Expression Components
[Figure: gene expression components; the edges are not recoverable from the text.
 Nodes: chado-Hsa, Bioconductor, R, Quant/Norm Pipeline, chado-GEC, DAS/2 for Genotyping, GeneChip]
Resources
  http://www.biopackages.net
  ~1000 RPMs for Fedora Core 2, 3
  Available via yum
    See site for a configuration example.
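As a rough idea of what a yum repository stanza looks like, the sketch below renders one with Python's `configparser`. It is not the configuration published on the site: the repository id, the display name, the `gpgcheck` setting, and above all the base URL are placeholders, and the real values should be taken from the Biopackages site.

```python
import configparser
import io

# Placeholder only: the real repository path is given on the Biopackages site.
PLACEHOLDER_BASEURL = "http://www.biopackages.net/REPLACE-WITH-PATH-FROM-SITE/"


def render_repo_file(repo_id, name, baseurl):
    """Render a yum repository stanza as text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "0",  # assumption: signature checking disabled in this sketch
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_repo_file("biopackages", "Biopackages", PLACEHOLDER_BASEURL))
```

Depending on the yum version, such a stanza lives in `/etc/yum.conf` or in a file under `/etc/yum.repos.d/`.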
TODO
  Support more architectures
    Build for Cygwin & OS X. RPM has been ported to both
  Automate package build process
    Build farm of multiple architectures, controllable via scheduler (GridEngine)
    Automate (if possible) inclusion of new software / data releases
TODO
  Build community interest and involvement
  Keep adding more packages!
  Keep existing packages current!
Acknowledgements
  Patrick Alger
  Jared Fox
  Brian O’Connor
  Todd Harris
  Lincoln Stein
  Stanley Nelson