CVS Reorganization, Installation Reorganization & A Simple Build System
Steve Fischer
October 24, 2002
Goals
Make GUS portable
  Easily build and configure runnable instances of our projects
  Avoid conflicting with the pre-existing structure of external sites
Self-explanatory CVS structure
  Improve collaborators’ ability to understand our code structure
  And ours (particularly new folks)
Improved ability to release projects on a schedule
  Projects wait for other projects only when necessary
  Easily maintain released versions while development continues
Support development on a unified file system (new file server)
  Don’t rely on local file systems having different installed versions
Encourage a move away from individual ownership of projects
CVS Reorganization
CVS Structure
At the root there are projects, which have components, which have parts:
$CVSROOT/
  Project1/
    Component1/
      bin/    (parts)
      lib/    (parts)
    Component2/
  Project2/
  Projectn/
Projects
Units of software development having their own release schedule
  Have their own release number, eg, GUS 3.1.2, Annotator 1.0.0
May depend on other projects
  Either on a specific release or ‘latest’
  No circular dependencies
  Dependency example: DoTS->GUS->CSP
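The "no circular dependencies" rule can be checked mechanically. The following is a minimal sketch, not the actual build system: the function name `install_order` and the `DEPS` table are hypothetical, with the table filled in from the DoTS->GUS->CSP example above.

```python
def install_order(deps):
    """Return projects so that each one appears after the projects it depends on.

    `deps` maps a project name to the list of projects it depends on.
    Raises ValueError if a circular dependency is found.
    """
    order = []   # finished projects, dependencies first
    state = {}   # project -> "visiting" or "done"

    def visit(project, path):
        if state.get(project) == "done":
            return
        if state.get(project) == "visiting":
            # We came back to a project that is still being processed: a cycle.
            cycle = path[path.index(project):] + [project]
            raise ValueError("circular dependency: " + " -> ".join(cycle))
        state[project] = "visiting"
        for dep in deps.get(project, []):
            visit(dep, path + [project])
        state[project] = "done"
        order.append(project)

    for project in sorted(deps):
        visit(project, [])
    return order


# Hypothetical dependency table, taken from the example DoTS->GUS->CSP.
DEPS = {"DoTS": ["GUS"], "GUS": ["CSP"], "CSP": []}
```

Calling `install_order(DEPS)` lists CSP first, then GUS, then DoTS, which is the order in which the projects would have to be built or installed.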
Proposed Projects (phase 1)
AllGenes
Annotator (new)
CSP (CBIL Style Police: web form assistance)
DJob (distributed job controller)
DoTS (build, including DoTS related plugins)
GLE (grammars for transcription)
GUS
ParaDBs
PlasmoDB
RAD (plugins & website) (or should these be separate?)
Projects Contain Components
For example:
GUS/
  Common/
  DBAdmin/
  Model/
A component is usually formed from one or a couple of package directories (eg, java packages).
In addition to code, components have:
  Documentation
  TODO list
  Configuration
  Required data
A component may depend on others within its project
Proposed GUS/ Components
SeqMatch/
Common/
DBAdmin/
GenBank/
GOPredictor/
Model/
ObjRelJ/
ObjRelP/
Pipeline/
PluginMgr/
SchemaBrowser/
Util/
WebDevKit/
Proposed GUS Components in Detail - 1
SeqMatch/ BLAST, BLAST2, and BLAT object models
Common/ Commonly used schema dependent stuff, such as common plugins
DBAdmin/ Database admin scripts
GenBank/ Data model for GenBank parser
GOPredictor/ Predict GO function
Model/
  Schema definition files
  Hand edited objects (eg, GUS::Model::DoTS::Assembly.pm.man)
Proposed GUS Components in Detail - 2
ObjRelJ/ Java object layer: superclasses
ObjRelP/ Perl object layer: superclasses and code generator
Pipeline/ Pipeline management API
PluginMgr/ GA and friends
SchemaBrowser/ Web based schema browser
Util/ Commonly used schema-independent utilities
WebDevKit/ Servlet and/or JSP based
Components Contain Standard Parts
As needed:
Component1/
  bin/
  cgi-bin/
  config/
  data/
  doc/
  htdocs/
  lib/
  src/
  test/
Component Parts in Detail - 1
Bin/ Executables
Cgi-bin/ Cgi executables
Config/
  Properties files for the component
  Files typically named component.prop
Doc/
  Specifications
  User guides
  TODO
Component Parts in Detail - 2
Data/ Extra data needed by component, eg, matrices
Htdocs/ Static web pages
Test/ Test data
Component Parts in Detail - 3
Lib/
  Contains linkable object files
  Perl includes full package path (Project::component::module)
lib/
  perl/
    Project/
      component/
        *.pm
  xml/
    *.xml
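The mapping from a Perl package name to its place under lib/perl/ follows the usual Perl convention of turning `::` into directory separators. A small sketch of that mapping (the helper name `perl_module_path` is hypothetical, and the default `lib/perl` root is taken from the layout above):

```python
import posixpath


def perl_module_path(package, lib_root="lib/perl"):
    """Map a Perl package name to the file that holds it under lib_root."""
    parts = package.split("::")   # Project::component::module -> three parts
    return posixpath.join(lib_root, *parts) + ".pm"
```

For example, `perl_module_path("Project::component::module")` gives `lib/perl/Project/component/module.pm`.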
Component Parts in Detail - 4
Src/ Contains source that needs to be compiled or otherwise transformed before it can be installed
src/
  c/
  java/
    edu/
      cbil/
        project/
          component/
    org/
      gusdb/
        component/
Process to Migrate CVS Structure
[Diagram: CVS -> (checkout) -> origCVS/ (orig) -> (copy) -> fromCVS/ (pruned) -> (transform script) -> newCVS/ (new)]
origCVS/ and fromCVS/ (old layout): GUS/ perl/ … www/ perl/ lib/ Fasta.pm CSP/
newCVS/ (new layout): DoTS/ Common/ lib/ perl/ Common/ Fasta.pm CSP/ GUS/
Process to Migrate CVS Structure
Do transform
  Check out old to origCVS/
  Copy to fromCVS/
  Run script to transform from fromCVS/ to newCVS/
    Moves files (thus pruning fromCVS/)
Validate transform by:
  Examining transform script (see where YOUR stuff is going to be)
  Examining fromCVS/ to see what I left behind
Create new CVSROOT with newCVS/ (losing history)
How can I get everybody’s OK on the transform??
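To make the "move files, then look at what is left behind" idea concrete, here is a minimal sketch. It is not the real transform script: the function name `transform` and the shape of the `moves` table (old relative path -> new relative path) are assumptions made for illustration.

```python
import os
import shutil


def transform(from_root, new_root, moves):
    """Move files from from_root to new_root and report what was left behind.

    `moves` maps a path relative to from_root to a path relative to new_root.
    Because files are moved rather than copied, from_root is pruned as we go.
    """
    for old_rel, new_rel in moves.items():
        src = os.path.join(from_root, old_rel)
        dst = os.path.join(new_root, new_rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)

    # Anything still under from_root was not covered by the moves table.
    left_behind = []
    for dirpath, _dirnames, filenames in os.walk(from_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            left_behind.append(os.path.relpath(full, from_root))
    return sorted(left_behind)
```

The returned list is what a reviewer would inspect in fromCVS/ after the run. An empty list means every file was given a new home.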
Releases
A project is the unit of release
When it is released, it is tagged in cvs with a release number, eg 2.1.1
At that time, other projects can declare a dependency on that particular release (discussed in detail later)
Bug fixes are applied to the tagged branch
Installation Reorganization
Objectives
Install into a single relocatable location: $GUS_HOME
  Don’t conflict with the site’s pre-existing structure
  Easy to install and uninstall
  Avoid path and classpath conflicts
Be able to find (almost) all GUS installed resources there:
  Executables
  Libraries
  Documentation
  Configuration
  Some data
  Third party resources
Support multiple running instances on a machine, eg dev, beta
Make explicit the versions of each included resource
  In a file called $GUS_HOME/versions
Also have installation targets for websites and appl. servers
Installing Project1 to $GUS_HOME
[Diagram: projects P1/, P2/ and P3/, each with components C1 and C2, alongside $GUS_HOME/ with Bin/, Lib/ and Doc/; the legend marks dependencies]
Understanding $GUS_HOME
The home for a single GUS related installation
Contains one or more projects, and the projects they depend on
  Including third party resources (such as BioJava or BioPerl)
May contain projects that are running separately, as long as they don’t have conflicting dependencies
  (May even contain the entire set of GUS projects)
It contains a version file, eg:
DoTS 1.2.1
GUS 2.0.0
CSP 3.0.1
BioJava 1.1.1
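A version file in this shape is easy to read back, for instance to check whether a project about to be installed conflicts with what is already in $GUS_HOME. The sketch below assumes one "name release" pair per line, as in the sample. The function names are hypothetical.

```python
def parse_versions(text):
    """Parse lines such as 'GUS 2.0.0' into {'GUS': (2, 0, 0)}."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, release = line.split()
        versions[name] = tuple(int(n) for n in release.split("."))
    return versions


def conflicts(installed, required):
    """Return names that are installed at a different release than required."""
    return sorted(
        name
        for name, release in required.items()
        if name in installed and installed[name] != release
    )


# The sample version file from the slide.
SAMPLE = "DoTS 1.2.1\nGUS 2.0.0\nCSP 3.0.1\nBioJava 1.1.1\n"
```

With the sample above, a project that requires GUS 2.0.0 reports no conflict, while one that requires GUS 3.0.0 reports `["GUS"]`.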
Sample $GUS_HOME
$GUS_HOME/
  bin/
    dotsbuild
    extractSeqs
  config/
    dotsbuild.prop
    csp.prop
    gus.prop
  data/
    matrices/
  doc/
    DoTS/Dotsbuild/TODO
    GUS/Model/uml/*.uml
    CSP/UserGuide.html
  (cont’d)…
Sample $GUS_HOME (cont’d)
$GUS_HOME/
  …
  lib/
    java/
      gusmodel.jar
      guswdk.jar
    perl/
      DoTS/DotsBuild/*.pm
      GUS/Model/*.pm
      GUS/ObjRelP/*.pm
      GUS/WebDevKit/*.pm
      CSP/*.pm
    xml/
      *.xml
Installation Targets - Website
Contains cgi-bin and htdocs
/world/www.allgenes/
  cgi-bin/
    *.pl
  htdocs/
    *.html
Installation Targets – Application Server
Eg, Tomcat
Will copy .jar files and other stuff as needed
Configuration
Code never refers to absolute file paths
Many resources are conveniently located relative to $GUS_HOME, so don’t need configuration
Otherwise, code relies on property files for configuration
We provide a perl and java PropertySet object to make this easy
Property files are found in… $GUS_HOME/config
  Named after the component that is using them: gusmodel.prop
Consider using properties instead of macro substitutions when possible
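The PropertySet objects themselves are Perl and Java and are not shown here. As an illustration of the lookup rule only, the Python sketch below finds a component's file under $GUS_HOME/config. The `name=value` file syntax, the `#` comment marker and the property names in the usage note are all assumptions, not the real format.

```python
import os


class PropertySet:
    """Read $GUS_HOME/config/<component>.prop into a dictionary.

    Assumes a 'name=value' line format with '#' comments (not confirmed).
    """

    def __init__(self, component, gus_home=None):
        # Resolve everything relative to GUS_HOME: no absolute paths in code.
        gus_home = gus_home or os.environ["GUS_HOME"]
        self.path = os.path.join(gus_home, "config", component + ".prop")
        self.props = {}
        with open(self.path) as handle:
            for line in handle:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                name, _, value = line.partition("=")
                self.props[name.strip()] = value.strip()

    def get(self, name):
        """Return a property value, failing loudly if it is not configured."""
        if name not in self.props:
            raise KeyError("%s is not set in %s" % (name, self.path))
        return self.props[name]
```

A component would then call something like `PropertySet("gusmodel").get("dbName")`, where `dbName` is a made-up property name.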
A Simple Ant-based Build System
Objectives
Install projects to installation targets
Serves two kinds of installs:
  From $PROJECT_HOME to installation targets (developers)
  From .tar file to installation targets (external users)
Support inter-project dependencies
Support dependencies on third party resources
Handle object layer code generation
Be easy to use
Be relatively easy to maintain
Be generic across projects and components, but also flexible
$PROJECT_HOME -> $GUS_HOME (developers)
Install the build system
  Create $PROJECT_HOME
  Check out the install/ project
  Copy build to a local bin/
Check out the project of interest
  build DoTS install $GUS_HOME -co
Now $PROJECT_HOME is an image of CVS containing your project and all projects it depends on.
Edit away
Install for testing
  build DoTS install $GUS_HOME
.tar -> $GUS_HOME (external users)
Untar the download file
cd install/
setenv GUS_HOME /usr/local/somewhere
./build DoTS install $GUS_HOME
Also works for installing to a website
  ./build GeneDB installweb /world/www.genedb
The Mechanics
Uses Ant (http://jakarta.apache.org/ant/)
  Uses xml to declaratively specify the build
  Used extensively in the java community
Each project contains its own build.xml file
The install/ directory contains the main build.xml file
  Calls the appropriate project’s build.xml
  Houses “subroutines”
The project build.xml file
  Specifies the project’s dependencies on other projects
  Specifies the project’s components’ dependencies on each other
  Calls default project and component build routines, unless custom ones are supplied
The Mechanics (cont’d)
Java compilation
  The build compiles java into the classes/ directory of a component
  It places .jar files into the lib/java/ directory of a component
  Does compilation in dependency order
Copying from $PROJECT_HOME to target (eg, $GUS_HOME)
  Copies and merges all dirs, such as bin/, lib/, doc/
  Can perform macro substitution on the way
  Creates version file
Because Ant is kind of slow, build can do just a component instead of a whole project
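The "copies and merges" step is what lets several components share one bin/, lib/ and doc/ in the target. The real build does this with Ant. The Python sketch below only illustrates the merge behaviour, under the assumption that same-named files may be overwritten, and it leaves out macro substitution and the version file. The function name and the `PARTS` tuple are hypothetical.

```python
import os
import shutil

# Parts to install: a subset of the standard parts, for illustration.
PARTS = ("bin", "lib", "doc")


def install_component(component_dir, target_dir, parts=PARTS):
    """Copy each part of a component into the target, merging with what is there.

    Returns the list of parts that were actually present and copied.
    """
    copied = []
    for part in parts:
        src = os.path.join(component_dir, part)
        if os.path.isdir(src):
            # dirs_exist_ok=True (Python 3.8+) merges into an existing directory
            # instead of failing; files with the same name are overwritten.
            shutil.copytree(src, os.path.join(target_dir, part), dirs_exist_ok=True)
            copied.append(part)
    return copied
```

Installing two components in turn into the same target leaves both sets of files side by side under the shared directories.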
Special case: object layer code generation
Generates into:
  Perl:
    $PROJECT_HOME/GUS/Model/lib/perl/GUS/Model/DoTS
    $PROJECT_HOME/GUS/Model/lib/perl/GUS/Model/Core
    Etc
  Java:
    $PROJECT_HOME/GUS/Model/src/java/org/gusdb/model/dots
    $PROJECT_HOME/GUS/Model/src/java/org/gusdb/model/core
    Etc
Hand created files have .man suffix, eg: GUS/Model/DoTS/Assembly.pm.man
Only generates if files are older than “schema definition” file:
  $PROJECT_HOME/GUS/Model/schema/definition.sql
  This file is generated by schema modification process (and lives in CVS)
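The "only if older" rule is a plain timestamp comparison. A minimal sketch follows. The function name is hypothetical, and treating a missing generated file as out of date is an assumption the slide does not state.

```python
import os


def needs_generation(schema_file, generated_files):
    """Return True if any generated file is older than the schema definition.

    A generated file that does not exist yet is treated as out of date
    (an assumption; the rule above only mentions older files).
    """
    schema_mtime = os.path.getmtime(schema_file)
    for path in generated_files:
        if not os.path.exists(path) or os.path.getmtime(path) < schema_mtime:
            return True
    return False
```

The build would call this with the path to definition.sql and the list of generated object files, and skip code generation when it returns False.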
Making it All Happen
Goals
Getting it done quickly
With a minimum of disruption
Overarching strategy
Work on projects and components one at a time
Start with ones that are most central and not in development
  GUS/ObjRelP
  GUS/Model/lib/perl/Model/DoTS
  GUS/PluginMgr
  GUS/Common
  DoTS
  DJob
Get that core working with new build system and $GUS_HOME
In parallel, convert to GUS 3.0
In parallel, bring in GUS/ObjRelJ and Annotator (Dave)
Upgrade WebDevKit install apparatus
Bring in Web sites one at a time
Inconveniences
During the time that a project is being upgraded:
  Everybody will need to commit their work to cvs
  Will need to maintain 2 copies in cvs:
    Old
    New
  Will need to merge changes to both
Hopefully this will be short for most projects