Spaghetti Code, Soupy Logic
Jim Kent - University of California Santa Cruz
Steaming fresh modules in sourceforge.net
Combinatorial assembly of transcription factors in cell.
Slide 2
A Challenge Every Speaker Faces: Who is the audience?
Bioinformaticians: Biologists with bigger, better databases? Geeks
trading bits for bases? Leading edge interdisciplinary super
scientists?
Slide 3
Top 5 Reasons Biologists Go Into Bioinformatics 5 - Microscopes
and biochemistry are so 20th century.
Slide 4
Top 5 Reasons Biologists Go Into Bioinformatics 5 - Microscopes
and biochemistry are so 20th century. 4 - Got started purifying
proteins, but it turns out the cold room is really COLD.
Slide 5
Top 5 Reasons Biologists Go Into Bioinformatics 5 - Microscopes
and biochemistry are so 20th century. 4 - Got started purifying
proteins, but it turns out the cold room is really COLD. 3 - After
23 years of school wanted to make MORE than $23,000/year as a
postdoc.
Slide 6
Top 5 Reasons Biologists Go Into Bioinformatics 5 - Microscopes
and biochemistry are so 20th century. 4 - Got started purifying
proteins, but it turns out the cold room is really COLD. 3 - After
23 years of school wanted to make MORE than $23,000/year as a
postdoc. 2 - Like to swear, @ttracted to $_ Perl #!!
Slide 7
Top 5 Reasons Biologists Go Into Bioinformatics 5 - Microscopes
and biochemistry are so 20th century. 4 - Got started purifying
proteins, but it turns out the cold room is really COLD. 3 - After
23 years of school wanted to make MORE than $23,000/year as a
postdoc. 2 - Like to swear, @ttracted to $_ Perl #!! 1 - Getting
carpal tunnel from pipetting
Slide 8
Top 5 Reasons Computer People go into Bioinformatics 5 - Bio
courses actually have some females.
Slide 9
Top 5 Reasons Computer People go into Bioinformatics 5 - Bio
courses actually have some females. 4 - Human genome more stable
than Windows XP
Slide 10
Top 5 Reasons Computer People go into Bioinformatics 5 - Bio
courses actually have some females. 4 - Human genome more stable
than Windows XP 3 - Having mastered binary trees, quad trees, and
parse trees ready for phylogenic trees.
Slide 11
Top 5 Reasons Computer People go into Bioinformatics 5 - Bio
courses actually have some females. 4 - Human genome more stable
than Windows XP 3 - Having mastered binary trees, quad trees, and
parse trees ready for phylogenic trees. 2 - Missing heady froth of
the internet bubble.
Slide 12
Top 5 Reasons Computer People go into Bioinformatics 5 - Bio
courses actually have some females. 4 - Human genome more stable
than Windows XP 3 - Having mastered binary trees, quad trees, and
parse trees ready for phylogenic trees. 2 - Missing heady froth of
the internet bubble. 1 - Must augment humanity to defeat evil
artificially intelligent robots.
Slide 13
The Paradox of Genomics How does a long, static, one-dimensional
string of DNA turn into the remarkably complex, dynamic, and
three-dimensional human body?
GTTTGCCATCTTTTG CTGCTCTAGGGAATC CAGCAGCTGTCACCA TGTAAACAAGCCCAG
GCTAGACCAGTTACC CTCATCATCTTAGCT GATAGCCAGCCAGCC ACCACAGGCATGAGT
Slide 14
The Analogy of the Code of Life DNA is popularly considered the
code of life. Computer programs are complex systems that ultimately
are built up of 0s and 1s; perhaps they are a model for a genome
built of A, C, G and T? BUT the human genome lacks documentation, has
accumulated 3 billion years of cruft, and does not believe in local
variables. Therefore we must look to less-than-straightforward
software programs as guides.
Slide 15
Bioperl CORBA module
    sub new {
        my ( $class, @args ) = @_;
        my $self = $class->SUPER::new(@args);
        my ( $idl, $ior, $orbname ) =
            $self->_rearrange( [ qw(IDL IOR ORBNAME) ], @args );
        $self->{'_ior'}     = $ior     || 'biocorba.ior';
        $self->{'_idl'}     = $idl     || $ENV{BIOCORBAIDL} || 'biocorba.idl';
        $self->{'_orbname'} = $orbname || 'orbit-local-orb';
        $CORBA::ORBit::IDL_PATH = $self->{'_idl'};
        my $orb      = CORBA::ORB_init($orbname);
        my $root_poa = $orb->resolve_initial_references("RootPOA");
        $self->{'_orb'}     = $orb;
        $self->{'_rootpoa'} = $root_poa;
        return $self;
    }
Slide 16
3)+1);setitimer(0,&t,0);f&&printf("\e[10;%u]",g+24);}f&&putchar(7);s+=(9-w[21]
)*((g>>3)+1);o=p;m(x);m(w);(n=rand())&255||--*w||++*w;if(!(**P&&P++||n&7936)){
while(abs((X=rand()%76)-*x+2)-*w
Slide 57
A Big Bioinformatics Web Site genome.ucsc.edu gets > 100,000
hits by > 5000 scientists each day. Involves 570,000 lines of C
code, plus bits of awk, Perl, bash, tcsh, Java, R and Tcl. 1200 CPUs
and 12 terabytes of disk. 12 full-time staff; 18 part-time staff,
grad students and postdocs.
Slide 58
Site Architecture 8 web servers running Apache and MySQL. CGIs
written in C access genome data and user interface settings in
MySQL. The genome database is the bottleneck, and is replicated on
each server. A cluster of 1000 CPUs, and smaller clusters of faster
CPUs, create annotation files which are loaded into the database.
Slide 59
Site Sociology 1/3 of the group telecommutes. Thursdays are devoted
to reading and testing each other's code and, if necessary, a one or
two hour meeting. We develop very incrementally, and do a new
release once a week. 1/4 of the group is dedicated to quality
assurance; I'm wanting to increase this to 1/3. User support is
shared by everyone.
Slide 60
Parasol and Kilo Cluster The UCSC cluster has 1000 CPUs running
Linux. 1,000,000 BLASTZ jobs in 25 hours for mouse/human alignment.
We wrote the Parasol job scheduler to keep up. Very fast and free.
Jobs are organized into batches. Error checking at job and at batch
level.
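The batch idea on this slide can be sketched in a few lines of Python. This is not Parasol's code or its interface: the names `run_job`, `run_batch`, `JobResult` and `seconds_per_job` are hypothetical, a local thread pool stands in for the cluster, and "exit status is zero" is an assumed job-level check. The last helper only redoes the slide's arithmetic.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class JobResult:
    command: list   # the command line that was run
    ok: bool        # True if the job-level check passed
    detail: str     # captured stdout on success, error text on failure


def run_job(command):
    """Run one job; the job-level check is that it exits with status 0."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True)
    except OSError as err:  # e.g. the executable could not be started
        return JobResult(command, False, str(err))
    if proc.returncode != 0:
        return JobResult(command, False, f"exit status {proc.returncode}")
    return JobResult(command, True, proc.stdout.strip())


def run_batch(commands, workers=4):
    """Run every job in the batch, then collect the batch-level failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_job, commands))
    failed = [r for r in results if not r.ok]
    return results, failed


def seconds_per_job(jobs, hours, cpus):
    """Average CPU-seconds per job implied by a run's totals."""
    return hours * 3600 * cpus / jobs


if __name__ == "__main__":
    batch = [
        [sys.executable, "-c", "print(6 * 7)"],             # succeeds
        [sys.executable, "-c", "import sys; sys.exit(3)"],  # fails
    ]
    results, failed = run_batch(batch)
    print(f"{len(results) - len(failed)} ok, {len(failed)} failed")
    # Slide figures: 1,000,000 jobs in 25 hours on 1000 CPUs.
    print(seconds_per_job(1_000_000, 25, 1000), "CPU-seconds per job")
```

With the slide's figures, the last line works out to an average of 90 CPU-seconds per job, assuming all 1000 CPUs were busy for the full 25 hours. Keeping the per-job result separate from the batch summary means a failed batch can be rerun with only the jobs listed in `failed`.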
Slide 61
Conclusions Spaghetti code is not so helpful in understanding
the genome. Human genome suggests that trial and error development
is likely to yield a robust version of Windows within 3 billion
years. Understanding the flow of control in the genome is a problem
that fascinates biologists and computer scientists alike.
Slide 62
Further Acknowledgements
Institutions: NHGRI, The Wellcome Trust, HHMI, NCI, Taxpayers in the
US and worldwide. Baylor, Sanger, Wash U, Whitehead, Stanford,
JGI/DOE, Vancouver GSC, UW and the international sequencing centers.
UCSC, NCBI, EBI, Ensembl, Genoscope, MGC, Intel, TIGR, Jackson Labs,
Affymetrix, SwissProt.
Individuals: Chuck Sugnet, Angie Hinrichs, Fan Hsu, Terry Furey,
Heather Trumbower, Kate Rosenbloom, Hiram Clawson, Brian Raney,
Rachel Harte, Bob Kuhn, Mathieu Blanchette, Donna Karolchik, David
Haussler, John Sulston, Richard Gibbs, Eric Lander, Francis Collins,
Roderic Guigo, Michael Brent, Olivier Jaillon, David Kulp, Victor
Solovyev, Ewan Birney, Greg Schuler, Deanna Church, Scott Schwartz,
Ross Hardison, and everyone else!