Date post: | 20-Dec-2015 |
Sierra Taylor-Moxon
Database Administrator/Designer
http://zfin.org
My Background
• Database Administrator/Designer: ZFIN
• Database Analyst/Programmer: EPS
• Product Development
• Agricultural Research
• Graduated 1999, Whitman College (Biology/Environmental Science)
– I wasn’t a CS major! But I have bioinformatics experience.
What is Bioinformatics?
The intersection of computation and biology
• Found in both the private and public sector
• Genomics, Proteomics & Computational biology
• Laboratory management and pipeline development
– High throughput, gene patents, digital lab notebooks
• Modeling and imaging
– Drug discovery
• Websites and databases (“Bioinformation”)
Our purpose
• Provide a repository of Zebrafish data.
• Design software that interfaces with our database and allows users to select, insert, update and delete via a web page.
– Some users are savvy, others are not.
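A minimal sketch of the select, insert, update and delete operations a web page would call, assuming Python's built-in sqlite3 module as a stand-in for the production RDBMS. The table fish_record, its columns and the function names are hypothetical. The placeholders (?) matter because not every user is savvy: whatever is typed into a form is stored as data and never run as SQL.

```python
import sqlite3

# sqlite3 stands in for the real database server; names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fish_record (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def insert_record(name):
    # The ? placeholder passes user input as a value, not as SQL text.
    cur = conn.execute("INSERT INTO fish_record (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid

def select_record(record_id):
    row = conn.execute(
        "SELECT name FROM fish_record WHERE id = ?", (record_id,)
    ).fetchone()
    return row[0] if row else None

def update_record(record_id, name):
    conn.execute("UPDATE fish_record SET name = ? WHERE id = ?", (name, record_id))
    conn.commit()

def delete_record(record_id):
    conn.execute("DELETE FROM fish_record WHERE id = ?", (record_id,))
    conn.commit()

new_id = insert_record("example line")
update_record(new_id, "example line (curated)")
print(select_record(new_id))
delete_record(new_id)
print(select_record(new_id))
```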
Why Zebrafish?
• A popular pet that’s easy to care for
• A freshwater fish
• A model organism
– Short time between birth and reproduction
– Developing embryo is transparent
– Vertebrate
– Regenerative body parts
ZFIN technical architecture
• Front end:
– Apache web server
– Webdatablade, Perl, Java, C, SQL (SPL)
• Back end:
– Unix (Sun/Solaris) platform
– Informix Relational Database Management System (RDBMS)
Databases
• MS Access
– Filemaker Pro
– MS Excel
• MySQL
• PostgreSQL
• DB2, Oracle, Informix, SQL Server
Why choose Informix over MS Access?
What’s the difference?
MS Access
• File based system
– Or ODBC (open database connectivity) to RDBMS
– Files are hosted on a PC, opened directly
• Works great for small number of concurrent users
• Front end tools included
• Web server or PC crashes cause corruption
Informix (RDBMS)
• Server manages concurrent users
• Security built into database server
• Data is not corrupted by web server failure
• Some come with front end tools, others do not
• Support triggers and other kinds of ‘scheduled’ jobs
• Platform independent
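To make the trigger bullet concrete, here is a small sketch in which the database itself records every change to a gene symbol, with no help from the front end. It assumes Python's sqlite3 module as a stand-in; Informix trigger syntax differs, and the table, column and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE gene_audit (gene_id TEXT, old_symbol TEXT, new_symbol TEXT);

-- The trigger runs inside the database server on every symbol update.
CREATE TRIGGER gene_symbol_audit
AFTER UPDATE OF symbol ON gene
BEGIN
    INSERT INTO gene_audit VALUES (OLD.gene_id, OLD.symbol, NEW.symbol);
END;
""")

conn.execute("INSERT INTO gene VALUES ('gene-1', 'old_name')")
conn.execute("UPDATE gene SET symbol = 'new_name' WHERE gene_id = 'gene-1'")

audit_rows = conn.execute("SELECT * FROM gene_audit").fetchall()
print(audit_rows)  # [('gene-1', 'old_name', 'new_name')]
```

Because the rule lives in the server, it applies no matter which web page or script made the change.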
What does a DBA at ZFIN do?
1) Doctor
2) Architect
3) Janitor
4) Security guard
5) Liaison
It depends…
“I think paranoia can be instructive in the right doses. Paranoia is a skill.”
-John Shirley
(Science Fiction novelist)
How?
• Monitor it
• Coordinate backups and test them
• Create and manage development environments
• Plan disk usage
• Plan for upgrades
• Document problems and their solutions.
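"Coordinate backups and test them" can be sketched as follows: take a copy, then check that the copy is readable and complete before trusting it. This assumes Python's sqlite3 module (its Connection.backup method and PRAGMA integrity_check) as a stand-in for the real backup tooling; the table name and the verify function are hypothetical.

```python
import sqlite3

# Build a small source database to back up.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY)")
source.executemany(
    "INSERT INTO gene VALUES (?)", [(f"gene-{i}",) for i in range(100)]
)
source.commit()

# Take the backup into a second database.
backup = sqlite3.connect(":memory:")
source.backup(backup)

def verify(original, copy, table):
    """Return True if the copy passes an integrity check and has the same row count."""
    integrity = copy.execute("PRAGMA integrity_check").fetchone()[0]
    count_sql = f"SELECT COUNT(*) FROM {table}"
    same_rows = (
        original.execute(count_sql).fetchone() == copy.execute(count_sql).fetchone()
    )
    return integrity == "ok" and same_rows

backup_ok = verify(source, backup, "gene")
print(backup_ok)
```

A row count is only a coarse check; a real restore test would also run representative queries against the restored copy.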
Interpreting symptoms
• Can’t log into the system, can’t run system commands, or the website is slow.
• The number of users is much higher than usual; some queries return data, others do not.
• Disappearing data
Stay cool under pressure!
Architect
• There is more than one way to do it.
• Be prepared to do it again.
– Biology is the science of exceptions.
• Do what works.
• There is always a legacy
2) Create the logical and physical data model
A little ZFIN history
• Very few foreign key or primary key constraints
• Test and production on the same machine
• Only two developers
• xfig
Reverse Engineering
create table vocabulary (
    voc_zdb_id varchar(50),
    voc_otype varchar(10),
    voc_ont_id integer
) in tbldbs1 extent size 16 next size 16 ;
-- unique index that backs the primary key, stored in its own dbspace
create unique index voc_primary_key_index on vocabulary (voc_zdb_id) using btree in idxdbs2 ;
alter table vocabulary add constraint primary key (voc_zdb_id) constraint voc_primary_key ;
-- the foreign key column must match the column defined above (voc_otype)
alter table vocabulary add constraint (foreign key (voc_otype) references ontology_type constraint voc_name_foreign_key) ;
[Diagram: table vocabulary with key voc_zdb_id and columns voc_otype, voc_ont_id; related table ontology_type]
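Reverse engineering a schema means asking the database what it already contains. The sketch below assumes Python's sqlite3 module as a stand-in for Informix (which exposes the same information through its own system catalog). It rebuilds a simplified vocabulary table, reads its columns and foreign keys back, and shows a constraint rejecting an orphan row. The ontology_type column name otype and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE ontology_type (otype TEXT PRIMARY KEY);
CREATE TABLE vocabulary (
    voc_zdb_id TEXT PRIMARY KEY,
    voc_otype  TEXT REFERENCES ontology_type (otype),
    voc_ont_id INTEGER
);
""")

def describe(table):
    """Return (column names, [(from_column, referenced_table, referenced_column)])."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    foreign_keys = [
        (row[3], row[2], row[4])
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return columns, foreign_keys

print(describe("vocabulary"))

conn.execute("INSERT INTO ontology_type VALUES ('anatomy')")
conn.execute("INSERT INTO vocabulary VALUES ('voc-1', 'anatomy', 1)")
try:
    # No such ontology type exists, so the foreign key should refuse this row.
    conn.execute("INSERT INTO vocabulary VALUES ('voc-2', 'no_such_type', 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```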
Physical Design
• Column and table definitions
• Placement of data on disk
– Data fragmentation
• Performance
– Indexes
– “Materialized Views”
create table vocabulary (
    voc_zdb_id varchar(50),
    voc_otype varchar(10),
    voc_ont_id integer
) in tbldbs1 extent size 16 next size 16 ;
-- table data lives in tbldbs1; the index is placed in a separate dbspace (idxdbs2)
create unique index voc_primary_key_index on vocabulary (voc_zdb_id) using btree in idxdbs2 ;
alter table vocabulary add constraint primary key (voc_zdb_id) constraint voc_primary_key ;
-- the foreign key column must match the column defined above (voc_otype)
alter table vocabulary add constraint (foreign key (voc_otype) references ontology_type constraint voc_name_foreign_key) ;
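The effect of an index on performance can be seen by asking the database how it plans to run a query. This sketch assumes Python's sqlite3 module as a stand-in; Informix has its own tooling for query plans, and dbspace placement has no SQLite equivalent. The sample rows and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocabulary (voc_zdb_id TEXT, voc_otype TEXT, voc_ont_id INTEGER)"
)
conn.executemany(
    "INSERT INTO vocabulary VALUES (?, ?, ?)",
    [(f"voc-{i}", "anatomy", i) for i in range(1000)],
)

def plan(sql):
    """Return the planner's description of how it would execute the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT voc_ont_id FROM vocabulary WHERE voc_zdb_id = 'voc-500'"

plan_before = plan(query)  # without an index the whole table is scanned
conn.execute(
    "CREATE UNIQUE INDEX voc_primary_key_index ON vocabulary (voc_zdb_id)"
)
plan_after = plan(query)   # with the index the planner can search directly

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, so only the presence of the index name is worth relying on.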
Responsibilities
3) Talk to people
• Upgrades
• Data model changes
• New data
• Business logic
• New users
BLAST
• Basic Local Alignment Search Tool
– Based on the Smith-Waterman algorithm
• Uses a substitution matrix (based on the probability that one amino acid will be replaced by another).
– Performance is increased by matching small areas and building outward
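For reference, a minimal sketch of Smith-Waterman local alignment scoring, the exhaustive dynamic-programming method that BLAST approximates with its faster seed-and-extend heuristic. This is not BLAST itself. As a simplifying assumption, a flat match/mismatch score replaces a real substitution matrix and gaps cost a fixed penalty; the function name and default scores are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of any local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                                     # start a new alignment here
                score[i - 1][j - 1] + substitution,    # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                 # gap in b
                score[i][j - 1] + gap,                 # gap in a
            )
            best = max(best, score[i][j])
    return best

print(smith_waterman("ACGT", "ACGT"))         # 8: four matches
print(smith_waterman("TTACGTTT", "GGACGGG"))  # 6: the shared ACG
print(smith_waterman("AAAA", "TTTT"))         # 0: nothing in common
```

The table has len(a) × len(b) cells, which is why searching a whole database this way is slow and why BLAST starts from small matching regions instead.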
[Figure: Zebrafish Gene and Human Gene, each shown with its Protein]
Humans: Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease)
Animal models: Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease model)
[Figure: timeline, 1996 to 2002, for Humans and Zebrafish. Labels: mutants discovered; genes identified; Nodal as cause of holoprosencephaly]
Recording and Storing Phenotypes at ZFIN
• Gene matching is easy because all DNA is made up of the same 4 characters.
• Describing the way something looks is subjective.
shh-/-
Phenotype = entity + attribute + value
eye + placement + abnormal
brain + size + small
kidney + size + hypertrophied
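Once a phenotype is stored as entity + attribute + value, comparing two sets of annotations becomes a simple set operation rather than a judgment call about free text. A minimal sketch follows. The Phenotype class and the second annotation set are hypothetical; the first set uses the three shh-/- examples above.

```python
from typing import NamedTuple

class Phenotype(NamedTuple):
    entity: str
    attribute: str
    value: str

# The three example annotations from the slide.
shh_mutant = {
    Phenotype("eye", "placement", "abnormal"),
    Phenotype("brain", "size", "small"),
    Phenotype("kidney", "size", "hypertrophied"),
}

# Hypothetical second annotation set, invented only to show the comparison.
other_annotations = {
    Phenotype("brain", "size", "small"),
    Phenotype("eye", "placement", "abnormal"),
    Phenotype("nose", "morphology", "abnormal"),
}

shared = sorted(shh_mutant & other_annotations)
for phenotype in shared:
    print(phenotype.entity, phenotype.attribute, phenotype.value)
```

The comparison only works if both sides use the same controlled terms, which is why the vocabulary tables matter.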
Human holoprosencephaly | Zebrafish shh | Zebrafish oep
Human gene: SHH; OMIM:600725 | Zebrafish gene: shh | Zebrafish gene: oep
Ref: OMIM:142945 | Ref: ZDB-GENE-980526-166 | Ref: ZDB-GENE-990415-198

Entity | Attribute | Value (one entry per column listing it, in slide order)
prosencephalon development | process | arrested / reduced / arrested
brain | size | small / small / small
brain ventricle | number | single / single
midface | structure | hypoplastic / hypoplastic / hypoplastic
eye | morphology | abnormal / abnormal / abnormal
eye | number | single / single
eye | placement | mislocalized / mislocalized / mislocalized
nose | morphology | abnormal / abnormal
nostril | number | single / single
kidney | rel_size | hypertrophied / hypertrophied / hypertrophied
Data type | Total
Total ZFIN genes | 16,811
Genes with assigned human orthologs | 1,615
Total ZFIN mutants | 2,827
ZFIN mutants with potential human references | 528
More information
• Genomics, proteomics
– http://www.ncbi.nlm.nih.gov/gquery.fgi
• Algorithm Development
– http://blast.wustl.edu
• Imaging
– http://genex.hgu.mrc.ac.uk/Atlas/intro.html
• Websites and databases
– http://zfin.org