Page 1: PostgreSQL Scientific Application - Case example

PostgreSQL Genomic Databases

Sébastien Clément
sclement@cfrl.forestry.ca
Natural Resources Canada

Presented at the PostgreSQL Conference 2009 in Japan, November 20th

Page 2: Foreword

« What is this guy doing here? »

« Can PostgreSQL handle scientific databases? »

Page 3: What is genomics and why bother?

Genomics: « The study of the entire genome (all genes) of a species »

•Health and disease
•Heredity
•etc.

•Genetic improvement

Genome size      Number of genes
3 000 000 000    ~23 000
390 000 000      ~53 000

Page 4: Why study TREE genomics?

Cellulose gene

Normal:  CGACGTTAATGCCACTC
Variant: CGACGTTAATGCCACTCG

DATA
DB

Page 5: Why is a genomic DB essential?

A single gene…

•Name
•Sequence
•Size
•Functions: cell wall metabolism, cell structure, catalysis…
•Variations
•Species
•Chromosome position
•Interaction with other genes
•Similarity with other species
•more… (phew!)

…how about thousands of genes…

…for thousands of species?

Page 6: Public genomics DBs

GenBank: http://www.ncbi.nlm.nih.gov/Genbank/
UniProt: http://www.ebi.ac.uk/uniprot/
TAIR: http://www.arabidopsis.org/

Page 7: Our PostgreSQL Databases

TreeSNPs: Genes and variations
•Ruby on Rails interface
•Multi-language support
•38 tables
•~450 K records
•Mostly manual entry

PhenoTree: Observable attributes (physical, morphological)
•phpPgAdmin interface
•21 tables
•~4.1 M records

Page 8: TreeSNPs overview

General views

Page 9: TreeSNPs overview (cont’d)

Lab plate

Plate view

Results

Page 10: TreeSNPs overview (cont’d)

Example of calculations (views):

Page 11: TreeSNPs overview (cont’d)

TreeSNPs download page & demo version: http://treesnps-pub.arborea.ulaval.ca:3000/download

A paper to appear soon in

Adopted by the U. of Alberta’s (Canada) Laboratory on Mountain Pine Beetle

Page 12: PhenoTree overview

What data is stored?

> 1000 trees

> 60 K records

~ 4 M records

Dimensions & morphology

Wood analysis

Other data:
•Geographical locations
•Tree pedigree

Page 13: PhenoTree overview (cont’d)

Wood analysis properties table

[Figure: wood sample radius from pith to bark, read every 25 µm]

942 trees × ~2100 reads/tree ≈ 1.98 M reads!

Properties read:
•Wood density
•Fibre dimensions
•Cells count
•etc.
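The figures on this slide can be cross-checked with a few lines of arithmetic. This is only a rough sketch: the variable names are illustrative, and the idea that reads are evenly spaced along the whole radius (so that reads × step gives the scanned length) is an assumption, not something the slide states.

```python
# Figures taken from the slide.
TREES = 942
READS_PER_TREE = 2100   # approximate ("~2100 reads/tree")
STEP_UM = 25            # one read every 25 µm

total_reads = TREES * READS_PER_TREE

# Assumption: reads are evenly spaced, so reads x step = scanned length.
scanned_length_mm = READS_PER_TREE * STEP_UM / 1000

print(f"total reads: {total_reads:,}")
print(f"scanned length per tree: {scanned_length_mm} mm")
```

The total comes to just under two million reads, in line with the 1.98 M quoted, and under the even-spacing assumption each tree contributes roughly five centimetres of scanned radius.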

Page 14: PhenoTree overview (cont’d)

Example of calculations 1 (SQL views):

[Figure: wood sample radius from pith to bark, divided into growth rings]

Growth ring averages

Ring                    1       2       3       …
Ring width (mm)         4.35    4.53    3.70
Ring area (mm²)         122.3   253.4   302.8
Wood density (kg/m³)    744     664     611
Fibre width (µm)        22.95   23.85   23.89
Cell counts (/mm²)      1369    1446    1699

Σ: ring width, ring area
x̄: wood density, fibre width, cell counts

942 trees × ~16 rings/tree ≈ 15 K records
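The slide says these per-ring figures come from SQL views, whose definitions are not shown. The sketch below only mirrors the grouping logic in Python, under stated assumptions: the input rows, the function name `ring_summaries`, and the choice of deriving ring width as the number of reads times the 25 µm step are all hypothetical, not the author's view.

```python
from collections import defaultdict
from statistics import mean

STEP_MM = 0.025  # one read every 25 µm, as stated on the wood analysis slide

# Hypothetical reads: (tree, ring number, wood density in kg/m³).
reads = [
    ("T1", 1, 740.0), ("T1", 1, 748.0),
    ("T1", 2, 660.0), ("T1", 2, 664.0), ("T1", 2, 668.0),
]

def ring_summaries(reads):
    """Group reads by (tree, ring); sum the steps for width, average the density."""
    groups = defaultdict(list)
    for tree, ring, density in reads:
        groups[(tree, ring)].append(density)
    return {
        key: {"width_mm": len(values) * STEP_MM, "mean_density": mean(values)}
        for key, values in groups.items()
    }

summaries = ring_summaries(reads)
for key in sorted(summaries):
    print(key, summaries[key])
```

In SQL the same shape would be a `GROUP BY` on tree and ring with `count`/`sum` and `avg` aggregates, which is why a view is a natural fit: about two million raw reads collapse to roughly fifteen thousand ring records.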

Page 15: PhenoTree overview (cont’d)

Example of calculations 2 (SQL views): crosstab function

-- Identifiers are in French: tableau_croise = crosstab table, arbre = tree,
-- nom_ligne = row name, categorie = category, valeur = value.
SELECT tableau_croise.arbre,
       tableau_croise.height_1986,
       tableau_croise.height_1992,
       tableau_croise.height_1997,
       tableau_croise.height_2004,
       tableau_croise.height_2005
FROM crosstab(
       'SELECT tree_name AS nom_ligne, year AS categorie, height AS valeur
        FROM trunk_measures
        ORDER BY 1, 2'::text,
       'SELECT DISTINCT year FROM trunk_measures ORDER BY 1'::text
     ) tableau_croise(arbre text,
                      height_1986 double precision,
                      height_1992 double precision,
                      height_1997 double precision,
                      height_2004 double precision,
                      height_2005 double precision);

Logical, but not very useful…

crosstab

…this is what end-users want
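For readers unfamiliar with `crosstab` (provided by PostgreSQL's `tablefunc` module), the transformation is a pivot from one row per tree and year to one row per tree with one column per year. The Python sketch below illustrates that reshaping only; the sample rows and the `pivot` helper are hypothetical and are not part of the PhenoTree database.

```python
# Hypothetical long-format rows: (tree_name, year, height).
rows = [
    ("A", 1986, 1.2), ("A", 1992, 3.4),
    ("B", 1986, 1.0), ("B", 1997, 5.1),
]

def pivot(rows):
    """Return (years, table): one entry per tree, one slot per year."""
    years = sorted({year for _, year, _ in rows})
    table = {}
    for tree, year, height in rows:
        # Every tree gets a slot for every year; missing measures stay None.
        table.setdefault(tree, dict.fromkeys(years))[year] = height
    return years, table

years, table = pivot(rows)
print("tree", *(f"height_{y}" for y in years))
for tree in sorted(table):
    print(tree, *(table[tree][y] for y in years))
```

Missing tree/year combinations are left as `None` here, which plays the role of the NULL that the two-argument form of `crosstab` leaves for categories with no source row.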

Page 16: Systems and user base

•Formerly Access projects (2006-7)
•Migrated to PostgreSQL 8.3 under Fedora (2007-8)
•Migrated back to Windows (2009)
•Around 20 scientific users (Universities, Federal Government)

[Diagram: production server, mirror server, Gov. Canada network, University network, local users on each side, VPN link]

Page 17: PostgreSQL and Windows – can it really work?

Task automation with DOS
•Limited functionality

Solution?

Windows Task Manager

Cygwin

Unix/bash scripts

PostgreSQL

*Thanks: Greg Smith (http://wiki.postgresql.org/wiki/Automated_Backup_on_Windows)

Script examples:
•Start Rails server (DOS)
•Backups (DOS)*
•Backup files cleaner (bash)
•VPN connection to production server (DOS)
•Mirror synchronizing (bash, DOS)
•Database version comparison (bash)
•Users & privileges report (bash)
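The scripts themselves are not shown in the slides. As an illustration of what a "backup files cleaner" typically does, here is a minimal Python sketch that deletes backup files older than a given number of days. The talk's cleaner is a bash script; the function name, the `*.backup` pattern, and the retention rule below are assumptions made for the example.

```python
import time
from pathlib import Path

def clean_backups(directory, max_age_days, pattern="*.backup", now=None):
    """Delete files matching `pattern` whose modification time is older
    than `max_age_days`. Returns the sorted names of the deleted files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

A scheduled task could call `clean_backups(backup_dir, 7)` once a day after the dump finishes, where `backup_dir` is whatever directory the backup job writes to. Passing `now` explicitly is only there to make the function easy to test.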

Page 18: Developing databases for the scientific community

Suggestions:

•Have a user-based approach

•1. Know/answer the user’s needs

•2. Limit technical jargon

•3. Think ‘usability’

Page 19: Acknowledgements

People

Jean Beaulieu – Lab director
Joël Fillon – Ruby on Rails interface designer
Jean-Philippe Dionne – Rails secure access programming
Jean Bousquet – Collaborator

All end users, particularly:
Sylvie Blais, Stéphanie Beauseigle, Marie Deslauriers, Pier-Luc Poulin, Patrick Lenz

Organizations

Arborea Forest Genomics (http://www.arborea.ulaval.ca/)
Canadian Forest Service, Natural Resources Canada
Genome Québec
Genome Canada

Page 20: Done?
