Postgres and the Genome
Jeff Pennington
Director, Translational Informatics
Center for Biomedical Informatics
and
Department of Pathology
The Children’s Hospital of Philadelphia
DNA as Data
• 4-letter ‘alphabet’ of bases – A, T, C, G
• 3,000,000,000 base pairs
• Sequence codes for biological function
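To make the "DNA as data" point concrete: a four-letter alphabet fits in 2 bits per base. The sketch below is purely illustrative and is not how Varify stores anything; the helper names (`pack_bases`, `unpack_bases`, `packed_size_bytes`) are hypothetical, and it ignores ambiguity codes such as N that real sequence files contain.

```python
# Illustrative 2-bit encoding of DNA bases (hypothetical helper names).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_bases(sequence: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpacking stays positional.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(packed: bytes, length: int) -> str:
    """Reverse pack_bases; `length` trims padding from the last byte."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def packed_size_bytes(base_pairs: int) -> int:
    """Bytes needed at 2 bits per base, rounded up."""
    return (base_pairs + 3) // 4


print(packed_size_bytes(3_000_000_000))  # bytes for 3,000,000,000 bases
```

At 2 bits per base, 3,000,000,000 bases come to 750,000,000 bytes, so the raw sequence itself is modest in size.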
VARIFY Architecture
• Varify Architecture
  – Three-tier web application
  – Harvest (http://harvest.research.chop.edu)
    • JavaScript client
    • Python server using Django ORM
    • Postgres 9.2
Database
• Physical – 9.2, RHEL VM, VMware w/ storage on host
• Round 1 – 4G RAM, 80G disk
• Round 2 – 32G RAM, 250G disk
Tuning
• max_connections – too big
• shared_buffers – amount of memory allocated to PG
• work_mem – amount of memory available to sort
• default_statistics_target – gives the query planner something to work with
Resources
• Book: PostgreSQL 9.0 High Performance
  – Ch 5 and 6
  – Page 145
• Tools: pg_buffercache
• Benchmarking:
  – \timing
  – EXPLAIN
  – log_min_duration_statement = 5000
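With log_min_duration_statement = 5000, Postgres logs any statement that runs for 5 seconds or longer along with its duration. A minimal sketch of pulling those entries out of a log follows; the function name `slow_queries`, the sample lines, the timestamps, and the table name are all hypothetical, and the text before `LOG:` depends on your log_line_prefix setting.

```python
import re

# Matches the message part of a slow-statement log entry, e.g.
#   LOG:  duration: 5230.118 ms  statement: SELECT ...
DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+"
    r"(?:statement|execute [^:]*): (?P<sql>.*)"
)


def slow_queries(log_lines):
    """Return (duration_ms, sql) pairs, slowest first."""
    found = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match:
            found.append((float(match.group("ms")), match.group("sql").strip()))
    return sorted(found, reverse=True)


# Hypothetical log lines for illustration only.
sample_log = [
    "10:00:00 LOG:  duration: 5230.118 ms  statement: SELECT count(*) FROM example_table",
    "10:00:07 LOG:  checkpoint starting: time",
    "10:01:12 LOG:  duration: 12044.902 ms  statement: SELECT * FROM example_table LIMIT 20",
]

for ms, sql in slow_queries(sample_log):
    print(f"{ms / 1000:.1f}s  {sql}")
```

The slowest statements found this way are natural candidates to rerun under \timing and EXPLAIN.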
Tuning Round 1 (4G RAM)
• max_connections = 100
• shared_buffers = 1024MB (default 32MB)
• work_mem = 200MB (default 1MB)
  – Tried 1G, bad trade-off on count (slow) vs. list (not much faster)
Tuning Round 2 (32G RAM)
• max_connections = 100
• shared_buffers = 24576MB (Increased from 1024MB)
• work_mem = 150MB (Decreased from 200MB)
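A back-of-envelope check on the two rounds, as a sketch under stated assumptions: it treats the worst case as shared_buffers plus one work_mem allocation per allowed connection. That is a simplification, since work_mem is a per-operation limit (one query can use it several times, and most connections use none at any moment). The function names are hypothetical; the settings are the ones listed above.

```python
# Rough memory arithmetic for the two tuning rounds (all values in MB).
def worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections):
    """shared_buffers plus one work_mem allocation per connection."""
    return shared_buffers_mb + work_mem_mb * max_connections


def shared_buffers_share(shared_buffers_mb, ram_mb):
    """Fraction of total RAM given to shared_buffers."""
    return shared_buffers_mb / ram_mb


rounds = {
    "Round 1": {"ram_mb": 4 * 1024, "shared_buffers_mb": 1024,
                "work_mem_mb": 200, "max_connections": 100},
    "Round 2": {"ram_mb": 32 * 1024, "shared_buffers_mb": 24576,
                "work_mem_mb": 150, "max_connections": 100},
}

for name, cfg in rounds.items():
    total = worst_case_mb(cfg["shared_buffers_mb"], cfg["work_mem_mb"],
                          cfg["max_connections"])
    share = shared_buffers_share(cfg["shared_buffers_mb"], cfg["ram_mb"])
    print(f"{name}: shared_buffers is {share:.0%} of RAM, "
          f"worst case {total} MB vs {cfg['ram_mb']} MB RAM")
```

Under this simplified model both rounds could exceed physical RAM if every connection sorted at once, which is one reason to keep max_connections and work_mem in view together.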