It's all in the genes

Posted on 05-Aug-2015

transcript

1. It's all in the genes
   The power of the Oracle database and Exadata in cancer research
   @rjlkuipers, rkuipers@vxcompany.com

2. About me
   Business Manager Data and BI Solutions
   Datawarehouse Architect
   Business Intelligence specialist
   Master's degree in Biochemistry
   - molecular biology
   - cancer genetics

3. Agenda
   Basic genetics analyses
   Technology behind this
   What does it look like
   The next step: combining genomic data with patient data
   When both worlds meet

4. Set the context: BASIC GENETICS

6. Chromosomes

7. Genes

8. Basic genetics: DETERMINING THE GENETIC SEQUENCE

9. Genetic sequence
   Blood / cancer tissue
   DNA isolation
   DNA amplification
   DNA sequencing (40x - 80x)

10. Genetic sequence
    approx. 5% of DNA consists of genes
    approx. 95% of DNA is referred to as junk DNA
    99% of the entire DNA sequence is stable
    Genetic variations are normal

12. DNA (Next Generation) Sequencing
    From blood sample to DNA sequence
    3 billion base pairs
    2 TB per sample
    Unique: whole genomes

13. Abnormal genetic variations

14. Searching for the unknown
    genetic variations normal
    genetic variations cancer
    Better diagnoses require better analyses.
    Upfront (predictive) diagnoses require a lot of data and processing power.
    Result: less-invasive treatment, better patient life.
    What did we not know (yet) and can be learned from
    Ultimate goal: centralized DNA library for statistical purposes

15. THE TECHNOLOGY BEHIND THIS

16. DNA (Next Generation) Sequencing
    3 billion base pairs
    2 TB per sample
    Whole genomes

17. Handling large volumes
    Oracle Database
    Partitioning
    Optimized data model
    Oracle Exadata Database Machine
    Optimized to run Oracle Database
    Specific performance features
    - Smart Scans
    - Exadata Hybrid Columnar Compression
    Performance increase: 700x

18. Handling large volumes - database benefits
    Datamodel V1
    Sample-oriented (partitioned)
    Each base position stored (compared to reference genome)
    - leads to 95% no-calls
    206 samples --> 800 GB
    - max 2500 samples on Exadata
    Indexes are (still) needed: index size 5x larger than sample size

19. Handling large volumes - database benefits
    Datamodel V2
    Sample-oriented (partitioned)
    Positions are stored as regions (buckets)
    - 1000 positions per region
    Buckets are indexed
    EHCC compression
    Reduce redundant data
    - Store allele 1 and 2 as 1 row when values are equal
    Storage 99 GB (246 samples)
    - Up to 20,000 samples
    Indexes require less space than in Datamodel V1

20. Exadata benefits
    Flash
    Parallel processing
    Smart Scans
    Exadata Hybrid Columnar Compression
    Let's have a look (videos courtesy of Frits Hoogland)

21. Executed tests

    Nr  Exadata features  Parallel  Disk type
    1   -                 Serial    HDD
    2   -                 Serial    FDD
    3   -                 64        HDD
    4   -                 64        FDD
    5   SS                Serial    HDD
    6   SS                Serial    FDD
    7   SS                64        HDD
    8   SS                64        FDD
    9   SS + EHCC         64        FDD

33. Query performance (times in seconds)

    Nr  Exadata features  Parallel  Disk type  11.2.0.1 / 11.2.0.2
    1   -                 Serial    HDD        695 / 153
    2   -                 Serial    FDD        403 / 91
    3   -                 64        HDD        19 / 18
    4   -                 64        FDD        16 / 13
    5   SS                Serial    HDD        41
    6   SS                Serial    FDD        37
    7   SS                64        HDD        13
    8   SS                64        FDD        6
    9   SS + EHCC         64        FDD        1

34. WHAT DOES IT LOOK LIKE?

36. Why is this important?
    Speed
    Faster results
    A "no" is found earlier
    Volume (Centralized DNA Library)
    Better statistical basis
    Less-invasive treatments for patients
    Personalized healthcare

37. Even more
    Add clinical data to genomic data.
    Patient history
    Drug treatment history
    Demographics
    Integration of data: Clinical Data, Biobanks, Lab Systems, Omic Data

38. Oracle Translational Research Center (TRC)

40. Advanced visualizations

41. Summary
    Care is primary. Technology is supporting.
    Oracle offers platforms to provide better care
    Database
    Exadata
    TRC
    Clinical and genomic data are complementary.
    Not everything is in the genes
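
The Datamodel V2 ideas on slide 19 (positions grouped into regions of 1000, and allele 1 and 2 stored as one row when they are equal) can be illustrated with a small Python sketch. This is not the presenter's schema: the row layout, the function names, and the 1-based position convention are all assumptions made for illustration.

```python
from collections import defaultdict

REGION_SIZE = 1000  # slide 19: 1000 positions per region


def region_of(position, region_size=REGION_SIZE):
    """Bucket number for a genome position (assumes 1-based positions)."""
    return (position - 1) // region_size


def to_rows(sample_id, position, allele1, allele2):
    """Build the rows to store for one position of one sample.

    Hypothetical row layout: (sample, region, position, allele_slot, value).
    Equal alleles collapse into a single row to reduce redundant data.
    """
    region = region_of(position)
    if allele1 == allele2:
        return [(sample_id, region, position, "both", allele1)]
    return [
        (sample_id, region, position, "1", allele1),
        (sample_id, region, position, "2", allele2),
    ]


def group_by_region(rows):
    """Group rows per (sample, region), the unit an index would point at."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row[0], row[1])].append(row)
    return dict(buckets)


rows = (
    to_rows("S1", 1500, "A", "A")    # homozygous: one row
    + to_rows("S1", 1750, "A", "G")  # heterozygous: two rows
    + to_rows("S1", 2001, "C", "C")  # falls into the next region
)
buckets = group_by_region(rows)
```

In this sketch an index only needs one entry per (sample, region) bucket instead of one per position, which is in line with the slide's remark that indexes take less space than in Datamodel V1.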
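
The "Performance increase: 700x" on slide 17 appears to match the query timings on slide 33. The sketch below recomputes the ratio under one assumption: that the single time reported for tests 5 to 9 can be compared with the 11.2.0.1 time of test 1 (695 seconds). The transcript does not say which database version those single values belong to, so the result is only indicative.

```python
# Times in seconds, taken from slide 33.
BASELINE_SECONDS = 695  # test 1: no Exadata features, serial, HDD (11.2.0.1)

# Tests 5-9 report one time each; the version column is an assumption.
single_times = {
    "5: SS, serial, HDD": 41,
    "6: SS, serial, FDD": 37,
    "7: SS, 64, HDD": 13,
    "8: SS, 64, FDD": 6,
    "9: SS + EHCC, 64, FDD": 1,
}


def speedup(baseline_seconds, seconds):
    """How many times faster a test ran than the baseline."""
    return baseline_seconds / seconds


speedups = {name: speedup(BASELINE_SECONDS, t) for name, t in single_times.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x faster than test 1")
```

For test 9 the ratio is 695 / 1 = 695, which rounds to the 700x quoted on slide 17.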