X team 2 - presentation

Date post: 21-Aug-2015
Upload: rayna-harris

X-team #2: High Dimensional Biological Butterflies

Data Science Workshop 2015

What do we have in common?

High-dimensional biological data

● High-throughput genotyping and phenotyping

● Finding biological meaning in big data with high N and/or P

The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, lack of appropriate tools, poor data accessibility, and insufficient training, are major impediments to rapid translational impact. -NIH BD2K

Data integration

● Data fragmentation
  ○ individual vs population
  ○ multiple -omics
  ○ multiple sources

● Discovery and prediction
  ○ genome and functional annotation
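
As a rough illustration of integrating fragmented data, the sketch below joins per-sample records from two -omics sources on a shared sample ID. The sample IDs, feature names, values, and the `integrate` helper are all hypothetical and are not taken from the slides. It uses a plain inner join, which is only one of several reasonable choices.

```python
# Hypothetical per-sample tables from two -omics sources, keyed by sample ID.
genotypes = {"s1": {"rs123": 0}, "s2": {"rs123": 1}, "s3": {"rs123": 2}}
expression = {"s1": {"geneA": 5.2}, "s3": {"geneA": 7.9}, "s4": {"geneA": 6.1}}


def integrate(*sources):
    """Inner-join feature dicts on the sample IDs shared by every source."""
    shared = set(sources[0])
    for source in sources[1:]:
        shared &= set(source)
    merged = {}
    for sample in sorted(shared):
        row = {}
        for source in sources:
            row.update(source[sample])  # later sources win on name clashes
        merged[sample] = row
    return merged


merged = integrate(genotypes, expression)
```

Samples missing from any source (here `s2` and `s4`) are dropped. An outer join with explicit missing values may be preferable when sample overlap is small.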

Statistical learning methods

● Data quality
  ○ hidden sources of variability
  ○ limitations of short read sequencing
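
To make the idea of unwanted variability concrete, here is a minimal sketch that removes a known batch offset by centering each batch on the overall mean. It assumes the batch labels are already known, so it is a deliberately simplified stand-in and does not estimate hidden factors. The function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples, then add back the overall mean."""
    grand_mean = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical expression values for one gene measured in two batches.
expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]
adjusted = center_by_batch(expr, batch)
```

If batch is confounded with the biological condition of interest, this kind of centering also removes the real signal, which is one reason the choice of correction method matters.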

Data annotation

Genome assembly/error correction

Problem Solution

Success Stories

Domain Science - Data Science Methods

Metabolic pathway - Ingenuity Pathway Analysis (http://www.ingenuity.com/products/ipa)

Genomic data
- Quality control
  - FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
  - EasyQC for genome-wide association meta-analyses (http://www.nature.com/nprot/journal/v9/n5/full/nprot.2014.071.html)
- Batch effect
  - PEER (http://www.ncbi.nlm.nih.gov/pubmed/22343431)
  - SVA (http://www.ncbi.nlm.nih.gov/pubmed/22257669)
  - scLVM (Buettner et al., 2015)
- Data storage and sharing
  - NCBI (http://www.ncbi.nlm.nih.gov)
  - GitHub (https://github.com)
  - UCSC Genome Browser (http://genome.ucsc.edu/)
- Gene annotation
  - Gene Ontology (http://geneontology.org/page/documentation)
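
As a small illustration of what sequence quality control measures, the sketch below computes the mean Phred quality score of each read in a FASTQ file. It is not how FastQC is implemented and covers only one of the many checks such tools report. It assumes Phred+33 encoding and strict four-line records, and the helper names and example reads are hypothetical.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoded qualities."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from strict 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the header


# Hypothetical two-read FASTQ content; 'I' encodes Q40 and '!' encodes Q0.
fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "!!II"]
report = {rid: mean_phred(q) for rid, _seq, q in parse_fastq(fastq)}
```

Reads whose mean quality falls below a chosen threshold could then be flagged or filtered. The threshold itself depends on the platform and the downstream analysis.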

Proteomics - Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do)

Disease survivability - WEKA (Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., Witten I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.)

Wei Xie
Famous example: Potential flaws in genomics paper scrutinized on Twitter: http://www.nature.com/news/potential-flaws-in-genomics-paper-scrutinized-on-twitter-1.17591

Same data, different interpretation

Gilad & Mizrahi-Man 2015 F1000Research, 4:121

Interdisciplinary Research

Statistics

Domain science

Computer science

Scientific writing

Collaboration

Visualization of data

Database

Bioinformatics

Interdisciplinary data science essentials

Going Forward

● Create and maintain a HowTo website for Data Science computational tools and methods.

http://data-science-for-biologists.wikia.com/wiki/Data_Science_for_Biologists_Wikia

● Collaborate via GitHub

Thanks!

