
Managing and Analyzing Global Health Data

Date posted: 27-Jan-2015
1. Managing and Analyzing Global Health Data
  • University of Washington
  • Seattle, August 30, 2011
  • Peter Speyer, Director of Data Development

2. IHME Background
  • Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health
  • Part of the Department of Global Health at the University of Washington
  • Funded by the Bill & Melinda Gates Foundation and the state of Washington (core funding), and by other funders through specific research grants
  • Created in 2007
  • 70 researchers, 30 staff

3. IHME Mission
  • Our goal is to improve the health of the world's populations by providing the best information on population health

4. [slide contains no extractable text]

5. Health-related data
  • Social determinants
  • Risk factors
  • Health data
  • Population-based data
      • Household/facility surveys
      • Census
      • Vital registration
      • Registries (provider, disease)
  • Facility-based data
      • Health records
      • Administrative data (financial, operational)
      • Research data (DSS, clinical trials, etc.)
  • Individual-based data
      • Personal health records
      • Quantified self
      • Disease-based social networks
  • Health data innovation
      • Patient engagement
      • Open data
      • Health apps

6. Key health data challenges
  • Find & access data
  • Disseminate data
  • Use data

7. Key health data challenges
  • Lack of transparency
  • Timeliness of data
  • Lack of documentation
  • Access vs. privacy

8. Key health data challenges
  • Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)
  • Diverse source data types and formats (PDF, CSV, SPSS, CSPro)
  • Data quality issues

9. Key health data challenges
  • Make results data engaging
  • Accountability: share results, code, source data
  • Accommodate diverse audiences (expertise, geographies)

10. Example: Global Burden of Disease
  • Mortality & causes of death
      • Sources: census, surveys, vital registration, verbal autopsy
      • Estimates: covariate models, spatial-temporal regressions; weighted combination of models
  • Morbidity
      • Sources: literature reviews, surveys, registries, hospital data
      • Disease modeling: compartmental Bayesian model
  • Health severity weights
  • Burden of disease: DALYnator
  • 300 diseases, 40 risk factors, 21 regions
  • 1990, 2005, 2010

11. GBD Country Years, Causes of Death 1950-2009

12. GBD Country Years, Causes of Death 1950-2009

    Data source            Countries   Site-years   # of Deaths
    VR                           128        4,190   722,267,710
    Household surveys            136        2,827    10,132,976
    Surveillance systems          12          126       717,698
    National VA                   21           71       301,855
    Subnational VA                59          442     2,606,815
    Mortuary registries            6           25        54,316
    TOTAL                                   7,680   735,564,116

13. Solutions: computing infrastructure
  • Analysis with statistical packages
      • Projects with 100K+ lines of code
  • File system
      • 60TB disk space
      • Redundant backup
  • Cluster with 63 nodes (+300% in 2011), ~2000 cores
      • Runs 24x7, very little downtime
  • Virtual environments to test new applications, serve them to collaborators, etc.

14. Solutions: Global Health Data Exchange
  • Slide columns: Objectives, Approach, Implementation
  • Transparency => data catalog
  • Access => data repository
  • Information => data community (future)
  • One record per dataset
  • Standardized metadata
  • Internal users (10K records): files on file server
  • External users (5K records): files for download
  • CMS: Drupal
  • Search: Solr

15. [slide contains no extractable text]

16. Thank you!
  • Peter Speyer, Director of Data Development
  • [email protected]
  • @peterspeyer
  • www.ghdx.org
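
Note on the slide 12 table: the following is a minimal Python sketch, not part of the presentation, of how the per-source counts could be tallied and each source's share of recorded deaths computed. It uses only the rows as transcribed from the slide. The function name `tally` and the tuple layout are illustrative assumptions. The sums are computed from the rows as they appear here and may not agree with the TOTAL row printed on the slide.

```python
# Rows transcribed from the slide 12 table:
# (data source, countries, site-years, number of deaths)
ROWS = [
    ("VR", 128, 4_190, 722_267_710),
    ("Household surveys", 136, 2_827, 10_132_976),
    ("Surveillance systems", 12, 126, 717_698),
    ("National VA", 21, 71, 301_855),
    ("Subnational VA", 59, 442, 2_606_815),
    ("Mortuary registries", 6, 25, 54_316),
]


def tally(rows):
    """Sum site-years and deaths, and compute each source's share of deaths.

    Hypothetical helper for illustration only.
    """
    site_years = sum(row[2] for row in rows)
    deaths = sum(row[3] for row in rows)
    shares = {row[0]: row[3] / deaths for row in rows}
    return site_years, deaths, shares


if __name__ == "__main__":
    site_years, deaths, shares = tally(ROWS)
    print(f"site-years: {site_years:,}")
    print(f"deaths:     {deaths:,}")
    # List sources from largest to smallest share of recorded deaths.
    for source, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{source:<22}{share:7.2%}")
```

Run as a script, this prints the summed site-years and deaths followed by one line per data source, which makes it easy to compare the computed sums against the slide's TOTAL row.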
