Date posted: 18-Jan-2017
Category: Health & Medicine
Uploaded by: sb-bhattacharyya
CLINICAL DATA ANALYTICS
Dr SB Bhattacharyya
MBBS, MBA, FCGP
Member, IMA Standing Committee on IT, IMA Hqrs
Member, EHR Standards Committee, MoH&FW, GoI
Hony. State Secretary (2015), IMA Haryana
President (2010 – 2011), IAMI
“If you can measure that of which you speak and can
express it by a number, you know something of your
subject; but when you cannot measure it, when you
cannot express it in numbers, your knowledge is meagre
and unsatisfactory.”
- Lord Kelvin
Dr SB Bhattacharyya© 2
“If it were not for the great variability among individuals, medicine might as well be a science and not an art”
“The good physician treats the disease; the great physician treats the patient who has the disease.”
Sir William Osler, 1892
Patient with Acute Fever (Europe)
In the 13th Century…
• Diagnosis was fever
• Treat with white benedicta (blessed thistle) taken on an empty stomach while reciting the Pater Noster and Ave Maria three times
Patient with Acute Fever (Europe)
In the 19th Century…
• Diagnosis was pneumonia, made using the newly invented stethoscope
• Treat by bloodletting, restricted diet and blistering induced by dried, pulverized Spanish fly
Patient with Acute Fever (Europe)
In the 20th Century…
• Diagnosis is pneumonia, made using a chest X-ray (CXR), PA view
• Treat with antibiotics (penicillin)
• Lumbar puncture if signs of meningitis are present or develop
Hertfordshire Records :: DOHAD
■ Meticulous birth records were maintained throughout Hertfordshire County, UK, from 1911 onwards through the efforts of a dedicated and visionary midwife, Ethel Margaret Burnside
■ By linking these birth records with health in later life, a research team headed by Dr David Barker developed the fetal origins hypothesis, termed DOHAD (developmental origins of health and disease)
Clinical Science is Empirical
■ The word empirical denotes information gained by means of observation, experience, or experiment.
■ Empirical data is data that is produced by an experiment or observation.
■ This is in contrast to theoretical knowledge, which depends on hypotheses
Medical Records Data Volume & Costs
■ On average, each patient generates around 80 MB of data per year (4 MB text & 76 MB imaging)
■ Storage costs < US$ 2.00 per patient for 7 years
– Dr John Halamka, MD, MSCIO, Beth Israel Deaconess Medical Center
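As a back-of-the-envelope check of the figures above, the sketch below works out the seven-year data volume per patient and the storage cost per gigabyte that the quoted ceiling implies. The cohort size of 10,000 patients is a hypothetical value chosen only for illustration, and decimal units (1 GB = 1,000 MB) are assumed.

```python
# Figures quoted on the slide
MB_PER_PATIENT_PER_YEAR = 80       # 4 MB text + 76 MB imaging
RETENTION_YEARS = 7
MAX_COST_USD_PER_PATIENT = 2.00    # quoted upper bound for the whole period


def volume_gb(patients: int, years: int = RETENTION_YEARS) -> float:
    """Total data volume in GB, assuming 1 GB = 1000 MB."""
    return patients * years * MB_PER_PATIENT_PER_YEAR / 1000


per_patient_gb = volume_gb(1)
# Cost per GB implied by the quoted ceiling (an upper bound, not a price list)
implied_cost_per_gb = MAX_COST_USD_PER_PATIENT / per_patient_gb

HYPOTHETICAL_COHORT = 10_000       # illustrative only
print(f"Per patient over {RETENTION_YEARS} years: {per_patient_gb:.2f} GB")
print(f"Implied cost ceiling: US$ {implied_cost_per_gb:.2f} per GB")
print(f"Cohort of {HYPOTHETICAL_COHORT}: {volume_gb(HYPOTHETICAL_COHORT):.0f} GB")
```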
“Statistics are like bikinis. What they reveal is suggestive but what they conceal is vital”
- Aaron Levenstein
Nota Bene
■ Statistics are confusing, and open to misinterpretation, unless one understands the numbers and what they actually mean
■ There is always a chance of over-analysis leading to analysis paralysis
■ It is important to ask the right questions and re-frame them intelligently
Nota Bene
■ Running the analytics is all science – mostly mathematics
■ Interpreting the results is all art derived from knowledge and wisdom
■ It is possible to predict with a reasonable degree of accuracy (~95%) the most likely outcome under a given set of circumstances
Nota Bene
■ One must continuously strive to avoid overfitting
■ The likelihood ratio is the best indicator, while the p-value is the worst
■ Hindsight is 6/6 vision, foresight is 0/0
Clinical data is…
■ Highly multivariate with many important predictors and response variables
■ Temporally correlated (longitudinal, survival studies)
■ Costly and difficult to obtain
■ Historical in nature
Few Indices
■ Sensitivity
■ Specificity
■ Likelihood Ratio (+/-)
■ Predictive Value (+/-)
■ Prevalence
■ Pre-test/Post-test Odds
■ Post-test Probability (+/-)
■ Kaplan-Meier Survival Curves / Cox’s Hazard Ratio
■ Relative Risk
■ Relative Risk Reduction
■ Absolute Risk Reduction
■ Odds Ratio
■ Number Needed to Treat (or Harm)
■ Quality-Adjusted Life Year (QALY)
■ Receiver Operating Characteristic (ROC) Curve
■ Total Cost of Treatment
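Several of the indices listed above (relative risk, relative risk reduction, absolute risk reduction, odds ratio and number needed to treat) can be computed from the event counts of a two-arm study. The sketch below is a minimal illustration using the standard textbook definitions; the function name, field names and counts are hypothetical, and no guard is included for arms with zero events or identical risks, which would divide by zero.

```python
from dataclasses import dataclass


@dataclass
class RiskMeasures:
    relative_risk: float
    relative_risk_reduction: float
    absolute_risk_reduction: float
    odds_ratio: float
    number_needed_to_treat: float


def risk_measures(events_treated: int, n_treated: int,
                  events_control: int, n_control: int) -> RiskMeasures:
    """Risk indices for a bad outcome in a treated vs. a control arm."""
    eer = events_treated / n_treated      # experimental event rate
    cer = events_control / n_control      # control event rate
    arr = cer - eer                       # absolute risk reduction
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    return RiskMeasures(
        relative_risk=eer / cer,
        relative_risk_reduction=arr / cer,
        absolute_risk_reduction=arr,
        odds_ratio=odds_treated / odds_control,
        # Unrounded; by convention it is rounded up to a whole patient
        number_needed_to_treat=1 / arr,
    )


# Hypothetical trial: 10 of 100 treated and 20 of 100 control patients
# experience the event. These are not real data.
print(risk_measures(10, 100, 20, 100))
```

A number needed to harm follows from the same arithmetic when the event rate is higher in the treated arm, in which case the absolute risk reduction is negative.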
Outcomes
■ Patient better, same or worse
■ Cost less, same or more
■ Needs less time, the same time or more time to recover / for relief
■ Needs less time, the same time or more time to cure
■ Cure vs. Recover/Relief
5 C’s of Analytics
■ Curiosity – figure out what one wishes to figure out
■ Capture – the data
■ Cure – clean and transform the data
■ Crunch – run the chosen analytical model
■ Create – reports and graphs
Steps of Performing Analytics
1. Construct query
2. Data acquisition (70 – 80% of effort)
   a. Data pre-processing and visualisation
   b. ETL
3. Algorithm modelling
4. Run model
5. Study results
6. Repeat from step 3 above till the most appropriate answer is derived; occasionally the data may have to be re-processed, which most analytical tools are capable of performing
The Process of Analytics
■ Several alternative models may need to be run before the “right” model is discovered.
■ With experience, the number of alternative models required to be studied before finding the “right” one will diminish.
Data Analytic Reports
■ Data Management: ETL
– Acquire Data
– Clean Data
– Prepare Data (incl. anonymisation)
■ Query Preparation
– Formulate Null Hypothesis
– Determine Data Requirements
■ Analytics Management
– Prepare Analysis
– Run Analysis
– Review Results
– Review Analytical Steps
■ Repeat Cycle
– Analytics Management Onwards
– Query Preparation Onwards
– Data Management Onwards
Sensitivity
Proportion of truly diseased persons in the screened population who are identified as diseased by the screening test (i.e. they have high scores).
Sensitivity indicates the probability that the test will correctly diagnose a case, or the probability that any given case will be identified by the test.
Does the test come back positive when the disease is really present?
That is, the level of confidence that a true case will not be missed.
To help you remember the term, being sensitive implies being able to react to something.
Specificity
Proportion of persons without the disease who have low scores on the screening test: the probability that the test will correctly identify a non-diseased person.
Does the test come back negative when the disease is really absent?
That is, the level of confidence that a non-diseased person will not be wrongly flagged.
To help you remember the term, a specific test is one that picks up only the disease in question, so it has a narrow focus, which explains the term 'specific'.
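The two definitions above can be sketched in Python from a 2x2 screening table. The cell labels follow the A/B/C/D convention of the likelihood ratio formulas in this deck, read here as A = true positives, B = false positives, C = false negatives and D = true negatives. The function names and the counts are hypothetical, chosen only for illustration.

```python
def sensitivity(a: int, c: int) -> float:
    """Share of truly diseased persons that the test flags: A / (A + C)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Share of non-diseased persons that the test clears: D / (B + D)."""
    return d / (b + d)


# Hypothetical screening results, not real data:
# 100 diseased persons (90 detected, 10 missed) and
# 300 non-diseased persons (30 wrongly flagged, 270 correctly cleared).
A, B, C, D = 90, 30, 10, 270

print(f"sensitivity = {sensitivity(A, C):.2f}")
print(f"specificity = {specificity(B, D):.2f}")
```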
Likelihood Ratio
■ The Likelihood Ratio (LR) is a ratio of likelihoods (or probabilities) for a condition. The first is the probability that a given condition occurs (or not) in the first observation paradigm. The second is the probability that the same condition occurs (or not) in the second observation paradigm. The ratio of these 2 probabilities (or likelihoods) is the Likelihood Ratio.
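■ Likelihood ratio+ = sensitivity / (1 - specificity) or (A/(A + C)) / (B/(B + D))
■ Likelihood ratio- = (1 - sensitivity) / specificity or (C/(A + C)) / (D/(B + D))
■ Here A = true positives, B = false positives, C = false negatives and D = true negatives in the 2x2 table of test result against disease status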
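The count form of the two formulas above can be sketched as follows, with A, B, C and D read as true positives, false positives, false negatives and true negatives. The function name and the counts are hypothetical. The sketch assumes that every cell needed as a denominator is non-zero; a test with no false positives, for example, has an unbounded LR+.

```python
def likelihood_ratios(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (LR+, LR-) from the cells of a 2x2 diagnostic table."""
    true_positive_rate = a / (a + c)    # sensitivity
    false_positive_rate = b / (b + d)   # 1 - specificity
    false_negative_rate = c / (a + c)   # 1 - sensitivity
    true_negative_rate = d / (b + d)    # specificity
    lr_positive = true_positive_rate / false_positive_rate
    lr_negative = false_negative_rate / true_negative_rate
    return lr_positive, lr_negative


# Hypothetical counts, not real data
lr_pos, lr_neg = likelihood_ratios(a=90, b=30, c=10, d=270)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```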
Likelihood Ratio
■ Thus, LR is a way to incorporate the sensitivity and specificity of a test into a single measure. Since sensitivity and specificity are fixed characteristics of the test itself within the clinical sciences paradigm, the likelihood ratio is independent of the prevalence in the population.
■ The LR basically measures the power of a test to change the pre-test into the post-test probability of a particular outcome happening.
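One common way to apply this is through odds: convert the pre-test probability to odds, multiply by the LR, and convert back. The sketch below assumes that route and, purely for illustration, uses prevalence as the pre-test probability. The function name, the LR of 9 and the three prevalence values are hypothetical.

```python
def post_test_probability(pre_test_probability: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (1 + post_test_odds)


LR_POSITIVE = 9.0  # hypothetical test characteristic

# The LR stays the same in every population, but the post-test
# probability it leads to depends on where one starts.
for prevalence in (0.01, 0.10, 0.50):
    result = post_test_probability(prevalence, LR_POSITIVE)
    print(f"pre-test {prevalence:.2f} -> post-test {result:.3f}")
```

The loop illustrates the point made above: the LR is a property of the test, while the predictive value of a positive result still varies with prevalence.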
LR Value Interpretation
■ LRs greater than 10 or less than 0.1 (LR > 10 or LR < 0.1): cause large changes
■ LRs 5 - 10 or 0.1 - 0.2 (LR > 5 & < 10 or LR > 0.1 & < 0.2): cause moderate changes
■ LRs 2 - 5 or 0.2 - 0.5 (LR > 2 & < 5 or LR > 0.2 & < 0.5): cause small changes
■ LRs 1 - 2 or 0.5 - 1 (LR > 1 & < 2 or LR > 0.5 & < 1): cause tiny changes
■ LRs equal to 1 (LR = 1): cause no change at all
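The bands above can be sketched as a small lookup function. Two assumptions are made here: the "tiny" band is read as LRs between 1 and 2 or between 0.5 and 1, and exact boundary values (10, 5, 2, 0.5, 0.2, 0.1), which the slide leaves unassigned, are placed in the band whose range names them, with 5 and 0.2 counted as moderate. The function name is hypothetical.

```python
def lr_effect(lr: float) -> str:
    """Describe how strongly an LR shifts pre-test to post-test probability."""
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    if lr == 1:
        return "no change"
    if lr > 10 or lr < 0.1:
        return "large"
    if lr >= 5 or lr <= 0.2:
        return "moderate"
    if lr >= 2 or lr <= 0.5:
        return "small"
    return "tiny"


for value in (15, 9, 3, 1.5, 1, 0.7, 0.3, 0.15, 0.05):
    print(f"LR = {value}: {lr_effect(value)}")
```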
Big Data in Healthcare
■ High Volume
– Data from all sorts of sources in electronic format
■ High Velocity
– Data from devices, monitors and a variety of systems streams in continuously, 24x7
■ High Variety
– Data comes in almost every type
■ High Veracity
– Data sources are dependable as they are mostly known
Big Data in Healthcare
■ Sources of data
– Wi-Fi/Bluetooth/NFC-enabled personal healthcare monitoring devices
– Smartphones/smart devices (iPod, iPad, etc.)
– Radio-diagnostic imaging devices
– Electronic medical records/health records
– Social media
Big Data in Healthcare
■ Data Types
– Textual: EHR and clinical and nursing informatics systems
– Numeric: lab systems and devices
– Coded: EHR and devices
– Audio: EHR and lab systems
– Image: EHR and radio-diagnostic systems
– Video: EHR and radio-diagnostic systems
– Waveform: devices and monitors
– Streamed binary data: wearables, bio-sensors, monitors
Types of Data Analysis
■ Prediction
– Classification
– Regression
– Latent Knowledge Estimation
■ Structure Discovery
– Clustering
– Factor Analysis
– Domain Structure Discovery
– Network Analysis
■ Relationship mining
– Association rule mining
– Correlation mining
– Sequential pattern mining
– Causal data mining
■ Distillation of data for human judgment
■ Discovery with models
Machine Learning Techniques Used
Algorithm | Application Areas
Linear Regression | Cost predictions
Logistic Regression | Likely outcomes (treatment/intervention)
Neural Networks | Likely outcomes (treatment/intervention)
Support Vector Machines | In place of linear / logistic regression
Classification (Decision Tree, OneR) and Clustering (K-Means, Cobweb) | Finding groups (clusters) of similar observations like clinical outcomes
Principal Component Analysis | Data and image compression
Anomaly Detection (Signal Detection) | Any significant observation (signal) amongst a ton of observations (noise)
Recommender Systems (Collaborative Filtering & Market Basket Analysis) | Drug & treatment suggestions based on care provider/patient/peer preferences - personalised medicine
Predictive Analytics
■ Data pre-processing and visualisation
■ Attribute selection
■ Classification (OneR, Decision trees)
■ Prediction (Nearest neighbour)
■ Clustering (K-means, Cobweb)
■ Association rules
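Of the classifiers named above, OneR is simple enough to sketch in a few lines: for each attribute it builds a one-level rule that maps every attribute value to its most frequent class, then keeps the attribute whose rule makes the fewest errors on the training data. The sketch below is a minimal pure-Python version under that description. The function name, attribute names and the six-row data set are hypothetical toy values, not clinical data.

```python
from collections import Counter, defaultdict


def one_r(rows: list[dict], target: str) -> tuple[str, dict, int]:
    """Return (best attribute, its value-to-class rule, training errors)."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attribute in attributes:
        # Count the classes seen for each value of this attribute
        class_counts = defaultdict(Counter)
        for row in rows:
            class_counts[row[attribute]][row[target]] += 1
        # The rule predicts the majority class for each value
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in class_counts.items()}
        # Errors are the rows that do not belong to the majority class
        errors = sum(sum(counts.values()) - max(counts.values())
                     for counts in class_counts.values())
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical toy data for illustration only
ROWS = [
    {"fever": "high", "cough": "yes", "outcome": "pneumonia"},
    {"fever": "high", "cough": "yes", "outcome": "pneumonia"},
    {"fever": "high", "cough": "no", "outcome": "pneumonia"},
    {"fever": "low", "cough": "yes", "outcome": "other"},
    {"fever": "low", "cough": "no", "outcome": "other"},
    {"fever": "low", "cough": "no", "outcome": "other"},
]

attribute, rule, errors = one_r(ROWS, target="outcome")
print(attribute, rule, errors)
```

On this toy data the rule built on fever makes no training errors, while the rule built on cough misclassifies two rows, so fever is selected. A rule that fits six training rows perfectly says nothing about new patients, which is the overfitting risk noted earlier in the deck.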
Application Areas
■ Operational
– Administrative
– Clinical
– Nursing
■ Predictive
– Clinical decision support
– Outcomes (prognostics) assessment
– Readmission prevention
– Adverse event avoidance
– Disease management
– Patient matching
– Personalised medicine