
Electronic Medical Record

Date post: 01-Jan-2016
Transcript
Page 1: Electronic Medical Record

Machine Learning for Healthcare

David Page

Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences
University of Wisconsin-Madison

Page 2: Electronic Medical Record

Electronic Medical Record

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months
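The tables on this slide form a relational schema in which one patient row fans out to many visit and lab rows. A minimal sketch of that structure, using Python's built-in `sqlite3` module, is given here; the table and column names are assumptions chosen to mirror the slide, not the schema of any real EMR system.

```python
import sqlite3

# In-memory database. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id TEXT PRIMARY KEY, gender TEXT, birthdate TEXT);
CREATE TABLE visit (patient_id TEXT, date TEXT, physician TEXT,
                    symptoms TEXT, diagnosis TEXT);
CREATE TABLE lab (patient_id TEXT, date TEXT, lab_test TEXT, result REAL);
""")

# Rows copied from the slide.
conn.execute("INSERT INTO patient VALUES ('P1', 'M', '3/22/63')")
conn.executemany("INSERT INTO visit VALUES (?, ?, ?, ?, ?)", [
    ("P1", "1/1/01", "Smith", "palpitations", "hypoglycemic"),
    ("P1", "2/1/03", "Jones", "fever, aches", "influenza"),
])
conn.executemany("INSERT INTO lab VALUES (?, ?, ?, ?)", [
    ("P1", "1/1/01", "blood glucose", 42),
    ("P1", "1/9/01", "blood glucose", 45),
])

# Relevant evidence for one diagnosis lives in another table:
# find lab results recorded on the same day as a visit.
rows = conn.execute("""
SELECT v.diagnosis, l.lab_test, l.result
FROM visit AS v
JOIN lab AS l ON v.patient_id = l.patient_id AND v.date = l.date
WHERE v.patient_id = 'P1'
""").fetchall()
print(rows)
```

The join illustrates why a single flat feature vector per patient is awkward here: the information needed for one prediction is spread over several rows and tables.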

Page 3: Electronic Medical Record

Predictive Personalized Medicine

[Diagram labels]
Genetic, Clinical, & Environmental Data
State-of-the-Art Machine Learning
Predictive Model for Disease Susceptibility & Treatment Response
Individual Patient: G + C + E
Personalized Treatment

Repeat for hundreds of diseases and treatments
Repeat for thousands of patients

Page 4: Electronic Medical Record

Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data

International Warfarin Pharmacogenetics Consortium (IWPC)

NEJM, February 19, 2009, vol. 360, no. 8

Page 5: Electronic Medical Record

Motivation

“In Milestone, FDA Pushes Genetic Tests Tied to Drug”

Where: Front-page article, Wall Street Journal, August 16, 2007

Why: FDA released new warfarin product labeling with pharmacogenomics dosing recommendations

What: New pharmacogenetics section and changes in initial dosage section, with pharmacogenetics in the warnings section

http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

Page 6: Electronic Medical Record

“In Milestone, FDA Pushes Genetic Tests Tied to Drug”

Initial dosing (warfarin package insert)

“The dosing of COUMADIN must be individualized according to patient’s sensitivity to the drug as indicated by the PT/INR….. It is recommended that COUMADIN therapy be initiated with a dose of 2 to 5 mg per day with dosage adjustments based on the results of PT/INR determinations. The lower initiation doses should be considered for patients with certain genetic variations in CYP2C9 and VKORC1 enzymes as well as for elderly and/or debilitated patients….”

http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

Page 7: Electronic Medical Record

Clinicians’ responses to FDA labeling change for warfarin

• How, exactly, would I use this information?
• Nice science, but prove to me that it’s better than what we already do
• i.e., I have to see a randomized trial comparing genotype-guided versus usual dosing

Summer 2009: the NHLBI Clarification of Optimal Anticoagulation through Genetics (COAG) trial (PI: Stephen Kimmel, MD)

Page 8: Electronic Medical Record

Current warfarin pharmacogenetics information limitations

Clinical utility (or a randomized trial) will require a dosing equation that incorporates genetic and non-genetic, demographic information.

Numerous such equations have been proposed, but:
• most are highly geographically confined
• none were developed from robust data in Asians, Caucasians, and Africans

Thus, an equation derived from a large, geographically and ethnically diverse population was needed to help ensure global clinical utility.

Page 9: Electronic Medical Record

IWPC - 21 research groups
4 continents and 9 countries

Asia: Israel, Japan, Korea, Taiwan, Singapore
Europe: Sweden, United Kingdom
North America: USA (11 states: Alabama, California, Florida, Illinois, Missouri, North Carolina, Pennsylvania, Tennessee, Utah, Washington, Wisconsin)
South America: Brazil

Page 10: Electronic Medical Record

Dataset

5,700 patients treated with warfarin
• Demographic characteristics
• Primary indication for warfarin treatment
• Stable therapeutic dose of warfarin
• Treatment INR
• Target INR
  • 5,052 patients with a target INR of 2-3
• Concomitant medications
  • Grouped by increased or decreased effect on INR
• Presence of genotype variants
  • CYP2C9 (*1, *2 and *3)
  • VKORC1 (one of seven SNPs in linkage disequilibrium)
  • blinded re-genotyping for quality control

Page 11: Electronic Medical Record

Age, height and weight

Page 12: Electronic Medical Record

Average warfarin doses for stable INR (median – 2.5)

Page 13: Electronic Medical Record

Race, inducers and amiodarone

Page 14: Electronic Medical Record

CYP2C9 and VKORC1 genotypes

Page 15: Electronic Medical Record

Weekly dose by CYP2C9 genotype

Page 16: Electronic Medical Record

CYP2C9 genotype by race

Page 17: Electronic Medical Record

Weekly dose by VKORC1 -1639 genotype

Page 18: Electronic Medical Record

VKORC1 -1639 genotype by race

Page 19: Electronic Medical Record

Modeling of VKORC1 SNPs

Missing values of VKORC1 -1639 G>A (rs9923231)
• Imputed based on race and VKORC1 SNP data at 2255C>T (rs2359612), 1173 C>T (rs9934438), or 1542G>C (rs8050894)
• If the VKORC1 genotype could not be imputed, it was treated as “missing” (a distinct variable) in the model.
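Treating "missing" as a distinct variable can be read as giving the genotype one extra indicator level instead of dropping the patient. The sketch below shows that encoding; the level names and the function `encode_vkorc1` are hypothetical, and the imputation step from the linked SNPs is deliberately not reproduced.

```python
# Genotype levels for VKORC1 -1639 G>A plus an explicit "missing" level.
# Names are illustrative; only the idea of a separate indicator is from the slide.
VKORC1_LEVELS = ["G/G", "A/G", "A/A", "missing"]


def encode_vkorc1(genotype):
    """Return one 0/1 indicator per level.

    Any value that is not a known genotype (for example None, when the
    genotype was neither measured nor imputable) maps to "missing".
    """
    level = genotype if genotype in VKORC1_LEVELS[:3] else "missing"
    return {f"VKORC1_{name}": int(name == level) for name in VKORC1_LEVELS}


print(encode_vkorc1("A/G"))
print(encode_vkorc1(None))
```

With this encoding a regression model learns a separate coefficient for patients whose genotype is unknown, which is how the "VKORC1 genotype unknown" term in the dosing equation can be interpreted.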

Page 20: Electronic Medical Record

Data Analysis Methodology

Derivation Cohort
• 4,043 patients with a stable dose of warfarin (in mg/week) and a target INR of 2-3
• Used for developing dose prediction models

Validation Cohort
• 1,009 patients (20% of dataset)
• Used for testing final selected model

Analysis group did not have access to validation set until after the final model was selected
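A held-out validation set of this kind can be sketched as a single random split made once, before any modeling. The function name, the seed, and the use of integer patient IDs below are assumptions; the consortium's actual split procedure is not described on the slide, and this sketch yields 1,010 rather than exactly 1,009 validation patients.

```python
import random


def split_cohort(patient_ids, validation_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed, then hold out a validation fraction."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_validation = int(len(ids) * validation_fraction)
    # First return value is the derivation cohort, second the validation cohort.
    return ids[n_validation:], ids[:n_validation]


# 5,052 patients with a target INR of 2-3, represented by hypothetical IDs.
derivation, validation = split_cohort(range(5052))
print(len(derivation), len(validation))
```

Keeping the validation IDs away from the analysts until the final model is chosen is a process rule, not something code can enforce; the split only makes that rule possible.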

Page 21: Electronic Medical Record

Real-valued prediction methods used

Included, among others
• Support vector regression
• Regression trees
• Model trees
• Multivariate adaptive regression splines
• Least-angle regression
• Lasso
• Logarithmic and square-root transformations
• Direct prediction of dose

Support vector regression and ordinary least-squares linear regression gave the lowest mean absolute error
• Predicted the square root of the dose
• Incorporated both genetic and clinical data
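As an illustration of predicting the square root of the dose and scoring by mean absolute error, the sketch below fits a one-variable least-squares line to square-root doses and squares the predictions before measuring error in mg/week. The data are invented toy values and the single predictor is an assumption; the real models used many clinical and genetic variables.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: age in decades and stable weekly dose in mg.
ages = [3, 4, 5, 6, 7, 8]
doses = [49.0, 42.0, 36.0, 30.0, 25.0, 20.0]

# Fit on the square-root scale, then square to get back to mg/week.
intercept, slope = fit_line(ages, [math.sqrt(d) for d in doses])
predicted = [(intercept + slope * a) ** 2 for a in ages]
mae = mean_absolute_error(doses, predicted)
print(round(mae, 2))
```

The error is computed on the original dose scale because that is the quantity a clinician would act on, even though the model is fitted on the transformed scale.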

Page 22: Electronic Medical Record

IWPC pharmacogenetic dosing algorithm

  5.6044
- 0.2614 x Age in decades
+ 0.0087 x Height in cm
+ 0.0128 x Weight in kg
- 0.8677 x VKORC1^ A/G
- 1.6974 x VKORC1 A/A
- 0.4854 x VKORC1 genotype unknown
- 0.5211 x CYP2C9 *1/*2
- 0.9357 x CYP2C9 *1/*3
- 1.0616 x CYP2C9 *2/*2
- 1.9206 x CYP2C9 *2/*3
- 2.3312 x CYP2C9 *3/*3
- 0.2188 x CYP2C9 genotype unknown
- 0.1092 x Asian race
- 0.2760 x Black or African American
- 0.1032 x Missing or Mixed race
+ 1.1816 x Enzyme inducer status
- 0.5503 x Amiodarone status
= Square root of weekly warfarin dose**

**The output of this algorithm must be squared to compute weekly dose in mg
^All references to VKORC1 refer to genotype for rs9923231
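The equation on this slide can be transcribed directly into code. The sketch below uses the slide's coefficients; the function name, the argument names, and the zero-valued reference categories (VKORC1 G/G, CYP2C9 *1/*1, white race) are assumptions of this illustration, and it is not intended for clinical use.

```python
# Coefficients copied from the slide. The 0.0 entries are the implied
# reference categories, an assumption of this sketch.
VKORC1 = {"G/G": 0.0, "A/G": -0.8677, "A/A": -1.6974, "unknown": -0.4854}
CYP2C9 = {
    "*1/*1": 0.0, "*1/*2": -0.5211, "*1/*3": -0.9357, "*2/*2": -1.0616,
    "*2/*3": -1.9206, "*3/*3": -2.3312, "unknown": -0.2188,
}
RACE = {"white": 0.0, "asian": -0.1092, "black": -0.2760,
        "missing_or_mixed": -0.1032}


def pgx_weekly_dose_mg(age_decades, height_cm, weight_kg, vkorc1="G/G",
                       cyp2c9="*1/*1", race="white",
                       enzyme_inducer=False, amiodarone=False):
    """Predicted weekly warfarin dose in mg (illustration only)."""
    sqrt_dose = (
        5.6044
        - 0.2614 * age_decades
        + 0.0087 * height_cm
        + 0.0128 * weight_kg
        + VKORC1[vkorc1]
        + CYP2C9[cyp2c9]
        + RACE[race]
        + 1.1816 * int(enzyme_inducer)
        - 0.5503 * int(amiodarone)
    )
    # The equation yields the square root of the dose, so square it.
    return sqrt_dose ** 2


# 50-year-old white patient, 175 cm, 80 kg (age in decades taken as 5).
wild_type = pgx_weekly_dose_mg(5, 175, 80)
sensitive = pgx_weekly_dose_mg(5, 175, 80, vkorc1="A/A", cyp2c9="*3/*3")
print(round(wild_type, 1), round(sensitive, 1))
```

For this patient the two calls give roughly 46.8 and 7.9 mg/week, which agrees with the deck's later statement that genotype alone can move the recommendation from above 45 mg/wk to below 10 mg/wk.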

Page 23: Electronic Medical Record

IWPC clinical dosing algorithm

  4.0376
- 0.2546 x Age in decades
+ 0.0118 x Height in cm
+ 0.0134 x Weight in kg
- 0.6752 x Asian race
+ 0.4060 x Black or African American
+ 0.0443 x Missing or Mixed race
+ 1.2799 x Enzyme inducer status
- 0.5695 x Amiodarone status
= Square root of weekly warfarin dose**

**The output of this algorithm must be squared to compute weekly dose in mg
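The clinical equation can be sketched the same way. Coefficients are taken from this slide; the function and argument names and the zero-valued reference category for white race are assumptions of this illustration, which is not intended for clinical use.

```python
# Race coefficients from the slide; "white" as the 0.0 reference category
# is an assumption of this sketch.
CLINICAL_RACE = {"white": 0.0, "asian": -0.6752, "black": 0.4060,
                 "missing_or_mixed": 0.0443}


def clinical_weekly_dose_mg(age_decades, height_cm, weight_kg, race="white",
                            enzyme_inducer=False, amiodarone=False):
    """Predicted weekly warfarin dose in mg without genotype information."""
    sqrt_dose = (
        4.0376
        - 0.2546 * age_decades
        + 0.0118 * height_cm
        + 0.0134 * weight_kg
        + CLINICAL_RACE[race]
        + 1.2799 * int(enzyme_inducer)
        - 0.5695 * int(amiodarone)
    )
    # The equation yields the square root of the dose, so square it.
    return sqrt_dose ** 2


# 50-year-old white patient, 175 cm, 80 kg, with and without amiodarone.
without_amiodarone = clinical_weekly_dose_mg(5, 175, 80)
with_amiodarone = clinical_weekly_dose_mg(5, 175, 80, amiodarone=True)
print(round(without_amiodarone, 1), round(with_amiodarone, 1))
```

Because this model has no genotype terms, every patient with the same clinical profile receives the same estimate (about 34.8 mg/week here), close to the 35 mg/wk fixed dose used as a comparator later in the deck.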

Page 24: Electronic Medical Record

Results

Including genotypes for CYP2C9 and VKORC1, in addition to clinical variables, gives estimates significantly closer to the appropriate initial dose of warfarin than a clinical-only or fixed-dose approach

The 46.2% of the population requiring ≤21 mg/wk or ≥49 mg/wk benefit the most
• These are the patients for whom an underdose or overdose could have adverse clinical consequences.

Patients requiring an intermediate dose are likely to obtain little benefit from including genotypes

Page 25: Electronic Medical Record

Model comparisons

Page 26: Electronic Medical Record

Warfarin doses predicted for the clinical and PGx algorithms with and without amiodarone

Genotypes can change the recommended dose from >45 mg/wk to <10 mg/wk when all other factors are equal!

Example patient: 50 yr old, White, Male, 175 cm, 80 kg

Page 27: Electronic Medical Record

Warfarin doses predicted for the clinical and PGx algorithms based on race and genotype

Example patient: 50 yr old, Male, 175 cm, 80 kg

Racial differences in the estimated dose are insignificant when genotypes are included. The clinical algorithm may substantially overestimate or underestimate the dose.

Page 28: Electronic Medical Record

% Patients with dose estimates within 20% of actual dose

• Comparison of PGx, clinical and fixed dose approaches
• 3 dose groups shown (mg/wk)
  • low (≤21)
  • intermediate (>21 to <49)
  • high (≥49)
• Fixed dose (35 mg/wk)
  • None of the estimates for low and high dose groups were within 20% of actual dose
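The "within 20% of actual dose" criterion and the three dose groups are simple to state in code. The sketch below uses hypothetical function names and checks the fixed-dose observation arithmetically: a 35 mg/wk estimate is within 20% only of actual doses between about 29.2 and 43.75 mg/wk, so it can never be within 20% for the low or high groups.

```python
def within_20_percent(estimate, actual):
    """True when the estimate differs from the actual dose by at most 20%."""
    return abs(estimate - actual) <= 0.2 * actual


def dose_group(actual_mg_per_week):
    """Dose groups as defined on the slide (mg/wk)."""
    if actual_mg_per_week <= 21:
        return "low"
    if actual_mg_per_week >= 49:
        return "high"
    return "intermediate"


def percent_within_20(estimates, actuals):
    """Percentage of patients whose estimate is within 20% of the actual dose."""
    hits = sum(within_20_percent(e, a) for e, a in zip(estimates, actuals))
    return 100.0 * hits / len(actuals)


# Hypothetical actual doses, all scored against the 35 mg/wk fixed dose.
actuals = [14, 21, 30, 35, 42, 49, 70]
fixed = [35] * len(actuals)
for actual in actuals:
    print(actual, dose_group(actual), within_20_percent(35, actual))
print(round(percent_within_20(fixed, actuals), 1))
```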

Page 29: Electronic Medical Record

Limitations of this study

1. Did not address the issue of whether a precise initial dose of warfarin translates into improved clinical end points
   • reduction in time needed to achieve a stable therapeutic INR, fewer INRs out of range, reduced incidence of bleeding or thromboembolic events

2. Did not have sufficient data across the 21 groups to include potentially important factors such as
   • smoking status, vitamin K intake, alcohol consumption, other genetic factors (e.g., CYP4F2, ApoE, GGCX), environmental factors

Page 30: Electronic Medical Record

New England Journal of Medicine, Feb 2009

Data available at PharmGKB

• www.pharmgkb.org

• Accession number: PA162355460

Page 31: Electronic Medical Record

IWPC Authors

Writing committee: Teri E. Klein, Russ B. Altman, Niclas Eriksson, Brian F. Gage, Stephen E. Kimmel, Ming-Ta M. Lee, Nita A. Limdi, David Page, Dan M. Roden, Michael J. Wagner, Michael D. Caldwell, Julie A. Johnson

Data Contributors:

Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Yuan-Tsong Chen
Chang Gung Memorial Hospital, Chang Gung University, Taiwan, ROC: Ming-Shien Wen
China Medical University, Graduate Institute of Chinese Medical Science, Taichung, Taiwan, ROC: Ming-Ta M. Lee
Hadassah Medical Organization, Israel: Yoseph Caraco, Idit Achache, Simha Blotnick, Mordechai Muszkat
Inje University, Korea: Jae-Gook Shin, Ho-Sook Kim
Instituto Nacional de Câncer, Brazil: Guilherme Suarez-Kurtz, Jamila Alessandra Perini
Instituto Nacional de Cardiologia Laranjeiras, Brazil: Edimilson Silva-Assunção
Intermountain Healthcare, USA: Jeffrey L. Anderson, Benjamin D. Horne, John F. Carlquist
Marshfield Clinic, USA: Michael D. Caldwell, Richard L. Berg, James K. Burmester
National University Hospital, Singapore: Boon Cher Goh, Soo-Chin Lee
Newcastle University, United Kingdom: Farhad Kamali, Elizabeth Sconce, Ann K. Daly
University of Alabama, USA: Nita A. Limdi
University of California, San Francisco, USA: Alan H.B. Wu
University of Florida, USA: Julie A. Johnson, Taimour Y. Langaee, Hua Feng
University of Illinois, Chicago, USA: Larisa Cavallari, Kathryn Momary
University of Liverpool, United Kingdom: Munir Pirmohamed, Andrea Jorgensen, Cheng Hok Toh, Paula Williamson
University of North Carolina, USA: Howard McLeod, James P. Evans, Karen E. Weck
University of Pennsylvania, USA: Stephen E. Kimmel, Colleen Brensinger
University of Tokyo and RIKEN Center for Genomic Medicine, Japan: Yusuke Nakamura, Taisei Mushiroda
University of Washington, USA: David Veenstra, Lisa Meckley, Mark J. Rieder, Allan E. Rettie
Uppsala University, Sweden: Mia Wadelius, Niclas Eriksson, Håkan Melhus
Vanderbilt University, USA: C. Michael Stein, Dan M. Roden, Ute Schwartz, Daniel Kurnik
Washington University in St. Louis, USA: Brian F. Gage, Elena Deych, Petra Lenzini, Charles Eby
Wellcome Trust Sanger Institute, United Kingdom: Leslie Y. Chen, Panos Deloukas

Statistical Analysis:

University of Alabama, USA: Nita A. Limdi
Marshfield Clinic, USA: Michael D. Caldwell
North Carolina State University, USA: Alison Motsinger-Reif
Stanford University, USA: Russ B. Altman, Hersh Sagrieya, Teri E. Klein, Balaji S. Srinivasan
Uppsala University, Uppsala Clinical Research Center, Sweden: Niclas Eriksson
University of California, San Francisco, USA: Alan H.B. Wu
University of North Carolina, USA: Michael J. Wagner
University of Florida, USA: Julie A. Johnson
University of Pennsylvania, USA: Stephen E. Kimmel
University of Wisconsin-Madison, USA: David Page, Eric Lantz, Tim Chang
Vanderbilt University, USA: Marylyn Ritchie
Washington University in St. Louis, USA: Brian F. Gage, Elena Deych

Genotyping QC of IWPC Samples:

Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Liang-Suei Lu

Genotype and Phenotype QC:

Inje University, Korea: Jae-Gook Shin
Marshfield Clinic, USA: Michael D. Caldwell
Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
University of Alabama, USA: Nita A. Limdi
University of Florida, USA: Julie A. Johnson
University of Pennsylvania, USA: Stephen E. Kimmel
University of North Carolina, USA: Michael J. Wagner
University of Wisconsin-Madison, USA: David Page
Washington University in St. Louis, USA: Brian F. Gage
Vanderbilt University, USA: Marylyn Ritchie

Data Curation:

Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
University of North Carolina, USA: Michael J. Wagner
Washington University in St. Louis, USA: Elena Deych

Page 32: Electronic Medical Record

Application: Mammography

Provide decision support for radiologists
• Variability due to differences in training and experience… to get 90% of cancers, have high false positive rate
• Experts have higher cancer detection and fewer benign biopsies
• Shortage of experts

Page 33: Electronic Medical Record

Bayes Net for Mammography

Kahn, Roberts, Wang, Jenks, Haddawy (1995)
Kahn, Roberts, Shaffer, Haddawy (1997)
Burnside, Rubin, Shachter (2000)

Note: not CAD (computer-assisted diagnosis), which circles abnormalities in an image… this is based on data entered into National Mammography Database schema by radiologists

Page 34: Electronic Medical Record

[Figure: Bayesian network for mammography. Node labels:]
Benign v. Malignant
Age, HRT, FHx, Breast Density
Mass Shape, Mass Margins, Mass Density, Mass Size, Mass Stability, Mass P/A/O
Ca++ Lucent Centered, Ca++ Amorphous, Ca++ Dystrophic, Ca++ Round, Ca++ Punctate, Ca++ Dermal, Ca++ Popcorn, Ca++ Fine/Linear, Ca++ Eggshell, Ca++ Pleomorphic, Ca++ Rod-like, Milk of Calcium
Tubular Density, Skin Lesion, Architectural Distortion, LN, Asymmetric Density

Page 35: Electronic Medical Record

Mammography Database

Patient  Abnormality  Date  Calcification  …  Mass  Loc  Benign/
                            Fine/Linear       Size       Malignant
P1       1            5/02  No             …  0.03  RU4  B
P1       2            5/04  Yes            …  0.05  RU4  M
P1       3            5/04  No             …  0.04  LL3  B
P2       4            6/00  No             …  0.02  RL2  B
…        …            …     …              …  …     …    …

Page 36: Electronic Medical Record

Level 1: Parameters

Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size

P(Benign) = .99

P(Yes| Benign) = .01
P(Yes| Malignant) = .55

P(size > 5| Benign) = .33
P(size > 5| Malignant) = .42

Page 37: Electronic Medical Record

Level 2: Structure + Parameters

Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size

P(Benign) = .99

P(Yes| Benign) = .01
P(Yes| Malignant) = .55
P(Yes) = .02

P(size > 5| Benign) = .33
P(size > 5| Malignant) = .42
P(size > 5) = .1

P(size > 5| Benign ^ Yes) = .4
P(size > 5| Malignant ^ Yes) = .6
P(size > 5| Benign ^ No) = .05
P(size > 5| Malignant ^ No) = .2
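To see how a structural choice changes a prediction, the sketch below computes P(Malignant | evidence) from the probabilities on this slide under two readings: one where Mass Size depends only on the class (a naive Bayes structure), and one where Mass Size also depends on Calc Fine Linear. Interpreting the slide's alternative tables this way is an assumption, as are all variable and function names.

```python
PRIOR = {"benign": 0.99, "malignant": 0.01}
P_CALC_YES = {"benign": 0.01, "malignant": 0.55}

# Structure A: size depends on the class only.
P_SIZE_GT5 = {"benign": 0.33, "malignant": 0.42}

# Structure B: size depends on the class and on the calcification finding.
P_SIZE_GT5_GIVEN_CALC = {
    ("benign", True): 0.4, ("malignant", True): 0.6,
    ("benign", False): 0.05, ("malignant", False): 0.2,
}


def posterior_malignant(calc_yes, size_gt5, size_depends_on_calc):
    """Bayes rule over the two classes for one abnormality."""
    joint = {}
    for cls, prior in PRIOR.items():
        p_calc = P_CALC_YES[cls] if calc_yes else 1 - P_CALC_YES[cls]
        if size_depends_on_calc:
            p_big = P_SIZE_GT5_GIVEN_CALC[(cls, calc_yes)]
        else:
            p_big = P_SIZE_GT5[cls]
        p_size = p_big if size_gt5 else 1 - p_big
        joint[cls] = prior * p_calc * p_size
    return joint["malignant"] / sum(joint.values())


naive = posterior_malignant(True, True, size_depends_on_calc=False)
structured = posterior_malignant(True, True, size_depends_on_calc=True)
print(round(naive, 3), round(structured, 3))
```

For a fine/linear calcification with size > 5, the two structures give posteriors of about 0.414 and 0.455, so the learned structure, not only the parameters, affects the estimated risk.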

Page 38: Electronic Medical Record

Data

Structured data from actual practice

National Mammography Database
• Standard for reporting all abnormalities

Our dataset contains
• 435 malignancies
• 65,365 benign abnormalities

Link to biopsy results
• Obtain disease diagnosis – our ground truth

Page 39: Electronic Medical Record

Hypotheses

Learn relationships that are useful to radiologists

Improve by moving up learning hierarchy

Page 40: Electronic Medical Record

Results (Radiology, 2009)

Trained (Level 2, TAN) Bayesian network model achieved an AUC of 0.966, which was significantly better than the radiologists’ AUC of 0.940 (P = 0.005)

Trained BN demonstrated significantly better sensitivity than the radiologist (89.5% vs. 82.3%; P = 0.009) at a specificity of 90%

Trained BN demonstrated significantly better specificity than the radiologist (93.4% versus 86.5%; P = 0.007) at a sensitivity of 85%

Page 41: Electronic Medical Record

ROC: Level 2 (TAN) vs. Level 1

Page 42: Electronic Medical Record

Precision-Recall Curves

Page 43: Electronic Medical Record

Mammography Database

Patient  Abnormality  Date  Calcification  …  Mass  Loc  Benign/
                            Fine/Linear       Size       Malignant
P1       1            5/02  No             …  0.03  RU4  B
P1       2            5/04  Yes            …  0.05  RU4  M
P1       3            5/04  No             …  0.04  LL3  B
P2       4            6/00  No             …  0.02  RL2  B
…        …            …     …              …  …     …    …

Page 44: Electronic Medical Record

Statistical Relational Learning

Learn probabilistic model, but don’t assume iid data: there may be relevant data in other rows or even other tables

Database schema: defines set of features

Page 45: Electronic Medical Record

SRL Aggregates Information from Related Rows or Tables

Extend probabilistic models to relational databases
• Probabilistic Relational Models (Friedman et al. 1999, Getoor et al. 2001)
• Tricky issue: one-to-many relationships
• Approach: use aggregation

PRMs cannot capture all relevant concepts

Page 46: Electronic Medical Record

Aggregate Illustration

Patient  Abnormality  Date  Calcification  …  Mass  Loc  Benign/
                            Fine/Linear       Size       Malignant
P1       1            5/02  No             …  0.03  RU4  B
P1       2            5/04  Yes            …  0.05  RU4  M
P1       3            5/04  No             …  0.04  LL3  B
P2       4            6/00  No             …  0.02  RL2  B
…        …            …     …              …  …     …    …

Aggregation Function: Min, Max, Average, etc.

Page 47: Electronic Medical Record

New Schema

Patient  Abnormality  Date  Calcification  …  Mass  Avg Size   Loc  Benign/
                            Fine/Linear       Size  this date       Malignant
P1       1            5/02  No             …  0.03  0.03       RU4  B
P1       2            5/04  Yes            …  0.05  0.045      RU4  M
P1       3            5/04  No             …  0.04  0.045      LL3  B
P2       4            6/00  No             …  0.02  0.02       RL2  B
…        …            …     …              …  …     …          …    …
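The "Avg Size this date" field is an aggregate over all abnormalities that share a patient and a mammogram date. A minimal sketch of computing it follows; the dictionary keys and the function name are assumptions, and the four rows are the ones shown on the slide.

```python
from collections import defaultdict

# Rows from the slide; key names are illustrative.
rows = [
    {"patient": "P1", "abnormality": 1, "date": "5/02", "mass_size": 0.03},
    {"patient": "P1", "abnormality": 2, "date": "5/04", "mass_size": 0.05},
    {"patient": "P1", "abnormality": 3, "date": "5/04", "mass_size": 0.04},
    {"patient": "P2", "abnormality": 4, "date": "6/00", "mass_size": 0.02},
]


def add_avg_size_this_date(table):
    """Return copies of the rows extended with the per-patient, per-date mean size."""
    sizes = defaultdict(list)
    for row in table:
        sizes[(row["patient"], row["date"])].append(row["mass_size"])
    extended = []
    for row in table:
        group = sizes[(row["patient"], row["date"])]
        extended.append(dict(row, avg_size_this_date=sum(group) / len(group)))
    return extended


for row in add_avg_size_this_date(rows):
    print(row["patient"], row["abnormality"], round(row["avg_size_this_date"], 3))
```

Swapping the mean for `min` or `max` over the same groups gives the other aggregation functions the deck mentions.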

Page 48: Electronic Medical Record

Level 3: Aggregates

Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size, Avg Size this date

Note: Learn parameters for each node

Page 49: Electronic Medical Record

Database Notion of View

New tables or fields defined in terms of existing tables and fields are known as views

A view corresponds to an alteration in the database schema

Goal: automate the learning of views

Page 50: Electronic Medical Record

Possible View

Patient  Abnormality  Date  Calcification  …  Mass  Loc  Benign/
                            Fine/Linear       Size       Malignant
P1       1            5/02  No             …  0.03  RU4  B
P1       2            5/04  Yes            …  0.05  RU4  M
P1       3            5/04  No             …  0.04  LL3  B
P2       4            6/00  No             …  0.02  RL2  B
…        …            …     …              …  …     …    …

Page 51: Electronic Medical Record

New Schema

Patient  Abnormality  Date  Calcification  …  Mass  Increase  Loc  Benign/
                            Fine/Linear       Size  in size        Malignant
P1       1            5/02  No             …  0.03  No        RU4  B
P1       2            5/04  Yes            …  0.05  Yes       RU4  M
P1       3            5/04  No             …  0.04  No        LL3  B
P2       4            6/00  No             …  0.02  No        RL2  B
…        …            …     …              …  …     …         …    …
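One way to read the "Increase in size" field is: the same patient had an abnormality at the same location on an earlier date with a smaller mass size. That definition is an assumption chosen because it reproduces the values on the slide; the key names, the helper `parse_month_year`, and the treatment of dates as month/two-digit-year pairs are also hypothetical.

```python
# Rows from the slide; key names are illustrative.
rows = [
    {"patient": "P1", "abnormality": 1, "date": "5/02", "mass_size": 0.03, "loc": "RU4"},
    {"patient": "P1", "abnormality": 2, "date": "5/04", "mass_size": 0.05, "loc": "RU4"},
    {"patient": "P1", "abnormality": 3, "date": "5/04", "mass_size": 0.04, "loc": "LL3"},
    {"patient": "P2", "abnormality": 4, "date": "6/00", "mass_size": 0.02, "loc": "RL2"},
]


def parse_month_year(text):
    """Turn 'M/YY' into a sortable (year, month) pair (same century assumed)."""
    month, year = text.split("/")
    return int(year), int(month)


def add_increase_in_size(table):
    """Extend each row with a Yes/No field defined over other rows of the table."""
    extended = []
    for row in table:
        when = parse_month_year(row["date"])
        earlier = [
            other for other in table
            if other["patient"] == row["patient"]
            and other["loc"] == row["loc"]
            and parse_month_year(other["date"]) < when
        ]
        grew = any(other["mass_size"] < row["mass_size"] for other in earlier)
        extended.append(dict(row, increase_in_size="Yes" if grew else "No"))
    return extended


print([row["increase_in_size"] for row in add_increase_in_size(rows)])
```

Unlike an aggregate, this field compares a row with specific related rows, which is the kind of relational feature that motivates view learning.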

Page 52: Electronic Medical Record

Level 4: View Learning

Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size, Avg Size this date, Increase in Size

Note: Include aggregate features. Learn parameters for each node

Page 53: Electronic Medical Record

Level 4: View Learning

Learn rules predictive of “malignant”
  We used Aleph (Srinivasan)

Treat each rule as a new field
  1 if abnormality matches rule
  0 otherwise

New view consists of original table extended with new fields
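
The "rule as a new field" step can be sketched as follows. The two rules below are made-up stand-ins written as Python predicates, not rules learned by Aleph, and the function and field names are hypothetical. Each rule contributes one 0/1 column to the extended table.

```python
# Original table (a subset of the fields from the slides).
TABLE = [
    {"abn": 1, "calc_fine_linear": "No", "mass_size": 0.03, "label": "B"},
    {"abn": 2, "calc_fine_linear": "Yes", "mass_size": 0.05, "label": "M"},
    {"abn": 3, "calc_fine_linear": "No", "mass_size": 0.04, "label": "B"},
    {"abn": 4, "calc_fine_linear": "No", "mass_size": 0.02, "label": "B"},
]

# Hypothetical rules standing in for learned clauses.
RULES = {
    "rule_1": lambda row: row["calc_fine_linear"] == "Yes",
    "rule_2": lambda row: row["mass_size"] >= 0.04,
}


def extend_with_rule_fields(table, rules):
    """Return a new table: each original row plus one 0/1 field per rule."""
    extended = []
    for row in table:
        new_row = dict(row)  # copy so the original table is left untouched
        for name, rule in rules.items():
            new_row[name] = 1 if rule(row) else 0
        extended.append(new_row)
    return extended


view = extend_with_rule_fields(TABLE, RULES)
rule_columns = [(r["rule_1"], r["rule_2"]) for r in view]
print(rule_columns)
```

The extended rows can then be passed to any propositional learner, which is what allows the Bayesian network to use relational rules as ordinary features.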

Page 54: Electronic Medical Record

Experimental Methodology

10-fold cross validation
  Split at the patient level
  Roughly 40 malignant cases and 6000 benign cases in each fold

Tree Augmented Naïve Bayes (TAN) as structure learner (Friedman, Geiger & Goldszmidt ’97)
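
Splitting at the patient level means that all abnormalities from one patient land in the same fold, so no patient contributes to both training and test data. A minimal sketch using only the standard library is shown below; the function name, the seed, and the toy patient IDs are hypothetical, and class balance across folds is not handled.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, k=10, seed=0):
    """Assign record indices to k folds so each patient stays in one fold."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Deal patients round-robin into folds after shuffling.
    fold_of_patient = {p: i % k for i, p in enumerate(unique_patients)}
    folds = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        folds[fold_of_patient[patient]].append(index)
    return [folds[i] for i in range(k)]


# Toy data: 20 patients with 3 abnormality records each.
patient_ids = ["P%d" % (i // 3) for i in range(60)]
folds = patient_level_folds(patient_ids, k=10)

for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(patient_ids)) if i not in held_out]
    # A model would be trained on `train` and evaluated on `test_fold` here.
```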

Page 55: Electronic Medical Record

Sample View [Burnside et al. AMIA05]

malignant(A) :-
    birads_category(A,b5),
    massPAO(A,present),
    massesDensity(A,high),
    ho_breastCA(A,hxDCorLC),
    in_same_mammogram(A,B),
    calc_pleomorphic(B,notPresent),
    calc_punctate(B,notPresent).

Page 56: Electronic Medical Record

All Levels of Learning

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision, both from 0 to 1) for Level 4 (View), Level 3 (Aggregate), Level 2 (Structure), and Level 1 (Parameter)]

Page 57: Electronic Medical Record

View Learning: First Approach

[Davis et al. IA05, Davis et al. IJCAI05]

Step 1: Learn (Rule Learner produces Rule 1, Rule 2, …, Rule N for the Target Predicate)

Step 2: Select

Step 3: Build Model

Page 58: Electronic Medical Record

Drawback to First Approach

Mismatch between
  Rule building
  Model’s use of rules

Should Score As You Use (SAYU)

Page 59: Electronic Medical Record

SAYU [Davis et al. ECML05]

Build network as we learn rules [Landwehr et al. AAAI 2005]

Score rule on whether it improves network

Results in tight coupling between rule generation, selection and usage
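
The control flow of scoring a rule by whether it improves the model can be sketched as a greedy loop. This is only an illustration of the idea, not the published algorithm: the stand-in model below predicts positive when any kept rule fires and is scored by accuracy, whereas SAYU builds a Bayesian network and scores it on a tuning set. All rules, examples, and function names are hypothetical.

```python
def accuracy_of_disjunction(rules, examples):
    """Stand-in model: predict positive if any kept rule fires."""
    correct = 0
    for features, label in examples:
        predicted = 1 if any(rule(features) for rule in rules) else 0
        correct += int(predicted == label)
    return correct / len(examples)


def score_as_you_use(candidates, examples, score_model, min_gain=0.0):
    """Keep a candidate rule only if adding it improves the model's score."""
    kept = []
    best = score_model(kept, examples)
    for rule in candidates:
        trial = score_model(kept + [rule], examples)
        if trial - best > min_gain:
            kept.append(rule)
            best = trial
    return kept, best


def r_calc(x):
    return x["calc"] == 1


def r_big(x):
    return x["size"] >= 0.04


def r_huge(x):
    return x["size"] >= 0.055


# Toy tuning examples: (features, label).
EXAMPLES = [
    ({"size": 0.05, "calc": 1}, 1),
    ({"size": 0.04, "calc": 0}, 0),
    ({"size": 0.03, "calc": 0}, 0),
    ({"size": 0.02, "calc": 0}, 0),
    ({"size": 0.06, "calc": 0}, 1),
    ({"size": 0.01, "calc": 1}, 1),
]

kept, best = score_as_you_use([r_calc, r_big, r_huge], EXAMPLES, accuracy_of_disjunction)
print([rule.__name__ for rule in kept], best)
```

Here `r_big` is rejected because it does not improve the model once `r_calc` is in it, which is the coupling between rule selection and rule usage that the slide describes.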

Page 60: Electronic Medical Record

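SAYU-NB

[Diagram: naïve Bayes network with a Class Value node and rule nodes (Rule 1, Rule 2, Rule 3, Rule 14, …, Rule N), grown from seed 1 and seed 2, with a Score label that updates as rules are tried; values as extracted: 0.0, 20.12, 0.10, 0.15, 0.35]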

Page 61: Electronic Medical Record

SAYU-View [Davis et al. Intro to SRL 06]

[Diagram: network over the nodes Class Value, Feat 1 … Feat N, Agg 1 … Agg M, and Rule 1 … Rule L]

Page 62: Electronic Medical Record

Parameter Settings

Score using AUC-PR (recall >= .5)
Keep a rule: 2% increase in AUC
Switch seeds after adding a rule
Train set to learn network structure and parameters
Tune set to score structures
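
A rough sketch of the scoring rule follows. Several details are assumptions, since the slide does not spell them out: the area is accumulated step-wise (precision times the recall step, as in average precision) over recall steps lying entirely at or above the minimum recall, tied scores are not treated specially, and the 2% threshold is read as a relative improvement. Function names are hypothetical.

```python
def pr_area_above_recall(scores, labels, min_recall=0.5):
    """Step-wise area under the precision-recall curve for recall >= min_recall."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label != 1:
            continue
        true_pos += 1
        # The step runs from recall (true_pos - 1)/n_pos to true_pos/n_pos.
        if (true_pos - 1) >= min_recall * n_pos:
            precision = true_pos / rank
            area += precision * (1.0 / n_pos)
    return area


def keep_rule(old_auc, new_auc, min_rel_gain=0.02):
    """Assumed reading of 'Keep a rule: 2% increase in AUC' (relative gain)."""
    return new_auc > old_auc * (1.0 + min_rel_gain)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 1, 0, 0]
area = pr_area_above_recall(scores, labels)
print(round(area, 4))
```

Restricting the area to the high-recall region reflects that missing a malignancy is far more costly than a false alarm.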

Page 63: Electronic Medical Record

Relational Learning Algorithms

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision, both from 0 to 1) for SAYU-View, Initial Level 4 (View), and Level 3 (Aggregates)]

Page 64: Electronic Medical Record

Electronic Medical Record

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

Page 65: Electronic Medical Record

Cox Inhibition

Non-steroidal anti-inflammatory drug
Cox-2 goal: reduce stomach trouble

[Diagram: two pathways, Cox-1 and Cox-2. Vioxx, Bextra, and Celebrex block the Cox-2 pathway. Aspirin, Aleve, Ibuprofen, etc. block both pathways.]

Page 66: Electronic Medical Record

Cox-2 Timeline

Dec. 1998-May 1999, Celebrex, Vioxx approved

2001, Cox-2 sales top $6 billion/year in US

2002, Beginning of APPROVe Study

Sept. 2004, Vioxx voluntarily pulled from market

Dec. 2004, FDA issues warning

April 2005, FDA removes Bextra from market

Page 67: Electronic Medical Record

Predicting Adverse Reaction to Cox-2 Inhibitors

Given: A patient’s clinical history

Do: Predict whether the patient will have a myocardial infarction (MI)

Note: This is work in progress

Page 68: Electronic Medical Record

Data

492 patients who took Cox-2, MI
77077 patients who took Cox-2, no MI
  Sub-sampled 651 patients

Relational tables for
  Lab tests
  Drugs taken
  Diagnoses
  Observations

Page 69: Electronic Medical Record

Q: What Data to Use?

All data for a patient? Many perfect predictors

Cut off data right before MI
  Model not relevant pre-Cox2ib
  Uniformly more data for non-MI cases

Our choice: cut off data for each patient at first Cox2ib prescription
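
The chosen cut-off can be sketched as a simple filter over a patient's event history. The event layout, field names, and sample events are hypothetical, and keeping only events strictly before the first prescription date (rather than on or before it) is an assumption. The drug names come from the Cox Inhibition slide.

```python
from datetime import date

COX2_DRUGS = {"vioxx", "bextra", "celebrex"}

# Hypothetical event log mixing diagnoses, lab tests, and prescriptions.
EVENTS = [
    {"patient": "P1", "kind": "diagnosis", "name": "hypoglycemic", "date": date(2001, 1, 1)},
    {"patient": "P1", "kind": "drug", "name": "celebrex", "date": date(2002, 3, 1)},
    {"patient": "P1", "kind": "lab", "name": "blood glucose", "date": date(2002, 6, 1)},
    {"patient": "P1", "kind": "diagnosis", "name": "MI", "date": date(2003, 1, 1)},
    {"patient": "P2", "kind": "lab", "name": "blood glucose", "date": date(1999, 5, 5)},
    {"patient": "P2", "kind": "drug", "name": "vioxx", "date": date(2000, 1, 10)},
    {"patient": "P2", "kind": "drug", "name": "vioxx", "date": date(2001, 1, 10)},
]


def first_cox2_dates(events, cox2_drugs):
    """Map each patient to the date of their first Cox-2 inhibitor prescription."""
    first = {}
    for event in events:
        if event["kind"] == "drug" and event["name"] in cox2_drugs:
            pid = event["patient"]
            if pid not in first or event["date"] < first[pid]:
                first[pid] = event["date"]
    return first


def censor_at_first_cox2(events, cox2_drugs):
    """Keep only events strictly before each patient's first Cox-2 prescription."""
    first = first_cox2_dates(events, cox2_drugs)
    return [
        event
        for event in events
        if event["patient"] in first and event["date"] < first[event["patient"]]
    ]


usable = censor_at_first_cox2(EVENTS, COX2_DRUGS)
print([(e["patient"], e["name"]) for e in usable])
```

Cutting at the prescription date removes the near-perfect predictors recorded around the MI itself and gives MI and non-MI patients the same kind of history.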

Page 70: Electronic Medical Record

Approaches Tried

Propositional: Linear SVM, naïve Bayes, TAN, trees, boosted trees, boosted rules

Relational: Inductive Logic Programming (ILP) system Aleph

SRL: View learning with SAYU

Page 71: Electronic Medical Record

Experimental Methodology

10-fold cross validation

Feature selection: pick top 50 per fold

ROC curves to evaluate

Paired t-test for significance
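
For the significance test, the paired t statistic over per-fold results can be computed with the standard library alone. The sketch below assumes the two input lists hold one score per fold (for example AUC-ROC) for two algorithms evaluated on the same folds; the function name and the tiny worked input are hypothetical. With 10 folds there are 9 degrees of freedom, for which the two-sided 5% critical value of Student's t is 2.262.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (10 folds).
T_CRIT_DF9 = 2.262


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples, e.g. per-fold scores of two algorithms."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Tiny worked input: differences are 1, 2, 3 (mean 2, standard deviation 1).
t_small = paired_t_statistic([2, 4, 6], [1, 2, 3])
print(round(t_small, 4))
```

With 10 folds, a difference would be called significant at the 5% level when the absolute value of the statistic exceeds `T_CRIT_DF9`.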

Page 72: Electronic Medical Record

Algorithms Compared

Naïve Bayes

Boosted rules (C5)

SAYU-TAN (w/initial feature set)

Callout: Best feature vector approaches

Note: Preliminary results with Aleph were poor/slow

Page 73: Electronic Medical Record

Algorithm Comparison

Page 74: Electronic Medical Record

ROC Area

[Figure: bar chart of average AUC-ROC (y-axis from 0.0 to 0.9) for Boosted Rules, Naive Bayes, and SAYU-TAN]

Page 75: Electronic Medical Record

Sample Rule

myocardial_infarction(A) :-
    hasdrug(A, GLUCOSE),
    diagnosis(A, ischemic heart disease).

Page 76: Electronic Medical Record

Sample Rule

myocardial_infarction(A) :-
    diagnosis(A,B, INFECTIOUS AND PARASITIC DISEASES),
    before(B,10/26/1982),
    age(A,B,C),
    younger(C, 51).

Page 77: Electronic Medical Record

Lingering Questions

Are we predicting predisposition to MI?

Can we do better with data we have?

How much will genotype data help?

Page 78: Electronic Medical Record

Conclusions

EMRs and genotyping give machine learning a new opportunity for great impact on healthcare in the next few years
  Personalized medicine
  Pharmacovigilance (FDA’s Sentinel, OMOP)
  Decision support

Statistical relational learning helps for some tasks (but not all)

Page 79: Electronic Medical Record

Conclusions (Continued)

Fancy new algorithms are not always the best… healthcare applications raise other issues
  Missing data (not missing at random)
  Need simple, comprehensible models… clinicians may prefer a slightly less accurate model if it makes more sense to them
  Different evaluation metrics

Page 80: Electronic Medical Record

Thanks

Jesse Davis
Beth Burnside
Vitor Santos Costa
Michael Caldwell
Peggy Peissig
Eric Lantz
Jude Shavlik
IWPC
WGI (Wisconsin Genomics Initiative)

