Date post: 01-Aug-2020
Integrating R efficiently to allow secure, interactive analysis within a clinical data warehouse

Daniel W. Connolly, Bhargav Adagarla, John Keighley, Lemuel R. Waitman
University of Kansas Medical Center


Interactive R statistical visualization in HERON Clinical Data Repository

Please don't laugh if our R plots are crude and ugly. ;-) We're new to R and here to learn.


Overview
● R in HERON/I2B2:
  ○ What it looks like
  ○ Motivation: research support goals
● Background
  ○ I2B2
  ○ R Engine Cell
● Toward a general architecture for I2B2+R
  ○ Efficiency/Scalability
  ○ Separation of Concerns, Security


HERON Research Support Goals

Clinical Data Repository supports:
● Cohort Discovery
  ○ prospective trials: feasibility
  ○ retrospective studies: data use
● Hypothesis Generation
  ○ explore data
  ○ summarize
  ○ visualize

Waitman LR, Warren JJ, Manos EL, Connolly DW. Expressing Observations from Electronic Medical Record Flowsheets in an i2b2 based Clinical Data Repository to Support Research and Quality Improvement. AMIA Annu Symp Proc. 2011;2011:1454-63.

photo credit: Christopher Harshaw

informatics.kumc.edu

HERON: Healthcare Enterprise Repository for Ontological Narration


HERON System Architecture
● Data from the Epic Clarity database (> 7,000 tables and 60,000 columns)
● Transformed into an I2B2-compatible schema, then de-identified and loaded on a separate database server to be accessed by I2B2
● De-identified data used by I2B2 is deemed non-human subjects research by our institutional review board

patient privacy, institutional liability

python, SQL
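The transform-and-load step can be illustrated with a minimal Python sketch. It assumes a hypothetical, much-simplified source row (`SourceRow`) and a three-column subset of the i2b2 `observation_fact` table. The slide does not describe HERON's actual de-identification rules, so the only rule shown is dropping direct identifiers and replacing the medical record number with a surrogate `patient_num`; dates are passed through unchanged.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass(frozen=True)
class SourceRow:
    """Hypothetical EMR extract row (real Clarity tables are far wider)."""
    mrn: str            # medical record number: a direct identifier
    patient_name: str   # direct identifier
    concept_cd: str     # coded observation, e.g. a diagnosis code
    start_date: str     # ISO date of the observation


@dataclass(frozen=True)
class ObservationFact:
    """Three-column subset of i2b2 observation_fact."""
    patient_num: int
    concept_cd: str
    start_date: str


class Deidentifier:
    """Maps each MRN to a stable surrogate patient_num and drops identifiers."""

    def __init__(self) -> None:
        self._counter = count(1)
        self._surrogates: Dict[str, int] = {}

    def patient_num(self, mrn: str) -> int:
        # The same MRN always receives the same surrogate number.
        if mrn not in self._surrogates:
            self._surrogates[mrn] = next(self._counter)
        return self._surrogates[mrn]

    def transform(self, row: SourceRow) -> ObservationFact:
        # Only the surrogate key and the clinical content are carried over.
        return ObservationFact(self.patient_num(row.mrn), row.concept_cd, row.start_date)


if __name__ == "__main__":
    deid = Deidentifier()
    rows = [
        SourceRow("MRN-001", "Ann", "ICD9:174.9", "2009-03-02"),
        SourceRow("MRN-002", "Bob", "ICD9:250.00", "2010-07-15"),
        SourceRow("MRN-001", "Ann", "ICD9:250.00", "2011-01-20"),
    ]
    for fact in map(deid.transform, rows):
        print(fact)
```

In a real deployment the MRN-to-surrogate table would presumably have to stay on the identified side, since only de-identified data is loaded on the separate I2B2 server.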


Exploring Breast Cancer Comorbidities: Obesity, Diabetes

HERON brings together diabetes diagnoses and BMI from the hospital EMR with cancer staging from the tumor registry and vital status from the U.S. SSA death index.


i2b2 Query Tool: Counts, Analysis

Murphy SN, Weber G, Mendis M, Chueh HC, Churchill S, Glaser JP, Kohane IS. Serving the Enterprise and beyond with Informatics for Integrating Biology and the Bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124-30.
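The i2b2 Query Tool combines the concepts dropped into one panel with OR and combines panels with AND before reporting a patient count. A minimal Python sketch of that rule follows. The facts are hypothetical `(patient_num, concept_path)` pairs, the slash-separated paths are illustrative rather than real i2b2 ontology paths, and panel exclusion, date constraints, and count obfuscation are left out.

```python
from typing import Iterable, List, Optional, Set, Tuple

Fact = Tuple[int, str]  # (patient_num, concept_path)


def patients_matching(facts: List[Fact], path_prefix: str) -> Set[int]:
    """Patients with any fact at or below a concept path (paths are hierarchical)."""
    return {patient for patient, path in facts if path.startswith(path_prefix)}


def count_patients(facts: Iterable[Fact], panels: List[List[str]]) -> int:
    """OR the items inside each panel, then AND the panels together."""
    facts = list(facts)
    result: Optional[Set[int]] = None
    for panel in panels:
        in_panel: Set[int] = set()
        for item in panel:
            in_panel |= patients_matching(facts, item)  # OR within a panel
        # AND across panels: keep only patients matched by every panel so far.
        result = in_panel if result is None else result & in_panel
    return len(result) if result is not None else 0


if __name__ == "__main__":
    facts = [
        (1, "/Diagnoses/Neoplasms/Breast"), (1, "/Diagnoses/Diabetes/Type 2"),
        (2, "/Diagnoses/Neoplasms/Breast"),
        (3, "/Diagnoses/Diabetes/Type 1"), (3, "/Vitals/BMI/Obese"),
    ]
    # Breast cancer AND (diabetes OR obesity)
    print(count_patients(facts, [["/Diagnoses/Neoplasms/Breast"],
                                 ["/Diagnoses/Diabetes", "/Vitals/BMI/Obese"]]))  # prints 1
```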


i2b2 Patient Data Query

source: Murphy et al. AMIA 2010
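A patient data query pairs a saved patient set with the facts recorded for those patients. The sketch below shows that join with Python's built-in `sqlite3`. `observation_fact` and `qt_patient_set_collection` are i2b2 table names, but the column lists here are cut down to a few columns and the sample rows are invented for illustration.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Heavily simplified versions of two i2b2 CRC tables.
SCHEMA = """
create table observation_fact (patient_num integer, concept_cd text, start_date text);
create table qt_patient_set_collection (result_instance_id integer, patient_num integer);
"""


def patient_set_observations(conn: sqlite3.Connection, result_instance_id: int,
                             concept_codes: Sequence[str]) -> List[Tuple]:
    """Facts with the given concept codes for patients in one saved patient set."""
    placeholders = ",".join("?" * len(concept_codes))
    sql = f"""
        select f.patient_num, f.concept_cd, f.start_date
        from observation_fact f
        join qt_patient_set_collection pset on pset.patient_num = f.patient_num
        where pset.result_instance_id = ?
          and f.concept_cd in ({placeholders})
        order by f.patient_num, f.start_date
    """
    # Values are bound as parameters, never pasted into the SQL text.
    return conn.execute(sql, [result_instance_id, *concept_codes]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into observation_fact values (?, ?, ?)", [
        (1, "DX:BREAST", "2005-01-10"), (1, "DEATH", "2009-06-01"),
        (2, "DX:BREAST", "2007-03-05"), (3, "DX:BREAST", "2008-11-20"),
    ])
    conn.executemany("insert into qt_patient_set_collection values (?, ?)", [(7, 1), (7, 2)])
    for row in patient_set_observations(conn, 7, ["DX:BREAST", "DEATH"]):
        print(row)
```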


Segagni D, Ferrazzi F, Larizza C, Tibollo V, Napolitano C, Priori SG, Bellazzi R. R engine cell: integrating R into the i2b2 software infrastructure. J Am Med Inform Assoc. 2011 May 1;18(3):314-7. Epub 2011 Jan 24.

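R Engine Cell

[Figure: R Engine Cell architecture, showing the Kaplan Meier web client plug-in; the I2B2 HIVE with the CRC cell and the I2B2 DW; the RE Cell containing the Kaplan Meier jar application, JRI libraries, and R statistical software; and data-flow steps 1 to 5.]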

1. The web client plug-in requests patient, visit, and observation data from the clinical research chart (CRC) cell.


2. The CRC cell sends back to the plug-in an XML response containing the requested data (extracted from the i2b2 data warehouse).


3. The web client plug-in sends the data to the RE Cell through dynamically created XML messages.


4. The RE Cell parses the XML to create the dataset for the analysis and runs the Kaplan-Meier jar application, which uses the R statistical software through the JRI libraries.


5. The RE Cell returns to the web client plug-in the URL where the results have been saved, and the plug-in shows the survival analysis HTML report and related graphics.
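Step 4 turns the XML message into a `time`, `status`, `gender` dataset, the same three columns that `KMAnalysis.java` passes to `data.frame`. The RE Cell's actual message schema is not shown on these slides, so the `<patient>` element and its attributes below are a hypothetical stand-in; the sketch only illustrates the parse-then-tabulate step with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical message format: one element per patient with follow-up time
# (years), event status (1 = died, 0 = censored), and the stratification variable.
SAMPLE_MESSAGE = """
<dataset>
  <patient time="4.5" status="1" gender="F"/>
  <patient time="2.0" status="0" gender="M"/>
  <patient time="7.25" status="1" gender="M"/>
</dataset>
"""


def parse_km_dataset(xml_text: str) -> List[Tuple[float, int, str]]:
    """Return (time, status, gender) rows parsed from the message."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for patient in root.iter("patient"):
        status = int(patient.attrib["status"])
        if status not in (0, 1):
            raise ValueError(f"status must be 0 or 1, got {status}")
        rows.append((float(patient.attrib["time"]), status, patient.attrib["gender"]))
    return rows


if __name__ == "__main__":
    for row in parse_km_dataset(SAMPLE_MESSAGE):
        print(row)
```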


Integrating the R Engine Cell with HERON for Cancer Research

Issues:
● Clinical Domain
  ○ cardio vs. cancer
  ○ start at birth vs. start at diagnosis
  ○ stratification: gender vs. stage
● Version Skew
  ○ RE Cell: I2B2 version 1.4
  ○ HERON: I2B2 version 1.6
● Architecture...

photo credit: Christopher Harshaw
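The "start at birth vs. start at diagnosis" issue is a choice of time origin: the same patient record yields a different survival time depending on where the clock starts. A small Python sketch, assuming hypothetical inputs (birth date, diagnosis date, optional death date, last follow-up date) and measuring time in years of 365.25 days:

```python
from datetime import date
from typing import Optional, Tuple


def survival_time(birth: date, diagnosis: date, death: Optional[date],
                  last_follow_up: date, origin: str = "diagnosis") -> Tuple[float, int]:
    """Return (time in years, status), where status 1 = death observed, 0 = censored."""
    if origin == "diagnosis":
        start = diagnosis   # e.g. time since cancer diagnosis
    elif origin == "birth":
        start = birth       # e.g. age at event
    else:
        raise ValueError(f"unknown origin: {origin!r}")
    if death is not None:
        end, status = death, 1
    else:
        end, status = last_follow_up, 0   # alive at last contact: censored
    return (end - start).days / 365.25, status


if __name__ == "__main__":
    args = (date(1950, 1, 1), date(2000, 1, 1), date(2002, 1, 1), date(2003, 1, 1))
    print(survival_time(*args, origin="diagnosis"))
    print(survival_time(*args, origin="birth"))
```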


Toward a General Architecture for R in I2B2

[Figure: the Kaplan Meier web client plug-in, apache, and rgate running km_analysis.R through the rpy libraries and R statistical software, with the I2B2 DW and the PM cell in the I2B2 HIVE; data-flow steps 1 to 5. Further abc and xyz web client plug-ins are paired with abc_analysis.R and xyz_analysis.R.]

biostatistics, R

patient privacy, institutional liability

cancer prevention, treatment

python, SQL, HTML, JavaScript
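One way to read this architecture is that each web client plug-in names an analysis and rgate maps that name to an R script maintained by the statisticians. A minimal Python sketch of such a registry; the analysis names and the `script_for` function are hypothetical, since the slide does not show how rgate selects scripts. Keeping the mapping explicit means a request can select only a registered script, never an arbitrary file path.

```python
from pathlib import Path
from typing import Mapping

# Hypothetical registry: plug-in analysis name -> R script file.
ANALYSES: Mapping[str, str] = {
    "km": "km_analysis.R",
    "abc": "abc_analysis.R",
    "xyz": "xyz_analysis.R",
}


def script_for(analysis: str, script_dir: str,
               registry: Mapping[str, str] = ANALYSES) -> Path:
    """Resolve a requested analysis name to its registered R script."""
    try:
        filename = registry[analysis]
    except KeyError:
        # Unknown names are rejected rather than treated as file names.
        raise ValueError(f"unknown analysis: {analysis!r}") from None
    return Path(script_dir) / filename


if __name__ == "__main__":
    print(script_for("km", "/srv/rgate"))
```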


Efficiency, Scalability: R Engine Cell Data Path

In step 2, the CRC cell sends back to the plug-in an XML response containing the requested data (extracted from the i2b2 data warehouse).

725,000,000 facts, including 60,000 cancer cases
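With hundreds of millions of facts, relaying every requested row through the browser as one XML document is a likely bottleneck. When the analysis side talks to the database directly, it can instead pull rows from a cursor in bounded batches. The sketch below shows that pattern with the Python DB-API `fetchmany` method, using SQLite and made-up rows as a stand-in for Oracle; it is not rgate's code, which the slides show calling ROracle's `dbGetQuery`.

```python
import sqlite3
from typing import Iterator, Sequence, Tuple


def stream_rows(conn: sqlite3.Connection, sql: str, params: Sequence = (),
                batch_size: int = 10_000) -> Iterator[Tuple]:
    """Yield query results batch by batch, so memory use is bounded by batch_size."""
    cursor = conn.execute(sql, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table observation_fact (patient_num integer, concept_cd text)")
    conn.executemany("insert into observation_fact values (?, ?)",
                     [(i % 100, "DX:BREAST") for i in range(2500)])
    patients = {row[0] for row in stream_rows(
        conn, "select patient_num from observation_fact where concept_cd = ?",
        ("DX:BREAST",), batch_size=1000)}
    print(len(patients))
```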


Efficiency, Scalability: rgate connects R to Oracle directly

[Figure: the Kaplan Meier web client plug-in, apache, and rgate running km_analysis.R through the rpy libraries and R statistical software, with the I2B2 DW and the PM cell in the I2B2 HIVE; data-flow steps 1 to 5.]

Like the CRC cell, rgate calls the PM cell to validate authorization.
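The authorization step can be sketched as a guard that runs before any query. In the Python sketch below, `validate_session` stands in for the call to the PM cell and `run_analysis` for the R analysis; both are hypothetical callables injected by the caller, and the role names in the usage are placeholders, because the slides do not show rgate's request handling or the PM cell message format.

```python
from typing import Callable, Optional, Set


class NotAuthorized(Exception):
    """Raised when the PM cell does not vouch for the request."""


def handle_request(session_token: str,
                   validate_session: Callable[[str], Optional[Set[str]]],
                   required_role: str,
                   run_analysis: Callable[[], str]) -> str:
    """Run the analysis only if the PM cell grants the required role."""
    roles = validate_session(session_token)   # hypothetical PM cell lookup
    if roles is None:
        raise NotAuthorized("session rejected by PM cell")
    if required_role not in roles:
        raise NotAuthorized(f"missing role {required_role!r}")
    return run_analysis()


if __name__ == "__main__":
    fake_pm_cell = {"token-1": {"USER", "DATA_DEID"}}
    print(handle_request("token-1", fake_pm_cell.get, "DATA_DEID", lambda: "plot.png"))
```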


R Engine Cell approach to R Integration: Kaplan Meier jar application

R code generation in KMAnalysis.java:

...
Integer[] statusInteger = (Integer[])status.toArray(new Integer[status.size()]);
StringBuffer statusStr = new StringBuffer();
statusStr.append("status<-c(");
for(int i=0;i<statusInteger.length;i++){
    statusStr.append(statusInteger[i].intValue());
    if(i!=(statusInteger.length-1))
        statusStr.append(",");
}
statusStr.append(")");
...
re.eval("data=data.frame(time,status,gender)");
re.eval("names(data)=c('time','status','gender')");
re.eval("setwd(\""+resultFolder+"\")");
re.eval("library(survival)");
re.eval("fit <- survfit(Surv(data$time, data$status) ~ gender, data)");

python, SQL, HTML, JavaScript
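Generating R source text this way ties the Java and R code tightly together, and it is fragile with strings: `re.eval("setwd(\""+resultFolder+"\")")` yields broken or different R code if `resultFolder` contains a double quote or a backslash (as Windows paths do, where R reads `\t` as a tab). The Python sketch below reproduces the vector-building loop and adds the escaping that such generated code would need; the function names are hypothetical.

```python
from typing import Iterable


def r_integer_vector(name: str, values: Iterable[int]) -> str:
    """Build R source such as status<-c(1,0,1), like the StringBuffer loop above."""
    return f"{name}<-c({','.join(str(int(v)) for v in values)})"


def r_string_literal(text: str) -> str:
    """Quote text as an R double-quoted string: escape backslashes first, then quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def r_setwd(folder: str) -> str:
    """Escaped counterpart of the setwd(...) string concatenation in KMAnalysis.java."""
    return f"setwd({r_string_literal(folder)})"


if __name__ == "__main__":
    print(r_integer_vector("status", [1, 0, 1]))
    print(r_setwd("C:\\results"))
```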

biostatistics,R


Separation of Concerns in rgate: R code goes in .R files

Analysis is written in the language of statisticians:

##' km_analysis -- Kaplan-Meier analysis from i2b2 observations
library(survival)
library(ROracle)
acct = db_config()
patient.set.survival <- function(concept.paths, patient.set.id,
                                 web.folder, filename) {
  conn <- dbConnect(Oracle(), acct$username, acct$password, access)
  sql <- paste0("
    select '", concept.paths$event, "' panel
    , to_char(f.start_date, 'YYYY-MM-DD HH24:MI:SS') start_date
    , pset.patient_num
    , cd.name_char
    , cd.concept_cd
    from blueherondata.observation_fact f, ...")
  data = transform.observations(dbGetQuery(conn, sql))
  fit <- survfit(Surv(data$time, data$status) ~ concept.paths$stratum, data)
  png(paste(web.folder, filename, sep='/'))
  plot(fit, xlab="Time (Years)", ylab="Survival probability")
  dev.off()
}

biostatistics,R
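`survfit(Surv(time, status) ~ ...)` computes the Kaplan-Meier (product-limit) estimate for each stratum. To make concrete what the analysis produces, here is a minimal single-stratum version in Python. It omits stratification, confidence intervals, and everything else `survfit` provides, and it assumes the usual convention that subjects censored at time t are still at risk at t.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 statuses: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time; status 1 = event, 0 = censored."""
    if len(times) != len(statuses):
        raise ValueError("times and statuses must have the same length")
    deaths = Counter(t for t, s in zip(times, statuses) if s == 1)
    exits = Counter(times)           # everyone who leaves the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= exits[t]          # events and censorings at t leave afterwards
    return curve


if __name__ == "__main__":
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```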


Separation of Concerns in rgate: R code goes in .R files, but...

How well does the R code behave when the author is not there?

patient privacy, institutional liability

python, SQL, HTML, JavaScript

what the R author needs

???


Object Capability Discipline supports the Principle of Least Authority

Memory safety and encapsulation¹ + Effects only by using held references² + No powerful references by default³

● Reference graph ≡ Access graph
● Only connectivity begets connectivity
● Natural Least Authority
● OO expressiveness for security patterns

acct = db_config()

1. Closure inspection is not safe: environment(function), as.list(function)
2. plot(fit) implicitly uses the results of png(paste(web.folder, filename))
3. The R global environment most likely includes lots of powerful references

[Figure: objects A, B, and C, before and after the message in a: b.m(c)]

M. Miller, C. Morningstar, B. Frantz. "Capability-based Financial Instruments." Proceedings of Financial Cryptography (Springer-Verlag), 2000. erights.org
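The introduction step in `a: b.m(c)` can be modeled by making each object's references explicit: A can only send to objects it holds, and passing C as an argument is what grants B access to C. This is a toy model with a hypothetical `Obj` class; like R, plain Python does not enforce object-capability discipline, so the checks below illustrate the rule rather than provide security.

```python
from typing import Iterable, Set


class Obj:
    """Toy object whose authority is exactly the set of references it holds."""

    def __init__(self, name: str, refs: Iterable["Obj"] = ()) -> None:
        self.name = name
        self.refs: Set["Obj"] = set(refs)

    def send(self, target: "Obj", method: str, *args: "Obj") -> None:
        # Effects only through held references: both the target and every
        # argument must already be reachable from this object.
        for ref in (target, *args):
            if ref not in self.refs:
                raise PermissionError(f"{self.name} holds no reference to {ref.name}")
        getattr(target, method)(*args)

    def m(self, other: "Obj") -> None:
        # Introduction: the receiver gains a reference only because a holder passed it in.
        self.refs.add(other)


if __name__ == "__main__":
    c = Obj("C")
    b = Obj("B")
    a = Obj("A", [b, c])
    a.send(b, "m", c)        # in a: b.m(c)
    print(c in b.refs)       # prints True
```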


rgate Security Architecture: Authority in the System Context

[Figure: the Kaplan Meier web client plug-in, apache, and rgate running km_analysis.R through the rpy libraries and R statistical software, with the I2B2 DW and the PM cell in the I2B2 HIVE; steps 1 to 5, with arrows labeled POST, select, get, and user config.]


rgate Security Architecture: km_analysis.R starts with no authority

[Figure: apache; rgate.py (Python) with rpy2; R containing deid.R and km_analysis.R; I2B2 DW; results. References held by km_analysis.R: none!]

"The principle of least authority requires one to design interfaces such that authority is handed out only on a need-to-do basis." - Miller et al.


rgate Security Architecture: deid.R attenuates DW access with patient set facet

[Figure: apache; rgate.py (Python) with rpy2; R containing deid.R and km_analysis.R; I2B2 DW; results. Labels: POST, rOracle con #xf.., patient set #7.., patients.]

"facets are objects that act as intermediaries between powerful objects and users that do not need (and should not be granted) its full power." - Miller et al.
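A facet in this style can be sketched as a closure: the full database connection is captured inside it, and the analysis receives only a function that reads observations for one patient set. The Python/SQLite sketch below assumes a simplified `observation_fact` table and a hypothetical `patient_set_facet` helper; it is not deid.R. As footnote 1 on the object-capability slide notes for R, closures can be inspected (in Python via `__closure__`), so this shows the shape of attenuation rather than an enforced boundary.

```python
import sqlite3
from typing import Callable, Iterable, List, Sequence, Tuple

Observations = Callable[[Sequence[str]], List[Tuple]]


def patient_set_facet(conn: sqlite3.Connection, patient_nums: Iterable[int]) -> Observations:
    """Attenuate a full DB connection to 'read these concepts for this patient set'."""
    patients = sorted(set(patient_nums))

    def observations(concept_codes: Sequence[str]) -> List[Tuple]:
        sql = (
            "select patient_num, concept_cd, start_date from observation_fact "
            f"where patient_num in ({','.join('?' * len(patients))}) "
            f"and concept_cd in ({','.join('?' * len(concept_codes))}) "
            "order by patient_num, start_date"
        )
        # Only this fixed, parameterized SELECT can be run through the facet.
        return conn.execute(sql, [*patients, *concept_codes]).fetchall()

    return observations


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table observation_fact "
                 "(patient_num integer, concept_cd text, start_date text)")
    conn.executemany("insert into observation_fact values (?, ?, ?)",
                     [(1, "DX:BREAST", "2005-01-10"), (3, "DX:BREAST", "2008-11-20")])
    obs = patient_set_facet(conn, [1, 2])   # the analysis would receive only `obs`
    print(obs(["DX:BREAST"]))
```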


rgate Security Architecture: km_analysis.R can only read patient set, write results

[Figure: apache; rgate.py (Python) with rpy2; R containing deid.R and km_analysis.R; I2B2 DW; results. Labels: POST, run, patients, observations, patient set #7.., rOracle con #xf.., SELECT ..., plot #2..]
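The write side can be attenuated the same way: instead of a folder path plus the ambient ability to write anywhere, the analysis receives a function that can only create files directly inside one results folder. A minimal Python sketch with a hypothetical `results_writer` helper:

```python
from pathlib import Path
from typing import Callable

Writer = Callable[[str, bytes], str]


def results_writer(folder: str) -> Writer:
    """Return a function that may write files only directly inside `folder`."""
    root = Path(folder).resolve()

    def write(filename: str, data: bytes) -> str:
        target = (root / filename).resolve()
        # Rejects '../x', absolute paths, and subdirectories alike.
        if target.parent != root:
            raise PermissionError(f"{filename!r} is outside the results folder")
        target.write_bytes(data)
        return target.name

    return write
```

Handing `write` to the analysis, rather than the folder name, keeps png(paste(web.folder, filename))-style path building out of the analysis code.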


rgate Security Architecture: km_analysis.R can only read patient set

Attenuated patient data access:

##' km_analysis -- Kaplan-Meier analysis from i2b2 observations
library(survival)
run_analysis <- function(patient.set, folder, filename, progress,
                         paths, title, xmax) {
  obs.db = observations(patient.set, unlist(paths))
  progress(paste("query returned", nrow(obs.db), "observations."))
  data <- db2km(obs.db, paths)
  progress(paste("db2km resulted in", nrow(data), "data points for plotting."))
  survplot(data, title, folder, xmax, filename)
  progress(paste("KM plot stored in", filename, "in", folder))
}

biostatistics,R

patient privacy, institutional liability


Efficient, Secure Interactive R statistical visualization in HERON/I2B2

python, SQL, HTML, JavaScript

cancer prevention,treatment

biostatistics,R

patient privacy, institutional liability

informatics.kumc.edu

