Semantic Web Technologies in Health Care Analytics

Date post: 16-Apr-2017
Upload: robert-piro
Page 1: Semantic Web Technologies in Health Care Analytics

SEMANTIC WEB TECHNOLOGIES IN HEALTH CARE ANALYTICS
AN IMPACT SCENARIO FOR DATALOG REASONING WITH RDFOX

Robert Piro

Departmental Seminar

Robert Piro Semantic Web Technologies in Health Care 1/15

Page 2: Semantic Web Technologies in Health Care Analytics

OVERVIEW

1 RDFOX

RDF

Datalog

2 PROJECT WITH KAISER PERMANENTE

HEDIS Measures for Diabetic Care

Data Model

Data Model as RDF Triples

The Datalog Rules

3 CONCLUSION & FUTURE WORK

Page 3: Semantic Web Technologies in Health Care Analytics

RDFox

RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT

RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)

in memory RDF Triple Store — optimised indexing

parallel Datalog Reasoner — very good scalability

FEATURES

load RDF data (Triples/Turtle)

materialise data — (extended) Datalog language

incremental reasoning / equality reasoning

query data — SPARQL query Language

INTEGRATION

stand-alone C++ implementation / C++ library

Java/Python Bridge

SPARQL end-point

Page 6: Semantic Web Technologies in Health Care Analytics

RDFox RDF

RDF — RESOURCE DESCRIPTION FRAMEWORK

RDF

data format with types
W3C standard
encodes semantic data

Triple: subject predicate object (s, p, o)
building blocks: resources & literals

URI: <http://www.w3.org/2001/XMLSchema#double>
String, Boolean, Integer, Decimal: "0.789"^^xsd:double

EXAMPLE (ENCODING A DATABASE TABLE IN RDF)

Table: PATIENT_VISIT

REC | MBR | SERV_DT  | CPT | ... | DIAG1 | ... | DIAG22
001 | 007 | 20151101 | ...

@prefix ex: <http://my.example.com/FieldName/> .
@prefix visit: <http://my.example.com/Rec/PATIENT_VISIT/> .
visit:001 ex:MBR "007" .
visit:001 ex:SERV_DT "2015-11-01"^^xsd:date .
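
The table encoding above can be sketched in a few lines of Python. This is only an illustration: the helper names `row_to_triples` and `to_iso_date` are hypothetical, field names are assumed to be written with underscores (SERV_DT), and the output uses full URIs in N-Triples style rather than the prefixed form on the slide.

```python
# Sketch: turn one pipe-separated PATIENT_VISIT record into RDF triples.
# Helper names are hypothetical; namespaces are taken from the slide example.

EX = "http://my.example.com/FieldName/"
VISIT = "http://my.example.com/Rec/PATIENT_VISIT/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_iso_date(yyyymmdd):
    """Convert '20151101' to '2015-11-01'."""
    return f"{yyyymmdd[0:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}"


def row_to_triples(header, line):
    """Map one record to (subject, predicate, object) strings."""
    fields = [f.strip() for f in line.split("|")]
    record = dict(zip(header, fields))
    subject = f"<{VISIT}{record['REC']}>"
    triples = []
    for name, value in record.items():
        if name == "REC" or value == "":
            continue  # REC becomes the subject; empty fields yield no triple
        if name == "SERV_DT":
            obj = f'"{to_iso_date(value)}"^^<{XSD_DATE}>'  # typed literal
        else:
            obj = f'"{value}"'  # plain string literal
        triples.append((subject, f"<{EX}{name}>", obj))
    return triples


header = ["REC", "MBR", "SERV_DT"]
triples = row_to_triples(header, "001 | 007 | 20151101")
for s, p, o in triples:
    print(s, p, o, ".")
```

Each record becomes one subject, and each non-empty field becomes one triple, which is why the triple count grows roughly with records times populated columns.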

Page 17: Semantic Web Technologies in Health Care Analytics

RDFox Datalog

DATALOG

RDF DATALOG RULE

[s0, p0, o0] ← [s1, p1, o1], . . . , [sn, pn, on]. ‘IF...AND...THEN...’

Variables start with ‘?’. Var(head) ⊆ Var(body)

EXAMPLE (MATERIALISATION WITH RDFOX)

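Rule:
[?p, ex:has, ex:Diabetes] ←
    [?p, ex:MBRNo, ?mbr],
    [?rec, ex:MBR, ?mbr],
    [?rec, ex:DIAG, "Diabetes"] .

Data:
p:007 ex:MBRNo "007" .
v:001 ex:MBR "007" .
v:001 ex:DIAG "Diabetes" .
p:001 ex:MBR "001" .

Rule instance with ?p = p:007, ?mbr = "007", ?rec = v:001:
[p:007, ex:has, ex:Diabetes] ←
    [p:007, ex:MBRNo, "007"],
    [v:001, ex:MBR, "007"],
    [v:001, ex:DIAG, "Diabetes"] .

Derived:
p:007 ex:has ex:Diabetes .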

RDFOX COMPUTES all CONSEQUENCES . . . AND TERMINATES

also from newly derived data

in a systematic way
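
To make "computes all consequences and terminates" concrete, here is a minimal sketch of naive forward chaining to a fixpoint. It is a teaching illustration under simplifying assumptions, not RDFox's parallel, index-based algorithm; the function names are hypothetical, and the patient fact is written with ex:MBRNo so that it matches the rule body.

```python
# Naive materialisation: apply every rule until no new triple is derived.
# Terminates because rules never invent new terms, so the set of possible
# triples over the existing terms is finite.

def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constant mismatch
    return new


def body_matches(body, facts, binding=None):
    """Yield every variable binding that satisfies all body atoms."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from body_matches(body[1:], facts, extended)


def materialise(facts, rules):
    """Return facts plus all consequences, including those of derived facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # list(...) finishes matching before the fact set is modified
            for b in list(body_matches(body, facts)):
                derived = tuple(b.get(t, t) for t in head)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


rule = (
    ("?p", "ex:has", "ex:Diabetes"),
    [("?p", "ex:MBRNo", "?mbr"),
     ("?rec", "ex:MBR", "?mbr"),
     ("?rec", "ex:DIAG", '"Diabetes"')],
)
data = {
    ("p:007", "ex:MBRNo", '"007"'),
    ("v:001", "ex:MBR", '"007"'),
    ("v:001", "ex:DIAG", '"Diabetes"'),
    ("p:001", "ex:MBR", '"001"'),
}
result = materialise(data, [rule])
print(sorted(result - data))
```

The condition Var(head) ⊆ Var(body) is what makes the `b.get(t, t)` substitution safe: every head variable has a value once the body has matched.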

Page 18: Semantic Web Technologies in Health Care Analytics

RDFox Datalog

RDFOX AND DATALOG

STATS

Name    | Start (Trp) | End (Trp) | Mem    | Cores     | Time
DBpedia | 112M        | 118M      | 6.1GB  | 8         | 28s
Claros  | 19M         | 96M       | 4.2GB  | 16(32)    | 127s
LUBM-1K | 134M        | 182M      | 9.3GB  | 16        | 8s
LUBM-9K | 6G          | 9G        | ≈100GB | 128(1024) | 8s

FEATURES OF RDFOX DATALOG

Allows many more constructs (arithmetic*, string ops*, comparisons)

Will allow negation, aggregation (can be simulated already)

Generalises OWL 2 RL; reasoning with OWL 2 EL is reducible to Datalog

GENERAL FEATURES OF DATALOG

Intuitive if-then-statements

Declarative (say what, not how to compute)

Powerful due to recursion

Page 21: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente

KAISER PERMANENTE

THE ORGANISATION

Kaiser HealthPlan, Kaiser Hospitals, Permanente Medical Group

KP largest ‘managed care’ organisation in the U.S.

KP HealthConnect; largest private electronic health record system

STATS

9.6M members

38 medical centres

620 medical offices

177k employees

17k physicians

50k nurses

Turnover: 56.4G USD

Net income 3.1G USD

Page 22: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente HEDIS Measures for Diabetic Care

HEALTHCARE EFFECTIVENESS DATA AND INFORMATION SET

HEDIS

Performance measure specification issued by the NCQA [1] (USA)

Percentages of a precisely defined eligible population:

    #Eligible with eye exam / #Eligible (is diabetic, ≤65yo, etc.)

Entry requirements for government funded healthcare (Medicare)
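
A measure of this kind is a numerator count over a denominator count. The sketch below shows the arithmetic on made-up records; the field names, the age cut-off parameter and the reference day are illustrative assumptions, and real HEDIS eligibility criteria are considerably more detailed.

```python
from datetime import date

# Hypothetical patient records; none of this is real data.
patients = [
    {"id": "A", "dob": date(1973, 10, 22), "diabetic": True, "eye_exam": True},
    {"id": "B", "dob": date(1963, 5, 1), "diabetic": True, "eye_exam": False},
    {"id": "C", "dob": date(1980, 1, 1), "diabetic": False, "eye_exam": True},
    {"id": "D", "dob": date(1940, 1, 1), "diabetic": True, "eye_exam": True},
]


def age_on(dob, day):
    """Age in completed years on the given day."""
    years = day.year - dob.year
    if (day.month, day.day) < (dob.month, dob.day):
        years -= 1
    return years


def measure_rate(patients, reference_day, max_age=65):
    """Percentage of eligible patients (denominator) with an eye exam (numerator)."""
    eligible = [p for p in patients
                if p["diabetic"] and age_on(p["dob"], reference_day) <= max_age]
    if not eligible:
        return None  # the measure is undefined for an empty denominator
    with_exam = [p for p in eligible if p["eye_exam"]]
    return 100.0 * len(with_exam) / len(eligible)


rate = measure_rate(patients, date(2013, 12, 31))
print(rate)
```

Here patients A and B form the denominator (C is not diabetic, D is over the age limit), and only A counts towards the numerator.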

HEDIS MEASURE COMPUTATION: TODAY

Disparate data sources (historically grown)

Ad-hoc schemas used to store data (meaning implicit)

Involved programs for analytics software:

    mix data (re)formatting and measuring
    difficult to maintain
    require high expertise of IT experts

[1] National Committee for Quality Assurance

Page 24: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente HEDIS Measures for Diabetic Care

HEDIS MEASURE COMPUTATION IN OUR PROJECT

NEW APPROACH (PETER HENDLER, ROBERT PIRO)

Separate data aggregation and reformatting from computing measures!

Data model inspired by HL7 RIM: ‘Entities in Roles Participating in Acts’

Data translated as RDF-triples into the data model first (Java/Scala)

RDFox Datalog rules compute measures according to this model

Results are read out through simple queries

BENEFITS

Reusability: uniform data model reusable for other tasks

Efficiency: rules are close to natural language & concise

Maintainability: rules are declarative and easy to understand

Page 27: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente Data Model

DATA MODEL

INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)

Entity --hasRole--> Role --hasPart--> Participation --hasAct--> Act

ISO standard: ISO/HL7 21731:2014

Process centric (Administrative KR)

Developed for/in the medical community; BUT ‘NHS experience’

EXAMPLE

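Getting a coffee

Entity | Role     | Participation | Act
Person | Customer | Purchaser     | 'Buying a product'
Person | Barista  | Preparer      | 'Buying a product'
Subst  | Coffee   | Product       | 'Buying a product'
Person | Customer | Consumer      | 'Buying a product'

Each row reads Entity --hasRole--> Role --hasPart--> Participation --hasAct--> Act; all four participations point to the same Act.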

EXAMPLE

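Contract for Work

Entity | Role           | Participation   | Act
Person | Customer       | Offering Party  | 'Buying a product'
Person | Representative | Accepting Party | 'Buying a product'
Subst  | Coffee         | Work Result     | 'Buying a product'
Person | Customer       | Beneficiary     | 'Buying a product'

Each row reads Entity --hasRole--> Role --hasPart--> Participation --hasAct--> Act; all four participations point to the same Act.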

EXAMPLE

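Prescription

Entity | Role       | Participation | Act
Person | Physician  | Prescriber    | Prescription
Person | Pharmacist | Dispenser     | Prescription
Subst  | Drug       | Medication    | Prescription
Person | Patient    | Recipient     | Prescription

Each row reads Entity --hasRole--> Role --hasPart--> Participation --hasAct--> Act; all four participations point to the same Act.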

Page 30: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente Data Model as RDF Triples

DATA MODEL AS RDF TRIPLES

DATA MODEL USED FOR HEDIS

Entity (EN00)
    Name: "John Smith"
    Gender: kp:male
    DoB: "1973-10-22"^^xsd:date
    type: cat:person

Role (RL00)
    type: cat:Patient

Participation (PT00)
    type: cat:Subject

Act (ACT00)
    Date: "2013-03-22"^^xsd:date
    type: cat:Diagnosis

EN00 --kp:hasRole--> RL00 --kp:hasPart--> PT00 --kp:hasContext--> ACT00

ENCODING IN RDF-TRIPLES

EN00 kp:DoB "1973-10-22"^^xsd:date .
EN00 kp:hasRole RL00 .
RL00 rdf:type kp:Patient .
RL00 kp:hasPart PT00 .
PT00 kp:hasContext ACT00 .
ACT00 rdf:type cat:Diagnosis .
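
As a rough illustration of how this model is navigated, the sketch below stores the six triples as Python tuples and follows the chain Entity, Role, Participation, Act. The helper names `objects` and `acts_of_entity` are hypothetical; in the project itself this navigation is expressed in Datalog rules and SPARQL queries rather than Python.

```python
# The triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("EN00", "kp:DoB", '"1973-10-22"^^xsd:date'),
    ("EN00", "kp:hasRole", "RL00"),
    ("RL00", "rdf:type", "kp:Patient"),
    ("RL00", "kp:hasPart", "PT00"),
    ("PT00", "kp:hasContext", "ACT00"),
    ("ACT00", "rdf:type", "cat:Diagnosis"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def acts_of_entity(triples, entity):
    """Follow Entity -> Role -> Participation -> Act; return (act, types) pairs."""
    found = []
    for role in objects(triples, entity, "kp:hasRole"):
        for part in objects(triples, role, "kp:hasPart"):
            for act in objects(triples, part, "kp:hasContext"):
                found.append((act, objects(triples, act, "rdf:type")))
    return found


print(acts_of_entity(triples, "EN00"))
```

The three nested loops correspond to a three-atom join, which is exactly what a rule body with the atoms kp:hasRole, kp:hasPart and kp:hasContext would express.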

Page 32: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente Data Model as RDF Triples

DATA TRANSLATION

DATA PROVIDED

Real data from a KP regional branch [2]

Data: ASCII-files, one record per line, pipe-separated fields

MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR

DATA STATS

About         | Records | Size
Providers     | 113k    | 6.8MB
Members       | 466k    | 84MB
Enrollments   | 3.3M    | 332MB
Labs          | 28.3M   | 1.4GB
Prescriptions | 8.9M    | 892MB
Visits        | 54M     | 8.6GB

TRANSLATION & IMPORT

Translation time: 45 min @ 8 threads

902M triples (4.6GB gzipped), 547M unique

RDFox import time: 390 s @ 8 threads

[2] The data never left Kaiser

Page 35: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente The Datalog Rules

DATALOG RULES

RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS

174 rules in 607 lines of code distributed in 21 files

authored on a 200-patient test set using an interactive authoring tool

MATERIALISATION

8 Intel Xeon [email protected] with 64GB RAM

Data import + materialisation: 1h40m

Maximal number of triples before subgraph extraction: 731M (43GB)

Subgraph 71.7M triples (4GB), maximal number of triples: 92.2M (4.8GB)

SUMMARY

Data is translated into RDF triples

RDFox computes the materialisation from a Datalog program and the RDF triples

Results are obtained by querying the triple store (SPARQL)

Page 38: Semantic Web Technologies in Health Care Analytics

Project with Kaiser Permanente The Datalog Rules

RULE EXAMPLE

EXAMPLE

Patients must be enrolled and can have multiple enrollments in a year.

Enrollments are given as [begin-date, end-date] pairs per patient.

“Compute all patients with continuous enrollment within the measurement year”, i.e. the enrollments must form a connected chain

[x0, x1] ... [xi, xi+1] [xi+1, xi+2] ... [xn−1, xn]

such that “2013-01-01” and “2013-12-31” are enclosed by some interval

[?Patient, aux:continiousEnrollment, ?PredEnr] ←
    [?Patient, aux:continiousEnrollment, ?Enr],
    [?Enr, kp:hasBeginConnectDateTime, ?begin],
    [?Patient, aux:roleHasEnrollment, ?PredEnr],
    [?PredEnr, kp:hasEndDateTime, ?begin] .
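
The recursive rule walks the chain backwards: from an enrollment already known to be part of the chain, it adds any predecessor that ends exactly where the current one begins. Below is a minimal Python sketch of that idea for a single patient. It rests on assumptions not shown on the slide: the chain is seeded with an enrollment that covers the end of the measurement year, and the "connect" begin date is taken to be the plain begin date. The function names are hypothetical.

```python
from datetime import date


def chain(enrollments, year_end):
    """Enrollments reachable backwards from one that covers year_end."""
    # Assumed base case: start from enrollments enclosing the end of the year.
    reached = {e for e in enrollments if e[0] <= year_end <= e[1]}
    frontier = list(reached)
    while frontier:
        begin, _ = frontier.pop()
        for pred in enrollments:
            # Mirrors the rule body: the predecessor ends where this one begins.
            if pred[1] == begin and pred not in reached:
                reached.add(pred)
                frontier.append(pred)
    return reached


def continuously_enrolled(enrollments, year_start, year_end):
    """True if the chain reaches an enrollment that encloses year_start."""
    return any(b <= year_start <= e for b, e in chain(enrollments, year_end))


YEAR_START, YEAR_END = date(2013, 1, 1), date(2013, 12, 31)

connected = [
    (date(2012, 6, 1), date(2013, 4, 1)),
    (date(2013, 4, 1), date(2013, 9, 1)),
    (date(2013, 9, 1), date(2014, 2, 1)),
]
gap = [
    (date(2012, 6, 1), date(2013, 3, 1)),
    (date(2013, 4, 1), date(2014, 2, 1)),
]

print(continuously_enrolled(connected, YEAR_START, YEAR_END))
print(continuously_enrolled(gap, YEAR_START, YEAR_END))
```

The explicit worklist loop plays the role that recursion plays in the Datalog rule: the rule is simply re-applied until no further predecessor can be added.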

Page 39: Semantic Web Technologies in Health Care Analytics

Conclusion & Future Work

CONCLUSION & FUTURE WORK

CONCLUSION

Created a use-case / Impact Scenario: real requirements, real data

Rooting of research; usefulness of RDFox, new avenues, benchmarks

FUTURE WORK

Rule authoring tool / anonymisation of the data

Research:

    stratification of the reasoning
    negation + aggregates
    big data reasoning + browsing

www.rdfox.org
