Date post: | 07-Apr-2017 |
Category: |
Technology |
Upload: | ontotext |
Mining Electronic Health Records
Go Beyond Ontology Based Text Mining
October 15th 2015
Mining Electronic Health Records #105/02/2023
Ontotext
• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 70 software developers in Sofia, Bulgaria
• Presence in London and New York
• Clients include BBC, FT, AstraZeneca, DoD, Wiley & Sons
• Over 400 person-years in R&D to create a one-stop shop for:
– Content enrichment
– Data management
– Graph database engine
Technology Portfolio
Clients
Healthcare Insights
Ontology Based IE
• An ontology models a discrete knowledge domain
• All ontology concepts have a definition
• All ontology concepts have alternative labels
• Where appropriate, ontology concepts have additional labels
• Inference can be applied
[Diagram: RDF description of the concept COPD. Recoverable nodes and edges:]
– COPD rdf:type Disease
– skos:prefLabel: Chronic Obstructive Pulmonary Disease
– skos:altLabel: COLD
– skos:altLabel: Chronic Airflow Obstruction
– hasSymptom: Shortness of Breath (rdf:type Symptom)
– rdfs:subClassOf: Respiratory Disease (rdf:type Disease)
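A minimal sketch of how such a concept description supports label lookup and inference. It assumes the triples are held as plain Python tuples and that the subclass edge in the diagram runs from COPD to Respiratory Disease; the helper names `labels` and `superclasses` are hypothetical, and a production system would use a triple store with an RDFS reasoner instead.

```python
# Triples as (subject, predicate, object); names follow the slide's diagram.
TRIPLES = {
    ("COPD", "rdf:type", "Disease"),
    ("COPD", "skos:prefLabel", "Chronic Obstructive Pulmonary Disease"),
    ("COPD", "skos:altLabel", "COLD"),
    ("COPD", "skos:altLabel", "Chronic Airflow Obstruction"),
    ("COPD", "hasSymptom", "Shortness of Breath"),
    ("Shortness of Breath", "rdf:type", "Symptom"),
    # Assumption: the subclass edge in the diagram starts at COPD.
    ("COPD", "rdfs:subClassOf", "Respiratory Disease"),
}


def labels(concept, triples):
    """Collect the preferred and alternative labels of a concept."""
    wanted = ("skos:prefLabel", "skos:altLabel")
    return {o for s, p, o in triples if s == concept and p in wanted}


def superclasses(concept, triples):
    """Follow rdfs:subClassOf transitively (a very small inference step)."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(labels("COPD", TRIPLES)))
print(sorted(superclasses("COPD", TRIPLES)))
```

With the inferred superclasses, a mention of "COLD" in a record can also be counted as a mention of a respiratory disease.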
Ontology Based IE - problems
• Does not model a domain completely (both on instance level and labels)
– Extend ontologies
– Ontology enrichment via instance mappings
• Labels contain additional qualifying information
– Define rewrite and ignore rules for literals
• Labels do not reflect natural language
– Apply “flexible” gazetteers
• Ambiguity in terminology
– Pre-filtering
– Ranking
– Semantic instance mappings
Vocabulary Enrichment – Semantic Mappings
ICD 10 CM: J44.9
– Chronic obstructive airway disease NOS
– Chronic obstructive lung disease NOS
– Chronic obstructive pulmonary disease, unspecified
SNOMED CT US: 155565006
– Chronic obstructive lung disease
– Chronic obstructive airways disease NOS
– Chronic obstructive lung disease (disorder)
– CAFL - Chronic airflow limitation
– Chronic irreversible airway obstruction
Mapping: J44.9 skos:closeMatch 155565006, found by string matching on the labels
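The mapping on this slide can be approximated with a small label-matching sketch. The ignore rules (`NOS`, `, unspecified`, `(disorder)`) and the function names are assumptions made for illustration; the presentation does not list the actual rules, and a real pipeline would likely add further checks before asserting a mapping.

```python
import re

ICD10CM = {
    "J44.9": [
        "Chronic obstructive airway disease NOS",
        "Chronic obstructive lung disease NOS",
        "Chronic obstructive pulmonary disease, unspecified",
    ]
}
SNOMED = {
    "155565006": [
        "Chronic obstructive lung disease",
        "Chronic obstructive airways disease NOS",
        "Chronic obstructive lung disease (disorder)",
        "CAFL - Chronic airflow limitation",
        "Chronic irreversible airway obstruction",
    ]
}

# Hypothetical ignore rules: qualifiers that carry no extra meaning for matching.
IGNORE_PATTERNS = [r"\bNOS\b", r",\s*unspecified\b", r"\(disorder\)"]


def normalize(label):
    """Drop ignorable qualifiers, lowercase, and collapse whitespace."""
    text = label
    for pattern in IGNORE_PATTERNS:
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return " ".join(text.lower().split())


def close_matches(left, right):
    """Propose skos:closeMatch triples for codes that share a normalized label."""
    index = {}
    for code, code_labels in right.items():
        for label in code_labels:
            index.setdefault(normalize(label), set()).add(code)
    pairs = set()
    for code, code_labels in left.items():
        for label in code_labels:
            for other in index.get(normalize(label), ()):
                pairs.add((code, "skos:closeMatch", other))
    return pairs


print(close_matches(ICD10CM, SNOMED))
```

Here the match comes from "Chronic obstructive lung disease NOS" and "Chronic obstructive lung disease", which become identical once the `NOS` qualifier is ignored.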
Vocabulary Enrichment – Synonym Enrichment
Token variants:
– Tumor, Tumour
– Abdomen, Abd
Generated labels:
– Tumor of abdomen
– Tumor of abd
– Tumour of abdomen
– Tumour of abd
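The label combinations above can be generated mechanically. A minimal sketch, assuming a hypothetical per-token variant table `VARIANTS` and ignoring case handling beyond the lookup:

```python
from itertools import product

# Hypothetical variant table: lowercase token -> spellings to generate.
VARIANTS = {
    "tumor": ["Tumor", "Tumour"],
    "abdomen": ["abdomen", "abd"],
}


def expand_label(label, variants):
    """Return every label obtained by swapping tokens for their known variants."""
    options = [variants.get(token.lower(), [token]) for token in label.split()]
    return [" ".join(combo) for combo in product(*options)]


for generated in expand_label("Tumor of abdomen", VARIANTS):
    print(generated)
```

The number of generated labels is the product of the variant counts per token, so long labels with many variants can grow quickly.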
Ontology Based IE – example
Flexible Gazetteers
• Pre-coordinated terms cannot match all natural language terms, especially those used in narrative medical text!
– Inversions: concept “knee injury” vs. “injury of knee” in text
– Gaps due to additional qualifiers: concept “periorbital swelling” vs. “periorbital soft tissue swelling” in text
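One way to tolerate both inversions and gaps is to match a concept when all of its content tokens occur close together in the text, in any order. This is only a sketch of the idea, not the gazetteer used in the presentation: the stop-word list, the `max_gap` parameter and the function names are assumptions, and no stemming is applied.

```python
import re

# Assumed stop words; they are ignored on both the concept and the text side.
STOP_WORDS = {"of", "the", "in", "a", "an"}


def content_tokens(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def flexible_match(concept, text, max_gap=2):
    """True if all concept tokens occur, in any order, within a short window."""
    wanted = set(content_tokens(concept))
    if not wanted:
        return False
    words = content_tokens(text)
    window = len(wanted) + max_gap  # room for up to max_gap extra qualifiers
    return any(
        wanted <= set(words[start:start + window]) for start in range(len(words))
    )


print(flexible_match("knee injury", "Patient reports injury of knee after a fall"))
print(flexible_match("periorbital swelling", "periorbital soft tissue swelling noted"))
```

A larger `max_gap` finds more mentions but also raises the risk of joining words that belong to different phrases.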
Detection of negations
• The ability to reliably identify negated medical statements in text may significantly affect the quality of the extracted information.
• Negation types:
– Adverbial negation
– Negation in a noun phrase
– Prepositional negation
– Adjective negation
– Verb negation
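A simplified, trigger-based sketch of negation detection, in the spirit of NegEx-style rules. The trigger words and their assignment to the negation types above are illustrative assumptions (the slide's own examples are not recoverable from the text), and only triggers that precede the concept are checked.

```python
import re

# Assumed triggers, mapped to the negation types named on the slide.
TRIGGERS = {
    "not": "adverbial",
    "never": "adverbial",
    "no": "noun phrase",
    "without": "prepositional",
    "negative for": "adjective",
    "denies": "verb",
    "denied": "verb",
}


def negation_type(sentence, concept, window=5):
    """Return the type of a negation trigger found shortly before the concept.

    Returns None when the concept is not negated or not present at all.
    """
    text = sentence.lower()
    position = text.find(concept.lower())
    if position < 0:
        return None
    preceding = re.findall(r"[a-z]+", text[:position])[-window:]
    context = " " + " ".join(preceding) + " "
    for trigger, kind in TRIGGERS.items():
        if f" {trigger} " in context:
            return kind
    return None


print(negation_type("The patient denies chest pain.", "chest pain"))
print(negation_type("Patient presents with chest pain.", "chest pain"))
```

Triggers that follow the concept (for example "was ruled out") and scope-ending words such as "but" would need additional rules.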
Temporality Identification
• Temporal resolution for events in clinical notes is crucial for an accurate definition of patient history, current medical condition and assigned treatment.
• Identified temporality classes are:
– Historical
– Hypothetical (“Not particular”)
– Recent
• Temporal data must be normalized against the medical document's metadata (date of report/visit)!
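Normalization against the report date can be sketched as follows. The supported expression shape ("N units ago"), the approximate unit lengths and the 30-day cut-off between Recent and Historical are all assumptions for illustration; the Hypothetical class is not handled here.

```python
import re
from datetime import date, timedelta

# Approximate unit lengths in days (assumption; months and years vary).
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def resolve(expression, report_date):
    """Turn a relative expression such as '3 days ago' into a calendar date."""
    match = re.fullmatch(
        r"(\d+)\s+(day|week|month|year)s?\s+ago", expression.strip().lower()
    )
    if not match:
        return None
    days = int(match.group(1)) * UNIT_DAYS[match.group(2)]
    return report_date - timedelta(days=days)


def temporality_class(event_date, report_date, recent_days=30):
    """Classify an event as Recent or Historical relative to the report date."""
    age = (report_date - event_date).days
    return "Recent" if age <= recent_days else "Historical"


report = date(2015, 10, 15)  # hypothetical date of report/visit
for phrase in ("3 days ago", "2 years ago"):
    when = resolve(phrase, report)
    print(phrase, when, temporality_class(when, report))
```

The same phrase resolves to different dates in different documents, which is why the report or visit date has to travel with every extracted event.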
Temporality Identification - Example
Post-coordination Patterns
• It is impossible to fully describe medical knowledge in terms of fully qualified concepts!
• Natural language does not follow the standardized descriptions defined by domain ontologies!
• Concepts must describe basic entities
• Entity properties can be described by different qualifier classes
• Patterns can generate new concepts, combining specific instance and qualifier classes
Post-coordination Patterns - Examples
• Example pattern: <disease> or <morphologic abnormality> as the rightmost concept in a noun phrase, preceded by <qualifier> and <body structure>
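The example pattern can be sketched as a check over an already annotated noun phrase. The input format (a list of `(text, semantic class)` pairs), the strict qualifier-then-body-structure order and the output dictionary are assumptions; the sample annotations are hypothetical.

```python
HEAD_CLASSES = {"disease", "morphologic abnormality"}


def post_coordinate(noun_phrase):
    """Apply the pattern <qualifier> <body structure> <disease | morphologic abnormality>.

    noun_phrase is a list of (text, semantic_class) pairs in reading order.
    Returns the composed concept, or None when the pattern does not apply.
    """
    if len(noun_phrase) < 3:
        return None
    (q_text, q_class), (b_text, b_class), (h_text, h_class) = noun_phrase[-3:]
    if q_class == "qualifier" and b_class == "body structure" and h_class in HEAD_CLASSES:
        return {"concept": h_text, "qualifier": q_text, "body_structure": b_text}
    return None


# Hypothetical annotations for the phrase "severe knee swelling".
phrase = [
    ("severe", "qualifier"),
    ("knee", "body structure"),
    ("swelling", "morphologic abnormality"),
]
print(post_coordinate(phrase))
```

The composed result stands for a concept that need not exist as a pre-coordinated entry in the ontology.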
Data Modeling
• Based on normalized data
• … but allowing extension with free text
• Allow data fusion with background knowledge
• Capture all aspects of the extracted information
• Tightly coupled with the context
• Provide provenance and confidence score
• Explorable! Not just searchable
Data Modeling
[Diagram: RDF model of the extracted patient data, grouped by provenance graph]
Data provenance: graph <http://linkedlifedata.com/resource/document/CD8672>
– Patient XYZ rdf:type Patient
– hasGender male
– hasBirthDate 1956/09/20 (xsd:date)
– hasDiagnose http://linkedlifedata.com/resource/icd9cm/157.9
– hasStatus currentDisease
– skos:prefLabel Malignant neoplasm of pancreas
Data provenance: graph <http://linkedlifedata.com/resource/document/CN127753>
– hasTreatment http://linkedlifedata.com/resource/treatment/DT127753
– rdf:type Treatment
– hasDrug http://linkedlifedata.com/resource/drug/irinotecan
– hasDosage 180 mg/ 1 m2 for 80 min
Data provenance: graph <http://linkedlifedata.com/resource/drugBroshure/CAMPTOSAR>
– Maximum Daily Dosage
Data Modeling – KB
[Diagram: knowledge base entry http://linkedlifedata.com/resource/drugDosage/DD127753]
– rdf:type Dosage
– hasMedication http://linkedlifedata.com/resource/drug/irinotecan
– hasPopulationGroup Adult
– hasAdministrationRoute http://linkedlifedata.com/resource/route/subcutaneus
– hasAdministrationForm http://linkedlifedata.com/resource/form/injection
– hasIndication http://linkedlifedata.com/resource/icd9cm/157.9
– hasDosageValue 180
– hasDosageUnit mg
– hasDenominatorValue 1
– hasDenominatorUnit m2
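To compare a dosage extracted from a record with a knowledge base entry like this one, the free-text dosage first has to be split into the same fields. A minimal sketch, assuming a "value unit / value unit" shape for the text and assuming the KB entry represents the maximum dosage; the regular expression and function names are hypothetical, and unit conversion is left out.

```python
import re

# Matches e.g. "180 mg/ 1 m2"; anything after it ("for 80 min") is ignored here.
DOSAGE_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)\s*/\s*"
    r"(?P<den_value>\d+(?:\.\d+)?)\s*(?P<den_unit>[a-zA-Z]+\d?)"
)


def parse_dosage(text):
    """Split a free-text dosage into the fields used by the KB entry."""
    match = DOSAGE_RE.search(text)
    if not match:
        return None
    return {
        "hasDosageValue": float(match.group("value")),
        "hasDosageUnit": match.group("unit"),
        "hasDenominatorValue": float(match.group("den_value")),
        "hasDenominatorUnit": match.group("den_unit"),
    }


def exceeds_maximum(extracted, maximum):
    """Compare dose per denominator unit; units must already agree."""
    for key in ("hasDosageUnit", "hasDenominatorUnit"):
        if extracted[key] != maximum[key]:
            raise ValueError("unit conversion is out of scope for this sketch")
    rate = lambda d: d["hasDosageValue"] / d["hasDenominatorValue"]
    return rate(extracted) > rate(maximum)


KB_ENTRY = {
    "hasDosageValue": 180.0,
    "hasDosageUnit": "mg",
    "hasDenominatorValue": 1.0,
    "hasDenominatorUnit": "m2",
}
extracted = parse_dosage("180 mg/ 1 m2 for 80 min")
print(extracted, exceeds_maximum(extracted, KB_ENTRY))
```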
Semantic Data Exploration and Mining
• Build Linked Data out of extracted facts and background knowledge
• Semantic Faceted Search
• Cross Entity Search & Exploration
• Expert Text Mining Search in pre-annotated documents
– Combine semantic annotations with PoS elements
– Identify post-coordination patterns
– Identify relation patterns
– Query expansion using background knowledge
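Query expansion using background knowledge can be sketched with two small lookup tables. The tables `LABELS` and `NARROWER` are hypothetical stand-ins for a knowledge base (the COPD labels are taken from the ontology slide earlier in the deck); a real deployment would query the graph database instead.

```python
# Hypothetical background knowledge: concept -> labels, concept -> narrower concepts.
LABELS = {
    "RespiratoryDisease": ["respiratory disease"],
    "COPD": [
        "chronic obstructive pulmonary disease",
        "COLD",
        "chronic airflow obstruction",
    ],
}
NARROWER = {"RespiratoryDisease": ["COPD"]}


def expand_query(term):
    """Return all labels of the matching concepts and of their narrower concepts."""
    needle = term.lower()
    stack = [
        concept
        for concept, concept_labels in LABELS.items()
        if needle in (label.lower() for label in concept_labels)
    ]
    seen, expanded = set(), set()
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        expanded.update(LABELS.get(concept, []))
        stack.extend(NARROWER.get(concept, []))
    return expanded


print(sorted(expand_query("respiratory disease")))
```

A search for the broader term then also retrieves documents that only mention one of the narrower concept's labels.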
• Information Extraction from EHRs is still a challenge!
• Making use of the extracted data is even more challenging
• Ontotext provides the technology stack to make it work!
Thank you!