Searching and Exploring Biomedical Data
Vagelis Hristidis
School of Computing and Information Sciences
Florida International University
Vagelis Hristidis, Searching and Exploring Biomedical Data 2
Roadmap
◦ Why is it challenging to search EMRs?
◦ XOntoRank: Leveraging Ontologies to improve sensitivity in EMR search
◦ ObjectRank: Use authority flow to rank EMR entities
◦ BioNav: Using MeSH to explore the results of PubMed queries
ELECTRONIC MEDICAL RECORDS (EMRs)
Adoption of EMRs is hard, largely for political reasons:
◦ No unique patient id
◦ Confidentiality
◦ HIPAA (Health Insurance Portability and Accountability Act)
Move towards XML-based formats. One of the most promising: Health Level 7’s Clinical Document Architecture (CDA).
EMRs pose new challenges for Computer Scientists:
◦ Confidentiality, authentication, secure exchange
◦ Storage, scalability
◦ Dictionaries, term disambiguation
◦ Search for interesting patterns (Data Mining)
◦ Data integration, schema mapping
◦ Searching and exploring
Limitations of Traditional IR and General XML Search
Traditional IR:
◦ Text-based search engines do not exploit the XML tags or the hierarchical structure of XML.
◦ The whole XML document is treated as a single unit, which is unacceptable given the possibly large size of XML documents.
◦ Proximity in XML can also be measured in terms of containment edges.
General XML Search:
◦ EMRs have known but complex semantics.
◦ EMRs include free text, numeric data, time sequences, and negative statements.
◦ EMRs routinely reference external information sources such as dictionaries and ontologies.
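To make the idea of proximity measured in containment edges concrete, here is a minimal Python sketch. It assumes a toy CDA-like fragment (the element names and the helper functions `paths_to_matches` and `containment_distance` are hypothetical, not taken from the talk) and counts the parent-child edges between two keyword matches through their lowest common ancestor.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDA-like fragment; element names are illustrative only.
DOC = """
<ClinicalDocument>
  <section>
    <entry><Observation><value>Asthma</value></Observation></entry>
    <entry><SubstanceAdministration><value>Theophylline</value></SubstanceAdministration></entry>
  </section>
  <section>
    <entry><Observation><value>Theophylline</value></Observation></entry>
  </section>
</ClinicalDocument>
"""

def paths_to_matches(root, keyword):
    """Return the root-to-node path of every element whose text contains keyword."""
    found = []

    def walk(node, path):
        path = path + [node]  # new list per branch, so paths are not shared
        if node.text and keyword.lower() in node.text.lower():
            found.append(path)
        for child in node:
            walk(child, path)

    walk(root, [])
    return found

def containment_distance(path_a, path_b):
    """Count containment edges between two nodes via their lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

root = ET.fromstring(DOC.strip())
asthma = paths_to_matches(root, "asthma")
theophylline = paths_to_matches(root, "theophylline")
distances = sorted(containment_distance(a, t) for a in asthma for t in theophylline)
print(distances)  # → [6, 8]
```

The pair inside the same section is closer (6 edges) than the pair that only meets at the document root (8 edges), which a whole-document text search cannot distinguish.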
Syntax vs. Semantics in Schema
Example – query “Asthma Theophylline”
More details at [Hristidis et al. NSF Symposium on Next Generation of Data Mining ’07]
XOntoRank: Leverage Ontological Knowledge
Algorithm to enhance keyword search using ontological knowledge (e.g., SNOMED) [ICDE’08 poster, ICDE’09 full paper]
Figure: fragment of the medical dictionary (ontology) graph.
Concepts shown:
◦ 50043002 Disorder of Respiratory system
◦ 79688008 Respiratory Obstruction
◦ 118946009 Disorder of Thorax
◦ 41427001 Disorder of Bronchus
◦ 195967001 Asthma
◦ 301229001 Bronchial Finding
◦ 405944004 Asthmatic Bronchitis
◦ 266364000 Asthma attack
◦ 955009 Bronchial Structure
◦ 82094008 Lower respiratory tract structure
Edge labels: Is a, May be, Finding site of
Example 1
q = {“bronchitis”, “albuterol”}
result = Observation subtree with child nodes code, value (Bronchitis), value (Albuterol)
Example 2
q = {“asthma”, “albuterol”}
result = ???
XOntoRank
A CDA node may be associated with a query keyword w through the ontology.
XOntoRank first assigns scores to ontological concepts:
◦ OntoScore OS(): semantic relevance of a concept c in the ontology to a query keyword w.
Then, given these scores, it assigns Node Scores NS() to document nodes.
Other aggregation functions are possible.
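As a rough illustration of how a Node Score could combine a textual match with an OntoScore, the sketch below uses `max` as the aggregation function. This is an assumption made for the example, not a formula stated on the slide; the OntoScore values, the `text_score` helper, and the mapping of the "Bronchitis" node to concept 405944004 are all hypothetical.

```python
# Hypothetical OntoScores OS(c, w) for the keyword "asthma"; the values are made up.
ONTO_SCORE = {
    ("195967001", "asthma"): 1.0,  # Asthma
    ("405944004", "asthma"): 0.5,  # Asthmatic Bronchitis
}

def text_score(node_text, keyword):
    """Toy textual relevance: 1.0 if the keyword occurs in the node text, else 0.0."""
    return 1.0 if keyword.lower() in node_text.lower() else 0.0

def node_score(node_text, concept_code, keyword, aggregate=max):
    """NS(v, w): aggregate the textual match with the OntoScore of the referenced concept."""
    onto = ONTO_SCORE.get((concept_code, keyword), 0.0)
    return aggregate(text_score(node_text, keyword), onto)

# A node that mentions the keyword directly.
direct = node_score("Asthma", "195967001", "asthma")
# A node that never mentions "asthma" but references a related concept (assumed mapping).
via_ontology = node_score("Bronchitis", "405944004", "asthma")
# A node with no ontological reference and no textual match.
unrelated = node_score("Albuterol", None, "asthma")
print(direct, via_ontology, unrelated)  # → 1.0 0.5 0.0
```

Passing a different `aggregate` callable (for example `lambda t, o: t + o`) is one way to experiment with other aggregation functions.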
Computing OntoScore of Concept Given Query Keyword
Three ways to view the ontology graph:
◦ As an unlabeled, undirected graph.
◦ As a taxonomy.
◦ As a complete set of relationships.
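For the first view (unlabeled, undirected graph), one simple possibility is to let relevance decay with graph distance from the concepts that match the keyword. The sketch below is only an illustration under that assumption: the `decay` factor, the name-matching rule, and the edge endpoints are hypothetical and are not taken from the talk.

```python
from collections import deque

# Hypothetical edges among concepts named on the ontology slide; endpoints are illustrative.
EDGES = [
    ("Asthma", "Disorder of Bronchus"),
    ("Asthmatic Bronchitis", "Asthma"),
    ("Asthma attack", "Asthma"),
    ("Disorder of Bronchus", "Disorder of Thorax"),
]

def onto_scores(edges, keyword, decay=0.5):
    """Breadth-first spread of relevance: each edge away from a match multiplies by decay."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Seed concepts: those whose name contains the keyword get full relevance.
    scores = {c: 1.0 for c in adjacency if keyword.lower() in c.lower()}
    queue = deque(scores)
    while queue:
        concept = queue.popleft()
        for neighbor in adjacency[concept]:
            if neighbor not in scores:  # first visit is the shortest distance in BFS
                scores[neighbor] = scores[concept] * decay
                queue.append(neighbor)
    return scores

scores = onto_scores(EDGES, "asthma")
print(scores["Asthma"], scores["Disorder of Bronchus"], scores["Disorder of Thorax"])
# → 1.0 0.5 0.25
```

The taxonomy and full-relationship views would treat edge direction and edge labels differently; this sketch deliberately ignores both.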
Authority Flow Ranking in EMRs
A subset of the electronic health record dataset.
Work under submission.
Figure: data graph with nodes labeled v1 to v7, connected by prescribed_to and recorded_by edges. Node contents:
◦ EventsPlan: TimeStampCreated="2004-11-03 11:57:00.0" Events="….small residual pericardial effusion….."
◦ Hospitalization: TimeStampCreated="2004-10-27 22:00:00.0" History="18 year old boy with an aggressive form of chest lymphoma…" Allergies="NKDA" …...
◦ Cardiac: PatientID="1438" Complication="apical impulse … Echo-large increasing pericardial effusion…"
◦ Employee: TimeStampCreated="2004-12-23 14:03:00.0" Title="Pediatric Cardiologist" ….
◦ EventsPlan: Events="4 month old baby… pericardial effusion..."
◦ Medication: TimeStampCreated="2003-02-13 21:57:00.0" ..
◦ Hospitalization: History="48 year old.."
Query: “pericardial effusion”
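A minimal sketch of authority flow ranking for this query, assuming a personalized-PageRank-style iteration: nodes that contain the keywords form the base set, and authority flows along weighted edges to connected nodes. The node names, edges, edge weights, and the `damping` value are hypothetical; the slide does not give them.

```python
def authority_flow(edges, base_set, damping=0.85, iterations=50):
    """Iterate authority flow from base_set over edges given as (source, target, weight)."""
    nodes = {n for s, t, _ in edges for n in (s, t)} | set(base_set)
    # Random jumps return only to nodes that contain the query keywords.
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * jump[n] for n in nodes}
        for source, target, weight in edges:
            nxt[target] += damping * weight * rank[source]
        rank = nxt
    return rank

# Records assumed to contain "pericardial effusion".
base_set = {"events_plan_1", "events_plan_2", "cardiac"}
# Assumed edges; the weights play the role of authority transfer rates.
edges = [
    ("events_plan_1", "employee", 0.5),   # recorded_by
    ("events_plan_2", "employee", 0.5),   # recorded_by
    ("cardiac", "hospitalization", 0.5),  # illustrative link
]

rank = authority_flow(edges, base_set)
for node, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph the employee record receives a non-zero score even though it does not contain the keywords, because two matching records point to it.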
Authority Flow Ranking
Schema of the EMR dataset
Figure: schema graph.
◦ Entity types: Hospitalization, Employee, Associated_Events, Patient, Medication
◦ Edge labels: A-E, P-M, H-M, M-E, A-H, H-E, P-E
◦ Relationship names: created_by, recorded_by, prescribed_by, of, prescribed_to, for, created_by
User Study Results
Figure: two bar charts, "Mean Sensitivity" (y-axis: Average Sensitivity, 0 to 1) and "Mean Specificity" (y-axis: Average Specificity, 0 to 1), each comparing CO085BM25, BM25, CO085, and CO030.
BM25: Traditional Information Retrieval Ranking Function
CO: Clinical ObjectRank (Authority Flow)
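For reference, the sketch below computes sensitivity and specificity using their standard definitions. The counts are made up for illustration, and the exact relevance judgments used in the user study are not described on the slide.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of relevant items that were returned: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of irrelevant items that were correctly left out: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts for a single query.
sens = sensitivity(true_positives=8, false_negatives=2)
spec = specificity(true_negatives=45, false_positives=5)
print(sens, spec)  # → 0.8 0.9
```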
Biological Databases (cont’d) – Results Navigation [ICDE09, TKDE 2010]
With SUNY Buffalo. Demo at http://db.cse.buffalo.edu/bionav/
Most publications in PubMed are annotated with Medical Subject Headings (MeSH) terms.
Present results in the MeSH tree.
Propose a navigation model and smart expansion techniques that may skip tree levels.
BioNav: Exploring PubMed Results
Static Navigation Tree for query “prothymosin”
MESH (313)
  Amino Acids, Peptides, and Proteins (310)
    Proteins (307)
      Nucleoproteins (40)
        Histones (15)
  Biological Phenomena, … (217)
    Cell Physiology (161)
      Cell Growth Processes (99)
  Genetic Processes (193)
    Gene Expression (92)
      Transcription, Genetic (25)
Collapsed branches shown in the figure: 95, 2, 45, 4, 3, 15, 10, and 1 more node(s).
- Query Keyword: prothymosin
- Number of results: 313
- Navigation Tree stats:
  • # of nodes: 3941
  • depth: 10
  • total citations: 30897
Big tree with many duplicates!
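The duplication in a static navigation tree can be seen in a small Python sketch. It assumes a simplified single-parent hierarchy (real MeSH concepts can appear at several tree positions) and made-up citation identifiers and annotations; each citation is counted under every concept it is annotated with and under all of that concept's ancestors.

```python
# Hypothetical mini-hierarchy (child -> parent) and citation annotations.
PARENT = {
    "Amino Acids, Peptides, and Proteins": "MESH",
    "Proteins": "Amino Acids, Peptides, and Proteins",
    "Nucleoproteins": "Proteins",
    "Histones": "Nucleoproteins",
    "Genetic Processes": "MESH",
    "Gene Expression": "Genetic Processes",
}
ANNOTATIONS = {
    "pmid_1": ["Histones", "Gene Expression"],
    "pmid_2": ["Proteins"],
    "pmid_3": ["Nucleoproteins", "Genetic Processes"],
}

def ancestors_and_self(concept):
    """Return the concept followed by its chain of ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def navigation_tree(annotations):
    """Map each concept to the citations attached to it or to any of its descendants."""
    tree = {}
    for pmid, concepts in annotations.items():
        for concept in concepts:
            for node in ancestors_and_self(concept):
                tree.setdefault(node, set()).add(pmid)
    return tree

tree = navigation_tree(ANNOTATIONS)
distinct_results = len(tree["MESH"])
total_attached = sum(len(concepts) for concepts in ANNOTATIONS.values())
print(distinct_results, total_attached)  # → 3 5
```

Even with three citations, five attachments appear in the tree, which mirrors how 313 results can expand into tens of thousands of attached citations.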
BioNav: Exploring PubMed Results
Reveal to the user a selected set of descendant concepts that:
(a) Collectively contain all results
(b) Minimize the expected user navigation cost
Unlike static navigation, not all children of the root are necessarily revealed.
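To illustrate requirement (a) only, the sketch below picks a small set of concepts that together cover every result using a plain greedy set-cover heuristic. This is a stand-in technique chosen for brevity, not BioNav's cost-based expansion; the concept-to-result mapping and the `greedy_reveal` helper are hypothetical.

```python
def greedy_reveal(concept_results, all_results):
    """Pick concepts until every result is covered, largest marginal gain first."""
    uncovered = set(all_results)
    revealed = []
    while uncovered:
        best = max(concept_results, key=lambda c: len(concept_results[c] & uncovered))
        gain = concept_results[best] & uncovered
        if not gain:
            raise ValueError("remaining results are not covered by any concept")
        revealed.append(best)
        uncovered -= gain
    return revealed

# Hypothetical mapping from descendant concepts to the result ids they contain.
concept_results = {
    "Proteins": {1, 2, 3, 4},
    "Histones": {1, 2},
    "Gene Expression": {4, 5},
    "Cell Physiology": {5, 6},
}

revealed = greedy_reveal(concept_results, all_results={1, 2, 3, 4, 5, 6})
print(revealed)  # → ['Proteins', 'Cell Physiology']
```

Two revealed concepts cover all six results here, and "Histones" is skipped even though it sits deeper in the hierarchy. A cost-aware method would additionally weigh how likely the user is to expand each concept.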
BioNav Evaluation
Figure: bar chart of Overall Navigation Cost (# of Concepts Revealed + # of EXPAND Actions), y-axis 0 to 20, comparing Static and BioNav.
References
◦ Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. Effective Navigation of Query Results Based on Concept Hierarchies. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2010.
◦ Fernando Farfán, Vagelis Hristidis, Anand Ranganathan, and Michael Weiner. XOntoRank: Ontology-Aware Search of Electronic Medical Records. IEEE International Conference on Data Engineering (ICDE), 2009.
◦ Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. BioNav: Effective Navigation on Query Results of Biomedical Databases. IEEE International Conference on Data Engineering (ICDE), 2009.
◦ Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, Anthony F. Rossi, and Jeffrey A. White. Information Discovery on Electronic Medical Records. National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), 2007.
Supported by:
◦ NSF IIS-0811922: Information Discovery on Domain Data Graphs, 2008-2011
◦ NSF CAREER IIS-0952347, 2010-2015