A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library
Wesley W. Chu
Computer Science Dept.
2
NIH Program Project Grant
A 5-year, $10M joint interdisciplinary project between Medical School and CS faculty
• Project 1: teleradiology infrastructure
• Project 2: neuroradiology workstation
• Project 3: multimedia information architecture
• Project 4: natural language processing for medical reports
• Project 5: medical digital library
3
Project 5 Personnel
Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.
Project leader: Wesley W. Chu
4
Data in a Medical Digital Library
• Structured data (patient lab data, demographic data, …): CoBase
• Images (X-rays, MRI, CT scans): KMeD
• Free-text: patient reports, teaching files, literature, news articles
5
System Overview
Patient reports
Medical literature
Medical Digital Library(MDL)
Teaching materials
Query results
Ad-hoc query
Patient report for content correlation
News Articles
6
A Sample Patient Report
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
7
Scenario Specific Retrieval
Patient report:
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
Treatment-related articles: how to treat the disease
Diagnosis-related articles: how to diagnose the disease
8
Challenge I: Indexing
Extracting domain-specific key concepts in the free text for indexing
• Free-text: Lung cancer, small cell, stage II
• Concept terms in knowledge source: stage II small cell lung cancer
Conventional methods use NLP
• Not scalable
• Cannot adapt to various forms of word permutation
9
Challenge II: Terms used in the query are too general
Expanding the general terms in the query to specific terms that are used in the document
Query: lung cancer, diagnosis options (match with the document: ?)
Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …
Expanded query: lung cancer, chest x-ray, bronchography, … (match with the document: √)
10
Challenge III: Mismatching between terms used in query and documents
Example
Query: … lung cancer, …
Document 1: … lung carcinoma … (?)
Document 2: … lung neoplasm … (?)
Document 3: … anti-cancer drug combinations … (?)
11
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents
12
IndexFinder: Extracting domain-specific key concepts
Technique
• Permute words from the text to generate concept candidates.
• Use the knowledge base to select the valid candidates.
Problem
• Valid candidates may be irrelevant to specific domain indexing.
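The candidate-generation idea can be sketched in a few lines. This is a minimal illustration, not the IndexFinder implementation: `TOY_KB`, `tokenize`, and `candidate_concepts` are hypothetical names, the knowledge base is a three-entry stand-in for UMLS, and the brute-force subset enumeration (bounded by `max_len`) is far slower than the linear-time behaviour the slides report.

```python
import re
from itertools import combinations

# Hypothetical toy knowledge base: each concept is keyed by its set of words,
# so word order in the text does not matter.
TOY_KB = {
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}):
        "stage II small cell lung cancer",
    frozenset({"lung", "cancer"}): "lung cancer",
    frozenset({"lung", "nodule"}): "lung nodule",
}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]

def candidate_concepts(text, kb=TOY_KB, max_len=6):
    """Return the KB concepts whose words all occur in the text, in any order."""
    words = sorted(set(tokenize(text)))
    found = set()
    for n in range(1, min(max_len, len(words)) + 1):
        # Every word subset is a candidate; the KB lookup keeps the valid ones.
        for combo in combinations(words, n):
            name = kb.get(frozenset(combo))
            if name is not None:
                found.add(name)
    return sorted(found)
```

For the free-text fragment "Lung cancer, small cell, stage II", this sketch returns both "lung cancer" and "stage II small cell lung cancer", which illustrates why a further filtering step is needed to keep only the specific concept.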
13
Eliminating irrelevant concepts
Syntactic filter:
• Limit permutation of words to within a sentence.
Semantic filter:
• Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts.
• Use the ISA relationship to filter out general concepts and yield specific concepts.
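A rough sketch of these filters, under stated assumptions: `SEMANTIC_TYPE` and `ISA_PARENT` are hypothetical toy tables standing in for UMLS data, and `sentences`, `ancestors`, and `filter_concepts` are illustrative names rather than parts of IndexFinder.

```python
import re

# Hypothetical toy metadata; real semantic types and ISA links would come from UMLS.
SEMANTIC_TYPE = {
    "lung cancer": "disease",
    "stage II small cell lung cancer": "disease",
    "cell": "cell component",
}
ISA_PARENT = {
    "stage II small cell lung cancer": "small cell lung cancer",
    "small cell lung cancer": "lung cancer",
}

def sentences(text):
    """Syntactic filter: split the text so words are only combined within a sentence."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def ancestors(concept, isa=ISA_PARENT):
    """Collect every more general concept reachable through ISA links."""
    seen = set()
    while concept in isa and isa[concept] not in seen:  # guard against cycles
        concept = isa[concept]
        seen.add(concept)
    return seen

def filter_concepts(candidates, wanted_types, sem=SEMANTIC_TYPE, isa=ISA_PARENT):
    """Semantic filter: keep wanted semantic types, then drop general concepts."""
    typed = [c for c in candidates if sem.get(c) in wanted_types]
    general = set()
    for c in typed:
        general |= ancestors(c, isa)
    return [c for c in typed if c not in general]
```

With the candidates "lung cancer", "stage II small cell lung cancer", and "cell" and the wanted type "disease", only the most specific disease concept survives: "cell" fails the type check and "lung cancer" is an ISA ancestor of the staged concept.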
14
IndexFinder Performance
• Two orders of magnitude faster than conventional approaches
• No NLP
• Knowledge base (UMLS) and index files reside in main memory
• Time complexity is linear in the number of distinct words in the text
Preliminary Evaluation
• IndexFinder generates 4% more concepts than conventional approaches (using a single noun phrase)
• All concepts are relevant
15
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents
16
Query Expansion (QE)
Queries in the following form benefit from expansion:
<key concept> + <general supporting concept(s)>
e.g. lung cancer + diagnosis options
Expansion yields:
<key concept> + <specific supporting concept(s)>
e.g. lung cancer + chest x-ray, bronchography
17
Traditional QE
• Appends all terms that statistically co-occur with the key terms in the query
• Not semantically focused
Original Query: lung cancer, diagnosis options
Expanded Query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate
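To make the lack of semantic focus concrete, here is a minimal sketch of co-occurrence-based expansion. The function name `cooccurrence_expand`, the toy corpus, and the ranking by raw document counts are all assumptions for illustration; real systems typically use weighted co-occurrence statistics.

```python
from collections import Counter

def cooccurrence_expand(query_terms, key_term, docs, top_k=3):
    """Append the top_k terms that most often co-occur with key_term.

    docs is a list of term lists. The scenario part of the query
    (e.g. "diagnosis options") plays no role in choosing the new terms.
    """
    counts = Counter()
    for doc in docs:
        terms = set(doc)
        if key_term in terms:
            counts.update(terms - {key_term})
    # Rank by count (descending), then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [t for t, _ in ranked if t not in query_terms][:top_k]
    return list(query_terms) + extra

# Hypothetical toy corpus of indexed documents.
docs = [
    ["lung cancer", "chemotherapy", "survival rate"],
    ["lung cancer", "chemotherapy", "radiotherapy"],
    ["lung cancer", "chest x-ray"],
    ["asthma", "chest x-ray"],
]
expanded = cooccurrence_expand(["lung cancer", "diagnosis options"], "lung cancer", docs, top_k=1)
```

In this toy corpus the single most frequent co-occurring term is "chemotherapy", so a diagnosis query is expanded with a treatment term.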
18
Knowledge-based QE
[Figure: knowledge source (UMLS, by the NLM), made up of the Semantic Network and the Metathesaurus]
• Semantic Types: Disease or Syndrome, Diagnostic Procedure, Sign or Symptom, Pharmacologic Substance, Body Parts, Injury or Poisoning
• Each Semantic Type covers a class of concepts that belong to it
• Concepts: lung cancer (key concept), chest x-ray (specific supporting concept)
• Relationship labeled in the figure: diagnoses
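One way to picture knowledge-based expansion is as a semantic-type filter over candidate terms. The sketch below is an assumption-laden illustration, not the authors' method: `SEMANTIC_NETWORK`, `CONCEPT_TYPE`, and `SCENARIO_RELATION` are hypothetical toy tables (real entries would come from the UMLS Semantic Network and Metathesaurus), and `knowledge_based_expand` is an invented name.

```python
# Hypothetical toy slice of a semantic network: (source type, relation, target type).
SEMANTIC_NETWORK = {
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}

# Hypothetical concept-to-semantic-type assignments.
CONCEPT_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "cisplatin": "Pharmacologic Substance",
}

# Assumed mapping from the general supporting concept to a relation.
SCENARIO_RELATION = {"diagnosis options": "diagnoses", "treatment options": "treats"}

def knowledge_based_expand(key_concept, scenario, candidates):
    """Keep only candidates whose semantic type relates to the key concept's
    type through the relation implied by the scenario."""
    relation = SCENARIO_RELATION[scenario]
    key_type = CONCEPT_TYPE[key_concept]
    allowed = {src for src, rel, dst in SEMANTIC_NETWORK
               if rel == relation and dst == key_type}
    return [key_concept] + [c for c in candidates if CONCEPT_TYPE.get(c) in allowed]

candidates = ["cisplatin", "chest x-ray", "bronchography", "survival rate"]
diagnosis_query = knowledge_based_expand("lung cancer", "diagnosis options", candidates)
```

Here the diagnosis scenario keeps "chest x-ray" and "bronchography" and drops "cisplatin" and "survival rate", whereas the treatment scenario would keep only "cisplatin".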
19
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents
20
Phrase-based Vector Space Model (VSM)
Query: … lung cancer, …
Document: … lung carcinoma …
Document: … lung neoplasm …
Document: … anti-cancer drug combinations …
Knowledge source:
• lung cancer = lung carcinoma … (√)
• lung neoplasm …, linked by parent_of (√)
• anti-cancer drug combinations: missing!!!
21
Phrase-based VSM Examples
Query: “lung cancer …”
Phrases: [(C0242379); “lung” “cancer”] …
Document: “anti-cancer drug combinations …”
Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
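The phrase representation (a concept identifier plus its word stems) suggests a two-level match: concepts first, stems as a fallback. The sketch below is a simplified illustration under assumptions, not the phrase-based VSM itself: it omits term weighting and cosine normalization, and the `stem_weight` discount and Jaccard overlap are choices made here for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phrase:
    cui: str          # concept identifier from the knowledge source
    stems: frozenset  # word stems of the phrase

def phrase_similarity(p, q, stem_weight=0.5):
    """1.0 for the same concept; otherwise a discounted stem overlap (Jaccard)."""
    if p.cui == q.cui:
        return 1.0
    union = p.stems | q.stems
    if not union:
        return 0.0
    return stem_weight * len(p.stems & q.stems) / len(union)

def score(query, document):
    """Sum, over the query phrases, of the best-matching document phrase."""
    return sum(
        max((phrase_similarity(q, d) for d in document), default=0.0)
        for q in query
    )

query = [Phrase("C0242379", frozenset({"lung", "cancer"}))]
doc_carcinoma = [Phrase("C0242379", frozenset({"lung", "carcinoma"}))]
doc_drugs = [Phrase("C0003393", frozenset({"anti", "cancer", "drug", "combin"}))]
```

With these inputs, the "lung carcinoma" document matches through the shared concept, while the "anti-cancer drug combinations" document still receives a small non-zero score through the shared stem "cancer", even though the knowledge source provides no link between the two concepts.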
22
Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
[Figure: precision-recall curves. x-axis: recall, 0 to 1; y-axis: average precision over 105 queries, 0 to 1. Recovered legend entry: Stems. Recovered annotations: 16%, 100 queries, vs. 5%, 50 queries.]
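For readers unfamiliar with this kind of plot, the sketch below shows how precision at fixed recall levels can be computed for one query. It assumes the figure uses the standard interpolated precision-recall curve, which the slides do not state; the function names and the toy ranking are illustrative only. Averaging the resulting lists across queries would give one curve.

```python
def precision_recall_points(ranked, relevant):
    """(recall, precision) at each rank where a relevant document is retrieved."""
    hits = 0
    points = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

def interpolated_precision(ranked, relevant, levels=tuple(i / 10 for i in range(11))):
    """Interpolated precision: the best precision at any recall >= each level."""
    points = precision_recall_points(ranked, relevant)
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]

# Hypothetical ranking for one query: d1 and d3 are the relevant documents.
curve = interpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```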
23
System Overview
Patient reports
Medical literature
Medical Digital Library(MDL)
Teaching materials
Query results
Ad-hoc query
Patient report for content correlation
News Articles
24
Application: Query Answering via Templates
Sample templates: “<disease>, treatment”, “<disease>, diagnosis”
Example:
• Template: “<disease>, treatment”, with <disease> = lung cancer
• Query: lung cancer, treatment
• Expanded query: lung cancer, radiotherapy, chemotherapy, cisplatin, …
• Components: IndexFinder, Query Expansion, Phrase-based VSM
• Output: relevant documents
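Filling a template is the simplest step in this flow. A minimal sketch, assuming templates are plain strings with a `<disease>` placeholder and comma-separated parts; `fill_template` is a hypothetical helper, not part of the described system.

```python
def fill_template(template, disease):
    """Substitute the disease into the template and split it into query terms."""
    query = template.replace("<disease>", disease)
    return [part.strip() for part in query.split(",") if part.strip()]

terms = fill_template("<disease>, treatment", "lung cancer")
```

The resulting terms, a key concept followed by a general supporting concept, have the form that the query expansion step takes as input.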
25
Applications (cont’d)
Scenario-specific content correlation
• Inputs: Patient Report; Scenario Selection (e.g. treatment, diagnosis, etc.); Query Templates
• Components: IndexFinder, Query Expansion, Phrase-based VSM
• Output: relevant documents
26
Conclusion
• A knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval.
• IndexFinder: uses word permutation together with syntactic and semantic filtering to extract domain-specific key concepts from the free text for indexing.
• Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents.
• Phrase-based indexing: transforms document indexing into the phrase paradigm (a concept and its word stems) to improve retrieval effectiveness.
27
Acknowledgement
This research is supported in part by NIC/NIH Grant #4442511-33780
31
Demo http://fargo.cs.ucla.edu/umls/search.aspx
Test Texts
• Technically successful left lower lobe nodule biopsy.
• Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus.
• CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule.
• Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted.
• There may be a tiny left apical air collection in the pleural space lateral to the apical bulla.
• Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.