©2013 MFMER | slide-1
An Incremental Approach to MEDLINE MeSH Indexing
Presenter: Hongfang Liu
BioASQ 2013
Team Members:
Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu
University of Delaware: Dongqing Zhu, Ben Carterette
Outline
• Motivation & Task
• Incremental Systems
  • MetaMap-based
  • Search-based
  • LLDA-based
• Experiment Setup
• Evaluation
• Conclusion
Motivation of BioASQ Task
• Reduce human effort in MeSH indexing
  • Increasing number of new articles
  • Low consistency among annotators [Funk and Reid]
• Automatic MeSH indexing
  • Suggest MeSH terms for a given new article
Motivation of Mayo’s Participation
• Information retrieval (IR)-based ontology annotation
  • Traditional approach has been information extraction-based
• Three levels of intelligence in artificial intelligence
  • Knowledge-base intelligence
  • Data intelligence
  • User intelligence
> Explore the use of topic modeling and distant supervision for ontology annotation
Proposed Approaches
• MetaMap-based
• Search-based
• LLDA-based
The three approaches can work either independently or together in an incremental way
[Diagram: boxes labeled DUI (MeSH descriptor unique identifier)]
MetaMap-based System
Example input:
Title: Age-period-cohort effect on mortality from cervical cancer.
Abstract: to estimate the effect of age, period and birth cohort …
MetaMap restricted to MeSH ontology => CUI candidates:
CUI       Score
C0007847  1000
C0302592  1000
C0998265  861
…         …
A ranked list of CUIs => a ranked list of DUIs
Parameters: Title_score, score threshold, top DUI
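The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the function name, the `in_title` flag, the CUI-to-DUI mapping, the placeholder DUI names, and the choice to apply the threshold to the raw MetaMap score before the title weight are all assumptions. Only the three CUIs and scores come from the slide.

```python
from collections import defaultdict

def rank_duis(candidates, cui_to_dui, title_weight=2.0, threshold=800, top_n=10):
    """Turn scored CUI candidates into a ranked DUI list.

    candidates: list of (cui, metamap_score, in_title) tuples.
    cui_to_dui: hypothetical lookup from CUI to MeSH DUI.
    """
    best = defaultdict(float)
    for cui, score, in_title in candidates:
        if score < threshold:          # assumed: threshold on the raw score
            continue
        dui = cui_to_dui.get(cui)
        if dui is None:                # CUI with no MeSH descriptor
            continue
        weighted = score * title_weight if in_title else score
        best[dui] = max(best[dui], weighted)
    # Highest score first; ties broken by DUI for a stable order.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# CUIs and scores from the slide; title flags and DUI names are made up.
candidates = [("C0007847", 1000, True),
              ("C0302592", 1000, False),
              ("C0998265", 861, False)]
mapping = {"C0007847": "DUI_A", "C0302592": "DUI_B", "C0998265": "DUI_C"}
top = rank_duis(candidates, mapping, title_weight=2.0, threshold=900, top_n=2)
```

The three keyword arguments correspond to the three tuned parameters: title weight, score threshold, and top DUI.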
MetaMap-based System
• Parameter Tuning
  Parameters: title weight, score threshold, top DUI
  • Title concepts are more important
  • Low threshold roughly leads to high precision/recall
  • Tradeoff between P/R
Search-based System
• Retrieval Model
  [Equation not recovered; its legend lists:]
  – query term – query weight – matching function – document – Dirichlet parameter
• DUI Aggregation
  [Diagram: Docs, each with a list (D01, D02, D03 …; D08, D03, D01 …; D02, D03, D01 …), aggregated into DUI ranked by tf * score(Q, D)]
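One possible reading of "ranked by tf * score(Q, D)" is sketched below. It assumes that each retrieved document carries a list of DUIs, that tf is 1 for each document labeled with a DUI, and that retrieval scores are non-negative. The function name, parameter names, and the numeric scores are hypothetical.

```python
from collections import defaultdict

def aggregate_duis(retrieved, top_docs=3, top_duis=5):
    """Rank DUIs found in the top retrieved documents.

    retrieved: list of (retrieval_score, [dui, ...]), best document first.
    Each DUI accumulates the score of every top document labeled with it.
    """
    totals = defaultdict(float)
    for score, duis in retrieved[:top_docs]:
        for dui in set(duis):          # a label counts once per document
            totals[dui] += score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_duis]

# DUI lists mirror the slide's diagram; the scores are invented.
retrieved = [(3.0, ["D01", "D02", "D03"]),
             (2.0, ["D08", "D03", "D01"]),
             (1.0, ["D02", "D03", "D01"])]
ranking = aggregate_duis(retrieved, top_docs=3, top_duis=3)
```

DUIs shared by many highly ranked documents rise to the top, which is the effect the aggregation step is after.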
Search-based System
• Term Query
  • A single-word expression
  • Concept-related words in title and abstract
• Phrase Query
  • A multi-word expression
  • Concept-related phrases in title and abstract
• Long Query
  • Mix of TQ and PQ
Example queries:
#weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess)
#weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
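Query strings of this shape can be assembled mechanically. The sketch below assumes the pattern visible in the examples: single words are used as-is, and an n-word phrase is wrapped in an unordered window `#uwN` of size n. The helper names are hypothetical, and how the weights themselves are chosen is not shown here.

```python
def indri_expr(expression):
    """Render one expression: a bare word, or #uwN(...) for an N-word phrase."""
    words = expression.split()
    if len(words) == 1:
        return words[0]
    return "#uw{}({})".format(len(words), " ".join(words))

def build_weighted_query(weighted_expressions):
    """weighted_expressions: list of (expression, weight) pairs."""
    parts = ["{} {}".format(weight, indri_expr(expr))
             for expr, weight in weighted_expressions]
    return "#weight(" + " ".join(parts) + ")"

# A long query mixing phrase and term expressions (weights from the slide).
query = build_weighted_query([("hiv-1 infection", 3.5),
                              ("actin", 2.0),
                              ("naive t cells", 4.5)])
```

A term query passes only single words, a phrase query only multi-word expressions, and a long query mixes both.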
Search-based System
• Parameter Tuning
  Parameters: Dirichlet smoothing parameter, top-ranked documents, top-ranked DUI
  • Less smoothing => better performance
  • A small set of highly relevant documents
  • Tradeoff between P/R
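To make the role of the smoothing parameter concrete, here is a minimal sketch of weighted query-likelihood scoring with Dirichlet smoothing in its standard textbook form. It is not necessarily the exact retrieval model used in this work, it handles single-word terms only, and the names `dirichlet_score`, `coll_prob`, and `mu` as well as the toy data are assumptions.

```python
import math
from collections import Counter

def dirichlet_score(weighted_terms, doc_tokens, coll_prob, mu=500.0):
    """Sum of weight * log P(term | doc) with Dirichlet-smoothed estimates.

    weighted_terms: list of (term, weight) pairs.
    coll_prob: collection-level probability of each term.
    mu: Dirichlet parameter; larger values mean more smoothing.
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term, weight in weighted_terms:
        background = coll_prob.get(term, 1e-9)   # floor for unseen terms
        p = (tf[term] + mu * background) / (doc_len + mu)
        score += weight * math.log(p)
    return score

query_terms = [("actin", 2.0)]
coll_prob = {"actin": 0.01}
matching = ["actin", "cell", "actin", "t"]
other = ["cell", "cell", "cell", "t"]
light = dirichlet_score(query_terms, matching, coll_prob, mu=10)
heavy = dirichlet_score(query_terms, matching, coll_prob, mu=1000)
```

With a small `mu` the document's own term counts dominate the estimate; with a large `mu` every document drifts toward the collection statistics, so matching and non-matching documents become harder to separate.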
Systems
• LLDA-based
  • LDA Process
    • Each document is a mixture of topics
    • Each topic is a multinomial word distribution
  • Labeled LDA
    • Incorporate label information
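The generative story behind these bullets can be sketched as below. This toy version shows only the sampling side, not inference, and it encodes the key Labeled LDA constraint as commonly described: a document's topic mixture is restricted to the topics named by its labels. The word distributions, function name, and symmetric Dirichlet prior are illustrative assumptions.

```python
import random

def generate_document(topic_words, doc_labels, length, rng):
    """Sample words for one document under a Labeled-LDA-style process.

    topic_words: {topic: {word: probability}}, one multinomial per topic.
    doc_labels: the document's labels; only these topics may be used.
    """
    allowed = sorted(doc_labels)
    # Normalized Gamma(1, 1) draws give a sample from a flat Dirichlet.
    raw = [rng.gammavariate(1.0, 1.0) for _ in allowed]
    total = sum(raw)
    mixture = [x / total for x in raw]
    words = []
    for _ in range(length):
        topic = rng.choices(allowed, weights=mixture)[0]
        vocab, probs = zip(*topic_words[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return words

# Toy topics; the probabilities are invented for illustration.
topic_words = {
    "Anatomy": {"cervix": 0.6, "tissue": 0.4},
    "Diseases": {"cancer": 0.7, "mortality": 0.3},
    "Chemicals and Drugs": {"actin": 1.0},
}
doc = generate_document(topic_words, {"Anatomy", "Diseases"}, 20, random.Random(0))
```

Plain LDA would let the mixture range over all topics; the label restriction is what ties each learned topic to a named category.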
Systems
• LLDA-based
  • Top categories in MeSH
    • Top-level categories as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.)
    • Each label below the root is converted to the corresponding top-level labels
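One way to perform this conversion relies on MeSH tree numbers, whose first letter identifies the top-level category. The sketch below assumes tree numbers are available for each label; the slides do not say how the conversion was actually done. The table is deliberately partial, and the function name and example tree numbers are for illustration only.

```python
# Partial table of MeSH top-level categories, keyed by tree-number prefix.
TOP_LEVEL = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
}

def top_level_labels(tree_numbers):
    """Map tree numbers such as 'A01.236' to their top-level category names."""
    categories = {TOP_LEVEL[t[0]] for t in tree_numbers
                  if t and t[0] in TOP_LEVEL}
    return sorted(categories)

# Illustrative tree numbers for one document's labels.
labels = top_level_labels(["A01.236", "D12.776", "A05.360"])
```

A descriptor that appears in several branches of the tree contributes one top-level label per branch.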
Systems
• LLDA-based
  • DUI candidate list pruning
[Diagram: doc => Search-based => DUI candidates; doc => LLDA-based => Categories; result: a pruned rank list]
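A rough sketch of the pruning step follows, under the assumption that a candidate DUI is kept only when at least one of its top-level categories is among those predicted by the LLDA-based system. The exact pruning rule is not given in the slides, and every name and value below is hypothetical.

```python
def prune_candidates(ranked_duis, dui_categories, predicted_categories):
    """Drop candidate DUIs whose categories were not predicted.

    ranked_duis: list of (dui, score), best first (search-based output).
    dui_categories: {dui: set of top-level category names}.
    predicted_categories: categories predicted for the document.
    """
    keep = set(predicted_categories)
    return [(dui, score) for dui, score in ranked_duis
            if dui_categories.get(dui, set()) & keep]

ranked = [("D01", 6.0), ("D03", 6.0), ("D02", 4.0)]
categories = {"D01": {"Diseases"},
              "D02": {"Organisms"},
              "D03": {"Anatomy", "Diseases"}}
pruned = prune_candidates(ranked, categories, {"Diseases"})
```

The filter preserves the original ranking order, so the search-based scores still decide precedence among the surviving candidates.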
Data
Training -- <PMID, title, abstract, labels>
Testing -- input: <PMID, title, abstract>
           output: <PMID, labels>
Evaluation
MM: MetaMap-based system
Mi: micro
LCA: lowest common ancestor
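For readers unfamiliar with the "micro" measures, the sketch below shows the usual definition of micro-averaged precision, recall, and F-measure over <PMID, labels> pairs: counts are pooled across all articles before the ratios are taken. It illustrates the general metric rather than the official evaluation code, and the names and toy data are assumptions.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1.

    gold, predicted: {pmid: set of labels}.
    """
    tp = fp = fn = 0
    for pmid, gold_labels in gold.items():
        pred = predicted.get(pmid, set())
        tp += len(gold_labels & pred)   # correctly suggested labels
        fp += len(pred - gold_labels)   # suggested but not in gold
        fn += len(gold_labels - pred)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {"1": {"D01", "D02"}, "2": {"D03"}}
predicted = {"1": {"D01", "D08"}, "2": {"D03"}}
p, r, f = micro_prf(gold, predicted)
```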
Conclusion and Future Work
• Three Systems
  • MetaMap-based, search-based, LLDA-based
• Research findings
  • Explored impact of various parameters on performance
  • Promising results from search-based labeling
• Future Direction
  • Better concept weighting strategies
    • E.g., corpus-level statistics, external resources
  • Comprehensive comparisons with existing methods
  • A better strategy for incorporating hierarchical information into LLDA
Questions & Discussion