Jason H.D. Cho1,2, Parikshit Sondhi1, Chengxiang Zhai1, Bruce R. Schatz1,2,3
1Department of Computer Science, 2Institute of Genomic Biology, 3Department of Medical Information Science, University of Illinois at Urbana-Champaign, Urbana, IL
Resolving Healthcare Forum Posts via Similar Thread Retrieval
• 72% of internet users looked online for health information within the past year
• 18% of internet users have gone online to find others who might have health concerns similar to theirs
• Improving health information retrieval and similar case retrieval will improve the quality of search for the vast majority of users
• Not many posts are answered in a timely manner!
* Pew Research http://www.pewinternet.org/
Motivation
Envisioned Response
The following threads discuss similar problems:
• Doritos Allergy Very Severe and New
• Certain Foods + Beer = Flushing and Head Pounding…Help!
• Peanut/Food Allergies
……………………
• Traditionally defined as retrieving relevant cases that doctors may be interested in
  • Doctors may want to compare cases that are similar to the current patient
• In the online domain, we define this as retrieving forum posts written by patients
• We tackle cases where we do not know the user's background
Case Retrieval Task
Query Characteristics
• Queries meant for human experts not automated systems
• Simple non-technical language
• Presence of emotional statements
Document Characteristics
• How can we improve the case retrieval search task?
  • How should we represent queries?
  • Entity-based search, or context-based search?
  • Which posts are most informative in a given thread?
  • Can we utilize forum categories?
Our Goal
Evaluation via Pooling
• 350K threads and 20 queries from HealthBoards
• 2 judges first judged 100 query-thread pairs
  • 88% agreement (κ=0.76)
• 730 total judged query-thread pairs
  • 324 relevant
  • 406 irrelevant
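As a rough illustration of how the agreement figure relates to κ, the sketch below computes Cohen's kappa from two judges' labels. The label lists are toy data invented for this example, not the actual judgments; they are chosen so that observed agreement is 88% and chance agreement is 0.5, which gives κ=0.76.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items both judges labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (1 = relevant, 0 = irrelevant), invented for illustration only.
judge_a = [1] * 50 + [0] * 50
judge_b = [1] * 44 + [0] * 6 + [0] * 44 + [1] * 6
print(round(cohen_kappa(judge_a, judge_b), 2))  # prints 0.76
```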
Method Summary
• Baseline weighting
  • First Post BM-25
  • Thread BM-25
• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction
• Post weighting
  • Monotonic weighting
  • Parabolic weighting
• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)
Q: How should we represent queries?
State of the Art Baseline
• Baseline BM-25 formula:
• c(w,t): Count of word w in thread t
• c(w,q): Count of word w in query q
• FPBM-25: Consider only the content of first post to represent the thread document
• TBM-25: Consider content of entire thread to represent the thread document
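The two baselines differ only in which text represents a thread. A minimal sketch, assuming a standard Okapi BM25 variant: the parameters k1, b, and k3, the smoothed IDF, the whitespace tokenizer, and the function names are all assumptions for illustration, not taken from the slides.

```python
import math
from collections import Counter


def thread_as_document(thread_posts, first_post_only):
    """FPBM-25 keeps only the first post; TBM-25 keeps every post."""
    posts = thread_posts[:1] if first_post_only else thread_posts
    return [tok for post in posts for tok in post.lower().split()]


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75, k3=1000.0):
    """Score each tokenized document against the query (Okapi BM25 variant)."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = (sum(len(d) for d in docs_tokens) / n_docs) or 1.0
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))  # document frequency of each word
    q_counts = Counter(query_tokens)  # c(w, q)
    scores = []
    for d in docs_tokens:
        d_counts = Counter(d)  # c(w, t)
        norm = k1 * ((1 - b) + b * len(d) / avg_len)
        score = 0.0
        for w, c_wq in q_counts.items():
            c_wt = d_counts.get(w, 0)
            if c_wt == 0:
                continue
            idf = math.log((n_docs - df[w] + 0.5) / (df[w] + 0.5) + 1.0)
            doc_part = (k1 + 1) * c_wt / (norm + c_wt)
            query_part = (k3 + 1) * c_wq / (k3 + c_wq)
            score += idf * doc_part * query_part
        scores.append(score)
    return scores


# Toy threads, invented for illustration only.
threads = [
    ["doritos allergy stomach cramps", "try an antihistamine"],
    ["knee pain after running", "allergy season is bad"],
]
query = "doritos allergy stomach cramps".split()
fp_docs = [thread_as_document(t, first_post_only=True) for t in threads]
tb_docs = [thread_as_document(t, first_post_only=False) for t in threads]
fp_scores = bm25_scores(query, fp_docs)
tb_scores = bm25_scores(query, tb_docs)
```

In this toy case the second thread only matches the query through a reply, so it scores zero under the first-post representation and above zero under the whole-thread one.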
Results: Query Representation Comparison
Run  Method            P@5             Recall@30       MAP
B1   Baseline TBM-25   0.3000          0.2846          0.1977
B2   Baseline FPBM-25  0.4700 (56.6%)  0.4975 (74.8%)  0.3316 (67.7%)
Representing each thread by its first post is better than using all of its posts
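For reference, the three reported measures can be sketched as follows, using their usual textbook definitions. The function names and the toy ranking are assumptions for illustration, and each ranked list is assumed to contain no duplicate thread IDs. MAP is the mean of the average precision over all queries.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved threads that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant threads that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids)


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank holding a relevant thread."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy ranking, invented for illustration only.
ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t9"}
print(precision_at_k(ranked, relevant, 5))  # prints 0.4
```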
Q: Which one works better? Entity-based search, or context-based search?
Medical Entity Extraction
• Applied ADEPT toolkit (MacLean and Heer 2013)
• High precision but low recall
MedicalEx: Relevance Scoring
• Modified query frequency, built from two counts:
  • Count of occurrences labeled as med entity
  • Count of occurrences not labeled as med entity
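One way to read these labels is that a query word's frequency is recomputed from its medical-entity and non-entity occurrences. The sketch below assumes a simple weighted sum; the weights, the function name, and the (token, is_med) input format are hypothetical and do not reflect ADEPT's actual output or the exact formula on the slide.

```python
from collections import Counter


def modified_query_frequency(tagged_tokens, med_weight=2.0, other_weight=1.0):
    """Combine per-word counts of medical and non-medical occurrences.

    tagged_tokens: iterable of (token, is_med) pairs, where is_med says
    whether that occurrence was labeled as a medical entity.
    The two weights are hypothetical tuning parameters.
    """
    med, other = Counter(), Counter()
    for tok, is_med in tagged_tokens:
        if is_med:
            med[tok] += 1
        else:
            other[tok] += 1
    return {
        w: med_weight * med[w] + other_weight * other[w]
        for w in set(med) | set(other)
    }


# Toy tagging, invented for illustration only.
tagged = [
    ("cramps", True), ("stomach", True),
    ("chips", False), ("chips", False), ("cramps", False),
]
freq = modified_query_frequency(tagged)
```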
Shallow Information Extraction (Sondhi, 2010)
• Physical Examination (PE): Disease, Symptoms
• Medication (MED): Treatment, Prevention
• Background (BKG): Neither PE nor MED
Example post:
I am severly allergic to some product that is found in both Tostitos and Doritos, as well as random other types of chips. I know the solution is "don't eat chips" but what could the product be? I don't want to accidentally consume it. When I eat this, I get very bad stomach cramps and it ruins the rest of my day/night - the only solution is to go to sleep so I can't feel it. Help! Any ideas on this?
ShallowEx: Relevance Scoring
• Give higher importance to PE and MED sentences
• Modified query count, built from three counts:
  • Word count in PE sentences
  • Word count in MED sentences
  • Word count in BKG sentences
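A minimal sketch of such a modified query count, assuming each query sentence already carries a PE, MED, or BKG label and that the three word counts are combined as a weighted sum. The weight values and names are hypothetical, picked only so that PE and MED sentences count more than BKG ones.

```python
from collections import Counter

# Hypothetical weights: PE and MED sentences matter more than BKG ones.
SENTENCE_WEIGHTS = {"PE": 2.0, "MED": 2.0, "BKG": 1.0}


def modified_query_count(labeled_sentences, weights=None):
    """Weighted word counts over sentences labeled PE, MED, or BKG.

    labeled_sentences: iterable of (label, sentence_text) pairs.
    """
    weights = SENTENCE_WEIGHTS if weights is None else weights
    counts = Counter()
    for label, sentence in labeled_sentences:
        for tok in sentence.lower().split():
            counts[tok] += weights[label]
    return dict(counts)


# Toy labeled query, invented for illustration only.
query_sentences = [
    ("PE", "bad stomach cramps"),
    ("MED", "sleep helps cramps"),
    ("BKG", "i eat chips"),
]
counts = modified_query_count(query_sentences)
```

Here "cramps" ends up with a count of 4.0 because it occurs in both a PE and a MED sentence, while "chips" from the BKG sentence stays at 1.0.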
Results: Semantic Methods
Run  Method            P@5           Recall@30       MAP
B2   Baseline FPBM-25  0.4700        0.4975          0.3316
S1   B2+MedEx          0.4600        0.4283          0.2918
S2   B2+ShallowEx      0.53 (12.7%)  0.4847 (-2.5%)  0.3481 (4.9%)
Shallow extraction is better than medical entity extraction
Q: Which posts are most informative in a given thread?
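Post Weighting (Sondhi, 2013)
• Not all posts are equally representative
• Weighted count of word w in thread t: $c'(w,t) = \sum_{i=1}^{K} f(i,K)\, c(w,p_i)$
• For a thread with K = 3 posts, the terms run from $f(1,3)\, c(w,p_1)$ to $f(3,3)\, c(w,p_3)$
• $f(i,K)$: gives the weight of post i in a thread with K posts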
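The weighted thread count can be sketched as below: each post's word counts are scaled by f(i, K) before being summed over the thread. The linear decay used here is only a hypothetical example of a monotonic f, not the exact monotonic or parabolic function from the slides, and all names are assumptions.

```python
from collections import Counter


def uniform_weight(i, k):
    """Every post in the thread counts equally."""
    return 1.0 / k


def linear_decay_weight(i, k):
    """Hypothetical monotonic f(i, K): weights K, K-1, ..., 1, normalized."""
    return (k - i + 1) / (k * (k + 1) / 2)


def weighted_thread_counts(posts_tokens, weight_fn):
    """Weighted count of each word in a thread.

    posts_tokens: list of token lists, one per post, in thread order.
    weight_fn: f(i, K) for the 1-based post position i in a thread of K posts.
    """
    k = len(posts_tokens)
    counts = Counter()
    for i, post in enumerate(posts_tokens, start=1):
        f = weight_fn(i, k)
        for w, c in Counter(post).items():
            counts[w] += f * c
    return dict(counts)


# Toy thread with K = 3 posts, invented for illustration only.
posts = [["cramps", "chips"], ["cramps"], ["sleep"]]
decayed = weighted_thread_counts(posts, linear_decay_weight)
```

With this decay the three posts get weights 3/6, 2/6, and 1/6, so words from the first post dominate the thread representation.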
Monotonic Post Weighting
[Figure: relative post weight vs. post position i, for K=10; curves m1, m2, m3]
Parabolic Post Weighting
Post Weighting Methods Evaluation
[Figure: accuracy (axis range 0.4 to 0.8) by forum used (FF, UF, LQ, Cross Forum) for Uniform, Monotonic, and Parabolic weighting]
Results: Post Weighting
Run  Method            P@5            Recall@30      MAP
B2   Baseline FPBM-25  0.4700         0.4975         0.3316
P1   Monotonic         0.5100 (8.5%)  0.5240 (5.3%)  0.3631 (9.5%)
P2   Parabolic         0.5100 (8.5%)  0.5040         0.3494
Both post weighting schemes outperform the baseline
Q: Can we utilize forum categories?
Forum Categories
• Relevance feedback based on top k retrieved categories
  • Forum Category Uniform weighting (FCUW)
  • Forum Category Feedback weighting (FCFW)
Forum Category Weighting
• Randomly selecting forum ID
• Ratio of current forum ID amongst retrieved documents
Forum Category Weighting Scoring
• New score
• Forum Category Feedback weighting
• Weights for forum category weighting
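A rough sketch of category feedback, assuming the weight of a forum category is its share among the top k retrieved threads and that the new score boosts the original score by that share. The multiplicative combination, the parameter lam, and all names are hypothetical; the slides do not give the exact scoring formula.

```python
from collections import Counter


def category_feedback_weights(ranked_forum_ids, k):
    """Share of each forum ID among the top-k retrieved threads."""
    top = ranked_forum_ids[:k]
    if not top:
        return {}
    return {fid: n / len(top) for fid, n in Counter(top).items()}


def rescore(results, k=10, lam=0.5):
    """Re-rank results using forum category feedback.

    results: list of (thread_id, forum_id, score), sorted by score descending.
    lam: hypothetical strength of the category boost.
    """
    ratios = category_feedback_weights([fid for _, fid, _ in results], k)
    rescored = [
        (tid, fid, score * (1.0 + lam * ratios.get(fid, 0.0)))
        for tid, fid, score in results
    ]
    return sorted(rescored, key=lambda r: r[2], reverse=True)


# Toy retrieval results, invented for illustration only.
results = [
    ("t1", "allergies", 3.0),
    ("t2", "diet", 2.9),
    ("t3", "allergies", 2.8),
    ("t4", "allergies", 2.0),
]
reranked = rescore(results, k=4, lam=0.5)
```

In this toy run the "allergies" forum holds three of the top four threads, so thread t3 moves ahead of t2 after rescoring.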
Results: Forum Category Weighting
Run  Method              P@5             Recall@30       MAP
B2   Baseline FPBM-25    0.4700          0.4975          0.3316
P1   Uniform weighting   0.5200 (10.6%)  0.4678 (-7.0%)  0.3334 (0.5%)
P2   Feedback weighting  0.5100 (8.5%)   0.4610 (-7.3%)  0.3389 (2.2%)
Uniform weighting and Feedback weighting perform similarly, but FCFW has fewer parameters to tune.
Results: Method Combinations
Run  Method                        P@5             Recall@30       MAP
B2   Baseline FPBM-25              0.4700          0.4975          0.3316
S2   Baseline FPBM-25 + ShallowEx  0.53            0.4847          0.3481
C2   Monotonic + ShallowEx         0.5400 (14.9%)  0.5354 (7.6%)   0.3745 (12.9%)
C3   Parabolic + ShallowEx         0.5100          0.5155          0.3573
C4   Monotonic + ShallowEx + FCFW  0.5200          0.5625 (13.1%)  0.3702
Monotonic + ShallowEx performs the best
Conclusion
• Fairly high P@5 accuracy is achievable
• Representing a thread by its first post performed better than using all posts in the thread
• Shallow information extraction is better for query understanding
  • Incorporates contextual information
• Utility of posts drops steadily with position
• Easy extension of baseline method
Future Work
• Recommending relevant forum posts for doctors
  • Various online forums have an ‘ask a doctor’ section
  • Doctors will save time by recommending forum posts
• Intent-based case retrieval
  • Identifying intents for both the end user and the existing posts will improve search quality
  • Examples: cause of symptom, managing disease, adverse effects
• This work is supported in part by the National Science Foundation under Grant Number CNS-1027965. We would also like to thank the anonymous reviewers for their invaluable feedback, and the Institute of Genomic Biology for their computing resources.
Acknowledgements
Questions?
Thank you!
• J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.
• J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. ACM BCB, 2014.
• K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.
• P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.
• B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.
• D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, pages amiajnl–2012–001110+, May 2013.
References
Features & Performance of Shallow Information Extraction Method
[Figure: Performance results for different feature sets. Percentage accuracy (axis range 60 to 76) of Order-1 CRF and SVM by feature set: unigrams, +semantic, +position, +morphological, +wordcount, +threadcreator, +bigrams, Feat. Selection]
We use the best-performing SVM-based classifier (Posts: 175, Sentences: 1494)
ShallowEx: Extraction Model