
Resolving Healthcare Forum Posts via Similar Thread Retrieval

Date post: 06-Jan-2016
Upload: harva
Description:
Resolving Healthcare Forum Posts via Similar Thread Retrieval. Jason H.D. Cho 1,2 , Parikshit Sondhi 1 , Chengxiang Zhai 1 , Bruce R. Schatz 1,2,3 1 Department of Computer Science, 2 Institute of Genomic Biology, 3 Department of Medical Information Science, - PowerPoint PPT Presentation
Transcript
Page 1: Resolving Healthcare Forum Posts via Similar Thread  Retrieval

Jason H.D. Cho (1,2), Parikshit Sondhi (1), Chengxiang Zhai (1), Bruce R. Schatz (1,2,3)

(1) Department of Computer Science, (2) Institute of Genomic Biology, (3) Department of Medical Information Science, University of Illinois at Urbana-Champaign, Urbana, IL

Page 2

Motivation

• 72% of internet users looked online for health information within the past year

• 18% of internet users have gone online to find others who might have health concerns similar to theirs

• Improving health information retrieval and similar case retrieval will improve the quality of search for the vast majority of users

• Not many posts are answered in a timely manner!

* Pew Research http://www.pewinternet.org/

Page 3

Motivation

Page 4

Envisioned Response

The following threads discuss similar problems:

• Doritos Allergy Very Severe and New
• Certain Foods + Beer = Flushing and Head Pounding…Help!
• Peanut/Food Allergies
• ……………………

Page 5

Case Retrieval Task

• Traditionally defined as retrieving relevant cases that doctors may be interested in
  • Doctors may want to compare cases that are similar to the current patient

• In the online domain, we define this as retrieving forum posts written by patients

• We tackled cases where we do not know the user's background

Page 6

Query Characteristics

• Queries are meant for human experts, not automated systems

• Simple, non-technical language

• Presence of emotional statements

Page 7

Document Characteristics

Page 8

Our Goal

• How can we improve the case retrieval search task?
  • How should we represent queries?
  • Entity-based search, or context-based search?
  • Which posts are most informative in a given thread?
  • Can we utilize forum categories?

Page 9

Evaluation via Pooling

• 350K threads and 20 queries from HealthBoards

• 2 judges first judged 100 query-thread pairs
  • 88% agreement (κ = 0.76)

• 730 total judged query-thread pairs
  • 324 relevant
  • 406 irrelevant
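The agreement figures are consistent with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). With observed agreement p_o = 0.88 and κ = 0.76, the implied chance agreement is p_e = (0.88 - 0.76) / (1 - 0.76) = 0.5. The slide does not name the kappa variant, so reading it as Cohen's kappa is an assumption. The sketch below computes kappa from two judges' labels; the label lists are toy data, not the study's judgments, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_o: fraction of items both judges labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: sum over labels of the product of each judge's marginal rate.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both judges used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Invert kappa = (p_o - p_e) / (1 - p_e) to recover p_e."""
    return (p_o - kappa) / (1 - kappa)


# Toy labels (hypothetical): R = relevant, I = irrelevant.
judge_1 = ["R", "R", "I", "I"]
judge_2 = ["R", "I", "I", "I"]
print(cohen_kappa(judge_1, judge_2))         # 0.5
print(implied_chance_agreement(0.88, 0.76))  # approximately 0.5
```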

Page 10

Method Summary

• Baseline weighting
  • First Post BM-25
  • Thread BM-25

• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction

• Post weighting
  • Monotonic weighting
  • Parabolic weighting

• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)

Q: How should we represent queries?

Page 11

State of the Art Baseline

• Baseline BM-25 formula:
  • c(w,t): count of word w in thread t
  • c(w,q): count of word w in query q

• FPBM-25: consider only the content of the first post to represent the thread document

• TBM-25: consider the content of the entire thread to represent the thread document
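The slide's BM-25 formula did not survive extraction. As a stand-in, here is a minimal sketch of the standard Okapi BM25 written over the slide's c(w,t) and c(w,q), assuming the common defaults k1 = 1.2, b = 0.75, a query-term saturation parameter k3, and a Lucene-style idf with +1 inside the log. The authors' exact variant and parameter values are not given, so treat these as assumptions; `bm25_score` and `thread_document` are hypothetical names.

```python
import math
from collections import Counter


def bm25_score(query_tokens, doc_tokens, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75, k3=1000.0):
    """Okapi BM25 score of one thread document for a query (textbook form)."""
    c_q = Counter(query_tokens)  # c(w,q): count of word w in query q
    c_t = Counter(doc_tokens)    # c(w,t): count of word w in thread t
    doc_len = len(doc_tokens)
    score = 0.0
    for w, qtf in c_q.items():
        tf = c_t[w]
        if tf == 0:
            continue  # words absent from the thread contribute nothing
        df = doc_freq.get(w, 0)
        # Lucene-style idf: the +1 inside the log keeps it non-negative.
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturating, length-normalized thread term frequency.
        tf_part = (k1 + 1) * tf / (k1 * (1 - b + b * doc_len / avg_doc_len) + tf)
        # Saturating query term frequency.
        qtf_part = (k3 + 1) * qtf / (k3 + qtf)
        score += idf * tf_part * qtf_part
    return score


def thread_document(posts, first_post_only):
    """FPBM-25 keeps only the first post; TBM-25 concatenates every post."""
    if first_post_only:
        return list(posts[0])
    return [tok for post in posts for tok in post]


posts = [["chips", "allergy", "cramps"], ["try", "gluten", "free", "chips"]]
doc = thread_document(posts, first_post_only=True)
print(bm25_score(["chips", "allergy"], doc, {"chips": 2, "allergy": 1},
                 num_docs=10, avg_doc_len=3.0))
```

Switching between FPBM-25 and TBM-25 only changes which tokens form the thread document; the scoring function itself is unchanged.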

Page 12

Results: Query Representation Comparison

Run  Method            P@5             Recall@30       MAP
B1   Baseline TBM-25   0.3000          0.2846          0.1977
B2   Baseline FPBM-25  0.4700 (56.6%)  0.4975 (74.8%)  0.3316 (67.7%)

Representing the first post as the query is better than utilizing all of the posts
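P@5, Recall@30, and MAP are standard ranked-retrieval metrics, and the parenthesized percentages appear to be relative change over the baseline row. A minimal sketch, assuming binary relevance from the pooled judgments and that Recall@k is measured against all judged-relevant threads for the query; the thread IDs are toy data and the function names are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved threads that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all judged-relevant threads that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranking[:k] if doc in relevant) / len(relevant)


def average_precision(ranking, relevant):
    """Sum of P@k at each rank k holding a relevant thread, over all relevant."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(value, baseline):
    """Percentage change of a metric over the baseline run."""
    return (value - baseline) / baseline * 100.0


ranking = ["t1", "t2", "t3", "t4"]
relevant = {"t1", "t3", "t5"}
print(precision_at_k(ranking, relevant, 2))  # 0.5
print(relative_improvement(0.3316, 0.1977))  # about 67.7, the B2 MAP gain over B1
```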

Page 13

Method Summary

• Baseline weighting
  • First Post BM-25
  • Thread BM-25

• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction

• Post weighting
  • Monotonic weighting
  • Parabolic weighting

• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)

Q: Which one works better? Entity-based search, or context-based search?

Page 14

Medical Entity Extraction

• Applied ADEPT toolkit (MacLean and Heer 2013)

• High precision but low recall

Page 15

MedicalEx: Relevance Scoring

Modified query frequency, computed from:

• count of occurrences labeled as a medical entity

• count of occurrences not labeled as a medical entity
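The MedicalEx formula survives only as its annotations. One plausible reading, offered as an assumption rather than the authors' exact formula: the query term frequency c(w,q) is replaced by a modified count c'(w,q) = β · (occurrences tagged as a medical entity) + (occurrences not tagged), with a hypothetical weight β > 1. The entity flags stand in for a tagger's output such as ADEPT; `medex_query_counts` and `beta` are hypothetical.

```python
from collections import Counter


def medex_query_counts(tokens, is_med_entity, beta=2.0):
    """Hypothetical MedicalEx reweighting of query term counts.

    c'(w,q) = beta * (occurrences of w tagged as a medical entity)
              + (occurrences of w not tagged)
    """
    if len(tokens) != len(is_med_entity):
        raise ValueError("one entity flag per token is required")
    med, other = Counter(), Counter()
    for tok, flagged in zip(tokens, is_med_entity):
        (med if flagged else other)[tok] += 1
    return {w: beta * med[w] + other[w] for w in set(med) | set(other)}


# Toy query; the flags are made up for illustration.
tokens = ["stomach", "cramps", "after", "eating", "chips", "cramps"]
flags = [False, True, False, False, False, False]
print(medex_query_counts(tokens, flags))
```

The resulting c'(w,q) would take the place of c(w,q) in the BM-25 baseline.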

Page 16

Shallow Information Extraction (Sondhi, 2010)

• Physical Examination (PE): disease, symptoms

• Medication (MED): treatment, prevention

• Background (BKG): neither PE nor MED

I am severly allergic to some product that is found in both Tostitos and Doritos, as well as random other types of chips. I know the solution is "don't eat chips" but what could the product be? I don't want to accidentally consume it. When I eat this, I get very bad stomach cramps and it ruins the rest of my day/night - the only solution is to go to sleep so I can't feel it. Help! Any ideas on this?

Page 17

ShallowEx: Relevance Scoring

Give higher importance to PE and MED sentences. The modified query count is computed from:

• word count in PE sentences

• word count in MED sentences

• word count in BKG sentences
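A minimal sketch of the modified query count, assuming it is a weighted linear combination c'(w,q) = a_PE · c_PE(w,q) + a_MED · c_MED(w,q) + a_BKG · c_BKG(w,q), with PE and MED weighted above BKG as the slide states. The weight values and the function name are hypothetical; the sentence labels would come from the shallow extraction classifier.

```python
from collections import Counter

# Hypothetical class weights: PE and MED sentences count more than background.
DEFAULT_WEIGHTS = {"PE": 2.0, "MED": 2.0, "BKG": 1.0}


def shallowex_query_counts(sentences, weights=DEFAULT_WEIGHTS):
    """sentences: list of (label, tokens) pairs, label in {"PE", "MED", "BKG"}.

    c'(w,q) = a_PE * c_PE(w,q) + a_MED * c_MED(w,q) + a_BKG * c_BKG(w,q)
    """
    counts = Counter()
    for label, tokens in sentences:
        for tok in tokens:
            counts[tok] += weights[label]  # each occurrence adds its class weight
    return dict(counts)


query = [("PE", ["stomach", "cramps"]), ("BKG", ["any", "ideas", "cramps"])]
print(shallowex_query_counts(query))
```

Unlike the entity-based variant, every word in a PE or MED sentence is boosted, which is one way context around medical terms can reach the ranking function.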

Page 18

Results: Semantic Methods

Run  Method            P@5             Recall@30       MAP
B2   Baseline FPBM-25  0.4700          0.4975          0.3316
S1   B2 + MedEx        0.4600          0.4283          0.2918
S2   B2 + ShallowEx    0.5300 (12.7%)  0.4847 (-2.5%)  0.3481 (4.9%)

Shallow extraction outperforms medical entity extraction on all three metrics

Page 19

Method Summary

• Baseline weighting
  • First Post BM-25
  • Thread BM-25

• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction

• Post weighting
  • Monotonic weighting
  • Parabolic weighting

• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)

Q: Which posts are most informative in a given thread?

Page 20

Post Weighting (Sondhi, 2013)

Not all posts are equally representative, so each thread is scored with a post-weighted term count $c'(w,t)$.

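Page 21

Post Weighting

$c'(w,t) = f(1,3)\,c(w,p_1) + f(2,3)\,c(w,p_2) + f(3,3)\,c(w,p_3)$ (example: a thread t with K = 3 posts $p_1, p_2, p_3$)

In general, $c'(w,t) = \sum_{i=1}^{K} f(i,K)\,c(w,p_i)$, where $c(w,p_i)$ is the count of word w in post $p_i$.

$f(i,K)$: gives the weight of post i in a thread with K posts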
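The weighted count c'(w,t) = Σ f(i,K) · c(w,p_i) can be sketched directly, with the weighting function passed in. This is a minimal sketch under the reading above; `weighted_thread_counts` and `uniform` are hypothetical names.

```python
from collections import Counter


def weighted_thread_counts(posts, f):
    """c'(w,t) = sum over i = 1..K of f(i, K) * c(w, p_i).

    posts: list of token lists in thread order (posts[0] is p_1).
    f: weight function f(i, K) with 1-based post position i.
    """
    K = len(posts)
    counts = Counter()
    for i, post in enumerate(posts, start=1):
        w_i = f(i, K)
        for word, c in Counter(post).items():
            counts[word] += w_i * c
    return dict(counts)


def uniform(i, K):
    """Every post counts equally, giving plain whole-thread counts."""
    return 1.0


posts = [["a", "b"], ["a"], ["c"]]
print(weighted_thread_counts(posts, uniform))
```

Two limiting cases connect this to the baselines: f ≡ 1 reproduces the whole-thread counts of TBM-25, while f(1,K) = 1 and f(i,K) = 0 for i > 1 reproduces the first-post representation of FPBM-25.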

Page 22

Monotonic Post Weighting

[Figure: relative post weight (y-axis) versus post position i (x-axis) for a thread with K = 10 posts, showing three monotonic weighting curves m1, m2, m3]
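The exact curves m1, m2, m3 are not recoverable from the transcript. As an illustration only, here are three hypothetical monotonically decreasing choices of f(i,K), each giving the first post weight 1. None of them is claimed to be one of the authors' curves, and the `rate` parameter is an assumption.

```python
import math


def linear_decay(i, K):
    """Weight falls linearly from 1 (first post) to 1/K (last post)."""
    return (K - i + 1) / K


def reciprocal_decay(i, K):
    """Weight 1/i: steep drop after the first few posts (ignores K)."""
    return 1.0 / i


def exponential_decay(i, K, rate=0.3):
    """Weight exp(-rate * (i - 1)); `rate` is a hypothetical tuning knob."""
    return math.exp(-rate * (i - 1))


if __name__ == "__main__":
    K = 10
    for f in (linear_decay, reciprocal_decay, exponential_decay):
        print(f.__name__, [round(f(i, K), 2) for i in range(1, K + 1)])
```

Any of these can be plugged in as f(i,K) in the weighted count c'(w,t); they differ only in how quickly later posts lose influence.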

Page 23

Parabolic Post Weighting

Page 24

Post Weighting Methods Evaluation

[Figure: accuracy (y-axis) of Uniform, Monotonic, and Parabolic post weighting for each forum used (x-axis): FF, UF, LQ, Cross Forum]

Page 25

Results: Post Weighting

Run  Method            P@5            Recall@30      MAP
B2   Baseline FPBM-25  0.4700         0.4975         0.3316
P1   Monotonic         0.5100 (8.5%)  0.5240 (5.3%)  0.3631 (9.5%)
P2   Parabolic         0.5100 (8.5%)  0.5040         0.3494

Both post weighting schemes outperform the baseline

Page 26

Method Summary

• Baseline weighting
  • First Post BM-25
  • Thread BM-25

• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction

• Post weighting
  • Monotonic weighting
  • Parabolic weighting

• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)

Q: Can we utilize forum categories?

Page 27

Forum Categories

Page 28

Forum Category Weighting

• Relevance feedback based on the top k retrieved categories
  • Forum Category Uniform weighting (FCUW): randomly selecting a forum ID
  • Forum Category Feedback weighting (FCFW): ratio of the current forum ID amongst the retrieved documents
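One reading of the two variants, offered as an assumption: both use the forum categories of the top k first-pass results as pseudo-relevance feedback. FCFW weights a forum by the ratio of its ID among the retrieved documents. For FCUW the sketch assumes a flat weight over the forums that appear, which is one interpretation of "randomly selecting a forum ID". Function names and normalization are hypothetical, and the forum IDs below are toy values.

```python
from collections import Counter


def fcfw_weights(top_k_forum_ids):
    """Feedback weighting: share of each forum ID among the top-k retrieved threads."""
    if not top_k_forum_ids:
        return {}
    counts = Counter(top_k_forum_ids)
    k = len(top_k_forum_ids)
    return {forum: n / k for forum, n in counts.items()}


def fcuw_weights(top_k_forum_ids):
    """Uniform weighting: every forum seen in the top k gets the same weight."""
    forums = set(top_k_forum_ids)
    if not forums:
        return {}
    return {forum: 1.0 / len(forums) for forum in forums}


top_k = ["allergy", "allergy", "digestive", "allergy"]
print(fcfw_weights(top_k))
print(fcuw_weights(top_k))
```

Under this reading FCFW has no free parameter beyond k, since the weights come from the retrieved documents themselves.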

Page 29

Forum Category Weighting Scoring

New score, computed from:

• forum category feedback weighting

• weights for forum category weighting
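A minimal sketch of the rescoring step, assuming the new score is a linear interpolation between the retrieval score and the category weight of the thread's forum. Because BM-25 scores are unbounded, the sketch min-max normalizes them to [0, 1] first. The interpolation weight `lam`, the normalization, and the function name are all assumptions, not the authors' formula.

```python
def rescore(results, category_weight, lam=0.3):
    """Interpolate each thread's retrieval score with its forum category weight.

    results: list of (thread_id, forum_id, base_score) tuples.
    category_weight: dict forum_id -> weight (e.g. from FCUW or FCFW).
    Returns (thread_id, new_score) pairs sorted by new_score, highest first.
    """
    if not results:
        return []
    scores = [s for _, _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    rescored = []
    for tid, forum, s in results:
        norm = (s - lo) / span
        new = (1 - lam) * norm + lam * category_weight.get(forum, 0.0)
        rescored.append((tid, new))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


results = [("t1", "allergy", 10.0), ("t2", "digestive", 10.5), ("t3", "allergy", 5.0)]
print(rescore(results, {"allergy": 0.75, "digestive": 0.25}, lam=0.5))
```

With lam = 0 the original ranking is preserved; larger values let the forum category pull threads from the dominant forum upward.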

Page 30

Results: Forum Category Weighting

Run  Method              P@5             Recall@30       MAP
B2   Baseline FPBM-25    0.4700          0.4975          0.3316
P1   Uniform weighting   0.5200 (10.6%)  0.4678 (-7.0%)  0.3334 (0.5%)
P2   Feedback weighting  0.5100 (8.5%)   0.4610 (-7.3%)  0.3389 (2.2%)

Uniform weighting and feedback weighting perform similarly, but FCFW has fewer parameters to tune.

Page 31

Results: Method Combinations

Run  Method                        P@5             Recall@30       MAP
B2   Baseline FPBM-25              0.4700          0.4975          0.3316
S2   Baseline FPBM-25 + ShallowEx  0.5300          0.4847          0.3481
C2   Monotonic + ShallowEx         0.5400 (14.9%)  0.5354 (7.6%)   0.3745 (12.9%)
C3   Parabolic + ShallowEx         0.5100          0.5155          0.3573
C4   Monotonic + ShallowEx + FCFW  0.5200          0.5625 (13.1%)  0.3702

Monotonic + ShallowEx performs best on P@5 and MAP; adding FCFW gives the highest Recall@30

Page 32

Conclusion

• Fairly high P@5 is achievable (up to 0.5400 with Monotonic + ShallowEx)

• Treating the first post as the query performed better than utilizing all posts in the thread

• Shallow information extraction is better for query understanding
  • Incorporates contextual information

• Utility of posts drops steadily with position

• Easy extension of the baseline method

Page 33

Future Work

• Recommending relevant forum posts for doctors
  • Various online forums have an 'ask a doctor' section
  • Doctors will save time by recommending forum posts

• Intent-based case retrieval
  • Identifying intents for both the end user and the existing posts will improve search quality
  • Examples: cause of symptom, managing disease, adverse effects

Page 34

Acknowledgements

• This work is supported in part by the National Science Foundation under Grant Number CNS-1027965. We would also like to thank the anonymous reviewers for their invaluable feedback, and the Institute of Genomic Biology for their computing resources.

Page 35

Questions?

Thank you!

Page 36

References

• J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.

• J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. ACM BCB, 2014.

• K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.

• P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.

• B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.

• D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, amiajnl-2012-001110, May 2013.

Page 37

Features & Performance of Shallow Information Extraction Method

Page 38

ShallowEx: Extraction Model

[Figure: Performance results for different feature sets. Percentage accuracy (y-axis) of an order-1 CRF and an SVM for cumulative feature sets (x-axis): unigrams, +semantic, +position, +morphological, +wordcount, +threadcreator, +bigrams, feature selection]

We use the best-performing SVM-based classifier (posts: 175, sentences: 1494).

