Using UMLS CUIs for WSD in the Biomedical Domain

Bridget T. McInnes¹, Ted Pedersen² and John Carlis¹

University of Minnesota Twin Cities¹ and University of Minnesota Duluth²

09/11/07

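What is WSD?

Word sense disambiguation (WSD) assigns the intended sense from a sense inventory to an ambiguous word in context.

The culture count doubled.

Sense inventory for "culture":
  Laboratory Culture
  Anthropological Culture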

Sense Inventory: UMLS

The Unified Medical Language System (UMLS) contains a list of Concept Unique Identifiers (CUIs), which are the concepts (senses) associated with a word or term.

Culture
  Laboratory Culture (C0430400)
  Anthropological Culture (C0010453)

UMLS: Semantic Network

A framework encoded with different semantic and syntactic structures

Anthropological Culture (C0010453)
  Semantic Type(s): Idea or Concept

Laboratory Culture (C0430400)
  Semantic Type(s): Laboratory Procedure

Also shown in the diagram:
  Semantic Type: Mental Process
  semantic relation: assesses_effect_of
  semantic relation: result_of

MetaMap

Concept mapping system
  maps text to concepts in the UMLS
  provides a wealth of information for all words in a document
    phrasal information
    part of speech (POS) of a word
    CUI of a word
    semantic types of a word

Example

The culture count doubled

count
  CUI: Count (C0750480)
  semantic type: Idea or Concept (idcn)
  pos: noun

doubled
  CUI: Duplicate (C0205173)
  semantic type: Functional Concept (ftcn)
  pos: verb
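
The per-word information in this example can be held in a small record type. The sketch below is only an illustration: the `Annotation` class, its field names, and the `cuis` helper are hypothetical and do not reflect MetaMap's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One word with the information listed on the slide (hypothetical layout)."""
    word: str
    cui: str
    concept: str
    semantic_type: str
    semantic_type_abbrev: str
    pos: str


# The two annotations shown on the slide for "The culture count doubled".
ANNOTATIONS = [
    Annotation("count", "C0750480", "Count", "Idea or Concept", "idcn", "noun"),
    Annotation("doubled", "C0205173", "Duplicate", "Functional Concept", "ftcn", "verb"),
]


def cuis(annotations):
    """Return the CUIs in document order, the raw material for CUI features."""
    return [a.cui for a in annotations]
```

Calling `cuis(ANNOTATIONS)` yields the CUIs of the context words, which is the only part of the record the CUI-based features need.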

Supervised Approaches

Leroy and Rindflesch 2005
  semantic types, semantic relations, part-of-speech, and head information (from MetaMap)

Joshi, Pedersen and Maclin 2005
  unigrams
    in the same sentence as the ambiguous word
    in the same abstract as the ambiguous word

Liu, Teller and Friedman 2004
  unigrams, direction and orientation of unigrams, and collocations

Questions

1. Would UMLS CUIs be an improvement over semantic types?

2. Would the biomedical specific feature CUIs be an improvement over the more general feature unigrams?

3. Would increasing the context window in which surrounding CUIs are found improve the results?

Our supervised approach

Algorithm:
  Naïve Bayes from the WEKA data mining package, using 10-fold cross-validation

Features:
  UMLS CUIs obtained from MetaMap
    that occur in the same sentence as the ambiguous word more than one time (s-1-cui)
    that occur in the same abstract as the ambiguous word more than one time (a-1-cui)

Example

... The culture count doubled. The cells multiplied by twice the expected rate ...

Sentence:
  C0750480 Count (2)
  C0205173 Duplicate (1)
  ...

Abstract:
  C0750480 Count (2)
  C0205173 Duplicate (3)
  C0007634 Cells (4)
  C1517001 Expected (1)
  C1521828 Rate (3)
  ...
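
The sentence-level and abstract-level feature sets can be sketched in a few lines. This is a minimal illustration, not the CuiTools implementation: the instance layout and all function names are hypothetical, and it assumes that "more than one time" is a frequency cutoff applied to counts gathered over the instances passed in.

```python
from collections import Counter

# Hypothetical instance layout: CUIs grouped by sentence, plus the index of
# the sentence that contains the ambiguous word. Toy data for illustration.
INSTANCES = [
    {"sentences": [["C0750480", "C0205173"],
                   ["C0007634", "C1517001", "C1521828"]],
     "target_sentence": 0},
    {"sentences": [["C0750480"], ["C0007634"]],
     "target_sentence": 0},
]


def context_cuis(instance, window):
    """Return CUIs from the target sentence ("s") or the whole abstract ("a")."""
    if window == "s":
        return list(instance["sentences"][instance["target_sentence"]])
    if window == "a":
        return [cui for sent in instance["sentences"] for cui in sent]
    raise ValueError("window must be 's' or 'a'")


def select_features(instances, window, cutoff=1):
    """Keep CUIs seen more than `cutoff` times across the given instances."""
    counts = Counter()
    for inst in instances:
        counts.update(context_cuis(inst, window))
    return sorted(cui for cui, n in counts.items() if n > cutoff)


def to_vector(instance, window, features):
    """Binary vector: 1 if the feature CUI occurs in the instance's window."""
    present = set(context_cuis(instance, window))
    return [1 if cui in present else 0 for cui in features]
```

With `window="s"` this mirrors the s-1-cui setting and with `window="a"` the a-1-cui setting; the wider window sees more CUIs, so more of them pass the cutoff.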

Algorithm

Example Instances -> Extract Relevant CUIs -> Training Data / Test Data -> Naïve Bayes Algorithm -> Sense Tagged Test Data
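
The train-and-test flow above can be sketched end to end. The slides use WEKA's Naïve Bayes; the code below is not WEKA but a small stand-in, assuming binary CUI features, a Bernoulli Naïve Bayes model with add-one smoothing, and simple interleaved folds with no shuffling or stratification. All function names are hypothetical.

```python
import math
from collections import Counter


def train_nb(X, y):
    """Fit a Bernoulli Naive Bayes model with add-one smoothing."""
    n_feats = len(X[0])
    class_counts = Counter(y)
    on_counts = {c: [0] * n_feats for c in class_counts}
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            on_counts[label][j] += v
    model = {}
    for c, n_c in class_counts.items():
        log_prior = math.log(n_c / len(y))
        # Smoothed probability that feature j is present given sense c.
        p_on = [(on_counts[c][j] + 1) / (n_c + 2) for j in range(n_feats)]
        model[c] = (log_prior, p_on)
    return model


def predict_nb(model, row):
    """Return the sense with the highest log posterior for one feature vector."""
    best, best_score = None, -math.inf
    for c, (log_prior, p_on) in sorted(model.items()):
        score = log_prior
        for v, p in zip(row, p_on):
            score += math.log(p if v else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def cross_validate(X, y, k=10):
    """Pooled accuracy over k folds; fold i holds out rows with index % k == i.

    Assumes k >= 2 and more rows than folds.
    """
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = train_nb([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict_nb(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Each row of `X` is a binary CUI vector for one instance and `y` holds the manually assigned senses; `cross_validate(X, y, k=10)` then plays the role of the 10-fold evaluation.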

Dataset

National Library of Medicine's Word Sense Disambiguation (NLM-WSD) Dataset

50 words from the 1998 MEDLINE abstracts

100 instances for each of the 50 words

Each instance has been tagged by MetaMap

The target word was manually assigned a UMLS concept or None

Average number of concepts per ambiguous word is 2.26 (not including None)

Data subsets

Liu subset
  Liu, Teller and Friedman 2004
  22 out of the 50 words in NLM-WSD

Leroy subset
  Leroy and Rindflesch 2005
  15 out of the 50 words in NLM-WSD

Joshi subset
  Joshi, Pedersen and Maclin 2005
  28 out of the 50 words in NLM-WSD (union of Leroy and Liu subsets)
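
Because the Joshi subset is described as the union of the other two, the counts imply how many words the Liu and Leroy subsets share. A short inclusion-exclusion check, assuming the three counts on this slide are exact:

```latex
\[
|\text{Liu} \cap \text{Leroy}|
  = |\text{Liu}| + |\text{Leroy}| - |\text{Liu} \cup \text{Leroy}|
  = 22 + 15 - 28
  = 9
\]
```

Under that assumption, 9 words appear in both the Liu and Leroy subsets.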

Results

Results for Question 1

Would CUIs be an improvement over semantic types?

Comparative results with Leroy and Rindflesch 2005

[Bar chart: accuracy using Leroy subset]
  s-1-cui    71%
  a-1-cui    74.5%
  s-0-Leroy  65.6%

Significance of Differences

Pairwise t-test
  s-1-cui (71%) and s-0-Leroy (65.6%): p <= 0.001
  a-1-cui (74.5%) and s-0-Leroy (65.6%): p <= 0.00005
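
For readers who want to reproduce this kind of comparison, a paired t statistic can be computed directly. The slides do not say which observations were paired, so the sketch below assumes per-word accuracies for the two systems; the function name and the sample numbers are hypothetical, not results from the talk. It returns only the t statistic and degrees of freedom; turning these into a p-value needs the t distribution (for example `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-word accuracies for two systems on the same four words.
system_a = [0.80, 0.75, 0.90, 0.70]
system_b = [0.70, 0.72, 0.85, 0.60]
t_stat, dof = paired_t(system_a, system_b)
```

Pairing by word matters here: each word has its own difficulty, so the test looks at the per-word differences rather than at the two overall averages.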

Results for Question 2

Would the biomedical specific feature CUIs be an improvement over the more general feature unigrams?

Comparative results with Joshi, Pedersen and Maclin 2005

[Bar chart: accuracy using Joshi subset]
  s-1-cui    77.7%
  a-1-cui    80%
  s-4-Joshi  79.3%
  a-4-Joshi  82.5%

Significance of Results

Pairwise t-test
  s-1-cui (77.7%) and s-4-Joshi (79.3%): p < 0.135
  a-1-cui (80.0%) and a-4-Joshi (82.5%): p < 0.003

Results for Question 3

Would increasing the size of the context window in which surrounding CUIs are found improve the results, as seen by Joshi, Pedersen and Maclin using unigrams?

Comparative results between size of context window

[Bar chart: accuracy using NLM-WSD dataset]
  s-1-cui  83.3%
  a-1-cui  85.6%

Significance of Results

Pairwise t-test
  s-1-cui (83.3%) and a-1-cui (85.6%): p < 0.0006

Comparative results with Liu, Teller and Friedman 2004

[Bar chart: accuracy using the Liu subset]
  a-1-cui  81.9%
  s-0-Liu  85.5%

Significance of Results

Pairwise t-test
  a-1-cui (81.9%) and s-1-Liu (85.5%): p < 0.001

Conclusions

CUIs result in more accurate disambiguation than semantic types and are comparable to unigrams

Incorporating more surrounding context improves the results

MetaMap generates useful information that can be used as features for supervised disambiguation

Future Work

Combination approach

Exploring additional UMLS features

Unsupervised approach using information from the UMLS

Software and Data

CuiTools version 0.05
  http://cuitools.sourceforge.net

NLM-WSD Dataset
  http://wsd.nlm.nih.gov

Pairwise t-test
  http://www.quantitativeskills.com/sisa/statistics/

