
Biomedical Word Sense Disambiguation with Neural Word and Concept Embedding

Department of Computer Science, University of Kentucky

Oct 7, 2016

AKM Sabbir
Advisor: Dr. Ramakanth Kavuluru


Outline
• Introduction
• Applications of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion


Introduction

• WSD is the task of detecting the correct sense of an ambiguous word, or assigning the proper sense
  – The air in the center of the vortex of a cyclone is generally very cold
  – I could not come to the office last week because I had a cold
• Retrieving information from machines is not an easy task
• A number of Natural Language Processing (NLP) tasks require WSD


Applications

• Text-to-Speech Conversion
  – "Bass" can be pronounced like "base" (the instrument) or like "mass" (the fish)
• Machine Translation
  – The French word "grille" can be translated as "gate" or "bar"
• Information Retrieval
• Named Entity Recognition
• Document Summary Generation


Motivation

• Generalized WSD is a difficult problem
• A practical approach is to solve it separately for each domain
• The biomedical domain contains a large number of ambiguous words
• Medical report summary generation
• Drug side-effect prediction


Related Methods

• Supervised Methods
  – Support Vector Machines, Convolutional Neural Nets
• Unsupervised Methods
  – Clustering, generative models
  – Example: for a vocabulary of four words w1, w2, w3, w4, co-occurrence statistics can be tabulated as a matrix:

            w1   …    w4
        w1  1/5  2/5  0
        w2  …

• Knowledge-Based Methods
  – WordNet, UMLS (Unified Medical Language System)


Our Method

• We build a semi-supervised model
• The model uses concept/sense/CUI vectors just as word vectors are commonly used (more later)
• MetaMap is a knowledge-based NER tool; we use its decisions to generate concept vectors
• The model also uses P(w|c), where c is a concept or sense, estimated using other knowledge-based approaches
• Word vectors are generated from an unstructured data source


What is a Word Vector?

• A distributed representation of words
• The representation of a word is spread across all dimensions of the vector
• The idea differs from representations whose length equals the vocabulary size; here we choose a small dimension, say d = 200, and generate dense vectors
• Each element of the vector contributes to the definition of many different words

    King   0.07  0.05  0.8   0.002  0.1  0.3
    Queen  0.7   0.05  0.67  0.002  0.2  0.3


What is a Word Vector? (contd.)

• It is a numerical way of representing a word
• Each dimension captures some semantic and syntactic information related to that word
• Using the same idea, we can generate concept/sense/CUI vectors
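As a concrete illustration (not the exact training setup used in this work), here is a minimal sketch of training such dense vectors with Gensim's Word2Vec, which the conclusion slide lists among the tools; the toy corpus and parameter choices are assumptions, and the `vector_size` keyword assumes Gensim 4.x:

    from gensim.models import Word2Vec

    # Toy corpus: each sentence is a list of lowercase tokens.
    sentences = [
        ["the", "king", "ruled", "the", "kingdom"],
        ["the", "queen", "ruled", "the", "kingdom"],
        ["the", "air", "in", "the", "vortex", "is", "cold"],
    ]

    # Train dense vectors of a small fixed dimension (the slide's example d=200),
    # with a skip-gram model (sg=1) and a 10-word context window.
    model = Word2Vec(sentences, vector_size=200, window=10, min_count=1, sg=1)

    print(model.wv["king"][:5])                  # first components of a dense vector
    print(model.wv.similarity("king", "queen"))  # cosine similarity of two words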


Why Do Word Vectors Work?

• Learned word vectors capture the syntactic and semantic information present in text
  – vector("king") − vector("man") + vector("woman") ≈ vector("queen")

Fig 5: the resultant "queen" vector relative to other word vectors [5]
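Continuing the Gensim sketch above, the analogy can be queried directly; with vectors trained on a large corpus, "queen" is typically the top result (the toy corpus above is far too small for this):

    # king - man + woman: positive terms are added, negative terms subtracted.
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))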


Required Tools

• Language model (to learn word and concept vectors)


Required Tools (contd.)

• MetaMap [4]
  Step 1, Parsing: text is parsed into noun phrases using the Xerox POS tagger to perform syntactic analysis.
  Step 2, Variant Generation: variants for each input phrase are generated using the SPECIALIST lexicon and a supplementary database of synonyms.
  Step 3, Candidate Retrieval: candidate sets retrieved from the UMLS Metathesaurus must contain at least one of the variants generated in Step 2.
  Step 4, Candidate Evaluation.

Fig 2: variants for the word "ocular"
Fig 3: evaluated candidates for "ocular complication"
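As a sketch of how MetaMap's decisions can be consumed from Python, assuming the pymetamap wrapper and a local MetaMap installation (the installation path, the example input, and the printed fields are assumptions):

    from pymetamap import MetaMap

    # Point the wrapper at a local MetaMap install (path is an assumption).
    mm = MetaMap.get_instance('/opt/public_mm/bin/metamap')

    sents = ['ocular complication after surgery']
    concepts, error = mm.extract_concepts(sents, [1])  # second arg: sentence ids
    for c in concepts:
        # Each returned concept carries, among other fields, the CUI,
        # the preferred UMLS name, and a mapping score.
        print(c.cui, c.preferred_name, c.score)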


Our Method in Detail

• Text preprocessing
  – Remove English stop words
  – NLTK word tokenization
  – Keep words with frequency greater than five
  – Lowercase everything
• The word context window is ten words long
• The generated word vectors are 300-dimensional
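A minimal sketch of the preprocessing bullets above with NLTK; the function name and the exact order of the filtering steps are assumptions:

    from collections import Counter
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    STOP = set(stopwords.words('english'))

    def preprocess(texts, min_count=5):
        # Lowercase, tokenize with NLTK, and drop English stop words.
        token_lists = [[t for t in word_tokenize(text.lower())
                        if t.isalpha() and t not in STOP] for text in texts]
        # Keep only tokens whose corpus frequency is greater than five.
        freq = Counter(t for toks in token_lists for t in toks)
        return [[t for t in toks if freq[t] > min_count] for toks in token_lists]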


Generating Word and Concept Vectors

• Generate both word and concept vectors
• 20 million citations from PubMed for training word vectors
• Randomly chose 5 million citations
• Retrieved 7.1 million sentences containing the target ambiguous words
• Each sentence is 16-17 words long
• The combined sentences are used to generate bigrams
• Each bigram is fed into MetaMap with the WSD option turned on
• Each bigram is replaced with its corresponding concept
• The data is then fed to the language model to generate concept vectors
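A sketch of the CUI-replacement step, assuming MetaMap's bigram-to-concept decisions have been collected into a dictionary and that `corpus` holds the tokenized sentences from the preprocessing sketch; the mapping entry below is a placeholder, not a real CUI:

    from gensim.models import Word2Vec

    # Illustrative mapping distilled from MetaMap's outputs: bigram -> CUI.
    bigram_to_cui = {('cold', 'sore'): 'C0000001'}  # placeholder CUI

    def replace_bigrams(tokens):
        # Left-to-right scan; any matched bigram is replaced by its concept id.
        out, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if len(pair) == 2 and pair in bigram_to_cui:
                out.append(bigram_to_cui[pair])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out

    cui_corpus = [replace_bigrams(s) for s in corpus]
    concept_model = Word2Vec(cui_corpus, vector_size=300, window=10, min_count=5)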


Estimate P(D|c) [3]

• Jimeno-Yepes and Berlanga's model [3] uses a Markov chain to calculate P(D|c)
• To get P(D|c), we need to calculate P(w|c)
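For intuition only, here is a simpler bag-of-words factorization of P(D|c) as a product of P(w|c) over the words of D, computed in log space; this is not the Markov-chain estimator of [3], and the smoothing constant is an assumption:

    import math

    def log_p_doc_given_concept(doc_tokens, p_w_given_c, eps=1e-10):
        # log P(D|c) = sum over w in D of log P(w|c), with tiny smoothing
        # for words unseen under concept c.
        return sum(math.log(p_w_given_c.get(w, eps)) for w in doc_tokens)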


Biomedical MSH WSD

• A dataset with 203 ambiguous words
• 424 unique concept identifiers (senses)
• 38,495 test context instances, with an average of 200 test instances per ambiguous word
• Goal: correctly identify the sense for each test instance


Model I: Cosine Similarity

$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \cos(\vec{T}_{avg}, \vec{c})$

• w is the ambiguous word
• T is the test instance context containing the ambiguous word w
• C(w) is the set of concepts (senses) that w can assume
• T_avg is the average of the word vectors in T, and c is the concept vector of c
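A minimal NumPy sketch of Model I under these definitions (the function names are illustrative):

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def model1(context_vecs, candidates, cui_vecs):
        # Average the context word vectors, then pick the candidate concept
        # whose CUI vector is most cosine-similar to that average.
        t_avg = np.mean(context_vecs, axis=0)
        return max(candidates, key=lambda c: cosine(t_avg, cui_vecs[c]))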


Model II: Projection Magnitude

$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \rho\left(\cos(\vec{T}_{avg}, \vec{c})\right) \cdot \frac{\|\mathrm{Pr}(\vec{T}_{avg}, \vec{c})\|}{\|\vec{c}\|} \right]$

• Take the projection of T_avg along the concept vector and consider its Euclidean norm


Model III

$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \cos(\vec{T}_{avg}, \vec{c}) \cdot \frac{\|\mathrm{Pr}(\vec{T}_{avg}, \vec{c})\|}{\|\vec{c}\|} \right]$

• Combines both the angular and the magnitude components
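A sketch of the Model II and III scores, reusing `cosine` and NumPy from the Model I sketch; ρ is not spelled out on the slide, so this sketch assumes it preserves the sign of the cosine:

    def proj_ratio(t_avg, c_vec):
        # ||Pr(T_avg, c)|| / ||c||: length of the projection of T_avg onto c,
        # normalized by the length of c.
        return abs(t_avg @ c_vec) / (np.linalg.norm(c_vec) ** 2)

    def model2_score(t_avg, c_vec):
        return np.sign(cosine(t_avg, c_vec)) * proj_ratio(t_avg, c_vec)

    def model3_score(t_avg, c_vec):
        return cosine(t_avg, c_vec) * proj_ratio(t_avg, c_vec)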


Model IV


Model V: k-NN

• We now have multiple ways to resolve the sense of an ambiguous term
• Built a distantly supervised dataset by collecting data from biomedical citations
• For each ambiguous word there are, on average, 40,000 sentences
• Resolved the sense of each sentence using Model IV


KNN in Pseudo Code
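A minimal sketch of the k-NN assignment, consistent with the formula on the next slide; it reuses the `cosine` helper from the Model I sketch, the data layout is an assumption, and the default k = 3500 is the value reported in the results:

    def knn_disambiguate(t_avg, training, candidates, k=3500):
        # training: list of (avg_context_vector, cui) pairs labeled by Model IV.
        # Score every training instance by cosine similarity to the test context.
        scored = sorted(((cosine(t_avg, vec), cui) for vec, cui in training),
                        reverse=True)
        # Sum the similarities of the k nearest neighbours, per candidate sense.
        votes = {c: 0.0 for c in candidates}
        for sim, cui in scored[:k]:
            if cui in votes:
                votes[cui] += sim
        return max(votes, key=votes.get)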


KNN contd.

$f_{k\text{-}NN}(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \sum_{(D, w, c) \in R_k(D_w)} \cos(\vec{T}_{avg}, \vec{D}_{avg}) \right]$

• R_k(D_w) is the set of the k labeled training instances for w most similar to the test context
• D_avg is the average of the word vectors in a training context D

[Diagram: the test instance is scored by cosine similarity against each labeled training instance, e.g. training instance 1 (c_1, 0.7), training instance 2 (c_1, 0.9), training instance 3 (c_2, 0.1), ..., training instance n (c_2, 0.12); the k most similar instances vote for the sense.]


KNN Accuracy graph


Distant Supervision with CNNs

• Used the refined assignment of CUIs to sentences as a training set
• Then used the MSH WSD data as the test set
• Trained 203 Convolutional Neural Nets, one per ambiguous word
• Each with one convolutional layer and one hidden layer
• Used 900 filters of 3 different sizes
• Used the test cases for evaluation
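The slides say the CNNs were implemented in Theano; purely as an architectural sketch, here is an equivalent in Keras with one convolutional layer of 900 filters and one hidden layer. The filter widths (3, 4, 5), sequence length, embedding size, and hidden width are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_cnn(vocab_size, num_senses, seq_len=50, emb_dim=300):
        inp = layers.Input(shape=(seq_len,))
        emb = layers.Embedding(vocab_size, emb_dim)(inp)
        # One convolutional layer: 900 filters total, 300 per assumed width.
        pooled = [layers.GlobalMaxPooling1D()(
                      layers.Conv1D(300, w, activation='relu')(emb))
                  for w in (3, 4, 5)]
        x = layers.Concatenate()(pooled)
        x = layers.Dense(100, activation='relu')(x)           # the hidden layer
        out = layers.Dense(num_senses, activation='softmax')(x)
        return tf.keras.Model(inp, out)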


Distant Supervision Using CNN


Ensembling of CNNs

• Five CNNs trained and tested for each ambiguous word
• Average the outputs and take the best one
• Tends to improve results at the cost of computation
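A sketch of the averaging step, assuming each of the five trained models exposes softmax outputs via Keras-style `predict`:

    import numpy as np

    def ensemble_predict(models, x):
        # Average the five CNNs' softmax outputs, then take the best sense.
        probs = np.mean([m.predict(x) for m in models], axis=0)
        return probs.argmax(axis=-1)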


Results and Analysis

    Method                                     Accuracy
    Jimeno-Yepes and Berlanga [3]              89.10%
    Cosine similarity (Model I)                85.54%
    Projection length proportion (Model II)    88.68%
    Combining Models I and II                  89.26%
    Combining Models I, II and [3]             92.24%
    Convolutional Neural Net                   86.17%
    Ensembling CNNs                            87.78%
    k-NN with k = 3500                         94.34%


Conclusion

• The developed model is highly accurate, beating the previous best
• It is unsupervised: no hand-labeled information is required
• It is scalable, though the resulting accuracy is uncertain
  – By increasing the number of training sentences and the context per sentence, more information may be extractable
• Graph-based algorithms need to be explored
• Tools: HPC, Theano, NLTK, Gensim Word2Vec

05/03/2023 39

Questions

05/03/2023 40

References

1. Eneko Agirre and Philip Edmonds. Word Sense Disambiguation: Algorithms and Applications, volume 33. Springer Science & Business Media, 2007.
2. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.
3. Antonio Jimeno Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300-307, 2015.
4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001.
5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.