Date post: 31-Dec-2015
Semi-Automated Extension of a Specialized Medical Lexicon for French Bruno Cartoni & Pierre Zweigenbaum LIMSI-CNRS, France
Outline

Context: UMLF for French
- The desired coverage
- The target lexical information
- The organisation of a specialised lexicon

Acquiring lexical information
- Initial coverage
- Obtaining lexical entries from a general lexicon
- Guessing technique

Results
- Consensus guessing
- Acquisition of the full paradigm
- General improvement

Conclusion and further work

Context: the InterSTIS project

InterSTIS: development of a terminology server for French medical terminologies

Sub-project: improving the lexical coverage of a French medical lexicon (UMLF: Unified Medical Lexicon for French)

Use: support the indexing of medical texts

Issues: What lexical knowledge is needed? How can it be acquired?

The desired coverage

Reference: “Term-Union”
- Union of 10 terminologies (CIM-10, SNOMED, MeSH, CISMeF, …) of French medical domains, organised around concept identifiers (CUI) of the UMLS
- 311,518 terms
- 203,300 unique concepts (CUI)
- 94,964 word forms

Term-Union: example

C0000936  | MSHFRE     | … | Accommodation de l'oeil
C0000936  | MSHFRE     | … | Accommodation des yeux
C0000936  | MSHFRE     | … | Accommodation oculaire
C0000936  | SNMIGIPFRE | … | accommodation visuelle
...
C00001558 | MSHF       | … | Voie cutanée
C00001558 | MSHF       | … | Voie intradermique
C00001558 | MSHF       | … | Voie percutanée
C00001558 | MSHF       | … | Voie transcutanée

Observation of term variation
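
To illustrate how term variation can be observed in such a resource, here is a minimal Python sketch that groups terms by concept identifier. The rows are copied from the example above; the function name `group_by_cui` and the three-field row layout are assumptions made for illustration, not the actual Term-Union format.

```python
from collections import defaultdict

# Rows copied from the slide example: (CUI, source vocabulary, term).
# The three-field layout is a simplification; the elided columns are omitted.
rows = [
    ("C0000936", "MSHFRE", "Accommodation des yeux"),
    ("C0000936", "MSHFRE", "Accommodation oculaire"),
    ("C0000936", "SNMIGIPFRE", "accommodation visuelle"),
    ("C00001558", "MSHF", "Voie cutanée"),
    ("C00001558", "MSHF", "Voie intradermique"),
]


def group_by_cui(rows):
    """Return a dict mapping each CUI to the list of terms attached to it."""
    variants = defaultdict(list)
    for cui, _source, term in rows:
        variants[cui].append(term)
    return dict(variants)


variants = group_by_cui(rows)
for cui, terms in variants.items():
    print(cui, len(terms), terms)
```

Each group then lists the competing surface forms of one concept, which is where graphemic, morphosyntactic and morphosemantic variants can be looked for.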

Target lexical information

Term variation within Term-Union
- Graphemic: équilibre acido-basique / équilibre acidobasique [EN: acid-base balance]
- Morphosyntactic: adaptation de l'oeil / adaptation des yeux [EN: eye adaptation]
- Morphosemantic: intoxication à l’alcool / intoxication alcoolique [EN: alcohol intoxication]
- Others ...

Organisation of the specialised lexicon

3 types of relational tables for the 3 levels of representation (graphemic, inflection, derivation)

A full-entry lexicon (LMF compliant) that gathers all lexical information

Graphemic:
…
inter-maxillaire | intermaxillaire
insulino-sécrétantes | insulinosécrétantes
scléro-cornéenne | sclérocornéenne
…

Derivation:
...
abdominal | abdomen
aplasique | aplasie
arachnoïdien | arachnoïde
argentique | argent
…

Inflection:
…
sérofibrineux | sérofibrineux | Afpms
sérofibrineuse | sérofibrineux | Afpfs
sérofibrineux | sérofibrineux | Afpmp
sérofibrineuses | sérofibrineux | Afpfp
…
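
As a rough illustration of how the three tables could be combined at lookup time, the sketch below loads a few of the rows shown above into Python dictionaries. The reading of the columns (variant to normalised spelling, derived word to base, form to lemma and tag) and the function `describe` are assumptions for illustration only; they are not the project's actual schema.

```python
# Graphemic table (assumed reading: hyphenated variant -> unhyphenated spelling).
graphemic = {
    "inter-maxillaire": "intermaxillaire",
    "scléro-cornéenne": "sclérocornéenne",
}

# Derivation table (assumed reading: derived adjective -> base noun).
derivation = {"abdominal": "abdomen", "aplasique": "aplasie"}

# Inflection table: form -> list of (lemma, tag); one form may carry several tags.
inflection = {
    "sérofibrineux": [("sérofibrineux", "Afpms"), ("sérofibrineux", "Afpmp")],
    "sérofibrineuse": [("sérofibrineux", "Afpfs")],
    "sérofibrineuses": [("sérofibrineux", "Afpfp")],
}


def describe(form):
    """Collect what the three tables say about one word form."""
    normalised = graphemic.get(form, form)
    analyses = inflection.get(normalised, [])
    # Fall back on the form itself when no lemma is recorded.
    lemmas = sorted({lemma for lemma, _tag in analyses}) or [normalised]
    bases = [derivation[lemma] for lemma in lemmas if lemma in derivation]
    return {"normalised": normalised, "analyses": analyses, "bases": bases}


print(describe("sérofibrineux"))
print(describe("abdominal"))
print(describe("inter-maxillaire"))
```

Keeping the levels in separate tables means each kind of variation can be extended independently, while a full-entry view such as `describe` gathers them on demand.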

Acquiring the lexical information

Initial coverage of UMLF (from a previous project, UMLF, based on Baud et al. 1998)
- 17,192 lexical units
  - 5,353 adjectives
  - 11,799 nouns
- 36,211 word forms

Acquiring the lexical information

From a general lexicon
- Existing French general lexicon (Morphalou)

With a guessing technique

Acquiring the lexical information

Using a guessing technique (Tanguy & Hathout 2007)

3 steps:
- Learning phase: calculating the most frequent tag for each ending string in 2 existing lexicons
- Guessing phase: assigning possible tag(s) to each unknown form
- Cross-validation: comparing the 2 sets of guesses, one per lexicon
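
The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Tanguy & Hathout implementation: the maximum ending length `MAX_SUFFIX`, the longest-ending-first lookup, the single-tag output and the toy lexicons are all hypothetical.

```python
from collections import Counter, defaultdict

MAX_SUFFIX = 5  # assumed maximum ending length; not given in the slides


def learn(lexicon, max_suffix=MAX_SUFFIX):
    """Learning phase: most frequent tag for each ending string."""
    counts = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(max_suffix, len(form)) + 1):
            counts[form[-n:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}


def guess(form, model, max_suffix=MAX_SUFFIX):
    """Guessing phase: tag of the longest known ending, or None."""
    for n in range(min(max_suffix, len(form)), 0, -1):
        tag = model.get(form[-n:])
        if tag is not None:
            return tag
    return None


def consensus(forms, model_a, model_b):
    """Cross-validation: keep only the forms on which both guessers agree."""
    agreed = {}
    for form in forms:
        tag_a, tag_b = guess(form, model_a), guess(form, model_b)
        if tag_a is not None and tag_a == tag_b:
            agreed[form] = tag_a
    return agreed


# Toy stand-ins for the two lexicons (general and medical).
general = [("nerveux", "Afpms"), ("heureux", "Afpms"),
           ("nerveuse", "Afpfs"), ("heureuse", "Afpfs")]
medical = [("sérofibrineux", "Afpms"), ("sérofibrineuse", "Afpfs"),
           ("muqueuse", "Afpfs")]

model_general = learn(general)
model_medical = learn(medical)
result = consensus(["fibreux", "fibreuse", "xyz"], model_general, model_medical)
print(result)
```

Requiring agreement between two independently trained guessers trades recall for precision: a form is accepted only when both lexicons support the same tag.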

Acquiring the lexical information

Acquiring the full paradigm
- All the inflectional forms
- Lemma

Based on “productive” inflectional paradigms
- 9 for adjectives
- 3 for nouns

Algorithm based on lexical tries to cluster forms of the same paradigm
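
One way such trie-based clustering could work is sketched below in Python. It is an assumption-laden illustration, not the authors' algorithm: the two entries in `PARADIGMS` are hypothetical stand-ins for the productive paradigms, and only complete paradigms are detected.

```python
END = ""  # end-of-word marker; never collides with a one-character key

# Hypothetical paradigms: name -> endings that must all be attested on a stem.
PARADIGMS = {
    "adj-eux": ("eux", "euse", "euses"),
    "adj-regular": ("", "e", "s", "es"),
}


def build_trie(forms):
    """Build a character trie as nested dicts."""
    root = {}
    for form in forms:
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def has_word(node, ending):
    """True if following `ending` from `node` lands on a complete word."""
    for ch in ending:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


def cluster(forms):
    """Return (paradigm name, sorted member forms) for every full paradigm."""
    clusters = []
    stack = [("", build_trie(forms))]
    while stack:
        stem, node = stack.pop()
        for name, endings in PARADIGMS.items():
            if all(has_word(node, e) for e in endings):
                clusters.append((name, sorted(stem + e for e in endings)))
        for ch, child in node.items():
            if ch != END:
                stack.append((stem + ch, child))
    return sorted(clusters)


forms = ["séreux", "séreuse", "séreuses",
         "sérofibrineux", "sérofibrineuse", "sérofibrineuses",
         "cutané", "cutanée", "cutanés", "cutanées", "voie"]
clusters = cluster(forms)
for name, members in clusters:
    print(name, members)
```

Because every stem is a node of the trie, all candidate stems are visited once and each paradigm ending is checked by a short walk below the node; handling incomplete paradigms would need a relaxed match, which this sketch leaves out.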

Acquisition from general lexicon: results

Source       | Known word entries | Remaining words to describe
Term-Union   |                    | 94,964
Initial UMLF | 19,599             | 81,595
Morphalou    | 6,617              | 74,978

Acquisition with guessing techniques: results

74,978 unknown forms
- 44,515 analyses from the Morphalou-based program
- 35,438 analyses from the UMLF-based program
- Cross-validation: 30,137 in common

Acquisition with guessing techniques: evaluation

Errors: 82 out of 1,000 (8.2%)

Error type            | Count
Wrong label           | 12
Proper names          | 49
Latin words           | 5
English words         | 1
Spelling/segmentation | 10
Other                 | 5
Total                 | 82

Acquisition of the full paradigm: Results

4,453 paradigms captured (complete or incomplete, grouping 9,352 word forms)
- 3,308 adjectives
- 514 nouns

Automatic extension for the full paradigms (with canonical forms only)

Manually checked for the others

General improvement

Source      | Forms added | Still unknown in Term-Union | Coverage
UMLF-v1     | 36,211      | 81,595                      | 14.1%
Morphalou   | 17,828      | 74,978                      | 21.0%
Acquisition | 8,088       | 70,602                      | 25.7%
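
The coverage figures are consistent with reading coverage as the share of the 94,964 Term-Union word forms that are no longer unknown. That formula is inferred from the numbers rather than stated in the slides; the short Python check below, with the hypothetical helper `coverage`, reproduces the three percentages under that assumption.

```python
TERM_UNION_FORMS = 94_964  # word forms in Term-Union

# Forms still unknown after each successive source, as reported in the table.
still_unknown = {"UMLF-v1": 81_595, "Morphalou": 74_978, "Acquisition": 70_602}


def coverage(unknown, total=TERM_UNION_FORMS):
    """Assumed definition: percentage of Term-Union forms that are described."""
    return 100 * (total - unknown) / total


for source, unknown in still_unknown.items():
    print(f"{source}: {coverage(unknown):.1f}%")
```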

Discussion and conclusion

The acquisition and evaluation of specialised lexical resources require a specific reference: Term-Union
- Extract (full) lexical information
- Assess lexical needs and target

Other acquisition techniques (CRF for inflectional information, rule-based techniques for derivational information)

Acknowledgment

This work was partially funded by the InterSTIS project (ANR-07-TECSAN-010).

InterSTIS project: www.interstis.org
