Date post: 12-Jan-2016
Medication Extraction from Clinical Data Using Frame Semantics

DIMITRIOS KOKKINAKIS, Centre for Language Technology, University of Gothenburg, dimitrios.kokkinakis@svenska.gu.se

www.svenska.gu.se www.clt.gu.se spraakbanken.gu.se

OVERVIEW

• Motivation: Semantic Annotation of Corpora and Event-Based Information Extraction, e.g. the i2b2 Medication Challenge
• Frame Semantics
• Medical Frames
• Pilot: Administration_of_Medication
• Design and Resources (so far…)
• Conclusion and Future Work

MOTIVATION (EXTRACTION OF FACTS AND EVENTS)

Semantic annotation of corpora for mining complex relations and events has gained considerable and growing attention in the medical domain.

Goal (work in progress): to develop an appropriate infrastructure for automatic event labeling in the clinical domain using hybrid techniques (e.g. supervised machine learning, rules, lexicons).

Event extraction can be modeled as a sequential tagging problem. Training and test data sets are (and will be) taken from Swedish medical corpora, while the Swedish FrameNet++ provides the basis for the event descriptions.
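Treating event extraction as sequential tagging usually means turning span annotations into one label per token. Here is a minimal sketch, assuming whitespace tokenization and the common BIO encoding. That encoding is our choice and not necessarily the one used in this work. The Swedish sentence and the frame element labels (Patient, Drug, Dosage, Route, Frequency) are invented for illustration.

```python
from typing import List, Tuple

# A frame element span: (first token, one past the last token, label)
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn frame element spans into one BIO tag per token.

    B-X marks the first token of element X, I-X its remaining tokens and
    O every token outside an annotated element. Spans are expected not to
    overlap; if they do, later spans overwrite earlier ones.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    # Invented example sentence and hypothetical frame element labels
    tokens = "Patienten fick Alvedon 500 mg p.o. 3 ggr/dag".split()
    spans = [(0, 1, "Patient"), (2, 3, "Drug"), (3, 5, "Dosage"),
             (5, 6, "Route"), (6, 8, "Frequency")]
    for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
        print(f"{token}\t{tag}")
```

A sequence labeler, such as a CRF, would then be trained to predict these tags. Its predicted tags can be decoded back into frame element spans.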

EVENT-BASED INFORMATION EXTRACTION

Information extraction (IE) is a technology that corresponds directly to the frame-like structures in FrameNet: templates in IE are themselves frame-like structures, with slots representing event information. Most event-based IE approaches are designed to identify role fillers that appear as arguments of event verbs or nouns, either explicitly via syntactic relations or implicitly via proximity.
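A frame-like template and the "implicit, via proximity" strategy can be made concrete with a short sketch. It assumes the entity mentions (with token positions and types) are already recognized. Each slot is filled with the nearest mention of the right type within a fixed window. The class names, slot names, entity types and the window size of 5 are hypothetical choices, not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mention:
    """A recognized mention: surface text, token position and entity type."""
    text: str
    position: int
    entity_type: str


@dataclass
class EventTemplate:
    """A frame-like IE template: an event trigger plus named slots."""
    trigger: Mention
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


def fill_by_proximity(trigger: Mention, mentions: List[Mention],
                      slot_types: Dict[str, str], window: int = 5) -> EventTemplate:
    """Fill each slot with the closest mention of the required entity type.

    This is implicit, proximity-based role filling: no syntax is consulted,
    only the token distance to the trigger, capped at `window` tokens.
    Slots without a candidate inside the window stay None.
    """
    template = EventTemplate(trigger=trigger)
    for slot, wanted_type in slot_types.items():
        candidates = [m for m in mentions
                      if m.entity_type == wanted_type
                      and abs(m.position - trigger.position) <= window]
        best = min(candidates,
                   key=lambda m: abs(m.position - trigger.position),
                   default=None)
        template.slots[slot] = best.text if best is not None else None
    return template


if __name__ == "__main__":
    # Invented mentions; in practice they would come from NER
    trigger = Mention("gavs", 3, "TRIGGER")
    mentions = [Mention("Waran", 1, "DRUG"), Mention("5 mg", 4, "DOSAGE"),
                Mention("Alvedon", 12, "DRUG")]
    slot_types = {"Drug": "DRUG", "Dosage": "DOSAGE", "Frequency": "FREQUENCY"}
    print(fill_by_proximity(trigger, mentions, slot_types).slots)
    # {'Drug': 'Waran', 'Dosage': '5 mg', 'Frequency': None}
```

"Alvedon" is ignored here because it lies 9 tokens from the trigger. This illustrates both the appeal and the weakness of proximity: it needs no parser, but the window is an arbitrary cut-off. The explicit, syntax-based variant would replace the distance test with a check on the dependency path.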

The “Medication Challenge” i2b2… (2009)

The Third i2b2 Workshop on NLP Challenges for Clinical Records (designed as an information extraction task) focused on the extraction of medications and medication-related information from discharge summaries.
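In the 2009 i2b2 challenge, each medication mention was to be reported with its dosage, mode (route), frequency, duration and reason, plus whether it appeared in a list or in narrative text. Below is a minimal parser sketch. It assumes the line-per-medication output layout as we recall it: `||`-separated fields such as `m="..."`, each followed by line:token offsets, with `nm` meaning "not mentioned". Please check this layout against the official challenge guidelines. The sample line and the readable field names are our own.

```python
import re
from typing import Dict, Optional

# Assumed field shape: key="value" optionally followed by offsets
FIELD_RE = re.compile(r'^\s*(\w+)="(.*?)"\s*(.*)$')

# Readable names for the assumed field keys (hypothetical naming)
FIELD_NAMES = {"m": "medication", "do": "dosage", "mo": "mode",
               "f": "frequency", "du": "duration", "r": "reason",
               "ln": "list_or_narrative"}


def parse_i2b2_entry(line: str) -> Dict[str, Optional[str]]:
    """Parse one medication entry into {field_name: value}.

    The value "nm" ("not mentioned") becomes None; offsets are dropped.
    """
    entry: Dict[str, Optional[str]] = {}
    for raw_field in line.strip().split("||"):
        match = FIELD_RE.match(raw_field)
        if match is None:
            raise ValueError(f"unexpected field: {raw_field!r}")
        key, value, _offsets = match.groups()
        entry[FIELD_NAMES.get(key, key)] = None if value == "nm" else value
    return entry


if __name__ == "__main__":
    # Made-up entry in the assumed format
    sample = ('m="lasix" 9:5 9:5||do="40 mg" 9:6 9:7||mo="po" 9:8 9:8||'
              'f="bid" 9:9 9:9||du="nm"||r="nm"||ln="list"')
    print(parse_i2b2_entry(sample))
```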

ADVANTAGES OF STRUCTURED DATA…

• get an overview of the medication ordered, in different dimensions
• help organize and improve the presentation of the EHR; advanced graphical presentation of EHR data
• create the basis for data mining and evidence-based medicine, e.g. for the epidemiological analysis of adverse events
• allow the automatic transmission of data to various registries
• aggregate data from many patients in repositories, facilitating e.g. open comparisons
• enable more reliable quality comparisons between different parts of the country / world
• create a database directly accessible to research, allowing the generation of new hypotheses and new (semantic) relationships
• improve patient safety, pharmacovigilance …

FRAME SEMANTICS…

The FrameNet approach is based on the linguistic theory of frame semantics, supported by corpus evidence. A semantic frame is a script-like structure of concepts that are linked to the meanings of linguistic units and associated with a specific event or state.

Each frame identifies a set of frame elements, i.e. frame-specific semantic roles. There are two kinds. Core roles (arguments) are tightly coupled with the particular meaning of the frame. More generic non-core roles (adjuncts or modifiers) are, to a large extent, event-independent.

When computers are used to extract semantic information for NLP tasks, FrameNet's mapping from words to frames and frame elements gives them a means to extract meaning from a string of words.
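The core/non-core distinction can be captured in a small data structure. This sketch also checks an annotation against a frame definition. The frame name comes from the presentation, but its lexical units and frame element sets below are hypothetical placeholders, not the actual SweFN definition. A missing core element is reported only as a hint, because FrameNet allows core elements to be left unexpressed (null instantiation).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Frame:
    """A semantic frame: its name, evoking lexical units and frame elements."""
    name: str
    lexical_units: FrozenSet[str]
    core_elements: FrozenSet[str]
    non_core_elements: FrozenSet[str]

    def check_annotation(self, annotation: Dict[str, str]) -> Dict[str, Set[str]]:
        """Report unfilled core elements and labels the frame does not define."""
        labels = set(annotation)
        known = self.core_elements | self.non_core_elements
        return {
            "missing_core": set(self.core_elements) - labels,
            "unknown": labels - known,
        }


if __name__ == "__main__":
    # Hypothetical, simplified inventory; not the real SweFN frame
    administration = Frame(
        name="Administration_of_medication",
        lexical_units=frozenset({"ge", "administrera", "ordinera"}),
        core_elements=frozenset({"Drug", "Patient"}),
        non_core_elements=frozenset({"Dosage", "Frequency", "Route", "Time"}),
    )
    print(administration.check_annotation(
        {"Drug": "Waran", "Dosage": "5 mg", "Strength": "?"}))
```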

FRAME SEMANTICS…

Thus, a word activates, or evokes, a frame of semantic knowledge relating to the specific concept it refers to. A semantic frame is a collection of facts that specify "characteristic features, attributes, and functions of a denotatum, and its characteristic interactions with things necessarily or typically associated with it". A semantic frame can also be defined as a coherent structure of concepts that are related in such a way that, without knowledge of all of them, one does not have complete knowledge of any one of them.

For example, one would not be able to understand the word sell without knowing anything about the situation of commercial transfer. That situation also involves a seller, a buyer, goods, money, the relation between the money and the goods, and so on.
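Computationally, "a word evokes a frame" is usually implemented as an index from lemmas to the frames they can evoke. Here is a minimal sketch. The frame names Commerce_sell and Commerce_buy are borrowed from Berkeley FrameNet, but the toy lexicon is for illustration only and is not an excerpt of FrameNet or SweFN. The input is assumed to be already lemmatized.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_lu_index(frames: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {frame name: lexical unit lemmas} into {lemma: frames it can evoke}.

    A lemma may evoke several frames, so each entry is a list.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for frame, lemmas in frames.items():
        for lemma in lemmas:
            index[lemma].append(frame)
    return dict(index)


def evoked_frames(lemmas: Iterable[str],
                  index: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return a (lemma, frame) pair for every frame each lemma can evoke."""
    return [(lemma, frame) for lemma in lemmas for frame in index.get(lemma, [])]


if __name__ == "__main__":
    # Toy lexicon for illustration only
    toy_frames = {"Commerce_sell": ["sell"], "Commerce_buy": ["buy", "purchase"]}
    index = build_lu_index(toy_frames)
    lemmas = ["the", "pharmacy", "sell", "the", "drug"]
    print(evoked_frames(lemmas, index))  # [('sell', 'Commerce_sell')]
```

When a lemma maps to more than one frame, a separate disambiguation step has to pick the frame before its roles can be labeled.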

RELEVANT APPLICATIONS…

"FN began collaborations with two industrial partners this year. One is with a defense contractor to develop frames and annotation for reports written by U.S. soldiers after patrols in Afghanistan and Iraq. The other is a partnership with Siemens Research U.S. to develop frames and annotation for medical texts, such as medical textbooks and guidelines for the treatment of diseases."

http://www.icsi.berkeley.edu/pubs/icsi/2011AnnualReport.pdf

A slide from an LREC 2012 presentation (closing session)

MEDICALLY ORIENTED FRAMES

https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=frame_report&name=Medical_intervention

Swedish MEDICALLY ORIENTED FRAMES

Administration_of_medication, Addiction, Birth, Death, Experience_bodily_harm, Falling_ill, Health_response, Institutionalization, Medical_disorders, Medical_instruments, Medical_interaction_scenario, Medical_professionals, Medical_specialties, Medical_treatment, Observable_bodyparts, People_by_disease, Recovery, …

http://spraakbanken.gu.se/eng/research/swefn/development-version

Example Frame: CURE

http://spraakbanken.gu.se/eng/research/swefn/development-version

http://spraakbanken.gu.se/eng/research/swefn/development-version

Example

Frame: Administration_of_Medication

CORE Frame Elements

NON-CORE Frame Elements

Design so far… Resources in Use

1. FASS, the Swedish national formulary: contains a list of medicines that are approved for prescription throughout Sweden
2. Swedish SNOMED CT’s Substance hierarchy: contains “concepts that can be used for recording active chemical constituents of drug products, food and chemical allergens, adverse reactions, toxicity or poisoning information, and physicians and nursing orders” <http://www.ihtsdo.org/snomed-ct/snomed-ct0/snomed-ct-hierarchies/substance/>
3. Swedish MeSH’s category D, Chemicals and Drugs (5,886)
4. Drug lexicon extensions (e.g. generic expressions of drugs, detecting misspellings)
5. List of relevant abbreviations + variants: iv, i.v., im, i.m., sc, s.c., po, p.o., vb, v.b., V b, T, inj., tbl, …
6. …
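The abbreviation variants and misspelling detection listed above both amount to normalizing a token before lexicon lookup. Here is a minimal sketch. It assumes that stripping dots and spaces is enough to unify variants such as i.v./iv and V b/vb. Python's difflib serves as a stand-in fuzzy matcher; that is our choice, since the slide does not say how misspellings are detected. The expansions are standard readings of these abbreviations, and "T" is left unmapped rather than guessed. The four-word drug lexicon is a hypothetical stand-in for FASS-derived names.

```python
import difflib
import re
from typing import List, Optional

# Canonical expansions keyed by the dot- and space-free, lowercased form.
# "T" from the slide's list is deliberately left out rather than guessed.
ABBREVIATIONS = {
    "iv": "intravenous",
    "im": "intramuscular",
    "sc": "subcutaneous",
    "po": "per os (oral)",
    "vb": "vid behov (as needed)",
    "inj": "injection",
    "tbl": "tablet",
}


def normalize_abbreviation(token: str) -> Optional[str]:
    """Map variants such as 'i.v.', 'IV' or 'V b' to one canonical expansion."""
    key = re.sub(r"[.\s]", "", token).lower()
    return ABBREVIATIONS.get(key)


def lookup_drug(token: str, lexicon: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Exact lexicon lookup first, then the closest fuzzy match to catch misspellings."""
    word = token.lower()
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical mini-lexicon standing in for FASS-derived drug names
    lexicon = ["alvedon", "waran", "metoprolol", "paracetamol"]
    print(normalize_abbreviation("i.v."), normalize_abbreviation("V b"))
    print(lookup_drug("Metoprolo", lexicon))
```

The cut-off of 0.8 is arbitrary. A lower value catches more misspellings but also maps unrelated words onto drug names, which is a real risk with short brand names.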

Design so far… Resources in Use

1. Named Entity Recognition for the relevant entities:
   1. Drug Names
   2. Time
   3. Frequency
2. Terminology Recognition:
   1. MeSH
   2. SNOMED CT
3. (ongoing) Manual annotation with the frame elements
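Time and frequency expressions in Swedish prescriptions are often regular enough for a rule-based first pass. Here is a minimal sketch that does not claim to reflect the recognizer used in this work. The three patterns ("1 x 3", "3 ggr/dag", "kl 08") are hypothetical examples of common surface forms, and real coverage would need many more. Drug names are assumed to come from lexicon lookup instead.

```python
import re
from typing import List, Tuple

# Hypothetical surface patterns; drug names are left to lexicon lookup.
PATTERNS = [
    # "1 x 3", "2x2": units per dose times doses per day
    ("Frequency", re.compile(r"\b\d+\s*x\s*\d+\b")),
    # "3 ggr/dag", "2 ggr per dag" (ggr = gånger, "times")
    ("Frequency", re.compile(r"\b\d+\s*ggr\s*(?:/|per\s+)dag\b", re.IGNORECASE)),
    # "kl 08", "kl. 20.00" (kl = klockan, "o'clock")
    ("Time", re.compile(r"\bkl\.?\s*\d{1,2}(?:[.:]\d{2})?\b", re.IGNORECASE)),
]


def tag_entities(text: str) -> List[Tuple[str, int, int, str]]:
    """Return (label, start, end, surface text) for every match, sorted by start."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda entity: entity[1])


if __name__ == "__main__":
    # Invented prescription fragment
    print(tag_entities("Alvedon 500 mg 1 x 3, kl 22"))
```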

Richard Johansson, Karin Friberg Heppin, Dimitrios Kokkinakis. Semantic Role Labeling with the Swedish FrameNet. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'12), pp. 3697–3700. Istanbul, Turkey, 2012.

CONCLUSIONS

The driving force for the experiments is frame semantics. It allows us to work with a more holistic and detailed semantic event description than is possible with, for instance, most traditional approaches based on binary relation extraction.

Event extraction is more complicated and challenging than relation extraction. Events usually have an internal structure involving several entities as participants, which allows a detailed representation of more complex statements.

Preliminary results suggest that SweFN++ is a good starting point for annotating corpora. The role set described is general enough to capture a wide range of the phenomena that characterize the majority of semantic arguments of general medical events.

FUTURE WORK

Larger annotated corpora are needed for larger-scale experiments (which are planned…).

We are currently working on:
• extending, refining and encoding new frames according to the BFN (Berkeley FrameNet) descriptions
• manually annotating larger corpora
• investigating how existing frame descriptions can actually capture the semantics
• continuing with more experiments (methods, software, larger data sets) on learning to annotate the arguments
• using a richer set of features, particularly syntactic information and the distance between the arguments
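The planned richer feature set could start from something like the sketch below. For a candidate argument token, it computes its distance and direction relative to the event trigger, plus simple lexical and part-of-speech context. The feature names and the SUC-style tags in the example are our illustrative assumptions. Syntactic (dependency) features are omitted because they would require a parser.

```python
from typing import Dict, List


def argument_features(tokens: List[str], pos_tags: List[str],
                      trigger_idx: int, cand_idx: int) -> Dict[str, object]:
    """Features for deciding whether token cand_idx fills a role of the trigger.

    Includes token distance and direction between candidate and trigger,
    plus the words and the POS sequence strictly between them.
    """
    distance = cand_idx - trigger_idx
    lo, hi = sorted((trigger_idx, cand_idx))
    return {
        "cand_word": tokens[cand_idx].lower(),
        "cand_pos": pos_tags[cand_idx],
        "trigger_word": tokens[trigger_idx].lower(),
        "distance": abs(distance),
        "direction": "right" if distance > 0 else "left" if distance < 0 else "same",
        "pos_between": "_".join(pos_tags[lo + 1:hi]),
    }


if __name__ == "__main__":
    # Invented sentence with illustrative SUC-style POS tags
    tokens = ["Patienten", "fick", "Waran", "5", "mg"]
    pos_tags = ["NN", "VB", "PM", "RG", "NN"]
    for cand in (0, 2, 4):
        print(argument_features(tokens, pos_tags, trigger_idx=1, cand_idx=cand))
```

Dictionaries of this shape can be fed to most feature-based classifiers after one-hot encoding the string-valued features.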

…related REFERENCES

• Sigfried Gold, Noémie Elhadad, Xinxin Zhu, James J. Cimino, George Hripcsak. Extracting Structured Medication Event Information from Discharge Summaries. AMIA Annu Symp Proc 2008;2008:237–241.

• Jon Patrick, Min Li. High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge. J Am Med Inform Assoc 2010;17:524–527.

• Louise Deléger, Cyril Grouin, Pierre Zweigenbaum. Extracting medical information from narrative patient records: the case of medication-related information. J Am Med Inform Assoc 2010;17:555–558.

• Son Doan, Lisa Bastarache, Sergio Klimkowski, Joshua C. Denny, Hua Xu. Integrating existing natural language processing tools for medication extraction from discharge summaries. J Am Med Inform Assoc 2010;17:528–531.

• Thierry Hamon, Natalia Grabar. Linguistic approach for identification of medication names and related information in clinical narratives. J Am Med Inform Assoc 2010;17:549–554.

• Scott Russell Halgrim, Fei Xia, Imre Solti, Eithon Cadag, Özlem Uzuner. A cascade of classifiers for extracting medication information from discharge summaries. J Biomed Semantics 2011;2(Suppl 3):S2.

