
INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGIES

BULGARIAN ACADEMY OF SCIENCES

Automatic Information Extraction from Patient Records in Bulgarian Language

http://www.iict.bas.bg/acomin

6/27/2013

Galia Angelova

Institute of Information and Communication Technologies

Bulgarian Academy of Sciences

AComIn: Advanced Computing for Innovation

Outline

• First: thanks a lot for the invitation!

• Automatic analysis of medical texts and extraction of patient-related data: WHY?

• Specific features of Bulgarian clinical narratives and hospital discharge letters

• Achievements

• Current work

• Conclusion

• Acknowledgements


Biomedical NLP (Natural Language Processing)

• Started in the USA in the 1980s

• Task 1: automatic encoding of diagnoses, procedures, …, described in clinical texts

• A hot topic in the “secondary use” of EHR data

Bulgarian clinical texts

• The Latin terminology is a real challenge

• Mixture of medical terminology in Latin and Bulgarian. Example:

Angiosclerosis vas. retinae hypertonica. Начални промени по типа на диабетна ретинопатия.
(The Bulgarian part reads: "Initial changes of the diabetic retinopathy type.")

• Latin terms transliterated with Cyrillic letters:

Диагноза: Хипотиреоидизмус постоператива компенсата.
("Diagnosis: postoperative hypothyroidism, compensated"; the diagnosis itself is Latin written in Cyrillic.)
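Separating the two scripts is a natural first step with such mixed text. The sketch below is an illustration, not the tooling used in this work: it labels whitespace-separated tokens by the Unicode script of their letters, using only Python's standard `unicodedata` module. Script alone handles only the first case; a transliterated term such as Хипотиреоидизмус is written in Cyrillic and would still need a lexicon of known Latin terms.

```python
import unicodedata


def token_script(token: str) -> str:
    """Label a token 'latin', 'cyrillic', 'mixed' or 'other' by the script of its letters."""
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue  # skip digits and punctuation, e.g. the dot in "vas."
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("cyrillic")
        else:
            scripts.add("other")
    if not scripts:
        return "other"  # no letters at all, e.g. "150/90"
    if len(scripts) > 1:
        return "mixed"
    return scripts.pop()


def script_profile(text: str) -> dict[str, int]:
    """Count tokens per script label in a whitespace-tokenised text."""
    counts: dict[str, int] = {}
    for token in text.split():
        label = token_script(token)
        counts[label] = counts.get(label, 0) + 1
    return counts


example = ("Angiosclerosis vas. retinae hypertonica. "
           "Начални промени по типа на диабетна ретинопатия.")
print(script_profile(example))  # {'latin': 4, 'cyrillic': 7}
```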

Bulgarian clinical texts

• About 1/4 of the wordforms and strings are non-Bulgarian (not counting misspellings)

Bulgarian clinical texts

• Corpus of 6200 anonymised hospital discharge letters of diabetic patients

Terms | Wordforms | Basic words | Abbreviations
Bulgarian | 601 233 | 12 009 (63%) | 1 471
Latin | 18 926 | 560 (3%) | 1 189
Transliterations | 179 589 | 6 465 (34%) | 982
Total | 799 748 | 19 034 | 3 642

Slide annotation on the Bulgarian row: >50% “unknown”
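As a quick consistency check, the sketch below recomputes the column totals and the basic-word shares from the table. It assumes that "non-Bulgarian" in the "about 1/4" statement means Latin plus transliterated wordforms; under that reading the share comes out at 24.8%, in line with the stated figure.

```python
# Counts copied from the corpus table: (wordforms, basic words, abbreviations)
CORPUS = {
    "Bulgarian": (601_233, 12_009, 1_471),
    "Latin": (18_926, 560, 1_189),
    "Transliterations": (179_589, 6_465, 982),
}
COLUMNS = ("wordforms", "basic words", "abbreviations")


def column_total(column: str) -> int:
    i = COLUMNS.index(column)
    return sum(row[i] for row in CORPUS.values())


def shares(column: str) -> dict[str, float]:
    """Percentage of each row within the given column, rounded to one decimal."""
    i = COLUMNS.index(column)
    total = column_total(column)
    return {name: round(100 * row[i] / total, 1) for name, row in CORPUS.items()}


# Assumption: "non-Bulgarian" = Latin + transliterated wordforms
non_bg = CORPUS["Latin"][0] + CORPUS["Transliterations"][0]
print([column_total(c) for c in COLUMNS])  # [799748, 19034, 3642]
print(shares("basic words"))  # {'Bulgarian': 63.1, 'Latin': 2.9, 'Transliterations': 34.0}
print(round(100 * non_bg / column_total("wordforms"), 1))  # 24.8
```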

Bulgarian clinical texts

Common features of a medical sublanguage:

• Phrases instead of complete sentences

• Much implicit or tacit knowledge is needed for proper understanding

• Only a few types of negation

• Mostly only the facts relevant to the focus are documented; e.g., in a specialised diabetes hospital, facts related to other diseases might be ignored

• Many results of clinical tests are entered as free text when the tests are done outside the hospital
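Because only a few negation types occur, even a small trigger list can go a long way. The following is a minimal NegEx-style heuristic, not the method used in this work; the trigger words and the token window are assumptions chosen for illustration. Triggers may follow the concept, as in the sample letter's "Фамилна обремененост: отрича." ("Family history: denies."), so the window looks both ways.

```python
import re

# Hypothetical triggers: "не" (not), "без" (without), "отрича" (denies), "липсва"/"липсват" (absent)
NEGATION_TRIGGERS = {"не", "без", "отрича", "липсва", "липсват"}


def tokens(text: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so Cyrillic words stay whole
    return re.findall(r"\w+", text.lower())


def is_negated(sentence: str, concept: str, window: int = 4) -> bool:
    """True if a negation trigger occurs within `window` tokens of `concept`, on either side."""
    toks, target = tokens(sentence), tokens(concept)
    n = len(target)
    if n == 0:
        return False
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == target:
            context = toks[max(0, i - window):i] + toks[i + n:i + n + window]
            if NEGATION_TRIGGERS.intersection(context):
                return True
    return False


print(is_negated("Фамилна обремененост: отрича.", "Фамилна обремененост"))  # True
print(is_negated("Има отпадналост, неспокоен сън.", "отпадналост"))  # False
```

Matching whole tokens rather than substrings matters here: "неспокоен" ("restless") starts with "не" but is not a negation.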

BG hospital discharge letters

• 2-3 pages, structured into sections (by law)

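BG hospital discharge letters

• 77г. - ж
  (77 years old, female)

• гр. София
  (city of Sofia)

• Диагноза: Захарен диабет тип 2, с вторична резистентност към СУП. Полиневропатия диабетика. Нефропатия диабетика инципиенс. Тиреоидитис Хашимото – хипотиреоиден стадии. Анемия пернициоза. Двустранна глухота.
  (Diagnosis: Type 2 diabetes mellitus with secondary resistance to sulphonylurea drugs (СУП). Diabetic polyneuropathy. Incipient diabetic nephropathy. Hashimoto's thyroiditis, hypothyroid stage. Pernicious anaemia. Bilateral deafness.)

• Анамнеза: Постъпва за пореден път в клиниката за контрол на състоянието. Зах. диабет тип ІІ с 20г. давност, открит случайно при изследвания по друг повод. От 11г. е на лечение с инсулин, …. Оплаквания при постъпването изцяло от страна на крайниците, изброени по – горе.
  (Case history: Admitted to the clinic once again for monitoring of her condition. Type II diabetes of 20 years' standing, discovered incidentally during tests for another reason. On insulin treatment for 11 years, …. Complaints on admission relate entirely to the limbs, as listed above.)

• Минали заболявания: Нефролитиазис билатералис.
  (Past diseases: bilateral nephrolithiasis.)

• Фамилна обремененост :отрича.
  (Family history: denies.)

• Рискови фактори – алергия към пеницилини и Аналгин.
  (Risk factors: allergy to penicillins and Analgin.)

• Статус: Жена на видима възраст около действителната, в задоволително общо състояние, ориентирана, …
  (Status: Woman appearing about her actual age, in satisfactory general condition, oriented, …)

• Изследвания: СУЕ – 22 , Хб - 133 , Ер – 4,6, Хт – 0,42 , Левк – 4,8 , МСV – 91,4; Тр - 258, HDL-chol – 1.28, общ хол. – 4,8, 3-гл – 1,07.; …..
  (Tests: ESR 22, Hb 133, RBC 4.6, Hct 0.42, WBC 4.8, MCV 91.4; PLT 258, HDL-chol 1.28, total chol. 4.8, TG 1.07; …..)

• Обсъждане: …..
  (Discussion: …..)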
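Since the letters are divided into sections, a pipeline can split them by heading before running section-specific extractors. The sketch below assumes that each section starts at the beginning of a line with one of the headings seen in the sample above (the complete list prescribed by law is not given here), and it parses the "Изследвания" (Tests) line into name/value pairs, converting decimal commas. It is an illustration only; real letters will contain variants these patterns miss.

```python
import re

# Headings seen in the sample letter; the full legally required list is not given here.
HEADINGS = ("Диагноза", "Анамнеза", "Минали заболявания", "Фамилна обремененост",
            "Рискови фактори", "Статус", "Изследвания", "Обсъждане")
# A heading at line start, followed by ":" or a dash (the sample uses both)
HEADING_RE = re.compile(r"^\s*(" + "|".join(HEADINGS) + r")\s*[:–-]\s*", re.MULTILINE)
# "<test name> <dash> <value>", decimal comma or point, e.g. "Ер – 4,6" or "HDL-chol – 1.28"
LAB_RE = re.compile(r"(\w[\w.\- ]*?)\s*[–-]\s*(\d+(?:[.,]\d+)?)")


def split_sections(letter: str) -> dict[str, str]:
    """Map each recognised heading to the text up to the next heading."""
    matches = list(HEADING_RE.finditer(letter))
    sections = {}
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(letter)
        sections[m.group(1)] = letter[m.end():end].strip()
    return sections


def parse_labs(text: str) -> dict[str, float]:
    """Extract 'name - value' pairs, converting decimal commas to points."""
    return {name.strip(): float(value.replace(",", "."))
            for name, value in LAB_RE.findall(text)}


letter = (
    "Диагноза: Захарен диабет тип 2, с вторична резистентност към СУП.\n"
    "Фамилна обремененост :отрича.\n"
    "Рискови фактори – алергия към пеницилини и Аналгин.\n"
    "Изследвания: СУЕ – 22 , Хб - 133 , Ер – 4,6, Хт – 0,42 , HDL-chol – 1.28, 3-гл – 1,07."
)
sections = split_sections(letter)
print(parse_labs(sections["Изследвания"]))
# {'СУЕ': 22.0, 'Хб': 133.0, 'Ер': 4.6, 'Хт': 0.42, 'HDL-chol': 1.28, '3-гл': 1.07}
```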

Completed projects


PSIP: an IP in FP7 ICT eHealth

• Patient Safety through Intelligent Procedures in medication

• Extension of a running project with 14 core partners

• On average, 1.8 drugs per patient are recorded in the hospital information system (HIS), but 5.6 are mentioned in the free text

BG-Resources – ICD & Tabular Index

• ICD-10: 14 439 lines, each pairing a ‘code’ with a ‘text description’, e.g.

E66.8 'Other obesity'

• ‘Instructions’ for using ICD:

– 19 161 BG words

– 291 116 BG wordforms

– 2 221 Latin terms (11.59% of the BG words)

– 83 713 occurrences of Latin terms (28.76% of the BG wordforms)

– 76 939 descriptions of 9 044 ICD codes
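The slides list these resources but do not describe how diagnoses are matched against them. As one simple possibility (a swapped-in technique, not the authors' algorithm), the sketch below loads code/description lines into an index and picks the ICD code whose description has the highest word overlap (Jaccard score) with a diagnosis string. The tab-separated file format, the 0.6 threshold and the sample lines (taken from the reimbursement record shown later in this talk) are assumptions.

```python
import re

ICD_LINES = [  # "code<TAB>description"; this file format is an assumption
    "E11.4\tНеинсулинозависим захарен диабет с неврологични усложнения",
    "E06.9\tТиреоидит, неуточнен",
    "I11\tХипертонична болест на сърцето",
    "I20\tСтенокардия",
]


def normalise(text: str) -> frozenset[str]:
    """Lower-cased word set; punctuation and word order are ignored."""
    return frozenset(re.findall(r"\w+", text.lower()))


def load_icd(lines) -> dict[str, frozenset[str]]:
    index = {}
    for line in lines:
        code, sep, description = line.partition("\t")
        if sep and description.strip():
            index[code.strip()] = normalise(description)
    return index


def encode(diagnosis: str, index, threshold: float = 0.6):
    """Return the code whose description has the highest Jaccard overlap, or None."""
    query = normalise(diagnosis)
    if not query:
        return None
    best_code, best_score = None, 0.0
    for code, desc in index.items():
        score = len(query & desc) / len(query | desc)
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= threshold else None


index = load_icd(ICD_LINES)
print(encode("Захарен диабет неинсулинозависим с неврологични усложнения", index))  # E11.4
print(encode("Хипертония", index))  # None
```

The last call shows why morphology and terminology variants matter: "Хипертония" ("hypertension") shares no word form with "Хипертонична болест на сърцето", so plain word overlap finds nothing.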

BG-Resources – Drug names

• 1500 drug names manually translated to Bulgarian, to fill in the ATC classification with Bulgarian drug names

• Training to acquire grammatical patterns

Results for 6200 discharge letters

• Precision: % of correct items among all items found

• Recall: % of correct items found among all items actually present

Entity | Occurrences | Precision | Recall | F
Diagnoses | 26 826 | 97.30% | 74.69% | 84.50%
Drugs | 160 892 | 97.28% | 99.59% | 98.42%
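The F values in the table follow from the definitions above: F is the harmonic mean of precision and recall. A short check in Python, recomputing F from the reported percentages (which are themselves rounded, so agreement holds only up to rounding):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = tp/(tp+fp), recall = tp/(tp+fn), and their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f_measure(precision, recall)


print(round(100 * f_measure(0.9730, 0.7469), 1))  # 84.5  (diagnoses)
print(round(100 * f_measure(0.9728, 0.9959), 2))  # 98.42 (drugs)
```

The harmonic mean is pulled towards the lower of the two values, which is why the diagnoses row, with 97.30% precision but 74.69% recall, ends at 84.50%.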

Drugs at hospitalisation day 0

• Contextualisation: timing of drug events

• Using the Anamnesis (Case History) section only

• 500 drugs in 6200 discharge letters

• Careful training on suitable phrases

• Precision: 88%

• Recall: 92.45%

• F (harmonic mean): 90.17%

• Award for best paper on EHR at EFMI 2011
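The slides do not spell out the phrases used to decide whether a drug is taken at admission. A rule-based stand-in might treat the present-tense verb "приема" ("takes") in an Anamnesis sentence as marking current intake, while past forms such as "приемала" ("took") do not. The drug lexicon and cue lists below are hypothetical and drawn from the anamnesis text in the reimbursement example later in this talk.

```python
import re

# Hypothetical lexicon and cue lists (not the trained phrases used in the original work)
DRUG_LEXICON = {"метфогамма", "енап", "верапамил", "тиоктацид", "инсулин"}
CURRENT_CUES = {"приема"}            # "takes" (present tense)
PAST_CUES = {"приемал", "приемала"}  # "took" (past tense, m./f.)


def drugs_at_admission(anamnesis: str) -> set[str]:
    """Drugs mentioned in sentences that describe current intake (day 0)."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", anamnesis):
        words = set(re.findall(r"\w+", sentence.lower()))
        if words & CURRENT_CUES and not words & PAST_CUES:
            found |= words & DRUG_LEXICON
    return found


text = ("От 9 год. има Захарен диабет. Приема Метфогамма 3 х 1000 мг. дн. "
        "Има Хипертония , ИБС. Приема Енап и Верапамил. "
        "Преди години приемала Тиоктацид.")  # last sentence invented for illustration
print(sorted(drugs_at_admission(text)))  # ['верапамил', 'енап', 'метфогамма']
```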

EVTIMA: Building event timelines

• Events are important (together with their time and modality)

• A primitive event (in our context) is:

• (1) a diagnosis,

• (2) a drug,

• (3) a condition: a complaint, a symptom, or a change in the status that signals an abnormality, e.g.

– high BP

– decompensation of diabetes mellitus

– increased levels of serum creatinine

• Complex event: an aggregation of primitive events, e.g. of all drugs
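One way to represent these notions in code is sketched below. The class and field names, the modality label and the aggregation criterion (all events of one type) are illustrative assumptions, not the EVTIMA data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EventType(Enum):
    DIAGNOSIS = "diagnosis"
    DRUG = "drug"
    CONDITION = "condition"  # complaint, symptom or abnormal status change


@dataclass(frozen=True)
class PrimitiveEvent:
    kind: EventType
    text: str
    start: Optional[str] = None  # id of the time marker where the event begins
    end: Optional[str] = None    # id of the time marker where it ends
    modality: str = "asserted"   # hypothetical label set, e.g. "asserted", "negated"


@dataclass
class ComplexEvent:
    label: str
    members: list[PrimitiveEvent] = field(default_factory=list)


def aggregate(events: list[PrimitiveEvent], kind: EventType, label: str) -> ComplexEvent:
    """Group all primitive events of one type, e.g. every drug, into a complex event."""
    return ComplexEvent(label, [e for e in events if e.kind is kind])


events = [
    PrimitiveEvent(EventType.DIAGNOSIS, "Захарен диабет тип 2", start="-20y", end="admission"),
    PrimitiveEvent(EventType.DRUG, "Метфогамма"),
    PrimitiveEvent(EventType.DRUG, "Енап"),
    PrimitiveEvent(EventType.CONDITION, "high BP"),
]
therapy = aggregate(events, EventType.DRUG, "all drugs")
print([e.text for e in therapy.members])  # ['Метфогамма', 'Енап']
```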

Temporal expressions

• Dates (day/month/year)

• Year or month only

• Prepositional phrases containing temporal information

• Classified into

– Absolute

– Relative: anchored to the hospitalisation date, the birthdate, or an event, e.g. a previously mentioned moment (“since then”) or another event (“since puberty”)
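A rule-based classifier for such expressions could be sketched as follows. The patterns are assumptions for illustration: dates are taken to be written as dd.mm.yyyy; durations such as "от 11г." ("for 11 years") or "с 20г. давност" ("of 20 years' duration") are anchored to the admission date; "оттогава" ("since then") points to a previous event; and other "от ..." phrases such as "от пубертета" ("since puberty") point to another event.

```python
import re

# (anchor, kind, pattern); patterns are tried in order on the lower-cased expression.
RULES = [
    ("absolute", "date", re.compile(r"^\d{1,2}\.\d{1,2}\.\d{4}(\s*г\.?)?$")),
    ("absolute", "month/year", re.compile(r"^\d{1,2}\.\d{4}(\s*г\.?)?$")),
    ("absolute", "year", re.compile(r"^(от\s+)?(19|20)\d{2}(\s*г\.?)?$")),  # "(от) 2005 г."
    ("relative:admission", "duration",
     re.compile(r"^(от|преди)\s+\d{1,2}\s*(г\.?|год\.?|години)$")),  # "от 11г.", "преди 5 години"
    ("relative:admission", "duration",
     re.compile(r"^с\s+\d{1,2}\s*(г\.?|год\.?|години)\s*давност$")),  # "с 20г. давност"
    ("relative:previous-event", "anaphoric", re.compile(r"^оттогава$")),  # "since then"
    ("relative:other-event", "event-anchored", re.compile(r"^от\s+\w+$")),  # "от пубертета"
]


def classify(expression: str):
    """Return (anchor, kind) for a temporal expression, or None if no rule applies."""
    text = expression.strip().lower()
    for anchor, kind, pattern in RULES:
        if pattern.match(text):
            return anchor, kind
    return None


for expr in ("12.03.2010", "2008 г.", "От 11г.", "с 20г. давност", "от пубертета"):
    print(expr, "->", classify(expr))
```

Rule order matters: the year rule comes before the generic "от ..." rule, so "от 2005 г." ("since 2005") is treated as absolute rather than event-anchored.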

Ordering events on time lines

• Algorithm based on a directed multigraph representation

• Time markers are nodes (states)

• The edges represent primitive events, incident with their beginning and end time nodes

• Two graphs are generated: one for the relative and one for the absolute time scale
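Here is a minimal sketch of the idea, under stated assumptions. Time markers are strings, and parallel edges between the same two markers are kept (hence a multigraph). The order of markers is obtained by a topological sort with Python's `graphlib.TopologicalSorter` (Python 3.9+); the slide does not say how the original algorithm orders nodes, so this step is an assumption. Separate instances would hold the relative and the absolute time scale.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


class TimelineGraph:
    """Directed multigraph: time markers are nodes, primitive events are labelled edges."""

    def __init__(self) -> None:
        self.edges: dict[tuple[str, str], list[str]] = defaultdict(list)  # parallel edges allowed
        self.before: dict[str, set[str]] = defaultdict(set)  # node -> markers known to precede it

    def add_event(self, label: str, start: str, end: str) -> None:
        self.edges[(start, end)].append(label)
        if start != end:  # a point event (start == end) adds no ordering constraint
            self.before[end].add(start)

    def add_order(self, earlier: str, later: str) -> None:
        self.before[later].add(earlier)

    def ordered_events(self) -> list[str]:
        nodes = set(self.before)
        for preds in self.before.values():
            nodes.update(preds)
        for start, end in self.edges:
            nodes.update((start, end))
        # Sorting makes the result deterministic when some markers are unordered
        graph = {n: sorted(self.before.get(n, ())) for n in sorted(nodes)}
        rank = {n: i for i, n in enumerate(TopologicalSorter(graph).static_order())}
        keyed = [(rank[s], rank[e], label)
                 for (s, e), labels in self.edges.items() for label in labels]
        return [label for _, _, label in sorted(keyed)]


relative = TimelineGraph()
relative.add_order("-20y", "-11y")
relative.add_order("-11y", "admission")
relative.add_event("diabetes mellitus type 2", "-20y", "admission")
relative.add_event("insulin", "-11y", "admission")
relative.add_event("complaints in the limbs", "admission", "admission")
print(relative.ordered_events())
# ['diabetes mellitus type 2', 'insulin', 'complaints in the limbs']
```

Contradictory ordering constraints make `static_order()` raise `graphlib.CycleError`, which is one way such conflicts could be surfaced.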


Evaluation: training/test sets of 1300/6200 discharge letters

• Average number of primitive events per discharge letter: 20.69

• In the training/test set: 371/565 different diagnoses (patients have similar diagnoses and treatments)

• In the test set:

– 1,349 dates (day/month/year),

– 2,698 markers (year and/or month only),

– 2,362 markers for relative time periods,

– 2,351 concerning the admission date

• Distribution of temporal markers:

– 38% attached to events presenting diagnoses

– 47% attached to events expressing drug administration / change

– 15% attached to complaints and conditions

Accuracy

Group | Type | Precision % | Recall % | F %
Event | Drugs | 97.28 | 99.59 | 98.42
Event | Diagnoses | 97.30 | 74.68 | 84.50
Event | Complaints | 97.98 | 96.82 | 97.40
Time | Dates | 98.86 | 98.21 | 98.53
Time | Duration | 99.14 | 98.26 | 98.70
Time | Frequency | 92.25 | 95.51 | 93.85

Current work

• Reimbursement for diabetic patients (ICD E11) is a major budget item of the Health Insurance Fund (HIF):

– 2011: > 61 million leva

– 2012: > 77 million leva

– 2013 (January to March): > 20 million leva

– a tendency to use more expensive drugs

• Principal application goal: to do something useful with the millions of records stored in the Health Insurance Fund (they contain a lot of free text)

• Ambition of the very active medical partners

Files submitted for reimbursement

<Pay>1</Pay>

<Patient>

<EGN>29d53d021a8ea04f8a58b0b7b17ca901d471c111</EGN> <!-- PSEUDONYM: pseudonymised personal identification number (EGN) -->

<RZOK>22</RZOK>

<ZdrRajon>01</ZdrRajon>

……

<age>68</age>

<gender>2</gender> </Patient>

<MainDiag>

<imeMD>Неинсулинозависим захарен диабет с неврологични усложнения</imeMD> <!-- Non-insulin-dependent diabetes mellitus with neurological complications -->

<MKB>E11.4</MKB> </MainDiag>

<Diag>

<imeD>Диабетна полиневропатия (Е10-Е14 с общ четвърти знак .4)</imeD> <!-- Diabetic polyneuropathy (E10-E14 with common fourth character .4) -->

<MKB>G63.2</MKB> </Diag>

<Diag>

<imeD>Тиреоидит, неуточнен</imeD> <!-- Thyroiditis, unspecified -->

<MKB>E06.9</MKB> </Diag>

<Diag>

<imeD>Хипертонична болест на сърцето</imeD> <!-- Hypertensive heart disease -->

<MKB>I11</MKB> </Diag>

<Diag>

<imeD>Стенокардия</imeD> <!-- Angina pectoris -->

<MKB>I20</MKB>

</Diag>
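Because the diagnoses in these files are already coded, the structured part is easy to read programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes the elements sit under a single root, here named `Record` (a hypothetical name, since the actual root element is not shown), and reads only the age and the MKB (ICD-10) codes.

```python
import xml.etree.ElementTree as ET


def read_diagnoses(record_xml: str) -> dict:
    """Extract the patient's age and the ICD-10 (MKB) codes from one reimbursement record."""
    root = ET.fromstring(record_xml)
    age = root.findtext("Patient/age")
    return {
        "age": int(age) if age is not None else None,
        "main": root.findtext("MainDiag/MKB"),
        "other": [d.findtext("MKB") for d in root.findall("Diag")],
    }


def has_type2_diabetes(codes) -> bool:
    """True if any code falls under ICD-10 E11 (e.g. E11.4)."""
    return any(c is not None and c.split(".")[0] == "E11" for c in codes)


SAMPLE = """<Record>
  <Pay>1</Pay>
  <Patient><age>68</age><gender>2</gender></Patient>
  <MainDiag><MKB>E11.4</MKB></MainDiag>
  <Diag><MKB>G63.2</MKB></Diag>
  <Diag><MKB>E06.9</MKB></Diag>
  <Diag><MKB>I11</MKB></Diag>
  <Diag><MKB>I20</MKB></Diag>
</Record>"""

info = read_diagnoses(SAMPLE)
print(info)  # {'age': 68, 'main': 'E11.4', 'other': ['G63.2', 'E06.9', 'I11', 'I20']}
print(has_type2_diabetes([info["main"], *info["other"]]))  # True
```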

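Files submitted for reimbursement

• <Anamnesa>От 9 год. има Захарен диабет, установен по повод умерена полидипсо-полиурия,при наднормено тегло. Приема Метфогамма 3 х 1000 мг. дн. Установена Невропатия и провежда лечение с вливания с Тиоктацид.Има отпадналост, неспокоен сън. Има Хипертония , ИБС. Приема Енап и Верапамил. Кр. захар най-често е около 10. Ф.А.-баща й бил с хипертония.</Anamnesa>
  (Case history: Diabetes mellitus for 9 years, diagnosed because of moderate polydipsia and polyuria, with excess weight. Takes Metfogamma 3 x 1000 mg daily. Neuropathy established, treated with Thioctacid infusions. Fatigue, restless sleep. Hypertension, ischaemic heart disease. Takes Enap and Verapamil. Blood sugar most often around 10. Family history: her father had hypertension.)

• <HState>Ръст 164 см., тегло 81 кг, ИТМ 30.6 кг/м2.Кожа-леко суховата. Щит. жл.- суспекция за възел в десния лоб /на ехография възел в десния лоб и в левия и хипоехогенна зона в л. лоб/. Дих. с-ма-б.о.Сърд. дейност е правилна, ритмична 80 в мин., ясни тонове, кр. нал. 150/90. Ч. дроб и слезка-не се опипват. Сетивност-запазена.</HState>
  (Status: Height 164 cm, weight 81 kg, BMI 30.6 kg/m2. Skin slightly dry. Thyroid gland: suspected nodule in the right lobe (ultrasound: nodules in the right and left lobes and a hypoechoic zone in the left lobe). Respiratory system: no abnormalities. Heart activity regular, rhythmic, 80/min, clear sounds, BP 150/90. Liver and spleen not palpable. Sensation preserved.)

• <Examine>КЗП 9.3-8.13, 8.96 ,НвА1с 6.9% , пик. к-на 277.8 ехография на щит. жл.-д.лоб увеличени р-ри,нехомогенна с-ра, хипоехогенен възел с р-ри 21/18 мм, л. лоб-норм.р-ри в средна трета хипоехогенен възел 8/5 мм , в основата некапсулирана хипоехогенна зона 20/11мм-закл. Струма нодоза, диф. д.-Тир. на Хашимото-нод.форма [ 10.09 - сума: 10,34 ] ТАТ, МАТ</Examine>
  (Examinations: blood glucose profile 9.3-8.13, 8.96; HbA1c 6.9%; uric acid 277.8; thyroid ultrasound: right lobe enlarged, inhomogeneous structure, hypoechoic nodule 21/18 mm; left lobe of normal size with a hypoechoic nodule 8/5 mm in the middle third and a non-encapsulated hypoechoic zone 20/11 mm at the base; conclusion: struma nodosa, differential diagnosis: Hashimoto's thyroiditis, nodular form [10.09 - sum: 10,34]; TAT, MAT)

• <Therapy> <Nonreimburce>Да бъде на хипокалоричен диетичен режим-дадени указания.Да приема Метфогамма 3 х 1000 мг.След изследванията ще се прецени тир. функция. Отказва ТАБ. Води се на диспансеризация от ОПЛ.</Nonreimburce>

</Therapy>
  (Therapy, not reimbursed: hypocaloric diet, instructions given. To take Metfogamma 3 x 1000 mg. Thyroid function to be assessed after the tests. Refuses fine-needle aspiration biopsy. Under dispensary follow-up by the GP.)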
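Dosage statements in the free text follow recurring patterns such as "Метфогамма 3 х 1000 мг". The sketch below extracts them with a regular expression. It assumes the reading "times per day x single dose" and accepts both a Latin x and a Cyrillic х as the multiplier; the list of units is also an assumption.

```python
import re

# Drug name (capitalised), times per day, single dose and unit, e.g. "Метфогамма 3 х 1000 мг".
# The multiplier may be Latin x/X or Cyrillic х/Х.
DOSE_RE = re.compile(
    r"([A-ZА-Я][\w-]+)\s+(\d+)\s*[xхXХ]\s*(\d+(?:[.,]\d+)?)\s*(мг|mg|мл|ml|г|g)\b"
)


def parse_doses(text: str) -> list[dict]:
    """Return one dict per dosage statement, including the implied daily total."""
    doses = []
    for drug, times, dose, unit in DOSE_RE.findall(text):
        amount = float(dose.replace(",", "."))
        doses.append({"drug": drug, "times_per_day": int(times), "dose": amount,
                      "unit": unit, "daily_total": int(times) * amount})
    return doses


print(parse_doses("Приема Метфогамма 3 х 1000 мг. дн."))
# [{'drug': 'Метфогамма', 'times_per_day': 3, 'dose': 1000.0, 'unit': 'мг', 'daily_total': 3000.0}]
```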

Data mining of data from multiple visits

Possible findings

• When diabetic patients come a second time for control examinations, what is the reason for worsened lab test results?

• Does it depend on the drugs? (Giving more expensive drugs does not always mean better compensation.)

• Grouping patients by gender, age, region, drugs, accompanying diseases, … but only after automatic analysis of the free text presenting clinical tests
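Once lab values have been extracted from the free text, the grouping itself is simple. The sketch below computes, per drug, the mean change in HbA1c between a patient's first and second visit. The input format, the choice of HbA1c and the grouping by the drug given at the first visit are assumptions, and the data in the example is invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def hba1c_change_by_drug(visits):
    """Mean HbA1c change between visit 1 and visit 2, grouped by the drug given at visit 1.

    `visits` holds (patient_id, visit_no, drug, hba1c) tuples; patients without
    both visits are skipped.
    """
    by_patient = defaultdict(dict)
    for patient_id, visit_no, drug, hba1c in visits:
        by_patient[patient_id][visit_no] = (drug, hba1c)
    changes = defaultdict(list)
    for record in by_patient.values():
        if 1 in record and 2 in record:
            drug_at_first_visit, first = record[1]
            _, second = record[2]
            changes[drug_at_first_visit].append(second - first)
    return {drug: round(mean(values), 2) for drug, values in changes.items()}


# Toy input, invented for illustration only (not HIF data)
visits = [
    ("p1", 1, "drug A", 7.0), ("p1", 2, "drug A", 7.4),
    ("p2", 1, "drug A", 6.8), ("p2", 2, "drug A", 6.6),
    ("p3", 1, "drug B", 8.0), ("p3", 2, "drug B", 8.5),
    ("p4", 1, "drug B", 7.5),  # no second visit: excluded
]
print(hba1c_change_by_drug(visits))  # {'drug A': 0.1, 'drug B': 0.5}
```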

Conclusion

• Medicine is a very large domain, so progress may only be incremental

• Diabetes is a relatively narrow “genre”

• Medical experts are learning the potential of text analytics and planning how to use it in practice

• From an application perspective, what we want to do is a typical example of the secondary use of EHR data

• Principal theoretical goal: … to help computers understand biomedical language, and natural language in general ☺

Acknowledgements

• Dr Dimitar Tcharaktchiev, University Specialised Hospital for Endocrinology, Medical University Sofia

• Dr Svetla Boytcheva, AUBG

• Ivelina Nikolova, PhD student IICT-BAS

• Hristo Dimitrov, PhD student MU-Sofia


• Dr Zhivko Angelov, Adiss Lab Ltd.

• All starring in the movie:

Информатиката в полза на здравеопазването (Informatics for the benefit of healthcare)

http://www.youtube.com/watch?v=K7m3JY9ekHA&feature=youtu.be

Acknowledgements

• AComIn (Advanced Computing for Innovation), FP7-REGPOT-2012-2013-1 grant 316087

• PSIP (Patient Safety through Intelligent Procedures in medication), FP7 ICT eHealth grant 216130

• EVTIMA (Effective search of conceptual information with applications in medical informatics), Bulgarian National Science Fund DO 02-292

