
Automatic Classification of Pathology Reports into SNOMED Codes

Date post: 18-Jan-2018
Upload: judith-robinson
Transcript
Page 1:

Automatic Classification of Pathology Reports into SNOMED Codes

June 2008

By Weihang ZHANG (MIT)

Supervisors : Prof. Jon PATRICK

Dr. Irena KOPRINSKA

Page 2:

INTRODUCTION

Motivation
• SWAPS
• Clinical Notes - 400K Pathology Texts

Context
• Text Categorization (TC)
• SNOMED
• Project TTSCT

Page 3:

INTRODUCTION – Motivation

SWAPS - South West Area Pathology Service

• Natural language medical records – 400K pathology texts
• The texts contain a great deal of formal terminology, but it is used in an informal and haphazard way
• Medical records need to be converted to the formal terminology to enable accurate retrieval and to compile aggregated statistics of the medical care

Page 4:

INTRODUCTION – Context

Text Categorization

Definition: Given a collection of documents D = {d1, d2, ..., dn} and a pre-defined category set C = {c1, c2, ..., cm}, assign a True or False value to each pair <di, cj> ∈ D × C [Sebastiani F., 2002].

That is, Text Categorization (TC) assigns meaningful categories to text:
• Topics (politics, sports, entertainment, etc.)
• Opinions (negative, neutral, positive)
• Spam, child safety, scams

A successful project: ScamSeek Project
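The definition of Text Categorization on this slide can be read as filling in a True/False table over D × C. The sketch below is only an illustration of that reading: the `categorize` helper, the toy `keyword_rule`, and the sample documents are hypothetical and are not part of the original work.

```python
from itertools import product


def categorize(documents, categories, decide):
    """Return {(doc_id, category): bool} for every pair in D x C."""
    return {
        (doc_id, category): decide(text, category)
        for (doc_id, text), category in product(documents.items(), categories)
    }


def keyword_rule(text, category):
    # Hypothetical toy rule: a category applies if its name occurs in the text.
    return category.lower() in text.lower()


docs = {
    "d1": "Sports results from the weekend",
    "d2": "Politics and sports collide",
}
cats = ["politics", "sports"]
decisions = categorize(docs, cats, keyword_rule)
```

In a real system the `decide` function would be a trained classifier rather than a keyword test.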

Page 5:

INTRODUCTION – Context (Cont’)

SNOMED - Systematized Nomenclature of Medicine

Concepts: the basic unit of meaning, designated by a unique numeric code, a unique name (Fully Specified Name), and descriptions, including a preferred term and one or more synonyms.
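As a rough illustration of the concept structure described on this slide, a concept could be held in a small record type. The class name, field names, and the example values (including the placeholder identifier) are assumptions made for this sketch; they are not real SNOMED content or an official data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: int              # unique numeric code
    fully_specified_name: str    # unique name
    preferred_term: str          # one preferred description
    synonyms: tuple = ()         # one or more further descriptions

    def all_terms(self):
        """Every surface form that may refer to this concept."""
        return (self.fully_specified_name, self.preferred_term, *self.synonyms)


# Placeholder values for illustration only, not a real SNOMED concept.
example = Concept(
    concept_id=1,
    fully_specified_name="Example finding (finding)",
    preferred_term="Example finding",
    synonyms=("Sample finding",),
)
```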

Page 6:

INTRODUCTION – Context (Cont’)

TTSCT - Text to SNOMED CT*

A system which automatically maps free text into a medical reference terminology:
• NLP*-technique enhanced lexical token matcher
• Qualifier identifier
• Negation identifier

[J. Patrick et al., 2006]

*CT – Clinical Terms

*NLP - Natural Language Processing
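The slide does not show how TTSCT works internally, so the following is not its implementation. It is a minimal sketch of the general idea of a lexical matcher combined with a negation identifier, assuming a hypothetical two-entry lexicon with placeholder concept identifiers and a simple rule that a negation cue earlier in the same sentence negates the term.

```python
import re

# Hypothetical lexicon: surface term -> placeholder concept identifier.
LEXICON = {
    "dermal oedema": "C001",
    "hyperkeratosis": "C002",
}
# Assumed negation cues; a real system would use a much richer list.
NEGATION_CUES = ("no", "without", "absence of")


def match_concepts(text, lexicon=LEXICON):
    """Return (concept_id, negated) pairs for lexicon terms found in text."""
    results = []
    for sentence in re.split(r"[.;]", text.lower()):
        for term, concept_id in lexicon.items():
            pos = sentence.find(term)
            if pos < 0:
                continue
            prefix = sentence[:pos]
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", prefix)
                for cue in NEGATION_CUES
            )
            results.append((concept_id, negated))
    return results


found = match_concepts("Section shows hyperkeratosis. No dermal oedema is seen.")
```

Keeping the negation flag next to the concept identifier lets a downstream classifier treat "no dermal oedema" differently from "dermal oedema".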

Page 7:

OBJECTIVES

• Explore an effective information retrieval mechanism for medical notes classification
• Evaluate the performance of classifiers with TTSCT support
• Develop a SNOMED auto-coding system which helps clinicians with research and decision making

Page 8:

RESEARCH METHOD

• Data Inspection
• System Design
• Classification Evaluation
• Machine Learners Comparison
• Feature Selection Methods Comparison
• Indexing Methods
• Stemming Strategy
• Dimension Reduction
• N-Gram
• Text Subsection

Page 9:

Data Inspection – The Pathology Text

• 400K pathology texts from the SWAPS Anatomical Pathology Database

• A set of diagnoses for each report (pathology text), presented as SNOMED codes

<Title>CLINICAL HISTORY</Title>

Biopsy of discoid erythematosus like lesion from right cheek ? DLE.

<Title>MACROSCOPIC</Title>

LABELLED `RIGHT CHEEK LESION'. An ellipse 12 x 3mm with subcutis to 3mm. A poorly defined pale nodular lesion 3 x 3mm. It appears to abut the surgical margin. Representative sections embeded, A tips face on, B lesion and surgical margin. (MR 17/4)<DOT>TA</DOT>

<Title>MICROSCOPIC</Title>

Section shows hyperkeratosis with occasional follicular plugging, epidermal atrophy and severe sundamage to dermal collagen. A dense chronic inflammatory cell infiltrate, both superficial and deep is present, mainly in a perivascular and periadnexal distribution. No liquefaction degeneration of the basal layer, no dermal oedema and no interface dermatitis are seen. PAS stain reveals no thickening of the epidermal basement membrane and only an occasional fungal spore on the skin surface.

Immunofluorescence for immunoglobulins and complement fractions are negative.

The differential diagnosis rests between chronic discoid erythematosus, lymphocytic infiltration of skin of Jessner and the plaque type of polymorphous light eruption. The presence of marked solar damage to collagen, the absence of basal liquefaction degeneration and the negative immunofluorescence favours polymorphous light eruption. A reaction to drugs or an insect bite is also a possibility. No evidence of malignancy.

Reported 24/4/98

Page 10:

Data Inspection – SNOMED Codes Distribution

• 10K pathology texts were selected uniformly at random from the 400K texts in the database

• 867 distinct codes occurred, and 30K codes were assigned to the 10K texts

• The 9 codes with the highest frequency are selected for experiments

• All the remaining codes are treated as “others”
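The label preparation described on this slide (keep the most frequent codes, fold the rest into "others") might look like the sketch below. The function name `relabel` and the toy code sets are hypothetical; only the top-9 cut-off and the "others" label come from the slide.

```python
from collections import Counter


def relabel(code_sets, top_k=9, other="others"):
    """Keep the top_k most frequent codes; map every other code to `other`."""
    counts = Counter(code for codes in code_sets for code in codes)
    kept = {code for code, _ in counts.most_common(top_k)}
    return [
        sorted({code if code in kept else other for code in codes})
        for codes in code_sets
    ]


# Toy data with made-up code names; top_k=2 keeps the example small.
reports = [["A", "B"], ["A", "C"], ["A", "B", "D"]]
relabelled = relabel(reports, top_k=2)
```

Because each report carries a set of codes, a report can end up with both a kept code and "others" at the same time.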

Page 11:

System Design – TC Work Flow

Read Document

Text Tokenization

Lexical Verification

Remove Stopwords

Stemming

Vector Representation of Text (Indexing)

Feature Filtering to Reduce Dimensionality

Machine Learning (classifiers)

SNOMED Code
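The first half of this work flow (tokenization, stopword removal, optional stemming, n-gram construction, and indexing) can be sketched as follows. This is an assumption-laden illustration rather than the system's code: the stopword list is a tiny made-up one, `crude_stem` is a deliberately naive stand-in for a real stemmer, and the lexical verification step is omitted because the slide gives no detail about it.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list ("no" is kept on purpose).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "with"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    # Naive suffix stripping, standing in for a real stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def index_document(text, n=1, stem=False, weighting="frequency"):
    """Turn raw text into a sparse feature vector (dict of feature -> weight)."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    counts = Counter(ngrams(tokens, n))
    if weighting == "boolean":
        return {feature: 1 for feature in counts}
    return dict(counts)


unigram_vector = index_document("No evidence of malignancy, no malignancy")
bigram_vector = index_document("No evidence of malignancy, no malignancy", n=2)
```

The resulting dictionaries would then go through feature filtering before being passed to the classifiers.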

Page 12:

System Design – Product Codes Assembling

• Assembly line: Text-to-Vector conversion, TextVector Delivering
• Classifiers – the workers: all classifiers work together, each assigning its classification result to the vector
• Product – a collection of codes

Diagram: a new text is classified by Classifier_1, Classifier_2, Object_3, ..., Object_K, giving for example CODE_1 Positive, CODE_2 Negative, CODE_3 Negative, CODE_K Positive.
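One way to read this slide is as one binary classifier per code, with the final product being the set of codes whose classifier answers Positive. The sketch below assumes that reading; the lambda "classifiers" are hypothetical stand-ins for trained models, and the feature names are invented.

```python
def assemble_codes(vector, classifiers):
    """Run every per-code classifier on the vector; collect the positive codes."""
    return {code for code, classify in classifiers.items() if classify(vector)}


# Hypothetical stand-ins for trained binary classifiers, one per code.
classifiers = {
    "CODE_1": lambda v: v.get("lesion", 0) > 0,
    "CODE_2": lambda v: v.get("carcinoma", 0) > 0,
    "CODE_3": lambda v: v.get("fracture", 0) > 0,
    "CODE_K": lambda v: v.get("skin", 0) > 0,
}

new_text_vector = {"lesion": 2, "skin": 1}
product = assemble_codes(new_text_vector, classifiers)
```

With these toy classifiers the result mirrors the slide's example: CODE_1 and CODE_K positive, CODE_2 and CODE_3 negative.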

Page 13:

EXPERIMENTS

• Machine Learners: SVM-Light, MaxEnt, J48 (Weka-DT)
• Indexing Methods: Boolean Weight, Word Frequency, Entropy Weight
• Stemming: all words, none of the words
• N-Gram: Unigram, Bigram, Trigram
• Dimension Reduction: Frequency threshold (>= 1, >= 4); Information Gain (top 100, 200, 500, 1000, 2000, 4000)
• TTSCT ConceptID integration (Negation): keep text and add Concepts; replace concept words with Concepts; only Concepts, no text at all
• Text Subsection Hiding: <Clinical History>, <Microscopic>, <Macroscopic>
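The two dimension-reduction settings listed here, a document-frequency threshold and a top-N cut by Information Gain, can be sketched for a single binary code as follows. The slide does not give the formula used, so this assumes the standard definition IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t); function names and toy data are hypothetical.

```python
import math


def entropy(pos, neg):
    """Binary entropy in bits of a pos/neg split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def label_entropy(labels):
    pos = sum(labels)
    return entropy(pos, len(labels) - pos)


def information_gain(docs, labels, feature):
    """docs: non-empty list of feature sets; labels: parallel list of bools."""
    n = len(docs)
    present = [lab for d, lab in zip(docs, labels) if feature in d]
    absent = [lab for d, lab in zip(docs, labels) if feature not in d]
    return (
        label_entropy(labels)
        - (len(present) / n) * label_entropy(present)
        - (len(absent) / n) * label_entropy(absent)
    )


def select_top_features(docs, labels, k, min_df=1):
    """Apply the frequency threshold, then keep the k highest-IG features."""
    df = {}
    for d in docs:
        for f in d:
            df[f] = df.get(f, 0) + 1
    candidates = [f for f, count in df.items() if count >= min_df]
    ranked = sorted(candidates, key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


docs = [{"melanoma", "skin"}, {"melanoma"}, {"skin"}, {"naevus"}]
labels = [True, True, False, False]
top2 = select_top_features(docs, labels, k=2)
```

Ties are broken alphabetically so that the selection is deterministic.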

Page 14:

RESULTS & DISCUSSION

Measurement for classification:
• Recall
• Precision
• F-measure (F-value)
• Standard Deviation

Measurement for System Performance:
• Micro F
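The transcript names these measures without their formulas. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F = harmonic mean of the two, micro F computed from counts pooled over all codes); whether the original slides used exactly these forms is an assumption, and the counts in the example are made up.

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and balanced F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_f(per_code_counts):
    """per_code_counts: iterable of (tp, fp, fn), one triple per code."""
    counts = list(per_code_counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return precision_recall_f(tp, fp, fn)[2]


# Made-up counts for two codes.
p, r, f = precision_recall_f(8, 2, 2)
system_f = micro_f([(8, 2, 2), (1, 1, 3)])
```

Pooling the counts before computing F means frequent codes weigh more heavily in the system-level score than rare ones.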

Page 15:

EXPERIMENT

Page 16:

FUTURE WORK

Page 17:

CONCLUSIONS

• Leaving words unstemmed performed better than stemming

• N-grams increased classification accuracy as N increased (Trigram > Bigram > Unigram)

• TTSCT did enhance classification performance

• Hiding misleading parts of the text did raise the F-score

