Date post: | 31-Mar-2015 |
Applying Natural Language Generation to Electronic Health Records in an e-Science context
Donia Scott
Centre for Research in Computing
The Open University
Outline
• Background: the CLEF project
• Patient records as data-encoded patient histories
• Role of NLG in CLEF
• Intuitive querying with natural language
• Generating tailored reports from CLEF data
Background: the CLEF project
CLEF (Clinical E-Science Framework) is an MRC-funded project that aims to provide a repository of well-organised, data-encoded clinical histories
Aim: to provide the framework for a new type of medical research: in silico experiments
Partners:
NLP: OU, Sheffield
Medical informatics: Manchester
Electronic Health Records: Royal Marsden Hospital, UCL
Privacy/confidentiality: Cambridge
• Collect clinical information from multiple sites
• Analyse, structure and integrate it
• Make it available, using GRID tools
  • To authorised clinicians and e-Health scientists
  • In a secure and ethical collaborative framework
[Figure label: GRID]
The CLEF repository
[Figure labels: Chronicle; Repository; Organised data on individual patients]
Data from:
• Referral letters
• Review notes
• Lab results
• Nurse notes
• Hospital admission notes
• Hospital discharge notes
• Treatment notes
• Surgery reports
The CLEF Chronicle
Representing the story of a patient over time
[Figure: patient events laid out along a time axis]
The story of an illness
[Figure: graph of one patient's chronicle. Nodes: Human:1382, Mass:1666, Pain:5735, Radio:1812, Chemo:6502, Ulcer:1945, Cancer:1914, Breast:1492, Biopsy:1066, Clinic:4096, Clinic:1024, Clinic:2010. Edge labels: locus, plans, treats, target, attends, finding, reason.]
A typical cancer patient
[Slide: the simulated record of one patient, held in seven tables. Each entry below gives the table name, its approximate size as annotated on the slide, its columns and sample values.]
Problems (~200)
  Columns: NodesInvolved, NodesCounted, TumourMarker, Histology, Grade, Size (mm), Genotype, ClinicalCourse, Existence, Status, Name, EventEndDate, EventStartDate, ID, SimID
  Sample Name values: primary cancer, stage1 cancer, recurrent cancer, lymphadenopathy, enlargement, abnormality, metastatic lymphnode count
Interventions (15)
  Columns: Outcome, Status, Name, EventEndDate, EventStartDate, ID, SimID
  Sample row: unsuccessful | completed | relapse treatment package | 347 | 298 | 2342512475 | 3320133512
Investigations (~100)
  Columns: Status, Name, EventEndDate, EventStartDate, ID, SimID
  Sample row: completed | Xray | 465 | 465 | 2342512021 | 3320133511
Drugs (~5)
  Columns: Regime, Name, ID, SimID
  Sample row: daily | epirubicin | 2342512818 | 3320133512
Consults (~10)
  Columns: Location, Type, Status, EventEndDate, EventStartDate, ID, SimID
  Sample row: clinic | mammography screening | completed | 841 | 841 | 2342512222 | 3320133511
Loci (~20)
  Columns: Laterality, Name, ID, SimID
  Sample row: R | breast | 2347911036 | 3322572593
Relations (~600)
  Columns: Item2ID, Item2Type, Relation, Item1ID, Item1Type, SimID
  Sample row: 2342511955 | PROBLEM | HAS_FINDING | 2342511953 | INVESTIGATION | 3320133511
  Relation types shown: HAS_FINDING, HAS_TARGET, HAS_LOCUS, RECOMMENDED_BY, INDICATED_BY, ARRANGED
The role of NLG
• An intuitive query interface to provide efficient access to aggregated data-encoded patient histories for:
  • Assisting in diagnosis and treatment
  • Identifying patterns in treatment
  • Selecting subjects for clinical trials
• Generating reports from the data-encoded histories, for clinicians to use at the point of care
Intuitive querying of the CLEF repository
What does the CLEF database provide?
• Evidence from about 20,000 patient records, comprising 3.5 million record components (about 5 GB of data), all in the area of cancer
• 162 queriable fields
• Various text-only records (non-queriable)
• Two types of data:
  • Structured
  • Extracted from narratives by IE
• Queriable data is encoded according to various medical terminologies (SNOMED, ICD, UMLS)
• Approximately 19,500 different medical codes are currently used in the database (a relatively small subset of SNOMED and ICD)
Queriable data
Structured data:
• Demographics: age, gender, postal district, ethnic group, occupation
• Laboratory findings:
  • 32 types of haematology findings
  • 51 types of chemistry findings
  • Cytology reports
  • Histopathology reports
• Imaging studies: radiology procedure, site, diagnosis, morphology, topography, report, indication, department
• Treatments:
  • Prescription drugs
  • Chemotherapy protocol
  • IV chemotherapy
  • Radiotherapy
  • Surgical procedures
• Diagnoses:
  • Clinical diagnosis
  • Cause(s) of death
Data extracted from narratives
Query interface requirements
• Designed for:
  • Casual and moderate users, who are familiar with the semantic domain of the repository but not with its technical implementation
  • Typically clinicians or medical researchers
• Should be able to:
  • Allow the construction of complex queries with nested structures and temporal expressions
  • Minimise the risk of ambiguities
  • Offer good coverage of the data types in the CLEF database
• Should be used with:
  • Minimal training
  • No prior knowledge of medical terminologies, formal querying languages or databases
Typical queries
• “How many patients with AML have had a normal count after two cycles of treatment?”
• “How many patients with primary breast cancer have relapsed in the last five years?”
• “What is the median time between first drug treatment for metastatic breast cancer and death?”
• “In breast cancer patients, what is the incidence of lymphoedema of the arm that persists more than two years after primary surgical treatment?”
• “What is the average number of x-rays for patients with prostate cancer?”
• “What is the average time between first treatment for cervical cancer and death for patients aged less than 60 at death compared with those aged over 60?”
• “How many patients between the ages of 40 and 60 when they were first diagnosed with lung cancer had a platelet count higher than 300 but a white cell count lower than 3 before the 4th cycle of any course of chemotherapy they received during treatment?”
Querying alternatives
• SQL:
  • Not appropriate for the typical CLEF user
  • Requires deep knowledge of the database structure and content, and of the medical terminologies used in the database
• Graphical interfaces:
  • Have to cope with a large number of parameters
  • Nested structures and temporal restrictions are difficult to express
• Natural Language interfaces:
  • More natural and more expressive than formal querying languages, but…
  • Sensitive to errors in composition, spelling, vocabulary
  • Normally understand only a subset of natural language
  • Complex queries are difficult to process
  • It is difficult to trace the source of errors in the result
The CLEF approach
• Similar to Natural Language interfaces, but the user edits the conceptual meaning of a query instead of its surface text
• Allows users to easily construct non-ambiguous queries
• Guides users towards constructing correct queries only (queries compatible with the content of the database)
• It is semi-database independent but very domain specific
• Based on the Conceptual Authoring (aka WYSIWYM) technique (Power and Scott, 1998)
• The query is presented to the user as an interactive text, and it is edited by making selections on various components of the query
• Each selection triggers a text re-generation process, which results in a new feedback text containing the selection the user made
Query editing
Modelling queries
• There are 4 distinct sections of a query:
  • A description of the subjects (in terms of demographics information and basic diagnosis)
  • A description of treatments that the subjects received
  • A description of laboratory findings
  • An outcome section (what do we want from the group of patients we have just described)
• Each query element can be expressed as a conjunction or disjunction of same-type query elements, e.g.:
  • Cancer of the breast and of the lung
  • Patients who received chemotherapy and radiotherapy
• Some query elements can be temporally related to each other, e.g.:
  • Patients who received chemotherapy within 5 months of surgery
  • Patients alive 5 years after the diagnosis
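The query model described on this slide could be represented roughly as follows. This is a minimal sketch, not the CLEF implementation: the class names (Element, Group, TemporalLink, Query) and all field names are hypothetical and chosen only to mirror the four sections, the same-type conjunction/disjunction rule and the temporal links.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """One query element (hypothetical): a diagnosis, treatment, finding..."""
    kind: str
    value: str


@dataclass
class Group:
    """A conjunction or disjunction of query elements of the same type."""
    operator: str  # "and" or "or"
    elements: List[Element]

    def __post_init__(self):
        if self.operator not in ("and", "or"):
            raise ValueError("operator must be 'and' or 'or'")
        if len({e.kind for e in self.elements}) > 1:
            raise ValueError("elements in a group must share one type")


@dataclass
class TemporalLink:
    """A temporal relation between two elements, e.g. within 5 months of."""
    first: Element
    relation: str  # e.g. "within", "after"
    amount: int
    unit: str      # e.g. "months", "years"
    second: Element


@dataclass
class Query:
    """The four sections of a query, plus temporal links between elements."""
    subjects: List[Group] = field(default_factory=list)
    treatments: List[Group] = field(default_factory=list)
    findings: List[Group] = field(default_factory=list)
    outcome: str = ""
    temporal: List[TemporalLink] = field(default_factory=list)


# "Patients who received chemotherapy within 5 months of surgery"
chemo = Element("treatment", "chemotherapy")
surgery = Element("treatment", "surgery")
query = Query(
    treatments=[Group("and", [chemo, surgery])],
    outcome="number of patients",
    temporal=[TemporalLink(chemo, "within", 5, "months", surgery)],
)
```

Rejecting mixed-type groups at construction time mirrors the slide's requirement that only same-type elements are conjoined or disjoined.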
Constraining user choices
• At each step, users are only given correct choices
• Choices are context dependent:
  • Patients diagnosed with [some cancer] in [some body part]
  • User selects [some cancer] => “squamous cell carcinoma”
  • The interface restricts the choices available for [some body part] to those sites where squamous cell carcinoma can develop
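A toy illustration of context-dependent choices and feedback-text regeneration is sketched below. The compatibility table SITES_BY_CANCER is invented for illustration (it is neither the CLEF terminology nor a clinical reference), and the function names are hypothetical.

```python
# Hypothetical compatibility table: which body sites each cancer type may be
# paired with. Illustrative values only, not a clinical reference.
SITES_BY_CANCER = {
    "squamous cell carcinoma": ["cervix", "lung", "oesophagus", "skin"],
    "invasive tubular adenocarcinoma": ["breast"],
}
ALL_SITES = sorted({s for sites in SITES_BY_CANCER.values() for s in sites})


def choices_for_body_part(selected_cancer=None):
    """Return the body-part options offered, given the cancer already chosen."""
    if selected_cancer is None:
        return ALL_SITES
    return SITES_BY_CANCER.get(selected_cancer, [])


def feedback_text(cancer=None, site=None):
    """Regenerate the feedback text; unfilled slots stay as bracketed anchors."""
    return "Patients diagnosed with {} in {}".format(
        cancer or "[some cancer]", site or "[some body part]"
    )


print(feedback_text())
print(feedback_text(cancer="squamous cell carcinoma"))
print(choices_for_body_part("squamous cell carcinoma"))
```

Because the option list is recomputed after each selection, a user can never pick a body part that is incompatible with the cancer already chosen.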
Dealing with ambiguities
Once a query is constructed, there is only one way it can be interpreted, so there is no disambiguation task to be performed
… but users may be misled into constructing a query different from the one they intended
Answer generation
• The answer set consists of an age/gender breakdown of the patients that fulfil the query requirements
• Each additional clinical feature is combined with the age/gender breakdown to provide more detailed information
• 3 types of rendering:
  • Text
  • Charts
  • Table
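The age/gender breakdown could be computed along the following lines. This is only a sketch under assumptions: the ten-year age bands, the record fields (age, gender, relapsed) and the function names are hypothetical, not taken from CLEF.

```python
from collections import Counter


def age_band(age, width=10):
    """Map an age to a band label such as '40-49' (band width is assumed)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def breakdown(patients, feature=None):
    """Count patients per (age band, gender), optionally split by one
    additional clinical feature."""
    counts = Counter()
    for p in patients:
        key = (age_band(p["age"]), p["gender"])
        if feature is not None:
            key += (p[feature],)
        counts[key] += 1
    return counts


# Invented example records standing in for the patients matched by a query.
matched = [
    {"age": 45, "gender": "F", "relapsed": True},
    {"age": 49, "gender": "F", "relapsed": False},
    {"age": 62, "gender": "M", "relapsed": True},
]
for key, n in sorted(breakdown(matched, "relapsed").items()):
    print(key, n)
```

The same counts can then feed any of the three renderings (text, chart or table).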
Evaluation
Research questions:
• Can the WYSIWYM query formulation method be easily learned by users of CLEF?
• Is it easier to formulate CLEF queries in SQL or with the WYSIWYM query formulation method?
• Are the interactive feedback texts ambiguous?
Evaluation results show that…
The CLEF Conceptual Authoring query interface works!
The method is easily acquired.
Investigation shows that it is much easier to use than current alternatives (viz. SQL).
The feedback texts tend to be easily understood.
It is a viable solution to querying the CLEF repository.
However…
Unresolved issues
• Are the queries we currently support really the ones users will want to ask?
• Does the query interface provide sufficient data coverage?
Generating reports from the CLEF repository
The context
• We aim at generating reports from the data-encoded Electronic Patient Records
• Our reports are aimed at clinicians for use at the point of care
• Various types of report work on the same input (roughly the same content) but express information from different viewpoints
• We address the problem of conceptual restatement in generating summarised reports
Typical input
[Slide: the same simulated patient record tables shown under "A typical cancer patient": Problems, Interventions, Investigations, Drugs, Consults, Loci and Relations.]
Why are textual reports needed?
• Clinicians and other health professionals use patient health summaries at the point of care, where time is a critical resource
• Reports provide quick access to an overview of a patient’s medical history
• Typically, an electronic patient record contains around 1000 messages
  • Even structured, this volume of data is very large
  • Access to relevant information about particular patients is difficult
• Textual reports:
  • are easy to read and understand
  • can be customised to the type of information needed
  • provide a quick way of identifying errors in the patient record
  • alleviate the need to know in detail the structure of the underlying database
Why are paraphrases needed?
• Alternative views of the patient record, i.e., reports from various viewpoints:
  • Full chronological reports
  • Summaries of investigations, interventions, treatments
  • Same content, different textual representation
• Potted summaries are also important (30-second overview of patient’s history)
Content selection
• Two notions:
  • Spine events: the main concepts in the summary (depending on user-defined type of summary)
  • Skeleton events: linked to the spine by various relations
• Basic procedure:
  • Step 1: group linked events into clusters and remove small clusters
    • Typically, a small number of very large clusters and a small number of small clusters
    • Small clusters are assumed not to be related to the main topic of the summary
  • Step 2: identify spine events according to the type of summary (Longitudinal, Investigations, Interventions, Problems)
  • Step 3: identify the skeleton events
    • If (“problem is spine event” and “investigation has_indication problem”) then select investigation (unless already selected)
    • Repeat step 2 a certain number of times (given by a threshold parameter)
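The three steps above can be sketched as follows. This is one possible reading, not the CLEF code: clusters are taken to be connected components of the relation graph, the repetition controlled by the threshold parameter is read as the number of expansion hops from the spine, and min_cluster, depth and all function names are hypothetical.

```python
from collections import defaultdict, deque


def clusters(events, relations):
    """Step 1 (assumed): clusters are connected components of linked events."""
    adj = defaultdict(set)
    for a, _rel, b in relations:
        adj[a].add(b)
        adj[b].add(a)
    seen, out = set(), []
    for start in events:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cur = queue.popleft()
            comp.add(cur)
            for nxt in adj[cur] - seen:
                seen.add(nxt)
                queue.append(nxt)
        out.append(comp)
    return out


def select_content(events, relations, spine_type, min_cluster=3, depth=1):
    """events maps event id -> type. Returns (spine, skeleton) as sets."""
    kept = set()
    for comp in clusters(events, relations):
        if len(comp) >= min_cluster:  # drop small clusters
            kept |= comp
    # Step 2: spine events are the kept events of the summary's type.
    spine = {e for e in kept if events[e] == spine_type}
    # Step 3: pull in events linked to already selected ones, `depth` times.
    selected, frontier = set(spine), set(spine)
    for _ in range(depth):
        reached = set()
        for a, _rel, b in relations:
            if a in frontier and b not in selected:
                reached.add(b)
            if b in frontier and a not in selected:
                reached.add(a)
        selected |= reached
        frontier = reached
    return spine, selected - spine


# Invented example; "flu" and "swab" form a small, unrelated cluster.
events = {
    "pain": "problem", "lump": "problem", "cancer": "problem",
    "ulcer": "problem", "mammogram": "investigation",
    "biopsy": "investigation", "radiotherapy": "intervention",
    "flu": "problem", "swab": "investigation",
}
relations = [
    ("mammogram", "has_indication", "pain"),
    ("lump", "has_finding", "mammogram"),
    ("biopsy", "has_indication", "lump"),
    ("cancer", "has_finding", "biopsy"),
    ("radiotherapy", "treats", "cancer"),
    ("ulcer", "caused_by", "radiotherapy"),
    ("swab", "has_indication", "flu"),
]
spine, skeleton = select_content(events, relations, "problem")
print(sorted(spine), sorted(skeleton))
```

Raising depth pulls in events further from the spine, which trades brevity for completeness in the resulting summary.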
Spine of Problem events
[Figure: event graph with nodes pain, lump, mammogram, biopsy, cancer, breast, radiotherapy, radiotherapy cycle, hyperbaric oxygenation and ulcer, shown in turn for the Problem, Interventions and Investigations spines.]
Problem
The patient identifies pain in the left breast. A lump in the breast is found through a mammogram.
A biopsy performed on the breast reveals cancer in the left breast. The patient receives radiotherapy to treat the cancer. Skin ulceration develops in the left breast as a result of radiotherapy, which is treated with hyperbaric oxygenation.
Interventions
Radiotherapy on the breast is initiated to treat cancer in the breast. A first radiotherapy cycle is performed.
The radiotherapy causes skin ulceration. The patient receives hyperbaric oxygenation to treat the ulcer.
Investigations
A mammogram is performed because of pain in the left breast, which identifies a lump in the breast. A biopsy of the lump identifies cancer in the left breast.
[Figure: the three event graphs (Problem, Interventions, Investigations) shown side by side.]
Discourse structuring
• Mostly given by relations in the EPR
• 19 different types of relations, which can be:
  • Attributive: Problem has_locus Locus
  • Rhetorical: Problem caused_by Intervention
• Attributive relations do not contribute to the discourse structure
• In a first step, events linked through attributive relations are combined:
  Message_Problem + Message_Locus => Message_Problem_Locus
• Messages are grouped according to type of summary:
  • Longitudinal: events occurring in the same week should be grouped together and further grouped into years
  • Logical: arrange chronologically and then group similar events (e.g., liver panels, screening consults)
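The first two operations on this slide, combining messages linked by an attributive relation and grouping messages for a longitudinal summary, might look like the sketch below. The message fields (name, week, locus), the 52-week year and the year index (week // 52) are assumptions made for illustration.

```python
from collections import defaultdict


def merge_locus(problem, locus):
    """Message_Problem + Message_Locus => Message_Problem_Locus."""
    merged = dict(problem)
    merged["locus"] = locus["name"]
    return merged


def group_longitudinal(messages, weeks_per_year=52):
    """Group messages by week, then group the weeks by year.

    The year index is week // weeks_per_year, which is an assumption.
    """
    years = defaultdict(lambda: defaultdict(list))
    for m in messages:
        years[m["week"] // weeks_per_year][m["week"]].append(m)
    return {y: dict(sorted(w.items())) for y, w in sorted(years.items())}


# Invented example messages.
messages = [
    {"week": 287, "name": "Xray"},
    {"week": 131, "name": "Xray"},
    {"week": 131, "name": "mammography screening"},
]
for year, weeks in group_longitudinal(messages).items():
    for week, items in weeks.items():
        print(year, week, [m["name"] for m in items])
```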
Discourse structuring
• Within each group:
  • Link messages by discourse relations inferred from EPR relations: Cause, Result, Sequence
  • Assume a List relation if no relation is specified
• Between groups:
  • If all events in one group are linked to events in another group by some EPR relation, link groups through the corresponding discourse relation
  • Otherwise, assume a List relation
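The between-group rule can be sketched as below. The mapping EPR_TO_DISCOURSE is hypothetical (the slides do not say which EPR relation yields which discourse relation), and the sketch reads "some EPR relation" as one relation shared by every event in the first group.

```python
# Hypothetical mapping from EPR relations to discourse relations.
EPR_TO_DISCOURSE = {
    "CAUSED_BY": "Cause",
    "HAS_FINDING": "Result",
    "ARRANGED": "Sequence",
}


def link_groups(group_a, group_b, relations):
    """Pick the discourse relation linking group_a to group_b.

    relations is a list of (item, relation, item) triples. Falls back to
    "List" unless every event in group_a is linked into group_b by the
    same EPR relation.
    """
    found = set()
    for event in group_a:
        rels = {rel for a, rel, b in relations if a == event and b in group_b}
        if not rels:
            return "List"
        found |= rels
    if len(found) == 1:
        return EPR_TO_DISCOURSE.get(next(iter(found)), "List")
    return "List"


# Invented example relations.
rels = [
    ("ulcer", "CAUSED_BY", "radiotherapy"),
    ("pain", "CAUSED_BY", "radiotherapy"),
]
print(link_groups(["ulcer", "pain"], ["radiotherapy"], rels))
print(link_groups(["ulcer", "lump"], ["radiotherapy"], rels))
```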
Text structuring
Aggregation
Problems:
  Problem_1:name HAS_LOCUS Locus_1
  Problem_2:name HAS_LOCUS Locus_2
  => Problem_3 HAS_LOCUS {Locus_1, Locus_2}
  Enlargement of the liver + Enlargement of the spleen => Enlargement of the liver and/but not of the spleen
Investigations:
  Investigation_1:name HAS_INDICATION Problem_1
    HAS_LOCUS Locus_1
  Investigation_2:name HAS_INDICATION Problem_2
    HAS_LOCUS Locus_2
  => Investigation_3 HAS_INDICATION {Problem_1, Problem_2}
  Examination of the abdomen revealed no enlargement of the liver
  Examination of the lymphnodes revealed no lymphadenopathy
  => Examination revealed no enlargement of the liver and no lymphadenopathy
Text structuring
Aggregation
Interventions:
  Intervention_1 PART_OF Intervention_0
  Intervention_2 PART_OF Intervention_0
  => {count} Intervention_1
  [ID01] Chemotherapy cycle PART_OF [ID0] Chemotherapy
  [ID02] Chemotherapy cycle PART_OF [ID0] Chemotherapy
  [ID03] Chemotherapy cycle PART_OF [ID0] Chemotherapy
  => 3 chemotherapy cycles
Ellipsis:
  Examination of the left breast revealed no recurrent cancer in the left breast
  => Examination of the left breast revealed no recurrent cancer
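Two of the aggregation patterns (merging loci of same-named problems, and counting parts of one intervention) could be sketched as follows. The function names and the simple pluralisation by appending "s" are assumptions for illustration only.

```python
from collections import Counter


def aggregate_loci(problem_name, loci):
    """Merge same-named problems at several loci into one phrase."""
    return f"{problem_name} of " + " and of ".join(f"the {l}" for l in loci)


def aggregate_parts(relations, names):
    """Replace repeated PART_OF children of one parent by a counted phrase.

    relations: (child_id, relation, parent_id) triples.
    names: child_id -> name. Pluralisation by adding "s" is a simplification.
    """
    counts = Counter()
    for child, rel, parent in relations:
        if rel == "PART_OF":
            counts[(parent, names[child])] += 1
    return [
        f"{n} {name}s" if n > 1 else f"1 {name}"
        for (_parent, name), n in counts.items()
    ]


cycles = [
    ("ID01", "PART_OF", "ID0"),
    ("ID02", "PART_OF", "ID0"),
    ("ID03", "PART_OF", "ID0"),
]
cycle_names = {i: "chemotherapy cycle" for i in ("ID01", "ID02", "ID03")}
print(aggregate_parts(cycles, cycle_names))
print(aggregate_loci("Enlargement", ["liver", "spleen"]))
```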
Text structuring
Events can be compacted according to domain-specific rules:
• Clinical examination is: examination of the liver, examination of the spleen, examination of the abdomen
  • Clinical examination was normal
  • Clinical examination was normal apart from an enlargement of the spleen
  • Clinical examination revealed enlargement of the spleen
• Liver panel is: bilirubin concentration, ESR concentration, GGT concentration
  • The liver panel was in the normal range (apart from a very high level of GGT)
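A domain-specific compaction rule of this kind could be sketched as follows. The rule table LIVER_PANEL, the result encoding ("normal" or a short description such as "very high") and the function name are hypothetical.

```python
# Hypothetical rule: the tests that together make up a "liver panel".
LIVER_PANEL = {
    "bilirubin concentration",
    "ESR concentration",
    "GGT concentration",
}


def compact_panel(panel_name, members, results):
    """Summarise a panel of tests in one sentence.

    results maps test name -> "normal" or a description like "very high".
    """
    abnormal = [
        (test, level)
        for test, level in results.items()
        if test in members and level != "normal"
    ]
    if not abnormal:
        return f"The {panel_name} was in the normal range."
    exceptions = ", ".join(f"a {level} level of {test}" for test, level in abnormal)
    return f"The {panel_name} was in the normal range apart from {exceptions}."


results = {
    "bilirubin concentration": "normal",
    "ESR concentration": "normal",
    "GGT concentration": "very high",
}
print(compact_panel("liver panel", LIVER_PANEL, results))
```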
Maintaining the thread of discourse
• Textual representation should reflect the relative importance of events
• At discourse level: spine concepts are preferably realised in nuclear units and skeleton events in satellite units
• At sentence level: spine events are assigned salient syntactical roles
• The status of an event as being on the spine or on the skeleton determines its realisation as a sentence, a main or subordinate clause, or a phrase
Typical output of the NL generator
Long chronological report
YEAR 1
Week 0 A mammography screening was scheduled at the clinic.
Week 1 Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma.
YEAR 2
Week 131 Xray revealed no cancer of the right breast.
YEAR 5
Week 287 Xray revealed no cancer of the right breast.
YEAR 8
Week 443 Xray revealed cancer of the right breast.
Week 446 Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the leucocyte count. An Xray (indicated by primary cancer of the right breast) was performed. Very high level of the ESR concentration. Very high level of the Creatinine concentration. Very high level of the Alkaline Phosphatase concentration. Very high level of the Bilirubin concentration. Very high level of the GGT concentration. No abnormality of the platelet count.
Week 449 An initial treatment planning was completed at the clinic. Excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Cancer staging revealed stage1 cancer. Hormone antagonist therapy was started to treat primary cancer of the right breast. Lumpectomy was performed on the breast to treat primary cancer of the right breast. Primary treatment package was started to treat primary cancer of the right breast.
………………….
YEAR 17
Week 893 Xray revealed no cancer of the right breast.
Typical output of the NL generatorTypical output of the NL generator
Focus on Problems
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma.
In weeks 131 and 287 Xray revealed no cancer of the right breast.
In week 446, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes revealed by examination. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration.
In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was initiated to treat primary cancer of the right breast.
In weeks 457 to 737, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration.
In weeks 457 to 893, Xray revealed no cancer of the right breast.
Compact reports
Focus on Interventions
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma.
In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was started to treat primary cancer of the right breast.
Focus on Investigations
In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma.
In weeks 131 and 287 Xray revealed no cancer of the right breast.
In week 446, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration.
In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast.
In weeks 457 to 737, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration.
In weeks 457 to 893, Xray revealed no cancer of the right breast.
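The three reports above differ mainly in which events are kept. A minimal sketch of that kind of focus-based selection follows. The `Event` record, the `kind` labels and the rule that problems are always kept are assumptions made for illustration, not the CLEF generator's actual selection logic, which is evidently richer (the interventions report above also keeps the investigations from the week of the intervention).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    week: int
    kind: str  # hypothetical labels: "problem", "investigation", "intervention"
    text: str


def select(events, focus):
    """Keep problems (always) plus events whose kind matches the focus, in week order."""
    ordered = sorted(events, key=lambda e: e.week)
    return [e for e in ordered if e.kind == "problem" or e.kind == focus]


# Toy events paraphrased from the sample reports; the labels are assumed.
events = [
    Event(0, "problem", "primary cancer of the right breast was diagnosed"),
    Event(131, "investigation", "Xray revealed no cancer of the right breast"),
    Event(449, "investigation", "histopathology revealed primary cancer of the right breast"),
    Event(449, "intervention", "lumpectomy was performed on the right breast"),
]

for event in select(events, "intervention"):
    print(f"In week {event.week}, {event.text}.")
```

Calling `select(events, "investigation")` on the same list keeps the diagnosis and both investigations and drops the lumpectomy.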
Ongoing work on report generation
Add domain-specific knowledge to improve content selection
Some events become important depending on context
Change the (sub-)domain
Test if the generation method is easily portable
Link NLG to IR to improve IR
Produce reports for patients
Summary and Conclusions
CLEF is now entering the integration phase, moving towards testing and deployment
Major emphases at this point are on privacy and security
Informing patients is a major thread for future work
Integrating IE and NLG
Thank You!
Collaborators:
Catalina Hallett
Richard Power
Evaluation procedure
Subjects:
We tested the performance of 15 subjects.
Subjects had a range of expertise in the CLEF domain, from expert (oncologist) to novice (computer scientist), but most subjects had some medical training.
Subjects had no previous experience with the CLEF WYSIWYM query interface, but most were aware of its fundamental principles.
Methodology:
Subjects were given a set of four fixed queries to formulate using the CLEF WYSIWYM query interface.
The queries were expressed in language as different as possible from the language in the query interface.
Each subject received the queries in a different order.
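Four queries can be ordered in 24 ways, so 15 subjects can each receive a distinct order. One way to draw such an assignment is sketched below. The function name `assign_orders`, the query labels and the random seed are assumptions for illustration, and the slides do not say how the orders were actually chosen.

```python
import itertools
import random


def assign_orders(n_subjects, queries, seed=0):
    """Give each subject a distinct ordering of the queries, drawn at random."""
    all_orders = list(itertools.permutations(queries))
    if n_subjects > len(all_orders):
        raise ValueError("more subjects than distinct orderings")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return rng.sample(all_orders, n_subjects)


orders = assign_orders(15, ["Q1", "Q2", "Q3", "Q4"])
for subject, order in enumerate(orders, start=1):
    print(f"Subject {subject}: {' '.join(order)}")
```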
Evaluation – data analysis
We recorded the time taken to compose each query.
We recorded the number of operations used to construct each query and compared it with the optimal number of operations (pre-computed).
We analysed whether performance, as indicated by speed and efficiency, improves with training (experience).
Evaluation results: time to completion
Subjects’ performance improved dramatically with experience.
After their first experience of composing a query, subjects’ completion time halved and then stayed at roughly that level.
[Figure: Time to completion. x-axis: order of query (1 to 4); y-axis: time (mins), 0 to 7.]
Evaluation results: performance over time, normalised over complexity
After just one go with the CLEF interface, subjects are highly proficient in their ability to compose complex queries.
By the time they get to their fourth query, subjects’ performance is almost perfect.
[Figure: Operations. x-axis: order of query (1 to 4); y-axis: (total - optimal) / optimal, 0 to 0.5. Mean: 0.18.]
Optimal operation = min # of operations needed to compose the query perfectly.
This is a measure of the complexity of the query.
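The efficiency measure on the chart axis, (total - optimal) / optimal, can be written out as a small function. This is a sketch only: the function name and the example counts are assumptions, not data from the study.

```python
def excess_operations(total_ops, optimal_ops):
    """Fraction of extra operations relative to the optimal count (0.0 means perfect)."""
    if optimal_ops <= 0:
        raise ValueError("optimal_ops must be positive")
    return (total_ops - optimal_ops) / optimal_ops


# Hypothetical example: a query that needs 10 operations, composed in 13.
print(excess_operations(13, 10))
```

Dividing by the optimal count is what normalises over complexity: a query that needs more operations is allowed proportionally more slack.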
Evaluation – comparison with SQL
Very small scale experiment
Two subjects:
with expert knowledge of the structure, organisation and content of the CLEF database
highly skilled users of SQL
with minimal experience with WYSIWYM
were given access to the SNOMED and ICD codes required to build the SQL
Each subject composed a query first in the CLEF WYSIWYM Interface and then in SQL
Evaluation – comparison with SQL
[Figure: bar chart comparing WYSIWYM and SQL for Subject 1 and Subject 2; axis from 0 to 12.]
Subject 1 – Query 1
WYSIWYM: 2.3 mins
SQL: 8.5 mins (incomplete)
Subject 2 – Query 2
WYSIWYM: 4.5 mins
SQL: 12 mins (incomplete)
Even with a slowly reacting interface, the subjects were much faster composing queries in WYSIWYM than in SQL
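The size of the gap can be read off the four timings reported above. The sketch below computes the SQL-to-WYSIWYM time ratio per subject. Because both SQL queries were left incomplete, it treats the SQL times as lower bounds, which is an interpretation rather than something the slides state.

```python
# (WYSIWYM minutes, SQL minutes) as reported on the slide.
timings_mins = {
    "Subject 1": (2.3, 8.5),
    "Subject 2": (4.5, 12.0),
}

ratios = {
    subject: round(sql / wysiwym, 1)
    for subject, (wysiwym, sql) in timings_mins.items()
}

for subject, ratio in ratios.items():
    print(f"{subject}: SQL took at least {ratio}x as long as WYSIWYM")
```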
Are the feedback texts ambiguous to the users?
Identified 6 types of ambiguity
4 examples of each, with forced-choice judgements by 15 subjects
Random judgements would give a score of 33%
Results show 84% correct judgements
For clinicians and medical researchers: summary patient records, repository summarisation
For patients: summary patient records, as linear text, hypertext or animated dialogue
Sample report for Clinicians
In the weeks 195 to 196, self examination revealed lump of the right breast.
In week 197, self examination revealed lump of the right breast. Excision biopsy revealed metastatic lymphnode count of the right axilla. Histopathology revealed cancer of the right breast. Cancer staging revealed stage2 cancer. Radical mastectomy was performed on the breast to treat the primary cancer. The patient was diagnosed with metastatic lymphnode count of the right axilla; 19 nodes involved out of 24. The patient was diagnosed with metastatic cancer of the right axilla; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with cancer of the right breast; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with stage2 cancer; histopathology: invasive undifferentiated adenocarcinoma. Primary treatment package was initiated to treat primary cancer of the right breast.
…
Sample report for Patients
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination and you found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination and you found that you still had a lump in your right breast.
On October 11th you had a radical mastectomy to treat cancer in your right breast. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. Cancer is a tumour that tends to spread, both locally and to other parts of the body.
…
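The patient report above explains each technical term once, right after the sentence that first mentions it. A minimal sketch of that behaviour follows. The `GLOSSARY` dictionary, the `add_explanations` function and the case-insensitive substring matching are assumptions for illustration; how the CLEF generator actually selects explanations is not described in these slides.

```python
# Definitions are taken from the sample report; the lookup structure is assumed.
GLOSSARY = {
    "self examination": "A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.",
    "radical mastectomy": "A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.",
    "cancer": "Cancer is a tumour that tends to spread, both locally and to other parts of the body.",
}


def add_explanations(sentences, glossary):
    """Append each term's definition after the first sentence that mentions it."""
    explained = set()
    output = []
    for sentence in sentences:
        output.append(sentence)
        lowered = sentence.lower()
        # Terms not yet explained, in the order they occur in this sentence.
        hits = sorted(
            (lowered.find(term), term)
            for term in glossary
            if term in lowered and term not in explained
        )
        for _, term in hits:
            output.append(glossary[term])
            explained.add(term)
    return output


report = [
    "You had a consultation with your doctor on September 20th 1993.",
    "On September 27th you did a self examination and you found that you had a lump in your right breast.",
    "On October 4th you did another self examination and you found that you still had a lump in your right breast.",
    "On October 11th you had a radical mastectomy to treat cancer in your right breast.",
]

explained_report = add_explanations(report, GLOSSARY)
print(" ".join(explained_report))
```

The second mention of "self examination" gets no definition, which matches the sample report.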
Cancer is a tumour that tends to spread, both locally and to other parts of the body.
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination.
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination.
you found that you had a lump in your right breast.
On October 11th you had a radical mastectomy.
to treat cancer in your right breast.
A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
SEQUENCE
SEQUENCE
HAS-FINDING
SEQUENCE
MOTIVATION
EXPLANATION
EXPLANATION
EXPLANATION
Presenting patient records in hypertext: dividing the text into related units
Cancer is a tumour that tends to spread, both locally and to other parts of the body.
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination.
SEQUENCE
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination.
SEQUENCE
you found that you had a lump in your right breast.
HAS-FINDING
SEQUENCE
On October 11th you had a radical mastectomy.
to treat cancer in your right breast.
MOTIVATION
A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
EXPLANATION
EXPLANATION
EXPLANATION
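The units and relation labels above can be held in a small tree, with each event carrying its finding, motivation and explanation as labelled children. The sketch below is one possible representation. The `Unit` class, the `linearise` function and the choice of which unit attaches to which are assumptions read off the slide, not the system's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    text: str
    satellites: list = field(default_factory=list)  # (relation, Unit) pairs

    def attach(self, relation, unit):
        """Add a related unit under this one and return self for chaining."""
        self.satellites.append((relation, unit))
        return self


def linearise(units):
    """Yield (depth, relation, text) depth-first; top-level units form a SEQUENCE."""
    def walk(unit, relation, depth):
        yield depth, relation, unit.text
        for child_relation, child in unit.satellites:
            yield from walk(child, child_relation, depth + 1)

    for unit in units:
        yield from walk(unit, "SEQUENCE", 0)


consultation = Unit("You had a consultation with your doctor on September 20th 1993.")
examination = (
    Unit("On September 27th you did a self examination.")
    .attach("HAS-FINDING", Unit("You found that you had a lump in your right breast."))
    .attach("EXPLANATION", Unit("A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel."))
)
surgery = (
    Unit("On October 11th you had a radical mastectomy.")
    .attach("MOTIVATION", Unit("The radical mastectomy was done to treat cancer in your right breast."))
    .attach("EXPLANATION", Unit("A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall."))
)

rows = list(linearise([consultation, examination, surgery]))
for depth, relation, text in rows:
    print("  " * depth + f"[{relation}] {text}")
```

Keeping the relation on each child is what would let a hypertext presentation style or animate units differently by relation type.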
Presenting patient records in hypertext: giving graphical attributes to the text units
you found that you had a lump in your right breast.
The radical mastectomy was done to treat cancer in your right breast.
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination.
On October 4th you did another self examination.
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 11th you had a radical mastectomy.
Presenting patient records in hypertext: using animation to represent discourse patterns dynamically
[Animation frames: the text units below are revealed step by step across successive frames.]
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination.
You found that you had a lump in your right breast.
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination.
On October 11th you had a radical mastectomy.
A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
The radical mastectomy was done to treat cancer in your right breast.
Cancer is a tumour that tends to spread, both locally and to other parts of the body.
Monologues/Dialogues
Monologue
Autonomous agent reads the generated report
Aims: accessibility, education (not translation)
Dialogue
Report is generated as a script that 2 agents act out
Aims: accessibility, vicarious learning
Example (video clip)