Post on 16-Dec-2015
Automatic Report Generation from Ontologies: the MIAKT Approach
Kalina Bontcheva, Yorick Wilks
Department of Computer Science
University of Sheffield
Rationale
• NLG (Natural Language Generation) takes as input structured data in a knowledge base or ontology and produces natural language text
• It can be applied to document ontologies automatically or to generate textual reports from formal knowledge
• Texts can be regenerated whenever the ontology changes, so they always reflect its current content
The MIAKT project
• Medical Imaging and Advanced Knowledge Technologies
• Breast cancer
• Triple assessment process
 – Oncologist – clinical assessment
 – Histopathologist – cytology
 – One or more radiologists – X-ray mammograms, MRI scans
 – Surgeon
 – Sometimes a radiographer
• Types of images – mammograms, MRI scans, ultrasound…
Removing Repeating Triples
• Based on the ontology – inverse properties
• <daml:ObjectProperty rdf:about="file:/...#involved_in_ta">
    <daml:inverseOf rdf:resource="file:/...#involve_patient"/>
  …
• involved_in_ta(01401_patient, ta-soton-1069) and involve_patient(ta-soton-1069, 01401_patient) state the same fact, so only one of them needs to be verbalised
• More complex reasoning would be required to detect facts entailed by facts already stated
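A minimal Python sketch of the inverse-property filter, assuming triples are plain (subject, property, object) tuples and that the daml:inverseOf declarations have already been read into a dictionary. The function name `remove_inverse_duplicates` is hypothetical and this is not MIAKT's actual code. It handles only the direct inverse case, not the more complex entailment mentioned above.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

def remove_inverse_duplicates(triples: List[Triple],
                              inverse_of: Dict[str, str]) -> List[Triple]:
    # Make the inverse map symmetric: if p is inverseOf q, then q is inverseOf p.
    inverse: Dict[str, str] = dict(inverse_of)
    for p, q in inverse_of.items():
        inverse.setdefault(q, p)

    said: Set[Triple] = set()  # facts already expressed, directly or via an inverse
    kept: List[Triple] = []
    for s, p, o in triples:
        if (s, p, o) in said:
            continue  # redundant: this fact was already stated
        kept.append((s, p, o))
        said.add((s, p, o))
        if p in inverse:
            said.add((o, inverse[p], s))  # its inverse would add nothing new
    return kept


triples = [
    ("01401_patient", "involved_in_ta", "ta-soton-1069"),
    ("ta-soton-1069", "involve_patient", "01401_patient"),
]
print(remove_inverse_duplicates(triples, {"involved_in_ta": "involve_patient"}))
# [('01401_patient', 'involved_in_ta', 'ta-soton-1069')]
```

Whichever triple of an inverse pair appears first is kept, so the surviving phrasing depends on input order; a real system might instead prefer the property whose subject is the current focus.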
Discourse Planning
• Schemas capture regular patterns in the domain and can be applied recursively
• Describe-Patient ->
    Patient-Attributes, Describe-Procedures
• Patient-Attributes ->
    [attribute(Patient, Attribute)],
    Patient-Attributes*
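The two schema rules can be sketched as mutually calling Python functions, assuming facts are (subject, property, object) tuples and that the caller supplies which properties count as attributes and which as procedure links. All function names and the placeholder value `age_value` are hypothetical. `describe_procedures` is a deliberately simplified stand-in, because the slide does not expand Describe-Procedures.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, property, object)

def patient_attributes(patient: str, facts: List[Fact], attr_props: Set[str]) -> List[Fact]:
    # Patient-Attributes -> [attribute(Patient, Attribute)], Patient-Attributes*
    # Select the next attribute fact about the patient, then apply the schema again to the rest.
    for i, (s, p, o) in enumerate(facts):
        if s == patient and p in attr_props:
            return [(s, p, o)] + patient_attributes(patient, facts[i + 1:], attr_props)
    return []  # base case: no attribute facts left, so the recursion stops

def describe_procedures(patient: str, facts: List[Fact], proc_props: Set[str]) -> List[Fact]:
    # Simplified stand-in for Describe-Procedures: the patient's links to procedures.
    return [f for f in facts if f[0] == patient and f[1] in proc_props]

def describe_patient(patient: str, facts: List[Fact],
                     attr_props: Set[str], proc_props: Set[str]) -> List[Fact]:
    # Describe-Patient -> Patient-Attributes, Describe-Procedures
    return (patient_attributes(patient, facts, attr_props)
            + describe_procedures(patient, facts, proc_props))


facts = [
    ("01401_patient", "involved_in_ta", "ta-soton-1069"),
    ("01401_patient", "has-age", "age_value"),  # hypothetical placeholder value
]
plan = describe_patient("01401_patient", facts, {"has-age"}, {"involved_in_ta"})
print(plan)
# [('01401_patient', 'has-age', 'age_value'), ('01401_patient', 'involved_in_ta', 'ta-soton-1069')]
```

The schema, not the input order, fixes the order of the text plan: attributes come first even though the procedure fact appears earlier in `facts`.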
The Property Hierarchy
• Special linguistically motivated properties were introduced to make the NLG modules more generic:
 – active-action (e.g., involve_patient)
 – passive-action (e.g., involved_in_ta)
 – attribute (e.g., has-age, has-size)
 – part-whole (e.g., consists-of)
• All properties from the ontology were made sub-properties of one of these four
• This is a more lightweight approach than using a complete linguistic ontology such as GUM (Generalised Upper Model)
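A small Python sketch of how a generator could look up which of the four linguistic properties an ontology property falls under. It assumes the subPropertyOf links are available as a child-to-parent dictionary. The function name and the intermediate property `has-mass-size` are hypothetical illustrations.

```python
from typing import Dict, Optional

# The four linguistically motivated top-level properties.
LINGUISTIC_PROPERTIES = {"active-action", "passive-action", "attribute", "part-whole"}

def linguistic_class(prop: str, sub_property_of: Dict[str, str]) -> Optional[str]:
    # Follow subPropertyOf links upwards until one of the four is reached.
    visited = set()
    while prop not in LINGUISTIC_PROPERTIES:
        if prop in visited or prop not in sub_property_of:
            return None  # unclassified property (or a cycle in the hierarchy)
        visited.add(prop)
        prop = sub_property_of[prop]
    return prop


hierarchy = {
    "involve_patient": "active-action",
    "involved_in_ta": "passive-action",
    "has-age": "attribute",
    "has-size": "attribute",
    "consists-of": "part-whole",
    "has-mass-size": "has-size",  # hypothetical intermediate sub-property
}
print(linguistic_class("has-mass-size", hierarchy))  # attribute
```

Because the generation modules only need to dispatch on these four labels, a new ontology property needs just one subPropertyOf link rather than new generation code.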
Ontology-Based Aggregation
• Attribute and part-whole properties with the same first argument are joined, giving more coherent sentences
• ATTR(Abnormality: 01401, Mass: 01401_mass)
  ATTR(Abnormality: 01401, Margin: i_m_microlob)
  ATTR(Abnormality: 01401, Shape: i_shape_round)
  ATTR(Abnormality: 01401, Diagnose: i_pr_malig)
• Without aggregation: The abnormality has a mass. The abnormality has a microlobulated margin. The abnormality has a round shape. The abnormality has a probably malignant assessment.
• With aggregation: The abnormality has a mass, a microlobulated margin, a round shape, and a probably …
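The grouping step can be sketched in Python as follows, assuming each ATTR relation is a (first argument, value) pair and that a lexicon maps instance names to noun phrases. The lexicon entries here are hypothetical, taken from the wording of the unaggregated sentences, and the string templates stand in for a real surface realiser.

```python
from typing import Dict, List, Tuple

Attr = Tuple[str, str]  # ATTR(first argument, value), e.g. ("01401", "i_m_microlob")

def join_phrases(phrases: List[str]) -> str:
    # "a", "a and b", "a, b, and c"
    if len(phrases) == 1:
        return phrases[0]
    if len(phrases) == 2:
        return f"{phrases[0]} and {phrases[1]}"
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]

def aggregate(attrs: List[Attr], subject_np: Dict[str, str],
              lexicon: Dict[str, str]) -> List[str]:
    # Group values by first argument (dicts keep insertion order); one sentence per group.
    groups: Dict[str, List[str]] = {}
    for subj, value in attrs:
        groups.setdefault(subj, []).append(lexicon[value])
    sentences = []
    for subj, phrases in groups.items():
        subject = subject_np[subj]
        sentences.append(f"{subject[0].upper()}{subject[1:]} has {join_phrases(phrases)}.")
    return sentences


attrs = [("01401", "01401_mass"), ("01401", "i_m_microlob"),
         ("01401", "i_shape_round"), ("01401", "i_pr_malig")]
lexicon = {  # hypothetical lexicon entries
    "01401_mass": "a mass",
    "i_m_microlob": "a microlobulated margin",
    "i_shape_round": "a round shape",
    "i_pr_malig": "a probably malignant assessment",
}
print(aggregate(attrs, {"01401": "the abnormality"}, lexicon))
```

With the four relations above this prints a single sentence listing all four attributes; part-whole relations could be grouped the same way.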
Surface Realisation
• The input is an RDF statement together with the concept that will be the subject of the sentence: ATTR(Abnormality: 01401, Mass: 01401_mass) + Abnormality: 01401
• ATTR and PART_OF relations are already handled by an existing realiser (HYLITE), which treats the RDF as a graph and finds a path through it, starting from the focused concept
• Active and passive action properties are mapped to semantic roles such as OBJ (object), PTNT (patient), and AGNT (agent)
• AGNT(Mammography: 01402, PRODUCE_RESULT)
  OBJ(PRODUCE_RESULT, Med_Image: 01402_left_cc)
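A Python sketch of the active-action case only, reproducing the AGNT/OBJ example. It assumes the event concept is named by upper-casing a hypothetical property name `produce_result`. The slide does not show how passive-action properties are mapped, so that case is left out.

```python
from typing import List, Tuple

Role = Tuple[str, str, str]  # (role, first concept, second concept)

def active_action_roles(subject: str, prop: str, obj: str) -> List[Role]:
    # Hypothetical: the event concept is named after the property.
    event = prop.upper()
    # The subject acts as agent of the event; the object is what the event produces.
    return [("AGNT", subject, event), ("OBJ", event, obj)]

def render(roles: List[Role]) -> List[str]:
    # Print roles in the ROLE(first, second) notation used on the slide.
    return [f"{role}({a}, {b})" for role, a, b in roles]


roles = active_action_roles("Mammography: 01402", "produce_result", "Med_Image: 01402_left_cc")
for line in render(roles):
    print(line)
# AGNT(Mammography: 01402, PRODUCE_RESULT)
# OBJ(PRODUCE_RESULT, Med_Image: 01402_left_cc)
```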
Domain Portability
• Availability of lexical resources for the domain, e.g., UMLS and the SPECIALIST lexicon, or a lexicalised ontology
• Classifying the properties into the four linguistic ones can be done semi-automatically if the ontology follows good naming conventions
• The four linguistic properties may need to be extended with others if the domain requires it
• The main effort lies in the text-structuring patterns, since modifying them requires a significant understanding of the system
• Machine learning could be used to induce text patterns from labelled examples
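Semi-automatic classification from naming conventions might look like the Python sketch below. The regular-expression rules are hypothetical heuristics fitted to the property names on these slides, not MIAKT's rules. A property that matches no rule is returned as `None` for a person to classify, which is what makes the process semi-automatic.

```python
import re
from typing import Optional

# Hypothetical naming-convention rules, tried in order; the first match wins.
RULES = [
    (re.compile(r"^has[-_]"), "attribute"),                 # has-age, has-size
    (re.compile(r"^(consists|part)[-_]"), "part-whole"),    # consists-of
    (re.compile(r"^[a-z]+ed_(in|by)_"), "passive-action"),  # involved_in_ta
    (re.compile(r"^[a-z]+_[a-z]+$"), "active-action"),      # involve_patient
]

def suggest_linguistic_class(prop: str) -> Optional[str]:
    # Return a suggested class, or None when no rule fires.
    for pattern, label in RULES:
        if pattern.search(prop):
            return label
    return None  # no rule fired: leave for a human to classify


for name in ["has-age", "consists-of", "involved_in_ta", "involve_patient", "diagnosis"]:
    print(name, suggest_linguistic_class(name))
```

Suggestions should still be reviewed. For example, a name such as `has_part` would be labelled an attribute by the first rule even though it is a part-whole relation.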
Conclusion
• Presented an approach for automatic generation of texts from ontologies
• MIAKT uses information from the ontology to filter out repetitive information and to group similar facts together
• The main contribution is showing how NLG tools can be designed to be easily customised by non-specialists (through GUI tools)
• New application: sekt.semanticweb.org
Further Info
• http://www.aktors.org/miakt/
• http://www.dcs.shef.ac.uk/~kalina/papers.html
• http://sekt.semanticweb.org