Date posted: 11-Nov-2014
COMP3725 Knowledge Enriched Information Systems
Lecture 13: Semantic Augmentation
Dhavalkumar Thakker (Dhaval), School of Computing, University of Leeds
Outline
• Semantic Augmentation
  – What
  – Why
  – How
• Existing systems & services for Semantic Augmentation
• Challenges
Semantic Augmentation
• From:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
  – New York → http://dbpedia.org/resource/New_York_City
  – Apple Corps → http://dbpedia.org/resource/Apple_Corps
Semantic Augmentation
• Semantic augmentation is a process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning conveyed by the text.
• Also called semantic annotation or semantic tagging
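A semantic annotation is commonly stored as stand-off markup: character offsets into the unchanged text plus the identifier of the resource the span denotes. The Python sketch below illustrates that idea on the example sentence. The class and function names are hypothetical, exact string matching is a deliberate simplification, and the URIs assume DBpedia's resource namespace (http://dbpedia.org/resource/…).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticAnnotation:
    """Stand-off annotation: a character span of the text plus the URI it denotes."""
    start: int  # offset of the first character of the mention
    end: int    # offset one past the last character of the mention
    uri: str    # identifier of the ontology / knowledge-base resource

    def surface_form(self, text: str) -> str:
        return text[self.start:self.end]


def annotate(text: str, mentions: dict[str, str]) -> list[SemanticAnnotation]:
    """Attach a URI to the first exact occurrence of each known mention."""
    found = []
    for mention, uri in mentions.items():
        start = text.find(mention)
        if start != -1:
            found.append(SemanticAnnotation(start, start + len(mention), uri))
    return sorted(found, key=lambda a: a.start)


sentence = ("Upon their return, Lennon and McCartney went to New York "
            "to announce the formation of Apple Corps.")
links = {  # hypothetical mention-to-URI table
    "Apple Corps": "http://dbpedia.org/resource/Apple_Corps",
    "New York": "http://dbpedia.org/resource/New_York_City",
}
for ann in annotate(sentence, links):
    print(ann.start, ann.end, ann.surface_form(sentence), ann.uri)
```

Because the annotations sit beside the text rather than inside it, several layers (tokens, sentences, lookups) can all point into the same unchanged document; GATE also keeps its annotations in this stand-off form.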
It provides additional information about an existing piece of data.
Why Semantic Augmentation?
• Links to complementary information
  – “More about this”
• Shows related or similar information
• Reasoning and inferencing offered by semantics
• Semantic annotation is the glue that ties ontologies into document spaces
  – Remember that the existing web is a web of documents
• Manual metadata production cost is too high
GATE for Semantic Augmentation
• GATE (General Architecture for Text Engineering) – see gate.ac.uk
• GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language.
• See: http://gate.ac.uk/family/developer.html
Overview of GATE Developer
• Resources Pane
  – applications: groups of processes to run on a document or corpus
  – language resources: corpora, ontologies, schemas
  – processing resources: tools that operate on unstructured text
  – datastores: saved documents and resources
• Display Pane: whatever you’re currently working with
• See next slide
GATE: Interface
[Screenshot: GATE Developer, labelled Resources Pane and Display Pane]
Processing Resources: ANNIE
• A family of Processing Resources for language analysis included with GATE
• Stands for A Nearly-New Information Extraction system.
• Uses finite-state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.
ANNIE IE Modules
http://gate.ac.uk/sale/tao/splitch6.html#chap:annie
Some ANNIE Components
• Tokenizer
  – Splits text into tokens: word, number, symbol, punctuation, and spaceToken
• Sentence Splitter
  – Segments text into sentences
• Part-of-Speech Tagger
  – Produces a part-of-speech tag (noun, verb, etc.) as an annotation on each word or symbol
• GATE Morphological Analyser
  – Detects morphemes, adding the root and affix of each token (e.g. caring → root care, affix ing)
• OntoGazetteer
  – Semantic tagging component that uses an ontology
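To make the first two components concrete, here is a toy tokenizer and sentence splitter in Python. It only approximates ANNIE: the regular expression, the reuse of the slide's token-kind names, and the rule that a sentence ends at ".", "!" or "?" are simplifying assumptions, not ANNIE's actual rules.

```python
import re

# Simplified token classes named after the ANNIE token kinds on the slide.
# The patterns are assumptions for illustration, not ANNIE's real rules.
TOKEN_PATTERN = re.compile(
    r"(?P<word>[A-Za-z]+(?:'[A-Za-z]+)?)"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<punctuation>[.,;:!?()\"'])"
    r"|(?P<spaceToken>\s+)"
    r"|(?P<symbol>\S)"
)


def tokenise(text):
    """Return (kind, string, start, end) tuples that together cover the text."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_PATTERN.finditer(text)]


def split_sentences(tokens):
    """Group non-space tokens into sentences that end at '.', '!' or '?'."""
    sentences, current = [], []
    for kind, string, _, _ in tokens:
        if kind == "spaceToken":
            continue
        current.append(string)
        if kind == "punctuation" and string in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep any trailing text without final punctuation
        sentences.append(current)
    return sentences


text = ("Upon their return, Lennon and McCartney went to New York. "
        "They announced the formation of Apple Corps.")
for sentence in split_sentences(tokenise(text)):
    print(sentence)
```

A real sentence splitter also has to cope with abbreviations such as "Dr." or "Inc.", which this sketch would wrongly treat as sentence ends.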
Demo:
• From:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
  – New York → http://dbpedia.org/resource/New_York_City
  – Apple Corps → http://dbpedia.org/resource/Apple_Corps
Step: Download and start the GATE application
• Download GATE from: http://gate.ac.uk/download/
• Note: the demonstration uses GATE 6.0
Step: From Language Resources, select GATE Document
• Make sure that “String content” is selected in the last field, and name the document “Test”
Paste the following text into the document:
• Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
Step: From Processing Resources, select the following resources
• ANNIE English Tokeniser
• ANNIE Sentence Splitter
• ANNIE POS Tagger
• GATE Morphological Analyser
• Note: for all of the above, leave the “Name” field empty
Step: From Language Resources, select OWLIM Ontology
• Specify the location of the ontology you would like to use for semantic augmentation
  – For example, we are using the DBpedia ontology
OWLIM Ontology window
Step: From Processing Resources, select Onto Root Gazetteer
• Specify its parameters, including the ontology loaded in the previous step
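Roughly speaking, the Onto Root Gazetteer turns the resource names and labels of the loaded ontology into gazetteer entries and adds Lookup annotations where they occur in the text. The Python sketch below imitates that with a small, hypothetical label-to-URI dictionary and longest-match lookup over word tokens. Unlike the real component, it only lower-cases words and does not match on morphological roots.

```python
import re

# Hypothetical labels taken from an ontology, mapped to resource URIs.
ONTOLOGY_LABELS = {
    "Apple Corps": "http://dbpedia.org/resource/Apple_Corps",
    "New York": "http://dbpedia.org/resource/New_York_City",
    "New York City": "http://dbpedia.org/resource/New_York_City",
}


def build_gazetteer(labels):
    """Index each label by its lower-cased word tuple, e.g. ('new', 'york')."""
    return {tuple(label.lower().split()): uri for label, uri in labels.items()}


def lookup(text, gazetteer):
    """Return (start, end, surface, uri) Lookup matches, preferring the longest."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    longest = max((len(key) for key in gazetteer), default=0)
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(word for word, _, _ in tokens[i:i + n])
            if key in gazetteer:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, text[start:end], gazetteer[key]))
                i += n  # skip past the matched words
                break
        else:  # no label starts at token i
            i += 1
    return matches


gazetteer = build_gazetteer(ONTOLOGY_LABELS)
print(lookup("They went to New York City to announce Apple Corps.", gazetteer))
```

Preferring the longest match means "New York City" is tagged as one entity instead of stopping at "New York". Labels are split on whitespace only, so labels containing punctuation would need extra normalisation.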
Final steps: Create Corpus
• Go to Language Resources, click on GATE Corpus, and add the “Test” document created earlier
Final steps: Create Corpus Pipeline
• From Applications, create a Corpus Pipeline
• Add the processing resources in the order shown and press “Run this Application”
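The order of processing resources matters because each one reads annotations that earlier ones produce. This hypothetical Python model records what each resource in the demo is assumed to need and add (for example, that the Onto Root Gazetteer relies on the root feature from the Morphological Analyser) and rejects an ordering that breaks those dependencies. The dependency lists are illustrative assumptions, not GATE's own configuration.

```python
class ProcessingResource:
    """Hypothetical model of a processing resource: what it needs and what it adds."""

    def __init__(self, name, requires, produces):
        self.name = name
        self.requires = set(requires)
        self.produces = set(produces)


def check_pipeline(resources):
    """Raise ValueError if a resource runs before the annotations it needs exist."""
    available = set()
    for pr in resources:
        missing = pr.requires - available
        if missing:
            raise ValueError(f"{pr.name} needs {sorted(missing)} from an earlier resource")
        available |= pr.produces
    return available


# Assumed dependencies, written as annotation types and Token features.
PIPELINE = [
    ProcessingResource("ANNIE English Tokeniser", [], ["Token", "SpaceToken"]),
    ProcessingResource("ANNIE Sentence Splitter", ["Token"], ["Sentence"]),
    ProcessingResource("ANNIE POS Tagger", ["Token", "Sentence"], ["Token.category"]),
    ProcessingResource("GATE Morphological Analyser", ["Token.category"], ["Token.root"]),
    ProcessingResource("Onto Root Gazetteer", ["Token.root"], ["Lookup"]),
]

print(sorted(check_pipeline(PIPELINE)))
```

Reversing the list makes the check fail at the first resource, which mirrors what happens when the gazetteer runs before any tokens exist.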
Results: Open the “Test” document and click on Annotation Sets, Annotations List, and Lookup
Semantic Augmentation
Other features
• JAPE (Java Annotation Patterns Engine) provides regular-expression-based pattern/action rules over annotations
  – Grammars to detect entities, validate detected entities, and perform pre- and post-processing
  – Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena”
– See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
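JAPE rules match a pattern over annotations and then run an action, such as adding a new annotation. The Python sketch below mimics one such pattern/action rule for the venue examples: it scans a list of token strings for "at the" followed by capitalised tokens ending in "Stadium" or "Arena". It is not JAPE syntax, and the function names and keyword set are hypothetical.

```python
import re

VENUE_KEYWORDS = {"Stadium", "Arena"}  # hypothetical trigger words


def is_capitalised(token):
    """Rough stand-in for an orthography check: 'Emirates', 'O2', 'Arena'."""
    return re.fullmatch(r"[A-Z]\w*", token) is not None


def find_venues(tokens):
    """Pattern: "at" "the" (Capitalised)+ ending in a keyword; action: record a Venue."""
    venues = []
    for i in range(len(tokens) - 2):
        if tokens[i].lower() != "at" or tokens[i + 1].lower() != "the":
            continue
        j = i + 2
        while j < len(tokens) and is_capitalised(tokens[j]):
            # Require at least one name token before the keyword.
            if tokens[j] in VENUE_KEYWORDS and j > i + 2:
                venues.append(" ".join(tokens[i + 2:j + 1]))  # omit "at the"
                break
            j += 1
    return venues


tokens = "We met at the Emirates Stadium and later at the O2 Arena .".split()
print(find_venues(tokens))
```

In GATE the same rule would match Token annotations and their features (such as orthography) rather than raw strings, so it benefits from the tokenizer having run first.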
Some Links
• Home page: http://gate.ac.uk/
• Some good short tutorial videos for getting started, only a few minutes each: http://gate.ac.uk/demos/developer-videos/
• User Guide: http://gate.ac.uk/sale/tao/index.html (this appears to be for version 7.1, a development build, but it seems to be fine)
• Lots of documentation: http://gate.ac.uk/documentation.html
• The wiki: http://gate.ac.uk/wiki/
• JAPE grammar tutorial by Dhaval Thakker et al.: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
Challenge: Term Ambiguity
• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting by a palm tree...
• What do “apple” and “palm” mean in each case?
• The objective is to recognize entities and disambiguate their meaning.
DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
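One simple way to see how context resolves these cases is a Lesk-style overlap score: pick the sense whose typical context words share the most words with the sentence. This is a swapped-in, deliberately simplified technique, not the method DBpedia Spotlight uses, and the sense labels and context-word sets below are hypothetical.

```python
import re

# Hypothetical sense inventory: each sense lists a few typical context words.
SENSES = {
    "apple": {
        "apple (fruit)": {"eating", "fruit", "tree", "juice", "hand"},
        "Apple Inc. (company)": {"acquire", "inc", "company", "iphone", "computer"},
    },
    "palm": {
        "Palm, Inc. (company)": {"acquire", "inc", "company", "device"},
        "palm tree (plant)": {"tree", "beach", "coconut"},
        "palm (part of the hand)": {"hand", "fingers", "thumb"},
    },
}


def disambiguate(term, sentence, senses=SENSES):
    """Pick the sense with the largest word overlap; ties go to the first listed."""
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    scores = {sense: len(words & context) for sense, words in senses[term].items()}
    return max(scores, key=scores.get)


EXAMPLES = [
    "...this apple on the palm of my hand...",
    "...Apple tried to acquire Palm Inc....",
    "...eating an apple sitting by a palm tree...",
]
for example in EXAMPLES:
    print(disambiguate("apple", example), "|", disambiguate("palm", example))
```

Hand-written context lists do not scale, which is why a service like DBpedia Spotlight learns its context model from Wikipedia text instead.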
Challenges
• Disambiguation
• Unknown entities
• Ontology learning
• Scale and speed
• Co-referencing
Existing Services for Semantic Augmentation
DBpedia Spotlight
• DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as Linked Data
• DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages
• It learns how to recognize that a DBpedia resource was mentioned
• Given plain text as input, it generates annotated text
• Demo: http://dbpedia-spotlight.github.com/demo/
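DBpedia Spotlight can also be called as a web service rather than through the demo page. The Python sketch below uses only the standard library. The endpoint URL (https://api.dbpedia-spotlight.org/en/annotate), the text and confidence parameters, and the JSON field names ("Resources", "@URI", "@surfaceForm", "@offset") are assumptions; check them against the current DBpedia Spotlight documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; hosted Spotlight endpoints have moved over time.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking for JSON annotations of the given text."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{SPOTLIGHT_URL}?{query}",
                                  headers={"Accept": "application/json"})


def parse_resources(response_json):
    """Extract (surface form, offset, URI) triples from an assumed JSON reply shape."""
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in response_json.get("Resources", [])]


def spotlight_annotate(text, confidence=0.5, timeout=10):
    """Send the text to the service (needs network access) and parse the reply."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=timeout) as resp:
        return parse_resources(json.load(resp))
```

Calling spotlight_annotate on the Lennon and McCartney sentence should return mentions with DBpedia URIs similar to the augmented example at the start of the lecture. Raising the confidence value trades recall for precision.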
References
• DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
• Introduction to GATE, Dr. Paula Matuszek
• Various resources from gate.ac.uk