Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010 . http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

NIF – NLP Interchange Format

http://aksw.org/Projects/NIF

Outline:
• NLP Interchange Format
• Use Cases
  – Integration of tools
  – Meaning Representation Language
  – Knowledge Extraction with SPARQL
  – Machine Learning
• Related Projects

KAIST LOD2 17.8.2011

Problem:
• Currently NLP software is organized in pipelines
• Integration is done "hard-wired"
  – For each tool and each framework an adapter has to be created (n*m)
• Difficult to aggregate output
• Difficult to exchange single components
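
The n*m remark can be made concrete with a little arithmetic. The sketch below compares the number of hard-wired adapters with the number of wrappers needed if every component spoke one shared format. The function names, the assumption of one reader per framework, and the figures of 10 tools and 5 frameworks are illustrative only and do not come from the talk.

```python
def pairwise_adapters(n_tools: int, n_frameworks: int) -> int:
    """Hard-wired integration: one adapter per (tool, framework) pair."""
    return n_tools * n_frameworks


def shared_format_wrappers(n_tools: int, n_frameworks: int) -> int:
    """Shared format (assumed): one wrapper per tool, one reader per framework."""
    return n_tools + n_frameworks


if __name__ == "__main__":
    tools, frameworks = 10, 5  # illustrative figures only
    print(pairwise_adapters(tools, frameworks))       # 50
    print(shared_format_wrappers(tools, frameworks))  # 15
```

The gap widens as either number grows, which is the motivation for a common interchange format.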

Overview:
• NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
• For each tool, a wrapper needs to be created that reads NIF and produces NIF
• The combination of tools can be ad hoc, i.e. no pipeline needs to be configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications

• First Challenge: Representing Strings in RDF
• How to give a part of a document or text an identifier (URI)?
• What properties can such URIs have?

Example URIs for annotating "Semantic Web"
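
The URIs shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of one way a substring can be given a URI, namely by encoding its character offsets in the fragment. The document URL, the sample sentence, the `offset_` fragment layout and the helper name are all assumptions made for illustration; the actual NIF recipe may differ.

```python
from urllib.parse import quote


def offset_uri(doc_uri: str, text: str, phrase: str) -> str:
    """Build a fragment URI for the first occurrence of `phrase` in `text`."""
    begin = text.index(phrase)  # raises ValueError if the phrase is absent
    end = begin + len(phrase)
    # Assumed fragment layout: offset_<begin>_<end>_<url-encoded phrase>
    return f"{doc_uri}#offset_{begin}_{end}_{quote(phrase)}"


if __name__ == "__main__":
    sentence = "The Semantic Web is a web of data."  # hypothetical document text
    print(offset_uri("http://example.org/doc.txt", sentence, "Semantic Web"))
```

Two tools that apply the same recipe to the same document arrive at the same URI without coordinating with each other.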

• URIs are used to integrate output: RDF graphs merge naturally if the URIs are the same (or can be converted into one another using a fixed recipe)
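
A minimal sketch of why shared URIs make integration cheap: for triples without blank nodes, merging two RDF graphs is just a set union, so statements made by different tools about the same URI end up side by side. The string URI, the `ex:` property names and the two tool outputs are hypothetical and only serve the illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URI identifying the string "Semantic Web" in some document.
TOKEN = "http://example.org/doc.txt#offset_4_16_Semantic%20Web"

# Output of two independent (hypothetical) tools about the same string URI.
stemmer_output: Set[Triple] = {(TOKEN, "ex:stem", "semant web")}
tagger_output: Set[Triple] = {(TOKEN, "ex:posTag", "NNP")}


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Merge graphs of ground triples by taking their set union."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: Set[Triple], subject: str) -> Dict[str, str]:
    """Collect the properties attached to one subject URI."""
    return {p: o for s, p, o in graph if s == subject}


if __name__ == "__main__":
    print(describe(merge(stemmer_output, tagger_output), TOKEN))
```

If the two tools had minted different URIs for the same string, the union would still succeed but the annotations would stay disconnected, which is why a common URI recipe matters.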

• Second challenge: Output of each layer is required to be stable.
• Components and layers can be interchanged
• Domain ontologies are needed to provide stable interfaces:
  – OLiA provides an ontological interface for morpho-syntax: http://nachhalt.sfb632.uni-potsdam.de/owl/
  – DBpedia provides stable ids for Things

Demo - Integration

• http://nlp2rdf.lod2.eu/annotator-stanford/NIFStemmer?input=My%20favorite%20actress%20is%20Natalie%20Portman!&type=text

• http://nlp2rdf.lod2.eu/annotator-stanford/NIFStanfordCore?input=My%20favorite%20actress%20is%20Natalie%20Portman!&type=text
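
The two demo links differ only in the service name, so the request URL can be assembled programmatically. The sketch below only builds the URL and sends nothing, since the demo service may no longer be online. The helper name `demo_url` is an assumption; the base URL, service names and the `input` and `type` parameters are taken from the links above.

```python
from urllib.parse import quote, urlencode

BASE = "http://nlp2rdf.lod2.eu/annotator-stanford"


def demo_url(service: str, text: str) -> str:
    """Assemble a demo request URL for `service`, e.g. "NIFStemmer"."""
    # quote_via=quote encodes spaces as %20; safe="!" leaves "!" unescaped,
    # which matches the form of the URLs on the slide.
    query = urlencode({"input": text, "type": "text"}, safe="!", quote_via=quote)
    return f"{BASE}/{service}?{query}"


if __name__ == "__main__":
    sentence = "My favorite actress is Natalie Portman!"
    for service in ("NIFStemmer", "NIFStanfordCore"):
        print(demo_url(service, sentence))
```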

Use Cases

• Use Cases
  – Integration of tools
  – Meaning Representation Language
  – Knowledge Extraction with SPARQL
  – Machine Learning

Use Case – Integration of tools

Use Case – Meaning Representation Language

• RDF makes data integration easy: URIref, LinkedData

• OWL is based on Description Logics (Guarded Fragment)

• Availability of open data sets (access and licence)

• Diverse serializations for annotations: XML, Turtle, RDFa+XHTML

• Scalable tool support (Databases, Reasoning)

Use Case – Knowledge Extraction with SPARQL

• Classical approach:
  – Run a POS tagger / dependency parser (e.g. Stanford)
  – Create a rule/pattern language to extract knowledge

Johanna Völker – Learning Expressive Ontologies (LExO)

# Example:
# A fish is any aquatic vertebrate animal that is covered with scales, and equipped with two sets of paired fins and several unpaired fins.
# [fish] subClassOf [any aquatic vertebrate animal that is covered …]

CONSTRUCT { ?sub rdfs:subClassOf ?super }
WHERE {
  ?is a penn:BePresentTense .
  ?is nlp:superToken ?is_any_aquatic_ .
  ?is_any_aquatic_ a olia:VerbPhrase .
  ?is_any_aquatic_ nlp:syntacticSubToken [ nlp:normUri ?super ] .
  ?animal nlp:cop ?is .
  ?animal nlp:nsubj ?fish .
  ?fish nlp:superToken [ nlp:normUri ?sub ] .
}

Use Case - Machine Learning

Workplan

• EU Deliverable almost finished

• Integration of SnowballStemming and the Stanford Parser

• Next step: Integration of Knowledge Extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais, FOX)

• Web services that read NIF and output NIF

• Google Code Project: http://code.google.com/p/nlp2rdf/

• Web Site: http://aksw.org/Projects/NIF

Summary

• NIF allows NLP output to be represented using knowledge representation formalisms (RDF/OWL)

• It can be mixed with other knowledge (e.g. Wikipedia/DBpedia)

• Good foundation to optimize machine learning:
  – Choose the best algorithms
  – Choose the best data

Related Projects

• Wiktionary

• LLOD

• CKAN / Open Linguistics

NLP2RDF – http://aksw.org/Projects/NLP2RDF . http://lod2.eu

Creation of data sets: Wiktionary2RDF

http://en.wiktionary.org/wiki/house
• Covers 170 languages
• Total of 10 million pages
• 900,000 users
• RDF dump will increase number of editors
• Same properties as Wikipedia (stable identifiers)

• Hundreds of Wiktionary parsers (especially for English)
• Information is trapped in the Wiki
• Structure changes make software obsolete

Why try it again?
• DBpedia Extraction Framework is very mature (5 years, 15 developers)
• Configuration over code: templates will allow Wiktionarians to update parsers
• Early contact with the community

Wiktionary, Wortschatz and OLiA can become the crystallization point for a Linguistic Linked Data Web

Four major types:
• Lexical Semantic Resources
• Dictionaries
• Corpora
• Schemas/Ontologies

Open Licences – Focus of LOD2 and OKFN

http://ckan.net/

CKAN is an open registry of data and content packages. Harnessing the CKAN software, this site makes it easy to find, share and reuse content and data, especially in ways that are machine automatable.

Working Group on Open Data in Linguistics
http://linguistics.okfn.org

• Founded in Nov 2010
• 40 Members
• Membership open, please join
• Over 100 data sets in CKAN

Thank you for your attention!

