Introduction
Semantic Web
Ontologies
Linked Data
Information Sources
Information Extraction and Text Mining
Machine Reading
Relation Extraction
Named Entity Recognition and Disambiguation
Semantic Web Application Use Cases
Knowledge Bases
Entity Linking
Entity Retrieval
Linked Data Quality
Conclusions
Papers for Presentations
Resources
Semantic Web: Extracting and Mining Structured Data from Unstructured Content
Web Science Lecture
Besnik Fetahu
L3S Research Center, Leibniz Universität Hannover
May 20, 2014
1 Introduction
2 Semantic Web: Ontologies, Linked Data
3 Information Sources
4 Information Extraction and Text Mining: Machine Reading, Relation Extraction, Named Entity Recognition and Disambiguation
5 Semantic Web Application Use Cases: Knowledge Bases, Entity Linking, Entity Retrieval, Linked Data Quality
6 Conclusions
7 Papers for Presentations
8 Resources
Introduction
• Large amounts of data.
• Heterogeneity of information: provenance, quality, content, representation, language, etc.
• ‘Unstructured’ vs. Structured.
• Ontologies and Knowledge Bases.
• Entities, topics, relations.
• Use cases: Machine translation, semantic search, etc.
Semantic Web
The Semantic Web vision
The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.
Semantic Web: Main Components
• Formats: Turtle, N3, etc.
• Syntax: XML Schema
• Models: RDF
• Taxonomies: RDFS
• Ontologies: OWL
• Query languages: SPARQL
• Rules: RIF (Rule Interchange Format)
Semantic Web: Data Formats and Models
• XML data format
• RDF data representation (〈 subject, predicate, object 〉)
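The 〈subject, predicate, object〉 model can be illustrated with a minimal Python sketch. It assumes every term is an IRI string (the example.org IRIs are hypothetical; only the rdf:type IRI is the real RDF one), uses `None` as a wildcard when looking up triples, and ignores literals and blank nodes when writing N-Triples lines.

```python
from typing import Iterator, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One RDF statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

graph: Set[Triple] = {
    Triple(EX + "Berlin", EX + "capitalOf", EX + "Germany"),
    Triple(EX + "Berlin", RDF_TYPE, EX + "City"),
    Triple(EX + "Hannover", RDF_TYPE, EX + "City"),
}


def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples matching a pattern; None acts as a wildcard."""
    for t in g:
        if ((s is None or t.subject == s)
                and (p is None or t.predicate == p)
                and (o is None or t.obj == o)):
            yield t


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three terms are IRIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# All resources typed as ex:City, printed in N-Triples syntax
for t in sorted(match(graph, p=RDF_TYPE, o=EX + "City")):
    print(to_ntriples(t))
```

A graph is just a set of such statements, so the same structure serves both for storage and for simple pattern lookups.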
Ontologies
Ontologies define the following concepts:
• Entities
• Relations
• Domains
• Rules
• Axioms
http://en.wikipedia.org/wiki/Ontology_(information_science)
Knowledge Representation: Differences in OWL Ontologies
• OWL-Lite (OWL-Lite ⊂ OWL-DL): supports users who primarily need a classification hierarchy and simple constraints. It supports cardinality constraints, but only permits cardinality values of 0 or 1.
• OWL-DL (OWL-DL ⊂ OWL Full): supports maximum expressiveness while retaining computational completeness and decidability. OWL-DL includes all OWL language constructs, but they can be used only under certain restrictions.
• OWL Full: meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.
Knowledge Representation: Ontologies and Schemas
• RDF Schema (RDFS)¹
1 classes: rdfs:Class
2 properties: rdf:Property, rdfs:subClassOf
3 domains: rdfs:domain
• Web Ontology Language (OWL)² (OWL-Lite, OWL-DL)
1 classes: owl:Class
2 properties: owl:equivalentClass, owl:sameAs
• Friend of a Friend (FOAF) ontology³
1 classes: foaf:Agent, foaf:Document, foaf:Organization, foaf:Person
• Simple Knowledge Organization System (SKOS) ontology⁴
1 classes: skos:Concept, skos:Collection
2 properties: skos:related, skos:broader, skos:narrower
¹ http://www.w3.org/TR/rdf-schema/
² http://www.w3.org/TR/owl-ref/
³ http://xmlns.com/foaf/spec/
⁴ http://www.w3.org/2008/05/skos
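What rdfs:subClassOf buys in practice is inference. The following is a minimal sketch, assuming triples are tuples of prefixed-name strings (the ex: names are hypothetical). It applies only two of the RDFS entailment rules, rdfs11 (subclass transitivity) and rdfs9 (type inheritance), until no new triples appear; it is not a complete RDFS reasoner.

```python
from typing import Set, Tuple

TripleT = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples: Set[TripleT]) -> Set[TripleT]:
    """Apply RDFS rules rdfs11 and rdfs9 until a fixpoint is reached."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for (s, p, o) in closure if p == SUBCLASS_OF]
        new: Set[TripleT] = set()
        # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: (x type A) and (A subClassOf B) => (x type B)
        for x, p, cls in closure:
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if cls == sub:
                        new.add((x, RDF_TYPE, sup))
        new -= closure
        if not new:  # nothing new derived: fixpoint reached
            return closure
        closure |= new


kb = {
    ("ex:Physicist", SUBCLASS_OF, "ex:Scientist"),
    ("ex:Scientist", SUBCLASS_OF, "foaf:Person"),
    ("ex:Einstein", RDF_TYPE, "ex:Physicist"),
}
inferred = rdfs_closure(kb)
print(sorted(o for s, p, o in inferred if s == "ex:Einstein" and p == RDF_TYPE))
```

Because the set of possible triples over the given terms is finite, the loop always terminates, even for cyclic class hierarchies.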
Knowledge Representation
• RDFS example
• Hierarchical class modelling
• OWL ontology example
www.mpi-inf.mpg.de/yago-naga/IJCAI11-tutorial/IJCAI11-tutorial/ijcai11-tutorial.pptx
Knowledge Representation: Ontologies vs. Taxonomies
Knowledge Representation: ABox vs. TBox
Linked Data
• RDF data published as triples 〈 subject, predicate, object 〉
• SPARQL: the standard query language over RDF data
• Linked Data principles:
1 Use URIs as names for things
2 Use dereferenceable (HTTP) URIs
3 Provide information about things using standards: RDF, SPARQL
4 Interlink with other things
• Billions of triples
• Interlink all data into one gigantic graph: lod-cloud, schema.org...
• Annotating web pages with embedded markup: microformats, RDFa
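The core of a SPARQL query is a basic graph pattern: a set of triple patterns with variables that must all match at once. A minimal Python sketch of that evaluation follows, assuming DBpedia-style prefixed names used as plain strings (the data is illustrative only) and supporting no SPARQL features beyond conjunctive triple patterns.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern, joining pattern matches one by one."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("dbr:Hannover", "rdf:type", "dbo:City"),
    ("dbr:Hannover", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "rdf:type", "dbo:City"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Paris", "rdf:type", "dbo:City"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]
# Roughly: SELECT ?city WHERE { ?city rdf:type dbo:City . ?city dbo:country dbr:Germany }
query = [("?city", "rdf:type", "dbo:City"), ("?city", "dbo:country", "dbr:Germany")]
print(sorted(b["?city"] for b in evaluate_bgp(query, data)))
```

Real SPARQL engines use indexes and join reordering instead of this nested-loop join, but the semantics of a basic graph pattern is the same.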
Everything done?
• Only a small fraction of data is actually structured
• Defining schemas, taxonomies and ontologies manually and explicitly is cumbersome
• A large proportion of data is unstructured or semi-structured
• Can we automatically extract and model such content?
Information Sources
1 Semi-structured data: Wikipedia, WordNet
2 Social streams: Twitter
3 News corpora: NYT Collection, Reuters, Wall Street Journal (WSJ)
4 Web pages: Common Crawl, ClueWeb
5 Linked Data: lod-cloud
Information Extraction and Text Mining
• Very large corpora of unstructured text.
• Heterogeneity: languages, quality, domains.
• Rich underlying structure of unstructured text.
• Natural Language Processing (NLP): POS, NER, Co-Ref, Dependency Parsing (DP) etc.
• Utilise NLP output for IE based on syntactic, semantic and lexical patterns.
• Query- and entity-based summarisation.
http://en.wikipedia.org/wiki/Wikipedia:Statistics
http://www.worldwidewebsize.com/
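As one illustration of IE driven by lexical patterns, the sketch below implements a single Hearst-style "X such as A, B and C" pattern with a regular expression. This is a swapped-in classic example, not a method from the slides; it assumes instances are single capitalised tokens and does not lemmatise the class name.

```python
import re
from typing import List, Tuple

# One Hearst-style lexical pattern: "<class> such as <Inst>, <Inst> and <Inst>".
# Instances are approximated as single capitalised tokens.
SUCH_AS = re.compile(r"(\w+) such as ((?:[A-Z]\w*(?:, | and | or )?)+)")
SEPARATOR = re.compile(r", | and | or ")


def extract_is_a(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (instance, 'isA', class) triples found by the 'such as' pattern."""
    triples = []
    for m in SUCH_AS.finditer(sentence):
        hypernym = m.group(1)
        for hyponym in SEPARATOR.split(m.group(2)):
            hyponym = hyponym.strip()
            if hyponym:  # a trailing separator leaves an empty piece
                triples.append((hyponym, "isA", hypernym))
    return triples


print(extract_is_a("He visited cities such as Hannover, Berlin and Paris."))
```

Multi-word names such as "New York" would be truncated by this pattern; practical systems match over NLP output (POS tags, noun-phrase chunks) rather than raw characters.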
Machine Reading
• Autonomous understanding of text by machines
• Construct beliefs based on the underlying corpus
• OpenIE: a domain-independent IE paradigm for extracting relations, classes, and entities
• TextRunner (Etzioni et al. 2008): a self-supervised approach for OpenIE
• Represent each relation as a triple 〈subject, predicate, object〉
• Understanding of the semantics of extracted triples is still primitive
Machine Reading. Etzioni O., Banko M., Cafarella M. J. AAAI 2007
Machine Reading: TextRunner
1 Self-Supervised Learner
2 Single-pass extractor
3 Redundancy-Based Assessor
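The redundancy idea behind the assessor can be sketched as follows: merge tuples that are identical after normalisation and count the distinct sentences that support each one. TextRunner turns these counts into probabilities with the model it inherits from KnowItAll; the noisy-or scoring below is a simpler stand-in, and `p_single` (the assumed precision of a single extraction) is a hypothetical parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Extraction = Tuple[str, ...]


def normalise(t: Extraction) -> Extraction:
    """Lower-case each part and collapse whitespace so near-identical tuples merge."""
    return tuple(" ".join(part.lower().split()) for part in t)


def assess(extractions: Iterable[Tuple[int, Extraction]],
           p_single: float = 0.7) -> Dict[Extraction, float]:
    """Score merged tuples by the number of distinct sentences supporting them.

    Noisy-or: each supporting sentence is treated as independent evidence
    that is correct with (hypothetical) probability p_single.
    """
    support: Dict[Extraction, Set[int]] = defaultdict(set)
    for sentence_id, t in extractions:
        support[normalise(t)].add(sentence_id)  # a set: repeats in one sentence count once
    return {t: 1 - (1 - p_single) ** len(ids) for t, ids in support.items()}


extractions = [
    (1, ("Einstein", "was born in", "Ulm")),
    (2, ("einstein", "was  born in", "Ulm")),
    (2, ("Einstein", "was born in", "Ulm")),  # same sentence, counted once
    (3, ("Einstein", "was born in", "Berlin")),
]
scores = assess(extractions)
for t, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(t, round(p, 2))
```

Tuples seen in more independent sentences receive higher confidence, which is what lets a single-pass extractor stay noisy while the collection-level output improves.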
Relation Extraction
• Dependency parsing (DP) of text chunks for relation extraction
• Syntactic patterns for relation extraction
• Semantic and lexical patterns for relation extraction
• ReVerb: a two-step approach, “relation first” rather than “arguments first”
1 identify relations
2 identify arguments
“Michael Webb appeared on Oprah...” ⇒ 〈Michael Webb; appear on; Oprah〉
Schmitz et al. 2007
Fader et al. 2011
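ReVerb's “relation first” strategy can be sketched on POS-tagged input: find relation phrases matching the syntactic constraint V | V P | V W* P, then take the nearest noun phrase on each side as arguments. This is a simplified sketch, not ReVerb's implementation. It assumes Penn Treebank tags are already available, approximates noun phrases as runs of DT/JJ/NN* tokens, omits ReVerb's lexical constraint, and does not lemmatise (so it yields “appeared on” rather than “appear on”).

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token, Penn Treebank POS tag)

W_PREFIXES = ("NN", "JJ", "RB", "PRP", "DT")  # W = noun | adj | adv | pron | det
P_TAGS = {"IN", "RP", "TO"}                   # P = prep | particle | inf. marker
NP_PREFIXES = ("NN", "JJ", "DT")              # crude noun-phrase tokens


def relation_spans(tagged: Tagged) -> List[Tuple[int, int]]:
    """Find relation phrases matching V | V P | V W* P (longest match)."""
    spans, i, n = [], 0, len(tagged)
    while i < n:
        if not tagged[i][1].startswith("VB"):
            i += 1
            continue
        j = i
        while j < n and tagged[j][1].startswith("VB"):
            j += 1                      # V: one or more verbs (simplified)
        end, k = j, j
        while k < n and tagged[k][1].startswith(W_PREFIXES):
            k += 1                      # W*
        if k < n and tagged[k][1] in P_TAGS:
            end = k + 1                 # V W* P (or V P when W* is empty)
        spans.append((i, end))
        i = end
    return spans


def extract(tagged: Tagged) -> List[Tuple[str, str, str]]:
    """Relation first, then the nearest noun phrase on each side as arguments."""
    def words(span: Tagged) -> str:
        return " ".join(tok for tok, _ in span)

    def has_noun(span: Tagged) -> bool:
        return any(tag.startswith("NN") for _, tag in span)

    triples = []
    for start, end in relation_spans(tagged):
        left = start
        while left > 0 and tagged[left - 1][1].startswith(NP_PREFIXES):
            left -= 1
        right = end
        while right < len(tagged) and tagged[right][1].startswith(NP_PREFIXES):
            right += 1
        arg1, arg2 = tagged[left:start], tagged[end:right]
        if has_noun(arg1) and has_noun(arg2):
            triples.append((words(arg1), words(tagged[start:end]), words(arg2)))
    return triples


sentence = [("Michael", "NNP"), ("Webb", "NNP"), ("appeared", "VBD"),
            ("on", "IN"), ("Oprah", "NNP"), (".", ".")]
print(extract(sentence))
```

Fixing the relation phrase first means the W* part can absorb light-verb constructions such as “made a deal with”, which argument-first systems tend to split incorrectly.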
Relation Extraction
• ClausIE (del Corro et al., 2013): a clause-based approach for relation extraction
• Automated approach, less restrictive and with improved recall.
del Corro et al. 2013
Named Entity Recognition and Disambiguation
• Textual content has a rich underlying syntactic and semantic structure
• Frequently extracted syntactic and semantic information: POS, Co-Ref and NER.
• Stanford CoreNLP: named entity recognition with specific entity types such as Person, Organisation, Location, Date.
• NED: disambiguation of named-entity surface forms against entities from knowledge bases
1 DBpedia Spotlight
2 Wikipedia Miner
3 AIDA ...
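A common baseline for NED combines a prior (how often a surface form refers to a given entity) with the overlap between the mention's context and a description of each candidate. The sketch below assumes a tiny hypothetical candidate dictionary with made-up prior values and context words (the dbr: identifiers are DBpedia-style names used as strings); it is not how DBpedia Spotlight, Wikipedia Miner or AIDA work internally.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical surface-form dictionary: mention -> [(entity, prior P(entity | mention))]
CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "Paris": [("dbr:Paris", 0.9), ("dbr:Paris_Hilton", 0.1)],
}
# Hypothetical bag of context words describing each candidate entity
ENTITY_CONTEXT: Dict[str, Set[str]] = {
    "dbr:Paris": {"france", "capital", "city", "seine"},
    "dbr:Paris_Hilton": {"hilton", "heiress", "hotel", "celebrity"},
}


def disambiguate(mention: str, context: Iterable[str], alpha: float = 0.3) -> Optional[str]:
    """Pick the candidate maximising alpha * prior + (1 - alpha) * context overlap."""
    ctx = {w.lower() for w in context}
    best, best_score = None, float("-inf")
    for entity, prior in CANDIDATES.get(mention, []):
        desc = ENTITY_CONTEXT.get(entity, set())
        union = ctx | desc
        overlap = len(ctx & desc) / len(union) if union else 0.0  # Jaccard similarity
        score = alpha * prior + (1 - alpha) * overlap
        if score > best_score:
            best, best_score = entity, score
    return best  # None when the mention has no known candidates


print(disambiguate("Paris", "the hotel heiress Hilton arrived".split()))
print(disambiguate("Paris", []))
```

With no informative context the prior wins and the city is chosen; enough matching context words override the prior, which is the trade-off the weight `alpha` controls.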
Knowledge Bases
Prominent knowledge base examples:
1 WordNet knowledge base
2 Wikipedia encyclopaedia
3 DBpedia knowledge base
4 YAGO knowledge base
Entity Linking and Interlinking
• Semantic relatedness of entities
• Exploit existing knowledge base structures
• Latent relationships via semantic relations
http://www.visualdataweb.org/relfinder/relfinder.php
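One way to exploit knowledge base structure for semantic relatedness is the Wikipedia link-based measure popularised by Milne and Witten: two entities are related if many articles link to both. The sketch below is one such measure, not the method behind the tool linked above; the incoming-link sets and corpus size are hypothetical inputs, and `total_articles` is assumed to be larger than either link set.

```python
import math
from typing import Set


def link_relatedness(in_a: Set[str], in_b: Set[str], total_articles: int) -> float:
    """Link-based relatedness of two entities in [0, 1].

    in_a, in_b: articles linking to entity a and entity b.
    total_articles: number of articles in the (hypothetical) corpus.
    """
    common = in_a & in_b
    if not common:
        return 0.0
    if in_a == in_b:
        return 1.0
    larger, smaller = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    # Normalised distance: shared in-links relative to the corpus size
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller))
    return max(0.0, 1.0 - distance)


a = {"A1", "A2", "A3", "A4"}
b = {"A3", "A4", "A5", "A6"}
print(round(link_relatedness(a, b, 100), 3))
```

The measure needs only the link graph, not article text, which makes it cheap to compute over an entire knowledge base.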
Entity Retrieval
• Search through structured data in the form of triples
• Weight different predicates differently
• Map user keyword queries to matching entities
Blanco et al. 2011
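Blanco et al. rank RDF entities with a BM25F-style model in which predicates act as weighted fields. The following is a simplified sketch of that idea, not their implementation: it assumes each entity is a dict from predicate to literal text, uses hypothetical entities and weights, and shares one length-normalisation parameter `b` across all fields.

```python
import math
from collections import defaultdict
from typing import Dict, List


def bm25f_scores(entities: Dict[str, Dict[str, str]], query: str,
                 weights: Dict[str, float], k1: float = 1.2,
                 b: float = 0.75) -> Dict[str, float]:
    """Simplified BM25F: each predicate is a field with its own weight."""
    docs = {e: {p: text.lower().split() for p, text in fields.items()}
            for e, fields in entities.items()}
    field_lengths: Dict[str, List[int]] = defaultdict(list)
    for fields in docs.values():
        for p, tokens in fields.items():
            field_lengths[p].append(len(tokens))
    avg_len = {p: sum(ls) / len(ls) for p, ls in field_lengths.items()}

    terms = query.lower().split()
    n = len(docs)
    df = {t: sum(1 for f in docs.values() if any(t in toks for toks in f.values()))
          for t in terms}

    scores = {}
    for e, fields in docs.items():
        score = 0.0
        for t in terms:
            # Weighted, length-normalised term frequency accumulated over fields
            tf = 0.0
            for p, tokens in fields.items():
                if not tokens:
                    continue
                norm = 1 - b + b * len(tokens) / avg_len[p]
                tf += weights.get(p, 1.0) * tokens.count(t) / norm
            if tf > 0:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                score += idf * tf / (k1 + tf)  # saturate after combining fields
        scores[e] = score
    return scores


entities = {
    "ex:Hannover": {"rdfs:label": "Hannover",
                    "dbo:abstract": "capital city of Lower Saxony in Germany"},
    "ex:Hannover_96": {"rdfs:label": "Hannover 96",
                       "dbo:abstract": "football club based in Hannover Germany"},
    "ex:Berlin": {"rdfs:label": "Berlin", "dbo:abstract": "capital city of Germany"},
}
label_heavy = bm25f_scores(entities, "hannover", {"rdfs:label": 3.0, "dbo:abstract": 1.0})
uniform = bm25f_scores(entities, "hannover", {"rdfs:label": 1.0, "dbo:abstract": 1.0})
print(max(label_heavy, key=label_heavy.get), max(uniform, key=uniform.get))
```

With a heavily weighted label, the entity named exactly “Hannover” ranks first; with uniform weights, the football club wins because the term also occurs in its abstract. This is why predicate weighting matters.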
Linked Data Quality
Zaveri et al. 2012
Conclusions
• Large volumes of unstructured and high quality data
• High applicability of IE techniques for structuring unstructured data
• Availability of encyclopaedias in the form of knowledge bases
• Wide range of applications in the Semantic Web
• Further expansion of knowledge bases with facts about the real world extracted from unstructured text beyond Wikipedia infoboxes
• Quality aspects of data
Papers for Presentations
1 YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. Suchanek F., Kasneci Gj., Weikum G. In Proceedings of the 16th WWW, pages 697-706, 2007
2 Semantic Stability in Social Tagging Streams. Wagner C., Singer P., Strohmaier M., Huberman B. CoRR, 2013
3 Test-driven Evaluation of Linked Data Quality. Kontokostas D., Westphal P., Auer S., Hellmann S., Lehmann J., Cornelissen R., Zaveri A. In Proceedings of the 23rd WWW, pages 747-758, 2014
4 Federated Entity Search Using On-the-Fly Consolidation. Herzig D., Mika P., Blanco R., Tran T. In Proceedings of the ISWC, pages 167-183.
5 Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information. Palmero Aprosio A., Giuliano C., Lavelli A. In Proceedings of the 11th ESWC, pages 397-411.
Resources
• Fabian Suchanek and Gerhard Weikum. 2013. Knowledge harvesting in the big-data era. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13).
• Gerhard Weikum and Martin Theobald. 2010. From information to knowledge: harvesting entities and relationships from web sources. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS ’10).
• Roi Blanco, Peter Mika, and Sebastiano Vigna. 2011. Effective and efficient entity search in RDF data. In Proceedings of the 10th international conference on The semantic web (ISWC ’11).
• Jeffrey Pound, Peter Mika, and Hugo Zaragoza. 2010. Ad-hoc object retrieval in the web of data. In Proceedings of the 19th international conference on World wide web (WWW ’10).
• Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B. and Nejdl, W. "Combining a co-occurrence-based and a semantic measure for entity linking." In Proceedings of the 10th Extended Semantic Web Conference, 2013 (ESWC ’13).
• Zaveri, Amrapali, Rula, Anisa, Maurino, Andrea, Pietrobon, Ricardo, Lehmann, Jens and Auer, Sören. "Quality Assessment Methodologies for Linked Open Data." Semantic Web Journal (2014).
• Gangemi, Aldo. "A Comparison of Knowledge Extraction Tools for the Semantic Web." In Proceedings of the 10th Extended Semantic Web Conference, 2013 (ESWC ’13).
• Mendes, Pablo N., Jakob, Max, García-Silva, Andrés and Bizer, Christian. "DBpedia Spotlight: shedding light on the web of documents." In Proceedings of the 7th International Conference on Semantic Systems, 2011.
• Yosef, Mohamed Amir, Hoffart, Johannes, Bordino, Ilaria, Spaniol, Marc and Weikum, Gerhard. "AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables." PVLDB 4, no. 12 (2011): 1450-1453.
• Isabelle Augenstein, Sebastian Padó, and Sebastian Rudolph. 2012. LODifier: generating linked data from unstructured text. In Proceedings of the 9th international conference on The Semantic Web: research and applications (ESWC ’12).
• Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: a nucleus for a web of open data. In Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference (ISWC ’07/ASWC ’07).
• Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web (WWW ’07).
• Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open information extraction from the web. Commun. ACM 51, 12 (December 2008), 68-74.
• Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL ’12).
• Raymond J. Mooney and Razvan Bunescu. 2005. Mining knowledge from text using information extraction. SIGKDD Explor. Newsl.
• Chang Wang, James Fan, Aditya Kalyanpur, and David Gondek. 2011. Relation extraction with relation topics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’11).
• Robert Isele, Anja Jentzsch, and Christian Bizer. 2010. Silk Server - Adding missing Links while consuming Linked Data. COLD 2010.
• Oren Etzioni. 2008. Machine reading at web scale. In Proceedings of the 2008 International Conference on Web Search and Data Mining.
• Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: clause-based open information extraction. In Proceedings of the 22nd international conference on World Wide Web (WWW ’13).
• Rudi Studer, V. Richard Benjamins, and Dieter Fensel. 1998. Knowledge engineering: Principles and methods. Data & Knowledge Engineering, Volume 25, Issues 1-2, pages 161-197.
• Christian Bizer, Tom Heath, and Tim Berners-Lee. 2009. International Journal on Semantic Web and Information Systems 5(3):1-22.
Thank you!
Questions?