
A Semantic Web Search and Metadata Engine

Date posted: 26-Feb-2016
Page 1

A Semantic Web Search and Metadata Engine

Roi Adadi, David Ben-David

Page 2

Semantic Web Document (SWD)
◦ A web page that serializes an RDF graph.
◦ Uses one of the recommended RDF syntax languages, namely RDF/XML, N-TRIPLE or N3.

Semantic Web Term (SWT)
◦ An RDF resource that represents an instance of rdfs:Class or rdf:Property and can be universally referenced by its URI reference (URIref).

Semantic Web Ontology (SWO)
◦ An SWD is considered an SWO when a significant proportion of the statements it makes define new SWTs.

Semantic Web Database (SWDB)
◦ An SWD that does not define or extend a significant number of terms.
◦ Introduces individuals and makes assertions about them.
◦ Makes assertions about individuals defined in other SWDs.

Glossary

<rdf:RDF>
  <!-- … -->
  <rdfs:Class rdf:ID="Department"/>
  <rdfs:Class rdf:ID="Course"/>
  <rdf:Property rdf:ID="name">
    <rdfs:domain>
      <owl:Class>
        <owl:unionOf rdf:parseType="Collection">
          <rdfs:Class rdf:about="#Department"/>
          <rdfs:Class rdf:about="#Course"/>
        </owl:unionOf>
      </owl:Class>
    </rdfs:domain>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <rdf:Property rdf:ID="number">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <rdf:Property rdf:ID="department">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="#Department"/>
  </rdf:Property>
  <rdf:Property rdf:ID="creditPts">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <Department rdf:ID="dept_cs">
    <name>Computer Science</name>
  </Department>
  <Course rdf:ID="cs236703">
    <name>Object Oriented Programming</name>
    <department rdf:resource="#dept_cs"/>
    <creditPts>3.0</creditPts>
  </Course>
  <!-- … -->
</rdf:RDF>

(Slide callouts mark the whole document as an SWD and each of its six class and property definitions as an SWT.)

Page 3

SWO example: FOAF (http://xmlns.com/foaf/spec/index.rdf)
◦ Contains 12 classes and 51 properties (in 466 triples), and no individuals.
◦ Terms highlighted on the slide: class Document, class Organization, property mbox.

Page 4

SWDB example: FOAF description of Tim Finin (www.cs.umbc.edu/~finin//foaf.rdf)
◦ Defines three individuals and makes statements about them, but defines no classes or properties.
◦ Statements highlighted on the slide: a name statement and a nickname statement.

Page 5

Current form of the Semantic Web
◦ A web of Semantic Web Documents (SWDs).

Navigating the Semantic Web is difficult
◦ Paucity of explicit hyperlinks (beyond namespaces in URIrefs).
◦ Relations such as rdfs:seeAlso and owl:imports are rare.

There is a need for a search engine customized for SWDs
◦ Find and analyze SWDs on the web.
◦ Suggest a measure of SWDs' importance (ranking).

Motivation

Page 6

Semantic Web researchers
◦ Search for SWTs and SWOs to use when publishing their knowledge.

Software agents
◦ Search SWDs for external knowledge.
◦ Retrieve SWOs to fully understand SWTs.

Who needs it?

Example: find the most popular ontology for publishing a personal profile.

Page 7

Conventional web navigation and ranking models are not suitable for the Semantic Web.

They do not differentiate SWDs from other web pages.

They do not parse or use the internal structure of SWDs or the external semantic links among SWDs.
◦ They are designed to work with natural language (NL) and unstructured text.

Why not just use Google?

The FOAF ontology is not among the top 10 Google results for “person ontology”.

Page 8

Finding appropriate ontologies
◦ Qualified search (terms + types).
◦ Ontologies are sorted by popularity.

Finding instance data
◦ Querying SWDs with constraints on the classes and properties they use.
◦ Helps integrate Semantic Web data on the web.

Characterizing the Semantic Web
◦ Structural properties.

Swoogle Objectives

Page 9

Ontology-based annotation systems
◦ SHOE, Ontobroker, WebKB, QuizRDF, CREAM, …
◦ Annotate online documents.
◦ Index documents based on the annotations, not on the entire document.
◦ Use their own ontologies, which might not suit some SWDs.

Related Work

Page 10

Ontology repositories
◦ DAML Ontology Library, SemWebCentral, SchemaWeb, …
◦ Collect ontologies (simply store the entire RDF document).
◦ Do not automatically discover SWDs; instead, they require people to submit URLs.
◦ Cover only a small portion of the Semantic Web.

Related Work – cont.

Page 11

Semantic Web browsers
◦ W3C's Ontaria: a searchable and browsable directory of RDF documents developed by the W3C.
◦ Does not automatically discover SWDs.
◦ Stores the full RDF graphs.
◦ Indexes individuals of well-known classes, e.g. foaf:Person, rss:Item.

Related Work – cont.

Experiments show: Swoogle outperforms them all!

Page 12

A crawler-based indexing and retrieval system for the Semantic Web.

Discovers Semantic Web documents, computes relations between documents, and stores and reasons over extracted metadata.
◦ The system is designed to scale up to tens of millions of documents.

Enables rich query constraints on semantic relations.

Swoogle

Page 13

Swoogle Architecture

Page 14

Collects candidate URLs to find and cache SWDs, from:
◦ Submitted URLs.
◦ A web crawler.
◦ A customized meta-crawler (using conventional search engines).
◦ SwoogleBot, a Semantic Web crawler.

Analyzes SWDs to produce new candidates.

Swoogle Architecture - Discovery

So far, Swoogle has found over 1.7M SWDs with more than 1G triples!
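The discovery cycle above (collect candidate URLs, keep the ones that turn out to be SWDs, mine those SWDs for new candidates) can be sketched as a breadth-first crawl. This is a minimal sketch, not Swoogle's implementation: the `fetch` callback stands in for HTTP access, `looks_like_swd` is a hypothetical heuristic (the slides do not say how Swoogle recognizes an SWD), and the example.org URLs are made up.

```python
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_swd(text):
    # Hypothetical heuristic: treat a page as an SWD if it mentions the RDF namespace.
    return RDF_NS in text


def discover(seeds, fetch):
    """Breadth-first discovery: cache each SWD found and queue the URLs it mentions."""
    queue = deque(seeds)
    seen = set(seeds)
    swds = []
    while queue:
        url = queue.popleft()
        page = fetch(url)  # (text, outgoing URLs), or None if unreachable
        if page is None:
            continue
        text, links = page
        if not looks_like_swd(text):
            continue
        swds.append(url)  # "cache" the SWD
        for link in links:  # analyzing the SWD yields new candidate URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return swds


# Toy in-memory "web" standing in for HTTP fetches (hypothetical URLs).
web = {
    "http://example.org/onto.rdf": (f"<rdf:RDF xmlns:rdf='{RDF_NS}'/>",
                                    ["http://example.org/data.rdf",
                                     "http://example.org/page.html"]),
    "http://example.org/data.rdf": (f"<rdf:RDF xmlns:rdf='{RDF_NS}'/>",
                                    ["http://example.org/onto.rdf"]),
    "http://example.org/page.html": ("<html>not RDF</html>", []),
}
print(discover(["http://example.org/onto.rdf"], web.get))
```

Here the HTML page is fetched but rejected, so only the two RDF documents are reported. Only SWDs feed the queue; a real system would also take candidates from the web crawler and meta-crawler.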

Page 15

Analyzes the discovered SWDs and generates the bulk of Swoogle's metadata about the Semantic Web.
◦ Characterizes features associated with SWDs and SWTs.
◦ Tracks relations among SWDs and SWTs.

Swoogle Architecture – Indexing

How do SWDs use, define or populate a given SWT?

How are two SWTs associated? …

Page 16

Analyzes the generated metadata.
◦ Classification of SWOs and SWDBs.

Hosts the modular ranking mechanisms.
◦ OntologyRank.

Swoogle Architecture – Analysis

Page 17

Provides search services to software agents and users, allowing them to access metadata and navigate the Semantic Web.
◦ Swoogle Search – searches SWDs using constraints on URLs, the SWTs being used or defined, etc.
◦ Ontology Dictionary – searches ontologies at the term level and offers more navigational paths.

Swoogle Architecture – Services

Page 18

SWD metadata is collected to make SWD search more efficient and effective.

It is derived from the content of SWDs as well as the relations among SWDs.

3 categories of metadata:
◦ Basic metadata
◦ Relations among SWDs
◦ Analytical results

SWD Metadata

Page 19

Language Features – properties describing the syntactic or semantic features of an SWD.
◦ Encoding – the syntactic encoding of an SWD: “RDF/XML”, “N-TRIPLE” or “N3”.
◦ Language – the language used by an SWD: “OWL”, “DAML+OIL”, “RDFS” or “RDF”.
◦ OWL Species – the language species of an SWD written in OWL: “OWL-LITE”, “OWL-DL” or “OWL-FULL”.

Basic Metadata
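One rough way to fill in the Language field is to check which vocabulary namespaces an SWD's URIrefs come from. The sketch below assumes the SWD has already been parsed into a list of URIrefs and that the most expressive vocabulary present wins. That precedence rule and the name `detect_language` are illustrative assumptions, not Swoogle's documented method; the namespace URIs are the standard OWL, DAML+OIL (March 2001) and RDFS ones.

```python
OWL_NS = "http://www.w3.org/2002/07/owl#"
DAML_NS = "http://www.daml.org/2001/03/daml+oil#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def detect_language(urirefs):
    """Guess an SWD's Language metadata from the namespaces of the URIrefs it uses.

    Assumption: the most expressive vocabulary present wins (OWL > DAML+OIL > RDFS > RDF).
    """
    def uses(namespace):
        return any(uri.startswith(namespace) for uri in urirefs)

    if uses(OWL_NS):
        return "OWL"
    if uses(DAML_NS):
        return "DAML+OIL"
    if uses(RDFS_NS):
        return "RDFS"
    return "RDF"  # only plain RDF vocabulary (or none) was found


print(detect_language([RDFS_NS + "Class", OWL_NS + "Class"]))
print(detect_language([RDFS_NS + "subClassOf"]))
```

The first call finds OWL terms and reports "OWL". The second call finds only RDFS terms and reports "RDFS". Encoding and OWL species would need a parser and a species checker, which this sketch does not attempt.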

Page 20

RDF Statistics – properties summarizing the node distribution of the RDF graph of an SWD.
◦ How an SWD defines new classes, properties and individuals.
◦ Let foo be an SWD and let C(foo), P(foo) and I(foo) be the sets of classes, properties and individuals defined in foo, respectively. The ontology ratio R(foo) is calculated by:

$$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$

◦ R(foo) ranges from 0 to 1, where 0 implies that foo is a pure SWDB and 1 implies that foo is a pure SWO.

Example: for the Course/Department document A shown on the Glossary slide,

$$C(A) = \{Department, Course\}$$
$$P(A) = \{name, number, department, creditPts\}$$
$$I(A) = \{dept\_cs, cs236703\}$$
$$R(A) = \frac{2 + 4}{2 + 4 + 2} = 0.75$$

Basic Metadata – cont.
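The ontology ratio can be computed from an SWD's rdf:type triples. This is a minimal sketch under assumptions that are not Swoogle's documented procedure. A resource typed as a class or property counts toward C or P, and any other typed resource counts as an individual. Blank nodes are skipped, such as the anonymous owl:unionOf class in the example. Triples use prefixed names for brevity.

```python
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def defined_terms(triples):
    """Split the resources typed in an SWD into classes C, properties P and individuals I."""
    C, P, I = set(), set(), set()
    for s, p, o in triples:
        if p != "rdf:type" or s.startswith("_:"):  # skip non-type triples and blank nodes
            continue
        if o in CLASS_TYPES:
            C.add(s)
        elif o in PROPERTY_TYPES:
            P.add(s)
        else:
            I.add(s)  # typed with some other class, so treated as an individual
    return C, P, I


def ontology_ratio(triples):
    """R(foo) = (|C| + |P|) / (|C| + |P| + |I|)."""
    C, P, I = defined_terms(triples)
    total = len(C) + len(P) + len(I)
    if total == 0:
        raise ValueError("SWD defines no classes, properties or individuals")
    return (len(C) + len(P)) / total


# rdf:type triples of the Course/Department example.
example = [
    ("#Department", "rdf:type", "rdfs:Class"),
    ("#Course", "rdf:type", "rdfs:Class"),
    ("_:union", "rdf:type", "owl:Class"),  # anonymous unionOf class: blank node, ignored
    ("#name", "rdf:type", "rdf:Property"),
    ("#number", "rdf:type", "rdf:Property"),
    ("#department", "rdf:type", "rdf:Property"),
    ("#creditPts", "rdf:type", "rdf:Property"),
    ("#dept_cs", "rdf:type", "#Department"),
    ("#cs236703", "rdf:type", "#Course"),
]
print(ontology_ratio(example))  # 0.75
```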

Page 21

Ontology Annotations – properties that describe an SWD as an ontology.
◦ Applies when the SWD has an instance of owl:Ontology.
◦ Swoogle records the following properties: label (rdfs:label), comment (rdfs:comment) and versionInfo (owl:versionInfo or daml:versionInfo).

Basic Metadata – cont.
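Collecting these annotations amounts to finding the owl:Ontology instances and reading their three annotation properties. This is a minimal sketch, assuming triples with prefixed names. The function name `ontology_annotations` and the example ontology are hypothetical.

```python
ANNOTATION_PROPERTIES = {
    "rdfs:label": "label",
    "rdfs:comment": "comment",
    "owl:versionInfo": "versionInfo",
    "daml:versionInfo": "versionInfo",
}


def ontology_annotations(triples):
    """Collect label/comment/versionInfo values of every owl:Ontology instance in an SWD.

    Returns None when the SWD declares no owl:Ontology instance.
    """
    ontologies = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Ontology"}
    if not ontologies:
        return None
    record = {}
    for s, p, o in triples:
        if s in ontologies and p in ANNOTATION_PROPERTIES:
            record.setdefault(ANNOTATION_PROPERTIES[p], []).append(o)
    return record


swd = [
    ("#onto", "rdf:type", "owl:Ontology"),
    ("#onto", "rdfs:label", "University ontology"),
    ("#onto", "owl:versionInfo", "1.0"),
    ("#Course", "rdfs:label", "Course"),  # describes a class, not the ontology: ignored
]
print(ontology_annotations(swd))
```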

Page 22

Capturing and analyzing relations at the RDF node level is hard.

Swoogle generalizes RDF node-level relations and focuses on SWD-level relations.

Swoogle captures the following SWD-level relations:
◦ TM/IN – an SWD uses terms defined by other SWDs.
◦ IM – an ontology imports another ontology.
◦ EX – an ontology extends another ontology.
◦ PV – an ontology is a prior version of another.
◦ CPV – an ontology is a prior version of another and is compatible with it.
◦ IPV – an ontology is a prior version of another and is incompatible with it.

Relations Among SWDs
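One plausible way to derive these SWD-level relations is to look for indicator properties in the triples. This sketch assumes the following mapping:
◦ owl:imports gives IM.
◦ owl:priorVersion, owl:backwardCompatibleWith and owl:incompatibleWith give PV, CPV and IPV.
◦ rdfs:subClassOf and similar properties pointing into another document give EX.
◦ Any other non-built-in term from another document gives TM/IN.

These OWL properties are real, but the mapping is an assumption rather than Swoogle's confirmed indicator table. `doc_of` only approximates a term's defining document from its URI, and the example.org URIs are hypothetical.

```python
BUILTIN_PREFIXES = ("rdf:", "rdfs:", "owl:")
VERSION_AND_IMPORT = {
    "owl:imports": "IM",
    "owl:priorVersion": "PV",
    "owl:backwardCompatibleWith": "CPV",
    "owl:incompatibleWith": "IPV",
}
EXTENSION_PROPERTIES = {"rdfs:subClassOf", "rdfs:subPropertyOf",
                        "owl:equivalentClass", "owl:equivalentProperty"}


def doc_of(term):
    """Approximate a term's defining document by dropping its '#fragment' or last path segment."""
    if "#" in term:
        return term.split("#", 1)[0]
    return term.rsplit("/", 1)[0]


def swd_relations(doc, triples):
    """Return sorted (code, other_document) pairs describing how SWD `doc` relates to others."""
    rels = set()
    for s, p, o in triples:
        if p in VERSION_AND_IMPORT:
            # The object names a whole ontology (for PV/CPV/IPV, the prior version).
            rels.add((VERSION_AND_IMPORT[p], o))
        elif (p in EXTENSION_PROPERTIES and not o.startswith(BUILTIN_PREFIXES)
              and doc_of(o) != doc):
            rels.add(("EX", doc_of(o)))
        # Term usage: the predicate, plus the class of an rdf:type triple.
        used = [p, o] if p == "rdf:type" else [p]
        for term in used:
            if not term.startswith(BUILTIN_PREFIXES) and doc_of(term) != doc:
                rels.add(("TM/IN", doc_of(term)))
    return sorted(rels)


B = "http://example.org/b.owl"
triples = [
    (B, "owl:imports", "http://example.org/a.owl"),
    (B, "owl:priorVersion", "http://example.org/b-v1.owl"),
    (B + "#Student", "rdfs:subClassOf", "http://example.org/a.owl#Person"),
    (B + "#bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]
for relation in swd_relations(B, triples):
    print(relation)
```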

Page 23

Inter-Ontology relations

Indicators of inter-ontology relation

Page 24

OntologyRank is inspired by Google's PageRank algorithm.

Underlying random surfing model:
◦ The surfer jumps to a random URL.
◦ With probability d, it randomly chooses a link to follow.
◦ With probability 1-d, it jumps to another random URL.

Ranking SWDs

Page 25

Given a document A, A's PageRank is computed by:

$$PR(A) = PR_{direct}(A) + PR_{link}(A)$$
$$PR_{direct}(A) = 1 - d$$
$$PR_{link}(A) = d \cdot \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

where $T_1, \ldots, T_n$ are the web documents that link to A; C(T) is the total number of outlinks of T; and d is a damping factor, typically set to 0.85.

PageRank
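Because each PR(T) on the right-hand side is itself unknown, the formula is usually solved by iteration: start every document at some value and reapply the formula until the ranks settle. This is a minimal power-iteration sketch of the non-normalized form above (ranks sum to the number of documents rather than to 1). The fixed iteration count, the starting value 1.0 and the function name `pagerank` are assumptions, and documents without outlinks simply leak rank here.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T) for each T linking to A).

    links maps each document to the list of documents it links to (its outlinks).
    """
    docs = set(links) | {t for targets in links.values() for t in targets}
    inlinks = {doc: [] for doc in docs}
    for source, targets in links.items():
        for target in targets:
            inlinks[target].append(source)
    pr = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        # Every new value is computed from the previous iteration's ranks.
        pr = {a: (1 - d) + d * sum(pr[t] / len(links[t]) for t in inlinks[a])
              for a in docs}
    return pr


ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
print({doc: round(r, 3) for doc, r in sorted(ranks.items())})  # {'A': 1.459, 'B': 1.391, 'C': 0.15}
```

C has no inlinks, so it keeps only the direct term 1 - d. A collects rank from both B and C and ends up highest.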

Page 26

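PageRank example (d = 0.85), with three documents $T_1, T_2, T_3$ linking to A:

$$PR(T_1) = 15,\; C(T_1) = 5 \qquad PR(T_2) = 8,\; C(T_2) = 4 \qquad PR(T_3) = 18,\; C(T_3) = 6$$
$$PR_{direct}(A) = 1 - d = 0.15$$
$$PR_{link}(A) = d \cdot \left( \frac{15}{5} + \frac{8}{4} + \frac{18}{6} \right) = 0.85 \cdot 8 = 6.8$$
$$PR(A) = 0.15 + 6.8 = 6.95$$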
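The example's arithmetic can be checked in a few lines. With the inputs PR(T1) = 15, C(T1) = 5, PR(T2) = 8, C(T2) = 4, PR(T3) = 18, C(T3) = 6 and d = 0.85, the link term is 0.85 · (3 + 2 + 3) = 6.8, so PR(A) = 0.15 + 6.8 = 6.95.

```python
d = 0.85
inlinks = [(15, 5), (8, 4), (18, 6)]  # (PR(T_i), C(T_i)) for the documents linking to A

pr_direct = 1 - d                                          # PR_direct(A)
pr_link = d * sum(rank / outlinks for rank, outlinks in inlinks)  # PR_link(A)
pr_a = pr_direct + pr_link                                 # PR(A)
print(round(pr_direct, 2), round(pr_link, 2), round(pr_a, 2))  # 0.15 6.8 6.95
```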

Page 27

The graph formed by SWDs has a richer set of relations.
◦ The edges have explicit semantics.

Users can navigate the Semantic Web within and across the Web and RDF graphs through 7 groups of navigational paths.

The SW Navigation Model

Page 28

The SW Navigation Model

Page 29

The semantics of links leads to a non-uniform probability of following a particular outgoing link.

Given SWDs A and B, Swoogle classifies inter-SWD links into four categories:
◦ imports(A,B) – A imports all content of B.
◦ uses-term(A,B) – A uses some of the terms defined by B (without importing B).
◦ extends(A,B) – A extends the definitions of terms defined by B.
◦ asserts(A,B) – A makes assertions about the individuals defined by B.

Each category is assigned a different weight, which represents the probability of following that kind of link.

OntologyRank

Page 30

Given an SWD a, Swoogle computes its raw rank by:

$$rawPR(a) = (1 - d) + d \cdot \sum_{x \in L(a)} rawPR(x) \cdot \frac{f(x,a)}{f(x)}$$
$$f(x,a) = \sum_{l \in links(x,a)} weight(l)$$
$$f(x) = \sum_{a' \in T(x)} f(x,a')$$

where L(a) is the set of SWDs that link to a, T(x) is the set of SWDs that x links to, and links(x,a) is the set of links from x to a.

OntologyRank – cont.
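The raw rank is PageRank with weighted edges. A surfer on x follows a link to a with probability f(x,a)/f(x) instead of 1/C(x). This is a minimal sketch of the formula by fixed-point iteration. The `WEIGHTS` values are hypothetical: the slides say each link category gets its own weight but give no numbers. The iteration count, starting value and function name `raw_rank` are also assumptions.

```python
from collections import defaultdict

# Hypothetical per-category weights; the slides do not give Swoogle's actual values.
WEIGHTS = {"imports": 1.0, "extends": 0.8, "uses-term": 0.5, "asserts": 0.3}


def raw_rank(links, weights=WEIGHTS, d=0.85, iterations=100):
    """links: (x, a, kind) triples, one per semantic link from SWD x to SWD a."""
    f_xa = defaultdict(float)  # f(x, a): summed weight of the links from x to a
    for x, a, kind in links:
        f_xa[(x, a)] += weights[kind]
    f_x = defaultdict(float)   # f(x): summed weight of all links leaving x
    for (x, _), w in f_xa.items():
        f_x[x] += w
    docs = {x for x, _, _ in links} | {a for _, a, _ in links}
    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1 - d for doc in docs}
        for (x, a), w in f_xa.items():
            new_rank[a] += d * rank[x] * w / f_x[x]  # x passes on the share f(x,a)/f(x)
        rank = new_rank
    return rank


ranks = raw_rank([("A", "B", "imports"), ("A", "C", "uses-term"), ("B", "A", "imports")])
print({doc: round(r, 3) for doc, r in sorted(ranks.items())})
```

A links to both B and C, but the imports link carries twice the weight of the uses-term link. B therefore receives twice C's share of A's rank. Only the ratios among one document's outgoing weights matter.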

Page 31

Then, Swoogle computes the rank for SWDBs and SWOs by:

$$PR_{SWDB}(a) = rawPR(a)$$
$$PR_{SWO}(a) = \sum_{x \in TC(a)} rawPR(x)$$

where TC(a) is the transitive closure of SWOs imported by a.

OntologyRank – cont.
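Computing $PR_{SWO}$ needs TC(a), which is a graph reachability question over import edges. This sketch finds it with a depth-first search. It assumes TC(a) includes a itself, so an ontology that imports nothing keeps its own rawPR. The slide does not say this explicitly. The rawPR values and import graph in the usage lines are hypothetical.

```python
def imported_closure(a, imports):
    """TC(a): SWOs reachable from a through import links, found by depth-first search.

    Assumption: a itself is included in its closure.
    """
    closure, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for y in imports.get(x, ()):
            if y not in closure:  # the visited set also makes import cycles safe
                closure.add(y)
                stack.append(y)
    return closure


def pr_swdb(a, raw_pr):
    return raw_pr[a]


def pr_swo(a, raw_pr, imports):
    return sum(raw_pr[x] for x in imported_closure(a, imports))


raw_pr = {"A": 1.0, "B": 2.0, "C": 0.5, "D": 3.0}  # hypothetical rawPR values
imports = {"A": ["B"], "B": ["C"], "C": ["A"]}     # A imports B, B imports C, C imports A
print(pr_swo("A", raw_pr, imports), pr_swo("D", raw_pr, imports))  # 3.5 3.0
```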

Page 32

The problem of indexing and searching SWDs
◦ Significant semantic information is encoded in marked-up documents.
◦ Reasoning over a large collection of documents can be expensive.

Traditional information retrieval techniques
◦ Are faster (they take a coarse view of the text).
◦ Can quickly retrieve a set of SWDs based on similarities of the source text alone.

Indexing and Retrieval of SWDs

Page 33

SWDs are not entirely markup.
◦ Search should be applied to both the structured and unstructured components of a document.

We may want SWDs to be available to commonly used search engines.
◦ Documents must be transformed into a form that a standard IR engine can understand and manipulate.

IR offers well-researched methods for ranking matches, computing similarities between documents and employing relevance feedback.

Applying IR Techniques

Page 34

Look at a document as a collection of either tokens or N-grams.

URIrefs of classes, properties and individuals correspond to words in natural languages.

Apply the following process to an SWD:
◦ Reduce it to triples.
◦ Extract URIrefs (with duplicates).
◦ Discard blank-node identifiers.
◦ Hash each URIref to a token.
◦ Index the document.
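The five steps above can be sketched with the standard library, assuming the SWD is already reduced to triples of strings. In this sketch, URIrefs start with http:// or https://, blank nodes start with "_:", and anything else is treated as a literal and skipped. Truncated MD5 as the hash and the names `uri_token`, `swd_tokens` and `build_index` are illustrative choices, not Swoogle's. Hashing turns a long URIref into one compact alphanumeric "word" that an ordinary IR engine can index.

```python
import hashlib
from collections import Counter, defaultdict


def uri_token(uri):
    """Hash a URIref to a compact alphanumeric token an IR engine can treat as a word."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()[:16]


def swd_tokens(triples):
    """Extract URIrefs from the triples (keeping duplicates), skip blank nodes, hash to tokens."""
    tokens = []
    for triple in triples:
        for node in triple:
            if node.startswith("_:"):  # blank node: no URIref to index
                continue
            if not node.startswith(("http://", "https://")):  # assumption: a literal
                continue
            tokens.append(uri_token(node))
    return tokens


def build_index(swds):
    """Inverted index: token -> Counter of {document: term frequency}."""
    index = defaultdict(Counter)
    for doc, triples in swds.items():
        for token in swd_tokens(triples):
            index[token][doc] += 1
    return index


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
swds = {
    "doc1": [("_:b0", RDF_TYPE, FOAF + "Person"), ("_:b1", RDF_TYPE, FOAF + "Person")],
    "doc2": [("http://example.org/me", RDF_TYPE, FOAF + "Person"),
             ("http://example.org/me", FOAF + "name", "Alice")],
}
index = build_index(swds)
print(index[uri_token(FOAF + "Person")])  # Counter({'doc1': 2, 'doc2': 1})
```

Keeping duplicates preserves term frequency: doc1 mentions foaf:Person twice and so scores higher for it.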

Applying IR Techniques

Indexes by either N-grams or URIrefs.

Matching “time” to:
http://foo.com/timeont.owl#timeInterval
http://foo.com/timeont.owl#calendarClockInterval
http://purl.org/upper/temporal/t13.owl#timeThing
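N-gram indexing lets a partial word like “time” match URIrefs that only contain it as a fragment. This is a minimal sketch using character 4-grams over the whole, lower-cased URIref. Because the namespace is included, the calendarClockInterval term matches through "timeont". The value n = 4, the AND-semantics over the query's n-grams and the extra non-matching URIref are assumptions for illustration.

```python
from collections import defaultdict


def char_ngrams(text, n=4):
    """Lower-cased character n-grams of a string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(urirefs, n=4):
    """Map each character n-gram to the set of URIrefs containing it."""
    index = defaultdict(set)
    for uri in urirefs:
        for gram in char_ngrams(uri, n):
            index[gram].add(uri)
    return index


def ngram_search(query, index, n=4):
    """Return the URIrefs that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:  # query shorter than n: nothing to look up
        return set()
    results = None
    for gram in grams:
        hits = index.get(gram, set())
        results = set(hits) if results is None else results & hits
    return results


uris = [
    "http://foo.com/timeont.owl#timeInterval",
    "http://foo.com/timeont.owl#calendarClockInterval",
    "http://purl.org/upper/temporal/t13.owl#timeThing",
    "http://example.org/geo.owl#Place",  # hypothetical non-matching URIref
]
for uri in sorted(ngram_search("time", build_ngram_index(uris))):
    print(uri)
```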

Page 35

Swoogle Demo…

