Swoogle: A Semantic Web Search and Metadata Engine

Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng

Pavan Reddivari, Vishal Doshi, Joel Sachs

Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County

CIKM ’04

-------

Dongmin Shin

IDS Lab

2008.10.22

Copyright 2008 by CEBT

Index

Introduction

Semantic Web Documents

Swoogle Architecture

Finding SWDs

SWD Metadata

Ranking SWDs

Indexing and Retrieval of SWDs

Conclusions

Evaluation and Discussion

Center for E-Business Technology

Introduction

Semantic Web documents (SWDs) are characterized by semantic annotations and meaningful references to other SWDs

Conventional search engines do not take advantage of these features

A search engine customized for SWDs is needed

Swoogle is a crawler-based indexing and retrieval system for the Semantic Web

Introduction

Three Activities of Swoogle

Finding appropriate ontologies

– Allows users to query for ontologies that contain specified terms anywhere in the document

– The ontologies returned are ranked

Finding instance data

– Enables querying SWDs with constraints on which classes and properties they use or define

Characterizing the Semantic Web

– By collecting metadata about the Semantic Web, Swoogle reveals interesting structural properties

Swoogle automatically discovers SWDs, indexes their metadata and answers queries about it

Semantic Web Documents

SWD: a document written in a Semantic Web language that is online and accessible to web users and software agents

Two kinds of SWDs

SWOs (Semantic Web Ontologies)

– Correspond to T-Boxes

– A significant proportion of their statements define new terms or extend the definitions of terms defined in other SWDs

SWDBs (Semantic Web Databases)

– Correspond to A-Boxes

– Do not define or extend a significant number of terms

– Can introduce individuals and make assertions about them, or make assertions about individuals defined in other SWDs

Swoogle Architecture

SWD discovery

Discovers potential SWDs throughout the Web

Metadata creation

Caches a snapshot of each SWD and generates objective metadata about SWDs

Data analysis

Uses the cached SWDs and the generated metadata to derive analytical reports

Interface

Provides data services to the Semantic Web community

Finding SWDs

Google Crawler

Uses the Google Web Service

Starts by querying for documents with Semantic Web file type extensions (e.g. .rdf, .owl)

Appends constraints (keywords) to construct more specific queries, then combines their results

Focused Crawler

Crawls documents within a given website

– Extension constraint: skips URLs with extensions such as “.jpg” or “.html”

– Focus constraint: only crawls URLs under the given base URL
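
The two focused-crawler constraints can be sketched as a single URL filter. This is a minimal illustration under stated assumptions, not Swoogle's crawler: the function name `should_crawl` and the exclusion list are hypothetical (only “.jpg” and “.html” come from the slide), and the focus check is a plain prefix test against the base URL.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical exclusion list: only ".jpg" and ".html" come from the slide.
EXCLUDED_EXTENSIONS = (".jpg", ".html")


def should_crawl(link: str, base_url: str) -> bool:
    """Return True if a discovered link passes both focused-crawler constraints."""
    url = urljoin(base_url, link)  # resolve relative links against the base URL
    path = urlparse(url).path.lower()
    # Extension constraint: skip URLs whose path ends with an excluded extension.
    if path.endswith(EXCLUDED_EXTENSIONS):
        return False
    # Focus constraint: simple prefix test, so the base URL should end with "/".
    return url.startswith(base_url)


base = "http://example.org/onto/"
links = ["a.owl", "photo.jpg", "index.html", "http://other.org/b.rdf", "sub/c.rdf"]
print([link for link in links if should_crawl(link, base)])  # ['a.owl', 'sub/c.rdf']
```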

Finding SWDs

Web interface

Registered users can submit a URL of either a SWD or a web directory

JENA2-based Swoogle Crawler

Analyzes the content of a SWD and discovers new SWDs

– e.g. uses URIrefs, owl:imports, rdfs:seeAlso, foaf:Person
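
The crawler's heuristics can be read as rules that turn a parsed SWD's triples into new candidate URLs. A sketch under assumptions, not Swoogle's Jena-based code: triples are plain (subject, predicate, object) string tuples, and only the URIref-namespace, owl:imports and rdfs:seeAlso rules are shown; `candidate_swds` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def is_uri(term: str) -> bool:
    """Treat http(s) strings as URIrefs; anything else is a literal here."""
    return term.startswith(("http://", "https://"))


def document_of(uri: str) -> str:
    """Strip the fragment so a term's URIref points at its defining document."""
    return uri.split("#", 1)[0]


def candidate_swds(triples: Iterable[Triple]) -> Set[str]:
    """Collect URLs that may lead to new SWDs (hypothetical heuristic rules)."""
    found: Set[str] = set()
    for s, p, o in triples:
        # Explicit links to other documents.
        if p in (OWL_IMPORTS, RDFS_SEEALSO) and is_uri(o):
            found.add(document_of(o))
        # Namespace of every hash-style URIref; slash namespaces would need another rule.
        for term in (s, p, o):
            if is_uri(term) and "#" in term:
                found.add(document_of(term))
    return found
```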

SWD Metadata – Basic Metadata

Language features: properties describing the syntactic or semantic features of a SWD

– Encoding: syntactic encoding of a SWD (RDF/XML, N-Triples, N3)

– Language: Semantic Web language used by a SWD (OWL, DAML, RDFS, RDF)

– OWL Species: language species of a SWD written in OWL (OWL Lite, OWL DL, OWL Full)

RDF statistics: properties summarizing the node distribution of the RDF graph

– Focus on how SWDs define new classes, properties and individuals

– SWOs and SWDBs are distinguished by the ontology ratio R(foo)
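
The slide names the ontology ratio R(foo) without defining it. A minimal sketch, assuming R is the share of a SWD's defined terms that are classes or properties (rather than individuals), with a hypothetical cut-off for labeling the document as an SWO or an SWDB; neither the formula nor the 0.8 threshold is taken from the slides.

```python
def ontology_ratio(n_classes: int, n_properties: int, n_individuals: int) -> float:
    """Assumed definition: fraction of defined terms that are classes or properties."""
    total = n_classes + n_properties + n_individuals
    return 0.0 if total == 0 else (n_classes + n_properties) / total


def classify_swd(n_classes: int, n_properties: int, n_individuals: int,
                 threshold: float = 0.8) -> str:
    """Label a SWD; the threshold is a hypothetical parameter, not from the slides."""
    r = ontology_ratio(n_classes, n_properties, n_individuals)
    return "SWO" if r >= threshold else "SWDB"


print(classify_swd(40, 10, 2))   # R = 50/52, about 0.96: SWO
print(classify_swd(1, 2, 300))   # R = 3/303, about 0.01: SWDB
```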

SWD Metadata – Basic Metadata

Ontology annotation

Properties that describe a SWD as an ontology

– label: rdfs:label

– comment: rdfs:comment

– versionInfo: owl:versionInfo and daml:versionInfo
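
Collecting these annotations amounts to a lookup over the triples whose subject is the ontology itself. A sketch under assumptions: triples are (subject, predicate, object) string tuples, the ontology's own URI is already known, and the names `ANNOTATION_PREDICATES` and `ontology_annotations` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

# Each annotation field on the slide and the predicates that supply it.
ANNOTATION_PREDICATES = {
    "label": [RDFS + "label"],
    "comment": [RDFS + "comment"],
    "versionInfo": [OWL + "versionInfo", DAML + "versionInfo"],
}


def ontology_annotations(triples: Iterable[Tuple[str, str, str]],
                         ontology_uri: str) -> Dict[str, List[str]]:
    """Collect label/comment/versionInfo values attached to the ontology node."""
    result: Dict[str, List[str]] = {field: [] for field in ANNOTATION_PREDICATES}
    for s, p, o in triples:
        if s != ontology_uri:
            continue  # only statements about the ontology itself count
        for field, predicates in ANNOTATION_PREDICATES.items():
            if p in predicates:
                result[field].append(o)
    return result
```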

SWD Metadata – Relations among SWDs

TM/IN: term reference relations between two SWDs

– i.e. a SWD uses terms defined by other SWDs

IM: an ontology imports another ontology

EX: an ontology extends another

– e.g. ontology A defines class AC, which is an rdfs:subClassOf class BC defined in ontology B

PV: an ontology is a prior version of another

CPV: an ontology is a prior version of, and compatible with, another

IPV: an ontology is a prior version of, but incompatible with, another
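
A sketch of how these relations might be read off one SWD's triples. Assumptions, none from the slides: triples are (subject, predicate, object) string tuples, a term's defining document is its URI minus the fragment, PV/CPV/IPV come from owl:priorVersion, owl:backwardCompatibleWith and owl:incompatibleWith, and the TM/IN distinction is not modeled (everything is reported as TM).

```python
from typing import Iterable, Set, Tuple

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Assumed mapping: "doc_a <property> o" means o is a prior version of doc_a.
VERSION_RELATIONS = {
    OWL + "priorVersion": "PV",
    OWL + "backwardCompatibleWith": "CPV",
    OWL + "incompatibleWith": "IPV",
}


def is_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def doc(uri: str) -> str:
    """Assume a term is defined by its URI with the fragment removed."""
    return uri.split("#", 1)[0]


def relations(doc_a: str, triples: Iterable[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Return (relation code, other document) pairs found in SWD doc_a's triples."""
    found: Set[Tuple[str, str]] = set()
    for s, p, o in triples:
        if p == OWL + "imports":
            found.add(("IM", o))
        elif p in VERSION_RELATIONS:
            found.add((VERSION_RELATIONS[p], o))
        elif p == RDFS + "subClassOf" and doc(s) == doc_a and is_uri(o) and doc(o) != doc_a:
            found.add(("EX", doc(o)))  # a class defined in doc_a extends a remote class
        for term in (p, o):
            if is_uri(term) and "#" in term and doc(term) != doc_a:
                found.add(("TM", doc(term)))  # doc_a uses a term defined elsewhere
    return found
```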

Ranking SWDs

Random surfing model (PageRank)

Not appropriate for the Semantic Web

– The semantics of links lead to a non-uniform probability of following a particular outgoing link

Rational random surfing model

Classifies inter-SWD links into four categories

– imports(A,B), uses-term(A,B), extends(A,B), asserts(A,B)

The more terms in B referenced by A, the more likely a surfer will follow the link from A to B

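Ranking SWDs

[Figure: example link graph with nodes A, B, C and D, drawn with unweighted links for Google and with weighted links for Swoogle]

Google (PageRank): each outgoing link of a document is followed with equal probability

$$PR(A) = (1-d) + d\left(\frac{1}{4} + \frac{1}{2} + \frac{1}{3}\right)$$

Swoogle (rational random surfer): each outgoing link is followed with probability proportional to its weight

$$\mathrm{rawPR}(A) = (1-d) + d\left(\frac{0.4}{0.4+0.3+0.2+0.4} + \frac{0.6}{0.6+0.1} + \frac{0.5}{0.5+0.1+0.7}\right)$$

– For simplicity, both examples omit the rank of each linking document, which multiplies its term in the full formula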
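
The Swoogle example generalizes to an iterative computation. A minimal sketch, assuming the standard PageRank iteration with each link's probability replaced by its share of the linking node's total outgoing weight, rawPR(a) = (1-d) + d · Σ rawPR(x) · f(x,a)/f(x). The damping factor 0.85, the iteration count and the assignment of weights to B, C and D are assumptions, since the slide's figure did not survive.

```python
from typing import Dict, Set

# graph[x][y] = f(x, y): weight of the link from SWD x to SWD y
Graph = Dict[str, Dict[str, float]]


def raw_pagerank(graph: Graph, d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Iterate rawPR(a) = (1 - d) + d * sum over x linking to a of rawPR(x) * f(x, a) / f(x)."""
    nodes: Set[str] = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    # f(x): total weight of x's outgoing links
    out_weight = {x: sum(graph.get(x, {}).values()) for x in nodes}
    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 1.0 - d for n in nodes}
        for x, targets in graph.items():
            if out_weight[x] == 0:
                continue  # nodes without outgoing weight pass nothing on in this sketch
            for y, w in targets.items():
                new_rank[y] += d * rank[x] * w / out_weight[x]
        rank = new_rank
    return rank


# Hypothetical weight assignment reproducing the slide's three fractions for A.
example = {
    "B": {"A": 0.4, "X1": 0.3, "X2": 0.2, "X3": 0.4},
    "C": {"A": 0.6, "X4": 0.1},
    "D": {"A": 0.5, "X5": 0.1, "X6": 0.7},
}
one_step = raw_pagerank(example, d=0.85, iterations=1)
print(round(one_step["A"], 4))  # 1.467
```

With one iteration and all initial ranks equal to 1, rawPR(A) reduces to exactly the slide's expression; further iterations also propagate the linking documents' own ranks.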

Indexing and Retrieval of SWDs

Using traditional IR techniques

Reasoning over large collections of documents can be expensive

IR techniques are faster, at the cost of a somewhat coarser view of the text

They include well-researched methods for ranking matches and computing similarity between documents

Using N-grams

Can result in a larger vocabulary

Inter-word relationships are preserved

Somewhat resistant to certain kinds of errors
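
The IR approach can be illustrated with character n-grams and TF-IDF cosine similarity. This is a generic sketch, not Swoogle's indexer: the choice of character (rather than word) n-grams, n = 4 and the smoothed IDF formula are all assumptions. Because n-grams span the space between words, fragments such as "c we" keep some inter-word context, and a misspelled query still shares most of its n-grams with the intended text.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 4) -> List[str]:
    """Overlapping character n-grams; n = 4 is an assumed, tunable parameter."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs: List[str], n: int = 4) -> List[Dict[str, float]]:
    """TF-IDF weights over character n-grams, using a smoothed IDF (an assumption)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # number of documents containing each n-gram
    num_docs = len(docs)
    return [
        {g: tf * (math.log((1 + num_docs) / (1 + df[g])) + 1.0) for g, tf in c.items()}
        for c in counts
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


docs = ["semantic web ontology", "semantc web ontology", "digital camera"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True: the misspelling still matches
```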

Conclusions

Current web search engines

Do not work well with SWDs, as they are designed for natural language and expect documents to contain unstructured text composed of words

Swoogle

A prototype crawler-based indexing and retrieval system for Semantic Web documents

Evaluation and Discussion

Pros

Clear methodological contributions:

– How to discover potential SWDs

– How to rank SWDs

Cons

The ranking algorithm is poorly explained

– Why SWOs and SWDBs are treated differently

– How the ranking formulas, which differ by SWD type, are derived

Discussion

How can a Semantic Web retrieval system resolve conflicts between SWDs?

By ranking? By TF-IDF? Or by some other method?

Current Status (@ 2005)

Referenced from:

Li Ding et al., "Finding and Ranking Knowledge on the Semantic Web", Proceedings of the 4th International Semantic Web Conference, November 2005

Tim Finin et al., "Swoogle: Searching for Knowledge on the Semantic Web", AAAI-05 (Intelligent Systems Demo), July 2005

System architecture: Metadata creation -> Digest

– Computes metadata for SWDs and Semantic Web terms (SWTs), and identifies relations among them

Current Status (@ 2005)

Size

SWDs: 135K -> 368K

SWOs: 13.29% of SWDs -> 1% of SWDs

Ranking SWDs and SWTs
