Page 1: PowerPoint Presentation, ebiquity.umbc.edu/_file_directory_/resources/178.ppt

UMBC, an Honors University in Maryland

Search Engines for Semantic Web Knowledge

Tim Finin
University of Maryland, Baltimore County

Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

Page 2:

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

Page 3:

Once there were only a few large computers

Page 4:

Then there were many,

Page 5:

All connected 24x7,
Internet, cellular telephony, IrDA, 802.11, Bluetooth, Ultra Wide Band, RFID and more to come

Page 6:

Interoperating;
tcp/ip, ftp, smtp, rpc, corba, ssh, http, html, xml, gif, jpg, mpg, mp3, pdf …

Page 7:

Access to the world’s knowledge

del.icio.us

Page 8:

Google has made us smarter

Page 9:

But what about our agents?

(Diagram labels: tell, register)

Agents still have a very minimal understanding of text and images.

Page 10:

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

Page 11:

XML helps

“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.

Page 12:

“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee

Semantic Web adds semantics

Page 13:

Semantic Web 101

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:[email protected]"/>
  </uni:Student>
</rdf:RDF>

• RDF/XML
• rdf:RDF tag
• namespaces, ontologies
• Semantic graph, URIs as nodes & links
• triples

(Graph: a blank node with an rdf:type link to uni:Student and a foaf:name link to "Li Ding")
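To make the "triples" bullet concrete, here is a minimal Python sketch, using only the standard library's xml.etree.ElementTree, that turns the slide's RDF/XML into subject, predicate, object tuples. It is not a full RDF/XML parser: it assumes each child of rdf:RDF is a typed node element whose children are either literal properties or rdf:resource links, and the mailbox address is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's example; the mailbox is a hypothetical stand-in address.
DOC = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:[email protected]"/>
  </uni:Student>
</rdf:RDF>"""


def clark_to_uri(name):
    """Turn ElementTree's '{namespace}local' notation into a full URI."""
    namespace, local = name[1:].split("}", 1)
    return namespace + local


def rdfxml_to_triples(xml_bytes):
    """Sketch only: typed node elements with literal or rdf:resource properties."""
    root = ET.fromstring(xml_bytes)
    triples = []
    for i, node in enumerate(root):
        # Without rdf:about the node has no URI, so give it a blank-node label.
        subject = node.get(f"{{{RDF_NS}}}about", f"_:b{i}")
        # A typed node element such as <uni:Student> means "rdf:type uni:Student".
        triples.append((subject, RDF_NS + "type", clark_to_uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, clark_to_uri(prop.tag), obj))
    return triples


for triple in rdfxml_to_triples(DOC.encode("utf-8")):
    print(triple)
```

The three printed tuples are the rdf:type and foaf:name edges of the graph above, plus the foaf:mbox link.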

Page 14:

Where’s the semantics?
• URIs as “rigid designators”
• Conventions for URIs denoting things in the “real world”
• Namespaces and URIs provide an unambiguous shared vocabulary
• RDF, RDFS and OWL have semantics defined using model theory and also axioms
• Ontologies allow agents to draw inferences
  – uni:Student is a subclass of foaf:Person
  – Every uni:Student has at least one uni:school, which must be an instance of uni:School
  – A foaf:Person with a uni:school is necessarily a uni:Student
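The three example inferences can be sketched as hand-coded rules applied by naive forward chaining until nothing new is derived. This Python illustration is not how an OWL reasoner works; the individuals ex:alice, ex:bob and ex:umbc are hypothetical, and the "at least one uni:school" part of the second axiom is only used to type a school that is already named, since forward chaining of this kind does not invent new individuals.

```python
TYPE = "rdf:type"

# Hypothetical facts: alice is a person with a school, bob is a student.
DATA = {
    ("ex:alice", TYPE, "foaf:Person"),
    ("ex:alice", "uni:school", "ex:umbc"),
    ("ex:bob", TYPE, "uni:Student"),
}


def infer(triples):
    """Apply three hand-coded rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # Rule 1: uni:Student is a subclass of foaf:Person.
            if p == TYPE and o == "uni:Student":
                new.add((s, TYPE, "foaf:Person"))
            if p == "uni:school":
                # Rule 2 (partial): a student's uni:school must be a uni:School.
                if (s, TYPE, "uni:Student") in facts:
                    new.add((o, TYPE, "uni:School"))
                # Rule 3: a foaf:Person with a uni:school is a uni:Student.
                if (s, TYPE, "foaf:Person") in facts:
                    new.add((s, TYPE, "uni:Student"))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


for fact in sorted(infer(DATA) - DATA):
    print(fact)
```

Rule 3 makes ex:alice a uni:Student on the first pass, which lets rule 2 type ex:umbc as a uni:School on the second, so the order in which rules fire does not matter once a fixpoint is reached.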

Page 15:

Page 16:

Page 17:

Page 18:

RDFa
RDFa is a W3C proposal for embedding RDF in XHTML documents.

An HTML document with RDF embedded:

<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head><title>Jo Lambda's Home Page</title></head>
  <body>
    Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
    <h2>Work</h2>
    If you want to contact me at work, you can either
    <a rel="foaf:mbox" href="mailto:[email protected]">email me</a>,
    or call <span property="foaf:phone">+1 777 888 9999</span>.
  </body>
</html>

The triples, in N3 notation:

<> foaf:name "Jo Lambda"^^rdf:XMLLiteral ;
   foaf:mbox <mailto:[email protected]> ;
   foaf:phone "+1 777 888 9999"^^rdf:XMLLiteral .
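A toy extractor for the two RDFa attributes used in the example, @property (the element's text becomes a literal) and @rel with @href (the link becomes a resource), can be written with Python's html.parser. It is a sketch under strong assumptions: the page is well-formed (every element is closed), every triple's subject is the current document <>, CURIEs such as foaf:name are left unexpanded, datatypes are ignored, and the mailbox address is a hypothetical placeholder.

```python
from html.parser import HTMLParser

# The slide's page, with a hypothetical mailbox address.
PAGE = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head><title>Jo Lambda's Home Page</title></head>
<body>
Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
<h2>Work</h2>
If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:[email protected]">email me</a>,
or call <span property="foaf:phone">+1 777 888 9999</span>.
</body>
</html>"""


class MiniRDFaParser(HTMLParser):
    """Toy extractor for @property literals and @rel/@href links on '<>'."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = []  # stack of [property or None, collected text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "rel" in a and "href" in a:
            # @rel names the predicate, @href is the object resource.
            self.triples.append(("<>", a["rel"], a["href"]))
        self._open.append([a.get("property"), []])

    def handle_endtag(self, tag):
        if not self._open:
            return
        prop, parts = self._open.pop()
        text = "".join(parts)
        if prop:
            # @property: the element's text content is the literal object.
            self.triples.append(("<>", prop, text.strip()))
        if self._open:
            self._open[-1][1].append(text)  # nested text also belongs to ancestors

    def handle_data(self, data):
        if self._open:
            self._open[-1][1].append(data)


parser = MiniRDFaParser()
parser.feed(PAGE)
parser.close()
for triple in parser.triples:
    print(triple)
```

A real RDFa processor also expands CURIEs against the xmlns declarations and applies the datatype rules that produce the rdf:XMLLiteral values shown above.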

Page 19:

But what about our agents?

A Google for knowledge on the Semantic Web is needed by software agents and programs.

(Figure: repeated Swoogle logos; diagram labels: tell, register)

Page 20:

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

Page 21:

Page 22:

• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.4M RDF documents, 250M RDF triples, 10K ontologies
• Semantic Web archive: many dynamic RDF documents

Page 23:

Swoogle Architecture

(Figure. Layers: Discovery, Analysis, Index, Search Services. Components: Candidate URLs, Bounded Web Crawler, Google Crawler, SwoogleBot, document cache, SWD classifier, Ranking, SWD Indexer, IR Indexer, Semantic Web metadata, a Web Server delivering html to humans and a Web Service delivering rdf/xml to machines. Sources: the Web and the Semantic Web. Legend: information flow; Swoogle's web interface.)
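The information flow in the architecture (candidate URLs are fetched into a document cache, an SWD classifier filters them, an indexer records the terms they use, and search services answer queries) can be mimicked in a few lines of Python. Everything here is hypothetical: the in-memory WEB dictionary replaces real crawling, the classifier simply checks for the RDF namespace, and ranking by term frequency stands in for Swoogle's actual ranking.

```python
import re
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical in-memory "Web": URL -> document text (stands in for fetching).
WEB = {
    "http://example.org/people.rdf":
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="http://xmlns.com/foaf/0.1/">'
        "<foaf:Person/></rdf:RDF>",
    "http://example.org/index.html": "<html><body>Just a page about people</body></html>",
}


def is_swd(text):
    """Toy SWD classifier: a document counts if it uses the RDF namespace."""
    return RDF_NS in text


def build_index(candidate_urls):
    """Candidate URLs -> document cache -> SWD classifier -> term index."""
    cache = {url: WEB[url] for url in candidate_urls if url in WEB}
    index = defaultdict(lambda: defaultdict(int))  # term -> {url: count}
    for url, text in cache.items():
        if not is_swd(text):
            continue
        # Index prefixed names such as foaf:Person (a crude stand-in for URIrefs).
        for term in re.findall(r"[A-Za-z][\w.-]*:[A-Za-z]\w*", text):
            index[term][url] += 1
    return index


def search(index, term):
    """Return matching SWDs, most frequent use of the term first."""
    hits = index.get(term, {})
    return sorted(hits, key=lambda url: (-hits[url], url))


idx = build_index(list(WEB))
print(search(idx, "foaf:Person"))
```

Only the RDF document survives the classifier, so the HTML page never reaches the index.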

Page 24:

A Hybrid Harvesting Framework

(Figure. Components: manual submission; meta crawling, bounded HTML crawling and RDF crawling, with seed sets Seeds M, Seeds H and Seeds R respectively; the Swoogle sample dataset; an inductive learner; the Web, reached through Google API calls and crawls.)
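Bounded HTML crawling can be pictured as a breadth-first crawl that never leaves the seed's host and stops after a fixed number of hops. The Python sketch below makes those two bounds explicit; the link graph, the host restriction, the max_depth default and the ".rdf" suffix test used to pick candidate SWDs are all assumptions for illustration, not the framework's documented behavior.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for fetched HTML pages: URL -> outgoing links.
LINKS = {
    "http://a.example/": ["http://a.example/foaf.rdf", "http://a.example/x.html",
                          "http://b.example/"],
    "http://a.example/x.html": ["http://a.example/deep.html"],
    "http://a.example/deep.html": ["http://a.example/deeper.rdf"],
    "http://b.example/": ["http://b.example/data.rdf"],
}


def bounded_crawl(seed, max_depth=2):
    """Breadth-first crawl that stays on the seed's host and within max_depth hops."""
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([(seed, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth bound: do not expand further
        for link in LINKS.get(url, []):
            # Host bound: ignore links that leave the seed's site.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


candidates = [u for u in bounded_crawl("http://a.example/") if u.endswith(".rdf")]
print(candidates)
```

Here the crawl from http://a.example/ visits four pages, skips b.example because it is off-host, and stops before deeper.rdf, which lies three hops away.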

Page 25:

Performance – crawlers’ contribution
• High SWD ratio: 42% of URLs are confirmed as SWDs
• Consistent growth rate: 3000 SWDs per day
• RDF crawler: best harvesting method
• HTML crawler: best accuracy
• Meta crawler: best at detecting websites

(Chart: # of documents, 0 to 1,500,000, for the html crawler, meta crawler, rdf crawler and swoogle2, broken down into swd, nswd, failed and unpinged)

Page 26:

This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions

Page 27:

Applications and use cases
• Supporting Semantic Web developers
  – Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
  – Spire: aggregating observations and data from biologists
  – Inference Web: searching over and enhancing proofs
  – SemNews: Text Meaning of news stories
• Supporting SW tools
  – Triple Shop: finding data for SPARQL queries

Page 28:

Web-scale semantic web data access

(Sequence diagram between an agent, a data access service and the Web)
• Agent: ask (“person”) → service: search URIrefs in SW vocabulary → inform (“foaf:Person”)
• Agent: compose query, ask (“?x rdf:type foaf:Person”) → service: search URLs in SWD index → inform (doc URLs)
• Agent: fetch docs from the Web, populate RDF database, query local RDF database
• Service: index RDF data from the Web
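The agent's side of this exchange can be sketched in Python with the data access service replaced by lookup tables. The names VOCABULARY, SWD_INDEX, WEB and find_instances are hypothetical; the point is only the order of the steps: resolve a keyword to a vocabulary term, ask which documents use it, fetch those documents into a local store, and query locally.

```python
# Hypothetical stand-ins for the data access service's indexes.
VOCABULARY = {"person": ["foaf:Person"]}  # keyword -> matching URIrefs
SWD_INDEX = {  # URIref -> URLs of documents that use it
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}
WEB = {  # URL -> triples the document yields once fetched and parsed
    "http://example.org/a.rdf": [("ex:li", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [("ex:tim", "rdf:type", "foaf:Person"),
                                 ("ex:tim", "foaf:name", "Tim")],
}


def search_vocabulary(keyword):
    """Service: ask("person") -> inform("foaf:Person")."""
    return VOCABULARY.get(keyword, [])


def search_swd_index(uriref):
    """Service: ask("?x rdf:type foaf:Person") -> inform(doc URLs)."""
    return SWD_INDEX.get(uriref, [])


def find_instances(keyword):
    """Agent: resolve the term, fetch the documents, then query them locally."""
    results = set()
    for cls in search_vocabulary(keyword):
        local_db = set()  # local RDF database
        for url in search_swd_index(cls):
            local_db.update(WEB.get(url, []))  # fetch docs, populate the database
        # Query the local database for "?x rdf:type <cls>".
        results.update(s for s, p, o in local_db if p == "rdf:type" and o == cls)
    return sorted(results)


print(find_instances("person"))
```

Keeping the query step local means the service only has to answer cheap index lookups, while the agent does the (potentially expensive) query evaluation over the documents it chose to fetch.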

Page 29:

UMBC Triple Shop
• Online SPARQL RDF query processing based on HP’s Joseki, with two features
• Selectable level of inference
• Automatically finds SWDs for given queries using the Swoogle backend database
  – Provides a dataset creation wizard and server-side dataset storage
  – Tag and share saved datasets

SPARQL: a query language for getting information from RDF graphs (datasets)
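SPARQL queries are built from basic graph patterns: triple patterns with variables, whose matches are joined on shared variables. The sketch below evaluates such a pattern over an in-memory list of triples in Python; it is a hypothetical illustration of the idea, not Joseki's or any SPARQL engine's implementation, and it omits FILTER, OPTIONAL and everything else in the language. The patterns correspond to SELECT ?name WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?name }.

```python
# Hypothetical dataset.
GRAPH = [
    ("ex:li", "rdf:type", "foaf:Person"),
    ("ex:li", "foaf:name", "Li Ding"),
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:umbc", "rdf:type", "foaf:Organization"),
]

PATTERNS = [("?x", "rdf:type", "foaf:Person"), ("?x", "foaf:name", "?name")]


def match(pattern, triple, binding):
    """Extend binding so pattern matches triple, or return None. Variables start with '?'."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate_bgp(patterns, graph):
    """Join the triple patterns one at a time, as a nested-loop join."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


for row in evaluate_bgp(PATTERNS, GRAPH):
    print(row["?name"])
```

The shared variable ?x is what joins the two patterns, so ex:umbc is filtered out by the first pattern and never reaches the foaf:name lookup.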

Page 30:

UMBC Triple Shop
Querying the Semantic Web is as easy as shopping
(1) Go to http://sparql.cs.umbc.edu/
(2) You provide a SPARQL query and constraints on what sources to use
(3) Swoogle finds and suggests documents with relevant data, producing a dataset
(4) You specify the amount of reasoning to do, possibly resulting in an enhanced dataset
(5) We run the query and give you the results
(6) You can also download the dataset or save it on the server and give it tags
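Step (4) lets the user choose how much reasoning to apply before the query runs. As a hypothetical illustration in Python, the level names "none" and "rdfs" below are invented, and "rdfs" implements only two RDFS entailment rules, transitivity of rdfs:subClassOf and inheritance of types from superclasses, computed to a fixpoint over an invented dataset.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical dataset found for a query.
DATASET = [
    ("uni:Student", SUBCLASS, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("ex:li", TYPE, "uni:Student"),
]


def enhance(dataset, level="none"):
    """Return the dataset, optionally enlarged by subclass reasoning."""
    facts = set(dataset)
    if level == "none":
        return facts
    if level != "rdfs":
        raise ValueError(f"unknown reasoning level: {level}")
    changed = True
    while changed:
        derived = set()
        for s, p, o in facts:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in facts:
                if p2 == SUBCLASS and s2 == o:
                    # rdfs11 (p is subClassOf): subclass links are transitive.
                    # rdfs9 (p is type): instances also belong to superclasses.
                    derived.add((s, p, o2))
        changed = not derived <= facts
        facts |= derived
    return facts


print(len(enhance(DATASET, "none")), len(enhance(DATASET, "rdfs")))
```

The enhanced dataset is larger (six triples instead of three), which is the trade-off the slide describes: more answers, such as ex:li being a foaf:Agent, at the cost of extra computation and storage.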

Page 31:

Page 32:

Page 33:

Page 34:

This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions

Page 35:

Will it scale? How?
Here’s a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle’s crawling:

System/date   Terms      Documents   Individuals   Triples     Bytes
Swoogle2      1.5×10^5   3.5×10^5    7×10^6        5×10^7      7×10^9
Swoogle3      2×10^5     7×10^5      1.5×10^7      7.5×10^7    1×10^10
2006          1×10^6     5×10^7      5×10^7        5×10^9      5×10^11
2008          5×10^6     5×10^9      5×10^9        5×10^11     5×10^13

We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
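The table's rows also imply a few ratios worth checking when asking whether a centralized index can keep up. This Python snippet recomputes them from the estimates in the table: roughly 100 to 140 bytes per triple and 100 to about 143 triples per document across all four rows, with the 2008 estimate a hundred times larger than the 2006 one in bytes.

```python
# Figures copied from the table: (terms, documents, individuals, triples, bytes).
ESTIMATES = {
    "Swoogle2": (1.5e5, 3.5e5, 7e6, 5e7, 7e9),
    "Swoogle3": (2e5, 7e5, 1.5e7, 7.5e7, 1e10),
    "2006": (1e6, 5e7, 5e7, 5e9, 5e11),
    "2008": (5e6, 5e9, 5e9, 5e11, 5e13),
}


def ratios(row):
    """Derived per-document and per-triple figures for one table row."""
    _terms, docs, _individuals, triples, size = row
    return {
        "triples_per_doc": triples / docs,
        "bytes_per_triple": size / triples,
    }


for name, row in ESTIMATES.items():
    r = ratios(row)
    print(f"{name:9} {r['triples_per_doc']:7.1f} triples/doc "
          f"{r['bytes_per_triple']:7.1f} bytes/triple")

growth = ESTIMATES["2008"][4] / ESTIMATES["2006"][4]
print(f"2006 -> 2008 growth in bytes: x{growth:.0f}")
```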

Page 36:

How much reasoning?
• SwoogleN (N ≤ 3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C

Page 37:

Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than html search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
  – Swoogle is for Semantic Web 1.0
  – Semantic Web 2.0 will make different demands

Page 38:

For more information
http://ebiquity.umbc.edu/ (annotated in OWL)

