On Boosting Semantic Web Data Access. Li Ding, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County. Advisor: Tim Finin. Date: Jan 19, 2005

On Boosting Semantic Web Data Access

Li Ding

Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County

Advisor: Tim Finin
Date: Jan 19, 2005


Outline
Introduction
  Thesis statement
  Contributions to computer science
Research description
Research plan
Preliminary and planned work
  WOB-CORE: modeling the Semantic Web with its context
  Swoogle: digesting and searching the Semantic Web
  WOB: evaluating semantic web data quality
Summary
  Thesis schedule


1. Introduction

The Semantic Web in the Web
Motivation
Thesis statement


The Semantic Web in the Web

[Diagram: semantic web data access. On the Web side: databases behind wrapper services, (Web) documents, and static RDF documents (RDF/XML, N3, N-Triple, OWL/XML, ...), all yielding RDF graphs. On the agent world side: agents and web services, inference, translation, and applications. Access protocols shown: HTTP; HTTP, SOAP; FIPA, SOAP, ...]


The growing semantic web data
More data (from Swoogle Today, Jan 16, 2005)
  335,858 RDF documents (vs. Google: 8,058,044,651)
  156,504 ontological terms (classes or properties)
  46,987,876 triples
Well populated ontology (organization adoption)
  Blog, news feed (e.g. rss)
  Personal homepage and social networking (e.g. foaf, bio)
  Digital library (e.g. dc, dcTerms)
  Copyright: Creative Commons (cc)
  Software configuration (trustix)
  Dictionary (e.g. wordnet)
  Scientific data (e.g. CRISISCat, the California Invasive Species Information Catalog)
Potential semantic web data
  Bibliography
  CIA World Fact Book


Three challenges before utilizing semantic web data

[Diagram: a user asks the Semantic Web "Where does George live?" and meets three numbered challenges over web-scale semantic web vocabulary and data access.]
1. Ontology dictionary: "Which 'live'?" "I mean ex:livesIn"
2. Data access service: "Get it!"
   foo:George ex:livesIn ?x
3. Quality of RDF graph: "Which to believe?" (Joe: Rank? Trust?)
   source: foo:George ex:livesIn ex:TheWhiteHouse
   source: foo:George ex:livesIn ex:Texas
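The query pattern on this slide (foo:George ex:livesIn ?x) can be illustrated with a minimal sketch. It assumes statements are kept as plain tuples with a provenance label; the `Quad` type, the `match` helper, and the source names are hypothetical and not part of the talk. It shows why challenge 3 arises: the pattern matches both statements, so the caller still has to decide which source to believe.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical provenance label, not from the slides


# The two conflicting statements from the slide, each with a made-up source.
STATEMENTS = [
    Quad("foo:George", "ex:livesIn", "ex:TheWhiteHouse", "source-1"),
    Quad("foo:George", "ex:livesIn", "ex:Texas", "source-2"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return statements matching the pattern; None plays the role of ?x."""
    return [
        q for q in statements
        if (subject is None or q.subject == subject)
        and (predicate is None or q.predicate == predicate)
        and (obj is None or q.obj == obj)
    ]


answers = match(STATEMENTS, subject="foo:George", predicate="ex:livesIn")
for q in answers:
    print(f"?x = {q.obj} (asserted by {q.source})")
```

Both bindings come back, each tagged with its source, which is the input a ranking or trust step would need.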


Motivation
The utility of semantic web data access depends on three factors
  Availability: how much semantic web data is available on the Web
  Accessibility: how easily and effectively users can obtain the data they want
  Quality: how well semantic web data satisfies users' requirements
Applications
  Spire: sharing scientific information using the Semantic Web
  SemDis: discovering and evaluating semantic associations in the Semantic Web

Utility_SWDA = f(Availability, Accessibility, Quality)


Spire is a distributed, interdisciplinary research project exploring the use of semantic web technologies in support of science in general and the field of ecoinformatics in particular.

[Diagram: data sets and participants linked by publisher and creator relations (who@where): Ecological Networks, California Invasive Species Information Catalog, UMBC Tree Survey, SF Tree Survey, Darwin Core, NBII-CAIN, Pacific Ecoinformatics and Computational Ecology Lab, ebiquity@UMBC, MindSwap@UMCP. Question posed: how to search and use these data?]

Sharing semantic web data published by different sources throughout the Web
http://spire.umbc.edu/


[Diagram: one RDF graph merged from several sources (NASDAQ, CIA World Fact Book, Department of State, FOO News, CIA Agent W, Agent K). Nodes: Mr. X, Mr. Y, Company A, Organization B, Osama Bin Laden, Al-Qaeda, Terrorist Group, Kabul, Afghanistan, US. Edge labels: isPresidentOf, listedIn, invests, memberOf, ownedBy, locatedIn, basedIn, relatedTo.]

Discover complex semantic associations in the Semantic Web. Evaluate the trustworthiness of discovered associations.

Step 1: Collect semantic web data from multiple sources and merge it into one big RDF graph.
Step 2: Discover paths from Mr. X to Osama Bin Laden in the big RDF graph.
Step 3: Evaluate the trustworthiness of a discovered path with provenance and trust data.

http://semdis.umbc.edu/
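The three steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the SemDis implementation: the merged graph, the source names, and the trust values are made up, paths follow edge direction only, and scoring a path by the product of its sources' trust values is one simple choice that the slides do not prescribe.

```python
from collections import deque

# Step 1: a merged graph of (subject, predicate, object, source) tuples.
# Every value below is illustrative.
MERGED = [
    ("ex:MrX", "ex:isPresidentOf", "ex:CompanyA", "src:registry"),
    ("ex:CompanyA", "ex:invests", "ex:OrganizationB", "src:news"),
    ("ex:OrganizationB", "ex:locatedIn", "ex:Kabul", "src:factbook"),
    ("ex:MrX", "ex:locatedIn", "ex:US", "src:factbook"),
]
SOURCE_TRUST = {"src:registry": 0.9, "src:news": 0.5, "src:factbook": 0.8}


def find_paths(triples, start, goal, max_hops=4):
    """Step 2: breadth-first search for directed, cycle-free paths."""
    outgoing = {}
    for edge in triples:
        outgoing.setdefault(edge[0], []).append(edge)
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        visited = {start} | {edge[2] for edge in path}
        for edge in outgoing.get(node, []):
            if edge[2] not in visited:
                queue.append((edge[2], path + [edge]))
    return paths


def path_trust(path, source_trust):
    """Step 3 (assumed scoring): multiply the trust of each edge's source."""
    score = 1.0
    for _, _, _, src in path:
        score *= source_trust[src]
    return score


paths = find_paths(MERGED, "ex:MrX", "ex:Kabul")
for path in paths:
    hops = " -> ".join(f"{p} {o}" for _, p, o, _ in path)
    print(f"ex:MrX -> {hops}  (trust {path_trust(path, SOURCE_TRUST):.2f})")
```

Keeping the source on every edge is what makes step 3 possible after the merge in step 1.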


Research overview

[Diagram: Utility rests on accessibility and quality, built on three pieces of work.]
1. WOB-CORE: Modeling the Semantic Web and its context
   Concepts, Associations
2. Swoogle: Digesting and Searching the Semantic Web (accessibility)
   Semantic web vocabulary: Search URIrefs, Map URIrefs
   Semantic web data access service: Discover SW, Digest SW metadata, Search & navigation
   Search Ontologies, Search RDF documents, Semantic web "hyperlink"
3. WOB: Evaluating semantic web data quality (quality)
   Quality of RDF graph: Consistency, Importance, Trustworthiness
   Identify dimensions, Rank importance, Evaluate trustworthiness


Thesis Statement

Finding and evaluating information in the large-scale Semantic Web is critical to user adoption, but this need has not yet been met. We developed the Web of Belief (WOB) ontology, the Swoogle data access service, and data quality evaluation mechanisms to address these issues. These tools have proven effective in building semantic web metadata and in boosting web-scale semantic web data access in applications such as SemDis and Spire.


Contributions to computer science
WOB is the first ontology that captures and collects the metadata of the Semantic Web and its context
  RDF graph reference language
  Finer provenance model
Swoogle is one of the first data access services that digest and search the web-scale Semantic Web
  Adaptive semantic web discovery agent
  Semantic web metadata
    RDF graph abstract
    Ontology dictionary
    Recognized more relations among resources and documents
  Semantic web search and navigation model and service
One of the first works that investigate semantic web data quality
  Ranking the Semantic Web. We identified multiple navigation models for ranking.
  Evaluate RDF graph's trustworthiness.


2. Research Description

Modeling the Semantic Web with its context
Digesting and searching the Semantic Web
Evaluating semantic web data quality


The Semantic Web and its context

[Diagram: three layers (the Web, the RDF Graph World, the Agent World). Nodes: RDF Document, Document, RDF graph, RDF resource, Ontology, Agent, Person. Labeled relations: serializes, uses, defines, creates, believes, trusts. Legend: trust, provenance, subClassOf.]


Modeling the Semantic Web and its context
Goals
  Identify concepts and associations
  Build an ontology in OWL semantics, especially
    RDF graph reference language
    Finer provenance
  Populate this ontology by rule-based translation
Principles
  Build a simple, clear, and minimal ontology
  Reuse existing ontologies
  Show entity identity
  Be aware of inference tractability
Evaluation
  Analytical comparison with other existing ontologies
  Satisfy the requirements of applications (Swoogle, SemDis)


Related works
WOB-core ontology
  Meta-ontologies: RDF, OWL
  Popular ontologies: FOAF, DC
RDF graph reference
  Naïve approach: RDF test, OWL test
  RDF reification: RDF specification
  Named graphs (Carroll et al. 2004)
Provenance
  Digital library (e.g. Dublin Core)
  Database:
    data provenance (Buneman, Khanna, & Tan 2001)
    view maintenance (Cui, Widom, & Wiener 2000)
  AI:
    knowledge provenance (da Silva, McGuinness, & McCool 2003; Fox & Huang 2003)
    proof tracing, PML (da Silva, McGuinness, & Fikes 2004); TRELLIS (Gil & Ratnakar 2002)


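Web-scale semantic web data access model

[Sequence diagram with three lanes: agent, data access service, the Web.]
Data access service: Discover RDF Docs; Digest RDF Docs & Terms
Agent -> service: ask (term)
Service: Search Terms
Service -> agent: inform (term URIrefs)
Agent: Compose query
Agent -> service: ask (query)
Service: Search RDF Docs
Service -> agent: inform (doc URLs)
Agent: Fetch docs; Compose local RDF graph; Query local RDF graph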
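The message exchange in this model can be sketched with in-memory stand-ins. Everything named here is hypothetical (the `WEB` dictionary, the `DataAccessService` class, the `agent_lookup` function, and the document URLs other than http://foo.com/ex1.rdf), and the sketch is not Swoogle's API; it only walks through the ask/inform steps in order.

```python
# Hypothetical stand-in for RDF documents published on the Web.
WEB = {
    "http://foo.com/ex1.rdf": [("foo:George", "ex:livesIn", "ex:Texas")],
    "http://bar.org/ex2.rdf": [("foo:George", "ex:livesIn", "ex:TheWhiteHouse")],
    "http://bar.org/other.rdf": [("foo:Ann", "foaf:knows", "foo:George")],
}


class DataAccessService:
    def __init__(self, web):
        # Discover and digest: index which documents mention which term.
        self.docs_by_term = {}
        for url, triples in web.items():
            for triple in triples:
                for term in triple:
                    self.docs_by_term.setdefault(term, set()).add(url)

    def search_terms(self, keyword):
        """ask (term) -> inform (term URIrefs)"""
        return sorted(t for t in self.docs_by_term if keyword.lower() in t.lower())

    def search_docs(self, term):
        """ask (query) -> inform (doc URLs)"""
        return sorted(self.docs_by_term.get(term, ()))


def agent_lookup(service, web, keyword, subject):
    term = service.search_terms(keyword)[0]   # pick a term for the keyword
    local_graph = []
    for url in service.search_docs(term):     # fetch docs from the Web
        local_graph.extend(web[url])          # compose the local RDF graph
    # Query the local RDF graph.
    return sorted(o for s, p, o in local_graph if s == subject and p == term)


service = DataAccessService(WEB)
result = agent_lookup(service, WEB, "live", "foo:George")
print(result)
```

The service only returns term URIrefs and document URLs; fetching, merging, and querying stay on the agent side, as in the diagram.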


Digesting and searching the Semantic Web
Goals
  Web-scale semantic web data access model
  Data access service
    Adaptive RDF document discovery
    Digest semantic web metadata
    Semantic web search and navigation model and service
Principles
  Scalable design
  Real world application
Evaluation
  Statistical report on collected metadata, web service usage
  Precision and recall of search result
  Users' satisfaction on search and navigation model


Related works: SW vs. Web IR vs. DB

SW vs. Web IR: vocabulary, data model, query
SW vs. DB: implicit data, query scale, vocabulary


Related works (cont'd)
Swoogle
  Ontology based annotation systems
    Annotate web documents
      SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002)
    Annotate proper reference & relations
      CREAM (AIFB, 2003)
  Ontology repositories
    DAML ontology library
    Schema Web
    Semantic web central
  Semantic web ontology browsers
    W3C's Ontaria (2004)
  Semantic web instance databases
Semantic web search
  Discovery
    Meta-crawler
    focused crawler
    sw-crawler
  Digest
    DC
    W3C's Annotea
    OWL & RDFS
  Search & Navigation
    Web IR (TFIDF)
    RDF database query (e.g. RDQL, SPARQL)
    Term navigation (e.g. Ontaria, Hyperdaml)


Evaluating semantic web data quality
Goals
  Investigate dimensions of semantic web data quality
  Evaluate semantic web data quality
    Ranking RDF resources and RDF documents
    Evaluating RDF graph trustworthiness
  Trust and provenance based semantic web navigation model
Principles
  Semantic web data quality dimensions vary for different granularity and/or background knowledge
Evaluation
  Analytical analysis and proofs over navigation models and trust propagation models
  Simulation (Ding et al. 2004b) for quantifying convergence & effectiveness
  Application (Spire, SemDis) users' feedback


Related works
Data quality dimensions
  Information science (Wang, Storey, & Firth 1995) categorizes data quality dimensions by domain interests
    Integrity (Database)
    User-satisfaction (Psychology)
    Statistics (auditing methods)
    Ontological world-modeling (Wand & Wang 1996)
Imperfect information
  Taxonomy: (Smithson 1989) (Smets 1991) (Parsons 1996)
  Computational models (Parsons & Hunter 1998)
    probabilistic theory, possibility theory, evidence (Dempster-Shafer) theory


Related works (cont'd)
Ranking
  Complex network analysis (Newman 2003)
  Text document ranking
  Web page ranking:
    PageRank (Page et al. 1998; Haveliwala 1999), HITS (Kleinberg 1998)
  Semantic ranking:
    Ranking RDF resources: (H. Zhuge & Zheng 2003)
    Ranking RDF documents: Swoogle (contributed by Tim Finin, Rong Pan, 2004)
  Social network analysis
Trustworthiness
  Content analysis: RDF graph difference (Berners-Lee & Connolly 2004)
  Context analysis: semantic web trust layer
    Information security (Hyvonen 2002)
    Trust network (Golbeck, Parsia, & Hendler 2003; Richardson, Agrawal, & Domingos 2003; R. Guha et al. 2004; Ding et al. 2004)
    Semantic web publishing (Carroll & Bizer 2004)
    SWAD-Europe's trust ontology (Arenas et al. 2004)


3. Research Plan

Research objectives and status


Research objectives and status

Phase  Objectives       Artifacts to produce
1      WOB              WOB-core ontology (w provenance)
                        RDF graph reference language
                        Provenance
                        translated WOB-core instances
2      Swoogle          adaptive discovery agent
                        semantic web metadata *
                        search and navigation services *
                        Swoogle statistics *
3      SW data quality  WOB-quality extension
                        navigation and ranking model
                        trust inference algorithms
                        trust based navigation model
4      Finalize         Dissertation

Spiral research model
  1. Prototype
  2. Complete & revise prototype
  3. Evaluation & Justification

* This is joint work with others.


Preliminary and planned work: Web of Belief (WOB)

WOB-Core ontology
RDF graph reference
Provenance
Status and next step


WOB-core ontology

[Diagram: the WOB-core ontology across the Web, the RDF Graph World, and the Agent World. Classes: wob:RDFDocument, wob:RDFGraphRef, wob:Association, foaf:Agent, foaf:Person, foaf:Document, rdfs:Resource, owl:Ontology. Properties: wob:source, wob:sourceDocument, wob:creator, wob:connective, wob:isDefinedBy, dc:source. Schema links shown: rdfs:domain, rdfs:subClassOf, rdfs:subPropertyOf.]


RDF graph reference
Reference entire RDF graph
  Reference the RDF graph from a document
  Reference the RDF graph defined by usePattern
Reference partial RDF graph
  Accept a set of triples
  Reject a set of triples
Special cases
  Referencing class instance
  Wildcard: "John hasChild _:x"
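The accept, reject, and wildcard cases can be illustrated with a small sketch. It is not the WOB reference language itself: the `matches` and `resolve` helpers, the example graph, and the convention that a pattern term starting with `_:` or `?` matches anything are assumptions made for illustration.

```python
def matches(pattern, triple):
    """True if every pattern term is a wildcard or equals the triple's term."""
    return all(p.startswith(("_:", "?")) or p == t for p, t in zip(pattern, triple))


def resolve(graph, accept=None, reject=()):
    """Keep triples matching any accept pattern (all triples if accept is None),
    then drop triples matching any reject pattern."""
    selected = [
        t for t in graph
        if accept is None or any(matches(p, t) for p in accept)
    ]
    return [t for t in selected if not any(matches(p, t) for p in reject)]


# Illustrative graph; the names are made up.
GRAPH = [
    ("ex:John", "ex:hasChild", "ex:Mary"),
    ("ex:John", "ex:hasChild", "ex:Tom"),
    ("ex:John", "foaf:name", "John"),
]

# Wildcard reference: "John hasChild _:x"
children = resolve(GRAPH, accept=[("ex:John", "ex:hasChild", "_:x")])

# Whole graph minus a rejected set of triples
without_tom = resolve(GRAPH, reject=[("ex:John", "ex:hasChild", "ex:Tom")])

print(children)
print(without_tom)
```

An accept list selects a partial graph, while a reject list carves triples out of an otherwise complete reference.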


RDF graph reference: an example

[Diagram: a node of rdf:type wob:RDFGraphRef has wob:sourceDocument http://foo.com/ex1.rdf (rdf:type wob:RDFDocument) and wob:usePattern pointing to a node of rdf:type wob:SimpleTriple with wob:subject foo:George, wob:predicate ex:livesIn, and wob:object ex:Texas. The document http://foo.com/ex1.rdf contains foo:George with a foaf:mbox value (address not recoverable from the source) and ex:livesIn ex:Texas.]


Provenance in the Semantic Web

              Where      Whom        Why  Definition
RDF resource  dc:source  dc:creator       rdfs:isDefinedBy
RDF graph
RDF document  dc:source  dc:creator

We differentiate the rdfs:range of provenance relations
The scope of a provenance property
  Minimum semantic element: the semantics will not be complete when any triple is removed
  Complete: the entire sub-tree
  URI-complete: minimal sub-tree that ends without blank nodes
dc:creator semantics
  Class instance
  Class/property definition
  Document


Provenance of RDF graph

[Diagram: the RDF graph "A is sub class of B" (ex:A rdfs:subClassOf ex:B, with ex:A and ex:B each rdf:type owl:Class) and its provenance.]
whom: Bob (said so)
where: http://foo.com/example.owl
why: "A is sub class of C" and "C is sub class of B" imply it by the "Transitive rule"; "x is instance of both A and B" supports it (with its own whom link)


Provenance of RDF resource and RDF document

[Diagram: the graph "A is sub class of B" (ex:A rdfs:subClassOf ex:B, with ex:A and ex:B each rdf:type owl:Class), the document http://foo.com/example.owl, the site foo.com, and Bob (said so). Provenance links shown: Whom (dc:creator), where (dc:source), Definition (rdfs:isDefinedBy), Whom (dc:publisher), why.]


WOB-provenance

[Diagram: provenance properties among Agent, Website, RDF Document, RDF Graph, RDF Resource, and Proof.]
Links to an Agent: wob:creator, dc:creator (shown on three edges)
Links from an RDF Resource: wob:sourceDocument, wob:isDefinedBy, rdfs:isDefinedBy, dc:source
Other source links: wob:sourceDocument; wob:sourceDocument, dc:source
Also shown: rdfs:subClassOf; two edges marked TBD


Status and next step
We have
  Constructed the WOB conceptualization
  Proposed a preliminary RDF graph reference language
  Classified provenance in the Semantic Web
We will
  Refine and evaluate the WOB-core ontology
  Complete the RDF graph reference language
  Add why-provenance
  Populate WOB-core instances using rule-based translation
  Evaluate the WOB-core ontology


Preliminary and planned Work: Swoogle

Discovery
Digest
Search and navigation
Status and next step


The role of Swoogle in the Semantic Web

[Diagram: software agents and applications use semantic web services and semantic web data (SW data services over databases, (Web) documents, and RDF documents). A directory/digest service layer digests both: a service finder and a data finder (Swoogle). Agents and applications search that layer.]


Discovery - research
Crawlers
  Google-crawler
  Focused-crawler
  Semantic-Web-crawler, e.g. scutter
RDF document word indicator
  Keywords (positive list and negative list)
    filetype: 10 positive, over 100 negative
    url-pattern
    content-pattern
  Google cat-words (to refine Google query)
Revisiting URLs
  The would-be RDF document
  The out-of-date RDF document: changed, deleted
  The redirected RDF document


Discovery – current status
Crawler performance
  Google crawler is the best
  Focused crawler needs to be improved
1/3 of URLs are verified pure RDF documents
Embedded RDF graph

                 RDF docs        Non-RDF docs    Undecided  TOTAL
Focused crawler    1,465  (7%)    10,580 (52%)     8,292       20,337
Google crawler   273,023 (36%)   369,371 (49%)   110,794      753,188
SW crawler        61,870 (15%)   285,506 (70%)    57,709      405,085
TOTAL            336,358         665,457         176,795    1,178,610

Source: Swoogle (2005-Jan-05)
SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*) FROM `digest_url` WHERE 1 group by discovered_by
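The percentages in the table follow from the raw counts. A short check, with the counts copied from the slide and the helper name `yield_rates` made up for illustration, recomputes each crawler's total and its share of verified RDF documents:

```python
# (RDF docs, non-RDF docs, undecided) per crawler, copied from the table.
COUNTS = {
    "Focused crawler": (1465, 10580, 8292),
    "Google crawler": (273023, 369371, 110794),
    "SW crawler": (61870, 285506, 57709),
}


def yield_rates(counts):
    """Map each crawler to (total URLs, percent of URLs verified as RDF)."""
    rates = {}
    for name, (rdf, non_rdf, undecided) in counts.items():
        total = rdf + non_rdf + undecided
        rates[name] = (total, round(100 * rdf / total))
    return rates


rates = yield_rates(COUNTS)
all_rdf = sum(rdf for rdf, _, _ in COUNTS.values())
all_urls = sum(total for total, _ in rates.values())

for name, (total, percent) in rates.items():
    print(f"{name}: {total} URLs, {percent}% RDF")
print(f"Overall: {all_rdf} of {all_urls} URLs are RDF")
```

The recomputed totals and rounded percentages agree with the table rows.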


Digest -- research
RDF document annotation (joint work)
RDF graph abstract
Ontological term definition
Relations (joint work)
  Document-term relation
  Document-document relation
  Term-term relation


RDF document annotation (joint work)
Document
  filetype (suffix of URL)
  When/how discovered
  Last modified time
  Document hash
  Crawling info
RDF/OWL level
  RDF syntax
  SW language
  OWL species
  Provenance (creator, publisher)
Ontology
  Label
  Version
  Comment


RDF graph abstract
Possible models
  Bag-of-word: literal, local name of resource
  Bag-of-URI: URIrefs of non-blank RDF nodes
  Triple: swangled triple digest (Mayfield & Finin 2003)
  Ontological term: defined/referenced/populated class/property
  Namespace: used/defined namespace
  Identity: identity of class instance
Possible methods
  Document vector
  Bloom filter (Bloom 1970)
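One way to combine the bag-of-URI model with the Bloom filter method is sketched below. The class, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests are illustrative choices; the slides do not specify any of them. A Bloom filter never reports an inserted item as missing, but it may report an item that was never inserted as present.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def graph_abstract(triples, **kwargs):
    """Bag-of-URI abstract: add every non-blank node and predicate."""
    bloom = BloomFilter(**kwargs)
    for triple in triples:
        for node in triple:
            if not node.startswith("_:"):  # skip blank nodes
                bloom.add(node)
    return bloom


doc = graph_abstract([
    ("_:b1", "rdf:type", "foaf:Person"),
    ("_:b1", "foaf:knows", "ex:TimFinin"),
])
print("foaf:Person" in doc)
```

The fixed-size abstract lets a service test whether a document may use a given URIref without storing the document's full term list.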


Ontological term definition

[Diagram: foaf:Person as it appears across three files (file1, file2, file3). Triples shown include foaf:Person rdf:type owl:Class, rdfs:label "Person", and rdfs:subClassOf foaf:Agent; foaf:mbox and foaf:name with rdfs:domain foaf:Person; and an instance with rdf:type foaf:Person, foaf:name "Tim Finin", and dc:title "Tim's FOAF File".]

Term definition
  rdfs:subClassOf -- foaf:Agent
  rdfs:label -- "Person"
Empirical C-P bond
  foaf:name
  dc:title
Ontological C-P bond
  foaf:mbox
  foaf:name


Relations: doc-term; doc-doc; term-term

[Diagram: relations among rdfs:Resource, wob:RDFDocument, and owl:Ontology (rdfs:subClassOf).]
Term-term: swoogle:sameNamespace, swoogle:sameLocalname, C-P bond, P-C bond, any RDF triple
Document-term: swoogle:uses, swoogle:defines, swoogle:populatesClass, swoogle:populatesProperty, swoogle:refersClass, swoogle:refersProperty, swoogle:definesClass, swoogle:definesProperty
Term-document: swoogle:isUsedBy, wob:isDefinedBy, swoogle:officialOnto, swoogle:extensionOnto
Document-document: foaf:Document, rdfs:seeAlso, rdfs:isDefinedBy, owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith


Search & Navigation -- research
The Semantic Web is not simply the Web
Search service
  Document search: an RDF document is not free text
  Term search: a URIref contains a compound local name
Navigation service
  The RDF graph: typed links
  The web of RDF documents: few hyperlinks
  The social network of agents: trust & provenance


Semantic web search/navigation model

[Diagram: navigation among Resource (URIref), RDF Document (URL), and Ontology (rdfs:subClassOf), with seven numbered links. Link labels: uses, defines, isDefinedBy, isUsedBy, officialOnto, extensionOnto; OntologyProperty, rdfs:seeAlso, rdfs:isDefinedBy; sameNamespace, sameLocalname, any RDF property.]
Term Search
  Keywords + Filters
Document Search
  Keywords + Filters
  SPARQL
  RDF graph


Status and next step
We have
  Built an automatic semantic web discovery agent
  Digested part of semantic web metadata
    RDF document annotation
    Relations: res-res; res-doc; doc-doc
  Proposed semantic web search/navigation model with prototype implementation
We will
  Make the agent adaptive
  Explore efficient RDF graph abstract
  Provide a complete search/navigation service, esp.
    Swoogle search with SPARQL search support
    Ontology dictionary with user-friendly navigation interface
  Complete Swoogle web service
  Complete Swoogle statistics for quantitative evaluation


Preliminary and planned work: Semantic Web Data Quality

Dimensions of semantic web data quality
Ranking RDF resources and RDF documents
Evaluate RDF graph trustworthiness
Trust based navigation
Status and next step


Dimensions of semantic web data quality

[Table, flattened in extraction; cells are listed in source order.]
Contexts (columns): RDF graph (weighted directed graph); RDF graph + RDFS/OWL (RDF graph); SW metadata (SW + Web); SW metadata + trust (SW + Web + agents)
Term: importance (centrality, betweenness); rel-vagueness; importance; importance
RDF document: importance
RDF graph: graph structure; definition closeness, semantic consistency, rel-completeness; credibility; credibility
Agent: credibility

More to consider: term correlation (C-P bond, P-C bond)


Ranking RDF documents and RDF resources
PageRank-like navigation model
Background knowledge decides w(p): how credits are distributed along semantic paths from one node
Different contexts
  RDF graph as weighted directed graph
  RDF graph + RDFS/OWL
  RDF graph + RDFS/OWL + WOB (semantic web metadata)


Navigation model 1: RDF graph

[Diagram: RDF nodes connected by named edges.]

Let wg(e) be the frequency of named edges in the given RDF graph
Given a node p, each edge e from p is assigned weight wg(e), and w(p) is the normalized value
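The weighting can be sketched as follows, under one reading of the slide: wg(e) is taken to be the number of triples in the graph that share edge e's name (its predicate), and w(p) normalizes the weights of the edges leaving p so that they sum to 1. The function name and the example graph are hypothetical.

```python
from collections import Counter, defaultdict


def edge_weights(triples):
    """For each node p, return the normalized weights of its outgoing edges."""
    # wg(e): how often the edge's name (predicate) occurs in the graph.
    frequency = Counter(p for _, p, _ in triples)

    outgoing = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))

    weights = {}
    for node, edges in outgoing.items():
        total = sum(frequency[p] for p, _ in edges)
        weights[node] = {(p, o): frequency[p] / total for p, o in edges}
    return weights


# Illustrative graph: foaf:knows occurs three times, foaf:name once.
GRAPH = [
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:name", "A"),
    ("ex:b", "foaf:knows", "ex:c"),
    ("ex:b", "foaf:knows", "ex:a"),
]

w = edge_weights(GRAPH)
print(w["ex:a"])
```

A surfer at ex:a would then follow foaf:knows with probability 3/4 and foaf:name with probability 1/4, which is the kind of credit distribution a PageRank-like model needs.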


Navigation model 2: RDF graph + RDFS/OWL

[Diagram: nodes Individual, Class, Meta Class, Property, Literal / Resource, and InverseFunctionalProperty, linked by type edges; one edge is marked type*.]

Individual => Property is made by reading triples
type* is valid in OWL Full semantics
Literals and non-instance resources are ignored
  Except that owl:InverseFunctionalProperty is considered (OWL Full)


An example

[Diagram: an example graph. Nodes: a2, http://foo.com/ex.owl, wob:RDFDocument, foaf:Document, rdfs:Class, rdf:Property, wob:source, dc:title. Edge labels: wob:sourceDocument, rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:range, rdfs:label.]


Navigation model 3: RDF graph + RDFS/OWL + WOB

[Diagram: nodes Individual, Class, Meta Class, and Property linked by type edges, extended with RDF Document and Ontology.]

We assume Swoogle search/navigation services are used.
Rank RDF resources and RDF documents together


Evaluating trustworthiness
[Definition] A philosophical and context-dependent concept. Common interpretations are reliance, faith, and confidence.
Examples
  "Is the triple (foo:George ex:livesIn foo:WhiteHouse) credible?"
  "Does foo:George (an instance of foaf:Person) always tell the truth?"
Related terms
  Belief: trustworthiness of an RDF graph (by an individual agent)
  Trust: trustworthiness of an agent's beliefs (by an individual agent)
    [KR] An agent's belief (assertion)
    [ML] A hypothesis of the other agents' belief quality
    [SNA] A context-dependent inter-agent relation
  Reputation: social trustworthiness of an agent (by the public)


How a statement is justified as trustworthy

Statement: Foo is a good restaurant
  inductive: "I've been to Foo many times, and the food was always good!"
  deductive: I believe that "Restaurants with good outlook are good" and "Foo has good outlook"
  abductive: I believe that "Good restaurants have good outlook" and "Foo has good outlook"
  conclusive (mimic): "My friends (who have similar taste to mine) said so."
  prima facie (at first view): no better alternative


Trust propagation in justification
Deductive: trustworthiness propagates from the premise w.r.t. the inference rule P -> Q
  tv(Q) = tv(P) * tv(P -> Q)
Abductive: trustworthiness propagates from the consequence w.r.t. the trustworthiness of reversing the inference rule P -> Q
  tv(P) = tv(Q) * f(tv(P -> Q))
  Bayes
Inductive: trustworthiness is derived from past experiences
  Argumentation: logical coherence
  Knowledge similarity: statistical coherence
Conclusive: trustworthiness propagates from other agents through the social trust relation Trust(A, B)
  tv(S, A) = tv(trust(A, B)) * tv(S, B)
  Recommendation
Prima facie: blind trust
  tv(S) = constant (normal reputation)
  Largest takes all
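The propagation rules that the slide states as formulas can be written down directly. This is a sketch only: the function names are made up, trust values are assumed to lie in [0, 1], the inductive case is omitted because the slide gives no formula for it, and the default for f is an identity placeholder because the slide names Bayes without defining f.

```python
def deductive(tv_premise, tv_rule):
    """tv(Q) = tv(P) * tv(P -> Q)"""
    return tv_premise * tv_rule


def abductive(tv_consequence, tv_rule, f=lambda tv: tv):
    """tv(P) = tv(Q) * f(tv(P -> Q)); f is a placeholder, not defined in the slides."""
    return tv_consequence * f(tv_rule)


def conclusive(tv_trust_a_b, tv_statement_b):
    """tv(S, A) = tv(trust(A, B)) * tv(S, B)"""
    return tv_trust_a_b * tv_statement_b


def prima_facie(default=0.5):
    """tv(S) = constant; the default value here is an arbitrary assumption."""
    return default


# Example: A trusts B at 0.8, and B holds statement S at 0.9.
print(conclusive(0.8, 0.9))

# Example: premise trusted at 0.9, rule trusted at 0.5.
print(deductive(0.9, 0.5))
```

Because each rule multiplies values no greater than 1, trust can only stay the same or decrease along a chain of justifications.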


Evaluate RDF graph trustworthiness

[Figure: agents in a social network (Joe, Mike) connected by "trusts" links, with "believes" and "disbelieves" (conflicting belief) links to statements S1, S2 and S3 of the given RDF graph, alongside an RDF graph with ontology. Labels shown: foaf:Person rdf:type owl:Class; foaf:Person rdf:type rdfs:Class; foaf:knows; foaf:Person; rdfs:Class; owl:Class; rdfs:subClassOf. Steps are numbered 1 to 4.]

Remove the independence assumption by using more data


Trust and provenance aware navigation

Mechanism
  Only pursue highly trusted
  Shortest distance principle
  Derive trustworthiness using weighted consensus
  No delegation
Complexity control
  Search branch: trust filter
  Search depth: small world
  Initiator’s control

[Figure: graph of agents a to h around the initiator, grouped by distance = 0 (initiator), distance = 1 and distance = 2; edge labels: domain-refer, refer-refer]
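One way to read this mechanism is as a breadth-first search from the initiator that follows only highly trusted links (the branch filter), stops at a fixed depth, and then combines the opinions it found by weighted consensus. The sketch below is an illustration under assumptions, not the algorithm from the thesis: the function name, the 0.7 threshold, the depth limit of 2, and the use of the product of trust values along the discovery path as the consensus weight are all hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Tuple


def navigate(trust: Dict[str, Dict[str, float]],
             opinions: Dict[str, float],
             initiator: str,
             min_trust: float = 0.7,
             max_depth: int = 2) -> Tuple[Dict[str, int], Optional[float]]:
    """Explore trusted agents and derive a consensus trust value.

    trust[a][b] is how much agent a trusts agent b; opinions[x] is agent x's
    trust value for the statement under evaluation.
    """
    distance = {initiator: 0}      # shortest distance from the initiator
    path_trust = {initiator: 1.0}  # assumed weight: product of trust on the path
    queue = deque([initiator])
    while queue:
        agent = queue.popleft()
        if distance[agent] == max_depth:  # search depth control
            continue
        for neighbor, t in trust.get(agent, {}).items():
            # Branch control: skip weakly trusted links. Agents already
            # reached keep their first (shortest) distance.
            if t < min_trust or neighbor in distance:
                continue
            distance[neighbor] = distance[agent] + 1
            path_trust[neighbor] = path_trust[agent] * t
            queue.append(neighbor)

    # Weighted consensus over the opinions of the reached agents.
    weighted = [(path_trust[a], opinions[a])
                for a in distance if a != initiator and a in opinions]
    total = sum(w for w, _ in weighted)
    consensus = sum(w * o for w, o in weighted) / total if total else None
    return distance, consensus


if __name__ == "__main__":
    trust = {"a": {"b": 0.9, "c": 0.5}, "b": {"d": 0.8}, "d": {"e": 0.9}}
    opinions = {"b": 1.0, "c": 0.0, "d": 0.5, "e": 0.0}
    print(navigate(trust, opinions, "a"))
```

In the usage example, agent c is dropped by the trust filter and agent e lies beyond the depth limit, so only b and d contribute to the consensus.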


Status and next steps

We have
  Revealed some dimensions of semantic web data quality
  Proposed some ranking mechanisms based on different navigation models and background knowledge
  Proposed some trust evaluation mechanisms based on different background knowledge
  Proposed a trust-based navigation model
We will
  Consolidate semantic web data quality dimensions with a more formal description
  Evaluate, justify and improve the ranking and trust evaluation mechanisms


Summary

[R] Thesis statement
[R] Contributions to computer science
Research time table
Planned milestones


Thesis Statement

Finding and evaluating information in the large-scale Semantic Web is critical to user adoption, but this need has not yet been met. We developed the Web of Belief (WOB) ontology, the Swoogle data access service, and data quality evaluation mechanisms to address these issues. These tools are proven to be effective in building semantic web metadata and boosting web-scale semantic web data access in applications like SemDis and Spire.


Contributions to computer science

WOB is the first ontology that captures and collects the metadata of the Semantic Web and its context
  RDF graph reference language
  Finer provenance model
Swoogle is one of the first data access services that digest and search the web-scale Semantic Web
  Adaptive semantic web discovery agent
  Semantic web metadata
    RDF graph abstract
    Ontology dictionary
    Recognized more relations among resources and documents
  Semantic web search and navigation model and service
One of the first works that investigate semantic web data quality
  Ranking the Semantic Web. We identified multiple navigation models for ranking.
  Evaluating an RDF graph's trustworthiness.


A tentative research time table

Phase  Objectives   Artifacts to produce               Status (%)  Time (months)  Phase total (months)
1      WOB          WOB-core ontology                  60          0.5            3
                    RDF graph reference language       30          1
                    Provenance                         50          0.5
                    Translated WOB-core instances      0           1
2      Swoogle      Adaptive discovery agent           50          1              5
                    Semantic web metadata *            50          1
                    Search and navigation services *   30          2
                    Swoogle statistics *               30          1
3      SW Quality   WOB-quality extension              20          1              6
                    Navigation and ranking model       40          2
                    Trust inference algorithms         50          2
                    Trust based navigation model       80          1
4      Finalize     Dissertation                                   4              4
TOTAL                                                                             18

* This is joint work with others.


Planned milestones

WOB-core ontology
  It covered all required meta-concepts in Spire and SemDis.
Swoogle
  It indexed all semantic web data needed by Spire and SemDis. We are expecting millions of RDF documents to be indexed.
  It performed better than Google or other semantic web portals in searching ontologies and URIrefs throughout the Web. We are also looking forward to searching class-instance data.
Semantic web data quality
  RDF documents and RDF resources can be ranked reasonably using semantic web metadata in WOB. We expect users to be satisfied with Swoogle's search precision.
  RDF graph trustworthiness can be evaluated reasonably by using trust and provenance information in WOB.


Publications

Refereed publications

Li Ding et al., "On Homeland Security and the Semantic Web: A Provenance and Trust Aware Inference Framework", InProceedings, Proceedings of the AAAI Spring Symposium on AI Technologies for Homeland Security, March 2005.

Li Ding et al., "How the Semantic Web is Being Used: An Analysis of FOAF", InProceedings, Proceedings of the 38th International Conference on System Sciences, January 2005.

Li Ding et al., "Analyzing Social Networks on the Semantic Web", Article, IEEE Intelligent Systems, January 2005.

Li Ding et al., "Swoogle: A Search and Metadata Engine for the Semantic Web", InProceedings, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.

Li Ding et al., "Modeling and Evaluating Trust Network Inference", InProceedings, Seventh International Workshop on Trust in Agent Societies at AAMAS 2004, July 2004.

Li Ding et al., "Trust Based Knowledge Outsourcing for Semantic Web Agents", InProceedings, Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence, October 2003.

Youyong Zou et al., "Using Semantic web technology in Multi-Agent systems: a case study in the TAGA Trading agent environment", Article, Proceeding of the 5th International Conference on Electronic Commerce, September 2003.

Non-refereed publications

Li Ding et al., "Weaving the Web of Belief into the Semantic Web", Misc, submitted to WWW2004, May 2004.




Some ontologies and their QNames

QName    Name                                        URL
rdf      Resource Description Framework              http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     Resource Description Framework schema       http://www.w3.org/2000/01/rdf-schema#
owl      Web Ontology Language                       http://www.w3.org/2002/07/owl#
rss      RDF site summary                            http://purl.org/rss/1.0/
foaf     Friend Of A Friend                          http://xmlns.com/foaf/0.1/
dc       Dublin Core Elements                        http://purl.org/dc/elements/1.1/
bio      A vocabulary for biographical information   http://vocab.org/bio/0.1/
cc       Creative Commons                            http://web.resource.org/cc/
trustix  (used but not publicly defined)             http://www.trustix.net/schema/rdf/spi-0.0.1#
wordnet  Wordnet (Princeton U.)                      http://xmlns.com/wordnet/1.6/
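As a small illustration of how these QNames are used, the sketch below expands a prefixed name such as foaf:Person into a full URI by concatenating the namespace URL from the table with the local name. The dictionary and function names are assumptions made for this example.

```python
# Namespace prefixes copied from the table above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rss": "http://purl.org/rss/1.0/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bio": "http://vocab.org/bio/0.1/",
    "cc": "http://web.resource.org/cc/",
    "trustix": "http://www.trustix.net/schema/rdf/spi-0.0.1#",
    "wordnet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname: str) -> str:
    """Expand 'prefix:local' into a full URI using the PREFIXES table."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    print(expand_qname("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
    print(expand_qname("rdf:type"))
```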

