Creating and Sharing Structured Semantic Web Contents through the Social Web (Main Evaluation) Aman...

Post on 11-Jan-2016


Creating and Sharing Structured Semantic Web Contents through the Social Web

(Main Evaluation)

Aman Shakya
Advisor: Prof. Hideaki Takeda

Sub-advisors: Assoc. Prof. Nigel Collier

Assoc. Prof. Kenro Aihara

Outline
Introduction
◦ Social Semantic Web
◦ State of the art and problems
Proposed approach
◦ The StYLiD system
◦ Concept consolidation
◦ Concept grouping
Evaluation
Practical applications
Conclusions

7/27/2009 main evaluation 2

Introduction


Background
Information sharing
◦ Information publishing
◦ Understandable semantics
◦ Information dissemination

Shared information
◦ Better utilization, increased value

Shared information put together
◦ Valuable knowledge


Social Web and Web 2.0

◦ Easy to publish, understand and use
◦ Information sharing platform
◦ User-generated contents
◦ Connecting people
◦ Collaboration
◦ Mass participation – power of people
◦ Wisdom of the crowds


Current Limitations and Needs

Data processing and automation
◦ Unstructured data is usable only by humans

Interoperability
◦ Sharing data across different applications

Integration
◦ Combining data from different applications


The Semantic Web

Web of structured data
Machine-understandable semantics

Ontologies
◦ Represent conceptualizations of things
◦ Consensus and common formats

Enables
◦ Automated processing
◦ Interoperation and integration
◦ Effective search and browsing


Challenges
Difficult to publish on the Semantic Web

Wide variety of data to share
◦ Long tail of information domains (Huynh et al., 2007)

Not enough ontologies
Ontology creation is a difficult process

Goal: to enable people to easily share a wide variety of semantically structured data


?

Social Semantic Web

Social software + Semantic Web
Web 3.0


Social Semantic Web

Information connectivity

- Adapted from (Decker, 2005)

State of the Art: Social Semantic Web
Structured content creation on the Social Semantic Web

Direct Structured Contents Derived Structured Contents

Instance Data Creation

Ontology + Instance Data creation

Semantification of Social Data

Semantics from Text

Semantics of Tags

Semantic Blogging

Semantic Bookmarking

Semantic Desktop

Semantic Wikis

Collaborative Ontology Creation


Semantic Annotation

Data Exporters

Scrapers

Emergent Semantics


Collaborative Knowledge Base Creation

Collaborative Knowledge Base

Users Users


Knowledge base = ontology + instance data

Collaborative Knowledge Base Creation Systems

Semantic Wikis (SMW, IkeWiki, etc.)
◦ Ease of use: complex; extended wiki syntax, some training needed
◦ Expressiveness: moderate; mainly instances, concept schemas possible
◦ Constraints: strict type constraints
◦ Multiplicity: no
◦ Consensus: needed (wiki way)

Freebase (Metaweb Inc.)
◦ Ease of use: moderate; interactive but elaborate interface
◦ Expressiveness: moderate; concept schemas, instances
◦ Constraints: strict type constraints
◦ Multiplicity: allowed, but concepts not related
◦ Consensus: mostly needed (wiki way, by admin)

myOntology (Siorpaes & Hepp, 2007)
◦ Ease of use: complex; understanding of ontology needed
◦ Expressiveness: moderate; concepts, relations, instances
◦ Constraints: strict logical constraints
◦ Multiplicity: no
◦ Consensus: needed (wiki way)

Ontology Maturing (Braun et al., 2007)
◦ Ease of use: fairly easy; need to build taxonomy
◦ Expressiveness: low; concept hierarchy
◦ Constraints: free tagging
◦ Multiplicity: no
◦ Consensus: needed (by interaction)

Desired Solution
◦ Ease of use: easy
◦ Expressiveness: moderate
◦ Constraints: minimum
◦ Multiplicity: yes
◦ Consensus: optional


Problems
1. Complexity and learning curve
◦ Powerful collaborative systems are difficult for ordinary people

2. Difficult to create perfect concept definitions and ontologies
◦ Difficult to accommodate all requirements
◦ Strict constraints can make the model rigid

3. Existence of multiple conceptualizations
◦ Different perspectives or contexts

4. Difficulty of collaboration and consensus


Proposed Approach


Proposed Collaborative Knowledge Base Creation

Collaborative Knowledge Base

Users

Users

Local KB

Local KB

Local KB

Users


Overview of Proposed Approach

Social Platform for Structured Data Authoring

Concept Grouping

Concepts

Instances

Structured Data Collection

Browsing, Searching, Services

Concept Consolidation

Schema Alignment

Structured Linked Data

Grouped concepts

User Community

Emerging Lightweight Ontologies


StYLiD
Structure Your own Linked Data

http://www.stylid.org

Social software for sharing a wide variety of structured data

Users freely define their own concepts
Easy for ordinary people

Consolidate multiple concept schemas
Group and organize similar concepts

Popular, evolving concept definitions


Creating a new Concept
List of Attributes

Description

Suggested Value Range


Or Reuse / Modify existing Concept

“Hotel” Concept

Instance Data
Literal value

Pick value from Suggested range

External URI

Multiple Values


Resource URI

Shinjuku Prince Hotel

Concept Consolidation

Hotel 1: Name, Amenities, Capacity, Contact, Price, Access, Rating

Hotel 2: Name, Facilities, No. of rooms, Phone-number, Single room price, Double room price, Nearest station, Category, Address

Hotel 3: Name, Price, Rating, City, Country, Near-by attractions

Hotel 4: Name, Phone-number, Zip-code, Latitude, Longitude, No. of stories

Same
Synonymous / different labels
Different contexts / perspectives
Many-to-one
Complementary


Hotel (Consolidated Concept): Name, Facilities, Capacity, Contact, Single room price, Double room price, Access, Rating, Address, Zip-code, Latitude, Longitude, Near-by attractions, No. of stories

Concept Consolidation
A concept consolidation $\mathcal{C}$ is defined as a triple $\langle \hat{C}, S, A \rangle$, where
◦ $\hat{C}$ is the consolidated concept
◦ $S = \{C_1, C_2, \ldots, C_n\}$ is the set of constituent concepts
◦ $A$ is the attribute alignment between $\hat{C}$ and $S$

Based on the Global-as-View (GAV) approach to data integration (Lenzerini, 2002)
◦ The global schema is defined as views over the source schemas

Consolidated concept with consolidated attributes
◦ Aligned to the source concept attributes as views


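Concept Consolidation

Consolidated concept $\hat{C}$ with attributes $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_m$
Each constituent concept $C_i$ ($i = 1, \ldots, n$) with attributes $a_{i1}, a_{i2}, \ldots, a_{in_i}$
$M_i$: the alignments between $\hat{C}$ and $C_i$, e.g. $\mathrm{aligned}(\hat{a}_1, a_{i1})$, $\mathrm{aligned}(\hat{a}_2, a_{i2})$, …, $\mathrm{aligned}(\hat{a}_m, a_{in_i})$
$A = \{M_1, M_2, \ldots, M_n\}$
The image of a source attribute $a_{ij}$ is the consolidated attribute $\hat{a}_k$ it is aligned to; each $\hat{a}_k$ is a view over the source attributes aligned to it
Concept consolidation: $\langle \hat{C}, S, A \rangle$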
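The triple ⟨Ĉ, S, A⟩ and the alignment sets M_i can be mirrored in a small data structure. The Python sketch below is hypothetical: the class, its fields and the `image`/`view` method names are illustrative, not StYLiD's actual code. Each M_i is stored as a list of (consolidated attribute, source attribute) pairs, `image` returns the consolidated attribute a source attribute maps to, and `view` lists the source attributes behind one consolidated attribute. The Hotel alignment in the example is an assumed reading of the Hotel schemas above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptConsolidation:
    """Hypothetical model of a concept consolidation <C_hat, S, A>."""
    consolidated_attrs: list   # attributes a_hat_1 ... a_hat_m of C_hat
    concepts: list             # S = {C_1, ..., C_n}
    # A = {M_1, ..., M_n}; M_i is a list of (a_hat_k, a_ij) alignment pairs for C_i
    alignment: dict = field(default_factory=dict)

    def image(self, concept: str, source_attr: str) -> Optional[str]:
        """Consolidated attribute that a source attribute is aligned to, or None."""
        for hat_attr, src in self.alignment.get(concept, []):
            if src == source_attr:
                return hat_attr
        return None

    def view(self, hat_attr: str) -> list:
        """All (concept, source attribute) pairs aligned to one consolidated attribute."""
        return [
            (concept, src)
            for concept, pairs in self.alignment.items()
            for h, src in pairs
            if h == hat_attr
        ]


# Illustrative (assumed) alignment for two of the Hotel schemas.
hotel = ConceptConsolidation(
    consolidated_attrs=["Name", "Facilities", "Capacity", "Contact"],
    concepts=["Hotel 1", "Hotel 2"],
    alignment={
        "Hotel 1": [("Name", "Name"), ("Facilities", "Amenities"),
                    ("Capacity", "Capacity"), ("Contact", "Contact")],
        "Hotel 2": [("Name", "Name"), ("Facilities", "Facilities"),
                    ("Capacity", "No. of rooms"), ("Contact", "Phone-number")],
    },
)

print(hotel.image("Hotel 2", "No. of rooms"))  # Capacity
print(hotel.view("Facilities"))  # [('Hotel 1', 'Amenities'), ('Hotel 2', 'Facilities')]
```

Storing A as explicit pairs keeps the constituent schemas untouched: the consolidated attributes are defined only through the alignment, in the GAV spirit of the slide.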

Concept Consolidation
Consolidated view of instances

Translation of instances
◦ From one conceptualization to another
◦ $v(k, \mathrm{image}(a_j)) = v(k, a_j)$: the value of instance $k$ for attribute $a_j$ becomes its value for the aligned consolidated attribute

Translation of queries
Query unfolding (an advantage of GAV over LAV)
◦ Queries over the consolidated concept $\hat{C}$ (in terms of its attributes) are unfolded to queries over $\{C_1, C_2, \ldots, C_n\}$
◦ Using alignment $A$
◦ Union of results: $Q(\hat{C}) = Q_1(C_1) \cup Q_2(C_2) \cup \cdots \cup Q_n(C_n)$
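Instance translation can be sketched as re-keying each instance by the image of its attributes. The sketch below assumes a hypothetical alignment stored as one dictionary per concept; the concept names, attribute values and the rule that unaligned attributes keep their own name are assumptions for illustration. Collecting the translated instances of all constituent concepts gives the homogeneous table of the consolidated view.

```python
# Hypothetical per-concept alignment: source attribute -> consolidated attribute.
ALIGNMENT = {
    "Hotel 1": {"Name": "Name", "Amenities": "Facilities", "Contact": "Contact"},
    "Hotel 2": {"Name": "Name", "Facilities": "Facilities", "Phone-number": "Contact"},
}


def translate(concept, instance):
    """Re-key one instance by consolidated attributes: v(k, image(a_j)) = v(k, a_j).

    Attributes with no alignment keep their own name (an assumption), so no
    data is dropped. Many-to-one alignments would need a merge rule; here the
    last value written wins.
    """
    mapping = ALIGNMENT.get(concept, {})
    return {mapping.get(attr, attr): value for attr, value in instance.items()}


def consolidated_table(instances_by_concept):
    """Union of the translated instances of every constituent concept."""
    rows = []
    for concept, instances in instances_by_concept.items():
        rows.extend(translate(concept, inst) for inst in instances)
    return rows


rows = consolidated_table({
    "Hotel 1": [{"Name": "Hotel A", "Amenities": "pool", "Rating": "4"}],
    "Hotel 2": [{"Name": "Hotel B", "Phone-number": "000-0000"}],
})
for row in rows:
    print(row)
# {'Name': 'Hotel A', 'Facilities': 'pool', 'Rating': '4'}
# {'Name': 'Hotel B', 'Contact': '000-0000'}
```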

Concept Cloud

Sub-Cloud

Consolidated concept


Experiment on Conceptualization
Hypothesis
◦ Multiple conceptualizations of the same thing by different people can be consolidated

Methodology
◦ Participants (6) were given short text passages
◦ They listed facts structured as an (attribute, value) table

All concept schemas were aligned manually

Example concept schema:
attribute | value
name | Kiyomizu
location | Kyoto
… | …

Observations

(Charts: types of alignment relations found; attribute label similarity)

Remarks
People can express their conceptualizations in terms of schemas
Different people have different conceptualizations
◦ No one covers all possible attributes
Conceptualizations overlap significantly
Most parts can be aligned
Most have simple alignment relations

Multiple conceptualizations can be consolidated


Alignment of Concept Schemas

Attribute alignments suggested automatically
◦ Alignment API implementation (with WordNet extension) (Euzenat, 2004)

Community-supported alignment
◦ Human intelligence + machine intelligence

Alignments are represented and saved
◦ Alignment ontology (Hughes and Ashpole, 2004)
◦ Alignment API alignment specification language (Euzenat et al., 2004); other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS
◦ Incremental alignment (maintained collaboratively)

A unified view
◦ Consolidated concept with consolidated attributes
◦ Homogeneous table of data


Semi-automatic Schema Alignment

(Screenshot: two Hotel concepts and their consolidated attributes)

Consolidated Structured Search

Search on a consolidated concept

Find all hotels with location “Tokyo” and type “luxury”
◦ Hotel 1 location is aligned with Hotel 2 address
◦ Hotel 1 type is aligned with Hotel 2 category
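The Tokyo/luxury search can be unfolded along the lines of Q(Ĉ) = Q_1(C_1) ∪ … ∪ Q_n(C_n). The following is a minimal sketch, assuming equality-only conjunctive queries and a hypothetical alignment dictionary; the function names and sample instances are invented for illustration and are not StYLiD's search implementation.

```python
# Hypothetical alignment from the slide's example:
# consolidated attribute -> {constituent concept: its own attribute name}
ALIGNMENT = {
    "location": {"Hotel 1": "location", "Hotel 2": "address"},
    "type": {"Hotel 1": "type", "Hotel 2": "category"},
}


def unfold(query):
    """Rewrite an equality query over consolidated attributes into one sub-query
    per constituent concept. Concepts lacking an alignment for any query
    attribute are skipped, since they cannot answer it. Assumes a non-empty query."""
    concepts = set.intersection(*(set(ALIGNMENT[attr]) for attr in query))
    return {
        concept: {ALIGNMENT[attr][concept]: value for attr, value in query.items()}
        for concept in sorted(concepts)
    }


def search(query, data):
    """Evaluate each unfolded sub-query on its concept and return the union."""
    results = []
    for concept, sub_query in unfold(query).items():
        for inst in data.get(concept, []):
            if all(inst.get(attr) == value for attr, value in sub_query.items()):
                results.append((concept, inst))
    return results


data = {
    "Hotel 1": [{"name": "A", "location": "Tokyo", "type": "luxury"},
                {"name": "B", "location": "Osaka", "type": "luxury"}],
    "Hotel 2": [{"name": "C", "address": "Tokyo", "category": "luxury"},
                {"name": "D", "address": "Tokyo", "category": "budget"}],
}
query = {"location": "Tokyo", "type": "luxury"}
print(unfold(query))
print([inst["name"] for _, inst in search(query, data)])  # ['A', 'C']
```

Because each sub-query is phrased in the constituent concept's own vocabulary, the original data never needs to be rewritten; only the query is translated.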

Concept Grouping
Concept similarity

ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)

NameSim
◦ WordNet-based similarity: Lin's algorithm (1998)
◦ Levenshtein distance

SchemaSim
◦ Average similarity of the best-matching pairs of attributes

Calculate ConceptSim between all pairs of concepts
Group concepts whose similarity is above a threshold
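A minimal sketch of the similarity computation and grouping, under stated assumptions: NameSim is reduced to its Levenshtein component, normalized as 1 − distance / (length of the longer name) and computed case-insensitively (the WordNet/Lin component is omitted); SchemaSim is passed in precomputed; and "grouping" is taken to mean linking every pair whose ConceptSim exceeds the threshold and returning the connected components. The defaults w1 = 0.7, w2 = 0.3 and threshold 0.8 follow the settings reported in the grouping evaluation; all function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def name_sim(n1: str, n2: str) -> float:
    """Levenshtein distance normalized to [0, 1]; 1.0 means identical names."""
    n1, n2 = n1.lower(), n2.lower()
    longest = max(len(n1), len(n2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(n1, n2) / longest


def concept_sim(name1, name2, schema_sim, w1=0.7, w2=0.3):
    """ConceptSim = w1*NameSim + w2*SchemaSim, with SchemaSim supplied precomputed."""
    return w1 * name_sim(name1, name2) + w2 * schema_sim


def group_concepts(concepts, sim, threshold=0.8):
    """Link every pair with sim > threshold; return connected components (union-find)."""
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for c1, c2 in combinations(concepts, 2):
        if sim(c1, c2) > threshold:
            parent[find(c1)] = find(c2)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


print(round(name_sim("Recipe", "Recipes"), 3))  # 0.857
print(group_concepts(["Recipe", "Recipes", "Hotel"],
                     lambda a, b: concept_sim(a, b, 0.5), threshold=0.6))
```

Taking connected components means grouping is transitive (A~B and B~C puts A and C together); a stricter clustering rule would be an equally valid reading of the slide.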

Schema Similarity
Calculate NameSim for all pairs of attributes to create an n1 × n2 matrix
M = [NameSim(A1 × A2)]

Find the best-matching pairs in M using the Hungarian algorithm (Kuhn, 1955; Munkres, 1957)

Calculate the matching average
SchemaSim(S1, S2) = 2 × (total similarity of the best-matching pairs) / (|A1| + |A2|)

Adapted from semantic similarity between sentences (Simpson and Dao, 2005)

(Figure: attribute sets A1 and A2 of schemas S1 and S2)
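The SchemaSim step can be sketched as follows. This is an illustration under assumptions, not the original implementation: `difflib.SequenceMatcher.ratio` stands in for NameSim, and an exhaustive search over permutations replaces the Hungarian algorithm. Both maximize the total similarity of a one-to-one matching, but the exhaustive version is only practical for a handful of attributes. The exact-match similarity in the example is also an assumption, chosen so the result is easy to check by hand.

```python
from difflib import SequenceMatcher
from itertools import permutations


def attr_sim(a: str, b: str) -> float:
    """Stand-in for NameSim: difflib's ratio on lower-cased attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching(matrix):
    """Maximum-total one-to-one matching of rows to columns, as (row, col) pairs.

    Exhaustive search used as a simple stand-in for the Hungarian algorithm.
    """
    n1 = len(matrix)
    n2 = len(matrix[0]) if n1 else 0
    if n1 > n2:  # make rows the shorter side, then swap the pairs back
        transposed = [list(col) for col in zip(*matrix)]
        return [(i, j) for j, i in best_matching(transposed)]
    best_pairs, best_total = [], float("-inf")
    for cols in permutations(range(n2), n1):
        total = sum(matrix[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


def schema_sim(attrs1, attrs2, sim=attr_sim):
    """SchemaSim = 2 * (total similarity of best-matching pairs) / (|A1| + |A2|)."""
    if not attrs1 and not attrs2:
        return 0.0  # arbitrary convention for two empty schemas
    matrix = [[sim(a, b) for b in attrs2] for a in attrs1]
    total = sum(matrix[i][j] for i, j in best_matching(matrix))
    return 2 * total / (len(attrs1) + len(attrs2))


def exact(a, b):
    """Toy similarity: 1 for identical attribute names, else 0."""
    return 1.0 if a == b else 0.0


hotel1 = ["Name", "Amenities", "Capacity", "Contact", "Price", "Access", "Rating"]
hotel3 = ["Name", "Price", "Rating", "City", "Country", "Near-by attractions"]
print(best_matching([[0.9, 0.8], [0.85, 0.1]]))         # [(0, 1), (1, 0)]
print(round(schema_sim(hotel1, hotel3, sim=exact), 3))  # 0.462
```

With exact matching, Hotel 1 and Hotel 3 share Name, Price and Rating, so SchemaSim = 2 × 3 / (7 + 6) ≈ 0.462. The first example shows why an optimal matching is needed: greedily taking the 0.9 cell first would give a total of 1.0 instead of 1.65. For realistic schema sizes, `scipy.optimize.linear_sum_assignment` with `maximize=True` computes the same optimal matching in polynomial time.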

Visualization of Concept Grouping

Cytoscape


Experiments on Freebase Data

Purpose
◦ Evaluate automatic schema alignment
◦ Evaluate the proposed concept grouping method
◦ Make observations about user-defined concepts

Freebase: a community-driven database of the world's information

User-defined types (concept schemas)
◦ Queried on May 20, 2008

Cleaning
◦ Filter out test types, stop-words, and types without instances

Observations
After cleaning
◦ 1,412 concepts
◦ 500 users who defined concepts

People want to share a wide variety of data

People define their own concept schemas

Most people define only a few concepts (1-5)
◦ Long tail of information types


Freebase Concept Consolidation
Concepts with the same name, synonyms, or morphological variants
◦ 57 consolidated concepts formed

Multiple versions of a concept by different users
◦ Up to 6 versions of the same concept
◦ The same user also defines multiple versions

Alignments suggested automatically
◦ 51 alignment relations (44 aligned attribute sets)
◦ Evaluated by human judgement: precision 88.24%, recall 67.16%
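Precision and recall here follow the usual definitions over alignment relations. The slide gives only percentages; as a purely hypothetical check, 45 correct relations out of the 51 suggested would reproduce the reported 88.24% precision. The counts below are illustrative, not reported results.

```python
def precision(correct: int, suggested: int) -> float:
    """Share of automatically suggested alignments judged correct."""
    return correct / suggested


def recall(correct: int, reference: int) -> float:
    """Share of all correct (reference) alignments that were actually suggested."""
    return correct / reference


# Hypothetical count: 45 of the 51 suggested relations judged correct.
print(f"{precision(45, 51):.2%}")  # 88.24%
```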


Concept Consolidation Example
{Recipe (user1), Recipe (user2), Recipes (user3), …}, referred to as r1, r2, r3

Consolidated concept: Recipe
Consolidated attributes (aligned attribute sets)
◦ {r1#ingredient, r2#ingredients, r3#materials}
◦ {r1#steps, r2#instructions}
◦ r3#directions
◦ r2#tools_required
◦ r3#taste
◦ r3#author
◦ …

(adapted from Freebase)

Evaluation of Concept Grouping

ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)

Concept grouping with different thresholds (w1 = 0.7, w2 = 0.3)

Concept grouping with different weights (threshold = 0.8)


Emergence of Lightweight Ontologies
Concepts contributed by the community
Concept consolidation
Concept grouping
Popularity of concepts (as in tag clouds)

Common vocabulary for structured information sharing

Conceptual schemas (class/property)
Informal organization by similarity


Informal Lightweight Ontology

(Figure source: Schaffert et al. (2005), p. 7)

Evaluation


Evaluation of Usability
Hypothesis
◦ StYLiD is more usable than Freebase (for the given tasks)

Methodology
◦ Tasks performed with StYLiD and Freebase:
Task 1: Structured data authoring
Task 2: Concept schema creation
Tasks 3, 4: Modifying and reusing concepts
Task 5: Structured concept and instance authoring
Task 6: Searching
◦ Observations: questionnaires, screen logs, comments, etc.


Example (Task 1)

Input: Band – The Beatles

Participants
Total 15 participants
◦ Including 6 without an IT background
◦ Different backgrounds: public policy, international relations, psychology, telecommunication, networks, hotel staff, etc.
◦ From 10 countries
◦ Age: 22 to 43 (avg. 28.3)
◦ Most did not know the systems before


Results
System Usability Scale (SUS) (Digital Equipment Corp.)
◦ Average scores: StYLiD 69.7%, Freebase 39.3%
◦ Reported for Enhanced Semantic MediaWiki: 54.8% (Pfisterer et al., 2008)

Aggregated results from the tasks (score: 0-4)
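The SUS percentages above come from SUS scoring, sketched here for reference using the standard rule: each of the ten items is answered on a 1-5 scale, odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the 0-40 sum is multiplied by 2.5. The questionnaire data in the example are hypothetical; the slides report only the averages, and the function names are illustrative.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten responses on a 1-5 Likert scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses, each from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded (r - 1); even items negatively worded (5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # scales the 0-40 raw sum to 0-100


def mean_sus(questionnaires):
    """Average SUS score over participants, as reported per system."""
    scores = [sus_score(q) for q in questionnaires]
    return sum(scores) / len(scores)


# Hypothetical questionnaires for two participants.
print(mean_sus([[5, 1] * 5, [3] * 10]))  # 75.0
```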


Results for non-IT participants
6 participants
SUS scores
◦ StYLiD (71.67%), Freebase (50.42%)


Observations
StYLiD is quite usable without any training, knowledge or help
Most users preferred StYLiD to Freebase

Specifying the attribute value range is not easy
Strict data type constraints can cause problems
Many people modify and reuse concepts

People try to input all data in the minimum number of steps
Data entry can be made easier and quicker
◦ Auto-complete mechanisms would be helpful


Comparison with some systems

Feature | StYLiD | Freebase | Semantic MediaWiki
•Concept creation | UI supported | UI supported | Template markup
•Instance creation | Form-based | Form-based | Extended wiki syntax + forms
•Data authoring | Blogging / social bookmarking | Structured wiki | Wiki text annotation
•Data import | Wrappers | Bulk import facility | Not possible
•Constraints | Flexible | Strict type constraints | Strict type constraints
•Multiplicity | Allowed | Partly | No
•Consolidation | Schema-level | Some instances | No
•Organization | Concept grouping | Bases | Categories

Practical Applications


Application Scenarios
Social Site for Structured Information Sharing

(Figure components: users; concept schemas; structured data; external data resources; StYLiD as CMS; integration via schema alignment; information-sharing social semantic website)

Application Scenarios
Integrated Semantic Portal

(Figure components: information sources IS1, IS2, IS3 accessed through Wrapper1, Wrapper2, Wrapper3; external data resources; structured data; concept schemas; StYLiD as data backend; integration via schema alignment; users and admin; integrated semantic portal)

Adapting to different scenarios
Variable aspects
◦ Data and concepts acquisition
◦ Community and motivation
◦ Functionalities and constraints
◦ Data quality

Ways of adaptation
◦ Use of wrappers, etc.
◦ Delegate functionalities/constraints
◦ Extensible and customizable open source
◦ Customized queries and views


Real practical applications
Integration of research staff directories
◦ Osaka University and Nagoya University
◦ Data scraped from the websites

A musical community website at the Tokyo International Exchange Center

Social data bookmarking site StYLiD.org

A document management system at AIT


•10 alignments automatically suggested

•All correct

•Total 19 alignments


University Directory Integration

Integrated interface


TIEC Musical Community website


StYLiD.org Data Bookmarking


Document Management system

Structured Information Dissemination in Decentralized Communities

(Figure: several SocioBiblog systems, each handling publishing and aggregation, connected over the Web through extended RSS and social network links)


Conclusions


Conclusions
Social web application for sharing structured Semantic Web contents
◦ StYLiD
◦ Free contribution, no strict constraints
◦ Usable (even without training)

Concept consolidation
◦ Multiple conceptualizations exist
◦ They overlap significantly and can be consolidated
◦ Automatic alignments with good precision and recall
◦ A loose collaborative approach for creating shared concept definitions


Conclusions (contd.)
Concept grouping by similarity
◦ Informal organization
◦ Good precision can be obtained
◦ Parameters can be tuned for appropriate coverage and precision

Emergent lightweight informal ontologies
◦ Ontology as a by-product of information sharing and integration

Practical applications

Future Directions
Computing concept relations
◦ Hierarchical and non-hierarchical
Better schema alignment techniques
Consolidation of data instances
Using existing vocabularies
Mash-ups / plugins to utilize structured data
Scrapers to collect data from the web
…


Thank You!
Questions
Suggestions