Creating and Sharing Structured Semantic Web Contents through the Social Web
(Main Evaluation)
Aman Shakya
Advisor: Prof. Hideaki Takeda
Sub-advisors: Assoc. Prof. Nigel Collier, Assoc. Prof. Kenro Aihara
Outline
Introduction
◦ Social Semantic Web
◦ State-of-art and Problems
Proposed approach
◦ The StYLiD system
◦ Concept consolidation
◦ Concept grouping
Evaluation
Practical applications
Conclusions
7/27/2009 main evaluation 2
Introduction

Background
Information Sharing
◦ Information publishing
◦ Understandable semantics
◦ Information dissemination
Shared information
◦ Better utilization, increased value
Shared information put together
◦ Valuable knowledge

Social Web and Web 2.0
◦ Easy to publish, understand and use
◦ Information sharing platform
◦ User generated contents
◦ Connecting people
◦ Collaboration
◦ Mass participation – Power of People
◦ Wisdom of the crowds

Current Limitations and Needs
Data processing and automation
◦ Unstructured data only for humans
Interoperability
◦ Sharing data across different applications
Integration
◦ Combining data from different applications

The Semantic Web
Web of Structured Data
Machine understandable semantics
Ontologies
◦ Represent conceptualizations of things
◦ Consensus and common formats
Enables
◦ Automated processing
◦ Interoperation and Integration
◦ Effective search and browsing

Challenges
Difficult to publish on the Semantic Web
Wide variety of data to share
◦ Long Tail of information domains (Huynh et al., 2007)
Not enough ontologies
Ontology creation is a difficult process
Goal: to enable people to easily share a wide variety of semantically structured data

Social Semantic Web
Social software + Semantic Web
Web 3.0
[Figure: Social Semantic Web; axis label: Information connectivity. Adapted from (Decker, 2005)]

State-of-Art: Structured content creation on the Social Semantic Web
[Figure: taxonomy of approaches]
Top-level categories: Direct Structured Contents; Derived Structured Contents
Categories: Instance Data Creation; Ontology + Instance Data creation; Semantification of Social Data; Semantics from Text; Semantics of Tags
Approaches: Semantic Blogging; Semantic Bookmarking; Semantic Desktop; Semantic Wikis; Collaborative Ontology Creation; Semantic Annotation; Data Exporters; Scrapers; Emergent Semantics

Collaborative Knowledge Base Creation
[Figure: users contributing to a Collaborative Knowledge Base]
Knowledge base = ontology + instance data

Collaborative Knowledge Base Creation Systems
Criteria: Ease of use; Expressiveness; Constraints; Multiplicity; Consensus
Semantic Wikis (SMW, ikeWiki, etc.)
◦ Ease of use: Complex (extended wiki syntax, some training needed)
◦ Expressiveness: Moderate (mainly instances, concept schemas possible)
◦ Constraints: Strict type constraints
◦ Multiplicity: No
◦ Consensus: Needed (wiki way)
Freebase (Metaweb Inc.)
◦ Ease of use: Moderate (interactive but elaborate interface)
◦ Expressiveness: Moderate (concept schemas, instances)
◦ Constraints: Strict type constraints
◦ Multiplicity: Allowed, but concepts not related
◦ Consensus: Mostly needed (wiki way, by admin)
myOntology (Siorpaes & Hepp, 2007)
◦ Ease of use: Complex (understanding of ontology needed)
◦ Expressiveness: Moderate (concepts, relations, instances)
◦ Constraints: Strict logical constraints
◦ Multiplicity: No
◦ Consensus: Needed (wiki way)
Ontology Maturing (Braun et al., 2007)
◦ Ease of use: Fairly easy (need to build taxonomy)
◦ Expressiveness: Low (concept hierarchy)
◦ Constraints: Free tagging
◦ Multiplicity: No
◦ Consensus: Needed (by interaction)
Desired Solution
◦ Ease of use: Easy
◦ Expressiveness: Moderate
◦ Constraints: Minimum
◦ Multiplicity: Yes
◦ Consensus: Optional

Problems
1. Complexity and learning curve
◦ Powerful collaborative systems are difficult for ordinary people
2. Difficult to create perfect concept definitions and ontologies
◦ Difficult to accommodate all requirements
◦ Strict constraints can make the model rigid
3. Existence of multiple conceptualizations
◦ Different perspectives or contexts
4. Difficulty of collaboration and consensus

Proposed Approach

Proposed Collaborative Knowledge Base Creation
[Figure: groups of Users, each with a Local KB, connected to a Collaborative Knowledge Base]

Overview of Proposed Approach
[Figure: overview diagram. Labels: User Community; Social Platform for Structured Data Authoring; Structured Data Collection; Concepts; Instances; Structured Linked Data; Concept Consolidation; Schema Alignment; Concept Grouping; Grouped concepts; Emerging Lightweight Ontologies; Browsing, Searching, Services]

StYLiD
Structure Your own Linked Data
http://www.stylid.org
Social Software for Sharing a wide variety of Structured Data
Users freely define their own concepts
Easy for ordinary people
Consolidate multiple concept schemas
Group and organize similar concepts
Popular evolving concept definitions

Creating a new Concept
[Screenshot: concept creation form. Callouts: List of Attributes; Description; Suggested Value Range]

Or Reuse / Modify existing Concept
[Screenshot: the “Hotel” concept]
Instance Data
[Screenshot: instance data entry. Callouts: Literal value; Pick value from Suggested range; External URI; Multiple Values; Resource URI; Shinjuku Prince Hotel]

Concept Consolidation
Hotel 1: Name, Amenities, Capacity, Contact, Price, Access, Rating
Hotel 2: Name, Facilities, No. of rooms, Phone-number, Single room price, Double room price, Nearest station, Category, Address
Hotel 3: Name, Price, Rating, City, Country, Near-by attractions
Hotel 4: Name, Phone-number, Zip-code, Latitude, Longitude, No. of stories
Relations between attributes: same; synonymous / different labels; different contexts / perspectives; many-to-one; complementary

Hotel (Consolidated Concept): Name, Facilities, Capacity, Contact, Single room price, Double room price, Access, Rating, Address, Zip-code, Latitude, Longitude, Near-by attractions, No. of stories

Concept Consolidation
A concept consolidation is defined as a triple $\langle \bar{C}, S, A \rangle$ where
◦ $\bar{C}$ is the consolidated concept
◦ $S$ is the set of constituent concepts $\{C_1, C_2, \ldots, C_n\}$
◦ $A$ is the attribute alignment between $\bar{C}$ and $S$
Based on the Global-as-View (GAV) approach for data integration (Lenzerini, 2002)
◦ Global schema defined as views on source schemas
Consolidated concept $\bar{C}$ with consolidated attributes
◦ aligned to source concept attributes as views
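
As a rough illustration of the triple defined above, the sketch below stores a consolidated concept, its constituent concepts and an attribute alignment, and uses the alignment to rewrite one source instance in consolidated terms. The class, its field names and the two small hotel schemas are assumptions made for this example; they are not taken from the StYLiD implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Consolidation:
    """Hypothetical container for the triple (consolidated concept, S, A)."""
    name: str            # consolidated concept
    constituents: dict   # S: concept id -> list of attribute names
    # A: consolidated attribute -> {concept id: source attribute}
    alignment: dict = field(default_factory=dict)

    def align(self, consolidated_attr, concept_id, source_attr):
        """Record that a source attribute is aligned to a consolidated attribute."""
        if source_attr not in self.constituents[concept_id]:
            raise ValueError(f"{concept_id} has no attribute {source_attr!r}")
        self.alignment.setdefault(consolidated_attr, {})[concept_id] = source_attr

    def translate(self, concept_id, instance):
        """Rewrite one instance of a constituent concept in consolidated terms."""
        out = {}
        for cons_attr, sources in self.alignment.items():
            src = sources.get(concept_id)
            if src is not None and src in instance:
                out[cons_attr] = instance[src]
        return out


# Two invented constituent schemas, loosely based on the hotel slides.
hotel = Consolidation(
    name="Hotel",
    constituents={
        "hotel1": ["Name", "Amenities", "Contact"],
        "hotel2": ["Name", "Facilities", "Phone-number"],
    },
)
hotel.align("Name", "hotel1", "Name")
hotel.align("Name", "hotel2", "Name")
hotel.align("Facilities", "hotel1", "Amenities")
hotel.align("Facilities", "hotel2", "Facilities")
hotel.align("Contact", "hotel1", "Contact")
hotel.align("Contact", "hotel2", "Phone-number")

row = hotel.translate("hotel2", {"Name": "Example Inn", "Phone-number": "000-0000"})
```

Here `row` holds the instance keyed by the consolidated attributes `Name` and `Contact`, which is what lets instances from different schemas sit in one homogeneous table.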
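
Concept Consolidation
[Figure: attribute alignment. The consolidated concept $\bar{C}$ has attributes $\bar{a}_1, \bar{a}_2, \ldots, \bar{a}_m$; each constituent concept $C_i$ has attributes $a^i_1, a^i_2, \ldots, a^i_{n_i}$. A mapping $M_i$ between $C_i$ and $\bar{C}$ consists of relations aligned($\bar{a}_k$, $a^i_j$), so each consolidated attribute is a view over its aligned source attributes. The alignment is $A = \{M_1, M_2, \ldots, M_n\}$ in $\langle \bar{C}, S, A \rangle$.]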

Concept Consolidation
Consolidated view of instances
Translation of instances
◦ From one conceptualization to another
◦ A value $v$ of a source attribute is carried over to the consolidated attribute aligned with it
Translation of queries
◦ Query unfolding (advantage of GAV over LAV)
◦ Queries over $\bar{C}$ (in terms of its attributes) are translated to queries over $\{C_1, C_2, \ldots, C_n\}$
◦ Using alignment $A$
◦ Union of results: $Q(\bar{C}) = Q_1(C_1) \cup Q_2(C_2) \cup \ldots \cup Q_n(C_n)$
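
Query unfolding can be sketched in a few lines: a query phrased over the consolidated attributes is rewritten for each constituent concept through the alignment, run against that concept's instances, and the answers are unioned. The function name, the dictionary layout and the sample instances below are assumptions for illustration; the attribute pairs (location/address, type/category) follow the hotel search example in this deck.

```python
def unfold_query(conditions, alignment, sources):
    """Answer a query over the consolidated concept by querying each source.

    conditions: {consolidated attribute: required value}
    alignment:  {concept id: {consolidated attribute: source attribute}}
    sources:    {concept id: list of instance dicts}
    """
    results = []
    for concept_id, instances in sources.items():
        mapping = alignment[concept_id]
        # A source that lacks one of the queried attributes cannot answer it.
        if any(attr not in mapping for attr in conditions):
            continue
        # Rewrite the query in terms of this source's own attributes.
        local = {mapping[attr]: value for attr, value in conditions.items()}
        for inst in instances:
            if all(inst.get(a) == v for a, v in local.items()):
                # Translate the matching instance back to consolidated terms.
                results.append({c: inst[s] for c, s in mapping.items() if s in inst})
    return results  # union of the per-source answers


alignment = {
    "hotel1": {"name": "name", "location": "location", "type": "type"},
    "hotel2": {"name": "name", "location": "address", "type": "category"},
}
# Invented instances, for illustration only.
sources = {
    "hotel1": [{"name": "A", "location": "Tokyo", "type": "luxury"},
               {"name": "B", "location": "Kyoto", "type": "luxury"}],
    "hotel2": [{"name": "C", "address": "Tokyo", "category": "luxury"},
               {"name": "D", "address": "Tokyo", "category": "budget"}],
}
hits = unfold_query({"location": "Tokyo", "type": "luxury"}, alignment, sources)
names = sorted(h["name"] for h in hits)
```

Because the consolidated attributes are defined as views over the sources (GAV), the rewrite is a direct substitution of attribute names, with no reasoning step needed.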

Concept Cloud
[Screenshot: concept cloud. Callouts: Consolidated concept; Sub-Cloud]

Experiment on Conceptualization
Hypothesis
◦ Multiple conceptualizations by different people for the same thing can be consolidated
Methodology
◦ Participants given short text passages (6 participants)
◦ List down facts structured as an (Attribute, Value) table
All concept schemas aligned manually
Example concept schema:
    attribute    value
    name         Kiyomizu
    location     Kyoto
    …..          …..

Observations
[Figures: Types of Alignment Relations found; Attribute label similarity]

Remarks
People can express their conceptualizations in terms of schema
Different people have different conceptualizations
◦ No one covers all possible attributes
Conceptualizations overlap significantly
Most parts can be aligned
Most have simple alignment relations
Multiple conceptualizations can be consolidated

Alignment of Concept Schemas
Attribute alignments suggested automatically
◦ Alignment API implementation (with WordNet extension) (Euzenat, 2004)
Community-supported alignment
◦ Human intelligence + machine intelligence
Alignments are represented and saved
◦ Alignment ontology (Hughes and Ashpole, 2004)
◦ Alignment API alignment specification language (Euzenat et al., 2004)
◦ Other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS
◦ Incremental alignment (maintained collaboratively)
A Unified View
◦ Consolidated concept with consolidated attributes
◦ Homogeneous table of data
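
The idea of machine-suggested alignments that people then confirm can be sketched as follows. This is not the Alignment API or its WordNet extension: it is a minimal stand-in that compares attribute labels with Python's standard `difflib`, and the function names, threshold and sample labels are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def _normalize(label):
    """Lower-case a label and treat '_' and '-' as spaces."""
    return label.lower().replace("_", " ").replace("-", " ").strip()


def label_similarity(a, b):
    """String similarity in [0, 1] between two attribute labels."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def suggest_alignments(attrs1, attrs2, threshold=0.8):
    """Return (attr1, attr2, score) candidates for a person to confirm or reject."""
    candidates = []
    for a in attrs1:
        best = max(attrs2, key=lambda b: label_similarity(a, b))
        score = label_similarity(a, best)
        if score >= threshold:
            candidates.append((a, best, round(score, 2)))
    return candidates


suggestions = suggest_alignments(
    ["ingredient", "steps"],
    ["ingredients", "instructions", "tools_required"],
)
```

With these labels only the ingredient/ingredients pair is suggested. A pair such as steps/instructions shares meaning but not spelling, which is the kind of case where a lexical resource or a human contributor has to supply the alignment.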

Semi-automatic Schema Alignment
[Screenshot: alignment interface. Callouts: Two Hotel concepts; Consolidated attributes]

Consolidated Structured Search
[Screenshot: Search on Consolidated Concept]
Find all hotels with location “Tokyo” and type “luxury”
Attribute alignment used: Hotel 1 location ↔ Hotel 2 address; Hotel 1 type ↔ Hotel 2 category

Concept Grouping
Concept Similarity
ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)
NameSim
◦ WordNet-based similarity - Lin’s algorithm (1998)
◦ Levenshtein distance
SchemaSim
◦ Average similarity of best matching pairs of attributes
Calculate ConceptSim between all pairs of concepts
Group similar concepts above Threshold
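
A minimal sketch of the grouping step, under several simplifying assumptions: NameSim uses only a normalised Levenshtein distance (no WordNet), SchemaSim is replaced by a simple overlap of attribute labels, and concepts are grouped by linking every pair whose ConceptSim reaches the threshold. The weights 0.7/0.3 and the threshold 0.8 are the values that appear in the evaluation slides; the sample concepts are invented.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def name_sim(a, b):
    """Levenshtein distance normalised to a similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def schema_sim(attrs1, attrs2):
    """Simplified stand-in: Dice overlap of attribute labels."""
    s1, s2 = {x.lower() for x in attrs1}, {x.lower() for x in attrs2}
    return 2 * len(s1 & s2) / (len(s1) + len(s2)) if s1 or s2 else 0.0


def concept_sim(c1, c2, w1=0.7, w2=0.3):
    """Weighted sum of name and schema similarity for (name, attributes) pairs."""
    return w1 * name_sim(c1[0], c2[0]) + w2 * schema_sim(c1[1], c2[1])


def group_concepts(concepts, threshold=0.8):
    """Link concepts whose similarity reaches the threshold; return the groups."""
    parent = list(range(len(concepts)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(range(len(concepts)), 2):
        if concept_sim(concepts[i], concepts[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, (name, _) in enumerate(concepts):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


concepts = [
    ("Recipe", ["ingredients", "steps"]),
    ("Recipes", ["ingredients", "steps", "author"]),
    ("Hotel", ["name", "address"]),
]
groups = group_concepts(concepts)
```

Raising the threshold or shifting weight between `w1` and `w2` changes which concepts end up together, which is the trade-off the evaluation slides explore.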

Schema Similarity
Calculate NameSim for all pairs of attributes to create an n1*n2 matrix
M = [NameSim(a1, a2)] for a1 in A1, a2 in A2
Find best matching pairs using the Hungarian Algorithm on M (Kuhn, 1955; Munkres, 1957)
Calculate matching average
SchemaSim(S1, S2) = 2 x (sum of similarities of best matching pairs) / (|A1| + |A2|)
Adapted from semantic similarity between sentences (Simpson and Dao, 2005)
[Figure: attribute sets A1 and A2 of schemas S1 and S2]
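
The matching average can be sketched as below. Two substitutions are made to keep the example self-contained, and both are assumptions rather than the method used in the experiments: label similarity comes from `difflib` instead of a WordNet-based measure, and the optimal one-to-one matching is found by exhaustive search, which gives the same optimum as the Hungarian algorithm but is only practical for small schemas.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_sim(a, b):
    """Placeholder label similarity in [0, 1] (plain string matching)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching_total(matrix):
    """Maximum total similarity over one-to-one pairings of rows and columns.

    Exhaustive search over assignments; the Hungarian algorithm reaches the
    same optimum in polynomial time.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows > cols:  # transpose so that every row can be given its own column
        matrix = [list(col) for col in zip(*matrix)]
        rows, cols = cols, rows
    return max(
        sum(matrix[r][c] for r, c in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )


def schema_sim(attrs1, attrs2):
    """2 * (total similarity of best matching pairs) / (|A1| + |A2|)."""
    if not attrs1 or not attrs2:
        return 0.0
    matrix = [[name_sim(a, b) for b in attrs2] for a in attrs1]
    return 2 * best_matching_total(matrix) / (len(attrs1) + len(attrs2))


identical = schema_sim(["name", "price"], ["price", "name"])
partial = schema_sim(["name", "price", "rating"], ["name"])
```

Dividing by |A1| + |A2| penalises unmatched attributes: two schemas with the same attributes score 1.0, while a schema that covers only one of three attributes scores 0.5 even though its single attribute matches perfectly.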

Visualization of Concepts Grouping
[Figure: concept groups visualized with Cytoscape]

Experiments on Freebase Data
Purpose
◦ Evaluate automatic schema alignment
◦ Evaluate proposed concept grouping method
◦ Observations about user-defined concepts
Freebase: community-driven database of world’s information
User-defined Types – concept schemas
◦ Queried out (May 20, 2008)
Cleaning
◦ Filter out test types, stop-words, types without instances

Observations
After cleaning
◦ 1,412 concepts
◦ 500 users who defined concepts
People want to share a wide variety of data
People define their own concept schemas
Most people define only a few concepts (1-5)
◦ Long tail of information types

Freebase Concept Consolidation
Concepts with same name, synonyms, morphological variants
◦ 57 consolidated concepts formed
Multiple versions of concept by different users
◦ Up to 6 versions of the same concept
◦ Same user also defines multiple versions
Alignments suggested automatically
◦ 51 alignment relations (44 aligned attribute sets)
◦ Human judgement
◦ Precision 88.24%
◦ Recall 67.16%
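
For readers who want to check the arithmetic: the reported percentages are consistent with 45 of the 51 suggested relations being judged correct, against 67 relations in the human reference. Only the figure 51 appears on the slide; the counts 45 and 67 are inferred here from the percentages and should be read as an assumption.

```python
def precision_recall(correct, suggested, reference):
    """Precision = correct / suggested; recall = correct / reference."""
    return correct / suggested, correct / reference


# 51 suggested relations comes from the slide; 45 and 67 are inferred values.
precision, recall = precision_recall(correct=45, suggested=51, reference=67)
summary = f"precision {precision:.2%}, recall {recall:.2%}"
```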

Concept Consolidation Example
{Recipe (user1), Recipe (user2), Recipes (user3) ….}, labelled r1, r2, r3
Consolidated concept - Recipe
Consolidated attributes
◦ {r1#ingredient, r2#ingredients, r3#materials} (aligned attribute set)
◦ {r1#steps, r2#instructions} (aligned attribute set)
◦ r3#directions
◦ r2#tools_required
◦ r3#taste
◦ r3#author ……
(adapted from Freebase)

Evaluation of Concept Grouping
ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)
[Figure: Concept grouping with different thresholds (w1 = 0.7, w2 = 0.3)]
[Figure: Concept grouping with different weights (threshold = 0.8)]

Emergence of Lightweight Ontologies
Concepts contributed by community
Concept consolidation
Concept grouping
Popularity of concepts (as in Tag clouds)
Common vocabulary for structured information sharing
Conceptual schemas (class/property)
Informal organization by similarity

Informal Lightweight Ontology
[Figure, source: Schaffert et al. (2005) p. 7]

Evaluation

Evaluation of Usability
Hypothesis
◦ StYLiD is more usable than Freebase (for given tasks)
Methodology
◦ Tasks performed with StYLiD and Freebase
    - Task 1: Structured data authoring
    - Task 2: Concept schema creation
    - Task 3, 4: Modifying and reusing concepts
    - Task 5: Structured concepts and instances authoring
    - Task 6: Searching
◦ Observations
    - Questionnaires, screen logs, comments, etc.

Example (Task 1)
Input Band – The Beatles

Participants
Total 15 participants
◦ Including 6 without IT background
◦ Different backgrounds
    - Public policy, international relations, psychology, telecommunication, networks, hotel staff, etc.
◦ From 10 countries
◦ Age: 22 – 43 (avg. 28.3)
◦ Most did not know the systems before

Results
System Usability Scale (SUS) (Digital Equipment Corp.)
◦ Average scores: StYLiD – 69.7%, Freebase – 39.3%
◦ Enhanced Semantic MediaWiki – 54.8% (Pfisterer et al., 2008)
[Figure: Aggregated results from the Tasks (score: 0-4)]
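
For context on how such averages are obtained, the sketch below applies the standard SUS scoring rule: ten statements answered on a 1-5 scale, odd-numbered items contributing (answer - 1), even-numbered items contributing (5 - answer), and the sum multiplied by 2.5 to give a score out of 100. The participant answers are invented; the study's raw questionnaire data is not part of these slides.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten answers on a 1-5 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Invented answers from two hypothetical participants.
example = mean_sus([
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
])
```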

Results for non-IT participants
6 participants
SUS scores
◦ StYLiD (71.67%), Freebase (50.42%)

Observations
StYLiD is quite usable without any training, knowledge or help
Most users preferred StYLiD to Freebase
Specifying attribute value range is not easy
Strict data type constraints can cause problems
Many people modify and reuse concepts
People try to input all data in minimum steps
Data entry can be made easier and quicker
◦ Auto-complete mechanisms would be helpful
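
Comparison with some systems
Systems compared: StYLiD; Freebase; Semantic MediaWiki
•Concept creation
◦ StYLiD: UI supported
◦ Freebase: UI supported
◦ Semantic MediaWiki: Template markup
•Instance creation
◦ StYLiD: Form-based
◦ Freebase: Form-based
◦ Semantic MediaWiki: Extended wiki syntax + forms
•Data authoring
◦ StYLiD: Blogging / social bookmarking
◦ Freebase: Structured wiki
◦ Semantic MediaWiki: Wiki text annotation
•Data import
◦ StYLiD: Wrappers
◦ Freebase: Bulk import facility
◦ Semantic MediaWiki: Not possible
•Constraints
◦ StYLiD: Flexible
◦ Freebase: Strict type constraints
◦ Semantic MediaWiki: Strict type constraints
•Multiplicity
◦ StYLiD: Allowed
◦ Freebase: Partly
◦ Semantic MediaWiki: No
•Consolidation
◦ StYLiD: Schema-level
◦ Freebase: Some instances
◦ Semantic MediaWiki: No
•Organization
◦ StYLiD: Concept grouping
◦ Freebase: Bases
◦ Semantic MediaWiki: Categories
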
Practical Applications

Application Scenarios
Social Site for Structured Information Sharing
[Figure labels: Users; Information Sharing Social Semantic Website; StYLiD CMS; Concept Schemas; Structured data; Integration; Schema Alignment; External Data Resources]

Application Scenarios
Integrated Semantic portal
[Figure labels: Users; Admin; Integrated Semantic Portal; StYLiD Data Backend; Concept Schemas; Structured data; Integration; Schema Alignment; External Data Resources; Information Sources IS1, IS2, IS3 with Wrapper1, Wrapper2, Wrapper3]

Adapting to different scenarios
Variable aspects
◦ Data and concepts acquisition
◦ Community and motivation
◦ Functionalities and constraints
◦ Data quality
Ways of adaptation
◦ Use of wrappers, etc.
◦ Delegate functionalities/constraints
◦ Extensible and customizable open source
◦ Customized queries and views

Real practical applications
Integration of research staff directories
◦ Osaka University and Nagoya University
◦ Data scraped from the websites
A musical community website in Tokyo International Exchange Center
Social data bookmarking site StYLiD.org
A document management system in AIT

University Directory Integration
• 10 alignments automatically suggested
• All correct
• Total 19 alignments
[Screenshot: Integrated interface]

TIEC Musical Community website

StYLiD.org Data Bookmarking

Document Management system

Structured Information Dissemination in Decentralized Communities
[Figure: several SocioBiblog System nodes, each with Publishing and Aggregation, connected over the Web by Extended RSS and Social network links]

Conclusions
Social web application for sharing structured Semantic Web contents
◦ StYLiD
◦ Free contribution, no strict constraints
◦ Usable (even without training)
Concept consolidation
◦ Multiple conceptualizations exist
◦ Overlap significantly and can be consolidated
◦ Automatic alignments with good precision and recall
◦ A loose collaborative approach for creating shared concept definitions

Conclusions (contd.)
Concept grouping by similarity
◦ Informal organization
◦ Good precision can be obtained
◦ Parameters can be tuned for appropriate coverage and precision
Emergent lightweight informal ontologies
◦ Ontology as by-product of information sharing and integration
Practical applications

Future Directions
Computing concept relations
◦ Hierarchical and non-hierarchical
Better schema alignment techniques
Consolidation of data instances
Using existing vocabularies
Mash-ups / plugins to utilize structured data
Scrapers to collect data from the web
…

Thank You!
Questions
Suggestions