Semantic Wikis: Fusing Two Strands of the Semantic Web
Dr. Mark Greaves
Vulcan Inc.
[email protected]
© 2008 Vulcan Inc.

Talk Outline
The Argument for Semantic Wikis
– Two Strands of the Semantic Web
– Semantic Wikis: Bridging the Gap
– Lessons from the Design of SMW+
Semantic Wiki Experience with Vulcan’s Project Halo
– Question Answering in Science
– Wikis for Question Answering
Semantic MediaWiki+

Strand 1: The Semantic Strand of the Semantic Web
Semantic Web as RDBMS Integration Technology
– Semantic representation of schema relations
– Centralized workflows for ontology/data definition and management
– Powerful reasoning and inference
– Enterprise-oriented
Rooted in the original software/tools of the Semantic Web
– Initial triplestores and authoring systems were (mostly) stand-alone or within the confines of a controlled data set
– Early DARPA use cases were oriented around data integration
  • EII-style applications: BBN’s Foreign Clearance Guide for AMC
  • More XML-oriented than Web-oriented
The primary commercial use of the Semantic Web for many years
– Examples: Siderean Seamark, Oracle RDF
– Still the best-understood use cases for the Semantic Web
– Still extremely important commercially

Strand 2: The Web Strand of the Semantic Web
Semantic Web as a web-scale knowledge publishing technology
– Uncontrolled data dynamics, imperfect and voluminous data
– Anyone can publish with limited/no knowledge engineer involvement
– A massive base of socially-curated semantic data
– Balance between quantity and purity (issue with owl:sameAs links)
– Semantic data doesn’t have to be associated with HTML web text
Rooted in the original vision of the Semantic Web
– Took several years to start to be realized
– Difficulty conceiving of massive numbers of overlapping ontologies and class hierarchies, and uncoordinated data publishing
– Hard problem is maintaining a set of informal, evolving, and partial agreements on vocabularies and ontologies
An exciting and emerging data set
– Examples: Yahoo!, Sindice, Linking Open Data
– Fairly poorly understood use cases (especially commercially)
– Web-oriented and web-scale is extremely attractive

What do Strand 2 Semantic Web Applications Do?
Strand 1 semantic web applications have enterprise use cases
– EII, E-science, Enterprise content management...
– Success of use cases requires unified data models, familiar to DB thinking
Strand 2 semantic web applications address a brand new use case type
– “Semantic Web should allow people to have a better online experience” – Alex Iskold, CEO of AdaptiveBlue
– Enhance the human activities of content creation, publishing, linking my data to other data, forming community, purchasing satisfying things, browsing, etc.
– Strongly linked to Web 2.0 business models (such as they are)
  • Improve the effectiveness/targeting of advertising
  • Knowledge management tools for communities
Strand 2 use cases still require Strand 1-style data consistency and vocabulary agreement
Can Strand 2 Semantic Web Applications Overcome the Data Chaos of the Emerging Semantic Web?

Semantic Wikis are in both Strands
Wikis are tools for Publication and Consensus
MediaWiki (software for Wikipedia, Wikimedia, Wikibooks, etc.)
– Most successful Wiki software
  • High performance: 10K pages/sec served, scalability demonstrated
  • LAMP web server architecture, GPL license
– Publication: simple distributed authoring model
  • Wikipedia: >2.5M English articles, >250M edits, >2.5M images, #8 Alexa traffic rank in August
– Consensus achieved by global editing and rollback
  • Fixpoint hypothesis, although consensus is not static
  • Gardener/admin role for contentious cases
Semantic Wikis apply the wiki idea to structured (typically RDFS) information
– Authoring includes instances, data types, vocabularies, classes
– Natural language text used for explanations
– Automatic list generation from structured data, basic analytics, database imports
– See e.g., http://wiki.ontoprise.com for one powerful semantic wiki
Semantic Wiki Hypotheses:
(1) Significant interesting non-RDBMS Semantic Data can be collected cheaply
(2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes
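
To make "applying the wiki idea to structured information" concrete, here is a minimal sketch of how inline annotations in page text can be read as subject/property/value statements. It assumes the Semantic MediaWiki inline form [[property::value]]; the page text, the function name extract_triples and the regular expression are illustrative assumptions, not code from SMW+ itself.

```python
import re

# Assumed inline form: [[property::value]], optionally followed by |label.
# Plain links such as [[Cell]] or [[Category:Foo]] contain no "::" and are skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title, wikitext):
    """Return one (subject, property, value) triple per annotation in the text."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Hypothetical page text, invented for illustration.
sample = "Carbon has [[atomic number::6]] and the [[symbol::C]]."
triples = extract_triples("Carbon", sample)

for triple in triples:
    print(triple)
```

The natural-language sentence stays readable for human editors, while the extracted triples are what list generation and queries would run over.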

Example: Semantic MediaWiki with Halo Extensions (SMW+)
Knowledge Authoring Capabilities
– Syntax highlighting when editing a page
– Semantic toolbar in edit mode
  • Displays annotations present on the page that is edited
  • Allows changing annotation values without locating the annotation in the wiki text
– Autocompletion for all instances, properties, categories and templates
– Increased expressivity through n-ary relations (available with the SMW 1.0 release)

Example: Semantic MediaWiki with Halo Extensions (SMW+)
Semantic Navigation Capabilities
– GUI-based ontology browser, enables browsing of the wiki's taxonomy and lookup of instance and property information
– Linklist in edit mode, enables quick access of pages that are within the context of the page being currently edited
– Search input field with autocompletion, to prevent typing errors and give a fast overview of relevant content

Example: Semantic MediaWiki with Halo Extensions (SMW+)
Knowledge Retrieval Capabilities
– Combined text-based and semantic search
– Basic reasoning in queries with sub-/super-category/-property reasoning and resolution of redirects (equality reasoning)
– GUI-based query formulation interface
Web service integration and import/export support for popular formats
Rule system developed for OWL-DLP and most of OWL-R
Fully open source under GPL, supported by Ontoprise
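
The "basic reasoning in queries" item can be illustrated with a small sketch: a category query that also returns members of subcategories, after resolving redirects to a canonical page name. All data and function names below are hypothetical and chosen only to show the idea; this is not the SMW+ query engine.

```python
# Hypothetical wiki data, invented for illustration.
SUBCATEGORY_OF = {              # child category -> parent category
    "Eukaryotic cell": "Cell",
    "Prokaryotic cell": "Cell",
    "Plant cell": "Eukaryotic cell",
}
REDIRECTS = {"Eukaryote cell": "Eukaryotic cell"}   # alias -> canonical page
MEMBERS = {                     # category -> pages asserted directly in it
    "Cell": set(),
    "Eukaryotic cell": {"Yeast cell"},
    "Prokaryotic cell": {"E. coli cell"},
    "Plant cell": {"Guard cell"},
}


def resolve(name):
    """Follow redirects (equality reasoning) until a canonical name is reached."""
    seen = set()
    while name in REDIRECTS and name not in seen:
        seen.add(name)
        name = REDIRECTS[name]
    return name


def is_subcategory(child, ancestor):
    """True if child equals ancestor or lies below it in the category tree."""
    seen = set()
    while child is not None and child not in seen:
        if child == ancestor:
            return True
        seen.add(child)
        child = SUBCATEGORY_OF.get(child)
    return False


def instances_of(category):
    """All pages in the category or in any of its subcategories."""
    category = resolve(category)
    result = set()
    for cat, pages in MEMBERS.items():
        if is_subcategory(cat, category):
            result |= pages
    return result


print(sorted(instances_of("Cell")))
print(sorted(instances_of("Eukaryote cell")))   # alias resolved via redirect
```

A query for "Cell" returns pages that were only ever tagged with a more specific category, which is the practical benefit of subcategory reasoning for sparsely annotated wikis.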

Cool Idea... But Does it Work?
User tests were performed in Chemistry
– 20 graduate students were each paid for 20 hours (over 1 month) to collaborate on semantic annotation for chemistry
– ~700 Wikipedia base articles
– US high-school AP exams were provided as content guidance
Initial Results (SMW+ 1.0)
– Sparse: 1164 pages (entities), avg 5 assertions per entity
  • 226 relations (1123 relation-statements) and 281 attributes (4721 attribute-statements)
– Many bizarre attributes and relations
– Very difficult to use with a reasoner
User testing and quality results for SMW+ 1.1 extensions
– Initial SUS scoring (6 SMEs, AP science task) went from 43 to 61; final scores in the 70s
– 3 sessions using the Intrinsic Motivation Inventory (interest/value/usefulness); up 14%
– Aided by the consistency bot, users corrected 2072 errors (80% of those found) over 3 months
We have continued to build on this framework
Gardening Statistics for Test Wiki
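
As a quick consistency check on the figures quoted for the chemistry test wiki, the short calculation below recomputes the "avg 5 assertions per entity" figure and the number of errors the consistency bot must have found. The input numbers are taken from the slide; the variable names are only illustrative, and the derived error total assumes the 80% figure is exact rather than rounded.

```python
# Figures quoted for the SMW+ 1.0 chemistry test wiki.
pages = 1164
relation_statements = 1123
attribute_statements = 4721

total_statements = relation_statements + attribute_statements
avg_per_page = total_statements / pages

# Figures quoted for the SMW+ 1.1 gardening results.
corrected_errors = 2072
corrected_percent = 80          # assumed exact; the slide may have rounded it

# Integer arithmetic avoids floating-point noise in the derived total.
errors_found = corrected_errors * 100 // corrected_percent

print(f"total statements: {total_statements}")
print(f"average per page: {avg_per_page:.2f}")
print(f"errors found (derived): {errors_found}")
```

The statement counts do add up to roughly five assertions per page, which supports the slide's description of the initial data as sparse.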

Some Lessons Learned from SMW+ (and Freebase)
User Interface design matters
– This is core to MediaWiki’s success
– Formal usability testing with SMEs matters a lot
– Zero-training matters a lot
Gardening matters
– Users need support for debugging
– Gardeners can do large scale ontology editing
– Supports “Schema Last” data engineering
User-created ontologies are not always well-designed
– Flatter than normal
– Cheaper than normal
Natural language is necessary to augment bare RDF(S) semantics
– Supplemental semantics can be usefully carried in natural language

From Strand 2 Web to Strand 1 Semantics
Well-designed semantic wikis make possible certain Strand 2 applications
– They enable local consensus-building on socially-published data
– They allow Strand 2 knowledge publication to go beyond search
Strand 1 semantic data can certainly support Strand 2 applications
– Example: use of other triplestore data in SMW+
How can you use Strand 2-collected data to support Strand 1 applications?
– Corporate uses of socially-curated data (Metaweb)
– Project Halo: Scientific question-answering

Talk Outline
The Argument for Semantic Wikis
– Two Strands of the Semantic Web
– Semantic Wikis: Bridging the Gap
– Lessons from the Design of SMW+
Semantic Wiki Experience with Vulcan’s Project Halo
– Question Answering in Science
– Wikis for Question Answering

Envisioning the Digital Aristotle for Scientific Knowledge
Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
– The “Big AI” Vision of computers that work with people
The volume of scientific knowledge has outpaced our ability to manage it
– This volume is too great for researchers in a given domain to keep abreast of all the developments
– Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume
“Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
– Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O)
Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents

The Halo Project in One Slide
Project Halo: SME-based Authoring for scientific question-answering systems
Project Halo Goal: To determine whether tools can be built to facilitate robust knowledge formulation, query and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
– Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable domain appropriate justifications using reasonable computational resources?
– Will SMEs be capable of posing questions and complex problems to these systems?
– Do these systems address key failure, scalability and cost issues encountered in expert systems?
Experimental Scope: Selected portions of the AP syllabi for chemistry, biology and physics
– Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
(a) C4H10 + O2 → CO2 + H2O
(b) KClO3 → KCl + O2
(c) CH3CH2OH + O2 → CO2 + H2O
(d) P4 + O2 → P2O5
(e) N2O5 + H2O → HNO3
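
To show what "reasoning about the subject matter" means for this example question, the sketch below checks whether a proposed set of coefficients balances each reaction by counting atoms on both sides. The coefficients and the combustion/decomposition/combination labels were worked out for this illustration and do not appear on the slide; the parser is an assumption that handles only simple formulas without parentheses. It is not how Aura represents chemistry.

```python
import re
from collections import Counter


def atoms(formula):
    """Count atoms in a simple formula with no parentheses, e.g. 'CH3CH2OH'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_total(side):
    """Sum atom counts over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atoms(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(reactants, products):
    return side_total(reactants) == side_total(products)


# Proposed answers (worked out for this sketch): (reactants, products, reaction type).
ANSWERS = {
    "a": ([(2, "C4H10"), (13, "O2")], [(8, "CO2"), (10, "H2O")], "combustion"),
    "b": ([(2, "KClO3")], [(2, "KCl"), (3, "O2")], "decomposition"),
    "c": ([(1, "CH3CH2OH"), (3, "O2")], [(2, "CO2"), (3, "H2O")], "combustion"),
    "d": ([(1, "P4"), (5, "O2")], [(2, "P2O5")], "combination"),
    "e": ([(1, "N2O5"), (1, "H2O")], [(2, "HNO3")], "combination"),
}

for label, (reactants, products, kind) in ANSWERS.items():
    print(label, kind, "balanced" if is_balanced(reactants, products) else "NOT balanced")
```

A keyword index can retrieve pages that mention these compounds, but it cannot perform even this simple conservation check, which is the gap the question-answering systems aim to close.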

AURA – Automated User-centered Reasoning and Acquisition System
Aura is a tool to help users formalize AP-level scientific knowledge
Aura can then reason with that knowledge
So users can ask questions and understand the answers

2006 Experimental Results for the Aura System

Pilot Group: professional KE KBs, no natural language, ~$10K per syllabus page

Halo Pilot System  Percent correct
Cycorp             37%
SRI                44%
Ontoprise          47%

VS.

SME Group: science grad student KBs, extensive natural language, ~$100 per syllabus page

Domain  Number of questions  Percentage correct
                             SME1  SME2  Avg    KE
Bio     146                  52%   24%   38%    51%
Chem    86                   42%   33%   37.5%  40%
Phy     131                  16%   22%   19%    21%

Knowledge Formulation
Time for KF
– Concept: ~20 mins for all SMEs
– Equation: ~70 sec (Chem) to ~120 sec (Physics)
– Table: ~10 mins (Chem)
– Reaction: ~3.5 mins (Chem)
– Constraint: 14 sec (Bio); 88 sec (Chem)
SME need for help
– 68 requests over 480 person hours (33%/55%/12%) = 1/day

Question Formulation
Avg time for SME to formulate a question
– 2.5 min (Bio)
– 4 min (Chem)
– 6 min (Physics)
– Avg 6 reformulation attempts
Usability
– SMEs requested no significant help
– Pipelined errors dominated failure analysis

System Responsiveness
Biology: 90% answer < 10 sec
Chem: 60% answer < 10 sec
Physics: 45% answer < 10 sec

Domain  Interpretation (Median/Max)  Answer (Median/Max)
Bio     3s / 601s                    1s / 569s
Chem    7s / 493s                    7s / 485s
Phy     34s / 429s                   14s / 252s
How Can We Increase the Efficiency of SME Authoring?

Symbiosis Between Aura and SMW+
Classical Knowledge Engineering
– Expressive knowledge representation
– Sophisticated testing and debugging
Knowledge Engineering in Aura
– Acquires knowledge for deductive Q/A that can be used for answering AP questions in sciences
  • Uses a DL style class taxonomy, and logic programming style rules with many extensions
– Requires 40 hours of training for knowledge formulation
Semantic Web Knowledge Engineering
– Simple knowledge representation
– Quantity at some expense of quality
Knowledge Engineering in SMW+
– Tool for online authoring and consensus-building around semantic web content
– Captures knowledge at the level of RDFS
– Collective editing for quality control
– Gardening appropriate for scientific knowledge
– Almost walk up and use system
Can we use the Semantic Media Wiki to capture knowledge that could be used for Q/A in AURA?
– Factual knowledge (e.g., atomic number for carbon is 6, solubility constraints, etc.)– Taxonomic knowledge (e.g., eukaryotic and prokaryotic are two types of cells)
Knowledge creation would be faster, distributed, and cheaper

Example: Wikipedia Article on Organelle

Source Text of Article on Organelle in SMW+

Fact Box Summarizing the Annotations in SMW+

Ontology Browser for Test Biology Data in SMW+

Aura/SMW+ Use Case
Semantic Wiki includes relevant knowledge
Aura knowledge formulation engineer (KFE) searches for knowledge during knowledge formulation
The KFE notices useful information in SMW+
The KFE maps the knowledge into Aura
– Currently uses a derivative of Ontomap
– Experimenting with FOAM support
– ETL workflow
The knowledge is translated into Aura and available for querying
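
The mapping step in this use case can be pictured as a small extract-transform-load pass: wiki statements are renamed into the target knowledge base's vocabulary, and anything without a mapping is set aside for the KFE to review. The sketch below is only an assumption about the general shape of such a workflow; the property names, the target slot names and the function map_triples are hypothetical, and it does not reproduce Ontomap, FOAM or Aura's actual formats.

```python
# Hypothetical mapping from wiki property names to target KB slot names.
PROPERTY_MAP = {
    "atomic number": "atomic-number",
    "subcategory of": "superclass",
}

# Hypothetical statements extracted from the wiki.
WIKI_TRIPLES = [
    ("Carbon", "atomic number", "6"),
    ("Eukaryotic cell", "subcategory of", "Cell"),
    ("Carbon", "favourite colour", "black"),    # no mapping: needs human review
]


def map_triples(wiki_triples, property_map):
    """Split wiki statements into mapped KB statements and unmapped leftovers."""
    mapped, unmapped = [], []
    for subject, prop, value in wiki_triples:
        if prop in property_map:
            mapped.append((subject, property_map[prop], value))
        else:
            unmapped.append((subject, prop, value))
    return mapped, unmapped


mapped, unmapped = map_triples(WIKI_TRIPLES, PROPERTY_MAP)
print("load into KB:", mapped)
print("review by KFE:", unmapped)
```

Keeping the unmapped statements visible instead of dropping them reflects the earlier observation that user-created vocabularies contain odd attributes that still need a gardener's judgment.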

AURA User Searches for Information

Aura User Notices Useful Information in Wiki

Aura User Maps Wiki Knowledge into Aura KB

Wiki Knowledge Available in Aura for Question-Answering

Conclusions
Two strands of semantic web applications
– Strand 1: Structured, enterprise-quality semantic data
  • Designed for powerful analytics and easier data fusion
– Strand 2: Lightweight web-scale semantic publishing
  • A revolution in AI if we can keep the quality up
Semantic Wikis have features from both strands
– Easy to see how semantic wikis can leverage Strand 1 data for Strand 2 support
– Harder to see how semantic wikis can leverage Strand 2 data for Strand 1 support
Vulcan’s Project Halo
– Use of SMW+ to use web-collected data in a question-answering application
– Addresses very hard AI problems in scaling up knowledge authoring
– Full evaluation of SMW+ and Aura in early 2009
  • Is mapping easier than authoring?
Thank You
Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.