
Semantic Wikis: Fusing Two Strands of the Semantic Web

Dr. Mark Greaves
Vulcan Inc.

[email protected]
© 2008 Vulcan Inc.

Talk Outline

The Argument for Semantic Wikis
  – Two Strands of the Semantic Web
  – Semantic Wikis: Bridging the Gap
  – Lessons from the Design of SMW+

Semantic Wiki Experience with Vulcan’s Project Halo
  – Question Answering in Science
  – Wikis for Question Answering


Strand 1: The Semantic Strand of the Semantic Web

Semantic Web as RDBMS Integration Technology
  – Semantic representation of schema relations
  – Centralized workflows for ontology/data definition and management
  – Powerful reasoning and inference
  – Enterprise-oriented

Rooted in the original software/tools of the Semantic Web
  – Initial triplestores and authoring systems were (mostly) stand-alone or within the confines of a controlled data set
  – Early DARPA use cases were oriented around data integration
    • EII-style applications: BBN’s Foreign Clearance Guide for AMC
    • More XML-oriented than Web-oriented

The primary commercial use of the Semantic Web for many years
  – Examples: Siderean Seamark, Oracle RDF
  – Still the best-understood use cases for the Semantic Web
  – Still extremely important commercially

Strand 2: The Web Strand of the Semantic Web

Semantic Web as a web-scale knowledge publishing technology
  – Uncontrolled data dynamics, imperfect and voluminous data
  – Anyone can publish with limited/no knowledge engineer involvement
  – A massive base of socially-curated semantic data
  – Balance between quantity and purity (issue with owl:sameAs links)
  – Semantic data doesn’t have to be associated with HTML web text

Rooted in the original vision of the Semantic Web
  – Took several years to start to be realized
  – Difficulty conceiving of massive numbers of overlapping ontologies and class hierarchies, and uncoordinated data publishing
  – Hard problem is maintaining a set of informal, evolving, and partial agreements on vocabularies and ontologies

An exciting and emerging data set
  – Examples: Yahoo!, Sindice, Linking Open Data
  – Fairly poorly understood use cases (especially commercially)
  – Web-oriented and web-scale is extremely attractive
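
The quantity-versus-purity issue with owl:sameAs links can be illustrated with a small sketch. It assumes sameAs is treated as a symmetric, transitive equivalence and merges identifiers with a union-find structure. The identifiers and links are invented for illustration and do not come from any real data set.

```python
from collections import defaultdict


class SameAsIndex:
    """Union-find over identifiers linked by owl:sameAs statements."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen identifiers as their own cluster.
        self.parent.setdefault(item, item)
        # Path halving shortens chains as they are walked.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def add_same_as(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        groups = defaultdict(set)
        for item in list(self.parent):
            groups[self.find(item)].add(item)
        return sorted(sorted(group) for group in groups.values())


# Hypothetical identifiers from three independent publishers.
good_links = [
    ("dbA:Paris_France", "dbB:Paris"),
    ("dbA:Paris_Texas", "dbC:Paris_TX"),
]
# One careless link is enough to fuse two distinct cities.
bad_link = ("dbB:Paris", "dbC:Paris_TX")

index = SameAsIndex()
for a, b in good_links:
    index.add_same_as(a, b)
clusters_before = index.clusters()

index.add_same_as(*bad_link)
clusters_after = index.clusters()

print(len(clusters_before), "clusters before the bad link")
print(len(clusters_after), "cluster after the bad link")
```

Because equivalence is transitive, a single wrong link merges everything attached to both sides, which is why more links do not automatically mean better data.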

What do Strand 2 Semantic Web Applications Do?

Strand 1 semantic web applications have enterprise use cases
  – EII, E-science, Enterprise content management...
  – Success of use cases requires unified data models, familiar to DB thinking

Strand 2 semantic web applications address a brand new use case type
  – “Semantic Web should allow people to have a better online experience” (Alex Iskold, CEO of AdaptiveBlue)
  – Enhance the human activities of content creation, publishing, linking my data to other data, forming community, purchasing satisfying things, browsing, etc.
  – Strongly linked to Web 2.0 business models (such as they are)
    • Improve the effectiveness/targeting of advertising
    • Knowledge management tools for communities

Strand 2 use cases still require Strand 1-style data consistency and vocabulary agreement

Can Strand 2 Semantic Web Applications Overcome the Data Chaos of the Emerging Semantic Web?

Semantic Wikis are in both Strands

Wikis are tools for Publication and Consensus

MediaWiki (software for Wikipedia, Wikimedia, Wikibooks, etc.)
  – Most successful Wiki software
    • High performance: 10K pages/sec served, scalability demonstrated
    • LAMP web server architecture, GPL license
  – Publication: simple distributed authoring model
    • Wikipedia: >2.5M English articles, >250M edits, >2.5M images, #8 Alexa traffic rank in August
  – Consensus achieved by global editing and rollback
    • Fixpoint hypothesis, although consensus is not static
    • Gardener/admin role for contentious cases

Semantic Wikis apply the wiki idea to structured (typically RDFS) information
  – Authoring includes instances, data types, vocabularies, classes
  – Natural language text used for explanations
  – Automatic list generation from structured data, basic analytics, database imports
  – See e.g., http://wiki.ontoprise.com for one powerful semantic wiki

Semantic Wiki Hypotheses:
  (1) Significant interesting non-RDBMS Semantic Data can be collected cheaply
  (2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes
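
As a rough illustration of how structured data sits inside ordinary wiki text, the sketch below pulls triples out of Semantic MediaWiki-style `[[property::value]]` annotations and category links, then generates a list from them. The page texts are invented for this example, the regular expressions cover only the simplest annotation forms, and recording category membership under the label `rdf:type` is an assumption of this sketch.

```python
import re

# [[property::value]] or [[property::value|display text]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# [[Category:Name]]
CATEGORY = re.compile(r"\[\[Category:([^\[\]|]+)\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, predicate, object) triples for one wiki page."""
    triples = [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]
    triples += [
        (page_title, "rdf:type", name.strip())
        for name in CATEGORY.findall(wikitext)
    ]
    return triples


def list_pages(triples, predicate, value):
    """Automatic list generation: pages whose `predicate` equals `value`."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


# Hypothetical page sources, written for this example only.
pages = {
    "Carbon": "Carbon has [[atomic number::6]]. [[Category:Chemical element]]",
    "Oxygen": "Oxygen has [[atomic number::8]]. [[Category:Chemical element]]",
    "Mitochondrion": (
        "Found in [[part of::Eukaryotic cell|eukaryotes]]. [[Category:Organelle]]"
    ),
}

all_triples = []
for title, text in pages.items():
    all_triples.extend(extract_triples(title, text))

elements = list_pages(all_triples, "rdf:type", "Chemical element")
print(elements)
```

The natural-language sentence around each annotation stays readable to people, while the extracted triples feed lists and queries.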

Example: Semantic MediaWiki with Halo Extensions (SMW+)

Knowledge Authoring Capabilities
  – Syntax highlighting when editing a page
  – Semantic toolbar in edit mode
    • Displays annotations present on the page that is edited
    • Allows changing annotation values without locating the annotation in the wiki text
  – Autocompletion for all instances, properties, categories and templates
  – Increased expressivity through n-ary relations (available with the SMW 1.0 release)

Semantic Navigation Capabilities
  – GUI-based ontology browser, enables browsing of the wiki's taxonomy and lookup of instance and property information
  – Linklist in edit mode, enables quick access of pages that are within the context of the page being currently edited
  – Search input field with autocompletion, to prevent typing errors and give a fast overview of relevant content


Knowledge Retrieval Capabilities
  – Combined text-based and semantic search
  – Basic reasoning in queries with sub-/super-category/-property reasoning and resolution of redirects (equality reasoning)
  – GUI-based query formulation interface

Web service integration and import/export support for popular formats
Rule system developed for OWL-DLP and most of OWL-R
Fully open source under GPL, supported by Ontoprise
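
The "basic reasoning in queries" item can be pictured with a short sketch: a category query that also returns members of subcategories and treats redirect pages as the same entity. This is a minimal illustration under assumed data structures, not the SMW+ implementation, and all page and category names are hypothetical.

```python
def resolve(page, redirects):
    """Follow redirects to the canonical page name (equality reasoning)."""
    seen = set()
    while page in redirects and page not in seen:
        seen.add(page)
        page = redirects[page]
    return page


def subcategories(category, subcategory_of):
    """Return `category` plus everything transitively below it."""
    result, frontier = {category}, [category]
    while frontier:
        current = frontier.pop()
        for child, parent in subcategory_of:
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def instances_of(category, memberships, subcategory_of, redirects):
    """Pages in `category` or in any of its subcategories."""
    wanted = subcategories(resolve(category, redirects), subcategory_of)
    return sorted(
        {
            resolve(page, redirects)
            for page, cat in memberships
            if resolve(cat, redirects) in wanted
        }
    )


# Hypothetical wiki content, for illustration only.
subcategory_of = [("Organelle", "Cell component"), ("Plastid", "Organelle")]
memberships = [
    ("Mitochondrion", "Organelle"),
    ("Chloroplast", "Plastid"),
    ("Mitochondria", "Organelle"),  # redirect page for the same entity
    ("Carbon", "Chemical element"),
]
redirects = {"Mitochondria": "Mitochondrion", "Cell part": "Cell component"}

result = instances_of("Cell part", memberships, subcategory_of, redirects)
print(result)
```

Here the query is posed against a redirected category name, and the answer still collects members two levels down while counting the redirected page only once.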

Cool Idea... But Does it Work?

User tests were performed in Chemistry
  – 20 graduate students were each paid for 20 hours (over 1 month) to collaborate on semantic annotation for chemistry
  – ~700 Wikipedia base articles
  – US high-school AP exams were provided as content guidance

Initial Results (SMW+ 1.0)
  – Sparse: 1164 pages (entities), avg 5 assertions per entity
    • 226 relations (1123 relation-statements) and 281 attributes (4721 attribute-statements)
  – Many bizarre attributes and relations
  – Very difficult to use with a reasoner

User testing and quality results for SMW+ 1.1 extensions
  – Initial SUS scoring (6 SMEs, AP science task) went from 43 to 61; final scores in the 70s
  – 3 sessions using the Intrinsic Motivation Inventory (interest/value/usefulness); up 14%
  – Aided by the consistency bot, users corrected 2072 errors (80% of those found) over 3 months

We have continued to build on this framework

Gardening Statistics for Test Wiki
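
The sparsity figure and the consistency-bot figure can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported on this slide; the estimate of total errors found assumes the "80%" is a rounded share, so it is approximate.

```python
# Figures reported for the SMW+ 1.0 chemistry test wiki.
pages = 1164
relation_statements = 1123
attribute_statements = 4721

assertions = relation_statements + attribute_statements
assertions_per_page = assertions / pages

# Consistency-bot figures for the SMW+ 1.1 extensions.
errors_corrected = 2072
share_corrected = 0.80
# "80%" is presumably rounded, so this is only an estimate.
errors_found_estimate = round(errors_corrected / share_corrected)

print(f"{assertions} assertions, {assertions_per_page:.2f} per page")
print(f"about {errors_found_estimate} errors found")
```

The total of 5844 statements over 1164 pages gives roughly 5.02 assertions per entity, consistent with the reported average of 5, and the bot figures imply on the order of 2590 errors found.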

Some Lessons Learned from SMW+ (and Freebase)

User Interface design matters
  – This is core to MediaWiki’s success
  – Formal usability testing with SMEs matters a lot
  – Zero-training matters a lot

Gardening matters
  – Users need support for debugging
  – Gardeners can do large scale ontology editing
  – Supports “Schema Last” data engineering

User-created ontologies are not always well-designed
  – Flatter than normal
  – Cheaper than normal

Natural language is necessary to augment bare RDF(S) semantics
  – Supplemental semantics can be usefully carried in natural language

From Strand 2 Web to Strand 1 Semantics

Well-designed semantic wikis make possible certain Strand 2 applications
  – They enable local consensus-building on socially-published data
  – They allow Strand 2 knowledge publication to go beyond search

Strand 1 semantic data can certainly support Strand 2 applications
  – Example: use of other triplestore data in SMW+

How can you use Strand 2-collected data to support Strand 1 applications?
  – Corporate uses of socially-curated data (Metaweb)
  – Project Halo: Scientific question-answering


Envisioning the Digital Aristotle for Scientific Knowledge

Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
  – The “Big AI” Vision of computers that work with people

The volume of scientific knowledge has outpaced our ability to manage it
  – This volume is too great for researchers in a given domain to keep abreast of all the developments
  – Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume

“Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
  – Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O)

Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents

The Halo Project in One Slide

Project Halo: SME-based Authoring for scientific question-answering systems

Project Halo Goal: To determine whether tools can be built to facilitate robust knowledge formulation, query and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
  – Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable domain appropriate justifications using reasonable computational resources?
  – Will SMEs be capable of posing questions and complex problems to these systems?
  – Do these systems address key failure, scalability and cost issues encountered in expert systems?

Experimental Scope: Selected portions of the AP syllabi for chemistry, biology and physics
  – Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
    (a) C4H10 + O2 → CO2 + H2O
    (b) KClO3 → KCl + O2
    (c) CH3CH2OH + O2 → CO2 + H2O
    (d) P4 + O2 → P2O5
    (e) N2O5 + H2O → HNO3
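
To make the example question concrete, the sketch below works through reactions (a) to (e), reading each as reactants yielding products. It is a brute-force illustration, not how Aura or any Halo system reasons: coefficients are found by exhaustive search up to an assumed limit of 15, formulas are assumed to contain no parentheses, and the classification rule is a rough heuristic written for these five cases.

```python
import re
from collections import Counter
from itertools import product


def parse_formula(formula):
    """Count atoms in a formula without parentheses, e.g. CH3CH2OH."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def balance(reactants, products, max_coefficient=15):
    """Search for the balancing coefficients with the smallest sum."""
    species = [parse_formula(f) for f in reactants + products]
    elements = sorted(set().union(*species))
    signs = [1] * len(reactants) + [-1] * len(products)
    best = None
    for coefficients in product(range(1, max_coefficient + 1), repeat=len(species)):
        balanced = all(
            sum(s * c * atoms[e] for s, c, atoms in zip(signs, coefficients, species)) == 0
            for e in elements
        )
        if balanced and (best is None or sum(coefficients) < sum(best)):
            best = coefficients
    return best


def classify(reactants, products):
    """Rough heuristic for the three reaction types named in the exercise."""
    if "O2" in reactants and set(products) == {"CO2", "H2O"}:
        return "combustion"
    if len(reactants) == 1:
        return "decomposition"
    if len(products) == 1:
        return "combination"
    return "unclassified"


reactions = {
    "a": (["C4H10", "O2"], ["CO2", "H2O"]),
    "b": (["KClO3"], ["KCl", "O2"]),
    "c": (["CH3CH2OH", "O2"], ["CO2", "H2O"]),
    "d": (["P4", "O2"], ["P2O5"]),
    "e": (["N2O5", "H2O"], ["HNO3"]),
}

answers = {
    label: (balance(lhs, rhs), classify(lhs, rhs))
    for label, (lhs, rhs) in reactions.items()
}
for label, (coefficients, kind) in answers.items():
    print(label, coefficients, kind)
```

Even this toy version needs a formula parser, a conservation-of-atoms check, and background knowledge about reaction types, which hints at why keyword retrieval alone cannot answer such questions.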

AURA – Automated User-centered Reasoning and Acquisition System

Aura is a tool to help users formalize AP-level scientific knowledge
Aura can then reason with that knowledge
So users can ask questions and understand the answers

2006 Experimental Results for the Aura System

Pilot Group (Halo Pilot System)
  – Professional KE KBs
  – No natural language
  – ~$10K per syllabus page

  System      Percent correct
  Cycorp      37%
  SRI         44%
  Ontoprise   47%

VS.

SME Group
  – Science grad student KBs
  – Extensive natural language
  – ~$100 per syllabus page

  Domain   Number of questions   Percentage correct
                                 SME1   SME2   Avg     KE
  Bio      146                   52%    24%    38%     51%
  Chem     86                    42%    33%    37.5%   40%
  Phy      131                   16%    22%    19%     21%

Knowledge Formulation
  – Time for KF
    • Concept: ~20 min for all SMEs
    • Equation: ~70 s (Chem) to ~120 s (Physics)
    • Table: ~10 min (Chem)
    • Reaction: ~3.5 min (Chem)
    • Constraint: 14 s (Bio); 88 s (Chem)
  – SME need for help
    • 68 requests over 480 person-hours (33%/55%/12%) = 1/day

Question Formulation
  – Avg time for SME to formulate a question
    • 2.5 min (Bio)
    • 4 min (Chem)
    • 6 min (Physics)
    • Avg 6 reformulation attempts
  – Usability
    • SMEs requested no significant help
    • Pipelined errors dominated failure analysis

System Responsiveness
  – Biology: 90% answer < 10 sec
  – Chem: 60% answer < 10 sec
  – Physics: 45% answer < 10 sec

  Domain   Interpretation (Median/Max)   Answer (Median/Max)
  Bio      3s / 601s                     1s / 569s
  Chem     7s / 493s                     7s / 485s
  Phy      34s / 429s                    14s / 252s

How Can We Increase the Efficiency of SME Authoring?
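
The Avg column and the cost comparison on this slide can be recomputed from the reported figures. The sketch assumes that Avg is the plain mean of SME1 and SME2, and that the KE column is a knowledge-engineer baseline for the same questions; neither reading is stated explicitly on the slide.

```python
# Percent-correct figures as reported in the results table.
results = {
    "Bio": {"questions": 146, "SME1": 52, "SME2": 24, "KE": 51},
    "Chem": {"questions": 86, "SME1": 42, "SME2": 33, "KE": 40},
    "Phy": {"questions": 131, "SME1": 16, "SME2": 22, "KE": 21},
}

# Mean of the two SME scores per domain.
sme_average = {
    domain: (row["SME1"] + row["SME2"]) / 2 for domain, row in results.items()
}
# Percentage-point gap between the KE column and the SME average.
gap_to_ke = {
    domain: results[domain]["KE"] - average
    for domain, average in sme_average.items()
}

# Approximate authoring cost per syllabus page, in US dollars.
cost_professional_ke = 10_000
cost_grad_student = 100
cost_ratio = cost_professional_ke / cost_grad_student

for domain in results:
    print(domain, sme_average[domain], gap_to_ke[domain])
print("cost ratio:", cost_ratio)
```

On these numbers the SME averages match the table (38, 37.5, 19), the gap to the KE column is small in chemistry and physics, and the per-page cost differs by about a factor of 100.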

Symbiosis Between Aura and SMW+

Classical Knowledge Engineering
  – Expressive knowledge representation
  – Sophisticated testing and debugging

Knowledge Engineering in Aura
  – Acquires knowledge for deductive Q/A that can be used for answering AP questions in sciences
    • Uses a DL style class taxonomy, and logic programming style rules with many extensions
  – Requires 40 hours of training for knowledge formulation

Semantic Web Knowledge Engineering
  – Simple knowledge representation
  – Quantity at some expense of quality

Knowledge Engineering in SMW+
  – Tool for online authoring and consensus-building around semantic web content
  – Captures knowledge at the level of RDFS
  – Collective editing for quality control
  – Gardening appropriate for scientific knowledge
  – Almost a walk-up-and-use system

Can we use Semantic MediaWiki to capture knowledge that could be used for Q/A in AURA?
  – Factual knowledge (e.g., atomic number for carbon is 6, solubility constraints, etc.)
  – Taxonomic knowledge (e.g., eukaryotic and prokaryotic are two types of cells)

Knowledge creation would be faster, distributed, and cheaper

Example: Wikipedia Article on Organelle

Source Text of Article on Organelle in SMW+

Fact Box Summarizing the Annotations in SMW+

Ontology Browser for Test Biology Data in SMW+

Aura/SMW+ Use Case

Semantic Wiki includes relevant knowledge

Aura knowledge formulation engineer searches for knowledge during knowledge formulation

The KFE notices useful information in SMW+

The KFE maps the knowledge into Aura
  – Currently uses a derivative of Ontomap
  – Experimenting with FOAM support
  – ETL workflow

The knowledge is translated into Aura and available for querying
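
The mapping step in this use case can be pictured as a small translate-and-review loop. The sketch below is purely illustrative: it does not use Ontomap or FOAM, and every property, class, and slot name on both the wiki side and the target side is hypothetical.

```python
def map_triples(wiki_triples, property_map, class_map):
    """Translate wiki triples into target-KB assertions.

    Returns (mapped, unmapped) so a person can review what was skipped.
    """
    mapped, unmapped = [], []
    for subject, predicate, obj in wiki_triples:
        if predicate not in property_map:
            unmapped.append((subject, predicate, obj))
            continue
        # Names without an explicit mapping are passed through unchanged.
        target_subject = class_map.get(subject, subject)
        target_object = class_map.get(obj, obj)
        mapped.append((target_subject, property_map[predicate], target_object))
    return mapped, unmapped


# Hypothetical names on both sides; none are taken from SMW+ or Aura.
wiki_triples = [
    ("Carbon", "atomic number", "6"),
    ("Eukaryotic cell", "subcategory of", "Cell"),
    ("Carbon", "element symbol", "C"),
]
property_map = {"atomic number": "atomic-number", "subcategory of": "superclasses"}
class_map = {"Eukaryotic cell": "Eukaryotic-Cell", "Carbon": "Carbon-Atom"}

mapped, unmapped = map_triples(wiki_triples, property_map, class_map)
print(mapped)
print(unmapped)
```

Keeping the unmapped statements visible reflects the workflow on the slide, where a knowledge formulation engineer decides what is worth bringing across rather than importing everything automatically.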

AURA User Searches for Information

Aura User Notices Useful Information in Wiki

Aura User Maps Wiki Knowledge into Aura KB

Wiki Knowledge Available in Aura for Question-Answering

Conclusions

Two strands of semantic web applications
  – Strand 1: Structured, enterprise-quality semantic data
    • Designed for powerful analytics and easier data fusion
  – Strand 2: Lightweight web-scale semantic publishing
    • A revolution in AI if we can keep the quality up

Semantic Wikis have features from both strands
  – Easy to see how semantic wikis can leverage Strand 1 data for Strand 2 support
  – Harder to see how semantic wikis can leverage Strand 2 data for Strand 1 support

Vulcan’s Project Halo
  – Uses SMW+ to bring web-collected data into a question-answering application
  – Addresses very hard AI problems in scaling up knowledge authoring
  – Full evaluation of SMW+ and Aura in early 2009
    • Is mapping easier than authoring?


Thank You

Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.
