Investigations into Trust for Collaborative Information Repositories:
A Wikipedia Case Study
Deborah L. McGuinness, Co-Director Knowledge Systems,
Artificial Intelligence Lab, Stanford [email protected]
Joint work with: Honglei Zeng, Paulo Pinheiro da Silva, Li Ding, Dhyanesh Narayanan, and Mayukh Bhaowal
May 22, 2006 MTW - McGuinness
Big Picture
Research theme: make question answering systems more operational for users (agents/humans) by providing explanations for answers…
In many settings, explanations require some notion of trust in information and/or sources
Trust is a Critical Emerging Component in Social Collaborative Information Spaces
Goal: Allow users to access, view, and analyze information informed by trust ratings. This enables users (and agents) to:
- Assess the trustworthiness of documents that are collaboratively created and updated
- Monitor changes in the trustworthiness of dynamic documents and provide timely notifications of possible malicious content modification
- Identify trustworthy information with visualization tools
- Access shareable trust information among heterogeneous systems
- Enable new design paradigms for Wikis with built-in trust components, e.g., target text analytic tools at more trustworthy documents or document fragments within a larger resource such as Wikipedia
Some Issues Relevant to Collaborative Information Repositories/Wikis and Trust
Revisions: a key characteristic of Wikis
- Some social collaborative spaces, such as Wikis, allow (and sometimes promote) updates to posts from others. This differs from traditional bulletin boards, archived mailing lists, etc., which only support revision by way of follow-up posts.
Rating-based systems
- Some web systems support and encourage explicit ratings of contributors and contributions
- Wikis have no explicit trust encoding support
- Simple rating schemes may not work (e.g., an article rated trustworthy may no longer be trustworthy once it is modified)
We are exploring computational approaches to trust that exploit prominent Wiki features, including:
- Citation-based trust approach (Wiki articles are interlinked via citations/hyperlinks)
- Revision-history-based trust approach
Terms
Concepts
- Article
- Version (of an article)
- Fragment
- Author
Relations
- An article may have multiple versions, each of which reflects the modification made by an author to a previous version
- A version can be split into multiple fragments, each of which is entirely contributed by a single author
[Figure: concept diagram relating Article, Version, Fragment, and Author through the relations hasVersion:[1,n], hasFragment:[1,p], hasAuthor:[1,m], and hasAuthor:[1,1].]
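The concepts and relations above can be sketched as a small data model. This is only an illustrative sketch: the class and field names are assumptions chosen to mirror the slide's vocabulary, not the schema used in the actual system. The example instance reuses the "natural number" revision discussed later in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Author:
    name: str  # user name, or an IP address for an anonymous author


@dataclass
class Fragment:
    text: str
    author: Author  # a fragment is entirely contributed by a single author


@dataclass
class Version:
    author: Author  # the author whose modification produced this version
    fragments: List[Fragment] = field(default_factory=list)


@dataclass
class Article:
    title: str
    versions: List[Version] = field(default_factory=list)

    def authors(self) -> List[Author]:
        """Distinct fragment authors, in order of first appearance."""
        seen: List[Author] = []
        for version in self.versions:
            for fragment in version.fragments:
                if fragment.author not in seen:
                    seen.append(fragment.author)
        return seen


# Hypothetical instance built from the "natural number" example.
trovatore = Author("Trovatore")
anonymous = Author("130.94.162.64")
original = Fragment(
    "Natural number can mean either a positive integer (1, 2, 3, ...) "
    "or a non-negative integer (0, 1, 2, 3, ...)",
    trovatore,
)
inserted = Fragment(
    "The former definition is generally used in number theory, "
    "while the latter is preferred in set theory.",
    anonymous,
)
article = Article(
    "Natural number",
    [Version(trovatore, [original]), Version(anonymous, [original, inserted])],
)
print([a.name for a in article.authors()])  # ['Trovatore', '130.94.162.64']
```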
Citation-based Trust
Derive trust from the citation relationships among articles.
- For example, a well-cited article may be more trustworthy than an article that has no citations
- This approach is in the same family as the well-known (Google) PageRank algorithm
Link-ratio Algorithm
Link-ratio of an article (i.e., the page with title x): the ratio between the number of citation occurrences of the encyclopedia term x and the total number of occurrences of x (citations and non-citations).
- For example, “Seattle” appears 3855 times in Wikipedia, 1408 of which are citations (the other mentions are not hyperlinked). The link-ratio value of “Seattle” is 1408/3855 ≈ 0.365.
- Generally speaking, the higher the link-ratio value of an article, the more trustworthy the article is.
- Issue: there may be no incentive to link to an encyclopedia entry (e.g., the “love” article vs. the “Gauss's law” article)
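The link-ratio definition reduces to a single division, which the following minimal sketch makes explicit. The function name and the input validation are assumptions added for illustration; how the two counts are gathered from the Wikipedia text is not shown here.

```python
def link_ratio(citation_count: int, total_count: int) -> float:
    """Ratio of citation (hyperlinked) occurrences of a term to all its occurrences."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= citation_count <= total_count:
        raise ValueError("citation_count must lie between 0 and total_count")
    return citation_count / total_count


# The "Seattle" example from the slide: 1408 citations out of 3855 occurrences.
seattle = link_ratio(1408, 3855)
print(f"{seattle:.3f}")  # 0.365
```

A term that is never hyperlinked gets a link-ratio of 0, which is where the incentive issue noted above shows up: a low value may reflect linking habits rather than low trustworthiness.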
Revision History-based Trust (an example of the “natural number” article in Wikipedia)
When 130.94.162.64 (an anonymous author) inserted new content into the “natural number” page, originated by Trovatore, there could be an assumption of implicit trust in the original document fragment(s).
[Figure: content insertion in the “natural number” article]
v0 (Oct 7, 2005), Trovatore isAuthorOf: “Natural number can mean either a positive integer (1, 2, 3, ...) or a non-negative integer (0, 1, 2, 3, …)”
v1 (Dec 1, 2005), 130.94.162.64 isAuthorOf the inserted content: “The former definition is generally used in number theory, while the latter is preferred in set theory.”
Deriving Trust from Revision History
Revision operations (insertion, deletion, modification) imply trust.
- The trustworthiness of the revised article depends on the trustworthiness of the previous version, the author of the last revision, and the amount of text involved in the last revision.
Revision history is widely available in cooperative information systems:
- Collaborative software development (CVS)
- Cooperative document authoring (Wikipedia)
A formulation of Revision Trust
(Assumption) The trustworthiness of a new article fragment depends (only) on its author.
(Assumption) The trustworthy content of a revised fragment f' is the trustworthy content of the previous fragment f minus the trustworthy content that the author a removed from f (e.g., a fragment f could become more trustworthy if the deletion made by a removed inaccuracies in f).
t_f, t_{f'}, and t_a are the trust values of f, f', and a respectively; |f|, |f'|, and |D| are the sizes of f, f', and D (D is the deleted text).
\[ t_{f'} = \frac{|f|\, t_f - (1 - t_a)\,|D|}{|f'|} \]
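As a worked illustration, the sketch below evaluates the revision-trust formula reconstructed from the symbol definitions on this slide: the trustworthy content of f, |f| t_f, minus the trustworthy content removed by the author, (1 - t_a) |D|, divided by the size of the revised fragment |f'|. The function and parameter names are assumptions, the sizes and trust values in the example are made up, and the sketch does not bound the result to [0, 1].

```python
def revised_fragment_trust(
    t_f: float,        # trust value of the previous fragment f
    t_a: float,        # trust value of the revising author a
    size_f: float,     # |f|, size of the previous fragment
    size_f_new: float, # |f'|, size of the revised fragment
    size_deleted: float,  # |D|, size of the deleted text
) -> float:
    """Trust of the revised fragment f' after author a deletes D from f."""
    if size_f_new <= 0:
        raise ValueError("the revised fragment must be non-empty")
    trustworthy_before = size_f * t_f
    trustworthy_removed = (1 - t_a) * size_deleted
    return (trustworthy_before - trustworthy_removed) / size_f_new


# Hypothetical numbers: a 100-unit fragment with trust 0.8 loses 20 units
# to an author with trust 0.9, leaving 80 units.
t_new = revised_fragment_trust(t_f=0.8, t_a=0.9, size_f=100, size_f_new=80, size_deleted=20)
print(round(t_new, 3))
```

In this example the result is about 0.975, higher than the original 0.8: a highly trusted author is assumed to have removed mostly untrustworthy text, which matches the remark that a deletion can make a fragment more trustworthy.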
Inference Web and PML
- Inference Web is an infrastructure for providing explanations of results from web applications. It provides tools such as browsers, abstractors, checkers, summarizers, and combiners to manipulate and present justifications.
- PML is the interlingua representation language for Inference Web. Proof Markup Language (PML) is a representation language designed to encode the information agents may need in order to evaluate results, including where information came from and how it was manipulated.
- PML has an OWL encoding (and XML serialization)
- PML can be (and has been) used to represent justifications of information manipulation steps done by theorem provers (e.g., JTP, SNARK), text analytic tools (e.g., UIMA), task processors (e.g., SPARK), rule engines/systems (e.g., CWM, Cybercop), etc.
- The main components concern inference representation and provenance issues such as author, source, etc.
- Our current work expands PML to include representation primitives for trust.
A Sample PML encoding
http://inferenceweb.stanford.edu/2006/02/example1-iw-wiki.owl

Fragment:
<iw:NodeSet rdf:about="http://foto.stanford.edu/mediawiki-1.4.12/index.php/Natural_number">
  <iw:hasConclusion>In mathematics, a natural number is either a positive integer … </iw:hasConclusion>
  <iw:hasLanguage rdf:resource="http://inferenceweb.stanford.edu/registry/LG/English.owl#English"/>
  <iw:isConsequentOf>
    <iw:InferenceStep>
      <iw:hasRule rdf:resource="http://inferenceweb.stanford.edu/registry/DPR/Told.owl#Told"/>
      <iw:hasInferenceEngine rdf:resource="http://inferenceweb.stanford.edu/registry/IE/CitationTrust.owl#CitationTrust"/>
      <iw:hasSourceUsage>
        <iw:SourceUsage>
          <iw:hasSource>
            <iw:Source rdf:about="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
          </iw:hasSource>
        </iw:SourceUsage>
      </iw:hasSourceUsage>
    </iw:InferenceStep>
  </iw:isConsequentOf>
</iw:NodeSet>

Fragment trust:
<iw:AggregatedTrustRelation>
  <iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
  <iw:hasTrustedParty rdf:resource="http://foto.stanford.edu/mediawiki-1.4.12/index.php/Natural_number"/>
  <iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
</iw:AggregatedTrustRelation>

Author trust:
<iw:AggregatedTrustRelation>
  <iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
  <iw:hasTrustedParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
  <iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
</iw:AggregatedTrustRelation>
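To show how a consumer might read such trust statements, here is a minimal sketch that extracts (trusting party, trusted party, trust value) triples from an author-trust relation using Python's standard XML parser. The slide's snippet omits namespace declarations, so the `rdf:RDF` wrapper is added for parsing and the `iw` namespace URI below is a placeholder assumption, not the real Inference Web namespace. The function name is likewise hypothetical, and this is not part of the IW toolset.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IW_NS = "http://example.org/iw#"  # placeholder: the actual iw namespace is not given on the slide

# The author-trust relation from the slide, wrapped so the prefixes are declared.
PML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:iw="{IW_NS}">
  <iw:AggregatedTrustRelation>
    <iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
    <iw:hasTrustedParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
    <iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
  </iw:AggregatedTrustRelation>
</rdf:RDF>"""


def trust_relations(xml_text: str):
    """Return a list of (trusting party, trusted party, trust value) tuples."""
    ns = {"rdf": RDF_NS, "iw": IW_NS}
    resource_attr = f"{{{RDF_NS}}}resource"  # ElementTree's {namespace}name form
    root = ET.fromstring(xml_text)
    relations = []
    for rel in root.findall("iw:AggregatedTrustRelation", ns):
        trusting = rel.find("iw:hasTrustingParty", ns).get(resource_attr)
        trusted = rel.find("iw:hasTrustedParty", ns).get(resource_attr)
        value = float(rel.find("iw:hasTrustValue", ns).text)
        relations.append((trusting, trusted, value))
    return relations


for trusting, trusted, value in trust_relations(PML):
    print(trusting, "->", trusted, ":", value)
```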
Proof Markup Language: Node Sets and Inference Steps
Conclusion: In mathematics, a natural number is either a positive integer (1, 2, 3, 4, ...) or a non-negative integer (0, 1, 2, 3, 4, ...).
Encoding this conclusion in PML:
[Figure: an iw:NodeSet with iw:hasConclusion (the sentence above) and iw:hasLanguage: en, linked by iw:isConsequentOf to an iw:InferenceStep with iw:hasRule: Direct Assertion (DA) and iw:hasSourceUsage: articleID, author, timestamp.]
Proof Markup Language: Aggregated Trust Relation
A trivial conclusion: In mathematics, a natural number is either a positive integer (1, 2, 3, 4, ...) or a non-negative integer (0, 1, 2, 3, 4, ...).
Encoding the trust conclusion in PML:
[Figure: an iw:AggregatedTrustRelation with iw:hasTrustingParty: Wikipedia, iw:hasTrustedParty: Wikipedia author, and iw:hasTrustValue: 0.1766.]
Application: Trust View in Wikipedia
[Figure: architecture of the trust view, with input/output arrows between components.
- Wikipedia Database (article, revision, author) feeds the Wikipedia DB processor
- Wikipedia DB processor output: Article D (version, author)+, citations, …
- Fragmentation Service output: Article D (fragment, version)+ (fragment, author)+
- Trust Valuation Service output: Article D (fragment, trust)+ (version, trust)+ (author, trust)+
- Trust Rendering Service output: HTML for D and PML for D
- The user views HTML for D by clicking the “trust” tab in Wikipedia, and PML for D by clicking the “pml” tab]
Wikipedia Article without Trust View
Wikipedia Article with Citation Trust View
Multiple trust view tabs.
Fragments are colored according to their trust values computed from Citation Trust (default mode).
Wikipedia Article with Revision Trust View
Fragments are colored according to their trust values computed from Revision Trust.
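One way such a trust view could color fragments is sketched below. The slides do not specify the color scale or the generated markup, so the red-to-green gradient, the `span` styling, and both function names are assumptions for illustration only.

```python
from html import escape


def trust_to_color(trust: float) -> str:
    """Map a trust value in [0, 1] to a hex color from red (low) to green (high)."""
    t = min(max(trust, 0.0), 1.0)  # clamp out-of-range values
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


def render_fragments(fragments) -> str:
    """Render (text, trust) pairs as HTML spans with trust-based background colors."""
    spans = [
        f'<span style="background-color: {trust_to_color(trust)}">{escape(text)}</span>'
        for text, trust in fragments
    ]
    return " ".join(spans)


# Hypothetical fragments: the first trust value is the one shown in the PML example.
print(render_fragments([
    ("In mathematics, a natural number is either a positive integer ...", 0.1766),
    ("The former definition is generally used in number theory.", 0.9),
]))
```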
Conclusion
- Inference Web and PML can be used to support the encoding and presentation of trust related to information in social collaborative information repositories such as Wikis.
- We have designed and implemented a simple trust representation that extends PML, and have included support for the extension in our IW tools.
- More sophisticated trust modeling and trust processing are expected to be required.
- We are investigating:
  - Models of trust
  - Trust aggregation from multiple sources and multiple algorithms
  - Refinements and usage of revision-based trust
  - Additional trust approaches and their combination
  - New applications utilizing (sharable) trust information
More info:
- Inference Web: iw.stanford.edu
- Simple examples of PML markup with wiki demo: foto.stanford.edu/mediawiki-1.4.12/index.php/[email protected]
Extra
Abstract PML
[Figure: abstract PML graph for the “Natural number” article.
- wiki:ArticleVersion (http://.../title=Natural_number) links through wiki:hasFragmentList to its fragments
- iw:NodeSet (“In mathematics, a natural number is either a positive integer …”) … iw:NodeSet (fragment n)
- iw:Person (Oleg Alexandrov) … iw:Person (author m), linked from the fragments by iw:hasSource
- iw:AggregatedTrust (fragment trust is 0.1766), with iw:hasTrustingParty and iw:hasTrustedParty
- iw:AggregatedTrust (author trust is 0.1766), with iw:hasTrustingParty and iw:hasTrustedParty
- iw:Organization (Wikipedia) is the trusting party]
Note: Green nodes are in IW registry.