Social Tagging
Kristina Lerman USC Information Sciences Institute
Thanks to Anon Plangprasopchok for providing material for this lecture.
Social Web
Social Web is a platform for people to create, organize, and share information
Create Information
• People create content (resources)
  • Text posts: blogs, Twitter, …
  • Images: Flickr, Picasa, …
  • Videos: YouTube, Vimeo, …
  • News stories: Digg, Reddit, Slashdot, …
  • Bookmarks: Delicious, CiteULike, Bibsonomy, …
  • Personal profiles: Facebook, MySpace, …
  • Maps: OpenStreetMap, …
  • Locations: Foursquare, …
Organize Information
• People organize resources
• Annotate with metadata
  • tags: descriptive labels
  • geotags: geographic coordinates
• Add to folders: organize content within personal hierarchies
  • E.g., sets and collections on Flickr
• Other types of metadata may include
  • Discussions, comments, reviews
  • Ratings, votes, …
• Social tagging is the most popular form of annotation
Social Tagging: Delicious
[Screenshot of a Delicious bookmark, showing the content (webpage), the user, and the tags]
Social Tagging: Flickr

[Screenshot of a Flickr photo page for a Rainbow bee-eater (Merops ornatus), Mackay Gardens, Queensland, Australia, showing its tags, submitter, public groups (pools such as Birds, Australian Birds, Birds of Queensland), discussion, and private albums (sets)]
Share Information
• People share resources
• Social networks: broadcast to social connections
  • Friends on Facebook, …
  • Fans/Followers on Twitter, Digg, …
• Group affiliations
• Hotlists: emerge from collective activity
  • E.g., Digg front page, Flickr Explore, Flickr Trends, …
Social Networks: Facebook
Social Networks: Flickr
Harvesting Knowledge from Social Tagging
[Diagram: users, tags, and resources linked by annotations – a tagged web page on Delicious, a tagged photo on Flickr]
• RR graph: PageRank
• UU graph: social network analysis
• RUT hypergraph: harvesting knowledge from social tagging
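The RUT hypergraph can be thought of as a set of (resource, user, tag) triples, and the RR and UU graphs as projections of it. A minimal sketch (the triples below are made up) of building the resource–resource projection, linking two resources when the same user bookmarked both:

```python
from collections import defaultdict
from itertools import combinations

# Each annotation is a (resource, user, tag) triple, i.e. one
# hyperedge in the resource-user-tag (RUT) hypergraph.
triples = [
    ("geocoder.us", "alice", "geocoding"),
    ("geocoder.us", "bob", "maps"),
    ("wunderground.com", "alice", "weather"),
    ("wunderground.com", "carol", "weather"),
]

# Project onto the resource-resource (RR) graph: two resources are
# linked if at least one user bookmarked both of them.
user_resources = defaultdict(set)
for resource, user, tag in triples:
    user_resources[user].add(resource)

rr_edges = set()
for resources in user_resources.values():
    for a, b in combinations(sorted(resources), 2):
        rr_edges.add((a, b))

print(rr_edges)  # alice links geocoder.us and wunderground.com
```

The user–user (UU) projection is symmetric: link two users who bookmarked the same resource.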
Overview
Harvesting knowledge from social tagging
• “Structure of Collaborative Tagging Systems”
  • Statistics of tagging activity
  • Consensus about the meaning of a document quickly emerges from the opinions of many users
• “Exploiting Social Annotation for Automatic Resource Discovery”
  • Learn hidden topics in a collection of tagged documents
  • Use hidden topics to find relevant documents
Social Tagging
• Tags are labels attached to content
  • Chosen from an uncontrolled personal vocabulary
  • Help users browse, filter, and search information more efficiently
• Collaborative/social tagging
  • Anyone can attach labels to resources (not only experts or producers of content)
  • Collectively, tags represent a semantic annotation of a resource (an alternative to the Semantic Web)
Tagging and Taxonomies
• Taxonomy – hierarchical, exclusive organization of objects
  • Linnaean classification: felidae → panthera → tiger; felidae → felis → cat
• File system: articles about cats in Africa
  • c:\articles\cats
  • c:\articles\africa
  • c:\articles\africa\cats
  • c:\articles\cats\africa
  • Must search multiple folders to find all relevant content
• Tagging – non-hierarchical, inclusive organization of objects
  • Articles tagged ‘cats’, ‘africa’
  • Query: ‘cats’ AND ‘africa’
  • But, will not find articles tagged with ‘cheetah’
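The contrast above can be sketched in a few lines: tag-based retrieval is a set-containment query over each item's tag set, and the ‘cheetah’ caveat falls out of it. The article names and tags below are illustrative.

```python
# Each article carries an unordered set of tags; a query is a
# conjunction of tags (the tags and articles here are made up).
articles = {
    "article1": {"cats", "africa"},
    "article2": {"cats"},
    "article3": {"cheetah", "africa"},
}

def search(query_tags):
    """Return articles whose tag sets contain every query tag."""
    return sorted(name for name, tags in articles.items()
                  if query_tags <= tags)

# 'cats' AND 'africa' finds article1 in one query (no folder walk),
# but misses article3, which is tagged with the more specific 'cheetah'.
print(search({"cats", "africa"}))
```

Unlike a folder hierarchy, no single path such as `c:\articles\cats\africa` has to be chosen in advance; but synonymy and specificity (cheetah vs cats) remain unhandled.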
Kinds of Tags
• What it is about (topic) – identify who or what the document is about: ‘cat’, ‘africa’
• What it is – what kind of thing it is: ‘article’, ‘blog’, ‘book’
• Who owns it – who owns/created the content: ‘nikographer’
• Refining categories – tags that refine or qualify other categories, especially numbers
• Qualities or characteristics – express an opinion: ‘funny’, ‘interesting’
• Self-reference – ‘mystuff’
• Task organizing – ‘toread’, ‘jobsearch’
Social Tagging Dimensions
• Tagging rights: who can tag?
  • Self-tagging – only the resource owner (blog posts; Flickr by convention)
  • Free-for-all – anyone can tag a resource (Delicious)
• Consolidation: assisted tag generation?
  • Blind tagging – user enters tags independently of other users
  • Suggestive tagging – system suggests tags based on annotations of other users
• Resource type
  • Text – Web pages, blog posts, bibliographic material, …
  • Multimedia – images, videos, …
• Source of content
  • User-owned – e.g., images on Flickr
  • Scavenged from the Web – e.g., Delicious
• Connectivity: links between users
  • Reciprocity – undirected links (Facebook) vs directed links (Flickr, Delicious)
  • Link type – friend vs contact relationship (on Flickr) indicates degree of trust
User Motivations
What are users’ motivations to tag?
• Organizational
  • Mark items for future personal retrieval
• Social
  • Mark items for others to find, e.g., concert photos on Flickr
    • Can result in spamming
  • Express an opinion, e.g., ‘funny’ tag on a video
• Collective value emerges from the tagging decisions of individual users
  • How can users be incentivized to contribute high-quality annotations?
Social Tagging on del.icio.us
• Social bookmarking site del.icio.us
  • Users can tag any Web page (URL)
  • Delicious suggests tags based on existing tags for the URL
  • Delicious aggregates popular tags
  • Anyone can see the bookmarks of others
  • Users can create social links
• Value of social tagging
  • Users bookmark for their own benefit
    • Organization
    • Retrieval
  • A useful public good emerges
    • Tag suggestions
    • Lists of popular URLs and tags (hotlists)
Tagging on del.icio.us
[Screenshot of a del.icio.us bookmark, showing the content (webpage), the user, and the tags]
Dynamics of del.icio.us
• Delicious dynamics [Golder & Huberman]
  • User activity
  • Tag vocabulary growth
• Datasets
  • Bookmarks collected over 4 days in June 2005
  • Sample of users who posted bookmarks in this period
Dynamics of User Interests
• Tags reflect how a user’s interests and knowledge change over time
• Tag1 and Tag2 are consistent interests of the user
• Tag3 is a new interest
  • Or a new way to differentiate between concepts/interests
[Plot: number of times each tag (tag1, tag2, tag3) has been used, by bookmark, for one user]
Stable Patterns in Tagging
• Consider a single URL as it is tagged by more users
  • Each tag’s proportion represents the combined description of the URL by many users
  • After ~100 bookmarks, the relative frequency of each tag is fixed
[Plot: tag proportion (wrt all tags) vs number of bookmarks for a URL]
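The quantity plotted is simple to compute from bookmark data. A minimal sketch with a synthetic tag stream (the tags and counts are invented, not from the Golder & Huberman dataset):

```python
from collections import Counter

# Synthetic stream of tags applied to one URL by successive users.
# In the Golder & Huberman data, each tag's share of all tags
# stabilizes after roughly 100 bookmarks.
tag_stream = (["python", "programming"] * 60 +
              ["python", "tutorial"] * 30)

def tag_proportions(tags):
    """Each tag's relative frequency with respect to all tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# Compare an early prefix of the stream with the full stream to see
# how a tag's proportion evolves as bookmarks accumulate.
early = tag_proportions(tag_stream[:20])
late = tag_proportions(tag_stream)
print(early)
print(late)
```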
Findings
• Consensus about a URL’s topics
  • Emerges quickly – after ~100 users bookmark it
  • URLs do not have to become popular for tags to be useful
  • Minority opinions can stably coexist with popular ones
  • Can be used to categorize/organize URLs
• Reasons for consensus
  • Imitation – users imitate the tag selection of others
    • But stable patterns also exist for less common tags (not shown to users)
  • Shared knowledge
    • Can we learn it?
Learning from Social Tagging/Annotation
Goal: Learn concepts from social annotations created by many users
• Annotations by an individual user may be inaccurate and incomplete…
• Annotations from many different users may complement each other, making them meaningful in aggregate
[Example: two Flickr photos tagged “Jaguar” – one an animal, one a car (by A lion Rohrs and sparky2000). Do the tags denote the same concept?]
Learning Concepts from Tags
Goal of Learning Algorithm
Group semantically related tags and resources: a group ≈ a concept

[Diagram: tags and resources clustered into concepts such as “Animal”, “Car”, “Flower”]
Challenges in Learning from Annotations
• Sparse data – 4-7 tags per bookmark; 3.74 tags per photo [Rattenbury07+]
• Ambiguity – jaguar: car vs. animal
• Polysemy – window: hole in a wall vs. glass pane that fills it
• Synonymy – kid vs. child
• Disagreement – cats\africa vs. africa\cats
• Different levels of specificity – dog vs. beagle
• Multiple facets – a bird tagged by appearance, location, scientific/colloquial name
Document Modeling Approaches
• ‘Bag-of-words’ – tf-idf
  • Document as a vector of word frequencies
  • Small reduction in document description length
  • Does not handle synonymy and polysemy
• Latent semantic indexing – LSI
  • Identifies a subspace of tf-idf features that captures most of the variance in a corpus
  • Reduction in document description length (# principal components)
  • Handles polysemy and synonymy
• Topic modeling – pLSI, LDA
  • Documents as random mixtures over (hidden) topics, where each topic is a distribution over words
  • Large reduction in description length (# topics)
  • Inference: given a document corpus, estimate the parameters of the model, then compute the distribution of hidden topics for a given document
[Diagram: generative process – a document (r) mixes possible topics (z); each topic generates words (t) from the possible words]

pLSI (Hofmann99); LDA (Blei03+)
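This generative view can be run in a few lines. A minimal sketch using scikit-learn's `LatentDirichletAllocation` on made-up tag "documents" (the tags and the 2-topic setting are illustrative; the lecture's Delicious corpora use 80 topics):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each resource's aggregated tags form one "document"; these strings
# are invented examples in the spirit of the maps/video topics above.
tag_docs = [
    "maps geography directions atlas maps world",
    "maps earth latitude longitude geography",
    "video youtube movies media download",
    "video torrent p2p movies media",
]

# Build the tag-count matrix (bag of tags).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tag_docs)

# Fit LDA; fit_transform returns each document's distribution
# over the hidden topics (rows sum to 1).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

print(doc_topics.round(2))
```

Inspecting `lda.components_` (per-topic word weights) recovers the high-probability tags for each learned topic.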
A Stochastic Process of Word Generation

Learned topics – high-probability words in each topic:
• travel, flights, airline, flight, airlines, guide, aviation, …
• map, maps, world, earth, latitude, longitude, directions, address, geography, distance, zip, usa, gmaps, atlas, …
• video, download, bittorrent, p2p, youtube, media, torrent, torrents, movies, …
Apply LDA to Tagging
[Diagram: each resource is a document and its tags are the words; LDA groups tags into topics such as “Animal”, “Car”, “Flower”]
Application to Resource Discovery
• Resource discovery
  • Given a seed source, find other data sources that provide the same functionality
  • E.g., find geocoders like http://geocoder.us, which returns the geographic coordinates of a specified US address
• Benefits
  • Increase robustness of information integration (II) applications
    • If http://geocoder.us fails, substitute another source
  • Increase coverage of II applications
    • http://geocoder.ca geocodes US AND Canadian addresses
Source Discovery and Modeling [Ambite et al, 2009]

[Diagram: pipeline of discovery → invocation & extraction → semantic typing → source modeling, applied to sources such as http://wunderground.com and unisys. Background knowledge: seed URL; sample input values (e.g., “90254”); patterns and domain types (e.g., unisys(Zip,Temp,Humidity,…)); definitions of known sources and sample values (e.g., unisys(Zip,Temp,…) :- weather(Zip,…,Temp,Hi,Lo))]
Exploiting Social Annotation for Resource Discovery
Approach: Use topic modeling of social annotation obtained from Delicious to find sources similar to a given seed URL
[Diagram: seed URL → candidate URLs with their users and tags → probabilistic learning model (e.g., LDA) learns concepts → each URL’s distribution over concepts → compute URL similarity → rank candidates by similarity to the seed]
Obtain Annotation corpus from Delicious
• Crawling strategy
  • For each seed, retrieve its 20 most popular tags
  • For each tag, retrieve the sources annotated with that tag
  • For each source, retrieve all of its tags
Corpus of Annotated Resources
• Use LDA to learn 80 topics in each corpus
• Each URL’s distribution over topics is used to compute its similarity to the seed
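Once every URL has a distribution over the learned topics, ranking reduces to comparing distributions. A minimal sketch using cosine similarity (one reasonable choice; the distributions and URLs below are invented, and the original work may use a different similarity measure):

```python
import math

# Hypothetical 3-topic distributions; the real setting uses 80 topics.
seed = [0.7, 0.2, 0.1]
candidates = {
    "geocoder.ca": [0.6, 0.3, 0.1],
    "wunderground.com": [0.1, 0.1, 0.8],
}

def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Rank candidate URLs by similarity to the seed's topic distribution.
ranking = sorted(candidates,
                 key=lambda url: cosine(seed, candidates[url]),
                 reverse=True)
print(ranking)  # geocoder.ca ranks above wunderground.com
```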
Topic Modeling of Social Annotations
• Manually label the top 100 URLs ranked by similarity to the seed URL
• Compare to Google’s “find similar URLs” functionality
Source Discovery Results
Discussion
• Users express their knowledge through the tags they create while annotating content
• Apply document modeling techniques to social annotation data
  • Infer hidden topics in the annotated data
  • Use the topics for the source discovery task
  • Outperforms standard Web search
• Next – extract more complex types of knowledge from social annotations
  • Sentiment
  • Folksonomies