
Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags:...

Date post: 24-Sep-2020
Social Tagging Kristina Lerman USC Information Sciences Institute Thanks to Anon Plangprasopchok for providing material for this lecture.
Transcript
Page 1: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies

Social Tagging

Kristina Lerman USC Information Sciences Institute

Thanks to Anon Plangprasopchok for providing material for this lecture.

Page 2:

essembly

delicious

Bugzilla

Social Web

Page 3:

essembly

delicious

Bugzilla

Social Web is a platform for people to create, organize, and share information

Page 4:

Create Information

•  People create content (resources)
   •  Text posts: blogs, Twitter, …
   •  Images: Flickr, Picasa, …
   •  Videos: YouTube, Vimeo, …
   •  News stories: Digg, Reddit, Slashdot, …
   •  Bookmarks: Delicious, CiteULike, Bibsonomy, …
   •  Personal profiles: Facebook, MySpace, …
   •  Maps: OpenStreetMap, …
   •  Locations: Foursquare, …

Page 5:

Organize Information

•  People organize resources
   •  Annotate with metadata
      •  tags: descriptive labels
      •  geotags: geographic coordinates
   •  Add to folders: organize content within personal hierarchies
      •  E.g., sets and collections on Flickr
   •  Other types of metadata may include
      •  Discussions, comments, reviews
      •  Ratings, votes, …
•  Social tagging is the most popular form of annotation

Page 6:

Social Tagging: Delicious

Content (webpage)

User Tags

Page 7:

Social Tagging: Flickr

[Figure: Flickr photo page annotated with its metadata: submitter, tags, discussion, private albums, and public groups]

Rainbow bee-eater Merops ornatus Australia Queensland Mackay Gardens

Mackay May 2008 (Set); Birds (Set); Birds (Pool); Canberra (Pool); Field Guide: Birds of the World (Pool); Birds, Birds, Birds (Pool); BIRDPIX (3/day) (Pool); Australian Birds (Pool); Birds – Kingfishers, Pittas, and Bee-eaters (Pool); Birds of Queensland (Pool)

Page 8:

Share Information

•  People share resources
   •  Social networks: broadcast to social connections
      •  Friends on Facebook, …
      •  Fans/Followers on Twitter, Digg, …
   •  Group affiliations
   •  Hotlists: emerge from collective activity
      •  E.g., Digg front page, Flickr Explore, Flickr Trends, …

Page 9:

Social Networks: Facebook

Page 10:

Social Networks: Flickr

Page 11:

Harvesting Knowledge from Social Tagging

[Figure: users, tags, and resources linked by tagging, illustrated with a user tagging a web page and a user tagging a photo]

RR graph: PageRank

UU graph: Social network analysis

RUT hypergraph: Harvesting knowledge from social tagging
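
A minimal sketch of how the resource-user-tag (RUT) data might be held in memory, assuming each annotation is a (user, resource, tag) triple. The user names and tag assignments are hypothetical toy data, and `project` is an illustrative helper, not part of any system named in the lecture.

```python
from collections import defaultdict

# Hypothetical toy annotations: one (user, resource, tag) triple per tagging act.
triples = [
    ("alice", "http://geocoder.us", "geocoding"),
    ("alice", "http://geocoder.us", "maps"),
    ("bob", "http://geocoder.us", "maps"),
    ("bob", "http://wunderground.com", "weather"),
]

def project(triples):
    """Collapse the triples into two simpler views:
    per-resource tag counts and per-user resource sets."""
    tag_counts = defaultdict(lambda: defaultdict(int))
    user_resources = defaultdict(set)
    for user, resource, tag in triples:
        tag_counts[resource][tag] += 1
        user_resources[user].add(resource)
    return tag_counts, user_resources

tag_counts, user_resources = project(triples)
print(dict(tag_counts["http://geocoder.us"]))
print(sorted(user_resources["bob"]))
```

The per-resource tag counts are the view used when a resource is treated as a "document" whose "words" are tags.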

Page 12:

Overview

Harvesting knowledge from social tagging
•  “Structure of Collaborative Tagging Systems”
   •  Statistics of tagging activity
   •  Consensus about the meaning of a document quickly emerges from the opinions of many users
•  “Exploiting Social Annotation for Automatic Resource Discovery”
   •  Learn hidden topics in a collection of tagged documents
   •  Use hidden topics to find relevant documents

Page 13:

Social Tagging

•  Tags are labels attached to content
   •  Chosen from an uncontrolled personal vocabulary
   •  Help users to more efficiently
      •  Browse
      •  Filter
      •  Search information
•  Collaborative/social tagging
   •  Anyone can attach labels to resources (not only experts or producers of content)
   •  Collectively, tags represent a semantic annotation of a resource (alternative to Semantic Web)

Page 14:

Tagging and Taxonomies

•  Taxonomy – hierarchical, exclusive organization of objects
   •  Linnaean classification
         felidae → panthera → tiger
         felidae → felis → cat
   •  File system: articles about cats in Africa
         c:\articles\cats
         c:\articles\africa
         c:\articles\africa\cats
         c:\articles\cats\africa
      Search multiple folders to find all relevant content
•  Tagging – non-hierarchical, inclusive organization of objects
   •  Articles tagged ‘cat’, ‘africa’
      But, will not find articles tagged with ‘cheetah’

[Figure: overlapping sets ‘africa’ and ‘cats’; the intersection is ‘cats’ AND ‘africa’]
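
As a rough illustration of the tag-based lookup, the sketch below treats each article as a set of tags and answers a ‘cats’ AND ‘africa’ query with a subset test. The article names and their tags are made up for the example.

```python
# Hypothetical articles, each with a flat (non-hierarchical) set of tags.
articles = {
    "lions-of-the-serengeti": {"cats", "africa"},
    "house-cat-care": {"cats"},
    "sahara-travel": {"africa"},
    "cheetah-speed": {"cheetah", "africa"},
}

def with_all_tags(items, *required):
    """Return the names of items that carry every required tag."""
    wanted = set(required)
    return sorted(name for name, tags in items.items() if wanted <= tags)

matches = with_all_tags(articles, "cats", "africa")
print(matches)
```

One query replaces the search over several folders, but the cheetah article is missed because nobody tagged it ‘cats’.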

Page 15:

Kinds of Tags

•  What content is about (topic): identify who or what the document is about, e.g., ‘cat’, ‘africa’
•  What it is: what kind of thing it is, e.g., ‘article’, ‘blog’, ‘book’
•  Who owns it: who owns/created the content, e.g., ‘nikographer’
•  Refining categories: refine or qualify categories, especially numbers
•  Identify qualities or characteristics: express opinion, e.g., ‘funny’, ‘interesting’
•  Self-reference: ‘mystuff’
•  Task organizing: ‘toread’, ‘jobsearch’

Page 16:

Social Tagging Dimensions

•  Tagging rights: who can tag?
   •  Self-tagging – only resource owner (blog posts, Flickr by convention)
   •  Free-for-all – anyone can tag a resource (Delicious)
•  Consolidation: assisted tag generation?
   •  Blind tagging – user enters tags independently of other users
   •  Suggestive tagging – system suggests tags based on annotations of other users
•  Resource type
   •  Text – Web pages, blog posts, bibliographic material, …
   •  Multimedia – images, videos, …
•  Source of content
   •  User-owned – e.g., images on Flickr
   •  Scavenged from the Web – e.g., Delicious
•  Connectivity: links between users
   •  Reciprocity – undirected links (Facebook) vs directed (Flickr, Delicious)
   •  Link type – friend relationship vs contact (on Flickr) shows degree of trust

Page 17:

User Motivations

What are users’ motivations to tag?
•  Organizational
   •  Mark items for future personal retrieval
•  Social
   •  Mark items for others to find, e.g., concert photos on Flickr
      •  Can result in spamming
   •  Express opinion, e.g., “funny” tag on video

Collective value emerges from tagging decisions of individual users
•  How can users be incentivized to contribute high quality annotations?

Page 18:

Social Tagging on del.icio.us

•  Social bookmarking site del.icio.us
   •  Users can tag any Web page (URL)
      •  Delicious suggests tags based on existing tags for the URL
      •  Delicious aggregates popular tags
   •  Anyone can see bookmarks of others
   •  Users can create social links
•  Value of social tagging
   •  Users bookmark for their own benefit
      •  Organization
      •  Retrieval
   •  Useful public good emerges
      •  Tag suggestions
      •  List of popular URLs and tags (hotlists)

Page 19:

Tagging on del.icio.us

Content (webpage)

User Tags

Page 20:

Dynamics of del.icio.us

•  Delicious dynamics [Golder & Huberman]
   •  User activity
   •  Tag vocabulary growth
•  Datasets
   •  Bookmarks collected over 4 days in June 2005
   •  Sample of users who posted bookmarks in this period

Page 21:

Dynamics of User Interests

•  Tags reflect how a user’s interests and knowledge change in time
   •  Tag1 and Tag2 are consistent interests of the user
   •  Tag3 is a new interest
      •  Or a new way to differentiate between concepts/interests

[Figure: times each tag (tag1, tag2, tag3) has been used vs. bookmark]

Page 22:

Stable Patterns in Tagging

•  Consider a single URL
   •  As it is tagged by more users
   •  Each tag’s proportion represents the combined description of the URL by many users
   •  After ~100 bookmarks, relative frequency of each tag is fixed

[Figure: tag proportion (wrt all tags) vs. number of bookmarks for URL]
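
The quantity tracked here, each tag's proportion as a function of the number of bookmarks for one URL, can be computed with a running count. This is a sketch on a hypothetical four-bookmark sequence; it only shows the bookkeeping and does not reproduce the ~100-bookmark finding.

```python
from collections import Counter

def tag_proportions(bookmarks):
    """After each bookmark, record every tag's share of all tags seen so far."""
    counts = Counter()
    history = []
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        history.append({tag: n / total for tag, n in counts.items()})
    return history

# Hypothetical bookmarks of a single URL, in posting order.
bookmarks = [["maps", "geocoding"], ["maps"], ["maps", "tools"], ["geocoding"]]
history = tag_proportions(bookmarks)
for i, proportions in enumerate(history, start=1):
    print(i, proportions)
```

Plotting one line per tag from `history` would give a curve of the same kind as the figure.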

Page 23:

Findings

•  Consensus about a URL’s topics
   •  Emerges quickly, after ~100 users bookmark it
      •  URLs do not have to become popular for tags to be useful
   •  Minority opinions can stably coexist with popular ones
   •  Can be used to categorize/organize URLs
•  Reasons for consensus
   •  Imitation – users imitate tag selection of others
      •  But, stable patterns also exist for less common tags (not shown to users)
   •  Shared knowledge
      •  Can we learn it?

Page 24:

Learning from Social Tagging/Annotation

Goal: Learn concepts from social annotations created by many users

•  Annotations by an individual user may be inaccurate and incomplete…

•  Annotations from many different users may complement each other, making them meaningful in aggregate

Page 25:

Learning Concepts from Tags

[Figure: two photos tagged “Jaguar”, one an animal and one a car: are they the same concept? Photo credits: By A lion Rohrs; By sparky2000]

Page 26:

Goal of Learning Algorithm

Group semantically related tags and resources

A group ~ A concept

[Figure: tags and resources grouped into concepts such as “Animal”, “Car”, and “Flower”]

Page 27:

Challenges in Learning from Annotations

•  Sparse data: 4-7 tags per bookmark; 3.74 tags per photo [Rattenbury07+]
•  Ambiguity: jaguar (car vs. animal)
•  Polysemy: window (hole in a wall vs. glass pane that resides in it)
•  Synonymy: kid vs. child
•  Disagreement: cats\africa vs. africa\cats
•  Different levels of specificity: Dog vs. Beagle
•  Multiple facets: bird tagged by appearance, location, scientific/colloquial name

Page 28:

Document Modeling Approaches

•  ‘Bag-of-words’ – tf-idf
   •  Document as a vector of word frequencies
   •  Small reduction in document description length
   •  Does not handle synonymy and polysemy
•  Latent semantic indexing – LSI
   •  Identifies subspace of tf-idf that captures most of the variance in a corpus
   •  Reduction in document description length (# principal components)
   •  Handles polysemy and synonymy
•  Topic modeling – pLSI, LDA
   •  Documents as random mixtures over (hidden) topics, where each topic is a distribution over words
   •  Large reduction in description length (# topics)
•  Inference
   •  Given a document corpus, estimate parameters of the model
      –  Compute distribution of hidden topics given the document
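
To make the bag-of-words baseline concrete, here is a small tf-idf sketch over tag lists. The weighting used (term frequency times log(N/df)) is one common variant chosen for illustration, and the three toy documents are hypothetical; each is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse {term: weight} vector per document.
    Weight = (count / document length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical tag lists for three resources.
docs = [["kid", "toys"], ["child", "toys"], ["jaguar", "car"]]
vectors = tf_idf(docs)
print(vectors[0])
print(vectors[1])
```

The first two vectors overlap only on ‘toys’: ‘kid’ and ‘child’ occupy separate dimensions, which is the synonymy problem noted above for bag-of-words.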

Page 29:

A Stochastic Process of Word Generation

[Figure: a document (r) draws topics (z) from the possible topics; each topic draws generated words (t) from the possible words]

pLSI (Hofmann99); LDA (Blei03+)
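
A toy sketch of this generative story, assuming the document's topic mixture and each topic's word distribution are already known. All probabilities below are invented for illustration, and `generate_tags` is a hypothetical helper; fitting these distributions from data (the inference step) is not shown.

```python
import random

def generate_tags(topic_mix, topic_words, n_tags, rng):
    """For each tag: draw a topic z from the document's mixture,
    then draw a word t from that topic's word distribution."""
    topics = list(topic_mix)
    generated = []
    for _ in range(n_tags):
        z = rng.choices(topics, weights=[topic_mix[k] for k in topics])[0]
        words = list(topic_words[z])
        t = rng.choices(words, weights=[topic_words[z][w] for w in words])[0]
        generated.append((z, t))
    return generated

# Invented distributions for one document (r) and two topics.
topic_mix = {"travel": 0.7, "maps": 0.3}
topic_words = {
    "travel": {"travel": 0.5, "flights": 0.3, "airline": 0.2},
    "maps": {"map": 0.6, "maps": 0.3, "world": 0.1},
}
sample = generate_tags(topic_mix, topic_words, 5, random.Random(0))
print(sample)
```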

Page 30:

Learned Topics

High probability words in each topic:
•  travel, flights, airline, flight, airlines, guide, aviation, …
•  map, maps, world, earth, latitude, longitude, directions, address, geography, distance, zip, usa, gmaps, atlas, …
•  video, download, bittorrent, p2p, youtube, media, torrent, torrents, movies, …

Page 31:

Apply LDA to Tagging

[Figure: LDA applied to tagging, with each resource treated as a document and its tags as words, grouped into concepts such as “Animal”, “Car”, and “Flower”]

Page 32:

Application to Resource Discovery

•  Resource discovery
   •  Given a seed source, find other data sources that provide the same functionality
   •  e.g., find geocoders like http://geocoder.us, which returns geographic coordinates of a specified US address
•  Benefits
   •  Increase robustness of II applications
      •  If http://geocoder.us fails, substitute with another source
   •  Increase coverage of II applications
      •  http://geocoder.ca geocodes US AND Canadian addresses

Page 33:

Source Discovery and Modeling [Ambite et al, 2009]

[Figure: pipeline of discovery, invocation & extraction, semantic typing, and source modeling]
•  Background knowledge: seed URL; sample input values; patterns; domain types; definition of known sources; sample values
•  Labels in the figure: anotherWS, unisys, http://wunderground.com, “90254”
•  Semantic typing output: unisys(Zip,Temp,Humidity,…)
•  Source modeling output: unisys(Zip,Temp,…) :- weather(Zip,…,Temp,Hi,Lo)

Page 34:

Exploiting Social Annotation for Resource Discovery

Approach: Use topic modeling of social annotation obtained from Delicious to find sources similar to a given seed URL

[Figure: processing pipeline]
•  Obtain annotation corpus from Delicious: seed URL, candidates (users, tags, URLs)
•  Probabilistic learning model, e.g., LDA, to learn concepts
•  URL’s distribution over concepts
•  Compute URL similarity
•  Rank by similarity to seed

Page 35:

Corpus of Annotated Resources

•  Crawling strategy
   •  For each seed, retrieve the 20 popular tags
   •  For each tag, retrieve sources annotated with same tag
   •  For each source, retrieve all tags
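
The three crawling steps can be sketched as nested loops. The lookup functions passed in (`popular_tags`, `resources_for_tag`, `tags_for_resource`) are hypothetical stand-ins for whatever data access is available; here they read from a tiny in-memory table rather than any real Delicious interface, and http://example.org/atlas is a made-up source.

```python
def build_corpus(seed, popular_tags, resources_for_tag, tags_for_resource, top_n=20):
    """Seed -> its popular tags -> sources sharing a tag -> all tags of each source."""
    corpus = {}
    for tag in popular_tags(seed)[:top_n]:
        for resource in resources_for_tag(tag):
            if resource not in corpus:
                corpus[resource] = tags_for_resource(resource)
    return corpus

# Hypothetical annotation table standing in for the bookmarking site.
TAGS = {
    "http://geocoder.us": ["geocoding", "maps"],
    "http://geocoder.ca": ["geocoding", "canada"],
    "http://example.org/atlas": ["maps"],
    "http://wunderground.com": ["weather"],
}

corpus = build_corpus(
    "http://geocoder.us",
    popular_tags=lambda url: TAGS[url],
    resources_for_tag=lambda tag: [u for u, ts in TAGS.items() if tag in ts],
    tags_for_resource=lambda url: TAGS[url],
)
print(sorted(corpus))
```

The weather site shares no tag with the seed, so it never enters the corpus.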

Page 36:

Topic Modeling of Social Annotations

•  Use LDA to learn 80 topics in each corpus
•  Each URL’s distribution over topics is used to compute the similarity of a target URL to the seed
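
The slides do not say which similarity measure compares topic distributions, so the sketch below assumes Jensen-Shannon divergence (lower means more similar) purely for illustration. The three-topic distributions and the candidate URL http://example.org/video are invented.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_by_similarity(seed, candidates):
    """Order candidate URLs from most to least similar to the seed."""
    return sorted(candidates, key=lambda url: js_divergence(seed, candidates[url]))

# Invented topic distributions (3 topics instead of 80, for readability).
seed = [0.8, 0.1, 0.1]
candidates = {
    "http://geocoder.ca": [0.7, 0.2, 0.1],
    "http://example.org/video": [0.1, 0.1, 0.8],
}
ranking = rank_by_similarity(seed, candidates)
print(ranking)
```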

Page 37:

Source Discovery Results

•  Manually label top 100 ranked URLs by similarity to seed URL
•  Compare to Google’s “find similar URLs” functionality

Page 38:

Source Discovery Results

Page 39:

Discussion

•  Users express their knowledge through the tags they create while annotating content
•  Apply document modeling techniques to social annotations data
   •  Infer hidden topics in annotated data
   •  Use topics for source discovery task
      •  Outperforms standard Web search
•  Next – Extract more complex types of knowledge from social annotations
   •  Sentiment
   •  Folksonomies

