Social Media: Data Mining in Tagging Data
Prof. Andreas Hotho
Universität Würzburg & Universität Kassel
29.12.2009, Prof. Andreas Hotho
Knowledge and Data Engineering
Knowledge and Data Engineering Group at the University of Kassel
to knowledge management data engineering by jaeschke and 1 other person on 2006-01-27 10:39:07
My research "tag cloud"
Trend Detection
Tag Recommender
Spam
LogSonomies
Semantic
Ranking
Graph Structures
Tag Similarity
Community Detection
Ontology Learning
Information Retrieval
Data Mining
Semantic Web
Social Network Analysis
Statistical Physics
Web 2.0
Machine Learning
Business Intelligence
Definition: Web 2.0
"Web 2.0 is a buzzword that stands for a set of interactive and collaborative elements of the Internet, specifically of the WWW, and, alluding to the version numbers of software products, postulates a distinction from earlier forms of use."
Wikipedia, http://de.wikipedia.org/wiki/Web_2.0
Tim O'Reilly shaped the term with the article "What is Web 2.0" (30 September 2005)
A map of Web 2.0
artwork by R. Munroe http://xkcd.com/
Agenda
[Figure: plot with axes "rank" and "month" for the tags "blog", "css", "design", "linux", "music", "news", "programming", "software", "web"]
Introduction
Tagging
Ranking
Tag similarities
Recommender
Bookmarks on the web
Bookmarks on the web
Tags
User
Resource
Bookmarks for audio streams
Tags
Users
Resource
Bookmarks for photos
Tags
User
Resource
Bookmarks for videos
Tags
User
Resource
Our system: BibSonomy
BibSonomy for sharing bookmarks and managing publication lists
for researchers, for research groups, for projects, ...
http://www.bibsonomy.org
Folksonomies allow users
to assign tags
to resources.
Folksonomies
A folksonomy is a tuple $F := (U, T, R, Y, \prec)$ where
$U$, $T$, and $R$ are finite sets, whose elements are called users, tags, and resources,
$Y \subseteq U \times T \times R$ is called the set of tag assignments,
$\prec \subseteq U \times T \times T$ is a user-specific sub-tag/super-tag relation.
The personomy $P_u$ of user $u$ is the restriction of $F$ to $u$.
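The definition can be made concrete with a minimal sketch. It assumes that tag assignments are stored as plain (user, tag, resource) triples and leaves out the sub-tag/super-tag relation; the toy data and the names `TagAssignment` and `personomy` are hypothetical.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: a (user, tag, resource) triple."""
    user: str
    tag: str
    resource: str


# Hypothetical toy data, not taken from any real system.
Y = {
    TagAssignment("u1", "web", "r1"),
    TagAssignment("u1", "design", "r1"),
    TagAssignment("u2", "web", "r1"),
    TagAssignment("u2", "python", "r2"),
}

# U, T, and R are the finite sets of users, tags, and resources.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}


def personomy(assignments, user):
    """Restrict the tag assignments to a single user."""
    return {y for y in assignments if y.user == user}


P_u1 = personomy(Y, "u1")
```

Here `P_u1` holds only the two assignments made by the hypothetical user `u1`.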
Everyone is tagging…
an easy way to organize resources
immediately useful
However, the vocabulary is uncontrolled.
Evidence for a converging, shared vocabulary (emergent semantics):
shared implicit knowledge
mutual influence among users
underlying network (folksonomy)
Tag
User
Resource
http://xkcd.com/
Dataset
Data from the del.icio.us folksonomy site
Obtained in July 2005 (14 monthly dumps, June 2004 to July 2005)
Consists of
|U| = 75,242 users
|T| = 533,191 tags
|R| = 3,158,297 resources
|Y| = 17,362,212 triples
Power-law distribution in del.icio.us
Tag "unlabeled" occurs 415,950 times
Tag "web" occurs 238,891 times
about 40% of the tags occur exactly once
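Figures like these come from counting how often each tag occurs in the tag assignments. The following is a minimal sketch on hypothetical toy triples; the function names are illustrative and the del.icio.us data itself is not reproduced here.

```python
from collections import Counter


def tag_frequencies(assignments):
    """Count how often each tag occurs in (user, tag, resource) triples."""
    return Counter(tag for _user, tag, _resource in assignments)


def share_of_singletons(freq):
    """Fraction of distinct tags that occur exactly once."""
    if not freq:
        return 0.0
    return sum(1 for count in freq.values() if count == 1) / len(freq)


# Hypothetical toy data.
toy = [
    ("u1", "web", "r1"),
    ("u2", "web", "r2"),
    ("u3", "web", "r3"),
    ("u1", "css", "r1"),
    ("u2", "linux", "r4"),
]

freq = tag_frequencies(toy)
ranked = freq.most_common()  # tags sorted by descending frequency
singleton_share = share_of_singletons(freq)
```

Plotting the counts in `ranked` against their rank on log-log axes is the usual way to inspect whether the distribution looks like a power law.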
Small World
Milgram coined the term "small world": (Stanley Milgram. The small world problem. Psychology Today, 67(1):61–67, 1967.)
Practical experiment in the USA
Any two people in the USA are connected by a very short chain: "six degrees of separation"
Folksonomies exhibit the so-called "small world" properties:
short characteristic path length
high clustering in the graph
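Both properties can be measured directly on a graph. The sketch below assumes an ordinary undirected graph given as an adjacency dictionary; how these measures are adapted to the three-mode structure of a folksonomy is not specified here, so the code only illustrates the two quantities themselves.

```python
from collections import deque
from itertools import combinations


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Mean over all nodes of the fraction of neighbor pairs that are linked."""
    values = []
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            values.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        values.append(linked / (k * (k - 1) / 2))
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy graph: a triangle a-b-c with an extra node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
path_length = characteristic_path_length(toy_graph)
clustering = clustering_coefficient(toy_graph)
```

A small-world graph combines a low value of the first measure with a high value of the second, compared with a random graph of the same size.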
Small World
Agenda
Introduction
Tagging
Ranking
Tag similarities
Recommender
Search in Folksonomies
Search engines need
1. to compute the hits for a query
2. and rank them.
The PageRank algorithm is very successful on the web (see Google):
Authority values are propagated along the hyperlinks according to
$x \leftarrow dAx + (1-d)p$
where $A$ is the row-stochastic adjacency matrix of the web graph (each row of $A$ is normalized to 1), $x$ is the rank vector, $p$ is the random surfer component (may be used as preference vector), and $d \in [0,1]$ is a weighting factor.
If $\|A\|_1 := \|p\|_1 := 1$ and there are no rank sinks, then the computation of a fixed point equals the computation of the first eigenvector of the matrix $dA + (1-d)p\mathbf{1}^T$.
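The fixed point can be approximated by simply repeating the update. The following is a minimal sketch under stated assumptions: the graph is given as a dictionary of out-links, every node has at least one out-link (no rank sinks), each node spreads its weight equally over its out-links (for a row-stochastic $A$ this corresponds to multiplying by its transpose), and `d = 0.85` and the iteration count are illustrative choices, not values from the slides.

```python
def pagerank(out_links, d=0.85, p=None, iterations=100):
    """Approximate the fixed point of x <- d*A*x + (1-d)*p by iteration.

    out_links maps each node to a non-empty list of successor nodes;
    p is the preference vector (uniform if omitted) and must sum to 1.
    """
    nodes = list(out_links)
    if p is None:
        p = {v: 1.0 / len(nodes) for v in nodes}
    x = dict(p)
    for _ in range(iterations):
        new = {v: (1.0 - d) * p[v] for v in nodes}  # random surfer part
        for v in nodes:
            share = d * x[v] / len(out_links[v])    # weight sent per link
            for w in out_links[v]:
                new[w] += share
        x = new
    return x


# Hypothetical toy web graph.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

Passing a non-uniform `p` turns the same routine into a preference-biased ranking, which is the role the preference vector plays in the slides that follow.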
Search in Folksonomies
Folksonomies have a different structure than the web graph.
[Figure: web graph vs. folksonomy, with nodes User 1 to User 4, Tag 1 to Tag 3, and Res 1 to Res 3]
What could a ranking algorithm for this structure look like?
First Approach: Adapted PageRank
1. Split each hyperedge into six directed edges.
2. Iterative weight propagation according to PageRank:
$x \leftarrow dAx + (1-d)p$.
[Figure: the hyperedge (User 1, Tag 1, Res 1) and its split into directed edges]
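The splitting step can be sketched as follows. It is assumed here that each (user, tag, resource) triple contributes one edge in each direction between every pair of its three nodes, and that the edge weight counts how many triples contain both endpoints; the node prefixes `"user"`, `"tag"`, and `"res"` are a hypothetical way to keep the three sets apart.

```python
from collections import Counter


def split_hyperedges(assignments):
    """Map each (user, tag, resource) triple to six directed, weighted edges."""
    weights = Counter()
    for user, tag, resource in assignments:
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        for a, b in ((u, t), (u, r), (t, r)):
            weights[(a, b)] += 1  # forward direction
            weights[(b, a)] += 1  # backward direction
    return weights


# Hypothetical toy data: one user assigns the same tag to two resources.
edges = split_hyperedges([("u1", "t1", "r1"), ("u1", "t1", "r2")])
```

The resulting weighted edge list is what the PageRank-style weight propagation then runs on.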
Ranking in Folksonomies: FolkRank
Problems of folksonomy-adapted PageRank:
dominated by graph structure
undirected: weight flows back (PageRank $\approx$ edge degree)
Differential approach:
compute rank with and without preferences
FolkRank = difference between those rankings normalized to [0,1]
Let $R_{AP}$ be the fixed point with $p = \mathbf{1}$.
Let $R_{pref}$ be the fixed point with $p$ representing the high weights for the preferred items.
$R := R_{pref} - R_{AP}$ is the final weight vector.
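The differential approach can be sketched end to end on toy data. The assumptions are labeled here and in the comments: both preference vectors are normalized to sum to 1, the preferred items receive an extra weight `boost`, `d = 0.7` and the iteration count are illustrative values, and the final normalization to [0,1] is left out.

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected weighted graph over users, tags, and resources."""
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        for a, b in ((u, t), (u, r), (t, r)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def adapted_pagerank(graph, p, d=0.7, iterations=200):
    """Iterate x <- d*A*x + (1-d)*p with p normalized to sum 1 (assumption)."""
    norm = sum(p[v] for v in graph)
    p = {v: p[v] / norm for v in graph}
    x = dict(p)
    for _ in range(iterations):
        new = {v: (1.0 - d) * p[v] for v in graph}
        for v, neighbors in graph.items():
            degree = sum(neighbors.values())
            for w, weight in neighbors.items():
                new[w] += d * x[v] * weight / degree
        x = new
    return x


def folkrank(assignments, preferred, boost=10.0, d=0.7):
    """R := R_pref - R_AP, without the final normalization step."""
    graph = build_graph(assignments)
    uniform = {v: 1.0 for v in graph}
    pref = {v: 1.0 + (boost if v in preferred else 0.0) for v in graph}
    r_ap = adapted_pagerank(graph, uniform, d)
    r_pref = adapted_pagerank(graph, pref, d)
    return {v: r_pref[v] - r_ap[v] for v in graph}


# Hypothetical toy data with a preference for the tag "python".
toy = [
    ("u1", "python", "r1"), ("u1", "code", "r1"),
    ("u2", "python", "r2"), ("u2", "web", "r3"), ("u3", "web", "r3"),
]
scores = folkrank(toy, preferred={("tag", "python")})
top = sorted(scores, key=scores.get, reverse=True)
```

Because the baseline ranking is subtracted, nodes that are merely well connected no longer dominate; only weight gained through the preference remains.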
Results for: “Semantic Web”
PageRank without preference | PageRank with preference | FolkRank with preference
Rankings for "semanticweb"
for discovering semantic relationships, user communities, and web pages
29.12.2009, Gerd Stumme
Trends with respect to tag “politics”
US elections in Nov. 2004
Agenda
Introduction
Tagging
Ranking
Tag similarities
Recommender
Association rules
If a user has tagged resources with tag $t_i$, then he or she has often also used $t_j$ for them.
Application:
recommending tags
learning dependencies (tag hierarchy)
Task: find all rules of the form: customers who bought $i_1, \ldots, i_n$ also bought $j_1, \ldots, j_m$.
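For the simplest case, rules with one tag on each side, the task can be sketched as follows. It is assumed that every post (one user tagging one resource) is treated as a transaction, and the thresholds `min_support` and `min_confidence` are hypothetical values; rules with several items on either side would need a full frequent-itemset algorithm, which is not shown.

```python
from collections import defaultdict
from itertools import permutations


def tag_rules(assignments, min_support=2, min_confidence=0.6):
    """Find rules t_i -> t_j as (t_i, t_j, support count, confidence)."""
    posts = defaultdict(set)  # (user, resource) -> set of tags
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)

    single = defaultdict(int)  # number of posts containing a tag
    pair = defaultdict(int)    # number of posts containing both tags
    for tags in posts.values():
        for tag in tags:
            single[tag] += 1
        for a, b in permutations(tags, 2):
            pair[(a, b)] += 1

    rules = []
    for (a, b), count in pair.items():
        confidence = count / single[a]
        if count >= min_support and confidence >= min_confidence:
            rules.append((a, b, count, confidence))
    return sorted(rules, key=lambda rule: (-rule[3], -rule[2], rule[0], rule[1]))


# Hypothetical toy data.
toy = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"),
    ("u2", "java", "r1"), ("u2", "programming", "r1"),
    ("u3", "python", "r2"), ("u3", "programming", "r2"),
    ("u1", "programming", "r3"),
]
rules = tag_rules(toy)
```

On this toy data only the rule from "java" to "programming" passes both thresholds; the reverse direction fails on confidence, which is the kind of asymmetry that hints at a tag hierarchy.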
Folksonomy Dataset
Del.icio.us crawl 2006:
|U| = 667,128
|T| = 2,454,546
|R| = 18,782,132
|Y| = 140,333,714
Excerpt: 10,000 most popular tags:
|U| = 476,378
|T| = 10,000
|R| = 12,660,470
|Y| = 101,491,722
In the following: tag rank = position in most-popular list:
1: design
2: software
3: blog
4: web
…
social similarity
Semantic relationships between tags in bookmarking systems
Most related tags per original tag:
context
art: graphic creative print portfolios nice
web2.0: web2 web-2.0 webapp "web web_2.0
news: blogs people weblog culture future
howto: how-to guide tutorials help how_to
video: entertainment awesome fun cool random
ajax: dhtml dom js ecmascript webdev
tutorial: tutorials tips coding code examples
javascript: webdevelopment webdev example examples webprogramming
freq
art: design photography illustration blog graphics
web2.0: ajax web tools blog webdesign
news: blog technology politics media daily
howto: tutorial reference tips linux programming
video: music funny tv software media
ajax: javascript web2.0 web programming webdesign
tutorial: howto programming reference design css
javascript: ajax programming css web webdesign
Ciro Cattuto, Dominik Benz, Andreas Hotho, and Gerd Stumme. Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. The Semantic Web - ISWC 2008, 615-631, 2008.
Example of semantic grounding
Original tag: "java"
Most similar tag:
Freq, FolkRank: "programming"
Cosine: "python"
[Figure: the tags mapped into the WordNet synset hierarchy, with nodes computers, programming, languages, design_patterns, java, python]
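The difference between the frequency-based and the cosine-based measure can be reproduced on toy data. The sketch assumes that "freq" means raw co-occurrence counts within posts and that "cosine" compares the co-occurrence vectors of two tags; the data and function names are hypothetical and chosen so that the two measures disagree.

```python
import math
from collections import defaultdict
from itertools import permutations


def cooccurrence(assignments):
    """co[a][b] = number of posts in which tags a and b occur together."""
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    co = defaultdict(lambda: defaultdict(int))
    for tags in posts.values():
        for a, b in permutations(tags, 2):
            co[a][b] += 1
    return co


def cosine(co, a, b):
    """Cosine similarity between the co-occurrence vectors of a and b."""
    va, vb = co[a], co[b]
    dot = sum(weight * vb.get(tag, 0) for tag, weight in va.items())
    norm_a = math.sqrt(sum(weight * weight for weight in va.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_related_by_freq(co, tag):
    return max(co[tag], key=lambda other: (co[tag][other], other))


def most_related_by_cosine(co, tag):
    others = [other for other in co if other != tag]
    return max(others, key=lambda other: (cosine(co, tag, other), other))


# Hypothetical toy data: four posts by four users.
toy = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"), ("u1", "code", "r1"),
    ("u2", "java", "r2"), ("u2", "programming", "r2"),
    ("u3", "python", "r3"), ("u3", "programming", "r3"), ("u3", "code", "r3"),
    ("u4", "programming", "r4"), ("u4", "web", "r4"),
]
co = cooccurrence(toy)
by_freq = most_related_by_freq(co, "java")
by_cosine = most_related_by_cosine(co, "java")
```

On this toy data the co-occurrence count picks the more general tag "programming", while the cosine measure picks "python", a tag used in the same contexts as "java".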
[Figure: shortest paths in WordNet; length of shortest path to most related tag; labels "siblings" and "random"]
Results for delicious together with similarity pruning
Results for delicious together with similarity pruning
Agenda
Introduction
Tagging
Ranking
Tag similarities
Recommender
Personalized Tag Recommendation
Personalized Tag Recommendation
Personalized Tag Recommendation
Datasets
Pruning the graph based on the post degree (compute the p-core at level k, cf. Batagelj and Zaversnik 2002)
Characteristics of the p-cores at level k.
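The pruning step can be sketched as an iterative filter. It is assumed here, since the slide does not spell it out, that the p-core at level k is the largest subset of posts in which every user, every resource, and every tag occurs in at least k posts; a post is one user's set of tags for one resource.

```python
from collections import Counter


def p_core(posts, k):
    """posts maps (user, resource) to a set of tags; returns the pruned posts."""
    posts = {key: set(tags) for key, tags in posts.items()}
    changed = True
    while changed:
        changed = False
        users = Counter(user for user, _resource in posts)
        resources = Counter(resource for _user, resource in posts)
        tags = Counter(tag for post_tags in posts.values() for tag in post_tags)
        pruned = {}
        for (user, resource), post_tags in posts.items():
            kept = {tag for tag in post_tags if tags[tag] >= k}
            if users[user] >= k and resources[resource] >= k and kept:
                pruned[(user, resource)] = kept
            if (user, resource) not in pruned or kept != post_tags:
                changed = True  # something was removed, so check again
        posts = pruned
    return posts


# Hypothetical toy data.
toy_posts = {
    ("u1", "r1"): {"a", "b"},
    ("u2", "r1"): {"a"},
    ("u1", "r2"): {"a", "c"},
    ("u2", "r2"): {"a"},
    ("u3", "r3"): {"a"},
}
core = p_core(toy_posts, k=2)
```

The loop repeats because removing one post or tag can push another user, resource, or tag below the threshold.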
Personalized Tag Recommendation
Results delicious: precision/recall plot for the post core at level 10
[Figure: precision (0 to 1) vs. recall (0 to 1) for FolkRank, Collaborative Filtering UT, most popular tags by resource, Collaborative Filtering UR, adapted PageRank, most popular tags]
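One point of such a precision/recall curve can be computed per recommendation list length. The sketch assumes that a recommender returns a ranked list of tags for a post and that the tags the user actually assigned serve as ground truth; the averaging over all posts and the names used are illustrative.

```python
def precision_recall(recommended, actual):
    """Precision and recall of a set of recommended tags for one post."""
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


def curve(ranked_tags, actual, max_n=5):
    """One (precision, recall) point for each list length 1..max_n."""
    return [precision_recall(ranked_tags[:n], actual)
            for n in range(1, max_n + 1)]


# Hypothetical recommendation and ground truth for a single post.
points = curve(["web", "design", "css", "blog"], {"web", "css"}, max_n=3)
```

Averaging these points over all test posts for each list length gives one curve per recommender.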
Agenda
Introduction
Tagging
Ranking
Tag similarities
Recommender