Personalized Social Media Search through Augmented Folksonomy Graph

Qing LI

Department of Computer Science

Multimedia Software Engineering Research Centre

City University of Hong Kong

Outline

• Background

• Motivation

• Methodology

• Experiment

• Conclusion

Background

• A huge amount of resources is available on the World Wide Web

• Users can contribute a lot more since Web 2.0

• User profiling: assists users in finding what they want

• Folksonomy/Collaborative tagging system: Large popularity on the Web

• Users can contribute on resource descriptions

• Users can annotate resources by tags from their perspectives

• A resource can be annotated by users collaboratively

Resources: web pages, photos, music, books, videos

Background: Collaborative tagging systems

• Folksonomy/Collaborative tagging system: Large popularity on the Web

• Definition: A folksonomy F is a tuple

F = (U, T, R, Y), Y ⊆ U × T × R

where U, T and R are the sets of users, tags and resources (web pages, photos, music, books, videos), and Y is the set of tag assignments.


Example: two users, Tom and Bob (a coffee fan and a programmer), both search for “Java” on www.delicious.com.

Most current resource search tools depend only on matching the query against the resource descriptions, with no personalization, so both users receive the same results.


- Modeling users by tags (user profiling): the tags the user has used for annotation

- Modeling resources by tags (resource profiling): the tags given to the resource by users

Example tag profiles:

Spicy, pork, …, delicious

Beef, lily, …, healthy

Acetous, chop, …, fried

Acetous, sugary, …, high-calorie
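The two kinds of profile can be sketched as simple tag counts built from tag assignments. This is a minimal illustration, not the weighting used in the talk: the user names, resource names and raw-count weighting are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical tag assignments (user, tag, resource), i.e. elements of Y.
assignments = [
    ("tom", "spicy", "dish1"),
    ("tom", "pork", "dish1"),
    ("tom", "spicy", "dish2"),
    ("bob", "healthy", "dish2"),
    ("bob", "beef", "dish2"),
]


def build_profiles(assignments):
    """Return (user_profiles, resource_profiles), each mapping an id to tag counts."""
    user_profiles = defaultdict(Counter)
    resource_profiles = defaultdict(Counter)
    for user, tag, resource in assignments:
        user_profiles[user][tag] += 1          # tags the user has used
        resource_profiles[resource][tag] += 1  # tags users gave the resource
    return dict(user_profiles), dict(resource_profiles)


user_profiles, resource_profiles = build_profiles(assignments)
```

Raw counts correspond to a plain term-frequency profile; other weightings could be substituted in the two update lines.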


Personalized Resource Search Framework

• M. G. Noll et al. (ISWC ’07) (TF): Web Search Personalization via Social Bookmarking and Tagging

• S. Xu et al. (SIGIR ’08) (TF-IDF): Exploring Folksonomy for Personalized Search

• D. Vallet et al. (ECIR ’10) (BM25): Personalizing Web Search with Folksonomy-based User and Document Profiles

• Y. Cai & Q. Li (CIKM ’10) (NTF): Personalizing Search by Tag-based User Profile and Resource Profile in Collaborative Tagging Systems

• F. Abel et al. (ESWC ’11) (CDW): Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web

Figure: the personalized resource search framework. The input has a query and a user profile. Resources are ranked by content relevance (similarity to the query) and by user interest relevance (similarity to the user profile); the two ranks are aggregated into a personalized rank.
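A minimal sketch of this framework, assuming cosine similarity for both relevance scores and a linear combination for the aggregation step. The mixing weight `gamma`, the profiles and the resource names are hypothetical and not taken from the slides.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse tag -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalized_rank(query, user_profile, resource_profiles, gamma=0.5):
    """Rank resources by gamma * content relevance + (1 - gamma) * interest relevance.

    gamma is an assumed mixing weight; the slides only say the ranks are aggregated.
    """
    scores = {}
    for rid, profile in resource_profiles.items():
        content = cosine(query, profile)          # query vs. resource profile
        interest = cosine(user_profile, profile)  # user profile vs. resource profile
        scores[rid] = gamma * content + (1 - gamma) * interest
    return sorted(scores, key=scores.get, reverse=True)


resources = {
    "java_tutorial": {"java": 3, "programming": 2},
    "java_coffee": {"java": 3, "coffee": 2},
}
query = {"java": 1}
programmer = {"programming": 5, "code": 2}
coffee_fan = {"coffee": 4, "espresso": 1}

ranking_programmer = personalized_rank(query, programmer, resources)
ranking_coffee_fan = personalized_rank(query, coffee_fan, resources)
```

Both resources match the query “java” equally well, so the user profile alone decides the order for each user.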

- User communities can help address the problem of conflicting tags and so provide more precise search results.

Motivation


Existing community mining methods are mainly based on the ternary relationships (tripartite graph) among users, tags and resources.

- Wu et al. (WWW ’06): Exploring social annotations for the semantic web

- Ma et al. (TMM ’10): Bridging the semantic gap between image contents and tags

- Xie et al. (JCST ’12): Community-aware resource profiling for personalized search in folksonomy

Problems:

-- Sparsity

-- Noise

What have we missed?

- Content similarity between resources, e.g., visual similarity (images), acoustic similarity (audio)

- Semantic similarity between tags, e.g., the “is-a” relationship (apple & fruit)

We first propose a method for discovering latent user communities via the MFG (Multi-facet Folksonomy Graph).

The MFG includes not only the conventional tripartite user-resource-tag relations, but also tag-to-tag semantic similarities and resource-to-resource content similarities.

Q: How can these two kinds of similarity be integrated with the original ternary relations (user-resource-tag)?

A: The Augmented Folksonomy Graph!

Methodology

Methodology: Augmented Folksonomy Graph (AFG)

Image-to-Image Visual Similarity

The visual similarity between two resource images is the cosine similarity between their global features (e.g. color correlogram, color histogram, edge direction histogram, wavelet texture), where each image is represented by its feature vector.
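In standard notation, cosine similarity between two images reads as follows. The symbols are assumptions rather than the slide's own: $r_i$ and $r_j$ denote two images, and $\mathbf{f}_i$ and $\mathbf{f}_j$ their feature vectors.

```latex
\mathrm{sim}_{v}(r_i, r_j) =
  \frac{\mathbf{f}_i \cdot \mathbf{f}_j}
       {\lVert \mathbf{f}_i \rVert \, \lVert \mathbf{f}_j \rVert}
```

For non-negative features such as histograms the value lies in [0, 1], with 1 for vectors pointing in the same direction.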


Tag-to-Tag Semantic Similarity

Semantic similarity between two tags (e.g. Lin’s similarity on WordNet) is defined in terms of:

- the synset (i.e. the set of synonyms) corresponding to a tag tx;

- the lowest super-ordinate of the two synsets (i.e. their most specific common ancestor in the concept hierarchy), whose information content is used;

- the probability of encountering an instance of a synset.
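Lin's measure is commonly written as sim(a, b) = 2 log p(lso(a, b)) / (log p(a) + log p(b)). The sketch below applies that standard form to a toy hierarchy. The hierarchy, the counts and the treatment of each tag as a single concept are assumptions made for illustration; a real system would look the tags up in WordNet.

```python
import math

# Hypothetical toy concept hierarchy (child -> parent) with a single root.
parent = {"apple": "fruit", "banana": "fruit", "fruit": "food",
          "bread": "food", "food": None}
# Hypothetical corpus counts of each concept on its own.
counts = {"apple": 4, "banana": 4, "fruit": 2, "bread": 5, "food": 1}


def ancestors(concept):
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path


def probability(concept):
    """Probability of encountering the concept or any of its descendants."""
    total = sum(counts.values())
    subsumed = sum(n for c, n in counts.items() if concept in ancestors(c))
    return subsumed / total


def lowest_superordinate(a, b):
    """Most specific common ancestor; always exists in a single-rooted tree."""
    ancestors_b = set(ancestors(b))
    for concept in ancestors(a):
        if concept in ancestors_b:
            return concept
    return None


def lin_similarity(a, b):
    denominator = math.log(probability(a)) + math.log(probability(b))
    if denominator == 0:  # both concepts are the root
        return 1.0
    lso = lowest_superordinate(a, b)
    return 2 * math.log(probability(lso)) / denominator


sim_apple_banana = lin_similarity("apple", "banana")
sim_apple_bread = lin_similarity("apple", "bread")
```

Two tags that share only the root have similarity 0, because the root has probability 1 and therefore zero information content.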


Q: How can the distance between two users be measured?

A: Random walk distance, which handles multiple paths and multiple types of edges in a unified measurement.

Methodology: Random Walk Model

Methodology: Random Walk Distance

We use two parameters α and β to control the impact of image-to-image visual similarity and tag-to-tag semantic similarity on the AFG.

Here W_{l+1|l}(j | i) denotes the final transition probability matrix, from vertex i at step l to vertex j at step l+1, which is applied directly to the random walk.
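One way to picture the role of α and β is the sketch below. It is an assumed construction, not the exact formula from the talk: annotation edges get weight 1, image-to-image edges get α times the visual similarity, tag-to-tag edges get β times the semantic similarity, and rows are normalized into transition probabilities. All vertex names and similarity values are hypothetical.

```python
def transition_matrix(nodes, base_edges, visual_sim, tag_sim, alpha, beta):
    """Row-normalized one-step transition probabilities over the augmented graph."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    weights = [[0.0] * size for _ in range(size)]

    def add_edge(a, b, weight):
        i, j = index[a], index[b]
        weights[i][j] += weight
        weights[j][i] += weight

    for a, b in base_edges:             # user-tag, user-resource, tag-resource
        add_edge(a, b, 1.0)
    for (a, b), s in visual_sim.items():
        add_edge(a, b, alpha * s)       # image-to-image visual similarity
    for (a, b), s in tag_sim.items():
        add_edge(a, b, beta * s)        # tag-to-tag semantic similarity

    for row in weights:
        total = sum(row)
        if total > 0:
            row[:] = [w / total for w in row]
    return weights


def walk(matrix, steps):
    """Multi-step transition probabilities, i.e. the matrix raised to `steps`."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = [[sum(result[i][k] * matrix[k][j] for k in range(size))
                   for j in range(size)] for i in range(size)]
    return result


nodes = ["u1", "u2", "t_apple", "t_fruit", "img1", "img2"]
base_edges = [("u1", "t_apple"), ("u1", "img1"), ("t_apple", "img1"),
              ("u2", "t_fruit"), ("u2", "img2"), ("t_fruit", "img2")]
tag_sim = {("t_apple", "t_fruit"): 0.8}
visual_sim = {("img1", "img2"): 0.6}

plain = walk(transition_matrix(nodes, base_edges, {}, {}, 0.0, 0.0), 3)
augmented = walk(transition_matrix(nodes, base_edges, visual_sim, tag_sim, 0.01, 0.01), 3)
```

In the plain tripartite graph the two users share no tag or image and cannot reach each other; the similarity edges connect them, which is how the augmentation can counter sparsity.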

Methodology: Cluster-based User Community Discovery with APC

- Initialize the clusters using a density function

- Repeatedly find a new user that maximizes the user-cluster similarity

- Stop when the intra-cluster sum (the objective function) converges
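The steps above can be read in several ways. One loose sketch follows, which should not be taken as the APC algorithm itself: density is assumed to be a user's total similarity to all others, seeds are chosen to be dense but dissimilar to earlier seeds, and the remaining users are added greedily to the cluster they are most similar to on average. The similarity matrix is hypothetical and its values are assumed to lie in [0, 1].

```python
def average_similarity(sim, user, members):
    """Mean similarity between one user and the current members of a cluster."""
    return sum(sim[user][m] for m in members) / len(members)


def discover_communities(sim, k):
    """Greedy, density-seeded clustering over a symmetric user similarity matrix."""
    n = len(sim)
    # Assumed density function: total similarity to all other users.
    density = [sum(row) - row[i] for i, row in enumerate(sim)]

    # Seed selection: densest user first, then dense users far from earlier seeds.
    seeds = [max(range(n), key=lambda i: density[i])]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(max(
            rest,
            key=lambda i: density[i] * (1 - max(sim[i][s] for s in seeds)),
        ))

    clusters = [[s] for s in seeds]
    unassigned = set(range(n)) - set(seeds)
    # Repeatedly add the (user, cluster) pair with the largest similarity.
    while unassigned:
        user, c = max(
            ((u, c) for u in unassigned for c in range(k)),
            key=lambda uc: average_similarity(sim, uc[0], clusters[uc[1]]),
        )
        clusters[c].append(user)
        unassigned.remove(user)
    return [sorted(c) for c in clusters]


# Hypothetical user similarity matrix with two obvious groups.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.9, 1.0],
]
communities = discover_communities(sim, 2)
```

In practice the user similarity matrix would come from the random walk distance on the AFG rather than being written by hand.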

Application of APC in Search: Tag-based Personalized Search (TBPS, search by terms/tags)

Graph-based TBPS

Find the resource (image) vertex that has the largest sum of probabilities of reaching both the user vertex and all query term (tag) vertices of the query.

Social-based TBPS

Based on the “voting” results of the user community members.

Application of APC in Search: Content-based Personalized Search (CBPS, search by image example)

Graph-based CBPS

Find the resource (image) vertex that has the largest sum of probabilities of reaching both the user vertex and the query example image of the query.

Social-based CBPS

Based on the “voting” results of the user community members.
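The graph-based variants of both search modes share one scoring rule: sum the probabilities of reaching the user vertex and the query vertices. A minimal sketch, assuming the reach probabilities have already been computed by the random walk; the table `reach` and all vertex names are hypothetical.

```python
def graph_based_rank(reach, resources, user, query_vertices):
    """Rank resources by the summed probability of reaching the user and the query.

    reach[r][v] is assumed to hold the probability that a random walk started
    at resource r reaches vertex v.
    """
    scores = {
        r: reach[r][user] + sum(reach[r][q] for q in query_vertices)
        for r in resources
    }
    return sorted(resources, key=lambda r: scores[r], reverse=True)


# Hypothetical reach probabilities for two candidate images.
reach = {
    "img1": {"tom": 0.30, "t_java": 0.20, "img_query": 0.05},
    "img2": {"tom": 0.05, "t_java": 0.25, "img_query": 0.40},
}

# TBPS: the query vertices are tags.
tbps = graph_based_rank(reach, ["img1", "img2"], "tom", ["t_java"])
# CBPS: the query vertex is an example image.
cbps = graph_based_rank(reach, ["img1", "img2"], "tom", ["img_query"])
```

Only the type of the query vertices differs between the two modes, which is why a single graph can serve both.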

Experiment: Dataset

NUS-WIDE data set

- 269,648 resources (images), 5,018 unique tags, and 50,120 users

- Each user has annotated about 5 images on average

- Ground truth for 81 categories (concepts) of the images

- Six types of low-level features are available for these images: 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments and 500-D bag of words based on SIFT descriptions

We split the dataset into a training set (80%) and a test set (20%).

Experiment: Metrics

P@N: precision at N

MRR: mean reciprocal rank

IMP:
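P@N and MRR can be computed as in the sketch below, which uses their standard textbook definitions; the exact variants used in the experiments are not given here, and IMP is not covered. The rankings and relevance sets are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n ranked resources that are relevant."""
    return sum(1 for r in ranked[:n] if r in relevant) / n


def mean_reciprocal_rank(rankings, relevants):
    """Mean over queries of 1 / rank of the first relevant resource (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for position, resource in enumerate(ranked, start=1):
            if resource in relevant:
                total += 1.0 / position
                break
    return total / len(rankings)


rankings = [["a", "b", "c", "d"], ["x", "y", "z"]]
relevants = [{"b", "d"}, {"x"}]

p_at_2 = precision_at_n(rankings[0], relevants[0], 2)
mrr = mean_reciprocal_rank(rankings, relevants)
```

In the first query the first relevant result is at rank 2, and in the second at rank 1, so the reciprocal ranks are 1/2 and 1.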

Experiment: Baselines

- Carmel et al. (JVLDB 2010): Social bookmark weighting for search and recommendation

- Xie et al. (JCST 2012): Community-aware resource profiling for personalized search in folksonomy

- Ma et al. (TMM 2010): Bridging the semantic gap between image contents and tags

Experiment: Performance on TBPS (Graph-based)

Experiment: Performance on TBPS (Social-based)

Experiment: Performance on TBPS (Graph-based & Social-based MRR)

Experiment: Performance on CBPS (Graph-based)

Experiment: Performance on CBPS (Social-based)

Experiment: Performance on CBPS (Graph-based & Social-based MRR)

Experiment: Influence of Parameters (impact of k for social-based methods)

- k from 20 to 80 (or 100): MRR values for all methods increase. Explanation: more user communities lead to a more precise partition of the users.

- k from 80 (or 100) to 120: MRR values for all methods drop. Explanation: the number of users available for constructing user communities becomes smaller as k increases.

- k = 80 (or 100): MRR values for all methods reach their peak. Explanation: the dataset used contains 81 categories.

Experiment: Influence of Parameters (impact of α, β for graph-based methods)

α: the impact of image-to-image visual similarity

β: the impact of tag-to-tag semantic similarity

Impact on Graph-based TBPS

The best MRR (0.5416) is obtained at α = 0.0001, β = 0.01.

The MRR performance is more sensitive to β than to α, because α has little impact on the performance when β = 0, while β is quite influential when α = 0.

Tag-to-tag semantic similarity is more dominant than image-to-image visual similarity in TBPS.

Impact on Graph-based CBPS

The best MRR (0.4832) is obtained at α = 0.01, β = 0.001.

The MRR performance is more sensitive to α than to β, because β has little impact on the performance when α = 0, while α is quite influential when β = 0.

Image-to-image visual similarity is more dominant than tag-to-tag semantic similarity in CBPS.

Experiment: Influence of Alternatives (SimRank vs. Random Walk)

SimRank is based on the concept of structural-context similarity, in which ‘two objects are similar if they relate to similar objects’. For any two user vertices in the AFG, the SimRank value at step l+1 is calculated from the values at step l.

SimRank is therefore also a suitable method for measuring the distance between two vertices in the AFG.

(Jeh, G. and Widom, J. SimRank: A Measure of Structural-Context Similarity. KDD 2002)
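The step-by-step SimRank update can be sketched as follows, using the basic formulation from Jeh and Widom on an undirected graph. The decay constant, the iteration count and the toy graph are assumptions; edge weights such as α and β are ignored here.

```python
def simrank(neighbors, decay=0.8, iterations=10):
    """Iterative SimRank: the scores at step l+1 are computed from those at step l."""
    nodes = list(neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    new[a][b] = 0.0
                else:
                    # Average similarity over all pairs of neighbors, damped by decay.
                    total = sum(sim[i][j] for i in neighbors[a] for j in neighbors[b])
                    new[a][b] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new
    return sim


# Hypothetical undirected graph: u1 and u2 share a tag, u3 shares nothing.
neighbors = {
    "u1": ["t_apple", "img1"],
    "u2": ["t_apple", "img2"],
    "u3": ["t_car"],
    "t_apple": ["u1", "u2"],
    "t_car": ["u3"],
    "img1": ["u1"],
    "img2": ["u2"],
}
scores = simrank(neighbors)
```

Users who annotate with the same tag end up with a positive score, while a user in a disconnected part of the graph keeps a score of 0 with everyone else.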

Experiment: Influence of Alternatives (K-Means vs. APC)

The classical clustering method K-means can also be applied to the user similarity matrix to discover the user communities.

We compare K-means with our clustering method APC.

Conclusion

We have proposed the Augmented Folksonomy Graph (AFG), which incorporates both image-to-image visual similarity and tag-to-tag semantic similarity.

User communities are then discovered by a density-based clustering method based on user similarity.

We have conducted extensive experiments on the NUS-WIDE dataset; the results show that the proposed approach outperforms the baselines in search applications.

Future work:

- Matrix factorization for finding similar users

- Implementing real applications based on the proposed method

Thank you!

Date post:	18-Jan-2016