Date post: | 01-Dec-2014 |
Category: |
Technology |
Upload: | giulianovesci |
Introduction Definition of the problem Techniques for Expertise Retrieval Tests Conclusions
Expert Finding in Social Networks
Matteo Silvestri Giuliano Vesci
Politecnico di Milano
25-07-2012
Silvestri, Vesci: Expert Finding in Social Networks 1 / 25
Outline
1 Introduction
2 Definition of the problem
3 Techniques for Expertise Retrieval
4 Tests
5 Conclusions
Expert Finding in Social Networks
There are several problems that require looking for expert users inside online social networks. For example:
friends who are experts in cinema
friends who know about a particular disease
friends able to use a particular technology
The task of finding experts able to answer specific information needs is called expert finding.
In particular, we studied this problem in the human computation context of CrowdSearcher, an approach that bridges conventional search experiences and crowdsourcing.
Problem: assign CrowdSearcher tasks to expert users
Research Questions
Can the analysis of social actions (e.g. posts, tweets, interaction with social groups, etc.) help in providing a better characterization of users for search tasks?
Is the combined use of information from several social networks useful to better characterize a user?
Among the available approaches to expert finding, which one is best suited to the context of social networks?
Are social networks oriented toward specific domains of expertise?
Goal: methodologies and tools for selecting the best experts from a set of trusted users in (multiple) social networks.
Definition of the problem
Automatically reach experts for crowdsourced queries:
Given a query q and a set CE = {ce_1, ce_2, ..., ce_m} of social users that are candidate experts, find an ordered subset S(CE) ⊂ CE of the n users with the highest scores score(q, ce_i).
[Figure: the query q and the candidate set CE are fed to Score(q, ce_i), which outputs the ranked subset S(CE)]
Estimating the scoring function score(q, ce_i) is the main task of this work.
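The selection step in the problem definition can be sketched in a few lines, assuming the scoring function is already available. The function name `top_experts` and the toy scores are hypothetical and only illustrate the ranking; they are not taken from the presentation.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

def top_experts(query: str,
                candidates: Iterable[str],
                score: Callable[[str, str], float],
                n: int) -> List[Tuple[str, float]]:
    """Return the n candidates with the highest score(q, ce), best first."""
    scored = ((ce, score(query, ce)) for ce in candidates)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

# Hypothetical fixed scores standing in for a real score(q, ce).
toy_scores = {"alice": 0.7, "bob": 0.2, "carol": 0.9}
ranking = top_experts("php strings", toy_scores,
                      lambda q, ce: toy_scores[ce], n=2)
```

Everything that follows in the deck is about how to estimate `score`; the ranking itself is a plain top-n selection.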
Social Network Characterization
Two types of social information characterize users:
explicit information: static profiles
implicit information: social dynamic activities
Social network users can perform several activities and publish informative material, which we call resources.
The idea is to collect evidence of expertise from multiple resources associated with a candidate.
Resources Levels
Resources are related to the user through a path in the graph. We consider resources connected to a user through a path of length ≤ 2.
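[Figure: resource graph centered on the user ALICE, with a Group node. Edge labels: owns / creates, annotates (likes), relatesTo (second label cut off in the source: "belongs"), contains, creates. Resources are grouped into Level 0, Level 1 and Level 2.]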
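Collecting the resources within path length 2 of a user amounts to a bounded breadth-first search. The sketch below assumes the social graph is available as a simple adjacency dictionary; the node names are hypothetical, and treating the hop distance as the slide's resource level is an assumption rather than something the figure confirms.

```python
from collections import deque
from typing import Dict, List

def resources_by_distance(graph: Dict[str, List[str]],
                          user: str,
                          max_hops: int = 2) -> Dict[str, int]:
    """Bounded BFS: map every node within max_hops of `user` to its hop distance."""
    dist = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the allowed path length
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[user]  # the user is not a resource
    return dist

# Hypothetical toy graph: alice creates a post and belongs to a group.
toy_graph = {
    "alice": ["post_1", "group_a"],
    "group_a": ["post_2"],
    "post_2": ["comment_1"],
}
reachable = resources_by_distance(toy_graph, "alice")
```

Here `comment_1` is three hops away and is therefore left out.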
Analysis
Resources have to be analyzed to infer expertise information
[Figure: analysis pipeline with the stages Crawling (API), URL Extraction, Language Detection, Named Entity Extraction, Text Preprocessing]
Crawling: extraction of the resources' textual content through the social networks' APIs.
URL Extraction: fetch the content of any linked external websites and append it to the resource's text.
Language Detection: mixing different languages in the same index is not recommended in information retrieval systems, so we detect the language of each resource.
Named Entity Extraction: extraction of entities such as people, cities and movies.
Text Preprocessing: we remove stop words (common words), filter out HTML tags and perform stemming.
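The text preprocessing stage can be illustrated with a small standard-library sketch. The stop-word list is a tiny illustrative sample, and `naive_stem` is a toy suffix stripper standing in for a real stemmer (the slides do not say which stemmer was used).

```python
import re
from typing import List

# Tiny illustrative stop-word list (assumption, not the list used in the work).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}
TAG_RE = re.compile(r"<[^>]+>")       # crude HTML tag matcher
TOKEN_RE = re.compile(r"[a-z0-9]+")   # lowercase alphanumeric tokens

def naive_stem(token: str) -> str:
    """Toy suffix stripper, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> List[str]:
    """Strip HTML tags, lowercase, tokenize, drop stop words, stem."""
    text = TAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("<p>Watching the <b>movies</b> of Fellini in Milan</p>")
```

The output of this step is the term vector material that the retrieval models index.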
Model 1: Resource Based
[Figure: Query, Resource and Candidate nodes; Query and Resource are linked by score(q, r), Resource and Candidate by weight(r, c)]
This model treats resources as documents in a classic vector space model. Resources are represented both as term vectors and as entity ID vectors.
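1 First, the similarity between the query and resources is computed:
$$ \mathrm{score}(q, r) = \alpha \cdot \sum_{t \in q} \left( \mathrm{tf}(t, r) \cdot \mathrm{idf}(t)^2 \right) + \beta \cdot \sum_{e \in q} \left( \mathrm{tf}(e, r) \cdot \mathrm{idf}(e)^2 \cdot \mathrm{eConf}(e, r) \right) $$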
2 Then, users related to the best resources are extracted as possible experts:
$$ \mathrm{score}(q, ce) = \sum_{r_i \in S(R)} \frac{\mathrm{score}(q, r_i)}{\max_{r_j \in S(R)} \mathrm{score}(q, r_j)} \cdot \mathrm{weight}(r_i, ce) $$
By varying α and β, we obtain three matching methods:
Mixed: α > 0, β > 0
TextOnly: α = 1, β = 0
EntityOnly: α = 0, β = 1
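The two steps of the resource-based model can be sketched as follows. This is a minimal illustration under stated assumptions: tf is taken as a raw count, the idf values are passed in precomputed, and the data layout (dictionaries with `terms`, `entities` and `econf` keys, plus a list of resource/candidate links) is hypothetical.

```python
from typing import Dict, List, Tuple

def score_resource(q_terms: List[str], q_entities: List[str], res: Dict,
                   idf_t: Dict[str, float], idf_e: Dict[str, float],
                   alpha: float = 1.0, beta: float = 2.0) -> float:
    """score(q, r): text part weighted by alpha plus entity part weighted by beta."""
    text = sum(res["terms"].get(t, 0) * idf_t.get(t, 0.0) ** 2 for t in q_terms)
    ent = sum(res["entities"].get(e, 0) * idf_e.get(e, 0.0) ** 2
              * res["econf"].get(e, 1.0) for e in q_entities)
    return alpha * text + beta * ent

def score_candidates(resource_scores: Dict[str, float],
                     links: List[Tuple[str, str, float]],
                     window: int = 50) -> Dict[str, float]:
    """score(q, ce): sum of max-normalized resource scores times weight(r, ce),
    restricted to the `window` best resources S(R)."""
    ranked = sorted(resource_scores.items(), key=lambda kv: kv[1], reverse=True)
    top = dict(ranked[:window])
    best = max(top.values(), default=0.0)
    totals: Dict[str, float] = {}
    if best <= 0:
        return totals
    for rid, cand, weight in links:
        if rid in top:
            totals[cand] = totals.get(cand, 0.0) + top[rid] / best * weight
    return totals

# Hypothetical toy data.
idf_t = {"php": 2.0, "string": 1.0}
idf_e = {"PHP": 2.0}
r1 = {"terms": {"php": 1, "string": 2}, "entities": {"PHP": 1}, "econf": {"PHP": 1.5}}
r2 = {"terms": {"string": 1}, "entities": {}, "econf": {}}
q_terms, q_entities = ["php", "string"], ["PHP"]
resource_scores = {
    "r1": score_resource(q_terms, q_entities, r1, idf_t, idf_e),
    "r2": score_resource(q_terms, q_entities, r2, idf_t, idf_e),
}
# (resource, candidate, weight(r, ce)) triples.
links = [("r1", "alice", 1.0), ("r2", "bob", 1.0), ("r2", "alice", 0.2)]
candidate_scores = score_candidates(resource_scores, links)
```

Setting `beta=0` or `alpha=0` in `score_resource` gives the TextOnly and EntityOnly variants.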
Model 2: User Based
[Figure: Query, Domain Expertise, Candidate, Entity and Resource nodes, with the labels score(q, ce), s(d, e), s(d, r) and s(d, ce)]
We refer to about 70 Freebase domains such as sports, location,education, book, comics, videogames, tv.
For each entity e in a resource, a score s(d, e) is computed, denoting how much the entity is related to a domain of expertise d:
$$ s(d, e) = \frac{\sum_{j \in I(d)} \frac{1}{\log_2(1 + j)}}{\sum_{i=1}^{v} \frac{1}{\log_2(1 + i)}} $$
Then a similar score s(d, r) is computed for each resource, given all the entities in the resource related to the domain d:
$$ s(d, r) = \sum_{e \in E(r)} s(d, e) \cdot \mathrm{rel}(e, r) $$
where rel(e, r) is a measure of the relevance of the entity in the resource.
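The entity and resource scores can be sketched as below. The slides do not define I(d) or v, so this sketch makes an explicit assumption: each entity comes with a ranked list of v domain labels, and I(d) is the set of 1-based positions in that list whose label equals d. The data layout is hypothetical.

```python
import math
from typing import List, Tuple

def entity_domain_score(ranked_domains: List[str], domain: str) -> float:
    """s(d, e) under the assumption that I(d) holds the rank positions of `domain`."""
    v = len(ranked_domains)
    if v == 0:
        return 0.0
    norm = sum(1.0 / math.log2(1 + i) for i in range(1, v + 1))
    hit = sum(1.0 / math.log2(1 + j)
              for j, d in enumerate(ranked_domains, start=1) if d == domain)
    return hit / norm

def resource_domain_score(entities: List[Tuple[List[str], float]],
                          domain: str) -> float:
    """s(d, r): sum over the entities of the resource of s(d, e) * rel(e, r).
    Each entity is given as (ranked domain labels, rel(e, r))."""
    return sum(entity_domain_score(ranked, domain) * rel
               for ranked, rel in entities)

# Hypothetical entity whose ranked labels are film, tv, film.
s_film = entity_domain_score(["film", "tv", "film"], "film")
s_tv = entity_domain_score(["film", "tv", "film"], "tv")
s_res = resource_domain_score([(["film", "tv", "film"], 0.5)], "film")
```

With this reading, the scores of one entity over all domains sum to 1, and labels ranked higher contribute more.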
Model 2: User Based - User/Domain Matrix
Finally, the score s(d, ce) is computed for each candidate expert/domain pair, to build a model of the users as a matrix CE × D:
$$ s(d, ce) = \frac{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce) \cdot s(d, r)}{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce)} $$
                    Sport  Music  TV    Education  Movies  ...
Candidate Expert 1  .033   .012   .068  .037       .034    ...
Candidate Expert 2  .057   .056   .000  .019       .018    ...
Candidate Expert 3  .086   .044   .000  .059       .074    ...
...                 ...    ...    ...   ...        ...     ...
For each query, s(d, q) is computed in the same way as for resources.
Using the expertise matrix, the score of a user is computed as:
$$ \mathrm{score}(q, ce) = \mathrm{expertise}(q) \cdot \mathrm{expertise}(ce) = \sum_{d \in D(q)} s(d, q) \cdot s(d, ce) $$
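A short sketch of the user-based scoring: a weighted average builds one matrix row per candidate, and a dot product over the query's domains ranks the candidates. The matrix rows are the Sport and Music values from the table above; the query vector and the resource data for the weighted average are hypothetical.

```python
from typing import Dict, List, Tuple

def candidate_domain_scores(
        resources: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """s(d, ce): weighted average of s(d, r) over a candidate's resources.
    Each item is (weight(r, ce), {domain: s(d, r)})."""
    total_w = sum(w for w, _ in resources)
    if total_w == 0:
        return {}
    acc: Dict[str, float] = {}
    for w, domains in resources:
        for d, s in domains.items():
            acc[d] = acc.get(d, 0.0) + w * s
    return {d: v / total_w for d, v in acc.items()}

def score(query_domains: Dict[str, float], row: Dict[str, float]) -> float:
    """score(q, ce): dot product restricted to the domains of the query."""
    return sum(s_q * row.get(d, 0.0) for d, s_q in query_domains.items())

# Two columns of the user/domain matrix shown in the table.
matrix = {
    "Candidate Expert 1": {"Sport": 0.033, "Music": 0.012},
    "Candidate Expert 2": {"Sport": 0.057, "Music": 0.056},
    "Candidate Expert 3": {"Sport": 0.086, "Music": 0.044},
}
query_vec = {"Sport": 0.8, "Music": 0.2}  # hypothetical s(d, q) values
ranking = sorted(matrix, key=lambda ce: score(query_vec, matrix[ce]), reverse=True)

# Hypothetical resources of one candidate: a weight-1 and a weight-0.2 resource.
row = candidate_domain_scores([(1.0, {"Sport": 0.4}),
                               (0.2, {"Sport": 0.1, "Music": 0.5})])
```

Because the matrix is built offline, answering a query only needs one dot product per candidate.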
Experimental Setup
Dataset built through a recruitment campaign:
                    Facebook  Twitter  LinkedIn
#Users              39        23       28
#English Resources  107,956   33,022   11,486
#Italian Resources  124,537   14,038   4,133
#Total Resources    232,493   47,060   15,619
Test suite of 30 information needs, or queries, involving various domains:
Which PHP function can I use to obtain the length of a string?
Can you list some restaurants in Milan?
Ground truth: graded relevance judgments of users' expertise are obtained from the users themselves through an online questionnaire
Tests - Resource based configurations comparison
Model type      Level  Entity       MAP    MRR    NDCG   NDCG@10
Resource Based  0      text only    .2034  .6264  .2963  .3183
Resource Based  0      entity only  .0454  .2500  .0731  .0821
Resource Based  0      mixed        .2026  .6014  .2832  .3020
Resource Based  1      text only    .3330  .8048  .4348  .4542
Resource Based  1      entity only  .2767  .8050  .3807  .4059
Resource Based  1      mixed        .3150  .8000  .4272  .4335
Resource Based  2      text only    .2932  .8111  .4338  .4448
Resource Based  2      entity only  .3363  .8122  .4485  .4292
Resource Based  2      mixed        .3245  .8444  .4454  .4581
The data shown in the table were obtained considering:
English resources
as relevant users, the ones above the average, for each query
eConf(e, r) = 1 + tagMeScore(e, r)
top 50 resources
for the mixed matching method: α = 1, β = 2
weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2
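The table reports MAP, MRR, NDCG and NDCG@10 without spelling out their definitions. The sketch below uses common textbook definitions, which is an assumption: the slides do not state which variants were used (for instance, NDCG here uses a linear gain and a log2 discount). MAP and MRR are the means of the per-query average precision and reciprocal rank.

```python
import math
from typing import Optional, Sequence

def average_precision(rels: Sequence[int]) -> float:
    """AP for one ranked list of binary relevance flags (1 = relevant)."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0 if there is none."""
    for k, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / k
    return 0.0

def ndcg(gains: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG with linear gain and log2 discount, optionally cut at rank k."""
    def dcg(values: Sequence[float]) -> float:
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))
    cut = list(gains[:k]) if k else list(gains)
    ideal = sorted(gains, reverse=True)[:len(cut)]
    best = dcg(ideal)
    return dcg(cut) / best if best > 0 else 0.0

# One hypothetical query: users ranked by the system, with binary and graded judgments.
ap = average_precision([0, 1, 1, 0])
rr = reciprocal_rank([0, 1, 1, 0])
quality = ndcg([0, 2, 1, 0])
```

Averaging `ap` and `rr` over the 30 queries of the test suite would give MAP and MRR figures comparable in kind to those in the table.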
Tests - Resources window
Another experiment varied the number of resources considered in the score. We call this number the window size.
For simplicity, we only considered the Lv2-Mixed and Lv1-TextOnly configurations.
Considering more resources increases system quality up to the 3-4% mark. After that, the curves stabilize: increasing the window size does not lead to significantly better results.
Tests - User based
Model type  Level  MAP    MRR    NDCG   NDCG@10
User Based  0      .3685  .7603  .4907  .4332
User Based  1      .3546  .7306  .4990  .4526
User Based  2      .3424  .8178  .4770  .4288
Table: Overall comparison, user-based model
The data shown in the table were obtained considering:
English resources
as relevant users, the ones above the average, for each query
top 20 users
weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2
Tests - User/Resource based models comparison
The two models presented are evaluated in terms of result quality and performance.
We considered the best configuration for each: Lv2-Mixed for the resource-based model and Lv1 for the user-based model.
Tests - User/Resource based models comparison
The index size is shown on a logarithmic scale: indexing expertise as a pre-built user-domain matrix provides a clear advantage.
For the resource-based model, the query time is linear in the window size, while it is constant for the user-based one.
Tests - Verticalization
An additional experiment considers only the resources of a single domain and channel.
For simplicity, we only considered the Lv2-Mixed configuration, with the window size fixed to 50.
Domain \ Channel  Facebook  Twitter  LinkedIn
computer eng.     .2112     .5858    .4472
location          .1852     .3549    .2033
movies & tv       .2794     .4296    .1578
music             .2868     .4229    .2672
science           .1827     .4260    .3827
sport             .2856     .4225    .1933
tech. & games     .2297     .4186    .2052
All domains       .2526     .4296    .2670
Table: MAP
Domain \ Channel  Facebook  Twitter  LinkedIn
computer eng.     .5038     .7014    .4904
location          .4423     .4172    .3517
movies & tv       .4460     .4960    .2028
music             .3957     .4631    .4226
science           .3004     .4366    .4977
sport             .5497     .4092    .3298
tech. & games     .3641     .4545    .2352
All domains       .4415     .4791    .3473
Table: NDCG@10
Conclusions
We classified resources into two main classes: static resources and dynamic resources.
We adopted and extended two expert-finding models.
The analysis of social activities can help to better characterize the expertise of users.
Using multiple social networks can greatly improve the representation of a user for expert-finding purposes, but for specific domains it is better to focus on a single platform.
Open questions
Exploiting the social graph to improve expert retrieval
Domain-specific queries require a less general approach
Example: geolocated queries
Questions & Answers