Expert Finding in Social Networks



Matteo Silvestri Giuliano Vesci

Politecnico di Milano

25-07-2012

Silvestri, Vesci: Expert Finding in Social Networks 1 / 25


Outline

1 Introduction

2 Definition of the problem

3 Techniques for Expertise Retrieval

4 Tests

5 Conclusions



Expert Finding in Social Networks

Several problems require finding expert users inside online social networks. For example:

friends who are experts in cinema

friends who know about a particular disease

friends who are able to use a particular technology

The task of finding experts able to answer specific information needs is called expert finding.

In particular, we studied this problem in the human computation field, within CrowdSearcher, an approach that bridges conventional search experiences and crowdsourcing.

Problem: assign CrowdSearcher tasks to expert users



Research Questions

Can the analysis of social actions (e.g. posts, tweets, interaction with social groups, etc.) help in providing a better characterization of users for search tasks?

Is the combined use of information from several social networks useful to better characterize a user?

Among the available approaches to expert finding, which one is best suited to the context of social networks?

Are social networks oriented toward specific domains of expertise?

Goal: methodologies and tools for selecting the best experts from a set of trusted users in (multiple) social networks.






Definition of the problem

Automatically reach experts for crowdsourced queries:

Given a query q and a set CE = {ce_1, ce_2, ..., ce_m} of social users that are candidate experts, find an ordered subset S(CE) ⊂ CE of n users with the highest scores score(q, ce_i).

[Figure: the query q and the candidate set CE are passed to score(q, ce_i), which produces the ranked subset S(CE).]

Estimating the scoring function score(q, ce_i) is the main task of this work.
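The selection step in the problem definition can be sketched as follows. This is only an illustration: the function name `top_n_experts`, the toy candidates, and the toy scoring function are assumptions, and the real score(q, ce_i) is what the models in the rest of the deck estimate.

```python
from typing import Callable, Iterable

def top_n_experts(
    query: str,
    candidates: Iterable[str],
    score: Callable[[str, str], float],
    n: int,
) -> list[tuple[str, float]]:
    """Return the n candidates with the highest score(query, candidate), best first."""
    scored = [(ce, score(query, ce)) for ce in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Toy data (hypothetical): a fixed score per candidate, independent of the query.
toy_scores = {"alice": 0.7, "bob": 0.2, "carol": 0.9}
ranking = top_n_experts("php strings", toy_scores, lambda q, ce: toy_scores[ce], n=2)
```

With the toy scores above, `ranking` holds carol first and alice second.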



Social Network Characterization

Two types of social information characterize users:

explicit information: static profiles

implicit information: social dynamic activities

Social network users can perform several activities and publish informative material, which we call resources.

The idea is to collect evidence of expertise from multiple resources associated with a candidate.



Resources Levels

Resources are related to the user through a path in the graph. We consider resources connected to a user through a path of length <= 2.

[Figure: example resource graph for the user Alice. Edge labels: owns / creates, annotates (likes), relatesTo (belongs) toward a Facebook Group, contains, creates. Resources are grouped into Level 0, Level 1 and Level 2.]
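A minimal sketch of how resources within a path of length <= 2 could be collected, assuming the social graph is available as an adjacency dictionary. The graph layout, node names, and the function `resources_within` are hypothetical, and the sketch only reports path lengths; it does not assert how they map to the Level 0/1/2 labels of the figure.

```python
from collections import deque

def resources_within(graph, user, max_len=2):
    """Breadth-first search: map each node reachable from `user`
    in at most `max_len` hops to its shortest path length."""
    dist = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if dist[node] == max_len:
            continue  # do not expand beyond the allowed path length
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[user]  # the user is the starting point, not a resource
    return dist

# Toy graph (hypothetical): Alice owns a post and belongs to a group,
# the group contains a post, and that post has a comment.
toy_graph = {
    "alice": ["post_1", "group_a"],
    "group_a": ["post_2"],
    "post_2": ["comment_9"],
}
reachable = resources_within(toy_graph, "alice")
```

Here `comment_9` is three hops away from Alice, so it is left out.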






Analysis

Resources have to be analyzed to infer expertise information

[Figure: analysis pipeline with the stages Crawling (API), URL Extraction, Language Detection, Named Entity Extraction and Text Preprocessing.]

Crawling: extraction of the resources' textual content through the social networks' APIs.

URL Extraction: extract the content of any linked external websites and append it to the resource's text.

Language Detection: mixing different languages in the same index is not recommended in information retrieval systems, so we detect the language of each resource.

Named Entity Extraction: extraction of entities such as people, cities and movies.

Text Preprocessing: we remove stop-words (common words), filter out HTML tags, and perform stemming.
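The text preprocessing stage could look roughly like the sketch below. It is a toy under stated assumptions: the stop-word list is a tiny illustrative sample, and `toy_stem` is a naive suffix stripper standing in for a real stemmer; the slides do not say which stop-word list or stemming algorithm was used.

```python
import html
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the work.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are"}
# Assumption: naive suffix stripping as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def strip_tags(text):
    """Replace HTML tags with spaces, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def toy_stem(token):
    """Drop the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Strip tags, lowercase, tokenize, remove stop-words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", strip_tags(text).lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("<p>The players are <b>winning</b> matches in Milan</p>")
```

The crude output for "winning" ("winn") shows why a proper stemming algorithm is preferable in practice.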



Model 1: Resource Based

[Figure: Query, Resource, Candidate. The query is linked to resources by score(q, r); resources are linked to candidates by weight(r, c).]

It is based on resources, treated as documents in a classic Vector Space Model. Resources are represented both as term vectors and as entity-id vectors.

1 First, the similarity between the query and each resource is computed:

\[
\mathrm{score}(q, r) = \alpha \sum_{t \in q} \mathrm{tf}(t, r) \cdot \mathrm{idf}(t)^2 + \beta \sum_{e \in q} \mathrm{tf}(e, r) \cdot \mathrm{idf}(e)^2 \cdot \mathrm{eConf}(e, r)
\]

2 Then, users related to the best resources are extracted as possible experts:

\[
\mathrm{score}(q, ce) = \sum_{r_i \in S(R)} \frac{\mathrm{score}(q, r_i)}{\max_{r_j \in S(R)} \mathrm{score}(q, r_j)} \cdot \mathrm{weight}(r_i, ce)
\]

Varying α and β yields three matching methods:

Mixed: α > 0, β > 0

TextOnly: α = 1, β = 0

EntityOnly: α = 0, β = 1
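The two steps of the resource based model can be sketched as below. Several things are assumptions made for illustration: tf, idf and eConf values are passed in as precomputed dictionaries (the slides do not give the exact tf or idf definitions), S(R) is taken to be the top `window` resources by score, and all names and toy numbers are hypothetical.

```python
from collections import defaultdict

def resource_score(query_terms, query_entities, tf, idf, e_conf,
                   alpha=1.0, beta=2.0):
    """score(q, r): weighted sum of a term part and an entity part.
    tf, idf and e_conf are lookups for a single resource r."""
    text_part = sum(tf.get(t, 0) * idf.get(t, 0.0) ** 2 for t in query_terms)
    entity_part = sum(
        tf.get(e, 0) * idf.get(e, 0.0) ** 2 * e_conf.get(e, 1.0)
        for e in query_entities
    )
    return alpha * text_part + beta * entity_part

def candidate_scores(resource_scores, weights, window=50):
    """score(q, ce): sum over the top `window` resources of the resource
    score, normalized by the best score, times weight(r, ce).
    weights maps resource id -> {candidate: weight}."""
    top = sorted(resource_scores.items(), key=lambda kv: kv[1], reverse=True)
    top = top[:window]
    best = top[0][1] if top else 0.0
    if best <= 0:
        return {}
    totals = defaultdict(float)
    for rid, s in top:
        for ce, w in weights.get(rid, {}).items():
            totals[ce] += (s / best) * w
    return dict(totals)

# Toy values (hypothetical) for one resource and a query with one entity.
toy_score = resource_score(
    ["php", "string"], ["PHP_entity"],
    tf={"php": 2, "string": 1, "PHP_entity": 1},
    idf={"php": 1.0, "string": 2.0, "PHP_entity": 1.0},
    e_conf={"PHP_entity": 1.5},
)
# Toy resource scores and resource-to-candidate weights (hypothetical).
ranking = candidate_scores(
    {"r1": 4.0, "r2": 2.0, "r3": 1.0},
    {"r1": {"alice": 1.0}, "r2": {"bob": 1.0}, "r3": {"alice": 0.2}},
)
```

Setting `alpha=1, beta=0` or `alpha=0, beta=1` in `resource_score` gives the TextOnly and EntityOnly methods listed above.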



Model 2: User Based

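[Figure: Entity, Resource, Candidate, Domain of Expertise and Query. Entities, resources and candidates are linked to domains by s(d, e), s(d, r) and s(d, ce); the query is matched to candidates by score(q, ce).]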

We refer to about 70 Freebase domains such as sports, location, education, book, comics, videogames, tv.

For each entity e in a resource, a score s(d, e) is computed, denoting how strongly the entity is related to a domain of expertise d:

\[
s(d, e) = \frac{\sum_{j \in I(d)} \frac{1}{\log_2(1 + j)}}{\sum_{i=1}^{v} \frac{1}{\log_2(1 + i)}}
\]

Then a similar score s(d, r) is computed for each resource, given all the entities in the resource related to the domain d:

\[
s(d, r) = \sum_{e \in E(r)} s(d, e) \cdot \mathrm{rel}(e, r),
\]

where rel(e, r) is a measure of the relevance of the entity in the resource.
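A small sketch of the two domain scores. The slide does not define I(d) or v, so the reading used here is an assumption: each entity comes with a ranked list of v items, and I(d) holds the 1-based positions of the items that belong to domain d. Function names and toy values are hypothetical.

```python
import math

def domain_entity_score(positions_in_domain, v):
    """s(d, e): log-discounted mass at the positions I(d),
    normalized by the discounted mass of all positions 1..v."""
    total = sum(1.0 / math.log2(1 + i) for i in range(1, v + 1))
    if total == 0:
        return 0.0
    in_domain = sum(1.0 / math.log2(1 + j) for j in positions_in_domain)
    return in_domain / total

def domain_resource_score(entities):
    """s(d, r): sum of s(d, e) * rel(e, r) over the entities of a resource.
    `entities` is an iterable of (s_de, rel) pairs."""
    return sum(s_de * rel for s_de, rel in entities)

# Toy case (hypothetical): v = 3 ranked items, only the first one is in domain d.
s_de = domain_entity_score([1], v=3)
# Two entities in a resource with relevance 1.0 and 0.5.
s_dr = domain_resource_score([(s_de, 1.0), (0.2, 0.5)])
```

Under this reading, s(d, e) lies between 0 and 1 and equals 1 when every position belongs to the domain.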



Model 2: User Based - User/Domain Matrix

Finally, the score s(d, ce) is computed for each (candidate expert, domain) pair, to build a model of the users as a matrix CE × D:

\[
s(d, ce) = \frac{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce) \cdot s(d, r)}{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce)}
\]

                    Sport  Music  TV    Education  Movies  ...
Candidate Expert 1  .033   .012   .068  .037       .034    ...
...                 ...    ...    ...   ...        ...     ...

For each query, s(d, q) is computed in the same way as for resources. Using the expertise matrix, the score for a user is computed as:

\[
\mathrm{score}(q, ce) = \mathrm{expertise}(q) \cdot \mathrm{expertise}(ce) = \sum_{d \in D(q)} s(d, q) \cdot s(d, ce)
\]
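The user/domain row and the final dot product can be sketched as follows, assuming the per-resource domain scores s(d, r) are already available as dictionaries. The function names and toy values are hypothetical.

```python
def candidate_domain_scores(resources):
    """s(d, ce) for one candidate: weighted average of s(d, r).
    `resources` is a list of (weight, {domain: s(d, r)}) pairs."""
    total_weight = sum(w for w, _ in resources)
    if total_weight == 0:
        return {}
    sums = {}
    for w, domain_scores in resources:
        for d, s in domain_scores.items():
            sums[d] = sums.get(d, 0.0) + w * s
    return {d: value / total_weight for d, value in sums.items()}

def query_candidate_score(query_domains, candidate_domains):
    """score(q, ce): dot product over the domains D(q) of the query."""
    return sum(
        s_q * candidate_domains.get(d, 0.0)
        for d, s_q in query_domains.items()
    )

# Toy candidate (hypothetical): one resource with weight 1, one with weight 0.2.
row = candidate_domain_scores([
    (1.0, {"sport": 0.4, "music": 0.2}),
    (0.2, {"sport": 1.0}),
])
match = query_candidate_score({"sport": 0.8}, row)
```

Because each candidate's row is computed ahead of time, answering a query only needs the dot product in `query_candidate_score`.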






Experimental Setup

Dataset built through a recruitment campaign:

                    Facebook  Twitter  LinkedIn
#Users              39        23       28
#English Resources  107,956   33,022   11,486
#Italian Resources  124,537   14,038   4,133
#Total Resources    232,493   47,060   15,619

Test suite of 30 information needs, or queries, involving various domains:

Which PHP function can I use to obtain the length of a string?

Can you list some restaurants in Milan?

Ground truth: graded relevance judgments of users' expertise are obtained from the users themselves through an online questionnaire.



Tests - Resource based configurations comparison

type            level  entity       MAP    MRR    NDCG   NDCG@10
Resource Based  0      text only    .2034  .6264  .2963  .3183
                0      entity only  .0454  .2500  .0731  .0821
                0      mixed        .2026  .6014  .2832  .3020

                       mixed        .3150  .8000  .4272  .4335

                       mixed        .3245  .8444  .4454  .4581

The data shown in the table were obtained considering:

English resources

as relevant users, those above the average for each query

eConf(e, r) = 1 + tagMeScore(e, r)

top 50 resources

for the mixed matching method: α = 1, β = 2

weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2
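For reference, the per-query quantities behind MAP, MRR and NDCG can be sketched as below; MAP and MRR are the means of average precision and reciprocal rank over all queries. Two points are assumptions, since the slides do not spell them out: "above the average" is read as a questionnaire grade above the mean grade for that query, and NDCG uses the raw grade as gain. All names and toy grades are hypothetical.

```python
import math

def relevant_above_average(grades):
    """Users whose grade is above the mean grade for this query (assumed reading)."""
    if not grades:
        return set()
    mean = sum(grades.values()) / len(grades)
    return {user for user, g in grades.items() if g > mean}

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant user appears."""
    hits, total = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant user, or 0 if none is retrieved."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(1 + rank) discount."""
    return sum(g / math.log2(1 + i) for i, g in enumerate(gains, start=1))

def ndcg(ranked, grades, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering (cut at k)."""
    gains = [grades.get(user, 0.0) for user in ranked][:k]
    ideal = sorted(grades.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

# Toy query (hypothetical grades and system ranking).
grades = {"alice": 3, "bob": 0, "carol": 2, "dave": 1}
ranked = ["carol", "bob", "alice", "dave"]
relevant = relevant_above_average(grades)
ap = average_precision(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
quality = ndcg(ranked, grades)
```

Calling `ndcg(ranked, grades, k=10)` gives the NDCG@10 variant reported in the tables.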



Tests - Resources window

Another experiment varied the number of resources considered in the score. We call that number the window size.

For simplicity, we only considered the Lv2-Mixed and Lv1-TextOnly configurations.

Considering more resources improves system quality up to about 3-4%. Beyond that, the curves stabilize: increasing the window size does not lead to significantly better results.



Tests - User based

type        level  MAP    MRR    NDCG   NDCG@10
User Based  0      .3685  .7603  .4907  .4332
            1      .3546  .7306  .4990  .4526
            2      .3424  .8178  .4770  .4288

Table: Overall comparison, User Based

The data shown in the table were obtained considering:

English resources

as relevant users, those above the average for each query

top 20 users

weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2



Tests - User/Resource based models comparison

The two models are compared in terms of result quality and performance.

We considered the best configuration for each: Lv2-Mixed for the resource based model and Lv1 for the user based model.



Tests - User/Resource based models comparison

The index size is shown on a logarithmic scale: indexing expertise as a pre-built user-domain matrix provides evident advantages.

For the resource based model, the query time is linear in the window size, while it is constant for the user based one.



Tests - Verticalization

A further experiment considers only the resources of a single domain and channel.

For simplicity, we only considered the Lv2-Mixed configuration, with the window size fixed to 50.

Domain         FB     TW     Lin
computer eng.  .2112  .5858  .4472
location       .1852  .3549  .2033
movies & tv    .2794  .4296  .1578
music          .2868  .4229  .2672
science        .1827  .4260  .3827
sport          .2856  .4225  .1933
tech. & games  .2297  .4186  .2052
All domains    .2526  .4296  .2670

Table: MAP

Domain         FB     TW     Lin
computer eng.  .5038  .7014  .4904
location       .4423  .4172  .3517
movies & tv    .4460  .4960  .2028
music          .3957  .4631  .4226
science        .3004  .4366  .4977
sport          .5497  .4092  .3298
tech. & games  .3641  .4545  .2352
All domains    .4415  .4791  .3473

Table: NDCG@10
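As a quick reading aid for the NDCG@10 table, the snippet below picks the best-scoring channel for each domain. The values are copied from the table; the variable names are arbitrary.

```python
# NDCG@10 per domain and channel, copied from the table above.
ndcg_at_10 = {
    "computer eng.": {"FB": .5038, "TW": .7014, "Lin": .4904},
    "location":      {"FB": .4423, "TW": .4172, "Lin": .3517},
    "movies & tv":   {"FB": .4460, "TW": .4960, "Lin": .2028},
    "music":         {"FB": .3957, "TW": .4631, "Lin": .4226},
    "science":       {"FB": .3004, "TW": .4366, "Lin": .4977},
    "sport":         {"FB": .5497, "TW": .4092, "Lin": .3298},
    "tech. & games": {"FB": .3641, "TW": .4545, "Lin": .2352},
}

# Channel with the highest NDCG@10 for each domain.
best_channel = {
    domain: max(row, key=row.get) for domain, row in ndcg_at_10.items()
}
```

On this metric the best channel changes with the domain: FB for location and sport, Lin for science, and TW for the remaining domains.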






Conclusions

We classified resources into two main classes: static resources and dynamic resources.

We adopted and extended two expert-finding models.

The analysis of social activities can help to better characterize the expertise of users.

The adoption of multiple social networks can greatly improve the representation of a user for expert finding purposes but, for specific domains, it is better to focus on a single platform.



Open questions

Exploiting the social graph to improve expert retrieval

Domain-specific queries require a less general approach

Example: geolocalized queries!



Questions & Answers


Date post:	01-Dec-2014
Category:	Technology
Upload:	giulianovesci