
Learning Social Networks from Web Documents Using Support Vector Classifiers

Date post: 25-Jan-2015
Description: Learning Social Networks from Web Documents Using Support Vector Classifiers. IEEE, WI’06. Masoud Makrehchi & Mohamed S. Kamel. Presenter: Teng-Kai Fan. Date: 2008-11-18
Transcript
Page 1: Learning Social Networks From Web Documents Using Support

Learning Social Networks from Web Documents Using Support Vector Classifiers

IEEE, WI’06

Masoud Makrehchi & Mohamed S. Kamel

Presenter: Teng-Kai Fan

Date: 2008-11-18

Page 2

Abstract

Learning a social network from incomplete relationship data.

Translating social network extraction into a text classification problem.

Classifier: SVM (Support Vector Machine).

Evaluation: FOAF (Friend of a Friend) dataset and F-measure.

Page 3

Outline

Introduction
Related Work
Problem Statement
Proposed Approach: Learning Social Network from Incomplete Network
Experiment
Conclusion

Page 4

Introduction

A social network is defined as a map of the relationships (ties) between individuals (actors).

Applications: marketing, advertising, finding friends.

Page 5

Introduction cont.

In this study, the authors propose an approach to generate a social network from a collection of web documents.

Actor-term matrix: every person is represented by the documents associated with her.

Learning social relations from the actor-term database.
  Assumption: the social network is partially explored (the known part serves as the training dataset).
  A support vector classifier is employed to extract the missing relations and complete the social network.

Page 6

Related Work

Social network models can be constructed either directly or indirectly.

Direct (descriptive): acquaintanceship can be extracted directly from recorded information.
  Sources: e-mail, cited papers, relational databases, web page links, etc.

Indirect (predictive): acquaintanceship is translated into the similarity of two actors.
  Sources: papers, opinions, news, etc.

Page 7

Problem Statement

The goal is to predict and learn the network while knowing only a small number of relations between individual persons.

Social networks are represented either by graphs or by adjacency matrices.

Incomplete matrix (training examples)

Complete matrix (learned matrix)

Page 8

Learning Social Network from Incomplete Network

Two assumptions:
  A subset of the relations is known and represented by an adjacency matrix.
  Textual data associated with the actors is available.

Three steps:
  Modeling the actors in the social network.
  Modeling the relations between the actors.
  Training a classifier to learn the social network.

Page 9

Actor Modeling

Each actor is represented by her web documents, including home page, blog, CV and so on.

All documents associated with an individual are merged to build a single document vector for that actor.
  Each document is associated with exactly one actor.

The document vector holds one weight per term, where the weighting schema is tf*idf.

Page 10

Actor Modeling cont.

Consequently, the corpus is modeled by a matrix called the Actor-Term Matrix, with tf*idf weights indexed by actor and term.

Dimension reduction:
  Stemming and stop-word removal.
  DF (document frequency): terms with DF less than 5 or more than 100 were removed.
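The actor modeling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are already tokenized, stemming and stop-word removal are skipped, document frequency is counted over the merged per-actor documents, and the tf*idf variant (raw term frequency times log(n / df)) is one common choice that the slides do not specify. The function name `actor_term_matrix` and the toy data are hypothetical.

```python
import math
from collections import Counter

def actor_term_matrix(actor_docs, min_df=1, max_df=None):
    """Build tf*idf rows per actor from merged, pre-tokenized documents.

    actor_docs: dict mapping an actor name to a list of token lists.
    min_df / max_df: keep only terms whose document frequency lies in
    this range (the slides use 5 and 100 on the real corpus).
    """
    # Merge all documents of an actor into a single bag of words.
    merged = {a: Counter(t for doc in docs for t in doc)
              for a, docs in actor_docs.items()}
    n = len(merged)
    # Document frequency: number of actors whose merged document has the term.
    df = Counter(t for bag in merged.values() for t in bag)
    upper = n if max_df is None else max_df
    vocab = sorted(t for t, c in df.items() if min_df <= c <= upper)
    # Assumed weighting: raw term frequency times log(n / df).
    matrix = {a: [bag[t] * math.log(n / df[t]) for t in vocab]
              for a, bag in merged.items()}
    return vocab, matrix

# Toy corpus (hypothetical): three actors, tiny DF thresholds.
actor_docs = {
    "alice": [["svm", "kernel"], ["svm", "web"]],
    "bob": [["web", "graph"]],
    "carol": [["graph", "kernel", "web"]],
}
vocab, matrix = actor_term_matrix(actor_docs, min_df=2, max_df=2)
print(vocab)            # ['graph', 'kernel']
print(matrix["alice"])  # zero weight for 'graph', positive for 'kernel'
```

With the toy thresholds, "svm" (DF 1) and "web" (DF 3) are dropped, which mirrors how the DF filter removes both very rare and very common terms.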

Page 11

Relationship Modeling

One simple approach to modeling the relation between two actors is to estimate the similarity of their document vectors.
  A similarity measure (e.g., cosine, Jaccard or correlation) gives very poor results because it models each relation with only one variable.

A better approach is to aggregate the document vectors of the actors on both sides of the relation into a new aggregated document vector.
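To make the "only one variable" point concrete, here is a minimal sketch of the similarity baseline using cosine similarity. The vectors are made-up tf*idf rows, and the helper name `cosine` is hypothetical; whatever the dimensionality of the document vectors, the relation ends up described by a single number.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical document vectors for two actors.
d_i = [0.0, 0.5, 2.0]
d_j = [1.0, 0.5, 0.0]
score = cosine(d_i, d_j)  # one scalar feature for the whole relation
print(score)
```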

Page 12

Relationship Modeling cont.

Let di and dj be the document vectors associated with the actors ai and aj.

The relation between two actors is modeled by aggregating their vectors with an operator such as MIN, MAX or Product.

The aggregated document vector (relation vector) is obtained by applying the operator to di and dj term by term.
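A short sketch of the aggregation, assuming the operator is applied element-wise so that the relation vector has the same length as di and dj. The names `AGGREGATORS` and `relation_vector` are hypothetical, and the input vectors are made up.

```python
# Element-wise operators named on the slide: MIN, MAX and Product.
AGGREGATORS = {
    "min": min,
    "max": max,
    "product": lambda a, b: a * b,
}

def relation_vector(d_i, d_j, op="min"):
    """Aggregate two document vectors into one relation vector."""
    if len(d_i) != len(d_j):
        raise ValueError("document vectors must have the same length")
    combine = AGGREGATORS[op]
    return [combine(a, b) for a, b in zip(d_i, d_j)]

d_i = [0.0, 0.5, 2.0]
d_j = [1.0, 0.5, 0.0]
print(relation_vector(d_i, d_j, "min"))      # [0.0, 0.5, 0.0]
print(relation_vector(d_i, d_j, "max"))      # [1.0, 0.5, 2.0]
print(relation_vector(d_i, d_j, "product"))  # [0.0, 0.25, 0.0]
```

Unlike a similarity score, the relation vector keeps one feature per term, so a classifier can learn which shared terms indicate a tie.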

Page 13

Classifier Design for Imbalanced Social Network Data

Imbalanced social network data:
  The social network is sparse: the number of relations r is far smaller than the number of possible actor pairs, n(n-1)/2 (n: number of actors, r: number of relations).

A common approach to dealing with class imbalance is to artificially re-balance the training data:
  Up-sampling the minority class.
  Down-sampling the majority class.
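Down-sampling the majority class could look like the sketch below. It assumes the connected pairs (label 1) are the minority, keeps all of them, and draws a random subset of the unconnected pairs (label 0). The function name `downsample_majority`, the `ratio` parameter and the fixed seed are illustrative choices, not something the slides prescribe.

```python
import random

def downsample_majority(examples, ratio=1.0, seed=0):
    """Keep every positive and a random sample of the negatives.

    examples: list of (relation_vector, label), label 1 = connected, 0 = not.
    ratio: number of negatives kept per positive (assumed parameter).
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    keep = min(len(negatives), int(round(ratio * len(positives))))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)
    return balanced

# Toy data: 3 connected pairs against 30 unconnected ones.
examples = [([1.0, 0.0], 1)] * 3 + [([0.0, 1.0], 0)] * 30
balanced = downsample_majority(examples)
print(len(balanced))  # 6
```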

Page 14

Classifier Design for Imbalanced Social Network Data cont.

An SVM classifier with a linear kernel is used for learning the social network.
  Learning the social network is a binary classification problem with two classes: positive (connected) and negative (broken).
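The slides do not say which SVM solver was used. As a rough, dependency-free illustration of a linear SVM on relation vectors, the sketch below minimizes the L2-regularized hinge loss with plain full-batch subgradient descent; this optimizer is a stand-in chosen for brevity, and the hyperparameters, function names and toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on the L2-regularized hinge loss.

    X: list of feature lists, y: labels in {+1, -1}. Returns (w, b).
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # example violates the margin
                for k in range(dim):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (connected) or -1 (broken) for one relation vector."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

# Toy relation vectors: connected pairs load on the first feature.
X = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
     [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

In practice an established SVM library would replace this loop; the point here is only the shape of the problem: relation vectors in, a connected/broken decision out.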

Page 15

Experiments

Evaluation measures: Precision, Recall and F-measure.

Two-fold cross validation.

Dataset: a real FOAF database containing 210,611 RDF triples.
  Relations between the individuals: a set of true social networks.
  Web resource addresses and URLs related to the individuals.
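For reference, the evaluation measures named on this slide can be computed as in the sketch below. It assumes the F-measure is the balanced F1 (the slides do not state a beta value) and treats "connected" as the positive class; the function name and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and balanced F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels: 3 true ties, of which 2 are found, plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

On sparse networks these measures are more informative than accuracy, since predicting "not connected" for every pair would already be right almost all the time.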

Page 16

Dataset cont.

All social networks:
  Actors: 34,275
  Real ties: 33,419
  Possible relationships: 587,370,675
  Ratio: 1:17575

Down-sampling: social networks with fewer than 20 or more than 70 members were removed.

After breaking the database into small sub-graphs:
  Actors: 254
  Real ties: 246
  Possible relationships: 32,131
  Ratio: 1:130
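The counts on this slide are consistent with treating ties as unordered actor pairs, so that n actors give n(n-1)/2 possible relationships. The sketch below checks the arithmetic; rounding the ratio down is an assumption that happens to reproduce the slide's figures, and the function names are hypothetical.

```python
def possible_pairs(n):
    """Number of unordered actor pairs among n actors."""
    return n * (n - 1) // 2

def imbalance(n, r):
    """Return (possible pairs, possible pairs per real tie, rounded down)."""
    pairs = possible_pairs(n)
    return pairs, pairs // r

print(imbalance(34275, 33419))  # (587370675, 17575)
print(imbalance(254, 246))      # (32131, 130)
```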

Page 17

Results

Page 18

Results cont.

Page 19

Page 20

Conclusion

A text classification formulation for approximately predicting social relations from web documents was proposed.
  A document vector aggregation model is proposed instead of document similarity.

Down-sampling is used to deal with highly imbalanced data.

