EVENT IDENTIFICATION IN SOCIAL MEDIA

Date post: 02-Jan-2016

Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)

Social Media Sites Host Many “Event” Documents

Photo-sharing: Flickr
Video-sharing: YouTube
Social networking: Facebook

“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]

Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
Smaller events, without traditional news coverage: local food drive, street fair

Example: social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

Identifying Events and Associated Social Media Documents

Applications: event search and browsing, local search, …

General approach: group similar documents via clustering. Each cluster corresponds to one event and its associated social media documents.

Event Identification: Challenges

Uneven data quality: missing, short, uninformative text … but revealing structured context available: tags, date/time, geo-coordinates
Scalability
Dynamic data stream of event information
Unknown number of events: necessary for many clustering algorithms, difficult to estimate

Clustering Social Media Documents

Social media document representation
Social media document similarity
Social media document clustering
Clustering task: definition
Ensemble algorithm: combining multiple clustering results
Preliminary evaluation

Social Media Document Representation

Document features: Title, Description, Tags, Date/Time, Location, All-Text
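As a rough sketch of how one such document might be held in memory, the class below gives each feature its own field. The class and field names are hypothetical (they do not come from the slides), and deriving All-Text by joining title, description, and tags is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class SocialMediaDocument:
    """Hypothetical container for the features named on the slide."""
    title: str = ""
    description: str = ""
    tags: List[str] = field(default_factory=list)
    taken_at: Optional[datetime] = None             # Date/Time
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)

    @property
    def all_text(self) -> str:
        """All-Text (assumed): title, description, and tags joined together."""
        parts = [self.title, self.description, " ".join(self.tags)]
        return " ".join(p for p in parts if p)


# A photo with a title and tags but no description or geo-coordinates.
doc = SocialMediaDocument(
    title="All Points West",
    tags=["festival", "music"],
    taken_at=datetime(2008, 8, 8, 20, 30),
)
print(doc.all_text)
```

Optional fields reflect the uneven data quality noted earlier: any feature may be missing for a given document.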

Social Media Document Similarity

Text: tf-idf weights, cosine similarity
Location: geo-coordinate proximity
Time: proximity in minutes

Features used for similarity: Title, Description, Tags, All-Text, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity
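The three similarity measures can be sketched as follows. The slides name only the measures, so the smoothed idf, the linear decay, and the cut-offs `window_minutes` and `max_km` are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Turn tokenized documents into sparse tf-idf vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption) so no term gets a zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def time_similarity(minutes_a: float, minutes_b: float,
                    window_minutes: float = 24 * 60) -> float:
    """Proximity in minutes, decaying linearly to 0 at the assumed window."""
    return max(0.0, 1.0 - abs(minutes_a - minutes_b) / window_minutes)


def location_similarity(a: Tuple[float, float], b: Tuple[float, float],
                        max_km: float = 50.0) -> float:
    """Geo-coordinate proximity from the haversine distance (max_km assumed)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    km = 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))
    return max(0.0, 1.0 - km / max_km)


docs = [["festival", "music", "liberty"], ["festival", "music", "park"], ["food", "drive"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

All three functions return values in [0,1], which keeps the per-feature scores comparable.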

Social Media Document Clustering Framework

Pipeline: social media documents → document feature representation → event clusters

Clustering: Ensemble Algorithm

Clusterers: Ctitle, Ctags, Ctime
Weights: Wtitle, Wtags, Wtime (learned in a training step)
Consensus function f(C,W): combine ensemble similarities
Ensemble clustering solution
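One possible reading of f(C,W) is a weighted vote over the clusterers: each clusterer "predicts" 1 for a document pair it placed in the same cluster and 0 otherwise, and the votes are averaged using the weights. The function name, the label lists, and the weight values below are made up for illustration.

```python
from typing import Dict, Hashable, Sequence


def consensus_similarity(i: int, j: int,
                         clusterings: Dict[str, Sequence[Hashable]],
                         weights: Dict[str, float]) -> float:
    """Weighted share of clusterers that put documents i and j together."""
    total = sum(weights[name] for name in clusterings)
    agree = sum(
        weights[name]
        for name, labels in clusterings.items()
        if labels[i] == labels[j]  # binary prediction: same cluster or not
    )
    return agree / total if total else 0.0


# Cluster labels assigned to four documents by three hypothetical clusterers.
clusterings = {
    "title": [0, 0, 1, 1],
    "tags":  [0, 0, 0, 1],
    "time":  [0, 1, 1, 1],
}
# Illustrative weights; in the talk these are learned in a training step.
weights = {"title": 0.2, "tags": 0.5, "time": 0.3}

print(consensus_similarity(0, 1, clusterings, weights))
```

The resulting pairwise scores can then feed a final clustering pass that produces the ensemble solution.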

Clustering: Measuring Quality

Homogeneous clusters
Complete clusters
Metric: Normalized Mutual Information (NMI), the shared information between the clustering solution and the “ground truth”
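A minimal NMI sketch follows. NMI has several normalizations; this one divides the mutual information by the mean of the two entropies, which is an assumption since the slides do not state which variant was used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Mutual information normalized by the mean of the two entropies."""
    n = len(pred)
    if n == 0 or n != len(truth):
        raise ValueError("label sequences must be non-empty and of equal length")
    pred_counts, truth_counts = Counter(pred), Counter(truth)
    joint = Counter(zip(pred, truth))
    mi = sum(
        (c / n) * math.log((c * n) / (pred_counts[p] * truth_counts[t]))
        for (p, t), c in joint.items()
    )
    denom = (entropy(pred) + entropy(truth)) / 2
    # Both sides being a single cluster is treated as perfect agreement.
    return mi / denom if denom > 0 else 1.0


print(nmi([0, 0, 1, 1], ["a", "a", "b", "b"]))  # same grouping, different names
print(nmi([0, 1, 0, 1], ["a", "a", "b", "b"]))  # grouping unrelated to the truth
```

Cluster names do not matter, only which documents are grouped together, so a solution scores close to 1 whenever its grouping matches the ground truth.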

Experimental Setup

Data: >270K Flickr photos
Event labels from Yahoo!’s “upcoming” event database
Split into 3 parts for training/validation/testing
Clusterers: single pass algorithm with centroid similarity
Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
Consensus function: weighted average of clusterers’ binary predictions
Final prediction step: single pass clustering algorithm
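A single pass algorithm with centroid similarity could look roughly like this: each document is compared against the existing cluster centroids and either joins the most similar one or starts a new cluster. The threshold value and the toy vectors are assumptions; the setup above does not give the actual thresholds.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors: List[Vector], threshold: float) -> List[int]:
    """Assign each vector, in arrival order, to a cluster index."""
    # Centroids are kept as sums of member vectors; cosine ignores scale,
    # so a sum points in the same direction as the mean.
    centroids: List[Vector] = []
    labels: List[int] = []
    for vec in vectors:
        best, best_sim = -1, 0.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            for term, weight in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + weight
        else:
            centroids.append(dict(vec))  # start a new cluster
            best = len(centroids) - 1
        labels.append(best)
    return labels


vectors = [
    {"festival": 1.0, "music": 1.0},
    {"festival": 1.0, "music": 1.0, "stage": 1.0},
    {"food": 1.0, "drive": 1.0},
    {"food": 1.0, "drive": 1.0, "local": 1.0},
]
print(single_pass_cluster(vectors, threshold=0.5))
```

Because documents are processed once and the number of clusters is never fixed in advance, this style of algorithm suits a data stream with an unknown number of events.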

Preliminary Evaluation Results

Individual clusterer performance
Highest NMI: Tags, All-Text
Lowest NMI: Description, Title

Ensemble performance, compared against all individual clusterers
Highest overall performance in terms of NMI
More homogeneous clusters: each event is spread over fewer clusters

Details in paper

Document similarity metric
Ensemble approach: weight assignment, choice of clusterers
Train a classifier to predict document similarity
Features correspond to similarity scores: all-text, title, tags, time, location, etc., with numeric values in [0,1]
State-of-the-art classifiers: SVM, Logistic Regression, …
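To make the classifier idea concrete, here is a small logistic regression trained by batch gradient descent on pairs of documents. Each row holds made-up similarity scores (all-text, tags, time) for one pair, and the label says whether the pair belongs to the same event. The data, learning rate, and epoch count are illustrative assumptions; a real experiment would more likely use an existing library implementation.

```python
import math
from typing import List, Sequence


def predict_proba(w: Sequence[float], features: Sequence[float]) -> float:
    """Probability that a document pair belongs to the same event."""
    z = w[-1] + sum(wk * xk for wk, xk in zip(w, features))  # w[-1] is the bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit one weight per feature plus a trailing bias term."""
    dim = len(X[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for features, label in zip(X, y):
            err = predict_proba(w, features) - label
            for k, value in enumerate(features):
                grad[k] += err * value
            grad[dim] += err
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w


# Toy similarity scores in [0,1]: [all_text, tags, time] for six document pairs.
X = [
    [0.9, 0.8, 1.0], [0.7, 0.9, 0.9], [0.8, 0.6, 0.95],  # same event
    [0.1, 0.0, 0.2], [0.2, 0.1, 0.0], [0.0, 0.2, 0.1],   # different events
]
y = [1, 1, 1, 0, 0, 0]

w = train_logistic(X, y)
print(predict_proba(w, [0.85, 0.9, 0.95]), predict_proba(w, [0.05, 0.1, 0.1]))
```

The predicted probability then plays the role of a learned document similarity, replacing the hand-weighted combination of per-feature scores.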

Future Work: Alternative Choices

Final clustering step: apply graph partitioning algorithms (requires estimating the number of clusters)
Evaluation metrics: beyond NMI
Datasets: Flickr, LastFM, YouTube
Exploit social network connections

Conclusions

Identified events and their corresponding social media documents
Proposed a clustering solution
Leveraged different representations of social media documents
Employed various social media similarity metrics
Developed a weighted ensemble clustering approach
Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs

