Utilizing social annotations for topical search in Twitter

Posted on 21-Jan-2016


Saptarshi Ghosh

BESU Shibpur

Complex Network Research Group, CSE, IIT Kharagpur

General overview

• Social networks in the online world
  • Twitter, folksonomies such as Delicious
  • Modeling the network evolution
  • Improving search services
• Socio-technological networks in the offline world
  • Indian Railway Network
  • Traffic analysis

Topical attributes of Twitter users

• Twitter has emerged as an important source of information and real-time news
• Increasing access through topical search [Teevan WSDM 2011]

Motivation: to discover the topical attributes / expertise of users

Potential applications:
• Knowing the credentials of a user
• Identifying topical experts

How to discover topical attributes?

• Prior attempts rely on the contents of tweets or user profiles [Ramage ICWSM 2010, Pochampally SIGIR Workshop 2011]
• Many profiles do not give topical information
• Tweets often contain day-to-day conversation, which makes it difficult to infer topics [Java SNA-KDD 2007, Wagner SocialCom 2012]

Proposed methodology

• Use social annotations: how a user is described by others
• Social annotations gathered through Twitter Lists

Mining Lists to infer topics

• Collect the Lists containing a given user U
• Identify U’s topics from the List metadata
  • Basic IR techniques such as case-folding and removal of domain-specific stopwords
  • Extract nouns and adjectives
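The List-mining steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the stopword set and the tiny part-of-speech lexicon `TOY_POS` (a stand-in for a real tagger) are hypothetical, and topics are simply ranked by how often a word appears across the names and descriptions of the Lists containing U.

```python
import re
from collections import Counter

# Hypothetical domain-specific stopwords for List names/descriptions
# (illustrative only; the actual stopword list is not given on the slides).
LIST_STOPWORDS = {"list", "lists", "my", "twitter", "the", "of", "and"}

# Hypothetical stand-in for a part-of-speech tagger: word -> coarse tag.
# A real system would tag words with an NLP library instead.
TOY_POS = {
    "linux": "NOUN", "tech": "NOUN", "news": "NOUN", "software": "NOUN",
    "open": "ADJ", "reading": "VERB",
}


def infer_topics(list_metadata, k=5):
    """Return the k most frequent nouns/adjectives in the names and
    descriptions of the Lists that contain a given user."""
    counts = Counter()
    for text in list_metadata:
        # Case-folding, then split on anything that is not a letter
        # (so "tech_news" becomes "tech" and "news").
        for word in re.findall(r"[a-z]+", text.casefold()):
            if word in LIST_STOPWORDS:
                continue
            # Keep only nouns and adjectives.
            if TOY_POS.get(word) in {"NOUN", "ADJ"}:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]


metadata = ["Linux", "tech_news", "Open software", "My Linux list", "linux tech", "reading list"]
print(infer_topics(metadata, k=2))  # ['linux', 'tech']
```

Counting word frequencies across many Lists is what makes the approach crowdsourced: a topic that many different users chose when naming their Lists rises to the top, while one-off words stay low.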

Topics inferred from Lists (examples)

• linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
• politics, senator, congress, government, republicans, Iowa, gop, conservative
• politics, senate, government, congress, democrats, Missouri, progressive, women

Lists vs. other features

• Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
• Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
• Profile bio: [not recoverable from the slide]

Who-is-who service

• Developed a Who-is-Who service for Twitter
• Shows a word cloud of the major topics for a given user

http://twitter-app.mpi-sws.org/who-is-who/

N. Sharma, S. Ghosh, F. Benevenuto, N. Ganguly, K. Gummadi, Inferring who-is-who in the Twitter social network, WOSN 2012.

Search system for topic experts

Cognos, a search system for topic experts: http://twitter-app.mpi-sws.org/whom-to-follow/

• Given a query (topic), identify users related to the topic using Lists
• Rank the identified users
• Ranking scheme based on Lists, combining:
  • relevance of the user to the query
  • popularity of the user
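The slides name two ingredients of the Cognos ranking, relevance and popularity, but do not give the formula. The sketch below is a hypothetical stand-in, not the scheme from the SIGIR 2012 paper: relevance is taken as the share of a user's Lists whose metadata mentions the query, popularity as the logarithm of the total number of Lists containing the user, and the score is their product. The `Candidate` type and `rank_experts` function are illustrative names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    user: str
    # Names/descriptions of all Lists that contain this user.
    list_texts: list = field(default_factory=list)


def rank_experts(query, candidates):
    """Rank users for a topical query using only List metadata
    (hypothetical scoring: relevance * popularity)."""
    q = query.casefold()
    scored = []
    for c in candidates:
        total = len(c.list_texts)
        # Crude substring match of the query against each List's metadata.
        matching = sum(1 for text in c.list_texts if q in text.casefold())
        if matching == 0:
            continue  # no List links this user to the topic
        relevance = matching / total      # share of the user's Lists on the topic
        popularity = math.log1p(total)    # damped count of all Lists containing the user
        scored.append((relevance * popularity, c.user))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [user for _, user in scored]


users = [
    Candidate("alice", ["Politics", "politics news", "US politics", "friends"]),
    Candidate("bob", ["politics"]),
    Candidate("carol", ["music", "celebs"]),
]
print(rank_experts("politics", users))  # ['alice', 'bob']
```

The logarithm keeps a user who appears in thousands of Lists from dominating purely on popularity, so topical focus still matters; whether Cognos balances the two this way is an assumption of this sketch.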

Cognos results for “politics”

Cognos results for “stem cell”

Evaluation of Cognos

• Evaluation through user surveys
• Cognos gives accurate results for a wide variety of queries
• Cognos vs. Twitter Who-To-Follow (WTF) service, judged by majority voting: out of 27 queries, Cognos was judged better for 12, Twitter WTF better for 11, and the two tied for 4
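Majority voting per query, and the tally over all queries, can be sketched as below. How individual survey responses were recorded is not stated on the slides; here each participant is assumed (hypothetically) to vote "cognos", "wtf" or "same" for a query, "same" votes are ignored, and a query with equal counts is a tie. Under this scheme the reported outcome covers 12 + 11 + 4 = 27 queries.

```python
from collections import Counter


def judge_query(votes):
    """Majority decision for one query from individual survey votes
    ('cognos', 'wtf' or 'same'; 'same' votes are not counted)."""
    c = Counter(votes)
    if c["cognos"] > c["wtf"]:
        return "cognos"
    if c["wtf"] > c["cognos"]:
        return "wtf"
    return "tie"


def tally(votes_per_query):
    """Count how many queries each system won, plus ties."""
    return Counter(judge_query(v) for v in votes_per_query.values())


# Toy votes for three queries (not the actual survey data).
survey = {
    "politics": ["cognos", "cognos", "wtf"],
    "stem cell": ["wtf", "wtf", "same"],
    "linux": ["cognos", "wtf", "same"],
}
print(tally(survey))  # Counter({'cognos': 1, 'wtf': 1, 'tie': 1})
```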

S. Ghosh, N. Sharma, F. Benevenuto, N. Ganguly, K. Gummadi, Cognos: Crowdsourcing Search for Topic Experts in Microblogs, SIGIR 2012.

Twitter as a source of information

• Characterizing the experts in Twitter
• Characterizing the Twitter platform as a whole
• What are the topics on which information is available on Twitter?

Topics in Twitter: from major topics to niche ones

Study on the Indian Railway Network

Motivation: rail accidents during 2010
• Details of accidents: from the Wikipedia page on IR accidents
• Considered only accidents due to:
  • collision between trains
  • derailment

IRN data collection

• Crawled schedules of express trains from www.indianrail.gov.in in October 2010
  • 2195 express train routes, 3041 stations
  • Scheduled time of each train reaching each station
• Express train schedules for several years since 1991
  • From the Trains At A Glance time-tables
  • Obtained from the National Rail Museum, New Delhi

Observations

• Many trunk routes in the Indo-Gangetic Plain (IGP) have high daily traffic with low headway
• Bad scheduling of IR traffic: routes in north India have especially low headway during early-morning hours, when dense fog is likely
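Headway here means the time gap between consecutive trains at the same point on a route. A minimal sketch of how it could be computed from the crawled schedules (the scheduled time of each train reaching each station) follows; the input format, the function names, and the 00:00 to 06:00 "fog window" are assumptions for illustration, not values from the study.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' schedule time to minutes after midnight."""
    hours, mins = map(int, hhmm.split(":"))
    return 60 * hours + mins


def headways(times):
    """Gaps (minutes) between consecutive trains at one station over a day,
    including the wrap-around gap from the last train to the next day's first."""
    ts = sorted(to_minutes(t) for t in times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(ts) > 1:
        gaps.append(ts[0] + 24 * 60 - ts[-1])
    return gaps


def min_headway_in_window(times, start="00:00", end="06:00"):
    """Smallest gap between consecutive trains scheduled within [start, end).
    The default window is a hypothetical stand-in for foggy early-morning hours."""
    lo, hi = to_minutes(start), to_minutes(end)
    ts = sorted(m for m in map(to_minutes, times) if lo <= m < hi)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return min(gaps) if gaps else None


# Toy arrival times at one station (not real IR data).
arrivals = ["05:10", "04:50", "23:30", "05:40", "12:00"]
print(headways(arrivals))               # [20, 30, 380, 690, 320]
print(min_headway_in_window(arrivals))  # 20
```

Because the daily gaps always sum to 24 hours, a station with many trains must have small headways somewhere; the scheduling concern is where in the day those small gaps fall.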

Skewed distribution of daily traffic
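Daily traffic on a track segment can be estimated by counting the train routes that pass between each pair of consecutive stations; skew then shows up as a few segments carrying a large share of all traffic. A sketch under simplifying assumptions: every route is treated as running daily, segments are undirected, the function names are hypothetical, and the station codes below are toy input, not data from the study.

```python
from collections import Counter


def segment_traffic(routes):
    """Number of trains per undirected segment between consecutive stops.
    Each route (a list of station codes) is assumed to run once a day."""
    traffic = Counter()
    for stops in routes:
        for a, b in zip(stops, stops[1:]):
            traffic[frozenset((a, b))] += 1
    return traffic


def top_share(traffic, k):
    """Fraction of all segment traversals carried by the k busiest segments."""
    total = sum(traffic.values())
    busiest = sum(count for _, count in traffic.most_common(k))
    return busiest / total if total else 0.0


routes = [
    ["NDLS", "CNB", "ALD"],
    ["NDLS", "CNB"],
    ["ALD", "CNB", "NDLS"],
    ["HWH", "ALD"],
]
t = segment_traffic(routes)
print(t[frozenset(("NDLS", "CNB"))])  # 3
print(top_share(t, 1))                # 0.5
```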

Unbalanced growth of traffic in IGP

• Traffic in some segments in IGP increased by 250% in 2009, compared to the traffic in 1991
• Very little construction of new tracks
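For reference, an increase of 250% means the later traffic is 3.5 times the 1991 level. Writing T for the number of trains per day on a segment (notation introduced here):

```latex
\frac{T_{2009} - T_{1991}}{T_{1991}} \times 100\% = 250\%
\quad\Longrightarrow\quad
T_{2009} = 3.5\, T_{1991}
```

For instance, a segment with 20 trains per day in 1991 would carry 70 trains per day after a 250% increase (illustrative numbers).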

Publication and press coverage

S. Ghosh, A. Banerjee, N. Ganguly. Some insights on the recent spate of accidents in Indian Railways. Physica A, Elsevier, 2012.

Thank You

Questions / Suggestions?

Backup slides

Cognos vs. Twitter Who-To-Follow