Extracting Interest Tags from Twitter User Biographies

Ying Ding, Jing Jiang

School of Information Systems

Singapore Management University

AIRS 2014, Kuching, Malaysia

Social Media and Personal Data

Dec 5, 2014 AIRS 2014 2

• Much personal information is revealed in social media

– Content, links, ratings

– Personal preferences

• All this information is useful to

– Researchers: social science

– Businesses: targeted advertising

User Biographies in Twitter

• Self-introductions written in free form

• Reflect users’ background and interests

User Biographies in Twitter

[Figure labels: profession, interests, age]

Around 28% of Singapore Twitter users and 50% of US Twitter users revealed their personal interests in their biographies.

Dong Wei et al. Who am I on Twitter?: A cross-country comparison. WWW’2014

Outline

• Background

• Our task

• Syntactic patterns of interest tags

• Build training data + gold standard

• Method

• Experiments

• Summary

Our task

• Automatically extract phrases that describe a user’s personal interests.

– We call them “interest tags”.

– A typical information extraction problem.

– Automatically build training data based on common syntactic patterns.

Method

• Linear Chain CRF

• BIO labels
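
To make the BIO scheme concrete, here is a minimal sketch of how interest tags could be encoded as per-token labels and decoded back into phrases. The helper names `bio_encode` and `bio_decode` and the sample biography are illustrative assumptions, not taken from the talk; a linear-chain CRF would predict the label sequence that `bio_decode` consumes.

```python
def bio_encode(tokens, spans):
    """Label tokens with B/I/O given (start, end) token spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"              # first token of an interest tag
        for i in range(start + 1, end):
            labels[i] = "I"              # remaining tokens of the same tag
    return labels


def bio_decode(tokens, labels):
    """Recover the tagged phrases from a BIO label sequence."""
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:                            # "O", or a stray "I" with no open tag
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical biography, already split into tokens.
tokens = "Coffee lover and die-hard Manchester United fan".split()
labels = bio_encode(tokens, [(0, 1), (4, 6)])
print(list(zip(tokens, labels)))
print(bio_decode(tokens, labels))
```

Casting extraction as sequence labeling lets a tag span several tokens ("Manchester United") while each token still receives exactly one label.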

Syntactic Patterns of Interest Tags

• Based on manual annotation of 500 user biographies.

• 28.8% of user biographies contain meaningful interest tags.

Building Training Data

• Seed patterns:

– Play + [NP]

– [NP] + fan

– Interested in + [NP]

• Steps:

– Use seed patterns to extract noun phrases and rank them according to their frequency

– Pick the top-100 ranked noun phrases and use them as positive instances to train CRF
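
The two steps above can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: a real system would use a POS tagger or chunker to find noun phrases, whereas this sketch assumes a small stop list (`STOP`) and a three-word cap to approximate phrase boundaries. All function names and the sample biographies are hypothetical.

```python
import re
from collections import Counter

# Assumption: a stop list stands in for a real noun-phrase chunker.
STOP = {"a", "an", "the", "and", "or", "of", "in", "on", "at",
        "for", "with", "to", "i", "my", "is", "am"}


def tokenize(text):
    """Lowercase, then split into word tokens and single punctuation marks."""
    return re.findall(r"[a-z0-9'&-]+|[^\sa-z0-9]", text.lower())


def is_content_word(token):
    return token[0].isalnum() and token not in STOP


def phrase_after(tokens, start, max_len=3):
    """Collect up to max_len content words starting at position start."""
    words = []
    for token in tokens[start:start + max_len]:
        if not is_content_word(token):
            break
        words.append(token)
    return " ".join(words)


def phrase_before(tokens, end, max_len=3):
    """Collect up to max_len content words ending just before position end."""
    words = []
    for token in reversed(tokens[max(0, end - max_len):end]):
        if not is_content_word(token):
            break
        words.insert(0, token)
    return " ".join(words)


def seed_candidates(bio):
    """Apply the three seed patterns to one biography."""
    tokens = tokenize(bio)
    found = []
    for i, token in enumerate(tokens):
        if token == "play":                                   # play + [NP]
            found.append(phrase_after(tokens, i + 1))
        elif token == "fan":                                  # [NP] + fan
            found.append(phrase_before(tokens, i))
        elif token == "interested" and tokens[i + 1:i + 2] == ["in"]:
            found.append(phrase_after(tokens, i + 2))         # interested in + [NP]
    return [p for p in found if p]


def top_phrases(bios, k=100):
    """Rank extracted phrases by frequency and keep the k most frequent."""
    counts = Counter(p for bio in bios for p in seed_candidates(bio))
    return [phrase for phrase, _ in counts.most_common(k)]


# Hypothetical biographies for illustration only.
bios = [
    "I play guitar and love coffee.",
    "Arsenal fan. Interested in photography",
    "Student, interested in photography and travel",
    "Arsenal fan, I play guitar",
    "Interested in photography!",
]
counts = Counter(p for bio in bios for p in seed_candidates(bio))
print(counts.most_common())
```

The phrases returned by `top_phrases` would then be matched back against the biographies to produce positive instances, which is why the resulting training data is noisy.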

Features

• Syntactic or dependency features are not used, as Twitter text is too noisy for parsing

• Both lexical and POS tag features are used

• To avoid over-fitting, only features extracted from the surrounding tokens at each position are used
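
A minimal sketch of window-based lexical and POS features is shown below. The window size of one token on each side, the inclusion of the current token, the feature names, and the POS tags in the example are all assumptions for illustration; the slides do not specify them. The POS tags are assumed to come from an external tagger.

```python
def token_features(tokens, pos_tags, i, window=1):
    """Lexical and POS features from the tokens around position i."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset}]"] = tokens[j].lower()   # lexical feature
            feats[f"pos[{offset}]"] = pos_tags[j]          # POS tag feature
        else:
            feats[f"pad[{offset}]"] = True                 # beyond the sentence edge
    return feats


def sentence_features(tokens, pos_tags, window=1):
    """One feature dict per token, the usual input shape for a CRF toolkit."""
    return [token_features(tokens, pos_tags, i, window)
            for i in range(len(tokens))]


# Hypothetical tokens and tags; a real system would run a Twitter POS tagger.
tokens = ["Coffee", "lover"]
pos_tags = ["NN", "NN"]
features = sentence_features(tokens, pos_tags)
print(features[0])
```

Restricting features to a small local window keeps the feature space compact, which is one way to limit over-fitting to the automatically labeled phrases.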

Gold Standard

• Two annotators: graduate students

• 500 randomly sampled user biographies

• 1190 sentences

– Two annotators disagree on 10 sentences

– High agreement

Experiment

BL-700: the 700 most frequent phrases. We chose 700 because it gave the highest F-score among the values we tried.

Seed: uses the seed patterns to recognize interest tags.
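
For reference, the F-score used to compare methods can be computed as in the sketch below. It assumes exact matching of extracted tags against the gold standard, with each tag keyed by a sentence id; the slides do not state the matching criterion, and the sample tags are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (sentence_id, tag)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and predicted tags.
gold = {(0, "coffee"), (0, "manchester united"), (1, "photography"), (2, "guitar")}
predicted = {(0, "coffee"), (1, "photography"), (1, "travel")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```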

Extracted Patterns

Some popular patterns are:

• [Interest tag] + fan/lover/enthusiast

• I love + [interest tag]

• [Interest tag] is/are my life

Is it difficult to predict interest tags from users’ tweets?

We also applied TF-IDF ranking, which has been used to extract personalized user tags, to extract user interest tags.
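
A minimal sketch of TF-IDF ranking over tweets follows. It assumes each user's tweets are concatenated into one document, uses relative term frequency with a plain logarithmic inverse document frequency, and breaks ties alphabetically; the exact weighting used in the experiments is not given in the slides, and the sample users and tweets are invented.

```python
import math
from collections import Counter


def tfidf_top_terms(user_docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each user.

    user_docs maps a user id to the list of tokens from that user's tweets.
    """
    n_users = len(user_docs)
    doc_freq = Counter()
    for tokens in user_docs.values():
        doc_freq.update(set(tokens))          # count users, not occurrences

    ranked = {}
    for user, tokens in user_docs.items():
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_users / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked[user] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


# Hypothetical users and tweets.
user_docs = {
    "alice": "watching football tonight football is life".split(),
    "bob": "new camera tonight love my camera".split(),
    "carol": "coffee is life coffee tonight".split(),
}
top = tfidf_top_terms(user_docs, k=1)
print(top)
```

Terms that every user mentions (here "tonight") get a score of zero, so the ranking favors words that are frequent for one user but rare across users.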

• Interest tags extracted from a user’s biography are not necessarily reflected in the user’s posted tweets.

• They can work as supplementary information when profiling a user.

Summary

• We studied the problem of extracting interest tags from Twitter user biographies

• We automatically built noisy training data based on syntactic patterns

• We trained a CRF classifier on the noisy training data and achieved decent performance

• Interest tags extracted from Twitter user biographies may not be reflected in the user’s tweets

Thank you!

Questions?
