
Text Mining of Social Media: Going beyond the Text and Only the Text 31/7/2015

Text Mining of Social Media: Going beyond the Text and Only the Text

Timothy Baldwin

Source(s): http://people.eng.unimelb.edu.au/tbaldwin/pubs/wnut2015.pdf



Talk Outline

1 Introduction

2 Document Metadata
    - User Geolocation

3 Inter-document Graph Structure
    - User Geolocation
    - Vote Prediction
    - Document Similarity

4 Other Random Ideas: Putting it in Context

5 Conclusions


Introduction

NLP is inevitably focused on ... text, the whole text and nothing but the text, generally at the sentence or document level, ignoring the greater sentence/document context

Source(s): http://goo.gl/ZS6A4n



Introduction

Meanwhile, the network analysis and data mining communities largely ignore (or use very simple models of) the textual content of nodes, and focus instead on connections between them, in the form of graphs of different types

Source(s): http://goo.gl/AP0QPP



Thought Experiment I

In the absence of any other textual context, what does “no change” mean?



What if I provide non-textual context?



Source(s): http://40.media.tumblr.com/tumblr_lm6hqnxkHU1qeb9g8o1_500.jpg




Source(s): http://i.ytimg.com/vi/5uEwcBT8MFM/maxresdefault.jpg




Source(s): http://i.ytimg.com/vi/I0sk0ATOaws.jpg



Thought Experiment II

Text ↔ non-text predictive modelling


Thought Experiment II

Predict the text:




Thought Experiment II

Predict the image:




Because ... You’re all Individuals!

Source(s): https://whatdaddoes.files.wordpress.com/2014/02/life-of-brian.jpg



So What Context are We Talking About?

I used graphical context as illustrative examples, but what I am really talking about is:

Document metadata:
- author [Wang et al., 2011, Yogatama et al., 2011, Carter et al., 2013, Lui and Baldwin, 2014]
- author profile [Eisenstein et al., 2011, Bergsma et al., 2013, Volkova et al., 2013, Hovy, 2015]
- publisher/host site [Yogatama et al., 2011]
- timestamp [Yogatama et al., 2011]
- genre
- domain [Yogatama et al., 2011]

Document type
Document markup
Position within document [Li et al., 2015]
Extra-textual content (tables, graphs, etc.)


The Relevance to Social Media

This is particularly relevant to social media, as a lot of context is immediately accessible, in terms of:

- likes/favourites
- user metadata of different types
- message metadata of different types
- user “timeline”
- social network data of each user
- explicit interactions between users (favourites, mentions, shares/retweets/...)


Broad Aim

Aim: improve NLP through the use of document context, focusing in this talk on:

- document metadata
- inter-document graph structure

Concerns along the way:

- means of extracting context
- scalability of models
- model expressivity



What is User Geolocation?

Given a set of messages from a user, e.g.:

- Waiting for a tram in the rain in Collins St. A more typical Melbourne day today.
- Why you keep me up? I ain’t got no worries.
- New Aussie Hip Hop News: The Yarra stinks.
- Just had a rather thrilling albeit bumpy camel ride around Uluru - SO. MUCH. FUN!! Fancy joining me? Enter my comp here

predict their “home” (as distinct from “source” or “about”) location ... Melbourne, AU


Twitter User Geolocation: Overall

Methodology

1 Discretise the geolocation class space, e.g. using a k-d tree or via gazetteers [Roller et al., 2012, Han et al., 2012]
2 Classify each user by supervised classification, based on users with geotagged tweets, using either:
    1 the content of the users’ messages (“text”)
    2 user metadata
        - user-declared location (“loc”)
        - timezone (“tz”)
        - description (“desc”)
        - real name (“rname”)
    3 a combination of metadata and message content (“ALL”)

Dataset = Twitter-World: around 12M English tweets from 1.4M users based around the world [Han et al., 2012]; user location = centre of the closest city to the centroid of tweets.
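The "user location = centre of the closest city to the centroid of tweets" rule can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny `CITIES` gazetteer, the function names, and the use of a plain arithmetic mean of latitude/longitude as the "centroid" are all hypothetical choices, not details taken from Han et al. [2012].

```python
import math

# Hypothetical mini-gazetteer: city name -> (latitude, longitude) of city centre.
CITIES = {
    "Melbourne, AU": (-37.81, 144.96),
    "Sydney, AU": (-33.87, 151.21),
    "London, GB": (51.51, -0.13),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def home_city(tweet_coords, cities=CITIES):
    """Map a user's geotagged tweets to the city closest to their centroid.

    Assumption: the centroid is the arithmetic mean of latitudes and
    longitudes, which is adequate only for points that are not spread
    across the antimeridian or the poles.
    """
    lat = sum(p[0] for p in tweet_coords) / len(tweet_coords)
    lon = sum(p[1] for p in tweet_coords) / len(tweet_coords)
    return min(cities, key=lambda name: haversine_km((lat, lon), cities[name]))
```

For instance, a user with two tweets in Melbourne and one near Uluru would still be assigned to Melbourne under this sketch, since the centroid stays closer to Melbourne than to any other city in the gazetteer.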


User Geolocation: Results

[Bar chart: Acc@161 by feature set. text 0.49; loc 0.53; tz 0.17; desc 0.12; rname 0.11; ALL 0.67]

Source(s): Han et al. [2014]
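Acc@161 is read here as the proportion of users whose predicted location lies within 161 km of their home location. A minimal sketch of that metric, assuming predictions and gold locations are given as (latitude, longitude) pairs keyed by user; the function and variable names are illustrative only:

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def acc_at_161(predicted, gold, threshold_km=161.0):
    """Fraction of users in `gold` predicted within `threshold_km` of the truth."""
    hits = sum(
        1 for user, location in gold.items()
        if haversine_km(predicted[user], location) <= threshold_km
    )
    return hits / len(gold)
```

Because the metric is distance-based, a prediction for a neighbouring city within the threshold counts as correct even though the predicted class differs from the gold class.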


User Geolocation: Findings

User-declared location more accurate than text of posts from user; little information in other meta-data fields

The combination of the text and meta-data fields (based on a stacked logistic regression model) is more accurate again

Important to realise that pre-computing and storing imputed user priors for all users is potentially intractable

... but also note that all this metadata is in plain sight in the Twitter JSON object, along with the message text, although not necessarily as immediately accessible for other social media sources

... BUT still text-based; what about going beyond text?
















User Geolocation: Moving Forward

Much more metadata context that can be integrated, including:

- hashtag priors [Carter et al., 2013]
- content similarity
- language priors [Han et al., 2014]
- user “popularity”



Inter-document Graphs

Various possible sources of inter-document graphs:

- explicit inter-document interactions (e.g. mentions)
- explicit author-level interactions (e.g. following)
- implicit inter-document similarity (e.g. document overlap/similarity)

Various possibilities for graph semantics:

- directed vs. undirected
- weighted vs. unweighted
- single vs. multiple graphs



Network Analytics 101

One of the core concepts in network analytics is homophily: the tendency of individuals to associate and bond with similar others

- the corollary for network analytics is that strongly connected subgraphs tend to share the same label
- obvious analogies in clustering and classification, with the main difference being the presence/absence of an explicit graph

Sometimes connections actually represent heterophily, esp. in adversarial contexts such as debates



Approaches to Network Inference

Popular approaches to network inference:

label propagation: nearest neighbour-style iterative semi-supervised approach

collective classification: combine base and network classifiers to optimise consistency in the network

matrix factorisation: factorise the matrix into a product of lower-dimensional matrices


Label Propagation

Given a graph $G = (V, E, W)$ where $V$ is the set of nodes with $|V| = n = n_l + n_u$ (where $n_l$ nodes are labelled and $n_u$ nodes are unlabelled), $E$ is the set of edges, and $W$ is an edge weight matrix.

Simple iterative algorithm [Zhu and Ghahramani, 2002]:

1 for each node $u_u^{(i)} \in V_u$, get the set of labelled neighbours based on $E$, and label $u_u^{(i)}$ based on the (weighted) median latitude and longitude of the neighbours
2 repeat until convergence
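The two steps above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the cited work: the dictionary-based graph representation, the function names, the synchronous update schedule and the `max_iter` cap are all assumptions.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value


def propagate(edges, seeds, max_iter=10):
    """Label propagation over coordinates.

    edges: {node: {neighbour: weight}} (list both directions for an
           undirected graph)
    seeds: {node: (lat, lon)} for the labelled nodes, which stay fixed
    Returns {node: (lat, lon)} for every node that could be reached.
    """
    coords = dict(seeds)
    for _ in range(max_iter):
        updated = dict(coords)
        for node, neighbours in edges.items():
            if node in seeds:
                continue  # labelled nodes keep their gold location
            known = [(coords[n], w) for n, w in neighbours.items() if n in coords]
            if not known:
                continue  # no located neighbour yet; try again next round
            weights = [w for _, w in known]
            lat = weighted_median([c[0] for c, _ in known], weights)
            lon = weighted_median([c[1] for c, _ in known], weights)
            updated[node] = (lat, lon)
        if updated == coords:
            break  # converged: no estimate changed in this round
        coords = updated
    return coords
```

Nodes with no path to any labelled node never receive a location here, which is one motivation for backing off to text-based predictions for disconnected users.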


Label Propagation

Modified Adsorption [Talukdar and Crammer, 2009]:

$$C(\hat{Y}) = \sum_l \left[ \mu_1 (Y_l - \hat{Y}_l)^T S (Y_l - \hat{Y}_l) + \mu_2 \hat{Y}_l^T L \hat{Y}_l + \mu_3 \| \hat{Y}_l - R_l \|^2 \right]$$

where $\mu_1$, $\mu_2$ and $\mu_3$ are hyperparameters; $L$ is the Laplacian of an undirected graph derived from $G$; $S$ is a diagonal binary matrix indicating if a node is labelled or not; and $R_l$ is the $l$th column of matrix $R$ of dimensions $n \times (m+1)$.


Collective Classification

Collective classification: given a network and an object $o$ in the network, use (up to) three types of correlations to infer a label for $o$:

1 the correlations between the label of $o$ and its observed attributes
2 the correlations between the label of $o$ and the observed attributes and labels of nodes connected to $o$
3 the correlations between the label of $o$ and the unobserved labels of objects connected to $o$

Source(s): Sen et al. [2008]


Collective Classification

Formally, collective classification takes a graph, made up of:

- nodes $V = \{V_1, \ldots, V_n\}$
- edges $E$

The task is to label the nodes $V_i \in V$ from a label set $L = \{L_1, \ldots, L_q\}$, making use of the graph in the form of a neighborhood function $N = \{N_1, \ldots, N_n\}$, where $N_i \subseteq V \setminus \{V_i\}$.


Approaches to Collective Classification

Two general approaches to capturing the first two correlations:

- iterative classification: bootstrap node labels with a content-only classifier and generate a random ordering over nodes $V$, then iteratively update the estimate of $v_i$ based on the current $N_i$ and update $\vec{a}_i$ accordingly [local approach]
- dual classifier + graph inference: train separate content-only and link classifiers, and use graph inference (mean field, loopy belief propagation, min-cut, etc.) to “smooth” the predictions over the graph [global approach]

Source(s): Sen et al. [2008]
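A minimal sketch of the iterative (local) approach for a binary task, under several assumptions that are not from the source: the content-only classifier is represented by a precomputed score per node, the relational feature $\vec{a}_i$ is reduced to the mean of the neighbours' current labels, and the two are combined with a fixed hypothetical `link_weight` rather than a trained classifier.

```python
import random


def iterative_classification(content_score, neighbours, link_weight=0.5,
                             max_iter=10, seed=0):
    """Iteratively relabel nodes from content scores plus neighbour labels.

    content_score: {node: score}, positive meaning class +1, negative class -1
    neighbours:    {node: [neighbouring nodes]} (the neighbourhood function N)
    Returns {node: +1 or -1}.
    """
    # Bootstrap: label every node with the content-only decision.
    labels = {v: 1 if s >= 0 else -1 for v, s in content_score.items()}

    # Fix a random ordering over the nodes.
    order = list(content_score)
    random.Random(seed).shuffle(order)

    for _ in range(max_iter):
        changed = False
        for v in order:
            nbrs = neighbours.get(v, [])
            # Relational feature: mean of the neighbours' current labels.
            a_v = sum(labels[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
            new_label = 1 if content_score[v] + link_weight * a_v >= 0 else -1
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels
```

In this toy combination rule a weakly scored node is pulled towards the label of confidently scored neighbours, which is the homophily assumption at work; a heterophilous link would need a negative weight.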


User Geolocation: Enter the Network

The easiest way to generate a network for Twitter user geolocation is via @user mentions (e.g. @eltimster lovin the talk)

Question of what to do with user mentions outside the training/dev/test data sample

(one) solution = collapse edges through out-of-network nodes into direct edges

Weighted, directed graph the most obvious approach, but we have found unweighted, undirected graphs to work best

Modified Adsorption doesn’t scale well to large, highly-connected graphs, so consider removing edges associated with highly-connected users

Source(s): Jurgens [2013], Rahimi et al. [2015a,b]
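One way to picture the graph construction described on this slide: build an unweighted, undirected mention graph, replace each out-of-network node by direct edges between the in-network users who mention it, and optionally skip highly-mentioned nodes. The sketch below makes assumptions of its own: the input format, the function name, and the definition of a highly-connected node as one mentioned by more than `celebrity_threshold` distinct users are illustrative, not taken from the cited papers.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(mentions, in_network, celebrity_threshold=None):
    """Build an undirected, unweighted mention graph over in-network users.

    mentions:   {user: set of users they @mention}
    in_network: users that belong to the training/dev/test sample
    Returns a set of frozenset({u, v}) edges.
    """
    in_network = set(in_network)

    # Invert the mention lists: who mentions each target?
    mentioned_by = defaultdict(set)
    for user, targets in mentions.items():
        for target in targets:
            if target != user:
                mentioned_by[target].add(user)

    edges = set()
    for target, sources in mentioned_by.items():
        # Assumption: drop all edges involving nodes mentioned by too many users.
        if celebrity_threshold is not None and len(sources) > celebrity_threshold:
            continue
        sources = sources & in_network
        if target in in_network:
            # Ordinary mention between two in-network users.
            for source in sources:
                edges.add(frozenset((source, target)))
        else:
            # Collapse the out-of-network node into direct edges
            # between every pair of users who mention it.
            for u, v in combinations(sorted(sources), 2):
                edges.add(frozenset((u, v)))
    return edges
```

Collapsing a node mentioned by k users adds up to k(k-1)/2 edges, which suggests why highly-mentioned accounts make the graph hard to handle and why a threshold on them shrinks it sharply.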


User Geolocation: What about the Text?

The easiest way to integrate graph- and text-based classification is to use the text as a source of priors

Approach 1: use pointwise text-based user priors as backoff for disconnected nodes [post-processing]

Approach 2: use pointwise text-based user priors as priors for all unlabelled nodes [pre-processing]

with MAD, we incorporate the priors as “dongle” nodes (uniquely) connected to a given user


User Geolocation: Results

[Bar chart: Acc@161 by method. LR 0.63; LP 0.56; LP-LR 0.59; MAD 0.70; MAD-LR 0.72]

Source(s): Han et al. [2014], Rahimi et al. [2015a,b]


User Geolocation: “Celebrity Nodes”

[Figure: mean error (in km) and graph size (# edges, log scale) plotted against celebrity threshold T (# of mentions), for T from 2 to 5k]

Source(s): Rahimi et al. [2015a]



Little to separate text- or network-only results (LP < LR < MAD)

Both network-based methods improve with the incorporation of text-based user priors

In terms of computational efficiency, LP > LR ≫ MAD

Removal of highly-connected nodes leads to greater tractability and also better results for MAD


User Geolocation: Moving Forward

Much more network context that can be integrated, including:

- retweet interaction data
- time distribution data
- geographical similarity
- represent user by cluster of geotagged tweets rather than single node

More refined analysis of local vs. global “celebrities”

Matrix factorisation, and other graph inference methods


Vote Prediction: Task

Given the text for a given speaker in a political debate, e.g.:

Blackburn, Marsha (R) [for]
at this time , i would like to recognize the gentleman from texas ( mr. hensarling ) who has worked tirelessly not only on budgeting and not only on looking at how we budget , but looking at what ...

Hensarling, Jeb (R) [for]
mr. speaker , unless we enact h.r. 4297 and defeat the democratic substitute , americans will receive a most unwelcome christmas gift from the democrats ...

predict their vote (for or against)

User mentions take the form of direct mentions of others in the debate, and are provided as part of the dataset

Source(s): Thomas et al. [2006]


Vote Prediction Dataset: ConVote

Data: US congressional debates from 2005 [Thomas et al., 2006]

Mentions manually tagged in the data

                                        Total
Tokens                                  1.2M
Speeches                                1699
Debates                                 53
Average speakers/speeches per debate    32
Average tokens per speech               735
Proportion of For speeches              49%


Vote Prediction: Iterative Classifier

Source(s): Burfoot et al. [2011]


Citation Features: Representing Context

For iterative classification over ConVote, we experiment with three representations of citation:

1 citation count: what are the counts of nodes in $N_i$ that have the same/different class to $v_i$?
2 context window: generate a feature vector $L \times C$ over the context windows of each document in $N_i$
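The citation count representation can be illustrated with a small helper. It is a sketch only: the data structures and the name `citation_count_features` are hypothetical, and the counts are taken relative to the current label estimate of $v_i$ during iteration, which is one reading of the description above.

```python
def citation_count_features(node, neighbours, labels):
    """Count neighbours of `node` with the same / a different current label.

    neighbours: {node: [nodes it is linked to by citation]}
    labels:     {node: current class estimate, e.g. "for" or "against"}
    Returns a (same, different) pair.
    """
    linked = neighbours.get(node, [])
    same = sum(1 for other in linked if labels[other] == labels[node])
    different = len(linked) - same
    return same, different
```

These two counts would then be appended to the content features of the node before the next round of classification.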


Vote Prediction: Dual Classifier



Vote Prediction: Results

[Bar chart: accuracy by method. Baseline 0.51; Text 0.76; MinCut 0.78; Iterative (count) 0.80; Iterative (wind) 0.79; Dual (LBP) 0.82; Dual (MF) 0.82]



Vote Prediction: Findings

Once again, text- and graph-based methods produce similar accuracy in isolation

... and once again, the combination of the two performs better again

Dual classifiers tend to do slightly better than iterative classifiers



Vote Prediction: Moving Forward

Possibility of adding further context, including:

- joint modelling of stance [Sridhar et al., 2015]
- joint modelling of sentiment of citation context
- modelling of speech turn taking/chronology

Possibility of doing user modelling:

- cross-debate user modelling of different speakers/parties
- modelling of speaker “influence”


What about Implicit Graphs?

What happens if we don’t have any explicit mention data to generate our graph from?

ANSWER: we can still generate (complete) graphs through content similarity, e.g. based on text similarity


Implicit Graphs: ConVote

Generate an undirected weighted graph based on the cosine similarity between each document pair, represented by TF-IDF vectors $\langle \ldots, w_{i,j}, \ldots \rangle$ for each document $d_j$ where:

$$w_{i,j} = \frac{\mathbb{1}_{d_j}(w_i)}{1 + \log \sum_k \mathbb{1}_{d_k}(w_i)}$$

For iterative classification, calculate the average similarity score with neighbours of each class

Source(s): Burford et al. [2015]
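A small sketch of the implicit graph construction: binary term presence divided by one plus the log document frequency, followed by pairwise cosine similarity. The function names, the use of the natural logarithm, and the representation of documents as token lists (which could equally hold n-grams) are assumptions made for illustration.

```python
import math
from itertools import combinations


def tfidf_vectors(docs):
    """Weight each term present in a document by 1 / (1 + log df(term)).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return [{term: 1.0 / (1.0 + math.log(df[term])) for term in set(doc)}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(docs):
    """Complete undirected graph: {(i, j): similarity} for every pair i < j."""
    vectors = tfidf_vectors(docs)
    return {(i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}
```

Since every document pair receives an edge, the graph is complete and grows quadratically with the number of documents, so in practice one might threshold or keep only the top-weighted edges.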


Implicit Graphs: Results (5-grams)

[Bar chart: accuracy by method. Baseline 0.51; Text 0.76; Iterative 0.79; Dual (LBP) 0.76; Dual (MF) 0.77]

Source(s): Burford et al. [2015]



Other Random Ideas

Expanding our notion of “context” in distributional semantics beyond words to include (at least) the user dimension

Inference methods for multiple graph overlays

Expanding our notion of “domain” to include user context



Conclusions

There’s plenty of context out there in social media, in forms including:

- user metadata
- explicit mention graph
- implicit content similarity graph

There’s plenty of evidence to suggest that this context has high utility in NLP tasks

There’s also plenty of evidence to suggest that the combination of text and context analysis is a potent combination ... GET TO IT!


References

Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, and David Yarowsky. Broadly improving user classification via communication-based name and location clustering on Twitter. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013), pages 1010–1019, Atlanta, USA, 2013. URL http://www.aclweb.org/anthology/N13-1121.

Clinton Burfoot, Steven Bird, and Timothy Baldwin. Collective classification of congressional floor-debate transcripts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1506–1515, Portland, USA, 2011.

Clint Burford, Steven Bird, and Timothy Baldwin. Collective document classification with implicit inter-document semantic relationships. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015), pages 106–116, Denver, USA, 2015.

Simon Carter, Manos Tsagkias, and Wouter Weerkamp. Microblog language identification: overcoming the limitations of short, unedited and idiomatic text. Language Resources and Evaluation, 47(1):195–215, 2013.

Jacob Eisenstein, Noah A. Smith, and Eric P. Xing. Discovering sociolinguistic associations with structured sparsity. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1365–1374, Portland, USA, 2011. URL http://www.aclweb.org/anthology/P11-1137.

Bo Han, Paul Cook, and Timothy Baldwin. Geolocation prediction in social media data by finding location indicative words. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1045–1062, Mumbai, India, 2012.

Bo Han, Paul Cook, and Timothy Baldwin. Text-based Twitter user geolocation prediction. Journal of Artificial Intelligence Research, 49:451–500, 2014.

Dirk Hovy. Demographic factors improve classification performance. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 752–762, Beijing, China, 2015. URL http://www.aclweb.org/anthology/P15-1073.

David Jurgens. That’s what friends are for: Inferring location in online social media platforms based on social relationships. In Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM 2013), pages 273–282, Dublin, Ireland, 2013.

Jiwei Li, Thang Luong, and Dan Jurafsky. A hierarchical neural autoencoder for paragraphs and documents. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 1106–1115, Beijing, China, 2015. URL http://aclweb.org/anthology/P15-1107.

Marco Lui and Timothy Baldwin. Accurate language identification of Twitter messages. In Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), pages 17–25, Gothenburg, Sweden, 2014. URL http://www.aclweb.org/anthology/W14-1303.

Afshin Rahimi, Trevor Cohn, and Timothy Baldwin. Twitter user geolocation using a unified text and network prediction model. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 630–636, Beijing, China, 2015a.

Afshin Rahimi, Duy Vu, Trevor Cohn, and Timothy Baldwin. Exploiting text and network context for geolocation of social media users. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015), pages 1362–1367, Denver, USA, 2015b.

Stephen Roller, Michael Speriosu, Sarat Rallapalli, Benjamin Wing, and Jason Baldridge. Supervised text-based geolocation using language models on an adaptive grid. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning 2012 (EMNLP-CoNLL 2012), pages 1500–1510, Jeju Island, Korea, 2012. URL http://www.aclweb.org/anthology/D12-1137.

Prithviraj Sen, Galileo Mark Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–106, 2008.

Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. Joint models of disagreement and stance in online debate. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 116–125, Beijing, China, 2015. URL http://www.aclweb.org/anthology/P15-1012.

Partha Pratim Talukdar and Koby Crammer. New regularized algorithms for transductive learning. In Proceedings of the European Conference on Machine Learning (ECML-PKDD) 2009, pages 442–457, 2009.

Matt Thomas, Bo Pang, and Lillian Lee. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 327–335, Sydney, Australia, 2006.

Svitlana Volkova, Theresa Wilson, and David Yarowsky. Exploring demographic language variations to improve multilingual sentiment analysis in social media. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pages 1815–1827, Seattle, USA, 2013.

Li Wang, Marco Lui, Su Nam Kim, Joakim Nivre, and Timothy Baldwin. Predicting thread discourse structure over technical web forums. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 13–25, Edinburgh, UK, 2011.

Dani Yogatama, Michael Heilman, Brendan O’Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith. Predicting a scientific community’s response to an article. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 594–604, Edinburgh, UK, 2011. URL http://www.aclweb.org/anthology/D11-1055.

Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.