Text Mining of Social Media: Going beyond the Text and Only the Text 31/7/2015
Text Mining of Social Media: Going beyond the Text and Only the Text
Timothy Baldwin
Source(s): http://people.eng.unimelb.edu.au/tbaldwin/pubs/wnut2015.pdf
Talk Outline
1 Introduction
2 Document Metadata
    User Geolocation
3 Inter-document Graph Structure
    User Geolocation
    Vote Prediction
    Document Similarity
4 Other Random Ideas: Putting it in Context
5 Conclusions
Introduction
NLP is inevitably focused on ... text, the whole text and nothing but the text, generally at the sentence or document level, ignoring the greater sentence/document context
Source(s): http://goo.gl/ZS6A4n
Meanwhile, the network analysis and data mining communities largely ignore (or use very simple models of) the textual content of nodes, and focus instead on connections between them, in the form of graphs of different types
Source(s): http://goo.gl/AP0QPP
Thought Experiment I
In the absence of any other textual context, what does “no change” mean?
What if I provide non-textual context?
Source(s): http://40.media.tumblr.com/tumblr_lm6hqnxkHU1qeb9g8o1_500.jpg
Source(s): http://i.ytimg.com/vi/5uEwcBT8MFM/maxresdefault.jpg
Source(s): http://i.ytimg.com/vi/I0sk0ATOaws.jpg
Thought Experiment II
Text ↔ non-text predictive modelling
Predict the text:
Source(s): http://40.media.tumblr.com/tumblr_lm6hqnxkHU1qeb9g8o1_500.jpg
Predict the image:
Source(s): http://40.media.tumblr.com/tumblr_lm6hqnxkHU1qeb9g8o1_500.jpg
Because ... You’re all Individuals!
Source(s): https://whatdaddoes.files.wordpress.com/2014/02/life-of-brian.jpg
So What Context are We Talking About?
I used graphical context as illustrative examples, but what I am really talking about is:
Document metadata:
    author [Wang et al., 2011, Yogatama et al., 2011, Carter et al., 2013, Lui and Baldwin, 2014]
    author profile [Eisenstein et al., 2011, Bergsma et al., 2013, Volkova et al., 2013, Hovy, 2015]
    publisher/host site [Yogatama et al., 2011]
    timestamp [Yogatama et al., 2011]
    genre
    domain [Yogatama et al., 2011]
Document type
Document markup
Position within document [Li et al., 2015]
Extra-textual content (tables, graphs, etc.)
The Relevance to Social Media
This is particularly relevant to social media, as a lot of context is immediately accessible, in terms of:
    likes/favourites
    user metadata of different types
    message metadata of different types
    user “timeline”
    social network data of each user
    explicit interactions between users (favourites, mentions, shares/retweets/...)
Broad Aim
Aim: improve NLP through the use of document context, focusing in this talk on:
    document metadata
    inter-document graph structure
Concerns along the way:
    means of extracting context
    scalability of models
    model expressivity
What is User Geolocation?
Given a set of messages from a user, e.g.:
    Waiting for a tram in the rain in Collins St. A more typical Melbourne day today.
    Why you keep me up? I ain’t got no worries.
    New Aussie Hip Hop News: The Yarra stinks.
    Just had a rather thrilling albeit bumpy camel ride around Uluru - SO. MUCH. FUN!! Fancy joining me? Enter my comp here
predict their “home” (as distinct from “source” or “about”) location
For the example messages above, the predicted home location is Melbourne, AU.
Twitter User Geolocation: Overall
Methodology
1 Discretise the geolocation class space, e.g. using a k-d tree or via gazetteers [Roller et al., 2012, Han et al., 2012]
2 Classify each user by supervised classification, based on users with geotagged tweets, using either:
    1 the content of the users’ messages (“text”)
    2 user metadata:
        user-declared location (“loc”)
        timezone (“tz”)
        description (“desc”)
        real name (“rname”)
    3 a combination of metadata and message content (“ALL”)
Dataset = Twitter-World: around 12M English tweets from 1.4M users based around the world [Han et al., 2012]; user location = centre of the closest city to the centroid of tweets.
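The labelling rule at the end of this slide (user location = centre of the closest city to the centroid of the user's tweets) can be sketched as follows. This is only an illustrative sketch: the city table, the example coordinates, the function names, and the use of a simple arithmetic mean as the centroid are all assumptions, not details taken from the dataset construction.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def centroid(points):
    """Arithmetic mean of (lat, lon) pairs (assumption: ignores date-line wrap-around)."""
    lats, lons = zip(*points)
    return (sum(lats) / len(lats), sum(lons) / len(lons))

def home_city(tweet_coords, cities):
    """Name of the city whose centre is closest to the centroid of the tweets."""
    c = centroid(tweet_coords)
    return min(cities, key=lambda name: haversine_km(c, cities[name]))

# Hypothetical gazetteer of city centres (approximate coordinates).
CITIES = {
    "Melbourne, AU": (-37.81, 144.96),
    "Sydney, AU": (-33.87, 151.21),
    "Alice Springs, AU": (-23.70, 133.88),
}
# Hypothetical geotagged tweets: three in Melbourne, one near Uluru.
tweets = [(-37.82, 144.95), (-37.80, 144.97), (-37.81, 144.96), (-25.34, 131.04)]
home = home_city(tweets, CITIES)
```

The single trip to Uluru pulls the centroid north-west, but the closest city centre is still Melbourne, which is the sense in which “home” differs from the “source” of any individual message.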
User Geolocation: Results
Acc@161 by feature set:
Features   Acc@161
text       0.49
loc        0.53
tz         0.17
desc       0.12
rname      0.11
ALL        0.67
Source(s): Han et al. [2014]
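The evaluation measure in these results, Acc@161, is commonly read as the proportion of users whose predicted location falls within 161 km (roughly 100 miles) of their gold location. A minimal sketch under that reading is given below; the function names and the example coordinates are assumptions made for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def acc_at_161(predicted, gold, threshold_km=161.0):
    """Fraction of predictions within threshold_km of the gold location."""
    hits = sum(haversine_km(p, g) <= threshold_km for p, g in zip(predicted, gold))
    return hits / len(gold)

# Hypothetical users: gold homes in Melbourne and Sydney.
gold = [(-37.81, 144.96), (-33.87, 151.21)]
# First prediction is near Geelong (well within 161 km of Melbourne);
# second predicts Melbourne for the Sydney user (several hundred km off).
pred = [(-38.15, 144.36), (-37.81, 144.96)]
score = acc_at_161(pred, gold)
```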
User Geolocation: Findings
User-declared location more accurate than text of posts from user; little information in other metadata fields
The combination of the text and metadata fields (based on a stacked logistic regression model) is more accurate again
Important to realise that pre-computing and storing imputed user priors for all users is potentially intractable
... but also note that all this metadata is in plain sight in the Twitter JSON object, along with the message text, although not necessarily as immediately accessible for other social media sources
... BUT still text-based; what about going beyond text?
User Geolocation: Moving Forward
Much more metadata context that can be integrated, including:
    hashtag priors [Carter et al., 2013]
    content similarity
    language priors [Han et al., 2014]
    user “popularity”
Inter-document Graphs
Various possible sources of inter-document graphs:
    explicit inter-document interactions (e.g. mentions)
    explicit author-level interactions (e.g. following)
    implicit inter-document similarity (e.g. document overlap/similarity)
Various possibilities for graph semantics:
    directed vs. undirected
    weighted vs. unweighted
    single vs. multiple graphs
Network Analytics 101
One of the core concepts in network analytics is homophily: the tendency of individuals to associate and bond with similar others
    the corollary for network analytics is that strongly connected subgraphs tend to share the same label
    obvious analogies in clustering and classification, with the main difference being the presence/absence of an explicit graph
Sometimes connections actually represent heterophily, esp. in adversarial contexts such as debates
Approaches to Network Inference
Popular approaches to network inference:
    label propagation: nearest neighbour-style iterative semi-supervised approach
    collective classification: combine base and network classifiers to optimise consistency in the network
    matrix factorisation: factorise the matrix into a product of lower-dimensional matrices
Label Propagation
Given a graph $G = (V, E, W)$, where $V$ is the set of nodes with $|V| = n = n_l + n_u$ (where $n_l$ nodes are labelled and $n_u$ nodes are unlabelled), $E$ is the set of edges, and $W$ is an edge weight matrix.
Simple iterative algorithm [Zhu and Ghahramani, 2002]:
1 for each node $u_u^{(i)} \in V_u$, get the set of labelled neighbours based on $E$, and label $u_u^{(i)}$ based on the (weighted) median latitude and longitude of the neighbours
2 repeat until convergence
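The two-step loop above can be sketched in a few lines. This is a minimal illustration rather than the implementation behind the reported results: the dictionary-based graph representation, the particular weighted-median definition, the iteration cap, and the toy node names are all assumptions.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]

def propagate(edges, seeds, max_iters=10):
    """edges: {node: {neighbour: weight}}; seeds: {node: (lat, lon)} for labelled nodes."""
    labels = dict(seeds)
    for _ in range(max_iters):
        updated = dict(labels)
        for node, nbrs in edges.items():
            if node in seeds:
                continue  # labelled nodes keep their known location
            known = [(labels[n], w) for n, w in nbrs.items() if n in labels]
            if not known:
                continue  # no labelled neighbour yet; try again next round
            lats = [loc[0] for loc, _ in known]
            lons = [loc[1] for loc, _ in known]
            ws = [w for _, w in known]
            updated[node] = (weighted_median(lats, ws), weighted_median(lons, ws))
        if updated == labels:
            break  # converged: no label changed in this pass
        labels = updated
    return labels

# Toy graph: u mentions three labelled users and one unlabelled user v.
edges = {
    "a": {"u": 1.0}, "b": {"u": 1.0}, "c": {"u": 1.0},
    "u": {"a": 1.0, "b": 1.0, "c": 1.0, "v": 1.0},
    "v": {"u": 1.0},
}
seeds = {"a": (-37.8, 145.0), "b": (-37.9, 144.9), "c": (-33.9, 151.2)}
result = propagate(edges, seeds)
```

Using the median rather than the mean keeps `u` near its two Melbourne-area neighbours instead of placing it somewhere between Melbourne and Sydney; `v` then inherits its location from `u` on the following pass.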
Modified Adsorption [Talukdar and Crammer, 2009]:
$$C(\hat{Y}) = \sum_{l} \left[ \mu_1 (Y_l - \hat{Y}_l)^{T} S (Y_l - \hat{Y}_l) + \mu_2 \hat{Y}_l^{T} L \hat{Y}_l + \mu_3 \lVert \hat{Y}_l - R_l \rVert^2 \right]$$
where $\mu_1$, $\mu_2$ and $\mu_3$ are hyperparameters; $Y_l$ and $\hat{Y}_l$ are the seed and estimated label columns for label $l$; $L$ is the Laplacian of an undirected graph derived from $G$; $S$ is a diagonal binary matrix indicating if a node is labelled or not; and $R_l$ is the $l$th column of matrix $R$ of dimensions $n \times (m+1)$.
Collective Classification
Collective classification: given a network and an object o in the network, use (up to) three types of correlations to infer a label for o:
1 the correlations between the label of o and its observed attributes
2 the correlations between the label of o and the observed attributes and labels of nodes connected to o
3 the correlations between the label of o and the unobserved labels of objects connected to o
Source(s): Sen et al. [2008]
Formally, collective classification takes a graph, made up of:
    nodes $V = \{V_1, \ldots, V_n\}$
    edges $E$
The task is to label the nodes $V_i \in V$ from a label set $L = \{L_1, \ldots, L_q\}$, making use of the graph in the form of a neighborhood function $N = \{N_1, \ldots, N_n\}$, where $N_i \subseteq V \setminus \{V_i\}$.
Approaches to Collective Classification
Two general approaches to capturing the first two correlations:
    iterative classification: bootstrap node labels with a content-only classifier and generate a random ordering over nodes $V$, then iteratively update the estimate of $v_i$ based on the current $N_i$ and update $\vec{a}_i$ accordingly [local approach]
    dual classifier + graph inference: train separate content-only and link classifiers, and use graph inference (mean field, loopy belief propagation, min-cut, etc.) to “smooth” the predictions over the graph [global approach]
Source(s): Sen et al. [2008]
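The iterative (local) approach described above can be sketched as follows. This is a simplified stand-in, not the method of Sen et al. [2008]: a trained local classifier is replaced by a linear interpolation between a content-only score and the share of neighbours currently labelled positive, and the `alpha` weight, the binary label set, and the toy inputs are all assumptions.

```python
import random

def iterative_classification(content_scores, neighbours, alpha=0.5, max_iters=20, seed=0):
    """content_scores: {node: P(label == 1)} from a content-only classifier (assumed given).
    neighbours: {node: set of neighbouring nodes}. Returns {node: 0 or 1}."""
    # Bootstrap: label every node from its content score alone.
    labels = {v: int(p >= 0.5) for v, p in content_scores.items()}
    order = list(content_scores)
    random.Random(seed).shuffle(order)  # random ordering over the nodes
    for _ in range(max_iters):
        changed = False
        for v in order:
            nbrs = neighbours.get(v, ())
            if nbrs:
                # Relational feature: share of neighbours currently labelled 1.
                frac = sum(labels[n] for n in nbrs) / len(nbrs)
                score = alpha * content_scores[v] + (1 - alpha) * frac
            else:
                score = content_scores[v]  # isolated node: content only
            new_label = int(score >= 0.5)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break  # labels are stable across a full pass
    return labels

# Toy example: c is borderline on content alone but linked to two confident nodes.
content_scores = {"a": 0.9, "b": 0.8, "c": 0.45, "d": 0.1}
neighbours = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
labels = iterative_classification(content_scores, neighbours)
```

Node `c` would be labelled 0 from content alone, but its neighbours pull it to 1 once their labels are taken into account; node `d` has no neighbours and stays with its content-only label.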
User Geolocation: Enter the Network
The easiest way to generate a network for Twitter user geolocation is via @user mentions (e.g. “@eltimster lovin the talk”)
Question of what to do with user mentions outside the training/dev/test data sample
    (one) solution = collapse edges through out-of-network nodes into direct edges
Weighted, directed graph the most obvious approach, but we have found unweighted, undirected graphs to work best
Modified Adsorption doesn’t scale too well to large, highly-connected graphs, so consider removing edges associated with highly-connected users
Source(s): Jurgens [2013], Rahimi et al. [2015a,b]
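The graph construction described on this slide (an undirected, unweighted @mention graph, with out-of-network nodes collapsed into direct edges and highly-connected users removed) can be sketched as below. The function name, the input format, and the reading of “highly-connected” as “mentioned by more than a threshold number of distinct users” are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_mention_graph(mentions, known_users, celebrity_threshold):
    """mentions: iterable of (author, mentioned_user) pairs taken from @user mentions.
    Returns an undirected, unweighted graph {user: set(users)} over known_users."""
    mentioned_by = defaultdict(set)
    for author, target in mentions:
        if author != target:
            mentioned_by[target].add(author)

    graph = {u: set() for u in known_users}
    for target, authors in mentioned_by.items():
        if len(authors) > celebrity_threshold:
            continue  # drop all edges of highly-mentioned ("celebrity") users
        in_net = sorted(a for a in authors if a in known_users)
        if target in known_users:
            for a in in_net:
                graph[a].add(target)
                graph[target].add(a)
        else:
            # Out-of-network node: connect the users who mention it directly.
            for a, b in combinations(in_net, 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph

# Hypothetical sample: "x" and "star" are outside the data sample.
known = {"a", "b", "c", "d"}
mentions = [
    ("a", "b"),
    ("a", "x"), ("c", "x"),
    ("a", "star"), ("b", "star"), ("c", "star"), ("d", "star"),
]
graph = build_mention_graph(mentions, known, celebrity_threshold=3)
```

Here `a` and `c` become neighbours because both mention the out-of-network user `x`, while `star` contributes no edges because it exceeds the threshold.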
User Geolocation: What about the Text?
The easiest way to integrate graph- and text-based classification is to use the text as a source of priors
    Approach 1: use pointwise text-based user priors as backoff for disconnected nodes [post-processing]
    Approach 2: use pointwise text-based user priors as priors for all unlabelled nodes [pre-processing]
        with MAD, we incorporate the priors as “dongle” nodes (uniquely) connected to a given user
User Geolocation: Results
Acc@161 by method:
Method    Acc@161
LR        0.63
LP        0.56
LP-LR     0.59
MAD       0.70
MAD-LR    0.72
Source(s): Han et al. [2014], Rahimi et al. [2015a,b]
User Geolocation: “Celebrity Nodes”
[Figure: mean error (in km) and graph size (# edges) plotted against the celebrity threshold T (# of mentions), for T ranging from 2 to 5k]
Source(s): Rahimi et al. [2015a]
User Geolocation: Findings
Little to separate text- or network-only results (LP < LR < MAD)
Both network-based methods improve with the incorporation of text-based user priors
In terms of computational efficiency, LP > LR ≫ MAD
Removal of highly-connected nodes leads to greater tractability and also better results for MAD
User Geolocation: Moving Forward
Much more network context that can be integrated, including:
    retweet interaction data
    time distribution data
    geographical similarity
    represent user by cluster of geotagged tweets rather than single node
More refined analysis of local vs. global “celebrities”
Matrix factorisation, and other graph inference methods
Vote Prediction: Task
Given the text for a given speaker in a political debate, e.g.:
    Blackburn, Marsha (R)
    at this time , i would like to recognize the gentleman from texas ( mr. hensarling ) who has worked tirelessly not only on budgeting and not only on looking at how we budget , but looking at what ...
    Hensarling, Jeb (R)
    mr. speaker , unless we enact h.r. 4297 and defeat the democratic substitute , americans will receive a most unwelcome christmas gift from the democrats ...
predict their vote (for or against)
User mentions take the form of direct mentions of others in the debate, and are provided as part of the dataset
Source(s): Thomas et al. [2006]
In this example, both speakers (Blackburn and Hensarling) voted “for”.
Vote Prediction Dataset: ConVote
Data: US congressional debates from 2005 [Thomas et al., 2006]
Mentions manually tagged in the data
Statistic                                 Total
Tokens                                    1.2M
Speeches                                  1699
Debates                                   53
Average speakers/speeches per debate      32
Average tokens per speech                 735
Proportion of For speeches                49%
Vote Prediction: Iterative Classifier
Source(s): Burfoot et al. [2011]
Citation Features: Representing Context
For iterative classification over ConVote, we experiment with three representations of citation:
1 citation count: what are the counts of nodes in $N_i$ that have the same/different class to $v_i$?
2 context window: generate a feature vector $L \times C$ over the context windows of each document in $N_i$
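The citation count representation can be sketched as a pair of counts per node, computed from the current label estimates at each iteration. The function name, the dictionary inputs, and the third speaker `speaker_x` are hypothetical and used only for illustration.

```python
def citation_counts(node, neighbours, labels):
    """Return (same, different): how many nodes in the neighbourhood of `node`
    currently carry the same / a different class label to `node` itself."""
    own = labels[node]
    nbrs = neighbours.get(node, ())
    same = sum(1 for n in nbrs if labels[n] == own)
    return same, len(nbrs) - same

# Toy neighbourhood built from mentions; labels are the current estimates.
neighbours = {"blackburn": ["hensarling", "speaker_x"]}
labels = {"blackburn": "for", "hensarling": "for", "speaker_x": "against"}
counts = citation_counts("blackburn", neighbours, labels)
```

Because the counts depend on the neighbours' current labels, they need to be recomputed on every pass of the iterative classifier.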
Vote Prediction: Dual Classifier
Source(s): Burfoot et al. [2011]
Vote Prediction: Results
Accuracy by method:
Method              Accuracy
Baseline            0.51
Text                0.76
MinCut              0.78
Iterative (count)   0.80
Iterative (wind)    0.79
Dual (LBP)          0.82
Dual (MF)           0.82
Source(s): Burfoot et al. [2011]
Vote Prediction: Findings
Once again, text- and graph-based methods produce similar accuracy in isolation
... and once again, the combination of the two performs better still
Dual classifiers tend to do slightly better than iterative classifiers
Source(s): Burfoot et al. [2011]
Vote Prediction: Moving Forward
Possibility of adding further context, including:
    joint modelling of stance [Sridhar et al., 2015]
    joint modelling of sentiment of citation context
    modelling of speech turn taking/chronology
Possibility of doing user modelling:
    cross-debate user modelling of different speakers/parties
    modelling of speaker “influence”
What about Implicit Graphs?
What happens if we don’t have any explicit mention data to generate our graph from?
ANSWER: we can still generate (complete) graphs through content similarity, e.g. based on text similarity
Implicit Graphs: ConVote
Generate an undirected weighted graph based on the cosine similarity between each document pair, represented by TF-IDF vectors $\langle \ldots, w_{i,j}, \ldots \rangle$ for each document $d_j$ where:
$$w_{i,j} = \frac{\mathbb{1}_{d_j}(w_i)}{1 + \log \sum_{k} \mathbb{1}_{d_k}(w_i)}$$
For iterative classification, calculate the average similarity score with neighbours of each class
Source(s): Burford et al. [2015]
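The similarity graph described on this slide can be sketched as follows, reading the indicator in the weighting formula as 1 when a term occurs in a document and 0 otherwise, so that the denominator is one plus the log of the term's document frequency. That reading, the natural logarithm, the word-level tokens, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Each term present in a document gets
    weight 1 / (1 + log df), where df is its document frequency."""
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: 1.0 / (1.0 + math.log(df[t])) for t in set(doc)} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def similarity_graph(docs):
    """Complete undirected weighted graph: {(i, j): similarity} for i < j."""
    vecs = tfidf_vectors(docs)
    return {(i, j): cosine(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))}

# Hypothetical tokenised speeches.
docs = [
    ["tax", "cut", "bill"],
    ["tax", "cut", "vote"],
    ["health", "care"],
]
sim = similarity_graph(docs)
```

Documents 0 and 1 share two of their three terms and receive a positive edge weight, while document 2 shares nothing with either and gets weight 0; the average similarity to neighbours of each class can then be computed from these edge weights.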
Implicit Graphs: Results (5-grams)
Accuracy by method:
Method        Accuracy
Baseline      0.51
Text          0.76
Iterative     0.79
Dual (LBP)    0.76
Dual (MF)     0.77
Source(s): Burford et al. [2015]
Other Random Ideas
Expanding our notion of “context” in distributional semantics beyond words to include (at least) the user dimension
Inference methods for multiple graph overlays
Expanding our notion of “domain” to include user context
Conclusions
There’s plenty of context out there in social media, in forms including:
    user metadata
    explicit mention graph
    implicit content similarity graph
There’s plenty of evidence to suggest that this context has high utility in NLP tasks
There’s also plenty of evidence to suggest that combining text and context analysis is a potent approach ... GET TO IT!
References
Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, and David Yarowsky. Broadly improving user classification via communication-based name and location clustering on Twitter. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013), pages 1010–1019, Atlanta, USA, 2013. URL http://www.aclweb.org/anthology/N13-1121.
Clinton Burfoot, Steven Bird, and Timothy Baldwin. Collective classification of congressional floor-debate transcripts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1506–1515, Portland, USA, 2011.
Clint Burford, Steven Bird, and Timothy Baldwin. Collective document classification with implicit inter-document semantic relationships. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015), pages 106–116, Denver, USA, 2015.
Simon Carter, Manos Tsagkias, and Wouter Weerkamp. Microblog language identification: overcoming the limitations of short, unedited and idiomatic text. Language Resources and Evaluation, 47(1):195–215, 2013.
Jacob Eisenstein, Noah A. Smith, and Eric P. Xing. Discovering sociolinguistic associations with structured sparsity. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1365–1374, Portland, USA, 2011. URL http://www.aclweb.org/anthology/P11-1137.
Bo Han, Paul Cook, and Timothy Baldwin. Geolocation prediction in social media data by finding location indicative words. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1045–1062, Mumbai, India, 2012.
Bo Han, Paul Cook, and Timothy Baldwin. Text-based Twitter user geolocation prediction. Journal of Artificial Intelligence Research, 49:451–500, 2014.
Dirk Hovy. Demographic factors improve classification performance. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 752–762, Beijing, China, 2015. URL http://www.aclweb.org/anthology/P15-1073.
David Jurgens. That’s what friends are for: Inferring location in online social media platforms based on social relationships. In Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM 2013), pages 273–282, Dublin, Ireland, 2013.
Jiwei Li, Thang Luong, and Dan Jurafsky. A hierarchical neural autoencoder for paragraphs and documents. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 1106–1115, Beijing, China, 2015. URL http://aclweb.org/anthology/P15-1107.
Marco Lui and Timothy Baldwin. Accurate language identification of Twitter messages. In Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), pages 17–25, Gothenburg, Sweden, 2014. URL http://www.aclweb.org/anthology/W14-1303.
Afshin Rahimi, Trevor Cohn, and Timothy Baldwin. Twitter user geolocation using a unified text and network prediction model. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 630–636, Beijing, China, 2015a.
Afshin Rahimi, Duy Vu, Trevor Cohn, and Timothy Baldwin. Exploiting text and network context for geolocation of social media users. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015), pages 1362–1367, Denver, USA, 2015b.
Stephen Roller, Michael Speriosu, Sarat Rallapalli, Benjamin Wing, and Jason Baldridge. Supervised text-based geolocation using language models on an adaptive grid. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning 2012 (EMNLP-CoNLL 2012), pages 1500–1510, Jeju Island, Korea, 2012. URL http://www.aclweb.org/anthology/D12-1137.
Prithviraj Sen, Galileo Mark Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–106, 2008.
Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. Joint models of disagreement and stance in online debate. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), pages 116–125, Beijing, China, 2015. URL http://www.aclweb.org/anthology/P15-1012.
Partha Pratim Talukdar and Koby Crammer. New regularized algorithms for transductive learning. In Proceedings of the European Conference on Machine Learning (ECML-PKDD) 2009, pages 442–457, 2009.
Matt Thomas, Bo Pang, and Lillian Lee. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 327–335, Sydney, Australia, 2006.
Svitlana Volkova, Theresa Wilson, and David Yarowsky. Exploring demographic language variations to improve multilingual sentiment analysis in social media. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pages 1815–1827, Seattle, USA, 2013.
Li Wang, Marco Lui, Su Nam Kim, Joakim Nivre, and Timothy Baldwin. Predicting thread discourse structure over technical web forums. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 13–25, Edinburgh, UK, 2011.
Dani Yogatama, Michael Heilman, Brendan O’Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith. Predicting a scientific community’s response to an article. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 594–604, Edinburgh, UK, 2011. URL http://www.aclweb.org/anthology/D11-1055.
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.