Page 1

Making natural language processing robust to sociolinguistic variation

Jacob Eisenstein (@jacobeisenstein)

Georgia Institute of Technology

September 9, 2017

Pages 2–4

Machine reading: from text to structured representations.

Annotate and train.

New domains of digitized texts offer opportunities as well as challenges.

Pages 5–6

Language data then and now

Then: news text, a small set of authors, professionally edited, fixed style.

Now: open domain, everyone is an author, unedited, many styles.

Pages 7–9

Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013):

‣ tacit assumptions about audience knowledge
‣ language variation across social groups

(Gimpel et al., 2011; Ritter et al., 2011; Foster et al., 2011)

Pages 10–12

Finding tacit context in the social network

‣ Social media texts lack context, because it is implicit between the writer and the reader.
‣ Homophily: socially connected individuals tend to share traits.

Page 13

Assortativity of entity references

Pages 14–17

We project embeddings for entities, words, and authors into a shared semantic space.

(example mentions: “Dirk Novitsky”, “the warriors”)

Inner products in this space indicate compatibility.

Pages 18–21

Socially-Infused Entity Linking

(diagram: tweet, entity assignments, author)

‣ g1 is employed to model surface features.
‣ g2 is used to capture two assumptions:
  ‣ entity homophily
  ‣ semantically related mentions tend to refer to similar entities

Pages 22–23

Socially-Infused Entity Linking: loss-augmented training

g_2(x, y_t, u, t; \Theta_2) = v_u^{(u)\top} W^{(u,e)} v_{y_t}^{(e)} + v_t^{(m)\top} W^{(m,e)} v_{y_t}^{(e)}

where v_u^{(u)} is the author embedding, v_t^{(m)} is the mention embedding, v_{y_t}^{(e)} is the entity embedding, and g_1 scores the surface features \phi(x, y_t, t).
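The bilinear score above can be sketched in a few lines of NumPy. The array names and dimensions below are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
d_author, d_mention, d_entity = 8, 16, 16  # illustrative dimensions

# Embeddings: author u, mention t, and a candidate entity y_t.
v_u = rng.normal(size=d_author)       # author embedding v^(u)_u
v_m = rng.normal(size=d_mention)      # mention embedding v^(m)_t
v_e = rng.normal(size=d_entity)       # entity embedding v^(e)_{y_t}

# Learned bilinear interaction matrices, Theta_2 = (W_ue, W_me).
W_ue = rng.normal(size=(d_author, d_entity))
W_me = rng.normal(size=(d_mention, d_entity))

def g2(v_u, v_m, v_e, W_ue, W_me):
    """Compatibility of an (author, mention) pair with a candidate entity:
    author-entity term plus mention-entity term."""
    return v_u @ W_ue @ v_e + v_m @ W_me @ v_e

score = g2(v_u, v_m, v_e, W_ue, W_me)
print(score)  # a single scalar compatibility score
```

At decoding time, this score would be computed for every candidate entity of a mention and combined with the surface-feature score g1.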

Pages 24–27

Learning

‣ Loss-augmented inference: Hamming loss
‣ Optimization: stochastic gradient descent
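Loss-augmented inference adds each candidate labeling's Hamming loss against the gold labels to its model score, so training concentrates on high-scoring wrong answers. A minimal sketch, with toy scores and labels that are purely illustrative:

```python
# Toy loss-augmented decoding with a Hamming loss (illustrative scores,
# not the talk's actual feature functions).

def hamming_loss(y_hat, y_gold):
    """Number of positions where the predicted entity differs from gold."""
    return sum(a != b for a, b in zip(y_hat, y_gold))

def loss_augmented_decode(candidates, score, y_gold):
    """Pick the labeling that maximizes model score + Hamming loss."""
    return max(candidates, key=lambda y: score(y) + hamming_loss(y, y_gold))

# Two mentions, each assigned an entity id or "Nil".
y_gold = ("RedSox", "Nil")
candidates = [("RedSox", "Nil"), ("Red", "Sox"), ("Nil", "Nil")]
toy_scores = {("RedSox", "Nil"): 2.0, ("Red", "Sox"): 1.5, ("Nil", "Nil"): 0.5}

y_bad = loss_augmented_decode(candidates, toy_scores.get, y_gold)
print(y_bad)  # ('Red', 'Sox'): score plus loss (1.5 + 2) beats gold (2.0 + 0)
```

The SGD step then pushes the gold labeling's score up and the loss-augmented winner's score down whenever they differ.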

Page 28

Inference

‣ Non-overlapping structure: in order to link “Red Sox” to a real entity, “Red” and “Sox” should be linked to Nil.

Page 29

(chart: entity-linking F1 on the NEEL and TACL datasets for Classifier, Struct, Struct+Social, and S-MART; gains of +3.2 and +2.0 F1)

‣ Structured prediction improves accuracy.
‣ Social context yields further improvements.
‣ S-MART is the prior state of the art (Yang & Chang, 2015).

Pages 30–31

Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013):

‣ tacit assumptions about audience knowledge
‣ language variation across social groups

(Gimpel et al., 2011; Ritter et al., 2011; Foster et al., 2011)

Pages 32–33

Language variation: a challenge for NLP

“I would like to believe he’s sick rather than just mean and evil.”

“You could’ve been getting down to this sick beat.”

(Yang & Eisenstein, 2017)

Pages 34–35

Personalization by ensemble

‣ Goal: personalized conditional likelihood P(y | x, a), where a is the author.
‣ Problem: we have labeled examples for only a few authors.
‣ Personalization ensemble:

P(y | x, a) = \sum_k P_k(y | x) \pi_a(k)

‣ P_k(y | x) is a basis model.
‣ \pi_a(\cdot) are the ensemble weights for author a.
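The mixture can be sketched as a weighted sum over basis-model predictions. The numbers below are toy values chosen to illustrate the "sick" ambiguity, not results from the talk:

```python
import numpy as np

def personalized_likelihood(basis_probs, pi_a):
    """P(y | x, a) = sum_k P_k(y | x) * pi_a(k).

    basis_probs: (K, n_labels) array; row k is P_k(y | x).
    pi_a:        (K,) ensemble weights for author a, summing to 1.
    """
    return pi_a @ basis_probs

# Toy example: K = 2 basis models over labels (negative, positive).
basis_probs = np.array([[0.9, 0.1],    # basis 1: "sick" reads as negative
                        [0.2, 0.8]])   # basis 2: "sick" reads as positive
pi_a = np.array([0.25, 0.75])          # this author leans toward basis 2

p = personalized_likelihood(basis_probs, pi_a)
print(p)  # [0.375 0.625]
```

Because the weights live on the author rather than the example, authors with no labeled data can still get a personalized model once their weights are estimated.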

Page 36

Homophily to the rescue?

(diagram: “Sick!” uttered by labeled and unlabeled authors connected in a social network)

Are language styles assortative on the social network?

Pages 37–38

Evidence for linguistic homophily

Pilot study: is classifier accuracy assortative on the Twitter social network?

\text{assort}(G) = \frac{1}{\#|G|} \sum_{(i,j) \in G} \left[ \delta(\hat{y}_i = y_i)\,\delta(\hat{y}_j = y_j) + \delta(\hat{y}_i \neq y_i)\,\delta(\hat{y}_j \neq y_j) \right]

(figure: assortativity, roughly 0.700–0.735, vs. rewiring epochs for the follow, mention, and retweet networks; the original network stays more assortative than its random rewirings)
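The assortativity statistic counts the fraction of edges whose two endpoints are either both classified correctly or both misclassified. A minimal sketch on a toy graph (the edge list and predictions are illustrative, not the talk's Twitter data):

```python
# assort(G): fraction of edges where both endpoints' predictions agree
# with the gold labels, or both disagree.

def assortativity(edges, y_hat, y_gold):
    agree = 0
    for i, j in edges:
        ci = y_hat[i] == y_gold[i]   # is node i classified correctly?
        cj = y_hat[j] == y_gold[j]   # is node j classified correctly?
        agree += (ci and cj) or (not ci and not cj)
    return agree / len(edges)

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
y_gold = {0: "pos", 1: "pos", 2: "neg", 3: "neg"}
y_hat  = {0: "pos", 1: "pos", 2: "neg", 3: "pos"}  # node 3 is misclassified

print(assortativity(edges, y_hat, y_gold))  # 0.5
```

The pilot study compares this value on the observed network against randomly rewired versions of the same graph; a persistent gap is evidence of linguistic homophily.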

Page 39

Network-driven personalization

‣ For each author, estimate a node embedding e_a (Tang et al., 2015).
‣ Nodes that share neighbors get similar embeddings.

\pi_a = \text{SoftMax}(f(e_a))

P(y | x, a) = \sum_{k=1}^{K} P_k(y | x) \pi_a(k)
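The gating step can be sketched as follows. The slide leaves f unspecified, so an affine map is assumed here as one simple choice, and all sizes and parameters are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ensemble_weights(e_a, W, b):
    """pi_a = SoftMax(f(e_a)), with f taken to be an affine map (an assumption)."""
    return softmax(W @ e_a + b)

rng = np.random.default_rng(1)
d, K = 4, 3                    # node-embedding size, number of basis models
e_a = rng.normal(size=d)       # node embedding for author a
W, b = rng.normal(size=(K, d)), np.zeros(K)

pi_a = ensemble_weights(e_a, W, b)
print(round(float(pi_a.sum()), 6))  # 1.0: valid mixture weights over K basis models
```

Because similar neighborhoods yield similar embeddings e_a, socially close authors end up with similar mixture weights, which is exactly how homophily transfers labels across the network.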

Page 40

Results

(chart: F1 improvement over a ConvNet baseline on Twitter sentiment analysis: Mixture of Experts +0.10, NLSE +1.90, Social Personalization +2.80)

Improvements over the ConvNet baseline:

‣ +2.8% on Twitter Sentiment Analysis
‣ +2.7% on Ciao Product Reviews

NLSE is the prior state of the art (Astudillo et al., 2015).

Page 41

Variable sentiment words

Basis  More positive                      More negative
1      banging loss fever broken fucking  dear like god yeah wow
2      chilling cold ill sick suck        satisfy trust wealth strong lmao
3      ass damn piss bitch shit           talent honestly voting win clever
4      insane bawling fever weird cry     lmao super lol haha hahaha
5      ruin silly bad boring dreadful     lovatics wish beliebers arianators kendall

Pages 42–43

Summary

Robustness is a key challenge for making NLP effective on social media data:

‣ Tacit assumptions about shared knowledge; language variation.
‣ Social metadata gives NLP systems the flexibility to handle each author differently.

The long tail of rare events is the other big challenge:

‣ Word embeddings for unseen words (Pinter et al., 2017)
‣ Lexicon-based supervision (Eisenstein, 2017)
‣ Applications to finding rare events in electronic health records (ongoing work with Jimeng Sun)

Page 44

Acknowledgments

‣ Students and collaborators:
  ‣ Yi Yang (GT → Bloomberg)
  ‣ Ming-Wei Chang (Google Research)
  ‣ See https://gtnlp.wordpress.com/ for more!
‣ Funding: National Science Foundation, National Institutes of Health, Georgia Tech

Page 45

References

Astudillo, R. F., Amir, S., Lin, W., Silva, M., & Trancoso, I. (2015). Learning word representations from scarce and noisy data with embedding sub-spaces. In Proceedings of the Association for Computational Linguistics (ACL), Beijing.

Eisenstein, J. (2013). What to do about bad language on the internet. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), (pp. 359–369).

Eisenstein, J. (2017). Unsupervised learning for lexicon-based classification. In Proceedings of the National Conference on Artificial Intelligence (AAAI), San Francisco.

Foster, J., Cetinoglu, O., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., & van Genabith, J. (2011). From news to comment: Resources and benchmarks for parsing the language of web 2.0. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), (pp. 893–901), Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N. A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the Association for Computational Linguistics (ACL), (pp. 42–47), Portland, OR.

Pinter, Y., Guthrie, R., & Eisenstein, J. (2017). Mimicking word embeddings using subword RNNs. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP).

Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named entity recognition in tweets: An experimental study. In Proceedings of EMNLP.

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015). LINE: Large-scale information network embedding. In Proceedings of the Conference on World-Wide Web (WWW), (pp. 1067–1077).

Yang, Y. & Chang, M.-W. (2015). S-MART: Novel tree-based structured learning algorithms applied to tweet entity linking. In Proceedings of the Association for Computational Linguistics (ACL), (pp. 504–513), Beijing.

Yang, Y. & Eisenstein, J. (2017). Overcoming language variation in sentiment analysis with social attention. Transactions of the Association for Computational Linguistics (TACL), in press.

