The Linguistics of Spin
Stephan Greene, Philip Resnik
MITH: November 13, 2007
We All Know What Spin is…
“Mistakes were made.”
- Ronald Reagan, January 27, 1987
Spin is…
A distinctive point of view, emphasis, or interpretation
• The American Heritage® Dictionary of the English Language, Fourth Edition
A special point of view, emphasis, or interpretation presented for the purpose of influencing opinion
• Merriam-Webster's Online Dictionary, 10th Edition
Example: Does Reuters Spin in Favor of the Palestinians?
Israelis did X. Palestinians did X.
Claims made by pro-Israeli media watchdog group honestreporting.com
What should be the basis for an objective analysis?
An Aside: Opinion is not Spin
Product Reviews
LOVE this camera, October 24, 2006
By Computer illiterate in Chicago (Chicago)
I bought this camera about 2 months ago and just love it. It takes excellent pictures with minimal effort. MY last camera was a 3.1 mexapixel Nikon so this is definitely an improvement. This camera has a lot of nice features which I am still learning how to use. We debated about getting the Canon powershot A630 but a friend of mine has that camera and I actually think this camera takes better "out of the box" pictures. (Of course, the A630 did have a bigger screen which would have been a nice feature.) We've taken a lot of shots of my daughter who just turned 1 and again I cannot be happier with the results of this camera.
Movie Reviews
“Have you gotten the point that I LIKE THIS MOVIE?”
A Hypothesis, Introduced via an Analogy
Facial expressions
Emotion
Construal
Sentiment
Linguistic expression
Anger
Terrorists destroyed the bus
The bus exploded
Causation, Intent
Change of State…
Negative
The bus was dispatched
Ekman (2002)
Surface expression and the construal of events
• Language supports multiple perspectives on the same event
• The cat chased the mouse
• The mouse fled from the cat
• Syntactic choices and semantic construals are connected
• X is similar to Y means Y is similar to X, right?
• Vietnam is similar to Iraq
• Iraq is similar to Vietnam
Positioning of the noun phrases determines figure/ground or variant/referent construal
(Gleitman et al. 1996; cf. Tversky 1977)
Surface expression and the construal of events
• A limited conceptual vocabulary accounts for how semantic construals are encoded by syntactic forms.
• The farmer loaded hay onto the wagon
• The farmer loaded the wagon with hay
• The farmer tossed hay onto the wagon
• *The farmer tossed the wagon with hay
Change of state
(Pinker, Jackendoff, Levin, Dowty, and many others)
Elements of Linguistic Expression
Construal
Sentiment
Linguistic expression
Terrorists destroyed the bus
The bus exploded
Causation, Intent
Change of State…
Negative
The bus was dispatched
Word choice
Syntactic frame
kill
butcher
murder
martyr
Example: Entailments of Verb Choices
[Chart: ratings (scale 0 to 7) of Volition, Sentience, Punctuality, and Telicity for Ergative Kill Verbs vs. Transitive Kill Verbs]
(a) The witch poisoned Snow White
(b) The witch drowned Snow White
Did the witch intend to do what she did?
People attribute more intent to the witch when the action was to poison Snow White.
The difference, though small, is statistically significant.
(Greene 2007)
*Snow White poisoned.
Snow White drowned.
(Levin 1993, Lemmens 1998)
Overview
Introduction
Connecting Lexical Semantics with Perceived Sentiment: Psycholinguistic Evidence
Automatically Labeling Documents with Sentiment
• The Death Penalty Debate
• The Israeli-Palestinian Conflict
• Congressional Floor Debate Speeches
Future Directions
Lexical Semantics → Perceived Sentiment
Intuitions on encoding choices: Who did what to whom?
• volitional agents
• causative verbs
• affected objects, …
Connection: Independently motivated, formally investigated work in linguistics
• Decompositional semantics for Transitive clauses
• Differences in linguistic form correlate with differences in meaning
First Result: Established Semantic Components predict Perceived Sentiment
What Semantic Properties Matter?
Sources
Dowty (1991) Proto-Role Property | Hopper and Thompson (1980) Transitivity Component
Proto-Agent Properties
• Volitional involvement in the event or state | Volition
• Sentience (and/or Perception) | -
• Cause event or change of state in object | Agency
• Movement relative to the position of another participant | Kinesis
• Exists independently of the event | -
Proto-Patient Properties
• Undergoes change of state; causally affected by another participant | Affectedness of Object
• Does not exist independently of the event | -
• Stationary relative to movement of another participant | Kinesis
• - | Object individuation
• (Exists independently of the event?) | Subject-object individuation
• (Incremental Theme?) | Punctuality
A Human Subject Experiment
Construal Sentiment
Linguistic expression
The men destroyed the bus
Causation, Intent
Change of State…
Linguistic expression
The men destroyed the bus
How sympathetic to the men?
How well does the semantic construal of the event predict sympathy toward the one(s) responsible for the event?
Rating Semantic Components of Transitivity for Different Linguistic Encodings of the Event
Telicity
Punctuality
Subject-Object Individuation
Kinesis
Affectedness of Object
Volition
Agency
Examples:
“Man suffocates 24-year old woman.”
“Suffocation kills 24-year old woman.”
Rating Event Encoding Sentiment
Suffocation kills 24-year old woman.
Analysis: Semantic Components Predict Perceived Sentiment
Suffocation kills 24-year old woman.
Analysis: Semantic Components Predict Perceived Sentiment
Man suffocates 24-year old woman.
The semantic variables account for 81.9% of the variance.
100% would be perfect prediction.
Additional Results: Event Encoding Distinctions
Human Agent
• “The men killed eight marketgoers.”
Nominalized Agent
• “The explosion killed eight marketgoers.”
Sentiment rating difference is statistically significant at p < .001
[Chart: Sympathy (Sentiment) Rating, scale 0 to 4, for Human Agent vs. Nominalized Agent]
If you express the same event using a “nominalized agent” – the explosion killed rather than the terrorists killed – people interpret the description as significantly more sympathetic to the men.
Additional Results: Our Statistical Analysis Supports Linguists’ Analysis
Principal Components Analysis corresponds with Agent and Patient proto-role properties
Property | Component 1 | Component 2 | Component 3
Cause Event | .930 | -.031 | .146
Sentience | .925 | -.307 | .133
Volition | .920 | -.326 | .127
Subject-Object Individuation | .916 | .091 | -.011
Independent Existence | .903 | -.222 | .107
Kinesis (Movement) | .871 | -.328 | .274
Kinesis (Stationary) | .639 | .360 | -.471
No Independent Existence | .350 | -.274 | -.264
Causal Effect | .438 | .790 | -.088
Object Individuation | .453 | .695 | .007
Cause Change of State | .579 | .618 | -.128
Verb | .131 | .244 | .103
Punctuality | .056 | .128 | .901
Telicity | -.390 | .450 | .673
What Have We Shown?
Construal
Sentiment
Linguistic expression
Terrorists destroyed the bus
The bus exploded
Causation, Intent
Change of State…
Negative
The bus was dispatched
Word choice
Syntactic frame
kill
butcher
murder
martyr
Underlying Semantics, Event Encodings, and Sentiment: Applying the formalized connections to a practical task
Computational Text Classification with respect to Implicit Sentiment
Example Applications: Information services for intelligence analysts, policy analysts, …
Text Classification: Supervised Machine Learning
Example: Spam Filtering
Input: training data (x1,y1), (x2,y2),…, (xn,yn)
• Each xi is a set of observable features representing an item
• Each yi is a (binary) label (e.g. spam, not spam)
Supervised Machine Learning produces a Classifier
• Input to the Classifier is unlabeled items: (x,?)
• Output is a prediction of the label y: (x,?) → (x,y)
Primary Considerations
• Learning algorithm
• Item representation (feature choice)
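To make the setup above concrete, here is a minimal sketch of a supervised classifier for the spam example. It is a small multinomial Naive Bayes learner with add-one smoothing, written only with the Python standard library. The function names `train` and `classify` and the toy messages are illustrative assumptions; they are not the system described in the talk.

```python
import math
from collections import Counter, defaultdict


def train(training_data):
    """Learn per-label feature counts from (x_i, y_i) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, label in training_data:
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, feature_counts, vocabulary


def classify(model, features):
    """Predict the label y for an unlabeled item (x, ?)."""
    label_counts, feature_counts, vocabulary = model
    total_items = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Log prior for the label.
        score = math.log(count / total_items)
        # Add-one (Laplace) smoothing over the known vocabulary.
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for feature in features:
            if feature in vocabulary:  # ignore features never seen in training
                score += math.log((feature_counts[label][feature] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (hypothetical): each x_i is a list of word features.
training_data = [
    (["win", "cash", "now"], "spam"),
    (["cheap", "cash", "offer"], "spam"),
    (["meeting", "agenda", "monday"], "not spam"),
    (["lunch", "monday", "agenda"], "not spam"),
]
model = train(training_data)
print(classify(model, ["cash", "offer", "now"]))
print(classify(model, ["agenda", "lunch"]))
```

The learning algorithm here is fixed; the second consideration on the slide, item representation, is what changes when word features are swapped for other feature types.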
Feature Definition: Translating Semantic Components into Practical Text Representation
Domain Relevant Terms
• Kill verbs and nominalizations
• Automatically identified terms
Grammatical Relations
• Dependency Parse
Terms + Relations = Observable Proxies for Underlying Semantics (OPUS)
murdered
they
Feature Production
Sentence:
Life Without Parole does not eliminate the risk that the prisoner will murder a guard, a visitor, or another inmate.
Constituent Parse:
(S
(NP (DT the) (NN prisoner))
(VP (MD will)
(VP (VB murder)
(NP
(NP (DT a) (NN guard))
(, ,)
Grammatical relations:
nsubj(murder, prisoner)
aux(murder, will)
dobj(murder, guard)
OPUS features:
TRANS-murder
murder-nsubj
nsubj-prisoner
murder-aux
aux-will
murder-dobj
dobj-guard
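The mapping from grammatical relations to OPUS features on this slide can be sketched as follows. The sketch assumes that each relation rel(head, dependent) yields the two features head-rel and rel-dependent, and that a TRANS- feature is emitted for any head that has a direct object (dobj). That reading is inferred from this one example; the function name `opus_features` and the triple format are hypothetical.

```python
def opus_features(relations):
    """Turn (relation, head, dependent) triples into OPUS-style feature strings.

    Assumption: a head with a dobj relation is treated as transitive and
    gets a TRANS-<head> feature, emitted once, before its other features.
    """
    transitive_heads = {head for rel, head, _ in relations if rel == "dobj"}
    features = []
    marked = set()
    for rel, head, dependent in relations:
        if head in transitive_heads and head not in marked:
            features.append(f"TRANS-{head}")
            marked.add(head)
        features.append(f"{head}-{rel}")
        features.append(f"{rel}-{dependent}")
    return features


# The grammatical relations from the slide's example sentence.
relations = [
    ("nsubj", "murder", "prisoner"),
    ("aux", "murder", "will"),
    ("dobj", "murder", "guard"),
]
for feature in opus_features(relations):
    print(feature)
```

Run on the three relations from the slide, this reproduces the seven features listed there, in the same order.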
Kill Verbs: Constructing The Death Penalty Corpus
Web Site | Number of Documents
PRO:
www.prodeathpenalty.com (PRO1) | 117
www.clarkprosecutor.org (PRO2) | 437
www.yesdeathpenalty.com (PRO3) | 26
www.thenewamerican.com (PRO4) | 7
www.dpinfo.com (PRO5) | 9
Total | 596
CON:
www.deathpenaltyinfo.org (CON1) | 319
www.nodeathpenalty.org (CON2) | 212
www.deathpenalty.org (CON3) | 65
Total | 596
Implicit Sentiment Classification: Value of OPUS Features
Task: Document classification, PRO/CON Death Penalty
Evaluation: two-by-two site-wise cross validation
Feature Set | Naïve Bayes | C4.5 | SVM
baseline n-grams | 68.63 | 71.63 | 68.37
OPUS (kill verbs) | 82.42 | 77.84 | 82.09
Naïve Bayes: n+ = 295, n- = 84, p < 0.0001
C4.5: n+ = 303, n- = 208, p < 0.0001
SVM: n+ = 362, n- = 152, p < 0.0001
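The n+, n- and p values reported with these results have the shape of a paired sign test: n+ would count documents that only the OPUS classifier labels correctly, n- those that only the baseline labels correctly. The slides do not name the test, so treat that reading as an assumption. Under it, a two-sided exact p-value can be sketched with the standard library (the function name is hypothetical):

```python
from math import comb


def sign_test_p_value(n_plus, n_minus):
    """Two-sided exact sign test.

    Under the null hypothesis each disagreement is equally likely to
    favor either system, so the smaller count follows Binomial(n, 0.5).
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Counts taken from the slide.
for n_plus, n_minus in [(295, 84), (303, 208), (362, 152)]:
    print(n_plus, n_minus, sign_test_p_value(n_plus, n_minus) < 0.0001)
```

Only the documents on which the two systems disagree enter the test, which is why n+ and n- do not sum to the corpus size.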
Beyond the Kill Verbs: Can the Method be Generalized?
Kill Verbs were chosen because:
• They have been studied extensively
• Lexical semantics suggest they are prototypically transitive
…the Death Penalty Corpus was built around them
Real problems usually work the other way:
• The documents exist
• How do we choose terms to target?
• Solution: Corpus-driven automatic identification of terms
Generalization: Domain Relevant Terms
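Relative Frequency Ratio:
rfR = (F_dc / N_dc) / (F_rc / N_rc)
F = number of occurrences of the term, N = number of words those occurrences are counted over
dc = domain corpus
rc = reference corpus (BNC)
Examples:
‘kill’: rfR = (10160 / 6422790) / (157 / 1000000) = 10.08
‘rob’: rfR = (1475 / 6422790) / (10 / 1000000) = 22.97
‘force’: rfR = (1101 / 6422790) / (113 / 1000000) = 1.51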
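The relative frequency ratio compares how often a term occurs in the domain corpus with how often it occurs in the reference corpus. A minimal sketch, using the ‘kill’ and ‘force’ counts from this slide; the function names and the cutoff value of 2.0 are illustrative assumptions rather than settings from the talk:

```python
def relative_frequency_ratio(f_dc, n_dc, f_rc, n_rc):
    """Relative frequency in the domain corpus over relative frequency
    in the reference corpus."""
    if f_rc == 0:
        raise ValueError("term does not occur in the reference corpus")
    return (f_dc / n_dc) / (f_rc / n_rc)


def select_terms(domain_counts, n_dc, reference_counts, n_rc, threshold):
    """Keep terms whose ratio exceeds the threshold, highest ratio first."""
    ratios = {
        term: relative_frequency_ratio(count, n_dc, reference_counts[term], n_rc)
        for term, count in domain_counts.items()
        if reference_counts.get(term, 0) > 0
    }
    kept = [term for term, ratio in ratios.items() if ratio > threshold]
    return sorted(kept, key=ratios.get, reverse=True)


N_DC = 6422790   # words in the domain corpus (from the slide)
N_RC = 1000000   # reference counts are per 1,000,000 words (from the slide)
domain_counts = {"kill": 10160, "force": 1101}
reference_counts = {"kill": 157, "force": 113}

print(round(relative_frequency_ratio(10160, N_DC, 157, N_RC), 2))
print(select_terms(domain_counts, N_DC, reference_counts, N_RC, threshold=2.0))
```

A ratio well above 1 marks a term as characteristic of the domain; raising the threshold trades coverage for specificity.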
Automatically Identified Verbs from the DP Corpus
testify, convict, sentence, execute, aggravate, file, strangle, affirm, stab, schedule, rape, rob, violate, overturn, accord, murder, confess, pronounce, plead, shoot, kill, deny, arrest, condemn, commit, fire, witness, request, steal, review, appeal, decline, grant, rule, die, reject, state, impose, conclude, question, charge, beat, drive, attempt, release, admit, refuse, present, recommend, conduct, order, serve, receive, argue, determine, suffer, seek, issue, claim, note, discover, enter, fail, strike, find, identify, result, return, tell, include, indicate, arrive, sign, force, stop, say, pull, support, reveal, live, raise, ask, visit, drop, believe, hear, love, represent, regard, occur, hit, decide, express, involve, prove, stay, walk, consider, write, spend, end, place, fight, plan, face, base, continue, leave, call, hold, watch, allow, try, obtain, cause, begin, set
Implicit Sentiment Classification: Automatically Chosen Verbs
Feature Set | SVM
baseline bigrams | 71.96
OPUS features (“linguistically relevant” verbs) | 82.09
OPUS features (automatically chosen verbs) | 88.10
n+ = 367, n- = 149, p < 0.001
n+ = 206, n- = 151, p < 0.01
The Israeli-Palestinian Conflict
The Bitter Lemons Website
The Bitter Lemons Corpus (Lin et al. 2006)
Palestinian Israeli
Editors 148 149
Guests 149 148
Total number of documents 297 297
Average document length 740.4 816.1
Number of sentences 8963 9640
November 12, 2007, Edition 41: Is the PA beginning to resemble the SLA?
Two Palestinian views
• The PA cannot remain transitional much longer, by Ghassan Khatib
The situation in Nablus can only further discredit the PA in the eyes of its own public and strengthen comparisons with the SLA.
• Unite or dissolve, an interview with Eyad Sarraj
The Palestinians have lost the ability to govern themselves, to make war or to make peace.
Two Israeli views
• The danger is there, by Yossi Alpher
Any comparison between Abbas/Salam and Lahd is, to say the least, not flattering.
• The political context is totally different, by Dani Reshef
The SLA sought to join the Lebanese army as a territorial brigade.
Implicit Sentiment Classification: BL Corpus Results
70 Experiments
• Avg accuracy: 95.67%
• Max accuracy: 97.64%
Previous Work (Lin 2006)
• NB-B 93.46%
• SVM 88.22%
[Chart: Classification Accuracy, BL Corpus, Test Scenario 1 (DeterminerFilter). Horizontal axis: Individual Experiment (ρ values and accuracy). Left axis: Term Threshold (ρ), 0 to 12. Right axis: Percent Correct, 84 to 98. Series: ρ (Verb), ρ (Noun), OPUS, Lin 2006 NB-B, Lin 2006 SVM]
Congressional Floor Debate Speeches: The CS Corpus (Thomas, Pang, and Lee 2006)
Classification Task: binary YEA/NAY (for or against) the legislation under debate
corpus | total | training set | test set | development set
speech segments | 3857 | 2740 | 860 | 257
debates | 53 | 38 | 10 | 5
average number of speech segments per debate | 72.8 | 72.1 | 86.0 | 51.4
average number of speakers per debate | 32.1 | 30.9 | 41.1 | 22.6
Implicit Sentiment Classification: Results for CS Corpus
Maximum accuracy: 70.00% (p < 0.05)
Previous Work: 66.05% (Thomas, Pang, and Lee 2006)
SVM Classifier Accuracy (Percent Correct) by Term Threshold (ρ)
Term Threshold (ρ) | 0.0 | 0.2 | 0.4 | 0.6 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0
OPUS-SVM | 68.84 | 69.53 | 70.00 | 69.19 | 68.26 | 68.84 | 69.42 | 68.26 | 67.79
U-SVM (TPL06) | 66.05 | 66.05 | 66.05 | 66.05 | 66.05 | 66.05 | 66.05 | 66.05 | 66.05
Adding Inter-Item Relationships (following Thomas, Pang, & Lee 2006)
Congressional Speech exhibits:
• Repeated speakers
• References to other speakers
These relations can be exploited:
• Same-speaker links: Assume a speaker always exhibits the same orientation within a debate
• Agreement links
— By-name references to other speakers are automatically classified as agreement or disagreement
— Instances of agreement are ‘soft’ versions of same-speaker links
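The same-speaker assumption can be illustrated with a deliberately simplified sketch. Instead of the graph minimum cut formulation the talk adopts, it treats same-speaker links as hard constraints: the per-segment classifier scores of each speaker are summed, and every segment by that speaker receives the label the total favors. The score convention (positive favors YEA), the function name, and the sample data are assumptions for illustration only; agreement links are not modeled.

```python
from collections import defaultdict


def label_with_same_speaker_links(segments):
    """Assign one label per speaker within a debate.

    segments: list of (segment_id, speaker, score) tuples, where a
    positive score favors YEA and a negative score favors NAY.
    """
    totals = defaultdict(float)
    for _, speaker, score in segments:
        totals[speaker] += score
    return {
        segment_id: ("YEA" if totals[speaker] > 0 else "NAY")
        for segment_id, speaker, _ in segments
    }


# Hypothetical scores for four speech segments by two speakers.
segments = [
    ("s1", "Speaker A", 0.9),
    ("s2", "Speaker A", -0.2),  # weakly NAY on its own, outweighed by s1
    ("s3", "Speaker B", -0.5),
    ("s4", "Speaker B", 0.1),
]
print(label_with_same_speaker_links(segments))
```

Segment s2 would be labeled NAY in isolation; pooling evidence across the speaker's segments flips it to YEA, which is the effect the same-speaker links are meant to produce.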
Congressional Speech: Example
It is not even clear we can move it to another bill at this point. Yet, it is the only bill standing, and it is a bipartisan effort to try to address this scourge that is crossing the country. I thank Chairman Sensenbrenner; also majority leader Roy Blunt, who has been an early leader in this charge; Chairman Barton of the energy and commerce committee for his willingness to have this. I would also thank the several members who have worked so hard to make this comprehensive anti-meth legislation happen. In particular, I would like to thank Representatives Mark Kennedy, Darlene Hooley of Oregon, Dave Reichert and John Peterson, because they provided much of the content of this comprehensive bill and their consistently strong leadership on the house floor.
I would also like to thank the four co-chairmen of the congressional meth caucus, Congressmen Larsen, Calvert, Boswell and Cannon, for their staffs' assistance in putting this together so we could have a bipartisan effort. Congressman Tom Osborne has crusaded on this house floor and across the country on behalf of anti-meth legislation, as has Congressmen Baird, Wamp, Boozman, King, Gordon and so many others. This would not be happening today if we did not have this bipartisan coalition, and I hope it becomes law.
SVM Classification: a Graph Minimum Cut Representation
Classification Framework with Inter-Item Relationships
Classification with Inter-Item Relationships: Results
YEA or NAY classification of speech segments | U-SVM (Thomas, Pang and Lee 2006) | OPUS-SVM
SVM only | 66.05 | 70.00 (p < 0.05)
SVM arcs plus same-speaker arcs | 67.21 | 70.81 (p < 0.05)
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 0 | 70.81 | 68.37
SVM arcs plus same-speaker arcs and agreement arcs, θagr = μ | 70.81 | 70.93
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 1.5μ | 67.33 | 70.12
Summary
Facial expressions
Emotion
Construal
Sentiment
Linguistic expression
Anger
Terrorists destroyed the bus
The bus exploded
Causation, Intent
Change of State…
Negative
The bus was dispatched
Ekman (2002)
The way language encodes underlying sentiment is mediated by a “conceptual lens”, afforded by systematic dimensions of semantic construal.
This fact can be exploited for practical tasks.
Where Do We Go From Here?
JUSTICE KENNEDY: Do you know, since you seem to have looked at it: In North Carolina, for the person whose had his civil rights taken away, is -- is there any mechanism to get them back earlier by -- by applying for clemency or a pardon or -
JUSTICE GINSBURG: So if you had a statute -- a State like, I'm told, Vermont, that doesn't take away any one's civil rights, not even a first degree murderer's, then that first degree murderer would be equated to someone whose civil rights were taken away and then restored.
Where Do We Go From Here?
Brent Cunningham, The Rhetoric Beat: Why journalism needs one
Columbia Journalism Review — November / December 2007
What if on 9/11 our major media outlets had employed reporters whose sole job it was to cover the rhetoric of politics— to parse the language of our elected leaders, challenge it, and explain the thinking behind it, the potential power it can have to legitimize certain actions and policies and render others illegitimate? …
Apologies to William Safire, but journalism needs a rhetoric beat. Yes, language has been used and misused in the service of politics since man first had both language and politics. Political rhetoric is not inherently bad, and I am not suggesting a War on Rhetoric. But there are aspects of our present political and cultural reality that underline the need for a prominent, persistent, and intellectually honest airing of our linguistic dirty laundry, and the mainstream press is our best hope for getting it.
Thank you!