Measuring Reputation
Daniel Diermeier and Mathieu Trepanier
Kellogg School of Management
Abstract
Reputation management has moved to the top of the agenda for many companies, yet “corporate
reputation” remains an elusive concept which is difficult to measure and manage. In this paper we
investigate whether linguistic measurements of reputational shocks contain useful information about
short term future corporate performance. Using news articles for 2005‐2006 for a large sample of firms
listed on NASDAQ and NYSE, we argue that useful information about corporate reputation can be
derived from transient signals (reputational shocks). We create measures of reputational shocks based
on the sentiment and emotions captured from the news coverage about a corporation. We find
evidence supporting a link between some of our measures and next‐day stock return. Our results
suggest that measures of sadness are the most reliable and substantial predictors of performance
among our set of linguistic measurements. We also find that the relationship between reputational
shocks and short term performance varies substantially across industries, with the most consistent
results for the manufacturing, retail trade, and transportation industries.
This Version: February 2009
Very preliminary: DO NOT QUOTE WITHOUT PERMISSION.
* Acknowledgements: Viorel Maxim provided able research assistance. All errors are our own. Author contact information: Daniel Diermeier ([email protected]) and Mathieu Trepanier ([email protected]), Kellogg Graduate School of Management, 2001 Sheridan Road, Evanston, IL 60208.
Introduction
Reputation management has moved to the top of the agenda for many companies, yet “corporate
reputation” remains an elusive concept which is difficult to measure and manage. A common approach
is to interpret corporate reputation as “public opinion for corporations” but with multiple “publics”, i.e.
constituencies, such as customers, employees, investors, regulators and the like. While plausible at first,
the approach has limited practical use, both for companies and researchers. Public opinion is usually
measured by surveys, a very expensive and inflexible tool, which only the largest companies can afford.
Moreover, even when surveys exist (e.g. McDonald’s FastTrack survey), they are not commonly
available to researchers.
An alternative method relies on an indirect approach. The idea is that constituents’ beliefs about a
company or product will be significantly shaped by the information and opinion received through the
media (both mass and user generated). Moms may stop taking their daughters to McDonald’s not
because the staff was unfriendly during their last visit, but because they saw a feature on The Today
Show linking higher rates of breast cancer with French fries. Indeed potential customers may never
become actual ones because of a company’s “reputation”.
Recent laboratory studies (e.g. Uhlmann, et al. 2008; Jordan, Diermeier and Galinsky 2008) have
provided empirical support for the impact of reputational issues on customer perception and behavior.
Customers, for example, will rate a company or a company logo lower if they are exposed to a news
story alleging, e.g. a sexual harassment incident. Moreover, they also will rate product quality lower and
consume less. Importantly, companies’ response strategies do have an effect on customer perception
and behavior. Responses that focus on showing empathy, transparency, and commitment all have
positive effects. Finally, evidence of past virtuous behavior, a moral bank account, also has a positive
effect in the absence of other factors (Uhlmann, et al. 2008).1
These findings suggest an indirect approach to measuring reputation. Rather than using surveys or focus
groups to assess the state of mind of constituencies, one can measure the “inputs”, i.e. the sentiment
expressed in newspaper articles, internet postings, etc. The behavioral link between media influence
and stakeholder attitudes would be provided by the experimental micro‐data on how stakeholder
perception is formed. This was done by Uhlmann, et al. (2008) for the case of customers.
This leads to the next question on how to measure the “inputs”, i.e. media sentiment about companies
and products. Recent developments in information retrieval, machine learning, and natural language
processing technologies provide a promising path in this direction. A standard approach (followed by
many commercial providers and researchers alike) is to rely on annotated opinion corpora to train and
test opinion retrieval, classification, and aggregation models. This approach has been used with
considerable success in the classification of customer opinions, e.g. online movie reviews. In these
applications, the goal is to correctly classify reviews as “positive” or “negative.” These methods provide
a natural approach to classifying corporate sentiment. First, collect a set of articles about
company X. Next, have human annotators create a training set by classifying each article as “positive”,
“neutral”, or “negative”. Finally, train classification algorithms on the training set and then create
indices based on the classification results.
1 Uhlmann, et al. (2008) have subjects judge the quality of bottled water. Subjects in the sexual harassment condition rate the taste lower and drink less. If a company uses a strategy focusing on transparency and empathy related to the sexual harassment case compared to a stone‐walling or defensive approach, the product quality is rated higher and a larger quantity is consumed.
The Classification Approach: Methodological Problems
While initially plausible, there are at least three potential problems with this approach. The first problem
is known as the domain dependency problem. Opinion classifiers have achieved accuracy levels as high
as 88% for product reviews (Dave et al., 2003) and 82% for movie reviews (Pang, Lee and Vaithyanathan
2002). However, Finn and Kushmerick (2006) found that an opinion classifier trained on movie reviews
was not effective in predicting the polarity of restaurant reviews, and vice versa. For example, in their
analysis, classifiers that predict movie reviews with high accuracy (77%) fail to predict
restaurant reviews (40%). The reason for the domain dependence lies in the importance of expressive
adjectives for classification success. While some universal adjectives (like good or bad)
express opinions in any domain, most adjectives that are typical for movie reviews (like gripping or boring)
are unlikely to occur in restaurant reviews, which rely on adjectives of their own (like tasty or delicious).
This issue is particularly important in the case of corporate reputations, which cross various issue domains.
The second problem has to do with the way opinions are expressed. While customer opinions are
frequently expressed directly (“the food was delicious”) opinions about corporations are frequently
expressed indirectly, i.e. through some form of argument. This is especially true of news articles. For
example, while news editorials may contain some direct opinion expression, reporting on negative
events, such as lawsuits, strikes, or a decreasing stock price, may actually have the same effect on the
audience, even though we would not usually consider such reports expressions of opinion. Here we are
interested in the effect on the audience, i.e. a company’s customers and other stakeholders. From this
perspective it does not matter whether, e.g. a customer’s opinion of company X drops because of a
critical editorial or a report about a pending government investigation.
The third problem is practical and consists in the absence of existing text corpora related to corporate
reputation that could be used to reliably train classifiers. To investigate these issues Yu, Diermeier and
Kaufmann (2009) built a new corporate opinion corpus, the Wal‐Mart Corpus. The practical goal of the
corpus was to facilitate future algorithm development. However, methodologically it also allowed an
evaluation of the reliability and validity of human annotation of corporate opinions. Unless typical
subjects can clearly distinguish positive from neutral or negative news about a company, here Wal‐Mart,
the classification‐based approach to reputation metrics becomes problematic.
Yu, Diermeier and Kaufmann (2009) collected more than 130,000 news articles which mentioned Wal‐
Mart in 2006, and sampled from them 1,080 articles based on the distributions of their publication
dates, the document lengths, and the reach of the publishers. Three coders were then asked to
annotate the polarity (a choice among the three options “positive”, “negative”, or “neutral”) of the 1080
articles at both the paragraph and the document level. To test if paragraph is an appropriate opinion
text unit (without much ambiguity), the fourth category “mixed” was added to the paragraph‐level
annotation.
Cohen’s κ, a standard measure in the content analysis literature, was used to measure inter‐coder
agreement. A minimal κ > 0.60 customarily indicates an acceptable level of reliability. However, none of
the polarity annotation tasks passed this threshold. For example, the average κ was 0.30 at the document
level and 0.39 at the paragraph level.2
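As a minimal sketch of how such an agreement score is obtained, the following Python function computes Cohen’s κ for one pair of coders. The label lists are hypothetical and are not taken from the Wal‐Mart Corpus. With three coders, one would presumably average κ over coder pairs; that is an assumption here, not a description of the procedure used by Yu, Diermeier and Kaufmann (2009).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the two coders agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical polarity annotations for ten articles (illustration only).
coder_1 = ["pos", "neg", "neu", "neu", "neg", "pos", "neu", "neg", "neu", "pos"]
coder_2 = ["pos", "neu", "neu", "neg", "neg", "neu", "neu", "neg", "pos", "pos"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on six of ten articles, yet κ stays well below 0.60 because a good part of that agreement is expected by chance given how often each coder uses each label.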
What is the reason for this low level of agreement? First of all, news articles report both “opinions” and
“facts”. Many “facts” about corporations easily evoke various opinions among readers. For example,
after reading an article on a robbery at a Wal‐Mart parking lot, some readers would worry about their
safety when shopping at Wal‐Mart while others might not feel the same way.
2 These two numbers are not directly comparable with each other because of the additional “mixed” category at the paragraph level. Interestingly, average κ was higher at the title level (0.42).
Secondly, Yu, Diermeier and Kaufmann (2009) observe a large grey area at the boundary between
“neutral” and polarized (“positive” or “negative”) categories at all three levels. Further marginal
distribution analysis results demonstrated that individual coders have unique personal biases toward
the polarity category distribution. Even when they annotated different data subsets, the coders
exhibited similar marginal category distributions. In other words, some coders are just more positively
or negatively inclined than others. This phenomenon poses another challenge to classification methods
in that the “ground truth” or “gold standard” is hard to obtain for algorithm training and evaluation
purposes.
A possible counter‐argument could state that, perhaps, the annotators (university undergraduates) were
not trained well enough to make the proper distinctions. But this argument misses the main purpose of
the whole exercise of reputation metrics, which is about finding measures of a company’s public image.
The relevant public may consist of experts, e.g. analysts, but most members of the public will lack any
specific knowledge or expertise. Yet, as customers and other stakeholders their opinion still matters.
To summarize, the promising approach of machine‐based classification faces various challenges in the
context of corporate reputations. First, corporate reputations cross multiple domains, yet classifiers are
typically highly domain specific. Second, opinions about corporations are shaped directly (e.g. an
editorial) or indirectly (e.g. a negative news story). Third, the attempt to design specific corpora for
corporate reputation classification faced the problem of low inter‐coder agreement which made the
establishment of a “ground truth”, essential for any classification task, impossible.
At a deeper level, existing classification approaches focus on the wrong end of the communicative
relation: the sender, while the real concern of corporate reputation metrics lies in the receiver. This
leads to a different approach.
A Different Approach – Emotional Lexica
Our new approach is based on the desire to relate to constituency attitudes more directly. To do so, we
utilize an automated text analysis program called Linguistic Inquiry and Word Count (LIWC)
(Pennebaker, Booth and Francis 2006). LIWC identifies the linguistic structure of a text by counting the
number of words associated with a series of pre‐defined dictionaries. These include rudimentary
linguistic features such as pronoun or verb use, but also words associated with mental states such as
emotions, beliefs and attitudes.
For any given text, LIWC will calculate the number of words that match its pre‐defined dictionaries.
For example, if a word such as “hate”, which exists in the ‘negative emotion’ dictionary, appears in a text,
it would be scored as a one. If it appears again, it would receive an additional score of one. If the word
“ugly”, also in the ‘negative emotion’ dictionary, then appears in the text, the total score would be three. In
other words, LIWC counts word tokens, not types. At the end of the text analysis, LIWC will calculate the
total number of times these dictionary words appear in the text divided by the total number of words in the
text, creating a percentage. This represents the linguistic footprint or summary of a particular text.
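The counting rule just described can be sketched in a few lines of Python. This is an illustration only: the four‐word lexicon and the sample sentence are hypothetical, and the exact‐match lookup is a simplification of how the actual LIWC program matches words against its dictionaries.

```python
import re

# Hypothetical miniature lexicon; the real LIWC dictionaries are far larger.
NEGATIVE_EMOTION = {"hate", "ugly", "hurt", "nasty"}


def category_score(text, lexicon):
    """Return (number of word tokens found in the lexicon, percentage of all words)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Every occurrence counts, so a repeated word is scored each time (tokens, not types).
    hits = sum(1 for word in words if word in lexicon)
    percent = 100.0 * hits / len(words) if words else 0.0
    return hits, percent


sample = "I hate delays. Customers hate waiting in ugly stores."
hits, percent = category_score(sample, NEGATIVE_EMOTION)
print(hits, round(percent, 1))
```

Here “hate” is counted twice and “ugly” once, giving a score of three out of nine words, which mirrors the example in the text.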
What makes LIWC promising in our context is that it has demonstrated external validity across a
variety of studies. These studies show how language can represent personality types (Chung and Pennebaker
2007; Pennebaker and Lay 2002; Pennebaker, Mehl and Niederhoffer 2003), distinguish
deceptive or ironic speech (Hancock, et al. 2005), capture how speakers tend to converge upon each
other’s speech styles (Niederhoffer and Pennebaker 2002), and distinguish verbal
expressions of emotion (Kahn, et al. 2007). Stirman and Pennebaker (2002) show evidence of differences in self and
collective linguistic references in the writings of suicidal and non‐suicidal poets. Further, each dictionary has
been compared with a text analysis by human coders to ensure reliability, and examined for internal
validity by using a variety of text corpora.3
Notice that LIWC constitutes a universal dictionary that has been refined over many studies rather than
the outcome of specific classification experiments. The hope is that these categories correspond with
the mental state evoked in a typical reader of a text.
To provide some prima facie credibility to the measure we discuss a brief example. In 2006, a
multinational healthcare company was faced with some activist pressure concerning one of its products.
The following figures show an analysis of annual news coverage for the company processed by LIWC4.
Each of the spikes in the Anger and Sadness categories reflects media response to a clearly identifiable
action including aggressive actions by the company such as lawsuits and product registration decisions.
In early March, news articles echoed criticism of the company by a well‐known health‐related activist
organization over the slow registration of a life‐saving drug in developing countries. Activist criticism
intensified around mid‐April and the company registered the product in several developing countries
leading to higher levels of positive feelings and optimism. In August, a major international conference
attracted substantial coverage of the company’s actions, most of it critical. The highest level of optimism
occurred when a government took drastic regulatory action against the company in response to
pressure from activists and the public more generally.
3 See for example Pennebaker, Booth and Francis (2006) and Pennebaker and Francis (1996).
4 Text data for the period January 1, 2006 to January 31, 2007 were provided by Lexis/Nexis.
Figure 1: LIWC analysis of sample reputational environment
While this case study certainly looks like a promising proof of concept, we are still measuring text
characteristics, not the impact on constituencies. Of course, the existing applications of LIWC suggest
that, on average, a similar emotional state will be triggered in the audience, but this is a hypothesis.
To investigate the effects of LIWC directly, we look towards a standard measure of corporate
performance, their stock price. Notice that, in our context, stock price effects are to be interpreted as
summary measures of the attitudes of multiple constituencies. For example, investors may consult
analyst reports, which assess the impact of a certain event or news story on customers, suppliers,
investors, regulators and so forth. It would perhaps be preferable to have a more granular outcome
measure for each stakeholder group, e.g. customers, but this requires the use of sales data which are
not commonly available.
Our approach is similar to that of Tetlock, Saar‐Tsechansky and Macskassy (2008) who find supportive
evidence for a link between daily measures of negative sentiment in newswire articles and next day
stock performance. However, our work differs from Tetlock, Saar‐Tsechansky and Macskassy (2008) in
three key respects. First, aside from obtaining scores for positive and negative sentiment, we also use
linguistic measurements for a variety of emotions (LIWC). As discussed above, an important advantage
of using LIWC comes from its demonstrated external validity. Second, while Tetlock, Saar‐Tsechansky
and Macskassy (2008) rely on pooled estimations for firms in the S&P500, we are concerned with industry‐level
differences. Last, our interest in studying reputation rather than market efficiency leads us to
aggregate our linguistic measurements over longer horizons.
The remainder of this article is organized as follows. Section II details our statistical and linguistic
methodologies. In section III, we describe our data. Section IV presents our results. Finally, section V
concludes.
Methodology
We report results for the OLS estimation of
$$ r_{i,t} - r^{f}_{t} = B F_{t} + \Psi X_{i,t} + \varepsilon_{i,t} $$
Equation 1
where $r_{i,t}$ is the stock return for firm $i = 1, \dots, N$ on day $t = 1, \dots, T$ (see footnote 5) and $r^{f}_{t}$ is the risk free rate on day $t$ (see footnote 6).
$B$ is a 1×4 matrix of coefficients, and $F_{t}$ is a 4×T matrix containing Fama and French’s 3 factors (Fama and
French 1993) and Carhart’s fourth factor (Carhart 1997). The factors allow us to control for the
contemporaneous returns of the market (market), size (SMB), book‐to‐market (HML), and momentum
(UMD) factors. $\Psi$ is a 1×9 matrix of coefficients, and $X_{i,t}$ is a 9×(J*T) matrix containing the reputation measures.
Finally, $\varepsilon_{i,t}$ is an error term.
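To make the estimation step concrete, the sketch below fits a regression of the same shape as Equation 1 for a single firm: excess returns on the four factors plus one reputation measure, with no intercept. It is only an illustration under stated assumptions. The data are hypothetical and noise‐free so that the recovered coefficients can be checked, the helper names `solve_linear_system` and `ols` are invented for this example, and the reputation measure is assumed to be observed before the return it is paired with.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical daily observations for one firm (returns in percent).
# Columns: market, SMB, HML, UMD, and one reputation measure.
rows = [
    [1.0, 0.1, -0.2, 0.1, 0.3],
    [0.2, -0.9, 0.1, 0.2, -0.3],
    [-0.1, 0.2, 0.8, 0.1, 0.2],
    [0.1, -0.1, 0.2, -0.7, 0.2],
    [0.3, 0.2, -0.1, 0.3, 1.5],
    [-0.4, 0.3, 0.2, -0.2, -1.0],
    [0.6, -0.2, -0.3, 0.4, 0.5],
    [-0.2, 0.1, 0.1, -0.1, -0.8],
]
true_coefficients = [1.1, 0.3, -0.2, 0.1, -0.05]
# Noise-free excess returns, so OLS should recover the coefficients above.
excess_returns = [sum(b * x for b, x in zip(true_coefficients, r)) for r in rows]
estimates = ols(rows, excess_returns)
print([round(b, 4) for b in estimates])
```

The last coefficient plays the role of one element of Ψ. With real data one would use a statistics package that also reports standard errors; the hand‐written solver is used here only to keep the example self‐contained.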
We define the reputation measures in the following way.7 From each article $j$ considered in our study,
we obtain a total word count (# of words in article $j$) as well as a word count for each linguistic category
(e.g. # of positive words in article $j$). For each linguistic category, we then compute an article‐level
proportion. For example:
$$ pos_{j} = \frac{\#\ \text{of positive words in article } j}{\#\ \text{of words in article } j} $$
Equation 2
The reputation measure is then constructed as follows:
5 Daily return is defined using closing stock prices.
6 We use the monthly t‐bill rate divided by the number of trading days in the month as the measure of the risk free rate.
7 We use the positive sentiment measure (pos) for the exposition. The adaptation to all other measures is straightforward.
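$$ \mathit{Pos}_{i,t} = \sum_{j \in A_{i,t}} pos_{j} $$
Equation 3
$$ \widetilde{pos}_{i,t} = \frac{\mathit{Pos}_{i,t} - \mu_{i,t}}{\sigma_{i,t}} $$
Equation 4
where $pos_{j}$ is the article‐level proportion of Equation 2, $A_{i,t}$ is the set of articles about firm $i$ on day $t$, and the mean $\mu_{i,t}$ and standard deviation $\sigma_{i,t}$ of $\mathit{Pos}_{i,t}$ are computed on a 7‐day rolling basis. The concept of a corporate reputation is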
often understood to be defined over a horizon longer than 7 days. For our purposes, reputation
measures can be conceptualized as reputational shocks impacting the stock of reputation.
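The construction of a reputational shock from article‐level word counts can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by the text: the daily measure is taken to be the sum of article‐level proportions, the shock is the daily measure standardized by the mean and standard deviation of the preceding seven days, and the current day is excluded from its own window. All function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def article_proportion(category_words, total_words):
    """Article-level proportion: category words divided by all words."""
    return category_words / total_words


def daily_measure(articles):
    """Sum of article-level proportions for one firm-day (assumed aggregation)."""
    return sum(article_proportion(c, n) for c, n in articles)


def rolling_shocks(daily_values, window=7):
    """Standardize each day by the mean and sd of the preceding `window` days."""
    shocks = []
    for t in range(window, len(daily_values)):
        history = daily_values[t - window:t]  # assumption: excludes day t itself
        mu, sigma = mean(history), pstdev(history)
        shocks.append((daily_values[t] - mu) / sigma if sigma > 0 else 0.0)
    return shocks


# Two hypothetical articles: (positive words, total words).
one_day = daily_measure([(5, 100), (3, 50)])

# Eight hypothetical daily values; the eighth day is unusually high.
daily = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.10]
shocks = rolling_shocks(daily)
print(round(one_day, 3), [round(s, 2) for s in shocks])
```

Under these assumptions a shock is measured in standard deviations of recent coverage, which is the unit used when the results are stated as the effect of a one standard deviation shock.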
We first estimate the model by pooling over all firms in the sample. We then proceed with estimations
at the industry level8.
Data
We use newswire articles from the Dow Jones News Service for 2288 (2661) NASDAQ‐listed firms
and 1613 (1991) NYSE‐listed firms for 2005 (2006)9.
To eliminate articles containing only tables, numbers, or
company names, we require that an article contain at least 50 words. We also require that they contain
at least 5 positive words. Ticker symbols are obtained from the articles’ metadata. To avoid problems of
attribution, we require that no more than 3 ticker symbols be listed in the metadata of an article. Finally,
8 When estimating at the industry level, we require that there be at least 10 firms in an industry.
9 The sample of firms was selected to match all firms listed on NYSE and NASDAQ at the beginning of 2005.
we consider only articles published between 12am and 3pm Eastern time. Our sample contains
1,855,266 valid newswire articles for 2005 and 1,989,360 for 2006.
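The selection rules above amount to a simple filter, sketched below. The `Article` record and its field names are hypothetical, not the schema of the Dow Jones data. Two boundary choices are assumptions: an article must list at least one ticker to be attributable, and an article published at exactly 3pm is kept.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Article:
    word_count: int
    positive_words: int
    tickers: list
    published_et: time  # publication time, assumed already converted to Eastern time


def is_valid(article):
    """Apply the sample-selection rules described in the text."""
    return (
        article.word_count >= 50                 # drop tables and bare name lists
        and article.positive_words >= 5          # minimum number of positive words
        and 1 <= len(article.tickers) <= 3       # avoid attribution problems
        and time(0, 0) <= article.published_et <= time(15, 0)  # 12am to 3pm
    )


# Hypothetical articles for illustration.
good = Article(120, 7, ["WMT"], time(9, 30))
late = Article(120, 7, ["WMT"], time(16, 5))
short = Article(40, 7, ["WMT"], time(9, 30))
crowded = Article(120, 7, ["A", "B", "C", "D"], time(9, 30))
print([is_valid(a) for a in (good, late, short, crowded)])
```

Only the first article passes; the others fail on publication time, length, and number of tickers respectively.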
Stock return information for 2005‐2006 is obtained from CRSP. The selected sample consists of the set
of firms listed on either NASDAQ or NYSE at the beginning of 2005 and for which we have at least 120
trading days. The Fama‐French factors are obtained from Kenneth French’s personal webpage10. Table 3
provides summary statistics.
We obtain linguistic measurements from two sources. First, we use the well‐known Harvard‐IV‐4
psychosocial dictionary word classifications (General Inquirer (GI))11 for sentiment or tonality scoring.
The positive (pos) and negative (neg) sentiment measures are thus derived using the General Inquirer. Second, we use a series of lexicons from the
Linguistic Inquiry and Word Count (LIWC) to measure psychological processes12. For each newswire
article meeting our selection criteria, we obtain a general word count (word), as well as the relevant
linguistic measures from GI and LIWC. The core measures obtained from LIWC are affect, negemo,
posemo, anger, anx, sad, optim, and posfeel. Table 1 provides an overview of the linguistic measures.
Table 3 contains descriptive statistics for key financial variables, article features, emotions, sentiment,
and other linguistic measurements. We see that in our sample, a firm listed on NYSE is mentioned in a
newswire article about twice as often as a NASDAQ listed firm, but that a typical article about a
NASDAQ‐listed firm is longer than one about a NYSE‐listed firm. For NYSE (NASDAQ) firms, the Dow
Jones newswire articles are about 17% (20%) shorter in 2006 than in 2005. The average numbers of
words from the positive, affect, positive emotions, optimism, and sadness lexicons per article are higher
10 http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
11 The General Inquirer’s Web site lists each word in the positive and negative categories: http://www.webuse.umd.edu:9090/tags/TAGNeg.html and http://www.webuse.umd.edu:9090/tags/TAGPos.html.
12 See (Pennebaker, Chung, et al. n.d.) for more details. The LIWC website (http://www.liwc.net/liwcdescription.php) also contains useful information.
in both years for NASDAQ‐listed firms than for NYSE‐listed firms, but the reverse is true for words from
the positive feeling, negative emotions, anxiety, and anger lexicons.
Industry information for firms in the sample is obtained from Bloomberg. We used the two‐digit North
American Industry Classification System (NAICS) codes. An extra category (NAICS=0) was created to
contain 477 (438) NYSE‐listed firms and 108 (117) NASDAQ‐listed firms for 2005 (2006) for which a
NAICS code was not available from Bloomberg. A non‐ambiguous industry classification was obtained for
3316 (4097) firms for 2005 (2006). Table 4 provides descriptive statistics for 2005 at the industry level
for NYSE firms. Table 2 gives the NAICS labels.
Firms in industry 49 (transportation and warehousing: postal services) have the highest average number
of newswire articles per day with an average of 1 article every 2.63 days. They are followed by firms in
industries 51 and 45 (information and retail trade: sporting goods, hobby, books, music, and general
merchandise) with averages of 1 article every 5 days. Firms in industry 55, 61, and 81 (management of
companies and enterprises, educational services, and other services) have the lowest coverage intensity
with averages of 1 article every 33.3, 14.3, and 14.3 days respectively.
Overall, intensity measures (see footnote 13) for affect, negative, negative emotions, and sadness tend to be higher for
firms in industries 11, 31, 32, 44, and 45 and lower for firms in industries 55, 61, 23, and 92 (see footnote 14). This result
seems consistent with the intuition that firms in agriculture, manufacturing, and retail would exhibit a
newswire coverage that is more intense in sentiment and emotions than that of firms in management of
companies, educational services, construction, and public administration services. Figure 2 depicts key
intensity measures per industry for 2005.
13 Intensity measures for the emotion and sentiment variables are computed by dividing the total number of words for the given emotion/sentiment in an article by the article’s total word count.
14 See Table 4.
Results
Table 5 shows estimates for the Ordinary Least Squares (OLS) estimation of Equation 1 where the matrix
of reputation measures contains one of the measures listed in the first column. The first two sections (emotions and
sentiment) contain the measures of interest. Columns 2 and 3 present the results for firms listed on
either NYSE or NASDAQ while columns 4‐7 show the results for NYSE and NASDAQ‐listed firms
separately. With the exception of positive feelings, all emotions and sentiment coefficients have the
expected signs in the three regressions15. Two emotional measures are consistently significant (at the
0.01 level) across the three sets of results. Negative emotion and sadness shocks to a company’s
reputation are systematically associated with lower next day stock returns. The magnitude of the
coefficients for these two variables is also much greater than for the other measures. Overall, firms
listed on NASDAQ seem more responsive to our reputational shocks. For example, the coefficients for
negative emotions and sadness are roughly twice as large for NASDAQ‐listed firms as they are for NYSE‐
listed firms. For a typical firm listed on NASDAQ, a one standard deviation negative emotion shock to its
reputation is associated with a lower next day stock return by 4.5 basis points. Still for a NASDAQ‐listed
firm, a one standard deviation sadness shock to its reputation leads to a lower next day return by 5.0
basis points.
Table 6 shows OLS estimates for Equation 1 for the core reputational shock measures at the industry level
for both NYSE and NASDAQ‐listed firms16. We report the coefficients for the parameters Ψ at the 2‐digit
NAICS code industry level17. Consistent with what we observed in Table 5, we find that negative emotion
and sadness shocks present the most consistency in terms of expected signs and significance. In both
cases, of the 21 industries for which we have estimates, 17 have the expected negative sign. The
15 As affect includes emotions for which intuition would suggest a positive and a negative impact, there is no clear intuition for the coefficient sign.
16 Each regression uses a single reputation measure.
17 We include an industry if it contains at least 10 firms in our sample.
estimates are economically significant in many cases. For example, a one standard deviation increase in
our sadness measure is associated with a more than 14 basis point drop in next day stock return for firms
in wholesale trade (NAICS 42), while the same shock is associated with an almost 12 basis point drop
for firms in transportation and warehousing (NAICS 48). Smaller, yet significant impacts are found for
finance and insurance (NAICS 52), one of the manufacturing classifications (NAICS 33), and utilities (NAICS
22) with estimates of ‐2.1, ‐3, and ‐5.2 basis points respectively. Considering shocks to our negative
emotion measure, the largest impacts are for healthcare and social assistance (NAICS 62), wholesale
trade (NAICS 42), real estate and rental and leasing (NAICS 53), and one of the retail trade classifications
(NAICS 44) with estimates of ‐11.3, ‐10.8, ‐10.3, and ‐10.1 basis points respectively. Our industry‐level
estimates do not show a similar pattern to what was observed in Table 5 when comparing NYSE and
NASDAQ estimates18.
Conclusion and Extensions
Do linguistic measurements of reputational shocks impact corporate performance? We find some
supportive evidence. We find that negative emotion/sentiment and sadness shocks are significantly
associated with short term future stock performance with the expected signs. We further find that the
reputational shock impacts are economically meaningful. For instance, a one standard deviation positive
sadness shock is correlated with a 5.0 basis points lower next‐day return for NASDAQ‐listed firms or with
a drop of about 14.1 basis points for firms in wholesale trade. Consistent with intuition, our results
suggest that the impact of linguistic shocks to corporate reputations varies substantially across industries.
18 Results not shown. Available upon request.
The central aim of this paper is to stimulate further research on the systematic use of publicly available
information to the study of corporate reputation. In terms of pushing the agenda further, it would be
interesting to see how the perceptions of various stakeholder groups impact corporate performance or
to investigate the role of context (e.g. articles about product defects versus earnings release) in which
the linguistic reputation measures are obtained.
Works Cited
Carhart, Mark M. "On Persistence in Mutual Fund Performance." Journal of Finance, 1997: 57‐82.
Chung, C. K., and J. W. Pennebaker. "The psychological function of function words." In Social Communication, by K. Fiedler (Ed.), 343‐359. New York: Psychology Press., 2007.
Dave, K., S. Lawrence, and D. M. Pennock. "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews." Proceedings of the 12th International Conference on World Wide Web, 2003: 519‐528. Retrieved May 28, 2007, from ACM Digital Library.
Fama, Eugene F., and Kenneth R. French. "Common Risk Factors in the Returns on Stocks and Bonds." Journal of Financial Economics, 1993: 3‐56.
Finn, A., and N. Kushmerick. "Learning to Classify Documents According to Genre." Journal of the American Society for Information Science and Technology, 2006: 1506‐1518.
Hancock, J. T., L. Curry, S. Goorha, and M. Woodworth. "Automated linguistic analysis of deceptive and truthful synchronous computer‐mediated communication." Paper presented at the Hawaii International Conference on System Sciences, Hawaii, 2005.
Jordan, J., D. Diermeier, and A. D. Galinsky. "When it’s not the thought that counts: The double‐edged sword of care in corporate crisis responses." Working paper, 2008.
Kahn, J. H., R. M. Tobin, A. E. Massey, and J. A. Anderson. "Measuring emotional expression with the Linguistic inquiry and Word Count." The American journal of psychology, 2007: 263‐286.
Niederhoffer, K. G., and J. W. Pennebaker. "Linguistic style matching in social interaction." Journal of Language and Social Psychology, 2002: 337‐360.
Pang, B., L. Lee, and S. Vaithyanathan. "Thumbs up?: Sentiment classification using machine learning techniques." Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP2002), 2002: 79‐86.
Pennebaker, J. W., and M. E. Francis. "Cognitive, emotional, and language processes in disclosure." Cognition & emotion, 1996: 601‐626.
Pennebaker, J. W., and T. C. Lay. "Language use and personality during crises: Analysis of Mayor Rudolph Giuliani's press conferences." Journal of Research in Personality, 2002: 271‐282.
Pennebaker, J. W., M. R. Mehl, and K. G. Niederhoffer. "Psychological aspects of natural language use: Our words, our selves." Annual Review of Psychology, 2003: 547‐577.
Pennebaker, J. W., R. J. Booth, and M. E. Francis. Linguistic inquiry and word count: LIWC. Austin, Texas: Erlbaum Publishers, 2006.
Pennebaker, James W., Cindy K. Chung, Molly Ireland, Amy Gonzales, and Roger J. Booth. "The Development and Psychometric Properties of LIWC2007." LIWC.net.
Pennebaker, James W., M. E. Francis, and R. J. Booth. Linguistic Inquiry and Word Count (LIWC). Mahwah, New Jersey: Lawrence Erlbaum Associates, 2001.
Stirman, S. W., and J. W. Pennebaker. "Word Use in the Poetry of Suicidal and Nonsuicidal Poets." Psychosomatic Medicine, 2002: 517‐522.
Tetlock, Paul, Maytal Saar‐Tsechansky, and Sofus Macskassy. "More Than Words: Quantifying Language to Measure Firms' Fundamentals." Journal of Finance, 2008: 1437‐1467.
Uhlmann, E.L., G. Newman, V.L. Brescoll, A. Galinsky, and D. Diermeier. "Corporate crisis communication and its effect on consumers." Manuscript under review, 2008.
Yu, B., D. Diermeier, and S. Kaufmann. "The Wal‐Mart Corpus: A multi‐granularity corporate opinion corpus for opinion retrieval, classification and aggregation." Working paper, 2009.
Appendix
Table 1: LIWC and GI Categories
Category                  Abbreviation  Examples                      Words in category
GI
  Positive                pos           Ability, clean, hopeful       1915
  Negative                neg           Abominable, empty, haphazard  2291
LIWC
Linguistic processes
  Past tense              past          Went, ran, had                145
  Present tense           present       Is, does, hear                169
  Future tense            future        Will, gonna                   48
  Negations               negate        No, not, never                57
Psychological processes
  Affective processes     affect        Happy, cried, abandon         915
  Positive emotion        posemo        Love, nice, sweet             406
  Negative emotion        negemo        Hurt, ugly, nasty             499
  Anxiety                 anx           Worried, fearful, nervous     91
  Anger                   anger         Hate, kill, annoyed           184
  Sadness                 sad           Crying, grief, sad            101
  Optimism                optim         Certainty, pride, win         69
Cognitive processes
  Insight                 insight       Think, know, consider         195
  Causation               cause         Because, effect, hence        108
  Discrepancy             discrep       Should, would, could          76
  Tentative               tentat        Maybe, perhaps, guess         155
  Certainty               certain       Always, never                 83
  Inhibition              inhib         Block, constrain, stop        111
Personal concerns
  Money                   money         Audit, cash, owe              173
† The LIWC portion of the table is extracted from Pennebaker, Chung, et al. (n.d.).
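Dictionary-based measures of this kind are typically computed by counting how many words of an article fall in each category. The following is a minimal sketch under stated assumptions, not the authors' implementation: the word lists contain only the example words shown in Table 1 (the actual LIWC and GI dictionaries are far larger), matching is exact on lowercase tokens, and the names `TOY_DICTIONARY`, `category_counts`, and `category_shares` as well as the percentage normalization are hypothetical.

```python
import re
from collections import Counter

# Toy category lists built only from the example words in Table 1.
# Assumption: the real dictionaries are much larger (e.g. 101 sadness words).
TOY_DICTIONARY = {
    "sad": {"crying", "grief", "sad"},
    "anx": {"worried", "fearful", "nervous"},
    "anger": {"hate", "kill", "annoyed"},
}


def category_counts(text, dictionary=TOY_DICTIONARY):
    """Return (total_words, Counter of matched words per category)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return len(tokens), counts


def category_shares(text, dictionary=TOY_DICTIONARY):
    """Percentage of an article's words in each category (hypothetical normalization)."""
    total, counts = category_counts(text, dictionary)
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {category: 100.0 * counts[category] / total for category in dictionary}
```

Calling `category_counts` on a short sentence returns the token total together with the per-category hits, and `category_shares` rescales those hits by article length so that long and short articles can be compared.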
Table 2: 2‐Digit NAICS Labels
2‐Digit NAICS code  Description
11                  Agriculture, Forestry, Fishing and Hunting
21                  Mining, Quarrying, and Oil and Gas Extraction
22                  Utilities
23                  Construction
31                  Manufacturing: Food, beverage, tobacco, and textile
32                  Manufacturing: Wood, paper, printing, petroleum & coal, and chemical
33                  Manufacturing: Metal, machinery, computers & electronics, electrical equipment, transport, and furniture
42                  Wholesale Trade
44                  Retail Trade: Motor vehicle, furniture, electronics & appliances, building materials, food & beverage, health, gasoline station, and clothing
45                  Retail Trade: Sporting goods, hobby, books, music, and general merchandise
48                  Transportation and Warehousing: Air, rail, water, trucking, transit and ground passenger transport, pipeline, and scenic & sightseeing transportation
49                  Transportation and Warehousing: Postal services
51                  Information
52                  Finance and Insurance
53                  Real Estate and Rental and Leasing
54                  Professional, Scientific, and Technical Services
55                  Management of Companies and Enterprises
56                  Administrative and Support and Waste Management and Remediation
61                  Educational Services
62                  Health Care and Social Assistance
71                  Arts, Entertainment, and Recreation
72                  Accommodation and Food Services
81                  Other Services (except Public Administration)
92                  Public Administration
0                   NAICS not available
Table 3: Descriptive Statistics
                        2005                            2006
                        NASDAQ          NYSE            NASDAQ          NYSE
Variable                Mean    Std     Mean    Std     Mean    Std     Mean    Std
Emotions
  Affect                16.57   13.70   15.33   13.61   16.57   13.79   15.72   13.85
  Positive              13.34   11.27   12.14   11.03   13.33   11.38   12.54   11.24
  Negative              2.87    4.24    2.91    4.47    2.88    4.27    2.92    4.59
  Optimism              5.64    5.72    4.78    5.35    5.58    5.68    4.95    5.44
  Anxiety               0.21    0.69    0.24    0.80    0.22    0.78    0.25    0.85
  Anger                 0.44    1.12    0.47    1.19    0.43    1.05    0.46    1.17
  Sadness               1.47    2.83    1.44    2.80    1.49    2.87    1.41    2.80
  Positive feelings     0.69    1.51    0.72    1.72    0.67    1.52    0.73    1.83
Sentiment
  Positive              71.66   52.34   61.58   53.99   76.83   60.73   72.14   65.07
  Negative              22.11   20.95   22.55   24.93   23.94   24.72   27.14   30.48
Direction
  Up                    6.92    5.74    6.58    6.49    7.00    5.97    6.81    6.70
  Down                  1.40    2.16    1.70    2.69    1.44    2.40    1.78    2.94
Cognition
  Causation             6.33    5.52    4.81    5.21    6.18    5.60    4.80    5.29
  Insight               6.86    6.36    5.82    6.11    6.92    6.53    5.99    6.27
  Discrepancy           3.22    3.74    3.35    4.35    3.24    3.78    3.39    4.38
  Tentative             6.78    6.52    5.31    6.51    7.03    6.79    5.68    7.13
  Inhibition            2.72    3.10    2.05    2.85    2.75    3.11    2.11    2.87
  Certainty             2.85    3.05    2.31    2.95    2.87    3.11    2.45    3.10
Verb tense
  Past                  7.38    7.63    8.47    8.73    7.40    7.80    8.44    8.75
  Present               22.75   15.85   20.00   16.37   22.48   16.36   20.53   16.54
  Future                5.86    5.45    5.10    5.58    6.00    5.63    5.30    5.68
Other
  Negation              1.80    2.32    1.58    2.37    1.87    2.43    1.67    2.50
  Money                 13.72   16.10   15.80   18.04   14.04   16.76   15.80   18.43
Articles
  Daily articles/firm   0.06    0.07    0.12    0.15    0.07    0.10    0.14    0.22
  Total words/articles  622.07  397.57  522.45  403.06  495.54  345.46  435.23  331.93
Financial variables
  # of firms            2288    NA      1613    NA      2661    NA      1991    NA
  Daily return          0.03%   0.03    0.04%   0.02    0.07%   0.03    0.08%   0.02
  Market (X 1000)       0.18    6.48    0.18    6.49    0.43    6.70    0.44    6.71
  SMB (X 1000)          -0.07   4.20    -0.06   4.20    0.03    4.86    0.03    4.86
  HML (X 1000)          0.32    2.29    0.31    2.93    0.47    2.57    0.46    2.57
  UDM (X 1000)          0.64    5.05    0.64    5.06    -0.21   5.03    -0.20   5.03
  Risk free rate        0.01%   0.00    0.01%   0.00    0.02%   0.00    0.02%   0.00
Table 4: Descriptive Statistics by Industry for NYSE‐Listed Firms in 2005
NAICS  # of   Avg. daily     Avg. words  Avg. affect  Avg. negative    Avg. negative      Avg. sadness  Avg. daily
code   firms  articles/firm  per article intensity    words intensity  emotion intensity  intensity     return
11     6      0.15           514.03      3.09         4.78             0.71               0.37          0.04%
21     97     0.09           538.28      2.73         3.83             0.48               0.29          0.12%
22     83     0.14           530.95      3.28         4.41             0.51               0.30          0.07%
23     26     0.08           510.96      2.71         3.65             0.42               0.17          0.05%
31     56     0.13           492.31      3.11         4.44             0.57               0.27          0.06%
32     139    0.15           542.29      2.84         5.26             0.72               0.32          0.06%
33     277    0.14           546.46      2.77         4.42             0.57               0.24          0.06%
42     34     0.11           545.86      2.98         4.07             0.56               0.27          0.06%
44     54     0.15           459.99      3.32         4.25             0.66               0.31          0.06%
45     27     0.20           488.04      2.78         4.73             0.60               0.30          0.08%
48     34     0.16           525.59      2.41         4.25             0.47               0.31          0.08%
49     2      0.38           443.80      3.36         4.39             0.55               0.20          0.01%
51     79     0.20           604.32      2.62         4.70             0.45               0.22          0.03%
52     278    0.13           553.21      2.87         3.95             0.50               0.29          0.06%
53     45     0.08           552.27      2.86         2.74             0.43               0.22          0.07%
54     45     0.17           581.46      2.84         4.30             0.50               0.22          0.05%
55     1      0.03           518.10      2.48         1.43             0.18               0.10          0.16%
56     24     0.09           592.22      2.89         4.82             0.60               0.23          0.07%
61     3      0.07           495.75      2.03         3.23             0.23               0.07          0.04%
62     18     0.09           547.83      3.11         4.11             0.56               0.29          0.03%
71     7      0.10           601.02      2.75         3.36             0.49               0.37          0.08%
72     27     0.16           543.03      3.09         4.24             0.58               0.31          0.08%
81     10     0.07           634.12      2.93         3.76             0.48               0.25          0.10%
92     2      0.11           707.96      2.59         5.15             0.35               0.22          0.09%
0      477    0.05           354.74      3.98         3.89             0.74               0.33          0.07%
Figure 2: Intensity Measures for NYSE‐Listed Firms (2005)
[Chart not reproduced. Series: Affect Intensity, Negative Intensity, Negative Emotion Intensity, Sadness Intensity. Horizontal axis: 2‐digit NAICS codes (11 through 92, and 0). Vertical axis: 0 to 5.]
Table 5: OLS Estimates for Various Measures (2005‐2006)
                NYSE‐NASDAQ             NYSE                    NASDAQ
Measures        Coef. (X100)  t‐stat    Coef. (X100)  t‐stat    Coef. (X100)  t‐stat
Emotions
  Affect        -0.013*       -1.779    -0.013*       -1.747    -0.016        -1.179
  Positive      0.002         0.295     0.000         0.100     0.004         0.315
  Negative      -0.032***     -4.248    -0.025***     -3.577    -0.045***     -3.050
  Optimism      0.007         0.913     0.003         0.443     0.011         0.789
  Anxiety       -0.006        -0.833    -0.009        -1.295    -0.004        -0.296
  Anger         -0.001        -0.169    -0.002        -0.241    -0.003        -0.191
  Sadness       -0.038***     -5.034    -0.027***     -3.592    -0.050***     -3.688
  Pos. feel.    0.002         0.265     -0.002        -0.294    0.007         0.442
Sentiment
  Positive      -0.000**      -1.936    -0.000**      -2.022    -0.000*       1.715
  Negative      -0.016**      -2.093    -0.004        -0.545    -0.036**      -2.457
Direction
  Up            0.008         1.064     0.013*        1.786     0.000         0.028
  Down          -0.007        -0.966    -0.014**      -1.940    -0.003        -0.203
Cognition
  Causation     -0.010        -1.309    -0.001        -0.060    -0.015        -1.083
  Insight       -0.000        0.046     -0.007        -0.922    0.007         0.558
  Discrepancy   -0.003        -0.358    0.001         0.085     -0.013        -0.827
  Tentative     -0.004        -0.579    0.003         0.451     -0.011        -0.780
  Inhibition    0.011         1.510     0.006         0.866     0.020         1.432
  Certainty     0.011         1.493     0.010         1.407     0.013         0.960
Verb Tense
  Past          -0.013        -1.760    -0.013**      -1.991    -0.025        -1.425
  Present       -0.000        -0.033    0.004         0.488     -0.005        -0.377
  Future        0.014         0.838     0.015**       2.050     0.012         0.908
Other
  Negation      -0.012        -1.552    -0.007        -0.966    -0.018        -1.243
  Money         -0.016**      -2.176    -0.020***     -2.818    -0.018        -1.118
OLS estimates of next day stock return on a linguistic measure, the four factors (market, SMB, HML, UDM), and a constant. Robust SEs are used.
*** Significant at the 0.01 level
** Significant at the 0.05 level
* Significant at the 0.10 level
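The specification in the table note, a regression of next day return on one linguistic measure, four factors, and a constant with robust standard errors, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the note does not say which robust estimator was used, so the HC1 variant of White's sandwich estimator is assumed here; the function name `ols_robust` is hypothetical; and all data are synthetic, so none of the numbers correspond to Table 5.

```python
import numpy as np


def ols_robust(y, X):
    """OLS coefficients with heteroskedasticity-robust (HC1) standard errors.

    X must already contain a column of ones for the constant.
    Returns (coefficients, robust standard errors, t-statistics).
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # White "sandwich" estimator; HC1 applies the correction n / (n - k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


# Synthetic data only: stand-ins for the variables named in the table note.
rng = np.random.default_rng(0)
n = 5000
measure = rng.normal(size=n)        # linguistic measure observed the day before
factors = rng.normal(size=(n, 4))   # market, SMB, HML, UDM stand-ins
true_beta = np.array([0.0, -0.05, 1.0, 0.3, 0.2, 0.1])
X = np.column_stack([np.ones(n), measure, factors])
# Error variance depends on the measure, so robust SEs are appropriate.
y = X @ true_beta + rng.normal(size=n) * (1.0 + 0.5 * np.abs(measure))

beta, se, t_stat = ols_robust(y, X)
```

In this layout `beta[1]` and `t_stat[1]` play the role of the coefficient and t‐stat columns of Table 5 for a single measure; repeating the fit once per measure would fill one column pair of the table.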
Table 6: Industry‐Level OLS Estimates for Various Measures (2005‐2006)
Industry  Positive   Negative   Positive   Negative   Sadness    Optimism   Anxiety    Anger
          sentiment  sentiment  emotions   emotions
0         -0.000     -0.001     0.010      -0.021     -0.029**   0.012      -0.008     -0.006
11        -0.000     0.065      -0.066     -0.043     0.045      -0.030     -0.049     -0.082
21        0.000      0.041      -0.035     0.075      0.024      0.005      -0.011     0.103**
22        -0.000**   0.018      -0.008     -0.002     -0.052***  0.005      -0.043*    0.034*
23        -0.000     0.076      0.055      -0.024     -0.056     0.140      -0.062     0.009
31        -0.000     0.016      0.032      0.036      -0.018     -0.007     0.025      -0.002
32        0.000      0.006      0.026      0.006      -0.035*    0.012      -0.011     0.022
33        0.000      0.003      0.006      -0.025*    -0.030**   0.015      -0.004     0.009
42        -0.000     -0.104**   -0.056     -0.108***  -0.141***  0.033      -0.051     0.037
44        -0.000*    -0.068*    -0.007     -0.101**   -0.093***  0.023      -0.045     -0.021
45        -0.000     -0.067*    -0.028     -0.054     -0.039     0.013      0.021      -0.018
48        0.000      -0.031     0.012      -0.121*    -0.119***  -0.013     0.006      -0.044
51        0.000      0.017      0.027*     -0.025     -0.033*    0.011      0.012      0.005
52        -0.000     -0.007     0.004      -0.012     -0.021**   0.001      -0.015     -0.010
53        -0.000     -0.034     -0.066     -0.103**   -0.082*    -0.044     0.012*     -0.055
54        0.000*     -0.027     0.055**    -0.049**   -0.044*    0.042**    0.023      -0.036*
56        0.000      -0.007     -0.057     0.005      0.021      -0.035     -0.012     -0.034
61        -0.000     0.144      0.039      -0.061     -0.192     -0.016     0.109      0.120
62        -0.000     -0.114**   0.007      -0.113**   -0.067     0.055      -0.083     -0.076
71        -0.000     0.079      -0.026     -0.031     -0.029     -0.049     0.108      0.035
72        -0.000*    0.007      -0.018     -0.031     0.064*     0.036      -0.007     -0.009
81        -0.001**   0.127      -0.031     0.063      0.029      0.115      0.003      -0.137
OLS estimates at the industry level of next day stock return on a linguistic measure, the four factors (market, SMB, HML, UDM), and a constant. Robust SEs are used.
*** Significant at the 0.01 level
** Significant at the 0.05 level
* Significant at the 0.10 level
The marginal effects reported are computed at the sample mean of the underlying variables.
Table 7: Measures by Industry (2005‐2006)
Measures              Industries          Sign (# + / # - / non‐significant)
Emotions
  Affect              42;53               0/2/19
  Positive emotions   54                  1/0/20
  Positive feelings   22;71               1/1/19
  Negative emotions   42;44;53;54;62      0/5/16
  Optimism            54                  1/0/20
  Anxiety             (none)              0/0/21
  Anger               21                  1/0/20
  Sadness             0;22;33;42;44;48    0/6/15
Sentiment
  Positive            22;81               0/2/19
  Negative            42;62               0/2/19
OLS estimates at the industry level of next day stock return on a linguistic measure, the four factors (market, SMB, HML, UDM), and a constant. Robust SEs are used. Column 2 gives the 2‐digit NAICS codes for the industries with coefficients significant at the 0.05 level for the relevant measures. Column 3 gives the % of 2‐digit NAICS codes in our sample for which we have significant coefficients at the 0.05 level. Column 4 gives x/y/z such that x (y) is the # of 2‐digit NAICS codes for which we have positive (negative) significant coefficient and z is the number of non‐significant coefficients.
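The x/y/z tally described in the note can be derived mechanically from starred coefficient cells. The sketch below is one hypothetical way to do it, assuming the star convention of the tables (two or more stars means significant at the 0.05 level); the function name `tally_significance` is an assumption, and the input is only a six-industry subset of the Negative Sentiment column of Table 6, so the resulting counts are not those of Table 7.

```python
def tally_significance(cells, min_stars=2):
    """Tally starred coefficient cells for one measure.

    cells maps a 2-digit NAICS code to a table cell such as "-0.104**".
    min_stars=2 corresponds to the 0.05 level under the tables' convention.
    Returns ((n_positive, n_negative, n_non_significant), significant_codes).
    """
    positive, negative, non_significant = 0, 0, 0
    significant_codes = []
    for code, cell in cells.items():
        stars = cell.count("*")
        # PDF extraction can yield U+2010 hyphens instead of ASCII minus signs.
        number = cell.rstrip("*").replace("\u2010", "-").strip()
        float(number)  # raises ValueError if the cell is not numeric
        if stars < min_stars:
            non_significant += 1
            continue
        significant_codes.append(code)
        # Use the printed sign so that cells rounded to -0.000 stay negative.
        if number.startswith("-"):
            negative += 1
        else:
            positive += 1
    return (positive, negative, non_significant), significant_codes


# Subset of the Negative Sentiment column of Table 6 (illustration only).
negative_sentiment_subset = {
    "21": "0.041",
    "42": "-0.104**",
    "44": "-0.068*",
    "45": "-0.067*",
    "48": "-0.031",
    "62": "-0.114**",
}

counts, codes = tally_significance(negative_sentiment_subset)
```

For this subset the two industries with two stars (42 and 62) are counted as negative and significant, and the remaining four cells, including the single‐starred ones, are counted as non‐significant.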
Figure 3: Positive and Negative Reputation Shock Coefficients by 2‐Digit NAICS Codes (NYSE‐NASDAQ 2005‐2006)
Figure 4: Positive and Negative Emotional Reputation Shock Coefficients by 2‐Digit NAICS Codes (NYSE‐NASDAQ 2005‐2006)
Figure 5: Sadness and Optimism Shock Coefficients by 2‐Digit NAICS Codes (NYSE‐NASDAQ 2005‐2006)
Figure 6: Anger and Anxiety Shock Coefficients by 2‐Digit NAICS Codes (NYSE‐NASDAQ 2005‐2006)