ON THE VALIDITY AND IMPACT OF METACRITIC SCORES
Adams Greenwood-Ericksen, PhD
Special thanks to Erica Holcomb, MS and Cameron Bolinger, MS
Agenda
Why does metareview happen?
How does metareview work?
Why is metareview problematic?
What have we found out so far?
What’s next?
In the beginning…
There was Play Meter (1974–present). Then Weekly Shōnen Jump, Computer and Video Games, Electronic Games, Electronic Gaming Monthly (still around!), Computer Gaming World, Nintendo Power, and so on.*
* According to Wikipedia, at least. I wasn’t around until that last one.
Diversification and the internet
As gaming journals proliferated, so did the diversity of opinions.
With the rise of the internet, this proliferation led to information overload across all areas of media criticism.
This, in turn, led to the beginnings of metareview.
Metareview sites matter in the industry
In general, there’s a sense that Metareview scores are a critical factor in the success or failure of game products.
Why?
THQ stock price
Bonus money for Fallout: New Vegas
Bonus money for Destiny
Warren Spector on Metacritic at DICE 2013
http://news.yahoo.com/homefront-reviews-torpedo-thq-stock-price-metacritic-broken-20110316-084500-427.html
http://www.kotaku.com.au/2014/09/destiny-review-scores-may-cost-bungie-25-million/
http://www.joystiq.com/2012/03/15/obsidian-missed-fallout-new-vegas-metacritic-bonus-by-one-point/
We wondered…
Why is this such a big deal?
How does it work?
Does it work at all?
How does metareview help or hurt?
Who does it help or hurt?
Are there better ways to do this?
My grad students
(Because somebody’s got to do the real work)
Erica Holcomb, MS Cameron Bolinger, MS
What’s good about metareview?
Clearinghouse and index for lots of different information sources
Aggregates lots of individual data points into a more coherent single answer
Metareview reduces information overload
Problems with metareview
Basic premise: give up nuance and diversity of opinion in exchange for clarity
Lots of issues with validity related to scores, aggregation, etc. (Greenwood-Ericksen, Poorman, & Papp, 2014)
Vulnerability to manipulation by 3rd parties
Leads to oversimplification of a complex topic
Discards lots of relevant context
http://www.eludamos.org/index.php/eludamos
Metacritic vs. Gamerankings
Actually indicative of a theoretical difference in approach
Gamerankings: all currently extant publications have equal value
Metacritic: some publications are more trustworthy than others (the two aggregation styles are sketched below)
Neither approach is without drawbacks
Greater transparency vs. greater reliability of data
Rotten Tomatoes has an interesting alternative approach as well (but they don’t review games).
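To make that contrast concrete, here is a minimal Python sketch comparing the two aggregation styles. The publication names, scores, and weights are all invented for illustration; they are not Metacritic's or Gamerankings' actual values.

```python
# Invented critic scores (0-100) and illustrative trust weights;
# none of these values come from Metacritic or Gamerankings.
reviews = [
    ("Publication A", 90, 1.5),   # (name, score, weight)
    ("Publication B", 70, 1.0),
    ("Publication C", 60, 0.5),
]

# Gamerankings-style: every publication counts equally.
unweighted = sum(score for _, score, _ in reviews) / len(reviews)

# Metacritic-style: each score is multiplied by a per-publication weight.
weighted = (sum(score * w for _, score, w in reviews)
            / sum(w for _, _, w in reviews))

print(f"Unweighted average: {unweighted:.1f}")  # 73.3
print(f"Weighted average:   {weighted:.1f}")    # 78.3
```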
Examination of validity issues
Identified lots of issues with the validity of the metareview process in general, and Metacritic specifically (Greenwood-Ericksen, Poorman, & Papp, 2014):
Loss of useful diversity
Issues with review reliability
Distortions in translation to 100-point scale
Lack of transparency in weighting system
Very serious problems with interpretation and application of results
Still found very strong relationship between sales and scores (r = .72)
Sales vs. Scores (≈10 years)
[Scatter plot: Metacritic score (x-axis, 20–100) vs. sales in millions (y-axis, 0–14) for PS3 and Xbox 360 games]
The transparency gap: Metacritic weights
One key issue with Metacritic: secret weighting of scores by publication
Exact weights are a moving target
Two things to look at:
Derived weights: try to figure out what numbers Metacritic is using (Greenwood-Ericksen, Poorman, & Papp, 2013; Bolinger, Holcomb, & Greenwood-Ericksen, 2015)
Observed weights: watch how much publications push or pull overall scores (Swisher, 2014).
Observed weights
Less about exact match, and more about rationality of influence (a rough leave-one-out sketch follows below)
Scott Swisher (UW Madison, now at Cambridge)
Doctoral thesis on review weighting and reliability
Found some logical and interesting patterns over time
Gamespot/Jeff Gerstmann scandal, 2007
Really great treasure trove of data on Metacritic – check it out if you’re interested.
https://s3.amazonaws.com/sswisher_econ/Swisher_InfoAgg_Summary_7-12-2013.pdf
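One simple way to get at the "push or pull" idea (a rough stand-in for illustration, not necessarily Swisher's actual method) is a leave-one-out check: drop each publication's review in turn and see how far the overall average moves. A sketch with invented scores:

```python
# Invented review scores for a single game; publication names are placeholders.
scores = {"Pub A": 92, "Pub B": 85, "Pub C": 70, "Pub D": 65}

overall = sum(scores.values()) / len(scores)

# Influence = how much the average shifts when this review is removed.
for pub in scores:
    others = [s for p, s in scores.items() if p != pub]
    loo_mean = sum(others) / len(others)
    print(f"{pub}: pulls the overall average by {overall - loo_mean:+.2f} points")
```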
Derived weights
“Derived” in that we’re trying to reproduce the actual weights Metacritic uses (a toy version of this fitting problem is sketched after the list)
This is really hard, it turns out
Obstacles:
Rounding
Relatively rapid changes in weighting
Identifying starting weights
Identifying number of tiers (or if there are tiers at all)
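The derived-weight problem can be framed as an inverse problem: given each publication's scores and the published metascores for a set of games, search for the weights that best reproduce them. Below is a heavily simplified sketch using non-negative least squares; every number in it is invented, and the real problem also has to cope with rounding, tier structure, and weights that change over time.

```python
import numpy as np
from scipy.optimize import nnls

# Rows = games, columns = publications; entries are critic scores (0-100).
# All values are invented for illustration.
scores = np.array([
    [90, 80, 70],
    [60, 75, 85],
    [88, 92, 65],
    [70, 70, 95],
], dtype=float)

# Published metascores for those four games (also invented).
metascores = np.array([82.0, 72.0, 84.0, 77.0])

# Solve metascores ~= scores @ w with w >= 0, then normalize w to sum to 1
# so the result reads as relative publication weights.
w, residual = nnls(scores, metascores)
w = w / w.sum()
print("Estimated relative weights:", np.round(w, 3))
print("Fit residual:", round(residual, 2))
```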
Modeling derived weights
First pass: GDC 2013
Fundamental approach seemed to work
Problems with reliability of method
Some assumptions probably wrong
Key outcomes:
Established basic method as possible
Got us yelled at by Metacritic
In response, Metacritic published a ton of new information about how weighting works
Subsequently, found more issues with model stability and uniqueness (see the sketch below)
Lots of great new info to work with!
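The uniqueness problem comes largely from rounding: because published metascores are integers, more than one weight vector can reproduce the same set of published values exactly. A toy demonstration with invented numbers:

```python
import numpy as np

# Invented critic scores for three games (rows) from three publications (cols).
scores = np.array([
    [90, 80, 70],
    [60, 75, 85],
    [88, 92, 65],
], dtype=float)

# Pretend the "true" weights are 0.5/0.3/0.2 and publish the rounded results.
published = np.round(scores @ np.array([0.5, 0.3, 0.2]))

# A different weight vector reproduces exactly the same published integers.
for w in (np.array([0.50, 0.30, 0.20]), np.array([0.48, 0.32, 0.20])):
    predicted = np.round(scores @ w)
    print(w, "matches published scores:", np.array_equal(predicted, published))
```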
Fanmail!
What they said:
They use weighting tiers, but…
We had too many tiers
Our range of weights was too large, and…
We had at least some of the modeled weights wrong
Modeling derived weights: Recent work
Second pass (GDC 2015):
Computationally more robust
Incorporated new info from Metacritic
Removed most sources of human error
Focused on one narrow time frame
Focused on one single platform
Were able to identify a stable model with 0% error (that’s possible because the original inputs are manmade)
Note: that doesn’t necessarily mean it’s the actual values!
Full results to be presented at GDC San Francisco in March.
Another core question:
Does the weighting matter at all?
Did a comparison using 2012 data of game sales (VGChartz) to:
Metacritic metascore
Gamerankings score
Unweighted average of critic scores curated by Metacritic
Found a significant correlation between sales and scores for each, BUT…
Metacritic no better than the others (comparison sketched below)
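The comparison itself is straightforward once all three score variants sit in one table. A minimal sketch; the file and column names here (scores_vs_sales_2012.csv, metascore, gamerankings, unweighted_avg, sales) are placeholders, not our actual dataset:

```python
import pandas as pd
from scipy.stats import pearsonr

# Placeholder file/column names; our analysis paired VGChartz sales figures
# with scores gathered from Metacritic and Gamerankings for 2012 releases.
df = pd.read_csv("scores_vs_sales_2012.csv")

for col in ["metascore", "gamerankings", "unweighted_avg"]:
    r, p = pearsonr(df[col], df["sales"])
    print(f"{col:>15}: r = {r:.2f} (p = {p:.3f})")
```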
Predictive value of metareview scores relative to sales
[Bar chart: Correlation of Sales to Scores. Pearson's r (y-axis, 0–0.35) for Metacritic, Gamerankings, and unweighted average scores against 1st-week and 2012 total sales]
What does that mean?
It means that, from the standpoint of sales prediction, the weights don’t matter.
Shouldn’t be hugely surprising – differences always very small
Might suggest that reviewers tend to have similar opinions – or that they’re benchmarking scores to each other
Suggests that Metacritic could drop this controversial aspect of their product without weakening its predictive value
What we’re thinking about next…
We’ve found some cool things, but there’s still a lot to be done on this
Do game reviewers adjust their scores based on already published reviews?
Do score averages tend to go up or down over time?
How much does marketing affect score/sales?
Ultimately, more research is needed
Metareview isn’t going away – it’s too useful
Implications for virtually all media
Need to figure out how to do this better!
Questions?
Thanks for your time!