ON THE VALIDITY AND IMPACT OF METACRITIC SCORES
Adams Greenwood-Ericksen, PhD
Special thanks to Erica Holcomb, MS and Cameron Bolinger, MS
Agenda
Why does metareview happen?
How does metareview work?
Why is metareview problematic?
What have we found out so far?
What’s next?
In the beginning…
There was Play Meter (1974–present). Then Weekly Shōnen Jump, Computer and Video Games, Electronic Games, Electronic Gaming Monthly (still around!), Computer Gaming World, Nintendo Power, and so on.*
* According to Wikipedia, at least. I wasn’t around until that last one.
Diversification and the internet
As gaming journals proliferated, so did the diversity of opinions.
With the rise of the internet, this proliferation led to information overload across all areas of media criticism.
This, in turn, led to the beginnings of metareview.
Metareview sites matter in the industry
In general, there’s a sense that Metareview scores are a critical factor in the success or failure of game products.
Why?
THQ stock price
Bonus money for Fallout: New Vegas
Bonus money for Destiny
Warren Spector on Metacritic at DICE 2013
http://news.yahoo.com/homefront-reviews-torpedo-thq-stock-price-metacritic-broken-20110316-084500-427.html
http://www.kotaku.com.au/2014/09/destiny-review-scores-may-cost-bungie-25-million/
http://www.joystiq.com/2012/03/15/obsidian-missed-fallout-new-vegas-metacritic-bonus-by-one-point/
We wondered…
Why is this such a big deal?
How does it work?
Does it work at all?
How does metareview help or hurt?
Who does it help or hurt?
Are there better ways to do this?
My grad students
(Because somebody’s got to do the real work)
Erica Holcomb, MS Cameron Bolinger, MS
What’s good about metareview?
Clearinghouse and index for lots of different information sources
Aggregates lots of individual data points into a more coherent single answer
Metareview reduces information overload
Problems with metareview
Basic premise: give up nuance and diversity of opinion in exchange for clarity
Lots of issues with validity related to scores, aggregation, etc. (Greenwood-Ericksen, Poorman, & Papp, 2014)
Vulnerability to manipulation by 3rd parties
Leads to oversimplification of a complex topic
Discards lots of relevant context
http://www.eludamos.org/index.php/eludamos
Metacritic vs. Gamerankings
Actually indicative of a theoretical difference in approach
Gamerankings: all currently extant publications have equal value
Metacritic: some publications are more trustworthy than others (the two aggregation styles are sketched below)
Neither approach is without drawbacks
Greater transparency vs. greater reliability of data
Rotten Tomatoes has an interesting alternative approach as well (but they don’t review games).
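To make that contrast concrete, here is a minimal Python sketch comparing the two aggregation styles. The publication names, scores, and weights are all invented for illustration; they are not Metacritic's or Gamerankings' actual values.

```python
# Invented critic scores (0-100) and illustrative trust weights;
# none of these values come from Metacritic or Gamerankings.
reviews = [
    ("Publication A", 90, 1.5),   # (name, score, weight)
    ("Publication B", 70, 1.0),
    ("Publication C", 60, 0.5),
]

# Gamerankings-style: every publication counts equally.
unweighted = sum(score for _, score, _ in reviews) / len(reviews)

# Metacritic-style: each score is multiplied by a per-publication weight.
weighted = (sum(score * w for _, score, w in reviews)
            / sum(w for _, _, w in reviews))

print(f"Unweighted average: {unweighted:.1f}")  # 73.3
print(f"Weighted average:   {weighted:.1f}")    # 78.3
```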
Examination of validity issues
Identified lots of issues with the validity of the metareview process in general, and Metacritic specifically (Greenwood-Ericksen, Poorman, & Papp, 2014):
Loss of useful diversity
Issues with review reliability
Distortions in translation to 100-point scale
Lack of transparency in weighting system
Very serious problems with interpretation and application of results
Still found very strong relationship between sales and scores (r = .72)
Sales vs. Scores (≈10 years)
[Scatter plot: Metacritic score (x-axis, 20–100) vs. sales in millions (y-axis, 0–14) for PS3 and Xbox 360 games]
The transparency gap: Metacritic weights
One key issue with Metacritic: secret weighting of scores by publication
Exact weights are a moving target
Two things to look at:
Derived weights: try to figure out what numbers Metacritic is using (Greenwood-Ericksen, Poorman, & Papp, 2013; Bolinger, Holcomb, & Greenwood-Ericksen, 2015)
Observed weights: watch how much publications push or pull overall scores (Swisher, 2014).
Observed weights
Less about exact match, and more about rationality of influence (a rough leave-one-out sketch follows below)
Scott Swisher (UW Madison, now at Cambridge)
Doctoral thesis on review weighting and reliability
Found some logical and interesting patterns over time
Gamespot/Jeff Gerstmann scandal, 2007
Really great treasure trove of data on Metacritic – check it out if you’re interested.
https://s3.amazonaws.com/sswisher_econ/Swisher_InfoAgg_Summary_7-12-2013.pdf
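One simple way to get at the "push or pull" idea (a rough stand-in for illustration, not necessarily Swisher's actual method) is a leave-one-out check: drop each publication's review in turn and see how far the overall average moves. A sketch with invented scores:

```python
# Invented review scores for a single game; publication names are placeholders.
scores = {"Pub A": 92, "Pub B": 85, "Pub C": 70, "Pub D": 65}

overall = sum(scores.values()) / len(scores)

# Influence = how much the average shifts when this review is removed.
for pub in scores:
    others = [s for p, s in scores.items() if p != pub]
    loo_mean = sum(others) / len(others)
    print(f"{pub}: pulls the overall average by {overall - loo_mean:+.2f} points")
```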
Derived weights
“Derived” in that we’re trying to reproduce the actual weights Metacritic uses (a toy version of this fitting problem is sketched after the list)
This is really hard, it turns out
Obstacles:
Rounding
Relatively rapid changes in weighting
Identifying starting weights
Identifying number of tiers (or if there are tiers at all)
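The derived-weight problem can be framed as an inverse problem: given each publication's scores and the published metascores for a set of games, search for the weights that best reproduce them. Below is a heavily simplified sketch using non-negative least squares; every number in it is invented, and the real problem also has to cope with rounding, tier structure, and weights that change over time.

```python
import numpy as np
from scipy.optimize import nnls

# Rows = games, columns = publications; entries are critic scores (0-100).
# All values are invented for illustration.
scores = np.array([
    [90, 80, 70],
    [60, 75, 85],
    [88, 92, 65],
    [70, 70, 95],
], dtype=float)

# Published metascores for those four games (also invented).
metascores = np.array([82.0, 72.0, 84.0, 77.0])

# Solve metascores ~= scores @ w with w >= 0, then normalize w to sum to 1
# so the result reads as relative publication weights.
w, residual = nnls(scores, metascores)
w = w / w.sum()
print("Estimated relative weights:", np.round(w, 3))
print("Fit residual:", round(residual, 2))
```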
Modeling derived weights
First pass: GDC 2013
Fundamental approach seemed to work
Problems with reliability of method
Some assumptions probably wrong
Key outcomes:
Established basic method as possible
Got us yelled at by Metacritic
In response, Metacritic published a ton of new information about how weighting works
Subsequently, found more issues with model stability and uniqueness (see the sketch below)
Lots of great new info to work with!
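The uniqueness problem comes largely from rounding: because published metascores are integers, more than one weight vector can reproduce the same set of published values exactly. A toy demonstration with invented numbers:

```python
import numpy as np

# Invented critic scores for three games (rows) from three publications (cols).
scores = np.array([
    [90, 80, 70],
    [60, 75, 85],
    [88, 92, 65],
], dtype=float)

# Pretend the "true" weights are 0.5/0.3/0.2 and publish the rounded results.
published = np.round(scores @ np.array([0.5, 0.3, 0.2]))

# A different weight vector reproduces exactly the same published integers.
for w in (np.array([0.50, 0.30, 0.20]), np.array([0.48, 0.32, 0.20])):
    predicted = np.round(scores @ w)
    print(w, "matches published scores:", np.array_equal(predicted, published))
```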
Fanmail!
What they said:
They use weighting tiers, but…
We had too many tiers
Our range of weights was too large, and…
We had at least some of the modeled weights wrong
Modeling derived weights: Recent work
Second pass (GDC 2015):
Computationally more robust
Incorporated new info from Metacritic
Removed most sources of human error
Focused on one narrow time frame
Focused on one single platform
Were able to identify a stable model with 0% error (that’s possible because the original inputs are manmade)
Note: that doesn’t necessarily mean it’s the actual values!
Full results to be presented at GDC San Francisco in March.
Another core question:
Does the weighting matter at all?
Did a comparison using 2012 data of game sales (VGChartz) to:
Metacritic metascore
Gamerankings score
Unweighted average of critic scores curated by Metacritic
Found a significant correlation between sales and scores for each, BUT…
Metacritic no better than the others (comparison sketched below)
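The comparison itself is straightforward once all three score variants sit in one table. A minimal sketch; the file and column names here (scores_vs_sales_2012.csv, metascore, gamerankings, unweighted_avg, sales) are placeholders, not our actual dataset:

```python
import pandas as pd
from scipy.stats import pearsonr

# Placeholder file/column names; our analysis paired VGChartz sales figures
# with scores gathered from Metacritic and Gamerankings for 2012 releases.
df = pd.read_csv("scores_vs_sales_2012.csv")

for col in ["metascore", "gamerankings", "unweighted_avg"]:
    r, p = pearsonr(df[col], df["sales"])
    print(f"{col:>15}: r = {r:.2f} (p = {p:.3f})")
```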
Predictive value of metareview scores relative to sales
[Bar chart: Correlation of Sales to Scores. Pearson's r (y-axis, 0–0.35) for Metacritic, Gamerankings, and unweighted average scores against 1st-week and 2012 total sales]
What does that mean?
It means that, from the standpoint of sales prediction, the weights don’t matter.
Shouldn’t be hugely surprising – differences always very small
Might suggest that reviewers tend to have similar opinions – or that they’re benchmarking scores to each other
Suggests that Metacritic could drop this controversial aspect of their product without weakening its predictive value
What we’re thinking about next…
We’ve found some cool things, but there’s still a lot to be done on this
Do game reviewers adjust their scores based on already published reviews?
Do score averages tend to go up or down over time?
How much does marketing affect score/sales?
Ultimately, more research is needed
Metareview isn’t going away – it’s too useful
Implications for virtually all media
Need to figure out how to do this better!
Questions?
Thanks for your time!