
On the Validity and Impact of Metacritic Scores

Description:
This talk will look at how meta-review sites such as Metacritic, Gamerankings, and Rotten Tomatoes assess games, and how these scores relate to game sales. The broad validity of various metascore generation processes will be discussed, along with recent quantitative research into meta-review scores that touches both on the composition of these scores and also on their usefulness as metrics for assessing game quality.
Transcript
Page 1

ON THE VALIDITY AND IMPACT OF METACRITIC SCORES
Adams Greenwood-Ericksen, PhD

Special thanks to Erica Holcomb, MS and Cameron Bolinger, MS

Page 2

Agenda

Why does metareview happen?
How does metareview work?
Why is metareview problematic?
What have we found out so far?
What’s next?

Page 3

In the beginning…

There was Play Meter (1974-present). Then Weekly Shōnen Jump, Computer and Video Games, Electronic Games, Electronic Gaming Monthly (still around!), Computer Gaming World, Nintendo Power, and so on.*

* According to Wikipedia, at least. I wasn’t around until that last one.

Page 4

Diversification and the internet

As gaming journals proliferated, so did the diversity of opinions.

With the rise of the internet, this proliferation led to information overload across all areas of media criticism.

This, in turn, led to the beginnings of metareview.

Page 5

Metareview sites matter in the industry

In general, there’s a sense that metareview scores are a critical factor in the success or failure of game products.

Why?
THQ stock price
Bonus money for Fallout: New Vegas
Bonus money for Destiny
Warren Spector on Metacritic at DICE 2013

http://news.yahoo.com/homefront-reviews-torpedo-thq-stock-price-metacritic-broken-20110316-084500-427.html
http://www.kotaku.com.au/2014/09/destiny-review-scores-may-cost-bungie-25-million/
http://www.joystiq.com/2012/03/15/obsidian-missed-fallout-new-vegas-metacritic-bonus-by-one-point/

Page 6

We wondered…

Why is this such a big deal?
How does it work?
Does it work at all?
How does metareview help or hurt?
Who does it help or hurt?
Are there better ways to do this?

Page 7

My grad students

(Because somebody’s got to do the real work)

Erica Holcomb, MS
Cameron Bolinger, MS

Page 8

What’s good about metareview?

Clearinghouse and index for lots of different information sources

Aggregates lots of individual data points into a more coherent single answer

Metareview reduces information overload

Page 9

Problems with metareview

Basic premise: give up nuance and diversity of opinion in exchange for clarity

Lots of issues with validity related to scores, aggregation, etc. (Greenwood-Ericksen, Poorman, & Papp, 2014).

Vulnerability to manipulation by 3rd parties
Leads to oversimplification of a complex topic
Discards lots of relevant context

http://www.eludamos.org/index.php/eludamos

Page 10

Metacritic vs. Gamerankings

Actually indicative of a theoretical difference in approach
Gamerankings: all currently extant publications have equal value
Metacritic: some publications are more trustworthy than others (the sketch below contrasts the two)
Neither approach is without drawbacks
Greater transparency vs. greater reliability of data
Rotten Tomatoes has an interesting alternative approach as well (but they don’t review games).
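To make the contrast concrete, here is a minimal sketch in Python of the two aggregation philosophies. The outlet names, scores, and trust weights are all invented for illustration, since neither site publishes its exact inputs.

```python
# Toy contrast between the two aggregation philosophies.
# All outlets, scores, and weights are invented for illustration.

# (publication, score on a 100-point scale, hypothetical trust weight)
reviews = [
    ("BigOutlet", 90, 1.5),   # assumed "more trustworthy" tier
    ("MidOutlet", 80, 1.0),
    ("SmallBlog", 60, 0.5),   # assumed lower tier
]

# Gamerankings-style: every currently extant publication counts equally.
unweighted = sum(score for _, score, _ in reviews) / len(reviews)

# Metacritic-style: publications contribute in proportion to trust.
weighted = (sum(score * w for _, score, w in reviews)
            / sum(w for _, _, w in reviews))

print(f"unweighted: {unweighted:.1f}")  # 76.7
print(f"weighted:   {weighted:.1f}")    # 81.7
```

The same three reviews yield metascores five points apart, which is why the choice of weights, and their secrecy, matters.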

Page 11

Examination of validity issues

Identified lots of issues with the validity of the metareview process in general, and Metacritic specifically (Greenwood-Ericksen, Poorman, & Papp, 2014).
Loss of useful diversity
Issues with review reliability
Distortions in translation to the 100-point scale (sketched below)
Lack of transparency in the weighting system
Very serious problems with interpretation and application of results
Still found a very strong relationship between sales and scores (r = .72)
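As a small illustration of the translation problem, the sketch below converts two coarse native scales onto the 100-point scale. Both mappings are hypothetical, not Metacritic’s actual conversion tables.

```python
# Toy illustration of distortion when coarse native scales are
# stretched onto a 100-point scale. Both mappings are hypothetical.

def stars_to_100(stars, max_stars=5):
    """Linear conversion of a star rating to a 100-point score."""
    return round(stars / max_stars * 100)

def letter_to_100(grade):
    """One plausible letter-grade table (not Metacritic's actual one)."""
    return {"A": 95, "A-": 91, "B+": 83, "B": 75, "B-": 67, "C": 50}[grade]

# A 4/5-star review and a B+ review may express similar sentiment,
# yet land on different numbers once forced onto one scale:
print(stars_to_100(4), letter_to_100("B+"))   # 80 83

# A 5-star scale can only ever produce a handful of distinct values,
# so fine distinctions between games are invented or lost:
print(sorted({stars_to_100(s) for s in (1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)}))
# [20, 30, 40, 50, 60, 70, 80, 90, 100]
```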

Page 12

Sales vs. Scores (≈10 years)

[Scatter plot: sales in millions (y-axis, 0-14) against Metacritic score (x-axis, 20-100) for PS3 and Xbox 360 games.]

Page 13

The transparency gap: Metacritic weights

One key issue with Metacritic: secret weighting of scores by publication
Exact weights are a moving target
Two things to look at:
Derived weights: try to figure out what numbers Metacritic is using (Greenwood-Ericksen, Poorman, & Papp, 2013; Bolinger, Holcomb, & Greenwood-Ericksen, 2015)
Observed weights: watch how much publications push or pull overall scores (Swisher, 2014).

Page 14

Observed weights

Less about exact match, and more about rationality of influence
Scott Swisher (UW Madison, now at Cambridge)
Doctoral thesis on review weighting and reliability
Found some logical and interesting patterns over time
Gamespot/Jeff Gerstmann scandal, 2007
Really great treasure trove of data on Metacritic – check it out if you’re interested (a rough sketch of the observed-weights idea follows below).

https://s3.amazonaws.com/sswisher_econ/Swisher_InfoAgg_Summary_7-12-2013.pdf
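One rough way to frame the observed-weights idea (a sketch of the concept only, not Swisher’s actual method): if a site’s metascore really is a weighted mean of publication scores, then across many games the per-publication influence can be fit by least squares. The data below are simulated.

```python
# Sketch: recover per-publication influence by least squares across
# many games. Simulated data; this illustrates the concept, not
# Swisher's actual method.
import numpy as np

rng = np.random.default_rng(0)
n_games, n_pubs = 200, 4
true_w = np.array([2.0, 1.5, 1.0, 0.5])          # hidden trust weights

pub_scores = rng.uniform(50, 95, size=(n_games, n_pubs))
metascores = pub_scores @ true_w / true_w.sum()  # what the site publishes

# Weights are identified only up to scale, so normalize to sum to 1.
w_hat, *_ = np.linalg.lstsq(pub_scores, metascores, rcond=None)
print(np.round(w_hat / w_hat.sum(), 3))    # ~ [0.4 0.3 0.2 0.1]
print(np.round(true_w / true_w.sum(), 3))  #   [0.4 0.3 0.2 0.1]
```

With clean, unrounded metascores the fit is exact; real published scores are rounded, which is exactly what makes the derived-weights problem on the next slides hard.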

Page 15

Derived weights

“Derived” in that we’re trying to reproduce the actual weights Metacritic uses
This is really hard, it turns out
Obstacles:
Rounding (see the sketch below)
Relatively rapid changes in weighting
Identifying starting weights
Identifying number of tiers (or if there are tiers at all)
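A minimal sketch of the rounding obstacle, using invented scores and a small grid of candidate weights: many distinct weight vectors reproduce the same rounded metascore, so the published numbers underdetermine the weights.

```python
# Why rounding frustrates weight derivation: many distinct weight
# vectors reproduce the same rounded metascore. All numbers invented.
from itertools import product

scores = [92, 85, 70]   # hypothetical publication scores for one game
observed = 84           # the rounded metascore a site might publish

def metascore(weights):
    """Weighted mean of the scores, rounded as a metascore would be."""
    return round(sum(s * w for s, w in zip(scores, weights)) / sum(weights))

grid = [0.5, 1.0, 1.5, 2.0, 3.0]  # small candidate grid of weights
matches = [w for w in product(grid, repeat=len(scores))
           if metascore(w) == observed]

print(len(matches), "weight vectors yield the same rounded metascore")
print(matches[:5])
# Any scalar multiple of a match also matches, e.g. (1.5, 1.0, 1.0)
# and (3.0, 2.0, 2.0), so weights are recoverable only up to a ratio.
```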

Page 16

Modeling derived weights

First pass: GDC 2013
Fundamental approach seemed to work
Problems with reliability of method
Some assumptions probably wrong
Key outcomes:
Established basic method as possible
Got us yelled at by Metacritic
In response, Metacritic published a ton of new information about how weighting works
Subsequently, found more issues with model stability and uniqueness
Lots of great new info to work with!

Page 17

Fanmail!

What they said:
They use weighting tiers, but…
We had too many tiers
Our range of weights was too large, and…
We had at least some of the modeled weights wrong

Page 18

Modeling derived weights: Recent work

Second pass (GDC 2015):
Computationally more robust
Incorporated new info from Metacritic
Removed most sources of human error
Focused on one narrow time frame
Focused on one single platform
Were able to identify a stable model with 0% error (that’s possible because the original inputs are man-made)
Note: that doesn’t necessarily mean it’s the actual values!

Full results to be presented at GDC San Francisco in March.

Page 19

Another core question:

Does the weighting matter at all?
Did a comparison using 2012 data of game sales (VGChartz) to:
Metacritic metascore
Gamerankings score
Unweighted average of critic scores curated by Metacritic
Found significant correlation between sales and scores for each, BUT…
Metacritic no better than the others (see the sketch below)
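A minimal sketch of the comparison itself: correlate sales with each of the three score variants and see whether Metacritic’s weighting buys any extra predictive power. The five rows below are invented stand-ins for the 2012 VGChartz dataset.

```python
# Correlate sales with each score variant. The rows are invented
# stand-ins for the 2012 VGChartz data used in the actual study.
from statistics import correlation  # Pearson's r (Python 3.10+)

# per-game: (sales in millions, Metacritic, Gamerankings, unweighted)
games = [
    (5.2, 92, 91.0, 90.5),
    (2.1, 84, 83.5, 82.9),
    (0.7, 71, 72.0, 70.8),
    (3.4, 88, 87.2, 86.0),
    (0.3, 65, 66.1, 64.9),
]

sales = [g[0] for g in games]
for name, col in (("Metacritic", 1), ("Gamerankings", 2), ("Unweighted", 3)):
    scores = [g[col] for g in games]
    print(f"{name:12s} r = {correlation(sales, scores):.3f}")
```

If the three r values come out essentially identical, as the study found, the weighting adds nothing to predictive value.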

Page 20

Relative predictive value of metareview scores relative to sales

[Bar chart “Correlation of Sales to Scores”: Pearson’s r (y-axis, 0-0.35) for Metacritic, Gamerankings, and unweighted scores against both first-week and 2012 total sales.]

Page 21

What does that mean?

It means that, from the standpoint of sales prediction, the weights don’t matter.
Shouldn’t be hugely surprising – differences always very small
Might suggest that reviewers tend to have similar opinions – or that they’re benchmarking scores to each other
Suggests that Metacritic could drop this controversial aspect of their product without weakening its predictive value

Page 22

What we’re thinking about next…

We’ve found some cool things, but there’s still a lot to be done on this
Do game reviewers adjust their scores based on already published reviews?
Do score averages tend to go up or down over time?
How much does marketing affect score/sales?
Ultimately, more research is needed
Metareview isn’t going away – it’s too useful
Implications for virtually all media
Need to figure out how to do this better!

Page 23

Questions?

Thanks for your time!

