
Analyzing Responses to Likert Items

Date post: 26-May-2015
Upload: sanjay-kairam
Description:
This is a brief (exploratory) discussion of how to run statistical analysis of responses to Likert items. The data used is from a study I ran on users of the PARC Wikipedia dashboard "WikiDashboard" looking at how the tool changed perceptions of credibility.
Transcript
Page 1: Analyzing Responses to Likert Items

Analyzing Responses to Likert Items: An Exploration of Data from a Credibility Study Involving WikiDashboard (http://wikidashboard.parc.com)

by Sanjay Kairam

Page 2: Analyzing Responses to Likert Items

WikiDashboard Study

• The System
• The Study
• The Data

Page 3: Analyzing Responses to Likert Items

WikiDashboard

• “Social Dynamic Analysis Tool” for Wikipedia

• Michael Scott (The Office): “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information”

• What happens when we see who is doing the editing?

Page 4: Analyzing Responses to Likert Items

WikiDashboard (Close-Up)

Page 5: Analyzing Responses to Likert Items

WikiDashboard Study

• Study conducted on Amazon Mechanical Turk
▫ N = 288 subjects
▫ Subjects paid $0.08 / HIT

• “Please read and evaluate this Wikipedia Article.”

Page 6: Analyzing Responses to Likert Items

Experiment Conditions

• Participants each placed in 1 of 3 conditions (each N = 96):
▫ Wiki Only (WO)
▫ Wiki + History (WH)
▫ WikiDashboard (WD)

Page 7: Analyzing Responses to Likert Items

Articles Used

• Each subject read 1 (of 8 possible) Wikipedia articles.

• Article “Quality”:
▫ “Low-Quality” articles were those flagged as “B-Class” or “C-Class” by the Wikipedia community.
▫ “High-Quality” articles were those which had at one time been “Featured Articles”.

• Article “Controversiality”:
▫ “Controversial” articles were those on the extensive “List of Controversial Articles”.

Articles:

                   Low-Quality       High-Quality
Non-Controversial  Light Pollution,  The Sun,
                   Hypnagogia        The Geology of the Grand Canyon
Controversial      Hip Hop Music,    Global Warming,
                   Eco-Terrorism     Barack Obama

Page 8: Analyzing Responses to Likert Items

Survey

• Self-Reported Expertise
▫ “How familiar are you with the topic discussed on this Wikipedia page?”

• Manipulation/Quality Checks
▫ “In 5-20 words, please describe what this Wikipedia page is about.”
▫ “Please describe one fact from the article that you found interesting.” (WO)
▫ “Please name at least one user (by username or IP address) who has made multiple edits to this page.” (WH, WD)

Page 9: Analyzing Responses to Likert Items

Credibility Assessment

• Assessing agreement with these statements:

▫“I believe that the information on this page is accurate.” (Accuracy)

▫“I believe that the information on this page is objective.” (Objectivity)

▫“I believe that the information on this page is current and up-to-date.” (Currency)

▫“I believe that this page fully covers the relevant information on the topic.” (Coverage)

▫“I trust the information on this page.” (Trust)

Page 10: Analyzing Responses to Likert Items

Likert Item Responses

• Participants answered using a 5-point scale:
▫ -2: “Strongly Disagree”
▫ -1: “Somewhat Disagree”
▫ 0: “Neither Agree nor Disagree”
▫ +1: “Somewhat Agree”
▫ +2: “Strongly Agree”

•Now, what do we do with this data?

Page 11: Analyzing Responses to Likert Items

Analyzing Likert Item Responses

• Very often, we see papers reporting Likert responses using means:
▫ What is the average of 1 “Somewhat Agree” and 3 “Somewhat Disagree”s? Hint: It’s not “Somewhat Disagree and a Half”.
▫ In this case, what does a “mean” mean?

• In most cases, an ANOVA is not appropriate either, though people still try!
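The slide’s rhetorical question can be made concrete with a few lines of plain Python (an illustration, not part of the original deck): averaging ordinal codes produces a value that sits between labels, while the median at least stays on the labeled scale.

```python
# One "Somewhat Agree" (+1) and three "Somewhat Disagree"s (-1).
responses = [+1, -1, -1, -1]

mean = sum(responses) / len(responses)
print(mean)  # -0.5 -- halfway between two labels, which has no defined meaning

# The median stays on the original scale of labels.
median = sorted(responses)[len(responses) // 2]
print(median)  # -1, i.e. "Somewhat Disagree"
```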

Page 12: Analyzing Responses to Likert Items

Options for Analysis

1. Non-Parametric Tests for Ordinal Data
2. Conversion to an Interval Scale
3. Aggregating Items

Page 13: Analyzing Responses to Likert Items

Mann-Whitney U Test

• Also called the “Mann-Whitney-Wilcoxon”, “Wilcoxon Rank-Sum”, or “Wilcoxon-Mann-Whitney” test.

•Non-parametric test for assessing whether two independent samples of observations have equally large values.

http://en.wikipedia.org/wiki/Mann-Whitney_U
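As a minimal sketch of how such a test runs in practice, here is `scipy.stats.mannwhitneyu` applied to two groups of simulated Likert responses; the group labels mirror the study’s quality factor, but the numbers are made up, not the study’s data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical Likert responses (-2..+2) for two article-quality groups.
rng = np.random.default_rng(0)
low_quality = rng.choice([-2, -1, 0, 1, 2], size=50,
                         p=[0.25, 0.30, 0.20, 0.15, 0.10])
high_quality = rng.choice([-2, -1, 0, 1, 2], size=50,
                          p=[0.05, 0.10, 0.20, 0.35, 0.30])

# Two-sided test: do responses in one group tend to be larger?
stat, p = mannwhitneyu(low_quality, high_quality, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```

Ties (many identical responses on a 5-point scale) are handled by scipy’s tie correction, which matters for Likert data.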

Page 14: Analyzing Responses to Likert Items

Mann-Whitney U Test

• Assumptions:
▫ All observations from both groups are independent of each other.
▫ The responses are ordinal or continuous measurements.
▫ The null hypothesis assumes symmetry between the two populations considered.
▫ Under the alternative hypothesis, the probability of an observation from population X exceeding an observation from population Y is not equal to 0.5.

http://en.wikipedia.org/wiki/Mann-Whitney_U

Page 15: Analyzing Responses to Likert Items

Kruskal-Wallis ANOVA

•What if we want to test more than 2 groups? (as we do, given our 3 experimental conditions)

•Kruskal-Wallis ANOVA is an extension of Mann-Whitney U to 3 or more groups.

• Also non-parametric, though it does assume that the groups’ distributions have a similar underlying shape.

http://en.wikipedia.org/wiki/Kruskal-Wallis_one-way_analysis_of_variance
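A corresponding sketch for three groups uses `scipy.stats.kruskal`; the group names match the study’s conditions (WO, WH, WD), but the responses are simulated for illustration.

```python
import numpy as np
from scipy.stats import kruskal

# Hypothetical Likert responses (-2..+2) for the three conditions.
rng = np.random.default_rng(1)
wo = rng.choice([-2, -1, 0, 1, 2], size=40)
wh = rng.choice([-2, -1, 0, 1, 2], size=40)
wd = rng.choice([-2, -1, 0, 1, 2], size=40)

# H statistic and p-value for the omnibus test across all three groups.
h, p = kruskal(wo, wh, wd)
print(f"H = {h:.2f}, p = {p:.3f}")
```

A significant omnibus result would then typically be followed by pairwise Mann-Whitney tests with a multiple-comparison correction.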

Page 16: Analyzing Responses to Likert Items

Analysis Using Non-Parametric Tests

• Do participants actually notice differences in article quality?
▫ Mann-Whitney: Significant effects of article quality on ratings of Accuracy (p < 0.001), Coverage (p < 0.01), Currency (p < 0.001), and Trust (p < 0.001), with a marginally significant effect on Objectivity (p = 0.096).

▫ Kruskal-Wallis: Significant effect on ratings of Accuracy (p < 0.001), Coverage (p = 0.012), Currency (p < 0.001), and Trust (p < 0.001), with no significant effect on Objectivity.

Page 17: Analyzing Responses to Likert Items

Sample Boxplots: Ratings by Article Quality (panels: Accuracy, Coverage)

Page 18: Analyzing Responses to Likert Items

Analysis Using Non-Parametric Tests

• Do participants notice differences in how “controversial” an article is?
▫ Mann-Whitney: Significant effect on ratings of Coverage (p = 0.039), Currency (p = 0.039), Objectivity (p = 0.021), and Trust (p = 0.021), with no effect on ratings of Accuracy.

▫ Kruskal-Wallis: Significant effect on ratings of Objectivity (p = 0.042), and marginally significant effects for Coverage (p = 0.077) and Currency (p = 0.083), but no significant effect on Accuracy or Trust.

Page 19: Analyzing Responses to Likert Items

Analysis Using Non-Parametric Tests

•What we really want to know, however, is whether using WikiDashboard or Wiki + History makes participants more sensitive to article quality or controversiality than participants using Wikipedia on its own.

• Both tests only allow us to compare populations separated on the basis of a single variable, so we can’t explore these interaction effects.

Page 20: Analyzing Responses to Likert Items

Conversion to Interval Scale

• If there were a way to map our Likert item responses onto an interval scale, we could use more familiar/powerful statistical tests.

• If we found that the mapped data were normally distributed, for instance, we could use our usual parametric tests, such as MANOVA, which would help us find these interaction effects.

Page 21: Analyzing Responses to Likert Items

Conversion to Interval Scale

• E.J. Snell (1964) describes a procedure for mapping ordered data, like Likert responses, to an assumed underlying continuous scale of measurement.

• At the end, he emphasizes that “the usefulness of the proposed method depends upon the assumption that the underlying scale of measurement can be transformed to produce a normal distribution.”

Snell, E.J. A Scaling Procedure for Ordered Categorical Data. Biometrics 20(3), pp. 592-607 (1964). http://www.jstor.org/stable/2528498

Page 22: Analyzing Responses to Likert Items

Utilizing the Snell Conversion

• The conversion procedure was used to transform the data, mapping each response (ranging from -2 to +2) to a new value ranging from roughly -1.00 to +4.05.

• Essentially, it looks as if only the distances between the values have changed.

Page 23: Analyzing Responses to Likert Items

Histogram: Original Data

Page 24: Analyzing Responses to Likert Items

Histogram: Snell-Converted Data
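Beyond eyeballing the histogram, the normality assumption Snell flags can be checked numerically, e.g. with a Shapiro-Wilk test. This sketch runs it on simulated scores in roughly the converted range mentioned above, not on the study’s actual converted data.

```python
import numpy as np
from scipy.stats import shapiro

# Simulated "converted" scores spanning roughly -1.00 to +4.05.
rng = np.random.default_rng(4)
converted = np.clip(rng.normal(1.5, 1.0, size=288), -1.00, 4.05)

# Small p suggests departure from normality, arguing against parametric tests.
stat, p = shapiro(converted)
print(f"W = {stat:.3f}, p = {p:.3f}")
```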

Page 25: Analyzing Responses to Likert Items

Aggregating Likert Items

•If we consider the various Likert items to be different measurements of a certain underlying trait (Credibility), then can we sum them and run parametric statistical tests?

• Haven’t tried this yet. Is this a valid approach?
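A standard way to probe that question is to check the items’ internal consistency before summing, commonly with Cronbach’s alpha. This is not part of the original analysis; the sketch below implements the textbook alpha formula on simulated responses to five correlated items.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of Likert codes."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated responses (-2..+2) to 5 items driven by one latent trait
# ("Credibility"), so the items correlate by construction.
rng = np.random.default_rng(3)
latent = rng.normal(0, 1, size=200)
items = np.clip(np.round(latent[:, None] + rng.normal(0, 0.7, size=(200, 5))),
                -2, 2)

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")

# If alpha is high, summing into one composite score is more defensible.
composite = items.sum(axis=1)
```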

Page 26: Analyzing Responses to Likert Items

Analyzing Responses to Likert Items
by Sanjay Kairam

Email: [email protected]
Twitter: @skairam

