IW Brown Bag Seminar, 31/05/2013

Do Not Trust any Statistics you Did not Make Yourself

or

How to Avoid Mistakes with Statistics

Prof. Dr. Thomas Schuster

International University of Applied Sciences Bad Honnef ∙ Bonn

Research Fellow Cologne Institute of Economic Research


Outline of the Seminar

• Famous Quotes in Statistics

• Measurement Scales of Variables

• Appropriate Use of Descriptive Statistics

• Appropriate Use of Diagrams

• How to Design Questionnaires

• Common Mistakes in Designing Questions

• A Final Famous Quote in Statistics


Famous Quotes in Statistics

• “Do not trust any statistics you did not fake yourself” (often attributed to Winston Churchill, probably wrongly)

• How to Lie with Statistics (book by Darrell Huff, 1954)

• “There are three kinds of lies: lies, damned lies, and statistics” (attributed, among others, to Charles Wentworth Dilke)


Scales of Measurement

The scale of measurement indicates which data summaries and statistical analyses are most appropriate.

The scale determines how much information the data contain.

Scales of measurement include:

• Nominal

• Ordinal

• Interval

• Ratio


Scales of Measurement

Nominal Scale

A non-numeric label or numeric code may be used.

Data are labels or names used to identify an attribute of the element.


Example: Students of a university are classified by the school in which they are enrolled, using a non-numeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
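A minimal sketch of what the nominal scale permits, assuming a small made-up list of enrolments (the data are hypothetical). The numeric codes from the example are only labels: counting frequencies and finding the mode are meaningful, while averaging the codes is not.

```python
from collections import Counter

# Hypothetical enrolment data: one school label per student (nominal scale).
schools = ["Business", "Humanities", "Business", "Education", "Business", "Humanities"]

# Numeric codes as in the example: 1 = Business, 2 = Humanities, 3 = Education.
codes = {"Business": 1, "Humanities": 2, "Education": 3}
coded = [codes[s] for s in schools]

# Meaningful for nominal data: frequencies and the mode (most frequent category).
frequencies = Counter(schools)
mode, mode_count = frequencies.most_common(1)[0]
print(frequencies)
print(mode, mode_count)

# Not meaningful: the mean of the codes (about 1.67) describes no school at all
# and would change if the codes were assigned in a different order.
print(sum(coded) / len(coded))
```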


Scales of Measurement

Ordinal Scale

A non-numeric label or numeric code may be used.

The data have the properties of nominal data and the order or rank of the data is meaningful.


Example: German school marks range from “very good” to “inadequate”. Alternatively, a numeric code could be used for the mark (e.g. 1 denotes very good, 2 denotes good, and so on).
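A short sketch of the statistics that suit ordinal data, assuming the usual six German marks coded 1 to 6 and a hypothetical set of marks. Because only the order is meaningful, the median and the mode are appropriate; `median_low` is used so that the result is always an actual mark rather than a value between two marks.

```python
import statistics

# Hypothetical marks, coded 1 (very good) to 6 (inadequate).
marks = [1, 2, 2, 3, 3, 3, 4, 5]

# Order is meaningful, so the median and the mode are appropriate.
median_mark = statistics.median_low(marks)  # always one of the observed marks
mode_mark = statistics.mode(marks)

# The distance between marks 1 and 2 need not equal the distance between 4 and 5,
# so an arithmetic mean of the codes would assume more than the ordinal scale offers.
print(median_mark, mode_mark)
```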


Scales of Measurement

Interval Scale

Interval data are always numeric.

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.


Example 1: Melissa has a TOEFL score of 105, while Kevin has a TOEFL score of 90. Melissa scored 15 points more than Kevin.

Example 2: Answers to a Likert-scale question such as “Statistics is difficult to understand.” (Strongly agree / Agree / Undecided / Disagree / Strongly disagree). Strictly speaking, a single Likert item is ordinal; it is treated as interval-scaled only under the assumption that the distances between adjacent answer categories are equal.
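Why differences, but not ratios, are meaningful on an interval scale can be sketched with temperatures, an added illustration not taken from the slides. Converting Celsius to Fahrenheit changes both the unit and the zero point (F = 1.8 C + 32), so differences convert consistently while ratios do not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    # Interval-scale conversion: change of unit (x 1.8) plus a shift of the zero point (+ 32).
    return 1.8 * c + 32

low_c, high_c = 10.0, 20.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

# Differences are meaningful: a 10 degree C gap is an 18 degree F gap.
print(high_c - low_c, high_f - low_f)  # 10.0 18.0

# Ratios are not: "twice as warm" in Celsius is not twice in Fahrenheit.
print(high_c / low_c, high_f / low_f)  # 2.0 and about 1.36
```

The same reasoning is why the TOEFL example supports "15 points more", but not a statement such as "Melissa is 1.17 times as good as Kevin".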


Scales of Measurement

Ratio Scale

The data have all the properties of interval data and the ratio of two values is meaningful.

Variables such as distance, height, weight, and time use the ratio scale.

This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.


Example: Melissa’s college record shows 36 credit points earned, while Kevin’s record shows 72 credit points earned. Kevin has twice as many credit points earned as Melissa.
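A minimal sketch of the ratio property using the credit points from the example. A change of unit multiplies every value by the same positive constant and keeps the natural zero, so the ratio stays the same; `HOURS_PER_POINT` is a hypothetical conversion factor chosen only for illustration.

```python
# Credit points from the example (ratio scale: 0 points means no credits earned).
melissa, kevin = 36, 72
print(kevin / melissa)  # 2.0

# Hypothetical change of unit: express the workload in hours instead of points.
HOURS_PER_POINT = 30
kevin_hours = kevin * HOURS_PER_POINT
melissa_hours = melissa * HOURS_PER_POINT

# Multiplying by a positive constant keeps the zero point, so the ratio is unchanged.
print(kevin_hours / melissa_hours)  # 2.0
```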


Quantitative Data

Quantitative data indicate how many or how much:

– discrete, if measuring how many;

– continuous, if measuring how much.

Quantitative data are always numeric and use either the interval or the ratio scale of measurement.

Arithmetic operations are meaningful for quantitative data: differences on both scales, ratios only on the ratio scale.


Scales of Measurement

• Qualitative data

– Non-numerical: nominal or ordinal

– Numerical (numeric codes): nominal or ordinal

• Quantitative data

– Numerical: interval or ratio


Appropriate Use of Descriptive Statistics and Diagrams

• The scale of measurement determines

– which types of graphical presentation are appropriate

– which descriptive statistics can be used

– which bi- and multivariate statistical methods can be applied
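The dependence of descriptive statistics on the scale can be sketched as a lookup in which each scale inherits everything permitted on the scales below it. The grouping follows a common textbook summary and is not an exhaustive list; the names `Scale` and `permissible_statistics` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    # Ordered so that each scale has all the properties of the scales below it.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Statistics that first become meaningful at each level (not an exhaustive list).
NEW_AT_LEVEL = {
    Scale.NOMINAL: ["frequency distribution", "mode"],
    Scale.ORDINAL: ["median", "percentiles", "quartiles"],
    Scale.INTERVAL: ["arithmetic mean", "standard deviation"],
    Scale.RATIO: ["geometric mean", "harmonic mean", "coefficient of variation"],
}


def permissible_statistics(scale: Scale) -> list[str]:
    """Collect the statistics of the given scale and of all lower scales."""
    result: list[str] = []
    for level in Scale:  # iterates in definition order: nominal ... ratio
        if level <= scale:
            result.extend(NEW_AT_LEVEL[level])
    return result


print(permissible_statistics(Scale.ORDINAL))
```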


Summarizing Data from Nominal or Ordinal Variables

• Frequency Distribution

• Relative Frequency Distribution

• Percent Frequency Distribution

• Column Graph

• Bar Graph

• Pie Chart


Frequency Distribution

Quality rating of hotel guests

Rating          Frequency
Poor                    2
Below Average           3
Average                 5
Above Average           9
Excellent               1
Total                  20


Relative and Percent Frequency Distribution

Rating          Relative Frequency   Percent Frequency
Poor                  0.10                  10
Below Average         0.15                  15
Average               0.25                  25
Above Average         0.45                  45
Excellent             0.05                   5
Total                 1.00                 100

The relative frequencies always sum to 1.00, and the percent frequencies always sum to 100%.
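The relative and percent frequencies follow from the counts of the hotel ratings: divide each frequency by the total n, then multiply by 100. A minimal sketch:

```python
# Frequencies of the hotel quality ratings.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequencies.values())  # 20 guests in total

# Relative frequency = frequency / n; percent frequency = relative frequency * 100.
relative = {rating: f / n for rating, f in frequencies.items()}
percent = {rating: 100 * r for rating, r in relative.items()}

for rating in frequencies:
    print(f"{rating:<14} {relative[rating]:.2f} {percent[rating]:5.1f}")

# The relative frequencies always sum to 1 (100%), up to floating-point rounding.
assert abs(sum(relative.values()) - 1.0) < 1e-9
```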


Column Graph: Marada Inn Quality Ratings (rating from Poor to Excellent on the horizontal axis, frequency on the vertical axis)


Bar Graph

Support of euro adoption by country, 2011 (percent of respondents)

Country          Support
Romania           62.1%
Bulgaria          51.6%
Hungary           44.5%
Lithuania         43.8%
Poland            42.7%
Latvia            29.0%
Czech Republic    23.5%

Source: Eurobarometer Flash No. 329 (2011)


Pie Chart: Marada Inn Quality Ratings (Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%)

Examples

Source: Australian Bureau of Statistics (2006)

Source: Microsoft (2007)


Summarizing Data from Interval- or Ratio-scaled Variables

• Frequency Distribution

• Relative Frequency and Percent Frequency Distributions

• Histogram

• Ogive


Frequency Distribution

Data have to be sorted into classes.

Guidelines for establishing classes:

• Use between 5 and 20 classes.

• Smaller data sets usually require fewer classes.

• Classes must not overlap.
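A minimal sketch of sorting whole-number data into non-overlapping, equal-width classes. It assumes the common rule of thumb "approximate class width = (largest value − smallest value) / number of classes", rounded up; the sample of 15 costs is made up and is not the Hudson Auto Repair data.

```python
import math


def class_width(data: list[int], num_classes: int) -> int:
    """Approximate class width (largest - smallest) / number of classes, rounded up."""
    return math.ceil((max(data) - min(data)) / num_classes)


def frequency_distribution(data: list[int], lower: int, width: int,
                           num_classes: int) -> list[tuple[str, int]]:
    """Count integer values in the classes lower..lower+width-1, lower+width.., and so on."""
    counts = [0] * num_classes
    for x in data:
        index = (x - lower) // width  # the one class that x falls into
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} is not covered by any class")
        counts[index] += 1
    labels = [f"{lower + i * width}-{lower + (i + 1) * width - 1}"
              for i in range(num_classes)]
    return list(zip(labels, counts))


# Hypothetical sample of 15 parts costs in dollars.
costs = [52, 61, 65, 68, 71, 74, 75, 77, 79, 83, 88, 91, 97, 104, 108]

width = class_width(costs, num_classes=6)  # ceil(56 / 6) = 10
table = frequency_distribution(costs, lower=50, width=width, num_classes=6)
for label, count in table:
    print(label, count)
```

Because every value maps to exactly one index, the classes cannot overlap, and a value outside all classes raises an error instead of being silently dropped.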


Frequency Distribution

50 bills of Hudson Auto Repair

Parts Cost ($)   Frequency
50-59                 2
60-69                13
70-79                16
80-89                 7
90-99                 7
100-109               5
Total                50


Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                 0.04                   4
60-69                 0.26                  26
70-79                 0.32                  32
80-89                 0.14                  14
90-99                 0.14                  14
100-109               0.10                  10
Total                 1.00                 100

The relative frequencies always sum to 1.00, and the percent frequencies always sum to 100%.


Histogram: Tune-up Parts Cost (parts cost in $ on the horizontal axis, classes 50-59 to 100-109; frequency on the vertical axis)


Ogive

An ogive is a graph of a cumulative distribution.

• The data values are shown on the horizontal axis.

• The vertical axis shows one of the following:

– cumulative frequencies,

– cumulative relative frequencies, or

– cumulative percent frequencies.

• The cumulative value of each class is plotted as a point. Its x-value is (upper limit of the class + lower limit of the next class)/2.

• The plotted points are connected by straight lines.
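The ogive points for the Hudson Auto Repair classes can be computed directly from the frequency table: accumulate the frequencies, convert them to percentages of n = 50, and place each point at the class boundary. For the last class the "next lower limit" is assumed to be the upper limit + 1, which holds for these whole-dollar classes.

```python
from itertools import accumulate

# Class limits (in $) and frequencies of the 50 Hudson Auto Repair bills.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
frequencies = [2, 13, 16, 7, 7, 5]

n = sum(frequencies)                        # 50 bills
cumulative = list(accumulate(frequencies))  # running totals per class

# x-value: (upper limit of class + lower limit of next class) / 2.
# Assumption for the last class: the next lower limit would be upper + 1.
x_values = [
    (upper + (classes[i + 1][0] if i + 1 < len(classes) else upper + 1)) / 2
    for i, (_, upper) in enumerate(classes)
]
cumulative_percent = [100 * c / n for c in cumulative]

for x, pct in zip(x_values, cumulative_percent):
    print(f"({x}, {pct:.0f}%)")
```

The fourth point printed is (89.5, 76%): 2 + 13 + 16 + 7 = 38 of the 50 bills fall below the boundary 89.5.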


Ogive with Cumulative Percent Frequencies: Tune-up Parts Cost (parts cost in $ from 50 to 110 on the horizontal axis, cumulative percent frequency from 0 to 100% on the vertical axis). The marked point (89.5, 76%) shows that 76% of the bills have a parts cost below $89.50.

Examples

Source: Google (2010)

Source: Institute for Fiscal Studies (2006)

Descriptive Statistics for Interval- or Ratio-scaled Variables

• Arithmetic Mean

• Geometric Mean

• Harmonic Mean

• Median

• Mode

• Percentiles

• Quartiles

Means are mean!


Arithmetic Sample Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\sum_{i=1}^{n} x_i$ is the sum of the values of the $n$ observations and $n$ is the number of observations in the sample.


Geometric Sample Mean

Sample formula:

$$\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i} = \left(x_1 \cdot x_2 \cdots x_n\right)^{1/n}$$


Harmonic Sample Mean

$$\bar{x}_h = \frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}}$$

Which mean to choose?

• Geometric mean

– All growth variables (the mean is taken over growth factors such as 1.05)

– E.g. inflation, economic growth, wage increases

• Harmonic mean

– If ratios or rates are averaged

– E.g. speed in kilometres per hour over equal distances, price-earnings ratios

• Arithmetic mean

– All other variables
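The three rules can be sketched with Python's standard `statistics` module. The salary factors come from the geometric-mean example in this seminar; the two-leg trip (60 km at 30 km/h, then 60 km at 90 km/h) is a hypothetical illustration of why speeds over equal distances call for the harmonic mean.

```python
import statistics

# Growth variables: geometric mean of the growth factors (1 + rate).
growth_factors = [1.05, 1.15]  # +5% and +15% salary increases
g = statistics.geometric_mean(growth_factors)

# Rates over equal distances: harmonic mean.
# Hypothetical trip: 60 km at 30 km/h, then another 60 km at 90 km/h.
speeds_kmh = [30, 90]
h = statistics.harmonic_mean(speeds_kmh)
true_average_speed = 120 / (60 / 30 + 60 / 90)  # total distance / total time

# The arithmetic mean of the speeds (60 km/h) overstates the true average speed.
a = statistics.mean(speeds_kmh)

print(round(g, 5), round(h, 2), round(true_average_speed, 2), a)
```

The harmonic mean matches total distance divided by total time (about 45 km/h), while the arithmetic mean does not; the arithmetic mean remains the right choice for variables that are neither growth factors nor ratios.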


Sample Geometric Mean

• An employee received a 5 percent increase in salary last year and a 15 percent increase this year.

• Calculate the average percentage increase in salary:

$$\bar{x}_g = \sqrt{1.05 \times 1.15} = 1.09886$$

• The average percentage increase is 9.886%:

– (1.09886 − 1) × 100 = 0.09886 × 100 = 9.886%

• The average is not 10%! Two increases of 10% would give 1.10 × 1.10 = 1.21, whereas the actual increase is 1.05 × 1.15 = 1.2075.
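The salary calculation generalises to any number of periods: convert each percentage to a growth factor, take the geometric mean, and convert back to a percentage. A minimal sketch, where the function name `average_growth_rate` is hypothetical:

```python
def average_growth_rate(rates_percent: list[float]) -> float:
    """Average percentage growth per period via the geometric mean of growth factors."""
    product = 1.0
    for r in rates_percent:
        product *= 1 + r / 100  # e.g. 5% -> factor 1.05
    geometric_mean = product ** (1 / len(rates_percent))
    return (geometric_mean - 1) * 100  # back to a percentage


avg = average_growth_rate([5, 15])
print(f"{avg:.3f}%")  # 9.886%

# Check: applying the average rate twice reproduces the actual total increase,
# whereas applying 10% twice overstates it.
print((1 + avg / 100) ** 2, 1.05 * 1.15, 1.10 ** 2)
```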

Examples

Source: New York Society of Security Analysts (2011)

How to Design a Questionnaire

• Basic rule

– Make it as simple as possible!

• Background

– Missing values distort results

– Forced opinions distort results

– If the response rate is too low, the results are not representative

Response Rates of Self-Completion Questionnaires

• Classification of response rates

– Over 85%: excellent

– 70–85%: very good

– 60–70%: acceptable

– 50–60%: barely acceptable

– Below 50%: not acceptable

– Source: Mangione (1995)

• The response rate should be at least 60%

• Low response rates lead to a biased sample: the answers are not representative
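A minimal sketch of computing a response rate and applying the Mangione (1995) categories. Two points are assumptions of this sketch, not of the source: the rate is taken as usable responses divided by the eligible sample, and a value exactly on a boundary (70%, 60%, 50%) is assigned to the higher category.

```python
def response_rate(usable_responses: int, eligible_sample: int) -> float:
    """Response rate in percent (simple definition assumed for this sketch)."""
    return 100 * usable_responses / eligible_sample


def classify_response_rate(rate: float) -> str:
    """Mangione (1995) categories; boundary handling is an assumption."""
    if rate > 85:
        return "excellent"
    if rate >= 70:
        return "very good"
    if rate >= 60:
        return "acceptable"
    if rate >= 50:
        return "barely acceptable"
    return "not acceptable"


# Hypothetical survey: 400 eligible respondents, 252 usable questionnaires returned.
rate = response_rate(252, 400)
print(rate, classify_response_rate(rate))  # 63.0 acceptable
```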

Response Rates of Self-Completion Questionnaires

• Response rates of different types of self-completion questionnaires

– Mail: 31.5%

– Postcard, email: 29.7%

– Postcard, email, postcard: 28.6%

– Email, postcard: 25.4%

– Email only: 20.7%

– Source: Kaplowitz et al. (2004)

Response Rates of Self-Completion Questionnaires

• Strategies to improve response rates

– Covering letter with personalised name or address

– Attractive layout

– Clear instructions

– Stamped addressed envelope

– Reminder after some weeks (by telephone, post, email, …)

– Incentives (e.g. gifts, a lottery prize)

Designing the Self-Completion Questionnaire

• Uncluttered layout

– Neither too short and cramped nor too long and bulky

• Clear presentation

– Variety of font sizes, bold print, italics, and CAPITAL letters

– But be consistent!

Designing the Self-Completion Questionnaire

• Clear instructions to respondents

– How to indicate the choice of answer

– How many answers to give

– Each question needs its own instruction, set in a distinct font style

– Examples

• Please tick one box on each line

• Please tick one box only

• Please tick all that apply

• Please write in


Designing the Self-Completion Questionnaire

• Keep questions and answers together

– Don’t spread a question over two pages

– Put answers alongside each corresponding question

Designing the Self-Completion Questionnaire

• Use vertical rather than horizontal alignment of fixed-choice answers

– Less confusing to read

– Distinguishes questions from answers

– Respondent is less likely to make a mistake

– Question is easier to pre-code

• Exception

– Long list of questions with identical answer formats

– E.g. a Likert scale with several questions

Horizontal or Vertical Alignment

Horizontal alignment:

What do you think of the CEO's performance in his job since he took over the running of this company?

(Please tick the appropriate response)

Very good   Good   Fair   Poor   Very poor
    5         4      3      2        1

Vertical alignment:

What do you think of the CEO's performance in his job since he took over the running of this company?

(Please tick the appropriate response)

Very good ___5
Good ___4
Fair ___3
Poor ___2
Very poor ___1


Example Likert Scale

In the next set of questions, you are presented with a statement. You are being asked to indicate your level of agreement or disagreement with each statement by indicating whether you: Strongly Agree (SA), Agree (A), are Undecided (U), Disagree (D), or Strongly Disagree (SD).

Please indicate your level of agreement by circling the appropriate response.

23. My job is like a hobby to me.

SA A U D SD

24. My job is usually interesting enough to keep me from getting bored.

SA A U D SD

25. It seems that my friends are more interested in their jobs.

SA A U D SD

• In this case, horizontal alignment should be used

Example Long List of Questions

• In this case, horizontal alignment should be used

Source: ISSP (2007)

Designing Questions: General Rules

• Always bear in mind your research questions

• What do you want to know?

• Imagine yourself as the respondent

– How would you answer the questions?

– Identify any vague or misleading questions

– Think about questionnaire length, style and attractiveness

Designing Questions: Specific Rules

• Avoid ambiguous terms

– E.g. ‘often’, ‘regularly’, ‘frequently’, ‘have’

• Avoid long questions

• Avoid double-barrelled questions

– People may have different answers to each part

– No necessary correspondence between parts

– Example: “How much time do you spend on going to concerts and the cinema?”

Designing Questions: Specific Rules

• Avoid very general questions

– Difficult to answer because they lack a frame of reference

– Example: “How happy are you in general?”

• Avoid leading questions

– Questions should not suggest that a particular response is desired

– Example: “Do you think that tuition fees make students less keen to go to university?”

– There might be a problem with social desirability

Designing Questions: Specific Rules

• Do not ask two questions in one

– Example: “Which political party did you vote for at the last election?”

– First establish, with a filter question, whether the respondent voted at all

– Do not ask for opinions about several things at once

• Avoid negative terms (‘not’, ‘never’)

– Especially double negatives, which are confusing

– Example: “It is not a good idea to not turn in homework on time”
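The filter question for the voting example can be sketched as simple skip logic: each question may carry a condition on earlier answers, and it is only asked when that condition holds. The `Question` class, the `route` function and the answer "Party A" are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    key: str
    text: str
    # Ask this question only if the condition on earlier answers holds (None = always ask).
    condition: Optional[Callable[[dict[str, str]], bool]] = None


QUESTIONNAIRE = [
    Question("voted", "Did you vote at the last election? (yes/no)"),
    Question("party", "Which political party did you vote for at the last election?",
             condition=lambda answers: answers.get("voted") == "yes"),
]


def route(respondent_answers: dict[str, str]) -> list[str]:
    """Return the keys of the questions this respondent is actually asked, in order."""
    asked: dict[str, str] = {}
    for q in QUESTIONNAIRE:
        if q.condition is None or q.condition(asked):
            asked[q.key] = respondent_answers.get(q.key, "")
    return list(asked)


print(route({"voted": "no"}))                       # ['voted']
print(route({"voted": "yes", "party": "Party A"}))  # ['voted', 'party']
```

Non-voters never see the party question, so they are not forced into an answer that does not apply to them.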

Designing Questions: Specific Rules

• Avoid technical terms, jargon and acronyms

• Ensure that respondents have the requisite knowledge

– Are the questions meaningful to them?

• Closed questions and their answers should be symmetrical

– Example: “Do you agree or disagree that …”

– Agree ___

– Disagree ___

Designing Questions: Specific Rules

• There should be a balance between positive and negative response options (avoid bias)

• Do not rely on the respondent’s memory

– Show cards if there are many possible answers

• Include a “don’t know” option if sensible

• Include a “refuse to answer” option if appropriate

Common Mistakes in Designing Questions

• Excessive use of open questions

• Excessive use of yes/no questions

• No instructions about how to indicate answers

– Examples: tick a box, circle, delete

• Answers listed in closed questions that are not mutually exclusive

– More than one answer may be applicable

• Answers do not correspond to the question

Examples

Source: SOEP (2012)

My Recommendations

• Do Not Trust any Statistics you Did not Make Yourself

• Sound statistical knowledge is needed to avoid mistakes with statistics

Last Famous Quote in Statistics

“Conducting data analysis is like drinking a fine wine. It is important to swirl and sniff the wine, to unpack the complex bouquet and to appreciate the experience. Gulping the wine doesn’t work.”

Daniel B. Wright (2003)

