
Exploring Data: Frequencies, Central Tendency, Dispersion and Standard Deviation

SIT094 The Collection and Analysis of Quantitative Data

Week 3

Luke Sloan

About Me

• Name: Dr Luke Sloan

• Office: 0.56 Glamorgan

• Email: [email protected]

• To see me: please email first


Introduction

• Collecting Quantitative Data

• Levels of Measurement

• Frequencies & Fidelity

• Central Tendency

• Dispersion

• Summary

Collecting Quantitative Data I

“Research involving the collection of data in numerical form… the defining factor is that numbers result from the process, whether the initial data collection produced numerical values, or whether non-numerical values were subsequently converted to numbers as part of the analysis process…”

Source: Jupp 2006:250

Collecting Quantitative Data II

• Operationalising of social concepts

• Quantifying ‘fuzzy’ data into VARIABLES

• How to measure feelings, attitudes, behaviours, beliefs and attributes?

• Numbers allow statistical tests

• Statistical tests allow generalisations to be made

• Characterisation from samples to populations

Collecting Quantitative Data III

• Capture data using instruments

• Surveys (paper, online, telephone, in person)

• Secondary data analysis

• Experiments – difficult outside of the natural sciences

• But social scientists try to emulate the natural science model (remember Popper’s Falsification Principle?)

• But not all data is equal (some are more equal than others!)

Levels of Measurement I

Data Level: Nominal (categorical)
Description: Response categories cannot be placed in a specific order – impossible to judge ‘distance’ between categories
Examples: Sex (Male/Female); Ethnicity (White/Black…); Party (Lab/Con/LD…)

Data Level: Ordinal (categorical)
Description: Response categories can be placed in rank order – distance between categories cannot be measured mathematically
Examples: Likert (Agree/Neutral/Disagree); Rank Preference (Coke/Pepsi…); Education (GCSE/A-Level…)

Data Level: Interval (or continuous)*
Description: Responses measured on a continuous scale with rank order – uniform distance between responses allows mathematical measurement
Examples: Age (in years); Income (in £)

*NOTE: Interval = no true zero point (e.g. temperature in °C), Ratio = true zero point (e.g. height, income)

Source: David & Sutton (2004)

Levels of Measurement II

• Level of measurement for certain variables is not pre-defined:

– AGE (in years e.g. 22, 34, 54)

– AGE (pre-set bands e.g. 18-30, 31-50)

– AGE (group membership e.g. mature student)

• There is a hierarchy of data – always try to collect the highest level possible to maximise usefulness!

– Are you bored? (Yes/No)

– On a scale of 1-10, how bored are you [where 1=‘practically in tears of boredom’ and 10=‘riveted’]
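The hierarchy can be illustrated with a minimal sketch: age collected in years (interval) can always be collapsed into bands (ordinal) afterwards, but banded data cannot be turned back into years. The ages and the first two bands come from the slide; the `51+` band and the helper name `band_for` are assumptions added for illustration.

```python
# Bands 18-30 and 31-50 are from the slide; "51+" is a hypothetical extra
# band so that every example age falls into some band.
AGE_BANDS = [(18, 30, "18-30"), (31, 50, "31-50"), (51, None, "51+")]


def band_for(age):
    """Return the label of the band containing `age`, or None if no band fits."""
    for low, high, label in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return label
    return None


ages_in_years = [22, 34, 54]                       # interval-level data
age_bands = [band_for(a) for a in ages_in_years]   # ordinal-level data

print(age_bands)
```

Going the other way is not possible: knowing only that someone is in the 18-30 band says nothing about whether they are 22 or 29.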

Frequencies & Fidelity I

• Not as interesting as it sounds – sorry!

• Frequency tables display the number of times that a value appears in your dataset (per variable across all cases)

• Running them should always be the first thing you do once your data is in electronic form

• Highlights data errors

• Indicative of potential analysis
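As a rough sketch of what a frequency table does, the snippet below counts how often each value appears for a single variable. The responses are hypothetical, and the stray `-9` is included only to show how an unexpected code stands out once the values are counted.

```python
from collections import Counter

# Hypothetical raw responses for one variable; -9 stands in for a stray
# code that should not be there.
responses = ["Labour", "Conservative", "Labour", "Green", -9, "Labour", "Green"]

frequencies = Counter(responses)

# Print values from most to least frequent.
for value, count in frequencies.most_common():
    print(f"{value!s:<15}{count:>5}")
```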

Frequencies & Fidelity II

Parties coded

                         Frequency  Percent  Valid Percent  Cumulative Percent
Valid    -9                      1       .0             .0                  .0
         Conservative         1331     29.9           29.9                30.0
         Labour               1103     24.8           24.8                54.8
         Lib Dem              1044     23.5           23.5                78.2
         Green                 368      8.3            8.3                86.5
         UKIP                  171      3.8            3.8                90.4
         BNP                    78      1.8            1.8                92.1
         Independent           216      4.9            4.9                97.0
         Others                135      3.0            3.0               100.0
         Total                4447    100.0          100.0
Missing  System                  1       .0
Total                         4448    100.0

What can we say about this table?

A simple frequency table can tell you quite a bit!

Error?

What we would expect?

Look at %s

More than UKIP

Really? Only 1?

What’s this?
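The three percentage columns in the "Parties coded" table can be derived from the frequencies alone. The sketch below assumes the usual definitions (Percent is out of all cases, Valid Percent is out of non-missing cases, Cumulative Percent is a running total of valid cases); the variable names are illustrative.

```python
# Counts copied from the "Parties coded" table; 1 case is system-missing.
valid_counts = {
    "-9": 1, "Conservative": 1331, "Labour": 1103, "Lib Dem": 1044,
    "Green": 368, "UKIP": 171, "BNP": 78, "Independent": 216, "Others": 135,
}
missing = 1

valid_total = sum(valid_counts.values())   # valid cases only
grand_total = valid_total + missing        # all cases, including missing

rows = []
running = 0
for label, count in valid_counts.items():
    running += count
    rows.append((
        label,
        count,
        round(100 * count / grand_total, 1),    # Percent
        round(100 * count / valid_total, 1),    # Valid Percent
        round(100 * running / valid_total, 1),  # Cumulative Percent
    ))

for row in rows:
    print("{:<13}{:>6}{:>7}{:>7}{:>7}".format(*row))
```

Because only one case is missing here, Percent and Valid Percent are almost identical; they diverge as the share of missing cases grows.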

Central Tendency I

You have all done quantitative research and you all use measures of central tendency in your normal lives – the average, middle and most common values

Average (MEAN): Maintenance grant allowance per week – divide total grant by number of weeks at uni

Middle (MEDIAN): How long do you cook a chicken? – cookbook says 2 hours but internet says 3

Most Common (MODE): What to watch on TV with housemates – decide based on the most popular choice

Central Tendency II

MODE: the value that occurs the most frequently in the data

Date     High Temperature
2-Jan    59
3-Jan    60
4-Jan    43
5-Jan    42
6-Jan    35
7-Jan    32  <=== Mode
8-Jan    32  <=== Mode
9-Jan    46
10-Jan   41
11-Jan   52

MODE = 32
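A minimal check of the mode using Python's standard library, with the ten high temperatures from the slide. `multimode` returns a list, so it also copes with data that have more than one most-frequent value.

```python
from statistics import multimode

# Daily high temperatures from the slide (2-Jan to 11-Jan).
highs = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

modes = multimode(highs)   # list of the most frequent value(s)
print(modes)
```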

Central Tendency III

Main reason for going to gym

                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Relaxation               9     10.0           10.0                10.0
       Fitness                 31     34.4           34.4                44.4
       Lose weight             33     36.7           36.7                81.1
       Build strength          17     18.9           18.9               100.0
       Total                   90    100.0          100.0

What is the most frequent (MODAL) response?

The mode is useful for thinking about NOMINAL data

Central Tendency IV

[Bar chart: Main reason for going to gym. Categories: Relaxation, Fitness, Lose weight, Build strength. Vertical axis: Count, 0 to 40]

NOMINAL data can be displayed using a bar chart

Central Tendency V

MEDIAN: the middle value of the ordered sample data

Date     High Temperature
7-Jan    32
8-Jan    32
6-Jan    35
10-Jan   41
5-Jan    42  <=== Middle values
4-Jan    43  <=== Middle values
9-Jan    46
11-Jan   52
2-Jan    59
3-Jan    60

When the sample size is odd, the median is the middle value

When the sample size is even, the median is the midpoint (mean) of the two middle values

MEDIAN = 42.5
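The odd/even rule can be sketched in a few lines. The function below is an illustrative implementation written for this example (the standard library's `statistics.median` does the same job); the temperatures are those from the slide.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


highs = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

print(median(highs))        # even sample size: midpoint of 42 and 43
print(median(highs[:-1]))   # odd sample size (11-Jan dropped): single middle value
```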

Central Tendency VI

There is a general lack of public knowledge about local government

                             Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Strongly Agree           1911     41.1           41.8                41.8
         Agree                    2281     49.1           49.9                91.6
         Neutral                   255      5.5            5.6                97.2
         Disagree                  111      2.4            2.4                99.6
         Strongly Disagree          17       .4             .4               100.0
         Total                    4575     98.5          100.0
Missing  System                     71      1.5
Total                             4646    100.0

The mode and median are useful for thinking about ORDINAL data

What is the most frequent (MODAL) response?

What is the middle (MEDIAN) response?
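Both questions can be answered from the frequency column alone. The sketch below takes the valid counts from the local government table, picks the largest count for the mode, and walks down the ranked categories until the running total reaches the halfway case for the median. Variable names are illustrative.

```python
# Valid counts from the table, kept in rank order.
counts = [
    ("Strongly Agree", 1911),
    ("Agree", 2281),
    ("Neutral", 255),
    ("Disagree", 111),
    ("Strongly Disagree", 17),
]

n = sum(count for _, count in counts)   # number of valid cases

# Mode: the category with the highest frequency.
modal_category = max(counts, key=lambda pair: pair[1])[0]

# Median: the category containing the middle case of the ordered responses.
# (With an even n the two middle cases could straddle two categories;
# that edge case is ignored in this sketch.)
halfway = (n + 1) / 2
running = 0
median_category = None
for category, count in counts:
    running += count
    if running >= halfway:
        median_category = category
        break

print(modal_category, median_category)
```

This is the same reading you would make by eye from the Cumulative Percent column: the median category is the first one where the cumulative figure passes 50.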

Central Tendency VII

ORDINAL data can also be displayed using a bar chart

Central Tendency VIII

MEAN: the sum of the values divided by the number of cases

Date     High Temperature
2-Jan    59
3-Jan    60
4-Jan    43
5-Jan    42
6-Jan    35
7-Jan    32
8-Jan    32
9-Jan    46
10-Jan   41
11-Jan   52

Sum 442

MEAN = 44.2
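The same arithmetic as a short sketch, using the ten temperatures from the slide:

```python
highs = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

total = sum(highs)          # sum of the values
mean = total / len(highs)   # divided by the number of cases

print(total, mean)
```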

Central Tendency IX

The mean, mode and median are useful for thinking about INTERVAL data

What is the average (MEAN) age?

What is the middle (MEDIAN) age?

What is the most common (MODAL) age?

Statistics

What was your age last birthday

N        Valid      4290
         Missing     158
Mean               54.74
Median             57.00
Mode                  62

Central Tendency X

INTERVAL data can be displayed using a histogram

Dispersion I

• Measures of central tendency are heuristics

• They can hide important details in the data

Dataset 1: 1 2 3 4 5 6 7 8 9 (MEAN = 5, MEDIAN = 5)

Dataset 2: 1 2 3 4 5 6 7 8 90 (MEAN = 14, MEDIAN = 5)

Need to consider RANGE and STANDARD DEVIATION
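A quick sketch of the point being made: one extreme value (90) drags the mean upwards while the median stays put, and the range is what exposes the difference. The two datasets are those from the slide.

```python
from statistics import mean, median

dataset_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dataset_2 = [1, 2, 3, 4, 5, 6, 7, 8, 90]

for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    print(name,
          "mean =", mean(data),
          "median =", median(data),
          "range =", max(data) - min(data))
```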

Dispersion II

• RANGE measures the difference between the lowest and highest values

– Large range may reveal outliers (dataset 2!)

– Small range suggests tight grouping of data

• STANDARD DEVIATION (SD) summarises the typical distance (deviation) of the values from the mean

– Large SDs occur when data points are a long way from the mean (wide range of different values)

– Small SDs occur when data points are close to the mean (values do not differ very much)
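In symbols, for values x_1 … x_n with mean x̄, the two measures can be written as below. The n − 1 divisor is the sample form of the SD and is an assumption here, since the slides do not state the formula; it is the version consistent with the SPSS figures in Dispersion III.

```latex
\text{Range} = x_{\max} - x_{\min}
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```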

Dispersion III

• For example:

Age (Sample 1)   Age (Sample 2)
18               8
30               55
23               53
31               13
21               12
19               52
20               7
19               9
28               11
21               10

Descriptive Statistics (Sample 1)

                      N   Range   Minimum   Maximum      Mean   Std. Deviation
Age                  10   13.00     18.00     31.00   23.0000          4.85341
Valid N (listwise)   10

Descriptive Statistics (Sample 2)

                      N   Range   Minimum   Maximum      Mean   Std. Deviation
Age                  10   48.00      7.00     55.00   23.0000         21.01851
Valid N (listwise)   10
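The two tables can be approximated with Python's standard library, as a sketch rather than a replica of the SPSS procedure. `statistics.stdev` uses the sample (n − 1) formula, so its results should land close to the Std. Deviation column; the `describe` helper is a hypothetical name introduced for this example.

```python
from statistics import mean, stdev

sample_1 = [18, 30, 23, 31, 21, 19, 20, 19, 28, 21]
sample_2 = [8, 55, 53, 13, 12, 52, 7, 9, 11, 10]


def describe(ages):
    """Return N, range, minimum, maximum, mean and sample SD for a list of ages."""
    return {
        "N": len(ages),
        "Range": max(ages) - min(ages),
        "Minimum": min(ages),
        "Maximum": max(ages),
        "Mean": mean(ages),
        "Std. Deviation": round(stdev(ages), 5),
    }


for label, ages in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(label, describe(ages))
```

Both samples share the same mean of 23, so the mean alone cannot tell them apart; the range and SD show that Sample 2 is far more spread out.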

Summary

• Levels of measurement determine how data can be analysed

• Vital to understand what your data represents and into which level of measurement it falls

• Frequency tables help us to screen data for errors

• Frequency tables also help us to identify the median and mode

• Central tendency is a heuristic, which is exactly why it is so widely used

• Dispersion plays a vital role in critically evaluating central tendency

• These modes of analysis are often referred to as DESCRIPTIVE STATISTICS or UNIVARIATE ANALYSIS (literally ‘one variable’!)

Lies, Damn Lies and Statistics?

90% of Sun readers want a cap on immigration

The average Yale graduate earns $30,000 within six months of graduating

The Green Party is not well supported as it received less than 5% of the national vote in the 2010 General Election

House prices drop by 10% in the UK

90% of students at Cardiff University are binge drinkers