Visualizing and Presenting Data

Date post: 13-Jan-2016
Upload: gagandeepsinghwalia
Description:
For MBA students
Page 1: Visualizing and Presenting Data

Visualizing and Presenting Data

Page 2: Visualizing and Presenting Data

Glyn Davis & Branko Pecar

Newspapers, magazines and television all use these types of display to convey information in a way that is easy to assimilate.

In a nutshell, these displays aim to summarise large sets of raw data so that we can see the 'behaviour' of the data at a glance.

Tables?

Graphical representation?

Frequency distributions?

Page 3: Visualizing and Presenting Data

Learning Objectives

• Understand the different types of data variables that can be used to represent a specific measurement.
• Know how to present data in table form.
• Present data in a variety of graphical forms.
• Construct frequency distributions from raw data.
• Distinguish between discrete and continuous data.
• Construct histograms for equal and unequal class widths.
• Understand what we mean by a frequency polygon.

Page 4: Visualizing and Presenting Data

Introduction

Statistics has two distinct branches:

1. Descriptive Statistics – comprises collecting data, presenting data (tables and graphs) and describing data (central tendency, dispersion, skewness, kurtosis).

2. Inferential Statistics – drawing conclusions about a population value based upon sample data (point and interval estimates, hypothesis testing, fitting lines to data sets (X, Y) using least squares regression, and analysing time series data).

Page 5: Visualizing and Presenting Data

Summary of Presenting Data

Presenting Data

Categorical Data

Tabulating data

Tables

Graphing data

Bar charts

Pie charts

Numerical Data

Frequency distributions

Histograms

Polygons

Cumulative distributions

Cumulative frequency graphs

Bivariate data

Scatter plots

Time series plots

Page 6: Visualizing and Presenting Data

The Different Types of Data Variable

• Variable - A variable is any measured characteristic or attribute that differs for different subjects e.g. height of a building, eye colour.

• Qualitative (or categorical) – a descriptive variable that records a particular characteristic (e.g. eye colour) or a rank (e.g. finished first, fourth etc.). These variables have values that can only be placed into categories, such as yes and no.

• Quantitative (or numerical) – these variables have values that represent quantities. A numerical variable is measured on one of two scales: interval or ratio.

Page 7: Visualizing and Presenting Data

The Different Types of Data Variable

• Nominal – Assigning items to categories e.g. number of people with blue eyes. When numbers are used only to label an item or individual, the data are called nominal data. Frequency distributions are usually used to tabulate and analyse problems involving nominal data.

• Ordinal – A set of data is said to be ordinal if the values belonging to it can be ranked. Numbers are used to rank objects or attributes.

Page 8: Visualizing and Presenting Data

The Different Types of Data Variable

• Interval – An interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or 'intervals') is the same but the zero point is arbitrary.

• Ratio – Ratio data are continuous data where both differences and ratios are interpretable and the scale has a natural zero.

Page 9: Visualizing and Presenting Data

Recognising a measure scale

Measurement scale and how to recognise it:

Nominal data
1. Classification data e.g. male or female, red or black car.
2. Arbitrary labels e.g. m or f, r or b, 0 or 1.
3. No ordering e.g. it makes no sense to state that r > b.

Ordinal data
1. Ordered list e.g. student satisfaction scale of 1, 2, 3, 4, and 5.
2. Differences between values are not important e.g. political parties can be given labels: far left, left, mid, right, far right etc. and student satisfaction scale of 1, 2, 3, 4, and 5.

Interval data
1. Ordered, constant scale, with no natural zero e.g. temperature, dates.
2. Differences make sense, but ratios do not e.g. temperature difference.

Ratio data
1. Ordered, constant scale, and a natural zero e.g. length, height, weight, and age.

Page 10: Visualizing and Presenting Data

Tables

Tables come in a variety of formats, from simple tables to frequency distributions, and summarise data sets in a form that gives users ready access to the important information.

Proposed voting behaviour by 1110 university students
(Source: University Student Survey October 2008)

Party          Frequency   Frequency %
Conservative         400            36
Labour               510            46
Democrat              78             7
Green                 55             5
Other                 67             6
Total               1110           100

Example 1.1

Simple table illustrating the voting intentions of 1110 students
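
The percentage column of Example 1.1 can be reproduced from the raw frequencies. The short Python sketch below is only an illustration: the variable names are our own, and rounding to the nearest whole percent is an assumption based on the figures shown in the table.

```python
# Frequencies copied from Example 1.1 (proposed voting behaviour).
votes = {"Conservative": 400, "Labour": 510, "Democrat": 78, "Green": 55, "Other": 67}

total = sum(votes.values())

# Share of each party as a percentage of all students, rounded to a whole
# number (assumed rounding rule, chosen to match the table).
percentages = {party: round(100 * count / total) for party, count in votes.items()}

for party, count in votes.items():
    print(f"{party:<13}{count:>6}{percentages[party]:>5}%")
print(f"{'Total':<13}{total:>6}{sum(percentages.values()):>5}%")
```

With these figures the rounded percentages happen to add up to exactly 100; with other data, rounding can leave the total slightly above or below 100.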

Page 11: Visualizing and Presenting Data

Simple Tables

Half-yearly sales of XBAR Ltd.

Month   January  February  March  April    May   June  Total
Pink       5200      4100   6000   6900   6050   7000  35250
Blue       2100      1050   2950   5000   6300   5200  22600
Total      7300      5150   8950  11900  12350  12200  57850

Example 1.2 Half yearly sales of XBAR Ltd

                             Single            Married
                             Under 30   30+    Under 30   30+
Less than 15 hrs per week         330   358        1162   484
15 hrs or more per week          1719   241         643  1521
Total                            2049   599        1805  2005

Example 1.3 Viewing habits of adult males

Page 12: Visualizing and Presenting Data

Frequency Distributions

Consider the set of data that represents the number of insurance claims processed each day by an insurance firm over a period of 40 days:

3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4, 5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2.

Page 13: Visualizing and Presenting Data

Frequency Distributions

SCORE   TALLY      FREQUENCY, f
1       |||        3
2       ||||       4
3       ||||       4
4       |||||      5
5       |||||      5
6       ||||| ||   7
7       |||||      5
8       |||        3
9       |||        3
10      |          1
                   Σf = 40

Example 1.4 Frequency distribution
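
The tally in Example 1.4 can also be produced by counting each score programmatically. The sketch below uses Python's standard `collections.Counter`; the slides themselves refer to an Excel solution, so this is an alternative illustration and not the authors' method.

```python
from collections import Counter

# Number of insurance claims processed on each of the 40 days.
claims = [3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4,
          5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(claims)

print("SCORE  FREQUENCY")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]:>9}")
print(f"Total  {sum(frequency.values()):>9}")
```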

Page 14: Visualizing and Presenting Data

Grouped Frequency Distributions

Consider the following data set of miles recorded by 120 salesmen in one week.

403 407 407 408 410 412 413 413
423 424 424 425 426 428 430 430
435 435 436 436 436 438 438 438
444 444 445 446 447 447 447 448
452 453 453 453 454 455 455 456
462 462 462 463 464 465 466 468
474 474 475 476 477 478 479 481
490 493 494 495 497 498 498 500
415 430 439 449 457 468 482 502
416 431 440 450 457 469 482 502
418 432 440 450 458 470 483 505
419 432 441 451 459 471 485 508
420 433 442 451 459 471 486 509
421 433 442 451 460 472 488 511
421 434 443 452 460 473 489 515

Example 1.5 Data set

Page 15: Visualizing and Presenting Data

Counting frequencies

MILEAGE     TALLY                                       FREQUENCY, f
400 - 419   ||||| ||||| ||                              12
420 - 439   ||||| ||||| ||||| ||||| ||||| ||            27
440 - 459   ||||| ||||| ||||| ||||| ||||| ||||| ||||    34
460 - 479   ||||| ||||| ||||| ||||| ||||                24
480 - 499   ||||| ||||| |||||                           15
500 - 519   ||||| |||                                   8
                                                        Σf = 120

Example 1.5 Grouped frequency distribution

See text for the Excel solution
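
As an alternative to the Excel solution mentioned above, the grouped frequency distribution can be sketched in a few lines of Python. The class limits (400 - 419, 420 - 439, and so on) are taken from the table; the variable names and the integer-division approach to finding each class are our own choices.

```python
# Miles recorded by 120 salesmen in one week (Example 1.5 data set).
miles = [
    403, 407, 407, 408, 410, 412, 413, 413, 423, 424, 424, 425, 426, 428, 430, 430,
    435, 435, 436, 436, 436, 438, 438, 438, 444, 444, 445, 446, 447, 447, 447, 448,
    452, 453, 453, 453, 454, 455, 455, 456, 462, 462, 462, 463, 464, 465, 466, 468,
    474, 474, 475, 476, 477, 478, 479, 481, 490, 493, 494, 495, 497, 498, 498, 500,
    415, 430, 439, 449, 457, 468, 482, 502, 416, 431, 440, 450, 457, 469, 482, 502,
    418, 432, 440, 450, 458, 470, 483, 505, 419, 432, 441, 451, 459, 471, 485, 508,
    420, 433, 442, 451, 459, 471, 486, 509, 421, 433, 442, 451, 460, 472, 488, 511,
    421, 434, 443, 452, 460, 473, 489, 515,
]

first_lower_limit = 400
class_width = 20

# One counter per class, keyed by the class's lower stated limit.
grouped = {lower: 0 for lower in range(first_lower_limit, 520, class_width)}

for value in miles:
    # Integer division finds how many whole classes lie below this value.
    offset = (value - first_lower_limit) // class_width
    grouped[first_lower_limit + offset * class_width] += 1

for lower, f in grouped.items():
    print(f"{lower} - {lower + class_width - 1}: {f}")
print("Total:", sum(grouped.values()))
```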

Page 16: Visualizing and Presenting Data

Class Intervals and Boundaries

Data can exist in two forms: discrete and continuous:

1. Discrete data occurs as an integer (whole number) e.g. 1, 2, 3, 4, 5, 6, ... etc.

2. Continuous data occurs as a continuous number and can take any level of accuracy, e.g. the number of miles travelled could be 440.3 or 440.34 … etc.

     STATED LIMIT     MATHEMATICAL LIMIT
                      DISCRETE    CONTINUOUS
A    5 - under 10     5 - 9       5 - 9.999999...
     10 - under 15    10 - 14     10 - 14.999999...
B    5 - 9            5 - 9       4.5 - 9.5
     10 - 15          10 - 15     9.5 - 15.5

Normally, we would look at creating 5 – 12 classes in the grouped frequency distribution, where class width = Upper – Lower Class Boundaries.

Class width = (Highest Value − Lowest Value) / Number of Classes
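
As a worked illustration of the class width formula, the sketch below applies it to the mileage data of Example 1.5, whose lowest and highest values are 403 and 515. Choosing six classes and rounding the result up to a multiple of 10 are assumptions made here so that the outcome matches the class limits used in the grouped table; the slides do not state a rounding rule.

```python
import math

lowest_value = 403      # smallest mileage in Example 1.5
highest_value = 515     # largest mileage in Example 1.5
number_of_classes = 6   # assumed; within the suggested range of 5 to 12

# Class width = (highest value - lowest value) / number of classes
raw_width = (highest_value - lowest_value) / number_of_classes

# Assumed convention: round up to the next multiple of 10 for tidy limits.
convenient_width = math.ceil(raw_width / 10) * 10

print(f"Raw class width: {raw_width:.2f}")
print(f"Convenient class width: {convenient_width}")
```

The raw width of about 18.67 rounds up to 20, which is the width of the classes 400 - 419, 420 - 439 and so on.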

Page 17: Visualizing and Presenting Data

Graphical Representation of Data

Once the data has been tabulated, the next stage of analysis is to graph it using a suitable method. In this section we will explore:

1. Bar charts
2. Pie charts
3. Histograms
4. Frequency polygons
5. Scatter plots
6. Time series plots

The type of graph you use depends upon the type of variable in your data set, e.g. category (or nominal), ordinal, or interval (or ratio) data, as follows:

Data type             Which graph to use?
Category or nominal   Bar chart, pie chart, cross tab tables (or contingency tables)
Ordinal               Bar chart, pie chart, scatter plots
Interval or ratio     Histogram, frequency polygon, cumulative frequency curve (or ogive), scatter plots, time series plots

Page 18: Visualizing and Presenting Data

Bar charts

Categorical data is represented largely by bar and pie charts. Bar charts are very useful in providing a simple pictorial representation of several sets of data on one graph.

Example 1.7

Bar chart for proposed voting behaviour

See text for the Excel solution

Page 19: Visualizing and Presenting Data

Horizontal Bar Charts

Example 1.8

Component bar chart for half yearly car sales

See text for the Excel solution

Page 20: Visualizing and Presenting Data

Pie charts

In a pie chart each relative frequency is represented by a slice of a circle. Each slice represents a category, and the area of the slice represents the frequency or number of objects within that category.

They are particularly useful in showing relative proportions, but their effectiveness tends to diminish for more than eight categories.

Example 1.11

Pie chart for proposed voting behaviour

See text for the Excel solution

Page 21: Visualizing and Presenting Data

Pie chart angles

The table below shows how to calculate the angle of the slice that represents each voting category.

Political Party   Voting Behaviour   Angle Calculation   Angle (1 decimal place)
Conservative      400                (360/1110)*400      129.7°
Labour            510                (360/1110)*510      165.4°
Democrat          78                 (360/1110)*78       25.3°
Green             55                 (360/1110)*55       17.8°
Other             67                 (360/1110)*67       21.7°
Total             1110                                   359.9°
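
The angle calculation can be checked with a short Python sketch. It applies the same rule as the table, (360 / total) * frequency, to the voting frequencies; the variable names are our own.

```python
# Frequencies for proposed voting behaviour (1110 students in total).
votes = {"Conservative": 400, "Labour": 510, "Democrat": 78, "Green": 55, "Other": 67}

total = sum(votes.values())

# Each slice takes the same share of 360 degrees as its share of the total,
# rounded to one decimal place as in the table.
angles = {party: round(360 / total * count, 1) for party, count in votes.items()}

for party, angle in angles.items():
    print(f"{party:<13}{angle:>7.1f}")
print(f"{'Total':<13}{sum(angles.values()):>7.1f}")
```

The rounded angles sum to 359.9 rather than 360 because each angle is rounded before the total is taken.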

Page 22: Visualizing and Presenting Data

Histograms

A graph of the data in a frequency distribution is called a histogram. The area of each bar is a measure of the frequency of occurrence (number of values) within each class. If the bar widths are the same (constant), then the height of each bar is proportional to the frequency, and this information can then be used to construct the histogram.

Histogram for the number of insurance claims processed

Example 1.12
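
Because it is the area of a bar that represents frequency, a class that is wider than the others needs a proportionally lower bar. The sketch below illustrates this with the grouped mileage frequencies from Example 1.5, where the last two classes (480 - 499 and 500 - 519) have been merged purely for illustration; this regrouping is hypothetical and does not appear in the slides. The bar height is computed as frequency divided by class width, often called the frequency density.

```python
# (lower limit, upper limit exclusive, frequency)
# The final class merges 480 - 499 and 500 - 519 (15 + 8 = 23 values);
# this merged class is a hypothetical regrouping for illustration only.
classes = [
    (400, 420, 12),
    (420, 440, 27),
    (440, 460, 34),
    (460, 480, 24),
    (480, 520, 23),
]

# Height of each bar = frequency / class width, so that area = frequency.
heights = [f / (upper - lower) for lower, upper, f in classes]

for (lower, upper, f), height in zip(classes, heights):
    width = upper - lower
    print(f"{lower} - under {upper}: width {width}, frequency {f}, height {height:.3f}")
```

The merged class holds 23 values but is drawn lower than the 400 - under 420 class, which holds only 12, because it is twice as wide.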

Page 23: Visualizing and Presenting Data

Histogram Example

Example 1.13

Histogram for the miles recorded by 120 salesmen

See text for the Excel solution

Page 24: Visualizing and Presenting Data

Frequency Polygon

A frequency polygon is formed from a histogram by joining the mid-points of the tops of the rectangles with straight lines. The mid-points of the first and last classes are then joined to the x-axis on either side, at a distance equal to half the class interval of the first and last class.

Example 1.15

Frequency Polygon for the miles recorded by 120 salesmen

See text for the Excel solution
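
The points of the frequency polygon can be listed with a short Python sketch using the grouped mileage frequencies. One assumption is made about the closing points: the half class interval is measured from the outer edge of the first and last bars, which places the closing points at the mid-points of the neighbouring empty classes. The text could be read differently, so treat this as one possible interpretation.

```python
class_width = 20
lower_limits = [400, 420, 440, 460, 480, 500]
frequencies = [12, 27, 34, 24, 15, 8]

# Mid-point of each class from its stated limits, e.g. (400 + 419) / 2 = 409.5.
midpoints = [lower + (class_width - 1) / 2 for lower in lower_limits]

# Assumed closing points: zero frequency one class width beyond the first
# and last mid-points (the mid-points of the neighbouring empty classes).
start = (midpoints[0] - class_width, 0)
end = (midpoints[-1] + class_width, 0)

polygon_points = [start] + list(zip(midpoints, frequencies)) + [end]

for x, y in polygon_points:
    print(f"({x}, {y})")
```

Joining these points in order with straight lines gives the polygon, which starts and ends on the x-axis.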

Page 25: Visualizing and Presenting Data

Creating Scatter Plots

A scatter plot is a graph which helps us assess visually the form of the relationship between two variables. To illustrate the idea of a scatter plot consider the following problem.

Employee Number   Productivity, X   % Raise in Productivity, Y
 1                47                4.2
 2                71                8.1
 3                64                6.8
 4                35                4.3
 5                43                5.0
 6                60                7.5
 7                38                4.7
 8                59                5.9
 9                67                6.9
10                56                5.7
11                67                5.7
12                57                5.4
13                69                7.5
14                38                3.8
15                54                5.9
16                76                6.3
17                53                5.7
18                40                4.0
19                47                5.2
20                23                2.2

Example 1.16
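
For readers who prefer code to Excel, the data of Example 1.16 can be plotted as sketched below. This assumes the third-party matplotlib package is installed; the function name `draw_scatter` is our own, and the slides themselves only refer to an Excel solution.

```python
# Productivity (X) and % raise in productivity (Y) for the 20 employees.
productivity = [47, 71, 64, 35, 43, 60, 38, 59, 67, 56,
                67, 57, 69, 38, 54, 76, 53, 40, 47, 23]
raise_pct = [4.2, 8.1, 6.8, 4.3, 5.0, 7.5, 4.7, 5.9, 6.9, 5.7,
             5.7, 5.4, 7.5, 3.8, 5.9, 6.3, 5.7, 4.0, 5.2, 2.2]

# Each employee contributes one (X, Y) point to the plot.
pairs = list(zip(productivity, raise_pct))


def draw_scatter(x, y):
    """Draw Y against X as a scatter plot (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the data can be used without it

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Productivity, X")
    ax.set_ylabel("% raise in productivity, Y")
    ax.set_title("% raise in productivity against productivity")
    plt.show()
```

Calling `draw_scatter(productivity, raise_pct)` displays the chart.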

Page 26: Visualizing and Presenting Data

Scatter plots

See text for the Excel solution

Example 1.16

Scatter plot for the % raise in productivity against productivity

Page 27: Visualizing and Presenting Data

Time series

Time series analysis is concerned with data collected over a period of time. It attempts to isolate and evaluate various factors which contribute to changes over time in such variable series as imports and exports, sales, unemployment and prices. If we can evaluate the main components which determine the value of say sales for a particular month then we can project the series into the future to obtain a forecast.

Sales of Pip Ltd 2001-2004 (tons)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
2001         654         620         698         723
2002         756         698         748         802
2003         843         799         856         889
2004         967         876         960         976

Example 1.17
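
Before a time series can be plotted, the quarterly table has to be laid out as a single sequence in time order. The Python sketch below shows one way to do this for the Pip Ltd data; the labels such as "2001 Q1" and the annual totals are our own additions for illustration.

```python
# Quarterly sales of Pip Ltd in tons, keyed by year (Example 1.17).
sales = {
    2001: [654, 620, 698, 723],
    2002: [756, 698, 748, 802],
    2003: [843, 799, 856, 889],
    2004: [967, 876, 960, 976],
}

# Flatten the table into one chronological series of (label, tons) pairs.
series = [
    (f"{year} Q{quarter}", tons)
    for year, quarters in sales.items()
    for quarter, tons in enumerate(quarters, start=1)
]

# Annual totals give a quick view of the overall movement from year to year.
annual_totals = {year: sum(quarters) for year, quarters in sales.items()}

for label, tons in series:
    print(label, tons)
print(annual_totals)
```

Plotting the tons against the labels in this order gives the time series plot.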

Page 28: Visualizing and Presenting Data

Time series plots

See text for the Excel solution

Example 1.17

Time series plot for quarterly sales of Pip Ltd

Page 29: Visualizing and Presenting Data

Conclusion

In this presentation we explored summarising data sets using the following three concepts:

Tables

Frequency distributions

Graphs

