
STA 291 Fall 2009

Date post: 22-Feb-2016
STA 291 Fall 2009 Lecture 4 Dustin Lueker
Transcript
Page 1: STA 291 Fall 2009

STA 291 Fall 2009

Lecture 4
Dustin Lueker

Page 2: STA 291 Fall 2009

STA 291 Fall 2009 Lecture 4 2

Bar Graph (Nominal/Ordinal Data)

Histogram: for interval (quantitative) data
Bar graph is almost the same, but for qualitative data
Difference:

◦ The bars are usually separated to emphasize that the variable is categorical rather than quantitative

◦ For nominal variables (no natural ordering), order the bars by frequency, except possibly for a category “other” that is always last
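The ordering rule for nominal variables can be sketched in a few lines of Python. The counts below are hypothetical, and sorting in decreasing order is an assumption, since the slide only says to order by frequency.

```python
# Hypothetical frequency distribution for a nominal variable (mode of transport).
counts = {"Bus": 40, "Other": 55, "Car": 120, "Bicycle": 25, "Walk": 60}

def bar_order(freqs, last="Other"):
    """Order categories by decreasing frequency, keeping `last` at the end."""
    ranked = sorted(
        (category for category in freqs if category != last),
        key=lambda category: freqs[category],
        reverse=True,
    )
    if last in freqs:
        ranked.append(last)
    return ranked

print(bar_order(counts))
```

"Other" goes last even though its count is larger than that of "Bus" or "Bicycle".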

Page 3: STA 291 Fall 2009

Pie Chart (Nominal/Ordinal Data)

First Step
◦ Create a frequency distribution

Highest Degree Obtained    Frequency (Number of Employees)
Grade School               15
High School                200
Bachelor’s                 185
Master’s                   55
Doctorate                  70
Other                      25
Total                      550

Page 4: STA 291 Fall 2009

Bar graph
◦ If the data is ordinal, classes are presented in the natural ordering

We could display this data in a bar chart…

[Bar chart: frequency (axis from 0 to 250) by highest degree: Grade School, High School, Bachelor's, Master's, Doctorate, Other]

Page 5: STA 291 Fall 2009

Pie Chart

Pie is divided into slices
◦ Area of each slice is proportional to the frequency of each class

Highest Degree    Relative Frequency    Angle ( = Rel. Freq. x 360 )
Grade School      15/550 = .027         9.72
High School       200/550 = .364        131.04
Bachelor’s        185/550 = .336        120.96
Master’s          55/550 = .1           36.0
Doctorate         70/550 = .127         45.72
Other             25/550 = .045         16.2
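The relative frequencies and slice angles can be recomputed from the frequency distribution with a short Python sketch. The function name `pie_angles` is only illustrative. The table multiplies the rounded relative frequencies by 360 (for example .027 x 360 = 9.72), so the unrounded angles differ slightly (15/550 x 360 is about 9.82).

```python
# Frequency distribution from the "Highest Degree Obtained" table.
frequencies = {
    "Grade School": 15,
    "High School": 200,
    "Bachelor's": 185,
    "Master's": 55,
    "Doctorate": 70,
    "Other": 25,
}

def pie_angles(freqs):
    """Return {category: (relative frequency, slice angle in degrees)}."""
    total = sum(freqs.values())
    return {
        category: (count / total, count / total * 360)
        for category, count in freqs.items()
    }

slices = pie_angles(frequencies)
for category, (rel_freq, angle) in slices.items():
    print(f"{category:<13} {rel_freq:.3f} {angle:7.2f}")
```

Computed without intermediate rounding, the angles add up to a full 360 degrees.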

Page 6: STA 291 Fall 2009

Pie Chart for Highest Degree Achieved

[Pie chart with slices: Grade School, High School, Bachelor's, Master's, Doctorate, Other]

Page 7: STA 291 Fall 2009

Stem and Leaf Plot

Write the observations ordered from smallest to largest
◦ Looks like a histogram sideways
◦ Contains more information than a histogram, because every single observation can be recovered
Each observation represented by a stem and leaf
◦ Stem = leading digit(s)
◦ Leaf = final digit

Page 8: STA 291 Fall 2009

Stem and Leaf Plot

Stem  Leaf          #
20    3             1
19
18
17
16
15
14
13    135           3
12    7             1
11    334469        6
10    2234          4
 9    08            2
 8    03469         5
 7    5             1
 6    034689        6
 5    0238          4
 4    46            2
 3    0144468999    10
 2    039           3
 1    67            2
      ----+----+----+----+

Page 9: STA 291 Fall 2009

Stem and Leaf Plot

Useful for small data sets
◦ Fewer than 100 observations
Practical problem
◦ What if the variable is measured on a continuous scale, with measurements like 1267.298, 1987.208, 2098.089, 1199.082, etc.?
◦ Use common sense when choosing “stem” and “leaf”
Can also be used to compare groups
◦ Back-to-Back Stem and Leaf Plots, using the same stems for both groups
◦ Murder Rate Data from U.S. and Canada
Note: it doesn’t really matter whether the smallest stem is at top or bottom of the table

Page 10: STA 291 Fall 2009

Stem and Leaf Plot

PRESIDENT    AGE    PRESIDENT    AGE    PRESIDENT     AGE
Washington   67     Fillmore     74     Roosevelt     60
Adams        90     Pierce       64     Taft          72
Jefferson    83     Buchanan     77     Wilson        67
Madison      85     Lincoln      56     Harding       57
Monroe       73     Johnson      66     Coolidge      60
Adams        80     Grant        63     Hoover        90
Jackson      78     Hayes        70     Roosevelt     63
Van Buren    79     Garfield     49     Truman        88
Harrison     68     Arthur       56     Eisenhower    78
Tyler        71     Cleveland    71     Kennedy       46
Polk         53     Harrison     67     Johnson       64
Taylor       65     McKinley     58     Nixon         81
                                        Reagan        93
                                        Ford          93

Stem  Leaf
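A stem and leaf plot for these ages can be built with a short Python sketch. The function name `stem_and_leaf` is illustrative, and taking the tens digit as the stem and the ones digit as the leaf is one reasonable choice for two-digit ages, not the only one.

```python
from collections import defaultdict

# Ages copied from the table of presidents above.
ages = [
    67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65,  # Washington .. Taylor
    74, 64, 77, 56, 66, 63, 70, 49, 56, 71, 67, 58,  # Fillmore .. McKinley
    60, 72, 67, 57, 60, 90, 63, 88, 78, 46, 64, 81,  # Roosevelt .. Nixon
    93, 93,                                          # Reagan, Ford
]

def stem_and_leaf(values):
    """Group two-digit values by stem (tens digit) with sorted leaves (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    leaves = "".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}  ({len(plot[stem])})")
```

This sketch leaves out stems that have no observations. A plot meant to show gaps in the data would list those empty stems as well.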

Page 11: STA 291 Fall 2009

Summary of Graphical and Tabular Techniques

Discrete data
◦ Frequency distribution
Continuous data
◦ Grouped frequency distribution
Small data sets
◦ Stem and leaf plot
Interval data
◦ Histogram
Categorical data
◦ Bar chart
◦ Pie chart
Grouping intervals should be of same length, but may be dictated more by subject-matter considerations

Page 12: STA 291 Fall 2009

Good Graphics

Present large data sets concisely and coherently
Can replace a thousand words and still be clearly understood and comprehended
Encourage the viewer to compare two or more variables
Do not replace substance by form
Do not distort what the data reveal

Page 13: STA 291 Fall 2009

Bad Graphics

Don’t have a scale on the axis
Have a misleading caption
Distort by using absolute values where relative/proportional values are more appropriate
Distort by stretching/shrinking the vertical or horizontal axis
Use bar charts with bars of unequal width

Page 14: STA 291 Fall 2009

Sample/Population Distribution

Frequency distributions and histograms exist for the population as well as for the sample
Population distribution vs. sample distribution
As the sample size increases, the sample distribution looks more and more like the population distribution
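A small simulation can illustrate how the sample distribution approaches the population distribution as the sample size grows. The population here is a hypothetical fair six-sided die, where each face has proportion 1/6. The seed and the sample sizes are arbitrary choices made for this sketch.

```python
import random

# Hypothetical population: a fair six-sided die, each face with proportion 1/6.
FACES = [1, 2, 3, 4, 5, 6]
POPULATION_PROPORTION = 1 / 6

def max_deviation(sample_size, rng):
    """Largest gap between a sample relative frequency and the population proportion."""
    sample = rng.choices(FACES, k=sample_size)
    return max(
        abs(sample.count(face) / sample_size - POPULATION_PROPORTION)
        for face in FACES
    )

rng = random.Random(291)  # fixed seed so the run is repeatable
small = max_deviation(100, rng)
large = max_deviation(100_000, rng)
print(f"n = 100:    max deviation {small:.4f}")
print(f"n = 100000: max deviation {large:.4f}")
```

With the larger sample, every relative frequency sits much closer to 1/6, so the sample histogram looks more like the flat population histogram.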

Page 15: STA 291 Fall 2009

Population Distribution

The population distribution for a continuous variable is usually represented by a smooth curve
◦ Like a histogram that gets finer and finer
◦ Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
Symmetric distributions
◦ Bell-shaped
◦ U-shaped
◦ Uniform
Not symmetric distributions:
◦ Left-skewed
◦ Right-skewed
◦ Skewed

Page 16: STA 291 Fall 2009

Skewness

[Figure: three example distributions labeled Symmetric, Right-skewed, and Left-skewed]

Page 17: STA 291 Fall 2009

Summarizing Data Numerically

Center of the data
◦ Mean
◦ Median
◦ Mode
Dispersion of the data (sometimes referred to as spread)
◦ Variance, Standard deviation
◦ Interquartile range
◦ Range

Page 18: STA 291 Fall 2009

Measures of Central Tendency

Mean
◦ Arithmetic average
Median
◦ Midpoint of the observations when they are arranged in order, smallest to largest
Mode
◦ Most frequently occurring value
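The three measures can be computed with Python's standard `statistics` module. The observations below are hypothetical and were chosen only so that the three values are easy to check by hand.

```python
import statistics

# Hypothetical observations, chosen only to illustrate the three measures.
observations = [2, 3, 3, 5, 7]

mean = statistics.mean(observations)      # arithmetic average: 20 / 5
median = statistics.median(observations)  # middle value of the sorted data
mode = statistics.mode(observations)      # most frequently occurring value

print(mean, median, mode)
```

Here the median and the mode agree, while the mean is pulled slightly upward by the largest value.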

Page 19: STA 291 Fall 2009

Sample Mean

Sample size n
Observations x1, x2, …, xn
Sample Mean “x-bar”

$$\bar{x} = (x_1 + x_2 + \cdots + x_n)/n = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Page 20: STA 291 Fall 2009

Population Mean

Population size N
Observations x1, x2, …, xN
Population Mean “mu”

$$\mu = (x_1 + x_2 + \cdots + x_N)/N = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Note: This is for a finite population of size N

Page 21: STA 291 Fall 2009

Mean

Requires numerical values
◦ Only appropriate for quantitative data
◦ Does not make sense to compute the mean for nominal variables
◦ Can be calculated for ordinal variables, but this does not always make sense
Should be careful when using the mean on ordinal variables
◦ Example “Weather” (on an ordinal scale)
  Sun=1, Partly Cloudy=2, Cloudy=3, Rain=4, Thunderstorm=5
  Mean (average) weather=2.8
◦ Another example: “GPA = 3.8” is also a mean of observations measured on an ordinal scale

Page 22: STA 291 Fall 2009

Mean

Center of gravity for the data set
Sum of the deviations above the mean is equal to the sum of the deviations below the mean
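The center-of-gravity description can be made precise: the deviations from the mean add up to zero. A short derivation, using the sample mean notation from the Sample Mean slide:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

The total distance of the observations above the mean therefore equals the total distance of the observations below it. For {7, 12, 11, 18}, with mean 12, the deviations are -5, 0, -1, and 6.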

Page 23: STA 291 Fall 2009

Mean (Average)

Mean
◦ Sum of observations divided by the number of observations
Example
◦ {7, 12, 11, 18}
◦ Mean = (7 + 12 + 11 + 18)/4 = 48/4 = 12

Page 24: STA 291 Fall 2009

Mean

Highly influenced by outliers
◦ Data points that are far from the rest of the data
Not representative of a typical observation if the distribution of the data is highly skewed
◦ Example
  Monthly income for five people: 1,000 2,000 3,000 4,000 100,000
  Average monthly income = 110,000/5 = 22,000
  Not representative of a typical observation
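The effect of the outlier in the income example can be checked with a minimal Python sketch. Comparing against the median and recomputing the mean without the largest value are additions for illustration and are not part of the slide.

```python
import statistics

# Monthly incomes from the example above.
incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

mean_income = statistics.mean(incomes)      # 110,000 / 5
median_income = statistics.median(incomes)  # middle of the five sorted values

print(f"mean   = {mean_income:,}")
print(f"median = {median_income:,}")

# Dropping the outlier shows how much a single value moves the mean.
mean_without_outlier = statistics.mean(incomes[:-1])
print(f"mean without 100,000 = {mean_without_outlier:,}")
```

Four of the five people earn far less than the mean, while the median stays at the middle income.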

