
1 Laugh, and the world laughs with you. Weep and you weep alone.~Shakespeare~

Date post: 01-Jan-2016
Upload: brianna-martin

Laugh, and the world laughs with you. Weep and you weep alone. ~Shakespeare~


Chapter 3: Data Description

Types of data

Graphical/Numerical summaries


What are Data?

Any set of data contains information about some group of individuals. The information is organized in variables.


Terms

A population is a collection of all individuals about which information is desired.

A sample is a subset of a population.

A variable is a characteristic of an individual.

The distribution of a variable tells us what values/categories it takes and how often it takes those values/categories in the population.


Data Analysis

Goal: to study how variables relate to one another in a population

Method: estimating the distributions of variables (in the whole population) by summarizing the distributions of data on those variables


Example: A College’s Student Dataset

The data set includes data about all currently enrolled students such as their ages, genders, heights, grades, and choices of major.

Who? What individuals do the data describe? Population/sample of study?

What? How many variables do the data describe? Give an example of variables.


Types of Variables

A categorical variable places an individual into one of several groups or categories.

A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

Q. Which variables are categorical? Which are quantitative?

A variable

Q: Does “average” make sense?

No: Categorical/Qualitative

    Q: Is there any natural ordering among categories?

    No: Nominal variable

    Yes: Ordinal variable

Yes: Numerical/Quantitative

    Q: Can all possible values be listed down?

    Yes: Discrete variable

    No: Continuous variable
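
The three questions above can be read as a small decision procedure. The sketch below is only an illustration: the function name `variable_type` and the example variables (gender, letter grade, age in whole years, height) are assumptions chosen to match the college example, not part of the original slides.

```python
def variable_type(average_makes_sense, ordered_categories=False,
                  values_can_be_listed=False):
    """Classify a variable by answering the three questions in order."""
    if average_makes_sense:
        # Numerical/quantitative branch: can all possible values be listed?
        return "discrete" if values_can_be_listed else "continuous"
    # Categorical/qualitative branch: is there a natural ordering?
    return "ordinal" if ordered_categories else "nominal"


# Hypothetical answers for four variables of a student dataset.
print("gender:", variable_type(False))
print("letter grade:", variable_type(False, ordered_categories=True))
print("age in whole years:", variable_type(True, values_can_be_listed=True))
print("height:", variable_type(True))
```

Whether a given variable counts as discrete or continuous can depend on how it is recorded, so the answers passed in are a judgment call.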


Two Basic Strategies to Explore Data

Begin by examining each variable by itself. Then move on to study the relationship among the variables.

Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.


A Dataset of CSUEB Students

Gender   Height (inches)   Weight (pounds)   College
M        68.5              155               Bsns
F        61.2              99                Sci
F        63.0              115               Bsns
M        70.0              205               Sci
M        68.6              170               Bsns
F        65.1              125               Arts
M        72.4              220               Arts
M        --                188               Sci


Summarizing Data

We will move from summarizing data on one variable to summarizing data on several variables by:

Displaying the distribution of data with graphs

Describing the distribution of data with numbers


Terms

Frequency = the # of individuals in a category or at a value.

Relative frequency = the % of individuals in a category or at a value.

They both can be used to display the distribution of data.
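
As a rough illustration of the two terms, the sketch below tallies the College column of the eight-student CSUEB dataset. The variable names are illustrative choices, not from the slides.

```python
from collections import Counter

# College column of the eight-student CSUEB dataset, in row order.
colleges = ["Bsns", "Sci", "Bsns", "Sci", "Bsns", "Arts", "Arts", "Sci"]

# Frequency: number of individuals in each category.
frequency = Counter(colleges)

# Relative frequency: percentage of individuals in each category.
n = len(colleges)
relative_frequency = {college: 100 * count / n
                      for college, count in frequency.items()}

for college, count in frequency.items():
    print(f"{college:<5} frequency = {count}   "
          f"relative frequency = {relative_frequency[college]:.1f}%")
```

The relative frequencies add up to 100%, which is a quick check on the tally.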


Graphical Tools for One Variable

For a categorical variable:
– Pie charts
– Bar graphs

For a quantitative variable:
– Histograms
– Stem-and-leaf plots (read on your own)
– Boxplots


How to Make a Pie Chart

1. Calculate the % for each category

2. Draw a pie and slice it accordingly.

Pie Chart: Class Make-up on First Day

Freshman 41.9%

Sophomore 23.3%

Junior 14.0%

Senior 20.9%
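
"Slicing the pie accordingly" means giving each category a share of the full 360 degrees in proportion to its percentage. A minimal sketch using the class make-up percentages; the function name `slice_angles` is an illustrative choice, and the percentages are rescaled by their total because the rounded values add up to 100.1 rather than exactly 100.

```python
# Percentages from the class make-up pie chart.
percentages = {"Freshman": 41.9, "Sophomore": 23.3,
               "Junior": 14.0, "Senior": 20.9}


def slice_angles(pcts):
    """Convert category percentages into pie-slice angles in degrees."""
    total = sum(pcts.values())  # rescale in case rounding drifted from 100
    return {category: 360 * value / total
            for category, value in pcts.items()}


angles = slice_angles(percentages)
for category, angle in angles.items():
    print(f"{category:<10} {angle:6.1f} degrees")
```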


How to Make a Bar Chart

1. Label frequencies on one axis and categories of the variable on the other axis.

2. Construct a rectangle at each category of the variable with a height equal to the frequency in the category.

3. Leave a space between categories
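
The steps above can be sketched in plain text. This is only an approximation of a bar chart: the bars run horizontally (one row per category, so categories stay separated), the length of each bar equals the frequency, and the data are the College column of the CSUEB dataset. The function name `text_bar_chart` is an illustrative choice.

```python
from collections import Counter


def text_bar_chart(values):
    """Return one text row per category; bar length equals the frequency."""
    counts = Counter(values)
    rows = []
    for category in sorted(counts):
        bar = "#" * counts[category]
        rows.append(f"{category:<5}| {bar} {counts[category]}")
    return rows


colleges = ["Bsns", "Sci", "Bsns", "Sci", "Bsns", "Arts", "Arts", "Sci"]
chart = text_bar_chart(colleges)
print("\n".join(chart))
```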

Bar Graph: Class Make-up on First Day

Horizontal axis: Year in School. Vertical axis: Percent (0.0% to 45.0%).

Freshman 41.9%, Sophomore 23.3%, Junior 14.0%, Senior 20.9%


Displaying Distributions of Quantitative Variables

Stem-and-leaf plots: good for small to medium datasets

Histograms: similar to bar charts; good for medium to large datasets


How to Make a Histogram

1. Divide the range of data by the approximate # of intervals desired (usually 5-20). Round the resulting number to a convenient number (the common width for the intervals).

2. Construct intervals with the common width so that the first interval contains the smallest data value and the last interval contains the largest data value.

3. Draw the histogram: the variable on the horizontal axis and the count (or %) on the vertical axis.

BPS - 5th Ed. Chapter 1

Histograms: Class Intervals

How many intervals?
– One rule is to calculate the square root of the sample size, and round up.

Size of intervals?
– Divide range of data (max − min) by number of intervals desired, and round to convenient number

Pick intervals so each observation can only fall in exactly one interval (no overlap)
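
One way to turn these rules into a procedure is sketched below, applied to the eight weights of the CSUEB dataset. "Convenient number" is not defined on the slide, so rounding the width up to a multiple of 5 (the `step` parameter) is an assumption, as are the function name `class_intervals` and the choice of half-open intervals [a, b) to guarantee no overlap.

```python
import math


def class_intervals(data, step=5):
    """Return (edges, counts) for equal-width, non-overlapping intervals."""
    k = math.ceil(math.sqrt(len(data)))        # square-root rule, rounded up
    lo, hi = min(data), max(data)
    # Common width: range / k, rounded up to a multiple of `step` (assumed).
    width = math.ceil((hi - lo) / k / step) * step
    start = math.floor(lo / step) * step        # convenient starting point
    edges = [start + i * width for i in range(k + 1)]
    while edges[-1] <= hi:                      # ensure the maximum fits
        edges.append(edges[-1] + width)
    # Half-open intervals [a, b): each value falls in exactly one interval.
    counts = [sum(1 for x in data if a <= x < b)
              for a, b in zip(edges, edges[1:])]
    return edges, counts


weights = [155, 99, 115, 205, 170, 125, 220, 188]
edges, counts = class_intervals(weights)
for a, b, c in zip(edges, edges[1:], counts):
    print(f"[{a}, {b}): {c}")
```

With only eight observations the square-root rule gives three intervals, fewer than the 5 to 20 usually recommended; a real histogram would normally use a larger dataset.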


What do We See from Histograms?

Important features we should look for:

Overall pattern
– Shape
– Center
– Spread

Outliers, the values that fall far outside the overall pattern


How to Make a Stemplot

1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.

Example: for a height of 68.5, the leaf is “5” and the remaining digits “68” form the stem.


2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.

3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Weight Data: Stemplot (Stem & Leaf Plot)

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Key: 20|3 means 203 pounds

Stems = 10’s, Leaves = 1’s
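
The three stemplot steps can be sketched in a few lines. The example assumes whole-number observations (so the leaf is the ones digit) and uses the 53 weights listed on the Weight Data: Sorted slide; the function name `stemplot` is an illustrative choice.

```python
from collections import defaultdict

# The 53 weights (pounds) from the Weight Data: Sorted slide.
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]


def stemplot(values):
    """Return stemplot rows for whole-number data (leaf = final digit)."""
    leaves = defaultdict(list)
    for value in sorted(values):            # sorting keeps leaves in order
        stem, leaf = divmod(value, 10)      # all but the last digit, last digit
        leaves[stem].append(str(leaf))
    first, last = min(leaves), max(leaves)
    # Include stems with no leaves so gaps in the data stay visible.
    return [f"{stem} | {''.join(leaves.get(stem, []))}".rstrip()
            for stem in range(first, last + 1)]


rows = stemplot(weights)
print("\n".join(rows))
```

Keeping the empty stems 23 to 25 is what makes the value 260 stand out from the rest of the data.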


Overall Pattern—Shape

How many peaks, called modes? A distribution with one peak is called unimodal.

Symmetric or skewed?
– Symmetric if the large values are mirror images of small values
– Skewed to the right if the right tail (large values) is much longer than the left tail (small values)
– Skewed to the left if the left tail (small values) is much longer than the right tail (large values)


Describing Data on a Quantitative Variable

(Sec 3.4) To measure center: Mode, Mean and Median

(Sec 3.5) To measure variability: Range, Interquartile Range (IQR) and Standard Deviation (SD)

Outliers

(Sec 3.6) Five-number summary and boxplot
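
A short sketch of the center and variability measures, computed with Python's standard `statistics` module on the 53 weights from the Weight Data slides. Using the sample standard deviation (dividing by n − 1) is an assumption, since the slides do not say which version is meant; the IQR is left to the quartile slides.

```python
import statistics

# The 53 weights (pounds) from the Weight Data: Sorted slide.
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

# Measures of center.
mode = statistics.mode(weights)        # most frequent value
mean = statistics.mean(weights)        # arithmetic average
median = statistics.median(weights)    # middle value of the sorted data

# Measures of variability.
data_range = max(weights) - min(weights)
sd = statistics.stdev(weights)         # sample SD (assumed), divides by n - 1

print(f"mode = {mode}, mean = {mean:.1f}, median = {median}")
print(f"range = {data_range}, SD = {sd:.1f}")
```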

BPS - 5th Ed. Chapter 2

Quartiles

Three numbers which divide the ordered data into four equal sized groups.

Q1 has 25% of the data below it.

Q2 has 50% of the data below it. (Median)

Q3 has 75% of the data below it.


Obtaining the Quartiles

Order the data. For Q2, just find the median.

For Q1, look at the lower half of the data values, those to the left of the median location; find the median of this lower half.

For Q3, look at the upper half of the data values, those to the right of the median location; find the median of this upper half.

Weight Data: Sorted

100   124   148   170   185   215
101   125   150   170   185   220
106   127   150   172   186   260
106   128   152   175   187
110   130   155   175   192
110   130   157   180   194
119   133   165   180   195
120   135   165   180   203
120   139   165   180   210
123   140   170   185   212

L(M)=(53+1)/2=27

L(Q1)=(26+1)/2=13.5


Weight Data: Quartiles

Q1= 127.5

Q2= 165 (Median)

Q3= 185
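
These quartiles can be reproduced with the "median of each half" method described on the Obtaining the Quartiles slide. The sketch below is one reading of that method (the median observation itself is excluded from both halves when the sample size is odd); the helper names `median_of_sorted` and `quartiles` are illustrative.

```python
# The 53 weights (pounds) from the Weight Data: Sorted slide.
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]


def median_of_sorted(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2:                                   # odd count: middle value
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2  # even count: average of two


def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                              # left of median location
    upper = data[half + 1:] if n % 2 else data[half:]  # right of it
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


q1, q2, q3 = quartiles(weights)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```

Statistical software often uses slightly different interpolation rules, so other tools may report quartiles that differ a little from these.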


Five-Number Summary

• minimum = 100

• Q1 = 127.5

• M = 165

• Q3 = 185

• maximum = 260

Interquartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5

IQR gives spread of middle 50% of the data

Weight Data: Boxplot

(Figure: boxplot of Weight on an axis from 100 to 275, marking min, Q1, M, Q3, and max.)


Identifying Outliers

• The central box of a boxplot spans Q1 and Q3; recall that this distance is the Interquartile Range (IQR).

• We call an observation a mild (or extreme) outlier if it falls more than 1.5 (or 3.0) IQR above the third quartile or below the first quartile.
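
The outlier rule can be sketched as a small function. The quartiles are those of the weight data (Q1 = 127.5, Q3 = 185); the function name `outlier_type` and the test values 300 and 400 are hypothetical, added only to show the mild and extreme cases.

```python
def outlier_type(x, q1, q3):
    """Classify x as 'none', 'mild', or 'extreme' using the 1.5 and 3.0 IQR rule."""
    iqr = q3 - q1
    # Distance outside the box; negative when x lies between Q1 and Q3.
    distance = max(q1 - x, x - q3)
    if distance > 3.0 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "none"


q1, q3 = 127.5, 185          # quartiles of the weight data
for weight in (100, 260, 300, 400):   # 300 and 400 are hypothetical values
    print(weight, outlier_type(weight, q1, q3))
```

Under this rule the largest weight in the data, 260 pounds, lies 75 pounds above Q3, which is less than 1.5 × 57.5 = 86.25, so it is not flagged as an outlier.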


Summarizing Data from 2 Variables

2 categorical var’s
– Contingency table
– (Cluster or stacked) bar chart

2 quantitative var’s
– Regression equation
– Scatterplot

1 categorical + 1 quantitative var
– Side-by-side boxplot
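
As an illustration of the first case, the sketch below builds a contingency table of Gender by College from the eight-student CSUEB dataset. The layout and variable names are illustrative choices.

```python
from collections import Counter

# (Gender, College) pairs from the CSUEB dataset, in row order.
students = [
    ("M", "Bsns"), ("F", "Sci"), ("F", "Bsns"), ("M", "Sci"),
    ("M", "Bsns"), ("F", "Arts"), ("M", "Arts"), ("M", "Sci"),
]

# Each cell counts the students with that gender/college combination.
table = Counter(students)
genders = sorted({gender for gender, _ in students})
colleges = sorted({college for _, college in students})

print("Gender  " + "  ".join(f"{c:>4}" for c in colleges))
for gender in genders:
    cells = "  ".join(f"{table[(gender, c)]:>4}" for c in colleges)
    print(f"{gender:<8}{cells}")
```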


Time Plots

A time plot shows behavior over time.

Time is always on the horizontal axis, and the variable being measured is on the vertical axis.

Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend.

Look for patterns that repeat at known regular intervals (seasonal variations).


Average Tuition (Public vs. Private)


Empirical Rule (68-95-99.7 rule)

If a variable X follows a normal distribution, that is, the distribution of all X values (the whole population) is bell-shaped, then:

Mean(X) ± 1*SD(X) covers about 68% of possible X values

Mean(X) ± 2*SD(X) covers about 95% of possible X values

Mean(X) ± 3*SD(X) covers about 99.7% of possible X values
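
The three percentages can be checked numerically. For a normal variable, the probability of falling within k standard deviations of the mean equals erf(k/√2), a standard identity that is not stated on the slide; the function name `normal_coverage` is an illustrative choice.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {100 * normal_coverage(k):.1f}%")
```

The results are close to 68%, 95%, and 99.7%, which is where the rule's name comes from.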



z-Scores & The Empirical Rule

Since the z-score is the number of standard deviations from the mean, we can easily interpret the z-score for bell-shaped populations using The Empirical Rule.

When a population has a histogram that is approximately bell-shaped, then
– Approximately 68% of the data will have z-scores between –1 and 1.
– Approximately 95% of the data will have z-scores between –2 and 2.
– All, or almost all of the data will have z-scores between –3 and 3.

(Figure: bell-shaped curve with z = –3, –2, –1, 1, 2, 3 marked on the horizontal axis.)
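
A z-score is computed as (value − mean) / SD. The sketch below applies this to the eight weights of the CSUEB dataset; using the sample mean and sample SD in place of population values is an assumption, and with only eight observations the bell-shape condition cannot really be checked.

```python
import statistics


def z_scores(values):
    """Return the z-score of each value: (x - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, used here as an estimate
    return [(x - mean) / sd for x in values]


weights = [155, 99, 115, 205, 170, 125, 220, 188]
z = z_scores(weights)
for weight, score in zip(weights, z):
    print(f"weight {weight:>3}: z = {score:+.2f}")
```

Negative z-scores mark weights below the mean and positive ones mark weights above it.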

Copyright ©2014 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.


Minitab Exercise


Use the CSUEB dataset

1. Key in data in Minitab

2. Draw all plots and calculate statistics in Minitab

