Chapter 3 Descriptive Statistics II: Additional Descriptive Measures and Data Displays.

Posted on 04-Jan-2016



PERCENTILES

If the value A is the pth percentile value for a data set, then at least p% of the values are less than or equal to A and at least (100 - p)% of the values are greater than or equal to A.

Percentile Rules

Rule 1: If the position calculator, $\frac{np}{100}$, produces an integer, average the value occupying that position in the ordered list with the value in the next higher position and use the result as the pth percentile value.

Rule 2: If the position calculator, $\frac{np}{100}$, produces a non-integer, round the position result up to the next higher integer. The pth percentile value will be the value occupying that position in the ordered list.

Here n is the number of values in the data set and p is the percentile of interest.
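The two percentile rules can be sketched in Python as below. This is a minimal sketch, not the textbook's own code: the function name `percentile` is hypothetical, and it assumes p is an integer strictly between 0 and 100 so that the "is np/100 an integer?" test can be done exactly with integer arithmetic. Positions are 1-based, as in the rules.

```python
def percentile(values, p):
    """Return the pth percentile using the chapter's two position rules.

    Hypothetical helper; assumes p is an int with 0 < p < 100.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    numerator = len(ordered) * p  # position calculator np/100, kept as an integer
    if numerator % 100 == 0:
        # Rule 1: integer position -> average it with the next higher position.
        pos = numerator // 100
        return (ordered[pos - 1] + ordered[pos]) / 2
    # Rule 2: non-integer position -> round up and take the value at that position.
    pos = -(-numerator // 100)  # integer ceiling division
    return ordered[pos - 1]


# The 30 values from the stem-and-leaf illustration in this chapter.
data = [89, 57, 82, 88, 55, 66, 65, 70, 99, 100, 74, 70, 85, 72, 75,
        80, 95, 95, 85, 60, 85, 90, 80, 90, 92, 95, 98, 65, 80, 89]
print(percentile(data, 10))  # np/100 = 3 (Rule 1): average of 60 and 65 -> 62.5
print(percentile(data, 25))  # np/100 = 7.5 (Rule 2): 8th ordered value -> 70
```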

Quartiles

Quartiles Q1, Q2, and Q3 break an ordered list of numbers into four approximately equal subgroups, each containing about 25% of the values.

Interquartile Range (3.1)

IQR = Q3 – Q1
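A short sketch of quartiles and the IQR follows, assuming (as is common) that Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles found with Percentile Rules 1 and 2. The helper names `percentile`, `quartiles`, and `iqr` are hypothetical, and the data are the 30 values from the stem-and-leaf illustration in this chapter.

```python
def percentile(values, p):
    # Percentile Rules 1 and 2 (sketch; assumes an int p with 0 < p < 100).
    ordered = sorted(values)
    numerator = len(ordered) * p
    if numerator % 100 == 0:
        pos = numerator // 100  # Rule 1: average this position with the next
        return (ordered[pos - 1] + ordered[pos]) / 2
    pos = -(-numerator // 100)  # Rule 2: round the position up
    return ordered[pos - 1]


def quartiles(values):
    """Return (Q1, Q2, Q3) as the 25th, 50th, and 75th percentiles."""
    return tuple(percentile(values, p) for p in (25, 50, 75))


def iqr(values):
    """Interquartile range, IQR = Q3 - Q1 (expression 3.1)."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


data = [89, 57, 82, 88, 55, 66, 65, 70, 99, 100, 74, 70, 85, 72, 75,
        80, 95, 95, 85, 60, 85, 90, 80, 90, 92, 95, 98, 65, 80, 89]
print(quartiles(data))  # (70, 83.5, 90)
print(iqr(data))        # 20
```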

Stem-and-Leaf Illustration

89 57 82 88 55 66 65 70 99 100 74 70 85 72 75 80 95 95 85 60 85 90 80 90 92 95 98 65 80 89

The stem-and-leaf diagram for the data appears below:

 5 | 7 5
 6 | 6 5 0 5
 7 | 0 4 0 2 5
 8 | 9 2 8 5 0 5 5 0 0 9
 9 | 9 5 5 0 0 2 5 8
10 | 0

The row with stem 6 shows the values 66, 65, 60, and 65, in the order in which they appear in the data list.
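The display can be generated mechanically. The sketch below assumes non-negative integers, takes the stem as the tens part (`value // 10`) and the leaf as the units digit (`value % 10`), and keeps leaves in data order as the diagram does; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display rows with stem = value // 10 and leaf = value % 10.

    Leaves stay in data order, as in the chapter's diagram; sort each row
    if an ordered display is preferred.
    """
    rows = defaultdict(list)
    for v in values:
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # also shows any empty stems
        leaves = " ".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


data = [89, 57, 82, 88, 55, 66, 65, 70, 99, 100, 74, 70, 85, 72, 75,
        80, 95, 95, 85, 60, 85, 90, 80, 90, 92, 95, 98, 65, 80, 89]
print("\n".join(stem_and_leaf(data)))
```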

Figure 3.1 Box Plot Illustration

In a standard box plot, the box extends from the first quartile to the third quartile. The position of the median is indicated inside the box.

The “whiskers” extend to the largest and smallest values.

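[Box plot on a horizontal axis running from 220 to 250, labeled with the smallest value, Q1, Q2 (median), Q3, and the largest value; the box covers the middle 50% of the values.]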

Figure 3.2 A Second Box Plot

This box plot represents a symmetric data set, with the median centered inside the box.

[Box plot on a horizontal axis running from 220 to 250.]

Identifying Outliers

• 1.5 x Interquartile Range

• Chebyshev’s Rule

• Empirical Rule
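The 1.5 x IQR approach is commonly applied by flagging any value below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). The sketch below assumes that convention and finds Q1 and Q3 with Percentile Rules 1 and 2; the names `percentile` and `iqr_outliers` and the small test list are hypothetical.

```python
def percentile(values, p):
    # Percentile Rules 1 and 2 (sketch; assumes an int p with 0 < p < 100).
    ordered = sorted(values)
    numerator = len(ordered) * p
    if numerator % 100 == 0:
        pos = numerator // 100
        return (ordered[pos - 1] + ordered[pos]) / 2
    pos = -(-numerator // 100)
    return ordered[pos - 1]


def iqr_outliers(values, factor=1.5):
    """Return ((lower_fence, upper_fence), outliers) using factor * IQR fences."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    spread = q3 - q1
    lower, upper = q1 - factor * spread, q3 + factor * spread
    outliers = [v for v in values if v < lower or v > upper]
    return (lower, upper), outliers


print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# ((-4.5, 15.5), [100]): Q1 = 3, Q3 = 8, IQR = 5
```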

Chebyshev’s Rule (3.2)

For any set of values, at least

$\left(1 - \frac{1}{k^2}\right) \times 100\%$

of them will be within plus or minus k standard deviations of the mean, where k is a number greater than 1.
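Chebyshev's bound is easy to tabulate and to check against actual data. This sketch assumes the population standard deviation (`statistics.pstdev`); the function names are hypothetical, and the data are the 30 values from the stem-and-leaf illustration.

```python
import statistics


def chebyshev_min_fraction(k):
    """Lower bound 1 - 1/k**2 on the share of values within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k**2


def fraction_within(values, k):
    """Observed share of values within plus or minus k population SDs of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)


data = [89, 57, 82, 88, 55, 66, 65, 70, 99, 100, 74, 70, 85, 72, 75,
        80, 95, 95, 85, 60, 85, 90, 80, 90, 92, 95, 98, 65, 80, 89]
for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_fraction(k), 4), round(fraction_within(data, k), 4))
```

For every k > 1 the observed share should be at least the bound, whatever the shape of the data.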

The Empirical Rule

For a Bell-Shaped Distribution:

• 68.3% of the values will be within 1 standard deviation of the mean.

• 95.5% of the values will be within 2 standard deviations of the mean, and

• 99.7% (almost all) of the values will be within 3 standard deviations of the mean.
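The Empirical Rule percentages come from the normal (bell-shaped) distribution, where the share of values within plus or minus k standard deviations of the mean is erf(k / sqrt(2)). The sketch below uses Python's `math.erf`; the function name `normal_share_within` is hypothetical.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within plus or minus k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {normal_share_within(k):.2%}")
```

This prints 68.27%, 95.45%, and 99.73%, which the rule rounds to the figures listed above.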

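Figure 3.3 A Bell-Shaped (Normal) Distribution

[Normal curve over a horizontal axis marked -3 to 3 standard deviations, with the regions within ±1, ±2, and ±3 standard deviations of the mean labeled 68.3%, 95.5%, and 99.7%.]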

Calculating z scores (3.3)

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
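A z score measures how many standard deviations a value lies above (positive) or below (negative) the mean. In this sketch the helper name `z_score` and the small data set are hypothetical, and the population standard deviation is assumed.

```python
import statistics


def z_score(value, mean, sd):
    """(value - mean) / standard deviation."""
    return (value - mean) / sd


values = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(values)       # 5
sigma = statistics.pstdev(values)  # 2.0 (population standard deviation)
print([z_score(v, mu, sigma) for v in values])
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```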

Covariance (3.4) (Population)

$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
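Expression 3.4 translates directly into code. In this sketch the function name `population_covariance` and the paired lists are hypothetical; the three calls mirror the positive, negative, and zero cases of Figure 3.4.

```python
def population_covariance(x, y):
    """sigma_xy = sum((x_i - mu_x) * (y_i - mu_y)) / N."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n


print(population_covariance([1, 2, 3], [2, 4, 6]))  # positive: y rises with x
print(population_covariance([1, 2, 3], [6, 4, 2]))  # negative: y falls as x rises
print(population_covariance([1, 2, 3], [5, 5, 5]))  # zero: y does not change
```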

Figure 3.4 Covariance Possibilities

In (a), an upward-sloping line best describes the points, indicating a positive covariance. In (b), the downward-sloping line implies a negative covariance. In (c), the line has zero slope, which means a covariance of 0.

[Three scatter plots of y against x: (a) Positive, (b) Negative, (c) Zero]

Correlation Coefficient (3.5) (Population)

$\rho_{xy} = \frac{\sigma_{xy}}{(\sigma_x)(\sigma_y)}$
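Dividing the covariance by the product of the two standard deviations rescales it to lie between -1 and +1. This sketch uses population (divide-by-N) measures via `statistics.pstdev`; the function name `population_correlation` and the data are hypothetical.

```python
import statistics


def population_correlation(x, y):
    """rho_xy = sigma_xy / (sigma_x * sigma_y), all with divisor N."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and the same length")
    mu_x, mu_y = statistics.mean(x), statistics.mean(y)
    sigma_xy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n
    return sigma_xy / (statistics.pstdev(x) * statistics.pstdev(y))


print(population_correlation([1, 2, 3], [6, 4, 2]))        # a perfect downward line
print(population_correlation([1, 2, 3, 4], [2, 1, 4, 3]))  # a weaker upward trend
```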

Covariance (3.6) (Sample)

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
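The sample version differs from expression 3.4 only in using the sample means and the divisor n - 1. The function name `sample_covariance` and the data below are hypothetical; Python 3.10 and later also provide `statistics.covariance`, which uses the same n - 1 divisor.

```python
def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0, versus 4/3 with divisor N
```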

Correlation Coefficient (3.7) (Sample)

$r_{xy} = \frac{s_{xy}}{(s_x)(s_y)}$
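A sketch of the sample correlation coefficient, built from the divide-by-(n - 1) sample covariance and `statistics.stdev`. The function name `sample_correlation` and the data are hypothetical; Python 3.10 and later also offer `statistics.correlation` for Pearson's r.

```python
import statistics


def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y), all with divisor n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 pairs")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return s_xy / (statistics.stdev(x) * statistics.stdev(y))


print(sample_correlation([1, 2, 3, 4], [2, 1, 4, 3]))
```

For the same data, r equals the population value of expression 3.5, because the n - 1 divisors in the numerator and denominator cancel.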

Coefficient of Variation (3.8) (Population)

$CV = \frac{\sigma}{\mu}$
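The coefficient of variation expresses the standard deviation relative to the mean, so data sets with different units or scales can be compared. This sketch assumes the standard population definition, sigma / mu, often reported as a percentage; the function name and the data set are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population CV = sigma / mu; multiply by 100 for a percentage."""
    mu = statistics.mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(values) / mu


cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
print(f"CV = {cv:.2f} ({cv:.0%})")  # CV = 0.40 (40%): sigma = 2, mu = 5
```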

Geometric Mean (Version 1) (3.9)

$GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)}$
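The first version of the geometric mean is the nth root of the product of n positive values. The sketch below uses `math.prod` (Python 3.8+) and checks the result against `statistics.geometric_mean`; the function name and the growth factors are hypothetical.

```python
import math
import statistics


def geometric_mean(values):
    """nth root of x1 * x2 * ... * xn; all values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.prod(values) ** (1 / len(values))


growth_factors = [1.10, 0.95, 1.20]  # hypothetical period-to-period growth ratios
print(geometric_mean(growth_factors))
print(statistics.geometric_mean(growth_factors))  # standard-library cross-check
```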

Geometric Mean (Version 2) (3.10)

$GM = \sqrt[n]{\frac{\text{Ending Amount}}{\text{Beginning Amount}}}$
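The second version gives the average growth factor per period directly from the beginning and ending amounts. This sketch assumes n is the number of growth periods between the two amounts; the function name and the figures (100 growing to 121 over 2 periods) are hypothetical.

```python
def growth_geometric_mean(beginning, ending, n):
    """nth root of ending / beginning: the average growth factor per period."""
    if beginning <= 0 or ending <= 0 or n < 1:
        raise ValueError("amounts must be positive and n must be at least 1")
    return (ending / beginning) ** (1 / n)


gm = growth_geometric_mean(100, 121, 2)
print(f"GM = {gm:.4f}, average growth rate = {gm - 1:.1%}")  # GM = 1.1000, 10.0%
```

Because the ending amount equals the beginning amount times the product of the period growth factors, this agrees with the first version applied to those factors (here 1.10 and 1.10).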

Weighted Average (3.11)

$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$
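A weighted average multiplies each value by its weight, sums the products, and divides by the total weight. The function name `weighted_average` and the exam/quiz example are hypothetical.

```python
def weighted_average(values, weights):
    """x_w = sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to 0")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical: an exam score of 90 counts three times as much as a quiz score of 80.
print(weighted_average([90, 80], [3, 1]))  # 87.5
```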