Statr sessions 4 to 6

Date post: 27-Jan-2015
Upload: ruruchowdhury
Description:
Praxis Weekend Analytics
Transcript
Page 1: Statr sessions 4 to 6

Measures of Variability: Ungrouped Data

• Measures of Variability: tools that describe the spread or dispersion of a set of data
  – Provide more meaningful information when used
    • with measures of central tendency
    • in comparison to other groups

Page 2: Statr sessions 4 to 6

Measures of Spread or Dispersion: Ungrouped Data

• Common Measures of Variability
  – Range
  – Interquartile Range
  – Mean Absolute Deviation
  – Variance and Standard Deviation
  – Coefficient of Variation

Page 3: Statr sessions 4 to 6

Range

• The difference between the largest and the smallest values in a set of data
  – Advantage: easy to compute
  – Disadvantage: affected by extreme values
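As a minimal sketch, the range can be computed in Python as follows. The five values are the ones used in the deviation examples later in these slides; the value 100 is a made-up outlier, added only to illustrate the disadvantage.

```python
# Values taken from the worked examples later in the slides.
data = [5, 9, 16, 17, 18]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


data_range = value_range(data)

# Hypothetical outlier (not from the slides) to show sensitivity to extremes.
range_with_outlier = value_range(data + [100])

print(data_range)          # 13
print(range_with_outlier)  # 95
```

A single extreme value moves the range from 13 to 95, even though the other five observations are unchanged.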

Page 4: Statr sessions 4 to 6

Interquartile Range

• Interquartile Range: the range of values between the first and third quartiles

• Range of the “middle half” (the middle 50%) of the data
  – Useful when researchers are interested in the middle 50%, and not the extremes

Interquartile Range (IQR) = $Q_3 - Q_1$
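A short sketch of the IQR as the third quartile minus the first, using Python's standard `statistics` module. The choice of `method="inclusive"` is an assumption; quartile conventions differ between textbooks and software, so other tools may report slightly different cut points for the same data.

```python
from statistics import quantiles

# Values taken from the worked examples later in the slides.
data = [5, 9, 16, 17, 18]

# n=4 splits the data into quarters and returns the three cut points.
# method="inclusive" is one of several quartile conventions (an assumption here).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 9.0 17.0 8.0
```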

Page 5: Statr sessions 4 to 6

Deviations from the mean

• Useful for interval or ratio level data

• An examination of deviations from the mean can reveal information about the variability of the data
  – Deviations are used mostly as a tool to compute other measures of variability

• However, the sum of deviations from the arithmetic mean is always zero: $\sum (x - \bar{x}) = 0$

• There are two ways to solve this conundrum…
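The zero-sum property can be checked directly. This sketch uses the five values from the worked examples on the following slides; with non-integer data, floating-point rounding may leave a tiny nonzero remainder instead of an exact zero.

```python
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)            # 65 / 5 = 13.0
deviations = [x - mean for x in data]   # [-8.0, -4.0, 3.0, 4.0, 5.0]

# Negative and positive deviations cancel exactly.
total_deviation = sum(deviations)
print(total_deviation)  # 0.0
```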

Page 6: Statr sessions 4 to 6

Mean Absolute Deviation (MAD)

• One solution is to take the absolute value of each deviation around the mean. This is called the Mean Absolute Deviation

• Note that while the MAD is intuitively simple, it is rarely used in practice

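$x$     $x - \bar{x}$     $|x - \bar{x}|$
 5          -8                8
 9          -4                4
16           3                3
17           4                4
18           5                5
------------------------------------
65           0               24

$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{24}{5} = 4.8$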
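A minimal sketch of the MAD calculation for the five values in the table. The absolute deviations are 8, 4, 3, 4 and 5, which sum to 24, so the result is 24 / 5 = 4.8.

```python
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)                     # 13.0
abs_deviations = [abs(x - mean) for x in data]   # [8.0, 4.0, 3.0, 4.0, 5.0]

# Mean Absolute Deviation: average of the absolute deviations.
mad = sum(abs_deviations) / len(data)
print(mad)  # 4.8
```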

Page 7: Statr sessions 4 to 6

Sample Variance

• Another solution is to take the Sum of Squared Deviations (SSD) about the mean

• Sample Variance is the average of the squared deviations from the arithmetic mean, with the sum divided by n - 1 rather than n

• Sample Variance is denoted by $s^2$

Why Sum of Squared Deviations about the mean?
– Squaring the deviations removes the sign
– Large deviations are amplified, so they carry more weight

Page 8: Statr sessions 4 to 6

Calculation of Sample Variance

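$x$     $x - \bar{x}$     $(x - \bar{x})^2$
 5          -8               64
 9          -4               16
16           3                9
17           4               16
18           5               25
------------------------------------------------
$\sum x = 65$     $\sum (x - \bar{x}) = 0$     $\sum (x - \bar{x})^2 = 130$

Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{130}{4} = 32.5$

Degrees of Freedom: $n - 1 = 4$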
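The calculation can be reproduced step by step. This is a sketch that computes the sum of squared deviations by hand and then cross-checks the result against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

data = [5, 9, 16, 17, 18]
n = len(data)

mean = sum(data) / n                        # 13.0
ssd = sum((x - mean) ** 2 for x in data)    # 64 + 16 + 9 + 16 + 25 = 130.0

# Divide by the degrees of freedom (n - 1), not by n.
sample_variance = ssd / (n - 1)
print(sample_variance)  # 32.5

# The standard library gives the same sample variance.
library_variance = variance(data)
```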

Page 9: Statr sessions 4 to 6

Sample Standard Deviation

• Sample standard deviation is the square root of the sample variance

• Denoted by $s$

• Benefit: same units as the original data

$s = \sqrt{s^2} = \sqrt{32.5} \approx 5.7$
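As a quick check, the sample standard deviation of the same five values can be obtained with `statistics.stdev`, which takes the square root of the n - 1 variance.

```python
from statistics import stdev

data = [5, 9, 16, 17, 18]

# Square root of the sample variance (32.5 for these values).
s = stdev(data)
print(round(s, 1))  # 5.7
```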

Page 10: Statr sessions 4 to 6

Standard Deviation: Empirical Rule

If a variable is normally distributed, then:

1. Approximately 68% of the observations lie within 1 standard deviation of the mean

2. Approximately 95% of the observations lie within 2 standard deviations of the mean

3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean

Notes:
• Also applies to populations
• Can be used to check whether a distribution is approximately normal
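The three percentages come from the normal distribution itself. A small sketch using `statistics.NormalDist` (Python 3.8+) shows the exact coverage that the rule rounds to 68%, 95% and 99.7%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

# Probability of falling within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```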

Page 11: Statr sessions 4 to 6

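Standard Deviation: Empirical Rule

[Figure: normal curve with the horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. About 68% of the area lies within $\bar{x} \pm s$, 95% within $\bar{x} \pm 2s$, and 99.7% within $\bar{x} \pm 3s$.]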

Page 12: Statr sessions 4 to 6

A Note about the Empirical Rule

1. Find the mean and standard deviation for the data

2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean

3. Compare these actual proportions with those given by the empirical rule

4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed

Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed
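The four steps above can be sketched as below. The data are simulated (a seeded random normal sample with an arbitrary mean of 50 and standard deviation of 10), and the helper name `proportions_within` is hypothetical; none of this comes from the slides. Closeness to 68/95/99.7 is a rough check, not a formal test of normality.

```python
import random
from statistics import mean, stdev


def proportions_within(values, ks=(1, 2, 3)):
    """Steps 1-2: find the mean and sd, then the share of values within k sd."""
    m = mean(values)
    s = stdev(values)
    return {k: sum(abs(x - m) <= k * s for x in values) / len(values) for k in ks}


# Simulated data, for illustration only.
random.seed(42)
sample = [random.gauss(50, 10) for _ in range(10_000)]

observed = proportions_within(sample)
expected = {1: 0.68, 2: 0.95, 3: 0.997}

# Step 3: compare observed proportions with the empirical rule.
for k in expected:
    print(f"{k} sd: observed {observed[k]:.3f}, empirical rule {expected[k]}")
```

Step 4 is then a judgment call: if the observed proportions are reasonably close to the rule, the data can be treated as approximately normal.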

Page 13: Statr sessions 4 to 6

z Scores

• z score: the number of standard deviations a value ($x$) is above or below the mean of a set of numbers when the data are normally distributed

• A z score translates a value’s raw distance from the mean into units of standard deviations

• z scores typically range from -3.00 to +3.00

• z scores may be used to make comparisons of raw scores

For samples, $z = \frac{x - \bar{x}}{s}$

For populations, $z = \frac{x - \mu}{\sigma}$
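A minimal sketch of the sample formula, applied to the five values used in the earlier variance example (mean 13, sample standard deviation about 5.7). Reusing that data here is a choice made for illustration.

```python
from statistics import mean, stdev

data = [5, 9, 16, 17, 18]

x_bar = mean(data)   # 13
s = stdev(data)      # about 5.70

# Sample z score: distance from the mean in units of standard deviations.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:2d}  z = {z:+.2f}")
```

The smallest value, 5, lies about 1.4 standard deviations below the mean, and the largest, 18, about 0.88 above it.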

Page 14: Statr sessions 4 to 6

Coefficient of Variation (C.V.)

• Coefficient of Variation (CV) – measures the volatility of a value (perhaps a stock portfolio), relative to its mean. It’s the ratio of the standard deviation to the mean, expressed as a percentage

• Useful when comparing standard deviations computed from data with different means

• Measurement of relative dispersion

C.V. = $\frac{\sigma}{\mu} \times 100\%$

Page 15: Statr sessions 4 to 6

Coefficient of Variation (C.V.)

Consider two different populations, with coefficients of variation of 15.86% and 11.90%

Since 15.86 > 11.90, the first population is more variable, relative to its mean, than the second population
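A sketch of the comparison in code. The means and standard deviations below are assumed values, chosen only because they reproduce the 15.86 and 11.90 figures quoted on this slide; the slide text itself does not state the underlying numbers.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean, expressed as a percentage."""
    return std_dev / mean * 100


# Assumed population parameters (illustrative, not taken from the slides).
cv_1 = coefficient_of_variation(std_dev=4.6, mean=29)
cv_2 = coefficient_of_variation(std_dev=10, mean=84)

print(round(cv_1, 2))  # 15.86
print(round(cv_2, 2))  # 11.9
```

Under these assumptions the second population has the larger standard deviation (10 versus 4.6) but the smaller CV, which is why the CV is the better basis for comparison when the means differ.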

Page 16: Statr sessions 4 to 6

Calculation of Grouped Mean

Sometimes data are already grouped, and we are interested in calculating summary statistics

Interval        Frequency (f)   Midpoint (M)    f*M
20-under 30           6              25          150
30-under 40          18              35          630
40-under 50          11              45          495
50-under 60          11              55          605
60-under 70           3              65          195
70-under 80           1              75           75
Total                50                         2150

$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
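The grouped mean weights each class midpoint by its frequency. A minimal sketch using the table above, where the `(lower, upper, frequency)` tuple layout is just one convenient representation:

```python
# (lower limit, upper limit, frequency) for each class interval.
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]

total_f = sum(f for _, _, f in classes)                     # 50
sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)    # 2150.0

# Grouped mean: sum of f*M divided by the sum of f.
grouped_mean = sum_fm / total_f
print(grouped_mean)  # 43.0
```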

Page 17: Statr sessions 4 to 6

Median of Grouped Data - Example

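$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} W = 40 + \frac{25 - 24}{11}(10) = 40.909$

$L$: lower limit of the median class interval
$cf_p$: cumulative total of frequencies up to, but not including, the median class
$f_{med}$: frequency of the median class
$W$: class width

Class Interval   Frequency   Cumulative Frequency
20-under 30           6               6
30-under 40          18              24
40-under 50          11              35
50-under 60          11              46
60-under 70           3              49
70-under 80           1              50
N = 50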
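A sketch of the grouped median by interpolation within the median class. The function name `grouped_median` and the tuple layout are illustrative choices. The median class is the first class whose cumulative frequency reaches N/2 = 25, here 40-under 50.

```python
# (lower limit, upper limit, frequency) for each class interval.
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]


def grouped_median(intervals):
    """Median = L + ((N/2 - cf_p) / f_med) * W, interpolating in the median class."""
    n = sum(f for _, _, f in intervals)
    cf_before = 0  # cumulative frequency before the current class (cf_p)
    for lower, upper, f in intervals:
        if cf_before + f >= n / 2:
            width = upper - lower
            return lower + (n / 2 - cf_before) / f * width
        cf_before += f
    raise ValueError("no classes supplied")


median = grouped_median(classes)
print(round(median, 3))  # 40.909
```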

Page 18: Statr sessions 4 to 6

Mode of Grouped Data

$\text{Mode} = \frac{30 + 40}{2} = 35$

• Mode: midpoint of the modal class
• Modal class: the class with the greatest frequency

Class Interval   Frequency
20-under 30           6
30-under 40          18
40-under 50          11
50-under 60          11
60-under 70           3
70-under 80           1

Page 19: Statr sessions 4 to 6

Variance and Standard Deviation of Grouped Data

Page 20: Statr sessions 4 to 6

Variance and Standard Deviation of Grouped Data

Class Interval    $f$    $M$    $fM$    $M - \mu$    $(M - \mu)^2$    $f(M - \mu)^2$
20-under 30        6     25     150       -18           324             1944
30-under 40       18     35     630        -8            64             1152
40-under 50       11     45     495         2             4               44
50-under 60       11     55     605        12           144             1584
60-under 70        3     65     195        22           484             1452
70-under 80        1     75      75        32          1024             1024
Total             50           2150                                     7200
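A sketch of how the table's totals lead to a variance and standard deviation. The population form (divide by N) is assumed here because the slides refer to populations elsewhere; the sample form (divide by N - 1) is shown alongside, since the text does not say which divisor is intended.

```python
from math import sqrt

# (class midpoint M, frequency f) taken from the table.
midpoints_and_freqs = [(25, 6), (35, 18), (45, 11), (55, 11), (65, 3), (75, 1)]

n = sum(f for _, f in midpoints_and_freqs)                       # 50
grouped_mean = sum(f * m for m, f in midpoints_and_freqs) / n    # 43.0

# Sum of f * (M - mean)^2 across classes.
ss = sum(f * (m - grouped_mean) ** 2 for m, f in midpoints_and_freqs)  # 7200.0

# Assumption: population formulas (divide by N).
pop_variance = ss / n            # 144.0
pop_std_dev = sqrt(pop_variance) # 12.0

# Alternative: sample formula (divide by N - 1).
sample_variance = ss / (n - 1)

print(pop_variance, pop_std_dev)  # 144.0 12.0
```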

Page 21: Statr sessions 4 to 6

Measures of Shape - Skewness

• Symmetrical – the right half is a mirror image of the left half

• Skewed – the distribution lacks symmetry; the data are sparse at one end and piled up at the other
  – Absence of symmetry
  – Extreme values or “tail” on one side of a distribution
  – Positively- or right-skewed vs. negatively- or left-skewed

Page 22: Statr sessions 4 to 6

Measures of Shape - Skewness

[Figure: two density curves, each plotted with x from 0 to 20 and y from 0.00 to 0.15]

Positively- or right-skewed vs. negatively- or left-skewed

Page 23: Statr sessions 4 to 6

5-Number Summary

1. L, the smallest value in the data set

2. Q1, the first quartile (also P25)

3. $\tilde{x}$, the median (also P50 and 2nd quartile)

4. Q3, the third quartile (also P75)

5. H, the largest value in the data set

The 5-number summary indicates how much the data is spread out in each quarter

Page 24: Statr sessions 4 to 6

Box-and-Whisker Plot

A graphic representation of the 5-number summary:

• The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal

• The box is used to depict the middle half of the data that lies between the two quartiles

• The whiskers are line segments used to depict the other half of the data
  – One line segment represents the quarter of the data that is smaller in value than the first quartile
  – The second line segment represents the quarter of the data that is larger in value than the third quartile

Page 25: Statr sessions 4 to 6

Example: Box-and-Whisker Plot

Example: A random sample of students in a sixth grade class was selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:

63 64 76 76 81 83 85 86 88 89 90 91 92 93 93 93 94 97 99 99 99 101 108 109 112

L = 63     Q1 = 85     $\tilde{x}$ = 92     Q3 = 99     H = 112
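The 5-number summary for the 25 weights can be reproduced with the standard library. This sketch assumes the `inclusive` quartile method of `statistics.quantiles`, which happens to agree with the values shown here; other quartile conventions can differ slightly on other data sets.

```python
from statistics import quantiles

weights = [
    63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
    93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112,
]

# Three cut points that split the sorted data into quarters.
q1, median, q3 = quantiles(weights, n=4, method="inclusive")

# (L, Q1, median, Q3, H)
summary = (min(weights), q1, median, q3, max(weights))
print(summary)  # (63, 85.0, 92.0, 99.0, 112)
```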

Page 26: Statr sessions 4 to 6

Example: Box-and-Whisker Plot

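[Figure: horizontal box-and-whisker plot titled "Weights from Sixth Grade Class", with the Weight axis running from 60 to 110 and L, Q1, $\tilde{x}$, Q3, and H marked]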

