7/28/2019 Lecture 03. Numerical Descriptive Statistics
Statistics
ST 361: Statistics for Engineers
Numerical Descriptive Statistics
Kimberly Weems
5260 SAS Hall
Numeric Measures
Why?
Kim is in an introductory history class. On the midterm exam Kim scored 64 out of 100. Did she do well?
The class average was 42.
Knowing the class average lets us make a comparison.
Allow us to make comparisons:
  Of individuals to the group
  Of groups to other groups
Measures of center:
  Give an idea of where the main chunk of the data lies
Measures of Central Tendency
Mean: the average
Notation:
  Population mean: $\mu$ (mu)
  Sample mean: $\bar{y}$ (y-bar)
Summation Notation
$$\sum_{i=1}^{n} y_i = y_1 + y_2 + y_3 + \dots + y_n$$
$\sum$: sum of the $y_i$
$y_1, y_2, \dots, y_n$: the individual values
$$\bar{y} = \frac{\sum y_i}{n}$$
Numerator $\sum y_i$: sum of the values
Denominator $n$: sample size
Median: the middle value in a data set when the values are put in increasing order
50% of the values lie above and 50% below
If the number of observations is even, average the middle two.
Simple Example:
A health researcher examined the amount of soda that a group of teenagers consumed
during a day. The resulting amounts in ounces
were: 9, 9, 6, 15, 12, 14, and 40.
Mean: 15
$$\bar{y} = \frac{\sum y_i}{n} = \frac{9+9+6+15+12+14+40}{7} = \frac{105}{7} = 15$$
Soda consumed, in increasing order: 6 9 9 12 14 15 40
Median: 12
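The mean and median of the soda example can be checked with a few lines of Python. This is only a minimal sketch: the names `soda_oz` and `middle_value` are made up for illustration and are not part of the lecture.

```python
from statistics import median

# Soda amounts in ounces, taken from the example.
soda_oz = [9, 9, 6, 15, 12, 14, 40]

# Sample mean: sum of the values divided by the sample size.
y_bar = sum(soda_oz) / len(soda_oz)

def middle_value(values):
    """Median by hand: sort, then take the middle value
    (or average the middle two when the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(y_bar)                  # 15.0
print(middle_value(soda_oz))  # 12
print(median(soda_oz))        # 12, the standard library agrees
```

With an even number of observations, for instance `[1, 2, 3, 4]`, `middle_value` averages the two middle values and returns 2.5.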
Mean vs Median
[Figure: dot plot of the soda amounts (9, 9, 6, 15, 12, 14, and 40 ounces) on an axis from 0 to 40, with the mean (15) and the median (12) marked.]
Problem with the mean:
Sensitive to unusual values and skewed data; these pull it away from the median
  Skewed right: mean greater than median
  Skewed left: mean less than median
  Symmetric: mean and median are about the same
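One way to see this sensitivity is to recompute both measures for the soda data with and without the unusual value of 40. The sketch below assumes it is fair to drop that one observation purely for illustration; the variable names are hypothetical.

```python
from statistics import mean, median

soda_oz = [9, 9, 6, 15, 12, 14, 40]                 # ounces, from the soda example
without_40 = [y for y in soda_oz if y != 40]        # illustration only: drop the unusual value

# How far each measure moves when the single large value is removed.
mean_shift = mean(soda_oz) - mean(without_40)
median_shift = median(soda_oz) - median(without_40)

print(mean(soda_oz), median(soda_oz))        # mean above median: skewed right
print(mean(without_40), median(without_40))  # the two are now close together
print(mean_shift, median_shift)              # the mean moves much more than the median
```

The mean drops from 15 to about 10.83, while the median only moves from 12 to 10.5.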
Trimmed Mean
A compromise between the mean and the median.
Less sensitive to outliers than the mean.
Observations are ordered from smallest to largest. A trimming percentage 100r% is chosen, where r is
a number between 0 and 0.5.
Suppose r = 0.1, so that the trimming percentage is
10%. Then if n = 20, 10% of 20 is 2: the trimmed mean results from deleting (trimming) the 2 largest
and the 2 smallest observations and averaging the remaining 16.
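The trimming procedure can be sketched in Python as follows. The function name `trimmed_mean` and the 20 data values are invented for illustration. The lecture does not say what to do when r times n is not a whole number, so rounding down here is an assumption.

```python
def trimmed_mean(values, r):
    """Mean after deleting the k smallest and k largest observations.

    k = int(r * n) rounds down; that rounding rule is an assumption,
    and floating-point r * n can land just below a whole number.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    k = int(r * n)                    # number trimmed from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# Hypothetical data: n = 20 with one extreme value.
data = list(range(1, 20)) + [1000]

print(sum(data) / len(data))    # ordinary mean: 59.5
print(trimmed_mean(data, 0.1))  # 10% trimmed mean: 10.5
```

With r = 0.1 and n = 20, two observations are removed from each end, so the extreme value 1000 no longer affects the result.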
[Figure: data before and after a 2% trim. Source: Coal Emissions Uncertainty Project (2009-10), Alissa Anderson, Colin Geisenhoffer, Brody Heffner, Michael Shaw & Emily Wisner.]
Measures of Variability
Why?
  They tell us about consistency and predictability
  They allow comparison of groups
  They give a scale of reference for comparing individuals
Range: the difference between the maximum and the minimum
How spread out are the values?
Soda amounts: Range = 40 - 6 = 34
[Figure: dot plot of the soda amounts on an axis from 0 to 40.]
Problem: The range only looks at two values. It does not quantify the spread of the others.
Solution: Look at all values. How far are they from the mean?
Variance: summarizes the distance between all
individuals and the mean
Important notation:
  Population variance: $\sigma^2$ (sigma squared)
  Sample variance: $s^2$
Important Formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$$
Each distance $(y_i - \mu)$ is squared to get rid of negatives
The formula calculates the average of the squared distances
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$
Numerator: the sum of squares
Denominator: divides by $n-1$ instead of $N$
Simple Example:
A health researcher examined the amount of soda that a group of teenagers consumed
during a day. The resulting amounts in ounces
were: 9, 9, 6, 15, 12, 14, and 40.
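As a rough sketch, the sample variance of the soda amounts can be computed step by step in Python, following the formula with n - 1 in the denominator. The variable names are illustrative, and the standard library's `statistics.variance` is used only as a cross-check.

```python
from statistics import variance

soda_oz = [9, 9, 6, 15, 12, 14, 40]   # ounces
n = len(soda_oz)
y_bar = sum(soda_oz) / n              # sample mean, 15.0

# Squared distance of each observation from the mean.
squared_devs = [(y - y_bar) ** 2 for y in soda_oz]
sum_of_squares = sum(squared_devs)    # 36 + 36 + 81 + 0 + 9 + 1 + 625 = 788.0

# Sample variance: divide the sum of squares by n - 1, not n.
s2 = sum_of_squares / (n - 1)

print(sum_of_squares)          # 788.0
print(round(s2, 2))            # 131.33
print(variance(soda_oz))       # library value, same up to floating-point rounding
```

Most of the sum of squares (625 of 788) comes from the single observation of 40 ounces.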
What does it tell us?
By itself, not much. Some people try lots of tricks to recreate
the data set from this number.
The purpose of the number is to make a
comparison with other data sets.
Example: Another group of teens had soda
consumption with a variance of 473.2.
That group was more spread out than our group, whose variance is 131.33.
The Standard Deviation
Variance is not on the same scale as the original data.
Standard deviation: the square root of the variance.
  Has the same units as the original data
  Allows more direct comparisons
For amount of soda:
$$s = \sqrt{s^2} = \sqrt{131.33} = 11.46$$
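The same figure can be reproduced in Python by taking the square root of the sample variance. This is a small sketch with illustrative variable names; `statistics.stdev` from the standard library serves as a cross-check.

```python
import math
from statistics import stdev

soda_oz = [9, 9, 6, 15, 12, 14, 40]   # ounces
n = len(soda_oz)
y_bar = sum(soda_oz) / n

# Sample variance (divides by n - 1), in squared ounces.
s2 = sum((y - y_bar) ** 2 for y in soda_oz) / (n - 1)

# Standard deviation: square root of the variance, back in ounces.
s = math.sqrt(s2)

print(round(s, 2))      # 11.46
print(stdev(soda_oz))   # library value, same up to floating-point rounding
```

Because `s` is in ounces, it can be compared directly with the data values and the mean of 15 ounces.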
What does it tell us?
Helps us understand variability in the data: which is more consistent?
City Temperature
City      Raleigh   Fargo   Fairbanks   Honolulu
Mean      59        42      28          77
Median    61        43      31          77
SD        15        24      28          3
Coefficient of Variation
Coefficient of Variation (CV): the ratio of the standard deviation to the mean
Used to compare variability when scales are very
different.
$$\text{C.V.} = \frac{s}{\bar{y}}$$
Example:
Students in a midwestern state take an end-of-grade exam that has a maximum of 100 points.
A class testing a new teaching method had a
standard deviation of 10.
Students in an east coast state take an end-of-grade
exam that has a maximum of 500 points.
A class testing the new teaching method had a
standard deviation of 30. Which is more
varied?
The mean for the midwestern state was 70.
The mean for the east coast state was 350.
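Midwestern state:
$$\text{C.V.} = \frac{s}{\bar{y}} = \frac{10}{70} \approx 0.143 = 14.3\%$$
East coast state:
$$\text{C.V.} = \frac{s}{\bar{y}} = \frac{30}{350} \approx 0.086 = 8.6\%$$
Relative to its mean, the midwestern class is more varied.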
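The two ratios can be checked with a short Python sketch. The function name `coefficient_of_variation` is hypothetical; it simply divides the standard deviation by the mean, as in the C.V. definition.

```python
def coefficient_of_variation(s, y_bar):
    """Ratio of the standard deviation s to the mean y_bar."""
    if y_bar == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return s / y_bar

# Standard deviations and means from the two classes in the example.
cv_midwest = coefficient_of_variation(10, 70)    # exam out of 100 points
cv_east = coefficient_of_variation(30, 350)      # exam out of 500 points

print(f"{cv_midwest:.1%}")  # 14.3%
print(f"{cv_east:.1%}")     # 8.6%
```

Although the east coast class has the larger standard deviation (30 versus 10), its C.V. is smaller because its scores sit on a much larger scale.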