Probabilistic & Statistical Techniques
Eng. Tamer Eshtawi
First Semester 2007-2008
Lecture 5
Chapter 2 (part 3)
Statistics for Describing Data
Main Reference: Pearson Education, Inc., publishing as Pearson Addison-Wesley.
Section 3-4: Measures of Position
Key Concept
This section introduces measures that can be used to compare values from different data sets, or to compare values within the same data set. The most important of these is the concept of the z score.
Measures of Position: z Score
Definition
z Score (or standardized value): the number of standard deviations that a given value x is above or below the mean.
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round z to 2 decimal places.
Interpreting Z Scores
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: z score between −2 and 2
Unusual values: z score < −2 or z score > 2
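The interpretation rule can be checked with a few lines of Python. This is a minimal sketch: `z_score` and `classify` are hypothetical helper names, and the illustrative mean 98.20 and standard deviation 0.62 are the values used in Example 3 of the General Examples. Values exactly at ±2 are treated as ordinary, since the unusual range uses strict inequalities.

```python
def z_score(x, mean, sd):
    """Standard deviations that x lies above (+) or below (-) the mean, rounded to 2 decimals."""
    return round((x - mean) / sd, 2)


def classify(z):
    """'ordinary' for -2 <= z <= 2, 'unusual' for z < -2 or z > 2."""
    return "ordinary" if -2 <= z <= 2 else "unusual"


# Illustrative values: mean 98.20, standard deviation 0.62.
for x in (100, 96.96, 98.20):
    z = z_score(x, 98.20, 0.62)
    print(x, z, classify(z))
```

Rounding to 2 decimals before classifying (as the slides instruct) also keeps floating-point noise from pushing a z score of exactly −2.00 into the unusual range. A value below the mean, such as 96.96, gets a negative z score, matching the first rule.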
Quartiles
Definition
Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Q1, Q2, Q3 divide ranked scores into four equal parts.
[Figure: sorted data from the minimum to the maximum, divided by Q1, Q2 (median) and Q3 into four parts of 25% each]
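Find the Lower & Upper Quartiles
To find Q1, first calculate one-quarter of n and add ½ to obtain ¼n + ½; this is the position of Q1 in the sorted data. If it ends in .5, Q1 is the mean of the two values on either side of that position; otherwise, round it to the nearest integer. Q3 is found the same way, counting positions down from the largest value.
Example 1: 1 1 2 3 3 8 11 14 19 19 20
n = 11, so ¼n + ½ = ¼(11) + ½ = 3.25, rounded to 3: Q1 = 2 (3rd value from the bottom) and Q3 = 19 (3rd value from the top)
Example 2: 2 5 5 6 7 10 15 21 21 23 23 25
n = 12, so ¼n + ½ = ¼(12) + ½ = 3.5: Q1 lies between positions 3 & 4, so Q1 = (5 + 6)/2 = 5.5
Q3 lies between positions 9 & 10, so Q3 = (21 + 23)/2 = 22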
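A minimal Python sketch of the quartile rule, assuming it is applied as in the two examples: the position ¼n + ½ is used as-is when it is a whole number, the two neighbouring values are averaged when it ends in .5, and any other position is rounded to the nearest integer; Q3 is located by counting the same number of positions down from the largest value. `quartiles` is a hypothetical helper name.

```python
def quartiles(data):
    """Return (Q1, Q3) using the position n/4 + 1/2, counted from each end of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    # 4 * (n/4 + 1/2) = n + 2, so the position is q + r/4 with integer q and r.
    q, r = divmod(n + 2, 4)
    if r == 2:
        # Position ends in .5: average the two neighbouring values.
        q1 = (xs[q - 1] + xs[q]) / 2
        q3 = (xs[-q] + xs[-q - 1]) / 2
    else:
        # .0 stays, .25 rounds down, .75 rounds up.
        k = q + 1 if r == 3 else q
        q1 = xs[k - 1]  # k-th value from the bottom
        q3 = xs[-k]     # k-th value from the top
    return q1, q3


print(quartiles([1, 1, 2, 3, 3, 8, 11, 14, 19, 19, 20]))        # (2, 19)
print(quartiles([2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25]))   # (5.5, 22.0)
```

Working with n + 2 (four times the position) keeps the arithmetic in integers, so the ".5" test never depends on floating-point rounding.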
Percentiles
Just as there are three quartiles separating data into four parts, there are 99 percentiles denoted P1, P2, . . . P99, which partition the data into 100 groups.
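$\text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$ (rounded to the nearest whole number)
Notation
n: total number of values in the data set
k: percentile being used
L: locator that gives the position of a value in the sorted data
Converting from the kth Percentile to the Corresponding Data Value
$L = \frac{k}{100} \cdot n$, the location of Pk
If L is a whole number, Pk is midway between the Lth value and the next value; otherwise, round L up to the next larger whole number and take the value in that position.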
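Both percentile formulas can be sketched in Python. The helper names are hypothetical, and the locator convention is an assumption that matches the worked example that follows: when L = (k/100)·n is a whole number, Pk is taken midway between the Lth and (L+1)th sorted values; otherwise L is rounded up to the next whole number. The demo data are the twelve values from quartile Example 2.

```python
def percentile_of(x, data):
    """Percentile of value x: (number of values less than x) / n * 100.

    Returned unrounded; round to a whole number when reporting.
    """
    below = sum(1 for v in data if v < x)
    return below / len(data) * 100


def kth_percentile(k, data):
    """Value of P_k for an integer k in 1..99, using the locator L = (k/100) * n.

    Assumed convention: a whole-number L means the midpoint of the L-th and
    (L+1)-th sorted values; otherwise L is rounded up to the next whole number.
    """
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:          # L is a whole number
        L = k * n // 100
        return (xs[L - 1] + xs[L]) / 2
    L = -(-k * n // 100)            # integer ceiling of k*n/100
    return xs[L - 1]


data = [2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25]
print(percentile_of(15, data))    # 6 of the 12 values are below 15
print(kth_percentile(25, data))   # L = 3 (whole number): midpoint of the 3rd and 4th values
print(kth_percentile(10, data))   # L = 1.2, rounded up to 2
```

Here P25 comes out as 5.5, the same as Q1 from quartile Example 2.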
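Example 1
Find the percentile corresponding to the weight 0.8143, and find P10 and P25 (n = 36).
Solution
Percentile of 0.8143 = (8/36) · 100 ≈ 22, so 0.8143 is the 22nd percentile.
P10: L = (10/100)(36) = 3.6, rounded up to 4, so P10 = 4th value = 0.8073
P25: L = (25/100)(36) = 9, a whole number, so P25 lies between the 9th and 10th values: P25 = (0.8143 + 0.815)/2 = 0.81465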
Some Other Statistics
Interquartile Range (IQR): $Q_3 - Q_1$
Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_3 + Q_1}{2}$
10-90 Percentile Range: $P_{90} - P_{10}$
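These four statistics are simple arithmetic once the quartiles and percentiles are known. The sketch below uses a hypothetical helper `other_statistics` and the twelve values 2 5 5 6 7 10 15 21 21 23 23 25 from quartile Example 2, for which Q1 = 5.5 and Q3 = 22; the inputs P10 = 5 and P90 = 23 assume the locator L = (k/100)·n is rounded up to the next whole number.

```python
def other_statistics(q1, q3, p10, p90):
    """Spread and centre measures built from quartiles and percentiles."""
    return {
        "IQR": q3 - q1,
        "semi-interquartile range": (q3 - q1) / 2,
        "midquartile": (q3 + q1) / 2,
        "10-90 percentile range": p90 - p10,
    }


# Quartile Example 2 data: 2 5 5 6 7 10 15 21 21 23 23 25
# Q1 = 5.5 and Q3 = 22 from the quartile rule; P10 = 5 and P90 = 23 assume the
# locator L = (k/100) * n is rounded up (1.2 -> 2nd value, 10.8 -> 11th value).
for name, value in other_statistics(q1=5.5, q3=22, p10=5, p90=23).items():
    print(f"{name}: {value}")
```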
Recap
In this section we have discussed:
z Scores
z Scores and unusual values
Quartiles
Percentiles
Other statistics
Section 3-5: Exploratory Data Analysis (EDA)
Key Concept
This section discusses outliers, then introduces a new statistical graph called a boxplot, which is helpful for visualizing the distribution of data.
Important Principles
An outlier can have a dramatic effect on the mean.
An outlier can have a dramatic effect on the standard deviation.
An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.
Definitions
For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value.
A boxplot is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
[Figures: Boxplots (several slides), including a boxplot example]
Recap
In this section we have looked at:
Exploratory Data Analysis
Effects of outliers
5-number summary
Boxplots
General Examples
Example 1
Find the mean, median, mode, and midrange.
x: 27 8 17 11 15 25 16 14 14 14 13 18   (Σx = 192, n = 12)
Sorted: 8 11 13 14 14 14 15 16 17 18 25 27
Solution
Mean: x̄ = Σx / n = 192/12 = 16
Median: (14 + 15)/2 = 14.5
Mode: 14
Midrange: (27 + 8)/2 = 17.5
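The Example 1 results can be reproduced with Python's standard `statistics` module (`mean`, `median`, `mode`); that module has no midrange function, so the midrange is computed directly from `min` and `max`.

```python
import statistics

x = [27, 8, 17, 11, 15, 25, 16, 14, 14, 14, 13, 18]

mean = statistics.mean(x)           # Σx / n = 192 / 12
median = statistics.median(x)       # even n: average of the 6th and 7th sorted values
mode = statistics.mode(x)           # most frequent value
midrange = (min(x) + max(x)) / 2    # halfway between the smallest and largest values

print("mean", mean, "median", median, "mode", mode, "midrange", midrange)
```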
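Example 2
Find the standard deviation and variance for each of the two samples.

Coke
x        x²
0.8192   0.6711
0.815    0.6642
0.8163   0.6663
0.8211   0.6742
0.8181   0.6693
0.8247   0.6801
Σ 4.9144 4.025

Pepsi
x        x²
0.8258   0.6819
0.8156   0.6652
0.8211   0.6742
0.817    0.6675
0.8216   0.675
0.8302   0.6892
Σ 4.9313 4.053

Solution
$s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}$, $s = \sqrt{s^2}$, with n = 6
Coke: s² = [6(4.025) − (4.9144)²] / [6(5)]
Pepsi: s² = [6(4.053) − (4.9313)²] / [6(5)]
The numerator is a small difference between two nearly equal numbers (about 24.15 − 24.15 for Coke and 24.32 − 24.32 for Pepsi), so the rounded x² column is not precise enough; for Coke it even gives a negative numerator. Carrying the squares unrounded (Σx² = 4.02528224 for Coke, 4.05310181 for Pepsi) gives:
Coke: s² = (24.15169344 − 24.15132736)/30 ≈ 0.0000122, s ≈ 0.00349
Pepsi: s² = (24.31861086 − 24.31771969)/30 ≈ 0.0000297, s ≈ 0.00545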
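A short Python check of Example 2 using the standard `statistics` module, whose `variance` and `stdev` compute the sample variance (divisor n − 1) and its square root without the cancellation problem. The hypothetical helper `shortcut_variance` evaluates the same shortcut formula and can optionally round each x² to four decimals, as the hand-made table does, to show how that rounding corrupts the result.

```python
import statistics

coke = [0.8192, 0.815, 0.8163, 0.8211, 0.8181, 0.8247]
pepsi = [0.8258, 0.8156, 0.8211, 0.817, 0.8216, 0.8302]


def shortcut_variance(xs, decimals=None):
    """s^2 = (n*Σx^2 - (Σx)^2) / (n(n - 1)).

    If decimals is given, each x^2 is rounded first, mimicking a hand-made table.
    """
    n = len(xs)
    sum_x = sum(xs)
    squares = [v * v if decimals is None else round(v * v, decimals) for v in xs]
    return (n * sum(squares) - sum_x ** 2) / (n * (n - 1))


for name, xs in (("Coke", coke), ("Pepsi", pepsi)):
    print(
        name,
        f"variance = {statistics.variance(xs):.3e}",   # sample variance (divisor n - 1)
        f"std dev = {statistics.stdev(xs):.5f}",       # square root of the sample variance
        f"shortcut, x^2 to 4 dp = {shortcut_variance(xs, 4):.3e}",
    )
```

With unrounded squares the shortcut formula agrees with `statistics.variance`; with four-decimal squares it returns a negative variance for Coke, which is impossible and signals lost precision.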
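Example 3
Given x̄ = 98.20 and s = 0.62, find the z score for a) x = 100, b) x = 96.96, c) x = 98.20.
Solution
a) z = (100 − 98.20)/0.62 = 2.90
b) z = (96.96 − 98.20)/0.62 = −2.00
c) z = (98.20 − 98.20)/0.62 = 0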
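Example 4
Find the indicated quartile or percentile: a) Q1, b) Q3, c) P80, d) P33 (n = 36)
Solution
Q1 position = ¼n + ½ = ¼(36) + ½ = 9.5 (between the 9th and 10th values; for Q3, counted from the top)
a) Q1 = (0.8143 + 0.815)/2 = 0.8147
b) Q3 = (0.8207 + 0.8211)/2 = 0.8209
c) L = (80/100)(36) = 28.8, rounded up to 29: P80 = 29th value = 0.8229
d) L = (33/100)(36) = 11.88, rounded up to 12: P33 = 12th value = 0.8152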
Example 5
Draw the boxplot for the following data set:
-1 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 6 6 7 13 Sum = 139
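Solution
n = 52, mean: x̄ = Σx / n = 139/52 ≈ 2.673
Mode = 2
Median = (26th value + 27th value)/2 = (2 + 2)/2 = 2
Q1: location = ¼n + ½ = ¼(52) + ½ = 13.5, between the 13th & 14th values: Q1 = (2 + 2)/2 = 2
Q3: 13.5 positions from the top, between the 13th & 14th values from the top: Q3 = (4 + 3)/2 = 3.5
Minimum value = −1, maximum value = 13
5-number summary: −1, 2, 2, 3.5, 13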
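The five-number summary behind this boxplot can be checked in Python. The sketch repeats the quartile position rule ¼n + ½ (averaging the two neighbouring values when the position ends in .5, and counting from the top for Q3); `five_number_summary` is a hypothetical helper, and drawing the plot itself is left out.

```python
def five_number_summary(data):
    """(min, Q1, median, Q3, max); quartiles use the position n/4 + 1/2 counted from each end."""
    xs = sorted(data)
    n = len(xs)
    q, r = divmod(n + 2, 4)  # quartile position n/4 + 1/2 = q + r/4
    if r == 2:
        # Position ends in .5: average the two neighbouring values.
        q1 = (xs[q - 1] + xs[q]) / 2
        q3 = (xs[-q] + xs[-q - 1]) / 2
    else:
        # .0 stays, .25 rounds down, .75 rounds up.
        k = q + 1 if r == 3 else q
        q1, q3 = xs[k - 1], xs[-k]
    mid = n // 2
    median = xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2
    return xs[0], q1, median, q3, xs[-1]


# Example 5 data, written compactly as value * count.
data = [-1] + [0] * 8 + [1] + [2] * 21 + [3] * 8 + [4] * 4 + [5] * 5 + [6] * 2 + [7, 13]
print(len(data), sum(data))          # 52 139
print(five_number_summary(data))     # (-1, 2.0, 2.0, 3.5, 13)
```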
Flash points
Which measure of center is the only one that can be used with data at the nominal level of measurement?
A. Mean
B. Median
C. Mode
Which of the following measures of center is not affected by outliers?
A. Mean
B. Median
C. Mode
Find the mode(s) of the given sample data.
79, 25, 79, 13, 25, 29, 56, 79
A. 79
B. 48.1
C. 42.5
D. 25
Which is not true about the variance?
A. It is the square of the standard deviation.
B. It is a measure of the spread of data.
C. The units of the variance are different from the units of the original data set.
D. It is not affected by outliers.
Mean weekly sales for a company are $10,000, with a standard deviation of $450. Sales for the past week were $9050. This is
A. Unusually high.
B. Unusually low.
C. About right.
In a data set with a range of 55.1 to 102.8 and 300 observations, there are 207 data points with values less than 88.6. Find the percentile for 88.6.
A. 32
B. 116.03
C. 69
D. 670
H.W 2: Find the mean, median, mode, midrange, range, standard deviation, variance, and P30, then draw the boxplot.
Data set: Age of US Presidents