Post on 14-May-2015
Descriptions of Data
Measures of Central Tendency
Definition: A Measure of Central Tendency has been defined as a statistic calculated from a set of observations or scores and designed to typify or represent that series. It is also defined as the tendency of the same observations or cases to cluster about a point, with respect either to an absolute value or to a frequency of occurrence; usually, but not necessarily, about midway between the extreme high and the extreme low values in the distribution.
The Mean
Definition: The arithmetic mean or simply the mean is the average of a group of measures.
Characteristics of the mean
1. The arithmetic mean, or simply the mean, is the center of gravity or balance point of a group of measures.
2. The mean is easily affected by a change in the magnitude of any of the measures.
3. The mean is the most reliable measure of central tendency because it is always the center of gravity of any group of measures.
Uses of the Mean
Compute the mean when
1. the mean of a group of measures is needed.
2. the center of gravity or balance point of a group of measures is wanted.
3. every measure should have an effect upon the measure of central tendency.
4. the most reliable measure of central tendency is desired.
5. the group from which the mean has been derived is more or less homogeneous and a more realistic mean is desired. For instance, the mean of the measures 11, 12, 13, 50, and 64 is 30, which is very far from any of the measures and therefore not realistic.
6. other statistical measures involving the mean are to be computed. Examples of such measures are the standard deviation, coefficient of correlation, critical ratio, etc.
Definition: The arithmetic mean or simply the mean of a data set is the sum of the values divided by the number of values. That is, if X1, X2, . . . , XN are the individual scores in a population of size N, then the population mean is defined as:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
Definition: If X1, X2, . . . , Xn are the individual scores in a sample of size n, then the sample mean is defined as:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Example 1: Find the mean of the following scores: 4, 10, 7, 5, 9, 7.
Example 2: A sample of n = 6 scores has a mean of M = 40. One new score is added to the sample and the new mean is found to be M = 42. What can you conclude about the value of the new score?
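Both examples can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not part of the original slides. Example 2 relies on the fact that the sum of the scores equals the mean times the number of scores.

```python
# Example 1: the mean is the sum of the scores divided by their count.
scores = [4, 10, 7, 5, 9, 7]
mean_scores = sum(scores) / len(scores)
print(mean_scores)  # 7.0

# Example 2: recover the added score from the two totals.
n_old, mean_old = 6, 40   # original sample: total = 6 * 40 = 240
n_new, mean_new = 7, 42   # after adding one score: total = 7 * 42 = 294
new_score = n_new * mean_new - n_old * mean_old
print(new_score)  # 54
```

The new score (54) is above the old mean, which is why the mean rose from 40 to 42.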
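Definition: For grouped data, that is, data placed in a frequency distribution table, the mean can be approximated by the following formula:
$$\mu = \frac{\sum fX}{N} \quad \text{or} \quad \bar{X} = \frac{\sum fX}{n}$$
where f is the frequency of a class and X is its class mark (midpoint).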
Example: Consider the following frequency distribution table of the 15 graduate behavioral statistics students.
Classes Frequency
10 – 19 5
20 – 29 4
30 – 39 3
40 – 49 2
50 – 59 1
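A minimal sketch of the grouped-data computation for this table, assuming each class is represented by its class mark (the midpoint of its limits). The tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [(10, 19, 5), (20, 29, 4), (30, 39, 3), (40, 49, 2), (50, 59, 1)]

n = sum(f for _, _, f in classes)                          # total frequency
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f * class mark
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 2))  # 15 417.5 27.83
```

The result is an approximation because every score in a class is treated as if it sat at the class mark.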
The Weighted Mean
Definition: The Weighted Mean is a variation of the arithmetic mean which assigns weight to the individual scores in a data set.
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
where $\bar{X}_W$ - the weighted mean
$W_i$ - the weight
$X_i$ - the individual scores
$n$ - number of cases
Example: Suppose we have determined the digit span for a brief time period in thirty-seven 4-year-olds. What is the mean digit span for our sample?
X f
6 2
5 7
4 17
3 5
2 3
1 2
0 1
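One way to work this example is to treat each frequency f as the weight of its score X, so the weighted mean becomes the sum of f times X divided by the sum of f. The dictionary below is an illustrative encoding of the table.

```python
# digit span X -> frequency f, copied from the table
digit_span = {6: 2, 5: 7, 4: 17, 3: 5, 2: 3, 1: 2, 0: 1}

n = sum(digit_span.values())                              # number of children
weighted_sum = sum(x * f for x, f in digit_span.items())  # sum of f * X
mean_span = weighted_sum / n

print(n, weighted_sum, round(mean_span, 2))  # 37 138 3.73
```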
Example: Consider the following item in a questionnaire.
Do you agree that the RH bill should be implemented?
Please check your attitude.
_____ Strongly agree
_____ Agree
_____ Fairly agree
_____ Disagree
_____ Strongly disagree
Suppose 10 individuals were asked to answer the preceding question and the following responses are obtained:
3 - Strongly Agree, 4 – Agree, 2 – Disagree, and 1 – Strongly disagree. What is the average numerical response and its categorical equivalent?
Note: Consider the following hypothetical mean ranges for categorical responses on a 5-point scale:
4.20 - 5.00 - Strongly Agree
3.40 - 4.19 - Agree
2.60 - 3.39 - Fairly Agree
1.80 - 2.59 - Disagree
1.00 - 1.79 - Strongly Disagree
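The questionnaire example can be sketched as a weighted mean. The numeric coding (5 for Strongly agree down to 1 for Strongly disagree) is an assumption here; the slides do not state it, but it is the coding implied by the mean ranges in the note. No respondent chose "Fairly agree", so its count is 0.

```python
# Assumed coding, implied by the 1.00 to 5.00 mean ranges.
codes = {"Strongly agree": 5, "Agree": 4, "Fairly agree": 3,
         "Disagree": 2, "Strongly disagree": 1}
# Responses from the 10 individuals.
counts = {"Strongly agree": 3, "Agree": 4, "Fairly agree": 0,
          "Disagree": 2, "Strongly disagree": 1}

total = sum(counts.values())
average = sum(codes[k] * counts[k] for k in codes) / total

# Lower bound of each mean range, from highest to lowest.
ranges = [(4.20, "Strongly Agree"), (3.40, "Agree"), (2.60, "Fairly Agree"),
          (1.80, "Disagree"), (1.00, "Strongly Disagree")]
category = next(label for lower, label in ranges if average >= lower)

print(average, category)  # 3.6 Agree
```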
The Median
Definition: The median is the middlemost value in an ordered sequence of data.
Remark: The median is unaffected by any extreme observations in a set of data and hence, whenever an extreme observation is present, it is appropriate to use the median rather than the mean to describe a set of data.
Statistical Treatment: For an even number of observations:
$$Md = \frac{X_{\frac{n}{2}} + X_{\frac{n+2}{2}}}{2}$$
For an odd number of observations:
$$Md = X_{\frac{n+1}{2}}$$
Example: A manufacturer of flashlight batteries took a sample of 13 from a day’s production and burned them continuously until they failed. The number of hours they burned were
342 426 317 545 264 451 1049
631 512 266 492 562 298.
Determine the median.
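A short sketch of the battery example: with n = 13 (odd), the median is the observation at position (n + 1)/2 = 7 of the ordered data. The comparison with the mean is added only to illustrate the remark about extreme observations; names are illustrative.

```python
import statistics

hours = [342, 426, 317, 545, 264, 451, 1049,
         631, 512, 266, 492, 562, 298]

ordered = sorted(hours)
n = len(ordered)                               # 13, an odd number
median_by_formula = ordered[(n + 1) // 2 - 1]  # 7th ordered value (index 6)

print(median_by_formula, statistics.median(hours))  # 451 451
print(round(statistics.mean(hours), 2))             # 473.46
```

The single long-lived battery (1049 hours) pulls the mean above the median, while the median stays at 451.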
Example: The following data are the amounts of calories in a 30-gram serving for a random sample of 10 types of fresh-baked chocolate chip cookies.
_______________________________________________
Product Calories
_______________________________________________
Hillary Rodham Clinton’s 153
Original Nestle Toll House 152
Mrs. Fields 146
Stop and Shop 138
Duncan Hines 130
David’s 146
David’s Chocolate Chunk 149
Great American Cookie Company 138
What is the median amount of calories?
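The table as transcribed lists only eight products, so the sketch below works with those eight values and should be read under that assumption. With an even number of observations, the median is the average of the two middle ordered values.

```python
# Calories for the eight products that appear in the table.
calories = [153, 152, 146, 138, 130, 146, 149, 138]

ordered = sorted(calories)
n = len(ordered)  # 8, an even number
# Average of the (n/2)th and (n/2 + 1)th ordered values (1-based positions).
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered)
print(median)  # 146.0
```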
The Mode
Definition: The mode is the value in a set of data that appears most frequently. It may be obtained from an ordered array.
Remark: Unlike the arithmetic mean, the mode is not affected by the occurrence of any extreme values. However, the mode is used only for descriptive purposes because it is more variable from sample to sample than other measures of central tendency.
Example: Consider the out-of-state tuition rates for the six-school sample from Pennsylvania.
4.9 6.3 7.7 8.9 7.7 10.3 11.7
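The mode of the listed values can be found with Python's standard `statistics` module. `multimode` is used as well, since a data set may have more than one mode.

```python
import statistics

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]

mode_value = statistics.mode(tuition)
all_modes = statistics.multimode(tuition)

print(mode_value)  # 7.7
print(all_modes)   # [7.7]
```

Only 7.7 appears more than once, so it is the single mode.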
The Midrange
Definition: The midrange is the average of the smallest and largest observations in a set of data.
Statistical Treatment:
$$\text{Midrange} = \frac{X_{smallest} + X_{largest}}{2}$$
Remark: The midrange is often used as a summary measure both by financial analysts and by weather reporters, since it can provide an adequate, quick, and simple measure to characterize the entire data set – be it a series of daily closing stock prices over a whole year or a series of recorded hourly temperature readings over a whole day.
Note: In dealing with data such as daily closing stock prices or hourly temperature readings, an extreme value is not likely to occur. Nevertheless, in most applications, despite its simplicity, the midrange must be used cautiously.
Remark: The midrange becomes distorted as a summary measure of central tendency if an outlier is present.
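To illustrate the distortion, here is a small sketch that reuses the flashlight battery data from the median example. The helper `midrange` is a hypothetical name introduced for illustration.

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2

hours = [342, 426, 317, 545, 264, 451, 1049,
         631, 512, 266, 492, 562, 298]

with_outlier = midrange(hours)
without_outlier = midrange([h for h in hours if h != 1049])

print(with_outlier)     # 656.5
print(without_outlier)  # 447.5
```

The single value of 1049 hours moves the midrange from 447.5 to 656.5, well above the median of 451.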
Measures of Non-central Location
Definition: The measures of non-central location or fractiles are values below which a specified fraction or percentage of the observations in a data set must fall.
Remark: The measures of non-central location are employed particularly when summarizing or describing the properties of large sets of numerical data.
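Types of Fractiles
Definition: The percentiles are the 99 score points which divide a distribution of scores into 100 equal parts.
Notation: $P_i$, where i = 1, 2, 3, . . . , 99.
Ungrouped Data:
Formula:
$$P_i = \text{value of the } \left[\frac{i(n+1)}{100}\right]^{th} \text{ observation of the data set placed in array}$$
where i = 1, 2, 3, . . . , 99.
Grouped Data:
$$P_i = LCB_{P_i} + c\left(\frac{\frac{in}{100} - CF_{pre}}{f}\right)$$
Definition: The deciles are the 9 score points which divide the array of observations into 10 equal parts.
Ungrouped Data:
$$D_i = \text{value of the } \left[\frac{i(n+1)}{10}\right]^{th} \text{ score}$$
where i = 1, 2, 3, . . . , 9.
Grouped Data:
$$D_i = LCB_{D_i} + c\left(\frac{\frac{in}{10} - CF_{pre}}{f}\right)$$
Definition: The quartiles are the 3 score points which divide the array of observations into 4 equal parts.
Ungrouped Data:
$$Q_i = \text{value of the } \left[\frac{i(n+1)}{4}\right]^{th} \text{ observation of the data set placed in array}$$
where i = 1, 2, 3.
Grouped Data:
$$Q_i = LCB_{Q_i} + c\left(\frac{\frac{in}{4} - CF_{pre}}{f}\right)$$
In the grouped-data formulas, LCB is the lower class boundary of the class containing the fractile, c is the class width, $CF_{pre}$ is the cumulative frequency of the preceding class, and f is the frequency of the class containing the fractile.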
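The ungrouped position rule, i(n + 1)/k with k = 100, 10, or 4, can be sketched as one function. Two things here are assumptions and not from the slides: linear interpolation when the position is not a whole number, and clamping positions that fall outside 1 to n. The function name `fractile` is hypothetical.

```python
def fractile(data, i, parts):
    """i-th fractile when the ordered data is split into `parts` equal parts.

    parts = 100 gives percentiles, 10 deciles, 4 quartiles.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / parts       # 1-based position in the array
    position = min(max(position, 1), n)  # assumption: clamp to the data
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # assumption: interpolate linearly between neighbouring observations
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]
print(fractile(tuition, 1, 4))     # Q1 at position 2 -> 6.3
print(fractile(tuition, 3, 4))     # Q3 at position 6 -> 10.3
print(fractile(tuition, 5, 10))    # D5 at position 4 -> 7.7
print(fractile(tuition, 50, 100))  # P50 at position 4 -> 7.7
```

With n = 7 the positions for these fractiles are whole numbers, so no interpolation is needed, and Q2, D5, and P50 all coincide with the median.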
Measures of Variation
Definition: Variation is the amount of dispersion or “spread” in the data.
Types of Measures of Variation
I. The Range – the difference between the largest and smallest observations in a set of data.
Range = Xlargest - Xsmallest
Remark: The range measures the total spread in the set of data. Although the range is a simple measure of total variation in the data, its distinct weakness is that it does not take into account how the data are actually distributed between the smallest and largest values.
The Interquartile Range
Definition: The interquartile range (also called midspread) is the difference between the third and first quartiles in a set of data.
Interquartile range = Q3 – Q1
The Variance and the Standard Deviation
- measures of variation that take into account how all the values in the data set are distributed.
- these measures evaluate how the values fluctuate about the mean.
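Statistical Treatment:
Population Standard Deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Population Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample Standard Deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
Sample Variance:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Computational Formula:
$$s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$$
$$s = \sqrt{\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$$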
Example: Consider again the out-of-state tuition rates for the six-school sample from Pennsylvania.
4.9 6.3 7.7 8.9 7.7 10.3 11.7
Determine the following:
1. Range
2. Interquartile Range
3. Standard Deviation
4. Variance
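A sketch of the four measures for the seven listed values, treating them as a sample (divisor n - 1) and using the computational formula for the variance. Quartile positions follow the i(n + 1)/4 rule, which gives whole-number positions for n = 7. Variable names are illustrative.

```python
import math
import statistics

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]
ordered = sorted(tuition)
n = len(ordered)

# 1. Range: largest minus smallest observation.
data_range = ordered[-1] - ordered[0]

# 2. Interquartile range: Q1 at position (n+1)/4 = 2, Q3 at 3(n+1)/4 = 6.
q1 = ordered[(n + 1) // 4 - 1]
q3 = ordered[3 * (n + 1) // 4 - 1]
iqr = q3 - q1

# 4. Sample variance from the computational formula.
variance = (n * sum(x * x for x in tuition) - sum(tuition) ** 2) / (n * (n - 1))

# 3. Sample standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(data_range, 2), round(iqr, 2), round(std_dev, 2), round(variance, 2))
# 6.8 4.0 2.31 5.36
```

The computational formula agrees with `statistics.variance(tuition)`, which uses the definitional form with the same n - 1 divisor.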
The Coefficient of Variation
Definition: The coefficient of variation is a relative measure of variation. It is expressed as a percentage rather than in terms of the units of the particular data.
Statistical Treatment:
$$CV = \frac{s}{\bar{X}} \times 100\%$$
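As an illustration, the coefficient of variation of the tuition data used in the earlier examples can be computed as follows. Treating the values as a sample, so that s uses the n - 1 divisor, is an assumption.

```python
import statistics

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]

mean = statistics.mean(tuition)
s = statistics.stdev(tuition)  # sample standard deviation (n - 1 divisor)
cv = s / mean * 100            # expressed as a percentage

print(round(cv, 2))  # 28.18
```

Because the units cancel, the result (about 28%) can be compared with the CV of data measured in different units.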
Measures of Skewness
Definition: The measures of skewness show the degree of symmetry or asymmetry of a distribution and also indicate the direction of skewness.
Types of Skewness
I. Positively Skewed – has a longer tail to the right.
- more concentration of values below than above the mean.
- $\bar{X} > M_d > M_o$
II. Negatively Skewed – has a longer tail to the left.
- more concentration of values above than below the mean.
- $M_o > M_d > \bar{X}$
Pearson’s Coefficient of Skewness – used to determine the direction of skewness.
Remark: a) If SK > 0, then the distribution is skewed to the right.
b) If SK < 0, then the distribution is skewed to the left.
c) If SK = 0, then the distribution is symmetric.
Example: Consider again the out-of-state tuition rates for the six-school sample from Pennsylvania.
4.9 6.3 7.7 8.9 7.7 10.3 11.7
Determine the direction of skewness of the preceding data.
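The transcript does not preserve the formula for SK, so the sketch below assumes the commonly used median form of Pearson's coefficient, SK = 3(mean - median)/s. The slides may have used the mode form, (mean - mode)/s. For this data both forms have the same sign.

```python
import statistics

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]

mean = statistics.mean(tuition)
median = statistics.median(tuition)
s = statistics.stdev(tuition)  # sample standard deviation

# Assumed form: Pearson's median-based coefficient of skewness.
sk = 3 * (mean - median) / s

if sk > 0:
    direction = "skewed to the right"
elif sk < 0:
    direction = "skewed to the left"
else:
    direction = "symmetric"

print(round(sk, 2), direction)  # 0.67 skewed to the right
```

The mean (about 8.21) exceeds the median (7.7), so SK is positive.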
Measures of Kurtosis
Definition: The measures of kurtosis show the relative flatness or peakedness of a distribution.
Types of Kurtosis
I. Platykurtic – a distribution which is relatively flat.
II. Mesokurtic – a distribution which is between platykurtic and leptokurtic.
III. Leptokurtic – an unusually peaked distribution.
Coefficient of Kurtosis – used to determine the relative flatness or peakedness of a distribution.
Statistical Treatment:
$$Ku = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{n s^4}$$
Remark: a) If Ku = 3, then the distribution is mesokurtic.
b) If Ku > 3, then the distribution is leptokurtic.
c) If Ku < 3, then the distribution is platykurtic.
Example: Consider again the out-of-state tuition rates for the six-school sample from Pennsylvania.
4.9 6.3 7.7 8.9 7.7 10.3 11.7
Determine the kurtosis of the preceding data.
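A sketch of the kurtosis computation for the tuition data, assuming the usual fourth-moment coefficient (the sum of fourth-power deviations divided by n times s to the fourth power), which is the form that the benchmark value of 3 refers to. Using the sample standard deviation for s is also an assumption.

```python
import statistics

tuition = [4.9, 6.3, 7.7, 8.9, 7.7, 10.3, 11.7]

n = len(tuition)
mean = statistics.mean(tuition)
s = statistics.stdev(tuition)  # sample standard deviation (n - 1 divisor)

# Fourth-moment coefficient of kurtosis.
ku = sum((x - mean) ** 4 for x in tuition) / (n * s ** 4)

if ku > 3:
    shape = "leptokurtic"
elif ku < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(round(ku, 2), shape)  # 1.5 platykurtic
```

With only seven values the coefficient is a rough description at best, but it falls well below 3, which points to a relatively flat distribution.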