Statistics Handouts
Page 1 of 92
MANUAL IN STATISTICS
… statistics made simple …
19th edition
Ms. Yumi Vivien V. De Luna, MSME
Subject Teacher
TABLE OF CONTENTS

Exercise No.   Title
1              Variables and the Summation Notation
2              Frequency Distribution Table
3              Numerical Descriptive Measures
4              Weighted Means
5              FPC, Combination and Permutation
6              Probability
7              Normal Distribution
8              Test of Hypothesis I
9              Test of Hypothesis II

Lesson No.     Title
1              Methods of Data Collection and Presentation
2              Frequency Distribution Table
3              Numerical Descriptive Measures
4              Weighted Means
5              Sampling
6              FPC, Combinations and Permutations
7              Probability
8              Normal Distribution
9              Estimation
10             Test of Hypothesis
11             Two-way ANOVA
12             Pearson Moment Correlation
Sources/References:
The concepts, sample problems and information in this manual were taken from the following:
1. Fundamental Statistics for College Students by Pagoso, et al.
2. Graduate Research Manual – Guide to Thesis and Dissertations (Aquinas Graduate School)
3. How to Design and Evaluate Research in Education by Fraenkel and Wallen
4. Introduction to Statistics by Walpole
5. Introduction to Statistical Methods by Parel, Alonzo, et al.
6. Laboratory Manual in Statistics I, UPLB
7. Manual on Training on Microcomputer-Based for the Social Sciences (Richie Fernando Hall, AdeNU, 2005)
8. Statistics for the Health Sciences by Kuzma
9. Applied Basic Statistics by Flordeliza Reyes
10. Fundamental Concepts and Methods in Statistics by George Garcia
11. Simplified Statistics for Beginners by Dr. Cesar Bermundo
12. http://statistics.about.com/od/Descriptive-Statistics/a/What-Is-Kurtosis.htm
I. Statistics and its Scope

STATISTICS encompasses all the methods and procedures used in the collection, presentation, analysis and interpretation of data.

DESCRIPTIVE STATISTICS comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information.

STATISTICAL INFERENCE comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.

Population vs Sample
A population is the set of all entities or elements under study. A sample is a subset of the population.

Parameters vs Statistics
Parameters are descriptive measures or characteristics of a population, while statistics are the corresponding characteristics of a sample.

Census vs Survey
A census is the process of gathering information from every element of the population, while a survey is the process of gathering information from every element of the sample.

II. Variables and their Levels of Measurement

A variable is an observable characteristic of a person or object which is capable of taking several values or of being expressed in several different categories. It can be either quantitative (discrete or continuous) or qualitative.

MEASUREMENT SCALES
a. Nominal – simply labels, names or categories. Numbers are assigned for identification purposes only; no meaning can be attached to the magnitude or size of such numbers. Examples are gender, civil status, telephone numbers, etc.
b. Ordinal – whereas nominal scales only classify, ordinal scales also order the classes. Examples are job position, military ranks, etc.
c. Interval – quantitative but has no true zero point. Examples are IQ, room temperature, etc.
d. Ratio – quantitative and has a true zero point. Examples are number of children, physics test scores, etc.
SUMMATION NOTATION

For a given universe, suppose we observe a variable, say X. We may denote the first value as X_1, the second as X_2, and so on. In general, X_i is the observation on variable X made on the ith individual.

Given a set of N observations or data values represented by X_1, X_2, …, X_N, we express their sum as

$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$

where Σ is the summation symbol;
i is the index of the summation;
X_i is the summand;
1 is the lower limit; and
N is the upper limit.

Theorem 1. If c is a constant, then
$\sum_{i=1}^{N} cX_i = c\sum_{i=1}^{N} X_i$

Theorem 2. If c is a constant, then
$\sum_{i=1}^{N} c = Nc$

Theorem 3. If a and b are constants, then
$\sum_{i=1}^{N} (aX_i + bY_i) = a\sum_{i=1}^{N} X_i + b\sum_{i=1}^{N} Y_i$
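The three summation theorems can be checked numerically. The sketch below is only an illustration: it uses the head circumference (x) and foot length (y) values from Data Set 1 of Exercise # 1, and the constants a, b and c are arbitrary choices that do not come from the manual.

```python
import math

# Data Set 1 of Exercise # 1: head circumference (x) and foot length (y)
x = [32, 33, 37, 38, 35, 32, 38, 34]
y = [5.6, 6.2, 6.8, 6.6, 6.4, 5.4, 6.0, 6.1]
n = len(x)

# Arbitrary constants, chosen only for illustration
a, b, c = 2, 3, 5

# Theorem 1: the sum of c*X_i equals c times the sum of X_i
lhs1 = sum(c * xi for xi in x)
rhs1 = c * sum(x)

# Theorem 2: summing the constant c over N terms gives N*c
lhs2 = sum(c for _ in range(n))
rhs2 = n * c

# Theorem 3: the sum of (a*X_i + b*Y_i) splits into a*sum(X) + b*sum(Y)
lhs3 = sum(a * xi + b * yi for xi, yi in zip(x, y))
rhs3 = a * sum(x) + b * sum(y)

print(lhs1, rhs1)
print(lhs2, rhs2)
print(round(lhs3, 2), round(rhs3, 2))
```

Theorem 3 is compared after rounding because the foot lengths are decimals and floating-point sums can differ in the last digits.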
Exercise # 1 – Variables and the Summation Notation

At the end of this exercise, the student must be able to:
1. identify different types of variables
2. classify data according to level of measurement
3. employ summation notation

I. Identify the level of measurement.

A. From all patients admitted in a hospital, the following information is collected:
1. name of patient
2. age
3. sex
4. body temperature
5. blood pressure
6. amt. of deposit
7. first time to see a doctor regarding ailment? (yes/no)
8. heartbeat per minute
9. weight
10. height
11. no. of glasses of fluid intake per day
12. no. of meals taken in a day

B. The following information is of interest for selected students of AdeNU who are cigarette smokers:
1. age when first smoked
2. average no. of sticks consumed per day
3. main source of allowance
4. amt. of weekly allowance
5. Is your father a smoker? (yes/no)
6. occupation of father
7. brand of cigarette
8. position in the family

II. Instruction will be given by your teacher.

Data Set 1. Data on Head Circumference (in cm) and Foot Length (in cm) of 8 Newborn Babies

Baby no.                  1     2     3     4     5     6     7     8
Head circumference (x)    32    33    37    38    35    32    38    34
Foot length (y)           5.6   6.2   6.8   6.6   6.4   5.4   6.0   6.1

Data Set 2. Data on Height (cm) and Weight (lbs) of 8 Stat Students

Student no.    1     2     3     4     5     6     7     8
Height (x)     168   141   165   180   165   156   150   147
Weight (y)     110   90    120   125   142   97    105   110
Lesson #1 – Methods of Data Collection and Presentation

METHODS OF DATA COLLECTION

Various methods for data gathering are available. A researcher should be able to use the most appropriate one.

1. Survey Method – questions are asked to obtain information, either through a self-administered questionnaire or an interview (personal, telephone or internet).

Personal Interview
Advantages:
- flexibility in obtaining answers
- more in-depth answers
- can observe the respondent’s behavior
Disadvantages:
- expensive
- field interviews are hard to control
- errors in interviewing
- time consuming

Mailed Questionnaires
Advantages:
- wider geographic distribution of respondents possible
- respondents can answer at their convenience
- no personal interviewer’s bias
- centralized control of people doing the survey
- relatively inexpensive
- respondent may be more candid if he/she can answer anonymously
Disadvantages:
- response rate may be low
- hard to obtain in-depth information
- usable mailing list may be unavailable
- respondent may not be the addressee
- cannot observe respondent’s behavior

Phone Interview
Advantages:
- relatively inexpensive
- fast
- centralized control of people doing the survey
- respondents may be more candid
Disadvantages:
- unlisted telephone numbers
- outdated telephone directory
- interview time needs to be relatively short
- selected sample may not have telephones
2. Observation Method – makes possible the recording of behavior, but only at the time of occurrence (e.g., observing reactions to a particular stimulus, traffic count).

Advantages over the Survey Method:
- does not rely on the respondent’s willingness to provide information
- certain types of data can be collected only by observation (e.g., behavior patterns of which the subject is not aware or which the subject is ashamed to admit)
- the potential bias caused by the interviewing process is reduced or eliminated

Disadvantages compared with the Survey Method:
- things such as awareness, beliefs, feelings and preferences cannot be observed
- the observed behavior patterns can be rare or too unpredictable, thus increasing the data collection costs and time requirements

3. Experimental Method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study. This is an excellent method of collecting data for causation studies. If properly designed and executed, experiments will reveal, with a good deal of accuracy, the effect of a change in one variable on another variable.

4. Use of Existing Studies – e.g., census, health statistics, and weather bureau reports

Two types:
- documentary sources – published or written reports, periodicals, unpublished documents, etc.
- field sources – researchers who have done studies on the area of interest are asked personally or directly for the information needed

5. Registration Method – e.g., car registration, student registration, and hospital admission
METHODS OF DATA PRESENTATION
1. Textual form – data are incorporated into a paragraph.

Advantages:
- This method is appropriate only if there are few numbers to be presented.
- It gives emphasis to significant figures and comparisons.

Disadvantages:
- It is not desirable to include a big mass of quantitative data in a “text” or paragraph, as the presentation becomes incomprehensible.
- Paragraphs can be tiresome to read, especially if the same words are repeated so many times.

2. Tabular Presentation – systematic organization of data in rows and columns

Advantages:
- more concise than textual presentation
- easier to understand
- facilitates comparisons and analysis of relationships among different categories
- presents data in greater detail than a graph

PARTS OF A STATISTICAL TABLE:
a. Heading – consists of a table number, title and head note. The title explains what is presented, where the data refer to and when the data apply.
b. Box Head – contains the column heads, which describe the data in each column, together with the needed classifying and qualifying spanner heads.
c. Stub – the classifications or categories found at the left. It describes the data found in the rows of the table.
d. Field – main part of the table
e. Source Note – an exact citation of the source of the data presented in the table (should always be included when the figures are not original)
Illustration:

Table 4.4 Philippines Crime Volume and Rate by Type in 1991

                                 1991
Type                    Volume      Crime Rate
Total                   11,326      195
Index Crimes            77,261      124
   Murder                8,707      8,707
   Homicide              8,068      8,069
   Physical Injury      21,862      21,862
   Robbery              13,817      13,817
   Theft                22,780      88,780
   Rape                  2,026      2,026
Non Index Crimes        44,065      71

Source: Philippines National Police

Parts shown in the illustration: the table number and title form the HEADING; the column heads (1991, Volume, Crime Rate) form the BOXHEAD; the Type column is the STUB; the Volume and Crime Rate entries form the FIELD; and the last line is the SOURCE NOTE.

Guidelines:
- The title should be concise and written in telegraphic style, not as a complete sentence.
- Column labels should be precise.
- Categories should not overlap.
- The unit of measure must be clearly stated.
- Show any relevant totals, subtotals, percentages, etc.
- Indicate if the data were taken from another publication by including a source note.
- Tables should be self-explanatory, although they may be accompanied by a paragraph that provides an interpretation or directs attention to important figures.
3. Graphical Presentation – a graph or chart is a device for showing numerical values or relationships in pictorial form.

Advantages:
- the main features and implications of a body of data can be grasped at a glance
- can attract attention and hold the reader’s interest
- simplifies concepts that would otherwise have been expressed in so many words
- can readily clarify data and frequently bring out hidden facts and relationships

Common Types of Graph
a. Line Chart – graphical presentation of data especially useful for showing trends over a period of time.
b. Pie Chart – a circular graph that is useful in showing how a total quantity is distributed among a group of categories. The “pieces of the pie” represent the proportions of the total that fall into each category.
c. Bar Chart – consists of a series of rectangular bars. If the bars are arranged horizontally, the length of each bar represents the quantity or frequency for its category. If the bars are arranged vertically, the height of each bar represents the quantity or frequency.
d. Pictorial Unit Chart – a pictorial chart in which each symbol represents a definite and uniform value.
THE STEM-AND-LEAF DISPLAY
The stem-and-leaf display is an alternative method for describing a set of data. It presents a histogram-like picture of the data, while allowing the experimenter to retain the actual observed values of each data point. Hence, the stem-and-leaf display is partly tabular and partly graphical in nature.

In creating a stem-and-leaf display, we divide each observation into two parts, the stem and the leaf. For example, we could divide the observation 244 as follows:

Stem | Leaf
2    | 44

Alternatively, we could choose the point of division between the units and tens, whereby

Stem | Leaf
24   | 4

The choice of the stem and leaf coding depends on the nature of the data set.

Steps in Constructing the Stem-and-Leaf Display
1. List the stem values, in order, in a vertical column.
2. Draw a vertical line to the right of the stem values.
3. For each observation, record the leaf portion of that observation in the row corresponding to the appropriate stem.
4. Reorder the leaves from lowest to highest within each stem row. Maintain uniform spacing for the leaves so that the stem with the most observations has the longest line.
5. If the number of leaves appearing in each row is too large, divide the stem into two groups, the first corresponding to leaves beginning with digits 0 through 4 and the second corresponding to leaves beginning with digits 5 through 9. This subdivision can be increased to five groups if necessary.
6. Provide a key to your stem-and-leaf coding so that the reader can recreate the actual measurements from your display.
Example: Typing speeds (net words per minute) for 20 secretarial applicants

68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71

Stem | Leaf (unit = 1)
2    | 2
3    | 5
4    | 5 6 7
5    | 2 5 5 8
6    | 1 3 5 6 8 9
7    | 1 2 5
8    | 4
9    | 1

Note: The stem-and-leaf display should include a reminder indicating the units of the data values.

Example:
Unit = 0.1    1 | 2 represents 1.2
Unit = 1      1 | 2 represents 12
Unit = 10     1 | 2 represents 120
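The construction steps can also be sketched in Python. This is a minimal illustration, not part of the original manual: the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, and the split is assumed to fall between the tens and units digits of the value expressed in leaf units.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values into {stem: sorted leaves}; the last digit (in leaf units) is the leaf."""
    rows = defaultdict(list)
    for v in values:
        scaled = round(v / leaf_unit)      # express the value in leaf units
        stem, leaf = divmod(scaled, 10)    # split off the last digit as the leaf
        rows[stem].append(leaf)
    # Order the stems, then the leaves within each stem (steps 1 and 4)
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}

# Typing speeds of the 20 secretarial applicants
speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]

display = stem_and_leaf(speeds)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: unit = 1, so 6 | 8 represents 68")
```

Stems with no observations are simply omitted by this sketch; here every stem from 2 to 9 occurs, so the printed rows match the display above.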
Lesson #2 – Frequency Distribution Table

Data Set. Given below is the distribution of statistics test scores of 50 students (the perfect score is 70 and the passing score is 60% of it).

 5   8  10  18  19  20  20  20  20  21
21  21  23  23  23  24  25  25  25  26
27  28  29  29  30  30  30  32  35  35
35  35  36  36  37  38  39  40  40  40
45  47  48  49  50  55  58  59  60  70

Steps in the construction of a frequency distribution:

1. Determine the range R of the distribution.
   R = highest observed value – lowest observed value = 70 – 5 = 65

2. Determine the number of classes, K, desired. By the square root rule,
   $K = \sqrt{N}$, where N = total number of observations
   $K = \sqrt{50} \approx 7.07$, so K = 7
   The number of classes is to be rounded off to the nearest WHOLE NUMBER.

3. Calculate the class size, c.
   First find: c’ = R/K = 65/7 ≈ 9.29
   The class size is to have the SAME PRECISION AS THE RAW DATA and should take the value nearest to c’. Hence, c = 9.

4. Enumerate the classes or categories based on the quantities calculated in steps 1-3, bearing in mind that:
   a) the lowest class must include the lowest observed value, and the highest class the highest observed value (the lowest value of the data is the lower class limit of the first class);
   b) each observation will go into one and only one class (none of the values can fall into possible gaps between successive classes, and the classes do not overlap).
   Successive lower class limits may be obtained by adding c to the preceding lower class limit, and likewise for the upper class limits.
I. Tally the observations to determine the class frequency, or the number of observations falling into each class.

Classes     Frequency
 5 - 13      3
14 - 22      9
23 - 31     15
32 - 40     13
41 - 49      4
50 - 58      3
59 - 67      2
68 - 76      1
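The steps from the range through the tally can be traced with a short Python sketch. It is only an illustration of the procedure described above; the variable names are hypothetical, and the upper class limit is assumed to be the lower limit plus c minus one unit because the scores are whole numbers.

```python
import math

scores = [5, 8, 10, 18, 19, 20, 20, 20, 20, 21,
          21, 21, 23, 23, 23, 24, 25, 25, 25, 26,
          27, 28, 29, 29, 30, 30, 30, 32, 35, 35,
          35, 35, 36, 36, 37, 38, 39, 40, 40, 40,
          45, 47, 48, 49, 50, 55, 58, 59, 60, 70]

N = len(scores)
R = max(scores) - min(scores)      # step 1: range
K = round(math.sqrt(N))            # step 2: square root rule, rounded to a whole number
c = round(R / K)                   # step 3: class size with the precision of the raw data

# Step 4: start at the lowest value and add c until the highest value is covered
unit = 1
classes = []
lower = min(scores)
while lower <= max(scores):
    classes.append((lower, lower + c - unit))
    lower += c

# Tally: count the observations falling into each class
freq = [sum(ll <= s <= ul for s in scores) for ll, ul in classes]

print(R, K, c)
for (ll, ul), f in zip(classes, freq):
    print(f"{ll:>2} - {ul:<2}  {f}")
```

Note that K = 7 is only a guide: with c = 9 an eighth class (68 - 76) is needed so that the highest observed value, 70, is included.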
II. Add other informative columns.

1. True Class Boundaries (TCB) – remove the discontinuity between classes and consider the true range of values.
   (Lower TCB) LTCB = LL – 0.5(unit)
   (Upper TCB) UTCB = UL + 0.5(unit)
   where the unit depends on the precision of the data.
   Example. 1st class: LTCB = 5 – 0.5(1) = 4.5; UTCB = 13 + 0.5(1) = 13.5

   Note:
   If the data                Unit of precision
   is a whole number          1
   has 1 decimal place        0.1
   has 2 decimal places       0.01

2. Class Mark (CM) – the center of a class. It is the midpoint of the class interval, about which the observations in a class tend to cluster.
   $CM = \frac{LL + UL}{2}$

3. Relative Frequency (RF) – proportion of observations falling in a class (in %)
   $RF = \frac{f}{N} \times 100\%$
FREQUENCY DISTRIBUTION TABLE

Classes      True Class Boundaries    CM    Freq   RF (%)   CF        RCF
(LL - UL)    (LTCB - UTCB)                                  <    >    <    >
 5 - 13       4.5 - 13.5               9     3      6
14 - 22      13.5 - 22.5              18     9     18
23 - 31      22.5 - 31.5              27    15     30
32 - 40      31.5 - 40.5              36    13     26
41 - 49      40.5 - 49.5              45     4      8
50 - 58      49.5 - 58.5              54     3      6
59 - 67      58.5 - 67.5              63     2      4
68 - 76      67.5 - 76.5              72     1      2
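The informative columns can be generated from the class limits and frequencies. The sketch below is an illustration under stated assumptions: the manual does not define the cumulative columns at this point, so "<CF" is taken as the running total from the first class downward and ">CF" as the running total from the last class upward, with RCF as the same totals expressed in percent. The dictionary keys are hypothetical names.

```python
classes = [(5, 13), (14, 22), (23, 31), (32, 40),
           (41, 49), (50, 58), (59, 67), (68, 76)]
freq = [3, 9, 15, 13, 4, 3, 2, 1]
unit = 1                      # the scores are whole numbers
N = sum(freq)

rows = []
less_than = 0
for (ll, ul), f in zip(classes, freq):
    less_than += f                        # <CF: observations up to this class
    greater_than = N - less_than + f      # >CF: observations from this class onward
    rows.append({
        "class": f"{ll} - {ul}",
        "ltcb": ll - 0.5 * unit,
        "utcb": ul + 0.5 * unit,
        "cm": (ll + ul) / 2,
        "f": f,
        "rf": 100 * f / N,
        "cf_lt": less_than,
        "cf_gt": greater_than,
        "rcf_lt": 100 * less_than / N,
        "rcf_gt": 100 * greater_than / N,
    })

for r in rows:
    print(r["class"], r["ltcb"], r["utcb"], r["cm"], r["f"], r["rf"],
          r["cf_lt"], r["cf_gt"], r["rcf_lt"], r["rcf_gt"])
```

A quick check on any finished table is that the last "<CF" and the first ">CF" both equal N, and that the RF column adds up to 100%.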
Exercise # 2 – Frequency Distribution Table

Objectives: At the end of the exercise, the student is expected to:
1. describe the different methods of data presentation;
2. organize data by constructing a frequency distribution table.

A. On organizing data: Construct an FDT for the given data. Show computations for R, K and c.

Table 1. Stat Midterms Scores of Section N3 Students, 1st sem 2014

Student #    1    2    3    4    5    6    7    8    9   10
Score       12   17   15   38   38   38   43   47   47   47

Student #   11   12   13   14   15   16   17   18   19   20
Score       53   57   57   58   58   60   62   67   70   70

Student #   21   22   23   24   25   26   27   28   29   30
Score       70   72   73   77   77   77   77   77   78   80

Student #   31   32   33   34   35   36   37   38   39   40
Score       82   87   87   92   92   93   93   93   95  100
Table 2. Average Life Expectancy of 30 Selected Countries, 2011 (source: www.statista.com)

Country            Life Expectancy
Hong Kong          84
Japan              83
Italy              83
Iceland            83
Switzerland        83
France             82
Spain              82
Singapore          82
Australia          82
Israel             82
Sweden             82
United Kingdom     82
Norway             81
Luxembourg         81
South Korea        81
Canada             81
New Zealand        81
Netherlands        81
Austria            80
Ireland            80
Belgium            80
United States      79
Poland             77
Mexico             74
China              73
Indonesia          69
Philippines        69
India              66
Cambodia           65
South Africa       53
Lesson # 3 – Numerical Descriptive Measures

NUMERICAL DESCRIPTIVE MEASURES

I. Measure of Location – a value within the range of the data which describes its location or position relative to the entire set of data. The more common measures are the measures of central tendency, percentiles, deciles and quartiles.

A. Measure of Central Tendency – describes the “center” of the data. It is a single value about which the observations tend to cluster. The common measures are the mean, median and mode.

1. Mean (μ) – sum of the observations divided by the number of observations totaled
   Characteristics:
   - interval statistic
   - calculated average
   - value is determined by every case in the distribution
   - affected by extreme values
   When to Use:
   - variables are in at least interval scale
   - value of each score is desired
   - values are considerably concentrated or close to each other

2. Median (Md) – middle value of an array
   Characteristics:
   - ordinal statistic
   - rank or position average
   - not affected by extreme values
   When to Use:
   - ordinal interpretation is needed
   - middle score is desired
   - presence of extreme values

3. Mode (Mo) – the observation which occurs most frequently in the data set
   Characteristics:
   - nominal statistic
   - inspection average
   - not unique; a data set may have more than one mode
   - most “popular” score
   - unaffected by extreme values
   - represents the majority
   When to Use:
   - nominal interpretation is needed
   - quick approximation of central tendency is desired
B. Percentile (Pi) – divides the data set into 100 equal parts, each part having one percent of all the data values. For example, if Patrick received a rating of 90th percentile in the National Secondary Achievement Test, this means that 90% of the students who took the test had scores lower than Patrick’s.

C. Decile (Di) – divides a data set into ten equal parts, each part having ten percent of all data values. The first decile is the 10th percentile, the second decile is the 20th percentile, and so on, up to the tenth decile, which is the 100th percentile.

D. Quartile (Qi) – divides a data set into four equal parts, each part having twenty-five percent of all data values. The first quartile is the 25th percentile, the second is the 50th percentile, the third is the 75th percentile, and the fourth quartile is the 100th percentile.

II. Measure of Dispersion – describes the extent to which the data are dispersed. The more commonly used measures are:

A. Range (R)
- not a stable measure of variation because it can fluctuate greatly with a change in just a single score, either the highest or the lowest
- easiest to compute but the LEAST SATISFACTORY because its value is dependent only upon the two extremes

B. Variance (s² / σ²)
- considers the position of each observation relative to the mean of the set
- denoted by s² for a sample and σ² for a population

C. Standard Deviation (s / σ)
- best measure of variation
- important as a measure of heterogeneity or unevenness within a set of observations
- used when comparing two or more sets of data having the same units of measurement

D. Coefficient of Variation (CV)
- used to compare the variability of 2 or more sets of data even when the observations are expressed in different units of measurement
III. Measure of Skewness (SK) – describes the extent of departure of the distribution of the data from symmetry.

SK = 0, Symmetric Distribution
- the median is the score point which bisects the total area; half of the area falls to the left and half to the right
- the mode is the score point with the highest frequency; this point on the x-axis corresponds to the tallest point of the curve
- the mean is the score point on the x-axis that corresponds to the point of balance

SK > 0, Positively Skewed
- the bump on the left indicates that the mode corresponds to a low value
- the tail extending to the right means that the mean, which is sensitive to each score value, will be pulled in the direction of the extreme scores and will have a high value
- the median, which is unaffected by extreme values, will have a value between the mode and the mean

SK < 0, Negatively Skewed
- the mean will have a lower numerical value than the median because the extremely low scores will pull the mean to the left
- the bump usually occurs at the right, indicating that the mode has a high numerical value
- the median will still be in the middle
IV. Measure of Kurtosis – measures the degree of peakedness of a distribution of data, denoted by k. If the distribution of the data is bell-shaped, k = 3. If the shape of the distribution is relatively peaked, k > 3. If the shape is relatively flat, k < 3.

k = 3
A distribution that is peaked in the same way as any normal distribution, not just the standard normal distribution, is said to be mesokurtic. The peak of a mesokurtic distribution is neither high nor low; rather, it is considered to be a baseline for the two other classifications.

k > 3
A leptokurtic distribution is one that has kurtosis greater than a mesokurtic distribution. Leptokurtic distributions are identified by peaks that are thin and tall. The tails of these distributions, to both the right and the left, are thick and heavy. Leptokurtic distributions are named by the prefix "lepto" meaning "skinny."

k < 3
The third classification for kurtosis is platykurtic. Platykurtic distributions are those that have a peak lower than a mesokurtic distribution. Platykurtic distributions are characterized by a certain flatness to the peak, and have slender tails. The name of these types of distributions comes from the prefix "platy" meaning "broad."
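The manual does not state a formula for k at this point. A common choice, assumed here only for illustration, is the moment coefficient of kurtosis, the fourth central moment divided by the square of the second central moment, which equals 3 for a normal distribution. The two small data sets below are made up for the sketch and are not taken from the manual.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis m4 / m2**2 (assumed definition, see lead-in)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

def classify(k):
    """Label a kurtosis value using the k = 3 baseline described above."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical data sets, chosen only to show both cases
flat = [1, 2, 3, 4, 5]              # evenly spread values
peaked = [0] * 8 + [-5, 5]          # most values at the center, two far in the tails

for name, data in (("flat", flat), ("peaked", peaked)):
    k = kurtosis(data)
    print(name, round(k, 2), classify(k))
```

With real data k will rarely be exactly 3, so in practice a value close to 3 is read as roughly mesokurtic.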
FORMULAS FOR UNGROUPED DATA

Data Set 1: 115 115 120 120 120 125 125 130 300
Data Set 2: 115 115 120 120 120 125 125 125 130 130

Numerical Measures                                   Computation
                                                     Data 1     Data 2
1. Mean
   $\mu = \frac{\sum X_i}{N}$
2. Median
3. Mode – determined by mere inspection
4. Variance
   $\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2$
   where μ is the mean of the ungrouped data
5. Standard Deviation
   σ = positive square root of the variance
6. Coefficient of Variation
   $CV = \frac{\sigma}{\mu} \times 100\%$
7. Measure of Skewness
   $SK = \frac{3(\text{Mean} - \text{Median})}{\sigma}$
Numerical Measures
Computation
Data 1 Data 2
8. Pi =
9. Di =
10. Qi =
Note: MEDIAN
If n is odd, the median position equals (n+1)/2, and the value of the [(n+1)/2]th observation in the array is taken as the median, i.e.,
$Md = X_{\left(\frac{n+1}{2}\right)}$
If n is even, the mean of the two middle values in the array is the median, i.e.,
$Md = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$
where n is the number of observations.
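As a check on hand computations, the ungrouped measures for Data Set 1 and Data Set 2 can be sketched in Python. This is an illustration only: it assumes the population-style formulas (division by N) and Pearson's skewness 3(mean − median)/σ, and the function name `describe` is hypothetical. The fractiles are left out because their ungrouped formulas are not given above.

```python
import math
from statistics import median, multimode

def describe(data):
    """Ungrouped mean, median, mode(s), variance, SD, CV and SK (all dividing by N)."""
    n = len(data)
    mean = sum(data) / n
    md = median(data)                  # middle value, or mean of the two middle values
    modes = multimode(data)            # every value tied for the highest frequency
    variance = sum(x * x for x in data) / n - mean ** 2
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "median": md,
        "modes": modes,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,
        "sk": 3 * (mean - md) / sd,
    }

data1 = [115, 115, 120, 120, 120, 125, 125, 130, 300]
data2 = [115, 115, 120, 120, 120, 125, 125, 125, 130, 130]

d1 = describe(data1)
d2 = describe(data2)
for label, d in (("Data 1", d1), ("Data 2", d2)):
    print(label, {key: (round(val, 2) if isinstance(val, float) else val)
                  for key, val in d.items()})
```

The single extreme value 300 in Data Set 1 pulls the mean well above the median and inflates the variance, while Data Set 2, which has no extreme value, has equal mean and median and two modes.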
FORMULAS FOR GROUPED DATA

Data Set

TCB (LTCB – UTCB)    CM (Xi)    Freq (fi)    <CF
2.65 – 3.75          3.2         5            5
3.75 – 4.85          4.3         4            9
4.85 – 5.95          5.4         8           17
5.95 – 7.05          6.5         3           20
7.05 – 8.15          7.6        12           32
8.15 – 9.25          8.7         8           40
Total                           40

Σ fi Xi = 256.7
Σ fi Xi² = 1783.83

Numerical Measures                                   Computation

1. Mean (μ)
   $\mu = \frac{\sum_{i=1}^{K} f_i X_i}{N}$
   where fi = frequency of the ith class
   Xi = class mark of the ith class
   N = total no. of observations
   K = number of classes

2. Median (Md)
   $Md = LTCB_{Md} + c\left(\frac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right)$
   where LTCB_Md = LTCB of the median class
   c = class size
   <CF_b = <CF of the class preceding the median class
   F_Md = frequency of the median class
   N = total number of observations
   NOTE: the median class is the class which contains the (N/2)th value of the array
3. Mode (Mo)
   $Mo = LTCB_{Mo} + c\left(\frac{F_{Mo} - F_b}{2F_{Mo} - F_b - F_a}\right)$
   where LTCB_Mo = LTCB of the modal class
   c = class size
   F_Mo = frequency of the modal class
   F_b = frequency of the class preceding the modal class
   F_a = frequency of the class following the modal class
   NOTE: the modal class is the class with the highest frequency

4. Variance (σ²)
   $\sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2$
   where fi = frequency of the ith class
   Xi = class mark of the ith class
   N = total number of observations
   μ = mean of the grouped data

5. Standard Deviation (σ) = positive square root of the variance

6. Coefficient of Variation
   $CV = \frac{\sigma}{\mu} \times 100\%$

7. Measure of Skewness
   $SK = \frac{3(\text{mean} - \text{median})}{\sigma}$
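The grouped-data formulas can be applied to the data set above with a short Python sketch. It is an illustration, not the manual's own solution: the variable names are hypothetical, and the median class is assumed to be the first class whose <CF reaches N/2.

```python
import math

# Grouped data set: true class boundaries and frequencies
ltcb = [2.65, 3.75, 4.85, 5.95, 7.05, 8.15]
utcb = [3.75, 4.85, 5.95, 7.05, 8.15, 9.25]
freq = [5, 4, 8, 3, 12, 8]

N = sum(freq)
c = utcb[0] - ltcb[0]                               # class size (1.1)
marks = [(lo + hi) / 2 for lo, hi in zip(ltcb, utcb)]

# <CF column: running total of the frequencies
cf, running = [], 0
for f in freq:
    running += f
    cf.append(running)

mean = sum(f * x for f, x in zip(freq, marks)) / N

# Median: first class whose <CF reaches N/2
m = next(i for i, total in enumerate(cf) if total >= N / 2)
cf_before = cf[m - 1] if m > 0 else 0
median = ltcb[m] + c * (N / 2 - cf_before) / freq[m]

# Mode: class with the highest frequency and its two neighbors
k = freq.index(max(freq))
f_b = freq[k - 1] if k > 0 else 0
f_a = freq[k + 1] if k < len(freq) - 1 else 0
mode = ltcb[k] + c * (freq[k] - f_b) / (2 * freq[k] - f_b - f_a)

variance = sum(f * x * x for f, x in zip(freq, marks)) / N - mean ** 2
sd = math.sqrt(variance)
cv = sd / mean * 100
sk = 3 * (mean - median) / sd

print(round(mean, 4), round(median, 2), round(mode, 4))
print(round(variance, 4), round(sd, 4), round(cv, 2), round(sk, 4))
```

Here the median class is 5.95 – 7.05 (its <CF is exactly N/2 = 20) and the modal class is 7.05 – 8.15; the negative SK agrees with the mean lying below the median.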
8. Percentiles
   $P_i = LTCB_{P_i} + c\left(\frac{\frac{iN}{100} - {<}CF_b}{F_{P_i}}\right)$
   where LTCB_Pi = LTCB of the Pi class
   c = class size
   <CF_b = <CF of the class preceding the Pi class
   F_Pi = frequency of the Pi class
   N = total number of observations

9. Deciles
   $D_i = LTCB_{D_i} + c\left(\frac{\frac{iN}{10} - {<}CF_b}{F_{D_i}}\right)$
   where LTCB_Di = LTCB of the Di class
   c = class size
   <CF_b = <CF of the class preceding the Di class
   F_Di = frequency of the Di class
   N = total number of observations

10. Quartiles
   $Q_i = LTCB_{Q_i} + c\left(\frac{\frac{iN}{4} - {<}CF_b}{F_{Q_i}}\right)$
   where LTCB_Qi = LTCB of the Qi class
   c = class size
   <CF_b = <CF of the class preceding the Qi class
   F_Qi = frequency of the Qi class
   N = total number of observations
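Percentiles, deciles and quartiles share one formula and differ only in the divisor (100, 10 or 4), so a single helper can cover all three. The sketch below applies it to the grouped data set used for the formulas above; the function name `fractile` is hypothetical, and the fractile class is assumed to be the first class whose <CF reaches iN divided by the number of parts.

```python
# Grouped data set: lower true class boundaries, frequencies and class size
ltcb = [2.65, 3.75, 4.85, 5.95, 7.05, 8.15]
freq = [5, 4, 8, 3, 12, 8]
c = 1.1
N = sum(freq)

# <CF column: running total of the frequencies
cf, running = [], 0
for f in freq:
    running += f
    cf.append(running)

def fractile(i, parts):
    """i-th fractile when the data are divided into `parts` equal parts."""
    target = i * N / parts                               # iN/100, iN/10 or iN/4
    k = next(idx for idx, total in enumerate(cf) if total >= target)
    cf_before = cf[k - 1] if k > 0 else 0
    return ltcb[k] + c * (target - cf_before) / freq[k]

p90 = fractile(90, 100)
d6 = fractile(6, 10)
q3 = fractile(3, 4)
print(round(p90, 2), round(d6, 2), round(q3, 2))
```

Calling `fractile(50, 100)`, `fractile(5, 10)` or `fractile(2, 4)` gives the grouped median, which is a convenient consistency check.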
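FORMULAS:

1. Mean: $\mu = \frac{\sum f_i X_i}{N}$
2. Median: $Md = LTCB_{Md} + c\left(\frac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right)$
3. Mode: $Mo = LTCB_{Mo} + c\left(\frac{F_{Mo} - F_b}{2F_{Mo} - F_b - F_a}\right)$
4. Percentile: $P_i = LTCB_{P_i} + c\left(\frac{\frac{iN}{100} - {<}CF_b}{F_{P_i}}\right)$
5. Decile: $D_i = LTCB_{D_i} + c\left(\frac{\frac{iN}{10} - {<}CF_b}{F_{D_i}}\right)$
6. Quartile: $Q_i = LTCB_{Q_i} + c\left(\frac{\frac{iN}{4} - {<}CF_b}{F_{Q_i}}\right)$
7. Variance: $\sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2$
8. Standard Deviation: $\sigma = \sqrt{\text{variance}}$
9. Coefficient of Variation: $CV = \frac{\sigma}{\mu} \times 100\%$
10. Skewness: $SK = \frac{3(\text{mean} - \text{median})}{\sigma}$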
THE BOXPLOT

Definition. The boxplot is a graph that is very useful for displaying the following features of the data:
- location
- spread
- symmetry
- extremes
- outliers

Steps in Constructing a Boxplot
1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), the lower fence (FL) and the upper fence (FU), given by:
   IQR = Q3 – Q1
   FL = Q1 – 1.5 IQR
   FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by “x”.

Remarks:
1. The height of the rectangle is arbitrary and has no specific meaning. If several boxplots appear together, however, the height is sometimes made proportional to the different sample sizes.
2. If an outlying observation is less than Q1 – 3 IQR or greater than Q3 + 3 IQR, it is identified with a circle at its actual location. Such an observation is called a far outlier.
Examples:

1. Data Set A:
    1  15  21  22  24
   10  18  22  23  25
   14  20  22  24  28

2. Data Set B:
    3  10  11  12  19
    8  10  12  16  19
    9  10  12  16  30
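The quantities needed to draw the two boxplots can be computed with a short Python sketch. One assumption is needed because the quartile rule for ungrouped data is not stated here: the sketch uses `statistics.quantiles` with its default "exclusive" method, which places the ith quartile at position i(n+1)/4 in the sorted array. The upper fence is taken as Q3 + 1.5 IQR, and the function name `boxplot_summary` is hypothetical.

```python
from statistics import quantiles

def boxplot_summary(data):
    """Quartiles, fences, whisker ends and outliers for one boxplot."""
    values = sorted(data)
    q1, q2, q3 = quantiles(values, n=4)          # default method: "exclusive"
    iqr = q3 - q1
    fl = q1 - 1.5 * iqr                          # lower fence
    fu = q3 + 1.5 * iqr                          # upper fence
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": fl, "upper_fence": fu,
        # Steps 4 and 5: ends of the lines drawn from the box
        "whisker_low": min(v for v in values if fl <= v <= q1),
        "whisker_high": max(v for v in values if q3 <= v <= fu),
        # Step 6 and Remark 2
        "outliers": [v for v in values if v < fl or v > fu],
        "far_outliers": [v for v in values
                         if v < q1 - 3 * iqr or v > q3 + 3 * iqr],
    }

data_a = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]
data_b = [3, 10, 11, 12, 19, 8, 10, 12, 16, 19, 9, 10, 12, 16, 30]

summary_a = boxplot_summary(data_a)
summary_b = boxplot_summary(data_b)
print(summary_a)
print(summary_b)
```

Under this quartile rule, Data Set A has one outlier on the low side (1) and Data Set B has one on the high side (30); neither is a far outlier. A different quartile convention may shift the fences slightly.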
More Problems:

1. Suppose a teacher assigns the following weights to the various course requirements:

   Assignment   15%
   Project      25%
   Midterms     20%
   Finals       40%

   The maximum score a student may obtain for each component is 100. Sheila obtains marks of 83 for assignment, 72 for project, 41 for midterms and 49 for the finals. Find her mean mark for the course.

2. Two of the quality criteria in processing butter cookies are the weight and color development in the final stages of oven browning. Individual pieces of cookies are scanned by a spectrophotometer calibrated to reflect yellow-brown light. The readout is expressed in per cent of a standard yellow-brown reference plate and a value of 41 is considered optimal (golden-yellow). The cookies were also weighed in grams at this stage. The means and standard deviations of 30 sample cookies are presented below.

             Mean    sd
   Color     41.1    10
   Weight    17.7    3.2

   Which of the two quality criteria is more varied?

3. The following are weight losses (in pounds) of 25 individuals who enrolled in a five-week weight-control program:

   2 3 3 4 4 4 5 5 6 7 7 8 8 8 9 9 9 9 10 10 10 11 11 11 12

   Compute the 3rd quartile, 7th decile, and 89th percentile.
Exercise #3 – Numerical Descriptive Measures

Objectives: At the end of the exercise, the student is expected to identify and compute appropriate numerical descriptive measures for ungrouped and grouped data, specifically:
- measure of central tendency;
- measure of dispersion; and
- measure of skewness.

A. Using your raw data set and the FDT you constructed in Exercise # 2, compute the appropriate descriptive measures (ungrouped and grouped). Show the solution for grouped data only.

B. Construct these tables in your worksheets and summarize the values obtained.

I. Measure of Central Tendency

   Mean                    Median                  Mode
   ungrouped   grouped     ungrouped   grouped     ungrouped   grouped

II. Measure of Dispersion

   Range                   Variance                Standard Deviation      Coeff. of Variation
   ungrouped   grouped     ungrouped   grouped     ungrouped   grouped     ungrouped   grouped

III. Measure of Skewness

   ungrouped   grouped

IV. Fractiles

   P90                     D6                      Q3
   ungrouped   grouped     ungrouped   grouped     ungrouped   grouped
Lesson # 4 – Weighted Means
Weighted Means
In this manual, the weighted mean is the statistical measure computed when data are gathered from a survey questionnaire that uses the Likert scale.

A Likert scale is a psychometric scale commonly used in questionnaires and is the most widely used scale in survey research. When responding to a Likert questionnaire item, respondents specify their level of agreement to a statement [1]. A Likert item is simply a statement the respondent is asked to evaluate according to any kind of subjective or objective criteria. Generally, the level of agreement or disagreement is measured. Often five ordered response levels are used, although many psychometricians advocate using seven or nine levels. A recent empirical study [2] found that a 5- or 7-point scale may produce slightly higher mean scores relative to the highest possible attainable score, compared to those produced from a 10-point scale, and this difference was statistically significant.

Sample scales:
Strategies: 5 - Very Effective, 4 - Effective, 3 - Moderately Effective/Undecided, …
Practices: 5 - Highly Observed/Always/Fully Aware, 4 - Observed/Sometimes/Aware, …
Traits/Attitudes: 5 - Very Evident, 4 - Somewhat Evident, 3 - Undecided, 2 - Somewhat inevident, 1 - Not evident
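For one questionnaire item, the weighted mean is usually computed as the sum of each scale value times the number of respondents who chose it, divided by the total number of respondents. The sketch below assumes that formula; the tally shown is hypothetical and is not data from the manual, and the function name `weighted_mean` is chosen only for illustration.

```python
def weighted_mean(tally):
    """Weighted mean of one item; `tally` maps each scale value to its response count."""
    total = sum(tally.values())
    return sum(weight * count for weight, count in tally.items()) / total

# Hypothetical tally for one item answered by 50 respondents
item_1 = {5: 12, 4: 20, 3: 10, 2: 5, 1: 3}

wm = weighted_mean(item_1)
print(round(wm, 2))
```

The result always lies between the lowest and highest scale values and is then interpreted against the verbal descriptions of the scale being used.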
1 http://en.wikipedia.org/wiki/Likert_scale
2 Dawes, John (2008). "Do Data Characteristics Change According to the number of scale points used? An experiment using
5-point, 7-point and 10-point scales". International Journal of Market Research 50 (1): 61–77.
Table 1. Illustration of a Likert Scale Questionnaire
Research Title: Solid Waste Management of Ateneo de Naga University
Below is a list of Solid Waste Management practices. Please check the box under the
number corresponding to your answer on how well each practice is observed.
Scale: 5 - Very High 4 - High 3 - Moderate 2 - Low 1 - Very Low
5 4 3 2 1
A. GENERATION OF WASTE
Ateneo de Naga University
1. Provides information through campaigns or seminars about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse, Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and recycled materials
4. Rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives
5. Encourages the use of unused side of old papers or recycles its own paper (as shown by the exam papers used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years instead of requiring new ones
8. Encourages the reuse of envelopes, boxes, packaging materials and folders
9. Repairs or disposes of defective computers in laboratories or offices
Table 2. Tallied Data
A. GENERATION OF WASTE
Ateneo de Naga University
Indicator                                                       5    4    3    2    1   Weighted Mean
1. Provides information through campaigns or seminars
   about solid waste generation                                 0    6   12   38   64
2. Introduces strategies on how to apply the 4R's (Reuse,
   Recycle, Reduce and Respond) of Solid Waste Management       2    8   10   29   71
3. Provides campaign to patronize the use of reusable and
   recycled materials                                           6    8   22   38   46
4. Rejects products which are harmful to the environment
   such as foam, styrofoam, CFC aerosols, oil-based paints,
   pesticides, insecticides, plastics, wood preservatives,
   glues and adhesives                                          0    5    7   34   74
5. Encourages the use of unused side of old papers or
   recycles its own paper (as shown by the exam papers
   used, handouts, memo, letters, etc.)                         7    6   12   33   62
6. Encourages or requires the use of refillable inks for
   pens, ballpens, printers, etc.                               1    1    4   41   73
7. Allows the use of old notebooks from previous years
   instead of requiring new ones                                2    3    4   42   69
8. Encourages to reuse envelopes, boxes, packaging
   materials and folders                                        6   11   18   27   53
9. Repairs or disposes defective computers in laboratories
   or offices                                                   0    2    3   43   72
Cumulative Weighted Mean
Source: Valenzuela 2007, p.66
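The weighted means that belong in the last column of Table 2 can be obtained by multiplying each scale value by its tally, adding the products, and dividing by the number of responses for that item. The sketch below is only an illustration: the names `tallies`, `weighted_mean`, `item_means` and `cumulative_mean` are made up for this example, and dividing by each item's own number of responses (not by a fixed number of respondents) is an assumption, although it agrees with the values reported in Table 5. The cumulative weighted mean is taken here as the average of the item means.

```python
# Tallies from Table 2: number of respondents answering 5, 4, 3, 2, 1 per item.
SCALE = (5, 4, 3, 2, 1)
tallies = {
    1: (0, 6, 12, 38, 64),
    2: (2, 8, 10, 29, 71),
    3: (6, 8, 22, 38, 46),
    4: (0, 5, 7, 34, 74),
    5: (7, 6, 12, 33, 62),
    6: (1, 1, 4, 41, 73),
    7: (2, 3, 4, 42, 69),
    8: (6, 11, 18, 27, 53),
    9: (0, 2, 3, 43, 72),
}


def weighted_mean(counts, scale=SCALE):
    """Sum of (scale value x frequency) divided by the number of responses."""
    total_responses = sum(counts)
    return sum(w * f for w, f in zip(scale, counts)) / total_responses


# Weighted mean per item, then the average of the item means.
item_means = {item: weighted_mean(counts) for item, counts in tallies.items()}
cumulative_mean = sum(item_means.values()) / len(item_means)

for item, mean in item_means.items():
    print(f"Item {item}: {mean:.2f}")
print(f"Cumulative weighted mean: {cumulative_mean:.2f}")
```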
Table 3
Adjectival Interpretation of the Likert Scale (cumulative mean)
Rating Scale   Range          Interpretation
5              4.20 – 5.00    Very High – Almost all indicators are practiced
4              3.40 – 4.19    High – 75% of the indicators were practiced
3              2.60 – 3.39    Moderate – 50% of the indicators were practiced
2              1.80 – 2.59    Low – 25% of the indicators were practiced
1              1.00 – 1.79    Very Low – almost none of the indicators were practiced
Table 4
Adjectival Interpretation of the Likert Scale (per item)
Rating Scale   Range          Interpretation
5              4.20 – 5.00    Very High – Almost all respondents practice the said indicator
4              3.40 – 4.19    High – 75% of the respondents
3              2.60 – 3.39    Moderate – 50% of the respondents
2              1.80 – 2.59    Low – 25% of the respondents
1              1.00 – 1.79    Very Low – almost none of the respondents…
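Looking up the adjectival rating for a computed mean can be sketched as a small function. The lower bounds below are copied from Tables 3 and 4; the function name `interpret` is hypothetical. Treating a mean that falls between two printed ranges (for example 1.795) as belonging to the lower category is an assumption, since the tables list ranges only to two decimal places.

```python
# Lower bound of each range in Tables 3 and 4, from highest to lowest.
RANGES = [
    (4.20, "Very High"),
    (3.40, "High"),
    (2.60, "Moderate"),
    (1.80, "Low"),
    (1.00, "Very Low"),
]


def interpret(mean):
    """Return the adjectival rating whose range contains the given mean."""
    if not 1.0 <= mean <= 5.0:
        raise ValueError("mean must lie between 1.00 and 5.00")
    for lower_bound, label in RANGES:
        if mean >= lower_bound:
            return label


print(interpret(1.67))
print(interpret(2.08))
```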
Table 5.
Extent of Solid Waste Management in AdeNU (faculty and students), 2007
A. GENERATION OF WASTE
Ateneo de Naga University
Indicator                                                       Weighted Mean   Interpretation
1. Provides information through campaigns or seminars
   about solid waste generation                                 1.67            Very Low
2. Introduces strategies on how to apply the 4R's (Reuse,
   Recycle, Reduce and Respond) of Solid Waste Management       1.68            Very Low
3. Provides campaign to patronize the use of reusable and
   recycled materials                                           2.08            Low
4. Rejects products which are harmful to the environment
   such as foam, styrofoam, CFC aerosols, oil-based paints,
   pesticides, insecticides, plastics, wood preservatives,
   glues and adhesives                                          1.52            Very Low
5. Encourages the use of unused side of old papers or
   recycles its own paper (as shown by the exam papers
   used, handouts, memo, letters, etc.)                         1.86            Low
6. Encourages or requires the use of refillable inks for
   pens, ballpens, printers, etc.                               1.47            Very Low
7. Allows the use of old notebooks from previous years
   instead of requiring new ones                                1.56            Very Low
8. Encourages to reuse envelopes, boxes, packaging
   materials and folders                                        2.04            Low
9. Repairs or disposes defective computers in laboratories
   or offices                                                   1.46            Very Low
Cumulative Weighted Mean                                        1.7             Very Low
Generation of Waste
The extent to which students and faculty perform SWM practices in the area of waste generation
is given in Table 5. The respondents’ weighted means on the nine (9) indicators ranged from 1.46
to 2.08, that is, from “very low” to “low” ratings. The following indicators were rated “very low”:
“provides information through campaigns or seminars about solid waste generation (1.67)”,
“introduces strategies on how to apply the 4R's of Solid Waste Management (1.68)”, “rejects
products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based
paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives (1.52)”,
“encourages the use of refillable ink (1.47)”, “allows the use of old notebooks (1.56)” and “repairs
or disposes defective computers (1.46)”. Following Table 4, a “very low” rating implies that almost
none of the respondents observe the mentioned practices.
The indicators “provides campaign to patronize the use of reusable and recycled materials
(2.08)”, “encourages the use of unused side of old papers or recycles its own paper (1.86)” and
“encourages to reuse envelopes, boxes, packaging materials and folders (2.04)” were rated “low”.
Following Table 4, only about 25% of the respondents observe these practices.
Overall, the students and faculty gave a cumulative weighted mean of 1.7, which is interpreted
as “very low”. The result implies that almost none of the indicators were being observed under the
generation component of SWM.
Survey results reveal that there was a need for intensive information campaign about SWM
and that the University had yet to implement strategies on how to apply the 4R’s. Such an outcome
presents an opportunity to promote waste-saving measures among the student and teaching
population in the AdeNU in line with the future promotion of the 4R’s.
Exercise # 4 – Weighted Means
A. For the raw data given, obtain the weighted mean for each item and the
cumulative/total weighted mean.
B. Interpret the cumulative/total weighted mean.
C. What are the highest and lowest weighted means obtained? Interpret the values.
D. Conclusion. Discuss the result of the test based on the objective of the study.
Rating Scale   Range of the Likert Scale   Interpretation
5              4.20 – 5.00                 Extremely Characteristic of Me – Almost all indicators are evident.
4              3.40 – 4.19                 Somewhat Characteristic of Me – 75% of the indicators are evident.
3              2.60 – 3.39                 Neither Un/Characteristic of Me – 50% of the indicators are evident.
2              1.80 – 2.59                 Somewhat Uncharacteristic of Me – 25% of the indicators are evident.
1              1.00 – 1.79                 Extremely Uncharacteristic of Me – almost none of the indicators are evident.
Problem Set
Thesis title: Portable Games and Devices towards Aggressive Behavior of the First Year BS Digital
Animation Students of Ateneo de Naga University
Objective: To determine the level of influence of playing Portable Games and Devices on the behavior,
specifically the aggressiveness, of the respondents
Table 1
Results from the Standard Questionnaire by Buss and Perry.
Indicators                                            5    4    3    2    1   Weighted Means
1. Some of my friends think I am a hothead.          18   12   15   12   13
2. If I have to resort to violence to protect
   my rights, I will.                                17   21   10   15    7
3. When people are especially nice to me, I
   wonder what they want.                            14   17   15   17    7
4. I tell my friends openly when I disagree
   with them.                                        17   28   10   10    5
5. I have become so mad that I have broken
   things.                                           10   17   14   15   14
6. I can’t help getting into arguments when
   people disagree with me.                          16   18   14   13    9
7. I wonder why sometimes I feel so bitter
   about things.                                      9   23   15   17    6
8. Once in a while, I can’t control the urge
   to strike another person.                         12   16   10   16   16
9. I am an even-tempered person.                     18   21   15   13    3
10. I am suspicious of overly friendly strangers.    11   19   17   13   10
Cumulative Weighted Mean
Lesson # 5 – Sampling
SAMPLE SIZE DETERMINATION
Slovin’s Formula:
$$n = \frac{N}{1 + N e^{2}}$$
where n = sample size
      N = population size
      e = margin of error (usually 5%)
A researcher would want to make a socio-economic survey of a school with a population of 5000 students. If he allows a margin of error of 5%, how many
students must he take into sample?
$$n = \frac{5000}{1 + 5000(0.05)^{2}} = \frac{5000}{1 + 5000(0.0025)} = \frac{5000}{1 + 12.5} = \frac{5000}{13.5} = 370.37 \approx 370$$
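The same computation can be checked with a few lines of code. This is a minimal sketch; the function name `slovin` and the default margin of error of 5% are choices made for this illustration, and the result is rounded to the nearest whole number as in the worked example.

```python
def slovin(population_size, margin_of_error=0.05):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    return population_size / (1 + population_size * margin_of_error ** 2)


n = slovin(5000, 0.05)
print(round(n, 2))  # unrounded sample size
print(round(n))     # rounded to the nearest whole respondent
```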
Important: Samples should be as large as a researcher can obtain with a
reasonable expenditure of time and energy. A recommended minimum number of subjects is 100 for a descriptive study, 50 for a correlational study, and 30 in each group for experimental and causal-comparative studies.
SAMPLING METHODS
Random Sampling Methods
   every element in the population has an equal chance of being chosen
   example: The dean of a school of education in a large midwestern university wishes
   to find out how her faculty feel about the sabbatical leave requirements at the
   university. She places all 150 names of the faculty in a hat, mixes them thoroughly,
   and then draws out the names of 25 individuals to interview.
Nonrandom Sampling Methods
   not all elements are given an equal chance of being included in the sample
   some elements may be deliberately ignored (that is, given no chance at all) in the
   choice of elements for the sample
   example: The manager of the campus bookstore at a local university wants to find out
   how students feel about the services the bookstore provides. Every day for two weeks
   during her lunch hour, she asks every person who enters the bookstore to fill out a
   short questionnaire she has prepared and drop it in a box near the entrance before
   leaving. At the end of the two-week period, she has a total of 235 completed
   questionnaires.
I. RANDOM SAMPLING METHODS
A. Simple Random Sampling (SRS) – is a method of selecting n units out of N units in the population in such a way that every distinct sample of size n
has an equal chance of being drawn.
Required : complete list of the elements of the population
Features : each and every member of the population has an equal
chance of being chosen
When to use : population size is not very large
population is homogeneous
Procedures : i. Lottery method/Chip-in-the-box/Fish-in-the-Bowl
ii. Table of Random Numbers
iii. Calculator/computer generated random numbers
Illustration: Table of Random Numbers
011723 223456 222167 912334 379156 233989
086401 016265 411148 059397 022334 080675
666278 106590 879809 051965 004571 036900 063045 786326 098000
560132 345678 356789 727009 344870 889567
000037 121191 258700 667899 234345 076567
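Procedure iii (computer generated random numbers) can be sketched with Python's standard `random` module. The sampling frame below is made up for illustration: it stands in for the 150 faculty names in the dean's example, and the seed is fixed only so that the draw can be repeated.

```python
import random

# Hypothetical sampling frame: 150 faculty members, labeled 1 to 150.
population = [f"Faculty {i}" for i in range(1, 151)]

rng = random.Random(2024)             # fixed seed only so the draw is repeatable
sample = rng.sample(population, 25)   # 25 distinct names drawn without replacement

print(sorted(sample))
```

`random.sample` draws without replacement, so no name can be chosen twice, which matches drawing names out of a hat.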
B. Stratified Sampling – the population of N units is first divided into
subpopulations called strata. Then a simple random sample is drawn from each stratum, the selection being made independently in different strata.
Required : complete list of the elements of the population
Features : representatives for each stratum or subgroup of the
population are randomly chosen as elements of the sample
When to use : Population size is large; Population is heterogeneous but
elements can be grouped into homogeneous strata; When we want
representatives for each stratum or subgroup
Procedure: Given a population N = 365, the researcher grouped the
respondents according to gender, where there are 219 females and 146
males. Using stratified sampling, how many respondents will be obtained
from each stratum?
N = 365; use Slovin's formula to get the sample size n:
$$n = \frac{365}{1 + 365(0.05)^{2}} = \frac{365}{1 + 365(0.0025)} = \frac{365}{1 + 0.9125} = \frac{365}{1.9125} = 190.849 \approx 191$$
The researcher identifies 2 subgroups or strata from the population of 365:
219 females (219/365 = 60%) and 146 males (146/365 = 40%)
Using Slovin's formula we compute the required sample size n = 191,
then we multiply it by the percentage of each stratum:
Females: 191 x 0.60 = 114.6 ~ 115
Males: 191 x 0.40 = 76.4 ~ 76
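The allocation above (often called proportional allocation) can be sketched in code. The function name `proportional_allocation` is hypothetical, and rounding each stratum to the nearest whole number is an assumption taken from the worked example; with other numbers the rounded parts may not add up exactly to n and would need a manual adjustment.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }


allocation = proportional_allocation({"female": 219, "male": 146}, 191)
print(allocation)
```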
C. Cluster Sampling – a method of sampling where a sample of distinct groups, or clusters, of elements is selected and then a census of every element
in the selected clusters is taken.
Features : population is grouped into clusters or small units
composed of population elements; each cluster
contains as varied a mixture as possible and at the
same time one cluster is nearly as alike as the other
: sometimes referred to as an area sample because it
is frequently applied on a geographical basis; blocks
in a community or city are occupied by heterogeneous
groups
When to use : large population
: list of all members of the population is not available;
only a population list of clusters is required
Procedure : 50 barangays in Naga City
Randomly choose 3 barangays, then include every element
of the chosen barangays in the sample
D. Multi-stage Sampling – the population is divided into a hierarchy of
sampling units corresponding to the different sampling stages. In the first
stage of sampling, the population is divided into primary stage units (PSU),
then a sample of PSUs is drawn. In the second stage of sampling, each
selected PSU is subdivided into second-stage units (SSU), then a sample of
SSUs is drawn. The process of subsampling can be carried to a third stage,
a fourth stage and so on, by sampling the subunits instead of enumerating
them completely at each stage.
Features : this technique uses several stages or phases in
getting a sample from the general population
When to use : conducting nationwide surveys or any survey involving
a large universe
Illustration of Multistage Sampling:
Philippines (17 regions)
Choose randomly 5 regions
R1 R2 R3 R4 R5
Choose randomly 2 provinces for each region
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
Choose randomly 1 city for each province
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
Choose randomly 2 barangays for each city
Then choose randomly 5 households for each barangay
[Figure: Populations and the samples drawn from them under SIMPLE RANDOM SAMPLING and STRATIFIED SAMPLING. In the stratified panel the population is split into strata of 25%, 50% and 25%, and the sample keeps the same 25% / 50% / 25% proportions.]
[Figure: Populations and the samples drawn from them under CLUSTER SAMPLING and TWO-STAGE SAMPLING.]
II. Non-random sampling
A. Convenience - the researcher chooses the sample at his or her convenience.
example. To find out how students feel about food service in the student union at an East Coast university, the manager stands outside the main door of the cafeteria one Monday morning and interviews the
first 50 students who walk out of the cafeteria.
B. Purposive - researchers use their judgement to select a sample that
they believe will provide the data they need.
example. A graduate student wants to know how retired people aged 65 and over feel about their “golden years”. He has been told by one of
his professors, an expert on aging and the aged population, that the local Association of Retired Workers is a representative cross section of retired people aged 65 and over. He decides to interview a sample of 50
people who are members of the association to get their views.
C. Quota - the researcher sets a sample size, then chooses the
respondents without setting criteria. The researcher proceeds to fill the prescribed quota and is left to his own convenience or preference.
D. Snowball
REASONS FOR USING NON-RANDOM SAMPLING
a. Some might use this technique because they just want to get a “feel” of the market before launching or producing a certain product.
b. Lack of logistics or inadequate knowledge in the use of random methods
c. The validity of the sample is based on the soundness of the judgement of whoever makes the choice.
Example. One would naturally use judgement instead of
randomness in the choice of people who will work for a company.
Lesson # 6 – FPC, Permutations and Combinations
Definition. FUNDAMENTAL PRINCIPLE OF COUNTING. If one event can occur in m different ways, and if, after it has happened in one of these ways, a second event can occur in n different ways, then both events can occur, in the order stated, in m
x n different ways.
Examples
1. If there are seven doors providing access to a building, in how many ways can a
person enter the building by one door and leave by a different door?
2. How many even three-digit numbers can be formed from the digits 1, 2, 3, 4 and 5 if each digit can be used only once?
3. How many different arrangements, each consisting of five different letters, can be
formed from the letters of the word “PERSONAL” if each arrangement is to begin and end with a vowel?
4. How many different arrangements of five distinct books each can be made on a
shelf with space for five books?
5. Suppose that there are 3 math books and 3 physics books. How many different
arrangements of the six books can be made on a shelf if books on the same subject are to be kept together?
6. In how many ways can a 10-question true-false exam be answered?
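For small cases, the fundamental principle of counting can be checked by listing every outcome and counting. The sketch below does this for Examples 1 and 2 and compares each count with the product m x n; the variable names are made up for this illustration.

```python
from itertools import permutations

# Example 1: enter by one of 7 doors and leave by a different door.
doors = range(1, 8)
enter_leave = [(i, o) for i in doors for o in doors if i != o]
print(len(enter_leave), 7 * 6)

# Example 2: even three-digit numbers from the digits 1-5, each digit used once.
# The last digit must be 2 or 4 (2 ways), leaving 4 ways and then 3 ways
# for the remaining two positions.
even_numbers = [p for p in permutations([1, 2, 3, 4, 5], 3) if p[2] % 2 == 0]
print(len(even_numbers), 2 * 4 * 3)

# Example 6: each of the 10 questions can be answered in 2 ways.
print(2 ** 10)
```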
Definition. PERMUTATION (nPr). Let S be a set containing n elements and suppose r
is a positive integer such that r ≤ n. Then a permutation of r elements of S is an arrangement, in a definite order and without repetitions, of r elements of S.
Theorem 1. The number of permutations of n elements taken r at a time is given by either of the following formulas:
a. nPr = n(n-1)(n-2) … (n-r+1)
b. nPr = n! / (n-r)!
Special case: nPn = n!
Examples:
1. A bus has six vacant seats. If three additional passengers enter the bus, in how
many different ways can they be seated?
2. In how many ways can 3 boys and 3 girls be seated in a row containing six seats
if a. a person may sit in any seat
b. boys and girls must sit in alternate seats?
Theorem 2. If we are given n elements, of which exactly m1 are alike of one kind, exactly m2 are alike of a second kind, …, and exactly mk are alike of a kth kind, and if
n = m1 + m2 + … + mk, then the number of distinguishable permutations that can be made of the n elements taking them all at one time is
n! / (m1! m2! … mk!)
Examples:
1. Determine the number of different nine-digit numerals that can be formed from the digits 6,6,6,5,5,5,4,4 and 3.
2. How many permutations can be formed from the word HONOLULU?
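Both theorems are easy to evaluate by computer. The sketch below uses `math.perm` (available in Python 3.8 and later) for Theorem 1 and a small helper for Theorem 2; the helper name `distinguishable_permutations` is made up for this illustration.

```python
from collections import Counter
from math import factorial, perm


def distinguishable_permutations(items):
    """n! divided by m1! m2! ... mk!, one factor per group of alike items."""
    result = factorial(len(items))
    for count in Counter(items).values():
        result //= factorial(count)
    return result


print(perm(6, 3))                                  # Theorem 1 with n = 6, r = 3 (bus seats)
print(distinguishable_permutations("666555443"))   # nine-digit numerals
print(distinguishable_permutations("HONOLULU"))    # letters O, L and U each appear twice
```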
Definition. COMBINATION (nCr). Let S be a set containing n elements, and suppose r
is a positive integer such that r ≤ n. Then a combination of r elements of S is a subset of S containing r distinct elements.
Theorem 3. The number of combinations of n elements taken r at a time is given by
nCr = nPr / r!
= n! / [(n-r)! r!]
Theorem 4. nCr = nC(n-r)
Examples:
1. A football conference consists of 8 teams. If each team plays every other team,
how many conference games are played?
2. A student has 10 posters to pin up on the walls of her room, but there is space for only 7. In how many ways can she choose the posters to be pinned up?
3. How many committees of five can be formed from 7 sophomores and 5 freshmen
if each committee is to
a. consist of 3 sophomores and 2 freshmen?
b. consist of at least 3 sophomores?
c. consist of at most 3 sophomores?
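Theorems 3 and 4 can be tried out with `math.comb` (Python 3.8 and later). The sketch below covers Examples 1 and 2 and the first part of Example 3; it assumes that in Example 1 each pair of teams meets exactly once, and the variable names are made up for this illustration.

```python
from math import comb

games = comb(8, 2)                         # Example 1: one game per pair of teams
posters = comb(10, 7)                      # Example 2: choose 7 of 10 posters
same_by_theorem_4 = comb(10, 3)            # Theorem 4: 10C7 = 10C3
exactly_3_soph = comb(7, 3) * comb(5, 2)   # Example 3: 3 sophomores and 2 freshmen

print(games, posters, same_by_theorem_4, exactly_3_soph)
```

The last line combines Theorem 3 with the fundamental principle of counting: the sophomores and the freshmen are chosen independently, so the two counts are multiplied.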
Exercise #5 – FPC, Combinations and Permutations
Objectives: At the end of the exercise, the student is expected to be able to:
1. Count the number of ways an event may possibly occur by:
a. listing all possible outcomes in the sample space corresponding to the event; and
b. using the method of counting.
2. Solve problems requiring the application of the concepts of permutation and combination.
I. Show complete solution for each.
1. How many different outcomes are possible in a roll of 3 dice? In tossing 2 coins?
In rolling 2 dice and tossing 3 coins simultaneously?
2. How many distinct permutations can be made from the word FOOL? List them
down.
3. A package of 10 Game Boy sets contains 3 defective sets. If 5 sets are to be picked
out randomly and sent to a customer for inspection, in how many ways can
the customer find at least two defective sets?
4. How many different telephone numbers can be formed from a seven-digit number
if the first digit cannot be zero?
5. A college freshman must take a science course, a humanities course, and a math course. If she may select any of 6 science courses, any of 4 humanities courses, and any
of 4 math courses, in how many ways can she set her program?
6. A shelf contains 3 books in red binding, 4 books in blue and 2 in green. In how
many different orders can they be arranged if all the books of the same color
must be kept together?
7. How many different numbers greater than 200 can be formed from the digits
1, 2, 3, 4 and 5 (a) if repetitions are not allowed? (b) if repetitions are allowed?
8. How many committees of 5 can be selected from 12 republicans and 8 democrats
(a) if it must contain 2 republicans and 3 democrats? (b) if it must contain at
least 3 republicans?
9. There are 8 baseball teams in a league. How many games will be played if each
team plays each of the other teams 40 times?
10. In how many ways can one make a selection of 5 black balls, 3 red balls, and 2 white balls from a box containing 8 black balls, 7 red balls and 5 white balls?
11. The tennis squad of one college consists of 8 players and that of another consists of 10 players. In how many ways can a doubles match between the 2 institutions be arranged?
12. In how many ways can one make a selection of 4 novels, 3 biographies and 6 detective stories from a shelf containing 10 novels, 8 biographies and 10 detective stories?
Lesson #7 – Probability
PROBABILITY
SAMPLE SPACE is the set of all possible outcomes of a given experiment.
A subset of the sample space of an experiment is called an EVENT associated
with the experiment.
Definition. PROBABILITY OF AN EVENT. If S is the sample space of an experiment
and E is an event associated with the experiment, the probability of E, denoted by P(E), is defined by
P(E) = n(E) / n(S)
where n(E) and n(S) are the numbers of elements in E and S, respectively, and all outcomes in S are equally likely.
Furthermore, if P(E) = 0 then the event will never happen, or it is an “impossible”
event. If P(E) = 1, the event is certain to happen, or it is a “sure” event.
Examples:
1. Determine the probability of each of the following events:
a. Obtaining a 4 on a throw of a single die
b. Obtaining a head on a toss of a coin
2. If 2 dice are thrown, what is the probability of obtaining a sum of 8? a sum of 3?
1 2 3 4 5 6
1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
3. Determine the probability of each of the following events:
a. Drawing a heart from a deck of 52 playing cards
b. Drawing 4 spades in succession from a deck of 52 playing cards if, after each
card is drawn, it is not replaced in the deck
4. If French, Spanish, Russian and English books are placed at random on a shelf
with a space for 4 books, what is the probability that the Russian and English books will be next to each other?
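The definition P(E) = n(E) / n(S) can be applied directly by listing the sample space and counting. The sketch below does this for Example 2, using the 36 ordered pairs of the two-dice table; the helper name `probability` is made up for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: the 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(E) = n(E) / n(S), where event is a test applied to each outcome."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


p_sum_8 = probability(lambda outcome: sum(outcome) == 8)
p_sum_3 = probability(lambda outcome: sum(outcome) == 3)
print(p_sum_8, p_sum_3)
```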
CONJUNCTION AND DISJUNCTION PROBABILITIES
Definition. CONJUNCTION PROBABILITY. This type of probability is associated with events happening together, one event and another event occurring at the same time. Events, however, may be independent or dependent.
Case 1. P(A and B) = P(A) x P(B)
When the occurrence of one event does not influence the probability of the
occurrence of the other event, these events are said to be independent.
Example. At birth the probability that a US female will survive to age 65 is
approximately 7/10. The probability that a male will survive to age 65 is approximately 3/5.
What is the probability that both the male and the female
will be alive at age 65?
What is the probability that only the male will be alive at age 65?
What is the probability that at least one of the two will be alive at age 65?
Case 2. P(A and B) = P(A) x P(B|A)
When the occurrence of one event is conditioned by the other event, these
events are said to be dependent (conditional).
Example. Suppose a box contains 30 fuses, 5 of which are defective. What
is the probability of drawing at random two defective fuses in succession if the first fuse that has been drawn is not returned before making the second draw?
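The two multiplication rules can be worked through with exact fractions. This is a sketch only: the variable names are made up for this illustration, and the first part assumes, as Case 1 requires, that the survival of the male and of the female are independent events.

```python
from fractions import Fraction

# Case 1 (independent events): survival to age 65.
p_female = Fraction(7, 10)
p_male = Fraction(3, 5)
p_both = p_female * p_male
p_only_male = p_male * (1 - p_female)
p_at_least_one = 1 - (1 - p_female) * (1 - p_male)   # complement of "neither survives"

# Case 2 (dependent events): two defective fuses drawn without replacement.
# P(A) = 5/30 for the first draw; P(B|A) = 4/29 once one defective fuse is gone.
p_two_defective = Fraction(5, 30) * Fraction(4, 29)

print(p_both, p_only_male, p_at_least_one, p_two_defective)
```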
Definition. DISJUNCTION PROBABILITY. This type of probability is associated with
several events that happen either separately or simultaneously. Disjunction probability is concerned with an “either or” relationship.
Case 3. P(A or B) = P(A) + P(B)
When the events do not have common sample points, they are said to be mutually exclusive.
Example. What is the probability that in a single toss of two dice, the
sum will be 5 or 8?
Case 4. P(A or B) = P(A) + P(B) – P(A and B)
There are also cases of joint events which are not mutually exclusive because there are some elements common to both events.
Example. What is the probability of getting a sum of 5 or a sum greater
than 4 in a throw of two dice?
Example. Take a math class with 52 students, 27 of whom are males and the rest are females. A total of 21 of the males and 15 of the females got a grade above 90. What is the probability that if a student is chosen
at random, this student has either a grade above 90 or is a male?
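The addition rule of Case 4 can be traced with the numbers of the math class example. The sketch below uses exact fractions; the variable names are made up for this illustration.

```python
from fractions import Fraction

total = 52
males = 27
above_90 = 21 + 15          # males plus females with a grade above 90
male_and_above_90 = 21      # counted in both groups, so subtracted once

# P(A or B) = P(A) + P(B) - P(A and B)
p_male_or_above_90 = (
    Fraction(males, total)
    + Fraction(above_90, total)
    - Fraction(male_and_above_90, total)
)
print(p_male_or_above_90)
```

Subtracting the overlap matters here because the 21 males with a grade above 90 would otherwise be counted twice.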
PROBABILITIES INVOLVING QUALITATIVE DATA IN CONTINGENCY TABLE
When the data are presented in the form of frequencies and are classified
according to qualitative rather than quantitative categories, they are called qualitative data in contingency tables.
Illustration:
                 Vegetarian Status
Gender     Vegetarian   Non-Vegetarian   Total
Male       20           23               43
Female     22           25               47
Total      42           48               90
1. To find the probability of a single event from qualitative data, simply divide the subtotal of the desired event by the grand total.
P(A) = subtotal / grand total
Example. The probability that a person is vegetarian
2. To find the conjunction probability of two events from qualitative data, divide the observed frequency where the two events intersect by the grand total.
P(A and B) = observed freq. of the two events' intersection / grand total
Example. The probability that a person is female and a vegetarian
3. To find the conditional probability of one event given another (dependent events) from qualitative
data, divide the observed frequency where the two events intersect by the subtotal of the event which is used as a condition.
P(A | B) = observed freq. of the two events' intersection / subtotal of the conditional event
Example. The probability of getting a male at random provided that he is a non-vegetarian
4. To find the disjunction probability of two events:
P(A or B) = (subtotal of 1st event / grand total) + (subtotal of 2nd event / grand total) – (observed freq. of intersection / grand total)
Example. The probability of getting a female or a person who is a non-vegetarian
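The four rules can be applied to the vegetarian illustration in a few lines. This is a sketch under the assumption that the table is stored as a dictionary keyed by (gender, status); the names `table`, `subtotal` and the `p_...` variables are made up for this illustration.

```python
from fractions import Fraction

# Observed frequencies from the illustration, keyed by (gender, status).
table = {
    ("Male", "Vegetarian"): 20, ("Male", "Non-vegetarian"): 23,
    ("Female", "Vegetarian"): 22, ("Female", "Non-vegetarian"): 25,
}
grand_total = sum(table.values())


def subtotal(index, value):
    """Row subtotal (index 0, gender) or column subtotal (index 1, status)."""
    return sum(freq for key, freq in table.items() if key[index] == value)


# Rule 1: single event.
p_veg = Fraction(subtotal(1, "Vegetarian"), grand_total)
# Rule 2: conjunction.
p_female_and_veg = Fraction(table[("Female", "Vegetarian")], grand_total)
# Rule 3: conditional, male given non-vegetarian.
p_male_given_nonveg = Fraction(table[("Male", "Non-vegetarian")],
                               subtotal(1, "Non-vegetarian"))
# Rule 4: disjunction, female or non-vegetarian.
p_female_or_nonveg = (
    Fraction(subtotal(0, "Female"), grand_total)
    + Fraction(subtotal(1, "Non-vegetarian"), grand_total)
    - Fraction(table[("Female", "Non-vegetarian")], grand_total)
)

print(p_veg, p_female_and_veg, p_male_given_nonveg, p_female_or_nonveg)
```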
Exercise # 6 - Probability
Objectives: At the end of the exercise, the student is expected to be able to apply the different operations on probability.
II. Show complete solution for each.
1. On a throw of two dice, what is the probability of obtaining a sum that is at most 10?
2. If a single card is drawn from a deck of 52 playing cards, what is the probability of each of the following events: (a) obtaining a red card; (b) obtaining a diamond;
and (c) obtaining an ace or a heart?
3. A committee of 5 is to be selected from 10 seniors and 8 juniors. What is the
probability that the committee is to consist of at most 3 seniors?
4. A number of two different digits is to be formed from the digits 1, 2, 3, 4 and 5.
Determine the probability of each of the following events:
a. the number is odd
b. the number is greater than 25
5. A student guesses his answers on a 3-question test. What is the probability that he will get
a. two correct and one wrong answer
b. at least two correct
c. all wrong
d. at most two correct
e. two correct and the last answer wrong
6. Classification of Patients in a Hospital
           Pregnant   Elderly   Children   Total
Male       0          27        35         62
Female     28         49        11         88
Total      28         76        46         150
What is the probability that a patient chosen at random from among the 150 will be:
a. pregnant
b. female or elderly
c. female and elderly
d. male or a child
e. male provided that he is elderly
f. child given male
PROBABILITY DISTRIBUTIONS
Concept of a Random Variable
Definition. A function whose value is a real number determined by each element in the sample space is called a random variable.
Remark. We shall use an uppercase letter, say X, to denote a random variable and its corresponding lowercase letter, x in this case, for one of its values.
Example (Experiment #1): An experiment consists of tossing a coin 3 times and observing the result. The possible outcomes and the values of the random variables X
and Y, where X is the number of heads and Y is the number of heads minus the number of tails, are
Sample Points    X     Y
HHH              3     3
HHT              2     1
HTH              2     1
HTT              1    -1
THH              2     1
THT              1    -1
TTH              1    -1
TTT              0    -3
DISCRETE AND CONTINUOUS PROBABILITY DISTRIBUTIONS
Definition. If a sample space contains a finite number of possibilities or an unending
sequence with as many elements as there are whole numbers, it is called
a discrete sample space.
Definition. A random variable defined over a discrete sample space is called a
discrete random variable.
Definition. If a sample space contains an infinite number of possibilities equal to the number of points on a line segment, it is called a continuous sample space.
Definition. A random variable defined over a continuous sample space is called a continuous random variable.
Discrete Probability Distributions
Definition. A table or formula listing all possible values that a discrete random variable can take on, along with the associated probabilities, is called a discrete probability distribution.
Remark. The probabilities associated with all possible values of a discrete random
variable must sum to 1.
Examples. For Experiment #1, the discrete probability distributions of the random variables X and Y are
x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8
Y -3 -1 1 3
P(Y = y) 1/8 3/8 3/8 1/8
Continuous Probability Distribution
Definition. The function with values f(x) is called a probability density function for the continuous random variable X, if
*the total area under its curve and above the horizontal axis is equal to 1; and
*the area under the curve between any two ordinates x=a and x=b gives the probability that X lies between a and b.
Remarks: 1. A continuous random variable has a probability of zero of assuming exactly any
of its values, that is, if X is a continuous random variable, then P(X=x) = 0 for all real numbers x.
2. The probability random variable X that can assume values between 0 and 2 has a density function given by
f(x) = {
Expected Values
Definition. Let X be a discrete random variable with probability distribution
x           x1      x2      …   xn
P(X = x)    f(x1)   f(x2)   …   f(xn)
The mean or expected value of X is
$$\mu = E(X) = \sum_{i=1}^{n} x_i \, f(x_i)$$
Examples:
1. Find the mean of the random variables X and Y of Experiment No. 1.
x          0    1    2    3
P(X = x)   1/8  3/8  3/8  1/8
E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 12/8 or 1.5
y          -3   -1   1    3
P(Y = y)   1/8  3/8  3/8  1/8
E(Y) = (-3)(1/8) + (-1)(3/8) + (1)(3/8) + (3)(1/8) = 0
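The two computations above follow the same pattern: multiply each value by its probability and add. A short Python sketch of that pattern is given below; the function name expected_value and the dictionary layout are assumptions made for illustration only.

```python
from fractions import Fraction

# Distributions of X and Y from Experiment No. 1 (value -> probability).
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
dist_y = {-3: Fraction(1, 8), -1: Fraction(3, 8), 1: Fraction(3, 8), 3: Fraction(1, 8)}


def expected_value(dist):
    """Sum of x * f(x) over all values x of a discrete random variable."""
    return sum(x * p for x, p in dist.items())


print(expected_value(dist_x))  # 3/2, i.e. 1.5
print(expected_value(dist_y))  # 0
```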
Definition. Let X be a random variable with mean μ. Then the variance of X is
$\mathrm{Var}(X) = E[(X - \mu)^2]$
Definition. Let X be a discrete random variable with probability distribution
x          x1     x2     …  xn
P(X = x)   f(x1)  f(x2)  …  f(xn)
The variance of X is
$\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$
Example:
In experiment No. 1, find the variance of X.
Using the definition of Var(X),
E(X) = 1.5
$\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$
= (0 – 1.5)^2 (1/8) + (1 – 1.5)^2 (3/8) + (2 – 1.5)^2 (3/8) + (3 – 1.5)^2 (1/8)
= 0.75
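The variance computation above can be reproduced with a few lines of Python. This is only a sketch under the assumption that the distribution is stored as a dictionary; the helper names expected_value and variance are illustrative.

```python
from fractions import Fraction

# Distribution of X from Experiment No. 1 (value -> probability).
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expected_value(dist):
    """Mean of a discrete random variable: sum of x * f(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a discrete random variable: sum of (x - mean)^2 * f(x)."""
    mean = expected_value(dist)
    return sum((x - mean) ** 2 * p for x, p in dist.items())


print(variance(dist_x))         # 3/4
print(float(variance(dist_x)))  # 0.75
```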
Example. A used car dealer finds that in any day, the probability of selling no car is 0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is 0.06 and six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s daily earnings,
where X is the number of cars sold. Find the salesman’s expected daily earnings.
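One way to approach the car dealer example is to weight each possible earning g(x) by the probability of selling x cars. The sketch below assumes this reading of the problem; the names sales_dist, earnings and expected_earnings are hypothetical.

```python
from fractions import Fraction

# Probability of selling x cars in a day, as stated in the example.
sales_dist = {
    0: Fraction("0.4"), 1: Fraction("0.2"), 2: Fraction("0.15"), 3: Fraction("0.10"),
    4: Fraction("0.08"), 5: Fraction("0.06"), 6: Fraction("0.01"),
}


def earnings(cars_sold):
    """Daily earnings g(X) = 500 + 1500X from the example."""
    return 500 + 1500 * cars_sold


# E[g(X)] = sum of g(x) * P(X = x) over all x.
expected_earnings = sum(earnings(x) * p for x, p in sales_dist.items())

print(sum(sales_dist.values()))  # 1, so the probabilities form a valid distribution
print(expected_earnings)         # 2720
```

Because g is linear, the same figure is obtained from 500 + 1500 E(X), with E(X) = 1.48 cars per day.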
Lesson # 8 – Normal Distribution
PROPERTIES OF A NORMAL CURVE
The normal distribution is represented by a normal curve. A normal curve is a bell-shaped figure with the following six properties:
1. It is symmetrical about the mean, $\bar{X}$.
2. The mean is equal to the median, which is also equal to the mode.
3. The tails or ends are asymptotic relative to the horizontal line.
4. The total area under the normal curve is equal to 1 or 100%.
5. The normal curve area may be subdivided into at least three standard scores each to the left and to the right of the vertical axis.
6. Along the horizontal line, the distance from one integral standard score to the next integral standard score is measured by the standard deviation.
AREA UNDER THE NORMAL CURVE
In making use of the properties of the normal curve to solve certain types of statistical problems, one must first learn how to find areas under the normal curve.
The first step in finding areas under the normal curve is to convert the normal curve of any given variable into a standardized normal curve by using the formula:
$Z = \frac{X - \bar{X}}{S}$
where Z = standard score
$\bar{X}$ = mean
S = standard deviation
X = given value of a particular variable
WORDED PROBLEMS:
1. Given a normal distribution with mean 350 and standard deviation S = 40, find the probability that X assumes a value greater than 362.
2. An electrical firm manufactures light bulbs that have a length of life that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.
3. On an examination the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B? Find D6.
4. The quality grade-point averages of 300 college freshmen follow approximately a normal distribution with a mean of 2.1 and a standard deviation of 0.8. How many of these freshmen would you expect to have a score
a. between 2.4 and 3.5?
b. greater than 3.8?
c. less than 1.7?
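The standardizing step and the table lookup can also be sketched in Python. The example below is an illustration only: it uses NormalDist from Python's standard statistics module in place of the printed normal table, applies it to Worded Problems 1 to 3, and the helper name z_score is an assumption.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standard score Z = (X - mean) / S."""
    return (x - mean) / sd


standard_normal = NormalDist()  # mean 0, standard deviation 1

# Problem 1: P(X > 362) for mean 350 and standard deviation 40.
z1 = z_score(362, 350, 40)                # 0.3
p_greater = 1 - standard_normal.cdf(z1)   # area to the right of z1

# Problem 2: P(778 < X < 834) for mean 800 and standard deviation 40.
z_low = z_score(778, 800, 40)             # -0.55
z_high = z_score(834, 800, 40)            # 0.85
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Problem 3: the lowest A is the grade with 12% of the area above it.
lowest_a = NormalDist(74, 7).inv_cdf(1 - 0.12)

print(round(p_greater, 4))   # about 0.382
print(round(p_between, 4))   # about 0.511
print(round(lowest_a, 1))    # about 82.2
```

Answers read from a printed table may differ slightly in the last digit because the table rounds Z to two decimal places.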
Exercise # 7 – Normal Distribution
Objectives: At the end of the exercise the student should be able to:
1. Find probabilities using the standard normal probability curve;
2. Apply the concepts of finding areas under the normal probability curve in solving problems.
I. Find the probability.
a. P(z < -1.257)   f. P(z > 0.85)   k. P(1.33 < z < 1.56)
b. P(z < 1.65)     g. P(z > 0.69)   l. P(-1.48 < z < 2.04)
c. P(z < 0.92)     h. P(z > 3.01)   m. P(-0.58 < z < 1.05)
d. P(z < -2.02)    i. P(z > 2.84)   n. P(-0.92 < z < 0.07)
e. P(z < -1.24)    j. P(z > 0.53)   o. P(-1.45 < z < 1.87)
II. Find the unknown constant a given the area under the normal curve.
a. P(z < a) = 0.25
b. P(z > a) = 0.99
III. Solve the following problems.
a. Given a normally distributed variable X with mean 18 and standard deviation 2.5, find
i. P(X < 15)
ii. P(17 < X < 21)
iii. the value of k such that P(X < k) = 0.2578;
iv. the value of k such that P(X > k) = 0.1539
b. If a set of grades on a statistics exam are approximately normally distributed with a mean of 74 and a standard deviation of 7.9, find
i. the lowest passing grade if the lowest 10% of the students are given F's;
ii. the highest B if the top 5% of the students are given A's;
c. A soft drink machine is regulated so that it discharges an average of 200 milliliters per cup. If the amount of drink is normally distributed with a standard deviation of σ = 15 milliliters,
i. What is the probability that a cup contains between 180 and 230 milliliters?
ii. How many cups will likely overflow if 220-milliliter cups are used for the next 1000 drinks?
iii. Below what value do we get the smallest 35% of the drinks?
Lesson # 9 – Estimation
ESTIMATION - refers to any process by which sample information is used to predict or estimate the numerical value of some population measure.
- The formula, function or procedure used in estimating a population parameter is called an estimator. The value obtained with the use of the estimator is the estimate.
- Two types of estimators: point estimator and interval estimator. A point estimator yields a single numerical value as the estimate. An interval estimator gives a range or band of values within which the value of the parameter is estimated to lie.
INTERVAL ESTIMATION OF THE POPULATION MEAN
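An interval estimate of μ (or any parameter) incorporates a measure of the confidence in the reliability of the range or interval of values within which the parameter is estimated to lie. Thus, an interval estimate is also called a confidence estimate, and its limits, confidence limits.
$P(\bar{X} - k < \mu < \bar{X} + k) = 1 - \alpha$
where
α = level of significance
1 - α = level of confidence
$s.e. = \frac{\sigma}{\sqrt{n}}$
$k = Z_{\alpha/2}\,(s.e.)$
When σ is unknown and the sample is small, the sample standard deviation s is used in place of σ and $t_{\alpha/2}$ with n - 1 degrees of freedom is used in place of $Z_{\alpha/2}$.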
Example.
1. The mean IQ of a random sample of 400 high school students is 110. The standard deviation of the population of IQ scores is 16. If the population is normally distributed, find:
a. a .95 confidence interval estimate of μ ($Z_{\alpha/2} = 1.96$)
b. a .90 confidence interval estimate of μ ($Z_{\alpha/2} = 1.64$)
2. Find the .90 confidence interval estimate of the mean weight of all the pupils in a certain school if a random sample of 25 pupils has a mean weight of 70 lbs with a standard deviation of 15 lbs. Assume the population weights to be normally distributed. ($t_{\alpha/2} = 1.711$)
3. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean content of all such containers, assuming an approximate normal distribution for container contents. ( )
4. The mean and standard deviation for the quality grade-point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find the 99% confidence interval for the mean of the entire senior class. Interpret the obtained confidence interval. ( )
5. The manager of a home delivery service for pizza pies wants an estimate of the average time it takes to deliver an order within the town proper of the City of Naga. A sample of 25 deliveries had a mean time of 15 minutes and a standard deviation of 4 minutes. Construct a 95% confidence interval for the average time for all deliveries. Interpret the interval obtained. ( Z = 1.96 )
6. A random sample of 12 students in a certain dormitory showed an average weekly expenditure of P400 for snack foods, with a standard deviation of P50.25. Construct a 90% confidence interval for the average amount spent each week on snack foods by female students living in this dormitory, assuming the expenditure to be approximately normally distributed. Interpret your confidence interval. ( t = 1.796 )
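The interval estimate has the form sample mean ± k, where k is the critical value times the standard error. A minimal Python sketch of this computation, applied to Examples 1 and 2 of this lesson, is shown below. The function name confidence_interval is illustrative, and the critical values 1.96, 1.64 and 1.711 are the ones quoted in the handout rather than values computed by the code.

```python
from math import sqrt


def confidence_interval(sample_mean, sd, n, critical_value):
    """Return (lower, upper) limits: sample_mean -/+ critical_value * sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    k = critical_value * standard_error
    return sample_mean - k, sample_mean + k


# Example 1: n = 400, mean 110, population standard deviation 16.
ci_95 = confidence_interval(110, 16, 400, 1.96)
ci_90 = confidence_interval(110, 16, 400, 1.64)

# Example 2: n = 25, mean 70 lbs, sample standard deviation 15 lbs, t = 1.711.
ci_weight = confidence_interval(70, 15, 25, 1.711)

print([round(limit, 3) for limit in ci_95])      # [108.432, 111.568]
print([round(limit, 3) for limit in ci_90])      # [108.688, 111.312]
print([round(limit, 3) for limit in ci_weight])  # [64.867, 75.133]
```

The same function can be reused for the remaining items by supplying the appropriate Z or t value from the tables.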
Lesson # 10- Test of Hypothesis
COMMON TERMS IN INFERENTIAL STATISTICS
A HYPOTHESIS is a statement which aims to explain facts about the real world. A test of hypothesis is a two-way decision problem. It is a procedure to substantiate or invalidate a claim which is stated as a null hypothesis.
Definition. A NULL HYPOTHESIS (Ho) is the hypothesis that we hope to accept or reject; it must always express the idea of non-significance of difference.
Definition. An ALTERNATIVE HYPOTHESIS (Ha) is the hypothesis that is accepted when Ho is rejected; the rejection of Ho is the acceptance of this hypothesis.
TYPE I and TYPE II ERROR
Decision     Ho is TRUE         Ha is TRUE
Reject Ho    Type I error       Correct decision
Accept Ho    Correct decision   Type II error
Type I error (α error) – when we reject the null hypothesis when in fact the null hypothesis is true.
Type II error (β error) – when we accept the null hypothesis when in fact the null hypothesis is false.
ONE-TAILED AND TWO-TAILED TEST
Definition. When the rejection region is located at only one extreme of the range of values for the test statistic, the test is ONE-TAILED. If Ha is a statement of non-equality represented by the sign ≠, then the hypothesis is non-directional, and thus we have a TWO-TAILED test.
Steps in Test of Hypothesis:
i. State the hypotheses, Ho and Ha.
ii. Determine the appropriate test statistic to use.
iii. Choose the level of significance and formulate the decision rule.
iv. Compute the value of the statistic from the sample data.
v. Make a decision (reject or accept) in accordance with the decision rule formulated.
vi. Draw a conclusion in relation to the objective of the original problem.
I. Mean of a Single Population
Case 1. Z Test
a. Hypotheses: Ho: μ = μ0 against
A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0
b. Test Statistic: Z Test
c. Computation:
$Z_c = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
d. Decision Rule: At a level of significance α,
A. For Ha: μ ≠ μ0, reject Ho if |Zc| > Zα/2, otherwise accept Ho.
B. For Ha: μ < μ0, reject Ho if Zc < -Zα, otherwise accept Ho.
C. For Ha: μ > μ0, reject Ho if Zc > Zα, otherwise accept Ho.
Example 1. The weight of crabs is normally distributed with mean 28.5 ounces and standard deviation of 3 ounces. A new breeder claims that he can breed crabs yielding a mean weight of more than 28.5 ounces. A random sample of 16 crabs from the new breeder had a mean weight of 29.2 ounces. At α = 5%, do the data support the breeder's claim?
i. Ho: μ = 28.5
Ha: μ > 28.5
ii. Test Statistic: Z Test
iii. Decision Rule: Reject Ho if Zc > Zα, otherwise accept Ho.
iv. Computation:
$Z_c = \frac{29.2 - 28.5}{3/\sqrt{16}} = 0.933$
Zα = Z0.05 = 1.645
v. Decision: Since Zc < Zα (0.933 < 1.645), accept Ho.
vi. Conclusion: At the 5% level of significance, there is not enough evidence to support the new breeder's claim; that is, the mean weight of the sample is not significantly greater than the mean of 28.5 ounces.
Example 2. For the past five years, the mean height of AdeNU students has been 60 inches. A simple random sample of 100 is taken from the present students. It was found that the mean height is 65 inches with a standard deviation of 4 inches. Is there reason to believe, at the 5% level of significance, that the mean height of present AdeNU students is different from that of the past five years?
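The Z test computation and decision rule can be sketched in Python as follows. This is an illustration, not the handout's own procedure: the function names z_statistic and reject_h0 are hypothetical, the critical values 1.645 and 1.96 are taken from the normal table, and for Example 2 the sample standard deviation is assumed to stand in for σ because the sample is large.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sd, n):
    """Computed Z: (sample mean - hypothesized mean) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))


def reject_h0(computed, critical, alternative):
    """Apply the decision rule for Ha: 'two-sided', 'less' or 'greater'."""
    if alternative == "two-sided":
        return abs(computed) > critical
    if alternative == "less":
        return computed < -critical
    if alternative == "greater":
        return computed > critical
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


# Example 1 (crabs): Ha is mu > 28.5, so the critical value is Z(0.05) = 1.645.
z_crabs = z_statistic(29.2, 28.5, 3, 16)
print(round(z_crabs, 3), reject_h0(z_crabs, 1.645, "greater"))   # 0.933 False

# Example 2 (heights): Ha is mu != 60, so the critical value is Z(0.025) = 1.96.
z_height = z_statistic(65, 60, 4, 100)
print(round(z_height, 3), reject_h0(z_height, 1.96, "two-sided"))  # 12.5 True
```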
Case 2. T Test
a. Hypotheses: Ho: μ = μ0 against
A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0
b. Test Statistic: T Test
c. Computation:
$T_c = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
d. Decision Rule: At a level of significance α,
A. For Ha: μ ≠ μ0, reject Ho if |Tc| > T[α/2, n-1], otherwise accept Ho.
B. For Ha: μ < μ0, reject Ho if Tc < -T[α, n-1], otherwise accept Ho.
C. For Ha: μ > μ0, reject Ho if Tc > T[α, n-1], otherwise accept Ho.
Example 3. A soft drink vending machine is set to dispense 6 ounces per cup. The machine is tested eight times, yielding a mean cup fill of 5.8 ounces with a standard deviation of 0.16 oz. Is there evidence at the 5% level of significance that the machine is underfilling cups? Assume normality.
i. Ho: μ = 6
Ha: μ < 6
ii. Test Statistic: T Test
iii. Decision Rule: Reject Ho if Tc < -T[α, n-1], otherwise accept Ho.
iv. Computation:
$T_c = \frac{5.8 - 6}{0.16/\sqrt{8}} = -3.536$
-T[α, n-1] = -T[0.05, 7] = -1.895
v. Decision: Since -3.536 < -1.895, reject Ho.
vi. Conclusion: At the 5% level of significance, there is evidence to say that the machine is underfilling the cups.
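The computation in Example 3 can be checked with a short Python sketch. The function name t_statistic is illustrative, and the critical value 1.895 is the tabulated T[0.05, 7] quoted in the example, not something the code derives.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """Computed T: (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Example 3: mu0 = 6 oz, sample mean 5.8 oz, s = 0.16 oz, n = 8.
t_computed = t_statistic(5.8, 6, 0.16, 8)
t_critical = 1.895  # T[0.05, 7] from the t table

# Left-tailed test (Ha: mu < 6): reject Ho if Tc < -T[alpha, n-1].
reject = t_computed < -t_critical

print(round(t_computed, 3))  # -3.536
print(reject)                # True
```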
Example 4. The monthly output of a plywood manufacturer was measured in nine randomly selected months. The results obtained (in tons) are 100, 120, 100, 102, 130, 140, 150, 140 and 145. Test the hypothesis that the mean monthly output is 140 tons against the alternative that it is not 140 tons at the 10% level of significance. Assume that the monthly output is a normal random variable.
Exercise # 8 – Test of Hypothesis (Z and T Test)
A. Carry out a complete test of hypothesis for the following problems.
1. A certain brand of powdered milk is advertised as having a net weight of 250 grams. If the net weights of a random sample of 10 cans are 253, 248, 252, 245, 247, 249, 251, 250, 247 and 248 grams, can it be concluded that the average net weight of the cans is less than the advertised amount? Use α = 0.01 and assume that the net weight of this brand of powdered milk is normally distributed.
2. In a time and motion study, it was found that the average time required by workers to complete a certain manual operation was 26.6 minutes. A group of 20 workers was randomly chosen to receive special training for two weeks. After the training it was found that their average time was 24 minutes with a standard deviation of 3 minutes. Can it be concluded that the special training speeds up the operation? Use α = 0.05.
3. The manager of an appliance store, after noting that the average daily sales was only 12 units, decided to adopt a new marketing strategy. Daily sales under this strategy were recorded for 90 days, after which period the average was found to be 15 units with a standard deviation of 4 units. Does this indicate that the new marketing strategy increased the daily sales? Employ α = 0.01.
4. The daily wages in a particular industry are normally distributed with a mean of P66.00. In a random sample of 144 workers of a very large company in this industry, the average daily wage was found to be P62.00 with a standard deviation of P12.50. Can this company be accused of paying inferior wages at the 0.01 level of significance?
II. Two Population Means – T Test
A. Dependent or Paired/ Independent
i. Ho: The population mean of A is equal to the population mean of B.
Ha: The population means are not equal.
ii. Decision rule: Reject Ho if p-value < level of significance, or if t-computed > t-value; otherwise accept Ho.
III. ANOVA
Sample Problems:
a. A researcher wishes to know if there are differences in the average preparation time of four methods of preparing a solvent.
b. An agriculturist may compare the average yields of three corn varieties used by Los Banos.
c. A consumer wishes to know if the different brands of gasoline in the market are equally good with respect to average mileage.
d. A medical researcher is interested in comparing the effectiveness of 3 different treatments to lower the cholesterol of patients with high values.
e. An ecologist wants to compare the amount of a certain pollutant in five rivers.
i. Ho: There is no difference between groups.
Ha: There is a difference between groups.
ii. Decision rule: Reject Ho if p-value < level of significance, or if F-value > critical value; otherwise accept Ho.
IV. Chi-Square Test of Independence
This test is usually applied on enumeration data or data in contingency tables. It tests the association or independence of one variable from another variable.
i. Ho: The two variables are independent.
Ha: The two variables are dependent.
ii. Decision rule: Reject Ho if p-value < level of significance, or if χ² value > critical value; otherwise accept Ho.
SAMPLE PROBLEMS
Two Population Means - T test
A. Dependent or Paired
1. In a study of the effectiveness of physical exercise in weight reduction, a simple random sample of 8 persons engaged in a prescribed program of physical exercise for one month showed the following results:
Weight Before   209  178  169  212  180  192  158  180
Weight After    196  171  170  207  177  190  159  180
At the 1% level of significance, do the data provide evidence that the prescribed program of exercise is effective?
a. Ho: The weights before and after are equal; therefore the program is not effective.
Ha: The weights before and after are not equal; therefore the program is effective.
b. Decision rule: Reject Ho if T-computed > critical value, otherwise accept Ho at the 1% level of significance.
c. Test Statistic: T-test on Two Populations
d. Computation: T-computed = 2.07   Critical value = 3.499
e. Decision: Accept Ho.
f. Conclusion: At the 1% level of significance, there is not sufficient evidence to say that the program is effective.
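The handout reports only the computed T for the paired problem above. One common way to obtain it, assumed here, is to take the before-minus-after differences and apply the one-sample T formula to them. The Python sketch below follows that approach; the variable names are illustrative and the critical value 3.499 is the one given in the example.

```python
from math import sqrt
from statistics import mean, stdev

weight_before = [209, 178, 169, 212, 180, 192, 158, 180]
weight_after = [196, 171, 170, 207, 177, 190, 159, 180]

# Paired differences (before - after); a positive value means weight was lost.
differences = [b - a for b, a in zip(weight_before, weight_after)]
n = len(differences)

# T-computed = mean difference / (standard deviation of differences / sqrt(n)).
t_computed = mean(differences) / (stdev(differences) / sqrt(n))
critical_value = 3.499  # as quoted in the example, 7 degrees of freedom

print(differences)                    # [13, 7, -1, 5, 3, 2, -1, 0]
print(round(t_computed, 2))           # 2.07
print(abs(t_computed) > critical_value)  # False, so Ho is not rejected
```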
B. Independent
2. Some statistics students complain that pocket calculators give other students an advantage during statistics examinations. To check this contention, a simple random sample of 45 students were randomly assigned to two groups, 23 to use calculators and 22 to perform calculations by hand. The students then took a statistics examination that required a modest amount of arithmetic. The results are shown below:
With Calculator
85 86 89 84 82 83 90 91 86 90 87 87 92 85 86 89 88 88 89 90 85 89 90
Without Calculator
86 88 90 92 86 85 88 89 85 91 86 85 92 84 83 88 90 91 86 90 86 87
Do the data provide sufficient evidence to indicate that the students taking this particular examination obtain higher scores when using a calculator? Test at α = 10%.
a. Ho: The mean scores are equal.
Ha: The mean scores are not equal.
b. Decision rule: Reject Ho if T-computed > critical value, otherwise accept Ho.
c. Test Statistic: T-test on Two Populations
d. Computation: T-computed = 0.25   Critical value = 1.303
e. Decision: Accept Ho.
f. Conclusion: At the 10% level of significance, there is not enough evidence to say that the use of calculators will assure students of higher scores.
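The computed T for the independent-samples problem above can be reproduced in Python. The handout does not say which formula was used; the sketch below assumes the pooled-variance (equal variances) version of the two-sample T test, and the function name pooled_t is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

with_calculator = [85, 86, 89, 84, 82, 83, 90, 91, 86, 90, 87, 87,
                   92, 85, 86, 89, 88, 88, 89, 90, 85, 89, 90]
without_calculator = [86, 88, 90, 92, 86, 85, 88, 89, 85, 91, 86,
                      85, 92, 84, 83, 88, 90, 91, 86, 90, 86, 87]


def pooled_t(sample_a, sample_b):
    """Two-sample T statistic using a pooled estimate of the common variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_variance = ((n_a - 1) * variance(sample_a)
                       + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


t_computed = pooled_t(with_calculator, without_calculator)

# The sign is negative because the group without calculators has the higher mean;
# its absolute value agrees with the 0.25 reported in the example.
print(round(t_computed, 2))
print(abs(t_computed) > 1.303)  # False, so Ho is not rejected
```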
ANOVA
3. A study was conducted to compare three teaching methods. Three groups of 6 students were chosen and each group was subjected to one of the three teaching methods. The grades of the students taken at the end of the semester are given as:
            Group I    Group II   Group III
            Method A   Method B   Method C
Student 1   84         70         90
Student 2   90         75         95
Student 3   92         90         100
Student 4   96         80         98
Student 5   84         75         88
Student 6   88         75         90
a. Ho: The three teaching methods are equal.
Ha: The three teaching methods are not equal.
b. Decision rule: Reject Ho if F-computed > critical value, otherwise accept Ho.
c. Test Statistic: F-test ANOVA
d. Computation: F-computed = 13.121   Critical value = 3.68
e. Decision: Reject Ho.
f. Conclusion: There is evidence to say that the three methods are not equal. Method C (Group III) also appears to be the most effective, since its students got higher grades compared to the other two methods.
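The F-computed value in the teaching-methods example can be reproduced by splitting the variation into a between-groups part and a within-groups part. The Python sketch below shows one way to do this; the function name one_way_anova_f is illustrative, and the critical value 3.68 is the one quoted in the example.

```python
from statistics import fmean

method_a = [84, 90, 92, 96, 84, 88]
method_b = [70, 75, 90, 80, 75, 75]
method_c = [90, 95, 100, 98, 88, 90]


def one_way_anova_f(groups):
    """F = (between-groups mean square) / (within-groups mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


f_computed = one_way_anova_f([method_a, method_b, method_c])

print(round(f_computed, 3))  # 13.121
print(f_computed > 3.68)     # True, so Ho is rejected
```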
Chi-Square Test of Independence
4. It is believed that people with high blood pressure need to watch their weight. A random sample of 300 subjects was classified according to their weight and blood pressure. At the 5% level of significance, is there sufficient evidence to conclude that a person's weight is related to his blood pressure?
               Blood Pressure
Weight         High   Normal   Low
Overweight     40     34       18
Normal         36     77       27
Underweight    16     33       19
a. Ho: Weight is independent of blood pressure, or weight is unaffected by blood pressure, or the two variables weight and blood pressure are independent.
Ha: Weight is dependent on blood pressure, or weight is affected by blood pressure, or the two variables weight and blood pressure are dependent.
b. Decision rule: Reject Ho if χ²-computed > critical value, otherwise accept Ho.
c. Test Statistic: Chi-square Test
d. Computation: χ²-computed = 12.75   Critical value = 9.49
e. Decision: Reject Ho.
f. Conclusion: At the 5% level of significance, there is evidence to say that weight is affected by blood pressure. For overweight persons, most of them (approximately 40% of the actual population) will have high blood pressure. Persons of normal weight are most likely to have normal blood pressure. Those who are underweight are also most likely to have normal blood pressure.
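The chi-square value in the blood pressure example comes from comparing each observed count with the count expected under independence, (row total × column total) / grand total. A minimal Python sketch is given below; the function name chi_square_statistic is illustrative, and the critical value 9.49 (4 degrees of freedom) is the one quoted in the example.

```python
# Observed counts: rows are weight groups, columns are High, Normal, Low blood pressure.
observed = [
    [40, 34, 18],  # Overweight
    [36, 77, 27],  # Normal weight
    [16, 33, 19],  # Underweight
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (count - expected) ** 2 / expected
    return chi_square


chi_square_computed = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square_computed, 2))  # 12.75
print(degrees_of_freedom)             # 4
print(chi_square_computed > 9.49)     # True, so Ho is rejected
```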
Exercise # 9 – Test of Hypothesis (T-test, ANOVA and Chi-Square Test)
Objectives: At the end of the exercise, the student is expected to be able to apply the appropriate statistical procedure in performing tests of hypothesis for various problems.
Carry out a complete test of hypothesis for the following problems.
1. As part of a study to determine the effects of a certain oral contraceptive on weight gain, 12 healthy females were weighed at the beginning of a course of oral contraceptive usage. They were reweighed after three months. Do the results suggest evidence of weight gain? Use α = 0.05.
Subject          1    2    3    4    5    6    7    8    9    10   11   12
Initial Weight   120  141  130  162  150  148  135  140  129  120  140  130
3-Month Weight   123  143  140  162  145  150  140  143  130  118  141  132
Source: Basic Statistics for Health Sciences by Kuzma
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
d. Computation: Computed value = 1.75   Critical value = 2.201
e. Decision:
f. Conclusion:
2. An investment analyst claims to have mastered the art of forecasting the price changes of gold. The following table gives the actual gold price changes and the changes forecasted by the investment analyst (in %) on a simple random sample of 8 months. Use α = 5%.
Month                  1     2      3    4     5    6     7     8
Actual Price Changes   7.3   -2.1   8.5  -1.5  9.2  6.7   -4.8  -0.8
Forecasted Changes     14.9  -19.7  7.0  -5.3  1.0  -0.8  -8.3  6.7
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
d. Computation: Computed value = 1.15   Critical value = 2.365
e. Decision:
f. Conclusion:
3. Four groups of 4 patients each were subjected to four different types of treatment for the same ailment. The following data are the number of days that elapsed before they were completely cured. What conclusions may be drawn about the four types of treatment?
            Treatment A   Treatment B   Treatment C   Treatment D
Patient 1   10            11            3             6
Patient 2   9             11            4             10
Patient 3   6             18            5             8
Patient 4   7             6             7             11
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
d. Computation: Computed value = 3.474   Critical value = 3.49
e. Decision:
f. Conclusion:
4. Test if there is a significant association between academic performance and IQ.
Table. Academic Performance and IQ of 100 Students
                       IQ
Academic Performance   High   Average   Low   Total
Passed                 31     45        4     80
Failed                 1      4         15    20
Total                  32     49        19    100
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
d. Computation: Computed value = 51.25   Critical value = 5.99
e. Decision:
f. Conclusion:
Lesson # 11 - TWO-FACTOR ANOVA
Example 1. A research study was conducted to examine the impact of eating a high protein breakfast on adolescents' performance during a physical education fitness test. Half of the subjects received a high protein breakfast and half were given a low protein breakfast. All of the adolescents, both male and female, were given a fitness test with high scores representing better performance. Test scores are recorded below.
Males                         Females
High Protein   Low Protein    High Protein   Low Protein
10             5              5              3
7              4              4              4
9              7              6              5
6              4              3              1
8              5              2              2
Statistical test results:
Treatment                                              F-value   F-critical (5%)   F-critical (1%)
between (protein level)                                *8.89     4.49              8.53
within (gender)                                        *20.00    4.49              8.53
among (interaction between protein level and gender)   2.22      4.49              8.53
Ho: There is no difference in performance between the two protein levels.
There is no difference in performance between the two genders.
There is no interaction between protein level and gender.
Interpretation:
At the 5% level of significance it can be concluded that there is a significant difference in performance for both protein level and gender. There was no significant interaction effect. Based on these data, it appears that a higher protein diet results in better fitness test scores. Additionally, young men seem to have significantly higher fitness test scores than young women.
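The three F-values reported for the protein and gender example can be reproduced from the raw scores. The Python sketch below assumes a balanced two-factor design with five scores per cell, which matches the data of this example; the function name two_way_anova and the dictionary layout are illustrative.

```python
from statistics import fmean

# Scores by (gender, protein level), five subjects per cell.
cells = {
    ("male", "high"): [10, 7, 9, 6, 8],
    ("male", "low"): [5, 4, 7, 4, 5],
    ("female", "high"): [5, 4, 6, 3, 2],
    ("female", "low"): [3, 4, 5, 1, 2],
}


def two_way_anova(cells):
    """Return F-values for both factors and their interaction (balanced design)."""
    levels_a = sorted({a for a, _ in cells})   # first factor (gender)
    levels_b = sorted({b for _, b in cells})   # second factor (protein level)
    n = len(next(iter(cells.values())))        # scores per cell
    grand = fmean(x for scores in cells.values() for x in scores)
    cell_mean = {key: fmean(scores) for key, scores in cells.items()}
    mean_a = {a: fmean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: fmean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum((cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a in levels_a for b in levels_b)
    ss_error = sum((x - cell_mean[key]) ** 2
                   for key, scores in cells.items() for x in scores)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "factor_a": (ss_a / df_a) / ms_error,
        "factor_b": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / (df_a * df_b)) / ms_error,
    }


f_values = two_way_anova(cells)
print(round(f_values["factor_b"], 2))     # 8.89  (protein level)
print(round(f_values["factor_a"], 2))     # 20.0  (gender)
print(round(f_values["interaction"], 2))  # 2.22
```

Each F-value is then compared with the F-critical value for 1 and 16 degrees of freedom shown in the table of results.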
Seatwork:
1. Different typing skills are required for secretaries depending on whether one is working in a law office, an accounting firm, or for a mathematical research group at a major university. In order to evaluate candidates for these positions, an employment agency administers three distinct standardized typing samples. A time penalty has been incorporated into the scoring of each sample based on the number of typing errors. The mean and standard deviation for each test, together with the score achieved by a recent applicant, are given in the table below. For what type of position does this applicant seem to be best suited?
Sample       Applicant's Score   Mean      Standard Deviation
Law          141 sec             180 sec   30 sec
Accounting   7 min               10 min    2 min
Scientific   33 min              26 min    5 min
2. Researchers have sought to examine the effect of various types of music on agitation levels in patients who are in the early and middle stages of Alzheimer's disease. Patients were selected to participate in the study based on their stage of Alzheimer's disease. Three forms of music were tested: easy listening, Mozart, and piano interludes. While listening to music, agitation levels were recorded for the patients, with a high score indicating a higher level of agitation. Scores are recorded below.
Early Stage Alzheimer                         Middle Stage Alzheimer
Piano Interlude   Mozart   Easy Listening     Piano Interlude   Mozart   Easy Listening
21                9        29                 22                14       15
24                12       26                 20                18       18
22                10       30                 25                11       20
18                5        24                 18                9        13
20                9        26                 20                13       19
3. A study examining differences in life satisfaction between young adult, middle adult and older adult men and women was conducted. Each individual who participated in the study completed a life satisfaction questionnaire. A high score on the test indicates a higher level of life satisfaction. Test scores are recorded below.
         Male                                        Females
         Young Adult   Middle Adult   Older Adult    Young Adult   Middle Adult   Older Adult
         4             7              10             7             8              10
         2             5              7              4             10             9
         3             7              9              3             7              12
         4             5              8              6             7              11
         2             6              11             5             8              13
Mean =   3             6              9              5             8              11
Lesson # 12 – Pearson Moment Correlation
Pearson Moment is one of the measures of correlation; it quantifies the strength as well as the direction of the relationship between two variables. The correlation coefficient (r) has the following interpretation:
Scale (+/-)    Decision
1.00           Perfect Relationship
0.80 – 0.99    Very Strong Relationship
0.60 – 0.79    Strong Relationship
0.40 – 0.59    Moderate Relationship
0.20 – 0.39    Weak Relationship
0.01 – 0.19    Very Weak Relationship
0.00           No Relationship
Table. AdNU Entrance Examination Results of 20 Examinees
No.   SAI   RPM   Math   English
1     52    25    47     21
2     84    40    48     11
3     113   90    58     29
4     92    90    47     14
5     98    80    54     17
6     91    80    56     19
7     52    15    52     18
8     116   40    68     38
9     101   60    69     22
10    83    15    48     16
11    65    10    52     16
12    96    95    54     19
13    94    80    54     15
14    89    65    56     20
15    91    45    54     21
16    92    80    64     17
17    101   95    58     33
18    97    95    56     17
19    89    80    56     11
20    96    95    58     27
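The handout does not print the formula for r, so the sketch below assumes the usual product-moment definition: the sum of the products of deviations divided by the square root of the product of the sums of squared deviations. It computes r for the Math and English columns of the table and reads off the interpretation from the scale above. The function names pearson_r and interpret are illustrative.

```python
from math import sqrt

# Math and English scores of the 20 examinees, in the order of the table.
math_scores = [47, 48, 58, 47, 54, 56, 52, 68, 69, 48,
               52, 54, 54, 56, 54, 64, 58, 56, 56, 58]
english_scores = [21, 11, 29, 14, 17, 19, 18, 38, 22, 16,
                  16, 19, 15, 20, 21, 17, 33, 17, 11, 27]


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


def interpret(r):
    """Map r to the decision in the scale above, using |r| rounded to two decimals."""
    size = round(abs(r), 2)
    if size >= 1.00:
        return "Perfect Relationship"
    if size >= 0.80:
        return "Very Strong Relationship"
    if size >= 0.60:
        return "Strong Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    if size >= 0.20:
        return "Weak Relationship"
    if size >= 0.01:
        return "Very Weak Relationship"
    return "No Relationship"


r = pearson_r(math_scores, english_scores)
print(round(r, 2), interpret(r))
```

The sign of r gives the direction of the relationship, while its absolute value is what the scale interprets.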