
Course Notes Statistics

Date post: 08-Nov-2014
Upload: sam-cinco
Elementary Statistics


• Defining What Statistics Really Is

1.1 Nature of Statistics

The term “Statistics” comes from the Latin word ‘status’, which can be translated as ‘state’. The term became popular only during the 18th century, when Statistics was defined as “the science of dealing with data about the condition of a state or community”. The practice of statistics can be traced back to early biblical times, when figures related to the governance of the state were gathered because their importance in governing the people was recognized.

Today, governments worldwide have intensified their data gathering and widened the scope of their numerical figures, owing to the rise of more cost-efficient methods for collecting data. Some of the most popular figures released by almost all countries are the Gross National Product (GNP), birth rates, mortality rates, the unemployment rate, literacy rates and foreign currency exchange rates.

The use of Statistics is not limited to government. Today, almost all business sectors and fields of study use statistics. Statistics guides their decision making and helps them arrive at sound actions supported by the analysis of their available information.

Indicated below are some of the uses of Statistics in various fields:

Medicine: Medical researchers use statistics in testing the feasibility or the efficacy of newly developed drugs. Statistics is also used to understand the spread of diseases and to study their prevention, diagnosis, prognosis and treatment.




Statistics is the branch of science that deals with the collection, presentation, organization, analysis and interpretation of data.

1.2 Basic Concepts

Economics: Statistics helps economists analyze international and local markets by

estimating some Key Performance Indicators (KPI) such as unemployment rate,

GNP/GDP, amount of exports and imports. It is also used to forecast economic

fluctuations and trends.

Market Research: Market researchers derive statistics by conducting surveys and come up with decisions

from these statistics through feasibility studies or for testing the marketability of a new


Manufacturing: Manufacturers use statistics to assure the quality of their products by sampling and testing some of their outputs.

Accounting/Auditing: Accountants and auditors use sampling techniques in statistics to examine and check

their financial books.

Education: Educators use statistical methods to determine the validity and reliability of their testing procedures and to evaluate the performance of teachers and students.

We normally hear the word “statistics” when people talk about basketball or the vital statistics of beauty contestants. In this context the word “statistics” is used in the plural form and simply means numerical figures. But the field of Statistics is not limited to these simple figures and to archiving them. In the context of this course, “Statistics” is the study of the theory and applications of scientific methods for dealing with data and making sound decisions based on them.

Sometimes, gathering data from the entire collection of elements is tedious, expensive or time-consuming. Because of this, data gatherers sometimes resort to collecting just a portion of the entire collection of elements. The entire collection of elements is called the Population, while a subset of the population is referred to as a Sample.



Population is the collection of all elements under consideration in a statistical inquiry while the sample is a subset of a population.

The variable is a characteristic or attribute of the elements in a collection that can assume different values for the different elements. An observation is a realized value of the variable, and the collection of these observations is called the data.

THINK: Could you say that the entire population is also a sample?

The specification of the population of interest depends upon the scope of the study.

Suppose we wish to know the average expenditure of all households in Metro Manila. Then the population of interest is the collection of all households in Metro Manila. If there is a need to delimit the scope of the study due to some constraints, we could redefine the population of interest. We could delimit the scope of the study to only a specific city in Metro Manila. The study would then include only the collection of all households in ________


The elements of the population are not limited to individuals; they can be objects, animals, geographical areas, in other words, almost anything. Some examples of possible

populations are: the set of laborers in a certain manufacturing plant, the set of foreigners

residing on Boracay for a certain day, set of Ford Fiesta produced in the entire Philippines on a


In any study involving the use of Statistics, there is at least one attribute of the elements in the population that we study. This attribute or characteristic is what we call a variable. Just as in Mathematics, we normally denote a variable with a single capital letter, e.g. A, X, Z.

Example: The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Metro Manila in 2006.

Population: Set of all children below 12 years old in Metro Manila in 2006

Variable of Interest: whether or not the child has ever been infected by the Hepatitis B virus.

Possible Observations: Infected, Never Infected

Regardless of whether the data come from the entire population or from a sample, it is often difficult to convey the meaning of the observations if they are not summarized. This is the



The parameter is a summary measure describing a specific characteristic of a population while a statistic is a summary measure describing a specific characteristic of the sample.

1.3 Fields of Statistics

Descriptive Statistics includes all the techniques used in organizing, summarizing, and presenting the data on hand, while Inferential Statistics includes all the techniques used

in analyzing the sample data that will lead to generalizations about the population from which the sample came.

reason why it is important to condense these observations into a single figure that describes the entire data. This condensed value is what we call a summary measure.
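The distinction between a parameter and a statistic can be illustrated with a small sketch. The household expenditure figures below are hypothetical and were made up for this illustration; the mean is used here as the summary measure, although other summary measures would work the same way.

```python
import random
import statistics

# Hypothetical population: monthly expenditure (in pesos) of 10 households.
population = [12000, 15500, 9800, 20100, 17300,
              11250, 14400, 18900, 13050, 16700]

# Parameter: a summary measure computed from every element of the population.
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from a sample only.
rng = random.Random(2014)            # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)   # 4 households chosen without replacement
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

The parameter is a single fixed number for the population, while the statistic depends on which households happen to be in the sample.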

There are two major fields in Statistics. The first is (i) Applied Statistics, which deals

mainly with the procedures and techniques used in the collection, presentation, organization,

analysis and interpretation of data. On the other hand, the second one is (ii) Mathematical

Statistics, which is concerned with the development of the mathematical foundations of the

methods used in Applied Statistics.

In this course, we will mostly deal with the basics of Applied Statistics. This field can be subdivided into two areas of interest, Descriptive and Inferential Statistics. Both names are self-explanatory.

To clarify, we may use descriptive statistics for population data or sample data. If we

are dealing with population data, then the results of the study are applicable only to the

defined population. In the same manner, if we apply descriptive statistics to sample data, then

the conclusions are applicable only to the selected sample.



1.4 Statistical Inquiry

Statistical Inquiry is a designed research that provides information needed to solve a research problem.

Researchers can now often find an appropriate statistical technique that will help them answer their research problems. This is because of the wide array of applications of the various statistical techniques used in a statistical inquiry. Below is the diagram depicting the entire process of statistical inquiry:


Step 1: Identify the Problem
Step 2: Plan the Study
Step 3: Collect the Data
Step 4: Explore the Data
Step 5: Analyze Data and Interpret the Results
Step 6: Present the Results



• Theory without data is just an Opinion

2.1 Measurement

Measurement is the process of determining the value or label of the variable based on what has been observed.

Ratio level of measurement has all of the following properties:

a) the numbers in the system are used to classify a person/object into distinct, nonoverlapping, and exhaustive categories;
b) the system arranges the categories according to magnitude;
c) the system has a fixed unit of measurement representing a standard size throughout the scale; and
d) the system has an absolute zero.

The data used for statistical analysis should always be accurate, complete, and up-to-date, because the information that we get is only as good as the data that we have. Good quality data comes at a cost, but if it assures us of obtaining essential information that answers our research problem, then it is worth it.

Naturally, our interpretation of the values in our data will depend on the measurement

system or the rule that we used to assign the values to the different categories of the variable.

In particular, it will depend on the relationship among the values used in the system. The

general classification used to describe the types of relationship among these values or

categories is what is known as “levels of measurement”.

The four levels of measurement are nominal, ordinal, interval and ratio level. It is

necessary to know the level of measurement used to measure a variable because this will help

in the interpretation of the values of the variables and choosing the suitable statistical

technique to use in the analysis.



Interval Level of Measurement satisfies only the first three conditions of the ratio level of measurement.

Some examples of variables with ratio level of measurement are:

1. Distance traveled by a car (in kilometres)
2. Height of a flag pole (in metres)
3. Weight of a whole dressed chicken (in kilograms)

Now we will discuss each of the properties that a measuring scale must have in order to be considered as having a ratio level of measurement:

a) The numbers in the system are used to classify a person/object into distinct, nonoverlapping, and exhaustive categories.

This first condition requires that we use categories that place each observation logically into one and only one category. This means that two objects assigned the same value must belong to the same category, and two objects must be placed in different categories if their characteristic of interest really differs.

b) The system arranges the categories according to magnitude.

This second property requires that the measurement system must arrange the

categories according to either ascending or descending order.

c) The system has a fixed unit of measurement representing a standard size

throughout the scale.

The third property requires the scale to use a unit of measure that depicts a fixed and

determinate quantity. This means that a one-unit difference must have the same

interpretation wherever it appears in the scale.

d) The system has an absolute zero.

The fourth property requires the measurement system to have an absolute zero or the

true zero point. This means that the scale considers the value, “0” (zero) as the complete

absence of the characteristic itself. One example of this is any monetary measurement where

zero means that there is absolutely no money.

The only difference between the interval level of measurement and the ratio level of measurement is the absence of an absolute zero. This means that the interval level of measurement treats “0” (zero) as a value like any other number and not as the absence of



Ordinal Level of Measurement satisfies only the first two conditions of the ratio level of measurement.

Nominal Level of Measurement satisfies only the first property of the ratio level of measurement.

2.2 Collecting Data

2.2.1 Data Collection Methods

the characteristic of interest. The most common example of this is measuring temperature in

Celsius or Fahrenheit where the value “zero” does not mean that there is no temperature.
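One practical consequence of the missing absolute zero can be checked numerically: ratios of interval-level values change when the unit changes, while ratios of ratio-level values do not. The sketch below is only an illustration; the temperatures and distances are made-up values, and the conversion helpers are written here for the example.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def km_to_miles(km):
    """Convert a distance from kilometres to miles (approximate factor)."""
    return km * 0.621371

# Interval level: 20 degrees C looks like "twice" 10 degrees C,
# but the ratio is not preserved once we switch to Fahrenheit.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio level: 20 km is twice 10 km in any unit, because 0 means "no distance".
ratio_km = 20 / 10
ratio_mi = km_to_miles(20) / km_to_miles(10)

print("Temperature ratio in Celsius:   ", ratio_c)
print("Temperature ratio in Fahrenheit:", round(ratio_f, 2))
print("Distance ratio in km:   ", ratio_km)
print("Distance ratio in miles:", round(ratio_mi, 2))
```

This is why statements such as "twice as hot" are not meaningful for Celsius or Fahrenheit readings, while "twice as far" is meaningful for distances.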

The ordinal level of measurement uses a scale that only ranks or orders the observed values in either ascending or descending order. The interval, or simply the difference between one point of the scale and the next, does not need to be equal throughout the scale. For example, the ranks of students in a class according to their grades could be tagged as 1st, 2nd, 3rd, 4th and so on. The gap in grades between the 1st and 2nd placers does not need to be the same as the gap between the 4th and 5th placers.

The nominal level of measurement is the weakest level of measurement among the

four. This is because its only aim is to classify the values into separate categories without regard to the ordering of these categories in an ascending or descending manner. Most often, this level of measurement uses non-quantifiable categories such as religion, zip code or student number.

The most commonly used methods for collecting data are: (i) Use of Documented Data, (ii) Surveys, (iii) Experiments, and (iv) Observation.

Use of Documented Data

It is not necessary to use original data in conducting studies; sometimes it is easier for the researcher to use data that are already available, if suitable data exist for the study.

The main concern with using documented data is its reliability and veracity. Therefore, the researcher must look closely at the source of the data to gauge the reliability



Primary Data are data documented by the primary source, meaning, the data collectors

themselves documented the data.

Secondary Data are data documented by a secondary source, meaning, an individual/agency, other than the data collectors, documented the data.

The Survey is a method of collecting data on the variable/s of interest by asking people questions. When the data come from asking all the people in the population, it is called a census. On the other hand, when the data come from asking a sample of people selected from a well-defined population, it is called a sample survey.

The Experiment is a method of collecting data where there is direct human intervention

on the conditions that may affect the values of the variable of interest.

The Observation Method is a method of collecting data on the phenomenon of interest

by recording the observations made about the phenomenon as it actually happens.

of the data that would be used. Documented data can be categorized into two types: primary data and secondary data.


Another common method of collecting data is the survey. The people who answer the questions in a survey are called the respondents. This method is much more expensive than using documented data. Another problem with surveys is that the reliability of the data depends mainly on the survey process itself: the respondents, the survey design, the questionnaire or, in a personal interview, an interviewer who lacks training.


If the researcher is interested in a cause-and-effect relationship, conducting an experiment is most likely the suitable way of collecting data. A common experiment conducted at the primary school level is the mongo seed experiment. The aim of this experiment is to see how the growth of the mongo relates to sunlight exposure, amount of water and type of soil.

Observation Method


JDEUSTAQUIO

Type of Questions

2.2.2 The Questionnaire

The observation method is useful in studying the reactions and behavior of individuals or

groups of persons/objects in a given situation or environment as it happens. For example, a researcher may use the observation method to study the behavior patterns of an indigenous tribe, which are difficult to capture using the other methods.

The questionnaire is a measuring instrument used in various data collection methods (most commonly in surveys). The questionnaire may be either self-administered or interview-based; both names are self-explanatory.

A Closed-ended question is a type of question that includes a list of response categories from which the respondent will select his/her answer.

An Open-ended question is a type of question that does not include response categories.

Comparison of Open-Ended and Closed-Ended Questions

Advantages of Open-Ended Questions
• Respondent can freely answer
• Can elicit feelings and emotions of the respondent
• Can reveal new ideas and views that the researcher might not have considered
• Good for complex issues
• Good for questions whose possible responses are unknown
• Allow respondents to clarify answers
• Get detailed answers
• Shows how respondents think

Advantages of Closed-Ended Questions
• Facilitates tabulation of responses
• Easy to code and analyze
• Saves time and money
• High response rate since it is simple and quick to answer
• Response categories make questions easy to understand
• Can repeat the study and easily make comparisons

Disadvantages of Open-Ended Questions
• Difficult to tabulate and code
• High refusal rate because it requires more time and effort from the respondent
• Respondents need to be articulate
• Responses can be inappropriate or vague
• May threaten respondent
• Responses have different levels of detail

Disadvantages of Closed-Ended Questions
• Increases respondent burden when there are too many or too limited response categories
• Biases responses against categories excluded from the choices
• Difficult to detect if the respondent misinterpreted the question

Response Categories for Closed-ended Questions

1. Two-way Question – provides only two alternative answers from which the respondent can choose.

Example: Have you ever traveled outside the country by any means of transportation?

Yes No

2. Multiple-choice Question – provides more than two alternatives from which the

respondent can only choose one.

Example: What is your marital status?

Never Married Married

Divorced/Separated Widowed

3. Checklist Question – provides more than two alternatives from which the respondent can choose as many responses as apply to him/her.

Example: What kind/s of novel do you like to read?

Comedy Horror

Romance Non-fiction

Fantasy Mystery

Sci-Fi Others, please specify ____________



4. Ranking Question – provides categories that respondents have to either arrange from

highest to lowest or vice versa depending upon a particular criterion.

Example: Below is a list of considerations in choosing and buying a new laptop. Put the number (1) beside the consideration that you prioritize the most, (2) beside your second priority, and so on.

Price [ ]

Brand [ ]

Quality [ ]

Durability [ ]

Style [ ]

Novelty [ ]

Warranty [ ]

5. Rating Scale Question – provides a graded scale showing all possible directions and

intensity of attitude of a respondent on a particular question or statement.

Example: How satisfied are you with the teaching method of your instructor in this course?

1 = Very Dissatisfied    2 = Dissatisfied    3 = Neutral    4 = Satisfied    5 = Very Satisfied

6. Matrix Question – a type of question that places several questions together to save space in the questionnaire. It is like taking any of the five earlier types of questions and fitting more than one question into a table.

Example: For each statement, please indicate with a checkmark whether you agree or disagree with it.

Statements Agree Disagree

Statistics is a very difficult subject

Only few people could understand Statistics

I would rather sleep than study Statistics at home


Pitfalls to Avoid in Wording Questions

1. Avoid Vague Questions – State all questions clearly. All respondents must interpret a question in the same way. If not, their answers will not be comparable, making it difficult to analyze their responses.

Example: How often do you watch a movie in a movie theatre?

Very Often


Not too often


Problem: The word “often” is vague. Instead, you may ask how many times he/she watched a movie in a movie theatre last month.

2. Avoid Biased Questions – A biased question influences the respondents to choose a particular response over the other possible responses. Whether the bias is introduced accidentally or intentionally, the data become useless because they fail to reveal the truth.

Example: There are many different types of sport like badminton, basketball, billiards, bowling and tennis. Which type of sport do you enjoy watching?

Problem: The sports mentioned in the first sentence will be at the top of the respondents’ minds, so respondents are likely to choose from among these sports. This will result in a bias against the sports not mentioned in the list.

3. Avoid Confidential and Sensitive Questions – These questions usually offend the pride or

jeopardize the prestige of the respondent.

Example: Do you bring home office supplies? If yes, how often do you bring home office supplies?

Problem: The question may sound offensive to the pride of the respondent.

4. Avoid Questions that are difficult to answer – Do not ask questions that are too difficult for

the respondent to answer truthfully. Such questions would only encourage respondents to guess their answers, if not totally refuse to answer the question.

Example: If you are the president of the nation, what are you going to do to attain economic recovery?



2.3 Sampling and Sampling Techniques

2.3.1 Basic Concepts

The target population is the population we want to study

The sampled population is the population from where we actually select the sample

5. Avoid Questions that are confusing or perplexing to answer – Sometimes a poorly written question can confuse the respondent on how to answer the question

Example: Did you eat out and watch a movie last weekend?

Problem: This is a double-barreled question, which combines two or more questions into a single question. Separate it into two questions to avoid confusion.

6. Keep the Questions short and simple – Long and complicated questions can be difficult to understand. The respondent may lose interest in a question because of its length, or may have trouble comprehending the very long statement needed to understand it.

As we discussed in Chapter 1, a sample is a subset of a population. Some people ask: if we base our analysis only on samples, why not simply guess the results without any data?

This question can be partially answered by a quote from Charles Babbage, often called the Father of the Computer, who said that “Errors using inadequate data are much less than those using no data at all”.

So now, before we can talk about the different sampling selection procedures, we need to

familiarize ourselves first with some terms.

Ideally, the target and the sampled population have the same collection of elements. The problem is that, often, expectations do not jibe with reality. One example where the

target and the sampled population would be different from each other is the case where the target

population is the collection of all the residents of Metro Manila. If we would be using a telephone

directory to select our sample, this collection would be very different from the target population since

this would exclude all the residents that have no landline.



The sampling frame or frame is a list or map showing all the sampling units in the


Sampling error is the error attributed to the variation present among the computed values of the statistic from the different possible samples consisting of n elements.

Nonsampling error is the error from sources other than sampling fluctuations.

In any statistical inquiry, whether the data will come from a census or from a sample, it is

important that we are conscious of all the possible errors that we introduce (hopefully not intentionally)

in the results of the study. In order for us to do this and reduce these errors, we need to understand the

possible sources of errors, namely, the sampling errors and the nonsampling errors.

Note that the ONLY TIME that sampling error is not present is when we conduct a census. However, census results are NOT ERROR-FREE. Censuses and samples can both have nonsampling errors (errors that do not arise from sampling itself).
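To make the idea of sampling error concrete, the sketch below lists every possible sample of n elements from a tiny hypothetical population and computes the sample mean of each. The population values are made up, and samples are drawn without replacement with order ignored; both are assumptions of this illustration.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population, N = 5
parameter = mean(population)    # population mean

def all_sample_means(values, n):
    """Return the mean of every possible sample of n distinct elements."""
    return [mean(sample) for sample in combinations(values, n)]

# With n = 2 there are 10 possible samples, and their means vary
# around the parameter. This variation is the sampling error.
means_n2 = all_sample_means(population, 2)

# With n = N (a census) there is only one possible "sample",
# so the statistic always equals the parameter.
means_census = all_sample_means(population, len(population))

print("Parameter:", parameter)
print("Sample means for n = 2:", sorted(means_n2))
print("Sample means for a census:", means_census)
```

The spread of the n = 2 means shows that two researchers using the same design can obtain different results purely by chance, while the census has no such variation. Nonsampling errors, such as a faulty questionnaire, are not modelled here and would affect both cases.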

Diagram of the Various Sources of Error

Total Error
    Nonsampling Error
        Error in the implementation of the sampling design
            Selection Error
            Frame Error
            Population Specification
        Measurement Error
            Instrument Error
            Response Error
            Processing Error
            Interviewer Bias
            Surrogate Information
    Sampling Error



2.3.2 Methods of Probability Sampling

Probability Sampling is a method of selecting a sample wherein each element in the population has a known, nonzero chance of being included in the sample; otherwise, it is a nonprobability sampling method.

Simple Random Sampling

Simple Random Sampling (SRS) is a probability sampling method wherein all possible subsets consisting of n elements selected from the N elements of the population have the same chances of selection.

In simple random sampling without replacement (SRSWOR), all the n elements in the sample must be distinct from each other.

In simple random sampling with replacement (SRSWR), the n elements in the sample need not be distinct, that is, an element can be selected more than once as a part of the sample.

A nonzero chance of inclusion means that the sampling procedure must give all the elements of the sampled population an opportunity of being a part of the sample. All of

the elements that belong in the sampled population must be included in the selection


Another requirement of probability sampling is that we should be able to determine the chance that an element will be included in the selected sample. Take note that the inclusion probabilities of the elements in the sampled population need not be equal to each other.


The most familiar example of SRSWOR that we see every day in mass media is the national lottery, where the numbers drawn must be distinct and every number should have an equal chance of being selected in the draw.
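The difference between SRSWOR and SRSWR can be sketched with Python's standard `random` module. The lottery range of 1 to 42 and the sample size of 6 are assumptions made for this illustration only.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 43))    # hypothetical lottery numbers 1 to 42
n = 6                              # sample size

# SRSWOR: all n elements are distinct, like numbers drawn in a lottery.
srswor = rng.sample(population, n)

# SRSWR: each draw is made from the full population,
# so the same element can be selected more than once.
srswr = rng.choices(population, k=n)

print("Without replacement:", sorted(srswor))
print("With replacement:   ", sorted(srswr))
```

`random.sample` never repeats an element, while `random.choices` may or may not produce repeats in any particular run.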


Stratified Sampling

Stratified sampling is a probability sampling method where we divide the population into nonoverlapping subpopulations or strata, and then select one sample from each stratum. The sample consists of all the samples in the different strata.

Visual representation of Simple Random Sampling without Replacement.

Stratified sampling, in general, simply requires the division of the population into

nonoverlapping strata, wherein each element of the population needs to belong to exactly one

stratum. A sample is then selected from each stratum using any probability sampling method. If simple random sampling is used to select the sample in each stratum, then the method is called stratified random sampling.



Visually, it might look something like the image below. With our population, we can easily

separate the individuals by color.

Once we have the strata determined, we need to decide how many individuals to select from

each stratum. The most common practice is that the number selected should be proportional.

In our case, 1/4 of the individuals in the population are blue, so 1/4 of the sample should be blue

as well. Working things out, we can see that a stratified (by color) random sample of 4 should

have 1 blue, 1 green and 2 red.
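The proportional allocation described above can be sketched as follows. The population of 12 individuals with 3 blue, 3 green and 6 red is an assumption chosen to match the proportions in the text, and the numeric labels and the helper `stratified_sample` are hypothetical.

```python
import random

rng = random.Random(7)   # fixed seed so the sketch is repeatable

# Hypothetical strata: individuals are labelled 1 to 12 and grouped by color.
strata = {
    "blue":  [1, 2, 3],
    "green": [4, 5, 6],
    "red":   [7, 8, 9, 10, 11, 12],
}

def stratified_sample(strata, n, rng):
    """Select a simple random sample from each stratum, sized proportionally.

    Assumes the proportional sizes work out to whole numbers,
    as they do in this example.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        stratum_size = n * len(members) // total   # proportional allocation
        sample[name] = rng.sample(members, stratum_size)
    return sample

sample = stratified_sample(strata, 4, rng)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)   # {'blue': 1, 'green': 1, 'red': 2}
```

When the proportional sizes are not whole numbers, some rounding rule is needed; that choice is left out of this sketch.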


Systematic Sampling

Systematic sampling is a probability sampling method wherein the selection of the first element is at random and the selection of the other elements in the sample is systematic, taking every kth element from the random start, where k is the sampling interval.

To select a sample using systematic sampling, we need to perform the following steps:

1. Decide on a method of assigning a unique serial number, from 1 to N, to each one of the

elements in the population.

2. Choose n = sample size so that it is a divisor of N = population size. Compute for the sampling

interval k = N/n.

3. Select a number from 1 to k, using a randomization mechanism. Denote the selected number by

r. The element in the population assigned to this number is the first element of the sample.

4. The other elements of the sample are those assigned to the numbers r + k, r + 2k, r + 3k, and so on, until you get a sample of size n.

5. If k = N/n is not a whole number, the first element is still the one numbered r, but r is chosen at random from 1 to N instead of from 1 to k as in step 3.

Visually, to use systematic sampling, we first need to order our individuals and then select every kth one.

In our example, we want to use 3 for k. Can you see why? Think about what would happen if we used 2 or 4.


Cluster Sampling

Cluster sampling is a probability sampling method wherein we divide the population into nonoverlapping groups or clusters consisting of one or more elements, and then select a sample of clusters. The sample will consist of all the elements in the selected clusters.

For our starting point, we pick a random number between 1 and k. For our visual, let's

suppose that we pick 2. The individuals sampled would then be 2, 5, 8, and 11.
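The systematic selection in this example can be reproduced with a short sketch. The function `systematic_sample` is a hypothetical helper written for this illustration, and it covers only the case where n divides N exactly.

```python
import random

def systematic_sample(N, n, r=None, rng=random):
    """Return the serial numbers selected by systematic sampling.

    N is the population size and n the sample size; n must divide N.
    r is the random start; if omitted, it is drawn at random from 1 to k.
    """
    if N % n != 0:
        raise ValueError("this sketch only handles the case where n divides N")
    k = N // n                      # sampling interval
    if r is None:
        r = rng.randint(1, k)       # random start between 1 and k inclusive
    return [r + i * k for i in range(n)]

# The example from the text: 12 individuals, sample of 4, random start of 2.
print(systematic_sample(12, 4, r=2))   # [2, 5, 8, 11]
```

With N = 12 and n = 4 the interval is k = 3, so only three different samples are possible, one for each random start.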

To select a sample using cluster sampling, we need to perform the following steps:

1. Divide the population into nonoverlapping clusters.

2. Number the clusters in the population from 1 to N.

3. Select n distinct numbers from 1 to N using a randomization mechanism. The selected clusters

are the clusters associated with the selected numbers

4. The sample will consist of all the elements in the selected clusters.
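The four steps above can be sketched as follows, assuming a hypothetical population of 12 individuals grouped into 6 clusters of two. The labels and the grouping are made up for this illustration.

```python
import random

rng = random.Random(3)   # fixed seed so the sketch is repeatable

# Steps 1 and 2: divide the population into nonoverlapping clusters
# and number the clusters. Cluster i holds individuals 2i - 1 and 2i.
clusters = {i: [2 * i - 1, 2 * i] for i in range(1, 7)}

# Step 3: select n = 2 distinct cluster numbers at random.
chosen = sorted(rng.sample(sorted(clusters), 2))

# Step 4: the sample consists of all the elements in the selected clusters.
sample = [element for c in chosen for element in clusters[c]]

print("Selected clusters:", chosen)
print("Sample:", sample)
```

Note that randomization is applied to the clusters, not to the individual elements; every element of a selected cluster enters the sample.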

Cluster sampling is often confused with stratified sampling, because they both involve "groups". In reality, they're very different. In stratified sampling, we split the population up into groups (strata) based on some characteristic.

In essence, we use cluster sampling when our population is already broken up into groups (clusters), and each cluster represents the population. That way, we just select a certain number of clusters.



With our visual, let's suppose the 12 individuals are paired up just as they were sitting in the original population.

Since we want a random sample of size four, we just select two of the clusters. We would number the clusters 1-6 and use technology to select two of these numbers at random. It might look something like this:


Multistage Sampling

Multistage sampling is a probability sampling method where there is a hierarchical configuration of sampling units and we select a sample of these units in stages.

Unlike all the previously presented sample selection procedures, where sampling takes place in a single phase, multistage sampling identifies the elements in the sample only after several stages of sampling. We first partition the population into non-overlapping primary stage units (PSUs) and select a sample of PSUs. We then subdivide the selected PSUs into non-overlapping second-stage units (SSUs) and select a sample of SSUs. We continue the process until we identify the elements in the sample at the last stage of sampling.

For example, consider a light-bulb example using a two-stage sampling procedure. Let's suppose that the bulbs come off the assembly line in boxes that each contain 20 packages of four bulbs each. One strategy would be to do the sampling in two stages:

Stage 1: A quality control engineer removes every 200th box coming off the line. (The plant produces 5,000 boxes daily.) (This is systematic sampling.)

Stage 2: From each box, the engineer then samples three packages to inspect. (This is an example of cluster sampling.)

2.3.3 Methods of Nonprobability Sampling

All sampling methods that do not satisfy the requirements of probability sampling are considered nonprobability sampling selection procedures. These methods do not make use of a randomization mechanism in identifying the sampling units included in the sample. They allow the researcher to choose the units in the sample subjectively. And since the sample selection is subjective, there is really no way to assess the reliability of the results without making many assumptions (remember that assumptions are very prone to mistakes).


Despite this drawback of nonprobability sampling, these methods are still more commonly used since they are less costly and easier to administer.

Here are some of the most basic nonprobability sampling selection procedures:

Haphazard or Convenience Sampling

In haphazard or convenience sampling, the sample consists of elements that are most accessible or easiest to contact. This usually includes friends, acquaintances, volunteers, and subjects who are available and willing to participate at the time of the study.

The most common example that we see on television is the text poll about a certain issue. This way of sampling people's opinions doesn’t involve a randomization mechanism in the selection of the units in the sample. It is sometimes referred to as the nonprobability counterpart of simple random sampling.

Judgement or Purposive Sampling

In judgement or purposive sampling, the elements are carefully selected to provide a “representative” sample. Studies have demonstrated that selection bias can arise even with expert choice, but the method may nevertheless be appropriate for very small samples when the expert has a good deal of information about the population elements. The two common features of the method are: a.) sampling units often consist of relatively large groups; and b.) sampling units are chosen so that they will provide accurate estimates for important control variables for which results are known for the whole population, in the hope that they will also give “good” estimates for other variables that are highly correlated with the control variables. This sampling method may be considered the nonprobability counterpart of cluster sampling.

Quota Sampling

Quota sampling is considered the nonprobability counterpart of stratified sampling. In this method, interviewers are assigned quotas of respondents of different types to interview. The quotas are sometimes chosen to be in proportion to the estimated population figures for the various types, often based on past census data. The researcher also chooses the groups or strata in the study, but the selection of the sampling units within each stratum does not make use of a probability sampling method.



2.4 Presentation of Data

After data collection, we organize and analyze the data, and then we present the results of our analysis in a form that reveals and highlights the important information that we were able to extract. Unless we do this, we will only get lost in the huge mound of numbers and labels that we have collected.

Our grade school teachers already taught us these various ways of presenting data, so why do we need to study them again?

We may be familiar with the line chart and the bar chart, but we need to learn or review the basic principles of constructing a good table and a good graph. With good data presentation, we can discover and even explore possible relationships. Poor data presentation will only mislead, deceive, and misinform. It is therefore essential that we make a conscious effort to use these different methods of presentation properly in order to maximize data description and analysis.

2.4.1 Textual Presentation

Textual Presentation of data incorporates important figures in a paragraph of text.

Textual presentation aims to direct the readers’ attention to data that need particular emphasis and to important comparisons, and to supplement a table or a chart with a narrative account.

It can also show summary measures such as the minimum, maximum, totals, and percentages. We do not need to put all figures in a textual presentation; we just have to select the most important ones that we want to focus on.

Example: The Philippine Stock Exchange composite index lost 7.19 points to 2,099.12 after trading between 2,095.30 and 2,108.47. Volume was 1.29 billion shares worth 903.15 million pesos (16.7 million dollars). The broader all share index gained 5.21 points to 1,221.34. (From: Freeman dated March 17, 2005)

When the data become voluminous, textual presentation is strongly discouraged because it becomes almost incomprehensible.



2.4.2 Tabular Presentation

Tabular Presentation of data arranges figures in a systematic manner in rows and columns.

Tabular presentation is the most common method of data presentation. It can be used

for various purposes such as description, comparison, and even showing relationships between

two or more variables of interest.

We will discuss three types of tabular presentation, namely: leader work, text tabulation, and the formal statistical table, which are categorized according to their format and


Leader Work

Leader work has the simplest layout among the three types of tables. It contains no table title or column headings and has no table borders. This table needs an introductory or descriptive statement so that the reader can understand the given figures.

The Population in the Philippines for the Census Years 1975 to 2000 is as follows:ᵃ

1975    42,070,660
1980    48,098,460
1990    60,703,206ᵇ
1995    68,616,536ᵇ
2000    76,498,735

ᵃ National Statistics Office
ᵇ The 1990 and 1995 figures include the household population, homeless population, and Filipinos in Philippine embassies and missions abroad. In addition, the census covers the institutional population found in living quarters such as penal institutions, orphanages, hospitals, military camps, etc.

As you can see, the above table would not be clear without the introductory statement. Likewise, both have no table numbers that we can use to refer to these figures. Thus, we use the leader work when there are only one or two columns of figures that we can incorporate as part of the textual presentation for a more organized presentation.

Text Tabulation

The format of text tabulation is a little bit more complex than leader work. It already

has column headings and table borders so that it is easier to understand than leader work.

However, it still has no table title or table number. Thus, it also requires an introductory statement so that readers can comprehend the given figures. Similar to leader work, we can place additional explanatory statements in a footnote.



The Population in the Philippines for the Census Years 1975 to 2000 is as follows:ᵃ

Year    No. of Filipinos (in thousands)
----    -------------------------------
1975    42,070.66
1980    48,098.46
1990    60,703.21ᵇ
1995    68,616.54ᵇ
2000    76,498.74

ᵃ National Statistics Office
ᵇ The 1990 and 1995 figures include the household population, homeless population, and Filipinos in Philippine embassies and missions abroad. In addition, the census covers the institutional population found in living quarters such as penal institutions, orphanages, hospitals, military camps, etc.

Formal Statistical Table

The formal statistical table is the most complete type of table since it has all the

different and essential parts of a table like table number, table title, head note, box head, stub

head, column headings, and so on. It could be a stand-alone table since it does not need any

accompanying texts and it could be easily understood on its own.

Heading consists of the table number, title, and head note. It is located on top of the table of figures.

i. Table number is the number that identifies the position of the table in a sequence.
ii. Table title states in telegraphic form the subject, data classification, and place and period covered by the figures in the table.
iii. Head note appears below the title but above the top cross rule of the table and provides additional information about the table.

Box head consists of spanner heads and column heads.

i. Spanner head is a caption or label describing two or more column heads.
ii. Column head is a label that describes the figures in a column.
iii. Panel is a set of column heads under the same spanner head.

Stub consists of row captions, center head, and stub head. It is located at the left side of the table.

i. Row caption is a label that describes the figures in a row.
ii. Center head is a label describing a set of row captions.
iii. Stub head is a caption or label that describes all of the center heads and row captions. It is located on the first row.
iv. Block is a set of row captions under the same center head.



Table 10.9 Employed Persons by Major Industry Group
January 2008 - October 2010
(in thousands)

                                                  2010                        2009                        2008
Industry Group                            Oct    Jul    Apr    Jan    Oct    Jul    Apr    Jan    Oct    Jul    Apr    Jan

Total                                  36,488 36,237 35,413 36,001 35,478 35,508 34,997 34,262 34,533 34,593 33,535 33,693

Agriculture                            12,265 12,244 11,512 11,806 12,072 11,940 12,313 11,846 12,320 12,103 11,904 11,792
  Agriculture, Hunting and Forestry    10,769 10,760 10,073 10,351 10,563 10,476 10,841 10,446 10,860 10,695 10,450 10,409
  Fishing                               1,496  1,484  1,439  1,455  1,509  1,464  1,472  1,400  1,460  1,408  1,454  1,383

Industry                                5,375  5,409  5,487  5,322  5,154  5,273  5,088  4,856  5,078  5,130  5,000  4,981
  Mining and Quarrying                    197    194    212    193    169    177    166    152    176    154    151    152
  Manufacturing                         3,058  3,003  3,063  3,009  2,937  2,947  2,841  2,849  2,897  2,960  2,883  2,963
  Electricity, Gas and Water              163    141    137    157    160    145    130    134    123    146    123    126
  Construction                          1,957  2,071  2,075  1,963  1,888  2,004  1,951  1,721  1,882  1,870  1,843  1,740

Services                               18,550 18,585 18,414 18,872 18,250 18,294 17,595 17,560 17,135 17,360 16,630 16,919
  Wholesale & Retail Trade, Repair of
    Motor Vehicles, Motorcycles &
    Personal & Household Goods          7,158  7,030  6,885  7,064  6,901  6,725  6,681  6,635  6,528  6,599  6,322  6,333
  Hotels and Restaurants                1,119  1,037    991  1,104  1,012  1,064    976    988    941    984    924    964
  Transport, Storage and Communication  2,711  2,704  2,741  2,735  2,735  2,694  2,628  2,660  2,587  2,525  2,575  2,674
  Financial Intermediation                412    420    383    384    375    376    389    337    373    369    366    364
  Real Estate, Renting and
    Business Activities                 1,239  1,166  1,061  1,119  1,100  1,090  1,023  1,044    985    969    953    904
  Public Administration & Defense,
    Compulsory Social Security          1,771  1,835  1,959  1,823  1,771  1,772  1,794  1,659  1,690  1,741  1,661  1,612
  Education                             1,165  1,238  1,156  1,146  1,168  1,157  1,068  1,157  1,096  1,076  1,028  1,083
  Health and Social Work                  465    457    447    432    412    428    408    435    406    386    384    390
  Other Community, Social & Personal
    Service Activities                    855    866    984    949    868    876    907    857    796    847    843    846
  Private Households with Employed
    Persons                             1,954  1,831  1,804  2,114  1,908  2,110  1,718  1,785  1,733  1,863  1,572  1,747
  Extra-Territorial Organizations &
    Bodies                                  1      1      3      2      0      2      3      3      *      1      2      2

Notes:

1. Data were taken from the results of the quarterly rounds of the Labor Force Survey (LFS) using past week as reference period.

2. Details may not add up to totals due to rounding.

3. The definition of unemployment was revised starting the April 2005 round of the LFS. As such, LFPRs, employment rates and unemployment rates are not comparable with those of previous survey rounds. Also starting with January 2007, estimates were based on 2000 Census-based

4. Data are as of January 2012.

p/ - preliminary

Source: National Statistics Office (NSO).


[Figure annotations identifying the parts of the table above: table number, head note, stub head, spanner head, column head, panel, and source note.]



2.4.3 Graphical Presentation

Graphical Presentation of data portrays numerical figures or relationships among variables in pictorial form.

The graph or statistical chart is a very powerful tool in presenting data. It is an

important medium of communication because we can create a pictorial representation of the

numerical figures found in tables without showing too many figures.

We construct graphs not only for presentation purposes but also as an initial step in

analysis. The graph, as a tool for analysis, can exhibit possible associations among the variables

and can facilitate the comparison of different groups. It can also reveal trends over time.

The different types of statistical charts are line chart, vertical bar chart, horizontal bar

chart, pictograph, pie chart, and statistical map. It is important to know when and how to use

these different charts. The selection of the correct type of chart depends upon the specific

objective, the characteristic of the users, the kind of data, and the type of device and material

on hand.

Line Chart

The line chart is useful for presenting historical data. This chart is effective in showing the movement of a series over time. As shown in the figures below, the movement can be increasing, decreasing, stationary, or could be fluctuating.






[Figure: line chart of the number of accidents involving Company B by years of service (x-axis: Years of Service, 1 to 10), annotated with the parts of a chart: title at top, scale figures and scale labels for each axis, grid lines, and source note.]




NEVER use line charts/graphs that are too stretched either horizontally or vertically, because the distortion may mislead the reader into seeing something that the data do not really show.



Types of Line Chart

Simple Line Chart – This has only one curve and is appropriate for one series of time data.

Multiple Line Chart – This type of line chart shows two or more curves. We use this if we wish

to compare the trends in two or more data series.

Although the multiple line chart is now commonly used, we should take note of the number of series that we include in a graph. If there are too many series in a single chart, it becomes too confusing to read.

Number of Daily Responses (Example of Single Line Chart)



Column Chart

We use the column charts to compare amounts in a time series data. The emphasis in

a column chart is on the differences in magnitude rather than the movement of a series.

We can also use the column chart to graph the frequency distribution of a

quantitative variable. We call this chart a frequency histogram.

For time series data, we arrange the columns on the horizontal axis in

chronological order, starting with the earliest date.

The proportions of the columns must be just right. Columns must not be too wide or too

narrow. The space between the bars must also be just right. Usually, the space between bars is

around one-fourth of the width of the column.

It is also advisable to use scale figures that are multiples of 5. If the observed values are very small, we can use multiples of 1 or 2.

[Figure: sample column chart annotated with its parts: title at top, scale figures and scale labels for each axis, and grid lines.]




Horizontal Bar Chart

Its use is appropriate when we wish to show the distribution of categorical data.

We use the horizontal bar chart so we can compare the magnitudes for the different

categories of a qualitative variable. We place the categories of the qualitative variable on the y-

axis. This will be more practical than placing the categories on the x-axis because there is more

space for text labels on the y-axis.

Just like the columns in a column chart, the bars should not be too wide, too narrow, too long, or too short.

Arranging the bars according to length usually facilitates comparisons. The order may be descending or ascending.

If there is an “Others” category, we always place this as the first or the last


If the categorical variables have a natural ordering, such as a rating scale, then we

should retain the order of the categories in the scale instead of arranging the bars

according to length.

We should always choose appropriate colors or patterns for the bars. We should

avoid selecting wavy and weird patterns since this will only produce an optical




Pie Chart

It is a circle divided into several sections. Each section indicates the proportion of each component or category. This is useful for data sorted into categories for a specific period.

The purpose is to show the component parts with respect to the total in terms of the percentage distribution.

The components of the pie chart should be arranged according to magnitude.

If there’s an ‘Others’ category, we put it in the last section. We use different colors, shading, or patterns to distinguish one section of the pie from the other sections.

We plot the biggest slice at 12 o’clock.

If we want to emphasize a particular sector of the pie chart, we may explode that slice

by detaching it from the rest of the sectors.

The pie chart is applicable for qualitative rather than quantitative data. However, if

the variable has too many categories (more than 6), we should use the horizontal bar

chart rather than the pie chart.




Pictograph

o It is like a horizontal bar chart, but instead of using bars, we use symbols or pictures to represent the magnitude.

o The purpose of this chart is to get the attention of the reader.

o The pictograph provides an overall picture of the data without presenting the exact

o Usually, we can only show approximate figures in a pictograph since we have to round off figures to whole numbers. It still allows the comparison of different categories even if we present only the approximate values.

o The choice of symbol or picture should be apt for the type of data. It should be self-explanatory, interesting, and simple.

Statistical Maps

This type of chart shows statistical data in geographical areas.

These are also called crosshatched maps or shaded maps.

Geographic areas may be barangays, cities, districts, provinces, and countries.

The figures in the map can be ratios, rates, percentages, and indices.

We do not use the absolute values and frequencies in statistical maps.



Types of Statistical Maps

Shaded Map – map that makes use of shading patterns. The shading pattern

indicates the degree of magnitude. It usually runs gradually from dark to light

(Darker shading of the map usually means larger magnitude).

Dot map – chart that gives either the location or the number of establishments in a certain geographical area. The example below is a dot map of the number of people of Hispanic descent in the US.



2.5 Organization of Data

The first step in data analysis is organizing the collected data. In its organized form, important features of the data become clear and apparent.

The two common forms of organized data are the array and the frequency distribution.

2.5.1 Raw Data and Array

Raw Data are data in their original form.

Array is an ordered arrangement of data according to magnitude. We also refer to the array as sorted data or ordered data.

The actual data that we collect from surveys, observation, and experimentation are what we call raw data. Raw data have not yet been organized or processed in any manner.

Example: Raw Data of the Final Grades of 100 Selected Students who took Stat 101

79 73 74 88 66 88 72 60 77 53
62 85 60 63 56 77 74 93 92 72
74 78 92 87 57 60 79 66 93 57
79 82 86 69 92 97 51 99 92 62
81 83 86 77 82 70 86 89 50 80
65 79 60 53 66 92 55 94 65 79
79 73 90 76 70 67 67 97 79 76
94 81 64 52 72 92 66 78 62 82
75 88 57 72 73 50 79 55 56 74
52 81 63 89 63 65 95 79 77 76

Arranging the observations manually according to magnitude is very tedious, especially if we are dealing with voluminous data. Thus, it is more convenient to use computer programs to sort the data.

Example: Array of the Final Grades of 100 Selected Students who took Stat 101 (read down each column)

50 56 62 66 72 76 79 81 88 92
50 57 63 66 73 76 79 82 88 92
51 57 63 67 73 77 79 82 88 93
52 57 63 67 73 77 79 82 89 93
52 60 64 69 74 77 79 83 89 94
53 60 65 70 74 77 79 85 90 94
53 60 65 70 74 78 79 86 92 95
55 60 65 72 74 78 80 86 92 97
55 62 66 72 75 79 81 86 92 97
56 62 66 72 76 79 81 87 92 99

The array is not a summarized data set. It is simply an ordered set of observations. We consider both the raw data and the array as ungrouped data.

2.5.2 Frequency Distribution (FDT)

The frequency distribution (FDT) is a way of summarizing data by showing the number of observations that belong in the different categories or classes. We also refer to this as grouped data.

The frequency distribution is another way of organizing the data. It is a summarized

form of the raw data or array wherein we do not see the actual observed values anymore.

The two general forms of frequency distribution are single-value grouping and grouping

by class intervals:

1. Single-value grouping – is a frequency distribution where the classes are the distinct

values of the variable. This is applicable for data with only a few unique values.

2. Grouping by Class Intervals – is a frequency distribution where the classes are the


Example: Suppose we have data on the number of children of 50 married women using any modern

contraceptive method.

0 0 1 2 2 2 3 3 4 4

0 0 1 2 2 3 3 3 4 4

0 1 1 2 2 3 3 3 4 4

0 1 1 2 2 3 3 3 4 5

0 1 1 2 2 3 3 3 4 5



Since there are only 6 unique values in the data set, we use single-value grouping:

Distribution of Married Women Using Any Modern Method of Contraceptive by Number of Children

No. of Children    Number of Married Women
0                   7
1                   8
2                  11
3                  14
4                   8
5                   2
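As a quick check, single-value grouping can be reproduced with a short Python sketch. The data are the 50 observations listed in the example; the variable names are chosen only for illustration.

```python
from collections import Counter

# Number of children of the 50 married women, row by row as listed in the example.
children = (
    [0, 0, 1, 2, 2, 2, 3, 3, 4, 4]
    + [0, 0, 1, 2, 2, 3, 3, 3, 4, 4]
    + [0, 1, 1, 2, 2, 3, 3, 3, 4, 4]
    + [0, 1, 1, 2, 2, 3, 3, 3, 4, 5]
    + [0, 1, 1, 2, 2, 3, 3, 3, 4, 5]
)

# Each distinct value is its own class; its count is the class frequency.
frequency = dict(sorted(Counter(children).items()))

for value, count in frequency.items():
    print(value, count)
print("Total:", sum(frequency.values()))
```

The frequencies add up to 50, the total number of observations, which is the same check we apply to any frequency distribution.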

Concepts related to Frequency Distribution

1. Class Interval – is the range of values that belong in the class or category.

2. Class Frequency – is the number of observations that belong in a class interval.

3. Class Limits – are the end numbers used to define the class interval. The lower class limit (LCL) is the lower end number while the upper class limit (UCL) is the upper end number.

4. Open Class Interval – is a class interval with no lower class limit or no upper class limit.

5. Class Boundaries – are the true class limits. If the observations are rounded figures, then we identify the class boundaries based on the standard rules of rounding as follows: the lower class boundary (LCB) is halfway between the lower class limit of the class and the upper class limit of the preceding class while the upper class boundary (UCB) is halfway between the upper class limit of the class and the lower class limit of the next class.

6. Class size – is the size of the class interval. It is the difference between the upper class boundaries of the class and the preceding class; or the difference between the lower class boundaries of the next class and the class.

7. Class Mark - is the midpoint of a class interval. It is the average of the lower class limit and the upper class limit or the average of the lower class boundary and upper class boundary of a class interval.



After learning the concepts that we need to construct a frequency distribution table, we can now list down the steps in constructing a frequency distribution table.

Step 1:

• Determine the adequate number of classes, denoted by K.

• We can use Sturges's rule to approximate the number of classes, which is given by K = 1 + 3.322 (log n), where n is the number of observations and log is the base-10 logarithm.

Step 2:

• Determine the range, R = highest observed value - smallest observed value.

Step 3:

• Compute the pre-class size, C' = R/K.

Step 4:

• Determine the class size, C, by rounding off C' to a convenient number.

Step 5:

• Choose the lower class limit of the first class. Make sure that the smallest observation will belong in the first class.

Step 6:

• List the class intervals. Determine the lower class limits of the succeeding classes by adding the class size to the lower class limit of the previous class. The last class should include the largest observation.

Step 7:

• Tally all the observed values in each class interval.

Step 8:

• Sum the frequency column and check against the total number of observations.

After constructing the basic frequency distribution table, we can add some other components to it that help us in the analysis of the data.

o Relative Frequency (RF) – is the class frequency divided by the total number of observations.

o Relative Frequency Distribution Percentage (RFP) – is the relative frequency multiplied by 100.



The relative frequency and RFP show the proportion and percentage of observations falling in each class. The RFP allows us to compare two or more data sets with different totals. The sum of the RFP column is one hundred percent (100%).

Another component that could be added to the FDT is the cumulative frequency

distribution which is comprised of two components.

o The less than cumulative frequency distribution (<CFD) shows the number of

observations with values smaller than or equal to the upper class boundary.

o The greater than cumulative frequency distribution (>CFD) shows the number of

observations with values higher than or equal to the lower class boundary.

Example: Using the data on the Grades of 100 Students who took Stat 101, we construct the frequency distribution table with the extra components: RF, RFP, <CFD and >CFD.

First, we compute K using Sturges's rule:

K = 1 + 3.322 (log n) = 1 + 3.322 (log 100) = 1 + 3.322 (2) = 7.644, which we round up to K = 8

Second, we compute the range, R:

R = max. value – min. value = 99 – 50 = 49

Third, we compute C’ and then C:

C’ = R / K = 49 / 8 = 6.125, which we round up to C = 7

Now we can create the FDT for the data set,

Class Limits   Class Boundaries   Frequency   Class Mark   RF      RFP    <CFD   >CFD
LCL - UCL      LCB - UCB          f           x            f/n     %
50 - 56        49.5 - 56.5        11          53           0.11    11     11     100
57 - 63        56.5 - 63.5        13          60           0.13    13     24     89
64 - 70        63.5 - 70.5        13          67           0.13    13     37     76
71 - 77        70.5 - 77.5        19          74           0.19    19     56     63
78 - 84        77.5 - 84.5        19          81           0.19    19     75     44
85 - 91        84.5 - 91.5        11          88           0.11    11     86     25
92 - 98        91.5 - 98.5        13          95           0.13    13     99     14
99 - 105       98.5 - 105.5       1           102          0.01    1      100    1
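The construction steps can also be carried out programmatically. The sketch below is one possible way to do it in Python, under two assumptions that are not fixed rules: K and C' are both rounded up (which matches the worked example), and the first lower class limit is set to the smallest observation. The dictionary keys are names chosen for this illustration.

```python
import math

# Final grades of the 100 Stat 101 students (the raw data of the example).
grades = [
    79, 73, 74, 88, 66, 88, 72, 60, 77, 53,
    62, 85, 60, 63, 56, 77, 74, 93, 92, 72,
    74, 78, 92, 87, 57, 60, 79, 66, 93, 57,
    79, 82, 86, 69, 92, 97, 51, 99, 92, 62,
    81, 83, 86, 77, 82, 70, 86, 89, 50, 80,
    65, 79, 60, 53, 66, 92, 55, 94, 65, 79,
    79, 73, 90, 76, 70, 67, 67, 97, 79, 76,
    94, 81, 64, 52, 72, 92, 66, 78, 62, 82,
    75, 88, 57, 72, 73, 50, 79, 55, 56, 74,
    52, 81, 63, 89, 63, 65, 95, 79, 77, 76,
]

n = len(grades)

# Steps 1-4. Rounding K and C' upward is an assumption that reproduces
# the worked example (7.644 -> 8 and 6.125 -> 7).
k = math.ceil(1 + 3.322 * math.log10(n))
value_range = max(grades) - min(grades)
class_size = math.ceil(value_range / k)

# Step 5. Assumption: the first class starts at the smallest observation.
first_lcl = min(grades)

rows = []
less_cf = 0
for j in range(k):
    lcl = first_lcl + j * class_size
    ucl = lcl + class_size - 1          # whole-number data: limits are 1 unit apart
    f = sum(lcl <= g <= ucl for g in grades)
    greater_cf = n - less_cf            # observations at or above the LCB
    less_cf += f                        # observations at or below the UCB
    rows.append({
        "limits": (lcl, ucl),
        "boundaries": (lcl - 0.5, ucl + 0.5),
        "f": f,
        "mark": (lcl + ucl) / 2,
        "rf": f / n,
        "rfp": 100 * f / n,
        "less_cf": less_cf,
        "greater_cf": greater_cf,
    })

for row in rows:
    print(row)
```

Step 8 of the procedure corresponds to checking that the last less-than cumulative frequency equals n.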




Graphical Presentation of the Frequency Distribution

We can interpret the frequency distribution more effectively when it is displayed pictorially, since more people understand and comprehend data in graphic form. In this section we discuss the various methods of presenting the frequency distribution in graphical form.

1. Frequency Histogram

The frequency histogram shows the overall picture of the distribution of the observed

values in the dataset. It displays the class boundaries on the horizontal axis and the class

frequencies on the vertical axis.

The frequency histogram shows the shape of the distribution. The area under the

frequency histogram corresponds to the total number of observations. The tallest vertical bar

shows the frequency of the class interval with the largest class frequency.

2. Relative Frequency/ Relative Frequency Percentage Histogram

The RF or RFP histogram displays the class boundaries on the horizontal axis

and the relative frequencies or RFPs of the class intervals on the vertical axis. It

represents the relative frequency of each class by a vertical bar whose height is equal

to the relative frequency of the class. The shape of the relative frequency histogram

and frequency histogram are the same.



3. Frequency Polygon

For the frequency polygon, plot the class frequencies at the midpoint of the

classes and connect the plotted points by means of straight lines. Since it is a polygon

we need to close the ends of the graph. To close the polygon, add an additional class

mark on both ends of the graph wherein both ends have the frequency of 0.

The advantage of the frequency polygon over the frequency histogram is that

it allows the construction of two or more frequency distributions on the same plot

area. This facilitates the comparison of the different frequency distributions. The

frequency polygon also exhibits the shape of the data distribution.



4. Ogives

The ogive is the plot of the cumulative frequency distribution. This graphical

representation is used when we need to determine the number of observations below

or above a particular class boundary.

The less than ogive is the plot of the less than cumulative frequencies against the

upper class boundaries. On the other hand, the greater than ogive is the plot of the

greater than cumulative frequencies against the lower class boundaries. Connect the

successive points by straight lines.

If we superimpose the less than and greater than ogives, the point of intersection

gives us the value of the median. The median divides the ordered observations into

two equal parts.
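To illustrate the intersection, the sketch below locates the crossing point of the two ogives by linear interpolation, using the cumulative frequencies of the Stat 101 frequency distribution table. It assumes that both ogives are drawn over the same set of class boundaries, so the less than ogive is extended with a starting point of 0 at the first lower class boundary and the greater than ogive with an end point of 0 at the last upper class boundary. The function name is hypothetical, and the result is only an approximation of the median from grouped data.

```python
def ogive_intersection(boundaries, less_cf, greater_cf):
    """Return the x-value where the two ogives cross (straight-line segments)."""
    for j in range(len(boundaries) - 1):
        # Difference between the two curves at both ends of the segment.
        d0 = less_cf[j] - greater_cf[j]
        d1 = less_cf[j + 1] - greater_cf[j + 1]
        if d0 <= 0 <= d1 and d1 > d0:
            t = -d0 / (d1 - d0)         # fraction of the segment where they meet
            return boundaries[j] + t * (boundaries[j + 1] - boundaries[j])
    raise ValueError("the ogives do not intersect")

# Class boundaries and cumulative frequencies from the Stat 101 table.
boundaries = [49.5, 56.5, 63.5, 70.5, 77.5, 84.5, 91.5, 98.5, 105.5]
less_cf = [0, 11, 24, 37, 56, 75, 86, 99, 100]     # observations <= boundary
greater_cf = [100, 89, 76, 63, 44, 25, 14, 1, 0]   # observations >= boundary

median_estimate = ogive_intersection(boundaries, less_cf, greater_cf)
print(round(median_estimate, 2))
```

The crossing falls inside the class 71 - 77, where both cumulative frequencies equal half of the 100 observations.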



• Summary Measures Part 1

3.1 Measures of Central Tendency

The average is the popular term that is used to refer to a measure of central tendency. Most people are already accustomed to thinking of an average as a way of representing a collection of observations by a single value.

For instance, we often use the average score to represent the exam scores of all students in a class. If the average score is high, then we conclude that the class performed well. The average can also be used to compare the performance of two groups by checking which one has the higher average.

The most common measure of central tendency is the arithmetic mean. The two other measures of central tendency that we will present in this section are the median and the mode. All of these measures aim to give information about the ‘center’ of the data or distribution.

3.1.1 Summation Notation

The summation notation provides a compact way of writing the formulas for some of the summary measures that will be discussed in this section. The capital Greek letter “sigma”, $\Sigma$, is the mathematical symbol that represents the process of summation.

The symbol $\sum_{i=1}^{n} X_i$ is equal to $X_1 + X_2 + X_3 + \dots + X_n$

where $X_i$ = value of the variable for the ith observation

i = index of the summation (the letter below $\Sigma$).

1 = lower limit of the summation (the number below $\Sigma$).

n = upper limit of the summation (the letter above $\Sigma$).

We read $\sum_{i=1}^{n} X_i$ as “summation of X sub i, where i is from 1 to n”.



Some Notes on Summation:

1) The index (as indicated by the letter below ∑) may be any letter, but the letters i, j, k are the most common. For example, $\sum_{i=1}^{n} X_i = \sum_{j=1}^{n} X_j$ even if their indexes are different, because the terms of the sum and the index sets of the two summations are the same.

2) The lower limit of the summation may start with any number. For example, we can have $\sum_{i=3}^{6} X_i$. This is equal to $X_3 + X_4 + X_5 + X_6$ since the index set is {3, 4, 5, 6}.

3) The index of the summation will not necessarily appear as a subscript in the terms of the summation. For example, we can have $\sum_{i=1}^{5} i$. This is equal to 1 + 2 + 3 + 4 + 5 since the notation indicates that the terms of the sum are the values of the index themselves.
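The three notes on summation can be checked with a short Python sketch. The values in `X` are made up for illustration only, and `X[i - 1]` stands in for $X_i$ because Python lists start at index 0.

```python
# Illustrative values only; X[0] plays the role of X_1.
X = [4, 7, 1, 9, 3, 8]
n = len(X)

# Note 1: the name of the index does not change the sum.
total_i = sum(X[i - 1] for i in range(1, n + 1))
total_j = sum(X[j - 1] for j in range(1, n + 1))

# Note 2: the lower limit need not be 1 (here X_3 + X_4 + X_5 + X_6).
partial = sum(X[i - 1] for i in range(3, 7))

# Note 3: the index itself can be the term (1 + 2 + 3 + 4 + 5).
index_sum = sum(i for i in range(1, 6))

print(total_i, total_j, partial, index_sum)
```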

The arithmetic mean, or simply called the mean, is the most common type of average.

It is the sum of all observed values divided by the number of observations. When people use

the term “average”, usually they refer to the arithmetic mean.

By definition, the computation of the population mean and the sample mean involves the same process. To compute their values, we get the sum of all the measures in the collection and divide this sum by the number of elements in the collection. The main difference is that the collection of measures used to compute the population mean is taken from all of the elements in the population, while the sample mean’s collection is taken only from the selected sample. Thus, the population mean is a parameter while the sample mean is a statistic.

3.1.2 The Arithmetic Mean

The arithmetic mean is the sum of all the values in the collection divided by the total number of elements in the collection.

The population mean for a finite population with N elements, denoted by the lowercase Greek letter mu, $\mu$, is:

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

The sample mean for a sample with n elements, denoted by $\bar{X}$ (read as "X-bar"), is:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$



1. Consider the sample on the final grades on Stat 101 of the 100 selected students.

2. Five judges give their scores on the performance of a gymnast as follows: 8, 9, 9, 9, and 10. Find the mean score of the gymnast.

Approximating the Mean from Grouped Data:

Sometimes, the only data that we have is the frequency distribution and the raw data is not accessible. In this case, we cannot compute the exact value of the mean. However, we could still approximate the mean from the frequency distribution.

The formulas for approximating the population mean and the sample mean are indicated below:

Population Mean: $\mu \approx \frac{\sum_{i=1}^{k} f_i X_i}{N}$

Sample Mean: $\bar{X} \approx \frac{\sum_{i=1}^{k} f_i X_i}{n}$

where fi = the frequency of the ith class
Xi = the class mark of the ith class
k = total number of classes
N or n = total number of observations, $\sum_{i=1}^{k} f_i$

Example: Consider the frequency distribution of the Final Grades in Stat 101 of the 100

selected students.

Class Limits    Frequency    Class Mark
LCL - UCL       fi           Xi            fiXi
50 - 56         11           53            583
57 - 63         13           60            780
64 - 70         13           67            871
71 - 77         19           74            1406
78 - 84         19           81            1539
85 - 91         11           88            968
92 - 98         13           95            1235
99 - 105        1            102           102
                n = 100                    ∑fiXi = 7484

$\bar{X} \approx \frac{7484}{100} = 74.84$


Thus, the mean final grade of the 100 selected Stat 101 Students is approximately 74.84.
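The grouped-data approximation above can be reproduced with a few lines of Python. This is only a sketch using the class marks and frequencies from the table; the variable names are illustrative.

```python
# Class marks and frequencies from the Stat 101 frequency distribution.
class_marks = [53, 60, 67, 74, 81, 88, 95, 102]
frequencies = [11, 13, 13, 19, 19, 11, 13, 1]

n = sum(frequencies)  # total number of observations
weighted_total = sum(f * x for f, x in zip(frequencies, class_marks))
grouped_mean = weighted_total / n  # approximate mean from grouped data

print(n, weighted_total, grouped_mean)
```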

Remark: Note that we have computed the mean of the 100 selected Stat 101 Students using

the raw data and the frequency distribution. We could see that they are not equal but relatively

near to each other.

Some Modifications for the Mean:

a. Weighted Mean

Sometimes, we know that the individual observed values vary in their degree of

importance. In this case, it is recommended to use the weighted mean. The weighted

mean assigns weights to the observations depending on their relative importance.

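The formula for the weighted mean is given below:

$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$

where $w_i$ = weight assigned to the ith observation $X_i$

Example: Ron wants to determine his GWA for this semester given his grades on CRS.

Class                       Units    Grade
EEE 10                      3.0      2.00
Humanidades 1 THW1          3.0      1.25
Math 54 TWTHFU3             5.0      1.75
PE 2 UF WFC                 (2.0)    1.00
Physics 11 THV              3.0      2.50
Chem 14 WFW                 3.0      1.25

Here the units serve as the weights $w_i$ and the grades serve as the observed values $X_i$ in the weighted mean formula.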


b. Combined Mean

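If we want to get the mean of the combination of several data sets but are given only the mean and the number of observations of each data set, we could use the formula for the combined mean:

$\mu_c = \frac{\sum_{i=1}^{k} N_i \mu_i}{\sum_{i=1}^{k} N_i}$

where $N_i$ and $\mu_i$ are the number of observations and the mean of the ith data set, and k is the number of data sets.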

Example: Three sections of a statistics class containing 28, 32, and 35 students

averaged 83, 80, and 76 respectively, on the same final examination. What is the

combined population mean for all three sections?

Solution: We let N1 = 28, N2 = 32, N3 = 35, μ1 = 83, μ2 = 80, and μ3 = 76. The combined mean is

$\frac{28(83) + 32(80) + 35(76)}{28 + 32 + 35} = \frac{7544}{95} \approx 79.4$

Thus, the mean grade of the students in the 3 sections is 79.4.
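As a quick check of the combined mean, the following Python sketch weights each section mean by its section size. The function name `combined_mean` is chosen here for illustration.

```python
def combined_mean(sizes, means):
    """Mean of the pooled data, given each group's size and mean."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Three sections with 28, 32, and 35 students.
result = combined_mean([28, 32, 35], [83, 80, 76])
print(round(result, 1))
```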

c. Trimmed Mean

Sometimes, we want to remove the outliers or the extreme values in the data before getting the mean to get more reliable information. To do this, we could use the trimmed mean. Below are the steps in computing the trimmed mean:

1. Create an array from the raw data.

2. Decide on the percentage of the data set that we will remove from each of the upper and lower ends of the ordered observations.

3. Remove those observations and compute the arithmetic mean of the remaining observations.

The objective of the trimmed mean is to remove the influence of possible outliers that appear in both the lower and upper portions of the ordered data.

Example: Compute for the 5% trimmed mean for the given data.

10 11 11 17 10 14 11 20

12 14 13 20 12 16 13 12

15 16 15 12 15 19 15 14

18 18 18 14 18 12 18 18

20 12 500 17 20 14 524 20

The arithmetic mean of the data is 39.95



Round-off Rule: In performing calculations, we round off only the final answer and not the intermediate values. The final answer should have one more decimal place than the original observations. For example, the mean of the data set 3, 4, and 6 is 4.3333... . We round this figure to the nearest tenth since the original observed values are whole numbers. Thus, the mean becomes 4.3. On the other hand, if the original observed values have one decimal place, like 4.5, 6.3, 7.7, and 8.9, then we round the final answer to two decimal places. Thus, the mean of these four values is reported as 6.85.

First we create an array;

10 12 12 14 15 17 18 20

10 12 13 14 15 18 18 20

11 12 13 14 16 18 19 20

11 12 14 15 16 18 20 500

11 12 14 15 17 18 20 524

Then we compute 5% of the total number of data points. Since 5% of 40 is 2, we remove the first two and the last two data points in the array. This leaves the 36 data points listed below.

11 12 12 14 15 16 18 18 20

11 12 13 14 15 16 18 18 20

11 12 13 14 15 17 18 19 20

12 12 14 14 15 17 18 20 20

Then we compute the trimmed mean, which is simply the arithmetic mean of the trimmed dataset. This gives 554/36 = 15.39, which is very far from the arithmetic mean of the original data, 39.95. We can see that 15.39 is a better summary measure than 39.95 since it is close to most of the data points, whereas 39.95 is pulled upward by the two extreme values 500 and 524.
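The trimming steps can be sketched in Python as follows. The helper `trimmed_mean` is illustrative, and it assumes the same proportion is cut from each end and that the number of points cut is rounded down to a whole number.

```python
def trimmed_mean(data, proportion):
    """Drop `proportion` of the points from each end, then average the rest."""
    ordered = sorted(data)                 # step 1: create the array
    cut = int(proportion * len(ordered))   # points removed per end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

data = [10, 11, 11, 17, 10, 14, 11, 20,
        12, 14, 13, 20, 12, 16, 13, 12,
        15, 16, 15, 12, 15, 19, 15, 14,
        18, 18, 18, 14, 18, 12, 18, 18,
        20, 12, 500, 17, 20, 14, 524, 20]

plain = sum(data) / len(data)        # ordinary arithmetic mean
trimmed = trimmed_mean(data, 0.05)   # 5% trimmed mean
print(round(plain, 2), round(trimmed, 2))
```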


The median is the value that divides the array into two equal parts.

Another summary measure for getting the central tendency is the median. The median

divides an ordered set of observations into two equal parts. In other words, it is the measure

occupying the positional center of the array.

If an observation is smaller than the median, then it belongs to the lower half of the array, while if the observation is greater than the median, then it belongs to the upper half of the array.

The first step in finding the median, denoted by Md, is to arrange the observations in an array. We let X(1) be the smallest observation and X(n) the largest observation. The process of determining the median is different for datasets with an even and an odd number of observations.

Case I: Number of observations is odd. The formula is:

$Md = X_{\left(\frac{n+1}{2}\right)}$

This formula means that the median is the $\left(\frac{n+1}{2}\right)$th observation in the array.

Case II: Number of observations is even. The formula is:

$Md = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$

This means that the median is the average of the two middle values.


a. The following are the number of years of operation of 7 oil distributing companies: 9, 11, 16, 12, 17, 20, and 18. Find the median.

Array: 9, 11, 12, 16, 17, 18, 20. Since n = 7 is odd, the median is the 4th observation. Therefore the median number of years of operation is 16.

b. Using the data on the Final Grades of 100 Selected Stat 101 Students. Find the median.

We get the average of the 50th and 51st observations = 76 ;

Therefore the median grade is 76%
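The two cases for the median can be written as one short Python function. This is a minimal sketch; the second call uses made-up values to show the even case.

```python
def median(data):
    """Median of raw data using the odd/even rules for the ordered array."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n+1)/2)th observation
    # average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

years = [9, 11, 16, 12, 17, 20, 18]   # the seven values listed in example (a)
print(median(years))
print(median([8, 9, 9, 10]))          # illustrative even-sized data set
```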

3.1.3 The Median


Approximating the Median from Grouped Data:

We can approximate the median from a frequency distribution. We obtain a good

approximate for the median if the observed values belonging in the median class are evenly

spaced throughout the class interval. The median class is the class interval containing the

median. To get the median, we perform the following steps:

Step 1: Calculate n/2, where $n = \sum_{i=1}^{k} f_i$ is the number of observations.

Step 2: Construct the less than cumulative frequency distribution (<CFD).

Step 3: Locate the first value in the <CFD column that is greater than or equal to n/2. The class interval corresponding to that value is the median class.

Step 4: Approximate the median using the formula given below:

$Md = LCB_{Md} + C\left(\frac{\frac{n}{2} - {<}CF_{Md-1}}{f_{Md}}\right)$

where LCBMd is the lower class boundary of the median class
C is the class size
n is the total number of observations
<CFMd-1 is the less than cumulative frequency of the class preceding the median class
fMd is the frequency of the median class

Example: Using the FDT of the Final Grades of 100 Selected Stat 101 Students. Find the median.

Class Boundaries    Frequency    <CFD
49.5 - 56.5         11           11
56.5 - 63.5         13           24
63.5 - 70.5         13           37     (<CFMd-1)
70.5 - 77.5         19           56     (median class, fMd = 19)
77.5 - 84.5         19           75
84.5 - 91.5         11           86
91.5 - 98.5         13           99
98.5 - 105.5        1            100

Since n/2 = 50, the median class is 70.5 - 77.5, the first class whose <CFD value is at least 50.

$Md = 70.5 + 7\left(\frac{50 - 37}{19}\right) = 75.29$

Thus the median final grade of the 100 Stat 101 Students is approximately 75.29.


The mode is the observed value that occurs with greatest frequency in a data set.

The mode is the most frequent observed value in the dataset. If the data set is small, we could easily identify the mode, if there is any, just through inspection. However, for a large amount of data, identifying the mode manually is a difficult task. In general, the mode is less popular than the mean and median as a measure of central tendency.


1. We consider the height in inches of 10 basketball players: 70, 70, 71, 71, 72, 72, 72, 72,

75, and 75. Find the mode;

Answer: 72 is the modal height

2. We consider the shoe sizes of 24 female faculty members: 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 9, 9,

10, 10, 10, 10, 10, 10, 10, 10, and 11. Find the mode;

Answer: The modal shoe sizes are 6 and 10 (if there are two modes we can call this

phenomenon as bimodal distribution)

3. We consider the scores of 15 students in a quiz: 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20,

20, and 20. Find the mode;

Answer: There is no mode (sometimes, the mode really does not exist)
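The three examples above can be reproduced with a small Python sketch. The function `modes` is illustrative; it follows the convention used in these notes that a data set has no mode when all of its distinct values occur equally often.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values; empty list if every value ties."""
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every distinct value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == top)

heights = [70, 70, 71, 71, 72, 72, 72, 72, 75, 75]
shoes = [5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 9, 9,
         10, 10, 10, 10, 10, 10, 10, 10, 11]
quiz = [16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20]

print(modes(heights), modes(shoes), modes(quiz))
```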

Approximating the Mode from Grouped Data:

We can approximate the mode from a frequency distribution. To get the mode, we just need to perform the following steps:

Step 1: Locate the modal class. For frequency distributions with equal class sizes, the modal class is the class interval with the highest frequency.

Step 2: Approximate the mode using the formula given below:

$Mo = LCB_{Mo} + C\left(\frac{f_{Mo} - f_1}{2f_{Mo} - f_1 - f_2}\right)$

where LCBMo is the lower class boundary of the modal class
C is the class size
f1 is the frequency of the class preceding the modal class
f2 is the frequency of the class following the modal class
fMo is the frequency of the modal class

3.1.4 The Mode


Example: Using the FDT of the Final Grades of 99 Selected Stat 101 Students. Find the mode.

Class Boundaries    Frequency
49.5 - 56.5         11
56.5 - 63.5         13
63.5 - 70.5         13     (f1)
70.5 - 77.5         19     (modal class, fMo)
77.5 - 84.5         18     (f2)
84.5 - 91.5         11
91.5 - 98.5         13
98.5 - 105.5        1

$Mo = 70.5 + 7\left(\frac{19 - 13}{2(19) - 13 - 18}\right) = 70.5 + 7\left(\frac{6}{7}\right) = 76.5$

Thus the modal final grade of the 99 Stat 101 Students is approximately 76.5.
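The grouped-data mode formula is short enough to check directly in Python. This sketch uses an illustrative function name and takes the modal class and its neighbours as inputs rather than locating them.

```python
def grouped_mode(lower_boundary, class_size, f_mo, f_prev, f_next):
    """Approximate the mode from the modal class and its two neighbours."""
    d1 = f_mo - f_prev   # fMo - f1
    d2 = f_mo - f_next   # fMo - f2
    # d1 + d2 equals 2*fMo - f1 - f2, the denominator of the formula
    return lower_boundary + class_size * d1 / (d1 + d2)

mode_grade = grouped_mode(70.5, 7, f_mo=19, f_prev=13, f_next=18)
print(mode_grade)
```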

Summary of the Different Measures of Central Tendency

Mean ("center of mass")
    Definition: Sum of all the values divided by the number of values
    Data Requirement: At least interval scale, and values that are close to each other
    Existence/Uniqueness: Always exists / Always unique
    Takes into account every value? Yes
    Affected by Outliers: Yes
    Can treat formula: Yes

Median ("center of the array")
    Definition: Divides the array into two equal parts
    Data Requirement: At least ordinal scale
    Existence/Uniqueness: Always exists / Always unique
    Takes into account every value? No
    Affected by Outliers: No
    Can treat formula: No

Mode ("typical value")
    Definition: Most frequent value
    Data Requirement: Even if nominal scale only
    Existence/Uniqueness: Might not exist / Not always unique
    Takes into account every value? No
    Affected by Outliers: No
    Can treat formula: No


A measure of location provides us information on the percentage of observations in the collection whose values are less than or equal to it. We also commonly refer

to these measures of locations as quantiles or fractiles.

The percentiles divide the ordered observations into 100 equal parts.

In the previous section, we learned that the median is a measure of central tendency. The median is also a measure of location. In this section we present three more measures of location: the percentiles, quartiles and deciles.

Recall: The median divides the ordered observations into two equal parts. We could interpret this as: at least 50% of the observations have values less than or equal to the median and at least 50% of the observations have values greater than or equal to the median.

We could generalize this aspect of the median into percentiles, wherein the percentiles divide the ORDERED observations into 100 equal parts. There is a total of 99 percentiles, which can be denoted as P1, P2, P3, …, P99.

For any k (1 to 99), we can interpret Pk as a value for which at least k% of the observations are less than or equal to its value and at least (100-k)% of the observations are greater than or equal to the value of Pk.

(e.g. the 56th percentile of a distribution, P56, is a value such that at least 56% of the observations are less than or equal to its value and at least 44% are greater than or equal to its value)

3.2 Measures of Location

3.2.1 The Percentiles



Computing for the Percentile using Empirical Distribution with Averaging:

The steps involved in determining Pk using the empirical distribution number with averaging are as follows:

Step 1: Arrange the observations from lowest to highest. Denote the ordered observations by X(i). Thus, X(i) is the value in the ith position of the array.

Step 2: Compute $\frac{nk}{100}$, where n is the number of observations and k is the subscript of Pk. For example, if you want the 20th percentile, P20, then k = 20.

Step 3: Use the following rule to determine the kth percentile:

If $\frac{nk}{100}$ is an integer, then $P_k = \frac{X_{\left(\frac{nk}{100}\right)} + X_{\left(\frac{nk}{100}+1\right)}}{2}$

If $\frac{nk}{100}$ is not an integer, then Pk = X(c), where c is the closest integer greater than $\frac{nk}{100}$.

a. The following are the total receipts of seven mining companies (in million pesos): 4.6, 1.3, 7.3, 6.6, 10.5, 50.7, and 12.6. Find the 75th percentile.


Solution: Arrange the data in an array (lowest to highest).

Array:       1.3    4.6    6.6    7.3    10.5   12.6   50.7
Notation:    X(1)   X(2)   X(3)   X(4)   X(5)   X(6)   X(7)

Compute for nk/100 = (7) (75)/100 = 5.25. The number 5.25 is not an integer. Thus, we

use the second formula of the empirical distribution number with averaging. The closest

integer greater than 5.25 is 6 so the 75th percentile is X(6), the sixth data item in the array.

Therefore, the 75th percentile is equal to 12.6.

b. The following are the number of years of operation of 20 mining companies: 4, 5, 6, 6, 7, 8, 10, 10, 11, 16, 17, 17, 18, 19, 20, 20, 21, 23, 25, and 30. Determine the 90th percentile.

Solution: Arrange the data in an array. Compute nk/100 = (20)(90)/100 = 18. The number 18 is an integer. Thus, we use the first formula of the empirical distribution number with averaging.

$P_{90} = \frac{X_{(18)} + X_{(19)}}{2} = \frac{23 + 25}{2} = 24$

Thus, we can say that at least 90% of the mining companies have been operating for 24 years or less, and at least 10% of the mining companies have been operating for 24 years or more.
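The empirical distribution number with averaging can be sketched in Python as below. The function name `percentile` is illustrative, and the sketch assumes k is a whole number from 1 to 99.

```python
def percentile(data, k):
    """kth percentile by the empirical distribution number with averaging."""
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % 100 == 0:
        pos = n * k // 100  # nk/100 is an integer
        # average of X(pos) and X(pos + 1); list indexes are zero-based
        return (ordered[pos - 1] + ordered[pos]) / 2
    c = n * k // 100 + 1    # closest integer greater than nk/100
    return ordered[c - 1]

receipts = [4.6, 1.3, 7.3, 6.6, 10.5, 50.7, 12.6]
years = [4, 5, 6, 6, 7, 8, 10, 10, 11, 16,
         17, 17, 18, 19, 20, 20, 21, 23, 25, 30]

print(percentile(receipts, 75), percentile(years, 90))
```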


Approximating the Percentile from Grouped Data:

To approximate the kth percentile from a frequency distribution, we just need to

perform the following steps:

Step 1: Calculate nk/100, where $n = \sum_{i=1}^{c} f_i$ is the number of observations and c is the number of classes.

Step 2: Construct the less than cumulative frequency distribution (<CFD).

Step 3: Locate the first value in the <CFD column that is greater than or equal to nk/100. The class interval corresponding to that value is the kth percentile class.

Step 4: Approximate the kth percentile using the formula given below:

$P_k = LCB_{P_k} + C\left(\frac{\frac{nk}{100} - {<}CF_{P_k - 1}}{f_{P_k}}\right)$

where $LCB_{P_k}$ is the lower class boundary of the Pkth class
C is the class size
n is the total number of observations
k is the percentile of interest
${<}CF_{P_k - 1}$ is the less than cumulative frequency of the class preceding the Pkth class
$f_{P_k}$ is the frequency of the Pkth class

Example: Using the FDT of the Final Grades of 100 Selected Stat 101 Students. Find the 75th percentile.

Class Boundaries    Frequency    <CFD
49.5 - 56.5         11           11
56.5 - 63.5         13           24
63.5 - 70.5         13           37
70.5 - 77.5         19           56
77.5 - 84.5         19           75     (P75th class, fP75 = 19)
84.5 - 91.5         11           86
91.5 - 98.5         13           99
98.5 - 105.5        1            100

Compute nk/100 = (100)(75)/100 = 75. From the <CFD column, the first value that is greater than or equal to nk/100 = 75 is 75, so the P75th class is 77.5 - 84.5. Now we compute:

$P_{75} = 77.5 + 7\left(\frac{75 - 56}{19}\right) = 84.5$

Thus we can say that at least 75% of all the students have a final grade in Stat 101 less than or equal to 84.5. At the same time, at least 25% of the students have grades greater than or equal to 84.5.
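The four steps for grouped data can be sketched as a single Python function. The name `grouped_percentile` and its argument layout are illustrative; the sketch assumes equal class sizes. With k = 50 the same function gives the grouped-data median.

```python
def grouped_percentile(lower_boundaries, class_size, frequencies, k):
    """Approximate the kth percentile from a frequency distribution."""
    n = sum(frequencies)
    target = n * k / 100          # step 1: nk/100
    cumulative = 0                # <CF of the classes already passed
    for lcb, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:   # step 3: first <CFD value >= nk/100
            return lcb + class_size * (target - cumulative) / f  # step 4
        cumulative += f
    raise ValueError("k must be between 0 and 100")

lower_boundaries = [49.5, 56.5, 63.5, 70.5, 77.5, 84.5, 91.5, 98.5]
frequencies = [11, 13, 13, 19, 19, 11, 13, 1]

p75 = grouped_percentile(lower_boundaries, 7, frequencies, 75)
p50 = grouped_percentile(lower_boundaries, 7, frequencies, 50)
print(round(p75, 2), round(p50, 2))
```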


The quartiles divide the ordered observations into 4 equal parts.

The deciles divide the ordered observations into 10 equal parts.

There are three quartiles; we interpret them in the following manner:

Q1, read as ‘first quartile’, is the value for which at least 25% of the observations are less than or equal to it and at least 75% of the observations are greater than or equal to it.

Q2, read as ‘second quartile’, is the value for which at least 50% of the observations are less than or equal to it and at least 50% of the observations are greater than or equal to it.

Q3, read as ‘third quartile’, is the value for which at least 75% of the observations are less than or equal to it and at least 25% of the observations are greater than or equal to it.

As we can see, quartiles are just special cases of percentiles: Q1 = P25, Q2 = P50, and Q3 = P75. Therefore the computation for Q1 is the same as the computation for P25, and likewise for the other two quartiles.

There are nine deciles; we interpret them in the following manner:

D1, read as ‘first decile’, is the value for which at least 10% of the observations are less than or equal to it and at least 90% of the observations are greater than or equal to it.

D2, read as ‘second decile’, is the value for which at least 20% of the observations are less than or equal to it and at least 80% of the observations are greater than or equal to it.

D9, read as ‘ninth decile’, is the value for which at least 90% of the observations are less than or equal to it and at least 10% of the observations are greater than or equal to it.

As we can see, deciles are also just special cases of percentiles: D1 = P10, D2 = P20, D3 = P30, D4 = P40, D5 = P50, D6 = P60, D7 = P70, D8 = P80, and D9 = P90. Therefore the computation for D5 is the same as the computation for P50 and Q2, and likewise for the other deciles.

3.2.2 The Quartiles

3.2.3 The Deciles



The range is the distance between the maximum value and the minimum value. In formula, we write this as:

Range = highest value - lowest value = maximum - minimum

The mean, median and the mode are not always sufficient to provide us the complete

picture of the data. Oftentimes, it is possible that two or more data sets have the same center

but differ in other aspects like the distance between the observations.

This aspect of the data could be described by the summary measures under the

measures of dispersion. This measure allows us to determine the degree of dispersion of the

observations about the center of the distribution. If the value of the summary measure is small,

then this indicates that the observations are not too different from each other so that the lump

of the observations is located on the center. On the other hand, if its value is large, then this

indicates that the observations are much dispersed and widely spread out of the center.

The range is the simplest and easiest-to-use measure of dispersion. It is a common

practice to present the range by stating the smallest and the largest values in the collection.

Example: Given the weights of five rabbits (in pounds): 8, 12, 10, 14, 15. Compute the range.

Solution: The lightest rabbit weighs 8 pounds and the heaviest rabbit weighs 15 pounds. Thus, the range of the weights of the rabbits is:

Range = heaviest - lightest = 15 - 8 = 7 pounds

4.1 Measures of Dispersion

4.1.1 The Range



Approximating the Range from Grouped Data:

We can approximate the range from a frequency distribution using the formula given below:

$Range = UCL_{HC} - LCL_{LC}$

where $UCL_{HC}$ is the upper class limit of the last (highest) class interval
$LCL_{LC}$ is the lower class limit of the first (lowest) class interval

Example: Using the FDT of the Final Grades of 100 Selected Stat 101 Students. Find the range.

Class Limits    Frequency
50 - 56         11     (lowest class)
57 - 63         13
64 - 70         13
71 - 77         19
78 - 84         19
85 - 91         11
92 - 98         13
99 - 105        1      (highest class)

The upper class limit of the last class interval is 105 and the lower class limit of the first class interval is 50. Thus the range is: Range = 105 - 50 = 55.

The variance is a measure of dispersion that we can use to describe the variation of the

measurements in the collection. The variance could also be used to determine if the mean is a

good measure of central tendency. A relatively small variance indicates that the observations

are highly concentrated about the mean so that it is appropriate to use the mean to represent

all of the values in the collection. Whereas, if the variance is significantly large, then it signifies

that, on the average, the observations are very different from the mean.

4.1.2 The Variance and Standard Deviation



The sample variance, unlike the population variance, is not exactly the average of the squared deviations of the observations from the mean. Its denominator is the total number of observations in the sample minus one (n-1) and not n. We use (n-1) because dividing by n tends to underestimate the population variance.

The unit of the variance is the square of the unit of measurement which makes the

interpretation of the variance difficult to relate with the original observations. And because of

this, we may use the standard deviation.

The population variance for a finite population with N elements, denoted by $\sigma^2$ (where $\sigma$ is the small Greek letter sigma), is:

$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

The sample variance for a sample with n elements, denoted by $s^2$, is:

$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

where Xi = measure taken from the ith unit in the collection
$\mu$ = population mean
$\bar{X}$ = sample mean

The population standard deviation for a finite population with N elements, denoted by $\sigma$, is:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

The sample standard deviation for a sample with n elements, denoted by s, is:

$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$


Example: Given the IQ of seven students in the sample, 100, 99, 110, 105, 112, 107, and 116,

compute for the standard deviation

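Solution: Let Xi be the IQ of the ith student, i = 1, 2, …, 7. The number of students in the sample is n = 7. The sample mean is:

$\bar{X} = \frac{100 + 99 + 110 + 105 + 112 + 107 + 116}{7} = \frac{749}{7} = 107$

Xi      (Xi - 107)    (Xi - 107)^2
100     -7            49
99      -8            64
110     3             9
105     -2            4
112     5             25
107     0             0
116     9             81
                      ∑(Xi - 107)^2 = 232

Thus the sample variance is $s^2 = \frac{\sum_{i=1}^{7} (X_i - 107)^2}{7 - 1} = \frac{232}{6} \approx 38.67$

To get the standard deviation, we get the square root of the sample variance. Thus, the sample standard deviation is $s = \sqrt{s^2} = \sqrt{38.67} \approx 6.22$.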
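The IQ example can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 denominator of the sample formulas. This is only a verification sketch.

```python
import statistics

iq = [100, 99, 110, 105, 112, 107, 116]

mean = statistics.mean(iq)
squared_devs = sum((x - mean) ** 2 for x in iq)  # sum of squared deviations
s2 = statistics.variance(iq)   # sample variance, divides by n - 1
s = statistics.stdev(iq)       # sample standard deviation

print(mean, squared_devs, round(s2, 2), round(s, 2))
```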

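Computational Formula for Variance

Population variance: $\sigma^2 = \frac{N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2}$

Sample variance: $s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$

where Xi = measure taken from the ith unit in the collection
N = the number of observations in the population
n = the number of observations in the sample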


Approximating the Standard Deviation from Grouped Data:

We use the following notations and terms in the approximation of the variance and the standard deviation from grouped data:

Xi = midpoint or class mark of the ith class interval
fi = frequency of the ith class interval
k = number of classes
N = number of observations in the population
n = number of observations in the sample
$\mu$ = population mean
$\bar{X}$ = sample mean

The computational formulas for the variance of the population and of the sample are as follows:

Population Variance: $\sigma^2 \approx \frac{N\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$

Sample Variance: $s^2 \approx \frac{n\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$

Example: Consider the frequency distribution of the Final Grades in Stat 101 of the 100

selected students. Compute for the sample variance and sample standard deviation.

Class Limits    Frequency    Class Mark
LCL - UCL       fi           Xi            fiXi      fiXi^2
50 - 56         11           53            583       30899
57 - 63         13           60            780       46800
64 - 70         13           67            871       58357
71 - 77         19           74            1406      104044
78 - 84         19           81            1539      124659
85 - 91         11           88            968       85184
92 - 98         13           95            1235      117325
99 - 105        1            102           102       10404
                n = 100                    ∑fiXi = 7484    ∑fiXi^2 = 577672

$s^2 \approx \frac{100(577672) - (7484)^2}{100(99)} = \frac{1756944}{9900} \approx 177.47$

$s = \sqrt{s^2} \approx \sqrt{177.47} \approx 13.32$
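The grouped-data variance can be recomputed with a short Python sketch. It assumes the usual computational form, n∑fX² minus (∑fX)² over n(n - 1), and uses the class marks and frequencies from the table; the variable names are illustrative.

```python
from math import sqrt

class_marks = [53, 60, 67, 74, 81, 88, 95, 102]
frequencies = [11, 13, 13, 19, 19, 11, 13, 1]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))

# Computational formula for the sample variance from grouped data.
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = sqrt(s2)

print(sum_fx, sum_fx2, round(s2, 2), round(s, 2))
```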


The z-score or the standard score measures how many standard deviations an observed value is above or below the mean:

Population z-score: $z = \frac{X - \mu}{\sigma}$

where $\mu$ is the population mean
$\sigma$ is the population standard deviation

Sample z-score: $z = \frac{X - \bar{X}}{s}$

where $\bar{X}$ is the sample mean
s is the sample standard deviation

The z-score or standard score helps determine the relative position of an observed value in the collection. It tells us whether the observed value is below or above the mean, and it also measures how far the observed value is from the mean in terms of the size of the standard deviation.

We can use the standard score to compare two or more observed values from different data sets. We can also use the standard score in identifying possible outliers in our data.
Example: The mean grade in Statistics 101 is 70% and the standard deviation is 10%, whereas in

Math 17, the mean grade is 80% and the standard deviation is 20%. Mark got a grade

of 75% in Stat 101 and a grade of 90% in Math 17. In which subject did Mark perform

better if we consider the grades of the other students in the two subjects?


If we consider the grades of the other students in the two subjects, Mark’s score in

Stat 101 is just as good as his score in Math 17. Based on the z-scores, Mark’s scores in both

subjects are 0.5 standard deviations above their respective mean scores.
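The comparison of Mark's two grades comes down to two z-score computations, sketched below in Python. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_stat101 = z_score(75, mean=70, sd=10)   # Stat 101
z_math17 = z_score(90, mean=80, sd=20)    # Math 17

print(z_stat101, z_math17)
```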

4.1.3 The Z-score



The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. The formula of the coefficient of variation (CV) is as follows:

Population CV: $CV = \frac{\sigma}{\mu} \times 100\%$

where $\mu$ is the population mean
$\sigma$ is the population standard deviation

Sample CV: $CV = \frac{s}{\bar{X}} \times 100\%$

where $\bar{X}$ is the sample mean
s is the sample standard deviation

The coefficient of variation expresses the standard deviation as a percentage of the mean. A large coefficient of variation indicates that the dataset is highly variable because its standard deviation is large relative to the size of the mean.

We do not use the coefficient of variation if the mean is less than or equal to zero. When the mean is zero, the coefficient of variation is undefined. When the mean is negative, the coefficient of variation is meaningless.

Example: Suppose we want to buy a stock and we can select one out of two. The prices of stock 1 and stock 2 per share are 2100 PhP and 650 PhP respectively. Let us say that for the past months, we compiled data on a sample of prices of stock 1 and stock 2 at the close of trading and we have the following statistics:

                      Stock 1    Stock 2
Mean                  2095       665
Standard Deviation    450        80

Solution: We compute the coefficient of variation of each stock to know which one has the more variable price.

Stock 1: $CV = \frac{450}{2095} \times 100\% \approx 21.48\%$

Stock 2: $CV = \frac{80}{665} \times 100\% \approx 12.03\%$

From the calculation, stock 1 has a more variable price than stock 2. Thus we will select stock 1 if we want to take the chance that its price will increase. We just have to remember that by choosing stock 1, we are also taking the risk that its price will decrease.
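The two coefficients of variation can be computed with the sketch below. The function name is illustrative, and the guard reflects the rule stated above that the CV is not used when the mean is zero or negative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean <= 0:
        raise ValueError("CV is not used when the mean is zero or negative")
    return sd / mean * 100

cv_stock1 = coefficient_of_variation(450, 2095)
cv_stock2 = coefficient_of_variation(80, 665)

print(round(cv_stock1, 2), round(cv_stock2, 2))
```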

4.1.4 The Coefficient of Variation



If it is possible to divide the histogram at the center into two identical halves, wherein each half is a mirror image of the other, then the distribution is called a symmetric

distribution. Otherwise, it is called a skewed distribution.

Relying solely on a measure of central tendency and a measure of dispersion in figuring out the behavior of a dataset may sometimes be misleading. It is possible for two datasets to have equal means and equal standard deviations, and yet the shapes of their histograms are extremely different.

The figure below shows various examples of symmetric and skewed distributions. We

will notice that there are two distinct types of skewness. Either the concentration of

observations is on the right side of the distribution which is tapering-off on the left side or the

other way around.

4.2 Measures of Skewness

4.2.1 Symmetry and Skewness



A distribution is said to be positively skewed or skewed to the right when the concentration of the values is at the left-end of the distribution and the upper tail of the distribution stretches out more than the lower tail.

A distribution is said to be negatively skewed or skewed to the left when the concentration of the values is at the right-end of the distribution and the lower tail of the distribution stretches out more than the upper tail.

Skewness presents a problem in the analysis of data because it can adversely affect the

behavior of certain summary measures. For this reason, certain procedures in statistics depend

on symmetric assumptions. It would be inappropriate to use these procedures in the presence

of severe skewness. Sometimes we need to perform special preliminary adjustments, such as

transformations before analyzing the data.

In general, we should check for the presence of skewness in the data before analysis, to prevent contamination or errors in the succeeding analysis that may lead to spurious conclusions.

Relationship of the Three Measures of Central Tendency and the Skewness of the Distribution


All measures of skewness that will be discussed are interpreted in the same way, thus we can always use the following interpretations for the computed measures:

Sk = 0 symmetric distribution

Sk > 0 positively skewed distribution

Sk < 0 negatively skewed distribution

Pearson’s Coefficient for Skewness

The Pearson’s first coefficient of skewness for a sample is $Sk_1 = \frac{\bar{X} - Mo}{s}$

The Pearson’s second coefficient of skewness for a sample is $Sk_2 = \frac{3(\bar{X} - Md)}{s}$

The bases for these formulas are the relationship of the mean, median, and the

mode. The problem in first coefficient of skewness regarding the mode led to the development

of the second coefficient of skewness. The second coefficient formula is based on empirical

evidence on the distance of the median from the mean and the mode.

Example: Given the mean, median, mode and sample standard deviation of two different

sets of test scores in Stat 101 Finals, compute for the Pearson’s coefficient of

skewness for these two sets of scores.

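Set A: $\bar{X}$ = 29.5, Md = 20, Mo = 15, s = 19.33

$Sk_1 = \frac{\bar{X} - Mo}{s} = \frac{29.5 - 15}{19.33} \approx 0.75$

$Sk_2 = \frac{3(\bar{X} - Md)}{s} = \frac{3(29.5 - 20)}{19.33} \approx 1.47$

Set B: $\bar{X}$ = 70.5, Md = 80, Mo = 85, s = 19.33

$Sk_1 = \frac{\bar{X} - Mo}{s} = \frac{70.5 - 85}{19.33} \approx -0.75$

$Sk_2 = \frac{3(\bar{X} - Md)}{s} = \frac{3(70.5 - 80)}{19.33} \approx -1.47$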

Both coefficients of skewness indicate that the distribution of set A is positively skewed while the

distribution of set B is negatively skewed. Their magnitudes are equal indicating that they have the same

degree of asymmetry but in opposite directions.

4.2.2 Common Measures of Skewness

A measure of skewness is a single value that indicates the degree and direction of asymmetry.



Coefficient of Skewness Based on Third Moment

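The population coefficient of skewness based on the third moment is given by

$Sk = \frac{\sum_{i=1}^{N} (X_i - \mu)^3}{N\sigma^3}$

An unbiased estimator of the coefficient of skewness based on the third moment is given by

$Sk = \frac{\sqrt{n(n-1)}\; m_3}{(n-2)\left(\sqrt{m_2}\right)^3}$

where $m_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n}$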

Example: Given the sample data of two sets of test scores in Stat 101 Finals, compute for the

unbiased estimator of the coefficient of skewness based on the third moment for

these two sets of scores.

Set A

10 10 15 15 15 15 15 15 15

15 15 15 15 15 15 20 20 20

20 20 20 20 20 20 20 20 25

25 25 25 25 25 25 25 35 35

40 40 40 40 40 45 45 45 50

60 75 75 80 95

Set B

5 20 25 25 40 50 55 55 55

60 60 60 60 60 65 65 75 75

75 75 75 75 75 75 80 80 80

80 80 80 80 80 80 80 80 85

85 85 85 85 85 85 85 85 85

85 85 85 90 90

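Set A:

$m_2 = \frac{1}{50}\sum_{i=1}^{50} (X_i - 29.5)^2 = 366.25$

$m_3 = \frac{1}{50}\sum_{i=1}^{50} (X_i - 29.5)^3 = 11757$

$Sk = \frac{\sqrt{50(49)}\,(11757)}{48\left(\sqrt{366.25}\right)^3} = 1.7297$

Set B:

$m_2 = \frac{1}{50}\sum_{i=1}^{50} (X_i - 70.5)^2 = 366.25$

$m_3 = \frac{1}{50}\sum_{i=1}^{50} (X_i - 70.5)^3 = -11757$

$Sk = \frac{\sqrt{50(49)}\,(-11757)}{48\left(\sqrt{366.25}\right)^3} = -1.7297$

These values are the sample coefficients of skewness based on the third moment.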
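The computation can be reproduced with the following Python sketch. It assumes the estimator takes the form sqrt(n(n-1)) * m3 / ((n-2) * sqrt(m2)^3), with m_k the k-th central moment using divisor n; the helper names are illustrative only, and the two batches are the listed scores in frequency form.

```python
import math

# Set A and Set B from the example, written in frequency form.
set_a = ([10] * 2 + [15] * 13 + [20] * 11 + [25] * 8 + [35] * 2
         + [40] * 5 + [45] * 3 + [50, 60, 75, 75, 80, 95])
set_b = ([5, 20, 25, 25, 40, 50] + [55] * 3 + [60] * 5 + [65] * 2
         + [75] * 8 + [80] * 11 + [85] * 13 + [90] * 2)

def central_moment(data, k):
    """k-th central moment with divisor n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def third_moment_skewness(data):
    """Sample coefficient of skewness based on the third moment."""
    n = len(data)
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    return math.sqrt(n * (n - 1)) * m3 / ((n - 2) * math.sqrt(m2) ** 3)

print(round(third_moment_skewness(set_a), 4))
print(round(third_moment_skewness(set_b), 4))
```

The two results have the same magnitude and opposite signs, matching the conclusion of the example.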



Coefficient of Skewness Based on the Quartiles

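The coefficient of skewness based on the quartiles is defined as:

$Sk = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$

Example: Compute for the coefficient of skewness based on the quartiles of the two sets of test scores used in the previous example.

Set A: $Q_1 = 15$, $Q_2 = 20$, $Q_3 = 40$

$Sk = \frac{(40 - 20) - (20 - 15)}{40 - 15} = \frac{15}{25} = 0.60$

Set B: $Q_1 = 60$, $Q_2 = 80$, $Q_3 = 85$

$Sk = \frac{(85 - 80) - (80 - 60)}{85 - 60} = \frac{-15}{25} = -0.60$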

4.3 Measures of Kurtosis

4.3.1 Types of Kurtosis

The term kurtosis came from the Greek word “kurtos” meaning convex. It is used to describe the hump of a relative frequency distribution as compared to the normal distribution.

The normal distribution is a bell-shaped curve that is symmetric about its mean, μ. We will discuss the normal distribution in more detail in the succeeding lessons. Below are the three types of distribution according to kurtosis, relative to the normal distribution:

1. Mesokurtic – The hump is the same as the normal curve. It is neither too flat nor too peaked.

2. Leptokurtic – The curve is more peaked and the hump is narrower or sharper than the normal curve. The prefix “lepto” came from the Greek word leptos meaning small or thin.

3. Platykurtic – The curve is less peaked and the hump is flatter than the normal curve. The prefix “platy” from the Greek word platus means wide or flat.



Coefficient of Kurtosis Based on the Fourth Moment

The population coefficient of kurtosis based on the fourth moment is

$kurt = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4}$

The sample coefficient of kurtosis based on the fourth moment is derived by

[ ( )


where ∑ ( )

In general: kurt = 3 mesokurtic

kurt > 3 leptokurtic

kurt < 3 platykurtic

Exercise: Compute for the sample coefficient of kurtosis based on the fourth moment on

the two datasets of the scores of the students in Stat 101

(Hint: Set A and Set B’s coefficient of kurtosis should almost be equal and leptokurtic)



• The Exploratory Perspective

Classical statistical techniques yield the most favorable results under the condition that certain assumptions are satisfied. However, in reality, not all these assumptions are satisfied. Exploratory data analysis involves probing the data before comparing them to any probabilistic model. The techniques of exploratory data analysis help us to cope with a set of data in a fairly informal way, guiding us toward structure relatively quickly and easily. It provides us with an extensive repertoire of methods for the detailed study of a set of data.


Confirmatory Data Analysis | Exploratory Data Analysis

Assesses the reproducibility of the observed patterns and effects. | Isolates patterns and features of the data and reveals these forcefully to the analyst.

Works under a stringent set of assumptions. | Flexible in distribution.

Incorporates past gained info / data. | Explores the data at hand to discover structure / relationships.

Four themes of EDA

Resistance – a resistant method produces results that change only slightly when a small part of the data is replaced by new numbers.

Residuals – what remains after a summary or fitted model has been subtracted out of the data, according to the schematic equation:

Residual = data – fit

A key attitude of EDA is that an analysis of a set of data is not complete without a careful examination of the residuals.

Re-expression – involves finding what scale would simplify the analysis of the data.

Revelation – visual displays meet the analyst’s need to see behavior and thus to grasp the unexpected features as well as familiar regularities in the data.



Transforming Data

Frequently, the data that we obtain do not give straightforward information and are difficult to summarize; occasionally, the cost and ease of analysis are seriously impaired. Among the reasons that cause these difficulties are the following:

- Strong asymmetry

- Many outliers in one tail

- Widely differing measures of location and spread

- Large and systematic residuals

In order to avoid these difficulties, one possibility to consider is the use of data

transformation. We change not only the units by which the data are stated, but also the basic unit

of measurement.

Def’n: A transformation of the batch x1, x2, . . . , xn is a function T that replaces each xi by a new value T(xi), so that the transformed values of the batch are T(x1), T(x2), . . . , T(xn).

Properties of Transformations

1. They preserve the order of the data in a batch; that is, they are strictly increasing functions. Data values that are larger in the original scale will be larger in the re-expressed scale, but the spacing may change.

2. They preserve the letter values of a batch, except for small differences that may result from interpolating between data points. In particular, because letter values rely on order, medians are transformed to medians, and fourths to fourths.

3. They are continuous functions; this guarantees that points that are very close together in the raw batch will also be very close together in the re-expressed batch, at least relative to the scale being used.

4. They are smooth functions in that the functions we use have derivatives of all orders. This requirement guarantees that the functions do not have sharp corners.

5. They are specified by elementary functions, so that re-expression with the aid of all but the simplest hand-held calculators is quick and easy.
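Properties 1 and 2 can be illustrated with a small Python sketch. The batch below is made up purely for illustration, and the base-10 logarithm is used as one example of a strictly increasing elementary function; neither comes from the notes.

```python
import math
import statistics

# Hypothetical batch with an odd number of values, so the median is a data value.
batch = [40, 1, 250, 9, 4]

def transform(values, T):
    """Replace each x_i by T(x_i)."""
    return [T(x) for x in values]

logged = transform(batch, math.log10)

# Property 1: sorting then transforming gives the same result as transforming then sorting.
order_preserved = sorted(logged) == transform(sorted(batch), math.log10)

# Property 2: the median of the transformed batch is the transformed median.
median_preserved = statistics.median(logged) == math.log10(statistics.median(batch))

print(order_preserved, median_preserved)
```

With an even number of values the median is interpolated between two data points, so the second check would hold only approximately, which is the "small differences" caveat in property 2.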

Reasons for Transforming Data

1. To enhance interpretation in a natural way.

Sometimes transformation provides a natural way of reporting the information. For example, temperature readings are commonly reported in degrees Fahrenheit (°F). Transforming them to degrees Celsius (°C) facilitates interpretation, since 0 °C and 100 °C are the freezing and boiling points, respectively, of water.



2. To arrive at a symmetric pattern.

Symmetry is a desirable property not just for aesthetic reasons. A “typical” value

like the average or median summarizes a batch and is best understood when the batch

pattern is symmetrical.

3. To stabilize the spread in several batches.

With several batches, an increase in the level of a batch usually brings about an

increase in spread. If the relationship between spread and level is strong, there is

often a need to transform the batches so that they will be better suited for

comparison, visual exploration, and confirmatory analysis (e.g., ANOVA models

assume constant variance within groups). In addition, the individual batches will

become more nearly symmetric and may have fewer outliers.

4. To straighten out relationship between two variables.

There are advantages in working with a linear relationship between two variables.

Interpretation is easier and departures from fit are more easily detected.

Furthermore, interpolation and extrapolation are easy. Transformation of one or both

variables sometimes will straighten relationships that are not linear.

5. To simplify the structure in two-way tables.

The two-way table presents another data structure in which transformation may lead to simplification. Transformation often makes it easier to understand and explain all the systematic variation in the table using an additive model.

The Stem-and-Leaf display



The stem-and-leaf display is a graphical tool used to organize data in such a way that the shape/distribution of the data is seen clearly without losing the actual data values. It enables us to notice such characteristics as:

- the symmetry of the batch

- how spread out the data values are

- the presence of values that are far removed from the rest

- the presence of areas wherein the data is concentrated

- the presence of gaps in the sorted data


1. It helps us to see the distribution of the data values within each interval, as well as patterns in the data values

2. By preserving more early digits of the data values, the display also shortens the link back to the individual observation and any identifying information accompanying it.

3. The lines in the display provide more information than the bars in the histogram.

Constructing a basic stem-and-leaf display

1. Choose a suitable pair of adjacent digits in the data.

2. Split each data value between the two digits.

3. Allocate a separate line in the display for each possible string of leading digits (stem).

4. Write down the trailing digit (leaf) of each data value on the line corresponding to its leading digits. As an option, the leaves can be sorted in ascending order.

5. Do not forget to indicate the title and unit.

Ranks are assigned by counting in from each end of the ordered batch. The depth of a data value is the smaller of its two ranks.

o Depth is symmetric and the median will have the highest depth.

o For lines having more than 1 leaf, the depth will be the maximum among the leaves on the line.

o If the median falls on a stem, instead of putting the depth of the median on the line to which it belongs, we count the leaves on the middle line and enclose the count in parentheses. Otherwise, use the definition of depth.

Example The following data show the commissions earned (in thousands of pesos) by a firm of 26

real-estate brokers in a month:

93 67 75 88 66 60 56 71 102 83 81 60 57 59 77 70 86 87 95 84 71 73 66 70 95 82
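The construction steps can be sketched in Python for the commissions data. This is a minimal illustration that splits each value between its tens and units digits; the function name `stem_and_leaf` and the `unit` parameter are assumptions made for the sketch, and the depth column is not computed.

```python
from collections import defaultdict

commissions = [93, 67, 75, 88, 66, 60, 56, 71, 102, 83, 81, 60, 57,
               59, 77, 70, 86, 87, 95, 84, 71, 73, 66, 70, 95, 82]

def stem_and_leaf(values, unit=1):
    """Map each stem (leading digits) to its sorted list of leaves (trailing digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v // unit, 10)
        rows[stem].append(leaf)
    # Keep a line for every possible stem so that gaps remain visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

display = stem_and_leaf(commissions)
for stem, leaves in display.items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```

Sorting the values first means the leaves on each line are already in ascending order, which is the optional refinement mentioned in step 4.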

Stem | Leaf (unit = 1)      Depth

   5 | 6 7 9                 3

   6 | 0 0 6 6 7             8

   7 | 0 0 1 1 3 5 7        (7)

   8 | 1 2 3 4 6 7 8        11

   9 | 3 5 5                 4

  10 | 2                     1

Note: Resistant methods are little affected by a small fraction of unusual data values. When unusual data values are noticed, these are discarded, and the basis for the choice of scale is the rest of the data that remains.

Letter Values

For exploratory purposes, it is often advantageous to use summaries based on sorting and counting. Such summaries can be resistant, that is, an arbitrary change in a small part of the batch can have only a small effect on the summary. Letter values are a collection of observations drawn systematically from the batch, more densely from the tails than from the middle. These letter values can be used to

- define resistant measures of location

- define the amount of spread in the batch

- search for outliers

If we are to make effective use of summary values, we must present them in a format that reveals the important numerical features of a batch and invites simple calculations related to location and spread. A simple, useful and flexible method is the use of the 5-letter summary. The skeleton of this letter-value display looks like the following:

# n

M   depth of median   median

F   depth of fourth   lower fourth    upper fourth

1                     lower extreme   upper extreme

In this display, "# n" reminds the reader that the batch has n values, M stands for median, and F stands for fourth. The column beside these labels contains the depths of M, F and the extremes.


Constructing the 5-number summary

1. Sort the data from lowest to highest.

2. Find the median and the depth of the median.

The upward rank is the position of the value counting upward from the smallest value.

The downward rank is the position of the value counting downward from the largest value.

The depth of a data value is the smaller of its upward rank and its downward rank.

$\text{depth of median} = \frac{n + 1}{2}$

3. Find the fourths/hinges and their depths.

$\text{depth of fourth} = \frac{[\text{depth of median}] + 1}{2}$

where [ ] means that any fraction is dropped. A depth ending in one-half means that the letter value is the average of the two data values on either side of that depth.

4. Find the extremes: the two data values with a depth of 1, namely, the minimum and the maximum.

5. Present the data following the format shown above.

The 5-number summary can be extended to a 7-letter summary by adding two more summary values, called the eighths. This simply follows the pattern started by defining the fourths, with the depth of the eighths being defined as

$\text{depth of eighth} = \frac{[\text{depth of fourth}] + 1}{2}$

For larger and larger batches we can continue to add pairs of summary values by halving the fraction of the data remaining beyond the previous non-extreme summary value at each end of the batch, stopping when the depth reaches 1.
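The steps for the 5-number summary can be sketched in Python using the commissions data from the stem-and-leaf example. The sketch assumes the usual depth rules, depth of median = (n + 1)/2 and depth of fourth = ([depth of median] + 1)/2, and averages the two neighbouring values when a depth ends in one-half; the function names are illustrative.

```python
import math

def value_at_depth(sorted_batch, depth, from_top=False):
    """Value at a whole or half depth, counted in from the bottom (or top)."""
    ordered = sorted_batch[::-1] if from_top else sorted_batch
    lo, hi = math.floor(depth), math.ceil(depth)
    # At a half depth this averages the two neighbouring data values.
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

def five_number_summary(batch):
    x = sorted(batch)
    n = len(x)
    depth_m = (n + 1) / 2
    depth_f = (math.floor(depth_m) + 1) / 2
    return {
        "n": n,
        "M": (depth_m, value_at_depth(x, depth_m)),
        "F": (depth_f, value_at_depth(x, depth_f),
              value_at_depth(x, depth_f, from_top=True)),
        "1": (1, x[0], x[-1]),
    }

commissions = [93, 67, 75, 88, 66, 60, 56, 71, 102, 83, 81, 60, 57,
               59, 77, 70, 86, 87, 95, 84, 71, 73, 66, 70, 95, 82]
summary = five_number_summary(commissions)
for label in ("M", "F", "1"):
    print(label, *summary[label])
```

For this batch of 26 values the median sits at depth 13.5, so it is the average of the 13th and 14th ordered values.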

Box-and-Whisker Plot

The box-and-whisker plot, more commonly known as the boxplot, was developed by John Tukey. It is a visual representation of the 5-number summary of a batch of numbers which shows much of the structure of the batch. The boxplot shows characteristics that derive from the actual data, not from an assumed distributional form. It can be used when we cannot or do not assume a distributional form for the data.

From the boxplot, the following characteristics can be determined:

a. location b. spread c. skewness d. tail length e. outlying observations



Constructing Box-and-Whisker Plot

1. Obtain the 5-letter summary of the batch and the fourth-spread (midspread). The cut-offs are defined as the two points 1.5 fourth-spreads away from the two fourths:

dF = FU – FL

lower cut-off = FL – 1.5dF

upper cut-off = FU + 1.5dF

2. Construct the box. The left (bottom) end of the box is located at the lower fourth, and the right (top) end is located at the upper fourth. These are called the hinges. The median is a line located inside the box, between the two fourths.

3. The tips/caps of the whiskers are lines located at the points in the batch farthest from the hinges, but within the defined cut-offs.

4. Any cases beyond these marks are marked individually:

- outliers are points between 1.5 and 3 midspreads from the hinges, denoted by an x-mark.

- extremes are points beyond 3 midspreads from the hinges, denoted by a circle.


Population of the 15 largest US cities in 1960

City | Pop'n (in 10,000s) | City | Pop'n (in 10,000s)

New York | 778 | Washington D.C. | 76

Chicago | 355 | St. Louis | 75

Los Angeles | 248 | Milwaukee | 74

Philadelphia | 200 | San Francisco | 74

Detroit | 167 | Boston | 70

Baltimore | 94 | Dallas | 68

Houston | 94 | New Orleans | 63

Cleveland | 88 | |

# 15   Population of 15 Largest US Cities in 1960

M   8    88

F   4h   74    183.5

1        63    778

(A depth of 4h means 4 and one-half, so each fourth is the average of the values at depths 4 and 5.)

dF = FU – FL = 183.5 – 74 = 109.5

Outside cut-offs: (-90.25, 347.75)

Outliers: New York (778), Chicago (355)
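The cut-off computation for this example can be sketched in Python. The sketch assumes the fourths are taken at depth ([depth of median] + 1)/2, with half depths averaged, which reproduces the fourths 74 and 183.5 shown in the display; the helper name `fourths` is illustrative.

```python
import math

# Populations in units of 10,000, as listed in the table.
populations = {
    "New York": 778, "Chicago": 355, "Los Angeles": 248, "Philadelphia": 200,
    "Detroit": 167, "Baltimore": 94, "Houston": 94, "Cleveland": 88,
    "Washington D.C.": 76, "St. Louis": 75, "Milwaukee": 74,
    "San Francisco": 74, "Boston": 70, "Dallas": 68, "New Orleans": 63,
}

def fourths(batch):
    """Lower and upper fourths (hinges) of a batch."""
    x = sorted(batch)
    depth_m = (len(x) + 1) / 2
    depth_f = (math.floor(depth_m) + 1) / 2
    lo, hi = math.floor(depth_f), math.ceil(depth_f)
    lower = (x[lo - 1] + x[hi - 1]) / 2   # counted in from the smallest value
    upper = (x[-lo] + x[-hi]) / 2         # counted in from the largest value
    return lower, upper

f_lower, f_upper = fourths(populations.values())
d_f = f_upper - f_lower                   # fourth-spread
low_cut, high_cut = f_lower - 1.5 * d_f, f_upper + 1.5 * d_f
outside = [city for city, p in populations.items() if p < low_cut or p > high_cut]

print(f_lower, f_upper, d_f)
print(low_cut, high_cut)
print(outside)
```

Only the two largest cities fall outside the cut-offs, so the upper whisker would end at Los Angeles (248), the largest value still inside them.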



An abstract model is a description of the essential properties of a phenomenon.

A deterministic model is a type of abstract model that describes a phenomenon through known relationships among the states and events, in which a given input will always produce the same output.

The development of probability theory was not originally intended to be used in solving

inferential problems. It was first developed to give answers to professional gambler’s questions

on the systematic pattern of outcomes of games involving dice or cards that will allow them to

adjust their bets to the “odds” of success. This is the reason why most of the basic examples on

probability theory are die-throwing experiments and the selection in a deck of cards.

Today, many important phenomena that are of interest to humankind share something

in common with these games of chance. It is impossible to predict with certainty when such a

phenomenon will occur. By studying patterns, we can learn more about the behavior of the

phenomenon of interest and then be able to predict an occurrence of a phenomenon with a

certain degree of confidence.

The use of abstract models is actually not new to many of us. We apply the

mathematical formula provided by an abstract model to come up with an approximation of


The deterministic model is a model that we commonly encounter during the

application part of Elementary Math. One example is the computation of the area of a certain

rectangular piece of land. The area that you would get would always be the same every time

you compute for it.

6.1 Probabilistic Models



A probabilistic/stochastic model is a type of abstract model that describes a phenomenon by assigning a likelihood of occurrence to the different possible outcomes of the process.

A random experiment is a process that can be repeated under similar conditions but whose outcome cannot be predicted with certainty beforehand.

The sample space, denoted by Ω (Greek letter omega), is the collection of all possible outcomes of a random experiment. An element of the sample space is called a sample point.

An example of a stochastic model is the game that involves tossing a coin. The results

of the tosses would not be certain even if it is loaded (unfair coin). In fact, no matter how many

times we repeat the process, it is impossible to predict with certainty what the next outcome

will be.

In inferential statistics, the process of selecting a sample of size n from a population of

size N using probability sampling is one of the random experiments of interest. It is just like

selecting n cards at random from a deck of N=52 cards.

Even if we use exactly the same sample selection procedure, there is no way we can

predict, without any error, what the composition of the next sample will be.

We can show the sample space by using any of the various methods of listing. One

example is the roster method, where we list all the possible outcomes of the experiment.

Example: Specify the sample space of the experiment of tossing a coin twice.

First we use H to denote the result of getting a head in a toss and T to denote

the result of getting a tail in a toss. Then there are just four possible results therefore:

Ω = {HH, HT, TH, TT}
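The roster method can be mimicked in Python by enumerating every sequence of tosses. This is a small sketch; the function name `sample_space` is an illustrative choice.

```python
from itertools import product

def sample_space(n_tosses):
    """List every possible sequence of heads (H) and tails (T)."""
    return ["".join(outcome) for outcome in product("HT", repeat=n_tosses)]

omega = sample_space(2)
print(omega)        # ['HH', 'HT', 'TH', 'TT']
print(len(omega))   # 4
```

The number of sample points doubles with each extra toss, which is why listing quickly becomes impractical for long sequences.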

6.2 Basic Concepts of Probability



An event is a subset of the sample space whose probability is defined. We say that an event occurred if the outcome of the experiment is one of the sample points belonging to the event; otherwise, the event did not occur.

Aside from the roster method, we can specify the sample space using the rule method, which is usually preferred when the experiment has too many outcomes to list.


Example: Specify the sample space of the experiment of tossing a coin 1000 times.

Again, we use H to denote the result of getting a head in a toss and T to denote the result of getting a tail in a toss. There are 2^1000 ≈ 1.0715 × 10^301 possible results, so we use the rule method:

Ω = {(x1, x2, …, x1000) | xi ∈ {H, T} for all i}

In set theory, a subset of the universal set is a set. Since an event is a subset of the sample space (our universal set), we denote an event of interest the same way we denote a set: by a capital Latin letter.

Example: Consider the experiment of tossing a pair of colored dice, one red and the other green. Let Ω = {(x, y) | x ∈ {1, 2, …, 6} and y ∈ {1, 2, …, 6}}, where x is for the red die and y for the green die. This sample space contains 36 sample points by the rules of counting.

Some examples of events are listed below:

A = event of having the same number of dots on both dice

= { (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) }

B = event of 3 dots on the red die

= { (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6) }

C = event of getting a sum of 5 dots on both dice

= { (1, 4), (2, 3), (3, 2), (4, 1) }

D = event of 7 dots on the green die = ∅
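The events listed above can be built as subsets of the sample space in Python. This is a sketch in which each sample point is a tuple (red, green); the variable names simply mirror the event labels used in the example.

```python
from itertools import product

# Sample space: ordered pairs (x, y), x for the red die and y for the green die.
omega = set(product(range(1, 7), repeat=2))

A = {(x, y) for (x, y) in omega if x == y}        # same number of dots on both dice
B = {(x, y) for (x, y) in omega if x == 3}        # 3 dots on the red die
C = {(x, y) for (x, y) in omega if x + y == 5}    # sum of 5 dots
D = {(x, y) for (x, y) in omega if y == 7}        # 7 dots on the green die

print(len(omega), len(A), len(B), len(C), len(D))
print(sorted(C))
```

Event D comes out as the empty set because no sample point has a 7 on the green die.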



The impossible event is the empty set, ∅. The sure event is the sample space, Ω.

Two events A and B are mutually exclusive events if and only if A ∩ B = ∅; that is, A and B have no common elements.

The event D is an example of an impossible event, one that we know would never happen. Some pairs of events can occur simultaneously, while others can never do so. The easiest way to check is to look at the sample points of both events: if they have at least one sample point in common, the two events can happen simultaneously; if they have none, the events cannot occur simultaneously and are called mutually exclusive events.

The concept of mutually exclusive events can be extended to more than two events. For example, three events, say A, B, and C, are mutually exclusive events whenever it is impossible for any pair of these events to occur simultaneously. Mathematically speaking, A ∩ B = ∅, A ∩ C = ∅, and B ∩ C = ∅ must all be true.

Review of Some Concepts in Set Theory for Probability Theory

Let A, B and C be events in Ω, then:

1. Inclusion: We say that A is a subset of B if all points of A are also points in B. Symbolically, A ⊆ B ⟺ (x ∈ A ⟹ x ∈ B).

2. Equality: If A ⊆ B and B ⊆ A, then A and B are said to be equal, denoted by A = B.

3. Intersection of A and B: A ∩ B = {x : x ∈ A and x ∈ B}

4. Union of A and B: A ∪ B = {x : x ∈ A or x ∈ B}

5. Complement of A: A^C = {x : x ∈ Ω and x ∉ A}

6. Set Difference: A\B = A ∩ B^C = {x : x ∈ A and x ∉ B}

7. Symmetric Difference: (A\B) ∪ (B\A) = {x : x ∈ A and x ∉ B} ∪ {x : x ∈ B and x ∉ A}

a) Unions, intersections and set differences of events are also events.

b) Some of the properties of event composition and event relations listed below are useful in answering problems on probability.

Some Properties of Event Composition and Event Relations:

1. Reflexivity of Inclusion: A ⊆ A.

2. Transitivity of Inclusion: A ⊆ B, B ⊆ C ⟹ A ⊆ C.

3. A\B and B\A are disjoint. A ∩ B and A\B = A ∩ B^C are also disjoint.

4. (A ∩ B) ∪ (A ∩ B^C) = A

5. Reflexivity of Union and Intersection: A ∪ A = A and A ∩ A = A

6. Commutative Property of Union and Intersection: A ∪ B = B ∪ A and A ∩ B = B ∩ A

7. Associative Property of Union and Intersection:

(A ∪ B) ∪ C = A ∪ (B ∪ C) = (A ∪ C) ∪ B

(A ∩ B) ∩ C = A ∩ (B ∩ C) = (A ∩ C) ∩ B

8. Distributive Property of Union and Intersection:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

9. De Morgan’s Law:

(A^C)^C = A

(A ∪ B)^C = A^C ∩ B^C

(A ∩ B)^C = A^C ∪ B^C

A simple event is an event which contains only one element of the sample space, while a compound event is an event that can be expressed as the union of simple events, thus containing more than one sample point.

The probability of an event A, denoted by P(A), is a function that assigns a measure of chance that event A will occur and must satisfy the following properties:

a) 0 ≤ P(A) ≤ 1 for any event A

b) P(Ω) = 1

c) Finite Additivity: If A can be expressed as the union of n mutually exclusive events, that is, A = A1 ∪ A2 ∪ ... ∪ An, then P(A) = P(A1) + P(A2) + ... + P(An)

The last property, also called the finite additivity property, provides a useful tool in

computing for the probability of an event. It says we need to express event A as the union of

mutually exclusive events with known probabilities, and then, we simply get the sum of their

individual probabilities in order to compute for P(A).
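Finite additivity and De Morgan's law can be checked numerically on the two-dice sample space. The sketch below assumes all 36 sample points are equally likely, so each event's probability is its number of sample points divided by 36; that assumption and the helper `prob` are illustrative and not part of the definition above.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))       # (red, green)

def prob(event):
    """Probability under the assumption of equally likely sample points."""
    return Fraction(len(event), len(omega))

A = {(x, y) for (x, y) in omega if x == y}        # same number on both dice
B = {(x, y) for (x, y) in omega if x == 3}        # 3 on the red die
C = {(x, y) for (x, y) in omega if x + y == 5}    # sum of 5

# A and C are mutually exclusive, so their probabilities add.
mutually_exclusive = (A & C == set())
additivity_holds = prob(A | C) == prob(A) + prob(C)

# De Morgan: the complement of a union is the intersection of the complements.
de_morgan_holds = (omega - (A | B)) == ((omega - A) & (omega - B))

print(mutually_exclusive, additivity_holds, de_morgan_holds)
print(prob(A | C))
```

Because A and B share the point (3, 3), P(A ∪ B) is smaller than P(A) + P(B); additivity applies only to mutually exclusive events.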


1. Being a subset of the sample space, an event is itself a set not a number. On the other

hand, the probability of an event is a number.

2. A probability near 1 indicates that the event is very likely to occur. It is not a guarantee that the event will occur; it only means that the event is a common occurrence. On the other hand, if the probability is near 0, the event is rare, meaning it is unlikely to occur.

3. A probability near ½ or 0.5 indicates that the event is just as likely to occur as not.



6.3 Approaches to Assigning Probabilities

6.3.1 Classical Probability (A Priori)

Under the classical approach, the probability of an event is determined even before the experiment is performed, using the following rule:

If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is:

$P(A) = \frac{n}{N}$

Example: Tossing a Fair Coin Thrice

Note that in the experiment of tossing a fair coin thrice, it was already assumed that the coin is “fair”, meaning heads and tails are equally likely to occur on each toss. It is apparent that the random experiment need not be performed to assess the probability of occurrence of an event. The sample space has 2^3 = 8 equally likely sample points, and the number of sample points in event A, the event that there is at least one tail in three tosses of a fair coin, is seven. Then P(A), the probability that three tosses of a coin would result in at least one tail, is 7/8.

6.3.2 Relative Frequency (A Posteriori)

Under the relative frequency approach, the probability of an event is determined by repeating the experiment a large number of times using the rule:

$P(A) \approx \frac{\text{number of times event } A \text{ occurred}}{\text{number of times the experiment was repeated}}$

Numerous intensive studies have been conducted to analyze consumer planning for the purchase of durable goods such as television sets, refrigerators, washing machines, stoves and automobiles. Suppose that a marketing director for a consumer electronics company was interested in studying the intention of consumers to purchase new large television sets (defined as 35 inches or larger) in the next 12 months and, as a follow-up, whether they in fact actually purchased the television. Suppose that a sample of 1000 households was initially selected and the respondents were asked whether they actually purchased the television. The results are summarized below.

Purchase Behavior of 1,000 Households for Large Televisions

Planned to Purchase | Actually Purchased: Yes | Actually Purchased: No | Total

Yes | 200 | 50 | 250

No | 100 | 650 | 750

Total | 300 | 700 | 1000

Consider the random experiment of conducting an interview of a household regarding the intention to purchase a large television and performing a follow-up as to whether or not the said household purchased the television a year after. The random experiment is performed for 1000 different households. Suppose that a new household with the same characteristics as those previously surveyed is subjected to the same interview and follow-up. The probability that the new household intends to buy a large television and follows through within the year is approximately 200/1000 = 0.20.
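The relative frequency calculation can be written out in Python from the table counts. This is a sketch in which each key is a pair (planned to purchase, actually purchased); the helper `relative_frequency` is an illustrative name.

```python
# Counts from the purchase behavior table: (planned, purchased) -> households.
counts = {
    ("yes", "yes"): 200, ("yes", "no"): 50,
    ("no", "yes"): 100,  ("no", "no"): 650,
}
total = sum(counts.values())

def relative_frequency(event):
    """Share of surveyed households whose (planned, purchased) pair satisfies `event`."""
    return sum(c for pair, c in counts.items() if event(pair)) / total

p_planned_and_purchased = relative_frequency(lambda pair: pair == ("yes", "yes"))
p_planned = relative_frequency(lambda pair: pair[0] == "yes")
p_purchased = relative_frequency(lambda pair: pair[1] == "yes")

print(p_planned_and_purchased, p_planned, p_purchased)
```

The same helper gives the marginal relative frequencies of planning to purchase (250 of 1000) and of actually purchasing (300 of 1000).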

6.3.3 Subjective Probability

Under the Subjective Probability Approach, the probability of occurrence of an event is determined by the use of intuition, personal beliefs, and other indirect information.

Suppose that an oil spill has occurred. An environmental scientist asks, “What is the probability that this spill can be contained before it causes widespread damage to nearby beaches?” Many factors come into play; among them are the type of spill, the amount of oil spilled, the wind and water conditions during the clean-up operation, and the nearness of the beaches. These factors make this spill unique. The scientist is called upon to make a value judgment, that is, to assign a probability to the event based on informed personal information.


6.4 Counting Techniques

6.4.1 Basic Principle of Counting

Suppose an experiment can be divided into 2 stages. If the first stage of the experiment can result in n distinct possible outcomes and if, for each outcome of the first stage, there are m distinct possible outcomes of the second stage, then there are nm possible outcomes of this experiment.

1. If a man has red, green and gold shirts and red, green and gold neckties, in how many ways can he choose the colors of his shirt and necktie?

A = {(x, y): x ∈ {R, G, Au} and y ∈ {R, G, Au}}. The first coordinate of the ordered pair, x, represents the color of the shirt selected while the second coordinate, y, represents the color of the necktie selected.

Let n = number of choices for x and m = number of choices for y.

Then n(A) = nm = (3)(3) = 9

2. From a menu containing 3 soups, 2 salads, 6 entrees, and 3 desserts, how many different dinners can be ordered?

A = {(x1, x2, x3, x4): x1 ∈ {1, 2, 3}, x2 ∈ {1, 2}, x3 ∈ {1, 2, 3, 4, 5, 6} and x4 ∈ {1, 2, 3}}, where

x1 = type of soup ordered, n1 = number of types of soup

x2 = type of salad ordered, n2 = number of types of salads

x3 = type of entree ordered, n3 = number of types of entrees

x4 = type of dessert ordered, n4 = number of types of desserts

Then n(A) = n1n2n3n4 = (3)(2)(6)(3) = 108

3. How many 3-digit numbers can be formed from the digits 1, 2, 5, 6 and 9

a. If each digit need not be distinct?

A = {(x1, x2, x3): xi ∈ {1, 2, 5, 6, 9}, i = 1, 2, 3}. n(A) = (5)(5)(5) = 125

b. If each digit must be distinct?

B = {(x1, x2, x3): xi ∈ {1, 2, 5, 6, 9}, i = 1, 2, 3, where xi ≠ xj for i ≠ j}. n(B) = (5)(4)(3) = 60
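The counts in these examples can be confirmed by brute-force enumeration in Python. This is a sketch that uses `itertools.product` for multi-stage choices and `itertools.permutations` for choices without repetition; the variable names are illustrative.

```python
from itertools import permutations, product

# Example 1: shirt color and necktie color, 3 choices each.
colors = ["R", "G", "Au"]
outfits = list(product(colors, colors))

# Example 2: 3 soups, 2 salads, 6 entrees, 3 desserts.
dinners = list(product(range(3), range(2), range(6), range(3)))

# Example 3: 3-digit numbers from the digits 1, 2, 5, 6, 9.
digits = [1, 2, 5, 6, 9]
with_repeats = list(product(digits, repeat=3))   # digits need not be distinct
all_distinct = list(permutations(digits, 3))     # digits must be distinct

print(len(outfits), len(dinners), len(with_repeats), len(all_distinct))
```

Enumeration is only practical for small cases like these, but it is a useful check that the multiplication rule has been applied to the right stages.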


4. If a multiple-choice test consists of 5 questions, each with 4 possible answers of which only 1 is correct,

a. How many different ways can a student answer the questions?

A = {(x1, x2, x3, x4, x5): xi ∈ {a, b, c, d}, i = 1, 2, 3, 4, 5}.

n(A) = (4)(4)(4)(4)(4) = 4^5 = 1024

b. How many different ways can a student answer all the 5 questions incorrectly?

B = {(x1, x2, x3, x4, x5): xi ∈ {a, b, c, d} – {yi}, where yi is the correct answer to question i, i = 1, 2, 3, 4, 5}.

n(B) = (3)(3)(3)(3)(3) = 3^5 = 243

Exercise:

1. How many different 7-place license plates are possible if the first 2 places are for letters and the other 5 are for numbers? What about if no letter or no number can be repeated in a single license plate?

2. Paul, John, Ringo and George have formed a band consisting of 4 instruments. If each of the boys can play all 4 instruments, how many different arrangements are possible? What if Paul and John can play all 4 instruments but George and Ringo can play only the piano and drums?

3. How many different ways can a true-false test consisting of 10 questions be answered?

6.4.2 Permutations and Combinations

An ordered k-tuple of elements of a nonempty set Z, say (z1, z2, ..., zk), with distinct components (that is, zi ≠ zj for i ≠ j) is called a permutation of k elements of Z. If set Z contains n distinct elements, then the number of k-permutations of set Z is denoted by P(n, k) or nPk.

A subset {z1, z2, ..., zk} with k distinct elements of a nonempty set Z is called a combination of k elements of Z. If set Z contains n distinct elements, then the number of k-combinations of set Z is denoted by C(n, k) or nCk.

Example: Suppose Z = {1, 2, 3, 4, 5}. List down all the possible permutations of 3 elements of Z. List down all possible combinations of 3 elements of Z.

There are 60 possible permutations for this scenario.

Whereas, there are only 10 possible combinations as follows: {1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, {3, 4, 5}.


If we are counting the number of ways k objects can be chosen, in order, from n objects when the same object may be chosen more than once, the count is n^k. If the selected objects must be distinct, the count is (n)_k = n(n-1)(n-2)···(n-k+1) = n!/(n-k)!, where n! = n x (n-1) x (n-2) x ..... x 1. We also define 0! = 1.

There are k! distinct permutations associated with each combination, since k! is the number of ways in which the elements included in the combination can be arranged. Hence C(n, k) = (n)_k / k! = n!/(k!(n-k)!).

When we are counting the number of different groups of k objects that can be formed

from n distinct objects, we would count the combinations while if we are interested in

determining the number of different ordered arrangements of k objects selected from n

distinct objects, we count the permutations.


1. Consider the experiment of tossing 4 distinguishable dice.

a. How many possible outcomes are there?

Ω = {(x1, x2, x3, x4): xi ∈ Z = {1, 2, 3, 4, 5, 6}}

n(Ω) = n^k = 6^4 = 1296

b. How many possible outcomes are there for which no two dice show the same number of spots?

A = {(x1, x2, x3, x4): xi ∈ Z = {1, 2, 3, 4, 5, 6}, xi ≠ xj for i ≠ j}

n(A) = (n)_k = (6)_4 = (6)(5)(4)(3) = 360

c. How many possible outcomes are there for which all the spots are even?

B = {(x1, x2, x3, x4): xi ∈ Z = {2, 4, 6}}

n(B) = n^k = 3^4 = 81

2. From a group of 5 men and 7 women.

1. How many committees of 5 persons can be formed?

Since we are counting the number of different groups of k = 5 objects that can be formed from a total of n = 12 objects, we are counting combinations. Therefore, the number of possible committees of size 5 that can be formed is $\binom{12}{5} = \frac{12!}{5!\,7!} = 792$.




2. How many committees of 2 men and 3 women can be formed?

The experiment can be divided into 2 stages: (a) the selection of the men and (b)

the selection of women. Then by the basic principle of counting, there are n1n2

possible committees where n1= number of ways that the men can be selected

and n2= number of ways that the women can be selected.

Here $n_1 = \binom{5}{2} = 10$ and $n_2 = \binom{7}{3} = 35$. Thus, the number of committees consisting of 2 men and 3 women that can be formed is $n_1 n_2 = \binom{5}{2}\binom{7}{3} = (10)(35) = 350$.
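The committee counts can be reproduced with `math.comb` (Python 3.8 or later assumed). This is a small sketch of the two-stage counting argument; the variable names are illustrative.

```python
from math import comb

men, women = 5, 7

# Committees of 5 persons from all 12 people
any_five = comb(men + women, 5)

# Two stages: choose 2 of the 5 men, then 3 of the 7 women
n1 = comb(men, 2)
n2 = comb(women, 3)
two_men_three_women = n1 * n2

print(any_five)                     # 792
print(n1, n2, two_men_three_women)  # 10 35 350
```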


1. Consider the game of poker where a player is given 5 cards.

a. How many 5-card poker hands are there?

b. How many of these 5-card poker hands contain exactly 3 hearts?

2. A class consists of 10 men and 20 women. An examination is given, and the students

are ranked according to their performance. Assume that no two students obtain the

same score.

a. How many different rankings are possible?

b. If the men were ranked just among themselves and the women among themselves, how many different rankings are possible?

3. Five separate awards are to be presented to selected students from a class of 30.

How many different outcomes are possible if

a. A student can receive any number of awards;

b. Each student can receive at most 1 award

Example: How many different letter arrangements can be formed using the letters P E P P E R?

There are $\frac{n!}{r_1!\,r_2!\,r_3!} = \frac{6!}{3!\,2!\,1!} = 60$ possible letter arrangements, where $n$ = number of letters, $r_1$ = number of P’s, $r_2$ = number of E’s, and $r_3$ = number of R’s.
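As a check on the PEPPER example, the sketch below computes the count with the formula n!/(r1! r2! ... rk!) and compares it with a brute-force listing of the distinct arrangements. The function name `distinct_arrangements` is hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """Number of distinct arrangements: n! / (r1! * r2! * ... * rk!)."""
    total = factorial(len(word))
    for r in Counter(word).values():
        total //= factorial(r)
    return total

word = "PEPPER"
by_formula = distinct_arrangements(word)
# Listing all 6! orderings and discarding duplicates gives the same count
by_enumeration = len(set(permutations(word)))

print(by_formula, by_enumeration)   # 60 60
```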

Additional Counting Theorems:

The number of permutations of n distinct objects in a circle is (n-1)!

The number of distinct permutations of n things of which $r_1$ are of the 1st kind, $r_2$ are of the 2nd kind, ..., $r_k$ are of the kth kind is $\frac{n!}{r_1!\,r_2!\cdots r_k!}$, where $\sum_{i=1}^{k} r_i = n$.



Take note that there are actually 6! permutations of the letters P1, E1, P2, P3, E2, R when the 3 P’s and 2 E’s can be distinguished from each other. But in this case, the P’s cannot be distinguished from one another, and neither can the E’s, so each distinct arrangement is counted 3! × 2! times among the 6! permutations.

Exercise:

1. How many different signals, each consisting of 9 flags hung in a line, can be made from a set of 4 white flags, 3 red flags, and 2 blue flags, if all flags of the same color are identical?

2. How many ways can 20 new applicants be assigned to the 5 committees of an organization so that each committee will get 4 new applicants each?

Suppose an urn contains M balls, labeled 1 to M, and a sample of size n is drawn, then

there are:

i. $M^n$ ordered samples with replacement

ii. $(M)_n$ ordered samples without replacement

iii. $\binom{M}{n}$ unordered samples without replacement

Example: Suppose an urn contains 10 balls, labeled 1 to 10, and a sample of size 5 is drawn.

1. How many ordered samples with replacement can be drawn? $10^5 = 100{,}000$

2. How many ordered samples without replacement can be drawn? $(10)_5 = (10)(9)(8)(7)(6) = 30{,}240$

3. How many unordered samples without replacement can be drawn? $\binom{10}{5} = 252$
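The three urn counts can be compared against a direct enumeration. The sketch below assumes the third count refers to unordered samples without replacement, as in item iii of the list above; the variable names are illustrative.

```python
from itertools import combinations, permutations, product
from math import comb, perm

M, n = 10, 5
balls = range(1, M + 1)

# Counts by formula
ordered_with = M ** n            # M^n
ordered_without = perm(M, n)     # (M)_n
unordered_without = comb(M, n)   # M choose n

# Counts by listing every sample
listed_ordered_with = sum(1 for _ in product(balls, repeat=n))
listed_ordered_without = sum(1 for _ in permutations(balls, n))
listed_unordered_without = sum(1 for _ in combinations(balls, n))

print(ordered_with, ordered_without, unordered_without)   # 100000 30240 252
```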

6.4.3 Special Results in Counting

Theorem: Suppose an urn contains M balls, labeled 1 to M, and those labeled from 1 to K (K < M) are defective. Define $A_k$ = the set containing all possible ordered samples of size n which contain exactly k defectives.

(i) Under sampling with replacement, $n(A_k) = \binom{n}{k} K^k (M-K)^{n-k}$.

(ii) Under sampling without replacement, $n(A_k) = \binom{n}{k} (K)_k (M-K)_{n-k}$.




1. Suppose an urn contains 10 balls, labeled 1 to 10, and those labeled from 1 to 4 are

defective. Define A2 = the set containing all possible ordered samples of size 5 which

contains 2 defectives.

a. How many elements belong to A2 if sampling is done with replacement?

$n(A_2) = \binom{5}{2} (4)^2 (6)^3 = (10)(16)(216) = 34{,}560$

b. How many elements belong to A2 if sampling is done without replacement?

$n(A_2) = \binom{5}{2} (4)_2 (6)_3 = (10)(12)(120) = 14{,}400$

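2. If a multiple-choice test consists of 10 questions, each with 4 possible answers of which only 1 is correct, in how many ways can the test be answered such that

a. there are exactly 7 correct answers

$n(A_7) = \binom{10}{7} (1)^7 (3)^3 = (120)(27) = 3{,}240$

b. there are at least 7 correct answers

$\sum_{k=7}^{10} n(A_k) = \sum_{k=7}^{10} \left[ \binom{10}{k} (1)^k (3)^{10-k} \right] = 3{,}240 + 405 + 30 + 1 = 3{,}676$

c. there are at most 7 correct answers

$\sum_{k=0}^{7} n(A_k) = \sum_{k=0}^{7} \left[ \binom{10}{k} (1)^k (3)^{10-k} \right] = 4^{10} - (405 + 30 + 1) = 1{,}048{,}140$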
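The counts in the two examples above can be verified numerically. The sketch below assumes the standard formulas for the number of ordered samples with exactly k defectives, namely C(n, k) K^k (M-K)^(n-k) with replacement and C(n, k) (K)_k (M-K)_(n-k) without replacement, and compares them with a brute-force listing for the urn example. The function names are hypothetical.

```python
from itertools import permutations, product
from math import comb, perm

def count_with_replacement(M, K, n, k):
    """Ordered samples of size n with exactly k defectives, with replacement."""
    return comb(n, k) * K**k * (M - K)**(n - k)

def count_without_replacement(M, K, n, k):
    """Ordered samples of size n with exactly k defectives, without replacement."""
    return comb(n, k) * perm(K, k) * perm(M - K, n - k)

# Urn example: 10 balls, balls 1 to 4 defective, samples of size 5, 2 defectives
M, K, n, k = 10, 4, 5, 2
balls = range(1, M + 1)

def defectives(sample):
    return sum(1 for ball in sample if ball <= K)

brute_with = sum(1 for s in product(balls, repeat=n) if defectives(s) == k)
brute_without = sum(1 for s in permutations(balls, n) if defectives(s) == k)

print(count_with_replacement(M, K, n, k), brute_with)        # 34560 34560
print(count_without_replacement(M, K, n, k), brute_without)  # 14400 14400

# Multiple-choice test: 4 answers per item (1 correct), 10 items
exactly_7 = count_with_replacement(4, 1, 10, 7)
at_least_7 = sum(count_with_replacement(4, 1, 10, j) for j in range(7, 11))
at_most_7 = sum(count_with_replacement(4, 1, 10, j) for j in range(0, 8))
print(exactly_7, at_least_7, at_most_7)   # 3240 3676 1048140
```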

Exercise: Suppose a group of 20 undergraduate students and 10 post graduate students are

available to fill certain student government posts. If 6 students are to be randomly selected

from this group,

1. How many possible ordered samples with replacement are there;

2. How many possible ordered samples without replacement are there;

3. How many possible ordered samples with replacement will contain exactly 3

undergraduate students;

4. How many possible ordered samples without replacement will contain exactly 3

undergraduate students?



Bonus Exercise:

1. How many different letter arrangements can be made from the letters of the word

M I S S I S S I P P I?

2. How many ways can 8 people be seated in a row if

a. there are no restrictions on the seating arrangement;

b. Persons A and B must sit next to each other;

c. there are 4 men and 4 women and no 2 men or 2 women can sit next to each other;


d. there are 5 men and they must sit next to each other;

3. A woman has 8 friends, of whom she will invite 5 to a party. How many choices does she

have if 2 of the friends are feuding and will not attend together? How many choices

does she have if 2 of her friends will only attend together?

4. How many ways can a man divide 7 gifts among his 3 children if the eldest is to receive 3

gifts and the others 2 each?

5. Suppose a precinct consists of 150 voters, 100 of whom are women and the remaining 50 are men. Suppose a sample of 25 voters will be selected in this precinct. How many

a. ordered samples with replacement are there?

b. ordered samples without replacement are there?

c. unordered samples without replacement are there?

d. ordered samples with replacement are there when 10 of the voters selected are men?

e. ordered samples without replacement are there when 10 of the voters selected are men?

f. ordered samples without replacement are there when at least 20 of the voters selected are men?



Theorem (Complementary Events): If A is an event, then $P(A^c) = 1 - P(A)$.

Thus, the probabilities of complementary events sum to 1: $P(A) + P(A^c) = 1$.

Theorem (Additive Rule): If A and B are any two events, then

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Corollary: If A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.

Corollary: If $A_1, A_2, A_3, \ldots, A_n$ are mutually exclusive, then

$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_n)$

Theorem: If A and B are any two events, then, using the previous two theorems,

$P(A \cap B^c) = P(A) - P(A \cap B)$

Example: A health worker is studying the prevalence of certain diseases in a particular

community. Based on previous studies, the health worker was able to come up with the

following figures: 10% of the people in the community will contract disease A sometime during

their lifetime; 25% will contract disease B; and 5% will contract both diseases. Find the

probability that a randomly selected person from this community will contract:

a) at least one of the 2 diseases

b) disease B but not disease A

c) exactly one of the 2 diseases


Let A = event that the selected person will contract disease A
B = event that the selected person will contract disease B

We can express the given percentages in terms of probabilities as follows:

$P(A) = 0.10$, $P(B) = 0.25$, $P(A \cap B) = 0.05$

a) at least one of the 2 diseases: $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.10 + 0.25 - 0.05 = 0.30$

b) disease B but not disease A: $P(B \cap A^c) = P(B) - P(A \cap B) = 0.25 - 0.05 = 0.20$

c) exactly one of the 2 diseases: $P[(A \cap B^c) \cup (B \cap A^c)] = P(A \cap B^c) + P(B \cap A^c) = [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)] = 0.05 + 0.20 = 0.25$
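The arithmetic in this example follows directly from the additive rule and the theorem on P(A and not B). A short sketch using exact fractions (the variable names are illustrative) is shown below.

```python
from fractions import Fraction

p_a = Fraction(10, 100)    # P(A): contracts disease A
p_b = Fraction(25, 100)    # P(B): contracts disease B
p_ab = Fraction(5, 100)    # P(A and B): contracts both

p_at_least_one = p_a + p_b - p_ab      # additive rule
p_b_not_a = p_b - p_ab                 # B but not A
p_a_not_b = p_a - p_ab                 # A but not B
p_exactly_one = p_a_not_b + p_b_not_a  # the two events are mutually exclusive

print(float(p_at_least_one), float(p_b_not_a), float(p_exactly_one))   # 0.3 0.2 0.25
```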

6.5 Probabilities of Events



Theorem: If A and B are any two events where P(B) > 0, then $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

There are times when we change our assignment of the probability of an event

whenever we have additional information concerning the occurrence of other events.

The original measure of the probability, obtained without using additional information concerning the occurrence of other events, is called an unconditional probability. The probability measure derived using information about other events that have already occurred is called a conditional probability.

To reiterate, the conditional probability P(A|B) is the probability of event A occurring when we already know that some event B has occurred.

Example: There are 100 insurance claims that are classified according to the type of policy and

whether the claim is fraudulent or not. If a claim is selected at random,

                    Type of Policy
Category          Fire   Auto   Other   Total
Fraudulent           6      1       3      10
Non-fraudulent      14     29      47      90
Total               20     30      50     100

Find the probability of:

a) selecting a fraudulent claim given that such claim is for a fire policy

b) selecting a fraudulent claim given that such claim is for a policy that is not about fire


Let A = event of selecting a fraudulent claim
B = event of selecting a fire policy

We can express the given data in terms of probabilities as follows:

$P(A) = 10/100 = 0.10$, $P(B) = 20/100 = 0.20$, $P(A \cap B) = 6/100 = 0.06$

$P(B^c) = 80/100 = 0.80$, $P(A \cap B^c) = 4/100 = 0.04$

a) $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.06}{0.20} = 0.30$

b) $P(A \mid B^c) = \frac{P(A \cap B^c)}{P(B^c)} = \frac{0.04}{0.80} = 0.05$
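The same conditional probabilities can be read off the table of counts. The sketch below stores the 100 claims as a dictionary and applies P(A|B) = P(A and B)/P(B); the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction

# Counts from the insurance-claims table: (category, policy type) -> number of claims
claims = {
    ("fraudulent", "fire"): 6, ("fraudulent", "auto"): 1, ("fraudulent", "other"): 3,
    ("non-fraudulent", "fire"): 14, ("non-fraudulent", "auto"): 29, ("non-fraudulent", "other"): 47,
}
total = sum(claims.values())

def prob(event):
    """P(event), where event is a predicate on (category, policy)."""
    return Fraction(sum(c for key, c in claims.items() if event(*key)), total)

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    joint = prob(lambda cat, pol: event(cat, pol) and given(cat, pol))
    return joint / prob(given)

def is_fraud(cat, pol):
    return cat == "fraudulent"

def is_fire(cat, pol):
    return pol == "fire"

def not_fire(cat, pol):
    return pol != "fire"

p_fraud_given_fire = cond_prob(is_fraud, is_fire)
p_fraud_given_not_fire = cond_prob(is_fraud, not_fire)
print(float(p_fraud_given_fire), float(p_fraud_given_not_fire))   # 0.3 0.05
```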

6.5.1 Conditional Probability




1. A movie critic feels that the probabilities that a certain movie will get an award for best

actress is 0.18, for best actor is 0.33, and at least one of these two awards is 0.40.

Suppose it was just announced that the movie won the best actor award, what is the

probability that it will win the best actress award?

2. The HR Department conducted a census to determine whether fear of flying is a major

problem in their company. The employees were first classified as flyers (flown at least

once), non-flyers likely to fly, or non-flyers not likely to fly. Then the employees were

asked whether they get anxious about flying. The results of the census were as follows:

                      Flight Experience
Anxiety Level       Flyers   Non-flyers Likely to Fly   Non-flyers Not Likely to Fly
No anxiety             750                        120                             95
A little anxious       175                         45                              5
Very anxious           120                         45                             80

Two events A and B are said to be independent events if and only if any one of the

following conditions is satisfied:

a) P(A|B) = P(A) if P(B) > 0; or

b) P(B|A) = P(B) if P(A) > 0; or

c) $P(A \cap B) = P(A)\,P(B)$

Otherwise, the events are dependent.
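Condition (c) is usually the easiest one to check numerically. As an illustration, the sketch below applies it to the insurance-claims table from the conditional probability example (A = fraudulent claim, B = fire policy). The function name `are_independent` and the second set of numbers are hypothetical and included only for contrast.

```python
from fractions import Fraction

def are_independent(p_a, p_b, p_a_and_b):
    """Condition (c): A and B are independent iff P(A and B) = P(A) P(B)."""
    return p_a_and_b == p_a * p_b

# Insurance claims: P(A) = 10/100, P(B) = 20/100, P(A and B) = 6/100
fraud_fire_independent = are_independent(
    Fraction(10, 100), Fraction(20, 100), Fraction(6, 100)
)
print(fraud_fire_independent)   # False, since 6/100 differs from (1/10)(1/5) = 2/100

# Hypothetical numbers for contrast: P(A) = P(B) = 1/2, P(A and B) = 1/4
print(are_independent(Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)))   # True
```

Exact fractions are used here so that the equality test is not affected by floating-point rounding.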


1) The probability that Renzo will correctly answer the toughest question in an exam is

1/4. The probability that Sandro will correctly answer the same question is 4/5.

Assuming that the two events are independent (would not cheat), find the probability

of the following events:

a. Event that both Renzo and Sandro will answer the question correctly

b. Event that only Sandro will answer the question correctly

6.5.2 Independent Events



A random variable is a function whose value is a real number determined by each element in the sample space.

2) The probability that a Japanese industry will locate in Cebu is 0.7. The probability that it

will locate in Bataan is 0.3, and the probability that it will locate in at least one of the

two provinces is 0.79. Define A = event that a Japanese Industry will locate in Cebu and

B = event that a Japanese Industry will locate in Bataan. Are A and B independent

events? Justify your answer.

3) Consider the experiment of tossing a fair die twice. Define the following events:

A = event of observing a sum of 11 dots in the two tosses

B = event of observing even number of dots in both tosses

C = event of observing 6 dots on the first toss

D = event of observing even number of dots on the first toss

Identify all pairs of independent events. Justify your claim by showing that one of the

conditions in the definition of independence is satisfied.


1. An uppercase letter is used to denote a random variable and its corresponding

lowercase letter is used to denote any one of its values.

2. To have an idea on the possible values that a random variable could take, it would be a

good practice to first understand the random experiment whose possible outcomes are

the elements of the sample space then define the random variable of interest. The

values that the random variable assumes would then depend on the individual sample

points in the said sample space.

7.1 Concept of a Random Variable



Discrete Sample Space - A sample space that contains a finite number of possibilities or an unending sequence with as many elements as there are whole numbers.

Discrete Random Variable - A random variable defined over a discrete sample space.

Discrete Probability Distribution - A table or formula listing all possible values of a discrete random variable along with the associated probabilities.


Consider a random experiment that requires a fair coin be tossed thrice. Then we denote X as the resulting number of Heads. Then we have:

The Sample Points and the Number of Heads in Three Tosses of a Fair Coin

Sample Point   X     Sample Point   X
HHH            3     HTT            1
HHT            2     THT            1
HTH            2     TTH            1
THH            2     TTT            0





Each possible value of the random variable X represents an event that is a subset of the sample space for a given experiment.

In the above experiment of tossing a coin thrice,

{X = 0} represents {TTT} = event of getting three tails (zero heads),

{X = 1} represents {HTT, THT, TTH} = event of getting one head in three coin tosses,

{X = 2} represents {HTH, HHT, THH} = event of getting two heads in three coin tosses,

{X = 3} represents {HHH} = event of getting three heads in three coin tosses.

Exercise: Owner-Umbrella Match

A totally colorblind baggage attendant returns, at random, three umbrellas of the same design but of different colors to the three customers who had previously checked them. If Jason, Charlie, and Michael, in that order, each receive one of the umbrellas, list the sample points for the possible orders of returning the umbrellas and find the values m of the random variable M that represents the number of correct matches.

7.2 Probability Distributions

7.2.1 Discrete Probability Distribution



The probability mass function (PMF) of a discrete random variable X, denoted by f(·), is a function defined for any real number x as

f(x) = P(X = x)

The values of the discrete random variable X for which f(x) > 0 are called its mass points.

***Note that the Discrete Probability Distribution and the Probability Mass Function are the same***

Example: Tossing a Fair Coin Thrice (Cont.)

The probability distribution table of X, the number of heads in three tosses of a fair coin, is given by:

x 0 1 2 3

P(X=x) 1/8 3/8 3/8 1/8

Note that since X is the number of heads in three tosses of a fair coin, then P(X=x) is the “probability that the number of heads in three tosses of a fair coin is x”. We then say that the probability that 2 heads resulted in three tosses of a fair coin is 3/8.
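The probability distribution table above can be built by listing the 8 equally likely sample points and counting the heads in each. The following is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely sample points, e.g. ('H', 'T', 'H')
sample_space = list(product("HT", repeat=3))

# X = number of heads in each sample point
heads_count = Counter(outcome.count("H") for outcome in sample_space)

# PMF: f(x) = P(X = x)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(heads_count.items())}

for x, p in pmf.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```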

What is the probability that there is only one head in three tosses of a fair coin?

What is the probability of getting more than one head in three tosses of a fair coin?

What is the probability of getting at least one head in three tosses of a fair coin?

How are these probabilities denoted?


1. Tossing A Fair Coin Thrice: Construct the probability distribution table for Y, which is the difference of the number of tails from the number of heads.

2. Owner-Umbrella Match: Construct the probability distribution table for M, the number of correct owner-umbrella matches

Remarks:

1. The probabilities associated with all possible values of a discrete random variable must sum to 1.

2. If X is a discrete random variable, then P(X ≥ x) may not be the same as P(X > x).

3. Since (X > x) and (X ≤ x) are complementary events, P(X > x) + P(X ≤ x) = 1.



Continuous Sample Space - A sample space that contains an infinite number of possibilities equal to the number of points on a line segment.

Continuous Random Variable - A random variable defined over a continuous sample space.


For the following experiments, is the sample space discrete or continuous?

1. Throw a coin until a head occurs.

2. Measure the distance that a certain mode of transportation will travel over a prescribed test course on 5 liters of gasoline.

3. Measure the length of time before a chemical reaction takes place.

4. Count the number of brown-eyed children (out of 2) born to a couple heterozygous for eye color.

5. Count the number of signals correctly identified on a radar screen by a German air traffic controller in a 30-minute time span in which 10 signals arrive.

Probability Density Function

The function with values f(x) is called a probability density function for the continuous

random variable X, if

a. the total area under its curve and above the horizontal axis is equal to 1; and

b. the area under the curve between any two ordinates x=a and x=b gives the

probability that X lies between a and b.

A continuous random variable has a probability of zero of assuming exactly any one of its values. Consequently, its probability distribution cannot be given in tabular form.

Consider a random variable whose values are the heights of all people over 21 years old. Between any 2 values, say 163.5 cm and 164.5 cm, there is an infinite number of heights, of which only 1 is exactly 164 cm. The probability of selecting a person at random who is exactly 164 cm tall, and not one of the infinitely large set of heights so close to 164 cm that you cannot humanly measure the difference, is so remote that we assign a probability of zero to this event. To get a usable probability, we need to consider intervals instead of specific points.

7.2.2 Continuous Probability Distribution



Example: Areas Under a Rectangle

A continuous random variable X that can assume values between 0 and 2 has a density function given by:

$f(x) = \begin{cases} 1/2, & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases}$

a. Check if the total area under the rectangle and above the horizontal axis is indeed 1, i.e. if P(0 < X < 2) = 1: (2)(0.5) = 1

b. P(X > 1.5) = P(1.5 < X < 2) + P(X ≥ 2) = (0.5)(0.5) + 0 = 0.25

c. P(X < 0.75) = P(X < 0) + P(0 ≤ X ≤ 0.75) = 0 + (0.75)(0.5) = 0.375

d. P(X = 0.75) = 0

e. P(X ≤ 0.75) = P(X < 0.75) = 0.375
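The rectangle areas above can be reproduced with a few lines of code. The sketch assumes the density is constant at 1/2 on the interval from 0 to 2 and zero elsewhere, as implied by the computations (0.5)(0.5) and (0.75)(0.5); the function name `prob_between` is hypothetical.

```python
HEIGHT = 0.5            # value of the density on (0, 2)
LOWER, UPPER = 0.0, 2.0

def prob_between(a, b):
    """P(a < X < b): area of the rectangle over the part of (a, b) inside (0, 2)."""
    lo = max(a, LOWER)
    hi = min(b, UPPER)
    width = max(hi - lo, 0.0)
    return width * HEIGHT

inf = float("inf")
print(prob_between(0, 2))        # 1.0
print(prob_between(1.5, inf))    # 0.25
print(prob_between(-inf, 0.75))  # 0.375
print(prob_between(0.75, 0.75))  # 0.0, a single point has zero width
```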


Let X be a continuous random variable and let a and b be real numbers. Then, the following equalities hold:

1. P(a ≤ X ≤ b) = P(X = a) + P(a < X ≤ b) = P(a < X ≤ b)

2. P(a ≤ X ≤ b) = P(a ≤ X < b) + P(X = b) = P(a ≤ X < b)

3. P(a ≤ X ≤ b) = P(X = a) + P(a < X < b) + P(X = b) = P(a < X < b)

Hence, P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)

Exercises: Areas Under a Rectangle

A continuous random variable X that can assume values between 2 and 4 has a density function given by:

( )

a. Show that P(2 < X < 4) = 1

b. Find P(X < 3.5)

c. Find P(2.5 < X < 3.5)

(Hint: Area of a trapezoid = (sum of the parallel sides)(height) ⁄ 2)



Associated with any random variable are constants, or parameters, that describe the behavior of the random variable. Knowledge of the numerical values of these parameters gives the researcher quick insight into the nature of the variable. These numerical values can be computed using expected values.

1. Expected Value (Mean)

The expected value, $\mu$, of a probability distribution is the long-run theoretical average value of the random variable. It is the value one would expect the random variable to take, on average, upon repeated performance of the random experiment.

In the discrete case, $\mu = E(X) = \sum_x x\,f(x)$

In the continuous case, $\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$

2. Variance

The variance, $\sigma^2$, of the probability distribution is the expected value of the squared differences between the values that a random variable can take and its mean. It measures the extent of variability, about the mean, of the values that the random variable assumes.

In the discrete case, $\sigma^2 = E(X - \mu)^2 = \sum_x (x - \mu)^2 f(x)$

In the continuous case, $\sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$

3. Standard Deviation

The standard deviation, $\sigma$, is the non-negative square root of the variance.

Example: Tossing a Fair Coin Thrice (Cont.)

The probability distribution table of X, the number of heads in three tosses of a fair coin, is given by:

x 0 1 2 3

P(X=x) 1/8 3/8 3/8 1/8

The expected value of X is $\mu = E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 12/8 = 1.5$

E(X) = 1.5 implies that, on average, the number of heads we will observe in three tosses of a coin is 1.5. This makes sense since, if the coin is fair, we would expect half of all the tosses to result in heads and the other half to result in tails.

The variance is $\sigma^2 = E(X - \mu)^2 = \sum_x (x - \mu)^2 f(x) = (0 - 1.5)^2(1/8) + (1 - 1.5)^2(3/8) + (2 - 1.5)^2(3/8) + (3 - 1.5)^2(1/8) = 0.75$
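The discrete-case formulas for the mean and variance translate directly into sums over the probability distribution table. Below is a minimal sketch for the coin-tossing example, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# PMF of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X) = sum of x * f(x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - mean)^2 * f(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation = non-negative square root of the variance
std_dev = float(variance) ** 0.5

print(mean, variance)   # 3/2 3/4
```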

Exercise: Compute the expected value and the variance of Y (the difference between the number of heads and the number of tails).

7.3 Expected Values



A random experiment whose outcomes have been classified into two categories, labeled as "success" and "failure", is called a Bernoulli trial.

Mathematicians have developed many stochastic models throughout the centuries. Several of these models are important in both theory and application and are used repeatedly in practice. In this section, we present some of these models, together with their means and variances.

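With this experiment, the Bernoulli distribution has only 2 mass points: 0 and 1. We can equivalently define the Bernoulli random variable as:

$X = \begin{cases} 1, & \text{if the outcome is a “success”} \\ 0, & \text{if the outcome is a “failure”} \end{cases}$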

Based on its definition, the Bernoulli random variable is an example of a discrete random variable. Since it is a discrete random variable, we can determine its probabilities using the probability mass function (PMF) given by

$f(x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$

where p = P(observing a “success”) and 0 ≤ p ≤ 1.

If X follows a Bernoulli distribution, then we write X~Be(p). The Bernoulli distribution has only 1 parameter, p. If X~Be(p), then E(X) = p and Var(X) = p(1-p).

The Binomial experiment is a random experiment consisting of n Bernoulli trials. These trials must be independent of each other and identical, meaning that the value of p must be the same for each one of the Bernoulli trials.

7.4 Common Distributions

7.4.1 Bernoulli Distribution

7.4.2 Binomial Distribution



To reiterate, a binomial experiment should satisfy the following:

a) It consists of observing the outcomes of a sequence of n trials.

b) Each trial can result in one of only 2 possible outcomes which we can label as “success”

or “failure”.

c) The probability of success, p, must be the same for each one of the n trials.

d) The trials are independent in the sense that the probability of success at a particular

trial should not be affected by the outcomes of the previous trials.

Simple random sampling with replacement, used to select a sample for studies that aim to estimate a proportion, is an example of a binomial experiment.

Since the Binomial random variable is just an extension of the Bernoulli random variable, it is also an example of a discrete random variable. Since it is a discrete random variable, we can also determine its probabilities using the probability mass function (PMF) given by

$f(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$

where n is a positive integer and p is any real number between 0 and 1.

If X follows a Binomial distribution, then we write X~Bi(n, p). The Binomial distribution has 2 parameters, n and p. If X~Bi(n, p), then E(X) = np and Var(X) = np(1-p).

Example: A multiple-choice quiz has 15 questions, each with 4 possible answers of which only 1 is correct. Suppose a student has been absent for the past meetings and has no idea what the quiz is all about. The student simply uses a randomization mechanism in answering each item.

a. What is the probability that the student will get a perfect score?

b. What is the probability that the student will get at least 3 correct answers?

c. What is the student’s expected number of correct answers?


Define X = the number of correct answers out of the 15 items. This scenario could be

treated like a binomial experiment where there are n=15 trials and the probability of getting

the correct answer for each trial p= 1/4= 0.25.

Thus, X~Bi(n = 15, p = 0.25) and its PMF is;

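$f(x) = \begin{cases} \binom{15}{x} (0.25)^x (0.75)^{15-x}, & x = 0, 1, \ldots, 15 \\ 0, & \text{otherwise} \end{cases}$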





a. P(X = 15) is the probability that the student will get a perfect score.

Using the given PMF,

$P(X = 15) = f(15) = \binom{15}{15} (0.25)^{15} (0.75)^{0} = (0.25)^{15} \approx 9.31 \times 10^{-10}$

As expected, there is a very small chance of getting a perfect score if a student is simply guessing the answers.

b. P(X ≥ 3) is the probability that the student will get at least 3 correct answers.

$P(X \ge 3) = 1 - P(X < 3) = 1 - [f(0) + f(1) + f(2)] \approx 1 - 0.2361 = 0.7639$

c. The student’s expected number of correct answers is E(X) = np = (15)(0.25) = 3.75 ≈ 4.
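The quiz example can be reproduced by coding the binomial PMF directly. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n; 0 otherwise."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.25

# a. Probability of a perfect score
p_perfect = binomial_pmf(15, n, p)

# b. Probability of at least 3 correct answers, via the complement
p_at_least_3 = 1 - sum(binomial_pmf(x, n, p) for x in range(3))

# c. Expected number of correct answers, and the variance
expected = n * p
variance = n * p * (1 - p)

print(p_perfect)                 # about 9.31e-10
print(round(p_at_least_3, 4))    # 0.7639
print(expected, variance)        # 3.75 2.8125
```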


1. Rey is a fairly good basketball player. The chance that he makes a shot from the free throw line is as high as 0.8. If he is given 3 free throws (assuming his shots are independent), what is the probability that he will make at least 2 of them?


2. A soda company wishes to compare the taste appeal of a new formula (formula A) with

their original formula (formula B). The company got 10 people to judge the 2 formulas.

Each judge is given 3 glasses in random order, two containing formula A and the other

one containing formula B. Each judge tastes all 3 and states which glass he enjoyed the

most. Suppose there is actually no distinguishable difference between the tastes of

formulas A and B.

a. Find the probability that at least 8 of the people state a preference for formula A.

b. What is the expected number of people out of the 10 who will state a preference

for formula A? What is the variance?

As discussed in the video, the normal distribution is often considered the most important distribution, since it describes many natural phenomena. This is also why it is called the normal distribution: it is often the assumed distribution of an experiment carried out under normal conditions. This is the distribution we will use most in the discussion of inferential statistics, particularly the standard normal distribution, which is discussed later on.

7.4.3 The Normal Distribution



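A continuous random variable X is said to be normally distributed if its density function is given by:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\}$, for $-\infty < x < \infty$

Notation: If X follows the above distribution, we write $X \sim N(\mu, \sigma^2)$.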


The curve below was generated for X ~ N(64.4, 2.4).

[Figure: normal curve for X ~ N(64.4, 2.4)]




Recall:

1. How is a probability density function defined?

2. If X is a continuous random variable, how do we evaluate P(a < X < b) for ordinates a and b?

3. What is the area under the curve generated by the density function f(x) and above the horizontal axis?

Remarks:

1. If $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $Var(X) = \sigma^2$.

2. The graph of the normal distribution is called the normal curve.

3. The curve is bell-shaped and symmetric about a vertical axis through the mean $\mu$.

4. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.

5. The total area under the curve and above the horizontal axis is equal to 1.

6. The mathematical equation for the probability distribution of the normal distribution depends upon two parameters, $\mu$ and $\sigma$, its mean and standard deviation. Once $\mu$ and $\sigma$ are specified, the normal curve is completely determined.

Below are the graphs of some normally distributed random variables with varying mean and standard deviation. Try to identify the pattern of the graph by comparing each distribution.



Since X is a continuous random variable, then for any real numbers a and b such that a < b, we can get P(a < X < b) by getting the area of the shaded region below. This is true for whatever value of $\mu$ and $\sigma^2$.

We saw how the normal curve depends upon the mean and standard deviation of the distribution under investigation. The area under the curve between any two ordinates must then also depend upon the values of $\mu$ and $\sigma$.

The problem with this computation is that if there are two normally distributed random variables, say $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, with $\mu_1 \ne \mu_2$ and $\sigma_1^2 \ne \sigma_2^2$, then in general $P(a < X_1 < b) \ne P(a < X_2 < b)$. To be able to compare these two distributions, we can standardize them by transforming all of the observations of any normal random variable X to a new set of observations of a normal random variable with mean 0 and variance 1.

The distribution of a normal random variable with mean 0 and a standard deviation equal to 1 is called a standard normal distribution.

Notation: If Z follows the above distribution, we write Z ~ N(0, 1).


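If $X \sim N(\mu, \sigma^2)$, then X can be transformed into a standard normal random variable through the following transformation:

$Z = \frac{X - \mu}{\sigma}$

Hence, whenever X is between the values $x_1$ and $x_2$, the random variable Z falls between $z_1 = \frac{x_1 - \mu}{\sigma}$ and $z_2 = \frac{x_2 - \mu}{\sigma}$. Thus, $P(x_1 < X < x_2) = P(z_1 < Z < z_2)$.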

7.4.4 The Standard Normal Distribution



Now that we have learned a way to standardize a normally distributed random variable, we can simply look up the table of areas for the standard normal distribution to get the area (or probability) that we desire, without using integration or other calculus-related concepts.


1. Area Under the Standard Normal Curve: Suppose Z ~ N(0, 1)

a. Find the probability α1 such that P(Z≤ ) = α1

b. Find the probability α2 such that P(Z ) = α2

c. Find the probability α3 such that P(Z ) = α3

d. Find the probability α4 such that P(Z ) = α4

e. Find the probability α5 such that P( ≤ ) = α5

f. Find the probability α6 such that P( ≤ ) = α6

g. Find the standard score z1 such that P(z< z1) = 0.50

h. Find the standard score z2 such that P(z> z2) = 0.1788

i. Find the standard score z3 such that P(-z3 < z< z3) = 0.984

2. Area under a Normal curve: One of the major contributors to air pollution is hydrocarbons emitted from the exhaust system of automobiles. The number of hydrocarbons emitted by an automobile per mile is normally distributed with a mean of 1 gram and a standard deviation of 0.25 gram. What is the probability that a randomly selected automobile will emit between 0.9 and 1.54 grams of hydrocarbon per mile?


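Let X = number of grams of hydrocarbons emitted by an automobile per mile, where $X \sim N(1, 0.25^2)$

$P(0.9 \le X \le 1.54) = P\left( \frac{0.9 - 1}{0.25} \le \frac{X - \mu}{\sigma} \le \frac{1.54 - 1}{0.25} \right) = P(-0.40 \le Z \le 2.16)$

$= P(Z \le 2.16) - P(Z \le -0.40) = 0.9846 - 0.3446 = 0.6400$

Thus, 64% of the automobiles in operation emit between 0.90 and 1.54 grams of hydrocarbons per mile driven.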

3. Given the normally distributed random variable X with mean 18 and standard deviation 2.5, find the value of k such that P(x < k) = 0.2578


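$P(X \le k) = P\left( \frac{X - 18}{2.5} \le \frac{k - 18}{2.5} \right) = P\left( Z \le \frac{k - 18}{2.5} \right) = 0.2578$,

but P(Z < -0.65) = 0.2578.

Hence $\frac{k - 18}{2.5} = -0.65$; solving for k gives us k = 16.375.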
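The two worked examples above (the hydrocarbon emissions and the search for k) can be checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later assumed); small differences from the table-based answers come from rounding z to two decimal places.

```python
from statistics import NormalDist

# Standardizing by hand, then using the standard normal Z ~ N(0, 1)
Z = NormalDist(mu=0, sigma=1)
z1 = (0.9 - 1) / 0.25
z2 = (1.54 - 1) / 0.25
p_emission = Z.cdf(z2) - Z.cdf(z1)

# The same probability computed directly from X ~ N(1, 0.25^2)
X = NormalDist(mu=1, sigma=0.25)
p_emission_direct = X.cdf(1.54) - X.cdf(0.9)

# Finding k such that P(X < k) = 0.2578 for X with mean 18 and standard deviation 2.5
z_k = Z.inv_cdf(0.2578)   # close to -0.65
k = 18 + 2.5 * z_k        # close to 16.375

print(round(p_emission, 2), round(p_emission_direct, 2))   # 0.64 0.64
print(round(z_k, 2), round(k, 2))                          # -0.65 16.37
```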




1. The achievement scores for a college entrance examination are normally distributed with mean 75 and standard deviation equal to 10. What fraction of the scores would one expect to lie between 70 and 90?

2. A soda machine is regulated so that it dispenses an average of 200mL per cup. If the amount of drink dispensed is normally distributed with a standard deviation equal to 15mL,

a. What fraction of the cups will contain more than 224 mL?

b. What is the probability that a cup contains between 191 mL and 209 mL?

c. How many cups will likely overflow if 230 mL cups are used for the next 1000 drinks?

d. Below what value do we get the smallest 25% of the drinks?

Knowledge of sampling distributions of statistics is important for understanding the statistical methods under statistical inference. Under statistical inference, the interest lies in being able to say something about a parameter with an unknown value. Recall that we can only determine the true value of a parameter after getting pertinent information from ALL units in the population. But sometimes we resort to getting just partial information from the population by taking a random sample of units from which measurements will be obtained. This partial information will then be the basis of conclusions regarding the unknown parameter.

Consider the experiment of getting a random sample from a population with unknown mean.



1. Will the mean of the sample measurements you obtained be exactly equal to the

unknown population mean?

2. If different samples of the same size are obtained from the same population, will the

sample means of measurements from the different samples be the same?




Consider the random experiment of getting a random sample of size two with replacement from a population of units with measurements {2, 3, 4}. (Note that the population mean of measurements is $\mu = 3$ while the population variance is $\sigma^2 = 2/3$.)

The sample points in the sample space are all the possible random samples of size 2 taken with replacement from the population with measurements {2, 3, 4}.

It is apparent that the value of the sample mean varies across the different possible samples of size 2. Variability in the values of the sample mean is visibly present. It can be further observed that not all the sample means are equal to the population mean of 3. We can then say that the mean of the random sample is not necessarily equal to the population mean.

On the average, what is the value of the sample mean? What is the variance of the sample mean?

Note that $E(\bar{X})$, the average of the sample means, is 3, which is equal to $\mu = 3$. Also, $\mathrm{Var}(\bar{X})$, the variance of the sample mean, is $1/3 = (2/3)/2 = \sigma^2/n$, where n is the sample size.
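The values of $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$ quoted here can be checked by listing every possible sample. The sketch below assumes the nine ordered samples of size 2 drawn with replacement are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [2, 3, 4]
n = 2

# All 3**2 = 9 ordered samples drawn with replacement, each equally likely
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Mean and variance of the sample mean over the whole sample space
expected_mean = sum(means) / len(means)
variance_of_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

print(expected_mean)     # 3
print(variance_of_mean)  # 1/3
```

Exact fractions are used so the result can be compared directly with $\sigma^2/n = (2/3)/2$.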

Another example of a sampling distribution:

An organization consists of 6 qualified voters: A1, A2, A3, A4, A5, and A6. Renzo and Sandro are 2 candidates vying for the same position. A1, A2, A3, and A4 have already decided to vote for Renzo while A5 and A6 will vote for Sandro. Suppose we select a sample of size 2 using SRSWOR. Construct the sampling distribution of $\bar{X}$,

where $X_i = 1$ if the ith voter selected will vote for Renzo and $X_i = 0$ otherwise, for i = 1, 2.

(Note that $\bar{X}$ can also be viewed in this example as a sample proportion.)

Since n = 2, there will be $\binom{6}{2} = 15$ possible samples.



Sample      {X1, X2}   $\bar{X}$      Sample      {X1, X2}   $\bar{X}$

{A1, A2}    {1, 1}     1              {A2, A6}    {1, 0}     ½

{A1, A3}    {1, 1}     1              {A3, A4}    {1, 1}     1

{A1, A4}    {1, 1}     1              {A3, A5}    {1, 0}     ½

{A1, A5}    {1, 0}     ½              {A3, A6}    {1, 0}     ½

{A1, A6}    {1, 0}     ½              {A4, A5}    {1, 0}     ½

{A2, A3}    {1, 1}     1              {A4, A6}    {1, 0}     ½

{A2, A4}    {1, 1}     1              {A5, A6}    {0, 0}     0

{A2, A5}    {1, 0}     ½

All of these 15 possible samples have the same chance of selection because we select the sample using SRSWOR. We then use the classical definition of probability to construct the sampling distribution of $\bar{X}$.

Sampling Distribution of $\bar{X}$

$\bar{x}$       0       1/2     1

$f(\bar{x})$    1/15    8/15    6/15
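The sampling distribution in the voter example can be reproduced by enumerating the 15 samples. This is a sketch only: the dictionary `votes` encodes the stated preferences (1 for Renzo, 0 for Sandro) and is an illustrative construction, not notation from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# 1 = will vote for Renzo (A1..A4), 0 = will vote for Sandro (A5, A6)
votes = {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "A5": 0, "A6": 0}

# SRSWOR with n = 2: every unordered pair of distinct voters is equally likely
samples = list(combinations(votes, 2))

# Count how many samples produce each value of the sample mean
counts = Counter(Fraction(votes[a] + votes[b], 2) for a, b in samples)

for mean, count in sorted(counts.items()):
    print(mean, f"{count}/{len(samples)}")
# 0 1/15
# 1/2 8/15
# 1 6/15
```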


Consider the random experiment of taking a random sample of size n = 2 without replacement from a population with measurements {2, 3, 4}.

1. What are the sample points in the sample space?

2. What are the values that $\bar{X}$ takes on?

3. Verify that $E(\bar{X}) = \mu = 3$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = 1/6$.


1. In the experiment of taking a random sample from a population, it can be seen that the

sample mean is a function taking on numerical values which depend on the random

sample taken. And the different possible random samples are the sample points in the

sample space. The sample mean is then a random variable. In fact, every statistic is a

random variable.



2. As a random variable, a statistic has a probability distribution. The probability

distribution of a statistic is called its sampling distribution.

3. Consider the random experiment of taking a random sample of size n with replacement from a population of size N. Then, $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

4. Consider the random experiment of taking a random sample of size n without replacement from a population of size N. Then, $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$. The factor $\frac{N-n}{N-1}$ is called the finite population correction factor.

5. $\sqrt{\mathrm{Var}(\bar{X})}$ is the standard deviation of the sampling distribution of the sample mean $\bar{X}$. In general, the standard deviation of the sampling distribution of a statistic is called the standard error of the statistic.

6. If selection of a random sample is done without replacement from a large population, it is as if the selection is done with replacement, since as N becomes large, $\frac{N-n}{N-1} \to 1$.
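The finite population correction in remark 4 can be illustrated on the small population {2, 3, 4}. The sketch below assumes a sample of size n = 2 drawn without replacement, with all unordered samples equally likely; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 4]
N, n = len(population), 2

# Population mean and variance (dividing by N)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Every unordered sample of size n drawn without replacement
means = [Fraction(sum(s), n) for s in combinations(population, n)]
expected_mean = sum(means) / len(means)
var_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

# sigma^2 / n multiplied by the finite population correction factor
fpc = Fraction(N - n, N - 1)
print(expected_mean, var_mean, sigma2 / n * fpc)  # 3 1/6 1/6
```

The variance obtained by enumeration matches $\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$, which is smaller than the with-replacement value $\sigma^2/n = 1/3$.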

Suppose that a random sample $(X_1, \ldots, X_n)$ of size n is taken from a population of normally distributed measurements with population mean $E(X) = \mu$ and variance $\mathrm{Var}(X) = \sigma^2$. Then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

If the sample size n is sufficiently large, then approximately $\bar{X} \sim N(\mu, \sigma^2/n)$. Hence, as $n \to \infty$, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.

1. The Central Limit Theorem tells us that if a sufficiently large random sample is taken from a large or infinite population with mean $\mu$ and variance $\sigma^2$, then regardless of the distribution of the measurements in the population, the sample mean $\bar{X}$ will approximately follow a Normal distribution with mean $\mu$ and variance $\sigma^2/n$.

2. Since approximately $\bar{X} \sim N(\mu, \sigma^2/n)$, and every Normally distributed random variable can be transformed to a standard normal random variable, we can use the Table of Areas under the Standard Normal Curve to get the probabilities instead of performing integration.

3. When the sample size is not large, the normal approximation in the theorem will be good only if the population is not too different from the normal distribution.

8.1 The Central Limit Theorem (CLT)




1. A hardware store receives a consignment of bolts whose diameter has a normal

distribution with a mean diameter of 1.2 inches and a standard deviation of 0.02 inch.

The consignment will be considered substandard and returned if the mean diameter of

a sample of 120 bolts is less than 1.197 inches or greater than 1.203 inches. Find the

probability that the consignment will not be returned.


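Let X = diameter of a bolt in a consignment received by a hardware store.

Then $X \sim N(1.2, 0.02^2)$ and, with n = 120, $\bar{X} \sim N(1.2, 0.02^2/120)$.

$$P(\text{consignment will not be returned}) = P(1.197 \le \bar{X} \le 1.203)$$

$$= P\left(\frac{1.197 - 1.2}{0.02/\sqrt{120}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{1.203 - 1.2}{0.02/\sqrt{120}}\right) = P(-1.64 \le Z \le 1.64) = 1 - 2P(Z < -1.64)$$

$$= 1 - 2(0.0505) = 1 - 0.1010 = 0.899$$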
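The bolt computation can be checked numerically. The sketch below assumes the sampling distribution $\bar{X} \sim N(1.2, 0.02^2/120)$ stated in the solution and uses Python's standard-library `NormalDist`; it does not round z to two decimals, so the result differs slightly from the table-based 0.899.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.2, 0.02, 120

# Sampling distribution of the sample mean: standard error = sigma / sqrt(n)
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# The consignment is kept when 1.197 <= sample mean <= 1.203
p_not_returned = xbar_dist.cdf(1.203) - xbar_dist.cdf(1.197)
print(round(p_not_returned, 2))  # 0.9
```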

2. A forester for the DENR studying the effects of fertilization on certain species of trees in the South is interested in estimating the average basal area of three trees. In studying basal areas of similar trees for many years, he knows that these measurements have a standard deviation of approximately 4 square inches. If the forester samples n = 30 trees, find the probability that the sample mean will be within 2 square inches of the population mean.


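Let A = basal area (in square inches) of a tree of a certain species.

With n = 30, approximately $\bar{A} \sim N(\mu, 4^2/30)$.

$$P(|\bar{A} - \mu| \le 2) = P(-2 \le \bar{A} - \mu \le 2) = P\left(\frac{-2}{4/\sqrt{30}} \le Z \le \frac{2}{4/\sqrt{30}}\right)$$

$$= P(-2.74 \le Z \le 2.74) = P(Z \le 2.74) - P(Z \le -2.74) = 0.9969 - 0.0031 = 0.9938$$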

One of the statistics of interest in many settings is the sample proportion. A number

of random variables are dichotomous (yields only one of the two categories as response).

The two outcomes of a dichotomous variable may be generically labeled as

“success” and “failure”. The outcome of interest is usually referred to as success.

8.2 Sampling Distribution of Proportion



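Thus, with $X_i = 1$ if the ith unit is a success and $X_i = 0$ otherwise, we have the sample proportion, $\hat{P} = \frac{\sum_{i=1}^{n} X_i}{n}$, which is the proportion of successes in n units, and the population proportion, $P = \frac{\sum_{i=1}^{N} X_i}{N}$, which is the proportion of successes in the population. Then $E(\hat{P}) = P$ and $\mathrm{Var}(\hat{P}) = \frac{P(1-P)}{n}$.

And based on the Central Limit Theorem, if n is quite large and the population is large or infinite, then approximately:

$$\hat{P} \sim N\left(P, \frac{P(1-P)}{n}\right) \quad \text{and} \quad Z = \frac{\hat{P} - P}{\sqrt{P(1-P)/n}} \sim N(0, 1)$$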


Mr. Reyes believes that he can win a city election if, in a survey he wishes to conduct prior to the election, at least 55% of the randomly selected voters in the city are in favor of him. He also believes that about 50% of the city's voters favor him. If 100 voters were randomly selected and asked their preference, what is the probability that Mr. Reyes receives at least 55% of the votes in the sample?

Let $X_i = 1$ if the ith voter sampled favors Mr. Reyes and $X_i = 0$ otherwise, for i = 1, 2, …, 100.

The outcome of interest is "favoring Mr. Reyes" and a voter in the city is considered a "trial". The voters were given the same options and were polled in a similar manner. It is assumed that the preference of one voter does not affect another voter's choice. We must approximate $P(\hat{P} \ge 0.55)$, where $\hat{P} = \frac{\sum X_i}{n}$ and P = 0.50. Now, $E(\hat{P}) = P = 0.50$ and $\mathrm{Var}(\hat{P}) = \frac{P(1-P)}{n} = \frac{0.5(0.5)}{100} = 0.0025$. The sample size, 100, is sufficiently large to satisfy the conditions of the Central Limit Theorem. Thus, approximately $\hat{P} \sim N(0.5, 0.0025)$.

Hence,

$$P(\hat{P} \ge 0.55) = P\left(\frac{\hat{P} - P}{\sqrt{P(1-P)/n}} \ge \frac{0.55 - 0.50}{\sqrt{0.0025}}\right) = P(Z \ge 1) = 1 - P(Z \le 1) = 1 - 0.8413 = 0.1587$$
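The normal approximation for the sample proportion in the Mr. Reyes example can be sketched as follows. The code assumes P = 0.50 and n = 100 as given in the problem and uses the standard-library `NormalDist`; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.50, 100

# Standard error of the sample proportion: sqrt(P(1 - P)/n) = 0.05
standard_error = sqrt(P * (1 - P) / n)

# Approximate sampling distribution of the sample proportion
p_hat_dist = NormalDist(P, standard_error)

# Probability that at least 55% of the sampled voters favor Mr. Reyes
p_at_least_55 = 1 - p_hat_dist.cdf(0.55)
print(round(p_at_least_55, 4))  # 0.1587
```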



Most of the time we are not fortunate enough to know the variance of the population from which we select our random samples. For samples of size n ≥ 30, a good estimate of $\sigma^2$ is provided by calculating $S^2$. What then happens to the distribution of z-values in the CLT if we replace $\sigma^2$ by $S^2$?

As long as $S^2$ is a good estimate of $\sigma^2$ and does not vary much from sample to sample, which is usually the case for n ≥ 30, the values $(\bar{x} - \mu)/(S/\sqrt{n})$ are still approximately standard normal and the CLT is still valid.

If the sample size is small (n < 30), the values of $S^2$ fluctuate considerably from sample to sample and the distribution of the values $(\bar{x} - \mu)/(S/\sqrt{n})$ is no longer a standard normal distribution. We are now dealing with the distribution of a statistic that we shall call T, whose values are given by $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.

If $\bar{X}$ and $S^2$ are the mean and variance, respectively, of a random sample of size n taken from a population which is normally distributed with mean $\mu$ and variance $\sigma^2$, then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

is a random variable having the t-distribution with v = n-1 degrees of freedom.

Comparison between the t-distribution and the Standard Normal Distribution

1. Both are symmetric about zero.

2. Both are bell-shaped but the t-distribution is more variable.

i. t-values depend on the fluctuation of two quantities: $\bar{X}$ and $S^2$.

ii. z-values depend only on the changes in $\bar{X}$ from sample to sample.

3. When the sample size is large, i.e. n ≥ 30, the t-distribution can be well

approximated by the standard normal distribution.

Areas Under the Curve


$T \sim t_v$, where v = n-1 and n is the sample size.

$t_\alpha$ is the t-score leaving an area of α in the right tail of the t-distribution.

That is, if $T \sim t_v$, then $P(T > t_\alpha) = \alpha$. By symmetry of the t-distribution about 0, $t_{1-\alpha} = -t_\alpha$.


1. Find t0.99 when v=10 on the table of areas under the t-distribution.

2. Find k such that P(k<T<2.807) = 0.945 when T~t23.

8.3 The Student's t-distribution



• Estimation and Hypothesis Testing

9.1 Estimation for Single Population


Concepts Related to Estimation

Estimator – an estimator is any statistic whose value is used to estimate an unknown parameter.

Estimate – a realized value of an estimator

Unbiased Estimator – An estimator is said to be unbiased if the average of the estimates it produces under repeated sampling is equal to the true value of the parameter being estimated.

Interval Estimator – An interval estimator of a population parameter is a rule that tells us how to calculate two numbers based on sample data, forming an interval within which the parameter is expected to lie.

Interval Estimate or Confidence Interval – An interval estimate is a realized interval of values of an interval estimator. The endpoints of a confidence interval are the lower and upper confidence limits.

Confidence Coefficient – The confidence coefficient is the probability that the interval estimator encloses the true value of the parameter.


1. The sample mean, $\bar{X}$, is an estimator of the population mean, µ.

Consider the following problem:

An electrical firm manufactures light bulbs that have a length of life that is

normally distributed, with a standard deviation of 40 hours. We are interested in

estimating the mean length of life of all light bulbs produced by this firm. A random

sample of 25 bulbs has a mean life of 780 hours.



In the above example, 780 hours is the point estimate of the true mean length of

life of all light bulbs produced by this firm.

2. Let $X_i = 1$ if the ith trial is a success and $X_i = 0$ otherwise. Recall that P is the probability of success. An estimator of P is the sample proportion, $\hat{P} = \frac{\sum X_i}{n}$, the proportion of successes in the sample.

Suppose the Math department is interested in estimating the true proportion of all students who pass Math 17 on the first take. In a random sample of 200 students who enrolled in Math 17, 138 passed on their first take.

The sample proportion 138/200 of students who passed Math 17 on their first take is the point estimate of the true proportion of all students who pass Math 17 on the first take.



1. An estimator is not expected to estimate the population parameter without error. We

do not expect to estimate µ exactly, but we certainly hope that it is not too far off.

2. An estimator, being a statistic (and hence a random variable) has variability. Note that

point estimates may vary for different possible samples. Providing the point estimate

for a sample will not reflect the extent of the variability of the estimates that may be

obtained in estimating the parameter of interest. In other words, we do not have a

gauge of how near or how far the estimates are from the parameter. We then cannot

assess the precision of our estimation results. Interval estimation, however, takes into

account this extent of variability among the estimates. Hence, we can have an idea of

the proximity of our estimate from the true value.

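Derivation of the Interval Estimator for the Mean, $\mu$ (when $\sigma$ is known):

$$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(-\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

Hence the probability that $\mu$ is enclosed in $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ is 1-α.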



Interpretation of (1-α)100% Confidence Interval:

If we take repeated samples of size n and if for each of these samples, we compute the (1-

α)100% confidence interval, then (1-α)100% of the resulting confidence intervals will

contain the unknown value of the parameter.

A good confidence interval is one that is as narrow as possible and has a large

confidence coefficient, near 1. The narrower the interval, the more exactly we have located the

parameter; whereas the larger the confidence coefficient, the more confidence we have that a

particular interval encloses the true value of the parameter. However, for a fixed sample size,

as the confidence coefficient increases, the length of the interval also increases.

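Recall the Central Limit Theorem and the t-distribution and the remarks regarding their use. A (1-α)100% confidence interval for $\mu$ is given by:

a. $\sigma$ known: $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$

b. $\sigma$ unknown, n < 30: $\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$, where $t_{\alpha/2}$ is the t-value with v = n-1 degrees of freedom.

c. $\sigma$ unknown, n ≥ 30: $\left(\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$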


1. The above formulas hold strictly for random samples from a normal distribution.

However, they provide good approximate (1-α)100% confidence intervals when the

distribution is not normal provided the sample size is large.

2. The (1-α)100% confidence interval estimator for $\mu$ when $\sigma$ is known has already been derived above.

9.1.1 Estimating the Population Mean




An electrical firm manufactures light bulbs that have length of life that is normally

distributed, with a standard deviation of 40 hours. If a random sample of 25 bulbs has a mean

life of 780 hours, find a 95% confidence interval for the population mean of all bulbs produced

by this firm.


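Let X = length of life of a light bulb (in hours) manufactured by an electrical firm: $X \sim N(\mu, 40^2)$.

Why is there no value of $\mu$ in $N(\mu, 40^2)$?

Given: $\bar{x} = 780$, n = 25, 1-α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = z_{0.025} = 1.96$

A (1-α)100% confidence interval estimate for $\mu$ is given by $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$.

A 95% confidence interval for the true mean length of life of a light bulb (in hours) manufactured by a certain electrical firm is

$$\left(780 - 1.96\frac{(40)}{\sqrt{25}},\ 780 + 1.96\frac{(40)}{\sqrt{25}}\right) = (764.32, 795.68)$$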

Based on sample results, we are 95% confident that the light bulbs manufactured by a

certain electrical firm last, on the average, at least 764.32 hours and at most 795.68 hours.
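The interval in the light bulb example follows formula (a), where $\sigma$ is known. A minimal sketch is given below; the helper name `z_interval` is hypothetical, and the critical value is computed with `NormalDist().inv_cdf` rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (formula a)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=780, sigma=40, n=25, confidence=0.95)
print(round(lower, 2), round(upper, 2))  # 764.32 795.68
```

Raising `confidence` to 0.99 widens the interval, which illustrates the trade-off between the confidence coefficient and the length of the interval for a fixed sample size.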

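Decision chart for choosing among Formulas a, b and c:

Is X normally distributed?

Yes: Is $\sigma^2$ known? If yes, use Formula a. If no, check whether n > 30: if yes, use Formula c; if no, use Formula b.

No: Is n > 30? If yes, check whether $\sigma^2$ is known: if yes, use Formula a; if no, use Formula c. If n is not greater than 30, use Nonparametric Methods.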




1. Regular consumption of presweetened cereals contributes to tooth decay, heart disease, and other degenerative diseases, according to a study by Dr. M Albreight of the National Institute of Health and Dr. D. Solomon, Professor of Nutrition and Dietetics at the University of London. In a random sample of 20 similar servings of Alpha Bits, the mean sugar content was 1.13 grams with a standard deviation of 2.45 grams. Assuming that the sugar content is normally distributed, construct a 95% confidence interval for the mean sugar content of a single serving of Alpha Bits.

2. A random sample of 100 automobile owners shows that an automobile is driven on the average 23,500 kilometers per year in the state of Virginia, with a standard deviation of 3,900 km. Construct a 99% confidence interval for the average number of kilometers an automobile is driven annually in Virginia.

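An approximate (1-α)100% confidence interval for P is given by:

$$\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)$$

Let $X_i = 1$ if the ith student sampled passed Math 17 on the first take and $X_i = 0$ otherwise.

Given: $\hat{P} = 138/200 = 0.69$, n = 200, 1-α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = z_{0.025} = 1.96$

A 95% confidence interval for the true proportion of students who pass Math 17 on their first take is

$$\left(0.69 - 1.96\sqrt{\frac{(0.69)(0.31)}{200}},\ 0.69 + 1.96\sqrt{\frac{(0.69)(0.31)}{200}}\right) = (0.6259, 0.7541)$$

Based on sample results, we are 95% confident that the proportion of students who pass Math 17 on their first take is at least 0.63 and at most 0.75.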
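The large-sample interval for a proportion can be sketched in the same way, using the Math 17 figures of 138 passers out of 200 students. The helper name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Approximate large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_interval(successes=138, n=200, confidence=0.95)
print(round(lower, 4), round(upper, 4))  # 0.6259 0.7541
```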

9.1.2 Estimating the Population Proportion



9.2 Sample Size Determination for Estimation

In previous sections, the steps in constructing a confidence interval for estimating an unknown parameter ($\mu$ or P) involve

1. Getting a random sample of size n from the population.

2. Computing the point estimate based on the sample.

3. Choosing the appropriate formula based on the problem (i.e. is the population

variance known? Is X normally distributed? Or is Xi defined as a binary variable taking

on 1 for “success” or 0 for “failure” for the ith trial,i= 1, 2, …, n? Is the sample size

large? And so on) to calculate the (1-α)100% confidence interval.

4. Interpreting the resulting (1-α)100% confidence interval.

Suppose that the population variance $\sigma^2$ is known and we state, say, that "we are (1-α)100% confident that $\mu$ is within the interval $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$."

Note that $\mu$ is going to be within the (1-α)100% confidence interval if and only if the error e of estimating $\mu$ using $\bar{X}$ is at most $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Then, saying "we are (1-α)100% confident that $\mu$ is within the interval $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$" is equivalent to saying that "We are (1-α)100% confident that the error e of estimating $\mu$ using $\bar{X}$ cannot exceed $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$."

Now suppose a researcher desires to estimate $\mu$ using $\bar{X}$ with (1-α)100% confidence and wishes that the random sample of size n that he takes will give an estimate which is within a specified value e of $\mu$. That is, he wishes to be (1-α)100% confident that the random sample that he would take will give a realized value of $\bar{X}$ such that the error of estimating $\mu$ will not exceed a specified value e. How large a sample should the researcher take?

In such a scenario, we don't intend to construct a (1-α)100% confidence interval for $\mu$ from a sample that is already taken from the population. In fact, the sample has not been taken yet and we are about to determine the sample size first. What we have are the following:

1. The population standard deviation $\sigma$ is known.

2. The confidence coefficient (1-α) is known, i.e. the researcher sets how confident he wishes to be in estimating $\mu$ using $\bar{X}$.

3. The maximum amount of error $e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ in estimating $\mu$ using $\bar{X}$ is specified.

9.2.1 Sample Size for Estimating $\mu$



Assuming (1), the objective then is to determine the sample size n that satisfies (2) and (3). But since $z_{\alpha/2}$, $\sigma$ and e are known, the formula $e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ will give $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$.

So to interpret, "We can be (1-α)100% confident that getting a random sample of size $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$ will provide an estimate $\bar{x}$ which is at most a specified amount e away from the value of $\mu$".

Note that the larger the sample size is, the smaller the standard error $\frac{\sigma}{\sqrt{n}}$ of the sample mean is. The possible values of the sample mean therefore fluctuate less as the sample size is increased. But the sample size cannot be increased at the whim of the researcher, since each additional unit in the sample entails costs and, in whatever study, the research design is influenced by budgetary constraints. The selection of the sample size is then a compromise between the extent of precision of results desired and the financial considerations.

When the computed sample size is not an integer, we round it up to the next integer.


An electrical firm manufactures light bulbs that have a length of life that is

approximately normally distributed, with a standard deviation of 40 hours. How large a sample

is needed if we wish to be 95% confident that the sample mean will be within 10 hours of the

true mean?


Let L = length of life (in hours) of a light bulb manufactured by a certain electrical firm.

L is approximately $N(\mu, \sigma^2)$.

We can be (1-α)100% confident that getting a random sample of size $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$ will provide an estimate which is at most a specified amount e away from the value of $\mu$.

Given: $\sigma = 40$, e = 10, 1-α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = z_{0.025} = 1.96$

Then $n = \left(\frac{(1.96)(40)}{10}\right)^2 = 61.4656 \approx 62$.

Therefore, we could be 95% confident that taking a random sample of 62 light bulbs

will provide an estimate which is within 10 hours of the true mean length of life of light bulbs.
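The sample size rule, including the rounding up, can be expressed as a short sketch. The function name `sample_size_for_mean` is hypothetical; the inputs are the values from the light bulb example.

```python
from math import ceil

def sample_size_for_mean(z, sigma, e):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most e."""
    n_exact = (z * sigma / e) ** 2
    # Guard against floating-point noise before rounding up
    return ceil(round(n_exact, 6))

print(sample_size_for_mean(z=1.96, sigma=40, e=10))  # 62
```

Halving the allowed error e roughly quadruples the required sample size, since n is proportional to $1/e^2$.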



Let $X_i = 1$ if the ith trial is a success and $X_i = 0$ otherwise. Recall that P is the probability of success. An estimator of P is the sample proportion, $\hat{P} = \frac{\sum X_i}{n}$, the proportion of successes in the sample. Recall that $E(\hat{P}) = P$ and $\mathrm{Var}(\hat{P}) = \frac{P(1-P)}{n}$.

Note that the standard error of $\hat{P}$ involves P, the parameter of interest. Hence, in constructing a (1-α)100% confidence interval for P, the confidence limits are supposedly $\hat{P} - z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$ and $\hat{P} + z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$, which are not independent of P, the parameter that is supposedly being estimated. However, for large samples, little error is introduced in substituting the statistic $\hat{P}$ for the true proportion P. Therefore, an approximate (1-α)100% confidence interval for P is given by $\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)$.

Similar to how we determine the sample size in estimating $\mu$ using $\bar{X}$, it can be reasoned out that saying we are (1-α)100% confident that P is within the interval $\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)$ is the same as saying we are (1-α)100% confident that the error e of estimating P using $\hat{P}$ cannot exceed $z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$.

Therefore, a researcher could be (1-α)100% confident that getting a random sample of size $n = \frac{z_{\alpha/2}^2\,\hat{P}(1-\hat{P})}{e^2}$ will provide an estimate which is at most a specified amount e away from the value of P.

The formula $n = \frac{z_{\alpha/2}^2\,\hat{P}(1-\hat{P})}{e^2}$ is used when an approximate value for P is available. However, in some cases when we do not have an approximation of P to start with, we might as well work with the largest sample size that we could obtain given the degree of confidence and the extent of error we are willing to commit. Such a maximum sample size is attained by using P = Q = 0.5, where Q = 1 - P. This will give us the conservative formula $n = \frac{z_{\alpha/2}^2}{4e^2}$.


Example: A chemist has prepared a product designed to kill 60% of a particular type of insect. How large a sample should be used if he desires to be 95% confident that he is within 0.02 of the true fraction of insects killed?

9.2.2 Sample Size for Estimating P



9.3 Hypothesis Testing Procedures


Let X = number of insects of a particular type killed by the new product, X ~ Bi(n, P).

We can be (1-α)100% confident that getting a random sample of size $n = \frac{z_{\alpha/2}^2\,\hat{P}(1-\hat{P})}{e^2}$ will provide an estimate which is at most a specified amount e away from the value of P.

Given: $\hat{P} = 0.60$, $1 - \hat{P} = 0.40$, e = 0.02, 1-α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = z_{0.025} = 1.96$

Then $n = \frac{(1.96)^2(0.60)(0.40)}{(0.02)^2} = 2304.96 \approx 2305$.


Therefore, we could be 95% confident that taking a random sample of 2305 insects of a

particular type will provide an estimate which is within 0.02 of the true fraction of insects killed.
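The sample size for estimating a proportion can be sketched in the same style. The function name `sample_size_for_proportion` is hypothetical; passing `p=0.5` would correspond to the conservative formula used when no approximation of P is available.

```python
from math import ceil

def sample_size_for_proportion(z, p, e):
    """Smallest n whose margin of error z * sqrt(p(1 - p)/n) is at most e."""
    n_exact = z ** 2 * p * (1 - p) / e ** 2
    # Guard against floating-point noise before rounding up
    return ceil(round(n_exact, 6))

print(sample_size_for_proportion(z=1.96, p=0.60, e=0.02))  # 2305
```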

Exercise: What if an approximation for the true fraction of insects killed by the new product is not available? How large a sample should be used if the chemist desires to be 95% confident that he is within 0.02 of the true fraction of insects killed?

Often, the problem confronting us is not so much the estimation of a parameter as

discussed in the previous section, but rather the formulation of a set of rules that lead to a

decision culminating in the “non-rejection” or rejection of some statement or hypothesis about

the population.


1. A medical researcher might be required to decide on the basis of experimental evidence

whether a certain vaccine is superior to one presently being marketed.

2. An engineer might have to decide on the basis of sample data whether there is a difference between the accuracy of two kinds of gauges.

3. A sociologist might wish to collect appropriate data to enable her to decide whether the blood type and the eye color of an individual are independent variables.

The procedures for establishing a set of rules that lead to the rejection or non-rejection of these kinds of statements comprise a major area of statistical inference called Hypothesis Testing.

The truth or falsity of a statistical hypothesis is never known with certainty unless we

examine the entire population. This, of course, would be impractical in most situations. Instead

we take a random sample from the population of interest and use the information contained in

this sample to decide whether the hypothesis is likely to be true or false.



Concepts Related to Hypothesis Testing

1. Statistical Hypothesis –is an assertion or conjecture concerning one or more populations.

2. Null Hypothesis (Ho) – is the hypothesis that is being tested; it represents what the experimenter doubts to be true.

3. Alternative Hypothesis (Ha) – is the operational statement of the theory that the experimenter believes to be true and wishes to prove. It is the contradiction of the null hypothesis.

4. One-tailed Test of Hypothesis – is a test where the alternative hypothesis specifies a one-directional difference for the parameter of interest.

Example: Ho: $\mu = 14$ vs. Ha: $\mu > 14$; Ho: $\mu = 14$ vs. Ha: $\mu < 14$

5. Two-tailed Test of Hypothesis – is a test where the alternative hypothesis does not specify a directional difference for the parameter of interest.

Example: Ho: $\mu = 14$ vs. Ha: $\mu \ne 14$

6. Test Statistic – is a statistic whose value is calculated from sample measurements and on which the statistical decision will be based.

7. Critical Region or Rejection Region – is the set of values of the test statistic for which the null hypothesis will be rejected.

8. Acceptance Region – is the set of values of the test statistic for which the null hypothesis will not be rejected.

9. Critical Value – the value of the test statistic separating the acceptance and rejection regions.

10. Type I Error – is the error made by rejecting the null hypothesis when it is true.

11. Type II Error – is the error made by accepting (not rejecting) the null hypothesis when it is false.

12. Level of Significance, α – is the maximum probability of Type I error the researcher is willing to commit.

Decision                           State of Nature

                                   Null Hypothesis is True                        Null Hypothesis is False

Null Hypothesis is Rejected        Incorrect Decision (Type I Error committed)    Correct Decision

Null Hypothesis is Not Rejected    Correct Decision                               Incorrect Decision (Type II Error committed)



Steps in Hypothesis Testing

1. State the null hypothesis (Ho) and the alternative hypothesis (Ha).

2. Choose the level of Significance α.

3. Select the appropriate test statistic and establish the critical region.

4. Collect the data and compute the value for the test statistic from the

sample data.

5. Make the decision. Reject Ho if the value of the test statistic belongs in

the critical region. Otherwise, do not reject Ho

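Ho                 Test Statistic                                               Ha                 Critical Region

a. $\sigma$ known

$\mu = \mu_0$      $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$                $\mu < \mu_0$      z < -zα

                                                                                $\mu > \mu_0$      z > zα

                                                                                $\mu \ne \mu_0$    |z| > zα/2

b. $\sigma$ unknown

$\mu = \mu_0$      $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, v = n-1            $\mu < \mu_0$      t < -tα

                                                                                $\mu > \mu_0$      t > tα

                                                                                $\mu \ne \mu_0$    |t| > tα/2

1. The above tests are exact α-level tests for samples from a normal distribution. However, they provide good approximate α-level tests when the distribution is not normal, provided that the sample size is large, i.e. n > 30.

2. If $\sigma$ is unknown and n > 30, use the test in (a), replacing the test statistic by $Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$.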

Problem: (Automobile Example)

It is claimed that an automobile is driven on the average less than 25,000 km per year.

To test the claim, a random sample of 100 automobile owners is asked to keep a record of the

kilometers they travel. Would you agree with this claim if the random sample showed an

average of 23,500 km and a standard deviation of 3,900 km? Use a 0.01 level of significance.

9.3.1 The Hypothesis Testing Process




Let X = number of kilometers an automobile is driven in a year

1. State the null hypothesis (Ho) and the alternative hypothesis (Ha).

Null Hypothesis – hypothesis being tested, represents what the experimenter doubts

to be true.

Alternative Hypothesis – operational statement of the theory that the experimenter

believes to be true and wishes to prove; contradiction of HO

The null and alternative hypotheses are statistical hypotheses.

Ho: $\mu = 25{,}000$ vs. HA: $\mu < 25{,}000$

2. Choose the level of significance α.

α = 0.01

3. Select the appropriate test statistic and establish the critical region.


Since $\sigma$ is unknown and n > 30, choose to use the test statistic and the critical region specified in remark (2).

$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

The form of the critical region depends on the form of the alternative hypothesis. Since the alternative hypothesis is one-tailed and has the form HA: $\mu < \mu_0$, the critical region takes the form Z < -zα.

Critical Region: Z < -zα = -z0.01 = -2.326

This means that if the computed test statistic Z is less than -2.326, then Ho is rejected at 0.01 level of significance.

The shaded region is the critical region.

The unshaded region is the acceptance region.

The value which separates the rejection region and the acceptance region is -2.326, the critical value.



4. Collect the data and compute the value of the test statistic from the sample data.

Given: $\bar{x} = 23{,}500$, s = 3,900, n = 100

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{23{,}500 - 25{,}000}{3{,}900/\sqrt{100}} = -3.846$$

5. Make the decision. Reject Ho if the value of the test statistic belongs in the critical region. Otherwise, do not reject Ho.

Since -3.846 < -2.326, we reject Ho at the 1% level of significance.


Rejection of a hypothesis by a statistical test does not mean that the hypothesis is false;

and in the same manner, its non-rejection by the test does not imply that it is true.

Rejecting the hypothesis only means that: given that the hypothesis is true, the observed

sample has a very small probability of occurrence. This is taken as evidence of the falsity

of the hypothesis.

Conclusion: At the 0.01 level of significance, based on the sample results, there is sufficient

evidence to say that the true average number of kilometers an automobile is driven per year is

less than 25,000 kilometers.
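The five steps of the automobile example can be traced in a short sketch. It assumes the large-sample z statistic from remark (2), with the sample standard deviation s in place of $\sigma$, and computes the critical value with the standard-library `NormalDist` instead of the table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Ho: mu = 25,000 vs. Ha: mu < 25,000 at alpha = 0.01
xbar, mu0, s, n, alpha = 23_500, 25_000, 3_900, 100, 0.01

# Test statistic: z = (xbar - mu0) / (s / sqrt(n))
z = (xbar - mu0) / (s / sqrt(n))

# Left-tailed critical value -z_alpha
critical_value = NormalDist().inv_cdf(alpha)

# Reject Ho when the test statistic falls in the critical region
reject_h0 = z < critical_value
print(round(z, 3), round(critical_value, 3), reject_h0)  # -3.846 -2.326 True
```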



Decisions and States of Nature

1. A correct decision has been made when

The null hypothesis is not rejected when it is true

The null hypothesis is rejected when it is false

2. An incorrect decision has been made when

The null hypothesis is rejected when in fact, it is true. (Type I error)

The null hypothesis is not rejected when in fact, it is false. (Type II error)


β = probability of Type II error = probability of not rejecting the null hypothesis when in fact it is false

1- β = power of the test = probability of rejecting the null hypothesis when in fact it is false

α = level of significance = maximum probability of Type I error the researcher is willing to commit = maximum probability specified by the researcher of rejecting the null hypothesis when in fact it is true.

Be aware that in fixing α, the experimenter is controlling Type I error probabilities, not the Type II error. If this approach is taken, the experimenter should specify the Ho and Ha so that it is most important to control the Type I error probability. For example, suppose an experimenter expects the data to support a particular hypothesis but he does not wish to make the assertion unless the data do give convincing support. The test can be set up so that the alternative hypothesis is the one that he expects the data to support, and hopes to prove. By using a small α, the experimenter is guarding against saying the data support Ha when it is false.

Note: The level of significance is the area of the critical region. The level of significance is specified prior to the hypothesis testing procedure and provides an objective way of assessing when to reject the null hypothesis based on the sample results. The level of significance is used to assess how likely (or unlikely) it is to get a random sample such as what has been observed if the null hypothesis is true.

Question: When does one become stricter in avoiding the mistake of rejecting the null hypothesis when it is true: when α is decreased or increased?

Interpretation of the Level of Significance α:

If we repeatedly take random samples of size n, and for each of these samples the hypothesis testing procedure is performed using the level of significance α, then at most α*100% of all the decisions made will result in rejecting the null hypothesis when in fact it is true.
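The repeated-sampling interpretation of α can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the notes: it draws many samples from a Normal population in which Ho is true (μ = 50, σ = 6, n = 16 are borrowed from Example A purely for convenience), runs a two-tailed Z test on each, and counts how often Ho is wrongly rejected. The function name, the number of trials, and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=50.0, sigma=6.0, n=16, alpha=0.05, trials=20_000, seed=1):
    """Estimate how often a two-tailed Z test rejects Ho when Ho is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Sample from the population described by Ho, so every rejection is a Type I error
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(rate)  # should land close to alpha = 0.05
```

The observed proportion of wrong rejections should be near α, which is what the interpretation above describes.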




1. The null hypothesis Ho will always be stated using the equality sign so as to specify a single value. In this way, the probability of committing a Type I error can be controlled. Whether one sets up a one-tailed or a two-tailed test will depend on the conclusion to be drawn if Ho is rejected. The location of the critical region can be determined only after Ha has been stated.

2. Note the similarities between the scenarios of when to use the three available formulas for estimating the population mean in the previous section and the scenarios of when to use the test statistics and critical regions available for testing a hypothesis on the population mean. The flowchart used in choosing the test statistic and establishing the critical region in testing a hypothesis on the population mean.

Example A: Test Ho: μ = 50 vs. Ha: μ ≠ 50 at 0.05 level of significance, if a random sample of 16 subjects had a mean of 48 and a standard deviation of 5.8. Assume that the sample was taken from a Normal population with standard deviation of 6.

i. Hypotheses: Ho: μ = 50 vs. Ha: μ ≠ 50

ii. Level of Significance: α = 0.05

iii. Test Statistic: Since the random sample was taken from a population of measurements that follow a Normal distribution and the population standard deviation is known (σ = 6), we use the test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

iv. Critical Region: |Z| > zα/2 = z0.05/2 = z0.025 = 1.96, that is, Z > 1.96 or Z < -1.96

v. Computation: Given: x̄ = 48, σ = 6, s = 5.8, n = 16;

$$Z = \frac{48 - 50}{6/\sqrt{16}} = -1.333$$



vi. Decision: Since |Z| = |-1.33| ≯ 1.96, we do not reject HO at 5% level of significance.

vii. Conclusion: At 5% level of significance, there is not enough evidence to say that the true mean is significantly different from 50.

Exercises: (Dietary Goal Problem) According to Dietary Goals for the United States (1977), high sodium intake may be related to ulcers, stomach cancer, and migraine headaches. The human requirement for salt is only 230mg per day, which is surpassed in most single servings of ready-to-eat cereals. A random sample of 20 similar servings of Special K had mean sodium content of 244mg of sodium and standard deviation of 24.5mg. Is there sufficient evidence to believe that the average sodium content for single servings of Special K exceeds the human requirement for salt at α=0.025? at α=0.10? Assume normality.

Remarks: For the same data set, as α increases, the size of the critical region also increases. Consequently, if Ho is rejected at α-level of significance, then Ho will also be rejected at a higher level of significance. For example, if Ho is rejected at α =0.05 then testing at α = 0.1 will also lead to the rejection of Ho. However, Ho will not necessarily be rejected at α = 0.01.

Example: (Automobile Example Continued)

For α = 0.01, the critical region is Z < -2.326.

Question: Does the change in the level of significance entail a change in the value of test statistic?

For α = 0.025, the critical region is Z < -1.96.



Since Ho: μ = 25,000 is rejected at 0.01 level of significance, it will also be rejected at 0.025 level of significance.
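The remark that a larger α gives a larger critical region can be seen by listing the lower-tail critical values for several levels of significance. This is a sketch only; the set of α values is chosen for illustration and the critical values come from `statistics.NormalDist` instead of a printed table.

```python
from statistics import NormalDist

z = -3.846153846  # computed test statistic from the automobile example

decisions = {}
for alpha in (0.01, 0.025, 0.05, 0.10):
    # One-tailed (lower) test: the critical region is Z < -z_alpha
    z_crit = NormalDist().inv_cdf(alpha)
    decisions[alpha] = (round(z_crit, 3), z < z_crit)

for alpha, (z_crit, reject) in decisions.items():
    print(f"alpha = {alpha}: critical region Z < {z_crit}, reject Ho: {reject}")
```

The test statistic itself does not change with α; only the boundary of the critical region moves toward zero as α increases.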

The p-value is the smallest value of α for which Ho will be rejected based on sample information. It is the probability under Ho of obtaining a sample as extreme as or more extreme than the one that was observed.

If p-value ≤ α, then Ho is rejected; otherwise, Ho is not rejected.


(Automobile Example) – Recall that Z = -3.846153846. Since we have a one-tailed test and Ho is rejected for values of Z such that Z < -z0.01, a random sample is considered more extreme than the one observed if it gives a computed value of Z < -3.846153846. Hence, the p-value is given by:

p-value = P(Z ≤ -3.846153846) ≈ 0.

Since the p-value, which is approximately equal to 0, is less than the 0.01 level of significance, Ho is rejected.


9.3.2 Using the P-value in Tests of Hypothesis



(Example A) – Note that Z = -1.333. Since we have a two-tailed test and Ho is rejected for values of Z such that |Z| > z0.025, a random sample is considered more extreme than the one observed if it gives a computed value of Z such that Z < -1.333 or Z > 1.333, i.e., |Z| > 1.333. Hence, the p-value is given by:

p-value = P(|Z| > 1.333)
        = P({Z < -1.333} ∪ {Z > 1.333})
        = P({Z < -1.333}) + P({Z > 1.333})
        = 2*P({Z < -1.333})
        = 2*(0.0918)
p-value = 0.1836

Since 0.1836 is not ≤ 0.05, Ho is not rejected at 0.05 level of significance.
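Both p-values can be reproduced numerically. The helper `p_value` below is a hypothetical name introduced for this sketch. Because it uses the unrounded Z = -1.3333 instead of the table entry for -1.33, the two-tailed result comes out slightly below the 0.1836 obtained from the table; the decision is unchanged.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal distribution, mean 0 and sd 1


def p_value(z, tail):
    """p-value of a Z test for a 'lower', 'upper', or 'two-sided' alternative."""
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "two-sided":
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("tail must be 'lower', 'upper', or 'two-sided'")


p_auto = p_value(-3.846153846, "lower")            # automobile example, one-tailed
p_example_a = p_value((48 - 50) / (6 / 4), "two-sided")  # Example A, two-tailed

print(p_auto, p_auto <= 0.01)            # very small, so Ho is rejected at 0.01
print(p_example_a, p_example_a <= 0.05)  # larger than 0.05, so Ho is not rejected
```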

Exercise: Compute for the p-value of the Dietary Goal Problem.

Consider the problem of testing the hypothesis that the proportion of successes in a binomial experiment equals some specified value.

Let X be the number of successes in n trials.

If the unknown proportion P is not expected to be too close to 0 or 1 and n is large, a large sample approximation is given by:

Ho: P = po

Test Statistic:

$$Z = \frac{X - np_o}{\sqrt{np_oq_o}}, \qquad q_o = 1 - p_o$$

Ha            Critical Region
P < po        z < -zα
P > po        z > zα
P ≠ po        |z| > zα/2

Example: A commonly prescribed drug on the market for relieving nervous tension is believed

to be only 60% effective. Experimental results with a new drug administered to a random

sample of 100 adults who were suffering from nervous tension showed that 70 received relief.

Is this sufficient evidence to conclude that the new drug is superior to the one commonly

prescribed? Use a 0.05 level of significance.

9.3.3 Test of Hypothesis for the Population Proportion




Let Xi = 1 if the ith adult received relief after taking the new drug and Xi = 0 otherwise, i = 1, 2, …, 100.

Then X = number of adults (out of 100) who received relief after taking the new drug

i. Hypotheses: HO: P = 0.60 vs. HA: P > 0.60

ii. Level of Significance: α = 0.05

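iii. Test Statistic:

$$Z = \frac{X - np_o}{\sqrt{np_oq_o}}$$

iv. Critical Region: Z > zα = z0.05 = 1.645, that is, Z > 1.645

v. Computation: Given: x = 70, po = 0.60, qo = 1 - 0.60 = 0.40, n = 100;

$$Z = \frac{70 - (100)(0.60)}{\sqrt{(100)(0.60)(0.40)}} = 2.041241452$$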

vi. Decision: Since Z = 2.041241452 > 1.645, we reject HO at 5% level of significance.

vii. Conclusion: At 5% level of significance, based on sample results, there is sufficient evidence to say that the new drug is superior to the one commonly prescribed.
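The drug example can be verified with a few lines of code. This is a sketch under the same large-sample Normal approximation used above; `one_proportion_z` is an illustrative name, not something defined in these notes.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Large-sample Z statistic for Ho: P = p0, given x successes in n trials."""
    q0 = 1 - p0
    return (x - n * p0) / sqrt(n * p0 * q0)


z = one_proportion_z(70, 100, 0.60)
z_crit = NormalDist().inv_cdf(0.95)   # upper-tail critical value z_0.05, about 1.645
p_val = 1 - NormalDist().cdf(z)       # upper-tail p-value for Ha: P > 0.60

print(round(z, 4), round(z_crit, 3), z > z_crit, round(p_val, 4))
```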

Exercise: Swain vs. Alabama Case

In 1965, the U.S. Supreme Court decided the case of Swain vs. Alabama. Swain, a black man, was convicted in Talladega County, Alabama, of raping a white woman. He was sentenced to death. The case was appealed to the Supreme Court on the grounds that there were no blacks on the jury; Moreover, no black “within the memory of persons now living has ever served on any petit jury in any civil or criminal case tried in Talladega County, Alabama.”

The Supreme Court denied the appeal, on the following grounds. As provided by Alabama law, the jury was selected from a panel of about 100 persons. There were 8 blacks on the panel. (They did not serve on the jury because they were “struck”, or removed, through a maneuver called peremptory challenges by the prosecution. Such challenges were until quite recently constitutionally protected.) The Supreme Court ruled that the presence of 8 blacks on the panel showed “The overall percentage disparity has been small and reflects no studied attempt to include or exclude a specified number of blacks”.

At that time in Alabama, only men over the age of 21 were eligible for jury duty. There were 16,000 such men in Talladega County, of whom about 26% were black.

If 100 people were chosen by simple random sampling from this population, what is the chance that 8% or fewer would be black?

What do you conclude about the Supreme Court’s opinion?



• Estimation and Hypothesis Testing

10.1 Estimation Procedures Involving Two


In this chapter, the basic concepts learned in estimation and hypothesis testing are applied in the scenario wherein the interest lies in the comparison between two population means or two population proportions. This is quite useful especially when we would like to conclude which among two methods is “better” or which among two groups obtained “higher scores”. If the comparison is based on sample results, we will be able to readily see which among the sample mean values is higher, if there exists any difference at all between the two statistics.

But an important question to ask is if such difference exists by chance in the samples that have been drawn. Perhaps, if information was taken from all units in the two populations under consideration, results would show that there is no difference at all in the mean values. How confident can we then be of generalizing results regarding the difference of two means or proportions based on sample information to the populations of interest? When can we then say that we have sufficient evidence based on sample results that the two population means or proportions are significantly different from each other?

The said questions are answered by performing estimation and hypothesis testing procedures involving two populations.

Consider getting a random sample of observations $X_{11}, \ldots, X_{1n_1}$ with mean $\bar{X}_1$ from a population with mean μ1 and variance σ1². Similarly, consider getting a second random sample $X_{21}, \ldots, X_{2n_2}$ with mean $\bar{X}_2$ from a population with mean μ2 and variance σ2². The point estimator of the difference between μ1 and μ2 is the statistic $\bar{X}_1 - \bar{X}_2$.

If random sampling is done in two populations, it can be done either through (1)

selection of two independent samples or through (2) paired sampling.

Paired sampling is done to try to eliminate the effect of factors which are not of interest

to us but may affect the study results when comparison of two population means is performed.

This is achieved by “matching” or studying two related samples. Matching may be achieved by:



a) using the same subject in the two samples (e.g. in assessing whether there was

increased knowledge about a subject matter after an academic year, one may

compare test results at the start and at the end of classes of the same pupils).

b) pairing of subjects with respect to any extraneous variable which might affect or

influence the outcome (e.g., in assessing the effectiveness of one method over the

other in teaching kids, the extent of learning may also be affected by the

intelligence scores of the kids).

If the manner by which a random sample taken from a population does not depend in

any way on how the other random sample is taken from another population, or if units from

the two samples are not matched by controlling an extraneous factor, then two samples are

said to be independent.

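A (1-α)100% Confidence Interval for μ1 − μ2 is given by:

a. when σ1² and σ2² are known:

$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

b. when σ1² = σ2² but unknown:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad v = n_1 + n_2 - 2, \qquad S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$

c. when σ1² ≠ σ2² and unknown:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}, \qquad v = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1-1} + \dfrac{(S_2^2/n_2)^2}{n_2-1}}$$

d. when σ1² and σ2² are unknown but n1 and n2 are both greater than 30:

$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$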

10.1.1 Estimating the Difference of Two Population Means Based on Two Independent Samples




1. Formulas (a) to (c) hold strictly for independent samples selected from normal

populations. However, they provide good approximate (1-α)100% confidence intervals

when the distributions are not normal, provided both n1 and n2 are greater than 30 by

using the properties of the t-distribution and by invoking the Central Limit Theorem.

2. Even if the population variances are considerably different, formula (b) will still provide a

good estimate provided that n1=n2 and both populations are normal. Therefore, in a

planned experiment, one should make every effort to equalize the size of the samples.

3. The flowchart above summarizes the cases by which the different formulas for the (1-α)100% confidence interval estimators for μ1 − μ2 are used.


A statistics test was given to a random sample of 50 girls and another random sample of 75 boys. The mean score of the girls is 80 with a standard deviation of 4 and the mean score of the boys is 86 with a standard deviation of 6. Find a 95% confidence interval for the difference between the true mean scores of the boys and the girls.


[Figure: Flowchart of Formulas to Use in Estimating the Difference between μ1 and μ2. Decision nodes: Are X1 and X2 both normally distributed? Are σ1² and σ2² known? Are n1 and n2 > 30? Is σ1² = σ2²? Is n1 = n2? Outcomes: Formula a, Formula b, Formula c, Formula d, or Nonparametric Methods.]



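Let Bi = score of the ith boy in a statistics test, i = 1, 2, ..., 75

Let Gj = score of the jth girl in a statistics test, j = 1, 2, ..., 50

Then $\bar{X}_B$ approximately ~ N(μB, σB²/75) and $\bar{X}_G$ approximately ~ N(μG, σG²/50)

Given: x̄B = 86, sB = 6, nB = 75; x̄G = 80, sG = 4, nG = 50

1-α = 0.95, α = 0.05, α/2 = 0.025, zα/2 = z0.025 = 1.96

A (1-α)100% confidence interval estimate for μB − μG is given by

$$\left((\bar{X}_B - \bar{X}_G) - z_{\alpha/2}\sqrt{\frac{S_B^2}{n_B} + \frac{S_G^2}{n_G}},\ \ (\bar{X}_B - \bar{X}_G) + z_{\alpha/2}\sqrt{\frac{S_B^2}{n_B} + \frac{S_G^2}{n_G}}\right)$$

Question: Why use this formula among the four available interval estimators?

A 95% confidence interval for the difference of the true mean score of the girls from the true mean score of the boys is:

$$\left((86 - 80) - (1.96)\sqrt{\frac{6^2}{75} + \frac{4^2}{50}},\ \ (86 - 80) + (1.96)\sqrt{\frac{6^2}{75} + \frac{4^2}{50}}\right) = (4.25,\ 7.75)$$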

Based on sample results, we are 95% confident that the true mean score of the boys in a statistics test exceeds the true mean score of the girls by at least 4.25 and by at most 7.75 points. Since all the values in the interval of differences are positive, there is indication that the true mean score of the boys is higher than the true mean score of the girls in the said statistics test.


Remark: Suppose that based on random samples taken from two populations with means μ1 and μ2, a (1-α)100% confidence interval for μ1 − μ2 has been constructed and the estimate is (a1, b1). Then, based on the same random samples, a (1-α)100% confidence interval for μ2 − μ1 will give the estimate (-b1, -a1).
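The boys-and-girls interval and the remark about reversing the order of subtraction can both be checked with a short sketch. The function name `diff_means_ci` is an assumption for illustration; it implements the large-sample formula (d), with sample standard deviations in place of the unknown population values.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(mean1, s1, n1, mean2, s2, n2, conf=0.95):
    """Large-sample (formula d) confidence interval for mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    diff = mean1 - mean2
    margin = z * sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin


# Boys: mean 86, sd 6, n 75.  Girls: mean 80, sd 4, n 50.
boys_minus_girls = diff_means_ci(86, 6, 75, 80, 4, 50)
girls_minus_boys = diff_means_ci(80, 4, 50, 86, 6, 75)

print(boys_minus_girls)   # roughly (4.25, 7.75)
print(girls_minus_boys)   # the same interval with signs flipped and endpoints swapped
```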


1. Students may choose between a 3-unit course in Physics without lab and a 4-unit course with lab. The final written examination is the same for each section. The mean score of a random sample of 12 students in the section with lab is 84 with a standard deviation of 4, and the mean score of 18 students in the section without lab is 77 with a standard deviation of 6. Find a 99% confidence interval for the difference between the mean grades for the two courses. Assume the populations to be approximately normally distributed with equal variances.



2. An obstetrician wanted to know the effectiveness of two methods of relieving pain in women due for child delivery. Specifically, she wanted to see if there is a difference in the mean duration of active labor (in minutes) in these women who were introduced to these methods. She enrolled 20 women in labor in the study which she later on divided randomly into two equal groups. Group A subjects were given continuous epidural infusion while Group B subjects received hourly intermittent epidural injection. The sample sizes, mean duration of active labor (in minutes) and standard deviations were as follows:

Sample Group                                    Sample Size    Mean Duration of Active Labor (in minutes)    Standard Deviation (in minutes)
Continuous Epidural Infusion Group              10             134.1                                         79.79
Hourly Intermittent Epidural Injection Group    10             88.8                                          85.01

Assume normality of duration of active labor (in minutes) and equality of population variances. Construct a 98% confidence interval of the difference of the mean duration of active labor (in minutes) of women introduced to these methods. Does this suggest that there is a difference in the effects of the two methods in terms of reducing the duration of active labor?

If taking of measurements for two groups is implemented from the same units or if pairing of units from two groups with respect to some extraneous factor is done, then we come up with related samples.

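Let Xi = the ith observation from the first group and Yi = the ith observation from the second group.

di = Xi – Yi, i = 1, 2, …, n and di ~ N(μD, σD²).

Then a (1-α)100% confidence interval for μD is given by:

$$\left(\bar{d} - t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}},\ \ \bar{d} + t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}\right)$$

where

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad s_d^2 = \frac{n\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}{n(n-1)}$$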

10.1.2 Estimating the Difference of Two Population Means based on Two Related Samples




It is claimed that a new diet will reduce a person’s weight by 4.5 kilograms on the average in a period of two weeks. The weights of a random sample of 7 women who followed the diet, recorded before and after a 2-week period, are given in the table below. Compute a 95% confidence interval for the mean difference in the weight. Assume the distribution of weights to be approximately normal.


Woman            1      2      3      4      5      6      7
Weight Before    58.5   60.3   61.7   69.0   64.0   62.6   56.7
Weight After     60.0   54.9   58.1   62.1   58.5   59.9   54.4


Let WBi = weight (in kgs) of the ith woman before undergoing a new diet

WAi = weight (in kgs) of the ith woman after undergoing a new diet

di = WBi – WAi = difference in weight (in kgs) of the ith woman before and after undergoing new diet for two weeks, i = 1, 2, …, 7 and di ~ N(μD, σD²).

i     1      2     3     4     5     6     7
di    -1.5   5.4   3.6   6.9   5.5   2.7   2.3

1-α = 0.95, α = 0.05, α/2 = 0.025, tα/2,v = t0.025, 6 = 2.447

From the differences, d̄ = 24.9/7 = 3.557 and s_d = 2.776. A 95% confidence interval for the true mean difference in the weights of women after two weeks of undergoing new diet is:

$$\left(3.557 - \frac{(2.447)(2.776)}{\sqrt{7}},\ \ 3.557 + \frac{(2.447)(2.776)}{\sqrt{7}}\right) = (0.99,\ 6.12)$$

Based on the sample results, we are 95% confident that on the average, there is weight loss of at least 0.99 kg and at most 6.12 kg among women after two weeks of undergoing new diet.

Since 4.5 kg is within the interval, then there is insufficient evidence to discredit the claim.

Since all the values in the interval of mean weight loss are positive, there is indication that on

the average, there is weight loss.
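The diet interval can be recomputed directly from the before and after weights. This sketch uses only the standard library, so the critical value t0.025, 6 = 2.447 is typed in from the notes instead of being computed; all variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

before = [58.5, 60.3, 61.7, 69.0, 64.0, 62.6, 56.7]
after = [60.0, 54.9, 58.1, 62.1, 58.5, 59.9, 54.4]

# Paired differences d_i = weight before - weight after
d = [b - a for b, a in zip(before, after)]

t_crit = 2.447            # t_{0.025, 6}, taken from the notes (not computed here)
d_bar = fmean(d)          # sample mean of the differences
s_d = stdev(d)            # sample standard deviation (divisor n - 1)
margin = t_crit * s_d / sqrt(len(d))

ci = (d_bar - margin, d_bar + margin)
print(round(d_bar, 3), round(s_d, 3), ci)

# The claimed mean weight loss of 4.5 kg lies inside the interval
claim_inside = ci[0] <= 4.5 <= ci[1]
```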



Exercise: Twenty college freshmen were divided into 10 pairs, each member of the pair having

approximately the same IQ. One of each pair was selected at random and assigned

to a mathematics section using programmed materials only. The other member of

each pair was assigned to a section in which the professor lectured. At the end of the

semester, each group was given the same examination and the following results

were recorded.

Pair 1 2 3 4 5 6 7 8 9 10

Programmed Materials 76 60 85 58 91 75 82 64 79 88

Lectures 81 52 87 70 86 77 90 63 85 83

Find a 98% confidence interval for the mean difference in scores of the two learning

procedures. Assume normality.

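Suppose we have two populations with proportions of success P1 and P2 respectively, and we wish to estimate the difference P1 – P2.

Let X = number of successes in n1 trials (sample 1) from the first population and

Y = number of successes in n2 trials (sample 2) from the second population

Then a point estimator of P1 – P2 is:

$$\hat{p}_1 - \hat{p}_2 = \frac{X}{n_1} - \frac{Y}{n_2}$$

When n1 and n2 are large, an approximate (1-α)100% confidence interval for P1 – P2 is given by:

$$\left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}},\ \ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}\right)$$

where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$.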


10.1.3 Estimating the Difference of Two Population Proportions




In a random sample of 200 students, 78 of the 120 females and 60 of the 80 males

passed Math 17 on their first take. Construct a 95% confidence interval for p1-p2 where p1 and

p2 are the true proportions of females and males, respectively, who passed Math 17 on their

first take.


Let X = number of female students (out of 120) who passed Math 17 on their first take.

Y = number of male students (out of 80) who passed Math 17 on their first take.

where P1 is the population proportion of females who pass Math 17 on first take and P2 is the population proportion of males who pass Math 17 on first take.

$$\hat{p}_1 = \frac{78}{120} = 0.65, \qquad \hat{p}_2 = \frac{60}{80} = 0.75$$

1-α = 0.95, α = 0.05, α/2 = 0.025, zα/2 = z0.025 = 1.96

A 95% confidence interval for the difference of the true proportion of male students who pass Math 17 on their first take from the true proportion of female students who pass Math 17 on their first take is

$$\left((0.65 - 0.75) - (1.96)\sqrt{\frac{(0.65)(0.35)}{120} + \frac{(0.75)(0.25)}{80}},\ \ (0.65 - 0.75) + (1.96)\sqrt{\frac{(0.65)(0.35)}{120} + \frac{(0.75)(0.25)}{80}}\right) = (-0.23,\ 0.028)$$

Based on the sample results, we are 95% confident that the difference of the true

proportion of male students who pass Math on their first take from the true proportion of

female students who pass Math on their first take is at least -0.23 and at most 0.028. Since the

interval contains both positive and negative values, there is a possibility that the true

proportions of male and female students who pass Math 17 on their first take do not differ.
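The Math 17 interval can be reproduced with the large-sample formula for P1 – P2. The function name `diff_proportions_ci` is hypothetical and the sketch assumes the same Normal approximation as the text.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x, n1, y, n2, conf=0.95):
    """Approximate large-sample confidence interval for P1 - P2."""
    p1, p2 = x / n1, y / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Females: 78 of 120 passed.  Males: 60 of 80 passed.
ci = diff_proportions_ci(78, 120, 60, 80)
print(ci)  # roughly (-0.23, 0.028)

# Zero lies inside the interval, so "no difference" is not ruled out
contains_zero = ci[0] < 0 < ci[1]
```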



10.2 Hypothesis Testing Involving Two Populations

The similarities in cases in choosing the appropriate interval estimators for μ1 − μ2 and for μD and in selecting the appropriate test statistic in hypothesis testing can be noted.

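In each case below, Ho: μ1 − μ2 = do.

a. σ1² and σ2² known

Test Statistic:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

Ha                  Critical Region
μ1 − μ2 < do        z < -zα
μ1 − μ2 > do        z > zα
μ1 − μ2 ≠ do        |z| > zα/2

b. σ1² = σ2² but unknown

Test Statistic:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{S_p\sqrt{1/n_1 + 1/n_2}}, \qquad v = n_1 + n_2 - 2, \qquad S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}}$$

Ha                  Critical Region
μ1 − μ2 < do        t < -tα
μ1 − μ2 > do        t > tα
μ1 − μ2 ≠ do        |t| > tα/2

c. σ1² ≠ σ2² and unknown

Test Statistic:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad v = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1-1} + \dfrac{(S_2^2/n_2)^2}{n_2-1}}$$

Ha                  Critical Region
μ1 − μ2 < do        t < -tα
μ1 − μ2 > do        t > tα
μ1 − μ2 ≠ do        |t| > tα/2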
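The two auxiliary quantities in the t-based cases, the pooled variance and the approximate degrees of freedom for unequal variances, are easy to get wrong by hand. The sketch below computes both; the function names and the input values are made up for illustration and do not come from any example in these notes.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled variance Sp^2, used when the two population variances are assumed equal."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def unequal_variance_df(s1, n1, s2, n2):
    """Approximate degrees of freedom v when the population variances are unequal."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Hypothetical inputs: equal spreads and equal sample sizes
print(pooled_variance(5, 10, 5, 10))        # 25.0, the common variance
print(unequal_variance_df(5, 10, 5, 10))    # equals n1 + n2 - 2 in this balanced case

# Hypothetical inputs: very different spreads reduce the degrees of freedom
print(unequal_variance_df(3, 10, 9, 10))
```

In practice v is usually rounded down to the nearest whole number before looking up the t table; that rounding convention is an assumption here, not something stated in the notes.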


1. The remarks made in the previous section regarding the use of the appropriate interval estimator for μ1 − μ2 are applicable in choosing the appropriate test statistic and establishing the critical region in hypothesis testing procedures involving μ1 − μ2.

2. If σ1² and σ2² are unknown but n1 and n2 are greater than 30, use S1² and S2² in place of σ1² and σ2² in the test statistic in Case (a) so that the test statistic is:

10.2.1 Test of Hypothesis for the Difference of Two Population Means Based on Two Independent Samples



$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$

The critical region stays the same.

3. The flowchart which serves as a guide in the usage of the interval estimators for μ1 − μ2 in the previous section can be applied to the hypothesis testing procedure for μ1 − μ2.


A statistics test was given to 50 girls and 75 boys. The girls made an average of 80 with a standard deviation of 4 and the boys had an average of 86 with a standard deviation of 6. Is there sufficient evidence at 0.05 level of significance that the average grades of girls and boys differ?



Let Bi = score of the ith boy in a statistics test, i = 1, 2, …, 75

Gj = score of the jth girl in a statistics test, j = 1, 2, …, 50

Then $\bar{X}_B$ approximately ~ N(μB, σB²/75) and $\bar{X}_G$ approximately ~ N(μG, σG²/50)

i. Hypotheses: Ho: μB = μG ⇔ μB − μG = 0 vs. Ha: μB ≠ μG ⇔ μB − μG ≠ 0

Note: Be careful in the formulation of Ho and Ha.

ii. Level of Significance: α = 0.05

iii. Test Statistic: Since σB² and σG² are unknown but nB > 30 and nG > 30, the formula is:

$$Z = \frac{(\bar{X}_B - \bar{X}_G) - d_o}{\sqrt{S_B^2/n_B + S_G^2/n_G}}$$

iv. Critical Region: |Z| > zα/2 = z0.025 = 1.96, that is, Z > 1.96 or Z < -1.96

v. Computation:

$$z = \frac{(86 - 80) - 0}{\sqrt{6^2/75 + 4^2/50}} = \frac{6}{\sqrt{0.8}} = 6.708203933$$

vi. Decision: Since z = 6.708203933 > 1.96, we reject Ho at 5% level of significance.

vii. Conclusion: At 5% level of significance, based on sample results, there is sufficient

evidence to say that the average grades of girls and boys differ. There is indication that

the true mean score of the boys is higher than the true mean score of the girls in the said

statistics test.
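The test above can be checked numerically. This is a sketch of the large-sample case, with `two_sample_z` as an illustrative function name and the p-value added only as a cross-check on the decision.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Large-sample Z statistic for Ho: mu1 - mu2 = d0, using sample variances."""
    return ((mean1 - mean2) - d0) / sqrt(s1**2 / n1 + s2**2 / n2)


# Boys: mean 86, sd 6, n 75.  Girls: mean 80, sd 4, n 50.
z = two_sample_z(86, 6, 75, 80, 4, 50)

z_crit = NormalDist().inv_cdf(0.975)          # z_{0.025}, about 1.96
reject_h0 = abs(z) > z_crit                   # two-tailed critical region
p_val = 2 * NormalDist().cdf(-abs(z))         # two-tailed p-value

print(round(z, 6), reject_h0, p_val)
```

Swapping the two groups only changes the sign of z, so the two-tailed decision is unaffected.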

Question: If the hypotheses are written as follows:

Ho: μG = μB ⇔ μG − μB = 0 vs. Ha: μG ≠ μB ⇔ μG − μB ≠ 0, how will the testing procedure change?



In general, if the alternative hypothesis is stated as

1. Ha: μ1 > μ2 ⇔ μ1 − μ2 > 0, what test statistic will be used? What critical region will be used?

2. Ha: μ1 < μ2 ⇔ μ1 − μ2 < 0, what test statistic will be used? What critical region will be used?

3. Ha: μ2 > μ1 ⇔ μ2 − μ1 > 0, what test statistic will be used? What critical region will be used?

4. Ha: μ2 < μ1 ⇔ μ2 − μ1 < 0, what test statistic will be used? What critical region will be used?

As seen in the solved problem, it does not matter if the alternative hypothesis is formulated as Ha: μ1 ≠ μ2 or as Ha: μ2 ≠ μ1. Though there are changes in the testing procedure, the conclusion will be the same.

In general, the conclusion will be the same for (1) and (4); likewise, (2) and (3) would have the same conclusions.

Ho: μD = do

Test Statistic:

$$T = \frac{\bar{d} - d_o}{s_d/\sqrt{n}}, \qquad v = n - 1$$

Ha            Critical Region
μD < do       t < -tα
μD > do       t > tα
μD ≠ do       |t| > tα/2


A taxi company is trying to decide whether the use of radial tires instead of regular belted tires improves fuel economy. Twelve cars were driven twice over a prescribed test course, each time using a different type of tires (radial and belted) in random order. At 0.025 level of significance, can we conclude that cars equipped with radial tires give better fuel economy than those equipped with belted tires? Assume the populations to be normally distributed.

10.2.2 Test of Hypothesis for the Mean Difference Based on Two Related Samples



Car No. x1i x2i di Car No. x1i x2i di

1 4.2 4.1 0.1 7 5.7 5.7 0.0

2 4.7 4.9 -0.2 8 6.0 5.8 0.2

3 6.6 6.2 0.4 9 7.4 6.9 0.5

4 7.0 6.9 0.1 10 4.9 4.7 0.2

5 6.7 6.8 -0.1 11 6.1 6.0 0.1

6 4.5 4.4 0.1 12 5.2 4.9 0.3


We are going to use the formula on testing the difference between two population

means using two related samples. Why?

Let x1i = mileage (in km/liter) of the ith car using radial tires

x2i = mileage (in km/liter) of the ith car using regular belted tires

di = x1i - x2i = difference of mileage of the ith car using regular belted tires from its mileage using radial tires, i = 1, 2, …, 12.

Note that how di is defined affects the formulation of the null and alternative hypotheses.

di ~ N(μD, σD²)

i. Hypotheses: Ho: μD = 0 vs. Ha: μD > 0

ii. Level of Significance: α = 0.025

iii. Test Statistic:

$$T = \frac{\bar{d} - d_o}{s_d/\sqrt{n}}$$

iv. Critical Region: t > tα, n-1 = t0.025, 11 = 2.201

v. Computation: From the table, d̄ = 1.7/12 = 0.141667 and s_d = 0.197523, so

$$t = \frac{0.141667 - 0}{0.197523/\sqrt{12}} = 2.484515151$$

vi. Decision: Since t = 2.484515151 > 2.201, we reject Ho at 2.5% level of significance.

vii. Conclusion: At 2.5% level of significance, based on sample results, there is sufficient

evidence to conclude that on the average, cars equipped with radial tires give better fuel

economy than those equipped with regular belted tires.
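The paired test on the tire data can be recomputed from the table. As in the notes, the critical value t0.025, 11 = 2.201 is taken from a t table; it is typed in below because the Python standard library has no t distribution. Variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

radial = [4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2]   # x1i
belted = [4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.7, 6.0, 4.9]   # x2i

# Paired differences d_i = x1i - x2i (radial minus belted)
d = [x1 - x2 for x1, x2 in zip(radial, belted)]

n = len(d)
d_bar = fmean(d)
s_d = stdev(d)                       # sample standard deviation, divisor n - 1
t = (d_bar - 0) / (s_d / sqrt(n))    # test statistic for Ho: mu_D = 0

t_crit = 2.201                       # t_{0.025, 11}, taken from the notes
reject_h0 = t > t_crit               # upper-tail critical region for Ha: mu_D > 0
print(round(d_bar, 6), round(s_d, 6), round(t, 6), reject_h0)
```

If the differences were instead defined as belted minus radial, every d_i would change sign, t would become negative, and the alternative and critical region would move to the lower tail.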

Question: How is the testing procedure affected if di is defined as di = x2i-x1i, i = 1, 2, …, 12?




For determination of whether or not a heat treatment is effective in reducing the

number of bacteria in skim milk at Kroft Foods Inc., counts were made before and after

treatment on 12 samples of skim milk with the results shown in the table below. The data are in

the form of log DMC, the logarithms of direct microscopic counts. Test the hypothesis at 0.05

level of significance that the heat treatment is effective. Assume normality of log DMC


Sample 1 2 3 4 5 6 7 8 9 10 11 12

Before Treatment 6.98 7.08 8.34 5.30 6.26 6.77 7.03 5.56 5.97 6.64 7.03 7.69

After Treatment 6.95 6.94 7.17 5.15 6.28 6.81 6.59 5.34 5.98 6.51 6.84 6.99

Suppose that interest lies in the comparison of proportions P1 and P2 of an attribute in two populations. We want to see whether, based on the sample results, there is sufficient evidence to say that P1 is significantly different from P2. In other words, it is of interest to test the null hypothesis Ho: P1 = P2 where P1 and P2 are the two population proportions of interest.

The testing procedure involves selection of independent samples of size n1 and n2 from two binomial populations.

Let X = number of successes in n1 trials (sample 1) from the first population and

Y = number of successes in n2 trials (sample 2) from the second population

The sample proportions $\hat{p}_1 = X/n_1$ and $\hat{p}_2 = Y/n_2$ are computed and the common (population) proportion P is given as the pooled estimate $\hat{p} = \dfrac{X + Y}{n_1 + n_2}$. The test is as follows:

Ho: p1 = p2

Test Statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{q} = 1 - \hat{p}$$

Ha            Critical Region
p1 < p2       z < -zα
p1 > p2       z > zα
p1 ≠ p2       |z| > zα/2

10.2.3 Test of Hypothesis for the Difference of Two Population Proportions




In a survey of 200 students, 78 out of the 120 females in the sample passed Math 17 on

their first take while this figure is 60 among the 80 male students. Will you agree that the

proportion of males who passed Math 17 on their first take is higher than the proportion of

females who passed the same course on their first take? Test at α=0.05.


Let X = number of females (out of 120) who passed Math 17 on their first take

Y = number of males (out of 80) who passed Math 17 on their first take

Where P1 is the population proportion of females who pass Math 17 on first take and P2 is the population proportion of males who pass Math 17 on their first take.

i. Hypotheses: Ho: P1 = P2 vs. Ha: P1 < P2

ii. Level of Significance: α = 0.05

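iii. Test Statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

iv. Critical Region: z < -zα = -z0.05 = -1.645

v. Computation:

$$\hat{p}_1 = \frac{78}{120} = 0.65, \qquad \hat{p}_2 = \frac{60}{80} = 0.75, \qquad \hat{p} = \frac{78 + 60}{120 + 80} = 0.69, \qquad \hat{q} = 0.31$$

$$z = \frac{0.65 - 0.75}{\sqrt{(0.69)(0.31)\left(\dfrac{1}{120} + \dfrac{1}{80}\right)}} = -1.498011773$$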


vi. Decision: Since z = -1.498011773 ≮ -1.645, we do not reject Ho at 5% level of significance.

vii. Conclusion: At 5% level of significance, based on sample results, there is insufficient evidence to say that the proportion of males who passed Math 17 on their first take is higher than the proportion of females who passed Math 17 on their first take.
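The pooled two-proportion test can be verified as follows. This is a sketch of the large-sample test described above; `two_proportion_z` is an illustrative name, and the p-value is included only as a cross-check on the decision.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, n1, y, n2):
    """Z statistic for Ho: P1 = P2 using the pooled estimate of the common proportion."""
    p1, p2 = x / n1, y / n2
    p_pooled = (x + y) / (n1 + n2)
    q_pooled = 1 - p_pooled
    return (p1 - p2) / sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))


# Females (sample 1): 78 of 120 passed.  Males (sample 2): 60 of 80 passed.
z = two_proportion_z(78, 120, 60, 80)

z_crit = NormalDist().inv_cdf(0.05)   # -z_{0.05}, about -1.645
reject_h0 = z < z_crit                # lower-tail critical region for Ha: P1 < P2
p_val = NormalDist().cdf(z)           # lower-tail p-value

print(round(z, 6), reject_h0, round(p_val, 4))
```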




• Chi-Square Test

Suppose a sample of units has been taken from a population and information on

classification according to two nominal variables has been obtained. Tests of independence are

useful in assessing whether classification in one categorical variable has a relationship with

classification in another categorical variable.

For example, suppose that an employee responsible for monitoring the quality of

products manufactured by their firm is concerned with determining whether or not there is a

relationship between the production shift and the presence of a defect in the units produced.

If in the population of all units manufactured by the firm, the true proportion of

defective items per shift is 5%, then being classified as defective or not has nothing to do with

the shift of production of the unit. In this scenario, presence of defect is independent of the

production shift. If, however, the true proportions of defective items differ among the production shifts, then the presence of defect in a unit is related to when the unit has been produced.


Unfortunately, the employee has no way of knowing the proportion of defectives per

shift in the population of units manufactured by the firm unless he subjects each unit produced

by the firm to testing, which would be impractical. What he can do is to take a random sample

of units from the population, classify the units according to production shift and presence of

defect (the two categorical variables of interest), and to test the independence of the said variables.


Now it is possible that even if production shift and presence of defect are independent

in the population, he may get a sample of units such that the sample proportions of defective

units vary among the production shifts. This is because of sampling error, the error committed

because only a sample has been taken from the population instead of taking information from

all units in the population. But if the two variables are independent, the numbers of defectives and non-defectives per shift in the sample of units would not differ much from what is expected under independence.

To illustrate, suppose that the employee obtained a sample of units such that the

number of units classified per shift and the number of defectives and non-defectives are as follows:


Cross-tabulation of Units According to Production Shift and Presence of Defect

Shifts      Without Defect                     With Defect                       Total
Morning     E11 = r1c1/N = (400)(950)/1000     E12 = r1c2/N = (400)(50)/1000     r1 = 400
Afternoon   E21 = r2c1/N = (300)(950)/1000     E22 = r2c2/N = (300)(50)/1000     r2 = 300
Night       E31 = r3c1/N = (300)(950)/1000     E32 = r3c2/N = (300)(50)/1000     r3 = 300
Total       c1 = 950                           c2 = 50                           N = 1000

The marginal frequencies and grand total are then as follows:

r1 = the number of units in the sample produced under Morning shift = 400

r2 = the number of units in the sample produced under Afternoon shift = 300

r3 = the number of units in the sample produced under Night shift = 300

c1 = the number of units in the sample without defects = 950

c2 = the number of units in the sample with defects = 50

N = the number of units in the sample = 1000

Note that among the 1000 units in the sample, 95% are classified as not being defective

and 5% are classified as being defective. If production shift and presence of defect are truly

independent, then we should expect that, per production shift, we will be able to classify

95% of the units as non-defective and 5% as defective. That is,

E11 = number of units in the sample produced under Morning shift and classified as not

having defects that we expect to obtain if production shift and presence of

defect are truly independent = (400)(950)/1000 = 380.

E12 = number of units in the sample produced under Morning shift and classified as

having defects that we expect to obtain if production shift and presence of

defect are truly independent = (400)(50)/1000 = 20.

In general, Eij = the number of units in the sample produced under shift i with outcome j

that we expect to obtain if production shift and presence of defect are truly independent,

i= 1, 2, 3, and j= 1,2.

The other expected frequencies under independence of production shift and presence

of defect can be similarly calculated and are presented in the table below.

Expected Frequencies Under Independence of Production Shift and Presence of Defect

Shifts Without Defect With Defect Total

Morning 380 20 400

Afternoon 285 15 300

Night 285 15 300

Total 950 50 1000
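The expected frequencies in the table above depend only on the marginal totals and the grand total. The following is a minimal Python sketch of that calculation; the function name `expected_frequencies` is a hypothetical helper introduced here for illustration and is not part of the original notes.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under independence: E_ij = r_i * c_j / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Marginal totals from the production-shift example
row_totals = [400, 300, 300]   # Morning, Afternoon, Night
col_totals = [950, 50]         # Without Defect, With Defect

expected = expected_frequencies(row_totals, col_totals)
for shift, row in zip(["Morning", "Afternoon", "Night"], expected):
    print(shift, row)
```

Each printed row should agree with the corresponding row of the expected-frequency table.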

Note that the expected frequencies were calculated using the marginal frequencies

and the grand total. To illustrate,

$$E_{11} = \frac{r_1 c_1}{N} = \frac{(400)(950)}{1000} = 380$$

$$E_{12} = \frac{r_1 c_2}{N} = \frac{(400)(50)}{1000} = 20$$

If the observed number of units in the sample classified under the variables “production

shift” and “presence of defect” (the observed frequencies) are not much different from what is

expected (expected frequencies, Eij’s) under independence of production shift and presence of

defect in the population, then there is insufficient evidence based on the sample to reject

independence. But if the said differences are large, then we tend to reject the hypothesis of

independence of production shift and presence of defect in the units produced by the firm.

The question now is, how large should the differences between the observed frequencies

and the expected frequencies under independence be for the employee to reject independence

of production shift and presence of defect?

For example, suppose that the employee has already constructed the contingency table

of 1000 units classified according to production shift and presence of defect:

Observed Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect

Shifts Without Defect With Defect Total

Morning 392 8 400

Afternoon 275 25 300

Night 283 17 300

Total 950 50 1000

Note that the sample percentages of defectives are different for the three shifts: 2% for

Morning shift, 8.33% for Afternoon shift, 5.67% for Night shift. There are then differences

between the observed and the expected number of units classified according to production

shift and presence of defect. As an example, there are more units produced under afternoon

shift that have defects (O22=25) than what one would expect if production shift and presence of

defect are independent (E22=15). The differences Oij-Eij can be assessed for the three shifts

(i=1,2,3) and outcome (j=1,2). But are these differences large enough for us to reject

independence of production shift and presence of defect?

The Chi-Square Test of Independence is a formal statistical test that provides an

objective assessment as to whether or not the magnitude of the differences between observed

and expected frequencies is large enough to reject the null hypothesis (independence).

It makes use of the following test statistic and critical region:

Test statistic: $$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

Critical region: $$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$

where r = number of levels of the row variable and

c = number of levels of the column variable


1. Will χ² ever be a negative value? Why?

2. When will χ² take the value of zero?

3. When does χ² tend to be a large value?

Let us test the hypothesis of independence of production shift and presence of defect

using the chi-square test of independence.

Example: Given below is a 3 x 2 contingency table of 1000 units classified according to

production shift and presence of defect.

Observed Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect

Shifts Without Defect With Defect Total

Morning 392 8 400

Afternoon 275 25 300

Night 283 17 300

Total 950 50 1000

Test the hypothesis of independence of production shift and presence of defect at 5%

level of significance.


a. Hypothesis: Ho: Production shift and presence of defect are not related.

HA: Production shift and presence of defect are related.

b. Level of Significance: α = 0.05

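c. Test Statistic: $$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

d. Critical Region: Reject Ho if $$\chi^2 > \chi^2_{0.05,\,(3-1)(2-1)} = \chi^2_{0.05,\,2} = 5.991$$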


e. Computations:

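Observed and Expected Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect

Shifts      Without Defect          With Defect             Total
            Observed   Expected     Observed   Expected
Morning     392        380          8          20           400
Afternoon   275        285          25         15           300
Night       283        285          17         15           300
Total       950                     50                      1000

$$\chi^2 = \frac{(392-380)^2}{380} + \frac{(8-20)^2}{20} + \frac{(275-285)^2}{285} + \frac{(25-15)^2}{15} + \frac{(283-285)^2}{285} + \frac{(17-15)^2}{15} = 14.87719$$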


f. Decision: Since χ2 = 14.87719 > 5.991, we reject Ho at 5% level of significance.

g. Conclusion: At 5% level of significance, based on sample results, there is

sufficient evidence to say that presence of defect in a unit is related to the shift

when it was produced.
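The steps of this example can be checked numerically. The sketch below computes the expected frequencies, the test statistic and the degrees of freedom from the observed table alone; `chi_square_statistic` is a hypothetical helper name, and the critical value 5.991 is simply copied from the example rather than computed.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under Ho
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: Morning, Afternoon, Night; columns: Without Defect, With Defect
observed = [[392, 8], [275, 25], [283, 17]]
chi2, df = chi_square_statistic(observed)

critical_value = 5.991  # tabulated chi-square value for alpha = 0.05, df = 2 (from the text)
print(f"chi-square = {chi2:.5f}, df = {df}, reject Ho: {chi2 > critical_value}")
```

The same function applies to any r x c table of counts, provided the validity conditions on the expected frequencies are met.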


1. Mediterranean Diet Case Study: In the study, 605 survivors of heart attack who were made to

undergo either the AHA diet or the Mediterranean diet were monitored and classified according to

health condition. The resulting contingency table of subjects according to diet followed and health

condition is presented below.

Diet            Health Condition                                   Total
                Cancers   Deaths   Nonfatal Illness   Healthy
AHA             15        24       25                 239          303
Mediterranean   7         14       8                  273          302
Total           22        38       33                 512          605

Is there sufficient evidence to say that diet and health condition are related at α = 0.05?

2. A study was undertaken to examine factors related to mothers’ choice of infant feeding method.

One of the factors examined was monthly family income. Do the data below indicate an association

between family income and method of feeding? Use 0.10 level of significance.

Monthly Family Income   Method of Feeding     Total
                        Bottle     Breast
200 - 249               4          65         69
250 - 499               24         12         36
500 - 749               5          29         34
750 - 1000              6          4          10
Total                   39         110        149



1. The test is VALID if at least 80% of the cells have expected frequencies of AT

LEAST 5 and no cell has an expected frequency ≤ 1.

2. If many expected frequencies are very small, researchers commonly combine

categories of variables to obtain a table having larger cell frequencies. Generally,

one should not pool categories unless there is a natural way to combine them.

3. For a 2 x 2 contingency table, a correction called Yates’ correction for continuity is

applied. The formula then becomes:

$$\chi^2 = \sum_{i}\sum_{j}\frac{\left(|O_{ij}-E_{ij}| - 0.5\right)^2}{E_{ij}}$$

4. We could also test the independence of a 2 x 2 contingency table without

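computing the expected frequencies, using the formula below (Yates’ correction is already built into it):

Variable X     Variable Y                   Total
               Category 1    Category 2
Category A     a             b              a+b
Category B     c             d              c+d
Total          a+c           b+d            a+b+c+d = N

$$\chi^2 = \frac{N\left(|ad-bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$$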

Example: A service company is classified as small if the number of its employees is at

most 200, and it is classified as big otherwise. Profit in sales of services, such as training

and consulting, is classified as either low or high.

Profit Level Size of Company

Small Big

Low Service Profit 30 63

High Service Profit 75 32

Test whether the size of the service company is independent of the level of profit in sales

of services at the 0.05 level of significance.


a. Hypothesis: Ho: Size of the service company is independent of the level of profit

in sales of services.

HA: Size of the service company is not independent of the level of

profit in sales of services.

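b. Level of Significance: α = 0.05

c. Test Statistic: $$\chi^2 = \frac{N\left(|ad-bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$$

d. Critical Region: Reject Ho if $$\chi^2 > \chi^2_{0.05,\,(2-1)(2-1)} = \chi^2_{0.05,\,1} = 3.841$$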

e. Computations:

Profit Level Size of Company

Total Small Big

Low Service Profit 30 63 93

High Service Profit 75 32 107

Total 105 95 200

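$$\chi^2 = \frac{N\left(|ad-bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{200\left(|(30)(32)-(63)(75)| - \frac{200}{2}\right)^2}{(93)(107)(105)(95)} = 27.064$$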

f. Decision: Since χ2 = 27.064 > 3.841, we reject Ho at 5% level of significance.

g. Conclusion: At 5% level of significance, based on sample results, there is

sufficient evidence to say that Size of the service company is not independent of

the level of profit in sales of services.
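The shortcut computation for a 2 x 2 table can be sketched in a few lines of Python. The function name `chi_square_2x2_yates` is hypothetical, and the formula coded here is the continuity-corrected form, which is the one that reproduces the value 27.064 reported in the example.

```python
def chi_square_2x2_yates(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Low/High service profit (rows) by Small/Big company (columns)
chi2 = chi_square_2x2_yates(30, 63, 75, 32)

critical_value = 3.841  # tabulated chi-square value for alpha = 0.05, df = 1 (from the text)
print(f"chi-square = {chi2:.3f}, reject Ho: {chi2 > critical_value}")
```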



• Introduction to Correlation and Regression Analysis

6.1 Introduction to Correlation

In the estimation and hypothesis testing procedures for parameters that we performed

in the previous chapters, we are concerned about a single variable of interest which we

measure from units drawn from one or two populations.

In this chapter, the focus would be on finding and assessing relationships existing

between variables of at least interval scale. We are specifically interested in determining,

estimating and assessing a linear relationship existing between two variables.

6.1.1 Graphical Presentation of Linear Relationship

To visualize the relationship that exists between two variables X and Y of at least interval scale, a scatter diagram, which plots the ordered pairs of measurements (Xi, Yi) taken from n observations, would be helpful.

Example (GPI and Starting Salary):

Suppose a researcher wishes to investigate the relationship between the achieved grade-point index (GPI) and the starting salary of recent graduates majoring in business. A random sample of 30 recent graduates majoring in business is drawn, and the data pertaining to the GPI and starting salary (in thousands of dollars) are recorded for each individual in the following table:


GPI and Starting Salary of 30 Recent Graduates Majoring in Business (in thousands of dollars)

Individual GPI Salary Individual GPI Salary

1 2.7 17.0 16 3.0 17.4

2 3.1 17.7 17 2.6 17.3

3 3.0 18.6 18 3.3 18.1

4 3.3 20.5 19 2.9 18.0

5 3.1 19.1 20 2.4 16.2

6 2.4 16.4 21 2.8 17.5

7 2.9 19.3 22 3.7 21.3

8 2.1 14.5 23 3.1 17.2

9 2.6 15.7 24 2.8 17.0

10 3.2 18.6 25 3.5 19.6

11 3.0 19.5 26 2.7 16.6

12 2.2 15.0 27 2.6 15.0

13 2.8 18.0 28 3.2 18.4

14 3.2 20.0 29 2.9 17.3

15 2.9 19.0 30 3.0 18.5

Preliminary investigation on the scatterplot below indicates that there is a positive

linear relationship between the grade-point index and the starting salaries (in thousand dollars)

of the random sample of 30 recent graduates majoring in Business. As the grade-point index

increases, the starting salary (in thousand dollars) also tends to increase approximately, on a

straight line.







[Figure: Scatterplot of the Grade-Point Index vs. the Starting Salaries (in 000's) of 30 Recent Graduates Majoring in Business]


Though the linear relationship between the GPI and the starting salaries of the 30

recent graduates can be clearly seen in the above graph, it would be more informative to quantify the strength of the said linear relationship by computing a summary measure called the linear correlation coefficient.

The correlation coefficient is a summary measure of the strength of the linear relationship between two variables X and Y that is independent of their respective scales of measurement.


Properties of the Linear Correlation Coefficient:

1. The correlation coefficient ρ takes values from -1 to 1.

2. A positive value of ρ indicates that the line slopes upward to the right. This means that

as X increases, Y tends to increase also. Similarly as X decreases, Y also tends to

decrease. On the other hand, a negative value of ρ indicates that the line slopes

downward to the right. This means that as X increases, Y tends to decrease.

Alternatively, as X decreases, Y tends to increase.

3. When ρ is -1 or 1, there is perfect linear relationship between X and Y and all the points

(x, y) fall on a straight line. A ρ close to -1 or 1 indicates a strong linear relationship but it

does not necessarily imply that X causes Y or Y causes X. It is possible that a third

variable may have caused the change of both X and Y (spurious correlation), producing

the observed relationship.

4. If ρ = 0, then there is no linear correlation between X and Y. A value of ρ = 0, however,

does not mean a lack of association. For example, if a strong quadratic relationship

exists between X and Y, it is still possible to obtain a correlation of zero even though

the two variables are clearly related.

6.1.2 The Pearson Product Moment Coefficient of Correlation

For a random sample of n pairs of measurements (Xi, Yi), i = 1, 2, …, n, the Pearson Product Moment Coefficient of Correlation, denoted by r, can be computed using the formula:

$$r = \frac{\dfrac{\sum X_i Y_i}{n} - \left(\dfrac{\sum X_i}{n}\right)\left(\dfrac{\sum Y_i}{n}\right)}{\sqrt{\left(\dfrac{\sum X_i^2}{n} - \left(\dfrac{\sum X_i}{n}\right)^2\right)\left(\dfrac{\sum Y_i^2}{n} - \left(\dfrac{\sum Y_i}{n}\right)^2\right)}}$$


GPI and Starting Salary Example (Cont.)

Let Xi = grade-point index of the ith graduate majoring in Business

Yi = starting salary (in thousand dollars) of the ith graduate majoring in Business

From the calculated value of r, there is a strong positive linear relationship between the

GPI and the starting salary of the 30 recent graduates majoring in Business. As the GPI of the

recent graduate increases, the starting salary tends to increase.

Graduate (i)   GPI (Xi)   Starting Salary (Yi)   XiYi   Xi²   Yi²

1 2.7 17.0 45.9 7.29 289.00

2 3.1 17.7 54.87 9.61 313.29

3 3.0 18.6 55.8 9.00 345.96

4 3.3 20.5 67.65 10.89 420.25

5 3.1 19.1 59.21 9.61 364.81

6 2.4 16.4 39.36 5.76 268.96

7 2.9 19.3 55.97 8.41 372.49

8 2.1 14.5 30.45 4.41 210.25

9 2.6 15.7 40.82 6.76 246.49

10 3.2 18.6 59.52 10.24 345.96

11 3.0 19.5 58.5 9.00 380.25

12 2.2 15.0 33 4.84 225.00

13 2.8 18.0 50.4 7.84 324.00

14 3.2 20.0 64 10.24 400.00

15 2.9 19.0 55.1 8.41 361.00

16 3.0 17.4 52.2 9.00 302.76

17 2.6 17.3 44.98 6.76 299.29

18 3.3 18.1 59.73 10.89 327.61

19 2.9 18.0 52.2 8.41 324.00

20 2.4 16.2 38.88 5.76 262.44

21 2.8 17.5 49 7.84 306.25

22 3.7 21.3 78.81 13.69 453.69

23 3.1 17.2 53.32 9.61 295.84

24 2.8 17.0 47.6 7.84 289.00

25 3.5 19.6 68.6 12.25 384.16

26 2.7 16.6 44.82 7.29 275.56

27 2.6 15.0 39 6.76 225.00

28 3.2 18.4 58.88 10.24 338.56

29 2.9 17.3 50.17 8.41 299.29

30 3.0 18.5 55.5 9.00 342.25

Sum 87.0 534.3 1564.2 256.1 9593.4

(Sum)² 7569.0 285476.5

r = 0.865088
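The tabulated value of r can be checked directly from the 30 data pairs. The sketch below uses the sums-based form of the Pearson formula (numerator and denominator both multiplied by n, which leaves r unchanged); `pearson_r` is a hypothetical helper name, and only the standard library is assumed.

```python
from math import sqrt

# GPI (X) and starting salary in thousands of dollars (Y) for the 30 graduates
gpi = [2.7, 3.1, 3.0, 3.3, 3.1, 2.4, 2.9, 2.1, 2.6, 3.2,
       3.0, 2.2, 2.8, 3.2, 2.9, 3.0, 2.6, 3.3, 2.9, 2.4,
       2.8, 3.7, 3.1, 2.8, 3.5, 2.7, 2.6, 3.2, 2.9, 3.0]
salary = [17.0, 17.7, 18.6, 20.5, 19.1, 16.4, 19.3, 14.5, 15.7, 18.6,
          19.5, 15.0, 18.0, 20.0, 19.0, 17.4, 17.3, 18.1, 18.0, 16.2,
          17.5, 21.3, 17.2, 17.0, 19.6, 16.6, 15.0, 18.4, 17.3, 18.5]


def pearson_r(x, y):
    """Sample Pearson correlation computed from the five column sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(gpi, salary)
print(f"r = {r:.6f}")
```

Note that the column sums printed in the table are rounded to one decimal place, so a hand computation from those rounded sums gives a slightly smaller value than a computation from the raw data.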




1. The sample Pearson correlation coefficient r is used to estimate ρ based on a random

sample of n pairs of measurements (Xi, Yi), for i = 1, 2, …, n.

2. r also takes on values between -1 and 1.

3. Just like ρ, when r = 1 or -1, all the points (xi, yi), for i= 1, 2, …,n, fall on a straight line;

when r = 0, they are scattered and give no evidence of a linear relationship. Any other

value of r suggests the degree to which the points tend to be linearly related.

Example: The table below presents the pairs of measurements (Xi, Yi), for i = 1, 2, …, 30 and the

computation for the sample Pearson product moment coefficient of correlation r.

i Xi Yi XiYi Xi² Yi²


1 1 2 2 1 4

2 2 4 8 4 16

3 3 6 18 9 36

4 4 8 32 16 64

5 5 10 50 25 100

6 6 12 72 36 144

7 7 14 98 49 196

8 8 16 128 64 256

9 9 18 162 81 324

10 10 20 200 100 400

11 11 22 242 121 484

12 12 24 288 144 576

13 13 26 338 169 676

14 14 28 392 196 784

15 15 30 450 225 900

16 16 30 480 256 900

17 17 28 476 289 784

18 18 26 468 324 676

19 19 24 456 361 576

20 20 22 440 400 484

21 21 20 420 441 400

22 22 18 396 484 324

23 23 16 368 529 256

24 24 14 336 576 196

25 25 12 300 625 144

26 26 10 260 676 100

27 27 8 216 729 64

28 28 6 168 784 36

29 29 4 116 841 16

30 30 2 60 900 4

Sum 465 480 7440 9455 9920

(Sum)² 216225 230400

r = 0

It would be erroneous to conclude that since the value of r is 0, there is no relationship

between X and Y. Based on examination of the values of X and Y in the above table, and

based on what can be clearly seen on the scatter diagram below, there is a relationship

between X and Y, but this relationship is not linear.
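A short check of this example, written as a Python sketch: the Y values rise from 2 to 30 and then fall back to 2, so the numerator of r cancels exactly even though Y is completely determined by X. The variable names are illustrative only.

```python
# X = 1, ..., 30; Y climbs 2, 4, ..., 30 and then descends 30, 28, ..., 2
x = list(range(1, 31))
y = [2 * i for i in range(1, 16)] + [2 * i for i in range(15, 0, -1)]

n = len(x)
sum_xy = sum(a * b for a, b in zip(x, y))

# Numerator of the Pearson formula; r is zero whenever this quantity is zero
s_xy = sum_xy - sum(x) * sum(y) / n
print(sum(x), sum(y), sum_xy, s_xy)
```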

The correlation coefficient measures the degree of linear relationship between two

variables X and Y. A value of 0 for the correlation coefficient does not imply absence of a relationship.


Source: http://en.wikipedia.org/wiki/Correlation









[Figure: Scatterplot of Values of X vs. Values of Y]


Remark: Several sets of (x, y) points, with the Pearson correlation coefficient of x and y for each set.

Note that the correlation reflects the noisiness and direction of a linear relationship (top row),

but not the slope of that relationship (middle), nor many aspects of nonlinear relationships

(bottom). N.B.: the figure in the center has a slope of 0 but in that case the correlation

coefficient is undefined because the variance of Y is zero.

6.1.3 Hypothesis Testing Procedure for the Correlation Coefficient

The computation and interpretation of r is applicable to the data on hand. The linear relationship being described by the sample correlation holds for the same units from which the measurements on X and Y were obtained.

However, there is an important question that we may want answered: Does the linear relationship described by the sample correlation hold for the population from which the sample of units has been taken? Or did the presence of a linear relationship among the sample measurements of X and Y only occur by chance?

To answer this question, we can perform a hypothesis testing procedure for the correlation coefficient.

Ho: ρ = 0

Test Statistic: $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \quad v = n-2$$

Ha        Critical Region
ρ < 0     t < -tα, v
ρ > 0     t > tα, v
ρ ≠ 0     |t| > tα/2, v

GPI and Starting Salary Example (Cont.)

Is the linear correlation coefficient between GPI and starting salary of all recent graduates majoring in Business significantly different from zero? Test at α = 0.01.

a. Hypotheses: Ho: ρ = 0 vs. Ha: ρ ≠ 0

b. Level of Significance: α = 0.01

c. Test Statistic: $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \quad v = n-2$$

d. Critical Region: |t| > tα/2, n-2 = t0.005, 28 = 2.763

e. Computations: r = 0.865088, n = 30

$$t = \frac{0.865088\sqrt{30-2}}{\sqrt{1-(0.865088)^2}} = 9.1256$$



f. Decision: Since |t| = 9.1256 > 2.763, we reject Ho at 1% level of significance.

g. Conclusion: At 1% level of significance, based on sample results, there is sufficient evidence to say that the linear relationship between GPI and starting salaries of recent graduates majoring in Business is significant.

6.2 Introduction to Regression Analysis

Until now, we have discussed statistical inferences based on the sample measurements

of a single variable. In many investigations, two or more variables are observed for each

experimental unit in order to determine: (1) whether the variables are related, (2) how

strong the relationships appear to be and (3) whether one variable of primary interest can be

predicted from observations of the other variables.

Regression analysis concerns the study of relationships between variables with the

object of identifying, estimating, and validating the relationship. The estimated relationship

can then be used to predict one variable from the value of the other variable/s. In this

course, we study the subject with specific reference to the straight-line model.

6.2.1 Simple Linear Regression (SLRM)

A regression problem involving a single predictor (also called simple linear regression) arises when we wish to study the relation between two variables X and Y and use it to predict Y from X. The variable X acts as an independent variable whose values are controlled, while the variable Y depends on X and is also subject to unaccountable variations or errors.

X – is called the independent/explanatory/predictor/causal or input variable

Y – is called the dependent or response variable

Illustration (Drug Evaluation Study): In one stage of the development of a new drug for an allergy, an experiment is conducted to study how different dosages of the drug affect the duration of relief from the allergic symptoms. Ten patients are included in the experiment. Each patient receives a specified dosage of the drug and is asked to report back as soon as the protection of the drug seems to wear off. The observations are recorded below, showing the dosage x and duration of relief y for the 10 patients.


Dosage (in mL) and the Number of Days of Relief from Allergy of 10 Patients

Dosage (xi)   Duration of Relief (yi)

3 9

3 5

4 12

5 9

6 14

6 16

7 22

8 18

8 24

9 22

Seven different dosages are used in the experiment, and some of these are repeated for

more than one patient. A glance at the table shows that y generally increases with x, but it is

difficult to say much more about the form of the relation simply by looking at this tabular data.


Generally, for any (generic) experiment, we use n to denote the sample size or the

number of runs of the experiment. Each run gives a pair of observations (x, y) in which x is

the fixed setting of the independent variable and y denotes the corresponding response.

Data Structure for a Simple Regression

Independent Variable    Response Variable
x1                      y1
x2                      y2
x3                      y3
...                     ...
xn                      yn

We always begin our analysis by plotting the data because the eye can easily detect

patterns along a line or a curve. Thus, plotting a scatter diagram is an important preliminary

step prior to undertaking a formal statistical analysis of the relationship between two variables.


Recall that if the relation between y and x is exactly a straight line, then the variables

are connected by the formula

$$y = \beta_0 + \beta_1 x$$

where β0 indicates the intercept of the line with the y-axis and β1 represents the slope of

the line. Statistical ideas must be introduced into the study of the relation when the points in a

scatter diagram do not lie perfectly on a line, as in the scatter diagram of the drug evaluation data. We think of these

data as observations on an underlying linear relation that is being masked by random

disturbances or experimental errors due in part to differences in severity of allergy, physical

condition of subjects, their environment, and so on. Given this viewpoint, we formulate the

following linear regression model as a tentative representation of the mode of relationship

between y and x.


We assume that the response Y is a random variable that is related to the predictor variable x by:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

1. Yi denotes the response corresponding to the ith observation/experimental unit in which the input variable x is set at the value xi.

2. ε1, ε2, …, εn are the unknown error components that are superimposed on the true linear relation. We assume that they are normally distributed with mean 0 and an unknown variance σ².

3. The parameters β0 and β1, which together locate the straight line, are unknown.








[Figure: Scatter Diagram of Dosage (x) against Duration of Relief (y)]


6.2.2 Least Squares Method of Parameter Estimation

Because we are again using only a sample from the entire population of interest, we can only estimate β0 and β1. The problem of estimating the regression parameters β0 and β1 can be viewed as fitting the best line of the y to x relationship on the scatter diagram. One can draw a line by eyeballing the scatter diagram, but such a judgment may be open to dispute. Moreover, statistical inferences cannot be based on a line that is estimated subjectively. On the other hand, the method of least squares is an objective and efficient method of determining the best-fitting straight line. Moreover, this method is quite versatile because its application extends beyond the simple straight-line regression model.

Suppose that an arbitrary line y = b0 + b1x is drawn on the scatter diagram. At the value xi of the independent variable, the y-value predicted by this line is b0 + b1xi, whereas the observed value is yi. The discrepancy between the observed and predicted y’s is then di = yi - b0 - b1xi, which is the vertical distance of the point from the line.

Considering such discrepancies at all the n points, we take

$$D = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

as an overall measure of the discrepancy of the observed points from the trial line y = b0 + b1x. The magnitude of D obviously depends on the line that is drawn. In other words, it depends on b0 and b1, the two quantities that determine the trial line. A good fit will make D as small as possible. We now state the principle of least squares in general terms to indicate its usefulness in fitting many other models.

Determine the values for the parameters so that the overall discrepancy

$$D = \sum (\text{observed response} - \text{predicted response})^2$$

is minimized. The estimates thus determined are called the least squares estimates.

For the straight-line model, the least squares principle involves the determination of b0 and b1 to minimize

$$D = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2.$$

The quantities b0 and b1 thus determined are denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$, respectively, and are called the least squares estimates of the regression parameters β0 and β1. The best-fitting straight line is then given by the equation

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
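For readers who want to see where the least squares estimates come from, the following is a brief sketch of the standard calculus argument. It is not taken from the original notes; it only assumes that the discrepancy D is the sum of squared vertical distances described in this section.

```latex
% Set both partial derivatives of D(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2 to zero
\frac{\partial D}{\partial b_0} = -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0,
\qquad
\frac{\partial D}{\partial b_1} = -2\sum_{i=1}^{n} x_i\,(y_i - b_0 - b_1 x_i) = 0

% Rearranging gives the two normal equations
n\,b_0 + b_1 \sum x_i = \sum y_i,
\qquad
b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i

% Solving the normal equations for b_1 and then b_0
b_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The solution exists and is unique as long as the x values are not all equal, since the denominator of b1 is then positive.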



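The formulas for the least squares estimates are:

Least Squares Estimate of β0: $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Least Squares Estimate of β1: $$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$

where

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \qquad S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$

so that the estimated regression line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.

$\hat{\beta}_0$ is the estimated mean value of Y when the value of X is set to 0.

$\hat{\beta}_1$ is the estimated increase/decrease in the mean of Y for every unit increase in the value of X.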

Cause-and-Effect and Linear Relationships

One source of misconception of statistics in the area of correlation and regression is

the inference of a cause-and-effect relationship between characteristics from the

appearance of a strong linear relationship.

When anyone states, “Studies show that A is a cause of B and some statistics back it

up,” be ready to reply, “Correlation does not imply causation.” Always be on the lookout for

what lurks beneath the data.


1. Sleeping with one's shoes on is strongly correlated with waking up with a headache.

Therefore, sleeping with one's shoes on causes headache.

The above example commits the correlation-implies-causation fallacy, as it prematurely concludes

that sleeping with one's shoes on causes headache. A more plausible explanation is that both are

caused by a third factor, in this case going to bed drunk, which thereby gives rise to a correlation. So

the conclusion is false.



2. Young children who sleep with the light on are much more likely to develop myopia in

later life. Therefore, sleeping with the light on causes myopia.

This is a scientific example that resulted from a study at the University of Pennsylvania Medical

Center. Published in the May 13, 1999 issue of Nature, the study received much coverage at the time

in the popular press. However, a later study at Ohio State University did not find that infants

sleeping with the light on caused the development of myopia. It did find a strong link between

parental myopia and the development of child myopia, also noting that myopic parents were more

likely to leave a light on in their children's bedroom. In this case, the cause of both conditions is

parental myopia, and the above-stated conclusion is false.

3. As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore,

ice cream consumption causes drowning.

The aforementioned example fails to recognize the importance of time and temperature in

relationship to ice cream sales. Ice cream is sold during the hot summer months at a much greater

rate than during colder times, and it is during these hot summer months that people are more likely

to engage in activities involving water, such as swimming. The increased drowning deaths are

simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.


4. Since the 1950s, both the atmospheric CO2 level and obesity levels have increased

sharply. Hence, atmospheric CO2 causes obesity.

A more plausible explanation is that both are driven by rising affluence: richer populations tend to eat more food and consume more energy.

5. HDL ("good") cholesterol is negatively correlated with incidence of heart attack.

Therefore, taking medication to raise HDL will decrease the chance of having a heart attack.


Further research has called this conclusion into question. Instead, it may be that other underlying

factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart

attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without

affecting the chance of heart attack.

These conclusions are examples of spurious (false) correlation. It simply means that

two variables are correlated due to some other variable that is related to both of the former variables.





1. Apply the least squares method to the given dataset on the Drug Evaluation Study


Computations for the Least Squares Line


Dosage (x)   Duration of Relief (y)   x²   xy   ŷ

3 9 9 27 7.15

3 5 9 15 7.15

4 12 16 48 9.89

5 9 25 45 12.63

6 14 36 84 15.37

6 16 36 96 15.37

7 22 49 154 18.11

8 18 64 144 20.86

8 24 64 192 20.86

9 22 81 198 23.60

Sum 59 151 389 1003 151



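$$\hat{\beta}_1 = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1003 - \frac{(59)(151)}{10}}{389 - \frac{(59)^2}{10}} = \frac{112.1}{40.9} = 2.74$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 15.1 - (2.74)(5.9) = -1.07$$

Thus, the estimated regression line is given by: $\hat{y} = -1.07 + 2.74x$.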
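The hand computation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; `least_squares_line` is a hypothetical helper name introduced for illustration.

```python
def least_squares_line(x, y):
    """Return (b0, b1), the least squares intercept and slope for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Drug Evaluation Study: dosage and duration of relief (days) for 10 patients
dosage = [3, 3, 4, 5, 6, 6, 7, 8, 8, 9]
relief = [9, 5, 12, 9, 14, 16, 22, 18, 24, 22]

b0, b1 = least_squares_line(dosage, relief)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")

# Fitted values, comparable to the last column of the computation table
fitted = [round(b0 + b1 * xi, 2) for xi in dosage]
print(fitted)
```

The deviation form of Sxx and Sxy used in the code is algebraically the same as the sums-based form used in the hand computation.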








[Figure: Scatter Diagram of Dosage (x) against Duration of Relief (y)]


2. Suppose that the following data were collected on emphysema patients: the number of

years the patient smoked (x) and a physician’s evaluation of the patient’s lung capacity

(y) (measured on a scale of 0 to 100). The results for a sample of ten patients appear in

the accompanying table:

Patient   Years of Smoking (X)   Lung Capacity (Y)

1 25 55

2 36 60

3 22 50

4 15 30

5 48 75

6 39 70

7 42 70

8 31 55

9 28 30

10 33 35

i. Plot the data on a scatter diagram.

ii. Use the method of least squares to estimate the regression line.

iii. Predict a person’s lung capacity after 30 years of smoking.


For (i), the scatter plot is given below:

[Figure: Scatter Diagram of Years of Smoking vs. Lung Capacity, with Years of Smoking on the horizontal axis, Lung Capacity on the vertical axis, and the fitted line y = 1.3092x + 11.238]
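Parts (ii) and (iii) can be worked out with the same least squares formulas. The sketch below is one possible way to do it in Python, not the notes' own solution; the variable names are illustrative, and the fitted coefficients can be compared with the line shown on the scatter diagram.

```python
# Emphysema exercise: years of smoking (x) and lung capacity rating (y)
years = [25, 36, 22, 15, 48, 39, 42, 31, 28, 33]
capacity = [55, 60, 50, 30, 75, 70, 70, 55, 30, 35]

n = len(years)
sum_x, sum_y = sum(years), sum(capacity)
sum_x2 = sum(x * x for x in years)
sum_xy = sum(x * y for x, y in zip(years, capacity))

# (ii) least squares estimates from the sums-based formulas
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n
print(f"fitted line: y = {intercept:.3f} + {slope:.4f} x")

# (iii) predicted lung capacity rating after 30 years of smoking
prediction_30 = intercept + slope * 30
print(f"prediction at x = 30: {prediction_30:.2f}")
```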



6.3 Diagnostic Checking for Regression

6.3.1 Residual and Error Variance Estimate

The individual deviations of the observations yi from the fitted values ŷi are called the residuals, and we denote these by ei. That is,

$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \quad i = 1, \dots, n$$

Remarks:

1. The residuals are not viewed as ‘estimates’ of the error terms εi; however, they are important because their values are used to check the assumptions of the regression model.

2. Although some residuals are positive and some negative, a property of the least squares fit is that the sum of the residuals is always zero.

3. The residual sum of squares is also called the sum of squares due to error and is abbreviated as SSE.

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

4. An estimate s² of the error variance σ² is obtained by dividing SSE by n-2. The reduction by two is because two degrees of freedom are lost from estimating the two parameters β0 and β1.

Example (Drug Evaluation Study cont.): Using the data from the Drug Evaluation Study, compute the estimate of the error variance.

yi    9      5      12     9      14     16     22     18     24     22

ŷi    7.15   7.15   9.89   12.63  15.37  15.37  18.11  20.86  20.86  23.60

ei    1.85   -2.15  2.11   -3.63  -1.37  0.63   3.89   -2.86  3.14   -1.60

so that,

$$s^2 = \frac{SSE}{n-2} = \frac{(1.85)^2 + (-2.15)^2 + \cdots + (-1.60)^2}{10 - 2} \approx \frac{63.7}{8} \approx 7.96$$
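As a rough check of the residual properties in the Drug Evaluation Study, the following Python sketch recomputes the residuals, SSE and the error variance estimate. Variable names are illustrative. The coefficients are kept unrounded, so SSE differs slightly from a hand computation based on residuals rounded to two decimals.

```python
# Drug Evaluation Study data
x = [3, 3, 4, 5, 6, 6, 7, 8, 8, 9]            # dosage
y = [9, 5, 12, 9, 14, 16, 22, 18, 24, 22]     # duration of relief

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((v - x_bar) ** 2 for v in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * v for v in x]                  # fitted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - fitted value

sse = sum(e ** 2 for e in residuals)   # residual sum of squares
s2 = sse / (n - 2)                     # estimate of the error variance

print("residuals:", [round(e, 2) for e in residuals])
print(f"sum of residuals = {sum(residuals):.10f}")
print(f"SSE = {sse:.2f}, s^2 = {s2:.2f}")
```

The sum of the residuals is zero up to floating-point error, as stated in the second remark.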



In terms of the sums of squares of X and Y,

$$SSE = S_{YY} - \frac{S_{XY}^2}{S_{XX}}$$

$$s^2 = \frac{SSE}{n-2} = \frac{S_{YY} - S_{XY}^2 / S_{XX}}{n-2}$$

6.3.2 Inferences on β1 and β0 (Test of Significance)

It is important to remember that the line obtained by the principle of least squares is an estimate of the unknown true regression line. In our drug evaluation problem, the estimated line is

$$\hat{y} = -1.07 + 2.74x$$

Its slope 2.74 suggests that the mean duration of relief increases by 2.74 days for each unit increase in the dosage of the drug. Also, if we were to estimate the mean or expected duration of relief for a specified dosage x = 4.5 mg, we would naturally use the fitted regression line to calculate the estimate -1.07 + 2.74(4.5) = 11.26 days. A few questions concerning these estimates naturally arise at this point.

1. In light of the value 2.74 for $\hat{\beta}_1$, could the slope β1 of the true regression line be as much as 4? Could it be zero, so that the true regression line is y = β0, which does not depend on x? What are the plausible values for β1?

2. How much uncertainty should be attached to the estimated duration of 11.26 days corresponding to the given dosage x = 4.5?

To answer these and other related questions, we must know something about the sampling distribution of the least squares estimators. These sampling distributions will enable us to test hypotheses and set confidence intervals for the parameters β0 and β1 that determine the straight line, and for the straight line itself.

Again, the t-distribution would be used.



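Inferences Concerning the Slope β1

The standard error of the least squares estimator $\hat{\beta}_1$ is

$$S.E.(\hat{\beta}_1) = \frac{\sigma}{\sqrt{S_{XX}}}$$

and the estimated standard error, with σ replaced by $s = \sqrt{SSE/(n-2)}$, is given by

$$\widehat{S.E.}(\hat{\beta}_1) = \frac{s}{\sqrt{S_{XX}}}$$

(i) To test H0: β1 = 0 vs. Ha: β1 ≠ 0, use the test statistic

$$t = \frac{\hat{\beta}_1}{s / \sqrt{S_{XX}}}$$

and reject H0 when |t| ≥ tα/2, n-2

(ii) A (1-α)100% confidence interval for β1 is

$$\left( \hat{\beta}_1 - t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{S_{XX}}},\ \ \hat{\beta}_1 + t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{S_{XX}}} \right)$$

where tα/2, n-2 is the upper α/2 point of the t-distribution with n-2 degrees of freedom

Inferences Concerning the Intercept β0

The standard error of the least squares estimator $\hat{\beta}_0$ is

$$S.E.(\hat{\beta}_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}}$$

and the estimated standard error is given by

$$\widehat{S.E.}(\hat{\beta}_0) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}}$$

(i) To test H0: β0 = 0 vs. Ha: β0 ≠ 0, use the test statistic

$$t = \frac{\hat{\beta}_0}{s \sqrt{1/n + \bar{x}^2 / S_{XX}}}$$

and reject H0 when |t| ≥ tα/2, n-2

(ii) A (1-α)100% confidence interval for β0 is

$$\left( \hat{\beta}_0 - t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}},\ \ \hat{\beta}_0 + t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}} \right)$$

where tα/2, n-2 is the upper α/2 point of the t-distribution with n-2 degrees of freedom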
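The standard errors, test statistics and confidence intervals above can be sketched in Python for the Drug Evaluation Study data. Two things here are assumptions rather than part of the notes: the level α = 0.05, and the hard-coded critical value 2.306, which is the usual tabulated upper 0.025 point of the t-distribution with 8 degrees of freedom. Variable names are illustrative.

```python
from math import sqrt

# Drug Evaluation Study data
x = [3, 3, 4, 5, 6, 6, 7, 8, 8, 9]            # dosage
y = [9, 5, 12, 9, 14, 16, 22, 18, 24, 22]     # duration of relief

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((v - x_bar) ** 2 for v in x)
syy = sum((v - y_bar) ** 2 for v in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
s = sqrt((syy - sxy ** 2 / sxx) / (n - 2))   # estimate of sigma

se_b1 = s / sqrt(sxx)                        # estimated S.E. of the slope
se_b0 = s * sqrt(1 / n + x_bar ** 2 / sxx)   # estimated S.E. of the intercept

t_b1 = b1 / se_b1   # statistic for H0: beta1 = 0
t_b0 = b0 / se_b0   # statistic for H0: beta0 = 0

T_CRIT = 2.306      # assumed: t(0.025, 8) from a standard t-table
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)

print(f"slope:     t = {t_b1:.2f}, 95% CI = ({ci_b1[0]:.2f}, {ci_b1[1]:.2f})")
print(f"intercept: t = {t_b0:.2f}, 95% CI = ({ci_b0[0]:.2f}, {ci_b0[1]:.2f})")
```

An interval that excludes zero corresponds to rejecting the null hypothesis at the same level, so the interval and the two-sided test always agree.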




1. Do the data on the Drug Evaluation Study constitute strong evidence that the intercept

and the slope are both significantly different from zero?


For β1, we are to test the null hypothesis H0: β1 = 0 against the alternative hypothesis Ha: β1 ≠ 0. The test statistic is given by:

$$t = \frac{\hat{\beta}_1}{s / \sqrt{S_{XX}}} = \frac{2.74}{\sqrt{7.96} / \sqrt{40.9}} = 6.21$$

Since t = 6.21 ≥ tα/2, n-2 (2.306 for α = 0.05 and n-2 = 8 degrees of freedom), we conclude that β1 is significantly different from zero and that the duration of relief tends to vary linearly (in this case, increasing) with the dosage of the drug over the range of values considered in the study.

For β0, we are to test the null hypothesis H0: β0 = 0 against the alternative hypothesis Ha: β0 ≠ 0. The test statistic is given by:

$$t = \frac{\hat{\beta}_0}{s \sqrt{1/n + \bar{x}^2 / S_{XX}}} = \frac{-1.07}{\sqrt{7.96}\,\sqrt{1/10 + (5.9)^2 / 40.9}} = -0.3888$$

Since |t| = 0.3888 < tα/2, n-2, we do not reject the null hypothesis; the intercept is not significantly different from zero. In any case, the parameter β0 is of little importance to us because the range of x values covered in the experiment was 3 to 9, and it would be unrealistic to extend the line to x = 0.

2. Do the data on the Smoking and Lung Capacity Study constitute strong evidence that

the intercept and the slope are both significantly different from zero?


For β1, we are to test the null hypothesis H0: β1 = 0 against the alternative hypothesis Ha: β1 ≠ 0. The test statistic is given by:

$$t = \frac{\hat{\beta}_1}{s / \sqrt{S_{XX}}}$$

For β0, we are to test the null hypothesis H0: β0 = 0 against the alternative hypothesis Ha: β0 ≠ 0. The test statistic is given by:

$$t = \frac{\hat{\beta}_0}{s \sqrt{1/n + \bar{x}^2 / S_{XX}}}$$

Conclusion?
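One way to evaluate both test statistics for the Smoking and Lung Capacity data is sketched below. The helper `t_statistics` is a hypothetical name, and the level α = 0.05 with critical value 2.306 (the standard tabulated t-value for 8 degrees of freedom) is an assumption, since the notes do not fix a level for this example.

```python
from math import sqrt

# Smoking and lung capacity data for the ten patients
years = [25, 36, 22, 15, 48, 39, 42, 31, 28, 33]
capacity = [55, 60, 50, 30, 75, 70, 70, 55, 30, 35]


def t_statistics(x, y):
    """Return (t for the slope, t for the intercept) of a simple linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt((syy - sxy ** 2 / sxx) / (n - 2))   # estimate of sigma
    t_slope = b1 / (s / sqrt(sxx))
    t_intercept = b0 / (s * sqrt(1 / n + x_bar ** 2 / sxx))
    return t_slope, t_intercept


T_CRIT = 2.306  # assumed: t(0.025, 8) from a standard t-table
t_slope, t_intercept = t_statistics(years, capacity)
for name, t in (("slope", t_slope), ("intercept", t_intercept)):
    decision = "reject H0" if abs(t) >= T_CRIT else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {decision}")
```

Under these assumptions the slope statistic comes out at roughly 3.46 and the intercept statistic at roughly 0.89, so only the slope would be judged significantly different from zero.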



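6.3.3 The Coefficient of Determination

To arrive at a measure of adequacy of the straight-line model, we examine how much of the variation in the response variable is explained by the fitted regression line. To this end, we view an observed yi as consisting of two components:

$$\underbrace{y_i}_{\text{observed value}} = \underbrace{(\hat{\beta}_0 + \hat{\beta}_1 x_i)}_{\text{part explained by the linear relation}} + \underbrace{(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)}_{\text{part unaccounted for by the linear relation}}$$

or in terms of the estimated regression model,

$$\underbrace{y_i}_{\text{observed value}} = \underbrace{\hat{y}_i}_{\text{predicted value}} + \underbrace{e_i}_{\text{residual}}$$

Recall that the purpose of regression is to explain the variability of the dependent variable Y using the variability of the independent variable x. The total variability of the y-values is reflected in the sum of squared deviations from their mean, that is

$$S_{YY} = \sum (y_i - \bar{y})^2$$

Using the formula for SSE,

$$SSE = S_{YY} - \frac{S_{XY}^2}{S_{XX}}$$

we have,

$$\underbrace{S_{YY}}_{\text{total variability of } y} = \underbrace{\frac{S_{XY}^2}{S_{XX}}}_{\text{variability explained by the linear relation}} + \underbrace{SSE}_{\text{unexplained variability (due to other factors)}}$$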




1. The first term on the right-hand side of the equation is called the sum of squares due to linear regression (SSR). Likewise, SYY is also called the total sum of squares of Y (SST).


2. In order for the straight-line model to be considered as providing a good fit of the

data, SSR should comprise a large portion of SST.

3. As an index of how well the straight-line model fits, it’s reasonable to consider the

proportion of the y-variability explained by the linear relation.

Definition: The Coefficient of Determination, denoted by R2, is given by

$$R^2 = \frac{SSR}{SST} = \frac{S_{XY}^2}{S_{XX}\, S_{YY}}$$

It is the proportion of variation in Y that can be attributed to a linear relationship between X and Y.


1. Let us consider the drug evaluation data given in the previous sections. From our

earlier computations,

SYY = 370.9    SXX = 40.9    SXY = 112.1

The coefficient of determination is given by

$$R^2 = \frac{S_{XY}^2}{S_{XX}\, S_{YY}} = \frac{(112.1)^2}{(40.9)(370.9)} \approx 0.83$$

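2. For the lung capacity data,

$$R^2 = \frac{S_{XY}^2}{S_{XX}\, S_{YY}} = \frac{\left(18055 - (319)(530)/10\right)^2}{\left(11053 - (319)^2/10\right)\left(30600 - (530)^2/10\right)} = \frac{(1148)^2}{(876.9)(2510)} \approx 0.60$$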
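The decomposition of SST into SSR and SSE, and the resulting R2, can be verified numerically for both data sets. The function name `r_squared` is illustrative and not part of the notes.

```python
def r_squared(x, y):
    """Return (ssr, sse, sst, r2) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    sst = sum((v - y_bar) ** 2 for v in y)          # SYY, total sum of squares
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ssr = sxy ** 2 / sxx                            # explained by the line
    sse = sst - ssr                                 # left unexplained
    return ssr, sse, sst, ssr / sst


# Drug Evaluation Study
dosage = [3, 3, 4, 5, 6, 6, 7, 8, 8, 9]
relief = [9, 5, 12, 9, 14, 16, 22, 18, 24, 22]

# Smoking and Lung Capacity Study
years = [25, 36, 22, 15, 48, 39, 42, 31, 28, 33]
capacity = [55, 60, 50, 30, 75, 70, 70, 55, 30, 35]

_, _, sst_drug, r2_drug = r_squared(dosage, relief)
_, _, sst_lung, r2_lung = r_squared(years, capacity)

print(f"drug data: SST = {sst_drug:.1f}, R^2 = {r2_drug:.2f}")
print(f"lung data: SST = {sst_lung:.1f}, R^2 = {r2_lung:.2f}")
```

The straight line accounts for a larger share of the variation in the drug data than in the lung capacity data.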



Remark: When the value of R2 is small, we can only conclude that a straight-line relation does not give a good fit to the data. Such a case may arise for either of the following reasons:

i. There is little relation between the variables, in the sense that the scatter diagram fails to exhibit any pattern. In this case, the use of a different regression model is not likely to reduce the SSE or explain a substantial part of SST.

ii. There is prominent relation but it is nonlinear in nature, that is, the scatter

diagram is banded more around a curve than a line. The part of SST that is

explained by a straight-line regression is small because the model is

inappropriate. Some other relationship may improve the fit substantially.



Assignment 1:

Classify each statement according to the level of measurement used to get the value 7:

1) Teddy measured the temperature of the object as 7 .

2) Teddy has a score of 7 in the Stat Quiz.

3) Teddy’s basketball shirt number is 7.

4) Teddy’s shoe size is 7.

5) Teddy has 7 cousins.

What method of data collection is most appropriate for the following cases (survey, experiment,


6) Studying two groups of patients and determining if exercise lowers the blood


7) A group of medical intern students studies the effects of laughter to patients in a


8) An NGO compares the household expenditures in Quezon City.

9) A car manufacturer studies the preference of cars for the next production.

10) The DOH evaluates the benefits of the family planning methods given to a certain


Exercise 1: Construct the Frequency Distribution Table of the Final Grades of the 92 Stat

101 Students last Semester of the.

68 68 95 88 84 43 74 80 76 68

81 80 92 79 71 90 76 78 64 75

80 65 79 67 90 84 71 71 78

76 95 81 77 44 50 65 70 70

96 18 91 83 47 66 43 84 62

83 83 68 32 62 91 77 72 82

60 81 82 70 73 83 83 72 75

66 88 56 86 53 93 76 93 61

83 96 77 90 92 85 80 75 82

78 18 84 65 76 70 89 93 70



Assignment 2: Compute for the Variance and Standard deviation of height, weight and


Exercise 2: Compute the Skewness of each of the THREE data sets (A, B, and C).




68 74 78 80 85

58 76 86 89 92

58 70 71 80 93

68 74 79 80 86

59 76 87 89 92

59 70 71 81 94

69 75 79 81 87

59 76 87 89 92

59 70 71 83 94

69 75 79 82 87

59 76 87 89 93

59 70 72 84 94

69 75 79 82 88

61 76 88 90 93

64 69 72 84 94

70 76 79 83 89

62 76 88 90 94

64 69 72 85 94

71 76 79 83 89

63 76 88 90 97

64 69 73 85 97

71 77 79 83 89

64 77 88 91 98

64 69 73 85 98

72 78 79 84 90

65 77 88 91 98

66 70 73 86 98

73 78 80 84 90

66 78 88 92 98

66 71 74 86 98

Mean = 79

Mean = 82

Mean = 77

Exercise 3:


the probabilities of the following events:

a) AB

b) BAc

c) (AB)c

d) ABc

e) (ABc) (BAc)

f) Ac Bc

II. A die is loaded so that all the numbers have the same chance of occurrence except for a 6, whose chance of coming up is three times the chance of any other number coming up. Find the probabilities of the following events: (Hint: consider 8 possible outcomes.)

a) Event that a 6 comes up in a single toss.

b) Event of observing an even number in a single toss.

c) Event of observing a number less than 5 in a single toss.



III. Three methods, A, B and C, are available for teaching a certain industrial skill. Only one of these methods is used in teaching a particular worker. The failure rate is 20% for method A, 10% for method B, and 30% for method C. However, method B is a lot more expensive and hence is used only 10% of the time, while method C is very cheap and is used 50% of the time. Suppose a worker was selected at random and failed to learn the skill correctly. What is the probability that the worker was taught using method C?

IV. Suppose that 30% of the licensed drivers in Metro Manila are incompetent. Suppose also that a diagnostic test is available. If a randomly selected driver is incompetent, the probability that the test will so indicate is 0.9; and if the selected driver is competent, the probability that the test will so indicate is 0.85.

a) Given that the test indicates that a particular driver is incompetent, what is the probability that the test is correct?

b) Given that the test indicates that a particular driver is competent, what is the probability that the test is wrong?

V. The probability that a Japanese industry will locate in Cebu is 0.7. The probability that

it will locate in Bataan is 0.3, and the probability that it will locate in at least one of the two provinces is 0.79. Define A = event that a Japanese industry will locate in Cebu and B = event that a Japanese industry will locate in Bataan. Are A and B independent events? Justify your answer by showing that the condition/s of independence is/are satisfied or not satisfied.

VI. Suppose that a computer contains 6 boards, 2 of which are defective and the remaining 4 boards are non-defective. Four boards are selected randomly and each one is examined to determine if it is defective or not. Define X = number of defective boards in a sample of size 4.

a) Construct the probability mass function of X.

b) Use the PMF derived in (a) to compute for the probability that the sample

selected will contain only 1 defective board.

c) Use the PMF derived in (a) to compute for the probability that the sample

selected will contain at least 1 defective board.

VII. The CDF of a continuous random variable X is as follows:

( ) { ≤ ≤

Find the following probabilities using this CDF:

a) P(X > 0.25)

b) P (0.3 < X < 0.7)

c) P(0.4 ≤ X ≤ 1.25)



VIII. Suppose a gambler wins 50 PhP if the sum of dots in a toss of a pair of fair dice is

either 7 or 11, and loses 10 PhP, otherwise. Find the expected gain/loss.

IX. A wine’s distinctive taste is a result of aging it in wooden casks. Some of the wine evaporates while it is aging in the porous wooden casks. Define X = percentage of wine in the cask that is lost due to evaporation. Suppose X is normally distributed with a mean of 5% and a standard deviation of 1%. What is the probability of losing more than 7.5% of the wine due to evaporation?

X. Suppose that the IQ’s of applicants of a certain science high school follow a normal distribution with mean of 120 and a standard deviation of 9.

a) One of the requirements of the school in accepting a student is that the student’s IQ must be at least 115. What proportion of the applicants will be rejected on the basis of their IQ?

b) What is the 97.5th percentile IQ of the applicants?

Assignment 3: I. Suppose the measures of the 40 elements in the population are as follows:

Element x Element x Element x Element x

1 2 11 22 21 42 31 62

2 4 12 24 22 44 32 64

3 6 13 26 23 46 33 66

4 8 14 28 24 48 34 68

5 10 15 30 25 50 35 70

6 12 16 32 26 52 36 72

7 14 17 34 27 54 37 74

8 16 18 36 28 56 38 76

9 18 19 38 29 58 39 78

10 20 20 40 30 60 40 80

Suppose we select a sample of size 4 from this population using systematic sampling. If

n=4 then the sampling interval is k=40/4=10. There will be only 10 possible samples if we

use systematic sampling and we will be giving each one of these samples the same

chances of selection.

a.) List down all of the 10 possible samples.

b.) Construct the sampling distribution of X̄.

c.) Determine E(X̄) and Var(X̄).
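The systematic sampling scheme described above (sampling interval k = N/n, one random start among the first k elements) can be sketched as follows. This is only an illustration of the mechanics, with illustrative variable names; the population is the 40 values listed in the table, where element i has measure 2i.

```python
# Population from the table: element i has measure x = 2i, for i = 1..40
population = {i: 2 * i for i in range(1, 41)}

N, n = len(population), 4
k = N // n   # sampling interval, 40 / 4 = 10

# Each possible random start r = 1..k yields one systematic sample:
# elements r, r + k, r + 2k, r + 3k
samples = [[population[start + j * k] for j in range(n)]
           for start in range(1, k + 1)]
means = [sum(s) / n for s in samples]

# Every sample is equally likely, so each mean has probability 1/k
expected_mean = sum(means) / k
variance_mean = sum((m - expected_mean) ** 2 for m in means) / k

for start, (s, m) in enumerate(zip(samples, means), start=1):
    print(f"start {start:2d}: sample = {s}, mean = {m}")
print(f"E(sample mean) = {expected_mean}, Var(sample mean) = {variance_mean}")
```

Because the ten samples partition the population, the expected value of the sample mean coincides with the population mean.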



II. Suppose the mean monthly income, μ, of the households in the exclusive subdivisions

in Metro Manila is 200,000 PhP with a standard deviation σ = 150,000 PhP. What is the

probability of selecting a random sample of 100 families whose mean monthly income is

larger than 250,000PhP?

III. Let (X1, X2, . . ., Xn) be a random sample. Find the value of c that satisfies the condition

that P(-c < X̄ - μ < c) is approximately equal to 0.95 for each of the following conditions:

a.) the sample size is n = 100 and the population mean is μ = 6 and variance is σ² =


b.) the sample size is n = 25 and the population is normally distributed with μ = 6 and

variance σ² = 42.25.

c.) the sample size is n = 25 and the population is normally distributed with μ = 6 but

the variance σ² is unknown and is estimated by the sample variance S² = 42.25.

Assignment 4:

I. Laboratory tests of bacterial counts are often used for declaring a water source

“polluted”. Suppose that the distribution of bacterial counts in a sample taken from a

certain lake is normally distributed with a variance of 9,000,000.

a.) Suppose 25 water samples were taken over the course of July 2004 and yielded

a mean count of 12,000. Construct an 80% confidence interval estimate of the

unknown mean bacterial count in this lake at this time.

b.) In July 2005, another set of water samples was taken from the same lake and

noted a bacterial count of 14,000. Is this an evidence of pollution effect? Explain

your answer.

II. According to a 1984 American study, about one in three individuals feels shopping is an

unpleasant experience (Journal of Marketing Research February/March 1984). Suppose we

take a national sample of 4,100 Filipino male and female adults, and we determine each

respondent’s opinion on the pleasantness of shopping. The survey produced the

following results:

Males Females

Sample Size 2,015 2,085

Number who think shopping is an unpleasant experience

850 570

a.) Compute a 95% confidence interval for the proportion of males in the sample

who think shopping is an unpleasant experience.

b.) Compute a 95% confidence interval for the proportion of females in the sample

who think shopping is an unpleasant experience.



c.) Which group appears to dislike shopping more? Explain your answer.

III. If you wanted to estimate the proportion of births which are girls to within ±0.01 with

90% confidence, what sample size would be necessary? How large must the sample size

be for 95% confidence? How large must the sample size be for 99% confidence?

IV. What can you conclude about the relationship of the sample size with the confidence


Assignment 5:

I. An experiment was conducted to determine whether different baking times produce different

rises of chocolate chip muffins. Twenty four muffins were baked for 20 minutes and the rise of

each muffin was recorded. Another set of 20 muffins were baked for 25 minutes and the rise of

each muffin was also recorded. The data, in centimeters, are given below.

20 minutes 25 minutes

2.8 3.0 2.8 3.1

3.0 3.1 2.7 3.1

3.1 3.0 2.9 3.0

2.9 3.1 2.9 3.1

2.7 3.0 3.1 3.1

2.6 3.1 3.0 3.0

2.6 3.0 2.6 3.0

2.8 3.2 2.7 3.1

2.7 3.1 2.8

2.6 3.0 2.7

2.8 3.0 2.8

2.9 3.1 2.8

Provide a point estimate for the difference in the mean rise of a chocolate chip muffin between

those baked for 20 minutes and those baked for 25 minutes.

II. Using the data in the previous question, provide a 99% confidence interval estimate of the

difference between the mean rise of muffins baked for 20 minutes and those baked for 25

minutes. Assume the normality of the data.

III. Consider the data on the number of births per 1,000 population in African and Asian countries

indicated below.

a.) Estimate the proportion of African countries with number of births greater than 30,000.

b.) Estimate the proportion of Asian countries with number of births greater than 30,000.

c.) Find a 95% confidence interval for the difference of population proportions of countries

with number of births greater than 30,000.



Births (per 1,000 population)

African Countries Asian Countries

Algeria 20 Armenia 10

Benin 41 Brunei 22

Botswana 27 China 12

Burkina Faso 45 Georgia 11

Cameroon 37 India 25

Cape Verde 29 Indonesia 22

Chad 49 Iran 18

Comoros 47 Japan 9

Eritrea 39 Kuwait 18

Ethiopia 41 Kyrgyzstan 21

Gambia 41 Lebanon 23

Guinea-Bissau 50 Malaysia 26

Lesotho 33 Maldives 18

Libya 28 Mongolia 18

Madagascar 43 Myanmar 25

Malawi 51 Nepal 34

Mali 50 North Korea 17

Mauritius 16 Oman 26

Mayotte 41 Pakistan 34

Senegal 37 Philippines 26

Seychelles 18 Qatar 20

Sudan 38 Syria 28

Togo 38 Turkey 21

Tunisia 17 UAE 16

Zambia 42 Uzbekistan 24

IV. Discuss Briefly: Differentiate sampling independently from two populations from paired

sampling from two populations.



Exercise 4:

I. A mortgage is a type of loan that is secured by a designated piece of property. If the borrower defaults on the loan, the lender can sell the property to recover the outstanding debt. The following data are the outstanding principal balances of home mortgages foreclosed by the bank due to default by the borrower during the last 3 years, obtained from a random sample of 12 foreclosed mortgages:

95,982 81,422 39,888 46,836 66,899 69,110

59,200 62,331 105,812 55,545 56,635 72,123

Test the claim that the average outstanding balance of home mortgages is less than 80,000 using a 0.05 level of significance.

II. The manager of the credit department for an oil company would like to determine whether the average monthly balance of credit card holders is higher than 3,000 PhP. An auditor randomly samples 150 accounts and finds that the average owed is 4,170 PhP with a standard deviation of 1,182.50 PhP. Using the 0.05 level of significance, can the auditor conclude that there is evidence that the average monthly balance is really higher than 3,000 PhP?

III. A television manufacturer claims in its warranty that in the past, less than 15% of its television sets needed any repair during their first two years of operation. In order to test the validity of this claim, a government testing agency selects a sample of 100 sets and finds that 12 sets require some repair within their first two years of operation. Is the manufacturer’s claim valid? Test at the 0.01 level of significance.

IV. Consider the cellphone usage data below:

a) Test the claim that the average monthly expense on cellphone use of more than half of the female undergraduate students in UPD is at least 500 PhP. Use α = 0.05.

b) Test the claim that the average monthly expense on cellphone use of more than half of the male undergraduate students in UPD is at least 500 PhP. Use α = 0.05.

Average Monthly Expense on Cellphone Use Among Females

8000 1000 600 500 300 250

4000 1000 600 500 300 250

2300 1000 600 500 300 250

2000 1000 600 500 300 250

2000 1000 600 500 300 250

2000 1000 600 500 300 225

2000 900 600 500 300 200

2000 900 600 500 300 200

1800 900 560 500 300 200

1700 900 550 500 300 200

1500 900 550 500 300 200



1500 900 500 500 300 200

1500 900 500 500 300 200

1500 900 500 500 300 200

1400 900 500 500 300 125

1200 800 500 500 300 250

1200 800 500 500 300 250

1200 750 500 500 300 250

1200 750 500 500 300 250

1100 750 500 500 300 250

1100 750 500 500 275 250

1000 750 500 500 250 250

1000 750 500 500 250 225

1000 700 500 500 250 200

1000 700 500 400 250 200

1000 700 500 400 250 250

1000 600 500 400 250 250

1000 600 500 400 250 250

1000 600 500 400 250 250

1000 600 500 400 250

1000 600 500 300 250

1000 600 500 300 250

1000 600 500 300 250

1000 600 500 300 250

1000 600 500 300 250

Average Monthly Expense on Cellphone Use Among Males

3000 900 500 500 300 250

2500 750 500 500 300 250

2000 750 500 400 300 250

1500 700 500 350 300 250

1462 600 500 300 300 250

1200 600 500 300 300 150

1000 600 500 300 300 125

1000 600 500 300 300 100

1000 600 500 300 300 100

1000 600 500 300 275 100

1000 600 500 300 270 250

1000 600 500 300 250 250

1000 600 500 300 250 250

1000 500 500 300 250 250

1000 500 500 300 250 250

900 500 500 300 250



Exercise 5:

I. Consider the data on seating capacity in cinemas in European countries in 1997 and 1999. Test the hypothesis that the mean seating capacity in cinemas has increased by more than 5,000 from 1997 to 1999. Use the 0.01 level of significance. What assumptions did you make in testing this hypothesis?

Seating Capacities in Cinemas (in thousands)

European Country


1997 1999

Croatia 53 52

Czech Republic 300 292

Denmark 51 52

Germany 772 801

Iceland 10 9

Latvia 23 26.1

Lithuania 28.2 26.1

Luxembourg 26 21

Norway 90.1 89

Poland 200 211

Portugal 97.1 143

Romania 149 129

Slovakia 83.6 95.3

Slovenia 27 24

Switzerland 110.9 230.8

II. In a study made by Cain, Oakhill, and Lemmon (2005), the ability of participants to read and understand written words out of sentence context (Gates-MacGinitie Primary Two Vocabulary Test), and word reading accuracy in context and reading comprehension (Neale Analysis of Reading Ability) were among several characteristics measured in 28 participants. Two groups participated in the study: 14 good comprehenders and 14 poor comprehenders. The following table shows the summary:


Good Comprehenders Poor Comprehenders

Mean Standard Deviation Mean Standard Deviation

Gates-MacGinitie Vocabulary 34.2 2.75 34 2.04

Neale Analysis Word Reading Accuracy 10.6 7.05 10.7 6.97



At 0.05 level of significance, determine whether the mean performance of good and poor comprehenders differ in each of the three characteristics. Assume normality of both.

III. In 1999, a study showed that in a sample of 2,200 students enrolled at the tertiary level in the Philippines, 54.91% are females. In the same year, in a sample of 1,701 students enrolled at the tertiary level in Bangladesh, only 32.33% are females. Is there sufficient evidence to conclude that the proportion of females receiving tertiary education is higher in the Philippines than in Bangladesh? Test at 0.05 level of significance.

IV. In 2001, a sample of 1,980 illiterate individuals from country A showed that 1,236 of these individuals are females. In the same year, a sample of 2,108 illiterate individuals from country B showed that 1,209 of these individuals are females. Can we conclude that the proportions of females among illiterate individuals are different for the two countries? Test at 0.05 level of significance.

V. In a sample of 160 students enrolled in private schools, 60 were found to be smokers. In a sample of 650 students enrolled in public schools, 115 were found to be smokers. Is there sufficient evidence to conclude that there is a higher proportion of student smokers in private schools than in public schools? Test at 0.01 level of significance.

Assignment 6:

I. How many cells may be allowed to have expected frequencies less than 5 (but at least 1) for the

chi-square test for independence to be valid if you have a contingency table of each of the following dimensions?

a) 2 x 2

b) 2 x 3

c) 3 x 5

d) 4 x 4

e) 5 x 5

II. The following data is a tabulation of the nature of work of working children and job satisfaction.

The summarized data came from the masteral thesis by Viriña (2002).

Job Satisfaction Nature of Work

Permanent Non-permanent

Satisfied with Job 1145 3026

Not Satisfied 267 1215

Test at 0.05 level of significance if nature of work and job satisfaction are related.

Viriña also summarized data on the nature of work of working children and present

preference, whether they want to study or to work as shown below:



Present Preference Nature of Work

Permanent Non-permanent

To Study 736 3311

To work 676 930

Test at 0.05 level of significance if nature of work and preference to study or work are related.


III. The following table was part of the results of a pilot project conducted by the Nutrition Center

of the Philippines in Bauan, Batangas, on the development of an anemia control program.

Perform a test for independence on the summarized data using the 0.05 level of significance. The

data is obtained from Mendoza et al. (2000).

Classification of Subjects   Nutritional Status

                             Normal   1° Malnourished   2° Malnourished   3° Malnourished

Normal                       332      531               122               11

Anemic                       198      404               217               23

Exercise 6:

The data below refers to the amount of Greenhouse gas emission (Greenhouse), number of disposal sites for solid wastes to be recycled of the states (Landfill) and emission of sulfur dioxide and nitrogen oxide measured in thousand tons (Acid) of 30 states in the US.

US State Landfill Acid Greenhouse

1 108 954.2 129.0

2 64 217.5 62.4

3 140 94.9 72.3

4 3 158.6 14.2

5 170 1135.8 167.4

6 180 1397.6 145.4

7 76 2526.2 217.1

8 82 388.6 68.5

9 115 330.8 70.8

10 34 1174.7 109.6

11 31 1008.9 172.9

12 30 491.1 77.9

13 139 362.1 89.9

14 55 833.7 181.5

15 53 316.2 83.1

16 75 238.2 51.0

17 78 1412.1 112.8

18 36 164.1 39.6



19 110 94.9 21.0

20 153 466.4 54.2

21 47 326.4 43.5

22 87 3320.3 255.2

23 130 332.3 90.7

24 44 1815.7 254.6

25 9 1307.7 110.2

26 750 2407.5 459.0

27 30 135.6 42.8

28 39 1341.8 95.9

29 150 545.2 99.9

30 80 164.5 54.8

I. Using the variables Landfill and Greenhouse:

a.) Plot a scatter diagram of the data on the amount of greenhouse gas emission and number of disposal sites for solid wastes (greenhouse takes the y-axis). Does there appear to be a linear relationship between the two variables?

b.) Compute for the Pearson correlation coefficient. What conclusion can you draw based on the value of the correlation coefficient?

c.) Test whether ρ is different from 0 using 0.05 level of significance.

d.) Fit a regression model using the two variables with greenhouse as the ‘dependent variable’.

e.) Compute the coefficient of determination. Does the model have a good fit with the data?

II. Using the variables Acid and Greenhouse:

a) Plot a scatter diagram of the data on the amount of greenhouse gas emission and acid

precipitation precursor of the states (greenhouse takes the y-axis). Does there appear to be

a linear relationship between the two variables?

b) Compute for the Pearson correlation coefficient. What conclusion can you draw based on the value of the correlation coefficient?

c) Test whether ρ is different from 0 using 0.05 level of significance.

d) Fit a regression model using the two variables with greenhouse as the ‘dependent variable’.

e) Compute the coefficient of determination. Does the model have a good fit with the data?



Areas Under the Standard Normal Distribution: P(Z < z)=α

Z 0.0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993

3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995

3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997

3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998



Areas Under the Standard Normal Distribution: P(Z < z) = α

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247

-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859

-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483

-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121

-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451

-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148

-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867

-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611

-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
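
The tabulated areas can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes the identity P(Z < z) = 0.5(1 + erf(z/√2)) and uses only Python's standard library. The function name `standard_normal_cdf` is our own choice for illustration.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Return P(Z < z) for a standard normal variable Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A table entry is read as row label plus column offset, e.g. row -1.9, column 0.06.
print(round(standard_normal_cdf(-1.96), 4))  # 0.025
print(round(standard_normal_cdf(2.33), 4))   # 0.9901

# Symmetry links the negative and positive halves of the table:
# P(Z < -z) = 1 - P(Z < z).
z = 1.25
assert abs(standard_normal_cdf(-z) - (1.0 - standard_normal_cdf(z))) < 1e-12
```

Because of this symmetry, either half of the table is enough to recover the other.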

100(1-α)th Percentiles of the t-Distribution: t_α

Rows: degrees of freedom (df). Columns: upper-tail area α.

df 0.1 0.05 0.025 0.01 0.005

1 3.078 6.314 12.706 31.821 63.657

2 1.886 2.920 4.303 6.965 9.925

3 1.638 2.353 3.182 4.541 5.841

4 1.533 2.132 2.776 3.747 4.604

5 1.476 2.015 2.571 3.365 4.032

6 1.440 1.943 2.447 3.143 3.707

7 1.415 1.895 2.365 2.998 3.499

8 1.397 1.860 2.306 2.896 3.355

9 1.383 1.833 2.262 2.821 3.250

10 1.372 1.812 2.228 2.764 3.169

11 1.363 1.796 2.201 2.718 3.106

12 1.356 1.782 2.179 2.681 3.055

13 1.350 1.771 2.160 2.650 3.012

14 1.345 1.761 2.145 2.624 2.977

15 1.341 1.753 2.131 2.602 2.947

16 1.337 1.746 2.120 2.583 2.921

17 1.333 1.740 2.110 2.567 2.898

18 1.330 1.734 2.101 2.552 2.878

19 1.328 1.729 2.093 2.539 2.861

20 1.325 1.725 2.086 2.528 2.845

21 1.323 1.721 2.080 2.518 2.831

22 1.321 1.717 2.074 2.508 2.819

23 1.319 1.714 2.069 2.500 2.807

24 1.318 1.711 2.064 2.492 2.797

25 1.316 1.708 2.060 2.485 2.787

26 1.315 1.706 2.056 2.479 2.779

27 1.314 1.703 2.052 2.473 2.771

28 1.313 1.701 2.048 2.467 2.763

29 1.311 1.699 2.045 2.462 2.756

30 1.310 1.697 2.042 2.457 2.750

∞ 1.282 1.645 1.960 2.326 2.576
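
For values not listed, the percentiles can be approximated numerically. The sketch below is an illustration under stated assumptions, not the method used to build the table: it integrates the t density with Simpson's rule and then finds t_α by bisection on P(T > t) = α. The function names, the search bracket [0, 100], and the step counts are our own choices; the bracket covers every entry tabulated above and requires α < 0.5.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale so the gamma functions cannot overflow.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T < x) for x >= 0: 0.5 plus a Simpson's-rule integral over [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper_quantile(alpha: float, df: int) -> float:
    """Approximate t_alpha, the value with P(T > t_alpha) = alpha (alpha < 0.5)."""
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for the table above
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_upper_quantile(0.025, 10):.3f}")  # 2.228, matching row 10, column 0.025
```

A statistics library would normally be used instead; the point here is only that each table entry solves P(T > t_α) = α for the given degrees of freedom.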

100(1-α)th Percentiles of the Chi-Square Distribution: χ²_α

Rows: degrees of freedom (df). Columns: upper-tail area α.

df 0.995 0.99 0.975 0.95 0.9 0.1 0.05 0.025 0.01 0.005

1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879

2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597

3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838

4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860

5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750

6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548

7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278

8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955

9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589

10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188

11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757

12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300

13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819

14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319

15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801

16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267

17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718

18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156

19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582

20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997

21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401

22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796

23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181

24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559

25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928

26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290

27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645

28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993

29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336

30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
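
The chi-square percentiles can be approximated in the same spirit. This is a hedged sketch rather than the construction behind the table: it assumes the standard series for the regularized lower incomplete gamma function to get P(X < x), then solves P(X > x) = α by bisection. The function names and the bracket [0, 200] are illustrative assumptions; the bracket is wide enough for the degrees of freedom listed above.

```python
import math

def chi2_cdf(x: float, df: int) -> float:
    """P(X < x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    # Series: sum over n of half_x**n / (a * (a + 1) * ... * (a + n)).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    # Multiply by half_x**a * exp(-half_x) / Gamma(a), computed on the log scale.
    return total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))

def chi2_upper_quantile(alpha: float, df: int) -> float:
    """Approximate the value x with P(X > x) = alpha."""
    lo, hi = 0.0, 200.0  # assumed bracket; covers the table above
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid  # upper-tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{chi2_upper_quantile(0.05, 10):.3f}")  # 18.307, matching row 10, column 0.05
```

For 2 degrees of freedom the tail area has the closed form P(X > x) = e^(-x/2), which gives a quick sanity check: the 0.1 column entry is -2 ln(0.1) ≈ 4.605.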
