Inferential Statistics Paf 203 Data Analysis and Modeling for Public Affairs.


REVIEW

What is statistics? What is the difference between a population and a sample? What is a parameter? A statistic? What are the measures of central tendency? What are the measures of dispersion? What is descriptive statistics? What is inferential statistics?

"Statistics is like a bikini: what it reveals is suggestive, what it conceals is vital." (Aaron Levenstein)

"Statistics is like a miniskirt: it covers up essentials but gives you the ideas." (A Paris banker)

There are three kinds of lies – lies, damned lies and statistics. – Mark Twain

Statistics Quotations

http://thinkexist.com/quotes/aaron_levenstein/



Review

The word statistics can be viewed in two contexts:

Plural sense: statistics as actual numbers derived from the data

Singular sense: statistics as a science

Review

Statistics is the science of designing studies, gathering data, and then classifying, summarizing, interpreting, and presenting these data to explain, make inferences, and support the decisions that are reached.

This definition points out four stages in a statistical investigation, namely:

1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data (to draw valid conclusions)

Review

A population is the complete collection of measurements, objects, or individuals under study.

A sample is a portion or subset taken from the population.

A parameter is a number that describes a population characteristic.

A statistic is a number that describes a sample characteristic.

Two Broad Categories of Statistics: Descriptive Statistics and Inferential Statistics


Descriptive Statistics

Used to describe a mass of data in a clear, concise and informative way

Deals with the methods of organizing, summarizing, and presenting data

Example

The National Statistics Office (NSO) presented the Philippine population by age group and gender using a graph.



Inferential Statistics

Concerned with making generalizations (drawing conclusions) about the characteristics of a larger set (population) where only a part (sample) is examined.

[Diagram: larger set (population, N units/observations) → smaller set (sample, n units/observations) → inferences and generalizations]

Example: A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

METHODS OF DRAWING CONCLUSIONS

Deductive Method: It draws conclusions from general to specific. It assumes that any part of the population will bear the observed characteristics of the population.

Hence, conclusions are stated with certainty.

[Diagram: Population → Sample (inference)]

ILLUSTRATION

Statement 1: All UPLB students are intelligent.

Statement 2: Pedro is a UPLB student.

Conclusion: Pedro is intelligent.

Inductive Method: It draws conclusions from specific to general. It assumes that the characteristics observed in a part of the population are likely to hold true for the whole population.

Hence, conclusions are subject to uncertainty.

[Diagram: Sample → Population (inference)]

ILLUSTRATION

Statement 1: Pedro is a UPLB student.

Statement 2: Pedro is intelligent.

Conclusion: All UPLB students are intelligent.

INFERENTIAL STATISTICS

Inferential statistics makes use of the inductive method of drawing conclusions.

[Diagram: population → sampling process → sample → data → inferences/generalizations about the population (subject to uncertainty)]

The Necessary Steps of Inferential Statistics

1. Specify the question to be answered and identify the population of interest.
2. Describe how to select the sample.
3. Select the sample and analyze the sample information using descriptive statistics.
4. Use the information in step 3 to make an inference about the population.
5. Determine the reliability of the inference.

What do we need to know?

Random variables and their behavior

Sampling, in which a sample is drawn from a much larger body of measurements called the population

Some definitions

Inferential statistics is generalizing a particular characteristic of a population using information generated from a sample.

A variable is a characteristic that changes or varies over time or across the individuals or objects under consideration. A population is the set of all measurements of interest. A sample is a subset of measurements selected from a population of interest.

(cont.) Some definitions

A sampling distribution is a theoretical, probabilistic distribution of all possible sample outcomes (with constant sample size n), for the statistic that is to be generalized to the population.

Collecting data: It is possible to gather data from an entire population; this is called a census. Usually, data gathered from experiments and observations come from samples.

(cont.) Some definitions

A sample should be representative of the population, but there are many ways that samples can be selected. It is helpful to categorize them into nonprobability and probability samples.

A nonprobability sample is one where judgment of the experimenter, the method in which data are collected, or other factors could affect the results of the sample.

Variable

A variable is a characteristic that changes or varies over time or across the individuals or objects under consideration. Broad classification of variables:

QUALITATIVE

QUANTITATIVE: DISCRETE or CONTINUOUS

Types of Variable: Qualitative

assumes values that are not numerical but can be categorized

categories may be identified by either non-numerical descriptions or by numeric codes

Types of Variable: Quantitative

indicates the quantity or amount of a characteristic

data are always numeric

can be discrete or continuous

Types of Quantitative Variables

Discrete – variable with a finite or countable number of possible values

Continuous – variable that assumes any value in a given interval or continuum of values

RANDOM VARIABLE

A random variable is a rule or function that assigns exactly one real number to every possible outcome of a random experiment.

Note: The domain of the function is the sample space $S$ and the co-domain is the set of real numbers, $\mathbb{R}$; that is, $X: S \to \mathbb{R}$.
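To make the definition concrete, here is a minimal Python sketch built on an assumed experiment (tossing a coin twice) that is not part of the original slides. The sample space `S` lists every outcome, and the hypothetical function `X` maps each outcome to one real number, the count of heads. Because `X` can only take the values 0, 1, or 2, it is a discrete random variable.

```python
from itertools import product

# Sample space S for an assumed experiment: tossing a coin twice.
S = list(product("HT", repeat=2))

def X(outcome):
    """Random variable X: the number of heads in an outcome."""
    return outcome.count("H")

# X assigns exactly one real number to every outcome in S.
values = {outcome: X(outcome) for outcome in S}
for outcome, x in values.items():
    print("".join(outcome), "->", x)
# HH -> 2, HT -> 1, TH -> 1, TT -> 0
```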

TYPES OF RANDOM VARIABLES

Discrete random variables take on a finite set of distinct values or a countably infinite number of values.

Continuous random variables take on any value within a specified interval or continuum of values.

Basic Concepts in Sampling

• SAMPLING – the process of selecting a sample
• PARAMETER – descriptive measure of the population
• STATISTIC – descriptive measure of the sample
• INFERENTIAL STATISTICS – concerned with making generalizations about parameters using statistics

WHY DO WE USE SAMPLES?

1. Reduced Cost
2. Greater Speed or Timeliness
3. Greater Efficiency and Accuracy
4. Greater Scope
5. Convenience
6. Necessity
7. Ethical Considerations

TWO TYPES OF SAMPLES

• Non-Probability Samples

• Probability Samples

Non-Probability Samples

• Samples are obtained haphazardly, selected purposively or are taken as volunteers.

• The probabilities of selection are unknown.

• They should not be used for statistical inference.

• They result from the use of judgment sampling, accidental sampling, purposive sampling, and the like.

Three commonly employed nonprobability samples are judgment samples, voluntary samples, and convenience samples.

1. Judgment samples: sample selection is sometimes based on the opinion of one or more persons who feel qualified to identify items for a sample as being characteristic of the population. Example: a political campaign manager intuitively picks certain voting districts as reliable places to measure the public's opinion of her candidate. The poll taken from these districts is a judgment sample based on the campaign manager's expertise and experience.

2. Voluntary samples: sometimes questions are posed to the public by publishing them in print media or by broadcasting them over radio or television. Dialing one number indicates yes, while dialing another indicates no. Such polls produce voluntary samples and attract only those who are interested in the subject matter.

3. Convenience samples: often people want to take an easy sample. For example, a surveyor will stand in one location and ask passersby their questions, or a student working on a project will ask the entire class to fill out a survey questionnaire. A sample taken at the convenience of the surveyor is called a convenience sample.

Probability Samples …

• Samples are obtained using some objective chance mechanism, thus involving randomization.

• They require the use of a complete listing of the elements of the universe called the sampling frame.

• The probabilities of selection are known.

• They are generally referred to as random samples.

• They allow drawing of valid generalizations about the universe/population.

Probability Sample

A probability sample is one in which the chance of selection of each item in the population is known before the sample is picked.

Types of probability samples

1. Simple random sample: a probability sample chosen in such a way that all possible groupings of a given size have an equal chance of being picked and each item in the population has an equal chance of being selected.

(cont.) Types of probability samples

2. Systematic samples: Suppose we have a list of 1,000 registered voters in a community and we want to pick a probability sample of 50. We can use a random number table to pick one of the first 20 voters (1,000/50 = 20) on our list. If the table gave us the number 16, then the 16th voter on the list would be the first to be selected. We would then pick every 20th name after this random start (the 36th voter, the 56th voter, etc.) to produce a systematic sample.


3. Stratified samples: If the population is divided into relatively homogeneous groups, or strata, and a sample is drawn from each group to produce an overall sample, this overall sample is known as a stratified sample. Stratified sampling is usually performed when there is large variation within the population and the researcher has some prior knowledge of the structure of the population that can be used to establish the strata. The sample results from each stratum are weighted and combined with the results of the other strata to provide the overall estimate.


4. Cluster samples: A cluster sample is one in which the individual units to be sampled are actually groups or clusters of items. It is assumed that the individual items within each cluster are representative of the population. Example: consumer surveys in big cities employ cluster sampling. The city is divided into blocks, each block containing a cluster of households to be surveyed. A number of clusters are selected for the sample, and the households in those clusters are surveyed.

METHODS OF PROBABILITY SAMPLING

Simple Random Sampling

Stratified Random Sampling

Systematic Random Sampling

Cluster Sampling

SIMPLE RANDOM SAMPLING (SRS)

• Most basic method of drawing a probability sample

• Assigns equal probabilities of selection to each possible sample

• Results in a simple random sample

TYPES OF SIMPLE RANDOM SAMPLE

SRS Without Replacement (SRSWOR) – does not allow repetitions of selected units in the sample

SRS With Replacement (SRSWR) – allows repetitions of selected units in the sample
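The difference between SRSWOR and SRSWR maps neatly onto two functions in Python's standard `random` module, shown in this minimal sketch. The frame of 10 numbered units and the seed are assumptions for illustration: `sample` never repeats a unit, while `choices` draws each unit independently, so repeats can occur.

```python
import random

population = list(range(1, 11))  # hypothetical sampling frame of 10 numbered units
rng = random.Random(2024)        # seeded so the draws can be reproduced

# SRSWOR: random.sample never selects the same unit twice.
srswor = rng.sample(population, k=5)

# SRSWR: random.choices draws independently, so a unit may appear more than once.
srswr = rng.choices(population, k=5)

print("without replacement:", srswor)
print("with replacement:   ", srswr)
```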

STRATIFIED RANDOM SAMPLING

The universe is divided into L mutually exclusive sub-universes called strata.

Independent simple random samples are obtained from each stratum.

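Note:

$$\sum_{h=1}^{L} N_h = N, \qquad \sum_{h=1}^{L} n_h = n$$

where $N_h$ and $n_h$ are the population and sample sizes of stratum $h$.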
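One common way to fix the stratum sample sizes n_h (stratum h has N_h population units, summing to N) is proportional allocation, n_h = n·N_h/N, and the stratum results can then be combined with weights N_h/N. Both choices are assumptions in this minimal Python sketch; the slides do not prescribe an allocation rule, and the stratum sizes and means below are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n to strata in proportion to N_h / N (rounded)."""
    N = sum(stratum_sizes)
    return [round(n * N_h / N) for N_h in stratum_sizes]

def stratified_mean(stratum_sizes, stratum_means):
    """Combine stratum sample means using the weights W_h = N_h / N."""
    N = sum(stratum_sizes)
    return sum((N_h / N) * ybar_h for N_h, ybar_h in zip(stratum_sizes, stratum_means))

# Hypothetical population of N = 1,000 split into three strata.
sizes = [500, 300, 200]
print(proportional_allocation(sizes, 100))                # [50, 30, 20]
print(round(stratified_mean(sizes, [2.0, 3.0, 4.0]), 2))  # 2.7
```

Because each n_h is rounded, the allocations may not add up exactly to n for other stratum sizes; in practice one stratum is usually adjusted by hand.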

ILLUSTRATION

[Figure: Stratified random sample]

Advantages of Stratification

1. It gives a better cross-section of the population.
2. It simplifies the administration of the survey/data gathering.
3. The nature of the population dictates some inherent stratification.
4. It allows one to draw inferences for various subdivisions of the population.
5. Generally, it increases the precision of the estimates.

SYSTEMATIC SAMPLING

Adopts a skipping pattern in the selection of sample units

Gives a better cross-section if the listing is linear in trend but has high risk of bias if there is periodicity in the listing of units in the sampling frame

Allows the simultaneous listing and selection of samples in one operation

ILLUSTRATION

[Figure: Population and systematic sample]

Determine the sampling interval, k = N/n.
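The skipping pattern can be sketched in a few lines of Python, using the 1,000-voter example from the probability-sample definitions (N = 1,000, n = 50, so k = 20, random start 16). The function name `systematic_sample` is hypothetical, and the sketch assumes N is a multiple of n so that k is a whole number.

```python
import random

def systematic_sample(N, n, start=None):
    """Return 1-based list positions of a systematic sample of size n from N units.

    Assumes N is a multiple of n so the sampling interval k = N // n is whole.
    """
    k = N // n                          # sampling interval
    if start is None:
        start = random.randint(1, k)    # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]

positions = systematic_sample(1000, 50, start=16)
print(positions[:4], positions[-1])  # [16, 36, 56, 76] 996
```

If the list has a periodic pattern with period close to k, every selected unit can fall on the same point of the cycle, which is the bias risk noted above.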

CLUSTER SAMPLING

• It considers a universe divided into N mutually exclusive sub-groups called clusters.

• A random sample of n clusters is selected and their elements are completely enumerated.

• It has simpler frame requirements.

• It is administratively convenient to implement.
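Here is a minimal Python sketch of one-stage cluster sampling: whole clusters are selected at random (without replacement), and every element inside a chosen cluster is enumerated. The block and household labels and the helper `cluster_sample` are hypothetical.

```python
import random

# Hypothetical frame: city blocks (clusters), each listing its households.
clusters = {
    "Block A": ["A1", "A2", "A3"],
    "Block B": ["B1", "B2"],
    "Block C": ["C1", "C2", "C3", "C4"],
    "Block D": ["D1", "D2", "D3"],
}

def cluster_sample(clusters, n_clusters, rng):
    """Select n_clusters clusters at random (SRSWOR) and keep all of their elements."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {name: clusters[name] for name in chosen}

sample = cluster_sample(clusters, 2, random.Random(7))
households = [h for members in sample.values() for h in members]
print(sample)
print(households)
```

Only the list of blocks is needed to make the selection, which reflects the simpler frame requirement noted above.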

ILLUSTRATION

[Figure: Population and cluster sample]

What is a sampling error?

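Sampling error is the difference between a sample statistic and the population parameter it estimates; it arises because only part of the population is observed. If we want to make a judgment about a population from a sample, we want the sample results to be as representative of the population as possible. This is difficult to achieve, so we have to live with some sampling error. Errors can also come from coding and recoding of data; these are non-sampling errors. Results obtained from a biased sample are worthless.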

How to determine a sample size

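Use the formula

$$n = \frac{2^2 PQ}{d^2}$$

where $P$ is the expected proportion of the target population having the characteristic of interest, based on prior information; $Q = 1 - P$; $d$ is the margin of error set by the investigator; and the factor 2 approximates the z value for 95% confidence.

Example: with $P = 0.50$ and $d = 0.03$,

$$n = \frac{(0.50)(0.50)}{(0.03/2)^2} \approx 1{,}111,$$

or about 1,200.

If $N$ is known, then adjust to

$$n^* = \frac{n}{1 + n/N}.$$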
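The sample-size formula n = 2²PQ/d² and the finite-population adjustment n* = n/(1 + n/N) can be checked with a short Python sketch. The function names and the population size N = 10,000 in the usage lines are hypothetical, and z = 2 stands for the rounded 95% confidence multiplier.

```python
def sample_size(P, d, z=2.0):
    """Sample size for estimating a proportion: n = z^2 * P * Q / d^2, with Q = 1 - P."""
    Q = 1 - P
    return z ** 2 * P * Q / d ** 2

def adjust_for_population(n, N):
    """Finite-population adjustment: n* = n / (1 + n / N)."""
    return n / (1 + n / N)

n = sample_size(0.50, 0.03)
print(round(n))                                  # 1111
print(round(adjust_for_population(n, 10_000)))   # 1000
```

Using P = 0.50 is the conservative choice when no prior information exists, because PQ is largest at P = 0.50.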

Concept of Hypothesis testing

The objective of hypothesis testing is to determine whether or not the sample data support some belief or hypothesis about the population. In hypothesis testing, we make assumptions about the unknown parameters.

Hypothesis testing has five steps:

1. Formulating the hypothesis2. Selecting the statistical analysis model to be

used3. Setting the criteria for rejecting the null

hypothesis4. Analysis 5. Making a decision.

Formulating the hypothesis

There are two types of hypotheses: the null hypothesis (Ho) and the alternative hypothesis (Ha). The null hypothesis is the statement of no difference or no relationship; it is assumed to be true unless the sample provides enough evidence against it.

The alternative hypothesis is the hypothesis that the researcher wants to prove. The test determines whether the evidence provided by the sample is strong enough to establish that the null hypothesis is not true. If there is enough such evidence, then we say that there is evidence to support the alternative hypothesis.


Examples of types of hypotheses

a. Hypothesis concerning the value of the population mean: $H_0: \mu = \mu_0$

b. Hypothesis concerning the value of the difference in the means of two populations: $H_0: \mu_1 = \mu_2$

c. Hypothesis concerning the relationship of two nominal-scale variables: $H_0: \chi^2 = 0$

There are also three possible forms of the alternative hypothesis:

$H_a: \mu \neq \mu_0$, $\quad H_a: \mu > \mu_0$, $\quad H_a: \mu < \mu_0$

2. Selecting the statistical model to be used:

Having specified the null and alternative hypotheses, we then select the appropriate test statistic or statistical model to be used. The choice of statistic depends on a number of factors: (1) the nature of the hypothesis problem, (2) the level of measurement used, and (3) the assumptions of normality.

Some of the most frequently used statistical tests are the t-test, the z-test, and the chi-square test.

3. Setting the criteria for rejecting the null hypothesis.

This involves two things: (1) selecting a significance level and (2) determining the area of rejection.

The level of significance refers to the probability of rejecting the null hypothesis when it is true. This is called the Type I error or the α error. (The Type II error or the β error is accepting the null hypothesis when it is not true.) We select the level of significance before we compute the test statistic, and we need to select a level that we think is reasonable. The decision as to which significance level to use depends on the questions involved. Social scientists routinely use a significance level of 0.05. If a statistical test would lead to significant policy recommendations, then you may wish to reduce the risk of being in error and specify a significance level of 0.01 or even 0.001.

TWO TYPES OF ERROR

A TYPE I ERROR is committed when we reject a true null hypothesis.

A TYPE II ERROR is committed when we accept a false null hypothesis.


PROBABILITY OF COMMITTING ERRORS

The PROBABILITY OF A TYPE I ERROR is usually denoted by α:

$$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$

It is also known as the level of significance of a statistical test.

The PROBABILITY OF A TYPE II ERROR is usually denoted by β:

$$\beta = P(\text{Type II error}) = P(\text{accept } H_0 \mid H_0 \text{ is false})$$
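The statement that α is the probability of rejecting a true Ho can be checked by simulation. This minimal Python sketch assumes a two-tailed z test with a known population standard deviation; the population values (mean 50, SD 10, n = 25) and the seed are hypothetical. Every sample is drawn from a population where Ho really holds, so each rejection is a Type I error, and the rejection rate should come out close to α.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=50.0, sigma=10.0, n=25, alpha=0.05, trials=10_000, seed=1):
    """Estimate P(reject Ho | Ho true) for a two-tailed z test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which Ho (mu = mu0) is true.
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true Ho: a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # close to 0.05
```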

DECISION MATRIX

ACTION    | Ho is true        | Ho is false
Reject Ho | Type I error (α)  | Correct decision
Accept Ho | Correct decision  | Type II error (β)

Area of Rejection

Based on the significance level we choose, we then delineate our region of acceptance and region of rejection. The region of rejection is also called the critical region. Outcomes falling here mean we reject the null hypothesis. Our critical region also depends on whether we are doing a right-tailed test, a left-tailed test, or a two-tailed test.

If our alternative hypothesis involves the > sign, we use a right-tailed test. When our alternative hypothesis involves the < sign, we use a left-tailed test. When our alternative hypothesis involves the ≠ sign, we use a two-tailed test.
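The link between the sign in Ha and the tail of the test can be sketched with the standard normal (z) distribution from Python's `statistics.NormalDist`. This is an illustration under that assumption only: t and chi-square tests need their own tables. The function name `z_critical_region` and the tail labels are hypothetical.

```python
from statistics import NormalDist

def z_critical_region(alpha, tail):
    """Return a rule that says whether a z statistic falls in the critical region.

    tail is "right" (Ha uses >), "left" (Ha uses <) or "two" (Ha uses ≠).
    """
    std_normal = NormalDist()
    if tail == "right":
        c = std_normal.inv_cdf(1 - alpha)       # all of alpha in the upper tail
        return lambda z: z > c
    if tail == "left":
        c = std_normal.inv_cdf(alpha)           # all of alpha in the lower tail
        return lambda z: z < c
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)   # alpha split between both tails
        return lambda z: abs(z) > c
    raise ValueError("tail must be 'right', 'left' or 'two'")

reject = z_critical_region(0.05, "two")
print(reject(2.10), reject(1.50))  # True False
```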

4. Analysis

The analysis is the process of computing the test statistic based on the assumptions we made and the data we have.

5. Making a decision

In assessing the null hypothesis, we can accept the null hypothesis or reject it in favor of the alternative hypothesis. Our decision is based on the value of the test statistic obtained in the analysis stage. If the value of the test statistic falls in the critical region, we reject the null hypothesis in favor of the alternative hypothesis. Our findings may then be taken as conclusive even though there is a probability (α) that we are in error. If the test statistic falls in the acceptance region, we accept the null hypothesis. Our findings are not conclusive: we simply do not have enough evidence to prove our alternative hypothesis.

Example:

When the judge pronounces the defendant "guilty", the defendant serves the sentence imposed even if there is a probability that he is not actually guilty. But when the judge hands down a verdict of not guilty, it is usually not because his innocence has been proven beyond reasonable doubt; there is simply not enough evidence to prove that he is guilty.

Example of hypothesis testing: The t distribution

Compare the academic achievement of the foreign students and the total student population.

Given: student body GPA µ = 2.00 (population variance unknown); foreign students: sample mean GPA = 2.58, s = 1.23, n = 30.

Step 1. Stating the null hypothesis: Ho: µ = 2.00, H1: µ ≠ 2.00.

Step 2. Selecting the sampling distribution and establishing the critical region: assuming random sampling and a normal sampling distribution, use the t statistic with α = 0.01, a two-tailed test, and degrees of freedom = n − 1 = 29.

Step 3. Critical region: t = ±2.756

Step 4. Computing the test statistic:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}} = \frac{2.58 - 2.00}{1.23/\sqrt{29}} = \frac{0.58}{0.228} \approx +2.54$$

(Rounding the denominator to 0.23 gives about +2.52.)

Step 5. Making a decision: do not reject Ho, because the test statistic does not fall in the critical region. The difference between the sample mean (2.58) and the population mean (2.00) is no greater than would be expected if only random chance were operating.
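The arithmetic of the foreign-student example can be reproduced with a short Python sketch. The function name `one_sample_t` is hypothetical; it uses the form t = (x̄ − µ)/(s/√(n − 1)) from the example, which assumes s was computed with divisor n (it equals the more common s/√n when s uses divisor n − 1). The critical value 2.756 is taken from a t table rather than computed, since the standard library has no t distribution.

```python
import math

def one_sample_t(sample_mean, mu, s, n):
    """t statistic in the form used in the example: (x̄ - µ) / (s / sqrt(n - 1))."""
    return (sample_mean - mu) / (s / math.sqrt(n - 1))

t = one_sample_t(2.58, 2.00, 1.23, 30)
t_critical = 2.756  # two-tailed, alpha = 0.01, df = 29, from a t table
decision = "reject Ho" if abs(t) > t_critical else "do not reject Ho"
print(round(t, 2), decision)  # 2.54 do not reject Ho
```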

Introduction to the Chi-square test of independence

The chi-square (χ²) test of independence is a very general test used to evaluate whether frequencies that have been empirically obtained differ significantly from those that would be expected if no relationship between the variables existed.

Chi square test of independence

Suppose we want to look at the relationship between religious affiliation and political affiliation. For this purpose, we selected a random sample of 100 Iglesia ni Cristo (INC) members and 100 Jesus is Lord (JIL) members, asked each of them how they voted in the last presidential election, and put the results in a bivariate table.

Party | INC        | JIL        | Total
LAMMP | 80 (80%)   | 40 (40%)   | 120
LAKAS | 20 (20%)   | 60 (60%)   | 80
Total | 100 (100%) | 100 (100%) | 200


If we examine the percentages, we see that of the 100 INC members, a larger proportion (80%) voted LAMMP, while of the 100 JIL members, a larger proportion (60%) voted LAKAS. There seems to be a strong tendency for INC members to vote LAMMP and for JIL members to vote LAKAS. Based on these sample results, can we conclude that there is a relationship between religious affiliation and political affiliation?

The chi-square test of independence is a technique for testing the statistical significance of a bivariate relationship in a cross-tabulation. It can be applied to variables at any level of measurement, because the data can always be arranged in a bivariate or contingency table.
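The expected frequencies and the χ² statistic for the INC/JIL voting table can be computed with a short, dependency-free Python sketch. The function name `chi_square_independence` is hypothetical. Each expected count is (row total × column total) / grand total, χ² sums (observed − expected)² / expected over all cells, and df = (rows − 1)(columns − 1).

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table given as rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if religious and political affiliation were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: LAMMP, LAKAS; columns: INC, JIL.
observed = [[80, 40],
            [20, 60]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 2), df)  # 33.33 1
```

For df = 1, the tabled critical value at α = 0.05 is 3.841, so a statistic of about 33.33 would fall in the critical region; treat this comparison as an illustration of the decision rule.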

END – PART 1
