Date post: | 08-Jan-2018 |
Category: |
Documents |
Upload: | maude-collins |
Inferential Statistics
PAF 203: Data Analysis and Modeling for Public Affairs
REVIEW
What is statistics? What is the difference between a population and a sample? What is a parameter? A statistic? What are the measures of central tendency? What are the measures of dispersion? What is descriptive statistics? What is inferential statistics?
Statistics Quotations
"Statistics is like a bikini: what it reveals is suggestive, what it conceals is vital." (Aaron Levenstein)
"Statistics is like a miniskirt: it covers up essentials but gives you the ideas." (A Paris banker)
"There are three kinds of lies: lies, damned lies, and statistics." (Mark Twain)
Review
The word statistics can be viewed in two contexts:
Plural sense: statistics as the actual numbers derived from the data
Singular sense: statistics as a science
Review
Statistics is the science of designing studies, gathering data, and then classifying, summarizing, interpreting, and presenting these data to explain, make inferences, and support the decisions that are reached.
This definition points out four stages in a statistical investigation, namely:
1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data (to draw valid conclusions)
Review
A population is the complete collection of measurements, objects, or individuals under study.
A sample is a portion or subset taken from the population.
A parameter is a number that describes a population characteristic.
A statistic is a number that describes a sample characteristic.
Two Broad Categories of Statistics
Descriptive Statistics
Inferential Statistics
Descriptive Statistics
Used to describe a mass of data in a clear, concise and informative way
Deals with the methods of organizing, summarizing, and presenting data
Example
The National Statistics Office (NSO) presented the Philippine population by age group and gender using a graph.
Inferential Statistics
Concerned with making generalizations (drawing conclusions) about the characteristics of a larger set (population) where only a part (sample) is examined
[Diagram: a smaller set (sample, n units/observations) is drawn from a larger set (population, N units/observations); inferences and generalizations about the larger set are made from the smaller set.]
Example: A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.
METHODS OF DRAWING CONCLUSIONS
Deductive Method
It draws conclusions from general to specific. It assumes that any part of the population will bear the observed characteristics of the population. Hence, conclusions are stated with certainty.
[Diagram: inference runs from the population to the sample]
ILLUSTRATION
Statement 1: All UPLB students are intelligent.
Statement 2: Pedro is a UPLB student.
Conclusion: Pedro is intelligent.
Inductive Method
It draws conclusions from specific to general. It assumes that the characteristics observed from a part of the population are likely to hold true for the whole population. Hence, conclusions are subject to uncertainty.
[Diagram: inference runs from the sample to the population]
ILLUSTRATION
Statement 1: Pedro is a UPLB student.
Statement 2: Pedro is intelligent.
Conclusion: All UPLB students are intelligent.
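INFERENTIAL STATISTICS
Inferential statistics makes use of the inductive method of drawing conclusions.
[Diagram: a sampling process draws a sample from the population; data from the sample support inferences/generalizations about the population, which are subject to uncertainty.]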
The Necessary Steps of Inferential Statistics
1. Specify the question to be answered and identify the population of interest.
2. Describe how to select the sample.
3. Select the sample and analyze the sample information using descriptive statistics.
4. Use the information from step 3 to make an inference about the population.
5. Determine the reliability of the inference.
What do we need to know?
Random variables and their behavior
Sampling, where a sample is drawn from a much larger body of measurements called the population
Some definitions
Inferential Statistics is generalizing a particular characteristic of a population by generating the information from a sample.
A variable is a characteristic that changes or varies over time or across different individuals or objects under consideration.
A population is the set of all measurements of interest.
A sample is a subset of measurements selected from a population of interest.
(cont.) Some definitions
A sampling distribution is a theoretical, probabilistic distribution of all possible sample outcomes (with constant sample size n), for the statistic that is to be generalized to the population.
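The definition above can be approximated by simulation. The following is a minimal sketch, not part of the original slides: the population values, the sample size n = 50, and the number of repeated samples are all made-up assumptions, and drawing 2,000 samples only approximates the theoretical distribution of all possible samples.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up measurements
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]
population_mean = statistics.mean(population)

n = 50               # constant sample size
num_samples = 2_000  # number of repeated samples (assumption)

# Draw many samples of size n and record the mean of each one
sample_means = [
    statistics.mean(random.sample(population, n))
    for _ in range(num_samples)
]

center = statistics.mean(sample_means)   # center of the simulated sampling distribution
spread = statistics.stdev(sample_means)  # its spread (the standard error of the mean)

print(f"population mean:        {population_mean:.1f}")
print(f"mean of sample means:   {center:.1f}")
print(f"spread of sample means: {spread:.1f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual measurements.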
Collecting data
It is possible to gather data from an entire population; this is called a census. Usually, data gathered from experiments and observations come from samples.
(cont.) Some definitions
A sample should be representative of the population but there are many ways that samples can be selected. It is helpful to categorize them into non probability and probability samples.
A nonprobability sample is one where judgment of the experimenter, the method in which data are collected, or other factors could affect the results of the sample.
Variable
A variable is a characteristic that changes or varies over time or across different individuals or objects under consideration.
Broad Classification of Variables:
QUANTITATIVE (DISCRETE or CONTINUOUS)
QUALITATIVE
Types of Variable: Qualitative
assumes values that are not numerical but can be categorized
categories may be identified by either non-numerical descriptions or by numeric codes
Types of Variable: Quantitative
indicates the quantity or amount of a characteristic
data are always numeric
can be discrete or continuous
Types of Quantitative Variables
Discrete – variable with a finite or countable number of possible values
Continuous – variable that assumes any value in a given interval or continuum of values
RANDOM VARIABLE
A random variable is a rule or function that assigns exactly one real number to every possible outcome of a random experiment.
Note: The domain of the function is the sample space S and the co-domain is the set of real numbers, $\mathbb{R}$.
[Figure: a mapping from the sample space S to the real number line]
TYPES OF RANDOM VARIABLES
Discrete random variables take on a finite set of distinct possible values or a countably infinite number of possible values.
Continuous random variables take on any value within a specified interval or continuum of values.
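As an illustration of a discrete random variable, the sketch below treats two tosses of a coin as the random experiment and defines X as the number of heads. The coin example and the assumption that all four outcomes are equally likely are not from the slides; they are chosen only to make the "function from the sample space to the real numbers" idea concrete.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S of the experiment: all outcomes of two coin tosses
sample_space = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ...

def X(outcome):
    """Random variable: assigns to each outcome its number of heads."""
    return outcome.count("H")

# Probability distribution of X, assuming equally likely outcomes
counts = Counter(X(outcome) for outcome in sample_space)
distribution = {
    value: Fraction(count, len(sample_space))
    for value, count in sorted(counts.items())
}

for value, prob in distribution.items():
    print(f"P(X = {value}) = {prob}")
```

X takes only the values 0, 1, and 2, so it is discrete; a measurement such as height, which can take any value in an interval, would be continuous.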
Basic Concepts in Sampling
• SAMPLING – the process of selecting a sample
• PARAMETER – descriptive measure of the population
• STATISTIC – descriptive measure of the sample
• INFERENTIAL STATISTICS – concerned with making generalizations about parameters using statistics
WHY DO WE USE SAMPLES?
1. Reduced Cost
2. Greater Speed or Timeliness
3. Greater Efficiency and Accuracy
4. Greater Scope
5. Convenience
6. Necessity
7. Ethical Considerations
TWO TYPES OF SAMPLES
• Non-Probability Samples
• Probability Samples
Non-Probability Samples
• Samples are obtained haphazardly, selected purposively or are taken as volunteers.
• The probabilities of selection are unknown.
• They should not be used for statistical inference.
• They result from the use of judgment sampling, accidental sampling, purposive sampling, and the like.
Three commonly employed non probability samples include judgment samples, voluntary samples, and convenience samples.
1. Judgment samples- Sample selection is sometimes based on the opinion of one or more persons who feel qualified to identify items for a sample as being characteristic of the population. Example: a political campaign manager intuitively picks certain voting districts as reliable places to measure the public’s opinion of her candidate. The poll that is taken from these districts is a judgment sample based on the campaign manager’s expertise and experience.
2. Voluntary sample- sometimes questions are posed to the public by publishing them in print media or by broadcasting them over the radio or the television. Dialing one number indicates yes, while the other indicates no. Such polls produce voluntary samples and attract only those who are interested in the subject matter.
3. Convenience samples- Often people want to take an easy sample. For example, a surveyor will stand in one location and ask passersby their question or questions. Or a student working on a project will ask the entire class to fill out a survey questionnaire. Samples taken at the convenience of the surveyor are called convenience samples.
Probability Samples …
• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe called the sampling frame.
• The probabilities of selection are known.
• They are generally referred to as random samples.
• They allow drawing of valid generalizations about the universe/population.
Probability Sample
A probability sample is one in which the chance of selection of each item in the population is known before the sample is picked.
Types of probability samples
1. Simple random sample- A probability sample chosen in such a way that all possible groupings of a given size have an equal chance of being picked, and each item in the population has an equal chance of being selected.
2. Systematic samples- Suppose we have a list of 1,000 registered voters in a community and we want to pick a probability sample of 50. We can use a random number table to pick one of the first 20 voters (1,000/50 = 20) on our list. If the table gave us the number 16, then the 16th voter on the list would be the first to be selected. We would then pick every 20th name after this random start (the 36th voter, 56th voter, etc.) to produce a systematic sample.
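The voter example can be sketched in a few lines. This is an illustrative sketch only: the function name `systematic_sample` and its `start` parameter are hypothetical, and the voter list is simply the numbers 1 to 1,000 standing in for names. With N = 1,000 and n = 50, the sampling interval is k = N/n = 20, so a random start of 16 selects positions 16, 36, 56, and so on.

```python
import random

def systematic_sample(frame, n, start=None):
    """Select n units from a list by taking every k-th unit after a random start.

    frame: the sampling frame (a list of units)
    n:     desired sample size
    start: 1-based position of the first unit; chosen at random from 1..k if omitted
    """
    k = len(frame) // n  # sampling interval, k = N/n
    if start is None:
        start = random.randint(1, k)  # random start among the first k units
    positions = range(start, start + k * n, k)
    return [frame[p - 1] for p in positions]

# Hypothetical frame: voters identified by their position on the list
voters = list(range(1, 1001))
sample = systematic_sample(voters, n=50, start=16)
print(sample[:3], "...", sample[-1])
```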
3. Stratified samples- If the population is divided into relatively homogeneous groups, or strata, and a sample is drawn from each group to produce an overall sample, this overall sample is known as a stratified sample. Stratified sampling is usually performed when there is a large variation within the population and the researcher has some prior knowledge of the structure of the population that can be used to establish the strata. The sample results from each stratum are weighted and combined with the sample results of the other strata to provide the overall estimate.
4. Cluster samples- A cluster sample is one in which the individual units to be sampled are actually groups or clusters of items. It is assumed that the individual items within each cluster are representative of the population. Example: consumer surveys in big cities employ cluster sampling. They divide the city into blocks, each block containing a cluster of households to be surveyed. A number of clusters are selected for the sample, and the households in the selected clusters are surveyed.
METHODS OF PROBABILITY SAMPLING
Simple Random Sampling
Stratified Random Sampling
Systematic Random Sampling
Cluster Sampling
SIMPLE RANDOM SAMPLING (SRS)
• Most basic method of drawing a probability sample
• Assigns equal probabilities of selection to each possible sample
• Results in a simple random sample
TYPES OF SIMPLE RANDOM SAMPLE (SRSWOR)
SRS Without Replacement (SRSWOR) – does not allow repetitions of selected units in the sample
TYPES OF SIMPLE RANDOM SAMPLE (SRSWR)
SRS With Replacement (SRSWR) – allows repetitions of selected units in the sample
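The difference between the two types can be shown with Python's standard `random` module. This is a small sketch with a made-up frame of 20 numbered units: `random.sample` draws without replacement, while `random.choices` draws with replacement.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 20 numbered units
units = list(range(1, 21))

# SRSWOR: a selected unit cannot appear again in the sample
srswor = random.sample(units, k=5)

# SRSWR: a selected unit is "returned" and may be selected again
srswr = random.choices(units, k=5)

print("without replacement:", srswor)
print("with replacement:   ", srswr)
```

A sample drawn with replacement may or may not contain repeats in any given run; a sample drawn without replacement never does.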
STRATIFIED RANDOM SAMPLING
The universe is divided into L mutually exclusive sub-universes called strata.
Independent simple random samples are obtained from each stratum.
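Note: $N = \sum_{h=1}^{L} N_h$ and $n = \sum_{h=1}^{L} n_h$, where $N_h$ and $n_h$ are the population size and the sample size of stratum h.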
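A stratified random sample can be sketched as one independent simple random sample per stratum. The slides do not say how the overall sample size n is split among strata; the sketch below assumes proportional allocation (each stratum gets a share of n equal to its share of N), and the function name and strata are hypothetical. Rounding can make the stratum sizes sum to slightly more or less than n in general.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw an independent SRS (without replacement) from each stratum.

    strata: dict mapping stratum name -> list of units in that stratum
    n:      overall sample size, split by proportional allocation (an assumption)
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())  # N = sum of the N_h
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)  # stratum sample size n_h
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical universe with L = 3 strata of sizes 500, 300, and 200
strata = {
    "urban": [f"urban_{i}" for i in range(500)],
    "suburban": [f"suburban_{i}" for i in range(300)],
    "rural": [f"rural_{i}" for i in range(200)],
}
result = stratified_sample(strata, n=50, seed=0)
sizes = {name: len(units) for name, units in result.items()}
print(sizes)
```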
ILLUSTRATION
Stratified Random Sample
Advantages of Stratification
1. It gives a better cross-section of the population.
2. It simplifies the administration of the survey/data gathering.
3. The nature of the population dictates some inherent stratification.
4. It allows one to draw inferences for various subdivisions of the population.
5. Generally, it increases the precision of the estimates.
SYSTEMATIC SAMPLING
Adopts a skipping pattern in the selection of sample units
Gives a better cross-section if the listing is linear in trend but has high risk of bias if there is periodicity in the listing of units in the sampling frame
Allows the simultaneous listing and selection of samples in one operation
ILLUSTRATION
[Figure: a systematic sample drawn from the population]
Determine the sampling interval, k = N/n
CLUSTER SAMPLING
• It considers a universe divided into N mutually exclusive sub-groups called clusters.
• A random sample of n clusters is selected and their elements are completely enumerated.
• It has simpler frame requirements.
• It is administratively convenient to implement.
ILLUSTRATION
[Figure: a cluster sample drawn from the population]
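Cluster sampling can be sketched as a random selection of whole clusters followed by complete enumeration of the selected clusters. The city blocks and households below are made up for illustration, and the function name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and enumerate all of their elements.

    clusters:   dict mapping cluster name -> list of elements in that cluster
    n_clusters: number of clusters to select
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical city: 10 blocks, each a cluster of 5 households
blocks = {
    f"block_{i}": [f"household_{i}_{j}" for j in range(1, 6)]
    for i in range(1, 11)
}
selected = cluster_sample(blocks, n_clusters=3, seed=0)
for block, households in selected.items():
    print(block, households)
```

Only a list of blocks is needed to draw the sample, not a list of every household, which is why the frame requirements are simpler.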
What is a sampling error?
If we want to make a judgment about a population from a sample, we want the sample results to be as typical of the population as possible. This is difficult to achieve, however, and we have to live with sampling errors. Errors can also come from the coding and recoding of data. Results obtained from a biased sample are worthless.
How to determine a sample size
Use the formula
$$n = \frac{2^2 PQ}{d^2}$$
where P is the proportion of the target population that is based on prior information, Q is (1 - P), and d is the degree of error that is defined by the investigator.
Example: with P = 50% and d = 3%, n = (50%)(50%)/(3%/2)² ≈ 1,111, or about 1,200.
If N is known, then adjust to n* = n/(1 + n/N).
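The sample size arithmetic can be checked with a short sketch. It reads the slide's formula as n = 2²PQ/d², which matches the worked example (P = 50%, d = 3%, n ≈ 1,111); the multiplier 2 is taken from the slide and is close to the familiar 1.96 for 95% confidence. The function names and the population size N = 10,000 used for the adjustment are assumptions for illustration.

```python
def sample_size(p, d, z=2.0):
    """Sample size n = z^2 * P * Q / d^2, with Q = 1 - P.

    p: proportion based on prior information
    d: degree of error defined by the investigator (as a proportion)
    z: multiplier; the slide uses 2
    """
    q = 1 - p
    return (z ** 2) * p * q / d ** 2

def adjust_for_population(n, N):
    """Adjusted sample size n* = n / (1 + n/N) when the population size N is known."""
    return n / (1 + n / N)

n = sample_size(p=0.5, d=0.03)
n_star = adjust_for_population(n, N=10_000)  # N is a made-up population size

print(f"n  = {n:.0f}")
print(f"n* = {n_star:.0f}")
```

In practice the result is rounded up to a whole number of respondents.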
Concept of Hypothesis testing
The objective of hypothesis testing is to determine whether or not the sample data support some belief or hypothesis about the population. In hypothesis testing, we make assumptions about the unknown parameters.
Hypothesis testing has five steps:
1. Formulating the hypothesis
2. Selecting the statistical analysis model to be used
3. Setting the criteria for rejecting the null hypothesis
4. Analysis
5. Making a decision
Formulating the hypothesis
There are two types of hypotheses: the null hypothesis (Ho) and the alternative hypothesis (Ha).
The alternative hypothesis is the hypothesis that the researcher wants to prove. The purpose of the test is to determine whether or not the evidence provided by the sample is enough to establish that the null hypothesis is not true. If there is enough such evidence, then we say that there is evidence to support the alternative hypothesis.
Examples of types of hypothesis
a. Hypothesis concerning the value of the population mean: $H_0: \mu = \mu_0$
b. Hypothesis concerning the value of the difference in the means of two populations: $H_0: \mu_1 = \mu_2$
c. Hypothesis concerning the relationship between two nominal-scale variables: $H_0: \chi^2 = 0$
There are also three possible forms of the alternative hypothesis:
Ha: ≠ 0 (that is, < 0 or > 0)
Ha: > 0
Ha: < 0
Selecting the statistical model to be used
Having specified the null and alternative hypotheses, we then select the appropriate test statistic or statistical model to be used. The choice of statistic depends on a number of factors: (1) the nature of the hypothesis problem, (2) the level of measurement used, and (3) the assumptions of normality.
Some of the most frequently used statistical tests are:
• the t-test
• the z-test
• the chi-square test
Setting the criteria for rejecting the null hypothesis
This involves two things: (1) selecting a significance level and (2) determining the area of rejection.
The level of significance is the probability of rejecting the null hypothesis when it is true. This error is called the Type I error or the α error. (The Type II error or the β error is accepting the null hypothesis when it is not true.) We select the level of significance before we compute the test statistic, and we need to select a level that we think is reasonable. The decision as to which significance level to use depends on the questions involved. Social scientists routinely accept a probability of 0.05 for rejecting a true null hypothesis. If a statistical test would lead to significant policy recommendations, then you may wish to reduce the risk of being in error and specify a significance level of 0.01 or even 0.001.
TWO TYPES OF ERROR
A TYPE I ERROR is committed when we reject a true null hypothesis.
A TYPE II ERROR is committed when we accept a false null hypothesis.
PROBABILITY OF COMMITTING ERRORS
The PROBABILITY OF A TYPE I ERROR is usually denoted by α:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ is TRUE})$$
It is also known as the level of significance of a statistical test.
The PROBABILITY OF A TYPE II ERROR is usually denoted by β:
$$\beta = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_0 \text{ is FALSE})$$
DECISION MATRIX
ACTION       Ho is true           Ho is false
Reject Ho    Type I error (α)     Correct decision
Accept Ho    Correct decision     Type II error (β)
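The meaning of α can be illustrated by simulation. The sketch below is not from the slides: it assumes a hypothetical population that is normal with known standard deviation, a two-tailed z-test at α = 0.05 (critical value 1.96), and a null hypothesis that is in fact true. Every rejection is therefore a Type I error, and the share of rejections should be close to 0.05.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical setting in which Ho is TRUE: the population mean equals mu0
mu0, sigma, n = 100.0, 15.0, 30
z_crit = 1.96   # two-tailed critical value for alpha = 0.05
trials = 5_000  # number of repeated studies (assumption)

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:  # test statistic falls in the critical region
        rejections += 1  # Ho is true, so this rejection is a Type I error

type1_rate = rejections / trials
print(f"share of true null hypotheses rejected: {type1_rate:.3f}")
```

Choosing a smaller significance level, such as 0.01, would lower this share at the cost of making a false null hypothesis harder to reject.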
Area of Rejection
Based on the significance level we choose, we then delineate our region of acceptance and region of rejection. The region of rejection is also called the critical region. Outcomes falling here mean we reject the null hypothesis. Our critical region will also depend on whether we are doing a right tailed test, a left tailed test or a two-tailed test.
If our alternative hypothesis involves the > sign, we use the right-tailed test. When our alternative hypothesis involves the < sign, we use the left-tailed test. When our alternative hypothesis involves the ≠ sign, we use the two-tailed test.
Analysis
The analysis part is the process of computing for our test statistic based on the assumptions we made and the data we have.
Making a decision
In assessing the null hypothesis, we can accept the null hypothesis or reject it in favor of the alternative hypothesis. Our decision will be based on the value of the test statistic we obtain in the analysis stage. If the value of the test statistic is located in the critical region, we reject the null hypothesis in favor of the alternative hypothesis. Our findings may be taken as conclusive even if there is the probability that we may be in error. If the test statistic is located in the acceptance region, we accept the null hypothesis. Our findings are not conclusive. We simply do not have enough evidence to prove our alternative hypothesis.
Example:
When the judge pronounces the defendant “guilty”, the defendant serves the sentence imposed even if there is a probability that he is not actually guilty. But when the judge hands down a verdict of not guilty, it is usually not because it has been proven beyond reasonable doubt that he is not guilty. There is simply not enough evidence to prove that the defendant is guilty.
Example of hypothesis testing: The T distribution
Compare the academic achievement of the foreign students and the total student population
Given: student body GPA=2.0, variance is unknown; foreign student GPA=2.58, s= 1.23, n=30
Assumptions: random sampling; the sampling distribution is normal.
Step 1. Stating the null hypothesis: Ho: µ = 2.00, H1: µ ≠ 2.00.
Step 2. Selecting the sampling distribution and establishing the critical region: use the t statistic, and define the probability of error, α = 0.01, for a two-tailed test with degrees of freedom = (n - 1) = 29.
Step 3. Critical region: t = ±2.756
Step 4. Computing the test statistic:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{2.58 - 2.00}{1.23 / \sqrt{29}} = \frac{0.58}{0.23} = +2.52$$
Step 5. Making a decision: do not reject Ho, because the computed t (2.52) does not fall in the critical region beyond ±2.756. The difference between the sample mean (2.58) and the population mean (2.0) is no greater than is expected if only random chance were operating.
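The arithmetic of this example can be reproduced with a short sketch. It uses the formula implied by the slide's numbers, t = (x̄ - µ)/(s/√(n - 1)), and takes the critical value 2.756 from the example rather than computing it. Without the intermediate rounding of the denominator to 0.23, the statistic comes out near 2.54 rather than 2.52; the decision is the same.

```python
import math

# Values given in the example
sample_mean = 2.58  # mean GPA of the foreign students
mu0 = 2.00          # student body GPA under Ho
s = 1.23            # sample standard deviation
n = 30              # sample size

t_crit = 2.756      # two-tailed critical value from the example (alpha = 0.01, df = 29)

standard_error = s / math.sqrt(n - 1)
t = (sample_mean - mu0) / standard_error

# Two-tailed test: reject Ho only if t falls beyond +/- t_crit
reject = abs(t) > t_crit

print(f"t = {t:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```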
Introduction to the Chi square test of independence
The Chi square ($\chi^2$) test of independence is a very general test that is used to evaluate whether or not frequencies which have been empirically obtained differ significantly from those which would be expected if no relationship between the variables existed.
Chi square test of independence
Suppose we want to look at the relationship between religious affiliation and political affiliation. Suppose that, for this purpose, we selected a random sample of 100 Iglesia ni Cristo (INC) members and 100 Jesus is Lord (JIL) members. We asked each of them how they voted during the last Presidential elections, and put the results in a bivariate table.
          INC           JIL           Total
LAMMP     80 (80%)      40 (40%)      120
LAKAS     20 (20%)      60 (60%)      80
Total     100 (100%)    100 (100%)    200
If we examine the percentages, we see that of the 100 INC members, a larger proportion (80%) voted LAMMP, while of the 100 JIL members, a larger proportion (60%) voted LAKAS. It seems that there was a great tendency for INC members to vote LAMMP and JIL members to vote LAKAS. Based on these sample results, can we conclude that there is a relationship between religious affiliation and political affiliation?
The Chi-square test of independence is a technique for testing the level of statistical significance obtained by a bivariate relationship in a cross tabulation. It can apply to any level of measurement as this relationship can always be put in a bivariate or contingency table.
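As a rough sketch of how the test works on the table above, the code below computes the expected frequency of each cell under no relationship (row total × column total / grand total) and sums (observed - expected)²/expected over the cells. The slides stop before carrying out this computation, so the 0.05 significance level is an assumption; 3.841 is the standard chi-square table value for 1 degree of freedom at that level.

```python
# Observed frequencies from the bivariate table
observed = {
    "LAMMP": {"INC": 80, "JIL": 40},
    "LAKAS": {"INC": 20, "JIL": 60},
}

row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for group, count in cells.items():
        col_totals[group] = col_totals.get(group, 0) + count
grand_total = sum(row_totals.values())

chi_square = 0.0
for party, cells in observed.items():
    for group, count in cells.items():
        # Frequency expected if religious and political affiliation were unrelated
        expected = row_totals[party] * col_totals[group] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (rows - 1)(columns - 1)
critical_value = 3.841  # chi-square table value for df = 1, alpha = 0.05 (assumed level)

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject Ho" if chi_square > critical_value else "do not reject Ho")
```

The expected frequencies here are 60 and 40 in each column, so the observed counts of 80 and 20 for INC and 40 and 60 for JIL sit far from what independence would predict.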
END – PART 1