7/24/2019 Lecture 1 Stat Review
0. Introduction
What is econometrics?
Econometrics is the application of statistics and economic theory to data in order to test economic hypotheses.
Economic theory describes relationships between economic variables.
For example, the law of demand tells us that as prices go down, the quantity demanded will go up.
However, as the owner of a firm or as a policymaker, we are often interested in the magnitude of the relationship between two variables.
For example, if cigarette taxes increase, the quantity demanded falls. By how much? What will be the impact on tax revenues?
To answer these questions, we need to know something about the empirical relationship between cigarette prices and cigarette demand.
We could ask a variety of other questions:
1) What is the impact of education on earnings?
2) How much do increases in government transfers (e.g., TANF) reduce work effort?
3) What is the effect of an increased police force on the amount of crime committed in a city?
Econometrics is also useful for forecasting.
1) Firms forecast revenues and costs.
2) Governments forecast consumer spending and unemployment rates.
Does econometrics always give the right answer?
Suppose we write Earnings = β₀ + β₁(Years of Education), where β₁ is the statistical relationship between years of education and earnings. One more year of education will increase earnings by β₁.
However, we only have observational data to estimate this statistical relationship or correlation.
We typically will not be analyzing a randomized experiment.
Does one more year of school really cause earnings to increase?
Or, do more able people, who would have earned more anyway, get more education?
We will have to rely both on economic theory as well as our understanding of econometric theory to interpret our findings.
Is econometrics the same as program evaluation?
Program evaluation undertakes an examination of a program (or policy) through the study of the program's goals, processes, and outcomes.
For example, an evaluation of the Pittsburgh Promise program, which provides scholarships and other college-related support to graduates of Pittsburgh Public Schools, would likely include a study of whether the program increased the educational attainment of city school graduates.
Such an evaluation would implement statistical and econometric methodologies as part of the study.
While economists would find the results of this evaluation very useful, they would also be interested in knowing whether this program informs us about the relationship between educational attainment and outcomes of interest to economists such as wages, crime, intergenerational outcomes, etc.
Example
In 1973, the Indonesian government decided that it was important to provide equity across the country's provinces.
Indonesia undertook a massive school building program in which over 61,000 primary schools were built within the next six years.
The intent of the program was to target new schools in areas where enrollments were previously low, which was likely due, in part, to the long distances students had to travel to attend school.
Between 1973 and 1978, the school enrollment rates of 7 to 12 year old Indonesians rose from 69 percent to 83 percent.
From the perspective of whether or not the program increased education levels in Indonesia, it appears to have been successful.
From an economist's viewpoint, this program can be used to ask a question of great interest, such as: does increased education raise wages?
Duflo (2001) uses the Indonesian school building program to answer precisely this question.
The idea is that this program is effectively an experiment in that it raised education levels in some parts of Indonesia but not in others.
In terms of an experiment, children who reside in areas where school building increased are the treatment group, while those in areas where no new schools were built are the control group.
She is able to study whether the increase in education causes an increase in the wages of those affected by the program.
In addition, we can also use economic theory to think about how the program might impact those who were not directly affected by it.
An increase in the supply of educated workers will shift the labor supply curve and therefore lead to a new, lower equilibrium wage, which will indirectly affect those born before the school building program.
Duflo (2004) examines the impact of the school building program on those born before the program took effect in their province.
She finds that the increase in educated workers due to the program reduces the wages of workers in older age cohorts by 4 to 10 percent.
By thinking through the economic theory for how an increased supply of workers will affect the economy overall, we can find implications for how those who do not participate in a program may be affected.
The goal of this course is to impart a basic understanding of econometric theory in order to be able to interpret the findings from studies that implement econometric methodologies.
As mentioned earlier, not all studies will use true experiments or natural experiments to estimate the impact of a program or policy.
As such, we will require a number of assumptions to be maintained in order for these observational studies to have a causal interpretation.
Therefore, it is very important to understand the theory behind the methods that we will learn, the assumptions that they require, under what circumstances these assumptions are violated, and what, if anything, we can learn when the assumptions are incorrect.
The empirical examples in class and the empirical exercises in homework assignments aim to link the theory you learn in class with applications that illustrate these important issues.
I. Statistical Review
For this course, we will assume that everyone understands basic probability and statistics.
However, we will spend the first two or three classes reviewing these concepts for two reasons.
First, we want to be certain that everyone has seen the same topics presented in a similar manner before moving on to econometrics.
Second, many of the statistical concepts you have previously seen will be applied and extended in econometrics.
By reviewing these concepts, it will be much easier to see the parallels between what you already know and how those ideas are applied.
Appendix B Fundamentals of Probability
Section B.1 Random Variables and Their Probability Distributions
A random variable is a variable whose value is determined by the outcome of an experiment.
A discrete random variable takes on a finite or a countably infinite number of values.
Examples
Tossing a coin, rolling a pair of dice, drawing a card.
A discrete random variable, X, is described by its probability density function (pdf), denoted f(x), which is a list of all of the values the random variable can take on and the associated probabilities:
f(x_j) = P(X = x_j), j = 1, 2, ..., k,
where x_j can be one of the k possible values.
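As a small illustration (my own, not from the slides), a Python sketch of the pdf of a fair six-sided die, where each of the k = 6 possible values has probability 1/6:

```python
# Hypothetical example: the pdf of a fair six-sided die.
from fractions import Fraction

def die_pdf(x):
    """f(x) = P(X = x) for a fair die; zero for values off the support."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

# The probabilities over the full support must sum to one.
total = sum(die_pdf(x) for x in range(1, 7))
print(total)  # 1
```

Any valid discrete pdf must satisfy this adding-up condition, which is why the check is worth doing explicitly.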
A continuous random variable has a sample space that contains an uncountably infinite number of outcomes.
Examples
Temperature, height, and an amount of time.
However, the probability that a continuous random variable takes on any particular value exactly is zero.
Thus, for continuous random variables, we work with the cumulative distribution function (cdf), which is written as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,
where f(t) is the continuous pdf.
Section B.2 Joint and Conditional Distributions, and Independence
Let X and Y be discrete random variables.
The joint distribution of X and Y is fully described by their joint probability density function, f(x, y) = P(X = x, Y = y).
The random variables X and Y are independent if and only if their joint pdf can be written as
f(x, y) = f_X(x) f_Y(y),
where f_X and f_Y are the marginal pdfs for X and Y, respectively.
We will not examine the joint pdf of continuous random variables this semester, which is why it is not discussed here.
In economics, we are often interested in the pdf of one random variable given a particular value of another random variable.
The conditional pdf of Y given X is defined as
f_{Y|X}(y|x) = f(x, y) / f_X(x).
Notice that f_{Y|X}(y|x) is only defined if f_X(x) > 0.
When both random variables are discrete, we can write
f_{Y|X}(y|x) = P(Y = y | X = x),
which is read as the probability that Y = y given that X = x.
When X and Y are independent, knowing the value of X provides no information about Y, and vice versa, so that
f_{Y|X}(y|x) = f_Y(y) and f_{X|Y}(x|y) = f_X(x).
Section B.3 Features of Probability Distribution Functions
Expected Value
The expected value, or mean, of a random variable X that takes on k discrete values x_1, x_2, ..., x_k is
E(X) = Σ_{j=1}^{k} x_j f(x_j),
where f(x) is the pdf of X.
If X is a continuous random variable, then
E(X) = ∫_{−∞}^{∞} x f(x) dx.
We write E(X) = μ_X, or sometimes μ, and refer to μ as the population mean.
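To make the discrete formula concrete, a quick sketch (mine, not from the lecture) computing E(X) = Σ x_j f(x_j) for a fair six-sided die:

```python
# Hypothetical example: E(X) for a fair die is (1+2+...+6)/6 = 7/2.
from fractions import Fraction

values = range(1, 7)
pdf = {x: Fraction(1, 6) for x in values}  # uniform pdf over the support

expected_value = sum(x * pdf[x] for x in values)
print(expected_value)  # 7/2
```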
We can also compute the expected value of a function, g(X), of the random variable X.
If X is a discrete random variable, then the expected value of the random variable g(X) is given by
E[g(X)] = Σ_{j=1}^{k} g(x_j) f(x_j).
If X is a continuous random variable, then the expected value of the random variable g(X) is
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
Properties of the Expected Value
1) For any constant c, E(c) = c.
2) For any constants a and b, E(aX + b) = aE(X) + b.
3) If a_1, a_2, ..., a_n are constants and X_1, X_2, ..., X_n are random variables, then
E(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + ... + a_n E(X_n).
Alternatively, we can write this expression as
E(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i E(X_i).
Variance
The variance measures the dispersion of a pdf.
The variance is the expected value of the squared difference between a value of X and the mean of the distribution:
Var(X) = E[(X − μ)²].
We can apply the formulas for the expected value of a function of X to compute the variance.
For example, the variance of a discrete random variable is
Var(X) = Σ_{j=1}^{k} (x_j − μ)² f(x_j).
Properties of Var(X)
1) If c is a constant, then Var(c) = 0.
2) If a and b are constants, then Var(aX + b) = a²Var(X).
One issue with using the variance is that its units are the square of the units of the random variable.
For example, if the random variable X is measured in feet, then Var(X) is measured in feet squared.
In some instances it is useful to work with the positive square root of the variance, which is known as the standard deviation and is denoted sd(X) = σ.
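A small numerical check (my own example, not from the slides) of property 2, Var(aX + b) = a²Var(X), using the pdf of a fair die:

```python
# Hypothetical example: verify Var(aX + b) = a^2 Var(X) on a fair die.
from fractions import Fraction

values = list(range(1, 7))
f = Fraction(1, 6)  # pdf of a fair die assigns 1/6 to each value

def mean(xs):
    return sum(x * f for x in xs)

def var(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 * f for x in xs)

a, b = 3, 10
var_x = var(values)
var_ax_b = var([a * x + b for x in values])
print(var_ax_b == a ** 2 * var_x)  # True
```

Using exact fractions avoids rounding, so the equality holds exactly rather than approximately.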
Section B.4 Features of Joint and Conditional Distributions
Covariance
The covariance is a measure of how much two random variables move together (co-vary):
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].
Notice that if Y tends to be above its mean when X is above its mean, then Cov(X, Y) > 0.
Similarly, if Y tends to be below its mean when X is above its mean, or vice versa, then Cov(X, Y) < 0.
Correlation Coefficient
The correlation coefficient offers an advantage over the covariance since it is on a rather intuitive scale:
Corr(X, Y) = Cov(X, Y) / [sd(X) sd(Y)].
Notice that Corr(X, Y) will have the same sign as Cov(X, Y).
In addition, −1 ≤ Corr(X, Y) ≤ 1.
Whereas Cov(X, Y) can take on any real value, Corr(X, Y) allows us to scale the degree to which two variables co-vary.
Corr(X, Y) = +1 means X and Y are perfectly positively correlated.
Corr(X, Y) = −1 means X and Y are perfectly negatively correlated.
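As a sketch of my own (using sample analogues of the population formulas above), the correlation of data that are exact linear functions of x hits the ±1 bounds:

```python
# Hypothetical example: sample covariance and correlation of linearly
# related data, which gives correlations of +1 and -1.
import math

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    return cov(xs, ys) / (math.sqrt(cov(xs, xs)) * math.sqrt(cov(ys, ys)))

x = [1, 2, 3, 4, 5]
print(corr(x, [2 * v + 1 for v in x]))   # approximately +1
print(corr(x, [-3 * v for v in x]))      # approximately -1
```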
Conditional Expectation
While the covariance and correlation treat the relationship between X and Y symmetrically, in many instances we will be interested in explaining one variable in terms of another variable.
For example, we may be interested in knowing whether earnings depend upon an individual's level of education.
One set of statistics we might compute is the expected amount of earnings for people conditional on their levels of education.
The conditional expectation of a discrete random variable Y given X = x, where Y takes on m different values y_1, y_2, ..., y_m, is
E(Y | X = x) = Σ_{j=1}^{m} y_j f_{Y|X}(y_j | x).
5) E(Y | X) = E[E(Y | X, Z) | X]
This property is a more general version of the law of iterated expectations.
6) If E(Y | X) = E(Y), then Cov(X, Y) = 0 (and Corr(X, Y) = 0). Moreover, every function of X is uncorrelated with Y.
Note that the converse of this last property is not true; if Cov(X, Y) = 0, then it is possible that E(Y | X) depends on X.
Combining these last two properties, notice that if U and X are random variables where E(U | X) = 0, then
i. E(U) = 0, since E(U) = E[E(U | X)] = E(0) = 0.
ii. Cov(U, X) = 0, i.e., U and X are uncorrelated, since E(U | X) = 0 = E(U).
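The law of iterated expectations can be checked numerically on a tiny discrete joint pdf (an example of my own construction, not from the slides):

```python
# Hypothetical joint pdf f(x, y) for X in {0, 1} and Y in {1, 2}.
joint = {
    (0, 1): 0.10, (0, 2): 0.30,
    (1, 1): 0.40, (1, 2): 0.20,
}

def marginal_x(x):
    """f_X(x) = sum over y of f(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_exp_y(x):
    """E(Y | X = x) = sum over y of y * f(x, y) / f_X(x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)

# E(Y) computed directly equals E(Y) computed by averaging E(Y | X = x)
# with weights f_X(x): the law of iterated expectations.
ey_direct = sum(y * p for (_, y), p in joint.items())
ey_iterated = sum(cond_exp_y(x) * marginal_x(x) for x in (0, 1))
print(ey_direct, ey_iterated)
```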
4) Any linear combination of independent, identically distributed normal random variables has a normal distribution.
This last property has implications for the average of independent, identically distributed normal random variables.
If Y_1, Y_2, ..., Y_n are independent random variables, each of which is distributed Normal(μ, σ²), then the average of the random variables,
Ȳ = (1/n) Σ_{i=1}^{n} Y_i,
is normally distributed.
Furthermore, E(Ȳ) = μ and Var(Ȳ) = σ²/n, so that Ȳ ~ Normal(μ, σ²/n).
The Chi-Square Distribution
Let X = Σ_{i=1}^{n} Z_i², where Z_1, Z_2, ..., Z_n are independent standard normal random variables.
Then X follows the chi-square distribution with n degrees of freedom (which is a special case of the gamma distribution), which we write as X ~ χ²_n.
Degrees of freedom generally refers to the number of independent pieces of information used to create a random variable.
We will use the abbreviation d.f. to refer to degrees of freedom.
If a random variable X is distributed X ~ χ²_n, then it has an expected value of n and a variance of 2n.
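A simulation sketch (my own, with an arbitrary choice of n = 5) confirms the mean-n, variance-2n property by building chi-square draws as sums of squared standard normals:

```python
# Hypothetical example: simulate X = Z_1^2 + ... + Z_n^2 for n = 5.
import random

random.seed(0)  # fixed seed so the check is reproducible

n_df, reps = 5, 50_000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n_df)) for _ in range(reps)]

mean_hat = sum(draws) / reps
var_hat = sum((d - mean_hat) ** 2 for d in draws) / reps
print(round(mean_hat, 1), round(var_hat, 1))  # close to n = 5 and 2n = 10
```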
The F Distribution
Suppose U and V are independent chi-square random variables with n and m degrees of freedom, respectively.
A random variable of the form
F = (V/m) / (U/n)
is said to have an F distribution with m and n degrees of freedom.
We will use the notation F ~ F_{m,n} to denote an F random variable with m and n degrees of freedom.
The t Distribution
Let Z be a standard normal random variable.
Let X be a chi-square random variable, independent of Z, which has n degrees of freedom.
The Student's t ratio with n degrees of freedom is
T = Z / √(X/n).
The t distribution with n degrees of freedom has an expected value of zero and a variance of n/(n − 2) for n > 2.
The standard normal distribution and the t distribution have a similar shape.
Both have an expected value of zero, and the variance of the t distribution, n/(n − 2), converges to 1 as n → ∞.
[Figure: probability density functions of the Z ratio and the t ratio with 4 d.f. and 10 d.f.]
Sampling
In many instances, we will be interested in knowing the value of one or more population parameters.
For example, if we want to know about the degree of income inequality in society, we would be curious to know about the expected value and variance of the population income distribution.
If we have a Census, then we would be able to learn the true characteristics of the income distribution.
However, interviewing everyone in the population is a very costly exercise in terms of both time and money.
Random Sampling
Instead, we will observe a sample of the population and use the sample to generate our best guess as to what the true characteristics of the population distribution actually are.
Suppose that Y is a random variable with a probability density function f(y; θ), where θ is an unknown parameter.
A random sample from f(y; θ) is n observations, {Y_1, Y_2, ..., Y_n}, that are drawn independently from the pdf f(y; θ).
We sometimes refer to the random sample Y_1, Y_2, ..., Y_n as independent, identically distributed (i.i.d.) random variables.
Section C.2 Finite Sample Properties of Estimators
We now turn to estimators of population parameters and note that there are two types of properties of these estimators.
The first set of properties is finite sample properties, which are sometimes referred to as small sample properties.
The latter title is somewhat misleading since it refers to samples of any size, whether the number of observations is small or large.
The second set of properties is asymptotic properties, which refer to the behavior of estimators as the sample size approaches infinity.
Estimators and Estimates
Any function of a random sample whose objective is to approximate a parameter is called an estimator.
Example
Suppose that {Y_1, Y_2, ..., Y_n} is a random sample from a population with a mean of μ.
The sample average,
Ȳ = (1/n) Σ_{i=1}^{n} Y_i,
is an estimator of the unknown population mean, μ.
After we collect the actual data, {y_1, y_2, ..., y_n}, and we compute the estimator by using the values that we measure in the sample, the resulting value is known as an estimate.
We define the bias of an estimator W of θ as
Bias(W) = E(W) − θ.
Example
For the sample average, Ȳ, we have already seen that E(Ȳ) = μ.
Therefore, we can compute the bias of Ȳ:
Bias(Ȳ) = E(Ȳ) − μ = μ − μ = 0.
The bias of the sample average is 0, which means, as we have already seen, that Ȳ is unbiased.
Example
We can compute the sampling variance of the sample average, Ȳ:
Var(Ȳ) = Var((1/n) Σ_{i=1}^{n} Y_i) = (1/n²) Σ_{i=1}^{n} Var(Y_i) = (1/n²)(nσ²) = σ²/n.
Notice that the sampling variance of Ȳ gets smaller as the sample size, n, gets larger.
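A simulation sketch (my own, with arbitrary values σ = 2 and n = 25) illustrates Var(Ȳ) = σ²/n by drawing many samples of size n and computing the variance of the resulting sample averages:

```python
# Hypothetical example: sampling variance of the sample average.
import random

random.seed(1)  # fixed seed for reproducibility

mu, sigma, n, reps = 0.0, 2.0, 25, 20_000
ybars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(ybars) / reps
var_ybar = sum((y - grand_mean) ** 2 for y in ybars) / reps
print(round(var_ybar, 3))  # close to sigma^2 / n = 4 / 25 = 0.16
```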
The following graph compares the sampling distributions of two estimators, where the estimator with the smaller sampling variance is shown with the solid red line.
Notice that for a given interval around the true parameter θ, the estimator with the smaller variance has more probability in this range.
Example
As we have seen, the sample average, Ȳ, is an unbiased estimator for μ and has a sampling variance of Var(Ȳ) = σ²/n.
The alternative estimator using only the first observation of the random sample, W = Y_1, is also an unbiased estimator for μ.
The variance of the alternative estimator is Var(W) = σ².
Therefore, Ȳ is more efficient than W since
Var(Ȳ) = σ²/n ≤ σ² = Var(W).
Section C.3 Asymptotic or Large Sample Properties of Estimators
Another useful set of properties of estimators is the asymptotic, or large sample, properties of the estimators.
One useful reason for investigating the asymptotic properties of estimators is that we can examine the performance of an estimator as the sample size grows, which gives us another way to choose between estimators.
Another useful reason for examining asymptotic properties is that determining the sampling distribution in finite samples is rather difficult for some estimators.
However, in many cases, it is easier to determine the asymptotic sampling distribution and to use it as an approximation in order to draw inferences.
Consistency
One useful property for an estimator is that as the sample grows infinitely large, the estimator converges to the true parameter.
Formally, if W_n is an estimator of θ with a sample size n, then W_n is a consistent estimator of θ if, for every ε > 0,
lim_{n→∞} P(|W_n − θ| < ε) = 1.
If W_n is not consistent for θ, then we say it is inconsistent.
In addition, if W_n is consistent, then we say that θ is the probability limit of W_n, which is written as
plim(W_n) = θ.
A useful illustration of consistency is the sample average, Ȳ, from a population with mean μ and variance σ².
We have already seen that Ȳ is unbiased for μ and, in addition, we saw that Var(Ȳ) = σ²/n.
Notice that as n → ∞, Var(Ȳ) → 0.
Therefore, Ȳ is a consistent estimator of μ.
Thus, if Y_1, Y_2, ..., Y_n are independent and identically distributed random variables with mean μ, then
plim(Ȳ) = μ,
which is known as the law of large numbers.
A (biased) alternative estimator for the population mean is
W_n = (1/(n + 1)) Σ_{i=1}^{n} Y_i.
Notice that E(W_n) = [n/(n + 1)]μ, so that as n → ∞, E(W_n) → μ.
In addition, we can show that
Var(W_n) = [n/(n + 1)²]σ².
Notice that as n → ∞, Var(W_n) → 0.
It can be shown that W_n is a consistent (but biased) estimator of μ.
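The law of large numbers behind these consistency arguments can be illustrated by simulation (a sketch of my own, with an arbitrary μ = 3): as n grows, the sample average settles near the population mean.

```python
# Hypothetical example: the sample average converges to mu as n grows.
import random

random.seed(2)  # fixed seed for reproducibility
mu = 3.0

def ybar(n):
    """Sample average of n i.i.d. Normal(mu, 1) draws."""
    return sum(random.gauss(mu, 1.0) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, round(ybar(n), 3))
```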
Asymptotic Normality
In order to draw inferences, we need to know not only the estimator, but also information about the sampling distribution of the estimator.
Many econometric estimators are approximated by the normal distribution as the sample size gets large.
Let {Z_n : n = 1, 2, ...} be a sequence of random variables such that for all numbers z,
P(Z_n ≤ z) → Φ(z) as n → ∞,
where Φ(z) is the standard normal cdf.
Then Z_n is said to have an asymptotic standard normal distribution, which we write as Z_n ~ᵃ Normal(0, 1), where the a stands for either asymptotically or approximately.
Section C.5 Interval Estimation and Confidence Intervals
While estimation of a population parameter generally yields a single number as an estimate, that overlooks the fact that there is uncertainty about the true parameter.
Example
The sample average yields a point estimate, ȳ, of the true population average, μ.
However, simply reporting this point estimate ignores the fact that Ȳ has a sampling distribution.
Instead, we can generate an interval estimate, which is a range in which the true parameter is likely to lie.
Example
Suppose that Y_1, Y_2, ..., Y_n are independent random variables, each of which is distributed Normal(μ, σ²).
We have already seen that
Z = (Ȳ − μ)/(σ/√n) ~ Normal(0, 1).
As we have seen, the sample average, Ȳ, is an unbiased point estimate for the population mean, μ.
How can we use this information to create an interval estimate for the true population mean, μ?
Since (Ȳ − μ)/(σ/√n) ~ Normal(0, 1), we can create an interval that has a 95% probability of containing the population mean, μ.
We call such an interval a 95% confidence interval.
In general, we can create a 100(1 − α)% confidence interval by choosing a level of significance, α.
The smaller the value of α that we choose, the higher our level of confidence.
However, to increase our confidence, we will need a larger interval.
For example, using textbook Appendix Table G.1, we see that the probability that a standard normal random variable falls between −1.96 and +1.96 is 0.95, or
P(−1.96 ≤ Z ≤ 1.96) = 0.95.
[Figure: standard normal density with critical values −z_{0.025} and +z_{0.025}, each tail containing probability 0.025.]
The cdf for the standard normal distribution shown below is similar to Appendix Table G.1 for −3.1 < z < −1.8.
Z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
-2.9   0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7   0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6   0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4   0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0   0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
-1.8   0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
The value of Z combines the row, which gives the integer and tenths place, with the column, which gives the hundredths place.
As the table shows, Φ(−1.96) = 0.025.
The cdf for the standard normal distribution shown below is similar to Appendix Table G.1 for 1.8 < z < 3.1.
Z     0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
As the table shows, Φ(1.96) = 0.975.
Therefore,
P(−1.96 ≤ Z ≤ 1.96) = Φ(1.96) − Φ(−1.96) = 0.975 − 0.025 = 0.95.
Since Z = (Ȳ − μ)/(σ/√n) is a standard normal random variable, the probability is 0.95 that it falls between −1.96 and +1.96, or
P(−1.96 ≤ (Ȳ − μ)/(σ/√n) ≤ 1.96) = 0.95.
We can then re-write the expression inside of P(·) to find the 95% confidence interval for μ:
P(−1.96·σ/√n ≤ Ȳ − μ ≤ 1.96·σ/√n) = 0.95
P(Ȳ − 1.96·σ/√n ≤ μ ≤ Ȳ + 1.96·σ/√n) = 0.95
Example
The height of white females who registered to vote in Allegheny County, PA during the 1960s is normally distributed with a variance of 6.25 (in square inches).
If a random sample of n = 9 women is selected and the sample average height is ȳ = 65.5, construct a 95% confidence interval for the true average height, μ.
Noting that σ = √6.25 = 2.5, the 95% confidence interval for μ is
[ȳ − 1.96·σ/√n, ȳ + 1.96·σ/√n]
= [65.5 − 1.96·(2.5/√9), 65.5 + 1.96·(2.5/√9)]
= [63.87, 67.13].
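The arithmetic for this interval can be sketched in a few lines (my own check of the slide's numbers):

```python
# Slide's numbers: n = 9, sample average 65.5, known variance 6.25.
import math

n, ybar, sigma = 9, 65.5, math.sqrt(6.25)
z = 1.96  # 2.5% critical value of the standard normal

half_width = z * sigma / math.sqrt(n)
lo, hi = ybar - half_width, ybar + half_width
print(round(lo, 2), round(hi, 2))  # 63.87 67.13
```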
Confidence Intervals for the Mean from a Normally Distributed Population
Assuming that we have a random sample from a normally distributed population and that we know the variance of the distribution, we can use the approach on the preceding slides to construct a confidence interval for μ.
In situations in which we have much prior experience with the items being sampled, such as a manufacturer of a product who has detailed knowledge of the weight of its product, we may know the variance of the distribution.
However, in many instances we will not know the variance of the distribution, so we cannot use the above methods.
In order to construct a confidence interval for a random sample that is drawn from a normal distribution but with unknown variance, we must first estimate the variance.
The sample variance for a sample of size n is computed as
S² = (1/(n − 1)) Σ_{i=1}^{n} (Y_i − Ȳ)².
It can be shown that S² is an unbiased estimator of the true population variance, σ².
If the random sample is drawn from a normal distribution, then the ratio (n − 1)S²/σ² follows the chi-square distribution with n − 1 degrees of freedom.
It can also be shown that if Y_1, Y_2, ..., Y_n is a random sample from the normal distribution with mean μ and variance σ², then the t ratio
T = (Ȳ − μ)/(S/√n)
has a Student t distribution with n − 1 degrees of freedom, where S is the square root of the sample variance, S².
We can create a 100(1 − α)% confidence interval for μ using an approach similar to the method used earlier for Z.
Thus, for the 95% confidence interval, we must find the appropriate values such that
P(−t_{0.025, n−1} ≤ T ≤ t_{0.025, n−1}) = 0.95.
Example
Returning to the height example of white females who registered to vote in Allegheny County, PA during the 1960s, where height is normally distributed.
Suppose that for the random sample of n = 9 women, we compute the sample average height ȳ = 65.5.
However, we do not know the variance of height in the population, σ², but are able to compute the sample variance of height, s² = 8.5.
Construct a 95% confidence interval for the true average height, μ, among white female registered voters.
P(−t_{0.025, n−1} ≤ T ≤ t_{0.025, n−1}) = 0.95
First, notice that with n − 1 = 8 d.f., t_{0.025, 8} = 2.306.
df    0.10    0.05    0.025   0.01    0.005
8     1.397   1.860   2.306   2.896   3.355
Inserting the appropriate values into the expression below yields the 95% confidence interval for μ:
[ȳ − t_{0.025, 8}·s/√n, ȳ + t_{0.025, 8}·s/√n]
= [65.5 − 2.306·√(8.5/9), 65.5 + 2.306·√(8.5/9)]
= [63.26, 67.74].
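As with the known-variance case, the computation can be sketched directly (my own check of the slide's numbers):

```python
# Slide's numbers: n = 9, sample average 65.5, sample variance 8.5.
import math

n, ybar, s2 = 9, 65.5, 8.5
t_crit = 2.306  # t_{0.025, 8} from the t table

half_width = t_crit * math.sqrt(s2 / n)
lo, hi = ybar - half_width, ybar + half_width
print(round(lo, 2), round(hi, 2))  # 63.26 67.74
```

Note that this interval is wider than the known-variance interval [63.87, 67.13], reflecting both the larger sample variance and the heavier tails of the t distribution.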
Section C.6 Hypothesis Testing
Suppose that we have limited information about a population parameter.
We may develop an idea, or a hypothesis, about the true parameter value.
If we are able to randomly sample from our population, we could then test our hypothesis.
When testing our hypothesis, we call our hypothesis the null hypothesis (H0).
Example
In our height example, we may formulate a null hypothesis that the true mean height of white female registered voters in Allegheny County is 63 inches.
We would write H0: μ = 63.
We test the null hypothesis against an alternative hypothesis (H1), of which there are multiple options:
H1: μ > 63 and H1: μ < 63 are called one-sided alternative hypotheses.
H1: μ ≠ 63 is a two-sided alternative hypothesis.
To test the null hypothesis against one of these alternative hypotheses, we need to develop a decision rule.
Once we have done so, we then use this decision rule to decide if we will reject our null hypothesis or if we fail to reject it.
There are two approaches to forming our decision rules for hypothesis testing:
1) using confidence intervals
2) using the test of significance
We are now ready to test our hypothesis.
We previously found for our example that the 95% confidence interval for μ is
63.26 ≤ μ ≤ 67.74.
The 95% confidence interval does not contain our null hypothesis value of μ = 63.
We therefore reject our null hypothesis at the 95% level of confidence.
In hypothesis testing, the confidence interval is also called the acceptance region.
The area outside the region is the critical region, and the limits of the regions are the critical values.
Examining the t distribution table with n − 1 = 8 d.f.,
with n − 1 = 8 d.f.,
df    0.10    0.05    0.025   0.01    0.005
8     1.397   1.860   2.306   2.896   3.355
we see that the critical value is t_{0.05, 8} = 1.860.
The decision rule is to reject H0 if t > 1.860.
We form the t ratio as before, and we previously found the t ratio to be 2.57.
The difference is now the critical value we use in our decision rule, which is 1.860 as opposed to 2.306.
Since the t ratio is still greater than our critical value, we reject the null hypothesis.
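The t ratio and both decision rules can be checked in a few lines (my own sketch of the slide's numbers):

```python
# Test H0: mu = 63 using the height example: n = 9, ybar = 65.5, s^2 = 8.5.
import math

n, ybar, s2, mu0 = 9, 65.5, 8.5, 63.0
t_ratio = (ybar - mu0) / math.sqrt(s2 / n)
print(round(t_ratio, 2))  # 2.57

# One-sided 5% critical value with 8 d.f. is 1.860; two-sided uses 2.306.
print(t_ratio > 1.860, t_ratio > 2.306)  # True True
```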
How can we think about Type I and Type II errors?
In a trial, the null hypothesis is that the defendant is not guilty, while the alternative hypothesis is that the defendant is guilty.
A Type I error is to reject the null hypothesis when it is true.
In a trial, that would mean finding a defendant guilty when she is really not guilty.
A Type II error is to accept the null when it is false.
In a trial, that would mean finding a defendant not guilty when she is really guilty.
Most jurists prefer to reduce the number of not guilty persons who are sent to prison (i.e., reduce the number of Type I errors).
P-values
Our hypothesis testing proceeds by finding a critical value and then testing whether the sample average lies within the confidence interval or whether the t-statistic exceeds a threshold.
Another approach is to ask the question: how likely is it that we would observe the sample mean, ȳ, that we find in our sample if the population mean is really μ?
In our height example, we would ask how likely it is that we would observe ȳ = 65.5 in our sample if the true population mean is μ = 63 (the null hypothesis).
We proceed as before when we were using the test of significance approach:
t = (65.5 − 63)/(2.92/√9) = 2.57.
Using the t distribution table,
2.57df 0.10 0.05 0.025 0.01 0.0058 1.397 1.860 2.306 2.896 3.355we see that the estimated tratio of 2.57 falls between 2.306which corresponds to a probability of p=0.05 (in both tails)and 2.896 which corresponds to a probability of p=0.02.
Thus, the probability that we would observe X̄ = 65.5 under the null hypothesis that μ = 63 is between 0.05 and 0.02.
We call this probability the p-value.
Why concern ourselves with p-values?
Instead of arbitrarily choosing a level of significance as we did before, we are now reporting the probability that we would find the sample mean if the null hypothesis is in fact true.
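As a check, the t ratio and the p-value bracket can be reproduced from the summary statistics alone. This is a Python sketch (not part of the original Stata workflow) using only the numbers and the df = 8 table row given in the notes.

```python
import math

# Summary statistics from the height example in the notes
xbar, mu0, s, n = 65.5, 63.0, 2.92, 9

# t ratio, as in the test-of-significance approach
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # 2.57

# df = 8 row of the t table: one-tail area -> critical value
row = {0.10: 1.397, 0.05: 1.860, 0.025: 2.306, 0.01: 2.896, 0.005: 3.355}

# t falls between t(0.025, 8) and t(0.01, 8), so the two-tailed
# p-value lies between 2 * 0.025 = 0.05 and 2 * 0.01 = 0.02
print(row[0.025] < t < row[0.01])  # True
```

With only a printed table we can bracket the p-value; statistical software reports it exactly.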
Testing the Equality of Two Population Means
Suppose that individuals can belong to one of two populationsand we are interested in knowing if the means of the two
populations are the same.
Observations from both populations are normally distributed, with X ~ N(μX, σX²) and Y ~ N(μY, σY²).

The null hypothesis is

H0: μX = μY

which we can also write as μX − μY = 0. The alternative hypothesis is

H1: μX ≠ μY, or H1: μX − μY ≠ 0.
From the sample of size m from the first population we can calculate the sample mean X̄ and the sample variance sX². From the second sample, with sample size n, we can determine Ȳ and sY².

The test statistic that we use is a t ratio:

t = [(X̄ − Ȳ) − (μX − μY)] / √(sX²/m + sY²/n)

The degrees-of-freedom subscript is intentionally left off since, as discussed above, we are assuming that we have a large sample, so that this statistic has a standard normal distribution.
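The statistic can be written as a small helper function. This Python sketch is illustrative (the function name and signature are not from the notes); under H0 the hypothesized difference μX − μY is zero.

```python
import math

def two_sample_t(xbar, s2_x, m, ybar, s2_y, n, diff0=0.0):
    """t ratio for H0: mu_X - mu_Y = diff0.

    The denominator is the standard error of the difference in means,
    sqrt(s2_x/m + s2_y/n), treating the two samples as independent.
    """
    se = math.sqrt(s2_x / m + s2_y / n)
    return ((xbar - ybar) - diff0) / se

# Illustration with made-up numbers: means 10 and 9, both variances 4,
# both sample sizes 25
print(round(two_sample_t(10.0, 4.0, 25, 9.0, 4.0, 25), 2))  # 1.77
```

With diff0 = 0, the numerator reduces to the difference in sample means, matching the formula above.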
Notice the similarity between the t ratio for the test of equality of two means and the t ratio that we used previously to test a hypothesis about a population mean.

The difference is that the sample mean and the population mean from the initial t ratio are replaced by the differences in the sample means and the population means, while the variance of the sample mean is replaced by the variance of the difference in means.
Example
We can test whether mean height differs between male and female voters in Allegheny County.
We draw a new sample of 36 male and 36 female voters; the sample statistics are
summarize height sex
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 72 67.29861 3.850791 59.5 75
sex | 0
Notice that the variable sex appears to be missing.
However, since it is a string (character) variable, not a numeric variable, we must use the tabulate command.
tabulate sex
sex | Freq. Percent Cum.
------------+-----------------------------------
F | 36 50.00 50.00
M | 36 50.00 100.00
------------+-----------------------------------
Total | 72 100.00
To test the null hypothesis that the mean heights of men and women are equal, we need to compute the means and standard deviations separately for men and women.
summarize height if sex=="F"
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 64.52778 2.850926 59.5 72
summarize height if sex=="M"
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 70.06944 2.481799 64 75
The bysort command yields the same result.
bysort sex: summarize height
-----------------------------------------------------------------------------------
-> sex = F
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 64.52778 2.850926 59.5 72
-----------------------------------------------------------------------------------
-> sex = M
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 70.06944 2.481799 64 75
The sample statistics are, for women, X̄F = 64.5, sF = 2.85, and sF² = 8.12, and, for men, X̄M = 70.1, sM = 2.48, and sM² = 6.15.
Since our null hypothesis is H0: μM − μF = 0 while our alternative hypothesis is H1: μM − μF ≠ 0, we will perform a two-tailed test.
At the α = 0.05 level of significance, the critical t value is 1.96, so our decision rule is to reject if |t| > 1.96.
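The excerpt ends before the computation, but plugging the full-precision means and standard deviations from the Stata output into the two-sample t ratio can be sketched as follows (an illustrative Python calculation, not part of the original notes):

```python
import math

# Full-precision summary statistics from the Stata output above
xbar_m, s_m, n_m = 70.06944, 2.481799, 36  # men
xbar_f, s_f, n_f = 64.52778, 2.850926, 36  # women

# Standard error of the difference in means: sqrt(s_M^2/m + s_F^2/n)
se = math.sqrt(s_m**2 / n_m + s_f**2 / n_f)

# t ratio under H0: mu_M - mu_F = 0
t = (xbar_m - xbar_f) / se
print(round(t, 2))    # 8.8
print(abs(t) > 1.96)  # True -> reject H0 of equal mean heights
```

The t ratio is far beyond the 1.96 critical value, so under this calculation the null hypothesis of equal mean heights would be rejected.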