7/24/2019 Lecture 1 Stat Review
0. Introduction
What is econometrics?
Econometrics is the application of statistics and economic theory to data in order to test economic hypotheses.
Economic theory describes relationships between economic variables.
For example, the law of demand tells us that as prices go down, the quantity demanded will go up.
However, as the owner of a firm or as a policymaker, we are often interested in the magnitude of the relationship between two variables.
For example, if cigarette taxes increase, the quantity demanded falls. By how much? What will be the impact on tax revenues?
To answer these questions, we need to know something about the empirical relationship between cigarette prices and cigarette demand.
We could ask a variety of other questions:
1) What is the impact of education on earnings?
2) How much do increases in government transfers (e.g., TANF) reduce work effort?
3) What is the effect of an increased police force on the amount of crime committed in a city?
Econometrics is also useful for forecasting.
1) Firms forecast revenues and costs.
2) Governments forecast consumer spending and unemployment rates.
Does econometrics always give the right answer?
Suppose we write Earnings = β₀ + β₁(Years of Education), where β₁ is the statistical relationship between years of education and earnings. One more year of education will increase earnings by β₁.
However, we only have observational data to estimate this statistical relationship or correlation.
We typically will not be analyzing a randomized experiment.
Does one more year of school really cause earnings to increase?
Or, do more able people, who would have earned more anyway, get more education?
We will have to rely both on economic theory as well as our understanding of econometric theory to interpret our findings.
Is econometrics the same as program evaluation?
Program evaluation undertakes an examination of a program (or policy) through the study of the program's goals, processes, and outcomes.
For example, an evaluation of the Pittsburgh Promise program, which provides scholarships and other college-related support to graduates of Pittsburgh Public Schools, would likely include a study of whether the program increased the educational attainment of city school graduates.
Such an evaluation would implement statistical and econometric methodologies as part of the study.
While economists would find the results of this evaluation very useful, they would also be interested in knowing whether this program informs us about the relationship between educational attainment and outcomes of interest to economists such as wages, crime, intergenerational outcomes, etc.
Example
In 1973, the Indonesian government decided that it was important to provide equity across the country's provinces.
Indonesia undertook a massive school building program in which over 61,000 primary schools were built within the next six years.
The intent of the program was to target new schools in areas where enrollments were previously low, which was likely due, in part, to the long distances students had to travel to attend school.
Between 1973 and 1978, the school enrollment rates of 7 to 12 year old Indonesians rose from 69 percent to 83 percent.
From the perspective of whether or not the program increased education levels in Indonesia, it appears to have been successful.
From an economist's viewpoint, this program can be used to ask a question of great interest, such as: does increased education raise wages?
Duflo (2001) uses the Indonesian school building program to answer precisely this question.
The idea is that this program is effectively an experiment in that it raised education levels in some parts of Indonesia but not in others.
In terms of an experiment, children who reside in areas where school building increased are the treatment group, while those in areas where no new schools were built are the control group.
She is able to study whether the increase in education causes an increase in the wages of those affected by the program.
In addition, we can also use economic theory to think about how the program might impact those who were not directly affected by it.
An increase in the supply of educated workers will shift the labor supply curve and therefore lead to a new, lower equilibrium wage, which will indirectly affect those born before the school building program.
Duflo (2004) examines the impact of the school building program on those born before the program took effect in their province.
She finds that the increase in educated workers due to the program reduces the wages of workers in older age cohorts by 4 to 10 percent.
By thinking through the economic theory for how an increased supply of workers will affect the economy overall, we can find implications for how those who do not participate in a program may be affected.
The goal of this course is to impart a basic understanding of econometric theory in order to be able to interpret the findings from studies that implement econometric methodologies.
As mentioned earlier, not all studies will use true experiments or natural experiments to estimate the impact of a program or policy.
As such, we will require a number of assumptions to be maintained in order for these observational studies to have a causal interpretation.
Therefore, it is very important to understand the theory behind the methods that we will learn, the assumptions that they require, under what circumstances these assumptions are violated, and what, if anything, we can learn when the assumptions are incorrect.
The empirical examples in class and the empirical exercises in homework assignments aim to link the theory you learn in class with applications that illustrate these important issues.
I. Statistical Review
For this course, we will assume that everyone understands basic probability and statistics.
However, we will spend the first two or three classes reviewing these concepts for two reasons.
First, we want to be certain that everyone has seen the same topics presented in a similar manner before moving on to econometrics.
Second, many of the statistical concepts you have previously seen will be applied and extended in econometrics.
By reviewing these concepts, it will be much easier to see the parallels between what you already know and how those ideas are applied.
Appendix B Fundamentals of Probability
Section B.1 Random Variables and Their Probability Distributions
A random variable is a variable whose value is determined by the outcome of an experiment.
A discrete random variable takes on a finite or a countably infinite number of values.
Examples
Tossing a coin, rolling a pair of dice, drawing a card.
A discrete random variable, X, is described by its probability density function (pdf), denoted f(x), which is a list of all of the values the random variable can take on and the associated probabilities:
f(x_j) = P(X = x_j), j = 1, 2, ..., k,
where x_j can be one of the k possible values.
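As a small illustration (my own, not from the slides), a Python sketch of the pdf of a fair six-sided die, where each of the k = 6 possible values has probability 1/6:

```python
# Hypothetical example: the pdf of a fair six-sided die.
from fractions import Fraction

def die_pdf(x):
    """f(x) = P(X = x) for a fair die; zero for values off the support."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

# The probabilities over the full support must sum to one.
total = sum(die_pdf(x) for x in range(1, 7))
print(total)  # 1
```

Any valid discrete pdf must satisfy this adding-up condition, which is why the check is worth doing explicitly.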
A continuous random variable has a sample space that contains an uncountably infinite number of outcomes.
Examples
Temperature, height, and an amount of time.
However, the probability that a continuous random variable takes on any particular value exactly is zero.
Thus, for continuous random variables, we work with the cumulative distribution function (cdf), which is written as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,
where f(t) is the continuous pdf.
Section B.2 Joint and Conditional Distributions, and Independence
Let X and Y be discrete random variables.
The joint distribution of X and Y is fully described by their joint probability density function, f(x, y) = P(X = x, Y = y).
The random variables X and Y are independent if and only if their joint pdf can be written as
f(x, y) = f_X(x) f_Y(y),
where f_X and f_Y are the marginal pdfs for X and Y, respectively.
We will not examine the joint pdf of continuous random variables this semester, which is why it is not discussed here.
In economics, we are often interested in the pdf of one random variable given a particular value of another random variable.
The conditional pdf of Y given X is defined as
f_{Y|X}(y|x) = f(x, y) / f_X(x).
Notice that f_{Y|X}(y|x) is only defined if f_X(x) > 0.
When both random variables are discrete, we can write
f_{Y|X}(y|x) = P(Y = y | X = x),
which is read as the probability that Y = y given that X = x.
When X and Y are independent, knowing the value of X provides no information about Y, and vice versa, so that
f_{Y|X}(y|x) = f_Y(y) and f_{X|Y}(x|y) = f_X(x).
Section B.3 Features of Probability Distribution Functions
Expected Value
The expected value, or mean, of a random variable X that takes on k discrete values x_1, x_2, ..., x_k is
E(X) = Σ_{j=1}^{k} x_j f(x_j),
where f(x) is the pdf of X.
If X is a continuous random variable, then
E(X) = ∫_{−∞}^{∞} x f(x) dx.
We write E(X) = μ_X, or sometimes μ, and refer to μ as the population mean.
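To make the discrete formula concrete, a quick sketch (mine, not from the lecture) computing E(X) = Σ x_j f(x_j) for a fair six-sided die:

```python
# Hypothetical example: E(X) for a fair die is (1+2+...+6)/6 = 7/2.
from fractions import Fraction

values = range(1, 7)
pdf = {x: Fraction(1, 6) for x in values}  # uniform pdf over the support

expected_value = sum(x * pdf[x] for x in values)
print(expected_value)  # 7/2
```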
We can also compute the expected value of a function, g(X), of the random variable X.
If X is a discrete random variable, then the expected value of the random variable g(X) is given by
E[g(X)] = Σ_{j=1}^{k} g(x_j) f(x_j).
If X is a continuous random variable, then the expected value of the random variable g(X) is
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
Properties of the Expected Value
1) For any constant c, E(c) = c.
2) For any constants a and b, E(aX + b) = aE(X) + b.
3) If a_1, a_2, ..., a_n are constants and X_1, X_2, ..., X_n are random variables, then
E(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + ... + a_n E(X_n).
Alternatively, we can write this expression as
E(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i E(X_i).
Variance
The variance measures the dispersion of a pdf.
The variance is the expected value of the squared difference between a value of X and the mean of the distribution:
Var(X) = E[(X − μ)²].
We can apply the formulas for the expected value of a function of X to compute the variance.
For example, the variance of a discrete random variable is
Var(X) = Σ_{j=1}^{k} (x_j − μ)² f(x_j).
Properties of Var(X)
1) If c is a constant, then Var(c) = 0.
2) If a and b are constants, then Var(aX + b) = a²Var(X).
One issue with using the variance is that its units are the square of the units of the random variable.
For example, if the random variable X is measured in feet, then Var(X) is measured in feet squared.
In some instances it is useful to work with the positive square root of the variance, which is known as the standard deviation and is denoted sd(X) = σ.
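A small numerical check (my own example, not from the slides) of property 2, Var(aX + b) = a²Var(X), using the pdf of a fair die:

```python
# Hypothetical example: verify Var(aX + b) = a^2 Var(X) on a fair die.
from fractions import Fraction

values = list(range(1, 7))
f = Fraction(1, 6)  # pdf of a fair die assigns 1/6 to each value

def mean(xs):
    return sum(x * f for x in xs)

def var(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 * f for x in xs)

a, b = 3, 10
var_x = var(values)
var_ax_b = var([a * x + b for x in values])
print(var_ax_b == a ** 2 * var_x)  # True
```

Using exact fractions avoids rounding, so the equality holds exactly rather than approximately.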
Section B.4 Features of Joint and Conditional Distributions
Covariance
The covariance is a measure of how much two random variables move together (co-vary):
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].
Notice that if Y tends to be above its mean when X is above its mean, then Cov(X, Y) > 0.
Similarly, if Y tends to be below its mean when X is above its mean, or vice versa, then Cov(X, Y) < 0.
Correlation Coefficient
The correlation coefficient offers an advantage over the covariance since it is on a rather intuitive scale:
Corr(X, Y) = Cov(X, Y) / [sd(X) sd(Y)].
Notice that Corr(X, Y) will have the same sign as Cov(X, Y).
In addition, −1 ≤ Corr(X, Y) ≤ 1.
Whereas Cov(X, Y) can take on any real value, Corr(X, Y) allows us to scale the degree to which two variables co-vary.
Corr(X, Y) = +1 means X and Y are perfectly positively correlated.
Corr(X, Y) = −1 means X and Y are perfectly negatively correlated.
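As a sketch of my own (using sample analogues of the population formulas above), the correlation of data that are exact linear functions of x hits the ±1 bounds:

```python
# Hypothetical example: sample covariance and correlation of linearly
# related data, which gives correlations of +1 and -1.
import math

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    return cov(xs, ys) / (math.sqrt(cov(xs, xs)) * math.sqrt(cov(ys, ys)))

x = [1, 2, 3, 4, 5]
print(corr(x, [2 * v + 1 for v in x]))   # approximately +1
print(corr(x, [-3 * v for v in x]))      # approximately -1
```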
Conditional Expectation
While the covariance and correlation treat the relationship between X and Y symmetrically, in many instances we will be interested in explaining one variable in terms of another variable.
For example, we may be interested in knowing whether earnings depend upon an individual's level of education.
One set of statistics we might compute is the expected amount of earnings for people conditional on their levels of education.
The conditional expectation of a discrete random variable Y given X = x, where Y takes on m different values y_1, y_2, ..., y_m, is
E(Y | X = x) = Σ_{j=1}^{m} y_j f_{Y|X}(y_j | x).
5) E(Y | X) = E[E(Y | X, Z) | X]
This property is a more general version of the law of iterated expectations.
6) If E(Y | X) = E(Y), then Cov(X, Y) = 0 (and Corr(X, Y) = 0). Moreover, every function of X is uncorrelated with Y.
Note that the converse of this last property is not true; if Cov(X, Y) = 0, then it is possible that E(Y | X) depends on X.
Combining these last two properties, notice that if U and X are random variables where E(U | X) = 0, then
i. E(U) = 0, since E(U) = E[E(U | X)] = E(0) = 0.
ii. Cov(U, X) = 0, i.e., U and X are uncorrelated, since E(U | X) = 0 = E(U).
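The law of iterated expectations can be checked numerically on a tiny discrete joint pdf (an example of my own construction, not from the slides):

```python
# Hypothetical joint pdf f(x, y) for X in {0, 1} and Y in {1, 2}.
joint = {
    (0, 1): 0.10, (0, 2): 0.30,
    (1, 1): 0.40, (1, 2): 0.20,
}

def marginal_x(x):
    """f_X(x) = sum over y of f(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_exp_y(x):
    """E(Y | X = x) = sum over y of y * f(x, y) / f_X(x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)

# E(Y) computed directly equals E(Y) computed by averaging E(Y | X = x)
# with weights f_X(x): the law of iterated expectations.
ey_direct = sum(y * p for (_, y), p in joint.items())
ey_iterated = sum(cond_exp_y(x) * marginal_x(x) for x in (0, 1))
print(ey_direct, ey_iterated)
```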
4) Any linear combination of independent, identically distributed normal random variables has a normal distribution.
This last property has implications for the average of independent, identically distributed normal random variables.
If Y_1, Y_2, ..., Y_n are independent random variables, each of which is distributed Normal(μ, σ²), then the average of the random variables,
Ȳ = (1/n) Σ_{i=1}^{n} Y_i,
is normally distributed.
Furthermore, E(Ȳ) = μ and Var(Ȳ) = σ²/n, so that Ȳ ~ Normal(μ, σ²/n).
The Chi-Square Distribution
Let X = Σ_{i=1}^{n} Z_i², where Z_1, Z_2, ..., Z_n are independent standard normal random variables.
Then X follows the chi-square distribution with n degrees of freedom (which is a special case of the gamma distribution), which we write as X ~ χ²_n.
Degrees of freedom generally refers to the number of independent pieces of information used to create a random variable.
We will use the abbreviation d.f. to refer to degrees of freedom.
If a random variable X is distributed X ~ χ²_n, then it has an expected value of n and a variance of 2n.
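A simulation sketch (my own, with an arbitrary choice of n = 5) confirms the mean-n, variance-2n property by building chi-square draws as sums of squared standard normals:

```python
# Hypothetical example: simulate X = Z_1^2 + ... + Z_n^2 for n = 5.
import random

random.seed(0)  # fixed seed so the check is reproducible

n_df, reps = 5, 50_000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n_df)) for _ in range(reps)]

mean_hat = sum(draws) / reps
var_hat = sum((d - mean_hat) ** 2 for d in draws) / reps
print(round(mean_hat, 1), round(var_hat, 1))  # close to n = 5 and 2n = 10
```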
The F Distribution
Suppose U and V are independent chi-square random variables with n and m degrees of freedom, respectively.
A random variable of the form
F = (V/m) / (U/n)
is said to have an F distribution with m and n degrees of freedom.
We will use the notation F ~ F_{m,n} to denote an F random variable with m and n degrees of freedom.
The t Distribution
Let Z be a standard normal random variable.
Let X be a chi-square random variable, independent of Z, which has n degrees of freedom.
The Student's t ratio with n degrees of freedom is
T = Z / √(X/n).
The t distribution with n degrees of freedom has an expected value of zero and a variance of n/(n − 2) for n > 2.
The standard normal distribution and the t distribution have a similar shape.
Both have an expected value of zero, and the variance of the t distribution, n/(n − 2), converges to 1 as n → ∞.
[Figure: probability density functions of the Z ratio and the t ratio with 4 d.f. and 10 d.f.]
Sampling
In many instances, we will be interested in knowing the value of one or more population parameters.
For example, if we want to know about the degree of income inequality in society, we would be curious to know about the expected value and variance of the population income distribution.
If we have a Census, then we would be able to learn the true characteristics of the income distribution.
However, interviewing everyone in the population is a very costly exercise in terms of both time and money.
Random Sampling
Instead, we will observe a sample of the population and use the sample to generate our best guess as to what the true characteristics of the population distribution actually are.
Suppose that Y is a random variable with a probability density function f(y; θ), where θ is an unknown parameter.
A random sample from f(y; θ) is n observations, {Y_1, Y_2, ..., Y_n}, that are drawn independently from the pdf f(y; θ).
We sometimes refer to the random sample Y_1, Y_2, ..., Y_n as independent, identically distributed (i.i.d.) random variables.
Section C.2 Finite Sample Properties of Estimators
We now turn to estimators of population parameters and note that there are two types of properties of these estimators.
The first set of properties is finite sample properties, which are sometimes referred to as small sample properties.
The latter title is somewhat misleading since it refers to samples of any size, whether the number of observations is small or large.
The second set of properties is asymptotic properties, which refer to the behavior of estimators as the sample size approaches infinity.
Estimators and Estimates
Any function of a random sample whose objective is to approximate a parameter is called an estimator.
Example
Suppose that {Y_1, Y_2, ..., Y_n} is a random sample from a population with a mean of μ.
The sample average,
Ȳ = (1/n) Σ_{i=1}^{n} Y_i,
is an estimator of the unknown population mean, μ.
After we collect the actual data, {y_1, y_2, ..., y_n}, and we compute the estimator by using the values that we measure in the sample, the resulting value is known as an estimate.
We define the bias of an estimator W of θ as
Bias(W) = E(W) − θ.
Example
For the sample average, Ȳ, we have already seen that E(Ȳ) = μ.
Therefore, we can compute the bias of Ȳ:
Bias(Ȳ) = E(Ȳ) − μ = μ − μ = 0.
The bias of the sample average is 0, which means, as we have already seen, that Ȳ is unbiased.
Example
We can compute the sampling variance of the sample average, Ȳ:
Var(Ȳ) = Var((1/n) Σ_{i=1}^{n} Y_i) = (1/n²) Σ_{i=1}^{n} Var(Y_i) = (1/n²)(nσ²) = σ²/n.
Notice that the sampling variance of Ȳ gets smaller as the sample size, n, gets larger.
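A simulation sketch (my own, with arbitrary values σ = 2 and n = 25) illustrates Var(Ȳ) = σ²/n by drawing many samples of size n and computing the variance of the resulting sample averages:

```python
# Hypothetical example: sampling variance of the sample average.
import random

random.seed(1)  # fixed seed for reproducibility

mu, sigma, n, reps = 0.0, 2.0, 25, 20_000
ybars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(ybars) / reps
var_ybar = sum((y - grand_mean) ** 2 for y in ybars) / reps
print(round(var_ybar, 3))  # close to sigma^2 / n = 4 / 25 = 0.16
```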
The following graph compares the sampling distributions of two estimators, where the estimator with the smaller sampling variance is shown with the solid red line.
Notice that for a given interval around the true parameter θ, the estimator with the smaller variance has more probability in this range.
Example
As we have seen, the sample average, Ȳ, is an unbiased estimator for μ and has a sampling variance of Var(Ȳ) = σ²/n.
The alternative estimator using only the first observation of the random sample, W = Y_1, is also an unbiased estimator for μ.
The variance of the alternative estimator is Var(W) = σ².
Therefore, Ȳ is more efficient than W since
Var(Ȳ) = σ²/n ≤ σ² = Var(W).
Section C.3 Asymptotic or Large Sample Properties of Estimators
Another useful set of properties of estimators is the asymptotic, or large sample, properties of the estimators.
One useful reason for investigating the asymptotic properties of estimators is that we can examine the performance of an estimator as the sample size grows, which gives us another way to choose between estimators.
Another useful reason for examining asymptotic properties is that determining the sampling distribution in finite samples is rather difficult for some estimators.
However, in many cases, it is easier to determine the asymptotic sampling distribution and to use it as an approximation in order to draw inferences.
Consistency
One useful property for an estimator is that as the sample grows infinitely large, the estimator converges to the true parameter.
Formally, if W_n is an estimator of θ with a sample size n, then W_n is a consistent estimator of θ if, for every ε > 0,
lim_{n→∞} P(|W_n − θ| < ε) = 1.
If W_n is not consistent for θ, then we say it is inconsistent.
In addition, if W_n is consistent, then we say that θ is the probability limit of W_n, which is written as
plim(W_n) = θ.
A useful illustration of consistency is the sample average, Ȳ, from a population with mean μ and variance σ².
We have already seen that Ȳ is unbiased for μ and, in addition, we saw that Var(Ȳ) = σ²/n.
Notice that as n → ∞, Var(Ȳ) → 0.
Therefore, Ȳ is a consistent estimator of μ.
Thus, if Y_1, Y_2, ..., Y_n are independent and identically distributed random variables with mean μ, then
plim(Ȳ) = μ,
which is known as the law of large numbers.
A (biased) alternative estimator for the population mean is
W_n = (1/(n + 1)) Σ_{i=1}^{n} Y_i.
Notice that E(W_n) = [n/(n + 1)]μ, so that as n → ∞, E(W_n) → μ.
In addition, we can show that
Var(W_n) = [n/(n + 1)²]σ².
Notice that as n → ∞, Var(W_n) → 0.
It can be shown that W_n is a consistent (but biased) estimator of μ.
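The law of large numbers behind these consistency arguments can be illustrated by simulation (a sketch of my own, with an arbitrary μ = 3): as n grows, the sample average settles near the population mean.

```python
# Hypothetical example: the sample average converges to mu as n grows.
import random

random.seed(2)  # fixed seed for reproducibility
mu = 3.0

def ybar(n):
    """Sample average of n i.i.d. Normal(mu, 1) draws."""
    return sum(random.gauss(mu, 1.0) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, round(ybar(n), 3))
```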
Asymptotic Normality
In order to draw inferences, we need to know not only the estimator, but also information about the sampling distribution of the estimator.
Many econometric estimators are approximated by the normal distribution as the sample size gets large.
Let {Z_n : n = 1, 2, ...} be a sequence of random variables such that for all numbers z,
P(Z_n ≤ z) → Φ(z) as n → ∞,
where Φ(z) is the standard normal cdf.
Then Z_n is said to have an asymptotic standard normal distribution, which we write as Z_n ~ᵃ Normal(0, 1), where the a stands for either asymptotically or approximately.
Section C.5 Interval Estimation and Confidence Intervals
While estimation of a population parameter generally yields a single number as an estimate, that overlooks the fact that there is uncertainty about the true parameter.
Example
The sample average yields a point estimate, ȳ, of the true population average, μ.
However, simply reporting this point estimate ignores the fact that Ȳ has a sampling distribution.
Instead, we can generate an interval estimate, which is a range in which the true parameter is likely to lie.
Example
Suppose that Y_1, Y_2, ..., Y_n are independent random variables, each of which is distributed Normal(μ, σ²).
We have already seen that
Z = (Ȳ − μ)/(σ/√n) ~ Normal(0, 1).
As we have seen, the sample average, Ȳ, is an unbiased point estimate for the population mean, μ.
How can we use this information to create an interval estimate for the true population mean, μ?
Since (Ȳ − μ)/(σ/√n) ~ Normal(0, 1), we can create an interval that has a 95% probability of containing the population mean, μ.
We call such an interval a 95% confidence interval.
In general, we can create a 100(1 − α)% confidence interval by choosing a level of significance, α.
The smaller the value of α that we choose, the higher our level of confidence.
However, to increase our confidence, we will need a larger interval.
For example, using textbook Appendix Table G.1, we see that the probability that a standard normal random variable falls between −1.96 and +1.96 is 0.95, or
P(−1.96 ≤ Z ≤ 1.96) = 0.95.
[Figure: standard normal density with critical values −z_{0.025} and +z_{0.025}, each tail containing probability 0.025.]
The cdf for the standard normal distribution shown below is similar to Appendix Table G.1 for −3.1 < z < −1.8.
Z      0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
-2.9   0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7   0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6   0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4   0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0   0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
-1.8   0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
The value of Z combines the row, which gives the integer and tenths place, with the column, which gives the hundredths place.
As the table shows, Φ(−1.96) = 0.025.
The cdf for the standard normal distribution shown below is similar to Appendix Table G.1 for 1.8 < z < 3.1.
Z     0       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
As the table shows, Φ(1.96) = 0.975.
Therefore,
P(−1.96 ≤ Z ≤ 1.96) = Φ(1.96) − Φ(−1.96) = 0.975 − 0.025 = 0.95.
Since Z = (Ȳ − μ)/(σ/√n) is a standard normal random variable, the probability is 0.95 that it falls between −1.96 and +1.96, or
P(−1.96 ≤ (Ȳ − μ)/(σ/√n) ≤ 1.96) = 0.95.
We can then re-write the expression inside of P(·) to find the 95% confidence interval for μ:
P(−1.96·σ/√n ≤ Ȳ − μ ≤ 1.96·σ/√n) = 0.95
P(Ȳ − 1.96·σ/√n ≤ μ ≤ Ȳ + 1.96·σ/√n) = 0.95
Example
The height of white females who registered to vote in Allegheny County, PA during the 1960s is normally distributed with a variance of 6.25 (in square inches).
If a random sample of n = 9 women is selected and the sample average height is ȳ = 65.5, construct a 95% confidence interval for the true average height, μ.
Noting that σ = √6.25 = 2.5, the 95% confidence interval for μ is
[ȳ − 1.96·σ/√n, ȳ + 1.96·σ/√n]
= [65.5 − 1.96·(2.5/√9), 65.5 + 1.96·(2.5/√9)]
= [63.87, 67.13].
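The arithmetic for this interval can be sketched in a few lines (my own check of the slide's numbers):

```python
# Slide's numbers: n = 9, sample average 65.5, known variance 6.25.
import math

n, ybar, sigma = 9, 65.5, math.sqrt(6.25)
z = 1.96  # 2.5% critical value of the standard normal

half_width = z * sigma / math.sqrt(n)
lo, hi = ybar - half_width, ybar + half_width
print(round(lo, 2), round(hi, 2))  # 63.87 67.13
```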
Confidence Intervals for the Mean from a Normally Distributed Population
Assuming that we have a random sample from a normally distributed population and that we know the variance of the distribution, we can use the approach on the preceding slides to construct a confidence interval for μ.
In situations in which we have much prior experience with the items being sampled, such as a manufacturer of a product who has detailed knowledge of the weight of its product, we may know the variance of the distribution.
However, in many instances we will not know the variance of the distribution, so we cannot use the above methods.
In order to construct a confidence interval for a random sample that is drawn from a normal distribution but with unknown variance, we must first estimate the variance.
The sample variance for a sample of size n is computed as
S² = (1/(n − 1)) Σ_{i=1}^{n} (Y_i − Ȳ)².
It can be shown that S² is an unbiased estimator of the true population variance, σ².
If the random sample is drawn from a normal distribution, then the ratio (n − 1)S²/σ² follows the chi-square distribution with n − 1 degrees of freedom.
It can also be shown that if Y_1, Y_2, ..., Y_n is a random sample from the normal distribution with mean μ and variance σ², then the t ratio
T = (Ȳ − μ)/(S/√n)
has a Student t distribution with n − 1 degrees of freedom, where S is the square root of the sample variance, S².
We can create a 100(1 − α)% confidence interval for μ using an approach similar to the method used earlier for Z.
Thus, for the 95% confidence interval, we must find the appropriate values such that
P(−t_{0.025, n−1} ≤ T ≤ t_{0.025, n−1}) = 0.95.
Example
Returning to the height example of white females who registered to vote in Allegheny County, PA during the 1960s, where height is normally distributed.
Suppose that for the random sample of n = 9 women, we compute the sample average height ȳ = 65.5.
However, we do not know the variance of height in the population, σ², but are able to compute the sample variance of height, s² = 8.5.
Construct a 95% confidence interval for the true average height, μ, among white female registered voters.
P(−t_{0.025, n−1} ≤ T ≤ t_{0.025, n−1}) = 0.95
First, notice that with n − 1 = 8 d.f., t_{0.025, 8} = 2.306.
df    0.10    0.05    0.025   0.01    0.005
8     1.397   1.860   2.306   2.896   3.355
Inserting the appropriate values into the expression below yields the 95% confidence interval for μ:
[ȳ − t_{0.025, 8}·s/√n, ȳ + t_{0.025, 8}·s/√n]
= [65.5 − 2.306·√(8.5/9), 65.5 + 2.306·√(8.5/9)]
= [63.26, 67.74].
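As with the known-variance case, the computation can be sketched directly (my own check of the slide's numbers):

```python
# Slide's numbers: n = 9, sample average 65.5, sample variance 8.5.
import math

n, ybar, s2 = 9, 65.5, 8.5
t_crit = 2.306  # t_{0.025, 8} from the t table

half_width = t_crit * math.sqrt(s2 / n)
lo, hi = ybar - half_width, ybar + half_width
print(round(lo, 2), round(hi, 2))  # 63.26 67.74
```

Note that this interval is wider than the known-variance interval [63.87, 67.13], reflecting both the larger sample variance and the heavier tails of the t distribution.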
Section C.6 Hypothesis Testing
Suppose that we have limited information about a population parameter.
We may develop an idea, or a hypothesis, about the true parameter value.
If we are able to randomly sample from our population, we could then test our hypothesis.
When testing our hypothesis, we call our hypothesis the null hypothesis (H0).
Example
In our height example, we may formulate a null hypothesis that the true mean height of white female registered voters in Allegheny County is 63 inches.
We would write H0: μ = 63.
We test the null hypothesis against an alternative hypothesis (H1), of which there are multiple options:
H1: μ > 63 and H1: μ < 63 are called one-sided alternative hypotheses.
H1: μ ≠ 63 is a two-sided alternative hypothesis.
To test the null hypothesis against one of these alternative hypotheses, we need to develop a decision rule.
Once we have done so, we then use this decision rule to decide if we will reject our null hypothesis or if we fail to reject it.
There are two approaches to forming our decision rules for hypothesis testing:
1) using confidence intervals
2) using the test of significance
We are now ready to test our hypothesis.
We previously found for our example that the 95% confidence interval for μ is
63.26 ≤ μ ≤ 67.74.
The 95% confidence interval does not contain our null hypothesis value of μ = 63.
We therefore reject our null hypothesis at the 95% level of confidence.
In hypothesis testing, the confidence interval is also called the acceptance region.
The area outside the region is the critical region, and the limits of the regions are the critical values.
Examining the t distribution table with n − 1 = 8 d.f.,
with n − 1 = 8 d.f.,
df    0.10    0.05    0.025   0.01    0.005
8     1.397   1.860   2.306   2.896   3.355
we see that the critical value is t_{0.05, 8} = 1.860.
The decision rule is to reject H0 if t > 1.860.
We form the t ratio as before, and we previously found the t ratio to be 2.57.
The difference is now the critical value we use in our decision rule, which is 1.860 as opposed to 2.306.
Since the t ratio is still greater than our critical value, we reject the null hypothesis.
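The t ratio and both decision rules can be checked in a few lines (my own sketch of the slide's numbers):

```python
# Test H0: mu = 63 using the height example: n = 9, ybar = 65.5, s^2 = 8.5.
import math

n, ybar, s2, mu0 = 9, 65.5, 8.5, 63.0
t_ratio = (ybar - mu0) / math.sqrt(s2 / n)
print(round(t_ratio, 2))  # 2.57

# One-sided 5% critical value with 8 d.f. is 1.860; two-sided uses 2.306.
print(t_ratio > 1.860, t_ratio > 2.306)  # True True
```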
How can we think about Type I and Type II errors?
In a trial, the null hypothesis is that the defendant is not guilty, while the alternative hypothesis is that the defendant is guilty.
A Type I error is to reject the null hypothesis when it is true.
In a trial, that would mean finding a defendant guilty when she is really not guilty.
A Type II error is to accept the null when it is false.
In a trial, that would mean finding a defendant not guilty when she is really guilty.
Most jurists prefer to reduce the number of not guilty persons who are sent to prison (i.e., reduce the number of Type I errors).
P-values
Our hypothesis testing proceeds by finding a critical value and then testing whether the sample average lies within the confidence interval or whether the t-statistic exceeds a threshold.
Another approach is to ask the question: how likely is it that we would observe the sample mean, ȳ, that we find in our sample if the population mean is really μ?
In our height example, we would ask how likely it is that we would observe ȳ = 65.5 in our sample if the true population mean is μ = 63 (the null hypothesis).
We proceed as before when we were using the test of significance approach:
t = (65.5 − 63)/(2.92/√9) = 2.57.
Using the t distribution table,
2.57df 0.10 0.05 0.025 0.01 0.0058 1.397 1.860 2.306 2.896 3.355we see that the estimated tratio of 2.57 falls between 2.306which corresponds to a probability of p=0.05 (in both tails)and 2.896 which corresponds to a probability of p=0.02.
Thus, the probability that we would observe X̄ = 65.5 under the null hypothesis that μ = 63 is between 0.05 and 0.02.
We call this probability the p-value.
Why concern ourselves with p-values?
Instead of arbitrarily choosing a level of significance as we did before, we are now reporting the probability that we would find the sample mean if the null hypothesis is in fact true.
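As a check, the t ratio and the p-value bracket can be reproduced from the summary statistics alone. This is a Python sketch (not part of the original Stata workflow) using only the numbers and the df = 8 table row given in the notes.

```python
import math

# Summary statistics from the height example in the notes
xbar, mu0, s, n = 65.5, 63.0, 2.92, 9

# t ratio, as in the test-of-significance approach
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # 2.57

# df = 8 row of the t table: one-tail area -> critical value
row = {0.10: 1.397, 0.05: 1.860, 0.025: 2.306, 0.01: 2.896, 0.005: 3.355}

# t falls between t(0.025, 8) and t(0.01, 8), so the two-tailed
# p-value lies between 2 * 0.025 = 0.05 and 2 * 0.01 = 0.02
print(row[0.025] < t < row[0.01])  # True
```

With only a printed table we can bracket the p-value; statistical software reports it exactly.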
Testing the Equality of Two Population Means
Suppose that individuals can belong to one of two populationsand we are interested in knowing if the means of the two
populations are the same.
Observations from both populations are normally distributed, with X ~ N(μX, σX²) and Y ~ N(μY, σY²).

The null hypothesis is

H0: μX = μY

which we can also write as μX − μY = 0. The alternative hypothesis is

H1: μX ≠ μY, or H1: μX − μY ≠ 0.
From the sample of size m from the first population we can calculate the sample mean X̄ and the sample variance sX². From the second sample, with sample size n, we can determine Ȳ and sY².

The test statistic that we use is a t ratio:

t = [(X̄ − Ȳ) − (μX − μY)] / √(sX²/m + sY²/n)

The degrees-of-freedom subscript is intentionally left off since, as discussed above, we are assuming that we have a large sample, so that this statistic has a standard normal distribution.
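The statistic can be written as a small helper function. This Python sketch is illustrative (the function name and signature are not from the notes); under H0 the hypothesized difference μX − μY is zero.

```python
import math

def two_sample_t(xbar, s2_x, m, ybar, s2_y, n, diff0=0.0):
    """t ratio for H0: mu_X - mu_Y = diff0.

    The denominator is the standard error of the difference in means,
    sqrt(s2_x/m + s2_y/n), treating the two samples as independent.
    """
    se = math.sqrt(s2_x / m + s2_y / n)
    return ((xbar - ybar) - diff0) / se

# Illustration with made-up numbers: means 10 and 9, both variances 4,
# both sample sizes 25
print(round(two_sample_t(10.0, 4.0, 25, 9.0, 4.0, 25), 2))  # 1.77
```

With diff0 = 0, the numerator reduces to the difference in sample means, matching the formula above.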
Notice the similarity between the t ratio for the test of equality of two means and the t ratio that we used previously to test a hypothesis about a population mean.

The difference is that the sample mean and the population mean from the initial t ratio are replaced by the differences in the sample means and the population means, while the variance of the sample mean is replaced by the variance of the difference in means.
Example
We can test whether mean height differs between male and female voters in Allegheny County.
We draw a new sample of 36 male and 36 female voters; the sample statistics are
summarize height sex
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 72 67.29861 3.850791 59.5 75
sex | 0
Notice that the variable sex appears to be missing.
However, since it is a string (character) variable, not a numeric variable, we must use the tabulate command.
tabulate sex
sex | Freq. Percent Cum.
------------+-----------------------------------
F | 36 50.00 50.00
M | 36 50.00 100.00
------------+-----------------------------------
Total | 72 100.00
To test the null hypothesis that the mean heights of men and women are equal, we need to compute the means and standard deviations separately for men and women.
summarize height if sex=="F"
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 64.52778 2.850926 59.5 72
summarize height if sex=="M"
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 70.06944 2.481799 64 75
The bysort command yields the same result.
bysort sex: summarize height
-----------------------------------------------------------------------------------
-> sex = F
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 64.52778 2.850926 59.5 72
-----------------------------------------------------------------------------------
-> sex = M
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
height | 36 70.06944 2.481799 64 75
The sample statistics are, for women, X̄F = 64.5, sF = 2.85, and sF² = 8.12, and, for men, X̄M = 70.1, sM = 2.48, and sM² = 6.15.
Since our null hypothesis is H0: μM − μF = 0 while our alternative hypothesis is H1: μM − μF ≠ 0, we will perform a two-tailed test.
At the α = 0.05 level of significance, the critical t value is 1.96, so our decision rule is to reject if |t| > 1.96.
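The excerpt ends before the computation, but plugging the full-precision means and standard deviations from the Stata output into the two-sample t ratio can be sketched as follows (an illustrative Python calculation, not part of the original notes):

```python
import math

# Full-precision summary statistics from the Stata output above
xbar_m, s_m, n_m = 70.06944, 2.481799, 36  # men
xbar_f, s_f, n_f = 64.52778, 2.850926, 36  # women

# Standard error of the difference in means: sqrt(s_M^2/m + s_F^2/n)
se = math.sqrt(s_m**2 / n_m + s_f**2 / n_f)

# t ratio under H0: mu_M - mu_F = 0
t = (xbar_m - xbar_f) / se
print(round(t, 2))    # 8.8
print(abs(t) > 1.96)  # True -> reject H0 of equal mean heights
```

The t ratio is far beyond the 1.96 critical value, so under this calculation the null hypothesis of equal mean heights would be rejected.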