
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES

Date post: 23-Dec-2015
Upload: constance-wilkins


This sequence explains how to extend the dummy variable technique to handle a qualitative explanatory variable which has more than two categories.


$\text{COST} = \beta_1 + \delta_T\,\text{TECH} + \delta_W\,\text{WORKER} + \delta_V\,\text{VOC} + \beta_2 N + u$


In the previous sequence we used a dummy variable to differentiate between regular and occupational schools when fitting a cost function.


In actual fact there are two types of regular secondary school in Shanghai. There are general schools, which provide the usual academic education, and vocational schools.


As their name implies, the vocational schools are meant to impart occupational skills as well as give an academic education.


However, the vocational component of the curriculum is typically quite small, and the schools are similar to the general schools. Often they are just general schools with a couple of workshops added.


Likewise there are two types of occupational school. There are technical schools training technicians and skilled workers’ schools training craftsmen.


So now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others.


In general it is good practice to select the most normal or basic category as the reference category, if one category is in some sense more normal or basic than the others.


In the Shanghai sample it is sensible to choose the general schools as the reference category. They are the most numerous and the other schools are variations of them.


Accordingly we will define dummy variables for the other three types. TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation relates to a technical school, 0 otherwise.


Similarly we will define dummy variables WORKER and VOC for the skilled workers’ schools and the vocational schools.


Each of the dummy variables will have a coefficient that represents the extra annual overhead cost of that type of school, relative to the reference category.


Note that you do not include a dummy variable for the reference category, and that is the reason that the reference category is usually described as the omitted category.
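Why the reference dummy must be left out can be checked mechanically. The minimal Python sketch below (the category labels and the function name `indicators` are illustrative, not from the original material) builds 0/1 indicators with and without one for the reference category. If every one of the four categories had its own indicator, the indicators for any school would always add up to 1 and so reproduce the intercept column exactly; the regressors would then be perfectly collinear (the so-called dummy variable trap) and the coefficients could not be estimated separately.

```python
CATEGORIES = ("General", "Technical", "Workers'", "Vocational")


def indicators(school_type, include_reference):
    """0/1 indicators for one school; "General" is the reference category."""
    cats = CATEGORIES if include_reference else CATEGORIES[1:]
    return [int(school_type == c) for c in cats]


for include_reference in (True, False):
    print("include_reference =", include_reference)
    for school_type in CATEGORIES:
        ind = indicators(school_type, include_reference)
        # The intercept column is 1 for every school. If the indicators always
        # sum to 1, they duplicate that column: perfect multicollinearity.
        print(f"  {school_type:<11} intercept=1 indicators={ind} sum={sum(ind)}")
```

With the reference indicator dropped, a general school has all indicators equal to 0, so the dummy columns no longer add up to the intercept column.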


If an observation relates to a general school, the dummy variables are all 0 and the regression model reduces to the intercept, the term in N, and the disturbance term.


General school (TECH = WORKER = VOC = 0): $\text{COST} = \beta_1 + \beta_2 N + u$


If an observation relates to a technical school, TECH will be equal to 1 and the other dummy variables will be 0. The regression model simplifies as shown.


Technical school (TECH = 1; WORKER = VOC = 0): $\text{COST} = (\beta_1 + \delta_T) + \beta_2 N + u$


The regression model simplifies in a similar manner in the case of observations relating to skilled workers’ schools and vocational schools.


Skilled workers' school (WORKER = 1; TECH = VOC = 0): $\text{COST} = (\beta_1 + \delta_W) + \beta_2 N + u$

Vocational school (VOC = 1; TECH = WORKER = 0): $\text{COST} = (\beta_1 + \delta_V) + \beta_2 N + u$


The diagram illustrates the model graphically. The coefficients are the extra overhead costs of running technical, skilled workers’, and vocational schools, relative to the overhead cost of general schools.


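[Figure: COST against N. Parallel cost lines for general, technical, skilled workers' and vocational schools, with intercepts $\beta_1$, $\beta_1 + \delta_T$, $\beta_1 + \delta_W$ and $\beta_1 + \delta_V$; the shifts $\delta_T$, $\delta_W$ and $\delta_V$ relative to the general-school line are marked.]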


Note that we do not make any prior assumption about the size, or even the sign, of the coefficients. They will be estimated from the sample data.


     School type   COST (yuan)     N   TECH  WORKER  VOC
 1   Technical         345,000   623      1       0    0
 2   Technical         537,000   653      1       0    0
 3   General           170,000   400      0       0    0
 4   Workers’          526,000   663      0       1    0
 5   General           100,000   563      0       0    0
 6   Vocational         28,000   236      0       0    1
 7   Vocational        160,000   307      0       0    1
 8   Technical          45,000   173      1       0    0
 9   Technical         120,000   146      1       0    0
10   Workers’           61,000    99      0       1    0

Here are the data for the first 10 of the 74 schools. Note how the values of the dummy variables TECH, WORKER, and VOC are determined by the type of school in each observation.
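The mapping from school type to dummy values is mechanical. A minimal Python sketch, assuming the school type is stored as a text label (the label spellings, the dictionary `DUMMY_FOR_TYPE` and the function `make_dummies` are illustrative, not part of the original material), rebuilds the TECH, WORKER and VOC columns for the ten schools listed above, with general schools as the omitted reference category:

```python
# The first 10 schools: (school type, COST in yuan, N students).
SCHOOLS = [
    ("Technical", 345_000, 623),
    ("Technical", 537_000, 653),
    ("General", 170_000, 400),
    ("Workers'", 526_000, 663),
    ("General", 100_000, 563),
    ("Vocational", 28_000, 236),
    ("Vocational", 160_000, 307),
    ("Technical", 45_000, 173),
    ("Technical", 120_000, 146),
    ("Workers'", 61_000, 99),
]

# One dummy per non-reference category; "General" has no dummy of its own.
DUMMY_FOR_TYPE = {"Technical": "TECH", "Workers'": "WORKER", "Vocational": "VOC"}
DUMMY_NAMES = ("TECH", "WORKER", "VOC")


def make_dummies(school_type):
    """Return the TECH, WORKER and VOC values for one school (all 0 for General)."""
    if school_type != "General" and school_type not in DUMMY_FOR_TYPE:
        raise ValueError(f"unknown school type: {school_type!r}")
    active = DUMMY_FOR_TYPE.get(school_type)  # None for the reference category
    return {name: int(name == active) for name in DUMMY_NAMES}


for i, (school_type, cost, n) in enumerate(SCHOOLS, start=1):
    d = make_dummies(school_type)
    print(i, school_type, cost, n, d["TECH"], d["WORKER"], d["VOC"])
```

At most one dummy is 1 for any school, and all three are 0 for a general school.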


The scatter diagram shows the data for the entire sample, differentiating by type of school.


[Figure: scatter diagram of COST against N for the whole sample, distinguishing technical schools, workers' schools, vocational schools and general schools.]


Here is the Stata output for this regression. The coefficient of N indicates that the marginal cost per student per year is 343 yuan.


. reg COST N TECH WORKER VOC

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11               Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09               R-squared     =  0.6320
---------+------------------------------               Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------
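In Stata the regression is a one-line command. To show roughly what `reg` computes, here is a pure-Python sketch of ordinary least squares via the normal equations $X'Xb = X'y$, solved by a hand-written Gaussian elimination (`solve` and `ols` are our own helper names, not Stata internals). It uses only the ten schools in the data table, so its estimates will not match the output above, which is based on all 74 schools; for real work a statistics package should be used.

```python
# COST (yuan), N, TECH, WORKER, VOC for the ten schools in the data table.
DATA = [
    (345_000, 623, 1, 0, 0),
    (537_000, 653, 1, 0, 0),
    (170_000, 400, 0, 0, 0),
    (526_000, 663, 0, 1, 0),
    (100_000, 563, 0, 0, 0),
    (28_000, 236, 0, 0, 1),
    (160_000, 307, 0, 0, 1),
    (45_000, 173, 1, 0, 0),
    (120_000, 146, 1, 0, 0),
    (61_000, 99, 0, 1, 0),
]
NAMES = ("_cons", "N", "TECH", "WORKER", "VOC")


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows):
    """Regress COST on a constant, N, TECH, WORKER and VOC via X'X b = X'y."""
    X = [[1.0, float(n), float(t), float(w), float(v)] for _, n, t, w, v in rows]
    y = [float(row[0]) for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    return b, resid, X


coef, resid, X = ols(DATA)
for name, value in zip(NAMES, coef):
    print(f"{name:>7} {value:12.1f}")
print(f"RSS = {sum(e * e for e in resid):.4e}")
```

Whatever the data, OLS residuals are orthogonal to every regressor, including each dummy, which gives a quick check on a hand-rolled solver like this one.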


The coefficients of TECH, WORKER, and VOC are approximately 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs, in yuan, relative to those of general schools.


The constant term is –55,000, implying that the annual overhead cost of a general academic school is –55,000 yuan. Obviously this is nonsense and indicates that something is wrong with the model.


The top line shows the regression result in equation form. We will derive the implicit cost functions for each type of school.


$\widehat{\text{COST}} = -55{,}000 + 154{,}000\,\text{TECH} + 143{,}000\,\text{WORKER} + 53{,}000\,\text{VOC} + 343N$


In the case of a general school, the dummy variables are all 0 and the equation reduces to the intercept and the term involving N.


General school (TECH = WORKER = VOC = 0): $\widehat{\text{COST}} = -55{,}000 + 343N$


The annual marginal cost per student is estimated at 343 yuan. The annual overhead cost per school is estimated at –55,000 yuan. Obviously a negative amount is inconceivable.


The extra annual overhead cost for a technical school, relative to a general school, is 154,000 yuan. Hence we derive the implicit cost function for technical schools.


Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{\text{COST}} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$


And similarly the extra overhead costs of skilled workers’ and vocational schools, relative to those of general schools, are 143,000 and 53,000 yuan, respectively.


Skilled workers' school (WORKER = 1; TECH = VOC = 0): $\widehat{\text{COST}} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$

Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{\text{COST}} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$
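The implicit cost functions follow from adding each dummy coefficient to the constant. A small Python sketch using the unrounded point estimates from the Stata output (the names `intercept` and `predicted_cost` are illustrative) reproduces the rounded intercepts of –55,000, 99,000, 88,000 and –2,000 yuan; by construction all four functions use the same slope on N.

```python
# Point estimates copied from the Stata output for: reg COST N TECH WORKER VOC
CONS = -54893.09
B_N = 342.6335
DELTA = {"General": 0.0, "Technical": 154110.9, "Workers'": 143362.4, "Vocational": 53228.64}


def intercept(school_type):
    """Implicit overhead cost: constant plus the dummy coefficient (0 for General)."""
    return CONS + DELTA[school_type]


def predicted_cost(school_type, n):
    """Fitted annual cost; the marginal cost per student is common to all types."""
    return intercept(school_type) + B_N * n


for school_type in DELTA:
    a = intercept(school_type)
    print(f"{school_type:<11} COST-hat = {round(a, -3):>9,.0f} + {B_N:.0f}N"
          f"   (unrounded intercept {a:,.2f})")
```

For example, `predicted_cost("General", 400)` is –54,893.09 + 137,053.40 = 82,160.31 yuan. Because the slope is shared, the gap between any two fitted lines is the same at every value of N.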


Note that in each case the annual marginal cost per student is estimated at 343 yuan. The model specification assumes that this figure does not differ according to type of school.


The four cost functions are illustrated graphically.


[Figure: COST against N, showing the four fitted cost functions for technical schools, workers' schools, vocational schools and general schools.]


We can perform t tests on the coefficients in the usual way. The t statistic for N is 8.52, so the marginal cost is (very) significantly different from 0, as we would expect.
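Each t statistic in the output is simply the coefficient divided by its standard error. A minimal Python sketch using the figures printed above; the 5 percent critical value is not shown in the output, so it is backed out here from Stata's 95 percent confidence interval for N (it comes to about 1.995 for 69 degrees of freedom). The dictionary name `ESTIMATES` is illustrative.

```python
# (coefficient, standard error) pairs copied from the Stata output.
ESTIMATES = {
    "N": (342.6335, 40.2195),
    "TECH": (154110.9, 26760.41),
    "WORKER": (143362.4, 27852.8),
    "VOC": (53228.64, 31061.65),
    "_cons": (-54893.09, 26673.08),
}

# Two-sided 5% critical value of t with 69 df, recovered from the 95% interval
# for N: (upper limit - coefficient) / standard error, roughly 1.995.
T_CRIT = (422.8692 - 342.6335) / 40.2195

for name, (coef, se) in ESTIMATES.items():
    t = coef / se  # t statistic for H0: coefficient = 0
    verdict = "reject" if abs(t) > T_CRIT else "do not reject"
    print(f"{name:>7}  t = {t:6.3f}  {verdict} H0 at the 5% level")
```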


The t statistic for the technical school dummy is 5.76, indicating that the annual overhead cost of a technical school is (very) significantly greater than that of a general school, again as expected.


The same is true of skilled workers’ schools, with a t statistic of 5.15.


In the case of vocational schools, however, the t statistic is only 1.71, indicating that the overhead cost of such a school is not significantly different from that of a general school at the 5 percent level (the two-sided p-value is 0.091).


This is not surprising, given that the vocational schools are not much different from the general schools.


Note that the null hypotheses for the tests on the coefficients of the dummy variables are that the overhead costs of the other schools are not different from those of the general schools.


Finally we will perform an F test of the joint explanatory power of the dummy variables as a group. The null hypothesis is $H_0: \delta_T = \delta_W = \delta_V = 0$. The alternative hypothesis is that at least one is different from 0.


The residual sum of squares in the specification including the dummy variables is $5.41 \times 10^{11}$.


. reg COST N

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  1,    72) =   46.82
   Model |  5.7974e+11     1  5.7974e+11               Prob > F      =  0.0000
Residual |  8.9160e+11    72  1.2383e+10               R-squared     =  0.3940
---------+------------------------------               Adj R-squared =  0.3856
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      = 1.1e+05

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   339.0432   49.55144      6.842   0.000       240.2642    437.8222
   _cons |    23953.3   27167.96      0.882   0.381      -30205.04    78111.65
------------------------------------------------------------------------------

The residual sum of squares in the specification excluding the dummy variables is $8.92 \times 10^{11}$.


The reduction in RSS when we include the dummies is therefore $(8.92 - 5.41) \times 10^{11} = 3.51 \times 10^{11}$. We will check whether this reduction is significant with the usual F test.


$F(3, 69) = \dfrac{(8.92 \times 10^{11} - 5.41 \times 10^{11})/3}{5.41 \times 10^{11}/69} = 14.92$

The numerator in the F ratio is the reduction in RSS divided by the cost, which is the 3 degrees of freedom given up when we estimate three additional coefficients (the coefficients of the dummies).


The denominator is RSS for the specification including the dummy variables, divided by the number of degrees of freedom remaining after they have been added.

The F ratio is therefore 14.92.
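
Putting the two pieces together gives the usual F statistic for q linear restrictions. The 14.92 above comes from RSS values rounded to 8.92×10^11 and 5.41×10^11; with the four-figure values printed by Stata the ratio is about 14.88, which does not affect the conclusion. The R-squared form is algebraically equivalent because both regressions share the same total sum of squares. A minimal Python sketch; the function names are hypothetical.

```python
def f_from_rss(rss_r: float, rss_u: float, q: int, df_resid: int) -> float:
    """F = [(RSS_R - RSS_U)/q] / [RSS_U/df_resid]."""
    return ((rss_r - rss_u) / q) / (rss_u / df_resid)


def f_from_r2(r2_r: float, r2_u: float, q: int, df_resid: int) -> float:
    """Equivalent form using R-squared; valid when both models share the same TSS."""
    return ((r2_u - r2_r) / q) / ((1.0 - r2_u) / df_resid)


Q, DF = 3, 69  # three dummy coefficients; 74 - 5 residual degrees of freedom

print(f"{f_from_rss(8.92e11, 5.41e11, Q, DF):.2f}")      # rounded RSS, as in the equation
print(f"{f_from_rss(8.9160e11, 5.4138e11, Q, DF):.2f}")  # RSS as printed by Stata
print(f"{f_from_r2(0.3940, 0.6320, Q, DF):.1f}")         # R-squared to 4 decimal places
```

It prints 14.92, 14.88 and 14.9: the small differences are purely rounding effects in the inputs.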

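F tables do not give the critical value for 3 and 69 degrees of freedom, but it must be lower than the critical value for 3 and 60 degrees of freedom, because critical values fall as the denominator degrees of freedom increase. At the 0.1% significance level, the critical value for 3 and 60 degrees of freedom is 6.17.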

$$F_{\text{crit},\,0.1\%}(3,60) = 6.17$$
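
F tables are not the only option: the critical value for the exact degrees of freedom, and the p-value of the statistic, follow from the F distribution, whose tail probability can be written in terms of the regularized incomplete beta function. The sketch below is a self-contained Python illustration using the standard continued-fraction evaluation of that function (the Numerical Recipes approach) and bisection for the quantile. It assumes only the standard library, and names such as `f_critical` and `f_sf` are hypothetical; in practice a statistics library such as SciPy (`scipy.stats.f.ppf`, `scipy.stats.f.sf`) would normally be used instead.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(x: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F > x) for F(d1, d2), via I_{d2/(d2 + d1 x)}(d2/2, d1/2)."""
    if x <= 0.0:
        return 1.0
    return _betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


def f_critical(alpha: float, d1: int, d2: int, tol: float = 1e-10) -> float:
    """Value c with P(F > c) = alpha, found by bisection (f_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for d2 in (60, 69):
    print(f"0.1% critical value of F(3, {d2}): {f_critical(0.001, 3, d2):.2f}")
print(f"P(F(3, 69) > 14.88) = {f_sf(14.88, 3, 69):.1e}")
```

The first line reproduces the tabulated 6.17 to two decimals; the value for 3 and 69 degrees of freedom is slightly smaller, as the argument in the text requires, and the p-value is far below 0.001.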

Since 14.92 exceeds 6.17, we reject H0 even at the 0.1% significance level. This is not surprising, since t tests show that the coefficients of TECH and WORKER are highly significant.

Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 5.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics (http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx) or the University of London International Programmes distance learning course EC2020 Elements of Econometrics (www.londoninternational.ac.uk/lse).

2012.11.04

