
Train the model with a subset of the data

Test the model on the remaining data (the validation set)

What data to choose for training vs. test?

In a time-series setting, it is natural to hold out the last year (or time period) of the data, to simulate predicting the future from all past data. In most other settings, we will randomly select the training/test split.
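A minimal sketch of a random split in R (the data frame name df and the 80/20 proportion are placeholders for illustration, not from the slides):

set.seed(1)                                            # make the split reproducible
test_idx  <- sample(nrow(df), floor(0.2 * nrow(df)))   # hold out 20% of rows
train_set <- df[-test_idx, ]                           # fit the model here
test_set  <- df[test_idx, ]                            # evaluate MSE here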

Cross-validation: a class of methods that perform many training/test splits and average the error over all the runs

Here is a simple example of 5-fold cross-validation: it gives 5 test sets and hence 5 estimates of the MSE.

The 5-fold CV estimate is obtained by averaging these values.

Split the data up into K “folds”. Iteratively leave fold k out of the training data and use it to test.

The more folds, the smaller each test set (and the larger each training set), but the more times we need to run the estimation procedure. A rule of thumb of 5 to 10 folds is common in practice. This can be done with a simple for loop in R, as sketched below.
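A possible version of that loop; the Auto data and the mpg ~ horsepower model from the ISLR package are assumed here purely as an illustration:

library(ISLR)                                          # Auto data (illustrative choice)
set.seed(1)
k      <- 5
folds  <- sample(rep(1:k, length.out = nrow(Auto)))    # randomly assign each row to a fold
cv_mse <- numeric(k)
for (j in 1:k) {
  fit       <- lm(mpg ~ horsepower, data = Auto[folds != j, ])   # train on the other k-1 folds
  pred      <- predict(fit, newdata = Auto[folds == j, ])        # predict on the held-out fold
  cv_mse[j] <- mean((Auto$mpg[folds == j] - pred)^2)             # test MSE for fold j
}
mean(cv_mse)                                           # the k-fold CV estimate of MSE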

For generalized linear models, the cv.glm() function can be used to perform k-fold cross-validation. For example, one can loop over 10 possible polynomial orders and compute the 10-fold cross-validated error for each, as in the sketch below.
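A sketch of that loop; cv.glm() is in the boot package, and the Auto data with a polynomial in horsepower is an assumed illustration:

library(ISLR)                  # Auto data (illustrative)
library(boot)                  # cv.glm()
set.seed(1)
cv_error_10 <- numeric(10)
for (i in 1:10) {
  glm_fit        <- glm(mpg ~ poly(horsepower, i), data = Auto)   # polynomial of order i
  cv_error_10[i] <- cv.glm(Auto, glm_fit, K = 10)$delta[1]        # 10-fold CV error estimate
}
which.min(cv_error_10)         # polynomial order with the smallest CV error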

The validation set methodology works when we specify a class of models

Examples:

• Linear models with p-order polynomials

• A set of related models such as LDA, QDA and Logit

• Non-parametric models with a smoothing parameter

Once I have a class of models, I loop over the class to try each one and compute a validation-set error on each pass. K-fold CV helps ensure these error estimates are stable

An alternative to the approach of fitting many models and picking the best is to fit a single model using all the predictors and throw out the less useful ones in one step.

This could also help pare down my feature set into a more manageable set before trying out fancy models and deeper analysis

This helps guard against overfitting and gives higher model interpretability

Regularization helps achieve these aims!

Suppose we have p features and want to fit the following estimating equation:

$$y_i = \alpha + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i$$

OLS will find the $\hat{\beta}$'s that minimize the sum of squared errors in the training set. We have learned that this will tend to overfit

Another way of putting this: we end up with overly complex ("squiggly") models that have too much variance.

Regularization is a method to reduce variance in our model by imposing a penalty on large $\hat{\beta}$'s. We can then tune this penalty to minimize MSE in a validation set.

To do so, we minimize the penalized least-squares criterion

$$\sum_{i=1}^{n}\Big(y_i - \alpha - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

The size of the penalty term 𝜆 determines how aggressively we lower the coefficients

Since the penalty function is the absolute value, we will often set some $\hat{\beta}$'s exactly equal to zero

The idea is that for a feature k that does not improve fit by very much, it’s not worth suffering the penalty, 𝜆 ∗ |𝛽𝑘|, of adding it into the model
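A minimal sketch of tuning the lasso penalty by cross-validation; the slides do not name a package, so glmnet is used here, with simulated data as placeholders:

library(glmnet)
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)            # simulated features (hypothetical)
beta <- c(3, -2, 1.5, rep(0, p - 3))       # only the first 3 features truly matter
y <- drop(x %*% beta) + rnorm(n)
cv_fit <- cv.glmnet(x, y, alpha = 1)       # alpha = 1 is the lasso penalty; lambda tuned by 10-fold CV
coef(cv_fit, s = "lambda.min")             # many coefficients come back exactly zero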

Binary outcomes: equal to 1 or 0. Examples: coin is heads or tails, person is guilty or innocent, default or not on a loan, etc.

Continuous outcomes: can take any real value

Categorical outcomes: can take one of N qualitative values. Example: mapping symptoms to a well-defined disease.

Traditional regression is designed for continuous outcomes.

We use classification for binary and categorical outcomes

A mapping of categories to numbers comes with strong implications

It would imply particular "differences" across qualitative conditions (e.g., coding three diseases as 1, 2, 3 implies disease 3 is twice as far from disease 1 as disease 2 is)

Outcome: 0 or 1

Given features X, we want to say “what is the probability the outcome=1”.

We can write this as 𝑃(𝑜𝑢𝑡𝑐𝑜𝑚𝑒 = 1|𝑋)

Our model will output a probability even though real outcomes will always be 0 or 1
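A brief sketch in R; the Default data from the ISLR package and the balance/income predictors are an assumed illustration:

library(ISLR)                             # Default data: did the person default on a loan?
fit  <- glm(default ~ balance + income, data = Default, family = binomial)
prob <- predict(fit, type = "response")   # fitted probabilities P(outcome = 1 | X), all in [0, 1]
head(prob)                                # the observed outcomes themselves are only ever No/Yes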


Inverse Elasticity Rule

• Profit Max (MR=MC)

• Price-cost margin (Lerner index) = 1 over elasticity

• Price minus marginal costs divided by price is referred to as gross margin.

Maximizing profit $\pi = p\,q(p) - c(q(p))$ gives the first-order condition

$$0 = q(p) + p\,q'(p) - c'(q)\,q'(p) \qquad (\text{MR} = \text{MC})$$

which rearranges to the inverse elasticity rule

$$\frac{p - c'(q)}{p} = \frac{1}{\epsilon(p)}, \qquad \epsilon(p) \equiv -\frac{q'(p)\,p}{q(p)}$$
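A quick numerical illustration (the numbers are hypothetical): if $\epsilon(p) = 2$ and marginal cost is \$10, the rule gives $\frac{p - 10}{p} = \frac{1}{2}$, so the profit-maximizing price is $p = \$20$, a 50% gross margin.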


Inverse Elasticity Rule

• Suppose MC=0. Then quantity is chosen so that elasticity is 1.

• Intuition: if marginal costs are zero, then optimize for revenue. Total revenue grows until elasticity = 1.

• If MC>0, one will “stop” before reaching elasticity = 1.

With $c'(q) = 0$, the rule

$$\frac{p - c'(q)}{p} = \frac{1}{\epsilon(p)}$$

becomes $1 = \frac{1}{\epsilon(p)}$, so price (equivalently, quantity) is chosen where $\epsilon(p) = 1$.
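For example, with linear demand $q(p) = a - bp$ (a hypothetical functional form), revenue $p\,q(p) = ap - bp^2$ is maximized at $p = a/(2b)$, which is exactly the price at which $\epsilon(p) = \frac{bp}{a - bp} = 1$.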

Inverse Elasticity Rule 2

• Suppose we sell n goods indexed i=1,…,n

• Demands $x_i(p)$

• Profit: $\displaystyle \pi(p) = \sum_{i=1}^{n} (p_i - mc_i)\,x_i(p)$

• If we assume constant marginal cost, this simplification covers the case of selling the same good in multiple markets or to multiple customer "types"

• Cross-price elasticity: $\displaystyle \epsilon_{ij} = \frac{p_j}{x_i}\,\frac{d x_i}{d p_j}$

• Note: no minus sign in this definition

• Positive means substitutes; negative means complements


Representative Consumer Assumption

• If there is a representative consumer maximizing utility:

max u(x) − px, so from the first-order condition and its total derivative:

$$u'(x) = p \quad\text{and}\quad u''(x)\,dx = dp$$

• Thus there are symmetric cross-derivatives (recall from multivariate calculus that second derivatives are symmetric):

$$\frac{\partial x_i}{\partial p_j} = \frac{\partial x_j}{\partial p_i}$$

This rule need not hold in practice, but is a commonly made assumption


In Matrix Notation

The first-order condition for each price $p_i$ is

$$0 = x_i(p) + \sum_{j=1}^{n} (p_j - mc_j)\,\frac{\partial x_j}{\partial p_i}$$

Divide by $x_i(p)$, use the symmetry $\frac{\partial x_j}{\partial p_i} = \frac{\partial x_i}{\partial p_j}$, and write the price-cost margin (Lerner index) as

$$L_i = \frac{p_i - mc_i}{p_i}$$

to get

$$0 = 1 + \sum_{j=1}^{n} \epsilon_{ij}\,L_j$$

In matrix notation, $0 = \mathbf{1} + E\,L$, and thus $L = -E^{-1}\mathbf{1}$.
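A small numerical sketch of the matrix formula in R (the elasticity values below are hypothetical):

E <- matrix(c(-2.0,  0.3,
               0.4, -1.5), nrow = 2, byrow = TRUE)   # rows (e11, e12) and (e21, e22); cross terms > 0 => substitutes
L <- -solve(E) %*% rep(1, 2)                          # L = -E^{-1} 1: optimal price-cost margins
L                                                     # approximately 0.625 and 0.833
-1 / diag(E)                                          # single-good benchmarks -1/e11, -1/e22: about 0.5 and 0.667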

Two Good Formula

• L = −E⁻¹ 1 yields, using the rule for inverting a 2×2 matrix:

$$\begin{pmatrix} L_1 \\ L_2 \end{pmatrix} = -\begin{pmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

$$L_1 = \frac{-(e_{22} - e_{12})}{e_{11}e_{22} - e_{12}e_{21}}
     = \frac{-\left(1 - \frac{e_{12}}{e_{22}}\right)}{e_{11} - \frac{e_{12}e_{21}}{e_{22}}}
     = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{e_{12}}{e_{22}}}{1 - \frac{e_{12}e_{21}}{e_{22}e_{11}}}$$

(divide top and bottom by $e_{22}$, then factor $e_{11}$ out of the denominator)

Two Good Formula

• L = −E⁻¹ 1 yields

$$L_1 = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{e_{12}}{e_{22}}}{1 - \frac{e_{12}e_{21}}{e_{22}e_{11}}}$$

$\frac{e_{12}e_{21}}{e_{22}e_{11}}$ will be between 0 and 1 because $e_{12}e_{21} < e_{22}e_{11}$.

This is because cross-price elasticities have to be smaller than the relevant own-price elasticities.


Two Good Formula for Substitutes

• L = −E⁻¹ 1 yields

$$L_1 = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{e_{12}}{e_{22}}}{1 - \frac{e_{12}e_{21}}{e_{22}e_{11}}}
     = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{(+)}{(-)}}{1 - \frac{(+)(+)}{(-)(-)}}
     = \frac{-1}{e_{11}}\cdot\frac{(>1)}{(\text{between } 0\text{ and }1)}$$

With $e_{12} > 0$, the numerator is greater than 1 and the denominator is between 0 and 1, so $L_1$ exceeds the single-good margin $-1/e_{11}$.


Two Good Formula for Complements

• L = −E⁻¹ 1 yields

$$L_1 = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{e_{12}}{e_{22}}}{1 - \frac{e_{12}e_{21}}{e_{22}e_{11}}}
     = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{(-)}{(-)}}{1 - \frac{(-)(-)}{(-)(-)}}$$

With $e_{12} < 0$, the numerator $1 - \frac{e_{12}}{e_{22}}$ is now between 0 and 1, so $L_1$ falls below the single-good margin $-1/e_{11}$.

Two Good Formula Review

• L = −E⁻¹ 1 yields

$$L_1 = \frac{-1}{e_{11}}\cdot\frac{1 - \frac{e_{12}}{e_{22}}}{1 - \frac{e_{12}e_{21}}{e_{22}e_{11}}}$$

𝒆𝟏𝟐 > 𝟎, goods are substitutes. A price decrease on product 2 decreases sales on product 1 (go in same direction)

𝒆𝟏𝟐 < 𝟎, goods are complements. A price decrease on product 2 increases sales on product 1 (go in opposite directions)

$e_{11}$ and $e_{22}$ will be negative due to the law of demand (note that earlier, in the single-good rule, we "embedded" the negative sign in the elasticity definition)
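A quick R check of these review points, comparing $L_1$ to the single-good margin $-1/e_{11}$ (the lerner() helper and the elasticity values are hypothetical):

lerner <- function(E) -solve(E) %*% c(1, 1)                        # L = -E^{-1} 1
E_subs <- matrix(c(-2.0,  0.5,  0.5, -2.0), 2, 2, byrow = TRUE)    # e12 > 0: substitutes
E_comp <- matrix(c(-2.0, -0.5, -0.5, -2.0), 2, 2, byrow = TRUE)    # e12 < 0: complements
c(substitutes = lerner(E_subs)[1], complements = lerner(E_comp)[1], single_good = -1 / -2.0)
# substitutes ~ 0.667 > 0.5; complements ~ 0.4 < 0.5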


[Figure: bundling demand regions in $(v_1, v_2)$ valuation space, with stand-alone prices $p_1$, $p_2$ and bundle price $p_B$ partitioning consumers into "Buy Both", "Buy Good 1", "Buy Good 2", and "Buy Nothing"]

Reducing the bundle price generates additional sales of both goods with a single price cut

