America CAS Seminar on Ratemaking March 2005 Presented by: Serhat Guven An Introduction to GLM Theory Refinements

America

CAS Seminar on Ratemaking
March 2005
Presented by: Serhat Guven

An Introduction to GLM Theory
Refinements


An Introduction to GLM Theory

OUTLINE

Background

GLM Building Blocks

- Link Function
- Error Distribution
- Model Structure

Diagnostics

Summary

PURPOSE: To discuss techniques that refine the structural design, including proper tools and diagnostics, of the GLM thereby allowing for a more flexible model


Purpose of Predictive Modeling

To predict a response variable using a series of explanatory variables (or rating factors).

Traditional methods focus on the parameters; modeling also requires the analyst to consider the validation of the parameters.

- Dependent/Response: Losses, Claims, Retention
- Weights: Claims, Exposures, Premium
- Independent/Predictors: Age, Accidents, Limit, Convictions, Territory, Credit Score
- Statistical Model
- Model Results: Parameters, Validation Statistics


Purpose of Predictive Modeling

To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience.

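[Figure: Model Complexity (Number of Parameters), running from the Overall Mean through the "Best" Model to 1 parameter for each observation. Annotation at the Overall Mean end: "Strong predictive power yet very poor explanatory power". Annotation at the 1 parameter for each observation end: "Good predictor of previous experience but poor predictor of future experience".]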

Traditional methods tend to create unnecessarily complex structures that tend to overfit the data.


GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption

Generalized Linear Models

y = h(Linear Combination of Rating Factors) + Error

$g = h^{-1}$ is called the LINK function and is chosen to measure the “signal” most accurately

Error should reflect underlying process and can come from the exponential family

Response Variable = Systematic Component (Signal) + Random Component (Noise)

Linear combination of rating factors is the model structure


Generalized Linear Models

More formally:

Response Variable = Systematic Component + Random Component

$$Y = E[Y] + \varepsilon$$

Where:

$$E[Y] = \mu = h(X\beta + \xi)$$

And:

$$\mathrm{Var}(Y) = \frac{\phi\, V(\mu)}{\omega}$$

- Link function ($g = h^{-1}$): links random and systematic component.
- Design matrix ($X$): identifies predictor variables for each observation.
- Parameters ($\beta$): quantities estimated via log likelihood function.
- Offset term ($\xi$): allows incorporation of known effects or restrictions.
- Scale parameter ($\phi$)
- Variance function ($V(\mu)$)
- Prior weights ($\omega$)


Generalized Linear Models

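The general solution for the GLM parameters:

$$\beta^{(r)} = \left(X^{T} W^{(r-1)} X\right)^{-1} X^{T} W^{(r-1)} \left[\eta^{(r-1)} + g'\!\left(\mu^{(r-1)}\right)\left(y - \mu^{(r-1)}\right)\right]$$

Where:

$$W^{(r)} = \mathrm{diag}\!\left(\frac{\omega}{g'\!\left(\mu^{(r)}\right)^{2}\, V\!\left(\mu^{(r)}\right)}\right)$$

and:

$$\eta^{(r)} = g\!\left(\mu^{(r)}\right), \qquad \mu^{(r)} = h\!\left(X\beta^{(r)}\right), \qquad g = h^{-1}, \qquad \omega = \text{weights}$$

The link function enters through $g$, the error distribution through $V$, and the model structure through $X$.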
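The iterative solution for the GLM parameters can be sketched in a few lines. This is a minimal illustration, assuming a Poisson error with a log link, so that g'(mu) = 1/mu and V(mu) = mu; the function names `solve` and `irls_poisson_log`, the starting values and the small data set are hypothetical and not part of the presentation.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def irls_poisson_log(X, y, weights, iterations=25):
    """Iteratively reweighted least squares for a Poisson error and log link."""
    n, p = len(X), len(X[0])
    mu = [yi + 0.5 for yi in y]            # assumed starting values mu^(0)
    eta = [math.log(m) for m in mu]        # eta = g(mu)
    beta = [0.0] * p
    for _ in range(iterations):
        # W = diag(omega / (g'(mu)^2 V(mu))) = omega * mu for this model
        W = [w * m for w, m in zip(weights, mu)]
        # Working response: eta + g'(mu) (y - mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(W[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(W[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwz)
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
    return beta

# Hypothetical data: intercept plus one indicator, two observations per level.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1.0, 3.0, 4.0, 8.0]
beta_hat = irls_poisson_log(X, y, weights=[1.0] * 4)
```

With one indicator variable the fitted means equal the two group averages (2 and 6), so `beta_hat` should be close to [log 2, log 3].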


GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption

Components of a GLM

y = h(Linear Combination of Rating Factors) + Error

Response Variable = Systematic Component (Signal) + Random Component (Noise)

Basic Building Blocks

- Link Function
- Error Structure
- Model Structure


GLM Building Blocks: Link Functions

The link function ($g = h^{-1}$) is chosen based on how the factors are related, so as to produce the best signal:

- Log: variables related multiplicatively (e.g., risk modeling)
- Identity: variables related additively (e.g., risk modeling)
- Logit: retention or risk modeling
- Reciprocal: canonical link for gamma distribution (e.g., severity modeling)
- Mixed: additive/multiplicative rating algorithms

y = h(Linear Combination of Rating Factors) + Error


Example: Log Link

The rating structure is multiplicative and the premium for a youthful policyholder living in Area C is: $1,955=$1,000 x 1.700 x 1.150

Base Premium = 1,000 = exp(6.908)

Policyholder Age (p)   Relativity
Youthful               1.700 = exp(0.531)
Adult                  1.000 = exp(0.000)
Mature                 0.800 = exp(-0.223)
Seniors                1.100 = exp(0.095)

Rating Area (r)        Relativity
A                      0.900 = exp(-0.105)
B                      1.000 = exp(0.000)
C                      1.150 = exp(0.140)
D                      1.300 = exp(0.262)
E                      1.500 = exp(0.405)

The signal allows us to populate rating tables:

Premium = Base Premium x Policyholder Age x Rating Area
        = exp(6.908) * exp(0.531) * exp(0.140)
        = exp(6.908 + 0.531 + 0.140)
        = exp(b+p+r)
        = h(linear combination of rating factors)
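The log-link arithmetic in this example can be reproduced directly. The sketch below uses the log-scale values shown on the slide; the names `BASE`, `AGE`, `AREA` and `premium` are illustrative assumptions.

```python
import math

# Log-scale parameters as printed on the slide (rounded to three decimals there).
BASE = 6.908  # b: log of the base premium
AGE = {"Youthful": 0.531, "Adult": 0.000, "Mature": -0.223, "Seniors": 0.095}   # p
AREA = {"A": -0.105, "B": 0.000, "C": 0.140, "D": 0.262, "E": 0.405}           # r

def premium(age, area):
    """Apply the inverse log link h = exp to the linear predictor b + p + r."""
    return math.exp(BASE + AGE[age] + AREA[area])

youthful_c = premium("Youthful", "C")
base_case = premium("Adult", "B")
```

Because the printed coefficients are rounded, `youthful_c` lands within a couple of dollars of the tabulated 1,955 rather than matching it exactly.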


GLM Building Blocks: Link Functions

Link function relates the independent predictors to the response in a nonlinear form:

- Pure Multiplicative (Log): $Y = \exp(X\beta)$
- Pure Additive (Identity): $Y = X\beta$
- Logit: $Y = \dfrac{1}{1 + e^{-X\beta}}$
- Reciprocal: $Y = \dfrac{1}{X\beta}$
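As a small illustration of the four link functions listed on this slide, the sketch below pairs each link g with its inverse h and checks that they undo each other. The dictionary names are assumptions made for the example.

```python
import math

# Inverse links h = g^{-1}: map the linear predictor eta to the mean.
INVERSE_LINKS = {
    "log": lambda eta: math.exp(eta),
    "identity": lambda eta: eta,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "reciprocal": lambda eta: 1.0 / eta,
}

# Links g: map the mean back to the linear predictor.
LINKS = {
    "log": lambda mu: math.log(mu),
    "identity": lambda mu: mu,
    "logit": lambda mu: math.log(mu / (1.0 - mu)),
    "reciprocal": lambda mu: 1.0 / mu,
}

eta = 0.5  # an arbitrary value of the linear predictor
round_trip = {name: LINKS[name](INVERSE_LINKS[name](eta)) for name in LINKS}
```

Each entry of `round_trip` should return the original value of `eta` up to floating-point error.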


Mixed Rating Algorithms

Mixed additive – multiplicative rating algorithm:

Base Rate*(Age+Gender+Usage)*Driving Record*Territory*Limit

Mixture model structures do not fit within the framework of garden variety GLMs

Solutions:

- Create n-way tables from the pure multiplicative models

- Build hierarchical GLM to systematically estimate the additive component from the multiplicative factors

- Remove the restriction on the link function to calculate parameters directly


Generalized Nonlinear Models

Examples of models that do not fit into traditional GLMs:

- Mixed Additive Multiplicative Model
- Alternative Mixtures
- Complicating the logit functions

[Formulas for the three example models: not recoverable from the extracted text.]

Alternate link functions allow the ability to introduce additional nonlinearities into the solution.


Solution for Non-Linear Models

- Use identity link function
- Replace design matrix X in GLM solution with D

$$\beta^{(r)} = \left(D^{T} W^{(r-1)} D\right)^{-1} D^{T} W^{(r-1)}\, y$$

Where

$$W^{(r)} = \mathrm{diag}\!\left(\frac{\omega}{V\!\left(\mu^{(r)}\right)}\right)$$

The ijth element of D is the derivative of the predictor with regard to the jth parameter for the ith observation.

Sort of equivalent to solving GLM with different design matrix. However, there are two “linear predictors” in use.

“…the method is likely to be most useful for determining if a reasonable fit can be improved, rather than for the somewhat more optimistic goal of correcting a hopeless situation.”

-Pregibon (1980)


GLM Building Blocks: Error Structure

Reflects the variability of the underlying process

y = h(Linear Combination of Rating Factors) + Error

- Gamma consistent with severity modeling, may want to try Inverse Gaussian
- Poisson consistent with frequency modeling
- Tweedie consistent with pure premium modeling
- Normal useful for a variety of applications

[Figures: four histograms of observed responses: "Density: Severity", "Frequency: Frequency", "Density: Pure Premium" and "Density: Loss Ratio".]


Error Structure: Variance Function

Observed Response   Error Structure    Variance Function V(µ)   Scale Parameter
                    Normal             µ^0
Claim Frequency     Poisson            µ                        1
Claim Severity      Gamma              µ^2
Risk Premium        Gamma or Tweedie   µ^T
New/Renewal Rate    Binomial           µ(1-µ)                   1

Error structure is also used to incorporate assumptions about the relationship between the uncertainty and the predicted value
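The variance functions in this table plug into Var(Y) = φ V(µ) / ω. The sketch below is illustrative only: the function names are assumptions, and `tweedie_variance` takes the power T as an argument because the table does not give a value for it.

```python
# Variance functions V(mu) for the error structures in the table.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                # mu^0
    "poisson": lambda mu: mu,                # mu^1
    "gamma": lambda mu: mu ** 2,             # mu^2
    "binomial": lambda mu: mu * (1.0 - mu),  # mu (1 - mu)
}

def tweedie_variance(power):
    """Return V(mu) = mu^T for a chosen Tweedie power T (hypothetical helper)."""
    return lambda mu: mu ** power

def var_y(mu, family, phi=1.0, weight=1.0):
    """Var(Y) = phi * V(mu) / omega, with scale phi and prior weight omega."""
    return phi * VARIANCE_FUNCTIONS[family](mu) / weight

example_gamma = var_y(2.0, "gamma", phi=0.5, weight=2.0)   # 0.5 * 4 / 2
example_binomial = var_y(0.5, "binomial")                  # largest at mu = 0.5
example_tweedie = tweedie_variance(1.5)(4.0)               # 4 ** 1.5
```

The binomial case shows the pattern behind the variance function: variability is largest when the probability is 0.5 and shrinks toward the extremes.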


Example: Binomial Distribution

Binomial

- Basic functional form in decision modeling
- Belongs to the exponential family of distributions
- Can be extended to multinomial distributions

Variance Function = µ(1-µ)

[Figure: Variance plotted against Mean. Annotations: "Extreme probabilities of success/failure related to low variability"; "Higher variability associated with less certain probability outcomes".]


Additional Variance Functions

Observed Response   Error Structure          Variance Function V(µ)   Scale Parameter
                    Normal                   µ^0
Claim Frequency     Poisson                  µ                        1
Claim Severity      Gamma                    µ^2
Risk Premium        Gamma or Tweedie         µ^T
New/Renewal Rate    Binomial                 µ(1-µ)                   1
Claim Frequency     Over-dispersed Poisson   µ                        K
Claim Severity      Inverse Gaussian         µ^3

Error structure is also used to incorporate assumptions about the uncertainty and the predicted value


Error Structure: Scale Parameter

Heterogeneous exposure bases

- If modeled together, exposures with high variability may mask patterns of less random risks
- If loss trends vary by exposure class, the proportion each represents of the total will change and may mask important trends
- Independent predictors can have different effects on different perils

If the data cannot be split, use joint modeling techniques to improve overall fit

[Figures: two plots of Studentized Standardized Deviance Residuals against Fitted Value. First plot: cluster of negative residuals caused by combination of two types of risks. Second plot: joint modeling techniques improved the residual plot.]


GLM Building Blocks: Model Structure

Include variables that are predictive, exclude those that are not

- Gender may not have major impact on theft severity

Simplify some rating factors, if full inclusion not necessary

- Some levels within a particular predictor may be grouped together (e.g., 50-54 year olds)

- A curve may replicate the signal (e.g., amount of insurance)

- Scoring levels to combine rating factors into a single concept thereby untangling impacts of various factors

Complicate model if the relationship between levels of one variable depends on another characteristic

- The difference between males and females depends on age

y = h(Linear Combination of Rating Factors) + Error


Complicating the Model: Interactions

Interactions are required when the combined effect of multiple rating levels of two different independent rating factors is different than the additive effect of the simple parameters.

Interaction topics

- Interactions versus correlations

- Identifying interactions

- Full and partial interactions

- Simplifying interactions


Interactions versus Correlations

Distributional Correlations: Observed Weights

          Male   Female   Totals
Youthful  w_YM   w_YF     w_Y.
Adult     w_AM   w_AF     w_A.
Mature    w_MM   w_MF     w_M.
Seniors   w_SM   w_SF     w_S.
Totals    w_.M   w_.F     w_..

Assumption of distributional independence. Let:

$$\hat{w}_{i,j} = \frac{w_{i,\cdot}\; w_{\cdot,j}}{w_{\cdot,\cdot}}$$

$$Q = \sum_i \sum_j \frac{\left(w_{i,j} - \hat{w}_{i,j}\right)^2}{\hat{w}_{i,j}}$$

Testing the assumption: Cramér's V rescales this statistic to the range [0, 1]

A simple GLM model addresses distributional correlations.
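A minimal sketch of the independence statistic Q for a table of observed weights, followed by Cramér's V in its usual definition, sqrt(Q / (total weight x (smaller table dimension - 1))), which lies between 0 and 1. The weights below are made-up numbers and `independence_stat` is a hypothetical helper name.

```python
import math

def independence_stat(w):
    """Return (Q, Cramér's V) for a two-way table of weights w[i][j]."""
    row_tot = [sum(row) for row in w]
    col_tot = [sum(col) for col in zip(*w)]
    total = sum(row_tot)
    q = 0.0
    for i, row in enumerate(w):
        for j, w_ij in enumerate(row):
            # Expected weight under distributional independence: w_i. * w_.j / w_..
            expected = row_tot[i] * col_tot[j] / total
            q += (w_ij - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return q, math.sqrt(q / (total * k))

# Hypothetical exposure weights: rows are age groups, columns are (Male, Female).
weights = [
    [120.0, 80.0],    # Youthful
    [300.0, 300.0],   # Adult
    [200.0, 250.0],   # Mature
    [90.0, 160.0],    # Seniors
]
Q, cramers_v = independence_stat(weights)

# A table whose rows are proportional has no distributional correlation.
Q_indep, v_indep = independence_stat([[10.0, 20.0], [30.0, 60.0]])
```

A value of V near 0 indicates that the two rating factors are distributed almost independently; larger values indicate stronger distributional correlation.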


Interactions versus Correlations

Interactions: Observed Responses

          Male   Female
Youthful  y_YM   y_YF
Adult     y_AM   y_AF
Mature    y_MM   y_MF
Seniors   y_SM   y_SF

Assumption of simple model adequacy. Let:

$$\hat{y}_{i,j} = \mu_{i,j} = h\!\left(x_{i,j}\,\beta + \xi\right)$$

$$Q = \sum_i \sum_j \frac{\left(y_{i,j} - \hat{y}_{i,j}\right)^2}{\hat{y}_{i,j}}$$

Testing the assumption: Chi Squared Test scales this statistic from (0,1)

A more complex GLM model is required to handle such interactions.


Identifying Interactions

Decomposing the Interaction for simplification

- Graphically
- Chi-squared test
- Signs test


Identifying Interactions

Understanding complex relationships between multiple rating factors.

- Can view interactions on an “absolute” scale …
- … or on a rescaled basis.

[Figures: "Policyholder Age x Policyholder Sex Interaction" and "Policyholder Age x Policyholder Sex Interaction - Rescaled", each plotting Policyholder Sex (Male) and Policyholder Sex (Female) against Policyholder Age (16 through 70+).]


Identifying Interactions

Consistency of interaction over time.

- Three way interaction with time …

[Figures: "Predicted Values [LossYear (1995)]" and "Predicted Values [LossYear (1996)]", each plotting Policyholder Sex (Male) and Policyholder Sex (Female) against Policyholder Age.]


Parameter Notation: Simple Model

Relationship between rating levels of one rating factor is constant for all levels of other rating variables.

Assume two rating variables:

- Age: Youthful, Adult (base), Mature, Seniors

- Gender: Male (base), Female

Simple Model: Age + Gender (5 Parameters)

          Male     Female
Youthful  β0+βY    β0+βY+βF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF
Seniors   β0+βS    β0+βS+βF


Simple Factor Model

Simple Model: Age + Gender

Relationship between males and females is a constant exp(βF) at each age.

[Figure: Predicted Values by Policyholder Age (16 through 70+) for Policyholder Sex (Male) and Policyholder Sex (Female).]
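The constant male/female relationship in the simple model can be checked numerically. The sketch below assumes a log link (implied by the exp(βF) on the slide) and uses made-up parameter values; `fitted` is a hypothetical helper.

```python
import math

# Hypothetical log-link parameters for the simple model Age + Gender.
# Adult and Male are the base levels, so they carry no parameter of their own.
beta = {"b0": -2.0, "bY": 0.5, "bM": -0.2, "bS": 0.1, "bF": -0.15}
AGE_PARAM = {"Youthful": "bY", "Adult": None, "Mature": "bM", "Seniors": "bS"}

def fitted(age, female):
    """Predicted value exp(b0 + age term + gender term)."""
    eta = beta["b0"]
    if AGE_PARAM[age] is not None:
        eta += beta[AGE_PARAM[age]]
    if female:
        eta += beta["bF"]
    return math.exp(eta)

# Female-to-male ratio at each age: the age term cancels, leaving exp(bF).
ratios = {age: fitted(age, True) / fitted(age, False) for age in AGE_PARAM}
```

Every entry of `ratios` equals exp(bF) up to floating-point error, which is why the male and female curves are parallel on a log scale in the simple model.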


Parameter Notation: Full Interaction

Relationship between rating levels of one rating factor is different for all levels of another rating variable.

Assume two rating variables:

- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

No interaction: Age + Gender (# Parameters: 5)

          Male     Female
Youthful  β0+βY    β0+βY+βF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF
Seniors   β0+βS    β0+βS+βF

Interaction: Age + Gender + Age.Gender (# Parameters: 8)

          Male     Female
Youthful  β0+βY    β0+βY+βF+βYF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF+βMF
Seniors   β0+βS    β0+βS+βF+βSF


Full Interaction Factor Model

Simple Model: Age + Gender
Relationship between males and females is a constant exp(βF) at each age.

Full Interaction Model: Age + Gender + Age.Gender
Relationship between males and females is different at each age.

[Figures: Predicted Values by Policyholder Age (16 through 70+) for Policyholder Sex (Male) and Policyholder Sex (Female), one panel for each model.]


Parameter Notation: Partial Interaction

Relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.

Assume two rating variables:

- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (# Parameters: 8)

          Male     Female
Youthful  β0+βY    β0+βY+βF+βYF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF+βMF
Seniors   β0+βS    β0+βS+βF+βSF

Partial Interaction: Age + Age.Gender (# Parameters: 7)

          Male     Female
Youthful  β0+βY    β0+βY+βYF
Adult     β0       β0
Mature    β0+βM    β0+βM+βMF
Seniors   β0+βS    β0+βS+βSF


Partial Interaction Factor Model

Full Interaction Model: Age + Gender + Age.Gender
Relationship between males and females is different at each age.

Partial Interaction Model: Age + Age.Gender
Interaction adjusts for removal of the simple gender term, except at the base level (36-40).

[Figures: Predicted Values by Policyholder Age (16 through 70+) for Policyholder Sex (Male) and Policyholder Sex (Female), one panel for each model.]


Parameter Notation: Partial Interaction

Relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.

Assume two rating variables:

- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (# Parameters: 8)

          Male     Female
Youthful  β0+βY    β0+βY+βF+βYF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF+βMF
Seniors   β0+βS    β0+βS+βF+βSF

Partial Interaction: Gender + Age.Gender (# Parameters: 5)

          Male     Female
Youthful  β0       β0+βF+βYF
Adult     β0       β0+βF
Mature    β0       β0+βF+βMF
Seniors   β0       β0+βF+βSF


Partial Interaction Factor Model

Full Interaction Model: Age + Gender + Age.Gender
Relationship between males and females is different at each age.

Partial Interaction Model: Gender + Age.Gender
Interaction adjusts for removal of the simple age term, except at the base level (male).

[Figures: Predicted Values by Policyholder Age (16 through 70+) for Policyholder Sex (Male) and Policyholder Sex (Female), one panel for each model.]


Parameter Notation: Interaction Only

Only variation allowed is at non-base levels.

Assume two rating variables:

- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (# Parameters: 8)

          Male     Female
Youthful  β0+βY    β0+βY+βF+βYF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF+βMF
Seniors   β0+βS    β0+βS+βF+βSF

Interaction Only: Age.Gender (# Parameters: 4)

          Male     Female
Youthful  β0       β0+βYF
Adult     β0       β0
Mature    β0       β0+βMF
Seniors   β0       β0+βSF


Interaction Only Factor Model

Partial Interaction Model: Gender + Age.Gender
Interaction adjusts for removal of the simple age term, except at the base level (male).

Interaction Only Model: Age.Gender
Interaction adjusts for removal of the simple terms, except at the base levels (male and 36-40).

[Figures: Predicted Values by Policyholder Age (16 through 70+) for Policyholder Sex (Male) and Policyholder Sex (Female), one panel for each model.]


Parameter Notation: Summaries

Interaction: Age + Gender + Age.Gender (# Parameters: 8)

          Male     Female
Youthful  β0+βY    β0+βY+βF+βYF
Adult     β0       β0+βF
Mature    β0+βM    β0+βM+βF+βMF
Seniors   β0+βS    β0+βS+βF+βSF

Partial Interaction: Age + Age.Gender (# Parameters: 7)

          Male     Female
Youthful  β0+βY    β0+βY+βYF
Adult     β0       β0
Mature    β0+βM    β0+βM+βMF
Seniors   β0+βS    β0+βS+βSF

Partial Interaction: Gender + Age.Gender (# Parameters: 5)

          Male     Female
Youthful  β0       β0+βF+βYF
Adult     β0       β0+βF
Mature    β0       β0+βF+βMF
Seniors   β0       β0+βF+βSF

Interaction Only: Age.Gender (# Parameters: 4)

          Male     Female
Youthful  β0       β0+βYF
Adult     β0       β0
Mature    β0       β0+βMF
Seniors   β0       β0+βSF
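The parameter counts in these summaries can be reproduced by building the indicator columns for each model formula over the eight Age x Gender cells. This is a sketch; the names `columns`, `MODELS` and `linear_predictor`, and the plain-text parameter labels such as `bYF`, are assumptions made for the illustration.

```python
AGES = ["Youthful", "Adult", "Mature", "Seniors"]   # Adult is the base level
GENDERS = ["Male", "Female"]                        # Male is the base level
CELLS = [(a, g) for a in AGES for g in GENDERS]
NON_BASE_AGES = ("Youthful", "Mature", "Seniors")

def columns(terms):
    """Map each parameter name to its 0/1 indicator column over the 8 cells."""
    cols = {"b0": [1] * len(CELLS)}
    if "Age" in terms:
        for a in NON_BASE_AGES:
            cols["b" + a[0]] = [int(ca == a) for ca, _ in CELLS]
    if "Gender" in terms:
        cols["bF"] = [int(g == "Female") for _, g in CELLS]
    if "Age.Gender" in terms:
        for a in NON_BASE_AGES:
            cols["b" + a[0] + "F"] = [int(ca == a and g == "Female") for ca, g in CELLS]
    return cols

MODELS = {
    "Age + Gender": {"Age", "Gender"},
    "Age + Gender + Age.Gender": {"Age", "Gender", "Age.Gender"},
    "Age + Age.Gender": {"Age", "Age.Gender"},
    "Gender + Age.Gender": {"Gender", "Age.Gender"},
    "Age.Gender": {"Age.Gender"},
}

# Number of parameters in each model, including the intercept b0.
counts = {name: len(columns(terms)) for name, terms in MODELS.items()}

def linear_predictor(terms, age, gender):
    """Parameters that apply to one cell, written as in the slide tables."""
    i = CELLS.index((age, gender))
    return "+".join(name for name, col in columns(terms).items() if col[i])

youthful_female_partial = linear_predictor({"Age", "Age.Gender"}, "Youthful", "Female")
```

The resulting `counts` are 5, 8, 7, 5 and 4, matching the tables, and `linear_predictor` reproduces individual cells such as b0+bY+bYF for a youthful female under Age + Age.Gender.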


Simplifying Interactions

Complex relationships can be simplified using curves, groups, etc.

- Age curve simplified with several curves and a grouping.
- Relationship between males and females simplified too.

[Figures: "Age and Gender (Male at Base)" and "Age and Gender Predicted Values", both by Policyholder Age (16 through 70+). Annotations: 3rd Degree Curve; 4th Degree Curve; Ages Grouped; Males same as Females; Male/Female Relativity same for 21-24; Male/Female Relativity varies by Age.]


Testing Assumptions: Macro Residual Analysis

Plot of all residuals tests selected error structure/link function

- Elliptical pattern is ideal
- Asymmetrical appearance suggests power of variance function is too low
- Two concentrations suggest two perils: split or use joint modeling
- Use crunched residuals for frequency

[Figures: "Normal Error Structure/Log Link (Studentized Standardized Deviance Residuals)" and "Gamma Error/Log Link (Studentized Standardized Deviance Residuals)", each against Transformed Fitted Value; "Crunched Residuals (Group Size: 72)" against Fitted Value; Studentized Standardized Deviance Residuals against Fitted Value.]


Testing Assumptions: Micro Residual Analysis

Examine largest residuals…

- Standardized deviance gives a measure of “fit” (performance)
- Cook’s deviance gives a measure of “influence”

             Influence: Small   Influence: Large
Fit: Good    OK                 OK
Fit: Poor    OK                 Problem

“Problem” points may require further investigation

[Figures: "Largest 100 Studentized Standardized Deviance Residuals" and "Largest 100 Cook's Deviance Residuals", each against Transformed Fitted Value.]


Testing Predictiveness: Sampling

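Training and Testing

- Data is split into Training Data (80%) and Test Data (20%).
- Build: Model Structure and Parameters, using the Training Data.
- Test: using the Test Data.
- OK: Done. Not OK: return to Build.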
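The 80%/20% split into training and test data might be sketched as follows. The helper name `split_train_test`, the fixed seed and the stand-in record list are assumptions for illustration; the slide does not say how the split is drawn.

```python
import random

def split_train_test(records, train_share=0.8, seed=2005):
    """Randomly assign records to a training set and a test set."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 1,000 policy records, represented by their indices.
train, test = split_train_test(range(1000))
```

The model structure and parameters would then be built on `train` only, with `test` held back to check predictiveness.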


Testing Predictiveness: Bootstrapping

Bootstrapping

[Diagram: Data is split into Training Data (80%) and Test Data (20%). Steps: Build (Model Structure), Test (Model Parameters). Outcomes: OK (Done), Not OK.]


Testing Predictiveness: Gains Curve

Gains Curves

- Order observations by fitted values (descending).
- Plot cumulative fitted against cumulative weight.
- High fitted values should correspond to high observed values.
- Gini Coefficient: The larger coefficient implies greater predictiveness.

[Figure: gains curve for Fitted Number of Claims.]
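A gains curve and Gini coefficient can be sketched as below. This version accumulates observed values in descending order of fitted value, which is one common construction and an assumption here; the Gini coefficient is taken as twice the area between the curve and the diagonal. The function names and data are hypothetical.

```python
def gains_curve(fitted, observed, weights):
    """Cumulative shares of weight (x) and observed response (y), ordered by
    fitted value from highest to lowest."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i], reverse=True)
    total_w, total_y = sum(weights), sum(observed)
    xs, ys = [0.0], [0.0]
    cum_w = cum_y = 0.0
    for i in order:
        cum_w += weights[i]
        cum_y += observed[i]
        xs.append(cum_w / total_w)
        ys.append(cum_y / total_y)
    return xs, ys

def gini(xs, ys):
    """Twice the area between the gains curve and the diagonal (trapezoid rule)."""
    area = sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
               for k in range(1, len(xs)))
    return 2.0 * area - 1.0

# Hypothetical claim counts with equal weights.
observed = [3.0, 0.0, 1.0, 6.0]
weights = [1.0, 1.0, 1.0, 1.0]
good_fit = [0.3, 0.1, 0.2, 0.4]   # ranks the risks in the same order as observed
poor_fit = [0.2, 0.4, 0.3, 0.1]   # ranks them in the opposite order

gini_good = gini(*gains_curve(good_fit, observed, weights))
gini_poor = gini(*gains_curve(poor_fit, observed, weights))
```

Here `gini_good` is 0.5 and `gini_poor` is -0.5: the model that ranks high observed values first earns the larger coefficient.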


Summary

Predictive models correct for methodological flaws associated with traditional approaches

- Exclude unsystematic effects
- Correct distributional bias
- Identify and model response correlation

GLM Building Blocks can be refined to find the signals for stochastic processes

- Link Functions can be adjusted to create nonlinear solutions
- Error Distributions are expanded to understand the relationship between the uncertainty and prediction
- Model Structures are flexible to reflect underlying process

GLM diagnostics aid better decision making


Questions?

