Date post: 17-Jan-2016
Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. [email protected] www.data-mines.com
Transcript
Page 1: Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.

Dimension Reduction in Workers Compensation

CAS Predictive Modeling Seminar
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
[email protected]
www.data-mines.com

Page 2:

Objectives
• Answer the questions: What is dimension reduction, and why use it?
• Introduce key methods of dimension reduction
• Illustrate with examples in Workers Compensation
• There will be some formulas, but the emphasis is on insight into the basic mechanisms of the procedures

Page 3:

Introduction
• “How do mere observations become data for analysis?”
• “Specific variable values are never immutable characteristics of the data”
  • Jacoby, Data Theory and Dimension Analysis, Sage Publications
• Many of the dimension reduction/measurement techniques originated in the social sciences and dealt with how to create scales from responses on attitudinal and opinion surveys

Page 4:

Unsupervised learning
• Dimension reduction methods are generally unsupervised learning
• Supervised learning
  • A dependent or target variable
• Unsupervised learning
  • No target variable
  • Group like variables or like records together

Page 5:

The Data
• BLS economic indexes
  • Components of inflation
  • Employment data
  • Health insurance inflation
• Texas Department of Insurance closed claim data for 2002 and 2003
  • Employment-related injury
  • Excludes small claims
  • About 1,800 records

Page 6:

What is a dimension?

• Jacoby – The number of separate and interesting sources of variation

• In many studies each variable is a dimension

• However, we can also view each record in a database as a dimension

Page 7:

Dimensions

Year   Medical Care   Medical Services   Transportation   Electricity
1980   $74.90         $74.80             $83.10           $26.70
1981   82.90          82.80              93.20            31.55
1982   92.50          92.60              97.00            36.01
1983   100.60         100.70             99.30            37.18
1984   106.80         106.70             103.70           38.60
1985   113.50         113.20             106.40           38.98
1986   122.00         121.90             102.30           40.22
1987   130.10         130.00             105.40           40.02
1988   138.60         138.30             108.70           40.20
1989   149.30         148.90             114.10           40.83
1990   162.80         162.70             120.50           41.66

Page 8:

The Two Major Categories of Dimension Reduction

• Variable reduction
  • Factor Analysis
  • Principal Components Analysis
• Record reduction
  • Clustering
• Other methods tend to be developments on these

Page 9:

Principal Components Analysis

• A form of dimension (variable) reduction
• Suppose we want to combine all the information related to the “inflation” dimension of insurance costs
  • Medical care costs
  • Employment (wage) costs
  • Other
    • Energy
    • Transportation
    • Services

Page 10:

Principal Components

• These variables are correlated but not perfectly correlated

• We replace many variables with a weighted sum of the variables

• These are then used as independent variables in a predictive model
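
As a minimal sketch of the weighted-sum idea, the Python snippet below builds a first principal component from the four index series shown on the Dimensions slide. The use of NumPy, the standardization step, and the variable names are assumptions made for illustration; they are not the presenter's code.

```python
import numpy as np

# Index values for 1980-1990 from the "Dimensions" slide
# (columns: Medical Care, Medical Services, Transportation, Electricity).
X = np.array([
    [74.90, 74.80, 83.10, 26.70],
    [82.90, 82.80, 93.20, 31.55],
    [92.50, 92.60, 97.00, 36.01],
    [100.60, 100.70, 99.30, 37.18],
    [106.80, 106.70, 103.70, 38.60],
    [113.50, 113.20, 106.40, 38.98],
    [122.00, 121.90, 102.30, 40.22],
    [130.10, 130.00, 105.40, 40.02],
    [138.60, 138.30, 108.70, 40.20],
    [149.30, 148.90, 114.10, 40.83],
    [162.80, 162.70, 120.50, 41.66],
])

# Standardize each column so the analysis works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Eigenvectors of R supply the weights; eigh returns eigenvalues ascending.
eigenvalues, eigenvectors = np.linalg.eigh(R)
weights = eigenvectors[:, -1]   # weights of the first (largest) component
score = Z @ weights             # one weighted sum per year

print("weights:", np.round(weights, 3))
print("share of variance:", round(eigenvalues[-1] / X.shape[1], 3))
```

The `score` vector is the single series that could stand in for the four correlated indexes as a predictor. The sign of the weights is arbitrary, so the score may come out negated.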

Page 11:

Factor Analysis: A Latent Factor

[Diagram: the latent factor "Social Inflation" linked to three observed variables: litigation rates, # procedures, and index of tort climate]

Page 12:

Factor/Principal Components Analysis

• Linear methods – use linear correlation matrix

• Correlation matrix decomposed to find a smaller number of factors that are related to the same underlying drivers

• Highly correlated variables tend to have high loadings on the same factor

Page 13:

Factor/Principal Components Analysis

                   Medical Care   Medical Services   Transportation   Electricity   Utility   Fuel Oil   Gas     Bread
Medical Care       1.000
Medical Services   1.000          1.000
Transportation     0.993          0.992              1.000
Electricity        0.888          0.884              0.910            1.000
Utility            0.872          0.873              0.875            0.771         1.000
Fuel Oil           0.448          0.451              0.468            0.281         0.704     1.000
Gas                0.586          0.592              0.601            0.402         0.752     0.926      1.000
Bread              0.983          0.983              0.975            0.844         0.847     0.459      0.595   1.000

Page 14:

Factor/Principal Components Analysis

• Uses eigenvectors and eigenvalues
• R is the correlation matrix, V the matrix of eigenvectors, Λ the diagonal matrix of eigenvalues

$$V^{\mathsf{T}} R V = \Lambda$$
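
The decomposition can be checked numerically. The sketch below is illustrative only: it assumes NumPy and uses a three-variable subset (Medical Care, Transportation, Electricity) of the correlation table from the preceding slide.

```python
import numpy as np

# Correlations among Medical Care, Transportation, and Electricity,
# copied from the correlation table on the preceding slide.
R = np.array([
    [1.000, 0.993, 0.888],
    [0.993, 1.000, 0.910],
    [0.888, 0.910, 1.000],
])

# eigh is intended for symmetric matrices; eigenvalues come back ascending.
lam, V = np.linalg.eigh(R)

# V' R V should be diagonal, with the eigenvalues on the diagonal.
Lambda = V.T @ R @ V
print(np.round(Lambda, 6))
print("sum of eigenvalues:", round(lam.sum(), 6))
```

Because R is a correlation matrix, its eigenvalues sum to the number of variables, which is what makes "eigenvalue / n" a proportion of variance.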

Page 15:

Inflation Data

Component Matrix (a)

                  Component
                  1        2
Medical Care      .986     -.086
MedicalServices   .986     -.081
Transportation    .990     -.073
Electricity       .895     -.205
Utility           .877     .303
Fuel Oil          .551     .761
Gas               .709     .639
Bread             .973     -.078
Eggs              .587     .337
Apples            .766     .077
Coffee            .457     -.644
Employment        .967     -.202
UEP               -.695    .521
EmpCost           .986     -.048

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Page 16:

Factor Rotation
• Find simpler, more easily interpretable factors
• Use notion of factor complexity

$$q_i = \frac{1}{r} \sum_{j=1}^{r} \left( b_{ij}^2 - \overline{b_i^2} \right)^2$$

where $r$ is the number of factors, $b_{ij}$ is the loading of variable $i$ on factor $j$, and $\overline{b_i^2}$ is the mean of the squared loadings on the factors for row $i$

Page 17:

Factor Rotation

• Quartimax Rotation
  • Maximize q
• Varimax Rotation
  • Maximizes the variance of squared loadings for each factor rather than for each variable
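
To make the varimax idea concrete, here is a minimal sketch of a widely used SVD-based iteration for the raw varimax criterion, applied to the unrotated loadings from the Inflation Data slide. It assumes NumPy, and it omits the Kaiser normalization that the statistical package output in this talk reports, so its result need not match that output exactly; column order and signs may also differ. The function name `varimax` and its parameters are illustrative.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix toward the raw varimax criterion.

    Returns (rotated_loadings, rotation_matrix).
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        G = L.T @ (LR ** 3 - LR * (LR ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                 # orthogonal matrix closest to G
        current = s.sum()
        if previous > 0 and current < previous * (1 + tol):
            break
        previous = current
    return L @ R, R

# Unrotated component loadings from the "Inflation Data" slide.
unrotated = np.array([
    [0.986, -0.086], [0.986, -0.081], [0.990, -0.073], [0.895, -0.205],
    [0.877, 0.303], [0.551, 0.761], [0.709, 0.639], [0.973, -0.078],
    [0.587, 0.337], [0.766, 0.077], [0.457, -0.644], [0.967, -0.202],
    [-0.695, 0.521], [0.986, -0.048],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
```

An orthogonal rotation leaves each variable's communality (the row sum of squared loadings) unchanged; only the split of that variance between the two factors moves.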

Page 18:

Varimax Rotation

Rotated Component Matrix (a)

                  Component
                  1        2
Medical Care      .834     .533
MedicalServices   .831     .537
Transportation    .829     .546
Electricity       .835     .383
Utility           .510     .775
Fuel Oil          -.028    .939
Gas               .172     .939
Bread             .818     .532
Eggs              .260     .625
Apples            .560     .529
Coffee            .755     -.232
Employment        .890     .429
UEP               -.869    -.011
EmpCost           .811     .563

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Page 19:

Plot of Loadings on Factors

Page 20:

How Many Factors to Keep?
• Eigenvalues provide information on how much variance is explained
• Proportion explained by a given component = corresponding eigenvalue / n (n = number of variables)
• Use a scree plot
• Rule of thumb: keep all factors with eigenvalues > 1
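
As a small worked example of these rules, the sketch below uses the three eigenvalues reported for the transformed variables on the Variable Correlations slide later in the talk. The variable names are illustrative.

```python
# Eigenvalues of the transformed-variable correlation matrix reported
# on the "Variable Correlations" slide (three variables, so n = 3).
eigenvalues = [2.138, 0.590, 0.272]
n = len(eigenvalues)

# Proportion of variance explained by each component, and running total.
proportions = [ev / n for ev in eigenvalues]
cumulative = [sum(proportions[: i + 1]) for i in range(n)]

# Rule of thumb: keep components whose eigenvalue exceeds 1.
kept = [ev for ev in eigenvalues if ev > 1]

for ev, p, c in zip(eigenvalues, proportions, cumulative):
    print(f"eigenvalue {ev:.3f}  proportion {p:.3f}  cumulative {c:.3f}")
print("factors kept:", len(kept))
```

Here the first component alone accounts for roughly 71% of the variance and is the only one that passes the eigenvalue > 1 rule.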

Page 21:

Page 22:

WC Severity vs Factor 1

Page 23:

WC Severity vs Factor 2

Page 24:

What About Categorical Data?
• Factor analysis is performed on numeric data
• You could code data as binary dummy variables
• Categorical variables from Texas data
  • Injury
  • Cause of loss
  • Business class
  • Health insurance (Y/N)
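
Binary dummy coding can be sketched in a few lines of plain Python. The injury labels below are hypothetical placeholders, since the actual Texas category codes are not shown in the slides, and the helper name `dummy_code` is made up for this illustration.

```python
def dummy_code(values):
    """Return (levels, rows); each row is a list of 0/1 indicators."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical injury labels (not the real Texas codes).
injury = ["back", "burn", "back", "fracture"]
levels, rows = dummy_code(injury)

print(levels)
for label, row in zip(injury, rows):
    print(label, row)
```

Each record gets exactly one 1 across the indicator columns, so the resulting numeric columns can be fed to a method that expects numbers. With many categories the number of columns grows quickly, which is part of the motivation for other treatments of categorical data.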

Page 25:

Optimal Scaling

• A method of dealing with categorical variables
• Can be used to model nonlinear relationships
• Uses regression to
  • Assign numbers to categories
  • Fit regression coefficients
  • Y*=f(X*)
• In each round of fitting, a new Y* and X* are created

Page 26:

Variable Correlations

Correlations, Original Variables

                 injury   cause    Business class
injury           1.000    -.019    .049
cause            -.019    1.000    .105
Business class   .049     .105     1.000
Dimension        1        2        3
Eigenvalue       1.109    1.014    .877

Correlations, Transformed Variables (Dimension: 1)

                 injury   cause    Business class
injury           1.000    .710     .433
cause            .710     1.000    .552
Business class   .433     .552     1.000
Dimension        1        2        3
Eigenvalue       2.138    .590     .272

Page 27:

Visualizations of Scaled Variables

Page 28:

Can we use scaled variables in prediction?

Average Paid Loss by Ntile of Optimal Score

Ntile   First Score   Second Score
1       294,305       163,736
2       270,763       188,733
3       233,056       206,497
4       151,455       261,773
5       147,751       277,389

Page 29:

Tree Using Optimal Scaling Scores

Page 30:

Tree for Subrogation

Page 31:

Row Reduction: Cluster Analysis
• Records are grouped in categories that have similar values on the variables
• Examples
  • Marketing: People with similar values on demographic variables (e.g., age, gender, income) may be grouped together for marketing
  • Text analysis: Use words that tend to occur together to classify documents
  • Fraud modeling
  • Territory definition
• Note: no dependent variable used in analysis

Page 32:

Clustering
• Common methods: k-means, hierarchical

• No dependent variable – records are grouped into classes with similar values on the variables

• Start with a measure of similarity or dissimilarity

• Maximize dissimilarity between members of different clusters
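
The k-means method named above can be sketched as follows. This is a bare-bones illustration that assumes NumPy, squared Euclidean distance, and caller-supplied starting centers; the six records are made up and are not the Texas claim data.

```python
import numpy as np

def k_means(X, initial_centers, max_iter=100):
    """Plain k-means: assign records to the nearest center, then re-average."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(initial_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every record to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(len(centers)):
            members = X[labels == c]
            if len(members) > 0:   # leave an empty cluster's center alone
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Made-up records with two obvious groups.
X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 8.0], [8.5, 7.5], [9.0, 8.2]]
labels, centers = k_means(X, initial_centers=[X[0], X[3]])

print(labels)
print(np.round(centers, 2))
```

No target variable appears anywhere in the procedure: records are grouped purely on how close their variable values are. In practice the variables would usually be put on comparable scales first, since distance is dominated by whichever variable has the largest range.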

Page 33:

Dissimilarity (Distance) Measure – Continuous Variables

• Euclidean Distance

$$d_{ij} = \left( \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right)^{1/2}, \quad i, j = \text{records},\ k = \text{variable}$$

• Manhattan Distance

$$d_{ij} = \sum_{k=1}^{m} \left| x_{ik} - x_{jk} \right|$$
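
Both measures are short enough to write out directly. The sketch below uses plain Python; the function names and the two example records are illustrative.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared differences across variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute differences across variables."""
    return sum(abs(a - b) for a, b in zip(x, y))

record_i = [0.0, 0.0]
record_j = [3.0, 4.0]

print(euclidean(record_i, record_j))  # 5.0
print(manhattan(record_i, record_j))  # 7.0
```

Squaring makes the Euclidean measure more sensitive to one large difference on a single variable than the Manhattan measure is.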

Page 34:

Binary Variables

                        Row Variable
                        1      0
Column Variable   1     a      b      a+b
                  0     c      d      c+d
                        a+c    b+d

Page 35:

Binary Variables

• Simple Matching

$$d_{ij} = \frac{b + c}{a + b + c + d}$$

• Rogers and Tanimoto

$$d_{ij} = \frac{2(b + c)}{(a + d) + 2(b + c)}$$
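
A short sketch of the two binary dissimilarities follows. It assumes the usual convention that a counts variables where both records are 1, d counts variables where both are 0, and b and c count the two kinds of mismatch; the function names and example vectors are illustrative.

```python
def binary_counts(x, y):
    """Count the four agreement/disagreement cells for two 0/1 vectors."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)  # both 1
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)  # mismatch
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)  # mismatch
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)  # both 0
    return a, b, c, d

def simple_matching(x, y):
    a, b, c, d = binary_counts(x, y)
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(x, y):
    a, b, c, d = binary_counts(x, y)
    return 2 * (b + c) / ((a + d) + 2 * (b + c))

x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]

print(binary_counts(x, y))    # (2, 1, 1, 1)
print(simple_matching(x, y))  # 0.4
print(rogers_tanimoto(x, y))
```

Rogers and Tanimoto give mismatches double weight, so for the same pair of records the result is never smaller than the simple matching value.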

Page 36:

Example: Texas Data

• Data from 2002 and 2003 closed claim database by Texas Ins Dept
• Only claims over a threshold included
• Variables used for clustering:
  • Report Lag
  • Settlement Lag
  • County (ranked by how often in data)
  • Injury
  • Cause of Loss
  • Business class

Page 37:

Results Using Only Numeric Variables

• Used Euclidean distance measure

Final Cluster Centers

                                                                    Cluster
                                                                    1        2        3
RANK of NCounty                                                     10.741   5.158    14.500
RANK of SumLoss                                                     25.155   7.342    53.000
RANK of numSuit                                                     11.204   4.553    14.000
age                                                                 40.67    42.26    63.00
Elapsed time between date of injury and date reported to insurer    233      8264     13893
Elapsed time between date of injury and date suit filed             391      7439     13843
Elapsed time between date of injury and date of trial               172      0        14627
BackInj                                                             .39      .00      .00
MultInj                                                             .35      .00      .00

Page 38:

Two Stage Clustering With Categorical Variables

• First compute dissimilarity measures

• Then get clusters

• Find optimum number of clusters

Page 39:

Page 40:

Loadings of Injuries on Cluster

Page 41:

Age and Cluster

Page 42:

County vs Cluster

Page 43:

Means of Financial Variables by Cluster

Average of Financial Variables by Cluster (Mean)

TwoStep Cluster Number   Paid loss       Total allocated loss adjustment expense
1                        257,111.7426    38,831.05
2                        78,186.5918     53,273.24
3                        263,851.2863    57,535.26
4                        174,739.1995    25,522.39
Total                    219,854.6705    38,853.73

Page 44:

Tying Things Together: Multidimensional Scaling

• A mathematical way to connect clustering and factor analysis

• Data can be decomposed into key row dimensions times a diagonal weight matrix times key column dimensions

$$\hat{X}_k = U_k D_k V_k^{\mathsf{T}}$$
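
One way to see this decomposition at work is through a singular value decomposition. The sketch below assumes NumPy and uses the first five years of the index table from the Dimensions slide; the helper `rank_k` is a made-up name for this illustration.

```python
import numpy as np

# First five years of the index table from the "Dimensions" slide
# (columns: Medical Care, Medical Services, Transportation, Electricity).
X = np.array([
    [74.90, 74.80, 83.10, 26.70],
    [82.90, 82.80, 93.20, 31.55],
    [92.50, 92.60, 97.00, 36.01],
    [100.60, 100.70, 99.30, 37.18],
    [106.80, 106.70, 103.70, 38.60],
])

# U holds row dimensions, d the diagonal weights, Vt the column dimensions.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

def rank_k(k):
    """Approximation of X built from the k largest singular values."""
    return U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]

X1 = rank_k(1)
error = np.linalg.norm(X - X1)   # Frobenius norm of what rank 1 misses

print("singular values:", np.round(d, 3))
print("rank-1 error:", round(error, 3))
print("full rank reproduces X:", np.allclose(rank_k(len(d)), X))
```

The columns of U describe the records (rows) and the rows of Vt describe the variables (columns), which is the sense in which one decomposition ties record reduction and variable reduction together.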

Page 45:

Modern dimension reduction
• Hidden layer in neural networks is like a nonlinear principal components analysis
• Projection Pursuit Regression – a nonlinear PCA
• Kohonen self-organizing maps – a kind of neural network that does clustering
• These can be understood as enhancements of factor analysis or clustering

Page 46:

Page 47:

Kohonen SOM for Fraud

[Figure: Kohonen map chart; axis ticks run from 1 to 16 and from S1 to S16, with legend bands 0-1, 1-2, 2-3, 3-4, 4-5]

Page 48:

Recommended References

• Hatcher, 1994, A Step-by-Step Approach for Using the SAS System for Factor Analysis and Structural Equation Modeling, SAS Publications

• Jacoby, 1991, Data Theory and Dimension Analysis, Sage Publications

• Kaufman and Rousseeuw, 1990, Finding Groups in Data, Wiley

• Kim and Mueller, 1978, Factor Analysis: Statistical Methods and Practical Issues, Sage Publications

Page 49:

Questions?

