Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates
David Speights
Senior Research Statistician
HNC Insurance Solutions
Irvine, California
Session CPP-53
Presentation Outline
• Introduction to Neural Networks
• Introduction to Survival Analysis
• Neural Networks with Right Censoring
• Simulated Example
• Predicting Claim Duration
Introduction to Neural Networks: Motivation
• Complex Classification
– Character Recognition
– Voice Recognition
• Humans have no trouble with these tasks
– We can read even distorted documents
– We can recognize voices over poor telephone lines
• Neural networks attempt to model the human brain
Introduction to Neural Networks: Connection to Brain Functionality
• Brain – made up of billions of neurons sending signals to the body and each other
• Neural Networks – collection of “neurons” which send “signals” to produce an output
Introduction to Neural Networks: Common Representation
[Figure: network diagram. P predictors (inputs) X1, X2, ..., XP feed 1 hidden layer with M neurons (1, 2, ..., M), which feeds 1 output Y.]
Introduction to Neural Networks: Architecture of the ith Neuron
• Represents a neuron in the brain
• Inputs: X1, X2, ..., Xp
• $O = b_{i0} + b_{i1}X_1 + \dots + b_{ip}X_p$
• Output: s(O)
• s is a function taking values in the interval (0,1), representing the strength of the output
[Figure: Activation Function. s (vertical axis, 0 to 1) plotted against O.]
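A single neuron can be sketched in a few lines. This is only an illustration: it assumes s is the logistic function (the slide says only that s takes values in (0,1)), and the names `logistic` and `neuron_output` are made up for the example.

```python
import math

def logistic(o):
    """Logistic activation (an assumed choice of s): maps any real O into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-o))

def neuron_output(b, x):
    """Output of one neuron: s(b[0] + b[1]*x[0] + ... + b[p]*x[p-1])."""
    o = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
    return logistic(o)

# Hypothetical weights and inputs chosen so that O = 0 + 1*2 - 1*2 = 0.
print(neuron_output([0.0, 1.0, -1.0], [2.0, 2.0]))  # 0.5
```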
Introduction to Neural Networks: Connection to Multiple Regression
• Similarities
– Both describe relationships between variables
– Both can create predictions
• Differences
– Function describing the relationships is more complex
– Response variables are typically called outputs
– Predictor variables are typically called inputs
– Estimating the parameters is usually called training
Introduction to Neural Networks: Functional Representation
Y = f(X1, …, Xp) + error
• Multiple Linear Regression
– f() = linear combination of regressors
– Forced to model only specified relationships
• Neural Network
– f() = nonlinear combination of regressors
– Can deal with nonlinearities and interactions without special designation
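Introduction to Neural Networks: Functional Specification
• For a neural network f() is written
$$f(X_1, \dots, X_p) = g\Big(\alpha_0 + \sum_{j=1}^{M} \alpha_j \, s\Big(\beta_{j0} + \sum_{k=1}^{p} \beta_{jk} X_k\Big)\Big)$$
• The inner term $\beta_{j0} + \sum_{k=1}^{p} \beta_{jk} X_k$ is a multiple linear regression equation
• Here g and s are transformation functions specified in advance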
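The functional form of a one-hidden-layer network can be evaluated directly. The sketch below assumes s is the logistic function and g is the identity unless another function is passed in. The names `alpha` (output weights), `beta` (hidden-layer weights) and `network` are illustrative, not taken from the slides.

```python
import math

def s(o):
    """Assumed activation: logistic function."""
    return 1.0 / (1.0 + math.exp(-o))

def network(x, alpha, beta, g=lambda z: z):
    """Evaluate g(alpha_0 + sum_j alpha_j * s(beta_j0 + sum_k beta_jk * x_k)).

    alpha: [alpha_0, alpha_1, ..., alpha_M]
    beta:  M rows, each [beta_j0, beta_j1, ..., beta_jp]
    """
    hidden = [s(row[0] + sum(b * xk for b, xk in zip(row[1:], x))) for row in beta]
    return g(alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden)))

# Hypothetical weights: two identical hidden neurons with opposite output
# weights cancel, leaving only alpha_0.
print(network([0.3], alpha=[1.0, 2.0, -2.0], beta=[[0.0, 1.0], [0.0, 1.0]]))  # 1.0
```

With M = 0 hidden neurons the function reduces to g(alpha_0); each added neuron contributes one more nonlinear term built from a linear combination of the inputs.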
Introduction to Survival Analysis: What is Survival Analysis
• Used to model time-to-event data (example: time until a claim ends)
• Usually characterized by (1) right-skewed data, (2) multiplicative error structure, and (3) right censoring
• Common in cancer clinical trials, component failure analysis, and AIDS data analysis, among other examples
Introduction to Survival Analysis: Notation
• $T_1, \dots, T_n$ - independent failure times with distribution F and density function f
• $C_1, \dots, C_n$ - independent censoring times with distribution G and density function g
• $Y_i = \min(T_i, C_i)$ - observed time
• $\delta_i = I(Y_i = T_i)$ - censoring indicator (1 if the failure time is observed, 0 if it is censored)
• $X_i = (X_{i1}, \dots, X_{ip})$ - vector of known covariates associated with the ith individual
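As a small illustration of this notation, the observed time and censoring indicator can be built from a failure time and a censoring time as below. The function name `observe` and the numeric values are hypothetical.

```python
def observe(t, c):
    """Return (y, delta): y = min(t, c); delta = 1 if the failure time is observed."""
    y = min(t, c)
    delta = 1 if t <= c else 0
    return y, delta

# Hypothetical failure and censoring times for three individuals.
failure_times = [2.0, 5.0, 1.5]
censoring_times = [3.0, 4.0, 6.0]
observed = [observe(t, c) for t, c in zip(failure_times, censoring_times)]
print(observed)  # [(2.0, 1), (4.0, 0), (1.5, 1)]
```

In real data only the pair (y, delta) and the covariates are available; the second individual's true failure time of 5.0 would never be seen.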
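Introduction to Survival Analysis: Likelihood Analysis (Parametric Models)
• $(Y_i, \delta_i, X_i)$, $i = 1, \dots, n$, independent observations
• Likelihood written
$$L(\theta) = \prod_{i=1}^{n} f(Y_i, \delta_i \mid X_i)$$
• $f(Y, \delta \mid X) = [f_\theta(Y \mid X)(1 - G(Y \mid X))]^{\delta} \, [g(Y \mid X)(1 - F_\theta(Y \mid X))]^{1-\delta}$
$$L(\theta) = L_2 \prod_{i=1}^{n} f_\theta(Y_i \mid X_i)^{\delta_i} \, (1 - F_\theta(Y_i \mid X_i))^{1-\delta_i}$$
• Here $L_2$ does not depend on $\theta$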
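To make the censored likelihood concrete, here is a minimal sketch for one simple parametric case that is not in the slides: an exponential failure-time model with no covariates, where the density is rate * exp(-rate * y) and the survival function 1 - F is exp(-rate * y). Observed times contribute the log density and censored times contribute the log survival function. The data values are hypothetical.

```python
import math

def censored_loglik_exponential(rate, y, delta):
    """Log-likelihood (dropping the L2 factor) for an exponential model."""
    total = 0.0
    for yi, di in zip(y, delta):
        log_density = math.log(rate) - rate * yi   # log f(y)
        log_survival = -rate * yi                  # log(1 - F(y))
        total += di * log_density + (1 - di) * log_survival
    return total

y = [2.0, 4.0, 1.5]   # observed times (hypothetical)
delta = [1, 0, 1]     # 1 = failure observed, 0 = censored

# For this model the maximizer has a closed form: events / total observed time.
rate_hat = sum(delta) / sum(y)
print(rate_hat, censored_loglik_exponential(rate_hat, y, delta))
```

The censored observation still carries information: it adds 4.0 to the total time at risk without adding an event, which lowers the estimated rate.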
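Neural Networks with Right Censoring: Model Specification
• Neural Network Model
$$\log(T) = \mathrm{mlp}(x, \theta) + \sigma\varepsilon, \qquad \mathrm{mlp}(x, \theta) = \alpha_0 + \sum_{j=1}^{M} \alpha_j \, s(\beta_j' x)$$
• Here $\varepsilon$ has distribution function F and density f
• $\theta = \{\alpha_0, \dots, \alpha_M, \beta_1, \dots, \beta_M\}$
• The likelihood is
$$L(\theta, \sigma) = L_2 \prod_{i=1}^{n} \Big[\frac{1}{\sigma} f\Big(\frac{\log(Y_i) - \mathrm{mlp}(x_i, \theta)}{\sigma}\Big)\Big]^{\delta_i} \Big[1 - F\Big(\frac{\log(Y_i) - \mathrm{mlp}(x_i, \theta)}{\sigma}\Big)\Big]^{1-\delta_i}$$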
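The censored likelihood for the network model can be sketched as a negative log-likelihood function. This illustration assumes standard normal errors, a logistic s, and a covariate vector augmented with a leading 1 so that each hidden neuron has an intercept. The names `mlp` and `neg_loglik` follow the slide notation loosely but are otherwise hypothetical, and the constant factor L2 is dropped.

```python
import math

def s(o):
    """Assumed activation: logistic function."""
    return 1.0 / (1.0 + math.exp(-o))

def mlp(x, alpha, beta):
    """alpha_0 + sum_j alpha_j * s(beta_j' (1, x)); each beta row is [intercept, slopes...]."""
    hidden = [s(row[0] + sum(b * xk for b, xk in zip(row[1:], x))) for row in beta]
    return alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden))

def neg_loglik(y, delta, xs, alpha, beta, sigma):
    """Negative log-likelihood under normal errors (assumed), without L2."""
    total = 0.0
    for yi, di, xi in zip(y, delta, xs):
        z = (math.log(yi) - mlp(xi, alpha, beta)) / sigma
        if di == 1:
            # observed: -log[(1/sigma) * phi(z)]
            total += math.log(sigma) + 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)
        else:
            # censored: -log[1 - Phi(z)], with 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
            total -= math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total

# Hypothetical data: one observed and one censored time, a network with M = 0.
print(neg_loglik([1.0, 1.0], [1, 0], [[0.0], [0.0]], alpha=[0.0], beta=[], sigma=1.0))
```

For very large standardized residuals the censored term can underflow; a production version would need a numerically stable log survival function.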
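Neural Networks with Right Censoring: Fitting Neural Networks without Censoring
• $\theta$ estimated by minimizing squared error
$$C(\theta) = \sum_{i=1}^{n} C_i(\theta, x_i) = \sum_{i=1}^{n} (\log(Y_i) - \mathrm{mlp}(x_i, \theta))^2$$
$$L(\theta, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(\log(Y_i) - \mathrm{mlp}(x_i, \theta))^2}{2\sigma^2}\Big)$$
$$\log(L(\theta, \sigma)) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (\log(Y_i) - \mathrm{mlp}(x_i, \theta))^2$$
• If $\varepsilon$ is normal, minimizing squared error is the same as maximizing the likelihood.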
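Neural Networks with Right Censoring: Fitting Neural Networks without Censoring
• Gradient descent algorithm for estimating $\theta$
$$\theta_{j:i} = \theta_{j:i-1} - \eta \, \frac{\partial}{\partial\theta} C_i(\theta_{j:i-1}, x_i)$$
• Algorithm updated at each observation
• $\eta$ is known as the learning rate
• $\theta_{j:0} = \theta_{j-1:n}$ (pass j through the data starts where pass j-1 ended)
• Known as back-propagation algorithm
• To generalize to right-censored data, replace C() with the negative log-likelihood for censored neural networks.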
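The per-observation update can be sketched for the uncensored squared-error case as below. This is an illustration under assumptions: s is logistic, the gradients are written out by hand for a single hidden layer, and the data, starting weights and learning rate are hypothetical. The list `targets` stands in for log(Y_i).

```python
import math

def s(o):
    return 1.0 / (1.0 + math.exp(-o))

def forward(x, alpha, beta):
    """Return the prediction and the hidden-layer outputs."""
    hidden = [s(row[0] + sum(b * xk for b, xk in zip(row[1:], x))) for row in beta]
    return alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden)), hidden

def sgd_pass(xs, targets, alpha, beta, eta):
    """One pass through the data, updating the weights after each observation."""
    for x, t in zip(xs, targets):
        pred, hidden = forward(x, alpha, beta)
        r = t - pred
        # Gradients of C_i = (t - pred)^2, all computed from the pre-update weights.
        grad_alpha = [-2.0 * r] + [-2.0 * r * h for h in hidden]
        grad_beta = [[-2.0 * r * a * h * (1.0 - h) * xk for xk in [1.0] + list(x)]
                     for a, h in zip(alpha[1:], hidden)]
        for j in range(len(alpha)):
            alpha[j] -= eta * grad_alpha[j]
        for j, row in enumerate(beta):
            for k in range(len(row)):
                row[k] -= eta * grad_beta[j][k]

def cost(xs, targets, alpha, beta):
    return sum((t - forward(x, alpha, beta)[0]) ** 2 for x, t in zip(xs, targets))

xs = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]   # hypothetical inputs
targets = [x[0] ** 2 for x in xs]            # hypothetical log durations
alpha = [0.0, 0.1, -0.1]                     # hypothetical starting weights
beta = [[0.1, 0.5], [-0.1, -0.5]]
initial_cost = cost(xs, targets, alpha, beta)
for _ in range(200):                         # each pass starts from the previous pass's weights
    sgd_pass(xs, targets, alpha, beta, eta=0.05)
final_cost = cost(xs, targets, alpha, beta)
print(initial_cost, final_cost)
```

Because the weights are updated in place, the rule that each pass starts where the previous one ended holds automatically.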
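Neural Networks with Right Censoring: Fitting Neural Networks with Censoring
• Step 1 - Estimating $\theta$
– Fix $\sigma$ and pass through data once using
$$\theta_{j:i} = \theta_{j:i-1} - \eta \, \frac{\partial}{\partial\theta}\big[-\log L_i(\theta_{j:i-1}, \sigma_{j-1})\big]$$
– Here $L_i$ is the likelihood contribution of observation i
• Step 2 - Estimating $\sigma$
– Fix $\theta$ at end of pass through data
– Iterate until $|\sigma_j - \sigma_{j-1}|$ falls below a small tolerance using Newton-Raphson algorithm
$$\sigma_j = \sigma_{j-1} - \Big[\frac{\partial^2 \log L(\theta_j, \sigma_{j-1})}{\partial\sigma^2}\Big]^{-1} \frac{\partial \log L(\theta_j, \sigma_{j-1})}{\partial\sigma}$$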
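Step 2 can be sketched as a one-dimensional Newton-Raphson iteration on the scale parameter. This illustration assumes standard normal errors and takes the residuals log(Y_i) - mlp(x_i, theta) as fixed inputs, since theta is held fixed in this step. As a shortcut for brevity it uses central finite differences instead of analytic derivatives. The function names, residual values and starting value are hypothetical.

```python
import math

def neg_loglik_sigma(sigma, residuals, delta):
    """Negative log-likelihood in sigma for fixed residuals (normal errors assumed)."""
    total = 0.0
    for r, d in zip(residuals, delta):
        z = r / sigma
        if d == 1:   # observed: -log[(1/sigma) * phi(z)]
            total += math.log(sigma) + 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)
        else:        # censored: -log[1 - Phi(z)]
            total -= math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total

def newton_sigma(sigma, residuals, delta, tol=1e-6, h=1e-4, max_iter=50):
    """Newton-Raphson on sigma; stops when successive values differ by less than tol."""
    for _ in range(max_iter):
        f_minus = neg_loglik_sigma(sigma - h, residuals, delta)
        f_mid = neg_loglik_sigma(sigma, residuals, delta)
        f_plus = neg_loglik_sigma(sigma + h, residuals, delta)
        first = (f_plus - f_minus) / (2.0 * h)               # approximate first derivative
        second = (f_plus - 2.0 * f_mid + f_minus) / (h * h)  # approximate second derivative
        new_sigma = sigma - first / second
        if abs(new_sigma - sigma) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma

residuals = [0.5, -0.3, 0.8, -0.6, 0.2, -0.1]   # hypothetical fixed residuals
delta = [1, 1, 1, 1, 0, 0]                      # last two observations censored
sigma_hat = newton_sigma(0.5, residuals, delta)
print(sigma_hat)
```

Newton-Raphson has no safeguard against stepping to a non-positive value, so a reasonable starting point matters; the value 0.5 here was picked by hand for this data.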
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring
• With highly parameterized neural networks we risk overfitting
• We need to design the fitting procedure so that the fit generalizes to data not used in training
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring
• The negative of the likelihood is calculated on both sets of data at the same time.
[Figure: two panels, 75% Training Data and 25% Testing Data, each plotting Negative Likelihood against Training Cycles; annotation: Parameter Estimates.]
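One common way to use such a split is early stopping: keep the parameter estimates from the training cycle at which the negative likelihood on the testing data is lowest. Whether the slides select the estimates exactly this way is an assumption; the helper names and loss values below are hypothetical.

```python
import random

def split_train_test(records, train_fraction=0.75, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def best_cycle(test_losses):
    """Index of the training cycle with the lowest testing-set loss."""
    return min(range(len(test_losses)), key=lambda k: test_losses[k])

train, test = split_train_test(range(100))
test_losses = [5.0, 3.2, 2.9, 3.1, 3.6]   # hypothetical negative likelihood per cycle
print(len(train), len(test), best_cycle(test_losses))  # 75 25 2
```

In this made-up sequence the testing loss starts rising after cycle 2 even if the training loss keeps falling, which is the usual sign of overfitting.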
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring
• Potential drawbacks to neural networks
– Hard to tell the individual effects of each predictor variable on the response
– Can have poor extrapolation properties
• Potential gains from neural networks
– Can reduce preliminary analysis in modeling
  • Discovery of interactions and nonlinear relationships becomes automatic
– Can increase predictive power of models
Simulated Example
• True Time Model: $\log(t) = x^2 + 0.5\varepsilon$
• Censoring Model: $\log(c) = 0.25 + x^2 + 0.5\varepsilon$
• $x \sim U(-3, 3)$
• $\varepsilon \sim N(0, 1)$
• Censored if c < t
• ~35% censoring
• 3 node neural network fit
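The simulation setup can be reproduced with a short sketch. It assumes the error terms in the true time model and the censoring model are independent standard normal draws, which the slides do not state explicitly. Under that assumption the censoring probability is Phi(-0.25 / sqrt(0.5)), about 0.36, in line with the roughly 35% quoted. The function name `simulate` and the seed are illustrative.

```python
import math
import random

def simulate(n, seed=0):
    """Generate (x, y, delta) triples from the simulated-example models."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        log_t = x ** 2 + 0.5 * rng.gauss(0.0, 1.0)          # true time model
        log_c = 0.25 + x ** 2 + 0.5 * rng.gauss(0.0, 1.0)   # censoring model
        delta = 1 if log_t <= log_c else 0                  # censored if c < t
        y = math.exp(min(log_t, log_c))                     # observed time
        data.append((x, y, delta))
    return data

data = simulate(20000)
censored_fraction = sum(1 - d for _, _, d in data) / len(data)
print(round(censored_fraction, 3))
```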
[Figure: log Prediction/Actual (vertical axis, -2 to 10) plotted against X (horizontal axis, -3 to 3).]
• Scatter points are true times versus x
• Solid line represents NN fit to data
Predicting Claim Duration
• Predictor Variables
– NCCI Codes
  • Body Part Code
  • Injury Type
  • Nature of Injury
  • Industry Class Code
– Demographic Information
  • Age
  • Gender
  • Weekly Wage
  • Zip Code
• Response Variable
– Time from report until the claim is closed
Predicting Claim Duration
• Ratio of prediction to actual duration on log10 scale
• Difficult to represent open claim results, since the actual duration of an open claim is not yet known
[Figure: two panels, Open Claim and Closed Claim.]
Conclusions
• Provides an intuitive method to address right-censored data with a neural network
• Allows for a more flexible mean function
• Can be used in many time-to-event data situations