Data Mining Techniques & Its Applications in Insurance
Society of Actuaries
San Francisco Spring Meeting
June 24 - 26, 2002
Lijia Guo, PhD, ASA, MAAA
University of Central Florida
Session 11L
SOA San Francisco Spring Meeting, June 24-26, 2002
Slide 2
Learning Objectives
Understanding the data mining process
Gaining insight into the actuarial
applications of data mining techniques
Exploring the prospects for applying data
mining techniques in your own practice
Slide 3
Agenda
Introduction
Data Mining Methods
Actuarial Applications
Conclusions & Questions
Slide 4
Introduction
Changes in Information Technology
Availability of large quantities of insurance
data
Mind your business by mining your data
Slide 5
What is Data Mining?
An information discovery process.
Prediction
-- Finding unknown values/relationships/patterns in a
large known database
Description
-- Interpreting a large database
Making crucial business decisions - turn the
newfound knowledge into actionable results
Slide 6
Why Use Data Mining?
Product development
Marketing
Analysis of Claims Distribution
Healthcare ALM
Fraud detection
Solvency analysis
Slide 7
Data Mining Methods
Classification
Regression
Clustering
Summarization
Dependency modeling
Deviation Detection
Slide 8
Data Mining Algorithms
Decision Trees (Breiman et al., 1984)
Logistic regression (Hosmer & Lemeshow, 1989)
Neural Networks (Bishop, 1995; Ripley, 1996)
Fuzzy Logic
Genetic Algorithms (Goldberg, 1989)
Bayesian analysis (Cheeseman et al., 1988)
Hybrid algorithms
Slide 9
Data Mining Algorithms
-- Decision Trees
What are decision trees
How decision trees work
Choosing variables
Grouping
Creating the leaf nodes of the tree
Strengths and weaknesses
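The "choosing variables" and "grouping" steps can be sketched as a search for the single split that leaves the purest groups. This is a minimal sketch only, not the algorithm used in the presentation: the use of Gini impurity, the toy records, and the field names are all illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, features, target):
    """Return (feature, value, weighted_gini) for the best binary split.

    Each candidate split sends rows with row[feature] == value to one
    group and all other rows to the other group.
    """
    best = None
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            left = [row[target] for row in rows if row[feature] == value]
            right = [row[target] for row in rows if row[feature] != value]
            if not left or not right:
                continue  # this split does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best

# Hypothetical toy records (not from the presentation).
rows = [
    {"gender": "F", "status": "retiree", "died": 0},
    {"gender": "M", "status": "retiree", "died": 0},
    {"gender": "F", "status": "disabled", "died": 1},
    {"gender": "M", "status": "disabled", "died": 1},
    {"gender": "F", "status": "retiree", "died": 0},
    {"gender": "M", "status": "disabled", "died": 1},
]
split = best_split(rows, ["gender", "status"], "died")
```

In this toy data the split on status separates the outcomes completely, so it is chosen over gender. A full tree would repeat the search inside each resulting group until a stopping rule creates the leaf nodes.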
Slide 10
Data Mining Algorithms
-- Neural Networks
What are Neural Networks
How Neural Networks work
Processing elements
Training
Predicting
Strengths and weaknesses
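The three ingredients named above (processing elements, training, predicting) can be illustrated with a single processing element. This is a minimal sketch under stated assumptions: it uses the classic perceptron learning rule with a step activation, and the training set (logical AND) is a hypothetical toy example, not data from the presentation.

```python
def predict(weights, bias, inputs):
    """One processing element: weighted sum followed by a step activation."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Perceptron learning rule: adjust weights after each misclassified sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
```

Practical networks connect many such elements in layers and use smooth activations with gradient-based training; the single element here only shows the train-then-predict cycle.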
Slide 11
Data Mining Algorithms
-- Hybrid Algorithms
Problems with standard algorithms
Advanced algorithms
Discovery-driven approaches
Mixture of algorithms
Slide 12
Data Mining: Knowledge Discovery Process
Data Acquisition
Data integration
Data exploration
Model building
Understanding your model
Post-mining analysis
Slide 13
Data Mining Process: Data Acquisition
Data acquisition
Getting your data
Data qualification issues
Data quality issues
Data derivation
Defining a study
Basic risk characteristics
Slide 14
Data Mining Process: Data Acquisition -- Case Study
SOA database for RP-2000 Mortality Tables
10,957,103 exposed life-years
Subset of the database that includes all lives
above age 70 (3,769,956 exposed life-years, 217,490 deaths)
Risk groups
Age, gender, participation status, union, pay type,
collar type, and annuity amount, etc.
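As a quick sanity check on the figures quoted for the above-70 subset, the crude (aggregate) mortality rate and the subset's share of the full database can be computed directly. This sketch assumes that "exp" on the slide means exposed life-years; the variable names are illustrative.

```python
# Figures quoted on the slide; "exp" is read as exposed life-years (an assumption).
total_exposure = 10_957_103   # exposed life-years in the full database
subset_exposure = 3_769_956   # exposed life-years above age 70
subset_deaths = 217_490       # deaths above age 70

# Crude mortality rate: deaths per exposed life-year, all risk groups pooled.
crude_rate = subset_deaths / subset_exposure

# Share of the full database represented by the above-70 subset.
exposure_share = subset_exposure / total_exposure

print(f"crude mortality rate above age 70: {crude_rate:.4f}")
print(f"share of total exposure: {exposure_share:.1%}")
```

The pooled rate is roughly 5.8%, on about a third of the total exposure. It hides the variation across risk groups that the rest of the case study sets out to explain.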
Slide 15
Data Mining Process: Data Acquisition
-- Case Study
Existing studies of advanced-age mortality
Smooth extension of the patterns
Families of curves - Gompertz law, etc.
All these approaches aim at explaining the age
pattern of mortality.
Mortality distribution varies among seniors
with different backgrounds
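For reference, the Gompertz law mentioned above models the force of mortality as growing exponentially with age, mu(x) = B * c^x, so that log(mu) is linear in age. The sketch below illustrates that property; the parameter values are hypothetical and are not fitted to any data in this study.

```python
def gompertz_hazard(age, b, c):
    """Force of mortality under the Gompertz law: mu(x) = b * c**x."""
    return b * c ** age

# Hypothetical parameters chosen only for illustration.
b_param, c_param = 0.00003, 1.10

ages = [70, 75, 80, 85, 90]
hazards = [gompertz_hazard(x, b_param, c_param) for x in ages]

# Under Gompertz, equal age steps multiply the hazard by the same factor
# (here c_param**5 for each 5-year step), whatever the starting age.
ratios = [later / earlier for earlier, later in zip(hazards, hazards[1:])]
```

A curve of this kind depends on age alone, which is why it cannot by itself capture differences among seniors with different backgrounds.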
Slide 16
Data Mining Process: Data Integration
To identify the factors that influence
mortality
To study the interaction of the risk factors
To gain the perspective on the importance
of these factors
Slide 17
Slide 18
Data Mining Process: Data Integration -- Case Study
A main effect exists for all six variables
considered
The risk factors differ in their degree of effect
The interaction of these factors
The importance of the factors
Slide 19
Data Mining Process: Data exploration
Decision tree algorithm
Analyze the influences and the importance of the mortality risk factors
Observations are grouped into several segments
Algorithm - SAS/Enterprise Miner Version 4.2 (2001)
Further study the interaction and the importance of the risk factors
Slide 20
Data Mining Process: Data Integration -- Case Study
Variable Importance Measure
Variable                Importance
Participation Status    1.00
Gender                  0.75
Annuity Size            0.43
Pay Type                0.21
Union                   0.18
Collar                  0.00
Slide 21
Data Mining Process: Data exploration
Slide 22
Data Mining Process: Data exploration
Slide 23
Data Mining Process: Model Building
Six risk groups:
Employees
Beneficiaries
Combined
Disabled
Male Retirees
Female Retirees
Logistic regression method
Slide 24
Data Mining Process: Model Building -- Case Study: Female Retiree
Slide 25
Data Mining Process: Model Building
-- Case Study: Female Retiree Group
Collar and Pay Type are two important
variables
An interaction between Collar and Pay
Type exists
Neither annuity size nor union is
picked up by the tree algorithm
Slide 26
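Data Mining Process: Model Building -- Case Study: Female Retiree Group
R-square for the regression is 0.95
$$\log\frac{p}{1-p} = \pm 17.97 \pm 0.26\,x \pm 0.00087\,x^{2} \pm C \pm PT \pm 0.046\,C \cdot PT$$
$$C = \begin{cases} 0.0047 & \text{mixed collar} \\ 0 & \text{blue collar} \\ 0 & \text{white collar} \end{cases}$$
$$PT = \begin{cases} 0 & \text{salaried pay type} \\ 0.051 & \text{hourly pay type} \\ 0.033 & \text{combined pay type} \end{cases}$$
where p is the mortality rate and x is the age
(the signs of the terms are not legible in the source, hence the ± at each position)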
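The fitted model on this slide is a logistic regression, log(p / (1 - p)), that is quadratic in age x, so a mortality rate is recovered by inverting the logit. The sketch below uses the age coefficients quoted on the slide. The signs are an assumption (negative intercept, positive age term, negative age-squared term), chosen because the other sign patterns give rates near 0 or 1, or rates that fall with age, over ages 70 to 95. The `adjustment` argument is a hypothetical stand-in for the collar and pay-type terms.

```python
import math

# Coefficient magnitudes are from the slide; the signs are an ASSUMPTION
# (see the paragraph above), not something stated in the source.
INTERCEPT = -17.97
AGE_COEF = 0.26
AGE_SQ_COEF = -0.00087

def mortality_rate(age, adjustment=0.0):
    """Invert log(p / (1 - p)) = eta to get p = 1 / (1 + exp(-eta)).

    `adjustment` is a hypothetical placeholder for the combined collar and
    pay-type terms; it defaults to zero.
    """
    eta = INTERCEPT + AGE_COEF * age + AGE_SQ_COEF * age ** 2 + adjustment
    return 1.0 / (1.0 + math.exp(-eta))

rates = {age: mortality_rate(age) for age in (70, 80, 90)}
```

Under these assumed signs the rate rises from under 2% at age 70 to roughly 17% at age 90, which is the general shape one would expect for retiree mortality; it should not be read as a reproduction of the author's results.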
Slide 27
Data Mining Process: Model Building -- Case Study: Female Retiree Group
Slide 28
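Data Mining Process: Model Building -- Case Study: Male Retiree Group
R-square for the regression is 0.92
$$\log\frac{p}{1-p} = \pm 14.57 \pm 0.20\,x \pm 0.00055\,x^{2} \pm S \pm U \pm U \cdot S$$
$$S = \begin{cases} 0.0074 & \text{small annuity} \\ 0.060 & \text{median annuity} \\ 0.044 & \text{large annuity} \end{cases}$$
$$U = \begin{cases} 0.040 & \text{combined} \\ 0.14 & \text{non-union member} \\ 0 & \text{union member} \end{cases}$$
where p is the mortality rate and x is the age
(the signs of the terms are not legible in the source, hence the ± at each position)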
Slide 29
Data Mining Process: SEMMA
Slide 30
Data Mining Process: Model Building -- Case Study: Male Retiree
Slide 31
Data Mining Process: Post-mining Analysis -- Case Study
Slide 32
Data Mining Process: Understanding Your Model -- Case Study
The male retirees' mortality model and the female
retirees' mortality model depend on different
variables
Mortality of the beneficiaries is determined by
gender, annuity size, pay type, and their interactions
The gender factor will play a much-reduced role
in determining the beneficiaries' mortality model
Slide 33
Data Mining Process: Post-mining
Analysis -- Case Study
Limited results on the mortality distribution for
ages above 95
As female demographics have changed over the past
three decades, variables such as annuity size and
union will play a more important role in
determining female mortality
Other risk factors such as education, lifestyle, smoking/non-smoking, etc.
Slide 34
Data Mining Process: Summary -- Case Study
Non-Gompertz (linear growth) between ages
70 and 85
Selection of the risk factors may influence
the quality of the mortality model
Mortality models vary with the most
important risk factor (participation
status, in this study) among all the other
variables
Slide 35
Data Mining Process:
-- Case Study in Claim Analysis
Basic risk characteristics
Top-down identification
Underlying statistical properties
Domain-specific constraints
Slide 36
Data Mining Process:
-- Case Study in ALM
Decision tree and DNF learning
Generative stochastic modeling
Probabilistic networks
Probabilistic rules
Hidden Markov model
Slide 37
Data Mining Process:
-- Applications in Healthcare
More productive managed care program
Pricing
Individual health insurance market
Recovery & prevention of fraudulent claims
Prescription drug cost management
Slide 38
Quiz on Data Mining
What is Data Mining?
What can data mining do?
What are data mining techniques?
What are the applications of data mining?
How can you practice data mining?
Slide 39
Summary
Overview of data mining techniques
Its application to actuarial practice
Future developments
Potential contribution to your area