Session 124TS, A Practical Guide to Machine Learning for Actuaries
Presenters: Dave M. Liner, FSA, MAAA, CERA
27 JUNE 2018
[Figure: CareerCast top job rankings by year, 2009 to 2017, showing rank for Actuary, Data Scientist, and Statistician]
What I learned about machine learning
1. Machine learning drives disruption and innovation in many sectors globally
2. Many actuaries already do most of the machine learning process
3. The most popular machine learning methods are based on concepts that actuaries already understand
4. There are a staggering number of free resources for developing machine learning skills
Actuaries already do most of the machine learning process
1. Get data
2. Prepare data
3. Build model
4. Use model to gain insight
5. Tell others about model results
We will review six machine learning methods today
k-Nearest neighbors
k-Means clustering
Decision trees
Linear regression
Logistic regression
Neural networks
Illustrative dataset
X1 = Petal length
X2 = Petal width
Y = Species
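As a minimal sketch of how these three variables could be pulled out of the copy of the iris data bundled with scikit-learn (the same `load_iris` loader that the kNN code in this deck uses), assuming the bundled column order of sepal length, sepal width, petal length, petal width:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Columns 2 and 3 of iris.data hold petal length and petal width (cm).
X = iris.data[:, 2:4]   # X1 = petal length, X2 = petal width
y = iris.target         # Y = species, coded as 0, 1, 2

print(iris.feature_names[2:4])   # names of the two columns kept
print(list(iris.target_names))   # species names behind the codes
print(X.shape, y.shape)
```

The variable names `X` and `y` are only a convention borrowed from the kNN code; nothing in the slides prescribes them.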
It is easier to illustrate methods using 2-dimensional data
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with species Setosa, Versicolor, and Virginica]
k-Nearest neighbors
What is the species of a new sample based on petal width and length using kNN?
[Figure, shown on two slides: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with species Setosa, Virginica, and Versicolor]
Four lines of Python code get the same result as the Excel kNN file
import sklearn.datasets as ds, sklearn.model_selection as ms, sklearn.preprocessing as pp, sklearn.neighbors, sklearn.metrics
X_train,X_test,y_train,y_test = ms.train_test_split(pp.scale(ds.load_iris().data),ds.load_iris().target,test_size=30,random_state=0)
y_pred_kNN = sklearn.neighbors.KNeighborsClassifier(n_neighbors=3).fit(X_train,y_train).predict(X_test)
print(sklearn.metrics.confusion_matrix(y_test, y_pred_kNN))
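The scikit-learn call above hides the mechanics, so here is a minimal from-scratch sketch of the kNN idea: measure the distance from the new sample to every labeled point, keep the k closest, and take a majority vote. The nine training points and the new sample are made-up illustrative values (not taken from the slides or the Excel file), and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

# Illustrative (petal_length, petal_width) points, made up for this sketch.
TRAIN = [
    ((1.4, 0.2), "Setosa"),
    ((1.5, 0.3), "Setosa"),
    ((1.3, 0.2), "Setosa"),
    ((4.5, 1.5), "Versicolor"),
    ((4.7, 1.4), "Versicolor"),
    ((4.0, 1.3), "Versicolor"),
    ((6.0, 2.5), "Virginica"),
    ((5.8, 2.2), "Virginica"),
    ((5.5, 2.1), "Virginica"),
]


def knn_predict(train, new_point, k=3):
    """Return the majority label among the k training points closest to new_point."""
    # Sort the labeled points by Euclidean distance to the new sample.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    # Keep the labels of the k nearest neighbors and take a majority vote.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(TRAIN, (4.9, 1.6)))  # a new sample between two species
```

Unlike the scikit-learn version, this sketch does not scale the features first; with only petal length and width in the same units that is a simplification, not a recommendation.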
k-Means clustering
What is the species of a new sample based on petal width and length using clustering?
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0)]
Unsupervised learning does not require labels (Y)
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0)]
Step 1: select random centroids (k=3 in this case)
Step 2: move centroids based on data
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0)]
Step 3: keep moving centroids until no points are reassigned
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0)]
Step 4: assign each point to a centroid cluster
[Figure, shown on two slides: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with clusters labeled Setosa, Virginica, and Versicolor]
Step 5: use clusters to assign new data points
[Figure, shown on three slides: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with clusters labeled Setosa, Virginica, and Versicolor]
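The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard k-means loop, not the presenter's implementation; the nine points are made-up illustrative values, and `nearest` and `k_means` are hypothetical helper names.

```python
import math
import random


def nearest(centroids, point):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Step 4: assign each point to its nearest centroid.
        new_labels = [nearest(centroids, p) for p in points]
        # Step 3: stop once no point has been reassigned.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: move each centroid to the mean of the points assigned to it.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Illustrative (petal_length, petal_width) points, made up for this sketch.
POINTS = [
    (1.4, 0.2), (1.5, 0.3), (1.3, 0.2),
    (4.5, 1.5), (4.7, 1.4), (4.0, 1.3),
    (6.0, 2.5), (5.8, 2.2), (5.5, 2.1),
]

# Step 1: pick k=3 random starting centroids (here, three of the data points).
start = random.Random(0).sample(POINTS, 3)
centroids, labels = k_means(POINTS, start)
print(labels)

# Step 5: a new data point goes to the cluster with the nearest centroid.
print(nearest(centroids, (4.9, 1.6)))
```

Note that the cluster indices carry no species names: attaching "Setosa", "Versicolor", or "Virginica" to a cluster is an interpretation made afterwards, which is why no labels (Y) are needed to run the algorithm. Results can also depend on the random starting centroids.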
Decision trees
Decision trees provide another method for classification
[Figure: decision tree diagram with a Root Node, a Decision Node, and Leaf Nodes for Setosa, Virginica, and Versicolor]
[Figure, shown on six slides: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with species Setosa, Versicolor, and Virginica]
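A fitted decision tree is just a series of nested if/else tests. The sketch below mirrors the root node, decision node, and leaf nodes named on the earlier slide. The two thresholds (2.5 for petal length, 1.75 for petal width) are illustrative assumptions chosen to separate the sample points reasonably well; they are not read off the slides, and `classify` is a hypothetical function name.

```python
def classify(petal_length, petal_width):
    """Walk a two-split tree; both thresholds are illustrative assumptions."""
    # Root node: flowers with short petals are treated as Setosa.
    if petal_length < 2.5:
        return "Setosa"        # leaf node
    # Decision node: split the remaining flowers on petal width.
    if petal_width < 1.75:
        return "Versicolor"    # leaf node
    return "Virginica"         # leaf node


# Illustrative (petal_length, petal_width) inputs, made up for this sketch.
for sample in [(1.4, 0.2), (4.7, 1.4), (6.0, 2.5)]:
    print(sample, classify(*sample))
```

Each test cuts the petal length/width plane with a horizontal or vertical line, so the tree carves the plot into rectangles, one per leaf.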
Logistic regression
Linear regression uses a line of best fit to make numerical predictions
[Figure: plot of Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0)]
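As a minimal sketch of what "line of best fit" means here, the ordinary least squares slope and intercept for one predictor can be computed directly from means and sums of squares. The petal measurements are made-up illustrative values, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative petal measurements, made up for this sketch.
petal_length = [1.4, 1.5, 1.3, 4.5, 4.7, 4.0, 6.0, 5.8, 5.5]
petal_width = [0.2, 0.3, 0.2, 1.5, 1.4, 1.3, 2.5, 2.2, 2.1]

slope, intercept = fit_line(petal_length, petal_width)
# Numerical prediction: expected petal width for a petal length of 5.0.
print(slope * 5.0 + intercept)
```

The output is a number on a continuous scale, which is the contrast with the classification methods in the rest of the deck.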
Logistic regression is a classification algorithm
[Figure: "Is it Virginica?" (Yes/No, 0.0 to 1.0) against Petal Width (0.0 to 2.5), with species Setosa, Versicolor, and Virginica]
Is it Virginica?
[Figure, shown on two slides, two panels: (1) Petal Width (0.0 to 2.5) against Petal Length (1.0 to 7.0), with species Setosa, Versicolor, and Virginica; (2) "Virginica?" (Yes/No, 0.0 to 1.0) against Petal Width (0.0 to 2.5)]
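The "Is it Virginica?" question can be sketched as a one-variable logistic regression: an S-shaped (sigmoid) curve turns petal width into a probability between 0 and 1, and a probability above 0.5 is read as "Yes". The fit below uses plain gradient descent, which is one common fitting approach and an assumption of this sketch rather than something stated on the slides. The data values, learning rate, and step count are likewise illustrative, and `fit_logistic` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, steps=5000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error (probability minus 0/1 label) for every sample.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Step both parameters against the average gradient of the log loss.
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Illustrative petal widths, made up for this sketch; 1 = Virginica, 0 = not.
WIDTHS = [0.2, 0.3, 0.2, 1.5, 1.4, 1.3, 2.5, 2.2, 2.1]
IS_VIRGINICA = [0, 0, 0, 0, 0, 0, 1, 1, 1]

w, b = fit_logistic(WIDTHS, IS_VIRGINICA)
for width in (0.2, 2.5):
    p = sigmoid(w * width + b)
    print(width, "Yes" if p > 0.5 else "No")
```

Because the toy classes are perfectly separated by petal width, the fitted coefficients keep growing with more steps; real data with overlap, or a penalty term, would settle on finite values.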
Neural networks
Neural networks can model non-linear situations
[Figure: network diagram with an Input layer (Sepal Length, Sepal Width, Petal Length, Petal Width), a Hidden layer with Activation Functions, and an Output layer (Setosa, Versicolor, Virginica)]
Neural Networks
[Figure, shown on four slides: Input Layer, Hidden Layer, Output Layer]
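To make the input, hidden, and output layers concrete, here is a minimal sketch of a single forward pass through a network with four inputs, one hidden layer, and three outputs. Everything specific is an assumption of the sketch: the hidden layer size (5), the ReLU and softmax activation functions, and the random untrained weights. No training is shown, so the printed probabilities are arbitrary. The helper names `dense`, `forward`, and `random_params` are hypothetical.

```python
import math
import random

SPECIES = ["Setosa", "Versicolor", "Virginica"]


def relu(values):
    """Hidden-layer activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Output activation: turn raw scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the weights feeding unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_params(n_in=4, n_hidden=5, n_out=3, seed=0):
    """Untrained weights drawn uniformly from [-1, 1]; biases start at zero."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }


def forward(x, params):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))


# One flower: sepal length, sepal width, petal length, petal width.
probs = forward([5.1, 3.5, 1.4, 0.2], random_params())
for name, p in zip(SPECIES, probs):
    print(name, round(p, 3))
```

The non-linearity comes from the activation functions: without `relu` between the two `dense` calls, the two layers would collapse into a single linear map.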
Summary of methods
Categories shown: Unsupervised, Deep, Regression
k-Nearest Neighbors
k-Means Clustering
Decision Trees
Linear Regression
Logistic Regression
Neural Networks
Next steps
Possible machine learning applications in healthcare include:
Predict drug non-adherence events before they happen
Predict opioid drug abuse before it happens
Estimate member persistency based on member and dependent characteristics
Project medical costs using personal and clinical data
Develop clinical best practices by linking clinical and financial data