trees, forests, and tensorflow
fort lauderdale machine learning meetup, 18-May-2016
Andy Catlin
• An enthusiastic student, enjoying Thomas Quintana’s ongoing lecture series on Google’s TensorFlow.
• A data science teacher, mentor, and coach. My focus areas are recommender systems and collective intelligence.
• An entrepreneur – my teams helped build out the analytics infrastructure for the National Football League and several NFL teams.
Above: Curriculum for City University of New York’s Online Masters Degree in Data Analytics program, where I am the lead faculty member.
classification and regression trees
Source: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
sources / resources / learning roadmap
Resource | Math? | Videos? | Code?
Leo Breiman, “Statistical Modeling: The Two Cultures,” Statistical Science (Institute of Mathematical Statistics), 2001. See also “What’s the difference between machine learning, statistics, and data mining?,” http://www.sharpsightlabs.com/difference-machine-learning-statistics-data-mining/, Sharp Sight Labs, 2016. | Little | No | No
Scott Fortmann-Roe, “Understanding the Bias/Variance Tradeoff,” http://scott.fortmann-roe.com/docs/BiasVariance.html, June 2012. | Little | No | No
Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman, The Elements of Statistical Learning, Springer. Freely downloadable here: https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf | Lots | No | No
Josh Gordon (Google), Machine Learning for Developers, YouTube video series: https://www.youtube.com/watch?v=cKxRvEZd3Mw&list=PLOU2XLYxmsIIuiBfYad6rFYQU_jL2ryal&index=4 | Little | Yes | Python
Joel Grus, Data Science from Scratch, O’Reilly, 2015. | Some | No | Python
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013. Freely downloadable at http://www-bcf.usc.edu/~gareth/ISL/. Excellent videos in the edX course, archived here: http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/. | Some | Yes | R
Victor Lavrenko, Decision Trees, YouTube video series, http://bit.ly/D-Tree. Part of the Introductory Applied Machine Learning course at the University of Edinburgh. | Some | Yes | No
Kevin Markham, “ROC curves and Area Under the Curve explained,” http://www.dataschool.io/roc-curves-and-auc-explained/. See also “Understanding ROC Curves,” http://www.navan.name/roc/, and “Comparing supervised learning algorithms,” http://www.dataschool.io/comparing-supervised-learning-algorithms/. | Little | Yes | No
Foster Provost and Tom Fawcett, Data Science for Business, O’Reilly, 2013. | Little | No | No
Sebastian Raschka, Python Machine Learning, Packt, 2015. See also his blog post “When Does Deep Learning Work Better Than SVMs or Random Forests?,” http://www.kdnuggets.com/2016/04/deep-learning-vs-svm-random-forest.html. | Lots | No | Python
Wesleyan University, Machine Learning for Data Analysis, https://www.coursera.org/learn/machine-learning-data-analysis. Freely available Python-based Coursera course; part of a five-course specialization. | Little | Yes | Python
goal of this talk
• To help you understand when to use, how to build, and how to tune decision trees and random forests
outline of this talk
• problem statement: “hiring analytics”
• decision trees and random forests
• trees into tensorflow?
hiring analytics
features and labels
• What does it mean for an NFL draft pick (“hire”) to have been successful?
• What are the features that matter most in selecting a player?
y = f(X)

candidate features:
• playerid: 39408, 39412, …
• position: QB, WR, DL, OL, …
• history of knee injuries: Yes, No
• 40 yard dash time: 4.24, 4.31, 4.67, …
• wonderlic score: 0..50
• “good citizen”: Yes, No

candidate labels:
• probowl first five years?: Yes, No
• years in league?: 3.1, 5.2, …
Sources: http://wonderlictestsample.com/nfl-wonderlic-scores/; https://en.wikipedia.org/wiki/Wonderlic_test. Green Bay’s Mike Eayrs was probably the NFL’s first data scientist: http://www.baselinemag.com/c/a/Projects-Management/Green-Bay-Packers-Reel-Time
Will this player be selected for the Pro Bowl?

features:
• playerid: 39408, 39412, …
• position: QB, WR, DL, OL, …
• history of knee injuries: No = 0, Yes = 1
• 40 yard dash time: Slow = 0, Medium = 1, Fast = 2
• wonderlic score: Normal = 0, Smart = 1
• “good citizen”: Yes, No

labels:
• probowl first five years?: No = 0, Yes = 1
• years in league?: 3.1, 5.2, …
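As a minimal sketch (the dict and function names here are illustrative, not from the talk), the categorical-to-integer encoding above can be written directly in Python:

FORTY_YARD = {'Slow': 0, 'Medium': 1, 'Fast': 2}
WONDERLIC = {'Normal': 0, 'Smart': 1}
KNEE_INJURY = {'No': 0, 'Yes': 1}

def encode_player(forty, wonderlic, knee):
    """map one player's categorical attributes to the integer features X[0..2]"""
    return [FORTY_YARD[forty], WONDERLIC[wonderlic], KNEE_INJURY[knee]]

print(encode_player('Fast', 'Smart', 'Yes'))   # the "New" prospect below -> [2, 1, 1]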
Which attribute matters most (1/2)?

Player | 40yard | Wonderlic | KneeInjury | ProBowl
1 | Medium | Smart | No | No
2 | Medium | Smart | Yes | No
3 | Fast | Smart | No | Yes
4 | Fast | Smart | No | Yes
5 | Fast | Normal | No | Yes
6 | Slow | Normal | Yes | No
7 | Fast | Normal | Yes | Yes
8 | Medium | Smart | No | No
9 | Fast | Normal | No | Yes
10 | Fast | Normal | No | Yes
11 | Fast | Normal | Yes | Yes
12 | Fast | Smart | Yes | Yes
13 | Fast | Normal | No | Yes
14 | Slow | Smart | Yes | No
New | Fast | Smart | Yes | ?
Which attribute matters most (2/2)?

Player | 40yard | Wonderlic | KneeInjury | ProBowl
1 | Fast | Smart | No | No
2 | Fast | Smart | Yes | No
3 | Medium | Smart | No | Yes
4 | Slow | Smart | No | Yes
5 | Slow | Normal | No | Yes
6 | Slow | Normal | Yes | No
7 | Medium | Normal | Yes | Yes
8 | Fast | Smart | No | No
9 | Fast | Normal | No | Yes
10 | Slow | Normal | No | Yes
11 | Fast | Normal | Yes | Yes
12 | Medium | Smart | Yes | Yes
13 | Medium | Normal | No | Yes
14 | Slow | Smart | Yes | No
New | Fast | Smart | Yes | ?
40yard X[0] | Wonderlic X[1] | KneeInjury X[2] | ProBowl
2 | 1 | 0 | 0
2 | 1 | 1 | 0
1 | 1 | 0 | 1
0 | 1 | 0 | 1
0 | 0 | 0 | 1
0 | 0 | 1 | 0
1 | 0 | 1 | 1
2 | 1 | 0 | 0
2 | 0 | 0 | 1
0 | 0 | 0 | 1
2 | 0 | 1 | 1
1 | 1 | 1 | 1
1 | 0 | 0 | 1
0 | 1 | 1 | 0

Player | 40yard X[0] | Wonderlic X[1] | KneeInjury X[2] | ProBowl
New | Medium | High | No | ?
(same encoded table as above)

Player | 40yard X[0] | Wonderlic X[1] | KneeInjury X[2] | ProBowl
New | 2 | 1 | 0 | ?
id3 algorithm
Best name ever? Iterative Dichotomiser 3. Source: Joel Grus, Data Science from Scratch, O’Reilly, 2015. See also https://en.wikipedia.org/wiki/ID3_algorithm.
entropy
Source: Foster Provost and Tom Fawcett, Data Science for Business, O’Reilly, 2013. Note that scikit-learn uses the CART algorithm instead of ID3; CART uses Gini impurity instead of entropy. See also https://en.wikipedia.org/wiki/Decision_tree_learning
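The underlying formula is the standard Shannon entropy, which the Python entropy function below implements: for a set whose class proportions are $p_1, \dots, p_k$,

$$H = -\sum_{i=1}^{k} p_i \log_2 p_i,$$

with the convention that $0 \log_2 0 = 0$ (the “if p” guard in the code).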
information gain
Source: Foster Provost and Tom Fawcett, Data Science for Business, O’Reilly, 2013. See also https://en.wikipedia.org/wiki/Claude_Shannon
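Information gain is the drop in entropy achieved by a split. As a sketch of the standard definition (consistent with the partition_entropy code below): if a split partitions a parent set $S$ into subsets $S_1, \dots, S_m$, then

$$IG(S) = H(S) - \sum_{j=1}^{m} \frac{|S_j|}{|S|}\, H(S_j).$$

ID3 greedily chooses the attribute whose split has the largest gain, i.e., the lowest weighted child entropy.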
from collections import Counter, defaultdict
from functools import partial
import math, random

def entropy(class_probabilities):
    """given a list of class probabilities, compute the entropy"""
    return sum(-p * math.log(p, 2)
               for p in class_probabilities
               if p)                          # skip zero probabilities

def class_probabilities(labels):
    total_count = len(labels)
    return [count / total_count
            for count in Counter(labels).values()]
Source: Joel Grus, Data Science from Scratch, O’Reilly, 2015.
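As a quick sanity check (my own usage example, not from the deck): the 14-player table above has 9 Pro Bowlers and 5 non-Pro Bowlers, so its entropy should be about 0.94 bits.

labels = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]         # the 14 ProBowl labels above
print(class_probabilities(labels))              # [5/14, 9/14] ≈ [0.357, 0.643]
print(entropy(class_probabilities(labels)))     # ≈ 0.940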
def data_entropy(labeled_data):
    """labeled_data is a list of (input, label) pairs"""
    labels = [label for _, label in labeled_data]
    probabilities = class_probabilities(labels)
    return entropy(probabilities)

def partition_entropy(subsets):
    """find the entropy from this partition of data into subsets"""
    total_count = sum(len(subset) for subset in subsets)
    return sum(data_entropy(subset) * len(subset) / total_count
               for subset in subsets)
Source: Joel Grus, Data Science from Scratch, O’Reilly, 2015.
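To connect these functions to the player table: a small helper (my own sketch; Grus’s book has a similar partition_entropy_by) groups the encoded players by one feature and measures how much entropy each candidate split leaves behind; ID3 splits on the feature that leaves the least.

from collections import defaultdict

# encoded players from the tables above (X[0]=40yard, X[1]=wonderlic, X[2]=knee injury)
X = [[2,1,0],[2,1,1],[1,1,0],[0,1,0],[0,0,0],[0,0,1],[1,0,1],
     [2,1,0],[2,0,0],[0,0,0],[2,0,1],[1,1,1],[1,0,0],[0,1,1]]
y = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

def partition_by(labeled_data, feature_idx):
    """group (features, label) pairs by the value of one feature"""
    groups = defaultdict(list)
    for features, label in labeled_data:
        groups[features[feature_idx]].append((features, label))
    return groups

def partition_entropy_by(labeled_data, feature_idx):
    """weighted entropy remaining after splitting on one feature"""
    return partition_entropy(partition_by(labeled_data, feature_idx).values())

labeled = list(zip(X, y))
for idx, name in enumerate(['40 yard', 'wonderlic', 'knee injury']):
    print(name, partition_entropy_by(labeled, idx))
# the feature with the lowest remaining entropy (highest information gain)
# becomes the root of the ID3 tree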
accuracy
Suppose you took the decision tree model that we built from 14 NFL players, then used the model to predict whether members of the next year’s college draft group would go on to play in the Pro Bowl.
• How accurate would your model be?
• How should you best measure your model’s accuracy?
Source: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
As the flexibility of f-hat [the estimate for the labelled response variable] increases, its variance increases and its bias decreases.
• Variance refers to the amount by which f-hat (our estimate for y) would change if we estimated it using a different training data set.
• Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.
bias variance tradeoff
Source: Scott Fortmann-Roe, “Understanding the Bias/Variance Tradeoff,” http://scott.fortmann-roe.com/docs/BiasVariance.html, June 2012.
bias variance tradeoff
Overfitting
Source: Wesleyan University, Machine Learning for Data Analysis, https://www.coursera.org/learn/machine-learning-data-analysis
Accuracy: confusion matrix
import numpy as np
from sklearn import tree

# Load the dataset (the 14 encoded players from the tables above)
X = [[2,1,0],[2,1,1],[1,1,0],[0,1,0],[0,0,0],[0,0,1],[1,0,1],
     [2,1,0],[2,0,0],[0,0,0],[2,0,1],[1,1,1],[1,0,0],[0,1,1]]
y = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

nfl_feature_names = ['40 yard', 'wonderlic', 'knee injury']
nfl_target_names = ['No Pro Bowl', 'Pro Bowl']

# hold out 30% of the players as a test set
# (in scikit-learn >= 0.18, import train_test_split from sklearn.model_selection)
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import sklearn.metrics

print(sklearn.metrics.confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

# visualization code: export the fitted tree to a PDF via graphviz
from sklearn.externals.six import StringIO
import pydotplus

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=nfl_feature_names,
                     class_names=nfl_target_names,
                     filled=True, rounded=True, impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("c:\\Data\\ProBowl.pdf")
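As a usage note (my own addition, reusing the names defined above): the fitted tree can now score the encoded “New” prospect from the earlier slide.

# the "New" prospect, encoded earlier as [2, 1, 0] (Fast, Smart, no knee injury)
print(clf.predict([[2, 1, 0]]))          # predicted class: 0 or 1
print(clf.predict_proba([[2, 1, 0]]))    # probabilities for [No Pro Bowl, Pro Bowl]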
Random forests
Source: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
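Note: the snippet below comes from a larger example; judging by the confusion-matrix totals, it was run on the Wesleyan Coursera course dataset, not the 14-player NFL toy set. It assumes train/test splits named pred_train, pred_test, tar_train, tar_test already exist. A minimal sketch of that assumed setup, with predictors and targets as placeholder names for the course dataset’s feature matrix and label vector:

from sklearn.cross_validation import train_test_split
pred_train, pred_test, tar_train, tar_test = train_test_split(
    predictors, targets, test_size=0.4)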
# Build model on training data
from sklearn.ensemble import RandomForestClassifier

# build 25 trees
classifier = RandomForestClassifier(n_estimators=25)
classifier = classifier.fit(pred_train, tar_train)

predictions = classifier.predict(pred_test)

print(sklearn.metrics.confusion_matrix(tar_test, predictions))
print(sklearn.metrics.accuracy_score(tar_test, predictions))
[[1424   80]
 [ 217  109]]
0.837704918033
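Reading the output above: scikit-learn’s convention is rows = true classes, columns = predicted classes, so this run produced 1424 true negatives, 80 false positives, 217 false negatives, and 109 true positives. Accuracy is (1424 + 109) / 1830 ≈ 0.838, matching the second line; note how much of it comes from the majority (negative) class.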
# fit an Extra Trees model to the data
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(pred_train, tar_train)

# display the relative importance of each attribute
print(model.feature_importances_)
[ 0.02572953  0.01454145  0.02808065  0.01565101  0.00723755  0.00482434
  0.06410482  0.03400461  0.0571412   0.12897684  0.01891439  0.01500713
  0.02514497  0.06112466  0.05639455  0.05085095  0.01686558  0.06461658
  0.06336964  0.07272654  0.01245386  0.05971838  0.05615322  0.04636755]
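Raw importances are hard to read on their own. A small follow-up (my own sketch; predictor_names is a placeholder for the course dataset’s column names) pairs each score with its column and sorts, highest first:

# predictor_names is a placeholder list of the dataset's column names
for name, score in sorted(zip(predictor_names, model.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    print(name, round(score, 4))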
import matplotlib.pyplot as plt

trees = range(25)
accuracy = np.zeros(25)

# grow forests of 1..25 trees and record test-set accuracy for each size
for idx in range(len(trees)):
    classifier = RandomForestClassifier(n_estimators=idx + 1)
    classifier = classifier.fit(pred_train, tar_train)
    predictions = classifier.predict(pred_test)
    accuracy[idx] = sklearn.metrics.accuracy_score(tar_test, predictions)

plt.cla()
plt.plot(trees, accuracy)
Decision trees: pros and cons
• Decision trees are less accurate than more modern methods.
• Great for “explainability,” which is important for change management and easy operationalization.
• Handle interactions between variables better than regression methods.
• Random forests, by controlling for variance, approach “state of the art” accuracy, but they also suffer from explainability issues. Especially strong for ranking variables.
Decision trees in TensorFlow
• The hard way: implement the algorithms from the ground up (e.g., ID3, CART)
• Higher-level approaches:
  • skflow: a scikit-learn-style interface to TensorFlow (sketch below)
  • keras
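As of mid-2016, skflow (later absorbed into tf.contrib.learn) exposed scikit-learn-style estimators, but no drop-in decision-tree estimator. A minimal, era-specific sketch of its API, reusing the tree example’s splits and treating the exact class name as an assumption of that release:

import numpy as np
import skflow                      # 2016-era package; later tf.contrib.learn
from sklearn import metrics

# a small DNN classifier as the closest high-level stand-in for a tree model
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 10], n_classes=2)
classifier.fit(np.array(X_train, dtype=np.float32), np.array(y_train))
predictions = classifier.predict(np.array(X_test, dtype=np.float32))
print(metrics.accuracy_score(y_test, predictions))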