Transcript
Page 1: BOF Trees Visualization, Zagreb, June 12, 2004

“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles

Vesna Luzar-Stiffler, Ph.D.
University Computing Centre and CAIR Research Centre, Zagreb, Croatia

Charles Stiffler, Ph.D.
CAIR Research Centre, Zagreb

[email protected], [email protected]

Page 2: Outline

Introduction/Background
Trees
Ensemble Trees
Visualization Tools
Simulation Results
Web Survey Results
Conclusions/Recommendations

Page 3: Introduction / Background

Classification / Decision Trees
  Data mining (statistical learning) method for classification
  Invented twice:
    Statistical community: Breiman, Friedman et al. (1984)
    Machine Learning community: Quinlan (1986)
Many positive features
  Interpretability, ability to handle data of mixed type and missing values, robustness to outliers, etc.
Disadvantages
  Unstable vis-à-vis seemingly minor data perturbations
  Low predictive power

Page 4: Introduction / Background

Possible improvements: Ensembles
  Bagging, i.e., bootstrapping trees (Breiman, 1996)
  Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
  Random Forests (Breiman, 2001)
  Stacking, randomized trees, etc.
Advantage
  Improved prediction
Disadvantage
  Loss of interpretability (“black box”)

Page 5: Classification Tree

Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$.

Page 6: Bagging Classification Tree

Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \dots, B$.

Bagging estimate:

$$\hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
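The bagging estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-split tree (a "stump") stands in for a full classification tree, the data are a made-up one-predictor toy set, and the names `fit_stump` and `bagged_predict` are hypothetical rather than taken from the slides.

```python
import random

def fit_stump(sample):
    """Fit a one-split tree (stump) on (x, y) pairs with y in {0, 1}.

    Returns a function x -> 0/1. The stump is a stand-in for a full tree.
    """
    xs = sorted({x for x, _ in sample})
    # Candidate thresholds: midpoints between consecutive distinct x values.
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None  # (errors, threshold, left_label, right_label)
    for t in thresholds:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_predict(data, x, B=50, seed=0):
    """Average the B bootstrap-tree predictions at x: (1/B) * sum over b."""
    rng = random.Random(seed)
    n = len(data)
    total = 0
    for _ in range(B):
        # Bootstrap sample Z*b: n draws from the data with replacement.
        boot = [data[rng.randrange(n)] for _ in range(n)]
        total += fit_stump(boot)(x)
    return total / B

# Toy data (illustrative values only): class 1 is more common for larger x.
data = [(-1.2, 0), (-0.7, 0), (-0.1, 0), (0.2, 1), (0.4, 0),
        (0.6, 1), (0.9, 1), (1.3, 1), (1.8, 0), (2.1, 1)]
p_low = bagged_predict(data, -1.0)
p_high = bagged_predict(data, 1.0)
```

With 0/1 predictions, the average is the share of bootstrap trees that vote for class 1, so it can be read as a vote proportion at `x`.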

Page 7: Visualization tools

Graphs based on predictor “importances”
  (B×p) matrix F (p = # of predictors)
  For bagged trees, we take the average:

  $$\hat{I}_k^2 = \frac{1}{B} \sum_{b=1}^{B} \hat{I}_k^2(T_b)$$

  Diagram 1 is the importance mean bar chart
  Diagram 2 (“BOF Clusters”) is the cluster means chart (NEW)
  Diagram 3 (“BOF MDPREF”) is the multidimensional preference bi-plot (NEW)
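A small sketch may help make the importance matrix concrete. It assumes F holds one row per bootstrap tree and one column per predictor, fills it with made-up values, and averages the columns as in the formula for bagged trees. The second half groups the trees by their importance profiles with a plain k-means; the slides do not say which clustering method is behind the “BOF Clusters” chart, so k-means and the function names here are assumptions.

```python
def mean_importance(F):
    """Column means of a B x p importance matrix F (one row per tree)."""
    B = len(F)
    p = len(F[0])
    return [sum(row[k] for row in F) / B for k in range(p)]

def cluster_means(F, k=2, iters=20):
    """Plain k-means on the rows of F (an assumed clustering method).

    Starts from the first k rows and returns (cluster means, row groups).
    """
    centers = [list(row) for row in F[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in F:
            dists = [sum((a - c) ** 2 for a, c in zip(row, center))
                     for center in centers]
            groups[dists.index(min(dists))].append(row)
        # Recompute each center as the mean of its rows; keep empty ones as is.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical importances for B = 4 bootstrap trees and p = 3 predictors.
F = [
    [0.8, 0.1, 0.0],
    [0.6, 0.3, 0.1],
    [0.0, 0.9, 0.1],
    [0.6, 0.3, 0.0],
]
means = mean_importance(F)
centers, groups = cluster_means(F, k=2)

# Text version of a mean-importance bar chart (one bar per predictor).
for j, m in enumerate(means, start=1):
    print(f"x{j} {'#' * round(m * 20)} {m:.2f}")
```

In this toy matrix three trees lean on the first predictor and one on the second, which is the kind of split among bootstrap trees that a cluster means chart is meant to expose and a single mean bar chart hides.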

Page 8: Visualization tools

Graphs based on proximity
  (n×n) matrix P (n = # of cases)
  Diagram 4 (“Proximity Clusters”) is the cluster means chart (Breiman, 2002)
  Diagram 5 (“Proximity MDS”) is the multidimensional scaling plot of “similar” cases (Breiman, 2002)
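The slide names the proximity matrix without defining it. The sketch below assumes the usual tree-ensemble definition, in which the proximity of two cases is the fraction of trees where both end up in the same terminal node. The terminal-node assignments are made-up values and `proximity_matrix` is a hypothetical helper name.

```python
def proximity_matrix(leaves):
    """Build the n x n proximity matrix P from terminal-node assignments.

    leaves[b][i] is the terminal-node id of case i in tree b. P[i][j] is the
    fraction of trees in which cases i and j share a terminal node.
    """
    B = len(leaves)
    n = len(leaves[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaves:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / B for c in row] for row in counts]

# Hypothetical terminal-node ids: B = 3 trees, n = 4 cases.
leaves = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
P = proximity_matrix(leaves)

# A scaling plot of "similar" cases would typically start from 1 - P.
D = [[1 - value for value in row] for row in P]
```

Here cases 1 and 2 share a node in two of the three trees, so their proximity is 2/3, while cases 1 and 4 never do and get proximity 0.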

Page 9: Simulation experiments

S1: Generate a sample of size n = 30, two classes, and p = 5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95. The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.

S2: Generate a sample of size n = 30, two classes, and p = 5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95 between x1 and x2, and 0 among the other predictors. The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.
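One way to generate data matching the two descriptions is sketched below. It assumes a shared-factor construction for the correlated predictors (each is sqrt(rho) times a common normal draw plus sqrt(1 - rho) times independent noise), which gives unit variance and pair-wise correlation rho; the slides do not say how the authors drew their samples. The names `simulate` and `corr` are hypothetical.

```python
import math
import random

def simulate(scenario, n=30, p=5, rho=0.95, seed=0):
    """Draw (X, y) for scenario "S1" or "S2".

    S1: all p predictors are pair-wise correlated with correlation rho.
    S2: only x1 and x2 are correlated; the other predictors are independent.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared factor that induces the correlation
        row = []
        for j in range(p):
            e = rng.gauss(0, 1)
            shares_factor = scenario == "S1" or j < 2
            if shares_factor:
                row.append(math.sqrt(rho) * z + math.sqrt(1 - rho) * e)
            else:
                row.append(e)
        X.append(row)
        prob = 0.8 if row[0] > 0.5 else 0.2  # Pr(Y=1 | x1)
        y.append(1 if rng.random() < prob else 0)
    return X, y

def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

X1, y1 = simulate("S1")  # n = 30, as on the slide
X2, y2 = simulate("S2")
```

With n = 30 the sample correlations fluctuate noticeably, so checking the construction is easier on a much larger draw, for example `simulate("S1", n=5000)`.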

Page 10: Diagram 1, Mean importance

[Panels: S1, S2]

Page 11: Diagram 2, “BOF Clusters”

[Panels: S1, S2]

Page 12: Diagram 3, “BOF MDPREF”

[Panels: S1, S2]

Page 13: Diagram 4, “Proximity Clusters”

[Panels: S1, S2]

Page 14: Web Survey data

ICT infrastructure/usage in Croatian primary and secondary schools
25,000+ teachers (cases)
200+ variables
Response: “classroom use of a computer by educators” (yes/no)
Partition: 50% training, 25% validation, 25% test
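The 50/25/25 partition can be sketched as a shuffle followed by two cuts. The slide does not say whether the split was random, stratified, or done inside a particular tool, so a simple random split is assumed here; the function name `partition` and the stand-in case count of 25,000 are illustrative only.

```python
import random

def partition(cases, fractions=(0.50, 0.25, 0.25), seed=0):
    """Shuffle cases and split them into training, validation and test sets."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_valid = int(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, so no case is dropped
    return train, valid, test

# Case ids 0..24999 stand in for the 25,000+ surveyed teachers.
train, valid, test = partition(range(25000))
```

Taking the test set as the remainder keeps every case in exactly one subset even when the case count is not divisible by four.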

Page 15: Initial tree (before bagging)

Page 16: Diagram 1, “Mean importance”

Page 17: Diagram 2, “BOF Clusters”

Page 18: Diagram 3, “BOF MDPREF”

Page 19: Bootstrap tree 11

Page 20: Bootstrap tree 22

Page 21: Bootstrap tree 12

Page 22: Clustering trees

Page 23: Diagram 5, “Proximity MDS”

Page 24: Conclusions / Recommendations

There are software packages for trees

There are some software packages for tree ensembles

There are some visualization tools (old and new)

The problem is that they are not “interfaced” (integrated)

