
Post on 29-Mar-2019


Scalable Bayesian Network Classifiers

Geoff Webb, Ana Martinez, Nayyar Zaidi

Introduction · Bayesian Network Classifiers · Selective KDB · Experiments · Conclusions & Future Research


Introduction

• Learning from large data is not just about scaling up existing algorithms.

• Large data is best tackled by fundamentally new learning algorithms.


Learning curves

[Figure: RMSE against training set size (0 to 1,000,000 examples) for several learners, annotated "Overfitting on small data" and "Best on large data".]

• Error typically reduces as data quantity increases.

• Different algorithms have different curves.

• Algorithms that closely fit complex multivariate distributions will tend to overfit small data, but can better fit large data: the bias-variance tradeoff.

Capacity to fit = model space + optimization

[Figure: learning curves (RMSE against training set size) for Naïve Bayes and Logistic Regression.]

• For discrete attributes, Naïve Bayes and logistic regression share the same model space (a log-linear function of the attribute values); they differ in how it is optimized: generatively by counting, or discriminatively by minimizing conditional log loss.

Much research of questionable relevance

[Figure: learning curves (RMSE against training set size, 0 to 1,000,000 examples).]

• Most prior research has used relatively small data, the region of the learning curve where the best-performing learners may differ from those that perform best on large data.

Requirements

[Figure: learning curves (RMSE against training set size, 0 to 1,000,000 examples).]

• Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient:
  • low bias
  • few passes through the data
  • out of core

State-of-the-art

• Most low-bias algorithms do not scale and are inherently in-core:
  • Random Forest
  • SVM
  • Deep Learning

• Selective KDB is a scalable, low-bias Bayesian network classifier.


Outline

1 Introduction

2 Bayesian Network Classifiers

3 Selective KDB

4 Experiments

5 Conclusions & Future Research

Bayesian Network Classifiers

• Defined by a parent relation π and Conditional Probability Tables (CPTs)
  • π encodes conditional independence (the structure)
  • CPTs encode conditional probabilities

• Classifies using $P(y \mid \mathbf{x}) \propto P(y \mid \pi_Y) \prod_i P(x_i \mid \pi_i)$
  • Usually makes Y a parent of all $X_i$

[Figure: class Y as the parent of attributes X1, X2, X3, X4, X5.]

• Given π, CPTs can be learned by counting joint frequencies
  • single pass
  • incremental
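Because the CPTs are just counts, learning them is a single incremental pass. The following is a minimal sketch, not the authors' implementation: the class name `CountingBNC`, the Laplace smoothing parameter `alpha`, and the assumption that Y itself has no parents (so $P(y \mid \pi_Y)$ reduces to $P(y)$) are all choices made for illustration.

```python
import math
from collections import defaultdict


class CountingBNC:
    """Hypothetical Bayesian network classifier whose CPTs are joint-frequency counts.

    parents[i] lists the attribute indices that are parents of X_i; the class Y
    is treated as a parent of every X_i and is assumed to have no parents itself.
    """

    def __init__(self, parents, n_values, n_classes, alpha=1.0):
        self.parents = parents        # the structure pi, fixed in advance
        self.n_values = n_values      # number of values of each attribute
        self.n_classes = n_classes
        self.alpha = alpha            # Laplace smoothing (an assumption of this sketch)
        self.class_counts = [0] * n_classes
        self.joint = [defaultdict(int) for _ in parents]    # n(y, pi_i, x_i)
        self.context = [defaultdict(int) for _ in parents]  # n(y, pi_i)

    def update(self, x, y, delta=1):
        """Add (delta=1) or remove (delta=-1) one example: single pass, incremental."""
        self.class_counts[y] += delta
        for i, pa in enumerate(self.parents):
            ctx = (y, tuple(x[p] for p in pa))
            self.context[i][ctx] += delta
            self.joint[i][ctx + (x[i],)] += delta

    def posterior(self, x):
        """Normalised P(y | x), proportional to P(y) * prod_i P(x_i | y, pi_i)."""
        n = sum(self.class_counts)
        log_scores = []
        for y in range(self.n_classes):
            s = math.log((self.class_counts[y] + self.alpha)
                         / (n + self.alpha * self.n_classes))
            for i, pa in enumerate(self.parents):
                ctx = (y, tuple(x[p] for p in pa))
                num = self.joint[i].get(ctx + (x[i],), 0) + self.alpha
                den = self.context[i].get(ctx, 0) + self.alpha * self.n_values[i]
                s += math.log(num / den)
            log_scores.append(s)
        top = max(log_scores)
        weights = [math.exp(s - top) for s in log_scores]
        total = sum(weights)
        return [w / total for w in weights]


# Usage: X0 has no attribute parent, X1 has X0 as a parent (plus the class).
model = CountingBNC(parents=[[], [0]], n_values=[2, 2], n_classes=2)
for x, y in [((0, 0), 0), ((0, 1), 0), ((1, 1), 1), ((1, 1), 1)]:
    model.update(x, y)
print(model.posterior((1, 1)))  # class 1 receives about 0.82
```

The `delta=-1` option is there because removing an example is as cheap as adding one, which is what makes incremental leave-one-out evaluation possible.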

k-Dependence Bayes (KDB)

• Two-pass learning

• 1st pass, learn structure:
  • Collect counts for pairs of attributes with the class.
  • Order attributes by mutual information (MI) with the class.
  • Select parents by conditional mutual information (CMI) given the class:
    • no more than k parents
    • parents must be earlier in the order

[Figure: example KDB structure over class Y and attributes X1, X2, X3, X4, X5.]

• 2nd pass, learn CPTs:
  • Collect statistics according to the structure learned.
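The first pass can be sketched as follows, assuming discrete attributes encoded as small integers. The function name `kdb_structure` is hypothetical; MI is $I(X_i; Y)$ and CMI is $I(X_i; X_j \mid Y)$, both computed from the pairwise counts with the class. This is an illustration of the steps listed above, not the authors' code; the second pass would then count CPTs under the returned structure.

```python
import math
from collections import Counter
from itertools import combinations


def kdb_structure(data, n_attrs, k):
    """First KDB pass: pairwise counts with the class, MI order, CMI parents.

    data: iterable of (x, y), x a tuple of discrete values (read once).
    Returns (order, parents); parents[i] is sorted by CMI, highest first.
    """
    n = 0
    cy = Counter()     # n(y)
    cxy = Counter()    # n(i, x_i, y)
    cxxy = Counter()   # n(i, j, x_i, x_j, y) for i < j
    for x, y in data:
        n += 1
        cy[y] += 1
        for i in range(n_attrs):
            cxy[(i, x[i], y)] += 1
        for i, j in combinations(range(n_attrs), 2):
            cxxy[(i, j, x[i], x[j], y)] += 1

    def mi(i):
        # I(X_i; Y) = sum p(x, y) log[p(x, y) / (p(x) p(y))]
        cx = Counter()
        for (a, xv, _), c in cxy.items():
            if a == i:
                cx[xv] += c
        return sum(c / n * math.log(c * n / (cx[xv] * cy[y]))
                   for (a, xv, y), c in cxy.items() if a == i)

    def cmi(i, j):
        # I(X_i; X_j | Y) = sum p(xi, xj, y) log[p(xi, xj, y) p(y) / (p(xi, y) p(xj, y))]
        a, b = min(i, j), max(i, j)
        return sum(c / n * math.log(c * cy[y] / (cxy[(a, va, y)] * cxy[(b, vb, y)]))
                   for (p, q, va, vb, y), c in cxxy.items() if (p, q) == (a, b))

    order = sorted(range(n_attrs), key=mi, reverse=True)
    parents = [[] for _ in range(n_attrs)]
    for pos, i in enumerate(order):
        earlier = order[:pos]  # parents must come earlier in the MI order
        parents[i] = sorted(earlier, key=lambda j: cmi(i, j), reverse=True)[:k]
    return order, parents


# Usage: X0 and X2 copy the class, X1 is noise.
data = [((0, 0, 0), 0), ((0, 1, 0), 0), ((1, 0, 1), 1), ((1, 1, 1), 1)]
print(kdb_structure(data, n_attrs=3, k=1))
```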

k-Dependence Bayes (KDB)

• Training space: $O(ya^2v^2 + yav^{k+1})$
• Classification space: $O(yav^{k+1})$
• Training time: $O(ta^2 + ya^2v^2 + tak)$
• Classification time: $O(yak)$

where t = number of training examples; a = number of attributes; v = average number of values; y = number of classes.
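The $yav^{k+1}$ term is what makes large k expensive: each extra parent multiplies a CPT by v. A small arithmetic sketch, assuming every attribute takes exactly v values and that the attribute at position i of the MI order receives $\min(i, k)$ attribute parents; the function names are hypothetical.

```python
def kdb_cpt_cells(y, a, v, k):
    """Exact CPT size of a KDB whose a attributes all take v values.

    The attribute at position i (0-based) of the MI order receives min(i, k)
    attribute parents plus the class, so its CPT has y * v**(min(i, k) + 1)
    cells; the class prior contributes y more.
    """
    return y + sum(y * v ** (min(i, k) + 1) for i in range(a))


def kdb_space_term(y, a, v, k):
    """The y*a*v**(k+1) term of the classification-space bound (constant 1)."""
    return y * a * v ** (k + 1)


# Small example: 2 classes, 4 attributes, 5 values each, k = 2.
print(kdb_cpt_cells(2, 4, 5, 2), kdb_space_term(2, 4, 5, 2))  # 562 1000

# Growth with k for 7 classes and 54 attributes (the covtype row of the
# dataset table), under the simplifying assumption that every attribute has 5 values.
for k in range(1, 6):
    print(k, kdb_space_term(7, 54, 5, k))
```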

k-Dependence Bayes (KDB)

• Parameter k controls the bias-variance tradeoff

[Figure: learning curves (RMSE against training set size, 0 to 1,000,000 examples) for Naïve Bayes and KDB with k = 1, 2, 3, 4, 5.]

• No a priori means to anticipate the best k

• Spurious attributes may increase error

KDB models are nested

• $M_{i,j}$ = KDB with k = i using attributes 1 to j.

k \ attributes (ordered by MI) | 1 | 2 | 3 | 4
1 | M1,1 | M1,2 | M1,3 | M1,4
2 | M2,1 | M2,2 | M2,3 | M2,4
3 | M3,1 | M3,2 | M3,3 | M3,4
4 | M4,1 | M4,2 | M4,3 | M4,4

• $M_{i,j}$ is a minor extension of $M_{i,j-1}$ and $M_{i-1,j}$

[Figure: KDB structure over class Y and attributes X1, X2, X3, X4, X5.]
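Nesting means the log posterior of every $M_{i,j}$ is a running sum: $M_{i,j}$ adds one CPT term to $M_{i,j-1}$. The sketch below assumes the per-attribute log-probability terms have already been looked up (0-based indices, with i = 0 meaning naive Bayes); the function name and the `terms` layout are hypothetical. Computing all $(k_{max}+1) \times a$ scores takes $(k_{max}+1) \cdot a \cdot y$ additions, the same order as classifying with the full model.

```python
import math


def nested_log_scores(log_prior, terms):
    """Unnormalised log posteriors of every nested KDB submodel.

    log_prior[c]   : log P(c)
    terms[j][i][c] : log P(x_j | c, first i parents of x_j) for the j-th
                     attribute in MI order; attribute j has at most j earlier
                     attributes, so terms[j][i] should equal terms[j][min(i, j)].
    Returns scores[i][j][c] for the model with k = i over attributes 0..j.
    """
    k_max = len(terms[0]) - 1
    n_classes = len(log_prior)
    scores = []
    for i in range(k_max + 1):
        running = list(log_prior)
        row = []
        for j in range(len(terms)):
            # M(i, j) = M(i, j-1) plus one CPT term for attribute j
            running = [running[c] + terms[j][i][c] for c in range(n_classes)]
            row.append(running)
        scores.append(row)
    return scores


# Usage: two classes, two attributes, k_max = 1 (invented log-probabilities).
log_prior = [math.log(0.5), math.log(0.5)]
terms = [
    [[-1.0, -2.0], [-1.0, -2.0]],  # attribute 0 has no earlier attribute
    [[-0.5, -0.3], [-0.7, -0.1]],  # attribute 1 without / with its parent
]
print(nested_log_scores(log_prior, terms))
```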

Selective KDB

• In one extra pass, select an attribute subset and the best k using leave-one-out CV.

• The full model subsumes all k × a submodels.

• Very efficient selection between a large class of strong models.
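A compact sketch of the selection step, given a structure such as the one KDB's first pass produces: one pass collects counts, one more pass classifies each example with its own counts subtracted, scoring every nested submodel at once and keeping the one with the lowest squared error. Assumptions: the function name `select_kdb`, Laplace smoothing, squared error as the loss, and separate count tables for each parent-prefix length (a simplification for clarity, not necessarily how SKDB stores its tables).

```python
import math
from collections import Counter


def select_kdb(data, order, parents, n_values, n_classes, alpha=1.0):
    """Pick k and the number of attributes of a nested KDB by leave-one-out error.

    data    : list of (x, y) pairs (iterated twice, standing in for two passes)
    order   : attribute indices sorted by MI with the class
    parents : parents[a] lists attribute a's parents, highest CMI first
    Returns (k, number_of_attributes, squared_error_table).
    """
    k_max = max(len(p) for p in parents)
    cy, joint, ctx = Counter(), Counter(), Counter()

    def prefix(x, a, i):
        return tuple(x[p] for p in parents[a][:i])

    # Pass 1: counts for every attribute and every parent-prefix length i.
    for x, y in data:
        cy[y] += 1
        for a in order:
            for i in range(k_max + 1):
                pv = prefix(x, a, i)
                ctx[(a, i, y, pv)] += 1
                joint[(a, i, y, pv, x[a])] += 1
    n = sum(cy.values())

    # Pass 2: leave-one-out scores of all (k_max + 1) x len(order) submodels.
    sq_err = [[0.0] * len(order) for _ in range(k_max + 1)]
    for x, y in data:
        scores = [[[0.0] * n_classes for _ in order] for _ in range(k_max + 1)]
        for c in range(n_classes):
            hold = 1 if c == y else 0  # subtract the held-out example's counts
            base = math.log((cy[c] - hold + alpha) / (n - 1 + alpha * n_classes))
            for i in range(k_max + 1):
                running = base
                for j, a in enumerate(order):
                    pv = prefix(x, a, i)
                    num = joint[(a, i, c, pv, x[a])] - hold + alpha
                    den = ctx[(a, i, c, pv)] - hold + alpha * n_values[a]
                    running += math.log(num / den)
                    scores[i][j][c] = running  # model with k = i, attributes 0..j
        for i in range(k_max + 1):
            for j in range(len(order)):
                top = max(scores[i][j])
                w = [math.exp(s - top) for s in scores[i][j]]
                z = sum(w)
                sq_err[i][j] += sum((w[c] / z - (1.0 if c == y else 0.0)) ** 2
                                    for c in range(n_classes))

    # Minimising total squared error also minimises RMSE.
    k_best, j_best = min(((i, j) for i in range(k_max + 1) for j in range(len(order))),
                         key=lambda ij: sq_err[ij[0]][ij[1]])
    return k_best, j_best + 1, sq_err


# Usage: X0 determines the class, X1 is noise, so one attribute should suffice.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)] * 5
k, n_atts, _ = select_kdb(data, order=[0, 1], parents=[[], [0]], n_values=[2, 2], n_classes=2)
print(k, n_atts)
```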

Leave-one-out CV

• Very low-bias estimator of out-of-sample performance

• Incremental cross-validation makes it VERY efficient for BNCs:
  • Collect counts from all data once
  • When classifying a hold-out object, subtract it from the counts

• All k × a nested models can be evaluated with little more computation than the full KDB model
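The subtraction trick gives exactly the same predictions as retraining without the held-out object, because count-based CPTs depend on the data only through the counts. The sketch below shows this for naive Bayes, the simplest BNC, assuming Laplace smoothing; the function names are hypothetical, and the same idea applies to any count-based CPT.

```python
import math
from collections import Counter


def nb_counts(data):
    """One pass over the data: class counts n(y) and counts n(i, x_i, y)."""
    cy, cxy = Counter(), Counter()
    for x, y in data:
        cy[y] += 1
        for i, v in enumerate(x):
            cxy[(i, v, y)] += 1
    return cy, cxy


def nb_posterior(cy, cxy, x, n_values, n_classes, alpha=1.0):
    """Laplace-smoothed naive Bayes posterior computed from the counts."""
    n = sum(cy.values())
    logs = []
    for c in range(n_classes):
        s = math.log((cy[c] + alpha) / (n + alpha * n_classes))
        for i, v in enumerate(x):
            s += math.log((cxy[(i, v, c)] + alpha) / (cy[c] + alpha * n_values[i]))
        logs.append(s)
    top = max(logs)
    w = [math.exp(s - top) for s in logs]
    z = sum(w)
    return [u / z for u in w]


def loo_by_subtraction(data, n_values, n_classes):
    """Leave-one-out posteriors: count once, then subtract and restore each object."""
    cy, cxy = nb_counts(data)
    out = []
    for x, y in data:
        cy[y] -= 1                      # remove the hold-out object ...
        for i, v in enumerate(x):
            cxy[(i, v, y)] -= 1
        out.append(nb_posterior(cy, cxy, x, n_values, n_classes))
        cy[y] += 1                      # ... and put it back
        for i, v in enumerate(x):
            cxy[(i, v, y)] += 1
    return out


data = [((0, 1), 0), ((1, 1), 1), ((0, 0), 0), ((1, 0), 1), ((1, 1), 0)]
print(loo_by_subtraction(data, n_values=[2, 2], n_classes=2))
```

Each hold-out costs one subtraction and one classification, instead of a full retraining pass.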

Selective KDB

• Training space: $O(ya^2v^2 + yav^{k+1})$
• Classification space: $O(ya^*v^{k^*+1})$
• Training time: $O(ta^2 + ya^2v^2 + tayk)$
• Classification time: $O(ya^*k^*)$

where t = number of training examples; a = number of attributes; v = average number of values; y = number of classes; $a^*$ = number of attributes selected ($a^* \le a$); $k^*$ = best value of k found ($k^* \le k_{max}$).

Selective KDB

[Figure: learning curves (RMSE against training set size, 0 to 1,000,000 examples) for Naïve Bayes, KDB with k = 1 to 5, "SKDB Only K" (selecting k only), "SKDB k=5 Only Atts" (k = 5, selecting attributes only) and SKDB.]

16 datasets (165K to 54M examples)

Name | No. of cases (million) | No. of atts. | No. of classes | Size
localization | 0.165 | 5 | 11 | 11MB
census-income | 0.299 | 41 | 2 | 136MB
USPSExtended | 0.342 | 676 | 2 | 603MB (s)
MITFaceSetA | 0.474 | 361 | 2 | 584MB
MITFaceSetB | 0.489 | 361 | 2 | 603MB
MSDYear-Prediction | 0.515 | 90 | 90 | 601MB (s)
covtype | 0.581 | 54 | 7 | 72MB
MITFaceSetC | 0.839 | 361 | 2 | 1.1GB
poker-hand | 1.025 | 10 | 10 | 24MB
uscensus1990 | 2.458 | 67 | 4 | 325MB
PAMAP2 | 3.851 | 54 | 19 | 1.7GB
kddcup | 5.210 | 41 | 40 | 754MB
linkage | 5.749 | 11 | 2 | 251MB
mnist8m | 8.100 | 784 | 10 | 19GB (s)
satellite | 8.705 | 138 | 24 | 3.6GB
splice | 54.628 | 141 | 2 | 7.3GB (s)

(s) sparse format.

Numeric attributes

• 5-bin equal-frequency discretisation
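A minimal sketch of 5-bin equal-frequency discretisation, assuming the values of one attribute fit in memory as a non-empty list; how cut points were obtained for the out-of-core experiments is not stated here, so this shows the technique only. Function names are hypothetical. Ties can make two cut points coincide, in which case the duplicate is dropped and fewer bins result.

```python
import bisect


def equal_frequency_cuts(values, n_bins=5):
    """Cut points that put roughly len(values) / n_bins values in each bin."""
    s = sorted(values)
    n = len(s)
    cuts = []
    for b in range(1, n_bins):
        c = s[(b * n) // n_bins]
        if not cuts or c > cuts[-1]:  # skip duplicate cuts caused by ties
            cuts.append(c)
    return cuts


def discretise(value, cuts):
    """Bin index = number of cut points less than or equal to value."""
    return bisect.bisect_right(cuts, value)


ages = [23, 35, 31, 62, 47, 51, 29, 40, 58, 36]
cuts = equal_frequency_cuts(ages)
print(cuts, [discretise(a, cuts) for a in ages])
```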

Out-of-core BNCs

[Figure: RMSE comparisons in two panels: KDB (k = 5) vs. SKDB, and best KDB vs. SKDB.]

Out-of-core BNCs

• Training time comparisons

[Figure: training time in seconds (log scale, 0.1 to 1000) of NB, TAN, AODE, KDB k=5 and SKDB on localization, census-income, poker-hand, donation, covtype, uscensus, MITFaceSetA, MITFaceSetB, USPSExtended, MITFaceSetC and kddcup.]

Out-of-core BNCs

• Classification time comparisons

[Figure: classification time in seconds (log scale, 0.1 to 1000) of NB, TAN, AODE, KDB k=5 and SKDB on localization, poker-hand, census-income, donation, covtype, uscensus, MITFaceSetA, USPSExtended, MITFaceSetB, MITFaceSetC and kddcup; the plot is annotated with per-dataset k values from k=2 to k=5, including non-integer values such as k=4.5 and k=4.8.]

Out-of-core SGD

• SGD in Vowpal Wabbit (VW):
  • Squared and logistic loss functions.
  • Quadratic features (best results).
  • Different numbers of passes (3 or 10).
  • Discrete attributes converted into binary features.
  • One-against-all for multiclass classification.

Out-of-core SGD

[Figure: two log-scale scatter plots (0.0001 to 1) of per-dataset RMSE: SKDB (max k = 5) against VWLF, and SKDB (max k = 5) against VWSF.]

Win-draw-loss records:

 | VW (logistic), RMSE | VW (squared), 0-1 loss
Selective KDB | (7+2)-0-7 | 8-1-7

Out-of-core SGD

• Training time comparisons

[Figure: training time in seconds (log scale, 1 to 10000) of VWSF, VWLF and SKDB on localization, census-income, poker-hand, donation, covtype, uscensus, MITFaceSetA, MITFaceSetB, USPSExtended, MITFaceSetC and kddcup.]

Out-of-core SGD

• Classification time comparisons

[Figure: classification time in seconds (log scale, 0.1 to 100) of VWSF, VWLF and SKDB on localization, poker-hand, census-income, donation, covtype, uscensus, MITFaceSetA, USPSExtended, MITFaceSetB, MITFaceSetC and kddcup.]

In-core BayesNet and RF

[Figure: comparisons of k-selective KDB against BayesNet and against RF; x = sampled dataset.]

Win-draw-loss records:

 | BayesNet | RF (Num)
k-selective KDB | 6-3-3 | 5-0-6

In-core BayesNet and RF

• Training time comparisons

[Figure: training time in seconds (log scale, 0.1 to 100000) of BayesNet, RF and SKDB on localization, census-income, donation, covtype, MITFaceSetA, MITFaceSetB and USPSExtended.]

In-core BayesNet and RF

• Classification time comparisons

[Figure: classification time in seconds (log scale, 0.1 to 100) of BayesNet, RF and SKDB on localization, census-income, donation, covtype, MITFaceSetA, MITFaceSetB and USPSExtended.]

Global comparisons

• Cumulative ranking

• Selective KDB copes well with high-dimensional datasets and with datasets of more than 1 million points.

• RF performs better on datasets with a small number of attributes.

• VW has an advantage on sparse numeric data.

Global comparisons

• Ranking

Learner | Rank
NB | 7.63
AODE | 6.81
TAN | 5.88
BayesNet | 3.47
VWLF | 3.44
KDB (best k) | 3.34
RF (Num) | 2.91
SKDB | 2.53

Global comparisons (Time)

[Figure: training and test time in seconds for NB, TAN, AODE, KDB, ksKDB, VWsq and VWlog, shown on a linear scale and on a log scale.]

Stepping back

• The key trick is nested evaluation of a large class of count-based models

• Also works with 2-pass selective evaluation of AnDE models

• Combines the efficiency of simple generative techniques with the power of discriminative learning

• Can use any loss function that is a function of each $\hat{P}(y \mid \mathbf{x})$
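Because selection only needs each predicted distribution $\hat{P}(y \mid \mathbf{x})$ and the true label, any per-example loss of that form could be plugged into the model-selection step. A few such losses as a sketch; the function names are hypothetical, and the RMSE normalisation (dividing by examples × classes) is one common multi-class convention, assumed here rather than taken from the source.

```python
import math


def zero_one_loss(p, y):
    """0 if the most probable class is y, else 1 (ties go to the lowest index)."""
    predicted = max(range(len(p)), key=lambda c: p[c])
    return 0.0 if predicted == y else 1.0


def log_loss(p, y, eps=1e-15):
    """Negative log of the probability given to the true class."""
    return -math.log(max(p[y], eps))


def squared_error(p, y):
    """Sum over classes of (P(c | x) - [c == y])^2."""
    return sum((pc - (1.0 if c == y else 0.0)) ** 2 for c, pc in enumerate(p))


def rmse(predictions, labels):
    """Root of the mean squared error per example and class (assumed convention)."""
    n_classes = len(predictions[0])
    total = sum(squared_error(p, y) for p, y in zip(predictions, labels))
    return math.sqrt(total / (len(labels) * n_classes))


preds = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 0]
for loss in (zero_one_loss, log_loss, squared_error):
    print(loss.__name__, sum(loss(p, y) for p, y in zip(preds, labels)))
print("rmse", rmse(preds, labels))
```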

Future Research

• Numeric attributes
  • it seems that there should be something better than discretisation
  • this remains elusive ...

• Overfitting avoidance

• Increase the number of alternative models considered

• Explore other forms of nested BNCs

• Two-pass Selective KDB
  • sample a small test set in the second pass

• Single-pass generative/discriminative learning
  • initially learn a simple generative model
  • collect discriminative statistics and refine the model when there is sufficient evidence to make a choice
  • refinements might be attribute selection, structural refinement or weighting
  • repeat

Satellite learning curves

[Figure: learning curves (RMSE against training set size) on the satellite dataset for NB, KDB with K = 1 to 8, SKDB K=10 and Random Forest.]

Conclusions

• Large data calls for fundamentally new learning algorithms

• We are pioneering a new generation of theoretically well-founded algorithms that are both scalable to very large data and capable of exploiting the fine detail inherent therein