Constraint Solving meets Data Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium Guest Lecture @ AI lab, VUB, Belgium 22 Feb. 2013

Constraint Solving

One of the success stories of A.I.

model: declarative specification of constraints
+
search: generic handling of variables and constraints & efficient propagation of individual constraints

Used in scheduling, planning, bio-informatics, game playing, verification, logical reasoning, ...

Data Mining & Machine Learning
“Using historical data to improve decisions”

● Data Mining: discovering knowledge in data
  for example purchasing behaviour, biological data, ...
● Software we can't program by hand
  for example self-driving cars, speech recognition, …
● Self-customizing programs
  for example spam filters, recommender systems, …

Also hyped as data analytics and big data
[T. Mitchell, Machine Learning, 1997]

This lecture

Data Mining & Machine Learning
Constraint Solving

How can the two fields benefit from each other?

● Part A: Introduction
● Part B: Solving in ML & DM
● Part C: Learning in constraint solvers


Part A: introduction

Declarative vs Imperative
Two approaches to problem solving.

Declarative

I want a carpet that is:
● 5m x 2m
● blue
● with fish patterns
● and little ropes on the side

Imperative

Now,
● put a blue wire,
● tighten it,
● add two layers of white,
● make a knot,
● ...


Example: graph coloring

Graph coloring, imperative

Naive:
● depth-first search over countries, do not assign neighbors the same color
● $O(k^n)$, with k = colors, n = nodes/countries

Smart:
● based on the principle of inclusion-exclusion and the zeta transform [Björklund, A.; Husfeldt, T.; Koivisto, M. (2009), "Set partitioning via inclusion–exclusion", SIAM Journal on Computing 39 (2): 546–563]
● $O(n \cdot 2^n)$

Development time vs execution time?
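The naive imperative approach can be sketched in a few lines of Python. This is only an illustration: the function name `color_graph` and the small map of Belgium and its neighbors are assumptions made for the example, not material from the lecture.

```python
def color_graph(neighbors, colors):
    """Depth-first search over nodes; never give two neighbors the same color."""
    nodes = list(neighbors)
    assignment = {}

    def dfs(index):
        if index == len(nodes):
            return True  # every node has a color
        node = nodes[index]
        for color in colors:
            # A color is allowed if no already-colored neighbor uses it.
            if all(assignment.get(other) != color for other in neighbors[node]):
                assignment[node] = color
                if dfs(index + 1):
                    return True
                del assignment[node]  # undo and try the next color
        return False

    return dict(assignment) if dfs(0) else None


# Hypothetical map (adjacency chosen for illustration only).
NEIGHBORS = {
    "BE": ["NL", "DE", "LU", "FR"],
    "NL": ["BE", "DE"],
    "DE": ["BE", "NL", "LU", "FR"],
    "LU": ["BE", "DE", "FR"],
    "FR": ["BE", "DE", "LU"],
}

solution = color_graph(NEIGHBORS, ["red", "green", "blue", "yellow"])
```

In the worst case the search tries every one of the k colors for each of the n nodes, which is where the exponential bound comes from. Note also that the control flow (ordering, undoing, backtracking) is written by hand and is specific to this one problem.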

Graph coloring, declarative

Neighboring countries should have different colors

Graph coloring, declarative

Constraint programming
● Variables, ex. countries
● Domains, ex. colors
● Constraints, ex. neighbor colors
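To make the variables/domains/constraints split concrete, here is a minimal sketch in plain Python. The `solve` function is a stand-in for a real solver (it simply enumerates assignments), and the countries and borders are hypothetical example data; a real CP system would replace `solve` with propagation and search.

```python
from itertools import product


def solve(variables, domains, constraints):
    """Stand-in for a generic solver: return the first assignment satisfying all constraints."""
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for check in constraints):
            return assignment
    return None


# The model: this is the only problem-specific part.
variables = ["BE", "NL", "DE", "LU", "FR"]                             # countries
domains = {v: ["red", "green", "blue", "yellow"] for v in variables}   # colors
borders = [("BE", "NL"), ("BE", "DE"), ("BE", "LU"), ("BE", "FR"),
           ("NL", "DE"), ("DE", "LU"), ("DE", "FR"), ("LU", "FR")]
# One constraint per border: neighbors get different colors.
constraints = [lambda a, x=x, y=y: a[x] != a[y] for x, y in borders]

model_solution = solve(variables, domains, constraints)
```

The point of the design is that the model says nothing about how to search, so the same `solve` could be reused unchanged for a different set of variables, domains and constraints.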

Imperative vs Declarative

Imperative:
● Low-level control
● Typically very fast and efficient
● Very specific for one problem

Declarative:
● high-level modeling
● less fast and scalable than hand-made algorithms
● general and reusable

From imperative to declarative

Example, evolution in databases:
● Data in files, access with IO calls (syscall level)
● Data in records, access by following pointers (data definition language + data manipulation language)
● Data in tables/relations (schema + query language)

Abstraction, reuse, query optimisation, indexing, continued progress and improvements

Warning: generality/efficiency trade-off remains
● NoSQL (key/value) for specialised data (images, graphs) or specific settings (high-speed low-consistency)

AI Research in 2012

Recent conferences and journals have:
● Search and Planning
● SAT and Constraints
● Probabilistic Planning
● Probabilistic Reasoning
● Inference in First-Order Logic
● Machine Learning & Data Mining
● Natural Language
● Vision and Robotics
● Multi-agent systems

Trend from imperative to declarative:
● SAT solvers
● CP solvers
● BDDs
● MIP & QP
● Bayesian Networks
● Markov Logic
● POMDPs
● ...

[H. Geffner, AI: From Programs to Solvers, Turing Session, ECAI-2012]

Constraint Solving

An incomplete categorisation

[Figure: solver families placed along a symbolic to numeric axis. Labels shown: SAT, SMT, ASP, FO(.), CP, constraint-based local search, LNS, knowledge compilation (BDD, ADD), mathematical programming (LP, MIP, convex optimisation), weighted CP, pseudo-boolean.]

Declarative Constraint Solving

Mantra:

Constraint Solving = Model (by the user) + Search (by a solver)

SAT solving
● Propositional Satisfiability
● Example: the 'frietkot' problem

SAT solving
● Propositional Satisfiability
● First proven NP-complete problem (Cook, 1971)
● Input: clauses
  ● X \/ Y \/ -Z
  ● -X \/ Z
● Output: UNSATISFIABLE, or, assignment to variables that satisfies all clauses
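The input/output contract can be illustrated with a brute-force checker on the two clauses above. This is a sketch only: real SAT solvers use far more sophisticated search, and the `(variable, wanted_value)` encoding of literals is an assumption made for readability.

```python
from itertools import product


def solve_sat(variables, clauses):
    """Try every assignment; return the first one that satisfies all clauses, else None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause holds if at least one of its literals has the wanted truth value.
        if all(any(assignment[var] == wanted for var, wanted in clause)
               for clause in clauses):
            return assignment
    return None  # UNSATISFIABLE


# X \/ Y \/ -Z   and   -X \/ Z
CLAUSES = [
    [("X", True), ("Y", True), ("Z", False)],
    [("X", False), ("Z", True)],
]

model = solve_sat(["X", "Y", "Z"], CLAUSES)
unsat = solve_sat(["X"], [[("X", True)], [("X", False)]])
```

Here `model` is a satisfying assignment for the two clauses, while `unsat` is `None` because X and -X cannot both hold.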

SAT solving

Advantages/disadvantages:
✔ Extremely optimised solvers
  ● standard input format (making comparison easy)
  ● yearly competitions
  ● sustained scientific progress

SAT solver research

[Figure: number of problems solved versus time (max 1200 sec), with series labelled by year from 2002 to 2011]

SAT solving

Advantages/disadvantages:
✔ Extremely optimised solvers
  ● yearly competitions
  ● standard input format (making comparison easy)
  ● sustained yearly progress
✗ No support for modelling high-level problems

Constraint Programming, example
● variables: $[E_{11}, \ldots, E_{99}]$
● domains: $E_{xy} = \{1, \ldots, 9\}$
● constraints:
  all_different($[E_{1x}]$), ...
  all_different($[E_{x1}]$), ...
  all_different($[E_{11}, \ldots, E_{33}]$), ...

[Figure: grid annotated with all_diff constraints]
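The model above (81 variables with domain 1 to 9, and all_different over what appear to be the rows, columns and 3x3 blocks of a Sudoku-style grid) can be written down as a checker. This is a sketch under that reading of the slide; the helper names and the example grid are hypothetical, and indices are 0-based here while the slide counts from 1.

```python
def all_different(values):
    """True if no value occurs twice."""
    return len(set(values)) == len(values)


def constraint_groups():
    """The 27 groups of cells that each get one all_different constraint."""
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    blocks = [[(br + r, bc + c) for r in range(3) for c in range(3)]
              for br in range(0, 9, 3) for bc in range(0, 9, 3)]
    return rows + cols + blocks


def satisfies_model(grid):
    """Check the domains and every all_different constraint on a full 9x9 grid."""
    in_domain = all(1 <= grid[r][c] <= 9 for r in range(9) for c in range(9))
    return in_domain and all(
        all_different([grid[r][c] for r, c in group]) for group in constraint_groups()
    )


# A valid grid built from a shifting pattern (illustrative data only).
GRID = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
```

A CP solver does more than check: given a partially filled grid it would propagate each all_different constraint and search for the remaining values.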

Constraint Programming
● CSP or COP (satisfaction / optimisation)
● Solving combinatorial problems (typically in NP): scheduling, routing, planning, ...
● Input: Variables, Domains, Constraints
● High level modeling languages (Zinc, Essence, OPL)
● 'global constraints'

CP Search

Two key principles:
● Propagation of constraints
  e.g. alldiff(X,Y,Z) X={1},Y={1,2},Z={1,2,3,4} → Y={2},Z={3,4}
  Every constraint is implemented by a propagator.
● Branch over values of variables
  e.g. Propagation at fixpoint → branch over Z={3}

Search is recursive and complete
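The two principles can be sketched as follows. The propagator here is a deliberately simple one (it only removes the value of every fixed variable from the other domains), which is weaker than the alldiff propagators used in real solvers; the function names are assumptions for this example.

```python
def propagate_alldiff(domains):
    """Remove the value of every fixed variable from the other domains, until fixpoint."""
    domains = {var: set(values) for var, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for var, values in domains.items():
            if len(values) == 1:
                (value,) = values
                for other, other_values in domains.items():
                    if other != var and value in other_values:
                        other_values.discard(value)
                        changed = True
    return domains


def search(domains):
    """Propagate, then branch over the values of the first unfixed variable."""
    domains = propagate_alldiff(domains)
    if any(not values for values in domains.values()):
        return None  # a domain became empty: this branch fails
    unfixed = [var for var, values in domains.items() if len(values) > 1]
    if not unfixed:
        return {var: next(iter(values)) for var, values in domains.items()}
    var = unfixed[0]
    for value in sorted(domains[var]):
        result = search({**domains, var: {value}})
        if result is not None:
            return result
    return None


START = {"X": {1}, "Y": {1, 2}, "Z": {1, 2, 3, 4}}
after_propagation = propagate_alldiff(START)
first_solution = search(START)
```

On the example from the slide, propagation alone narrows the domains to Y={2} and Z={3,4}; the search then branches on Z, trying Z=3 first.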

Declarative constraint solving

Advantages
● general approach
● reuse of solvers, modeling primitives

Disadvantages
● need expertise (good model/bad model)
● search heuristics have a huge impact on performance

Use historical data to improve decisions?
→ Machine Learning

What DM & ML offers to CS

Data Mining & Machine Learning → Constraint Solving

The use of historical data:
● Learning/improving models (constraints)
● Learning/improving search strategies, solver selection, heuristics, etc.

What CS offers to DM & ML

Constraint Solving → Data Mining & Machine Learning

● Declarative: model + search
● Decomposability and reuse
● General: many tasks, variations
● Rapid prototyping, iterative process

This lecture

Data Mining & Machine Learning
Constraint Solving

How can the two fields benefit from each other?

● Part A: Introduction
● Part B: Solving in ML & DM
● Part C: Learning in constraint solvers


Part B: Solving in ML & DM

Data Mining and Machine Learning

Symbolic
● Rule learning
● Decision trees
● Clustering
● Pattern Mining
● ...
Mostly using hand-crafted algorithms: Ripper, C4.5, k-means, Apriori, ...

Numeric
● Regression
● SVMs
● Matrix factorisation
● ...
Mostly using numeric optimisation: least squares, gradient descent, convex optimisation, ...


[Slide by E. Ricci, Analysis of Patterns, 2009]


Declarative methods in ML

Often problem expressible as linear/convex opt. problem
→ can solve with standard ILP/QP solvers

In practice, specialised solvers are often used:
● faster, lighter implementations
● special-purpose decompositions
● dealing with large and sparse data
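As one well-known instance of a learning task written as a convex optimisation problem, consider the soft-margin SVM (SVMs are listed among the numeric methods in this lecture). The formulation below is the standard textbook one, not taken from the slides; the symbols $w$, $b$, $\xi_i$, $C$ and the training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ are introduced here for illustration.

```latex
\min_{w,\, b,\, \xi} \;\; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \qquad
y_i \left( w^\top x_i + b \right) \geq 1 - \xi_i, \quad
\xi_i \geq 0, \quad i = 1, \ldots, n
```

The objective is quadratic and the constraints are linear, so a generic QP solver could in principle handle it, even though dedicated SVM solvers are usually preferred for the reasons listed above.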

Data Mining and Machine Learning

Symbolic
● Rule learning
● Decision trees
● Clustering
● Pattern Mining
● ...
Mostly using hand-crafted algorithms: Ripper, C4.5, k-means, Apriori, ...
Can declarative methods be used here too?

Numeric
● Regression
● SVMs
● Matrix factorisation
● ...
Mostly using numeric optimisation: least squares, gradient descent, convex optimisation, ...

Basic mining task

Analysing a dataset to find patterns of interest

For example:
Analysing purchases (e.g. books)
Here, patterns are sets of 'items'


Patterns of interest:

● which patterns are frequent?

● which patterns have a high average price?

● which patterns are frequent on one dataset and infrequent on the other?

● which patterns are significant w.r.t a background model?

● ...

→ specified by constraints

Constraint-based Pattern Mining
● Numerous constraints proposed
● Numerous algorithms developed

Yet,
● new constraints mostly require new implementations
● very hard to combine different constraints

Surprisingly, CP had not been applied to Pattern Mining

Pattern Mining

Depends on type of data: Itemset Mining, Text Mining, Graph Mining

Well, there's egg and spam; egg sausage and spam; spam and bacon; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam; spam spam spam egg and spam; spam spam spam spam baked beans spam spam spam spam; or Lobster Thermidor, a Crevette with a mornay sauce served in a Provencale manner with shallots and aubergines garnished with truffle pate, brandy and spam.

Pattern Mining

Depends on type of data: Itemset Mining (most fundamental setting), Text Mining, Graph Mining


sets of items

Itemset Mining

Find: set of items appearing frequently

Example:

{ , }: frequency = 2


Declarative approach

coverage( , ) = { , }

frequency( , ) = 2

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow (\{I_1, \ldots, I_n\} \subseteq \mathrm{row}_t)$

frequency: $\sum_t T_t \geq Freq$

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \sum_i I_i (1 - D_{ti}) = 0$

frequency: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$
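The coverage and frequency constraints can be read directly as code. The sketch below enumerates all item vectors I, derives T from the coverage constraint and keeps the itemsets that pass the frequency constraint. It is a brute-force illustration, not how a CP solver works; the toy database `D`, the threshold `FREQ` and the function names are assumptions for the example.

```python
from itertools import product

# Hypothetical toy database: D[t][i] = 1 if transaction t contains item i.
D = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
FREQ = 2


def coverage(I, D):
    """T_t = 1  <=>  sum_i I_i * (1 - D_ti) = 0."""
    return [int(sum(I[i] * (1 - row[i]) for i in range(len(I))) == 0) for row in D]


def is_frequent(I, T, D, freq):
    """For every selected item i: the sum of T_t over t with D_ti = 1 is >= freq."""
    return all(
        sum(T[t] for t in range(len(D)) if D[t][i] == 1) >= freq
        for i in range(len(I)) if I[i] == 1
    )


def frequent_itemsets(D, freq):
    """Enumerate every 0/1 item vector and keep those satisfying both constraints."""
    n_items = len(D[0])
    found = []
    for I in product([0, 1], repeat=n_items):
        T = coverage(I, D)
        if is_frequent(I, T, D, freq):
            found.append({i + 1 for i in range(n_items) if I[i]})  # items numbered from 1
    return found


result = frequent_itemsets(D, FREQ)
```

With this encoding the empty itemset is also a solution, since the frequency constraint only restricts selected items.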


CP4IM, basic model


Traditional search: proj. database

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \bigwedge_{i: D_{ti}=0} \neg I_i$

freq >= 2: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \bigwedge_{i: D_{ti}=0} \neg I_i$

freq >= 2: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$

● propagate i2
  Intuition: infrequent i2 can never be part of freq. superset

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \bigwedge_{i: D_{ti}=0} \neg I_i$

freq >= 2: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$

● propagate i2
● propagate t1
  Intuition: unavoidable t1 will always be covered

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \bigwedge_{i: D_{ti}=0} \neg I_i$

freq >= 2: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$

● propagate i2
● propagate t1
● branch i1=1

CP for Itemset Mining

coverage: $\forall t: T_t = 1 \Leftrightarrow \bigwedge_{i: D_{ti}=0} \neg I_i$

freq >= 2: $\forall i: I_i = 1 \Rightarrow \sum_{t: D_{ti}=1} T_t \geq Freq$

● propagate i2
● propagate t1
● branch i1=1
● ...
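The two propagation steps named on these slides (an infrequent item is removed, an unavoidable transaction is covered) can be sketched as a small fixpoint loop over 0/1 domains. The database below is hypothetical toy data, not necessarily the one pictured in the lecture, and `propagate` is an illustrative name rather than a function of any real solver.

```python
# Hypothetical toy database: D[t][i] = 1 if transaction t contains item i.
D = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]


def propagate(D, freq, items, trans):
    """items[i] and trans[t] are domains (subsets of {0, 1}); they are narrowed in place."""
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            # Frequency: count transactions that contain i and can still be covered.
            best_support = sum(1 for t in range(len(D)) if D[t][i] == 1 and 1 in trans[t])
            if best_support < freq and 1 in items[i]:
                items[i].discard(1)  # an infrequent item can never be in a frequent itemset
                changed = True
        for t in range(len(D)):
            missing = [i for i in range(len(items)) if D[t][i] == 0]
            if all(items[i] == {0} for i in missing) and 0 in trans[t]:
                trans[t].discard(0)  # unavoidable: every item it lacks is excluded
                changed = True
            if any(items[i] == {1} for i in missing) and 1 in trans[t]:
                trans[t].discard(1)  # it lacks a selected item, so it cannot be covered
                changed = True
    return items, trans


item_domains = [{0, 1} for _ in range(4)]
trans_domains = [{0, 1} for _ in range(3)]
propagate(D, 2, item_domains, trans_domains)
```

On this toy data with a frequency threshold of 2, the second item is fixed to 0 and the first transaction is fixed to 1, after which nothing more can be deduced and the search would have to branch, for example on the first item.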

More constraints
● Coverage (required): $T_t = 1 \Leftrightarrow \sum_i I_i (1 - D_{ti}) = 0$
● Frequent: $I_i = 1 \Rightarrow \sum_t T_t D_{ti} \geq Freq$
● Maximal: $I_i = 1 \Leftrightarrow \sum_t T_t D_{ti} \geq Freq$
● Closed: $I_i = 1 \Leftrightarrow \sum_t T_t (1 - D_{ti}) = 0$
● Delta-closed: $I_i = 1 \Leftrightarrow \sum_t T_t (1 - \delta - D_{ti}) \leq 0$
● ...

+ combinations


Generality

Simple Itemset Mining (coverage + frequency)

[Figure: runtime (s), axis ticks from 0.05 to 500, versus minimum support from 35% down to 1%. Legend: FIMCP, PATTERNIST, LCM5.3, LCM2.5, MAFIA, B_ECLAT, B_FPGROWTH, B_APRIORI, DMCP. Annotations: CP (Gecode); specialised systems.]

Constraint-based mining

[Figure: runtime (s), axis ticks from 0.1 to 1000, versus MaxAvgCost from 5 to 35. Legend: FIM_CP_1%, FIM_CP_5%, FIM_CP_10%, PATTER_1%, PATTER_5%, PATTER_10%, LCM_10%. Annotations: CP (Gecode); specialised systems.]

Correlated itemset mining

Also known as: discriminative itemset mining, contrast set mining, emerging itemsets, subgroup discovery, ...

● Given: labelled transactions
● Find: the itemset that best correlates with the class label

Correlation constraint

$f\left(\sum_{t \in P} T_t, \sum_{t \in N} T_t\right) \geq Bound$

● Existing pruning technique:
  only uses upper-bound of $\sum T$
● Our CP-based propagator:
  uses upper- and lower-bound of $\sum T$ and look-ahead formulation $I_i = 1 \Rightarrow ...$

much stronger propagation!


Correlated itemset mining

Runtime in seconds

Decreasing the gap

An integrated CP solver would:
● use principles of both IM and CP
● focus on constraints for itemset mining

[Figure: runtime plot, axis ticks from 0.05 to 500, versus minimum support from 35% down to 1%]

Hypothesis: unnecessary overhead in CP solver

Itemset Mining principles
● Search strategy: Level-wise, BFS, DFS
● Representation of data: horizontal (rows 1 0 1 1, 1 1 0 1, 0 0 1 1) or vertical (columns 110, 010, 101, 111)
● Representation of sets: dense or sparse, e.g. 1 0 1 1 = {1, 3, 4} and 0 1 0 0 = {2}
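The representations shown above can be reproduced in a few lines. This is a minimal sketch using the small binary matrix from the slide; the function names are assumptions for illustration.

```python
# Horizontal representation: one row per transaction.
D_HORIZONTAL = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]


def to_vertical(rows):
    """Vertical representation: one column (list of transaction bits) per item."""
    return [list(column) for column in zip(*rows)]


def dense_to_sparse(bits):
    """Dense Boolean vector -> sparse set of 1-based positions."""
    return {index + 1 for index, bit in enumerate(bits) if bit}


def sparse_to_dense(items, size):
    """Sparse set of 1-based positions -> dense Boolean vector of the given size."""
    return [1 if index in items else 0 for index in range(1, size + 1)]


D_VERTICAL = to_vertical(D_HORIZONTAL)
```

Which representation is faster depends on the operation: the vertical form gives direct access to all transactions containing an item, while the sparse form is compact when only a few entries are set.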

Integration 1/3

|                 | Eclat Miner      | Gecode CP Solver                | DMCP CP Solver                    |
| Search strategy | DFS              | DFS (binary)                    | DFS (binary)                      |
| Repres. of data | Shared, vertical | In constraints (up to 4 copies) | Shared matrix (default: vertical) |

● Data shared (read-only) by constraints
● Horizontal, positive and negative views available

Integration 2/3

|                 | Eclat Miner          | Gecode CP Solver    | Our DMCP CP Solver   |
| Repres. of sets | Sparse or Dense      | Sparse              | Sparse or Dense      |
| Types of vars.  | Boolean vector (set) | Bool, Int, Set, ... | Boolean vector (set) |

● Represented by lower and upper bound:
  0, 0/1, 0/1, 1
  Min: {0, 0, 0, 1}
  Max: {0, 1, 1, 1}

Integration 3/3

|                   | Eclat Miner                 | Gecode CP Solver          | Our DMCP CP Solver          |
| Constraints       | Few, hard to combine        | Many, easy to add/combine | Some, easy add/combine      |
| Constraint activ. | Strict order (in algorithm) | On domain change          | Change of lower/upper bound |

● General matrix constraint: [Figure labels: data representation (matrix), Boolean vectors]

Frequent Itemset Mining

[Figure: Mushroom (Frequent). Runtime (s), axis ticks from 0.05 to 500, versus minimum support from 35% down to 1%. Legend: FIMCP, PATTERNIST, LCM5.3, LCM2.5, MAFIA, B_ECLAT, B_FPGROWTH, B_APRIORI, DMCP. Annotations: Old, Gecode; New, DMCP.]

Frequent Itemset Mining, scaling

[Figure: T10I4D100K (Frequent). Runtime (s), axis ticks from 0.05 to 500, versus minimum support from 50% down to 0.01%. Legend: FIMCP, PATTERNIST, LCM5, LCM2, MAFIA, B_ECLAT, B_FPGROWTH, B_APRIORI, DMCP. Annotations: Old, Gecode; New, DMCP.]

Closed Itemset Mining

[Figure: Splice (Closed). Runtime (s), axis ticks from 0.05 to 500, versus minimum support from 35% down to 1%. Legend: FIMCP, PATTERNIST, LCM5.3, LCM2.5, MAFIA, B_ECLAT, B_FPGROWTH, B_APRIORI, DMCP. Annotations: Old, Gecode; New, DMCP.]

CP for Itemset Mining

Advantages of CP modelling (model):
● Easily add new constraints
● Freely combine constraints

Advantages of IM/CP solver integration (search):
● Theoretical: polynomial delay analysis
● Practical: remove efficiency/scalability gap

Data Mining and Machine Learning

Symbolic
● Rule learning
● Decision trees
● Clustering
● Pattern Mining
● ...
Mostly using hand-crafted algorithms: Ripper, C4.5, k-means, Apriori, ...

Can declarative methods be used here too?

Correlated Itemset Mining [Nijssen et al. KDD 09]
using CP [Bessiere et al. CP 09]
SAT & clustering [Davidson et al. SDM 10]
Itemset Mining: [Guns et al. KDD08, AIJ 12] [many more]
Sequence mining: [Coquery et al. ECAI 12], ...

Open Questions
● More tasks, different types of problems
● Structured data (sequences, trees, graphs)
● Efficiency/generality trade-off
● Scalability and specialised solvers
● High-level modeling language for DM?

This lecture

Data Mining & Machine Learning
Constraint Solving

How can the two fields benefit from each other?

● Part A: Introduction
● Part B: Solving in ML & DM
● Part C: Learning in constraint solvers


Part C: Learning in constraint solvers

Declarative Constraint Solving

Mantra:

Constraint Solving = Model (by the user) + Search (by a solver)

Declarative Constraint Solving

Advantages
● general approach
● reuse of solvers, modeling primitives

Disadvantages
● need expertise (good model/bad model)
● search heuristics have a huge impact on performance

Use historical data to improve decisions?
→ Machine Learning

No Free Lunch theorem
No single algorithm is best on all problem instances

→ Can we characterize/learn which algorithm is best on which problem instance?

In many competitions (SAT, CP, rostering, …), big gap between 'single best solver' and 'oracle solver'

Empirical hardness models
Hardness of a problem vs design choices in an algo.
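The gap between the 'single best solver' and the 'oracle solver' is easy to compute once per-instance runtimes are available. The numbers and solver names below are invented purely for illustration.

```python
# Hypothetical runtimes in seconds: one list entry per problem instance.
RUNTIMES = {
    "solver_a": [1.0, 50.0, 3.0, 200.0],
    "solver_b": [20.0, 2.0, 40.0, 5.0],
    "solver_c": [10.0, 10.0, 10.0, 10.0],
}


def single_best_solver(runtimes):
    """The one solver with the lowest total runtime over all instances."""
    return min(runtimes, key=lambda solver: sum(runtimes[solver]))


def oracle_total(runtimes):
    """Total runtime if the best solver could be picked separately for every instance."""
    return sum(min(times) for times in zip(*runtimes.values()))


best = single_best_solver(RUNTIMES)
best_total = sum(RUNTIMES[best])
oracle = oracle_total(RUNTIMES)
```

In this made-up example the most robust solver wins overall without being the fastest on any single instance, and the oracle is several times faster; that difference is what algorithm selection tries to recover.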

● given a number of solvers, which solver to choose?
  → algorithm selection (also known as: portfolios, meta-learning, ...)

● given an algorithm with a number of parameters, which parameter values to set?
  → algorithm configuration (= alg. selection when small parameter space)

Algorithm selection
General approach:

1. Collect solvers

2. Collect problem instances/data

3. Calculate features on instance/data

4. Build predictive model

[Kotthoff, Survey, 2012]
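The four steps can be sketched end to end on toy data. This is only an illustration under stated assumptions: the solvers, feature vectors and runtimes are invented, and a k-nearest-neighbour rule stands in for the predictive model of step 4.

```python
import math

# Step 1: hypothetical solvers.
solvers = ["solver_a", "solver_b"]

# Steps 2 and 3: training instances, represented by made-up feature vectors.
train_features = [(0.1, 4.0), (0.2, 4.2), (0.9, 1.0), (0.8, 1.5)]

# Observed runtimes (seconds) of each solver on each training instance.
train_runtimes = [
    {"solver_a": 2.0, "solver_b": 60.0},
    {"solver_a": 3.0, "solver_b": 45.0},
    {"solver_a": 80.0, "solver_b": 1.0},
    {"solver_a": 70.0, "solver_b": 4.0},
]


def select_solver(features, k=1):
    """Step 4: pick the solver with the lowest total runtime
    on the k training instances closest to the new instance."""
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(features, train_features[i]))
    neighbours = order[:k]
    return min(solvers,
               key=lambda s: sum(train_runtimes[i][s] for i in neighbours))


print(select_solver((0.15, 4.1)))
print(select_solver((0.85, 1.2), k=3))
```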

Algorithm selection
1. Collect solvers
● Availability of solvers?
● Too many solvers?
● Diversity of solvers?

Algorithm selection
2. Collect problem instances/data
● Similar problems? (SAT vs CP)
● Representative set of data?
● Diversity of instances/data?

Algorithm selection
3. Calculate features on instance/data
● What features to use?
  → Domain specific!
● Number of features to use?
● Normalisation? Feature selection? Stacking?

Features are arguably the MOST IMPORTANT choice (in ML in general)
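As an illustration of domain-specific features, the sketch below computes a few cheap syntactic features of a SAT instance and applies min-max normalisation. It assumes DIMACS CNF input with one clause per line and at least one variable and one clause; the chosen features and function names are illustrative, not the feature set of any particular system.

```python
def cnf_features(dimacs_text):
    """A few cheap syntactic features of a CNF formula in DIMACS format.

    Assumes one clause per line, each terminated by 0.
    """
    n_vars = 0
    clause_lengths = []
    for line in dimacs_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "c":
            continue  # blank line or comment
        if tokens[0] == "p":
            n_vars = int(tokens[2])  # header: p cnf <vars> <clauses>
            continue
        literals = [t for t in tokens if t != "0"]
        if literals:
            clause_lengths.append(len(literals))
    n_clauses = len(clause_lengths)
    return {
        "n_vars": n_vars,
        "n_clauses": n_clauses,
        "clause_var_ratio": n_clauses / n_vars,
        "mean_clause_len": sum(clause_lengths) / n_clauses,
        "frac_binary": sum(1 for n in clause_lengths if n == 2) / n_clauses,
    }


def min_max_normalise(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]


example = """c toy instance
p cnf 3 4
1 -2 0
2 3 0
-1 -3 0
1 2 3 0
"""
features = cnf_features(example)
print(features)
```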

Algorithm selection
4. Build predictive model
● What model?
  ● per solver, e.g. regression?
  ● per pair of solvers, classification?
  ● per portfolio, classification or ranking?
● Sensitivity to features? (e.g. noise, redundancy)
● Clustering / hierarchical models?
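A sketch of the 'per solver, regression' option: one regression model per solver predicts the logarithm of its runtime, and the solver with the lowest prediction is chosen. The data is invented, a single feature is assumed, and plain least squares stands in for whatever regression method a real system would use.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical training data: one feature value per instance,
# and the runtime (seconds) of each solver on that instance.
feature = [1.0, 2.0, 3.0, 4.0]
runtimes = {
    "solver_a": [1.0, 10.0, 100.0, 1000.0],  # slower as the feature grows
    "solver_b": [500.0, 50.0, 5.0, 0.5],     # faster as the feature grows
}

# One model per solver, fitted on log10(runtime).
models = {s: fit_line(feature, [math.log10(t) for t in ts])
          for s, ts in runtimes.items()}


def predict_best(x):
    """Return the solver with the lowest predicted log runtime at feature x."""
    predicted = {s: a + b * x for s, (a, b) in models.items()}
    return min(predicted, key=predicted.get)


print(predict_best(1.5), predict_best(3.5))
```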

SATzilla 2007: winning the SAT competition
1. Solvers:
● From previous competitions (~20)
● Subset selection to select ~10 diverse ones (as measured by repeatedly building portfolios)
● Two pre-solvers (limited amount of time)
● One backup solver (if all else fails)

2. Problem instances/data
● From previous competitions

SATzilla 2007: winning the SAT competition
3. Calculate features on instance/data
● 64 SAT-specific features
● feature selection
● speed to compute vs gain in prediction performance

4. Build predictive model
● ridge regression of log(runtime)
● censored data/timeouts
● hierarchical: predict sat/unsat
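How pre-solvers, feature computation, the predictive model and the backup solver fit together at run time can be sketched as follows. This is a simplified illustration, not SATzilla's actual code: the function signatures, the toy solvers and the rule that the backup solver runs when feature computation fails are assumptions.

```python
def run_portfolio(instance, presolvers, compute_features, select, solvers,
                  backup, presolve_limit=5.0):
    """Return (solver_name, result) using a pre-solve / select / backup scheme."""
    # 1. Try each pre-solver for a short time; stop as soon as one succeeds.
    for name in presolvers:
        result = solvers[name](instance, presolve_limit)
        if result is not None:
            return name, result
    # 2. Compute features; fall back to the backup solver if that fails.
    try:
        features = compute_features(instance)
    except Exception:
        return backup, solvers[backup](instance, None)
    # 3. Run the solver chosen by the predictive model, without a time limit.
    chosen = select(features)
    return chosen, solvers[chosen](instance, None)


def make_solver(speed):
    """Toy solver: needs work / speed seconds, returns None on timeout."""
    def solve(instance, time_limit):
        needed = instance["work"] / speed
        if time_limit is not None and needed > time_limit:
            return None
        return "solved"
    return solve


# Hypothetical solvers, feature computation and selection rule.
solvers = {
    "presolver": make_solver(10.0),
    "main_a": make_solver(1.0),
    "main_b": make_solver(2.0),
    "backup": make_solver(0.5),
}


def compute_features(instance):
    if instance.get("too_big"):
        raise MemoryError("feature computation failed")
    return (instance["work"],)


def select(features):
    return "main_a" if features[0] < 100 else "main_b"


def run(instance):
    return run_portfolio(instance, ["presolver"], compute_features, select,
                         solvers, backup="backup")


print(run({"work": 20}), run({"work": 80}), run({"work": 500, "too_big": True}))
```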

SATzilla 2007: winning the SAT competition

[Figure: results on the HANDMADE category; labels: Pre-solving, Feat. Comp., Oracle, SATzilla07]

SATzilla 2011: more success
● More features (138)
● Learn best pre-solvers
● Feature computation prediction (2 levels of features)
● Backup solver: not best overall, but best on feature-timeout instances

Learning method:
● Replace regression by pairwise classification
● Take cost of misclassification (runtime) into account
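A sketch of pairwise classification with a runtime-based misclassification cost. For every pair of solvers a small classifier votes for the one it expects to be faster, and the solver with the most votes is selected. Here a k-nearest-neighbour vote weighted by the runtime gap stands in for the actual classifier; the data and names are invented.

```python
import itertools
import math

# Hypothetical training data: (feature vector, runtime per solver in seconds).
train = [
    ((0.1,), {"a": 1.0, "b": 20.0, "c": 5.0}),
    ((0.2,), {"a": 2.0, "b": 30.0, "c": 2.5}),
    ((0.8,), {"a": 90.0, "b": 3.0, "c": 40.0}),
    ((0.9,), {"a": 60.0, "b": 1.0, "c": 50.0}),
]
solvers = ["a", "b", "c"]


def pairwise_winner(x, s1, s2, k=2):
    """Cost-sensitive k-NN for one pair of solvers: each neighbour votes
    for the faster solver, weighted by the runtime gap between the two."""
    neighbours = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    score = sum(rt[s2] - rt[s1] for _, rt in neighbours)  # > 0: s1 is faster
    return s1 if score >= 0 else s2


def select(x):
    """Run all pairwise classifiers and return the solver with most votes."""
    votes = {s: 0 for s in solvers}
    for s1, s2 in itertools.combinations(solvers, 2):
        votes[pairwise_winner(x, s1, s2)] += 1
    return max(solvers, key=lambda s: votes[s])


print(select((0.15,)), select((0.85,)))
```

Weighting by the runtime gap means that a wrong choice between two nearly equal solvers costs little, while a wrong choice between a fast and a very slow solver costs a lot.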

[Figure labels: Random, Oracle, Handmade, Industrial]

General lessons
● A good classifier alone is not enough to win a competition (pre-solvers, backup solver, time of feature calculation)

● Need good features (and feature selection)

● Hierarchical models: clustering problem instances and having separate portfolios offers gain

● The best classifiers take the actual runtime into account (new: and cost of misclassification?)

Why does it work?
Machine Learning perspective:
● Similar to ensembles: combining predictions = minimising the variance

● Many more interesting (underexplored?) connections to boosting, bagging, and ensembles.

In the SAT community: even a single solver has large variance in runtime on a single input file (heuristic choices)
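The variance claim for ensembles can be made precise with the standard averaging argument. As an assumption for this sketch, take n predictors f_1, ..., f_n, each with variance σ² and pairwise correlation ρ (these symbols are not from the slides):

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} f_i\right)
  = \frac{1}{n^2}\left(n\sigma^2 + n(n-1)\rho\sigma^2\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2
```

For uncorrelated predictors (ρ = 0) the variance drops to σ²/n, so the less correlated the combined predictors (or solvers) are, the more there is to gain, which matches the emphasis on diversity when collecting solvers.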

Open Questions
● Bounds on improvement? Learning theory?
● Limited use of probabilistic techniques?

In the wider context of improving constraint solving:
● Learning parameters [iRace] or entire search strategies [grammar approach IRIDIA]?
● Learning model reformulations? [ModRef]
● Learning constraints/models? [ConAcq, ModelSeeker]

This lecture

[Diagram: Data Mining & Machine Learning and Constraint Solving]

How can the two fields benefit from each other?

● Part A: Introduction
● Part B: Solving in ML & DM
● Part C: Learning in constraint solvers

What CS offers to DM & ML

● Declarative: model + search
● Decomposability and reuse
● General: many tasks, variations
● Rapid prototyping, iterative process

[Diagram: Data Mining & Machine Learning and Constraint Solving]

What DM & ML offer to CS

The use of historical data:
● Learning/improving models (constraints)
● Learning/improving search strategies, solver selection, heuristics, parameters, etc.

[Diagram: Data Mining & Machine Learning and Constraint Solving]

Thank you for listening

[Diagram: Data Mining & Machine Learning and Constraint Solving]

Questions?

Possible task:
SATzilla data:

http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/

Do solver selection with ML techniques:
● Features matter!
● Imperative or declarative learning methods?
● Ease of modification/improvement of the learning technique?
● Additional improvements by tuning with, e.g., iRace?

