Page 1:

Learning to Branch
Ellen Vitercik

Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm

Published in ICML 2018

1

Page 2:

Integer Programs (IPs)

maximize c · x
subject to Ax ≤ b
x ∈ {0,1}^n

2

Page 3:

Facility location problems can be formulated as IPs.

3

Page 4:

Clustering problems can be formulated as IPs.

4

Page 5:

Binary classification problems can be formulated as IPs.

5

Page 6:

Integer Programs (IPs)

maximize c · x
subject to Ax = b
x ∈ {0,1}^n

NP-hard

6

Page 7:

Branch and Bound (B&B)

• Most widely used algorithm for solving IPs (CPLEX, Gurobi)

• Recursively partitions the search space to find an optimal solution
• Organizes the partition as a tree

• Many parameters
• CPLEX has a 221-page manual describing 135 parameters

"You may need to experiment."

7

Page 8:

Why is tuning B&B parameters important?

• Save time
• Solve more problems
• Find better solutions

8

Page 9:

B&B in the real world

Delivery company routes trucks daily
Uses integer programming to select routes

Demand changes every day
Solves hundreds of similar optimizations

Using this set of typical problems…

can we learn the best parameters?

9

Page 10:

[Model diagram: an application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)); the algorithm designer uses them to choose B&B parameters]

How to use samples to find best B&B parameters for my domain?

10

Page 11:

Model

Model has been studied in applied communities [Hutter et al. '09]

[Model diagram: distribution → sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) → algorithm designer → B&B parameters]

11

Page 12:

Model

Model has been studied from a theoretical perspective
[Gupta and Roughgarden '16, Balcan et al. '17]

[Model diagram: distribution → sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) → algorithm designer → B&B parameters]

12

Page 13:

Model

1. Fix a set of B&B parameters to optimize

2. Receive sample problems from unknown distribution

3. Find parameters with the best performance on the samples

"Best" could mean smallest search tree, for example

(A^(1), b^(1), c^(1)), (A^(2), b^(2), c^(2))

13

Page 14:

Questions to address

How to find parameters that are best on average over samples?

Will those parameters have high performance in expectation?

[Diagram: sample IPs (A^(1), b^(1), c^(1)), (A^(2), b^(2), c^(2)) and a question mark over a new instance (A, b, c)]

14

Page 15:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms

4. Experiments

5. Conclusion and Future Directions

15

Page 16:

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

16

Page 17:

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root node with LP relaxation solution (1/2, 1, 0, 0, 0, 0, 1), objective value 140]

17

Page 18:

B&B

1. Choose leaf of tree

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root node with LP relaxation solution (1/2, 1, 0, 0, 0, 0, 1), objective value 140]

18

Page 19:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140, branched on x1; x1 = 0 child: (0, 1, 0, 1, 0, 1/4, 1), value 135; x1 = 1 child: (1, 3/5, 0, 0, 0, 0, 1), value 136]

19

Page 20:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140, branched on x1; x1 = 0 child: (0, 1, 0, 1, 0, 1/4, 1), value 135; x1 = 1 child: (1, 3/5, 0, 0, 0, 0, 1), value 136]

20

Page 21:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 0 child: (0, 1, 0, 1, 0, 1/4, 1), value 135; x1 = 1 child: (1, 3/5, 0, 0, 0, 0, 1), value 136, branched on x2; x2 = 0: (1, 0, 0, 1, 0, 1/2, 1), value 120; x2 = 1: (1, 1, 0, 0, 0, 0, 1/3), value 120]

21

Page 22:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 0 child: (0, 1, 0, 1, 0, 1/4, 1), value 135; x1 = 1 child: (1, 3/5, 0, 0, 0, 0, 1), value 136, branched on x2; x2 = 0: (1, 0, 0, 1, 0, 1/2, 1), value 120; x2 = 1: (1, 1, 0, 0, 0, 0, 1/3), value 120]

22

Page 23:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: (1, 3/5, 0, 0, 0, 0, 1), value 136, with x2 = 0: (1, 0, 0, 1, 0, 1/2, 1), value 120 and x2 = 1: (1, 1, 0, 0, 0, 0, 1/3), value 120; x1 = 0 subtree: (0, 1, 0, 1, 0, 1/4, 1), value 135, branched on x6, with x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3 and x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116]

23

Page 24:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: (1, 3/5, 0, 0, 0, 0, 1), value 136, with x2 = 0: (1, 0, 0, 1, 0, 1/2, 1), value 120 and x2 = 1: (1, 1, 0, 0, 0, 0, 1/3), value 120; x1 = 0 subtree: (0, 1, 0, 1, 0, 1/4, 1), value 135, branched on x6, with x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3 and x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116]

24

Page 25:

B&B

1. Choose leaf of tree

2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: value 136, with x2 = 0: value 120 and x2 = 1: value 120; x1 = 0 subtree: value 135, with x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116 and x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3, branched on x3; x3 = 0: (0, 1, 0, 1, 1, 0, 1), value 133; x3 = 1: (0, 4/5, 1, 0, 0, 0, 1), value 118]

25

Page 26:

B&B

1. Choose leaf of tree

2. Branch on a variable

3. Fathom leaf if:
i. LP relaxation solution is integral
ii. LP relaxation is infeasible
iii. LP relaxation solution isn't better than best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: value 136, with x2 = 0: value 120 and x2 = 1: value 120; x1 = 0 subtree: value 135, with x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116 and x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3, branched on x3; x3 = 0: (0, 1, 0, 1, 1, 0, 1), value 133; x3 = 1: (0, 4/5, 1, 0, 0, 0, 1), value 118]

26

Page 27:

B&B

1. Choose leaf of tree

2. Branch on a variable

3. Fathom leaf if:
i. LP relaxation solution is integral
ii. LP relaxation is infeasible
iii. LP relaxation solution isn't better than best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: value 136, with x2 = 0: value 120 and x2 = 1: value 120; x1 = 0 subtree: value 135, with x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116 and x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3, branched on x3; x3 = 0: (0, 1, 0, 1, 1, 0, 1), value 133 (fathomed: integral); x3 = 1: (0, 4/5, 1, 0, 0, 0, 1), value 118]

Page 28:

B&B

1. Choose leaf of tree

2. Branch on a variable

3. Fathom leaf if:
i. LP relaxation solution is integral
ii. LP relaxation is infeasible
iii. LP relaxation solution isn't better than best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x

s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100

x ∈ {0,1}^7

[B&B tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x1 = 1 subtree: value 136, with x2 = 0: value 120 and x2 = 1: value 120; x1 = 0 subtree: value 135, with x6 = 1: (0, 3/5, 0, 0, 0, 1, 1), value 116 and x6 = 0: (0, 1, 1/3, 1, 0, 0, 1), value 133.3, branched on x3; x3 = 0: (0, 1, 0, 1, 1, 0, 1), value 133; x3 = 1: (0, 4/5, 1, 0, 0, 0, 1), value 118]

Page 29:

B&B

1. Choose leaf of tree

2. Branch on a variable

3. Fathom leaf if:
i. LP relaxation solution is integral
ii. LP relaxation is infeasible
iii. LP relaxation solution isn't better than best-known integral solution

This talk: How to choose which variable to branch on?
(Assume every other aspect of B&B is fixed.)

29
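The loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: because the running example is a knapsack-form IP, its LP relaxation is solved greedily (fractional knapsack) in place of a general LP solver, and the branching rule used here (pick the fractional variable) is a placeholder for the score-based rules discussed later.

```python
from fractions import Fraction

# The slides' running example: max c.x  s.t.  w.x <= 100,  x in {0,1}^7
c = [40, 60, 10, 10, 3, 20, 60]
w = [40, 50, 30, 10, 10, 40, 30]
CAP = 100
n = len(c)

def lp_relax(fixed):
    """LP relaxation with the variables in `fixed` pinned to 0/1.

    For a knapsack IP the LP optimum is the greedy fractional solution,
    which has at most one fractional variable."""
    cap = Fraction(CAP - sum(w[i] for i, v in fixed.items() if v == 1))
    if cap < 0:
        return None, None                       # relaxation infeasible
    x = [Fraction(fixed.get(i, 0)) for i in range(n)]
    val = sum(Fraction(c[i]) * x[i] for i in range(n))
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: Fraction(c[i], w[i]), reverse=True)
    for i in free:
        x[i] = min(Fraction(1), cap / w[i])     # take greedily, possibly fractionally
        val += c[i] * x[i]
        cap -= w[i] * x[i]
    return val, x

def branch_and_bound():
    best, tree_size, leaves = Fraction(0), 0, [dict()]
    while leaves:
        fixed = leaves.pop()                    # 1. choose a leaf of the tree
        tree_size += 1
        val, x = lp_relax(fixed)
        if val is None or val <= best:
            continue                            # fathom: infeasible, or bound no better
        frac = [i for i in range(n) if x[i].denominator != 1]
        if not frac:
            best = val                          # fathom: LP solution is integral
            continue
        i = frac[0]                             # 2. branch on a variable
        leaves += [{**fixed, i: 0}, {**fixed, i: 1}]
    return best, tree_size
```

On this instance the search returns the optimal value 133, the integral solution found on the slides; the node order (and hence the tree size) differs from the slides because of the simplified branching rule.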

Page 30:

Variable selection policies can have a huge effect on tree size

30

Page 31:

Outline

1. Introduction

2. Branch-and-Bound
a. Algorithm Overview

b. Variable Selection Policies

3. Learning algorithms

4. Experiments

5. Conclusion and Future Directions

31

Page 32:

Variable selection policies (VSPs)

Score-based VSP:

At leaf Q, branch on the variable x_i maximizing score(Q, i)

Many options! Little is known about which to use when

[Example subtree: node (1, 3/5, 0, 0, 0, 0, 1), value 136, branched on x2 into x2 = 0: (1, 0, 0, 1, 0, 1/2, 1), value 120, and x2 = 1: (1, 1, 0, 0, 0, 0, 1/3), value 120]

32

Page 33:

Variable selection policies

For an IP instance Q:

• Let c_Q be the objective value of its LP relaxation

• Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example.

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

[Root node Q: LP relaxation solution (1/2, 1, 0, 0, 0, 0, 1), c_Q = 140]

33

Page 34:

Variable selection policies

For an IP instance Q:

• Let c_Q be the objective value of its LP relaxation

• Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example.

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

[Tree: root Q with c_Q = 140; x1 = 0 child with c_{Q_1^-} = 135; x1 = 1 child with c_{Q_1^+} = 136]

34
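For this knapsack-form example the LP values can be computed directly. A sketch under the assumption that a greedy fractional-knapsack solve stands in for the LP solver:

```python
from fractions import Fraction

c = [40, 60, 10, 10, 3, 20, 60]
w = [40, 50, 30, 10, 10, 40, 30]
CAP = 100

def c_lp(fixed):
    """Objective value of the LP relaxation of Q with variables pinned per `fixed`."""
    cap = Fraction(CAP - sum(w[i] for i, v in fixed.items() if v == 1))
    if cap < 0:
        return None                              # relaxation infeasible
    val = Fraction(sum(c[i] for i, v in fixed.items() if v == 1))
    for i in sorted((i for i in range(len(c)) if i not in fixed),
                    key=lambda i: Fraction(c[i], w[i]), reverse=True):
        take = min(Fraction(1), cap / w[i])      # greedy by value/weight ratio
        val += c[i] * take
        cap -= w[i] * take
    return val

c_Q       = c_lp({})        # c_Q       = 140
c_Q1minus = c_lp({0: 0})    # c_{Q_1^-} = 135  (x1 fixed to 0)
c_Q1plus  = c_lp({0: 1})    # c_{Q_1^+} = 136  (x1 fixed to 1)
```

The three values match the labels on the slide's tree.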

Page 35:

Variable selection policies

For an IP instance Q:

• Let c_Q be the objective value of its LP relaxation

• Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example.

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

[Tree: root Q with c_Q = 140; x1 = 0 child with c_{Q_1^-} = 135; x1 = 1 child with c_{Q_1^+} = 136]

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]

Branch on the variable x_i maximizing:

score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

35

Page 36:

Variable selection policies

The (simplified) product rule [Achterberg, 2009]

Branch on the variable x_i maximizing:

score(Q, i) = (c_Q − c_{Q_i^-}) · (c_Q − c_{Q_i^+})

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]

Branch on the variable x_i maximizing:

score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

And many more…

36
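Both rules are simple functions of the two bound changes c_Q − c_{Q_i^-} and c_Q − c_{Q_i^+}. A sketch using the root-node numbers from the earlier slides (c_Q = 140, c_{Q_1^-} = 135, c_{Q_1^+} = 136) as example inputs:

```python
from fractions import Fraction

def linear_score(mu, delta_minus, delta_plus):
    """Linear rule: mu * (smaller bound change) + (1 - mu) * (larger bound change)."""
    return mu * min(delta_minus, delta_plus) + (1 - mu) * max(delta_minus, delta_plus)

def product_score(delta_minus, delta_plus):
    """Simplified product rule: product of the two bound changes."""
    return delta_minus * delta_plus

# Root-node bound changes for x1 in the slides' example:
d_minus = 140 - 135   # c_Q - c_{Q_1^-} = 5
d_plus  = 140 - 136   # c_Q - c_{Q_1^+} = 4

print(linear_score(Fraction(1, 2), d_minus, d_plus))  # 9/2
print(product_score(d_minus, d_plus))                 # 20
```

Note that as μ varies, linear_score is a line in μ for each variable; this is what the interval lemma below exploits.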

Page 37:

Variable selection policies

Given d scoring rules score_1, …, score_d.

Goal: Learn the best convex combination μ_1 score_1 + ⋯ + μ_d score_d.

Our parameterized rule

Branch on the variable x_i maximizing:

score(Q, i) = μ_1 score_1(Q, i) + ⋯ + μ_d score_d(Q, i)

37

Page 38:

[Model diagram: an application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)); the algorithm designer uses them to choose B&B parameters]

How to use samples to find best B&B parameters for my domain?

38

Page 39:

[Model diagram: distribution → sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) → algorithm designer → B&B parameters μ_1, …, μ_d]

How to use samples to find the best B&B parameters μ_1, …, μ_d for my domain?

39

Page 40:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms
a. First try: Discretization

b. Our Approach

4. Experiments

5. Conclusion and Future Directions

40

Page 41:

First try: Discretization

1. Discretize parameter space

2. Receive sample problems from unknown distribution

3. Find params in discretization with best average performance

[Plot: average tree size as a function of μ]

41

Page 42:

First try: Discretization

This has been prior work's approach [e.g., Achterberg (2009)].

[Plot: average tree size as a function of μ, evaluated at discretized parameter values]

42

Page 43:

Discretization gone wrong

[Plot: average tree size as a function of μ; every discretized point misses the low-tree-size region]

43

Page 44:

Discretization gone wrong

[Plot: average tree size as a function of μ]

This can actually happen!

44

Page 45:

Discretization gone wrong

Theorem [informal]. For any discretization, there exists a problem instance distribution 𝒟 inducing this behavior.

Proof ideas:

• 𝒟's support consists of infeasible IPs with "easy out" variables
• B&B takes exponential time unless it branches on the "easy out" variables
• B&B only finds the "easy outs" if it uses parameters from a specific range

[Plot: expected tree size as a function of μ]

45

Page 46:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms
a. First try: Discretization

b. Our Approach
i. Single-parameter settings

ii. Multi-parameter settings

4. Experiments

5. Conclusion and Future Directions

46

Page 47:

Simple assumption

There exists a cap κ upper bounding the size of the largest tree we are willing to build

Common assumption, e.g.:

• Hutter, Hoos, Leyton-Brown, Stützle, JAIR '09
• Kleinberg, Leyton-Brown, Lucier, IJCAI '17

47

Page 48:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

(The number of intervals is much smaller in our experiments!)

48

Page 49:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

[Figure: the lines μ · score_1(Q, i) + (1 − μ) · score_2(Q, i) for i = 1, 2, 3, plotted over μ ∈ [0, 1]; the upper envelope splits [0, 1] into intervals on which B&B branches on x2, x3, or x1]

49
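The intervals come from the upper envelope of the per-variable score lines. A sketch that computes, for given (score_1, score_2) pairs (the numbers below are illustrative, not from the slides), the subintervals of [0, 1] on which each variable wins:

```python
from fractions import Fraction

def argmax_intervals(scores):
    """scores[i] = (s1, s2); line_i(mu) = mu*s1 + (1-mu)*s2 on [0, 1].

    Returns (lo, hi, i) triples partitioning [0, 1] so that line i is the
    argmax (i.e., that variable is branched on) throughout [lo, hi]."""
    n = len(scores)
    pts = {Fraction(0), Fraction(1)}
    for i in range(n):
        for j in range(i + 1, n):
            (a1, b1), (a2, b2) = scores[i], scores[j]
            denom = (a1 - b1) - (a2 - b2)
            if denom != 0:                      # the two lines cross once
                m = Fraction(b2 - b1, denom)
                if 0 < m < 1:
                    pts.add(m)
    pts = sorted(pts)
    out = []
    for lo, hi in zip(pts, pts[1:]):
        mid = (lo + hi) / 2                     # argmax is constant inside
        win = max(range(n),
                  key=lambda i: mid * scores[i][0] + (1 - mid) * scores[i][1])
        out.append((lo, hi, win))
    return out

# e.g. three candidate variables with (score_1, score_2) values:
intervals = argmax_intervals([(3, 0), (0, 2), (1, 1)])
```

Each returned interval corresponds to one branching decision; applying the construction recursively down the tree (as in the lemma's proof) refines these intervals.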

Page 50:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

[Figure: for any μ in the yellow interval, B&B branches at the root Q on x2, creating children Q_2^- (x2 = 0) and Q_2^+ (x2 = 1)]

Page 51:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

[Figure: within the yellow interval, the lines μ · score_1(Q_2^-, 1) + (1 − μ) · score_2(Q_2^-, 1) and μ · score_1(Q_2^-, 3) + (1 − μ) · score_2(Q_2^-, 3) subdivide it into a region where B&B branches on x2 then x3 and a region where it branches on x2 then x1]

51

Page 52:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

[Figure: for any μ in the blue-yellow subinterval, B&B branches on x2 at the root Q and then on x3 at the node Q_2^-, creating children x3 = 0 and x3 = 1]

52


Page 53:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0, 1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]

Proof idea.

• Continue dividing [0, 1] into intervals such that within each interval the variable selection order is fixed
• We can subdivide only a finite number of times
• The proof follows by induction on tree depth

53

Page 54:

Learning algorithm

Input: Set of IPs sampled from a distribution 𝒟

For each IP, set μ = 0. While μ < 1:
1. Run B&B using μ · score_1 + (1 − μ) · score_2, resulting in tree 𝒯
2. Find the interval [μ, μ′] such that B&B run with the scoring rule μ″ · score_1 + (1 − μ″) · score_2 builds the same tree 𝒯 for every μ″ ∈ [μ, μ′] (takes a little bookkeeping)
3. Set μ = μ′

Return: Any μ̂ from the interval minimizing average tree size

54
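The interval sweep can be sketched as follows. Here run_bb is a mock standing in for "run B&B at μ and report the interval on which the tree is unchanged" (the bookkeeping step above); the breakpoints and tree sizes are invented for illustration, and a single instance stands in for averaging over the sample.

```python
from fractions import Fraction

# Mock of "run B&B with mu*score1 + (1-mu)*score2": returns the tree size at mu
# and the right endpoint of the interval on which B&B builds the same tree.
# These breakpoints/sizes are made up, purely to illustrate the sweep.
BREAKS = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
SIZES = [17, 5, 11]

def run_bb(mu):
    for k, size in enumerate(SIZES):
        if BREAKS[k] <= mu < BREAKS[k + 1]:
            return size, BREAKS[k + 1]
    return SIZES[-1], Fraction(1)

def learn_mu():
    """Walk mu from 0 to 1 interval by interval; keep the best interval seen."""
    mu, best = Fraction(0), None
    while mu < 1:
        size, hi = run_bb(mu)                  # one B&B run covers all of [mu, hi)
        if best is None or size < best[0]:
            best = (size, mu, hi)
        mu = hi                                # jump straight to the next interval
    size, lo, hi = best
    return (lo + hi) / 2                       # any mu-hat from the best interval

mu_hat = learn_mu()                            # here: 1/2, from interval [1/3, 2/3)
```

Because tree size is piecewise constant in μ (the lemma), one B&B run per interval suffices, which is exactly what avoids the discretization failure shown earlier.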

Page 55:

Learning algorithm guarantees

Let μ̂ be the algorithm's output given Õ(κ³/ε² · ln(# variables)) samples.

W.h.p., 𝔼_{Q∼𝒟}[tree-size(Q, μ̂)] − min_{μ∈[0,1]} 𝔼_{Q∼𝒟}[tree-size(Q, μ)] < ε

Proof intuition: Bound the algorithm class's intrinsic complexity (IC)
• The lemma bounds the number of "truly different" parameters
• Parameters that are "the same" come from a simple set

Learning theory lets us translate IC into sample complexity

55

Page 56:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms
a. First try: Discretization

b. Our Approach
i. Single-parameter settings

ii. Multi-parameter settings

4. Experiments

5. Conclusion and Future Directions

56

Page 57:

Useful lemma: higher dimensions

Lemma: For any d scoring rules and any IP, a set ℋ of O((# variables)^(κ+2)) hyperplanes partitions [0, 1]^d such that: for any connected component R of [0, 1]^d ∖ ℋ, B&B builds the same tree across all μ ∈ R

57

Page 58:

Learning-theoretic guarantees

Fix d scoring rules and draw samples Q_1, …, Q_N ∼ 𝒟

If N = Õ(κ³/ε² · ln(d · # variables)), then w.h.p., for all μ ∈ [0, 1]^d,

| (1/N) Σ_{i=1}^N tree-size(Q_i, μ) − 𝔼_{Q∼𝒟}[tree-size(Q, μ)] | < ε

Average tree size generalizes to expected tree size

58

Page 59:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms

4. Experiments

5. Conclusion and Future Directions

59

Page 60:

Experiments: Tuning the linear rule

Let: score_1(Q, i) = min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

score_2(Q, i) = max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

Our parameterized rule

Branch on the variable x_i maximizing:

score(Q, i) = μ · score_1(Q, i) + (1 − μ) · score_2(Q, i)

This is the linear rule [Linderoth & Savelsbergh, 1999]

60

Page 61:

Experiments: Combinatorial auctions

Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.

"Regions" generator:

400 bids, 200 goods, 100 instances

"Arbitrary" generator:

200 bids, 100 goods, 100 instances

61

Page 62:

Additional experiments

Facility location:

70 facilities, 70 customers,

500 instances

Clustering:

5 clusters, 35 nodes,

500 instances

Agnostically learning

linear separators:

50 points in ℝ²,

500 instances

62

Page 63:

Outline

1. Introduction

2. Branch-and-Bound

3. Learning algorithms

4. Experiments

5. Conclusion and Future Directions

63

Page 64:

Conclusion

• Study B&B, a widely used algorithm for combinatorial problems

• Show how to use ML to weight variable selection rules
• First sample complexity bounds for tree search algorithm configuration
• Unlike prior work [Khalil et al. '16; Alvarez et al. '17], which is purely empirical

• Empirically show our approach can dramatically shrink tree size
• We prove this improvement can even be exponential

• Theory applies to other tree search algorithms, e.g., for solving CSPs

64

Page 65:

Future directions

How can we train faster?
• We don't want to build every tree B&B will make for every training instance
• Train on small IPs and then apply the learned policies on large IPs?

What other tree-building applications can we apply our techniques to?
• E.g., building decision trees and taxonomies

How can we attack other learning problems in B&B?
• E.g., node-selection policies

65

Thank you! Questions?

