Summer School-2014
Irith Ben-Arroyo Hartman
Datasim project (joint work with Abed Abu dbai, Elad Cohen, Daniel Keren)
University of Haifa, Israel
July, 2014 1
SOLVING LARGE CARPOOLING PROBLEMS USING GRAPH THEORETIC TOOLS
Outline of talk
1. Background
2. Matching Problem
  • Matching in bipartite graphs
  • The greedy matching algorithm
  • Why does greedy perform so well?
  • The assignment problem
3. General Problem formulation
  • Is there an efficient algorithm?
  • What do we know about the general problem?
  • f-factors in bipartite graphs
4. Heuristics for the general problem
  • Theoretical upper bounds
  • Approximation algorithms
5. Other related problems
BACKGROUND
Why carpool? Defining the graph model
Why carpool?
• Reduces the number of cars on the roads.
• Saves time and petrol, and reduces traffic congestion, noise, air pollution, demand for parking spaces, stress and accidents.
• Encourages sociability
Background
• At IMOB (Transportation Research Institute, University of Hasselt), an automatic carpooling service is being designed.
• People register their periodic trip executions (PTEs), e.g. a periodic trip on Monday from A to B leaving at about 9:00am,
• together with the following information:
  • origin and destination,
  • earliest and latest departure and arrival times,
  • maximal acceptable detour distance,
  • capacity of the car (if available).
• Personal information: age, gender, educational level, special interests, etc.
Background
• The system suggests to individuals that they share a car.
• The individuals evaluate the suggestion, negotiate it and, possibly, agree to carpool.
• After the drive, the individuals evaluate each other, and the system uses this feedback for future trips.
The problem – informal description
• Given:
  1. a set of trips (periodic trip executions),
  2. some trip owners own a car and some don't,
  3. a compatibility measure w_xy of x riding in the car owned by y,
  4. the capacity of every car,
• can we match people so as to minimize the number of cars and maximize the total compatibility between passengers and drivers?
Defining the Graph Model
[Figure: passenger x and owner of vehicle (driver) y, joined by an edge of weight w_xy; C(y) is the capacity of y's car.]
w_xy takes into account the origins and destinations of x and y, their departure and arrival times, the maximal detour distance, time flexibility, the profiles of the passenger and the potential driver, and feedback from passengers and drivers.
C(y) is the capacity of the car: how many people it can hold, including the driver.
How is w_xy computed?
• Use Path Similarity:
0 < pathSim(A, B) ≤ 1
When is it close to 0? When is it 1?
• Use Time Interval Similarity
• Use Profile Similarity (age, gender, income category, job type, music preference, etc.)
• Use Reputation (safety, timeliness)
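The slides do not say how these components are combined into a single weight. As a purely illustrative sketch, assume each component is already a score in [0, 1], that w_xy is a weighted average with hypothetical coefficients, and that time-interval similarity is the overlap of two departure windows divided by the length of their union. All names and coefficients below are assumptions, not the project's definitions.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # e.g. minutes after midnight
    end: float


def interval_similarity(a: Interval, b: Interval) -> float:
    """Hypothetical time similarity: |a ∩ b| / |a ∪ b| (0 if the windows are disjoint)."""
    overlap = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = max(a.end, b.end) - min(a.start, b.start)
    return overlap / union if union > 0 else 1.0


def compatibility(path_sim: float, time_sim: float, profile_sim: float,
                  reputation: float, coeffs=(0.4, 0.3, 0.2, 0.1)) -> float:
    """Hypothetical w_xy: a convex combination of component scores in [0, 1]."""
    parts = (path_sim, time_sim, profile_sim, reputation)
    assert all(0.0 <= s <= 1.0 for s in parts), "scores must lie in [0, 1]"
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients must sum to 1"
    return sum(c * s for c, s in zip(coeffs, parts))
```

For windows 8:30-9:00 and 8:45-9:15 (minutes 510-540 and 525-555) the overlap is 15 minutes out of 45, a time similarity of 1/3.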
Assume we allow one passenger in a car
• How do we model the problem?
• What do we optimize?
THE MATCHING PROBLEM
Matchings in bipartite graphs
The greedy matching algorithm
Why does greedy perform so well?
The assignment problem
Assume we allow one passenger in a car
• Definitions: A matching in a graph G=(V,E) is a collection of vertex-disjoint edges.
• A matching is maximum (or maximum weight) if no other matching has larger cardinality (or larger weight).
• A matching is maximal if no other matching properly contains it, i.e., no edge of G can be added to it (with positive weights, such a matching also cannot be extended to one of larger weight).
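These definitions can be checked mechanically. A minimal sketch, assuming a simple undirected graph given as a list of vertex pairs. On the path a-b-c-d, the single edge {b, c} is a maximal matching that is not maximum, since {a, b}, {c, d} is larger.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def is_matching(m: Iterable[Edge]) -> bool:
    """A matching: no vertex is an endpoint of two edges of m (loops are rejected)."""
    seen: Set[str] = set()
    for u, v in m:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_maximal(m: Iterable[Edge], edges: Iterable[Edge]) -> bool:
    """Maximal: m is a matching and every edge of G touches a matched vertex."""
    m: List[Edge] = list(m)
    if not is_matching(m):
        return False
    covered = {x for e in m for x in e}
    return all(u in covered or v in covered for u, v in edges)
```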
Example: matching
Example: maximal matching
Prove this is a maximum matching
A matching is defined on undirected graphs. What do we do if the graph is directed?
Matching in bipartite graphs
• Definition: A bipartite graph is a graph G=(V,E) where V = V1 ∪ V2, V1 ∩ V2 = ∅, and every edge has one endpoint in V1 and the other in V2.
[Figure: a bipartite graph with V1 (passengers) on one side and V2 (drivers) on the other]
Matching algorithms
• If the graph is bipartite and unweighted: the augmenting-path algorithm (often also called the "Hungarian algorithm"), O(|V||E|), or Hopcroft-Karp, O(|E||V|^{1/2}).
• If the graph is bipartite and weighted: the Kuhn-Munkres algorithm, O(|V|^2|E|).
• If the graph is general and unweighted: Edmonds' (1965) algorithm, improved by Micali-Vazirani to O(|E||V|^{1/2}).
• If the graph is general and weighted: Edmonds' algorithm, improved by Galil to O(|V||E|log|V|).
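The slides only quote running times. As a sketch of the simplest bipartite case, here is the augmenting-path method, assuming the graph is given as a dict from each passenger to the drivers they may ride with. Hopcroft-Karp, Kuhn-Munkres and the blossom algorithms are not shown.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Augmenting-path maximum matching in a bipartite graph, O(|V||E|).

    adj maps each left vertex (passenger) to its right neighbours (drivers).
    Returns a dict right_vertex -> left_vertex.
    """
    match_right: Dict[str, str] = {}

    def try_augment(u: str, visited: Set[str]) -> bool:
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if v not in match_right or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return match_right
```

For example, with `{"a": ["x", "y"], "b": ["x"]}` the second passenger triggers an augmenting path that moves a from x to y, giving `{"x": "b", "y": "a"}`.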
The greedy algorithm for max weight matching
In the worst case, w(greedy)/w(optimal) = 1/2.
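A minimal sketch of the greedy rule, assuming edges are (u, v, weight) triples. On the path a-b-c-d with weights 1, 1.01, 1, greedy takes the middle edge first and blocks both outer edges, so it collects 1.01 while the optimum is 2; as the middle weight approaches 1, the ratio approaches 1/2.

```python
from typing import List, Tuple

WEdge = Tuple[str, str, float]


def greedy_matching(edges: List[WEdge]) -> List[WEdge]:
    """Scan edges by decreasing weight; keep an edge if both endpoints are free.

    The result is always a maximal matching whose weight is at least half
    the weight of a maximum-weight matching.
    """
    matched = set()
    result: List[WEdge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            result.append((u, v, w))
            matched.update((u, v))
    return result
```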
Worst-case scenario of greedy matching
[Figures: the optimal matching, then the optimal matching alongside the greedy matching]
Performance of the Greedy Algorithm
[Figure: accuracy histogram. x-axis: accuracy (greedy/K-M), with ticks from 0.958 to 0.978; y-axis: number of graphs (0 to 150).]
1000 graphs, each of size 500x500 with 10% edge density.
Good (and surprising!?) news!
Why does the greedy heuristic perform so well?
Theorem (P. Erdős 1961):
A random graph in $G_{n,p}$ almost surely has stability number at most $2p^{-1}\log n$.
Idea of proof: let X be the number of stable sets of cardinality k+1 in G, and compute
$E(X) = \binom{n}{k+1}(1-p)^{\binom{k+1}{2}}$.
When $k = 2p^{-1}\log n$, $E(X) \to 0$, so $P(X = 0) \to 1$ as $n \to \infty$,
implying that a random graph almost surely has stability number at most k.
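The omitted first-moment step can be filled in as follows, assuming log is the natural logarithm and using $1-p \le e^{-p}$ and $\binom{n}{k+1} \le \big(\tfrac{en}{k+1}\big)^{k+1}$:

```latex
E(X) = \binom{n}{k+1}(1-p)^{\binom{k+1}{2}}
  \le \Big(\frac{en}{k+1}\Big)^{k+1} e^{-p\,k(k+1)/2}
  = \Big(\frac{en}{k+1}\, e^{-pk/2}\Big)^{k+1}
  = \Big(\frac{e}{k+1}\Big)^{k+1} \longrightarrow 0
  \quad \text{for } k = 2p^{-1}\ln n,
\qquad
P(X \ge 1) \le E(X) \to 0 .
```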
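1. We use the fact that if G has a maximal (not maximum!) matching of size n-k, then G has a stable set of size at least k: the vertices left unmatched are pairwise non-adjacent, since otherwise the matching could be extended.
2. Since greedy always returns a maximal matching, we conclude that in $G_{n,p}$ the greedy matching algorithm almost surely finds a matching of size at least $n - 2p^{-1}\log n$.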
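A small experiment illustrating step 1, assuming a general (non-bipartite) random graph G(n, p) generated with Python's `random` module: whatever edge order is used, the vertices left unmatched by a maximal matching form a stable set, so there can never be more of them than the stability number. The helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def gnp(n: int, p: float, seed: int = 0) -> List[Edge]:
    """Random graph G(n, p) on vertices 0..n-1, as an edge list."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def maximal_matching(n: int, edges: List[Edge]):
    """Scan edges once, keeping each edge whose endpoints are both free.
    The result is maximal; the unmatched vertices are returned as well."""
    matched, m = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            m.append((u, v))
            matched.update((u, v))
    return m, [x for x in range(n) if x not in matched]


def is_stable(vertices, edges: List[Edge]) -> bool:
    """True if no edge has both endpoints in `vertices`."""
    s = set(vertices)
    return not any(u in s and v in s for u, v in edges)


def stability_bound(n: int, p: float) -> float:
    """The bound 2 p^{-1} log n from the theorem (natural log assumed)."""
    return 2 / p * math.log(n)
```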
Assignment Problem (bipartite weighted graphs)
[Figure: a bipartite graph with edge weights w]
Incremental approach:
• Given an optimal weighted matching M in G, quickly find an optimal matching M’ in G’, where G’ (the ‘perturbed graph’) differs from G in a relatively small number of edges.
• We can estimate how far w(M) is from the optimal solution of G’.
• We can use the optimal matching and cover of G to quickly find an optimal, or ‘good enough’, solution for the perturbed graph G’.
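A sketch of the estimation step only (not an incremental Kuhn-Munkres implementation), assuming the optimal solution for G comes with a non-negative dual cover: labels u on the passenger side and v on the driver side with u[i] + v[j] ≥ w_ij on every edge, as Kuhn-Munkres produces. The old matching, restricted to edges that survive in G', gives a lower bound on the optimum of G'; repairing the cover on violated edges gives an upper bound by weak LP duality. The function name is hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (passenger i, driver j)


def incremental_bounds(w_new: Dict[Edge, float], old_matching: Set[Edge],
                       u: Dict[int, float], v: Dict[int, float]) -> Tuple[float, float]:
    """Bracket OPT(G') using an optimal matching and cover of G.

    Lower bound: the old matching, restricted to edges still present in G',
    is a matching of G', so its new weight is at most OPT(G').
    Upper bound: raise u[i] just enough to cover every violated edge of G';
    any cover with u, v >= 0 and u[i] + v[j] >= w_ij has value >= OPT(G').
    """
    u = dict(u)  # do not modify the caller's cover
    lower = sum(w_new[e] for e in old_matching if e in w_new)
    for (i, j), w in w_new.items():
        deficit = w - (u.get(i, 0.0) + v.get(j, 0.0))
        if deficit > 0:
            u[i] = u.get(i, 0.0) + deficit
    upper = sum(u.values()) + sum(v.values())
    return lower, upper
```

In the test instance, raising the weight of edge (1, 0) from 1 to 6 yields the bracket [5, 9]; the true optimum of G' is 7.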
Comparing K-M to incremental K-M
[Figure, left panel "Average run time", Kuhn-Munkres incremental vs. Kuhn-Munkres: x-axis generation (0 to 40), y-axis CPU time in seconds (0 to 18); series KM and KM-inc.]
[Figure, right panel "Run time ratio", Kuhn-Munkres incremental vs. Kuhn-Munkres: x-axis generation (0 to 40), y-axis KM-inc total CPU time / KM total CPU time (0.1 to 1.1).]
GENERAL PROBLEM FORMULATION
Is there an efficient algorithm?
What do we know about the general problem?
f-factors in bipartite graphs
What is different if we allow more than one passenger in a car?
General Problem – Formal description
Definitions: A directed star consists of a root r and arcs (x, r) directed from each of its leaves x into r.
A star partition is a covering of V by vertex-disjoint directed stars.
A feasible star partition is a star partition in which each star with root r has in-degree at most c(r), and r has a loop.
Problem formulation:
Given G=(V,E), c:V -> N, w:E -> (0,1)
Find a feasible star partition of V(G) such that the sum of the weights of all the edges in the stars is maximized.
[Figure: a directed star with root r and c(r) = 5]
General Problem Formulation -LP
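The LP itself is not reproduced in this text. One integer program consistent with the definitions above, offered as an assumption rather than the authors' exact formulation, uses a variable x_ij for every arc (i, j) ∈ E, including loops: x_ij = 1 means that i rides with root j, and x_jj = 1 means that j drives (x_jj is taken as 0 if j has no loop). Replacing the last line by 0 ≤ x_ij ≤ 1 gives the LP relaxation.

```latex
\begin{aligned}
\max\quad & \sum_{(i,j) \in E} w_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j : (i,j) \in E} x_{ij} = 1 && \forall i \in V \quad \text{(each vertex lies in exactly one star)} \\
& x_{ij} \le x_{jj} && \forall (i,j) \in E \quad \text{(only drivers receive passengers)} \\
& \sum_{i : (i,j) \in E} x_{ij} \le c(j)\, x_{jj} && \forall j \in V \quad \text{(in-degree, loop included, at most } c(j)\text{)} \\
& x_{ij} \in \{0, 1\} && \forall (i,j) \in E
\end{aligned}
```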
Does this problem minimize the number of drivers?
Assume the capacity of each vertex is 6. Weights of the edges are denoted in the graph. All other edges in the graph have weight 0.
Example:
[Figure: a graph with edge weights 1, 1, 1, 1, 0.6, 0.6, 0.6]
What is max Σ w_ij x_ij subject to the LP constraints? It is 4, with 2 drivers.
But the minimum number of drivers is 1, with total weight 3.8.
Assume the capacity of each vertex is 6. Weights of the edges are denoted in the graph. All other edges in the graph have weight 0.
When does an optimal star partition give the minimum number of drivers?
• Problem 1: Given G=(V,E), w:E ->R, c:V -> N, find a feasible star partition that covers a set of edges of maximum weight.
• Problem 2: Given G=(V,E), c:V -> N, find a feasible star partition with a minimum number of stars.
• Claim: If w_ij = 1 for every existing edge (except for loops, which have w_ii = 0), then Problems 1 and 2 are equivalent.
Proof of Claim
• Every star with c legs covers c+1 vertices.
• If we have a star partition with d stars, then the total number of edges covered by the stars is |V| - d.
• Thus minimizing the number of stars is equivalent to maximizing the number of edges covered by the stars, which, with unit weights, is the total weight.
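A minimal checker for the argument above, assuming a star partition is stored as a dict from each root (driver) to its list of leaves (passengers), and that capacities count the driver, as C(y) does. The final assertion inside the function is exactly the identity |V| - d. The function name is hypothetical.

```python
from typing import Dict, List, Set


def check_star_partition(stars: Dict[str, List[str]], vertices: Set[str],
                         capacity: Dict[str, int]) -> int:
    """Validate a feasible star partition; return the number of non-loop edges.

    Feasible: every vertex lies in exactly one star, and each root's
    in-degree, loop included, is at most capacity[root].
    """
    seen: List[str] = []
    for root, leaves in stars.items():
        if 1 + len(leaves) > capacity[root]:
            raise ValueError(f"star rooted at {root} exceeds its capacity")
        seen.extend([root, *leaves])
    if len(seen) != len(set(seen)) or set(seen) != vertices:
        raise ValueError("the stars do not partition V")
    edges = sum(len(leaves) for leaves in stars.values())
    assert edges == len(vertices) - len(stars)  # |V| - d
    return edges
```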
IS THERE AN EFFICIENT ALGORITHM FOR THE GENERAL CARPOOLING PROBLEM?
No! The problem is NP-complete.
Proof of NP-completeness
• Claim 1: When the drivers are unknown and the edge weights are 0/1, the problem is NP-hard.
• Proof: Reduction from the Minimum Dominating Set problem.
• A dominating set in a graph G = (V,E) is a subset $V' \subseteq V$ such that for every $u \notin V'$ there exists some $v \in V'$ adjacent to it.
• Given a directed graph G = (V,E) and an integer k > 0, does there exist a dominating set of size at most k? This is an NP-complete problem.
Proof of NP-completeness
[Figure: the construction from a graph G, with capacity C(v) = max-degree]
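One direction of the reduction, sketched under the assumption that capacities are large enough for no star to overflow: a dominating set D gives a star partition with |D| stars, hence, with 0/1 weights and weight-0 loops, total weight |V| - |D|. Conversely, the roots of any star partition dominate G, since every leaf has an arc to its root. The function name is hypothetical.

```python
from typing import Dict, List, Set


def stars_from_dominating_set(adj: Dict[str, Set[str]],
                              dom: Set[str]) -> Dict[str, List[str]]:
    """Turn a dominating set D into a star partition with |D| stars.

    adj[u] is the set of vertices u may ride with (arc u -> v). Every vertex
    of D becomes a root; every other vertex joins some neighbour in D.
    """
    stars: Dict[str, List[str]] = {d: [] for d in dom}
    for u in adj:
        if u in dom:
            continue
        roots = sorted(adj[u] & dom)
        if not roots:
            raise ValueError(f"{u} is not dominated")
        stars[roots[0]].append(u)
    return stars
```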
Proof of NP-completeness
Claim 2: When the drivers are unknown, the edge weights are 0/1 and c(v) ≤ 3, the problem is NP-hard.
Proof: Reduction from the problem of partitioning into paths of length two.
What about the case when the drivers are known in advance?
C = 3. Is this an optimal solution? If yes, prove it! If not, give a better solution.
[Figure: bipartite graph with V1 (passengers) and V2 (drivers)]
An f-factor in a graph
• Let $f: V \to \mathbb{N}$. An f-factor is a collection of edges $E' \subseteq E$ such that every vertex $v \in V$ is incident to exactly f(v) edges of E'.
• Q1: What is a 1-factor?
• Q2: For the carpool problem when the drivers are known and every driver can take at most 4 passengers, what are we looking for?
Example
[Figure: bipartite graph with V1 (passengers) and V2 (drivers)]
$f(v) = 1$ if $v \in V_1$, and $f(v) = 2$ if $v \in V_2$.
How do we find an f-factor, or a maximum partial f-factor in a bipartite graph?
Convert G to G’, and look for a 1-factor in G’.
[Figure: G, with sides V1 and V2, converted to G', with sides V1' and V2']
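A minimal sketch of the conversion for the case in the example, assuming f(v) = 1 for every passenger, so only drivers need to be split: driver d is replaced by f(d) copies, each joined to all of d's neighbours, and a maximum matching in G' is a maximum partial f-factor in G. (When both sides have f > 1, naive splitting can use the same edge twice, so a different gadget is needed.) The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Copy = Tuple[str, int]


def max_partial_f_factor(adj: Dict[str, List[str]],
                         f: Dict[str, int]) -> List[Tuple[str, str]]:
    """Maximum partial f-factor of a bipartite graph with f = 1 on passengers.

    Split each driver d into copies (d, 0), ..., (d, f[d] - 1), join every
    copy to d's neighbours, and find a maximum matching by augmenting paths.
    """
    copies = {d: [(d, k) for k in range(f[d])] for d in f}
    adj2 = {p: [c for d in ds for c in copies.get(d, [])] for p, ds in adj.items()}
    match: Dict[Copy, str] = {}

    def augment(p: str, seen: Set[Copy]) -> bool:
        for c in adj2[p]:
            if c not in seen:
                seen.add(c)
                if c not in match or augment(match[c], seen):
                    match[c] = p
                    return True
        return False

    for p in adj2:
        augment(p, set())
    # Collapse the copies back to the original drivers.
    return sorted((p, d) for (d, _), p in match.items())
```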
What do we know about the general problem?
There are 8 possible scenarios, depending on these 3 questions:
• Capacity 1/2 or general?
  • Capacity 1/2 is a matching problem; efficient algorithms exist.
  • General capacity is a star partition problem, intractable when the drivers are unknown.
• Edge weights 0/1 or (0,1)?
  • 0/1 edge weights correspond to an unweighted graph.
  • (0,1) edge weights correspond to a weighted graph.
• Drivers known or unknown a priori?
  • If the drivers are known, the graph is bipartite (edges among passengers, or among drivers, are irrelevant).
  • If the drivers are not known, the graph is general, and for capacity greater than 2 the problem is intractable.
Summary of all possible scenarios

0/1 edge weights
                    Unknown drivers                          Known drivers
1/2 capacity        Max matching in general graphs           Max bipartite matching
General capacity    NP-hard (even for capacity 3)            Can be reduced to max bipartite matching

General edge weights
                    Unknown drivers                          Known drivers
1/2 capacity        Max weight matching in general graphs    Max weighted bipartite matching (assignment problem)
General capacity    NP-hard (even for capacity 3)            Can be reduced to max weighted bipartite matching
HEURISTICS FOR THE GENERAL PROBLEM
Challenge: the problem is NP-hard for c > 2
Theoretical upper bounds
Approximation algorithms
Basic greedy heuristic for the general carpooling problem
Given G=(V,E), c:V -> N, w:E -> (0,1) and a subset D of V consisting of potential drivers:
Repeatedly take the heaviest remaining edge, as long as adding it does not violate the star-family constraints.
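A minimal sketch of this rule, assuming each edge is a (passenger, driver, weight) triple meaning "x rides in y's car", D is given as a set, and capacities count the driver (so y may take c(y) - 1 passengers). The refinements listed below are not included, and the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_star_partition(edges: List[Tuple[str, str, float]], drivers: Set[str],
                          cap: Dict[str, int]) -> Dict[str, List[str]]:
    """Scan edges x -> y heaviest first and accept x as y's passenger when the
    result is still a feasible star family."""
    passengers: Dict[str, List[str]] = {}  # driver -> its passengers
    riding: Set[str] = set()               # vertices already assigned as passengers
    for x, y, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        if x == y or y not in drivers or y in riding or x in riding:
            continue
        if passengers.get(x):              # x already drives someone
            continue
        if len(passengers.get(y, [])) >= cap[y] - 1:
            continue                       # y's car is full
        passengers.setdefault(y, []).append(x)
        riding.add(x)
    # Potential drivers left unassigned drive alone; other vertices stay unmatched.
    for d in drivers:
        if d not in riding:
            passengers.setdefault(d, [])
    return passengers
```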
Better (linear) heuristics for the general problem
We take into account the following considerations:
1. It is preferable to match non-drivers before potential drivers, since potential drivers, if unmatched, can always drive their own vehicle.
2. Since we would like to minimize the number of vehicles, it is preferable to assign passengers to vehicles that already contain passengers rather than to 'new' vehicles.
3. If a new vehicle is used, it is preferable to use a vehicle with a larger capacity rather than a small one.
Greedy heuristics for the general problem
A different approach: giving potential drivers a weight function
Other greedy heuristics...
• Other heuristics are also possible, for example:
  • Pick a vertex v in D with the highest total weight of in-edges. Add the c(v) heaviest in-edges of v, remove v and its passengers from the graph, and continue.
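A sketch of this heuristic, assuming (passenger, driver, weight) edge triples and capacities that count the driver. Because c(v) includes the driver, the sketch attaches the c(v) - 1 heaviest in-edges; this seat count is an assumption and should be changed if capacities exclude the driver. Ties are broken alphabetically, and the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def heaviest_driver_first(edges: List[Tuple[str, str, float]], drivers: Set[str],
                          cap: Dict[str, int]) -> Dict[str, List[str]]:
    """Repeatedly pick the remaining potential driver with the largest total
    in-edge weight, seat its heaviest in-neighbours, and delete them all."""
    remaining = {x for e in edges for x in e[:2]} | set(drivers)
    stars: Dict[str, List[str]] = {}
    while remaining & drivers:
        live = [(x, y, w) for x, y, w in edges
                if x in remaining and y in remaining and x != y]
        in_weight = {d: 0.0 for d in remaining & drivers}
        for _x, y, w in live:
            if y in in_weight:
                in_weight[y] += w
        v = max(sorted(in_weight), key=in_weight.get)
        best = sorted((e for e in live if e[1] == v), key=lambda e: e[2], reverse=True)
        chosen = [x for x, _, _ in best[:cap[v] - 1]]
        stars[v] = chosen
        remaining -= {v, *chosen}
    return stars
```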
Heaviest-driver heuristic, V1
Heaviest-driver heuristic, V3
1. “Guess” a set of drivers (the ‘heaviest drivers’),
2. Try to match all passengers to the drivers (a bipartite graph problem)
3. If this does not succeed, add drivers until it does.
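A sketch of steps 1-3, assuming a hypothetical "heaviness" score per potential driver (for instance, total in-edge weight), that every person not chosen as a driver must be seated, and that capacities count the driver. The feasibility test in step 2 is a bipartite matching in which each driver d is split into c(d) - 1 seats. Both function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Seat = Tuple[str, int]


def can_seat_all(passengers: Set[str], drivers: List[str],
                 adj: Dict[str, Set[str]], cap: Dict[str, int]) -> bool:
    """Can every passenger get a seat? adj[p] = drivers p may ride with."""
    seats = {d: [(d, k) for k in range(cap[d] - 1)] for d in drivers}
    owner: Dict[Seat, str] = {}

    def augment(p: str, seen: Set[Seat]) -> bool:
        for d in sorted(adj.get(p, set())):
            for s in seats.get(d, []):
                if s not in seen:
                    seen.add(s)
                    if s not in owner or augment(owner[s], seen):
                        owner[s] = p
                        return True
        return False

    return all(augment(p, set()) for p in sorted(passengers))


def heaviest_drivers_v3(people: Set[str], score: Dict[str, float],
                        adj: Dict[str, Set[str]], cap: Dict[str, int]) -> Optional[List[str]]:
    """Take the k heaviest potential drivers for k = 1, 2, ... until all others fit."""
    ranked = sorted(score, key=lambda d: (-score[d], d))
    for k in range(1, len(ranked) + 1):
        drivers = ranked[:k]
        if can_seat_all(people - set(drivers), drivers, adj, cap):
            return drivers
    return None  # even with every potential driver, someone has no seat
```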
Greedy with local adjustments of weights
THEORETICAL UPPER BOUNDS
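Naïve upper bound on a star family
• Every star contains at most c vertices, so the number of stars is at least n/c.
• A star partition with d stars uses n - d edges, so the number of edges chosen is at most (c-1)n/c.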
Better upper bound on a star family
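• Assume $c(v_1) \ge c(v_2) \ge \dots \ge c(v_n)$.
• Take the smallest k such that $\sum_{i=1}^{k} c(v_i) \ge n$; then at least k stars are needed, so at most n - k edges are chosen.
• Now sort E in descending order by w: $w(e_1) \ge w(e_2) \ge \dots \ge w(e_m)$.
• Then $\max w(H) \le \sum_{i=1}^{n-k} w(e_i)$.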
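A direct transcription of the bound, assuming capacities count the driver (so a star rooted at v contains at most c(v) vertices) and non-negative weights. With equal capacities c it reduces to the naive bound above. The function name is hypothetical.

```python
from typing import List


def star_family_upper_bound(capacities: List[int], weights: List[float]) -> float:
    """Upper bound on max w(H) over feasible star families.

    Sort capacities decreasingly and take the smallest k with
    c(v_1) + ... + c(v_k) >= n: at least k stars are needed, so at most
    n - k edges are chosen, weighing at most the n - k heaviest edges.
    """
    n = len(capacities)
    total, k = 0, 0
    for c in sorted(capacities, reverse=True):
        total += c
        k += 1
        if total >= n:
            break
    heaviest = sorted(weights, reverse=True)
    return sum(heaviest[: n - k])
```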
How do we evaluate the heuristics?
• We can compare the weight w(H) of the selected star family across heuristics, as well as the number of unmatched vertices U.
• We can compare running times between heuristics.
• We can compare w(H) to a theoretical upper bound:
Comparison with the optimal solution for c = 2, i.e., finding a maximum weight matching
How do we compare to the optimal solution if we cannot compute it?
1. Idea: assume c = 5 and w = 1.
2. “Plant” an optimal solution.
3. Add edges to “hide” it.
4. Run the heuristic algorithms and see whether they “find” it.
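A minimal sketch of steps 2 and 3 with a hypothetical generator. Because every vertex has capacity c and every weight is 1, the naive bound caps any star family at (c - 1)n/c edges; the planted family of n/c full stars attains that value, so it is guaranteed to be optimal even after extra edges hide it.

```python
import random
from typing import Dict, List


def planted_instance(num_stars: int, c: int = 5, extra_edges: int = 50, seed: int = 0):
    """Plant num_stars stars of c vertices each (one root, c - 1 leaves, all
    weights 1), then add random weight-1 arcs between arbitrary vertices."""
    rng = random.Random(seed)
    n = num_stars * c
    vertices = list(range(n))
    rng.shuffle(vertices)
    planted: Dict[int, List[int]] = {}
    edges = set()
    for s in range(num_stars):
        root, *leaves = vertices[s * c:(s + 1) * c]
        planted[root] = leaves
        edges.update((x, root) for x in leaves)  # arc x -> root: x rides with root
    while len(edges) < (c - 1) * num_stars + extra_edges:
        x, y = rng.sample(range(n), 2)  # two distinct vertices
        edges.add((x, y))
    return n, planted, sorted(edges)
```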
“plant” an optimal solution
Add edges to “hide” it
Comparison with a known optimal solution
Results on real data
Approximation algorithms
• An algorithm is a p-approximation algorithm if, for every input x, it returns a solution of value f(x) with $p \cdot OPT \le f(x) \le OPT$.
• E.g. the greedy matching algorithm is a ½-approximation algorithm.
• What does the greedy star partition algorithm give?
Greedy star forest is a 1/c-approximation (e.g., c = 5)
[Figures: the optimal solution, and the greedy solution]
Can we find a heuristic that is better than a 1/c-approximation?
• If the graph is undirected, there is a ½ - approximation algorithm for the star forest problem. [Nguyen, Shen, Hou, Sheng, Miller and Zhang]
1. Take a maximum weight spanning tree t.
2. Pick either the odd layers or the even layers of t, whichever is heavier.
3. Get a star forest of weight at least ½ the optimum star forest.
$w(SF) \ge \tfrac{1}{2}\, w(T) = \tfrac{1}{2}\, OPT_{t}(G) \ge \tfrac{1}{2}\, OPT_{sf}(G)$
Example
1. Take a maximum weight spanning tree
Example
2. Pick either the odd or the even layers of the tree
What about directed graphs?
1. Take a maximum weight reverse-arborescence (algorithm by Edmonds).
What are the problems with this algorithm?
1. Is a max weight “reverse-arborescence” heavier than a max weight star forest?
2. How can we guarantee that the in-degree of every vertex is not greater than its capacity c(v)?
Solution to problem 1
[Figure: graph G, with edges labelled W = 0]
Additional Extensions
1. Carpooling with Aversion:
• Some passengers do not want to share a ride with some other specific passengers.
• We have shown that this problem is NP-complete even when the drivers are known (reduction from the minimum vertex-colouring problem).
2. Carpooling with Attachment:
• Some passengers prefer to be in the same ride as certain other passengers.
• We have shown that this problem is NP-complete even when the drivers are known (reduction from the knapsack problem).
3. Carpooling with VIP passengers:
• Some passengers do not want to be “squeezed” into a carpool; they want to share a ride with only a few others.
• NP-complete, as above.
Conclusions
• We have proved that the general carpooling problem is NP-hard
• Found fast algorithms, including incremental ones, for the case of bipartite graphs.
• Devised six different heuristics for the general problem and implemented them on real data.
• Compared the heuristics in terms of running time and performance.
• Compared them to the optimal matching in general graphs (c = 2) and to various upper bounds for the general problem.
• Challenge: Find a good approximation algorithm and show it is close enough to the optimal solution.
THANK YOU!
Questions?