Regret Minimization and Job Scheduling
Yishay Mansour, Tel Aviv University
Decision Making under uncertainty
• Online algorithms
  – Stochastic models
  – Competitive analysis
  – Absolute performance criteria
• A different approach:
  – Define "reasonable" strategies
  – Compete with the best (in retrospect)
  – Relative performance criteria
Routing
• Model: each day
  1. select a path from source to destination
  2. observe the latencies
  – the latencies may differ from day to day
• Strategies: all source-destination paths
• Loss: the average latency on the selected path
• Performance goal: match the latency of the best single path
Financial Markets: options
• Model: stock or cash. Each day, set a portfolio, then observe the outcome.
• Strategies: invest either all in stock or all in cash
• Gain: based on the daily changes in the stock
• Performance goal: match the better of the two strategies; doing so implements an option!
Machine learning – Expert Advice
• Model: each time step
  1. observe the experts' predictions
  2. predict a label
• Strategies: experts (online learning algorithms)
• Loss: prediction errors
• Performance goal: match the error rate of the best expert, in retrospect
[Figure: experts 1-4 and their binary predictions (e.g. 1, 0, 1, 1) on an example.]
Parameter Tuning
• Model: multiple parameters to set
• Strategies: settings of the parameters
• Optimization: any
• Performance goal: match the best setting of the parameters
Parameter Tuning
• Development cycle:
  – develop the product (software)
  – test performance
  – tune parameters
  – deliver the "tuned" product
• Challenge: can we combine testing, tuning, and runtime?
Regret Minimization: Model
• Actions A = {1, …, N}
• Time steps: t ∈ {1, …, T}
• At time step t:
  – the agent selects a distribution pt(i) over A
  – the environment returns costs ct(i) ∈ [0,1]
• Adversarial setting
  – online loss: lt(on) = Σi ct(i) pt(i)
• Cumulative loss: LT(on) = Σt lt(on)
External Regret
• Relative performance measure: compare to the best strategy in A, the basic class of strategies
• Online cumulative loss: LT(on) = Σt lt(on)
• Action i's cumulative loss: LT(i) = Σt ct(i)
• Best action: LT(best) = MINi {LT(i)} = MINi {Σt ct(i)}
• External regret = LT(on) − LT(best)
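To make the definitions concrete, here is a minimal sketch (Python/NumPy, with a hypothetical random cost matrix and a fixed uniform player) computing the online loss and the external regret:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 4
costs = rng.uniform(size=(T, N))            # c_t(i) in [0,1], hypothetical data

p = np.full((T, N), 1.0 / N)                # p_t(i): here, the uniform distribution at every step

online_loss = float((costs * p).sum())      # L_T(on) = sum_t sum_i c_t(i) p_t(i)
action_loss = costs.sum(axis=0)             # L_T(i)  = sum_t c_t(i)
external_regret = online_loss - action_loss.min()   # L_T(on) - L_T(best)
```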
External Regret Algorithm
• Goal: minimize the regret
• Algorithm:
  – track the regret to each action
  – play weights proportional to the (positive) regrets
• Formally, at time t:
  – compute the regret to each action: Yt(i) = Lt(on) − Lt(i), and rt(i) = MAX{Yt(i), 0}
  – pt+1(i) = rt(i) / Σj rt(j); if all rt(i) = 0, select pt+1 arbitrarily
• Notation: Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1
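A direct transcription of this rule as a sketch; restarting at the uniform distribution when all regrets vanish is one arbitrary choice the slide permits:

```python
import numpy as np

def regret_matching(costs):
    """costs[t, i] = c_t(i); play p_{t+1}(i) proportional to r_t(i) = max(Y_t(i), 0)."""
    T, N = costs.shape
    online_loss, action_loss = 0.0, np.zeros(N)
    p = np.full(N, 1.0 / N)                          # arbitrary initial distribution
    history = []
    for t in range(T):
        history.append(p)
        online_loss += costs[t] @ p                  # accumulate l_t(on)
        action_loss += costs[t]                      # accumulate L_t(i)
        r = np.maximum(online_loss - action_loss, 0.0)   # r_t(i)
        p = r / r.sum() if r.sum() > 0 else np.full(N, 1.0 / N)
    return np.array(history)
```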
External Regret Algorithm: Analysis
• Recall: Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1
• LEMMA: ΔRt ∙ Rt−1 = 0
  – Proof: Σi (ct(i) − lt(on)) rt−1(i) = Σi ct(i) rt−1(i) − Σi lt(on) rt−1(i), and since pt(i) = rt−1(i) / Σj rt−1(j):
    Σi lt(on) rt−1(i) = [Σi ct(i) pt(i)] ∙ Σi rt−1(i) = Σi ct(i) rt−1(i)
  – Geometrically, the regret increment ΔRt is orthogonal to the previous regret vector Rt−1.
• LEMMA: (maxi RT(i))² ≤ Σt=1..T ‖ΔRt‖² ≤ NT
  – hence the external regret is at most √(NT).
External regret: Bounds
• Average regret goes to zero ("no regret"): Hannan [1957]
• Explicit bounds: Littlestone & Warmuth '94, CFHHSW '97
  – External regret = O(log N + √(T log N))
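The explicit bounds are achieved by exponential-weights algorithms in the Littlestone-Warmuth line; a minimal Hedge-style sketch (the learning-rate tuning shown is the standard one, stated here as an assumption):

```python
import numpy as np

def hedge(costs):
    """Exponential weights: p_t(i) proportional to exp(-eta * L_{t-1}(i))."""
    T, N = costs.shape
    eta = np.sqrt(np.log(N) / T)           # standard tuning for an O(sqrt(T log N)) regret
    cum = np.zeros(N)                      # cumulative per-action losses L_{t-1}(i)
    online_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum - cum.min()))   # shift by the min for numerical stability
        p = w / w.sum()
        online_loss += costs[t] @ p
        cum += costs[t]
    return online_loss - cum.min()             # external regret
```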
Dominated Actions
• Model: action y dominates action x if y is always better than x
• Goal: do not play dominated actions
• Goal (unknown model): the fraction of time steps in which we play dominated actions is vanishing
Cost of action x:  .3  .8  .9  .3  .6
Cost of action y:  .2  .4  .7  .1  .3
(action y dominates action x)
Internal/Swap Regret
• Internal regret
  – Regret(x,y) = Σt: a(t)=x (ct(x) − ct(y))
  – Internal regret = maxx,y Regret(x,y)
• Swap regret
  – Swap regret = Σx maxy Regret(x,y)
• Swap regret ≥ external regret
  – Σx maxy Regret(x,y) ≥ maxy Σx Regret(x,y)
• Mixed actions
  – Regret(x,y) = Σt (ct(x) − ct(y)) pt(x)
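For a pure-action history these quantities are straightforward to tabulate; a sketch:

```python
import numpy as np

def internal_and_swap_regret(actions, costs):
    """actions[t] = action played at time t; costs[t, i] = c_t(i)."""
    T, N = costs.shape
    regret = np.zeros((N, N))               # regret[x, y] = Regret(x, y)
    for t in range(T):
        x = actions[t]
        regret[x] += costs[t, x] - costs[t]  # add c_t(x) - c_t(y) for every y
    internal = regret.max()                  # max over pairs (x, y)
    swap = regret.max(axis=1).sum()          # the best y chosen separately per x
    return internal, swap                    # swap >= internal, and swap >= external
```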
Dominated Actions and Regret
• Assume action y dominates action x: for every t, ct(x) > ct(y) + δ
• If we used action x for n times, then Regret(x,y) > δn
• If SwapRegret < R, then dominated actions are used at most R/δ times, a vanishing fraction whenever R = o(T)
Calibration
• Model: each step, predict a probability and then observe the outcome
• Goal: predictions calibrated with the outcomes
  – during the time steps where the prediction is p, the average outcome is (approximately) p
Example (predicting the probability of rain):
  predictions: .3  .5  .3  .3  .5
  calibration: on the days with prediction .3 the empirical rain rate should be about 1/3; on the days with prediction .5, about 1/2.
Calibration to Regret
• Reduction to swap/internal regret:
  – discrete probabilities, say {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
  – loss of action x at time t: (x − ct)²
  – the best swap target y*(x) = argmaxy Regret(x,y) is y*(x) = avg(ct | prediction x)
  – consider Regret(x, y*(x)): low internal regret forces each prediction x to match its empirical average, i.e., calibration
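A sketch of the calibration error that this reduction controls; weighting each prediction value by its frequency is a common convention assumed here:

```python
import numpy as np

def calibration_error(preds, outcomes):
    """Weighted gap between each predicted value p and the empirical outcome rate given p."""
    preds, outcomes = np.asarray(preds, float), np.asarray(outcomes, float)
    err = 0.0
    for p in np.unique(preds):
        mask = preds == p
        err += mask.mean() * abs(outcomes[mask].mean() - p)
    return err

# The rain example above: predictions (.3, .5, .3, .3, .5) with outcomes (0, 1, 1, 0, 0)
# are (approximately) calibrated: the .3-days have rain rate 1/3, the .5-days have rate 1/2.
```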
Internal regret
• No internal regret: [Foster & Vohra], [Hart & Mas-Colell]
  – based on Blackwell's approachability theorem [Blackwell '56]
• Explicit bounds:
  – [Cesa-Bianchi & Lugosi '03]: internal regret = O(log N + √(T log N))
  – [Blum & Mansour]: swap regret = O(log N + √(T N))
Regret: External vs Internal
• External regret
  – "You should have bought S&P 500"
  – match boyi to girli
• Internal regret
  – "Each time you bought IBM you should have bought SUN"
  – stable matching
• Limitations:
  – no state
  – additive over time
[Even-Dar, Mansour, Nadav, 2009]
Routing Games
[Figure: a two-player network; player 1 sends flow f1 from s1 to t1, split between a left and a right path (f1,L and f1,R); player 2 sends flow f2 from s2 to t2, split between a top and a bottom path (f2,T and f2,B); on a shared edge e the latency is Le(f1,L + f2,T).]
• Atomic:
  – a finite number of players
  – player i transfers flow from si to ti
• Splittable flows
• Costi = Σp∈paths(si,ti) Latency(p) ∙ flowi(p)
Cournot Oligopoly [Cournot 1838]
• Firms select production levels (X, Y); the market price P depends on the TOTAL supply
• Each firm maximizes its profit = revenue − cost, with costs Cost1(X), Cost2(Y)
• Best response dynamics converges for 2 players [Cournot 1838]
  – a two-player oligopoly is a super-modular game [Milgrom, Roberts 1990]
• Diverges for n ≥ 5 [Theocharis 1960]
Resource Allocation Games
• Advertisers set budgets; each advertiser wins a proportional market share
  – e.g., with budgets $5M, $10M, $17M, $25M, the advertiser bidding $25M is allocated the rate 25/(5+10+17+25)
• Utility:
  – concave utility f from the allocated rate
  – quasi-linear with money: U = f(allocated rate) − $25M
• The best response dynamics generally diverges for linear resource allocation games
Properties of Selfish Routing, Cournot Oligopoly and Resource Allocation Games
1. Closed convex strategy set
2. A (weighted) social welfare is concave: there exist λ1, …, λn > 0 such that λ1 u1(x) + λ2 u2(x) + … + λn un(x) is concave
3. The utility of a player is convex in the vector of actions of the other players
Socially Concave Games
• The relation between socially concave games and concave games:
[Venn diagram over normal form games (with mixed strategies): the classes of socially concave games and concave games intersect; zero-sum games, atomic splittable routing, resource allocation, and Cournot lie in the intersection, where the Nash equilibrium is unique.]
• Concave games [Rosen '65]:
  – the utility of a player is strictly concave in her own strategy
  – a sufficient condition for equilibrium uniqueness
The average action and average utility converge to NE
• If each player uses a procedure without regret in a socially concave game, then their joint play converges to Nash equilibrium:
  – Theorem 1: the average action profile (each player's actions averaged over days 1…T) converges to an ε(T)-Nash equilibrium
  – Theorem 2: the average daily payoff of each player converges to her payoff in the NE
[Figure: each player's actions on days 1, 2, 3, …, T, and their average over days 1…T.]
Convergence of the "average action" and convergence of the "average payoff" are two different things!
[Figure: two parallel links between s and t; on even days all traffic uses one link, on odd days the other.]
• Here the average action converges to (½, ½) for every player
• But the average cost is 2, while the average cost in the NE is 1

The Action Profile Itself Need Not Converge
• The same alternating play (one link on even days, the other on odd days) keeps oscillating: the average action converges, but the daily action profile does not.
Correlated Equilibrium
• CE: a joint distribution Q over joint actions
• Each time t, a joint action is drawn from Q
  – each player's action is a best response, given her own component and Q
• Theorem [HM, FV]: when multiple players each play with low internal (swap) regret, their play converges to a CE
[Even-Dar, Kleinberg, Mannor, Mansour, 2009]
Job Scheduling: Motivating Example
[Figure: users send requests through a load balancer to a pool of servers.]
• GOAL: minimize the load on the servers
Online Algorithms
• Job scheduling
  – N unrelated machines (machine = action)
  – each time step: a job arrives, with different loads on the different machines
  – the algorithm schedules the job on some machine, given its loads
  – goal: minimize the loads (makespan or L2)
• Regret minimization
  – N actions (machines)
  – each time step: first the algorithm selects an action (machine), then it observes the losses (the job loads)
  – goal: minimize the sum of losses
Modeling Differences: Information
• Information model: what does the algorithm know when it selects an action/machine?
  – Known cost: first observe the costs, then select an action (job scheduling)
  – Unknown cost: first select an action, then observe the costs (regret minimization)
Modeling Differences: Performance
• Theoretical performance measure:
  – comparison class: job scheduling compares to the best (offline) assignment; regret minimization compares to the best static algorithm
  – guarantees: job scheduling gives multiplicative guarantees; regret minimization gives additive, vanishing guarantees
• Objective function:
  – job scheduling: global (makespan)
  – regret minimization: additive
Formal Model
• N actions
• Each time step t, algorithm ON:
  – selects a (fractional) action: pt(i)
  – observes losses ct(i) in [0,1]
• Average loss of ON for action i at time T: ONT(i) = (1/T) Σt≤T pt(i) ct(i)
• Global cost functions:
  – C∞(ONT(1), …, ONT(N)) = maxi ONT(i)
  – Cd(ONT(1), …, ONT(N)) = [Σi (ONT(i))^d]^(1/d)
Formal Model
• Static optimum:
  – consider any fixed distribution α, played at every time step
  – the static optimum α* minimizes the cost C
• Formally:
  – let α ◊ L = (α(1)L(1), …, α(N)L(N)) (the Hadamard, or Schur, product), where LT(i) = (1/T) Σt ct(i)
  – best fixed distribution: α*(L) = arg minα C(α ◊ L)
  – static optimality: C*(L) = C(α*(L) ◊ L)
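For two machines under the makespan, α* has a closed form, obtained by balancing the two final loads (α L1 = (1 − α) L2); a sketch, matching the example below:

```python
def static_opt_makespan2(L1, L2):
    """Best fixed split for two machines: balance alpha*L1 = (1-alpha)*L2."""
    alpha = L2 / (L1 + L2)                  # alpha*(L) = (L2, L1) / (L1 + L2)
    makespan = L1 * L2 / (L1 + L2)          # both final loads equal this value
    return (alpha, 1 - alpha), makespan

# static_opt_makespan2(4, 2) == ((1/3, 2/3), 4/3), matching the example below.
```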
Example
• Two machines, makespan:
  – observed loads: L1, L2
  – α*(L) = ( L2/(L1+L2) , L1/(L1+L2) )
  – final loads: ( L1L2/(L1+L2) , L1L2/(L1+L2) )
  – makespan = L1L2/(L1+L2)
• Example: observed loads (4, 2) give α*(L) = (1/3, 2/3), final loads (4/3, 4/3), makespan 4/3.
Our Results: Adversarial General
• General feasibility result:
  – assume C is convex and C* is concave (this includes the makespan and the Ld norm for d > 1)
  – then there exists an online algorithm ON such that, for any loss sequence L:
    C(ON) < C*(L) + o(1)
  – rate of convergence: about √(N/T)
Our Results: Adversarial Makespan
• Makespan algorithm:
  – there exists an algorithm ON such that, for any loss sequence L:
    C(ON) < C*(L) + O(log² N / √T)
• Benefits:
  – very simple and intuitive
  – improved regret bound
• For two actions, the algorithm keeps pt(1) within about Δt/√T of ½, where Δt is the difference between the online loads (see the two-action algorithm below).
Our Results: Adversarial Lower Bound
• We show that for many non-convex C there is a non-vanishing regret
  – this includes the Ld norm for d < 1
• Non-vanishing regret ratio (> 1): there is a sequence of losses L such that C(ON) > (1+γ) C*(L), where γ > 0
Preliminary: Local vs. Global
[Figure: the time line is partitioned into blocks B1, B2, …, Bk; low regret in each block yields overall low regret.]
Preliminary: Local vs. Global
• LEMMA:
  – assume C is convex and C* is concave,
  – assume a partition of the time steps into blocks Bi,
  – and assume that in each time block Bi the regret is at most Ri.
  Then: C(ON) − C*(L) ≤ Σi Ri
Preliminary: Local vs. Global
Proof:
  C(ON) ≤ Σi C(ON(Bi))              (C is convex)
  Σi C*(L(Bi)) ≤ C*(L)              (C* is concave)
  C(ON(Bi)) − C*(L(Bi)) ≤ Ri        (low regret in each Bi)
  Summing over the blocks: Σi [C(ON(Bi)) − C*(L(Bi))] ≤ Σi Ri, hence C(ON) − C*(L) ≤ Σi Ri.  QED
• So it is enough to bound the regret on subsets.
Example
• Two machines M1, M2; arriving losses: (2, 1) at t = 1 and (1, 2) at t = 2.
• Static opt: α* = (½, ½), cost = 3/2
• Local (per-step) opt: α* = (1/3, 2/3) then (2/3, 1/3), cost = 4/3
• Global offline opt: (0, 1) then (1, 0), cost = 1
Stochastic case
• Each time t the costs are drawn from a joint distribution
  – i.i.d. over time steps, but not necessarily independent between actions
• INTUITION: assume two actions (machines) with load distribution:
  – with probability ½: (1, 0)
  – with probability ½: (0, 1)
• Which policy minimizes the makespan regret?
• Regret components: MAX(L(1), L(2)) = sum/2 + |Δ|/2, where sum = L(1) + L(2) and Δ = L(1) − L(2)
Stochastic case: Static OPT
• Natural (model-based) choice: always select action (½, ½)
• Observations:
  – assume (1, 0) occurs T/2 + Δ times and (0, 1) occurs T/2 − Δ times
  – loads: (T/4 + Δ/2, T/4 − Δ/2)
  – makespan = T/4 + Δ/2 > T/4
  – static OPT: T/4 − Δ²/T < T/4; w.h.p. OPT is T/4 − O(1)
• sum = T/2 and E[|Δ|] = O(√T), so Regret = O(√T)
Can we do better ?!
Stochastic case: Least Loaded
• Least loaded machine: select the machine with the lower current load
• Observations:
  – the machines keep (almost) the same load: |Δ| ≤ 1
  – sum of the loads: E[sum] = T/2
  – expected makespan = T/4
• Regret:
  – the least-loaded makespan is LLM = T/4 ± √T
  – Regret = MAX{LLM − T/4, 0} = O(√T)
  – the regret counts only the "bad" side
• A small simulation contrasting this policy with the static one appears below.
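A sketch of that simulation (hypothetical horizon and seed):

```python
import numpy as np

def makespan_regret(policy, T=100_000, seed=1):
    """Regret = max{makespan - T/4, 0} on the (1,0)/(0,1) coin-flip distribution."""
    rng = np.random.default_rng(seed)
    loads = np.zeros(2)
    for _ in range(T):
        cost = np.array([1.0, 0.0]) if rng.random() < 0.5 else np.array([0.0, 1.0])
        loads += policy(loads) * cost            # fractional assignment
    return max(loads.max() - T / 4, 0.0)

static = lambda loads: np.array([0.5, 0.5])
least_loaded = lambda loads: np.eye(2)[np.argmin(loads)]  # all weight on the lighter machine

# Both policies give O(sqrt(T)) regret here, but least loaded keeps |Delta| <= 1,
# so only the sum term fluctuates.
```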
Can we do better ?!
Stochastic case: Optimized Finish
• Algorithm:
  – select action (½, ½) for the first T − 4√T steps
  – play least loaded afterwards
• Claim: Regret = O(T^(1/4))
  – until step T − 4√T, w.h.p. Δ < 2√T
  – there exists a time t in [T − 4√T, T] with Δ = 0 and sum = T/2 + O(T^(1/4))
  – from 1 to t: regret = O(T^(1/4)); from t to T: regret = O(√(T − t)) = O(T^(1/4))
• A sketch of this policy follows.
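A sketch of the optimized-finish policy (the horizon T is assumed known in advance):

```python
import numpy as np

def optimized_finish(costs):
    """Play (1/2, 1/2) for T - 4*sqrt(T) steps, then play least loaded."""
    T = len(costs)
    switch = int(T - 4 * np.sqrt(T))
    loads = np.zeros(2)
    for t, cost in enumerate(costs):
        if t < switch:
            p = np.array([0.5, 0.5])             # balanced phase
        else:
            p = np.eye(2)[np.argmin(loads)]      # clean up the deviation at the end
        loads += p * np.asarray(cost, float)
    return loads                                  # makespan = loads.max()
```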
Can we do better ?!
Stochastic case: Any time
• An algorithm with low regret at any time t
  – it does not plan for a final horizon T
• Variant of least loaded: put weight ½ + T^(-1/4) on the less loaded machine
• Claim: Regret = O(T^(1/4))
  – idea: Regret = max{(L1 + L2)/2 − T/4, 0} + Δ
  – every O(T^(1/2)) steps, Δ = 0
  – and the play stays very near (½, ½)
Can we do better ?!
Stochastic case: Logarithmic Regret
• Algorithm:
  – use phases whose lengths shrink exponentially: Tk = Tk−1/2 with T1 = T/2, so there are log T phases
  – every phase cancels the deviations (from the expectation) of the previous phase
• Works for any probabilities and actions!
  – assuming the probabilities are known
Can we do better ?!
Stochastic case
• Assume that each action's cost is drawn from a joint distribution, i.i.d. over time steps
• Theorem (makespan/Ld):
  – known distributions: Regret = O(log T / T)
  – unknown distributions: Regret = O(log² T / T)
Summary
• Regret minimization
  – external, internal, dynamics
• Job scheduling and regret minimization
  – a different, global cost function
  – open problems: exact characterization; lower bounds
Makespan Algorithm
• Outline:
  – a simple algorithm for two machines
    • regret O(1/√T); simple and almost memory-less
  – a recursive construction:
    • given three algorithms, two for k/2 actions and one for 2 actions, build an algorithm for k actions
    • main issue: what kind of feedback to "propagate"
    • regret O(log² N / √T), better than the general result
Makespan: Two Machines
• Intuition: keep the online loads balanced
• Failed attempts:
  – standard regret minimization: on an unbalanced input sequence L, the algorithm will put most of the load on a single machine
  – using the optimum to drive the probabilities
• Our approach: use the online loads, not the optimum or the static cumulative loads
Makespan Algorithm: Two actions
• At time t maintain probabilities pt,1 and pt,2 = 1 − pt,1
• Initially p1,1 = p1,2 = ½
• At time t, shift probability toward the machine that received less load:
  pt+1,1 = pt,1 + (1/√T) (pt,2 lt,2 − pt,1 lt,1)
  so pt,1 stays near ½, offset by roughly Δt/√T, where Δt is the difference between the online loads
• Remarks:
  – uses the online loads
  – almost memory-less
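A transcription of the reconstructed update as a sketch; clipping the probability to [0, 1] is a safety assumption not spelled out on the slide:

```python
import numpy as np

def two_machine_makespan(losses):
    """losses[t] = (l_{t,1}, l_{t,2}); shift probability toward the lighter machine."""
    T = len(losses)
    p1 = 0.5                                       # p_{1,1} = p_{1,2} = 1/2
    loads = np.zeros(2)
    for l1, l2 in losses:
        loads += [p1 * l1, (1 - p1) * l2]          # online loads
        p1 += ((1 - p1) * l2 - p1 * l1) / np.sqrt(T)   # the p_{t+1,1} update
        p1 = min(max(p1, 0.0), 1.0)                # keep a valid probability
    return loads
```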
Makespan Algorithm: Analysis
• View the change in the probabilities as a walk on the line [0, 1], starting at ½.
Makespan Algorithm: Analysis
• Consider a small interval of length ε on the line.
• Total change in the loads: identical on both machines whenever the walk starts and ends with the same Δ.
• Consider only the losses inside the interval (a local analysis):
  – the local opt is also in the interval
  – online used a "similar" probability, so it loses at most ε per step
Makespan Algorithm: Analysis
• Simplifying assumptions:
  – the walk is "balanced" in every interval:
    • add "virtual" losses to return to the initial state
    • only O(√T) additional losses; this relates the learning rate to the regret
  – losses that "cross" an interval's boundary need more sophisticated "bookkeeping":
    • make sure an update affects at most two adjacent intervals
  – regret accounting:
    • the loss inside an interval
    • plus the additional "virtual" losses
Makespan: Recursive algorithm
• Recursive algorithm:
[Figure: algorithm A3 sits on top of the pair A1, A2.]
Makespan: Recursive
• The algorithms:
  – Algorithms A1 and A2:
    • each has "half" of the actions
    • each gets the actual losses of its actions and "balances" them
    • each works in isolation, simulating its own play and not considering the actual loads
  – Algorithm A3:
    • gets the average load in A1 and in A2
    • balances these "average" loads
[Figure: the losses lt,1, …, lt,k/2, … feed A1 and A2; A3 receives AVG(lt,i qt,i) from A1 and AVG(lt,i q't,i) from A2.]
Makespan: Recursive
• The combined output:
  – A3 outputs (p1, p2) over the two halves; A1 outputs (q1, …, qk/2) over its actions, and similarly A2
  – the final probability of action i in A1's half is p1 qi, and in A2's half p2 q'i
[Figure: A3 above A1 and A2; the leaf losses l1, …, lk/2, … and the averaged feedback AVG(lt,i qt,i) flow between the levels.]
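The combination itself is a product of distributions; a minimal sketch (function name hypothetical):

```python
import numpy as np

def combine(q1, q2, p):
    """q1, q2: A1's and A2's distributions over their halves; p = (p1, p2) from A3."""
    return np.concatenate([p[0] * np.asarray(q1), p[1] * np.asarray(q2)])

# combine([0.25, 0.75], [0.5, 0.5], (0.4, 0.6)) is a proper distribution over 4 actions.
```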
Makespan: Recursive
• Analysis (intuition): assume perfect ZERO regret (just for intuition)
  – the output of A1 and A2 is completely balanced: the average equals the individual loads (maximum = average = minimum)
  – the output of A3 is balanced: the contribution of A1's machines equals that of A2's
• Real analysis: we need to bound the amplification of the regret.