Low-regret Online Decision-making Via Bellman Inequalities
Joint work with Sid Banerjee and Itai Gurvich
Relaxations and Regret Bounds for Online Problems
● Must make decisions upon request
● Uncertain process
● Statistical information available
● Goal: develop practical near-optimal algorithms
Our Results
Meta-Theorem: For different resource allocation problems, we
give a practical policy, based on re-solving an optimization
program, with bounded regret.
The bound is independent of the horizon and capacities.
● Applications: Dynamic posted pricing, Online Knapsack, Network Revenue Management (Online Packing), Online Matching, Online Probing, Contextual Bandits
● Challenges: define a benchmark and use it to design an algorithm
Why Constant Regret?
Case Study: edge weighted online matching
● Algorithms are different
● Not worst case, but parametric
Problem 1: Online Knapsack
● Finite set of types:
● Known reward distribution and weight:
● Initial budget and horizon:
● Arrival process:
● Objective: collect as much reward as possible
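The setting above can be made concrete with a small simulator. This is a minimal sketch, not the authors' code: it assumes a deterministic reward per type (the slide allows a reward distribution), integer weights, and i.i.d. arrivals. The names `ItemType`, `sample_arrivals`, and `run_policy` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemType:
    reward: float  # reward collected when an arrival of this type is accepted (assumed deterministic)
    weight: int    # budget consumed when an arrival of this type is accepted
    prob: float    # probability that a given arrival has this type (i.i.d. assumption)


def sample_arrivals(types, horizon, rng):
    """Draw `horizon` i.i.d. arrivals and return them as a list of type indices."""
    probs = [t.prob for t in types]
    return rng.choices(range(len(types)), weights=probs, k=horizon)


def run_policy(types, budget, arrivals, accept):
    """Run an online policy over a fixed arrival sequence.

    `accept(j, budget_left, periods_left)` returns True to accept the
    current arrival of type j. Arrivals that no longer fit are skipped.
    """
    total = 0.0
    for t, j in enumerate(arrivals):
        periods_left = len(arrivals) - t
        if types[j].weight <= budget and accept(j, budget, periods_left):
            budget -= types[j].weight
            total += types[j].reward
    return total
```

Any online policy can be passed as `accept`; for instance, `lambda j, b, n: True` accepts every arrival that still fits in the budget.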
Types of Benchmark
[Figure: reward scale comparing Algorithm, Optimal (DP), and Prophet, with the regret marked; the annotation refers to the number of arrivals of each type.]
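The benchmarks named on this slide can be compared numerically on a toy knapsack instance. The sketch below assumes that the Prophet knows the realized number of arrivals of each type and solves the resulting offline knapsack, and that regret is the Prophet's reward minus the algorithm's reward. Both readings, the accept-everything baseline, and all function names are assumptions made for illustration.

```python
def prophet_value(rewards, weights, counts, budget):
    """Hindsight optimum when the number of arrivals of each type is known.

    Solved as a 0/1 knapsack in which every arrival is its own item.
    Assumes positive integer weights and a non-negative integer budget.
    """
    best = [0.0] * (budget + 1)  # best[b] = max reward with total weight <= b
    for r, w, n in zip(rewards, weights, counts):
        for _ in range(n):
            # Iterate the budget downward so each arrival is used at most once.
            for b in range(budget, w - 1, -1):
                best[b] = max(best[b], best[b - w] + r)
    return best[budget]


def accept_all_reward(rewards, weights, arrivals, budget):
    """Reward of a naive online policy that accepts every arrival that fits."""
    total = 0.0
    for j in arrivals:
        if weights[j] <= budget:
            budget -= weights[j]
            total += rewards[j]
    return total


def regret(rewards, weights, arrivals, budget):
    """Prophet reward minus the reward of the naive online policy."""
    counts = [arrivals.count(j) for j in range(len(rewards))]
    offline = prophet_value(rewards, weights, counts, budget)
    online = accept_all_reward(rewards, weights, arrivals, budget)
    return offline - online
```

On the sequence `[0, 0, 1, 1]` with rewards `[1.0, 3.0]`, weights `[1, 2]`, and budget 3, the naive policy spends its budget on the two low-reward arrivals, while the Prophet keeps room for a high-reward one.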
Online Packing
Theorem: A natural policy achieves constant expected regret
for online packing problems. The regret is independent of the horizon and capacities.
In particular, the regret depends only on
Generalizes to multiple resources and other arrival processes.
Similar results in a recent work for restricted cases [Bumpensanti & Wang]
Overview of the General Framework
Goal: Handle more general problems
Intuition
Given the additional information, Prophet wants to solve a DP
Bellman Loss (computational)
Information Loss (estimation)
Knapsack RABBI
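One way to picture a re-solving policy for the knapsack problem is sketched below. At every arrival it re-solves a fluid (expected-value) relaxation with the remaining budget and accepts the arrival when the relaxation accepts at least half of that type's expected remaining arrivals. The half threshold, the i.i.d. arrival model, the single-resource restriction, and the function names are assumptions made for illustration; they are not taken from the slides.

```python
def fluid_solution(rewards, weights, expected_arrivals, budget):
    """Solve max sum_j r_j x_j  s.t.  sum_j w_j x_j <= budget, 0 <= x_j <= expected_arrivals[j].

    With a single resource this is a fractional knapsack, so a greedy pass
    in decreasing reward-per-weight order is optimal.
    """
    x = [0.0] * len(rewards)
    remaining = float(budget)
    order = sorted(range(len(rewards)),
                   key=lambda j: rewards[j] / weights[j], reverse=True)
    for j in order:
        # Guard against tiny negative leftovers caused by floating point.
        x[j] = min(expected_arrivals[j], max(remaining, 0.0) / weights[j])
        remaining -= x[j] * weights[j]
    return x


def resolve_policy_reward(rewards, weights, probs, arrivals, budget):
    """Total reward of the re-solving policy on a fixed arrival sequence."""
    total = 0.0
    horizon = len(arrivals)
    for t, j in enumerate(arrivals):
        if weights[j] > budget:
            continue  # the arrival does not fit
        # Expected arrivals of each type over the remaining periods (current one included).
        expected = [p * (horizon - t) for p in probs]
        x = fluid_solution(rewards, weights, expected, budget)
        # Assumed acceptance rule: accept if the relaxation serves at least half of this type.
        if x[j] >= expected[j] / 2:
            budget -= weights[j]
            total += rewards[j]
    return total
```

The greedy solver is used only because a single budget constraint makes the relaxation a fractional knapsack; with several resources a general LP solver would be needed.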
Problem 2: Dynamic Posted Pricing
● Stream of T customers with i.i.d. rewards
● Each customer wants one of our identical items
● We can post any fare from the set
● Objective: collect as much reward as possible
Prophet solves: ?
Pricing RABBI
Fraction of customers that would buy when the fare is
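A fluid relaxation for the pricing problem can be written in terms of these purchase fractions. The sketch below is an illustration under stated assumptions, not the formulation from the slides: `x[f]` is the number of remaining customers offered fare `f`, expected sales may not exceed the remaining inventory, and the fare posted is the one with the largest `x[f]`. The function names and the posting rule are hypothetical.

```python
def pricing_fluid_lp(fares, buy_prob, inventory, customers):
    """Maximize sum_f fares[f] * buy_prob[f] * x[f]
    subject to sum_f buy_prob[f] * x[f] <= inventory,
               sum_f x[f] <= customers,  x >= 0.

    With only two constraints, some optimal vertex uses at most two fares,
    so it is enough to enumerate single fares and pairs of fares.
    """
    n = len(fares)
    best_val, best_x = 0.0, [0.0] * n

    def consider(x):
        nonlocal best_val, best_x
        val = sum(f * q * xi for f, q, xi in zip(fares, buy_prob, x))
        if val > best_val:
            best_val, best_x = val, x

    # Vertices with a single fare in use.
    for f in range(n):
        x = [0.0] * n
        if buy_prob[f] == 0:
            x[f] = float(customers)
        else:
            x[f] = min(float(customers), inventory / buy_prob[f])
        consider(x)

    # Vertices with two fares in use and both constraints tight.
    for f in range(n):
        for g in range(f + 1, n):
            if buy_prob[f] == buy_prob[g]:
                continue
            xf = (inventory - buy_prob[g] * customers) / (buy_prob[f] - buy_prob[g])
            xg = customers - xf
            if xf >= 0 and xg >= 0:
                x = [0.0] * n
                x[f], x[g] = xf, xg
                consider(x)
    return best_val, best_x


def fare_to_post(fares, buy_prob, inventory, customers):
    """Assumed rule: post the fare that the fluid solution uses most."""
    _, x = pricing_fluid_lp(fares, buy_prob, inventory, customers)
    return fares[max(range(len(fares)), key=lambda f: x[f])]
```

Calling `fare_to_post` again after every customer, with the updated inventory and customer count, gives a re-solving policy in the spirit of the Meta-Theorem.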
Dynamic Posted Pricing
Theorem: A natural policy achieves constant expected regret
for Dynamic Posted Pricing. The regret is independent of the horizon and the number of items.
In particular, the regret depends only on .
Fraction that buys at
The Algorithm is Practical
Bound via Bellman Inequalities
Definition: Given filtration , is a relaxed value w.r.t. if
1) Initial Ordering:
2) Monotonicity:
Conclusions and Extensions
● Framework based on constructing tractable benchmarks
● Bellman Loss: computational
● Information Loss: estimation
● Applications: NRM, Probing, Contextual Bandits, AdWords, Dynamic Pricing, and other Resource Allocation Problems
Related Work
● Prophet: worst case distribution (competitive ratio) for maximum of iid [Hill & Kertz], best possible [Correa et al.], matroid constraints [Kleinberg & Weinberg]
● Constant regret in NRM: [Arlotto & Gurvich]
[Talluri & Van Ryzin], [Reiman & Wang], [Jasin & Kumar], [Bumpensanti & Wang]
● Online matching, resource allocation, AdWords: [Manshadi et al.], [Legrain & Jaillet]
● Probing: competitive ratio (linear regret) [Gupta & Nagarajan], [Singla], [Chugg & Maehara]
● Information Relaxation: [Balseiro & Brown], [Brown, Smith, & Sun]
● Approximate Dynamic Programming: [Powell]