Carnegie Mellon

Selecting Observations against Adversarial Objectives
Andreas Krause, Brendan McMahan, Carlos Guestrin, Anupam Gupta
Observation selection problems

Place sensors for building automation
Monitor rivers and lakes using robots
Detect contaminations in water networks

Given a set V of possible observations (sensor locations, ...), we want to pick a subset A* ⊆ V such that

  A* = argmax_{|A| ≤ k} F(A)

For most interesting utilities F, this is NP-hard!
Key observation: Diminishing returns

Placement A = {S1, S2} vs. placement B = {S1, ..., S5}:
adding a new sensor S' to A will help a lot; adding S' to B doesn't help much.

Formalization: Submodularity
For A ⊆ B:  F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B)
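The diminishing-returns inequality can be checked directly on a toy coverage utility; the sensor "coverage regions" below are made up purely for illustration:

```python
# Hypothetical sensor "coverage regions" (sets of grid cells).
coverage = {"S1": {1, 2, 3}, "S2": {3, 4, 5}, "Snew": {5, 6}}

def F(A):
    """Coverage utility: number of cells covered by the sensors in A."""
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)

A = {"S1"}              # small placement
B = {"S1", "S2"}        # superset placement, A ⊆ B
gain_A = F(A | {"Snew"}) - F(A)  # marginal gain of Snew given few sensors
gain_B = F(B | {"Snew"}) - F(B)  # marginal gain of Snew given more sensors
assert gain_A >= gain_B          # submodularity: the gain can only shrink
```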
Submodularity [with Guestrin, Singh, Leskovec, VanBriesen, Faloutsos, Glance]

We prove submodularity for:
  Mutual information F(A) = H(unobs) − H(unobs | A)   [UAI '05, JMLR '07: spatial prediction]
  Outbreak detection F(A) = impact reduction when sensing at A   [KDD '07: water monitoring, ...]

Also submodular:
  Geometric coverage F(A) = area covered
  Variance reduction F(A) = Var(Y) − Var(Y | A), ...
Why is submodularity useful?

Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation:
  F(A_greedy) ≥ (1 − 1/e) F(A_opt)   (~63%)

Greedy algorithm (forward selection):
  s_{j+1} = argmax_{s ∈ V \ A_j} F(A_j ∪ {s})

Can get online (data-dependent) bounds for any algorithm
Can significantly speed up the greedy algorithm
Can use MIP / branch & bound for the optimal solution
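The forward-selection rule above can be sketched in a few lines; the coverage utility used for the demo run is a hypothetical stand-in for any monotone submodular F:

```python
def greedy(F, V, k):
    """Greedy forward selection for monotone submodular F:
    s_{j+1} = argmax over s in V minus A_j of  F(A_j ∪ {s}) - F(A_j).
    Guarantee [Nemhauser et al. '78]: F(A_greedy) >= (1 - 1/e) F(A_opt)."""
    A = set()
    for _ in range(k):
        gains = {s: F(A | {s}) - F(A) for s in V - A}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no remaining element improves F
        A.add(best)
    return A

# Hypothetical example: coverage utility over grid cells.
regions = {"a": {1, 2}, "b": {2, 3, 4}, "c": {4, 5}, "d": {5}}
F = lambda A: len(set().union(*(regions[s] for s in A)))
A = greedy(F, set(regions), k=2)
```

Greedy first grabs "b" (3 new cells), then whichever remaining sensor adds one more cell.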
Robust observation selection

What if ...
  ... the parameters θ of the model P(X_V | θ) are unknown / change?
  ... sensors fail?
  ... an adversary selects the outbreak scenario?

[Figure: best placement for old parameters fails when there is more variability elsewhere under the new parameters; an adversary attacks where no sensors are placed]
Robust prediction

Typical objective: minimize average variance (MSE). But low average variance can come with high maximum variance, possibly in the most interesting part!

Instead: minimize the "width" of the confidence bands.
For every location s ∈ V, define F_s(A) = Var(s) − Var(s | A).
Minimizing the width means simultaneously maximizing all F_s(A).
Each F_s(A) is (often) submodular! [Das & Kempe '07]

[Figure: confidence bands of pH value over horizontal positions V]
Adversarial observation selection

Given: possible observations V, submodular functions F1, ..., Fm (e.g., one F_i for each location i).
Want to solve

  A* = argmax_{|A| ≤ k} min_i F_i(A)

Can model many problems this way:
  Width of confidence bands: F_i is the variance at location i
  Unknown parameters: F_i is the information gain under parameters θ_i
  Adversarial outbreak scenarios: F_i is the utility for scenario i
  ...

Unfortunately, min_i F_i(A) is not submodular!
How does greedy do?

  Set    F1   F2   min_i F_i
  {x}     1    0    0
  {y}     0    2    0
  {z}     ε    ε    ε
  {x,y}   1    2    1    ← optimal solution (k=2)
  {x,z}   1    ε    ε
  {y,z}   ε    2    ε

Greedy picks z first; then it can choose only x or y, ending at value ε.
Greedy does arbitrarily badly as ε → 0. Is there something better?

Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
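This failure mode can be reproduced in a few lines. The weights below match the example: each F_i(A) is the best single-sensor value in A (a submodular "max" utility), and ε is a small constant:

```python
eps = 0.01  # the ε from the example; greedy's value degrades as ε → 0

w1 = {"x": 1.0, "y": 0.0, "z": eps}  # single-element values of F1
w2 = {"x": 0.0, "y": 2.0, "z": eps}  # single-element values of F2

# F_i(A) = best single-sensor value in A (submodular "max" utility).
F1 = lambda A: max((w1[s] for s in A), default=0.0)
F2 = lambda A: max((w2[s] for s in A), default=0.0)
Fmin = lambda A: min(F1(A), F2(A))

# Greedy on the (non-submodular) objective min_i F_i, with k = 2:
A = set()
for _ in range(2):
    A.add(max({"x", "y", "z"} - A, key=lambda s: Fmin(A | {s})))
```

Greedy takes z first (value ε beats 0) and then gets stuck at ε, while the optimum {x, y} achieves 1.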
Alternative formulation

If somebody told us the optimal value

  c* = max_{|A| ≤ k} min_i F_i(A),

could we recover the optimal solution A*? We would need to solve the dual problem

  A* = argmin_A |A|  such that  min_i F_i(A) ≥ c*

Is this any easier? Yes, if we relax the constraint |A| ≤ k.
Solving the alternative problem

Trick: for each F_i and c, define the truncation

  F'_i(A) = min{F_i(A), c}

and the truncated average

  F'_avg,c(A) = (1/m) Σ_i F'_i(A).

Then  min_i F_i(A) ≥ c  ⟺  F'_avg,c(A) = c.

Lemma: F'_avg,c(A) is submodular!

Example (c = 1):

  Set    F1   F2   F'1   F'2   F'avg,1   min_i F_i
  {x}     1    0    1     0     ½         0
  {y}     0    2    0     1     ½         0
  {z}     ε    ε    ε     ε     ε         ε
  {x,y}   1    2    1     1     1         1
  {x,z}   1    ε    1     ε     (1+ε)/2   ε
  {y,z}   ε    2    ε     1     (1+ε)/2   ε
Why is this useful? We can use the greedy algorithm to find an (approximate) solution!

Proposition: The greedy algorithm finds A_G with |A_G| ≤ α k and F'_avg,c(A_G) = c, where

  α = 1 + log max_s Σ_i F_i({s})
Back to our example

Guess c = 1. Greedy on F'_avg,1 first picks x, then picks y: the optimal solution!

  Set    F1   F2   min_i F_i   F'avg,1
  {x}     1    0    0           ½
  {y}     0    2    0           ½
  {z}     ε    ε    ε           ε
  {x,y}   1    2    1           1
  {x,z}   1    ε    ε           (1+ε)/2
  {y,z}   ε    2    ε           (1+ε)/2

But how do we find c?
Submodular Saturation Algorithm

Given set V, integer k, and functions F1, ..., Fm:
  Initialize c_min = 0, c_max = min_i F_i(V)
  Do binary search: c = (c_min + c_max)/2
    Use the greedy algorithm to find A_G such that F'_avg,c(A_G) = c
    If |A_G| > α k: c is too high → decrease c_max
    If |A_G| ≤ α k: c is too low → increase c_min
  until convergence
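The binary search plus greedy cover can be sketched as follows. This is a simplified toy version (fixed iteration count, no early stopping; eps, weights, and the max-style F_i reuse the earlier {x, y, z} example), not the tuned implementation from the paper:

```python
def greedy_cover(Favg, V, c, tol=1e-9):
    """Greedy submodular cover: grow A until F'_avg,c(A) = c."""
    A = set()
    while Favg(A) < c - tol:
        best = max(V - A, key=lambda s: Favg(A | {s}) - Favg(A))
        if Favg(A | {best}) - Favg(A) <= tol:
            break  # c is unreachable; no element makes progress
        A.add(best)
    return A

def saturate(Fs, V, k, alpha=1.0, iters=30):
    """Binary search over the guessed optimal value c (Saturate).
    Fs: monotone submodular functions F_1..F_m; alpha: allowed cost blow-up."""
    c_min, c_max = 0.0, min(F(V) for F in Fs)
    best = set()
    for _ in range(iters):
        c = (c_min + c_max) / 2
        # Truncated average: F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c)
        Favg = lambda A, c=c: sum(min(F(A), c) for F in Fs) / len(Fs)
        A = greedy_cover(Favg, V, c)
        if len(A) > alpha * k:
            c_max = c            # cover too large: c was too high
        else:
            c_min, best = c, A   # feasible: c was too low, raise the bar
    return best

# Toy run on the {x, y, z} example (eps = 0.01):
w1, w2 = {"x": 1.0, "y": 0.0, "z": 0.01}, {"x": 0.0, "y": 2.0, "z": 0.01}
F1 = lambda A: max((w1[s] for s in A), default=0.0)
F2 = lambda A: max((w2[s] for s in A), default=0.0)
A = saturate([F1, F2], {"x", "y", "z"}, k=2)
```

On this instance the truncation steers the cover to {x, y}, which plain greedy on min_i F_i misses.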
Theoretical guarantees

Theorem: If there were a polytime algorithm with a better constant β < α, then NP ⊆ DTIME(n^{log log n}).

Theorem: Saturate finds a solution A_S such that

  min_i F_i(A_S) ≥ OPT_k  and  |A_S| ≤ α k,

where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).

Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Experiments:
  Minimizing maximum variance in GP regression
  Robust biological experimental design
  Outbreak detection against adversarial contaminations

Goals:
  Compare against the state of the art
  Analyze appropriateness of the "worst-case" assumption
[Figure: environmental monitoring — maximum marginal variance vs. number of sensors (Greedy, Simulated Annealing, Saturate; lower is better)]
Spatial prediction (environmental monitoring, precipitation data)

Compare to the state of the art [Sacks et al. '88, Wiens '05, ...]: highly tuned simulated annealing heuristics (7 parameters).
Saturate is competitive and faster, and better on larger problems.
[Figure: precipitation data — maximum marginal variance vs. number of sensors (Greedy, Saturate, Simulated Annealing; lower is better)]
Maximum vs. average variance (environmental monitoring, precipitation data)

Minimizing the worst case leads to a good average-case score, but not vice versa.
[Figures: marginal variance vs. number of sensors — maximum and average variance achieved when optimizing the average (Greedy) vs. optimizing the maximum (Saturate); lower is better]
Outbreak detection (water networks)

Results are even more prominent on water network monitoring (12,527 nodes).
[Figures: water networks — maximum and average detection time (minutes) vs. number of sensors (Greedy, Simulated Annealing, Saturate; lower is better)]
Robust experimental design

Learn the parameters θ of a nonlinear function: y_i = f(x_i, θ) + w.
Choose stimuli x_i to facilitate maximum-likelihood estimation of θ. Difficult optimization problem!

Common approach: linearization!

  y_i ≈ f(x_i, θ0) + ∇f_{θ0}(x_i)^T (θ − θ0) + w

This allows a nice closed-form (fractional) solution. But how should we choose θ0?
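The linearization step can be made concrete for the Michaelis-Menten model used later, f(x, θ) = θ1·x / (θ2 + x); all numbers below are hypothetical, chosen only to illustrate why the choice of θ0 matters:

```python
import numpy as np

def f(x, theta):
    """Michaelis-Menten response f(x, θ) = θ1·x / (θ2 + x)."""
    t1, t2 = theta
    return t1 * x / (t2 + x)

def grad_f(x, theta):
    """Gradient of f w.r.t. θ: [x/(θ2 + x), -θ1·x/(θ2 + x)^2]."""
    t1, t2 = theta
    return np.array([x / (t2 + x), -t1 * x / (t2 + x) ** 2])

theta0 = np.array([1.0, 2.0])  # initial guess (hypothetical numbers)
theta = np.array([1.1, 1.8])   # "true" parameters (hypothetical numbers)
x = 3.0

# Linearization: y ≈ f(x, θ0) + ∇f_{θ0}(x)^T (θ - θ0)  (+ noise w)
approx = f(x, theta0) + grad_f(x, theta0) @ (theta - theta0)
exact = f(x, theta)
# The approximation is accurate only when θ0 is close to θ,
# which is exactly why the choice of θ0 matters.
```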
Robust experimental design

State of the art [Flaherty et al., NIPS '06]:
  Assume a perturbation of the Jacobian ∇f_{θ0}(x_i)
  Solve a robust SDP against the worst-case perturbation
  Minimize the maximum eigenvalue of the estimation error (E-optimality)

This paper:
  Assume a perturbation of the initial parameter estimate θ0
  Use Saturate to perform well against all initial parameter estimates
  Minimize the MSE of the parameter estimate (Bayesian A-optimality, typically submodular!)
Experimental setup:
  Estimate parameters of the Michaelis-Menten model (to compare results)
  Evaluate the efficiency of designs: the loss of the optimal design, knowing the true parameter θ_true, relative to the loss of the robust design, assuming a (wrong) initial parameter θ0:

  efficiency ≡ λ_max[Cov(θ̂ | θ_true, w_opt(θ_true))] / λ_max[Cov(θ̂ | θ_true, w_ρ(θ0))]
Robust design results

Saturate is more efficient than the SDP when optimizing under high parameter uncertainty.

[Figures: efficiency under low vs. high uncertainty in θ0, conditions A, B, C; higher is better]
[Figures: efficiency (w.r.t. E-optimality) vs. initial parameter estimate θ0 (θ_true = 2), comparing the classical E-optimal design, the SDP (10^-3 and 16.3), and Saturate]
Future (current) work

Incorporating complex constraints (communication, etc.)
Dealing with large numbers of objectives (constraint generation)
Improved guarantees for certain objectives (sensor failures)
Trading off worst-case and average-case scores

[Figure: adversarial score vs. expected score trade-off curves for k = 5, 10, 15, 20]
Conclusions

Many observation selection problems require optimizing an adversarially chosen submodular function:

  A* = argmax_{|A| ≤ k} min_i F_i(A)

The problem is not approximable to any factor! We presented an efficient algorithm: Saturate.
  Achieves the optimal score, with a bounded increase in cost
  Guarantees are best possible under reasonable complexity assumptions

Saturate performs well on real-world problems:
  Outperforms state-of-the-art simulated annealing algorithms for sensor placement, with no parameters to tune
  Compares favorably with SDP-based solutions for robust experimental design