Challenges and Results in Active Sensing
David A. Castañón
Boston University
Special thanks to collaborators D. Hitchings, K. Jenkins, H. Ding, V. Saligrama, K. Trapeznikov and J. Wang
Acknowledgements
• Thanks to sponsoring agencies - NSF, AFOSR, DHS
• Thanks to Secretary Kelly and President Trump - Laptop on planes
Motivation: Search
• Classic problem in WWII: submarine search
- Assignment of search patrols (air, surface) to locate in suspected areas
- Key problem: not guaranteed to find when searching an area
- Limited visibility, range, intelligent adversary
- Book: Search and Screening – B. Koopman 1946
Motivation: Surveillance
• Unmanned and manned vehicles in coordinated monitoring
- Detection …
Motivation: Diagnosis
• Medical diagnosis, fault detection in components, …
- Key aspect: imperfect tests
- Need to interpret collected measurements, identify …
Motivation: Security
• Checkpoint of the future: many possible tests
- Exploit real-…
Problem Features
• Opportunity for selecting measurements sequentially
• Information processed from previous measurements to select future measurements
- Feedback
• Meaningful mission objectives to guide selection of measurements
- Correct diagnosis
- Detection …
Focus on 3 problems
• Discrete search
- Finding object of interest in extensions of classical search theory models
• Dynamic search with information theoretic objectives
- New class of search models with full adap…
Feedback to control information
• Feedback Control - Focus on changing dynamics
[Block diagram: Dynamical System → Observed Signals → Controller (feedback control); Dynamical System → Observed Signals → Estimation/Fusion → Sensor Manager → Action Selection (active sensing)]
• Active Sensing - Focus on changing observations
• Implicit Assumption - Rapid, automated processing of observations to generate “state” information
Search Theory
• Simple model: stationary object, finite locations, prior probability of object location (Stone, Kadane, …)
• Simple action model: look only in one place
• Simple sensor model
- Search of an area yields detect or not
- Pd < 1, but no prob. of false alarm
- Conditional …
Problem setup
• Notation
- Locations …
No feedback needed!
• Typical feedback structure: decision tree
• Search feedback structure
- Future decisions needed only if no detection …
Optimization Form
• Prob. object is at i and is not detected after observations I(t):
• Objective: minimize probability of no detection with searches up to T:
• Solution:
- Feedback form: At …
[Figure: assignment of agents to locations with search costs; sparse accessibility]
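The resulting plan can be computed without branching on the observation history, since only the no-detection branch continues (cf. "No feedback needed!" above). A minimal Python sketch of the greedy index rule, a classical result for this model (function name ours):

```python
def plan_search(prior, p, T):
    """Greedy search plan for the no-false-alarm model: at each stage,
    look in the cell maximizing the immediate detection probability
    m_i * p_i, where m_i is the (unnormalized) probability that the
    object is at i and has not yet been detected."""
    m = list(prior)           # unnormalized posterior mass per location
    plan = []
    for _ in range(T):
        i = max(range(len(m)), key=lambda k: m[k] * p[k])
        plan.append(i)
        m[i] *= (1.0 - p[i])  # mass survives only if the search misses
    return plan, sum(m)       # search sequence, prob. of no detection
```

For example, with prior (0.6, 0.4) and detection probabilities (0.5, 0.5), three searches give the sequence [0, 1, 0] with no-detection probability 0.35.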
Extensions
• Whereabouts search
• Optimal strategy:
- Feedback form: At …
[Figure: assignment of agents to locations with search costs; sparse accessibility]
Extensions
• Convert to network optimization
- Integer program with unimodular constraints
- Fast algorithm developed (Ding-C. ’17); complexity O(N·|A|), where N = Σ_{m=1}^{M} N_m
Results
• Sample search: 9 UAVs with limited field of regard
- P_d’s from 0.7 to 0.9
Illustration
Can we improve sensor model?
• Assume we have both false alarms and missed detections
- Observation model: searching cell u(t) yields y(t) ∈ {0, 1}; p_i = detection probability, q_i = false-alarm probability
- Posterior update:

  π_i(t) = π_i(t-1) p_i / [(1-π_i(t-1)) q_i + π_i(t-1) p_i]                if u(t) = i, y(t) = 1
  π_i(t) = π_i(t-1)(1-p_i) / [(1-π_i(t-1))(1-q_i) + π_i(t-1)(1-p_i)]       if u(t) = i, y(t) = 0
  π_i(t) = π_i(t-1) q_j / [(1-π_j(t-1)) q_j + π_j(t-1) p_j]                if u(t) = j ≠ i, y(t) = 1
  π_i(t) = π_i(t-1)(1-q_j) / [(1-π_j(t-1))(1-q_j) + π_j(t-1)(1-p_j)]       if u(t) = j ≠ i, y(t) = 0
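This update is direct to implement; a minimal numpy sketch (function name ours), with p the per-cell detection probabilities and q the per-cell false-alarm probabilities:

```python
import numpy as np

def update_posterior(pi, p, q, u, y):
    """Bayes update of pi_i = P(object at i | history) after searching
    cell u and observing y (1 = detection, 0 = no detection)."""
    n = len(pi)
    # P(y = 1 | object at i, search u): p_u if i == u, else q_u (false alarm)
    lik1 = np.where(np.arange(n) == u, p[u], q[u])
    lik = lik1 if y == 1 else 1.0 - lik1
    post = pi * lik
    return post / post.sum()   # normalizer matches the denominators above
```

For example, with pi = (0.5, 0.5), p = (0.9, 0.9), q = (0.2, 0.2), searching cell 0 and detecting gives posterior 0.45/0.55 ≈ 0.818 on cell 0, matching the first case of the update.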
… with great difficulty!
• Objective: Given T observations, …
• Only one known characterization of optimal strategies (C. ’95)
- Special case: p_i = 1 − q_i = p
- Optimal …
Information Theory and Search
• Information theory: quantitative measures of information and uncertainty
- Given conditional …
Change search problem
• Object located in compact subset of Euclidean space
- x is now continuous …
Background
• Noisy decoding (Horstein ’63, Burnashev ’73, …)
- Probabilistic …
Formulation
• M sensors searching for a single object located at unknown X present in compact region A in R^n
- Discrete stages: at each stage, sensor m chooses A_m, a Borel subset of A, to observe, and receives a discrete-valued observation …
Formulation - 2
• Admissible strategies:
- Each sensor m: map conditional …
Solution
• Stochastic control:
- Notation …
Backward Induction
• One-stage problem
- Let i_{1:M} be a Boolean vector indicating …
Solution - 2
• One-stage problem solution
- Expected differential …
Solution - 3
• One-stage problem solution (cont.)
- Lemma: G(u) is strictly concave in u
- u is a probability vector (sums to 1, non-negative) …
Solution - 4
• N-stage problem solution
- Lemma: For any density p_n(x), we can find A such that …
- Theorem: Optimal …
Solution - 5
• Single Sensor Problem (Jedynak, Frazier, Sznitman ’13)
- Define for sensor m: …
- Scalar strictly concave maximization …
• Approach: common approach at coding regions
- Compute …
- Order indices i_{1:M} in a linear, total order
- Allocate regions in same order with probabilities …
• Does minimizing entropy guarantee good localization?
- Not necessarily. 2-D differential …
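A grid-based sketch of one stage of the single-sensor policy (probabilistic-bisection style; names and discretization are ours): place posterior mass u* in the query region, then reweight by the channel likelihoods f1 (query region contains the object) and f0 (it does not):

```python
import numpy as np

def bisection_step(dens, u_star, f0, f1, y):
    """One stage of entropy-optimal search on a discretized interval.
    dens: posterior probability mass per grid cell (sums to 1).
    Query region = leftmost cells holding posterior mass ~u_star."""
    cdf = np.cumsum(dens)
    region = cdf <= u_star + 1e-9        # small tolerance for float cumsum
    lik = np.where(region, f1[y], f0[y]) # channel likelihood of observation y
    post = dens * lik
    return post / post.sum(), region
```

With the binary symmetric channel of error 0.1 used in the experiments (f0 = (0.9, 0.1), f1 = (0.1, 0.9)) and u* = 0.5, a detection raises the queried region's posterior mass from 0.5 to 0.9.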
Experiments
• Illustration of bounds
- Single sensor, binary symmetric probability of error 0.1
[Plot: mean square error over time (horizontal axis n); curves: real error, theoretical upper bound, theoretical lower bound]
Multisensor example
• Two sensors, binary asymmetric channel
- Probability tables:
[Excerpt from IEEE Transactions on Automatic Control]
b). Note that, in the correlated error model, the sensors have greater probability of agreeing on a measurement, which corresponds to a significant part of the error being created by randomness in the signature of the object of interest.
TABLE I: The sensor specifications for two types of Boolean sensors: correlated and independent errors

a) Correlated

          y = (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)     0.62     0.17   0.17   0.04
  (0,1)     0.21     0.57   0.06   0.16
  (1,0)     0.11     0.03   0.68   0.18
  (1,1)     0.11     0.02   0.16   0.71

b) Independent

          y = 0    y = 1
  f10      0.79     0.21
  f11      0.14     0.86
  f20      0.79     0.21
  f21      0.27     0.73
For these problems, computation of a greedy solution to minimize expected mean square error at each stage is a formidable task, as it requires searching over all possible combinations of sensing regions for each sensor. With the posterior differential entropy objective, we have optimal strategies characterized in terms of computed operating points. For the independent case, the optimal operating points computed using the MATLAB function fmincon are u1* = 0.511, u2* = 0.494. Sensor 1 is more accurate, and thus seeks to include more probability in its search area. The expected one-stage reduction in differential entropy for this case is 0.54 bits. For the correlated case, the joint operating point is u11* = 0.288, u10* = 0.25, u01* = 0.25, u00* = 0.212, with expected differential entropy reduction of 0.58 bits per stage. The correlated channel case leads to greater reduction in differential entropy, as the sensors exploit the correlation in the signal to enhance information extraction. When compared with the optimal strategies for independent sensors, the correlated sensors increase the probability of the overlap area (u11* vs. u1* × u2*) where both sensors query as to the presence of the object.
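The independent-case operating points can be reproduced by maximizing the one-stage mutual information G(u) = H(u·f1 + (1−u)·f0) − u·H(f1) − (1−u)·H(f0) over u ∈ [0, 1]; since G is strictly concave, a fine grid search suffices in place of fmincon. A numpy sketch (function names ours) using the Table I(b) sensor models:

```python
import numpy as np

def H(p):
    """Entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def operating_point(f0, f1):
    """Maximize G(u) = H(u*f1 + (1-u)*f0) - u*H(f1) - (1-u)*H(f0),
    the expected one-stage entropy reduction when querying a region
    of posterior mass u; G is strictly concave, so grid search works."""
    f0, f1 = np.asarray(f0, float), np.asarray(f1, float)
    u = np.linspace(0.0, 1.0, 10001)
    G = np.array([H(ui * f1 + (1 - ui) * f0) - ui * H(f1) - (1 - ui) * H(f0)
                  for ui in u])
    k = int(np.argmax(G))
    return u[k], G[k]

# Sensor models from Table I(b): f_m0 = outside region, f_m1 = inside region
u1, G1 = operating_point([0.79, 0.21], [0.14, 0.86])
u2, G2 = operating_point([0.79, 0.21], [0.27, 0.73])
```

This reproduces u1* ≈ 0.511, u2* ≈ 0.494, and a combined one-stage reduction G1 + G2 ≈ 0.54 bits, matching the values reported above.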
The optimal strategies at each stage n, based on p_{n−1}(x), are to find regions A1_n, A2_n ⊂ [0, 1]^2 so that

  ∫_{x ∈ A1n ∩ A2n} p_{n−1}(x) dx = u11*
  ∫_{x ∈ (A1n)^c ∩ A2n} p_{n−1}(x) dx = u01*
  ∫_{x ∈ A1n ∩ (A2n)^c} p_{n−1}(x) dx = u10*
  ∫_{x ∈ (A1n)^c ∩ (A2n)^c} p_{n−1}(x) dx = u00*
Note that there are many sensing strategies that will satisfy the above equalities. We exploit this degree of freedom to select sensing strategies that can be implemented by sensors with field of view constraints, and that aim to reduce mean square error as well as achieving optimal reduction in differential entropy. Thus, we choose our subsets to be rectangular intervals, so that sensors will observe connected regions. In addition, we choose our sensing strategies to alternate between partitions of the x-axis at odd times n, and partitions of the y-axis at even times n, dividing each axis into intervals with probabilities corresponding to u10*, u11*, u01*, u00*, and then we aggregate the appropriate regions to compute the sensing areas A1n, A2n. This construction is illustrated in Fig. 3. By alternating between axes for partition at different times, we ensure that the errors in both dimensions are reduced as the differential entropy decreases.
[Fig. 3 labels: regions Rf, Rg and Af, Ag with their intersections; segments A10, A11, A01, A00, with A1 spanning A10 ∪ A11 and A2 spanning A11 ∪ A01]
Fig. 3: Partition of a line segment into four disjointsubsets at each stage.
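The alternating-axis construction reduces to placing quantile breakpoints of the marginal posterior along the chosen axis. A numpy sketch (names ours), with segment masses ordered u10, u11, u01, u00 as in Fig. 3, so that A1 = A10 ∪ A11 and A2 = A11 ∪ A01:

```python
import numpy as np

def partition_axis(dens, x, u10, u11, u01):
    """Split the interval covered by grid x into four consecutive segments
    holding posterior masses u10, u11, u01 (and the remainder u00).
    Returns the sensing intervals A1 = A10 u A11 and A2 = A11 u A01."""
    cdf = np.cumsum(dens)
    # breakpoints at cumulative masses u10, u10+u11, u10+u11+u01
    b = np.interp([u10, u10 + u11, u10 + u11 + u01], cdf, x)
    A1 = (x[0], b[1])   # covers masses u10 + u11
    A2 = (b[0], b[2])   # covers masses u11 + u01
    return A1, A2
```

For a uniform marginal on [0, 1] with all four masses 0.25, this gives A1 = [0, 0.5] and A2 = [0.25, 0.75], i.e. overlapping intervals that both hold the prescribed posterior mass.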
For each of the sensor models, we conducted 2000 Monte Carlo experiments. In each experiment, we randomly generate an object position X ∈ X = [0, 1]^2 using a uniform distribution. We initialize our prior density for X, p0(x), as a uniform distribution; therefore, the initial differential entropy is H(p0) = −∫₀¹∫₀¹ log2(1) dx dy = 0.
At each stage n > 0, given the density p_{n−1}(x), sensing areas A1n, A2n are selected, and random measurements (y1n, y2n) are generated according to the sensor error models. These measurements are used to update the conditional density from p_{n−1}(x) to p_n(x) as indicated in (2). We continue this process until n = 20 sensing stages are completed.
For each experiment, we plot the average differential entropy H(pn) and the average mean-square error as a function of n. Fig. 4(a) contains the average differential entropy results for both the correlated and independent measurement error models. As expected, the average differential entropy for the correlated case decays faster than that for the independent case as the number of stages increases. Fig. 4(b) contains the graph of the mean squared error of the estimated object location as a function of the number of stages, as well as the lower bounds on the errors. We note the near-equivalence of the mean square error in both cases, leading to an exponential decay as a function of the number of stages. This suggests that an exponentially decaying upper bound may be possible for these algorithms, although no such bound has been established in the literature.
The second set of experiments consists of 3 sensors searching for an object in X = [0, 1]. Each sensor has observations taking 3 possible values. The sensing areas at stage n are denoted as A1n, A2n and A3n respectively. We assume that sensor 3 has two choices of precision …
[Fig. 4 plots: mean square error vs. n (correlated vs. independent real error, with lower bounds) and posterior differential entropy vs. n (correlated vs. independent)]
• Can allow for choice of sensor mode at a cost
- Changes measurement distribution …
• Previous models require parametric characterizations of uncertainty
- Are there theories that ‘learn’ the feedback strategies from data rather than deriving them from models?
• Study new class of problems: machine learning with sensing budget
- Collect …
Motivating Example
• Digit recognition: Do we need full resolution?
[Figure: digit images; annotation: “Need higher resolution”]
Supervised Learning
• Training data: features X, labels Y
• Loss: L(predicted label, true label)
• Learn a classifier: f(X) → Y
Adaptive Sensor Selection
• Goal: Learn a policy π to minimize empirical classification error plus acquisition cost
[Diagram: the policy π(current measurements, ·) either classifies using the current measurements with fs(·), or acquires new measurements and returns to the policy]
Assumption: Have training data
• Training data with maximal set of features collected
- Needed to evaluate what can be gained in performance
Define the function f_j as the classifier operating on the sensor subset s_j.

Training data: N examples, each with K features:
x_i = (x_i1, x_i2, …, x_iK), i = 1, …, N

Assume a subset of sensors/features: s_j ⊆ {1, …, K}
The cost of using sensor subset s_j for an example x_i with label y_i can then be defined:

L(f_j(x_i), y_i) = 1[f_j(x_i) ≠ y_i] + Σ_{k ∈ s_j} c_k

(classification error plus the cost of the sensors in s_j, which is fixed a priori)
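In code, the per-example cost is simply (a minimal sketch; names ours):

```python
def example_cost(f_pred, y, subset, c):
    """Cost of classifying one example with the classifier for sensor
    subset s_j: 0/1 classification error plus the a-priori fixed
    acquisition costs c_k of the sensors in the subset."""
    return float(f_pred != y) + sum(c[k] for k in subset)
```

For instance, with sensor costs c = {0: 0.5, 1: 0.3, 2: 0.2}, a misclassification using sensors {0, 2} costs 1 + 0.5 + 0.2 = 1.7, while a correct prediction using sensor {1} costs only 0.3.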
Learning decision strategies in fixed structures
• Assume sequence of potential observations is known
- Decision is whether to collect more observations …
Upper bound objectives
• Bound indicators by convex upper bound surrogates
• Problem: non-convex!
• Idea: reformulate risk before introducing surrogates
- Theorem:
  R(g1, g2, x, y) = L(f1(x))·1[g1(x) ≤ 0] + L(f2(x))·1[g1(x) > 0]·1[g2(x) ≤ 0] + L(f3(x))·1[g1(x) > 0]·1[g2(x) > 0]
  ≤ L(f1(x))·φ(g1(x)) + L(f2(x))·φ(−g1(x))·φ(g2(x)) + L(f3(x))·φ(−g1(x))·φ(−g2(x))
- where φ(z) ≥ 1[z ≤ 0]
[equation garbled in source]
Now have convex minimization
• Introducing surrogates leads to a linear program!
- Surrogate φ(z) = max(1 − z, 0)
- Approach generalizes to arbitrary lengths, as long as order is fixed
- Approach can also handle tree structures
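Because φ(z) = max(1 − z, 0) dominates the indicator 1[z ≤ 0], the surrogate risk upper-bounds the true risk for any decision functions g1, g2 and nonnegative stage losses. A small numpy check of the two risks (names ours):

```python
import numpy as np

def phi(z):
    """Hinge surrogate: phi(z) = max(1 - z, 0) >= 1[z <= 0] pointwise."""
    return np.maximum(1.0 - z, 0.0)

def true_risk(L1, L2, L3, g1, g2):
    """Empirical risk with 0/1 indicators on the two decision scores."""
    return np.mean(L1 * (g1 <= 0)
                   + L2 * (g1 > 0) * (g2 <= 0)
                   + L3 * (g1 > 0) * (g2 > 0))

def surrogate_risk(L1, L2, L3, g1, g2):
    """Convex upper bound: each indicator replaced by phi(+/-g)."""
    return np.mean(L1 * phi(g1)
                   + L2 * phi(-g1) * phi(g2)
                   + L3 * phi(-g1) * phi(-g2))
```

Since every φ factor dominates its indicator and the losses are nonnegative, the bound holds term by term for any data.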
• Key idea: Achieve performance close to that of using all the features, while reducing cost of measurement significantly
- E.g. risk-based screening at checkpoints to maintain throughput
[equation garbled in source: min over g1, g2 of the π-weighted surrogate risk with φ terms]
Budget tree experiment
• Landsat data using 4 spectral bands, each band costs 1
- Compare with competing …
Budget tree experiment 2
• Image segmentation dataset: 7 features
What about sensor selection?
[Diagram: decision functions g1(x) … g7(x) arranged in a directed acyclic graph]
Consider a directed acyclic graph (not a tree!). A decision function chooses the next sensor or to stop and classify. Can we efficiently train decision functions?
Training decision graphs
• Yes! Use backward induction (dynamic programming)
- Train end classifiers first, use those to get surrogate costs-to-go
- Recur towards the front in training
- Theorem: Policy converges to optimal …
Other active sensing problems
• Active sensing for Gaussian models
- Deterministic …
Conclusions
• Active sensing problems are increasingly important with the deployment of flexible, highly capable sensors
- Shared-aperture multi…