Page 1:

Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping

Pradeep Varakantham Singapore Management University

Joint work with J.Y.Kwak, M.Taylor, J. Marecki, P. Scerri, M.Tambe

Page 2:

Motivating Domains

Disaster Rescue
Sensor Networks

Characteristics of these domains: uncertainty, coordination of multiple agents, sequential decision making

Page 3:

Meeting the Challenges
Problem: Multiple agents coordinating to perform multiple tasks in the presence of uncertainty
Solution: Represent as Distributed POMDPs and solve. Finding an optimal solution is NEXP-complete, so we use an approximate algorithm that dynamically exploits structure in interactions.
Result: Vast improvement in performance over existing algorithms

Page 4:

Outline

Illustrative Domain

Model

Approach: Exploit dynamic structure in interactions

Results

Page 5:

Illustrative Domain
Multiple types of robots
Uncertainty in movements
Rewards: saving victims, collisions, clearing debris
Maximize expected joint reward

Page 6:

Model
DisPOMDPs with Coordination Locales (DPCL)
Joint model: <S, A, Ω, P, R, O, Ag> (sketched in code below)
Global state represents completion of tasks
Agents are independent except in coordination locales (CLs)
Two types of CLs:
Same-time CL (Ex: agents colliding with each other)
Future-time CL (Ex: a cleaner robot clearing debris assists a rescue robot in reaching the goal)
Individual observability
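To make the tuple concrete, here is a minimal sketch of how a DPCL instance might be held in code; the Python structure and all field names are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DPCL:
    agents: list           # Ag: the team of agents
    global_states: list    # S_g: task-completion part of the joint state
    local_states: dict     # S_i per agent; S = S_g x S_1 x ... x S_n
    actions: dict          # A_i per agent
    observations: dict     # Omega_i per agent
    P: Callable            # transition function P(s, a, s')
    R: Callable            # reward function R(s, a, s')
    O: Callable            # observation function O(s', a, o)
    same_time_cls: list    # STCLs: <state, action> pairs where P/R/O do not decompose
    future_time_cls: list  # FTCLs: task completions that later affect other agents
```

Per the slide, everything outside the listed CLs is treated as independent, which is what lets TREMOR solve per-agent POMDPs and pay for joint reasoning only at the CLs.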

Page 7:

Solving DPCLs with TREMOR

Teams REshaping of MOdels for Rapid execution

Two steps:
1. Branch and Bound search, using MDP-based heuristics
2. Task Assignment evaluation, by computing policies for every agent
Joint policy computation is performed only at CLs

Page 8:

1. Branch and Bound search
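One possible reading of this step in code: best-first branch and bound over candidate task assignments, pruned with optimistic MDP-based value estimates. The three parameters are hypothetical stand-ins, not the paper's API:

```python
import heapq

def branch_and_bound(assignments, mdp_upper_bound, evaluate):
    """Best-first branch and bound over candidate task assignments.
    `mdp_upper_bound` gives an optimistic value from the underlying MDP;
    `evaluate` runs the expensive task-assignment evaluation (step 2)."""
    best_value, best_assignment = float("-inf"), None
    # Order candidates by decreasing optimistic value; the index breaks ties.
    frontier = [(-mdp_upper_bound(a), i, a) for i, a in enumerate(assignments)]
    heapq.heapify(frontier)
    while frontier:
        neg_bound, _, a = heapq.heappop(frontier)
        if -neg_bound <= best_value:
            break  # no remaining candidate can beat the incumbent: prune all
        value = evaluate(a)
        if value > best_value:
            best_value, best_assignment = value, a
    return best_assignment, best_value
```

If the MDP value upper-bounds the true POMDP value, which is the usual reason for choosing such a heuristic, pruning on it never discards the best assignment.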

Page 9:

2. Task Assignment Evaluation
Until convergence of policies or maximum iterations (see the sketch after this list):
1) Solve individual POMDPs
2) Identify potential coordination locales
3) Based on the type and value of coordination, shape P and R of the relevant individual agents: capture interactions, encourage/discourage interactions
4) Go to step 1
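Read literally, the loop might be sketched as follows; solve_pomdp, find_locales, and shape_model are hypothetical stand-ins for a single-agent POMDP solver, the CL identification of Page 10, and the model shaping of Page 12:

```python
def evaluate_assignment(agent_models, solve_pomdp, find_locales, shape_model,
                        max_iters=10):
    """Sketch of the evaluation loop above; all three callables are
    placeholders, not an API from the paper."""
    policies = {i: None for i in agent_models}
    for _ in range(max_iters):
        # 1) Solve each agent's individual POMDP in isolation.
        new_policies = {i: solve_pomdp(m) for i, m in agent_models.items()}
        # 2) Identify the CLs these policies are likely to trigger.
        for cl in find_locales(agent_models, new_policies):
            # 3) Shape P and R of the agents involved, so the next round of
            #    individual solving encourages or discourages the interaction.
            for i in cl.agent_ids:  # `agent_ids` is an assumed field
                shape_model(agent_models[i], cl)
        if new_policies == policies:
            break  # policies converged
        policies = new_policies  # 4) go to step 1
    return policies
```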

Page 10:

Identifying Potential CLs
CL = <state, action>
Compute the probability of a CL occurring at a time step t, given the starting belief, via the standard belief update under the policy (a policy over belief states):
Probability of observing ω in belief state b: P(ω | b,a) = Σ_s' O(s',a,ω) Σ_s P(s,a,s') b(s)
Updating b: b'(s') = O(s',a,ω) Σ_s P(s,a,s') b(s) / P(ω | b,a)
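In code, the two formulas above, and the CL probability they support, might look as follows; `policy` maps a belief to an action and every name is illustrative. The history enumeration is exponential in t and is meant only to spell out the computation, not to be efficient:

```python
import itertools

def belief_update(b, a, w, states, P, O):
    """Standard POMDP belief update; P(s,a,s') and O(s',a,w) follow the
    signatures on the Model slide."""
    unnorm = {s2: O(s2, a, w) * sum(P(s, a, s2) * b[s] for s in states)
              for s2 in states}
    p_w = sum(unnorm.values())  # probability of observing w in belief b
    if p_w == 0.0:
        return None, 0.0  # impossible observation under this belief
    return {s2: p / p_w for s2, p in unnorm.items()}, p_w

def cl_probability(b0, policy, cl_state, cl_action, t, states, obs, P, O):
    """Probability that the agent is in cl_state and takes cl_action at
    step t, found by enumerating observation histories under the policy."""
    total = 0.0
    for history in itertools.product(obs, repeat=t):
        b, mass = dict(b0), 1.0
        for w in history:
            b, p_w = belief_update(b, policy(b), w, states, P, O)
            if p_w == 0.0:
                mass = 0.0
                break  # this observation history cannot occur
            mass *= p_w  # accumulate the history's probability
        if mass > 0.0 and policy(b) == cl_action:
            total += mass * b.get(cl_state, 0.0)
    return total
```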

Page 11:

Type of CL
STCL, if there exist s and a for which the transition/reward function is not decomposable:
P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i), a_i, (s'_g,s'_i))  OR  R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s'_g,s'_i))
FTCL, if completion of a task (global state) by an agent at t' affects the transitions/rewards of other agents at t

Page 12:

Shaping Model (STCL)
Shaping the transition function
Shaping the reward function
Joint transition probability when the CL occurs → new transition probability for agent i
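The slide's equations did not survive extraction. As a hedged sketch of what the captions describe, agent i's individual transition model could be mixed with the joint outcome that applies when the CL occurs, weighted by the CL's probability; the mixture form and all names are assumptions, not the paper's exact equation:

```python
def shape_transition(P_i, p_cl, joint_outcome_i):
    """Blend agent i's independent transition model with the (marginalized)
    joint outcome that holds when the CL occurs, weighted by the probability
    p_cl that the CL actually happens. A sketch only."""
    def P_shaped(s_i, a_i, s2_i):
        return ((1.0 - p_cl) * P_i(s_i, a_i, s2_i)
                + p_cl * joint_outcome_i(s_i, a_i, s2_i))
    return P_shaped
```

Reward shaping would follow the same pattern, mixing in the joint reward at the CL so that individual solving is encouraged or discouraged from the interaction.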

Page 13:

Results
Benchmark algorithms:
Independent POMDPs
Memory Bounded Dynamic Programming (MBDP)
Criteria: decision quality, run-time
Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon

Page 14:

State space (results plot)

Page 15:

Agents (results plot)

Page 16:

Coordination Locales (results plot)

Page 17:

Time Horizon (results plot)

Page 18:

Related Work
DEC-MDPs: assume individual or collective full observability; take task allocation and dependencies as input
DEC-POMDPs: JESP, MBDP; exploiting independence in transition/reward/observation
Model shaping: Guestrin and Gordon, 2002

Page 19:

Conclusion
DPCL, a specialization of Distributed POMDPs
TREMOR exploits the presence of few CLs in domains
TREMOR depends on single-agent POMDP solvers
Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems

Page 20:

Questions?

Page 21:

Same Time CL (STCL)
There is an STCL if any of the following fails to decompose (checked in the sketch below):
Transition function: P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i), a_i, (s'_g,s'_i))
Observation function: O(s',a,o) ≠ Π_{1≤i≤N} O(o_i, a_i, (s'_g,s'_i))
Reward function: R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s'_g,s'_i))
Ex: Two robots colliding in a narrow corridor
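A minimal sketch of the decomposability check, assuming a joint state is encoded as (s_g, (s_1, ..., s_N)) and a joint action as (a_1, ..., a_N); both callables are illustrative. The same comparison applies to R (with a sum) and O (with a product):

```python
def is_stcl(s, a, joint_P, local_P, joint_states, tol=1e-9):
    """Flag <s, a> as a same-time CL when the joint transition probability
    differs from the product of per-agent transition probabilities."""
    s_g, locals_ = s
    for s2 in joint_states:
        s2_g, locals2 = s2
        product = 1.0
        for i, a_i in enumerate(a):
            # local_P[i] takes ((s_g, s_i), a_i, (s'_g, s'_i)) per the slide
            product *= local_P[i]((s_g, locals_[i]), a_i, (s2_g, locals2[i]))
        if abs(joint_P(s, a, s2) - product) > tol:
            return True  # transitions at <s, a> do not decompose
    return False
```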

Page 22:

Future Time CL (FTCL)
Actions of one agent at t' can affect the transitions OR observations OR rewards of other agents at t:
P((s_g^t, s_i^t), a_i^t, (s'_g^t, s'_i^t) | a_j^{t'}) ≠ P((s_g^t, s_i^t), a_i^t, (s'_g^t, s'_i^t)), ∀ t' < t
R((s_g^t, s_i^t), a_i^t, (s'_g^t, s'_i^t) | a_j^{t'}) ≠ R((s_g^t, s_i^t), a_i^t, (s'_g^t, s'_i^t)), ∀ t' < t
O(ω_i^t, a_i^t, (s'_g^t, s'_i^t) | a_j^{t'}) ≠ O(ω_i^t, a_i^t, (s'_g^t, s'_i^t)), ∀ t' < t

Ex: Clearing of debris assists rescue robots in getting to victims faster

