Page 1

Communicating with Unknown Teammates

Samuel Barrett 1, Noa Agmon 2, Noam Hazon 3, Sarit Kraus 2,4, Peter Stone 1

1 University of Texas at Austin: {sbarrett,pstone}@cs.utexas.edu
2 Bar-Ilan University: {agmon,sarit}@macs.biu.ac.il
3 Ariel University
4 University of Maryland
[email protected]

ECAI, Aug 21, 2014

S. Barrett, N. Agmon, N. Hazon, S. Kraus, P. Stone Communicating with Unknown Teammates

Page 2

Ad Hoc Teamwork

◮ Only in control of a single agent or subset of agents
◮ Unknown teammates
◮ No pre-coordination
◮ Shared goals

Examples in humans:

◮ Pick-up soccer
◮ Accident response

Page 4

Motivation

◮ Agents are becoming more common and lasting longer
  ◮ Both robots and software agents
◮ Pre-coordination may not be possible
◮ Agents should be robust to various teammates
◮ Past work focused on cases with no communication

Research Question: How can an agent act and communicate optimally with teammates of uncertain types?

Page 9

Example

[Figure: example scenario with parts labeled "Ad Hoc Agent" and "Teammates"; caption text: "How long does the first road take?"]

Page 10

Outline

1 Introduction
2 Problem Description
3 Theoretical Results
4 Empirical Results
5 Conclusions

Page 15

Problem Description

◮ Multi-armed bandit
  ◮ Two Bernoulli arms
  ◮ Ad hoc agent observes all payoffs
◮ Multi-agent
◮ Simultaneous actions
◮ Limited communication
  ◮ Fixed set of messages
  ◮ Has explicit cost
◮ Goal: Maximize payoffs and minimize communication costs
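One round of this setting can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the names `play_round`, `arm_means`, and `message_cost` are hypothetical, and charging the cost once per message against the team payoff is an assumption about how the "explicit cost" enters the objective.

```python
import random

def play_round(arm_means, choices, num_messages, message_cost, rng):
    """Simulate one round in which every agent pulls its chosen arm at once.

    arm_means    -- success probability of each Bernoulli arm (illustrative values)
    choices      -- arm index picked by each agent this round
    num_messages -- number of messages the ad hoc agent sent this round
    message_cost -- cost charged per message (assumed to be additive)
    """
    # Each pull pays 1 with the arm's success probability, else 0.
    payoffs = [1 if rng.random() < arm_means[arm] else 0 for arm in choices]
    # Net reward for the round: team payoff minus communication cost.
    net = sum(payoffs) - message_cost * num_messages
    return payoffs, net

rng = random.Random(0)
payoffs, net = play_round([0.3, 0.7], choices=[1, 1, 0], num_messages=1,
                          message_cost=0.08, rng=rng)
```

Because the ad hoc agent observes all payoffs, the full `payoffs` list is returned rather than only the caller's own result.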

Page 17

Communication

◮ Last observation - The last arm chosen and the resulting payoff
◮ Arm mean - The mean and number of pulls of a selected arm
◮ Suggestion - Suggest that your teammates should pull the selected arm
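The three message types could be represented as small records, as in the sketch below. The class and field names are hypothetical, and `merge_arm_mean` is only one plausible way a receiver might fold a reported arm mean into its own estimate (a pull-weighted average); the slides do not specify the teammates' update rule.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class LastObservation:
    arm: int      # last arm the sender pulled
    payoff: int   # resulting payoff (0 or 1 for a Bernoulli arm)

@dataclass(frozen=True)
class ArmMean:
    arm: int      # arm being reported
    mean: float   # sender's empirical mean for that arm
    pulls: int    # number of pulls behind that mean

@dataclass(frozen=True)
class Suggestion:
    arm: int      # arm the sender suggests teammates pull

Message = Union[LastObservation, ArmMean, Suggestion]

def merge_arm_mean(own_mean, own_pulls, msg):
    """Assumed receiver rule: pull-weighted average of own and reported means."""
    total = own_pulls + msg.pulls
    if total == 0:
        return 0.0, 0
    merged = (own_mean * own_pulls + msg.mean * msg.pulls) / total
    return merged, total
```

Reporting the number of pulls alongside the mean is what makes a weighted merge like this possible; a bare mean would give the receiver no way to judge how much evidence is behind it.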

Page 19

Teammates

◮ Limited number of types
◮ Continuous parameters
◮ Tightly coordinated
◮ Team shares knowledge through communication
◮ Do not need to track each agent’s pulls

Page 23

Teammate Behaviors

ε-Greedy

◮ Track arm means
◮ Usually choose greedily
◮ ε - fraction of time to explore

UCB(c)

◮ Track arm means and pulls
◮ Choose greedily with respect to bounds
◮ c - weight given to bounds

◮ Both types have a probability of following a suggestion sent by the ad hoc agent
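The two teammate behaviors can be sketched as arm-selection functions. This is an illustrative sketch under assumptions: the UCB bonus uses the common UCB1-style form `c * sqrt(ln(total_pulls) / pulls[a])`, which the slides do not spell out, and the function names and the `follow_prob` parameter are hypothetical.

```python
import math
import random

def epsilon_greedy(means, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])

def ucb(means, pulls, c):
    """Pick the arm with the highest mean plus c-weighted confidence bonus."""
    total = sum(pulls)

    def bound(a):
        if pulls[a] == 0:
            return float("inf")  # untried arms are pulled first
        # Assumed UCB1-style bonus; c scales the weight given to the bound.
        return means[a] + c * math.sqrt(math.log(total) / pulls[a])

    return max(range(len(means)), key=bound)

def teammate_choice(base_choice, suggestion, follow_prob, rng):
    """Follow the ad hoc agent's suggestion with probability follow_prob."""
    if suggestion is not None and rng.random() < follow_prob:
        return suggestion
    return base_choice
```

With `c = 0` the UCB rule reduces to purely greedy selection, and with `epsilon = 0` so does ε-greedy, which is why the ad hoc agent has to infer the parameter as well as the type.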

Page 25

Research Question

Can an ad hoc agent approximately plan to communicate optimally with these teammates in polynomial time?

Page 28

Model

◮ Model as a POMDP (teammates’ behaviors)
◮ State:
  ◮ Pulls and successes:
    ◮ Teammates’
    ◮ Ad hoc agent’s
    ◮ Communicated
  ◮ Types and parameters of teammates (partially observed)
◮ Actions are arms to choose and messages to send
◮ Transition function is based on arms’ distributions and teammates’ behaviors

Page 32

Simple Version

◮ What if we know the teammates’ behaviors?
◮ Problem simplifies to an MDP
◮ What is the size of the state space?
◮ Team is tightly coordinated ⇒ only track pulls and successes of team
◮ Track team’s, ad hoc agent’s, and communicated pulls
◮ Polynomial in terms of number of teammates and rounds
◮ Solvable in polynomial time
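A back-of-the-envelope count may help show why the state space is polynomial. This is a loose sketch under stated assumptions, not the bound from the paper: $n$ teammates, $R$ rounds, two arms, three tracked groups of counters (team, ad hoc agent, communicated), and every pull counter assumed to be at most $N = (n+1)R$.

```latex
% Number of (pulls, successes) pairs for one counter on one arm:
\#\{(p, s) : 0 \le s \le p \le N\} = \frac{(N+1)(N+2)}{2} = O(N^2)

% Three counter groups times two arms gives six such pairs:
|S| \le \left( \frac{(N+1)(N+2)}{2} \right)^{6} = O\!\left( \big((n+1)R\big)^{12} \right)
```

The exponent is a constant that depends only on the number of arms and counter groups, so the count grows polynomially in both the number of teammates and the number of rounds.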

Page 33

Full Version

◮ Do not fully know teammates’ behaviors
◮ Know teammates are either ε-greedy or UCB(c)
◮ Do not know ε or c
◮ Problem is a POMDP

Page 34

Background

◮ POMDPs can be approximately solved in polynomial time in terms of the number of δ-neighborhoods that can cover the belief space (aka the covering number)
  ◮ H. Kurniawati, D. Hsu, and W. S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008

Page 35

δ-neighborhood

[Figure: illustration of a δ-neighborhood]

Page 41

Proof Sketch

◮ Observable part of the state adds a polynomial factor
◮ Only need to worry about the partially observed teammates
◮ Belief space of ε can be represented as a beta distribution
◮ Belief space of c can be represented by the upper and lower possible values
◮ Can track probability of ε-greedy vs UCB using Bayes updates
◮ Covering number of belief space is polynomial ⇒ POMDP can be approximately solved in polynomial time
◮ Results carry over into case of unknown arm means
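The three belief components named in the proof sketch can be illustrated with small update functions. This is a simplified sketch, not the construction from the paper: it assumes each observed ε-greedy pull can be labeled as exploratory or greedy, and that for a UCB pull a `threshold` value of c (the value at which the two arms' bounds tie) has been computed elsewhere. All function and parameter names are hypothetical.

```python
def update_beta(alpha, beta, explored):
    """Beta(alpha, beta) belief over epsilon after one labeled pull (assumed observable)."""
    return (alpha + 1, beta) if explored else (alpha, beta + 1)

def update_c_interval(lo, hi, threshold, chose_bonus_arm):
    """Shrink the interval of c values consistent with one observed UCB pull.

    chose_bonus_arm is True when the team picked the arm that only wins
    if c is at least the threshold (lower mean, larger confidence bonus).
    """
    if chose_bonus_arm:
        return max(lo, threshold), hi   # c must be at least the threshold
    return lo, min(hi, threshold)       # otherwise c is at most the threshold

def update_type_prob(p_eps_greedy, lik_eps_greedy, lik_ucb):
    """Bayes update of P(teammates are epsilon-greedy) from action likelihoods."""
    num = p_eps_greedy * lik_eps_greedy
    den = num + (1.0 - p_eps_greedy) * lik_ucb
    return p_eps_greedy if den == 0 else num / den
```

Each component is summarized by a handful of numbers (two beta counts, two interval endpoints, one probability), which is the intuition behind the belief space having a small covering number.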

Page 44

Approach

◮ POMDP problem is tractable ⇒ we can use existing POMDP solvers
◮ POMCP
  ◮ Particle filtering to track beliefs
  ◮ Monte Carlo tree search to plan
  ◮ Fast
  ◮ Handles large state-action spaces
  ◮ Approximate
  ◮ D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In NIPS ’10. 2010
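The belief-tracking half of this approach can be illustrated with a rejection-style particle filter. The sketch below is not POMCP itself and is not taken from the authors' code: each particle is assumed to be a fixed hypothesis such as a `(type, parameter)` pair, and `simulate` is a hypothetical caller-supplied function that samples the teammates' action under that hypothesis.

```python
import random

def particle_filter_step(particles, simulate, observed, rng, max_tries=10000):
    """Resample particles whose simulated action matches the observed one.

    particles -- list of hypotheses about the teammates (e.g. (type, parameter))
    simulate  -- function(particle, rng) -> simulated teammate action
    observed  -- the action the teammates actually took
    """
    new_particles = []
    tries = 0
    while len(new_particles) < len(particles) and tries < max_tries:
        tries += 1
        particle = rng.choice(particles)
        # Keep the hypothesis only if it reproduces what was observed.
        if simulate(particle, rng) == observed:
            new_particles.append(particle)
    # Fall back to the old belief if no hypothesis explained the observation.
    return new_particles or list(particles)
```

Hypotheses that rarely reproduce the observed actions are sampled out over time, so the particle set concentrates on teammate types and parameters that are consistent with the history.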

Page 45

Empirical Setup

◮ Vary message costs
◮ Vary number of rounds
◮ Vary number of arms
◮ Vary number of teammates

Page 46

Ad Hoc Agent Behaviors

◮ POMCP - Plan using POMCP
◮ NoComm - Act greedily and do not communicate
◮ Obs - Act greedily and communicate the last observation

Page 47

Problem Description

◮ Problem tackled in the theory
◮ Teammates are either ε-greedy or UCB(c)
◮ Need to figure out:
  ◮ Type
  ◮ Parameter (ε or c)
  ◮ Chance of following suggestion

Page 48

ε-Greedy Teammates

[Figure: "Frac of Max Reward" (y-axis, 0.3 to 0.9) vs. "Message Cost" (x-axis, 0.08 to 2.56) for POMCP, NoComm, and Obs]

Page 49

UCB(c) Teammates

[Figure: "Frac of Max Reward" (y-axis, 0.3 to 0.9) vs. "Message Cost" (x-axis, 0.08 to 2.56) for POMCP, NoComm, and Obs]

Page 50

Unknown arms - ε-greedy or UCB(c)

[Figure: "Frac of Max Reward" (y-axis, 0.60 to 0.85) vs. "Num Teammates" (x-axis, 1 to 9) for POMCP, NoComm, Obs, and Match]

Page 52

Externally-created Teammates

◮ Teammates we did not create
◮ Created by students for a project
◮ Not necessarily tightly coordinated
◮ Not considering ad hoc teamwork

Page 53: Communicating with Unknown Teammatespstone/Papers/bib2html-links/ECAI14-Ba… · Introduction ProblemDescription TheoreticalResults EmpiricalResults Conclusions AdHocTeamwork Motivation

Introduction

Problem Description

Theoretical Results

Empirical Results

Conclusions

Setup

ε-Greedy Teammates

UCB(c) Teammates

Unknown arms

Externally-created Teammates

Externally-created Teammates

◮ True ad hoc teamwork scenario

◮ Models are incorrect

◮ Theoretical guarantees do not hold

Externally-created Teammates – Cost

[Figure: Frac. of Max Reward (y-axis, 0.0 to 0.8) versus Message Cost (x-axis: 0.08, 0.16, 0.32, 0.64, 1.28, 2.56) for POMCP, NoComm, and Obs.]

Externally-created Teammates – Num Teammates

[Figure: Frac. of Max Reward (y-axis, 0.3 to 0.8) versus Num Teammates (x-axis, 1 to 9) for POMCP, NoComm, and Obs.]

Outline

1 Introduction

2 Problem Description

3 Theoretical Results

4 Empirical Results

5 Conclusions

Related Work

◮ S. Liemhetcharat and M. Veloso. Modeling mutual capabilities in heterogeneous teams for role assignment. In IROS ’11, pages 3638–3644, 2011.

◮ F. Wu, S. Zilberstein, and X. Chen. Online planning for ad hoc autonomous agent teams. In IJCAI, 2011.

◮ M. Bowling and P. McCracken. Coordination and adaptation in impromptu teams. In AAAI, pages 53–58, 2005.

◮ J. Han, M. Li, and L. Guo. Soft control on collective behavior of a group of autonomous agents by a shill agent. Journal of Systems Science and Complexity, 19:54–62, 2006.

◮ M. Knudson and K. Tumer. Robot coordination with ad-hoc team formation. In AAMAS ’10, pages 1441–1442, 2010.

◮ E. Jones, B. Browning, M. B. Dias, B. Argall, M. M. Veloso, and A. T. Stentz. Dynamically formed heterogeneous robot teams performing tightly-coordinated tasks. In ICRA, pages 570–575, May 2006.

Conclusions

◮ Can optimally plan the best way to communicate with unknown teammates

◮ Can handle an infinite set of possible teammates

◮ Can cooperate with a variety of teammates not covered in theory

Future Work

◮ More complex domains

◮ Unknown environments

◮ Teammates that learn about us

Thank You!

In some cases, ad hoc agents can optimally plan how to communicate with their teammates.
