
Introduction to Collectives

Kagan Tumer

NASA Ames Research Center

[email protected]

http://ic.arc.nasa.gov/~kagan

http://ic.arc.nasa.gov/projects/COIN/index.html

(Joint work with David Wolpert)

CDCS 2002, K. Tumer

Outline

• Introduction to collectives
– Definition / Motivation
– A naturally occurring example

• Illustration of theory of collectives I
– Central equation of collectives

• Interlude 1:
– Autonomous defects problem (Johnson and Challet)

• Illustration of theory of collectives II
– Aristocrat utility
– Wonderful life utility

• Interlude 2:
– El Farol bar problem: System equilibria and global optima
– Collective of rovers: Scientific return maximization

• Final thoughts


Motivation

• Most complex systems not only can be, but need to be, viewed as collectives. Examples include:
– Control of a constellation of communication satellites
– Routing data/vehicles over a communication network/highway
– Dynamic data migration over large distributed databases
– Dynamic job scheduling across a (very) large computer grid
– Coordination of rovers/submersibles on Mars/Europa
– Control of the elements of an amorphous computer/telescope
– Construction of parallel algorithms for optimization problems
– Autonomous defects problem



Collectives

• A Collective is
– A (perhaps massive) set of agents;
– All of which have “personal” utilities they are trying to achieve;
– Together with a world utility function measuring the full system’s performance.

• Given that the agents are good at optimizing their personal utilities, the crucial problem is an inverse problem:

How should one set (and potentially update) the personal utility functions of the agents so that they “cooperate unintentionally” and optimize the world utility?



Natural Example: Human Economy

• World utility is GDP
– Agents are the individual humans
– Agents try to maximize their own “personal” utilities

• Design problem is:
– How to modify personal utilities of the agents through incentives or regulations (e.g., tax breaks, SEC regulations against insider trading, antitrust laws) to achieve high GDP?
– Note: A. Greenspan does not tell each individual what to do.

• Economics hamstrung by “pre-set agents”
– No such restrictions for an artificial collective






Nomenclature

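• $\eta$: an agent
• $\zeta$: state of all agents across all time
• $\zeta_{\eta,t}$: state of agent $\eta$ at time t
• $\zeta_{\hat{\eta},t}$: state of all agents other than $\eta$ at time t

[Figure: diagram of the state $\zeta$ over time steps $t_0$ to $t_n$, with labeled examples $\zeta_{\eta_1,t_0}$ and $\zeta_{\hat{\eta}_4,t_0}$]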


Key Concepts for Collectives

• Intelligence: Percentage of states that would have resulted in agent having a worse utility (e.g., SAT-like percentile concept).

• Learnability: Signal-to-noise measure. Quantifies how sensitive an agent’s personal utility function is to a change in its state.

• Factoredness: Degree to which an agent’s personal utility is aligned with the world utility (e.g., quantifies “if you get rich, world benefits” concept).
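The percentile reading of intelligence can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the agent has a small finite set of candidate states, every candidate is weighted equally, and the function name `intelligence` and the toy utility are hypothetical, not taken from the talk.

```python
from typing import Callable, Sequence


def intelligence(utility: Callable[[int], float],
                 actual_state: int,
                 candidate_states: Sequence[int]) -> float:
    """Fraction of candidate states that would have given a strictly worse utility."""
    actual = utility(actual_state)
    worse = sum(1 for s in candidate_states if utility(s) < actual)
    return worse / len(candidate_states)


def toy_utility(s: int) -> float:
    """Hypothetical personal utility over five possible states, peaked at s = 3."""
    return -(s - 3) ** 2


best = intelligence(toy_utility, 3, range(5))   # agent picked the best state
worst = intelligence(toy_utility, 0, range(5))  # agent picked the worst state
```

An agent that lands on its best available state scores close to 1 (here 4 of 5 alternatives are worse), while one that lands on its worst state scores 0.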



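Central Equation of Collectives

• Our ability to control the system consists of setting some parameters s (e.g., the agents' goals):

$$P(G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s)$$

– Term 1, $P(G \mid \vec{\epsilon}_G, s)$: explore vs. exploit (operations research)
– Term 2, $P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s)$: factoredness (economics)
– Term 3, $P(\vec{\epsilon}_g \mid s)$: learnability (machine learning)
– $\vec{\epsilon}_G$ and $\vec{\epsilon}_g$ are the intelligences of the agents w.r.t. the world utility (G) and their personal utilities (g), respectively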








Autonomous Defects Problem

• Given a collection of faulty devices, how to choose the subset of those devices that, when combined with each other, gives optimal performance (Johnson & Challet).

$$G(\zeta) = \frac{\left| \sum_{j=1}^{N} n_j a_j \right|}{\sum_{k=1}^{N} n_k}$$

– $n_k$: action of agent k ($n_k \in \{0, 1\}$)
– $a_j$: distortion of component j

• Collective approach: Identify each agent with a component.
• Question: what utility should each agent try to maximize?
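To make the world utility of the defects problem concrete, here is a minimal Python sketch. It assumes G is the magnitude of the summed distortion of the selected components divided by the number of selected components; the function name and the distortion values are hypothetical.

```python
from typing import Sequence


def world_utility(actions: Sequence[int], distortions: Sequence[float]) -> float:
    """Aggregate distortion |sum_j n_j a_j| / sum_k n_k of the selected subset."""
    chosen = sum(actions)
    if chosen == 0:
        raise ValueError("at least one component must be selected")
    total = sum(n * a for n, a in zip(actions, distortions))
    return abs(total) / chosen


# Hypothetical distortions a_j for N = 4 faulty components.
distortions = [0.5, -0.25, 0.75, -0.5]

all_on = world_utility([1, 1, 1, 1], distortions)  # every component selected
pair = world_utility([1, 0, 0, 1], distortions)    # first and last cancel each other
```

Selecting components whose distortions cancel gives a lower aggregate distortion than selecting everything, which is why the choice of subset matters.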



Autonomous Defects Problem (N=100)



Autonomous Defects Problem (N=1000)



Autonomous Defects Problem: Scaling










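Personal Utility

• Recall central equation:

$$P(G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s)$$

– Term 2, $P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s)$: factoredness
– Term 3, $P(\vec{\epsilon}_g \mid s)$: learnability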

• Solve for personal utility g that maximizes learnability, while constrained to the set of factored utilities


Aristocrat Utility

• One can solve for factored U with maximal learnability, i.e., a U with good terms 2 and 3 in the central equation:

$$AU_\eta(\zeta) \equiv G(\zeta) - E[G(\zeta) \mid \zeta_{\hat{\eta}}]$$

• Intuitively, AU reflects the difference between the actual G and the average G (averaged over all actions you could take).

• For simplicity, when evaluating AU here, we make the following approximation:

$$AU_\eta(\zeta) = G(\zeta) - \sum_i p_i \, G(\zeta_{\hat{\eta}}, \mathrm{CL}_\eta^{\vec{s}_i}), \qquad p_i = \frac{1}{\text{number of possible actions for } \eta}$$
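The uniform-probability approximation of AU can be sketched as follows. This is an illustration only: the state is a plain list with one action index per agent, the other agents' actions are held fixed, and the toy world utility (number of distinct actions in use) is a hypothetical stand-in for a real G.

```python
from typing import Callable, List, Sequence


def aristocrat_utility(G: Callable[[List[int]], float],
                       state: Sequence[int],
                       agent: int,
                       actions: Sequence[int]) -> float:
    """G(state) minus the average of G over the agent's actions, each with p_i = 1/len(actions)."""
    p = 1.0 / len(actions)
    expected = 0.0
    for a in actions:
        virtual = list(state)   # other agents' states stay fixed
        virtual[agent] = a      # this agent's state is replaced by action a
        expected += p * G(virtual)
    return G(list(state)) - expected


def toy_G(state: List[int]) -> float:
    """Hypothetical world utility: number of distinct actions in use."""
    return float(len(set(state)))


# Three agents, three possible actions; agent at index 1 duplicates agent 0.
au = aristocrat_utility(toy_G, [0, 0, 1], agent=1, actions=[0, 1, 2])
```

Here the agent's actual choice does worse than its average alternative (one of its three options would have raised G), so its AU is negative.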



Clamping

• Clamping parameter $\mathrm{CL}^{\vec{v}}$: replace η’s state (taken to be unary vector) with constant vector v
• Clamping creates a new “virtual” worldline
• In general v need not be a “legal” state for η
• Example: four agents, three actions. Agent 2 clamps to “average action” vector a = (.33 .33 .33):

[Figure: state matrices of the four agents before and after clamping agent 2]
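The clamping operation in the example can be sketched in Python. The particular actions chosen by the four agents below are hypothetical, and the average action is written as exact thirds rather than the rounded .33 of the slide.

```python
from typing import List, Sequence


def clamp(state: Sequence[Sequence[float]],
          agent: int,
          v: Sequence[float]) -> List[List[float]]:
    """Return a copy of the state with one agent's row replaced by the vector v."""
    clamped = [list(row) for row in state]
    clamped[agent] = list(v)
    return clamped


# Four agents, three actions; each row is a unary (one-hot) action vector.
# The specific actions are hypothetical.
state = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
average_action = [1 / 3, 1 / 3, 1 / 3]

# Agent 2 is the second row (index 1). The result is a "virtual" state:
# the original state is left untouched.
virtual = clamp(state, 1, average_action)
```

Note that the clamped row is not a legal unary vector, which matches the remark that v need not be a legal state.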


Wonderful Life Utility

• The Wonderful Life Utility (WLU) for η is given by:

$$WLU_\eta(\zeta) \equiv G(\zeta) - G(\zeta_{\hat{\eta}}, \mathrm{CL}_\eta^{\vec{v}})$$

– Clamping to “null” action (v = 0) removes player from system (hence the name).
– Clamping to “average” action disturbs overall system minimally (can be viewed as approximation to AU).
– Theorem: WLU is factored regardless of v
– Intuitively, WLU measures the impact of agent η on the world
   • Difference between world as it is, and world without η
   • Difference between world as it is, and world where η takes average action
– WLU is a “virtual” operation. System is not re-evolved.
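A minimal Python sketch of WLU as a virtual operation is given below. It assumes the state is a matrix of unary action vectors and uses a hypothetical toy world utility (the number of actions covered by at least one agent); neither the function names nor the toy G come from the talk.

```python
from typing import Callable, List, Sequence

State = List[List[float]]


def wonderful_life_utility(G: Callable[[State], float],
                           state: Sequence[Sequence[float]],
                           agent: int,
                           v: Sequence[float]) -> float:
    """G(state) minus G of the virtual state in which the agent's row is clamped to v."""
    actual = [list(row) for row in state]
    virtual = [list(row) for row in state]
    virtual[agent] = list(v)  # only this row changes; nothing is re-simulated
    return G(actual) - G(virtual)


def toy_G(state: State) -> float:
    """Hypothetical world utility: number of actions taken by at least one agent."""
    column_sums = [sum(col) for col in zip(*state)]
    return sum(min(1, total) for total in column_sums)


state = [
    [1, 0, 0],  # agent 0 duplicates agent 2
    [0, 0, 1],  # agent 1 is the only one covering the third action
    [1, 0, 0],
]
null_action = [0, 0, 0]

wlu_redundant = wonderful_life_utility(toy_G, state, 0, null_action)
wlu_unique = wonderful_life_utility(toy_G, state, 1, null_action)
```

With the null clamp, the redundant agent gets zero credit and the uniquely useful agent gets credit equal to what the world would lose without it.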









El Farol Bar Problem

• Congestion game: A game where agents share the same action space, and world utility is a function purely of how many agents take each action.

• Illustrative example: Arthur’s El Farol bar problem:
– At each time step, each agent decides whether to attend a bar:

• If agent attends and bar is below capacity, agent gets reward

• If agent stays home and bar is above capacity, agent gets reward

– Problem is particularly interesting because rational agents cannot all correctly predict attendance:

• If most agents predict attendance will be low and therefore attend, attendance will be high

• If most agents predict high attendance and therefore do not attend …



Modified El Farol Bar Problem

• Each week agents select one of seven nights to attend a bar

$$G(\zeta) = \sum_t \sum_{k=1}^{7} x_k(\zeta_t) \, e^{-x_k(\zeta_t)/c}$$

– $x_k(\zeta_t)$: attendance for night k at week t
– c: capacity of bar
– $x_k(\zeta_t) \, e^{-x_k(\zeta_t)/c}$: reward for night k at week t
– $R_t$: reward for week t (the inner sum over the seven nights)

• Further modifications:
– Each week each agent selects two nights to attend bar.
– ...
– Each week each agent selects six nights to attend bar.
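The world utility of the modified bar problem can be evaluated directly from attendance counts. The sketch below assumes the per-night reward is attendance times exp(-attendance / capacity), summed over seven nights and then over weeks; the function names are illustrative.

```python
import math
from typing import Sequence


def weekly_reward(attendance: Sequence[float], capacity: float) -> float:
    """R_t: sum over nights of x_k * exp(-x_k / c)."""
    return sum(x * math.exp(-x / capacity) for x in attendance)


def world_utility(attendance_by_week: Sequence[Sequence[float]], capacity: float) -> float:
    """G: sum of the weekly rewards R_t over all weeks."""
    return sum(weekly_reward(week, capacity) for week in attendance_by_week)


c = 3
spread = [3, 3, 3, 3, 3, 3, 3]     # 21 agents spread evenly, 3 per night
crowded = [21, 0, 0, 0, 0, 0, 0]   # the same 21 agents all on one night

r_spread = weekly_reward(spread, c)
r_crowded = weekly_reward(crowded, c)
g_two_weeks = world_utility([spread, crowded], c)
```

Each night's reward peaks when attendance equals the capacity c, so the evenly spread week scores far higher than the week where everyone picks the same night.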


Personal Utility Functions

• Two conventional utilities:
– Uniform Division (UD): Divide each night’s total reward among all agents that attended that night (the “natural” reward)
– Team Game (TG): Total world reward at time t ($R_t$)

• Three collective-based utilities:
– WL 0 : WL utility with clamping parameter set to vector of 0s (world utility minus “world utility without me”)
– WL 1 : WL utility with clamping parameter set to vector of 1s (world utility minus “world utility where I attend every night”)
– WL a : WL utility with clamping parameter set to vector of average action (world utility minus “world utility where I do what is expected of me”)
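The five utilities can be compared on a single week with a short Python sketch. It covers only the attend-one-night case, assumes the per-night reward x * exp(-x / c), and takes the "average action" to be a uniform 1/7 on each night, which is an assumption made here for illustration rather than something the slides specify.

```python
import math
from typing import List, Sequence


def weekly_reward(attendance: Sequence[float], capacity: float) -> float:
    """World reward for one week: sum over nights of x_k * exp(-x_k / c)."""
    return sum(x * math.exp(-x / capacity) for x in attendance)


def uniform_division(attendance: Sequence[float], night: int, capacity: float) -> float:
    """UD: the chosen night's reward split equally among that night's attendees."""
    x = attendance[night]
    return x * math.exp(-x / capacity) / x


def team_game(attendance: Sequence[float], capacity: float) -> float:
    """TG: every agent receives the full weekly world reward."""
    return weekly_reward(attendance, capacity)


def wonderful_life(attendance: Sequence[float], night: int,
                   capacity: float, clamp: Sequence[float]) -> float:
    """WL: weekly reward minus the reward of the virtual week with this agent clamped."""
    virtual: List[float] = list(attendance)
    virtual[night] -= 1                                 # take the agent's real action out
    virtual = [x + v for x, v in zip(virtual, clamp)]   # put the clamp vector in
    return weekly_reward(attendance, capacity) - weekly_reward(virtual, capacity)


attendance = [5, 3, 3, 3, 3, 3, 1]  # hypothetical week, 21 agents
c = 3

wl0_crowded = wonderful_life(attendance, 0, c, [0] * 7)     # agent on the crowded night
wl0_quiet = wonderful_life(attendance, 6, c, [0] * 7)       # agent alone on the last night
wl1_quiet = wonderful_life(attendance, 6, c, [1] * 7)       # clamp: attend every night
wla_quiet = wonderful_life(attendance, 6, c, [1 / 7] * 7)   # clamp: assumed uniform average
ud_crowded = uniform_division(attendance, 0, c)
tg = team_game(attendance, c)
```

In this example WL 0 is negative for the agent on the over-capacity night and positive for the agent on the empty night, while TG gives both agents the same number and so says nothing about their individual contributions.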



Bar Problem: Utility Comparison

(Attend one night, 60 agents, c=3)


Typical Daily Bar Attendance

[Figure: daily attendance (vertical axis, 0 to 140) by day of week for WLU, TG, and UD; c=6; t=1000 s; number of agents = 168]



Scaling Properties (attend one night)

c=2,3,4,6,8,10,15, respectively



Performance vs. # of Nights to Attend

60 agents; c= 3,6,8,10,10,12,15 respectively



Collectives of Rovers

• Design a collective of autonomous agents to gather scientific information (e.g., rovers on Mars, submersibles under Europa)

– Some areas have more valuable information than others

– World Utility: Total importance-weighted information collected

– Both the individual rovers and the collective need to be flexible so they can adapt to new circumstances

– Collective-based payoff utilities result in better performance than more “natural” approaches


World Utility

• Token value function:

$$V(L, \Theta) = \sum_{x,y} \Theta_{x,y} \, \min(1, L_{x,y})$$

– $L$: Location matrix for all agents
– $L_\eta$: Location matrix of agent η
– $L^{a}_{\eta,t}$: Location matrix of agent η at time t, had it taken action a at t-1
– $\Theta$: Initial token configuration

• World Utility:

$$G(\zeta) = V(L, \Theta)$$

• Note: Agents’ payoff utilities reduce to figuring out what “L” to use.


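Payoff Utilities

• Selfish Utility:

$$SU_\eta(\zeta) = V(L_\eta, \Theta)$$

• Team Game Utility:

$$TG_\eta(\zeta) = V(L, \Theta)$$

• Collectives-Based Utility (theoretical):

$$AU_\eta(\zeta) = G(\zeta) - \sum_{\vec{a} \in \vec{A}_\eta} p_{\vec{a}} \, V(L_{\hat{\eta}} + L_\eta^{\vec{a}}, \Theta)$$

• Collectives-Based Utility (practical):

$$WLU_\eta^{\vec{0}}(\zeta) = G(\zeta) - V(L_{\hat{\eta}}, \Theta)$$

$$WLU_\eta^{\vec{a}}(\zeta) = G(\zeta) - V\Big(L_{\hat{\eta}} + \sum_{\vec{a} \in \vec{A}_\eta} p_{\vec{a}} \, L_\eta^{\vec{a}}, \, \Theta\Big)$$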
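The token value function and three of the payoff utilities can be sketched on a tiny grid. This is an illustration under assumptions: location matrices are nested lists counting visits per cell, the 2x2 grid and token values are hypothetical, and only the selfish, team game, and null-clamp WLU utilities are shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def token_value(L: Sequence[Sequence[float]], theta: Sequence[Sequence[float]]) -> float:
    """V(L, Theta): each token counts once if its cell was visited at least once."""
    return sum(theta[x][y] * min(1, L[x][y])
               for x in range(len(theta))
               for y in range(len(theta[0])))


def combined(matrices: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Element-wise sum of the agents' location matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[x][y] for m in matrices) for y in range(cols)] for x in range(rows)]


def selfish(agent_L: Sequence[Matrix], agent: int, theta: Matrix) -> float:
    """SU: value of the tokens this agent visited, ignoring everyone else."""
    return token_value(agent_L[agent], theta)


def team_game(agent_L: Sequence[Matrix], theta: Matrix) -> float:
    """TG: the world utility itself."""
    return token_value(combined(agent_L), theta)


def wlu_null(agent_L: Sequence[Matrix], agent: int, theta: Matrix) -> float:
    """WLU with null clamp: world utility minus world utility without this agent."""
    others = [m for i, m in enumerate(agent_L) if i != agent]
    return team_game(agent_L, theta) - token_value(combined(others), theta)


theta = [[5, 0], [0, 2]]          # hypothetical token values on a 2x2 grid
agent_L = [
    [[1, 0], [0, 0]],             # rover 0 visits the valuable cell
    [[1, 0], [0, 0]],             # rover 1 visits the same cell
    [[0, 0], [0, 1]],             # rover 2 is alone at the other token
]

su0 = selfish(agent_L, 0, theta)
tg = team_game(agent_L, theta)
wlu0 = wlu_null(agent_L, 0, theta)
wlu2 = wlu_null(agent_L, 2, theta)
```

The selfish utility rewards rover 0 for the valuable token even though rover 1 already covers it, whereas the null-clamp WLU gives rover 0 no credit and credits rover 2 with exactly the value it alone adds.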


Utility Comparison in Rover Domain

100 rovers on a 32x32 grid


Scaling Properties in Rover Domain



Summary

• Given a world utility, deploying RL algorithms provides a solution to the distributed design problem. But what utilities does one use?

• Theory of collectives shows how to configure and/or update the personal utilities of the agents so that they “unintentionally cooperate” to optimize the world utility

• Personal utilities based on collectives successfully applied to many domains (e.g., autonomous rovers, constellations of communication satellites, data routing, autonomous defects)

• Performance gains due to using collectives-based utilities increase with size of problem

• A fully fleshed science of collectives would benefit from and have applications to many other sciences
