Optimal Synchronization and Games on Graphs

UTA Research Institute (UTARI), The University of Texas at Arlington

F.L. Lewis, Moncrief-O’Donnell Endowed Chair

Head, Controls & Sensors Group

Optimal Design for Synchronization & Games on Communication Graphs

Supported by AFOSR, NSF, ARO. Thanks to Jie Huang


F.L. Lewis

http://ARRI.uta.edu/acs

Cooperative Control Synchronization: Optimal Design and Games on Communication Graphs

Supported by: NSF (Paul Werbos), ARO, AFOSR, NNSF of China, China Project 111 at NEU

Invited by Derong Liu

Thanks to Cesare Alippi, Zhang Huaguang

He who exerts his mind to the utmost knows nature’s pattern.

The way of learning is none other than finding the lost mind.

Meng Tz (Mencius), 500 BC

Man’s task is to understand patterns in nature and society.

Kung Tz (Confucius), 500 BC

Archery, Chariot driving

Music, Rites and Rituals

Poetry, Mathematics

孔子 (Confucius): Man’s relations to Family, Friends, Society, Nation, Emperor, Ancestors

Outline

Optimal Design for Synchronization of Cooperative Systems

Distributed Observer and Dynamic Regulator

Discrete-time Optimal Design for Synchronization

Graphical Games

Control Design Methods for Multi‐Agent Systems

Acks. to:
Guanrong Chen – Pinning control
Lihua Xie – Local nbhd. tracking error
Zhihua Qu – Lyapunov eq. for di-graphs

Books Coming

F.L. Lewis, H. Zhang, A. Das, K. Hengster-Movric, Cooperative Control of Multi-Agent Systems: Optimal Design and Adaptive Control, Springer-Verlag, 2013, to appear.

Key Point

Lyapunov Functions and Performance Indices must depend on graph topology

Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

Outline
Cooperative Control
Locally Optimal Design and Synchronization
Globally Optimal Design for Collective Motion
Multi‐player Games on Communication Graphs
Reinforcement Learning for Game Solutions

J.J. Finnigan, Complex science for a complex world

The Internet

Ecosystem

Professional collaboration network

Barcelona rail network

Structure of Natural and Manmade Systems

Local nature of physical laws; peer-to-peer relationships in networked systems

Clusters of galaxies

Synchronized Motion of Biological Groups

Fish school

Birds flock

Locusts swarm

Fireflies synchronize

The Power of Synchronization: Coupled Oscillators, Diurnal Rhythm

Outline

A. Stable Design for Synchronization of Cooperative Systems

B. Global Optimal Design for Collective Group Motion

Stability vs. Optimality of Cooperative Control

Issues: For cooperative control on graphs -

Local stability of each agent is NOT the same as stable synchronization of the team

Local optimality of each agent is NOT the same as global optimality of the team

[Figure: directed communication graph with six nodes, numbered 1 to 6]

Diameter = length of the longest shortest path between two nodes

Volume = sum of in-degrees: $\mathrm{Vol} = \sum_{i=1}^{N} d_i$

Spanning tree, root node

Strongly connected if for all nodes i and j there is a path from i to j.

Tree: every node except the root has in-degree = 1

Leader or root node; followers

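Communication Graph

G = (V, E), N nodes

[Figure: six-node directed graph, with the edge weight $a_{42}$ marked]

Adjacency matrix $A = [a_{ij}]$, with $a_{ij} > 0$ if $(v_j, v_i) \in E$, i.e. if $j \in N_i$

$$A = \begin{bmatrix} 0&0&1&0&0&0 \\ 1&0&0&0&0&1 \\ 1&1&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&1&1&0 \end{bmatrix}$$

$N_i$ = in-neighbors of node i. Row sum = in-degree: $d_i = \sum_{j=1}^{N} a_{ij}$

$N_i^o$ = out-neighbors of node i. Column sum = out-degree: $d_i^o = \sum_{j=1}^{N} a_{ji}$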

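Dynamic Graph: the Distributed Structure of Control

Each node has an associated state: $\dot{x}_i = u_i$

Standard local voting protocol: $u_i = \sum_{j \in N_i} a_{ij}(x_j - x_i)$

$$u_i = -x_i \sum_{j \in N_i} a_{ij} + \sum_{j \in N_i} a_{ij} x_j = -d_i x_i + \begin{bmatrix} a_{i1} & \cdots & a_{iN} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}$$

$$u = -Dx + Ax = -(D - A)x = -Lx$$

L = D - A = graph Laplacian matrix, with $x = [x_1 \ \cdots \ x_N]^T$, $u = [u_1 \ \cdots \ u_N]^T$, $D = \mathrm{diag}\{d_i\}$

Closed-loop dynamics: $\dot{x} = -Lx$

If $x_i$ is an n-vector then $\dot{x} = -(L \otimes I_n)x$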

Communication graph G = (V, E), N nodes, $A = [a_{ij}]$

State at node i is $x_i(t)$

Synchronization problem: $x_i(t) - x_j(t) \to 0$

Theorem. Graph contains a spanning tree iff the e-val of L at $\lambda_1 = 0$ is simple.

Then $\mathrm{Re}\,\lambda_2 > 0$

Then -L has one e-val at zero and all the rest stable

Then, all states synchronize using the local voting protocol

Graph strongly connected implies there exists a spanning tree

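Consensus Value and Convergence Rate

Closed-loop system with local voting protocol: $\dot{x} = -Lx$, Laplacian matrix L = D - A

Modal decomposition:
$$x(t) = e^{-Lt}x(0) = \sum_{i=1}^{N} v_i e^{-\lambda_i t} w_i^T x(0) = \sum_{i=1}^{N} \big(w_i^T x(0)\big) e^{-\lambda_i t} v_i$$

Let $\lambda_1 = 0$ be simple. Then for large t
$$x(t) \to v_2 e^{-\lambda_2 t} w_2^T x(0) + v_1 e^{-\lambda_1 t} w_1^T x(0) = v_2 e^{-\lambda_2 t} w_2^T x(0) + \underline{1} \sum_{j=1}^{N} \gamma_j x_j(0)$$

where $v_i$, $w_i$ are the right and left e-vectors of L, and $w_1 = [\gamma_1 \ \cdots \ \gamma_N]^T$ is normalized so that $w_1^T \underline{1} = 1$

$\lambda_2$ determines the rate of convergence and is called the FIEDLER e-value

There is a big push to find expressions for the left e-vector for $\lambda_1 = 0$ and the Fiedler e-val $\lambda_2$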

Convergence Value and Rate

Let graph have a spanning tree. Then all nodes reach consensus.

Closed‐loop system with local voting protocol: $\dot{x} = -Lx$

L has e‐val at zero: $\lambda_1 = 0$ is simple if the graph is strongly connected

Modal decomposition:
$$x(t) = e^{-Lt}x(0) = \sum_{i=1}^{N} v_i e^{-\lambda_i t} w_i^T x(0) = \sum_{i=1}^{N} \big(w_i^T x(0)\big) e^{-\lambda_i t} v_i$$

Let $\lambda_1 = 0$ be simple. Then for large t
$$x(t) \to v_2 e^{-\lambda_2 t} w_2^T x(0) + v_1 e^{-\lambda_1 t} w_1^T x(0) = v_2 e^{-\lambda_2 t} w_2^T x(0) + \underline{1} \sum_{j=1}^{N} \gamma_j x_j(0)$$

$\lambda_2$ determines the rate of convergence ‐ Fiedler e‐value

$w_1 = [\gamma_1 \ \gamma_2 \ \cdots \ \gamma_N]^T$ (left e-vector for $\lambda_1 = 0$, normalized so that $w_1^T \underline{1} = 1$) determines the consensus value in terms of the initial conditions

Depends on Communication Graph Topology. No freedom to determine the consensus value

We call this the Cooperative Regulator Problem
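The modal decomposition above can be checked numerically. The following is a minimal sketch, assuming the six-node adjacency matrix shown on the Communication Graph slide; the initial state `x0`, the helper `simulate`, and the forward-Euler integration are illustrative choices, not part of the talk.

```python
import numpy as np

# Adjacency matrix of the six-node example graph (a_ij = 1 if node j sends to node i).
A = np.array([
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))      # in-degree matrix (row sums)
L = D - A                       # graph Laplacian

# Eigenvalues of L sorted by real part: lambda_1 = 0 first, then the Fiedler e-value.
eigvals = np.linalg.eigvals(L)
eigvals = eigvals[np.argsort(eigvals.real)]

# Left e-vector w1 of L for lambda_1 = 0, normalized so that w1^T 1 = 1.
vals_T, vecs_T = np.linalg.eig(L.T)
w1 = np.real(vecs_T[:, np.argmin(np.abs(vals_T))])
w1 = w1 / w1.sum()

def simulate(x0, t_final=40.0, dt=1e-3):
    """Integrate x' = -L x with forward Euler (illustrative, not efficient)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(t_final / dt)):
        x = x + dt * (-L @ x)
    return x

x0 = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 4.0])   # arbitrary initial states (assumed)
x_final = simulate(x0)
consensus_value = w1 @ x0       # predicted consensus: sum_j gamma_j x_j(0)
```

For this graph the normalized left e-vector works out to `[5, 2, 3, 1, 1, 1] / 13`, so nodes with more influence on the others carry more weight in the consensus value.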

Graph Eigenvalues for Different Communication Topologies

[Figure: six-node example graphs and their eigenvalue plots]

Directed tree: chain of command

Directed ring: gossip network, OSCILLATIONS

Directed graph: better conditioned

Undirected graph: more ill-conditioned

Synchronization on Good Graphs (Chris Elliott fast video)

[Figure: mesh graph, 4 neighbors]

Synchronization on Gossip Rings (Chris Elliott weird video)

[Figure: ring graph or cycle]

10 nodes

These beautiful pictures are from a lecture by Ron Chen, City U. Hong Kong: Pinning Control of Graphs

Natural and biological structures

Locally Optimal Design and Synchronization

Controlled Consensus: Cooperative Tracker

Node state: $\dot{x}_i = u_i$

Distributed local voting protocol with control node v:
$$u_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + b_i(v - x_i)$$
$$u_i = -\Big(b_i + \sum_{j \in N_i} a_{ij}\Big) x_i + \sum_{j \in N_i} a_{ij} x_j + b_i v$$

$b_i \neq 0$ if control v is in the neighborhood of node i, $B = \mathrm{diag}\{b_i\}$

Closed loop: $\dot{x} = -(L + B)x + B\underline{1}v$

Theorem. Let graph have a spanning tree and $b_i \neq 0$ for at least one root node. Then L+B is nonsingular with all e-vals in the open right half plane, and -(L+B) is asymptotically stable

Control node v (Ron Chen – pinning control)
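As a small numerical illustration of the pinning theorem, the sketch below reuses the six-node adjacency matrix from the Communication Graph slide and pins only node 1. The pinning gain, the command value `v`, and the variable names are assumptions made for the example.

```python
import numpy as np

# Six-node example graph (a_ij = 1 if node j sends to node i).
A = np.array([
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Pin a single node: b_1 = 1, all other b_i = 0 (hypothetical choice).
b = np.zeros(6)
b[0] = 1.0
B = np.diag(b)

# Theorem check: every e-val of L + B should lie in the open right half plane.
eig_LB = np.linalg.eigvals(L + B)

# Steady state of x' = -(L + B) x + B 1 v is the solution of (L + B) x = B 1 v.
v = 2.5                                        # constant command value (assumed)
x_ss = np.linalg.solve(L + B, B @ np.ones(6) * v)
```

Because $L\underline{1} = 0$, the steady state is $x = \underline{1}v$: every node tracks the control node even though only one node measures it.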

Agent Dynamics and Local Feedback Design

$$\dot{x}_i = Ax_i + Bu_i, \qquad A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Local feedback: $u_i = Kx_i$, $K = [0.5 \ \ 0.5]$

Couple 6 agents with communication graph

Local neighborhood tracking error, with $x_0$ the c.g. leader:
$$\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$$

Control: $u_i = K\varepsilon_i$

[Figure: x-y plot of the agents’ trajectories] Nodes synchronize to consensus heading

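Agent Dynamics and Local Feedback Design

$$\dot{x}_i = Ax_i + Bu_i, \qquad A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Local feedback: $u_i = Kx_i$, $K = [0.5 \ \ 0.5]$

ADD another comm. link: more information flow

$$\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i), \qquad u_i = K\varepsilon_i$$

[Figure: x-y plot of the agents’ trajectories] Causes Unstable Formation!

WHY?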

We want Design Freedom that overcomes graph topology constraints

Decouple Control Design from Graph Topology constraints

Guaranteed synchronization for general Directed graphs

Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

Guaranteed stability for continuous-time multi-agent systems on graphs -

A. STABLE DESIGN FOR COOPERATIVE CONTROL ON GRAPHS

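A. State Feedback Design for Cooperative Systems on Graphs

Cooperative Regulator vs. Cooperative Tracker problem

N nodes with dynamics: $\dot{x}_i = Ax_i + Bu_i$, $x_i \in R^n$, $u_i \in R^m$

Control node or command generator (exosystem): $\dot{x}_0 = Ax_0$

Synchronization tracker design problem: $x_i(t) \to x_0(t), \ \forall i$

Local neighborhood tracking error (Ron Chen: pinning control; Lihua Xie: error):
$$\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$$

Overall error vector, with L = D - A and $G = \mathrm{diag}\{g_i\}$:
$$\varepsilon = -\big((L+G) \otimes I_n\big)(x - \underline{x}_0) = -\big((L+G) \otimes I_n\big)\delta$$

where $\varepsilon = [\varepsilon_1^T \ \varepsilon_2^T \ \cdots \ \varepsilon_N^T]^T \in R^{nN}$, $\underline{x}_0 = \underline{I}x_0 \in R^{nN}$, $\underline{I} = \underline{1} \otimes I_n \in R^{nN \times n}$

Consensus or synchronization error: $\delta = x - \underline{x}_0 \in R^{nN}$

$\varepsilon$ = local quantity, $\delta$ = global quantity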


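Local quantity, global quantity: local control objectives imply global performance

Local Neighborhood Tracking Error

Coop. nbhd SVFB:
$$u_i = cK\varepsilon_i = cK\Big[\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big]$$

Closed loop system:
$$\dot{x}_i = Ax_i + Bu_i = Ax_i + cBK\Big[\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big]$$

Overall c.l. dynamics, with overall state $x = [x_1^T \ x_2^T \ \cdots \ x_N^T]^T \in R^{nN}$ and $\underline{x}_0 = \underline{I}x_0$:
$$\dot{x} = \big[(I_N \otimes A) - c(L+G) \otimes BK\big]x + \big[c(L+G) \otimes BK\big]\underline{x}_0$$

Global synch. error dynamics (Fax and Murray 2004):
$$\dot{\delta} = \big[(I_N \otimes A) - c(L+G) \otimes BK\big]\delta$$

In $c(L+G) \otimes BK$, the factor $(L+G)$ is the graph structure and $BK$ is the control structure: MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE

Distributed form of control: $u = -c\big((L+G) \otimes K\big)\delta$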

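The key to global stability and synchronization of the collective is locally optimal design for each agent (Lewis and Syrmos 1995)

DECOUPLES CONTROL DESIGN FROM COMMUNICATION GRAPH STRUCTURE

OPTIMAL design at each node: LOCAL OPTIMAL DESIGN guarantees global synchronization

The gain K is the one that minimizes, for each agent alone,
$$J_i = \frac{1}{2}\int_0^\infty \big(x_i^T Q x_i + u_i^T R u_i\big)\,dt$$
and it is then used in the cooperative control
$$u_i = cK\varepsilon_i = cK\Big[\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big]$$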

Optimal Control, 3rd ed., Lewis, Vrabie, Syrmos, 2012

S. Tuna, “LQR-based coupling gain for synchronization of linear systems,” Arxiv preprint arXiv:0801.3390, 2008. (Emre Tuna 2008 paper online)

Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

OPTIMAL design at each node gives global guaranteed performance on any strongly connected communication graph

Li, Duan, Chen: Finsler’s Lemma

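Example: Unbounded Region of Consensus for Optimal Feedback Gains.

Example from [Li, Duan, Chen 2009]:
$$A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

a. Bounded Consensus Region for Arbitrarily Chosen Stabilizing SVFB Gain: $K = [0.5 \ \ 0.5]$

b. Unbounded Consensus Region for Optimal SVFB Gain: Q = I, R = 1, $K = [1.544 \ \ 1.8901]$

[Figure: two consensus regions in the complex plane, axes Re{$\lambda$} and Im{$\lambda$}, showing where $A - c\lambda BK$ is stable for $\lambda$ = e-vals of (L+G)]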

Results:

Local Riccati Design yields guaranteed stable synchronization

Decouples Controls Design from Graph Properties
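The local Riccati design can be reproduced numerically. This is a minimal sketch, assuming the example matrices with the minus signs as reconstructed ($A = [-2, -1;\ 2, 1]$, $B = [1;\ 0]$) and Q = I, R = 1; the helper `solve_care` (a textbook Hamiltonian-eigenvector method) and the sampled grid of `sigmas` are illustrative and not taken from the talk. It recovers the optimal gain quoted on the slide and spot-checks that $A - \sigma BK$ stays stable for sample points with $\mathrm{Re}\,\sigma \ge 1/2$, where $\sigma$ stands for c times an e-val of (L+G).

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Stabilizing solution of A'P + PA + Q - P B R^-1 B' P = 0 (Hamiltonian method)."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
                  [-Q, -A.T]])
    vals, vecs = np.linalg.eig(H)
    stable = vecs[:, vals.real < 0]          # basis of the stable invariant subspace
    X1, X2 = stable[:n, :], stable[n:, :]
    return np.real(X2 @ np.linalg.inv(X1))

# Example system; signs of A are an assumption (lost in the slide extraction).
A = np.array([[-2.0, -1.0], [2.0, 1.0]])
B = np.array([[1.0], [0.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_care(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)              # local optimal gain K = R^-1 B' P

def is_hurwitz(M):
    """True if all eigenvalues of M have negative real part."""
    return bool(np.all(np.linalg.eigvals(M).real < 0))

# Sample points sigma with Re(sigma) >= 1/2; a spot check, not a proof.
sigmas = [complex(re, im)
          for re in (0.5, 1.0, 5.0, 50.0)
          for im in (-40.0, -3.0, 0.0, 3.0, 40.0)]
region_ok = all(is_hurwitz(A - s * (B @ K)) for s in sigmas)
```

The computed `K` agrees with the slide value $[1.544 \ \ 1.8901]$ to the printed precision, which is also the check used here to settle the signs of A.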

Globally Optimal Design for Collective Group Motions

Outline

A. Stable Design for Synchronization of Cooperative Systems

B. Optimal Design for Collective Group Motion

Stability vs. Optimality of Cooperative Control

Issues: For cooperative control on graphs -

Local stability of each agent is NOT the same as stable synchronization of the team

Local optimality of each agent is NOT the same as global optimality of the team

Have seen that LOCAL OPTIMAL DESIGN Guarantees Global Synchronization

The method just shown guarantees synchronization on arbitrary graphs. It is a LOCAL OPTIMAL DESIGN at each agent

B. GLOBAL OPTIMAL DESIGN FOR COLLECTIVE MOTION ON GRAPHS

What about Global Optimality of cooperative control on graphs?

Problem- the global optimal control is not distributed

The global optimal control is generally distributed only on a complete graph – Wei Ren

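Agent dynamics: $\dot{x}_i = Ax_i + Bu_i$, $x_i \in R^n$

Global dynamics:
$$\dot{x} = (I_N \otimes A)x + (I_N \otimes B)u = \bar{A}x + \bar{B}u, \qquad \dot{\delta} = (I_N \otimes A)\delta + (I_N \otimes B)u = \bar{A}\delta + \bar{B}u$$
with $\delta = x - \underline{x}_0$, $\underline{x}_0 = \underline{I}x_0$

LQR:
$$J = \frac{1}{2}\int_0^\infty \big(\delta^T Q \delta + u^T R u\big)\,dt$$

ARE: $\bar{A}^T P + P\bar{A} + Q - P\bar{B}R^{-1}\bar{B}^T P = 0$

Control $u = -R^{-1}\bar{B}^T P\delta$ is distributed only on a complete graph (Wei Ren)

BUT a distributed control must have the form $u = -c\big((L+G) \otimes K\big)\delta$

So Q and R must depend on the graph topology

Inverse Optimality: given A, B, and the distributed control form $u = -c\big((L+G) \otimes K\big)\delta$, find Q and R such that the ARE (LQR case) holds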

Kristian Hengster-Movric


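B.1 Optimal Cooperative Tracker for Single-Integrator Dynamics

System: $\dot{x}_i = u_i$, $x_i \in R$; globally $\dot{x} = u$ with $x = [x_1 \ \cdots \ x_N]^T$, $u = [u_1 \ \cdots \ u_N]^T$

Leader node: $\dot{x}_0 = 0$

Local nbhd tracking error:
$$\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i), \qquad \varepsilon = [\varepsilon_1 \ \cdots \ \varepsilon_N]^T = -(L+G)\delta, \qquad G = \mathrm{diag}\{g_i\}$$

Global disagreement error: $\delta = x - \underline{1}x_0$

Control: $u_i = \varepsilon_i$, i.e. $u = -(L+G)\delta$

Closed‐loop system: $\dot{\delta} = -(L+G)\delta$

No control structure here. Focus on graph structure

ARE for $\dot{\delta} = u$: $A^T P + PA + Q - PBR^{-1}B^T P = 0$ with A = 0, B = I

Use the local nbhd tracking error in the cost function! This puts Q in terms of (L+G)

Condition on graph topology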



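Cooperative Tracker for Identical LTI Dynamics

System: $\dot{x}_i = Ax_i + Bu_i$, $x_i \in R^n$; globally $\dot{x} = (I_N \otimes A)x + (I_N \otimes B)u$

Leader: $\dot{x}_0 = Ax_0$, $\underline{x}_0 = \underline{I}x_0$

Local nbhd tracking error:
$$\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i), \qquad \varepsilon = [\varepsilon_1^T \ \cdots \ \varepsilon_N^T]^T = -\big((L+G) \otimes I_n\big)\delta$$

Global disagreement error dynamics: $\dot{\delta} = (I_N \otimes A)\delta + (I_N \otimes B)u$

Control:
$$u_i = cK_2\varepsilon_i = cK_2\Big[\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big], \qquad u = -c\big((L+G) \otimes K_2\big)\delta$$

Closed‐loop system:
$$\dot{x} = \big[(I_N \otimes A) - c(L+G) \otimes BK_2\big]x + \big[c(L+G) \otimes BK_2\big]\underline{x}_0$$
$$\dot{\delta} = \big[(I_N \otimes A) - c(L+G) \otimes BK_2\big]\delta$$

Condition on the coupling gain:
$$c > \frac{\sigma_{\max}\big(R_1(L+G) \otimes (Q_2 - K_2^T R_2 K_2)\big)}{\sigma_{\min}\big((L+G)^T R_1 (L+G) \otimes K_2^T R_2 K_2\big)}$$

Q depends on Graph topology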

TWO CONDITIONS

Global ARE: $(I_N \otimes A)^T P + P(I_N \otimes A) + Q - P(I_N \otimes B)R^{-1}(I_N \otimes B)^T P = 0$, with $P = P_1 \otimes P_2$

A new condition on the graph: $P_1 = cR_1(L+G)$

The local optimal design from before: $A^T P_2 + P_2 A + Q_2 - P_2 B R_2^{-1} B^T P_2 = 0$

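Proof:

System: $\dot{\delta} = (I_N \otimes A)\delta + (I_N \otimes B)u = \bar{A}\delta + \bar{B}u$

ARE: $\bar{A}^T P + P\bar{A} + Q - P\bar{B}R^{-1}\bar{B}^T P = 0$

Select $P = P_1 \otimes P_2$, $R = R_1 \otimes R_2$. The ARE becomes
$$P_1 \otimes A^T P_2 + P_1 \otimes P_2 A + Q - (P_1 \otimes P_2 B)(R_1^{-1} \otimes R_2^{-1})(P_1 \otimes B^T P_2) = 0$$
$$P_1 \otimes (A^T P_2 + P_2 A) + Q - P_1 R_1^{-1} P_1 \otimes P_2 B R_2^{-1} B^T P_2 = 0$$

Choose Q, with $K_2 = R_2^{-1} B^T P_2$:
$$\begin{aligned}
Q &= c^2 \big((L+G) \otimes K_2\big)^T (R_1 \otimes R_2) \big((L+G) \otimes K_2\big) - cR_1(L+G) \otimes (A^T P_2 + P_2 A) \\
&= c^2 (L+G)^T R_1 (L+G) \otimes K_2^T R_2 K_2 + cR_1(L+G) \otimes (Q_2 - P_2 B R_2^{-1} B^T P_2) \\
&= P_1 R_1^{-1} P_1 \otimes P_2 B R_2^{-1} B^T P_2 + P_1 \otimes (Q_2 - P_2 B R_2^{-1} B^T P_2) \\
&= Q_1 \otimes P_2 B R_2^{-1} B^T P_2 + P_1 \otimes (Q_2 - P_2 B R_2^{-1} B^T P_2)
\end{aligned}$$
where $Q_1 = c^2 (L+G)^T R_1 (L+G) = P_1 R_1^{-1} P_1$

With this Q the ARE reads
$$P_1 \otimes (A^T P_2 + P_2 A + Q_2 - P_2 B R_2^{-1} B^T P_2) + (Q_1 - P_1 R_1^{-1} P_1) \otimes P_2 B R_2^{-1} B^T P_2 = 0$$
which holds under the 2 conditions $P_1 = cR_1(L+G)$ and $A^T P_2 + P_2 A + Q_2 - P_2 B R_2^{-1} B^T P_2 = 0$

Control:
$$u = -R^{-1}\bar{B}^T P\delta = -(R_1^{-1} \otimes R_2^{-1})(I_N \otimes B^T)(P_1 \otimes P_2)\delta = -(R_1^{-1} P_1 \otimes R_2^{-1} B^T P_2)\delta = -c\big((L+G) \otimes K_2\big)\delta$$

Distributed !!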


Two Conditions for global optimal design on the graph

1. Condition on graph topology: $P_1 = cR_1(L+G)$ for some $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$

2. Local agent control design condition (same as before, local optimal control): $A^T P_2 + P_2 A + Q_2 - P_2 B R_2^{-1} B^T P_2 = 0$ for some $P_2 = P_2^T > 0$, $R_2 = R_2^T > 0$, $Q_2 = Q_2^T > 0$. Always holds if (A,B) reachable

Locally optimal design is also globally optimal on the graph if condition 1 holds

Condition on Graph Topology

$P_1 = cR_1(L+G)$, with $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$. Equivalent to $R_1(L+G) = (L+G)^T R_1$

1. Undirected Graphs: $(L+G)^T = L+G$

The condition becomes a Commutativity Requirement: $R_1(L+G) = (L+G)R_1$

Iff $R_1$, $(L+G)$ have the same eigenvectors

Case 1. $R_1 = I$. For single-integrator dynamics:
$$J(\delta_0, u) = \int_0^\infty \big(\delta^T (L+G)^T (L+G)\delta + u^T u\big)\,dt = \int_0^\infty \big(\varepsilon^T \varepsilon + u^T u\big)\,dt$$

Case 2. Let $L + G = T\Lambda T^T$ (Jordan form). Select $R_1 = T\Theta T^T$ for any $\Theta > 0$ diagonal

R depends on graph topology: ALL e-vectors


2. Detail Balanced Graphs

$\gamma_i e_{ij} = \gamma_j e_{ji}$ for some $\gamma_1, \ldots, \gamma_N > 0$

Then $[\gamma_1 \ \cdots \ \gamma_N]^T$ is a left eigenvector for L for e-val = 0

$L = DP$, $D = \mathrm{diag}\{1/\gamma_i\} > 0$, with P a symmetric graph Laplacian matrix

$$L + G = DP + G = D(P + D^{-1}G) = D\bar{P}$$

$$\bar{P} = D^{-1}(L+G) = R_1(L+G)$$

Detail balanced implies reversibility of an associated Markov Process

Detail balanced implies balanced

$P_1 = cR_1(L+G)$, $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$. Equivalent to $R_1(L+G) = (L+G)^T R_1$

R depends on graph topology: principal left e-vector

3. Directed Graphs with Simple Graph Laplacian L+G

$T(L+G)T^{-1} = \Lambda$, diagonal Jordan form (Dennis Bernstein, Matrix book)

Select $R_1 = T^T \Theta T = R_1^T > 0$, with $\Theta > 0$ diagonal

$$\big(T(L+G)T^{-1}\big)^T = T^{-T}(L+G)^T T^T = \Lambda$$

$$T^T \Theta T (L+G) = (L+G)^T T^T \Theta T$$

$P_1 = cR_1(L+G)$, $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$. Equivalent to $R_1(L+G) = (L+G)^T R_1$

R depends on graph topology: ALL e-vectors

A new class of digraphs: $(L+G)^T = R(L+G)R^{-1}$, $R = R^T > 0$

A.2 Discrete‐Time Optimal Design for Synchronization

Distributed systems: $x_i(k+1) = Ax_i(k) + Bu_i(k)$

Command generator: $x_0(k+1) = Ax_0(k)$

Local nbhd tracking error: $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$

Local cooperative SVFB, weighted: $u_i = c(1 + d_i + g_i)^{-1} K\varepsilon_i$

Local closed‐loop dynamics: $x_i(k+1) = Ax_i(k) + c(1 + d_i + g_i)^{-1} BK\varepsilon_i(k)$

Global disagreement error $\delta(k) = x(k) - \underline{x}_0(k)$, with dynamics
$$\delta(k+1) = A_c\delta(k) = \big[I_N \otimes A - c(I + D + G)^{-1}(L+G) \otimes BK\big]\delta(k)$$

Weighted graph matrix: $\Gamma = (I + D + G)^{-1}(L+G)$

Weighted graph eigenvalues: $\Lambda_k$, $k = 1, \ldots, N$

Decouple controls design from graph topology

K. Hengster-Movric, Keyou You, F.L. Lewis, and Lihua Xie, “Synchronization of Discrete-Time Multi-agent Systems on Graphs Using Riccati Design,” Automatica, to appear.
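To make the weighted graph matrix concrete, here is a minimal sketch that builds $\Gamma = (I + D + G)^{-1}(L+G)$ for the six-node example graph and computes its eigenvalues. The pinning of node 1 only, and the simple (not necessarily minimal) covering circle centered on the real axis, are assumptions made for the illustration.

```python
import numpy as np

# Six-node example graph (a_ij = 1 if node j sends to node i).
A_adj = np.array([
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A_adj.sum(axis=1)                          # in-degrees d_i
g = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # pinning gains g_i (only node 1 pinned, assumed)
N = len(d)
L = np.diag(d) - A_adj
G = np.diag(g)

# Weighted graph matrix Gamma = (I + D + G)^-1 (L + G) and its eigenvalues Lambda_k.
Gamma = np.linalg.solve(np.eye(N) + np.diag(d) + G, L + G)
Lam = np.linalg.eigvals(Gamma)

# A covering circle of the eigenvalues: center c0 on the real axis, radius r0.
c0 = 0.5 * (Lam.real.max() + Lam.real.min())
r0 = np.abs(Lam - c0).max()
```

The normalization by $(1 + d_i + g_i)$ keeps every $\Lambda_k$ strictly inside the unit disk centered at 1 (a Gershgorin argument, given that L+G is nonsingular), so the graph enters the design only through the ratio $r_0 / c_0$ of a covering circle.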

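Synchronization error dynamics:
$$\delta(k+1) = A_c\delta(k) = \big[I_N \otimes A - c(I + D + G)^{-1}(L+G) \otimes BK\big]\delta(k)$$

Weighted graph matrix: $\Gamma = (I + D + G)^{-1}(L+G)$

Weighted graph eigenvalues: $\Lambda_k$, $k = 1, \ldots, N$

MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE

[Figure: complex plane with a circle of radius r around 1, and a covering circle with center $c_0$ and radius $r_0$]

Covering circle of graph eigenvalues (graph props.)

Synchronization region contains this circle (ctrl design)

Kristian Movric: Decouple Controls Design From Graph Topology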

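Single‐Input case with Real Graph Eigenvalues

Condition:
$$\frac{r_0}{c_0} < r := \Big[\sigma_{\max}\big(Q^{-T/2} A^T P B (B^T P B)^{-1} B^T P A Q^{-1/2}\big)\Big]^{-1/2}$$

If graph eigenvalues are real:
$$\frac{r_0}{c_0} = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}$$

For SI systems, for proper choice of Q:
$$r = \frac{1}{\prod_u |\lambda_u(A)|}$$
where the product over the unstable e-vals $\lambda_u(A)$ is the Mahler measure

$\sum_i \log_2 |\lambda_i^u(A)|$ = intrinsic entropy rate = minimum data rate in a networked control system that enables stabilization of an unstable system – Baillieul and others.

$\lambda_{\min} / \lambda_{\max}$ = eigen‐ratio = ‘condition number’ of the communication graph

Work on log quantization: Elia & Mitter, Lihua Xie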


Mahler Measure: $M(A) = \prod_{\text{unstable}} |\lambda_i(A)|$

Guoxiang Gu, L. Maronovici, and F.L. Lewis, “Consensusability of discrete-time dynamic multi-agent systems,” IEEE Trans. Automatic Control, to appear, 2012.

Graph Condition Number: $\kappa(L+G) = \dfrac{\lambda_{\max}}{\lambda_{\min}}$

Synchronization guaranteed if
$$M(A) < \frac{1 + \kappa^{-1}}{1 - \kappa^{-1}}$$

Topological Entropy: $h(A) = \log(M(A))$

New definition, Graph Channel Capacity:
$$C(L+G) = \log\frac{1 + \kappa^{-1}}{1 - \kappa^{-1}}$$

Like to have $\kappa(G) \approx 1$; $\lambda_{\min}$ large means fast convergence (Varshney)
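The Mahler-measure condition is easy to evaluate. The sketch below assumes real graph eigenvalues $\lambda_{\min}$, $\lambda_{\max}$; the function names and the example matrix `A_unstable` are hypothetical and chosen only for illustration.

```python
import numpy as np

def mahler_measure(A):
    """Product of the magnitudes of the unstable (|lambda| > 1) eigenvalues of A."""
    mags = np.abs(np.linalg.eigvals(A))
    return float(np.prod(mags[mags > 1.0]))   # empty product is 1 for a stable A

def sync_guaranteed(A, lam_min, lam_max):
    """Check M(A) < (1 + 1/kappa) / (1 - 1/kappa) with kappa = lam_max / lam_min."""
    kappa = lam_max / lam_min
    if kappa <= 1.0:                          # perfectly conditioned graph: bound is infinite
        return True
    return mahler_measure(A) < (1.0 + 1.0 / kappa) / (1.0 - 1.0 / kappa)

# Hypothetical discrete-time agent with unstable e-vals 1.1 and 1.2, so M(A) = 1.32.
A_unstable = np.array([[1.1, 1.0],
                       [0.0, 1.2]])

ok_well_conditioned = sync_guaranteed(A_unstable, lam_min=1.0, lam_max=3.0)   # bound = 2
ok_ill_conditioned = sync_guaranteed(A_unstable, lam_min=1.0, lam_max=10.0)   # bound = 1.1/0.9
```

With $\kappa = 3$ the bound is 2 and the condition holds; with $\kappa = 10$ the bound drops to about 1.22 and the same agent dynamics no longer meet it, which is the trade-off between instability of A and graph conditioning.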

[Figure: six-node example graphs, repeated from the graph eigenvalue slide]

Directed graph: better conditioned

Undirected graph: more ill-conditioned

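$$\frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} < \frac{1}{\prod_u |\lambda_u(A)|} = \frac{1}{M(A)}$$

Is equivalent to
$$\frac{\lambda_{\max}}{\lambda_{\min}} < \frac{M(A) + 1}{M(A) - 1}$$

Protocol: $x_i(k+1) = Ax_i(k) + Bu_i(k)$, $u_i = cK\varepsilon_i$, with $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$

Add stable filter: $u_i = cF(z)K\varepsilon_i$

Filtered protocol gives synch. if
$$\frac{\lambda_{\max}}{\lambda_{\min}} < \left(\frac{M(A) + 1}{M(A) - 1}\right)^2$$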

select ( )A 2

2 2 1min 2

min

( (1 ) ) ,T T TP A P I BB P A B PB

the stabilizing solution to0P

2 2 1min min( (1 ) )T TK I B PB B PA

1min min( ) ( )T z K zI A BK B

1 2

2

(1 )( )1 ( )

F zT z

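Complementary sensitivity

Guoxiang Gu & Lewis, IEEE TAC

Improvement

Graph Condition Number: $\kappa(G) = \dfrac{\lambda_{\max}}{\lambda_{\min}}$

Like to have $\kappa(G) \approx 1$

$\lambda_{\min}$ large means fast convergence

Eigenratio = $\dfrac{\lambda_{\min}}{\lambda_{\max}}$ (L.R. Varshney, “Distributed inference with costly wires”)

$$\frac{\lambda_{\max}}{\lambda_{\min}} < \frac{M(A) + 1}{M(A) - 1}$$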

Games on Communication Graphs
Kyriakos Vamvoudakis, Mohammed Abouheaf

Sun Tz bin fa 孙子兵法 (Sun Tzu, The Art of War), 500 BC

Graphical Coalitional Games

F.L. Lewis, K. Vamvoudakis, M. Abouheaf

http://ARRI.uta.edu/acs

Games on Communication Graphs

Supported by: NSF (Paul Werbos), ARO, AFOSR

Manufacturing as the Interactions of Multiple Agents
Each machine has its own dynamics and cost function
Neighboring machines influence each other most strongly
There are local optimization requirements as well as global necessities


K.G. Vamvoudakis, F.L. Lewis, and G.R. Hudas, “Multi-Agent Differential Graphical Games: online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.

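Graphical Games: Synchronization, Cooperative Tracker Problem

Node dynamics: $\dot{x}_i = Ax_i + B_i u_i$, $x_i(t) \in R^n$, $u_i(t) \in R^{m_i}$

Target generator dynamics: $\dot{x}_0 = Ax_0$

Synchronization problem: $x_i(t) \to x_0(t), \ \forall i$

Local neighborhood tracking error (Lihua Xie):
$$\delta_i = \sum_{j \in N_i} e_{ij}(x_i - x_j) + g_i(x_i - x_0)$$

Pinning gains (Ron Chen): $g_i \ge 0$

Global neighborhood tracking error:
$$\delta = \big((L+G) \otimes I_n\big)(x - \underline{x}_0) = \big((L+G) \otimes I_n\big)\zeta$$
where $\delta = [\delta_1^T \ \delta_2^T \ \cdots \ \delta_N^T]^T \in R^{nN}$, $x = [x_1^T \ x_2^T \ \cdots \ x_N^T]^T \in R^{nN}$, $\underline{x}_0 = \underline{I}x_0 \in R^{nN}$, $\zeta = x - \underline{x}_0 \in R^{nN}$

Lemma. Let graph be strongly connected and at least one pinning gain nonzero. Then
$$\|\zeta\| \le \|\delta\| / \underline{\sigma}(L+G)$$
and agents synchronize iff $\delta(t) \to 0$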

Standard way =

Graphical Game: Games on Graphs

Local nbhd. tracking error dynamics (local agent dynamics driven by neighbors’ controls):
$$\dot{\delta}_i = A\delta_i + (d_i + g_i)B_i u_i - \sum_{j \in N_i} e_{ij} B_j u_j, \qquad \delta_i = \sum_{j \in N_i} e_{ij}(x_i - x_j) + g_i(x_i - x_0)$$

Define local nbhd. performance index (values driven by neighbors’ controls):
$$J_i(\delta_i(0), u_i, u_{-i}) = \frac{1}{2}\int_0^\infty \Big(\delta_i^T Q_{ii}\delta_i + u_i^T R_{ii}u_i + \sum_{j \in N_i} u_j^T R_{ij}u_j\Big)dt \equiv \frac{1}{2}\int_0^\infty L_i(\delta_i(t), u_i(t), u_{-i}(t))\,dt$$
with $u_{-i}(t) = \{u_j : j \in N_i\}$

Local value functions for fixed policies $u_i$:
$$V_i(\delta_i(t)) = \frac{1}{2}\int_t^\infty \Big(\delta_i^T Q_{ii}\delta_i + u_i^T R_{ii}u_i + \sum_{j \in N_i} u_j^T R_{ij}u_j\Big)dt$$

Static graphical game: $(G, U, v)$, with $G = (V, E)$, $v = [v_1 \ \cdots \ v_N]^T$, $v_i(U_i, \{U_j : j \in N_i\}) \in R$. Value depends only on neighbors

Standard N‐player differential game:
$$\dot{z} = Az + \sum_{i=1}^{N} B_i u_i, \qquad J_i(z(0), u_i, u_{-i}) = \frac{1}{2}\int_0^\infty \Big(z^T Q z + \sum_{j=1}^{N} u_j^T R_{ij}u_j\Big)dt$$
Dynamics depend on all other agents. Values depend on all other agents

Kyriakos Vamvoudakis



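New Differential Graphical Game

[Figure: players on a graph with control actions $u_1$, $u_2$, ..., $u_i$ = control action of player i]

State dynamics of agent i (local dynamics):
$$\dot{\delta}_i = A\delta_i + (d_i + g_i)B_i u_i - \sum_{j \in N_i} e_{ij} B_j u_j$$

Value function of player i (local value function, only depends on graph neighbors):
$$J_i(\delta_i(0), u_i, u_{-i}) = \frac{1}{2}\int_0^\infty \Big(\delta_i^T Q_{ii}\delta_i + u_i^T R_{ii}u_i + \sum_{j \in N_i} u_j^T R_{ij}u_j\Big)dt$$

Standard Multi-Agent Differential Game

Central dynamics:
$$\dot{z} = Az + \sum_{i=1}^{N} B_i u_i$$

Value function of player i (local value function depends on ALL other control actions):
$$J_i(z(0), u_i, u_{-i}) = \frac{1}{2}\int_0^\infty \Big(z^T Q z + \sum_{j=1}^{N} u_j^T R_{ij}u_j\Big)dt$$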

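Team Interest vs. Self Interest. Cooperation vs. Collaboration

The objective functions of each player can be written as a team average term plus a conflict of interest term:

$$J_1 = \tfrac{1}{3}(J_1 + J_2 + J_3) + \tfrac{1}{3}(J_1 - J_2) + \tfrac{1}{3}(J_1 - J_3) \equiv J_{team} + J_1^{coi}$$
$$J_2 = \tfrac{1}{3}(J_1 + J_2 + J_3) + \tfrac{1}{3}(J_2 - J_1) + \tfrac{1}{3}(J_2 - J_3) \equiv J_{team} + J_2^{coi}$$
$$J_3 = \tfrac{1}{3}(J_1 + J_2 + J_3) + \tfrac{1}{3}(J_3 - J_1) + \tfrac{1}{3}(J_3 - J_2) \equiv J_{team} + J_3^{coi}$$

For N-players:
$$J_i = \frac{1}{N}\sum_{j=1}^{N} J_j + \frac{1}{N}\sum_{j=1}^{N} (J_i - J_j) \equiv J_{team} + J_i^{coi}, \qquad i = 1, \ldots, N$$

For N-player zero-sum games, the first term is zero, i.e. the players have no goals in common.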
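The team-average plus conflict-of-interest split is plain arithmetic on the vector of player costs. A minimal sketch follows; the function name `decompose` and the sample cost values are made up for illustration.

```python
import numpy as np

def decompose(J):
    """Split costs J_i into a common team term and per-player conflict-of-interest terms."""
    J = np.asarray(J, dtype=float)
    N = J.size
    team = J.mean()                                            # (1/N) * sum_j J_j
    coi = np.array([np.sum(J[i] - J) / N for i in range(N)])   # (1/N) * sum_j (J_i - J_j)
    return team, coi

# A zero-sum example: the costs add up to zero, so the team term vanishes.
team_zs, coi_zs = decompose([3.0, -1.0, -2.0])

# A non-zero-sum example: part of each cost is shared by the whole team.
team_mixed, coi_mixed = decompose([4.0, 1.0, 1.0])
```

In the zero-sum case each player's cost is pure conflict of interest; in the second case every player carries the same team term of 2 and the conflict terms sum to zero.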

Problems with Nash Equilibrium Definition on Graphical Games

Game objective:
$$V_i^*(\delta_i(t)) = \min_{u_i} \frac{1}{2}\int_t^\infty \Big(\delta_i^T Q_{ii}\delta_i + u_i^T R_{ii}u_i + \sum_{j \in N_i} u_j^T R_{ij}u_j\Big)dt$$

Neighbors of node i: $u_{-i}(t) = \{u_j : j \in N_i\}$

All other nodes in graph: $u_{G-i} = \{u_j : j \in N, j \neq i\}$

Def: Nash equilibrium. $\{u_1^*, u_2^*, \ldots, u_N^*\}$ are in Nash equilibrium if
$$J_i^* \equiv J_i(u_i^*, u_{G-i}^*) \le J_i(u_i, u_{G-i}^*), \quad \forall i \in N$$

Counterexample. Disconnected graph

Let each node play his optimal control: $J_i^* = J_i(u_i^*)$

Then, each agent’s cost does not depend on any other agent: $J_i(u_i) = J_i(u_i, u_{G-i}) = J_i(u_i, u'_{G-i}), \ \forall i$

Then all agents are in Nash equilibrium. Note: this Nash is also coalition‐proof

Another example

Define

Def. Local Best response. $u_i^*$ is said to be agent i’s local best response to fixed policies $u_{-i}$ of its neighbors if
$$J_i(u_i^*, u_{-i}) \le J_i(u_i, u_{-i}), \quad \forall u_i$$

New Definition of Nash Equilibrium for Graphical Games

Def: Interactive Nash equilibrium. $\{u_1^*, u_2^*, \ldots, u_N^*\}$ are in Interactive Nash equilibrium if

1. They are in Nash equilibrium: $J_i^* \equiv J_i(u_i^*, u_{G-i}^*) \le J_i(u_i, u_{G-i}^*), \ \forall i \in N$

2. Interaction Condition: there exists a policy $u_j$ such that
$$J_i(u_j^*, u_{G-j}^*) \neq J_i(u_j, u_{G-j}^*), \quad \forall i, j \in N$$

That is, every player can find a policy that changes the value of every other player.

A restriction on what sorts of performance indices can be selected in multi‐player graph games.

A condition on the reaction curves (Basar and Olsder) of the agents

This rules out the disconnected counterexample.

Theorem 3. Let $(A, B_i)$ be reachable for all i. Let every agent i be in local best response:
$$J_i(u_i^*, u_{-i}^*) \le J_i(u_i, u_{-i}^*), \quad \forall i$$

Then $\{u_1^*, u_2^*, \ldots, u_N^*\}$ are in global Interactive Nash iff the graph is strongly connected.

i

k

( ) (( ) ) ( ) (( ) )0( ) ( )

N n i i n kk kT

ii N

I A L G I diag B K L G I Bv A Bv

p pp diag Q I A

1( ) ( ) T ii i i i i ii i i i

i

Vu u V d g R B K p

k k k ku K p v

$[B \ \ AB \ \ A^2B \ \cdots]$

$(L+G), \ (L+G)^2, \ (L+G)^3$: picks out the shortest path from node k to node i

Hamiltonian System

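Graphical Game Solution Equations

Value function:
$$V_i(\delta_i(t)) = \frac{1}{2}\int_t^\infty \Big(\delta_i^T Q_{ii}\delta_i + u_i^T R_{ii}u_i + \sum_{j \in N_i} u_j^T R_{ij}u_j\Big)dt$$

Differential equivalent (Leibniz formula) is Bellman’s Equation:
$$H_i\Big(\delta_i, \frac{\partial V_i}{\partial \delta_i}, u_i, u_{-i}\Big) \equiv \frac{\partial V_i}{\partial \delta_i}^T \Big(A\delta_i + (d_i + g_i)B_i u_i - \sum_{j \in N_i} e_{ij}B_j u_j\Big) + \frac{1}{2}\delta_i^T Q_{ii}\delta_i + \frac{1}{2}u_i^T R_{ii}u_i + \frac{1}{2}\sum_{j \in N_i} u_j^T R_{ij}u_j = 0$$

Stationarity Condition:
$$0 = \frac{\partial H_i}{\partial u_i} \ \Rightarrow \ u_i = -(d_i + g_i)R_{ii}^{-1}B_i^T \frac{\partial V_i}{\partial \delta_i}$$

1. Coupled HJ equations: $H_i\big(\delta_i, \frac{\partial V_i}{\partial \delta_i}, u_i^*, u_{-i}^*\big) = 0$, i.e.
$$\frac{\partial V_i}{\partial \delta_i}^T A_i^c + \frac{1}{2}\delta_i^T Q_{ii}\delta_i + \frac{1}{2}(d_i + g_i)^2 \frac{\partial V_i}{\partial \delta_i}^T B_i R_{ii}^{-1} B_i^T \frac{\partial V_i}{\partial \delta_i} + \frac{1}{2}\sum_{j \in N_i} (d_j + g_j)^2 \frac{\partial V_j}{\partial \delta_j}^T B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^T \frac{\partial V_j}{\partial \delta_j} = 0, \quad i \in N$$
where
$$A_i^c = A\delta_i - (d_i + g_i)^2 B_i R_{ii}^{-1} B_i^T \frac{\partial V_i}{\partial \delta_i} + \sum_{j \in N_i} e_{ij}(d_j + g_j) B_j R_{jj}^{-1} B_j^T \frac{\partial V_j}{\partial \delta_j}, \quad i \in N$$

2. Best Response HJ Equations (other players have fixed policies $u_j$):
$$0 = H_i\Big(\delta_i, \frac{\partial V_i}{\partial \delta_i}, u_i^*, u_{-i}\Big) \equiv \frac{\partial V_i}{\partial \delta_i}^T A_i^c + \frac{1}{2}\delta_i^T Q_{ii}\delta_i + \frac{1}{2}(d_i + g_i)^2 \frac{\partial V_i}{\partial \delta_i}^T B_i R_{ii}^{-1} B_i^T \frac{\partial V_i}{\partial \delta_i} + \frac{1}{2}\sum_{j \in N_i} u_j^T R_{ij}u_j$$
where
$$A_i^c = A\delta_i - (d_i + g_i)^2 B_i R_{ii}^{-1} B_i^T \frac{\partial V_i}{\partial \delta_i} - \sum_{j \in N_i} e_{ij}B_j u_j$$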


Reinforcement Learning to Solve Graphical Games

Books

D. Vrabie, K. Vamvoudakis, and F.L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, IET Press, 2012.

F.L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, third edition, John Wiley and Sons, New York, 2012. New Chapters on: Reinforcement Learning, Differential Games

F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32-50, Third Quarter 2009.

IEEE Control Systems Magazine, “Reinforcement learning and feedback Control,” Dec. 2012

Different methods of learning

[Figure: actor-critic learning diagram. Labels: system, adaptive learning system, control inputs, outputs, environment, tune actor, reinforcement signal, actor, critic, desired performance]

Reinforcement learning: Ivan Pavlov 1890s

Actor-Critic Learning: Paul Werbos

We want OPTIMAL performance: ADP, Approximate Dynamic Programming

Sutton & Barto book

Summary of Motor Control in the Human Nervous System (Doya, Kimura, Kawato 2001; picture by E. Stingu, D. Vrabie)

[Figure: hierarchy of multiple parallel loops. Labels: limbic system, hippocampus, cerebral cortex, motor areas, thalamus, basal ganglia, cerebellum, inf. olive, brainstem, spinal cord, interoceptive receptors, exteroceptive receptors, muscle contraction and movement, reflex, supervised learning, reinforcement learning (dopamine), unsupervised learning, eye movement, memory functions (long term, short term), motor control 200 Hz, theta rhythms 4-10 Hz, gamma rhythms 30-100 Hz]

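Online Solution of Graphical Games

Use Reinforcement Learning Convergence Results

POLICY ITERATION. Solve simultaneously online the Bellman equation
$$H_i\Big(\delta_i, \frac{\partial V_i}{\partial \delta_i}, u_i, u_{-i}\Big) \equiv \frac{\partial V_i}{\partial \delta_i}^T \Big(A\delta_i + (d_i + g_i)B_i u_i - \sum_{j \in N_i} e_{ij}B_j u_j\Big) + \frac{1}{2}\delta_i^T Q_{ii}\delta_i + \frac{1}{2}u_i^T R_{ii}u_i + \frac{1}{2}\sum_{j \in N_i} u_j^T R_{ij}u_j = 0$$
and the policy update
$$u_i = -(d_i + g_i)R_{ii}^{-1}B_i^T \frac{\partial V_i}{\partial \delta_i}$$

Policy Iteration gives structure needed for online graph games

Weierstrass Approximator structures: 2 at each node

Critic: approximate values by a Critic Network at each node i
$$\hat{V}_i = \hat{W}_i^T \varphi_i$$

Actor: approximate control policies by an Actor Network at each node i
$$\hat{u}_{i+N} = -\frac{1}{2}(d_i + g_i)R_{ii}^{-1}B_i^T \frac{\partial \varphi_i}{\partial \delta_i}^T \hat{W}_{i+N}$$

Bellman equation becomes an algebraic equation in the parameters:
$$\hat{W}_i^T \frac{\partial \varphi_i}{\partial \delta_i}\Big(A\delta_i + (d_i + g_i)B_i\hat{u}_{i+N} - \sum_{j \in N_i} e_{ij}B_j\hat{u}_{j+N}\Big) + \delta_i^T Q_{ii}\delta_i + \frac{1}{4}\hat{W}_{i+N}^T \bar{D}_i \hat{W}_{i+N} + \frac{1}{4}\sum_{j \in N_i} (d_j + g_j)^2 \hat{W}_{j+N}^T \frac{\partial \varphi_j}{\partial \delta_j} B_j R_{jj}^{-T} R_{ij} R_{jj}^{-1} B_j^T \frac{\partial \varphi_j}{\partial \delta_j}^T \hat{W}_{j+N} = e_{H_i}$$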

1

2 11 14 42

ˆˆ

ˆ ˆ ˆ ˆ ˆ[ ( ) ](1 )

i

i ii

Tj jT T T T Ti

i i i i ii i i N i i N j j j N j jj ij jj j j Ni i j jj N

EW a

W

a W Q W DW d g W B R R R B W

Online Solution of Graphical Games Using Value Function Approximation

Tuning law for Critic parameters

Tuning law for Actor parameters

2 11

1 1ˆ ˆ ˆ ˆ ˆ ˆ ˆ{( ) ( ) }4 4

ii

TT Tj jT T T Ti N i N

i N i N i i N i i N i i i N i i N j j j j jj ij jj jsi s j jj N

j i

W a F W F W DW W W d g W B R R R Bm m

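Converges to solution of coupled HJ equations, and Nash equilibrium, and keeps states stable while learning

Need PE of
$$\sigma_{i+N} = \frac{\partial \varphi_i}{\partial \delta_i}\Big(A\delta_i + (d_i + g_i)B_i\hat{u}_{i+N} - \sum_{j \in N_i} e_{ij}B_j\hat{u}_{j+N}\Big)$$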

Integral Reinforcement Form of Bellman Equation

Bellman Equation:
$$0 = \frac{\partial V}{\partial x}^T f(x,u) + r(x,u) \equiv H\Big(x, \frac{\partial V}{\partial x}, u\Big), \quad V(0) = 0$$

Lemma 1 (Draguna Vrabie). Another form for the CT Bellman eq.: the Bellman equation is equivalent to
$$V(x(t)) = \int_t^{t+T} r(x,u)\,d\tau + V(x(t+T)), \quad V(0) = 0$$

Solves Lyapunov equation without knowing f(x,u)

Can avoid knowledge of drift term f(x) by using Integral Reinforcement Learning (IRL)

Then HJ equations are solved online without knowing f(x)

Coupled AREs are solved online without knowing A

Lemma 1 (Draguna Vrabie). The CT Bellman equation
$$0 = \frac{\partial V}{\partial x}^T f(x,u) + r(x,u) \equiv H\Big(x, \frac{\partial V}{\partial x}, u\Big), \quad V(0) = 0$$
is equivalent to the integral reinforcement form
$$V(x(t)) = \int_t^{t+T} r(x,u)\,d\tau + V(x(t+T)), \quad V(0) = 0$$

Proof:
$$\frac{d(V(x))}{dt} = \frac{\partial V}{\partial x}^T f(x,u) = -r(x,u)$$
$$\int_t^{t+T} r(x,u)\,d\tau = -\int_t^{t+T} d(V(x)) = V(x(t)) - V(x(t+T))$$

Solves Lyapunov equation without knowing f(x,u)

Allows definition of temporal difference error for CT systems:
$$e(t) = -V(x(t)) + \int_t^{t+T} r(x,u)\,d\tau + V(x(t+T))$$

Draguna Vrabie
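The integral reinforcement form can be checked on a small linear example. This sketch assumes a hypothetical stable closed-loop system $\dot{x} = A_c x$ with reward $r = x^T Q x$ and value $V(x) = x^T P x$, where P solves the Lyapunov equation $A_c^T P + P A_c + Q = 0$; all matrices, the interval T, and the RK4/trapezoid integration are illustrative choices. Only the value evaluation uses $A_c$; the temporal difference error itself is computed from the trajectory and the reward.

```python
import numpy as np

A_c = np.array([[0.0, 1.0], [-2.0, -3.0]])   # hypothetical stable closed-loop matrix
Q = np.eye(2)

# Solve the Lyapunov equation A_c' P + P A_c + Q = 0 by vectorization (Kronecker products).
n = A_c.shape[0]
I = np.eye(n)
M = np.kron(I, A_c.T) + np.kron(A_c.T, I)
P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

def V(x):
    """Quadratic value function V(x) = x' P x."""
    return float(x @ P @ x)

def rk4_step(x, h):
    """One Runge-Kutta 4 step of x' = A_c x."""
    k1 = A_c @ x
    k2 = A_c @ (x + 0.5 * h * k1)
    k3 = A_c @ (x + 0.5 * h * k2)
    k4 = A_c @ (x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def integral_reinforcement(x, T, h=1e-3):
    """Integrate r = x' Q x over [t, t+T] along the trajectory (trapezoid rule)."""
    total = 0.0
    for _ in range(int(round(T / h))):
        x_next = rk4_step(x, h)
        total += 0.5 * h * (x @ Q @ x + x_next @ Q @ x_next)
        x = x_next
    return total, x

x_t = np.array([1.0, -0.5])                  # state at time t (assumed)
rho, x_tT = integral_reinforcement(x_t, T=0.5)

# Temporal difference error of the integral reinforcement Bellman equation.
td_error = -V(x_t) + rho + V(x_tT)
```

With the exact P the temporal difference error is zero up to integration error; in IRL the same residual, evaluated on measured data, is what the critic weights are tuned to drive to zero.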

New Structure of Adaptive Controller

[Figure: adaptive critic loop. Labels: system, action network, policy evaluation (critic network), cost, value update, control policy update]

Critic and Actor tuned simultaneously. Leads to ONLINE FORWARD‐IN‐TIME implementation of optimal control

Do not need to know system drift dynamics

F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32‐50, Third Quarter 2009.

Best Paper Award, Int. Joint Conf. Neural Networks, Barcelona, 2010. Draguna Vrabie and F.L. Lewis, “Adaptive Dynamic Programming Algorithm for Finding Online the Equilibrium Solution of the Two-Player Zero-Sum Differential Game.”

Reinforcement Learning Adaptive Critic

Agent dynamics: $\dot{x}_i = f(x_i) + g(x_i)u_i$

Critic: $\hat{V}_i = \hat{W}_i^T \varphi_i$

Actor: $\hat{u}_{i+N} = -\frac{1}{2}(d_i + g_i)R_{ii}^{-1}B_i^T \frac{\partial \varphi_i}{\partial \delta_i}^T \hat{W}_{i+N}$

Graphical Games for Multi-Process Optimal Control

[Figure: Process i cost function optimization and Process i control update, coupled with Process j1 and Process j2 cost function optimization and control update]

Optimal Performance of Each Process Depends on the Control of its Neighbor Processes

Control Policy of Each Process Depends on the Performance of its Neighbor Processes

Motions of Biological Groups: fish school, birds flock, locusts swarm, fireflies synchronize

Local / Peer-to-Peer Relationships in socio-biological systems

Our revels now are ended. These our actors,
As I foretold you, were all spirits, and
Are melted into air, into thin air.

The cloud-capped towers, the gorgeous palaces,
The solemn temples, the great globe itself,
Yea, all which it inherit, shall dissolve,
And, like this insubstantial pageant faded,
Leave not a rack behind.

We are such stuff as dreams are made on, and our little life is rounded with a sleep.

Prospero, in The Tempest, act 4, sc. 1, l. 152-6, Shakespeare

The Adaptive Critic Architecture

[Figure: adaptive critics. Labels: system, action network, value update, control policy update]

Policy Iteration is Reinforcement Learning

F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32‐50, Third Quarter 2009.

A new adaptive control architecture: Optimal Adaptive Control

Critic: $\hat{V}_i = \hat{W}_i^T \varphi_i$

Actor: $\hat{u}_{i+N} = -\frac{1}{2}(d_i + g_i)R_{ii}^{-1}B_i^T \frac{\partial \varphi_i}{\partial \delta_i}^T \hat{W}_{i+N}$
