Optimal Synchronization and Games on Graphs
UTA Research Institute (UTARI), The University of Texas at Arlington
F.L. Lewis, Moncrief-O’Donnell Endowed Chair
Head, Controls & Sensors Group
Optimal Design for Synchronization & Games on Communication Graphs
Supported by AFOSR, NSF, ARO. Thanks to Jie Huang
http://ARRI.uta.edu/acs
Cooperative Control Synchronization: Optimal Design and Games on Communication Graphs
Supported by: NSF (Paul Werbos), ARO, AFOSR, NNSF of China, China Project 111 at NEU
Invited by Derong Liu
Thanks to Cesare Alippi, Zhang Huaguang
He who exerts his mind to the utmost knows nature’s pattern.
The way of learning is none other than finding the lost mind.
Meng Tz, c. 300 BC
Man’s task is to understand patterns in nature and society.
Mencius
Kung Tz, 500 BC
Confucius
Archery, Chariot driving
Music, Rites and Rituals
Poetry, Mathematics
孔子 Man’s relations to: Family, Friends, Society, Nation, Emperor, Ancestors
Outline
Optimal Design for Synchronization of Cooperative Systems
Distributed Observer and Dynamic Regulator
Discrete-time Optimal Design for Synchronization
Graphical Games
Control Design Methods for Multi-Agent Systems
Acks. to: Guanrong Chen (pinning control); Lihua Xie (local nbhd. tracking error); Zhihua Qu (Lyapunov eq. for di-graphs)
Books Coming
F.L. Lewis, H. Zhang, A. Das, K. Hengster-Movric, Cooperative Control of Multi-Agent Systems: Optimal Design and Adaptive Control, Springer-Verlag, 2013, to appear.
Key Point
Lyapunov Functions and Performance Indices Must Depend on Graph Topology
Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.
Outline
Cooperative Control
Locally Optimal Design and Synchronization
Globally Optimal Design for Collective Motion
Multi-player Games on Communication Graphs
Reinforcement Learning for Game Solutions
J.J. Finnigan, Complex science for a complex world
The Internet
Ecosystem; Professional collaboration network
Barcelona rail network
Structure of Natural and Manmade Systems
Local nature of physical laws; peer-to-peer relationships in networked systems
Clusters of galaxies
Synchronized Motion of Biological Groups
Fish school
Birds flock
Locusts swarm
Fireflies synchronize
The Power of Synchronization: Coupled Oscillators, Diurnal Rhythm
Outline
A. Stable Design for Synchronization of Cooperative Systems
B. Global Optimal Design for Collective Group Motion
Stability vs. Optimality ofCooperative Control
Issues: For cooperative control on graphs -
Local stability of each agent is NOT the same as stable synchronization of the team
Local optimality of each agent is NOT the same as global optimality of the team
Diameter = length of the longest shortest path between two nodes
Volume = sum of in-degrees: $\mathrm{Vol} = \sum_{i=1}^N d_i$
Spanning tree; root node
Strongly connected if for all nodes i and j there is a path from i to j.
Tree: every node except the root has in-degree 1; leader or root node
Followers
Communication Graph
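N nodes
Adjacency matrix $A = [a_{ij}]$, with $a_{ij} > 0$ if $(v_j, v_i) \in E$, i.e. if $j \in N_i$, and $a_{ij} = 0$ otherwise
Out-neighbors of node i: $N_i^o$; out-degree $d_i^o = \sum_{j=1}^N a_{ji}$ (column sum); e.g., edge weight $a_{42}$
Adjacency matrix of the example graph:
$A = \begin{bmatrix} 0&0&1&0&0&0 \\ 1&0&0&0&0&1 \\ 1&1&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&1&1&0 \end{bmatrix}$
In-neighbors of node i: $N_i$; in-degree $d_i = \sum_{j=1}^N a_{ij}$ (row sum)
Graph $G = (V, E)$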
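The degree and Laplacian definitions can be checked on the six-node example. This is a minimal pure-Python sketch (no external libraries assumed) using the convention that $a_{ij} > 0$ when node i receives from node j; the function names are illustrative, not taken from the talk.

```python
# Six-node example graph: A[i][j] = a_ij = 1 means node i+1 receives
# information from node j+1 (edge v_j -> v_i).
A = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]


def in_degrees(A):
    """d_i = sum_j a_ij (row sums)."""
    return [sum(row) for row in A]


def out_degrees(A):
    """d_i^o = sum_j a_ji (column sums)."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]


def laplacian(A):
    """Graph Laplacian L = D - A with D = diag{d_1, ..., d_N} (in-degrees)."""
    n = len(A)
    d = in_degrees(A)
    return [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


L = laplacian(A)
print(in_degrees(A))            # [1, 2, 2, 1, 1, 2]
print(out_degrees(A))           # [2, 2, 2, 1, 1, 1]
print(sum(in_degrees(A)))       # volume: 9
print([sum(row) for row in L])  # [0, 0, 0, 0, 0, 0]
```

Zero row sums are what make $\underline{1}$ a right eigenvector of L for the eigenvalue 0.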
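Dynamic Graph: the Distributed Structure of Control
Each node has an associated state with dynamics $\dot x_i = u_i$.
Standard local voting protocol: $u_i = \sum_{j \in N_i} a_{ij}(x_j - x_i)$
$u_i = -x_i \sum_{j \in N_i} a_{ij} + \sum_{j \in N_i} a_{ij} x_j = -d_i x_i + [a_{i1} \;\cdots\; a_{iN}]\,[x_1 \;\cdots\; x_N]^T$
$u = -Dx + Ax = -(D - A)x = -Lx$, where $L = D - A$ is the graph Laplacian matrix, $x = [x_1 \cdots x_N]^T$, $u = [u_1 \cdots u_N]^T$, $D = \mathrm{diag}\{d_1, \ldots, d_N\}$
Closed-loop dynamics: $\dot x = -Lx$
If each $x_i$ is an n-vector, then $\dot x = -(L \otimes I_n)x$.
Communication graph $G = (V, E)$ with N nodes and adjacency matrix $A = [a_{ij}]$
State at node i is $x_i(t)$.
Synchronization problem: $x_i(t) - x_j(t) \to 0$ for all i, j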
Theorem. The graph contains a spanning tree iff the eigenvalue $\lambda_1 = 0$ of L is simple.
A strongly connected graph always contains a spanning tree.
Then $\mathrm{Re}\,\lambda_2 > 0$, so -L has one eigenvalue at zero and all the rest stable.
Then all states synchronize using the local voting protocol.
Laplacian matrix L = D - A
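The synchronization claim can be illustrated numerically. The sketch below integrates $\dot x = -Lx$ (the local voting protocol) with forward Euler on the six-node example graph, which is strongly connected; the step size, horizon and initial states are arbitrary illustrative choices, not values from the slides.

```python
# Six-node example graph: A[i][j] = a_ij = 1 means node i+1 receives from node j+1.
A = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]


def simulate_voting(A, x0, dt=0.01, t_final=30.0):
    """Forward-Euler integration of x_i' = sum_j a_ij (x_j - x_i), i.e. x' = -L x."""
    n = len(A)
    x = list(x0)
    for _ in range(int(round(t_final / dt))):
        # synchronous update: every node uses its neighbors' states from the last step
        u = [sum(A[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]
        x = [x[i] + dt * u[i] for i in range(n)]
    return x


x0 = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
xf = simulate_voting(A, x0)
print(max(xf) - min(xf))  # remaining disagreement between nodes, essentially zero
```

Because dt times every in-degree is below 1, each Euler step replaces $x_i$ by a convex combination of its own and its neighbors' states, so the states never leave the interval spanned by the initial conditions.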
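Consensus Value and Convergence Rate
Closed-loop system with local voting protocol: $\dot x = -Lx$
Modal decomposition: $x(t) = e^{-Lt}x(0) = \sum_{j=1}^N v_j e^{-\lambda_j t} w_j^T x(0) = \sum_{j=1}^N \big(w_j^T x(0)\big) e^{-\lambda_j t} v_j$, where $v_j$ and $w_j$ are the right and left eigenvectors of L for $\lambda_j$, normalized so that $w_j^T v_j = 1$.
Let $\lambda_1 = 0$ be simple. Then for large t
$x(t) \approx v_1 e^{-\lambda_1 t} w_1^T x(0) + v_2 e^{-\lambda_2 t} w_2^T x(0) \to \underline{1}\, w_1^T x(0)$,
since $v_1 = \underline{1}$ (the rows of L sum to zero) and $w_1^T \underline{1} = 1$.
$\lambda_2$ (its real part, for directed graphs) determines the rate of convergence and is called the FIEDLER eigenvalue.
There is a big push to find expressions for the left eigenvector $w_1$ for $\lambda_1 = 0$ and for the Fiedler eigenvalue $\lambda_2$.
Let the graph have a spanning tree. Then all nodes reach consensus.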
$w_1^T x(0)$ determines the consensus value in terms of the initial conditions.
It depends on the communication graph topology: there is no freedom to determine the consensus value.
L has an eigenvalue at zero. We call this the Cooperative Regulator Problem.
$\lambda_1 = 0$ is simple if the graph is strongly connected.
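The consensus value $w_1^T x(0)$ can be computed for the six-node example. In this illustrative sketch the left eigenvector $w_1$ (normalized so that $w_1^T \underline{1} = 1$) is found by power iteration on $P = I - \epsilon L$, which is row-stochastic for small ε and has the same left null vector as L; the result is compared with the limit of the discretized protocol $x(k+1) = Px(k)$. The value ε = 0.1 and the initial states are assumptions for illustration.

```python
# Six-node example graph (a_ij = 1: node i+1 receives from node j+1).
A = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
n = len(A)
eps = 0.1                    # eps * (max in-degree) < 1 keeps P nonnegative
d = [sum(row) for row in A]  # in-degrees
# P = I - eps * L with L = D - A; w^T P = w^T exactly when w^T L = 0
P = [[(1.0 - eps * d[i] if i == j else 0.0) + eps * A[i][j] for j in range(n)]
     for i in range(n)]


def left_eigvec(P, iters=2000):
    """Power iteration w <- P^T w; the entries keep summing to 1."""
    m = len(P)
    w = [1.0 / m] * m
    for _ in range(iters):
        w = [sum(w[i] * P[i][j] for i in range(m)) for j in range(m)]
    return w


def run_protocol(P, x0, steps=2000):
    """Discretized local voting protocol x(k+1) = P x(k)."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


w1 = left_eigvec(P)
x0 = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
consensus = sum(wi * xi for wi, xi in zip(w1, x0))  # predicted value w1^T x(0)
xf = run_protocol(P, x0)
print([round(13 * wi, 6) for wi in w1])  # [5.0, 2.0, 3.0, 1.0, 1.0, 1.0]
print(round(consensus, 6))               # 2.307692, i.e. 30/13
```

Node 1 carries the largest weight (5/13), so its initial condition influences the agreed value most: the topology, not the designer, fixes the consensus value.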
Graph Eigenvalues for Different Communication Topologies
Directed tree: chain of command
Directed ring: gossip network (OSCILLATIONS)
Directed graph: better conditioned
Undirected graph: more ill-conditioned
Synchronization on Good Graphs
Chris Elliott fast video
[Mesh graph, 4 neighbors]
Synchronization on Gossip Rings
Chris Elliott weird video
[Ring graph or cycle, 10 nodes]
These beautiful pictures are from a lecture by Ron Chen, City U. Hong Kong: Pinning Control of Graphs
Natural and biological structures
Locally Optimal Design and Synchronization
Controlled Consensus: Cooperative Tracker
Node state $\dot x_i = u_i$. Distributed local voting protocol with control node v:
$u_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + b_i(v - x_i)$
$u_i = -x_i\Big(\sum_{j \in N_i} a_{ij} + b_i\Big) + \sum_{j \in N_i} a_{ij} x_j + b_i v$
$\dot x = -(L + B)x + B\underline{1}v$, with $B = \mathrm{diag}\{b_i\}$
$b_i > 0$ if control v is in the neighborhood of node i.
Theorem. Let the graph have a spanning tree and $b_i > 0$ for at least one root node. Then L + B is nonsingular with all eigenvalues in the open right half-plane, and -(L + B) is asymptotically stable.
Control node v
Ron Chen: pinning control
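A sketch of the cooperative tracker $\dot x = -(L+B)x + B\underline{1}v$ on the six-node example, pinning only node 1 ($b_1 = 1$, all other $b_i = 0$). Node 1 is a root because the graph is strongly connected. The pinning pattern, the reference value v and the integration settings are hypothetical choices for illustration.

```python
# Six-node example graph (a_ij = 1: node i+1 receives from node j+1).
A = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]


def simulate_pinned(A, b, x0, v, dt=0.01, t_final=100.0):
    """Forward Euler for x_i' = sum_j a_ij (x_j - x_i) + b_i (v - x_i)."""
    n = len(A)
    x = list(x0)
    for _ in range(int(round(t_final / dt))):
        u = [sum(A[i][j] * (x[j] - x[i]) for j in range(n)) + b[i] * (v - x[i])
             for i in range(n)]
        x = [x[i] + dt * u[i] for i in range(n)]
    return x


b = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical: only node 1 hears the control node
x0 = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
v = 7.0                             # hypothetical reference value
xf = simulate_pinned(A, b, x0, v)
print(max(abs(xi - v) for xi in xf))  # worst tracking error, essentially zero
```

Unlike the regulator case, the final value is set by the control node v rather than by the left eigenvector of L.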
Local Neighborhood Tracking Error
Agent dynamics and local feedback design:
$\dot x_i = Ax_i + Bu_i$, $A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $u_i = -Kx_i$, $K = [0.5 \;\; 0.5]$
Couple 6 agents with a communication graph.
Nodes synchronize to a consensus heading.
[Figure: agent trajectories in the x-y plane]
Local neighborhood tracking error: $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$, with $u_i = K\varepsilon_i$ and $x_0$ the c.g. leader
Same agent dynamics $\dot x_i = Ax_i + Bu_i$ and local feedback design $K = [0.5 \;\; 0.5]$, with the same local neighborhood tracking error $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$ and $u_i = K\varepsilon_i$
ADD another communication link: more information flow
Causes Unstable Formation!
[Figure: diverging agent trajectories in the x-y plane]
WHY?
We want Design Freedom that overcomes graph topology constraints
Decouple Control Design from Graph Topology constraints
Guaranteed synchronization for general Directed graphs
Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.
Guaranteed stability for continuous-time multi-agent systems on graphs -
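A. STABLE DESIGN FOR COOPERATIVE CONTROL ON GRAPHS
A. State Feedback Design for Cooperative Systems on Graphs
Cooperative Regulator vs. Cooperative Tracker problem
N nodes with dynamics $\dot x_i = Ax_i + Bu_i$, $x_i \in R^n$, $u_i \in R^m$
Control node or command generator (exosystem): $\dot x_0 = Ax_0$
Synchronization tracker design problem: $x_i(t) \to x_0(t)$ for all i
Local neighborhood tracking error (a local quantity): $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$
Overall error vector: $\varepsilon = -((L+G) \otimes I_n)(x - \underline{x}_0) = -((L+G) \otimes I_n)\delta$, with $\varepsilon = [\varepsilon_1^T \; \varepsilon_2^T \cdots \varepsilon_N^T]^T \in R^{nN}$
where $\underline{x}_0 = \underline{I}x_0 \in R^{nN}$ and $\underline{I} = \underline{1} \otimes I_n \in R^{nN \times n}$
Consensus or synchronization error (a global quantity): $\delta = x - \underline{x}_0 \in R^{nN}$
L = D - A
Ron Chen: pinning control. Lihua Xie: local neighborhood tracking error.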
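$\varepsilon = -((L+G) \otimes I_n)(x - \underline{x}_0) = -((L+G) \otimes I_n)\delta$
Local quantity = Global quantity
Local control objectives imply global performance: with a spanning tree and at least one pinned root node, L + G is nonsingular, so $\varepsilon \to 0$ if and only if $\delta \to 0$.
Local Neighborhood Tracking Error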
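Cooperative neighborhood SVFB: $u_i = cK\varepsilon_i = cK\Big(\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big)$
Closed-loop system: $\dot x_i = Ax_i + Bu_i = Ax_i + cBK\Big(\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big)$
Overall state $x = [x_1^T \; x_2^T \cdots x_N^T]^T \in R^{nN}$, with $\underline{x}_0 = (\underline{1} \otimes I_n)x_0$
Distributed form of control: $u = -c((L+G) \otimes K)\delta$
Overall closed-loop dynamics: $\dot x = (I_N \otimes A - c(L+G) \otimes BK)x + c((L+G) \otimes BK)\underline{x}_0$
Global synchronization error dynamics (Fax and Murray 2004): $\dot\delta = (I_N \otimes A - c(L+G) \otimes BK)\delta$
The graph structure (L + G) and the control structure (BK) appear together: this MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE.
The key to global stability and synchronization of the collective is locally optimal design for each agent (Lewis and Syrmos 1995).
It DECOUPLES CONTROL DESIGN FROM COMMUNICATION GRAPH STRUCTURE.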
OPTIMAL Design at Each Node
LOCAL OPTIMAL DESIGN Guarantees Global Synchronization
Use $u_i = cK\varepsilon_i = cK\Big(\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big)$ with the local LQR gain $K = R^{-1}B^TP$, where $P = P^T > 0$ solves the local ARE $A^TP + PA + Q - PBR^{-1}B^TP = 0$; this K minimizes $J_i = \frac12 \int_0^\infty (x_i^TQx_i + u_i^TRu_i)\,dt$ for a single agent. Synchronization is then guaranteed for any coupling gain $c \ge 1/(2\min_i \mathrm{Re}\,\lambda_i(L+G))$.
Optimal Control, 3rd ed., Lewis, Vrabie, Syrmos 2012
S. Tuna, “LQR-based coupling gain for synchronization of linear systems,” arXiv preprint arXiv:0801.3390, 2008 (Emre Tuna 2008 paper online).
Hongwei Zhang, F.L. Lewis, and Abhijit Das, “Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,” IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.
OPTIMAL design at each node gives global guaranteed performance on any strongly connected communication graph.
Li, Duan, Chen: Finsler’s Lemma
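A minimal sketch of the local Riccati design, under assumptions not taken from the slides. The agents are double integrators, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $B = [0 \;\; 1]^T$, with Q = I and R = 1, for which the local ARE has the closed-form solution $P = \begin{bmatrix} \sqrt3 & 1 \\ 1 & \sqrt3 \end{bmatrix}$ and $K = R^{-1}B^TP = [1 \;\; \sqrt3]$. The hypothetical graph is a directed chain leader → 1 → 2 → 3 → 4, so L + G is lower triangular with all eigenvalues equal to 1, and c = 1 satisfies the coupling-gain bound $c \ge 1/(2\min_i \mathrm{Re}\,\lambda_i(L+G)) = 1/2$ (our reading of the condition in the cited Zhang, Lewis and Das paper).

```python
import math

# Assumed agent model: double integrator A = [[0, 1], [0, 0]], B = [0, 1]^T, Q = I, R = 1.
# The local ARE A^T P + P A + Q - P B B^T P = 0 is solved by P = [[sqrt3, 1], [1, sqrt3]].
s3 = math.sqrt(3.0)
P = [[s3, 1.0], [1.0, s3]]
are_residual = [
    [1.0 - P[0][1] ** 2, P[0][0] - P[0][1] * P[1][1]],
    [P[0][0] - P[0][1] * P[1][1], 2.0 * P[0][1] + 1.0 - P[1][1] ** 2],
]
K = (P[1][0], P[1][1])  # K = R^{-1} B^T P = second row of P = [1, sqrt(3)]
c = 1.0                 # coupling gain; 1 / (2 * min Re lambda(L+G)) = 0.5 here

# Hypothetical directed chain: leader -> agent 0 -> agent 1 -> agent 2 -> agent 3.
adj = [[0, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0]]
g = [1, 0, 0, 0]  # pinning gains: only agent 0 receives the leader state


def simulate(t_final=30.0, dt=2e-3):
    """Forward Euler for agents x_i' = A x_i + c B K eps_i and leader x_0' = A x_0."""
    x0 = [0.0, 1.0]  # leader: position 0, constant velocity 1
    xs = [[5.0, -2.0], [-3.0, 0.0], [2.0, 2.0], [-1.0, -1.0]]
    n = len(xs)
    for _ in range(int(round(t_final / dt))):
        new_xs = []
        for i in range(n):
            # local neighborhood tracking error eps_i (position and velocity parts)
            eps = [sum(adj[i][j] * (xs[j][k] - xs[i][k]) for j in range(n))
                   + g[i] * (x0[k] - xs[i][k]) for k in range(2)]
            u = c * (K[0] * eps[0] + K[1] * eps[1])
            new_xs.append([xs[i][0] + dt * xs[i][1], xs[i][1] + dt * u])
        xs = new_xs
        x0 = [x0[0] + dt * x0[1], x0[1]]  # leader has no control input
    return x0, xs


leader, agents = simulate()
err = max(abs(a[k] - leader[k]) for a in agents for k in range(2))
print(err)  # largest position/velocity tracking error at t = 30, close to zero
```

The gain K is designed once from the single-agent ARE; the graph enters only through the scalar coupling gain c, which is the decoupling the slide emphasizes.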
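Example: Unbounded Region of Consensus for Optimal Feedback Gains (example from [Li, Duan, Chen 2009])
$A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
a. Bounded consensus region for an arbitrarily chosen stabilizing SVFB gain $K = [0.5 \;\; 0.5]$
b. Unbounded consensus region for the optimal SVFB gain with Q = I, R = 1: $K = [1.544 \;\; 1.8901]$
[Figure: consensus regions of $A - c\lambda BK$ in the complex plane (Re, Im axes), with the eigenvalues of (L+G) marked]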
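The optimal gain quoted in this example can be reproduced with a short sketch. Assuming the example matrices are $A = \begin{bmatrix} -2 & -1 \\ 2 & 1 \end{bmatrix}$ and $B = [1 \;\; 0]^T$ (our reading of the slide) with Q = I and R = 1, Kleinman's Newton iteration (a standard ARE solver, not something the slide prescribes) starts from the stabilizing gain K = [0.5 0.5], solves a 2×2 Lyapunov equation for the current policy, and updates $K = R^{-1}B^TP$. The helper names are hypothetical.

```python
def solve3(M, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def lyap2(F, W):
    """Symmetric P with F^T P + P F + W = 0 (2x2); unknowns p11, p12, p22."""
    M = [[2 * F[0][0], 2 * F[1][0], 0.0],
         [F[0][1], F[0][0] + F[1][1], F[1][0]],
         [0.0, 2 * F[0][1], 2 * F[1][1]]]
    p11, p12, p22 = solve3(M, [-W[0][0], -W[0][1], -W[1][1]])
    return [[p11, p12], [p12, p22]]


def kleinman(A, B, Q, R, K, iters=30):
    """Newton-Kleinman iteration for A^T P + P A + Q - P B R^-1 B^T P = 0 (single input)."""
    for _ in range(iters):
        F = [[A[i][j] - B[i] * K[j] for j in range(2)] for i in range(2)]      # A - B K
        W = [[Q[i][j] + R * K[i] * K[j] for j in range(2)] for i in range(2)]  # Q + K^T R K
        P = lyap2(F, W)                                                        # policy evaluation
        K = [(B[0] * P[0][j] + B[1] * P[1][j]) / R for j in range(2)]          # K = R^-1 B^T P
    return P, K


A = [[-2.0, -1.0], [2.0, 1.0]]  # our reading of the example matrices
B = [1.0, 0.0]
Q = [[1.0, 0.0], [0.0, 1.0]]
P, K = kleinman(A, B, Q, 1.0, [0.5, 0.5])
print(K)  # should be close to the slide's optimal gain [1.544, 1.8901]
```

Each iteration keeps A - BK Hurwitz, so every Lyapunov solve is well posed; convergence is quadratic once the gain is near the optimum.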
Results:
Local Riccati Design yields guaranteed stable synchronization
Decouples Controls Design from Graph Properties
Globally Optimal Design for Collective Group Motions
B. Optimal Design for Collective Group Motion
Have seen that LOCAL OPTIMAL DESIGN Guarantees Global Synchronization
The method just shown guarantees synchronization on arbitrary graphs that contain a spanning tree with a pinned root node. It is a LOCAL OPTIMAL DESIGN at each agent.
B. GLOBAL OPTIMAL DESIGN FOR COLLECTIVE MOTION ON GRAPHS
What about Global Optimality of cooperative control on graphs?
Problem- the global optimal control is not distributed
The global optimal control is generally distributed only on a complete graph – Wei Ren
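Agent dynamics: $\dot x_i = Ax_i + Bu_i$, $x_i \in R^n$
Global dynamics: $\dot x = (I_N \otimes A)x + (I_N \otimes B)u \equiv \bar A x + \bar B u$, and likewise $\dot\delta = (I_N \otimes A)\delta + (I_N \otimes B)u = \bar A\delta + \bar B u$, with $\underline{x}_0 = (\underline{1} \otimes I_n)x_0$
LQR: $J = \frac12 \int_0^\infty (\delta^T \bar Q \delta + u^T \bar R u)\,dt$
ARE: $\bar A^T P + P\bar A + \bar Q - P\bar B \bar R^{-1} \bar B^T P = 0$
Control $u = -\bar R^{-1}\bar B^T P\delta$ is distributed only on a complete graph (Wei Ren).
BUT a distributed control must have the form $u = -c((L+G) \otimes K)\delta$.
So $\bar Q$ and $\bar R$ must depend on the graph topology.
Inverse Optimality: given $\bar A$, $\bar B$, and the distributed control form $u = -c((L+G) \otimes K)\delta$, find $\bar Q$ and $\bar R$ such that the LQR-case ARE holds.
Kristian Hengster-Movric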
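Cooperative tracker with local gain $K_2$:
Agent dynamics $\dot x_i = Ax_i + Bu_i$, $x_i \in R^n$; leader (system) $\dot x_0 = Ax_0$, $\underline{x}_0 = (\underline{1} \otimes I_n)x_0$
Global dynamics $\dot x = (I_N \otimes A)x + (I_N \otimes B)u$, so that $\dot\delta = (I_N \otimes A)\delta + (I_N \otimes B)u$
Local neighborhood tracking error $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$; global disagreement error $\varepsilon = -((L+G) \otimes I_n)\delta$
Distributed control: $u_i = cK_2\varepsilon_i$, i.e. $u = -c((L+G) \otimes K_2)\delta$
Closed-loop system: $\dot x_i = Ax_i + cBK_2\Big(\sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)\Big)$, i.e. $\dot x = (I_N \otimes A - c(L+G) \otimes BK_2)x + c((L+G) \otimes BK_2)\underline{x}_0$
Global synchronization error dynamics: $\dot\delta = (I_N \otimes A - c(L+G) \otimes BK_2)\delta$, with the graph structure in L + G and the control structure in $BK_2$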
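B.1 Optimal Cooperative Tracker for Single-Integrator Dynamics
System: $\dot x_i = u_i$, $x_i \in R$; leader node: $\dot x_0 = 0$, $\underline{x}_0 = \underline{1}x_0$
$x = [x_1 \cdots x_N]^T$, $u = [u_1 \cdots u_N]^T$, $\varepsilon = [\varepsilon_1 \cdots \varepsilon_N]^T$, $G = \mathrm{diag}\{g_i\}$
Local neighborhood tracking error: $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$
Global disagreement error: $\varepsilon = -(L+G)\delta$
Control: $u_i = \varepsilon_i$, i.e. $u = -(L+G)\delta$; closed-loop system: $\dot\delta = -(L+G)\delta$
There is no control structure here: focus on the graph structure.
ARE: $A^TP + PA + Q - PBR^{-1}B^TP = 0$, here with A = 0 and B = I
Use the local neighborhood tracking error in the cost function! This leads to a condition on the graph topology.
Kristian Hengster-Movric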
Cooperative Tracker for Identical LTI Dynamics
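$\bar Q = c^2\big((L+G)^TR_1(L+G)\big) \otimes \big(K_2^TR_2K_2\big) - cR_1(L+G) \otimes \big(A^TP_2 + P_2A\big)$
Q depends on graph topology: it involves $R_1(L+G)$ and $(L+G)^TR_1(L+G)$, and its positive definiteness requires the coupling gain c to be large enough; the slide states this bound as a ratio of a $\lambda_{\max}$ to a $\lambda_{\min}$ of such graph-dependent matrices combined with $Q_2$ and $K_2^TR_2K_2$.
TWO CONDITIONS
Kristian Hengster-Movric
1. The local optimal design from before
2. A new condition on the graph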
Proof:
System: $\dot\delta = (I_N \otimes A)\delta + (I_N \otimes B)u$
ARE: $(I_N \otimes A)^TP + P(I_N \otimes A) + \bar Q - P(I_N \otimes B)\bar R^{-1}(I_N \otimes B)^TP = 0$
Select $P = P_1 \otimes P_2$ and $\bar R = R_1 \otimes R_2$, subject to 2 conditions:
$P_1 = cR_1(L+G)$
$A^TP_2 + P_2A + Q_2 - P_2BR_2^{-1}B^TP_2 = 0$ (local ARE)
Then the ARE becomes $P_1 \otimes (A^TP_2 + P_2A) + \bar Q - (P_1R_1^{-1}P_1) \otimes (P_2BR_2^{-1}B^TP_2) = 0$.
Choose
$\bar Q = (P_1R_1^{-1}P_1) \otimes (P_2BR_2^{-1}B^TP_2) - P_1 \otimes (A^TP_2 + P_2A)$
$= c^2\big((L+G)^TR_1(L+G)\big) \otimes (K_2^TR_2K_2) - cR_1(L+G) \otimes (A^TP_2 + P_2A)$
$= P_1 \otimes Q_2 + (P_1R_1^{-1}P_1 - P_1) \otimes (P_2BR_2^{-1}B^TP_2)$,
using $P_1R_1^{-1}P_1 = c^2(L+G)^TR_1(L+G)$ (since $P_1 = P_1^T$), $P_2BR_2^{-1}B^TP_2 = K_2^TR_2K_2$ with $K_2 = R_2^{-1}B^TP_2$, and the local ARE.
Control: $u = -\bar R^{-1}(I_N \otimes B)^TP\delta = -(R_1^{-1}P_1) \otimes (R_2^{-1}B^TP_2)\delta = -c((L+G) \otimes K_2)\delta$
Distributed!!
Kristian Hengster-Movric
Two Conditions for Global Optimal Design on the Graph
1. Condition on graph topology: $P_1 = cR_1(L+G)$ for some $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$
2. Local agent control design condition (same as before: local optimal control): $A^TP_2 + P_2A + Q_2 - P_2BR_2^{-1}B^TP_2 = 0$ for some $P_2 = P_2^T > 0$, $R_2 = R_2^T > 0$, $Q_2 = Q_2^T > 0$. This always holds if (A, B) is reachable.
The locally optimal design is also globally optimal on the graph if condition 1 holds.
Condition on Graph Topology: $P_1 = cR_1(L+G)$, with $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$
1. Undirected Graphs: $(L+G)^T = L+G$
Then $R_1(L+G) = (L+G)^TR_1$ becomes $R_1(L+G) = (L+G)R_1$: the condition becomes a commutativity requirement, which holds iff $R_1$ and L + G have the same eigenvectors.
Case 1. $R_1 = I$. For single-integrator dynamics,
$J(\delta(0), u) = \int_0^\infty \big(\delta^T(L+G)^T(L+G)\delta + u^Tu\big)\,dt = \int_0^\infty (\varepsilon^T\varepsilon + u^Tu)\,dt$
Case 2. Let $L + G = T\Lambda T^T$ (Jordan form, with Λ diagonal and T orthogonal). Select $R_1 = T\Pi T^T$ for any diagonal $\Pi > 0$. This is equivalent to $R_1(L+G) = (L+G)R_1$.
R depends on graph topology: ALL eigenvectors.
Kristian Hengster-Movric
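2. Detail Balanced Graphs: there exist $\lambda_i > 0$ such that $\lambda_i a_{ij} = \lambda_j a_{ji}$ for all i, j.
Then $\lambda = [\lambda_1 \cdots \lambda_N]^T$ is a left eigenvector of L for the eigenvalue 0: $\lambda^TL = 0$.
Equivalently, $L = D\bar P$ with $D = \mathrm{diag}\{1/\lambda_i\} > 0$ and $\bar P$ a symmetric graph Laplacian matrix.
Then $L + G = D\bar P + G = D(\bar P + D^{-1}G)$, where $\bar P + D^{-1}G$ is symmetric.
Select $R_1 = D^{-1}$, so that $P_1 = cD^{-1}(L+G) = cR_1(L+G)$ is symmetric.
Detail balanced implies reversibility of an associated Markov process.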
Detail balanced implies balanced
$P_1 = cR_1(L+G)$ symmetric is equivalent to $R_1(L+G) = (L+G)^TR_1$, with $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$
R depends on graph topology: principal left eigenvector.
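The detail-balance construction can be checked with exact rational arithmetic. In this illustrative sketch a detail-balanced digraph is built from a hypothetical symmetric weight pattern S and positive weights $\lambda_i$ by setting $a_{ij} = s_{ij}/\lambda_i$, so that $\lambda_i a_{ij} = s_{ij} = \lambda_j a_{ji}$. The code then verifies $\lambda^TL = 0$ and that $R_1(L+G)$ with $R_1 = D^{-1} = \mathrm{diag}\{\lambda_i\}$ is symmetric, as condition 1 requires.

```python
from fractions import Fraction

lam = [Fraction(2), Fraction(1), Fraction(4)]  # hypothetical lambda_i > 0
S = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]          # hypothetical symmetric weights
G = [Fraction(1), Fraction(0), Fraction(0)]    # pin node 1 only
n = len(lam)

# a_ij = s_ij / lambda_i gives lambda_i a_ij = lambda_j a_ji (detail balance)
A = [[Fraction(S[i][j]) / lam[i] for j in range(n)] for i in range(n)]


def is_detail_balanced(A, lam):
    return all(lam[i] * A[i][j] == lam[j] * A[j][i] for i in range(n) for j in range(n))


def laplacian(A):
    d = [sum(row) for row in A]
    return [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


L = laplacian(A)
lam_T_L = [sum(lam[i] * L[i][j] for i in range(n)) for j in range(n)]  # lambda^T L
# R1 (L + G) with R1 = diag(lambda_i)
R1_LG = [[lam[i] * (L[i][j] + (G[i] if i == j else 0)) for j in range(n)] for i in range(n)]
symmetric = all(R1_LG[i][j] == R1_LG[j][i] for i in range(n) for j in range(n))
print(is_detail_balanced(A, lam), [str(v) for v in lam_T_L], symmetric)  # True ['0', '0', '0'] True
```

Exact fractions avoid any floating-point tolerance in the symmetry and null-vector checks.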
3. Directed Graphs with Simple Graph Laplacian L + G
Let $T^{-1}(L+G)T = \Lambda$ be the diagonal Jordan form (T real).
Since Λ is diagonal, $T^{-1}(L+G)T = \Lambda = \Lambda^T = T^T(L+G)^TT^{-T}$,
so $T^{-T}T^{-1}(L+G) = (L+G)^TT^{-T}T^{-1}$.
Select $R_1 = T^{-T}T^{-1} > 0$. Then $R_1(L+G) = (L+G)^TR_1$, as required by $P_1 = cR_1(L+G)$ with $P_1 = P_1^T > 0$, $R_1 = R_1^T > 0$.
Dennis Bernstein, Matrix book
R depends on graph topology: ALL eigenvectors.
$(L+G)^TR_1 = R_1(L+G)$, $R_1 = R_1^T > 0$: a new class of digraphs
A.2 Discrete-Time Optimal Design for Synchronization
Distributed systems: $x_i(k+1) = Ax_i(k) + Bu_i(k)$
Command generator: $x_0(k+1) = Ax_0(k)$
Local neighborhood tracking error: $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$
Local cooperative SVFB (weighted): $u_i = c(1 + d_i + g_i)^{-1}K\varepsilon_i$
Local closed-loop dynamics: $x_i(k+1) = Ax_i(k) + c(1 + d_i + g_i)^{-1}BK\varepsilon_i(k)$
Global disagreement error $\delta(k) = x(k) - \underline{x}_0(k)$, with dynamics
$\delta(k+1) = \big(I_N \otimes A - c(I + D + G)^{-1}(L+G) \otimes BK\big)\delta(k)$
Weighted graph matrix: $\Gamma = (I + D + G)^{-1}(L+G)$, with weighted graph eigenvalues $\Lambda_k$, $k = 1, \ldots, N$
Decouple controls design from graph topology.
K. Hengster-Movric, Keyou You, F.L. Lewis, and Lihua Xie, “Synchronization of Discrete-Time Multi-agent Systems on Graphs Using Riccati Design,” Automatica, to appear.
Synchronization error dynamics: $\delta(k+1) = \big(I_N \otimes A - c\Gamma \otimes BK\big)\delta(k)$, with $\Gamma = (I + D + G)^{-1}(L+G)$ and weighted graph eigenvalues $\Lambda_k$, $k = 1, \ldots, N$
MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE
Covering circle of the graph eigenvalues: center $c_0$, radius $r_0$ (graph properties)
Synchronization region contains the circle with center 1 and radius r (control design)
Kristian Movric: Decouple Controls Design From Graph Topology
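Single-Input Case with Real Graph Eigenvalues
Synchronization is guaranteed if $r_0/c_0 < r$, where $r = \big[\lambda_{\max}\big(Q^{-T/2}A^TPB(B^TPB)^{-1}B^TPAQ^{-1/2}\big)\big]^{-1/2}$
If the graph eigenvalues are real: $c_0 = \frac{\lambda_{\max} + \lambda_{\min}}{2}$, $r_0 = \frac{\lambda_{\max} - \lambda_{\min}}{2}$, so $\frac{r_0}{c_0} = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}$
For single-input systems, with a proper choice of Q: $r = 1/\prod_u |\lambda_u(A)|$, the product taken over the unstable eigenvalues of A (Mahler measure)
$\sum_i \log_2 |\lambda_i^u(A)|$ = intrinsic entropy rate = minimum data rate in a networked control system that enables stabilization of an unstable system (Baillieul and others)
$\lambda_{\min}/\lambda_{\max}$: eigenratio = ‘condition number’ of the communication graph
Work on log quantization: Elia & Mitter, Lihua Xie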
Single-Input Case with Real Graph Eigenvalues
Mahler measure: $M(A) = \prod_{\text{unstable } \lambda_i} |\lambda_i(A)|$
Guoxiang Gu, L. Marinovici, and F.L. Lewis, “Consensusability of discrete-time dynamic multi-agent systems,” IEEE Trans. Automatic Control, to appear, 2012.
Graph condition number: $\kappa(L+G) = \lambda_{\max}/\lambda_{\min}$
Synchronization guaranteed if $M(A) < \frac{1 + 1/\kappa}{1 - 1/\kappa}$
Topological entropy: $h(A) = \log(M(A))$
New definition, graph channel capacity: $C(L+G) = \log \frac{1 + 1/\kappa}{1 - 1/\kappa}$, so the condition reads $h(A) < C(L+G)$
Like to have $\kappa(G) \approx 1$; $\lambda_{\min}$ large means fast convergence (Varshney)
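The Mahler-measure test can be evaluated directly. The sketch below is a minimal illustration that assumes a 2×2 real A and real graph eigenvalues $0 < \lambda_{\min} \le \lambda_{\max}$. It computes $M(A)$, the topological entropy $h(A) = \log M(A)$ and the graph channel capacity $C = \log \frac{1 + \kappa^{-1}}{1 - \kappa^{-1}}$, then reports whether $h(A) < C$, which is the same as $M(A) < \frac{1 + \kappa^{-1}}{1 - \kappa^{-1}}$. The example matrix and eigenvalue ranges are hypothetical.

```python
import cmath
import math


def eig2(A):
    """Eigenvalues of a real 2x2 matrix from its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]


def mahler_measure(eigs):
    """M(A): product of |lambda| over the unstable (|lambda| > 1) eigenvalues."""
    m = 1.0
    for z in eigs:
        if abs(z) > 1:
            m *= abs(z)
    return m


def graph_channel_capacity(lam_min, lam_max):
    """C = log((1 + 1/kappa) / (1 - 1/kappa)) with kappa = lam_max / lam_min."""
    inv_kappa = lam_min / lam_max
    if inv_kappa >= 1:
        return math.inf  # all graph eigenvalues equal: no eigenratio restriction
    return math.log((1 + inv_kappa) / (1 - inv_kappa))


def sync_guaranteed(A, lam_min, lam_max):
    h = math.log(mahler_measure(eig2(A)))  # topological entropy h(A)
    return h < graph_channel_capacity(lam_min, lam_max)


A = [[1.2, 1.0], [0.0, 0.5]]         # one unstable eigenvalue, 1.2
print(mahler_measure(eig2(A)))        # about 1.2
print(sync_guaranteed(A, 1.0, 3.0))   # kappa = 3: capacity log 2 exceeds h(A)
print(sync_guaranteed(A, 0.1, 10.0))  # kappa = 100: graph too ill-conditioned
```

The same comparison can be read the other way round: for fixed A, $\kappa$ must stay below $(M(A)+1)/(M(A)-1)$, which is 11 for this A.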
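Single-Input Case with Real Graph Eigenvalues
$\frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} < \frac{1}{\prod_u |\lambda_u(A)|} = \frac{1}{M(A)}$
is equivalent to $\frac{\lambda_{\max}}{\lambda_{\min}} < \frac{M(A) + 1}{M(A) - 1}$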
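System: $x_i(k+1) = Ax_i(k) + Bu_i(k)$, with protocol $u_i = cK\varepsilon_i$, $\varepsilon_i = \sum_{j \in N_i} a_{ij}(x_j - x_i) + g_i(x_0 - x_i)$
Add a stable filter: $u_i = cF(z)K\varepsilon_i$
Filtered protocol gives synchronization if $\frac{\lambda_{\max}}{\lambda_{\min}} < \Big(\frac{M(A) + 1}{M(A) - 1}\Big)^2$
Design: $P \ge 0$ is the stabilizing solution of a modified discrete-time Riccati equation in A and B; the gain $K_{\min}$ is computed from P, and the filter F(z) is built from the complementary sensitivity $T_{\min}(z) = K_{\min}(zI - A + BK_{\min})^{-1}B$.
Guoxiang Gu & Lewis, IEEE TAC
Improvement over the unfiltered condition $\frac{\lambda_{\max}}{\lambda_{\min}} < \frac{M(A) + 1}{M(A) - 1}$
Graph condition number: $\kappa(G) = \lambda_{\max}/\lambda_{\min}$; like to have $\kappa(G) \approx 1$
$\lambda_{\min}$ large means fast convergence
Eigenratio = $\lambda_{\min}/\lambda_{\max}$
L.R. Varshney, “Distributed inference with costly wires”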
Games on Communication Graphs: Kyriakos Vamvoudakis, Mohammed Abouheaf
Sun Tz bin fa 孙子兵法
Graphical Coalitional Games
500 BC
F.L. Lewis, K. Vamvoudakis, M. Abouheaf
Games on Communication Graphs
Manufacturing as the Interactions of Multiple Agents
Each machine has its own dynamics and cost function.
Neighboring machines influence each other most strongly.
There are local optimization requirements as well as global necessities.
K.G. Vamvoudakis, F.L. Lewis, and G.R. Hudas, “Multi-Agent Differential Graphical Games: online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.
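Graphical Games: Synchronization, Cooperative Tracker Problem
Node dynamics: $\dot x_i = Ax_i + B_iu_i$, $x_i(t) \in R^n$, $u_i(t) \in R^{m_i}$
Target generator dynamics: $\dot x_0 = Ax_0$
Synchronization problem: $x_i(t) \to x_0(t)$ for all i
Local neighborhood tracking error (Lihua Xie): $\delta_i = \sum_{j \in N_i} e_{ij}(x_i - x_j) + g_i(x_i - x_0)$, with pinning gains $g_i \ge 0$ (Ron Chen)
Global neighborhood tracking error: $\delta = [\delta_1^T \; \delta_2^T \cdots \delta_N^T]^T = ((L+G) \otimes I_n)(x - \underline{x}_0) = ((L+G) \otimes I_n)\zeta$,
where $x = [x_1^T \; x_2^T \cdots x_N^T]^T$, $\underline{x}_0 = (\underline{1} \otimes I_n)x_0 \in R^{nN}$, and $\zeta = x - \underline{x}_0$ is the global synchronization error
Lemma. Let the graph be strongly connected and at least one pinning gain nonzero. Then $\|\zeta\| \le \|\delta\| / \underline{\sigma}(L+G)$, and the agents synchronize iff $\delta(t) \to 0$.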
Graphical Game: Games on Graphs
Local neighborhood tracking error dynamics (local agent dynamics driven by neighbors’ controls):
$\dot\delta_i = A\delta_i + (d_i + g_i)B_iu_i - \sum_{j \in N_i} e_{ij}B_ju_j$
Define the local neighborhood performance index (values driven by neighbors’ controls):
$J_i(\delta_i(0), u_i, u_{-i}) = \frac12 \int_0^\infty \Big(\delta_i^TQ_{ii}\delta_i + u_i^TR_{ii}u_i + \sum_{j \in N_i} u_j^TR_{ij}u_j\Big)dt \equiv \frac12 \int_0^\infty L_i(\delta_i(t), u_i(t), u_{-i}(t))\,dt$, with $u_{-i} = \{u_j : j \in N_i\}$
Local value functions for fixed policies $u_i$, $u_{-i}$ (value depends only on neighbors):
$V_i(\delta_i(t)) = \frac12 \int_t^\infty \Big(\delta_i^TQ_{ii}\delta_i + u_i^TR_{ii}u_i + \sum_{j \in N_i} u_j^TR_{ij}u_j\Big)d\tau$
Static graphical game: $G = (U, \nu)$ on the graph $\mathcal{G} = (V, E)$, $V = \{v_1, \ldots, v_N\}$, with payoffs $\nu_i : U_i \times \{U_j : j \in N_i\} \to R$
Standard N-player differential game, for comparison: the dynamics depend on all other agents, $\dot z = Az + \sum_{i=1}^N B_iu_i$, and so do the values, $J_i(z(0), u_i, u_{-i}) = \frac12 \int_0^\infty \Big(z^TQ_iz + \sum_{j=1}^N u_j^TR_{ij}u_j\Big)dt$
Kyriakos Vamvoudakis
New Differential Graphical Game
State dynamics of agent i (local dynamics): $\dot\delta_i = A\delta_i + (d_i + g_i)B_iu_i - \sum_{j \in N_i} e_{ij}B_ju_j$
Value function of player i (local value function, only depends on graph neighbors): $J_i(\delta_i(0), u_i, u_{-i}) = \frac12 \int_0^\infty \Big(\delta_i^TQ_{ii}\delta_i + u_i^TR_{ii}u_i + \sum_{j \in N_i} u_j^TR_{ij}u_j\Big)dt$
$u_i$: control action of player i
Standard Multi-Agent Differential Game
Central dynamics: $\dot z = Az + \sum_{i=1}^N B_iu_i$
Value function of player i (local value function depends on ALL other control actions): $J_i(z(0), u_i, u_{-i}) = \frac12 \int_0^\infty \Big(z^TQ_iz + \sum_{j=1}^N u_j^TR_{ij}u_j\Big)dt$
$u_i$: control action of player i
The objective function of each player can be written as a team average term plus a conflict-of-interest term:
$J_1 = \frac13(J_1 + J_2 + J_3) + \frac13(J_1 - J_2) + \frac13(J_1 - J_3) \equiv J^{team} + J_1^{coi}$
$J_2 = \frac13(J_1 + J_2 + J_3) + \frac13(J_2 - J_1) + \frac13(J_2 - J_3) \equiv J^{team} + J_2^{coi}$
$J_3 = \frac13(J_1 + J_2 + J_3) + \frac13(J_3 - J_1) + \frac13(J_3 - J_2) \equiv J^{team} + J_3^{coi}$
For N players:
$J_i = \frac1N \sum_{j=1}^N J_j + \frac1N \sum_{j=1}^N (J_i - J_j) \equiv J^{team} + J_i^{coi}$, $i = 1, \ldots, N$
For N-player zero-sum games, the first term is zero, i.e. the players have no goals in common.
Team Interest vs. Self Interest; Cooperation vs. Collaboration
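The team/conflict split is simple arithmetic on the player costs. This sketch uses exact fractions and hypothetical cost values to check the identities above: the conflict-of-interest terms sum to zero, and a zero-sum game has a vanishing team term.

```python
from fractions import Fraction


def team_and_conflict(J):
    """J_i = J_team + J_i^coi, with J_team = (1/N) sum_j J_j."""
    N = len(J)
    team = sum(J, Fraction(0)) / N
    coi = [Ji - team for Ji in J]
    return team, coi


J = [Fraction(4), Fraction(1), Fraction(7)]  # hypothetical player costs
team, coi = team_and_conflict(J)
print(team, [str(c) for c in coi])  # 4 ['0', '-3', '3']

# Zero-sum example: the team term vanishes and only conflict of interest remains.
zs_team, zs_coi = team_and_conflict([Fraction(5), Fraction(-2), Fraction(-3)])
print(zs_team, [str(c) for c in zs_coi])  # 0 ['5', '-2', '-3']
```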
Problems with Nash Equilibrium Definition on Graphical Games
Game objective: $V_i^*(\delta_i(t)) = \min_{u_i} \frac12 \int_t^\infty \Big(\delta_i^TQ_{ii}\delta_i + u_i^TR_{ii}u_i + \sum_{j \in N_i} u_j^TR_{ij}u_j\Big)d\tau$
Neighbors of node i: $u_{-i} = \{u_j : j \in N_i\}$. All other nodes in the graph: $u_{\mathcal{G}-i} = \{u_j : j \in N, j \ne i\}$
Def: Nash equilibrium. Policies $u_1^*, u_2^*, \ldots, u_N^*$ are in Nash equilibrium if
$J_i^* \equiv J_i(u_i^*, u_{\mathcal{G}-i}^*) \le J_i(u_i, u_{\mathcal{G}-i}^*)$, for all $i \in N$
Counterexample: disconnected graph. Let each node play its optimal control $u_i^*$, with $J_i^* = J_i(u_i^*)$. Then each agent’s cost does not depend on any other agent: $J_i(u_i^*) = J_i(u_i^*, u_{\mathcal{G}-i}) = J_i(u_i^*, u'_{\mathcal{G}-i})$ for all i. Then all agents are in Nash equilibrium. Note: this Nash equilibrium is also coalition-proof.
Another example
Def. Local best response. $u_i^*$ is said to be agent i’s local best response to fixed policies $u_{-i}$ of its neighbors if $J_i(u_i^*, u_{-i}) \le J_i(u_i, u_{-i})$ for all $u_i$.
New Definition of Nash Equilibrium for Graphical Games
Def: Interactive Nash equilibrium. Policies $u_1^*, u_2^*, \ldots, u_N^*$ are in Interactive Nash equilibrium if
1. They are in Nash equilibrium: $J_i^* \equiv J_i(u_i^*, u_{\mathcal{G}-i}^*) \le J_i(u_i, u_{\mathcal{G}-i}^*)$, for all $i \in N$
2. Interaction condition: for all $i, j \in N$ there exists a policy $u_j$ such that $J_i(u_j^*, u_{\mathcal{G}-j}^*) \ne J_i(u_j, u_{\mathcal{G}-j}^*)$
That is, every player can find a policy that changes the value of every other player.
The interaction condition is a restriction on what sorts of performance indices can be selected in multi-player graph games, and a condition on the reaction curves (Basar and Olsder) of the agents. This rules out the disconnected counterexample.
Theorem 3. Let $(A, B_i)$ be reachable for all i. Let each agent i be in local best response, $J_i(u_i^*, u_{-i}) \le J_i(u_i, u_{-i})$ for all i. Then $u_1^*, u_2^*, \ldots, u_N^*$ are in global Interactive Nash equilibrium iff the graph is strongly connected.
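Proof sketch (Hamiltonian system): with costates $p_i$ and best-response policies $u_i = -(d_i + g_i)R_{ii}^{-1}B_i^T\,\partial V_i/\partial\delta_i = K_ip_i$, perturb the policy of agent k, $u_k = K_kp_k + v_k$. The effect of $v_k$ on agent i enters through the powers $L+G$, $(L+G)^2$, $(L+G)^3, \ldots$ of the graph matrix together with the controllability matrix $[B \;\; AB \;\; A^2B \;\cdots]$; the first nonzero term picks out the shortest path from node k to node i.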
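Theorem 3 reduces the interaction condition to a graph property that is easy to test. A minimal sketch, assuming the adjacency convention $a_{ij} > 0$ for an edge from node j to node i: the graph is strongly connected iff every node is reachable from node 1 along the edges and node 1 is reachable from every node, which is a second search on the reversed edges. Function names and the test graphs are illustrative.

```python
from collections import deque


def reachable(succ, start):
    """Breadth-first search: set of nodes reachable from `start` along `succ` lists."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(A):
    """A[i][j] > 0 means an edge j -> i (node i receives information from node j)."""
    n = len(A)
    forward = [[i for i in range(n) if A[i][j] > 0] for j in range(n)]   # j -> i
    backward = [[j for j in range(n) if A[i][j] > 0] for i in range(n)]  # i <- j
    return len(reachable(forward, 0)) == n and len(reachable(backward, 0)) == n


A6 = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
print(is_strongly_connected(A6))     # True
print(is_strongly_connected(chain))  # False: information only flows down the chain
```

In the chain, agent 3 cannot influence agent 1's value, so the interaction condition fails even though the chain has a spanning tree.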
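Graphical Game Solution Equations (Kyriakos Vamvoudakis)
Value function: $V_i(\delta_i(t)) = \frac12 \int_t^\infty \Big(\delta_i^TQ_{ii}\delta_i + u_i^TR_{ii}u_i + \sum_{j \in N_i} u_j^TR_{ij}u_j\Big)d\tau$
Its differential equivalent (Leibniz formula) is Bellman’s equation $H_i = 0$, with Hamiltonian (writing $\nabla V_i \equiv \partial V_i/\partial\delta_i$)
$H_i(\delta_i, \nabla V_i, u_i, u_{-i}) = \nabla V_i^T\Big(A\delta_i + (d_i + g_i)B_iu_i - \sum_{j \in N_i} e_{ij}B_ju_j\Big) + \frac12\delta_i^TQ_{ii}\delta_i + \frac12 u_i^TR_{ii}u_i + \frac12 \sum_{j \in N_i} u_j^TR_{ij}u_j = 0$
Stationarity condition: $0 = \partial H_i/\partial u_i \;\Rightarrow\; u_i = -(d_i + g_i)R_{ii}^{-1}B_i^T\nabla V_i$
1. Coupled HJ equations:
$\nabla V_i^TA_i^c + \frac12\delta_i^TQ_{ii}\delta_i + \frac12(d_i + g_i)^2\nabla V_i^TB_iR_{ii}^{-1}B_i^T\nabla V_i + \frac12 \sum_{j \in N_i} (d_j + g_j)^2\nabla V_j^TB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\nabla V_j = 0$, $i \in N$,
where $A_i^c = A\delta_i - (d_i + g_i)^2B_iR_{ii}^{-1}B_i^T\nabla V_i + \sum_{j \in N_i} e_{ij}(d_j + g_j)B_jR_{jj}^{-1}B_j^T\nabla V_j$
2. Best response HJ equations (other players have fixed policies $u_j$):
$0 = H_i(\delta_i, \nabla V_i, u_i^*, u_{-i}) = \nabla V_i^TA_i^c + \frac12\delta_i^TQ_{ii}\delta_i + \frac12(d_i + g_i)^2\nabla V_i^TB_iR_{ii}^{-1}B_i^T\nabla V_i + \frac12 \sum_{j \in N_i} u_j^TR_{ij}u_j$,
where $A_i^c = A\delta_i - (d_i + g_i)^2B_iR_{ii}^{-1}B_i^T\nabla V_i - \sum_{j \in N_i} e_{ij}B_ju_j$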
Reinforcement Learning to Solve Graphical Games
D. Vrabie, K. Vamvoudakis, and F.L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, IET Press, 2012.
Books: F.L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, third edition, John Wiley and Sons, New York, 2012.
New chapters on: Reinforcement Learning; Differential Games
F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32-50, Third Quarter 2009.
IEEE Control Systems Magazine, “Reinforcement learning and feedback control,” Dec. 2012
Different methods of learning
[Figure: adaptive learning system: the actor applies control inputs to the system and environment; the critic compares the outputs with the desired performance and sends a reinforcement signal to tune the actor]
Reinforcement learning: Ivan Pavlov, 1890s
Actor-critic learning: Paul Werbos
We want OPTIMAL performance: ADP, Approximate Dynamic Programming
Sutton & Barto book
Summary of Motor Control in the Human Nervous System (Doya, Kimura, Kawato 2001; picture by E. Stingu, D. Vrabie)
[Figure: hierarchy of multiple parallel loops linking the cerebral cortex motor areas, limbic system, thalamus, basal ganglia, cerebellum, hippocampus, brainstem and spinal cord, with interoceptive and exteroceptive receptors and muscle contraction and movement; labels: reflex, supervised learning, reinforcement learning (dopamine), unsupervised learning, inf. olive (eye movement), memory functions (long term, short term), motor control 200 Hz, theta rhythms 4-10 Hz, gamma rhythms 30-100 Hz]
Online Solution of Graphical Games
Use Reinforcement Learning Convergence Results
POLICY ITERATION
Kyriakos Vamvoudakis
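$H_i(\delta_i, \nabla V_i, u_i, u_{-i}) = \nabla V_i^T\Big(A\delta_i + (d_i + g_i)B_iu_i - \sum_{j \in N_i} e_{ij}B_ju_j\Big) + \frac12\delta_i^TQ_{ii}\delta_i + \frac12 u_i^TR_{ii}u_i + \frac12 \sum_{j \in N_i} u_j^TR_{ij}u_j = 0$, with $\nabla V_i \equiv \partial V_i/\partial\delta_i$
$u_i = -(d_i + g_i)R_{ii}^{-1}B_i^T\nabla V_i$
Policy iteration alternates the two: solve $H_i = 0$ for $V_i$ with all current policies held fixed (policy evaluation), then update each $u_i$ from the new $V_i$ (policy improvement).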
Online Solution of Graphical Games Using Value Function Approximation
Policy iteration gives the structure needed for online graph games. Solve simultaneously online at each node i:
Critic: approximate the values by a critic network, $\hat V_i = \hat W_i^T\phi_i(\delta_i)$
Actor: approximate the control policies by an actor network, $\hat u_i = -\frac12(d_i + g_i)R_{ii}^{-1}B_i^T\nabla\phi_i^T\hat W_{i+N}$
Weierstrass approximator structures: 2 at each node.
The Bellman equation becomes an algebraic equation in the parameters: its residual $e_i$ is linear in the critic weights $\hat W_i$, with regressor $\sigma_i = \nabla\phi_i\big(A\delta_i + (d_i + g_i)B_i\hat u_i - \sum_{j \in N_i} e_{ij}B_j\hat u_j\big)$.
Tuning law for critic parameters (normalized gradient descent on $E_i = \frac12 e_i^2$): $\dot{\hat W}_i = -a_i\frac{\partial E_i}{\partial \hat W_i} = -a_i\frac{\sigma_i}{(1 + \sigma_i^T\sigma_i)^2}e_i$
Tuning law for actor parameters: a gradient-type update of $\hat W_{i+N}$ with design matrix $F_i$ that drives the actor weights toward the critic weights, plus coupling terms in the neighbors’ weights $\hat W_j$, $j \in N_i$
Converges to the solution of the coupled HJ equations and to a Nash equilibrium, and keeps the states stable while learning. Needs persistence of excitation (PE) of $\sigma_i$.
Integral Reinforcement Form of Bellman Equation
Lemma 1 (Draguna Vrabie). The CT Bellman equation
$0 = \frac{\partial V}{\partial x}^T f(x, u) + r(x, u) \equiv H(x, \partial V/\partial x, u)$, $V(0) = 0$
is equivalent to the integral reinforcement form
$V(x(t)) = \int_t^{t+T} r(x, u)\,d\tau + V(x(t+T))$, $V(0) = 0$.
Proof: $\frac{d}{dt}V(x) = \frac{\partial V}{\partial x}^T\dot x = \frac{\partial V}{\partial x}^T f(x, u) = -r(x, u)$, so $\int_t^{t+T} r(x, u)\,d\tau = -\int_t^{t+T} dV(x(\tau)) = V(x(t)) - V(x(t+T))$.
This solves the Lyapunov equation without knowing f(x, u), and allows the definition of a temporal difference error for CT systems:
$e(t) = -V(x(t)) + \int_t^{t+T} r(x, u)\,d\tau + V(x(t+T))$
Can avoid knowledge of the drift term f(x) by using Integral Reinforcement Learning (IRL). Then the HJ equations are solved online without knowing f(x), and coupled AREs are solved online without knowing A.
Draguna Vrabie
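The integral reinforcement form can be turned into a policy-iteration loop that never uses the drift. The sketch below is a scalar illustration under stated assumptions, not the graphical-game algorithm itself: plant $\dot x = ax + bu$, cost $r = qx^2 + Ru^2$ (no factor ½), value $V = px^2$ and policy $u = -kx$. Policy evaluation uses only measured data, $p\,(x(t)^2 - x(t+T)^2) = \int_t^{t+T} r\,d\tau$; policy improvement uses $k = bp/R$, so the learner needs b but never a. The simulator does use a to generate the data, standing in for the real system. For a = b = q = R = 1 the ARE $2ap + q - b^2p^2/R = 0$ gives $p^* = 1 + \sqrt2$.

```python
import math


def rk4_step(x, k, a, b, dt):
    """One RK4 step of the closed loop x' = (a - b k) x. Only the simulator sees `a`."""
    f = lambda z: (a - b * k) * z
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def irl_policy_iteration(a, b, q, R, k0, T=0.5, dt=1e-3, iters=8, x_start=1.0):
    k = k0  # must be stabilizing: a - b * k0 < 0
    p = None
    for _ in range(iters):
        # data collection over one reinforcement interval [t, t + T]
        x = x_start
        reward = 0.0
        for _ in range(int(round(T / dt))):
            x_next = rk4_step(x, k, a, b, dt)
            # trapezoid rule for the integral of r = q x^2 + R u^2 with u = -k x
            reward += 0.5 * dt * (q + R * k * k) * (x * x + x_next * x_next)
            x = x_next
        # policy evaluation: IRL Bellman equation with V(x) = p x^2 (no `a` used)
        p = reward / (x_start ** 2 - x ** 2)
        # policy improvement: u = -R^{-1} b p x (uses b only)
        k = b * p / R
    return p, k


p, k = irl_policy_iteration(a=1.0, b=1.0, q=1.0, R=1.0, k0=2.0)
print(p, 1 + math.sqrt(2))  # learned p versus the ARE solution
```

Each pass is one Kleinman step carried out from trajectory data, so the iterates stay stabilizing and converge quadratically to the ARE solution.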
New Structure of Adaptive Controller
[Figure: the action network applies the control to the system; the policy evaluation (critic) network observes the cost; value update and control policy update]
Critic and actor tuned simultaneously: leads to ONLINE FORWARD-IN-TIME implementation of optimal control.
F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32-50, Third Quarter 2009.
Online Solution of Graphical Games
Do not need to know system drift dynamics
Best Paper Award, Int. Joint Conf. Neural Networks, Barcelona, 2010. Draguna Vrabie and F.L. Lewis, “Adaptive Dynamic Programming Algorithm for Finding Online the Equilibrium Solution of the Two-Player Zero-Sum Differential Game.”
Reinforcement Learning Adaptive Critic
Node dynamics: $\dot x_i = f(x_i) + g(x_i)u_i$
Critic: $\hat V_i = \hat W_i^T\phi_i(\delta_i)$; actor: $\hat u_i = -\frac12(d_i + g_i)R_{ii}^{-1}B_i^T\nabla\phi_i^T\hat W_{i+N}$
Graphical Games for Multi-Process Optimal Control
[Figure: cost function optimization and control update blocks for process i and its neighbor processes j1, j2]
Optimal performance of each process depends on the control of its neighbor processes.
Control policy of each process depends on the performance of its neighbor processes.
Motions of Biological Groups
Fish school
Birds flock
Locusts swarm
Fireflies synchronize
Local / Peer-to-Peer Relationships in socio-biological systems
Our revels now are ended. These our actors, As I foretold you, were all spirits, and Are melted into air, into thin air.
The cloud-capped towers, the gorgeous palaces, The solemn temples, the great globe itself, Yea, all which it inherit, shall dissolve, And, like this insubstantial pageant faded, Leave not a rack behind.
We are such stuff as dreams are made on, and our little life is rounded with a sleep.
Prospero, in The Tempest, act 4, sc. 1, l. 152-6, Shakespeare
The Adaptive Critic Architecture
Adaptive Critics
[Figure: the action network applies the control to the system; the policy evaluation (critic) network observes the cost; value update and control policy update]
Critic and actor tuned simultaneously: leads to ONLINE FORWARD-IN-TIME implementation of optimal control.
Policy Iteration is Reinforcement Learning
A new adaptive control architecture
Optimal Adaptive Control