Real-time Decentralized Pursuer-Evader
Assignment for Cooperating UCAVs Using
the DTC Algorithm∗
Dany Dionne† and C. A. Rabbath‡
A real-time application of the DTC algorithm for decentralized task allocation
is reported. The application involves a team of almost-lighter-than-air
vehicles capable of local sensing and of communications. The objective
of this team is to intercept a group of moving targets. The task allocation
problem is to distribute the targets across the members of the team, in a
decentralized manner, so as to preserve cooperation while minimizing both
communications and traveled distances. Results demonstrate the applicability of the
DTC algorithm to hard real-time problems involving small teams of UAVs.
I. Introduction
The deployment of a team of uninhabited aerial vehicles (UAVs) is a demanding task for
operators in a base station since: (i) simultaneous coordination/cooperation with the other
team members must be ensured, (ii) the communications link between the UAVs and the base
station must be maintained, and (iii) the environment is uncertain and time-varying. To
ease the deployment of a team of networked UAVs, autonomous vehicles with the capability
to cooperatively self-assign tasks are of interest.1 Examples of tasks to be allocated are
waypoints to be reached by the UAVs, or targets to be intercepted. Such a set of tasks is in
general time-varying due to the uncertainties in the environment, e.g., the unknown future
trajectory of the targets, or the online detection of an obstacle that requires modifying the
flight plan to the waypoints.
∗This work was financially supported by the Natural Sciences and Engineering Research Council of Canada.
†Research and Development, Lockheed Martin Canada.
‡Defence Research and Development Canada - Valcartier, and Mechanical Eng. Dept., McGill University, Adjunct Professor.
AIAA Guidance, Navigation and Control Conference and Exhibit, 20 - 23 August 2007, Hilton Head, South Carolina
AIAA 2007-6454
Copyright © 2007 by Dany Dionne and Camille A. Rabbath. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.
The general problem of dynamically coordinating a group of multiple robots satisfying
multiple goals is as yet unsolved.2 Nonetheless, several task allocation strategies for simplified
problems have been proposed. These strategies are either centralized, i.e., a single entity
allocates the tasks for the whole team, or decentralized, i.e., each UAV allocates its own
tasks. Decentralized task allocation strategies involve UAVs with different input information
(each UAV has different local measurements, independent noises, and so on). Each
UAV therefore feeds its decentralized task allocation rule with a different information input,3,4 and
the output of this rule (the allocated task) may fail to maintain cooperation across the
team.
Decentralized task allocation algorithms for robots and UAVs were proposed in Refs. 5–7.
In Ref. 5, a decentralized task allocation algorithm was proposed that fosters cooperation by
asynchronous intermittent communications; each UAV communicates when the error between
its local and the shared information exceeds a given threshold. In Ref. 6, efficient information
exchange was investigated by transmitting only the data with the largest impact on the
performance of the closed-loop system. In Ref. 7, a decentralized task allocation algorithm
was proposed where the decision to communicate is triggered by deviations in the outputs of
the decentralized task allocation rule. The resulting asynchronous and intermittent communications
were shown to be efficient in the sense that the number of communication events
was limited, while both the cooperation across the team and the minimization of the global
cost function were preserved.
This paper presents a real-time application of the decentralized task consensus (DTC)
algorithm introduced in Ref. 7. The studied scenario is a pursuit-evasion engagement, where
the pursuers and the evaders are the uninhabited almost-lighter-than-air vehicles (ALTAVs)
described in Ref. 8. The vehicle model comprises 6-DOF nonlinear equations of motion with
noise in sensor measurements and actuator limits. The pursuers are equipped with local
sensors and a communications system. The tasks to be allocated across the team are the
evaders to be reached. Real-time simulations are carried out on a multiprocessor testbed.
The testbed comprises a cluster of four PCs running the RedHawk Linux operating system,
with fast communications and hardware synchronization. This enables fast, real-time
simulations of teams of pursuer vehicles. The simulations show the effectiveness of the DTC
algorithm as well as the computing times needed for the real-time execution of the guidance
and control laws in various engagement scenarios.
II. Scenario
A three-dimensional pursuit-evasion scenario with N pursuers and M evaders is adopted.
The objective of the pursuers is to intercept the evaders while minimizing the communications
and the distance to be traveled. The adopted solution is the DTC algorithm,7 which repeatedly
updates the target allocation as new measurements/information arrive.
Figure 1. Control loop of a pursuer. [Block diagram: local measurements, received communication, state estimates, task allocation, guidance, ALTAV dynamics and autopilot, decision to communicate, transmit communication.]
Each pursuer obtains information through communications and by gathering measure-
ments. The measurements gathered by a pursuer are about its own position and the position
of the targets; these measurements are gathered at constant rates. The communications
received by a pursuer describe the state of the other pursuers in the team; these communi-
cations are intermittent and subject to a communication delay. Whenever a pursuer decides
to transmit data, this information is broadcast to all the other pursuers.
The control loop of a pursuer is illustrated in Fig. 1. The components of the control loop
are described below.
A. ALTAV dynamics and Autopilot
The nonlinear ALTAV dynamics were derived from experiments by Quanser Inc.8 Each
ALTAV is equipped with four motors. The nonlinear dynamics of each ALTAV are given by
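\begin{align}
M\ddot{x} &= -C_x\dot{x} + \sum_{i=1}^{4} F_i \sin(\gamma) \tag{1a}\\
M\ddot{y} &= -C_y\dot{y} + \sum_{i=1}^{4} F_i \sin(\phi) \tag{1b}\\
M\ddot{z} &= F_g - F_B - C_z\dot{z} - \sum_{i=1}^{4} F_i \cos(\gamma)\cos(\phi) \tag{1c}\\
J_\theta\ddot{\theta} &= \left(F_1 l_1 - F_2 l_2 + F_3 l_3 - F_4 l_4\right)\sin(\rho) - C_\theta\dot{\theta} \tag{1d}\\
J_\gamma\ddot{\gamma} &= F_1 l_1 - F_3 l_3 - F_B l_B \sin(\gamma) - C_\gamma\dot{\gamma} \tag{1e}\\
J_\phi\ddot{\phi} &= -F_2 l_2 + F_4 l_4 - F_B l_B \sin(\phi) - C_\phi\dot{\phi} \tag{1f}
\end{align}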
where $M = 1.618$ [kg] is the mass, $J_\theta = 0.995$, $J_\gamma = 1.005$, and $J_\phi = 1.005$ [kg m$^2$/rad]
are the moments of inertia about the x, y, and z axes, respectively, $F_g = 9.8M$ [N] is the
force due to gravity, $F_B = 13$ [N] is the buoyant force, $l_1 = l_2 = l_3 = l_4 = 0.941$ [m] are
the perpendicular distances between each of the motors and the vehicle center of gravity,
$C_x = C_y = C_z = 0.95$ [kg/s] and $C_\theta = C_\gamma = C_\phi = 0.5$ [kg m$^2$/rad] are the drag coefficients
in the directions $x$, $y$, $z$, $\theta$, $\gamma$, and $\phi$, respectively, $\rho = 6\pi/180$ [rad] is the angular offset from
vertical of the motors' thrust vector, and $F_i$, $i \in \{1, 2, 3, 4\}$, are the force magnitudes of the
motors. The four control variables are the values of $F_i$, $i \in \{1, 2, 3, 4\}$.
The objective of the autopilot is to steer the ALTAV toward a desired position while
maintaining θ(t) = 0. The adopted controllers are PIDs and are described in detail in
Ref. 9.
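The right-hand sides of Eq. (1) can be evaluated directly, which is a convenient way to sanity-check the parameter values. The following is a minimal sketch, not the simulation code used by the authors. It takes the signs of Eq. (1) literally and uses the parameter values quoted above; the lever arm `L_B` is not given in the text, so its value here is a placeholder assumption, and the function name `altav_accelerations` is hypothetical.

```python
import math

# Parameters quoted in the text for Eq. (1).
M = 1.618                                      # mass [kg]
J_THETA, J_GAMMA, J_PHI = 0.995, 1.005, 1.005  # moments of inertia
F_G = 9.8 * M                                  # force due to gravity [N]
F_B = 13.0                                     # buoyant force [N]
L_ARM = 0.941                                  # l1 = l2 = l3 = l4 [m]
C_LIN = 0.95                                   # Cx = Cy = Cz [kg/s]
C_ROT = 0.5                                    # C_theta = C_gamma = C_phi
RHO = 6.0 * math.pi / 180.0                    # thrust-vector offset [rad]
L_B = 0.1  # ASSUMPTION: lever arm l_B; its value is not given in the text


def altav_accelerations(vel, angles, rates, forces, l_b=L_B):
    """Return (x_dd, y_dd, z_dd, theta_dd, gamma_dd, phi_dd) per Eq. (1)."""
    vx, vy, vz = vel
    _theta, gamma, phi = angles        # theta itself does not appear in Eq. (1)
    theta_d, gamma_d, phi_d = rates
    f1, f2, f3, f4 = forces
    f_sum = f1 + f2 + f3 + f4
    x_dd = (-C_LIN * vx + f_sum * math.sin(gamma)) / M
    y_dd = (-C_LIN * vy + f_sum * math.sin(phi)) / M
    z_dd = (F_G - F_B - C_LIN * vz
            - f_sum * math.cos(gamma) * math.cos(phi)) / M
    theta_dd = (L_ARM * (f1 - f2 + f3 - f4) * math.sin(RHO)
                - C_ROT * theta_d) / J_THETA
    gamma_dd = (L_ARM * (f1 - f3) - F_B * l_b * math.sin(gamma)
                - C_ROT * gamma_d) / J_GAMMA
    phi_dd = (L_ARM * (-f2 + f4) - F_B * l_b * math.sin(phi)
              - C_ROT * phi_d) / J_PHI
    return x_dd, y_dd, z_dd, theta_dd, gamma_dd, phi_dd


# Hover check: level attitude, at rest, four equal thrusts that balance
# gravity minus buoyancy. All accelerations should then vanish.
hover = (F_G - F_B) / 4.0
acc = altav_accelerations((0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                          (0.0, 0.0, 0.0), (hover,) * 4)
```

With the quoted values, each motor only has to supply about 0.71 N in hover, since buoyancy carries most of the weight.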
B. Guidance
Let $V = \{1, \cdots, N\}$ be the set of pursuers, and $T = \{1, \cdots, M\}$ be the set of targets. The
guidance law delivers the desired position of pursuer $i \in V$ along the three axes,
$[x_i^d \;\; y_i^d \;\; z_i^d]^T$, so as to achieve its guidance objective. This guidance objective is an
interception of the target located at $[x_j^e, \; y_j^e, \; z_j^e]^T$, where $j = l^\star_{-i} \in T$. The value
$l^\star_{-i}$ is the index of the target allocated to the pursuer $i \in V$. A pure pursuit guidance
law10 is adopted, i.e., the pursuer flies toward the current position of its allocated target:
\begin{equation}
[x_i^d(t_k) \;\; y_i^d(t_k) \;\; z_i^d(t_k)]^T = [x_j^e(t_k) \;\; y_j^e(t_k) \;\; z_j^e(t_k)]^T \tag{2}
\end{equation}
C. Measurements and State estimator
Each UAV gathers measurements on its x, y, z positions, and its θ, γ, and φ angles. All
measurements are subject to an additive zero-mean Gaussian noise. The position measurements
along the x and y axes are obtained from a GPS at intervals $\Delta_{GPS} = 1$ [s] with a noise
covariance $\sigma^2_{GPS} = 1$ [m$^2$]. The position measurement along the z axis is obtained from a sonic range
finder (SRF) at intervals $\Delta_{SRF} = 0.02$ [s] with a noise covariance $\sigma^2_{SRF} = 0.02$ [m$^2$]. The θ
measurements are obtained at intervals $\Delta_\theta = 0.02$ [s] with a noise covariance $\sigma^2_\theta = 2$ [degree$^2$].
The φ and γ measurements are obtained at intervals $\Delta_{tilt} = 0.01$ [s] with a noise covariance
$\sigma^2_{tilt} = 1$ [degree$^2$].
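The multirate character of these measurements can be illustrated with a small scheduler that works in integer ticks of the shortest interval (0.01 s). This is only a sketch: the sensor names, the helper functions `sensors_due` and `noisy`, and the assumption that all sensors are phase-aligned at tick 0 are illustrative choices, not details taken from the paper.

```python
import random

BASE_DT = 0.01  # shortest measurement interval [s] (tilt sensors)

# name: (period in base ticks, noise variance), values quoted in the text.
SENSORS = {
    "gps_xy": (100, 1.0),   # every 1 s,    variance 1 m^2
    "srf_z": (2, 0.02),     # every 0.02 s, variance 0.02 m^2
    "theta": (2, 2.0),      # every 0.02 s, variance 2 degree^2
    "tilt": (1, 1.0),       # every 0.01 s, variance 1 degree^2
}


def sensors_due(tick):
    """Names of the sensors delivering a sample at this base tick.

    ASSUMPTION: all sensors are phase-aligned at tick 0.
    """
    return [name for name, (period, _) in SENSORS.items()
            if tick % period == 0]


def noisy(true_value, variance, rng):
    """Add zero-mean Gaussian noise with the given variance."""
    return true_value + rng.gauss(0.0, variance ** 0.5)


rng = random.Random(0)  # fixed seed so that runs are repeatable
for tick in range(3):
    due = sensors_due(tick)
    samples = {name: noisy(0.0, SENSORS[name][1], rng) for name in due}
    print(f"t = {tick * BASE_DT:.2f} s: {sorted(samples)}")
```

Counting ticks as integers avoids the rounding problems that appear when testing floating-point times against sensor periods.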
The state estimator has two main purposes: to filter the noise in the position measure-
ments, and to time align the information. Time alignment of the information is achieved by
prediction of the position of the other pursuers from the instant of their last received report
to the current time instant.
The estimator selected by each pursuer $i \in V$ is a bank of $(N+1)$ Kalman estimators,
i.e., $N$ Kalman predictors that process the information shared by the $N$ pursuers, and
one Kalman filter that processes all the shared and unshared information about the local
pursuer $i \in V$. Each Kalman estimator in the bank delivers an estimated state vector,
$x \in \mathbb{R}^9$, describing the pursuer associated with it. This state vector is given by
\begin{equation}
x = [\, q_x \;\; \dot{q}_x \;\; \ddot{q}_x \;\; q_y \;\; \dot{q}_y \;\; \ddot{q}_y \;\; q_z \;\; \dot{q}_z \;\; \ddot{q}_z \,]^T \tag{3}
\end{equation}
where $q_x$, $q_y$, and $q_z$ are the estimated positions along the x, y, and z axes, respectively. The
estimation model is in the form of
\begin{align}
\dot{x}(t) &= A(t)x(t) + B(t)u(t) + w(t) \tag{4a}\\
y(t_k) &= H(t_k)x(t_k) + \nu(t_k) \tag{4b}
\end{align}
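with time-invariant block-diagonal matrices given by
\begin{align}
A &= \begin{bmatrix} A_q & 0_{3\times3} & 0_{3\times3} \\ 0_{3\times3} & A_q & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & A_q \end{bmatrix}, \quad
B = \begin{bmatrix} 0_{9\times1} \end{bmatrix} \tag{5a}\\
H &= \begin{bmatrix} H_q & 0_{1\times3} & 0_{1\times3} \\ 0_{1\times3} & H_q & 0_{1\times3} \\ 0_{1\times3} & 0_{1\times3} & H_q \end{bmatrix} \tag{5b}
\end{align}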
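where
\begin{equation}
A_q = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & -\alpha \\ 0 & 0 & 0 \end{bmatrix}, \quad
H_q = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \tag{6}
\end{equation}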
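and with noises $w(t) \sim N(0, Q_w)$ and $\nu \sim N(0, Q_\nu)$ given by
\begin{align}
Q_w &= \begin{bmatrix} Q_{w_q} & 0 & 0 \\ 0 & Q_{w_q} & 0 \\ 0 & 0 & Q_{w_q} \end{bmatrix}, \quad
Q_{w_q} = \begin{bmatrix} q^w_1 & 0 & 0 \\ 0 & q^w_2 & 0 \\ 0 & 0 & q^w_3 \end{bmatrix} \tag{7a}\\
Q_\nu &= \begin{bmatrix} \sigma^2_{GPS} & 0 & 0 \\ 0 & \sigma^2_{GPS} & 0 \\ 0 & 0 & \sigma^2_{SRF} \end{bmatrix} \tag{7b}
\end{align}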
In Eq. (5), the dynamics along the x, y, and z axes are uncoupled; this approximation
reduces the computational requirements. The uncoupled dynamics along each axis are in the
form of Eq. (6), where the last two rows of $A_q$, together with the last row of $Q_{w_q}$, form a Singer
shaping filter.11 The Singer shaping filter provides the estimator with the ability to cope
with unknown correlated exogenous inputs (i.e., it compensates for the unrealistic null $B$
matrix in Eq. (5) and the unknown $u$ in Eq. (4)). The main exogenous input is the control command
in § II.A; the value of that input cannot be employed by the estimator because the control
variables in Eq. (1) differ significantly from those in Eq. (5).
By trial and error, the process noise is set to a power spectral density, $Q_{w_q}$, given by
$q^w_1 = 0.1$ [m/s], $q^w_2 = 0.1$ [m/s$^2$], $q^w_3 = 0.1$ [m/s$^3$], and Singer's correlation coefficient is
set to $\alpha = 0.1$.
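The time alignment performed by the Kalman predictors amounts to propagating a reported state over the age of the report. The sketch below does this for a single axis, using the per-axis matrix of Eq. (6) exactly as printed. Because that matrix is nilpotent (its cube is zero), the transition matrix is a finite series. The process-noise discretization (adding the diagonal of the spectral density times the elapsed time) is a simplifying assumption of this sketch, and the function names are hypothetical.

```python
ALPHA = 0.1               # Singer correlation coefficient from the text
Q_DIAG = (0.1, 0.1, 0.1)  # q1, q2, q3 from the text


def transition(dt, alpha=ALPHA):
    """exp(A_q * dt) for A_q of Eq. (6); the series stops since A_q**3 = 0."""
    return [
        [1.0, dt, -alpha * dt * dt / 2.0],
        [0.0, 1.0, -alpha * dt],
        [0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def predict(x, p, dt):
    """Propagate one axis: state x (3 values), covariance p (3x3), over dt."""
    phi = transition(dt)
    x_new = [sum(phi[i][k] * x[k] for k in range(3)) for i in range(3)]
    p_new = matmul(matmul(phi, p), transpose(phi))
    for i in range(3):
        # ASSUMPTION: first-order discretization of the process noise.
        p_new[i][i] += Q_DIAG[i] * dt
    return x_new, p_new


# Time-align a report that is 0.5 s old: position 10 m, velocity 2 m/s.
x0 = [10.0, 2.0, 0.0]
p0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x1, p1 = predict(x0, p0, 0.5)
print(x1)  # predicted position is 11 m, velocity unchanged
```

The full nine-state predictor repeats this per-axis step three times, which is what the block-diagonal structure of Eq. (5) permits.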
D. Task allocation and Communication
The DTC algorithm is employed for the task allocation and for the decision to communicate,
see Ref. 7. This algorithm solves the task allocation problem twice: (i) by employing only the
shared information, and (ii) by employing all the information available to the local pursuer.
Whenever there is a discrepancy between (i) and (ii), the pursuer adopts the allocation
(i) to preserve cooperation across the team. The decision to communicate is based on the
discrepancies between the allocations from (i) and (ii).
The objective of the task allocation problem is to minimize a global cost, $J$. This global
cost is calculated as follows. Let $c_{ij}$ be the cost for pursuer $i \in V$ to intercept target $j \in T$.
The global cost is the accumulation of the costs $c_{ij}$:
\begin{equation}
J(t_k, L) = \sum_{i=1}^{N} c_{ij}(t_k), \quad ij \in L \tag{8}
\end{equation}
where $L$ is one of the admissible task allocations for the team. Without loss of generality,
the cost $c_{ij}$ is selected to be the square of the separation between the vehicles.
The minimization of the global cost in Eq. (8) involves solving a combinatorial optimization
problem. An exact solution is obtained by calculating the global cost for all the admissible
combinations and retaining the combination with the lowest cost.
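The exhaustive search over Eq. (8) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes as many targets as pursuers with exactly one target per pursuer (the admissible set is defined in Ref. 7), and the names `squared_distance` and `best_allocation` are hypothetical.

```python
from itertools import permutations


def squared_distance(a, b):
    """Cost c_ij: square of the separation between two positions."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_allocation(pursuers, targets):
    """Return (cost, allocation) minimizing Eq. (8) by exhaustive search.

    allocation[i] is the index of the target assigned to pursuer i.
    ASSUMPTION: equal numbers of pursuers and targets, one target each.
    """
    best = None
    for perm in permutations(range(len(targets))):
        cost = sum(squared_distance(p, targets[j])
                   for p, j in zip(pursuers, perm))
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best


pursuers = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
targets = [(50.0, 12.0, 0.0), (50.0, 1.0, 0.0)]
cost, allocation = best_allocation(pursuers, targets)
print(cost, allocation)  # 5005.0 (1, 0)
```

In this example the crossed assignment costs 5005 against 5225 for the direct one, so pursuer 0 takes target 1 and pursuer 1 takes target 0.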
A pursuer communicates when its unshared information is sufficient to modify the solution
of the task allocation problem. The information communicated by a pursuer is its
current state vector. The decision to communicate is obtained as follows. Let $l^\star_{-i}$ be the
target allocated to pursuer $i \in V$ based on the shared information, and let $l^\star_i$ be the allocated
target based on all the local information. The decision function, $g_i$, is given by
\begin{equation}
g_i(t_k) = l^\star_i - l^\star_{-i} \tag{9}
\end{equation}
and the decision to communicate is
\begin{equation}
g_i(t_k) = \begin{cases} 0 & \Longrightarrow \text{no communication} \\ \text{otherwise} & \Longrightarrow \text{communicate} \end{cases} \tag{10}
\end{equation}
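The two-solve logic behind Eqs. (9) and (10) can be sketched as follows. This is a simplified illustration rather than the DTC algorithm of Ref. 7: it assumes one target per pursuer, squared-distance costs, and that the only unshared information is the pursuer's own up-to-date position (the target positions are taken to be the same in both solves). The names `allocate` and `dtc_step` are hypothetical.

```python
from itertools import permutations


def allocate(pursuers, targets):
    """Exhaustive minimum of the summed squared distances, one target each."""
    def cost(perm):
        return sum(
            sum((a - b) ** 2 for a, b in zip(p, targets[j]))
            for p, j in zip(pursuers, perm)
        )
    return min(permutations(range(len(targets))), key=cost)


def dtc_step(i, shared_states, local_state, targets):
    """Return (adopted_target, communicate) for pursuer i.

    shared_states: positions of all pursuers as last shared with the team.
    local_state:   pursuer i's own current position (unshared information).
    """
    l_shared = allocate(shared_states, targets)[i]  # solve (i): shared only
    local_view = list(shared_states)
    local_view[i] = local_state
    l_local = allocate(local_view, targets)[i]      # solve (ii): all local
    g = l_local - l_shared                          # Eq. (9)
    # The shared allocation is adopted; Eq. (10) sets the communication flag.
    return l_shared, g != 0


shared = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
targets = [(50.0, 12.0, 0.0), (50.0, 1.0, 0.0)]
print(dtc_step(0, shared, (0.0, 1.0, 0.0), targets))   # (1, False)
print(dtc_step(0, shared, (0.0, 20.0, 0.0), targets))  # (1, True)
```

In the second call, pursuer 0 has drifted far enough that its local solution prefers target 0. It still flies toward target 1, which the team agrees on, but it broadcasts its state so that the shared allocation can be updated.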
III. DTC Simulations: Testbed and Results
This section describes the testbed used to simulate, in real-time (RT), the DTC algorithm
and presents the simulation results obtained. The simulations rely upon nonlinear 6-DOF
models of uninhabited combat air vehicles (UCAVs), and hardware synchronization of the
various computing tasks. The multirate tasks include communications, decision, control,
filtering and dynamics. The scenarios are as follows. The mission of a small team of UCAVs
is to strike an equal number of evader aerial vehicles. The pursuer-evader allocation
problem revolves around a decentralized coordination of the team of UCAVs using locally
available information and evolving scenes while constraining the number of communication
events. Once a UCAV is within a prescribed distance of its assigned target, the
ground operator can order the firing of UCAV munitions on the target. Thus, the goal of
the DTC algorithm is to guide the UCAVs to within a neighborhood of the detected targets.
The objective of the simulations is to show that this can be done in RT for a set of realistic
scenarios.
A. Testbed
The multiprocessor testbed is shown in Figure 2. The use of multiple RT processing targets
permits a small step size despite large computing tasks, enables reconfigurable
hardware-in-the-loop, and immerses the user in RT operations. In the figure, the host PC, which runs
the Windows operating system (OS), serves for offline DTC design and analysis, and for sending
user commands to the RT targets through a TCP/IP link. The RT processing environment
comprises four central processing units, or targets, running RedHawk Linux OS,12 sharing
information through shared memory and FireWire (IEEE 1394 OHCI). Designer-specified
data are communicated online to the Windows-based viewer PC equipped with a renderer
software to display the engagement. In the simulations considered in this project, the
X-Plane13 renderer was used. The computing tasks running on the four RT targets are obtained
through a rapid control prototyping process enabled by the use of Matlab/Simulink,14
Real-Time Workshop15 and RT-Lab.16 In short, having a Simulink model available, the user
separates the various components of his/her model into subsystems. Then, the designer
decides on which target each subsystem will run. After making sure the subsystems are
RT-compliant, the user initiates the process of automatic code generation, compilation
on the RT targets, and uploading to the assigned RT targets. After this sequence is done,
the user can start the simulations and acquire data, as needed.
Figure 2. Multiprocessor testbed. [Diagram: a host PC, four RT targets (real-time processing), and a viewer PC connected through a router by TCP/IP links; the RT targets are linked by shared memory (SM) and FireWire (FW).]
To reduce computing times via distributed processing, the user must make sure the
pursuer-engagement model is relatively well balanced among the nodes and that the depen-
dency among subsystems, within a single time step, is reduced.
B. Simulation Results
The results obtained with a 5-pursuer, 5-evader open-air engagement scenario are shown
in Figures 3 to 5. In these simulations, artificial delays are introduced in the exchange of
information among the RT targets to mimic the effect of communications delays found with
actual aerial systems.
Figure 3. Vehicle trajectories projected in the horizontal plane during the engagement. Pursuers (solid lines). Evaders (dashed lines).
The trajectories obtained in the horizontal plane are shown in Figure 3. The engagement
is face-to-face: the pursuers, denoted P1, · · · , P5, start on the left-hand side, and the
evaders start on the right-hand side. Initially, the pursuers
P1 and P2 are in the same neighborhood, while the other pursuers are more separated from
each other.
Snapshots of the engagement taken at four different time instants are presented in Fig-
ure 4. The pursuers (P) are shown to approach the evaders (E), as taken from a ground
observation point at t0 + 10, t0 + 20 and t0 + 30 seconds. Forty seconds later, each P lies in
proximity of its assigned E.
The communication events between the pursuers during the first 60 [s] of the engagement
are shown in Figure 5. The communications are intermittent and asynchronous. The decision
to communicate (see § II.D) is triggered by the uncertainties in the available information and
by the geometry of the engagement. Consider the pursuers P1 and P2 that are in proximity
of each other initially (see Fig. 3). This proximity makes their decentralized task allocation
more difficult; these pursuers therefore communicate at about t = t0 + 10 [s] to ensure
efficient cooperation.
Figure 4. Engagement snapshots. The pursuers (P) arrive from the left-hand side, while the evaders (E) arrive from the right-hand side. [Panels: view from ground and chase view, labeled t0 + 30 sec and t0 + 70 sec.]
Figure 5. Communications events during the first 60 [s] of the engagement. From upper to lower panels, the communication history of each pursuer is displayed.
The average computing times obtained by the RT implementations of the proposed DTC
algorithm are displayed in Table 1 for a selected simulation time frame. The target computer
is a dual-CPU Pentium 4 with a 2 GHz clock speed. The shortest execution period for the
multi-rate model is 0.01 second. Shared memory communications are used in the exchange of
data between the two CPUs. The results show that the computing times are a small fraction
of the execution period, so that the processors are idle for most of each period. This
demonstrates that the real-time implementation of
the DTC algorithm is feasible despite the nonlinear UAV dynamics and the sophistication
of the involved mathematics.
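As a rough check of this margin, and assuming that the averages in Table 1 are to be compared with the shortest execution period of 0.01 second (10 000 microseconds), the fraction of the period spent computing is approximately:

```latex
\begin{align*}
\text{5vs5/1CPU:}  \quad & 160 / 10\,000 = 1.6\,\% \\
\text{5vs5/2CPUs:} \quad & 100 / 10\,000 = 1.0\,\% \\
\text{2vs2/1CPU:}  \quad & 70 / 10\,000 = 0.7\,\%
\end{align*}
```

Under that assumption, more than 98 % of each period remains available in all three configurations.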
IV. Conclusion
Table 1. Computing times of the DTC algorithm in a time frame.
                                5vs5/1CPU   5vs5/2CPUs   2vs2/1CPU
Computing times (microseconds)  160         100          70
The DTC algorithm was demonstrated in a detailed real-time simulation of UAVs. Real-time
applicability was demonstrated for small teams composed of up to five UAVs. Based on
the results provided in this paper, it is envisaged that application of the DTC algorithm
to swarms of hundreds of vehicles could be readily simulated with the testbed shown in
Figure 2. Applicability of the DTC algorithm to such large teams will be enabled by combining
the novelties introduced in the DTC algorithm with numerically efficient techniques to solve
the combinatorial problem encountered in solving the task allocation problem.
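The need for numerically efficient techniques can be seen from the size of the search space. Assuming one target per pursuer and equal numbers of each (an assumption of this sketch; the admissible set is defined in Ref. 7), an exhaustive search must evaluate N! allocations. The helper name `admissible_allocations` is hypothetical.

```python
from math import factorial


def admissible_allocations(n):
    """Number of one-to-one allocations of n targets to n pursuers."""
    return factorial(n)


for n in (2, 5, 10, 20):
    print(n, admissible_allocations(n))
# 2 2
# 5 120
# 10 3628800
# 20 2432902008176640000
```

A 5-versus-5 engagement needs only 120 cost evaluations per solve, whereas the count grows factorially with team size.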
References
1Chandler, P. R. and S., R., “UAV Cooperative Path Planning,” Proceedings of the AIAA Guidance,
Navigation, and Control Conference, August 2000, Paper AIAA-2000-4370.
2Mataric, M. J., Sukhatme, G. S., and Østergaard, E. H., “Multi-Robot Task Allocation in Uncertain
Environments,” Autonomous Robots, Vol. 14, 2003, pp. 255–263.
3Ren, W., Beard, R. W., and Kingston, D. B., “Multi-Agent Kalman Consensus with Relative Uncer-
tainty,” Proceedings of the American Control Conference, Portland, Oregon, June 2005, pp. 1865–1870.
4Mitchell, J. W. and Sparks, A. G., “Communication Issues in the Cooperative Control of Unmanned
Aerial Vehicles,” Proceedings of the 41st Annual Allerton Conference on Communication, Control, and
Computing , 2003.
5Shima, T., Rasmussen, S. J., and Chandler, P., “UAV Team Decision and Control Using Efficient
Collaborative Estimation,” Proceedings of the American Control Conference, Portland, Oregon, June 2005,
pp. 4107–4112.
6Alighanbari, M. and How, J. P., “Decentralized Task Assignment for Unmanned Aerial Vehicles,”
Proceedings of the IEEE Conference on Decision and Control , Seville, Spain, December 2005, pp. 5668–
5673.
7Dionne, D. and Rabbath, C. A., “Multi-UAV Decentralized Task Allocation With Intermittent
Communications: the DTC Algorithm,” Proceedings of the IEEE American Control Conference, New York, July
2007, to appear.
8Earon, E., “Almost-Lighter-Than-Air Vehicle Fleet Simulation,” Tech. Rep. V. 0.9, Quanser inc.,
Toronto, Canada, 2005.
9Lechevin, N., Rabbath, C. A., and Earon, E., “Towards Decentralized Fault Detection in UAV For-
mations,” Proceedings of the American Control Conference, New York, July 2007, paper FrC07.2.
10Shneydor, N. A., Missile Guidance and Pursuit - Kinematics, Dynamics, and Control , Engineering
Science, Horwood Publishing, England, 1998.
11Singer, R. A., “Estimating Optimal Tracking Filter Performance for Manned Maneuvering Targets,”
IEEE Transactions on Aerospace and Electronic Systems, Vol. 5, 1970, pp. 473–483.
12http://www.ccur.com/isd solutions redhawklinux.asp.
13http://www.x-plane.com.
14http://www.mathworks.com/.
15http://www.mathworks.com/access/helpdesk/help/toolbox/rtw/.
16www.opal-rt.com.