Distribution A: Approved for Public Release, Distribution Unlimited
ACE TA1 Proposer’s Day: Agenda
0700 - 0800 Log-In and Set-up: DARPA Conference Center / Zoom Video Conference for Attendees
0800 - 0810 Welcome and Rules of Engagement: Mr. David Ott, ACE Program SETA
0810 - 0830 Welcome and Program Introduction: Dr. Timothy Grayson – STO Office Director; Dr. Dan “Animal” Javorsek, Lt Col, USAF – Program Manager, DARPA/STO
0830 - 0850 Contracts Management Office (CMO) Briefing: Ms. Catherine Stevens – Contracting Officer, Director of Prototypes Division
0850 - 0935 Future of Air Combat: CDR Kevin “Shaka” Chlan, USN, Federal Executive Fellow, CSBA
0935 - 1020 History and Future of Symbiosis: Dr. Dan Patt
1020 - 1030 Break
1030 - 1115 ACE Program Overview and TA1 BAA: Dr. Dan “Animal” Javorsek, Lt Col, USAF – Program Manager, DARPA/STO
1115 - 1200 Experimentation Integration Team (EIT) Briefing: Mr. Chris “Disco” Demay & Mr. Matt Rich
1200 - 1215 Break for Reset
1215 - 1700 Proposer Lightning Talks: 3 minutes each (1 slide due March 19)
1 on 1 Meetings with Dr. Javorsek: 10 minutes each (First 15 requested)
Air Combat Evolution (ACE) TA1: Build Combat Autonomy
Dan “Animal” Javorsek, PhD
Program Manager, DARPA/STO
Demonstrate trusted, scalable, human-level autonomy for air combat
Briefing prepared for ACE TA1 Proposers Day
26 March 2020
Future U.S. Combat Success Requires AI Capable UAVs
Can we use existing methods designed for humans to mature autonomy?
“In the future, it is desirable to have each
operator control multiple unmanned
systems, thus shifting the human’s role from
operator towards mission manager.”
Unmanned Systems Roadmap, 2018
Build performance and trust the way we do with humans
[Figure: tasks arranged by Problem Complexity (lower to higher) and Cognitive Workload (lower to higher), spanning Physics-Based Maneuver Systems and Nonlinear Interactive Systems. Tasks shown: Autopilot, Terrain Avoidance, Navigation, Traffic Avoidance, Dogfight, Point Protection, Suppression of Enemy Air Defenses, Striker Escort, Mosaic Warfare.]
Dogfight is gateway to nonlinear combat autonomy
Combat autonomy is stuck here!
Motivation: Human pilots & battle managers will need to rely on behaviors of autonomous systems to increase ratio of humans to unmanned assets
Air Combat Evolution (ACE): Demonstrating trusted, scalable, human-level autonomy for air combat
local
global
will push combat autonomy up the stack
Maneuver
Individual Tactical Behaviors (1v1)
Team Tactical Behaviors (2v1, 2v2)
Multi-aircraft Operational Behaviors
Heterogeneous Multi-aircraft Strategic Behaviors
current automation lives here
Current teams comprised of: all humans or many humans operating a single unmanned asset
Future force structure vision requires single human managing many unmanned assets
Goal: Demonstrate trusted, scalable, human-level autonomy for air combat
• will increase performance and trust of local combat autonomy
• Mature algorithms in simulation, sub-scale, and full-scale combat representative aircraft
• Increase complexity (1v1, 2v1, 2v2) within each stage
• will scale trusted algorithms developed for tactical maneuvers and engagements to complex campaign levels
ACE program technical challenges
ACE will build scalable performance and trust in combat autonomy
TA1: Need performance from automated tactical decision making
TA2: Must build pilot trust in combat automation
TA3: Scale performance and maintain trust up the stack
TA4: Demonstrate performance on increasingly realistic platforms
ACE program structure
TA1: Need performance from automated tactical decision making
TA2: Must build pilot trust in combat automation
TA3: Scale performance and maintain trust up the stack
TA4: Demonstrate performance on increasingly realistic platforms
TA1: Build combat autonomy for local behaviors
• Challenge: Dogfight represents a new class of games
- Continuous, unbounded, incomplete knowledge
- Adversary can actively conceal/deceive
- High-tempo with simultaneous players
• Insights:
- Contrary to popular belief, the dogfight is manifold bounded
- Established tech base actively addressing these challenges
- Hybridized AI approaches blended with rules-based tree search show strong promise
• Technical Area 1 (TA1) Objectives:
- Develop and demonstrate within visual range (WVR) individual and team control algorithms
- Implementation in M&S, sub-scale unmanned aerial vehicles (UAVs), and full-scale combat representative aircraft
- Success metric: Win Probability (PW)
[Figure: Win Probability (PW) plotted against Consequence-Normalized Crosscheck Ratio (RN). Phase 1: M&S (Sim); Phase 2: Sub-scale (Commercial UAVs); Phase 3: Full-scale (Combat Aircraft). Scenarios: 1v1, 2v1, 2v2. Labels: performance, trust, 50/50.]
TA1 increases performance of dogfight automation in increasingly realistic scenarios
[Figure: AI benchmark games compared along three dimensions]
Game Complexity (state-space complexity): Tic-tac-toe 10^3, Checkers 10^20, Chess 10^47, Go 10^170, Starcraft 10^270
Information Observability (perfect vs imperfect info): Board Games, Atari, Starcraft, Poker
Tempo (actions per minute): Sequential Games, Driving or Atari, Starcraft (axis tick values 0.5, 60, 300)
State of the art AI exceeds requirement for ACE in many dimensions
Sim graphic source: Ernest et al., J Def Manag 2016
Trials advancing combat autonomy through competition
[Figure: 1v1. Need better autonomy to go from this (motivation) to this (goal)]
6 months from start to finish
Kick-off: 30 Sept 2019
Trial 1 @ JHU/APL: 19 Nov 2019
Trial 2 @ JHU/APL: 28 Jan 2020
Trial 3 @ AFWERX Las Vegas: 18 Aug
Stages: preseason, regular season, playoffs
within visual range air to air combat (aka dogfighting) in M&S
8 performer teams developing & competing combat autonomy against: gov’t provided AI, each other, human pilots
will use the dogfight as the challenge problem to increase performance and trust in AI algorithms
using F-16 models
[Figure: game AI by year (1997, 2016, 2019, 2020+)]
Cross TA interactions, all TAs
Cross TA interactions featuring TA2
Cross TA interactions featuring TA3
Characterization of combat autonomy provided by TA1
Cross TA interactions featuring TA4
Metrics: Build trust in AI the same way we do with pilots
TA1: Increase performance for local behaviors
Phase 1 (M&S): Win Probability (PW): Limited Th: 50% Ob: 100%
For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
Phase 2 (Subscale): Win Probability (PW): Limited Th: 75% Ob: 100%
For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
Phase 3 (Full-scale): Win Probability (PW): Limited 1v1 Th: 75%; Limited 2v1 Th: 90%; Limited 2v2 Th: 80%; Ob: 100%
Th: Threshold
Ob: Objective
O/D/HA: Offensive/Defensive/High Aspect Initial Conditions
Unopposed: Pre-planned maneuvers
Unaware: Station keeping on unaware adversary
Limited: Baseline adversary with standard gameplan, limited maneuver potential, and thrust
Unlimited: Adversary with no gameplan, maneuver potential, or thrust restrictions
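As an illustration of how the threshold (Th) and objective (Ob) values above might be applied, the following minimal sketch estimates PW as the fraction of engagements won and compares it with a phase criterion. The slides do not say how PW is computed or how many engagements are flown, so `PhaseCriterion`, `win_probability`, `assess`, and the outcome data are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PhaseCriterion:
    name: str
    threshold: float  # Th: minimum acceptable win probability
    objective: float  # Ob: desired win probability


def win_probability(outcomes):
    """Fraction of engagements won; outcomes is a sequence of booleans."""
    if not outcomes:
        raise ValueError("need at least one engagement")
    return sum(outcomes) / len(outcomes)


def assess(outcomes, criterion):
    """Return (PW, status) for a set of engagement outcomes."""
    pw = win_probability(outcomes)
    if pw >= criterion.objective:
        status = "objective met"
    elif pw >= criterion.threshold:
        status = "threshold met"
    else:
        status = "below threshold"
    return pw, status


# Phase 2 values from the metrics slide: Th 75%, Ob 100%.
phase2 = PhaseCriterion("Phase 2, Limited", threshold=0.75, objective=1.00)

# Made-up outcomes (assumption): 8 wins and 2 losses.
example_outcomes = [True] * 8 + [False] * 2
pw, status = assess(example_outcomes, phase2)
print(f"{phase2.name}: PW = {pw:.2f} ({status})")
```

A small sample like this gives only a rough estimate of PW; a real evaluation would presumably need enough engagements to make the comparison with Th meaningful.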
ACE Program Schedule
Source selection and evaluation criteria
• Overall Scientific and Technical Merit
- Standard DARPA BAA language
- Develop air combat maneuvering algorithms for individual and team tactical behaviors
- Integrate algorithms with M&S environment, sub-scale live aircraft, and full-scale live aircraft
- Implement a strategy for interaction between the combat autonomy and the human pilot
- Characterize contribution of tactical behavior algorithms to battle management tasks and provide interface to TA3 algorithms
• Potential Contribution and Relevance to the DARPA Mission
- Standard DARPA BAA language
• Cost and Schedule Realism
- Standard DARPA BAA language
• Teaming encouraged
• Highlight previous experience
• Schedule
- BAA released – 06 March 2020
- Proposers Day – 26 March 2020
- Optional 1-on-1s – 26 March 2020
- FAQ/Questions Due Date – 31 March 2020
- Email: [email protected]
- Full Proposals Due – 30 April 2020
Submission Highlights
AlphaZero: Hybrid AI Approach Dominates Board Games
DARPA Swarm Autonomy Study, 2018
• Google DeepMind fused Deep Learning (DL) with Monte Carlo tree search to create a board-game program (AlphaZero [1]) that could beat the best human and machine players, learning through self-play given only the rules of the game
- Searched 10,000 fewer board positions per move than the best traditional AI heuristic search program
• Combination of DL and heuristic search:
- During match play, Monte Carlo tree search explores a subset of the space of possible moves by playing ahead for several moves against itself and then picking the next move with the best combined value of its expected outcome at the end of the Monte Carlo search
- Expected value of board positions provided by a deep convolutional neural network (CNN) trained to evaluate states (board positions)
- Weights of CNN initially random, then trained by updating the network’s weights based on the outcome of thousands of games of self-play using reinforcement learning
• Subjectively described by former world chess champion Garry Kasparov as having much more human-like play, more intuitive and innovative than traditional AI chess programs
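The combination described above, a tree search that asks an evaluator for the value of leaf positions instead of playing games out, can be sketched on a toy game. This is a minimal sketch, not AlphaZero itself: it uses plain UCB1 selection rather than AlphaZero's policy-guided rule, a hand-written `value_fn` stands in for the trained CNN, and the stone-taking game, `Node`, `simulate`, and `best_move` are hypothetical names chosen for illustration.

```python
import math

MAX_TAKE = 3  # toy game: remove 1-3 stones; whoever takes the last stone wins


def legal_moves(stones):
    return [m for m in range(1, MAX_TAKE + 1) if m <= stones]


def value_fn(stones):
    """Stand-in for a trained value network (hand-written, hypothetical).
    Estimated outcome in [-1, 1] for the player about to move."""
    return -1.0 if stones % (MAX_TAKE + 1) == 0 else 1.0


class Node:
    def __init__(self, stones):
        self.stones = stones
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0  # from the perspective of the player to move here


def select_child(node, c=1.4):
    """UCB1: try unvisited children first, then maximize value plus exploration bonus."""
    best, best_score = None, -math.inf
    for child in node.children.values():
        if child.visits == 0:
            return child
        # The child's value is from the opponent's perspective, so negate it.
        exploit = -child.value_sum / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        if exploit + explore > best_score:
            best, best_score = child, exploit + explore
    return best


def simulate(root):
    """One search iteration: descend, expand and evaluate a leaf, back up the value."""
    path, node = [root], root
    while node.children:
        node = select_child(node)
        path.append(node)
    if node.stones == 0:
        value = -1.0  # the player to move has nothing to take: they already lost
    else:
        for move in legal_moves(node.stones):
            node.children[move] = Node(node.stones - move)
        value = value_fn(node.stones)  # evaluator replaces a random playout
    for visited in reversed(path):
        visited.visits += 1
        visited.value_sum += value
        value = -value  # flip perspective at each ply


def best_move(stones, simulations=200):
    root = Node(stones)
    for _ in range(simulations):
        simulate(root)
    # Pick the most-visited move, as is common in Monte Carlo tree search.
    return max(root.children, key=lambda m: root.children[m].visits)


print("move chosen from 10 stones:", best_move(10))
```

In a learned system the evaluator would be trained from self-play outcomes rather than written by hand; here it is fixed so the search loop can be read on its own.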
AlphaZero training instances needed to exceed performance of traditional AI champion. Performance (Elo units) versus 1000s of training trials
1. Silver et al., Science 362, 1140–1144 (2018) 7 December 2018
OpenAI Five Achieves Top Human Level in DOTA 2 (almost)
• OpenAI developed OpenAI Five to achieve expert-human level play in the video game Defense of the Ancients 2 (DOTA 2) using only CNN with reinforcement learning from self-play [1,2]
• DOTA 2: similar to Quake III and StarCraft 2; closer to air combat than board games:
- First-person shooter, multiplayer (5 vs 5) with competing and cooperating agents, simultaneous play, no discrete states, uses only pixels and awarded game points
- OpenAI Five introduced restrictions/constraints on DOTA making it easier to automate; constraints are being removed over time
OpenAI Five plays at 99.95% human level, but failed to beat the top human professionals in 5v5 play
1. https://openai.com/five/
2. https://www.youtube.com/watch?v=dltN4MxV1RI
3. Graves et al, Nature Vol. 538, p. 471, 27 Oct 2016
Screenshot of Dota 2, a fantasy arena battle game where two teams of five heroes fight to destroy one another’s base. Gameplay is complex, and matches typically last more than 30 minutes.
- Currently, OpenAI Five has defeated former pro human players rated in the 99.95th percentile but lost to the world champion human team, with some game restrictions, in August 2018
CNN trained with self-play using deep reinforcement learning similar to DeepMind’s CTF, but 1-level of reinforcement, not hierarchical
- CNN augmented with memory, shown to be successful for sequential action problems [3]
- “Divining the exact thought process behind the bots’ actions is impossible. What we can say is that they excelled in close quarters but found it trickier to match humans’ long-term strategies.”
• Google DeepMind developed learning environment (StarCraft II Learning Environment: SC2LE [1])
- StarCraft II, a popular real-time strategy game, goes beyond board games toward simulated combat:
- Multi-agent, continuous play (not discrete states), imperfect information (map only partially observable via local camera moved to collect information), vast and diverse action space, and contests consist of 1000s of actions
- SC2LE provides a suite of tools including StarCraft2 game engine, training data sets (completed games from human players), built-in automated opponent (AlphaStar), learning engine and associated learning tools
- In difficulty and sophistication, SC2LE is a significant advance beyond Arcade Learning Environment (ALE) for Atari
• DeepMind’s AlphaStar agents use deep reinforcement learning with and without memory
- Learning environment and baseline informed by AlphaGo and AlphaZero
- Deep convolution networks used to create both value (estimating probability of outcomes from a given state) and policy (selecting next actions to take)
- Variants add memories to CNN to cope with complexities of StarCraft II
• DeepMind hosts competitions between competitors who submit their own intelligent agents
- DeepMind invites other groups to develop their own approaches
- 2019: AlphaStar at professional-human level, with some caveats [2]
DeepMind Creates AI Learning Environment for StarCraft II
1. Vinyals et al., https://arxiv.org/abs/1708.04782, 16 Aug 2017
2. https://www.theverge.com/2019/1/24/18196135/google-deepmind-ai-starcraft-2-victory
StarCraft II, real-time strategy game, goes beyond board games toward simulated real-time combat
Google DeepMind AI improving at StarCraft II
DeepMind trained agents to beat Starcraft pros in a matter of weeks
Deep neural network trained using imitation and reinforcement learning
1. original agent trained using imitation learning on replays of community games
2. original agent forked to create modified agents
3. agents played against each other
4. based on outcome of engagements use reinforcement learning to update neural network, fork new agents
5. agents played against each other
6. based on outcome of engagements use reinforcement learning to update neural network, fork new agents
7. play all agents against each other
8. identify “least exploitable” agents out of the league
• Takes about 2 weeks to run the entire AlphaStar League
• Each agent trained with 16 TPUv3s (tensor processing units) equivalent to 50 GPUs each
• Agents play about 200 years of SC2
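The final league step, picking the “least exploitable” agents after everyone has played everyone, can be illustrated with a small sketch. The slide does not define how exploitability is measured, so this assumes one simple reading: an agent's exploitability is judged by its worst win rate against any other league member, and the least exploitable agent is the one whose worst case is best. The agent names, the win-rate numbers, and `least_exploitable` are made up for illustration.

```python
def least_exploitable(win_rate):
    """win_rate[a][b] is the fraction of games agent a won against agent b.
    Returns the agent with the highest worst-case win rate, plus all worst cases."""
    worst_case = {
        agent: min(rate for opponent, rate in row.items() if opponent != agent)
        for agent, row in win_rate.items()
    }
    best = max(worst_case, key=worst_case.get)
    return best, worst_case


# Made-up round-robin results for three hypothetical agents.
league_results = {
    "A": {"B": 0.9, "C": 0.3},  # A crushes B but has a counter in C
    "B": {"A": 0.1, "C": 0.6},
    "C": {"A": 0.7, "B": 0.4},  # C has no dominant win but no severe weakness
}

best, worst_case = least_exploitable(league_results)
print("worst-case win rates:", worst_case)
print("least exploitable agent:", best)
```

Under this criterion, an agent with one lopsided matchup can rank below an agent that is merely solid against everyone, which is one way to read the emphasis on exploitability rather than average win rate.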
[Figure: AlphaStar League snapshots at T0+3, T0+7, T0+9, T0+11, and T0+12 days. © Google DeepMind]
Google DeepMind Achieves Human-Level Play for Quake III
• Google DeepMind achieved expert-human level play on Quake III Arena Capture the Flag (CTF) using AlphaZero-style reinforcement learning with deep CNN [1,2]
- Quake III Arena Capture the Flag (CTF) closer to air combat than board games:
- First-person shooter, multiplayer with competing and cooperating agents, simultaneous play, no discrete states, uses only pixels and awarded game points to learn and play
• Deep reinforcement learning with self play trains CNN on two tiers
- Intelligent agents play in teams with and against each other on randomly generated environments
- Each agent learns its own internal reward signal, complementing the sparse delayed reward from winning
- Selects actions using novel temporally hierarchical representation enabling reasoning at multiple timescales
1. Jaderberg et al, arXiv:1807.01281v1 [cs.LG], 3 Jul 2018
2. https://www.youtube.com/watch?v=dltN4MxV1RI
- Displays humanlike behaviors (e.g. navigating, following, defending) using encoded high-level game knowledge
• Trained agents exceeded the win rate of strong human players and were far stronger than state-of-the-art agents
Thousands of games provide data set for training in two environments
Quake III, real-time capture the flag game, is closer to simulated air combat scenarios
• University team combined heuristic search with Deep Learning (similar to AlphaZero)
- First program to successfully employ heuristic search for incomplete information game
- Before each action, algorithm considers possible actions and uses limited search of possible future states to a certain depth of search
DeepStack Beats Professional Texas Hold ‘em Poker Players
1. http://science.sciencemag.org/ on January 24, 2019
2. https://www.quora.com/What-is-an-intuitive-explanation-of-counterfactual-regret-minimization
Hybrid AI approach defeats professional poker players in game of incomplete information
- Abstract representation of actions reduces size of possible actions for computational tractability
- At end of search: an evaluation function estimates value of possible poker hands and hence possible actions
- Evaluation function trained using DL and random poker situations using recursive self-play through fixed depth of play in a Monte Carlo Markov decision process
• Strives to approximate Nash equilibrium: tie is worst-case situation against any strategy and a win occurs against any less-than-optimal strategy (e.g. mistakes by a human player)
DeepStack performs limited-depth heuristic search using an evaluation function trained with DL
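The idea of approximating a Nash equilibrium through self-play can be illustrated with regret matching, the update rule underlying counterfactual regret minimization (the topic of reference 2 above). This is a minimal sketch on rock-paper-scissors, not DeepStack's algorithm: there is no search, no neural network, and no poker. The payoff table, the biased starting regrets, and the function names are assumptions made for illustration.

```python
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b]: payoff to a player choosing a against an opponent choosing b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regret(regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret: play uniformly


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(ACTIONS))
        for a in range(ACTIONS)
    ]


def self_play(iterations=10000):
    # Player 0 starts biased toward rock (arbitrary choice) so the run is not trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        # Both players choose their strategies before either one updates.
        strategies = [strategy_from_regret(r) for r in regrets]
        for player in range(2):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(ACTIONS):
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The time-averaged strategy is the one that approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = self_play()
for player, strategy in enumerate(average_strategies):
    print(f"player {player} average strategy:", [round(p, 3) for p in strategy])
```

The averaged strategies should drift toward the uniform mix, which is the equilibrium of rock-paper-scissors: it cannot lose in expectation against any opponent, and a player following it only profits when the opponent's own play is exploited by chance of matchup rather than by design. That mirrors the "tie is worst case" property described in the bullet above.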