
Distribution A: Approved for Public Release, Distribution Unlimited

ACE TA1 Proposer’s Day: Agenda


0700 – 0800 Log-In and Set-up: DARPA Conference Center / Zoom Video Conference for Attendees

0800 – 0810 Welcome and Rules of Engagement: Mr. David Ott, ACE Program SETA

0810 – 0830 Welcome and Program Introduction: Dr. Timothy Grayson, STO Office Director; Dr. Dan “Animal” Javorsek, Lt Col, USAF, Program Manager, DARPA/STO

0830 – 0850 Contracts Management Office (CMO) Briefing: Ms. Catherine Stevens, Contracting Officer, Director of Prototypes Division

0850 – 0935 Future of Air Combat: CDR Kevin “Shaka” Chlan, USN, Federal Executive Fellow, CSBA

0935 – 1020 History and Future of Symbiosis: Dr. Dan Patt

1020 – 1030 Break

1030 – 1115 ACE Program Overview and TA1 BAA: Dr. Dan “Animal” Javorsek, Lt Col, USAF, Program Manager, DARPA/STO

1115 – 1200 Experimentation Integration Team (EIT) Briefing: Mr. Chris “Disco” Demay and Mr. Matt Rich

1200 – 1215 Break for Reset

1215 – 1700 Proposer Lightning Talks: 3 minutes each (1 slide due March 19)

1-on-1 Meetings with Dr. Javorsek: 10 minutes each (first 15 requested)


Air Combat Evolution (ACE) TA1: Build Combat Autonomy

Dan “Animal” Javorsek, PhD

Program Manager, DARPA/STO

Demonstrate trusted, scalable, human-level autonomy for air combat

Briefing prepared for ACE TA1 Proposer’s Day

26 March 2020


Future U.S. Combat Success Requires AI-Capable UAVs

Can we use existing methods designed for humans to mature autonomy?


“In the future, it is desirable to have each operator control multiple unmanned systems, thus shifting the human’s role from operator towards mission manager.”

Unmanned Systems Roadmap, 2018


Build performance and trust the way we do with humans

Figure: problem complexity (lower to higher) versus cognitive workload (lower to higher), spanning physics-based maneuver systems (e.g., navigation, terrain avoidance, autopilot, traffic avoidance) through nonlinear interactive systems (e.g., dogfight, point protection, striker escort, suppression of enemy air defenses, Mosaic Warfare); callout: “Combat autonomy is stuck here!”

Dogfight is gateway to nonlinear combat autonomy


Motivation: Human pilots and battle managers will need to rely on the behaviors of autonomous systems to increase the ratio of unmanned assets to humans

Air Combat Evolution (ACE): Demonstrating trusted, scalable, human-level autonomy for air combat

Figure: autonomy stack ordered from local to global: Maneuver; Individual Tactical Behaviors (1v1); Team Tactical Behaviors (2v1, 2v2); Multi-aircraft Operational Behaviors; Heterogeneous Multi-aircraft Strategic Behaviors. Callouts: “current automation lives here” and “will push combat autonomy up the stack.”

Current teams are composed of all humans, or of many humans operating a single unmanned asset.

Future force structure vision requires a single human managing many unmanned assets.

Goal: Demonstrate trusted, scalable, human-level autonomy for air combat

• will increase performance and trust of local combat autonomy

• Mature algorithms in simulation, sub-scale, and full-scale combat representative aircraft

• Increase complexity (1v1, 2v1, 2v2) within each stage

• will scale trusted algorithms developed for tactical maneuvers and engagements to complex campaign levels



ACE program technical challenges

ACE will build scalable performance and trust in combat autonomy:

• TA1: Need performance from automated tactical decision making
• TA2: Must build pilot trust in combat automation
• TA3: Scale performance and maintain trust up the stack
• TA4: Demonstrate performance on increasingly realistic platforms


ACE program structure


TA1: Build combat autonomy for local behaviors

• Challenge: The dogfight represents a new class of games
- Continuous, unbounded, incomplete knowledge
- Adversary can actively conceal/deceive
- High-tempo with simultaneous players

• Insights:
- Contrary to popular belief, the dogfight is manifold-bounded
- Established tech base is actively addressing these challenges
- Hybridized AI approaches blended with rules-based tree search show strong promise

• Technical Area 1 (TA1) Objectives:
- Develop and demonstrate within visual range (WVR) individual and team control algorithms
- Implementation in M&S, sub-scale unmanned aerial vehicles (UAVs), and full-scale combat representative aircraft
- Success metric: Win Probability (PW)

Figure: Win Probability (PW) and Consequence-Normalized Crosscheck Ratio (RN), labeled performance and trust, for 1v1, 2v1, and 2v2 engagements across Phase 1 (M&S, simulated aircraft), Phase 2 (sub-scale, commercial UAVs), and Phase 3 (full-scale combat aircraft), with a 50/50 reference level.

TA1 increases performance of dogfight automation in increasingly realistic scenarios

State-of-the-art AI exceeds the requirements for ACE in many dimensions

Figure: three dimensions of game difficulty:
- Game complexity (state-space complexity): tic-tac-toe ~10^3, checkers ~10^20, chess ~10^47, Go ~10^170, StarCraft ~10^270
- Information observability (perfect vs. imperfect information): board games, Atari, StarCraft, poker
- Tempo (actions per minute): sequential games, driving or Atari, StarCraft

Sim graphic source: Ernest et al., J Def Manag 2016
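To make the “state-space complexity” axis concrete, the short sketch below enumerates every tic-tac-toe position reachable in legal play (stopping when a player completes a line or the board fills) and reports its order of magnitude. The string board encoding is an illustrative choice. Games like chess or Go cannot be enumerated this way, which is why their sizes are only estimated.

```python
import math

# The eight winning lines on a 3x3 board indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reachable_positions():
    """Depth-first enumeration of every board reachable in legal play from the empty board."""
    seen, stack = set(), [" " * 9]
    while stack:
        board = stack.pop()
        if board in seen:
            continue
        seen.add(board)
        if winner(board) or " " not in board:
            continue  # game over: no further moves from this position
        player = "X" if board.count("X") == board.count("O") else "O"
        for i, cell in enumerate(board):
            if cell == " ":
                stack.append(board[:i] + player + board[i + 1:])
    return seen


count = len(reachable_positions())
print(count, round(math.log10(count), 2))  # order of magnitude 10^3
```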


Trials advancing combat autonomy through competition

• Uses the dogfight, within-visual-range air-to-air combat in M&S using F-16 models, as the challenge problem to increase performance and trust in AI algorithms
• 8 performer teams developing and competing combat autonomy against government-provided AI, each other, and human pilots
• 1v1: need better autonomy to go from this (motivation) to this (goal)
• Schedule:
- Kick-off: 30 Sept 2019
- Trial 1 (preseason): 19 Nov 2019 @ JHU/APL
- Trial 2 (regular season): 28 Jan 2020 @ JHU/APL
- Trial 3 (playoffs): 18 Aug @ AFWERX Las Vegas

Figure: game AI milestones by year (1997, 2016, 2019, 2020+)


Cross TA interactions, all TAs


darpa.mil


Cross TA interactions featuring TA2





Characterization of combat autonomy provided by TA1





Metrics: Build trust in AI the same way we do with pilots

TA1: Increase performance for local behaviors

• Phase 1 (M&S): Win Probability (PW), Limited: Th 50%, Ob 100%
- For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
• Phase 2 (Subscale): Win Probability (PW), Limited: Th 75%, Ob 100%
- For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
• Phase 3 (Full-scale): Win Probability (PW): Limited 1v1 Th 75%, Limited 2v1 Th 90%, Limited 2v2 Th 80%; Ob 100%

Th: Threshold; Ob: Objective
O/D/HA: Offensive/Defensive/High-Aspect initial conditions
Unopposed: pre-planned maneuvers
Unaware: station keeping on an unaware adversary
Limited: baseline adversary with a standard gameplan, limited maneuver potential, and limited thrust
Unlimited: adversary with no gameplan, maneuver-potential, or thrust restrictions
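The slides set thresholds on win probability (PW) but do not say how PW is estimated or how many engagements are flown. A minimal sketch, assuming PW is the fraction of engagements won (draws scored as non-wins) and assuming a threshold counts as met only when the lower bound of a ~95% Wilson score interval reaches it. Both the pass rule and the engagement data are hypothetical.

```python
import math


def win_probability(outcomes):
    """Estimated P_W: fraction of engagements won (True = win; losses and draws are False)."""
    return sum(outcomes) / len(outcomes)


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% confidence)."""
    p = wins / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def meets_threshold(outcomes, threshold, z=1.96):
    """Hypothetical pass rule: the interval's lower bound must reach the threshold."""
    low, _ = wilson_interval(sum(outcomes), len(outcomes), z)
    return low >= threshold


outcomes = [True] * 45 + [False] * 5    # hypothetical set of 50 engagements
print(win_probability(outcomes))        # point estimate of P_W
print(meets_threshold(outcomes, 0.75))  # e.g. a 75% threshold
print(meets_threshold(outcomes, 0.90))  # e.g. a 90% threshold
```

With 45 wins in 50 engagements the point estimate is PW = 0.9, yet under this rule the result clears a 75% threshold but not a 90% one, because the interval's lower bound is about 0.79.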


ACE Program Schedule



Source selection and evaluation criteria

• Overall Scientific and Technical Merit
- Standard DARPA BAA language
- Develop air combat maneuvering algorithms for individual and team tactical behaviors
- Integrate algorithms with the M&S environment, sub-scale live aircraft, and full-scale live aircraft
- Implement a strategy for the combat autonomy to interact with the human pilot
- Characterize the contribution of tactical behavior algorithms to battle management tasks and provide an interface to TA3 algorithms

• Potential Contribution and Relevance to the DARPA Mission
- Standard DARPA BAA language

• Cost and Schedule Realism
- Standard DARPA BAA language


Submission Highlights

• Teaming encouraged

• Highlight previous experience

• Schedule
- BAA released: 06 March 2020
- Proposer’s Day: 26 March 2020
- Optional 1-on-1s: 26 March 2020
- FAQ/Questions Due Date: 31 March 2020
- Email: [email protected]
- Full Proposals Due: 30 April 2020



AlphaZero: Hybrid AI Approach Dominates Board Games

DARPA Swarm Autonomy Study, 2018

• Google DeepMind fused Deep Learning (DL) with Monte Carlo tree search to create a board-game program (AlphaZero [1]) that could beat the best human and machine players, learning through self-play given only the rules of the game
- Searched roughly 1,000 times fewer board positions per move than the best traditional AI heuristic search programs

• Combination of DL and heuristic search:
- During match play, Monte Carlo tree search explores a subset of the possible moves by playing ahead several moves against itself, then picks the next move with the best combined value of the expected outcomes found at the end of the search
- The expected value of board positions is provided by a deep convolutional neural network (CNN) trained to evaluate states (board positions)
- The CNN weights start out random and are trained by updating them based on the outcomes of thousands of games of self-play, using reinforcement learning

• Subjectively described by former world chess champion Garry Kasparov as having much more human-like play, more intuitive and innovative than traditional AI chess programs

Figure: AlphaZero training instances needed to exceed the performance of the traditional AI champion; performance (Elo) versus thousands of training trials

1. Silver et al., Science 362, 1140–1144 (7 December 2018)
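The search described above can be sketched as PUCT-style Monte Carlo tree search guided by an evaluator that returns move priors and a position value. This is a minimal sketch under stated assumptions, not DeepMind's implementation: the toy game (single-pile Nim), the `evaluate` stand-in (uniform priors and a zero value in place of a trained CNN), and the constant `c_puct` are illustrative. As in AlphaZero's match play, the move actually played is the most-visited one.

```python
import math
from dataclasses import dataclass, field

# Toy game (illustrative assumption, not from the slides): single-pile Nim.
# A move removes 1-3 stones; the player who takes the last stone wins.
MOVES = (1, 2, 3)


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def evaluate(stones):
    """Stand-in for the trained network: (move priors, value) for the player to move.
    Uniform priors and a zero value mimic an untrained network."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0  # from the viewpoint of the player who chose this node
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def simulate(node, stones, c_puct=1.5):
    """One simulation; returns the value of `stones` for the player to move."""
    if stones == 0:
        return -1.0  # the opponent took the last stone: the player to move has lost
    if not node.children:  # leaf: expand it using the evaluator
        priors, value = evaluate(stones)
        for move, p in priors.items():
            node.children[move] = Node(prior=p)
        return value
    total = sum(child.visits for child in node.children.values())

    def puct(item):
        child = item[1]
        explore = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + explore

    move, child = max(node.children.items(), key=puct)
    value = -simulate(child, stones - move, c_puct)  # child's value is the opponent's
    child.visits += 1
    child.value_sum += value
    return value


def search(stones, num_simulations=800):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, stones)
    # Play the most-visited move.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


print(search(5))  # removing 1 stone leaves 4, a lost position for the opponent
```

With a trained network, `evaluate` would return the CNN's policy and value outputs, letting the search concentrate its simulations on promising lines instead of spreading them uniformly.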


OpenAI Five Achieves Top Human Level in DOTA 2 (almost)

• OpenAI developed OpenAI Five to achieve expert-human level play in the video game Defense of the Ancients 2 (DOTA 2) using only a CNN with reinforcement learning from self-play [1, 2]

• DOTA 2 is similar to Quake III and StarCraft II, and closer to air combat than board games:
- Real-time multiplayer (5 vs. 5) battle arena with competing and cooperating agents, simultaneous play, no discrete states, and learning from awarded game points
- OpenAI Five introduced restrictions (constraints) on DOTA 2 to make it easier to automate; the constraints are being removed over time
- OpenAI Five defeated former professional players rated in the 99.95th percentile but, with some game restrictions, lost to the world champion human team in August 2018
- CNN trained with self-play using deep reinforcement learning, similar to DeepMind’s CTF agents, but with a single level of reinforcement rather than a hierarchical one
- CNN augmented with memory, shown to be successful for sequential action problems [3]
- “Divining the exact thought process behind the bots’ actions is impossible. What we can say is that they excelled in close quarters but found it trickier to match humans’ long-term strategies.”

OpenAI Five plays at the level of the top 0.05% of human players but, as of August 2018, had failed to beat the top human professionals in 5v5 play

Figure: Screenshot of Dota 2, a fantasy arena battle game where two teams of five heroes fight to destroy one another’s base. Gameplay is complex, and matches typically last more than 30 minutes.

1. https://openai.com/five/
2. https://www.youtube.com/watch?v=dltN4MxV1RI
3. Graves et al., Nature Vol. 538, p. 471, 27 Oct 2016


DeepMind Creates AI Learning Environment for StarCraft II

• Google DeepMind developed a learning environment, the StarCraft II Learning Environment (SC2LE) [1]
- StarCraft II, a popular real-time strategy game, goes beyond board games toward simulated combat:
- Multi-agent, continuous play (not discrete states), imperfect information (map only partially observable via a local camera moved to collect information), a vast and diverse action space, and contests consisting of 1000s of actions
- SC2LE provides a suite of tools including the StarCraft II game engine, training data sets (completed games from human players), a built-in automated opponent, a learning engine, and associated learning tools
- In difficulty and sophistication, SC2LE is a significant advance beyond the Arcade Learning Environment (ALE) for Atari

• DeepMind’s AlphaStar agents use deep reinforcement learning with and without memory
- Learning environment and baseline informed by AlphaGo and AlphaZero
- Deep convolutional networks used to create both a value function (estimating the probability of outcomes from a given state) and a policy (selecting the next actions to take)
- Variants add memory to the CNN to cope with the complexities of StarCraft II

• DeepMind hosts competitions between competitors who submit their own intelligent agents
- DeepMind invites other groups to develop their own approaches
- 2019: AlphaStar at professional-human level, with some caveats [2]

StarCraft II, a real-time strategy game, goes beyond board games toward simulated real-time combat

Figure: Google DeepMind AI improving at StarCraft II

1. Vinyals et al., https://arxiv.org/abs/1708.04782, 16 Aug 2017
2. https://www.theverge.com/2019/1/24/18196135/google-deepmind-ai-starcraft-2-victory


DeepMind trained agents to beat StarCraft pros in a matter of weeks

Deep neural network trained using imitation and reinforcement learning

1. Original agent trained using imitation learning on replays of community games
2. Original agent forked to create modified agents
3. Agents played against each other
4. Based on the outcome of engagements, use reinforcement learning to update the neural network and fork new agents
5. Agents played against each other
6. Based on the outcome of engagements, use reinforcement learning to update the neural network and fork new agents
7. Play all agents against each other
8. Identify the “least exploitable” agents out of the league

• Takes about 2 weeks to run the entire AlphaStar League

• Each agent trained on 16 TPUv3s (tensor processing units), equivalent to about 50 GPUs

• Each agent experiences up to about 200 years of real-time StarCraft II play

Figure: AlphaStar League snapshots at T0+3, T0+7, T0+9, T0+11, and T0+12 days (© Google DeepMind)
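The league steps above can be sketched in miniature. The sketch below is an assumption-laden toy, not AlphaStar: rock-paper-scissors mixed strategies stand in for agents, a fixed biased strategy stands in for the imitation-learned starting agent, `rl_update` is a crude best-response step rather than deep RL, and “least exploitable” is measured exactly with a best response instead of being estimated from league matches.

```python
import random

# Rock-paper-scissors stands in for StarCraft II (assumption for illustration).
# PAYOFF[a][b] is the payoff of action a against action b (0=rock, 1=paper, 2=scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_payoff(p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[a] * q[b] * PAYOFF[a][b] for a in range(3) for b in range(3))


def exploitability(p):
    """Payoff a best-responding opponent earns against p (0 at the Nash equilibrium)."""
    return max(sum(p[b] * PAYOFF[a][b] for b in range(3)) for a in range(3))


def rl_update(p, opponents, lr=0.1):
    """Crude stand-in for a reinforcement-learning step: shift probability mass
    toward the pure action that scores best against the given opponents."""
    pure = [[1.0 if i == a else 0.0 for i in range(3)] for a in range(3)]
    scores = [sum(expected_payoff(pure[a], q) for q in opponents) for a in range(3)]
    best = max(range(3), key=lambda a: scores[a])
    return [(1 - lr) * x + (lr if a == best else 0.0) for a, x in enumerate(p)]


def run_league(initial, generations=30, seed=0):
    rng = random.Random(seed)
    league = [list(initial)]  # step 1: stand-in for the imitation-learned agent
    for _ in range(generations):
        parent = rng.choice(league)  # fork an existing agent
        opponents = rng.sample(league, k=min(3, len(league)))  # matches within the league
        league.append(rl_update(parent, opponents))  # learn from outcomes, keep the fork
    # Final step: pick the least exploitable agent in the league.
    return min(league, key=exploitability), league


best, league = run_league([0.5, 0.3, 0.2])
print(len(league), [round(x, 3) for x in best], round(exploitability(best), 3))
```

Keeping every forked agent in the league, rather than only the latest one, is what lets later agents be tested against strategies the league has already moved past.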


Google DeepMind Achieves Human-Level Play for Quake III

• Google DeepMind achieved expert-human level play on Quake III Arena Capture the Flag (CTF) using AlphaZero-style reinforcement learning with a deep CNN [1, 2]
- Quake III Arena CTF is closer to air combat than board games:
- First-person shooter, multiplayer with competing and cooperating agents, simultaneous play, no discrete states, uses only pixels and awarded game points to learn and play

• Deep reinforcement learning with self-play trains the CNN on two tiers
- Intelligent agents play in teams with and against each other on randomly generated environments
- Each agent learns its own internal reward signal, complementing the sparse delayed reward from winning
- Selects actions using a novel temporally hierarchical representation enabling reasoning at multiple timescales
- Displays humanlike behaviors (e.g., navigating, following, defending) using encoded high-level game knowledge

• Trained agents exceeded the win rate of strong human players and were far stronger than state-of-the-art agents

Figure: Thousands of games provide the data set for training in two environments

Quake III, a real-time capture the flag game, is closer to simulated air combat scenarios

1. Jaderberg et al., arXiv:1807.01281v1 [cs.LG], 3 Jul 2018
2. https://www.youtube.com/watch?v=dltN4MxV1RI
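The two-tier training above pairs an inner reinforcement-learning loop with an outer loop that adjusts each agent's internal reward. In the cited work (Jaderberg et al.) that outer loop is population-based training (PBT); the sketch below assumes a simple PBT exploit/explore step over reward weights. The event names, the bottom-half replacement, and the ±20% perturbation are hypothetical choices, and the inner RL loop is omitted.

```python
import random

# Hypothetical game-event names; the slides do not list the events the agents used.
EVENTS = ("flag_captured", "flag_picked_up", "tagged_opponent", "was_tagged")


def internal_reward(event_counts, weights):
    """Dense internal reward: weighted sum of the game events seen in one step."""
    return sum(weights[e] * event_counts.get(e, 0) for e in EVENTS)


def pbt_step(population, fitness, rng, perturb=0.2):
    """Outer-loop exploit/explore step of population-based training.
    population: list of reward-weight dicts; fitness: e.g. each agent's win rate."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    half = len(ranked) // 2
    winners, losers = ranked[:half], ranked[half:]
    for i in losers:
        donor = population[rng.choice(winners)]  # exploit: copy a stronger agent's weights
        population[i] = {e: w * rng.choice((1 - perturb, 1 + perturb))  # explore: perturb
                         for e, w in donor.items()}
    return population


rng = random.Random(0)
population = [{e: rng.uniform(-1, 1) for e in EVENTS} for _ in range(4)]
population = pbt_step(population, fitness=[0.7, 0.2, 0.5, 0.4], rng=rng)
```

The internal reward gives each agent frequent learning signal, while the outer loop judges those reward weights only by the sparse signal that matters, winning.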


DeepStack Beats Professional Texas Hold’em Poker Players

Hybrid AI approach defeats professional poker players in game of incomplete information

• University team combined heuristic search with Deep Learning (similar to AlphaZero)
- First program to successfully employ heuristic search for an incomplete-information game
- Before each action, the algorithm considers possible actions and searches possible future states to a limited depth
- An abstract representation of actions reduces the number of possible actions for computational tractability
- At the end of the search, an evaluation function estimates the value of possible poker hands and hence of possible actions
- The evaluation function is trained using DL on random poker situations, using recursive self-play through a fixed depth of play in a Monte Carlo Markov decision process

• Strives to approximate a Nash equilibrium: against any strategy the worst case is a tie (breaking even in expectation), and it can gain against opponents playing less-than-optimal strategies (e.g., mistakes by a human player)

DeepStack performs limited-depth heuristic search using an evaluation function trained with DL

1. http://science.sciencemag.org/ (retrieved January 24, 2019)
2. https://www.quora.com/What-is-an-intuitive-explanation-of-counterfactual-regret-minimization
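DeepStack's equilibrium approximation builds on counterfactual regret minimization (CFR, the subject of footnote 2). Its core update, regret matching, can be sketched on rock-paper-scissors, a game small enough that no depth limit or learned evaluation function is needed; the game choice and iteration count are assumptions. In self-play, each player's average strategy approaches the Nash equilibrium (uniform play), which also shows what the equilibrium guarantees: uniform play cannot be exploited, but it does not win against a weaker strategy either.

```python
# Rock-paper-scissors as a tiny zero-sum game (illustrative assumption).
# PAYOFF[a][b] is the payoff of action a against action b; the matrix is
# antisymmetric, so the same table gives either player's payoff.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [x / total for x in positive] if total > 0 else [1.0 / 3] * 3


def action_values(opponent):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]


def exploitability(p):
    """Best-response payoff against p; 0 means p cannot be exploited (Nash)."""
    return max(action_values(p))


def regret_matching_selfplay(iterations=10_000):
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # asymmetric start so play is non-trivial
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            current = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                regrets[player][a] += values[a] - current
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy (not the last one) is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


avg = regret_matching_selfplay()
print([round(x, 3) for x in avg[0]], round(exploitability(avg[0]), 4))
print(exploitability([1 / 3, 1 / 3, 1 / 3]))  # uniform play breaks even against anything
```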
