
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video...

Date post: 17-Oct-2020
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI

Lei Han*1, Peng Sun*1, Yali Du*2, Jiechao Xiong1, Qing Wang1, Xinghai Sun1, Han Liu3, Tong Zhang4

1 Tencent AI Lab, Shenzhen, China
2 University of Technology Sydney, Australia
3 Northwestern University, IL, USA
4 Hong Kong University of Science and Technology, Hong Kong, China

* Equal contribution

Email: [email protected]
Transcript
Page 2

Introduction

Considered Problem
• Multi-agent reinforcement learning (MARL)
• Grid-world environment (video game)
• Challenge
  - Flexibly control an arbitrary number of agents
  - while achieving effective collaboration

Existing MARL Approaches
• Decentralized learning
  - IQL, IAC (Tan, 1993; Foerster et al., 2017)
• Centralized learning
  - CommNet, BicNet (Sukhbaatar et al., 2016; Peng et al., 2017)
• Mixture
  - COMA, QMIX, Mean-Field (Foerster et al., 2017; Rashid et al., 2018; Yang et al., 2018)

Limitation: unable to handle a varying number of agents, or unstable when doing so

Page 3

GridNet

Architecture
• Encoder
  - Inputs are represented as an image-like structure
  - Using conv/pooling layers to generate an embedding
• Decoder
  - Up-sampling to construct an action map
  - An agent will take the action in the grid it occupies
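The encoder/decoder idea above can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the authors' network: the layer sizes, the single-channel observation, the random untrained kernels, and the helper names (`conv3x3`, `max_pool2`, `upsample2`, `action_map`, `agent_actions`) are all hypothetical. It shows one conv + pool encoder step, one up-sample + conv decoder step producing an action map, and agents reading the action at the cell they occupy.

```python
import random

def conv3x3(grid, kernel):
    """Zero-padded 3x3 convolution (cross-correlation) over a 2D list."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += kernel[di + 1][dj + 1] * grid[y][x]
            out[i][j] = total
    return out

def max_pool2(grid):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = len(grid), len(grid[0])
    return [[max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def upsample2(grid):
    """Nearest-neighbour up-sampling by a factor of 2 in both directions."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def action_map(obs, enc_kernel, dec_kernels):
    """Encoder (conv + pool), then decoder (up-sample + one conv per action).

    Returns a grid of action indices with the same shape as `obs`.
    """
    embedding = max_pool2(conv3x3(obs, enc_kernel))
    upsampled = upsample2(embedding)
    logits = [conv3x3(upsampled, k) for k in dec_kernels]  # one map per action
    h, w = len(obs), len(obs[0])
    return [[max(range(len(logits)), key=lambda a: logits[a][i][j])
             for j in range(w)] for i in range(h)]

def agent_actions(amap, positions):
    """Each agent takes the action stored in the grid cell it occupies."""
    return {pos: amap[pos[0]][pos[1]] for pos in positions}

# Toy usage: a 4x4 observation with two agents and three possible actions.
random.seed(0)
obs = [[0.0] * 4 for _ in range(4)]
obs[0][1] = 1.0
obs[2][3] = 1.0
enc = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
dec = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(3)]
amap = action_map(obs, enc, dec)
print(agent_actions(amap, [(0, 1), (2, 3)]))
```

Because the output is a map over cells rather than a fixed-length vector, the same function serves any number of agents: only the list of occupied positions changes.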

Page 4

GridNet

Algorithms
• Can be integrated with many general RL algorithms
  - Q-learning
  - Actor-critic

Properties
• Collaboration is natural
  - Stacked convolutional and/or pooling layers provide a large receptive field
  - Each agent is aware of other agents in its neighborhood
• Fast parallel exploration
  - Convolutional parameters are shared by all the agents
  - Once an agent takes a beneficial action during its own exploration, the other agents acquire the knowledge as well
• Transferable policy
  - The trained policy is easily transferred to other settings with varying numbers of agents
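To make the "large receptive field" point concrete, the standard receptive-field recurrence for stacked layers can be worked through in a few lines. The layer configurations below are hypothetical examples, not the ones used in the slides; `receptive_field` is an illustrative helper name.

```python
def receptive_field(layers):
    """Receptive field (in input cells, per axis) of stacked conv/pool layers.

    `layers` is a list of (kernel_size, stride) pairs, ordered input to output.
    Each layer widens the field by (kernel_size - 1) times the current jump,
    where the jump is the product of the strides of all earlier layers.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Three 3x3 convolutions with stride 1: each output cell sees a 7x7 window.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7

# conv3x3, 2x2 pool (stride 2), conv3x3, 2x2 pool (stride 2): a 10x10 window.
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # 10
```

Under these assumptions, an agent's action at one cell depends on every unit inside that window, which is one way to read the claim that each agent is aware of the agents in its neighborhood.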

Page 5

Experiments on Battle Games in StarCraft II

Scenarios
• 5 Immortals vs. 5 Immortals (5I)
• 3 Immortals + 2 Zealots vs. 3 Immortals + 2 Zealots (3I2Z)
• Mixed army battle (MAB) with a random number of various Zerg units, including Baneling, Zergling, Roach, Hydralisk and Mutalisk

Training Strategies
• Against handcrafted policies: random (Rand), attack-nearest (AN), hit-and-run (HR)
• Against historical versions of itself: self-play (SP)
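As a rough illustration of what a handcrafted opponent such as attack-nearest might look like, here is a minimal sketch. The slides do not specify the rule, so the Euclidean distance metric, the coordinate representation, and the function name `attack_nearest` are all assumptions.

```python
import math

def attack_nearest(unit_pos, enemy_positions):
    """Return the index of the closest enemy, or None if no enemies remain.

    Assumes 2D (x, y) positions and Euclidean distance; ties go to the
    enemy that appears first in the list.
    """
    if not enemy_positions:
        return None
    return min(range(len(enemy_positions)),
               key=lambda i: math.dist(unit_pos, enemy_positions[i]))

# The unit at the origin targets the enemy at (1, 1), which is index 1.
print(attack_nearest((0, 0), [(3, 4), (1, 1), (5, 0)]))  # 1
```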

Compared Methods
• IQL: independent Q-learning [Tan, 1993]
• IAC: independent actor-critic [Foerster et al., 2017]
• Central-V: centralized value with decentralized policy [Foerster et al., 2017]
• CommNet: communication net [Sukhbaatar et al., 2016]

Video link: https://youtu.be/LTcr01iTgZA

Page 6

Experiments on Battle Games in StarCraft II

• On 5I and 3I2Z
  - Performance (against handcrafted policies)
  - Performance (against each other)

Page 7

Experiments on Battle Games in StarCraft II

• Transferability on 5I and 3I2Z
  - Directly apply the trained policy to maps with more agents
  - 10I, 20I, 5I5Z, 10I10Z
• Performance on MAB
  - CommNet and Central-V cannot be applied

Learned Tactics

Page 8

Thanks!

Poster at Pacific Ballroom #243

Jun 11th, 6:30 pm