RADHA-KRISHNA BALLA
19 FEBRUARY 2009
UCT for Tactical Assault Battles in Real-Time Strategy Games
Overview
I. Introduction
II. Related Work
III. Method
IV. Experiments & Results
V. Conclusion
Domain
RTS games: Resource Production, Tactical Planning
Tactical Assault battles
RTS game - Wargus
Screenshot of a typical battle scenario in Wargus
Planning problem
Large state space
Temporal actions
Spatial reasoning
Concurrency
Stochastic actions
Changing goals
I. Introduction  II. Related Work  III. Method  IV. Experiments & Results  V. Conclusion
Related Work
Board games (bridge, poker, Go, etc.) - Monte Carlo simulations
RTS games
Resource Production: Means-ends analysis - Chan et al.
Tactical Planning: Monte Carlo simulations - Chung et al., Nash strategies - Sailer et al., Reinforcement learning - Wilson et al.
Bandit-based problems, Go: UCT - Kocsis et al., Gelly et al.
Our Approach
Monte Carlo simulations
UCT algorithm
Advantages:
Complex plans from simple abstract actions
Exploration/exploitation tradeoff
Changing goals
I. Introduction  II. Related Work  III. Method  IV. Experiments & Results  V. Conclusion
Method
Planning architecture
UCT algorithm
Search space formulation
Monte Carlo simulations
Challenges
Planning Architecture
Online Planner
State space abstraction - grouping of units
Abstract actions - Join(G), Attack(f,e)
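Below is a minimal Python sketch of how the abstraction on this slide could be encoded. The class and field names (Group, AbstractState, JoinAction, AttackAction, hit_points, location) are illustrative assumptions for exposition, not the planner's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative assumption: one possible encoding of the abstraction described
# above, where units are collapsed into groups and the planner reasons only
# about groups and the two abstract actions, Join(G) and Attack(f, e).

@dataclass
class Group:
    unit_ids: List[int]                 # units collapsed into this group
    hit_points: float                   # aggregate hit points of the group
    location: Tuple[float, float]       # mean (centroid) location of the units

@dataclass
class AbstractState:
    friendly: List[Group]
    enemy: List[Group]
    game_time: float                    # current game time

@dataclass
class JoinAction:
    group_indices: Tuple[int, ...]      # friendly groups to merge into one

@dataclass
class AttackAction:
    friendly_index: int                 # attacking friendly group f
    enemy_index: int                    # targeted enemy group e
```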
UCT Algorithm
Exploration/exploitation tradeoff
Monte Carlo simulation - get subsequent states
Search tree:
Root node - current state
Edges - available actions
Intermediate nodes - subsequent states
Leaf nodes - terminal states
Rollout-based construction
Value estimates
UCT Algorithm - Pseudo Code 1
At each interesting time point in the game:
    build_UCT_tree(current state);
    choose argmax action(s) based on the UCT policy;
    execute the aggregated actions in the actual game;
    wait until one of the actions gets executed;

build_UCT_tree(state):
    for each UCT pass do
        run UCT_rollout(state);
(.. continued)
UCT Algorithm - Pseudo Code 2
UCT_rollout(state): recursive algorithm
if leaf node reached then
    estimate final reward;
    propagate reward up the tree and update value functions;
    return;
populate possible actions;
if all actions explored at least once then
    choose the action with the best value function;
else if there exists an unexplored action then
    choose an action based on random sampling;
run Monte Carlo simulation to get the next state from the current state and action;
call UCT_rollout(next state);
UCT Algorithm - Formulae
Action Selection:
$Q^{+}(s,a) = Q(s,a) + c \sqrt{\frac{\log n(s)}{n(s,a)}}$
$\pi(s) = \operatorname{argmax}_a Q^{+}(s,a)$

Value Update:
$n(s,a) \leftarrow n(s,a) + 1$
$n(s) \leftarrow n(s) + 1$
$Q(s,a) \leftarrow Q(s,a) + \frac{1}{n(s,a)} \left( R - Q(s,a) \right)$
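The following Python sketch ties the pseudo code and the formulae together. The dict-based tree nodes and the domain hooks (is_terminal, final_reward, possible_actions, simulate) are assumptions standing in for the Wargus-specific machinery; this is a compact illustration of the technique, not the planner's implementation.

```python
import math
import random

EXPLORATION_C = 1.0   # exploration constant c in the action-selection formula

def build_uct_tree(state, num_rollouts, domain):
    """Run `num_rollouts` UCT rollouts from `state` and return the best root action."""
    root = {"state": state, "q": 0.0, "n": 0, "children": {}}
    for _ in range(num_rollouts):
        uct_rollout(root, domain)
    # pi(s) = argmax_a Q(s, a) at the root (no exploration bonus when acting)
    return max(root["children"], key=lambda a: root["children"][a]["q"])

def uct_rollout(node, domain):
    """One recursive rollout: descend, simulate, and propagate the reward back up."""
    state = node["state"]
    if domain.is_terminal(state):
        return domain.final_reward(state)              # leaf node: estimate final reward

    if not node["children"]:                           # populate possible actions once
        node["children"] = {a: {"state": None, "q": 0.0, "n": 0, "children": {}}
                            for a in domain.possible_actions(state)}

    untried = [a for a, c in node["children"].items() if c["n"] == 0]
    if untried:                                        # unexplored action: random sampling
        action = random.choice(untried)
    else:                                              # all explored: best Q+ value
        action = max(node["children"],
                     key=lambda a: q_plus(node, node["children"][a]))

    child = node["children"][action]
    if child["state"] is None:                         # Monte Carlo simulation to next state
        child["state"] = domain.simulate(state, action)

    reward = uct_rollout(child, domain)                # recurse on the next state

    node["n"] += 1                                     # n(s) <- n(s) + 1
    child["n"] += 1                                    # n(s,a) <- n(s,a) + 1
    child["q"] += (reward - child["q"]) / child["n"]   # incremental value update
    return reward

def q_plus(parent, child):
    """Q+(s,a) = Q(s,a) + c * sqrt(log n(s) / n(s,a))."""
    return child["q"] + EXPLORATION_C * math.sqrt(math.log(parent["n"]) / child["n"])
```

During rollouts the exploration bonus in q_plus balances trying under-sampled actions against exploiting good ones; when acting in the game, only the estimated value Q(s,a) at the root is used.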
Search Space Formulation
Abstract state: friendly and enemy groups
Hit points, location
Current actions, current time
Calculation of group hit points:
$HP(G) = \left( \sum_{u \in G} \sqrt{HP_u} \right)^2$

Calculation of mean location: centroid of the group's unit locations

Number of action choices:
$\binom{n_{friendly}}{2} + n_{friendly} \times n_{enemy}$
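A small sketch of these calculations, assuming per-unit hit points and (x, y) positions are available; the function names are hypothetical.

```python
import math
from itertools import combinations

def group_hit_points(unit_hps):
    """Aggregate hit points of a group: HP(G) = (sum over units of sqrt(HP_u))^2."""
    return sum(math.sqrt(hp) for hp in unit_hps) ** 2

def mean_location(positions):
    """Mean (centroid) location of a group's units, given (x, y) positions."""
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def num_action_choices(n_friendly, n_enemy):
    """Join actions pair up friendly groups (C(n_f, 2)); Attack actions pair a
    friendly group with an enemy group (n_f * n_e)."""
    num_join = len(list(combinations(range(n_friendly), 2)))
    num_attack = n_friendly * n_enemy
    return num_join + num_attack

# Example matching scenario 3 of Table 1 (4 friendly vs. 2 enemy groups):
# C(4, 2) + 4 * 2 = 6 + 8 = 14 possible actions.
print(num_action_choices(4, 2))   # -> 14
```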
Monte Carlo Simulations
Domain-specific - actual game play in Wargus
Join actions, Attack actions
Reward calculation - objective function: time, hit points
Note: Partial simulations (time cutoff)
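As a hedged illustration of the partial-simulation idea, the sketch below uses a crude attrition model in place of actual Wargus game play; the damage rate, time step, and reward functions are assumptions, not the simulator used in the talk.

```python
def simulate_attack(friendly_hp, enemy_hp, dmg_per_hp=0.1, dt=1.0, time_cutoff=None):
    """Advance a friendly-vs-enemy engagement until one group is destroyed or,
    if `time_cutoff` is given, until that much simulated time has elapsed
    (a partial simulation that can be resumed later)."""
    t = 0.0
    while friendly_hp > 0 and enemy_hp > 0:
        if time_cutoff is not None and t >= time_cutoff:
            break                                   # partial simulation: stop early
        # each group deals damage proportional to its current strength (assumption)
        friendly_hp, enemy_hp = (friendly_hp - dmg_per_hp * enemy_hp * dt,
                                 enemy_hp - dmg_per_hp * friendly_hp * dt)
        t += dt
    return max(friendly_hp, 0.0), max(enemy_hp, 0.0), t

def reward(remaining_friendly_hp, elapsed_time, objective="time"):
    """Two objective functions, as in UCT(t) and UCT(hp): minimise battle
    duration, or maximise surviving friendly hit points."""
    return -elapsed_time if objective == "time" else remaining_friendly_hp
```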
Domain-specific Challenges
State space abstraction - grouping of units (proximity-based)
Concurrency - aggregation of actions
Join actions - simple; Attack actions - complex (partial simulations)
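A minimal sketch of proximity-based grouping, assuming a fixed distance threshold and single-link merging via union-find; the threshold value is an arbitrary illustrative choice rather than the grouping rule actually used.

```python
GROUP_RADIUS = 8.0   # map-distance threshold for merging units (assumption)

def group_units(positions):
    """Proximity-based grouping: units within GROUP_RADIUS of each other
    (transitively) end up in the same group. `positions` is a list of (x, y)
    unit locations; returns a list of groups as lists of unit indices."""
    parent = list(range(len(positions)))            # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]           # path compression
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (x1, y1), (x2, y2) = positions[i], positions[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= GROUP_RADIUS ** 2:
                parent[find(i)] = find(j)           # merge the two clusters

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Example: three units close together and one far away -> two groups.
print(group_units([(0, 0), (3, 0), (5, 1), (40, 40)]))   # -> [[0, 1, 2], [3]]
```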
Planning problem - revisited
Large state space - abstraction
Temporal actions - Monte Carlo simulations
Spatial reasoning - Monte Carlo simulations
Concurrency - aggregation of actions
Stochastic actions - UCT (online planning)
Changing goals - UCT (different objective functions)
I. Introduction  II. Related Work  III. Method  IV. Experiments & Results  V. Conclusion
Experiments
# | Scenario name | # of friendly groups | Friendly group composition | # of enemy groups | Enemy group composition | # of possible "Join" actions | # of possible "Attack" actions | Total # of possible actions
1 | 2vs2 | 2 | {6,6} | 2 | {5,5} | 1 | 4 | 5
2 | 3vs2 | 3 | {6,2,4} | 2 | {5,5} | 3 | 6 | 9
3 | 4vs2_1 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
4 | 4vs2_2 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
5 | 4vs2_3 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
6 | 4vs2_4 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
7 | 4vs2_5 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
8 | 4vs2_6 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14
9 | 4vs2_7 | 4 | {3,3,6,4} | 2 | {5,9} | 6 | 8 | 14
10 | 4vs2_8 | 4 | {3,3,3,6} | 2 | {5,8} | 6 | 8 | 14
11 | 2vs4_1 | 2 | {9,9} | 4 | {4,5,5,4} | 1 | 8 | 9
12 | 2vs4_2 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9
13 | 2vs4_3 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9
14 | 2vs5_1 | 2 | {9,9} | 5 | {5,5,5,5,5} | 1 | 10 | 11
15 | 2vs5_2 | 2 | {10,10} | 5 | {5,5,5,5,5} | 1 | 10 | 11
16 | 3vs4 | 3 | {12,4,4} | 4 | {5,5,5,5} | 3 | 12 | 15
Table 1: Details of the different game scenarios
Planners
UCT planners: UCT(t), UCT(hp)
Number of rollouts - 5000
Averaged over - 5 runs
Planners
Baseline planners: Random, Attack-Closest, Attack-Weakest, Stratagus-AI, Human
Video - Planning in action
Simple scenario (video)
Complex scenario (video)
Results
Figure 1: Time results for UCT(t) and baselines.
Results
Figure 2: Hit point results for UCT(t) and baselines.
Results
Figure 3: Time results for UCT(hp) and baselines.
Results
Figure 4: Hit point results for UCT(hp) and baselines.
Results - Comparison
Figures 1, 2, 3 & 4: Comparison between UCT(t) and UCT(hp) across the two metrics (rows: UCT(t), UCT(hp); columns: time results, hit point results).
Results
Figure 5: Time results for UCT(t) with varying rollouts.
I. Introduction  II. Related Work  III. Method  IV. Experiments & Results  V. Conclusion
Conclusion
Conclusion:
Hard planning problem
Little expert knowledge required
Different objective functions
Future Work:
Computational time - engineering aspects
Machine learning techniques
Beyond Tactical Assault
Thank you