Paris Game/AI Conference 2011

Post on 15-Dec-2014

description

Slides from the Paris Game/AI Conference 2011 talk by Neil Henning - covering the

transcript

Preparing AI for Parallelism

Lessons from NASCAR The Game 2011

Neil Henning – Technology Lead

Paris Game AI Conference 2011

Neil Henning

neil@codeplay.com

Introduction

I am sure some of you are wondering...

Introduction

Why a guy from Codeplay is doing a talk about NASCAR The Game 2011, which was developed by Eutechnyx

● A team from Codeplay worked on the game for 15 months

● NASCAR isn’t just about driving straight, then turning left

● 43 cars on screen at the same time

● Cars race in tight packs on the circuit

● Overtaking is all about navigating through these packs

● Cannot simply make the AI use LODs, as the cars are nearly always in view

Agenda

● How to prepare AI for parallelism

● …by investigating NASCAR The Game 2011's AI

● During the investigation I will answer the questions:

● Why prepare your AI for parallelism?

● What changes should be made?

● How did these changes help when optimizing NASCAR?

● How did we make use of the PS3's unique hardware?

● What common issues are there?

● What performance improvement was achieved?

Why prepare your AI for parallelism?

● Without parallelism, tighter limits on number of bots

● Say we have four bots

● In serial – can easily fit in a frame

[Diagram: four bots' work against the frame length]

● Want to increase bots by 3x?

● Have to either optimize or parallelize (or both)

● Split work between threads

● Only possible with parallelism

● Multicore is the future (has been for some time)

● Even the iPad uses dual-core processors now!

● Sony's new PS Vita is quad core

● This generation of consoles is multicore

● Being able to split work amongst cores is key

● Might not be required yet, but could be essential later

● Helps during crunch time

● Have AI prepared to become parallel

● Either optimize engine or cut features

● Optimization being sought throughout engine

● Optimization folks will love you!

What changes should be made?

● Split work into manageable chunks

● In NASCAR, each car had 18 components

● Components are in groups

[Diagram: component groups, including Stay Behind, Stay Beside, Obstacle Detection and Driving Controllers]

● All components in a group can be run in parallel

● 43 cars = 43 AIs

● Each car’s groups can be run in parallel too

[Diagram: cars 0, 1, 2, … 42]

● Read/Write phases

● Two phases for your AI

● Read phase can read world/other car state

● Write phase can modify own car state

● Use temporary data to store values read from the environment

● In read phase, store needed reads into temporary data

● In write phase, read from the temporary data

● AI is one frame behind world events

● Effect on AI is minimal
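The read/write split described above can be sketched roughly as follows. This is a minimal illustration in standard C++, not the game's code: `CarState`, `Snapshot`, `readPhase`, `writePhase` and `updateAI` are hypothetical names, and the "brake when close" rule is invented purely to give the write phase something to do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-car state; the real game state is far richer.
struct CarState { float position; float speed; };

// Temporary data filled in the read phase and consumed in the write phase.
struct Snapshot { float gapToCarAhead; };

// Read phase: may read any car's state, but writes only this car's snapshot.
void readPhase(const std::vector<CarState>& cars, std::size_t self, Snapshot& out)
{
    const std::size_t ahead = (self + 1) % cars.size();
    out.gapToCarAhead = cars[ahead].position - cars[self].position;
}

// Write phase: touches only this car's own state, using the stored snapshot.
void writePhase(CarState& car, const Snapshot& in)
{
    if (in.gapToCarAhead < 10.0f) car.speed *= 0.5f; // invented rule
}

// One AI update. The write phase consumes the snapshot taken on the previous
// update, so the AI runs one frame behind. Each loop is independent per car,
// so either loop could be split across threads with a barrier in between.
void updateAI(std::vector<CarState>& cars, std::vector<Snapshot>& snapshots)
{
    assert(cars.size() == snapshots.size());
    for (std::size_t i = 0; i < cars.size(); ++i) writePhase(cars[i], snapshots[i]);
    for (std::size_t i = 0; i < cars.size(); ++i) readPhase(cars, i, snapshots[i]);
}
```

Because no car writes anything another car reads within the same phase, neither loop needs a lock.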

● In NASCAR a read/write phase was used

[Diagram: Write Phase, Read Phase]

● Write phase uses data from the previous frame's read phase

● Minimal set of components in read/write phase group

● Only components that required world/other car state

● Remove large stack locals

● Having two or more threads means lots of duplicate locals

void func()
{
    char localBuffer[1024];
    // … do something with localBuffer
}

● If func is called from many threads, the data use is multiplied many times over!

● Document code – describe relationship between data

struct Foo
{
    Bar * bar; // one : one? one : many? many : one?
};

● Knowing how data is shared is critical for threading

● Documenting the relationship saves time and effort later

What common issues are there?

● Virtual functions – can have a high runtime cost

● ~500-1200 cycles on PowerPC if virtual lookup misses cache

● Can equate to a large amount of time doing no work

● In NASCAR, components had virtual update method

● Based on previous game (Supercar Challenge)

● 16 cars in previous, now 43 cars

● 5 component types in previous, now 18 component types

● Now read/write phase too

● 80 virtual calls to update became 1333 virtual calls!
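The call counts quoted above can be checked with a little arithmetic, written here as compile-time checks. The constant names are hypothetical. The per-car figure of 31 is derived from the quoted totals; reading it as 18 component updates plus extra read-phase updates is only an assumption, as the slides do not give that breakdown.

```cpp
// Previous game: one virtual update per component type per car.
constexpr int previousCars = 16;
constexpr int previousComponentTypes = 5;
constexpr int previousCalls = previousCars * previousComponentTypes;
static_assert(previousCalls == 80, "matches the 80 calls quoted");

// NASCAR: 1333 calls spread over 43 cars.
constexpr int nascarCalls = 1333;
constexpr int nascarCars = 43;
static_assert(nascarCalls % nascarCars == 0, "the total divides evenly by car count");

// Derived figure: virtual update calls per car, per frame.
constexpr int updatesPerCar = nascarCalls / nascarCars;
static_assert(updatesPerCar == 31, "31 virtual update calls per car");
```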

● In real terms, 3ms of virtual function lookup per frame

● First optimization was to have typed buckets of components

● 1333 virtual calls went to 31 virtual calls

● Platform agnostic (PS3, 360 and Wii all sped up)
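One way to picture typed buckets is sketched below. It is an assumption about the general shape, not the shipped implementation: `BucketBase`, `Bucket`, `updateBuckets` and the two cut-down component types are hypothetical names. The idea is to pay one virtual call per bucket and then make direct calls on a contiguous array of one concrete type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical concrete component types; their update is not virtual.
struct StayBehind { int updates = 0; void update() { ++updates; } };
struct StayBeside { int updates = 0; void update() { ++updates; } };

// Common interface: one virtual call per bucket rather than per component.
struct BucketBase
{
    virtual ~BucketBase() = default;
    virtual void updateAll() = 0;
};

// A bucket owns every component of a single concrete type, contiguously.
template <typename Component>
struct Bucket : BucketBase
{
    std::vector<Component> components;

    void updateAll() override
    {
        // Direct calls: the compiler knows the exact type here.
        for (Component& c : components) c.update();
    }
};

// Returns the number of virtual calls made, which equals the bucket count.
std::size_t updateBuckets(const std::vector<BucketBase*>& buckets)
{
    for (BucketBase* b : buckets)
    {
        assert(b != nullptr);
        b->updateAll();
    }
    return buckets.size();
}
```

With 43 components in each bucket, the number of virtual calls depends only on how many buckets exist, not on how many cars are racing.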

● Virtual functions not just a code abstraction

● Virtual functions hide data too

● Not knowing the size of data kills SPU/Compute development

struct Foo { virtual void func(); };
struct Bar : public Foo { virtual void func(); };

Foo * foo;
foo->func();
// don’t know size of foo! Could be sizeof(Foo) or sizeof(Bar)
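The size problem can be made concrete with a small standard C++ sketch. The `id` and `payload` members and the `bytesVisibleThroughBase` helper are hypothetical additions for illustration. Code that must copy an object into a small local memory needs its byte count, and the static type of a base reference cannot supply it.

```cpp
#include <cassert>
#include <cstddef>

struct Foo
{
    virtual ~Foo() = default;
    virtual void func() {}
    int id = 0;
};

// Hypothetical derived type carrying extra data the base knows nothing about.
struct Bar : Foo
{
    void func() override {}
    char payload[64] = {};
};

// Through a Foo reference, the static type only ever reports sizeof(Foo),
// even when the object passed in is really a Bar.
std::size_t bytesVisibleThroughBase(const Foo&) { return sizeof(Foo); }

static_assert(sizeof(Bar) > sizeof(Foo), "derived object is larger than its base");
```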

● Naïve multithreading – locks galore

● Locks can be a solution, be very careful of use though

void func()
{
    lock->lock();
    // … do something
    lock->unlock();
}

● Read/write phases allow removal of most (if not all) locks

● Avoid/reduce/remove locks if possible

● Physics subsystem caused issues with NASCAR

● AI required knowledge of obstacles

● Physics system's raycast was used to find problematic obstacles

● Each call to raycast used a mutex, every thread would halt!

● Had to refactor code to remove need for locking

● Know your data – how is it accessed? Where is it shared?

struct RaceCar { Brain * brain; };

struct Brain { RaceCar * raceCar; Obstacle ** obstacles; };

struct Obstacle { BrainInterface * interface; };

struct BrainInterface { RaceCar * raceCar; Brain * brain; };

● Very easy for systems grown over time to have convoluted struct layouts

How did these changes help when optimizing NASCAR?

● Read/Write phase was key to performance on Xbox 360

● Allowed work to be split across all 6 threads

● Each thread was given 1/6th of the cars to process

● Takes 2ms of all CPU resources on 360 in a frame

[Diagram: per-thread work separated by barriers]
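Giving each thread a slice of the cars can be sketched in portable C++ as below. This is illustrative only: `forEachCarParallel` is a hypothetical helper, `std::thread` stands in for whatever threading API the game used, and a shipping engine would most likely reuse persistent worker threads rather than create them every frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs perCar(i) for every car index, giving each thread one contiguous slice.
template <typename Fn>
void forEachCarParallel(std::size_t numCars, std::size_t numThreads, Fn perCar)
{
    assert(numThreads > 0);
    const std::size_t chunk = (numCars + numThreads - 1) / numThreads; // round up
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t)
    {
        const std::size_t begin = std::min(t * chunk, numCars);
        const std::size_t end = std::min(begin + chunk, numCars);
        workers.emplace_back([begin, end, &perCar] {
            for (std::size_t i = begin; i < end; ++i) perCar(i);
        });
    }
    // Joining every worker acts as the barrier before the next phase starts.
    for (std::thread& w : workers) w.join();
}
```

For 43 cars on 6 threads this gives five slices of 8 cars and one of 3. Calling it once for the write phase and once for the read phase puts a barrier between the two.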

● Tried the same approach on PS3

● Only 2 threads on PS3, but have 6 sub processors (the SPUs)

● Both threads on PS3 were completely full

● Any multithreading speedup has to be on the SPUs

● Each SPU has 256 KB of local storage (for code & data)

● Code was ~2 MB and data was ~8 MB – far too large!

● Unfeasible to mimic 360 approach

● On PS3 most costly components were targeted

● PS3 version relied on components being run in parallel

● And all components in a group being able to be run in parallel

● Costly groups were made to use the SPUs

● Knowing relationship between data was key

● Well documented code made life so much easier!

How did we make use of the PS3's unique hardware?

● Codeplay was asked by Eutechnyx to optimize the AI

● Very tight deadlines, 1 month to reduce time taken in AI

● No main thread time left – have to use the SPUs

● Our Offload compiler technology crucial

● For those unfamiliar with coding for the SPU…

● They are amazingly fast, if you code correctly for them

● Normally requires total rewrite of existing codebase

● Painful to access global variables

● Virtual functions are a complete write off

● SPU development typically takes many months

● Common to have 4-5 SPU programmers for ~10 months

● Not feasible for late-in-cycle development

● Offload aims to mitigate the issues with getting code onto SPU

● Can offload code to SPU much quicker (typically a few man days)

● Much easier to move existing code bases to SPU

● Small language extension moves work from PPU to SPU

● Any work within an offload block is performed on the SPU

__blockingoffload()
{
    // do some work on SPU, PPU waits for completion!
};

offloadThread_t handle = __offload()
{
    // do some work on SPU!
};

// can do some work on PPU before waiting for SPU
offloadThreadJoin(handle);

● All PPU code is duplicated for the SPU

● Offload allows access to global variables

● Just use them as normal!

int aGlobalVariable;

__blockingoffload()
{
    int aLocalVariable = aGlobalVariable;
};

● Offload allows virtual function calls too

● Just have to specify which virtual functions may be called

struct Foo { virtual void bar() {} };

__blockingoffload[Foo::bar this]()
{
    Foo foo;
    foo.bar();
};

● First, profiled the AI during a typical race

● Four components taking most of the frame time

[Profile chart: Driving Controllers, Obstacle Detection, Stay Behind Other Car, Stay Beside Other Car]

● Used four slightly different strategies when multithreading

● Obstacle Detection only component in its group

● Very inefficient code for the SPU, but moved 1/3 onto 4 SPUs

● Looked at Stay Behind/Beside Other Car together

● In the same group, can be run in parallel

● Moved Stay Behind component to SPU

● Stay Beside component would continue to be run on PPU

● As long as the SPU work took less time than the PPU work, no cost!

● Effectively ‘hid’ the cost of calculating Stay Behind component
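The overlap described above can be illustrated in portable C++, with `std::async` standing in for the SPU. This is not Offload code: `stayBehind`, `stayBeside` and `updateGroup` are hypothetical stand-ins that return dummy values.

```cpp
#include <cassert>
#include <future>

// Stand-ins for the two components; the real ones are far more involved.
int stayBehind() { return 1; }
int stayBeside() { return 2; }

// Starts one component asynchronously, runs the other on the calling thread,
// then waits. If the asynchronous part finishes first, the wait costs nothing.
int updateGroup()
{
    std::future<int> behind = std::async(std::launch::async, stayBehind);
    const int beside = stayBeside();  // runs while stayBehind is in flight
    return behind.get() + beside;     // join point
}
```

The calling thread only stalls at `get()` if the asynchronous work outlasts its own, which mirrors the condition stated in the slide.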

● Lastly, driving controllers took 1/3 of AI cost alone

● Split the cars across 4 SPUs, and ran in parallel

● In total ~170 lines of source code changed

● Changes were purely optimization

// array of AIObstacle * ’s on main memory
AIObstacle ** obstacles;
unsigned int numObstacles;

offloadThread_t handle = __offload(obstacles, numObstacles)
{
    for(unsigned int i = 0; i < numObstacles; i++)
    {
        // AIObstacle * points to main memory
        AIObstacle * obstacle = obstacles[i];

        // use obstacle for calculations
    }
};

// array of AIObstacle * ’s on main memory
AIObstacle ** obstacles;
unsigned int numObstacles;

offloadThread_t handle = __offload(obstacles, numObstacles)
{
    CachedPointer<AIObstacle *> innerObstacles(obstacles, numObstacles);

    for(unsigned int i = 0; i < numObstacles; i++)
    {
        // AIObstacle * points to main memory
        CachedPointer<AIObstacle> obstacle(innerObstacles[i]);

        // use obstacle for calculations
    }
};

What performance improvement was achieved?

● Obstacle detection went from 2ms -> 1.1ms

● ~100 lines of source code changed

● 2½ weeks development time

● Stay Behind went from 1.1ms -> 0ms (hidden behind other)

● ~50 lines of source code changed

● 1 week development time

● Driving Controllers went from 4ms -> 0.6ms

● ~20 lines of source code changed

● 8 hours development time

● Performance speaks for itself!

● 50% speed improvement on PS3

Takeaway

● It is possible to parallelise late in development

● But need code ready to be parallelised

● Small changes in coding style lead to hugely better results

● Better to plan systems from beginning with multicore in mind

Questions?

Can also catch me on twitter @sheredom