A STEP TOWARDS DATA ORIENTATION JOHAN TORP <[email protected]> STHLM GAME DEVELOPER FORUM 5/5 2011

› M.Sc. Computer Science. OO (Java) and functional programming (Haskell)
› Worked ~5 years outside game industry. C++, generic programming & boost, DbC
› AI coder at DICE ~2½ years

MY BACKGROUND

Optimal game dev credentials?

NEW TRADE-OFFS

• OOP
• GP
• FP
• DbC
• TMP

PC & normal sized apps = cache schmache

Games on consoles: 5000 L2 misses = ~1ms

- data-oriented design ftw!

› A lot of OO code and knowledge out there
› Incrementally moving from OO to cache-friendlier code

THIS TALK

› Facts needed before looking at code
› Cache-friendly pathfinding
› Async vs sync code
› Questions

AGENDA

› Visual domain specific scripting language
› Gameplay / AI code in C++
› NavPower pathfinding middleware
› EASTL containers
› We love to blow up parts of our game worlds – and call this destruction

FROSTBITE

› PS3 has 1 core: 32KB data and 32KB instruction L1 cache, 512KB L2 I+D cache
› 360 has 3 cores: 32KB data and 32KB instruction L1 cache for each core, 1MB L2 I+D shared by all cores

› 1 L1 cache miss ~= 40 cycles

“You miss L1 so much that you cry yourself to sleep every night with a picture of it under your pillow” @okonomiyonda

› 1 L2 cache miss ~= 600 cycles
› 1 L2 cache miss ~= 20 matrix multiplications
› Other than heavy calculations: CPU performance ~= cache misses

PS3 (PPU) / XBOX 360 CACHE ECONOMY
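The cycle counts above can be turned into frame time with a little arithmetic. A minimal sketch, assuming a 3.2 GHz clock (the clock rate is not on the slides; `kClockHz` and `l2MissCostMs` are names made up for this example):

```cpp
#include <cassert>
#include <cmath>

// Assumption: 3.2 GHz core clock (not stated on the slides).
constexpr double kClockHz = 3.2e9;
// From the slide: one L2 cache miss costs roughly 600 cycles.
constexpr double kCyclesPerL2Miss = 600.0;

// Estimated stall time in milliseconds for a given number of L2 misses.
constexpr double l2MissCostMs(double misses) {
    return misses * kCyclesPerL2Miss / kClockHz * 1000.0;
}
```

Under these assumptions `l2MissCostMs(5000.0)` comes out just under 1 ms, in line with the "5000 L2 misses = ~1ms" figure quoted earlier.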

Keep copy of common data nearby … in a compact representation

Pointer chasing thrashes both I-cache and D-cache

Often better to copy frequently accessed data once each frame, access copy instead

KEEP HOT DATA NEARBY

getBot()->getPlayer()->getControllable()->getWorldPosition()

EXAMPLE DATA-ORIENTED INPUT

/// Temporary struct containing information about a single sensor.
/// Never stored between updates.
struct VisionInfo
{
    VisionInfo(const AiSettings& settings, EntryComponent& owner, ...);

    Vec3 eyePos;
    Vec2 eyeForwardXz;
    uint playerId;

    // Extracted from settings
    float centralAngle;
    float peripheralAngle;
    float seeingDistance;
    bool seeThroughTerrain;
};
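The idea of copying frequently accessed data once per frame can be sketched as a gather pass followed by a hot loop. This is only an illustration: `Bot`, `Player`, `Controllable`, `BotInput`, `gatherInputs` and `countWithinRange` are hypothetical stand-ins for the `getBot()->getPlayer()->getControllable()->getWorldPosition()` chain, not Frostbite types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical object chain that would normally be pointer-chased on every access.
struct Controllable { Vec3 worldPosition; };
struct Player       { Controllable* controllable; };
struct Bot          { Player* player; };

// Compact per-frame copy of the one field the hot loop needs.
struct BotInput { Vec3 position; };

// Gather pass: follow the pointers once per bot per frame.
std::vector<BotInput> gatherInputs(const std::vector<Bot*>& bots) {
    std::vector<BotInput> inputs;
    inputs.reserve(bots.size());
    for (const Bot* bot : bots)
        inputs.push_back(BotInput{bot->player->controllable->worldPosition});
    return inputs;
}

// Hot loop: reads only the contiguous copies, no pointer chasing.
std::size_t countWithinRange(const std::vector<BotInput>& inputs,
                             const Vec3& eye, float range) {
    std::size_t count = 0;
    for (const BotInput& in : inputs) {
        const float dx = in.position.x - eye.x;
        const float dy = in.position.y - eye.y;
        const float dz = in.position.z - eye.z;
        if (dx * dx + dy * dy + dz * dz <= range * range) ++count;
    }
    return count;
}
```

The copies are only valid for the current update, which matches the "never stored between updates" rule on the struct above.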

› Temporary data structures common in Data-Oriented Design
› Stack variables or alloca()
› Not suited for large amounts or large edge cases

TEMPORARIES – STACK SPACE

› Put aside 8x128KB blocks for "scratch pad calculations"
› Linear allocator – doesn’t free within block
› Return whole block when done – zero fragmentation

TEMPORARIES – SCRATCH PAD
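A linear allocator of the kind described above can be sketched in a few lines. This is a generic bump allocator under stated assumptions, not the Frostbite scratch pad: `LinearAllocator` and its member names are invented for the example, and the block is supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bump allocator over one caller-provided block. Individual allocations are
// never freed; reset() returns the whole block at once.
class LinearAllocator {
public:
    LinearAllocator(void* block, std::size_t size)
        : m_begin(static_cast<std::uint8_t*>(block)), m_size(size), m_used(0) {}

    // align must be a power of two. Returns nullptr when the block is exhausted.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(m_begin);
        const std::uintptr_t current = base + m_used;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > m_size || bytes > m_size - offset) return nullptr;
        m_used = offset + bytes;
        return m_begin + offset;
    }

    void reset() { m_used = 0; }               // hand back everything in one go
    std::size_t used() const { return m_used; }

private:
    std::uint8_t* m_begin;
    std::size_t m_size;
    std::size_t m_used;
};
```

Because nothing is freed inside the block, allocation is a pointer bump and returning the block cannot fragment it.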

Find a good slot in fragmented memory space -> Expensive!
Container of new:ed objects scattered in memory -> Poor cache locality!
Mix short/long lived allocations -> fragmentation -> Lose memory over time!

You should prefer pre-allocated flat vectors and try to minimize new/malloc

NEW / MALLOC
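One way to read "pre-allocated flat vectors" is a container that reserves its storage once and refuses to grow afterwards. A minimal sketch using `std::vector` rather than EASTL; `Waypoint` and `WaypointPool` are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Waypoint { float x, y, z, radius; };

// Elements live contiguously in storage reserved once in the constructor.
// add() never allocates: it fails when the fixed budget is used up.
class WaypointPool {
public:
    explicit WaypointPool(std::size_t capacity) : m_capacity(capacity) {
        m_items.reserve(capacity);
    }

    bool add(const Waypoint& w) {
        if (m_items.size() == m_capacity) return false;
        m_items.push_back(w);  // within reserved capacity, so no reallocation
        return true;
    }

    void clear() { m_items.clear(); }  // keeps the reserved storage
    const Waypoint* data() const { return m_items.data(); }
    std::size_t size() const { return m_items.size(); }

private:
    std::size_t m_capacity;
    std::vector<Waypoint> m_items;
};
```

The trade-off is a hard upper bound that has to be chosen up front, in exchange for no per-element heap traffic and good locality when iterating.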

TIME TO LOOK AT PATHFINDING

IN AN OO WORLD

[Diagram: boxes AI DECISION MAKING, PATHFINDING, ANIMATION and NAVPOWER; three elements labeled OO]

DECISION TO MOVEMENT


› Find path
› Load / unload nav mesh section
› Add / remove obstacles
› Path invalidation detection
› Can go-tests
› Line- / can go straight-tests, circle tests, triangle tests

NAVPOWER OPERATIONS NEEDED


Collect and batch process for good cache locality


› Pathfinder - find path, path invalidation, circle/line tests
› Random position generator - can go-tests
› Manager - load nav mesh, obstacles, destruction, updates

Let some line tests in AI decision making remain synchronous

ABSTRACTIONS

class Pathfinder
{
public:
    // Asynchronous: returns a handle immediately, listener is told when the state changes
    virtual PathHandle* findPath(const PathfindingPosition& start,
                                 const PathfindingPosition& end,
                                 float corridorRadius,
                                 PathHandle::StateListener* listener) = 0;

    virtual void releasePath(PathHandle* path) = 0;

    // Synchronous line test
    virtual bool canGoStraight(Vec3Ref start, Vec3Ref end, Vec3* collision = nullptr) const = 0;
};

PATHFINDER INTERFACE

PATH HANDLE

typedef eastl::fixed_vector<Vec3, 8> WaypointVector;
typedef eastl::fixed_vector<float, 8> WaypointRadiusVector;

struct PathHandle
{
    enum State { ComputingPath, ValidPath, NoPathAvailable, RepathingRequired };

    class StateListener
    {
    public:
        virtual void onStateChanged(PathHandle* handle) = 0;
    };

    PathHandle() : waypoints(pathfindingArena()), radii(pathfindingArena()) {}

    WaypointVector waypoints;
    WaypointRadiusVector radii;
    State state;
};


class NavPowerPathfinder : public Pathfinder
{
public:
    virtual PathHandle* findPath(...) override;
    virtual PathHandle* findPathFromDestination(...) override;
    virtual void releasePath(...) override;
    virtual bool canGoStraight(...) const override;

    void updatePaths();
    void notifyPathListeners();

private:
    bfx::PolylinePathRCPtr m_paths[MaxPaths];
    PathHandle m_pathHandles[MaxPaths];
    PathHandle::StateListener* m_pathHandleListeners[MaxPaths];
    u64 m_usedPaths, m_updatedPaths, m_updatedValidPaths;
};

NAVPOWER PATHFINDER
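The fixed arrays plus `u64` members above suggest a slot pool tracked by bitmasks, with state changes recorded first and listeners notified later in one batch. That reading is an assumption; the slides do not show the implementation. A minimal sketch with invented names (`PathPool`, `Listener`, `PathState`) and a simplified listener interface:

```cpp
#include <cassert>
#include <cstdint>

enum class PathState { ComputingPath, ValidPath, NoPathAvailable };

// Simplified stand-in for PathHandle::StateListener (hypothetical).
struct Listener {
    int notifications = 0;
    void onStateChanged(PathState) { ++notifications; }
};

class PathPool {
public:
    static constexpr int MaxPaths = 64;  // one bit per slot in a 64-bit mask

    // Claims the lowest free slot; returns -1 when every slot is taken.
    int acquire(Listener* listener) {
        for (int i = 0; i < MaxPaths; ++i) {
            const std::uint64_t bit = std::uint64_t(1) << i;
            if ((m_used & bit) == 0) {
                m_used |= bit;
                m_states[i] = PathState::ComputingPath;
                m_listeners[i] = listener;
                return i;
            }
        }
        return -1;
    }

    void release(int slot) {
        const std::uint64_t bit = std::uint64_t(1) << slot;
        m_used &= ~bit;
        m_updated &= ~bit;  // drop any pending notification for this slot
    }

    // Phase 1: record a result. No callback is fired here.
    void setState(int slot, PathState state) {
        m_states[slot] = state;
        m_updated |= std::uint64_t(1) << slot;
    }

    // Phase 2: notify every listener whose path changed, in one batch.
    int notifyListeners() {
        int fired = 0;
        for (int i = 0; i < MaxPaths; ++i) {
            if ((m_updated & (std::uint64_t(1) << i)) && m_listeners[i]) {
                m_listeners[i]->onStateChanged(m_states[i]);
                ++fired;
            }
        }
        m_updated = 0;
        return fired;
    }

private:
    PathState m_states[MaxPaths] = {};
    Listener* m_listeners[MaxPaths] = {};
    std::uint64_t m_used = 0;
    std::uint64_t m_updated = 0;
};
```

Splitting the work into a record phase and a notify phase keeps listener callbacks out of the pathfinding update itself.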

CORRIDOR STEP
1. Copy all new NavPower paths -> temporary representation
2. Drop unnecessary points for all paths
3. Corridor adjust all paths
4. Copy temporaries -> PathHandles

typedef eastl::vector<CorridorNode> Corridor;

ScratchPadArena scratch;
Corridor corridor(scratch);
corridor.resize(navPowerPath.size()); // Will allocate memory using scratch pad


// Each loop is a separate pass over all paths in their corridor representation
for (...)
    dropUnnecessaryPoints(it->corridor, scratchPad);

for (...)
    shrinkEndPoints(it->corridor);

for (...)
    calculateCornerDisplacements(it->corridor);

for (...)
    displaceCorners(it->corridor);

for (...)
    shrinkSections(it->corridor);

for (...)
    copyCorridorToHandle(it->corridor, it->pathHandle);

CORRIDOR STEP 2-4

NAVPOWER MANAGER

void NavPowerManager::update(float frameTime)
{
    m_streamingManager.update();
    m_destructionManager.update();
    m_obstacleManager.update();

    for (PositionGeneratorVector::const_iterator it = ...)
        (**it).update();

    bfx::SystemSimulate(frameTime);

    for (PathfinderVector::const_iterator it = m_pathfinders.begin(), ...)
        (**it).updatePaths();
    for (PathfinderVector::const_iterator it = m_pathfinders.begin(), ...)
        (**it).notifyPathListeners();
}

NAIVE OO CALL PATTERN

[Diagram: EXECUTION timeline with blocks labeled AI Decision Making Code, Pathfinding Runtime Code, NavPower Code and Animation Code]

CURRENT CALL PATTERN

[Diagram: EXECUTION timeline with blocks labeled AI Decision Making Code, Pathfinding Runtime Code, NavPower Code and Animation Code; annotated "HOT HOT HOT!!!!!"]

› Keep pathfinding code/data cache hot
› Avoid call sites cache running cold
› Easier to jobify / SPUify
› Easy to schedule and avoid spikes

BATCHING BENEFITS

SIMPLIFIED ARCHITECTURE

[Diagram: boxes AI DECISION MAKING, PATHFINDING, ANIMATION and NAVPOWER; three elements labeled OO]

LESS SIMPLIFIED ARCHITECTURE

[Diagram labels: SCRIPTING, SERVER, CLIENT, AI DECISION MAKING, PATHFINDING, NAVPOWER, PATH FOLLOWING, LOCOMOTION, DRIVING LOCOMOTION, VEHICLE INPUT, ANIMATION. Data labels: Waypoint Data, Corridor Radii, Waypoint Positions]

Each server update

1. Each AI decision making
2. Pathfinding manager update
   All pathfinding requests
   All corridor adjustments
   All PathHandle notifications -> path following -> server locomotion
3. Network pulse. Server locomotion -> client locomotion
4. ...rest of update

No extra latency added

MY PRECIOUS LATENCY

› Callbacks. Delay? Fire in batch?
› Handle+poll instead of callbacks. Poll in batch?
› Record messages/events, act on them later.. in batch?
› Assume success, recover from failure next update

DELAYING/BATCHING
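The "handle+poll instead of callbacks" option can be sketched as a queue that records requests, processes them all in one update, and lets callers poll for results. The class `BatchedDistanceQueries` and its trivial 1D distance "query" are placeholders invented for this example; a real system would run path or line tests in `update()`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class BatchedDistanceQueries {
public:
    // Records a query and returns a handle to poll later. Nothing is computed yet.
    // Note: this sketch never recycles handles.
    std::size_t request(float from, float to) {
        m_queries.push_back(Query{from, to, 0.0f, false});
        return m_queries.size() - 1;
    }

    // Called once per frame: processes every pending query in a single pass.
    void update() {
        for (Query& q : m_queries) {
            if (!q.done) {
                q.result = std::fabs(q.to - q.from);
                q.done = true;
            }
        }
    }

    // Poll: returns true and writes the result once the query has been processed.
    bool poll(std::size_t handle, float& out) const {
        if (handle >= m_queries.size() || !m_queries[handle].done) return false;
        out = m_queries[handle].result;
        return true;
    }

private:
    struct Query { float from, to, result; bool done; };
    std::vector<Query> m_queries;
};
```

The caller pays up to one update of latency per request, which is the cost being weighed on this slide.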

+ Cache friendly & parallelizable
+ Easy to profile & schedule
+ Avoid bugs with long synchronous callback chains
+ Modular

- More glue code: managers, handles, polling update calls, multiple representations of the same data

- More bugs: index fiddling, life time handling, latency, representations drifting out of sync

- Callstack won’t tell you everything: a break point in sync code gives an easy-to-debug vertical slice...

...but can we afford vertical deep dives?

DELAYING/BATCHING - PROS & CONS

› Do not have to abandon OO nor rewrite the world
› Start small, batch a bit, cut worst pointer chasing, avoid deep dives, grow from there
› Much easier to rewrite a system in a DO fashion afterwards

Existing code is crystallized knowledge, refactor incrementally to learn!

INCREMENTAL GAINS

› Background: Console caches, heap allocations expensive, temporary memory
› AI decision making – pathfinding – animation
› Code: Async abstractions, handles, scratch pad, fixed_vector, batch processing
› Latency analysis, pros & cons of sync vs async

Think about depth/width of calls, try to stay within your system, keep hot data nearby

SUMMARY

Avoid rewritis
You can retract your synchronous tentacles slowly

WHEN BATTLING AN OO OCTOPUS

email [email protected]
twitter semanticspeed
slides www.johantorp.com

QUESTIONS?

