2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Coding for Multiple Cores
ATI Developer Day
Bruce Dawson
Programmer/European Technical Representative
Microsoft Game Technology Group

Why multi-threading/multi-core?
Clock rates are stagnant
Future CPUs will be predominantly multi-thread/multi-core
    Xbox 360 has 3 cores
    PS3 will be multi-core
    >70% of PC sales will be multi-core by end of 2006
Most Windows Vista systems will be multi-core
Two performance possibilities:
    Single-threaded? Minimal performance growth
    Multi-threaded? Exponential performance growth

Design for Multithreading
Good design is critical
Bad multithreading can be worse than no multithreading
    Deadlocks, synchronization bugs, poor performance, etc.

Bad Multithreading
[Figure: diagram labeled Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]

Good Multithreading
[Figure: diagram labeled Main Thread, Game Thread, Rendering Thread, Physics, Animation/Skinning, Particle Systems, Networking, File I/O]

Handling Dependencies: Cascades
[Figure: Thread 1 through Thread 5, labeled Input, Physics, AI, Rendering, Present, shown across Frame 1 through Frame 4]
Advantages:
    Synchronization points are few and well-defined
Disadvantages:
    Increases latency (for constant frame rate)
    Needs simple (one-way) data flow
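
A cascade can be sketched as a chain of stage threads, each handing its frame to the next stage through a small blocking queue. This is a minimal sketch in portable C++11 rather than Win32 calls; `Handoff` and `runCascade` are hypothetical names, and each stage simply adds 1 to an integer to stand in for real work.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One-way handoff between two adjacent stages. Pushing and popping here
// are the only synchronization points in the cascade.
class Handoff {
public:
    void push(int frame) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(frame);
        }
        cv_.notify_one();
    }
    int pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int frame = q_.front();
        q_.pop();
        return frame;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// Pushes frameCount frames (each starting at 0) through stageCount stages,
// one thread per stage. Every stage adds 1, so each frame leaves the last
// stage with the value stageCount.
std::vector<int> runCascade(int stageCount, int frameCount) {
    assert(stageCount > 0 && frameCount >= 0);
    std::vector<Handoff> links(stageCount + 1);
    std::vector<std::thread> stages;
    for (int s = 0; s < stageCount; ++s) {
        stages.emplace_back([&links, s, frameCount] {
            for (int f = 0; f < frameCount; ++f)
                links[s + 1].push(links[s].pop() + 1);
        });
    }
    for (int f = 0; f < frameCount; ++f) links[0].push(0);
    std::vector<int> results;
    for (int f = 0; f < frameCount; ++f)
        results.push_back(links[stageCount].pop());
    for (auto& t : stages) t.join();
    return results;
}
```

Data only flows forward through the links, which is the one-way data flow the slide asks for; a frame is not finished until it has passed every stage, which is where the added latency comes from.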

Typical Threaded Tasks
File Decompression
Rendering
Graphics Fluff
Physics

File Decompression
Most common CPU heavy thread on the Xbox 360
Easy to multithread
Allows use of aggressive compression to improve load times
Don't throw a thread at a problem better solved by offline processing
    Texture compression, file packing, etc.

Threading File I/O & Decompression
First: use large reads and asynchronous I/O
Then: consider compression to accelerate loading
    Don't do format conversions etc. that are better done at build time!
Have resource proxies to allow rendering to continue

Bad Load Thread
    hashtable<Resource*> g_resources;

    // Load thread
    void LoadResource(ResID resName) {
        // The lock is held for the whole (slow) load, so the render
        // thread stalls in GetResource until the load finishes.
        Locker lock(&resourceLock);
        pNewResource = LoadCompressedResource(resName);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        Locker lock(&resourceLock);
        return g_resources.find(resName);
    }

Good Load Thread, Poor Render Thread
    hashtable<Resource*> g_resources;

    // Load thread
    void LoadResource(ResID resName) {
        // Load first, then take the lock only to publish the result.
        pNewResource = LoadCompressedResource(resName);
        Locker lock(&resourceLock);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        // Still takes the lock on every lookup.
        Locker lock(&resourceLock);
        return g_resources.find(resName);
    }

Good Load Thread, Good Render Thread
    hashtable<Resource*> g_resources, g_renderRes;

    // Load thread
    void LoadResource(ResID resName) {
        pNewResource = LoadCompressedResource(resName);
        Locker lock(&resourceLock);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        return g_renderRes.find(resName);
    }

    // Copy from g_resources to g_renderRes once per frame
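
The once-per-frame copy is only named in a comment, so here is one way it might look. This is a sketch in portable C++11, not the talk's code: `ResourceTables`, `publish`, and `syncForFrame` are hypothetical names, `std::unordered_map` stands in for `hashtable`, and the sketch moves newly loaded entries across rather than copying the whole table.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

struct Resource { std::string name; };

class ResourceTables {
public:
    // Load thread: called after the slow load has finished, so the lock
    // is held only long enough to record one pointer.
    void publish(int id, Resource* res) {
        assert(res != nullptr);
        std::lock_guard<std::mutex> lock(lock_);
        pending_[id] = res;
    }

    // Render thread, once per frame: the only place it takes the lock.
    void syncForFrame() {
        std::lock_guard<std::mutex> lock(lock_);
        for (const auto& entry : pending_)
            renderRes_[entry.first] = entry.second;
        pending_.clear();
    }

    // Render thread: no lock needed, because only the render thread ever
    // reads or writes renderRes_.
    Resource* find(int id) const {
        auto it = renderRes_.find(id);
        return it == renderRes_.end() ? nullptr : it->second;
    }

private:
    std::mutex lock_;
    std::unordered_map<int, Resource*> pending_;    // shared, guarded by lock_
    std::unordered_map<int, Resource*> renderRes_;  // render thread only
};
```

A resource published mid-frame stays invisible to `find` until the next `syncForFrame`, which is the price paid for lock-free lookups during rendering.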

Rendering
Separate update and render threads
Multi-threaded device ownership (D3DCREATE_MULTITHREADED) works poorly
    Exception: Xbox 360 command buffers
Special case of cascades paradigm
Pass render state from update to render
    With constant workload gives same latency, better frame rate
    With increased workload gives same frame rate, worse latency

Separate Rendering Thread
[Figure: diagram labeled Update Thread, Render Thread, Buffer 0, Buffer 1]
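
Passing render state from an update thread to a render thread through two buffers could be sketched as follows. This is a minimal illustration in portable C++11; `RenderBuffers` and `runFrames` are hypothetical names, and a single `int` per buffer stands in for a real render-description buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Tracks which of two buffers each thread may touch. The update thread may
// run at most one frame ahead of the render thread.
class RenderBuffers {
public:
    int beginWrite() {  // update thread: blocks until a buffer is free
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return written_ - read_ < 2; });
        return written_ % 2;
    }
    void endWrite() {
        { std::lock_guard<std::mutex> lock(m_); ++written_; }
        cv_.notify_all();
    }
    int beginRead() {   // render thread: blocks until a frame is ready
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return read_ < written_; });
        return read_ % 2;
    }
    void endRead() {
        { std::lock_guard<std::mutex> lock(m_); ++read_; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int written_ = 0;  // frames completed by the update thread
    int read_ = 0;     // frames completed by the render thread
};

// Runs frameCount frames; returns the frame numbers in the order the
// render side saw them.
std::vector<int> runFrames(int frameCount) {
    assert(frameCount >= 0);
    RenderBuffers sync;
    int buffers[2] = {0, 0};
    std::vector<int> rendered;
    std::thread update([&] {
        for (int f = 0; f < frameCount; ++f) {
            int i = sync.beginWrite();
            buffers[i] = f;  // stands in for building render state
            sync.endWrite();
        }
    });
    for (int f = 0; f < frameCount; ++f) {
        int i = sync.beginRead();
        rendered.push_back(buffers[i]);  // stands in for submitting draws
        sync.endRead();
    }
    update.join();
    return rendered;
}
```

The two threads synchronize only at frame boundaries; within a frame each works on its own buffer without locking.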

Graphics Fluff
Extra graphics that don't affect play
    Procedurally generated animating cloud textures
    Cloth simulations
    Dynamic ambient occlusion
    Procedurally generated vegetation, etc.
    Extra particles, better particle physics, etc.
Easy to synchronize
Potentially expensive, but if the core is otherwise idle...?

Physics?
Could cascade from update to physics to rendering
    Makes use of three threads
    May be too much latency
Could run physics on many threads
    Uses many threads while doing physics
    May leave threads mostly idle elsewhere
Other possibilities (rendering and physics decoupled?): see Intel's talk

Overcommitted Multithreading?
[Figure: diagram labeled Game Thread, Rendering Thread, Physics, Animation/Skinning, Particle Systems]

How Many Threads?
No more than one CPU intensive software thread per core
    3-6 on Xbox 360
    1-? on PC (1-4 for now, need to query)
Too many busy threads add complexity and lower performance
    Context switches are not free
Can have many non-CPU intensive threads
    I/O threads that block, or intermittent tasks
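
Querying the machine and capping the number of CPU intensive threads might look like the sketch below. It uses the portable `std::thread::hardware_concurrency()` rather than a Win32 call; the helper names and the cap of 4 (taken from the "1-4 for now" guidance) are assumptions for illustration. Note that the reported value counts logical processors, so SMT siblings are included.

```cpp
#include <cassert>
#include <thread>

// Picks how many CPU intensive threads to run, given the processor count
// the system reported and an upper bound chosen by the caller.
unsigned heavyThreadCount(unsigned reported, unsigned cap) {
    assert(cap > 0);
    if (reported == 0) reported = 1;  // 0 means "could not be determined"
    return reported < cap ? reported : cap;
}

// Convenience wrapper for the current machine; the cap of 4 is an
// illustrative assumption, not a recommendation from the talk.
unsigned heavyThreadCountForThisMachine() {
    return heavyThreadCount(std::thread::hardware_concurrency(), 4);
}
```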

Simultaneous Multi-Threading
Be careful with Simultaneous Multi-Threading (SMT) threads
    Not the same as double the number of cores
    Can give a small perf boost
    Can cause a perf drop
    Can avoid scheduler latency
Ideally one heavy thread per core plus some additional intermittent threads

Case Study: Kameo (Xbox 360)
Started single threaded
Rendering was taking half of the time, so it was put on a separate thread
    Two render-description buffers created to communicate from update to render
    Linear read/write access for best cache usage
    Doesn't copy const data
File I/O and decompress on other threads

Case Study: Kameo (Xbox 360)
Core  Thread  Software threads
0     0       Game update
0     1       File I/O
1     0       Rendering
1     1       -
2     0       XAudio
2     1       File decompression
Total usage was ~2.2-2.5 cores

Case Study: Project Gotham Racing
Core  Thread  Software threads
0     0       Update, physics, rendering, UI
0     1       Audio update, networking
1     0       Crowd update, texture decompression
1     1       Texture decompression
2     0       XAudio
2     1       -
Total usage was ~2.0-3.0 cores

Managing Your Threads
Creating threads
Synchronizing
Terminating
    Don't use TerminateThread()
        Bad idea on Windows: leaves the process in an indeterminate state, doesn't allow clean-up, etc.
        Unavailable on Xbox 360
    Instead return from your thread function, or call ExitThread

Creating Threads Poorly
    // Stack size of zero means inherit parent's stack size
    const int stackSize = 0;
    // CreateThread doesn't initialize C runtime
    // Don't forget to close this handle when done with it
    HANDLE hThread = CreateThread(0, stackSize,
        ThreadFunctionBad, 0, 0, 0);
    // Do work on main thread here.
    // Busy waiting is bad!
    for (;;) { // Wait for child thread to complete
        DWORD exitCode;
        GetExitCodeThread(hThread, &exitCode);
        if (exitCode != STILL_ACTIVE)
            break;
    }
    ...
    DWORD __stdcall ThreadFunctionBad(void* data) {
    #ifdef WIN32
        // Be careful with thread affinities on Windows
        SetThreadAffinityMask(GetCurrentThread(), 8);
    #endif
        // Do child thread work here.
        return 0;
    }

Creating Threads Well
    // Specify stack size on Xbox 360
    const int stackSize = 65536;
    // _beginthreadex initializes CRT
    HANDLE hThread = (HANDLE)_beginthreadex(0, stackSize,
        ThreadFunction, 0, 0, 0);
    // Do work on main thread here.
    // Wait for child thread to complete:
    // the correct way to wait for a thread to exit
    WaitForSingleObject(hThread, INFINITE);
    // Don't forget to close this when done with it
    CloseHandle(hThread);
    ...
    unsigned __stdcall ThreadFunction(void* data) {
    #ifdef XBOX
        // On Xbox 360 you must explicitly assign
        // software threads to hardware threads.
        XSetThreadProcessor(GetCurrentThread(), 2);
    #endif
        // Do child thread work here.
        return 0;
    }

Alternative: OpenMP
Available in VC++ 2005 (Windows and Xbox 360)
Simple way to parallelize loops and some other constructs
Works best on long symmetric tasks (particles?)
Game tasks are short, asymmetric
OpenMP is nice, but not ideal for games
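
A particle update is the kind of long, symmetric loop that OpenMP handles well. The sketch below is illustrative only: the `Particle` layout and the constant-gravity step are assumptions. If the compiler's OpenMP option is not enabled, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

struct Particle {
    float y;   // height
    float vy;  // vertical velocity
};

// Advances every particle by one time step. Iterations are independent,
// so the loop can be split across threads without any locking.
void integrate(std::vector<Particle>& particles, float dt) {
    // A signed int loop index is used because older OpenMP versions
    // require one.
    assert(particles.size() <= static_cast<std::size_t>(INT_MAX));
    const int count = static_cast<int>(particles.size());
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        particles[i].vy -= 9.8f * dt;
        particles[i].y += particles[i].vy * dt;
    }
}
```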

Available Synchronization Objects
Events
Semaphores
Mutexes
Critical Sections
Don't use SuspendThread()
    Some titles have used this for synchronization
    Can easily lead to deadlocks
    Interacts badly with Visual Studio debugger

Exclusive Access: Mutex
    // Initialize
    HANDLE mutex =
        CreateMutex(0, FALSE, 0);

    // Use
    void ManipulateSharedData() {
        WaitForSingleObject(mutex, INFINITE);
        // Manipulate stuff...
        ReleaseMutex(mutex);
    }

    // Destroy
    CloseHandle(mutex);

Exclusive Access: CRITICAL_SECTION
    // Initialize
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);

    // Use
    void ManipulateSharedData() {
        EnterCriticalSection(&cs);
        // Manipulate stuff...
        LeaveCriticalSection(&cs);
    }

    // Destroy
    DeleteCriticalSection(&cs);
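
The load-thread slides use a `Locker` object without showing its definition. A scope guard of that kind might look like the sketch below, which pairs every acquire with a release even on early return. It is written against `std::mutex` in portable C++11 as an assumption; a Win32 version would presumably wrap `EnterCriticalSection` and `LeaveCriticalSection` in the same way. `countWithLocker` is a hypothetical helper that exercises it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Acquires the lock on construction and releases it on destruction.
class Locker {
public:
    explicit Locker(std::mutex* m) : m_(m) {
        assert(m_ != nullptr);
        m_->lock();
    }
    ~Locker() { m_->unlock(); }
    Locker(const Locker&) = delete;
    Locker& operator=(const Locker&) = delete;
private:
    std::mutex* m_;
};

// Several threads increment one shared counter; the Locker scope is the
// critical section.
int countWithLocker(int threadCount, int incrementsPerThread) {
    std::mutex counterLock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&counterLock, &counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                Locker lock(&counterLock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```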

Lockless programming
Trendy technique to use clever programming to share resources without locking
Includes InterlockedXxx(), lockless message passing, Double Checked Locking, etc.
Very hard to get right:
    Compiler can reorder instructions
    CPU can reorder instructions
    CPU can reorder reads and writes
    InterlockedXxx is not a memory barrier on Xbox 360
Not as fast as avoiding synchronization entirely

Lockless Messages: Buggy
    void SendMessage(void* input) {
        // Wait for the message to be 'empty'.
        while (g_msg.filled)
            ;
        memcpy(g_msg.data, input, MESSAGESIZE);
        g_msg.filled = true;
    }

    void GetMessage() {
        // Wait for the message to be 'filled'.
        while (!g_msg.filled)
            ;
        memcpy(localMsg.data, g_msg.data, MESSAGESIZE);
        g_msg.filled = false;
    }
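
One way to repair the buggy message exchange is to make the `filled` flag an atomic and use acquire/release ordering, so that neither the compiler nor the CPU can move the `memcpy` across the flag accesses. This is a sketch in portable C++11, not the talk's own fix; the `Fixed` suffixes and the message size of 64 are assumptions, and it is only valid for exactly one sender and one receiver.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>
#include <thread>

const int MESSAGESIZE = 64;  // illustrative size

struct Message {
    char data[MESSAGESIZE];
    std::atomic<bool> filled{false};
};

Message g_msg;

// Sender: wait until the slot is empty, copy the payload, then publish.
void SendMessageFixed(const void* input) {
    assert(input != nullptr);
    while (g_msg.filled.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::memcpy(g_msg.data, input, MESSAGESIZE);
    // Release: the payload writes above become visible before the flag.
    g_msg.filled.store(true, std::memory_order_release);
}

// Receiver: wait until the slot is filled, copy the payload out, then
// hand the slot back to the sender.
void GetMessageFixed(void* output) {
    assert(output != nullptr);
    // Acquire: the payload reads below cannot move ahead of this load.
    while (!g_msg.filled.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::memcpy(output, g_msg.data, MESSAGESIZE);
    g_msg.filled.store(false, std::memory_order_release);
}
```

The loops still spin, so this remains a busy wait; it fixes the ordering problem, not the cost of waiting.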

Synchronization tips/costs:
Synchronization is moderately expensive when there is no contention
    Hundreds to thousands of cycles
Synchronization can be arbitrarily expensive when there is contention!
Goals:
    Synchronize rarely
    Hold locks briefly
    Minimize shared data
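
The three goals can be illustrated with a parallel sum in which each worker accumulates into a private local and takes the lock exactly once, to merge its result. This is a minimal sketch in portable C++11; `sumInParallel` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

long long sumInParallel(const std::vector<int>& values, int threadCount) {
    assert(threadCount > 0);
    const std::size_t n = values.size();
    const std::size_t chunk = (n + threadCount - 1) / threadCount;
    std::mutex totalLock;
    long long total = 0;  // the only shared, mutable data
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        const std::size_t begin =
            std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&values, &totalLock, &total, begin, end] {
            long long local = 0;  // private to this thread: no lock needed
            for (std::size_t i = begin; i < end; ++i) local += values[i];
            // One short critical section per thread, not one per element.
            std::lock_guard<std::mutex> lock(totalLock);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```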

Beware hidden synchronization:
Allocations are (generally) a synch point
    Consider per-thread heaps with no locking
        HEAP_NO_SERIALIZE flag avoids lock on Win32 heaps
    Consider custom single-purpose allocators
    Consider avoiding memory allocations!
Avoid synch in in-house profilers
D3DCREATE_MULTITHREADED causes synchronization on almost every Direct3D call
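
A per-thread allocator with no locking could be as simple as a bump arena stored in thread-local storage. The sketch below is an illustration in portable C++11, not a Win32 heap: `Arena`, `threadArena`, and the 64 KB capacity are assumptions, and memory is only reclaimed in bulk through `reset`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hands out memory by bumping an offset. No lock is needed because each
// arena is used by exactly one thread.
class Arena {
public:
    explicit Arena(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns aligned memory, or nullptr when the arena is exhausted.
    void* allocate(std::size_t bytes) {
        assert(bytes > 0);
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > storage_.size() || bytes > storage_.size() - start)
            return nullptr;
        used_ = start + bytes;
        return storage_.data() + start;
    }

    // Frees everything at once, for example at the end of a frame.
    void reset() { used_ = 0; }

private:
    std::vector<char> storage_;
    std::size_t used_;
};

// Each thread gets its own arena the first time it calls this.
Arena& threadArena() {
    thread_local Arena arena(64 * 1024);
    return arena;
}
```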

Profiling multi-threaded apps
Need thread-aware profilers
Profiling may hide many synchronization stalls
Home-grown spin locks make profiling harder
Consider instrumenting calls to synchronization functions
    Don't use locks in instrumentation: use TLS variables to store results
Windows: Intel VTune and the Visual Studio Team System Profiler
Xbox 360: PIX, XbPerfView, etc.
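
Instrumenting a synchronization call while keeping the results in thread-local storage might look like this. It is a sketch in portable C++11 using `thread_local` where the Win32 code would use `__declspec(thread)`; `LockStats`, `t_lockStats`, and `timedLock` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Per-thread record of lock activity. Because every thread has its own
// copy, updating it needs no synchronization.
struct LockStats {
    unsigned acquisitions = 0;
    std::chrono::steady_clock::duration waited{};
};

thread_local LockStats t_lockStats;

// Drop-in replacement for m.lock() that records how often the calling
// thread locked and how long it spent waiting.
void timedLock(std::mutex& m) {
    const auto start = std::chrono::steady_clock::now();
    m.lock();
    t_lockStats.waited += std::chrono::steady_clock::now() - start;
    ++t_lockStats.acquisitions;
}
```

Each thread would report or aggregate its own `t_lockStats` at a quiet point, such as the end of a frame or at thread exit.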

PIX timing capture
[Figure: screenshot of a PIX timing capture]

Naming Threads
    typedef struct tagTHREADNAME_INFO {
        DWORD dwType;     // must be 0x1000
        LPCSTR szName;    // pointer to name (in user addr space)
        DWORD dwThreadID; // thread ID (-1=caller thread)
        DWORD dwFlags;    // reserved for future use, must be zero
    } THREADNAME_INFO;

    void SetThreadName(DWORD dwThreadID, LPCSTR szThreadName) {
        THREADNAME_INFO info;
        info.dwType = 0x1000;
        info.szName = szThreadName;
        info.dwThreadID = dwThreadID;
        info.dwFlags = 0;
        __try {
            RaiseException(0x406D1388, 0, sizeof(info)/sizeof(DWORD),
                (DWORD*)&info);
        }
        __except(EXCEPTION_CONTINUE_EXECUTION) {
        }
    }

    SetThreadName(-1, "Main thread");

Other Ideas
Debugging tips for MT
    Visual Studio does support multi-threaded debugging
        Use threads window
        Use @hwthread in watch window on Xbox 360
    KD and WinDBG support multi-threaded debugging
Thread Local Storage (TLS)
    __declspec(thread) declares per-thread variables
        But doesn't work in dynamically loaded DLLs
    TlsAlloc is less efficient, less convenient, but works in dynamically loaded DLLs

Windows tips
Test on multiple machines and configurations
    Single-core, SMT (i.e. Hyper-Threading), dual-core, Intel and AMD chips, multi-socket multicore (4+ cores)

Windows API features
WaitForMultipleObjects
    Obviously better than a series of WaitForSingleObject calls
    The OS is highly optimized around multithreading and event-based blocking
I/O Completion Ports
    Very efficient way to have the OS assign a pool of worker threads to incoming I/O requests
    Useful construct for implementing a game server

SMT versus Multicore
OS returns number of logical processors in GetSystemInfo(), so a value of 2 could mean an SMT machine with only 1 actual core, or a machine with 2 cores
Detailed Win32 APIs exposing this distinction not available until Windows XP x64, Windows Server 2003 SP1, Windows Vista, etc.
    GetLogicalProcessorInformation()
For now you have to use CPUID, as detailed by Intel and AMD, to parse this out

Timing with Multiple Cores
RDTSC is not always synced between cores!
    As your thread moves from core to core, RDTSC counter deltas may be nonsense
CPU frequency itself can change at run-time through speed step technologies
    See Power Management APIs for more information
Best thing to do is use Win32 API QueryPerformanceCounter / QueryPerformanceFrequency
See DirectX SDK article Game Timing and Multiple Cores
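
The advice to time through an OS-provided counter instead of reading RDTSC directly can be sketched with a small frame timer. This uses `std::chrono::steady_clock` as a portable stand-in for QueryPerformanceCounter / QueryPerformanceFrequency, which is an assumption made so the example runs anywhere; `FrameTimer` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>

// Measures the time between successive tick() calls with a monotonic
// clock, so the result does not depend on which core the thread is on.
class FrameTimer {
public:
    FrameTimer() : last_(std::chrono::steady_clock::now()) {}

    // Returns seconds elapsed since the previous tick (or construction).
    double tick() {
        const auto now = std::chrono::steady_clock::now();
        const std::chrono::duration<double> delta = now - last_;
        last_ = now;
        assert(delta.count() >= 0.0);  // steady_clock never goes backwards
        return delta.count();
    }

private:
    std::chrono::steady_clock::time_point last_;
};
```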

Thread Micromanagement
Use SetThreadAffinityMask with caution!
    May be useful for assigning 'heavy' work threads
    This mask is technically a hint, not a commitment
    RDTSC-based instrumenting will require locking the game threads to a single core
    Otherwise let the Windows scheduler do the right thing
    CreateDevice/Reset might have a side-effect on the calling thread's affinity with software vertex processing enabled

Thread Micromanagement (cont)
Be careful about boosting thread priority
    If the priority is too high, you could cause the system to hang and become unresponsive
    If the priority is too low, the thread may starve

DLLs and Multithreading
DllMain for every DLL is informed of thread creation/destruction
    For some DLLs this is required to initialize TLS
    For many this is a waste of time, so call DisableThreadLibraryCalls() from your DllMain during process creation (DLL_PROCESS_ATTACH)
The OS serializes access to the entry point
    This means threads created during DllMain won't start for a while, so don't wait on them in the DLL startup

Resources
Multithreading Applications in Win32, Jim Beveridge & Robert Weiner, Addison-Wesley, 1997
Multiprocessor Considerations for Kernel-Mode Drivers
    http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc
Determining Logical Processors per Physical Processor
    http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/43842.htm
GetLogicalProcessorInformation
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getlogicalprocessorinformation.asp
Double checked locking
    http://en.wikipedia.org/wiki/Double-checked_locking

Resources
GDC 2006 Presentations
    http://msdn.com/directx/presentations
DirectX Developer Center
    http://msdn.com/directx
XNA Developer Center
    http://msdn.com/xna
Xbox Developer Center (Registered Devs Only)
    https://xds.xbox.com
XNA, DirectX, XACT Forums
    http://msdn.com/directx/forums
Email addresses
    [email protected] (DirectX Feedback)
    [email protected] (Xbox Developers Only)
    [email protected] (XNA Feedback)

© 2006 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.