
MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder

Date post: 05-Dec-2014
Upload: amd-developer-central
Description:
Presentation MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder at the AMD Developer Summit (APU13) November 11-13, 2013.

DESIGNING A GAME AUDIO ENGINE FOR HSA

LAURENT BETBEDER

SCEA


2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

WHAT’S SO SPECIAL ABOUT CONSOLE GAME DEV?

Extreme performance optimizations

‒ Until gamers opt for shorter upgrade cycles (phones/tablets business model)?

‒ Can’t run sub-optimal audio code when competing for cycles on crowded compute queues

Custom hardware, OS, drivers and compilers

‒ To extract max perf from fixed hardware

‒ Helps lengthen platform lifetime

‒ “But but… where’s my OpenCL runtime?”

Low latency

‒ Music games on consoles need it as much as professional music prod software on desktop

‒ But it is much harder to achieve reliably when a system is constantly overloaded

NOW THAT CONSOLES MOSTLY RUN PC HARDWARE


GAME AUDIO DSP ON THE ACP

Heavy specialized DSP workloads

‒ Stuff games need badly but don’t really want to deal with

‒ Best fit for dedicated and/or fixed function hardware

‒ Codecs

‒ CELP codecs -> party chat

‒ 100s of MP3/AT9/AAC decode instances

‒ Huge impact on game assets footprint, down/load times

‒ Optional output bitstream encoding (AC3/DTS)

‒ Voice recognition

‒ Echo cancellation

Platform wide IP licensing levels the playing field

‒ Good for indie developers

‒ And good for the platform!

Available via asynchronous secure system APIs

WHY?


GAME AUDIO DSP ON THE ACP

Exotic hardware and dev environment

‒ Closed to games

‒ Closed to middleware

‒ Platform specific

Asynchronous interface

‒ Can’t have sequential interleaving of DSP back and forth between CPU and ACP w/o latency buildup

‒ But ultimately, we want the DSP pipeline to be data driven (by artists who know nothing about this)

‒ Modularity

Slow clock rate @ 800MHz, very limited SIMD and no FP support

‒ Tough sell against Jaguar for many DSP algorithms

‒ Very tight local memory shared by multiple DSP cores

Already pretty busy with codec loads and system tasks

WHY NOT?
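
The latency buildup named under "Asynchronous interface" can be illustrated with simple arithmetic. This is a minimal sketch, assuming (hypothetically) that every hand-off across the asynchronous boundary costs one full audio buffer of delay. The buffer size of 256 frames, the 48 kHz sample rate, and the function name are illustrative assumptions, not figures from the talk.

```python
def pipeline_latency_ms(handoffs, frames_per_buffer=256, sample_rate_hz=48000):
    """Latency added by a chain of asynchronous CPU <-> ACP hand-offs.

    Assumption (not from the talk): each hand-off delays the signal by one
    whole buffer, because the consumer only sees a buffer after the
    producer has finished writing it.
    """
    buffer_ms = 1000.0 * frames_per_buffer / sample_rate_hz
    return handoffs * buffer_ms


# A DSP chain that bounces CPU -> ACP -> CPU -> ACP -> CPU crosses the
# boundary four times, so it pays four buffers of delay in this model.
for handoffs in (1, 2, 4):
    print(handoffs, "hand-offs:", round(pipeline_latency_ms(handoffs), 2), "ms")
```

In this model the delay grows linearly with the number of crossings, which is why a freely interleaved, artist-driven DSP chain is a poor match for an asynchronous co-processor.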


GAME AUDIO DSP ON THE GPU

Much more demand for real-time effects today and will keep growing

CPU FLOPS likely to stagnate and could even decline in HSA as CUs take over SIMD workloads

Flexibility: some games are CPU bound, others are GPU bound…

hUMA is a game changer (removes NUMA’s main bottleneck: GPU write back)

Compute queues with prioritized scheduling and even some form of preemption

Many real-time audio DSP algorithms work well on wide SIMD units

‒ FFT convolution (spectral processing in general)

‒ Mixing, resampling, wave shaping, etc…

Mostly coalesced mem accesses

Low/med bandwidth (< 1GB/s)

WHY?
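
Mixing is the simplest case of a SIMD-friendly workload from the list above: every output sample depends only on input samples at the same index. The sketch below shows that structure in plain Python and adds a back-of-the-envelope bandwidth estimate. The voice count, 48 kHz rate, 32-bit samples, and both function names are assumptions chosen for illustration.

```python
def mix_voices(voices, gains):
    """Sum gain-scaled voices into one output buffer.

    Each output index is independent of every other index, so on a GPU
    each index could be handled by its own SIMD lane.
    """
    num_samples = len(voices[0])
    out = [0.0] * num_samples
    for voice, gain in zip(voices, gains):
        for i in range(num_samples):
            out[i] += gain * voice[i]
    return out


def stream_bandwidth_bytes_per_s(num_voices, sample_rate_hz=48000, bytes_per_sample=4):
    """Read bandwidth for mono voices; the default values are assumptions."""
    return num_voices * sample_rate_hz * bytes_per_sample


mixed = mix_voices([[1.0, 0.5, -0.5], [0.0, 0.5, 0.5]], gains=[0.5, 1.0])
print(mixed)
print(stream_bandwidth_bytes_per_s(512) / 1e9, "GB/s")
```

Under these assumed figures, 512 mono voices read roughly 0.1 GB/s, which is in line with the "low/med bandwidth" bullet.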


GAME AUDIO DSP ON THE GPU

Some algorithms do not work (as) well on wide SIMD units

‒ IIR filters, ADPCM decodes, dynamics: data recursion causes thread interdependencies within wavefronts

‒ Typical AAA game runs 1000s of biquads at various stages in the filtergraph

Workloads may require batch voice processing to achieve high CU efficiency

‒ Build 2D grids (channels x samples) or 3D grids (channels x subbands x samples)

‒ Swizzling is key but watch out for runtime cost as SIMD widens (static vs dynamic)

Batch processing goes against free form MaxMSP model artists are pushing for

‒ Unique DSP chain for each sound “just because we can!”

‒ Data driven filtergraph and DSP pipeline

Complex prioritized scheduling & dispatching compute queues

‒ Do not prevent intermittent CU saturation caused by large graphics workloads

‒ Risky for low latency direct path audio DSP

Proprietary hardware, drivers and shader compilers (PSSL)

‒ Audio middleware will need some incentive to move up there

‒ Most will probably stay on the CPU

WHY NOT?
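
The data-recursion problem and the batching workaround from this slide can be sketched together. A biquad computes each output from the two previous outputs, so consecutive samples of one voice cannot be spread across the lanes of a wavefront. Batching many voices into a 2D grid and swizzling it to sample-major order lets each lane own one voice instead. The code below is a plain-Python illustration, not the engine's implementation: the function names are hypothetical, the filter uses direct form I with a0 normalized to 1, and the inner loop over voices stands in for the SIMD lanes.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """Direct form I biquad: y[n] needs y[n-1] and y[n-2] (serial in time)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out


def swizzle(grid):
    """Transpose a 2D grid, e.g. [voice][sample] <-> [sample][voice]."""
    return [list(row) for row in zip(*grid)]


def biquad_batch(frames, coeffs):
    """Run one biquad per voice on a sample-major grid.

    The outer loop over time stays serial. The inner loop over voices has
    no dependencies between iterations, so it maps onto SIMD lanes.
    """
    n = len(coeffs)
    x1, x2, y1, y2 = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    out = []
    for frame in frames:
        row = []
        for v in range(n):
            b0, b1, b2, a1, a2 = coeffs[v]
            y0 = b0 * frame[v] + b1 * x1[v] + b2 * x2[v] - a1 * y1[v] - a2 * y2[v]
            x2[v], x1[v] = x1[v], frame[v]
            y2[v], y1[v] = y1[v], y0
            row.append(y0)
        out.append(row)
    return out


# Two voices with different filters: a one-pole decay and a 2-tap average.
voices = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
coeffs = [(1.0, 0.0, 0.0, -0.5, 0.0), (0.5, 0.5, 0.0, 0.0, 0.0)]
serial = [biquad(v, *c) for v, c in zip(voices, coeffs)]
batched = swizzle(biquad_batch(swizzle(voices), coeffs))
print(batched == serial)
```

Here the swizzle is done once per batch. When each sound has its own unique DSP chain, the grid has to be rebuilt dynamically, which is the runtime cost the slide warns about.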


GAME AUDIO DSP ON JAGUAR

Well known and open x64 dev environment

‒ Middleware friendly

‒ CLANG/LLVM solid & stable

Full FP unit with SSE4 support

Early PA is surprisingly good for compiled intrinsics code

‒ ~10% slower than Core i7 @ same clock rate

‒ GDDR5 latency is not an issue

‒ < ~50% of 1 core @ 1.6GHz running the entire KZSF filtergraph

Only reliable solution for ultra low latency

‒ Music and rhythm games

‒ Run 100% on CPU (including decoding)

WHY?
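
To put the "< ~50% of 1 core @ 1.6GHz" figure in perspective, it can be converted into a cycle budget per audio buffer. The sketch below is only arithmetic on that figure. The 256-frame buffer and the 48 kHz sample rate are assumptions, since the talk does not state the buffer size or rate used.

```python
def cycles_per_buffer(clock_hz=1.6e9, core_fraction=0.5,
                      frames_per_buffer=256, sample_rate_hz=48000):
    """CPU cycles available per audio buffer for a given share of one core.

    clock_hz and core_fraction come from the slide; the buffer size and
    sample rate are illustrative assumptions.
    """
    cycles_per_second = clock_hz * core_fraction
    buffers_per_second = sample_rate_hz / frames_per_buffer
    return cycles_per_second / buffers_per_second


budget = cycles_per_buffer()
print(round(budget), "cycles per buffer")
print(round(budget / 256), "cycles per sample frame")
```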


GAME AUDIO DSP ON JAGUAR

“Weak laptop CPU” compared to top of the line on desktop

‒ No FMA4

‒ Slow clock @ 1.6GHz (compared to typical desktop)

256bit AVX mostly useless

Possible bottleneck down the line

WHY NOT?


GAME ENGINE CODE

3D audio

‒ Sound emitters (distance, directionality and size modeling)

‒ Sound listeners (mic and ear modeling)

‒ Sound geometry (collision meshes)

‒ Deeper physical modeling of sound propagation

‒ Simple ray casting (occlusion, obstruction, indirect audio)

‒ Advanced ray casting (diffraction, real-time individual early reflection tracking)

Physics

‒ Rigid body dynamics (collisions, friction, destruction)

‒ Fluid dynamics (turbulence)

Animation, special FX

‒ Inline audio sequencing and modulation

‒ Foley, coarse granular synthesis

THIN COMPUTE
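
The emitter, listener, and simple ray-casting items above can be sketched as a small per-emitter gain computation. This is an illustration under stated assumptions, not the engine's method: spheres stand in for collision meshes, attenuation follows a plain inverse-distance law, and an occluded path is scaled by an arbitrary factor of 0.25. All function names and parameters are hypothetical.

```python
import math


def distance_gain(emitter, listener, ref_distance=1.0):
    """Inverse-distance attenuation, clamped to 1.0 inside ref_distance."""
    d = math.dist(emitter, listener)
    return ref_distance / max(d, ref_distance)


def segment_hits_sphere(start, end, center, radius):
    """True if the straight segment start -> end passes through the sphere."""
    seg = [e - s for s, e in zip(start, end)]
    to_center = [c - s for s, c in zip(start, center)]
    seg_len_sq = sum(c * c for c in seg)
    if seg_len_sq == 0.0:
        t = 0.0
    else:
        # Parameter of the point on the segment closest to the center.
        t = sum(a * b for a, b in zip(seg, to_center)) / seg_len_sq
        t = min(1.0, max(0.0, t))
    closest = [s + t * c for s, c in zip(start, seg)]
    return math.dist(closest, center) <= radius


def emitter_gain(emitter, listener, occluders, occluded_gain=0.25):
    """Distance gain, reduced when any occluder blocks the direct path."""
    gain = distance_gain(emitter, listener)
    if any(segment_hits_sphere(emitter, listener, c, r) for c, r in occluders):
        gain *= occluded_gain
    return gain


listener = (0.0, 0.0, 0.0)
emitter = (4.0, 0.0, 0.0)
wall = ((2.0, 0.0, 0.0), 0.5)      # sits on the direct path
pillar = ((2.0, 3.0, 0.0), 1.0)    # off to the side
print(emitter_gain(emitter, listener, [pillar]))
print(emitter_gain(emitter, listener, [wall, pillar]))
```

Each emitter is evaluated independently, so many such rays can be cast in parallel, which is what makes this kind of work a candidate for thin compute on the GPU.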


CONCLUSIONS

HSA + hUMA is a great combo for high perf game audio!

‒ Maximized perf per W from specialized hardware (CPU + GPU + ACP)

‒ Our challenge is to figure out what to run where and when

ACP is a great fit for codecs and OS services

‒ But not for modular synthesis and highly customized DSP pipelines

GPU is a great fit for mid/high latency DSP and high level 3D thin compute

‒ Indirect (reflected) audio

‒ Convolution reverb

‒ 3D ray casting for occlusion/obstruction/diffraction

CPU is still the best fit for everything else:

‒ Open modular synthesis frameworks and middleware

‒ Low latency audio


AUDIO SYNTHESIZER SCHEDULING IN HSA


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.