Parallel Random Generator - GDC 2015


Parallel Random Generator Manny Ko Principal Engineer Activision

Outline

●Serial RNG

●Background

●LCG, LFG, crypto-hash

●Parallel RNG

●Leapfrog, splitting, crypto-hash

RNG - desiderata

● White noise like

● Repeatable for any # of cores

● Fast

● Small storage

RNG Quality

● DIEHARD

● Spectral test

● SmallCrush

● BigCrush

GPUBBS

Power Spectrum

Panels: power spectrum density, radial mean, radial variance

Serial RNG: LCG

● Linear-congruential (LCG)

● $X_i = (a X_{i-1} + c) \bmod M$

● a, c and M must be chosen carefully!

● Never choose $M = 2^{31}$! $M$ should be a prime

● Park & Miller: $a = 16807$, $m = 2147483647 = 2^{31} - 1$. $m$ is a Mersenne prime!

● Most likely in your C runtime
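A minimal Python sketch of the Park & Miller generator above, assuming the constants from the slide ($a = 16807$, $m = 2^{31} - 1$, $c = 0$). The names `lcg_next` and `lcg_stream` are illustrative and do not come from the talk.

```python
# Park & Miller "minimal standard" LCG: X_i = a * X_{i-1} mod m, with c = 0.
A = 16807          # multiplier from the slide
M = 2**31 - 1      # 2147483647, a Mersenne prime


def lcg_next(x):
    """Advance the generator by one step; x should be in [1, M - 1]."""
    return (A * x) % M


def lcg_stream(seed, count):
    """Return the `count` values that follow `seed` in the sequence."""
    out = []
    x = seed
    for _ in range(count):
        x = lcg_next(x)
        out.append(x)
    return out
```

Starting from seed 1, the first two values are 16807 and 282475249 (that is, $16807^2$). The whole state is the single integer `x`, which is the "single word of state" property discussed for LCGs.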

LCG: the good and bad

● Good:

● Simple and efficient even if we use mod

● Single word of state

● Bad:

● Short period – at most m

● Low bits are correlated, especially if $m = 2^n$

● Pure serial

LCG - bad

● $X_{k+1} = (3 X_k + 4) \bmod 8$

● {1,7,1,7, … }

Mersenne Prime modulo

● IDIV can be 40~80 cycles for 32b/32b

● $k \bmod p$ where $p = 2^s - 1$:

● i = (k & p) + (k >> s);

● return (i >= p) ? i - p : i;
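The division-free reduction above could look like this in Python. This is a sketch that assumes $0 \le k \le p^2$ (true for the product of two already-reduced values), so that a single fold is enough; `mod_mersenne` is an illustrative name.

```python
def mod_mersenne(k, s):
    """Compute k mod (2**s - 1) without a division, for 0 <= k <= p*p."""
    p = (1 << s) - 1
    # 2**s is congruent to 1 (mod p), so the high bits can be added onto the low bits.
    i = (k & p) + (k >> s)
    # After one fold i is at most 2p - 1, so one conditional subtract finishes the job.
    return i - p if i >= p else i
```

For the Park & Miller generator, `mod_mersenne(16807 * x, 31)` would replace `(16807 * x) % (2**31 - 1)`. The saving matters in C or shader code where integer division is expensive; in Python it only illustrates the arithmetic.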

Lagged-Fibonacci Generator

● $X_i = X_{i-p} \odot X_{i-q}$; $p$ and $q$ are the lags

● $\odot$ is +, -, or * mod $M$ (or XOR)

● ALFG: $X_n = (X_{n-j} + X_{n-k}) \bmod 2^m$

● * gives the best quality

● Period = $(2^p - 1) \cdot 2^{b-3}$; $M = 2^b$

● The good:

●Very efficient: 2 ops + power-of-2 mod

●Much longer period than LCG

●Works directly on floats

●Higher quality than LCG

●ALFG can skip ahead
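A minimal sketch of the additive variant (ALFG), $X_n = (X_{n-j} + X_{n-k}) \bmod 2^m$. The lag pair (5, 17) is an assumption chosen for illustration; the talk does not name specific lags. The function name `alfg` and its seeding interface are likewise hypothetical.

```python
from collections import deque


def alfg(seed_words, j=5, k=17, m_bits=32):
    """Yield X_n = (X_{n-j} + X_{n-k}) mod 2**m_bits, with lags j < k."""
    if len(seed_words) != k:
        raise ValueError("need exactly k seed words")
    if not any(w & 1 for w in seed_words):
        # If every seed word is even, the lowest bit stays 0 forever.
        raise ValueError("at least one seed word must be odd")
    mask = (1 << m_bits) - 1
    # Circular buffer of the last k values: state[-1] is X_{n-1}, state[0] is X_{n-k}.
    state = deque((w & mask for w in seed_words), maxlen=k)
    while True:
        x = (state[-j] + state[0]) & mask   # one add plus a power-of-2 "mod"
        state.append(x)                     # pushes out X_{n-k} on the left
        yield x
```

The buffer of `k` words is the storage cost of the lag: each output needs one add and one mask, but the state is `k` words rather than the single word of an LCG.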

LFG – the bad

● Need to store max(p,q) floats

● Pure sequential

● Multiplicative LFG can’t jump ahead.

Mersenne Twister

● Gold standard?

● Large state (624 ints)

● Lots of flops

● Hard to leapfrog

● Limited parallelism

power spectrum

● End of Basic RNG Overview

Parallel RNG

● Maintain the RNG’s quality

● Same result regardless of the # of cores

● Minimal state, especially for the GPU.

● Minimal correlation among the streams.

Random Tree

• 2 LCGs with different 𝑎

• L used to generate a seed for R

• No need to know the number of generators or the # of values per thread

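The random-tree idea can be sketched as follows, assuming both LCGs share the modulus $2^{31} - 1$ and have $c = 0$. The multiplier 16807 for L is the Park & Miller value; 48271 for R is an assumption (another widely used multiplier for the same modulus), and the function names are illustrative.

```python
M = 2**31 - 1
A_LEFT = 16807    # multiplier of L, the generator that hands out seeds
A_RIGHT = 48271   # multiplier of R, the per-thread generator (assumed value)


def spawn_seeds(root_seed, num_threads):
    """Use L to derive one starting seed per thread from a single root seed."""
    seeds = []
    x = root_seed
    for _ in range(num_threads):
        x = (A_LEFT * x) % M
        seeds.append(x)
    return seeds


def thread_stream(seed, count):
    """Use R to generate a thread's own values from its seed."""
    out = []
    x = seed
    for _ in range(count):
        x = (A_RIGHT * x) % M
        out.append(x)
    return out
```

The seed for thread `i` depends only on the root seed and `i`, so adding more threads or drawing more values per thread does not change what existing threads produce. Nothing in this construction guarantees that the per-thread streams never overlap.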

Leapfrog with 3 cores

• Each thread leaps ahead by $N$ using L

• Each thread uses its own R to generate its own sequence

• $N = \mathit{cores} \times \mathit{seq\_per\_core}$

Leapfrog

● basic LCG without c:

● $L_{k+1} = a L_k \bmod m$

● $R_{k+1} = a^n R_k \bmod m$

● LCG: $A = a^n$ and $C = c(a^n - 1)/(a - 1)$; each core jumps ahead by $n$ (# of cores)
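A sketch of leapfrogging for a general LCG, following the $A = a^n$, $C = c(a^n - 1)/(a - 1)$ formulas above. To avoid a modular division, `leapfrog_params` builds $A$ and $C$ by composing the base step $n$ times, which gives the same result. All function names here are hypothetical.

```python
def leapfrog_params(a, c, m, n):
    """Return (A, C) such that x -> (A*x + C) mod m equals n steps of x -> (a*x + c) mod m."""
    big_a, big_c = 1, 0
    for _ in range(n):
        # Apply one more base step on top of the accumulated map.
        big_a, big_c = (a * big_a) % m, (a * big_c + c) % m
    return big_a, big_c


def serial(a, c, m, seed, count):
    """Reference serial sequence X_1 .. X_count."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out


def leapfrog(a, c, m, seed, cores, per_core):
    """Core i produces X_{i+1}, X_{i+1+cores}, X_{i+1+2*cores}, ..."""
    big_a, big_c = leapfrog_params(a, c, m, cores)
    head = serial(a, c, m, seed, cores)   # core i starts at the i-th serial value
    streams = []
    for i in range(cores):
        x, vals = head[i], [head[i]]
        for _ in range(per_core - 1):
            x = (big_a * x + big_c) % m   # jump ahead by `cores` steps at once
            vals.append(x)
        streams.append(vals)
    return streams
```

Interleaving the per-core streams round-robin reproduces the serial sequence for a fixed core count, for example `[streams[i % 3][i // 3] for i in range(12)]` with 3 cores.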

Leapfrog with 3 cores

• Each sequence will not overlap

• Final sequence is the same as the serial code

Leapfrog – the good

● Same sequence as serial code

● Limited choice of RNG (e.g. no MLFG)

● No need to fix the # of random values used per core (need to fix ‘n’)

Leapfrog – the bad

● $a^n$ no longer has the good qualities of $a$

● A power-of-2 $N$ produces correlated sub-sequences

● Need to fix ‘n’ - # of generators/sequences

● The period of the original RNG is shortened by a factor of ‘n’. A 32-bit LCG has a short period to start with.

Sequence Splitting

• If we know the # of values per thread, $n$

• $L_{k+1} = a^n L_k \bmod m$

• $R_{k+1} = a R_k \bmod m$

• Each thread's sequence is a subset of the serial sequence
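Sequence splitting can be sketched as below for an LCG with $c = 0$: L advances by $a^n$ to find each thread's starting state, and R steps by $a$ inside the thread. `split_streams` is an illustrative name, and Python's three-argument `pow` is used for the modular power.

```python
def split_streams(a, m, seed, threads, n):
    """Give each thread a contiguous block of n values of X_{k+1} = a * X_k mod m."""
    jump = pow(a, n, m)   # a**n mod m: advances the generator by n steps at once
    streams = []
    start = seed
    for _ in range(threads):
        x, vals = start, []
        for _ in range(n):
            x = (a * x) % m          # R: the ordinary single-step generator
            vals.append(x)
        streams.append(vals)
        start = (jump * start) % m   # L: starting state for the next thread
    return streams
```

Concatenating the blocks in thread order gives the serial sequence, but only as long as each thread draws at most `n` values. A thread that draws more runs into its neighbor's block.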

Leapfrog and Splitting

● Only guarantees that the sequences do not overlap; says nothing about their quality

● Not invariant to degree of parallelism

● Results change when the # of cores changes

● Serial and parallel code do not match

Lagged-Fibonacci Leapfrog

● LFG has a very long period

● Period = $(2^p - 1) \cdot 2^{b-3}$; $M = 2^b$

● 𝑀 can be power-of-two!

● Much better quality than LCG

● No leapfrog for the best variant – ‘*’

● Luckily the ALFG supports leapfrogging

Issues with Leapfrog & Splitting

● LCG’s period gets even shorter

● Questionable quality

● ALFG is much better but has to store more state for the lags.

Crypto Hash

● MD5

● TEA: tiny encryption algorithm

Core Idea

1. input trivially prepared in parallel, e.g. linear ramp

2. feed input value into hash, independently and in parallel

3. output white noise

TEA

● A Feistel coder

● Input is split into L and R

● 128-bit key

● F: shift and XORs or adds

Magic ‘delta’

● $\mathit{delta} = (\sqrt{5} - 1) \cdot 2^{31}$

● Avalanche in 6 cycles (often in 4)

● * mixes better than ^ but makes TEA twice as slow
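A sketch of the hash-based approach using TEA: every index of a linear ramp is encrypted independently, so any core can compute any value without shared state. The encryption routine follows the published TEA algorithm; the key words, the `seed` parameter, the choice of 6 cycles for `white_noise`, and the function names are assumptions made for illustration.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9   # floor((sqrt(5) - 1) * 2**31)
# Arbitrary 128-bit key as four 32-bit words (an assumption, not from the talk).
KEY = (0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E)


def tea_encrypt(v0, v1, key=KEY, cycles=32):
    """Encrypt the 64-bit block (v0, v1) with TEA; one cycle is two Feistel rounds."""
    total = 0
    for _ in range(cycles):
        total = (total + DELTA) & MASK
        # F is built from shifts, adds and XORs; masking keeps 32-bit wraparound.
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def white_noise(index, seed=0, cycles=6):
    """Hash one ramp index into a float in [0, 1); each call is independent."""
    v0, _ = tea_encrypt(index & MASK, seed & MASK, cycles=cycles)
    return v0 / 2**32
```

Because `white_noise(i)` depends only on `i` and the seed, the output is the same for any number of cores and any evaluation order. Lowering `cycles` trades mixing quality for speed.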

Applications

Fractal terrain (vertex shader)

Texture tiling (fragment shader)

● SPRNG: a good package by Michael Mascagni

● http://www.sprng.org/

References

● [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999.

● [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988.

● [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994.

● [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008.

● [Wheeler 95] TEA, a Tiny Encryption Algorithm, 1995.

Take Aways

● Look beyond LCG

● ALFG is worth a closer look

● Crypto-based hash is most promising – especially TEA.
