Post on 01-Aug-2020
transcript
Interpreting a genetic programming population with nVidia C++ CUDA
W. B. Langdon
CREST lab, Department of Computer Science

Introduction
• General Purpose use of GPU (GPGPU) and why we care
• Genetic Programming (GP).
• Running many programs on graphics hardware designed for a single program operating on many data in parallel (SIMD).
• Simultaneously running ¼ million programs
• Actual speed 212 billion GP ops/second
• Lessons
W. B. Langdon, King's London

Why Interest in Graphics Cards
• Speed
  – 800 0.8 GHz CPUs
  – Even with multi-threading, the off-chip memory bottleneck makes it difficult to keep the CPUs busy
• Future speed
  – Faster than Moore's law
  – nVidia and AMD/ATI claim doubling every 12 months

Genetic Programming = Survival of the Fittest programs
• A Darwinian population of programs evolves under "natural selection"
  – Fitness is determined by running them
  – Better programs are selected to be parents
  – Child programs created by sexual recombination
  – Repeat until solution found
[Figure: tree for (A-10)*B]

General Purpose GPU Software Options
• Microsoft Research windows/DirectX [2007]
• BrookGPU stanford.edu
• GPU specific assemblers
• nVidia CUDA [EuroGP 2008]
• nVidia Cg [GECCO 2007]
• PeakStream
• Sh no longer active. Replaced by RapidMind [EuroGP 2008]
• OpenCL
Most software aimed at graphics. Interest in using them (and CELL processors, XBox, PS3, game consoles) for general purpose computing.

Missing, Future GPGPU
• Untimely death of tools.
  – Tools no longer supported
  – Tools superseded or become more commercial
  – Hardware rapid turn over
• The "other" GPU manufacturer AMD/ATI
• OpenCL
• Complete GP on GPU (small gain?)
• Applications with fitness evaluation on GPU
• Improved debug and performance monitoring

nVidia G80 Hardware
• Connection to host PC computer
• Memory hierarchy
  – On chip: Registers, shared, constants
  – Off chip: Global and "local"
• Scheduling threads
• How many threads?

Early nVidia t10p Tesla
• 192 Stream Processors
• Clock 1.08 GHz
• < Tflop (max!)
• 1 GByte
• 10½ × 4⅜ inches
Available: 240 stream processors, 1.5 GHz; 4 together, 16 GBytes

Tesla chip connections
[Figure: Linux PC connected to the Tesla card. Memory, GPU chip, etc. All on one card]

CUDA data memory hierarchy
[Figure: memory hierarchy of the GPU, attached to the Linux PC]
Each stream processor has its own registers.
24 SP share 16k. Read/write contention delays threads.
64k can be read by all 8 (10) blocks of SP without contention delays.
Both CUDA "local" and "global" variables are off chip. Latency is hundreds of times more than on chip. Must have thousands of threads to keep SP busy.
Programmer responsible for dividing memory between threads and for synchronisation.
Role of caches unclear.

Mega Threading
Each block of 24 stream processors runs up to 24 threads of the same program.
Each thread executes the same instruction.
When the program branches, some threads advance and others are held. Later the other branches are run to catch up.
If a thread is blocked waiting for off chip memory, another set of threads can be started.
New threads could be from another program.

Performance v threads
[Figure: speed (log scale) against number of threads (log scale)]

Performance v threads 2
• Graph emphasises the importance of using many threads (minimum 4000).
• When a thread stalls because it is waiting for off chip memory, another thread is automatically scheduled if ready. Thus the bottleneck of access to the GPU's main memory can be overcome.
• At least 20 threads per stream processor
  – 57 × 10⁹ GP op/sec with 12 threads per SP
  – 88 × 10⁹ GP op/sec with 15 threads per SP

Experiments
• Interprets 88 billion GP primitives per second (212 sustained peak)
• How?
  – each integer contains 32 Booleans
  – randomised test case selection
  – simple CUDA reverse polish interpreter
• 20 mux solved
• 37 mux solved. 137 billion test cases

Boolean Multiplexor
d = 2^a
n = a + d
Num test cases = 2^n
20-mux: 1 million test cases
37-mux: 137 × 10⁹ tests

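The multiplexor sizes quoted above follow from the number of address bits a. A minimal host-side C++ sketch of that arithmetic is given here; the function names are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (names are not from the slides): sizes of a Boolean
// multiplexor with `a` address bits.

// d = 2^a data inputs.
std::uint64_t mux_data_bits(unsigned a) {
    assert(a <= 5);  // keeps n = a + 2^a below 64, so 2^n fits in 64 bits
    return std::uint64_t{1} << a;
}

// n = a + d inputs in total.
std::uint64_t mux_inputs(unsigned a) { return a + mux_data_bits(a); }

// 2^n distinct test cases.
std::uint64_t mux_test_cases(unsigned a) { return std::uint64_t{1} << mux_inputs(a); }
```

With a = 4 this gives n = 20 and 1 048 576 test cases (about 1 million); with a = 5 it gives n = 37 and 137 438 953 472 test cases (about 137 × 10⁹).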
Submachine Code GP
• int contains 32 bits. Treat each as Boolean.
• 32 Boolean (and, or, nor, if, not etc) done simultaneously.
• Load 32 test cases
  – D0 = 01010101…
  – D1 = 00110011…
  – DN = 00000000… or 11111111…
• Check 32 answers simultaneously
• CPU speed up 24 fold (32 bits), 60 fold (64 bits)

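The bit patterns D0, D1, … DN can be generated rather than stored. The following host-side C++ sketch assumes (this is not stated on the slide) that test case t gives input i the value of bit i of t, and that bit j of a word belongs to test case 32×block + j, so the patterns read least significant bit first. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Word holding input `var` for the 32 consecutive test cases
// 32*block .. 32*block+31 (assumed layout, see the text above).
std::uint32_t input_word(unsigned var, std::uint64_t block) {
    assert(var < 64);
    // Inputs 0..4 vary inside the word: 0101..., 0011..., 00001111..., etc.
    static const std::uint32_t low[5] = {0xAAAAAAAAu, 0xCCCCCCCCu, 0xF0F0F0F0u,
                                         0xFF00FF00u, 0xFFFF0000u};
    if (var < 5) return low[var];
    // Higher inputs are constant across the word: all zeros or all ones.
    return ((block >> (var - 5)) & 1u) ? 0xFFFFFFFFu : 0u;
}

// 32 two-input NORs evaluated with two bitwise machine operations.
std::uint32_t nor32(std::uint32_t x, std::uint32_t y) { return ~(x | y); }

// Number of the 32 test cases on which `got` agrees with `want`.
unsigned hits32(std::uint32_t got, std::uint32_t want) {
    std::uint32_t same = ~(got ^ want);  // 1 bit wherever the answers agree
    unsigned n = 0;
    for (; same != 0; same &= same - 1) ++n;  // clear the lowest set bit
    return n;
}
```

One call to `nor32` therefore scores 32 test cases at once, and `hits32` counts how many of the 32 answers are correct.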
Randomised Samples
• 20-mux: 2048 of 1 million (2 × 10⁻³)
• 37-mux: 8192 of 137 billion (6 × 10⁻⁸)
• Same tests for all four programs in each selection tournament
• New tests for new generation and each tournament
• (Statistical significance test not needed)

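The sampled fractions can be checked with a few lines of host-side C++. This is only an arithmetic sketch; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fraction of the 2^n_inputs possible test cases covered by `sampled` tests.
double sample_fraction(std::uint64_t sampled, int n_inputs) {
    assert(n_inputs >= 0 && n_inputs < 64);
    return static_cast<double>(sampled) / std::ldexp(1.0, n_inputs);  // sampled / 2^n
}
```

For the 20-mux, 2048/2²⁰ = 2⁻⁹, roughly 2 × 10⁻³. For the 37-mux, 8192/2³⁷ = 2⁻²⁴, roughly 6 × 10⁻⁸.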
Single Instruction Multiple Data
• GPU designed for graphics
• Same operation done on many objects
  – Eg appearance of many triangles, different shapes, orientations, distances, surfaces
  – One program, many data → Simple (fast) parallel data streams
• How to run many programs on SIMD computer?

Interpreting many programs simultaneously
• Can compile GP for GPU on host PC. Then run one program on multiple data (training cases).
• Avoid compilation by interpreting tree
• Run single SIMD interpreter on GPU on many trees.

GPU Genetic Programming Interpreter
• Programs wait for the interpreter to offer an instruction they need evaluating.
• For example an addition.
  – When the interpreter wants to do an addition, everyone in the whole population who is waiting for addition is evaluated.
  – The operation is ignored by everyone else.
  – They then individually wait for their next instruction.
• The interpreter moves on to its next operation.
• The interpreter runs round its loop until the whole population has been interpreted.

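The "offer an instruction" view of the interpreter can be emulated sequentially on the host. The sketch below is not the CUDA kernel from the talk: the opcode set, the integer arithmetic and all names are assumptions chosen only to show the control flow, in which each offered operation is executed by the programs waiting for it and ignored by the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-byte opcodes for this sketch only.
enum Op : std::uint8_t { PUSH_X, PUSH_Y, ADD, MUL, HALT };

// Each program is a reverse polish byte string terminated by HALT.
// Returns the value left on each program's stack.
std::vector<int> interpret_population(
        const std::vector<std::vector<std::uint8_t>>& programs, int x, int y) {
    const std::size_t n = programs.size();
    std::vector<std::size_t> pc(n);        // next instruction each program waits for
    std::vector<std::vector<int>> stack(n);
    bool anyone_waiting = true;
    while (anyone_waiting) {               // the interpreter runs round its loop
        anyone_waiting = false;
        for (int offered = PUSH_X; offered < HALT; ++offered) {
            for (std::size_t p = 0; p < n; ++p) {
                assert(pc[p] < programs[p].size());  // programs must end in HALT
                const int wanted = programs[p][pc[p]];
                assert(wanted <= HALT);
                if (wanted == HALT) continue;        // this program has finished
                anyone_waiting = true;
                if (wanted != offered) continue;     // ignores the offered operation
                std::vector<int>& s = stack[p];
                if (offered == PUSH_X) s.push_back(x);
                else if (offered == PUSH_Y) s.push_back(y);
                else {
                    assert(s.size() >= 2);
                    const int b = s.back(); s.pop_back();
                    const int a = s.back(); s.pop_back();
                    s.push_back(offered == ADD ? a + b : a * b);
                }
                ++pc[p];                             // now waits for its next instruction
            }
        }
    }
    std::vector<int> results;
    for (std::size_t p = 0; p < n; ++p) {
        assert(stack[p].size() == 1);
        results.push_back(stack[p].back());
    }
    return results;
}
```

On the GPU the inner loop over programs is what the hardware runs in parallel; here it is an ordinary loop so that the behaviour can be checked on a CPU.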
Representing the Population
• Data is pushed onto stack before operations pop them (i.e. reverse polish. x+y → xy+)
• The tree is stored as linear expression in reverse polish.
• Same structure on host as GPU.
  – Avoid explicit format conversion when population is loaded onto GPU.
• Genetic operations act on reverse polish:
  – random tree generation (eg ramped-half-and-half)
  – subtree crossover
  – 2 types of mutation
• Requires only one byte per leaf or function.
  – So large populations (millions of individuals) are possible.

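Because each tree is a linear reverse polish string, a subtree is a contiguous run of bytes, which is what lets crossover act on the representation directly. The following C++ sketch shows one common way to locate a subtree and splice it; the character alphabet and function names are assumptions for illustration, not the encoding used in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch alphabet (an assumption): one byte per symbol, letters are leaves,
// '+', '-' and '*' are binary functions.
int arity(char symbol) {
    return (symbol == '+' || symbol == '-' || symbol == '*') ? 2 : 0;
}

// Index of the first symbol of the subtree whose root is at index `root`.
std::size_t subtree_start(const std::string& rpn, std::size_t root) {
    assert(root < rpn.size());
    std::size_t missing = 1;  // subtrees still to be completed, scanning leftwards
    std::size_t i = root;
    for (;;) {
        missing += arity(rpn[i]);  // a function needs its arguments to its left
        --missing;                 // this symbol completes one subtree root
        if (missing == 0) return i;
        assert(i > 0);             // otherwise rpn is not a well formed expression
        --i;
    }
}

// Child made by replacing the subtree of `mum` rooted at `mum_root`
// with the subtree of `dad` rooted at `dad_root`.
std::string subtree_crossover(const std::string& mum, std::size_t mum_root,
                              const std::string& dad, std::size_t dad_root) {
    const std::size_t m0 = subtree_start(mum, mum_root);
    const std::size_t d0 = subtree_start(dad, dad_root);
    return mum.substr(0, m0) + dad.substr(d0, dad_root - d0 + 1) +
           mum.substr(mum_root + 1);
}
```

For example, with mum "ab-c*", that is (a-b)×c, replacing the subtree rooted at the "-" by the whole of dad "xy+" yields "xy+c*", that is (x+y)×c.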
Reverse Polish Interpreter
(A-10)×B ≡ A 10 - B ×
Variable: push onto stack
Function: pop arguments, do operation, push result
1 stack per program. All stacks in shared memory.

RPN interpreter
    int SP = 0;                               // stack pointer
    for(unsigned int PC = 0;; PC++){          // PC = position in the RPN program
      // Read OPCODE from global/constant memory
      const int type = OPCODE>>5;
      if(OPCODE==OPNOP) break;                // end of program
      if(type==leaf) push(trainingdata);
      else { //function
        const unsigned int sp1 = stack(SP-1); // pop two arguments
        const unsigned int sp2 = stack(SP-2);
        SP -= 2;
        if(type==OPAND) push(AND(sp1,sp2));
        else if(type==OPOR) push( OR(sp1,sp2));
        // NAND and NOR invert the result just pushed
        if(OPCODE==OPNAND || OPCODE==OPNOR)
          stack(SP-1) = ~stack(SP-1);
      }
    }

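To make the interpreter loop concrete, here is a host-side C++ sketch that can be compiled and run on a CPU. It is not the CUDA kernel from the talk. The byte encoding (top 3 bits give the type, low 5 bits give the input index for a leaf or an invert flag for a function) and every name are assumptions made for illustration; each stack entry is a 32 bit word holding the answers for 32 test cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding for this sketch: type = opcode >> 5.
constexpr unsigned T_LEAF = 0, T_AND = 1, T_OR = 2;
constexpr std::uint8_t OP_AND  = T_AND << 5;
constexpr std::uint8_t OP_NAND = (T_AND << 5) | 1;  // low bit set: invert result
constexpr std::uint8_t OP_OR   = T_OR << 5;
constexpr std::uint8_t OP_NOR  = (T_OR << 5) | 1;
constexpr std::uint8_t OP_NOP  = 0xFF;              // end of program

// Opcode that pushes input number `input` (0..31).
std::uint8_t op_leaf(unsigned input) {
    assert(input < 32);
    return static_cast<std::uint8_t>(input);
}

// Runs one reverse polish program on 32 test cases at once.
// inputs[i] holds input i for the 32 test cases, one test case per bit.
std::uint32_t run_rpn(const std::vector<std::uint8_t>& code,
                      const std::vector<std::uint32_t>& inputs) {
    std::vector<std::uint32_t> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        const std::uint8_t opcode = code[pc];
        if (opcode == OP_NOP) break;
        const unsigned type = opcode >> 5;
        if (type == T_LEAF) {
            stack.push_back(inputs.at(opcode & 31u));     // variable: push
        } else {
            assert(type == T_AND || type == T_OR);
            assert(stack.size() >= 2);
            const std::uint32_t a = stack.back(); stack.pop_back();
            const std::uint32_t b = stack.back(); stack.pop_back();
            std::uint32_t r = (type == T_AND) ? (a & b) : (a | b);
            if (opcode & 1u) r = ~r;                      // NAND or NOR
            stack.push_back(r);                           // function: push result
        }
    }
    assert(stack.size() == 1);
    return stack.back();
}
```

With inputs 0xAAAAAAAA and 0xCCCCCCCC, the program "input 0, input 1, NAND" returns 0x77777777, which is the NAND of the two inputs on all 32 test cases.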
Validation
• Solutions run on all test cases in GPU.
• Evolved 20-mux and 37-mux expressions converted to C code, compiled and run against all tests

Performance v RPN size
[Figure: interpreter performance against RPN program size]

Performance
• nVidia early engineering sample (192 SP)
• 88 × 10⁹ GP operations/second (peak 212)
• In validation step get big improvement (88 × 10⁹ → 212 × 10⁹ GP ops) by using "constant" memory
• 100 times [CIGPU 2008]
• hardware similar nVidia GeForce 8800 GTX (128 SP)

Lessons
• Computation is cheap. Data is expensive.
• Suggest interpreting GP trees on the GPU is dominated by leaves:
  – since there are lots of them and typically they require data transfers across the GPU.
  – adding more functions will slow interpreter less than might have been expected.
• To get the best out of the GPU it needs to be given large chunks of work to do:
  – Aim for at least one second
  – GeForce: more than 10 seconds and Linux dies
    • Solved by not using GPU as main video interface??
  – Less than 1 millisec: Linux task switching dominates
• Poor debug, performance tools

Discussion
• Interpreter faster than compiled GP
  – However using modest number of test cases (8192)
• 32/64-bit suitable for Boolean problems. Also used in regression problems (8 bit resolution), graphics and optical character recognition (OCR)
• Speed up due to 32 bits and CUDA
• Main bottleneck is access to GPU's main memory. But GP pop allows many threads.
• No on-chip local arrays; stack in shared memory
  – Limits number of threads to 240.

Conclusions
• GPU offers huge power on your desk
• Interpreted genetic programming (GP) can effectively use graphics cards and Tesla
• 88 billion GP operations per second (cf. 0.8 at CIGPU-2008)
• Tesla first to solve two GP benchmarks
  – 20 mux solved (<1 hour v. >4 years)
  – 37 mux solved. 137 billion test cases. <1 day
• Code via FTP
Technical report TR-09-05

END

• Movies of evolving populations
  – Evolving π
    http://www.cs.ucl.ac.uk/staff/W.Langdon/pi_movie.gif
    http://www.cs.ucl.ac.uk/staff/W.Langdon/pi2_movie.html
  – Evolving Protein Prediction, Pop=Million, 1000 gens
    http://www.cs.mun.ca/~blangdon/gpu_gp_slides/nuclear.gif
• Other useful web sites
  – GP bibliography
  – gpgpu.org
  – GPgpgpu.com
  – nvidia.com/cuda

Speed of GPU interpreter, GeForce 8800 GTX

Experiment    Number of Terminals  |F|  Population  Program size  Stack depth  Test cases      Speed (million OPs/sec)
Mackey-Glass  8+128                4      204 800   11.0          4            1200             895
Mackey-Glass  8+128                4      204 800   13.0          4            1200            1056
Protein       20+128               4    1 048 576   56.9          8            200              504
Laser a       3+128                4       18 225   55.4          8            151 360          656
Laser b       9+128                4        5 000   49.6          8            376 640          190
Cancer        1 013 888+1001       4    5 242 880   ≤15.0         4            128              535
GeneChip      47+1001              6       16 384   ≤63.0         8            ⅓M, sample 200   314

CUDA 2.8 billion
[2009] 3.8 billion
Compiled on 16 mac 4.2 billion (100 × 10⁶ data points)

Examples
• Approximating Pi
• Chaotic Time Series Prediction
• Mega population. Bioinformatics protein classification
• Is protein nuclear based on num of 20 amino acids
• Predicting Breast Cancer fatalities
• HG-U133A/B probes → 10 year outcome
• Predicting problems with DNA GeneChips
• HG-U133A correlation between probes in probesets → MM, A/G ratio and A×C