©2003 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice.

Performance Measurements of a User-Space DAFS Server with a Database Workload
Samuel A. Fineberg and Don Wilson
NonStop Labs
August 27, 2003, Fineberg and Wilson, NICELI Presentation

Outline
• Background on DAFS and ODM
• Prototype client and server
• I/O tests performed
• Raw benchmark results
• Oracle TPC-H results
• Summary and conclusions

What is the Direct Access File System (DAFS)?
• Created by the DAFS Collaborative
  – Group consisting of over 80 members from industry, government, and academic institutions
  – DAFS 1.0 spec was approved in September 2001
• DAFS is a distributed file access protocol
  – Data requested from files, not blocks
  – Based loosely on NFSv4
• Optimized for local file sharing environments
  – Systems are in relatively close proximity
  – Connected by a high-speed, low-latency network
• Built on top of direct-access transport networks
  – Initially targeted at Virtual Interface Architecture (VIA) networks
  – Direct Access Transport (DAT) API was later generalized and ported to other networks (e.g., InfiniBand, iWARP)

Characteristics of a Direct Access Transport
• Connected model, i.e., VIs must be connected before communication can occur
• Two forms of data transport
  – Send/receive: two-sided
  – RDMA read and write: one-sided
• Both transports support direct data placement
  – Receives must be pre-posted
• Memory regions must be “registered” before they can be transferred through a DAT
  – Pins data in physical memory
  – Establishes VM translation tables for the NIC
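The three transport rules above (connect first, pre-post receives, register memory before use) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: `ToyDATEndpoint` and its methods are hypothetical names, not the VIPL or DAT API, and "registration" is simulated by bookkeeping rather than real page pinning.

```python
class ToyDATEndpoint:
    """Illustrative, in-memory model of a connected DAT/VI endpoint (not a real API)."""

    def __init__(self):
        self.connected = False
        self.registered = []    # buffers "registered" (stands in for pinning + NIC tables)
        self.posted_recvs = []  # pre-posted receive buffers, consumed in FIFO order

    def connect(self):
        self.connected = True

    def register(self, buf):
        self.registered.append(buf)

    def deregister(self, buf):
        self.registered = [b for b in self.registered if b is not buf]

    def post_recv(self, buf):
        self._require(buf)
        self.posted_recvs.append(buf)

    def deliver(self, payload):
        """Model an incoming send: data is placed directly into the next posted receive."""
        if not self.connected:
            raise RuntimeError("endpoint not connected")
        if not self.posted_recvs:
            raise RuntimeError("no receive buffer was pre-posted")
        buf = self.posted_recvs.pop(0)
        if len(payload) > len(buf):
            raise ValueError("payload larger than the posted buffer")
        buf[:len(payload)] = payload  # direct placement, no intermediate copy in this model
        return buf

    def _require(self, buf):
        if not self.connected:
            raise RuntimeError("endpoint not connected")
        if not any(b is buf for b in self.registered):
            raise PermissionError("buffer is not registered")
```

In this model a send that arrives with no posted receive is simply rejected. Real transports differ in how they report that condition.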

DAFS Details
• Session based
  – DAFS clients initiate sessions with a server
  – A DAT/VIA connection is associated with a session
• RPC-like command format
  – Implemented with send/receive
  – Server “receives” requests “sent” from clients
  – Server “sends” responses to be “received” by client
• Open/Close
  – Unlike NFSv2, files must be opened and closed (not stateless)
• Read/Write I/O “modes”
  – Inline: limited amount of data included in request/response
  – Direct: server initiates RDMA read or write to move data
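The session and open/close points above mean that the server keeps per-session state, unlike a stateless NFSv2 server. The sketch below is a hypothetical model of that server-side bookkeeping. The request fields (`xid`, `op`, `handle`) and the errno-style status strings are illustrative assumptions, not the DAFS 1.0 wire format. Each response echoes the request's `xid` so the client can match it to the request it sent.

```python
import itertools


class ToyDafsSession:
    """Hypothetical per-session server state; names and status codes are illustrative,
    not the DAFS 1.0 wire protocol."""

    def __init__(self, files):
        self.files = files            # path -> file contents held by the "server"
        self.open_handles = {}        # handle -> path: state the server keeps per session
        self._next_handle = itertools.count(1)

    def handle_request(self, req):
        """Dispatch one RPC-like request; the response echoes the request's xid."""
        xid, op = req["xid"], req["op"]
        if op == "open":
            if req["path"] not in self.files:
                return {"xid": xid, "status": "ENOENT"}
            handle = next(self._next_handle)
            self.open_handles[handle] = req["path"]
            return {"xid": xid, "status": "OK", "handle": handle}
        if op == "read":
            path = self.open_handles.get(req["handle"])
            if path is None:          # never opened, or already closed
                return {"xid": xid, "status": "EBADF"}
            start = req["offset"]
            return {"xid": xid, "status": "OK",
                    "data": self.files[path][start:start + req["length"]]}
        if op == "close":
            known = self.open_handles.pop(req["handle"], None) is not None
            return {"xid": xid, "status": "OK" if known else "EBADF"}
        return {"xid": xid, "status": "EINVAL"}
```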

Inline vs. Direct I/O
[Figure: client/server timelines. Inline read or write: request, disk read or write, response. Direct write: request, server RDMA read, disk write, response. Direct read: request, disk read, server RDMA write, response.]
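The inline vs. direct timelines can be summarized as ordered message and disk steps. This simplified sketch is not the protocol implementation. The function name and step labels are assumptions, chosen to mirror the labels in the Inline vs. Direct I/O diagram.

```python
def io_sequence(op, mode):
    """Ordered (actor, step) pairs for one DAFS-style I/O; a simplified model, not the wire protocol."""
    if op not in ("read", "write") or mode not in ("inline", "direct"):
        raise ValueError("op must be 'read' or 'write', mode 'inline' or 'direct'")
    if mode == "inline":
        # Data rides inside the request (write) or the response (read).
        return [("client", "send request"), ("server", "disk " + op), ("server", "send response")]
    if op == "write":
        # The server pulls the client's buffer with an RDMA read, then writes it to disk.
        return [("client", "send request"), ("server", "RDMA read"),
                ("server", "disk write"), ("server", "send response")]
    # Direct read: the server reads from disk, then pushes the data with an RDMA write.
    return [("client", "send request"), ("server", "disk read"),
            ("server", "RDMA write"), ("server", "send response")]


for op in ("read", "write"):
    for mode in ("inline", "direct"):
        print(f"{mode:6} {op:5}:", " -> ".join(step for _, step in io_sequence(op, mode)))
```

In both direct cases the server initiates the RDMA, so the client CPU is not involved in moving the payload.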

Oracle Disk Manager (ODM)
• File access interface spec for the Oracle Database
  – Supported as a standard feature in Oracle 9i
  – Implemented as a vendor-supplied DLL
  – Files that cannot be opened using ODM use standard APIs
• Basic commands
  – Files are created and pre-allocated, then committed
  – Files are then “identified” (open) and “unidentified” (closed)
  – All r/w I/O uses an asynchronous “odm_io” command
    • I/Os specified as descriptors, multiple per odm_io call
  – Multiple waiting mechanisms: wait for specific, wait for any
  – Other commands are synchronous, e.g., resizing
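The asynchronous, descriptor-batched I/O style, with its two wait modes, can be mimicked in a small sketch. It assumes Python's `concurrent.futures` as a stand-in for the asynchronous machinery. `IoDesc`, `ToyOdm`, `wait_specific`, and `wait_any` are hypothetical names and do not reproduce Oracle's actual ODM interface or types.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass


@dataclass
class IoDesc:
    """Hypothetical I/O descriptor; field names are illustrative, not Oracle's ODM types."""
    op: str            # "read" or "write"
    offset: int
    length: int
    data: bytes = b""  # payload for writes


class ToyOdm:
    def __init__(self, size, workers=4):
        self.store = bytearray(size)  # stands in for a file
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _do(self, d):
        end = d.offset + (len(d.data) if d.op == "write" else d.length)
        if d.offset < 0 or end > len(self.store):
            raise ValueError("I/O outside the file")
        if d.op == "read":
            return bytes(self.store[d.offset:end])
        self.store[d.offset:end] = d.data
        return len(d.data)

    def odm_io(self, descs):
        """Submit several descriptors in one call; returns one future per descriptor immediately."""
        return [self.pool.submit(self._do, d) for d in descs]

    @staticmethod
    def wait_specific(future):
        return future.result()          # block until this particular I/O completes

    @staticmethod
    def wait_any(futures):
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return done                     # at least one completed I/O
```

Overlapping descriptors in the same batch would race in this toy. The usage in practice is to wait on writes before issuing reads of the same range.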

Prototype Client/Server
• DAFS Server
  – Implemented for Windows 2000 and Linux (all testing was on Windows)
  – Built on VIPL 1.0 using DAFS 1.0 SDK protocol stubs
  – All data buffers are pre-allocated and pre-registered
  – Data-driven multithreaded design
• ODM Client
  – Implemented as a Windows 2000 DLL for Oracle 9i
  – Multithreaded to enable decoupling of asynchronous I/O from Oracle threads
  – Inline buffers are copied; direct buffers are registered/deregistered as part of the I/O
  – Inline/direct threshold (set when library is initialized)
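The client's two buffer paths trade per-I/O copying (inline) against per-I/O registration (direct). The sketch below assumes a hypothetical `ToyOdmClient` that only counts these costs and moves no data over a network. It uses the convention that sizes below the threshold go inline and all others go direct.

```python
class ToyOdmClient:
    """Hypothetical model of per-I/O buffer handling in the prototype client (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold                 # fixed when the library is initialized
        self.inline_buf = bytearray(threshold)     # registered once, reused for every inline I/O
        self.stats = {"copied_bytes": 0, "registrations": 0, "deregistrations": 0}

    def mode_for(self, length):
        # Below the threshold -> inline, otherwise direct.
        return "inline" if length < self.threshold else "direct"

    def write(self, user_buf):
        n = len(user_buf)
        if self.mode_for(n) == "inline":
            self.inline_buf[:n] = user_buf         # copy into the pre-registered buffer
            self.stats["copied_bytes"] += n
            return "inline"
        self.stats["registrations"] += 1           # register the caller's buffer for RDMA
        # ... the server would RDMA-read the data out of user_buf here ...
        self.stats["deregistrations"] += 1         # deregister once the I/O completes
        return "direct"
```

Because the threshold is fixed at initialization, every I/O of a given size always takes the same path.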

Test System Configuration
• Goal was to compare local I/O with DAFS
• Local I/O configuration
  – Single system running Oracle on locally attached disks
• DAFS/ODM I/O configuration
  – One system running DAFS server software with locally attached disks
  – Second system running Oracle and the ODM client; files on the DAFS server accessed using ODM over a network
• 4-processor Windows 2000 Server based systems
  – 500MHz Xeon, 3GB RAM, dual-bus PCI 64/33
  – ServerNet II (VIA 1.0 based) System Area Network
  – Disks were 15K RPM, attached by two PCI RAID controllers, configured for RAID 1/0 (mirrored-striped)

Experiments
• Raw I/O Tests
  – Odmblast: streaming I/O test
  – Odmlat: I/O latency test
  – DAFS tests used the ODM DLL to access files on the DAFS server
  – Local tests used a special local ODM library built on Windows unbuffered I/O
• Oracle database test
  – Standard TPC-H benchmark
  – SQL-based decision support code
  – DAFS tests used the ODM DLL to access files on the DAFS server
  – Local tests ran without ODM (Oracle uses Windows unbuffered I/O directly)

Odmblast
• ODM-based I/O stress test
  – Intended to present a continuous load to the I/O system
  – Issues many simultaneous I/Os (to allow for pipelining)
• In our experiments, Odmblast streams 32 I/Os to the server
  – 16 I/Os per odm_io call
  – Wait for I/Os from the previous odm_io call
• I/Os can be reads, writes, or a random mix
• I/Os can be at sequential or random offsets
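The streaming pattern above (16 I/Os per call, then wait on the previous call's 16, so up to 32 are outstanding) is a two-deep pipeline. This is a hedged sketch of that loop, not Odmblast's code. `odmblast_like` and the `submit_batch` hook (which must return one future per offset) are hypothetical, and the read/write mix is omitted for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def odmblast_like(submit_batch, total_ios, per_call=16, io_size=65536,
                  file_size=1 << 30, sequential=True, seed=0):
    """Issue batches of per_call I/Os, waiting on the previous batch after each new one,
    so up to 2 * per_call I/Os are outstanding. Returns (completed, peak_outstanding)."""
    rng = random.Random(seed)
    nblocks = file_size // io_size
    pending, completed, issued, peak = [], 0, 0, 0
    while issued < total_ios:
        n = min(per_call, total_ios - issued)
        if sequential:
            offsets = [((issued + i) % nblocks) * io_size for i in range(n)]
        else:
            offsets = [rng.randrange(nblocks) * io_size for _ in range(n)]
        current = submit_batch(offsets)
        issued += n
        peak = max(peak, len(pending) + len(current))  # submitted but not yet waited on
        for f in pending:          # wait for I/Os from the previous call
            f.result()
            completed += 1
        pending = current
    for f in pending:              # drain the final batch
        f.result()
        completed += 1
    return completed, peak


with ThreadPoolExecutor(max_workers=4) as pool:
    def submit(offsets):
        # Toy backend: each "I/O" just returns its offset.
        return [pool.submit(lambda o=o: o) for o in offsets]

    print(odmblast_like(submit, total_ios=40))  # (40, 32)
```

Waiting on the previous batch rather than the current one keeps the device busy with the newer batch while completions are collected.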

Odmblast read comparison
[Chart: Bandwidth (MB/sec, 0 to 250) vs. I/O Size (bytes, 0 to 1,000,000); series: Local Seq Rd, Local Rand Rd, DAFS Seq Rd, DAFS Rand Rd]

Odmblast write comparison
[Chart: Bandwidth (MB/sec, 0 to 100) vs. I/O Size (bytes, 0 to 1,000,000); series: Local Seq Wr, Local Rand Wr, DAFS Seq Wr, DAFS Rand Wr]

Odmlat
• I/O latency test
  – How long does a single I/O take?
    • (not necessarily related to aggregate I/O rate)
  – For these experiments, <16K = inline, ≥16K = direct
  – Derived the components that make up I/O time using linear regression
  – More details in paper
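A linear regression of per-I/O time against transfer size separates a fixed per-operation cost (intercept) from a per-byte cost (slope). The sketch below makes two assumptions: a simple model, time = fixed + per_byte × size, and a separate fit on each side of the 16K inline/direct boundary. The components the paper actually derived are not listed on this slide. The numbers in the test are synthetic, not measurements.

```python
def fit_latency(sizes, times):
    """Ordinary least squares for time = fixed + per_byte * size; returns (fixed, per_byte)."""
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    per_byte = sxy / sxx
    return mean_y - per_byte * mean_x, per_byte


def fit_by_mode(samples, threshold=16384):
    """Fit inline (< threshold) and direct (>= threshold) (size, time) samples separately."""
    inline = [(s, t) for s, t in samples if s < threshold]
    direct = [(s, t) for s, t in samples if s >= threshold]
    return {name: fit_latency([s for s, _ in pts], [t for _, t in pts])
            for name, pts in (("inline", inline), ("direct", direct)) if len(pts) >= 2}
```

Fitting each mode separately reflects that the inline path (copy) and the direct path (registration plus RDMA) can have different fixed and per-byte costs.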

Odmlat performance
[Chart: Time per Operation (microseconds, 0 to 3000) vs. Bytes per I/O Operation (0 to 65536); series: Read Time, Write Time]

Oracle-based results
• Standard database benchmark: TPC-H
  – Written in SQL
  – Decision support benchmark
  – Multiple ad-hoc query streams with an “update thread”
  – 30GB database size
• Oracle configuration
  – All I/Os 512-byte aligned (required for unbuffered I/O)
  – 16K database block size
  – Database files distributed across two NTFS file systems
• Measurements
  – Compared average runtime for local vs. DAFS-based I/O
  – Skipped official “TPC-H power” metric
  – Varied inline/direct threshold for DAFS-based I/O
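The 512-byte alignment requirement can be stated as a check. Windows unbuffered I/O (`FILE_FLAG_NO_BUFFERING`) generally requires the file offset, the transfer length, and the buffer address to be multiples of the sector size. The sketch assumes 512-byte sectors, as this slide implies, and the helper names are hypothetical.

```python
SECTOR = 512  # alignment the slide states was required for unbuffered I/O


def unbuffered_ok(offset, length, buf_addr, sector=SECTOR):
    """True if file offset, transfer length, and buffer address are all sector multiples."""
    return offset % sector == 0 and length % sector == 0 and buf_addr % sector == 0


def align_up(n, sector=SECTOR):
    """Round n up to the next sector multiple (e.g. to size a bounce buffer)."""
    return (n + sector - 1) // sector * sector


DB_BLOCK = 16 * 1024  # 16K database block size from the slide
print(unbuffered_ok(offset=7 * DB_BLOCK, length=DB_BLOCK, buf_addr=0x200000))  # True
print(align_up(1000))  # 1024
```

Since a 16K block is 32 sectors, block-sized I/O at block-aligned offsets always satisfies the offset and length rules. Buffer placement is the remaining constraint.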

Oracle TPC-H Performance
[Chart: Time (Hrs:Min) by configuration: local 13:23, DAFS 16k-direct 17:13, DAFS 16k-inline 14:39]
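The chart's run times can be converted into relative slowdowns, which is how the "within 10% of local I/O" conclusion follows. The sketch assumes the bar values map to the configurations in the order the chart labels them: local, DAFS 16k-direct, DAFS 16k-inline.

```python
def minutes(hhmm):
    """Convert an 'H:MM' run time to minutes."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def slowdown_pct(run, baseline):
    """Percent longer than the baseline run time."""
    return 100.0 * (minutes(run) - minutes(baseline)) / minutes(baseline)


LOCAL = "13:23"
for label, run in (("DAFS 16k-direct", "17:13"), ("DAFS 16k-inline", "14:39")):
    print(f"{label}: {slowdown_pct(run, LOCAL):.1f}% longer than local")
# DAFS 16k-direct: 28.6% longer than local
# DAFS 16k-inline: 9.5% longer than local
```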

Oracle TPC-H Operation Distribution
[Chart: share of operations: 16 KByte Read 79.4%, >16 KByte Read 19.1%, >16 KByte Write 1.1%, 16 KByte Write 0.3%]
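The distribution shows how read-dominated the workload is. It also shows how much of it a 16K threshold setting affects. The sketch assumes that "16k-inline" sends exactly-16K I/Os inline and "16k-direct" sends them direct, and that larger I/Os always go direct. This is an interpretation of the chart labels, not something the slides state.

```python
# Percent of TPC-H operations by size and type, as read from the pie chart on this slide.
OP_SHARE = {"16K read": 79.4, "16K write": 0.3, ">16K read": 19.1, ">16K write": 1.1}


def read_share(dist):
    return sum(v for k, v in dist.items() if k.endswith("read"))


def inline_share(dist, inline_16k):
    """Percent of operations sent inline, ASSUMING >16K I/Os always go direct and
    the threshold setting only decides where exactly-16K I/Os go."""
    return sum(v for k, v in dist.items() if k.startswith("16K")) if inline_16k else 0.0


print(round(read_share(OP_SHARE), 1))          # 98.5
print(round(inline_share(OP_SHARE, True), 1))  # 79.7
```

Under that reading, the threshold setting switches roughly four fifths of all operations between the copy path and the registration path.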

Oracle TPC-H CPU Utilization
[Chart: % CPU Used (0 to 100) vs. Elapsed Time (mins, 0 to 1200); series: DAFS Client (inline I/O), DAFS Server (inline I/O), DAFS Server (direct I/O), DAFS Client (direct I/O), Local I/O]

TPC-H Summary
• Local I/O still faster
  – Limited ServerNet II bandwidth
  – Memory registration or copying overhead; Windows unbuffered I/O is already very efficient
• DAFS still has more capabilities than local I/O
  – Capable of cluster I/O (RAC)
• Memory registration is still a problem with DATs
  – Registration caching can be problematic
    • Cannot guarantee address mappings will not change
    • ODM has no means for notifying NIC of mapping changes
  – Need either better integration of I/O library with Oracle or better integration of OS with DAT
• Transparency is expensive!
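The registration-caching hazard can be made concrete with a toy cache keyed by virtual address range. Reusing a registration avoids paying for it again. However, if the application frees a buffer and the same virtual range is later backed by different physical pages, a cache hit returns the old translation. `ToyRegistrationCache` and its simulated page table are hypothetical, and `invalidate` stands for the notification hook that, per the slide, ODM does not provide.

```python
class ToyRegistrationCache:
    """Hypothetical registration cache keyed by virtual page range (illustrative only)."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> physical frame (simulated OS state)
        self.cache = {}               # (vpage, npages) -> frames captured at registration
        self.registrations = 0

    def _frames(self, vpage, npages):
        return [self.page_table[vpage + i] for i in range(npages)]

    def lookup(self, vpage, npages):
        key = (vpage, npages)
        if key not in self.cache:          # miss: pay for a real registration
            self.registrations += 1
            self.cache[key] = self._frames(vpage, npages)
        return self.cache[key]             # hit: reuse whatever was captured earlier

    def invalidate(self, vpage, npages):
        """What a mapping-change notification would trigger; ODM offers no such hook."""
        self.cache.pop((vpage, npages), None)

    def is_stale(self, vpage, npages):
        key = (vpage, npages)
        return key in self.cache and self.cache[key] != self._frames(vpage, npages)
```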

Conclusions
• DAFS Server/ODM Client achieved performance close to the limits of our network
  – Local SCSI I/O was still faster
• On a database benchmark, DAFS TPC-H performance was within 10% of local I/O
  – Also provides advantages of a network file system (i.e., clustering support)
• Limitations of our tests
  – ServerNet II bandwidth was inadequate (no support for multiple NICs)
  – Needed to do client-side registration for all direct I/Os
• TPC-H benchmark was not optimally tuned
  – Needed to bring client CPU closer to 100%
    • More disks, fewer CPUs, other tuning
  – CPU offload is not a benefit if I/O is the bottleneck