Bolt: Building a distributed ndarray
Jason Wittenbach, Janelia Research Campus (HHMI), Freeman Lab
t, (x, y, z)
time ~ 10^4
space ~ 10^7
~ 10^11 elements
~ 1 TB
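The size estimate above can be checked with a few lines of arithmetic. This is only a sketch: the slide does not state an element type, so the 8 bytes per element (float64) used here is an assumption.

```python
# Back-of-the-envelope size estimate for the dataset sketched above.
n_time = 10**4          # time points (from the slide)
n_space = 10**7         # spatial locations (from the slide)
bytes_per_element = 8   # assumption: float64; the slide does not state a dtype

n_elements = n_time * n_space
total_bytes = n_elements * bytes_per_element

print(f"{n_elements:.0e} elements")
print(f"{total_bytes / 1e12:.1f} TB")
```

Under that assumption the total is on the order of a terabyte, which is more than a typical single machine holds in memory.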
(x, y, t)
(x, y, z, t)
(x, y, z, c, t)
(n, k)
neuroscience
astronomy
geospatial
climate science
• a distributed ndarray
• built on PySpark
• conforms to NumPy API
data.mean(axis=0)
data[2, 4:10]
data.T
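Since the array is described as conforming to the NumPy API, the three expressions above should behave as they do on a local ndarray. The sketch below uses plain NumPy on a small stand-in array; the shape (3, 12, 5) is made up for illustration and no distributed backend is involved.

```python
import numpy as np

# Small local stand-in with a made-up shape (3, 12, 5).
data = np.arange(3 * 12 * 5, dtype=float).reshape(3, 12, 5)

mean_over_first_axis = data.mean(axis=0)   # averages away axis 0
window = data[2, 4:10]                     # index axis 0, slice axis 1
transposed = data.T                        # reverses the order of the axes

print(mean_over_first_axis.shape)
print(window.shape)
print(transposed.shape)
```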
(t, x, y, z) (x, y, z, t)
[Figure: the axes of a (v, w, x, y, z) array divided into two groups in three ways: (v, w) with x, y, z; (v, w, x) with y, z; (v) with the remaining axes]
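One way to read the (v, w, x, y, z) diagrams is as a split of the array's axes into a leading group that indexes records and a trailing group that stays together as a local block. That reading is an assumption, and the sketch below emulates it with plain NumPy and a Python list rather than a distributed collection; `to_records` and `split` are hypothetical names chosen for this example.

```python
import numpy as np

def to_records(arr, split):
    """Split a local ndarray into (key, value) records.

    The first `split` axes become a tuple key; the remaining axes stay
    together as a NumPy array value. Hypothetical helper for this sketch.
    """
    key_shape = arr.shape[:split]
    return [(key, arr[key]) for key in np.ndindex(*key_shape)]

# Made-up sizes for the five axes (v, w, x, y, z).
arr = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

records_2 = to_records(arr, split=2)   # keys (v, w), values of shape (x, y, z)
records_3 = to_records(arr, split=3)   # keys (v, w, x), values of shape (y, z)
records_1 = to_records(arr, split=1)   # keys (v,), values of shape (w, x, y, z)
```

Moving the split point trades the number of records against the size of each block: more key axes means more, smaller records.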
indexing slicing apply-along-axis
indexing
transpose
slicing apply-along-axis
reshape
indexing
transpose
map
slicing apply-along-axis
reshape
reduce filter
chunking padding
indexing filter map
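The functional operations in these lists (map, filter, reduce) can be illustrated on a local list of (key, value) records. This is a sketch in plain Python and NumPy, not PySpark, and the array shape and the filter predicate are made up for the example.

```python
import numpy as np
from functools import reduce

# Local stand-in for a distributed collection: one (key, value) record per
# index along the first axis of a small array.
data = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
records = [((t,), data[t]) for t in range(data.shape[0])]

# map: apply a function to every value block independently.
doubled = [(k, v * 2) for k, v in records]

# filter: keep only the records whose value block passes a predicate.
kept = [(k, v) for k, v in records if v.mean() > 6]

# reduce: combine all value blocks pairwise, here with an elementwise sum,
# which matches summing over the key axis.
total = reduce(np.add, (v for _, v in records))
```

Each of these touches records one at a time or combines them pairwise, so none of them needs to see the whole array at once.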
[Figure: an array with axes (u, v) and x, y, shown in successive steps labeled apply-along-axis, reduceByKey, map, transpose, and shuffle]
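The diagrams above pair apply-along-axis with reduceByKey. One plausible reading, and it is only a reading, is that applying a function along an axis that indexes records requires first bringing together all records that differ only in that axis. The sketch below emulates that with a dictionary standing in for reduceByKey; `along_u` is a hypothetical helper and the shapes are made up.

```python
import numpy as np
from collections import defaultdict

# Made-up sizes for the four axes (u, v, x, y).
arr = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
records = [((u, v), arr[u, v]) for u in range(2) for v in range(3)]

# Step 1 (map): re-key each record by v only, carrying u along with the block.
rekeyed = [((v,), [(u, block)]) for (u, v), block in records]

# Step 2 (local stand-in for reduceByKey): merge the lists that share a key.
grouped = defaultdict(list)
for key, items in rekeyed:
    grouped[key] = grouped[key] + items

# Step 3 (map): order the blocks by u, stack them, and apply the function
# along the new leading axis.
def along_u(items, func):
    ordered = sorted(items, key=lambda pair: pair[0])
    stacked = np.stack([block for _, block in ordered])
    return func(stacked, axis=0)

result = {key: along_u(items, np.max) for key, items in grouped.items()}
```

In this emulation, step 2 is the only one that moves data between records.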
[Figure: an array laid out as (t) with x, y, z is converted to (x, y, z) with t; along the way, the x, y, z block of each (t) record is cut into pieces labeled (t, chunk)]
chunking + transpose = shuffle
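The idea that chunking plus a transpose amounts to a shuffle can be sketched locally. The example below is a plain NumPy emulation under stated assumptions: it uses a small made-up (t, x, y) array, chunks only along x, and uses a dictionary in place of a distributed shuffle.

```python
import numpy as np
from collections import defaultdict

data = np.arange(4 * 6 * 2, dtype=float).reshape(4, 6, 2)   # (t, x, y)
n_chunks = 3

# Start: one record per time point, key (t,), value of shape (x, y).
records = [((t,), data[t]) for t in range(data.shape[0])]

# Chunking: cut each value along x and key the pieces by (t, chunk).
chunked = [((t, c), piece)
           for (t,), block in records
           for c, piece in enumerate(np.array_split(block, n_chunks, axis=0))]

# Shuffle: re-key by chunk so all time points of a chunk meet in one place.
by_chunk = defaultdict(dict)
for (t, c), piece in chunked:
    by_chunk[c][t] = piece

# Transpose within each chunk: stack over t, then move t to the last axis.
swapped = {}
for c, pieces in by_chunk.items():
    stacked = np.stack([pieces[t] for t in sorted(pieces)])   # (t, x_chunk, y)
    swapped[c] = np.moveaxis(stacked, 0, -1)                   # (x_chunk, y, t)

# Reassemble locally to compare against a plain NumPy axis move.
result = np.concatenate([swapped[c] for c in sorted(swapped)], axis=0)
```

The chunk count sets the granularity of the exchange: fewer chunks means fewer, larger pieces to move, while more chunks means more, smaller ones.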
optimization
thanks
Freeman Lab: Jeremy Freeman, Nicholas Sofroniew, Andrew Osheroff
Janelia Scientific Computing: Ken Carlile, Robert Lines
join us!
GitHub: bolt-project, thunder-project
Twitter: @jsonWittenbach