Scalable Data Analytics: On the Role of Stratified Data Sharding
Srinivasan Parthasarathy, Data Mining Research Lab, Ohio State University
The Data Deluge: Data, Data Everywhere
• 180 zettabytes will be created in 2025 [IDC report]
• $600 buys a disk drive that can store all of the world's music [McKinsey Global Institute Special Report, June '11]
Data Storage is Cheap
Data does not exist in isolation. Data almost always exists in connection with other data – an integral part of the value proposition.
"There's gold in them there mountains of data" – Gil Press, Forbes contributor
Examples: social networks, protein interactions, the Internet, VLSI networks, scientific simulations, neighborhood graphs
Big Data Challenge: all this data is only useful if we can extract interesting and actionable information from large, complex data stores efficiently. Projected to be a $200B industry in 2020. [IDC report]
Distributed Data Processing is Central to Addressing the Big Data Challenge
• MapReduce, MPI, Spark, etc.
[Image source: blog.mayflower.de]
However, distributed data processing itself can pose challenges!
The Case for Stratified Data Sharding of Complex Big Data
Key Challenge: Data Placement (Sharding)
• Locality of reference
  – Placing related items in proximity improves efficiency
• Mitigating the impact of data skew
  – Critical for big data workloads!
• Interactive response times
  – Operate on a sample with statistical guarantees
• Heterogeneity and energy awareness
  – Heterogeneous compute and storage resources are ubiquitous
Example – Image Retrieval
[Figure: the same image query under two placements. Random partitioning: heavy load, light load, no load, no load across the four nodes – load imbalance. Stratified partitioning: equal load on all four nodes – load imbalance mitigated.]
Stratified Sampling in a Slide
• Roots in stratified sampling (Cochran '48)
• Group related data into "homogeneous strata"
• Sample each stratum
  – Proportional allocation (shown)
  – Optimal allocation
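For reference, the two allocation rules have standard closed forms in the sampling literature (my addition, not from the deck; "optimal" is assumed to mean Neyman allocation). With H strata of sizes N_h, total population N, within-stratum standard deviations σ_h, and total sample size n:

```latex
% Proportional allocation: stratum h is sampled in proportion to its size
n_h = n \cdot \frac{N_h}{N}

% Optimal (Neyman) allocation: larger and more variable strata get more samples
n_h = n \cdot \frac{N_h \sigma_h}{\sum_{j=1}^{H} N_j \sigma_j}
```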
But here we want to partition/shard:
• For locality
  – Elements within a stratum are placed together
• For mitigating skew
  – Each partition is a proportionally allocated stratified sample
• For interactivity
  – Optimally allocate one partition; proportionally allocate the rest
• Accounting for energy/heterogeneity
  – More on this later, time permitting
Apps have varying requirements: ONE SIZE DOES NOT FIT ALL!
Our Vision: Stratified Data Placement
[Figure: a stratified data sharding & placement layer underlying key-value stores (e.g. Memcached, Redis), MPI & Partitioned Global Address Space (PGAS) systems (e.g. Global Arrays), and Hadoop/Shark/Azure storage (HDFS/RDD/Blob).]
Key Challenge: Creating Strata (of Complex Data)
• What about clustering?
  – Non-trivial for data with complex structure
  – Potentially expensive
  – Variable-sized entities
• 4-step approach [ICDE '13]
  1. Convert complex data into a (multi-)set of pivotal elements that capture features-of-interest
  2. Compute a sketch of the set (minwise hashing)
  3. Use sketches to group into strata (sketch sort / sketch cluster)
  4. Partition strata according to application needs (e.g. skew, balance, locality)
Step 1: Pivotization
Problem: need to simplify a complex representation.
Key Idea: think globally, act locally – sets of localized features that collectively capture the global picture.
Solution: specific to data & domain (a code sketch for the text case follows this list)
• Documents/text
  – Shingling [Broder 1998]
• Trees (XML, linguistic data)
  – Wedge pivots [Tatikonda '10]
• Graphs (Web, social, molecules)
  – Adjacency lists [Buehrer '08], wedge decompositions [Seshadri '11], graphlets [Przulj '09]
• Spatial/vector data
  – LSH [Indyk '99, Charikar '02, Satuluri '12]
• Images/simulation/sequential data
  – Kernels [Leslie '03], KLSH [Kulis '10]
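For the text case, a minimal sketch of shingling-based pivotization (illustrative Python; the function name and parameters are mine, not the paper's):

```python
# A minimal sketch of Step 1 for documents: w-shingling in the spirit of
# [Broder 1998]. Each shingle is a localized feature; the set of shingles
# collectively captures the document's global content.

def shingle_pivots(text: str, w: int = 3) -> set:
    """Return the set of w-word shingles (the pivot set) of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

doc = "the quick brown fox jumps over the lazy dog"
print(shingle_pivots(doc))
# e.g. {'the quick brown', 'quick brown fox', ..., 'the lazy dog'}
```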
[Figure: pivot transformations map the data items Δ1 … Δ25 (small labeled trees) into pivot sets PS-1 … PS-25.]
Step 2: Sketching
• Problem: pivot sets may be variable-length; similarity computation is expensive: O(n^2)
• Key Idea: use sketching
• Solution: locality-sensitive hashing [Broder '98, Indyk '99, Charikar '01]
  – Resulting representation is fixed-length (k)
  – Tradeoff: representation fidelity vs. sketch size
  – Can handle kernel functions [Kulis '09] and statistical priors [Satuluri '12, Chakrabarti '15, '16]
[Figure: minwise hashing maps the pivot sets PS-1 … PS-25 to fixed-length sketches, e.g. {1050, 2020, 3130, 1800} (SK-1) and {1050, 2020, 7225, 2020} (SK-25).]
Minwise Hashing (Broder et al. '98)
Universe: {dog, cat, lion, tiger, mouse}
Two random permutations of the universe: [cat, mouse, lion, dog, tiger] and [lion, cat, mouse, dog, tiger]
A = {mouse, lion}
mh1(A) = min({mouse, lion}) = mouse (the element of A appearing earliest in the first permutation)
mh2(A) = min({mouse, lion}) = lion (the element of A appearing earliest in the second permutation)
Key Fact
For two sets A, B, and a min-hash function mhi():
  Pr[mhi(A) = mhi(B)] = |A ∩ B| / |A ∪ B| = Sim(A, B), the Jaccard similarity
Unbiased estimator for Sim using k hashes:
  Sim(A, B) ≈ (1/k) · |{ i : mhi(A) = mhi(B) }|
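A minimal Python sketch of both facts (my illustration; salted built-in hashes stand in for random permutations, and all names are mine, not the paper's):

```python
# A minimal sketch of minwise hashing and the unbiased Jaccard estimator
# above. Salted hashes approximate k independent random permutations.
import random

def minhash_signature(s, k=128, seed=42):
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(k)]
    # For each "permutation", keep the minimum hash value over the set.
    return [min(hash((salt, x)) for x in s) for salt in salts]

def estimate_sim(sig_a, sig_b):
    # Fraction of hash functions on which the two sets collide.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A, B = {"mouse", "lion"}, {"mouse", "lion", "tiger"}
print(estimate_sim(minhash_signature(A), minhash_signature(B)))
# ~ |A ∩ B| / |A ∪ B| = 2/3 for large k
```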
Step 3: Stratification
Problem: group related entities into strata.
Key Idea: inspired by W. Cochran's work on stratified sampling [1940s].
Solutions:
• Sort pivot sets directly (skipping the sketch step) – PivotSort
• Directly use the output of LSH/minwise hashing – SketchSort (sketched below)
• Cluster sketches with a fast variant of k-modes – SketchCluster
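A minimal sketch of the SketchSort idea (my illustration; lexicographically sorting sketches places similar items adjacently, and fixed-size runs of the sorted order become strata):

```python
# A minimal sketch of SketchSort: sort items by their minhash sketches so
# that near-duplicates become neighbors, then cut the sorted order into
# strata. Stratum boundaries here are naive fixed-size cuts.

def sketch_sort(item_ids, sketches, stratum_size=2):
    order = sorted(range(len(item_ids)), key=lambda i: sketches[i])
    ranked = [item_ids[i] for i in order]
    return [ranked[i:i + stratum_size]
            for i in range(0, len(ranked), stratum_size)]

sketches = [[1050, 2020], [9999, 1], [1050, 2021], [9998, 2]]
print(sketch_sort(["d1", "d2", "d3", "d4"], sketches))
# [['d1', 'd3'], ['d4', 'd2']] -- items with similar sketches land together
```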
[Figure: SketchSort or SketchCluster groups the sketches into strata S-1 … S-128; e.g. stratum S-4 holds (Δ1, SK-1), (Δ5, SK-5), (Δ12, SK-12), (Δ25, SK-25).]
Step 4: Sharding and Placement
• Problem: how to partition stratified data?
• Key Idea: guided by application hints and system state.
• Solutions (a sketch of the first follows this list):
  1. Proportional Allocation: split each stratum proportionally across all partitions → mitigates skew
  2. Optimal Allocation for the first partition, proportional allocation for the rest [C77]
  3. All-in-One: place each stratum in its entirety within one partition
IMPORTANT NOTE: we use sketches to create strata – but partitioning happens on the original data.
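A minimal sketch of Proportional Allocation (my illustration; round-robin within each stratum is one simple way to realize a proportional split):

```python
# A minimal sketch of Proportional Allocation: every stratum is spread
# evenly over all partitions, so each partition holds a proportionally
# allocated stratified sample of the data.

def proportional_allocation(strata, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for stratum in strata:
        for i, item in enumerate(stratum):  # round-robin within the stratum
            partitions[i % num_partitions].append(item)
    return partitions

strata = [["a1", "a2", "a3", "a4"], ["b1", "b2"]]
print(proportional_allocation(strata, 2))
# [['a1', 'a3', 'b1'], ['a2', 'a4', 'b2']] -- skew is spread evenly
```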
[Figure: the full pipeline. Pivot transformations turn the data Δ1 … Δ25 into pivot sets PS-1 … PS-25; minwise hashing turns pivot sets into sketches SK-1 … SK-25; SketchSort or SketchCluster groups sketches into strata S-1 … S-128; partitioning & replication assigns strata to partitions P-1 … P-8 (e.g. P-2 holds S-4, S-7, S-8, S-12; P-8 holds S-3, S-4, S-9, S-12).]
Empirical Evaluation
• We report wall-clock times; all times include the cost of placement
• Evaluations on several key analytic tasks:
  – Top-K algorithms [Fagin], outlier detection [Ghoting '08, Otey '06], frequent tree mining [Zaki '05, Tatikonda '09], graph mining [Buehrer '06, Yan '02, Nijssen '04], XML indexing [Tatikonda '07], community detection in social/biological data [Ucar '06, Satuluri '11], web graph compression [Chellapilla '08-'09, Vigna '11, LZ '77], itemset mining [Buehrer-Fuhry '15]
• All applications run straight out of the box – the only thing the user specifies relates to locality, skew, and interaction
Frequent Tree Mining [Tatikonda '09]
• Used widely
  – Transactions, graphs, trees
• Approach
  1. Distribute data (proportional allocation)
  2. Run Phase 1
  3. Exchange metadata
  4. Run Phase 2
  5. Final reduction
• Sharding mainly impacts steps 1-3; steps 3 and 5 are sequential
Proposed approaches show 100X gains
FTM Phase 1: Drilling Down
• Data-dependent workload skew is mitigated
• Payload-aware sharding helps!
Web Graph Compression [Vigna et al. 2011]
Critical application for search companies. Key requirement: locality.
Approach:
• Distribute data via placement
• Run the compression algorithm in parallel
• Parameters (similar to FTM)
  – Use adjacency/triangle pivots
  – Use All-in-One partitioning
A segue and drill-down (ICDE '15): Localized Approximate Miner (LAM)
• First bounded-space and bounded-time pattern mining algorithm: O(|D| log |D|)
• Parameter-free
• Scales with compute resources
  – Near-linear in cores & machines
• Scales with data size
  – Billions of transactions & items
  – E.g. 67 min on one machine; 1 min on a cluster
• Two parallel phases: Localize, ApproxMining
LAM Phase 1: Localize [SketchSort and stratify!]
[Figure: dataset D of transactions T1 = {1,2,3,7}, T2 = {1,2,5,7,8}, T3 = {1,2,4}, …, TN = {1,7,8,5,3,22} is min-wise hashed into a k × N matrix M; sorting its columns reorders the transactions (e.g. T3, TN, T1, T2), which are then cut into Local Partitions 1, 2, 3.]
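A minimal, self-contained sketch of the Localize step as depicted (my illustration; equal-size cuts stand in for whatever partition sizing LAM actually uses):

```python
# A minimal sketch of LAM Phase 1: min-hash every transaction, sort the
# transactions by their sketches, and cut the sorted order into local
# partitions. Illustrative only; not the LAM implementation.
import random

def localize(transactions, k=8, num_partitions=3, seed=7):
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(k)]

    def sketch(items):
        return tuple(min(hash((s, x)) for x in items) for s in salts)

    order = sorted(transactions, key=lambda tid: sketch(transactions[tid]))
    size = -(-len(order) // num_partitions)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]

D = {"T1": {1, 2, 3, 7}, "T2": {1, 2, 5, 7, 8},
     "T3": {1, 2, 4}, "TN": {1, 7, 8, 5, 3, 22}}
print(localize(D, num_partitions=2))  # similar transactions co-located
```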
LAM Phase 2: ApproxMining
• Mined (p, tlist) pairs are ordered by utility
• Add p to the pattern set P
• In dataset D, remove p from each row in tlist
• Replace it with a pointer to p in the pattern set
• Append P to D and run LAM again on the new D
• Iterate multiple times for better compression
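A minimal sketch of the rewrite step just described (my reconstruction from the bullets above; names such as apply_pattern are mine):

```python
# A minimal sketch of one ApproxMining rewrite: add pattern p to the pattern
# set P, then in every transaction of tlist replace p's items with a single
# pointer into P. Reconstructed from the slide's bullets; illustrative only.

def apply_pattern(D, p, tlist, P):
    """D: dict tid -> set of items; p: pattern (frozenset); tlist: tids."""
    P.append(p)
    pointer = f"ptr{len(P) - 1}"           # pointer to p in the pattern set
    for tid in tlist:
        D[tid] = (D[tid] - p) | {pointer}  # remove p's items, keep a pointer
    return pointer

D = {"T1": {1, 2, 3, 7}, "T2": {1, 2, 5, 7, 8}, "T3": {1, 2, 4}}
P = []
apply_pattern(D, frozenset({1, 2, 7}), ["T1", "T2"], P)
print(D)  # e.g. {'T1': {3, 'ptr0'}, 'T2': {8, 5, 'ptr0'}, 'T3': {1, 2, 4}}
```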
Experiments
• Nine transactional datasets from UCI, FIMI
• Compare LAM to the state of the art
  – Krimp [Vreeken et al. 2011]
  – Slim [Smets et al. 2012]
  – CDB-Hyper [Xiang et al. 2008]
• Five web graph datasets (|V| ≈ 10^7, |E| ≈ 10^9)
• PLAM (Parallel LAM): cluster implementation
  – Compare to closed itemset mining
• Metrics: compression, execution time, scalability
UCI/FIMI: Compression
[Figure: compression ratio (log scale) of LAM 5, Krimp, SLIM, and CDB-Hyper on Accidents, Adult, Anneal, Breast, Iris, Kosarak, Mushroom, Pageblocks, and Tic-Tac-Toe.]
LAM achieves better compression on most datasets
UCI/FIMI: Execution Time
[Figure: execution time (log scale) of LAM 5, Krimp, SLIM, and CDB-Hyper on Accidents, Adult, Anneal, Kosarak, and Mushroom.]
LAM is one or more orders of magnitude faster
[Figure: itemset results for various supports, grouped by set size; annotated speedup: 24x.]
Web: Comparing LAM to Closed Sets
[Figure: execution time (sec) and compression ratio vs. support (100-1000) for Closed Sets generation, Closed Sets compression, LAM (1 and 5 iterations), and PLAM on 8 cores.]
Smallest web graph dataset EU2005: |V| = 863K, |E| = 19M
• For σ < 100, Closed Sets is slow at generating patterns, and even slower at compressing
• LAM produces better compression: 2x with 1 iteration, 4x with 5 iterations
Web: Comparing LAM to Closed Sets (continued)
Larger datasets: better results than closed sets, in less time.
Web: Scalability
[Figure: speedup vs. number of machines (50-250) and compression ratio vs. pass number (1-5) for EU2005, UK2006, ARABIC2005, SK2005, and IT2004.]
• Near-linear scalability to hundreds of machines
• Compression ratios increase over multiple passes
LAM: Thoughts and Future Work
• First pattern mining algorithm to run in linearithmic time in the size of the input
• Leverages stratified data partitioning
• Parameter-free – saves domain-expert time
• Scales near-linearly to
  – Hundreds of cores & machines
  – Billions of transactions and items
• Future work: can we extend similar ideas to trees, graphs, and sequences?
Energy- and Heterogeneity-Aware Partitioning (ICPP '17)
• Modern datacenters are increasingly heterogeneous
  – Computation
  – Storage
  – Green energy harvesting
• Sharding and placement while accounting for heterogeneity is challenging
  – Pareto-optimal model
Overview of Pareto Framework
Pareto Function: Math [Gori '11]
Evaluation – Pareto Frontiers
[Figure: Pareto frontiers for Swiss (tree mining), RCV (text mining), and UK (web analytics).]
Take-Home Message
• In today's analytics world, data has complex structure
• Stratified data placement has a central role to play
  – Over two orders of magnitude improvement over the state of the art for a multitude of analytic tasks; first to explore this idea for placement
  – Preliminary results on heterogeneity- and energy-aware systems show significant promise!
Thanks
• Organizers
• Audience
• Former students (who did all the work!)
  – Ye Wang (Airbnb), A. Chakrabarty (MSR), D. Fuhry (OSU)
• Funding agencies
  – NSF, DOE, NIH, Google, IBM