White-Box Testing of Big Data Analytics with Complex User-Defined Functions
Muhammad Ali Gulzar¹, Shaghayegh Mardani¹, Madan Musuvathi², Miryung Kim¹
¹University of California, Los Angeles  ²Microsoft Research
Software Development Cycle of Big Data Analytics
Inadequate Testing of Big Data Analytics
1. Develop locally
2. Test locally with sample data
3. Execute the job on the cloud, hoping that it will work
4. Several hours later, the job crashes or produces wrong output
5. Go to Step 2
Repeat
Motivating Example
Find the total number of trips made from UCLA using public transport, a personal vehicle, or on foot.

Trips Dataset (20 GB)
#, ORIG, DEST, DIST, TIME
1, 90034, 90024, 10, 1
2, 90001, 90024, 16, 1.4
…

Locations Dataset (100 MB)
Zip, Location
90034, "UCLA"
90024, "Westwood"
…

Big Data Application in Apache Spark
// (origin zip, speed), where speed = DIST / TIME
val trips = sc.textFile("trips")
  .map { s => val c = s.split(","); (c(1), c(3).toInt / c(4).toInt) }
// (zip, location), restricted to UCLA
val locations = sc.textFile("zipcode")
  .map { s => val c = s.split(","); (c(0), c(1)) }
  .filter { s => s._2.equals("UCLA") }
// classify each joined trip by speed and count the trips per category
val result = trips.join(locations).map { s =>
    if (s._2._1 > 40) ("car", 1)
    else if (s._2._1 > 15) ("public", 1)
    else ("onfoot", 1) }
  .reduceByKey(_ + _)
Characteristics of Big Data Analytics
Relational skeleton
Custom logic as user-defined functions
String operations are common
Fluid interchange between types
How do we test a big data application effectively and efficiently?
Option 1: Sample Input Data
• random sampling
• top-n sampling
• top-k% sampling, etc.
Limitations:
• The sample may exercise only a limited set of program paths (low code coverage).
• The sample may not include the inputs that lead to a program crash.
• A larger sample may have higher coverage but increases local testing time.
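The second limitation can be illustrated with a minimal sketch in plain Scala collections (no Spark). The object name, the helper names, and the data are all hypothetical: 1,000 well-formed trip rows are followed by a single row whose TIME is 0, and a top-n sample never sees that row.

```scala
import scala.util.Try

object SamplingSketch {
  // Same per-row logic as the trips UDF of the motivating example: speed = DIST / TIME.
  def speed(row: String): Int = {
    val c = row.split(",")
    c(3).toInt / c(4).toInt
  }

  // Hypothetical data: 1,000 well-formed rows, then one row whose TIME column is 0.
  val trips: Seq[String] =
    (1 to 1000).map(i => s"$i,90034,90024,10,1") :+ "1001,90034,90024,10,0"

  // Top-n sampling keeps only the first n rows.
  def topN(n: Int): Seq[String] = trips.take(n)

  // True if every row in the given data is processed without an exception.
  def runsCleanly(data: Seq[String]): Boolean =
    data.forall(row => Try(speed(row)).isSuccess)
}
```

Under these assumptions, `SamplingSketch.runsCleanly(SamplingSketch.topN(100))` holds while `SamplingSketch.runsCleanly(SamplingSketch.trips)` does not, so local testing on the sample passes even though the full job would hit a division by zero.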
Option 2: Traditional Test Generation for Java
• Big data analytics programs compile to Java bytecode
• But this includes the entire system (700 KLOC for Apache Spark)
• Symbolic execution without abstraction is infeasible and would not scale
Our Approach: White-Box Testing
Input: Big Data Analytics Application
sc.textFile("hdfs").map(s => s.toInt).filter(w => w > 0).reduce(_ + _)
BigTest
Output: Test Input Data
PC       Input
X > 0    X = "1"
X ≤ 0    X = "0"
1. Decompose relational skeleton and UDFs
2. Logical specifications for relational operators
3. Symbolic execution of UDFs
4. Generate inputs by joint path constraints
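As a rough illustration of the input/output table on this slide, the sketch below replays the two generated inputs through the map and filter UDFs and records which filter branch each one exercises. It is written in plain Scala without Spark; `ApproachSketch`, `parse`, `keep`, and `branch` are names chosen here for illustration, not part of BigTest.

```scala
object ApproachSketch {
  // The two UDFs of the slide's pipeline, written as plain functions.
  val parse: String => Int = s => s.toInt
  val keep: Int => Boolean = w => w > 0

  // Label the filter branch that a raw input line exercises.
  def branch(line: String): String =
    if (keep(parse(line))) "X > 0" else "X <= 0"

  // The two test inputs listed in the table.
  val generated: Seq[String] = Seq("1", "0")

  // The set of branches covered by the generated inputs.
  val covered: Set[String] = generated.map(branch).toSet
}
```

With only two rows, `ApproachSketch.covered` contains both branches of the filter, which is the effect the path-constraint-driven generation aims for.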
Modelling Dataflow Operators
Steps: 1. Decomposition, 2. Logical Specs, 3. Symbolic Execution, 4. Test Generation
[Figure: dataflow of the motivating example. Trips → Map; Zipcode → Map → Filter; both → Join (⨝) → Map → ReduceByKey]
[Figure: the same dataflow, annotated with the True and False outcomes of Filter and the two non-matching-key cases of Join]
• Handle terminating and non-terminating cases of dataflow operators
• E.g., Join can introduce 3 cases:
  • 2 cases in which keys from right and left do not match
  • 1 case in which right and left keys match
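The three Join cases listed above can be sketched with plain Scala maps. This is only an illustration under simplifying assumptions: each dataset is modelled as a `Map` (so keys are unique), and the object name, the case labels, and the sample rows are hypothetical.

```scala
object JoinCasesSketch {
  // Classify every key of two keyed datasets into the three join cases.
  def joinCases[K, A, B](left: Map[K, A], right: Map[K, B]): Map[String, Set[K]] = Map(
    "matched"    -> left.keySet.intersect(right.keySet), // keys present on both sides
    "left-only"  -> left.keySet.diff(right.keySet),      // left key without a right match
    "right-only" -> right.keySet.diff(left.keySet)       // right key without a left match
  )

  // Hypothetical rows keyed by zip code.
  val trips = Map("90034" -> 10, "90001" -> 16)
  val zipcodes = Map("90034" -> "UCLA", "90024" -> "Westwood")

  val cases = joinCases(trips, zipcodes)
}
```

For an inner join, only the "matched" keys continue downstream; the other two cases terminate at the join, which is why test data has to cover all three.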
Modelling User-defined Functions
[Figure: the dataflow of the previous slide (Trips, Zipcode, Map, Filter, Join, Map, ReduceByKey), with each UDF expanded into its symbolic execution tree]
Path constraint contributed by a UDF, e.g.: s.split(",").length > 2
Symbolic execution tree of the final map UDF:
V > 40        => "car"
15 < V ≤ 40   => "public"
V ≤ 15        => "walk"
• Handle strings, collections, and tuples
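The symbolic execution tree of the final map UDF can be written down as data. The sketch below pairs each path constraint with its effect and picks one witness value per path by hand; a constraint solver would derive such values automatically. The object and value names are illustrative, and the last label follows the program text ("onfoot") rather than the slide's "walk".

```scala
object UdfPathsSketch {
  // The final map UDF of the motivating example, applied to the speed value V.
  def commuteType(v: Int): (String, Int) =
    if (v > 40) ("car", 1)
    else if (v > 15) ("public", 1)
    else ("onfoot", 1)

  // One path constraint per branch, paired with the label that branch produces.
  val paths: Seq[(Int => Boolean, String)] = Seq(
    ((v: Int) => v > 40, "car"),
    ((v: Int) => v > 15 && v <= 40, "public"),
    ((v: Int) => v <= 15, "onfoot")
  )

  // Hand-picked witnesses, one per path, chosen at the branch boundaries.
  val witnesses: Seq[Int] = Seq(41, 16, 15)
}
```

Each witness satisfies exactly one of the three constraints, so three input values are enough to exercise every branch of this UDF.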
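Joint Dataflow and UDF (JDU) Path
[Figure: dataflow over a Trips row T and a Zipcode row Z. Map: f_map1 on Trips; Map: f_map2 and Filter: f_filter on Zipcode; then Join: ⨝, Map: f_map3, ReduceByKey: f_Agg. Records flow as (K1, V1), (K2, V2) → (K1, (V1, V2)) → (S, 1) → (S, N).]
Path conditions in the figure (the paths are labelled T1 to T4 on the slide):
• ~f_filter(K2, V2): the filter is false
• f_filter(K2, V2) ⋀ K1 ∉ Zipcode: non-matching key
• f_filter(K2, V2) ⋀ K2 ∉ Trips: non-matching key
• f_filter(K2, V2) ⋀ K1 = K2: the keys match
Example joint path constraint:
Z.split(",")[1] = "Palms" ⋀ Z.split(",").length > 1 ⋀
T.split(",")[1] = Z.split(",")[0] ⋀
T.split(",").length > 1 ⋀ …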
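To make the joint constraint concrete, the following sketch checks a candidate pair of rows against its visible conjuncts. It is an illustration only: the elided part ("…") of the constraint is not modelled, the length checks are evaluated before the index accesses so the function never throws, and `JduConstraintSketch` and `satisfies` are hypothetical names.

```scala
object JduConstraintSketch {
  // Visible conjuncts of the joint constraint over a Trips row t and a Zipcode row z.
  def satisfies(t: String, z: String): Boolean = {
    val tc = t.split(",")
    val zc = z.split(",")
    zc.length > 1 && zc(1) == "Palms" && // Z.split(",")[1] = "Palms"
    tc.length > 1 && tc(1) == zc(0)      // T.split(",")[1] = Z.split(",")[0]
  }
}
```

For example, `satisfies("1,90034,90024,10,1", "90034,Palms")` is true, whereas changing the trip's origin to `90001` or the location to `UCLA` makes it false. A solver searches for rows that satisfy such a conjunction instead of checking given ones.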
Test Input Generation
The joint path constraint
Z.split(",")[1] = "Palms" ⋀ Z.split(",").length > 1 ⋀
T.split(",")[1] = Z.split(",")[0] ⋀
T.split(",").length > 1 ⋀ …
is encoded as SMT constraints (excerpt):
(assert (= T (str.++ (str.++ line20 ",") line21))) (assert (= Z
(str.++ (str.++ " " ",") (str.++ (str.++ line11 ",")(str.++ (str.++ " " ",") (str.++ (str.++ line13 ",") line14))))))
(assert(and (not (= (str.to.int line14) 0)) (and (isinteger line14) (and (isinteger line13) (and (= "Palms" line21) (and (= x11 line20) (and (<= s21 15)(and (<= s21 40) (and (= s21 x621) (and (= s1 x61) (=
s22 x622))))))))))))))) (assert
(and (= x11 line11) (and (= x12 (/ (str.to.int line13) (str.to.int line14))) (and
(= x61 x11) (and (= x621 x12) (and (= x622 x42) (and (= x71 "walk") (= x72
1))))))))))))
Generated Test Data
Trips:    _, "\x00", _, "0", "1"
Location: "\x00", "Palms"
Evaluation
RQ1: How much test coverage improvement can BigTest achieve?
RQ2: How many faults can BigTest detect?
• We built the first benchmark of faulty dataflow programs based on our survey of such programs on Q/A forums, e.g., StackOverflow.
RQ3: How much test data reduction does BigTest provide, and how long does BigTest take to generate test data?
Experimental Setting
• We use seven subject programs from earlier work
• All subject applications have complex string operations, complex arithmetic, tuple types for key-value pairs, and collections with custom logic.

Subject Program    Dataflow Operators                   # of Operators   JDU Paths (K=2)   # of UDFs
IncomeAggregate    map, filter, reduce                  3                6                 4
MovieRatings       map, filter, reduceByKey             4                5                 4
AirportLayover     map, filter, reduceByKey             3                14                4
CommuteType        map, filter, join, reduceByKey       6                11                5
PigMix-L2          map, join                            5                4                 6
GradeAnalysis      flatMap, filter, reduceByKey, map    5                30                3
WordCount          flatMap, map, reduceByKey            3                4                 3
Study of Big Data Analytics Faults
• No existing benchmark of faulty applications
• We study the characteristics of real-world big data analytics bugs posted on StackOverflow and the Apache Spark mailing lists.

Community Survey Statistics
Keywords searched           Apache Spark exceptions, task errors, failures, wrong outputs
Posts studied               Top 50
Posts with coding errors    23
Common fault types          7
Total faulty programs       31

Fault Type                    Example
Incorrect String Offset       str.substring(1,0)
Incorrect Column Selection    str.split(",")[1]
Wrong Delimiters              str.split("\t")[1]
Incorrect Branch Condition    if (age > 10 && age < 9)
Wrong Join Type               LeftOuterJoin
Key-Value Swap                (Value, Key)
Others                        Division by zero
Real World Fault Injection
• Identified 7 common code fault types
• Manually inserted these faults into the benchmarks
• This leads to a total of 31 faulty big data applications.

Original Program
val trips = sc.textFile("trips").map { s =>
  val c = s.split(",")
  (c(1), c(3).toInt / c(4).toInt)
}
val loc = sc.textFile("zipcode") . . . .

Faulty Program
val trips = sc.textFile("trips").map { s =>
  val c = s.split(",")
  (c(1), c(3).toInt / c(5).toInt)   // injected fault: c(5) instead of c(4)
}
val loc = sc.textFile("zipcode") . . . .

After injecting a fault of type "Incorrect Column Selection", the program extracts the column at index 5 instead of 4.
Sedge [ASE'13] generates examples for dataflow programs, but it handles a UDF as an uninterpreted function and does not model its internals.
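The "Incorrect Column Selection" injection described on this slide (index 5 instead of 4) can be sketched in plain Scala to show what it means for a test row to detect a seeded fault. The object name, the `detects` helper, and the sample rows are hypothetical, and this is not how BigTest itself compares programs.

```scala
import scala.util.Try

object FaultInjectionSketch {
  // Original UDF: speed = DIST (column 3) / TIME (column 4).
  def original(row: String): (String, Int) = {
    val c = row.split(",")
    (c(1), c(3).toInt / c(4).toInt)
  }

  // Faulty UDF: reads column 5 instead of column 4.
  def faulty(row: String): (String, Int) = {
    val c = row.split(",")
    (c(1), c(3).toInt / c(5).toInt)
  }

  // A row detects the fault if the two versions disagree,
  // either in their result or in whether they crash.
  def detects(row: String): Boolean =
    Try(original(row)).toOption != Try(faulty(row)).toOption
}
```

A five-column row such as `"1,90034,90024,10,1"` detects the fault because the faulty version fails on the missing column, while a six-column row whose columns 4 and 5 are equal, such as `"1,90034,90024,10,1,1"`, does not.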
RQ1: Code Coverage
JDU Path Coverage on Subject Programs (normalized JDU path coverage, %)
Program            BigTest   Sedge   Entire Dataset
IncomeAggregate    100       17      67
MovieRatings       100       40      60
AirportLayover     100       14      29
CommuteType        100       18      55
PigMixL2           100       25      75
GradeAnalysis      100       13      77
WordCount          100       25      100
BigTest improves JDU path coverage by 78% against Sedge and 34% against the entire dataset.
RQ2: Fault Detection Capability
BigTest detects 2X more faults than Sedge because it models the internal semantics of UDFs together with the specifications of dataflow operators.

                                                                            Injected Fault Type
Application        Total Seeded Faults   Detected by BigTest   Detected by Sedge   1    2    3    4    5    6    7
IncomeAggregate    3                     3                     1                   ✓    NA   NA   ✓    NA   NA   ✓
MovieRating        6                     6                     6                   ✓    ✓    ✓    ✓    NA   ✓    ✓
AirportLayover     6                     6                     4                   ✓    ✓    ✓    ✓    NA   ✓    ✓
CommuteType        6                     6                     4                   NA   ✓    ✓    ✓    ✓    ✓    ✓
PigMix-L2          4                     4                     2                   NA   ✓    ✓    NA   ✓    ✓    NA
GradeAnalysis      4                     4                     3                   NA   ✓    ✓    ✓    NA   NA   ✓
WordCount          2                     2                     0                   NA   ✓    NA   NA   NA   NA   ✓
RQ3: Test Size Reduction
Test Dataset Size (# of rows, plotted on a log scale)
Program            BigTest   Entire Dataset
IncomeAggregate    6         4.00E+09
MovieRatings       5         5.21E+05
AirportLayover     14        4.48E+08
CommuteType        11        3.20E+08
PigMixL2           4         2.40E+08
GradeAnalysis      30        4.00E+07
WordCount          6         1.11E+08
Compared to the entire dataset, BigTest achieves more JDU path coverage with 10^5X to 10^8X smaller test data, translating into a 194X testing speedup.
Summary
• Need SE tools for big data analytics applications
• BigTest provides exhaustive, automatic, and fast testing
• Contributions:
1. Demonstrated the need to interpret UDFs
2. Model strings, collections, and tuples
3. Logical specifications for dataflow operators handling terminating and non-terminating cases
4. Provide the first symbolic execution engine for Apache Spark/Scala
5. Present a study of big data analytics bugs and the first bug benchmark
Publicly available at: https://github.com/maligulzar/BigTest
RQ3: Breakdown of BigTest's Testing Time
[Figure: Breakdown of Testing Time. Stacked bars of time in seconds (0 to 80), split into Theorem Solver, Constraints Generation, and Testing, for IncomeAggregate, MovieRatings, AirportLayover, CommuteType, PigMixL2, GradeAnalysis, and WordCount.]
By running tests locally, BigTest improves the testing time (CPU seconds) by 194X, on average, compared to testing the entire dataset on a 16-node cluster.
Inadequate Test Generation Tools for Big Data Analytics

Traditional Software Test Generation
def concat(append: Boolean, a: String, b: String): String = {
  var result: String = null
  if (append) result = a + b
  result.toLowerCase()  // result is still null here when append is false
}
• Standalone application
• Symbolic execution compatible
• Well-defined semantics
• Logical execution is similar to physical execution

Big Data Analytics Test Generation
sc.textFile("hdfs").flatMap(s => s.split(",")).map(w => (w, 1)).reduceByKey(_ + _)
• Heavily depends on the framework
• No symbolic execution support for dataflow operators
• New operators with changing semantics
• Logical execution is different from physical execution
Program Decomposition
• Challenge: Due to the complexity of the code of DISC (data-intensive scalable computing) frameworks, symbolic execution is infeasible on DISC applications.
• Insight: The individual UDFs of a DISC application are relatively small (<100 LOC), making symbolic execution feasible.
• Solution: We decompose a DISC application, using AST analysis, into a set of individual UDFs and dataflow operators.

DISC application (excerpt)
. . .
.map { s =>
  val c = s.split(",")
  (c(0), c(1))
}.filter { s => s._2.equals("Palms") }
. . .

Extracted UDF for map
class UDF_MAP {
  static void main(String args[]) { apply(null); }
  static Tuple2 apply(String s) {
    String[] arr = s.split(",");
    return new Tuple2(arr[0], arr[1]);
  }
}

Extracted UDF for filter
class UDF_FILTER {
  static void main(String args[]) { apply(null); }
  static Boolean apply(String s) {
    return s.equals("Palms");
  }
}
Symbolic Execution of UDFs
• Challenges: Strings, collections, and objects are prevalent in DISC applications but are not fully supported by symbolic execution tools such as Java PathFinder (JPF).
• Insight: In DISC applications, most unbounded types are eventually bounded. We perform lazy SE on such types, e.g., split(",") is an unbounded array but split(",")[1] is bounded.
• Solution: Using JPF, we symbolically execute UDFs in isolation to generate path constraints and effects. Loops and arrays are bounded by K=2.

UDF for map (from Step 1)
class UDF {
  static void main(String args[]) { apply(null); }
  static Tuple2 apply(Tuple3 s) {
    if (s._2()._1() > 40) return new Tuple2("car", 1);
    else if (s._2()._1() > 15) return new Tuple2("public", 1);
    else return new Tuple2("onfoot", 1);
  }
}

Path Constraints    Effect
sym > 40            car, 1
40 ≥ sym > 15       public, 1
sym ≤ 15            onfoot, 1
Logical Specifications of Dataflow Operators
• Challenges: Dataflow operators in DISC applications come with hundreds of thousands of lines of framework code, making symbolic execution infeasible.
• Insight: Dataflow operators have standard semantics but are implemented differently for optimization purposes.
• Solution: Using these semantics, we abstract their implementation into logical specifications and use the specifications to tie together the UDFs' symbolic trees.
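One way to picture how logical specifications tie UDF path constraints together is the sketch below. It is an illustrative model, not BigTest's implementation: constraints are kept as plain strings, and the `Path` class and the `mapSpec` and `filterSpec` functions are names assumed for this example.

```scala
object LogicalSpecSketch {
  // A path is a conjunction of constraints plus a flag telling whether the record was dropped.
  final case class Path(constraints: List[String], terminated: Boolean)

  // Spec of map: every live input path is extended by every path constraint of the UDF.
  def mapSpec(in: Seq[Path], udfPaths: Seq[String]): Seq[Path] =
    in.flatMap { p =>
      if (p.terminated) Seq(p)
      else udfPaths.map(c => p.copy(constraints = p.constraints :+ c))
    }

  // Spec of filter: each live input path splits into a passing path and a terminating path.
  def filterSpec(in: Seq[Path], pred: String): Seq[Path] =
    in.flatMap { p =>
      if (p.terminated) Seq(p)
      else Seq(
        p.copy(constraints = p.constraints :+ pred),
        Path(p.constraints :+ s"!($pred)", terminated = true)
      )
    }
}
```

Starting from a single empty path, applying `filterSpec` with one predicate and then `mapSpec` with the three path constraints of a UDF yields four joint paths: one that terminates at the filter and three that pass it and continue through the map.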
[Figure: the symbolic tree of each UDF (from Step 2) is connected through the logical spec of its operator, in the order map, map, filter, join, map.]