Abstraction and Performance in Database Systems
Christoph Koch, EPFL DATA Lab

Contents
• Expressiveness vs. efficient evaluation of declarative languages
  – How Georg shaped me and this talk
• Domain-specific languages are hot across computer science
  – DSLs vs declarative languages
• Epidemiology of Database People Missing Boats Disorder (DMBD)
  – In DB systems: The Scalability Blunder: NoSQL
  – In DB systems: How DSLs make DB performance work mainstream … and folklore.
  – In DB theory: Where are the PODS people in the DSL revolution?
• Opportunities: Non-Turing complete DSLs & FMT
• What I do

My collaboration with Georg
• 20+ joint papers on expressiveness, complexity, and efficient evaluation of declarative/query languages.
• Things I learned from Georg:
  – How to do research, really
  – How to write a PODS paper :-)
  – Using declarative languages creatively
  – Expressiveness vs. complexity is not a zero-sum game!
  – One can’t just write papers and have a career here, but advance human knowledge!
  – Much more

The years 0 to 13 AG
• I did more work on declarative languages
  – E.g. for probabilistic databases and video games
• I moved more into systems
• How could I combine declarative languages, expressiveness/efficiency with systems?
  – Domain-specific languages
  – Databases and compilation/code generation for performance.
• This is what I currently mostly do.

Declarative languages and DSLs
• Domain-specific languages (DSLs)
  – Engineered languages
  – Usually Turing-complete
  – Embedded DSL: classical PL (e.g. Java) + library (domain-specific vocabulary)
• SQL is a DSL (domain = database querying)
  – But most new DSLs are not very declarative.
• In Turing-complete DSLs: (compiler) optimizations tend to be local and sometimes brittle.

DSLs are hot!
• Motivation: not declarativity but performance
  – Compensate for the failure of Dennard scaling and Moore’s law.
  – We don’t know how to build robust optimizing compilers with deep/global optimizations.
  – Consequence: Domain-specific compilation: opportunities for automatic software specialization.
• People all over CS are flocking to DSLs
  – Computer architecture. ASPLOS; Chisel, …
  – HPC & Graphics: OpenGL, Halide, …
  – Systems, databases: LegoBase, S-Store …

DSLs and code generation
• Software specialization by compilation.
• Staging/partial evaluation (e.g. specialize DBMS code for a given schema).
• DSL compiler frameworks make it easy to add domain-specific code optimizations.
• Usage in domain makes them robust.
• Squid: github.com/epfldata/squid [Parreaux, Shaikhha, K., GPCE 2017, Scala 2017, POPL 2018]
• Increasingly, DSLs enable code generation that matches or outperforms human systems programming experts!
• Observed in multiple domains, e.g. linear transforms [Spiral], OLAP [LegoBase], OLTP [S-Store]
• “Abstraction without Regret” [Rompf & Odersky, CACM; K., CIDR 2013]
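
To make "specialize DBMS code for a given schema" concrete, here is a minimal Scala sketch. It is not the Squid or LegoBase API: the predicate type, `interpret`, and `specialize` are hypothetical names invented for illustration, and closure construction stands in for real staged code generation. The point is only the split into a specialization phase (schema and query known) and a run phase (rows arrive).

```scala
object StagingSketch {
  // A tiny predicate language over named integer columns (hypothetical, for illustration).
  sealed trait Pred
  final case class Eq(col: String, value: Int) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred

  // Generic interpreter: walks the predicate tree and resolves column names for every row.
  def interpret(p: Pred, schema: Vector[String], row: Array[Int]): Boolean = p match {
    case Eq(c, v)  => row(schema.indexOf(c)) == v
    case And(l, r) => interpret(l, schema, row) && interpret(r, schema, row)
  }

  // Specialized version: schema and predicate are known up front, so name resolution and
  // the tree walk happen once. The returned closure only reads array slots and compares.
  def specialize(p: Pred, schema: Vector[String]): Array[Int] => Boolean = p match {
    case Eq(c, v) =>
      val i = schema.indexOf(c) // resolved once, at specialization time
      require(i >= 0, s"unknown column $c")
      row => row(i) == v
    case And(l, r) =>
      val fl = specialize(l, schema)
      val fr = specialize(r, schema)
      row => fl(row) && fr(row)
  }

  val schema = Vector("id", "dept", "level")
  val pred   = And(Eq("dept", 7), Eq("level", 2))
  val rows   = List(Array(1, 7, 2), Array(2, 7, 3), Array(3, 5, 2))

  val fast           = specialize(pred, schema)
  val viaInterpreter = rows.filter(r => interpret(pred, schema, r)).map(_(0))
  val viaSpecialized = rows.filter(fast).map(_(0))
}
```

Both paths select the same rows. A staging framework would go one step further and emit source or bytecode for `fast` instead of a chain of closures.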

S-Store TPC-C benchmark results
[Figure not recovered: benchmark chart; surviving label: OLTPX]
Dashti, John, K., 2014

DSLs and the role of database research
• Relational databases created many firsts.
• SQL is still the most successful DSL
• RDBMS shows how to build an entire system, the entire stack, for executing SQL efficiently.
• Algebras, plan languages, cost-based optimization, logical vs. physical data representation; managing the memory hierarchy, mem hierarchy-aware operator implementation.
  – The basic pipeline and architecture is the foundation of all modern DSL-based systems.
  – Some credit is given (e.g. GraphLab), but the database contribs are increasingly taken as a historical footnote across CS.
• Also, are we still innovating in any significant way?

DSLs and the role of database research
Database performance techniques are becoming mainstream … and the role of databases fades away. In two ways:
  – The contributions of the DB community are becoming a historical footnote.
    • Database ideas stop being considered database ideas.
  – Database functionality is integrated into other kinds of systems, and classical DBMS will be used in fewer scenarios.

Example 1: row/columnar representations
• Much hyped (M. Stonebraker). Various DBMS built
  – Vertica, SAP Hana, …
• But: It’s CS folklore now.
• Ubiquitous in programming tools
  – List: n+1 objects
  – Pair: 3 objects
  – Makes a huge performance difference in OO runtime systems, e.g. JVM: boxing/unboxing overheads!!!
• Heavily used in HPC, graphics, ML, …
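
The row/columnar distinction above can be sketched in plain Scala as "array of objects" versus "one primitive array per field". This is an illustrative sketch only; `Point`, `Points`, and the two sum functions are made-up names, and no timing claims are attached to it.

```scala
object LayoutSketch {
  // Row layout: one heap object per record, reached through a reference.
  final case class Point(x: Int, y: Int)
  val rows: Array[Point] = Array(Point(1, 10), Point(2, 20), Point(3, 30))

  // Columnar layout: one primitive array per field, no per-record objects.
  final class Points(val xs: Array[Int], val ys: Array[Int]) {
    def length: Int = xs.length
  }
  val cols = new Points(rows.map(_.x), rows.map(_.y))

  // Scanning one field in the row layout dereferences every record object.
  def sumYRows(ps: Array[Point]): Long = {
    var s = 0L
    var i = 0
    while (i < ps.length) { s += ps(i).y; i += 1 }
    s
  }

  // Scanning the same field in the columnar layout reads one contiguous Array[Int].
  def sumYCols(ps: Points): Long = {
    var s = 0L
    var i = 0
    while (i < ps.length) { s += ps.ys(i); i += 1 }
    s
  }
}
```

The two functions compute the same sum; they differ only in how many objects and pointer hops sit between the loop and the integers, which is presumably what the object counts on the slide refer to.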

Example 2: GRACE Hash join
• Classical database course material. Seems uniquely about databases (?)
• Main-mem DB case: hash join becomes the trivial implementation.
• GRACE hash-join = main mem hash join + staging for the mem hierarchy.
• Mem hierarchy considerations have by now been better analyzed/addressed by the compilers, computer architecture and HPC communities.
  – general/automatic algo transformation techniques exist (loop tiling & superoptimization; see Aho et al. Dragon Book 2nd Ed. Chapter 11)
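
The equation "GRACE hash-join = main mem hash join + staging for the mem hierarchy" can be read as the following Scala sketch. It is entirely in memory and the names (`hashJoin`, `graceJoin`, `partitions`) are illustrative; a real GRACE join would spill partitions to disk, or size them to a cache level, rather than keep them in `Map`s.

```scala
object GraceJoinSketch {
  type Row = (Int, String) // (join key, payload)

  // Plain main-memory hash join: build a hash table on r, probe it with s.
  def hashJoin(r: Seq[Row], s: Seq[Row]): Seq[(Int, String, String)] = {
    val table: Map[Int, Seq[Row]] = r.groupBy(_._1)
    for {
      (k, sv) <- s
      (_, rv) <- table.getOrElse(k, Seq.empty)
    } yield (k, rv, sv)
  }

  // GRACE-style: first partition both inputs by a hash of the key, so that each pair of
  // partitions is small enough for the fast level of the memory hierarchy, then run the
  // plain hash join on each pair. Rows with equal keys always land in the same partition.
  def graceJoin(r: Seq[Row], s: Seq[Row], partitions: Int): Seq[(Int, String, String)] = {
    def part(k: Int): Int = Math.floorMod(k, partitions)
    val rParts = r.groupBy(row => part(row._1))
    val sParts = s.groupBy(row => part(row._1))
    (0 until partitions).flatMap { p =>
      hashJoin(rParts.getOrElse(p, Seq.empty), sParts.getOrElse(p, Seq.empty))
    }
  }

  val r = Seq((1, "a"), (2, "b"), (3, "c"))
  val s = Seq((1, "x"), (3, "y"), (3, "z"), (4, "w"))
}
```

Seen this way, the partitioning pass is a blocking transformation wrapped around an unchanged inner algorithm, which is the sense in which loop tiling covers the same ground.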

A case of missing the boat
• Is there anything about DB Performance that won’t be absorbed into the CS systems/performance mainstream?
• Conjecture: No.
• Experience in the DBLab project (github.com/epfldata/dblab) [Shaikhha, …, K., VLDB 2014, SIGMOD 2016, TODS 2018, JFP 2018].
  – We are building a library of compiler optimizations for data-intensive systems, by abstracting from a database system (LegoBase).
  – After cleaning up, none seem really specific to databases.
• This is a problem for the future of database research.

Database People Missing Boats Disorder (DMBD) – a pandemic?
• Causes:
  – Lack of care to recognize major CS trends (early)
  – Lack of effort to abstract & generalize results
  – Catering too much to reviewers in a calcified & broken system of conferences.
• Symptoms: Rectal pain, depression
• Treatment: ???

Another case of MtB in DB systems: NoSQL
• There always was distributed and parallel databases research.
  – Banned from first-rate publication venues
  – Few systems built: not “sexy” enough.
• Then Google and Facebook wanted scalable databases, and we couldn’t offer them.
• Consequences today:
  – A massive loss of prestige for our community
  – A widely-held belief that one has to look to SOSP rather than SIGMOD for good DB research.
  – Genuine contributions of the DB community do not get acknowledged and cited, but reinvented.

A third MtB case: DB Theory
• Estimated # of PODS papers talking of DSLs, ever: 0
• Pub. venues for foundational DSL work: POPL, SIGGRAPH, ASPLOS, …
  – Citation in-degree into DB theory literature: ~0

Opportunities
• Many results from DB Theory, finite model theory on non-Turing complete languages carry over to modern DSLs.
• People in other domains do not know these results and find them exciting, when applied to their DSL.
• E.g. collection programming languages like Spark are essentially just nested relational algebra …
• My experience at a DSL summer school.

Quiz
From: K, “Exploiting Domain-Specific Knowledge: […] Part 1: Lessons on DSLs learned by the DB community”, DSL Design & Implementation Summer School, 2016.
Consider the following DSL:
• purely functional Scala, with “if” as the only control structure
• Types built from Int, List, and tuples
• List ops: singleton constr, empty list, map(x => …), flatten, list concat ++
• Tuple construction (…) and projection _i
• (deep) equality test =; the identity function
Let us call this language (Scala/List) Monad Calculus (MC) to have a label.
Example:
scala> val R = List(1)++List(2); val S = List(1)++List(3)
R: List[Int] = List(1, 2)
S: List[Int] = List(1, 3)
scala> R.map(r => S.map(s => if (r==s) List((r,s)) else List()).flatten).flatten
res2: List[(Int, Int)] = List((1,1))
for(r

Quiz: What can you do in MC?
R.map(r => S.map(s => if (r==s) List((r,s)) else List()).flatten).flatten
• Joins? --- yes
• Arbitrary “conjunctive queries” --- yes
• Arbitrary SQL select-from-where queries --- no, conditions ( x==a) == List()
• Aggregations: select count(*) from … --- no
• Testing on order / look sideways, sorting a list of integers? --- no
• Reachability in a graph given by the edge relation? --- no
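
The first two "yes" answers can be tried directly. The sketch below restates the join from the slide and adds a two-hop conjunctive query over a hypothetical edge list `E`, using only MC operations (`map`, `flatten`, `++`, `if`, equality, tuples). The for-comprehension is Scala sugar outside MC proper, included here only to compare results.

```scala
object McSketch {
  val R = List(1) ++ List(2)
  val S = List(1) ++ List(3)

  // The join from the slide: nested map, an "if" that yields a singleton or the empty
  // list, and flatten to remove one level of nesting per loop.
  val join: List[(Int, Int)] =
    R.map(r => S.map(s => if (r == s) List((r, s)) else List()).flatten).flatten

  // Same result via a for-comprehension (not part of MC; for comparison only).
  val joinFor: List[(Int, Int)] = for (r <- R; s <- S; if r == s) yield (r, s)

  // A conjunctive query in the same style: pairs (a, c) such that E(a, b) and E(b, c)
  // hold for some b. E is a made-up edge list.
  val E = List((1, 2)) ++ List((2, 3)) ++ List((3, 4))
  val twoHop: List[(Int, Int)] =
    E.map(e1 =>
      E.map(e2 => if (e1._2 == e2._1) List((e1._1, e2._2)) else List()).flatten
    ).flatten
}
```

Reachability would need this two-hop step to be iterated an input-dependent number of times, and MC has no construct for that, which is consistent with the last "no" on the slide.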

Quiz: What can you do in MC?
R.map(r => S.map(s => if (r==s) List((r,s)) else List()).flatten).flatten
• Does every program terminate? --- yes
• How big is the largest value that can be produced? --- polynomial in input
• How quickly does every prog terminate? --- PTIME
• All queries of relational algebra --- yes!!!
• Only queries expressible in relational algebra --- yes % repr!!!!!!!!!!
• Can every program be parallelized? --- yes, fantastically well! (AC0)
  -- given polynomially much hardware, every program runs in CONSTANT time!!!!
  -- if you have only constantly much hardware => Brent Scheduling Principle

Quiz: Extending MC
R.map(r => S.map(s => if (r==s) List((r,s)) else List()).flatten).flatten
• Testing on order / look sideways, sorting a list of integers? --- no
• List.map preserves order but can’t “query” it.
• But what if I want a DSL that can do this? Could add List.foldLeft, and nothing else.
• Does every program still terminate? --- yes
• Does every program still run in PTIME? --- no, nonelementary!
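
A small Scala sketch of what adding `foldLeft` buys and what it costs. The function names are illustrative. `last` shows a query on list order; `blowUp` shows why the polynomial size bound is lost, since the accumulator can double on every step. It illustrates only exponential growth and its iteration; the nonelementary bound itself is the slide's claim and is not proved here.

```scala
object FoldLeftSketch {
  // Order becomes observable: keep only the most recently seen element.
  def last(xs: List[Int]): List[Int] =
    xs.foldLeft(List[Int]())((_, x) => List(x))

  // Size blow-up: an input of length n drives n rounds of doubling, so the output has
  // length 2^n. Every fold still terminates, because it runs once per input element.
  def blowUp[A](xs: List[A]): List[Int] =
    xs.foldLeft(List(0))((acc, _) => acc ++ acc)

  val input = List(1) ++ List(2) ++ List(3)
  val once  = blowUp(input)         // length 2^3 = 8
  val twice = blowUp(blowUp(input)) // length 2^8 = 256
}
```

Each further application of `blowUp` to its own output adds one more level to the exponent, so the output size depends on how deeply the folds are composed.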

The FO[X] DSL Zoo

Database theory work that we need more of
1. Results on complexity and efficiency that systems people can understand to be relevant to them, and which carry over to new languages, e.g.
  – Georg’s work on hypertree decompositions
  – Result cardinality bounds: AGM bound
  – Worst-case optimal joins
  – …
2. Results that bridge the gap between PODS and POPL/SIGGRAPH/ASPLOS work.
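
As one concrete instance of the cardinality bounds listed above: for the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c), the AGM bound gives |Q| ≤ sqrt(|R|·|S|·|T|). The Scala sketch below checks this on a small made-up graph. The names are illustrative and the join is a naive nested loop, not a worst-case optimal algorithm.

```scala
object AgmSketch {
  type Rel = Set[(Int, Int)]

  // Triangle query, evaluated naively: join R and S on b, then check T for (a, c).
  def triangles(r: Rel, s: Rel, t: Rel): Set[(Int, Int, Int)] =
    for { (a, b) <- r; (b2, c) <- s; if b == b2; if t.contains((a, c)) } yield (a, b, c)

  // AGM bound for the triangle query: sqrt(|R| * |S| * |T|).
  def agmBound(r: Rel, s: Rel, t: Rel): Double =
    math.sqrt(r.size.toDouble * s.size * t.size)

  // Example input: all ordered pairs of distinct nodes among four nodes (12 edges).
  val nodes: Set[Int] = Set(0, 1, 2, 3)
  val e: Rel = for { a <- nodes; b <- nodes; if a != b } yield (a, b)

  val result = triangles(e, e, e) // 4 * 3 * 2 = 24 ordered triangles
  val bound  = agmBound(e, e, e)  // sqrt(12^3), about 41.6
}
```

A pairwise join plan can materialize an intermediate result larger than this bound on some inputs, which is the gap that worst-case optimal joins close.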

Summary
• Try not to miss the DSL boat.
• If this advice is useful to you, you ultimately have Georg to thank for it :-)