The Myria Big Data Management and Analytics System and Cloud Service

Jingjing Wang, Tobin Baker, Magdalena Balazinska, Daniel Halperin, Brandon Haynes, Bill Howe, Dylan Hutchison, Shrainik Jain, Ryan Maas, Parmita Mehta, Dominik Moritz, Brandon Myers, Jennifer Ortiz, Dan Suciu, Andrew Whitaker, Shengliang Xu

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, UNIVERSITY OF WASHINGTON
http://myria.cs.washington.edu
Transcript

Slides: cidrdb.org/cidr2017/slides/p37-wang-cidr17-slides.pdf

Page 1: Title slide

Page 2: Acknowledgments

The Myria Team!

Our science collaborators!!
• Andrew Connolly, Tom Quinn, Sarah Loebman, Ariel Rokem, Ginger Armbrust, Yejin Choi

Our sponsors!!!
• National Science Foundation, Moore & Sloan Foundations, Washington Research Foundation, eScience Institute, ISTC Big Data, Petrobras, EMC, Amazon, and Facebook

Magdalena Balazinska - University of Washington

Page 3: Big Data

[Diagram labels: Big Data, Management, Analytics, Efficient, Easy, Science Apps]

Page 4: Goals of the Myria stack

• Advance state-of-the-art in big data systems
• Focus on efficiency and productivity
• Test on real applications and support real users

Deliverables:
• Built a new big data mgmt & analytics system
• Deployed and operate Myria as a service
• Source code and demo service: http://myria.cs.washington.edu

Page 5

Myria has been developed and is operated by
• Database Group in the Computer Science & Engineering Department at UW
• UW eScience Institute

Co-PIs: Dan Suciu and Bill Howe

Page 6: Myria Demo

Page 7: Myria Cloud Service

Service available through project website

Page 8: Analysis in the Browser with Myria

Declarative-imperative analysis with MyriaL and Python

Page 9: Myria Operates Directly on Data in S3

For efficient processing, caches query results internally in cluster

Page 10: MyriaL is Imperative + Declarative with Iterations

Page 11: Myria Provides Details of Query Execution

Page 12: Myria Service includes Jupyter Notebook

Jupyter notebook available directly with Myria service

Page 13: Myria Supports Python User-Defined Functions

Data from the Human Connectome project

MRI data analysis

Python UDFs enable running legacy code and complex analytics beyond SQL/MyriaL

Page 14: Users Can Deploy Own Service

pip install myria-cluster

myria-cluster create [OPTIONS] CLUSTER_NAME

myria-cluster stop/start/destroy […]

Page 15: Example Myria Applications

Application areas: Neuroscience, Astronomy, Natural Language Processing, Oceanography, Bibliometrics

Figure captions: Picture from Leila Zilles; MyMergerTree screenshot; Data from the Human Connectome project

[Figure: oceanography scatter plots with log-scale axis ticks from 10^0 to 10^4; axis labels FSC and RED fluorescence; panel labels include ps3.fcs…subset and P35-surf; populations labeled Picoplankton, Nanoplankton, Ultraplankton, Phytoplankton, Prochlorococcus]

Page 16: Myria Internals

Page 17: Myria Polystore Stack

[Stack diagram]
• Browser, Python/Jupyter, Specialized Services (MyMergerTree)
• RACO: Query Translation, Optimization, and Orchestration
• MyriaX (Parallel, Iterative, and Elastic Query Execution), MPI, SciDB, Graphs, NoSQL

Page 18: Myria's Data Model and Query Interface

• Relational Algebra Compiler (RACO)
  – Myria's query optimizer and federator
• RACO core: relational algebra extended with
  – Iterations for multi-pass algorithms
  – Flatmap to explode non-1NF attribute values into many tuples
  – Stateful apply for windowed and neighborhood functions
• Query language: MyriaL (Imperative + Declarative)
  – Each statement is declarative (SQL, comprehensions, function calls)
  – Statements are combined with imperative constructs
    • Variable assignment
    • Iteration
• Python UDFs/UDAs
  – Minimize barriers to adoption and run legacy code
• Python API
  – Fluent API with Python lambda functions
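The two RACO extensions named on this slide, flatmap and stateful apply, can be illustrated with a small Python sketch over lists of tuples. This is not RACO or Myria code: the function names `flatmap` and `stateful_apply`, their signatures, and the sample data are all assumptions made for illustration.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]


def flatmap(rows: Iterable[Row],
            explode: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Emit zero or more output tuples for each input tuple."""
    for row in rows:
        yield from explode(row)


def stateful_apply(rows: Iterable[Row], init: Any,
                   step: Callable[[Any, Row], Tuple[Any, Row]]) -> Iterator[Row]:
    """Thread a state value through the rows in order.

    `step` takes (state, row) and returns (new_state, output_row).
    """
    state = init
    for row in rows:
        state, out = step(state, row)
        yield out


# A non-1NF attribute (a whole string of words) exploded into one tuple per word.
docs = [(1, "big data"), (2, "myria")]
words = list(flatmap(docs, lambda r: ((r[0], w) for w in r[1].split())))


# A running mean as an example of a function that needs state across tuples.
def running_mean(state, row):
    count, total = state
    count, total = count + 1, total + row[1]
    return (count, total), (row[0], total / count)


readings = [(1, 10.0), (2, 20.0), (3, 60.0)]
means = list(stateful_apply(readings, (0, 0.0), running_mean))

print(words)  # [(1, 'big'), (1, 'data'), (2, 'myria')]
print(means)  # [(1, 10.0), (2, 15.0), (3, 30.0)]
```

In a parallel engine the state would presumably be kept per partition; the sketch keeps a single state to stay short.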

Page 19: Polystore Optimization

• Rule-based opt. with three types of rules
  – Optimize logical Myria algebra plans
  – Translate logical plans into back-end specific physical plans
  – Optimize back-end specific physical plans
• To add a new back-end, developer must specify
  – Tree representation of query language
  – Rules that translate Myria algebra into this representation
  – Administrative functions including one to submit queries
• Data model independence
  – Myria hides the existence of various back-ends
  – Users write MyriaL scripts assuming relational model
  – Back-ends include select array, graph, and key-value systems
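The three kinds of rules listed above can be sketched as three small tree rewrites in Python. Every class and function name here (`Scan`, `Select`, `BackendScan`, `merge_selects`, and so on) is hypothetical and chosen for illustration; the sketch does not reproduce RACO's actual rule set or plan representation.

```python
from dataclasses import dataclass


# Logical algebra (hypothetical, minimal).
@dataclass(frozen=True)
class Scan:
    relation: str


@dataclass(frozen=True)
class Select:
    predicate: str
    child: object


# Physical operators of one imaginary back-end.
@dataclass(frozen=True)
class BackendScan:
    relation: str


@dataclass(frozen=True)
class BackendFilter:
    predicate: str
    child: object


@dataclass(frozen=True)
class BackendFilteredScan:
    relation: str
    predicate: str


def merge_selects(plan):
    """Rule type 1 (logical): collapse stacked selections into one."""
    if isinstance(plan, Select):
        child = merge_selects(plan.child)
        if isinstance(child, Select):
            merged = f"({plan.predicate}) AND ({child.predicate})"
            return Select(merged, child.child)
        return Select(plan.predicate, child)
    return plan


def translate(plan):
    """Rule type 2 (translation): map logical operators to back-end operators."""
    if isinstance(plan, Scan):
        return BackendScan(plan.relation)
    if isinstance(plan, Select):
        return BackendFilter(plan.predicate, translate(plan.child))
    raise TypeError(f"no translation rule for {type(plan).__name__}")


def fuse_filter_into_scan(plan):
    """Rule type 3 (physical): push a filter into the scan beneath it."""
    if isinstance(plan, BackendFilter):
        child = fuse_filter_into_scan(plan.child)
        if isinstance(child, BackendScan):
            return BackendFilteredScan(child.relation, plan.predicate)
        return BackendFilter(plan.predicate, child)
    return plan


def optimize(plan):
    """Apply the three rule types in order."""
    return fuse_filter_into_scan(translate(merge_selects(plan)))


logical = Select("a > 1", Select("b < 5", Scan("R")))
physical = optimize(logical)
print(physical)
```

Supporting another back-end in this sketch would mean adding its operator classes and a `translate` function for them, which mirrors the slide's list of what a developer must specify.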

Page 20: Federated Query Execution

Federated plans require fast data movement

[Diagram: a Source DBMS with Worker 1 … Worker n and a Target DBMS with Worker 1 … Worker m; queries are submitted by the User (or by the User or Opt.); steps are numbered [1] to [4]]

t = scan(data)
x = distances(t,t)
export(x,'db://Target')

x = import('db://Source')
u = cluster(x)

Worker Directory
source.w1 → target.wm
source.wn → target.w1
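The worker directory in the diagram pairs source workers with target workers so that data can move directly between them. Below is a minimal Python sketch of that idea. The round-robin pairing policy, the function names `build_worker_directory` and `route`, and the sample tuples are assumptions; the slide does not say how Myria chooses the pairing.

```python
from collections import defaultdict


def build_worker_directory(source_workers, target_workers):
    """Assign each source worker a target worker (round-robin: an assumption)."""
    return {s: target_workers[i % len(target_workers)]
            for i, s in enumerate(source_workers)}


def route(partitions, directory):
    """Group each source worker's tuples under the target worker it streams to."""
    received = defaultdict(list)
    for source, rows in partitions.items():
        received[directory[source]].extend(rows)
    return dict(received)


directory = build_worker_directory(
    ["source.w1", "source.w2", "source.w3"],
    ["target.w1", "target.w2"],
)

# Each source worker holds one partition of the exported relation x.
partitions = {
    "source.w1": [(1, 0.5)],
    "source.w2": [(2, 1.5)],
    "source.w3": [(3, 2.5)],
}
received = route(partitions, directory)
print(directory)
print(received)
```

Pairing workers directly avoids funnelling all tuples through a single coordinator, which is the point the slide makes about fast data movement.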

Page 21: Data Movement with PipeGen

[Diagram: PipeGen takes DBMS bytecode and unit tests and produces a PipeGen-enabled DBMS (DBMS with optimized data pipe)]

• IORedirect: I/O Redirector
  – Instrument Unit Tests
  – Identify File Open Expressions
  – Inject Conditional Redirection
• FormOpt: Format Optimizer
  – Instrument Unit Tests
  – Data Flow Analysis
  – Type Substitution
  – Data Pipe Type, Augmented Types
• PipeVerify: Verification

PipeGen: Data Pipe Generator for Hybrid Analytics. Brandon Haynes, Alvin Cheung, and Magdalena Balazinska. SOCC 2016.

Page 22: PipeGen's Performance

16-node cluster with 16 workers/tasks
Transfer 10^9 tuples with 4 ints and 3 doubles

Page 23: Myria Polystore Stack

[Stack diagram, same as Page 17]

Page 24: MyriaX Engine and Cloud Deployment

[Architecture diagram]
• JSON query plans & API calls
• Coordinator with REST Interface
• Workers, each in a YARN Container, on Amazon EC2 Instances
• RDBMS instances
• Storage: HDFS, Amazon EBS Volumes and/or Local Storage, Amazon S3

Page 25: MyriaX Overview

• Data storage
  – Read data from S3, HDFS, local files
  – Parse CSV, TSV, and various scientific file formats
  – Store data in local relational DBMS instances
    • Fast storage with physical tuning (indexing, hash-partitioning)
• Query execution
  – Fundamentally a parallel DBMS
    • Fast, pipelined query execution
  – But scheduling more flexible to support elasticity
  – Novel features: Multiway joins and iterations
• Resource management
  – Executes on top of the YARN resource manager
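Hash-partitioning, mentioned above as a physical tuning option, is what lets a shared-nothing engine run a join with no cross-worker lookups: both inputs are partitioned on the join key, and each worker joins only its own partitions. The Python sketch below simulates this in a single process. All names are illustrative and none of it is MyriaX code.

```python
def hash_partition(rows, key_index, num_workers):
    """Split rows into num_workers lists by hashing the key column."""
    parts = [[] for _ in range(num_workers)]
    for row in rows:
        parts[hash(row[key_index]) % num_workers].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Classic build/probe hash join, run independently on one worker."""
    index = {}
    for l_row in left:
        index.setdefault(l_row[left_key], []).append(l_row)
    return [l_row + r_row
            for r_row in right
            for l_row in index.get(r_row[right_key], [])]


def parallel_join(left, right, left_key, right_key, num_workers):
    """Partition both inputs on the join key, then join partition by partition."""
    left_parts = hash_partition(left, left_key, num_workers)
    right_parts = hash_partition(right, right_key, num_workers)
    out = []
    for l_part, r_part in zip(left_parts, right_parts):
        out.extend(local_hash_join(l_part, r_part, left_key, right_key))
    return out


edges = [(1, 2), (2, 3), (3, 4)]
labels = [(2, "b"), (3, "c"), (5, "e")]

# Join edges.dst (column 1) with labels.id (column 0) across 4 simulated workers.
joined = parallel_join(edges, labels, 1, 0, 4)
print(sorted(joined))  # [(1, 2, 2, 'b'), (2, 3, 3, 'c')]
```

Matching keys always hash to the same partition index, so the partitioned result equals the single-node join regardless of the number of workers.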

Page 26: Efficient Iterative Processing

• User specifies query declaratively
  – Subset of Datalog with aggregation
• Generate efficient, shared-nothing query plan
  – Small extensions to existing shared-nothing systems
• Plan amenable to runtime optimizations
  – Synchronous vs asynchronous
  – Different processing priorities
• Optimizations significantly affect performance

Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. PVLDB 8(12): 1542-1553 (2015)

Page 27: Myria's Optimized Iterations Example

Declarative Query

E = scan(jwang:cc:graph);
V = select distinct E.$0 from E;
do
  CC := [$0, MIN($1)] <-
    [from V emit V.$0 as x, V.$0 as y] +
    [from E, CC where E.$0 = CC.$0 emit E.$1, CC.$1];
until convergence;
store(CC, CC);

// Can have multiple relations
// with recursive dep.

Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. PVLDB 8(12): 1542-1553 (2015)

Compiled to a Distributed Query Plan

[Plan diagram operators: IDBController(CC), Scan(Edges), Join, Scan(Edges)]
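The MyriaL query above computes connected components by repeatedly propagating the minimum label along edges until nothing changes. The single-process Python sketch below mirrors that logic for readers unfamiliar with MyriaL. It evaluates synchronously, one full round at a time, and is not the distributed plan Myria compiles. The function name and the sample graph are assumptions, and the edge list is assumed to contain both directions of every undirected edge.

```python
def connected_components(edges):
    """Return {vertex: smallest vertex id reachable along the edges}."""
    # V = select distinct E.$0 from E
    vertices = {src for src, _ in edges}
    # Base case: every vertex starts with its own id as its label.
    cc = {v: v for v in vertices}
    while True:
        updated = dict(cc)
        # Recursive case: join E with CC on E.$0 = CC.$0 and emit (E.$1, CC.$1),
        # keeping the MIN label per vertex.
        for src, dst in edges:
            label = cc[src]
            if dst not in updated or label < updated[dst]:
                updated[dst] = label
        if updated == cc:  # "until convergence"
            return cc
        cc = updated


# Two components: {1, 2, 3} and {4, 5}; each undirected edge is listed both ways.
graph = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
components = connected_components(graph)
print(components)
```

Because MIN is monotone, applying updates as soon as they arrive instead of once per round reaches the same fixpoint, which is what makes the asynchronous evaluation discussed in the cited paper possible.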

Page 28: Performance Comparison with Spark

Declarative Query (subset of Datalog with agg.)
Shared-Nothing Query Plan
In-Memory Processing
• Synchronous
• Asynchronous
• Prioritize New Data
• Prioritize Base Data

[Chart: Query Time (Seconds), axis from 0 to 250, against # of Workers (8, 16, 32, 64); series: Spark (GraphX), Myria Sync, Myria Async]

Connected Components – Twitter subgraph, 221 million edges and 5 million vertices

Page 29: Myria Polystore Stack

[Stack diagram, same as Page 17]

Page 30: Cloud Operation in Myria

Or point to data in Amazon S3

Page 31: Myria's Personalized Service Level Agreements

[Diagram: Myria's SLA generation. Labels: Schema, Workload Generation, Runtime Prediction, Workload Compression into PSLA (Query Clustering, Template Generation, Cross-Tier Pruning), PSLA]

Changing the Face of Database Cloud Services with Personalized Service Level Agreements. Jennifer Ortiz, Victor T. Almeida, and Magdalena Balazinska. CIDR 2015

Page 32: Myria's PerfEnforce Subsystem

PerfEnforce Demonstration: Data Analytics with Performance Guarantees. Jennifer Ortiz, Brendan Lee, and Magdalena Balazinska. SIGMOD 2016.

Page 33: Myria's PerfEnforce Subsystem

Cluster size changes during query session

PerfEnforce Demonstration: Data Analytics with Performance Guarantees. Jennifer Ortiz, Brendan Lee, and Magdalena Balazinska. SIGMOD 2016.

Page 34: Myria's Innovations Summary

[Summary diagram]
Headings: Myria Polystore; Efficient Processing & Complex Analytics with MyriaX; Myria Cloud Operation
Items: Federated Analytics, Automatic Data Pipes, Image Processing, Perf. Debugging, Cloud PSLAs, Performance Guarantees, Elastic Memory, Efficient Multi-Join, Iterative Queries, Data Summaries

Page 35: Conclusion

• Highly expressive
  – MyriaL (RA + iterations) & Python
• Polystore with hybrid analytics
• High performance on variety of queries
• Available as a service
  – Focus on low barrier to entry
  – And turning users into self-sufficient experts
  – Also focus on the service provider: Operate Myria
• Source code and more info (includes videos) http://myria.cs.washington.edu/

