Date post: 21-May-2020
Pentaho Data Integration Best Architecture Practices Matt Casters Pentaho Chief Architect of Data Integration, Hitachi Vantara

Contents

• Introduction
• General advice
• Specific advice
• Practical examples
• Recap
• Q&A

Introduction: What is “Data Integration Architecture”?

Introduction

• What is “data integration architecture”?
  – High level view on a (potential) DI solution
  – Describes components and their relationships
  – Taking into account all parts
  – Avoiding details without skipping anything

Introduction

• Why do you need an architecture?
  – Solutions get very complex
  – Teams of engineers get large
  – Conscious decisions on use of solution components
  – Holistic views on security, quality, transparency, performance
  – Allows for validation of high level requirements
  – Allows for the creation and validation of scenarios
  – Clearly defines stakeholders

General Advice: Some Pointers in Setting up Solid Architectures for Solid Solutions

General Advice – Don’t Forget the Details…

• Learn the basics of the building blocks…
  – PDI Best Practices #PWorld14
    • Standards, naming, …
  – PDI Best Governance Practices #PWorld15
    • PM, CI, VCS, Testing, …
  – Get expertise for all software components you use

General Advice – Whiteboarding

• Whiteboarding
  – Is done with interested stakeholders
  – Tries to combine knowledge from various parties
  – Allows for quick high level design
  – It is just a starting point!
  – Needs to be followed up and validated against scenarios
  – Forget conviction: be ready to change your mind

General Advice – Scalability

• Parallelize on a high level
  – Aggressive low level parallelization can get you into trouble

• Remember to allow data to flow in swim lanes
  – Parallelization of as much as possible
  – “Sharding” and so on should be architected in

• Identify time window early on, assess HW needs
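A minimal sketch of the swim lane idea, assuming rows can be sharded on a key (the `customer_id` and `amount` fields and the toy validate/aggregate pipeline are hypothetical, not from the slides). Each lane runs the whole pipeline sequentially on its own shard, and only the lanes run in parallel:

```python
from concurrent.futures import ThreadPoolExecutor


def shard(rows, lanes):
    """Assign each row to a lane by key, so a lane owns its keys end to end."""
    buckets = [[] for _ in range(lanes)]
    for row in rows:
        buckets[row["customer_id"] % lanes].append(row)
    return buckets


def run_lane(rows):
    """One swim lane: the whole (toy) pipeline runs sequentially inside it."""
    cleaned = [r for r in rows if r["amount"] >= 0]   # validate
    return sum(r["amount"] for r in cleaned)          # aggregate


def run_all(rows, lanes=4):
    """Parallelize on a high level: one worker per lane, nothing shared."""
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(run_lane, shard(rows, lanes)))


rows = [{"customer_id": i, "amount": i * 10} for i in range(8)]
lane_totals = run_all(rows, lanes=4)
print(lane_totals)
```

Because no lane depends on another, the number of lanes becomes a sizing decision rather than a code change.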

General Advice – Transparency

• Great complexity requires transparency
  – Something will always go wrong
  – At the worst possible time

• As a rule:
  – Always trace data moving between parts of architecture
  – When in doubt: add more logging, tracking and tracing

• Use components in architecture that allow for monitoring
  – Prefer servers that allow you to see what’s going on
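One way to trace data moving between parts is to record row counts and timings at every hand-off. The sketch below is an illustration only: the `traced` helper, the stage names and the in-memory `trace` list are assumptions, and a real setup would write to a log table or a monitoring system instead.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("di.trace")

trace = []  # in-memory audit trail, for illustration only


def traced(stage, func, rows):
    """Run one stage and record rows in, rows out and elapsed seconds."""
    start = time.perf_counter()
    out = func(rows)
    record = {"stage": stage, "rows_in": len(rows), "rows_out": len(out),
              "seconds": time.perf_counter() - start}
    trace.append(record)
    log.info("stage=%s rows_in=%d rows_out=%d seconds=%.4f",
             stage, record["rows_in"], record["rows_out"], record["seconds"])
    return out


def drop_nulls(rows):
    return [r for r in rows if r.get("value") is not None]


def double(rows):
    return [{**r, "value": r["value"] * 2} for r in rows]


source = [{"value": 1}, {"value": None}, {"value": 3}]
result = traced("double", double, traced("drop_nulls", drop_nulls, source))
```

A mismatch between one stage's `rows_out` and the next stage's `rows_in` then points directly at the hand-off where data went missing.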

General Advice – Predictability

• Enormous workloads, such as batch jobs, put systems under stress

• Batches tend to grow bigger over time, causing more stress

• As a rule:
  – If at all possible, use micro-batching
  – Chop up 1 large nightly workload into hundreds of small ones throughout the day

• Advantages:
  – More frequent updates
  – Predictable workload
  – Fail early scenario: problems are detected earlier
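A minimal sketch of micro-batching, assuming the workload can be cut into independent chunks. The `load` step and the sample records are hypothetical; the point is that a bad record fails one small batch early instead of one large nightly run late.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive fixed-size batches from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load(batch):
    """Hypothetical load step that rejects a batch containing a bad record."""
    for rec in batch:
        if rec < 0:
            raise ValueError(f"bad record {rec}")
    return len(batch)


records = [1, 2, 3, 4, 5, -1, 7, 8, 9, 10]
loaded = 0
failed_batches = []
for number, batch in enumerate(micro_batches(records, batch_size=4)):
    try:
        loaded += load(batch)
    except ValueError:
        failed_batches.append(number)  # only this small batch needs a rerun
```

Here the second batch fails while the other two load normally, so the rerun covers four records rather than the whole set.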

Specific Advice: Advice for IoT and Others

Specific Advice – Hadoop

• Hadoop has itself become an ecosystem of software

• Select the software in the ecosystem to fit your ideal architecture

• Only select properly supported components, avoid bleeding edge

• Combat lack of transparency with extensive logging

• Follow the right sizing for your architecture, balance correctly

• Use it as a scalable part, not just as a “Database”

Specific Advice – IoT

• IoT is messy
  – Data quality varying
  – Data connectivity problems
  – Late arriving data
  – Flash-floods of data (low predictability)
  – High complexity
  – Varying data formats and versions
  – Number of different devices can be high

Hitachi Vantara IoT Offerings

[Diagram. Section labels: Connected Things, Edge, Core Analytics, Foundry. Other labels: Operational Insights, Asset Intelligence, Maintenance Optimization, Manufacturing Optimization, Asset Avatar State, Data Collection, Asset Management, Asset Avatar, Artificial Intelligence, Batch/Stream/Analytics, Data Blending/Orchestration, Asset Integration, Edge Analytics, Data Filtering, Data Transformation, Dashboard, Alerts/Notifications, Application Enablement]

Specific Advice – IoT

• Plan ahead for failure

• Use modern techniques like Metadata Injection

• Make extensive use of queues in any format

• Assume that things will go wrong in every scenario

• Design the architecture to cope with failures

• Design the architecture to report on statistics
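The points on queues, failures and statistics can be sketched together. This is an illustration under assumptions: Python's in-process `queue.Queue` stands in for a real message broker, and the message layout (`device`, `temp`) is invented for the example.

```python
import json
import queue
from collections import Counter

inbox = queue.Queue()         # stands in for a message broker
dead_letters = queue.Queue()  # messages that could not be processed
stats = Counter()             # counters to report on afterwards


def parse(raw):
    """Parse one device message; raises on malformed or incomplete input."""
    msg = json.loads(raw)
    return {"device": msg["device"], "temp": float(msg["temp"])}


def drain(source):
    """Process everything queued; one bad message never stops the rest."""
    good = []
    while True:
        try:
            raw = source.get_nowait()
        except queue.Empty:
            return good
        stats["received"] += 1
        try:
            good.append(parse(raw))
            stats["ok"] += 1
        except (ValueError, KeyError, TypeError) as exc:
            dead_letters.put((raw, repr(exc)))  # keep it for later inspection
            stats["failed"] += 1


for raw in ['{"device": "a1", "temp": "21.5"}', "not json", '{"device": "b2"}']:
    inbox.put(raw)
readings = drain(inbox)
print(dict(stats))
```

Failed messages are kept with their error rather than dropped, so the failure rate can be reported and the messages replayed once the cause is fixed.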

Practical Examples: War Stories from the Field

Examples – Large Services Vendor

• Moving large amounts of small data packets around

• Picked the right tools, didn’t pick an overall architecture

• Different teams “working together” in different countries

• Architecture became secondary to the overall solution

• Technology was selected, not architecture

Examples – Large Services Vendor

• Carte servers got hammered thousands of times per second
  – Use of a specific scheduler was mandated
  – Running out of sockets, HTTP server buckling under the load

• Complaints about PDI startup times

• Overall performance too low

• Services called in to solve “critical” issues in our software

Examples – Large Services Vendor

• Don’t allow internal organizational needs to drive the architecture

• Don’t allow technology choices to drive architecture
  – And if you do, handle the implications

• To scale and ramp up performance, always queue and handle queued tasks intelligently (not one at a time, for example)

• The performance of the whole is determined by the slowest link
  – Consider this up-front in the architecture
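Handling queued tasks "not one at a time" can be as simple as draining the queue in batches, so that per-call overhead (a connection, a startup, a commit) is paid once per batch. A minimal sketch, assuming an in-process queue and a hypothetical bulk handler `process`:

```python
import queue


def take_batch(tasks, max_size):
    """Remove up to max_size tasks that are already waiting, without blocking."""
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(tasks.get_nowait())
        except queue.Empty:
            break
    return batch


def process(batch):
    """Hypothetical bulk handler: one call per batch instead of one per task."""
    return len(batch)


tasks = queue.Queue()
for i in range(2500):
    tasks.put(i)

calls = 0
handled = 0
while True:
    batch = take_batch(tasks, max_size=1000)
    if not batch:
        break
    handled += process(batch)
    calls += 1

print(calls, handled)
```

In this toy run 2500 tasks are handled in 3 calls instead of 2500, which is the kind of reduction that keeps a server from being hit thousands of times per second.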

Examples – Handling TV Set-top Data

• Periodic in nature, handling clicks

• Reading from MQTT, dumping data into Oracle for analysis

• Reported PDI performance trouble, services called in

• Small scale test, predicted ten-fold increase in size, already in trouble

Examples – Handling TV Set-top Data

• MQTT: great for queuing and IoT

• Not always possible to read in parallel from queues!

• Oracle is an RDBMS, kills parallelism in architecture

Examples – Handling TV Set-top Data

• Consider partitioning large amounts of clients

• Consider data extraction for any data storage mechanism

Examples – Big Bank

• Processed a gazillion records every night

• Had a batch window of 2 hours

• Got a monster computer to do the job with 64 cores

• Ran complex data quality validations in PDI, hundreds of steps

• Got into a performance problem

• Needed extensive performance tuning

Examples – Big Bank

[Diagram: Pick 2 of Good, Fast, Cheap]

Examples – Big Bank

[Diagram: Pick 2 of Lots of work, In batch window, On 1 server]

Examples – Big Bank

• Consider up-front whether HW choices will pin you down later

• Weigh the importance of specific requirements into the architecture
  – Time vs complexity vs hardware in this case
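A back-of-the-envelope check makes the time vs hardware trade-off concrete. Only the 2 hour window and the 64 cores come from the slides; the record volume and the per-core throughput below are assumed numbers, and perfect scaling across cores is an optimistic simplification.

```python
import math


def cores_needed(records, window_seconds, rows_per_second_per_core):
    """Minimum cores needed if the work scaled perfectly across cores."""
    required_rate = records / window_seconds
    return math.ceil(required_rate / rows_per_second_per_core)


WINDOW = 2 * 60 * 60        # 2 hour batch window, in seconds
AVAILABLE_CORES = 64        # the single large server

records = 2_000_000_000     # assumed nightly volume
per_core = 5_000            # assumed rows per second per core

needed = cores_needed(records, WINDOW, per_core)
needed_next = cores_needed(records * 12 // 10, WINDOW, per_core)  # 20% growth

print(needed, needed <= AVAILABLE_CORES)
print(needed_next, needed_next <= AVAILABLE_CORES)
```

With these assumed numbers the job fits on the server today but no longer fits after 20% growth, which is how a fixed hardware choice can pin the solution down later.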

Recap: PDI Best Architecture Practices

Recap

• Make an architecture up-front, not as part of the documentation

• Be critical

• Be detailed

• Run scenarios against it

• Be ready to change your mind

• Get stakeholders involved

• Use PDI: Pessimistic Data Integration

Questions and Discussion
