Shortening the time from analysis to deployment with ml as-a-service — Luiz Andrade and Gabriel De...

Post on 24-Jan-2018

Shortening the time from analysis to deployment with ML-as-a-Service

TEVEC Systems

Luiz Augusto Canito Gallego de Andrade, Gabriel de Bodt Sivieri

Time Series Forecasting

Brazil’s GDP

Industrial Capacity

Sales

What will sales be like in the coming periods?

Some strategies to deal with the problem

Embedding strategy

Feature engineering strategy

API Customer Story

Customer Side / Consulting Side

1. The customer needs insights about their data and wants to build value upon their database.

2. The Data Science team comes onto the scene to crunch data and deliver powerful models and insights about the customer's data.

3. The customer is thrilled with the results and eagerly wants to deploy this newly acquired knowledge in their business processes.

4. What are the requirements?

API Service Level

Cloud Standards

Improved accuracy over time

Fresh insights to increase value

Code Standards and release workflow

New variables from public sources

Some objectives/requirements are extremely software related

Others are Data Science related

Machine Learning as a Service

Focus Groups strategies

Focus Group 1, Focus Group 2, Focus Group 3, Focus Group 4

Collaboration is hard

Problems are solved locally

Problem oriented

There is no long term strategy

Product Oriented Strategy

Limited API problem range

Software problems become focus

“Distance from data”

“One size fits all”

Software engineering

Customer service

Data Science

User Experience

Our view of the matter

Experimentation framework

Commonly used frameworks and APIs

Model 1

Model 2

Model 3

Model 4

Pipelines

Document-based database

Trained models

Production Structure

REST

Continuous Data Science

What’s a pipeline?

[Diagram: five boxes labeled Node and one labeled Target]

By combining effective software architecture and state-of-the-art ML and DS tools, we are able to quickly test and deploy fresh pipelines for different problems.
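To make the idea of nodes feeding a target concrete, here is a minimal sketch of a pipeline as a small dependency graph. It is not TEVEC's framework: the `run_pipeline` helper, the node names and the wiring between nodes are all hypothetical, chosen only to show how a target value could be computed from upstream nodes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical layout: each node is (function, names of its upstream nodes).
# The slides do not show how nodes are wired; this wiring is an assumption.


def run_pipeline(nodes, inputs, target):
    """Evaluate nodes in dependency order and return the target's value."""
    graph = {name: set(deps) for name, (_, deps) in nodes.items()}
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # raw input, nothing to compute
            continue
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results[target]


def moving_average(series, window=3):
    """Average of the last `window` observations."""
    tail = series[-window:]
    return sum(tail) / len(tail)


# Toy forecasting pipeline: clean the series, estimate level and trend,
# then combine them into a one-step-ahead forecast (the target).
nodes = {
    "clean": (lambda raw: [x for x in raw if x is not None], ("raw",)),
    "level": (moving_average, ("clean",)),
    "trend": (lambda s: s[-1] - s[-2], ("clean",)),
    "target": (lambda level, trend: level + trend, ("level", "trend")),
}

forecast = run_pipeline(nodes, {"raw": [10, None, 12, 14, 16]}, "target")
print(forecast)
```

Swapping a node (for example a different "level" estimator) leaves the rest of the graph untouched, which is one plausible reading of why pipelines make it quick to test variants.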

Experimenting (Agile Data Science)

ML Engineering: Run Accuracy Report

Data Science: Subsample datasets to focus on an improvement

Data Science: Design new models with small/medium-scale testing

Focus on business metrics (MAPE, ROC). Secondary use of “math” metrics such as RMSE or LogLoss

Accuracy is reported based on production forecasts versus updated information

Cluster accuracy by dataset theme or key statistical metrics

Use of TEVEC’s pipelining framework for quick model design

Prototype using small-scale testing in a console application (JupyterHub)
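As an illustration of the accuracy report described above, the sketch below computes MAPE and RMSE and groups MAPE by dataset theme. The record layout, the theme names and the `accuracy_report` helper are assumptions made for this example, not the format TEVEC uses.

```python
import math
from collections import defaultdict


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_report(records):
    """Pool (theme, actuals, forecasts) records by theme and report MAPE."""
    by_theme = defaultdict(lambda: ([], []))
    for theme, actual, forecast in records:
        by_theme[theme][0].extend(actual)
        by_theme[theme][1].extend(forecast)
    return {theme: mape(a, f) for theme, (a, f) in by_theme.items()}


# Hypothetical production forecasts compared against updated actuals.
records = [
    ("retail", [100, 200], [110, 180]),
    ("retail", [50], [50]),
    ("industry", [400], [300]),
]
report = accuracy_report(records)
for theme, value in report.items():
    print(f"{theme}: MAPE = {value:.2f}%")
```

MAPE reads directly as a percentage error, which is easier to discuss with a customer than RMSE; that is one likely reason the slides treat it as the primary metric.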

Data Science / ML Engineering: Large-scale testing on production framework using production data

ML Engineering: Push pipelines to production and monitor operations

Business Decision: Analyze the accuracy report and decide to push to production

The experiment structure is an actual document in TEVEC’s ODM data structure

An experiment connects with pipelines and applies them to a sequence of datasets

A/B testing compares performance in the same format as the Accuracy Report

Business has business-like inputs to decide and to communicate expected results to the customer

The new pipeline was validated throughout the whole experiment. It is safe to push to production.
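The experiment flow above can be sketched as a plain document that names two pipelines and a sequence of datasets, plus a runner that reports both pipelines with the same metric. Every field name, pipeline and dataset here is hypothetical; the actual structure of TEVEC's experiment document is not shown in the slides.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals assumed non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical experiment document: which pipelines to compare, on which data.
experiment = {
    "name": "naive-vs-drift",
    "baseline": "naive",
    "candidate": "drift",
    "datasets": ["store_a", "store_b"],
}


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def drift(history, horizon):
    """Extend the average step between the first and last observation."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (h + 1) for h in range(horizon)]


PIPELINES = {"naive": naive, "drift": drift}

# Toy data standing in for production datasets.
DATASETS = {
    "store_a": {"history": [10, 12, 14, 16], "actual": [18, 20]},
    "store_b": {"history": [50, 50, 50, 50], "actual": [50, 40]},
}


def run_experiment(doc, pipelines, datasets):
    """Apply baseline and candidate to each dataset; one report row per dataset."""
    rows = []
    for name in doc["datasets"]:
        data = datasets[name]
        horizon = len(data["actual"])
        row = {"dataset": name}
        for role in ("baseline", "candidate"):
            forecast = pipelines[doc[role]](data["history"], horizon)
            row[role + "_mape"] = mape(data["actual"], forecast)
        rows.append(row)
    return rows


results = run_experiment(experiment, PIPELINES, DATASETS)
for row in results:
    print(row)
```

Because both pipelines are scored on the same datasets with the same metric, the rows can be read side by side, which is the kind of business-like input the decision step relies on.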

We try to repeat the cycle every week

Large-scale experimenting is an inherent part of the system.

Conclusions

We achieved process stability once we separated our Data Science team from the Production Software Ecosystem

Through a collaboration between the Data Science team and ML Engineers, we were able to design a continuous experimentation process

To care about standards and interfaces in the experimentation stage is to save time in deployment. This also reduces the risk of unexpected errors in production

The pipeline structure uses state-of-the-art packages and frameworks while enforcing interfaces and software architecture, not coding standards. This saves time to focus on Data Science

We are still learning from this new “continuous” DS process, but so far we have had excellent results in growing the team and incrementally improving our software
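One way to picture "enforcing interfaces, not coding standards" is an abstract base class that every model must implement before it can enter a pipeline. The `PipelineNode` contract and its `fit`/`predict` methods below are assumptions for illustration, not TEVEC's actual interface.

```python
from abc import ABC, abstractmethod


class PipelineNode(ABC):
    """Hypothetical contract: anything with fit and predict can be deployed."""

    @abstractmethod
    def fit(self, history):
        """Learn from past observations and return self."""

    @abstractmethod
    def predict(self, horizon):
        """Return a list with `horizon` forecasts."""


class LastValue(PipelineNode):
    """Satisfies the contract; its internals are left to the data scientist."""

    def fit(self, history):
        self.last_ = history[-1]
        return self

    def predict(self, horizon):
        return [self.last_] * horizon


class Incomplete(PipelineNode):
    """Forgets predict, so it cannot be instantiated at all."""

    def fit(self, history):
        return self


forecast = LastValue().fit([3, 5, 8]).predict(2)
print(forecast)

try:
    Incomplete()
    rejected = False
except TypeError:
    rejected = True  # the missing method is caught before deployment
print(rejected)
```

The contract constrains only what a node exposes, so a model that breaks it fails during experimentation instead of in production.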

Luiz Augusto Canito Gallego de Andrade, +55 (11) 9 7163-2619, luiz.andrade@tevec.com.br

Gabriel Sivieri, +55 (11) 9 7191-3783, gabriel.sivieri@tevec.com.br