Date posted: 24-Jan-2018
Category: Technology
Uploaded by: papisio
Shortening the time from analysis to deployment with ML-as-a-Service
TEVEC Systems
Luiz Augusto Canito Gallego de Andrade, Gabriel de Bodt Sivieri
Time Series Forecasting
Brazil's GDP
Industrial Capacity
Sales
What will sales be like in the coming periods?
Time Series Forecasting
Some strategies to deal with the problem
Embedding strategy
Feature engineering strategy
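The slides name the two strategies without defining them. As a minimal sketch, assuming "embedding" means a time-delay embedding (each row holds the previous `n_lags` observations) and "feature engineering" means adding hand-built columns such as the month of the forecast period, the two could look like this. The function names and the monthly sales figures are hypothetical and not taken from TEVEC's code.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (window of n_lags past values, next value) pairs."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    windows, targets = [], []
    for t in range(n_lags, len(series)):
        windows.append(list(series[t - n_lags:t]))  # the n_lags values before period t
        targets.append(series[t])                   # the value to predict
    return windows, targets


def month_feature(start_month: int, n_lags: int, n_rows: int) -> List[int]:
    """Month (1-12) of each target period, assuming monthly data that starts at start_month."""
    return [((start_month - 1 + n_lags + i) % 12) + 1 for i in range(n_rows)]


# Hypothetical monthly sales starting in January.
sales = [10.0, 12.0, 13.0, 15.0, 18.0]
X, y = delay_embed(sales, n_lags=2)
months = month_feature(start_month=1, n_lags=2, n_rows=len(y))

# Engineered features are simply appended to the embedded rows.
rows = [window + [month] for window, month in zip(X, months)]
```

Any regression model can then be trained on `rows` and `y`; the point of the sketch is only that both strategies end in the same tabular shape.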
API Customer Story
1. The customer needs insights about their data and wants to build value upon their database.
2. The Data Science team comes on the scene to crunch data and deliver powerful models and insights about the customer's data.
3. The customer is thrilled with the results and is eager to deploy this newly acquired knowledge in their business processes.
4. What are the requirements?
Customer Side / Consulting Side
API Customer Story
API Service Level
Cloud Standards
Improved accuracy over time
Fresh insights to increase value
Code standards and release workflow
New variables from public sources
Some objectives/requirements are extremely software related
Others are Data Science related
Machine Learning as a Service
Focus Groups strategy
Focus Group 1, Focus Group 2, Focus Group 3, Focus Group 4
Collaboration is hard
Problems are solved locally
Problem oriented
There is no long-term strategy
Machine Learning as a Service
Product-Oriented Strategy
Limited API problem range
Software problems become the focus
"Distance from data"
"One size fits all"
Software engineering
Customer service
Data Science
User Experience
Our view of the matter
[Architecture diagram. Labels: Experimentation framework; Commonly used frameworks and APIs; Model 1, Model 2, Model 3, Model 4; Pipelines; Document-based database; Trained models; Production structure (REST); Continuous Data Science]
What's a pipeline?
[Diagram: several Node boxes connected to a Target]
By combining effective software architecture and state-of-the-art ML and DS tools, we are able to quickly test and deploy fresh pipelines for different problems.
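The node-and-target picture can be sketched in a few lines of Python. This is an illustration under assumptions, not TEVEC's pipelining framework: the `Node` and `Pipeline` classes, the `fill_missing` and `moving_average` steps, and the input data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence


class Node:
    """One pipeline step: a named function plus the names of the results it reads."""

    def __init__(self, name: str, func: Callable[..., object], inputs: List[str]):
        self.name = name
        self.func = func
        self.inputs = inputs


class Pipeline:
    """Runs nodes in the order given and returns the result named as the target."""

    def __init__(self, nodes: List[Node], target: str):
        self.nodes = nodes
        self.target = target

    def run(self, **sources: object) -> object:
        results: Dict[str, object] = dict(sources)  # raw inputs are addressable by name
        for node in self.nodes:
            args = [results[name] for name in node.inputs]
            results[node.name] = node.func(*args)
        return results[self.target]


def fill_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Forward-fill None values; a leading None becomes 0.0."""
    filled, last = [], 0.0
    for value in series:
        last = last if value is None else value
        filled.append(last)
    return filled


def moving_average(series: Sequence[float], window: int = 3) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    tail = series[-window:]
    return sum(tail) / len(tail)


pipeline = Pipeline(
    nodes=[
        Node("clean", fill_missing, inputs=["raw"]),
        Node("forecast", moving_average, inputs=["clean"]),
    ],
    target="forecast",
)
prediction = pipeline.run(raw=[10.0, None, 14.0, 12.0])
```

Because every node only declares a name, a function, and its inputs, swapping one model for another means replacing a single node while the interface seen by production stays the same.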
Experimenting (Agile Data Science)
ML Engineering: Run Accuracy Report
Data Science: Subsample datasets to focus on an improvement
Data Science: Design new models in small/medium-scale testing
Focus on business metrics (MAPE, ROC). Secondary use of "math" metrics such as RMSE or LogLoss
Accuracy is reported based on production forecasts versus updated information
Cluster accuracy by dataset theme or key statistical metrics
Use of TEVEC's pipelining framework for quick model design
Prototype using small-scale testing in a console application (JupyterHub)
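To make the accuracy report concrete, here is a minimal sketch that scores production forecasts against the values observed later and groups the result by dataset theme. Only MAPE and RMSE are shown; ROC and LogLoss apply to classification problems and are left out. The record layout, the theme names, and the numbers are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Periods with actual == 0 are skipped."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_report(records: Iterable[Tuple[str, float, float]]) -> Dict[str, Dict[str, float]]:
    """Group (theme, actual, forecast) records by theme and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for theme, actual, forecast in records:
        grouped[theme][0].append(actual)
        grouped[theme][1].append(forecast)
    return {
        theme: {"mape": mape(actuals, forecasts), "rmse": rmse(actuals, forecasts)}
        for theme, (actuals, forecasts) in grouped.items()
    }


# Hypothetical production forecasts compared with the values observed afterwards.
records = [
    ("retail", 100.0, 90.0),
    ("retail", 200.0, 220.0),
    ("industry", 50.0, 50.0),
]
report = accuracy_report(records)
```

MAPE reads directly as "we were off by about N percent", which is why it works as a business metric, while RMSE stays in the units of the series and penalizes large misses more heavily.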
Data Science / ML Engineering: Large-scale testing on the production framework using production data
ML Engineering: Push pipelines to production and monitor operations
Business Decision: Analyze the accuracy report and decide whether to push to production
The experiment structure is an actual document in TEVEC's ODM data structure
An experiment connects with pipelines and applies them to a sequence of datasets
A/B testing compares performance in the same format as the Accuracy Report
Business has business-like inputs to decide and to communicate expected results to the customer
The new pipeline was validated throughout the whole experiment. It is safe to push to production.
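The experiment-as-document idea can be sketched as a plain record that holds the candidate pipelines, the datasets they are applied to, and the resulting errors, so that an A/B comparison is just a read over the stored results. This is a hypothetical stand-in: the `Experiment` class, its fields, the hold-out scheme, and the sample series are assumptions, and TEVEC's ODM schema is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

Forecaster = Callable[[Sequence[float]], float]


@dataclass
class Experiment:
    """Hypothetical experiment record: candidate pipelines, datasets, and results."""

    name: str
    pipelines: Dict[str, Forecaster]
    datasets: Dict[str, Sequence[float]]
    results: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def run(self) -> Dict[str, Dict[str, float]]:
        """Hold out the last point of each dataset and store the absolute percentage error.

        Assumes the held-out value is non-zero.
        """
        for label, pipeline in self.pipelines.items():
            errors = {}
            for dataset_name, series in self.datasets.items():
                history, actual = series[:-1], series[-1]
                forecast = pipeline(history)
                errors[dataset_name] = 100.0 * abs(actual - forecast) / abs(actual)
            self.results[label] = errors
        return self.results

    def winner(self) -> str:
        """Label of the pipeline with the lowest mean error; call run() first."""
        def mean_error(label: str) -> float:
            errors = list(self.results[label].values())
            return sum(errors) / len(errors)

        return min(self.results, key=mean_error)


def last_value(history: Sequence[float]) -> float:
    """Baseline A: repeat the most recent observation."""
    return history[-1]


def mean_of_last_three(history: Sequence[float]) -> float:
    """Candidate B: average of the last three observations."""
    tail = history[-3:]
    return sum(tail) / len(tail)


experiment = Experiment(
    name="ab-test-sketch",
    pipelines={"A": last_value, "B": mean_of_last_three},
    datasets={"store_a": [10.0, 12.0, 14.0, 16.0], "store_b": [20.0, 20.0, 20.0, 20.0]},
)
results = experiment.run()
best = experiment.winner()
```

Storing results per pipeline and per dataset keeps the A/B output in the same shape as an accuracy report, which is what lets the business side compare the two with the inputs it already knows.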
We try to repeat the cycle every week
Large-scale experimenting is an inherent part of the system.
Conclusions
We achieved process stability once we separated our Data Science team from the Production Software Ecosystem.
Through collaboration between the Data Science team and the ML Engineers, we were able to design a continuous experimentation process.
Caring about standards and interfaces at the experimentation stage saves time in deployment. It also reduces the risk of unexpected errors in production.
The pipeline structure uses state-of-the-art packages and frameworks while enforcing interfaces and software architecture, not coding standards. This saves time to focus on Data Science.
We are still learning from this new "continuous" DS process, but so far we have had excellent results in growing the team and incrementally improving our software.
Luiz Augusto Canito Gallego de Andrade, +55 (11) 9 [email protected]
Gabriel Sivieri, +55 (11) 9 [email protected]