Big Data is also Big Compute
Better Price Forecasting using Machine Learning
Avishkar Misra, PhD, [email protected], Big Data & Analytics Platform Team
Abstract
Oracle Big Data Appliance and Oracle Big Data Cloud Service offer a fantastic computation platform to scale up experimentation and speed up time to insight. The Oracle Big Data & Analytics Platform team worked with a leading North American commodity producer in 2016 to help improve the granularity and accuracy of their prediction for the price of the commodity. An accurate price forecast model has a potential benefit of $25.6M over a 3-year period for the commodity producer. We used Python and PySpark running on Oracle Big Data Appliance to try out 39,936 modeling combinations in a matter of minutes to find a better algorithm. The more accurate algorithm was able to predict the price within +/-5% of the actual price 73% of the time, compared to 40% for the algorithm the customer had developed and fine-tuned over many years. Oracle's Big Data platforms offer a way to dramatically scale the number of experiments, helping scientists and analysts quickly narrow their focus to the most relevant and effective techniques to impact their business's bottom line.
Blog Post: https://a-misra.com/2016/08/01/big-data-is-also-big-compute/
Big Data & Data Science Advisory Services
Customer engagements that bring industry experience and best practices to demonstrate and prototype solutions for customers in line with their strategic business goals.
1 Business Value Definition
2 Solution Architecture
3 Data Science & Machine Learning
4 Analytics Design Hub
Team skill sets:
• Data Scientists
• Data Wranglers
• Architects
• Business Analysts
Team backgrounds from:
• Amazon
• MSFT
• Deloitte
• IBM
• Teradata
• etc.
Define, design, develop and demonstrate the value of a solution quickly.
Executive Commitment
Discovery Session
Executive Readout
Deploy
Business Case
Analytic Sprint
Architecture & Roadmap
4-6 weeks
1 Agriculture: aerial image analysis
2 Utilities: fraud detection
Tree
Big Data Analytics Sprint
A large North American lumber producer looking to improve weekly price forecasts
Today: Commodity Price Forecasting
Spot market traders speculated on the weekly commodity price using:
• Prior week's market price
• 90-day forecast, tuned over 10 years
• Gut instinct
Decide: sell vs. hold inventory
More accurate price prediction can improve margin.
Lumber, lumber everywhere
LCBQ - KD Western Spruce-Pine-Fir #2 & Btr 2x4 random.
Common benchmark, since it has high correlation with other types of lumber.
Normalize the data for weekly predictions by:
• over-sampling
• aggregation
• sub-sampling
What factors influence the price?
Factor                                          Data frequency
Current Price                                   Weekly (every Friday)
Economic Activity in Construction & Building    Monthly (3-month lag)
Currency Exchange Rates                         Daily
Weather (new)                                   Hourly
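The inputs above arrive at different frequencies, so each has to be brought onto the weekly (Friday) grid before modeling. The sketch below illustrates the three normalization techniques named earlier (aggregation, sub-sampling, over-sampling) using only the Python standard library. The function names, the sample values, and the choice of forward-filling the latest monthly value are assumptions made for illustration, not the team's actual code; daily readings stand in for hourly ones to keep the example short.

```python
from datetime import date, timedelta


def week_ending_friday(d: date) -> date:
    """Return the Friday on or after d (Friday is weekday 4)."""
    return d + timedelta(days=(4 - d.weekday()) % 7)


def aggregate_to_weekly(observations, reducer):
    """Aggregation: collapse many readings per week into one value."""
    buckets = {}
    for d, value in observations:
        buckets.setdefault(week_ending_friday(d), []).append(value)
    return {week: reducer(values) for week, values in sorted(buckets.items())}


def subsample_to_weekly(daily):
    """Sub-sampling: keep only the observation that falls on each Friday."""
    return {d: v for d, v in daily if d.weekday() == 4}


def oversample_to_weekly(monthly, fridays):
    """Over-sampling: repeat the latest known monthly value for each week."""
    out = {}
    for friday in fridays:
        known = [(d, v) for d, v in monthly if d <= friday]
        if known:
            out[friday] = max(known, key=lambda pair: pair[0])[1]
    return out


# Hypothetical sample data (1 January 2016 was a Friday).
fridays = [date(2016, 1, 8), date(2016, 1, 15)]
precipitation = [(date(2016, 1, 4), 1.0), (date(2016, 1, 6), 2.5),
                 (date(2016, 1, 11), 0.5)]
exchange_rate = [(date(2016, 1, 7), 1.41), (date(2016, 1, 8), 1.42),
                 (date(2016, 1, 15), 1.45)]
construction_index = [(date(2015, 12, 1), 100.0), (date(2016, 1, 12), 110.0)]

weekly_precip = aggregate_to_weekly(precipitation, sum)
weekly_fx = subsample_to_weekly(exchange_rate)
weekly_construction = oversample_to_weekly(construction_index, fridays)
```

For a lagged series such as the monthly construction data, the date attached to each value should be the date it became available, so that no week sees a figure that had not yet been published.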
Weather, but for where?
Focus: 59 major population centers, where people live and are likely to renovate and build.
Weather data:
• Weekly Total Precipitation
• Weekly Average Temperature
Evaluation Methodology
Time-series prediction, so traditional k-fold validation cannot be used.
Evaluation period: 2 years.
For 104 weeks:
• Train models for each week.
• Predict for the next week.
• Evaluate prediction accuracy.
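The weekly train, predict, evaluate loop can be sketched as a walk-forward evaluation: each prediction uses only data from earlier weeks, which is what k-fold validation would violate. This is a minimal illustration, not the team's implementation. The `walk_forward` helper, the expanding training window, and the four-week moving-average placeholder model are all assumptions; the slides do not say whether an expanding or a sliding window was used.

```python
def walk_forward(prices, n_eval_weeks, fit, predict):
    """For each evaluation week, train on earlier weeks only, predict that
    week, and record the (prediction, actual) pair."""
    start = len(prices) - n_eval_weeks
    if start < 1:
        raise ValueError("need at least one week of history before evaluation")
    results = []
    for t in range(start, len(prices)):
        model = fit(prices[:t])  # history strictly before week t
        results.append((predict(model), prices[t]))
    return results


# Placeholder model (hypothetical): mean of the last four observed weeks.
def fit_moving_average(history):
    window = history[-4:]
    return sum(window) / len(window)


def predict_moving_average(model):
    return model


weekly_prices = [100, 102, 104, 103, 105, 107]  # made-up values
results = walk_forward(weekly_prices, n_eval_weeks=2,
                       fit=fit_moving_average,
                       predict=predict_moving_average)
```

With two years of weekly data held out, `n_eval_weeks` would be 104, and `fit` and `predict` would wrap whichever pipeline is under test.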
Q: Machine Learning pipeline options. Which do we pick?
Feature Generation (6 options): None, Rolling Window {7, 13, 26, 52}, Polynomial Interactions, All
Feature Normalizer (2 options): None, Min-Max
Feature Selector (2 options): None, PCA
Modeling Objective (2 options): Classifier, Regression
Modeling Algorithm (6 classifiers | 8 regressors):
• Classifiers: Logistic Regression, Decision Trees, Extra Trees, AdaBoost, Gradient Boosting, Random Forest
• Regressors: Linear Regression, Ridge Regression, Lasso Regression, Decision Tree Regressor, AdaBoost Tree Regressor, Extra Tree Regressor, Gradient Boosting Regressor, Random Forest Regressor
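One way to see how quickly these options multiply is to enumerate the grid. The sketch below uses `itertools.product` from the Python standard library. The algorithm names are labels taken from the slide, not library class names, and the six feature-generation entries are placeholders: the slide gives the count of 6 but not how the listed items map to it, so that mapping is left open here.

```python
from itertools import product

# Placeholders: the slide reports 6 feature-generation options.
FEATURE_GENERATION = [f"feature_gen_option_{i}" for i in range(1, 7)]
FEATURE_NORMALIZER = ["none", "min_max"]
FEATURE_SELECTOR = ["none", "pca"]
ALGORITHMS = {
    "classifier": [
        "LogisticRegression", "DecisionTrees", "ExtraTrees",
        "AdaBoost", "GradientBoosting", "RandomForest",
    ],
    "regression": [
        "LinearRegression", "RidgeRegression", "LassoRegression",
        "DecisionTreeRegressor", "AdaBoostTreeRegressor",
        "ExtraTreeRegressor", "GradientBoostingRegressor",
        "RandomForestRegressor",
    ],
}


def enumerate_pipelines():
    """Yield every (objective, generation, normalizer, selector, algorithm)."""
    for objective, algorithms in ALGORITHMS.items():
        for combo in product(FEATURE_GENERATION, FEATURE_NORMALIZER,
                             FEATURE_SELECTOR, algorithms):
            yield (objective, *combo)


pipelines = list(enumerate_pipelines())
n_classifier = sum(1 for p in pipelines if p[0] == "classifier")
n_regression = sum(1 for p in pipelines if p[0] == "regression")
print(n_classifier, n_regression, len(pipelines))
```

Each tuple is an independent experiment, which is what makes the search easy to distribute across a cluster. Under the options as listed, the grid has 144 classifier and 192 regressor pipelines; 144 pipelines evaluated over 104 weeks gives 14,976 classifier fits. The same arithmetic for regressors gives 19,968, so the grid actually run evidently included regressor options beyond those listed on this slide.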
Answer: Experiment!
Oracle Big Data Cloud Service
Data Scientists
"If you double the number of experiments you do per year you're going to double your inventiveness."
– Jeff Bezos
Classification: Can we predict if the price will go up?
Trained and evaluated 14,976 classifiers.
[Chart: classifier performance, ranging from Good to Poor]
Regressors: Can we predict the price?
Trained and evaluated 22,464 regressors.
[Chart: regressor performance, ranging from Poor to Good]
Evaluated 37,440 models in less than 15 minutes!
Takeaways
Classifiers:
• Ensembles are better: Extra Trees, Gradient Boosting, Random Forest, AdaBoost.
• Min-Max normalization of feature values is not useful.
• Principal Component Analysis is very helpful.
Regressors:
• A lot of model over-fitting!
• Extra Tree Regressors work very well.
• Principal Component Analysis is not helpful.
Insights → A better price prediction algorithm
Target is Actual Price +/- 5%
Linear Regression: 40% of the time in target.
Extra-Tree Regression: 73% of the time in target.
Potential margin impact: $19.4m to $25.6m over 3 years
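The "in target" figures are the share of weekly predictions that land within 5% of the actual price. A minimal sketch of that metric follows; the function name `within_tolerance_rate` and the sample numbers are made up for illustration.

```python
def within_tolerance_rate(predictions, actuals, tolerance=0.05):
    """Fraction of predictions within +/- tolerance of the actual value."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("predictions and actuals must be non-empty and aligned")
    hits = sum(
        1 for predicted, actual in zip(predictions, actuals)
        if abs(predicted - actual) <= tolerance * abs(actual)
    )
    return hits / len(actuals)


# Hypothetical weekly prices and forecasts.
actual_prices = [300.0, 310.0, 295.0, 320.0]
forecasts = [305.0, 330.0, 290.0, 300.0]
rate = within_tolerance_rate(forecasts, actual_prices)
print(f"{rate:.0%} of weeks in target")
```

Computed over the 104 evaluation weeks, this is the number that lets a new model be compared directly with the customer's existing forecast.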
Price Forecasting for Commodity Goods
Example: Leading North American Lumber Producer for Builders and Home Improvement Centers
Objective: Improve price forecasting on finished commodity goods to optimize margin dollars across all channels and enhance customer upselling capabilities for traders
Business Challenge / Opportunity
• Sales Reps (spot market traders) relied on prior-week market pricing and long-range (90-day) data forecasts to make a 'gut instinct' decision; weekly price forecasts did not exist
• Business Analysts manually pulled information from a variety of sources into an Excel workbook, often having to retype market trend information from PDFs
• While the current process was deemed helpful for long-range strategic forecasting, traders needed short-term forecasts in order to better price goods and determine whether to sell or hold inventory on a weekly basis
Proposed Solution
• Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator, Oracle R Advanced Analytics for Hadoop, Spatial & Graph, Storage
Analytic Sprint Results
• Demonstrated the inclusion of external data sets such as weather into the mix of data to improve forecast accuracy
• Demonstrated new machine learning techniques and the ability to scale client experimentation with forecast modeling to 37k analytic models in a matter of minutes
• Improved prediction granularity from monthly to weekly, and accuracy to within +/- 5% of the actual market price 73% of the time, compared with 40% for the algorithm used by the client
Potential Benefits
• Over a 3-year horizon, potential incremental gross margin from improved pricing and reduced operating costs ranged from $19.4m to $25.6m in incremental benefits, and a 4,342% ROI
Big Data & Data Science Advisory Services
Get in touch with us!
Email: [email protected]
Blog: http://a-misra.com
Oracle External: https://www.oracle.com/big-data
Oracle Internal: http://bdcoe.us.oracle.com
This work: https://a-misra.com/2016/08/01/big-data-is-also-big-compute/