Customer flow forecasting with GBRTpredictiveanalyticsandforecasting.com/wp-content/... · 1-14...

Customer flow forecasting with GBRT: the benefits of adopting a customized machine learning approach

Shaohui Ma, Nanjing Audit University, Nanjing, 211815, China

and Robert Fildes, Lancaster Centre for Marketing Analytics and Forecasting

Department of Management Science, Lancaster University

THE TIME SERIES FORECASTING PROBLEM - THE SEARCH FOR THE HOLY GRAIL

Two distinct problems when choosing a method:

• A single series

• Multiple (related) series

Two solutions:

• Aggregate selection

–The same model/method applied whatever the context

• Individual selection

–The method is selected depending on the series characteristics and the pool of (related) series

FORECASTING COMPETITIONS - WHAT ARE THEY FOR?

The grail: a class of methods which dominates alternatives

• Benchmark how ‘simple’ methods compare to a model class

–E.g. how badly Holt-Winters compares to ARIMA (Newbold and Granger, 1974)

• Evaluate and compare different methods: “allow practitioners to select the most appropriate method(s) for their forecasting needs” (Makridakis et al., 2019)

• Introduce and test new methods

–Machine learning methods introduced in the M3 (2000) competition: one variant, a neural net

–Theta

–Extended to open ML submissions (M4, 2019)

The Characteristics of Forecasting Competitions

1. Specify the population of relevant time series.

2. Define the forecasting task precisely:

–Lead time

–Information set

3. Specify the forecasting methods to be considered:

• Include standard benchmarks.

• Include current practice.

4. Define a range of performance measures (linked to value).

5. Specify the data to be used in training the methods.

6. Calculate error measures and choose the best method.

History of Forecasting Competitions


Each entry lists the contribution, the methods compared, the objectives and comments, and the number of series.

• Nottingham: Reid, Granger (1974). Methods: B-J, ES, AR, combining. Objectives: to assess the loss from using automatic methods; to assess relative performance. Series: 106.

• Makridakis & Hibon (1979). Methods: 13 core methods + seasonal & combining. Objectives: establishing the conditions under which one method outperforms alternatives; help forecasters choose. Series: 111.

• M1 (1982). Methods: 14 core methods, 3 new (damped trend), 2 dropped. Series: 1001.

• Meese, Geweke (1984). Methods: ARIMA, AR, ARARMA. Objectives: examine effects of pre-processing, e.g. detrending, deseasonalizing. Series: 150 macro series.

• M3 (2000). Methods: core methods + Theta + NN + rule based + automatic ARIMA. Objectives: final attempt to settle the accuracy issue. Series: 3003.

• M4 (2019). Methods: as above + ML; 50 individual submissions, 61 in total. Objectives: learn how to improve accuracy, to improve the theory and practice of forecasting; include prediction intervals. Series: 100K.

In addition: competitions on homogeneous data: telecommunications, energy, tourism

The ML results

• In M3 the NN implementation performed poorly

• In Crone et al. (IJF, 2011), NN and machine learning methods performed poorly on a subset of the M3 data (chosen for series length)

• In Makridakis et al. (PLoS One, 2018), ML methods used ‘out-of-the-box’ performed poorly on a subset of M3

–Despite an earlier evaluation which excluded standard benchmarks

• In M4, a hybrid deep learning neural network model which could learn cross-sectional patterns won the competition

© 2017 Wessex Press, Inc. Principles of Business Forecasting 2e (Ord, Fildes, Kourentzes) • Chapter 12: Putting Forecasting Methods to Work

Objections to forecasting competitions

• Lack of clarity of the objectives

–Results are aggregate and tell us little about how to tackle a specific problem

• No defined population of time series

–Statistical significance

• There is a single optimal method

–Combining is necessarily sub-optimal (but it works!)

• Competence or otherwise of the contributors (applies to all competitions, from ARIMA to ML)

• Error measures

–Aggregation over time and over series

–Use of a single time origin

A new competition – a case study of mobile payment data

• Clear objectives

–To provide (small) retailers with short-term forecasts of 1-14 days to improve their planning

• A clear population of time series

–Retailers using a mobile payment platform

• Not particularly homogeneous: geography, shop type

• Range of methods considered

–Single series statistical benchmarks and ML methods compared to pooled methods

• Pooling even heterogeneous series can improve parameter estimation

• Rigorous evaluation

–Range of problem relevant error measures considered
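To make the pooling idea concrete, here is a minimal sketch, assuming each store's history is a NumPy array of daily customer counts and a fixed number of lags. The helpers `lagged_windows` and `pooled_design` are hypothetical names, and the least-squares fit is only a stand-in for the pooled Lasso, RF and GBRT models: the point is that one set of parameters is estimated from the stacked windows of all stores instead of one model per store.

```python
import numpy as np

def lagged_windows(series, n_lags):
    """Turn one store's series into (X, y): each row holds the n_lags days before the target."""
    series = np.asarray(series, dtype=float)
    rows = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    X = np.array(rows).reshape(-1, n_lags)  # reshape keeps shape (0, n_lags) for very short series
    y = series[n_lags:]
    return X, y

def pooled_design(all_series, n_lags):
    """Stack the lagged windows of every store into a single training set."""
    Xs, ys, store_ids = [], [], []
    for store_id, series in enumerate(all_series):
        X, y = lagged_windows(series, n_lags)
        Xs.append(X)
        ys.append(y)
        store_ids.append(np.full(len(y), store_id))
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(store_ids)

# Three toy "stores" with different lengths and scales.
stores = [np.arange(10), 5 * np.arange(8), np.arange(100, 112)]
X, y, ids = pooled_design(stores, n_lags=3)
print(X.shape, y.shape)  # (21, 3) (21,)

# One shared linear AR model estimated on the pooled sample
# (a hypothetical stand-in for the pooled models, not the authors' model).
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Store-specific characteristics (city, category and so on) would enter as extra columns of `X`, which is one way a pooled model can still tell stores apart.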


The practical and research question: can a ML method outperform statistical benchmarks in a particular context?

BACKGROUND: MOBILE PAYMENT DATA COLLECTION PROCESS AND CUSTOMER FLOW FORECASTING

CUSTOMER FLOW FORECASTING WITH THIRD-PARTY MOBILE PAYMENT DATA

The practical question: can we help millions of small businesses improve their operations by providing professional customer flow forecasts based on third-party payment data?

INNOVATIONS – THE RESEARCH ISSUES

• presents a novel application in retailing using newly emerging mobile payment ‘big’ data

• identifies a set of important predictors for forecasting daily customer flows

• explores the benefit of complex models using data pooling for forecasting many time series

• develops a general solution for forecasting many time series based on regression trees (Gradient Boosting Regression Trees)

• proposes a new strategy to generate multi-step ahead forecasts

• provides experimental comparisons of various forecasting strategies for generating multi-step ahead forecasts

Overall, the aim is to demonstrate rigorously the effectiveness of an ML method.

METHODOLOGICAL FRAMEWORK

[Diagram: many possible drivers; data is messy; alternative strategies for multi-step forecasting; estimation]

TIME SERIES TRANSFORMATION FOR H-STEP AHEAD FORECASTING

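• Pseudo-MIMO (pMIMO) strategy: the vector of lead-time forecasts $(x_{i,t+1}, \dots, x_{i,t+h})$ is jointly estimated

• Direct strategy: $x_{i,t+k}$ is forecast from the information available up to day $t$

• Recursive strategy: $x_{i,t+k}$ is forecast from $\hat{x}_{i,t+k-1}$, the forecast for the previous lead time

where $x_{i,t}$ is the number of customers visiting store $i$ on day $t$, $k = 1, \dots, h$, and $h = 14$ here.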
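The recursive and direct strategies can be contrasted in a short sketch. It assumes a single store's series and uses an ordinary least-squares autoregression (hypothetical helpers `fit_linear`, `make_xy`) as a stand-in for GBRT. The pMIMO strategy is not sketched, because the slides do not give enough detail to reproduce it.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (a stand-in for GBRT)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + np.dot(coef[1:], np.asarray(x, dtype=float))

def make_xy(series, n_lags, k):
    """Rows hold the n_lags values up to day t; the target is the value on day t + k."""
    series = np.asarray(series, dtype=float)
    ts = range(n_lags - 1, len(series) - k)
    X = np.array([series[t - n_lags + 1:t + 1] for t in ts])
    y = np.array([series[t + k] for t in ts])
    return X, y

def recursive_forecast(series, n_lags, h):
    """One 1-step model; each forecast is fed back in as the newest lag."""
    coef = fit_linear(*make_xy(series, n_lags, 1))
    window = list(np.asarray(series, dtype=float)[-n_lags:])
    forecasts = []
    for _ in range(h):
        x_next = predict_linear(coef, window[-n_lags:])
        forecasts.append(x_next)
        window.append(x_next)
    return np.array(forecasts)

def direct_forecast(series, n_lags, h):
    """A separate model per lead time k = 1..h, each using only data up to day t."""
    last = np.asarray(series, dtype=float)[-n_lags:]
    return np.array([predict_linear(fit_linear(*make_xy(series, n_lags, k)), last)
                     for k in range(1, h + 1)])

# Usage on a toy weekly pattern (illustrative values only).
toy = np.tile([40, 42, 41, 45, 60, 80, 70], 10).astype(float)
rec = recursive_forecast(toy, n_lags=7, h=14)
dirf = direct_forecast(toy, n_lags=7, h=14)
```

The recursive strategy needs only one model but lets errors accumulate as forecasts are fed back in; the direct strategy fits one model per lead time and never reuses its own forecasts.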

FEATURE EXTRACTION

• Lags of the customer flow: lags 1 to L

• Local dynamics: moving 20/50/80 percentiles and standard deviations over the last 1 to w weeks (w = W/7)

• Global summaries: the ratio between the mean of the flow on each day of the week (Monday to Saturday) and the global mean

• Store-specific characteristics: city, category, comments, average payments

• Seasonality: day of week

• Calendar events: holidays, the day before/after a holiday

• Weather: temperature, wind strength, precipitation
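A minimal pandas sketch of the lag, local-dynamics, seasonality and global-summary features for one store, assuming the flow is a daily `pd.Series` with a `DatetimeIndex`. The function name, the defaults for L and w, and the use of all seven weekdays are assumptions; calendar, weather and store features would be joined on as extra columns.

```python
import numpy as np
import pandas as pd

def extract_features(flow, n_lags=7, max_weeks=3):
    """Lag, local-dynamics, seasonality and global-summary features for one store.

    flow: pd.Series of daily customer counts with a DatetimeIndex.
    """
    feats = pd.DataFrame(index=flow.index)
    # Lags 1..L of the customer flow.
    for lag in range(1, n_lags + 1):
        feats[f"lag_{lag}"] = flow.shift(lag)
    # Local dynamics: moving 20/50/80 percentiles and standard deviation over the
    # last 1..w weeks. Shifting by one day keeps each window strictly in the past.
    past = flow.shift(1)
    for weeks in range(1, max_weeks + 1):
        window = past.rolling(7 * weeks)
        for pct in (20, 50, 80):
            feats[f"q{pct}_{weeks}w"] = window.quantile(pct / 100)
        feats[f"std_{weeks}w"] = window.std()
    # Seasonality: day of week (0 = Monday).
    feats["dow"] = flow.index.dayofweek
    # Global summary: mean flow on each weekday divided by the overall mean.
    # Computed on the whole series here; in practice use the training period only.
    dow_ratio = flow.groupby(flow.index.dayofweek).mean() / flow.mean()
    feats["dow_ratio"] = feats["dow"].map(dow_ratio)
    return feats

# Toy usage: 8 weeks of synthetic counts starting on a Monday.
idx = pd.date_range("2016-01-04", periods=56, freq="D")
flow = pd.Series(np.arange(56, dtype=float), index=idx)
feats = extract_features(flow)
```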

FEATURE SELECTION PROCESS

• Core features: first, with a preset of core features included (such as day of week and calendar events) and a maximum estimation window W, the optimal number of lags (L ≤ W) is determined;

• Local dynamics: then the optimal window width w (W/7) is determined to construct local indicators capturing the local dynamics, e.g. a moving median over the last 3 weeks;

• Outside factors: weather, shop type.

The first two parts are based on a forward selection process, and the third part on an individual evaluation of each factor.
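The lag-selection step can be sketched as a forward search: add lags 1, 2, ... one at a time and stop once the validation error no longer improves. This is a minimal sketch under stated assumptions: a least-squares AR model stands in for GBRT, the last `n_val` days serve as the validation set, and the stopping tolerance `min_improvement` is hypothetical. The slides themselves compare candidate settings through Nemenyi tests on model ranks rather than a single validation MAE.

```python
import numpy as np

def validation_mae(series, n_lags, n_val):
    """Fit a one-step linear AR(n_lags) on all but the last n_val targets; return MAE on those."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = series[n_lags:]
    A = np.column_stack([np.ones(len(X)), X])  # intercept + lags
    coef, *_ = np.linalg.lstsq(A[:-n_val], y[:-n_val], rcond=None)
    return float(np.mean(np.abs(A[-n_val:] @ coef - y[-n_val:])))

def forward_select_lags(series, max_lags, n_val=14, min_improvement=1e-9):
    """Grow the lag set 1..L while the validation MAE keeps improving."""
    best_lags, best_err = 1, validation_mae(series, 1, n_val)
    for n_lags in range(2, max_lags + 1):
        err = validation_mae(series, n_lags, n_val)
        if err > best_err - min_improvement:
            break  # no meaningful gain from adding lag n_lags
        best_lags, best_err = n_lags, err
    return best_lags, best_err

# Toy usage on a repeating weekly pattern (illustrative only).
weekly = np.tile([10.0, 12, 11, 13, 20, 30, 25], 20)
n_lags, err = forward_select_lags(weekly, max_lags=7)
```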

DATA

• A random sample of 2000 stores from a leading mobile payment platform in China, comprising 19.6 million payment log records from July 2015 to October 2016.

Figure: Average daily customer flow over the 2000 stores

EXPERIMENTS DESIGN

• The training set spans 401 days

• Two test sets, each consisting of 42 days of customer flow data

• Forecast origin rolled forward every two weeks

• Operational forecast lead times: 1 day and 1-14 days

• Evaluation metrics:

–sMAPE, symmetric Mean Absolute Percentage Error

–MdAPE, Median Absolute Percentage Error

–AvgRelMAE, Average Relative Mean Absolute Error

–MPE, Mean Percentage Error
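The four accuracy measures can be written down in a few lines. This is a sketch under assumed definitions, since the slides do not spell them out: sMAPE uses the common 2|y - f| / (|y| + |f|) form as a fraction, MPE is in percent with the error taken as actual minus forecast, and AvgRelMAE is taken as a weighted geometric mean of per-series MAE ratios against a benchmark (as in Davydenko and Fildes, 2013). Last Week is the natural benchmark here, since its AvgRelMAE is 1.000 in the results tables.

```python
import numpy as np

def smape(y, f):
    """Symmetric MAPE as a fraction: mean of 2|y - f| / (|y| + |f|)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(np.mean(2 * np.abs(y - f) / (np.abs(y) + np.abs(f))))

def mdape(y, f):
    """Median absolute percentage error as a fraction (undefined when y == 0)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(np.median(np.abs(y - f) / np.abs(y)))

def mpe(y, f):
    """Mean percentage error in percent; positive means under-forecasting with this sign convention."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(100 * np.mean((y - f) / y))

def avg_rel_mae(actuals, method_fc, bench_fc):
    """Geometric mean over series of MAE(method) / MAE(benchmark), weighted by forecast count.

    Assumes the benchmark MAE is non-zero for every series.
    """
    log_ratios, weights = [], []
    for y, f, b in zip(actuals, method_fc, bench_fc):
        y, f, b = (np.asarray(a, float) for a in (y, f, b))
        log_ratios.append(np.log(np.mean(np.abs(y - f)) / np.mean(np.abs(y - b))))
        weights.append(len(y))
    return float(np.exp(np.average(log_ratios, weights=weights)))

# Usage: two stores, two forecast days each (toy numbers).
actual = [[10, 10], [5, 5]]
model_fc = [[11, 9], [7, 3]]
last_week = [[12, 8], [6, 4]]
print(avg_rel_mae(actual, model_fc, last_week))  # about 1.0: ratios 0.5 and 2.0 cancel
```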

BENCHMARK MODELS

• Time series models: Last Week, Naïve, automated ETS, Theta, automatic ARIMA

• Pooled models:

–Lasso regression

–Random Forest (RF): decision trees split the sample, with different ‘weak learner’ models developed for each split; bootstrap aggregation of the ‘weak learners’; has performed well compared with other ML methods in Kaggle competitions

–Gradient Boosted Regression Tree (GBRT), implemented with XGBoost: https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

• an additive model that adds weak learners to minimize the loss function

• trees are parameterized

• an iterative functional gradient descent algorithm adds each new tree (weak learner)

• one of the most preferred choices in data analytics competitions
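The boosting idea in these bullets can be illustrated with a from-scratch sketch: an additive model that starts from the mean and repeatedly fits a depth-one regression tree (a stump) to the residuals, which are the negative gradient of the squared-error loss. This is a teaching sketch, not XGBoost: XGBoost additionally uses second-order gradient information, regularized deeper trees and many engineering optimizations. All function names are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    """Best single split (feature j, threshold) for residuals r under squared error."""
    best = None
    for j in range(X.shape[1]):
        # Candidate thresholds; the largest value would leave the right side empty.
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant leaf
        return 0, np.inf, r.mean(), r.mean()
    return best[1:]

def predict_stump(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)

def gbrt_fit(X, y, n_trees=100, learning_rate=0.1):
    """Functional gradient descent: each new stump is fitted to the current residuals."""
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(X, residual)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, learning_rate, stumps

def gbrt_predict(model, X):
    base, learning_rate, stumps = model
    pred = np.full(X.shape[0], base)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred

# Toy usage: one feature (e.g. a lag) and a step-shaped target.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] >= 10, 5.0, 0.0)
model = gbrt_fit(X, y, n_trees=100, learning_rate=0.1)
```

The learning rate shrinks each tree's contribution, so more trees are needed but the fit is less prone to overfitting than taking full steps.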

RESULTS: FEATURE SELECTION

Nemenyi test results at the 5% significance level for the ranks of models with various lags and window widths for constructing local summaries. The critical distances for the Nemenyi tests are 0.106 and 0.036 respectively.
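The Nemenyi comparison behind these results can be sketched as follows: rank the methods on each store (rank 1 = smallest error, ties averaged), average the ranks, and treat two methods as significantly different when their average ranks differ by more than the critical distance CD = q_α √(k(k+1)/(6N)) for k methods and N series. This is a minimal sketch, usually applied after a Friedman test; `q_alpha` has to be taken from studentized-range tables (for example 3.031 for k = 8 at the 5% level, as tabulated by Demšar, 2006). The N behind the critical distances quoted above is not stated, so the example numbers do not reproduce them.

```python
import numpy as np

def average_ranks(errors):
    """errors: (n_series, k_methods). Rank methods within each series (1 = best), ties averaged."""
    errors = np.asarray(errors, dtype=float)
    ranks = np.empty_like(errors)
    for i, row in enumerate(errors):
        r = np.empty(len(row))
        r[row.argsort()] = np.arange(1, len(row) + 1)
        for value in np.unique(row):  # give tied methods the mean of their ranks
            tied = row == value
            r[tied] = r[tied].mean()
        ranks[i] = r
    return ranks.mean(axis=0)

def nemenyi_cd(k, n, q_alpha):
    """Critical distance for k methods compared over n series."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))

def significantly_different(avg_ranks, cd):
    """Pairs (a, b) of method indices whose average ranks differ by more than cd."""
    k = len(avg_ranks)
    return [(a, b) for a in range(k) for b in range(a + 1, k)
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]

# Toy example: 3 series (rows) and 3 methods (columns).
errs = [[0.10, 0.20, 0.30],
        [0.30, 0.20, 0.10],
        [0.10, 0.10, 0.50]]
mean_ranks = average_ranks(errs)  # approx [1.833, 1.833, 2.333]
```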

RANK OF FEATURE IMPORTANCE ON SELECTED FEATURES (XGBOOST)

ONE-DAY AHEAD FORECASTS: ACCURACY

Each cell lists sMAPE / MdAPE / AvgRelMAE / MPE.

Method | Test set 1 | Test set 2 | Test set 1&2

Last Week | 0.130 / 0.082 / 1.000 / 6.330 | 0.150 / 0.103 / 1.000 / -0.781 | 0.138 / 0.091 / 1.000 / -0.973

Naive | 0.120 / 0.080 / 0.943 / 4.322 | 0.112 / 0.077 / 0.776 / 0.062 | 0.115 / 0.079 / 0.833 / 0.033

ETS | 0.103 / 0.066 / 0.789 / 2.309 | 0.099 / 0.069 / 0.674 / -2.734 | 0.099 / 0.067 / 0.713 / -2.360

Theta | 0.106 / 0.069 / 0.831 / 5.093 | 0.100 / 0.069 / 0.689 / -0.838 | 0.102 / 0.068 / 0.741 / -0.186

ARIMA | 0.105 / 0.071 / 0.819 / 2.324 | 0.108 / 0.078 / 0.731 / -2.788 | 0.105 / 0.074 / 0.758 / -2.457

Lasso | 0.114 / 0.076 / 0.865 / 0.678 | 0.109 / 0.078 / 0.726 / -0.047 | 0.110 / 0.077 / 0.777 / 0.083

RF | 0.104 / 0.064 / 0.745 / -5.463 | 0.096 / 0.067 / 0.637 / -5.684 | 0.097 / 0.065 / 0.674 / -5.326

GBRT | 0.097 / 0.057 / 0.676 / -4.314 | 0.088 / 0.061 / 0.583 / -3.826 | 0.090 / 0.058 / 0.614 / -3.713

Nemenyi test results at the 5% significance level on store-level one-step-ahead forecasts over the whole test periods.

• For all error measures, GBRT is significantly better

• Bias?

ONE-DAY AHEAD FORECASTS: INDIVIDUAL VS. POOLING

The store-level one-step-ahead forecasting accuracy differences between GBRT and five time series models over the whole test periods.

[Figure: x-axis baseline models (Last week, Naive, Theta, ARIMA, ETS); y-axis AvgRelMAE]

1-14 DAYS AHEAD FORECASTS: EVALUATION

Each cell lists sMAPE / MdAPE / AvgRelMAE / MPE.

Method | Test set 1 | Test set 2 | Test set 1&2

Last Week | 0.135 / 0.086 / 1.000 / 1.245 | 0.150 / 0.102 / 1.000 / -0.380 | 0.141 / 0.092 / 1.000 / -2.179

Naive | 0.154 / 0.102 / 1.138 / -2.182 | 0.159 / 0.111 / 1.033 / -5.901 | 0.155 / 0.105 / 1.082 / -6.203

ETS | 0.128 / 0.082 / 0.939 / 0.426 | 0.124 / 0.086 / 0.816 / -2.713 | 0.125 / 0.083 / 0.874 / -3.912

Theta | 0.132 / 0.085 / 0.992 / 4.612 | 0.126 / 0.087 / 0.840 / 0.200 | 0.128 / 0.085 / 0.913 / -0.316

ARIMA | 0.136 / 0.091 / 1.009 / 1.031 | 0.135 / 0.097 / 0.891 / -2.648 | 0.134 / 0.094 / 0.948 / -3.652

Lasso-Recursive | 0.140 / 0.096 / 1.052 / 6.609 | 0.129 / 0.096 / 0.850 / 1.480 | 0.133 / 0.095 / 0.940 / 0.659

RF-Recursive | 0.145 / 0.100 / 1.093 / 8.639 | 0.139 / 0.101 / 0.924 / 3.428 | 0.140 / 0.100 / 0.994 / 1.919

GBRT-Recursive | 0.113 / 0.073 / 0.827 / 4.690 | 0.114 / 0.079 / 0.758 / -1.105 | 0.112 / 0.076 / 0.794 / -2.027

Lasso-Direct | 0.131 / 0.085 / 0.980 / 9.229 | 0.126 / 0.091 / 0.835 / 2.764 | 0.127 / 0.088 / 0.898 / 1.731

RF-Direct | 0.128 / 0.086 / 0.944 / -3.236 | 0.126 / 0.092 / 0.835 / -8.080 | 0.126 / 0.089 / 0.881 / -8.663

GBRT-Direct | 0.113 / 0.072 / 0.814 / -0.273 | 0.105 / 0.073 / 0.699 / -3.659 | 0.107 / 0.072 / 0.753 / -4.425

Lasso-pMIMO | 0.138 / 0.095 / 1.013 / 3.077 | 0.123 / 0.088 / 0.810 / 0.782 | 0.128 / 0.091 / 0.901 / 0.803

RF-pMIMO | 0.124 / 0.078 / 0.882 / -4.343 | 0.113 / 0.082 / 0.749 / -8.174 | 0.116 / 0.081 / 0.809 / -6.539

GBRT-pMIMO | 0.116 / 0.068 / 0.793 / -3.770 | 0.101 / 0.069 / 0.663 / -5.496 | 0.106 / 0.069 / 0.723 / -4.666

GBRT-Direct and GBRT-pMIMO are significantly better.

• Longer-term accuracy

• Multi-step ahead forecasting strategies

MULTI-DAY AHEAD FORECASTS: INDIVIDUAL VS. POOLING

The store-level multi-step-ahead forecasting accuracy differences between GBRT with different strategies and five time series models over the whole test periods.

[Figure panels: Direct, pMIMO, Recursive; x-axis baseline models (Last week, Naive, Theta, ARIMA, ETS); y-axis AvgRelMAE]

Figure: forecasting accuracy comparisons across horizons 1-14 for the three GBRT strategies over six rolling test periods. pMIMO is best for longer horizons.

TESTING THE ROBUSTNESS OF MULTIPLE HORIZON FORECASTING STRATEGIES


CONCLUSIONS FROM A FORECASTING COMPETITION - WHAT THIS CASE STUDY TELLS US (FILDES ET AL., 1998)

a. Statistically sophisticated or complex methods do not typically produce more accurate forecasts than simpler ones.

–Case study: complex ML methods work best.

b. The rankings of the performance of the various methods vary with the error measures used.

–Case study: no support.

c. The relative performance of the various methods depends upon the length of the forecasting horizon.

–Case study: limited support.

d. The characteristics of the data series are important factors in determining relative performance; develop methods to ‘fit’ the characteristics of your data series.

–Case study: pooling proves effective; data features are important.

e. Comparisons based on a single time series and a single forecast origin are unreliable; multiple time series and forecast origins are recommended.

–Case study: many time series and rolling origins used.

f. Replicability of results.

–Case study: availability of code? New applications.

CASE STUDY CONCLUSIONS

• Customer flow forecasting based on a large pool of stores that includes a variety of categories can generate more accurate forecasts than methods based on each store individually

• Complex tree methods (both GBRT and RF) perform very well under data pooling when forecasting many time series

• When using the right forecasting strategy, GBRT performs well for both one-step and multi-step ahead forecasting tasks

• Demonstrates the potential of ML methods in a realistic context

WHY DOES AN ML METHOD WORK HERE?

• Data characteristics: time series in the pool are closely correlated due to shopping patterns

• Data pooling: many applications of ML are highly parameterized and have overfitted despite the use of cross-validation; pooling across time series overcomes this problem

• Models: GBRT and RF are powerful ML models that can capture complex cross-sectional patterns

• Multi-horizon forecasting strategy: the right strategy is needed to generate multi-step forecasts

QUESTIONS AND COMMENTS

1-14 DAYS AHEAD FORECASTS: NEMENYI TEST

Nemenyi test results at the 5% significance level on store-level 1-14 days ahead forecasts over the whole test periods. The critical distance for the Nemenyi test is 0.443.

Date post:	02-Oct-2020