Predicting deliveries from suppliers
A comparison of predictive models
Marcus Sawert
Credits: 30 HP
Semester/year: Spring 2020 (VT 2020)
Supervisor: Leif Olsson
Examiner: Aron Larsson
Course code/registration number: IG001A
Degree programme: Master of Science in Engineering, Industrial Engineering and Management (Civilingenjör i Industriell ekonomi)
Abstract
In the highly competitive environment that companies find themselves in today, a
well-functioning supply chain is key. For manufacturing companies, a good supply
chain depends on functioning production planning. Production planning tries to
fulfill demand while considering the available resources. This is complicated by the
uncertainties that exist, such as the uncertainty in demand, in manufacturing and in
supply. Several methods and models have been created to deal with production
planning under uncertainty, but they often overlook the complexity of the supply
uncertainty by treating it as a purely stochastic uncertainty. To improve these models,
a prediction based on earlier data regarding the supplier or item could be used to
estimate when a delivery is likely to arrive.
This study compared different predictive models to see which one is best suited for
this purpose.
Historic data regarding earlier deliveries was gathered from a large international
manufacturing company and preprocessed before being used in the models. The
target value that the models were to predict was the actual delivery time from the
supplier. The data was then tested with the following four regression models in
Python: linear regression, Ridge regression, Lasso and Elastic net. The results were
calculated by cross-validation and presented in the form of the mean absolute error
together with the standard deviation. The results showed that Elastic net was the
overall best performing model, and that linear regression performed the worst.
Keywords: Production planning, Supply, Deliveries, Prediction, Linear regression,
Ridge regression, Lasso, Elastic net.
Table of Contents
Abstract ............................................................................................................................... 2
List of abbreviations ........................................................................................................... 4
1 Introduction ...................................................................................................................... 5
1.1 Background ...................................................................................................................... 5
1.2 Purpose & Aim ................................................................................................................ 7
1.3 Research question ........................................................................................................... 7
1.4 Delimitations ................................................................................................................... 7
2 Theory ............................................................................................................................... 8
2.1 Supply chain management ............................................................................................ 8
2.2 Stochastic processes ........................................................................................................ 9
2.3 Predictive models .......................................................................................................... 10
2.4 Evaluation methods ...................................................................................................... 15
2.5 Previous studies ............................................................................................................ 18
3 Method ............................................................................................................................ 20
3.1 Data collection ............................................................................................................... 20
3.2 Data preprocessing ....................................................................................................... 21
3.3 Prediction ....................................................................................................................... 22
3.4 Evaluation ...................................................................................................................... 26
3.5 Reliability & Validity .................................................................................................... 27
3.6 Research ethics .............................................................................................................. 27
4 Results ............................................................................................................................ 29
5 Discussion ...................................................................................................................... 32
6 Conclusion ..................................................................................................................... 34
6.1 Future research .............................................................................................................. 34
References......................................................................................................................... 35
List of abbreviations
ANN Artificial Neural Network
ARIMA Autoregressive Integrated Moving Average
ERP Enterprise Resource Planning
Lasso Least Absolute Shrinkage and Selection Operator
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MRP Materials Requirement Planning
MRPII Manufacturing Resource Planning
MTO Make-to-order
MTS Make-to-stock
SCP Supply chain planning
sMAPE symmetric Mean Absolute Percentage Error
1 Introduction
In the highly competitive and volatile business environment many manufacturing
companies operate in today, it is of great importance for them to have functioning
supply chain management, which manages the flow of material from procurement,
through manufacturing, to the delivery to the end customer (Sweeney et al., 2018).
Production planning is a big part of this process, trying to fulfill demand while
considering the available material, employee and production resources.
The available material depends on the deliveries from the suppliers, which cannot
always be taken for granted to arrive on time (Khakdaman et al., 2015). This creates
the need to take the uncertainty in material procurement into consideration when the
delivery date to the end customer is communicated. It is therefore important to find a
model that can predict incoming deliveries with high accuracy.
1.1 Background
Supply chain management is a way of coordinating the activities within a company to
ensure that customer demand is met and delivered on time. The five main supply
chain processes of the Supply Chain Operations Reference model are plan, source,
make, deliver and return (Bushuev and Guiffrida, 2012). The delivery part of the
supply chain is of great importance, and delivery performance is one of the key
aspects of a successful supply chain. The timeliness of deliveries has an impact on
customer satisfaction and also has an economic aspect. Both early and late deliveries
create costs: early deliveries create extra inventory costs, while late deliveries produce
costs in the form of production stoppages, lost sales and lost goodwill. In some cases
delivery reliability, that the delivery is made on time, is valued more than a fast
delivery (Bushuev and Guiffrida, 2012).
Maintaining good delivery performance while keeping inventories low is increasingly
difficult, as the growing need for customization makes it harder to accurately forecast
demand (Ngniatedema et al., 2015). This highlights one of the key processes within
manufacturing companies for dealing with these problems: production planning.
Production planning is the process that aims to make sure that production fulfills the
demand, with consideration to the available material, employees and production
resources. It can be divided into three parts, starting with a long-term aggregate plan,
followed by a mid-term plan and finally a short-term plan, which decides which
products to produce, in what quantities, and when production should start (Er et al.,
2018). Production planning is closely linked to material procurement, since
production cannot be completed if the required material is not available. A number of
systems have been developed to help coordinate this, such as materials requirement
planning (MRP), manufacturing resource planning (MRPII) and enterprise resource
planning (ERP).
It is desirable that the scheduling produced by the production planning is as stable as
possible, but different uncertainties often cause continual re-planning (Er et al., 2018).
The uncertainties that need to be taken into consideration within production planning
can be divided into three main groups (Peidro et al., 2010): demand,
process/manufacturing and supply. Demand uncertainty is presented as the most
important one, consisting of the volatility in demand and inexact forecasting.
Process/manufacturing uncertainty concerns the internal production process, with
machine problems, quality issues and so on. The uncertainty in supply is caused by
faults or delays in the deliveries from the suppliers (Peidro et al., 2010).
There has been extensive research in the area of production planning under
uncertainty, but there is a handful of weaknesses that are not always addressed.
Beemsterboer et al. (2016) bring up the issue that much of the literature assumes either
pure make-to-order (MTO) or make-to-stock (MTS) production systems, conflicting
with modern hybrid production systems which combine the two.
A lack of consideration of multi-product organizations is also brought up as a
weakness in the available literature (Khakdaman et al., 2015); assuming a single
product is a simplification that does not portray modern organizations.
In some of the previous research on production planning under uncertainty, the
frameworks used have been stochastic programming and robust optimization, but
these models have considered only demand uncertainty, which is just one of the types
that exist within production planning (Aouam et al., 2018).
The need for further research is expressed by Mula et al. (2006), who determine that
uncertainty always needs to be taken into consideration, and that optimization in the
area of production planning is very complex. The same study mentions models based
on artificial intelligence and fuzzy set theory as possibly useful for modeling
production planning under uncertainty. Mula et al. (2006) also highlight the need for
models with other types of uncertainty, since models dealing with uncertain demand
have received far more attention in comparison.
Several studies in the area, such as Khakdaman et al. (2015), Gao et al. (2017), Peidro
et al. (2010) and Mirzapour Al-e-hashem et al. (2011), build their models with a
number of variables to help with production planning under uncertainty, but they
lack consideration of uncertain supply, or they assume that the supply is stochastic.
Assuming that the uncertain supply is stochastic is a simplification of reality, as
different parts and suppliers differ significantly in their delivery performance. A
model to predict incoming deliveries could make these models even more accurate.
1.2 Purpose & Aim
The aim of the study is to contribute to the research areas of supply chain risk
management as well as predictive modeling, by examining which predictive model is
best suited to predict the supply to a manufacturing company.
1.3 Research question
The research question is which type of predictive model gives the most accurate
results when predicting the supply of components and material to a manufacturing
company.
1.4 Delimitations
The data in the research will be taken from a large international manufacturing
company that has a multi-product mix, with both MTO and MTS production. The
production mostly consists of assembly of components from suppliers.
2 Theory
The following chapter presents a few theoretical concepts relevant to the study. This
includes different models for prediction, as well as methods to evaluate the results.
The choice of prediction models and evaluation methods used in the study is also
motivated; the method chapter presents in more depth how they were practically
used.
2.1 Supply chain management
Within business, a supply chain is the network of organizations, people, activities and
resources needed to meet the demand of the end customer. Managing this is a
foundation for an organization to attain and maintain its competitive advantage
(Bushuev and Guiffrida, 2012). Supply chain management includes the active
coordination of raw material acquisition, production processing and the distribution
of goods and services.
The five supply chain processes defined by the Supply Chain Operations Reference
model (SCOR) are plan, source, make, deliver and return. The importance of the
delivery process in the supply chain is well documented, as the timeliness of
deliveries is of great importance to the customer. This, together with the fact that
deliveries are integrated with and affect the different stages of the supply chain,
makes delivery performance the most important metric in a supply chain (Bushuev
and Guiffrida, 2012). Improving the delivery process is therefore of interest to
managers within supply chain and logistics.
How well the delivery process works for a manufacturing company depends on
whether the company can produce the required amount of goods in time, which
requires well-functioning production planning.
2.1.1 Production planning
Production planning is the process within a manufacturing company of planning
production with regard to the availability of material, production resources and
employees. The goals of production planning are, among others, on-time deliveries,
optimal production utilization and low inventories. These goals are often in conflict
with each other; for example, on-time deliveries can more often be achieved with the
help of a large inventory of finished goods, but such an inventory leads to higher
costs (Hübl, 2018).
Several technologies have been developed through the years to help companies with
production planning, such as the material requirement planning (MRP),
manufacturing resource planning (MRPII), enterprise resource planning (ERP) and the
supply chain planning (SCP) (Hübl, 2018).
The difficulty, and the source of continuous re-planning within production planning,
is the uncertainties involved. These are the demand uncertainty, the uncertainty in
the internal manufacturing/processes, and the uncertainty of supply (Peidro et al.,
2010). Demand uncertainty has been considered the most important of the three, and
the one that has received the most attention from researchers. One reason for the
difficulty of forecasting demand is the phenomenon called the bullwhip effect, which
has been extensively researched, resulting in several proposed ways to mitigate it.
The bullwhip effect is discussed in more detail in chapter 2.5.
The supply uncertainty is due to the fact that it cannot be taken for granted that
suppliers deliver the correct amount at the correct time (Khakdaman et al., 2015). As
on the demand side, this makes it difficult to forecast and predict when all needed
material will be available for production. This has, however, not been addressed to
the same extent as the demand uncertainty, resulting in a gap in the research as to
how the problem should be addressed.
The rest of the chapter will cover theory relevant to the supply chain management,
and production planning under uncertainty.
2.2 Stochastic processes
Stochastic means that something is randomly determined, and a stochastic process
changes randomly as time passes. Stochastic processes are often used in modern
chemical and physical research, as they suit the random nature of the underlying
mechanisms.
The random variables in a stochastic process can be generated from a number of
different distributions, of which the following three are the most commonly used
(Weiss, 2017):
• Uniform distribution, where values are distributed with equal probability
throughout the range.
• Poisson distribution, where a given number of independent events occur
in a time interval at a known rate.
• Binomial distribution, where variables with two possible outcomes are
randomly generated; the outcomes can have equal or unequal probability.
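The three distributions above can be illustrated with a short sketch in Python (the parameter values here are arbitrary examples, not taken from the study's data):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Uniform: every value in [0, 10) is equally probable
uniform = rng.uniform(low=0.0, high=10.0, size=1000)

# Poisson: counts of independent events occurring at a known rate (lam)
poisson = rng.poisson(lam=3.0, size=1000)

# Binomial with n=1: two outcomes (0 or 1) with unequal probability
binomial = rng.binomial(n=1, p=0.7, size=1000)
```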
Some of the studies considering uncertainty in supply use a stochastic variable for
calculating the risk. However, suppliers differ in their delivery performance, and
parts and components differ greatly in material, complexity, need for processing and
so on. This makes a stochastic variable for the delivery uncertainty a simplification of
reality. An accurate model therefore needs to be based on a calculation that considers
the varying delivery performance between different suppliers and products.
2.3 Predictive models
A few commonly used predictive models are presented below, along with some
information about which types of studies they have been used in and what types of
data they can be used with.
2.3.1 Neural Networks
An Artificial Neural Network (ANN) can be described as a model that is inspired by
the structure and processing of a biological brain. It consists of processing elements,
the neurons, and weighted connections between them. It is commonly defined by
four parameters (Shanmuganathan and Samarasinghe, 2016):
1. The type of neuron, for example McCulloch-Pitts neuron.
2. The connection architecture: neural networks can be fully or partially
connected, as well as having different layers of neurons. In an auto-associative
network, the input neurons are also the output neurons, while in a hetero-
associative network there are separate input and output neurons. The
architecture is also determined by the connections between the output and
input neurons. The feedforward architecture has no connections back from
the output to the input neurons, and the network has no memory of its
previous output values. The feedback architecture, on the other hand, has
connections back from the output to the input neurons, and the network
remembers its previous states.
3. The learning algorithm which trains the network; learning algorithms are
divided into supervised, unsupervised and reinforcement learning.
4. The recall algorithm, i.e. how knowledge is extracted from the neural
network. This includes pattern association, data clustering and categorization.
Figure 1: Example of the structure of an artificial neural network (Neural-networks 2020)
Depending on the goal of the neural network, it can be constructed in different ways.
The learning algorithm depends on whether the model is linear or not; linear neural
networks are often based on the least mean square rule, while nonlinear models are
often based on the back-propagation training rule (Russo et al., 2013).
Neural networks are often used for signal processing, forecasting and clustering, and
are increasingly being used in areas with imprecise data and complex relationships
between variables, something that is difficult to handle with traditional analytical
methods (Shanmuganathan and Samarasinghe, 2016).
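The feedforward architecture described above can be sketched as a single forward pass through a small fully connected network; the weights below are chosen arbitrarily for the example and do not come from any trained model:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One feedforward pass: input -> hidden (sigmoid) -> output (linear).
    No connections lead back from output to input, so the network has
    no memory of previous outputs."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden layer activations
    return W2 @ h + b2                         # output layer

# Arbitrary example weights: 3 inputs, 2 hidden neurons, 1 output
W1 = np.array([[0.5, -0.2, 0.1],
               [0.3, 0.8, -0.5]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([0.2])

y = forward(np.array([1.0, 2.0, 3.0]), W1, b1, W2, b2)
```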
2.3.2 Time series analysis
Time series consist of observations taken sequentially in time, which can be analyzed
with the aim of finding trends or other useful information. The observations need to
be taken at equal time intervals; these can be monthly, weekly, daily or any other
defined period. Time series are frequently used in economics, business, engineering
and the natural sciences (Box et al., 2013). One inherent feature of time series is that
the observations are normally dependent. This dependence is often of great interest,
and time series analysis includes techniques for analyzing it.
In industry, business and economics, forecasting with the help of time series is of
particular interest. In these fields time series are often nonstationary, with no constant
mean level over time. Forecasting methods have been developed to deal with this,
which use exponentially weighted moving averages. One common model is the
autoregressive integrated moving average (ARIMA) (Box et al., 2013).
One problem with time series, brought up by Soo and Rottman (2018), is the
complexity of the influences that variables have on each other. Variables can exhibit
temporal trends, which can make it appear as if a positive or negative relationship
exists between variables even when no direct relation exists.
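The exponentially weighted moving averages mentioned above can be sketched with simple exponential smoothing; the demand series and smoothing factor here are invented for the example:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: each new level is a weighted average
    of the latest observation and the previous level, so older
    observations receive exponentially decaying weights. The final level
    is used as the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

demand = [100, 102, 101, 105, 107, 106]
print(ses_forecast(demand, alpha=0.5))  # 105.5
```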
2.3.3 Regression analysis
A regression model is an analytical tool which estimates the relationship between a
dependent variable, and one or more explanatory variables. The explanatory
variables, or regressors, have a statistical or causal relationship to the dependent
variable. A regression model is quantitative and can be constructed in different ways,
depending on the purpose of the model and the data available (Welc and Esquerdo,
2018).
Regression models can be classified depending on a number of criteria, a few of them
are (Welc and Esquerdo, 2018):
• The number of equations used: single-equation models use one equation
to explain the relationship between the dependent variable and the
explanatory variables, while multi-equation models have more than one
dependent variable and use more than one equation.
• The number of explanatory variables: univariate models have only one
explanatory variable, while multivariate models have at least two.
• The functional form of the model: there are linear models, with a linear
relationship between the variables, and nonlinear models.
• The type of dependent variable: in some models the dependent variable
is continuous, while others use a dependent variable expressed in a
binary way.
Regression models are mainly used for the following three purposes (Welc and
Esquerdo, 2018):
• To examine the relationship, or lack thereof, between variables.
• Forecasting: based on a model which explains the behavior of historical
data, future events can be predicted.
• Scenario analyses: for example, how an explanatory variable must change
in order for the dependent variable to reach a certain value.
2.3.3.1 Linear regression
Linear regression is the simplest of the regression models and uses a linear
combination of the predictors as its function (Su et al., 2012). It has become a building
block for numerous modeling tools and is popular for the easily interpretable
parameters of the model. Its ability to provide satisfactory results even when the
sample size is small, or when the relationship between the predictors and the
dependent variable is relatively vague, has also increased its popularity (Su et al.,
2012).
There are many methods to estimate the parameters of a linear regression model; the
most common is the least squares method, which minimizes the distance from the
predicted values to the actual ones.
In linear regression models, a few assumptions are made about the data: the
predictors, the dependent variable and the relationship between them. These are a
few of the assumptions that are made (Pandis, 2016):
• That the observations of the dependent variable are independent of each
other.
• That for each value of the predictor, the dependent variable follows a
normal distribution.
• That the variability of the dependent variable stays the same for each
value the independent variable takes.
However, even if the differences between the estimated values and the observed ones
are not normally distributed, linear regression can still produce valid results when the
sample size is large. On the other hand, a linear regression model with these values
normally distributed is not necessarily valid (Schmidt and Finan, 2018).
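A least squares fit as described above can be sketched in a few lines of Python; the small dataset is invented for the example:

```python
import numpy as np

# Small illustrative dataset: one predictor x and a dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares: add an intercept column and solve min ||Xb - y||^2
X = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
print(intercept, slope)  # 0.14, 1.96 for this data
```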
2.3.3.2 Ridge regression
Also known as Tikhonov regularization, Ridge regression was developed to solve
some of the issues with the least squares method in linear regression models (Silva et
al., 2018). Ridge regression, together with Lasso among others, is a so-called shrinkage
method, which selects which variables to keep in the finished model. This is also the
idea of another group of methods, called subset selection methods, but there is an
important difference between the two groups. Subset selection methods have a
discrete process where variables are either kept or discarded, which often produces
models with high variance. Shrinkage methods, however, are continuous in their
variable selection process and are therefore not as prone to high variability (Hastie et
al., 2009).
One of these issues with the least squares method occurs when the predictors have a
high internal dependence and the regression coefficients have large standard errors.
This often results in poorly performing estimators, which Ridge regression tries to
correct by implementing a penalty on the size of the coefficients (Silva et al., 2018).
A similar function is also used in neural networks, where it is called weight decay.
Another flaw of linear regression is that an unusually large coefficient on a variable
can be canceled out by a negative coefficient of similar size on a correlated variable;
Ridge regression deals with this by imposing size constraints on the coefficients
(Hastie et al., 2009).
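The size constraint on the coefficients can be illustrated with the closed-form ridge solution, b = (XᵀX + αI)⁻¹Xᵀy; the data below is synthetic, and the intercept is ignored to keep the sketch minimal:

```python
import numpy as np

def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution: b = (X'X + alpha*I)^-1 X'y.
    The penalty alpha shrinks the coefficients toward zero;
    alpha = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

b_ols = ridge_coefs(X, y, alpha=0.0)      # ordinary least squares
b_ridge = ridge_coefs(X, y, alpha=100.0)  # heavily penalized
# The penalized coefficient vector is smaller in magnitude than the OLS one
```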
2.3.3.3 Lasso
The Least Absolute Shrinkage and Selection Operator (Lasso) is a linear regression
model that effectively reduces the number of variables that the solution depends on
(Linear Models 2020). Standard regression models tend to include too many variables,
which leads to overfitting as well as overestimating how well the included variables
explain the observed variability, so-called "optimism bias" (Ranstam and Cook,
2018).
Lasso offers a solution to some of these problems with standard regression models,
since it tries to identify the variables and regression coefficients that lead to the least
prediction error. One of the drawbacks of the Lasso model is that it does not focus on
the contribution of individual variables; the focus is instead on the best combined
prediction (Ranstam and Cook, 2018).
The Lasso model has been used for prediction with good results. One example is a
study by Wang et al. (2018) in which they predicted the fuel consumption of ships at
sea. The results of the Lasso model were compared with three other methods
(artificial neural networks, support vector regression and Gaussian process
regression), and the comparison showed that the Lasso model was the best
performing of the four.
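How Lasso drives some coefficients exactly to zero can be illustrated with the soft-thresholding operator used inside common coordinate-descent Lasso solvers; this is only the one-dimensional update step, not a complete solver:

```python
def soft_threshold(z, penalty):
    """Soft-thresholding: shrink z toward zero by `penalty`, and set it
    exactly to zero when |z| <= penalty. This is why Lasso performs
    variable selection, unlike Ridge, which only shrinks."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0

print(soft_threshold(3.0, 1.0))   # large coefficient shrunk: 2.0
print(soft_threshold(-0.4, 1.0))  # small coefficient removed: 0.0
```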
2.3.3.4 Elastic net
Built in a similar way to Lasso, the Elastic net regression method simultaneously
performs automatic variable selection and continuous shrinkage. It was developed to
overcome some of the shortcomings of Lasso, one of which is that when a group of
variables with very high pairwise correlations exists, Lasso selects one variable from
the group without regard for which one it selects. Elastic net, on the other hand, has a
grouping effect, where variables with strong correlations are either kept in the model
or removed together as a group (Zou and Hastie, 2005).
Elastic net has been shown to be very useful when the number of variables is bigger
than the number of observations in the data (Hao and Lu, 2018).
Zou and Hastie (2005) show in their research, using real-world data, that the Elastic
net method often achieves better results than both Lasso and Ridge regression.
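The grouping effect can be sketched with scikit-learn's `ElasticNet` on synthetic data with two perfectly correlated predictors (the penalty settings are arbitrary example values):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([x, x])  # two perfectly correlated predictors
y = 3.0 * x + rng.normal(scale=0.1, size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
# The L2 part of the penalty splits the weight roughly evenly between
# the correlated columns instead of arbitrarily picking one of them
print(enet.coef_)
```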
Choosing the models
As shown above, there are several different types of predictive models available,
depending on the data and how the prediction problem is structured.
As the problem in this case is the uncertainty in deliveries from suppliers, the
prediction should provide an answer to when a delivery will take place. Regression
can provide a continuous value, i.e. the results are not classified into different
categories. This suits the data well, as deliveries can have very different expected
delivery times, and classification would create categories with large spans.
Time series analysis will not be used, due to the fact that it looks at defined time
periods, whereas deliveries have varying lead times. Time series could be used to
forecast what would be delivered during a defined time period, but lack the ability to
predict specific deliveries or the performance of suppliers or parts.
When it comes to prediction, linear regression models can sometimes perform better
than more advanced nonlinear models, in particular when the dataset is relatively
small (Hastie et al., 2009). Since the results are wanted as a continuous value and not
classified, simple linear regression is a good starting point.
Wang et al. (2018) showed in their study that Lasso outperformed, among others, an
artificial neural network when predicting fuel consumption. The number of variables
in their study is almost the same as in this study, and they too were predicting a
continuous value. Considering this, Lasso is a reasonable choice of model to try in
this study.
Scikit-learn is an open-source machine learning library which offers numerous
algorithms as well as several guides and thorough documentation for all its available
functions. Since finding the correct model is one of the hardest parts of solving a
machine learning problem, scikit-learn offers a guide for that as well. Depending on
the problem that the model will be used to solve, and what types of data are
available, some machine learning models are better suited than others (Choosing the
right estimator 2020). The delivery time in this study is to be calculated as a
continuous variable, and the number of samples is not very large. Under these
conditions, Lasso, Ridge regression and Elastic net are among the machine learning
models that scikit-learn recommends for this type of study. This, together with the
fact that Ridge regression was developed as an improvement on ordinary linear
regression, and that Elastic net was developed to deal with some of the shortcomings
of Lasso, makes the comparison between them all interesting.
The predictive models chosen for the study are therefore:
• Linear regression
• Ridge regression
• Lasso
• Elastic net
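A minimal sketch of how the four chosen scikit-learn models could be compared on the same data with cross-validated mean absolute error; the dataset below is a synthetic stand-in, not the study's delivery data, and the penalty parameters are example values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the delivery data: features -> delivery time
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([4.0, 0.0, 2.0, 0.0, 1.0]) + 10 + rng.normal(size=200)

models = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    # neg_mean_absolute_error is negated so that higher is better;
    # flip the sign back to report a plain MAE
    scores = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {scores.mean():.2f} (+/- {scores.std():.2f})")
```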
2.4 Evaluation methods
This chapter presents some of the most common methods used for evaluating
predictive models.
2.4.1 MAPE
Mean Absolute Percentage Error (MAPE) is a measure for assessing the quality and
accuracy of a prediction made with a forecasting method. It is presented as a
percentage and is commonly used in practice thanks to the intuitive interpretation of
relative error that it offers (de Myttenaere et al., 2016).
Despite being a popular and often used assessment method, MAPE has a few flaws. It
cannot handle actual values of zero, as that would require division by zero. The most
concerning problem, however, is that equal absolute errors result in different MAPE
scores, depending on whether the forecasted value is above or below the actual value
(Makridakis, 1993). This means that MAPE rewards under-forecasting, which creates
misleading results.
(2.1)
Table 1 The variables of MAPE
t Observation index 𝐴𝑡 Actual value
n Number of observations 𝐹𝑡 Forecasted value
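Equation 2.1 can be sketched in Python as follows (an illustrative NumPy sketch, not code from the study). The example values demonstrate the asymmetry described above: two forecasts that are both off by 50 units receive different scores depending on whether the forecasted value lies above or below the actual value.

```python
import numpy as np

def mape(actual, forecast):
    # Mean Absolute Percentage Error per equation 2.1;
    # undefined when any actual value is zero.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast) / actual) * 100

print(mape([100], [150]))  # forecast above the actual value -> 50.0
print(mape([150], [100]))  # same absolute error below it    -> ~33.3
```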
2.4.2 sMAPE
To overcome some of the shortcomings of MAPE, the method was extended into the
symmetric Mean Absolute Percentage Error (sMAPE). In addition to being symmetric,
it can also handle zero values (Flores, 1986). Just like MAPE, it expresses the forecast
error as a percentage of the actual value.
\mathrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}    (2.2)
Table 2 The variables of sMAPE
t Observation index 𝐴𝑡 Actual value
n Number of observations 𝐹𝑡 Forecasted value
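The sMAPE of equation 2.2 can be sketched in the same way (illustrative only, not code from the study). Swapping the actual and forecasted values now yields the same score, and a single zero value no longer breaks the calculation:

```python
import numpy as np

def smape(actual, forecast):
    # symmetric MAPE per equation 2.2; the denominator (|A| + |F|) / 2
    # stays positive as long as A and F are not both zero.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return np.mean(np.abs(forecast - actual) / denom) * 100

print(smape([100], [150]))  # -> 40.0
print(smape([150], [100]))  # -> 40.0, symmetric
print(smape([0], [10]))     # -> 200.0, a zero actual value is handled
```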
2.4.3 MAE
Mean absolute error (MAE) is a measure of errors between observations, which can be
used to evaluate how well predictive models perform. MAE is calculated by taking
the average of the absolute error between the predicted and the actual value (Metrics
and scoring 2020).
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|    (2.3)
Table 3 The variables of MAE
i Observation index ŷ_i Predicted value
n Number of observations 𝑦𝑖 Actual value
Next to MAE, the very similar root mean square error (RMSE) is also widely used to
evaluate similar models as MAE, and there is no consensus on which of the two is
most appropriate to use (Chai and Draxler, 2014). One major difference between the
two is that all errors are given the same weight when using MAE, while RMSE on the
other hand gives errors with larger absolute values a higher weight. This has the effect
that the RMSE is more sensitive to outliers in the data compared to MAE (Mendo,
2009).
Chai and Draxler (2014) argue that RMSE is the better choice when the errors are
expected to be normally distributed, while MAE should be used when the errors are
expected to be uniformly distributed.
Another distinct difference between RMSE and MAE is that MAE uses absolute
values, which is a disadvantage in many mathematical calculations (Chai and
Draxler, 2014).
It must also be taken into consideration that MAE results are on the same scale as the
data being evaluated before choosing the appropriate evaluation method.
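The difference in outlier sensitivity can be made concrete with a small sketch (illustrative values only): two prediction vectors with the same total error receive identical MAE scores, while RMSE penalizes the one containing an outlier.

```python
import numpy as np

def mae(y_true, y_pred):
    # equal weight to every error, per equation 2.3
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def rmse(y_true, y_pred):
    # squaring gives larger errors a higher weight
    return np.sqrt(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))

y_true = [0, 0, 0, 0]
even = [5, 5, 5, 5]      # four equal errors of 5
outlier = [1, 1, 1, 17]  # same total error of 20, concentrated in one outlier

print(mae(y_true, even), mae(y_true, outlier))    # 5.0 5.0   - MAE identical
print(rmse(y_true, even), rmse(y_true, outlier))  # 5.0 ~8.54 - RMSE weights the outlier up
```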
2.4.4 Cross-validation
The validity of predictive models depends on their ability to generalize and accurately
predict properties of new and unseen data that is independent from the data used to
train the model (Varoquaux, 2018). This can be done with cross-validation, which
splits the available data into separate training and testing sets. Cross-validation
methods often use multiple rounds to evaluate the results. One of these methods is the
k-fold cross-validation where the data is divided into k subsets of equal size, and the
model performs k training rounds, or folds. In each fold, one of the subsets is used as
the test set and the remaining k − 1 subsets as the training set (Liu et al.,
2020). Each fold results in a validation error, and the k-fold cross-validation estimate is
the average validation error for all the k folds.
Many prediction models use hyperparameters, such as kernel or regularization
parameters. These parameters are not calculated within the model, so they need to be
set before the model can be used. Hyperparameters serve different purposes in a
model; the regularization parameter, for instance, decides the amount of shrinkage.
The value of these hyperparameters can sometimes greatly
affect the results, so finding the optimal value can be decisive for the performance of
the model. Cross-validation can be used both to find the optimal hyperparameter for
the model chosen, or simply to evaluate different models compared to each other (Liu
et al., 2020).
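The k-fold procedure described above can be sketched with scikit-learn (synthetic data standing in for a real dataset; this is an illustration, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, test_idx in kfold.split(X):
    # each subset serves as the test set exactly once
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

cv_estimate = np.mean(fold_errors)  # the k-fold estimate: the average validation error
```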
2.4.5 Choosing the models
All the models chosen for prediction are regression models, in which there are several
predictors that are used together to try to predict the dependent variable, or target
value. One common feature of most machine learning techniques is that the model is
first trained on a sub-set of the available data, and then tested on the rest.
In this study, a 5-fold cross-validation will be used in order to make sure that the
results are not affected by randomness as each sub-part of the dataset will be used
four times as a training set, and once as the test set.
For the cross-validation, it has to be decided in which format the results are to be
returned. In this study the results will be in the form of MAE, the mean absolute error.
The target value of the study is the actual lead time of each part and is measured in
days. This allows MAE to be easily understood and the fact that it is scale dependent
is not an issue. MAE is also better at handling outliers in the data, which exist in this
study (Mendo, 2009).
Scikit-learn also offers a built-in MAE scorer in its cross-validation function, unlike
MAPE and sMAPE.
2.5 Previous studies
Khakdaman et al. (2015) build a robust optimization model to manage production
planning in an MTS/MTO environment. The uncertainties consist of delivery, process
and demand. The delivery part of the uncertainty is only about cost and currency
rates, which means that they overlook the risk of delayed or incomplete deliveries.
They conclude that although their focus is mostly on cost-related uncertainties, they
feel confident that their model delivered accurate estimates and useful findings.
Gao et al. (2017) use a Markov model for signal-based dynamic supply forecasting, in
which suppliers are examined to find signals (financial health etc.) in order to guide
the procurement and selling decisions of the company. In the model, the capacity of
the suppliers is assumed to be stochastic. Their research concludes that a traditional
stationary forecast for supply capacity uncertainty may lead to poor decisions and
severe losses.
In (Peidro et al., 2010), a fuzzy linear programming model is used to deal with supply
chain planning under demand, process and delivery uncertainty. The uncertainties are
jointly considered, and the aim of the model is to use the available resources in order
to meet customer demands at a minimum cost. The model is tested with real-world
data and the results show that the model is more effective than deterministic methods
for handling situations without certain and precise information in supply chain
planning. The researchers see the integration of analytical and simulation models as a
possible improvement, where the best capacities of both types of models are
integrated to be used for supply chain planning problems.
Mirzapour Al-e-hashem et al. (2011) build a robust multi-objective mixed integer
nonlinear programming model to deal with multi-product aggregate production
planning (APP) with uncertainty. The supply chain includes multiple suppliers,
multiple customers, is multi-period and multi-site. The uncertainties include the
demand fluctuations, as well as the cost parameters of the supply chain.
The demand side of supply chain management is extensively researched, and one
phenomenon of demand uncertainty is the bullwhip effect. This effect is one of
the main sources of inefficiencies in supply chains, and by extension, in production
planning. The effect is described as a growing difficulty to forecast demand the
further from the end customer, or higher up the supply chain an organization is (Braz
et al., 2018). Different ways to mitigate this effect are recommended, one way is to
implement a closed-loop supply chain, which has shown to be less affected than
traditional supply chains (Braz et al., 2018). Another way, introduced by Jaggi et al.
(2018) is the iterative proportional allocation algorithm which discourages the
bullwhip effect.
In the cases where these models deal with the uncertainty in supply, this uncertainty
is mixed and calculated together with other uncertainties, or it is viewed as a
stochastic uncertainty. The models do not focus on the supply uncertainty alone. This
means that the models produce results that are generalized and not consistent with
the complexity of real-world cases since they attribute all suppliers and items with the
same uncertainty. The research gap found here is to move away from the
generalization and instead use predictive modeling to form a data driven model
which can be incorporated in a bigger model that deals with production planning
under uncertainty in general.
Khakdaman et al. (2015, p. 1384) express this need to consider delivery uncertainty at
the end of their conclusion: “Finally, incorporating non-financial uncertainties such
as production lead time or raw material supply lead time into the model would
enhance its usefulness to both academics and practitioners.”
Predicting deliveries from suppliers could be used either in models as explained
above, or independently as a way to handle uncertainties in the supply chain, i.e. as a
form of supply chain risk management.
This study aims to find the predictive model best suited for these purposes.
3 Method
The methods and approaches to answer the research question are presented in the
following chapter, along with some information regarding the reliability, validity and
research ethics and how these are considered in this study.
The subchapters are presented in the order in which the corresponding steps were
carried out.
The study consists of four main parts: data collection, data preprocessing, prediction
and evaluation. The first part, the data collection, was carried out directly at a
manufacturing company, in order to make sure that the results of the study are
applicable to real-world cases. The data then had to be preprocessed before it could be
used in the chosen predictive models. After the preprocessing was complete, the
predictive models were constructed and applied to the data. Lastly, the results from
the different predictive models were gathered, summarized and presented, in order to
see if any conclusions could be drawn.
All parts are described in more detail in the following chapter.
Figure 2: The structure of the study
3.1 Data collection
All the data was gathered from the company's ERP system, which tracks all orders,
warehouse information etc. throughout the different company facilities. The system
also contains information about the supplier for each part, along with the expected
lead time. The lead time is the time from the placement of the order to when the order
is fulfilled. The system also tracks the actual lead time of each delivery, and that is the
important information. The actual lead time is what is being used as the target value
in this study, the value that the models will try to predict.
This means that there is historical data in the company's ERP system regarding when
each order is placed, the lead time of each part, the supplier, and when the order was
received.
The data gathered is all internal transactions during 2018 and 2019, which in total was
375 000 lines. This was then filtered to only include the deliveries from the main
producing factory in Sweden that are sent to the other company facilities around the
world.
To make sure that the dataset would be large enough, and that the results would hold
over multiple cases, the 10 items with the most deliveries during these two years were
selected to be used in the models. This means that the predictive models were applied
to 10 different items to see if any clear conclusions could be drawn.
The data was exported from the ERP system to an Excel file, in which the data
regarding the 10 most frequently delivered items was picked out, and then converted
into a comma-separated file (.csv) in order to facilitate the data handling in Python.
3.2 Data preprocessing
To be able to use the data in a prediction model, it first needs to be preprocessed and
structured in a correct way. To successfully use algorithms to draw conclusions from
large sets of data, one key point is the quality of the data. The data needs to be
complete, up to date, and without errors, which collected data rarely is (Davidson and
Tayi, 2009).
Modern databases, many of huge sizes, often include noisy and missing data that
originate from multiple different sources (Jiawei Han et al., 2012). Data quality can be
described as the accuracy, completeness, consistency, timeliness, believability and
interpretability of the data (Jiawei Han et al., 2012). To help improve the data quality
and substantially improve the quality of the results, there are a few techniques of data
preprocessing that can be used.
Completeness is one key part of data quality, and one that often needs to be properly
addressed since missing values are a common issue with large datasets. A missing
value occurs when an observation, or tuple, does not have a value for each of the
attributes. The reason a tuple lacks values can be a technical issue, but it can also be
that the system gathering the data is constructed in a way that creates missing values.
A few different techniques to deal with this problem exist: simply removing the tuple,
using a global constant to fill in the missing value, or filling in the mean or median
value for that particular attribute.
All these techniques come with drawbacks, as ignoring the tuple decreases the size of
the data and possibly affects the quality of the results. Using a constant for all missing
values, or using the mean or median value, introduces a bias in the data and can
therefore also reduce the quality of the results (Jiawei Han et al., 2012).
In this study there was one attribute for which many of the tuples were missing data,
due to how the system is constructed. These missing values actually represent 0 and
were thus replaced with 0.
Binning is a data preprocessing technique to smooth data and can be used to reduce
the number of distinct values that an attribute can take (Jiawei Han et al., 2012). In this
study, all the orders gathered from the ERP system include information about when
the order was placed, when it was scheduled for delivery, and other scheduling
information. All of these were in a year-month-day format, which creates many
different possible combinations over a two-year period. Here, binning was used to
reduce the number of possible values, and to better visualize any monthly effects. All
dates were formatted as just the month.
One of the attributes in the dataset had all its values formatted as text, so these were
transformed into numbers instead to ease the handling and loading of the data.
Lastly, the data was cleared of any sensitive information (costs, who placed the
order etc.) as well as a few redundant attributes. Attributes containing information
only available once the delivery was completed were also removed, since a useful
predictive model cannot rely on information that does not exist before the actual
delivery takes place. After preprocessing, the data contained 15 attributes in addition
to the target value (actual delivery time).
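The preprocessing steps described above (zero-filling, month binning, encoding text as numbers, and dropping post-delivery attributes) might be sketched in pandas roughly as follows. The column names and values are hypothetical, since the study does not list the actual attributes:

```python
import pandas as pd

# hypothetical columns standing in for the ERP export
df = pd.DataFrame({
    "order_date": ["2018-01-15", "2018-02-03", "2019-11-20"],
    "supplier": ["A", "B", "A"],
    "quantity": [10, None, 7],              # missing values that are in fact 0
    "invoice_total": [120.0, 80.0, 95.0],   # only known after delivery
})

df["quantity"] = df["quantity"].fillna(0)                      # missing values replaced with 0
df["order_month"] = pd.to_datetime(df["order_date"]).dt.month  # bin dates to month
df["supplier"] = df["supplier"].astype("category").cat.codes   # text -> numeric codes
df = df.drop(columns=["order_date", "invoice_total"])          # raw dates and post-delivery info
```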
3.3 Prediction
This chapter is divided into one part for each of the chosen prediction models,
describing which parameters must be chosen and how the prediction is performed.
Python has become one of the most popular languages due to its interactive nature
and the growing ecosystem of scientific libraries (Pedregosa et al., 2011).
Scikit-learn is a Python module which integrates several machine learning algorithms
for supervised and unsupervised learning. The scikit-learn project was created in
order to provide an open source machine learning library that would be well
accessible to non-machine learning experts and usable in various scientific areas
(Buitinck et al., 2013).
The library is a collection of functions and classes that the user imports into a
Python environment.
Scikit-learn is, as previously mentioned, a Python library and needs an environment
to run in. Anaconda, a package and environment manager, has been chosen to be
used together with the scikit-learn library. Anaconda has thousands of open-source
packages available for installation and offers a simple user interface (Anaconda
Distribution 2020).
3.3.1 Linear regression
Scikit-learn uses the most common method for parameter estimation, ordinary least
squares, shown in equation 3.1, which minimizes the sum of squared differences
between the observed values and the predicted ones (Linear Regression 2020).
\min_{w} \lVert Xw - y \rVert_2^2    (3.1)
Table 4 The variables of ordinary least squares
w Coefficients y Target values
X Training data
The data is divided into arrays: X for the input data and y for the target values. The
coefficients, or weights, are calculated and the model is then fitted and tested.
The results are then calculated with cross-validation and presented in the
form of MAE.
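This procedure, fitting ordinary least squares and scoring it with 5-fold cross-validated MAE, corresponds roughly to the following sketch (synthetic data shaped like one item, about 160 tuples and 15 attributes, in place of the delivery dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

# scoring="neg_mean_absolute_error" returns negated MAE, so the sign is flipped back
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
mae_cv, sd_cv = -scores.mean(), scores.std()  # reported as MAE with its standard deviation
```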
3.3.2 Ridge regression
Ridge regression uses a similar method for parameter estimation as the ordinary least
squares method does but adds a penalty on the size of the coefficients. Then the model
minimizes the sum of squares in accordance with equation 3.2 (Linear Models 2020):
\min_{w} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2    (3.2)
Table 5 The variables of Ridge regression
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
The complexity parameter alpha in the equation above decides the amount of
shrinkage in the model; a larger value of alpha results in greater shrinkage. Since the
best suitable value of alpha can differ a lot depending on the data, Scikit-learn offers a
method to try different values. This is done with RidgeCV which is a cross validation
method where the model uses different alpha values to see which one best fits the
model (RidgeCV 2020).
Figure 3: Example of how the MAE changes depending on the value of alpha
RidgeCV performs a 5-fold cross-validation for each of the alpha values specified to
be tested. In this study, the following alpha values are included:
Table 6 The values tested as alpha in Ridge regression
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
After the best suited alpha value is calculated, the results are calculated with cross
validation and presented in the form of MAE together with the standard deviation of
the results.
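The alpha search with RidgeCV, followed by scoring the tuned model, might look roughly like this (a sketch on synthetic data, using the alpha grid from Table 6):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]
best_alpha = RidgeCV(alphas=alphas, cv=5).fit(X, y).alpha_  # 5-fold CV over the grid

# score the tuned model with 5-fold cross-validated MAE
scores = cross_val_score(Ridge(alpha=best_alpha), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
```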
3.3.3 Lasso
Lasso estimates sparse coefficients and tends to prefer solutions with fewer non-zero
coefficients, which reduces the number of variables the solution depends on. Lasso
tries to minimize equation 3.3 (Linear Models 2020):
\min_{w} \frac{1}{2 n_{\mathrm{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1    (3.3)
Table 7 The variables of Lasso
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
The alpha in the equation decides the degree of sparsity of the coefficients, and just
like Ridge regression, the best suited alpha for the model can be decided by cross
validation. Scikit-learn offers two different methods for deciding the value of alpha,
the LassoCV and the LassoLarsCV function. LassoCV is however most often preferred
and will be used in this study (Lasso 2020).
Figure 4: Example of how the MAE changes depending on the value of alpha
The following values of alpha were tested in the model, each time with LassoCV
performing a 5-fold cross-validation to test how the different alpha values affect the
results:
Table 8 The values tested as alpha in Lasso
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
The best suited parameter is then set and used in a 5-fold cross validation to get the
results, which are presented as MAE together with the standard deviation of the
results.
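Analogously to RidgeCV, the LassoCV search might be sketched as follows (synthetic data, alpha grid from Table 8):

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]
best_alpha = LassoCV(alphas=alphas, cv=5).fit(X, y).alpha_  # 5-fold CV over the grid

# the tuned Lasso is then scored with 5-fold cross-validated MAE
scores = cross_val_score(Lasso(alpha=best_alpha), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
```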
3.3.4 Elastic net
Elastic net uses both L1 and L2-norm regularization on the coefficients, which gives
the model some of the properties of both Lasso and Ridge regression. Elastic net tries
to minimize equation 3.4 (Linear Models 2020):
\min_{w} \frac{1}{2 n_{\mathrm{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2    (3.4)
Table 9 The variables of Elastic net
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
𝜌 Ratio between L1 and L2 penalty
As seen above, the equation for Elastic net contains two parameters, alpha and 𝜌 (L1
ratio). To find the parameters most suited for the model, the ElasticNetCV function
can be used (ElasticNetCV 2020). Another possibility is to use the function
GridSearchCV which performs an exhaustive search from a grid of specified
parameters to see which combination provides the best result (GridSearchCV 2020). In
this study, the alpha values tested are the following:
Table 10 The values tested as alpha in Elastic net
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
The second parameter used in Elastic net is the L1 ratio, which decides the
combination of L1 and L2 penalty. This value must be between 0 and 1, with 0 being
an L2 penalty (Ridge) and 1 being an L1 penalty (Lasso). All values between 0 and 1
result in a combined L1 and L2 penalty. It is recommended to try more values of the
L1 ratio closer to 1 than 0 (ElasticNetCV 2020). The L1 ratio numbers tested by the
GridSearchCV in this study are:
Table 11 The values tested as 𝜌 in Elastic net
𝜌 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1
After the parameters are set, a 5-fold cross validation is used to get the results of the
model, which are presented together with the standard deviation.
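The joint search over alpha and the L1 ratio with GridSearchCV might be sketched as follows. The grid here is reduced for illustration; the study tested more alpha values, and also l1_ratio = 0, which is omitted in this sketch to keep the coordinate-descent solver well-behaved:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1, 10, 100],
    "l1_ratio": [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
}
# exhaustive search over all alpha/l1_ratio combinations, scored by 5-fold MAE
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
best = search.best_params_  # the combination that yielded the lowest MAE
```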
3.4 Evaluation
After calculating the optimal parameters for the prediction models, these parameters
were inserted in the models and the results were calculated by using a 5-fold cross
validation. During the cross validation the dataset is divided into 5 smaller sets of
equal size, of which 4 will be used for training and the 5th set for testing. This is
repeated 5 times until all of the smaller sets have been used as both training and
testing sets (Cross-validation 2020). The function for the actual results of the cross
validation in Scikit-learn is the cross_val_score function, which returns the results of
each of the folds in the cross validation. The cross_val_score function offers a few
different ways of presenting the results, one of which is the MAE that was used in this
study. The result returned by the cross_val_score function for each fold is the average
error of the predictions for the tuples in the test set. This means that each fold of the
cross-validation returns only one value, which was collected and is presented in the results
chapter.
3.5 Reliability & Validity
This study is a quantitative study, which comes with the benefit that the
interpretations and findings are based on quantitative and measured data rather than
impressions. But for the analysis to have any value, the data has to be valid
(Denscombe, 2014). Good validity and reliability are two key principles of ensuring
that the research has good credibility and is of high quality.
3.5.1 Reliability
Reliability is a measure of the consistency of the study: a research instrument
should reach approximately the same result every time the test is completed (Heale
and Twycross, 2015). This means that the people behind the research should have no
impact on the outcome of the study, and that, everything else being equal, it should
produce the same results on different occasions (Denscombe, 2014).
The use of methods that are firmly grounded in previous research both for the
analysis of the data as well as the examination of the results indicates that the study is
of high reliability.
3.5.2 Validity
The validity refers to the accuracy of the data, as well as to whether the data is
appropriate for answering the research question of the study (Denscombe, 2014). The
data in this study is taken directly from the ERP system of an actual company, which
makes it appropriate for answering the research question. The data is also of large
quantity, spans a two-year period and was checked for errors before being used.
This means that the data in the study is of high validity.
3.6 Research ethics
To conduct research in an ethical way, there are six main principles that need to be
considered. These are (Our core principles 2020):
• The research should maximize the benefit for individuals and society, as
well as minimize the risk and harm.
• The research respects the rights and dignity of individuals and groups.
• Participation in the research should be voluntary and appropriately
informed.
• The research is to be conducted with integrity and transparency.
• That the lines of responsibility and accountability are clearly defined.
• Research should be independent, and that conflicts of interest are explicit
in the cases where they cannot be avoided.
There is no reason to believe that the study violates any of these principles.
Following the principles is made easier since no information about individuals is
gathered; the only data used is retrieved from the company's ERP system and is
presented without any sensitive information.
4 Results
As data regarding 10 items was chosen, 10 individual predictions were made with
the four different models. The results are presented below in Table 8 and Figures 6
and 7.
The number of the item is in the header, along with the number of rows (tuples) and
attributes. The results of the predictions are presented as the mean absolute error
(MAE) together with the standard deviation (SD). Ridge regression, Lasso and Elastic
net all use one or two parameters which must be set in the model. The values shown
are the parameters that yielded the best results; blank fields mean that the model does
not use the parameter.
Table 8 The results of the prediction models
Item 1 (160 x 15)
MAE SD Alpha L1 ratio
Linear regression 34.923 14.029
Ridge regression 29.794 5.933 15
Lasso 29.78 7.286 5
Elastic net 29.78 7.286 5 1
Item 2 (159 x 15)
MAE SD Alpha L1 ratio
Linear regression 15.785 4.082
Ridge regression 15.785 4.082 0.001
Lasso 15.785 4.08 0.001
Elastic net 15.785 4.08 0.001 1
Item 3 (188 x 15)
MAE SD Alpha L1 ratio
Linear regression 26.631 9.759
Ridge regression 26.603 9.965 50
Lasso 26.091 10.491 10
Elastic net 26.091 10.491 10 1
Item 4 (374 x 15)
MAE SD Alpha L1 ratio
Linear regression 32.974 8.014
Ridge regression 32.783 8.38 10
Lasso 32.891 8.374 1
Elastic net 32.783 8.4 0.05 0.3
Item 5 (260 x 15)
MAE SD Alpha L1 ratio
Linear regression 39.594 15.488
Ridge regression 37.398 10.768 75
Lasso 39.05 11.389 15
Elastic net 37.443 10.924 0.3 0
Item 6 (165 x 15)
MAE SD Alpha L1 ratio
Linear regression 37.619 12.131
Ridge regression 37.567 12.095 1
Lasso 37.375 7.826 50
Elastic net 37.375 7.826 50 1
Item 7 (163 x 15)
MAE SD Alpha L1 ratio
Linear regression 27.046 8.319
Ridge regression 26.678 8.153 150
Lasso 26.319 8.441 5
Elastic net 26.319 8.441 5 1
Item 8 (155 x 15)
MAE SD Alpha L1 ratio
Linear regression 26.719 5.191
Ridge regression 25.778 4.564 150
Lasso 26.411 5.135 30
Elastic net 25.82 4.731 3 0.4
Item 9 (152 x 15)
MAE SD Alpha L1 ratio
Linear regression 12.645 3.756
Ridge regression 12.521 3.927 150
Lasso 12.164 4.319 5
Elastic net 12.164 4.319 5 1
Item 10 (178 x 15)
MAE SD Alpha L1 ratio
Linear regression 25.182 8.999
Ridge regression 24.493 8.646 150
Lasso 23.809 8.74 15
Elastic net 23.803 8.772 15 0.95
Below is a presentation of the average values of the mean absolute error as well as the
average standard deviation of the models.
Figure 6: The average MAE of the results
Figure 7: The average SD of the results
5 Discussion
The linear regression model with the ordinary least squares method performs worst in
all cases, as can be seen in table 8, which could be expected as this model is unable to
perform any regularization. That the regularization is what makes the other models
perform better is clear, as the results of Item 2 show that, just as theory predicts
(compare equations 3.1 and 3.2, for example), the regularized models perform in line
with linear regression when their regularization parameter, alpha, moves towards
zero. The lowest value of alpha that the models could use was 0.001, but they might
have chosen 0 had it been among the possible values.
Ridge regression was developed to deal with some of the shortcomings of the
ordinary least squares method, and the results clearly show that it does so. Even
though the results are close to each other, Ridge regression performs better than, or as
well as, linear regression in all cases.
Lasso was also developed as a continuation and improvement of the linear regression
and is highlighted for its ability to reduce the number of variables that the solution is
dependent on. This is brought up as a solution to the problem with standard
regression models which often include too many variables, leading to overfitting
(Ranstam and Cook, 2018). It is difficult to draw any conclusions when comparing the
results of Lasso and Ridge regression: even though the number of variables is rather
small (15), Lasso performs better than Ridge in most cases, but not all (see table 8). It
is therefore hard to say whether Lasso performs better due to its tendency to prefer
fewer variables, since there are not that many to begin with, and Lasso does not
perform better in all cases.
Elastic net was developed to improve on Lasso, and the results show that it does not
just perform better than Lasso; it performs best overall, as can clearly be seen in figure
6. Thanks to its parameter controlling the ratio between the L1 and L2 penalties, it can
behave more like Ridge regression or more like Lasso, depending on which suits the
data best. As the L1 ratio approaches 1 the model performs like Lasso, and as the ratio
moves towards 0 it performs like Ridge regression, just as it theoretically should. This
gives Elastic net the flexibility to adapt to the data at hand and to always perform
better than, or as well as, the other models.
Hao and Lu (2018) point out that the Elastic net model performs especially well
when the number of variables exceeds the number of tuples, but as seen in the
results it performs as well as, or better than, Ridge regression and Lasso even though
the number of variables is far smaller than the number of tuples in all cases.
The performance of Elastic net in situations with fewer variables than tuples is also a
key point in the study of Zou and Hastie (2005). This, together with the fact that no
clear patterns can be seen when comparing the items with more tuples to those with
fewer, makes it interesting to see whether any of the models would stand out as the
number of tuples becomes larger or smaller. At least in theory, Elastic net should
perform even better as the number of tuples shrinks below the number of variables.
Previous research in the area of production planning under uncertainty has resulted
in models which are an improvement over production planning that takes no
consideration of the uncertainties involved. This means that the area is already useful
for practitioners and researchers, but including this data-driven method for
forecasting deliveries could improve the models even further.
6 Conclusion
The best overall model for predicting the delivery times is the Elastic net, being the
best predictor in almost all cases and very close to the best in the rest. It is clear
that it adapts and behaves more like Ridge or more like Lasso depending on which
one works best. Ordinary linear regression performs worst in more or less all cases
and should therefore not be used for this purpose.
This means that Elastic net should be chosen as the predictive model when calculating
the actual delivery times from suppliers. These results can then be used to mitigate
the supply uncertainty in a larger model for production planning, or independently in
supply chain risk management.
6.1 Future research
The sample sizes in this study span from 152 to 374 observations, which means that
these parts were delivered multiple times per month during these two years. For a
producing company, many parts from suppliers will be delivered far less often than
that, which results in a weaker basis for a prediction model. Future research could
examine whether similar results would appear as the number of observations gets
smaller, and how small this number can be while the model still provides a
meaningful result.
As previous studies in the area often simplify the calculation of uncertainty on the
supply side, it would be interesting to see a predictive model, preferably Elastic
net, implemented in a larger model. This, though, requires data from a real-world case.
The results could then be compared with those of the previous models to see whether
there is any improvement.
References
1.1. Linear Models — scikit-learn 0.22.2 documentation [WWW Document], n.d. URL
https://scikit-learn.org/stable/modules/linear_model.html#lasso (accessed
3.19.20).
3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.22.2
documentation [WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/cross_validation.html#cross-validation (accessed
5.1.20).
3.2.4.1.1. sklearn.linear_model.ElasticNetCV — scikit-learn 0.22.2 documentation
[WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#
sklearn.linear_model.ElasticNetCV (accessed 5.1.20).
3.2.4.1.9. sklearn.linear_model.RidgeCV — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#skle
arn.linear_model.RidgeCV (accessed 4.30.20).
3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 0.22.2
documentation [WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/model_evaluation.html#regression-metrics
(accessed 5.4.20).
Anaconda Distribution — Anaconda documentation [WWW Document], n.d. URL
https://docs.anaconda.com/anaconda/ (accessed 3.19.20).
Aouam, T., Geryl, K., Kumar, K., Brahimi, N., 2018. Production planning with order
acceptance and demand uncertainty. Comput. Oper. Res. 91, 145–159.
https://doi.org/10.1016/j.cor.2017.11.013
Beemsterboer, B., Land, M., Teunter, R., 2016. Hybrid MTO-MTS production planning:
An explorative study. Eur. J. Oper. Res. 248, 453–461.
https://doi.org/10.1016/j.ejor.2015.07.037
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 2013. Introduction, in: Time Series Analysis.
John Wiley & Sons, Ltd, pp. 7–18. https://doi.org/10.1002/9781118619193.ch1
Braz, A.C., De Mello, A.M., de Vasconcelos Gomes, L.A., de Souza Nascimento, P.T.,
2018. The bullwhip effect in closed-loop supply chains: A systematic literature
review. J. Clean. Prod. 202, 376–389.
https://doi.org/10.1016/j.jclepro.2018.08.042
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae,
V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly,
A., Holt, B., Varoquaux, G., 2013. API design for machine learning software:
experiences from the scikit-learn project. ArXiv13090238 Cs.
Bushuev, M.A., Guiffrida, A.L., 2012. Optimal position of supply chain delivery
window: Concepts and general conditions. Int. J. Prod. Econ. 137, 226–234.
https://doi.org/10.1016/j.ijpe.2012.01.039
Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error
(MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model
Dev. 7, 1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Choosing the right estimator — scikit-learn 0.22.2 documentation [WWW Document],
n.d. URL https://scikit-
learn.org/stable/tutorial/machine_learning_map/index.html (accessed 3.19.20).
Davidson, I., Tayi, G., 2009. Data preparation using data quality matrices for
classification mining. Eur. J. Oper. Res. 197, 764–772.
https://doi.org/10.1016/j.ejor.2008.07.019
de Myttenaere, A., Golden, B., Le Grand, B., Rossi, F., 2016. Mean Absolute Percentage
Error for regression models. Neurocomputing 192, 38–48.
https://doi.org/10.1016/j.neucom.2015.12.114
Denscombe, M., 2014. The Good Research Guide : For Small-scale Research Projects,
Open UP Study Skills. McGraw-Hill Education, Maidenhead, Berkshire.
Er, M., Arsad, N., Astuti, H., Kusumawardani, R., Utami, R., 2018. Analysis of
production planning in a global manufacturing company with process
mining. J. Enterp. Inf. Manag. 31, 317–337. https://doi.org/10.1108/JEIM-01-
2017-0003
Flores, B.E., 1986. A pragmatic view of accuracy measurement in forecasting. Omega
14, 93–98. https://doi.org/10.1016/0305-0483(86)90013-7
Gao, L., Yang, N., Zhang, R., Luo, T., 2017. Dynamic Supply Risk Management with
Signal-Based Forecast, Multi-Sourcing, and Discretionary Selling. Prod. Oper.
Manag. 26, 1399–1415. https://doi.org/10.1111/poms.12695
Hao, Y.-X., Lu, D., 2018. Sparse approximation of fitting surface by elastic net.
Comput. Appl. Math. 37, 2784–2794. https://doi.org/10.1007/s40314-017-0475-4
Hastie, T., Tibshirani, R., Friedman, J., 2009. Linear Methods for Regression, in: Hastie,
T., Tibshirani, R., Friedman, J. (Eds.), The Elements of Statistical Learning:
Data Mining, Inference, and Prediction, Springer Series in Statistics. Springer,
New York, NY, pp. 43–99. https://doi.org/10.1007/978-0-387-84858-7_3
Heale, R., Twycross, A., 2015. Validity and reliability in quantitative studies. Evid.
Based Nurs. 18, 66–67. https://doi.org/10.1136/eb-2015-102129
Hübl, A., 2018. Stochastic Modelling in Production Planning. Springer Fachmedien
Wiesbaden, Wiesbaden. https://doi.org/10.1007/978-3-658-19120-7
Jaggi, C., Verma, M., Jain, R., 2018. Quantitative analysis for measuring and
suppressing bullwhip effect. Yugosl. J. Oper. Res. 28, 415–433.
https://doi.org/10.2298/YJOR161211019J
Han, J., Kamber, M., Pei, J., 2012. Data Preprocessing, in: Data Mining: Concepts and
Techniques, Third Edition. Morgan Kaufmann.
https://doi.org/10.1016/B978-0-12-381479-1.00003-4
Khakdaman, M., Wong, K.Y., Zohoori, B., Tiwari, M.K., Merkert, R., 2015. Tactical
production planning in a hybrid Make-to-Stock–Make-to-Order environment
under supply, process and demand uncertainties: a robust optimisation
model. Int. J. Prod. Res. 53, 1358–1386.
https://doi.org/10.1080/00207543.2014.935828
Liu, Y., Liao, S., Jiang, S., Ding, L., Lin, H., Wang, W., 2020. Fast Cross-Validation for
Kernel-Based Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1083–
1096. https://doi.org/10.1109/TPAMI.2019.2892371
Makridakis, S., 1993. Accuracy measures: theoretical and practical concerns. Int. J.
Forecast. 9, 527–529. https://doi.org/10.1016/0169-2070(93)90079-3
Mendo, L., 2009. Estimation of a probability with guaranteed normalized mean
absolute error. IEEE Commun. Lett. 13, 817–819.
https://doi.org/10.1109/LCOMM.2009.091128
Mirzapour Al-e-hashem, S.M.J., Malekly, H., Aryanezhad, M.B., 2011. A multi-
objective robust optimization model for multi-product multi-site aggregate
production planning in a supply chain under uncertainty. Int. J. Prod. Econ.
134, 28–42. https://doi.org/10.1016/j.ijpe.2011.01.027
Mula, J., Poler, R., García-Sabater, J.P., Lario, F.C., 2006. Models for production
planning under uncertainty: A review. Int. J. Prod. Econ. 103, 271–285.
https://doi.org/10.1016/j.ijpe.2005.09.001
“neural-networks” tag wiki [WWW Document], n.d. Math. Stack Exch. URL
https://math.stackexchange.com/tags/neural-networks/info (accessed 5.17.20).
Ngniatedema, T., Fono, L.A., Mbondo, G.D., 2015. A delayed product customization
cost model with supplier delivery performance. Eur. J. Oper. Res. 243, 109–
119. https://doi.org/10.1016/j.ejor.2014.11.017
Our core principles - Economic and Social Research Council [WWW Document], n.d.
URL https://esrc.ukri.org/funding/guidance-for-applicants/research-
ethics/our-core-principles/ (accessed 3.16.20).
Pandis, N., 2016. Linear regression. Am. J. Orthod. Dentofacial Orthop. 149, 431–434.
https://doi.org/10.1016/j.ajodo.2015.11.019
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel,
M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É., 2011. Scikit-learn:
Machine Learning in Python. J. Mach. Learn. Res. 12, 2825−2830.
Peidro, D., Mula, J., Jiménez, M., del Mar Botella, M., 2010. A fuzzy linear
programming based approach for tactical supply chain planning in an
uncertainty environment. Eur. J. Oper. Res. 205, 65–80.
https://doi.org/10.1016/j.ejor.2009.11.031
Ranstam, J., Cook, J.A., 2018. LASSO regression. BJS Br. J. Surg. 105, 1348–1348.
https://doi.org/10.1002/bjs.10895
Russo, A., Raischel, F., Lind, P.G., 2013. Air quality prediction using optimal neural
networks with stochastic variables. Atmos. Environ. 79, 822–830.
https://doi.org/10.1016/j.atmosenv.2013.07.072
Schmidt, A.F., Finan, C., 2018. Linear regression and the normality assumption. J. Clin.
Epidemiol. 98, 146–151. https://doi.org/10.1016/j.jclinepi.2017.12.006
Shanmuganathan, S., Samarasinghe, S. (Eds.), 2016. Artificial Neural Network
Modelling, Studies in Computational Intelligence. Springer International
Publishing, Cham. https://doi.org/10.1007/978-3-319-28495-8
Silva, T.C., Ribeiro, A.A., Periçaro, G.A., 2018. A new accelerated algorithm for ill-
conditioned ridge regression problems. Comput. Appl. Math. 37, 1941–1958.
https://doi.org/10.1007/s40314-017-0430-4
sklearn.linear_model.Lasso — scikit-learn 0.22.2 documentation [WWW Document],
n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.
linear_model.Lasso (accessed 4.30.20).
sklearn.linear_model.LinearRegression — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.ht
ml#sklearn.linear_model.LinearRegression (accessed 4.29.20).
sklearn.model_selection.GridSearchCV — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.ht
ml (accessed 5.5.20).
Soo, K.W., Rottman, B.M., 2018. Causal strength induction from time series data. J.
Exp. Psychol. Gen. 147, 485–513. https://doi.org/10.1037/xge0000423
Su, X., Yan, X., Tsai, C.-L., 2012. Linear regression. WIREs Comput. Stat. 4, 275–294.
https://doi.org/10.1002/wics.1198
Sweeney, E., Grant, D.B., Mangan, D.J., 2018. Strategic adoption of logistics and
supply chain management. Int. J. Oper. Prod. Manag. 38, 852–873.
https://doi.org/10.1108/IJOPM-05-2016-0258
Varoquaux, G., 2018. Cross-validation failure: Small sample sizes lead to large error
bars. NeuroImage 180, 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061
Wang, S., Ji, B., Zhao, J., Liu, W., Xu, T., 2018. Predicting ship fuel consumption based
on LASSO regression. Transp. Res. Part Transp. Environ. 65, 817–824.
https://doi.org/10.1016/j.trd.2017.09.014
Weiss, C.J., 2017. Introduction to Stochastic Simulations for Chemical and Physical
Processes: Principles and Applications. J. Chem. Educ. 94, 1904–1910.
https://doi.org/10.1021/acs.jchemed.7b00395
Welc, J., Esquerdo, P.J.R., 2018. Basics of Regression Models, in: Welc, J., Esquerdo,
P.J.R. (Eds.), Applied Regression Analysis for Business: Tools, Traps and
Applications. Springer International Publishing, Cham, pp. 1–6.
https://doi.org/10.1007/978-3-319-71156-0_1
Zou, H., Hastie, T., 2005. Regularization and Variable Selection via the Elastic Net. J.
R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320.