Predicting deliveries from suppliers
A comparison of predictive models
Marcus Sawert
Credits: 30 HP
Semester/year: Spring 2020 (VT 2020)
Supervisor: Leif Olsson
Examiner: Aron Larsson
Course code/registration number: IG001A
Degree programme: Master of Science in Engineering, Industrial Engineering and Management (Civilingenjör i Industriell ekonomi)
Abstract
In the highly competitive environment that companies find themselves in today, a
well-functioning supply chain is key. For manufacturing companies, a good supply
chain depends on functioning production planning. Production planning tries to
fulfill demand while considering the available resources. This is complicated by the
uncertainties that exist, such as the uncertainty in demand, in manufacturing and in
supply. Several methods and models have been created to deal with production
planning under uncertainty, but they often overlook the complexity of the supply
uncertainty by treating it as a purely stochastic uncertainty. To improve these models,
a prediction based on earlier data regarding the supplier or item could be used to
estimate when a delivery is likely to arrive.
This study compared different predictive models to see which one is best suited for
this purpose.
Historic data regarding earlier deliveries was gathered from a large international
manufacturing company and preprocessed before being used in the models. The
target value that the models were to predict was the actual delivery time from the
supplier. The data was then tested with the following four regression models in
Python: linear regression, Ridge regression, Lasso and Elastic net. The results were
calculated by cross-validation and presented in the form of the mean absolute error
together with the standard deviation. The results showed that Elastic net was the
overall best performing model, and that linear regression performed the worst.
Keywords: Production planning, Supply, Deliveries, Prediction, Linear regression,
Ridge regression, Lasso, Elastic net.
Table of Contents
Abstract ............................................................................................................................... 2
List of abbreviations ........................................................................................................... 4
1 Introduction ...................................................................................................................... 5
1.1 Background ...................................................................................................................... 5
1.2 Purpose & Aim ................................................................................................................ 7
1.3 Research question ........................................................................................................... 7
1.4 Delimitations ................................................................................................................... 7
2 Theory ............................................................................................................................... 8
2.1 Supply chain management ............................................................................................ 8
2.2 Stochastic processes ........................................................................................................ 9
2.3 Predictive models .......................................................................................................... 10
2.4 Evaluation methods ...................................................................................................... 15
2.5 Previous studies ............................................................................................................ 18
3 Method ............................................................................................................................ 20
3.1 Data collection ............................................................................................................... 20
3.2 Data preprocessing ....................................................................................................... 21
3.3 Prediction ....................................................................................................................... 22
3.4 Evaluation ...................................................................................................................... 26
3.5 Reliability & Validity .................................................................................................... 27
3.6 Research ethics .............................................................................................................. 27
4 Results ............................................................................................................................ 29
5 Discussion ...................................................................................................................... 32
6 Conclusion ..................................................................................................................... 34
6.1 Future research .............................................................................................................. 34
References......................................................................................................................... 35
List of abbreviations
ANN Artificial Neural Network
ARIMA Autoregressive Integrated Moving Average
ERP Enterprise Resource Planning
Lasso Least Absolute Shrinkage and Selection Operator
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MRP Materials Requirement Planning
MRPII Manufacturing Resource Planning
MTO Make-to-order
MTS Make-to-stock
SCP Supply chain planning
sMAPE symmetric Mean Absolute Percentage Error
1 Introduction
In the highly competitive and volatile business environment many manufacturing
companies operate in today, it is of great importance for them to have functioning
supply chain management, which manages the flow of material from procurement,
through manufacturing, to the delivery to the end customer (Sweeney et al., 2018).
Production planning is a big part of this process, trying to fulfill demand while
considering the available material, employee and production resources.
The available material depends on the deliveries from the suppliers, which cannot
always be taken for granted to arrive on time (Khakdaman et al., 2015). This creates
the need to take the uncertainty in material procurement into consideration when the
delivery date to the end customer is communicated. It is therefore important to find a
model that can predict incoming deliveries with high accuracy.
1.1 Background
Supply chain management is a way of coordinating the activities within a company to
ensure that customer demand is met and delivered on time. The five main supply
chain processes of the Supply Chain Operations Reference model are plan, source,
make, deliver and return (Bushuev and Guiffrida, 2012). The delivery part of the
supply chain is of great importance, and delivery performance is one of the key
aspects of a successful supply chain. The timeliness of deliveries has an impact on
customer satisfaction and also has an economic aspect. Both early and late deliveries
create costs: early deliveries create extra inventory costs, while late deliveries produce
costs in the form of production stoppages, lost sales and lost goodwill. In some cases
delivery reliability, that the delivery is made on time, is valued more than a fast
delivery (Bushuev and Guiffrida, 2012).
Maintaining good delivery performance while keeping inventories low is increasingly
difficult, as the growing need for customization makes it harder to accurately forecast
demand (Ngniatedema et al., 2015). This highlights one of the key processes within
manufacturing companies for dealing with these problems: production planning.
Production planning is the process that aims to make sure that production fulfills the
demand, with consideration to the available material, employees and production
resources. It can be divided into three parts, starting with a long-term aggregate plan,
followed by a mid-term plan and finally a short-term plan, which decides which
products to produce, in what quantities, and when production should start (Er et al.,
2018). Production planning is closely linked to material procurement, since
production cannot be completed if the required material is not available. A number of
systems have been developed to help coordinate this, such as materials requirement
planning (MRP), manufacturing resource planning (MRPII) and enterprise resource
planning (ERP).
It is desirable that the scheduling produced by the production planning is as stable as
possible, but different uncertainties often cause continual re-planning (Er et al., 2018).
The uncertainties that need to be taken into consideration within production planning
can be divided into three main groups (Peidro et al., 2010): demand,
process/manufacturing and supply. Demand uncertainty is presented as the most
important one, consisting of the volatility in demand and inexact forecasting.
Process/manufacturing uncertainty concerns the internal production process, with
machine problems, quality issues and so on. The uncertainty in supply is caused by
faults or delays in the deliveries from the suppliers (Peidro et al., 2010).
There has been extensive research in the area of production planning under
uncertainty, but there is a handful of weaknesses that are not always addressed.
Beemsterboer et al. (2016) bring up the issue that much of the literature assumes either
pure make-to-order (MTO) or make-to-stock (MTS) production systems, conflicting
with modern hybrid production systems which combine the two.
A lack of consideration of multi-product organizations is also brought up as a
weakness in the available literature (Khakdaman et al., 2015); assuming a single
product is a simplification that does not portray modern organizations.
In some of the previous research on production planning under uncertainty, the
frameworks used have been stochastic programming and robust optimization, but
these models have considered only demand uncertainty, which is just one of the types
that exist within production planning (Aouam et al., 2018).
The need for further research is expressed by Mula et al. (2006), who determine that
uncertainty always needs to be taken into consideration, and that optimization in the
area of production planning is very complex. The same study mentions models based
on artificial intelligence and fuzzy set theory as possibly useful for modeling
production planning under uncertainty. Mula et al. (2006) also highlight the need for
models with other types of uncertainty, since models dealing with uncertain demand
have received far more attention in comparison.
Several studies in the area, such as Khakdaman et al. (2015), Gao et al. (2017), Peidro
et al. (2010) and Mirzapour Al-e-hashem et al. (2011), build their models with a
number of variables to help with production planning under uncertainty, but they
lack consideration of uncertain supply, or they assume that the supply is stochastic.
Assuming that the uncertain supply is stochastic is a simplification of reality, as
different parts and suppliers differ significantly in their delivery performance. A
model to predict incoming deliveries could make these models even more accurate.
1.2 Purpose & Aim
The aim of the study is to contribute to the research areas of supply chain risk
management as well as predictive modeling, by examining which predictive model is
best suited to predict the supply to a manufacturing company.
1.3 Research question
The research question is which type of predictive model gives the most accurate
results when predicting the supply of components and material to a manufacturing
company.
1.4 Delimitations
The data in the research will be taken from a large international manufacturing
company that has a multi-product mix, with both MTO and MTS production. The
production mostly consists of assembly of components from suppliers.
2 Theory
The following chapter presents a few theoretical concepts relevant to the study. This
includes different models for prediction, as well as methods to evaluate the results.
The choice of prediction models and evaluation methods used in the study is also
motivated; the method chapter presents in more depth how they were practically
used.
2.1 Supply chain management
Within business, a supply chain is the network of organizations, people, activities and
resources needed to meet the demand of the end customer. Managing this is a
foundation for an organization to attain and maintain its competitive advantage
(Bushuev and Guiffrida, 2012). Supply chain management includes the active
coordination of raw material acquisition, production processing and the distribution
of goods and services.
The five supply chain processes defined by the Supply Chain Operations Reference
model (SCOR) are plan, source, make, deliver and return. The importance of the
delivery process in the supply chain is well documented, as the timeliness of
deliveries is of great importance to the customer. This, together with the fact that
deliveries are integrated with and affect the different stages of the supply chain,
makes delivery performance the most important metric in a supply chain (Bushuev
and Guiffrida, 2012). Improving the delivery process is therefore of interest to
managers within supply chain and logistics.
How well the delivery process works for a manufacturing company depends on
whether the company can produce the required amount of goods in time, which
requires well-functioning production planning.
2.1.1 Production planning
Production planning is the process within a manufacturing company of planning
production with regard to the availability of material, production resources and
employees. The goals of production planning are, among others, on-time deliveries,
optimal production utilization and low inventories. These goals are often in conflict
with each other; for example, on-time deliveries can more often be achieved with the
help of a large inventory of finished goods, but such an inventory leads to higher
costs (Hübl, 2018).
Several technologies have been developed through the years to help companies with
production planning, such as the material requirement planning (MRP),
manufacturing resource planning (MRPII), enterprise resource planning (ERP) and the
supply chain planning (SCP) (Hübl, 2018).
The difficulty, and the source of continuous re-planning within production planning,
is the uncertainties involved. These are the demand uncertainty, the uncertainty in
the internal manufacturing/processes, and the uncertainty of supply (Peidro et al.,
2010). Demand uncertainty has been considered the most important of the three, and
the one that has received the most attention from researchers. One reason for the
difficulty of forecasting demand is the phenomenon called the bullwhip effect, which
has been extensively researched, resulting in several proposed ways to mitigate it.
The bullwhip effect is discussed in more detail in chapter 2.5.
The supply uncertainty is due to the fact that it cannot be taken for granted that
suppliers deliver the correct amount at the correct time (Khakdaman et al., 2015). As
on the demand side, this makes it difficult to forecast and predict when all needed
material will be available for production. This has, however, not been addressed to
the same extent as the demand uncertainty, resulting in a gap in the research as to
how the problem should be addressed.
The rest of the chapter will cover theory relevant to the supply chain management,
and production planning under uncertainty.
2.2 Stochastic processes
Stochastic means that something is randomly determined, and a stochastic process
changes randomly as time passes. Stochastic processes are often used in modern
chemical and physical research, as they suit the random nature of the underlying
mechanisms.
The random variables in a stochastic process can be generated from a number of
different distributions, of which the following three are the most commonly used
(Weiss, 2017):
• Uniform distribution, where values are distributed with equal probability
throughout the range.
• Poisson distribution, where a given number of independent events occur
in a time interval at a known rate.
• Binomial distribution, where variables with two possible outcomes are
randomly generated; the outcomes can have equal or unequal probability.
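The three distributions above can be illustrated with a short sketch in Python (the parameter values here are arbitrary examples, not taken from the study's data):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Uniform: every value in [0, 10) is equally probable
uniform = rng.uniform(low=0.0, high=10.0, size=1000)

# Poisson: counts of independent events occurring at a known rate (lam)
poisson = rng.poisson(lam=3.0, size=1000)

# Binomial with n=1: two outcomes (0 or 1) with unequal probability
binomial = rng.binomial(n=1, p=0.7, size=1000)
```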
Some of the studies considering uncertainty in supply use a stochastic variable for
calculating the risk. However, suppliers differ in their delivery performance, and
parts and components differ greatly in material, complexity, need for processing and
so on. This makes a stochastic variable for the delivery uncertainty a simplification of
reality. An accurate model therefore needs to be based on a calculation that considers
the varying delivery performance between different suppliers and products.
2.3 Predictive models
A few commonly used predictive models are presented below, along with some
information about which types of studies they have been used in and what types of
data they can be used with.
2.3.1 Neural Networks
An Artificial Neural Network (ANN) can be described as a model that is inspired by
the structure and processing of a biological brain. It consists of processing elements,
the neurons, and weighted connections between them. It is commonly defined by
four parameters (Shanmuganathan and Samarasinghe, 2016):
1. The type of neuron, for example McCulloch-Pitts neuron.
2. The connection architecture: neural networks can be fully or partially
connected, as well as having different layers of neurons. In an auto-associative
network, the input neurons are also the output neurons, while in a hetero-
associative network there are separate input and output neurons. The
architecture is also determined by the connections between the output and
input neurons. The feedforward architecture has no connections back from
the output to the input neurons, and the network has no memory of its
previous output values. The feedback architecture, on the other hand, has
connections back from the output to the input neurons, and the network
remembers its previous states.
3. The learning algorithm which trains the network; learning algorithms are
divided into supervised, unsupervised and reinforcement learning.
4. The recall algorithm, i.e. how knowledge is extracted from the neural
network. This includes pattern association, data clustering and categorization.
Figure 1: Example of the structure of an artificial neural network (Neural-networks 2020)
Depending on the goal of the neural network, it can be constructed in different ways.
The learning algorithm depends on whether the model is linear or not; linear neural
networks are often based on the least mean square rule, while nonlinear models are
often based on the back-propagation training rule (Russo et al., 2013).
Neural networks are often used for signal processing, forecasting and clustering, and
are increasingly being used in areas with imprecise data and complex relationships
between variables, something that is difficult to handle with traditional analytical
methods (Shanmuganathan and Samarasinghe, 2016).
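The feedforward architecture described above can be sketched as a single forward pass through a small fully connected network; the weights below are chosen arbitrarily for the example and do not come from any trained model:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One feedforward pass: input -> hidden (sigmoid) -> output (linear).
    No connections lead back from output to input, so the network has
    no memory of previous outputs."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden layer activations
    return W2 @ h + b2                         # output layer

# Arbitrary example weights: 3 inputs, 2 hidden neurons, 1 output
W1 = np.array([[0.5, -0.2, 0.1],
               [0.3, 0.8, -0.5]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([0.2])

y = forward(np.array([1.0, 2.0, 3.0]), W1, b1, W2, b2)
```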
2.3.2 Time series analysis
Time series consist of observations taken sequentially in time, which can be analyzed
with the aim of finding trends or other useful information. The observations need to
be taken at equal time intervals; these can be monthly, weekly, daily or any other
defined period. Time series are frequently used in economics, business, engineering
and the natural sciences (Box et al., 2013). One inherent feature of time series is that
the observations are normally dependent. This dependence is often of great interest,
and time series analysis includes techniques for analyzing it.
In industry, business and economics, forecasting with the help of time series is of
particular interest. In these fields time series are often nonstationary, with no constant
mean level over time. Forecasting methods have been developed to deal with this,
which use exponentially weighted moving averages. One common model is the
autoregressive integrated moving average (ARIMA) (Box et al., 2013).
One problem with time series, brought up by Soo and Rottman (2018), is the
complexity of the influences that variables have on each other. Variables can exhibit
temporal trends, which can make it appear as if a positive or negative relationship
exists between variables even when no direct relation exists.
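The exponentially weighted moving averages mentioned above can be sketched with simple exponential smoothing; the demand series and smoothing factor here are invented for the example:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: each new level is a weighted average
    of the latest observation and the previous level, so older
    observations receive exponentially decaying weights. The final level
    is used as the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

demand = [100, 102, 101, 105, 107, 106]
print(ses_forecast(demand, alpha=0.5))  # 105.5
```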
2.3.3 Regression analysis
A regression model is an analytical tool which estimates the relationship between a
dependent variable, and one or more explanatory variables. The explanatory
variables, or regressors, have a statistical or causal relationship to the dependent
variable. A regression model is quantitative and can be constructed in different ways,
depending on the purpose of the model and the data available (Welc and Esquerdo,
2018).
Regression models can be classified depending on a number of criteria, a few of them
are (Welc and Esquerdo, 2018):
• The number of equations used: single-equation models use one equation
to explain the relationship between the dependent variable and the
explanatory variables, while multi-equation models have more than one
dependent variable and use more than one equation.
• The number of explanatory variables: univariate models have only one
explanatory variable, while multivariate models have at least two.
• The functional form of the model: there are linear models, with a linear
relationship between the variables, and nonlinear models.
• The type of dependent variable: in some models the dependent variable
is continuous, while others use a dependent variable expressed in a
binary way.
Regression models are mainly used for the following three purposes (Welc and
Esquerdo, 2018):
• To examine the relationship, or lack thereof, between variables.
• Forecasting: based on a model which explains the behavior of historical
data, future events can be predicted.
• Scenario analyses: for example, how an explanatory variable must change
in order for the dependent variable to reach a certain value.
2.3.3.1 Linear regression
Linear regression is the simplest of the regression models and uses a linear
combination of the predictors as its function (Su et al., 2012). It has become a building
block for numerous modeling tools and is popular for the easily interpretable
parameters of the model. Its ability to provide satisfactory results even when the
sample size is small, or when the relationship between the predictors and the
dependent variable is relatively vague, has also increased its popularity (Su et al.,
2012).
There are many methods to estimate the parameters of a linear regression model; the
most common is the least squares method, which minimizes the distance from the
predicted values to the actual ones.
In linear regression models, a few assumptions are made about the data: the
predictors, the dependent variable and the relationship between them. These are a
few of the assumptions that are made (Pandis, 2016):
• That the observations of the dependent variable are independent of each
other.
• That for each value of the predictor, the dependent variable follows a
normal distribution.
• That the variability of the dependent variable stays the same for each
value the independent variable takes.
However, even if the differences between the estimated values and the observed ones
are not normally distributed, linear regression can still produce valid results when the
sample size is large. On the other hand, a linear regression model with these values
normally distributed is not necessarily valid (Schmidt and Finan, 2018).
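A least squares fit as described above can be sketched in a few lines of Python; the small dataset is invented for the example:

```python
import numpy as np

# Small illustrative dataset: one predictor x and a dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares: add an intercept column and solve min ||Xb - y||^2
X = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
print(intercept, slope)  # 0.14, 1.96 for this data
```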
2.3.3.2 Ridge regression
Also known as Tikhonov regularization, Ridge regression was developed to solve
some of the issues with the least squares method in linear regression models (Silva et
al., 2018). Ridge regression, together with Lasso among others, is a so-called shrinkage
method, which selects which variables to keep in the finished model. This is also the
idea of another group of methods, called subset selection methods, but there is an
important difference between the two groups. Subset selection methods have a
discrete process where variables are either kept or discarded, which often produces
models with high variance. Shrinkage methods, however, are continuous in their
variable selection process and are therefore not as prone to high variability (Hastie et
al., 2009).
One of these issues with the least squares method occurs when the predictors have a
high internal dependence and the regression coefficients have large standard errors.
This often results in poorly performing estimators, which Ridge regression tries to
correct by implementing a penalty on the size of the coefficients (Silva et al., 2018).
A similar function is also used in neural networks, where it is called weight decay.
Another flaw of linear regression is that an unusually large coefficient on a variable
can be canceled out by a negative coefficient of similar size on a correlated variable;
Ridge regression deals with this by imposing size constraints on the coefficients
(Hastie et al., 2009).
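The size constraint on the coefficients can be illustrated with the closed-form ridge solution, b = (XᵀX + αI)⁻¹Xᵀy; the data below is synthetic, and the intercept is ignored to keep the sketch minimal:

```python
import numpy as np

def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution: b = (X'X + alpha*I)^-1 X'y.
    The penalty alpha shrinks the coefficients toward zero;
    alpha = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

b_ols = ridge_coefs(X, y, alpha=0.0)      # ordinary least squares
b_ridge = ridge_coefs(X, y, alpha=100.0)  # heavily penalized
# The penalized coefficient vector is smaller in magnitude than the OLS one
```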
2.3.3.3 Lasso
The Least Absolute Shrinkage and Selection Operator (Lasso) is a linear regression
model that effectively reduces the number of variables that the solution depends on
(Linear Models 2020). Standard regression models tend to include too many variables,
which leads to overfitting as well as overestimating how well the included variables
explain the observed variability, so-called "optimism bias" (Ranstam and Cook,
2018).
Lasso offers a solution to some of these problems with standard regression models,
since it tries to identify the variables and regression coefficients that lead to the least
prediction error. One of the drawbacks of the Lasso model is that it does not focus on
the contribution of individual variables; the focus is instead on the best combined
prediction (Ranstam and Cook, 2018).
The Lasso model has been used for prediction with good results. One example is a
study by Wang et al. (2018) in which they predicted the fuel consumption of ships at
sea. The results of the Lasso model were compared with three other methods
(artificial neural networks, support vector regression and Gaussian process
regression), and the comparison showed that the Lasso model was the best
performing of the four.
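How Lasso drives some coefficients exactly to zero can be illustrated with the soft-thresholding operator used inside common coordinate-descent Lasso solvers; this is only the one-dimensional update step, not a complete solver:

```python
def soft_threshold(z, penalty):
    """Soft-thresholding: shrink z toward zero by `penalty`, and set it
    exactly to zero when |z| <= penalty. This is why Lasso performs
    variable selection, unlike Ridge, which only shrinks."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0

print(soft_threshold(3.0, 1.0))   # large coefficient shrunk: 2.0
print(soft_threshold(-0.4, 1.0))  # small coefficient removed: 0.0
```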
2.3.3.4 Elastic net
Built in a similar way to Lasso, the Elastic net regression method simultaneously
performs automatic variable selection and continuous shrinkage. It was developed to
overcome some of the shortcomings of Lasso, one of which is that when a group of
variables with very high pairwise correlations exists, Lasso selects one variable from
the group without regard for which one it selects. Elastic net, on the other hand, has a
grouping effect, where variables with strong correlations are either kept in the model
or removed together as a group (Zou and Hastie, 2005).
Elastic net has been shown to be very useful when the number of variables is bigger
than the number of observations in the data (Hao and Lu, 2018).
Zou and Hastie (2005) show in their research, using real-world data, that the Elastic
net method often achieves better results than both Lasso and Ridge regression.
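The grouping effect can be sketched with scikit-learn's `ElasticNet` on synthetic data with two perfectly correlated predictors (the penalty settings are arbitrary example values):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([x, x])  # two perfectly correlated predictors
y = 3.0 * x + rng.normal(scale=0.1, size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
# The L2 part of the penalty splits the weight roughly evenly between
# the correlated columns instead of arbitrarily picking one of them
print(enet.coef_)
```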
Choosing the models
As shown above, there are several different types of predictive models available,
depending on the data and how the prediction problem is structured.
As the problem in this case is the uncertainty in deliveries from suppliers, the
prediction should provide an answer to when a delivery will take place. Regression
can provide a continuous value, i.e. the results are not classified into different
categories. This suits the data well, as deliveries can have very different expected
delivery times, and classification would create categories with large spans.
Time series analysis will not be used, due to the fact that it looks at defined time
periods, whereas deliveries have varying lead times. Time series could be used to
forecast what would be delivered during a defined time period, but lack the ability to
predict specific deliveries or the performance of suppliers or parts.
When it comes to prediction, linear regression models can sometimes perform better
than more advanced nonlinear models, in particular when the dataset is relatively
small (Hastie et al., 2009). Since the results are wanted as a continuous value and not
classified, simple linear regression is a good starting point.
Wang et al. (2018) showed in their study that Lasso outperformed, among others, an
artificial neural network when predicting fuel consumption. The number of variables
in their study is almost the same as in this study, and they too were predicting a
continuous value. Considering this, Lasso is a reasonable choice of model to try in
this study.
Scikit-learn is an open-source machine learning library which offers numerous
algorithms as well as several guides and thorough documentation for all its available
functions. Since finding the correct model is one of the hardest parts of solving a
machine learning problem, scikit-learn offers a guide for that as well. Depending on
the problem that the model will be used to solve, and what types of data are
available, some machine learning models are better suited than others (Choosing the
right estimator 2020). The delivery time in this study is to be calculated as a
continuous variable, and the number of samples is not very large. Under these
conditions, Lasso, Ridge regression and Elastic net are among the machine learning
models that scikit-learn recommends for this type of study. This, together with the
fact that Ridge regression was developed as an improvement on ordinary linear
regression, and that Elastic net was developed to deal with some of the shortcomings
of Lasso, makes the comparison between them all interesting.
The predictive models chosen for the study are therefore:
• Linear regression
• Ridge regression
• Lasso
• Elastic net
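A minimal sketch of how the four chosen scikit-learn models could be compared on the same data with cross-validated mean absolute error; the dataset below is a synthetic stand-in, not the study's delivery data, and the penalty parameters are example values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the delivery data: features -> delivery time
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([4.0, 0.0, 2.0, 0.0, 1.0]) + 10 + rng.normal(size=200)

models = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    # neg_mean_absolute_error is negated so that higher is better;
    # flip the sign back to report a plain MAE
    scores = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {scores.mean():.2f} (+/- {scores.std():.2f})")
```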
2.4 Evaluation methods
This chapter presents some of the most common methods used for evaluating
predictive models.
2.4.1 MAPE
Mean Absolute Percentage Error (MAPE) is a measure for assessing the quality and
accuracy of a prediction made with a forecasting method. It is presented as a
percentage and is commonly used in practice thanks to the intuitive interpretation of
relative error that it offers (de Myttenaere et al., 2016).
Despite being a popular and often used assessment method, MAPE has a few flaws. It
cannot handle actual values of zero, as that would require division by zero. The most
concerning problem, however, is that equal absolute errors result in different MAPE
scores, depending on whether the forecasted value is above or below the actual value
(Makridakis, 1993). This means that MAPE rewards under-forecasting, which creates
misleading results.
(2.1)
Table 1 The variables of MAPE
t Observation index 𝐴𝑡 Actual value
n Number of observations 𝐹𝑡 Forecasted value
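Equation 2.1 can be sketched in Python as follows (an illustrative NumPy sketch, not code from the study). The example values demonstrate the asymmetry described above: two forecasts that are both off by 50 units receive different scores depending on whether the forecasted value lies above or below the actual value.

```python
import numpy as np

def mape(actual, forecast):
    # Mean Absolute Percentage Error per equation 2.1;
    # undefined when any actual value is zero.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast) / actual) * 100

print(mape([100], [150]))  # forecast above the actual value -> 50.0
print(mape([150], [100]))  # same absolute error below it    -> ~33.3
```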
2.4.2 sMAPE
To overcome some of the shortcomings of MAPE, the method was extended into the
symmetric Mean Absolute Percentage Error (sMAPE). In addition to being symmetric,
it can also handle zero values (Flores, 1986). Just like MAPE, it expresses the forecast
error as a percentage of the actual value.
\mathrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}    (2.2)
Table 2 The variables of sMAPE
t Observation index 𝐴𝑡 Actual value
n Number of observations 𝐹𝑡 Forecasted value
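The sMAPE of equation 2.2 can be sketched in the same way (illustrative only, not code from the study). Swapping the actual and forecasted values now yields the same score, and a single zero value no longer breaks the calculation:

```python
import numpy as np

def smape(actual, forecast):
    # symmetric MAPE per equation 2.2; the denominator (|A| + |F|) / 2
    # stays positive as long as A and F are not both zero.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return np.mean(np.abs(forecast - actual) / denom) * 100

print(smape([100], [150]))  # -> 40.0
print(smape([150], [100]))  # -> 40.0, symmetric
print(smape([0], [10]))     # -> 200.0, a zero actual value is handled
```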
2.4.3 MAE
Mean absolute error (MAE) is a measure of errors between observations, which can be
used to evaluate how well predictive models perform. MAE is calculated by taking
the average of the absolute error between the predicted and the actual value (Metrics
and scoring 2020).
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|    (2.3)
Table 3 The variables of MAE
i Observation index ŷ_i Predicted value
n Number of observations 𝑦𝑖 Actual value
Next to MAE, the very similar root mean square error (RMSE) is also widely used to
evaluate similar models as MAE, and there is no consensus on which of the two is
most appropriate to use (Chai and Draxler, 2014). One major difference between the
two is that all errors are given the same weight when using MAE, while RMSE on the
other hand gives errors with larger absolute values a higher weight. This has the effect
that the RMSE is more sensitive to outliers in the data compared to MAE (Mendo,
2009).
Chai and Draxler (2014) argue that RMSE is the better choice when the errors are
expected to be normally distributed, while MAE should be used when the errors are
expected to be uniformly distributed.
Another distinct difference between RMSE and MAE is that MAE uses absolute
values, which is a disadvantage in many mathematical calculations (Chai and
Draxler, 2014).
It must also be taken into consideration that MAE results are on the same scale as the
data being evaluated before choosing the appropriate evaluation method.
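The difference in outlier sensitivity can be made concrete with a small sketch (illustrative values only): two prediction vectors with the same total error receive identical MAE scores, while RMSE penalizes the one containing an outlier.

```python
import numpy as np

def mae(y_true, y_pred):
    # equal weight to every error, per equation 2.3
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def rmse(y_true, y_pred):
    # squaring gives larger errors a higher weight
    return np.sqrt(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))

y_true = [0, 0, 0, 0]
even = [5, 5, 5, 5]      # four equal errors of 5
outlier = [1, 1, 1, 17]  # same total error of 20, concentrated in one outlier

print(mae(y_true, even), mae(y_true, outlier))    # 5.0 5.0   - MAE identical
print(rmse(y_true, even), rmse(y_true, outlier))  # 5.0 ~8.54 - RMSE weights the outlier up
```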
2.4.4 Cross-validation
The validity of predictive models depends on their ability to generalize and accurately
predict properties of new and unseen data that is independent from the data used to
train the model (Varoquaux, 2018). This can be done with cross-validation, which
splits the available data into separate training and testing sets. Cross-validation
methods often use multiple rounds to evaluate the results. One of these methods is the
k-fold cross-validation where the data is divided into k subsets of equal size, and the
model performs k training rounds, or folds. In each fold, one of the subsets is used as
the test set and the remaining k − 1 subsets as the training set (Liu et al.,
2020). Each fold results in a validation error, and the k-fold cross-validation estimate is
the average validation error for all the k folds.
Many prediction models use hyperparameters, such as kernel or regularization
parameters. These parameters are not calculated within the model, so they need to be
set before the model can be used. Hyperparameters serve different purposes in a
model; the regularization parameter, for instance, decides the amount of shrinkage.
The value of these hyperparameters can sometimes greatly
affect the results, so finding the optimal value can be decisive for the performance of
the model. Cross-validation can be used both to find the optimal hyperparameter for
the model chosen, or simply to evaluate different models compared to each other (Liu
et al., 2020).
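The k-fold procedure described above can be sketched with scikit-learn (synthetic data standing in for a real dataset; this is an illustration, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, test_idx in kfold.split(X):
    # each subset serves as the test set exactly once
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

cv_estimate = np.mean(fold_errors)  # the k-fold estimate: the average validation error
```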
2.4.5 Choosing the models
All the models chosen for prediction are regression models, in which there are several
predictors that are used together to try to predict the dependent variable, or target
value. One common feature of most machine learning techniques is that the model is
first trained on a sub-set of the available data, and then tested on the rest.
In this study, a 5-fold cross-validation will be used in order to make sure that the
results are not affected by randomness as each sub-part of the dataset will be used
four times as a training set, and once as the test set.
For the cross-validation, it has to be decided in which format the results are to be
returned. In this study the results will be in the form of MAE, the mean absolute error.
The target value of the study is the actual lead time of each part and is measured in
days. This allows MAE to be easily understood and the fact that it is scale dependent
is not an issue. MAE is also better at handling outliers in the data, which exist in this
study (Mendo, 2009).
Scikit-learn also offers a built-in MAE scorer in its cross-validation function, unlike
MAPE and sMAPE.
2.5 Previous studies
Khakdaman et al. (2015) build a robust optimization model to manage production
planning in an MTS/MTO environment. The uncertainties consist of delivery, process
and demand. The delivery part of the uncertainty is only about cost and currency
rates, which means that they overlook the risk of delayed or incomplete deliveries.
They conclude that although their focus is mostly on cost-related uncertainties, they
feel confident that their model delivered accurate estimates and useful findings.
Gao et al. (2017) use a Markov model for signal-based dynamic supply forecasting, in
which suppliers are examined to find signals (financial health etc.) in order to guide
the procurement and selling decisions of the company. In the model, the capacity of
the suppliers is assumed to be stochastic. Their research concludes that a traditional
stationary forecast for supply capacity uncertainty may lead to poor decisions and
severe losses.
In (Peidro et al., 2010), a fuzzy linear programming model is used to deal with supply
chain planning under demand, process and delivery uncertainty. The uncertainties are
jointly considered, and the aim of the model is to use the available resources in order
to meet customer demands at a minimum cost. The model is tested with real-world
data and the results show that the model is more effective than deterministic methods
for handling situations without certain and precise information in supply chain
planning. The researchers see the integration of analytical and simulation models as a
possible improvement, where the best capacities of both types of models are
integrated to be used for supply chain planning problems.
Mirzapour Al-e-hashem et al. (2011) build a robust multi-objective mixed integer
nonlinear programming model to deal with multi-product aggregate production
planning (APP) with uncertainty. The supply chain includes multiple suppliers,
multiple customers, is multi-period and multi-site. The uncertainties include the
demand fluctuations, as well as the cost parameters of the supply chain.
The demand side of supply chain management is extensively researched, and one
phenomenon of demand uncertainty is the bullwhip effect. This effect is one of
the main sources of inefficiencies in supply chains, and by extension, in production
planning. The effect is described as a growing difficulty to forecast demand the
further from the end customer, or higher up the supply chain an organization is (Braz
et al., 2018). Different ways to mitigate this effect are recommended, one way is to
implement a closed-loop supply chain, which has shown to be less affected than
traditional supply chains (Braz et al., 2018). Another way, introduced by Jaggi et al.
(2018) is the iterative proportional allocation algorithm which discourages the
bullwhip effect.
In the cases where these models deal with the uncertainty in supply, this uncertainty
is mixed and calculated together with other uncertainties, or it is viewed as a
stochastic uncertainty. The models do not focus on the supply uncertainty alone. This
means that the models produce results that are generalized and not consistent with
the complexity of real-world cases since they attribute all suppliers and items with the
same uncertainty. The research gap found here is to move away from the
generalization and instead use predictive modeling to form a data driven model
which can be incorporated in a bigger model that deals with production planning
under uncertainty in general.
Khakdaman et al. (2015, p. 1384) express this need to consider delivery uncertainty at
the end of their conclusion: “Finally, incorporating non-financial uncertainties such
as production lead time or raw material supply lead time into the model would
enhance its usefulness to both academics and practitioners.”
Predicting deliveries from suppliers could be used either in models as explained
above, or independently as a way to handle uncertainties in the supply chain, i.e. as a
form of supply chain risk management.
This study aims to find the predictive model best suited for these purposes.
3 Method
The methods and approaches to answer the research question are presented in the
following chapter, along with some information regarding the reliability, validity and
research ethics and how these are considered in this study.
The subchapters are presented in the order in which the corresponding steps were
carried out.
The study consists of four main parts: data collection, data preprocessing, prediction
and evaluation. The first part, the data collection, was carried out directly at a
manufacturing company, in order to make sure that the results of the study are
applicable to real-world cases. The data then had to be preprocessed before it could be
used in the chosen predictive models. After the preprocessing was complete, the
predictive models were constructed and applied to the data. Lastly, the results from
the different predictive models were gathered, summarized and presented, in order to
see if any conclusions could be drawn.
All parts are described in more detail in the following chapter.
Figure 2: The structure of the study
3.1 Data collection
All the data was gathered from the company's ERP system, which tracks all orders,
warehouse information etc. throughout the different company facilities. The system
also contains information about the supplier for each part, along with the expected
lead time. The lead time is the time from the placement of the order to when the order
is fulfilled. The system also tracks the actual lead time of each delivery, and that is the
important information. The actual lead time is what is being used as the target value
in this study, the value that the models will try to predict.
This means that there is historical data in the company's ERP system regarding when
each order is placed, the lead time of each part, the supplier, and when the order was
received.
The data gathered is all internal transactions during 2018 and 2019, which in total was
375 000 lines. This was then filtered to only include the deliveries from the main
producing factory in Sweden that are sent to the other company facilities around the
world.
To make sure that the dataset would be large enough, and that the results would hold
over multiple cases, the 10 items with the most deliveries during these two years were
selected to be used in the models. This means that the predictive models were applied
to 10 different items to see if any clear conclusions could be drawn.
The data was exported from the ERP system to an Excel file, in which the data
regarding the 10 most frequently delivered items was picked out, and then converted
into a comma-separated file (.csv) in order to facilitate the data handling in Python.
3.2 Data preprocessing
To be able to use the data in a prediction model, it first needs to be preprocessed and
structured in a correct way. To successfully use algorithms to draw conclusions from
large sets of data, one key point is the quality of the data. The data needs to be
complete, up to date, and without errors, which collected data rarely is (Davidson and
Tayi, 2009).
Modern databases, many of huge sizes, often include noisy and missing data that
originate from multiple different sources (Jiawei Han et al., 2012). Data quality can be
described as the accuracy, completeness, consistency, timeliness, believability and
interpretability of the data (Jiawei Han et al., 2012). To help improve the data quality
and substantially improve the quality of the results, there are a few techniques of data
preprocessing that can be used.
Completeness is one key part of data quality, and one that often needs to be properly
addressed since missing values are a common issue with large datasets. A missing
value occurs when an observation, or tuple, does not have a value for each of the
attributes. The reason a tuple lacks values can be a technical issue, but it can also be
that the system gathering the data is constructed in a way that creates missing values.
A few different techniques to deal with this problem exist: simply removing the tuple,
using a global constant to fill in the missing value, or filling in the mean or median
value for that particular attribute.
All these techniques come with drawbacks, as ignoring the tuple decreases the size of
the data and possibly affects the quality of the results. Using a constant for all missing
values, or using the mean or median value, introduces a bias in the data and can
therefore also reduce the quality of the results (Jiawei Han et al., 2012).
In this study there was one attribute for which many of the tuples were missing data,
due to how the system is constructed. These missing values actually represent 0 and
were thus replaced with 0.
Binning is a data preprocessing technique to smooth data and can be used to reduce
the number of distinct values that an attribute can take (Jiawei Han et al., 2012). In this
study, all the orders gathered from the ERP system include information about when
the order was placed, when it was scheduled for delivery, and other scheduling
information. All of these were in a year-month-day format, which creates many
different possible combinations over a two-year period. Here, binning was used to
reduce the number of possible values, and to better visualize any monthly effects. All
dates were formatted as just the month.
One of the attributes in the dataset had all its values formatted as text, so these were
transformed into numbers instead to ease the handling and loading of the data.
Lastly, the data was cleared of any sensitive information (costs, who placed the
order etc.) as well as a few redundant attributes. Attributes containing information
only available once the delivery was completed were also removed, since a useful
predictive model cannot rely on information that does not exist before the actual
delivery takes place. After preprocessing, the data contained 15 attributes in addition
to the target value (actual delivery time).
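The preprocessing steps described above (zero-filling, month binning, encoding text as numbers, and dropping post-delivery attributes) might be sketched in pandas roughly as follows. The column names and values are hypothetical, since the study does not list the actual attributes:

```python
import pandas as pd

# hypothetical columns standing in for the ERP export
df = pd.DataFrame({
    "order_date": ["2018-01-15", "2018-02-03", "2019-11-20"],
    "supplier": ["A", "B", "A"],
    "quantity": [10, None, 7],              # missing values that are in fact 0
    "invoice_total": [120.0, 80.0, 95.0],   # only known after delivery
})

df["quantity"] = df["quantity"].fillna(0)                      # missing values replaced with 0
df["order_month"] = pd.to_datetime(df["order_date"]).dt.month  # bin dates to month
df["supplier"] = df["supplier"].astype("category").cat.codes   # text -> numeric codes
df = df.drop(columns=["order_date", "invoice_total"])          # raw dates and post-delivery info
```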
3.3 Prediction
This chapter is divided into one part for each of the chosen prediction models,
describing which parameters must be chosen and how the prediction is performed.
Python has become one of the most popular languages due to its interactive nature
and the growing ecosystem of scientific libraries (Pedregosa et al., 2011).
Scikit-learn is a Python module which integrates several machine learning algorithms
for supervised and unsupervised learning. The scikit-learn project was created in
order to provide an open source machine learning library that would be well
accessible to non-machine learning experts and usable in various scientific areas
(Buitinck et al., 2013).
The library is a collection of functions and classes that the user imports into a
Python environment.
Scikit-learn is, as previously mentioned, a Python library and needs an environment
to run in. Anaconda, a package and environment manager, has been chosen to be
used together with the scikit-learn library. Anaconda has thousands of open-source
packages available for installation and offers a simple user interface (Anaconda
Distribution 2020).
3.3.1 Linear regression
Scikit-learn uses the most common method for parameter estimation, ordinary least
squares, shown in equation 3.1, which minimizes the sum of squared differences
between the observed values and the predicted ones (Linear Regression 2020).
\min_{w} \lVert Xw - y \rVert_2^2    (3.1)
Table 4 The variables of ordinary least squares
w Coefficients y Target values
X Training data
The data is divided into arrays: X for the input data and y for the target values. The
coefficients, or weights, are calculated and the model is then fitted and tested.
The results are then calculated with cross-validation and presented in the
form of MAE.
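This procedure, fitting ordinary least squares and scoring it with 5-fold cross-validated MAE, corresponds roughly to the following sketch (synthetic data shaped like one item, about 160 tuples and 15 attributes, in place of the delivery dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

# scoring="neg_mean_absolute_error" returns negated MAE, so the sign is flipped back
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
mae_cv, sd_cv = -scores.mean(), scores.std()  # reported as MAE with its standard deviation
```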
3.3.2 Ridge regression
Ridge regression uses a similar method for parameter estimation as the ordinary least
squares method does but adds a penalty on the size of the coefficients. Then the model
minimizes the sum of squares in accordance with equation 3.2 (Linear Models 2020):
\min_{w} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2    (3.2)
Table 5 The variables of Ridge regression
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
The complexity parameter alpha in the equation above decides the amount of
shrinkage in the model; a larger value of alpha results in greater shrinkage. Since the
best suitable value of alpha can differ a lot depending on the data, Scikit-learn offers a
method to try different values. This is done with RidgeCV which is a cross validation
method where the model uses different alpha values to see which one best fits the
model (RidgeCV 2020).
Figure 3: Example of how the MAE changes depending on the value of alpha
RidgeCV performs a 5-fold cross-validation for each of the alpha values specified to
be tested. In this study, the following alpha values are included:
Table 6 The values tested as alpha in Ridge regression
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
After the best suited alpha value is calculated, the results are calculated with cross
validation and presented in the form of MAE together with the standard deviation of
the results.
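The alpha search with RidgeCV, followed by scoring the tuned model, might look roughly like this (a sketch on synthetic data, using the alpha grid from Table 6):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]
best_alpha = RidgeCV(alphas=alphas, cv=5).fit(X, y).alpha_  # 5-fold CV over the grid

# score the tuned model with 5-fold cross-validated MAE
scores = cross_val_score(Ridge(alpha=best_alpha), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
```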
3.3.3 Lasso
Lasso estimates sparse coefficients and tends to prefer solutions with fewer non-zero
coefficients, which reduces the number of variables the solution depends on. Lasso
tries to minimize equation 3.3 (Linear Models 2020):
\min_{w} \frac{1}{2 n_{\mathrm{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1    (3.3)
Table 7 The variables of Lasso
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
The alpha in the equation decides the degree of sparsity of the coefficients, and just
like Ridge regression, the best suited alpha for the model can be decided by cross
validation. Scikit-learn offers two different methods for deciding the value of alpha,
the LassoCV and the LassoLarsCV function. LassoCV is however most often preferred
and will be used in this study (Lasso 2020).
Figure 4: Example of how the MAE changes depending on the value of alpha
The following values of alpha were tested in the model, each time with LassoCV
performing a 5-fold cross-validation to test how the different alpha values affect the
results:
Table 8 The values tested as alpha in Lasso
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
The best suited parameter is then set and used in a 5-fold cross validation to get the
results, which are presented as MAE together with the standard deviation of the
results.
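Analogously to RidgeCV, the LassoCV search might be sketched as follows (synthetic data, alpha grid from Table 8):

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]
best_alpha = LassoCV(alphas=alphas, cv=5).fit(X, y).alpha_  # 5-fold CV over the grid

# the tuned Lasso is then scored with 5-fold cross-validated MAE
scores = cross_val_score(Lasso(alpha=best_alpha), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
```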
3.3.4 Elastic net
Elastic net uses both L1 and L2-norm regularization on the coefficients, which gives
the model some of the properties of both Lasso and Ridge regression. Elastic net tries
to minimize equation 3.4 (Linear Models 2020):
\min_{w} \frac{1}{2 n_{\mathrm{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2    (3.4)
Table 9 The variables of Elastic net
w Coefficients y Target values
X Training data 𝛼 Regularization parameter
𝜌 Ratio between L1 and L2 penalty
As seen above, the equation for Elastic net contains two parameters, alpha and 𝜌 (L1
ratio). To find the parameters most suited for the model, the ElasticNetCV function
can be used (ElasticNetCV 2020). Another possibility is to use the function
GridSearchCV which performs an exhaustive search from a grid of specified
parameters to see which combination provides the best result (GridSearchCV 2020). In
this study, the alpha values tested are the following:
Table 10 The values tested as alpha in Elastic net
𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150
The second parameter used in Elastic net is the L1 ratio, which decides the
combination of L1 and L2 penalty. This value must be between 0 and 1, with 0 being
an L2 penalty (Ridge) and 1 being an L1 penalty (Lasso). All values between 0 and 1
result in a combined L1 and L2 penalty. It is recommended to try more values of the
L1 ratio closer to 1 than 0 (ElasticNetCV 2020). The L1 ratio numbers tested by the
GridSearchCV in this study are:
Table 11 The values tested as 𝜌 in Elastic net
𝜌 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1
After the parameters are set, a 5-fold cross validation is used to get the results of the
model, which are presented together with the standard deviation.
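The joint search over alpha and the L1 ratio with GridSearchCV might be sketched as follows. The grid here is reduced for illustration; the study tested more alpha values, and also l1_ratio = 0, which is omitted in this sketch to keep the coordinate-descent solver well-behaved:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(160, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=2.0, size=160)

param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1, 10, 100],
    "l1_ratio": [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
}
# exhaustive search over all alpha/l1_ratio combinations, scored by 5-fold MAE
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
best = search.best_params_  # the combination that yielded the lowest MAE
```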
3.4 Evaluation
After calculating the optimal parameters for the prediction models, these parameters
were inserted in the models and the results were calculated by using a 5-fold cross
validation. During the cross validation the dataset is divided into 5 smaller sets of
equal size, of which 4 will be used for training and the 5th set for testing. This is
repeated 5 times until all of the smaller sets have been used as both training and
testing sets (Cross-validation 2020). The function for the actual results of the cross
validation in Scikit-learn is the cross_val_score function, which returns the results of
each of the folds in the cross validation. The cross_val_score function offers a few
different ways of presenting the results, one of which is the MAE that was used in this
study. The result returned by the cross_val_score function for each fold is the average
error of the predictions for the tuples in the test set. This means that each fold of the
cross-validation returns only one value, which was collected and is presented in the results
chapter.
3.5 Reliability & Validity
This study is a quantitative study, which comes with the benefit that the
interpretations and findings are based on quantitative and measured data rather than
impressions. But for the analysis to have any value, the data has to be valid
(Denscombe, 2014). Good validity and reliability are two key principles of ensuring
that the research has good credibility and is of high quality.
3.5.1 Reliability
Reliability is a measure of the consistency of the study: a research instrument
should reach approximately the same result every time the test is completed (Heale
and Twycross, 2015). This means that the people behind the research should have no
impact on the outcome of the study, and that, everything else being equal, it should
produce the same results on different occasions (Denscombe, 2014).
The use of methods that are firmly grounded in previous research both for the
analysis of the data as well as the examination of the results indicates that the study is
of high reliability.
3.5.2 Validity
The validity refers to the accuracy of the data, as well as to whether the data is
appropriate for answering the research question of the study (Denscombe, 2014). The
data in this study is taken directly from the ERP system of an actual company, which
makes it appropriate for answering the research question. The data is also of large
quantity, spans a two-year period and was checked for errors before being used.
This means that the data in the study is of high validity.
3.6 Research ethics
To conduct research in an ethical way, there are six main principles that need to be
considered. These are (Our core principles 2020):
• The research should maximize the benefit for individuals and society, as
well as minimize the risk and harm.
• The research respects the rights and dignity of individuals and groups.
• Participation in the research should be voluntary and appropriately
informed.
• The research is to be conducted with integrity and transparency.
• That the lines of responsibility and accountability are clearly defined.
• Research should be independent, and that conflicts of interest are explicit
in the cases where they cannot be avoided.
There is no reason to believe that the study violates any of these principles.
Following the principles is made easier since no information about individuals is
gathered; the only data used is retrieved from the company's ERP system and is
presented without any sensitive information.
4 Results
As data regarding 10 items was chosen, 10 individual predictions were made with
the four different models. The results are presented below in Table 8 and Figures 6
and 7.
The number of the item is in the header, along with the number of rows (tuples) and
attributes. The results of the predictions are presented as the mean absolute error
(MAE) together with the standard deviation (SD). Ridge regression, Lasso and Elastic
net all use one or two parameters which must be set in the model. The values shown
are the parameters that yielded the best results; blank fields mean that the model does
not use the parameter.
Table 8 The results of the prediction models
Item 1 (160 x 15)
MAE SD Alpha L1 ratio
Linear regression 34.923 14.029
Ridge regression 29.794 5.933 15
Lasso 29.78 7.286 5
Elastic net 29.78 7.286 5 1
Item 2 (159 x 15)
MAE SD Alpha L1 ratio
Linear regression 15.785 4.082
Ridge regression 15.785 4.082 0.001
Lasso 15.785 4.08 0.001
Elastic net 15.785 4.08 0.001 1
Item 3 (188 x 15)
MAE SD Alpha L1 ratio
Linear regression 26.631 9.759
Ridge regression 26.603 9.965 50
Lasso 26.091 10.491 10
Elastic net 26.091 10.491 10 1
Item 4 (374 x 15)
MAE SD Alpha L1 ratio
Linear regression 32.974 8.014
Ridge regression 32.783 8.38 10
Lasso 32.891 8.374 1
Elastic net 32.783 8.4 0.05 0.3
Item 5 (260 x 15)
MAE SD Alpha L1 ratio
Linear regression 39.594 15.488
Ridge regression 37.398 10.768 75
Lasso 39.05 11.389 15
Elastic net 37.443 10.924 0.3 0
Item 6 (165 x 15)
MAE SD Alpha L1 ratio
Linear regression 37.619 12.131
Ridge regression 37.567 12.095 1
Lasso 37.375 7.826 50
Elastic net 37.375 7.826 50 1
Item 7 (163 x 15)
MAE SD Alpha L1 ratio
Linear regression 27.046 8.319
Ridge regression 26.678 8.153 150
Lasso 26.319 8.441 5
Elastic net 26.319 8.441 5 1
Item 8 (155 x 15)
MAE SD Alpha L1 ratio
Linear regression 26.719 5.191
Ridge regression 25.778 4.564 150
Lasso 26.411 5.135 30
Elastic net 25.82 4.731 3 0.4
Item 9 (152 x 15)
MAE SD Alpha L1 ratio
Linear regression 12.645 3.756
Ridge regression 12.521 3.927 150
Lasso 12.164 4.319 5
Elastic net 12.164 4.319 5 1
Item 10 (178 x 15)
MAE SD Alpha L1 ratio
Linear regression 25.182 8.999
Ridge regression 24.493 8.646 150
Lasso 23.809 8.74 15
Elastic net 23.803 8.772 15 0.95
Below is a presentation of the average values of the mean absolute error as well as the
average standard deviation of the models.
Figure 6: The average MAE of the results
Figure 7: The average SD of the results
5 Discussion
The linear regression model with the ordinary least squares method performs worst in
all cases, as can be seen in table 8, which could be expected as this model is unable to
perform any regularization. That the regularization is what makes the other models
perform better is clear, as the results of Item 2 show that, just as theory predicts
(compare equations 3.1 and 3.2, for example), the regularized models perform in line
with linear regression when their regularization parameter, alpha, moves towards
zero. The lowest value of alpha that the models could use was 0.001, but they might
have chosen 0 had it been among the possible values.
Ridge regression was developed to deal with some of the shortcomings of the
ordinary least squares method, and the results clearly show that it does so. Even
though the results are close to each other, Ridge regression performs better than, or as
well as, linear regression in all cases.
Lasso was also developed as a continuation and improvement of the linear regression
and is highlighted for its ability to reduce the number of variables that the solution is
dependent on. This is brought up as a solution to the problem with standard
regression models which often include too many variables, leading to overfitting
(Ranstam and Cook, 2018). It is difficult to draw any conclusions when comparing the
results of Lasso and Ridge regression: even though the number of variables is rather
small (15), Lasso performs better than Ridge in most cases, but not all (see table 8). It
is therefore hard to say whether Lasso performs better due to its tendency to prefer
fewer variables, since there are not that many to begin with, and Lasso does not
perform better in all cases.
Elastic net was developed to improve on Lasso, and the results show that it does not
just perform better than Lasso; it performs best overall, as can clearly be seen in figure
6. Thanks to its parameter controlling the ratio between the L1 and L2 penalties, it can
behave more like Ridge regression or more like Lasso, depending on which suits the
data best. As the L1 ratio approaches 1 the model performs like Lasso, and as the ratio
moves towards 0 it performs like Ridge regression, just as it theoretically should. This
gives Elastic net the flexibility to adapt to the data at hand and to always perform
better than, or as well as, the other models.
Hao and Lu (2018) point out that the Elastic net model performs especially well
when the number of variables exceeds the number of tuples, but as seen in the
results it performs as well as, or better than, Ridge regression and Lasso even though
the number of variables is far smaller than the number of tuples in all cases.
The performance of Elastic net in situations with fewer variables than tuples is also a
key point in the study of Zou and Hastie (2005). This, together with the fact that no
clear patterns can be seen when comparing the items with more tuples to those with
fewer, makes it interesting to see whether any of the models would stand out as the
number of tuples becomes larger or smaller. At least in theory, Elastic net should
perform even better as the number of tuples shrinks below the number of variables.
Previous research in the area of production planning under uncertainty has resulted
in models which are an improvement over production planning that takes no
consideration of the uncertainties involved. This means that the area is already useful
for practitioners and researchers, but including this data-driven method for
forecasting deliveries could improve the models even further.
6 Conclusion
The best overall model for predicting the delivery times is the Elastic net, being the
best predictor in almost all cases and very close to the best in the rest. It is clear
that it adapts and behaves more like Ridge or more like Lasso depending on which
one works best. Ordinary linear regression performs worst in more or less all cases
and should therefore not be used for this purpose.
This means that Elastic net should be chosen as the predictive model when calculating
the actual delivery times from suppliers. These results can then be used to mitigate
the supply uncertainty in a larger model for production planning, or independently in
supply chain risk management.
6.1 Future research
The sample sizes in this study span from 152 to 374 observations, which means that
these parts were delivered multiple times per month during these two years. For a
producing company, many parts from suppliers will be delivered far less often than
that, which results in a weaker basis for a prediction model. Future research could
examine whether similar results would appear as the number of observations gets
smaller, and how small this number can be while the model still provides a
meaningful result.
As previous studies in the area often simplify the calculation of uncertainty on the
supply side, it would be interesting to see a predictive model, preferably Elastic
net, implemented in a larger model. This, though, requires data from a real-world case.
The results could then be compared with those of the previous models to see whether
there is any improvement.
References
1.1. Linear Models — scikit-learn 0.22.2 documentation [WWW Document], n.d. URL
https://scikit-learn.org/stable/modules/linear_model.html#lasso (accessed
3.19.20).
3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.22.2
documentation [WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/cross_validation.html#cross-validation (accessed
5.1.20).
3.2.4.1.1. sklearn.linear_model.ElasticNetCV — scikit-learn 0.22.2 documentation
[WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#
sklearn.linear_model.ElasticNetCV (accessed 5.1.20).
3.2.4.1.9. sklearn.linear_model.RidgeCV — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#skle
arn.linear_model.RidgeCV (accessed 4.30.20).
3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 0.22.2
documentation [WWW Document], n.d. URL https://scikit-
learn.org/stable/modules/model_evaluation.html#regression-metrics
(accessed 5.4.20).
Anaconda Distribution — Anaconda documentation [WWW Document], n.d. URL
https://docs.anaconda.com/anaconda/ (accessed 3.19.20).
Aouam, T., Geryl, K., Kumar, K., Brahimi, N., 2018. Production planning with order
acceptance and demand uncertainty. Comput. Oper. Res. 91, 145–159.
https://doi.org/10.1016/j.cor.2017.11.013
Beemsterboer, B., Land, M., Teunter, R., 2016. Hybrid MTO-MTS production planning:
An explorative study. Eur. J. Oper. Res. 248, 453–461.
https://doi.org/10.1016/j.ejor.2015.07.037
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 2013. Introduction, in: Time Series Analysis.
John Wiley & Sons, Ltd, pp. 7–18. https://doi.org/10.1002/9781118619193.ch1
Braz, A.C., De Mello, A.M., de Vasconcelos Gomes, L.A., de Souza Nascimento, P.T.,
2018. The bullwhip effect in closed-loop supply chains: A systematic literature
review. J. Clean. Prod. 202, 376–389.
https://doi.org/10.1016/j.jclepro.2018.08.042
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae,
V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly,
A., Holt, B., Varoquaux, G., 2013. API design for machine learning software:
experiences from the scikit-learn project. ArXiv13090238 Cs.
Bushuev, M.A., Guiffrida, A.L., 2012. Optimal position of supply chain delivery
window: Concepts and general conditions. Int. J. Prod. Econ. 137, 226–234.
https://doi.org/10.1016/j.ijpe.2012.01.039
Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error
(MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model
Dev. 7, 1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Choosing the right estimator — scikit-learn 0.22.2 documentation [WWW Document],
n.d. URL https://scikit-
learn.org/stable/tutorial/machine_learning_map/index.html (accessed 3.19.20).
Davidson, I., Tayi, G., 2009. Data preparation using data quality matrices for
classification mining. Eur. J. Oper. Res. 197, 764–772.
https://doi.org/10.1016/j.ejor.2008.07.019
de Myttenaere, A., Golden, B., Le Grand, B., Rossi, F., 2016. Mean Absolute Percentage
Error for regression models. Neurocomputing 192, 38–48.
https://doi.org/10.1016/j.neucom.2015.12.114
Denscombe, M., 2014. The Good Research Guide : For Small-scale Research Projects,
Open UP Study Skills. McGraw-Hill Education, Maidenhead, Berkshire.
Er, M., Arsad, N., Astuti, H., Kusumawardani, R., Utami, R., 2018. Analysis of
production planning in a global manufacturing company with process
mining. J. Enterp. Inf. Manag. 31, 317–337. https://doi.org/10.1108/JEIM-01-
2017-0003
Flores, B.E., 1986. A pragmatic view of accuracy measurement in forecasting. Omega
14, 93–98. https://doi.org/10.1016/0305-0483(86)90013-7
Gao, L., Yang, N., Zhang, R., Luo, T., 2017. Dynamic Supply Risk Management with
Signal-Based Forecast, Multi-Sourcing, and Discretionary Selling. Prod. Oper.
Manag. 26, 1399–1415. https://doi.org/10.1111/poms.12695
Hao, Y.-X., Lu, D., 2018. Sparse approximation of fitting surface by elastic net.
Comput. Appl. Math. 37, 2784–2794. https://doi.org/10.1007/s40314-017-0475-4
Hastie, T., Tibshirani, R., Friedman, J., 2009. Linear Methods for Regression, in: Hastie,
T., Tibshirani, R., Friedman, J. (Eds.), The Elements of Statistical Learning:
Data Mining, Inference, and Prediction, Springer Series in Statistics. Springer,
New York, NY, pp. 43–99. https://doi.org/10.1007/978-0-387-84858-7_3
Heale, R., Twycross, A., 2015. Validity and reliability in quantitative studies. Evid.
Based Nurs. 18, 66–67. https://doi.org/10.1136/eb-2015-102129
Hübl, A., 2018. Stochastic Modelling in Production Planning. Springer Fachmedien
Wiesbaden, Wiesbaden. https://doi.org/10.1007/978-3-658-19120-7
Jaggi, C., Verma, M., Jain, R., 2018. Quantitative analysis for measuring and
suppressing bullwhip effect. Yugosl. J. Oper. Res. 28, 415–433.
https://doi.org/10.2298/YJOR161211019J
Han, J., Kamber, M., Pei, J., 2012. Data Preprocessing, in: Data Mining: Concepts and
Techniques, Third Edition. Morgan Kaufmann.
https://doi.org/10.1016/B978-0-12-381479-1.00003-4
Khakdaman, M., Wong, K.Y., Zohoori, B., Tiwari, M.K., Merkert, R., 2015. Tactical
production planning in a hybrid Make-to-Stock–Make-to-Order environment
under supply, process and demand uncertainties: a robust optimisation
model. Int. J. Prod. Res. 53, 1358–1386.
https://doi.org/10.1080/00207543.2014.935828
Liu, Y., Liao, S., Jiang, S., Ding, L., Lin, H., Wang, W., 2020. Fast Cross-Validation for
Kernel-Based Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1083–
1096. https://doi.org/10.1109/TPAMI.2019.2892371
Makridakis, S., 1993. Accuracy measures: theoretical and practical concerns. Int. J.
Forecast. 9, 527–529. https://doi.org/10.1016/0169-2070(93)90079-3
Mendo, L., 2009. Estimation of a probability with guaranteed normalized mean
absolute error. IEEE Commun. Lett. 13, 817–819.
https://doi.org/10.1109/LCOMM.2009.091128
Mirzapour Al-e-hashem, S.M.J., Malekly, H., Aryanezhad, M.B., 2011. A multi-
objective robust optimization model for multi-product multi-site aggregate
production planning in a supply chain under uncertainty. Int. J. Prod. Econ.
134, 28–42. https://doi.org/10.1016/j.ijpe.2011.01.027
Mula, J., Poler, R., García-Sabater, J.P., Lario, F.C., 2006. Models for production
planning under uncertainty: A review. Int. J. Prod. Econ. 103, 271–285.
https://doi.org/10.1016/j.ijpe.2005.09.001
“neural-networks” tag wiki [WWW Document], n.d. Math. Stack Exch. URL
https://math.stackexchange.com/tags/neural-networks/info (accessed 5.17.20).
Ngniatedema, T., Fono, L.A., Mbondo, G.D., 2015. A delayed product customization
cost model with supplier delivery performance. Eur. J. Oper. Res. 243, 109–
119. https://doi.org/10.1016/j.ejor.2014.11.017
Our core principles - Economic and Social Research Council [WWW Document], n.d.
URL https://esrc.ukri.org/funding/guidance-for-applicants/research-
ethics/our-core-principles/ (accessed 3.16.20).
Pandis, N., 2016. Linear regression. Am. J. Orthod. Dentofacial Orthop. 149, 431–434.
https://doi.org/10.1016/j.ajodo.2015.11.019
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel,
M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É., 2011. Scikit-learn:
Machine Learning in Python. J. Mach. Learn. Res. 12, 2825−2830.
Peidro, D., Mula, J., Jiménez, M., del Mar Botella, M., 2010. A fuzzy linear
programming based approach for tactical supply chain planning in an
uncertainty environment. Eur. J. Oper. Res. 205, 65–80.
https://doi.org/10.1016/j.ejor.2009.11.031
Ranstam, J., Cook, J.A., 2018. LASSO regression. BJS Br. J. Surg. 105, 1348–1348.
https://doi.org/10.1002/bjs.10895
Russo, A., Raischel, F., Lind, P.G., 2013. Air quality prediction using optimal neural
networks with stochastic variables. Atmos. Environ. 79, 822–830.
https://doi.org/10.1016/j.atmosenv.2013.07.072
Schmidt, A.F., Finan, C., 2018. Linear regression and the normality assumption. J. Clin.
Epidemiol. 98, 146–151. https://doi.org/10.1016/j.jclinepi.2017.12.006
Shanmuganathan, S., Samarasinghe, S. (Eds.), 2016. Artificial Neural Network
Modelling, Studies in Computational Intelligence. Springer International
Publishing, Cham. https://doi.org/10.1007/978-3-319-28495-8
Silva, T.C., Ribeiro, A.A., Periçaro, G.A., 2018. A new accelerated algorithm for ill-
conditioned ridge regression problems. Comput. Appl. Math. 37, 1941–1958.
https://doi.org/10.1007/s40314-017-0430-4
sklearn.linear_model.Lasso — scikit-learn 0.22.2 documentation [WWW Document],
n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.
linear_model.Lasso (accessed 4.30.20).
sklearn.linear_model.LinearRegression — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.ht
ml#sklearn.linear_model.LinearRegression (accessed 4.29.20).
sklearn.model_selection.GridSearchCV — scikit-learn 0.22.2 documentation [WWW
Document], n.d. URL https://scikit-
learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.ht
ml (accessed 5.5.20).
Soo, K.W., Rottman, B.M., 2018. Causal strength induction from time series data. J.
Exp. Psychol. Gen. 147, 485–513. https://doi.org/10.1037/xge0000423
Su, X., Yan, X., Tsai, C.-L., 2012. Linear regression. WIREs Comput. Stat. 4, 275–294.
https://doi.org/10.1002/wics.1198
Sweeney, E., Grant, D.B., Mangan, D.J., 2018. Strategic adoption of logistics and
supply chain management. Int. J. Oper. Prod. Manag. 38, 852–873.
https://doi.org/10.1108/IJOPM-05-2016-0258
Varoquaux, G., 2018. Cross-validation failure: Small sample sizes lead to large error
bars. NeuroImage 180, 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061
Wang, S., Ji, B., Zhao, J., Liu, W., Xu, T., 2018. Predicting ship fuel consumption based
on LASSO regression. Transp. Res. Part Transp. Environ. 65, 817–824.
https://doi.org/10.1016/j.trd.2017.09.014
Weiss, C.J., 2017. Introduction to Stochastic Simulations for Chemical and Physical
Processes: Principles and Applications. J. Chem. Educ. 94, 1904–1910.
https://doi.org/10.1021/acs.jchemed.7b00395
Welc, J., Esquerdo, P.J.R., 2018. Basics of Regression Models, in: Welc, J., Esquerdo,
P.J.R. (Eds.), Applied Regression Analysis for Business: Tools, Traps and
Applications. Springer International Publishing, Cham, pp. 1–6.
https://doi.org/10.1007/978-3-319-71156-0_1
Zou, H., Hastie, T., 2005. Regularization and Variable Selection via the Elastic Net. J.
R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320.