Predicting deliveries from suppliers

    A comparison of predictive models

    Marcus Sawert

    Högskolepoäng: 30 HP

    Termin/år: VT 2020

    Handledare: Leif Olsson

    Examinator: Aron Larsson

    Kurskod/registreringsnummer: IG001A

    Utbildningsprogram: Civilingenjör i Industriell ekonomi


    Abstract

In the highly competitive environment that companies find themselves in today, a well-functioning supply chain is key. For manufacturing companies, a good supply chain depends on functioning production planning, which tries to fulfill demand while considering the available resources. This is complicated by the uncertainties that exist, such as uncertainty in demand, in manufacturing and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity of supply uncertainty by treating it as purely stochastic. To improve these models, a prediction based on earlier data about the supplier or item could be used to estimate when a delivery is likely to arrive. This study set out to compare different predictive models to see which one is best suited for this purpose.

Historical data on earlier deliveries was gathered from a large international manufacturing company and preprocessed before being used in the models. The target value the models were to predict was the actual delivery time from the supplier. The data was then tested with the following four regression models in Python: linear regression, ridge regression, Lasso and Elastic net. The results were calculated by cross-validation and presented as the mean absolute error together with the standard deviation. They showed that Elastic net was the overall best-performing model, and that linear regression performed worst.

Keywords: Production planning, Supply, Deliveries, Prediction, Linear regression, Ridge regression, Lasso, Elastic net.


    Table of Contents

Abstract
List of abbreviations
1 Introduction
1.1 Background
1.2 Purpose & Aim
1.3 Research question
1.4 Delimitations
2 Theory
2.1 Supply chain management
2.2 Stochastic processes
2.3 Predictive models
2.4 Evaluation methods
2.5 Previous studies
3 Method
3.1 Data collection
3.2 Data preprocessing
3.3 Prediction
3.4 Evaluation
3.5 Reliability & Validity
3.6 Research ethics
4 Results
5 Discussion
6 Conclusion
6.1 Future research
References


    List of abbreviations

    ANN Artificial Neural Network

    ARIMA Autoregressive Integrated Moving Average

    ERP Enterprise Resource Planning

    Lasso Least Absolute Shrinkage and Selection Operator

    MAE Mean Absolute Error

    MAPE Mean Absolute Percentage Error

    MRP Materials Requirement Planning

    MRPII Manufacturing Resource Planning

    MTO Make-to-order

    MTS Make-to-stock

    SCP Supply chain planning

    sMAPE symmetric Mean Absolute Percentage Error


    1 Introduction

In the highly competitive and volatile business environment that many manufacturing companies operate in today, it is of great importance for them to have functioning supply chain management, which manages the flow of material from procurement, through manufacturing, to the final delivery to the end customer (Sweeney et al., 2018). Production planning is a big part of this process, trying to fulfill demand while considering the available material, employees and production resources.

The available material depends on the deliveries from suppliers, which cannot always be assumed to arrive on time (Khakdaman et al., 2015). This creates the need to take the uncertainty in material procurement into consideration when the delivery date to the end customer is communicated. It is therefore important to find a model that can predict incoming deliveries with high accuracy.

    1.1 Background

    Supply chain management is a way of coordinating the activities within a company to

    ensure that the customer demand is met and delivered on time. The five main supply

    chain processes of the Supply Chain Operations Reference-model are plan, source,

    make, deliver and return (Bushuev and Guiffrida, 2012). The delivery part of the

    supply chain is of great importance, and the delivery performance is one of the key

aspects of a successful supply chain. The timeliness of deliveries has an impact on customer satisfaction, as well as an economic impact. Both early and late deliveries create costs: early deliveries create extra inventory costs, while late deliveries produce costs in the form of production stoppages, lost sales and lost goodwill. In some cases delivery reliability, that is, that the delivery arrives on time, is valued more highly than a fast delivery (Bushuev and Guiffrida, 2012).

Maintaining good delivery performance while keeping inventories low is increasingly difficult, as the growing need for customization makes it harder to accurately forecast demand (Ngniatedema et al., 2015). This highlights one of the key processes within manufacturing companies for dealing with these problems: production planning. Production planning aims to make sure that production fulfills the demand, with consideration to the available material, employees and production resources. It can be divided into three parts, starting with a long-term aggregate plan, followed by a mid-term plan and finally a short-term plan, which decides which products to produce, in what quantities, and when production should start (Er et al., 2018). Production planning is closely linked to material procurement, since production cannot be completed if the required material is not available. A number of systems have been developed to help coordinate this, such as materials requirement planning (MRP), manufacturing resource planning (MRPII) and enterprise resource planning (ERP).

It is desirable that the schedule produced by the production planning is as stable as possible, but different uncertainties often cause continual re-planning (Er et al., 2018).

The uncertainties that need to be taken into consideration within production planning can be divided into three main groups (Peidro et al., 2010): demand, process/manufacturing and supply. Demand uncertainty is presented as the most important one, consisting of the volatility in demand and inexact forecasting. Process/manufacturing uncertainty concerns the internal production process, with machine problems, quality issues etc. Uncertainty in supply is caused by faults or delays in the deliveries from suppliers (Peidro et al., 2010).

There has been extensive research in the area of production planning under uncertainty, but a handful of weaknesses are not always addressed. Beemsterboer et al. (2016) bring up the issue that much of the literature assumes either pure make-to-order (MTO) or make-to-stock (MTS) production systems, conflicting with modern hybrid production systems that combine the two. A lack of consideration of multi-product organizations is also brought up as a weakness in the available literature (Khakdaman et al., 2015); assuming a single product is a simplification that does not portray modern organizations.

In some of the previous research on production planning under uncertainty, the frameworks used have been stochastic programming and robust optimization, but these models have only considered demand uncertainty, which is just one of the types that exist within production planning (Aouam et al., 2018).

The need for further research is expressed by Mula et al. (2006), who determine that uncertainty always needs to be taken into consideration, and that optimization in the area of production planning is very complex. The same study mentions that models based on artificial intelligence and fuzzy set theory could be of use for modeling production planning under uncertainty. Mula et al. (2006) also highlight the need for models addressing other types of uncertainty, since models dealing with uncertain demand have received far more attention in comparison.

Several studies in the area, such as Khakdaman et al. (2015), Gao et al. (2017), Peidro et al. (2010) and Mirzapour Al-e-hashem et al. (2011), build models with a number of variables to help with production planning under uncertainty, but they either lack consideration of uncertain supply or assume that the supply is stochastic. Assuming that uncertain supply is stochastic is a simplification of reality, as different parts and suppliers differ significantly in their delivery performance. A model that predicts incoming deliveries could make these models even more accurate.


    1.2 Purpose & Aim

    The aim of the study is to contribute to the research areas of supply chain risk

    management as well as predictive modeling, by examining which predictive model is

    best suited to predict the supply to a manufacturing company.

    1.3 Research question

The research question is which type of predictive model gives the most accurate results when predicting the supply of components and material to a manufacturing company.

    1.4 Delimitations

    The data in the research will be taken from a large international manufacturing

    company that has a multi-product mix, with both MTO and MTS production. The

    production mostly consists of assembly of components from suppliers.


    2 Theory

This chapter presents a few theoretical concepts relevant to the study, including different models for prediction as well as methods for evaluating the results. The choice of prediction models and evaluation methods used in the study is also motivated here; the method chapter describes in more depth how they were used in practice.

    2.1 Supply chain management

    Within business, a supply chain is a network of organizations, people, activities and

    resources needed to meet the demand of the end customer. Managing this is a

    foundation for an organization to attain and maintain its competitive advantage

(Bushuev and Guiffrida, 2012). Supply chain management includes the active coordination of raw material acquisition, production processing, and the distribution of goods and services.

The five supply chain processes defined by the Supply Chain Operations Reference-model (SCOR) are plan, source, make, deliver and return. The importance of the delivery process in the supply chain is well documented, as the timeliness of deliveries is of great importance to the customer. This, together with the fact that deliveries are integrated with and affect the different stages of the supply chain, makes delivery performance the most important metric in a supply chain (Bushuev and Guiffrida, 2012). Improving the delivery process is therefore of interest to managers within supply chain and logistics.

How well the delivery process works for a manufacturing company depends on whether the company can produce the required amount of goods in time, which requires well-functioning production planning.

    2.1.1 Production planning

Production planning is the process within a manufacturing company of planning production with regard to the availability of material, production resources and employees. The goals of production planning are, among others, to provide on-time deliveries, optimal production utilization and low inventories. These goals are often in conflict with each other; for example, on-time deliveries could more often be achieved with the help of a large inventory of finished goods, but such an inventory leads to higher costs (Hübl, 2018).

Several technologies have been developed through the years to help companies with production planning, such as materials requirement planning (MRP), manufacturing resource planning (MRPII), enterprise resource planning (ERP) and supply chain planning (SCP) (Hübl, 2018).


The difficulty within production planning, and the source of its continuous re-planning, is the uncertainties involved. These are the demand uncertainty, the uncertainty in the internal manufacturing/processes, and the uncertainty of supply (Peidro et al., 2010). The demand uncertainty has been considered the most important of the three, and the one that has received the most attention from researchers. One reason for the difficulty of forecasting demand is the phenomenon called the bullwhip effect, which has been extensively researched, resulting in several proposed ways to mitigate it. The bullwhip effect is discussed in more detail in chapter 2.5.

The supply uncertainty is due to the fact that it cannot be taken for granted that suppliers deliver the correct amount at the correct time (Khakdaman et al., 2015). As on the demand side, this makes it difficult to forecast and predict when all needed material will be available for production. This has, however, not been addressed to the same extent as the demand uncertainty, resulting in a gap in the research as to how the problem should be addressed.

The rest of the chapter covers theory relevant to supply chain management and production planning under uncertainty.

    2.2 Stochastic processes

Stochastic means that something is randomly determined, and a stochastic process changes randomly as time passes. Stochastic processes are often used in modern chemical and physical research, as they suit the random nature of the underlying mechanisms.

The random variables in a stochastic process can be generated from a number of different distributions, of which the following three are the most commonly used (Weiss, 2017):

• Uniform distribution, with values distributed with equal probability throughout the range.

• Poisson distribution, where a given number of independent events occur in a time interval at a known rate.

• Binomial distribution, where variables with two possible outcomes are randomly generated; the outcomes can have equal or unequal probability.

Some of the studies considering uncertainty in supply use a stochastic variable for calculating the risk. Since suppliers differ when it comes to delivery performance, and since parts and components differ greatly in material, complexity, need for processing etc., a stochastic variable for the delivery uncertainty is a simplification of reality. An accurate model therefore needs to be based on a calculation that considers the varying delivery performance of different suppliers and products.
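The three distributions above can be sampled directly with NumPy's random generator. A minimal sketch for illustration (the parameter values are arbitrary, not taken from the study):

```python
import numpy as np

# A seeded generator makes the "random" draws reproducible.
rng = np.random.default_rng(42)

# Uniform: every value in [low, high) is equally probable.
uniform_draws = rng.uniform(low=0.0, high=10.0, size=1000)

# Poisson: number of independent events per interval, at a known mean rate.
poisson_draws = rng.poisson(lam=3.0, size=1000)

# Binomial: number of "successes" in n trials with success probability p.
binomial_draws = rng.binomial(n=10, p=0.5, size=1000)

# Sample means land near the theoretical means 5.0, 3.0 and 5.0.
print(uniform_draws.mean(), poisson_draws.mean(), binomial_draws.mean())
```
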


    2.3 Predictive models

A few commonly used predictive models are presented below, along with some information about the types of studies they have been used in and the types of data they can be used with.

    2.3.1 Neural Networks

    An Artificial Neural Network (ANN) can be described as a model that is inspired by

    the structure and processing of a biological brain. It consists of processing elements,

    the neurons, and connections between these together with weights. It is commonly

    defined by four parameters (Shanmuganathan and Samarasinghe, 2016):

1. The type of neuron, for example the McCulloch-Pitts neuron.

2. The connection architecture: neural networks can be fully or partially connected, as well as having different layers of neurons. In an auto-associative network the input neurons are also the output neurons, while in a hetero-associative network there are separate input and output neurons. The architecture is determined by the connections between the output and input neurons. A feedforward architecture has no connections back from the output to the input neurons, and the network has no memory of its previous output values. A feedback architecture, on the other hand, has connections back from the output to the input neurons, and the network remembers its previous states.

3. The learning algorithm that trains the network; these are divided into supervised learning, unsupervised learning and reinforcement learning.

4. The recall algorithm, i.e. how knowledge is extracted from the neural network. This includes pattern association, data clustering and categorization.

    Figure 1: Example of the structure of an artificial neural network (Neural-networks 2020)


Depending on the goal of the neural network, it can be constructed in different ways. The learning algorithm depends on whether the model is linear or not: linear neural networks are often based on the least mean square rule, while nonlinear models are often based on the back-propagation training rule (Russo et al., 2013). Neural networks are often used for signal processing, forecasting and clustering, and are increasingly being used in areas with imprecise data and complex relationships between variables, which are difficult to handle with traditional analytical methods (Shanmuganathan and Samarasinghe, 2016).
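Although neural networks are not used in this study, a feedforward network with supervised learning of the kind described above can be sketched with scikit-learn's MLPRegressor; the data, layer size and iteration count below are invented for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1]          # a simple nonlinear target

# One hidden layer of 20 neurons in a feedforward architecture;
# training is supervised, via back-propagation on (X, y) pairs.
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
net.fit(X, y)

preds = net.predict(X[:5])
print(preds.shape)                  # (5,)
```
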

    2.3.2 Time series analysis

Time series consist of observations taken sequentially in time, which can be analyzed with the aim of finding trends or other useful information. The observations need to be taken at equal time intervals; these can be monthly, weekly, daily or any other defined period. Time series are frequently used in economics, business, engineering and the natural sciences (Box et al., 2013). One inherent feature of time series is that the observations are normally dependent. This dependence is often of great interest, and time series analysis includes techniques to analyze it.

In industry, business and economics, forecasting with the help of time series is of particular interest. In these fields, time series are often nonstationary, with no constant mean level over time. Forecasting methods have been developed to deal with this, which use exponentially weighted moving averages. One common model is the autoregressive integrated moving average (ARIMA) (Box et al., 2013).

One problem with time series, brought up by Soo and Rottman (2018), is the complexity of the influences that variables have on each other. Variables can exhibit temporal trends, which can make it appear as if a positive or negative relationship exists between variables even when no direct relation exists.
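The exponentially weighted moving average mentioned above can be sketched with pandas; the series values and smoothing factor here are made up for illustration:

```python
import pandas as pd

# Observations at equal time intervals, e.g. monthly demand.
demand = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])

# Exponentially weighted moving average: recent observations get
# exponentially more weight: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
smoothed = demand.ewm(alpha=0.5, adjust=False).mean()

# The last smoothed value can serve as a one-step-ahead forecast.
print(smoothed.iloc[-1])   # 117.5
```
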

    2.3.3 Regression analysis

    A regression model is an analytical tool which estimates the relationship between a

    dependent variable, and one or more explanatory variables. The explanatory

    variables, or regressors, have a statistical or causal relationship to the dependent

    variable. A regression model is quantitative and can be constructed in different ways,

    depending on the purpose of the model and the data available (Welc and Esquerdo,

    2018).

Regression models can be classified according to a number of criteria, a few of which are (Welc and Esquerdo, 2018):

• The number of equations used: there are single-equation models, which use one equation to explain the relationship between the dependent variable and the explanatory variables, and multi-equation models, which have more than one dependent variable and use more than one equation.

• The number of explanatory variables: there are univariate models with only one explanatory variable, and multivariate models with at least two explanatory variables.

• The functional form of the model: there are linear models, with a linear relationship between the variables, and nonlinear models.

• The type of dependent variable: in some models the dependent variable is continuous, while other models use a dependent variable expressed in a binary way.

Regression models are mainly used for three purposes (Welc and Esquerdo, 2018):

• To examine the relationship, or lack thereof, between variables.

• Forecasting: based on a model that explains the behavior of historical data, future events can be predicted.

• Scenario analyses: for example, how an explanatory variable must change in order for the dependent variable to reach a certain value.

    2.3.3.1 Linear regression

Linear regression is the simplest of the regression models and uses a linear combination of the predictors for its function (Su et al., 2012). It has become the building block of numerous modeling tools and is popular for its easily interpretable parameters. Its ability to provide satisfactory results even when the sample size is small, or the relationship between the predictors and the dependent variable is relatively vague, has further increased its popularity (Su et al., 2012).

There are many methods to estimate the parameters of a linear regression model; the most common is the least squares method, which minimizes the distance from the predicted values to the actual ones.
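A least squares fit can be sketched in a few lines with scikit-learn, the library used later in the study; the toy data is invented so the fitted line is known in advance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data lying exactly on y = 2x + 1, so least squares recovers the line.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # approx. 2.0 and 1.0
```
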

Linear regression models make a few assumptions about the data, the predictors, the dependent variable and the relationship between them. These include (Pandis, 2016):

• That the observations of the dependent variable are independent of each other.

• That for each value of the predictor, the dependent variable follows a normal distribution.

• That the variability of the dependent variable stays the same for each value the independent variable takes.

However, even if the differences between the estimated and observed values are not normally distributed, linear regression can still produce valid results when the sample size is large. Conversely, a linear regression model with normally distributed residuals is not necessarily valid (Schmidt and Finan, 2018).

    2.3.3.2 Ridge regression

Also known as Tikhonov regularization, Ridge regression was developed to solve some of the issues with the least squares method within linear regression models (Silva et al., 2018). Ridge regression, together with Lasso among others, belongs to the so-called shrinkage methods, which pick out the variables to keep in the finished model. This is also the idea behind another group called subset selection methods, but there is an important difference between the two. Subset selection methods have a discrete process where variables are either kept or discarded, which often produces models with high variance. Shrinkage methods, however, are continuous in their variable selection and are therefore not as prone to high variability (Hastie et al., 2009).

One of the issues with the least squares method occurs when the predictors have a high internal dependence and the regression coefficients have large standard errors. This often results in poorly performing estimators, which Ridge regression tries to correct by imposing a penalty on the size of the coefficients (Silva et al., 2018). A similar function, called weight decay, is also used in neural networks. Another flaw of linear regression is that an unusually large coefficient on a variable can be canceled out by a negative coefficient of similar size on a correlated variable; Ridge regression deals with this by imposing size constraints on the coefficients (Hastie et al., 2009).
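The effect of the coefficient penalty can be sketched on invented data with two nearly identical predictors: where ordinary least squares may assign the pair large opposite-signed coefficients, the ridge penalty (controlled by `alpha`, a value chosen here only for illustration) pushes them toward similar, moderate values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x = rng.normal(size=100)
# Two almost identical (highly correlated) predictors.
X = np.column_stack([x, x + rng.normal(scale=0.01, size=100)])
y = x + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # alpha sets the penalty strength

# The two ridge coefficients come out near-equal, summing to about 1;
# the OLS pair is unconstrained and can be far apart.
print(ols.coef_, ridge.coef_)
```
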

    2.3.3.3 Lasso

Least Absolute Shrinkage and Selection Operator (Lasso) is a linear regression model that effectively reduces the number of variables the solution depends on (Linear Models 2020). Standard regression models tend to include too many variables, which leads to overfitting as well as overestimating how well the included variables explain the observed variability, so-called "optimism bias" (Ranstam and Cook, 2018).

Lasso offers a solution to some of these problems, since it tries to identify the variables and regression coefficients that lead to the smallest prediction error. One of the drawbacks of the Lasso model is that it does not focus on the contribution of individual variables; instead the focus is on the best combined prediction (Ranstam and Cook, 2018).

The Lasso model has been used for prediction with good results. One example is a study by Wang et al. (2018), in which they predicted the fuel consumption of ships at sea. The results of the Lasso model were compared with three other methods (artificial neural networks, support vector regression and Gaussian process regression), and Lasso was the best performing of the four.
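Lasso's variable selection can be sketched on synthetic data where only some features matter; the data and the penalty strength `alpha` are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually drive the target.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# The L1 penalty shrinks the useful coefficients slightly and drives
# the coefficients of the irrelevant features exactly to zero.
lasso = Lasso(alpha=0.3).fit(X, y)
print(lasso.coef_)
```
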

    2.3.3.4 Elastic net

Built in a similar way to Lasso, the Elastic net regression method simultaneously performs automatic variable selection and continuous shrinkage. It was developed to overcome some of the shortcomings of Lasso, one of which is that when a group of variables with very high pairwise correlations exists, Lasso selects one variable from the group without regard for which one. Elastic net, on the other hand, has a grouping effect, where strongly correlated variables are either kept in the model or removed together as a group (Zou and Hastie, 2005).

Elastic net has been shown to be very useful when the number of variables is larger than the number of observations in the data (Hao and Lu, 2018). Zou and Hastie (2005) show in their research, using real-world data, that the Elastic net method often achieves better results than both Lasso and Ridge regression.
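The grouping effect can be sketched on invented data with a cluster of highly correlated predictors; `alpha` and `l1_ratio` (the mix between the L1 and L2 penalties) are arbitrary illustrative values:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Three nearly identical, highly correlated predictors plus two noise features.
X = np.column_stack([x + rng.normal(scale=0.05, size=100) for _ in range(3)]
                    + [rng.normal(size=100) for _ in range(2)])
y = 2.0 * x + rng.normal(scale=0.5, size=100)

# l1_ratio mixes the Lasso (L1) and Ridge (L2) penalties; the L2 part
# spreads the weight across the correlated group instead of picking one.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```
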

    Choosing the models

    As shown above, there are several different types of predictive models available

    depending on the data and how the prediction problem is structured.

    As the problem in this case is the uncertainty in deliveries from suppliers, the

    prediction is looking to provide an answer to when the delivery will be taking place.

    Regression can provide a continues value, i.e. not providing the results as classified

    into different categories. This suits the data well as deliveries can have very different

    expected delivery times, and classification would create categories with large spans.

    Time series analysis will not be used due to the fact that they look at defined time

    periods, whereas deliveries have varying lead times. Time series could be used to

    forecast what would be delivered during a defined time period but lacks the ability to

    predict specific deliveries or the performance of suppliers or parts.

    When it comes to prediction, linear regression models can sometimes perform better

    than more advanced nonlinear models, in particular when the dataset is relatively

small (Hastie et al., 2009). Since the results should be a continuous value rather than a classification, simple linear regression is a good starting point.

    Wang et al. (2018) showed in their study that Lasso outperformed, among others, the

    artificial neural network when predicting fuel consumption. The number of variables

    in their study is almost the same as in this study, and they too were looking to predict

    a continuous value. Considering this, Lasso is a reasonable choice of model to try in

    this study.

Scikit-learn is an open source machine learning library which offers numerous algorithms as well as guides and thorough documentation for all its available functions. And since finding the correct model is one of the hardest parts of solving a

    machine learning problem, scikit-learn offers a guide to that as well. Depending on

the problem that the model will be used to solve, and what types of data are available, some machine learning models are better suited than others (Choosing the right estimator 2020). The delivery times in this study are to be calculated as a

    continuous variable and the number of samples is not very large. These conditions

    make Lasso, Ridge regression and Elastic net some of the machine-learning models

    that scikit-learn recommends for this type of study. This together with the fact that

Ridge was developed as an improvement to the ordinary linear regression model, and that Elastic net was developed to deal with some of the shortcomings of Lasso, makes the comparison between them all interesting.

    The predictive models chosen for the study are therefore:

• Linear regression
• Ridge regression
• Lasso
• Elastic net

    2.4 Evaluation methods

    This chapter presents some of the most common methods used for evaluating

    predictive models.

    2.4.1 MAPE

    Mean Absolute Percentage Error (MAPE) is a measure for assessing the quality and

    accuracy of a prediction made with a forecasting method. It can be presented as a

    percentage and it is commonly used in practice thanks to the intuitive interpretation

    of relative error that it offers (de Myttenaere et al., 2016).

Although it is a popular and often used assessment method, MAPE has a few flaws. It cannot handle zero values, as these would require division by zero. The most concerning problem, however, is that equal absolute errors result in different MAPE scores depending on whether the forecasted value is above or below the actual value (Makridakis, 1993). This means that MAPE rewards forecasts that under-forecast, which creates a misleading result.

\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|A_t - F_t|}{A_t}    (2.1)

Table 1 The variables of MAPE
t Observation index    A_t Actual value
n Number of observations    F_t Forecasted value
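Equation 2.1 can be written directly as a small Python function; the numbers below are made up and illustrate the asymmetry discussed above, where the same absolute error of 50 yields different MAPE scores depending on the actual value:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error (equation 2.1), in percent.

    Assumes all actual values are nonzero, since MAPE is otherwise undefined.
    """
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))

print(mape([100], [150]))  # absolute error 50 against actual 100 -> 50.0 %
print(mape([150], [100]))  # same absolute error against actual 150 -> ~33.3 %
```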


    2.4.2 sMAPE

To overcome some of the shortcomings of the MAPE method, it was expanded and

    became the symmetric Mean Absolute Percentage Error (sMAPE). In addition to being

    symmetric, it can also handle zero values (Flores, 1986). The model, just like MAPE,

    gives a forecast a percentage error in comparison to the actual value.

\mathrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}    (2.2)

Table 2 The variables of sMAPE
t Observation index    A_t Actual value
n Number of observations    F_t Forecasted value
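Written in the same style, equation 2.2 gives a symmetric score: swapping the actual and forecasted values leaves the result unchanged, and a single zero value no longer breaks the calculation (the numbers are made up):

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error (equation 2.2), in percent."""
    n = len(actual)
    return 100.0 / n * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    )

print(smape([100], [150]))  # 50 / 125 -> 40.0 %
print(smape([150], [100]))  # symmetric: also 40.0 %
print(smape([0], [10]))     # a zero actual value is handled: 200.0 %
```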

    2.4.3 MAE

    Mean absolute error (MAE) is a measure of errors between observations, which can be

    used to evaluate how well predictive models perform. MAE is calculated by taking

    the average of the absolute error between the predicted and the actual value (Metrics

    and scoring 2020).

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|    (2.3)

Table 3 The variables of MAE
i Observation index    \hat{y}_i Predicted value
n Number of observations    y_i Actual value
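Equation 2.3 in the same style, again with made-up numbers:

```python
def mae(actual, predicted):
    """Mean Absolute Error (equation 2.3), on the same scale as the data."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n

print(mae([1, 2, 3], [2, 2, 5]))  # (1 + 0 + 2) / 3 = 1.0
```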

Next to MAE, the very similar root mean square error (RMSE) is also widely used to evaluate the same kinds of models, and there is no consensus on which of the two is most appropriate to use (Chai and Draxler, 2014). One major difference between the

    two is that all errors are given the same weight when using MAE, while RMSE on the

    other hand gives errors with larger absolute values a higher weight. This has the effect

    that the RMSE is more sensitive to outliers in the data compared to MAE (Mendo,

    2009).



Chai and Draxler (2014) argue that when the errors are expected to be normally distributed, RMSE is the better choice over MAE, and that MAE should be used when the errors are expected to be uniformly distributed.

Another distinct difference between RMSE and MAE is that MAE uses absolute values, which is a big disadvantage in many mathematical calculations (Chai and Draxler, 2014).

The fact that the results of the MAE are on the same scale as the data it evaluates must also be taken into consideration before choosing the appropriate evaluation method.
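The different weighting of errors can be seen in a short sketch with made-up error values: on equal errors MAE and RMSE agree, but a single outlier raises RMSE much more than MAE:

```python
import math

def mae(errors):
    # All errors carry the same weight.
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    # Squaring gives larger errors a higher weight.
    return math.sqrt(sum(e * e for e in errors) / len(errors))

equal = [1, 1, 1, 1]      # four equal errors
outlier = [1, 1, 1, 9]    # the same, but with one large outlier

print(mae(equal), rmse(equal))      # identical on equal errors: 1.0 1.0
print(mae(outlier), rmse(outlier))  # 3.0 vs ~4.58: RMSE punishes the outlier
```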

    2.4.4 Cross-validation

The validity of predictive models depends on their ability to generalize and accurately

    predict properties of new and unseen data that is independent from the data used to

    train the model (Varoquaux, 2018). This can be done with cross-validation, which

    splits the available data into separate training and testing sets. Cross-validation

    methods often use multiple rounds to evaluate the results. One of these methods is the

    k-fold cross-validation where the data is divided into k subsets of equal size, and the

    model performs k training rounds, or folds. For each training fold the model chooses

    one of the subsets to be the test set, and the rest are chosen as training sets (Liu et al.,

    2020). Each fold results in a validation error, and the k-fold cross-validation estimate is

    the average validation error for all the k folds.
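A minimal sketch of the k-fold split using scikit-learn, with made-up data: each of the k = 5 folds holds out one fifth of the observations for testing, and every observation ends up in a test set exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 made-up observations, 2 attributes
kf = KFold(n_splits=5)

test_indices = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train on {len(train_idx)} rows, test on {len(test_idx)} rows")
    test_indices.extend(test_idx)

# Every observation is used exactly once as test data across the 5 folds.
assert sorted(test_indices) == list(range(10))
```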

Many prediction models use hyperparameters, for example a kernel parameter or a regularization parameter. These parameters are not calculated within the model, so

    they need to be set before the model can be used. Hyperparameters can have different

    purposes in a model, the regularization parameter for instance decides the amount of

    shrinkage in a model. The value of these hyperparameters can sometimes greatly

    affect the results, so finding the optimal value can be decisive for the performance of

    the model. Cross-validation can be used both to find the optimal hyperparameter for

    the model chosen, or simply to evaluate different models compared to each other (Liu

    et al., 2020).

Choosing the evaluation method

    All the models chosen for prediction are regression models, in which there are several

    predictors that are used together to try to predict the dependent variable, or target

    value. One common feature of most machine learning techniques is that the model is

    first trained on a sub-set of the available data, and then tested on the rest.

    In this study, a 5-fold cross-validation will be used in order to make sure that the

    results are not affected by randomness as each sub-part of the dataset will be used

    four times as a training set, and once as the test set.

For the cross-validation, it has to be decided in which format the results should be returned. In this study the results will be in the form of MAE, the mean absolute error.

    The target value of the study is the actual lead time of each part and is measured in

days. This allows MAE to be easily understood, and the fact that it is scale dependent is not an issue. MAE is also better at handling outliers in the data, which exist in this

    study (Mendo, 2009).

Scikit-learn also offers a built-in MAE metric in its cross-validation function, which it does not for MAPE and sMAPE.

    2.5 Previous studies

Khakdaman et al. (2015) build a robust optimization model to manage production planning in an MTS/MTO environment. The uncertainties consist of delivery, process

    and demand. The delivery part of the uncertainty is only about cost and currency

    rates, which means that they overlook the risk of delayed or incomplete deliveries.

    They conclude that although their focus is mostly on cost-related uncertainties, they

    feel confident that their model delivered accurate estimates and useful findings.

Gao et al. (2017) use a Markov model for signal-based dynamic supply forecasting, in which suppliers are examined to find signals (financial health etc.) in order to guide

    the procurement and selling decisions of the company. In the model, the capacity of

    the suppliers is assumed to be stochastic. Their research concludes that a traditional

    stationary forecast for supply capacity uncertainty may lead to poor decisions and

    severe losses.

    In (Peidro et al., 2010), a fuzzy linear programming model is used to deal with supply

    chain planning under demand, process and delivery uncertainty. The uncertainties are

    jointly considered, and the aim of the model is to use the available resources in order

to meet customer demands at a minimum cost. The model is tested with real-world data and the results show that the model is more effective than deterministic methods

    for handling situations without certain and precise information in supply chain

    planning. The researchers see the integration of analytical and simulation models as a

    possible improvement, where the best capacities of both types of models are

    integrated to be used for supply chain planning problems.

Mirzapour Al-e-hashem et al. (2011) build a robust multi-objective mixed integer nonlinear programming model to deal with multi-product aggregate production

planning (APP) with uncertainty. The supply chain includes multiple suppliers and multiple customers, and is multi-period and multi-site. The uncertainties include the

    demand fluctuations, as well as the cost parameters of the supply chain.

The demand side of supply chain management is extensively researched, and one phenomenon of uncertainty in demand is called the bullwhip effect. This effect is one of the main sources of inefficiencies in supply chains, and by extension, in production planning. The effect is described as a growing difficulty in forecasting demand the further from the end customer, or the higher up the supply chain, an organization is (Braz

et al., 2018). Different ways to mitigate this effect are recommended; one way is to implement a closed-loop supply chain, which has been shown to be less affected than

    traditional supply chains (Braz et al., 2018). Another way, introduced by Jaggi et al.

    (2018) is the iterative proportional allocation algorithm which discourages the

    bullwhip effect.


    In the cases where these models deal with the uncertainty in supply, this uncertainty

    is mixed and calculated together with other uncertainties, or it is viewed as a

    stochastic uncertainty. The models do not focus on the supply uncertainty alone. This

    means that the models produce results that are generalized and not consistent with

    the complexity of real-world cases since they attribute all suppliers and items with the

    same uncertainty. The research gap found here is to move away from the

generalization and instead use predictive modeling to form a data-driven model

    which can be incorporated in a bigger model that deals with production planning

    under uncertainty in general.

    Khakdaman et al. (2015, p. 1384) express at the end in their conclusion this need to

    consider delivery uncertainty: “Finally, incorporating non-financial uncertainties such

    as production lead time or raw material supply lead time into the model would

    enhance its usefulness to both academics and practitioners.”

Predicting deliveries from suppliers could be used either in models as explained above, or independently as a way to handle uncertainties in the supply chain, i.e. as a form of supply chain risk management.

    This study aims to find the predictive model best suited for these purposes.


    3 Method

    The methods and approaches to answer the research question are presented in the

    following chapter, along with some information regarding the reliability, validity and

    research ethics and how these are considered in this study.

    The sub chapters are presented in the order in which they were carried out.

The study consists of four main parts: data collection, data preprocessing, prediction and evaluation. The first part, the data collection, was made directly at a manufacturing company, in order to make sure that the results of the study are applicable to real-world cases. The data then had to be preprocessed so that it could be used in the chosen predictive models. After the preprocessing was complete, the predictive models were constructed and used with the now appropriate data. Lastly, all the results from the different predictive models were gathered, summarized and presented, in order to see if any conclusions could be drawn.

    All parts are described in more detail in the following chapter.

    Figure 2: The structure of the study

    3.1 Data collection

    All the data was gathered from the company ERP-system, which tracks all orders,

    warehouse information etc. throughout the different company facilities. The system

    also contains information about the supplier for each part, along with the expected

    lead time. The lead time is the time from the placement of the order to when the order


    is fulfilled. The system also tracks the actual lead time of each delivery, and that is the

    important information. The actual lead time is what is being used as the target value

    in this study, the value that the models will try to predict.

    This means that there is historical data in the company ERP-system regarding when

    each order is placed, the lead time of each part, the supplier, and when the order was

    received.

    The data gathered is all internal transactions during 2018 and 2019, which in total was

    375 000 lines. This was then filtered to only include the deliveries from the main

    producing factory in Sweden that are sent to the other company facilities around the

    world.

To make sure that the dataset would be large enough, and that the results would hold over multiple cases, the 10 items with the most deliveries during these two years were selected to be used in the models. This means that the predictive models were used on 10 different items to see if any clear conclusions could be drawn.

The data was collected from the ERP-system to an Excel file, in which the data regarding the 10 most frequently delivered items was picked out, and then saved as a comma-separated file (.csv) in order to facilitate the data handling in Python.

    3.2 Data preprocessing

    To be able to use the data in a prediction model, it first needs to be preprocessed and

    structured in a correct way. To successfully use algorithms to draw conclusions from

    large sets of data, one key point is the quality of the data. The data needs to be

    complete, up to date, and without errors, which collected data rarely is (Davidson and

    Tayi, 2009).

    Modern databases, many of huge sizes, often include noisy and missing data that

    originate from multiple different sources (Jiawei Han et al., 2012). Data quality can be

    described as the accuracy, completeness, consistency, timeliness, believability and

    interpretability of the data (Jiawei Han et al., 2012). To help improve the data quality

    and substantially improve the quality of the results, there are a few techniques of data

    preprocessing that can be used.

    Completeness is one key part of data quality, and one that often needs to be properly

    addressed since missing values are a common issue with large datasets. A missing

value is when an observation, or tuple, does not have values for all of the attributes. The reason why a tuple lacks values can be a technical issue, but also that the system gathering the data is constructed in a way that creates missing values. A few different techniques to deal with this problem exist; among them are simply to

    remove the tuple, to use a global constant to fill in the missing value, or to fill in the

    mean or median value for that particular attribute.

All these techniques come with drawbacks, as ignoring the tuple decreases the size of the data and possibly affects the quality of the results. Using a constant for all missing

    values, or using the mean or median value produces a bias in the data and can

    therefore also reduce the quality of the results (Jiawei Han et al., 2012).


In this study there was one attribute for which many of the tuples were missing data, which was due to how the system is constructed. These missing values actually represent 0, and were thus replaced with 0.
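Assuming the .csv file is loaded into a pandas DataFrame (pandas is a common choice for this in Python, though the study does not state the exact loading code), the replacement is a one-liner; the column name below is a made-up placeholder:

```python
import numpy as np
import pandas as pd

# Made-up example with a missing value caused by how the system is constructed.
df = pd.DataFrame({"backorder_qty": [5.0, np.nan, 2.0]})

# The missing values are known to mean 0, so they are filled with 0.
df["backorder_qty"] = df["backorder_qty"].fillna(0)

print(df["backorder_qty"].tolist())  # [5.0, 0.0, 2.0]
```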

Binning is a data preprocessing technique to smooth data and can be used to reduce the number of distinct values that an attribute can take (Jiawei Han et al., 2012). In this study all the orders gathered from the ERP-system include information about when the order was placed and when it was scheduled for delivery, as well as other scheduling information. All of these were in a year-month-day format, which creates many different possible combinations over a two-year period. Here binning was used to reduce the number of possible values, and to better visualize any monthly effects. All dates were formatted as just the month.

    One of the attributes in the dataset had all its values formatted as text, so these were

    transformed into numbers instead to ease the handling and loading of the data.
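Both of these steps can be sketched with pandas; the column names and values below are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": ["2018-03-15", "2019-11-02", "2018-03-28"],  # year-month-day
    "supplier": ["Alpha AB", "Beta GmbH", "Alpha AB"],          # text attribute
})

# Binning: keep only the month, reducing the number of distinct values.
df["order_month"] = pd.to_datetime(df["order_date"]).dt.month

# Encoding: map each distinct text value to an integer code.
df["supplier_code"] = pd.factorize(df["supplier"])[0]

print(df[["order_month", "supplier_code"]].values.tolist())
# [[3, 0], [11, 1], [3, 0]]
```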

Lastly, the data was cleared of any sensitive information (costs, who placed the order etc.) as well as a few redundant attributes. Attributes containing information only available once the delivery was completed were also removed, since a useful predictive model cannot rely on information that is not available before the actual delivery takes place. After the preprocessing, the data contained 15 attributes in addition to the target value (actual delivery time).

    3.3 Prediction

This chapter is divided into one part for each of the chosen prediction models, describing which parameters must be chosen and how the prediction is performed.

    Python has become one of the most popular languages due to its interactive nature

    and the growing ecosystem of scientific libraries (Pedregosa et al., 2011).

    Scikit-learn is a Python module which integrates several machine learning algorithms

    for supervised and unsupervised learning. The scikit-learn project was created in

    order to provide an open source machine learning library that would be well

    accessible to non-machine learning experts and usable in various scientific areas

    (Buitinck et al., 2013).

    The library is a collection of functions and classes that the user then imports into a

    Python environment.

Scikit-learn is, as previously mentioned, a Python library, and needs an environment to run in. Anaconda, which is a package and environment manager, has been chosen to be used together with the scikit-learn library. Anaconda has thousands of open-source packages available for installation, and offers a simple user interface (Anaconda Distribution 2020).


    3.3.1 Linear regression

Scikit-learn uses the most common method for parameter estimation, the ordinary least squares method, shown in equation 3.1, which minimizes the sum of squared differences between the observed values and the predicted ones (Linear Regression 2020).

\min_w \; ||Xw - y||_2^2    (3.1)

Table 4 The variables of ordinary least squares
w Coefficients    y Target values
X Training data

The data is divided into arrays, X for the input data and y for the target values. The coefficients, or weights, are calculated and the model is then fitted and tested. The results are then calculated with the help of cross validation and presented in the form of MAE.
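A minimal sketch of this workflow, with made-up data standing in for the delivery dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data standing in for the delivery attributes (X) and lead times (y).
rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 5.0   # exact linear relationship

# Ordinary least squares recovers the coefficients and the intercept.
model = LinearRegression().fit(X, y)

print(model.coef_)       # close to [ 2. -1.]
print(model.intercept_)  # close to 5.0
```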

    3.3.2 Ridge regression

    Ridge regression uses a similar method for parameter estimation as the ordinary least

    squares method does but adds a penalty on the size of the coefficients. Then the model

    minimizes the sum of squares in accordance with equation 3.2 (Linear Models 2020):

\min_w \; ||Xw - y||_2^2 + \alpha ||w||_2^2    (3.2)

Table 5 The variables of Ridge regression
w Coefficients    y Target values
X Training data    α Regularization parameter

The complexity parameter alpha in the equation above decides the amount of shrinkage in the model; a larger value of alpha results in greater shrinkage. Since the best-suited value of alpha can differ a lot depending on the data, scikit-learn offers a method to try different values. This is done with RidgeCV, which is a cross validation



    method where the model uses different alpha values to see which one best fits the

    model (RidgeCV 2020).

    Figure 3: Example of how the MAE changes depending on the value of alpha

RidgeCV performs a 5-fold cross validation for each of the alpha values that are specified to be tested. In this study, the following alpha values are included:

    Table 6 The values tested as alpha in Ridge regression

    𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150

    After the best suited alpha value is calculated, the results are calculated with cross

    validation and presented in the form of MAE together with the standard deviation of

    the results.
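A sketch of the RidgeCV search over the alpha values in table 6, again with made-up data in place of the delivery dataset:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# The alpha values from table 6.
alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]

# Made-up data standing in for the delivery dataset.
rng = np.random.RandomState(0)
X = rng.randn(80, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.randn(80) * 0.5

# RidgeCV evaluates each alpha with 5-fold cross validation and keeps the best.
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("best alpha:", model.alpha_)
```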

    3.3.3 Lasso

Lasso estimates sparse coefficients and tends to prefer solutions with fewer nonzero coefficients, which reduces the number of variables that the solution depends on. Lasso tries to minimize equation 3.3 (Linear Models 2020):

\min_w \; \frac{1}{2 n_{\mathrm{samples}}} ||Xw - y||_2^2 + \alpha ||w||_1    (3.3)

Table 7 The variables of Lasso
w Coefficients    y Target values
X Training data    α Regularization parameter


    The alpha in the equation decides the degree of sparsity of the coefficients, and just

    like Ridge regression, the best suited alpha for the model can be decided by cross

    validation. Scikit-learn offers two different methods for deciding the value of alpha,

    the LassoCV and the LassoLarsCV function. LassoCV is however most often preferred

    and will be used in this study (Lasso 2020).

    Figure 4: Example of how the MAE changes depending on the value of alpha

The following values of alpha were tested in the model, each time with LassoCV performing a 5-fold cross validation to test how the different alpha values affect the results:

    Table 8 The values tested as alpha in Lasso

    𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150

    The best suited parameter is then set and used in a 5-fold cross validation to get the

    results, which are presented as MAE together with the standard deviation of the

    results.
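The corresponding LassoCV sketch over the alpha values in table 8, with made-up data in place of the delivery dataset:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# The alpha values from table 8.
alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150]

# Made-up data standing in for the delivery dataset.
rng = np.random.RandomState(1)
X = rng.randn(80, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 1.0]) + rng.randn(80) * 0.5

# LassoCV runs a 5-fold cross validation for every alpha and picks the best one.
model = LassoCV(alphas=alphas, cv=5).fit(X, y)
print("best alpha:", model.alpha_)
print("nonzero coefficients:", (model.coef_ != 0).sum())
```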

    3.3.4 Elastic net

    Elastic net uses both L1 and L2-norm regularization on the coefficients, which gives

    the model some of the properties of both Lasso and Ridge regression. Elastic net tries

    to minimize equation 3.4 (Linear Models 2020):

\min_w \; \frac{1}{2 n_{\mathrm{samples}}} ||Xw - y||_2^2 + \alpha \rho ||w||_1 + \frac{\alpha (1 - \rho)}{2} ||w||_2^2    (3.4)


    Table 9 The variables of Elastic net

    w Coefficients y Target values

    X Training data 𝛼 Regularization parameter

    𝜌 Ratio between L1 and L2 penalty

    As seen above, the equation for Elastic net contains two parameters, alpha and 𝜌 (L1

    ratio). To find the parameters most suited for the model, the ElasticNetCV function

    can be used (ElasticNetCV 2020). Another possibility is to use the function

    GridSearchCV which performs an exhaustive search from a grid of specified

    parameters to see which combination provides the best result (GridSearchCV 2020). In

    this study, the alpha values tested are the following:

    Table 10 The values tested as alpha in Elastic net

    𝛼 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 50, 75, 100, 150

The second parameter used in Elastic net is the L1 ratio, which decides the combination of L1 and L2 penalty. This value must be between 0 and 1, with 0 being a pure L2 penalty (Ridge), and 1 being a pure L1 penalty (Lasso). All values between 0 and 1 result in a combined L1 and L2 penalty. It is recommended to try more values of the L1 ratio closer to 1 than to 0 (ElasticNetCV 2020). The L1 ratio values tested by GridSearchCV in this study are:

    Table 11 The values tested as 𝜌 in Elastic net

    𝜌 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1

    After the parameters are set, a 5-fold cross validation is used to get the results of the

    model, which are presented together with the standard deviation.
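A sketch of the grid search, using a reduced grid and made-up data for brevity (the study tests the full grids in tables 10 and 11):

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Reduced grids of alpha and L1 ratio values, for illustration only.
param_grid = {"alpha": [0.01, 0.1, 1, 10], "l1_ratio": [0.1, 0.5, 0.9, 1]}

# Made-up data standing in for the delivery dataset.
rng = np.random.RandomState(2)
X = rng.randn(80, 5)
y = X @ np.array([1.0, 0.0, -1.5, 2.0, 0.0]) + rng.randn(80) * 0.5

# Exhaustive 5-fold cross-validated search over all alpha/l1_ratio
# combinations, scored with the (negated) mean absolute error.
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=5,
                      scoring="neg_mean_absolute_error").fit(X, y)
print("best parameters:", search.best_params_)
```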

    3.4 Evaluation

    After calculating the optimal parameters for the prediction models, these parameters

    were inserted in the models and the results were calculated by using a 5-fold cross

    validation. During the cross validation the dataset is divided into 5 smaller sets of

    equal size, of which 4 will be used for training and the 5th set for testing. This is

    repeated 5 times until all of the smaller sets have been used as both training and

    testing sets (Cross-validation 2020). The function for the actual results of the cross

    validation in Scikit-learn is the cross_val_score function, which returns the results of

    each of the folds in the cross validation. The cross_val_score function offers a few

    different ways of presenting the results, one of which is the MAE that was used in this

study. The result returned by the cross_val_score function for each fold is the average absolute error of the predictions over the tuples in the test set. This means that each fold of the cross-validation only returns one value, which was collected and is presented in the results chapter.
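The evaluation step can be sketched as follows, with made-up data: scikit-learn scorers follow a higher-is-better convention, so cross_val_score returns the negated MAE for each of the 5 folds, and the sign is flipped afterwards:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Made-up data standing in for the delivery dataset.
rng = np.random.RandomState(3)
X = rng.randn(100, 4)
y = X @ np.array([2.0, -1.0, 0.5, 1.0]) + rng.randn(100) * 0.3

# One (negated) MAE score per fold of the 5-fold cross validation.
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
mae_per_fold = -scores

print("MAE per fold:", mae_per_fold)
print("average MAE: ", mae_per_fold.mean())
```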

    3.5 Reliability & Validity

    This study is a quantitative study, which comes with the benefit that the

    interpretations and findings are based on quantitative and measured data rather than

    impressions. But for the analysis to have any value, the data has to be valid

    (Denscombe, 2014). Good validity and reliability are two key principles of ensuring

    that the research has good credibility and is of high quality.

    3.5.1 Reliability

Reliability is a measure of the consistency of the study, meaning that a research instrument should reach approximately the same result every time the test is repeated (Heale

    and Twycross, 2015). This means that the people behind the research should have no

    impact on the outcome of the study, and given everything else being equal, produce

    the same results on different occasions (Denscombe, 2014).

The use of methods that are firmly grounded in previous research, both for the analysis of the data and for the examination of the results, indicates that the study is of high reliability.

    3.5.2 Validity

    The validity refers to the accuracy of the data, as well as if the data is appropriate to

    answer the research question of the study (Denscombe, 2014). The data in this study is

    taken directly from the ERP-system of an actual company, which makes it appropriate

to answer the research question. The data is also of large quantity, spans a two-year period, and was checked for errors before being used.

    This means that the data in the study is of high validity.

    3.6 Research ethics

To conduct research in an ethical way, there are six main principles that need to be considered. These are (Our core principles 2020):

    • The research should maximize the benefit for individuals and society, as

    well as minimize the risk and harm.

    • The research respects the rights and dignity of individuals and groups.


    • Participation in the research should be voluntary and appropriately

    informed.

    • The research is to be conducted with integrity and transparency.

    • That the lines of responsibility and accountability are clearly defined.

    • Research should be independent, and that conflicts of interest are explicit

    in the cases where they cannot be avoided.

There is no reason to believe that the study violates any of these principles. Following the principles is made easier by the fact that no information about individuals is gathered; the only data used is retrieved from the company ERP-system and is presented without any sensitive information.


    4 Results

As data regarding 10 items was chosen, 10 individual predictions were made with the four different models. The results are presented below in table 8 and in figures 6 and 7.

The number of the item is in the header, along with the number of rows (tuples) and attributes. The results of the predictions are presented as the mean absolute error (MAE) together with the standard deviation (SD). Ridge regression, Lasso and Elastic net all use one or two parameters which must be set in the model. The values shown are the parameters that yielded the best results; blank fields mean that the model does not use that parameter.

    Table 8 The results of the prediction models

    Item 1 (160 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 34.923 14.029

    Ridge regression 29.794 5.933 15

    Lasso 29.78 7.286 5

    Elastic net 29.78 7.286 5 1

    Item 2 (159 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 15.785 4.082

    Ridge regression 15.785 4.082 0.001

    Lasso 15.785 4.08 0.001

    Elastic net 15.785 4.08 0.001 1

    Item 3 (188 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 26.631 9.759

    Ridge regression 26.603 9.965 50

    Lasso 26.091 10.491 10

    Elastic net 26.091 10.491 10 1

    Item 4 (374 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 32.974 8.014

    Ridge regression 32.783 8.38 10

    Lasso 32.891 8.374 1

    Elastic net 32.783 8.4 0.05 0.3


    Item 5 (260 x 15)

    MAE SD Alpha L1 ratio

Linear regression 39.594 15.488

    Ridge regression 37.398 10.768 75

    Lasso 39.05 11.389 15

    Elastic net 37.443 10.924 0.3 0

    Item 6 (165 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 37.619 12.131

    Ridge regression 37.567 12.095 1

    Lasso 37.375 7.826 50

    Elastic net 37.375 7.826 50 1

    Item 7 (163 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 27.046 8.319

    Ridge regression 26.678 8.153 150

    Lasso 26.319 8.441 5

    Elastic net 26.319 8.441 5 1

    Item 8 (155 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 26.719 5.191

    Ridge regression 25.778 4.564 150

    Lasso 26.411 5.135 30

    Elastic net 25.82 4.731 3 0.4

    Item 9 (152 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 12.645 3.756

    Ridge regression 12.521 3.927 150

    Lasso 12.164 4.319 5

    Elastic net 12.164 4.319 5 1

    Item 10 (178 x 15)

    MAE SD Alpha L1 ratio

    Linear regression 25.182 8.999

    Ridge regression 24.493 8.646 150

    Lasso 23.809 8.74 15

    Elastic net 23.803 8.772 15 0.95


    Below is a presentation of the average values of the mean absolute error as well as the

    average standard deviation of the models.

    Figure 6: The average MAE of the results

    Figure 7: The average SD of the results


    5 Discussion

The linear regression model with the ordinary least squares method performs worst in all cases, as can be seen in Table 8. This could be expected, as the model is unable to perform any regularization. That regularization is what makes the other models perform better is clear: the results for Item 2 show that linear regression performs in line with the other models when their regularization parameter, alpha, moves towards zero, just as it theoretically should (compare, for example, Equations 1 and 2). The lowest value of alpha the models could use was 0.001, but they might have chosen 0 had it been among the possible values.
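The convergence of the regularized models towards ordinary least squares can be illustrated with a small sketch on synthetic data; the data and the alpha value are assumptions for illustration only:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Illustrative data; the shape mimics one item (159 deliveries, 15 attributes)
X, y = make_regression(n_samples=159, n_features=15, noise=5.0, random_state=1)

ols = LinearRegression().fit(X, y)
ridge_tiny = Ridge(alpha=1e-8).fit(X, y)  # vanishing penalty

# With alpha this small the penalty term is negligible, so the Ridge
# coefficients are practically identical to the least-squares ones.
print(np.max(np.abs(ols.coef_ - ridge_tiny.coef_)))
```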

Ridge regression was developed to deal with some of the shortcomings of the ordinary least squares method, and the results clearly show that it does so. Even though the results are close to each other, Ridge regression performs better than, or as well as, linear regression in all cases.

Lasso was also developed as a continuation and improvement of linear regression and is highlighted for its ability to reduce the number of variables that the solution depends on. This addresses a known problem with standard regression models, which often include too many variables and thereby overfit (Ranstam and Cook, 2018). It is difficult to draw conclusions from the comparison between Lasso and Ridge regression: even though the number of variables is rather small (15), Lasso performs better than Ridge in most cases, but not all (see Table 8). It is therefore hard to say whether Lasso performs better because of its tendency to prefer fewer variables, since there are not many variables to begin with and Lasso does not outperform Ridge in every case.

Elastic net was developed to improve on Lasso, and the results show that it does not just perform better than Lasso; it performs best overall, as can clearly be seen in Figure 6. Thanks to the parameter that controls the ratio between the L1 and L2 penalties, it can behave more like Ridge regression or more like Lasso, depending on which suits the data best. As the L1 ratio approaches 1 the model behaves as Lasso, and as the ratio moves towards 0 it behaves as Ridge regression, just as it theoretically should. This gives Elastic net the flexibility to adapt to the current data and to always perform better than, or as well as, the other models.
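In scikit-learn's formulation this equivalence is exact at the endpoint: with l1_ratio set to 1, the Elastic net objective reduces to the Lasso objective, so the fitted models coincide. A minimal sketch on synthetic data (all values illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, ElasticNet

# Illustrative data; alpha=5 is one of the values appearing in Table 8
X, y = make_regression(n_samples=163, n_features=15, noise=8.0, random_state=2)

lasso = Lasso(alpha=5.0, max_iter=10_000).fit(X, y)
enet = ElasticNet(alpha=5.0, l1_ratio=1.0, max_iter=10_000).fit(X, y)

# With l1_ratio=1 the L2 term vanishes and the two objectives coincide,
# so the fitted coefficient vectors agree to solver tolerance.
print(np.max(np.abs(lasso.coef_ - enet.coef_)))
```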

Hao and Lu (2018) pointed out that the Elastic net model performs especially well when the number of variables is larger than the number of tuples, but as the results show, it performs as well as, or better than, Ridge regression and Lasso even though the number of variables is far smaller than the number of tuples in all cases.

The performance of Elastic net in situations with fewer variables than tuples is also a key point in the study by Zou and Hastie (2005). This, together with the fact that no clear pattern emerges when comparing the items with more tuples to those with fewer, makes it interesting to investigate whether any of the models would stand out as the number of tuples grows or shrinks. At least in theory, Elastic net should perform even better as the number of tuples shrinks below the number of variables.


Previous research in the area of production planning under uncertainty has resulted in models that are an improvement over production planning that takes no account of the uncertainties involved. The area is thus already useful for practitioners and researchers, but including this data-driven method for forecasting deliveries could improve the models even further.


    6 Conclusion

The best overall model for predicting the delivery times is Elastic net: it is the best predictor in almost all cases and very close to the best in the rest. It clearly adapts to behave more like Ridge regression or like Lasso, depending on which works best. Ordinary linear regression performs worst in more or less all cases and should therefore not be used for this purpose.

This means that Elastic net should be chosen as the predictive model when calculating the actual delivery times from suppliers. The results can then be used to mitigate supply uncertainty in a larger model for production planning, or independently in supply chain risk management.

    6.1 Future research

The sample sizes in this study span from 152 to 374 observations, which means that these parts were delivered multiple times per month during the two years studied. For a producing company, many parts from the suppliers will be delivered far less often than that, which results in a weaker basis for a prediction model. Future research could investigate whether a similar result would appear when the number of observations is smaller, and how small that number can be while the model still provides a meaningful result.

As previous studies in the area often simplify the calculation of uncertainty on the supply side, it would be interesting to see a predictive model, preferably Elastic net, implemented in a larger model. This, however, requires data from a real-world case. The results could then be compared with those of the previous models to see whether there is any improvement.


    References

    1.1. Linear Models — scikit-learn 0.22.2 documentation [WWW Document], n.d. URL

    https://scikit-learn.org/stable/modules/linear_model.html#lasso (accessed

    3.19.20).

    3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.22.2

    documentation [WWW Document], n.d. URL https://scikit-

    learn.org/stable/modules/cross_validation.html#cross-validation (accessed

    5.1.20).

    3.2.4.1.1. sklearn.linear_model.ElasticNetCV — scikit-learn 0.22.2 documentation

    [WWW Document], n.d. URL https://scikit-

    learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#

    sklearn.linear_model.ElasticNetCV (accessed 5.1.20).

    3.2.4.1.9. sklearn.linear_model.RidgeCV — scikit-learn 0.22.2 documentation [WWW

    Document], n.d. URL https://scikit-

    learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#skle

    arn.linear_model.RidgeCV (accessed 4.30.20).

    3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 0.22.2

    documentation [WWW Document], n.d. URL https://scikit-

    learn.org/stable/modules/model_evaluation.html#regression-metrics

    (accessed 5.4.20).

    Anaconda Distribution — Anaconda documentation [WWW Document], n.d. URL

    https://docs.anaconda.com/anaconda/ (accessed 3.19.20).

    Aouam, T., Geryl, K., Kumar, K., Brahimi, N., 2018. Production planning with order

    acceptance and demand uncertainty. Comput. Oper. Res. 91, 145–159.

    https://doi.org/10.1016/j.cor.2017.11.013

    Beemsterboer, B., Land, M., Teunter, R., 2016. Hybrid MTO-MTS production planning:

    An explorative study. Eur. J. Oper. Res. 248, 453–461.

    https://doi.org/10.1016/j.ejor.2015.07.037

    Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 2013. Introduction, in: Time Series Analysis.

    John Wiley & Sons, Ltd, pp. 7–18. https://doi.org/10.1002/9781118619193.ch1

    Braz, A.C., De Mello, A.M., de Vasconcelos Gomes, L.A., de Souza Nascimento, P.T.,

    2018. The bullwhip effect in closed-loop supply chains: A systematic literature

    review. J. Clean. Prod. 202, 376–389.

    https://doi.org/10.1016/j.jclepro.2018.08.042

    Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae,

    V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly,

    A., Holt, B., Varoquaux, G., 2013. API design for machine learning software:

    experiences from the scikit-learn project. ArXiv13090238 Cs.

    Bushuev, M.A., Guiffrida, A.L., 2012. Optimal position of supply chain delivery

    window: Concepts and general conditions. Int. J. Prod. Econ. 137, 226–234.

    https://doi.org/10.1016/j.ijpe.2012.01.039


    Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error

    (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model

    Dev. 7, 1247–1250. https://doi.org/10.5194/gmd-7-1247-2014

    Choosing the right estimator — scikit-learn 0.22.2 documentation [WWW Document],

    n.d. URL https://scikit-

    learn.org/stable/tutorial/machine_learning_map/index.html (accessed 3.19.20).

    Davidson, I., Tayi, G., 2009. Data preparation using data quality matrices for

    classification mining. Eur. J. Oper. Res. 197, 764–772.

    https://doi.org/10.1016/j.ejor.2008.07.019

    de Myttenaere, A., Golden, B., Le Grand, B., Rossi, F., 2016. Mean Absolute Percentage

    Error for regression models. Neurocomputing 192, 38–48.

    https://doi.org/10.1016/j.neucom.2015.12.114

    Denscombe, M., 2014. The Good Research Guide : For Small-scale Research Projects,

    Open UP Study Skills. McGraw-Hill Education, Maidenhead, Berkshire.

    Er, M., Arsad, N., Astuti, H., Kusumawardani, R., Utami, R., 2018. Analysis of

    production planning in a global manufacturing company with process

    mining. J. Enterp. Inf. Manag. 31, 317–337. https://doi.org/10.1108/JEIM-01-

    2017-0003

    Flores, B.E., 1986. A pragmatic view of accuracy measurement in forecasting. Omega

    14, 93–98. https://doi.org/10.1016/0305-0483(86)90013-7

    Gao, L., Yang, N., Zhang, R., Luo, T., 2017. Dynamic Supply Risk Management with

    Signal-Based Forecast, Multi-Sourcing, and Discretionary Selling. Prod. Oper.

    Manag. 26, 1399–1415. https://doi.org/10.1111/poms.12695

    Hao, Y.-X., Lu, D., 2018. Sparse approximation of fitting surface by elastic net.

    Comput. Appl. Math. 37, 2784–2794. https://doi.org/10.1007/s40314-017-0475-4

    Hastie, T., Tibshirani, R., Friedman, J., 2009. Linear Methods for Regression, in: Hastie,

    T., Tibshirani, R., Friedman, J. (Eds.), The Elements of Statistical Learning:

    Data Mining, Inference, and Prediction, Springer Series in Statistics. Springer,

    New York, NY, pp. 43–99. https://doi.org/10.1007/978-0-387-84858-7_3

    Heale, R., Twycross, A., 2015. Validity and reliability in quantitative studies. Evid.

    Based Nurs. 18, 66–67. https://doi.org/10.1136/eb-2015-102129

    Hübl, A., 2018. Stochastic Modelling in Production Planning. Springer Fachmedien

    Wiesbaden, Wiesbaden. https://doi.org/10.1007/978-3-658-19120-7

    Jaggi, C., Verma, M., Jain, R., 2018. Quantitative analysis for measuring and

    suppressing bullwhip effect. Yugosl. J. Oper. Res. 28, 415–433.

    https://doi.org/10.2298/YJOR161211019J

    Jiawei Han, Micheline Kamber, Jian Pei, 2012. 3 - Data Preprocessing.

    https://doi.org/10.1016/B978-0-12-381479-1.00003-4

    Khakdaman, M., Wong, K.Y., Zohoori, B., Tiwari, M.K., Merkert, R., 2015. Tactical

    production planning in a hybrid Make-to-Stock–Make-to-Order environment

    under supply, process and demand uncertainties: a robust optimisation

    model. Int. J. Prod. Res. 53, 1358–1386.

    https://doi.org/10.1080/00207543.2014.935828


    Liu, Y., Liao, S., Jiang, S., Ding, L., Lin, H., Wang, W., 2020. Fast Cross-Validation for

    Kernel-Based Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1083–

    1096. https://doi.org/10.1109/TPAMI.2019.2892371

    Makridakis, S., 1993. Accuracy measures: theoretical and practical concerns. Int. J.

    Forecast. 9, 527–529. https://doi.org/10.1016/0169-2070(93)90079-3

    Mendo, L., 2009. Estimation of a probability with guaranteed normalized mean

    absolute error. IEEE Commun. Lett. 13, 817–819.

    https://doi.org/10.1109/LCOMM.2009.091128

    Mirzapour Al-e-hashem, S.M.J., Malekly, H., Aryanezhad, M.B., 2011. A multi-

    objective robust optimization model for multi-product multi-site aggregate

    production planning in a supply chain under uncertainty. Int. J. Prod. Econ.

    134, 28–42. https://doi.org/10.1016/j.ijpe.2011.01.027

    Mula, J., Poler, R., García-Sabater, J.P., Lario, F.C., 2006. Models for production

    planning under uncertainty: A review. Int. J. Prod. Econ. 103, 271–285.

    https://doi.org/10.1016/j.ijpe.2005.09.001

    “neural-networks” tag wiki [WWW Document], n.d. . Math. Stack Exch. URL

    https://math.stackexchange.com/tags/neural-networks/info (accessed 5.17.20).

    Ngniatedema, T., Fono, L.A., Mbondo, G.D., 2015. A delayed product customization

    cost model with supplier delivery performance. Eur. J. Oper. Res. 243, 109–

    119. https://doi.org/10.1016/j.ejor.2014.11.017

    Our core principles - Economic and Social Research Council [WWW Document], n.d.

    URL https://esrc.ukri.org/funding/guidance-for-applicants/research-

    ethics/our-core-principles/ (accessed 3.16.20).

    Pandis, N., 2016. Linear regression. Am. J. Orthod. Dentofacial Orthop. 149, 431–434.

    https://doi.org/10.1016/j.ajodo.2015.11.019

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel,

    M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,

    Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É., 2011. Scikit-learn:

    Machine Learning in Python. J. Mach. Learn. Res. 12, 2825−2830.

    Peidro, D., Mula, J., Jiménez, M., del Mar Botella, M., 2010. A fuzzy linear

    programming based approach for tactical supply chain planning in an

    uncertainty environment. Eur. J. Oper. Res. 205, 65–80.

    https://doi.org/10.1016/j.ejor.2009.11.031

    Ranstam, J., Cook, J.A., 2018. LASSO regression. BJS Br. J. Surg. 105, 1348–1348.

    https://doi.org/10.1002/bjs.10895

    Russo, A., Raischel, F., Lind, P.G., 2013. Air quality prediction using optimal neural

    networks with stochastic variables. Atmos. Environ. 79, 822–830.

    https://doi.org/10.1016/j.atmosenv.2013.07.072

    Schmidt, A.F., Finan, C., 2018. Linear regression and the normality assumption. J. Clin.

    Epidemiol. 98, 146–151. https://doi.org/10.1016/j.jclinepi.2017.12.006

    Shanmuganathan, S., Samarasinghe, S. (Eds.), 2016. Artificial Neural Network

    Modelling, Studies in Computational Intelligence. Springer International

    Publishing, Cham. https://doi.org/10.1007/978-3-319-28495-8


    Silva, T.C., Ribeiro, A.A., Periçaro, G.A., 2018. A new accelerated algorithm for ill-

    conditioned ridge regression problems. Comput. Appl. Math. 37, 1941–1958.

    https://doi.org/10.1007/s40314-017-0430-4

    sklearn.linear_model.Lasso — scikit-learn 0.22.2 documentation [WWW Document],

    n.d. URL https://scikit-

    learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.

    linear_model.Lasso (accessed 4.30.20).

    sklearn.linear_model.LinearRegression — scikit-learn 0.22.2 documentation [WWW

    Document], n.d. URL https://scikit-

    learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.ht

    ml#sklearn.linear_model.LinearRegression (accessed 4.29.20).

    sklearn.model_selection.GridSearchCV — scikit-learn 0.22.2 documentation [WWW

    Document], n.d. URL https://scikit-

    learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.ht

    ml (accessed 5.5.20).

    Soo, K.W., Rottman, B.M., 2018. Causal strength induction from time series data. J.

    Exp. Psychol. Gen. 147, 485–513.

    http://dx.doi.org.proxybib.miun.se/10.1037/xge0000423

    Su, X., Yan, X., Tsai, C.-L., 2012. Linear regression. WIREs Comput. Stat. 4, 275–294.

    https://doi.org/10.1002/wics.1198

    Sweeney, E., Grant, D.B., Mangan, D.J., 2018. Strategic adoption of logistics and

    supply chain management. Int. J. Oper. Prod. Manag. 38, 852–873.

    https://doi.org/10.1108/IJOPM-05-2016-0258

    Varoquaux, G., 2018. Cross-validation failure: Small sample sizes lead to large error

    bars. NeuroImage 180, 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061

    Wang, S., Ji, B., Zhao, J., Liu, W., Xu, T., 2018. Predicting ship fuel consumption based

    on LASSO regression. Transp. Res. Part Transp. Environ. 65, 817–824.

    https://doi.org/10.1016/j.trd.2017.09.014

    Weiss, C.J., 2017. Introduction to Stochastic Simulations for Chemical and Physical

    Processes: Principles and Applications. J. Chem. Educ. 94, 1904–1910.

    https://doi.org/10.1021/acs.jchemed.7b00395

    Welc, J., Esquerdo, P.J.R., 2018. Basics of Regression Models, in: Welc, J., Esquerdo,

    P.J.R. (Eds.), Applied Regression Analysis for Business: Tools, Traps and

    Applications. Springer International Publishing, Cham, pp. 1–6.

    https://doi.org/10.1007/978-3-319-71156-0_1

    Zou, H., Hastie, T., 2005. Regularization and Variable Selection via the Elastic Net. J.

    R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320.

