Republic of Sudan
Ministry of High Education & Scientific Research
Nile Valley University
College of Post Graduate Studies
Lumpy Demand
Reported by/
Abdulhmeed Mohamed Elhassan Mahjoub Ali
Lumpy Demand
Introduction
Demand forecasting is one of the most crucial issues of inventory management.
Forecasts, which form the basis for the planning of inventory levels, are probably
the biggest challenge in the repair and overhaul industry.
The problem of controlling items with lumpy demand patterns has received
relatively little attention, even though these items constitute an appreciable portion
of the inventories in parts and supply types of stockholdings. Lumpy demand arises
in service parts and electronics components when there are variations in volumes
associated with the product mix, and with intervals between demands being fairly
erratic and unpredictable.
Lumpy demand (also called intermittent or slow-moving demand) occurs when time
periods without demand are followed, without warning, by a period with demand. Such
demand is even more difficult to forecast. If the demand is underestimated, sales
and therefore revenues are lost. If the demand is overestimated, the stock increases
in the best case and, in the worst case, the items lie unsold until they become
obsolete.
Definitions
Forecasting lumpy demand differs from forecasting a demand that occurs in every
forecasting period. Silver (1981) defines lumpy demand as demand for which both the
demand size and the periods between the demands are random. Croston (1972) defines
demand as lumpy when it is zero in a number of forecasting periods. When zero
demands are present, some of the most widely used forecasting methods start to
overestimate the demand. Croston (1972) suggested that the forecast should be split
in two: one forecast for the demand size and one for the number of periods between
the demand occasions (the inter-demand interval). Syntetos and Boylan (2001) proved
that the Croston forecasting method is also biased: it overestimates the demand.
Several alternatives to Croston’s forecasting method have been proposed, among
them Segerstedt (2000), Snyder (2002), Willemain et al (2004) and Syntetos and
Boylan (2005). [1]
Forecasting Methods
Two of the most common methods for forecasting lumpy demand are single
exponential smoothing (SES) and the moving average. The moving average is the mean
of the demand in a fixed number of previous periods; as a new demand occurs, it
replaces the oldest (Makridakis et al, 1998). However, the ability of SES to
forecast slow-moving items or lumpy demand has been questioned. Croston (1972)
presents a method that is updated only when there is a demand, which should
increase the forecast precision. The method consists of two forecasts, one for the
demand size and the other for the inter-demand interval. Segerstedt (2000) suggests
a variant of Croston with a single forecast. Syntetos and Boylan (2005) present a
version of Croston with an added bias correction. Bias is a systematic error: the
forecast is, on average, significantly above or below the demand during the
forecasted periods.
1 Single Exponential Smoothing (SES)
Single exponential smoothing is a technique applied in different fields,
such as forecasting (Brown, 1959) and process regulation (Montgomery, 2005).
According to Gardner (2006), the method was originally developed for
antisubmarine purposes: Brown used a variant of exponential smoothing to
create a tracking model for fire-control information on the location of the
submarine. Makridakis and Hibon (1991) consider SES to be a robust method
that is easy to use.
In every time period the model is re-estimated with the most recent available
demand data and the previous forecast. The smoothing constant, α, regulates
how much influence the forecast error has. The forecast error is the difference
between the real demand X_t and the forecasted demand X̂_t
(Montgomery et al, 1990):
$$\hat{X}_{t+1} = \hat{X}_t + \alpha\,(X_t - \hat{X}_t)$$
Another way of describing the function of the smoothing
constant is that the different observations have weights that decrease geometrically
with age. The smoothing constant regulates the influence of historical values: a low
smoothing constant emphasizes the past, which is favorable when demand is stable,
but then the technique is slow to react if systematic changes occur. A high
smoothing constant emphasizes the most recent observations, which is better
suited when faster reaction is wanted, but the drawback is sensitivity to random
changes (Montgomery et al, 1990).
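The SES update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `ses_forecasts`, the starting forecast and the demand series are hypothetical and do not come from the cited sources.

```python
def ses_forecasts(demand, alpha, initial_forecast):
    """Return one-step-ahead SES forecasts.

    forecasts[t] is the forecast for period t made before demand[t] is seen;
    the last element is the forecast for the period after the final observation.
    """
    forecasts = [initial_forecast]
    for x in demand:
        previous = forecasts[-1]
        # X_hat(t+1) = X_hat(t) + alpha * (X(t) - X_hat(t))
        forecasts.append(previous + alpha * (x - previous))
    return forecasts


# Illustrative lumpy series (assumed values, not taken from the report).
demand = [0, 0, 6, 0, 0, 0, 4, 0]
low = ses_forecasts(demand, alpha=0.1, initial_forecast=1.0)
high = ses_forecasts(demand, alpha=0.5, initial_forecast=1.0)
```

Comparing `low` and `high` period by period shows the trade-off described above: the larger constant jumps further after the demand of 6 and also falls back faster during the zero periods that follow.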
In a practical application, different smoothing constants should be used for
different classes of items, and this also applies to SES. A
smoothing constant between 0.1 and 0.3 is suitable for SES when forecasts are
made on a monthly basis (Silver et al, 1998).
The weight given to data from k periods ago can be expressed as
α(1 − α)^k, which makes the average age of the data:
$$\sum_{k=0}^{\infty} k\,\alpha(1-\alpha)^{k} = \frac{1-\alpha}{\alpha}$$
With a higher resolution of the forecast intervals (shorter time periods), the
probability of periods with zero demand increases. If several consecutive periods
have zero demand, the forecast decreases and eventually approaches zero. This
scenario is most likely to occur when the items are slow moving, that is, when
they have a lumpy demand. An alternative is to update the forecast only after
a demand has occurred. This makes SES biased, which is not the case when the
update occurs in every period (Boylan and Syntetos, 2007).
The conclusions about SES as a forecasting method for lumpy demand vary. Croston
(1972) discusses the problem that SES overestimates the demand when the
forecast update takes place right after a demand. Boylan and Johnston (1996)
consider SES to be suitable when the inter-demand interval is 1.25 periods or
lower. Eaves and Kingsman (2004), on the other hand, conclude that SES can be used
as a method for demand that is lumpy.
2 The Croston Method (Croston)
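Croston (1972) presented a solution for slow-moving items. He suggests
that the forecast should be divided in two parts: one for the demand size and one
for the inter-demand interval. The forecast is recalculated only when there is a
demand:
$$\hat{z}_{t+1} = \hat{z}_t + \alpha\,(X_t - \hat{z}_t), \qquad \hat{p}_{t+1} = \hat{p}_t + \beta\,(q_t - \hat{p}_t) \qquad \text{if } X_t > 0$$
where ẑ is the forecast of the demand size, p̂ is the forecast of the inter-demand
interval, q_t is the number of periods since the previous demand, and α and β are
the smoothing constants. When X_t = 0, both forecasts are left unchanged.
The two exponential smoothing forecasts are then combined to estimate the mean
demand per period:
$$\hat{X}_{t+1} = \frac{\hat{z}_{t+1}}{\hat{p}_{t+1}}$$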
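Syntetos and Boylan (2001) showed that the original Croston method
overestimates the demand and is therefore biased. They suggested a
modification to the Croston method (SyBo). Bias means that the forecast is, on
average, significantly above or below the demand during the forecasted periods. The
modification can be described as a bias-correcting function. In eq. (5) a bias
corrector is added to the original Croston forecast; the forecast updates are the
same as for the original Croston:
$$\hat{X}_{t+1} = \left(1 - \frac{\beta}{2}\right)\frac{\hat{z}_{t+1}}{\hat{p}_{t+1}} \qquad (5)$$
where ẑ and p̂ are the Croston forecasts of the demand size and of the inter-demand
interval, and β is the smoothing constant of the interval forecast.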
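The Croston update and the Syntetos and Boylan correction can be sketched as follows. This is a hedged illustration in the commonly published form of the method: the function name `croston`, the initialisation from the first observed demand and the example series are assumptions, not details taken from the cited papers.

```python
def croston(demand, alpha, beta, bias_correction=False):
    """Croston-style forecast of mean demand per period.

    Element t of the result is the forecast available after period t
    (None until the first non-zero demand initialises the estimates).
    """
    size = None                # smoothed demand size
    interval = None            # smoothed inter-demand interval
    periods_since_demand = 1
    forecasts = []
    for x in demand:
        if x > 0:
            if size is None:
                # Assumed initialisation: first observed size and interval.
                size, interval = float(x), float(periods_since_demand)
            else:
                size += alpha * (x - size)
                interval += beta * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            # No demand: the estimates stay unchanged, only the counter moves.
            periods_since_demand += 1
        if size is None:
            forecasts.append(None)
        else:
            rate = size / interval
            if bias_correction:
                rate *= 1 - beta / 2   # Syntetos-Boylan correction factor
            forecasts.append(rate)
    return forecasts


demand = [0, 4, 0, 0, 4, 0, 0, 0, 6]   # illustrative values only
plain = croston(demand, alpha=0.2, beta=0.2)
corrected = croston(demand, alpha=0.2, beta=0.2, bias_correction=True)
```

With the correction enabled, every forecast is the plain Croston forecast multiplied by the same factor, so `corrected` is always slightly below `plain`.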
3 Modified Croston (ModCr)
SES uses one smoothing constant, whereas Croston and the Syntetos and Boylan
modification (SyBo) use two, which increases the complexity compared to SES. The
Croston technique forecasts the mean demand size and the mean inter-demand
interval. Another interpretation of their quotient is that it represents the demand
rate. Levén and Segerstedt (2004) presented a version of Croston, Modified Croston
(ModCr), that forecasts the demand rate directly instead of separating the forecast
into demand and inter-demand interval; it therefore requires only one smoothing
constant. The update occurs when there is a demand, but at most once per working
day. If there are several demands in a day, the demands are summed. The demand rate
is the quotient between the demand and the inter-demand interval. In a simulation
study the method was shown to perform better than SES. The method was proposed to
avoid the bias problem of Croston.
The idea behind ModCr is to avoid having to decide which method to use in a
practical application, SES or Croston. When a withdrawal takes place in every time
period (working day), the inter-demand interval is always one period and ModCr is
equal to SES. The smoothing constant for ModCr must have lower starting values,
probably 0.05-0.3, when the resolution of the time period is higher, that is, days
or weeks instead of months (this is also valid for other forecast methods). Items
with a high frequency of withdrawals or demand (every day) should have a lower
constant with ModCr than with SES if the forecast interval for SES is much longer
(weeks, months) than for ModCr (days).
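A minimal sketch of the ModCr idea, assuming the demand rate is smoothed with a single constant each time a demand occurs. The function name `modified_croston`, the starting rate and the example values are hypothetical, and the exact formulation in Levén and Segerstedt (2004) may differ in detail.

```python
def modified_croston(demand, alpha, initial_rate):
    """Smooth the demand rate (demand / inter-demand interval) directly.

    Element t of the result is the rate forecast available after period t.
    """
    rate = initial_rate
    periods_since_demand = 1
    rates = []
    for x in demand:
        if x > 0:
            observed_rate = x / periods_since_demand
            rate += alpha * (observed_rate - rate)   # updated only on a demand
            periods_since_demand = 1
        else:
            periods_since_demand += 1
        rates.append(rate)
    return rates


# Demand in every period: the interval is always 1, so this is plain SES.
every_day = modified_croston([4, 6, 5], alpha=0.2, initial_rate=5.0)
# Lumpy case: a demand of 6 after three periods is treated as a rate of 2.
lumpy = modified_croston([0, 0, 6], alpha=0.5, initial_rate=1.0)
```

The first call reproduces the SES recursion term by term, which is the equivalence stated above for a withdrawal in every period.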
The simulation study of Levén and Segerstedt (2004) has not been confirmed
in other studies. Teunter and Sani (2009) find that ModCr has a bias problem
that is more severe than that of Croston, which also tends to overestimate the
demand. This confirms the results presented by Syntetos and Boylan (2007).
They also found that the bias of ModCr does not depend on the value of the
smoothing constant. The statement that ModCr is unbiased is a relevant claim
when the demand rate is considered; however, this is not what the method is
supposed to forecast (Syntetos and Boylan, 2007). Gardner (2006) stated that no
evidence had been presented to support the claim of unbiasedness.[2]
4 Teunter, Syntetos, Babai (TeunterSB)
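Teunter et al (2011) present a new idea for forecasting lumpy demand. In every
time period a probability of demand is updated, and a forecast of the expected
demanded quantity is updated only when there is a demand:
$$\hat{\pi}_{t+1} = \hat{\pi}_t + \beta\,(I_t - \hat{\pi}_t), \qquad \hat{z}_{t+1} = \begin{cases} \hat{z}_t + \alpha\,(X_t - \hat{z}_t) & \text{if } X_t > 0 \\ \hat{z}_t & \text{otherwise} \end{cases}$$
where I_t = 1 if X_t > 0 and 0 otherwise, π̂ is the estimated probability of demand
and ẑ is the estimated demand size. The forecast is their product,
X̂_{t+1} = π̂_{t+1} ẑ_{t+1}.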
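The two updates can be sketched as follows. This is an illustration in the commonly published form of the method, with the forecast taken as demand probability times demand size; the function name `tsb`, the starting values and the example series are assumptions.

```python
def tsb(demand, alpha, beta, initial_probability, initial_size):
    """Teunter-Syntetos-Babai style forecast: demand probability times size.

    Element t of the result is the forecast available after period t.
    """
    probability = initial_probability
    size = initial_size
    forecasts = []
    for x in demand:
        occurred = 1.0 if x > 0 else 0.0
        probability += beta * (occurred - probability)   # updated every period
        if x > 0:
            size += alpha * (x - size)                   # updated only on a demand
        forecasts.append(probability * size)
    return forecasts


# A run of zero-demand periods (illustrative values only).
decaying = tsb([0, 0, 0, 0], alpha=0.2, beta=0.5,
               initial_probability=0.5, initial_size=4.0)
```

Because the probability is updated in every period, the forecast shrinks during a run of zeros, whereas a Croston forecast stays unchanged until the next demand.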
In a practical application, different smoothing constants should be used for
different classes of items, and this also applies to SES. Silver et al
(1998) suggest that a smoothing constant between 0.1 and 0.3 is mostly suitable for
SES when forecasts are made on a monthly basis. With ModCr, items with a high
frequency of withdrawals or demand (every day) should have a lower smoothing
constant than with SES, as the forecast interval for SES is longer (weeks, months)
than for ModCr (days); this is also valid for other forecast methods.[3]
5 Markov Chain Model
In a first-order Markov chain model, the next state is estimated from the
one-step transition probability matrix only, whereas the k-step transition
matrix is needed for a kth-order Markov chain model. Higher-order (kth-order)
Markov chain models were first proposed by Raftery [22].
Raftery’s model is extended by [23,24] to a more general higher-order Markov
chain model given in (1):
$$\hat{X}^{(n)} = \sum_{i=1}^{k} \lambda_i\,\hat{Q}_i\,X^{(n-i)} \qquad (1)$$
X̂(n) is the state vector which is the prediction of the next state at time n, Q̂i
is the i-step transition probability matrix, and λi are the weights, given by [24]
as nonnegative real numbers such that
$$\sum_{i=1}^{k} \lambda_i = 1$$
The λi (i = 1,2, . . . , k) can be estimated by maximum likelihood estimation
or obtained by solving a linear programming (LP) model proposed by [23].
6 Modified Markov Chain Model
Accurate estimates of lumpy demand cannot be obtained by applying the
Markov chain model directly, because of the high percentage of zero demands. A
modification is needed when obtaining forecast values. The steps of the proposed
algorithm are described in this section. All computations were performed by
computer code written in Matlab.
Step 1. The frequency of each demand value is obtained for data set j (j = 1,2, . . .
,695). While some demand values (such as 0, 1, 2 and 3) are frequently observed,
the frequencies of some demand values (such as 8,9, . . . ,12) are low. Each
demand value with high frequency corresponds to a state of a Markov chain. The
demands with low frequencies are collected under a single group in order to be
reconsidered in the procedure of forecasting.
Step 2. The states of the Markov chain are determined.
Since the Markov chain model is mostly used to model categorical data, the states
of the Markov chain are usually categorical variables. In the modified Markov
chain model, however, the states include the demand information.
Step 3. The one-step and two-step transition probability matrices are computed.
Step 4. Steady state probabilities are calculated.
Step 5. The linear programming (LP) model is constructed.
The steady state probabilities and the transition probabilities are the parameters
of the LP model, and the λi are the decision variables.
Step 6. The λ values are calculated as a solution of the LP model.
The number of λ values determines the order of the Markov chain model. A first
order Markov chain model is required for some of the data sets, whereas a second
order Markov chain model is constructed for others.
Step 7. Using the λ values and the transition probability matrices, the Markov
chain model is constructed for data set j.
Step 8. According to the resulting Markov chain model, the demand forecasts are
obtained.
The modification of the Markov chain model lies mostly in this step. [23]
states that, according to the k-th order state probability distribution, the
prediction of the next state (X̂(n)) must be taken as the state with the maximum
probability. The state with the highest probability is indeed the most likely to
occur, but any other state with a non-zero probability may also occur, with a
likelihood given by its probability. We claim that considering all the states in
proportion to their probabilities when estimating the next state will yield better
forecast values. Therefore, the proposed procedure considers not only the state
with the highest probability but all the states, in proportion to their
probabilities, when estimating the next state of the Markov chain. If more than one
low-frequency demand value corresponds to a state, the demands within this state
are assumed to occur with equal probability. Consequently, the modified Markov
chain model determines the next state by taking into account all states relative
to their probabilities, whereas in the Markov chain model the state with the
maximum probability determines the next state.
Step 9. The accuracy measure r, which is the measure used for the Markov chain
model, is calculated for the forecast values.
Step 10. Steps 8 and 9 are repeated to obtain the highest r value.
Since the determination of forecast values depends on the probabilities of the
states, a different sequence is obtained whenever Step 8 is repeated. Step 8 is
repeated several times, and the set of forecasts that gives the highest r value is
recorded as the final forecast values.
Step 11. The lead time demands (LTD) are computed according to the lead time
information for data set j. Each data set may have a different lead time. Summing
the monthly demand forecasts over the lead time gives the forecast of LTD.
Step 12. Steps 8-11 are repeated many times for the same data set and the LTD
distribution is obtained.
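The difference between the two rules in Step 8, and the repeated sampling of Step 12, can be sketched in Python. This is a simplified illustration, not the authors' Matlab code: it uses a first-order chain only, treats every distinct demand value as its own state (no grouping of low-frequency values), omits the LP-based weights, and uses hypothetical function names and demand history.

```python
import random
from collections import Counter, defaultdict


def transition_probabilities(states):
    """One-step transition probabilities estimated from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        s: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for s, row in counts.items()
    }


def most_likely_next(probabilities, current):
    """Plain Markov chain rule: take the state with the maximum probability."""
    row = probabilities[current]
    return max(row, key=row.get)


def sampled_next(probabilities, current, rng):
    """Modified rule: draw the next state in proportion to its probability."""
    row = probabilities[current]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


history = [0, 0, 2, 0, 0, 0, 1, 0, 2, 0, 0, 1]   # illustrative monthly demands
P = transition_probabilities(history)
rng = random.Random(0)                            # fixed seed for repeatability
lead_time = 3

# Repeating the sampled forecast many times gives an empirical LTD distribution.
ltd_samples = []
for _ in range(1000):
    state, total = history[-1], 0
    for _ in range(lead_time):
        state = sampled_next(P, state, rng)
        total += state                            # here the state is the demand value
    ltd_samples.append(total)
```

For this history, `most_likely_next` returns 0 from every state, so the plain rule would forecast zero demand over any lead time, while the sampled rule produces a spread of lead time demands in `ltd_samples`.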
7 Holt-Winters Methods
Additive Winter and multiplicative Winter are two methods proposed by Holt and
Winters in order to account for possible seasonal effects. A first way to account
for such effects is the introduction of a drift D, which modifies the levelled
values according to variables that depend on time. The drift D is a function which
represents the trend. For example, a model that accounts for the trend effect
consists of two smoothing relations.
The first can be seen as a weighted average of the observed value (yt) and
the forecast calculated at the previous period; the second as a weighted average of
the difference between the forecasts calculated at periods t and t-1 and the drift
calculated at period t-1 (attributing a weight equal to 1 to this last term is
equivalent to assuming a linear trend, that is, a constant drift).
The additive Winter (AW) and multiplicative Winter (MW) methods extend
this first example so that it also accounts for seasonality in the strict sense.
The additive Winter method adds to these relations a seasonality factor st with
periodicity p (4 for quarterly data, 12 for monthly data, and so on), and the
seasonality factor enters the demand forecast for period t as an additive term.
In parallel, the multiplicative Winter method uses corresponding relations in which
the seasonality factor enters the demand forecast for period t as a multiplier.
These models are very flexible, because they can also handle non-polynomial
trends and non-constant seasonality. With regard to the choice of the smoothing
weights, the values that minimize the sum of squared gaps between forecast and
observation can be taken or, alternatively, they can be chosen in line with the
scope of the analysis.
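As an illustration of the additive variant, the sketch below uses the standard textbook recurrences for level, trend (drift) and seasonality, which may differ in notation or detail from the relations of the cited source. The function name, the weight names `alpha`, `beta`, `gamma` and the initialisation from the first two seasons are assumptions.

```python
def additive_holt_winters(y, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` periods ahead with additive trend and seasonality."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first, second = y[:period], y[period:2 * period]
    # Assumed initialisation: level and seasonals from the first season,
    # drift from the change in the seasonal mean between the first two seasons.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [value - level for value in first]
    for t in range(period, len(y)):
        previous_level = level
        s = seasonal[t - period]
        level = alpha * (y[t] - s) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] - level) + (1 - gamma) * s)
    n = len(y)
    return [
        level + h * trend + seasonal[n - period + (h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Purely seasonal quarterly series (illustrative): the pattern should repeat.
quarterly = [10, 20, 30, 40] * 3
next_year = additive_holt_winters(quarterly, period=4, alpha=0.3,
                                  beta=0.1, gamma=0.2, horizon=4)
```

The multiplicative variant would divide by and multiply with the seasonal factor where this sketch subtracts and adds it.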
8 Bootstrap Method
Hua et al. (2006, p.1037) say that when historical data are limited, the
bootstrap method is a useful tool to estimate the demand of spare parts.
Bookbinder and Lordahl (1989, p 303) found the bootstrap superior to the normal
approximation for estimating high percentiles of spare parts demand for
independent data. Wang and Rao (1992, p 333-336) also found the bootstrap
effective for dealing with smooth demand. None of these papers considers the
special problems of managing intermittent demand. Willemain et al. (2004,
p.377-381) provided an approach to forecasting intermittent demand for service
parts inventories. They developed a bootstrap-based approach to forecast the
distribution of the sum of intermittent demands over a fixed lead time.
Bootstrapping is a modern, computer-intensive, general purpose approach to
statistical inference, falling within a broader class of re-sampling methods.
Bootstrapping is the practice of estimating properties of an estimator (such as
its variance) by measuring those properties when sampling from an approximating
distribution. One standard choice for an approximating distribution is the
empirical distribution of the observed data. In the case where a set of
observations can be assumed to be from an independent and identically distributed
population, this can be implemented by constructing a number of re-samples of the
observed dataset (each of equal size to the observed dataset), each of which is
obtained by random sampling with replacement from the original dataset.
The bootstrap procedure can be illustrated with the following steps:
1- take an observed sample (in our case a sample of historical spare parts
demand) of size n, called X = (x1, x2, …, xn);
2- from X, resample m other samples of size n, obtaining X1, X2, …,
Xm (in every bootstrap extraction, an observation of the sample can be
drawn more than once, and every observation has probability 1/n of being
drawn);
3- given T, the estimator of the parameter under study (in our case it may be the
average demand), calculate T for every bootstrap sample. In this way we have
m estimates of the parameter;
4- from these estimates calculate the desired value: in our case the mean of T1,
…, Tm can be the demand forecast.
This method can be applied not only to find the average demand (which can be the
demand forecast) but also the intervals between non-zero demands or other desired
values.
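The four steps above can be sketched with the Python standard library. The function name `bootstrap_estimates`, the number of resamples and the demand history are illustrative assumptions.

```python
import random
from statistics import mean


def bootstrap_estimates(sample, m, estimator=mean, rng=None):
    """Steps 2 and 3: draw m resamples of size n with replacement and
    apply the estimator T to each of them."""
    rng = rng or random.Random()
    n = len(sample)
    # rng.choices samples with replacement; each observation has probability 1/n.
    return [estimator(rng.choices(sample, k=n)) for _ in range(m)]


# Step 1: observed sample of historical demand (illustrative values only).
history = [0, 0, 3, 0, 1, 0, 0, 5, 0, 0, 2, 0]
estimates = bootstrap_estimates(history, m=2000, rng=random.Random(42))
# Step 4: the mean of T1, ..., Tm is used as the demand forecast.
forecast = mean(estimates)
```

Passing a different `estimator`, for example a function that returns a high percentile of the resample, gives bootstrap estimates of that quantity instead of the average demand.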
9 Poisson Method
The Poisson method is typically used to forecast the probability of a rare event
(Manzini et al., 2007, p.205). It derives directly from the binomial
distribution. This method does not allow the direct calculation of the variable to
be forecast, but it gives an estimate of the probability that the variable takes a
given value. The starting point of this model is an estimate of the average value
of the variable to be forecast. In the case of spare parts, given an average
consumption d in a time interval T, the probability of a demand equal to x (i.e. x
components are requested) in the interval T is:
$$P(x) = \frac{d^{x}\,e^{-d}}{x!}$$
Consequently, the cumulative probability (the probability that not more than x
components are required) can be expressed as:
$$P(\le x) = \sum_{k=0}^{x} \frac{d^{k}\,e^{-d}}{k!}$$
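The two probabilities can be computed directly, as sketched below, assuming `mean_demand` is the average consumption over the whole interval T. The function names and the example of choosing a stock level for a target probability are hypothetical additions for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, mean_demand):
    """Probability that exactly x components are requested in the interval."""
    return mean_demand ** x * exp(-mean_demand) / factorial(x)


def poisson_cdf(x, mean_demand):
    """Probability that not more than x components are requested."""
    return sum(poisson_pmf(k, mean_demand) for k in range(x + 1))


def stock_for_service_level(mean_demand, target):
    """Smallest stock x whose cumulative probability reaches the target.

    The target must be below 1, otherwise the loop cannot terminate.
    """
    x = 0
    while poisson_cdf(x, mean_demand) < target:
        x += 1
    return x


# With an average of 2 parts per interval (illustrative), find the stock that
# covers the demand of one interval with at least 95% probability.
stock = stock_for_service_level(mean_demand=2.0, target=0.95)
```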
Accuracy metrics for lumpy demand
The most commonly used scale-dependent metrics are based on absolute
errors or on squared errors of the forecast, et = Yt – Ft, where Yt is the
actual value and Ft is the forecast:
Mean Absolute Error (MAE) = mean(|et |)
Geometric Mean Absolute Error (GMAE) = gmean(|et |)
Mean Square Error (MSE) = mean(et^2)
where “gmean” is a geometric mean. The MAE is often abbreviated as the
MAD (“D” for “deviation”). The use of absolute values or squared values prevents
negative and positive errors from offsetting each other.
Since all of these metrics are on the same scale as the data, none of them are
meaningful for assessing a method’s accuracy across multiple series.
For lumpy-demand data, Syntetos and Boylan (2005) recommend the use of
GMAE, although they call it the GRMSE. (The GMAE and GRMSE are identical
because the square root and the square cancel each other in a geometric mean.)
Boylan and Syntetos (this issue) point out that the GMAE has the flaw of being
equal to zero when any error is zero, a problem which will occur when
both the actual and forecasted demands are zero. This is the result seen in Table 1
for the naïve method.
Boylan and Syntetos claim that such a situation would occur only if an
inappropriate forecasting method is used. However, it is not clear that the naïve
method is always inappropriate. Further, Hoover indicates that division-by-zero
errors in lumpy series are expected occurrences for repair parts. I suggest that the
GMAE is problematic for assessing accuracy on lumpy-demand data.
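The scale-dependent metrics, and the way a single zero error collapses the GMAE, can be illustrated with a short sketch. The function names and the demand values are assumptions made for the example; the forecasts are naïve forecasts (the previous actual value).

```python
from math import exp, log
from statistics import mean


def mae(errors):
    return mean(abs(e) for e in errors)


def mse(errors):
    return mean(e * e for e in errors)


def gmae(errors):
    absolute = [abs(e) for e in errors]
    if min(absolute) == 0:
        return 0.0   # one zero error drives the geometric mean to zero
    return exp(mean(log(a) for a in absolute))


# Illustrative lumpy series; the naive forecast is the previous actual value.
actual = [0, 0, 2, 0, 1]
naive = actual[:-1]
errors = [y - f for y, f in zip(actual[1:], naive)]   # [0, 2, -2, 1]

naive_mae = mae(errors)
naive_mse = mse(errors)
naive_gmae = gmae(errors)   # zero, because actual and forecast were both 0 once
```

Here the MAE and MSE still reflect the three non-zero errors, while the GMAE reports zero error for the whole series.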
Percentage errors
The percentage error is given by pt = 100et /Yt. Percentage errors have the
advantage of being scale independent, so they are frequently used to compare
forecast performance between different data series. The most commonly used
metric is
Mean Absolute Percentage Error (MAPE) = mean(|pt |)
Measurements based on percentage errors have the disadvantage of being
infinite or undefined if there are zero values in a series, as is frequent for lumpy
data. Moreover, percentage errors can have an extremely skewed
distribution when actual values are close to zero. With lumpy-demand data, it is
impossible to use the MAPE because of the occurrences of zero periods of
demand. The MAPE has another disadvantage: it puts a heavier penalty on positive
errors than on negative errors. This observation has led to the use of the
“symmetric” MAPE (sMAPE) in the M3-competition (Makridakis & Hibon,
2000). It is defined by
sMAPE = mean(200 |Yt – Ft | / (Yt + Ft ))
However, if the actual value Yt is zero, the forecast Ft is likely to be close to
zero. Thus the measurement will still involve division by a number close to zero.
Also, the value of sMAPE can be negative, giving it an ambiguous interpretation.
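A small sketch makes the problems with percentage errors concrete. The function names and all data values are illustrative assumptions; the formulas follow the definitions of MAPE and sMAPE given in the text.

```python
from statistics import mean


def mape(actual, forecast):
    """Mean absolute percentage error; fails when any actual value is zero."""
    return mean(abs(100 * (y - f) / y) for y, f in zip(actual, forecast))


def smape(actual, forecast):
    """Symmetric MAPE as defined in the text: mean(200 |Y - F| / (Y + F))."""
    return mean(200 * abs(y - f) / (y + f) for y, f in zip(actual, forecast))


# Illustrative values only (not from the report).
steady_actual, steady_forecast = [10, 20], [8, 25]
lumpy_actual, lumpy_forecast = [0, 3, 0], [1, 1, 1]

steady_mape = mape(steady_actual, steady_forecast)
steady_smape = smape(steady_actual, steady_forecast)

try:
    lumpy_mape = mape(lumpy_actual, lumpy_forecast)
except ZeroDivisionError:
    lumpy_mape = None   # undefined: a period with zero actual demand

# A negative forecast makes the sMAPE denominator, and the result, negative.
negative_smape = smape([1], [-3])
```

For the steady series both metrics are well behaved, while the lumpy series leaves the MAPE undefined as soon as one period has zero demand.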
Relative errors
An alternative to percentages for the calculation of scale-independent
measurements involves dividing each error by the error obtained using some
benchmark method of forecasting. Let rt = et /e*t denote the relative error, where
e*t is the forecast error obtained from the benchmark method. Usually the benchmark
method is the naïve method, where Ft is equal to the last observation. Then we can
define
Median Relative Absolute Error (MdRAE) = median(|rt |)
Geometric Mean Relative Absolute Error (GMRAE) = gmean(|rt |)
Because they are not scale dependent, these relative-error metrics were
recommended in studies by Armstrong and Collopy (1992) and by Fildes
(1992) for assessing forecast accuracy across multiple series. However,
when the errors are small, as they can be with lumpy series, use of
the naïve method as a benchmark is no longer possible because it would
involve division by zero.
Scale-free errors
The mean absolute scaled error (MASE) was proposed by Hyndman and Koehler (2006)
as a generally applicable measurement of forecast accuracy without the problems
seen in the other measurements. They proposed scaling the errors based on the
in-sample MAE from the naïve forecast method. Using the naïve method,
we generate one-period-ahead forecasts from each data point in the sample.
Accordingly, a scaled error is defined as
$$q_t = \frac{e_t}{\dfrac{1}{n-1}\sum_{i=2}^{n} |Y_i - Y_{i-1}|}$$
where n is the number of in-sample observations, and MASE = mean(|qt |).
The first row of the table below shows the lumpy series plotted in Figure 1.
The second row gives the naïve forecasts, which are equal to the previous
actual values. The final row shows the naïve-forecast errors. The
denominator of qt is the mean of the shaded values in this row; that is the
MAE of the naïve method.
The only circumstance under which the MASE would be infinite or
undefined is when all historical observations are equal. The in-sample MAE
is used in the denominator because it is always available and it effectively
scales the errors. In contrast, the out-of-sample MAE for the naïve method
may be zero because it is usually based on fewer observations. For example,
if we were forecasting only two steps ahead, then the out-of-sample MAE
would be zero. If we wanted to compare forecast accuracy at one step ahead
for ten different series, then we would have one error for each series. The
out-of-sample MAE in this case is also zero. These types of problems are
avoided by using in-sample, one-step MAE.
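The scaling can be sketched as follows: the out-of-sample errors are divided by the in-sample, one-step MAE of the naïve method. The function name `mase` and the data are illustrative assumptions.

```python
from statistics import mean


def mase(train, actual, forecast):
    """Mean absolute scaled error.

    train:    in-sample observations used to scale the errors
    actual:   out-of-sample observations
    forecast: forecasts for the out-of-sample observations
    """
    # In-sample MAE of the naive method: mean |Y_i - Y_(i-1)|.
    naive_mae = mean(abs(b - a) for a, b in zip(train, train[1:]))
    return mean(abs(y - f) / naive_mae for y, f in zip(actual, forecast))


# Illustrative lumpy history and a flat forecast of 1 for two future periods.
train = [0, 2, 0, 1, 0, 4]
score = mase(train, actual=[0, 2], forecast=[1, 1])   # errors 1 and 1, scale 2
```

A score below one means the forecasts are, on average, more accurate than in-sample one-step naïve forecasts. If all values in `train` are equal, the scale is zero and the sketch raises `ZeroDivisionError`, matching the one undefined case noted above.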
A closely related idea is the MAD/Mean ratio proposed by Hoover (this
issue) which scales the errors by the in-sample mean of the series instead of
the in-sample mean absolute error. This ratio also renders the errors scale
free and is always finite unless all historical data happen to be zero.
Hoover explains the use of the MAD/Mean ratio only in the case of in-
sample, one-step forecasts (situation 2 of the three situations described in the
introduction). However, it would also be straightforward to use the
MAD/Mean ratio in the other two forecasting situations. The main
advantage of the MASE over the MAD/Mean ratio is that the MASE is more
widely applicable. The MAD/Mean ratio assumes that the mean is stable
over time (technically, that the series is “stationary”). This is not true for
data which show trend, seasonality, or other patterns. While lumpy
data is often quite stable, sometimes seasonality does occur, and this might
make the MAD/Mean ratio unreliable. In contrast, the MASE is suitable
even when the data exhibit a trend or a seasonal pattern.
The MASE can be used to compare forecast methods on a single series, and,
because it is scale-free, to compare forecast accuracy across series. For
example, you can average the MASE values of several series to obtain a
measurement of forecast accuracy for the group of series. This measurement
can then be compared with the MASE values of other groups of series to
identify which series are the most difficult to forecast. Typical values for
one-step MASE values are less than one, as it is usually possible to obtain
forecasts more accurate than the naïve method. Multistep MASE values are
often larger than one, as it becomes more difficult to forecast as the horizon
increases. The MASE is the only available accuracy measurement that can
be used in all three forecasting situations described above, and for all
forecast methods and all types of series. [5]
References:
1/ Wallstrom P., Evaluation of forecasting techniques and forecast
errors with focus on intermittent demand, Lulea University of
Technology, Lulea, Sweden, 2009.
2/ Ralph D. Snyder, J. Keith Ord, Adrian Beaumont, Forecasting the Intermittent
Demand for Slow-Moving Items, RPF Working Paper No. 2010-003, The George
Washington University, 2010.
3/ Segerstedt A., Levén E., A study of different Croston-like forecasting methods,
Working Paper Industrial Logistics, Lulea University of Technology, Lulea,
Sweden, 2012.
4/ Kocer U., Forecasting intermittent demand by Markov chain model,
International Journal of Innovative Computing, Information and Control, Volume
9, Number 8, August 2013.
5/ Hyndman R., Another look at forecast-accuracy metrics for intermittent
demand, International Journal of Forecasting, Monash Australia, June 2006.
6/ Maria Caridi and Roberto Cigolini, Buffering against lumpy demand in MRP
environments: a theoretical approach and a case study, Proceedings of The Fourth
SMESME International Conference, Milano, Italy, 2012.
7/ Maria Elena Nenni, Luca Giustiniano, and Luca Pirolo, Demand Forecasting in
the Fashion Industry: A Review, International Journal of Engineering Business
Management Vol 5 July 2013.
8/ Umay Uzunoglu Kocer, Forecasting Intermittent Demand by Markov Chain
Model, International Journal of Innovative Computing, Information and Control
Volume 9, Number 8, August 2013.
9/ Ralph D. Snyder, J. Keith Ord and Adrian Beaumont, Forecasting the
Intermittent Demand for Slow-Moving Items, Center of Economic Research
Department of Economics The George Washington University Washington, DC
20052, Revised: March 11, 2011.
10/ S. D. Prestwich, R. Rossi, S. A. Tarim, and B. Hnich, Mean-Based Error
Measures for Intermittent Demand Forecasting, arXiv:1310.5663v1 [stat.ME] 18
Oct 2013.