Post on 17-Jan-2016
www.simio.com| Copyright 2010 Simio LLC | All rights reserved.
Learning Simio
Chapter 10 – Analyzing Input Data
Outline
Working with various types of data.
Fitting distributions to data.
Summary of common distributions.
Modeling customer arrivals.
Modeling task times.
Sensitivity of results to data.
Model Input Data
A model has both structure and input data.
Both the model structure and the input data have a significant impact on the results.
The data can be a problematic aspect of a modeling project.
Typical Data Cases
No data exists.
Data exists in the wrong form.
Lots of good data exists.
No data exists
Consider using the Triangular or Pert distributions (minimum, mode, maximum) for activity times.
Hypothesize distributions based on the underlying processes, and make educated guesses for the parameters.
Run experiments to test sensitivity of results to the parameters.
Don’t use a mean in place of a distribution.
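As a minimal sketch of the three-point approach above, the following Python draws activity times from a Triangular and from a Pert distribution built on the same (minimum, mode, maximum) estimates. The numbers are hypothetical, and the Pert is assumed to be the standard form (a rescaled Beta with shape weight 4); this is an illustration, not Simio's implementation.

```python
import random
import statistics


def sample_triangular(rng, minimum, mode, maximum):
    """Draw one value from Triangular(minimum, mode, maximum)."""
    # Note the argument order of random.triangular: (low, high, mode).
    return rng.triangular(minimum, maximum, mode)


def sample_pert(rng, minimum, mode, maximum):
    """Draw one value from the standard Pert (a rescaled Beta, shape weight 4)."""
    span = maximum - minimum
    alpha = 1.0 + 4.0 * (mode - minimum) / span
    beta = 1.0 + 4.0 * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
minimum, mode, maximum = 2.0, 5.0, 12.0  # hypothetical activity time, minutes

tri = [sample_triangular(rng, minimum, mode, maximum) for _ in range(20000)]
pert = [sample_pert(rng, minimum, mode, maximum) for _ in range(20000)]

# Theoretical means: (min + mode + max) / 3 and (min + 4*mode + max) / 6.
print("triangular mean: ", round(statistics.fmean(tri), 2))
print("pert mean:       ", round(statistics.fmean(pert), 2))
print("triangular stdev:", round(statistics.stdev(tri), 2))
print("pert stdev:      ", round(statistics.stdev(pert), 2))
```

Under these assumptions the Pert puts more weight near the mode, so its spread is smaller than the Triangular's for the same three estimates.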
Data exists in the wrong form.
Data observed from a different real-world process.
Time between failures when failures are count based.
Time to repair when repairs are resource constrained.
Data recorded during a “slow time” or a “busy time.”
Values from multiple processes with no discriminatory information (e.g., repair times without noting the type of stoppage).
Use the data that does exist to make intelligent guesses for the required data.
Lots of data exists
If a large amount of data is available, an empirical distribution may be used; however, a theoretical distribution is preferred (compact, fast, easy to change).
If possible, hypothesize a distribution based on the underlying process (combine data and theory).
Use goodness-of-fit software to test the hypothesis and estimate the parameters.
Data Fitting Procedure
Assess IID assumptions.
  Independent observations.
  Identically distributed.
Use software to view the data using a histogram.
Hypothesize a distribution family/form.
Use software to:
  Estimate distribution parameters.
  Assess quality of fit.
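The procedure above can be sketched in a few lines of Python. This is a simplified illustration, not a replacement for fitting software: the data set is synthetic (generated, not observed), independence is checked only through the lag-1 autocorrelation, the hypothesized family is assumed to be exponential, and the quality of fit is summarized by the Kolmogorov-Smirnov distance. All function names are hypothetical.

```python
import math
import random
import statistics


def lag1_autocorrelation(xs):
    """Lag-1 sample autocorrelation; values near 0 are consistent with independence."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def text_histogram(xs, bins=8, width=40):
    """Return a crude text histogram as a list of lines."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / step), bins - 1)] += 1
    peak = max(counts)
    return [f"{lo + i * step:7.2f} | " + "#" * round(width * c / peak)
            for i, c in enumerate(counts)]


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(ordered))


rng = random.Random(7)
data = [rng.expovariate(1 / 4.0) for _ in range(500)]  # stand-in for observed times

r1 = lag1_autocorrelation(data)               # step 1: independence check
fitted_mean = statistics.fmean(data)          # step 3: MLE of the exponential mean
d = ks_statistic(data, lambda x: 1.0 - math.exp(-x / fitted_mean))  # step 4

print("\n".join(text_histogram(data)))        # step 2: view the data
print(f"lag-1 autocorrelation: {r1:.3f}")
print(f"fitted exponential mean: {fitted_mean:.2f}")
print(f"KS distance: {d:.3f}")
```

Because the parameter is estimated from the same data, standard Kolmogorov-Smirnov tables do not apply directly to this distance; dedicated fitting packages adjust for that.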
Sample Data Sets
Common Distributions
Binomial – Models the number of successes in n trials, when the trials are independent with common success probability, p; for example, the number of defective computer chips found in a lot of n chips.
Negative Binomial – Models the number of trials required to achieve k successes; for example, the number of computer chips that we must inspect to find 4 defective chips.
Poisson – Models the number of independent events that occur in a fixed amount of time or space; for example, the number of customers that arrive to a store during 1 hour, or the number of defects found in 30 square meters of sheet metal.
Normal – Models the distribution of a process that can be thought of as the sum of a number of component processes; for example, a time to assemble a product that is the sum of times required for each assembly operation.
Lognormal – Models the distribution of a process that can be thought of as the product of a number of component processes; for example, the rate of return on an investment, when interest is compounded, is the product of the returns for a number of periods.
Banks et al., pp. 314-316
Common Distributions
Exponential – Models the time between independent events, or a process time that is memoryless; for example, the times between the arrivals from a large population of potential customers who act independently of each other. The exponential is a highly variable distribution; it is sometimes overused because it often leads to mathematically tractable models. Recall that, if the time between events is exponentially distributed, then the number of events in a fixed period of time is Poisson.
Gamma – An extremely flexible distribution used to model nonnegative random variables (can be shifted away from 0 by adding a constant).
Beta – An extremely flexible distribution used to model bounded random variables. The beta can be shifted away from 0 by adding a constant and can be given a range larger than [0, 1] by multiplying by a constant.
Erlang – Models processes that can be viewed as the sum of several exponentially distributed processes; for example, a computer network fails when a computer and two backup computers fail, and each has a time to failure (TTF) that is exponentially distributed.
Banks et al., pp. 314-316
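The link between the Exponential and the Poisson noted in the list above can be checked with a short simulation. This sketch assumes a hypothetical rate of 3 events per hour, generates exponential gaps between events, and counts the events that fall in each one-hour window.

```python
import random
import statistics

rng = random.Random(123)
rate = 3.0          # hypothetical arrival rate, events per hour
hours = 5000        # number of one-hour windows to observe

counts = [0] * hours
t = rng.expovariate(rate)       # time of the first event
while t < hours:
    counts[int(t)] += 1         # tally the event in its one-hour window
    t += rng.expovariate(rate)  # exponential gap to the next event

mean_count = statistics.fmean(counts)
var_count = statistics.variance(counts)
# For a Poisson count the mean and the variance are both equal to the rate.
print(f"mean count per hour:      {mean_count:.2f}")
print(f"variance of hourly count: {var_count:.2f}")
```

Both printed values should land close to the rate, which is the signature of a Poisson count.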
Common Distributions
Weibull – Models the time to failure for components; for example, the time to failure for a disk drive. The exponential is a special case of the Weibull.
Discrete or Continuous Uniform – Models complete uncertainty: All outcomes are equally likely. This distribution is often used inappropriately, when there are no data.
Triangular – Models a process for which only the minimum, most likely, and maximum values of the distribution are known; for example, the minimum, most likely, and maximum time required to test a product. This model is a marked improvement over the uniform distribution [in many cases].
Pert – A special case of the Beta with minimum, most likely, and maximum values. The pert provides a “smooth” alternative to the triangular in the absence of data.
Empirical – Samples from the distribution of the actual data collected; often used when no theoretical distribution seems appropriate.
Banks et al., pp. 314-316
Goodness-of-fit (GOF) Tests
Statistical hypothesis tests that are used to assess formally whether the observations X1, X2, …, Xn constitute an independent sample from a particular distribution function
Hypothesis:
H0: The Xi’s are IID random variables with the specified distribution function.
GOF Test Considerations
Failure to reject the null hypothesis should not be interpreted as “accepting H0 as being true.”
GOF tests are not very powerful for small-to-moderate sample sizes. Also, when n is large, the tests will often reject H0 since even minute differences will be detected.
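The sample-size effect can be illustrated with a rough calculation. This sketch assumes the hypothesized distribution is exponential with rate 1.00 while the data actually come from an exponential with rate 1.05 (both values hypothetical), and compares the largest gap between the two CDFs with the large-sample 5% critical value of the Kolmogorov-Smirnov statistic, approximately 1.358 / sqrt(n). It is a heuristic comparison, not a power calculation.

```python
import math


def exp_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


# True gap between the hypothesized CDF (rate 1.00) and the actual CDF (rate 1.05),
# found by scanning a fine grid of x values.
grid = [i / 1000 for i in range(1, 20001)]
true_gap = max(abs(exp_cdf(x, 1.00) - exp_cdf(x, 1.05)) for x in grid)


def ks_critical_5pct(n):
    """Large-sample 5% critical value for the Kolmogorov-Smirnov statistic."""
    return 1.358 / math.sqrt(n)


print(f"largest CDF difference: {true_gap:.4f}")
for n in (100, 1_000, 10_000, 100_000):
    crit = ks_critical_5pct(n)
    verdict = "gap exceeds threshold" if true_gap > crit else "gap hidden by noise"
    print(f"n={n:>7}: critical value {crit:.4f} -> {verdict}")
```

For small n the threshold is far larger than the true gap, so the test will rarely reject; once n is large enough for the threshold to drop below the gap, a 5% difference in rate tends to be rejected even though it may not matter to the model.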
Some GOF Software Options
General packages
  EasyFit (www.mathwave.com)
Simulation-specific packages
  Stat::Fit (www.geerms.com)
  ExpertFit (www.averill-law.com)
Modeling Arrivals
If arrivals are independent and random, they follow a Poisson process.
  The number of arrivals in a fixed time is Poisson.
  The time between arrivals is exponential.
In some cases the arrival rate may vary over time – Simio supports step-wise linear arrival rates using a Rate Table.
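A time-varying arrival rate can be sketched outside Simio with the thinning method for a non-stationary Poisson process. The rate table below is hypothetical, the rate is assumed constant within each one-hour interval, and the code is an illustration of the idea rather than a description of how a Simio Rate Table is implemented.

```python
import random

# Hypothetical rate table: arrivals per hour for each one-hour interval of a 4-hour day.
RATE_TABLE = [5.0, 20.0, 10.0, 2.0]


def rate_at(t):
    """Arrival rate (per hour) in effect at time t (hours)."""
    return RATE_TABLE[int(t)]


def generate_arrivals(rng, horizon=len(RATE_TABLE)):
    """Generate arrival times on [0, horizon) by thinning a Poisson process."""
    max_rate = max(RATE_TABLE)
    arrivals = []
    t = rng.expovariate(max_rate)  # candidate events arrive at the peak rate
    while t < horizon:
        # Keep the candidate with probability rate(t) / max_rate.
        if rng.random() < rate_at(t) / max_rate:
            arrivals.append(t)
        t += rng.expovariate(max_rate)
    return arrivals


rng = random.Random(2024)
days = 2000
totals = [0] * len(RATE_TABLE)
for _ in range(days):
    for t in generate_arrivals(rng):
        totals[int(t)] += 1

for hour, total in enumerate(totals):
    print(f"hour {hour}: average arrivals {total / days:.2f} (rate {RATE_TABLE[hour]})")
```

Averaged over many simulated days, the arrivals counted in each interval should track the rate listed for that interval.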
Modeling Task Times
Use a distribution with a range >= 0 (not, for example, the Normal or JohnsonUB, which can return negative values).
In the absence of data, Triangular and Pert are possible choices.
With supporting data, the Gamma, LogNormal, Weibull, LogLogistic, Beta, PearsonVI, and JohnsonSB are possible choices.
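To see why the range matters, the sketch below takes a hypothetical task time with mean 5 and standard deviation 3 minutes, computes the share of Normal samples that would be negative, and then builds a LogNormal with the same mean and standard deviation by moment matching. The numbers are illustrative assumptions only.

```python
import math
import random
import statistics
from statistics import NormalDist

mean, sd = 5.0, 3.0  # hypothetical task time in minutes

# Share of Normal(mean, sd) samples that would be negative task times.
p_negative = NormalDist(mean, sd).cdf(0.0)
print(f"P(Normal task time < 0) = {p_negative:.4f}")

# Lognormal with the same mean and standard deviation (moment matching).
sigma2 = math.log(1.0 + (sd / mean) ** 2)
mu = math.log(mean) - sigma2 / 2.0
sigma = math.sqrt(sigma2)

rng = random.Random(11)
times = [rng.lognormvariate(mu, sigma) for _ in range(50000)]
print(f"lognormal sample mean     = {statistics.fmean(times):.2f}")
print(f"lognormal sample stdev    = {statistics.stdev(times):.2f}")
print(f"smallest lognormal sample = {min(times):.3f}")
```

With these values roughly one Normal sample in twenty is negative, while the LogNormal keeps the same first two moments and never goes below zero.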
Gamma, Log Normal, Weibull
[Figure panels: Gamma, Log Normal, Weibull]
Determining what data is critical
Some data may have a dominant impact on performance.
The variability is often more important than the mean.
Run scenarios specifically designed to determine the sensitivity of the model to the data inputs.
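A sensitivity scenario of this kind can be sketched with a single-server queue. The example below assumes exponential interarrival times with mean 1.0, a fixed mean service time of 0.8, and Gamma service times whose coefficient of variation (CV) changes from scenario to scenario; a CV of 0 corresponds to using the mean as a constant. The model and parameter values are hypothetical.

```python
import random


def mean_wait(rng, mean_service, cv, mean_interarrival=1.0, customers=100000):
    """Average wait in a single-server FIFO queue via the Lindley recursion."""
    total = 0.0
    wait = 0.0
    for _ in range(customers):
        total += wait
        if cv == 0:
            service = mean_service                      # a mean used as a constant
        else:
            shape = 1.0 / cv ** 2                       # gamma shape from the CV
            service = rng.gammavariate(shape, mean_service / shape)
        gap = rng.expovariate(1.0 / mean_interarrival)  # time to the next arrival
        wait = max(0.0, wait + service - gap)           # wait of the next customer
    return total / customers


results = {}
for cv in (0.0, 0.5, 1.0, 2.0):
    rng = random.Random(99)  # same seed per scenario so runs are repeatable
    results[cv] = mean_wait(rng, mean_service=0.8, cv=cv)
    print(f"service CV {cv:.1f}: mean wait {results[cv]:.2f}")
```

The mean service time is identical in every scenario, yet the average wait grows as the CV increases, which is why a mean should not stand in for a distribution.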
References
Leemis, L., “Input Modeling Techniques for Discrete-Event Simulations,” Proceedings of the 2001 Winter Simulation Conference, Washington, DC, December 2001.
Vincent, S., “Input Data Analysis,” in Handbook of Simulation, edited by J. Banks, John Wiley & Sons, Inc., New York, NY, pp. 55-91, 1998.
Chapter 9 – Input Modeling (Banks et al.)
Chapter 6 – Selecting Input Probability Distributions (Law)
Summary
Distributions are the primary method for capturing variability in the system.
Never use a mean in place of a distribution for a random component.
When data exists, hypothesize a distribution, estimate its parameters, and test the fit using goodness-of-fit software.
In the absence of data, use appropriate distributions.
  Arrivals – exponential time between arrivals, or non-stationary Poisson.
  Activities – triangular or pert.
Use the model to determine the critical data elements.