Cost-Optimized Reliability Test Planning and Decision-Making Through Bayesian Methods and Leveraging Prior Knowledge
ASQ Reliability Division Webinar Program
Jun 6th 2013
Charles H. Recchia, MBA
ASQ RD Webinar 2
COST-OPTIMIZED RELIABILITY TEST PLANNING AND DECISION-MAKING THROUGH BAYESIAN METHODS AND LEVERAGING PRIOR KNOWLEDGE
When planning for and interpreting reliability datasets, proper application of Bayesian statistics improves decision-making and resource utilization, and allows rigorous treatment of prior knowledge to optimize overall reliability program costs and increase return on investment. In this webinar, we build upon the foundation established in our previous intro-level presentation and provide specific examples of reduced sample sizes enabled by Bayesian methods. We also describe real-world scenarios of improved decision-making during comparative reliability analyses using proper statistical perspectives on relative failure rates between systems.
Charles H. Recchia, MBA, PhD, has more than twenty-five years of product development, engineering management, and fundamental research experience, with a special focus on reliability statistics of complex systems. He earned his doctorate in Condensed Matter Physics from The Ohio State University and a Master of Business Administration degree from Babson College. Dr. Recchia acquired in-depth reliability engineering expertise at Intel’s Portland Technology Development, MKS Instruments, and Saint-Gobain Innovative Materials R&D; has served as visiting professor of physics at Wittenberg University; and is the author of numerous peer-reviewed technical papers and patents across multiple fields. Charles provided statistics and advanced Lean Six Sigma consultancy for A123 Systems via the Andover-based Quality Support Group Inc., and has contracted under Coleman Research Group vetting CASIS-ISS US National Lab research proposals. A senior member of ASQ and the American Physical Society, Charles currently works at Raytheon Integrated Defense Systems and serves on the Advisory Committee for the Boston Chapter of the IEEE Reliability Society.
6/6/2013
References and Further Reading
• NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, April (2012)
• Statistical Methods for Reliability Data, WQ Meeker and LA Escobar (1998)
• Applied Reliability, 2nd edition, PA Tobias and DC Trindade (1995)
• Bayesian Reliability, MS Hamada, AG Wilson, CS Reese, and HF Martz, Springer Series in Statistics (2008)
• Bayesian Reliability Analysis, HF Martz and RA Waller (1982)
• Methods for Statistical Analysis of Reliability and Life Data, NR Mann, RE Schafer, and ND Singpurwalla (1974)
• Bayes is for the Birds, RA Evans, IEEE Transactions on Reliability R-38, 401 (1989)
• “A Compendium of Conjugate Priors,” Daniel Fink (1997)
Agenda
• Brief Review of Bayesian Method
• Examples of Reduced Test Sample Sizes
• Comparative Reliability Decision Making
• Question and Answer
quick poll
When reliability follows the exponential time-to-failure (TTF) model (e.g., the flat, constant-failure-rate portion of the bathtub curve):
CLASSICAL FRAMEWORK
– The mean time between failures (MTBF) is one fixed unknown value; there is no “probability” associated with it.
– Failure data from a test or observation period allow you to make inferences about the value of the true unknown MTBF (= 1/λ).
– No other data are used and no “judgment” enters; the procedure is objective and based solely on the test data and the assumed homogeneous Poisson process (HPP) model.
BAYESIAN FRAMEWORK
– The MTBF is a random quantity with a probability distribution.
– Before running the test, you already have some idea of what the MTBF probability distribution looks like, based on prior test data or a consensus engineering judgment.
– Upon collecting failure data, you incorporate that knowledge to refine the distribution of the possible values for λ.
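Bayesian Core Idea
• “Prior” $g(\lambda)$: what you knew before (WYKB), i.e., the probability of λ before new data comes in.
• New data $\{t_i\}$: a new set of failure times.
• Likelihood: the likelihood of obtaining $\{t_i\}$ given parameter λ, where failed (uncensored) units contribute the pdf f and censored units contribute the survival function 1 − F:
$$L(\{t_i\}\mid\lambda) = \prod_{j\,\in\,\text{uncensored}} f(t_j\mid\lambda)\;\prod_{k\,\in\,\text{censored}} \bigl(1 - F(t_k\mid\lambda)\bigr)$$
• “Posterior” $g(\lambda\mid\{t_i\})$: the best possible update of WYKB adjusted by the new data, i.e., the probability of parameter λ given $\{t_i\}$:
$$g(\lambda\mid\{t_i\}) \propto L(\{t_i\}\mid\lambda)\,g(\lambda)$$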
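The likelihood above can be sketched for the exponential TTF model used throughout this webinar. This is a minimal Python sketch, assuming failure and censoring (run-out) times are given in hours; the function names and the sample data are hypothetical. It shows that, for the exponential model, the likelihood depends on the data only through the number of failures r and the total operating time T:

```python
import math

def exp_log_likelihood(lam, failure_times, censored_times):
    """log L({t_i}|lam) for an exponential TTF model with failure rate lam.

    Each failed unit contributes the pdf f(t|lam) = lam * exp(-lam * t);
    each censored (unfailed) unit contributes 1 - F(t|lam) = exp(-lam * t).
    """
    log_l = sum(math.log(lam) - lam * t for t in failure_times)
    log_l += sum(-lam * t for t in censored_times)
    return log_l

def sufficient_stats(failure_times, censored_times):
    """Return (r, T): number of failures and total unit operating hours."""
    r = len(failure_times)
    total_hours = sum(failure_times) + sum(censored_times)
    return r, total_hours

# Hypothetical data: 3 failures, plus 2 units still running at 500 hrs
fails = [120.0, 340.0, 410.0]
runouts = [500.0, 500.0]
r, T = sufficient_stats(fails, runouts)
lam = 1 / 600
# Equals the closed form r*log(lam) - lam*T, i.e. L = lam**r * exp(-lam*T)
print(r, T, exp_log_likelihood(lam, fails, runouts))
```

Because only (r, T) matter, any mix of failed and censored units with the same totals yields the same Bayesian update.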
Conjugate Prior
• When the posterior (the prior multiplied by the likelihood and renormalized) has the same functional form as the prior, the prior is known as a “conjugate prior” for that likelihood.
• The concept is similar to an eigenfunction: the Bayesian update maps the distribution family onto itself and changes only its parameters.
• Conjugate priors are convenient, when available, because they are tractable and easy to interpret.
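For the exponential TTF model, conjugacy can be checked in one line. As a sketch, assume r failures are observed in T total unit-hours (censored units included), so the likelihood reduces to $\lambda^{r}e^{-\lambda T}$, and take a gamma prior with shape a and rate b:

$$g(\lambda\mid\{t_i\}) \;\propto\; \underbrace{\lambda^{r}e^{-\lambda T}}_{L(\{t_i\}\mid\lambda)}\;\underbrace{\frac{b^{a}}{\Gamma(a)}\,\lambda^{a-1}e^{-b\lambda}}_{g(\lambda;\,a,\,b)} \;\propto\; \lambda^{(a+r)-1}\,e^{-(b+T)\lambda}$$

The result is again a gamma density, now with shape a + r and rate b + T.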
Gamma is the conjugate prior for exponential TTF (constant failure rate)
Gamma distribution pdf:
$$g(\lambda; a, b) = \frac{b^{a}}{\Gamma(a)}\,\lambda^{a-1}e^{-b\lambda}$$
b has units of time; a is dimensionless.
Mean $\lambda_{\text{ave}} = a/b$; variance $\sigma^2 = a/b^2$
In Excel:
=GAMMA.DIST(λ, a, 1/b, FALSE)
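A Python equivalent of the Excel call above may help when working outside a spreadsheet. This sketch uses only the standard library (`math.lgamma`); the function names are hypothetical. Note that Excel's GAMMA.DIST takes the scale 1/b, while the formula above uses the rate b:

```python
import math

def gamma_pdf(lam, a, b):
    """Gamma pdf g(lam; a, b) = b**a / Gamma(a) * lam**(a - 1) * exp(-b * lam).

    Same value as Excel =GAMMA.DIST(lam, a, 1/b, FALSE). Computed in log
    space to avoid overflow of b**a for large b. Returns 0 for lam <= 0
    (a simplification at the boundary).
    """
    if lam <= 0:
        return 0.0
    log_pdf = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(lam) - b * lam
    return math.exp(log_pdf)

def gamma_mean_var(a, b):
    """Mean a/b and variance a/b**2 of the failure rate lam."""
    return a / b, a / b ** 2

# Example with a = 2.863, b = 1522.46 hrs (the 50/95 prior used later on)
print(gamma_pdf(1 / 600, 2.863, 1522.46), gamma_mean_var(2.863, 1522.46))
```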
Gamma distribution: CDF
$$G(\lambda; a, b) = \frac{1}{\Gamma(a)}\,\gamma(a, b\lambda)$$
where γ(x, y) is the lower incomplete gamma function. Note that
$$G(\lambda; a, b) = G(b\lambda; a, 1)$$
The CDF G(λ) is the probability p that the failure rate is less than or equal to λ.
In Excel:
p = GAMMA.DIST(λ, a, 1/b, TRUE)
and its inverse
λ = GAMMA.INV(p, a, 1/b)
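Outside Excel, the CDF and its inverse can be computed with a short numerical routine. This is a minimal sketch, assuming no statistics library is available: the regularized lower incomplete gamma function is evaluated with the classic series and continued-fraction expansions, and the inverse is found by bisection. It relies on the scaling property G(λ; a, b) = G(bλ; a, 1); all function names are hypothetical.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a)."""
    if x <= 0:
        return 0.0
    log_front = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # Series expansion: converges quickly for x < a + 1
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / n
            total += term
        return total * math.exp(log_front)
    # Continued fraction for the upper tail Q = 1 - P (modified Lentz method)
    tiny = 1e-300
    bb = x + 1 - a
    c, d = 1 / tiny, 1 / bb
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        bb += 2
        d = an * d + bb
        d = tiny if abs(d) < tiny else d
        c = bb + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return 1 - math.exp(log_front) * h

def gamma_cdf(lam, a, b):
    """G(lam; a, b) = G(b*lam; a, 1), like Excel =GAMMA.DIST(lam, a, 1/b, TRUE)."""
    return gamma_p(a, b * lam)

def gamma_inv(p, a, b):
    """lam such that G(lam; a, b) = p, like Excel =GAMMA.INV(p, a, 1/b)."""
    lo, hi = 0.0, 1.0
    while gamma_p(a, hi) < p:      # bracket the root in x = b * lam
        hi *= 2
    for _ in range(100):           # bisection on x
        mid = 0.5 * (lo + hi)
        if gamma_p(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / b
```

Bisection is slower than Newton iteration but cannot diverge, which matters more than speed for a handful of planning calculations.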
Bayesian assumptions for the gamma exponential system model
1. Failure times for the system under investigation can be adequately modeled by the exponential distribution with constant failure rate.
2. The MTBF for the system can be regarded as chosen from a prior distribution model that is an analytic representation of our previous information or judgments about the system's reliability. The form of this prior model is the gamma distribution (the conjugate prior for the exponential model). The prior model is actually defined for λ = 1/MTBF.
3. Our prior knowledge is used to choose the gamma parameters a and b for the prior distribution model for l. There are a number of ways to convert prior knowledge to gamma parameters.
New data is collected…
New information is combined with the gamma prior model to produce a gamma posterior distribution. After a new test is run with T additional system operating hours and r new failures, the resulting posterior distribution for failure rate λ remains gamma (since the prior is conjugate), with new parameters
a' = a + r, b' = b + T
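The update rule is simple enough to express directly. A minimal Python sketch follows; the function name is hypothetical, and the posterior mean failure rate is just the gamma mean a'/b'. The example numbers are taken from the 50/95 planning example later in this webinar (prior a = 2.863, b = 1522.46 hrs; a 1756-hour test with two failures):

```python
def update_gamma(a, b, r, T):
    """Conjugate update of a Gamma(a, b) prior on the failure rate.

    r: number of new failures; T: additional system operating hours.
    Returns the posterior parameters (a', b') = (a + r, b + T).
    """
    if r < 0 or T < 0:
        raise ValueError("r and T must be non-negative")
    return a + r, b + T

a_post, b_post = update_gamma(2.863, 1522.46, r=2, T=1756)
mean_rate = a_post / b_post   # posterior mean failure rate, per hour
print(a_post, b_post, mean_rate)
```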
Reliability estimation with Bayesian gamma prior model
Gamma Prior Method 1: Previous Test Data
1. Actual data from previous testing done on the system (or a
system believed to have the same reliability as the one under
investigation) is the most credible prior knowledge, and the
easiest to use. Simply set
a = total number of failures from all the previous data, and
b = total of all the previous test hours.
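Method 1 amounts to pooling the earlier data. A minimal sketch, assuming the previous tests are available as (failures, test hours) pairs; that input format, the function name, and the sample history are hypothetical:

```python
def prior_from_previous_tests(previous_tests):
    """Gamma prior Method 1: pool all previous test data.

    previous_tests: iterable of (failures, test_hours) pairs (illustrative format).
    Returns (a, b) = (total failures, total test hours).
    """
    tests = list(previous_tests)
    a = sum(failures for failures, _ in tests)
    b = sum(hours for _, hours in tests)
    if a <= 0 or b <= 0:
        # A proper gamma prior needs a > 0 and b > 0
        raise ValueError("need at least one failure and positive test hours")
    return a, b

# Hypothetical history: three earlier tests on equivalent systems
a, b = prior_from_previous_tests([(2, 800.0), (1, 650.0), (0, 400.0)])
print(a, b, a / b)   # a / b is the pooled failure rate, i.e. the prior mean of lam
```

The prior mean failure rate a/b then equals the pooled historical failure rate.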
Gamma prior method 2: “50/95”
2. A consensus method for determining a and b that works well is the following: Assemble a group of engineers who know the system and its sub-components well from a reliability viewpoint.
A. Have the group reach agreement on a reasonable MTBF they expect the system to have. They could each pick a number they would be willing to bet even money that the system would either meet or miss, and the average or median of these numbers would be their 50% best guess for the MTBF. Or they could just discuss even-money MTBF candidates until a consensus is reached.
B. Repeat the process again, this time reaching agreement on a low MTBF they expect the system to exceed. A "5%" value that they are "95% confident" the system will exceed (i.e., they would give 19 to 1 odds) is a good choice. Or a "10%" value might be chosen (i.e., they would give 9 to 1 odds the actual MTBF exceeds the low MTBF). Use whichever percentile choice the group prefers.
C. Call the reasonable MTBF “MTBF50” and the low MTBF you are 95% confident the system will exceed “MTBF05”. These two numbers uniquely determine the gamma parameters a and b that place the 50th and 95th percentiles of the failure-rate distribution at 1/MTBF50 and 1/MTBF05.
This is called the 50/95 method (or the 50/90 method if one uses MTBF10, etc.).
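The two consensus numbers translate into two percentile conditions on the failure rate λ. Writing them out (a sketch, using the gamma CDF G and its scaling property G(λ; a, b) = G(bλ; a, 1)) shows why they pin down a and b:

$$G\!\left(\tfrac{1}{\mathrm{MTBF}_{50}};\,a,b\right) = 0.5,\qquad G\!\left(\tfrac{1}{\mathrm{MTBF}_{05}};\,a,b\right) = 0.95$$

$$\frac{b}{\mathrm{MTBF}_{50}} = G^{-1}(0.5;\,a,1),\qquad \frac{b}{\mathrm{MTBF}_{05}} = G^{-1}(0.95;\,a,1)$$

$$\frac{G^{-1}(0.95;\,a,1)}{G^{-1}(0.5;\,a,1)} = \frac{\mathrm{MTBF}_{50}}{\mathrm{MTBF}_{05}},\qquad b = \mathrm{MTBF}_{50}\,G^{-1}(0.5;\,a,1)$$

The left-hand ratio depends only on a, so a one-dimensional root search gives a, and b then follows from the median condition. This requires MTBF50 > MTBF05.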
Gamma prior method 3: weak prior a = 1
3. Obtain consensus on a reasonable expected MTBF, called MTBF50. Next, however, the group decides they want a weak prior that will change rapidly based on new test data. If the prior parameter a is set to 1, the gamma has a standard deviation equal to its mean, which makes it spread out, or “weak”.
To set the 50th percentile, we must choose b = ln 2 × MTBF50.
Note: During planning of Bayesian tests, this weak prior is actually a very friendly prior in terms of saving test time.
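Both claims about the weak prior follow from the gamma formulas. With a = 1 the gamma pdf reduces to an exponential density in λ, whose mean and standard deviation coincide, and whose median fixes b:

$$g(\lambda;1,b) = b\,e^{-b\lambda},\qquad \lambda_{\text{ave}} = \sigma = \frac{1}{b}$$

$$G(\lambda_{50};1,b) = 1 - e^{-b\lambda_{50}} = 0.5 \;\Rightarrow\; \lambda_{50} = \frac{\ln 2}{b}$$

Setting $\lambda_{50} = 1/\mathrm{MTBF}_{50}$ gives $b = \ln 2 \times \mathrm{MTBF}_{50}$.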
Special Case: a = 1 (The "Weak" Prior)
When the prior is a weak prior with a = 1, there is a very simple way to calculate the required Bayesian test time. First calculate the classical (frequentist) test time; call this Tc. The Bayesian test time is then T = Tc − b. If the b parameter was set equal to (ln 2) × MTBF50 (where MTBF50 is the consensus choice for an “even money” MTBF), then T = Tc − (ln 2) × MTBF50.
Because b > 0, the Bayesian test time with a weak prior is always less than the corresponding classical test time. That is why this prior is also known as a friendly prior.
This prior essentially sets the “order of magnitude” for the MTBF.
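The weak-prior shortcut can be checked numerically. This Python sketch assumes the standard classical relation for a time-terminated test, Tc = M × χ²(1−α; 2r+2)/2, written here as Tc = M × G⁻¹(1−α; r+1, 1); because r+1 is an integer, the gamma CDF has a closed Poisson-sum form and no special-function library is needed. Function names are hypothetical; the inputs match the M = 500 hrs, r = 2, 80% confidence, MTBF50 = 600 hrs example later in this webinar:

```python
import math

def gamma_cdf_int(k, x):
    """G(x; k, 1) for integer shape k: 1 - sum_{n<k} exp(-x) * x**n / n!."""
    return 1 - sum(math.exp(-x) * x ** n / math.factorial(n) for n in range(k))

def inv_gamma_int(p, k):
    """x with G(x; k, 1) = p for integer k, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(k, hi) < p:
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(k, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def classical_test_time(M, r, alpha):
    """Tc = M * chi2_{1-alpha}(2r + 2) / 2 = M * G^{-1}(1 - alpha; r + 1, 1)."""
    return M * inv_gamma_int(1 - alpha, r + 1)

def weak_prior_test_time(M, r, alpha, mtbf50):
    """Bayesian test time with a = 1 and b = ln 2 * MTBF50: T = Tc - b."""
    return classical_test_time(M, r, alpha) - math.log(2) * mtbf50

tc = classical_test_time(500, 2, 0.2)        # about 2140 hrs
t = weak_prior_test_time(500, 2, 0.2, 600)   # about 1724 hrs
print(round(tc, 1), round(t, 1))
```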
Remarks
Many variations are possible based on the above three methods. For example, you might have prior data from sources with various levels of applicability or suitability relative to the system under investigation. You may then decide to “weight” the prior data by 0.5 to “weaken” it. This can be implemented by setting a = 0.5 × the number of failures in the prior data and b = 0.5 × the number of test hours. That spreads out the prior distribution and lets freshly accumulated test data influence it more quickly. Most importantly, the prior distribution needs to be technically credible, knowledge-based, and unbiased.
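The effect of the 0.5 weighting can be made concrete: scaling a and b by the same factor leaves the prior mean failure rate a/b unchanged but raises the coefficient of variation 1/√a, so the prior is more spread out. A minimal sketch with hypothetical prior data (6 failures in 3000 hours) and hypothetical function names:

```python
import math

def weighted_prior(failures, hours, weight):
    """Discounted prior: a = weight * failures, b = weight * hours (0 < weight <= 1)."""
    if not 0 < weight <= 1:
        raise ValueError("weight must be in (0, 1]")
    return weight * failures, weight * hours

def mean_and_cv(a, b):
    """Prior mean failure rate a/b and coefficient of variation sd/mean = 1/sqrt(a)."""
    mean = a / b
    sd = math.sqrt(a) / b
    return mean, sd / mean

full = mean_and_cv(*weighted_prior(6, 3000.0, 1.0))
half = mean_and_cv(*weighted_prior(6, 3000.0, 0.5))
print(full, half)   # same mean; CV grows by a factor of sqrt(2)
```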
Weibull Example
What if we know the failure rate isn’t constant?
[Figure: joint posterior g(λ, k | {t_i}) over the Weibull parameters λ and k, with panels for the TTF pdf and TTF CDF and censored data]
Weibull Continued
• If scale θ is unknown and shape β is known
• If scale θ is known and shape β is unknown
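For the first case, a standard result makes the gamma machinery reusable. Assuming the parametrization F(t) = 1 − exp(−(t/θ)^β) and defining λ = θ^(−β), the transformed times t^β are exponential with rate λ, so a gamma prior on λ updates exactly as before, with T replaced by the sum of t^β over failed and censored units. This Python sketch uses a hypothetical function name and does not cover the unknown-shape case, which typically calls for numerical evaluation of the joint posterior:

```python
def weibull_known_shape_update(a, b, beta, failure_times, censored_times):
    """Gamma(a, b) prior on lam = theta**(-beta) with Weibull shape beta known.

    Since F(t) = 1 - exp(-lam * t**beta), the transformed times t**beta are
    exponential with rate lam, so the exponential/gamma conjugate update
    applies with the "exposure" sum(t**beta) in place of total hours T.
    """
    r = len(failure_times)
    exposure = sum(t ** beta for t in failure_times) + sum(t ** beta for t in censored_times)
    return a + r, b + exposure

# With beta = 1 this reduces to a' = a + r, b' = b + T
print(weibull_known_shape_update(1.0, 100.0, 1.0, [10.0, 20.0], [30.0]))
print(weibull_known_shape_update(1.0, 100.0, 2.0, [10.0, 20.0], [30.0]))
```

Note that b then carries units of time^β rather than time.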
“Knowledge as an accelerant”
Bayesian Test Planning
Given: gamma prior parameters a and b, and a stated MTBF objective M.
Goal: Confirm that the system has an MTBF of at least M at the 100×(1−α)% confidence level. Pick a maximum number of failures, r, allowed during the test.
Compute a test time T such that we can endure r failures and still “pass” the test. The posterior gamma distribution will have (worst case, assuming exactly r failures) new parameters of
a' = a + r, and b' = b + T
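Passing the test means the failure rate $\lambda_{1-\alpha}$, the upper 100×(1−α) percentile of the posterior gamma, has to equal the target failure rate 1/M. By definition, this is the inverse CDF $G^{-1}(1-\alpha;\, a', b')$. Because $G(\lambda; a, b) = G(b\lambda; a, 1)$, this percentile equals $G^{-1}(1-\alpha;\, a+r,\, 1)/(b+T)$; setting it equal to 1/M gives the required test time:
$$T = M\,G^{-1}(1-\alpha;\, a+r,\, 1) - b$$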
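The planning formula can be evaluated without Excel. A minimal Python sketch, assuming no statistics library: G⁻¹(·; a+r, 1) is found by bisection on a series/continued-fraction evaluation of the regularized incomplete gamma function. Function names are hypothetical; the usage reproduces the example that follows (M = 500 hrs, α = 0.2, r = 2, with the 50/95 prior a = 2.863, b = 1522.46 hrs, and with a weak prior a = 1, b = ln 2 × 600 hrs):

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x), i.e. G(x; a, 1)."""
    if x <= 0:
        return 0.0
    log_front = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        term = total = 1.0 / a          # series expansion
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / n
            total += term
        return total * math.exp(log_front)
    tiny = 1e-300                        # continued fraction for Q = 1 - P
    bb = x + 1 - a
    c, d = 1 / tiny, 1 / bb
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        bb += 2
        d = an * d + bb
        d = tiny if abs(d) < tiny else d
        c = bb + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return 1 - math.exp(log_front) * h

def inv_gamma_std(p, a):
    """x such that G(x; a, 1) = p, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_p(a, hi) < p:
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_p(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_test_time(M, alpha, r, a, b):
    """T = M * G^{-1}(1 - alpha; a + r, 1) - b.

    A negative result would mean the prior alone already meets the goal.
    """
    return M * inv_gamma_std(1 - alpha, a + r) - b

t_5095 = bayes_test_time(500, 0.2, 2, 2.863, 1522.46)        # about 1756 hrs
t_weak = bayes_test_time(500, 0.2, 2, 1, math.log(2) * 600)  # about 1724 hrs
print(round(t_5095), round(t_weak))
```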
Example: 50/95 Method Prior
A group of engineers, discussing the reliability of a new piece of equipment, decide to use the 50/95 method to convert their knowledge into a Bayesian gamma prior. Consensus is reached on a likely MTBF50 value of 600 hrs and a low MTBF05 value of 250 hrs.
Corresponding parameters solved:
a = 2.863
b = 1522.46 hrs
These prior parameters “pre-load” the failure rate distribution:
50% probability that λ < 1/600 = 1.67e-3 hrs⁻¹
95% probability that λ < 1/250 = 4.00e-3 hrs⁻¹
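The a and b values above can be reproduced numerically. This sketch, assuming no statistics library, solves the two percentile conditions of the 50/95 method: the ratio of the 95th to the 50th percentile of a standard gamma must equal MTBF50/MTBF05, and b is then MTBF50 times the median. It assumes that ratio decreases as a grows (larger a means a tighter prior) and brackets a in [0.1, 100]; function names are hypothetical:

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x) = G(x; a, 1)."""
    if x <= 0:
        return 0.0
    log_front = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        term = total = 1.0 / a          # series expansion
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / n
            total += term
        return total * math.exp(log_front)
    tiny = 1e-300                        # continued fraction for Q = 1 - P
    bb = x + 1 - a
    c, d = 1 / tiny, 1 / bb
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        bb += 2
        d = an * d + bb
        d = tiny if abs(d) < tiny else d
        c = bb + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return 1 - math.exp(log_front) * h

def inv_gamma_std(p, a):
    """x such that G(x; a, 1) = p, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_p(a, hi) < p:
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_p(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_50_95(mtbf50, mtbf05, p_low=0.95):
    """Gamma (a, b) with median lam = 1/mtbf50 and P(lam < 1/mtbf05) = p_low."""
    target = mtbf50 / mtbf05             # must exceed 1
    lo, hi = 0.1, 100.0                  # assumed bracket for the shape a
    for _ in range(60):
        mid = math.sqrt(lo * hi)         # bisection on a log scale
        ratio = inv_gamma_std(p_low, mid) / inv_gamma_std(0.5, mid)
        if ratio > target:               # prior too spread out: increase a
            lo = mid
        else:
            hi = mid
    a = math.sqrt(lo * hi)
    return a, mtbf50 * inv_gamma_std(0.5, a)

a, b = solve_50_95(600, 250)
print(round(a, 3), round(b, 2))          # compare with a = 2.863, b = 1522.46
```

Passing p_low=0.90 with an MTBF10 value gives the 50/90 variant.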
Example: Bayesian Optimization
System has an MTBF requirement of M = 500 hrs at 80% confidence (α = 0.2).
Find the test time needed to prove MTBF ≥ 500 hrs with 80% confidence, provided the system suffers no more than two failures (r = 2).
Obtain $T = (500\ \text{hrs}) \times G^{-1}(1-0.2;\ 2.863+2,\ 1) - 1522.46\ \text{hrs} = 1756\ \text{hrs}$
If the test then runs for 1756 hrs with no more than two failures, an MTBF of at least 500 hrs has been confirmed at 80% confidence.
The classical (non-Bayesian) test time required would have been 2140 hrs.
The Bayesian test saves about 384 hrs, an 18% $avings.
If, instead, a weak prior had been chosen with the same 600 hr MTBF50, the required test time would have been 1724 hrs, a savings of roughly 416 hrs, a 19% time $avings vs. non-Bayesian.
Post-Test Analysis
EXCEL SPREADSHEET EXAMPLES
Yes, the Excel spreadsheet will be available along with webinar slides.
“do it live”
final quick poll