Research Article
Defining Sample Quantiles by the True Rank Probability

Lasse Makkonen¹ and Matti Pajari²

¹ VTT Technical Research Centre of Finland, 02044 Espoo, Finland
² Berakon, Espoo, Finland

Correspondence should be addressed to Lasse Makkonen; lasse.makkonen@vtt.fi

Received 30 June 2014; Revised 10 November 2014; Accepted 11 November 2014; Published 8 December 2014

Academic Editor: Z. D. Bai

Hindawi Publishing Corporation, Journal of Probability and Statistics, Volume 2014, Article ID 326579, 6 pages, http://dx.doi.org/10.1155/2014/326579

Copyright © 2014 L. Makkonen and M. Pajari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many definitions exist for sample quantiles and are included in statistical software. The need to adopt a standard definition of sample quantiles has been recognized, and different definitions have been compared in terms of satisfying some desirable properties, but no consensus has been found. We outline here that comparisons of the sample quantile definitions are irrelevant because the probabilities associated with order-ranked sample values are known exactly. Accordingly, the standard definition for sample quantiles should be based on the true rank probabilities. We show that this allows more accurate inference of the tails of the distribution and thus improves estimation of the probability of extreme events.

1. Introduction

The quantile of a continuous, strictly monotonic distribution function $F$ is defined as

$$Q(p) = F^{-1}(p), \tag{1}$$

where $p$ is the probability of nonexceedance of a variable value. When the distribution is unknown, sample quantiles provide estimators of their population counterparts based on a set of independent order-ranked observations $X_1, \ldots, X_n$. The associated sample probabilities are then $p_1, \ldots, p_n$, where $p_m$ is the probability of a new sampled value $X_{n+1}$ being less than or equal to $X_m$. These nonexceedance probabilities are those defining the cumulative distribution function (CDF).

Many different formulas for defining sample quantiles have been used in the literature and in statistical software. This has caused considerable confusion, in particular when performing extreme value analysis for various applications where probabilities of rare events need to be estimated. In a widely cited article, Hyndman and Fan [1] identified this problem and emphasized that there is a need to adopt a standard definition for sample quantiles. The same problem was discussed again by Langford [2], who identified twelve different sample quantile definitions that are used in statistical software.

Hyndman and Fan [1] analysed nine different sample quantile definitions. They selected six "desirable properties" for an estimator of a sample quantile and considered how well different definitions satisfy them. This approach is similar to judging the plotting position estimators by five "postulates," as done by Gumbel [3], and by three "purposes," as done by Kimball [4]. Hyndman and Fan [1] proposed $p_m = (m - 1/3)/(n + 1/3)$ to be used as the basis of the standard definition.

However, the definition of the quantile has not yet been standardized. Modern statistical software, such as Matlab, Excel, SciPy, STATA, G, and R, includes different definitions and offers user-selected options for the formulation of the quantile function, as well as for plotting positions in quantile plots and quantile-quantile plots; see, for example, Castillo-Gutiérrez et al. [5]. The inability to agree on a standard definition has arisen from the many proposals [6–8] and the subjective nature of the "criteria" and "desired properties."

Since the quantile function is the inverse of the cumulative distribution function, the quality of its definition must be judged by how close the probabilities defined by it are to the true probabilities of the cumulative distribution function. Thus, the definition of a sample quantile function should be based on the true nonexceedance probabilities. It was pointed out by Makkonen [9] that for order-ranked data they are known exactly. We outline here two rigorous proofs of this conclusion and show how the appropriate definition for the sample quantile function follows from it.

Figure 1: Re-ranking when adding a new sample. (a) Order-ranked sample $X_1, \ldots, X_k, X_{k+1}, \ldots, X_n$ and the new observation $X_{n+1}$. (b) New sample obtained by including $X_{n+1}$.

2. Sample Probabilities

We present in the following two deductions of the probability $p_m$. Consider in Figure 1 an order-ranked sample (a) of $n$ random observations (white circles) and a new observation (grey circle) sampled randomly from the population, the distribution of which is unknown. In the new sample (b), obtained by including the new observation, the new value may fall in any interval of the original sample or be smaller than $X_1$ or larger than $X_n$. In the sample, each observation $X_1, \ldots, X_{n+1}$ has the same probability $1/(n+1)$ to be the smallest one. In particular, $P(X_{n+1} \text{ is smallest}) = P(X_{n+1} \le X_1) = 1/(n+1)$. In the same way, each observation $X_1, \ldots, X_{n+1}$ has the same probability to be the $m$th in order, where $m = 2, \ldots, n$. In particular, $P(X_{m-1} < X_{n+1} \le X_m) = 1/(n+1)$. Consequently, the probability of the value $X_{n+1}$ to be smaller than or equal to the $m$th value of the original sample equals $m/(n+1)$.
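The counting argument above can be checked numerically. The following is a minimal Python sketch, not part of the original article: it assumes a standard normal parent distribution (any continuous distribution should behave the same way), and the helper name `rank_probability` is hypothetical. It estimates $P(X_{n+1} \le X_m)$ by simulation so that the result can be compared with $m/(n+1)$.

```python
import random


def rank_probability(n, m, trials=20000, seed=1):
    """Estimate P(X_{n+1} <= X_m), where X_m is the m-th smallest of n values.

    Assumption for illustration only: the parent distribution is standard normal.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Order-ranked original sample X_1 <= ... <= X_n.
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        # New observation X_{n+1} drawn from the same population.
        new_value = rng.gauss(0.0, 1.0)
        if new_value <= sample[m - 1]:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 9
    for m in (1, 5, 9):
        # Prints the simulated frequency next to the rank probability m/(n+1).
        print(m, rank_probability(n, m), m / (n + 1))
```

With a finite number of trials the simulated frequencies only approximate $m/(n+1)$; the agreement should tighten as `trials` grows.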

The result deduced above can also be derived by formal mathematics [10]. Consider variate $x$ with cumulative distribution function $F$ and a sample of $n$ observations $X_1, \ldots, X_n$ ranked in ascending order. Values $X_m$ in different samples of size $n$ are random values of variate $x_m$, for which the probability density function $f_m$, in terms of $F$ and its derivative $F' = f$, is given by [3, 11]

$$f_m(x_m) = m \binom{n}{m} F(x_m)^{m-1} \left[1 - F(x_m)\right]^{n-m} f(x_m). \tag{2}$$

We wish to associate a probability $p_m = P\{x \le x_m\}$ with each observed rank $m$. The precise meaning of $p_m$ is illustrated in Figure 2. Since the probability of event $x \le x_m$ is controlled by two variates, $x_m$ and $x$, the probability is obtained by integrating the joint density function $f_{x_m x}$ of the variates $x_m$ and $x$ over the area where $x \le x_m$. Due to the mutual independence of $x_m$ and $x$, their joint density function $f_{x_m x}(x_m, x)$ equals $f_m(x_m) f(x)$, where $f_m$ and $f = F'$ are the density functions of $x_m$ and $x$, respectively.

Figure 2: Contours of joint density function $f_{x_m x}$ drawn in the $x_m$-$x$ plane. $f_{x_m x}$ integrated over the grey half plane where $x \le x_m$ gives the probability $P\{x \le x_m\}$.

The nonexceedance probability $P\{x \le x_m\}$ is therefore obtained by integration of the joint density function over zone $x \le x_m$:

$$
\begin{aligned}
P\{x \le x_m\}
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{x_m} f_m(x_m)\, f(x)\, dx\, dx_m \\
&= \int_{-\infty}^{+\infty} \left[\int_{-\infty}^{x_m} f(x)\, dx\right] f_m(x_m)\, dx_m \\
&= \int_{-\infty}^{+\infty} F(x_m)\, f_m(x_m)\, dx_m \\
&= \int_{-\infty}^{+\infty} F(x_m)\, m \binom{n}{m} F(x_m)^{m-1} \left(1 - F(x_m)\right)^{n-m} f(x_m)\, dx_m \\
&= m \binom{n}{m} \int_0^1 F^m (1 - F)^{n-m}\, dF \\
&= \cdots = \frac{m}{n+1}.
\end{aligned}
\tag{3}
$$

The last step in the deduction above is based on direct application of Euler's $\beta$-function.
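The step abbreviated by the dots in (3) can be written out as follows. This is a sketch of the standard evaluation, using the Euler beta function $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt = (a-1)!\,(b-1)!/(a+b-1)!$ for positive integers $a$ and $b$; the symbol $B$ is introduced here for illustration only and does not appear in the article.

```latex
\begin{aligned}
m\binom{n}{m}\int_0^1 F^{m}(1-F)^{n-m}\,dF
  &= m\binom{n}{m}\,B(m+1,\,n-m+1) \\
  &= m\,\frac{n!}{m!\,(n-m)!}\cdot\frac{m!\,(n-m)!}{(n+1)!} \\
  &= \frac{m}{n+1}.
\end{aligned}
```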

3. Plotting Positions

One might expect that, based on (3) and the definition of the CDF, $m/(n+1)$ would have been the sample probability used by everybody. Unfortunately, this is not so, as discussed, for example, in the reviews [12, 13]. Even though $m/(n+1)$ was recommended already by Weibull [7] and has been used by numerous researchers since the 1950s, there has been a lot of research aiming at "improving" the probability by using probability estimates, called plotting positions, of the form $(m + a)/(n + b)$, where $a$ and $b$ are constants depending on the type of parent distribution, size of the sample, and so forth. These attempts include, for example, Bénard and Bos-Levenbach [14], Blom [6], Langbein [15], Gringorten [8], Wilk and Gnanadesikan [16], Barnett [17], Cunnane [18], Guo [19], Jones [20], Yu and Huang [21], and Folland and Anderson [22]. Adding to this complexity, even numerical methods to calculate plotting positions have been proposed [23].
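To make the family $(m + a)/(n + b)$ concrete, the following minimal Python sketch tabulates a few of its members. The Weibull, Blom, and Hyndman and Fan constants follow the formulas quoted in this article; the Gringorten and Cunnane constants are the commonly quoted values and are an assumption here, since the article does not state them. The names `PLOTTING_CONSTANTS` and `plotting_positions` are hypothetical.

```python
from fractions import Fraction

# (a, b) in p_m = (m + a) / (n + b).
# "weibull", "blom" and "hyndman_fan" follow the formulas quoted in the text.
# "gringorten" and "cunnane" use commonly quoted constants (an assumption:
# the article itself does not give them).
PLOTTING_CONSTANTS = {
    "weibull": (Fraction(0), Fraction(1)),
    "blom": (Fraction(-3, 8), Fraction(1, 4)),
    "hyndman_fan": (Fraction(-1, 3), Fraction(1, 3)),
    "gringorten": (Fraction(-44, 100), Fraction(12, 100)),
    "cunnane": (Fraction(-2, 5), Fraction(1, 5)),
}


def plotting_positions(n, rule="weibull"):
    """Return the exact fractions (m + a)/(n + b) for m = 1, ..., n."""
    a, b = PLOTTING_CONSTANTS[rule]
    return [(m + a) / (n + b) for m in range(1, n + 1)]


if __name__ == "__main__":
    for rule in PLOTTING_CONSTANTS:
        # Show the positions for a small sample so the rules can be compared.
        print(rule, [round(float(p), 4) for p in plotting_positions(5, rule)])
```

Exact fractions are used only so that the rules can be compared without rounding noise; floats would serve equally well in practice.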

The main reason for this trend appears to be a misunderstanding of the role of the mean value, both in theoretical considerations and when interpreting the results of Monte Carlo simulations. This is shown in the Appendix, in which the ideal performance of the Weibull positions is also demonstrated in one special case using the bin frequency criterion introduced by Makkonen et al. [24]. According to the bin frequency criterion, the method of least squares (MLS) using the Weibull probabilities gives a better distribution function than MLS using any of the plotting positions of Cunnane, Gringorten, and Blom.

4. Sample Quantiles

It is apparent that, even if there may be some reasons, for example, bias in the MLS, to use the linear transformations of $m/(n+1)$, that is, the plotting positions of the type of Cunnane, Blom, Gringorten, Hyndman and Fan, and so forth, in curve fitting, none of them is valid when the sample probabilities are plotted. The definition of a sample quantile does not need to be based on any "estimator" of $p_m$. It can be defined objectively based on the true nonexceedance probability $p_m = m/(n+1)$ of a new value drawn from the population. For variable values in between the observed values, one may interpolate linearly. Accordingly, we define the following.

For $m = 1, \ldots, n$,

$$p_m = \frac{m}{n+1}. \tag{4}$$

For $m = 1, \ldots, n-1$ and $p_m \le p \le p_{m+1}$, we define

$$Q(p) = (n+1)\,(X_{m+1} - X_m)\,(p - p_m) + X_m. \tag{5}$$

The definition of the quantile function in (5) is illustrated in Figure 3.
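Definitions (4) and (5) translate directly into code. The following is a minimal Python sketch rather than a reference implementation: the function name `sample_quantile` is hypothetical, and raising an error for $p$ outside $[p_1, p_n]$ is an assumption made here, because (5) is only stated for $p_1 \le p \le p_n$.

```python
import math


def sample_quantile(sample, p):
    """Sample quantile Q(p) from (4) and (5): p_m = m/(n+1), linear in between.

    Assumption: p outside [1/(n+1), n/(n+1)] is rejected, since (5) does not
    define Q(p) there.
    """
    xs = sorted(sample)          # order-ranked observations X_1 <= ... <= X_n
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    if not 1 / (n + 1) <= p <= n / (n + 1):
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    # Rank m with p_m <= p <= p_{m+1}; clamped to 1..n-1 to guard against
    # floating-point rounding at the two end points.
    m = max(1, min(math.floor(p * (n + 1)), n - 1))
    p_m = m / (n + 1)
    # xs is 0-indexed, so X_m is xs[m - 1] and X_{m+1} is xs[m].
    return (n + 1) * (xs[m] - xs[m - 1]) * (p - p_m) + xs[m - 1]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # n = 9, as in Figure 3
    print(sample_quantile(data, 0.5))    # p_5 = 5/10, so this returns X_5 = 5.0
```

At $p = p_m$ the function returns the observed value $X_m$ itself, which is the property that ties the quantile function to the rank probabilities in (4).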

The definition of (5) has all the desired properties of Hyndman and Fan [1]. The huge advantage of (5) over all other suggestions for the sample quantile definition is that it is based on the true probabilities. This makes it possible to standardize the sample quantile function in statistical software on a sound theoretical basis instead of subjective criteria.

Figure 3: Illustration of the proposed sample quantile definition by an example where $n = 9$.

Figure 4: Relative error in the exceedance probability $1 - p_m$ resulting from the conventional definition [2] of the empirical distribution function for order-ranked observations $X_1, X_2, \ldots, X_n$, as a function of the sample size $n$, for different ranks $m$ (curves for $m = 1$, $m = 2$, $m = n/2$, $m = n - 2$, $m = n - 1$, and $m = n$).

Using any other definition, for example, $m/n$, promoted by Langford [2], as an estimate of the probability $p_m$ results in significant relative errors, particularly at the tails of the distribution. This is illustrated in Figure 4, showing the relative errors for different ranks $m$. The relative error in the exceedance probability $1 - p_m$ is defined as

$$\frac{(1 - m/n) - \left(1 - m/(n+1)\right)}{1 - m/(n+1)} = \cdots = -\frac{m}{n\,(n + 1 - m)}. \tag{6}$$
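The relative error in (6) is easy to evaluate exactly. The sketch below, with hypothetical function names, computes the left-hand side of (6) with exact fractions and compares it with the simplification $-m/[n(n+1-m)]$ obtained by putting both terms over a common denominator.

```python
from fractions import Fraction


def relative_error(m, n):
    """Left-hand side of (6): relative error in 1 - p_m when m/n replaces m/(n+1)."""
    conventional = 1 - Fraction(m, n)      # exceedance probability from m/n
    true = 1 - Fraction(m, n + 1)          # exceedance probability from m/(n+1)
    return (conventional - true) / true


def closed_form(m, n):
    """Simplified form -m / (n (n + 1 - m)) of the same quantity."""
    return Fraction(-m, n * (n + 1 - m))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Lowest rank, middle rank, and the two highest ranks.
        for m in (1, n // 2, n - 1, n):
            print(n, m, float(relative_error(m, n)))
```

For the largest observation ($m = n$) the error equals $-1$ for every $n$, and for $m = n - 1$ it tends to $-1/2$, which is the persistence at the upper tail that the text refers to.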

Note that at the upper tail of the distribution the relative error remains large even when the sample size $n$ goes to infinity. The error of using $m/n$, or some other improper estimate of the probability plotting positions, is particularly persistent in the extreme value analysis methods [13]. This confusion alone justifies the definition as proposed here.

5. Conclusions

The sample quantiles can be defined by the true nonexceedance probabilities $m/(n+1)$ of the order-ranked sample values. Following the basic principles of the probability calculus, the sample quantile function can therefore be defined by (5).

This definition removes the methodological uncertainty related to calculating sample quantiles and should be adopted as a standard in statistical software.

The claim that the true nonexceedance probabilities, the so-called Weibull plotting positions, result in a biased estimate of a CDF is shown to be false and founded on a misunderstanding in theoretical considerations and when interpreting the results of Monte Carlo simulations.

The definition of the quantile function proposed here should, of course, be applied to its inverse function, the EDF, as well and used in the inference of data. This is particularly important in the extreme value analysis, where probabilities of rare events need to be estimated.

Appendix

Evaluating Plotting Positions

Consider a normally distributed variate $x$ with mean 0, standard deviation 1, and distribution function $F$. Take $n$ values $x_m = F^{-1}(m/(n+1))$, $m = 1, \ldots, n$. Assume that $G$ represents the function which transforms the probabilities to the probability paper; that is, all points $(x_m, G(p_m))$ fall on the same straight line, which represents the cumulative distribution function (CDF). The slope and intercept of the straight line are independent of $n$.

Plot next the points $(x_m, G(p'_m))$ on the same probability paper. Here, values $x_m$ are the same as those above, but the probabilities $p'_m$ represent Blom's [6] plotting positions $(m - 0.375)/(n + 0.25)$, which have been developed for the normal distribution. With increasing $n$, the resulting curve approaches the correct straight line, but they never coincide, as illustrated in Figure 5. In this way, we have detected samples which are correctly represented by the Weibull plotting positions and incorrectly by any other. Vice versa, by properly choosing plotting positions $(m + a)/(n + b)$ with $(a, b) \ne (0, 1)$, we get a curve which by a linear regression can be forced to compensate for any error somewhere else. This is exactly what has been done in the history of plotting positions. The reason for and the nature of the error are characterized in the following.
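The construction above can be reproduced in a few lines. The following Python sketch is an illustration under stated assumptions: it takes $G = F^{-1}$ of the standard normal distribution as the probability-paper transformation, so that the true CDF is the line $G(F(x)) = x$, and the helper name `max_deviation` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def max_deviation(n, a, b):
    """Largest vertical distance between the points (x_m, G((m + a)/(n + b)))
    and the true line on normal probability paper.

    Here x_m = F^{-1}(m/(n+1)) and G = F^{-1} for the standard normal, so a
    point lies on the true line exactly when G(p) equals x_m.
    """
    worst = 0.0
    for m in range(1, n + 1):
        x_m = STD_NORMAL.inv_cdf(m / (n + 1))
        g = STD_NORMAL.inv_cdf((m + a) / (n + b))
        worst = max(worst, abs(g - x_m))
    return worst


if __name__ == "__main__":
    for n in (10, 20, 50):   # sample sizes shown in Figure 5
        print(n, max_deviation(n, 0, 1), max_deviation(n, -0.375, 0.25))
```

With $(a, b) = (0, 1)$ the points fall on the line by construction, while Blom's constants leave a nonzero deviation for every sample size tried.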

First, the reasons why the Weibull [7] plotting positions have been objected to in the literature are discussed.

(1) Gumbel showed that the expected value of the probability $F(x_m)$ is $E(F(x_m)) = m/(n+1)$, and, for example, Langbein [15] argued that the probability of the next observation to fall in interval $(x_m, x_{m+1})$ is $1/(n+1)$, but these observations have not been regarded as rigorous justifications to use the Weibull positions. It is likely that the terminology used, for example, by Chow [25] (mean number of exceedances in $N$ future trials) and by Langbein [15] (mean value of exceedance probabilities) has not been understood as giving an ordinate on the CDF. A rigorous proof for $P(x \le x_m) = m/(n+1)$ was presented in the textbook by Madsen et al. [11], but this has not received much attention in the later research.

(2) The way of thinking for some researchers has been that the sample values are given and the probabilities associated with them are random, while the correct way is to think vice versa, that the sample probabilities are exact and the sample values associated with them are variates. Some others, for example, Benson [26], appear to have fully understood that the plotting positions different from those of Weibull are not probabilities. Nevertheless, for example, Cunnane [18] claimed that it is not necessary to use the true probabilities because the final result, that is, the resulting regression line, is decisive. Cunnane [18] tried to estimate the probability $F(E(x_m))$ and used it as the plotting position when determining the CDF. In other words, he used $P(x \le E(x_m))$ to represent $P(x \le x_m)$, although they are different concepts. It is not at all surprising that the Weibull positions corresponding to the latter probability are not representative of the former.

(3) In Monte Carlo simulations by the Weibull positions, the conventional curve fitting procedures, like the MLS, tend to result in parameter estimates the means of which do not coincide with the parameters of the distribution from which the samples are taken. It has been observed that the difference can be reduced by transforming the Weibull points linearly, that is, by replacing probabilities $m/(n+1)$ by $(m + a)/(n + b) = [(n+1)/(n+b)]\,[m/(n+1)] + a/(n+b)$. Geometrically, this simply means that, to improve the fit, the straight line on the probability paper resulting from linear regression is replaced by another straight line. In more detail, when points $(x_{mi}, m/(n+1))$ are replaced by points $(x_{mi}, (m+a)/(n+b))$, the linearity is also affected, as illustrated in Figure 5. This effect remains hidden, however, because the linear regression forces the fitted curve to be linear. The behaviour described above has often been explained by stating that the Weibull probabilities are "biased" or more "biased" than some other probabilities. This is misuse of terminology. The bias is defined as $E(\hat{a}) - a$, where $E(\hat{a})$ is the expected value of an estimator $\hat{a}$ determined from a sample and $a$ is the correct parameter value. The Weibull probabilities are exact values in the same way as $1/2$ is the probability of heads when tossing a coin. There is no need for estimation here.

Figure 5: CDF (straight line) of a normal distribution on probability paper and the curves due to using Blom's plotting positions [6] instead of the true probabilities. Sample sizes 10, 20, and 50. Axes: $x$ (horizontal) and $G(p)$ (vertical).

In contrast to the abovementioned arguments against the Weibull positions, the bias in the parameters resulting from the traditional Monte Carlo simulations by the Weibull positions is in fact attributable to taking the mean of the parameter estimates and to the curve fitting method. From mathematics we know that if $p$ and $q$ are nonlinearly related as $p = g(q)$, it follows that, in general, $E(p) \ne g(E(q))$. From elementary statistics we know that the sample variance is a biased estimate of the population variance. Why should we then believe that the mean of standard deviations, or of any other distribution parameters, obtained from successive samples $(x_{1i}, x_{2i}, \ldots, x_{ni})$ would approach the parameter of the population? On probability paper, the slope of the regression line represents $dx/d(G(P))$, where $G$ stands for a proper nonlinear transformation of the probability axis. Thus, there is no a priori reason to expect that a mean of sample slopes in MC simulations presents something relevant in the probabilistic sense.
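The point that $E(g(q)) \ne g(E(q))$ for nonlinear $g$ can be seen in a small simulation. The sketch below is an illustration, not the authors' experiment: it assumes a standard normal parent distribution, samples of size 2, and the divisor $n - 1$ in the variance, and the function names are hypothetical. With that divisor the mean of the sample variances is close to the population variance 1, yet the mean of their square roots settles near $\sqrt{2/\pi} \approx 0.80$ rather than 1.

```python
import math
import random


def sample_variance(values):
    """Sample variance with divisor n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def mean_of_estimates(n=2, trials=50000, seed=1):
    """Return (mean of sample variances, mean of sample standard deviations).

    Assumption for illustration only: standard normal parent distribution.
    """
    rng = random.Random(seed)
    var_sum = 0.0
    sd_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s2 = sample_variance(sample)
        var_sum += s2
        sd_sum += math.sqrt(s2)   # nonlinear transformation of the estimate
    return var_sum / trials, sd_sum / trials


if __name__ == "__main__":
    mean_var, mean_sd = mean_of_estimates()
    print("mean of variances:", mean_var)
    print("mean of standard deviations:", mean_sd)
```

Each individual sample estimate is what it is; only the averaging over many samples, combined with the nonlinear square root, produces the apparent offset.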

Consequently, the convergence of the mean of successive parameter estimates to the correct parameter value cannot be regarded as a goodness criterion for plotting positions. We should use a criterion based on the bin frequency instead, because it is the frequency by which probability is defined. From a parent distribution with given parameters, take a sample of size $n$, find the estimated straight line $G(P) = kx + c$, take from the parent distribution one additional random value $x_{n+1}$, record the bin $[0, 1/(n+1)], (1/(n+1), 2/(n+1)], \ldots, (n/(n+1), 1]$ to which $P_{n+1} = G^{-1}(k x_{n+1} + c)$ belongs, and repeat the steps. A uniform distribution of hits to each bin means that the method has been successful. The fit on each bin $((m-1)/(n+1), m/(n+1)]$ can be considered separately using the criterion

$$C_m^2 = \left(\frac{N_m}{N} - \frac{1}{n+1}\right)^2 \tag{A.1}$$

or the whole distribution by

$$C^2 = \sum_{m=1}^{n+1} C_m^2. \tag{A.2}$$

Here $N$ is the number of simulations and $N_m$ is the number of hits to bin $m$.
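The procedure above can be sketched for the simplest case of sample size 2, where the fitted straight line passes through the two plotted points and no regression is needed. The following Python sketch is an illustration under stated assumptions, not the authors' code: the Gumbel parameters are the exact values listed in Table 1, $G(P) = -\ln(-\ln P)$ is taken as the Gumbel probability-paper transformation, the Gringorten constants $(a, b) = (-0.44, 0.12)$ are the commonly quoted ones rather than values given in the text, and all function names are hypothetical.

```python
import math
import random

MU, BETA = 4.099918, 1.559394   # exact Gumbel parameters from Table 1


def gumbel_sample(rng):
    """Draw one value from F(x) = exp(-exp(-(x - MU)/BETA)) by inversion."""
    u = rng.random()
    while u <= 0.0:              # random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return MU - BETA * math.log(-math.log(u))


def reduced_variate(p):
    """G(P) for Gumbel probability paper."""
    return -math.log(-math.log(p))


def bin_criterion(a, b, trials=20000, seed=1):
    """Bin frequencies and C^2 from (A.2) for positions (m + a)/(n + b), n = 2."""
    n = 2
    rng = random.Random(seed)
    # Bin boundaries 1/(n+1), ..., n/(n+1) mapped onto the G axis. Comparing on
    # the G axis is equivalent to comparing P_{n+1} with the boundaries because
    # G is increasing, and it avoids overflow in G^{-1} for steep fitted lines.
    thresholds = [reduced_variate(j / (n + 1)) for j in range(1, n + 1)]
    g1 = reduced_variate((1 + a) / (n + b))
    g2 = reduced_variate((2 + a) / (n + b))
    hits = [0] * (n + 1)
    done = 0
    while done < trials:
        x1, x2 = sorted(gumbel_sample(rng) for _ in range(n))
        if x1 == x2:
            continue             # degenerate sample: no line can be fitted
        k = (g2 - g1) / (x2 - x1)            # slope of G(P) = k x + c
        c = g1 - k * x1                      # intercept
        y_new = k * gumbel_sample(rng) + c   # G(P_{n+1}) of the extra value
        m = 1 + sum(1 for t in thresholds if y_new > t)   # bin number 1..n+1
        hits[m - 1] += 1
        done += 1
    freqs = [h / trials for h in hits]
    c2 = sum((f - 1 / (n + 1)) ** 2 for f in freqs)
    return freqs, c2


if __name__ == "__main__":
    print("Weibull   ", bin_criterion(0, 1))
    print("Gringorten", bin_criterion(-0.44, 0.12))
```

In this sketch the same seed is used for both rules, so they are judged on identical simulated samples; a smaller $C^2$ indicates bin frequencies closer to uniform.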

Such an analysis was made by Makkonen et al. [24]. The simulations verified that the Weibull positions give the most accurate estimate in the sense of criterion (A.2) for all considered distributions, that is, for Gumbel, Weibull, normal, and lognormal distributions. Another simulation for the Gumbel distribution $F(x) = \exp(-\exp(-(x - \mu)/\beta))$ with mean = 5 and standard deviation = 2 was carried out using the Weibull and Gringorten plotting positions, and its results are presented in Figure 6. Sample size 2 was chosen to eliminate the possible bias due to the linear regression. The results confirm the performance of the Weibull positions in the same way as a nearly uniform distribution in numbers $1, \ldots, 6$ confirms that the die is fair. The erroneous deduction based on calculating the mean of the estimated distribution parameters $\mu$ and $\beta$ results in a traditional (and incorrect) conclusion that the Weibull positions are worse than those proposed by Gringorten [8]. The source of this misunderstanding is demonstrated in Table 1.

Figure 6: Relative frequency of hits in probability intervals 1, 2, and 3 ($[0, 1/3]$, $(1/3, 2/3]$, and $(2/3, 1]$) when using Weibull's [7] and Gringorten's [8] plotting positions in Monte Carlo simulations with 50000 cycles. Series shown: Exact, Weibull, Gringorten.

Table 1: Parameters obtained for Gumbel distribution from MC simulation with 50000 cycles. These parameters cannot be used in evaluating plotting positions (see text).

                      Exact parameter   Estimated parameter
                                        Weibull    Gringorten
Mean                  5                 5.381      5.196
Standard deviation    2                 2.787      1.893
μ                     4.099918          4.127      4.345
β                     1.559394          2.173      1.476

The discussion above shows that all the claims presented against the Weibull plotting positions are unfounded. Particularly, the recent Monte Carlo simulations supporting these claims [27–33] are based on a misunderstood role of the mean of the sample parameters. The performance of a single fitted curve should always be compared with the Weibull probabilities plotted against the observed order-ranked values. The plotting positions different from those of Weibull should not be used to eliminate the error observed when a mean of the sample estimates is taken in Monte Carlo simulations, because such an error never occurs in a practical situation where we have only one sample and one estimate for each parameter.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Academy of Finland via the FICCA programme.

References

[1] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[2] E. Langford, "Quartiles in elementary statistics," Journal of Statistics Education, vol. 14, no. 3, 2006.

[3] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.

[4] B. F. Kimball, "On the choice of plotting positions on probability paper," Journal of the American Statistical Association, vol. 55, pp. 546–560, 1960.

[5] S. Castillo-Gutiérrez, E. Lozano-Aguilera, and M. D. Estudillo-Martínez, "Selection of a plotting position for a normal Q-Q plot. R script," Journal of Communication and Computer, vol. 9, pp. 243–250, 2012.

[6] G. Blom, Statistical Estimates and Transformed Beta-Variables, John Wiley & Sons, 1958.

[7] W. Weibull, "A statistical theory of strength of materials," Ingeniorsvetenskapakademiens Handlingar, vol. 151, 45 pages, 1939.

[8] I. I. Gringorten, "A plotting rule for extreme probability paper," Journal of Geophysical Research, vol. 68, no. 3, pp. 813–814, 1963.

[9] L. Makkonen, "Bringing closure to the plotting position controversy," Communications in Statistics - Theory and Methods, vol. 37, no. 3–5, pp. 460–467, 2008.

[10] L. Makkonen, M. Pajari, and M. Tikanmäki, "Closure to 'Problems in the extreme value analysis' (Struct. Safety 2008:30:405–419)," Structural Safety, vol. 40, pp. 65–67, 2013.

[11] H. O. Madsen, S. Krenk, and N. C. Lind, Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, NJ, USA, 1986.

[12] H. L. Harter, "Another look at plotting positions," Communications in Statistics - Theory and Methods, vol. 13, no. 13, pp. 1613–1633, 1984.

[13] L. Makkonen, "Problems in the extreme value analysis," Structural Safety, vol. 30, pp. 405–419, 2008.

[14] A. Bénard and E. C. Bos-Levenbach, "The plotting of observations on probability paper," Statistica, vol. 7, pp. 163–173, 1953.

[15] W. B. Langbein, "Plotting positions in frequency analysis," US Geological Survey Water-Supply Paper 1543-A, 1960.

[16] M. B. Wilk and R. Gnanadesikan, "Probability plotting methods for the analysis of data," Biometrika, vol. 55, no. 1, pp. 1–17, 1968.

[17] V. Barnett, "Probability plotting methods and order statistics," Journal of the Royal Statistical Society C: Applied Statistics, vol. 24, no. 1, pp. 95–108, 1975.

[18] C. Cunnane, "Unbiased plotting positions - a review," Journal of Hydrology, vol. 37, no. 3-4, pp. 205–222, 1978.

[19] S. L. Guo, "A discussion on unbiased plotting positions for the general extreme value distribution," Journal of Hydrology, vol. 121, pp. 33–44, 1990.

[20] D. A. Jones, "Plotting positions via maximum-likelihood for a non-standard situation," Hydrology and Earth System Sciences, vol. 1, no. 2, pp. 357–366, 1997.

[21] G.-H. Yu and C.-C. Huang, "A distribution free plotting position," Stochastic Environmental Research and Risk Assessment, vol. 15, no. 6, pp. 462–476, 2001.

[22] C. Folland and C. Anderson, "Estimating changing extremes using empirical ranking methods," Journal of Climate, vol. 15, pp. 2954–2960, 2002.

[23] R. I. Harris, "Gumbel re-visited - a new look at extreme value statistics applied to wind speeds," Journal of Wind Engineering and Industrial Aerodynamics, vol. 59, no. 1, pp. 1–22, 1996.

[24] L. Makkonen, M. Pajari, and M. Tikanmäki, "Discussion on 'Plotting positions for fitting distributions and extreme value analysis'," Canadian Journal of Civil Engineering, vol. 40, no. 9, pp. 927–929, 2013.

[25] V. T. Chow, Handbook of Applied Hydrology, McGraw-Hill, New York, NY, USA, 1964.

[26] M. A. Benson, "Plotting positions and economics of engineering planning," Journal of the Hydraulics Division, vol. 88, no. 6, pp. 57–71, 1962.

[27] R. I. Harris, "The accuracy of design values predicted from extreme value analysis," Journal of Wind Engineering and Industrial Aerodynamics, vol. 89, no. 2, pp. 153–164, 2001.

[28] F. Mehdi and J. Mehdi, "Determination of plotting position formula for the normal, log-normal, Pearson (III), log-Pearson (III) and Gumble distributional hypotheses using the probability plot correlation coefficient test," World Applied Sciences Journal, vol. 15, no. 8, pp. 1181–1185, 2011.

[29] N. J. Cook, "Rebuttal of 'Problems in the extreme value analysis'," Structural Safety, vol. 34, no. 1, pp. 418–423, 2012.

[30] A. S. Yahaya, M. N. Nor, N. R. M. Jali, N. A. Ramli, F. Ahmad, and A. Z. Ul-Saufie, "Determination of the probability plotting position for type I extreme value distribution," Journal of Applied Sciences, vol. 12, no. 14, pp. 1501–1506, 2012.

[31] A. S. Yahaya, C. S. Yee, N. A. Ramli, and F. Ahmad, "Determination of the best probability plotting position for predicting parameters of the Weibull distribution," International Journal of Applied Science and Technology, vol. 2, pp. 106–111, 2012.

[32] S. Kim, H. Shin, K. Joo, and J.-H. Heo, "Development of plotting position for the general extreme value distribution," Journal of Hydrology, vol. 475, pp. 259–269, 2012.

[33] M. Fuglem, G. Parr, and I. J. Jordaan, "Plotting positions for fitting distributions and extreme value analysis," Canadian Journal of Civil Engineering, vol. 40, no. 2, pp. 130–139, 2013.


2 Journal of Probability and Statistics

Figure 1: Re-ranking when adding a new sample. Panel (a) shows the order-ranked sample $X_1, \ldots, X_n$ and the new observation $X_{n+1}$; panel (b) shows the sample after $X_{n+1}$ has been included.

out by Makkonen [9] that, for order-ranked data, they are known exactly. We outline here two rigorous proofs of this conclusion and show how the appropriate definition for the sample quantile function follows from it.

2. Sample Probabilities

We present in the following two deductions of the probability $p_m$.

Consider in Figure 1 an order-ranked sample (a) of $n$ random observations (white circles) and a new observation (grey circle) sampled randomly from the population, the distribution of which is unknown. In the new sample (b), obtained by including the new observation, the new value may fall in any interval of the original sample or be smaller than $X_1$ or larger than $X_n$. In the new sample, each observation $X_1, \ldots, X_{n+1}$ has the same probability $1/(n+1)$ to be the smallest one. In particular, $P(X_{n+1} \text{ is smallest}) = P(X_{n+1} \le X_1) = 1/(n+1)$. In the same way, each observation $X_1, \ldots, X_{n+1}$ has the same probability to be the $m$th in order, where $m = 2, \ldots, n$. In particular, $P(X_{m-1} < X_{n+1} \le X_m) = 1/(n+1)$. Consequently, the probability of the value $X_{n+1}$ to be smaller than or equal to the $m$th value of the original sample equals $m/(n+1)$.

The result deduced above can also be derived by formal mathematics [10]. Consider a variate $x$ with cumulative distribution function $F$ and a sample of $n$ observations $X_1, \ldots, X_n$ ranked in ascending order. The values $X_m$ in different samples of size $n$ are random values of the variate $x_m$, for which the probability density function $f_m$, in terms of $F$ and its derivative $F' = f$, is given by [3, 11]

$$f_m(x_m) = m \binom{n}{m} F(x_m)^{m-1} \left[1 - F(x_m)\right]^{n-m} f(x_m). \tag{2}$$

We wish to associate a probability $p_m = P\{x \le x_m\}$ with each observed rank $m$. The precise meaning of $p_m$ is illustrated in Figure 2. Since the probability of the event $x \le x_m$ is controlled by two variates, $x_m$ and $x$, the probability is obtained by integrating the joint density function $f_{x_m x}$ of the variates $x_m$ and $x$ over the area where $x \le x_m$. Due to the mutual independence of $x_m$ and $x$, their joint density function $f_{x_m x}(x_m, x)$ equals $f_m(x_m) f(x)$, where $f_m$ and $f = F'$ are the density functions of $x_m$ and $x$, respectively.

Figure 2: Contours of the joint density function $f_{x_m x}$ drawn in the $(x_m, x)$-plane. Integrating $f_{x_m x}$ over the grey half plane where $x \le x_m$ gives the probability $P\{x \le x_m\}$.

The nonexceedance probability $P\{x \le x_m\}$ is therefore obtained by integration of the joint density function over the zone $x \le x_m$:

$$
\begin{aligned}
P\{x \le x_m\} &= \int_{-\infty}^{+\infty} \int_{-\infty}^{x_m} f_m(x_m)\, f(x)\, dx\, dx_m \\
&= \int_{-\infty}^{+\infty} \left[\int_{-\infty}^{x_m} f(x)\, dx\right] f_m(x_m)\, dx_m \\
&= \int_{-\infty}^{+\infty} F(x_m)\, f_m(x_m)\, dx_m \\
&= \int_{-\infty}^{+\infty} F(x_m)\, m \binom{n}{m} F(x_m)^{m-1} \left(1 - F(x_m)\right)^{n-m} f(x_m)\, dx_m \\
&= m \binom{n}{m} \int_0^1 F^m (1 - F)^{n-m}\, dF \\
&= \cdots = \frac{m}{n+1}.
\end{aligned}
\tag{3}
$$

The last step in the deduction above is based on direct application of Euler's $\beta$-function.
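
For readers who want the omitted steps spelled out, the following is one way to fill in the ellipsis in (3). It uses only the standard identity $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt = (a-1)!\,(b-1)!/(a+b-1)!$ for positive integers $a$ and $b$; the symbol $B$ is introduced here for illustration and does not appear in the text above.

```latex
\begin{aligned}
m \binom{n}{m} \int_0^1 F^{m} (1-F)^{n-m}\, dF
  &= m \binom{n}{m}\, B(m+1,\; n-m+1) \\
  &= m \cdot \frac{n!}{m!\,(n-m)!} \cdot \frac{m!\,(n-m)!}{(n+1)!} \\
  &= \frac{m \, n!}{(n+1)!} \;=\; \frac{m}{n+1}.
\end{aligned}
```

The factorials of $m$ and $n-m$ cancel exactly, so the result does not depend on $F$.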

3. Plotting Positions

One might expect that, based on (3) and the definition of the CDF, $m/(n+1)$ would have been the sample probability used by everybody. Unfortunately, this is not so, as discussed, for example, in the reviews [12, 13]. Even though $m/(n+1)$ was recommended already by Weibull [7] and has been used by numerous researchers since the 1950s, there has been a lot of research aiming at "improving" the probability by using probability estimates, called plotting positions, of the form $(m + a)/(n + b)$, where $a$ and $b$ are constants depending on the type of parent distribution, the size of the sample, and so forth. These attempts include, for example, Benard and Bos-Levenbach [14], Blom [6], Langbein [15], Gringorten [8], Wilk and Gnanadesikan [16], Barnett [17], Cunnane [18], Guo [19], Jones [20], Yu and Huang [21], and Folland and Anderson [22]. Adding to this complexity, even numerical methods to calculate plotting positions have been proposed [23].

The main reason for this trend appears to be a misunderstanding of the role of the mean value, both in theoretical considerations and when interpreting the results of Monte Carlo simulations. This is shown in the Appendix, in which the ideal performance of the Weibull positions is also demonstrated in one special case using the bin frequency criterion introduced by Makkonen et al. [24]. According to the bin frequency criterion, the method of least squares (MLS) using the Weibull probabilities gives a better distribution function than MLS using any of the plotting positions of Cunnane, Gringorten, and Blom.

4. Sample Quantiles

It is apparent that, even if there may be some reasons, for example, bias in the MLS, to use the linear transformations of $m/(n+1)$, that is, the plotting positions of the type of Cunnane, Blom, Gringorten, Hyndman and Fan, and so forth, in curve fitting, none of them is valid when the sample probabilities are plotted. The definition of a sample quantile does not need to be based on any "estimator" of $p_m$. It can be defined objectively based on the true nonexceedance probability $p_m = m/(n+1)$ of a new value drawn from the population. For variable values in between the observed values, one may interpolate linearly. Accordingly, we define the following.

For $m = 1, \ldots, n$,

$$p_m = \frac{m}{n+1}. \tag{4}$$

For $m = 1, \ldots, n-1$ and $p_m \le p \le p_{m+1}$, we define

$$Q(p) = (n+1)\,(X_{m+1} - X_m)\,(p - p_m) + X_m. \tag{5}$$

The definition of the quantile function in (5) is illustrated in Figure 3.
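
As an illustration, (4) and (5) can be written as a short function. This is a minimal sketch under stated assumptions: the name `sample_quantile` is hypothetical, and because (5) is defined only for $p_1 \le p \le p_n$, the sketch raises an error outside that range rather than extrapolating, which is a choice made here and not something the definition prescribes.

```python
def sample_quantile(values, p):
    """Quantile by (5): linear interpolation between p_m = m/(n+1), m = 1..n."""
    xs = sorted(values)                      # order-ranked observations X_1..X_n
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    p_first, p_last = 1 / (n + 1), n / (n + 1)
    if not p_first <= p <= p_last:
        raise ValueError("Q(p) is defined only for p_1 <= p <= p_n")
    # Largest rank m in 1..n-1 whose probability p_m does not exceed p.
    m = min(int(p * (n + 1)), n - 1)
    m = max(m, 1)
    p_m = m / (n + 1)
    # xs[m - 1] is X_m and xs[m] is X_{m+1} (Python lists are 0-based).
    return (n + 1) * (xs[m] - xs[m - 1]) * (p - p_m) + xs[m - 1]

data = [2, 4, 4, 5, 7, 8, 9, 11, 15]         # n = 9, so p_m = m/10
for p in (0.1, 0.5, 0.55, 0.9):
    print(p, sample_quantile(data, p))
```

With $n = 9$ the observed values sit at probabilities $0.1, 0.2, \ldots, 0.9$, so the median of this sample is the fifth ordered value, and a probability halfway between two ranks returns the midpoint of the corresponding observations.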

The definition (5) has all the desirable properties listed by Hyndman and Fan [1]. The major advantage of (5) over all other suggestions for the sample quantile definition is that it is based on the true probabilities. This makes it possible to standardize the sample quantile function in statistical software on a sound theoretical basis instead of subjective criteria.

Figure 3: Illustration of the proposed sample quantile definition by an example where $n = 9$. (Axes: $X$ horizontally, with $X_1$ and $X_5$ marked; probability from 0 to 1 vertically, with $p_5$ marked.)

Using any other definition, for example, $m/n$, promoted by Langford [2], as an estimate of the probability $p_m$ results in significant relative errors, particularly at the tails of the distribution. This is illustrated in Figure 4, which shows the relative errors for different ranks $m$. The relative error in the exceedance probability $1 - p_m$ is defined as

$$\frac{(1 - m/n) - \left(1 - m/(n+1)\right)}{1 - m/(n+1)} = \cdots = -\frac{m}{n\,(n - m + 1)}. \tag{6}$$

Figure 4: Relative error in the exceedance probability $1 - p_m$ resulting from the conventional definition [2] of the empirical distribution function for order-ranked observations $X_1, X_2, \ldots, X_n$, as a function of the sample size $n$ for different ranks $m$ (curves for $m = 1$, $m = 2$, $m = n/2$, $m = n - 2$, $m = n - 1$, and $m = n$).

Note that at the upper tail of the distribution the relative error remains large even when the sample size $n$ goes to infinity; for $m = n$ it equals $-1$ for every $n$. The error of using $m/n$ or some other improper estimate of the probability plotting positions is particularly persistent in the extreme value analysis methods [13]. This confusion alone justifies the definition as proposed here.
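
The behaviour of the relative error can be tabulated directly from its definition, that is, from the left-hand side of (6). The sketch below does this with exact rational arithmetic for the ranks shown in Figure 4; the function name `relative_error` and the chosen sample sizes are assumptions made for illustration.

```python
from fractions import Fraction

def relative_error(m, n):
    """Relative error in the exceedance probability when m/n replaces m/(n+1)."""
    conventional = 1 - Fraction(m, n)       # exceedance probability from m/n
    true = 1 - Fraction(m, n + 1)           # exceedance probability from m/(n+1)
    return (conventional - true) / true

for n in (10, 20, 40):
    ranks = [("m=1", 1), ("m=2", 2), ("m=n/2", n // 2),
             ("m=n-2", n - 2), ("m=n-1", n - 1), ("m=n", n)]
    row = ", ".join(f"{label}: {float(relative_error(m, n)):+.3f}"
                    for label, m in ranks)
    print(f"n={n}: {row}")
```

The errors at the lowest ranks shrink quickly as $n$ grows, whereas those at the highest ranks do not vanish, which is the point made about the upper tail.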

5. Conclusions

The sample quantiles can be defined by the true nonexceedance probabilities $m/(n+1)$ of the order-ranked sample values. Following the basic principles of the probability calculus, the sample quantile function can therefore be defined by (5).

This definition removes the methodological uncertainty related to calculating sample quantiles and should be adopted as a standard in statistical software.

The claim that the true nonexceedance probabilities, the so-called Weibull plotting positions, result in a biased estimate of a CDF is shown to be false and founded on a misunderstanding in theoretical considerations and when interpreting the results of Monte Carlo simulations.

The definition of the quantile function proposed here should, of course, be applied to its inverse function, the empirical distribution function (EDF), as well and used in the inference of data. This is particularly important in extreme value analysis, where probabilities of rare events need to be estimated.

Appendix

Evaluating Plotting Positions

Consider a normally distributed variate $x$ with mean 0, standard deviation 1, and distribution function $F$. Take $n$ values $x_m = F^{-1}(m/(n+1))$, $m = 1, \ldots, n$. Assume that $G$ represents the function which transforms the probabilities to the probability paper; that is, all points $(x_m, G(p_m))$ fall on the same straight line, which represents the cumulative distribution function (CDF). The slope and intercept of the straight line are independent of $n$.

Plot next the points $(x_m, G(p'_m))$ on the same probability paper. Here the values $x_m$ are the same as those above, but the probabilities $p'_m$ represent Blom's [6] plotting positions $(m - 0.375)/(n + 0.25)$, which have been developed for the normal distribution. With increasing $n$, the resulting curve approaches the correct straight line, but they never coincide, as illustrated in Figure 5. In this way we have detected samples which are correctly represented by the Weibull plotting positions and incorrectly by any other. Vice versa, by properly choosing plotting positions $(m + a)/(n + b)$ with $(a, b) \ne (0, 1)$, we get a curve which by a linear regression can be forced to compensate for any error somewhere else. This is exactly what has been done in the history of plotting positions. The reason for and the nature of the error are characterized in the following.

First, the reasons why the Weibull [7] plotting positions have been objected to in the literature are discussed.

(1) Gumbel showed that the expected value of the probability $F(x_m)$ is $E(F(x_m)) = m/(n+1)$, and, for example, Langbein [15] argued that the probability of the next observation to fall in the interval $(x_m, x_{m+1})$ is $1/(n+1)$, but these observations have not been regarded as rigorous justifications to use the Weibull positions. It is likely that the terminology used, for example, by Chow [25] (mean number of exceedances in $N$ future trials) and by Langbein [15] (mean value of exceedance probabilities) has not been understood as giving an ordinate on the CDF. A rigorous proof for $P(x \le x_m) = m/(n+1)$ was presented in the textbook by Madsen et al. [11], but this has not received much attention in later research.

(2) The way of thinking for some researchers has been that the sample values are given and the probabilities associated with them are random, while the correct way is to think vice versa: the sample probabilities are exact and the sample values associated with them are variates. Some others, for example, Benson [26], appear to have fully understood that plotting positions different from those of Weibull are not probabilities. Nevertheless, for example, Cunnane [18] claimed that it is not necessary to use the true probabilities because the final result, that is, the resulting regression line, is decisive. Cunnane [18] tried to estimate the probability $F(E(x_m))$ and used it as the plotting position when determining the CDF. In other words, he used $P(x \le E(x_m))$ to represent $P(x \le x_m)$, although they are different concepts. It is not at all surprising that the Weibull positions, corresponding to the latter probability, are not representative of the former.

Figure 5: CDF (straight line) of a normal distribution on probability paper and the curves due to using Blom's plotting positions [6] instead of the true probabilities. Sample sizes 10, 20, and 50. (Axes: $x$ horizontally and $G(p)$ vertically, both from $-2.5$ to $2.5$; curves labelled Blom10, Blom20, Blom50, and Weibull.)

(3) In Monte Carlo simulations using the Weibull positions, the conventional curve fitting procedures like the MLS tend to result in parameter estimates the means of which do not coincide with the parameters of the distribution from which the samples are taken. It has been observed that the difference can be reduced by transforming the Weibull points linearly, that is, by replacing the probabilities $m/(n+1)$ by $(m + a)/(n + b) = [(n+1)/(n+b)]\,[m/(n+1)] + a/(n+b)$. Geometrically, this simply means that, to improve the fit, the straight line on the probability paper resulting from linear regression is replaced by another straight line. In more detail, when the points $(x_{mi}, m/(n+1))$ are replaced by the points $(x_{mi}, (m+a)/(n+b))$, the linearity is also affected, as illustrated in Figure 5. This effect remains hidden, however, because the linear regression forces the fitted curve to be linear. The behaviour described above has often been explained by stating that the Weibull probabilities are "biased" or more "biased" than some other probabilities. This is a misuse of terminology. The bias is defined as $E(\hat{a}) - a$, where $E(\hat{a})$ is the expected value of an estimator $\hat{a}$ determined from a sample and $a$ is the correct parameter value. The Weibull probabilities are exact values in the same way as $1/2$ is the probability of heads when tossing a coin. There is no need for estimation here.

In contrast to the abovementioned arguments against the Weibull positions, the bias in the parameters resulting from the traditional Monte Carlo simulations using the Weibull positions is, in fact, attributable to taking the mean of the parameter estimates and to the curve fitting method. From mathematics we know that if $p$ and $q$ are nonlinearly related as $p = g(q)$, it follows that, in general, $E(p) \ne g(E(q))$. From elementary statistics we know that the sample variance is a biased estimate of the population variance. Why should we then believe that the mean of standard deviations or any other distribution parameters obtained from successive samples $(x_{1i}, x_{2i}, \ldots, x_{ni})$ would approach the parameter of the population? On probability paper, the slope of the regression line represents $dx/d(G(P))$, where $G$ stands for a proper nonlinear transformation of the probability axis. Thus, there is no a priori reason to expect that a mean of sample slopes in MC simulations represents something relevant in the probabilistic sense.
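
A small simulation makes the point about means of nonlinear statistics concrete. The sketch below is an illustration chosen here, not an experiment from the paper: it averages the sample standard deviation (with $n - 1$ in the denominator) over many samples from a standard normal parent. For $n = 2$ the expected value is known to be $\sqrt{2/\pi} \approx 0.80$ rather than 1, even though the corresponding variance estimate is unbiased. The function name, trial count, and seed are assumptions.

```python
import math
import random

def mean_sample_std(n, trials=40_000, seed=1):
    """Average of sample standard deviations over repeated N(0, 1) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        variance = sum((x - mean) ** 2 for x in xs) / (n - 1)  # unbiased for sigma^2
        total += math.sqrt(variance)                          # but sqrt is nonlinear
    return total / trials

for n in (2, 5, 20):
    print(f"n={n}: mean of sample standard deviations = {mean_sample_std(n):.3f}")
```

The averaged standard deviation stays below the population value of 1 for every finite $n$, which illustrates why the mean of fitted parameters is not by itself a measure of how good the plotting positions are.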

Consequently, the convergence of the mean of successive parameter estimates to the correct parameter value cannot be regarded as a goodness criterion for plotting positions. We should use a criterion based on the bin frequency instead, because it is the frequency by which probability is defined. From a parent distribution with given parameters, take a sample of size $n$; find the estimated straight line $G(P) = kx + c$; take from the parent distribution one additional random value $x_{n+1}$; record the bin $[0, 1/(n+1)], (1/(n+1), 2/(n+1)], \ldots, (n/(n+1), 1]$ to which $P_{n+1} = G^{-1}(k x_{n+1} + c)$ belongs; and repeat the steps. A uniform distribution of hits to each bin means that the method has been successful. The fit on each bin $((m-1)/(n+1), m/(n+1)]$ can be considered separately using the criterion

$$C_m^2 = \left(\frac{N_m}{N} - \frac{1}{n+1}\right)^2 \tag{A1}$$

or the whole distribution by

$$C^2 = \sum_{m=1}^{n+1} C_m^2. \tag{A2}$$

Here $N$ is the number of simulations and $N_m$ is the number of hits to bin $m$.

Such an analysis was made by Makkonen et al. [24]. The simulations verified that the Weibull positions give the most accurate estimate in the sense of criterion (A2) for all considered distributions, that is, for the Gumbel, Weibull, normal, and lognormal distributions. Another simulation for the Gumbel distribution $F(x) = \exp(-\exp(-(x - \mu)/\beta))$ with mean 5 and standard deviation 2 was carried out using the Weibull and Gringorten plotting positions, and its results are presented in Figure 6. Sample size 2 was chosen to eliminate the possible bias due to the linear regression. The results confirm the performance of the Weibull positions in the same way as a nearly uniform distribution in numbers $1, \ldots, 6$ confirms that a die is fair. The erroneous deduction based on calculating the mean of the estimated distribution parameters $\mu$ and $\beta$ results in a traditional (and incorrect) conclusion that the Weibull positions are worse than those proposed by Gringorten [8]. The source of this misunderstanding is demonstrated in Table 1.

Figure 6: Relative frequency of hits in probability intervals 1, 2, and 3 ($[0, 1/3]$, $(1/3, 2/3]$, and $(2/3, 1]$) when using Weibull's [7] and Gringorten's [8] plotting positions in Monte Carlo simulations with 50000 cycles. (Axes: bin horizontally, relative frequency in bin vertically; series: Exact, Weibull, Gringorten.)

Table 1: Parameters obtained for the Gumbel distribution from MC simulation with 50000 cycles. These parameters cannot be used in evaluating plotting positions (see text).

                      Exact parameter   Estimated parameter
                                        Weibull    Gringorten
Mean                  5                 5.381      5.196
Standard deviation    2                 2.787      1.893
μ                     4.099918          4.127      4.345
β                     1.559394          2.173      1.476
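
The bin frequency procedure for sample size 2 can be sketched in a few lines. The code below is an illustrative reimplementation under stated assumptions, not the authors' program, and it does not attempt to reproduce the exact numbers of Figure 6 or Table 1. It assumes Gumbel probability paper with $G(p) = -\ln(-\ln p)$, the exact parameters listed in Table 1, and Gringorten's positions in the form $(m - 0.44)/(n + 0.12)$; all function names are hypothetical.

```python
import math
import random

MU, BETA = 4.099918, 1.559394      # exact Gumbel parameters from Table 1

def gumbel_draw(rng):
    """One random value from F(x) = exp(-exp(-(x - MU) / BETA))."""
    u = rng.random()
    while u <= 0.0:                # guard against log(0)
        u = rng.random()
    return MU - BETA * math.log(-math.log(u))

def paper(p):
    """Transform G(p) for Gumbel probability paper."""
    return -math.log(-math.log(p))

def paper_inverse(y):
    """Inverse of G, guarded against overflow for very negative y."""
    return 0.0 if y < -700.0 else math.exp(-math.exp(-y))

def bin_frequencies(a, b, trials=50_000, seed=0):
    """Relative hit frequencies in the three bins for samples of size n = 2."""
    n = 2
    g1, g2 = paper((1 + a) / (n + b)), paper((2 + a) / (n + b))
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(trials):
        x1, x2 = sorted((gumbel_draw(rng), gumbel_draw(rng)))
        if x1 == x2:               # practically never happens; avoids division by zero
            continue
        k = (g2 - g1) / (x2 - x1)  # straight line through the two plotted points
        c = g1 - k * x1
        p_new = paper_inverse(k * gumbel_draw(rng) + c)
        if p_new <= 1 / 3:
            hits[0] += 1
        elif p_new <= 2 / 3:
            hits[1] += 1
        else:
            hits[2] += 1
    total = sum(hits)
    return [h / total for h in hits]

print("Weibull   ", [round(f, 3) for f in bin_frequencies(0.0, 1.0)])
print("Gringorten", [round(f, 3) for f in bin_frequencies(-0.44, 0.12)])
```

With two points the fitted line passes through both of them, so with the Weibull positions the new value falls in the lowest bin exactly when it is below the smaller observation, and similarly for the other bins. Criterion (A1) can be evaluated from the returned frequencies by subtracting $1/3$ from each and squaring.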

The discussion above shows that all the claims presented against the Weibull plotting positions are unfounded. In particular, the recent Monte Carlo simulations supporting these claims [27–33] are based on a misunderstood role of the mean of the sample parameters. The performance of a single fitted curve should always be compared with the Weibull probabilities plotted against the observed order-ranked values. Plotting positions different from those of Weibull should not be used to eliminate the error observed when a mean of the sample estimates is taken in Monte Carlo simulations, because such an error never occurs in a practical situation, where we have only one sample and one estimate for each parameter.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Academy of Finland via the FICCA programme.

References

[1] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[2] E. Langford, "Quartiles in elementary statistics," Journal of Statistics Education, vol. 14, no. 3, 2006.

[3] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.

[4] B. F. Kimball, "On the choice of plotting positions on probability paper," Journal of the American Statistical Association, vol. 55, pp. 546–560, 1960.

[5] S. Castillo-Gutiérrez, E. Lozano-Aguilera, and M. D. Estudillo-Martínez, "Selection of a plotting position for a normal Q-Q plot. R script," Journal of Communication and Computer, vol. 9, pp. 243–250, 2012.

[6] G. Blom, Statistical Estimates and Transformed Beta-Variables, John Wiley & Sons, 1958.

[7] W. Weibull, "A statistical theory of strength of materials," Ingeniörsvetenskapsakademiens Handlingar, vol. 151, 45 pages, 1939.

[8] I. I. Gringorten, "A plotting rule for extreme probability paper," Journal of Geophysical Research, vol. 68, no. 3, pp. 813–814, 1963.

[9] L. Makkonen, "Bringing closure to the plotting position controversy," Communications in Statistics - Theory and Methods, vol. 37, no. 3–5, pp. 460–467, 2008.

[10] L. Makkonen, M. Pajari, and M. Tikanmäki, "Closure to 'Problems in the extreme value analysis' (Struct. Safety 2008;30:405–419)," Structural Safety, vol. 40, pp. 65–67, 2013.

[11] H. O. Madsen, S. Krenk, and N. C. Lind, Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, NJ, USA, 1986.

[12] H. L. Harter, "Another look at plotting positions," Communications in Statistics - Theory and Methods, vol. 13, no. 13, pp. 1613–1633, 1984.

[13] L. Makkonen, "Problems in the extreme value analysis," Structural Safety, vol. 30, pp. 405–419, 2008.

[14] A. Benard and E. C. Bos-Levenbach, "The plotting of observations on probability paper," Statistica, vol. 7, pp. 163–173, 1953.

[15] W. B. Langbein, "Plotting positions in frequency analysis," US Geological Survey Water-Supply Paper 1543-A, 1960.

[16] M. B. Wilk and R. Gnanadesikan, "Probability plotting methods for the analysis of data," Biometrika, vol. 55, no. 1, pp. 1–17, 1968.

[17] V. Barnett, "Probability plotting methods and order statistics," Journal of the Royal Statistical Society C: Applied Statistics, vol. 24, no. 1, pp. 95–108, 1975.

[18] C. Cunnane, "Unbiased plotting positions – a review," Journal of Hydrology, vol. 37, no. 3-4, pp. 205–222, 1978.

[19] S. L. Guo, "A discussion on unbiased plotting positions for the general extreme value distribution," Journal of Hydrology, vol. 121, pp. 33–44, 1990.

[20] D. A. Jones, "Plotting positions via maximum-likelihood for a non-standard situation," Hydrology and Earth System Sciences, vol. 1, no. 2, pp. 357–366, 1997.

[21] G.-H. Yu and C.-C. Huang, "A distribution free plotting position," Stochastic Environmental Research and Risk Assessment, vol. 15, no. 6, pp. 462–476, 2001.

[22] C. Folland and C. Anderson, "Estimating changing extremes using empirical ranking methods," Journal of Climate, vol. 15, pp. 2954–2960, 2002.

[23] R. I. Harris, "Gumbel re-visited – a new look at extreme value statistics applied to wind speeds," Journal of Wind Engineering and Industrial Aerodynamics, vol. 59, no. 1, pp. 1–22, 1996.

[24] L. Makkonen, M. Pajari, and M. Tikanmäki, "Discussion on 'Plotting positions for fitting distributions and extreme value analysis'," Canadian Journal of Civil Engineering, vol. 40, no. 9, pp. 927–929, 2013.

[25] V. T. Chow, Handbook of Applied Hydrology, McGraw-Hill, New York, NY, USA, 1964.

[26] M. A. Benson, "Plotting positions and economics of engineering planning," Journal of the Hydraulics Division, vol. 88, no. 6, pp. 57–71, 1962.

[27] R. I. Harris, "The accuracy of design values predicted from extreme value analysis," Journal of Wind Engineering and Industrial Aerodynamics, vol. 89, no. 2, pp. 153–164, 2001.

[28] F. Mehdi and J. Mehdi, "Determination of plotting position formula for the normal, log-normal, Pearson (III), log-Pearson (III) and Gumble distributional hypotheses using the probability plot correlation coefficient test," World Applied Sciences Journal, vol. 15, no. 8, pp. 1181–1185, 2011.

[29] N. J. Cook, "Rebuttal of 'Problems in the extreme value analysis'," Structural Safety, vol. 34, no. 1, pp. 418–423, 2012.

[30] A. S. Yahaya, M. N. Nor, N. R. M. Jali, N. A. Ramli, F. Ahmad, and A. Z. Ul-Saufie, "Determination of the probability plotting position for type I extreme value distribution," Journal of Applied Sciences, vol. 12, no. 14, pp. 1501–1506, 2012.

[31] A. S. Yahaya, C. S. Yee, N. A. Ramli, and F. Ahmad, "Determination of the best probability plotting position for predicting parameters of the Weibull distribution," International Journal of Applied Science and Technology, vol. 2, pp. 106–111, 2012.

[32] S. Kim, H. Shin, K. Joo, and J.-H. Heo, "Development of plotting position for the general extreme value distribution," Journal of Hydrology, vol. 475, pp. 259–269, 2012.

[33] M. Fuglem, G. Parr, and I. J. Jordaan, "Plotting positions for fitting distributions and extreme value analysis," Canadian Journal of Civil Engineering, vol. 40, no. 2, pp. 130–139, 2013.


Page 3: Research Article Defining Sample Quantiles by the True Rank …downloads.hindawi.com/journals/jps/2014/326579.pdf · 2019-07-31 · Research Article Defining Sample Quantiles by the

Journal of Probability and Statistics 3

(119898 + 119886)(119899 + 119887) where 119886 and 119887 are constants dependingon the type of parent distribution size of the sample andso forth These attempts include for example Benard andBos-Levenbach [14] Blom [6] Langbein [15] Gringorten [8]Wilk and Gnanadesikan [16] Barnett [17] Cunnane [18]Guo [19] Jones [20] Yu and Huang [21] and Folland andAnderson [22] To the effect of adding to this complexity evennumerical methods to calculate plotting positions have beenproposed [23]

The main reason for this trend appears to be a misun-derstanding in the role of the mean value both in theoreticalconsiderations and when interpreting the results of MonteCarlo simulations This is shown in the Appendix in whichalso the ideal performance of theWeibull positions is demon-strated in one special case using the bin frequency criterionintroduced by Makkonen et al [24] According to the binfrequency criterion the method of least squares (MLS) usingthe Weibull probabilities gives a better distribution functionthan MLS using any of the plotting positions of CunnaneGringorten and Blom

4. Sample Quantiles

It is apparent that even if there may be some reasons, for example, bias in the MLS, to use the linear transformations of $m/(n + 1)$, that is, the plotting positions of the type of Cunnane, Blom, Gringorten, Hyndman and Fan, and so forth, in curve fitting, none of them is valid when the sample probabilities are plotted. The definition of a sample quantile does not need to be based on any "estimator" of $p_m$. It can be defined objectively based on the true nonexceedance probability $p_m = m/(n + 1)$ of a new value drawn from the population. For variable values in between the observed values, one may interpolate linearly. Accordingly, we define the following.

For $m = 1, \ldots, n$,

$$p_m = \frac{m}{n + 1}. \tag{4}$$

For $m = 1, \ldots, n - 1$ and $p_m \le p \le p_{m+1}$, we define

$$Q(p) = (n + 1)(X_{m+1} - X_m)(p - p_m) + X_m. \tag{5}$$

The definition of the quantile function in (5) is illustrated in Figure 3.
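The definitions in (4) and (5) translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function name `sample_quantile` and the choice to raise an error outside $[p_1, p_n]$ are assumptions made here, since (5) is stated only for $p_1 \le p \le p_n$.

```python
def sample_quantile(data, p):
    """Sample quantile Q(p) from (5), with p_m = m / (n + 1) as in (4)."""
    xs = sorted(data)                      # order-ranked values X_1 <= ... <= X_n
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations to interpolate")
    if not 1 / (n + 1) <= p <= n / (n + 1):
        # Assumption of this sketch: no extrapolation beyond p_1 and p_n.
        raise ValueError("(5) defines Q(p) only for p_1 <= p <= p_n")
    # Rank m (1-based) with p_m <= p <= p_{m+1}, clamped to 1..n-1.
    m = max(1, min(int(p * (n + 1)), n - 1))
    p_m = m / (n + 1)
    # xs[m - 1] is X_m and xs[m] is X_{m+1} in the 1-based notation of (5).
    return (n + 1) * (xs[m] - xs[m - 1]) * (p - p_m) + xs[m - 1]


if __name__ == "__main__":
    observations = [3, 1, 2, 9, 7, 8, 4, 6, 5]   # n = 9, as in Figure 3
    for prob in (0.1, 0.5, 0.55, 0.9):
        print(prob, sample_quantile(observations, prob))
```

With $n = 9$, the fifth order-ranked value is returned for $p = 5/(9 + 1) = 0.5$, which is the situation drawn in Figure 3.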

The definition of (5) has all the desired properties of Hyndman and Fan [1]. The huge advantage of (5) over all other suggestions for the sample quantile definition is that it is based on the true probabilities. This makes it possible to standardize the sample quantile function in statistical software on a sound theoretical basis instead of subjective criteria.

Figure 3: Illustration of the proposed sample quantile definition by an example where $n = 9$. [Axes: order-ranked values $X_1, \ldots, X_5, \ldots$ versus probability $p$ from 0 to 1; marked ordinates $1/(9 + 1)$ and $p_5$.]

Figure 4: Relative error in the exceedance probability $1 - p_m$ resulting from the conventional definition [2] of the empirical distribution function for order-ranked observations $X_1, X_2, \ldots, X_n$ as a function of the sample size $n$ for different ranks $m$. [Axes: $n$ from 0 to 40 versus relative error from $-1.0$ to $0.0$; curves for $m = 1$, $m = 2$, $m = n/2$, $m = n - 2$, $m = n - 1$, and $m = n$.]

Using any other definition, for example, $m/n$ promoted by Langford [2], as an estimate of the probability $p_m$ results in significant relative errors, particularly at the tails of the distribution. This is illustrated in Figure 4, showing the relative errors for different ranks $m$. The relative error in the exceedance probability $1 - p_m$ is defined as

$$\frac{(1 - m/n) - (1 - m/(n + 1))}{1 - m/(n + 1)} = \frac{m/(n + 1) - m/n}{(n + 1 - m)/(n + 1)} = -\frac{m}{n(n + 1 - m)}. \tag{6}$$

Note that at the upper tail of the distribution, the relative error remains large even when the sample size $n$ goes to infinity. The error of using $m/n$ or some other improper estimate of the probability plotting positions is particularly persistent in the extreme value analysis methods [13]. This confusion alone justifies the definition as proposed here.
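The behaviour at the upper tail can be checked numerically. The sketch below evaluates the left-hand side of (6) directly and compares it with the closed form $-m/(n(n + 1 - m))$ obtained by simplifying that expression; the function names are illustrative only.

```python
def relative_error(m, n):
    """Left-hand side of (6): error of 1 - m/n relative to the true 1 - m/(n + 1)."""
    true_exceedance = 1 - m / (n + 1)
    conventional_exceedance = 1 - m / n
    return (conventional_exceedance - true_exceedance) / true_exceedance


def relative_error_closed_form(m, n):
    """Simplified form of the same ratio."""
    return -m / (n * (n + 1 - m))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ranks = (1, n // 2, n - 2, n - 1, n)
        print(n, [round(relative_error(m, n), 4) for m in ranks])
```

For the largest observation ($m = n$) the ratio equals $-1$ for every $n$, because $m/n$ assigns zero exceedance probability to it, whereas for low ranks the error vanishes as $n$ grows.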

5. Conclusions

The sample quantiles can be defined by the true nonexceedance probabilities $m/(n + 1)$ of the order-ranked sample values. Following the basic principles of the probability calculus, the sample quantile function can therefore be defined by (5).

This definition removes the methodological uncertainty related to calculating sample quantiles and should be adopted as a standard in statistical software.

The claim that the true nonexceedance probabilities, the so-called Weibull plotting positions, result in a biased estimate of a CDF is shown to be false and founded on a misunderstanding in theoretical considerations and when interpreting the results of Monte Carlo simulations.

The definition of the quantile function proposed here should, of course, be applied to its inverse function, the empirical distribution function (EDF), as well and used in the inference of data. This is particularly important in the extreme value analysis, where probabilities of rare events need to be estimated.

Appendix

Evaluating Plotting Positions

Consider a normally distributed variate $x$ with mean 0, standard deviation 1, and distribution function $F$. Take $n$ values $x_m = F^{-1}(m/(n + 1))$, $m = 1, \ldots, n$. Assume that $G$ represents the function which transforms the probabilities to the probability paper; that is, all points $(x_m, G(p_m))$ fall on the same straight line, which represents the cumulative distribution function (CDF). The slope and intercept of the straight line are independent of $n$.

Plot next the points $(x_m, G(p'_m))$ on the same probability paper. Here, the values $x_m$ are the same as those above, but the probabilities $p'_m$ represent Blom's [6] plotting positions $(m - 0.375)/(n + 0.25)$, which have been developed for the normal distribution. With increasing $n$, the resulting curve approaches the correct straight line, but they never coincide, as illustrated in Figure 5. In this way, we have detected samples which are correctly represented by the Weibull plotting positions and incorrectly by any other. Vice versa, by suitably choosing plotting positions $(m + a)/(n + b)$ with $(a, b) \ne (0, 1)$, we get a curve which by a linear regression can be forced to compensate for any error somewhere else. This is exactly what has been done in the history of plotting positions. The reason for and the nature of the error are characterized in the following.
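The construction above can be reproduced with a few lines of code. The sketch below assumes that normal probability paper corresponds to $G = F^{-1}$ for the standard normal distribution, so that the exact points lie on the line $G(p) = x$; the helper name `max_deviation` is an illustrative choice. It reports the largest vertical distance between the points plotted with positions $(m + a)/(n + b)$ and that line.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def max_deviation(n, a, b):
    """Largest |G(p'_m) - G(p_m)| over m, with p'_m = (m + a) / (n + b)."""
    worst = 0.0
    for m in range(1, n + 1):
        # x_m = F^{-1}(m / (n + 1)); on normal paper this equals G(p_m).
        x_m = STD_NORMAL.inv_cdf(m / (n + 1))
        # Ordinate obtained when the same x_m is plotted at the other position.
        y_m = STD_NORMAL.inv_cdf((m + a) / (n + b))
        worst = max(worst, abs(y_m - x_m))
    return worst


if __name__ == "__main__":
    for n in (10, 20, 50):
        weibull = max_deviation(n, 0.0, 1.0)      # (a, b) = (0, 1)
        blom = max_deviation(n, -0.375, 0.25)     # Blom's positions
        print(n, weibull, round(blom, 3))
```

Under these assumptions the deviation for Blom's positions shrinks only slowly between $n = 10$ and $n = 50$, in line with the curves described for Figure 5.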

First, the reasons why the Weibull [7] plotting positions have been objected to in the literature are discussed.

(1) Gumbel showed that the expected value of the probability $F(x_m)$ is $E(F(x_m)) = m/(n + 1)$, and, for example, Langbein [15] argued that the probability of the next observation falling in the interval $(x_m, x_{m+1})$ is $1/(n + 1)$, but these observations have not been regarded as rigorous justifications to use the Weibull positions. It is likely that the terminology used, for example, by Chow [25] (mean number of exceedances in $N$ future trials) and by Langbein [15] (mean value of exceedance probabilities) has not been understood as giving an ordinate on the CDF. A rigorous proof for $P(x \le x_m) = m/(n + 1)$ was presented in the textbook by Madsen et al. [11], but this has not received much attention in the later research.

(2) The way of thinking for some researchers has been that the sample values are given and the probabilities associated with them are random, while the correct way is to think vice versa: the sample probabilities are exact and the sample values associated with them are variates. Some others, for example, Benson [26], appear to have fully understood that the plotting positions different from those of Weibull are not probabilities. Nevertheless, for example, Cunnane [18] claimed that it is not necessary to use the true probabilities because the final result, that is, the resulting regression line, is decisive. Cunnane [18] tried to estimate the probability $F(E(x_m))$ and used it as the plotting position when determining the CDF. In other words, he used $P(x \le E(x_m))$ to represent $P(x \le x_m)$, although they are different concepts. It is not at all surprising that the Weibull positions, corresponding to the latter probability, are not representative of the former.

Figure 5: CDF (straight line) of a normal distribution on probability paper and the curves due to using Blom's plotting positions [6] instead of the true probabilities. Sample sizes 10, 20, and 50. [Axes: $x$ versus $G(p)$, both from $-2.5$ to $2.5$; curves: Blom10, Blom20, Blom50, Weibull.]

(3) In Monte Carlo simulations by the Weibull positions, the conventional curve fitting procedures like the MLS tend to result in parameter estimates, the means of which do not coincide with the parameters of the distribution from which the samples are taken. It has been observed that the difference can be reduced by transforming the Weibull points linearly, that is, by replacing the probabilities $m/(n + 1)$ by $(m + a)/(n + b) = [(n + 1)/(n + b)][m/(n + 1)] + a/(n + b)$. Geometrically, this simply means that, to improve the fit, the straight line on the probability paper resulting from linear regression is replaced by another straight line. In more detail, when the points $(x_{mi}, m/(n + 1))$ are replaced by the points $(x_{mi}, (m + a)/(n + b))$, the linearity is also affected, as illustrated in Figure 5. This effect remains hidden, however, because the linear regression forces the fitted curve to be linear. The behaviour described above has often been explained by stating that the Weibull probabilities are "biased" or more "biased" than some other probabilities. This is a misuse of terminology. The bias is defined as $E(\hat{a}) - a$, where $E(\hat{a})$ is the expected value of an estimator $\hat{a}$ determined from a sample and $a$ is the correct parameter value. The Weibull probabilities are exact values in the same way as 1/2 is the probability of heads when tossing a coin. There is no need for estimation here.
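The statement in item (1) that $P(x \le x_m) = m/(n + 1)$ can be checked by simulation. The sketch below is an illustration under assumptions chosen here (a standard normal parent, 20000 trials, a fixed seed, and the function name `rank_probability`); it draws a sample of size $n$, then one further value, and counts how often the new value does not exceed the $m$th order-ranked value.

```python
import random


def rank_probability(m, n, trials=20000, seed=1):
    """Relative frequency of a new value falling at or below the m-th ranked value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        new_value = rng.gauss(0.0, 1.0)      # the (n + 1)-th observation
        if new_value <= sample[m - 1]:       # sample[m - 1] is x_m (1-based rank)
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 9
    for m in (1, 5, 9):
        print(m, rank_probability(m, n), m / (n + 1))
```

The frequencies should agree with $m/(n + 1)$ to within sampling noise; because the argument relies only on ranks, replacing the normal parent by any other continuous distribution should not change the result.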

In contrast to the abovementioned arguments against the Weibull positions, the bias in the parameters resulting from the traditional Monte Carlo simulations by the Weibull positions is, in fact, attributable to taking the mean of the parameter estimates and to the curve fitting method. From mathematics, we know that if $p$ and $q$ are nonlinearly related as $p = g(q)$, it follows that, in general, $E(p) \ne g(E(q))$. From elementary statistics, we know that the sample variance is a biased estimate of the population variance. Why should we then believe that the mean of standard deviations, or of any other distribution parameters, obtained from successive samples $(x_{1i}, x_{2i}, \ldots, x_{ni})$ would approach the parameter of the population? On probability paper, the slope of the regression line represents $dx/d(G(P))$, where $G$ stands for a proper nonlinear transformation of the probability axis. Thus, there is no a priori reason to expect that a mean of sample slopes in MC simulations represents something relevant in the probabilistic sense.
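The effect of a nonlinear transformation on a mean is easy to demonstrate. The following sketch, with a standard normal parent and names chosen here for illustration, averages the sample standard deviation (the square root of the unbiased sample variance) over many samples of size $n$; the average falls short of the population value 1 even though the underlying variance estimate is unbiased.

```python
import random
import statistics


def mean_sample_std(n, trials=20000, seed=2):
    """Average of the sample standard deviation over repeated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += statistics.stdev(sample)    # sqrt of the (n - 1)-denominator variance
    return total / trials


if __name__ == "__main__":
    for n in (2, 5, 20):
        print(n, round(mean_sample_std(n), 3))
```

For $n = 2$ the expected value of the sample standard deviation of a standard normal variate is $\sqrt{2/\pi} \approx 0.80$, so the shortfall is large for exactly the small samples used later in this Appendix.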

Consequently, the convergence of the mean of successive parameter estimates to the correct parameter value cannot be regarded as a goodness criterion for plotting positions. We should use a criterion based on the bin frequency instead, because it is the frequency by which probability is defined. From a parent distribution with given parameters, take a sample of size $n$; find the estimated straight line $G(P) = kx + c$; take from the parent distribution one additional random value $x_{n+1}$; record the bin $[0, 1/(n + 1)], (1/(n + 1), 2/(n + 1)], \ldots, (n/(n + 1), 1]$ to which $P_{n+1} = G^{-1}(k x_{n+1} + c)$ belongs; and repeat the steps. A uniform distribution of hits to each bin means that the method has been successful. The fit on each bin $((m - 1)/(n + 1), m/(n + 1)]$ can be considered separately using the criterion

$$C_m^2 = \left(\frac{N_m}{N} - \frac{1}{n + 1}\right)^2 \tag{A1}$$

or the whole distribution by

$$C^2 = \sum_{m=1}^{n+1} C_m^2. \tag{A2}$$

Here, $N$ is the number of simulations and $N_m$ is the number of hits to bin $m$.
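The procedure above can be sketched for the simplest case $n = 2$, where the straight line passes exactly through the two plotted points and no regression is needed. The sketch assumes a Gumbel parent with $\mu = 4.099918$ and $\beta = 1.559394$ and Gumbel probability paper $G(p) = -\ln(-\ln p)$; the function names, seed, and number of trials are illustrative. Instead of inverting $G$, the bin is found by comparing $k x_{n+1} + c$ with the transformed bin edges, which is equivalent because $G$ is increasing.

```python
import math
import random
from bisect import bisect_left


def gumbel_paper(p):
    """G(p) = -ln(-ln p), which linearises a Gumbel CDF."""
    return -math.log(-math.log(p))


def bin_frequencies(a, b, mu=4.099918, beta=1.559394, trials=20000, seed=3):
    """Relative hit frequency per bin for positions (m + a) / (n + b), with n = 2."""
    n = 2
    rng = random.Random(seed)

    def draw():
        u = rng.random()
        while u == 0.0:                       # keep u inside the open interval (0, 1)
            u = rng.random()
        return mu - beta * math.log(-math.log(u))

    y1, y2 = (gumbel_paper((m + a) / (n + b)) for m in (1, 2))
    edges = [gumbel_paper(j / (n + 1)) for j in range(1, n + 1)]   # G(1/3), G(2/3)
    hits = [0] * (n + 1)
    for _ in range(trials):
        x1, x2 = sorted((draw(), draw()))
        if x1 == x2:                          # degenerate line, practically never occurs
            continue
        k = (y2 - y1) / (x2 - x1)             # line G(P) = k x + c through both points
        c = y1 - k * x1
        y_new = k * draw() + c                # G(P_{n+1}) for the additional value
        hits[bisect_left(edges, y_new)] += 1
    total = sum(hits)
    return [h / total for h in hits]


def criterion(freqs):
    """C^2 of (A2): squared deviations from 1 / (n + 1), summed over all bins."""
    ideal = 1 / len(freqs)
    return sum((f - ideal) ** 2 for f in freqs)


if __name__ == "__main__":
    for name, (a, b) in {"Weibull": (0.0, 1.0), "Gringorten": (-0.44, 0.12)}.items():
        freqs = bin_frequencies(a, b)
        print(name, [round(f, 3) for f in freqs], criterion(freqs))
```

Here Gringorten's positions are taken as $(m - 0.44)/(n + 0.12)$. With the Weibull positions the three bins should be hit about equally often, while the Gringorten positions should overfill the outer bins, which gives a larger $C^2$.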

Such an analysis was made by Makkonen et al. [24]. The simulations verified that the Weibull positions give the most accurate estimate in the sense of criterion (A2) for all considered distributions, that is, for the Gumbel, Weibull, normal, and lognormal distributions. Another simulation for the Gumbel distribution $F(x) = \exp(-\exp(-(x - \mu)/\beta))$ with mean = 5 and standard deviation = 2 was carried out using the Weibull and Gringorten plotting positions, and its results are presented in Figure 6. Sample size 2 was chosen to eliminate the possible bias due to the linear regression. The results confirm the performance of the Weibull positions in the same way as a nearly uniform distribution in numbers $1, \ldots, 6$ confirms that a die is fair. The erroneous deduction based on calculating the mean of the estimated distribution parameters $\mu$ and $\beta$ results in a traditional (and incorrect) conclusion that the Weibull positions are worse than those proposed by Gringorten [8]. The source of this misunderstanding is demonstrated in Table 1.

Figure 6: Relative frequency of hits in probability intervals 1, 2, and 3 ($[0, 1/3]$, $(1/3, 2/3]$, and $(2/3, 1]$) when using Weibull's [7] and Gringorten's [8] plotting positions in Monte Carlo simulations with 50000 cycles. [Axes: bin (1 to 3) versus relative frequency in bin, from 0.00 to 0.40; series: exact, Weibull, Gringorten.]

Table 1: Parameters obtained for Gumbel distribution from MC simulation with 50000 cycles. These parameters cannot be used in evaluating plotting positions (see text).

Parameter            Exact      Estimated (Weibull)   Estimated (Gringorten)
Mean                 5          5.381                 5.196
Standard deviation   2          2.787                 1.893
$\mu$                4.099918   4.127                 4.345
$\beta$              1.559394   2.173                 1.476
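The scale estimates in Table 1 can be approximated without simulation, which makes the origin of the apparent bias visible. The sketch below rests on two assumptions that are not stated in the text: that the scale estimate for $n = 2$ is the reciprocal slope $1/k$ of the line through the two plotted points on Gumbel paper, and the standard result that the expected range of two independent Gumbel values is $2\beta \ln 2$. The function names are illustrative.

```python
import math


def gumbel_paper(p):
    """G(p) = -ln(-ln p), the Gumbel probability-paper transformation."""
    return -math.log(-math.log(p))


def expected_beta_estimate(a, b, beta=1.559394):
    """Mean fitted scale for n = 2 with plotting positions (m + a) / (n + b)."""
    n = 2
    # Vertical distance between the two plotted points on the probability paper.
    spread = gumbel_paper((2 + a) / (n + b)) - gumbel_paper((1 + a) / (n + b))
    # Assumption: E(x_2 - x_1) = 2 * beta * ln 2 for two independent Gumbel values.
    expected_range = 2 * math.log(2) * beta
    return expected_range / spread            # mean of 1 / k = (x_2 - x_1) / spread


if __name__ == "__main__":
    print("Weibull   ", round(expected_beta_estimate(0.0, 1.0), 3))
    print("Gringorten", round(expected_beta_estimate(-0.44, 0.12), 3))
```

Under these assumptions the two values come out near 2.17 and 1.47, close to the simulated 2.173 and 1.476 in Table 1. The mean of the fitted scale therefore depends only on how far apart the two positions are placed on the paper, not on whether the positions are the correct probabilities.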

The discussion above shows that all the claims presented against the Weibull plotting positions are unfounded. Particularly, the recent Monte Carlo simulations supporting these claims [27–33] are based on a misunderstood role of the mean of the sample parameters. The performance of a single fitted curve should always be compared with the Weibull probabilities plotted against the observed order-ranked values. The plotting positions different from those of Weibull should not be used to eliminate the error observed when a mean of the sample estimates is taken in Monte Carlo simulations, because such an error never occurs in a practical situation where we have only one sample and one estimate for each parameter.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Academy of Finland via the FICCA programme.

References

[1] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[2] E. Langford, "Quartiles in elementary statistics," Journal of Statistics Education, vol. 14, no. 3, 2006.

[3] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.

[4] B. F. Kimball, "On the choice of plotting positions on probability paper," Journal of the American Statistical Association, vol. 55, pp. 546–560, 1960.

[5] S. Castillo-Gutierrez, E. Lozano-Aguilera, and M. D. Estudillo-Martínez, "Selection of a plotting position for a normal Q-Q plot. R script," Journal of Communication and Computer, vol. 9, pp. 243–250, 2012.

[6] G. Blom, Statistical Estimates and Transformed Beta-Variables, John Wiley & Sons, 1958.

[7] W. Weibull, "A statistical theory of strength of materials," Ingeniorsvetenskapakademiens Handlingar, vol. 151, 45 pages, 1939.

[8] I. I. Gringorten, "A plotting rule for extreme probability paper," Journal of Geophysical Research, vol. 68, no. 3, pp. 813–814, 1963.

[9] L. Makkonen, "Bringing closure to the plotting position controversy," Communications in Statistics - Theory and Methods, vol. 37, no. 3–5, pp. 460–467, 2008.

[10] L. Makkonen, M. Pajari, and M. Tikanmaki, "Closure to 'Problems in the extreme value analysis' (Struct. Safety 2008:30:405–419)," Structural Safety, vol. 40, pp. 65–67, 2013.

[11] H. O. Madsen, S. Krenk, and N. C. Lind, Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, NJ, USA, 1986.

[12] H. L. Harter, "Another look at plotting positions," Communications in Statistics - Theory and Methods, vol. 13, no. 13, pp. 1613–1633, 1984.

[13] L. Makkonen, "Problems in the extreme value analysis," Structural Safety, vol. 30, pp. 405–419, 2008.

[14] A. Benard and E. C. Bos-Levenbach, "The plotting of observations on probability paper," Statistica, vol. 7, pp. 163–173, 1953.

[15] W. B. Langbein, "Plotting positions in frequency analysis," US Geological Survey Water-Supply Paper 1543-A, 1960.

[16] M. B. Wilk and R. Gnanadesikan, "Probability plotting methods for the analysis of data," Biometrika, vol. 55, no. 1, pp. 1–17, 1968.

[17] V. Barnett, "Probability plotting methods and order statistics," Journal of the Royal Statistical Society C: Applied Statistics, vol. 24, no. 1, pp. 95–108, 1975.

[18] C. Cunnane, "Unbiased plotting positions - a review," Journal of Hydrology, vol. 37, no. 3-4, pp. 205–222, 1978.

[19] S. L. Guo, "A discussion on unbiased plotting positions for the general extreme value distribution," Journal of Hydrology, vol. 121, pp. 33–44, 1990.

[20] D. A. Jones, "Plotting positions via maximum-likelihood for a non-standard situation," Hydrology and Earth System Sciences, vol. 1, no. 2, pp. 357–366, 1997.

[21] G.-H. Yu and C.-C. Huang, "A distribution free plotting position," Stochastic Environmental Research and Risk Assessment, vol. 15, no. 6, pp. 462–476, 2001.

[22] C. Folland and C. Anderson, "Estimating changing extremes using empirical ranking methods," Journal of Climate, vol. 15, pp. 2954–2960, 2002.

[23] R. I. Harris, "Gumbel re-visited - a new look at extreme value statistics applied to wind speeds," Journal of Wind Engineering and Industrial Aerodynamics, vol. 59, no. 1, pp. 1–22, 1996.

[24] L. Makkonen, M. Pajari, and M. Tikanmaki, "Discussion on 'Plotting positions for fitting distributions and extreme value analysis'," Canadian Journal of Civil Engineering, vol. 40, no. 9, pp. 927–929, 2013.

[25] V. T. Chow, Handbook of Applied Hydrology, McGraw-Hill, New York, NY, USA, 1964.

[26] M. A. Benson, "Plotting positions and economics of engineering planning," Journal of the Hydraulics Division, vol. 88, no. 6, pp. 57–71, 1962.

[27] R. I. Harris, "The accuracy of design values predicted from extreme value analysis," Journal of Wind Engineering and Industrial Aerodynamics, vol. 89, no. 2, pp. 153–164, 2001.

[28] F. Mehdi and J. Mehdi, "Determination of plotting position formula for the normal, log-normal, Pearson (III), log-Pearson (III) and Gumble distributional hypotheses using the probability plot correlation coefficient test," World Applied Sciences Journal, vol. 15, no. 8, pp. 1181–1185, 2011.

[29] N. J. Cook, "Rebuttal of 'problems in the extreme value analysis'," Structural Safety, vol. 34, no. 1, pp. 418–423, 2012.

[30] A. S. Yahaya, M. N. Nor, N. R. M. Jali, N. A. Ramli, F. Ahmad, and A. Z. Ul-Saufie, "Determination of the probability plotting position for type I extreme value distribution," Journal of Applied Sciences, vol. 12, no. 14, pp. 1501–1506, 2012.

[31] A. S. Yahaya, C. S. Yee, N. A. Ramli, and F. Ahmad, "Determination of the best probability plotting position for predicting parameters of the Weibull distribution," International Journal of Applied Science and Technology, vol. 2, pp. 106–111, 2012.

[32] S. Kim, H. Shin, K. Joo, and J.-H. Heo, "Development of plotting position for the general extreme value distribution," Journal of Hydrology, vol. 475, pp. 259–269, 2012.

[33] M. Fuglem, G. Parr, and I. J. Jordaan, "Plotting positions for fitting distributions and extreme value analysis," Canadian Journal of Civil Engineering, vol. 40, no. 2, pp. 130–139, 2013.


4 Journal of Probability and Statistics

The claim that the true nonexceedance probabilities so-called Weibull plotting positions result in a biased estimateof a CDF is shown to be false and founded on a misunder-standing in theoretical considerations and when interpretingthe results of Monte Carlo simulations

The definition of the quantile function proposed hereshould of course be applied to its reverse function EDF aswell and used in the inference of data This is particularlyimportant in the extreme value analysis where probabilitiesof rare events need to be estimated

Appendix

Evaluating Plotting Positions

Consider a normally distributed variate 119909 with mean 0standard deviation 1 and distribution function 119865 Take 119899values 119909

119898

= 119865minus1

(119898(119899 + 1)) 119898 = 1 119899 Assume that119866 represents the function which transforms the probabilitiesto the probability paper that is all points (119909

119898

119866(119901119898

)) fallon the same straight line which represents the cumulativedistribution function (CDF) The slope and intercept of thestraight line are independent of 119899

Plot next the points (119909119898

119866(1199011015840

119898

)) on the same probabilitypaper Here values 119909

119898

are the same as those above butthe probabilities 1199011015840

119898

represent Blomrsquos [6] plotting position(119898 minus 0375)(119899 + 025) which have been developed fornormal distribution With increasing 119899 the resulting curveapproaches the correct straight line but they never coincideas illustrated in Figure 5 In this waywe have detected sampleswhich are correctly represented by the Weibull plottingpositions and incorrectly by any other Vice versa choosingproperly plotting positions (119898+119886)(119899+ 119887)with (119886 119887) = (0 1)we get a curve which by a linear regression can be forcedto compensate for any error somewhere else This is exactlywhat has been done in the history of plotting positions Thereason for and the nature of the error are characterized in thefollowing

First the reasons why the Weibull [7] plotting positionshave been objected in the literature are discussed

(1) Gumbel showed that the expected value of the prob-ability 119865(119909

119898

) is 119864(119865(119909119898

)) = 119898(119899 + 1) and forexample Langbein [15] argued that the probabilityof the next observation to fall in interval (119909

119898

119909119898+1

)

is 1(119899 + 1) but these observations have not beenregarded as rigorous justifications to use the Weibullpositions It is likely that the terminology used forexample by Chow [25] (mean number of exceedancesin 119873 future trials) and by Langbein [15] (mean valueof exceedance probabilities) has not been understoodas giving an ordinate on the CDF A rigorous proof for119875(119909 le 119909

119898

) = 119898(119899+ 1)was presented in the textbookby Madsen et al [11] but this has not deserved muchattention in the later research

(2) The way of thinking for some researchers has beenthat the sample values are given and the probabilitiesassociated with them are random while the correctway is to think vice versa that the sample probabilities

minus25

minus2

minus15

minus1

minus05

0

05

1

15

2

25

minus25 minus2 minus15 minus1 minus05 0 05 1 15 2 25

Blom10

Blom20

Blom50

Weibull

G(p)

x

Figure 5 cdf (straight line) of a normal distribution on probabilitypaper and the curves due to using Blomrsquos plotting positions [6]instead of the true probabilities Sample sizes 10 20 and 50

are exact and the sample values associated with themare variates Some others for example Benson [26]appear to have fully understood that the plottingpositions different from those of Weibull are notprobabilities Nevertheless for example Cunnane[18] claimed that it is not necessary to use the trueprobabilities because the final result that is the result-ing regression line is decisive Cunnane [18] tried toestimate the probability 119865(119864(119909

119898

)) and used it as theplotting positionwhen determining the CDF In otherwords he used 119875(119909 le 119864(119909

119898

)) to represent 119875(119909 le 119909119898

)although they are different concepts It is not at allsurprising that the Weibull positions correspondingto the latter probability are not representative of theformer

(3) In Monte Carlo simulations by the Weibull positionsthe conventional curve fitting procedures like theMLS tend to result in parameter estimates the meansof which do not coincide with the parameters of thedistribution from which the samples are taken It hasbeen observed that the difference can be reduced bytransforming the Weibull points linearly that is byreplacing probabilities 119898(119899 + 1) by (119898 + 119886)(119899 +119887) = [(119899 + 1)(119899 + 119887)][119898(119899 + 1)] + 119886(119899 + 119887)Geometrically this simply means that to improve thefit the straight line on the probability paper resultingfrom linear regression is replaced by another straightline In more detail when points (119909

119898119894

119898(119899 + 1))

are replaced by points (119909119898119894

(119898 + 119886)(119899 + 119887)) thelinearity is also affected as illustrated in Figure 5Thiseffect remains hidden however because the linearregression forces the fitted curve to be linear Thebehaviour described above has often been explainedby stating that the Weibull probabilities are ldquobiasedrdquoor more ldquobiasedrdquo than some other probabilities Thisis misuse of terminologyThe bias is defined as 119864(119886)minus119886 where 119864(119886) is the expected value of an estimator119886 determined from a sample and 119886 is the correct

Journal of Probability and Statistics 5

parameter value The Weibull probabilities are exactvalues in the same way as 12 is the probability ofheads when tossing a coin There is no need forestimation here

In contrast to the abovementioned arguments against theWeibull positions the bias in the parameters resulting fromthe traditional Monte Carlo simulations by the Weibullpositions is in fact attributable to taking the mean of theparameter estimates and to the curve fitting method Fromthe mathematics we know that if 119901 and 119902 are nonlinearlyrelated as 119901 = 119892(119902) it follows that 119864(119901) = 119892(119864(119902)) Fromthe elementary statistics we know that the sample varianceis a biased estimate of the population variance Why shouldwe then believe that the mean of standard deviations orany other distribution parameters obtained from successivesamples (119909

1119894

1199092119894

119909119899119894

) would approach the parameterof the population On probability paper the slope of theregression line represents 119889119909119889(119866(119875)) where 119866 stands for aproper nonlinear transformation of the probability axisThusthere is no a priori reason to expect that a mean of sampleslopes in MC simulations presents something relevant in theprobabilistic sense

Consequently the convergence of the mean of successiveparameter estimates to the correct parameter value cannot beregarded as a goodness criterion for plotting positions Weshould use a criterion based on the bin frequency insteadbecause it is the frequency by which probability is definedFrom a parent distribution with given parameters take asample of size 119899 find the estimated straight line 119866(119875) = 119896119909 +119888 take from the parent distribution one additional randomvalue 119909

119899+1

record the bin [0 1(119899 + 1)] (1(119899 + 1) 2(119899 +1)] (119899(119899+ 1) 1] to which 119875

119899+1

= 119866minus1

(119896119909119899+1

+ 119888) belongsand repeat the steps Auniformdistribution of hits to each binmeans that the method has been successful The fit on eachbin ((119898 minus 1)(119899 + 1) 119898(119899 + 1)] can be considered separatelyusing the criterion

1198622

119898

= (119873119898

119873minus

1

119899 + 1)

2

(A1)

or the whole distribution by

1198622

=

119899+1

sum

119898=1

1198622

119898

(A2)

Here 119873 is the number of simulations and 119873119898

is the numberof hits to bin119898

Such an analysis was made by Makkonen et al [24]The simulations verified that the Weibull positions give themost accurate estimate in the sense of criterion (A2) forall considered distributions that is for Gumbel Weibullnormal and lognormal distribution Another simulation forthe Gumbel distribution 119865(119909) = exp(minus exp(minus(119909 minus 120583)120573))with mean = 5 and standard deviation = 2 was carried outusing the Weibull and Gringorten plotting positions and itsresults are presented in Figure 6 Sample size 2 was chosen to

000

005

010

015

020

025

030

035

040

1 2 3

Relat

ive f

requ

ency

in b

in

Bin

ExactWeibull

Gringorten

Figure 6 Relative frequency of hits in probability intervals 1 2 and3 ([0 13] (13 23] and (23 1]) when using Weibullrsquos [7] andGringortenrsquos [8] plotting positions in Monte Carlo simulations with50000 cycles

Table 1 Parameters obtained for Gumbel distribution from MCsimulation with 50000 cycles These parameters cannot be used inevaluating plotting positions (see text)

Exact parameter Estimated parameterWeibull Gringorten

Mean 5 5381 5196Standard deviation 2 2787 1893120583 4099918 4127 4345120573 1559394 2173 1476

eliminate the possible bias due to the linear regression Theresults confirm the performance of the Weibull positions inthe same way as a nearly uniform distribution in numbers1 6 confirms that the die is fairThe erroneous deductionbased on calculating the mean of the estimated distributionparameters 120583 and 120573 results in a traditional (and incorrect)conclusion that the Weibull positions are worse than thoseproposed by Gringorten [8] The source of this misunder-standing is demonstrated in Table 1

The discussion above shows that all the claims presentedagainst the Weibull plotting positions are unfounded Par-ticularly the recent Monte Carlo simulations supportingthese claims [27ndash33] are based on a misunderstood role ofthe mean of the sample parameters The performance ofa single fitted curve should always be compared with theWeibull probabilities plotted against the observed order-ranked values The plotting positions different from those ofWeibull should not be used to eliminate the error observedwhen a mean of the sample estimates is taken inMonte Carlosimulations because such an error never occurs in a practicalsituationwhere we have only one sample and one estimate foreach parameter

6 Journal of Probability and Statistics

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Academy of Finland via the FICCA programme.

References

[1] R. J. Hyndman and Y. Fan, “Sample quantiles in statistical packages,” American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[2] E. Langford, “Quartiles in elementary statistics,” Journal of Statistics Education, vol. 14, no. 3, 2006.

[3] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.

[4] B. F. Kimball, “On the choice of plotting positions on probability paper,” Journal of the American Statistical Association, vol. 55, pp. 546–560, 1960.

[5] S. Castillo-Gutierrez, E. Lozano-Aguilera, and M. D. Estudillo-Martínez, “Selection of a plotting position for a normal Q-Q plot. R script,” Journal of Communication and Computer, vol. 9, pp. 243–250, 2012.

[6] G. Blom, Statistical Estimates and Transformed Beta-Variables, John Wiley & Sons, 1958.

[7] W. Weibull, “A statistical theory of strength of materials,” Ingeniorsvetenskapakademiens Handlingar, vol. 151, 45 pages, 1939.

[8] I. I. Gringorten, “A plotting rule for extreme probability paper,” Journal of Geophysical Research, vol. 68, no. 3, pp. 813–814, 1963.

[9] L. Makkonen, “Bringing closure to the plotting position controversy,” Communications in Statistics – Theory and Methods, vol. 37, no. 3–5, pp. 460–467, 2008.

[10] L. Makkonen, M. Pajari, and M. Tikanmaki, “Closure to ‘Problems in the extreme value analysis’ (Struct. Safety 2008:30:405–419),” Structural Safety, vol. 40, pp. 65–67, 2013.

[11] H. O. Madsen, S. Krenk, and N. C. Lind, Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, NJ, USA, 1986.

[12] H. L. Harter, “Another look at plotting positions,” Communications in Statistics – Theory and Methods, vol. 13, no. 13, pp. 1613–1633, 1984.

[13] L. Makkonen, “Problems in the extreme value analysis,” Structural Safety, vol. 30, pp. 405–419, 2008.

[14] A. Benard and E. C. Bos-Levenbach, “The plotting of observations on probability paper,” Statistica, vol. 7, pp. 163–173, 1953.

[15] W. B. Langbein, “Plotting positions in frequency analysis,” US Geological Survey Water-Supply Paper 1543-A, 1960.

[16] M. B. Wilk and R. Gnanadesikan, “Probability plotting methods for the analysis of data,” Biometrika, vol. 55, no. 1, pp. 1–17, 1968.

[17] V. Barnett, “Probability plotting methods and order statistics,” Journal of the Royal Statistical Society C: Applied Statistics, vol. 24, no. 1, pp. 95–108, 1975.

[18] C. Cunnane, “Unbiased plotting positions – a review,” Journal of Hydrology, vol. 37, no. 3-4, pp. 205–222, 1978.

[19] S. L. Guo, “A discussion on unbiased plotting positions for the general extreme value distribution,” Journal of Hydrology, vol. 121, pp. 33–44, 1990.

[20] D. A. Jones, “Plotting positions via maximum-likelihood for a non-standard situation,” Hydrology and Earth System Sciences, vol. 1, no. 2, pp. 357–366, 1997.

[21] G.-H. Yu and C.-C. Huang, “A distribution free plotting position,” Stochastic Environmental Research and Risk Assessment, vol. 15, no. 6, pp. 462–476, 2001.

[22] C. Folland and C. Anderson, “Estimating changing extremes using empirical ranking methods,” Journal of Climate, vol. 15, pp. 2954–2960, 2002.

[23] R. I. Harris, “Gumbel re-visited – a new look at extreme value statistics applied to wind speeds,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 59, no. 1, pp. 1–22, 1996.

[24] L. Makkonen, M. Pajari, and M. Tikanmaki, “Discussion on ‘Plotting positions for fitting distributions and extreme value analysis’,” Canadian Journal of Civil Engineering, vol. 40, no. 9, pp. 927–929, 2013.

[25] V. T. Chow, Handbook of Applied Hydrology, McGraw-Hill, New York, NY, USA, 1964.

[26] M. A. Benson, “Plotting positions and economics of engineering planning,” Journal of the Hydraulics Division, vol. 88, no. 6, pp. 57–71, 1962.

[27] R. I. Harris, “The accuracy of design values predicted from extreme value analysis,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 89, no. 2, pp. 153–164, 2001.

[28] F. Mehdi and J. Mehdi, “Determination of plotting position formula for the normal, log-normal, Pearson(III), log-Pearson(III) and Gumble distributional hypotheses using the probability plot correlation coefficient test,” World Applied Sciences Journal, vol. 15, no. 8, pp. 1181–1185, 2011.

[29] N. J. Cook, “Rebuttal of ‘problems in the extreme value analysis’,” Structural Safety, vol. 34, no. 1, pp. 418–423, 2012.

[30] A. S. Yahaya, M. N. Nor, N. R. M. Jali, N. A. Ramli, F. Ahmad, and A. Z. Ul-Saufie, “Determination of the probability plotting position for type I extreme value distribution,” Journal of Applied Sciences, vol. 12, no. 14, pp. 1501–1506, 2012.

[31] A. S. Yahaya, C. S. Yee, N. A. Ramli, and F. Ahmad, “Determination of the best probability plotting position for predicting parameters of the Weibull distribution,” International Journal of Applied Science and Technology, vol. 2, pp. 106–111, 2012.

[32] S. Kim, H. Shin, K. Joo, and J.-H. Heo, “Development of plotting position for the general extreme value distribution,” Journal of Hydrology, vol. 475, pp. 259–269, 2012.

[33] M. Fuglem, G. Parr, and I. J. Jordaan, “Plotting positions for fitting distributions and extreme value analysis,” Canadian Journal of Civil Engineering, vol. 40, no. 2, pp. 130–139, 2013.


parameter value. The Weibull probabilities are exact values in the same way as 1/2 is the probability of heads when tossing a coin. There is no need for estimation here.
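The exactness of the rank probabilities can be checked numerically. The following sketch assumes independent draws from a continuous parent distribution and counts how often a new value falls at or below the $m$-th smallest of $n$ earlier values; the relative frequency should approach $m/(n+1)$ whatever the parent distribution. The function names, the two parent distributions, and the number of cycles are illustrative choices, not taken from the paper.

```python
import random


def rank_probability(n, m, cycles, draw, seed=0):
    """Relative frequency of a new value being <= the m-th smallest of n values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(cycles):
        sample = sorted(draw(rng) for _ in range(n))
        if draw(rng) <= sample[m - 1]:
            hits += 1
    return hits / cycles


def draw_exponential(rng):
    """One draw from a unit exponential parent (illustrative choice)."""
    return rng.expovariate(1.0)


def draw_normal(rng):
    """One draw from a standard normal parent (illustrative choice)."""
    return rng.gauss(0.0, 1.0)


n = 4
for m in range(1, n + 1):
    freq_exp = rank_probability(n, m, 20000, draw_exponential)
    freq_norm = rank_probability(n, m, 20000, draw_normal)
    print(m, m / (n + 1), round(freq_exp, 3), round(freq_norm, 3))
```

Both frequency columns should stay close to the second column, $m/(n+1)$, up to sampling noise.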

In contrast to the abovementioned arguments against the Weibull positions, the bias in the parameters resulting from the traditional Monte Carlo simulations by the Weibull positions is in fact attributable to taking the mean of the parameter estimates and to the curve fitting method. From mathematics we know that if $p$ and $q$ are nonlinearly related as $p = g(q)$, then in general $E(p) \neq g(E(q))$. From elementary statistics we know that the sample variance is a biased estimate of the population variance. Why should we then believe that the mean of standard deviations, or of any other distribution parameters, obtained from successive samples $(x_{1i}, x_{2i}, \ldots, x_{ni})$ would approach the parameter of the population? On probability paper, the slope of the regression line represents $dx/d(G(P))$, where $G$ stands for a proper nonlinear transformation of the probability axis. Thus, there is no a priori reason to expect that a mean of sample slopes in MC simulations presents something relevant in the probabilistic sense.
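The effect of a nonlinear relation on a mean can be illustrated with the square root. In the sketch below, $q$ is the sample variance and $p = \sqrt{q}$ is the sample standard deviation, so the mean of $p$ over many samples falls below the square root of the mean of $q$. The normal parent, the $n - 1$ divisor, and the function names are assumptions made for this illustration and do not come from the paper.

```python
import math
import random


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def mean_of_sample_stds(n, cycles, sigma=1.0, seed=0):
    """Return (mean of sample stds, square root of the mean sample variance)."""
    rng = random.Random(seed)
    stds = [
        sample_std([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(cycles)
    ]
    mean_std = sum(stds) / cycles
    root_mean_var = math.sqrt(sum(s * s for s in stds) / cycles)
    return mean_std, root_mean_var


mean_std, root_mean_var = mean_of_sample_stds(n=2, cycles=20000)
print(f"mean of sample stds:          {mean_std:.3f}")
print(f"sqrt of mean sample variance: {root_mean_var:.3f}")
```

For a normal parent with $\sigma = 1$ and $n = 2$, theory gives a mean sample standard deviation of $\sqrt{2/\pi} \approx 0.80$, while the square root of the mean sample variance stays near 1. Averaging the standard deviations over more cycles does not remove this gap.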

Consequently, the convergence of the mean of successive parameter estimates to the correct parameter value cannot be regarded as a goodness criterion for plotting positions. We should use a criterion based on the bin frequency instead, because it is the frequency by which probability is defined. From a parent distribution with given parameters, take a sample of size $n$; find the estimated straight line $G(P) = kx + c$; take from the parent distribution one additional random value $x_{n+1}$; record the bin $[0, 1/(n+1)], (1/(n+1), 2/(n+1)], \ldots, (n/(n+1), 1]$ to which $P_{n+1} = G^{-1}(k x_{n+1} + c)$ belongs; and repeat the steps. A uniform distribution of hits to each bin means that the method has been successful. The fit on each bin $((m-1)/(n+1), m/(n+1)]$ can be considered separately using the criterion

$$C_m^2 = \left(\frac{N_m}{N} - \frac{1}{n+1}\right)^2 \tag{A1}$$

or the whole distribution by

$$C^2 = \sum_{m=1}^{n+1} C_m^2. \tag{A2}$$

Here $N$ is the number of simulations and $N_m$ is the number of hits to bin $m$.
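The bin-frequency procedure and criteria (A1) and (A2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the parent is a Gumbel distribution with mean 5 and standard deviation 2, the line is fitted by ordinary least squares of $G(P) = -\ln(-\ln P)$ on $x$, the Weibull positions are $m/(n+1)$, and the Gringorten positions are taken as $(m - 0.44)/(n + 0.12)$. All function names, the seed, and the number of cycles are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def weibull_positions(n):
    return [m / (n + 1) for m in range(1, n + 1)]


def gringorten_positions(n):
    return [(m - 0.44) / (n + 0.12) for m in range(1, n + 1)]


def gumbel_draw(rng, mu, beta):
    """Inverse-CDF draw from F(x) = exp(-exp(-(x - mu) / beta))."""
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1)
        u = rng.random()
    return mu - beta * math.log(-math.log(u))


def to_gumbel_paper(p):
    """Transformation G(P) of the probability axis on Gumbel paper."""
    return -math.log(-math.log(p))


def from_gumbel_paper(z):
    """Inverse transformation; very negative z maps to probability 0."""
    if z < -700.0:
        return 0.0
    return math.exp(-math.exp(-z))


def fit_line(xs, ys):
    """Least-squares line y = k * x + c (exact through two points when n = 2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, my - k * mx


def bin_frequencies(positions, n, cycles, mean=5.0, std=2.0, seed=1):
    """Relative frequency N_m / N of hits in each of the n + 1 bins."""
    beta = std * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    targets = [to_gumbel_paper(p) for p in positions(n)]
    rng = random.Random(seed)
    hits = [0] * (n + 1)
    for _ in range(cycles):
        sample = sorted(gumbel_draw(rng, mu, beta) for _ in range(n))
        k, c = fit_line(sample, targets)
        p_new = from_gumbel_paper(k * gumbel_draw(rng, mu, beta) + c)
        m = min(n + 1, max(1, math.ceil(p_new * (n + 1))))
        hits[m - 1] += 1
    return [h / cycles for h in hits]


def criterion_c2(freqs):
    """Criterion (A2): sum over bins of (N_m / N - 1 / (n + 1)) ** 2."""
    return sum((f - 1.0 / len(freqs)) ** 2 for f in freqs)


for name, positions in (("Weibull", weibull_positions),
                        ("Gringorten", gringorten_positions)):
    freqs = bin_frequencies(positions, n=2, cycles=20000)
    print(name, [round(f, 3) for f in freqs], criterion_c2(freqs))
```

With $n = 2$ the fitted line passes through both plotted points, so with the Weibull positions the bin of the new value is determined by its rank among the three values, and the frequencies should be close to 1/3 each. The same seed is used for both sets of positions so that they are compared on identical samples.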


