Victor Chernozhukov
Iván Fernández-Val
Alfred Galichon
THE INSTITUTE FOR FISCAL STUDIES
DEPARTMENT OF ECONOMICS, UCL
cemmap working paper CWP10/07
QUANTILE AND PROBABILITY CURVES WITHOUT CROSSING
VICTOR CHERNOZHUKOV† IVAN FERNANDEZ-VAL§ ALFRED GALICHON‡
Abstract. The most common approach to estimating conditional quantile curves is to fit a curve,
typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are
used for a number of reasons including parsimony of the resulting approximations and good computa-
tional properties. The resulting fits, however, may not respect a logical monotonicity requirement – that
the quantile curve be increasing as a function of probability. This paper studies the natural monotoniza-
tion of these empirical curves induced by sampling from the estimated non-monotone model, and then
taking the resulting conditional quantile curves that by construction are monotone in the probability.
This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic re-
arrangement of the original non-monotone function. It is shown that the monotonized curves are closer
to the true curves in finite samples, for any sample size. Under correct specification, the rearranged
conditional quantile curves have the same asymptotic distribution as the original non-monotone curves.
Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the
asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize
the estimates of conditional distribution functions. The results are derived by establishing the compact
(Hadamard) differentiability of the monotonized quantile and probability curves with respect to the
original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so,
the compact differentiability of the rearrangement-related operators is established.
Keywords: Quantile regression, Monotonicity, Rearrangement, Approximation, Functional Delta
Method, Hadamard Differentiability of Rearrangement Operators.
AMS 2000 subject classification: Primary 62J02; Secondary 62E20, 62P20
Date: First version is of April 6, 2005. The last modification was done April 27, 2007. The title
of this paper is (partially) borrowed from the work of Xuming He (1997), to whom we are grateful
for the inspiration and formulation of the problem. We would like to thank Josh Angrist, Andrew
Chesher, Phil Cross, Raymond Guiteras, Xuming He, Roger Koenker, Vadim Marmer, Ilya Molchanov,
Francesca Molinari, Whitney Newey, Steve Portnoy, Shinichi Sakata, Art Shneyerov, Alp Simsek, and
seminar participants at BU, Columbia, Cornell, Georgetown, Harvard-MIT, MIT, Northwestern, UBC,
and UCL for very useful comments that helped improve the paper.
1. Introduction and Discussion
The problem studied in this paper can be best described using linear quantile regression
as the prime example (Koenker, 2005). Suppose that x′β(u) is a linear approximation
to the u-quantile Q0(u|x) of a real response variable Y, given a vector of regressors
X = x. The typical estimation methods fit the conditional curve x′β(u) pointwise in
u ∈ (0, 1), producing an estimate x′β̂(u). Linear functional forms, coupled with pointwise
fitting, are used for a number of reasons, including parsimony of the resulting
approximations and good computational properties (Portnoy and Koenker, 1997). However, a
problem that might occur is that the map
u ↦ x′β̂(u)
may not be increasing in u, which violates the logical monotonicity requirement. Another
manifestation of this issue, known as the “quantile crossing problem” (He, 1997), is that
the conditional quantile curves x ↦ x′β̂(u) may cross for different values of u.
In the analysis we shall distinguish the following two cases, each leading to the lack
of monotonicity or the crossing problem:
(1) Monotonically correct case: The population curve u ↦ x′β(u) is increasing in u,
and thus satisfies the monotonicity requirement. However, the empirical curve
u ↦ x′β̂(u) may be non-monotone due to estimation error.
(2) Monotonically incorrect case: The population curve u ↦ x′β(u) is non-monotone
due to imperfect approximation to the true conditional quantile function. Accordingly,
the resulting empirical curve u ↦ x′β̂(u) is also non-monotone due to
both non-monotonicity of the population curve and estimation error.
Consider the random variable
\[ Y_x := x'\hat\beta(U), \quad \text{where } U \sim U(0,1). \]
This variable can be seen as a bootstrap draw from the estimated quantile regression
model, as in Koenker (1994), and has the distribution function
\[ \hat F(y|x) = \int_0^1 1\{x'\hat\beta(u) \le y\}\,du. \tag{1.1} \]
Moreover, inverting the distribution function, one obtains a proper quantile function
\[ \hat F^{-1}(u|x) = \inf\{y : \hat F(y|x) \ge u\}, \tag{1.2} \]
which is monotone in u. The rearranged quantile function F̂−1(u|x) coincides with the
original curve x′β̂(u) if the original curve is increasing in u, but differs from the original
curve otherwise. Thus, starting with a possibly non-monotone original curve u ↦ x′β̂(u),
the rearrangement (1.1)-(1.2) produces a monotone quantile curve u ↦ F̂−1(u|x). In
what follows, we focus our attention on the interval (0, 1) without loss of generality.
Indeed, any closed subinterval of (0, 1) can be treated in the same way as the unit
interval, as commented further in Section 2.
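On a uniform grid of quantile indices, the rearrangement (1.1)-(1.2) amounts to sorting the fitted values, since the u-quantile of a finite set of equally weighted values is an order statistic. The following is a minimal sketch under that discretization; the function name `rearrange_quantile` and the wiggly curve `q_hat` are hypothetical and serve only as an illustration, not as the procedure used in the empirical sections.

```python
import numpy as np

def rearrange_quantile(q_values):
    """Monotone rearrangement of a curve evaluated on a uniform midpoint grid in (0, 1).

    Sorting is the grid analogue of (1.1)-(1.2): the empirical distribution of the
    fitted values is inverted, which returns the order statistics.
    """
    return np.sort(np.asarray(q_values, dtype=float))

# Hypothetical non-monotone fitted curve u -> x'beta_hat(u) (illustration only).
n = 1000
u = (np.arange(n) + 0.5) / n                 # midpoint grid on (0, 1)
q_hat = u + 0.3 * np.sin(6.0 * np.pi * u)    # decreasing on parts of (0, 1)
q_rearranged = rearrange_quantile(q_hat)
```

If the input curve is already increasing, sorting leaves it unchanged, which mirrors the statement that the rearranged curve coincides with the original curve in that case.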
As mentioned above, this rearrangement mechanism has a direct relation to the quan-
tile regression bootstrap (Koenker, 1994), since the rearranged quantile curve is produced
by sampling from the estimated original quantile model. Moreover, both the mechanism
and its name have a direct relation to rearrangement maps in variational analysis and op-
erations research (e.g., Hardy, Littlewood, and Polya, 1952, and Villani, 2003). Further
important references on the rearrangement method are discussed below.
The purpose of this paper is to establish the empirical properties of the rearranged
quantile curves and their distribution counterparts:
\[ u \mapsto \hat F^{-1}(u|x) \quad \text{and} \quad y \mapsto \hat F(y|x), \]
under scenarios (1) and (2). The paper also characterizes certain analytical and
approximation properties of the corresponding population curves:
\[ u \mapsto F^{-1}(u|x) = \inf\{y : F(y|x) \ge u\} \quad \text{and} \quad y \mapsto F(y|x) = \int_0^1 1\{x'\beta(u) \le y\}\,du. \]
The first main result of the paper establishes the improved estimation properties of
the rearranged curves. We show that the rearranged curve F̂−1(u|x) is closer to the true
conditional quantile curve Q0(u|x) than the original curve. Formally, for each x, we have
that for all p ∈ [1,∞]
\[ \left( \int_{\mathcal U} |Q_0(u|x) - \hat F^{-1}(u|x)|^p \,du \right)^{1/p} \le \left( \int_{\mathcal U} |Q_0(u|x) - x'\hat\beta(u)|^p \,du \right)^{1/p}, \]
where the inequality is strict for p ∈ (1,∞) whenever u ↦ x′β̂(u) is decreasing on
a subset of 𝒰 := (0, 1) of positive Lebesgue measure, while u ↦ Q0(u|x) is strictly
increasing. This property is independent of the sample size, and thus continues to hold
in the population, and also regardless of whether the linear quantile estimator x′β̂(u)
estimates Q0(u|x) consistently or not, i.e., whether Q0(u|x) = x′β(u) or Q0(u|x) ≠ x′β(u).
In other words, the rearranged quantile curves have smaller estimation error
than the original curves whenever the latter are not monotone. This is a very important
property that does not depend on the way the quantile model is estimated. It also does
not rely on any other specifics of the current context and is therefore applicable quite
generally.
Towards describing the essence of the rest of the results, let us fix the value of the regressor
X to x. Suppose that β̂(u) is an estimator for β(u) that converges weakly to a Gaussian
process G(u), so that
\[ \sqrt{n}\, x'(\hat\beta(u) - \beta(u)) \Rightarrow x'G(u), \tag{1.3} \]
as a stochastic process indexed by u in the metric space of bounded functions ℓ∞(0, 1).
For sufficient conditions, see, for example, Gutenbrunner and Jureckova (1992), Portnoy
(1991), and Angrist, Chernozhukov, and Fernandez-Val (2006).
The second main result of the paper is that in the monotonically correct case (1),
\[ \sqrt{n}\,(\hat F(y|x) - F(y|x)) \Rightarrow F'(y|x)\,[x'G(F(y|x))], \tag{1.4} \]
as a stochastic process indexed by y in the metric space ℓ∞(𝒴), where 𝒴 is the support
of Yx; and
\[ F'(y|x) = \frac{1}{x'\beta'(u)}\bigg|_{u = F(y|x)}, \quad \text{with } \beta'(u) := \frac{\partial \beta(u)}{\partial u}. \]
Moreover, we show that
\[ \sqrt{n}\,(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow x'G(u), \tag{1.5} \]
as a stochastic process indexed by u, in ℓ∞(0, 1); which, remarkably, coincides with the
first order asymptotics (1.3) of the original curve. This result has a convenient practical
implication: if the population curve is monotone, then the empirical non-monotone
curve can be re-arranged to be monotonic without affecting its (first order) asymptotic
properties. To derive the above results we find the functional Hadamard derivatives
of F (y|x) and F−1(u|x) with respect to perturbations of the underlying curve x′β(u)
in discontinuous directions, tangentially to the set of continuous functions, and then
use the functional delta method. Establishing the Hadamard differentiability of the
rearranged distribution and quantile curves in discontinuous directions is the second
main theoretical result of the paper.
The third main result is that in the monotonically incorrect case
\[ \sqrt{n}\,(\hat F(y|x) - F(y|x)) \Rightarrow \sum_{k=1}^{K(y|x)} \frac{x'G(u_k(y|x))}{|x'\beta'(u_k(y|x))|}, \tag{1.6} \]
as a stochastic process indexed by y ∈ K, in ℓ∞(K), where K is an appropriate set
defined in the next section. Here u_1(y|x) < ... < u_{K(y|x)}(y|x) are solutions to the
equation y = x′β(u), assuming that K(y|x) is bounded. Similarly, for the rearranged
quantile curve,
\[ \sqrt{n}\,(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow \left( \sum_{k=1}^{K(y|x)} \frac{1}{|x'\beta'(u_k(y|x))|} \right)^{-1} \sum_{k=1}^{K(y|x)} \frac{x'G(u_k(y|x))}{|x'\beta'(u_k(y|x))|} \Bigg|_{y = F^{-1}(u|x)}, \tag{1.7} \]
as a stochastic process indexed by u ∈ K′, in ℓ∞(K′), where K′ is an appropriate set
defined in the next section.
Analogously to quantiles, most estimation methods for conditional distribution func-
tions do not impose monotonicity, and therefore can give rise to non-monotonic empirical
conditional distribution curves; see, for example, Hall, Wolff, and Yao (1999). A similar
monotone rearrangement can be applied to these distribution curves by exchanging the
roles played by the quantile and the probability spaces. Thus, suppose that P̂(y|x) is a
candidate estimate of a conditional distribution function, which is not monotone in y.
The rearranged monotone quantile curve associated with P̂(y|x) is
\[ \hat Q(u|x) = \int_0^\infty 1\{\hat P(y|x) \le u\}\,dy - \int_{-\infty}^0 1\{\hat P(y|x) > u\}\,dy. \]
The rearranged probability curve can then be obtained as the inverse of this quantile
curve, i.e.,
\[ \hat F(y|x) = \inf\{u : \hat Q(u|x) \ge y\}, \]
which is monotone by construction. Section 3 shows in more detail that similar improved
estimation properties and an asymptotic distribution theory go through for Q̂ and F̂.
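As an illustration of the distribution-curve rearrangement, suppose (as an assumption made only for this sketch) that the candidate curve is evaluated on a uniform midpoint grid over a bounded interval [a, b]. The rearranged quantile curve then reduces to a plus the length of the set where the candidate lies below u, and inverting it returns the sorted candidate values. The candidate `p_hat` below is hypothetical.

```python
import numpy as np

def rearrange_distribution(p_values):
    """Monotone rearrangement in y of a candidate distribution curve.

    Assumes p_values are evaluated on a uniform midpoint grid over a bounded
    interval, so that the two-step construction (rearranged quantile curve,
    then its inverse) reduces to sorting the values.
    """
    return np.sort(np.asarray(p_values, dtype=float))

# Hypothetical non-monotone candidate curve (illustration only).
n = 800
a, b = -4.0, 4.0
y = a + (np.arange(n) + 0.5) * (b - a) / n               # midpoint grid on [a, b]
p_hat = 1.0 / (1.0 + np.exp(-y)) + 0.02 * np.sin(4.0 * y)
p_rearranged = rearrange_distribution(p_hat)
```

The rearranged curve takes exactly the same set of values as the candidate, only reordered to be nondecreasing in y.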
The distributional results in the paper do not rely on the sampling properties of the
particular estimation method used, because they are expressed in terms of the differ-
entiability of the operator with respect to the basic estimated process. Moreover, the
results that follow are derived without imposing linearity of the functional forms. The
only conditions required are that (1) a central limit theorem like (1.3) applies to the
estimator of the curve, and (2) the population curves have some smoothness properties.
The exact nature of these population curves does not affect the validity of the results.
For example, the results hold regardless of whether the underlying model is an ordinary
or an instrumental quantile regression model.
There exist other methods to obtain monotonic fits based on quantile regression.
He (1997), for example, proposes to impose a location-scale regression model, which
naturally satisfies monotonicity. This approach is fruitful for location-scale situations,
but in numerous cases data do not satisfy the location-scale model, as discussed, for
example, in Lehmann (1974), Doksum (1974), and Koenker (2005). Koenker and Ng
(2005) develop a computational method for quantile regression that imposes the non-
crossing constraints in simultaneous fitting of quantile curves. This approach may be
fruitful in many situations, but the statistical properties of the method remain unknown.
Clearly, Koenker and Ng’s proposal is different from the rearrangement method.
The distributional results obtained in the paper can also be viewed as a functional
delta method for the rearrangement-related operators (1.1) and (1.2) that include the
inverse (quantile) operators as a special case. In this sense, they extend the previous
results by Gill and Johansen (1990), Doss and Gill (1992), and Dudley and Norvaisa
(1999) on compact differentiability of the quantile operator. The main technical difficulty
here, as well as in the quantile case, is that differentiability needs to be established in
discontinuous directions (that converge to continuous directions, i.e., tangentially to
the set of continuous functions), because the empirical perturbations of the quantile
processes are typically step functions.
Both the statistical and mathematical results of this paper complement the important
work of Dette, Neumeyer, and Pilz (2006), which applies the rearrangement operators to
kernel mean regressions, in order to obtain mean regression functions that are monotonic
in the regressors. Our results on Hadamard differentiability in discontinuous directions
are new. They complement the local expansions in smooth directions subsumed in the
proofs in Dette et al. for the case called here the monotonically correct case. In
addition, our results cover monotonically incorrect cases. The statistical problem also
differs quite substantially. The mathematical results of this paper also complement the
results on directional differentiability of L1-functionals of rearranged functions like (1.2)
by Mossino and Temam (1981). The results for L1-functionals do not imply the main
results of this paper, such as (1.4)-(1.7), but the converse is shown to be true. (See
discussion after Proposition 4 in Section 2 for more details.)
There are many potential applications of the estimation and differentiability results of
the paper to objects other than probability or quantile curves. For example, in a com-
panion work we present applications to economic demand and production functions and
to biometric growth curves, where monotonization is used to impose useful theoretical
or logical restrictions (Chernozhukov, Fernandez-Val, and Galichon, 2006a).
We organize the rest of the paper as follows. In Section 2.1 we describe some basic
analytical properties of the rearranged population curves. In Section 2.2 we derive the
functional differentiability results. In Section 2.3 we present estimation properties of
the rearranged curves and establish their limit distributions. In Section 3 we extend
the previous results to monotonize estimates of distribution curves. In Section 4.1 we
illustrate the rearrangement procedure with an empirical application, and in Section 4.2
we provide a Monte-Carlo example. In Section 5 we conclude with a summary of the
main results.
2. Rearranged Quantile Curves: Analytical and Empirical Properties
In this section the treatment of the problem is somewhat more general than in the
introduction. In particular, we replace the linear functional form x′β(u) by Q(u|x). Define
Yx := Q(U|x), where U ∼ Uniform(𝒰) with 𝒰 = (0, 1). Let
\[ F(y|x) := \int_0^1 1\{Q(u|x) \le y\}\,du \]
be the distribution function of Yx, and F−1(u|x) := inf{y : F(y|x) ≥ u} be the
quantile function of Yx.
Remark. We consider the interval (0, 1) without loss of generality. Indeed, suppose
we are interested in a particular subinterval (a, a + b) of (0, 1). For example, we may
wish to focus estimation on a particular range of quantiles or to avoid estimation of tail
quantiles. For this purpose, we define all objects conditionally on the event U ∈ (a, a + b):
Ỹx := Q̃(U|x) = Q(a + bU|x), where U ∼ U(0, 1),
\[ \tilde F(y|x) := \int_0^1 1\{\tilde Q(u|x) \le y\}\,du, \]
and F̃−1(u|x) := inf{y : F̃(y|x) ≥ u} for u ∈ (0, 1). The analysis of the paper applies to the
functions Q̃ and F̃−1. In order to go back to the unconditional quantities, we can use
the transformations Q(u|x) = Q̃((u − a)/b|x) for u ∈ (a, a + b) and F(y|x) = a + bF̃(y|x)
for y ∈ {Q(u|x) : u ∈ (a, a + b)}.
2.1. Basic Analytical Properties. We start by developing some basic properties for
F (y|x) and F−1(u|x), the population counterparts of the rearranged distribution curve
and its inverse. We need these properties to derive various empirical properties stated
in the next section.
Recall first the following definitions from Milnor (1965): Let g : U ⊂ R → R be a
continuously differentiable function. A point u ∈ U is called a regular point of g if the
derivative of g at this point does not vanish, i.e., g′(u) ≠ 0. A point u which is not a
regular point is called a critical point. A value y ∈ g(U) is called a regular value of g
if g−1({y}) contains only regular points, i.e., if g′(u) ≠ 0 for all u ∈ g−1({y}). A value y
which is not a regular value is called a critical value.
Denote by 𝒴x the support of Yx, 𝒴𝒳 := {(y, x) : y ∈ 𝒴x, x ∈ 𝒳}, and 𝒰𝒳 := 𝒰 × 𝒳.
We assume throughout that 𝒴x ⊂ 𝒴, which is a compact subset of R, and that x ∈ 𝒳, a
compact subset of Rd. In some applications the curves of interest are not functions of
x, or we might be interested in a particular value x. In this case, the set 𝒳 is taken to
be a singleton 𝒳 = {x}. We make the following assumptions about Q(u|x):
(a) Q(u|x) : 𝒰 × 𝒳 → R is a continuously differentiable function in both arguments,
(b) For each x ∈ 𝒳, the number of elements of {u ∈ 𝒰 | Q′(u|x) = 0} is finite and
uniformly bounded on x ∈ 𝒳.
Assumption (b) implies that, for each x ∈ 𝒳, Q′(u|x) is not zero almost everywhere on
𝒰 and can only switch sign a bounded number of times. Let 𝒴∗x be the subset of regular
values of u ↦ Q(u|x) in 𝒴x, and 𝒴𝒳∗ := {(y, x) : y ∈ 𝒴∗x, x ∈ 𝒳}.
Proposition 1 (Basic Properties of F(y|x) and F−1(u|x)). Under assumptions (a)-(b),
the functions F(y|x) and F−1(u|x) satisfy the following properties:
1. The set of critical values, 𝒴x \ 𝒴∗x, is finite, and
\[ \int_{\mathcal Y_x \setminus \mathcal Y_x^*} dF(y|x) = 0. \]
2. For any y ∈ 𝒴∗x,
\[ F(y|x) = \sum_{k=1}^{K(y|x)} \mathrm{sign}\{Q'(u_k(y|x)|x)\}\, u_k(y|x) + 1\{Q'(u_{K(y|x)}(y|x)|x) < 0\}, \]
where {u_k(y|x), for k = 1, ..., K(y|x) < ∞} are the roots of Q(u|x) = y in increasing
order.
3. For any y ∈ 𝒴∗x, the ordinary derivative f(y|x) = ∂F(y|x)/∂y exists and takes the
form
\[ f(y|x) = \sum_{k=1}^{K(y|x)} \frac{1}{|Q'(u_k(y|x)|x)|}, \]
which is continuous at each y ∈ 𝒴∗x. For any y ∈ 𝒴 \ 𝒴∗x, set f(y|x) := 0. F(y|x) is
absolutely continuous and strictly increasing in y ∈ 𝒴x. Moreover, f(y|x) is a
Radon-Nikodym derivative of F(y|x) with respect to the Lebesgue measure.
4. The quantile function F−1(u|x) partially coincides with Q(u|x); namely
\[ F^{-1}(u|x) = Q(u|x), \]
provided that Q(u|x) is increasing at u, and the equation Q(u|x) = y has a unique solution
for y = F−1(u|x).
5. The quantile function F−1(u|x) is equivariant to location and scale transformations
of Q(u|x).
6. The quantile function F−1(u|x) has an ordinary continuous derivative
\[ 1/f(F^{-1}(u|x)|x), \]
when F−1(u|x) ∈ 𝒴∗x. This function is also a Radon-Nikodym derivative with respect to
the Lebesgue measure.
7. The map (y, x) ↦ F(y|x) is continuous on 𝒴𝒳 and the map (u, x) ↦ F−1(u|x) is
continuous on 𝒰𝒳.
The following simple example illustrates some of these basic properties in a situation
where the initial population pseudo-quantile curve is highly non-monotone. Consider
the following pseudo-quantile function:
\[ Q(u) = 5\left( u + \frac{1}{\pi}\sin(2\pi u) \right). \tag{2.1} \]
The left panel of Figure 1 shows that this function is non-monotone in [0, 1]. In par-
ticular, the slope of Q(u) changes sign twice at 1/3 and 2/3. The rearranged quantile
curve F−1(u), also plotted in this panel, is continuous and monotonically increasing.
The results 1, 2, 4 and 7 of the proposition are illustrated in the right panel of Figure
1, which plots the original and rearranged distribution curves. Here we can see that
the rearranged distribution function is continuous, does not have mass points, and
coincides with the original curve for values of y where the original curve is one to one and
increasing.
Figure 2 illustrates the third and sixth results of the proposition by plotting the
sparsity function for F−1(u) and the density function of F (y). The derivative of F (y)
in the right panel is continuous at the regular values of Q(u). Similarly, the sparsity
function for F−1(u) in the left panel is continuous at the corresponding image values
(under F (y)).
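Properties 2 and 3 of Proposition 1 can be checked numerically for the example (2.1). The sketch below is an illustration under stated assumptions: the value y = 2.5, the root brackets, the bisection helper `bisect`, and the grid size are choices made here and are not taken from the paper. It evaluates the root-based formulas for F(y) and f(y) and compares them with a direct grid approximation of F(y) as the measure of the set where Q(u) ≤ y.

```python
import numpy as np

def Q(u):
    """Pseudo-quantile function (2.1)."""
    return 5.0 * (u + np.sin(2.0 * np.pi * u) / np.pi)

def Q_prime(u):
    """Derivative of (2.1) with respect to u; it vanishes at u = 1/3 and u = 2/3."""
    return 5.0 * (1.0 + 2.0 * np.cos(2.0 * np.pi * u))

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection for a root of func on [lo, hi], assuming a sign change."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# y = 2.5 is a regular value with one root on each monotone piece of Q.
y = 2.5
brackets = [(0.0, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 1.0)]
u_k = np.array([bisect(lambda v: Q(v) - y, lo, hi) for lo, hi in brackets])
slopes = Q_prime(u_k)

# Property 2: signed sum of the roots, plus 1 if Q is decreasing at the last root.
F_formula = np.sum(np.sign(slopes) * u_k) + float(slopes[-1] < 0)
# Property 3: density as the sum of reciprocal absolute slopes at the roots.
f_formula = np.sum(1.0 / np.abs(slopes))

# Direct grid approximations of F(y) and of the rearranged quantile function.
n = 1_000_000
grid = (np.arange(n) + 0.5) / n
q_grid = Q(grid)
F_inv_grid = np.sort(q_grid)

def F_grid(v):
    """Grid approximation of F(v) = int_0^1 1{Q(u) <= v} du."""
    return np.mean(q_grid <= v)
```

Because Q(1 − u) = 5 − Q(u) in this example, the three roots at y = 2.5 are symmetric around 1/2, so the signed sum in property 2 equals 1/2.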
2.2. Functional Derivatives. Next, we establish the main results of the paper on
Hadamard differentiability of F (y|x) and F−1(u|x) with respect to Q(u|x), tangentially
to the space of continuous functions on 𝒰𝒳. This differentiability property is important
for deriving the asymptotic distributions of the rearranged estimates. In particular, the
property allows us to establish generic convergence results for rearranged curves based on
any initial quantile estimator, provided the initial estimator satisfies a functional central
limit theorem. The property also implies that the bootstrap is valid for performing
inference on the rearranged estimates, provided the bootstrap is valid for the initial
estimates. This result follows from the functional delta method for the bootstrap (e.g.,
Theorem 13.9 in van der Vaart, 1998).
In what follows, ℓ∞(𝒰𝒳) denotes the set of bounded and measurable functions h :
𝒰𝒳 → R, C(𝒰𝒳) denotes the set of continuous functions h : 𝒰𝒳 → R, and
ℓ1(𝒰𝒳) denotes the set of measurable functions h : 𝒰𝒳 → R such that
\[ \int_{\mathcal U} \int_{\mathcal X} |h(u|x)|\,du\,dx < \infty, \]
where du and dx denote integration with respect to the Lebesgue measure on
𝒰 and 𝒳, respectively.
Figure 1. Left: The pseudo-quantile function Q(u) and the rearranged
quantile function F−1(u), plotted against u. Right: The pseudo-distribution function Q−1(y)
and the rearranged distribution function F(y), plotted against y.
Figure 2. Left: The density (sparsity) function 1/f(F−1(u)) of the rearranged
quantile function F−1(u), plotted against u. Right: The density function f(y) of the
rearranged distribution function F(y), plotted against y.
Proposition 2 (Hadamard Derivative of F(y|x) with respect to Q(u|x)). Define
\[ F(y|x, h_t) := \int_0^1 1\{Q(u|x) + t h_t(u|x) \le y\}\,du. \]
Under assumptions (a)-(b), as t → 0,
\[ D_{h_t}(y|x, t) = \frac{F(y|x, h_t) - F(y|x)}{t} \to D_h(y|x), \tag{2.2} \]
\[ D_h(y|x) := -\sum_{k=1}^{K(y|x)} \frac{h(u_k(y|x)|x)}{|Q'(u_k(y|x)|x)|}. \tag{2.3} \]
The convergence holds uniformly in any compact subset of 𝒴𝒳∗ := {(y, x) : y ∈ 𝒴∗x, x ∈ 𝒳},
for every |ht − h|∞ → 0, where ht ∈ ℓ∞(𝒰𝒳) and h ∈ C(𝒰𝒳).
Proposition 3 (Hadamard Derivative of F−1(u|x) with respect to Q(u|x)). Under
assumptions (a)-(b), as t → 0,
\[ \tilde D_{h_t}(u|x, t) := \frac{F^{-1}(u|x, h_t) - F^{-1}(u|x)}{t} \to \tilde D_h(u|x), \tag{2.4} \]
\[ \tilde D_h(u|x) := -\frac{1}{f(F^{-1}(u|x)|x)} \cdot D_h(F^{-1}(u|x)|x). \tag{2.5} \]
The convergence holds uniformly in any compact subset of 𝒰𝒳∗ = {(u, x) : (F−1(u|x), x) ∈ 𝒴𝒳∗},
for every |ht − h|∞ → 0, where ht ∈ ℓ∞(𝒰𝒳) and h ∈ C(𝒰𝒳).
The convergence results hold uniformly on regions that exclude the critical values of
the mapping u ↦ Q(u|x). At the critical values, Q(u|x) possibly changes from increasing
to decreasing. Moreover, in the monotonically correct case (1), the following result is
worth emphasizing:
Corollary 1 (Monotonically correct case). Suppose u ↦ Q(u|x) has Q′(u|x) > 0 for
each (u, x) ∈ 𝒰𝒳. Then 𝒴𝒳∗ = 𝒴𝒳 and 𝒰𝒳∗ = 𝒰𝒳. Therefore, the convergence in
Propositions 2 and 3 holds uniformly over the entire 𝒴𝒳 and 𝒰𝒳, respectively. Moreover,
D̃h(u|x) = h(u|x), i.e., the Hadamard derivative of the rearranged quantile with respect
to the original curve is the identity operator.
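The derivative formula (2.3) can be checked by a finite difference in a monotonically incorrect example. The sketch below reuses the pseudo-quantile function (2.1); the direction `h`, the step `t`, the evaluation point `y`, and the grid size are assumptions chosen only for illustration, and a fixed direction is used in place of a sequence of directions.

```python
import numpy as np

def Q(u):
    """Pseudo-quantile function (2.1)."""
    return 5.0 * (u + np.sin(2.0 * np.pi * u) / np.pi)

def Q_prime(u):
    """Derivative of (2.1) with respect to u."""
    return 5.0 * (1.0 + 2.0 * np.cos(2.0 * np.pi * u))

def h(u):
    """Hypothetical continuous perturbation direction (illustration only)."""
    return 1.0 + u

n = 2_000_000
u = (np.arange(n) + 0.5) / n          # midpoint grid on (0, 1)
q = Q(u)

def F(y, t=0.0):
    """Grid approximation of int_0^1 1{Q(u) + t h(u) <= y} du."""
    return np.mean(q + t * h(u) <= y)

def roots(y):
    """Roots of Q(u) = y, found from sign changes on the grid and linear interpolation."""
    g = q - y
    i = np.nonzero(np.sign(g[:-1]) != np.sign(g[1:]))[0]
    return u[i] - g[i] * (u[i + 1] - u[i]) / (g[i + 1] - g[i])

y, t = 2.5, 1e-3                      # y = 2.5 is a regular value with three roots
u_k = roots(y)
finite_difference = (F(y, t) - F(y)) / t
analytic = -np.sum(h(u_k) / np.abs(Q_prime(u_k)))   # right-hand side of (2.3)
```

The two quantities agree up to the discretization error of the grid and a remainder of order t, which is the sense in which (2.2) holds at a regular value.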
The convergence is uniform over the entire domain in the monotonically correct case.
This result raises naturally the question of whether uniform convergence can be achieved
by some operation of smoothing in the monotonically incorrect case – namely integrating
either over y (or over u). The answer is indeed yes.
The following proposition calculates the Hadamard derivative of the following
functionals obtained by integration:
\[ (y', x) \mapsto \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) F(y|x)\,dy, \qquad (u', x) \mapsto \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) F^{-1}(u|x)\,du, \]
with the restrictions on g specified below. These elementary functionals are useful
building blocks for various statistics, as briefly mentioned in the next section.
Proposition 4. The following results are true with the limits being continuous on the
specified domains:
1.
\[ \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) D_{h_t}(y|x, t)\,dy \to \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) D_h(y|x)\,dy \]
uniformly in (y′, x) ∈ 𝒴𝒳, for any g ∈ ℓ∞(𝒴𝒳) such that x ↦ g(y|x) is continuous for
a.e. y.
2.
\[ \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) \tilde D_{h_t}(u|x, t)\,du \to \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) \tilde D_h(u|x)\,du \]
uniformly in (u′, x) ∈ 𝒰𝒳, for any g ∈ ℓ1(𝒰𝒳) such that x ↦ g(u|x) is continuous for
a.e. u.
This proposition essentially is a corollary of Propositions 2 and 3. Indeed, the results
(1)-(2) follow from the fact that the pointwise convergence of Propositions 2 and 3,
coupled with the uniform integrability shown in Lemma 3 in the Appendix, permits the
interchange of limits and integrals. An alternative way of proving result (2), but not any
other result in the paper, can be based on exploiting the convexity of the functional in
(2) with respect to the underlying curve, following the approach of Mossino and Temam
(1981), and Alvino, Lions, and Trombetti (1989). Due to this limitation, we do not
pursue this approach in this paper. However, details of this approach are described
in Chernozhukov, Fernandez-Val, and Galichon (2006b) with an application to some
nonparametric estimation problems.
It is also worth emphasizing the properties of the following smoothed functionals. For
a measurable function f : R → R define the smoothing operator as
\[ Sf(y) := \int k_\delta(y - y')\, f(y')\,dy', \tag{2.6} \]
where kδ(v) = 1{|v| ≤ δ}/(2δ) and δ > 0 is a fixed bandwidth. Accordingly, the smoothed
curves SF(y|x) and SF−1(u|x) are given by
\[ SF(y|x) := \int k_\delta(y - y')\, F(y'|x)\,dy', \qquad SF^{-1}(u|x) := \int k_\delta(u - u')\, F^{-1}(u'|x)\,du'. \]
Since these curves are merely formed as differences of the elementary functionals in
Proposition 4, followed by a division by 2δ, the following corollary is immediate.
Corollary 2. We have that SDht(y|x, t) → SDh(y|x) uniformly in (y, x) ∈ 𝒴𝒳, and
SD̃ht(u|x, t) → SD̃h(u|x) uniformly in (u, x) ∈ 𝒰𝒳.
Note that smoothing accomplishes uniform convergence over the entire domain, which
is a good property to have from the perspective of data analysis.
2.3. Empirical Properties of F̂(y|x) and F̂−1(u|x). We are now ready to state the
main results for this section.
Proposition 5 (Improvement in Estimation Property Provided by Rearrangement).
Suppose that Q̂(·|·) is an estimator (not necessarily consistent) for some true quantile
curve Q0(·|·). Then, the rearranged curve F̂−1(u|x) is closer to the true curve than
Q̂(u|x) in the sense that, for each x ∈ 𝒳,
\[ \left( \int_{\mathcal U} |Q_0(u|x) - \hat F^{-1}(u|x)|^p \,du \right)^{1/p} \le \left( \int_{\mathcal U} |Q_0(u|x) - \hat Q(u|x)|^p \,du \right)^{1/p}, \quad p \in [1, \infty], \]
where the inequality is strict for p ∈ (1,∞) whenever Q̂(u|x) is decreasing on a subset
of 𝒰 of positive Lebesgue measure, while Q0(u|x) is increasing on 𝒰.
The above property is independent of the sample size and of the way the estimate of
the curve is obtained, and thus continues to hold in the population.
This proposition establishes that the rearranged quantile curves have smaller estima-
tion error than the original curves whenever the latter are not monotone. This is a very
important property that does not depend on the way the quantile model is estimated.
It also does not rely on any other specifics and is thus applicable quite generally.
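The inequality of Proposition 5 has an exact analogue on a uniform grid, where rearrangement is sorting and the integrals become averages. The sketch below illustrates it numerically; the curves `q_true` and `q_hat`, the noise level, and the helper `lp_distance` are hypothetical choices made for this illustration.

```python
import numpy as np

def lp_distance(a, b, p):
    """Grid approximation of the L^p distance on (0, 1); p = np.inf gives the sup norm."""
    d = np.abs(a - b)
    if np.isinf(p):
        return d.max()
    return np.mean(d ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
n = 2000
u = (np.arange(n) + 0.5) / n                      # midpoint grid on (0, 1)

# Hypothetical strictly increasing "true" curve and a non-monotone estimate of it.
q_true = u ** 2 + u
q_hat = q_true + 0.2 * np.sin(8.0 * np.pi * u) + 0.05 * rng.standard_normal(n)

# Rearranged estimate: sorting is the grid version of (1.1)-(1.2).
q_rearranged = np.sort(q_hat)

errors = {p: (lp_distance(q_true, q_rearranged, p), lp_distance(q_true, q_hat, p))
          for p in (1, 2, 4, np.inf)}
```

Each entry of `errors` pairs the distance of the rearranged curve with the distance of the original curve for the same p, so the first component should never exceed the second.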
The following proposition investigates the asymptotic distributions of the rearranged
curves.
Proposition 6 (Empirical Properties of (y, x) ↦ F̂(y|x) and (u, x) ↦ F̂−1(u|x)).
Suppose that Q̂(·|·) is an estimator for Q(·|·) that takes its values in the space of bounded
measurable functions defined on 𝒰𝒳, and that, in ℓ∞(𝒰𝒳),
\[ \sqrt{n}\,(\hat Q(u|x) - Q(u|x)) \Rightarrow G(u|x), \]
as a stochastic process indexed by (u, x) ∈ 𝒰𝒳, where (u, x) ↦ G(u|x) is a Gaussian
process with continuous paths. Assume also that Q(u|x) satisfies the basic conditions (a)
and (b). Then in ℓ∞(K), where K is any compact subset of 𝒴𝒳∗,
\[ \sqrt{n}\,(\hat F(y|x) - F(y|x)) \Rightarrow D_G(y|x) \]
as a stochastic process indexed by (y, x) ∈ 𝒴𝒳∗; and in ℓ∞(𝒰𝒳K), with 𝒰𝒳K = {(u, x) :
(F−1(u|x), x) ∈ K},
\[ \sqrt{n}\,(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow \tilde D_G(u|x), \]
as a stochastic process indexed by (u, x) ∈ 𝒰𝒳K.
Corollary 3 (Monotonically correct case). Suppose u ↦ Q(u|x) has Q′(u|x) > 0
for each (u, x) ∈ 𝒰𝒳. Then 𝒴𝒳∗ = 𝒴𝒳 and 𝒰𝒳∗ = 𝒰𝒳. Accordingly, the
convergence in Proposition 6 holds uniformly over the entire 𝒴𝒳 and 𝒰𝒳. Moreover,
D̃G(u|x) = G(u|x), i.e., the rearranged quantile curves have the same first order
asymptotic distribution as the original quantile curves.
Thus, in the monotonically correct case, the first order properties of the rearranged
and initial quantile estimates coincide. Hence, all the inference tools that apply to
original quantile estimates also apply to the rearranged quantile estimates. In particular,
if the bootstrap is valid for the original estimate, it is also valid for the rearranged
estimate, by the functional delta method for the bootstrap. In the empirical example
of Section 4, we exploit this useful property to construct uniform confidence bands for
the conditional quantile functions based on the rearranged quantile function estimates.
In addition to the results on quantile function estimates, Proposition 6 provides the
asymptotic properties of the distribution function estimates. The preceding remark
about the validity of bootstrap applies also to these estimates.
In the monotonically incorrect case, the large sample properties of the rearranged
quantile estimates differ from those of the initial quantile estimates. Proposition 6
enables us to perform inferences for rearranged curves in this case, including by the
bootstrap, but only after excluding certain nonregular neighborhoods (for the distribution
estimates, the neighborhood of the critical values of the map u ↦ Q(u|x), and, for
the rearranged quantile estimates, the image of the latter neighborhood under F(y|x)).
However, if we consider the following linear functionals of the rearranged quantile and
distribution estimates:
\[ (y', x) \mapsto \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) \hat F(y|x)\,dy, \qquad (u', x) \mapsto \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) \hat F^{-1}(u|x)\,du, \]
then we no longer need to exclude the nonregular neighborhoods. The following propo-
sition describes the empirical properties of these functionals in large samples.
Proposition 7 (Empirical Properties of Integrated Curves). Under the conditions of
Proposition 6, the following results are true with the limits being continuous on the
specified domains:
1.
\[ \sqrt{n} \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) (\hat F(y|x) - F(y|x))\,dy \Rightarrow \int_{\mathcal Y} 1\{y \le y'\}\, g(y|x) D_G(y|x)\,dy, \]
as a stochastic process indexed by (y′, x) ∈ 𝒴𝒳, in ℓ∞(𝒴𝒳).
2.
\[ \sqrt{n} \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) (\hat F^{-1}(u|x) - F^{-1}(u|x))\,du \Rightarrow \int_{\mathcal U} 1\{u \le u'\}\, g(u|x) \tilde D_G(u|x)\,du, \]
as a stochastic process indexed by (u′, x) ∈ 𝒰𝒳, in ℓ∞(𝒰𝒳).
The restrictions on the function g are the same as in Proposition 4.
The linear functionals defined above are useful building blocks for various statistics,
such as partial means, various moments, and Lorenz curves. For example, the conditional
Lorenz curve is
\[ L(u|x) = \left( \int_{\mathcal U} 1\{t \le u\}\, F^{-1}(t|x)\,dt \right) \Big/ \left( \int_{\mathcal U} F^{-1}(t|x)\,dt \right), \]
which is a ratio of a partial mean to the mean. Hadamard differentiability of these
statistics with respect to the underlying Q(u|x) immediately follows from the Hadamard
differentiability of the elementary functionals of Proposition 7 by means of the chain
rule. Therefore, the asymptotic distribution of these statistics can be determined from
the asymptotic distribution of the linear functionals, by the functional delta method.
In particular, the validity of the bootstrap for these functionals is preserved by the
functional delta method for the bootstrap.
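As a small illustration of how such statistics are built from the rearranged quantile curve, the sketch below approximates the Lorenz curve on a uniform grid by cumulative sums of the sorted values. The function name `lorenz_curve` and the curve `q_hat` are hypothetical, and the sketch assumes a curve with positive values so that the mean in the denominator is positive.

```python
import numpy as np

def lorenz_curve(q_values):
    """Grid approximation of L(u) = int_0^u F^{-1}(t) dt / int_0^1 F^{-1}(t) dt.

    F^{-1} is taken to be the rearranged (sorted) version of the curve in q_values,
    evaluated on a uniform grid of n points; entry j approximates L((j + 1) / n).
    """
    f_inv = np.sort(np.asarray(q_values, dtype=float))
    partial_means = np.cumsum(f_inv) / f_inv.size
    return partial_means / partial_means[-1]

# Hypothetical positive, non-monotone fitted quantile curve (illustration only).
n = 1000
u = (np.arange(n) + 0.5) / n
q_hat = np.exp(u) + 0.3 * np.sin(6.0 * np.pi * u)
L = lorenz_curve(q_hat)
```

Using the rearranged curve guarantees that the resulting Lorenz curve is convex and lies below the 45-degree line, which need not hold if a non-monotone quantile curve is integrated directly.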
We next consider the empirical properties of the smoothed curves obtained by applying
the linear smoothing operator S defined in (2.6) to F̂(y|x) and F̂−1(u|x):
\[ S\hat F(y|x) := \int k_\delta(y - y')\, \hat F(y'|x)\,dy', \qquad S\hat F^{-1}(u|x) := \int k_\delta(u - u')\, \hat F^{-1}(u'|x)\,du'. \]
The following corollary immediately follows from Corollary 2 and the functional delta
method.
Corollary 4 (Large Sample Properties of Smoothed Curves). Under the conditions of
Proposition 6, in ℓ∞(YX),
$$\sqrt{n}\,\big(S\hat F(y|x) - SF(y|x)\big) \Rightarrow S[D_G(y|x)],$$
as a stochastic process indexed by (y, x) ∈ YX, and in ℓ∞(UX),
$$\sqrt{n}\,\big(S\hat F^{-1}(u|x) - SF^{-1}(u|x)\big) \Rightarrow S[D_G(u|x)],$$
as a stochastic process indexed by (u, x) ∈ UX.
Thus, inference on the smoothed rearranged estimates can be performed without
excluding nonregular neighborhoods, which is convenient for practice. Furthermore,
validity of the bootstrap for the smoothed curves follows by the functional delta method
for the bootstrap.
3. Theory of Rearranged Distribution Curves
The rearrangement method can also be applied to rearrange cumulative distribution
curves monotonically by exchanging the roles of the quantile and probability spaces.
There are several situations where one might be faced with the problem of non-
increasing empirical distribution curves. In an option pricing context, for example,
Ait-Sahalia and Duarte (2003) use market data to estimate a risk-neutral distribution.
Estimation error may cause the resulting distribution function to be non-monotonic. In
other cases the distribution curve is obtained by some inverse transformation and local
non-monotonicity comes as an artefact of the regularization technique. In other situa-
tions the particular estimation technique may not respect monotonicity (see, e.g., Hall,
Wolff, and Yao, 1999). We present an alternative solution to this problem that uses the
rearrangement method.
Here we do not present the conditional case for notational convenience. All derivations
for conditional distributions, however, are exactly parallel to those presented in this
section. Suppose we have $y \mapsto \hat P(y)$ as a candidate empirical probability distribution
curve, which does not necessarily satisfy monotonicity, with population counterpart
P(y). Define the following quantile curve
$$\hat Q(u) = \int_0^{\infty} 1\{\hat P(y) < u\}\, dy - \int_{-\infty}^{0} 1\{\hat P(y) > u\}\, dy,$$
which is monotone. In what follows, we further assume that the support of P (y) is
Y ⊂ [0, +∞), so that the second term drops out (otherwise it can be treated analogously
to the first).
The inverse of the quantile curve is the rearranged probability curve
$$\hat F(y) = \inf\{u : \hat Q(u) \ge y\},$$
which is also monotone by construction. It should be clear at this point that the quan-
tities Q and F are exactly symmetric to F and Q in the quantile case.
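The two-step construction above (integrate an indicator to get the quantile curve, then invert it) can be sketched numerically as follows. This is a minimal illustration, assuming a support of [0, 2] discretized into equal cells; the wiggly candidate curve `P`, the grid sizes, and the function names are hypothetical. On such a grid the rearranged probability curve agrees, up to the mesh of the u-grid, with simply sorting the values of the candidate curve.

```python
import numpy as np

def quantile_from_cdf(P, dy, u):
    """Q(u) = integral over y >= 0 of 1{P(y) < u} dy, as a Riemann sum."""
    u = np.atleast_1d(u)
    return dy * np.sum(P[None, :] < u[:, None], axis=1)

def rearranged_cdf(P, dy, y, u_grid):
    """F(y) = inf{u : Q(u) >= y}, by brute force over an increasing u_grid."""
    Q = quantile_from_cdf(P, dy, u_grid)        # nondecreasing in u
    idx = np.searchsorted(Q, y, side="left")    # first grid index with Q >= y
    return u_grid[np.minimum(idx, u_grid.size - 1)]

m, dy = 200, 0.01                     # support [0, 2] split into 200 cells
y = (np.arange(m) + 0.5) * dy         # cell midpoints
# Hypothetical non-monotone candidate distribution curve.
P = np.clip(y / 2 + 0.08 * np.sin(12 * y), 0.0, 1.0)
u_grid = np.linspace(0.0, 1.0, 2001)  # probability grid with mesh 0.0005

F_star = rearranged_cdf(P, dy, y, u_grid)   # monotone by construction
F_sorted = np.sort(P)                       # shortcut on an equally spaced grid
```

The brute-force route mirrors the definitions and is useful as a check. In practice the sorting shortcut is the cheaper way to obtain the same curve on an equally spaced grid.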
The following improved approximation property is true for F : Let F0(y) be the true
distribution function, then for all p ∈ [1,∞],
$$\left( \int_{\mathbb{R}} |F_0(y) - \hat F(y)|^p\, dy \right)^{1/p} \le \left( \int_{\mathbb{R}} |F_0(y) - \hat P(y)|^p\, dy \right)^{1/p},$$
where the inequality is strict for p ∈ (1,∞) whenever the integral on the right is finite
and $y \mapsto \hat P(y)$ is decreasing on a subset of positive Lebesgue measure, while F0(y) is
strictly increasing. This property is independent of the sample size, and thus continues
to hold in the population.
In the monotonically correct case, that is when P ′ (Q(u)) > 0 for all u ∈ [0, 1], if the
empirical distribution curve P (y) satisfies
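$$\sqrt{n}\,\big(\hat P(y) - P(y)\big) \Rightarrow G(y)$$
in ℓ∞(Y), where G is a Gaussian process, then
$$\sqrt{n}\,\big(\hat Q(u) - Q(u)\big) \Rightarrow \left( \frac{1}{P'(Q(u))} \right) G(Q(u))$$
in ℓ∞([0, 1]), and, in ℓ∞(Y),
$$\sqrt{n}\,\big(\hat F(y) - F(y)\big) \Rightarrow G(y). \tag{3.1}$$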
Results paralleling those of the previous section also follow for the monotonically
incorrect case. In particular, we have
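$$\sqrt{n}\,\big(\hat Q(u) - Q(u)\big) \Rightarrow \sum_{k=1}^{K(u)} \frac{G(y_k(u))}{|P'(y_k(u))|}$$
in ℓ∞([0, 1]), and
$$\sqrt{n}\,\big(\hat F(y) - F(y)\big) \Rightarrow \left. \left( \sum_{k=1}^{K(u)} \frac{1}{|P'(y_k(u))|} \right)^{-1} \sum_{k=1}^{K(u)} \frac{G(y_k(u))}{|P'(y_k(u))|} \right|_{u = P(y)} \tag{3.2}$$
in ℓ∞(Y), where {y_k(u), for k = 1, ..., K(u)} are the roots of P(y) = u, assuming K(u)
is bounded uniformly in u.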
4. Illustrative Examples
4.1. Empirical Example. To illustrate the practical applicability of the rearrangement
method, we consider the estimation of expenditure curves. We use the original Engel
(1857) data, from 235 budget surveys of 19th century working-class Belgian households,
to estimate the relationship between food expenditure and annual household income (see
Koenker, 2005). Ernst Engel originally presented these data to support the hypothesis
that food expenditure constitutes a declining share of household income (Engel’s Law).
In Figure 3, we show a scatterplot of the Engel data on food expenditure versus household
income, along with quantile regression curves for the quantile indices {0.05, 0.10, ...,
0.95}. We see that the quantile regression lines become closer and cross at low values of
income. This crossing problem of the Engel curves is also evident in Figure 4, in which
we plot the quantile regression process of food expenditure as a function of the quantile
index. For low values of income, the quantile regression process is clearly non-monotone.
The rearrangement procedure fixes the non-monotonicity producing increasing quantile
functions. Moreover, the rearranged curves coincide with their quantile regression coun-
terparts for the middle values of income where there is no quantile-crossing problem.
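The behavior described above can be reproduced in a small sketch. The coefficients below are made up for illustration (they are not the Engel estimates), and the rearrangement is implemented by sorting the fitted values across quantile indices at a fixed income level, which is what the rearrangement reduces to on an equally spaced grid of indices.

```python
import numpy as np

u = np.linspace(0.05, 0.95, 19)     # quantile indices 0.05, 0.10, ..., 0.95

def fitted_quantiles(x):
    """Hypothetical linear-in-x fits a(u) + b(u) * x (coefficients made up)."""
    a = 50.0 - 120.0 * u + 200.0 * u**2   # intercepts, non-monotone in u
    b = 0.40 + 0.30 * u                   # slopes, increasing in u
    return a + b * x

def rearrange(q):
    """Monotone rearrangement of u -> q(u) on an equally spaced grid: sort."""
    return np.sort(q)

q_low = fitted_quantiles(200.0)     # low income: fitted lines cross
q_mid = fitted_quantiles(800.0)     # middle income: no crossing
r_low = rearrange(q_low)            # increasing in u after rearrangement
r_mid = rearrange(q_mid)            # unchanged, since q_mid is already monotone
```

Sorting only permutes the fitted values, so the rearranged curve has the same average over u as the original one while being monotone in the quantile index.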
In Figure 5, we plot simultaneous 90% confidence intervals for the conditional quantile
function of food expenditure for different values of income (at the sample median and at the
5% quantile of income). We construct the bands using both original quantile regression
curves and rearranged quantile curves based on 500 bootstrap repetitions and a grid of
quantile indices {0.10, 0.11, ..., 0.90}. We obtain the bands for the rearranged curves
assuming that the population quantile regression curves are monotonically correct, so
that the first order behavior of the rearranged curves coincides with the behavior of the
original curves. The figure shows that even for the low value of income the rearranged
bands lie within the quantile regression bands. This observation supports the
maintained assumption of the monotonically correct case. The lack of monotonicity of
the estimated quantile regression process in this case is likely to be caused by sampling
error.
We find more evidence consistent with the monotonically correct case in Figure 6, in
which we plot the simultaneous confidence bands for the smoothed quantile regression
and rearranged curves. We construct the band by bootstrapping the smoothed curves
(with bandwidth equal to .05). The bootstrap bands are valid for the smoothed rear-
ranged curves even in the monotonically incorrect case. The almost perfect overlapping
between the confidence bands points towards the monotonically correct case. Interest-
ingly, smoothing reduces the width of the confidence bands, but does not completely
monotonize the quantile regression curves.
Figure 3. The scatterplot and quantile regression fits of the Engel food expenditure data. The plot shows a scatterplot of the Engel data on food expenditure (vertical axis) vs. household income (horizontal axis) for a sample of 235 19th century working-class Belgian households. Superimposed on the plot are the {0.05, 0.10, ..., 0.95} quantile regression curves. The range displayed corresponds to values of income lower than 1500 and values of food expenditure lower than 800.

4.2. Monte Carlo. We use the following Monte Carlo experiment, matching closely the
previous empirical application, to illustrate the estimation properties of the rearranged
curves in finite samples. In particular, we consider two designs based on the location-scale
shift model: Y = Z(X)′α + (Z(X)′γ)ε, where ε is independent of X, with the true
conditional quantile function
$$Q_0(u|X) = Z(X)'\alpha + (Z(X)'\gamma)\, Q_\varepsilon(u).$$
Design 1 includes a constant and a regressor, namely Z(X) = (1, X); and design 2
has an additional nonlinear regressor, namely, Z(X) = (1, X, 1{X > a} · X), where
a = median(X). We select the parameters for designs 1 and 2 to match the Engel
empirical example, employing the estimation method of Koenker and Xiao (2002). For
design 1 we set α = (624.15, 0.55) and γ = (1, 0.0013); and for design 2 we set α =
(624.15, 0.55, −0.003) and γ = (1, 0.0017, −0.0003). For each design, we draw 1,000
Monte Carlo samples of size n = 235. To generate the values of the dependent variable,
we draw observations from a normal distribution with the same mean and variance as
the residuals ε = (Y − Z(X)′α)/(Z(X)′γ) of the Engel data set; and we fix the regressor
X in all the replications to the observations of income in the Engel data set.

Figure 4. Quantile regression processes and rearranged quantile processes for the Engel food expenditure data. Quantile regression estimates are plotted with a thick gray line, whereas the rearranged estimates are plotted in black. Panels (food expenditure against the quantile index u): A. Income = 394 (1% quantile); B. Income = 452 (5% quantile); C. Income = 884 (Median); D. Income = 2533 (99% quantile).
We use designs 1 and 2 to assess the estimation properties of the original and rearranged
quantile regressions under the correct and incorrect specification of the functional
form. Thus, in each replication, we estimate the model
$$Q(u|X) = Z(X)'\beta(u), \qquad Z(X) = (1, X).$$
This gives the correct functional form for design 1, that is, Q(u|X) ≡ Q0(u|X), and an
incorrect functional form for design 2, that is, Q(u|X) ≢ Q0(u|X) (due to the omission of
a nonlinear regressor). Accordingly, estimation error for design 1 arises entirely due to
sampling error, while the estimation error for design 2 arises due to both sampling error
and specification error. Regardless of the nature of the estimation error, Proposition 5
establishes that the rearranged quantile curves should be closer to the true conditional
quantiles than the original curves.

Figure 5. Simultaneous 90% confidence bands for quantile regression processes and rearranged quantile processes for the Engel food expenditure data. Two different values of the income regressor are considered. The bands for quantile regression are plotted in light gray, whereas the bands for rearranged quantile regression are plotted in dark gray. Panels (food expenditure against the quantile index u): A. Income = 452 (5% quantile); B. Income = 884 (Median).
Figure 6. Simultaneous 90% confidence bands for the smoothed quantile regression processes and the smoothed rearranged quantile processes for the Engel food expenditure data. The bands for the smoothed original curves are plotted in light gray, whereas the bands for the smoothed rearranged curves are plotted in dark gray. The smoothed curves are obtained using a bandwidth equal to 0.05. Panels (food expenditure against the quantile index u): A. Income = 452 (5% quantile); B. Income = 884 (Median).
In each replication, we fit a linear quantile regression curve Q(u|X) = X ′β(u) and
monotonize this curve to get F−1(u|X) using the rearrangement method. Table 1 reports
measures of the estimation error of the original and rearranged estimated conditional
quantile curves using different norms (p = 1, 2, 3, 4, and ∞), with the regressor fixed at
a value, X = x0, that corresponds to the 5% quantile of the regressor X (X = 452).
We select this value motivated by the empirical example. Each entry of the table gives
a Monte Carlo average of
$$L_p := \left( \int_{\mathcal{U}} |Q_0(u|x_0) - \tilde Q(u|x_0)|^p\, du \right)^{1/p},$$
for $\tilde Q(u|x_0) = x_0'\hat\beta(u)$ and $\tilde Q(u|x_0) = \hat F^{-1}(u|x_0)$. We evaluate the integral using a net of
indices u with mesh size 0.01.
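The error measure can be sketched in a few lines. The example below is only a stylized stand-in for the experiment described in the text: the true curve, the noise model (independent normal errors at each grid point), and the number of replications are hypothetical assumptions, and no quantile regression is actually fitted. It shows how the L_p errors of an original and a rearranged curve would be averaged over replications on a grid with mesh 0.01.

```python
import numpy as np

def lp_error(q_true, q_est, p, du):
    """(integral |q_true - q_est|^p du)^(1/p) as a Riemann sum; sup norm if p = inf."""
    diff = np.abs(q_true - q_est)
    if np.isinf(p):
        return diff.max()
    return (np.sum(diff**p) * du) ** (1.0 / p)

rng = np.random.default_rng(0)
du = 0.01
u = (np.arange(100) + 0.5) * du        # net of quantile indices, mesh 0.01
q_true = 300.0 + 40.0 * u              # hypothetical increasing true curve
norms = [1, 2, 3, 4, np.inf]
n_rep = 200                            # hypothetical number of replications

orig = np.zeros(len(norms))
rear = np.zeros(len(norms))
for _ in range(n_rep):
    # Stand-in for a non-monotone estimate: truth plus independent noise.
    q_hat = q_true + rng.normal(scale=3.0, size=u.size)
    q_star = np.sort(q_hat)            # rearranged (monotone) estimate
    for j, p in enumerate(norms):
        orig[j] += lp_error(q_true, q_hat, p, du) / n_rep
        rear[j] += lp_error(q_true, q_star, p, du) / n_rep

ratio = rear / orig                    # values below one favor rearrangement
```

Because the inequality of Proposition 5 holds replication by replication, the averaged error of the rearranged curve cannot exceed that of the original curve in this sketch, whatever the noise level chosen.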
Both in the correctly specified case and in the misspecified case, we find that the
rearranged curves estimate the true quantile curves Q0(u|X) more accurately than the
original curves, providing a 4% to 15% reduction in the estimation/approximation error,
depending on the norm.
Table 1. Estimation Error of Original and Rearranged Curves.

        Design 1: Correct Specification      Design 2: Incorrect Specification
        Original   Rearranged   Ratio        Original   Rearranged   Ratio
L1        6.79        6.61      0.96           7.33        7.02      0.95
L2        7.99        7.69      0.95           8.72        8.20      0.93
L3        8.93        8.51      0.95           9.85        9.12      0.92
L4        9.70        9.17      0.94          10.78        9.86      0.91
L∞       17.14       15.32      0.90          19.44       16.44      0.85
5. Conclusion
This paper analyzes a simple regularization procedure for estimation of conditional
quantile and distribution functions based on rearrangement operators. Starting from
a possibly non-monotone empirical curve, the procedure produces a rearranged curve
that not only satisfies the natural monotonicity requirement, but also has smaller esti-
mation error than the original curve. Asymptotic distribution theory is derived for the
rearranged curves, and the usefulness of the approach is illustrated with an empirical
example and a simulation experiment.
† Massachusetts Institute of Technology, Department of Economics and Operations
Research Center, University College London, CEMMAP, and The University of Chicago.
E-mail: [email protected]. Research support from the Castle Krob Chair, National Science
Foundation, the Sloan Foundation, and CEMMAP is gratefully acknowledged.
§ Boston University, Department of Economics. E-mail: [email protected].
‡ Harvard University, Department of Economics. E-mail: [email protected].
Research support from the Conseil General des Mines and the National Science Foun-
dation is gratefully acknowledged.
Appendix A. Proofs
A.1. Proof of Proposition 1. First, note that the distribution of Yx has no atoms,
i.e.,
Pr[Yx = y] = Pr[Q(U |x) = y] = Pr[U ∈ {u ∈ U : u is a root of Q(u|x) = y}] = 0,
since the number of roots of Q(u|x) = y is finite under (a) - (b), and U ∼ Uniform(U).
Next, by assumptions (a)-(b) the number of critical values of Q(u|x) is finite, hence
claim (1) follows.
Next, for any regular y, we can write F(y|x) as
$$\int_0^1 1\{Q(u|x) \le y\}\, du = \sum_{k=0}^{K(y|x)-1} \int_{u_k(y|x)}^{u_{k+1}(y|x)} 1\{Q(u|x) \le y\}\, du + \int_{u_{K(y|x)}(y|x)}^{1} 1\{Q(u|x) \le y\}\, du,$$
where u_0(y|x) := 0 and {u_k(y|x), for k = 1, ..., K(y|x) < ∞} are the roots of Q(u|x) = y
in increasing order. Note that the sign of Q′(u|x) alternates over consecutive u_k(y|x),
determining whether 1{Q(u|x) ≤ y} = 1 on the interval [u_{k−1}(y|x), u_k(y|x)]. Hence
the first term in the previous expression simplifies to
$$\sum_{k=0}^{K(y|x)-1} 1\{Q'(u_{k+1}(y|x)|x) \ge 0\}\, \big(u_{k+1}(y|x) - u_k(y|x)\big),$$
while the last term simplifies to $1\{Q'(u_{K(y|x)}(y|x)|x) \le 0\}\, \big(1 - u_{K(y|x)}(y|x)\big)$. An additional simplification yields the expression given in claim (2) of the
proposition.
The proof of claim (3) follows by taking the derivative of expression in claim (2), noting
that at any regular value y the number of solutions K(y|x) and sign(Q′(uk(y|x)|x)) are
locally constant; moreover,
$$u_k'(y|x) = \frac{\mathrm{sign}\big(Q'(u_k(y|x)|x)\big)}{|Q'(u_k(y|x)|x)|}.$$
Combining these facts we get the expression for the derivative given in claim (3).
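To show the absolute continuity of F(y|x) with f(y|x) being the Radon-Nikodym
derivative, it suffices to show that for each y′ ∈ Y_x, $\int_{-\infty}^{y'} f(y|x)\, dy = \int_{-\infty}^{y'} dF(y|x)$, cf.
Theorem 31.8 in Billingsley (1995). Let $V_t^x$ be the union of closed balls of radius t
centered on the critical points $\mathcal{Y}_x \setminus \mathcal{Y}_x^*$, and define $\mathcal{Y}_x^t = \mathcal{Y}_x \setminus V_t^x$. Then
$$\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}_x^t\}\, f(y|x)\, dy = \int_{-\infty}^{y'} 1\{y \in \mathcal{Y}_x^t\}\, dF(y|x).$$
Since the set of critical points $\mathcal{Y}_x \setminus \mathcal{Y}_x^*$ is finite
and has mass zero under F(y|x), $\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}_x^t\}\, dF(y|x) \uparrow \int_{-\infty}^{y'} dF(y|x)$ as t → 0.
Therefore, $\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}_x^t\}\, f(y|x)\, dy \uparrow \int_{-\infty}^{y'} f(y|x)\, dy = \int_{-\infty}^{y'} dF(y|x)$.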
Claim (4) follows by noting that at the regions where s ↦ Q(s|x) is increasing and
one-to-one, we have that $F(y|x) = \int_{Q(s|x) \le y} ds = \int_{s \le Q^{-1}(y|x)} ds = Q^{-1}(y|x)$. Inverting
the equation $u = F(F^{-1}(u|x)|x) = Q^{-1}(F^{-1}(u|x)|x)$ yields $F^{-1}(u|x) = Q(u|x)$.
Claim (5). The variable Y_x = Q(U|x) has quantile function $F^{-1}(u|x)$. The quantile
function of α + βQ(U|x) = α + βY_x, for β > 0, is therefore $\inf\{y : \Pr(\alpha + \beta Y_x \le y) \ge u\} = \alpha + \beta F^{-1}(u|x)$.
Claim (6) is immediate from claim (3).
Claim (7). The proof of continuity of F(y|x) is subsumed in step 1 of the proof of
Proposition 3 (see below). Therefore, for any sequence xt → x we have that F(y|xt) → F(y|x) uniformly in y, and F(y|x) is continuous. Let ut → u and xt → x. Since
F(y|x) = u has a unique root $y = F^{-1}(u|x)$, the root of F(y|xt) = ut, i.e., $y_t = F^{-1}(u_t|x_t)$, converges to y by a standard argument; see, e.g., van der Vaart and Wellner
(1996). □
A.2. Proof of Propositions 2-7. In the proofs that follow we will repeatedly use
Lemma 1, which establishes the equivalence of continuous convergence and uniform
convergence:
Lemma 1. Let D and D′ be complete separable metric spaces, with D compact. Suppose
f : D → D′ is continuous. Then a sequence of functions fn : D → D′ converges to f
uniformly on D if and only if for any convergent sequence xn → x in D we have that
fn(xn) → f(x).
Proof of Lemma 1: See, for example, Resnick (1987), page 2. □
Proof of Proposition 2. We have that for any δ > 0, there exists ε > 0 such that
for u ∈ B_ε(u_k(y|x)) and for small enough t ≥ 0
$$1\{Q(u|x) + t h_t(u|x) \le y\} \le 1\{Q(u|x) + t(h(u_k(y|x)|x) - \delta) \le y\},$$
for all k ∈ {1, ..., K(y|x)}; whereas for all u ∉ ∪_k B_ε(u_k(y|x)), as t → 0,
$$1\{Q(u|x) + t h_t(u|x) \le y\} = 1\{Q(u|x) \le y\}.$$
Therefore,
$$\frac{\int_0^1 1\{Q(u|x) + t h_t(u|x) \le y\}\, du - \int_0^1 1\{Q(u|x) \le y\}\, du}{t} \le \sum_{k=1}^{K(y|x)} \int_{B_\epsilon(u_k(y|x))} \frac{1\{Q(u|x) + t(h(u_k(y|x)|x) - \delta) \le y\} - 1\{Q(u|x) \le y\}}{t}\, du, \tag{A.1}$$
which by the change of variables y′ = Q(u|x) is equal to
$$\frac{1}{t} \sum_{k=1}^{K(y|x)} \int_{J_k \cap [y,\, y - t(h(u_k(y|x)|x) - \delta)]} \frac{1}{|Q'(Q^{-1}(y'|x)|x)|}\, dy',$$
where J_k is the image of B_ε(u_k(y|x)) under the map u ↦ Q(u|x). The change of variables is
possible because for ε small enough, Q(·|x) is one-to-one between B_ε(u_k(y|x)) and J_k.
Fixing ε > 0, for t → 0, we have that J_k ∩ [y, y − t(h(u_k(y|x)|x) − δ)] = [y, y − t(h(u_k(y|x)|x) − δ)], and $|Q'(Q^{-1}(y'|x)|x)| \to |Q'(u_k(y|x)|x)|$ as $Q^{-1}(y'|x) \to u_k(y|x)$.
Therefore, the right hand term in (A.1) is no greater than
$$\sum_{k=1}^{K(y|x)} \frac{-h(u_k(y|x)|x) + \delta}{|Q'(u_k(y|x)|x)|} + o(1).$$
Similarly, $\sum_{k=1}^{K(y|x)} \frac{-h(u_k(y|x)|x) - \delta}{|Q'(u_k(y|x)|x)|} + o(1)$ bounds (A.1) from below. Since δ > 0 can be
made arbitrarily small, the result follows.
To show that the result holds uniformly in (y, x) ∈ K, a compact subset of YX*, we
use Lemma 1. Take a sequence (yt, xt) in K that converges to (y, x) ∈ K; then the
preceding argument applies to this sequence, since (1) the function $(y, x) \mapsto -h(u_k(y|x)|x) / |Q'(u_k(y|x)|x)|$
is uniformly continuous on K, and (2) the function (y, x) ↦ K(y|x) is uniformly
continuous on K. To see (2), note that K excludes a neighborhood of critical points
(Y \ Y*_x, x ∈ X), and therefore can be expressed as the union of a finite number of
compact sets (K1, ..., KM) such that the function K(y|x) is constant over each of these sets,
i.e., K(y|x) = kj for some integer kj > 0, for all (y, x) ∈ Kj and j ∈ {1, ..., M}. Likewise,
(1) follows by noting that the limit expression for the derivative is continuous on each of
the sets (K1, ..., KM) by the assumed continuity of h(u|x) in both arguments, continuity
of u_k(y|x) (implied by the Implicit Function Theorem), and the assumed continuity of
Q′(u|x) in both arguments. □
Proof of Proposition 3. For a fixed x the result follows by Proposition 2, by step
1 of the proof below, and by an application of the Hadamard differentiability of the
quantile operator shown by Doss and Gill (1992). Step 2 establishes uniformity over
x ∈ X .
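Step 1. Let K be a compact subset of YX*. Let (yt, xt) be a sequence in K, convergent
to a point, say (y, x). Then, for every such sequence, $\epsilon_t := t\|h_t\|_\infty + \|Q(\cdot|x_t) - Q(\cdot|x)\|_\infty + |y_t - y| \to 0$, and
$$\begin{aligned} |F(y_t|x_t, h_t) - F(y|x)| &\le \left| \int_0^1 \big[ 1\{Q(u|x_t) + t h_t(u|x) \le y_t\} - 1\{Q(u|x) \le y\} \big]\, du \right| \\ &\le \left| \int_0^1 1\{|Q(u|x) - y| \le \epsilon_t\}\, du \right| \to 0, \end{aligned} \tag{A.2}$$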
where the last step follows from the absolute continuity of y ↦ F(y|x), the distribution
function of Q(U|x). By setting ht = 0 the above argument also verifies that F(y|x)
is continuous in (y, x). Lemma 1 implies uniform convergence of F(y|x, ht) to F(y|x),
which in turn implies by a standard argument (see, e.g., Lemma 1 in Chernozhukov and
Fernandez-Val (2005)) the uniform convergence of quantiles
$F^{-1}(u|x, h_t) \to F^{-1}(u|x)$, uniformly over K*, where K* is any compact subset of UX*.
Step 2. We have that uniformly over K*,
$$\begin{aligned} \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)|x)}{t} &= D_h(F^{-1}(u|x, h_t)|x) + o(1) \\ &= D_h(F^{-1}(u|x)|x) + o(1), \end{aligned} \tag{A.3}$$
using Step 1, Proposition 2, and the continuity properties of D_h(y|x). Further, uniformly
over K*, by Taylor expansion and Proposition 1, as t → 0,
$$\frac{F(F^{-1}(u|x, h_t)|x) - F(F^{-1}(u|x)|x)}{t} = f(F^{-1}(u|x)|x)\, \frac{F^{-1}(u|x, h_t) - F^{-1}(u|x)}{t} + o(1), \tag{A.4}$$
and (as will be shown below)
$$\frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x)|x)}{t} = o(1), \tag{A.5}$$
as t → 0. Observe that the left hand side of (A.5) equals that of (A.4) plus that of
(A.3). The result then follows.
It only remains to show that equation (A.5) holds uniformly in K*. Note that for any
right-continuous cdf F, we have that $u \le F(F^{-1}(u)) \le u + F(F^{-1}(u)) - F(F^{-1}(u)-)$,
where F(·−) denotes the left limit of F, i.e., $F(x_0-) = \lim_{x \uparrow x_0} F(x)$. For any continuous,
strictly increasing cdf F, we have that $F(F^{-1}(u)) = u$. Therefore, write
$$\begin{aligned} 0 &\le \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x)|x)}{t} \\ &\le \frac{u + F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)-|x, h_t) - u}{t} \\ &\le \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)-|x, h_t)}{t} \\ &\overset{(1)}{=} \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)|x)}{t} - \frac{F(F^{-1}(u|x, h_t)-|x, h_t) - F(F^{-1}(u|x, h_t)-|x)}{t} \\ &\overset{(2)}{=} D_h(F^{-1}(u|x, h_t)|x) - D_h(F^{-1}(u|x, h_t)-|x) + o(1) = o(1), \end{aligned}$$
as t → 0, where in (1) we use that $F(F^{-1}(u|x, h_t)|x) = F(F^{-1}(u|x, h_t)-|x)$ since F(y|x)
is continuous and strictly increasing in y, and in (2) we use Proposition 2. □
The following lemma, due to Pratt (1960), will be very useful to prove Proposition 4.
Lemma 2. Let |f_n| ≤ G_n and suppose that f_n → f and G_n → G almost everywhere.
If $\int G_n \to \int G$ finite, then $\int f_n \to \int f$.
Proof of Lemma 2. See Pratt (1960). □
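Lemma 3 (Boundedness and Integrability Properties). Under the hypotheses of
Propositions 2 and 3, we have that for all (y, x) ∈ YX:
$$|D_{h_t}(u|x, t)| \le \|h_t\|_\infty, \tag{A.6}$$
and
$$|D_{h_t}(y|x, t)| \le \Delta(y|x, t) = \int_0^1 \frac{1\{|Q(u|x) - y| \le t\|h_t\|_\infty\}}{t}\, du, \tag{A.7}$$
where for any xt → x ∈ X, as t → 0,
$$\Delta(y|x_t, t) \to 2\|h\|_\infty f(y|x) \ \text{ for a.e. } y \in \mathcal{Y} \quad \text{and} \quad \int_{\mathcal{Y}} \Delta(y|x_t, t)\, dy \to \int_{\mathcal{Y}} 2\|h\|_\infty f(y|x)\, dy.$$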
Proof of Lemma 3. To show (A.6) note that
$$\sup_{x \in \mathcal{X},\, y \in \mathcal{Y}} |D_{h_t}(y|x, t)| \le \|h_t\|_\infty \tag{A.8}$$
immediately follows from the equivariance property noted in Claim (5) of Proposition 1.
The inequality (A.7) is trivial. That for any xt → x ∈ X, Δ(y|xt, t) → 2‖h‖∞ f(y|x)
for a.e. y ∈ Y follows by applying Proposition 2 respectively with functions h′t(u|x) =
‖ht‖∞ and h′t(u|x) = −‖ht‖∞ (for the case when f(y|x) > 0; and trivially otherwise).
Similarly, that for any yt → y ∈ Y, Δ(yt|x, t) → 2‖h‖∞ f(y|x) for a.e. x ∈ X follows by
Proposition 2 (for the case when f(y|x) > 0; and trivially otherwise).
Further, by Fubini's Theorem,
$$\int_{\mathcal{Y}} \Delta(y|x_t, t)\, dy = \int_0^1 \underbrace{\left( \int_{\mathcal{Y}} \frac{1\{|Q(u|x_t) - y| \le t\|h_t\|_\infty\}}{t}\, dy \right)}_{=:\, f_t(u)} du. \tag{A.9}$$
Note that f_t(u) ≤ 2‖ht‖∞. Moreover, for almost every u, f_t(u) = 2‖ht‖∞ for small
enough t, and 2‖ht‖∞ converges to 2‖h‖∞ as t → 0. Then, trivially, $2\int_0^1 \|h_t\|_\infty\, du \to 2\|h\|_\infty$. By Lemma 2 the right hand side of (A.9) converges to 2‖h‖∞. □
A.3. Proof of Proposition 4. Define $m_t(y|x, y') := 1\{y \le y'\}\, g(y|x)\, D_{h_t}(y|x, t)$ and
$m(y|x, y') := 1\{y \le y'\}\, g(y|x)\, D_h(y|x)$. To show claim (1), we need to demonstrate that
for any y′t → y′ and xt → x
$$\int_{\mathcal{Y}} m_t(y|x_t, y'_t)\, dy \to \int_{\mathcal{Y}} m(y|x, y')\, dy, \tag{A.10}$$
and that the limit is continuous in (x, y′). We have that |m_t(y|xt, y′t)| is bounded, for
some constant C, by CΔ(y|xt, t), which converges a.e. and the integral of which converges
to a finite number by Lemma 3. Moreover, by Proposition 2, for almost every y we have
m_t(y|xt, y′t) → m(y|x, y′). We conclude that (A.10) holds by Lemma 2.
In order to check continuity, we need to show that for any y′t → y′ and xt → x
$$\int_{\mathcal{Y}} m(y|x_t, y'_t)\, dy \to \int_{\mathcal{Y}} m(y|x, y')\, dy. \tag{A.11}$$
We have that m(y|xt, y′t) → m(y|x, y′) for almost every y. Moreover, m(y|xt, y′t) is
dominated by ‖g‖∞‖h‖∞ f(y|xt), which converges to ‖g‖∞‖h‖∞ f(y|x) for almost every
y, and, moreover, $\int_{\mathcal{Y}} \|g\|_\infty \|h\|_\infty f(y|x_t)\, dy$ converges to ‖g‖∞‖h‖∞. We conclude that (A.11)
holds by Lemma 2.
To show claim (2), define $m_t(u|x, u') = 1\{u \le u'\}\, g(u|x)\, D_{h_t}(u|x)$ and $m(u|x, u') = 1\{u \le u'\}\, g(u|x)\, D_h(u|x)$. Here we need to show that for any u′t → u′ and xt → x
$$\int_{\mathcal{U}} m_t(u|x_t, u'_t)\, du \to \int_{\mathcal{U}} m(u|x, u')\, du, \tag{A.12}$$
and that the limit is continuous in (u′, x). We have that m_t(u|xt, u′t) is bounded by
g(u|xt)‖ht‖∞, which converges to g(u|x)‖h‖∞ for a.e. u. Furthermore, the integral of
g(u|xt)‖ht‖∞ converges to the integral of g(u|x)‖h‖∞ by the dominated convergence
theorem. Moreover, by Proposition 2, we have that m_t(u|xt, u′t) → m(u|x, u′) for almost
every u. We conclude that (A.12) holds by Lemma 2.
In order to check the continuity of the limit, we need to show that for any u′t → u′
and xt → x
$$\int_{\mathcal{U}} m(u|x_t, u'_t)\, du \to \int_{\mathcal{U}} m(u|x, u')\, du. \tag{A.13}$$
We have that m(u|xt, u′t) → m(u|x, u′) for almost every u. Moreover, for small enough
t, m(u|xt, u′t) is dominated by |g(u|xt)|‖h‖∞, which converges for almost every value of
u to |g(u|x)|‖h‖∞ as t → 0. Furthermore, the integral of |g(u|xt)|‖h‖∞ converges to the
integral of |g(u|x)|‖h‖∞ by the dominated convergence theorem. We conclude that (A.13)
holds by Lemma 2. □
The following lemma will be used to prove Proposition 5:
Lemma 4. Assume that Q(u) is a function mapping U := (0, 1) to K, a bounded subset
of R, and that Q0(u) is a non-decreasing function mapping U to K. Think of Q(u)
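as an approximation to Q0(u). Let $F_Q(y) = \int_{\mathcal{U}} 1\{Q(u) \le y\}\, du$ denote the distribution
function of Q(U) when U ∼ U(0, 1). Let $Q^*(u) = F_Q^{-1}(u) = \inf\{y \in \mathbb{R} : F_Q(y) \ge u\}$.
Then, for any p ∈ [1,∞],
$$\left[ \int_{\mathcal{U}} |Q_0(u) - Q^*(u)|^p\, du \right]^{1/p} \le \left[ \int_{\mathcal{U}} |Q_0(u) - Q(u)|^p\, du \right]^{1/p}.$$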
Moreover, this inequality is strict provided (1) p ∈ (1,∞), (2) Q(u) is decreasing on
a subset of U that has positive Lebesgue measure, and (3) the true function Q0(u) is
increasing on U .
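A small numerical check of the construction in Lemma 4 may help fix ideas. The sketch below, with a hypothetical target Q0(u) = u² and a hypothetical non-monotone approximation, builds Q* literally from the definition (first the distribution function F_Q, then its generalized inverse), confirms that on an equally spaced grid this coincides with sorting the values of Q, and compares the L_p errors. It is an illustration on a grid, not a proof.

```python
import numpy as np

def rearrange_via_cdf(Q_vals, u_eval):
    """Q*(u) = inf{y : F_Q(y) >= u}, where F_Q(y) is the share of Q_vals <= y."""
    candidates = np.sort(Q_vals)                  # F_Q jumps only at these points
    F_Q = np.mean(Q_vals[None, :] <= candidates[:, None], axis=1)
    idx = np.searchsorted(F_Q, u_eval, side="left")   # first y with F_Q(y) >= u
    return candidates[idx]

def lp_norm(v, p, du):
    """(integral |v|^p du)^(1/p) as a Riemann sum on an equally spaced grid."""
    return (np.sum(np.abs(v) ** p) * du) ** (1.0 / p)

m = 200
du = 1.0 / m
u = (np.arange(m) + 0.5) * du                 # midpoints of (0, 1)
Q0 = u**2                                     # increasing target (hypothetical)
Q = Q0 + 0.15 * np.sin(10 * np.pi * u)        # non-monotone approximation
Q_star = rearrange_via_cdf(Q, u)              # rearranged approximation

# Pairs (error of rearranged curve, error of original curve) for several p.
errors = {p: (lp_norm(Q0 - Q_star, p, du), lp_norm(Q0 - Q, p, du))
          for p in (1, 2, 4)}
```

Evaluating F_Q only at the sorted values of Q is enough because F_Q is a step function that jumps exactly there.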
Proof of Lemma 4. A direct proof of this lemma is given in Proposition 1 of
Chernozhukov, Fernandez-Val, Galichon (2006a). It is helpful to give a quick indirect
proof of the weak inequality contained in the lemma using the following inequality due
to Lorentz (1953): Let Q and G be two functions mapping U to K, a bounded subset
of R. Let Q∗ and G∗ denote their corresponding increasing rearrangements. Then, we
have
$$\int_{\mathcal{U}} L(Q^*(u), G^*(u))\, du \le \int_{\mathcal{U}} L(Q(u), G(u))\, du,$$
for any submodular discrepancy function $L : \mathbb{R}^2 \to \mathbb{R}_+$. In our case, G(u) = Q0(u) =
G*(u) = Q0*(u) almost everywhere. Thus, the true function is its own rearrangement.
Moreover, L(v, w) = |w − v|^p is submodular for p ∈ [1,∞). For the proof of the strict
inequality, please refer to Chernozhukov, Fernandez-Val, Galichon (2006a), Proposition
1. For p = ∞, the inequality follows by taking the limit as p → ∞. □
A.4. Proof of Proposition 5. This proposition is an immediate consequence of Lemma
4. □
A.5. Proof of Proposition 6. This proposition follows by the functional delta
method (e.g., van der Vaart, 1998). Instead of restating what this method is, it takes
less space to simply recall the proof in the current context.
To show the first part, consider the map $g_n(y, x|h) = \sqrt{n}\,\big(F(y|x, n^{-1/2}h) - F(y|x)\big)$.
The sequence of maps satisfies $g_{n'}(y, x|h_{n'}) \to D_h(y|x)$ in ℓ∞(K) for every subsequence
$h_{n'} \to h$ in ℓ∞(UX*), where h is continuous. It follows by the Extended Continuous Mapping
Theorem that, in ℓ∞(K), $g_n\big(y, x|\sqrt{n}(\hat Q(u|x) - Q(u|x))\big) \Rightarrow D_G(y|x)$ as a stochastic
process indexed by (y, x), since $\sqrt{n}\,(\hat Q(u|x) - Q(u|x)) \Rightarrow G(u|x)$ in ℓ∞(K).
Conclude similarly for the second part. □
A.6. Proof of Proposition 7. This follows by the functional delta method, similarly
to the proof of Proposition 6. □
References
[1] Ait-Sahalia, Y. and Duarte, J. (2003),“Nonparametric option pricing under shape restrictions,”
Journal of Econometrics 116, pp. 9–47.
[2] Alvino, A., Lions, P. L. and Trombetti, G. (1989), “On Optimization Problems with Prescribed
Rearrangements,” Nonlinear Analysis 13 (2), pp. 185–220.
[3] Angrist, J., Chernozhukov, V., and I. Fernandez-Val (2006): “Quantile Regression under Misspec-
ification, with an Application to the U.S. Wage Structure,” Econometrica 74, pp. 539–563.
[4] Billingsley, P. (1995), Probability and measure. Third edition. Wiley Series in Probability and
Mathematical Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York.
[5] Chernozhukov, V., and I. Fernandez-Val (2005): “Subsampling Inference on Quantile Regression
Processes. ” Sankhya 67, pp. 253–276.
[6] Chernozhukov, V., Fernandez-Val, I., and A. Galichon (2006a): “Improving Estimates of Monotone
Functions by Rearrangement,” Preprint, available at arxiv.org and ssrn.com.
[7] Chernozhukov, V., Fernandez-Val, I., and A. Galichon (2006b): “An Addendum for Quantile and
Probability Curves Without Crossing (Alternative Proof Directions and Explorations) ” Preprint.
[8] Dette, H., Neumeyer, N., and K. Pilz (2006): “A simple Nonparametric Estimator of a Strictly
Monotone Regression Function,” Bernoulli, 12, no. 3, pp 469-490.
[9] Doksum, K. (1974): “Empirical Probability Plots and Statistical Inference for Nonlinear Models
in the Two-Sample Case,” Annals of Statistics 2, pp. 267–277.
[10] Doss, Hani; Gill, Richard D. (1992), “An elementary approach to weak convergence for quan-
tile processes, with applications to censored survival data.” Journal of the American Statistical
Association 87, no. 419, 869–877.
[11] Dudley, R. M., and R. Norvaisa (1999), Differentiability of six operators on nonsmooth functions
and p-variation. With the collaboration of Jinghua Qian. Lecture Notes in Mathematics, 1703.
Springer-Verlag, Berlin.
[12] Engel, E. (1857), “Die Produktions und Konsumptionsverhaltnisse des Konigreichs Sachsen,”
Zeitschrift des Statistischen Bureaus des Koniglich Sachsischen Ministeriums des Innern, 8, pp.
1-54.
[13] Gill, R. D., and S. Johansen (1990), “A survey of product-integration with a view toward applica-
tion in survival analysis.” Annals of Statistics 18, no. 4, 1501–1555.
[14] Gutenbrunner, C., and J. Jureckova (1992): “Regression Quantile and Regression Rank Score
Process in the Linear Model and Derived Statistics,” Annals of Statistics 20, pp. 305-330.
[15] Hall, P., Wolff, R., and Yao, Q. (1999), “Methods for estimating a conditional distribution func-
tion,” Journal of the American Statistical Association 94, pp. 154–163.
[16] Hardy, G., Littlewood, J., and G. Polya (1952), Inequalities. Cambridge: Cambridge University
Press.
[17] He, X. (1997), “Quantile Curves Without Crossing,” American Statistician, 51, pp. 186–192.
[18] Koenker, R. (1994): “Confidence Intervals for Regression Quantiles,” in M.P. and M. Huskova
(eds.), Asymptotic Statistics: Proceeding of the 5th Prague Symposium on Asymptotic Statistics.
Physica-Verlag.
[19] Koenker, R. (2005), Quantile Regression. Econometric Society Monograph Series 38, Cambridge
University Press.
[20] Koenker, R., P. Ng (2005), “Inequality constrained quantile regression.” Sankhya 67, no. 2, 418–
440.
[21] Koenker, R., and Z. Xiao (2002): “Inference on the Quantile Regression Process,” Econometrica
70, no. 4, pp. 1583–1612.
[22] Lehmann, E. (1974): Nonparametrics: Statistical Methods Based on Ranks, San Francisco: Holden-
Day.
[23] Lorentz, G. G. (1953): “An Inequality for Rearrangements,” The American Mathematical Monthly
60, pp. 176–179.
[24] Milnor, J. (1965), Topology from the differential viewpoint, Princeton University Press.
[25] Mossino J. and R. Temam (1981), “Directional derivative of the increasing rearrangement mapping
and application to a queer differential equation in plasma physics,” Duke Math. J. 48 (3), 475–495.
[26] Portnoy, S. (1991), “Asymptotic behavior of regression quantiles in nonstationary, dependent
cases,” Journal of Multivariate Analysis 38 , no. 1, 100–113.
[27] Portnoy, S. , and R. Koenker (1997), “The Gaussian hare and the Laplacian tortoise: computability
of squared-error versus absolute-error estimators,” Statist. Sci. 12, no. 4, 279–300.
[28] Pratt, J.W. (1960), “On interchanging limits and integrals.” Annals of Mathematical Statistics 31,
74–77.
[29] Resnick, S. I. (1987), Extreme values, regular variation, and point processes, Applied Probability.
A Series of the Applied Probability Trust, 4. Springer-Verlag, New York.
[30] Vaart, A. van der (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic
Mathematics, 3.
[31] Vaart, A. van der, and J. Wellner (1996), Weak convergence and empirical processes: with appli-
cations to statistics, New York: Springer.
[32] Villani, C. (2003), Topics in Optimal Transportation, Providence: American Mathematical Society.
Victor Chernozhukov, Iván Fernández-Val,
Alfred Galichon
THE INSTITUTE FOR FISCAL STUDIES, DEPARTMENT OF ECONOMICS, UCL
cemmap working paper CWP09/07
IMPROVING ESTIMATES OF MONOTONE FUNCTIONS BY REARRANGEMENT
VICTOR CHERNOZHUKOV† IVAN FERNANDEZ-VAL§ ALFRED GALICHON‡
Abstract. Suppose that a target function f0 : R^d → R is monotonic, namely, weakly
increasing, and an original estimate f of the target function is available, which is not
weakly increasing. Many common estimation methods used in statistics produce such
estimates f . We show that these estimates can always be improved with no harm using
rearrangement techniques: The rearrangement methods, univariate and multivariate,
transform the original estimate to a monotonic estimate f∗, and the resulting estimate
is closer to the true curve f0 in common metrics than the original estimate f . We
illustrate the results with a computational example and an empirical example dealing
with age-height growth charts.
Key words. Monotone function, improved approximation, multivariate rearrange-
ment, univariate rearrangement, growth chart, quantile regression, mean regression,
series, locally linear, kernel methods
AMS Subject Classification. Primary 62G08; Secondary 46F10, 62F35, 62P10
Date: First version is of December, 2006. This version is of April 26, 2007. We would like to thank
Andrew Chesher, Moshe Cohen, Emily Gallagher, Raymond Guiteras, Xuming He, Roger Koenker,
Charles Manski, Costas Meghir, Ilya Molchanov, Steve Portnoy, Alp Simsek, and seminar participants
at Columbia, Cornell, BU, Georgetown, MIT, MIT-Harvard, Northwestern, UBC, UCL, and UIUC for
very useful comments that helped improve the paper.
† Massachusetts Institute of Technology, Department of Economics & Operations Research Center,
University College London, CEMMAP, and The University of Chicago. E-mail: [email protected]. Re-
search support from the Castle Krob Chair, National Science Foundation, the Sloan Foundation, and
CEMMAP is gratefully acknowledged.
§ Boston University, Department of Economics. E-mail: [email protected].
‡ Harvard University, Department of Economics. E-mail: [email protected]. Research
support from the Conseil General des Mines and the National Science Foundation is gratefully
acknowledged.
1. Introduction
A common problem in statistics is to approximate an unknown monotonic function
on the basis of available samples. For example, biometric age-height charts should be
monotonic in age; econometric demand functions should be monotonic in price; and
quantile functions should be monotonic in the probability index. Suppose an original,
possibly non-monotonic, estimate is available. Then, the rearrangement operation from
variational analysis (Hardy, Littlewood, and Polya 1952, Lorentz 1953, Villani 2003)
can be used to monotonize the original estimate. The rearrangement has been shown
to be useful in producing monotonized estimates of conditional mean functions (Dette,
Neumeyer, and Pilz 2006, Dette and Pilz 2006) and various conditional quantile and
probability functions (Chernozhukov, Fernandez-Val, and Galichon (2006a, 2006b)). In
this paper, it is shown that the rearrangement of the original estimate is useful not
only for producing monotonicity, but also has the following important property: The
rearrangement always improves over the original estimate, whenever the latter is not
monotonic. Namely, the rearranged curves are always closer (often considerably closer)
to the target curve being estimated. Furthermore, this improvement property is generic,
i.e. it does not depend on the underlying specifics of the original estimate and applies
to both univariate and multivariate cases.
The paper is organized as follows. In Section 2.1, we motivate the monotonicity
issue in regression problems, and discuss common estimates/approximations of regression
functions that are not naturally monotonic. In Section 2.2, we analyze the improvements
in estimation/approximation properties that the rearranged estimates deliver. In Section
2.3, we discuss the computation of the rearrangement, using sorting and simulation. In
Section 2.4, we extend the analysis of Section 2.2 to multivariate functions. In Section 3,
we provide proofs of the main results. In Section 4, we present an empirical application to
biometric age-height charts. We show how the rearrangement monotonizes and improves
the original estimates of the conditional mean function in this example, and quantify
the improvement in a simulation example resembling the empirical one. In the same
section, we also analyze estimation of conditional quantile processes for height given age
that need to be monotonic in both age and the quantile index. We apply a multivariate
rearrangement to doubly monotonize the estimates both in age and the quantile index.
We show that the rearrangement monotonizes and improves the original estimates, and
quantify the improvement in a simulation example mimicking the empirical example. In
Section 5 we offer a summary and a conclusion.
2. Improving Approximations of Monotonic Functions
2.1. Common Estimates of Monotonic Functions. A basic problem in many areas
of analysis is to approximate an unknown function f0 : R^d → R on the basis of
some available information. In statistics, the common problem is to approximate an
unknown regression function, such as the conditional mean or a conditional quantile, us-
ing an available sample. In numerical analysis, the common problem is to approximate
an intractable target function by a more tractable function on the basis of the target
function’s values at a collection of points.
Suppose we know that the target function f0 is monotonic, namely weakly increasing.
Suppose further that an original estimate f is available, which is not necessarily mono-
tonic. Many common estimation methods do indeed produce such estimates. Can these
estimates always be improved with no harm? The answer provided by this paper is yes:
the rearrangement method transforms the original estimate to a monotonic estimate
f ∗, and this estimate is in fact closer to the true curve f0 than the original estimate f
in common metrics. Furthermore, the rearrangement is computationally tractable, and
thus preserves the computational appeal of the original estimates.
Estimation methods, specifically the ones used in regression analysis, can be grouped
into global methods and local methods. An example of a global method is the series
estimator of f0 taking the form
\[ f(x) = P_{k_n}(x)'b, \]
where P_{k_n}(x) is a k_n-vector of suitable transformations of the variable x, such as B-splines,
polynomials, and trigonometric functions. Section 4 lists specific examples in
the context of an empirical example. The estimate b is obtained by solving the regression
problem
\[ b = \arg\min_{b \in \mathbb{R}^{k_n}} \sum_{i=1}^{n} \rho\big(Y_i - P_{k_n}(X_i)'b\big), \]
where (Yi, Xi), i = 1, ..., n, denotes the data. In particular, using the square loss ρ(u) = u^2
produces estimates of the conditional mean of Yi given Xi (Gallant 1981, Andrews
1991, Stone 1994, Newey 1997), while using the asymmetric absolute deviation loss
ρ_τ(u) = (τ − 1(u < 0))u, with quantile index τ ∈ (0, 1), produces estimates of the
conditional τ-quantile of Yi given Xi
(Koenker and Bassett 1978, Portnoy 1997, He and Shao 2000). Likewise, in numerical
analysis “data” often consist of values Yi of a target function evaluated at a collection
of mesh points {Xi, i = 1, ..., n} and the mesh points themselves. The series estimates
x ↦ f(x) = P_{k_n}(x)′b are widely used in data analysis due to their good approximation
properties and computational tractability. However, these estimates need not be naturally
monotone unless explicit constraints are added to the optimization program (for
example, Matzkin (1994), Silvapulle and Sen (2005), and Koenker and Ng (2005)).
Examples of local methods include kernel and locally polynomial estimators. A kernel
estimator takes the form
\[ f(x) = \arg\min_{b \in \mathbb{R}} \sum_{i=1}^{n} w_i\, \rho(Y_i - b), \qquad w_i = K\!\left(\frac{X_i - x}{h}\right), \]
where the loss function ρ plays the same role as above, K(u) is a standard, possibly
high-order, kernel function, and h > 0 is a vector of bandwidths (see, for example,
Wand and Jones (1995) and Ramsay and Silverman (2005)). The resulting estimate
x ↦ f(x) need not be naturally monotone. Dette, Neumeyer, and Pilz (2006) show
that the rearrangement transforms the kernel estimate into a monotonic one. We further
show here that the rearranged estimate necessarily improves upon the original estimate,
whenever the latter is not monotonic. The locally polynomial regression is a related
local method (Chaudhuri 1991, Fan and Gijbels 1996). In particular, the locally linear
estimator takes the form
\[ (f(x), d(x)) = \arg\min_{b \in \mathbb{R},\, d \in \mathbb{R}} \sum_{i=1}^{n} w_i\, \rho\big(Y_i - b - d(X_i - x)\big), \qquad w_i = K\!\left(\frac{X_i - x}{h}\right). \]
The resulting estimate x ↦ f(x) may also be non-monotonic, unless explicit constraints
are added to the optimization problem. Section 4 illustrates the non-monotonicity of
the locally linear estimate in an empirical example.
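As a minimal sketch of the locally linear estimator, assume the square loss ρ(u) = u^2 and a box kernel K(u) = 1{|u| ≤ 1}. Under these assumptions the fit at a point reduces to an ordinary least-squares line through the observations within distance h of that point. The function name `local_linear` and the data are hypothetical and serve only as an illustration.

```python
def local_linear(x0, xs, ys, h):
    """Locally linear least-squares fit at x0 with a box kernel of bandwidth h.

    Assumes square loss and K(u) = 1{|u| <= 1}; returns the intercept b,
    which is the fitted value f(x0).
    """
    # Keep only observations inside the window and centre the regressor at x0.
    pts = [(x - x0, y) for x, y in zip(xs, ys) if abs(x - x0) <= h]
    n = len(pts)
    if n == 0:
        raise ValueError("no observations within the bandwidth")
    mean_z = sum(z for z, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    szz = sum((z - mean_z) ** 2 for z, _ in pts)
    if szz == 0.0:
        return mean_y  # slope not identified: fall back to the local mean
    slope = sum((z - mean_z) * (y - mean_y) for z, y in pts) / szz
    return mean_y - slope * mean_z  # intercept of the local line at x0


# Hypothetical, exactly linear data: the local fit should reproduce the line.
xs = [j / 10 for j in range(11)]
ys = [2.0 + 3.0 * x for x in xs]
fitted = local_linear(0.45, xs, ys, h=0.2)
```

Evaluating `local_linear` on a grid of points traces out the estimate x ↦ f(x); with noisy data nothing in this computation forces the resulting curve to be monotone.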
In summary, there are many attractive estimation and approximation methods in sta-
tistics that do not necessarily produce monotonic estimates. These estimates do have
other attractive features though, such as good approximation properties and computa-
tional tractability. Below we show that the rearrangement operation applied to these
estimates produces (monotonic) estimates that improve the approximation properties of
the original estimates by bringing them closer to the target curve. Furthermore, the re-
arrangement is computationally tractable, and thus preserves the computational appeal
of the original estimates.
2.2. The Rearrangement and its Approximation Property: The Univariate
Case. In what follows, let X be a compact interval. Without loss of generality, it is
convenient to take this interval to be X = [0, 1]. Let f(x) be a measurable function
mapping X to K, a bounded subset of R. Let F_f(y) = ∫_X 1{f(u) ≤ y} du denote the
distribution function of f(X) when X follows the uniform distribution on [0, 1]. Let
\[ f^*(x) := Q_f(x) := \inf\{y \in \mathbb{R} : F_f(y) \ge x\} \]
be the quantile function of F_f(y). Thus,
\[ f^*(x) := \inf\left\{ y \in \mathbb{R} : \left[\int_{\mathcal{X}} 1\{f(u) \le y\}\,du\right] \ge x \right\}. \]
This function f ∗ is called the increasing rearrangement of the function f .
Thus, the rearrangement operator simply transforms a function f to its quantile function
f ∗. That is, x ↦ f ∗(x) is the quantile function of the random variable f(X) when
X ∼ U(0, 1). It is also convenient to think of the rearrangement as a sorting operation:
given values of the function f(x) evaluated at x in a fine enough net of equidistant
points, we simply sort the values in an increasing order. The function created in this
way is the rearrangement of f .
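The sorting interpretation can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical non-monotone function f(x) = x + 0.3 sin(6πx) and a net of B = 1000 equidistant midpoints; neither choice comes from the paper.

```python
import math


def rearrange(values):
    """Increasing rearrangement of a function given by its values on an equidistant net."""
    return sorted(values)


def f(x):
    # Hypothetical non-monotone estimate: an increasing trend plus a wiggle.
    return x + 0.3 * math.sin(6.0 * math.pi * x)


B = 1000                                   # number of net points (an assumption)
grid = [(j + 0.5) / B for j in range(B)]   # equidistant midpoints in [0, 1]
f_vals = [f(x) for x in grid]
f_star = rearrange(f_vals)                 # f_star[j] approximates f*(grid[j])

original_is_monotone = all(a <= b for a, b in zip(f_vals, f_vals[1:]))
rearranged_is_monotone = all(a <= b for a, b in zip(f_star, f_star[1:]))
```

The rearranged values are a permutation of the original ones, so the distribution of f(X) is unchanged; only the assignment of values to points of the net differs.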
The main point of this paper is the following:
Proposition 1. Let f0 : X → K be a weakly increasing measurable function in x, where
K is a bounded subset of R. This is the target function. Let f : X → K be another
measurable function, an initial estimate of the target function f0.
1. For any p ∈ [1,∞], the rearrangement of f , denoted f ∗, weakly reduces the estimation
error:
\[ \left[\int_{\mathcal{X}} \left|f^*(x) - f_0(x)\right|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}} \left|f(x) - f_0(x)\right|^p dx\right]^{1/p}. \tag{2.1} \]
2. Suppose that there exist regions X0 and X0′, each of measure greater than δ > 0, such
that for all x ∈ X0 and x′ ∈ X0′ we have that (i) x′ > x, (ii) f(x) > f(x′) + ε, and (iii)
f0(x′) > f0(x) + ε, for some ε > 0. Then the gain in the quality of approximation is
strict for p ∈ (1,∞). Namely, for any p ∈ [1,∞],
\[ \left[\int_{\mathcal{X}} \left|f^*(x) - f_0(x)\right|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}} \left|f(x) - f_0(x)\right|^p dx - \delta\eta_p\right]^{1/p}, \tag{2.2} \]
where
\[ \eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\} \]
and ηp > 0 for p ∈ (1,∞), with
the infimum taken over all v, v′, t, t′ in the set K such that v′ ≥ v + ε and t′ ≥ t + ε.
The first part of the proposition states the weak inequality (2.1), and the second part
states the strict inequality (2.2). For example, the inequality is strict for p ∈ (1,∞) if
the original estimate f(x) is decreasing on a subset of X having positive measure, while
the target function f0(x) is increasing on X (by increasing, we mean strictly increasing
throughout). Of course, if f0(x) is constant, then the inequality (2.1) becomes an equal-
ity, as the distribution of the rearranged function f ∗ is the same as the distribution of
the original function f , that is Ff∗ = Ff .
This proposition establishes that the rearranged estimate f ∗ has a smaller estimation
error in the Lp norm than the original estimate whenever the latter is not monotone.
This is a very useful and generally applicable property that is independent of the sample
size and of the way the original estimate f is obtained.
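The inequality (2.1) can be checked numerically on a discretized example. The sketch below is only an illustration: the target f0(x) = x^2, the estimate f(x) = x^2 + 0.2 sin(8πx), the net size, and the helper name `lp_error` are all assumptions made here, and the integrals are replaced by Riemann sums.

```python
import math


def lp_error(est, target, p):
    """Riemann-sum approximation of the L^p distance on an equidistant net."""
    n = len(est)
    return (sum(abs(a - b) ** p for a, b in zip(est, target)) / n) ** (1.0 / p)


B = 2000
grid = [(j + 0.5) / B for j in range(B)]
f0_vals = [x ** 2 for x in grid]                                      # increasing target
f_vals = [x ** 2 + 0.2 * math.sin(8.0 * math.pi * x) for x in grid]   # non-monotone estimate
f_star = sorted(f_vals)                                               # rearranged estimate

# For each p, store (error of the original estimate, error of the rearranged estimate).
errors = {p: (lp_error(f_vals, f0_vals, p), lp_error(f_star, f0_vals, p))
          for p in (1, 2, 4)}
```

In this example the estimate is decreasing on parts of [0, 1] where the target is strictly increasing, so for p in (1, ∞) the reduction is strict, in line with the second part of the proposition.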
An indirect proof of the weak inequality (2.1) is a simple but important consequence
of the following classical inequality due to Lorentz (1953): Let q and g be two functions
mapping X to K, a bounded subset of R. Let q∗ and g∗ denote their corresponding
increasing rearrangements. Then,
\[ \int_{\mathcal{X}} L(q^*(x), g^*(x), x)\,dx \le \int_{\mathcal{X}} L(q(x), g(x), x)\,dx, \]
for any submodular discrepancy function L : R^3 → R. Set q(x) = f(x), q∗(x) = f ∗(x),
g(x) = f0(x), and g∗(x) = f0∗(x). Now, note that in our case f0∗(x) = f0(x)
almost everywhere, that is, the target function is its own rearrangement. Moreover,
L(v, w, x) = |w − v|^p is submodular for p ∈ [1,∞). This proves the first part of the
proposition above. For p = ∞, the first part follows by taking the limit as p → ∞.
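For reference, the submodularity property used here can be spelled out as follows. This is the standard definition, stated for fixed x and under the assumption that it is the one intended in the text; it matches the inequality used in the direct proof in Section 3. The case p = 2 can be verified by direct expansion.

```latex
% Submodularity of L(v, w) for fixed x: for all v' \ge v and w' \ge w,
\[ L(v', w') + L(v, w) \le L(v, w') + L(v', w). \]
% Check for L(v, w) = |w - v|^2:
\[ (w' - v')^2 + (w - v)^2 - (w' - v)^2 - (w - v')^2 = -2\,(v' - v)(w' - w) \le 0. \]
```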
In Section 3 we provide a proof of the strong inequality (2.2) as well as the direct proof
of the weak inequality (2.1). The direct proof illustrates how reductions of the estimation
error arise from even a partial sorting of the values of the estimate f . Moreover, the
direct proof characterizes the conditions for the strict reduction of the estimation error.
The following immediate implication of the above finite-sample result is also worth
emphasizing: The rearranged estimate f ∗ inherits the Lp rates of convergence from the
original estimate f . For p ∈ [1,∞], if λn = [∫_X |f0(x) − f(x)|^p dx]^{1/p} = O_P(an) for some
sequence of constants an, then [∫_X |f0(x) − f ∗(x)|^p dx]^{1/p} ≤ λn = O_P(an).
2.3. Computation of the Rearranged Estimate. One of the following methods can
be used for computing the rearrangement. Let {Xj, j = 1, ..., B} be either (1) a net of
equidistant points in [0, 1] or (2) a sample of i.i.d. draws from the uniform distribution
on [0, 1]. Then the rearranged estimate f ∗(u) at point u ∈ X can be approximately
computed as the u-quantile of the sample {f(Xj), j = 1, ..., B}. The first method is
deterministic, and the second is stochastic. Thus, for a given number of draws B, the
complexity of computing the rearranged estimate f ∗(u) in this way is equivalent to the
complexity of computing the sample u-quantile in the sample of size B.
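Both computational methods can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function f, the default B = 10000, the seed, and the names `sample_quantile` and `rearranged_at` are assumptions, and the sample quantile is taken as the left-continuous inverse of the empirical distribution function.

```python
import math
import random


def sample_quantile(values, u):
    """Left-continuous empirical u-quantile: inf{y : F_B(y) >= u}."""
    ordered = sorted(values)
    B = len(ordered)
    k = max(1, math.ceil(u * B))   # smallest k with k / B >= u
    return ordered[k - 1]


def f(x):
    # Hypothetical non-monotone estimate, used only for illustration.
    return x + 0.3 * math.sin(6.0 * math.pi * x)


def rearranged_at(u, B=10000, stochastic=False, seed=0):
    """Approximate f*(u) from B evaluations of f."""
    if stochastic:
        rng = random.Random(seed)
        points = [rng.random() for _ in range(B)]    # method (2): i.i.d. U(0, 1) draws
    else:
        points = [(j + 0.5) / B for j in range(B)]   # method (1): equidistant net
    return sample_quantile([f(x) for x in points], u)


det_values = [rearranged_at(u) for u in (0.2, 0.5, 0.8)]
sto_median = rearranged_at(0.5, stochastic=True)
```

Each call sorts the B evaluations; when f ∗ is needed at many points u, sorting once and reading off several order statistics avoids repeated work.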
The number of evaluations B can depend on the problem. Suppose that the den-
sity function of the random variable f(X), when X ∼ U(0, 1), is bounded away from
zero over a neighborhood of f ∗(x). Then f ∗(x) can be computed with the accuracy
of O_P(1/√B), as B → ∞, where the rate follows from the results of Knight (2002).
As shown in Chernozhukov, Fernandez-Val, and Galichon (2006a), the density of f(X),
denoted F ′f (t), exists if f(x) is continuously differentiable and the number of elements
in {x ∈ X : f ′(x) = 0} is bounded; in particular,
\[ F_f'(t) = \sum_{x \in \{r \in \mathcal{X} : f(r) = t\}} \frac{1}{|f'(x)|}. \tag{2.3} \]
Thus, the density F′f(t) is bounded away from zero if f′(x) is bounded away from infinity.
Interestingly, the density has infinite poles at {t ∈ K : there is an x such that f′(x) =
0 and f(x) = t}.
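As a worked illustration of (2.3), consider the hypothetical function f(x) = (2x − 1)^2 on [0, 1], which is chosen here for convenience and is not taken from the paper. Each value t in (0, 1] has two preimages, and the formula reproduces the derivative of the distribution function computed directly.

```latex
% Assumed example: f(x) = (2x - 1)^2 on [0, 1], so that f'(x) = 4(2x - 1).
\[ F_f(t) = \int_0^1 1\{(2u - 1)^2 \le t\}\,du = \sqrt{t}, \qquad t \in [0, 1]. \]
% For t in (0, 1], the preimages are x = (1 \pm \sqrt{t})/2, each with |f'(x)| = 4\sqrt{t}:
\[ F_f'(t) = \frac{1}{4\sqrt{t}} + \frac{1}{4\sqrt{t}} = \frac{1}{2\sqrt{t}}. \]
% This agrees with differentiating \sqrt{t}; the pole sits at t = 0 = f(1/2), where f'(1/2) = 0.
% The increasing rearrangement is
\[ f^*(x) = \inf\{y : \sqrt{y} \ge x\} = x^2. \]
```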
2.4. The Rearrangement and Its Approximation Property: The Multivariate
Case. In this section, we consider multivariate functions f : X^d → K, where X^d =
[0, 1]^d and K is a bounded subset of R. The notion of monotonicity we seek to impose
on f is the following: We say that the function f is weakly increasing in x if f(x′) ≥ f(x)
whenever x′ ≥ x. The notation x′ = (x′1, ..., x′d) ≥ x = (x1, ..., xd) means that one vector
is weakly larger than the other in each of the components, that is, x′j ≥ xj for each
j = 1, ..., d. In what follows, we use the notation f(xj, x−j) to denote the dependence of
f on its j-th argument, xj, and all other arguments, x−j, that exclude xj. The notion
of monotonicity above is equivalent to the requirement that for each j in 1, ..., d the
mapping xj ↦ f(xj, x−j) is weakly increasing in xj, for each x−j in X^{d−1}.
Define the rearrangement operator Rj and the rearranged function f ∗j (x) with respect to
the j-th argument as follows:
\[ f_j^*(x) := R_j \circ f(x) := \inf\left\{ y : \left[\int_{\mathcal{X}} 1\{f(x_j', x_{-j}) \le y\}\,dx_j'\right] \ge x_j \right\}. \]
This is the one-dimensional increasing rearrangement applied to the one-dimensional function
xj ↦ f(xj, x−j), holding the other arguments x−j fixed. The rearrangement is
applied for every value of the other arguments x−j.
Let π = (π1, ..., πd) be an ordering, i.e. a permutation, of the integers 1, ..., d. Let us
define the π-rearrangement operator Rπ and the π-rearranged function f ∗π(x) as follows:
\[ f_\pi^*(x) := R_\pi \circ f(x) := R_{\pi_1} \circ \cdots \circ R_{\pi_d} \circ f(x). \]
For any ordering π, the π-rearrangement operator rearranges the function with respect
to all of its arguments. As shown below, the resulting function f ∗π(x) is weakly increasing
in x.
In general, two different orderings π and π′ of 1, ..., d can yield different rearranged
functions f ∗π(x) and f ∗π′(x). Therefore, to resolve the conflict among rearrangements
done with different orderings, we may consider averaging among them: letting Π be any
collection of distinct orderings π, we can define the average rearrangement as
\[ f^*(x) := \frac{1}{|\Pi|} \sum_{\pi \in \Pi} f_\pi^*(x), \tag{2.4} \]
where |Π| denotes the number of elements in the set of orderings Π. As shown below, the
approximation error of the average rearrangement is weakly smaller than the average of
approximation errors of individual π-rearrangements.
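On a grid, the π-rearrangements and their average can be sketched with repeated one-dimensional sorts. The example below is a minimal illustration for d = 2, assuming a hypothetical 3 x 3 array of estimate values; the helper names are not from the paper. Axis 0 indexes the first argument and axis 1 the second.

```python
from itertools import permutations


def rearrange_axis(grid, axis):
    """Sort a 2-D array (list of rows) along one axis, holding the other index fixed."""
    if axis == 1:                                  # sort within each row
        return [sorted(row) for row in grid]
    cols = [sorted(col) for col in zip(*grid)]     # sort within each column
    return [list(row) for row in zip(*cols)]


def pi_rearrange(grid, order):
    """Apply R_{pi_1} o ... o R_{pi_d}: the last entry of `order` acts first."""
    out = grid
    for axis in reversed(order):
        out = rearrange_axis(out, axis)
    return out


def average_rearrangement(grid, orders):
    """Pointwise average of the pi-rearrangements over the given orderings."""
    results = [pi_rearrange(grid, o) for o in orders]
    n_rows, n_cols = len(grid), len(grid[0])
    return [[sum(r[i][j] for r in results) / len(results) for j in range(n_cols)]
            for i in range(n_rows)]


def is_increasing(grid):
    """Check weak monotonicity in both arguments."""
    rows_ok = all(a <= b for row in grid for a, b in zip(row, row[1:]))
    cols_ok = all(a <= b for col in zip(*grid) for a, b in zip(col, col[1:]))
    return rows_ok and cols_ok


# Hypothetical estimate on a 3 x 3 grid, not monotone in either argument.
f_grid = [[3.0, 1.0, 2.0],
          [0.0, 5.0, 4.0],
          [6.0, 2.0, 1.0]]
orders = list(permutations([0, 1]))
f_avg = average_rearrangement(f_grid, orders)
```

In this example the two orderings give different rearranged arrays, which is the conflict that the average in (2.4) is meant to resolve; the average is again weakly increasing in both arguments.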
The following proposition describes the properties of multivariate π-rearrangements:
Proposition 2. Let the target function f0 : X^d → K be weakly increasing and measurable
in x. Let f : X^d → K be a measurable function that is an initial estimate of the
target function f0. Let f̄ : X^d → K be another estimate of f0, which is measurable in x,
including, for example, a rearranged f with respect to some of the arguments. Then,
1. For each ordering π of 1, ..., d, the π-rearranged estimate f ∗π(x) is weakly increasing
in x. Moreover, f ∗(x), an average of π-rearranged estimates, is weakly increasing in x.
2. (a) For any j in 1, ..., d and any p in [1,∞], the rearrangement of f̄ with respect
to the j-th argument produces a weak reduction in the approximation error:
\[ \left[\int_{\mathcal{X}^d} |\bar f_j^*(x) - f_0(x)|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |\bar f(x) - f_0(x)|^p dx\right]^{1/p}. \tag{2.5} \]
(b) Consequently, a π-rearranged estimate f ∗π(x) of f(x) weakly reduces the approximation
error of the original estimate:
\[ \left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |f(x) - f_0(x)|^p dx\right]^{1/p}. \tag{2.6} \]
3. Suppose that f̄(x) and f0(x) have the following properties: there exist subsets
Xj ⊂ X and X′j ⊂ X , each of measure δ > 0, and a subset X−j ⊆ X^{d−1}, of measure
ν > 0, such that for all x = (xj, x−j) and x′ = (x′j, x−j), with x′j ∈ X′j, xj ∈ Xj,
x−j ∈ X−j, we have that (i) x′j > xj, (ii) f̄(x) > f̄(x′) + ε, and (iii) f0(x′) > f0(x) + ε,
for some ε > 0.
(a) Then, for any p ∈ [1,∞],
\[ \left[\int_{\mathcal{X}^d} |\bar f_j^*(x) - f_0(x)|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |\bar f(x) - f_0(x)|^p dx - \eta_p\delta\nu\right]^{1/p}, \tag{2.7} \]
where
\[ \eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\}, \]
and ηp > 0 for p ∈ (1,∞), with
the infimum taken over all v, v′, t, t′ in the set K such that v′ ≥ v + ε and t′ ≥ t + ε.
(b) Further, for an ordering π = (π1, ..., πk, ..., πd) with πk = j, let f̄ be a partially
rearranged function, f̄(x) = R_{π_{k+1}} ◦ ... ◦ R_{π_d} ◦ f(x) (for k = d we set f̄(x) = f(x)). If
the function f̄(x) and the target function f0(x) satisfy the condition stated above, then,
for any p ∈ [1,∞],
\[ \left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |f(x) - f_0(x)|^p dx - \eta_p\delta\nu\right]^{1/p}. \tag{2.8} \]
4. The approximation error of an average rearrangement is weakly smaller than the
average approximation error of the individual π-rearrangements: For any p ∈ [1,∞],
\[ \left[\int_{\mathcal{X}^d} |f^*(x) - f_0(x)|^p dx\right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p dx\right]^{1/p}. \tag{2.9} \]
This proposition generalizes the results of Proposition 1 to the multivariate case,
also demonstrating several features unique to the multivariate case. We see that the
π-rearranged functions are monotonic in all of the arguments. The rearrangement along
any argument improves the approximation properties of the estimate. Moreover, the
improvement is strict when the rearrangement with respect to a j-th argument is per-
formed on an estimate that is decreasing in the j-th argument, while the target function
is increasing in the same j-th argument, in the sense precisely defined in the proposition.
Moreover, averaging different π-rearrangements is better (on average) than using a single
π-rearrangement chosen at random. All other basic implications of the proposition are
similar to those discussed for the univariate case.
3. Proofs of Propositions
3.1. Proof of Proposition 1. The first part establishes the weak inequality, following
in part the strategy in Lorentz’s (1953) proof. The proof focuses directly on obtaining
the result stated in the proposition. The second part establishes the strong inequality.
Proof of Part 1. We assume at first that the functions f(·) and f0(·) are simple
functions, constant on intervals ((s − 1)/r, s/r], s = 1, ..., r. For any simple f(·) with r
steps, let f denote the r-vector with the s-th element, denoted fs, equal to the value of
f(·) on the s-th interval. Let us define the sorting operator S(f) as follows: Let ℓ be an
integer in 1, ..., r such that fℓ > fm for some m > ℓ. If ℓ does not exist, set S(f) = f . If
ℓ exists, set S(f) to be an r-vector with the ℓ-th element equal to fm, the m-th element
equal to fℓ, and all other elements equal to the corresponding elements of f . For any
submodular function L : R^2 → R+, by fℓ ≥ fm, f0m ≥ f0ℓ and the definition of
submodularity,
\[ L(f_m, f_{0\ell}) + L(f_\ell, f_{0m}) \le L(f_\ell, f_{0\ell}) + L(f_m, f_{0m}). \]
Therefore, we conclude that
\[ \int_{\mathcal{X}} L(S(f)(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx, \tag{3.1} \]
using that we integrate simple functions.
Applying the sorting operation a sufficient finite number of times to f , we obtain a
completely sorted, that is, rearranged, vector f ∗. Thus, we can express f ∗ as a finite
composition f ∗ = S ◦ ... ◦ S(f) . By repeating the argument above, each composition
weakly reduces the approximation error. Therefore,
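\[ \int_{\mathcal{X}} L(f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L\big(\underbrace{S \circ \cdots \circ S}_{\text{finite times}}(f)(x), f_0(x)\big)\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx. \tag{3.2} \]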
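The role of the sorting operator S can be illustrated on a small step function. The sketch below assumes the discrepancy L(v, w) = |w − v|^2 and hypothetical vectors for the estimate and the target; the names `swap_once` and `loss` are illustrative only. Each application of S exchanges one out-of-order pair, and the loss is recorded along the way.

```python
def swap_once(f):
    """One application of the sorting operator S: exchange one out-of-order pair, if any."""
    r = len(f)
    for ell in range(r):
        for m in range(ell + 1, r):
            if f[ell] > f[m]:
                g = list(f)
                g[ell], g[m] = g[m], g[ell]
                return g
    return list(f)  # no such pair exists: S(f) = f


def loss(f, f0, p=2):
    """Integral of |f - f0|^p for step functions on r equal intervals."""
    return sum(abs(a - b) ** p for a, b in zip(f, f0)) / len(f)


f0 = [0.0, 1.0, 2.0, 3.0, 4.0]   # weakly increasing target (hypothetical)
f = [2.0, 0.5, 4.5, 1.0, 3.0]    # non-monotone estimate (hypothetical)

# Apply S until nothing changes; every swap removes at least one inversion,
# so the loop stops after finitely many steps at the fully sorted vector.
path = [f]
while True:
    nxt = swap_once(path[-1])
    if nxt == path[-1]:
        break
    path.append(nxt)
losses = [loss(v, f0) for v in path]
```

The recorded losses are weakly decreasing along the path, which is the content of (3.1) applied repeatedly, and the final vector is the rearrangement of f.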
Furthermore, this inequality is extended to general measurable functions f(·) and f0(·)
mapping X to K by taking a sequence of bounded simple functions f^(r)(·) and f0^(r)(·)
converging to f(·) and f0(·) almost everywhere as r → ∞. The almost everywhere
convergence of f^(r)(·) to f(·) implies the almost everywhere convergence of its quantile
function f ∗(r)(·) to the quantile function of the limit, f ∗(·). Since inequality (3.2) holds
along the sequence, the dominated convergence theorem implies that (3.2) also holds for
the general case. □
Proof of Part 2. Let us first consider the case of simple functions, as defined in Part
1. We take the functions to satisfy the following hypotheses: there exist regions X0 and
X0′, each of measure greater than δ > 0, such that for all x ∈ X0 and x′ ∈ X0′, we have
that (i) x′ > x, (ii) f(x) > f(x′) + ε, and (iii) f0(x′) > f0(x) + ε, for ε > 0 specified in
the proposition. For any strictly submodular function L : R^2 → R+ we have that
\[ \eta = \inf\{L(v', t) + L(v, t') - L(v, t) - L(v', t')\} > 0, \]
where the infimum is taken over all v, v′, t, t′ in the set K such that v′ ≥ v + ε and
t′ ≥ t + ε.
We can begin sorting by exchanging an element f(x), x ∈ X0, of r-vector f with an
element f(x′), x′ ∈ X0′, of r-vector f . This induces a sorting gain of at least η times 1/r.
The total mass of points that can be sorted in this way is at least δ. We then proceed to
sort all of these points in this way, and then continue with the sorting of other points.
After the sorting is completed, the total gain from sorting is at least δη. That is,
\[ \int_{\mathcal{X}} L(f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx - \delta\eta. \]
We then extend this inequality to the general measurable functions exactly as in the
proof of part one. □
3.2. Proof of Proposition 2. The proof consists of the following four parts.
Proof of Part 1. We prove the claim by induction. The claim is true for d = 1 by
f ∗(x) being a quantile function. We then consider any d ≥ 2. Suppose the claim is
true in d − 1 dimensions. If so, then the estimate f̄(xj, x−j), obtained from the original
estimate f(x) after applying the rearrangement to all arguments x−j of x, except for the
argument xj, must be weakly increasing in x−j for each xj. Thus, for any x′−j ≥ x−j,
we have that
\[ \bar f(X_j, x_{-j}') \ge \bar f(X_j, x_{-j}) \quad \text{for } X_j \sim U(0, 1). \tag{3.3} \]
Therefore, the random variable on the left of (3.3) dominates the random variable on
the right of (3.3) in the stochastic sense. Therefore, the quantile function of the random
variable on the left dominates the quantile function of the random variable on the right,
namely
\[ \bar f_j^*(x_j, x_{-j}') \ge \bar f_j^*(x_j, x_{-j}) \quad \text{for each } x_j \in \mathcal{X} = (0, 1). \tag{3.4} \]
Moreover, for each x−j, the function xj ↦ f̄ ∗j (xj, x−j) is weakly increasing by virtue of
being a quantile function. We conclude therefore that x ↦ f̄ ∗j (x) is weakly increasing
in all of its arguments at all points x ∈ X^d. The claim of Part 1 of the Proposition now
follows by induction. □
Proof of Part 2 (a). By Proposition 1, we have that for each x−j,
\[ \int_{\mathcal{X}} \left|\bar f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j \le \int_{\mathcal{X}} \left|\bar f(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j. \tag{3.5} \]
Now, the claim follows by integrating with respect to x−j and taking the p-th root of
both sides. For p = ∞, the claim follows by taking the limit as p → ∞. □
Proof of Part 2 (b). We first apply the inequality of Part 2(a) to f̄(x) = f(x), then
to f̄(x) = R_{π_d} ◦ f(x), then to f̄(x) = R_{π_{d−1}} ◦ R_{π_d} ◦ f(x), and so on. In doing so,
we recursively generate a sequence of weak inequalities that imply the inequality (2.6)
stated in the Proposition. □
Proof of Part 3 (a). For each x−j ∈ X^{d−1} \ X−j, by Part 2(a), we have the weak
inequality (3.5), and for each x−j ∈ X−j, by the inequality for the univariate case stated
in Proposition 1 Part 2, we have the strong inequality
\[ \int_{\mathcal{X}} \left|\bar f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j \le \int_{\mathcal{X}} \left|\bar f(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j - \eta_p\delta, \tag{3.6} \]
where ηp is defined in the same way as in Proposition 1. Integrating the weak inequality
(3.5) over x−j ∈ X^{d−1} \ X−j, of measure 1 − ν, and the strong inequality (3.6) over X−j,
of measure ν, we obtain
\[ \int_{\mathcal{X}^d} \left|\bar f_j^*(x) - f_0(x)\right|^p dx \le \int_{\mathcal{X}^d} \left|\bar f(x) - f_0(x)\right|^p dx - \eta_p\delta\nu. \tag{3.7} \]
The claim now follows. □
Proof of Part 3 (b). As in Part 2(b), we can recursively obtain a sequence of weak
inequalities describing the improvements in approximation error from rearranging
sequentially with respect to the individual arguments. Moreover, at least one of the
inequalities can be strengthened to be of the form stated in (3.7), from the assumption
of the claim. The resulting system of inequalities yields the inequality (2.8), stated in
the proposition. □
Proof of Part 4. We can write
\[ \left[\int_{\mathcal{X}^d} \left|f^*(x) - f_0(x)\right|^p dx\right]^{1/p} = \left[\int_{\mathcal{X}^d} \left|\frac{1}{|\Pi|} \sum_{\pi \in \Pi} \big(f_\pi^*(x) - f_0(x)\big)\right|^p dx\right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[\int_{\mathcal{X}^d} \left|f_\pi^*(x) - f_0(x)\right|^p dx\right]^{1/p}, \tag{3.8} \]
where the last inequality follows by pulling out 1/|Π| and then applying the triangle
inequality for the Lp norm. □
4. Illustrations
In this section we provide an empirical application to biometric age-height charts.
We show how the rearrangement monotonizes and improves various nonparametric esti-
mates, and then we quantify the improvement in a simulation example that mimics the
empirical application.
4.1. An Empirical Illustration with Age-Height Reference Charts. Since their
introduction by Quetelet in the 19th century, reference growth charts have become common
tools to assess an individual’s health status. These charts describe the evolution
of individual anthropometric measures, such as height, weight, and body mass index,
across different ages. See Cole (1988) for a classical work on the subject and Wei, Pere,
Koenker, and He (2006) for a recent analysis from a quantile regression perspective and
additional references.
To illustrate the properties of the rearrangement method we consider the estimation
of growth charts for height. It is clear that height should naturally follow an increasing
relationship with age. Our data consist of repeated cross sectional measurements of
height and age from the 2003-2004 National Health and Nutrition Survey collected by
the National Center for Health Statistics. Height is measured as standing height in
centimeters, and age is recorded in months and expressed in years. To avoid confounding
factors that might affect the relationship between age and height, we restrict the sample
to US-born white males age two through twenty. Our final sample consists of 533 subjects
almost evenly distributed across these ages.
Let Y and X denote height and age, respectively. Let E[Y |X = x] denote the conditional
expectation of Y given X = x, and QY (u|X = x) denote the u-th quantile of Y
given X = x, where u is the quantile index. The population functions of interest are
(1) the conditional expectation function (CEF), (2) the conditional quantile functions
(CQF) for several quantile indices (5%, median, and 95%), and (3) the entire conditional
quantile process (CQP) for height given age. In the first case, the target function
x ↦ f0(x) is x ↦ E[Y |X = x]; in the second case, the target function x ↦ f0(x) is
x ↦ QY (u|X = x), for u = 5%, 50%, and 95%; and, in the third case, the target function
(u, x) ↦ f0(u, x) is (u, x) ↦ QY (u|X = x). The natural monotonicity requirements
for the target functions are the following: The CEF x ↦ E[Y |X = x] and the CQF
x ↦ QY (u|X = x) should be increasing in age x, and the CQP (u, x) ↦ QY (u|X = x)
should be increasing in both age x and the quantile index u.
We estimate the target functions using non-parametric ordinary least squares or quan-
tile regression techniques and then rearrange the estimates to satisfy the monotonicity
requirements. We consider (a) kernel, (b) local linear, (c) spline, (d) global polynomial,
(e) Fourier, and (f) flexible Fourier methods. For the kernel method, we provide a fit
on a cell-by-cell basis, with each cell corresponding to one month. For the local linear
method, we choose a bandwidth of one year and a box kernel. For the spline method,
we use cubic B-splines with a knot sequence {3, 5, 8, 10, 11.5, 13, 14.5, 16, 18}, following
Wei, Pere, Koenker, and He (2006). For the global polynomial method, we fit a quartic
polynomial. For the Fourier method, we employ eight trigonometric terms, with four
sines and four cosines. For the flexible Fourier method, we use a quadratic polynomial
and four trigonometric terms, with two sines and two cosines. Finally, for the estima-
tion of the conditional quantile process, we use a net of two hundred quantile indices
{0.005, 0.010, ..., 0.995}. In the choice of the parameters for the different methods, we
select values that either have been used in the previous empirical work or give rise to
specifications with similar complexities for the different methods.
The panels A-F of Figure 1 show the original and rearranged estimates of the con-
ditional expectation function for the different methods. All the estimated curves have
trouble capturing the slowdown in the growth of height after age sixteen and yield non-
monotonic curves for the highest values of age. The Fourier series have a special difficulty
approximating the aperiodic age-height relationship. The rearranged estimates correct
the non-monotonicity of the original estimates, providing weakly increasing curves that
coincide with the original estimates in the parts where the latter are monotonic. More-
over, the rearranged estimates necessarily improve upon the original estimates, since,
by the theoretical results derived earlier, they are closer to the true functions than the
original estimates. We quantify this improvement in the next subsection.
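As an illustration of the univariate rearrangement discussed above, the following minimal sketch sorts the fitted values of a curve evaluated on an equally spaced grid. The grid, the fitted values, and the function name `rearrange` are hypothetical and chosen only for illustration; they are not taken from the growth chart data.

```python
import numpy as np

def rearrange(values):
    """Increasing rearrangement of a curve evaluated on an equally spaced grid.

    Sorting the fitted values is the discrete analogue of taking the quantile
    function of f(X) when X is uniformly distributed over the grid points.
    """
    return np.sort(np.asarray(values, dtype=float))

# Hypothetical non-monotone fit: growth flattens and then dips at the highest ages.
age = np.linspace(2.0, 20.0, 10)
fit = np.array([88.0, 101.0, 113.0, 124.0, 136.0, 149.0, 163.0, 175.0, 178.0, 174.0])
fit_rearranged = rearrange(fit)

# The rearranged curve is weakly increasing and uses exactly the same values.
print(np.column_stack([age, fit, fit_rearranged]))
```

In this example the first seven grid points are untouched, since the original fit is already increasing there and its later values are all larger; only the values near the dip are reordered.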
Figure 2 displays similar but more pronounced non-monotonicity patterns for the
estimates of the conditional quantile functions. The rearrangement again performs well
in delivering curves that improve upon the original estimates and that satisfy the natural
monotonicity requirement.
Figures 3-8 illustrate the multivariate rearrangement of the conditional quantile process
(CQP) along both the age and the quantile index arguments. We plot in three
dimensions the original estimate, its age rearrangement, its quantile rearrangement, and
its average multivariate rearrangement (the average of the age-quantile and quantile-age
rearrangements). We also plot the corresponding contour surfaces. (Here, we do not
show the multivariate age-quantile and quantile-age rearrangements separately, because
they are very similar to the average multivariate rearrangement.) We see from the con-
tour plots that, for all of the estimation methods considered, the estimated CQP is
non-monotone in age and non-monotone in the quantile index at extremal values of this
index. The contour plots for the estimates based on the Fourier series best illustrate the
non-monotonicity problem. We see that the average multivariate rearrangement fixes the
non-monotonicity problem, and delivers an estimate of the CQP that is monotone in
both the age and the quantile index arguments. Furthermore, by the theoretical results
of the paper, the multivariate rearranged estimates necessarily improve upon the original
estimates.
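The average multivariate rearrangement described above can be sketched on a regular grid as follows. This is a minimal illustration under the assumption that the estimated surface is stored as a matrix with rows indexed by the quantile index and columns indexed by age; the function name, the grid sizes, and the simulated surface are hypothetical.

```python
import numpy as np

def average_multivariate_rearrangement(surface):
    """Average of the two sequential rearrangements of a surface on a regular grid.

    surface[i, j] is the estimate at quantile index u_i and age x_j.
    Each sequential rearrangement sorts along one argument and then the other;
    sorting along the second argument preserves the order along the first.
    """
    s = np.asarray(surface, dtype=float)
    u_then_x = np.sort(np.sort(s, axis=0), axis=1)  # quantile index first, then age
    x_then_u = np.sort(np.sort(s, axis=1), axis=0)  # age first, then quantile index
    return 0.5 * (u_then_x + x_then_u)

# Hypothetical noisy surface that is increasing in both arguments up to noise.
rng = np.random.default_rng(0)
u = np.linspace(0.05, 0.95, 19)
x = np.linspace(2.0, 20.0, 37)
noisy = 100.0 + 10.0 * u[:, None] + 4.0 * x[None, :] + rng.normal(0.0, 5.0, (19, 37))
smooth = average_multivariate_rearrangement(noisy)

print(np.all(np.diff(smooth, axis=0) >= 0), np.all(np.diff(smooth, axis=1) >= 0))
```

Averaging the two orderings removes the dependence on which argument is rearranged first, and the average of two surfaces that are monotone in both arguments is again monotone in both.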
4.2. Monte-Carlo Illustration. The following Monte Carlo experiment quantifies the
improvement in the estimation/approximation properties of the rearranged estimates
relative to the original estimates. The experiment closely matches the empirical appli-
cation presented above.
Specifically, we consider the design where the outcome variable Y equals a location
function plus a disturbance ε, Y = Z(X)′β+ε, and the disturbance is independent of the
regressor X. The vector Z(X) includes a constant and a piecewise linear transformation
of the regressor X with three changes of slope, namely Z(X) = (1, X, 1{X > 5} · (X − 5), 1{X > 10} · (X − 10), 1{X > 15} · (X − 15)). This design implies the conditional
expectation function
E[Y |X] = Z(X)′β, (4.1)
and the conditional quantile function
QY (u|X) = Z(X)′β + Qε(u). (4.2)
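The step from the location model to (4.1) and (4.2) can be spelled out as follows. This is a short sketch that assumes, in addition to independence of ε and X, that the disturbance has mean zero and a strictly increasing distribution function F_ε; these two conditions are assumptions made here for the derivation.

```latex
\begin{align*}
E[Y \mid X] &= Z(X)'\beta + E[\varepsilon \mid X]
             = Z(X)'\beta + E[\varepsilon] = Z(X)'\beta, \\
P(Y \le y \mid X) &= P\bigl(\varepsilon \le y - Z(X)'\beta \mid X\bigr)
             = F_{\varepsilon}\bigl(y - Z(X)'\beta\bigr), \\
Q_Y(u \mid X) &= Z(X)'\beta + F_{\varepsilon}^{-1}(u)
             = Z(X)'\beta + Q_{\varepsilon}(u).
\end{align*}
```

The last line follows by solving F_ε(y − Z(X)′β) = u for y.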
We select the parameters of the design to match the empirical example of growth charts
in the previous subsection. Thus, we set the parameter β equal to the ordinary least
squares estimate obtained in the growth chart data, namely (71.25, 8.13, −2.72, 1.78,
−6.43). This parameter value and the location specification (4.2) imply a model for CEF
and CQP that is monotone in age over the range of 2-20. To generate the values of the
dependent variable, we draw disturbances from a normal distribution with the mean and
variance equal to the mean and variance of the estimated residuals, ε = Y − Z(X)′β,
in the growth chart data. We fix the regressor X in all of the replications to be the
observed values of age in the growth chart data set. In each replication, we estimate the
CEF and CQP using the nonparametric methods described in the previous section.
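The simulation design can be sketched in code as follows. The coefficient vector is the one reported in the text; the age grid and the disturbance standard deviation `sigma` are hypothetical stand-ins, since the observed ages and the residual variance from the growth chart data are not reproduced here. The disturbance is drawn with mean zero, which is also an assumption of this sketch.

```python
import numpy as np

# Coefficients reported in the text (ordinary least squares estimate).
BETA = np.array([71.25, 8.13, -2.72, 1.78, -6.43])

def design(x):
    """Piecewise linear basis Z(X) with changes of slope at ages 5, 10 and 15."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([
        np.ones_like(x),
        x,
        np.where(x > 5, x - 5, 0.0),
        np.where(x > 10, x - 10, 0.0),
        np.where(x > 15, x - 15, 0.0),
    ])

def cef(x):
    """Conditional expectation function Z(X)'beta of equation (4.1)."""
    return design(x) @ BETA

def simulate(x, sigma, rng):
    """One replication: location function plus an independent normal disturbance."""
    return cef(x) + rng.normal(0.0, sigma, size=np.shape(x))

# Hypothetical inputs: an age grid on [2, 20] and sigma = 5 (not from the data).
ages = np.linspace(2.0, 20.0, 217)
rng = np.random.default_rng(0)
y = simulate(ages, sigma=5.0, rng=rng)

# The implied CEF is increasing over the age range 2-20.
print(np.all(np.diff(cef(ages)) > 0))
```

The slopes implied by these coefficients are 8.13, 5.41, 7.19 and 0.76 on the four age segments, all positive, which is why the CEF is monotone in age.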
In Table 1 we report the average Lp errors (for p = 1, 2, 3, 4 and ∞) for the original
estimates and the rearranged estimates of the CEF. We also report the relative efficiency
of the two estimates, measured as the ratio of the average error of the rearranged estimate
to the average error of the original estimate. We calculate the average Lp error as the
Monte Carlo average of
\[
L^p := \left[ \int_{\mathcal{X}} |f(x) - f_0(x)|^p \, dx \right]^{1/p},
\]
where the target function f0(x) is the CEF E[Y |X = x], and the estimate f(x) denotes
either the original nonparametric estimate of the CEF or its increasing rearrangement.
For all of the methods considered, we find that the rearranged curves estimate the true
CEF more accurately than the original curves, providing a 2% to 84% reduction in the
average error, depending on the method and the norm (i.e. values of p).
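The error measure and the relative efficiency ratio can be approximated on a grid as in the following minimal sketch. It assumes an equally spaced grid and replaces the integral by a Riemann sum; the target function, the non-monotone estimate, and the function name `lp_error` are hypothetical and are not the estimates used in Table 1.

```python
import numpy as np

def lp_error(estimate, target, p, length=1.0):
    """Riemann-sum approximation of the Lp distance on an equally spaced grid.

    `length` is the length of the interval covered by the grid; for p = inf
    the maximum absolute deviation over the grid is returned.
    """
    gap = np.abs(np.asarray(estimate, dtype=float) - np.asarray(target, dtype=float))
    if np.isinf(p):
        return float(gap.max())
    return float((length * np.mean(gap ** p)) ** (1.0 / p))

# Hypothetical increasing target and non-monotone estimate on the age range 2-20.
x = np.linspace(2.0, 20.0, 181)
target = 90.0 + 5.0 * x
original = target + 6.0 * np.sin(1.5 * x)
rearranged = np.sort(original)  # increasing rearrangement on the grid

for p in (1, 2, 3, 4, np.inf):
    err_original = lp_error(original, target, p, length=18.0)
    err_rearranged = lp_error(rearranged, target, p, length=18.0)
    # The ratio plays the role of the relative efficiency column in the tables.
    print(p, round(err_original, 3), round(err_rearranged, 3),
          round(err_rearranged / err_original, 3))
```

With equal weights on the grid points and an increasing target, the sorted estimate can never have a larger error than the original one, so each printed ratio is at most one.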
In Table 2 we report the average Lp errors for the original estimates of the conditional
quantile process and their multivariate rearrangement with respect to the age and quan-
tile index arguments. We also report the ratio of the average error of the rearranged
estimate to the average error of the original estimate. The average Lp error is the Monte
Carlo average of
\[
L^p := \left[ \int_{\mathcal{U}} \int_{\mathcal{X}} |f(u, x) - f_0(u, x)|^p \, dx \, du \right]^{1/p},
\]
where the target function f0(u, x) is the conditional quantile process QY (u|X = x), and
the estimate f(u, x) denotes either the original nonparametric estimate of the conditional
quantile process or its multivariate rearrangement. We present the results for the av-
erage multivariate rearrangement only. The age-quantile and quantile-age multivariate
rearrangements give errors that are very similar to their average multivariate rearrange-
ment, and we therefore do not report them separately. For all the methods considered,
we find that the multivariate rearranged curves estimate the true CQP more accurately
than the original curves, providing a 4% to 74% reduction in the approximation error,
depending on the method and the norm.
In Table 3 we report the average Lp error for the univariate rearrangements of the
conditional quantile function along either the age argument or the quantile index ar-
gument. We also report the ratio of the average error for these rearrangements to the
average error of the original estimates. For all of the methods considered, we find that
these rearranged curves estimate the true CQP more accurately than the original curves,
providing noticeable reductions in the estimation error. Moreover, in this case the re-
arrangement along the age argument is more effective in reducing the estimation error
than the rearrangement along the quantile index. Furthermore, by comparing Tables 2
and 3, we also see that the multivariate rearrangement provides an improvement over
the individual univariate rearrangements, yielding estimates of the CQP that are often
much closer to the true process.
5. Conclusion
Suppose that a target function is known to be weakly increasing, and we have an
original estimate of this function, which is not weakly increasing. Common estimation
methods can produce estimates with this defect. We show that these estimates
can always be improved using rearrangement techniques. The univariate and multivari-
ate rearrangement methods transform the original estimate to a monotonic estimate.
The resulting monotonic estimate is in fact closer to the target function in common
metrics than the original estimate. We illustrate these theoretical results with a com-
putational example and an empirical example, dealing with estimation of conditional
mean and quantile functions of height given age. The rearrangement both monotonizes
and improves the original non-monotone estimates. It would be interesting to determine
whether this improved estimation/approximation property carries over to other methods
of monotonization. We leave this extension for future research.
References
Andrews, D. W. K. (1991): “Asymptotic normality of series estimators for nonparametric and semi-
parametric regression models,” Econometrica, 59(2), 307–345.
Chaudhuri, P. (1991): “Nonparametric estimates of regression quantiles and their local Bahadur
representation,” Ann. Statist., 19, 760–777.
Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2006a): “Quantile and Probability
Curves without Crossing,” MIT Working Paper, www.arxiv.org and www.ssrn.com.
Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2006b): “Rearranging Edgeworth-Cornish-Fisher
Expansions,” MIT Working Paper, coming to www.arxiv.org and www.ssrn.com.
Cole, T. J. (1988): “Fitting smoothed centile curves to reference data,” J Royal Stat Soc, 151, 385–418.
Dette, H., N. Neumeyer, and K. F. Pilz (2006): “A simple nonparametric estimator of a strictly
monotone regression function,” Bernoulli, 12(3), 469–490.
Dette, H., and K. F. Pilz (2006): “A comparative study of monotone nonparametric kernel esti-
mates,” J. Stat. Comput. Simul., 76(1), 41–56.
Fan, J., and I. Gijbels (1996): Local polynomial modelling and its applications, vol. 66 of Monographs
on Statistics and Applied Probability. Chapman & Hall, London.
Gallant, A. R. (1981): “On the bias in flexible functional forms and an essentially unbiased form:
the Fourier flexible form,” J. Econometrics, 15(2), 211–245.
Hardy, G. H., J. E. Littlewood, and G. Polya (1952): Inequalities. Cambridge University Press,
2d ed.
He, X., and Q.-M. Shao (2000): “On parameters of increasing dimensions,” J. Multivariate Anal.,
73(1), 120–135.
Knight, K. (2002): “What are the limiting distributions of quantile estimators?,” in Statistical data
analysis based on the L1-norm and related methods (Neuchatel, 2002), Stat. Ind. Technol., pp. 47–65.
Birkhauser, Basel.
Koenker, R., and G. S. Bassett (1978): “Regression quantiles,” Econometrica, 46, 33–50.
Koenker, R., and P. Ng (2005): “Inequality constrained quantile regression,” Sankhya, 67(2), 418–
440.
Lorentz, G. G. (1953): “An inequality for rearrangements,” Amer. Math. Monthly, 60, 176–179.
Matzkin, R. L. (1994): “Restrictions of economic theory in nonparametric methods,” in Handbook of
econometrics, Vol. IV, vol. 2 of Handbooks in Econom., pp. 2523–2558. North-Holland, Amsterdam.
Newey, W. K. (1997): “Convergence rates and asymptotic normality for series estimators,” J. Econo-
metrics, 79(1), 147–168.
Portnoy, S. (1997): “Local asymptotics for quantile smoothing splines,” Ann. Statist., 25(1), 414–434.
Ramsay, J. O., and B. W. Silverman (2005): Functional data analysis. Springer, New York, second
edn.
Silvapulle, M. J., and P. K. Sen (2005): Constrained statistical inference, Wiley Series in Proba-
bility and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.
Stone, C. J. (1994): “The use of polynomial splines and their tensor products in multivariate function
estimation,” Ann. Statist., 22(1), 118–184, With discussion by Andreas Buja and Trevor Hastie and a
rejoinder by the author.
Villani, C. (2003): Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics.
American Mathematical Society, Providence, RI.
Wand, M. P., and M. C. Jones (1995): Kernel smoothing, vol. 60 of Monographs on Statistics and
Applied Probability. Chapman and Hall Ltd., London.
Wei, Y., A. Pere, R. Koenker, and X. He (2006): “Quantile regression methods for reference
growth charts,” Stat. Med., 25(8), 1369–1382.
Table 1. Lp Estimation/Approximation Error of Original and Rear-
ranged Estimates of the Conditional Expectation Function, for p =
1, 2, 3, 4, and ∞. Univariate Rearrangement.
p LpO LpR LpR/LpO LpO LpR LpR/LpO
A. Kernel B. Local Polynomial
1 3.69 1.33 0.36 0.79 0.76 0.96
2 4.80 1.84 0.38 1.00 0.96 0.96
3 5.81 2.46 0.42 1.17 1.13 0.96
4 6.72 3.12 0.46 1.33 1.28 0.96
∞ 16.8 9.84 0.58 2.96 2.81 0.95
C. Splines D. Quartic
1 0.87 0.81 0.93 1.33 1.19 0.89
2 1.10 1.02 0.93 1.64 1.46 0.89
3 1.31 1.22 0.93 1.89 1.68 0.89
4 1.52 1.39 0.92 2.10 1.87 0.89
∞ 3.72 3.19 0.86 4.38 3.79 0.87
E. Fourier F. Flexible Fourier
1 6.57 3.21 0.49 0.73 0.72 0.97
2 10.7 3.79 0.35 0.91 0.89 0.97
3 15.2 4.24 0.28 1.06 1.04 0.98
4 19.0 4.59 0.24 1.18 1.16 0.98
∞ 48.9 7.79 0.16 2.44 2.40 0.98
Notes: The table is based on 10,000 replications.
LpO is the Lp error of the original estimate.
LpR is the Lp error of the rearranged estimate.
Table 2. Lp Estimation/Approximation Error of Original and Rear-
ranged Estimates of the Conditional Quantile Process, for p = 1, 2, 3, 4,
and ∞. Average Multivariate Rearrangement.
p LpO LpRR LpRR/LpO LpO LpRR LpRR/LpO
A. Kernel B. Local Polynomial
1 5.35 3.13 0.58 1.21 1.09 0.91
2 6.97 4.37 0.63 1.61 1.46 0.91
3 8.40 5.49 0.65 2.03 1.84 0.91
4 9.72 6.49 0.67 2.48 2.24 0.91
∞ 34.3 26.4 0.77 12.3 10.4 0.84
C. Splines D. Quartic
1 1.33 1.20 0.90 1.49 1.35 0.90
2 1.78 1.60 0.90 1.87 1.69 0.90
3 2.30 2.03 0.88 2.23 1.99 0.89
4 2.92 2.50 0.86 2.62 2.29 0.87
∞ 16.9 12.1 0.72 12.6 8.61 0.68
E. Fourier F. Flexible Fourier
1 6.72 4.18 0.62 1.05 1.00 0.96
2 13.7 5.35 0.39 1.38 1.31 0.95
3 20.8 6.36 0.31 1.72 1.63 0.95
4 26.7 7.25 0.27 2.12 1.98 0.94
∞ 84.9 21.9 0.26 10.9 9.13 0.84
Notes: The table is based on 1,000 replications.
LpO is the Lp error of the original estimate.
LpRR is the Lp error of the average multivariate rearranged estimate.
Table 3. Lp Estimation/Approximation Error of Rearranged Estimates
of the Conditional Quantile Process, for p = 1, 2, 3, 4,∞. Univariate Re-
arrangements.
p LpRu LpRx LpRu/LpO LpRx/LpO LpRu LpRx LpRu/LpO LpRx/LpO
A. Kernel B. Local Polynomial
1 5.35 3.13 1.00 0.58 1.20 1.10 1.00 0.91
2 6.97 4.37 1.00 0.63 1.60 1.47 1.00 0.91
3 8.40 5.49 1.00 0.65 2.01 1.85 0.99 0.91
4 9.72 6.49 1.00 0.67 2.45 2.26 0.99 0.91
∞ 34.3 26.4 1.00 0.77 11.8 10.8 0.96 0.88
C. Splines D. Quartic
1 1.31 1.21 0.99 0.91 1.49 1.35 1.00 0.91
2 1.75 1.63 0.98 0.91 1.87 1.69 1.00 0.90
3 2.24 2.08 0.97 0.90 2.22 2.00 0.99 0.90
4 2.80 2.59 0.96 0.89 2.60 2.30 0.99 0.88
∞ 14.4 13.9 0.85 0.82 11.9 9.11 0.95 0.72
E. Fourier F. Flexible Fourier
1 6.71 4.19 1.00 0.62 1.04 1.01 0.99 0.96
2 13.7 5.36 1.00 0.39 1.36 1.32 0.99 0.96
3 20.8 6.37 1.00 0.31 1.70 1.65 0.99 0.96
4 26.7 7.26 1.00 0.27 2.08 2.02 0.98 0.95
∞ 84.9 22.2 1.00 0.26 10.0 9.86 0.92 0.91
Notes. The table is based on 1,000 replications.
LpO is the Lp error of the original estimate.
LpRx is the Lp error of the estimate rearranged in age x.
LpRu is the Lp error of the estimate rearranged in the quantile index u.
[Figure 1 plots. Panels: A. CEF (Kernel); B. CEF (Local Pol.); C. CEF (Splines); D. CEF (Quartic); E. CEF (Fourier); F. CEF (Flexible Fourier). Horizontal axis: Age (5 to 20). Vertical axis: Height (80 to 180). Series: Original, Rearranged.]
Figure 1. Nonparametric estimates of the Conditional Expectation
Function (CEF) of height given age and their increasing rearrangements.
Nonparametric estimates are obtained using kernel regression (A), locally
linear regression (B), cubic B-spline series (C), a fourth-degree polynomial
(D), Fourier series (E), and flexible Fourier series (F).
[Figure 2 plots. Panels: A. CQF: 5%, 50%, 95% (Kernel); B. CQF: 5%, 50%, 95% (Local Pol.); C. CQF: 5%, 50%, 95% (Splines); D. CQF: 5%, 50%, 95% (Quartic); E. CQF: 5%, 50%, 95% (Fourier); F. CQF: 5%, 50%, 95% (Flexible Fourier). Horizontal axis: Age (5 to 20). Vertical axis: Height (80 to 180). Series: Original, Rearranged.]
Figure 2. Nonparametric estimates of the 5%, 50%, and 95% Condi-
tional Quantile Functions (CQF) of height given age and their increasing
rearrangements. Nonparametric estimates are obtained using kernel regression
(A), locally linear regression (B), cubic B-spline series (C), a
fourth-degree polynomial (D), Fourier series (E), and flexible Fourier series
(F).
[Figure 3 plots. Panels: A. CQP (Kernel); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 3. Kernel estimates of the Conditional Quantile Process (CQP)
of height given age and their increasing rearrangements. Panels C and
E plot the one dimensional increasing rearrangement along the age and
quantile dimension respectively; panel G shows the average multivariate
rearrangement.
[Figure 4 plots. Panels: A. CQP (Local Pol.); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 4. Locally linear estimates of the Conditional Quantile Process
(CQP) of height given age and their increasing rearrangements. Panels C
and E plot the one dimensional increasing rearrangement along the age and
quantile dimension respectively; panel G shows the average multivariate
rearrangement.
[Figure 5 plots. Panels: A. CQP (Splines); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 5. Cubic B-splines series estimates of the Conditional Quantile
Process (CQP) of height given age and their increasing rearrangements.
Panels C and E plot the one dimensional increasing rearrangement along
the age and quantile dimension respectively; panel G shows the average
multivariate rearrangement.
[Figure 6 plots. Panels: A. CQP (Quartic); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 6. Quartic polynomial series estimates of the Conditional Quan-
tile Process (CQP) of height given age and their increasing rearrange-
ments. Panels C and E plot the one dimensional increasing rearrange-
ment along the age and quantile dimension respectively; panel G shows
the average multivariate rearrangement.
[Figure 7 plots. Panels: A. CQP (Fourier); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 7. Fourier series estimates of the Conditional Quantile Process
(CQP) of height given age and their increasing rearrangements. Panels C
and E plot the one dimensional increasing rearrangement along the age and
quantile dimension respectively; panel G shows the average multivariate
rearrangement.
[Figure 8 plots. Panels: A. CQP (Flexible Fourier); B. CQP: Contour; C. CQP: Age Rearrangement; D. CQP: Contour (R−Age); E. CQP: Quantile Rearrangement; F. CQP: Contour (R−Quantile); G. CQP: Average Quantile/Age Rearrangement; H. CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]
Figure 8. Flexible Fourier form series estimates of the Conditional
Quantile Process (CQP) of height given age and their increasing rearrange-
ments. Panels C and E plot the one dimensional increasing rearrangement
along the age and quantile dimension respectively; panel G shows the av-
erage multivariate rearrangement.