IDENTIFICATION AND INFERENCE IN NONLINEAR MODELS USING TWO SAMPLES WITH NONCLASSICAL MEASUREMENT ERRORS∗
By Xiaohong Chen† and Yingyao Hu
Yale University and Johns Hopkins University
This paper considers identification and inference of a general nonlinear Errors-in-Variables (EIV) model using two samples. Both samples consist of a dependent variable, some error-free covariates, and an error-ridden covariate, in which the measurement error has unknown distribution and could be arbitrarily correlated with the latent true values; and neither sample contains an accurate measurement of the corresponding true variable. We assume that the latent model of interest – the conditional distribution of the dependent variable given the latent true covariate and the error-free covariates – is the same in both samples, but the distributions of the latent true covariates vary with observed error-free discrete covariates. We first show that the general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without requiring the existence of instrumental variables or independence between the two samples. When the two samples are independent and the latent nonlinear model is parameterized, we propose sieve Quasi Maximum Likelihood Estimation (Q-MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification. We also provide a sieve likelihood ratio model selection test to compare two possibly misspecified parametric latent models. A small Monte Carlo simulation is presented.
1. Introduction. Measurement error problems are frequently en-
countered by researchers conducting empirical studies in social and natural
sciences. A measurement error is called classical if it is independent of the
latent true values; otherwise, it is called nonclassical. There have been many
∗The authors would like to thank R. Blundell, R. Carroll, P. Cross, S. Donald, B. Fitzenberger, E. Mammen, L. Nesheim, M. Stinchcombe, C. Taber, and conference participants at the 2006 North American Summer Meeting of the Econometric Society and the 2006 Southern Economic Association annual meeting for valuable suggestions.
†Chen acknowledges partial support from the National Science Foundation.
AMS 2000 subject classifications: Primary 62H12, 62D05; secondary 62G08.
Keywords and phrases: data combination, nonlinear errors-in-variables model, nonclassical measurement error, nonparametric identification, misspecified parametric latent model, sieve likelihood estimation and inference.
studies on identification and estimation of linear, nonlinear, and even non-
parametric models with classical measurement errors (see, e.g., Fuller (1987),
Cheng and Van Ness (1999), Wansbeek and Meijer (2000), Carroll, Ruppert,
Stefanski and Crainiceanu (2006) for detailed reviews). However, numerous
validation studies in economic survey data sets indicate that the errors in
self-reported variables, such as earnings, are typically correlated with the
true values, and hence, are nonclassical (see, e.g., Bound, Brown, and Math-
iowetz (2001)). In fact, in many survey situations, a rational agent has an in-
centive to purposely report wrong values conditioning on his/her truth. This
motivates many recent studies on Errors-In-Variables (EIV) problems allow-
ing for nonclassical measurement errors. In this paper, we provide one so-
lution to the nonparametric identification of a general nonlinear EIV model
by combining two samples, where both samples contain mismeasured co-
variates and neither contains an accurate measurement of the latent true
variable. Our identification strategy does not require the existence of instru-
mental variables or repeated measurements, and both samples could have
nonclassical measurement errors and the two samples could be arbitrarily
correlated.
It is well known that, without additional parametric restrictions or sample
information, a general nonlinear model cannot be identified in the presence
of mismeasured covariates. There are currently three broad approaches to
regaining identification of nonlinear EIV models. The first one is to impose
parametric restrictions on measurement error distributions (see, e.g., Hsiao
(1989), Fan (1991), Murphy and Van der Vaart (1996), Wang, Lin, Gutier-
rez and Carroll (1998), Liang, Hardle and Carroll (1999), Taupin (2001),
Hong and Tamer (2003), and others). The second approach is to assume
the existence of Instrumental Variables (IVs), such as a repeated measure-
ment of the mismeasured covariates, that do not enter the latent model of
interest but do contain information to recover features of latent true vari-
ables (see, e.g., Amemiya and Fuller (1988), Carroll and Stefanski (1990),
Hausman, Ichimura, Newey, and Powell (1991), Wang and Hsiao (1995),
Buzas and Stefanski (1996), Li and Vuong (1998), Newey (2001), Li (2002),
Wang (2004), Schennach (2004), Carroll, Ruppert, Crainiceanu, Tosteson
and Karagas (2004), Lewbel (2007), Hu (2006) and Hu and Schennach
(2006), to name only a few). The third approach to identifying nonlinear
EIV models with nonclassical errors is to combine two samples (see, e.g.,
Hausman, Ichimura, Newey, and Powell (1991), Carroll and Wand (1991),
Lee and Sepanski (1995), Chen, Hong, and Tamer (2005), Chen, Hong and
Tarozzi (2007), Hu and Ridder (2006), and Ichimura and Martinez-Sanchis
(2006), to name only a few).
The approach of combining samples has the advantages of allowing for
arbitrary measurement errors in the primary sample, without the need of
finding IVs or imposing parametric assumptions on measurement error dis-
tributions. However, all the currently published papers using this approach
require that the auxiliary sample contain an accurate measurement of the
true value; such a sample might be difficult to find in some applications. See
Carroll, Ruppert, and Stefanski (1995) and Ridder and Moffitt (2006) for a
detailed survey of this approach.
In this paper, we provide nonparametric identification of a general non-
linear EIV model with measurement errors in covariates by combining a
primary sample and an auxiliary sample, in which each sample contains
only one measurement of the error-ridden explanatory variable, and the er-
rors in both samples may be nonclassical. Our approach differs from the IV
approach in that we do not require an IV excluded from the latent model of
interest, and all the variables in our samples may be included in the model.
Our approach is closer to the existing two-sample approach, since we also
require an auxiliary sample and allow for nonclassical measurement errors
in both samples. However, our identification strategy differs crucially from
the existing two-sample approach in that neither of our samples contains an
accurate measurement of the latent true variable.
We assume that both samples consist of a dependent variable (Y ), some
error-free covariates (W ), and an error-ridden covariate (X), in which the
measurement error has unknown distribution and could be arbitrarily cor-
related with the latent true values (X∗); and neither sample contains an
accurate measurement of the corresponding true variable. We assume that
the latent model of interest, fY |X∗,W , the conditional distribution of the
dependent variable given the latent true covariate and the error-free covari-
ates, is the same in both samples, but the marginal distributions of the latent
true variables differ across some contrasting subsamples. These contrasting
subsamples of the primary and the auxiliary samples may be different geo-
graphic areas, age groups, or other observed demographic characteristics. We
use the difference between the distributions of the latent true values in the
contrasting subsamples of both samples to show that the measurement error
distributions are identified. To be specific, we may identify the relationship
between the measurement error distribution in the auxiliary sample and the
ratio of the marginal distributions of latent true values in the subsamples. In
fact, the ratio of the marginal distributions plays the role of an eigenvalue of
an observed linear operator, while the measurement error distribution in the
auxiliary sample is the corresponding eigenfunction. Therefore, the measure-
ment error distribution may be identified through a diagonal decomposition
of an observed linear operator under the normalization condition that the
measurement error distribution in the auxiliary sample has zero mode (or
zero median or mean). The latent nonlinear model of interest, fY |X∗,W , may
then be nonparametrically identified. In this paper, we first illustrate our
identification strategy using a nonlinear EIV model with nonclassical errors
in discrete covariates of two samples. We then focus on nonparametric iden-
tification of a general latent nonlinear model with arbitrary measurement
errors in continuous covariates.
Our identification result allows for fully nonparametric EIV models and
also allows for two correlated samples. But, in most empirical applications,
the latent models of interest are parametric nonlinear models, and the two
samples are regarded as independent. Within this framework, we propose a
sieve Quasi-Maximum Likelihood Estimation (Q-MLE) for the latent non-
linear model of interest using two samples with nonclassical measurement
errors. Under possible misspecification of the latent parametric model, we
establish root-n consistency and asymptotic normality of the sieve Q-MLE of
the finite dimensional parameter of interest, as well as its semiparametric ef-
ficiency under correct specification. In addition, we provide a sieve likelihood
ratio model selection test to compare two possibly misspecified parametric
nonlinear EIV models with nonclassical errors.
In this paper, for any two possibly vector-valued random variables A and
B, we let fA|B denote the conditional density of A given B, fA denote the
density of A. We assume the existence of two samples. The primary sam-
ple is a random sample from (X,W, Y ), in which X is a mismeasured X∗;
and the auxiliary sample is a random sample from (Xa,Wa, Ya), in which
Xa is a mismeasured X∗a . These two samples could be correlated and could
have different joint distributions. The rest of the paper is organized as fol-
lows. Section 2 establishes the nonparametric identification of the latent
probability model of interest, fY |X∗,W , using two samples with (possibly)
nonclassical errors. Section 3 presents the two-sample sieve Q-MLE and the
sieve likelihood ratio model selection test under possibly misspecified para-
metric latent models. Section 4 provides a Monte Carlo study and Section 5
briefly concludes. The Appendix contains the proofs of the main theorems.
2. Nonparametric Identification.
2.1. The dichotomous case: an illustration. We first illustrate our iden-
tification strategy by describing a special case in which all the variables
(X∗,X,W, Y ) are 0-1 dichotomous. For example, suppose that we are inter-
ested in the effect of the latent true college education level X∗ on the em-
ployment status Y with the marital status W as a covariate, i.e., fY |X∗,W .
Instead of X∗ we observe self-reported college education level X.
In this illustration subsection, we use italic letters to highlight all the
assumptions imposed for the nonparametric identification of fY |X∗,W , while
detailed discussions of the assumptions are postponed to subsection 2.2.
First, we assume that the measurement error in X is independent of all other
variables in the model conditional on the true value X∗, i.e., fX|X∗,W,Y =
fX|X∗ . In our simple example, this assumption implies that all the people
with the same latent true education level have the same pattern of misre-
porting, although the true education level could depend on other individual
characteristics. Under this assumption, the probability distribution of the
observables equals
$$f_{X,W,Y}(x,w,y) = \sum_{x^*=0,1} f_{X|X^*}(x|x^*)\, f_{X^*,W,Y}(x^*,w,y) \quad \text{for all } x, w, y. \tag{2.1}$$
We define the matrix representations of fX|X∗ as follows:
$$L_{X|X^*} = \begin{pmatrix} f_{X|X^*}(0|0) & f_{X|X^*}(0|1) \\ f_{X|X^*}(1|0) & f_{X|X^*}(1|1) \end{pmatrix}.$$
Notice that the matrix LX|X∗ contains the same information as the condi-
tional density fX|X∗ . Equation (2.1) becomes for all w, y
$$\begin{pmatrix} f_{X,W,Y}(0,w,y) \\ f_{X,W,Y}(1,w,y) \end{pmatrix} = L_{X|X^*} \times \begin{pmatrix} f_{X^*,W,Y}(0,w,y) \\ f_{X^*,W,Y}(1,w,y) \end{pmatrix}. \tag{2.2}$$
Equation (2.2) implies that the density fX∗,W,Y is identified provided
that LX|X∗ is identifiable and invertible.
Equation (2.1) is equivalent to, for the subsamples of the married (W = 1)
and of the unmarried (W = 0)
$$f_{X,Y|W=j}(x,y) = \sum_{x^*=0,1} f_{X|X^*}(x|x^*)\, f_{Y|X^*,W=j}(y|x^*)\, f_{X^*|W=j}(x^*), \tag{2.3}$$
in which fX,Y |W=j(x, y) ≡ fX,Y |W (x, y|j) and j = 0, 1. By counting the
numbers of knowns and unknowns in equation (2.3), one can see that the
unknown densities fX|X∗ , fY |X∗,W=j and fX∗|W=j cannot be identified using
the primary sample alone.
In the auxiliary sample, we assume that the measurement error in Xa
satisfies the same conditional independence assumption as that in X, i.e.,
fXa|X∗a ,Wa,Ya = fXa|X∗a . Furthermore, we link the two samples by a stable
assumption that the effect of interest, i.e., the distribution of the employment
status conditional on the true education level and the marital status, is the
same in the two samples, i.e., fYa|X∗a ,Wa(y|x∗, w) = fY |X∗,W (y|x∗, w) for all
y, x∗, w. Therefore, we have for the subsamples of the married (Wa = 1) and
of the unmarried (Wa = 0):
$$f_{X_a,Y_a|W_a=j}(x,y) = \sum_{x^*=0,1} f_{X_a|X_a^*}(x|x^*)\, f_{Y|X^*,W=j}(y|x^*)\, f_{X_a^*|W_a=j}(x^*). \tag{2.4}$$
Since all the variables are 0-1 dichotomous and probabilities sum to one, equations (2.3) and (2.4) involve 12 distinct known probability values of fX,Y|W=j and fXa,Ya|Wa=j, and 12 distinct unknown values of fX|X∗, fY|X∗,W=j, fX∗|W=j, fXa|X∗a, and fX∗a|Wa=j, which makes exact identification (a unique solution) of the 12 distinct unknown values possible. However, because equations (2.3) and (2.4) are nonlinear in the unknown values, we need additional restrictions to ensure the existence of a unique solution.
Denote $\mathcal{W}_j = \{j\}$ for $j = 0, 1$. Define the matrix representations of relevant densities for the subsamples of the married ($\mathcal{W}_1$) and of the unmarried ($\mathcal{W}_0$) in the primary sample as follows: for $j = 0, 1$,

$$L_{X,Y|W_j} = \begin{pmatrix} f_{X,Y|W_j}(0,0) & f_{X,Y|W_j}(0,1) \\ f_{X,Y|W_j}(1,0) & f_{X,Y|W_j}(1,1) \end{pmatrix},$$

$$L_{Y|X^*,W_j} = \begin{pmatrix} f_{Y|X^*,W_j}(0|0) & f_{Y|X^*,W_j}(0|1) \\ f_{Y|X^*,W_j}(1|0) & f_{Y|X^*,W_j}(1|1) \end{pmatrix}^T,$$

$$L_{X^*|W_j} = \begin{pmatrix} f_{X^*|W_j}(0) & 0 \\ 0 & f_{X^*|W_j}(1) \end{pmatrix},$$
where the superscript $T$ stands for the transpose of a matrix. Let $\mathcal{W}_{aj} = \{j\}$ for $j = 0, 1$. We similarly define the matrix representations $L_{X_a,Y_a|W_{aj}}$, $L_{X_a|X_a^*}$, and $L_{X_a^*|W_{aj}}$ of the corresponding densities $f_{X_a,Y_a|W_{aj}}$, $f_{X_a|X_a^*}$, and $f_{X_a^*|W_{aj}}$ in the auxiliary sample. To simplify notation, in the following we use $\mathcal{W}_j$ instead of $\mathcal{W}_{aj}$ in the auxiliary sample.
Using the matrix notation, equation (2.3) becomes, for $j = 0, 1$,

$$\begin{aligned}
L_{X,Y|W_j} &= \begin{pmatrix} f_{X,Y|W_j}(0,0) & f_{X,Y|W_j}(0,1) \\ f_{X,Y|W_j}(1,0) & f_{X,Y|W_j}(1,1) \end{pmatrix} \\
&= \begin{pmatrix} f_{X|X^*}(0|0) & f_{X|X^*}(0|1) \\ f_{X|X^*}(1|0) & f_{X|X^*}(1|1) \end{pmatrix}\begin{pmatrix} f_{Y,X^*|W_j}(0,0) & f_{Y,X^*|W_j}(1,0) \\ f_{Y,X^*|W_j}(0,1) & f_{Y,X^*|W_j}(1,1) \end{pmatrix} \\
&= L_{X|X^*}\begin{pmatrix} f_{X^*|W_j}(0) & 0 \\ 0 & f_{X^*|W_j}(1) \end{pmatrix}\begin{pmatrix} f_{Y|X^*,W_j}(0|0) & f_{Y|X^*,W_j}(0|1) \\ f_{Y|X^*,W_j}(1|0) & f_{Y|X^*,W_j}(1|1) \end{pmatrix}^T,
\end{aligned}$$

that is,

$$L_{X,Y|W_j} = L_{X|X^*} L_{X^*,Y|W_j} = L_{X|X^*} L_{X^*|W_j} L_{Y|X^*,W_j}. \tag{2.5}$$
Similarly, equation (2.4) becomes

$$L_{X_a,Y_a|W_j} = L_{X_a|X_a^*} L_{X_a^*,Y_a|W_j} = L_{X_a|X_a^*} L_{X_a^*|W_j} L_{Y|X^*,W_j}. \tag{2.6}$$
We assume that the observable matrices $L_{X,Y|W_j}$ and $L_{X_a,Y_a|W_j}$ are invertible, that the diagonal matrices $L_{X^*|W_j}$ and $L_{X_a^*|W_j}$ are invertible, and that $L_{X_a|X_a^*}$ is invertible. Then equations (2.6) and (2.5) imply that $L_{Y|X^*,W_j}$ and $L_{X|X^*}$ are invertible. We can then eliminate $L_{Y|X^*,W_j}$ to obtain, for $j = 0, 1$,

$$L_{X_a,Y_a|W_j} L_{X,Y|W_j}^{-1} = L_{X_a|X_a^*} L_{X_a^*|W_j} L_{X^*|W_j}^{-1} L_{X|X^*}^{-1}.$$
Since this equation holds for j = 0, 1, we may then eliminate LX|X∗ , to have
$$\begin{aligned}
L_{X_a,X_a} &\equiv \left( L_{X_a,Y_a|W_1} L_{X,Y|W_1}^{-1} \right)\left( L_{X_a,Y_a|W_0} L_{X,Y|W_0}^{-1} \right)^{-1} = L_{X_a|X_a^*}\left( L_{X_a^*|W_1} L_{X^*|W_1}^{-1} L_{X^*|W_0} L_{X_a^*|W_0}^{-1} \right) L_{X_a|X_a^*}^{-1} \\
&= \begin{pmatrix} f_{X_a|X_a^*}(0|0) & f_{X_a|X_a^*}(0|1) \\ f_{X_a|X_a^*}(1|0) & f_{X_a|X_a^*}(1|1) \end{pmatrix}\begin{pmatrix} k_{X_a^*}(0) & 0 \\ 0 & k_{X_a^*}(1) \end{pmatrix}\begin{pmatrix} f_{X_a|X_a^*}(0|0) & f_{X_a|X_a^*}(0|1) \\ f_{X_a|X_a^*}(1|0) & f_{X_a|X_a^*}(1|1) \end{pmatrix}^{-1},
\end{aligned} \tag{2.7}$$

with

$$k_{X_a^*}(x^*) \equiv \frac{f_{X_a^*|W_1}(x^*)\, f_{X^*|W_0}(x^*)}{f_{X^*|W_1}(x^*)\, f_{X_a^*|W_0}(x^*)} \quad \text{for } x^* \in \{0,1\}.$$
Notice that the matrix LXa,Xa on the left-hand side of equation (2.7) can
be viewed as observed given the data. Equation (2.7) provides an eigenvalue-
eigenvector decomposition of LXa,Xa . If such a decomposition is unique, then
we may identify (or solve) LXa|X∗a , i.e., fXa|X∗a , from the observed matrix
LXa,Xa .
To ensure a unique eigenvalue-eigenvector decomposition of LXa,Xa , we
assume that the eigenvalues are distinct; i.e., kX∗a(0) ≠ kX∗a(1). This
assumption requires that the distributions of the latent education level of
the married or the unmarried in the primary sample are different from those
in the auxiliary sample, and that the distribution of the latent education
level of the married is different from that of the unmarried in one of the
two samples. Notice that each eigenvector is a column in LXa|X∗a , which is
a conditional density; hence each eigenvector is automatically normalized.
Therefore, for an observed LXa,Xa , we may have an eigenvalue-eigenvector
decomposition as follows:
$$L_{X_a,X_a} = \begin{pmatrix} f_{X_a|X_a^*}(0|x_1^*) & f_{X_a|X_a^*}(0|x_2^*) \\ f_{X_a|X_a^*}(1|x_1^*) & f_{X_a|X_a^*}(1|x_2^*) \end{pmatrix}\begin{pmatrix} k_{X_a^*}(x_1^*) & 0 \\ 0 & k_{X_a^*}(x_2^*) \end{pmatrix}\begin{pmatrix} f_{X_a|X_a^*}(0|x_1^*) & f_{X_a|X_a^*}(0|x_2^*) \\ f_{X_a|X_a^*}(1|x_1^*) & f_{X_a|X_a^*}(1|x_2^*) \end{pmatrix}^{-1}. \tag{2.8}$$
The value of each entry on the right-hand side of equation (2.8) can be
directly computed from the observed matrix LXa,Xa . The only ambiguity
left in equation (2.8) is the value of the indices x∗1 and x∗2, or the index-
ing of the eigenvalues and eigenvectors. In other words, the identification
of fXa|X∗a boils down to finding a 1-to-1 mapping between the two sets of
indices of the eigenvalues and eigenvectors: $\{x_1^*, x_2^*\} \Longleftrightarrow \{0,1\}$. For this, we make a normalization assumption that people with (or without) true college education in the auxiliary sample are more likely to report that they have (or do not have) college education; i.e., $f_{X_a|X_a^*}(x^*|x^*) > 0.5$ for $x^* = 0, 1$. (This assumption also implies the invertibility of $L_{X_a|X_a^*}$.) Since the values of $f_{X_a|X_a^*}(0|x_1^*)$ and $f_{X_a|X_a^*}(1|x_1^*)$ are known in equation (2.8), this assumption pins down the index $x_1^*$ as follows:

$$x_1^* = \begin{cases} 0 & \text{if } f_{X_a|X_a^*}(0|x_1^*) > 0.5, \\ 1 & \text{if } f_{X_a|X_a^*}(1|x_1^*) > 0.5. \end{cases}$$
The value of x∗2 may be found in the same way. In summary, we have iden-
tified LXa|X∗a , i.e., fXa|X∗a , from the decomposition of the observed matrix
LXa,Xa .
After identifying $L_{X_a|X_a^*}$, we can then identify $L_{X_a^*,Y_a|W_j}$ or $f_{X_a^*,Y_a|W_j}$ from equation (2.6) as $L_{X_a^*,Y_a|W_j} = L_{X_a|X_a^*}^{-1} L_{X_a,Y_a|W_j}$; hence the conditional density $f_{Y|X^*,W_j} = f_{Y_a|X_a^*,W_j}$ and the marginal density $f_{X_a^*|W_j}$ are identified. Moreover, we can then identify $L_{X,X^*|W_j}$ or $f_{X,X^*|W_j}$ from equation (2.5) as $L_{X,X^*|W_j} = L_{X,Y|W_j} L_{Y|X^*,W_j}^{-1}$; hence the densities $f_{X|X^*}$ and $f_{X^*|W_j}$ are identified.
This simple example with dichotomous variables demonstrates that we
can nonparametrically identify fY |X∗,W = fYa|X∗a ,Wa, fX|X∗ and fXa|X∗a using
the two samples in which both samples contain nonclassical measurement
errors. We next show that such a nonparametric identification strategy is in
fact generally applicable.
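To make the matrix algebra concrete, the following minimal numerical sketch (in Python) carries out the dichotomous-case identification steps on hypothetical densities; all numerical values below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical true densities for the 0-1 dichotomous example (illustrative only).
L_X = np.array([[0.8, 0.3],     # f_{X|X*}(x|x*), primary sample; columns indexed by x*
                [0.2, 0.7]])
L_Xa = np.array([[0.9, 0.2],    # f_{Xa|Xa*}(x|x*), auxiliary; diagonal > 0.5 (normalization)
                 [0.1, 0.8]])
fXs_W = {0: np.array([0.6, 0.4]), 1: np.array([0.3, 0.7])}     # f_{X*|W=j}
fXsa_W = {0: np.array([0.5, 0.5]), 1: np.array([0.55, 0.45])}  # f_{Xa*|Wa=j}
fY_XsW = {0: np.array([[0.7, 0.4], [0.3, 0.6]]),               # f_{Y|X*,W=j}(y|x*); columns = x*
          1: np.array([[0.2, 0.5], [0.8, 0.5]])}

# Observed matrices implied by (2.5) and (2.6): L_{X,Y|Wj} = L_{X|X*} L_{X*|Wj} L_{Y|X*,Wj}.
LXY = {j: L_X @ np.diag(fXs_W[j]) @ fY_XsW[j].T for j in (0, 1)}
LXaYa = {j: L_Xa @ np.diag(fXsa_W[j]) @ fY_XsW[j].T for j in (0, 1)}

# Equation (2.7): an observed matrix whose eigenvectors are the columns of L_{Xa|Xa*}.
LXaXa = (LXaYa[1] @ np.linalg.inv(LXY[1])) @ np.linalg.inv(LXaYa[0] @ np.linalg.inv(LXY[0]))
_, eigvec = np.linalg.eig(LXaXa)

# Each eigenvector is a conditional density, so rescale columns to sum to one;
# the zero-mode normalization (diagonal entry > 0.5) then pins down the column order.
cand = eigvec / eigvec.sum(axis=0, keepdims=True)
recovered = np.column_stack(sorted([cand[:, k] for k in range(2)],
                                   key=lambda c: int(np.argmax(c))))
print(np.round(recovered, 8))  # matches L_Xa up to numerical error
```

With $L_{X_a|X_a^*}$ in hand, the two displays above then deliver $f_{Y|X^*,W_j}$ and $f_{X|X^*}$ by matrix inversion, exactly as in the text.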
2.2. The continuous latent regressor case. We are interested in identify-
ing a latent (structural) probability model: fY |X∗,W (y|x∗, w), in which Y is
a continuous dependent variable, X∗ is an unobserved continuous regressor
subject to a possibly nonclassical measurement error, and W is an accu-
rately measured discrete covariate. For example, the discrete covariate W
may stand for subpopulations with different demographic characteristics,
such as marital status, race, gender, profession, and geographic location.
Suppose the supports of $X, W, Y$, and $X^*$ are $\mathcal{X} \subseteq \mathbb{R}$, $\mathcal{W} = \{w_1, w_2, \dots, w_J\}$, $\mathcal{Y} \subseteq \mathbb{R}$, and $\mathcal{X}^* \subseteq \mathbb{R}$, respectively. We assume
Assumption 2.1. $f_{X|X^*,W,Y}(x|x^*,w,y) = f_{X|X^*}(x|x^*)$ for all $x \in \mathcal{X}$, $x^* \in \mathcal{X}^*$, $w \in \mathcal{W}$, and $y \in \mathcal{Y}$.
Assumption 2.1 implies that the measurement error in X is independent
of all other variables in the model conditional on the true value X∗. The
measurement error in X may still be correlated with the true value X∗ in
an arbitrary way, and hence is nonclassical.
Assumption 2.2. (i) $X_a^*$, $W_a$, and $Y_a$ have the same supports as $X^*$, $W$, and $Y$, respectively; (ii) $f_{X_a|X_a^*,W_a,Y_a}(x|x^*,w,y) = f_{X_a|X_a^*}(x|x^*)$ for all $x \in \mathcal{X}$, $x^* \in \mathcal{X}^*$, $w \in \mathcal{W}$, and $y \in \mathcal{Y}$.
The next condition requires that the latent structural probability model
is the same in both samples, which is a reasonable stable assumption.
Assumption 2.3. $f_{Y_a|X_a^*,W_a}(y|x^*,w) = f_{Y|X^*,W}(y|x^*,w)$ for all $x^* \in \mathcal{X}^*$, $w \in \mathcal{W}$, and $y \in \mathcal{Y}$.
We note that, under assumption 2.3, the joint distributions of the true
value X∗ and covariate W in the primary sample may still be different from
those of X∗a and Wa in the auxiliary sample.
Let $L^p(\mathcal{X})$, $1 \leq p < \infty$, denote the space of functions with $\int_{\mathcal{X}} |h(x)|^p\, dx < \infty$, and let $L^\infty(\mathcal{X})$ be the space of functions with $\sup_{x\in\mathcal{X}} |h(x)| < \infty$. Then it is clear that for any fixed $w \in \mathcal{W}$, $y \in \mathcal{Y}$, $f_{X,W,Y}(\cdot,w,y) \in L^p(\mathcal{X})$ and $f_{X^*,W,Y}(\cdot,w,y) \in L^p(\mathcal{X}^*)$ for all $1 \leq p \leq \infty$. Let $\mathcal{H}_X \subseteq L^2(\mathcal{X})$ and $\mathcal{H}_{X^*} \subseteq L^2(\mathcal{X}^*)$. Define the integral operator $L_{X|X^*} : \mathcal{H}_{X^*} \to \mathcal{H}_X$ as:

$$\{L_{X|X^*} h\}(x) = \int_{\mathcal{X}^*} f_{X|X^*}(x|x^*)\, h(x^*)\, dx^* \quad \text{for any } h \in \mathcal{H}_{X^*},\ x \in \mathcal{X}.$$
Denote $\mathcal{W}_j = \{w_j\}$ for $j = 1, \dots, J$ and define the following operators for the primary sample:

$$\left(L_{X,Y|W_j} h\right)(x) = \int f_{X,Y|W}(x,u|w_j)\, h(u)\, du,$$
$$\left(L_{Y|X^*,W_j} h\right)(x^*) = \int f_{Y|X^*,W_j}(u|x^*)\, h(u)\, du,$$
$$\left(L_{X^*|W_j} h\right)(x^*) = f_{X^*|W_j}(x^*)\, h(x^*).$$
We also define the operators $L_{X_a|X_a^*}$, $L_{X_a,Y_a|W_j}$, $L_{Y_a|X_a^*,W_j}$, and $L_{X_a^*|W_j}$ for the auxiliary sample in the same way as their counterparts for the primary sample. Notice that the operators $L_{X^*|W_j}$ and $L_{X_a^*|W_j}$ are diagonal operators. In operator notation, assumption 2.1 implies

$$L_{X,Y|W_j} = L_{X|X^*} L_{X^*|W_j} L_{Y|X^*,W_j}$$

in the primary sample; assumptions 2.2 and 2.3 imply

$$L_{X_a,Y_a|W_j} = L_{X_a|X_a^*} L_{X_a^*|W_j} L_{Y|X^*,W_j}$$

in the auxiliary sample. We assume
Assumption 2.4. $L_{X_a|X_a^*} : \mathcal{H}_{X_a^*} \to \mathcal{H}_{X_a}$ is injective, i.e., the set $\{h \in \mathcal{H}_{X_a^*} : L_{X_a|X_a^*} h = 0\} = \{0\}$.
Assumption 2.4 is equivalent to assuming that the linear operator $L_{X_a|X_a^*}$ is invertible. Recall that the conditional expectation operator of $X_a^*$ given $X_a$, $E_{X_a^*|X_a} : L^2(\mathcal{X}^*) \to L^2(\mathcal{X}_a)$, is defined as

$$\{E_{X_a^*|X_a} h_0\}(x) = \int_{\mathcal{X}^*} f_{X_a^*|X_a}(x^*|x)\, h_0(x^*)\, dx^* = E[h_0(X_a^*)|X_a = x] \quad \text{for any } h_0 \in L^2(\mathcal{X}^*),\ x \in \mathcal{X}_a.$$

We have

$$\{L_{X_a|X_a^*} h\}(x) = \int_{\mathcal{X}^*} f_{X_a|X_a^*}(x|x^*)\, h(x^*)\, dx^* = E\left[\frac{f_{X_a}(x)\, h(X_a^*)}{f_{X_a^*}(X_a^*)}\,\middle|\, X_a = x\right]$$

for any $h \in \mathcal{H}_{X_a^*}$, $x \in \mathcal{X}_a$. Assumption 2.4 is thus equivalent to: $E\left[h(X_a^*)\frac{f_{X_a}(X_a)}{f_{X_a^*}(X_a^*)}\,\middle|\, X_a\right] = 0$ implies $h = 0$. If $0 < f_{X_a^*}(x^*) < \infty$ over $\mathrm{int}(\mathcal{X}^*)$ and $0 < f_{X_a}(x) < \infty$ over $\mathrm{int}(\mathcal{X}_a)$ (which are very minor restrictions), then assumption 2.4 is the same as the identification condition imposed in Newey and Powell (2003), and Carrasco, Florens, and Renault (2006), among others. As these authors point out, this condition is implied by the completeness of the conditional density $f_{X_a^*|X_a}$, which is satisfied, for example, when $f_{X_a^*|X_a}$ belongs to an exponential family. Moreover, if we are willing to assume $\sup_{x^*,w} f_{X_a^*,W_a}(x^*,w) \leq c < \infty$, then a sufficient condition for assumption 2.4 is the bounded completeness of the conditional density $f_{X_a^*|X_a}$
; see, e.g., Blundell, Chen, and
Kristensen (2007) and Chernozhukov, Imbens, and Newey (2007). Distrib-
utions that are complete are automatically bounded complete. There are
much larger families of distributions that are bounded complete (and some
of them may not be complete). See, e.g., Lehmann (1986, page 173), Hoeffd-
ing (1977) and Mattner (1993) for many examples. When Xa and X∗a are
discrete, assumption 2.4 requires that the support of Xa is not smaller than
that of X∗a .
Assumption 2.5. (i) $f_{X^*|W_j} > 0$ and $f_{X_a^*|W_j} > 0$; (ii) $L_{X,Y|W_j}$ is injective; (iii) $L_{X_a,Y_a|W_j}$ is injective.
Assumptions 2.4 and 2.5 imply that $L_{Y|X^*,W_j}$ and $L_{X|X^*}$ are invertible. In the Appendix we establish the diagonalization of an observed operator $L_{X_a,X_a}^{ij}$:

$$L_{X_a,X_a}^{ij} = L_{X_a|X_a^*}\, L_{X_a^*}^{ij}\, L_{X_a|X_a^*}^{-1} \quad \text{for all } i, j,$$

where the operator $L_{X_a^*}^{ij} \equiv \left(L_{X_a^*|W_j} L_{X^*|W_j}^{-1} L_{X^*|W_i} L_{X_a^*|W_i}^{-1}\right)$ is a diagonal operator defined as $\left(L_{X_a^*}^{ij} h\right)(x^*) = k_{X_a^*}^{ij}(x^*)\, h(x^*)$ with

$$k_{X_a^*}^{ij}(x^*) \equiv \frac{f_{X_a^*|W_j}(x^*)\, f_{X^*|W_i}(x^*)}{f_{X^*|W_j}(x^*)\, f_{X_a^*|W_i}(x^*)}.$$
In order to show the identification of $f_{X_a|X_a^*}$ and $k_{X_a^*}^{ij}(x^*)$, we assume

Assumption 2.6. $\sup_{x^*\in\mathcal{X}^*} k_{X_a^*}^{ij}(x^*) < \infty$ for all $i, j \in \{1, 2, \dots, J\}$.
Notice that the subsets W1,W2, ...,WJ ⊂ W do not need to be collectively
exhaustive. We may only consider those subsets in W in which these as-
sumptions are satisfied.
Assumption 2.7. For any $x_1^* \neq x_2^*$, there exist $i, j \in \{1, 2, \dots, J\}$ such that $k_{X_a^*}^{ij}(x_1^*) \neq k_{X_a^*}^{ij}(x_2^*)$.
Assumption 2.7 implies that, for any two different eigenfunctions $f_{X_a|X_a^*}(\cdot|x_1^*)$ and $f_{X_a|X_a^*}(\cdot|x_2^*)$, one can always find two subsets $\mathcal{W}_j$ and $\mathcal{W}_i$ such that the two different eigenfunctions correspond to two different eigenvalues $k_{X_a^*}^{ij}(x_1^*)$ and $k_{X_a^*}^{ij}(x_2^*)$ and, therefore, are identified. Although there may exist duplicate eigenvalues in each decomposition corresponding to a pair of $i$ and $j$, this assumption guarantees that each eigenfunction $f_{X_a|X_a^*}(\cdot|x^*)$ is uniquely determined by combining all the information from a series of decompositions of $L_{X_a,X_a}^{ij}$ for $i, j \in \{1, 2, \dots, J\}$.

We now provide an example of the marginal distribution of $X^*$ to illustrate that assumptions 2.6 and 2.7 are easily satisfied. Suppose that the distribution of $X^*$ in the primary sample is the standard normal, i.e., $f_{X^*|w_j}(x^*) = \psi(x^*)$ for $j = 1, 2, 3$, where $\psi$ is the probability density function of the standard normal, and the distribution of $X_a^*$ in the auxiliary
sample is, for $0 < \sigma < 1$ and $\mu \neq 0$,

$$f_{X_a^*|w_j}(x^*) = \begin{cases} \psi(x^*) & \text{for } j = 1, \\ \sigma^{-1}\psi\left(\sigma^{-1}x^*\right) & \text{for } j = 2, \\ \psi(x^* - \mu) & \text{for } j = 3. \end{cases} \tag{2.9}$$
It is obvious that assumption 2.6 is satisfied with

$$k_{X_a^*}^{ij}(x^*) = \begin{cases} \sigma^{-1}\exp\left(\dfrac{1-\sigma^{-2}}{2}(x^*)^2\right) & \text{for } i = 1,\ j = 2, \\[2ex] \dfrac{\psi(x^*-\mu)}{\psi(x^*)} & \text{for } i = 1,\ j = 3. \end{cases} \tag{2.10}$$
For $i = 1, j = 2$, any two eigenvalues $k_{X_a^*}^{12}(x_1^*)$ and $k_{X_a^*}^{12}(x_2^*)$ of $L_{X_a,X_a}^{12}$ may be the same if and only if $x_1^* = -x_2^*$. In other words, we cannot distinguish the eigenfunctions $f_{X_a|X_a^*}(\cdot|x_1^*)$ and $f_{X_a|X_a^*}(\cdot|x_2^*)$ in the decomposition of $L_{X_a,X_a}^{12}$ if and only if $x_1^* = -x_2^*$. Since $k_{X_a^*}^{ij}(x^*)$ for $i = 1, j = 3$ is not symmetric around zero, the eigenvalues $k_{X_a^*}^{13}(x_1^*)$ and $k_{X_a^*}^{13}(x_2^*)$ of $L_{X_a,X_a}^{13}$ are different for any $x_1^* = -x_2^*$. Notice that the operators $L_{X_a,X_a}^{12}$ and $L_{X_a,X_a}^{13}$ share the same eigenfunctions $f_{X_a|X_a^*}(\cdot|x_1^*)$ and $f_{X_a|X_a^*}(\cdot|x_2^*)$. Therefore, we may distinguish the eigenfunctions $f_{X_a|X_a^*}(\cdot|x_1^*)$ and $f_{X_a|X_a^*}(\cdot|x_2^*)$ with $x_1^* = -x_2^*$ in the decomposition of $L_{X_a,X_a}^{13}$. By combining the information obtained from the decompositions of $L_{X_a,X_a}^{12}$ and $L_{X_a,X_a}^{13}$, we can distinguish the eigenfunctions corresponding to any two different values of $x^*$.
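A quick numerical check of this argument (our illustration, with assumed values $\sigma = 0.5$ and $\mu = 1$) confirms that $k_{X_a^*}^{12}$ is symmetric in $x^*$ while $k_{X_a^*}^{13}$ is not:

```python
import numpy as np
from scipy.stats import norm

sigma, mu = 0.5, 1.0   # assumed values with 0 < sigma < 1 and mu != 0

def k12(x):            # eigenvalue of L^{12}_{Xa,Xa}: a function of (x*)^2 only
    return norm.pdf(x / sigma) / (sigma * norm.pdf(x))

def k13(x):            # eigenvalue of L^{13}_{Xa,Xa}: not symmetric around zero
    return norm.pdf(x - mu) / norm.pdf(x)

x = 0.7
print(np.isclose(k12(x), k12(-x)))   # True: L^{12} alone cannot separate x* = +/-0.7
print(np.isclose(k13(x), k13(-x)))   # False: L^{13} distinguishes the pair
```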
Remark 2.1. (1) Assumption 2.7 does not hold if $f_{X^*|W=w_j}(x^*) = f_{X_a^*|W=w_j}(x^*)$ for all $w_j$ and all $x^* \in \mathcal{X}^*$. This assumption requires that the two samples be from different populations. Given assumption 2.3 and the invertibility of the operator $L_{Y|X^*,W_j}$, one could check assumption 2.7 from the observed densities $f_{Y|W=w_j}$ and $f_{Y_a|W_a=w_j}$. In particular, if $f_{Y|W=w_j}(y) = f_{Y_a|W_a=w_j}(y)$ for all $w_j$ and all $y \in \mathcal{Y}$, then assumption 2.7 is not satisfied. (2) Assumption 2.7 does not hold if $f_{X^*|W=w_j}(x^*) = f_{X^*|W=w_i}(x^*)$ and $f_{X_a^*|W_a=w_j}(x^*) = f_{X_a^*|W_a=w_i}(x^*)$ for all $w_j \neq w_i$ and all $x^* \in \mathcal{X}^*$. This means that the marginal distribution of $X^*$ or $X_a^*$ should be different in the subsamples corresponding to different $w_j$ in at least one of the two samples. For example, if $X^*$ or $X_a^*$ are earnings and $w_j$ corresponds to gender, then assumption 2.7 requires that the earnings distribution of males be different from that of females in one of the samples (either the primary or the auxiliary). Given the invertibility of the operators $L_{X|X^*}$ and $L_{X_a|X_a^*}$, one could check assumption 2.7 from the observed densities $f_{X|W=w_j}$ and $f_{X_a|W_a=w_j}$. In particular, if $f_{X|W=w_j}(x) = f_{X|W=w_i}(x)$ for all $w_j \neq w_i$ and all $x \in \mathcal{X}$, then assumption 2.7 requires the existence of an auxiliary sample such that $f_{X_a|W_a=w_j}(X_a) \neq f_{X_a|W_a=w_i}(X_a)$ with positive probability for some $w_j \neq w_i$.
In order to fully identify each eigenfunction, i.e., $f_{X_a|X_a^*}$, we need to identify the exact value of $x^*$ in each eigenfunction $f_{X_a|X_a^*}(\cdot|x^*)$. Notice that the eigenfunction $f_{X_a|X_a^*}(\cdot|x^*)$ is identified up to the value of $x^*$. In other words, we have identified a probability density of $X_a$ conditional on $X_a^* = x^*$ with the value of $x^*$ unknown. An intuitive normalization assumption is that the value of $x^*$ is the mean of this identified probability density, i.e., $x^* = \int x\, f_{X_a|X_a^*}(x|x^*)\, dx$; this assumption implies that the measurement error in the auxiliary sample has zero mean conditional on the latent true values. An alternative normalization assumption is that the value of $x^*$ is the mode of this identified probability density, i.e., $x^* = \arg\max_x f_{X_a|X_a^*}(x|x^*)$; this assumption implies that the error distribution conditional on the latent true values has zero mode. The intuition behind this assumption is that people are more willing to report some values close to the latent true values than they are to report those far from the truth. Another normalization assumption may be that the value of $x^*$ is the median of the identified probability density, i.e., $x^* = \inf\left\{z : \int_{-\infty}^{z} f_{X_a|X_a^*}(x|x^*)\, dx \geq \frac{1}{2}\right\}$; this assumption implies that the error distribution conditional on the latent true values has zero median, and that people have the same probability of overreporting as that of underreporting. Obviously, the zero median condition can be generalized to an assumption that the error distribution conditional on the latent true values has a zero quantile.
Assumption 2.8. One of the following holds for all $x^* \in \mathcal{X}^*$: (i) (mean) $\int x\, f_{X_a|X_a^*}(x|x^*)\, dx = x^*$; or (ii) (mode) $\arg\max_x f_{X_a|X_a^*}(x|x^*) = x^*$; or (iii) (quantile) there is a $\gamma \in (0,1)$ such that $\inf\left\{z : \int_{-\infty}^{z} f_{X_a|X_a^*}(x|x^*)\, dx \geq \gamma\right\} = x^*$.
Assumption 2.8 requires that the support of Xa not be smaller than that
of X∗a , and that, although the measurement error in the auxiliary sample
(Xa−X∗a) could be nonclassical, it needs to satisfy some location regularity
such as zero conditional mean, or zero conditional mode or zero conditional
median. Recall that, in the dichotomous case, assumption 2.8 with zero
conditional mode also implies the invertibility of LXa|X∗a (i.e., assumption
2.4). However, this is not true in the general discrete case. For the general
discrete case, a comparable sufficient condition for the invertibility of LXa|X∗a is strict diagonal dominance (i.e., the diagonal entries of LXa|X∗a are all
larger than 0.5), but assumption 2.8 with zero mode only requires that
the diagonal entries of LXa|X∗a be the largest in each row, which cannot
guarantee the invertibility of LXa|X∗a when the support of X∗a contains more
than 2 values.
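The following small check (our construction, not an example from the paper) exhibits a 3-point-support misreporting matrix whose diagonal entry is the mode of every column, as the zero-mode normalization requires, yet which is singular because strict diagonal dominance fails:

```python
import numpy as np

# Columns are x* in {0, 1, 2}; each column is a density over reported values x,
# and each column's mode sits on the diagonal (zero conditional mode).
L = np.array([[0.40, 0.21, 0.305],
              [0.21, 0.40, 0.305],
              [0.39, 0.39, 0.390]])   # third column = average of the first two

print(L.sum(axis=0))             # [1. 1. 1.]: valid conditional densities
print(L.argmax(axis=0))          # [0 1 2]: diagonal entries are the modes
print(np.linalg.matrix_rank(L))  # 2: L is singular, hence not invertible
```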
We obtain the following identification result.
Theorem 2.1. Suppose assumptions 2.1—2.8 hold. Then the densities $f_{X,W,Y}$ and $f_{X_a,W_a,Y_a}$ uniquely determine $f_{Y|X^*,W}$, $f_{X|X^*}$, and $f_{X_a|X_a^*}$.
Remark 2.2. (1) When there exist extra common covariates in the two samples, we may consider more generally defined W and Wa, or relax assumptions on the error distributions in the auxiliary sample. On the one hand, the identification theorem still holds when we replace W and Wa with a scalar measurable function of W and Wa, respectively. On the other hand, we may relax assumptions 2.1 and 2.2(ii) to allow the error distributions to be conditional on the true values and the extra common covariates. (2) The identification theorem does not require that the two samples be independent of each other.
3. Sieve Quasi Likelihood Estimation and Inference. Our identification result is very general and does not require the two samples to be independent. However, for many applications, it is reasonable to assume that there are two random samples $\{X_i, W_i, Y_i\}_{i=1}^{n}$ and $\{X_{aj}, W_{aj}, Y_{aj}\}_{j=1}^{n_a}$ that are mutually independent.
As shown in Section 2, the densities fY |X∗,W , fX|X∗ , fX∗|W , fXa|X∗a , and
fX∗a |Waare nonparametrically identified under assumptions 2.1—2.8. Never-
theless, in empirical studies, we typically have either a semiparametric or a
parametric specification of the conditional density fY |X∗,W as the model of
interest. In this section, we treat the other densities fX|X∗ , fX∗|W , fXa|X∗a ,
and fX∗a |Waas unknown nuisance functions, but consider a parametrically
specified conditional density of Y given (X∗,W ):
$\{g(y|x^*,w;\theta) : \theta \in \Theta\}$, $\Theta$ a compact subset of $\mathbb{R}^{d_\theta}$, $1 \leq d_\theta < \infty$.
Define

$$\theta_0 \equiv \arg\max_{\theta\in\Theta} \int [\log g(y|x^*,w;\theta)]\, f_{Y|X^*,W}(y|x^*,w)\, dy.$$
The latent parametric model is correctly specified if $g(y|x^*,w;\theta_0) = f_{Y|X^*,W}(y|x^*,w)$ for almost all $y, x^*, w$ (and $\theta_0$ is called the true parameter value); otherwise it is misspecified (and $\theta_0$ is called the pseudo-true parameter value); see, e.g., White (1982).
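As a toy illustration of a pseudo-true value (our example, not the paper's): fit a unit-variance normal location model to a truth whose variance is wrong; the KLIC-best $\theta_0$ still recovers the true location:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Pseudo-true value under misspecification (illustrative assumed densities):
# truth is N(0.3, sd = 2); the misspecified model is g(y; theta) = N(theta, 1).
y, dy = np.linspace(-12, 12, 4001, retstep=True)
f_true = norm.pdf(y, loc=0.3, scale=2.0)

def neg_expected_loglik(theta):
    # -E[log g(Y; theta)] approximated on a grid
    return -np.sum(np.log(norm.pdf(y, loc=theta)) * f_true) * dy

res = minimize_scalar(neg_expected_loglik, bounds=(-5, 5), method="bounded")
print(res.x)  # ~0.3: the KLIC-best location, even though the variance is wrong
```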
In this section we provide a root-n consistent and asymptotically nor-
mally distributed sieve MLE of θ0, regardless of whether the latent para-
metric model g(y|x∗, w; θ) is correctly specified or not. When g(y|x∗, w; θ) is misspecified, the estimator is better called the "sieve quasi MLE" than the "sieve MLE." (In this paper we have used both terminologies since we allow the latent model g(y|x∗, w; θ) to either correctly or incorrectly specify the true latent conditional density fY |X∗,W .) Under the correct specifica-
tion of the latent model, we show that the sieve MLE of θ0 is automatically
semiparametrically efficient, and provide a simple consistent estimator of its
asymptotic variance. In addition, we provide a sieve likelihood ratio model
selection test of two non-nested parametric specifications of fY |X∗,W when
both could be misspecified.
3.1. Sieve likelihood estimation under possible misspecification. Let $\alpha_0 \equiv (\theta_0^T, f_{01}, f_{01a}, f_{02}, f_{02a})^T \equiv (\theta_0^T, f_{X|X^*}, f_{X_a|X_a^*}, f_{X^*|W}, f_{X_a^*|W_a})^T$ denote the true parameter value, in which $\theta_0$ is really "pseudo-true" when the parametric model $g(y|x^*,w;\theta)$ is incorrectly specified for the unknown true density $f_{Y|X^*,W}$. We introduce a dummy random variable S, with S = 1 indicating
the primary sample and S = 0 indicating the auxiliary sample. Then we
have a big combined sample

$$\left\{ Z_t^T \equiv \left(S_t X_t,\ S_t W_t,\ S_t Y_t,\ S_t,\ (1-S_t)X_t,\ (1-S_t)W_t,\ (1-S_t)Y_t\right) \right\}_{t=1}^{n+n_a}$$

such that $\{X_t, W_t, Y_t, S_t = 1\}_{t=1}^{n}$ is the primary sample and $\{X_t, W_t, Y_t, S_t = 0\}_{t=n+1}^{n+n_a}$ is the auxiliary sample. Denote $p \equiv \Pr(S_t = 1) \in (0,1)$. Then the observed joint likelihood for $\alpha_0$ is

$$\prod_{t=1}^{n+n_a}\left[p \times f(X_t,W_t,Y_t|S_t=1;\alpha_0)\right]^{S_t}\left[(1-p) \times f(X_t,W_t,Y_t|S_t=0;\alpha_0)\right]^{1-S_t},$$
in which

$$f(X,W,Y|S=1;\alpha_0) = f_W(W)\int f_{01}(X|x^*)\, g(Y|x^*,W;\theta_0)\, f_{02}(x^*|W)\, dx^*,$$
$$f(X,W,Y|S=0;\alpha_0) = f_{W_a}(W)\int f_{01a}(X|x^*)\, g(Y|x^*,W;\theta_0)\, f_{02a}(x^*|W)\, dx^*.$$
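Numerically, each observed-density evaluation above is a one-dimensional integral over $x^*$. A minimal sketch (with our own placeholder Gaussian choices for $f_{01}$ and $g$, and assuming $f_{02}(\cdot|w)$ is standard normal) evaluates it by Gauss-Hermite quadrature:

```python
import numpy as np
from scipy.stats import norm

# Gauss-Hermite_e nodes/weights integrate against the weight exp(-t^2/2).
nodes, weights = np.polynomial.hermite_e.hermegauss(40)

f01 = lambda x, xs: norm.pdf(x, loc=0.1 * xs, scale=0.6)  # placeholder error density
g = lambda y, xs, w: norm.pdf(y, loc=xs + xs * w)         # placeholder latent model
# f02(x*|w) = standard normal = exp(-t^2/2)/sqrt(2*pi): absorbed into the weights

def f_obs(x, y, w):
    # \int f01(x|x*) g(y|x*,w) f02(x*|w) dx*  ~  sum_i w_i f01(x|t_i) g(y|t_i,w) / sqrt(2*pi)
    return np.dot(weights, f01(x, nodes) * g(y, nodes, w)) / np.sqrt(2 * np.pi)

print(f_obs(x=0.2, y=1.0, w=1))  # one likelihood contribution f(X,W,Y|S=1)/f_W(W)
```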
Before we present a sieve (quasi-)MLE estimator $\hat{\alpha}$ for $\alpha_0$, we need to impose some mild smoothness restrictions on the unknown densities. The sieve method allows for unknown functions belonging to many different function spaces such as Sobolev space, Besov space, and others; see, e.g., Shen and Wong (1994) and Van de Geer (1993, 2000). But for the sake of concreteness and simplicity, we consider the widely used Holder space of functions. Let $\xi = (\xi_1,\xi_2)^T \in \mathbb{R}^2$, $a = (a_1,a_2)^T$, and let $\nabla^a h(\xi) \equiv \frac{\partial^{a_1+a_2} h(\xi_1,\xi_2)}{\partial\xi_1^{a_1}\,\partial\xi_2^{a_2}}$ denote the $(a_1+a_2)$-th derivative. Let $\|\cdot\|_E$ denote the Euclidean norm. Let $\mathcal{V} \subseteq \mathbb{R}^2$ and let $\underline{\gamma}$ be the largest integer satisfying $\gamma > \underline{\gamma}$. The Holder space $\Lambda^\gamma(\mathcal{V})$ of order $\gamma > 0$ is a space of functions $h : \mathcal{V} \mapsto \mathbb{R}$ such that the first $\underline{\gamma}$ derivatives are continuous and bounded, and the $\underline{\gamma}$-th derivative is Holder continuous with the exponent $\gamma - \underline{\gamma} \in (0,1]$. The Holder space $\Lambda^\gamma(\mathcal{V})$ becomes a Banach space under the Holder norm:

$$\|h\|_{\Lambda^\gamma} = \max_{a_1+a_2\leq\underline{\gamma}}\sup_\xi|\nabla^a h(\xi)| + \max_{a_1+a_2=\underline{\gamma}}\sup_{\xi\neq\xi'}\frac{\left|\nabla^a h(\xi) - \nabla^a h(\xi')\right|}{\left(\|\xi-\xi'\|_E\right)^{\gamma-\underline{\gamma}}} < \infty.$$
We define a Holder ball as $\Lambda_c^\gamma(\mathcal{V}) \equiv \{h \in \Lambda^\gamma(\mathcal{V}) : \|h\|_{\Lambda^\gamma} \leq c < \infty\}$. Denote

$$\mathcal{F}_1 = \left\{ f_1(\cdot|\cdot) \in \Lambda_c^{\gamma_1}(\mathcal{X}\times\mathcal{X}^*) : f_1(\cdot|x^*) > 0,\ \int_{\mathcal{X}} f_1(x|x^*)\, dx = 1 \text{ for all } x^* \in \mathcal{X}^* \right\},$$

$$\mathcal{F}_{1a} = \left\{ f_{1a}(\cdot|\cdot) \in \Lambda_c^{\gamma_{1a}}(\mathcal{X}_a\times\mathcal{X}^*) : \text{assumptions 2.4, 2.8 hold},\ f_{1a}(\cdot|x^*) > 0,\ \int_{\mathcal{X}_a} f_{1a}(x|x^*)\, dx = 1 \text{ for all } x^* \in \mathcal{X}^* \right\},$$

$$\mathcal{F}_2 = \left\{ f_2(\cdot|w) \in \Lambda_c^{\gamma_2}(\mathcal{X}^*) : \text{assumptions 2.6, 2.7 hold},\ f_2(\cdot|w) > 0,\ \int_{\mathcal{X}^*} f_2(x^*|w)\, dx^* = 1 \text{ for all } w \in \mathcal{W} \right\}.$$
We impose the following smoothness restrictions on the densities:

Assumption 3.1. (i) All the assumptions in theorem 2.1 hold; (ii) $f_{X|X^*}(\cdot|\cdot) \in \mathcal{F}_1$ with $\gamma_1 > 1$; (iii) $f_{X_a|X_a^*}(\cdot|\cdot) \in \mathcal{F}_{1a}$ with $\gamma_{1a} > 1$; (iv) $f_{X^*|W}(\cdot|w),\ f_{X_a^*|W_a}(\cdot|w) \in \mathcal{F}_2$ with $\gamma_2 > 1/2$ for all $w \in \mathcal{W}$.
Denote $\mathcal{A} = \Theta\times\mathcal{F}_1\times\mathcal{F}_{1a}\times\mathcal{F}_2\times\mathcal{F}_2$ and $\alpha = \left(\theta^T, f_1, f_{1a}, f_2, f_{2a}\right)^T$. Then the log-joint likelihood for $\alpha\in\mathcal{A}$ is given by:

$$\sum_{t=1}^{n+n_a}\left\{S_t\ln\left[p\times f(X_t,W_t,Y_t|S_t=1;\alpha)\right] + (1-S_t)\ln\left[(1-p)\times f(X_t,W_t,Y_t|S_t=0;\alpha)\right]\right\} = n\ln p + n_a\ln(1-p) + \sum_{t=1}^{n+n_a}\ell(Z_t;\alpha),$$

in which

$$\ell(Z_t;\alpha) \equiv S_t\,\ell_p(Z_t;\theta,f_1,f_2) + (1-S_t)\,\ell_a(Z_t;f_{1a},f_{2a}),$$
$$\ell_p(Z_t;\theta,f_1,f_2) = \ln\int f_1(X_t|x^*)\, g(Y_t|x^*,W_t;\theta)\, f_2(x^*|W_t)\, dx^* + \ln f_W(W_t),$$
$$\ell_a(Z_t;f_{1a},f_{2a}) = \ln\int f_{1a}(X_t|x_a^*)\, g(Y_t|x_a^*,W_t;\theta)\, f_{2a}(x_a^*|W_t)\, dx_a^* + \ln f_{W_a}(W_t).$$
Let $E[\cdot]$ denote the expectation with respect to the underlying true data generating process for $Z_t$. To stress that our combined data set consists of two samples, sometimes we let $Z_{pi} = (X_i,W_i,Y_i)^T$ denote the $i$-th observation in the primary data set, and $Z_{aj} = (X_{aj},W_{aj},Y_{aj})^T$ denote the $j$-th observation in the auxiliary data set. Then

$$\alpha_0 = \arg\sup_{\alpha\in\mathcal{A}} E\left[\ell(Z_t;\alpha)\right] = \arg\sup_{\alpha\in\mathcal{A}}\left[p\, E\{\ell_p(Z_{pi};\theta,f_1,f_2)\} + (1-p)\, E\{\ell_a(Z_{aj};f_{1a},f_{2a})\}\right].$$
Let $\mathcal{A}_n = \Theta\times\mathcal{F}_1^n\times\mathcal{F}_{1a}^n\times\mathcal{F}_2^n\times\mathcal{F}_2^n$ be a sieve space for $\mathcal{A}$, which is a sequence of approximating spaces that are dense in $\mathcal{A}$ under some pseudo-metric. The two-sample sieve quasi-MLE $\hat{\alpha}_n = \left(\hat{\theta}^T, \hat{f}_1, \hat{f}_{1a}, \hat{f}_2, \hat{f}_{2a}\right)^T\in\mathcal{A}_n$ for $\alpha_0\in\mathcal{A}$ is defined as:

$$\hat{\alpha}_n = \arg\max_{\alpha\in\mathcal{A}_n}\sum_{t=1}^{n+n_a}\ell(Z_t;\alpha) = \arg\max_{\alpha\in\mathcal{A}_n}\left[\sum_{i=1}^{n}\ell_p(Z_{pi};\theta,f_1,f_2) + \sum_{j=1}^{n_a}\ell_a(Z_{aj};f_{1a},f_{2a})\right].$$

We could apply infinite-dimensional approximating spaces as sieves $\mathcal{F}_j^n$
for Fj , j = 1, 1a, 2. However, in applications we shall use finite-dimensional
sieve spaces since they are easier to implement. For $j = 1, 1a, 2$, let $p_j^{k_{j,n}}(\cdot)$ be a $k_{j,n}\times 1$ vector of known basis functions, such as power series, splines, Fourier series, wavelets, Hermite polynomials, etc. Then we denote the sieve spaces for $\mathcal{F}_1$, $\mathcal{F}_{1a}$, and $\mathcal{F}_2$ as follows:

$$\mathcal{F}_1^n = \left\{ f_1(x|x^*) = p_1^{k_{1,n}}(x,x^*)^T\beta_1 \in \mathcal{F}_1 \right\},$$
$$\mathcal{F}_{1a}^n = \left\{ f_{1a}(x_a|x_a^*) = p_{1a}^{k_{1a,n}}(x_a,x_a^*)^T\beta_{1a} \in \mathcal{F}_{1a} \right\},$$
$$\mathcal{F}_2^n = \left\{ f_2(x^*|w) = \sum_{j=1}^{J} 1(w=w_j)\, p_2^{k_{2,n}}(x^*)^T\beta_{2j} \in \mathcal{F}_2 \right\}.$$
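In practice each sieve coefficient vector must respect the positivity and unit-mass constraints built into $\mathcal{F}_2$. One common device for doing so (a sketch under our own parameterization, not the paper's exact linear-in-$\beta$ form) squares a Hermite expansion against a Gaussian base density and rescales:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from scipy.integrate import quad

def sieve_density(beta):
    # (Hermite expansion)^2 * N(0,1) base, rescaled to integrate to one:
    # positivity and unit mass then hold by construction.
    base = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    unnormalized = lambda x: hermeval(x, beta) ** 2 * base(x)
    mass, _ = quad(unnormalized, -np.inf, np.inf)
    return lambda x: unnormalized(x) / mass

f2_hat = sieve_density(beta=np.array([1.0, 0.3, -0.1]))  # k_{2,n} = 3 coefficients
print(quad(f2_hat, -np.inf, np.inf)[0])                  # ~1.0
```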
3.1.1. Consistency. Define a norm on $\mathcal{A}$ as: $\|\alpha\|_s = \|\theta\|_E + \|f_1\|_{\infty,\omega_1} + \|f_{1a}\|_{\infty,\omega_{1a}} + \|f_2\|_{\infty,\omega_2} + \|f_{2a}\|_{\infty,\omega_2}$, in which $\|h\|_{\infty,\omega_j} \equiv \sup_\xi|h(\xi)\,\omega_j(\xi)|$ with $\omega_j(\xi) = \left(1+\|\xi\|_E^2\right)^{-\varsigma_j/2}$, $\varsigma_j > 0$ for $j = 1, 1a, 2$. We assume each of $\mathcal{X}$, $\mathcal{X}_a$, $\mathcal{X}^*$ is $\mathbb{R}$, and
Assumption 3.2. (i) $\{X_i,W_i,Y_i\}_{i=1}^{n}$ and $\{X_{aj},W_{aj},Y_{aj}\}_{j=1}^{n_a}$ are i.i.d. and independent of each other. In addition, $\lim_{n\to\infty}\frac{n}{n+n_a} = p \in (0,1)$; (ii) $g(y|x^*,w;\theta)$ is continuous in $\theta\in\Theta$, and $\Theta$ is a compact subset of $\mathbb{R}^{d_\theta}$; and (iii) $\theta_0\in\Theta$ is the unique maximizer of $\int [\log g(y|x^*,w;\theta)]\, f_{Y|X^*,W}(y|x^*,w)\, dy$ over $\theta\in\Theta$.
Assumption 3.3. (i) $-\infty < E[\ell(Z_t;\alpha_0)] < \infty$, and $E[\ell(Z_t;\alpha)]$ is upper semicontinuous on $\mathcal{A}$ under the metric $\|\cdot\|_s$; (ii) there is a finite $\kappa > 0$ and a random variable $U(Z_t)$ with $E\{U(Z_t)\} < \infty$ such that $\sup_{\alpha\in\mathcal{A}_n : \|\alpha-\alpha_0\|_s\leq\delta}|\ell(Z_t;\alpha) - \ell(Z_t;\alpha_0)| \leq \delta^\kappa U(Z_t)$.
Assumption 3.4. (i) $p_2^{k_{2,n}}(\cdot)$ is a $k_{2,n}\times 1$ vector of spline wavelet basis functions on $\mathbb{R}$, and for $j = 1, 1a$, $p_j^{k_{j,n}}(\cdot,\cdot)$ is a $k_{j,n}\times 1$ vector of tensor products of spline wavelet basis functions on $\mathbb{R}^2$; (ii) $k_n \equiv \max\{k_{1,n}, k_{1a,n}, k_{2,n}\}\to\infty$ and $k_n/n\to 0$.
Assumption 3.2(i) is a typical condition used in cross-sectional analyses
with two samples. Assumption 3.2(ii—iii) are typical conditions for paramet-
ric (quasi-) MLE of θ0 if X∗ could be observed without error. Assumption
3.3(ii) requires that the log density be Holder continuous under the metric
$\|\cdot\|_s$ over the sieve space. The following consistency lemma is a direct application of lemma A.1 of Newey and Powell (2003) or theorem 3.1 (or remark 3.1(4), remark 3.3) of Chen (2006); hence, we omit its proof.

Lemma 3.1. Let $\hat{\alpha}_n$ be the two-sample sieve MLE. Under assumptions 3.1—3.4, we have $\|\hat{\alpha}_n - \alpha_0\|_s = o_p(1)$.
3.1.2. Convergence rate under a weaker metric. Given Lemma 3.1, we can now restrict our attention to a shrinking $\|\cdot\|_s$-neighborhood around $\alpha_0$. Let $\mathcal{A}_{0s} \equiv \{\alpha\in\mathcal{A} : \|\alpha-\alpha_0\|_s = o(1),\ \|\alpha\|_s \leq c_0 < c\}$ and $\mathcal{A}_{0sn} \equiv \{\alpha\in\mathcal{A}_n : \|\alpha-\alpha_0\|_s = o(1),\ \|\alpha\|_s \leq c_0 < c\}$. Then, for the purpose of establishing a convergence rate under a pseudo metric that is weaker than $\|\cdot\|_s$, we can treat $\mathcal{A}_{0s}$ as the new parameter space and $\mathcal{A}_{0sn}$ as its sieve space, and assume that both $\mathcal{A}_{0s}$ and $\mathcal{A}_{0sn}$ are convex parameter spaces. For any $\alpha_1,\alpha_2\in\mathcal{A}_{0s}$, we consider a continuous path $\{\alpha(\tau) : \tau\in[0,1]\}$ in $\mathcal{A}_{0s}$ such that $\alpha(0) = \alpha_1$ and $\alpha(1) = \alpha_2$. For simplicity we assume that for any $\alpha, \alpha+v\in\mathcal{A}_{0s}$, $\{\alpha+\tau v : \tau\in[0,1]\}$ is a continuous path in $\mathcal{A}_{0s}$, and that $\ell(Z_t;\alpha+\tau v)$ is twice continuously differentiable at $\tau = 0$ for almost all $Z_t$ and any direction $v\in\mathcal{A}_{0s}$. We define the pathwise first derivative as

$$\frac{d\ell(Z_t;\alpha)}{d\alpha}[v] \equiv \frac{d\ell(Z_t;\alpha+\tau v)}{d\tau}\bigg|_{\tau=0} \quad \text{a.s. } Z_t,$$

and the pathwise second derivative as

$$\frac{d^2\ell(Z_t;\alpha)}{d\alpha\, d\alpha^T}[v,v] \equiv \frac{d^2\ell(Z_t;\alpha+\tau v)}{d\tau^2}\bigg|_{\tau=0} \quad \text{a.s. } Z_t.$$

Following Ai and Chen (2007), for any $\alpha_1,\alpha_2\in\mathcal{A}_{0s}$, we define a pseudo metric $\|\cdot\|_2$ as follows:

$$\|\alpha_1-\alpha_2\|_2 \equiv \sqrt{-E\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[\alpha_1-\alpha_2,\ \alpha_1-\alpha_2]\right)}.$$
We show that $\hat{\alpha}_n$ converges to $\alpha_0$ at a rate faster than $n^{-1/4}$ under the pseudo metric $\|\cdot\|_2$ and the following assumptions:

Assumption 3.5. (i) $\varsigma_j > \gamma_j$ for $j = 1, 1a, 2$; (ii) $k_n^{-\gamma} = o([n+n_a]^{-1/4})$ with $\gamma \equiv \min\{\gamma_1/2, \gamma_{1a}/2, \gamma_2\} > 1/2$.

Assumption 3.6. (i) $\mathcal{A}_{0s}$ is convex at $\alpha_0$ and $\theta_0 \in \mathrm{int}(\Theta)$; (ii) $\ell(Z_t;\alpha)$ is twice continuously pathwise differentiable with respect to $\alpha\in\mathcal{A}_{0s}$, and $\log g(y|x^*,w;\theta)$ is twice continuously differentiable at $\theta_0$.
Assumption 3.7. $\sup_{\tilde{\alpha}\in\mathcal{A}_{0s}}\sup_{\alpha\in\mathcal{A}_{0sn}}\left|\frac{d\ell(Z_t;\tilde{\alpha})}{d\alpha}\left[\frac{\alpha-\alpha_0}{\|\alpha-\alpha_0\|_s}\right]\right| \leq U(Z_t)$ for a random variable $U(Z_t)$ with $E\{[U(Z_t)]^2\} < \infty$.

Assumption 3.8. (i) $\sup_{v\in\mathcal{A}_{0s}:\|v\|_s=1} -E\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[v,v]\right) \leq C < \infty$; (ii) uniformly over $\tilde{\alpha}\in\mathcal{A}_{0s}$ and $\alpha\in\mathcal{A}_{0sn}$, we have

$$-E\left(\frac{d^2\ell(Z_t;\tilde{\alpha})}{d\alpha\, d\alpha^T}[\alpha-\alpha_0,\ \alpha-\alpha_0]\right) = \|\alpha-\alpha_0\|_2^2\times\{1+o(1)\}.$$
Assumption 3.5 guarantees that the sieve approximation error under the strong norm $\|\cdot\|_s$ goes to zero faster than $[n+n_a]^{-1/4}$. Assumption 3.6 makes sure that the twice pathwise derivatives are well defined with respect to $\alpha\in\mathcal{A}_{0s}$; hence, the pseudo metric $\|\alpha-\alpha_0\|_2$ is well defined on $\mathcal{A}_{0s}$. Assumption 3.7 imposes an envelope condition. Assumption 3.8(i) implies that $\|\alpha-\alpha_0\|_2 \leq \sqrt{C}\,\|\alpha-\alpha_0\|_s$ for all $\alpha\in\mathcal{A}_{0s}$. Assumption 3.8(ii) implies that there are positive finite constants $C_1$ and $C_2$ such that for all $\alpha\in\mathcal{A}_{0sn}$, $C_1\|\alpha-\alpha_0\|_2^2 \leq E[\ell(Z_t;\alpha_0) - \ell(Z_t;\alpha)] \leq C_2\|\alpha-\alpha_0\|_2^2$; that is, $\|\alpha-\alpha_0\|_2^2$ is equivalent to the Kullback-Leibler discrepancy on the local sieve space $\mathcal{A}_{0sn}$. The following convergence rate theorem is a direct application of theorem 3.2 of Shen and Wong (2004) or theorem 3.2 of Chen (2006) to the local parameter space $\mathcal{A}_{0s}$ and the local sieve space $\mathcal{A}_{0sn}$; hence, we omit its proof.
Theorem 3.1. Under assumptions 3.1—3.8, if $k_n = O\left([n+n_a]^{\frac{1}{2\gamma+1}}\right)$, then

$$\|\hat{\alpha}_n-\alpha_0\|_2 = O_P\left(\max\left\{k_n^{-\gamma},\ \sqrt{\frac{k_n}{n+n_a}}\right\}\right) = O_P\left([n+n_a]^{\frac{-\gamma}{2\gamma+1}}\right).$$
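As a quick arithmetic illustration of the theorem (our assumed values, not from the paper): with $\gamma = 1$ and $n+n_a = 2500$, the prescribed $k_n$ balances the bias and variance terms so that all three expressions share the order $[n+n_a]^{-1/3}$:

```python
import numpy as np

gamma, N = 1.0, 2500                # assumed smoothness and combined sample size
k_n = N ** (1 / (2 * gamma + 1))    # k_n = O(N^{1/(2 gamma + 1)}) ~ 13.6
bias = k_n ** (-gamma)              # sieve approximation error
sd = np.sqrt(k_n / N)               # standard-deviation term
print(k_n, bias, sd, N ** (-gamma / (2 * gamma + 1)))  # bias ~ sd ~ N^{-1/3} ~ 0.0737
```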
3.1.3. Asymptotic normality under possible misspecification. We can derive the asymptotic distribution of the sieve quasi MLE $\hat{\theta}_n$ regardless of whether the latent parametric model $g(y|x^*,w;\theta_0)$ is correctly specified or not. First, we define an inner product corresponding to the pseudo metric $\|\cdot\|_2$:

$$\langle v_1, v_2\rangle_2 \equiv -E\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[v_1, v_2]\right).$$
Let $\overline{\mathcal{V}}$ denote the closure of the linear span of $\mathcal{A}-\{\alpha_0\}$ under the metric $\|\cdot\|_2$. Then $\left(\overline{\mathcal{V}}, \|\cdot\|_2\right)$ is a Hilbert space and we can represent $\overline{\mathcal{V}} = \mathbb{R}^{d_\theta}\times\overline{\mathcal{U}}$ with $\overline{\mathcal{U}} \equiv \overline{\mathcal{F}_1\times\mathcal{F}_{1a}\times\mathcal{F}_2\times\mathcal{F}_2 - \{(f_{01},f_{01a},f_{02},f_{02a})\}}$. Let $h = (f_1,f_{1a},f_2,f_{2a})$ denote all the unknown densities. Then the pathwise first derivative can be written as

$$\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\alpha-\alpha_0] = \frac{d\ell(Z_t;\alpha_0)}{d\theta^T}(\theta-\theta_0) + \frac{d\ell(Z_t;\alpha_0)}{dh}[h-h_0] = \left(\frac{d\ell(Z_t;\alpha_0)}{d\theta^T} - \frac{d\ell(Z_t;\alpha_0)}{dh}[\mu]\right)(\theta-\theta_0),$$

with $h-h_0 \equiv -\mu\times(\theta-\theta_0)$, and in which

$$\frac{d\ell(Z_t;\alpha_0)}{dh}[h-h_0] = \frac{d\ell(Z_t;\theta_0,h_0(1-\tau)+\tau h)}{d\tau}\bigg|_{\tau=0} = \frac{d\ell(Z_t;\alpha_0)}{df_1}[f_1-f_{01}] + \frac{d\ell(Z_t;\alpha_0)}{df_{1a}}[f_{1a}-f_{01a}] + \frac{d\ell(Z_t;\alpha_0)}{df_2}[f_2-f_{02}] + \frac{d\ell(Z_t;\alpha_0)}{df_{2a}}[f_{2a}-f_{02a}].$$
Note that

$$E\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[\alpha-\alpha_0,\ \alpha-\alpha_0]\right) = (\theta-\theta_0)^T E\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\theta\, d\theta^T} - 2\frac{d^2\ell(Z_t;\alpha_0)}{d\theta\, dh^T}[\mu] + \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu,\mu]\right)(\theta-\theta_0),$$

with $h-h_0 \equiv -\mu\times(\theta-\theta_0)$, and in which

$$\frac{d^2\ell(Z_t;\alpha_0)}{d\theta\, dh^T}[h-h_0] = \frac{d\left(\partial\ell(Z_t;\theta_0,h_0(1-\tau)+\tau h)/\partial\theta\right)}{d\tau}\bigg|_{\tau=0},$$

$$\frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[h-h_0,\ h-h_0] = \frac{d^2\ell(Z_t;\theta_0,h_0(1-\tau)+\tau h)}{d\tau^2}\bigg|_{\tau=0}.$$

For each component $\theta_k$ (of $\theta$), $k = 1,\dots,d_\theta$, suppose there exists a $\mu_k^*\in\overline{\mathcal{U}}$ that solves:

$$\mu_k^* :\ \inf_{\mu_k\in\overline{\mathcal{U}}} E\left\{-\left(\frac{\partial^2\ell(Z_t;\alpha_0)}{\partial\theta_k\,\partial\theta_k} - 2\frac{d^2\ell(Z_t;\alpha_0)}{\partial\theta_k\, dh^T}[\mu_k] + \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu_k,\mu_k]\right)\right\}.$$
Denote $\mu^* = \left(\mu_1^*, \mu_2^*, \dots, \mu_{d_\theta}^*\right)$ with each $\mu_k^*\in\overline{\mathcal{U}}$, and

$$\frac{d\ell(Z_t;\alpha_0)}{dh}[\mu^*] = \left(\frac{d\ell(Z_t;\alpha_0)}{dh}\left[\mu_1^*\right], \dots, \frac{d\ell(Z_t;\alpha_0)}{dh}\left[\mu_{d_\theta}^*\right]\right),$$

$$\frac{d^2\ell(Z_t;\alpha_0)}{\partial\theta\, dh^T}[\mu^*] = \left(\frac{d^2\ell(Z_t;\alpha_0)}{\partial\theta\, dh}[\mu_1^*], \dots, \frac{d^2\ell(Z_t;\alpha_0)}{\partial\theta\, dh}[\mu_{d_\theta}^*]\right),$$

$$\frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu^*,\mu^*] = \begin{pmatrix} \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu_1^*,\mu_1^*] & \cdots & \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu_1^*,\mu_{d_\theta}^*] \\ \cdots & \cdots & \cdots \\ \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu_{d_\theta}^*,\mu_1^*] & \cdots & \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu_{d_\theta}^*,\mu_{d_\theta}^*] \end{pmatrix}.$$

Also denote

$$V_* \equiv -E\left(\frac{\partial^2\ell(Z_t;\alpha_0)}{\partial\theta\,\partial\theta^T} - 2\frac{d^2\ell(Z_t;\alpha_0)}{\partial\theta\, dh^T}[\mu^*] + \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu^*,\mu^*]\right).$$
Now we consider a linear functional of $\alpha$, which is $\lambda^T\theta$ for any $\lambda\in\mathbb{R}^{d_\theta}$ with $\lambda\neq 0$. Since

$$\sup_{\alpha-\alpha_0\neq 0}\frac{|\lambda^T(\theta-\theta_0)|^2}{\|\alpha-\alpha_0\|_2^2} = \sup_{\theta\neq\theta_0,\,\mu\neq 0}\frac{(\theta-\theta_0)^T\lambda\lambda^T(\theta-\theta_0)}{(\theta-\theta_0)^T E\left\{-\left(\frac{d^2\ell(Z_t;\alpha_0)}{d\theta\, d\theta^T} - 2\frac{d^2\ell(Z_t;\alpha_0)}{d\theta\, dh^T}[\mu] + \frac{d^2\ell(Z_t;\alpha_0)}{dh\, dh^T}[\mu,\mu]\right)\right\}(\theta-\theta_0)} = \lambda^T(V_*)^{-1}\lambda,$$

the functional $\lambda^T(\theta-\theta_0)$ is bounded if and only if the matrix $V_*$ is nonsingular.
Suppose that $V_*$ is nonsingular. For any fixed $\lambda\neq 0$, denote $\upsilon^* \equiv (v_\theta^*, v_h^*)$ with $v_\theta^* \equiv (V_*)^{-1}\lambda$ and $v_h^* \equiv -\mu^*\times v_\theta^*$. Then the Riesz representation theorem implies: $\lambda^T(\theta-\theta_0) = \langle\upsilon^*, \alpha-\alpha_0\rangle_2$ for all $\alpha\in\mathcal{A}$. In the Appendix, we show that

$$\lambda^T\left(\hat{\theta}_n-\theta_0\right) = \langle\upsilon^*, \hat{\alpha}_n-\alpha_0\rangle_2 = \frac{1}{n+n_a}\sum_{t=1}^{n+n_a}\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\upsilon^*] + o_p\left(\frac{1}{\sqrt{n+n_a}}\right).$$
Denote $\mathcal{N}_0 = \{\alpha\in\mathcal{A}_{0s} : \|\alpha-\alpha_0\|_2 = o([n+n_a]^{-1/4})\}$ and $\mathcal{N}_{0n} = \{\alpha\in\mathcal{A}_{0sn} : \|\alpha-\alpha_0\|_2 = o([n+n_a]^{-1/4})\}$. We impose the following additional conditions for asymptotic normality of the sieve quasi MLE $\hat{\theta}_n$:

Assumption 3.9. $\mu^*$ exists (i.e., $\mu_k^*\in\overline{\mathcal{U}}$ for $k = 1,\dots,d_\theta$), and $V_*$ is positive-definite.

Assumption 3.10. There is a $\upsilon_n^*\in\mathcal{A}_n-\{\alpha_0\}$ such that $\|\upsilon_n^*-\upsilon^*\|_2 = o(1)$ and $\|\upsilon_n^*-\upsilon^*\|_2\times\|\hat{\alpha}_n-\alpha_0\|_2 = o_P\left(\frac{1}{\sqrt{n+n_a}}\right)$.

Assumption 3.11. There is a random variable $U(Z_t)$ with $E\{[U(Z_t)]^2\} < \infty$ and a non-negative measurable function $\eta$ with $\lim_{\delta\to 0}\eta(\delta) = 0$, such that, for all $\alpha\in\mathcal{N}_{0n}$,

$$\sup_{\overline{\alpha}\in\mathcal{N}_0}\left|\frac{d^2\ell(Z_t;\overline{\alpha})}{d\alpha\, d\alpha^T}[\alpha-\alpha_0,\ \upsilon_n^*]\right| \leq U(Z_t)\times\eta(\|\alpha-\alpha_0\|_s).$$

Assumption 3.12. Uniformly over $\overline{\alpha}\in\mathcal{N}_0$ and $\alpha\in\mathcal{N}_{0n}$,

$$E\left(\frac{d^2\ell(Z_t;\overline{\alpha})}{d\alpha\, d\alpha^T}[\alpha-\alpha_0,\ \upsilon_n^*] - \frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[\alpha-\alpha_0,\ \upsilon_n^*]\right) = o\left(\frac{1}{\sqrt{n+n_a}}\right).$$

Assumption 3.13. $E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\upsilon_n^*-\upsilon^*]\right)^2\right\}$ goes to zero as $\|\upsilon_n^*-\upsilon^*\|_2$ goes to zero.

Assumption 3.9 is critical for obtaining the $\sqrt{n}$ convergence of the sieve quasi MLE $\hat{\theta}_n$ to $\theta_0$ and its asymptotic normality. We notice that it is possible that $\theta_0$ is uniquely identified but assumption 3.9 is not satisfied. If this happens, $\theta_0$ can still be consistently estimated, but the best achievable convergence rate is slower than the $\sqrt{n}$-rate. Assumption 3.10 implies that the asymptotic bias of the Riesz representer is negligible. Assumptions 3.11 and 3.12 control the remainder term. Assumption 3.13 is automatically satisfied when the latent parametric model is correctly specified, since $E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\upsilon_n^*-\upsilon^*]\right)^2\right\} = \|\upsilon_n^*-\upsilon^*\|_2^2$ under correct specification. Denote

$$S_{\theta_0} \equiv \frac{d\ell(Z_t;\alpha_0)}{d\theta^T} - \frac{d\ell(Z_t;\alpha_0)}{dh}[\mu^*] \quad \text{and} \quad I_* \equiv E\left[S_{\theta_0}^T S_{\theta_0}\right].$$
The following asymptotic normality result applies to possibly misspecified models.

Theorem 3.2. Under assumptions 3.1—3.13, we have $\sqrt{n+n_a}\left(\hat{\theta}_n-\theta_0\right)\xrightarrow{d} N\left(0,\ V_*^{-1} I_* V_*^{-1}\right)$.
3.1.4. Semiparametric efficiency under correct specification. In this subsection we assume that $g(y|x^*,w;\theta_0)$ correctly specifies the true unknown conditional density $f_{Y|X^*,W}(y|x^*,w)$. We can then establish the semiparametric efficiency of the two-sample sieve MLE $\hat{\theta}_n$ for the parameter of interest $\theta_0$. First we recall the Fisher metric $\|\cdot\|$ on $\mathcal{A}$: for any $\alpha_1,\alpha_2\in\mathcal{A}$,

$$\|\alpha_1-\alpha_2\|^2 \equiv E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\alpha_1-\alpha_2]\right)^2\right\},$$

and the Fisher norm-induced inner product:

$$\langle v_1,v_2\rangle \equiv E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v_1]\right)\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v_2]\right)\right\}.$$
Under correct specification, $g(y|x^*,w;\theta_0) = f_{Y|X^*,W}(y|x^*,w)$, it can be shown that $\|v\| = \|v\|_2$ and $\langle v_1,v_2\rangle = \langle v_1,v_2\rangle_2$. Thus, the space $\overline{\mathcal{V}}$ is also the closure of the linear span of $\mathcal{A}-\{\alpha_0\}$ under the Fisher metric $\|\cdot\|$. For each parametric component $\theta_k$ of $\theta$, $k = 1,2,\dots,d_\theta$, an alternative way to obtain $\mu^* = \left(\mu_1^*,\mu_2^*,\dots,\mu_{d_\theta}^*\right)$ is to compute $\mu_k^* \equiv \left(\mu_1^{*k},\mu_{1a}^{*k},\mu_2^{*k},\mu_{2a}^{*k}\right)^T\in\overline{\mathcal{U}}$ as the solution to

$$\inf_{\mu_k\in\overline{\mathcal{U}}} E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\theta_k} - \frac{d\ell(Z_t;\alpha_0)}{dh}\left[\mu_k\right]\right)^2\right\} = \inf_{(\mu_1,\mu_{1a},\mu_2,\mu_{2a})^T\in\overline{\mathcal{U}}} E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\theta_k} - \frac{d\ell(Z_t;\alpha_0)}{df_1}[\mu_1] - \frac{d\ell(Z_t;\alpha_0)}{df_{1a}}[\mu_{1a}] - \frac{d\ell(Z_t;\alpha_0)}{df_2}[\mu_2] - \frac{d\ell(Z_t;\alpha_0)}{df_{2a}}[\mu_{2a}]\right)^2\right\}.$$
Then

$$S_{\theta_0} \equiv \frac{d\ell(Z_t;\alpha_0)}{d\theta^T} - \frac{d\ell(Z_t;\alpha_0)}{dh}[\mu^*]$$

becomes the semiparametric efficient score for $\theta_0$, and under correct specification, $I_* \equiv E\left[S_{\theta_0}^T S_{\theta_0}\right] = V_*$, which is the semiparametric information bound for $\theta_0$.
Given the expression of the density function, the pathwise first derivative at $\alpha_0$ can be written as

$$\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\alpha-\alpha_0] = S_t\,\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{d\alpha}[\alpha-\alpha_0] + (1-S_t)\,\frac{d\ell_a(Z_t;f_{01a},f_{02a})}{d\alpha}[\alpha-\alpha_0].$$

See the Appendix for the expressions of $\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{d\alpha}[\alpha-\alpha_0]$ and $\frac{d\ell_a(Z_t;f_{01a},f_{02a})}{d\alpha}[\alpha-\alpha_0]$. Thus

$$I_* \equiv E\left[S_{\theta_0}^T S_{\theta_0}\right] = p\, I_p^* + (1-p)\, I_a^*,$$

with

$$I_p^* = E\left[\left(\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{d\theta^T} - \sum_{j=1}^{2}\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{df_j}\left[\mu_j^*\right]\right)^T\left(\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{d\theta^T} - \sum_{j=1}^{2}\frac{d\ell_p(Z_t;\theta_0,f_{01},f_{02})}{df_j}\left[\mu_j^*\right]\right)\right],$$

$$I_a^* = E\left[\left(\sum_{j=1}^{2}\frac{d\ell_a(Z_t;f_{01a},f_{02a})}{df_{ja}}\left[\mu_{ja}^*\right]\right)^T\left(\sum_{j=1}^{2}\frac{d\ell_a(Z_t;f_{01a},f_{02a})}{df_{ja}}\left[\mu_{ja}^*\right]\right)\right].$$
Therefore, the influence function representation of our two-sample sieve MLE is:

$$\lambda^T\left(\hat{\theta}_n-\theta_0\right) = \frac{1}{n+n_a}\left\{\sum_{i=1}^{n}\frac{d\ell_p(Z_{pi};\theta_0,f_{01},f_{02})}{d\alpha}[\upsilon^*] + \sum_{j=1}^{n_a}\frac{d\ell_a(Z_{aj};f_{01a},f_{02a})}{d\alpha}[\upsilon^*]\right\} + o_p\left(\frac{1}{\sqrt{n+n_a}}\right),$$

and the asymptotic distribution of $\sqrt{n+n_a}\left(\hat{\theta}_n-\theta_0\right)$ is $N\left(0,\ I_*^{-1}\right)$. Combining our theorem 3.2 and theorem 4 of Shen (1997), we immediately obtain

Theorem 3.3. Suppose that $g(y|x^*,w;\theta_0) = f_{Y|X^*,W}(y|x^*,w)$ for almost all $y, x^*, w$, that $I_*$ is positive definite, and that assumptions 3.1—3.12 hold. Then the two-sample sieve MLE $\hat{\theta}_n$ is semiparametrically efficient, and

$$\sqrt{n}\left(\hat{\theta}_n-\theta_0\right)\xrightarrow{d} N\left(0,\ \left[I_p^* + \tfrac{1-p}{p}\, I_a^*\right]^{-1}\right) = N\left(0,\ p\, I_*^{-1}\right).$$
Following Ai and Chen (2003), the asymptotic efficient variance, $I_*^{-1}$, of the sieve MLE $\hat{\theta}_n$ (under correct specification) can be consistently estimated by $\hat{I}_*^{-1}$, with

$$\hat{I}_* = \frac{1}{n+n_a}\sum_{t=1}^{n+n_a}\left(\frac{d\ell(Z_t;\hat{\alpha})}{d\theta^T} - \frac{d\ell(Z_t;\hat{\alpha})}{dh}[\hat{\mu}^*]\right)^T\left(\frac{d\ell(Z_t;\hat{\alpha})}{d\theta^T} - \frac{d\ell(Z_t;\hat{\alpha})}{dh}[\hat{\mu}^*]\right),$$

in which $\hat{\mu}^* = \left(\hat{\mu}_1^*,\hat{\mu}_2^*,\dots,\hat{\mu}_{d_\theta}^*\right)$ and $\hat{\mu}_k^* \equiv \left(\hat{\mu}_1^{*k},\hat{\mu}_{1a}^{*k},\hat{\mu}_2^{*k},\hat{\mu}_{2a}^{*k}\right)^T$ solves the following sieve minimization problem: for $k = 1,2,\dots,d_\theta$,

$$\min_{\mu^k\in\mathcal{F}^n}\sum_{t=1}^{n+n_a}\left(\frac{d\ell(Z_t;\hat{\alpha})}{d\theta_k} - \frac{d\ell(Z_t;\hat{\alpha})}{df_1}\left[\mu_1^k\right] - \frac{d\ell(Z_t;\hat{\alpha})}{df_{1a}}\left[\mu_{1a}^k\right] - \frac{d\ell(Z_t;\hat{\alpha})}{df_2}\left[\mu_2^k\right] - \frac{d\ell(Z_t;\hat{\alpha})}{df_{2a}}\left[\mu_{2a}^k\right]\right)^2,$$

in which $\mathcal{F}^n \equiv \mathcal{F}_1^n\times\mathcal{F}_{1a}^n\times\mathcal{F}_2^n\times\mathcal{F}_2^n$. Denote

$$\frac{d\ell(Z_t;\hat{\alpha})}{dh}\left[\hat{\mu}_k^*\right] \equiv \frac{d\ell(Z_t;\hat{\alpha})}{df_1}\left[\hat{\mu}_1^{*k}\right] + \frac{d\ell(Z_t;\hat{\alpha})}{df_{1a}}\left[\hat{\mu}_{1a}^{*k}\right] + \frac{d\ell(Z_t;\hat{\alpha})}{df_2}\left[\hat{\mu}_2^{*k}\right] + \frac{d\ell(Z_t;\hat{\alpha})}{df_{2a}}\left[\hat{\mu}_{2a}^{*k}\right],$$

and

$$\frac{d\ell(Z_t;\hat{\alpha})}{dh}\left[\hat{\mu}^*\right] = \left(\frac{d\ell(Z_t;\hat{\alpha})}{dh}\left[\hat{\mu}_1^*\right], \dots, \frac{d\ell(Z_t;\hat{\alpha})}{dh}\left[\hat{\mu}_{d_\theta}^*\right]\right).$$

3.2. Sieve likelihood ratio model selection test. In many empirical appli-
cations, researchers often estimate different parametrically specified structural models in order to select the one that fits the data "best". We shall consider two non-nested, possibly misspecified, parametric latent structural models: $\{g_1(y|x^*,w;\theta_1) : \theta_1\in\Theta_1\}$ and $\{g_2(y|x^*,w;\theta_2) : \theta_2\in\Theta_2\}$. If $X^*$
were observed without error in the primary sample, researchers could ap-
ply Vuong’s (1989) likelihood ratio test to select a “best” parametric model
that is closest to the true underlying conditional density fY |X∗,W (y|x∗, w) according to the KLIC. In this subsection, we shall extend Vuong's result to
the case in which X∗ is not observed in either sample.
Consider two parametric families of models $\{g_j(y|x^*,w;\theta_j) : \theta_j\in\Theta_j\}$, $\Theta_j$ a compact subset of $\mathbb{R}^{d_{\theta_j}}$, $j = 1, 2$, for the latent true conditional density $f_{Y|X^*,W}$. Define

$$\theta_{0j} \equiv \arg\max_{\theta_j\in\Theta_j}\int[\log g_j(y|x^*,w;\theta_j)]\, f_{Y|X^*,W}(y|x^*,w)\, dy.$$
According to Vuong (1989), the two models are nested if $g_1(y|x^*,w;\theta_{01}) = g_2(y|x^*,w;\theta_{02})$ for almost all $y\in\mathcal{Y}$, $x^*\in\mathcal{X}^*$, $w\in\mathcal{W}$; the two models are non-nested if $g_1(Y|X^*,W;\theta_{01}) \neq g_2(Y|X^*,W;\theta_{02})$ with positive probability.
For $j = 1, 2$, denote $\alpha_{0j} = (\theta_{0j}^T, f_{01}, f_{01a}, f_{02}, f_{02a})^T\in\mathcal{A}_j$ with $\mathcal{A}_j = \Theta_j\times\mathcal{F}_1\times\mathcal{F}_{1a}\times\mathcal{F}_2\times\mathcal{F}_2$, and let $\ell_j(Z_t;\alpha_{0j})$ denote the log-likelihood according to model $j$ evaluated at data $Z_t$. Following Vuong (1989), we select model 1 if $H_0$ holds, in which

$$H_0 : E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\} \leq 0,$$

and we select model 2 if $H_1$ holds, in which

$$H_1 : E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\} > 0.$$
For $j = 1, 2$, denote $\mathcal{A}_{j,n} = \Theta_j\times\mathcal{F}_1^n\times\mathcal{F}_{1a}^n\times\mathcal{F}_2^n\times\mathcal{F}_2^n$ and define the sieve quasi MLE for $\alpha_{0j}\in\mathcal{A}_j$ as

$$\hat{\alpha}_j = \arg\max_{\alpha_j\in\mathcal{A}_{j,n}}\sum_{t=1}^{n+n_a}\ell_j(Z_t;\alpha_j) = \arg\max_{\alpha_j\in\mathcal{A}_{j,n}}\left[\sum_{t=1}^{n}\ell_{j,p}(Z_{pt};\theta_j,f_1,f_2) + \sum_{t=1}^{n_a}\ell_{j,a}(Z_{at};f_{1a},f_{2a})\right].$$
In the following, we denote $\sigma^2 \equiv \mathrm{Var}\left(\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\right)$ and

$$\hat{\sigma}^2 = \frac{1}{n+n_a}\sum_{t=1}^{n+n_a}\left[\{\ell_2(Z_t;\hat{\alpha}_2) - \ell_1(Z_t;\hat{\alpha}_1)\} - \frac{1}{n+n_a}\sum_{s=1}^{n+n_a}\{\ell_2(Z_s;\hat{\alpha}_2) - \ell_1(Z_s;\hat{\alpha}_1)\}\right]^2.$$
Theorem 3.4. Suppose both models 1 and 2 satisfy assumptions 3.1—3.8,
and $\sigma^2 < \infty$. Then

$$\frac{1}{\sqrt{n+n_a}}\sum_{t=1}^{n+n_a}\left(\{\ell_2(Z_t;\hat{\alpha}_2) - \ell_1(Z_t;\hat{\alpha}_1)\} - E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\}\right)$$
$$= \frac{1}{\sqrt{n+n_a}}\sum_{t=1}^{n+n_a}\left(\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\} - E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\}\right) + o_P(1) \xrightarrow{d} N\left(0,\ \sigma^2\right).$$

Suppose models 1 and 2 are non-nested; then

$$\frac{1}{\hat{\sigma}\sqrt{n+n_a}}\sum_{t=1}^{n+n_a}\left(\{\ell_2(Z_t;\hat{\alpha}_2) - \ell_1(Z_t;\hat{\alpha}_1)\} - E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\}\right) \xrightarrow{d} N(0,1).$$

Thus, under the least favorable null hypothesis of $E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\} = 0$, we have $\frac{1}{\hat{\sigma}\sqrt{n+n_a}}\sum_{t=1}^{n+n_a}\{\ell_2(Z_t;\hat{\alpha}_2) - \ell_1(Z_t;\hat{\alpha}_1)\} \xrightarrow{d} N(0,1)$, which can be used to provide a sieve likelihood ratio model selection test of $H_0$ against $H_1$.
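Given the pointwise log-likelihood contributions of the two fitted models, the test statistic is one line. A minimal sketch (assuming arrays `ell1`, `ell2` holding the fitted values $\ell_j(Z_t;\hat{\alpha}_j)$ have already been computed) is:

```python
import numpy as np

def sieve_lr_stat(ell1, ell2):
    # 1/(sigma_hat * sqrt(n + n_a)) * sum_t {ell_2(Z_t) - ell_1(Z_t)};
    # approximately N(0,1) under the least favorable null E{ell_2 - ell_1} = 0.
    d = np.asarray(ell2) - np.asarray(ell1)
    return d.sum() / (d.std() * np.sqrt(d.size))  # d.std() is sigma_hat (ddof = 0)

# sanity check on synthetic inputs: the statistic is roughly standard normal
rng = np.random.default_rng(0)
print(sieve_lr_stat(rng.normal(size=2500), rng.normal(size=2500)))
```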
4. Simulation. In this section we present a simulation study to illus-
trate the finite sample performance of the two-sample sieve MLE. The true
latent probability density model fY |X∗,W is:
$$f_{Y|X^*,W}(y|x^*,w;\theta) = \phi\left(y - m(x^*,w;\theta)\right),$$

where $\phi(\cdot)$ is the pdf of the standard normal distribution and

$$m(X^*,W;\theta) = \beta_1 X^* + \beta_2 X^* W + \beta_3\left(X^{*2}W + X^*W^2\right)/2,$$
in which θ = (β1, β2, β3)T is unknown and W ∈ {−1, 0, 1}. We have two
independent random samples: {Xi,Wi, Yi}ni=1 and {Xaj ,Waj , Yaj}naj=1, withn = 1500 and na = 1000. In the primary sample, we let θ0 = (1, 1, 1)T ,
X∗|W ∼ N(0, 1), and Pr(W = 1) = Pr(W = 0) = 1/3. The mismeasured
value X equals
X = 0.1X∗ + e−0.1X∗ε with ε ∼ N(0, 0.36).
In the auxiliary sample we generate $W_a$ in the same way that we generate $W$ in the primary sample. We set the unknown true conditional density $f_{X^*_a|W_a}$ as follows:
\[
f_{X^*_a|W_a}(x^*_a|w_a) = \begin{cases} \psi(x^*_a) & \text{for } w_a = -1, \\ 0.25\,\psi(0.25\,x^*_a) & \text{for } w_a = 0, \\ \psi(x^*_a - 0.5) & \text{for } w_a = 1. \end{cases}
\]
The mismeasured value $X_a$ equals
\[
X_a = X^*_a + 0.5\, e^{-X^*_a}\nu \quad \text{with } \nu \sim N(0,1),
\]
which implies that $x^*_a$ is the mode of the conditional density $f_{X_a|X^*_a}(\cdot|x^*_a)$.
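The two samples in this design can be generated in a few lines. A sketch, taking $\psi$ to be the standard normal pdf purely for illustration (an assumption on our part; under it the $w_a = 0$ branch $0.25\,\psi(0.25\,x^*_a)$ corresponds to an $N(0,16)$ draw):

```python
import numpy as np

rng = np.random.default_rng(42)
beta = np.array([1.0, 1.0, 1.0])                 # theta_0 = (1, 1, 1)^T

def m(x_star, w, b):
    # m(X*, W; theta) = b1 X* + b2 X* W + b3 (X*^2 W + X* W^2) / 2
    return b[0]*x_star + b[1]*x_star*w + b[2]*(x_star**2*w + x_star*w**2)/2

# Primary sample: W uniform on {-1, 0, 1}, X*|W ~ N(0, 1).
n = 1500
w = rng.choice([-1, 0, 1], size=n)
x_star = rng.normal(size=n)
y = m(x_star, w, beta) + rng.normal(size=n)      # f_{Y|X*,W} = phi(y - m)
x = 0.1*x_star + np.exp(-0.1*x_star)*rng.normal(scale=0.6, size=n)  # var 0.36

# Auxiliary sample: W_a drawn as W; X*_a|W_a as in the display above,
# with psi read as the standard normal pdf (an assumption).
na = 1000
wa = rng.choice([-1, 0, 1], size=na)
xa_star = np.where(wa == -1, rng.normal(size=na),
           np.where(wa == 0, rng.normal(scale=4.0, size=na),
                             rng.normal(loc=0.5, size=na)))
ya = m(xa_star, wa, beta) + rng.normal(size=na)
xa = xa_star + 0.5*np.exp(-xa_star)*rng.normal(size=na)
```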
We use the simple sieve expression $p^{k_{1,n}}_1(x_1,x_2)^T\beta_1 = \sum_{j=0}^{J_n}\sum_{k=0}^{K_n} \gamma_{jk}\, p_j(x_1 - x_2)\, q_k(x_2)$ to approximate the conditional densities $f_{X|X^*}(x_1|x_2)$ and $f_{X_a|X^*_a}(x_1|x_2)$, with $k_{1,n} = (J_n+1)(K_n+1)$; and $p^{k_{2,n}}_2(x^*)^T\beta_2(w) = \sum_{k=1}^{k_{2,n}} \gamma_k(w)\, q_k(x^*)$ to approximate the conditional densities $f_{X^*|W}(\cdot|w)$ and $f_{X^*_a|W_a}(\cdot|w)$ for $w = -1, 0, 1$. The bases $\{p_j(\cdot)\}$ and $\{q_k(\cdot)\}$ are Hermite polynomial bases.
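A sketch of how such a tensor-product Hermite sieve can be assembled (the Gaussian damping factor that keeps the basis functions integrable, and all names, are our own illustrative choices; the normalization that turns $p^{k_{1,n}}_1(x_1,x_2)^T\beta_1$ into a proper conditional density is omitted):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def herm_basis(u, deg):
    """Probabilists' Hermite polynomials He_0..He_deg at u, damped by a
    Gaussian factor so each basis function is integrable (our choice)."""
    cols = [hermeval(u, [0.0]*j + [1.0]) for j in range(deg + 1)]
    return np.column_stack(cols) * np.exp(-u**2/4)[:, None]

def sieve_design(x1, x2, Jn=5, Kn=3):
    """Tensor-product terms p_j(x1 - x2) q_k(x2) used to approximate
    f_{X|X*}(x1|x2); returns a (len(x1), (Jn+1)*(Kn+1)) design matrix."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    P = herm_basis(x1 - x2, Jn)                   # p_j(x1 - x2)
    Q = herm_basis(x2, Kn)                        # q_k(x2)
    return (P[:, :, None] * Q[:, None, :]).reshape(len(x1), -1)

X = sieve_design(np.linspace(-2, 2, 7), np.zeros(7))
print(X.shape)                                    # (7, 24) = (7, (5+1)*(3+1))
```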
estimator is the standard probit MLE using the primary sample {Xi,Wi, Yi}ni=1alone as if it were accurate; this estimator is inconsistent and its bias should
dominate the squared root of mean square error (root MSE). The second
estimator is the standard probit MLE using accurate data {Yi,X∗i ,Wi}ni=1.
This estimator is consistent and most efficient; however, we call it “infeasi-
ble MLE” since X∗i is not observed in practice. The third estimator is the
two-sample sieve MLE developed in this paper. In the last column, we also
report the square root of the sum of the three mean square errors of β1, β2,
and β3. The simulation repetition times is 400. The simulation results show
that the 2-sample sieve MLE has a much smaller bias (and a slightly bigger
standard error) than the estimator ignoring measurement error. Moreover,
the 2-sample sieve MLE has a smaller total root MSE than the inconsistent
estimator. In summary, our 2-sample sieve MLE performs well in this Monte
Carlo simulation.
Table 1
Simulation results (n = 1500, n_a = 1000, reps = 400)

true value of β:            β1 = 1     β2 = 1     β3 = 1     total
ignoring meas. error:
  mean estimate             0.1753     0.3075     0.5953
  standard error            0.08422    0.1227     0.1879
  root MSE                  0.8290     0.7033     0.4461     1.175
infeasible MLE:
  mean estimate             0.9998     1.001      1.000
  standard error            0.02792    0.03382    0.03549
  root MSE                  0.02792    0.03382    0.03549    0.0564
2-sample sieve MLE:
  mean estimate             1.024      1.038      0.9866
  standard error            0.08670    0.1229     0.2290
  root MSE                  0.08999    0.1286     0.2293     0.2779

Note: $J_n = 5$, $K_n = 3$ in $\hat f_{X|X^*}$ and $\hat f_{X_a|X^*_a}$; $k_{2,n} = 4$ for $\hat f_{X^*|W}$ and $\hat f_{X^*_a|W_a}$. The total column reports the square root of the sum of the three mean squared errors.
5. Conclusion. This paper considers nonparametric identification and
semiparametric estimation of a general nonlinear model using two random
samples. Both samples consist of a dependent variable, some error-free co-
variates and an error-ridden covariate, in which the measurement error has
unknown distribution and could be arbitrarily correlated with the latent
true values. We provide reasonable conditions so that the latent nonlinear
model is nonparametrically identified using the two samples. The advantage
of our identification strategy is that, in addition to allowing for nonclassical
measurement errors in both samples, neither sample is required to contain
an accurate measurement of the latent true covariate, and only one mea-
surement of the error-ridden covariate is assumed in each sample. Moreover,
our identification result does not require that the primary sample contain
an IV excluded from the nonlinear model of interest, nor does it require that
the two samples be independent.
Since the latent nonlinear model is nonparametrically identified without requiring the two samples to be independent, we could estimate the latent nonlinear model nonparametrically using two potentially correlated samples, provided
that we impose some structure on the correlation of the two samples. In
particular, the panel data structure in Horowitz and Markatou (1996) could
be borrowed to model two correlated samples. We shall investigate this in
future research.
6. Appendix: Mathematical Proofs.

Proof: (Theorem 2.1) Under assumption 2.1, the probability density of the observed vectors equals
\[
f_{X,W,Y}(x,w,y) = \int_{\mathcal{X}^*} f_{X|X^*}(x|x^*)\, f_{X^*,W,Y}(x^*,w,y)\, dx^* \quad \text{for all } x, w, y. \tag{A.1}
\]
For each value $w_j$ of $W$, assumption 2.1 implies that
\[
f_{X,Y|W}(x,y|w_j) = \int f_{X|X^*}(x|x^*)\, f_{Y|X^*,W}(y|x^*,w_j)\, f_{X^*|W_j}(x^*)\, dx^* \tag{A.2}
\]
in the primary sample. Similarly, assumptions 2.2 and 2.3 imply that
\[
f_{X_a,Y_a|W_a}(x,y|w_j) = \int f_{X_a|X^*_a}(x|x^*)\, f_{Y|X^*,W}(y|x^*,w_j)\, f_{X^*_a|W_j}(x^*)\, dx^* \tag{A.3}
\]
in the auxiliary sample.
By equation (A.2) and the definition of the operators, we have, for any function $h$,
\begin{align*}
\left(L_{X,Y|W_j} h\right)(x) &= \int f_{X,Y|W}(x,u|w_j)\, h(u)\, du \\
&= \int \left( \int f_{X|X^*}(x|x^*)\, f_{Y|X^*,W}(u|x^*,w_j)\, f_{X^*|W_j}(x^*)\, dx^* \right) h(u)\, du \\
&= \int f_{X|X^*}(x|x^*)\, f_{X^*|W_j}(x^*) \left( \int f_{Y|X^*,W}(u|x^*,w_j)\, h(u)\, du \right) dx^* \\
&= \int f_{X|X^*}(x|x^*)\, f_{X^*|W_j}(x^*) \left(L_{Y|X^*,W_j} h\right)(x^*)\, dx^* \\
&= \int f_{X|X^*}(x|x^*) \left(L_{X^*|W_j} L_{Y|X^*,W_j} h\right)(x^*)\, dx^* \\
&= \left(L_{X|X^*} L_{X^*|W_j} L_{Y|X^*,W_j} h\right)(x).
\end{align*}
This means we have the operator equivalence
\[
L_{X,Y|W_j} = L_{X|X^*} L_{X^*|W_j} L_{Y|X^*,W_j} \tag{A.4}
\]
in the primary sample. Similarly, equation (A.3) and the definition of the operators imply
\[
L_{X_a,Y_a|W_j} = L_{X_a|X^*_a} L_{X^*_a|W_j} L_{Y|X^*,W_j} \tag{A.5}
\]
in the auxiliary sample. While the left-hand sides of equations (A.4) and (A.5) are observed, the right-hand sides contain unknown operators corresponding to the error distributions ($L_{X|X^*}$ and $L_{X_a|X^*_a}$), the marginal distributions of the latent true values ($L_{X^*|W_j}$ and $L_{X^*_a|W_j}$), and the conditional distribution of the dependent variable ($L_{Y|X^*,W_j}$).
Assumptions 2.4 and 2.5 imply that all the operators involved in equations (A.4) and (A.5) are invertible. Under assumptions 2.4 and 2.5, for any given $W_j$ we can eliminate $L_{Y|X^*,W_j}$ in equations (A.4) and (A.5) to obtain
\[
L_{X_a,Y_a|W_j} L^{-1}_{X,Y|W_j} = L_{X_a|X^*_a} L_{X^*_a|W_j} L^{-1}_{X^*|W_j} L^{-1}_{X|X^*}. \tag{A.6}
\]
This equation holds for all $W_i$ and $W_j$. We may then eliminate $L_{X|X^*}$ to have
\[
L^{ij}_{X_a,X_a} \equiv \left(L_{X_a,Y_a|W_j} L^{-1}_{X,Y|W_j}\right) \left(L_{X_a,Y_a|W_i} L^{-1}_{X,Y|W_i}\right)^{-1} = L_{X_a|X^*_a} \left(L_{X^*_a|W_j} L^{-1}_{X^*|W_j} L_{X^*|W_i} L^{-1}_{X^*_a|W_i}\right) L^{-1}_{X_a|X^*_a} \equiv L_{X_a|X^*_a} L^{ij}_{X^*_a} L^{-1}_{X_a|X^*_a}. \tag{A.7}
\]
The operator $L^{ij}_{X_a,X_a}$ on the left-hand side is observed for all $i$ and $j$. An important observation is that the operator $L^{ij}_{X^*_a} \equiv \left(L_{X^*_a|W_j} L^{-1}_{X^*|W_j} L_{X^*|W_i} L^{-1}_{X^*_a|W_i}\right)$ is a diagonal operator defined as
\[
\left(L^{ij}_{X^*_a} h\right)(x^*) \equiv k^{ij}_{X^*_a}(x^*)\, h(x^*) \tag{A.8}
\]
with
\[
k^{ij}_{X^*_a}(x^*) \equiv \frac{f_{X^*_a|W_j}(x^*)\, f_{X^*|W_i}(x^*)}{f_{X^*|W_j}(x^*)\, f_{X^*_a|W_i}(x^*)}.
\]
Equation (A.7) implies a diagonalization of the observed operator $L^{ij}_{X_a,X_a}$: an eigenvalue of $L^{ij}_{X_a,X_a}$ equals $k^{ij}_{X^*_a}(x^*)$ for a value of $x^*$, and the corresponding eigenfunction is $f_{X_a|X^*_a}(\cdot|x^*)$.
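The mechanics of this diagonalization are easy to see in a discretized toy version: on a finite grid, $L_{X_a|X^*_a}$ becomes a matrix $A$ whose columns are discretized conditional densities, and (A.7) says the observable matrix equals $A\,\mathrm{diag}(k)\,A^{-1}$, so an ordinary eigendecomposition recovers the columns of $A$ up to scale and order; the sum-to-one normalization and an ordering rule (the roles played below by the density normalization and assumption 2.8) remove the remaining ambiguities. A sketch with made-up matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
G = 5                                     # grid points for x* (and x_a)

# Toy discretized f_{Xa|X*a}: each column is a conditional density on the
# grid, so columns sum to one (the normalization used in the proof).
A = rng.random((G, G)) + 0.1
A /= A.sum(axis=0, keepdims=True)

# Toy eigenvalue function k^{ij}(x*) with distinct values (assumption 2.7).
k = np.array([0.5, 0.8, 1.0, 1.3, 1.7])

L = A @ np.diag(k) @ np.linalg.inv(A)     # the observable operator in (A.7)

eigvals, eigvecs = np.linalg.eig(L)
# Rescale each eigenvector to sum to one: this removes the scale (and sign)
# ambiguity, mimicking the conditional-density normalization.
eigvecs = eigvecs / eigvecs.sum(axis=0, keepdims=True)

# Reorder eigenpairs; in the text the labeling of x* is pinned down by
# assumption 2.8 (e.g., a zero-conditional-mean restriction).
order = np.argsort(eigvals.real)
print(np.allclose(np.sort(k), eigvals.real[order]))            # True
print(np.allclose(A, eigvecs.real[:, order], atol=1e-6))       # True
```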
We now show the identification of $f_{X_a|X^*_a}$ and $k^{ij}_{X^*_a}(x^*)$. First, we require the operator $L^{ij}_{X_a,X_a}$ to be bounded so that the diagonal decomposition may be unique; see, e.g., Dunford and Schwartz (1971). Equation (A.7) implies that the operator $L^{ij}_{X_a,X_a}$ has the same spectrum as the diagonal operator $L^{ij}_{X^*_a}$. Since an operator is bounded by the largest element of its spectrum, assumption 2.6 guarantees that the operator $L^{ij}_{X_a,X_a}$ is bounded. Second, although equation (A.7) implies a diagonalization of the operator $L^{ij}_{X_a,X_a}$, it does not guarantee distinct eigenvalues. If duplicate eigenvalues exist, then two linearly independent eigenfunctions correspond to the same eigenvalue, and any linear combination of the two is again an eigenfunction for that eigenvalue. Therefore, the eigenfunctions may not be identified in each single decomposition corresponding to a pair $(i,j)$. However, this ambiguity can be eliminated by noting that the observed operators $L^{ij}_{X_a,X_a}$ for all $i, j$ share the same eigenfunctions $f_{X_a|X^*_a}(\cdot|x^*)$. Assumption 2.7 guarantees that, for any two different eigenfunctions $f_{X_a|X^*_a}(\cdot|x^*_1)$ and $f_{X_a|X^*_a}(\cdot|x^*_2)$, one can always find two subsets $W_j$ and $W_i$ such that the two eigenfunctions correspond to two different eigenvalues $k^{ij}_{X^*_a}(x^*_1)$ and $k^{ij}_{X^*_a}(x^*_2)$ and, therefore, are identified. The third ambiguity is that, for a given value of $x^*$, an eigenfunction $f_{X_a|X^*_a}(\cdot|x^*)$ times a constant is still an eigenfunction corresponding to $x^*$. To eliminate this ambiguity, we normalize each eigenfunction. Notice that $f_{X_a|X^*_a}(\cdot|x^*)$ is a conditional probability density for each $x^*$; hence, $\int f_{X_a|X^*_a}(x|x^*)\, dx = 1$ for all $x^*$. This property of a conditional density provides a natural normalization condition.
Fourth, in order to fully identify each eigenfunction $f_{X_a|X^*_a}$, we need to identify the exact value of $x^*$ indexing each eigenfunction $f_{X_a|X^*_a}(\cdot|x^*)$. So far, each eigenfunction is identified only up to the value of $x^*$: we have identified a probability density of $X_a$ conditional on $X^*_a = x^*$ with the value of $x^*$ unknown. Assumption 2.8 pins down the exact value of $x^*$ for each eigenfunction $f_{X_a|X^*_a}(\cdot|x^*)$. For example, an intuitive assumption is that the value of $x^*$ is the mean of this identified probability density, i.e., $x^* = \int x f_{X_a|X^*_a}(x|x^*)\, dx$; this assumption is equivalent to requiring that the measurement error in the auxiliary sample ($X_a - X^*_a$) has zero mean conditional on the latent true values.
After fully identifying the density function $f_{X_a|X^*_a}$, we now show that the densities of interest $f_{Y|X^*,W}$ and $f_{X|X^*}$ are also identified. By equation (A.3), we have $f_{X_a,Y_a|W_a} = L_{X_a|X^*_a} f_{Y_a,X^*_a|W_a}$. By the injectivity of the operator $L_{X_a|X^*_a}$, the joint density $f_{Y_a,X^*_a|W_a}$ may be identified as follows:
\[
f_{Y_a,X^*_a|W_a} = L^{-1}_{X_a|X^*_a} f_{X_a,Y_a|W_a}.
\]
Assumption 2.3 implies that $f_{Y_a|X^*_a,W_a} = f_{Y|X^*,W}$, so that we may identify $f_{Y|X^*,W}$ through
\[
f_{Y|X^*,W}(y|x^*,w) = \frac{f_{Y_a,X^*_a|W_a}(y,x^*|w)}{\int f_{Y_a,X^*_a|W_a}(y,x^*|w)\, dy} \quad \text{for all } x^* \text{ and } w.
\]
By equation (A.4) and the injectivity of the identified operator $L_{Y|X^*,W_j}$, we have
\[
L_{X|X^*} L_{X^*|W_j} = L_{X,Y|W_j} L^{-1}_{Y|X^*,W_j}. \tag{A.9}
\]
The left-hand side of equation (A.9) equals an operator with the kernel function $f_{X,X^*|W=w_j} \equiv f_{X|X^*} f_{X^*|W=w_j}$. Since the right-hand side of equation (A.9) has been identified, the kernel $f_{X,X^*|W=w_j}$ on the left-hand side is also identified. We may then identify $f_{X|X^*}$ through
\[
f_{X|X^*}(x|x^*) = \frac{f_{X,X^*|W=w_j}(x,x^*)}{\int f_{X,X^*|W=w_j}(x,x^*)\, dx} \quad \text{for all } x^* \in \mathcal{X}^*.
\]
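In the same discretized picture, this last step is a linear solve: with $A \approx L_{X_a|X^*_a}$ recovered, $f_{Y_a,X^*_a|W_a} = L^{-1}_{X_a|X^*_a} f_{X_a,Y_a|W_a}$ becomes a call to `np.linalg.solve`, followed by a normalization over $y$. A sketch with made-up grids (all inputs illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
G = 5                                      # grid points for x_a, x*, and y

# Recovered error matrix A[x_a, x*] (columns sum to one) and a made-up
# latent joint density f_{Ya,X*a|Wa}(y, x*|w), tabulated as [x*, y].
A = rng.random((G, G)) + 0.1
A /= A.sum(axis=0, keepdims=True)
f_latent_true = rng.random((G, G))
f_latent_true /= f_latent_true.sum()

f_obs = A @ f_latent_true                  # eq. (A.3) on the grid: [x_a, y]

# Invert the error operator to recover f_{Ya,X*a|Wa} ...
f_latent = np.linalg.solve(A, f_obs)
# ... then normalize over y to obtain f_{Y|X*,W}(y|x*, w), as in the proof.
f_y_given_xstar = f_latent / f_latent.sum(axis=1, keepdims=True)
print(np.allclose(f_latent, f_latent_true))                    # True
```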
Proof: (Theorem 3.2) The proof is a simplified version of that for theorem 4.1 in Ai and Chen (2007). Recall the neighborhoods $\mathcal{N}_{0n} = \{\alpha \in \mathcal{A}^n_{0s}: \|\alpha - \alpha_0\|_2 = o([n+n_a]^{-1/4})\}$ and $\mathcal{N}_0 = \{\alpha \in \mathcal{A}_{0s}: \|\alpha - \alpha_0\|_2 = o([n+n_a]^{-1/4})\}$. For any $\alpha \in \mathcal{N}_{0n}$, define
\[
r[Z_t;\alpha,\alpha_0] \equiv \ell(Z_t;\alpha) - \ell(Z_t;\alpha_0) - \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\alpha - \alpha_0].
\]
Denote the centered empirical process indexed by any measurable function $h$ as
\[
\mu_n(h(Z_t)) \equiv \frac{1}{n+n_a} \sum_{t=1}^{n+n_a} \{h(Z_t) - E[h(Z_t)]\}.
\]
Let $\varepsilon_n > 0$ be of order $o([n+n_a]^{-1/2})$. By the definition of the two-sample sieve quasi MLE $\hat\alpha_n$, we have
\begin{align*}
0 &\le \frac{1}{n+n_a} \sum_{t=1}^{n+n_a} \left[\ell(Z_t;\hat\alpha) - \ell(Z_t;\hat\alpha \pm \varepsilon_n v^*_n)\right] \\
&= \mu_n\left(\ell(Z_t;\hat\alpha) - \ell(Z_t;\hat\alpha \pm \varepsilon_n v^*_n)\right) + E\left(\ell(Z_t;\hat\alpha) - \ell(Z_t;\hat\alpha \pm \varepsilon_n v^*_n)\right) \\
&= \mp\varepsilon_n \times \frac{1}{n+n_a} \sum_{t=1}^{n+n_a} \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*_n] + \mu_n\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right) \\
&\quad + E\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right).
\end{align*}
In the following we will show that:
\[
\frac{1}{n+n_a} \sum_{t=1}^{n+n_a} \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*_n - v^*] = o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right); \tag{A.10}
\]
\[
E\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right) = \pm\varepsilon_n \langle \hat\alpha - \alpha_0, v^*\rangle_2 + \varepsilon_n\, o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right); \tag{A.11}
\]
\[
\mu_n\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right) = \varepsilon_n \times o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right). \tag{A.12}
\]
Notice that assumptions 3.1, 3.2(ii)(iii), and 3.6 imply $E\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*]\right) = 0$. Under (A.10)–(A.12) we have:
\[
0 \le \frac{1}{n+n_a} \sum_{t=1}^{n+n_a} \left[\ell(Z_t;\hat\alpha) - \ell(Z_t;\hat\alpha \pm \varepsilon_n v^*_n)\right] = \mp\varepsilon_n \times \mu_n\!\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*]\right) \pm \varepsilon_n \times \langle\hat\alpha - \alpha_0, v^*\rangle_2 + \varepsilon_n \times o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right).
\]
Hence
\[
\sqrt{n+n_a}\, \langle\hat\alpha - \alpha_0, v^*\rangle_2 = \sqrt{n+n_a}\, \mu_n\!\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*]\right) + o_P(1) \Rightarrow N\!\left(0, \sigma^2_*\right),
\]
with
\[
\sigma^2_* \equiv E\left\{\left(\frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*]\right)^2\right\} = (v^*_\theta)^T E\left[S^T_{\theta_0} S_{\theta_0}\right](v^*_\theta) = \lambda^T (V_*)^{-1} I_* (V_*)^{-1}\lambda.
\]
Thus, assumptions 3.2(i), 3.7, and 3.9 together imply that $\sigma^2_* < \infty$ and
\[
\sqrt{n+n_a}\, \lambda^T(\hat\theta_n - \theta_0) = \sqrt{n+n_a}\, \langle\hat\alpha - \alpha_0, v^*\rangle_2 + o_P(1) \Rightarrow N\!\left(0, \sigma^2_*\right).
\]
To complete the proof, it remains to establish (A.10)–(A.12). Notice that (A.10) is implied by the Chebyshev inequality, i.i.d. data, and assumptions 3.10 and 3.13. For (A.11) and (A.12) we notice that
\begin{align*}
r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0] &= \ell(Z_t;\hat\alpha) - \ell(Z_t;\hat\alpha \pm \varepsilon_n v^*_n) - \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[\mp\varepsilon_n v^*_n] \\
&= \mp\varepsilon_n \times \left(\frac{d\ell(Z_t;\tilde\alpha)}{d\alpha}[v^*_n] - \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*_n]\right) \\
&= \mp\varepsilon_n \times \left(\frac{d^2\ell(Z_t;\bar\alpha)}{d\alpha\, d\alpha^T}[\tilde\alpha - \alpha_0, v^*_n]\right),
\end{align*}
in which $\tilde\alpha \in \mathcal{N}_{0n}$ lies between $\hat\alpha$ and $\hat\alpha \pm \varepsilon_n v^*_n$, and $\bar\alpha \in \mathcal{N}_0$ lies between $\tilde\alpha$ and $\alpha_0$. Therefore, for (A.11), by the definition of the inner product $\langle\cdot,\cdot\rangle_2$, we have:
\begin{align*}
E\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right) &= \mp\varepsilon_n \times E\left(\frac{d^2\ell(Z_t;\bar\alpha)}{d\alpha\, d\alpha^T}[\tilde\alpha - \alpha_0, v^*_n]\right) \\
&= \pm\varepsilon_n \times \langle\tilde\alpha - \alpha_0, v^*_n\rangle_2 \\
&\quad \mp \varepsilon_n \times E\left(\frac{d^2\ell(Z_t;\bar\alpha)}{d\alpha\, d\alpha^T}[\tilde\alpha - \alpha_0, v^*_n] - \frac{d^2\ell(Z_t;\alpha_0)}{d\alpha\, d\alpha^T}[\tilde\alpha - \alpha_0, v^*_n]\right) \\
&= \pm\varepsilon_n \times \langle\hat\alpha - \alpha_0, v^*_n\rangle_2 \pm \varepsilon_n \times \langle\tilde\alpha - \hat\alpha, v^*_n\rangle_2 + o_P\!\left(\frac{\varepsilon_n}{\sqrt{n+n_a}}\right) \\
&= \pm\varepsilon_n \times \langle\hat\alpha - \alpha_0, v^*\rangle_2 + O_P(\varepsilon_n^2) + o_P\!\left(\frac{\varepsilon_n}{\sqrt{n+n_a}}\right),
\end{align*}
in which the last two equalities hold due to the definition of $\tilde\alpha$, assumptions 3.10 and 3.12, and
\[
\langle\hat\alpha - \alpha_0, v^*_n - v^*\rangle_2 = o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right) \quad \text{and} \quad \|v^*_n\|^2_2 \to \|v^*\|^2_2 < \infty.
\]
Hence, (A.11) is satisfied. For (A.12), we notice
\[
\mu_n\left(r[Z_t;\hat\alpha,\alpha_0] - r[Z_t;\hat\alpha \pm \varepsilon_n v^*_n,\alpha_0]\right) = \mp\varepsilon_n \times \mu_n\!\left(\frac{d\ell(Z_t;\tilde\alpha)}{d\alpha}[v^*_n] - \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*_n]\right),
\]
in which $\tilde\alpha \in \mathcal{N}_{0n}$ lies between $\hat\alpha$ and $\hat\alpha \pm \varepsilon_n v^*_n$. Since the class $\left\{\frac{d\ell(Z_t;\tilde\alpha)}{d\alpha}[v^*_n] : \tilde\alpha \in \mathcal{A}_{0s}\right\}$ is Donsker under assumptions 3.1, 3.2, 3.6, and 3.7, and since
\[
E\left\{\left(\frac{d\ell(Z_t;\tilde\alpha)}{d\alpha}[v^*_n] - \frac{d\ell(Z_t;\alpha_0)}{d\alpha}[v^*_n]\right)^2\right\} = E\left\{\left(\frac{d^2\ell(Z_t;\bar\alpha)}{d\alpha\, d\alpha^T}[\tilde\alpha - \alpha_0, v^*_n]\right)^2\right\}
\]
goes to zero as $\|\tilde\alpha - \alpha_0\|_s$ goes to zero under assumption 3.11, (A.12) holds.
For the sake of completeness, we write down the expressions of $\frac{d\ell_p(Z;\theta_0,f_{01},f_{02})}{d\alpha}[\alpha - \alpha_0]$ and $\frac{d\ell_a(Z;f_{01a},f_{02a})}{d\alpha}[\alpha - \alpha_0]$ that are needed in the calculation of the Riesz representer and the asymptotic efficient variance of the sieve MLE $\hat\theta$ in subsection 3.1.4:
\begin{align*}
&f_{X,Y|W}(X,Y|W;\theta_0,f_{01},f_{02}) \times \frac{d\ell_p(Z;\theta_0,f_{01},f_{02})}{d\alpha}[\alpha - \alpha_0] \\
&= \int_{\mathcal{X}^*} f_{01}(X|x^*)\, \frac{dg(Y|x^*,W;\theta_0)}{d\theta^T}\, f_{02}(x^*|W)\, dx^*\, [\theta - \theta_0] \\
&\quad + \int_{\mathcal{X}^*} \left[f_1(X|x^*) - f_{01}(X|x^*)\right] g(Y|x^*,W;\theta_0)\, f_{02}(x^*|W)\, dx^* \\
&\quad + \int_{\mathcal{X}^*} f_{01}(X|x^*)\, g(Y|x^*,W;\theta_0)\, \left[f_2(x^*|W) - f_{02}(x^*|W)\right] dx^*,
\end{align*}
and
\begin{align*}
&f_{X_a,Y_a|W_a}(X_a,Y_a|W_a;f_{01a},f_{02a}) \times \frac{d\ell_a(Z;f_{01a},f_{02a})}{d\alpha}[\alpha - \alpha_0] \\
&= \int_{\mathcal{X}^*} f_{01a}(X|x^*)\, \frac{dg(Y|x^*,W;\theta_0)}{d\theta^T}\, f_{02a}(x^*|W_a)\, dx^*\, [\theta - \theta_0] \\
&\quad + \int_{\mathcal{X}^*} \left[f_{1a}(X|x^*) - f_{01a}(X|x^*)\right] g(Y|x^*,W;\theta_0)\, f_{02a}(x^*|W_a)\, dx^* \\
&\quad + \int_{\mathcal{X}^*} f_{01a}(X|x^*)\, g(Y|x^*,W;\theta_0)\, \left[f_{2a}(x^*|W_a) - f_{02a}(x^*|W_a)\right] dx^*.
\end{align*}
Proof: (Theorem 3.4) Under the stated assumptions, we have, for model $j = 1, 2$,
\[
\frac{1}{\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \Big( \{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\} - E\{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\} \Big) = o_P(1),
\]
and
\[
E\{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\} \asymp \|\hat\alpha_j - \alpha_{0j}\|^2_2 = o_P\!\left(\frac{1}{\sqrt{n+n_a}}\right);
\]
thus
\begin{align*}
&\frac{1}{\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \left(\{\ell_j(Z_t;\hat\alpha_j) - E[\ell_j(Z_t;\alpha_{0j})]\}\right) \\
&= \frac{1}{\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \left(\{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\} - E\{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\}\right) \\
&\quad + \frac{1}{\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \{\ell_j(Z_t;\alpha_{0j}) - E[\ell_j(Z_t;\alpha_{0j})]\} + \sqrt{n+n_a}\, E\{\ell_j(Z_t;\hat\alpha_j) - \ell_j(Z_t;\alpha_{0j})\} \\
&= \frac{1}{\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \{\ell_j(Z_t;\alpha_{0j}) - E[\ell_j(Z_t;\alpha_{0j})]\} + o_P(1).
\end{align*}
Under the stated conditions, it is immediate that $\hat\sigma^2 = \sigma^2 + o_P(1)$. Suppose models 1 and 2 are non-nested; then $\sigma > 0$. Thus,
\[
\frac{1}{\hat\sigma\sqrt{n+n_a}} \sum_{t=1}^{n+n_a} \Big( \{\ell_2(Z_t;\hat\alpha_2) - \ell_1(Z_t;\hat\alpha_1)\} - E\{\ell_2(Z_t;\alpha_{02}) - \ell_1(Z_t;\alpha_{01})\} \Big) \stackrel{d}{\to} N(0,1).
\]
REFERENCES

Ai, C., and X. Chen (2003): "Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions," Econometrica 71, 1795–1843.

Ai, C., and X. Chen (2007): "Estimation of Possibly Misspecified Semiparametric Conditional Moment Restriction Models with Different Conditioning Variables," forthcoming in Journal of Econometrics.

Amemiya, Y., and W. A. Fuller (1988): "Estimation for the Nonlinear Functional Relationship," Annals of Statistics 16, 147–160.

Blundell, R., X. Chen, and D. Kristensen (2007): "Semiparametric Engel Curves with Endogenous Expenditure," forthcoming in Econometrica.

Bound, J., C. Brown, and N. Mathiowetz (2001): "Measurement Error in Survey Data," in Handbook of Econometrics, vol. 5, ed. by J. J. Heckman and E. Leamer, Elsevier Science.

Buzas, J., and L. Stefanski (1996): "Instrumental Variable Estimation in Generalized Linear Measurement Error Models," Journal of the American Statistical Association 91, 999–1006.

Carrasco, M., J.-P. Florens, and E. Renault (2006): "Linear Inverse Problems and Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization," in Handbook of Econometrics, vol. 6, ed. by J. J. Heckman and E. Leamer, Elsevier Science.

Carroll, R. J., D. Ruppert, C. Crainiceanu, T. Tosteson, and R. Karagas (2004): "Nonlinear and Nonparametric Regression and Instrumental Variables," Journal of the American Statistical Association 99, 736–750.

Carroll, R. J., D. Ruppert, and L. A. Stefanski (1995): Measurement Error in Nonlinear Models: A Modern Perspective. New York: Chapman & Hall.

Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. Crainiceanu (2006): Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed. Chapman & Hall/CRC.

Carroll, R. J., and L. A. Stefanski (1990): "Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors," Journal of the American Statistical Association 85, 652–663.

Carroll, R. J., and M. P. Wand (1991): "Semiparametric Estimation in Logistic Measurement Error Models," Journal of the Royal Statistical Society B 53, 573–585.

Chen, X. (2006): "Large Sample Sieve Estimation of Semi-nonparametric Models," in Handbook of Econometrics, vol. 6, ed. by J. J. Heckman and E. Leamer, Elsevier Science.

Chen, X., H. Hong, and E. Tamer (2005): "Measurement Error Models with Auxiliary Data," Review of Economic Studies 72, 343–366.

Chen, X., H. Hong, and A. Tarozzi (2007): "Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error," forthcoming in Annals of Statistics.

Cheng, C. L., and J. W. Van Ness (1999): Statistical Regression with Measurement Error. London: Arnold.

Chernozhukov, V., G. Imbens, and W. Newey (2007): "Instrumental Variable Identification and Estimation of Nonseparable Models via Quantile Conditions," Journal of Econometrics.

Dunford, N., and J. T. Schwartz (1971): Linear Operators. New York: John Wiley & Sons.

Fan, J. (1991): "On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems," Annals of Statistics 19, 1257–1272.

Fuller, W. (1987): Measurement Error Models. New York: John Wiley & Sons.

Hausman, J., H. Ichimura, W. Newey, and J. Powell (1991): "Identification and Estimation of Polynomial Errors-in-Variables Models," Journal of Econometrics 50, 273–295.

Hoeffding, W. (1977): "Some Incomplete and Boundedly Complete Families of Distributions," Annals of Statistics 5, 278–291.

Hong, H., and E. Tamer (2003): "A Simple Estimator for Nonlinear Error in Variable Models," Journal of Econometrics 117(1), 1–19.

Horowitz, J., and M. Markatou (1996): "Semiparametric Estimation of Regression Models for Panel Data," Review of Economic Studies 63, 145–168.

Hsiao, C. (1989): "Consistent Estimation for Some Nonlinear Errors-in-Variables Models," Journal of Econometrics 41, 159–185.

Hu, Y. (2006): "Identification and Estimation of Nonlinear Models with Misclassification Error Using Instrumental Variables," working paper, University of Texas at Austin.

Hu, Y., and G. Ridder (2006): "Estimation of Nonlinear Models with Measurement Error Using Marginal Information," working paper, University of Southern California.

Hu, Y., and S. M. Schennach (2006): "Identification and Estimation of Nonclassical Nonlinear Errors-in-Variables Models with Continuous Distributions Using Instruments," Cemmap working paper, Centre for Microdata Methods and Practice.

Ichimura, H., and E. Martinez-Sanchis (2006): "Identification and Estimation of GMM Models by Combining Two Data Sets," working paper, University College London.

Lee, L.-F., and J. H. Sepanski (1995): "Estimation of Linear and Nonlinear Errors-in-Variables Models Using Validation Data," Journal of the American Statistical Association 90(429).

Lehmann, E. L. (1986): Testing Statistical Hypotheses, 2nd ed. New York: Wiley.

Lewbel, A. (2007): "Estimation of Average Treatment Effects with Misclassification," Econometrica 75, 537–551.

Li, T., and Q. Vuong (1998): "Nonparametric Estimation of the Measurement Error Model Using Multiple Indicators," Journal of Multivariate Analysis 65, 139–165.

Li, T. (2002): "Robust and Consistent Estimation of Nonlinear Errors-in-Variables Models," Journal of Econometrics 110, 1–26.

Liang, H., W. Härdle, and R. Carroll (1999): "Estimation in a Semiparametric Partially Linear Errors-in-Variables Model," Annals of Statistics 27(5), 1519–1535.

Mattner, L. (1993): "Some Incomplete but Boundedly Complete Location Families," Annals of Statistics 21, 2158–2162.

Murphy, S. A., and A. W. van der Vaart (1996): "Likelihood Inference in the Errors-in-Variables Model," Journal of Multivariate Analysis 59(1), 81–108.

Newey, W. (2001): "Flexible Simulated Moment Estimation of Nonlinear Errors-in-Variables Models," Review of Economics and Statistics 83, 616–627.

Newey, W., and J. Powell (2003): "Instrumental Variables Estimation of Nonparametric Models," Econometrica 71, 1557–1569.

Ridder, G., and R. Moffitt (2006): "The Econometrics of Data Combination," in Handbook of Econometrics, vol. 6, ed. by J. J. Heckman and E. Leamer, Elsevier Science.

Schennach, S. (2004): "Estimation of Nonlinear Models with Measurement Error," Econometrica 72(1), 33–76.

Shen, X. (1997): "On Methods of Sieves and Penalization," Annals of Statistics 25, 2555–2591.

Shen, X., and W. Wong (1994): "Convergence Rate of Sieve Estimates," Annals of Statistics 22, 580–615.

Taupin, M. L. (2001): "Semi-parametric Estimation in the Nonlinear Structural Errors-in-Variables Model," Annals of Statistics 29, 66–93.

Van de Geer, S. (1993): "Hellinger-Consistency of Certain Nonparametric Maximum Likelihood Estimators," Annals of Statistics 21, 14–44.

Van de Geer, S. (2000): Empirical Processes in M-Estimation. Cambridge: Cambridge University Press.

Vuong, Q. (1989): "Likelihood Ratio Tests for Model Selection and Non-nested Hypotheses," Econometrica 57, 307–333.

Wang, L. (2004): "Estimation of Nonlinear Models with Berkson Measurement Errors," Annals of Statistics 32(6), 2559–2579.

Wang, L., and C. Hsiao (1995): "Simulation-Based Semiparametric Estimation of Nonlinear Errors-in-Variables Models," working paper, University of Southern California.

Wang, N., X. Lin, R. Gutierrez, and R. Carroll (1998): "Bias Analysis and SIMEX Approach in Generalized Linear Mixed Measurement Error Models," Journal of the American Statistical Association 93(441), 249–261.

Wansbeek, T., and E. Meijer (2000): Measurement Error and Latent Variables in Econometrics. Amsterdam: North-Holland.

White, H. (1982): "Maximum Likelihood Estimation of Misspecified Models," Econometrica 50, 143–161.
Department of Economics
Yale University
E-mail: [email protected]

Department of Economics
Johns Hopkins University
E-mail: [email protected]