On the Construction of Minimax Optimal Nonparametric Tests with Kernel Embedding Methods

Tong Li

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy under the Executive Committee

of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2021


© 2021

Tong Li

All Rights Reserved


Abstract

On the Construction of Minimax Optimal Nonparametric Tests with Kernel Embedding Methods

Tong Li

Kernel embedding methods have witnessed a great deal of practical success in nonparametric hypothesis testing in recent years. Yet ever since their first proposal, researchers in this area have faced an inevitable question: which kernel should be selected? The performance of the associated nonparametric tests can vary dramatically with the choice of kernel, and while kernel selection is usually done in an ad hoc manner, we ask whether there is a principled way of selecting kernels so as to ensure that the associated nonparametric tests

have good performance. As consistency results against fixed alternatives do not tell the full story

about the power of the associated tests, we study their statistical performance within the minimax

framework. First, focusing on the case of goodness-of-fit tests, our analyses show that a vanilla

version of the kernel embedding based test could be suboptimal, and suggest a simple remedy by

moderating the kernel. We prove that the moderated approach provides optimal tests for a wide

range of deviations from the null and can also be made adaptive over a large collection of

interpolation spaces. Then, we study the asymptotic properties of goodness-of-fit, homogeneity

and independence tests using Gaussian kernels, arguably the most popular and successful among

such tests. Our results provide theoretical justifications for this common practice by showing that

tests using a Gaussian kernel with an appropriately chosen scaling parameter are minimax

optimal against smooth alternatives in all three settings. In addition, our analysis also pinpoints

the importance of choosing a diverging scaling parameter when using Gaussian kernels and


suggests a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated

logarithmic factor, over a wide range of smooth alternatives. Numerical experiments are

presented to further demonstrate the practical merits of our methodology.


Table of Contents

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Kernel Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Nonparametric Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Minimax Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.4 Kernel Selection and Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Chapter 2: Moderated Kernel Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1 Background and Problem Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Operating Characteristics of MMD Based Test . . . . . . . . . . . . . . . . . . . . 11

2.2.1 Asymptotics under H_0^GOF . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 Power Analysis for MMD Based Tests . . . . . . . . . . . . . . . . . . . . 12

2.3 Optimal Tests Based on Moderated MMD . . . . . . . . . . . . . . . . . . . . . . 13

2.3.1 Moderated MMD Test Statistic . . . . . . . . . . . . . . . . . . . . . . . . 13


2.3.2 Operating Characteristics of η_ϱ(P_n, P_0) Based Tests . . . . . . . . . . . . 15

2.3.3 Minimax Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 3: Gaussian Kernel Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1 Test for Goodness-of-fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Test for Homogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.3 Test for Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.4 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.4.1 Test for Goodness-of-fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.4.2 Test for Homogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.4.3 Test for Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Chapter 4: Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.1 Effect of Scaling Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2 Efficacy of Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.3 Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Chapter 5: Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Chapter 6: Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Appendix A: Some Technical Results and Proofs Related to Chapter 2 . . . . . . . . . . . 102

Appendix B: Some Technical Results and Proofs Related to Chapter 3 . . . . . . . . . . . 104


B.1 Properties of Gaussian Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

B.2 Proof of Lemma 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

B.3 Proof of Lemma 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

B.4 Decomposition of dHSIC and Its Variance Estimation . . . . . . . . . . . . . . . . 116

B.5 Theoretical Properties of Independence Tests for General k . . . . . . . . . . . . . 121


List of Tables

4.1 Frequency that each DAG in Figure 4.4 was selected by three tests. . . . . . . . . . 42


List of Figures

4.1 Observed power against log(a) in Experiment I (left) and Experiment II (right). . . 38

4.2 Observed power versus sample size in Experiment III for d = 1, 10, 100, 1000 from left to right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.3 Observed power versus sample size in Experiment IV for d = 2, 10, 100, 1000 from left to right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.4 DAGs with the top 3 highest probabilities of being selected. . . . . . . . . . . . . . 42


Acknowledgements

First of all, I want to express my sincere gratitude to my Ph.D. advisor, Prof. Ming Yuan. His guidance and support have been invaluable to my research and my life. I have learned a lot from his deep insights and decent work ethic.

I am very grateful to Prof. Bodhisattva Sen, Prof. Victor de la Pena, Prof. Cynthia Rush and Prof. Bin Cheng for serving on my committee. Their comments and feedback were very helpful in revising my thesis and inspired directions for future research.

I also want to thank my close friends and my fellow students, Yi Li, Youran Qi, Chensheng

Kuang, Shulei Wang, Cuize Han, Fan Gao, Luxi Cao, Yuan Li, Yuqing Xu, Chaoyu Yuan, Guanhua

Fang, Yuanzhe Xu and Shun Xu. Their company has made this journey much more pleasurable.

Their suggestions and help have been indispensable during the hard moments of my life.

Finally, I want to thank my parents, Xiuyuan Li and Ping Zhou. They have always supported and encouraged me. I owe a lot to them.


Dedication

To my beloved parents who always give me unconditional support and encouragement.


Chapter 1: Introduction

1.1 Kernel Embedding

Tests for goodness-of-fit, homogeneity and independence are central to statistical inference. Numerous techniques have been developed for these tasks and are routinely used in practice. In recent years, there has been renewed interest in them from both statistics and related fields, as they arise naturally in many modern applications where the performance of classical methods is less than satisfactory. In particular, nonparametric inference via the embedding of distributions into a reproducing kernel Hilbert space (RKHS) has emerged as a popular and powerful technique to tackle these challenges. The approach immediately allows easy access to the rich machinery of RKHSs and has found great success in a wide range of applications, from causal discovery to deep learning. See, e.g., Muandet et al. (2017) for a recent review.

More specifically, let K(·, ·) be a symmetric and positive definite function defined over X × X, that is, K(x, y) = K(y, x) for all x, y ∈ X, and the Gram matrix [K(x_i, x_j)]_{1≤i,j≤n} is positive definite for any distinct x_1, . . . , x_n ∈ X. The Moore-Aronszajn Theorem indicates that such a function, referred to as a kernel, can always be uniquely identified with an RKHS H_K of functions over X. The embedding

μ_P(·) := ∫_X K(x, ·) P(dx)

maps a probability distribution P into H_K. The difference between two probability distributions P and Q can then be conveniently measured by

γ_K(P, Q) := ‖μ_P − μ_Q‖_{H_K}.

Under mild regularity conditions, it can be shown that γ_K(P, Q) is an integral probability metric, so that it is zero if and only if P = Q, and

γ_K(P, Q) = sup_{f ∈ H_K : ‖f‖_{H_K} ≤ 1} ∫_X f d(P − Q).

As such, γ_K(P, Q) is often referred to as the maximum mean discrepancy (MMD) between P and Q.

See, e.g., Sriperumbudur et al. (2010) or Gretton et al. (2012a) for details. In what follows, we shall

drop the subscript K whenever its choice is clear from the context. It was noted recently that MMD is also closely related to the so-called energy distance between random variables (Székely et al., 2007; Székely and Rizzo, 2009) commonly used to measure independence. See, e.g., Sejdinovic et al. (2013) and Lyons (2013).

1.2 Nonparametric Hypothesis Testing

Given a sample from P and/or Q, estimates of γ(P, Q) can be derived by replacing P and Q with their respective empirical distributions. These estimates can subsequently be used for nonparametric hypothesis testing. Here are several notable examples that we shall focus on in this work.

Goodness-of-fit tests. The goal of goodness-of-fit tests is to check if a sample comes from a pre-specified distribution. Let X_1, . . . , X_n be n independent X-valued samples from a certain distribution P. We are interested in testing if the hypothesis H_0^GOF : P = P_0 holds for a fixed P_0. Deviation from P_0 can be conveniently measured by γ(P, P_0), which can be readily estimated by

γ(P_n, P_0) := sup_{f ∈ H_K : ‖f‖_{H_K} ≤ 1} ∫_X f d(P_n − P_0),

where P_n is the empirical distribution of X_1, . . . , X_n. A natural procedure is to reject H_0^GOF if the estimate exceeds a threshold calibrated to ensure a certain significance level, say α (0 < α < 1).
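As a concrete illustration (a minimal sketch that is not part of the original development), the squared statistic γ²(P_n, P_0) = ‖μ_{P_n} − μ_{P_0}‖²_{H_K} can be computed by expanding the squared RKHS norm and replacing the integrals against P_0 with Monte Carlo averages over draws from P_0; the Gaussian kernel and the distributions below are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, nu=1.0):
    """Gaussian kernel K(x, y) = exp(-nu * ||x - y||^2) between row vectors."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-nu * d2)

def mmd2_gof(sample, p0_draws, nu=1.0):
    """Plug-in estimate of gamma^2(P_n, P_0): expand the squared RKHS norm of
    mu_{P_n} - mu_{P_0}, with integrals against P_0 replaced by Monte Carlo
    averages over draws from P_0."""
    kxx = gaussian_kernel(sample, sample, nu)
    kxy = gaussian_kernel(sample, p0_draws, nu)
    kyy = gaussian_kernel(p0_draws, p0_draws, nu)
    return kxx.mean() - 2.0 * kxy.mean() + kyy.mean()

rng = np.random.default_rng(0)
p0_draws = rng.normal(0.0, 1.0, size=(2000, 1))   # Monte Carlo draws from P_0 = N(0, 1)
x_null = rng.normal(0.0, 1.0, size=(200, 1))      # sample with P = P_0
x_alt = rng.normal(1.0, 1.0, size=(200, 1))       # mean-shifted alternative
stat_null = mmd2_gof(x_null, p0_draws)
stat_alt = mmd2_gof(x_alt, p0_draws)
print(stat_null, stat_alt)                        # the shifted sample yields a larger value
```

Being a squared norm, the plug-in statistic is always nonnegative, and it is visibly inflated under the alternative; calibrating the rejection threshold is the subject of the null asymptotics discussed later.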

Homogeneity tests. Homogeneity tests check if two independent samples come from a common population. Given two independent samples X_1, . . . , X_n ~iid P and Y_1, . . . , Y_m ~iid Q, we are interested in testing if the null hypothesis H_0^HOM : P = Q holds. Discrepancy between P and Q can be measured by γ(P, Q) and, similar to before, it can be estimated by the MMD between P_n and Q_m:

γ(P_n, Q_m) := sup_{f ∈ H_K : ‖f‖_{H_K} ≤ 1} ∫_X f d(P_n − Q_m).

Again, we reject H_0^HOM if the estimate exceeds a threshold calibrated to ensure a certain significance level.
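In practice the threshold is often calibrated by permutation. The following sketch is our own illustrative code, not a procedure prescribed here: it computes the biased (V-statistic) two-sample MMD² with a Gaussian kernel and compares it against permutation quantiles of the pooled sample.

```python
import numpy as np

def gaussian_gram(z, nu=1.0):
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-nu * d2)

def mmd2_from_gram(k, n):
    """Biased (V-statistic) MMD^2 between the first n rows and the rest."""
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

def mmd_permutation_test(x, y, nu=1.0, n_perm=200, alpha=0.05, seed=0):
    """Reject H0: P = Q when the observed MMD^2 exceeds the (1 - alpha)
    quantile of its permutation distribution."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)
    k = gaussian_gram(z, nu)
    observed = mmd2_from_gram(k, n)
    perm_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(z))                       # relabel the pooled sample
        perm_stats.append(mmd2_from_gram(k[np.ix_(idx, idx)], n))
    threshold = np.quantile(perm_stats, 1.0 - alpha)
    return observed, threshold, observed > threshold

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 1))
y = rng.normal(1.0, 1.0, size=(100, 1))   # shifted, so H0 is false
observed, threshold, reject = mmd_permutation_test(x, y)
print(reject)
```

Permuting the pooled Gram matrix reuses the kernel evaluations, so calibration costs only the quantile computation on relabeled index sets.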

Independence tests. How to measure or test for independence among a set of random variables is another classical problem in statistics. Let X = (X^1, . . . , X^k)^T ∈ X_1 × · · · × X_k be a random vector. If the random vectors X^1, . . . , X^k are jointly independent, then the distribution of X can be factorized:

H_0^IND : P_X = P_{X^1} ⊗ · · · ⊗ P_{X^k}.

Dependence among X^1, . . . , X^k can be naturally measured by the difference between the joint distribution and the product distribution evaluated under MMD:

γ(P_X, P_{X^1} ⊗ · · · ⊗ P_{X^k}) = ‖μ_{P_X} − μ_{P_{X^1} ⊗ · · · ⊗ P_{X^k}}‖_{H_K}.

When k = 2, γ²(P_X, P_{X^1} ⊗ P_{X^2}) can be expressed as the squared Hilbert-Schmidt norm of the cross-covariance operator associated with X^1 and X^2 and is therefore referred to as the Hilbert-Schmidt independence criterion (HSIC; Gretton et al., 2005). The more general case given above is sometimes referred to as dHSIC (see, e.g., Pfister et al., 2018). As before, we proceed to reject the independence assumption when γ(P_n^X, P_n^{X^1} ⊗ · · · ⊗ P_n^{X^k}) exceeds a certain threshold, where P_n^X and P_n^{X^j} are the empirical distributions of X and X^j, respectively.
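For k = 2 the biased empirical HSIC has a convenient matrix form: double-center the two Gram matrices with H = I − (1/n)11^T and take a normalized trace inner product. A small sketch under illustrative choices (Gaussian kernels and toy data of our own):

```python
import numpy as np

def gaussian_gram(z, nu=1.0):
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-nu * d2)

def hsic(x, y, nu=1.0):
    """Biased empirical HSIC for two components: (1/n^2) tr((H Kx H)(H Ky H)),
    where H = I - (1/n) 11^T is the centering matrix."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n
    kx_c = h @ gaussian_gram(x, nu) @ h       # doubly centered Gram of X^1
    ky_c = h @ gaussian_gram(y, nu) @ h       # doubly centered Gram of X^2
    return np.sum(kx_c * ky_c) / n ** 2       # = trace(kx_c @ ky_c) by symmetry

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 1))
y_indep = rng.normal(size=(300, 1))               # independent of x
y_dep = x + 0.1 * rng.normal(size=(300, 1))       # strongly dependent on x
h_indep = hsic(x, y_indep)
h_dep = hsic(x, y_dep)
print(h_indep, h_dep)                             # dependence inflates HSIC
```

Both values are nonnegative (a trace inner product of two positive semi-definite matrices), and the dependent pair produces the much larger statistic.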

1.3 Minimax Framework

In all these cases the test statistic, namely γ²(P_n, P_0), γ²(P_n, Q_m) or γ²(P_n^X, P_n^{X^1} ⊗ · · · ⊗ P_n^{X^k}), is a V-statistic. Following standard asymptotic theory for V-statistics (see, e.g., Serfling, 2009), it can be shown that under mild regularity conditions, when appropriately scaled by the sample size, they converge to a mixture of χ²_1 distributions with weights determined jointly by the underlying probability distribution and the choice of kernel K. In contrast, it can also be derived that for a fixed alternative,

γ²(P_n, P_0) →_p γ²(P, P_0),  γ²(P_n, Q_m) →_p γ²(P, Q)

and

γ²(P_n^X, P_n^{X^1} ⊗ · · · ⊗ P_n^{X^k}) →_p γ²(P_X, P_{X^1} ⊗ · · · ⊗ P_{X^k}).

This immediately suggests that all the aforementioned tests are consistent against fixed alternatives, in that their power tends to one as sample sizes increase. Although useful, such consistency results do not tell the full story about the power of these tests, or whether there are yet more powerful methods. Specifically, although consistency against any fixed alternative holds, the rate at which the power converges to 1 may vary across alternatives, and it remains unclear whether a given sample size is large enough to ensure good power against the underlying alternative. In other words, can we detect the difference between the null and the alternative hypotheses with high probability even in the worst scenario?

This concern naturally brings in the notion of uniform consistency, meaning that the power converges to 1 uniformly over all alternatives under consideration, and leads us to adopt the minimax hypothesis testing framework pioneered by Burnashev (1979), Ingster (1987), and Ingster (1993). See also Ermakov (1991), Spokoiny (1996), Lepski and Spokoiny (1999), Ingster and Suslina (2000), Ingster (2000), Baraud (2002), Fromont and Laurent (2006), Fromont et al. (2012), and Fromont et al. (2013), and references therein. Within this framework, we consider testing against alternatives that get closer and closer to the null hypothesis as the sample size increases. The smallest departure from the null hypothesis that can be detected consistently, in a minimax sense, is referred to as the optimal detection boundary, and a test that maintains uniform consistency as the departure converges to 0 at the rate of the optimal detection boundary is called minimax rate optimal.


1.4 Kernel Selection and Adaptation

The critical importance of kernel selection is widely recognized in practice, as the statistical performance of the associated tests can vary dramatically with different kernels. Yet the way it is done is usually ad hoc, and how to do so in a more principled way remains one of the chief practical challenges. See, e.g., Gretton et al. (2008), Fukumizu et al. (2009), Gretton et al. (2012b), and Sutherland et al. (2017). In the following chapters, we address this problem by proposing two kernel selection methods in different settings such that the associated tests are shown to be minimax rate optimal.

However, such kernel selection methods depend on regularity conditions on the underlying space of probability distributions, and whether we can select kernels in an agnostic manner remains another challenge. This naturally brings about the issue of adaptation. To address this challenge, we introduce a simple testing procedure that maximizes a normalized MMD over a pre-specified class of kernels. A similar idea of maximizing MMD over a class of kernels was first introduced by Sriperumbudur et al. (2009). Our analysis, however, suggests that it is more desirable to maximize the normalized MMD instead. More specifically, we show that the proposed procedure can attain the optimal rate, up to an iterated logarithmic factor, over spaces of probability distributions with different regularity conditions.

The rest of the thesis is organized as follows. In Chapter 2, focusing on the case of goodness-

of-fit tests, our analyses show that a vanilla version of the kernel embedding based test could be

suboptimal, and suggest a simple remedy by moderating the kernel. We prove that the moderated

approach provides optimal tests and can also be made adaptive over a wide range of deviations from

the null. Then, in Chapter 3 we study the asymptotic properties of goodness-of-fit, homogeneity

and independence tests using Gaussian kernels, arguably the most popular and successful among

such tests. Our results provide theoretical justifications for this common practice by showing that

tests using a Gaussian kernel with an appropriately chosen scaling parameter are minimax optimal against smooth alternatives in all three settings. In addition, we suggest a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated logarithmic factor, over a wide range

of smooth alternatives. Numerical experiments are presented in Chapter 4 to further demonstrate

the practical merits of our methodology. We conclude with a summary discussion in Chapter 5. All the main proofs are relegated to Chapter 6. Other technical results and their proofs are collected in the appendices.


Chapter 2: Moderated Kernel Embedding

2.1 Background and Problem Setting

In this chapter, we focus on goodness-of-fit tests. Specifically, with n independent X-valued samples X_1, . . . , X_n from a certain distribution P, we are interested in testing if the hypothesis

H_0^GOF : P = P_0

holds for a fixed P_0. Problems of this kind have a long and illustrious history in statistics and are often associated with household names such as the Kolmogorov-Smirnov test, Pearson's chi-square test, and Neyman's smooth test. A plethora of other techniques have also been proposed over the years in both parametric and nonparametric settings (e.g., Ingster and Suslina, 2003; Lehmann and Romano, 2008).

and Romano, 2008). Most of the existing techniques are developed with the domain X = R or

[0, 1] in mind and work the best in these cases. Modern applications, however, oftentimes involve

domains different from these traditional ones. For example, when dealing with directional data,

which arise naturally in applications such as diffusion tensor imaging, it is natural to consider X

as the unit sphere in R3 (e.g., Jupp, 2005). Another example occurs in the context of ranking or

preference data (e.g., Ailon et al., 2008). In these cases, X can be taken as the group of permuta-

tions. Furthermore, motivated by several applications, combinatorial testing problems have been

investigated recently (e.g., Addario-Berry et al., 2010), where the spaces under consideration are

specific combinatorially structured spaces.

We consider kernel embedding and maximum mean discrepancy (MMD) based goodness-of-fit tests. Specifically, the goodness-of-fit test can be carried out conveniently by first constructing an estimate of γ(P, P_0) and then rejecting H_0^GOF if the estimate exceeds a threshold calibrated to ensure a certain significance level, say α (0 < α < 1).

We adopt the minimax framework to evaluate the above-mentioned testing strategy. To fix ideas, we assume in this chapter that P is dominated by P_0 under the alternative, so that the Radon-Nikodym derivative dP/dP_0 is well defined, and use the χ² divergence between P and P_0,

χ²(P, P_0) := ∫_X (dP/dP_0)² dP_0 − 1,

as the separation metric to quantify the departure from the null hypothesis. We are particularly

interested in the detection boundary, namely how close P and P_0 can be, in terms of the χ² distance, under the alternative, so that a test based on a sample of n observations can still consistently distinguish between the null hypothesis and the alternative. For example, in the parametric setting where P is known up to a finite-dimensional parameter under the alternative, the detection boundary of the likelihood ratio test is n^{−1/2} under mild regularity conditions (e.g., Theorem 13.5.4 in Lehmann and Romano, 2008, and the discussion leading to it). We are concerned here with alternatives that are nonparametric in nature. Our first result suggests that the detection boundary for the aforementioned γ_K(P_n, P_0) based test is of the order n^{−1/4}. However, our main results indicate, perhaps surprisingly at first, that this rate is far from optimal and that the gap between it and the usual parametric rate can be largely bridged.

In particular, we argue that the distinguishability between P and P_0 depends on how close u := dP/dP_0 − 1 is to the RKHS H_K. The closeness of u to H_K can be measured by the distance from u to an arbitrary ball in H_K. In particular, we shall consider the case where H_K is dense in L_2(P_0), and focus on functions that are polynomially approximable by H_K for concreteness. More precisely, for some constants M, θ > 0, denote by F(θ; M) the collection of functions f ∈ L_2(P_0) such that for any R > 0, there exists an f_R ∈ H_K such that

‖f_R‖_{H_K} ≤ R  and  ‖f − f_R‖_{L_2(P_0)} ≤ M R^{−1/θ}.


We also adopt the convention that

F(0; M) = {f ∈ H_K : ‖f‖_{H_K} ≤ M}.

We investigate the optimal rate of detection for testing H_0^GOF : P = P_0 against

H_1^GOF(Δ_n, θ, M) : P ∈ P(Δ_n, θ, M),    (2.1)

where P(Δ_n, θ, M) is the collection of distributions P on (X, B) satisfying

dP/dP_0 − 1 ∈ F(θ; M)  and  χ(P, P_0) ≥ Δ_n.

We call r_n the optimal rate of detection if for any c > 0, there exists no consistent test whenever Δ_n ≤ c r_n; and, on the other hand, a consistent test exists as long as Δ_n ≫ r_n.

Throughout this chapter, we shall assume that

∫_{X×X} K²(x, x′) dP_0(x) dP_0(x′) < ∞.

Hence the Hilbert-Schmidt integral operator

L_K(f)(x) = ∫_X K(x, x′) f(x′) dP_0(x′),  ∀ x ∈ X,

is well-defined. The spectral decomposition theorem ensures that L_K admits an eigenvalue decomposition. Let {φ_k}_{k≥1} denote the orthonormal eigenfunctions of L_K with eigenvalues λ_k such that λ_1 ≥ λ_2 ≥ · · · ≥ λ_k ≥ · · · > 0. Then, as proved in, e.g., Dunford and Schwartz (1963),

K(x, x′) = Σ_{k≥1} λ_k φ_k(x) φ_k(x′)    (2.2)

in L_2(P_0 ⊗ P_0). We further assume that K is continuous and that P_0 is nondegenerate, meaning the

support of P_0 is X. Then Mercer's theorem ensures that (2.2) holds pointwise. See, e.g., Theorem 4.49 of Steinwart and Christmann (2008).
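The eigenvalues λ_k of L_K can also be approximated empirically: with X_1, . . . , X_n drawn from P_0, the eigenvalues of the Gram matrix divided by n approximate those of L_K, and the Mercer expansion (2.2) can be checked by truncation. A Nyström-style sketch with a toy kernel and P_0 of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)                                # sample from P_0 = N(0, 1)
gram = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # toy Gaussian kernel
evals, evecs = np.linalg.eigh(gram / n)               # approximate spectrum of L_K
evals, evecs = evals[::-1], evecs[:, ::-1]            # sort in descending order

# Empirical Mercer truncation: K(x_i, x_j) ~ sum_{k<=m} lam_k phi_k(x_i) phi_k(x_j),
# with phi_k evaluated on the sample as sqrt(n) times the k-th eigenvector.
m = 20
phi = np.sqrt(n) * evecs[:, :m]
gram_trunc = phi @ np.diag(evals[:m]) @ phi.T
err = np.abs(gram - gram_trunc).max()
print(evals[:5], err)                                 # rapidly decaying spectrum, small error
```

For this kernel the spectrum decays fast, so a 20-term truncation already reproduces the Gram matrix to high accuracy.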

As shown in Gretton et al. (2012a), the squared MMD between two probability distributions P and P_0 can be expressed as

γ_K²(P, P_0) = ∫∫ K(x, x′) d(P − P_0)(x) d(P − P_0)(x′).    (2.3)

Write

K̄(x, x′) = K(x, x′) − E_{P_0} K(x, X) − E_{P_0} K(X, x′) + E_{P_0} K(X, X′),    (2.4)

where the subscript P_0 signifies that the expectation is taken over X, X′ ~ P_0 independently. By (2.4), γ_K²(P, P_0) = γ_{K̄}²(P, P_0). Therefore, without loss of generality, we can focus on kernels that are degenerate under P_0, i.e.,

E_{P_0} K(X, ·) = 0.    (2.5)

Passing from a nondegenerate kernel to a degenerate one, however, presents a subtlety regarding universality. Universality of a kernel is essential for MMD, as it ensures that dP/dP_0 − 1 resides in the linear space spanned by its eigenfunctions. See, e.g., Steinwart (2001) for the definition of a universal kernel and Sriperumbudur et al. (2011) for a detailed discussion of different types of universality. Observe that dP/dP_0 − 1 necessarily lies in the orthogonal complement of the constant functions in L_2(P_0). A degenerate kernel K is universal if its eigenfunctions {φ_k}_{k≥1} form an orthonormal basis of the orthogonal complement in L_2(P_0) of the linear space {c · φ_0 : c ∈ R}, where φ_0(x) ≡ 1. In what follows, we shall assume that K is both degenerate and universal.

For the sake of concreteness, we shall also assume that K has infinitely many positive eigenvalues decaying polynomially, i.e.,

0 < liminf_{k→∞} k^{2s} λ_k ≤ limsup_{k→∞} k^{2s} λ_k < ∞    (2.6)

for some s > 1/2. In addition, we also assume that the eigenfunctions of K are uniformly bounded, i.e.,

sup_{k≥1} ‖φ_k‖_∞ < ∞.    (2.7)

Together with (2.6), (2.7) ensures that Mercer's decomposition (2.2) holds uniformly.

2.2 Operating Characteristics of MMD Based Test

2.2.1 Asymptotics under H_0^GOF

Note that (2.5) implies E_{P_0} φ_k(X) = 0 for all k ≥ 1. Hence

γ²(P, P_0) = Σ_{k≥1} λ_k [E_P φ_k(X)]²

for any P. Accordingly, when P is replaced by the empirical distribution P_n, the empirical squared MMD can be expressed as

γ²(P_n, P_0) = Σ_{k≥1} λ_k [(1/n) Σ_{i=1}^n φ_k(X_i)]².

Classic results on the asymptotics of V-statistics (Serfling, 2009) imply that

n γ²(P_n, P_0) →_d Σ_{k≥1} λ_k Z_k² =: W

under H_0^GOF, where Z_k ~iid N(0, 1). Let Φ_MMD be an MMD based test, which rejects H_0^GOF if and only if n γ²(P_n, P_0) exceeds the 1 − α quantile q_{w,1−α} of W, i.e.,

Φ_MMD = 1{n γ²(P_n, P_0) > q_{w,1−α}}.

The above limiting distribution of n γ²(P_n, P_0) immediately suggests that Φ_MMD is an asymptotic α-level test.
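The critical value q_{w,1−α} can be approximated by Monte Carlo once the eigenvalues are known or estimated. A sketch with the illustrative polynomially decaying spectrum λ_k = k^{−2s}, s = 1, matching the decay profile assumed in (2.6) (the truncation level and sample counts are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
s = 1.0
lam = np.arange(1, 201, dtype=float) ** (-2.0 * s)   # lam_k = k^(-2s), truncated at 200 terms
z = rng.standard_normal(size=(20_000, len(lam)))
w = (z ** 2) @ lam                                   # draws from the truncated W = sum_k lam_k Z_k^2
alpha = 0.05
q = np.quantile(w, 1.0 - alpha)                      # Monte Carlo estimate of q_{w, 1-alpha}
print(w.mean(), q)                                   # mean ~ sum_k lam_k; q lies well above it
```

The sample mean of W matches Σ_k λ_k, while the 0.95 quantile is driven largely by the leading λ_1 χ²_1 component.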

2.2.2 Power Analysis for MMD Based Tests

We now investigate the power of Φ_MMD in testing H_0^GOF against H_1^GOF(Δ_n, θ, M) given by (2.1). Recall that the type II error of a test Φ : X^n → [0, 1] for testing H_0 against a composite alternative H_1 : P ∈ P is given by

β(Φ; P) = sup_{P∈P} E_P [1 − Φ(X_1, . . . , X_n)],

where E_P means taking expectation over X_1, . . . , X_n ~iid P. For brevity, we shall write β(Φ; Δ_n, θ, M) instead of β(Φ; P(Δ_n, θ, M)) in what follows, with P(Δ_n, θ, M) defined right below (2.1). The performance of a test Φ can then be evaluated by its detection boundary, that is, the smallest Δ_n under which the type II error converges to 0 as n → ∞. Our first result establishes the convergence rate of the detection boundary for Φ_MMD in the case when θ = 0. Hereafter, we suppress M in P(Δ_n, θ, M), H_1^GOF(Δ_n, θ, M) and β(Φ; Δ_n, θ, M), unless it is necessary to emphasize the dependence.

Theorem 1. Consider testing H_0^GOF against H_1^GOF(Δ_n, 0) by Φ_MMD.

(i) If n^{1/4} Δ_n → ∞, then β(Φ_MMD; Δ_n, 0) → 0 as n → ∞;

(ii) conversely, there exists a constant c_0 > 0 such that

liminf_{n→∞} β(Φ_MMD; c_0 n^{−1/4}, 0) > 0.


Theorem 1 shows that when the alternative H_1^GOF(Δ_n, 0) is considered, the detection boundary of Φ_MMD is of the order n^{−1/4}. It is of interest to compare the detection rate achieved by Φ_MMD with that in a parametric setting, where consistent tests are available if n^{1/2} Δ_n → ∞. See, e.g., Theorem 13.5.4 in Lehmann and Romano (2008) and the discussion leading to it. It is natural to ask to what extent such a gap can be attributed to the fundamental difference between parametric and nonparametric testing problems. We shall now argue that this gap is in fact largely due to the sub-optimality of Φ_MMD, and that the detection boundary of Φ_MMD can be significantly improved through a slight modification of the MMD.

2.3 Optimal Tests Based on Moderated MMD

2.3.1 Moderated MMD Test Statistic

The basic idea behind MMD is to project two probability measures onto a unit ball in $\mathcal{H}_K$ and use the distance between the two projections to measure the distance between the original probability measures. If the Radon-Nikodym derivative of $\mathbb{P}$ with respect to $\mathbb{P}_0$ is far away from $\mathcal{H}_K$, the distance between the two projections may not honestly reflect the distance between the original measures. More specifically, $\gamma^2(\mathbb{P},\mathbb{P}_0) = \sum_{k\ge 1}\lambda_k [\mathbb{E}_{\mathbb{P}}\varphi_k(X)]^2$, while the $\chi^2$ distance between $\mathbb{P}$ and $\mathbb{P}_0$ is $\chi^2(\mathbb{P},\mathbb{P}_0) = \sum_{k\ge 1}[\mathbb{E}_{\mathbb{P}}\varphi_k(X)]^2$. Considering that $\lambda_k$ decreases with $k$, $\gamma^2(\mathbb{P},\mathbb{P}_0)$ can be much smaller than $\chi^2(\mathbb{P},\mathbb{P}_0)$. To overcome this problem, we consider a moderated version of the MMD which allows us to project the probability measures onto a larger ball in $\mathcal{H}_K$. In particular, write
\[ \eta_{K,\varrho}(\mathbb{P},\mathbb{Q};\mathbb{P}_0) = \sup_{f\in\mathcal{H}_K:\, \|f\|^2_{L_2(\mathbb{P}_0)} + \varrho^2\|f\|^2_{\mathcal{H}_K} \le 1}\ \int_{\mathcal{X}} f\, d(\mathbb{P}-\mathbb{Q}) \tag{2.8} \]

for a given distribution $\mathbb{P}_0$ and a constant $\varrho > 0$. A distance between probability measures of this type was first introduced by Harchaoui et al. (2007) in the context of kernel methods for two-sample testing. A subtle difference between $\eta_{K,\varrho}(\mathbb{P},\mathbb{Q};\mathbb{P}_0)$ and the distance from Harchaoui et al. (2007) is the set of $f$ that we optimize over on the right-hand side of (2.8). In the case of the two-sample test, there is no information about $\mathbb{P}_0$, and therefore one needs to replace the norm $\|\cdot\|_{L_2(\mathbb{P}_0)}$ with the empirical $L_2$ norm.

It is worth noting that $\eta_{K,\varrho}(\mathbb{P},\mathbb{Q};\mathbb{P}_0)$ can also be identified with a particular type of MMD. Specifically, $\eta_{K,\varrho}(\mathbb{P},\mathbb{Q};\mathbb{P}_0) = \gamma_{\tilde K_\varrho}(\mathbb{P},\mathbb{Q})$, where
\[ \tilde K_\varrho(x,x') := \sum_{k\ge 1}\frac{\lambda_k}{\lambda_k+\varrho^2}\,\varphi_k(x)\varphi_k(x'). \]

We shall nonetheless still refer to $\eta_{K,\varrho}(\mathbb{P},\mathbb{Q};\mathbb{P}_0)$ as a moderated MMD in what follows to emphasize the critical importance of moderation. We shall also abbreviate the dependence of $\eta$ on $K$ and $\mathbb{P}_0$ unless necessary. The unit ball in (2.8) is defined in terms of both the RKHS norm and the $L_2(\mathbb{P}_0)$ norm. Recall that $u = d\mathbb{P}/d\mathbb{P}_0 - 1$, so that
\[ \sup_{\|f\|_{L_2(\mathbb{P}_0)}\le 1}\int_{\mathcal{X}} f\, d(\mathbb{P}-\mathbb{P}_0) = \sup_{\|f\|_{L_2(\mathbb{P}_0)}\le 1}\int_{\mathcal{X}} f u\, d\mathbb{P}_0 = \|u\|_{L_2(\mathbb{P}_0)} = \chi(\mathbb{P},\mathbb{P}_0). \]

We can therefore expect that a smaller $\varrho$ will make $\eta^2_\varrho(\mathbb{P},\mathbb{P}_0)$ closer to $\chi^2(\mathbb{P},\mathbb{P}_0)$, since the unit ball to be considered becomes more similar to the unit ball in $L_2(\mathbb{P}_0)$. This can also be verified by noticing that
\[ \lim_{\varrho\to 0}\eta^2_\varrho(\mathbb{P},\mathbb{P}_0) = \lim_{\varrho\to 0}\sum_{k\ge 1}\frac{\lambda_k}{\lambda_k+\varrho^2}[\mathbb{E}_{\mathbb{P}}\varphi_k(X)]^2 = \sum_{k\ge 1}[\mathbb{E}_{\mathbb{P}}\varphi_k(X)]^2 = \chi^2(\mathbb{P},\mathbb{P}_0). \]
Therefore, we choose $\varrho$ converging to 0 when constructing our test statistic.

Hereafter we shall attach the subscript $n$ to $\varrho$ to signify its dependence on $n$. We shall argue that letting $\varrho_n$ converge to 0 at an appropriate rate as $n$ increases indeed results in a test more powerful than $\Phi_{\mathrm{MMD}}$. The test statistic we propose is the empirical version of $\eta^2_{\varrho_n}(\mathbb{P},\mathbb{P}_0)$:
\[ \eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0) = \frac{1}{n^2}\sum_{i,j=1}^n \tilde K_{\varrho_n}(X_i,X_j) = \sum_{k\ge 1}\frac{\lambda_k}{\lambda_k+\varrho_n^2}\left[\frac1n\sum_{i=1}^n\varphi_k(X_i)\right]^2. \tag{2.9} \]

This test statistic is similar in spirit to the homogeneity test proposed previously by Harchaoui et al. (2007), albeit motivated from a different viewpoint. In either case, it is intuitive to expect improved performance over the vanilla version of the MMD when $\varrho_n$ converges to zero at an appropriate rate. The main goal of the present work is to precisely characterize the amount of moderation needed to ensure maximum power. We first argue that letting $\varrho_n$ converge to 0 at an appropriate rate indeed results in a test more powerful than $\Phi_{\mathrm{MMD}}$.

2.3.2 Operating Characteristics of $\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0)$ Based Tests

Although the expression for $\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0)$ given by (2.9) looks similar to that of $\gamma^2(\hat{\mathbb{P}}_n,\mathbb{P}_0)$, their asymptotic behaviors are quite different. At a technical level, this is due to the fact that the eigenvalues of the underlying kernel,
\[ \tilde\lambda_{nk} := \frac{\lambda_k}{\lambda_k+\varrho_n^2}, \]
depend on $n$ and may not be uniformly summable over $n$. As presented in the following theorem, a certain type of asymptotic normality, instead of a sum of chi-squares as in the case of $\gamma^2(\hat{\mathbb{P}}_n,\mathbb{P}_0)$, holds for $\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0)$ under $\mathbb{P}_0$, which helps determine the rejection region of the $\eta^2_{\varrho_n}$ based test.

Theorem 2. Assume that $\varrho_n \to 0$ as $n\to\infty$ in such a fashion that $n\varrho_n^{1/(2s)} \to \infty$. Then under $H_0^{\mathrm{GOF}}$,
\[ v_n^{-1/2}\left[n\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0) - A_n\right] \overset{d}{\to} N(0,2), \]
where
\[ v_n = \sum_{k\ge 1}\left(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\right)^2, \quad\text{and}\quad A_n = \frac1n\sum_{i=1}^n \tilde K_{\varrho_n}(X_i,X_i). \]

In the light of Theorem 2, a test that rejects $H_0$ if and only if
\[ 2^{-1/2}v_n^{-1/2}\left[n\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0) - A_n\right] \]
exceeds $z_{1-\alpha}$ is an asymptotic $\alpha$-level test, where $z_{1-\alpha}$ stands for the $1-\alpha$ quantile of the standard normal distribution. We refer to this test as $\Phi_{\mathrm{M3d}}$, where the subscript M3d stands for Moderated MMD. The performance of $\Phi_{\mathrm{M3d}}$ under the alternative hypothesis is characterized by the following theorem, showing that its detection boundary is much improved when compared with that of $\Phi_{\mathrm{MMD}}$.
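To make the construction concrete, the following sketch evaluates the statistic (2.9) and the studentized quantity from Theorem 2 in a toy spectral setting. The choices $\mathbb{P}_0 = \mathrm{Uniform}[0,1]$, eigenfunctions $\varphi_k(x) = \sqrt2\cos(k\pi x)$, and eigenvalues $\lambda_k = k^{-2s}$ are illustrative assumptions, not prescribed by the development above, and the eigenexpansion is truncated.

```python
import numpy as np

def m3d_test(x, rho, s=2, n_terms=200):
    # Toy spectral setting (illustrative assumptions): P0 = Uniform[0,1],
    # eigenfunctions phi_k(x) = sqrt(2) cos(k pi x), eigenvalues
    # lambda_k = k^{-2s}, expansion truncated at n_terms.
    n = len(x)
    k = np.arange(1, n_terms + 1)
    lam = k ** (-2.0 * s)
    w = lam / (lam + rho ** 2)                           # lambda_k / (lambda_k + rho^2)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(k, x))  # n_terms x n
    eta_sq = np.sum(w * phi.mean(axis=1) ** 2)           # statistic (2.9)
    a_n = np.sum(w * (phi ** 2).mean(axis=1))            # A_n from Theorem 2
    v_n = np.sum(w ** 2)                                 # v_n from Theorem 2
    t = (n * eta_sq - a_n) / np.sqrt(2.0 * v_n)
    z = 1.6449  # z_{1-alpha} for alpha = 0.05; use a normal quantile in general
    return t, t > z

rng = np.random.default_rng(1)
t_null, rej_null = m3d_test(rng.uniform(size=1000), rho=0.2)
t_alt, rej_alt = m3d_test(rng.beta(2.0, 1.0, size=1000), rho=0.2)
```

Under the Beta(2,1) alternative the studentized statistic is driven far above $z_{1-\alpha}$, while it stays moderate under the null.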

Theorem 3. Consider testing $H_0^{\mathrm{GOF}}$ against $H_1^{\mathrm{GOF}}(\Delta_n,\theta)$ by $\Phi_{\mathrm{M3d}}$ with $\varrho_n = cn^{-\frac{2s(\theta+1)}{4s+\theta+1}}$ for an arbitrary constant $c > 0$. If $n^{\frac{2s}{4s+\theta+1}}\Delta_n \to \infty$, then $\Phi_{\mathrm{M3d}}$ is consistent in that
\[ \beta(\Phi_{\mathrm{M3d}};\Delta_n,\theta) \to 0, \quad\text{as } n\to\infty. \]

Theorem 3 indicates that the detection boundary for $\Phi_{\mathrm{M3d}}$ is $n^{-2s/(4s+\theta+1)}$. In particular, when testing $H_0^{\mathrm{GOF}}$ against $H_1^{\mathrm{GOF}}(\Delta_n,0)$, i.e., $\theta = 0$, it becomes $n^{-2s/(4s+1)}$. This is to be contrasted with the detection boundary for $\Phi_{\mathrm{MMD}}$, which, as suggested by Theorem 1, is of the order $n^{-1/4}$. It is also worth noting that the detection boundary for $\Phi_{\mathrm{M3d}}$ deteriorates as $\theta$ increases, implying that it is harder to test against a larger interpolation space.

2.3.3 Minimax Optimality

It is of interest to investigate if the detection boundary of $\Phi_{\mathrm{M3d}}$ can be further improved. We now show that the answer is negative in a certain sense.

Theorem 4. Consider testing $H_0^{\mathrm{GOF}}$ against $H_1^{\mathrm{GOF}}(\Delta_n,\theta)$ for some $\theta < 2s-1$. If $\limsup_{n\to\infty}\Delta_n n^{\frac{2s}{4s+\theta+1}} < \infty$, then there exists $\alpha\in(0,1)$ such that for any $\Phi_n$ of level $\alpha$ (asymptotically) based on $X_1,\cdots,X_n$,
\[ \limsup_{n\to\infty}\beta(\Phi_n;\Delta_n,\theta) > 0. \]

Together with Theorem 3, this suggests that $\Phi_{\mathrm{M3d}}$ is rate optimal in the minimax sense when the $\chi^2$ distance is taken as the separation metric and $\mathcal{F}(\theta,M)$ as the regularity condition on the alternative space.


2.4 Adaptation

Despite the minimax optimality of $\Phi_{\mathrm{M3d}}$, a practical challenge in using it is the choice of an appropriate tuning parameter $\varrho_n$. In particular, Theorem 3 suggests that $\varrho_n$ needs to be taken of the order $n^{-2s(\theta+1)/(4s+\theta+1)}$, which depends on the values of $s$ and $\theta$. On the one hand, since $\mathbb{P}_0$ and $K$ are known a priori, so is $s$. On the other hand, $\theta$ reflects the property of $d\mathbb{P}/d\mathbb{P}_0$, which is typically not known in advance. This naturally brings us to the issue of adaptation (see, e.g., Spokoiny, 1996; Ingster, 2000). In other words, we are interested in a single testing procedure that can achieve the detection boundary for testing $H_0^{\mathrm{GOF}}$ against $H_1^{\mathrm{GOF}}(\Delta_n(\theta),\theta)$ simultaneously over all $\theta \ge 0$. We emphasize the dependence of $\Delta_n$ on $\theta$ since the detection boundary may depend on $\theta$, as suggested by the results from the previous section. To this end, we build upon the test statistic introduced before.

More specifically, write
\[ \varrho_* = \left(\frac{\sqrt{\log\log n}}{n}\right)^{2s} \quad\text{and}\quad m_* = \log_2\left[\varrho_*^{-1}\left(\frac{\sqrt{\log\log n}}{n}\right)^{\frac{2s}{4s+1}}\right]. \]
Then our test statistic is taken to be the maximum of $T_{n,\varrho_n}$ for $\varrho_n = \varrho_*, 2\varrho_*, 2^2\varrho_*, \ldots, 2^{m_*}\varrho_*$:
\[ T_n^{\mathrm{GOF(adapt)}} := \sup_{0\le k\le m_*} T_{n,2^k\varrho_*}, \tag{2.10} \]
where
\[ T_{n,\varrho_n} = (2v_n)^{-1/2}\left[n\eta^2_{\varrho_n}(\hat{\mathbb{P}}_n,\mathbb{P}_0) - A_n\right]. \]

It turns out that, if an appropriate rejection threshold is chosen, $T_n^{\mathrm{GOF(adapt)}}$ can achieve a detection boundary very similar to the one we had before, but now simultaneously over all $\theta > 0$.
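The maximization over the dyadic grid in (2.10) can be sketched as follows. As before, a toy spectral setting is assumed (illustrative choices, not prescribed by the thesis): $\mathbb{P}_0 = \mathrm{Uniform}[0,1]$, $\varphi_k(x) = \sqrt2\cos(k\pi x)$, $\lambda_k = k^{-2s}$, with the expansion truncated.

```python
import numpy as np

def t_stat(x, rho, s=2, n_terms=200):
    # Studentized statistic T_{n,rho} in a toy spectral setting
    # (illustrative assumptions: P0 = Uniform[0,1],
    # phi_k(x) = sqrt(2) cos(k pi x), lambda_k = k^{-2s}).
    n = len(x)
    k = np.arange(1, n_terms + 1)
    lam = k ** (-2.0 * s)
    w = lam / (lam + rho ** 2)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(k, x))
    eta_sq = np.sum(w * phi.mean(axis=1) ** 2)
    a_n = np.sum(w * (phi ** 2).mean(axis=1))
    v_n = np.sum(w ** 2)
    return (n * eta_sq - a_n) / np.sqrt(2.0 * v_n)

def adaptive_gof_test(x, s=2):
    # Maximize T_{n,rho} over the dyadic grid rho_*, 2 rho_*, ..., 2^{m_*} rho_*
    # of (2.10) and compare with the threshold sqrt(3 log log n).
    n = len(x)
    r = np.sqrt(np.log(np.log(n))) / n
    rho_star = r ** (2 * s)
    m_star = int(np.floor(np.log2(r ** (2 * s / (4 * s + 1.0)) / rho_star)))
    t_max = max(t_stat(x, rho_star * 2.0 ** k, s=s) for k in range(m_star + 1))
    return t_max, t_max >= np.sqrt(3.0 * np.log(np.log(n)))

rng = np.random.default_rng(2)
t_alt, reject = adaptive_gof_test(rng.beta(2.0, 1.0, size=1000))
```

The grid requires no knowledge of $\theta$: the maximum picks out whichever $\varrho$ happens to match the unknown smoothness of $d\mathbb{P}/d\mathbb{P}_0$.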


Theorem 5. (i) Under $H_0^{\mathrm{GOF}}$,
\[ \lim_{n\to\infty} P\left(T_n^{\mathrm{GOF(adapt)}} \ge \sqrt{3\log\log n}\right) = 0; \]
(ii) on the other hand, there exists a constant $c_1 > 0$ such that
\[ \lim_{n\to\infty}\ \inf_{\mathbb{P}\in\cup_{\theta\ge 0}\mathcal{P}(\Delta_n(\theta),\theta)} P\left(T_n^{\mathrm{GOF(adapt)}} \ge \sqrt{3\log\log n}\right) = 1, \]
provided that $\Delta_n(\theta) \ge c_1\left(n^{-1}\sqrt{\log\log n}\right)^{\frac{2s}{4s+\theta+1}}$.

Theorem 5 immediately suggests that the test that rejects $H_0^{\mathrm{GOF}}$ if and only if $T_n^{\mathrm{GOF(adapt)}} \ge \sqrt{3\log\log n}$ is consistent for testing it against $H_1^{\mathrm{GOF}}(\Delta_n(\theta),\theta)$ for all $\theta \ge 0$, provided that
\[ \Delta_n(\theta) \ge c_1\left(n^{-1}\sqrt{\log\log n}\right)^{\frac{2s}{4s+\theta+1}}. \]

We note that the detection boundary given in Theorem 5 is similar, but inferior by a factor of $(\log\log n)^{\frac{s}{4s+\theta+1}}$, to that from Theorem 4. As our next result indicates, such an extra factor is indeed unavoidable and is the price one needs to pay for adaptation.

Theorem 6. Let $0 < \theta_1 < \theta_2 < 2s-1$. Then there exists a positive constant $c_2$ such that if
\[ \limsup_{n\to\infty}\ \sup_{\theta\in[\theta_1,\theta_2]}\Delta_n(\theta)\left(\frac{n}{\sqrt{\log\log n}}\right)^{\frac{2s}{4s+\theta+1}} \le c_2, \]
then
\[ \lim_{n\to\infty}\ \inf_{\Phi_n}\left[\mathbb{E}_{\mathbb{P}_0}\Phi_n + \sup_{\theta\in[\theta_1,\theta_2]}\beta(\Phi_n;\Delta_n(\theta),\theta)\right] = 1. \]

Similar to Theorem 4, Theorem 6 shows that there is no consistent test for $H_0^{\mathrm{GOF}}$ against $H_1^{\mathrm{GOF}}(\Delta_n,\theta)$ simultaneously over all $\theta\in[\theta_1,\theta_2]$ if $\Delta_n(\theta) \le c_2\left(n^{-1}\sqrt{\log\log n}\right)^{\frac{2s}{4s+\theta+1}}$ for all $\theta\in[\theta_1,\theta_2]$, for a sufficiently small $c_2$. Together with Theorem 5, this suggests that the above-mentioned adaptive test is indeed rate optimal.


Chapter 3: Gaussian Kernel Embedding

3.1 Test for Goodness-of-fit

Throughout this chapter, we shall consider goodness-of-fit, homogeneity and independence tests. We focus on continuous data, i.e., $\mathcal{X} = \mathbb{R}^d$, and Gaussian kernels, which are arguably the most popular and successful choice in practice.

Among the three testing problems that we consider, it is instructive to begin with the case of goodness-of-fit. Obviously, the choice of kernel $K$ plays an essential role in kernel embedding of distributions. In particular, when data are continuous, Gaussian kernels are commonly used. More specifically, a Gaussian kernel with a scaling parameter $\nu > 0$ is given by
\[ G_{d,\nu}(x,y) = \exp\left(-\nu\|x-y\|_d^2\right), \qquad \forall x,y\in\mathbb{R}^d. \]

Hereafter $\|\cdot\|_d$ stands for the usual Euclidean norm in $\mathbb{R}^d$. For brevity, we shall suppress the subscript $d$ in both $\|\cdot\|$ and $G$ when the dimensionality is clear from the context. When $\mathbb{P}$ and $\mathbb{Q}$ are probability distributions defined over $\mathcal{X} = \mathbb{R}^d$, we shall write the MMD between them with a Gaussian kernel and scaling parameter $\nu$ as $\gamma_\nu(\mathbb{P},\mathbb{Q})$, where the subscript signifies the specific value of the scaling parameter.

We shall restrict our attention to distributions with smooth densities. Denote by $\mathcal{W}^{s,2}_d$ the $s$th order Sobolev space in $\mathbb{R}^d$, that is,
\[ \mathcal{W}^{s,2}_d = \left\{f:\mathbb{R}^d\to\mathbb{R}\ \middle|\ f\text{ is almost surely continuous and }\int(1+\|\omega\|^2)^s|\mathcal{F}(f)(\omega)|^2 d\omega < \infty\right\}, \]


where F ( ๐‘“ ) is the Fourier transform of ๐‘“ :

F ( ๐‘“ ) (๐œ”) = 1(2๐œ‹)๐‘‘/2

โˆซR๐‘‘๐‘“ (๐‘ฅ)๐‘’โˆ’๐‘–๐‘ฅ>๐œ”๐‘‘๐‘ฅ.

In what follows, we shall again abbreviate the subscript $d$ in $\mathcal{W}^{s,2}_d$ when it is clear from the context. For any $f\in\mathcal{W}^{s,2}$, we shall write
\[ \|f\|^2_{\mathcal{W}^{s,2}} = \int_{\mathbb{R}^d}(1+\|\omega\|^2)^s|\mathcal{F}(f)(\omega)|^2 d\omega. \]
Let $p$ and $p_0$ be the density functions of $\mathbb{P}$ and $\mathbb{P}_0$ respectively. We are interested in the case when both $p$ and $p_0$ are elements of $\mathcal{W}^{s,2}$.

Note that we can rewrite the null hypothesis $H_0^{\mathrm{GOF}}$ in terms of density functions: $H_0^{\mathrm{GOF}}: p = p_0$ for some prespecified density $p_0\in\mathcal{W}^{s,2}$. To better quantify the power of a test, we shall consider testing against an alternative that is increasingly closer to the null as the sample size $n$ increases:
\[ H_1^{\mathrm{GOF}}(\Delta_n;s): p\in\mathcal{W}^{s,2}(M),\ \|p-p_0\|_{L_2}\ge\Delta_n, \]
where
\[ \mathcal{W}^{s,2}(M) = \left\{f\in\mathcal{W}^{s,2}: \|f\|_{\mathcal{W}^{s,2}}\le M\right\} \quad\text{and}\quad \|f\|^2_{L_2} = \int_{\mathbb{R}^d} f^2(x)dx. \]

The alternative hypothesis $H_1^{\mathrm{GOF}}(\Delta_n;s)$ is composite, and the power of a test $\Phi$ based on $X_1,\ldots,X_n\sim p$ is therefore defined as
\[ \mathrm{power}(\Phi;H_1^{\mathrm{GOF}}(\Delta_n;s)) := \inf_{p\in\mathcal{W}^{s,2}(M),\,\|p-p_0\|_{L_2}\ge\Delta_n} P\{\Phi\text{ rejects }H_0^{\mathrm{GOF}}\}. \]


Let
\[ \bar G_\nu(x,y;\mathbb{P}_0) = G_\nu(x,y) - \mathbb{E}_{X\sim\mathbb{P}_0}G_\nu(X,y) - \mathbb{E}_{X\sim\mathbb{P}_0}G_\nu(x,X) + \mathbb{E}_{X,X'\overset{iid}{\sim}\mathbb{P}_0}G_\nu(X,X'), \]
and recall that
\[ \gamma^2_\nu(\hat{\mathbb{P}}_n,\mathbb{P}_0) = \frac{1}{n^2}\sum_{i,j=1}^n\bar G_\nu(X_i,X_j). \]

As in Chapter 2, we correct for bias and use instead the following $U$-statistic:
\[ \tilde\gamma^2_\nu(\mathbb{P},\mathbb{P}_0) := \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\bar G_\nu(X_i,X_j), \]
which we shall focus on in what follows.
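For a concrete instance, this $U$-statistic can be evaluated exactly when $\mathbb{P}_0 = N(0,1)$ and $d = 1$, since the centering expectations of a Gaussian kernel under a normal law have closed forms; the specific $\mathbb{P}_0$ and sampling distributions below are illustrative assumptions.

```python
import numpy as np

def gof_ustat(x, nu):
    # Unbiased Gaussian-kernel statistic tilde-gamma^2_nu(P, P0) for the
    # goodness-of-fit problem with P0 = N(0,1) (illustrative choice; the
    # centering terms m1, m2 are the closed forms for this P0):
    #   E_{X~N(0,1)} exp(-nu (X - y)^2) = (1+2nu)^{-1/2} exp(-nu y^2/(1+2nu))
    #   E exp(-nu (X - X')^2)            = (1+4nu)^{-1/2}
    n = len(x)
    G = np.exp(-nu * (x[:, None] - x[None, :]) ** 2)
    m1 = np.exp(-nu * x ** 2 / (1 + 2 * nu)) / np.sqrt(1 + 2 * nu)
    m2 = 1.0 / np.sqrt(1 + 4 * nu)
    Gbar = G - m1[:, None] - m1[None, :] + m2
    np.fill_diagonal(Gbar, 0.0)            # keep only i != j terms
    return Gbar.sum() / (n * (n - 1))

rng = np.random.default_rng(3)
stat_null = gof_ustat(rng.normal(size=800), nu=1.0)           # p = p0
stat_alt = gof_ustat(rng.normal(loc=0.7, size=800), nu=1.0)   # shifted p
```

Because the statistic is exactly unbiased, it fluctuates around zero under the null and around $\gamma^2_\nu(\mathbb{P},\mathbb{P}_0) > 0$ under the shift alternative.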

The choice of the scaling parameter $\nu$ is essential when using RKHS embedding for goodness-of-fit testing. While the importance of a data-driven choice of $\nu$ is widely recognized in practice, almost all existing theoretical studies assume that a fixed kernel, and therefore a fixed scaling parameter, is used. Here we shall demonstrate the benefit of using a data-driven scaling parameter, and especially of choosing a scaling parameter that diverges with the sample size.

More specifically, we argue that, with appropriate scaling, $\tilde\gamma^2_\nu(\mathbb{P},\mathbb{P}_0)$ can be viewed as an estimate of $\|p-p_0\|^2_{L_2}$ when $\nu\to\infty$ as $n\to\infty$. Note that
\[ \int(p-p_0)^2 = \int p^2 - 2\int p\cdot p_0 + \int p_0^2. \]

The first term can be estimated by
\[ \int p^2 \approx \frac1n\sum_{i=1}^n p(X_i) \approx \frac1n\sum_{i=1}^n\hat p_{h,-i}(X_i), \]
where $\hat p_{h,-i}$ is a kernel density estimate of $p$ with the $i$th observation removed and bandwidth $h$:
\[ \hat p_{h,-i}(x) = \frac{1}{(n-1)(2\pi h^2)^{d/2}}\sum_{j\ne i}G_{(2h^2)^{-1}}(x,X_j). \]


Thus, we can estimate $\int p^2$ by
\[ \frac{1}{n(n-1)(2\pi h^2)^{d/2}}\sum_{1\le i\ne j\le n}G_{(2h^2)^{-1}}(X_i,X_j). \]
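In code, this pairwise estimate of $\int p^2$ reads as follows (a sketch for $d = 1$; the $N(0,1)$ sampling distribution is purely illustrative, chosen because $\int p^2 = 1/(2\sqrt\pi)$ is known exactly there):

```python
import numpy as np

def int_p_squared(x, h):
    # Pairwise-kernel estimate of ∫ p^2 from the display above, for d = 1:
    # (n(n-1))^{-1} (2 pi h^2)^{-1/2} * sum_{i != j} exp(-(X_i-X_j)^2/(2h^2)).
    n = len(x)
    G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * h ** 2))
    np.fill_diagonal(G, 0.0)
    return G.sum() / (n * (n - 1) * np.sqrt(2 * np.pi * h ** 2))

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
est = int_p_squared(x, h=0.3)
truth = 1 / (2 * np.sqrt(np.pi))   # ∫ phi^2 for the standard normal density
```

Shrinking $h$ with $n$ reduces the smoothing bias, exactly as letting $\nu\to\infty$ does for the kernel-embedding statistic.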

Similarly, the cross-product term can be estimated by
\[ \int p\cdot p_0 \approx \int\hat p_h(x)p_0(x)dx = \frac{1}{n(2\pi h^2)^{d/2}}\sum_{i=1}^n\int G_{(2h^2)^{-1}}(x,X_i)p_0(x)dx. \]

Together, we can view
\[ \frac{1}{n(n-1)(2\pi h^2)^{d/2}}\sum_{1\le i\ne j\le n}\bar G_{(2h^2)^{-1}}(X_i,X_j) \]
as an estimate of $\int(p-p_0)^2$. Following standard asymptotic properties of the kernel density estimator (see, e.g., Tsybakov, 2008), we know that
\[ (\pi/\nu)^{-d/2}\tilde\gamma^2_\nu(\mathbb{P},\mathbb{P}_0) \to_p \|p-p_0\|^2_{L_2} \]
if $\nu\to\infty$ in such a fashion that $\nu = o(n^{4/d})$. Motivated by this observation, we shall now consider testing $H_0^{\mathrm{GOF}}$ using $\tilde\gamma^2_\nu(\mathbb{P},\mathbb{P}_0)$ with a diverging $\nu$. To signify the dependence of $\nu$ on the sample size, we shall add a subscript $n$ in what follows.

Under ๐ปGOF0 , it is clear E๐›พ2

a๐‘› (P, P0) = 0. Note also that

var(๐›พ2a๐‘› (P, P0))

=2

๐‘›(๐‘› โˆ’ 1)E[๏ฟฝ๏ฟฝa๐‘› (๐‘‹1, ๐‘‹2)

]2

=2

๐‘›(๐‘› โˆ’ 1)

[E

[๐บa๐‘› (๐‘‹1, ๐‘‹2)

]2 โˆ’ 2E[๐บa๐‘› (๐‘‹1, ๐‘‹2)๐บa๐‘› (๐‘‹1, ๐‘‹3)] +(E

[๐บa๐‘› (๐‘‹1, ๐‘‹2)

] )2]

=2

๐‘›(๐‘› โˆ’ 1)

[E๐บ2a๐‘› (๐‘‹1, ๐‘‹2) โˆ’ 2E[๐บa๐‘› (๐‘‹1, ๐‘‹2)๐บa๐‘› (๐‘‹1, ๐‘‹3)] +

(E

[๐บa๐‘› (๐‘‹1, ๐‘‹2)

] )2]. (3.1)


Simple calculations yield
\[ \mathrm{var}(\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0)) = \frac{2(\pi/(2\nu_n))^{d/2}}{n^2}\cdot\|p_0\|^2_{L_2}\cdot(1+o(1)), \]

assuming that $\nu_n\to\infty$. We shall show that
\[ \frac{n}{\sqrt2}\left(\frac{2\nu_n}{\pi}\right)^{d/4}\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0) \to_d N\left(0,\|p_0\|^2_{L_2}\right). \]

To use this as a test statistic, however, we will need to estimate $\mathrm{var}(\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0))$. To this end, it is natural to consider estimating each of the three terms on the rightmost side of (3.1) by $U$-statistics:
\begin{align*}
s^2_{n,\nu_n} = {} & \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^2_{\nu_n}(X_i,X_j) \\
& - \frac{2(n-3)!}{n!}\sum_{\substack{1\le i,j_1,j_2\le n\\ |\{i,j_1,j_2\}|=3}}G_{\nu_n}(X_i,X_{j_1})G_{\nu_n}(X_i,X_{j_2}) \\
& + \frac{(n-4)!}{n!}\sum_{\substack{1\le i_1,i_2,j_1,j_2\le n\\ |\{i_1,i_2,j_1,j_2\}|=4}}G_{\nu_n}(X_{i_1},X_{j_1})G_{\nu_n}(X_{i_2},X_{j_2}).
\end{align*}

Note that ๐‘ 2๐‘›,a๐‘› is not always positive. To avoid a negative estimate of the variance, we can replace

it with a sufficiently small value, say 1/๐‘›2, whenever it is negative or too small. Namely, let

๏ฟฝ๏ฟฝ2๐‘›,a๐‘› = max{๐‘ 2๐‘›,a๐‘› , 1/๐‘›

2} ,and consider a test statistic:

๐‘‡GOF๐‘›,a๐‘›

:=๐‘›โˆš

2๏ฟฝ๏ฟฝโˆ’1๐‘›,a๐‘›

๐›พ2a๐‘› (P, P0).

We have


Theorem 7. Let $\nu_n\to\infty$ as $n\to\infty$ in such a fashion that $\nu_n = o(n^{4/d})$. Then, under $H_0^{\mathrm{GOF}}$,
\[ \frac{n}{\sqrt2}\left(\frac{2\nu_n}{\pi}\right)^{d/4}\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0)\to_d N\left(0,\|p_0\|^2_{L_2}\right). \tag{3.2} \]
Moreover,
\[ T^{\mathrm{GOF}}_{n,\nu_n}\to_d N(0,1). \tag{3.3} \]

Theorem 7 immediately implies that the test, denoted by $\Phi^{\mathrm{GOF}}_{n,\nu_n,\alpha}$ ($\alpha\in(0,1)$), which rejects $H_0^{\mathrm{GOF}}$ if and only if $T^{\mathrm{GOF}}_{n,\nu_n}$ exceeds $z_\alpha$, the upper $\alpha$ quantile of the standard normal distribution, is an asymptotic $\alpha$-level test.

We now proceed to study its power against a smooth alternative. Following the same argument as before, it can be shown that
\[ \frac{1}{n(n-1)(\pi/\nu_n)^{d/2}}\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j)\to_p\|p-p_0\|^2_{L_2} \]
and
\[ (2\nu_n/\pi)^{d/2}\hat\sigma^2_{n,\nu_n}\to_p\|p\|^2_{L_2}, \]
so that
\[ n^{-1}(\nu_n/(2\pi))^{d/4}T^{\mathrm{GOF}}_{n,\nu_n}\to_p\|p-p_0\|^2_{L_2}/\|p\|_{L_2}. \]

This immediately implies that, if $\nu_n\to\infty$ in such a manner that $\nu_n = o(n^{4/d})$, then $\Phi^{\mathrm{GOF}}_{n,\nu_n,\alpha}$ is consistent for any fixed $p\ne p_0$ in that its power converges to one. In fact, as $n$ increases, more and more subtle deviations from $p_0$ can be detected by $\Phi^{\mathrm{GOF}}_{n,\nu_n,\alpha}$. A refined analysis of the asymptotic behavior of $T^{\mathrm{GOF}}_{n,\nu_n}$ yields the following.

Theorem 8. Assume that $n^{2s/(d+4s)}\Delta_n\to\infty$. Then for any $\alpha\in(0,1)$,
\[ \lim_{n\to\infty}\mathrm{power}\{\Phi^{\mathrm{GOF}}_{n,\nu_n,\alpha};H_1^{\mathrm{GOF}}(\Delta_n;s)\} = 1, \]
provided that $\nu_n\asymp n^{4/(d+4s)}$.

In other words, $\Phi^{\mathrm{GOF}}_{n,\nu_n,\alpha}$ has a detection boundary of the order $O(n^{-2s/(d+4s)})$, which turns out to be minimax optimal in that no other test can attain a detection boundary with a faster rate of convergence. More precisely, we have the following.

Theorem 9. Assume that $\liminf_{n\to\infty}n^{2s/(d+4s)}\Delta_n < \infty$ and $p_0$ is a density such that $\|p_0\|_{\mathcal{W}^{s,2}} < M$. Then there exists some $\alpha\in(0,1)$ such that for any test $\Phi_n$ of level $\alpha$ (asymptotically) based on $X_1,\ldots,X_n\sim p$,
\[ \liminf_{n\to\infty}\mathrm{power}\{\Phi_n;H_1^{\mathrm{GOF}}(\Delta_n;s)\} < 1. \]

Together, Theorems 8 and 9 suggest that Gaussian kernel embedding of distributions is especially suitable for testing against smooth alternatives, and that it yields a test that can consistently detect the smallest departures, in terms of rate of convergence, from the null distribution. The idea can also be readily applied to testing of homogeneity and independence, which we shall examine next.

3.2 Test for Homogeneity

As in the case of the goodness-of-fit test, we shall consider the case when the underlying distributions have smooth densities, so that we can rewrite the null hypothesis as $H_0^{\mathrm{HOM}}: p = q\in\mathcal{W}^{s,2}(M)$, and the alternative hypothesis as
\[ H_1^{\mathrm{HOM}}(\Delta_n;s): p,q\in\mathcal{W}^{s,2}(M),\ \|p-q\|_{L_2}\ge\Delta_n. \]

The power of a test $\Phi$ based on $X_1,\ldots,X_n\sim p$ and $Y_1,\ldots,Y_m\sim q$ is given by
\[ \mathrm{power}(\Phi;H_1^{\mathrm{HOM}}(\Delta_n;s)) := \inf_{p,q\in\mathcal{W}^{s,2}(M),\,\|p-q\|_{L_2}\ge\Delta_n}P\{\Phi\text{ rejects }H_0^{\mathrm{HOM}}\}. \]


To fix ideas, we shall also assume that $c\le m/n\le C$ for some constants $0 < c\le C < \infty$. In addition, we shall express explicitly only the dependence on $n$ and not $m$, for brevity. Our treatment, however, can be straightforwardly extended to more general situations.

Recall that
\begin{align*}
\gamma^2_{\nu_n}(\hat{\mathbb{P}}_n,\hat{\mathbb{Q}}_m) = {} & \frac{1}{n^2}\sum_{1\le i,j\le n}G_{\nu_n}(X_i,X_j) + \frac{1}{m^2}\sum_{1\le i,j\le m}G_{\nu_n}(Y_i,Y_j) \\
& - \frac{2}{mn}\sum_{i=1}^n\sum_{j=1}^m G_{\nu_n}(X_i,Y_j).
\end{align*}

As before, to reduce bias, we shall focus instead on a closely related estimate of $\gamma_{\nu_n}(\mathbb{P},\mathbb{Q})$:
\begin{align*}
\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) = {} & \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G_{\nu_n}(X_i,X_j) + \frac{1}{m(m-1)}\sum_{1\le i\ne j\le m}G_{\nu_n}(Y_i,Y_j) \\
& - \frac{2}{mn}\sum_{i=1}^n\sum_{j=1}^m G_{\nu_n}(X_i,Y_j).
\end{align*}

It is easy to see that under $H_0^{\mathrm{HOM}}$,
\[ \mathbb{E}\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) = 0 \]
and
\[ \mathrm{var}\left(\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})\right) = 2\left(\frac{1}{n(n-1)}+\frac{2}{mn}+\frac{1}{m(m-1)}\right)\mathbb{E}_{(X,Y)\sim\mathbb{P}\otimes\mathbb{Q}}\bar G^2_{\nu_n}(X,Y), \]
where
\[ \bar G_{\nu_n}(x,y) = G_{\nu_n}(x,y) - \mathbb{E}_{X\sim\mathbb{P}}G_{\nu_n}(X,y) - \mathbb{E}_{Y\sim\mathbb{Q}}G_{\nu_n}(x,Y) + \mathbb{E}_{(X,Y)\sim\mathbb{P}\otimes\mathbb{Q}}G_{\nu_n}(X,Y). \]


It is therefore natural to consider estimating the variance by $\hat\sigma^2_{n,m,\nu_n} = \max\{s^2_{n,m,\nu_n},1/n^2\}$, where
\begin{align*}
s^2_{n,m,\nu_n} = {} & \frac{1}{N(N-1)}\sum_{1\le i\ne j\le N}G^2_{\nu_n}(Z_i,Z_j) \\
& - \frac{2(N-3)!}{N!}\sum_{\substack{1\le i,j_1,j_2\le N\\ |\{i,j_1,j_2\}|=3}}G_{\nu_n}(Z_i,Z_{j_1})G_{\nu_n}(Z_i,Z_{j_2}) \\
& + \frac{(N-4)!}{N!}\sum_{\substack{1\le i_1,i_2,j_1,j_2\le N\\ |\{i_1,i_2,j_1,j_2\}|=4}}G_{\nu_n}(Z_{i_1},Z_{j_1})G_{\nu_n}(Z_{i_2},Z_{j_2}),
\end{align*}
$N = n+m$, and $Z_i = X_i$ if $i\le n$ and $Z_i = Y_{i-n}$ if $i > n$. This leads to the following test statistic:

๐‘‡HOM๐‘›,a๐‘›

=๐‘›๐‘š

โˆš2(๐‘› + ๐‘š)

ยท ๏ฟฝ๏ฟฝโˆ’1๐‘›,๐‘š,a๐‘›

ยท ๐›พ2a๐‘› (P,Q).

As before, we can show

Theorem 10. Let $\nu_n\to\infty$ as $n\to\infty$ in such a fashion that $\nu_n = o(n^{4/d})$. Then under $H_0^{\mathrm{HOM}}: p = q\in\mathcal{W}^{s,2}(M)$,
\[ T^{\mathrm{HOM}}_{n,\nu_n}\to_d N(0,1),\quad\text{as } n\to\infty. \]

Motivated by Theorem 10, we can consider a test, denoted by $\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha}$, that rejects $H_0^{\mathrm{HOM}}$ if and only if $T^{\mathrm{HOM}}_{n,\nu_n}$ exceeds $z_\alpha$. By construction, $\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha}$ is an asymptotic $\alpha$-level test. We now turn to study its power against $H_1^{\mathrm{HOM}}$. As in the case of the goodness-of-fit test, we can prove that $\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha}$ is minimax optimal in that it can detect the smallest difference between $p$ and $q$ in terms of rate of convergence. More precisely, we have the following.

Theorem 11. (i) Assume that $n^{2s/(d+4s)}\Delta_n\to\infty$. Then for any $\alpha\in(0,1)$,
\[ \lim_{n\to\infty}\mathrm{power}\{\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha};H_1^{\mathrm{HOM}}(\Delta_n;s)\} = 1, \]
provided that $\nu_n\asymp n^{4/(d+4s)}$.

(ii) Conversely, if $\liminf_{n\to\infty}n^{2s/(d+4s)}\Delta_n < \infty$, then there exists some $\alpha\in(0,1)$ such that for any test $\Phi_n$ of level $\alpha$ (asymptotically) based on $X_1,\ldots,X_n\sim p$ and $Y_1,\ldots,Y_m\sim q$,
\[ \liminf_{n\to\infty}\mathrm{power}\{\Phi_n;H_1^{\mathrm{HOM}}(\Delta_n;s)\} < 1. \]

3.3 Test for Independence

Similarly, we can also use Gaussian kernel embedding to construct minimax optimal tests of independence. Let $X = (X^1,\ldots,X^k)^\top\in\mathbb{R}^d$ be a random vector whose subvectors $X^j\in\mathbb{R}^{d_j}$ for $j = 1,\ldots,k$, so that $d_1+\cdots+d_k = d$. Denote by $p$ the joint density function of $X$, and by $p_j$ the marginal density of $X^j$. We assume that both the joint density and the marginal densities are smooth. Specifically, we shall consider testing
\[ H_0^{\mathrm{IND}}: p = p_1\otimes\cdots\otimes p_k,\quad p_j\in\mathcal{W}^{s,2}(M_j),\ 1\le j\le k, \]
against a smooth departure from independence:
\[ H_1^{\mathrm{IND}}(\Delta_n;s): p\in\mathcal{W}^{s,2}(M),\ p_j\in\mathcal{W}^{s,2}(M_j),\ 1\le j\le k,\ \text{and}\ \|p-p_1\otimes\cdots\otimes p_k\|_{L_2}\ge\Delta_n, \]
where $M = \prod_{j=1}^k M_j$, so that $p_1\otimes\cdots\otimes p_k\in\mathcal{W}^{s,2}(M)$ under both the null and alternative hypotheses.

Given a sample $\{X_1,\ldots,X_n\}$ of independent copies of $X$, we can naturally estimate the so-called dHSIC $\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\cdots\otimes\mathbb{P}^{X^k})$ by
\begin{align*}
\gamma^2_{\nu_n}(\hat{\mathbb{P}}_n,\hat{\mathbb{P}}^{X^1}_n\otimes\cdots\otimes\hat{\mathbb{P}}^{X^k}_n) = {} & \frac{1}{n^2}\sum_{1\le i,j\le n}G_{\nu_n}(X_i,X_j) \\
& + \frac{1}{n^{2k}}\sum_{1\le i_1,\ldots,i_k,j_1,\ldots,j_k\le n}G_{\nu_n}\left((X^1_{i_1},\ldots,X^k_{i_k}),(X^1_{j_1},\ldots,X^k_{j_k})\right) \\
& - \frac{2}{n^{k+1}}\sum_{1\le i,j_1,\ldots,j_k\le n}G_{\nu_n}\left(X_i,(X^1_{j_1},\ldots,X^k_{j_k})\right).
\end{align*}


To correct for the bias, we shall consider the following estimate of $\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\cdots\otimes\mathbb{P}^{X^k})$ instead:
\begin{align*}
\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\cdots\otimes\mathbb{P}^{X^k}) = {} & \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G_{\nu_n}(X_i,X_j) \\
& + \frac{(n-2k)!}{n!}\sum_{\substack{1\le i_1,\cdots,i_k,j_1,\cdots,j_k\le n\\ |\{i_1,\cdots,i_k,j_1,\cdots,j_k\}|=2k}}G_{\nu_n}\left((X^1_{i_1},\ldots,X^k_{i_k}),(X^1_{j_1},\ldots,X^k_{j_k})\right) \\
& - \frac{2(n-k-1)!}{n!}\sum_{\substack{1\le i,j_1,\cdots,j_k\le n\\ |\{i,j_1,\cdots,j_k\}|=k+1}}G_{\nu_n}\left(X_i,(X^1_{j_1},\ldots,X^k_{j_k})\right).
\end{align*}

Under ๐ปIND0 , we have

E๐›พ2a๐‘› (P, P

๐‘‹1 โŠ— ยท ยท ยท โŠ— P๐‘‹ ๐‘˜ ) = 0.

Deriving its variance, however, requires a bit more work. Write

โ„Ž ๐‘— (๐‘ฅ ๐‘— , ๐‘ฆ) = E๐‘‹โˆผP๐‘‹1โŠ—ยทยทยทโŠ—P๐‘‹๐‘˜๐บa๐‘› ((๐‘‹1, . . . , ๐‘‹ ๐‘—โˆ’1, ๐‘ฅ ๐‘— , ๐‘‹ ๐‘—+1, . . . , ๐‘‹ ๐‘˜ ), ๐‘ฆ)

and

๐‘” ๐‘— (๐‘ฅ ๐‘— , ๐‘ฆ) = โ„Ž ๐‘— (๐‘ฅ ๐‘— , ๐‘ฆ) โˆ’ E๐‘‹ ๐‘—โˆผP๐‘‹ ๐‘— โ„Ž ๐‘— (๐‘‹๐‘— , ๐‘ฆ) โˆ’ E๐‘ŒโˆผPโ„Ž ๐‘— (๐‘ฅ ๐‘— , ๐‘Œ ) + E(๐‘‹ ๐‘— ,๐‘Œ )โˆผP๐‘‹ ๐‘— โŠ—Pโ„Ž ๐‘— (๐‘‹

๐‘— , ๐‘Œ ).

With slight abuse of notation, also denote by
\[ h_{j_1,j_2}(x^{j_1},y^{j_2}) = \mathbb{E}_{X,Y\overset{iid}{\sim}\mathbb{P}^{X^1}\otimes\cdots\otimes\mathbb{P}^{X^k}}G_{\nu_n}\left((X^1,\ldots,X^{j_1-1},x^{j_1},X^{j_1+1},\ldots,X^k),(Y^1,\ldots,Y^{j_2-1},y^{j_2},Y^{j_2+1},\ldots,Y^k)\right) \]
and
\begin{align*}
g_{j_1,j_2}(x^{j_1},y^{j_2}) = {} & h_{j_1,j_2}(x^{j_1},y^{j_2}) - \mathbb{E}_{X^{j_1}\sim\mathbb{P}^{X^{j_1}}}h_{j_1,j_2}(X^{j_1},y^{j_2}) \\
& - \mathbb{E}_{X^{j_2}\sim\mathbb{P}^{X^{j_2}}}h_{j_1,j_2}(x^{j_1},X^{j_2}) + \mathbb{E}_{(X^{j_1},Y^{j_2})\sim\mathbb{P}^{X^{j_1}}\otimes\mathbb{P}^{X^{j_2}}}h_{j_1,j_2}(X^{j_1},Y^{j_2}).
\end{align*}

Then we have

Lemma 1. Under $H_0^{\mathrm{IND}}$,
\begin{align*}
\mathrm{var}\left(\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\cdots\otimes\mathbb{P}^{X^k})\right) = {} & \frac{2}{n(n-1)}\bigg(\mathbb{E}\bar G^2_{\nu_n}(X,Y) - 2\sum_{1\le j\le k}\mathbb{E}\left(g_j(X^j,Y)\right)^2 \\
& + \sum_{1\le j_1,j_2\le k}\mathbb{E}\left(g_{j_1,j_2}(X^{j_1},Y^{j_2})\right)^2\bigg) + O\left(\mathbb{E}G^2_{\nu_n}(X,Y)/n^3\right). \tag{3.4}
\end{align*}

In light of Lemma 1, a variance estimator can be derived by estimating the leading term on the right-hand side of (3.4) term by term using $U$-statistics. Formulae for estimating the variance for general $k$ are tedious, and we defer them to the appendix for space consideration. In the special case when $k = 2$, the leading term on the right-hand side of (3.4) takes a much simplified form:
\[ \frac{2}{n(n-1)}\mathbb{E}\left[\bar G_{\nu_n}(X^1,Y^1)\right]^2\cdot\mathbb{E}\left[\bar G_{\nu_n}(X^2,Y^2)\right]^2, \]
where $X^j,Y^j\overset{iid}{\sim}\mathbb{P}^{X^j}$ for $j = 1,2$. Thus, we can estimate $\mathbb{E}[\bar G_{\nu_n}(X^j,Y^j)]^2$ by

๐‘ 2๐‘›, ๐‘— ,a๐‘› =1

๐‘›(๐‘› โˆ’ 1)โˆ‘

1โ‰ค๐‘–1โ‰ ๐‘–2โ‰ค๐‘›๐บ2a๐‘› (๐‘‹

๐‘—

๐‘–1, ๐‘‹

๐‘—

๐‘–2)

โˆ’ 2(๐‘› โˆ’ 3)!๐‘›!

โˆ‘1โ‰ค๐‘–,๐‘™1,๐‘™2โ‰ค๐‘›|{๐‘–,๐‘™1,๐‘™2}|=3

๐บa๐‘› (๐‘‹๐‘—

๐‘–, ๐‘‹

๐‘—

๐‘™1)๐บa๐‘› (๐‘‹

๐‘—

๐‘–, ๐‘‹

๐‘—

๐‘™2)

+ (๐‘› โˆ’ 4)!๐‘›!

โˆ‘1โ‰ค๐‘–1,๐‘–2,๐‘™1,๐‘™2โ‰ค๐‘›|{๐‘–1,๐‘–2,๐‘™1,๐‘™2}|=4

๐บa๐‘› (๐‘‹๐‘—

๐‘–1, ๐‘‹

๐‘—

๐‘™1)๐บa๐‘› (๐‘‹

๐‘—

๐‘–2, ๐‘‹

๐‘—

๐‘™2)


and $\mathrm{var}(\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}))$ by $2/[n(n-1)]\cdot\hat\sigma^2_{n,\nu_n}$, where
\[ \hat\sigma^2_{n,\nu_n} := \max\left\{s^2_{n,1,\nu_n}s^2_{n,2,\nu_n},\,1/n^2\right\}, \]
so that a test statistic for $H_0^{\mathrm{IND}}$ is
\[ T^{\mathrm{IND}}_{n,\nu_n} := \frac{n}{\sqrt2}\,\hat\sigma^{-1}_{n,\nu_n}\,\tilde\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}). \]

Test statistics for general $k > 2$ can be defined accordingly. Again, we have

Theorem 12. Let $\nu_n\to\infty$ as $n\to\infty$ in such a fashion that $\nu_n = o(n^{4/d})$. Then under $H_0^{\mathrm{IND}}$,
\[ T^{\mathrm{IND}}_{n,\nu_n}\to_d N(0,1),\quad\text{as } n\to\infty. \]

Motivated by Theorem 12, we can consider a test, denoted by $\Phi^{\mathrm{IND}}_{n,\nu_n,\alpha}$, that rejects $H_0^{\mathrm{IND}}$ if and only if $T^{\mathrm{IND}}_{n,\nu_n}$ exceeds $z_\alpha$. By construction, $\Phi^{\mathrm{IND}}_{n,\nu_n,\alpha}$ is an asymptotic $\alpha$-level test. We now turn to study its power against $H_1^{\mathrm{IND}}$. As in the case of the goodness-of-fit test, we can prove that $\Phi^{\mathrm{IND}}_{n,\nu_n,\alpha}$ is minimax optimal in that it can detect the smallest departure from independence in terms of rate of convergence. More precisely, we have the following.

Theorem 13. (i) Assume that $n^{2s/(d+4s)}\Delta_n\to\infty$. Then for any $\alpha\in(0,1)$,
\[ \lim_{n\to\infty}\mathrm{power}\{\Phi^{\mathrm{IND}}_{n,\nu_n,\alpha};H_1^{\mathrm{IND}}(\Delta_n;s)\} = 1, \]
provided that $\nu_n\asymp n^{4/(d+4s)}$.

(ii) Conversely, if $\liminf_{n\to\infty}n^{2s/(d+4s)}\Delta_n < \infty$, then there exists some $\alpha\in(0,1)$ such that for any test $\Phi_n$ of level $\alpha$ (asymptotically) based on $X_1,\ldots,X_n\sim p$,
\[ \liminf_{n\to\infty}\mathrm{power}\{\Phi_n;H_1^{\mathrm{IND}}(\Delta_n;s)\} < 1. \]


3.4 Adaptation

The results presented in the previous sections not only suggest that Gaussian kernel embedding of distributions is especially suitable for testing against smooth alternatives, but also indicate the importance of choosing an appropriate scaling parameter in order to detect small deviations from the null hypothesis. To achieve maximum power, the scaling parameter should be chosen according to the smoothness of the underlying density functions. This, however, presents a practical challenge because the level of smoothness is rarely known a priori. This naturally brings about the question of adaptation: can we devise an agnostic testing procedure that does not require such knowledge but still attains similar performance? We shall show in this section that this is possible, at least for sufficiently smooth densities.

3.4.1 Test for Goodness-of-fit

We again begin with the test for goodness-of-fit. As we show in Section 3.1, under $H_0^{\mathrm{GOF}}$, $T^{\mathrm{GOF}}_{n,a_n} \to_d N(0,1)$ if $1 \ll a_n \ll n^{4/d}$; whereas for any $p\in\mathcal{W}^{s,2}$ such that $\|p-p_0\|_{L_2} \gtrsim n^{-2s/(d+4s)}$, $T^{\mathrm{GOF}}_{n,a_n} \to \infty$ provided that $a_n \asymp n^{4/(d+4s)}$. This motivates us to consider the following test statistic:

$$T^{\mathrm{GOF(adapt)}}_{n} = \max_{1\le a_n\le n^{2/d}}T^{\mathrm{GOF}}_{n,a_n}.$$

In light of the earlier discussion, it is plausible that such a statistic could be used to detect any smooth departure from the null provided that the level of smoothness satisfies $s \ge d/4$. We now argue that this is indeed the case. More specifically, we shall proceed to reject $H_0^{\mathrm{GOF}}$ if and only if $T^{\mathrm{GOF(adapt)}}_{n}$ exceeds the upper $\alpha$ quantile, denoted by $q^{\mathrm{GOF}}_{n,\alpha}$, of its null distribution. In what follows, we shall call this test $\Phi^{\mathrm{GOF(adapt)}}$. Note that, even though it is hard to derive an analytic form for $q^{\mathrm{GOF}}_{n,\alpha}$, it can be readily evaluated via the Monte Carlo method.
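This maximize-then-calibrate recipe can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: `stat(sample, a)` (standing in for $T^{\mathrm{GOF}}_{n,a}$) and the null sampler `rvs(n, rng)` are hypothetical user-supplied callables.

```python
import numpy as np

def adaptive_stat(sample, stat, a_grid):
    # T_n^{GOF(adapt)}: maximize the statistic over a grid of scaling parameters.
    return max(stat(sample, a) for a in a_grid)

def mc_null_quantile(stat, rvs, n, a_grid, alpha=0.05, n_mc=200, rng=None):
    # Approximate the upper-alpha quantile q_{n,alpha} of the null distribution
    # by drawing Monte Carlo samples of size n from the null distribution.
    rng = np.random.default_rng(rng)
    null_stats = [adaptive_stat(rvs(n, rng), stat, a_grid) for _ in range(n_mc)]
    return float(np.quantile(null_stats, 1.0 - alpha))

def gof_adapt_test(sample, stat, rvs, a_grid, alpha=0.05, n_mc=200, rng=None):
    # Reject H_0^{GOF} iff the adaptive statistic exceeds the estimated quantile.
    q = mc_null_quantile(stat, rvs, len(sample), a_grid, alpha, n_mc, rng)
    return adaptive_stat(sample, stat, a_grid) > q
```

The same pattern also serves the homogeneity and independence variants, with the Monte Carlo draws replaced by permutations of the observed data.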

To study the power of ฮฆGOF(adapt) against ๐ปGOF1 with different levels of smoothness, we shall


consider the following alternative hypothesis:

$$H_1^{\mathrm{GOF(adapt)}}(\Delta_{n,s} : s\ge d/4) : \ p \in \bigcup_{s\ge d/4}\bigl\{p\in\mathcal{W}^{s,2}(M) : \|p-p_0\|_{L_2} \ge \Delta_{n,s}\bigr\}.$$

The following theorem characterizes the power of $\Phi^{\mathrm{GOF(adapt)}}$ against $H_1^{\mathrm{GOF(adapt)}}(\Delta_{n,s} : s\ge d/4)$.

Theorem 14. There exists a constant $c > 0$ such that if

$$\liminf_{n\to\infty}\Delta_{n,s}\,(n/\log\log n)^{2s/(d+4s)} > c,$$

then

$$\mathrm{power}\bigl\{\Phi^{\mathrm{GOF(adapt)}};\, H_1^{\mathrm{GOF(adapt)}}(\Delta_{n,s} : s\ge d/4)\bigr\} \to 1.$$

Theorem 14 shows that $\Phi^{\mathrm{GOF(adapt)}}$ has a detection boundary of the order $(\log\log n/n)^{\frac{2s}{d+4s}}$ when $p\in\mathcal{W}^{s,2}$ for any $s\ge d/4$. If $s$ is known in advance, as we show in Section 3.1, the optimal test is based on $T^{\mathrm{GOF}}_{n,a_n}$ with $a_n \asymp n^{4/(d+4s)}$ and has a detection boundary of the order $O(n^{-2s/(d+4s)})$. The extra polynomial of the iterated logarithm, $(\log\log n)^{2s/(d+4s)}$, is the price we pay to ensure that no knowledge of $s$ is required and that $\Phi^{\mathrm{GOF(adapt)}}$ is powerful against smooth alternatives for all $s\ge d/4$.

3.4.2 Test for Homogeneity

The treatment for homogeneity tests is similar. Instead of $T^{\mathrm{HOM}}_{n,a_n}$, we now consider a test based on

$$T^{\mathrm{HOM(adapt)}}_{n} = \max_{1\le a_n\le n^{2/d}}T^{\mathrm{HOM}}_{n,a_n}.$$

If ๐‘‡HOM(adapt)๐‘› exceeds the upper ๐›ผ quantile, denoted by ๐‘žHOM

๐‘›,๐›ผ , of its null distribution, then we

reject ๐ปHOM0 . In what follows, we shall refer to this test as ฮฆHOM(adapt) . As before, we do not

have a closed form expression for ๐‘žHOM๐‘›,๐›ผ , and it needs to be evaluated via Monte Carlo method. In

particular, in the case of homogeneity test, we can approximate ๐‘žHOM๐‘›,๐›ผ by permutation where we

34

Page 46: On the Construction of Minimax Optimal Nonparametric Tests ...

randomly shuffle {๐‘‹1, . . . , ๐‘‹๐‘›, ๐‘Œ1, . . . , ๐‘Œ๐‘š} and compute the test statistic as if the first ๐‘› shuffled

observations are from the first population whereas the other ๐‘š are from the second population.

This is repeated multiple times in order to approximate the critical value ๐‘žHOM๐‘›,๐›ผ .
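The permutation calibration just described can be sketched as below; `stat(x, y)` is a hypothetical placeholder for the homogeneity statistic (for instance $T^{\mathrm{HOM(adapt)}}_{n}$).

```python
import numpy as np

def perm_quantile(x, y, stat, alpha=0.05, n_perm=100, rng=None):
    # Approximate q^{HOM}_{n,alpha}: shuffle the pooled sample and let the
    # first n shuffled points play the role of the first population.
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y])
    n = len(x)
    null_stats = []
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        null_stats.append(stat(perm[:n], perm[n:]))
    return float(np.quantile(null_stats, 1.0 - alpha))

def hom_perm_test(x, y, stat, alpha=0.05, n_perm=100, rng=None):
    # Reject H_0^{HOM} iff the observed statistic exceeds the permutation quantile.
    return stat(x, y) > perm_quantile(x, y, stat, alpha, n_perm, rng)
```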

The following theorem characterizes the power of $\Phi^{\mathrm{HOM(adapt)}}$ against an alternative with different levels of smoothness:

$$H_1^{\mathrm{HOM(adapt)}}(\Delta_{n,s} : s\ge d/4) : \ (p,q) \in \bigcup_{s\ge d/4}\bigl\{(p,q) : p,q\in\mathcal{W}^{s,2}(M),\ \|p-q\|_{L_2} \ge \Delta_{n,s}\bigr\}.$$

Theorem 15. There exists a constant $c > 0$ such that if

$$\liminf_{n\to\infty}\Delta_{n,s}\,(n/\log\log n)^{2s/(d+4s)} > c,$$

then

$$\mathrm{power}\bigl\{\Phi^{\mathrm{HOM(adapt)}};\, H_1^{\mathrm{HOM(adapt)}}(\Delta_{n,s} : s\ge d/4)\bigr\} \to 1.$$

Similar to the case of the goodness-of-fit test, Theorem 15 shows that $\Phi^{\mathrm{HOM(adapt)}}$ has a detection boundary of the order $O((n/\log\log n)^{-2s/(d+4s)})$ when $p\ne q\in\mathcal{W}^{s,2}$ for any $s\ge d/4$. In light of the results from Section 3.2, this is optimal up to an extra polynomial of the iterated logarithm. The main advantage is that $\Phi^{\mathrm{HOM(adapt)}}$ is powerful against smooth alternatives simultaneously for all $s\ge d/4$.

3.4.3 Test for Independence

Similarly, for the independence test, we shall adopt the following test statistic

$$T^{\mathrm{IND(adapt)}}_{n} = \max_{1\le a_n\le n^{2/d}}T^{\mathrm{IND}}_{n,a_n},$$

and reject $H_0^{\mathrm{IND}}$ if and only if $T^{\mathrm{IND(adapt)}}_{n}$ exceeds the upper $\alpha$ quantile, denoted by $q^{\mathrm{IND}}_{n,\alpha}$, of its null distribution. In what follows, we shall refer to this test as $\Phi^{\mathrm{IND(adapt)}}$. The critical value $q^{\mathrm{IND}}_{n,\alpha}$ can also be evaluated via a permutation test. See, e.g., Pfister et al. (2018) for detailed discussions.
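For joint independence, the permutation null is obtained by giving each component its own independent permutation, which preserves the marginals while enforcing $H_0^{\mathrm{IND}}$ (this is the scheme discussed in Pfister et al., 2018). A minimal sketch, with `stat` a hypothetical joint-dependence statistic:

```python
import numpy as np

def indep_perm_quantile(X, stat, alpha=0.05, n_perm=100, rng=None):
    # Each column of X holds one component X_j; permuting every column with
    # an independent permutation mimics the null of joint independence while
    # leaving every marginal distribution unchanged.
    rng = np.random.default_rng(rng)
    n, k = X.shape
    null_stats = []
    for _ in range(n_perm):
        Xp = np.column_stack([X[rng.permutation(n), j] for j in range(k)])
        null_stats.append(stat(Xp))
    return float(np.quantile(null_stats, 1.0 - alpha))

def indep_perm_test(X, stat, alpha=0.05, n_perm=100, rng=None):
    return stat(X) > indep_perm_quantile(X, stat, alpha, n_perm, rng)
```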


We now show that $\Phi^{\mathrm{IND(adapt)}}$ is powerful in testing against the alternative with different levels of smoothness:

$$H_1^{\mathrm{IND(adapt)}}(\Delta_{n,s} : s\ge d/4) : \ p \in \bigcup_{s\ge d/4}\bigl\{p\in\mathcal{W}^{s,2}(M),\ p_j\in\mathcal{W}^{s,2}(M_j),\ 1\le j\le k,\ \|p-p_1\otimes\cdots\otimes p_k\|_{L_2} \ge \Delta_{n,s}\bigr\}.$$

More specifically, we have

Theorem 16. There exists a constant $c > 0$ such that if

$$\liminf_{n\to\infty}\Delta_{n,s}\,(n/\log\log n)^{2s/(d+4s)} > c,$$

then

$$\mathrm{power}\bigl\{\Phi^{\mathrm{IND(adapt)}};\, H_1^{\mathrm{IND(adapt)}}(\Delta_{n,s} : s\ge d/4)\bigr\} \to 1.$$

Similar to before, Theorem 16 shows that $\Phi^{\mathrm{IND(adapt)}}$ is optimal up to an extra polynomial of the iterated logarithm for detecting smooth departures from independence simultaneously for all $s\ge d/4$.


Chapter 4: Numerical Experiments

To further complement our theoretical development and demonstrate the practical merits of

the proposed methodology, we conducted several sets of numerical experiments. We shall mainly

consider Gaussian kernels in this chapter as they are the most popular choices in practice for

continuous data.

4.1 Effect of Scaling Parameter

Our first set of experiments was designed to illustrate the importance of the scaling parameter and to highlight the potential room for improvement over the “median” heuristic, one of the most common data-driven choices of the scaling parameter in practice (see, e.g., Gretton et al., 2008; Pfister et al., 2018).
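For concreteness, the “median” heuristic can be computed as below. Conventions vary across papers (median distance versus median squared distance, and where the constant sits in the exponent); the version here, which takes the median of pairwise squared distances, is a sketch rather than the exact rule used in the cited works.

```python
import numpy as np

def median_heuristic(X):
    # Median of the pairwise squared distances over all distinct pairs; a
    # Gaussian kernel is then taken as exp(-||x - x'||^2 / a) with a set to
    # this median (up to the convention chosen for the constant).
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    i, j = np.triu_indices(len(X), k=1)
    return float(np.median(sq[i, j]))
```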

โ€ข Experiment I: the homogeneity test with underlying distributions being the normal distribu-

tion and the mixture of several normal distributions. Specifically,

๐‘(๐‘ฅ) = ๐‘“ (๐‘ฅ; 0, 1), ๐‘ž(๐‘ฅ) = 0.5 ร— ๐‘“ (๐‘ฅ; 0, 1) + 0.1 ร—โˆ‘โˆˆ๐๐‘“ (๐‘ฅ; `, 0.05)

where ๐‘“ (๐‘ฅ; `, ๐œŽ) denotes the density of ๐‘ (`, ๐œŽ2) and ๐ = {โˆ’1,โˆ’0.5, 0, 0.5, 1}.

โ€ข Experiment II: the joint independence test of ๐‘‹1, ยท ยท ยท , ๐‘‹5 where

๐‘‹1, ยท ยท ยท , ๐‘‹4, (๐‘‹5)โ€ฒ โˆผiid ๐‘ (0, 1), ๐‘‹5 =๏ฟฝ๏ฟฝ(๐‘‹5)โ€ฒ

๏ฟฝ๏ฟฝ ร— sign

( 4โˆ๐‘™=1

๐‘‹ ๐‘™

).

Clearly ๐‘‹1, ยท ยท ยท , ๐‘‹5 are jointly dependent sinceโˆ๐‘‘๐‘™=1 ๐‘‹

๐‘™ โ‰ฅ 0.

In both experiments, our primary goal is to investigate how the power of the Gaussian MMD based test is influenced by a pre-fixed scaling parameter. These tests are also compared to the ones with the scaling parameter selected via the “median” heuristic. In order to evaluate tests with different scaling parameters under a unified framework, we determined the critical values for each test via permutation tests.

For Experiment I we fixed the sample size at $n = m = 200$, and for Experiment II at $n = 400$. The number of permutations was set at 100, and the significance level at $\alpha = 0.05$. We first repeated the experiments 100 times under the null to verify that the permutation tests indeed yield the correct size, up to Monte Carlo error. Each experiment was then repeated 100 times and the observed power ($\pm$ one standard error) recorded for different choices of the scaling parameter. The results are summarized in Figure 4.1. It is perhaps not surprising that the scaling parameter selected via the “median” heuristic has little variation across the simulation runs, and we represent its performance by a single value.

โˆ’1 0 1 2 3 40

0.2

0.4

0.6

0.8

1

log(a)

Pow

er

Single fixed aMedian

โˆ’3 โˆ’2 โˆ’1 0 1 20

0.2

0.4

0.6

0.8

1

log(a)

Figure 4.1: Observed power against log(a) in Experiment I (left) and Experiment II(right).

The importance of the scaling parameter is evident from Figure 4.1, with the observed power varying quite significantly across different choices. It is also of interest to note that in these settings the “median” heuristic typically does not yield a scaling parameter with great power. More specifically, in Experiment I, $\log(a_{\mathrm{median}}) \approx 0.2$ while the maximum power is attained at $\log(a) = 4$; in Experiment II, $\log(a_{\mathrm{median}}) \approx -2.15$ while the maximum power is attained at $\log(a) = 1$. This suggests that a more appropriate choice of the scaling parameter may lead to much improved performance.


4.2 Efficacy of Adaptation

Our second experiment aims to illustrate that the adaptive procedures we proposed in Section 3.4 indeed yield more powerful tests when compared with other alternatives that are commonly used in practice. In particular, we compare the proposed self-normalized adaptive test (S.A.) with two data-driven approaches, namely the “median” heuristic (Median) and the unnormalized adaptive test (U.A.) proposed in Sriperumbudur et al. (2009). When computing both the self-normalized and unnormalized test statistics, we first rescaled the squared distance $\|X_i - X_j\|^2$ by the dimensionality $d$ before taking the maximum within a certain range of the scaling parameter.
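As an illustration of this rescaling, a Gram-matrix routine might divide the squared distances by $d$ before applying the scaling parameter; the exact parametrization of the Gaussian kernel in terms of $a$ is an assumption here.

```python
import numpy as np

def gaussian_gram(X, a):
    # Gram matrix of exp(-a * ||x_i - x_j||^2 / d): the squared distance is
    # divided by the dimension d before the scaling parameter a is applied,
    # so that the same grid of a values remains comparable across dimensions.
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-a * sq / d)
```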

We considered two experiment setups:

โ€ข Experiment III: the homogeneity test with the underlying distributions being

๐‘ƒ โˆผ ๐‘ (0, ๐ผ๐‘‘), ๐‘„ โˆผ ๐‘(0,

(1 + 2๐‘‘โˆ’1/2

)๐ผ๐‘‘

).

As the โ€˜signal strengthโ€™, the ratio between the variances of ๐‘„ and ๐‘ƒ in each single direction

is set to decrease to 1 at the order 1/โˆš๐‘‘ with ๐‘‘, which is the decreasing order of variance

ratio that can be detected by the classical ๐น-test.

โ€ข Experiment IV: the independence test of ๐‘‹1, ๐‘‹2 โˆˆ R๐‘‘/2, where ๐‘‹ = (๐‘‹1, ๐‘‹2) follows a

mixture of

๐‘ (0, ๐ผ๐‘‘) and ๐‘

(0, (1 + 6๐‘‘โˆ’3/5)๐ผ๐‘‘

)with mixture probability being 0.5. Similarly, the ratio between the variances in each direc-

tion is set to decrease with ๐‘‘, but at a slightly higher rate.
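A sampler for Experiment IV makes the dependence mechanism explicit: the two halves share the mixture label, so each half is spherically normal on its own while the pair is dependent. The function name and interface are illustrative.

```python
import numpy as np

def sample_exp4(n, d, rng=None):
    # X in R^d from the 50/50 mixture of N(0, I_d) and N(0, (1+6 d^{-3/5}) I_d);
    # the shared mixture label couples the halves X_1, X_2 in R^{d/2}.
    rng = np.random.default_rng(rng)
    scale = np.where(rng.random(n) < 0.5, 1.0, np.sqrt(1.0 + 6.0 * d ** (-0.6)))
    X = scale[:, None] * rng.standard_normal((n, d))
    return X[:, : d // 2], X[:, d // 2 :]
```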

To better compare the different methods, we considered different combinations of sample size and dimensionality for each experiment. More specifically, for Experiment III, the sample sizes were set to $m = n = 25, 50, 75, \cdots, 200$ and the dimension $d = 1, 10, 100, 1000$; for Experiment IV, the sample sizes were $n = 100, 200, \cdots, 600$ and the dimension $d = 2, 10, 100, 1000$. In both experiments, we fixed the significance level at $\alpha = 0.05$ and used 100 permutations to calibrate the critical values as before. Again we simulated under $H_0$ to verify that the resulting tests have the targeted size, up to Monte Carlo error. The power of each method, estimated from 100 such experiments, is reported in Figures 4.2 and 4.3.

Figure 4.2: Observed power versus sample size in Experiment III for $d = 1, 10, 100, 1000$ from left to right.

Figure 4.3: Observed power versus sample size in Experiment IV for $d = 2, 10, 100, 1000$ from left to right.

As Figures 4.2 and 4.3 show, for both experiments the tests are comparable in low-dimensional settings. But as $d$ increases, the proposed self-normalized adaptive test becomes more and more preferable to the two alternatives. For example, in Experiment IV with $d = 1000$, the observed power of the proposed self-normalized adaptive test is about 90% when $n = 600$, while the other two tests have power of only around 15%.


4.3 Data Example

Finally, we considered applying the proposed self-normalized adaptive test in a data example from Mooij et al. (2016). The data set consists of three variables collected from different weather stations: altitude (Alt), average temperature (Temp) and average duration of sunshine (Sun). One goal of interest is to uncover the causal relationship among the three variables by identifying a suitable directed acyclic graph (DAG) over them. Following Peters et al. (2014), if a set of random variables $X_1, \cdots, X_d$ follows a DAG $\mathcal{G}_0$, then we assume that they follow a system of additive models:

$$X_l = \sum_{r\in\mathrm{PA}_l}f_{l,r}(X_r) + \varepsilon_l, \qquad \forall\ 1\le l\le d,$$

where the $\varepsilon_l$'s are independent Gaussian noises and $\mathrm{PA}_l$ denotes the collection of parent nodes of node $l$ specified by $\mathcal{G}_0$. As shown by Peters et al. (2014), $\mathcal{G}_0$ is identifiable from the joint distribution of $X_1, \cdots, X_d$ under the assumption that the $f_{l,r}$'s are non-linear. Therefore a natural way of deciding a specific DAG underlying a set of random variables is to test the independence of the regression residuals after fitting the DAG-induced additive models. In our case, there are in total 25 possible DAGs for the three variables. We can apply independence tests to the residuals for each of the 25 DAGs and choose the one with the largest $p$-value as the most plausible underlying DAG. See Peters et al. (2014) for more details.
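The selection procedure can be sketched end to end. The regression step below uses a cubic-polynomial least-squares fit purely as a stand-in for the nonparametric additive regression of Peters et al. (2014), and `indep_pvalue` is a hypothetical callable returning (for example) a permutation $p$-value of a joint independence test on the residual matrix.

```python
import numpy as np

def residuals(y, parents):
    # Stand-in for nonparametric additive regression: cubic-polynomial least
    # squares on each parent; with no parents, simply center the variable.
    if not parents:
        return y - y.mean()
    F = np.column_stack([p ** k for p in parents for k in (1, 2, 3)])
    F = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return y - F @ beta

def best_dag(data, dags, indep_pvalue):
    # Score each candidate DAG by the p-value of a joint independence test on
    # the residuals of its induced regressions; keep the largest.
    scores = []
    for dag in dags:  # dag: dict mapping node -> list of parent nodes
        R = np.column_stack([residuals(data[v], [data[p] for p in dag[v]])
                             for v in dag])
        scores.append(indep_pvalue(R))
    return int(np.argmax(scores))
```

In the data example, `dags` would range over the 25 candidate DAGs on {Alt, Temp, Sun} and `indep_pvalue` would be one of the three independence tests being compared.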

As before, we considered three different ways of testing independence: the proposed self-normalized adaptive test (S.A.), the Gaussian kernel embedding based independence test with the scaling parameter determined by the “median” heuristic (Median), and the unnormalized adaptive test from Sriperumbudur et al. (2009) (U.A.). Note that the three variables have different scales, and we standardized them before applying the tests of independence.

The overall sample size of the data set is 349. Each time we randomly selected 150 samples and computed the $p$-value associated with each DAG. The $p$-value is again computed based on 100 permutations. We repeated the experiment 1000 times and recorded for each test the DAG with the largest $p$-value. All three tests agree on the top three most selected DAGs, which are shown in Figure 4.4.

Figure 4.4: DAGs with the top 3 highest probabilities of being selected.

In addition, we report in Table 4.1 the frequencies with which these three DAGs were selected by each of the tests. They are generally comparable, with the proposed method more consistently selecting DAG I, the one heavily favored by all three methods.

Test      DAG I   DAG II   DAG III
Median     78.5      4.7      14.5
U.A.       81.4      8.1       8.5
S.A.       83.4      9.8       4.7

Table 4.1: Frequency (%) with which each DAG in Figure 4.4 was selected by the three tests.


Chapter 5: Conclusion and Discussion

In this thesis, we aim to address the problem of kernel selection when using kernel embeddings for nonparametric hypothesis testing. This is an inevitable problem that researchers and practitioners have been trying to answer ever since the kernel embedding method was first proposed, yet most existing solutions are ad hoc. We propose principled ways of selecting kernels in two different settings, which are proved to ensure minimax rate optimality for the associated tests. Since these kernel selection methods depend on the regularity of the underlying space of probability distributions, we also propose adaptive test statistics whose sacrifice in terms of detection boundary is only a polynomial of the iterated logarithm of the sample size.

There are still many interesting problems in this area that remain to be explored. For example, can we adopt fast computation techniques to compute kernel-based test statistics approximately, so as to reduce the computational complexity substantially while maintaining statistical optimality? Parallel results in the context of regression have been derived, but such results seem to be lacking for hypothesis testing. A second direction involves resampling methods such as the permutation and bootstrap methods. In practice, out of concern that the sample size may not be large enough, resampling methods are usually used to decide the rejection boundary. Can we still ensure the statistical optimality of the proposed tests when incorporating resampling methods?

In addition, it is natural to wonder whether similar principled kernel selection methods can be proposed for a broader range of nonparametric testing problems, such as conditional independence testing, which can be very useful in Bayesian network learning and causal discovery.


Chapter 6: Proofs

Throughout this chapter, we shall write $a_n \lesssim b_n$ if there exists a universal constant $C > 0$ such that $a_n \le Cb_n$. Similarly, we write $a_n \gtrsim b_n$ if $b_n \lesssim a_n$, and $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$. When the constant depends on another quantity $D$, we shall write $a_n \lesssim_D b_n$. The relations $\gtrsim_D$ and $\asymp_D$ are defined accordingly.

Proof of Theorem 1. Part (i). The proof of the first part consists of two key steps. First, we show that the population counterpart $n\gamma^2(\mathbb{P}, \mathbb{P}_0)$ of the test statistic converges to $\infty$ uniformly, i.e.,

$$n\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\gamma^2(\mathbb{P}, \mathbb{P}_0) \to \infty.$$

Then, we argue that the deviation of $\gamma^2(\mathbb{P}_n, \mathbb{P}_0)$ from $\gamma^2(\mathbb{P}, \mathbb{P}_0)$ is uniformly negligible compared with $\gamma^2(\mathbb{P}, \mathbb{P}_0)$ itself.

It is not hard to see that

$$\gamma(\mathbb{P}_n, \mathbb{P}_0) = \sqrt{\sum_{k\ge1}\lambda_k\Bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)\Bigr]^2} \ge \sqrt{\sum_{k\ge1}\lambda_k\bigl[\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]^2} - \sqrt{\sum_{k\ge1}\lambda_k\Bigl[\frac1n\sum_{i=1}^n\phi_k(X_i) - \mathbb{E}_{\mathbb{P}}\phi_k(X)\Bigr]^2}.$$


Thus,

$$\mathbb{P}\bigl\{n\gamma^2(\mathbb{P}_n,\mathbb{P}_0) < q_{w,1-\alpha}\bigr\} \le \mathbb{P}\Biggl\{\sqrt{n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2} - \sqrt{n\sum_{k\ge1}\lambda_k\Bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\Bigr]^2} < \sqrt{q_{w,1-\alpha}}\Biggr\}$$

$$= \mathbb{P}\Biggl\{\sqrt{n\sum_{k\ge1}\lambda_k\Bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\Bigr]^2} > \sqrt{n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2} - \sqrt{q_{w,1-\alpha}}\Biggr\}.$$

Suppose that

$$n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2 > q_{w,1-\alpha}.$$

Then, by Markov's inequality,

$$\mathbb{P}\bigl\{n\gamma^2(\mathbb{P}_n,\mathbb{P}_0) < q_{w,1-\alpha}\bigr\} \le \frac{\mathbb{E}_{\mathbb{P}}\Bigl\{n\sum_{k\ge1}\lambda_k\bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]^2\Bigr\}}{\Bigl\{\sqrt{n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2}-\sqrt{q_{w,1-\alpha}}\Bigr\}^2}.$$

Observe that for any $\mathbb{P}\in\mathcal{P}(\Delta_n,0)$,

$$\mathbb{E}_{\mathbb{P}}\Bigl\{n\sum_{k\ge1}\lambda_k\Bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\Bigr]^2\Bigr\} = \sum_{k\ge1}\lambda_k\mathrm{Var}[\phi_k(X)] \le \sum_{k\ge1}\lambda_k\mathbb{E}_{\mathbb{P}}\phi_k^2(X) \le \Bigl(\sup_{k\ge1}\|\phi_k\|_\infty\Bigr)^2\sum_{k\ge1}\lambda_k < \infty.$$


This implies that

$$\lim_{n\to\infty}\beta(\Phi_{\mathrm{MMD}};\Delta_n,0) = \lim_{n\to\infty}\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\mathbb{P}\bigl\{n\gamma^2(\mathbb{P}_n,\mathbb{P}_0) < q_{w,1-\alpha}\bigr\} \le \lim_{n\to\infty}\frac{\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\mathbb{E}_{\mathbb{P}}\Bigl\{n\sum_{k\ge1}\lambda_k\bigl[\frac1n\sum_{i=1}^n\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]^2\Bigr\}}{\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\Bigl\{\sqrt{n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2}-\sqrt{q_{w,1-\alpha}}\Bigr\}^2} = 0,$$

provided that

$$\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)} n\sum_{k\ge1}\lambda_k[\mathbb{E}_{\mathbb{P}}\phi_k(X)]^2 \to \infty, \quad \text{as } n\to\infty. \tag{6.1}$$

It now suffices to show that (6.1) holds if $n\Delta_n^4\to\infty$ as $n\to\infty$.

To this end, let $u = d\mathbb{P}/d\mathbb{P}_0 - 1$ and

$$a_k = \langle u, \phi_k\rangle_{L_2(\mathbb{P}_0)} = \mathbb{E}_{\mathbb{P}}\phi_k(X) - \mathbb{E}_{\mathbb{P}_0}\phi_k(X) = \mathbb{E}_{\mathbb{P}}\phi_k(X).$$

It is clear that

$$\sum_{k\ge1}\lambda_k^{-1}a_k^2 = \|u\|_K^2, \quad\text{and}\quad \sum_{k\ge1}a_k^2 = \|u\|_{L_2(\mathbb{P}_0)}^2 = \chi^2(\mathbb{P},\mathbb{P}_0).$$

By the definition of $\mathcal{P}(\Delta_n,0)$,

$$\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\sum_{k\ge1}\lambda_k^{-1}a_k^2 \le M^2, \quad\text{and}\quad \inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\sum_{k\ge1}a_k^2 \ge \Delta_n^2.$$


Since ๐‘›ฮ”4๐‘› โ†’ โˆž as ๐‘›โ†’ โˆž, we get

infPโˆˆP(ฮ”๐‘›,0)

๐‘›โˆ‘๐‘˜โ‰ฅ1

_๐‘˜ [EP๐œ™๐‘˜ (๐‘‹)]2 = infPโˆˆP(ฮ”๐‘›,0)

๐‘›โˆ‘๐‘˜โ‰ฅ1

_๐‘˜๐‘Ž2๐‘˜

โ‰ฅ infPโˆˆP(ฮ”๐‘›,0)

๐‘›

( โˆ‘๐‘˜โ‰ฅ1

๐‘Ž2๐‘˜

)2

โˆ‘๐‘˜โ‰ฅ1

_โˆ’1๐‘˜๐‘Ž2๐‘˜

โ‰ฅ๐‘›ฮ”4

๐‘›

๐‘€2 โ†’ โˆž

as ๐‘›โ†’ โˆž.

Part (ii). In proving the second part, we will make use of the following lemma, which can be obtained by adapting the argument in Gregory (1977). It gives the limiting distribution of the V-statistic under $\mathbb{P}_n$ when $\mathbb{P}_n$ converges to $\mathbb{P}_0$ at the order $n^{-1/4}$.

Lemma 2. Consider a sequence of probability measures $\{\mathbb{P}_n : n\ge1\}$ contiguous to $\mathbb{P}_0$ satisfying $u_n = d\mathbb{P}_n/d\mathbb{P}_0 - 1 \to 0$ in $L_2(\mathbb{P}_0)$. Suppose that for any fixed $k$,

$$\lim_{n\to\infty}\sqrt{n}\langle u_n,\phi_k\rangle_{L_2(\mathbb{P}_0)} = \bar{a}_k, \quad\text{and}\quad \lim_{n\to\infty}\sum_{k\ge1}\lambda_k\bigl(\sqrt{n}\langle u_n,\phi_k\rangle_{L_2(\mathbb{P}_0)}\bigr)^2 = \sum_{k\ge1}\lambda_k\bar{a}_k^2 + \bar{a}_0 < \infty,$$

for some sequence $\{\bar{a}_k : k\ge0\}$. Then

$$\frac1n\sum_{k\ge1}\lambda_k\Bigl[\sum_{i=1}^n\phi_k(X_i)\Bigr]^2 \xrightarrow{d} \sum_{k\ge1}\lambda_k(Z_k+\bar{a}_k)^2 + \bar{a}_0,$$

where $X_1,\ldots,X_n \overset{\mathrm{i.i.d.}}{\sim} \mathbb{P}_n$, and the $Z_k$'s are independent standard normal random variables.

Write ๐ฟ (๐‘˜) = _๐‘˜ ๐‘˜2๐‘ . By assumption (2.6),

0 < ๐ฟ := inf๐‘˜โ‰ฅ1

๐ฟ (๐‘˜) โ‰ค sup๐‘˜โ‰ฅ1

๐ฟ (๐‘˜) := ๐ฟ < โˆž.


Consider a sequence of probability measures $\{\mathbb{P}_n : n\ge1\}$ such that

$$d\mathbb{P}_n/d\mathbb{P}_0 - 1 = C_1\sqrt{\lambda_{k_n}}\,[L(k_n)]^{-1}\phi_{k_n},$$

where $C_1$ is a positive constant and $k_n = \lfloor C_2n^{\frac{1}{4s}}\rfloor$ for some positive constant $C_2$. Both $C_1$ and $C_2$ will be determined later. Since $\sup_{k\ge1}\|\phi_k\|_\infty < \infty$ and $\lim_{k\to\infty}\lambda_k = 0$, there exists $n_0 > 0$ such that the $\mathbb{P}_n$'s are well-defined probability measures for any $n \ge n_0$.

Note that

$$\|u_n\|_K^2 = \frac{C_1^2}{L^2(k_n)} \le \underline{L}^{-2}C_1^2$$

and

$$\|u_n\|_{L_2(\mathbb{P}_0)}^2 = \frac{C_1^2\lambda_{k_n}}{L^2(k_n)} = \frac{C_1^2}{L(k_n)}k_n^{-2s} \ge \overline{L}^{-1}C_1^2k_n^{-2s} \sim \overline{L}^{-1}C_1^2C_2^{-2s}n^{-1/2},$$

where $A_n \sim B_n$ means that $\lim_{n\to\infty}A_n/B_n = 1$. Thus, by choosing $C_1$ sufficiently small and $c_0 = \frac12\overline{L}^{-1}C_1^2C_2^{-2s}$, we ensure that $\mathbb{P}_n \in \mathcal{P}(c_0n^{-1/4}, 0)$ for sufficiently large $n$.

To apply Lemma 2, we note that

$$\lim_{n\to\infty}\|u_n\|_{L_2(\mathbb{P}_0)}^2 = \lim_{n\to\infty}\frac{C_1^2\lambda_{k_n}}{L^2(k_n)} = 0.$$

In addition, for any fixed $k$,

$$\bar{a}_{n,k} = \sqrt{n}\langle u_n,\phi_k\rangle_{L_2(\mathbb{P}_0)} = 0$$

for sufficiently large $n$, and

$$\sum_{k\ge1}\lambda_k\bar{a}_{n,k}^2 = \frac{nC_1^2\lambda_{k_n}^2}{L^2(k_n)} = nC_1^2k_n^{-4s} \to C_1^2C_2^{-4s}$$


as ๐‘›โ†’ โˆž. Thus, Lemma 2 implies that

๐‘›๐›พ(P๐‘›, P0)๐‘‘โ†’

โˆ‘๐‘˜โ‰ฅ1

_๐‘˜๐‘2๐‘˜ + ๐ถ

21๐ถ

โˆ’4๐‘ 2 .

Now take ๐ถ2 =(2๐ถ2

1/๐‘ž๐‘ค,1โˆ’๐›ผ)1/4๐‘  so that ๐ถ2

1๐ถโˆ’4๐‘ 2 = 1

2๐‘ž๐‘ค,1โˆ’๐›ผ. Then

lim inf๐‘›โ†’โˆž

๐›ฝ(ฮฆMMD; ๐‘0๐‘›โˆ’1/2, 0) โ‰ฅ lim

๐‘›โ†’โˆž๐‘ƒ(๐‘›๐›พ(P๐‘›, P0) < ๐‘ž๐‘ค,1โˆ’๐›ผ)

=๐‘ƒ

(โˆ‘๐‘˜โ‰ฅ1

_๐‘˜๐‘2๐‘˜ <

12๐‘ž๐‘ค,1โˆ’๐›ผ

)> 0,

which concludes the proof.

Proof of Theorem 2. Let $\bar{K}_n(\cdot,\cdot) := \bar{K}_{\varrho_n}(\cdot,\cdot)$. Note that

$$v_n^{-1/2}\bigl[n\eta^2_{\varrho_n}(\mathbb{P}_n,\mathbb{P}_0) - A_n\bigr] = 2(n^2v_n)^{-1/2}\sum_{j=2}^n\sum_{i=1}^{j-1}\bar{K}_n(X_i,X_j).$$

Let Z๐‘› ๐‘— =๐‘—โˆ’1โˆ‘๐‘–=1๏ฟฝ๏ฟฝ๐‘› (๐‘‹๐‘–, ๐‘‹ ๐‘— ). Consider a filtration {F๐‘— : ๐‘— โ‰ฅ 1} where F๐‘— = ๐œŽ{๐‘‹๐‘– : 1 โ‰ค ๐‘– โ‰ค ๐‘—}. Due to

the assumption that ๐พ is degenerate, we have E๐œ™๐‘˜ (๐‘‹) = 0 for any ๐‘˜ โ‰ฅ 1, which implies that

E(Z๐‘› ๐‘— |F๐‘—โˆ’1) =๐‘—โˆ’1โˆ‘๐‘–=1E[๏ฟฝ๏ฟฝ๐‘› (๐‘‹๐‘–, ๐‘‹ ๐‘— ) |F๐‘—โˆ’1] =

๐‘—โˆ’1โˆ‘๐‘–=1E[๏ฟฝ๏ฟฝ๐‘› (๐‘‹๐‘–, ๐‘‹ ๐‘— ) |๐‘‹๐‘–] = 0,

for any ๐‘— โ‰ฅ 2.

Write

$$U_{nm} = \begin{cases} 0, & m = 1,\\ \sum_{j=2}^m\zeta_{nj}, & m\ge2.\end{cases}$$

Then for any fixed $n$, $\{U_{nm}\}_{m\ge1}$ is a martingale with respect to $\{\mathcal{F}_m : m\ge1\}$ and

$$v_n^{-1/2}\bigl[n\eta^2_{\varrho_n}(\mathbb{P}_n,\mathbb{P}_0) - A_n\bigr] = 2(n^2v_n)^{-1/2}U_{nn}.$$

We now apply the martingale central limit theorem to $U_{nn}$. Following the argument from Hall (1984), it can be shown that

$$\Bigl[\tfrac12n^2\,\mathbb{E}\bar{K}_n^2(X,X')\Bigr]^{-1/2}U_{nn} \xrightarrow{d} N(0,1), \tag{6.2}$$

provided that

$$\Bigl[\mathbb{E}G_n^2(X,X') + n^{-1}\mathbb{E}\bigl\{\bar{K}_n^2(X,X')\bar{K}_n^2(X,X'')\bigr\} + n^{-2}\mathbb{E}\bar{K}_n^4(X,X')\Bigr]\Big/\bigl[\mathbb{E}\bar{K}_n^2(X,X')\bigr]^2 \to 0 \tag{6.3}$$

as $n\to\infty$, where $G_n(x,x') = \mathbb{E}\bar{K}_n(X,x)\bar{K}_n(X,x')$. Since

$$\mathbb{E}\bar{K}_n^2(X,X') = \sum_{k\ge1}\Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2 = v_n,$$

(6.2) implies that

$$v_n^{-1/2}\bigl[n\eta^2_{\varrho_n}(\mathbb{P}_n,\mathbb{P}_0) - A_n\bigr] = \sqrt2\cdot\Bigl(\tfrac12n^2\mathbb{E}\bar{K}_n^2(X,X')\Bigr)^{-1/2}U_{nn} \xrightarrow{d} N(0,2).$$

It therefore suffices to verify (6.3).

Note that

$$\mathbb{E}\bar{K}_n^2(X,X') = \sum_{k\ge1}\Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2 \ge \sum_{\lambda_k\ge\varrho_n^2}\frac14 + \frac{1}{4\varrho_n^4}\sum_{\lambda_k<\varrho_n^2}\lambda_k^2 = \frac14\bigl|\{k : \lambda_k\ge\varrho_n^2\}\bigr| + \frac{1}{4\varrho_n^4}\sum_{\lambda_k<\varrho_n^2}\lambda_k^2 \asymp \varrho_n^{-1/s},$$

where the last step holds because $\lambda_k \asymp k^{-2s}$. Similarly,

$$\mathbb{E}G_n^2(X,X') = \sum_{k\ge1}\Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^4 \le \bigl|\{k : \lambda_k\ge\varrho_n^2\}\bigr| + \varrho_n^{-8}\sum_{\lambda_k<\varrho_n^2}\lambda_k^4 \asymp \varrho_n^{-1/s},$$

and

$$\mathbb{E}\bigl\{\bar{K}_n^2(X,X')\bar{K}_n^2(X,X'')\bigr\} = \mathbb{E}\Biggl\{\sum_{k\ge1}\Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2\phi_k^2(X)\Biggr\}^2 \le \Bigl(\sup_{k\ge1}\|\phi_k\|_\infty\Bigr)^4\Biggl\{\sum_{k\ge1}\Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2\Biggr\}^2 \asymp \varrho_n^{-2/s}.$$

Thus there exists a positive constant $C_3$ such that

$$\mathbb{E}G_n^2(X,X')\big/\bigl[\mathbb{E}\bar{K}_n^2(X,X')\bigr]^2 \le C_3\varrho_n^{1/s} \to 0, \tag{6.4}$$

and

$$n^{-1}\mathbb{E}\bigl\{\bar{K}_n^2(X,X')\bar{K}_n^2(X,X'')\bigr\}\big/\bigl[\mathbb{E}\bar{K}_n^2(X,X')\bigr]^2 \le C_3n^{-1} \to 0, \tag{6.5}$$

as ๐‘›โ†’ โˆž. On the other hand,

E๏ฟฝ๏ฟฝ4๐‘› (๐‘‹, ๐‘‹โ€ฒ) โ‰ค โ€–๏ฟฝ๏ฟฝ๐‘›โ€–2

โˆžE๏ฟฝ๏ฟฝ2๐‘› (๐‘‹, ๐‘‹โ€ฒ),

where

โ€–๏ฟฝ๏ฟฝ๐‘›โ€–โˆž = sup๐‘ฅ

{โˆ‘๐‘˜โ‰ฅ1

_๐‘˜

_๐‘˜ + ๐œš2๐‘›

๐œ™2๐‘˜ (๐‘ฅ)

}โ‰ค

(sup๐‘˜โ‰ฅ1

โ€–๐œ™๐‘˜ โ€–โˆž)2 โˆ‘

๐‘˜โ‰ฅ1

_๐‘˜

_๐‘˜ + ๐œš2๐‘›

๏ฟฝ ๐œšโˆ’1/๐‘ ๐‘› .

This implies that for some positive constant ๐ถ4,

๐‘›โˆ’2E๏ฟฝ๏ฟฝ4๐‘› (๐‘‹, ๐‘‹โ€ฒ)}/[E๏ฟฝ๏ฟฝ2

๐‘› (๐‘‹, ๐‘‹โ€ฒ)]2 โ‰ค ๐‘›โˆ’2โ€–๏ฟฝ๏ฟฝ๐‘›โ€–2โˆž/E๏ฟฝ๏ฟฝ2

๐‘› (๐‘‹, ๐‘‹โ€ฒ) โ‰ค ๐ถ4(๐‘›2๐œš1/๐‘ ๐‘› )โˆ’1 โ†’ 0. (6.6)

51

Page 63: On the Construction of Minimax Optimal Nonparametric Tests ...

as ๐‘›โ†’ โˆž. Together, (6.4), (6.5) and (6.6) ensure that condition (6.3) holds.

Proof of Theorem 3. Note that

$$n\eta^2_{\varrho_n}(\mathbb{P}_n,\mathbb{P}_0) - \frac1n\sum_{i=1}^n\bar{K}_n(X_i,X_i) = \frac1n\sum_{k\ge1}\frac{\lambda_k}{\lambda_k+\varrho_n^2}\sum_{\substack{1\le i,j\le n\\ i\ne j}}\phi_k(X_i)\phi_k(X_j)$$

$$= \frac1n\sum_{k\ge1}\frac{\lambda_k}{\lambda_k+\varrho_n^2}\sum_{\substack{1\le i,j\le n\\ i\ne j}}\bigl[\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]\bigl[\phi_k(X_j)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]$$

$$\quad+\frac{2(n-1)}{n}\sum_{k\ge1}\frac{\lambda_k}{\lambda_k+\varrho_n^2}\bigl[\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]\sum_{1\le i\le n}\bigl[\phi_k(X_i)-\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]$$

$$\quad+\frac{n(n-1)}{n}\sum_{k\ge1}\frac{\lambda_k}{\lambda_k+\varrho_n^2}\bigl[\mathbb{E}_{\mathbb{P}}\phi_k(X)\bigr]^2$$

$$:= V_1 + V_2 + V_3.$$

Obviously, $\mathbb{E}_{\mathbb{P}}V_1V_2 = 0$. We first argue that the following three statements together imply the desired result:

$$\lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}v_n^{-1/2}V_3 = \infty, \tag{6.7}$$

$$\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\bigl(\mathbb{E}_{\mathbb{P}}V_1^2/V_3^2\bigr) = o(1), \tag{6.8}$$

$$\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\bigl(\mathbb{E}_{\mathbb{P}}V_2^2/V_3^2\bigr) = o(1). \tag{6.9}$$

To see this, note that (6.7) implies that

$$\lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\mathbb{P}\Bigl(v_n^{-1/2}\bigl[n\eta^2_{\varrho_n}(\mathbb{P}_n,\mathbb{P}_0)-A_n\bigr] \ge \sqrt2z_{1-\alpha}\Bigr) \ge \lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\mathbb{P}\Bigl(v_n^{-1/2}V_3 \ge 2\sqrt2z_{1-\alpha},\ V_1+V_2+V_3 \ge \tfrac12V_3\Bigr) = \lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\mathbb{P}\Bigl(V_1+V_2+V_3 \ge \tfrac12V_3\Bigr).$$


On the other hand, (6.8) and (6.9) imply that

$$\lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\mathbb{P}\Bigl(V_1+V_2+V_3 \ge \tfrac12V_3\Bigr) = 1 - \lim_{n\to\infty}\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\mathbb{P}\Bigl(V_1+V_2+V_3 < \tfrac12V_3\Bigr) \ge 1 - \lim_{n\to\infty}\sup_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}\frac{\mathbb{E}_{\mathbb{P}}(V_1+V_2)^2}{(V_3/2)^2} = 1.$$

This immediately implies that $\Phi_{\mathrm{M3d}}$ is consistent. We now show that (6.7)–(6.9) indeed hold.

Verifying (6.7). We begin with (6.7). Since $v_n \asymp \varrho_n^{-1/s}$ and $V_3 = (n-1)\eta^2_{\varrho_n}(\mathbb{P},\mathbb{P}_0)$, (6.7) is equivalent to

$$\lim_{n\to\infty}\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)}n\varrho_n^{\frac{1}{2s}}\eta^2_{\varrho_n}(\mathbb{P},\mathbb{P}_0) = \infty.$$

For any $\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)$, let $u = d\mathbb{P}/d\mathbb{P}_0 - 1$ and $a_k = \langle u,\phi_k\rangle_{L_2(\mathbb{P}_0)} = \mathbb{E}_{\mathbb{P}}\phi_k(X)$. By the assumption that $K$ is universal, $u = \sum_{k\ge1}a_k\phi_k$. We consider the cases $\theta = 0$ and $\theta > 0$ separately.

(1) First consider $\theta = 0$. It is clear that

$$\eta^2_{\varrho_n}(\mathbb{P},\mathbb{P}_0) = \sum_{k\ge1}a_k^2 - \sum_{k\ge1}\frac{\varrho_n^2}{\lambda_k+\varrho_n^2}a_k^2 \ge \|u\|_{L_2(\mathbb{P}_0)}^2 - \varrho_n^2\sum_{k\ge1}\frac{1}{\lambda_k}a_k^2 \ge \|u\|_{L_2(\mathbb{P}_0)}^2 - \varrho_n^2M^2.$$

Take $\varrho_n \le \sqrt{\Delta_n^2/(2M^2)}$ so that $\varrho_n^2M^2 \le \frac12\Delta_n^2$. Then we have

$$\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\eta^2_{\varrho_n}(\mathbb{P},\mathbb{P}_0) \ge \frac12\inf_{\mathbb{P}\in\mathcal{P}(\Delta_n,0)}\|u\|_{L_2(\mathbb{P}_0)}^2 = \frac12\Delta_n^2.$$

(2) Now consider the case $\theta > 0$. For $\mathbb{P}\in\mathcal{P}(\Delta_n,\theta)$ and any $R > 0$, there exists $f_R \in \mathcal{H}(K)$ such that $\|u - f_R\|_{L_2(\mathbb{P}_0)} \le MR^{-1/\theta}$ and $\|f_R\|_K \le R$. Let $b_k = \langle f_R,\phi_k\rangle_{L_2(\mathbb{P}_0)}$. Then

\[
\eta^2_{\varrho_n}(\mathbb P,\mathbb P_0) = \sum_{k\ge1} a_k^2 - \sum_{k\ge1} \frac{\varrho_n^2}{\lambda_k+\varrho_n^2}\, a_k^2
\ge \|u\|^2_{L_2(\mathbb P_0)} - 2\sum_{k\ge1} \frac{\varrho_n^2}{\lambda_k+\varrho_n^2}(a_k-b_k)^2 - 2\sum_{k\ge1} \frac{\varrho_n^2}{\lambda_k+\varrho_n^2}\, b_k^2
\]
\[
\ge \|u\|^2_{L_2(\mathbb P_0)} - 2\sum_{k\ge1}(a_k-b_k)^2 - 2\varrho_n^2 \sum_{k\ge1}\frac{b_k^2}{\lambda_k}
= \|u\|^2_{L_2(\mathbb P_0)} - 2\|u-f_R\|^2_{L_2(\mathbb P_0)} - 2\varrho_n^2 \|f_R\|_K^2.
\]

Taking $R = (2M/\|u\|_{L_2(\mathbb P_0)})^{\theta}$ yields
\[
\eta^2_{\varrho_n}(\mathbb P,\mathbb P_0) \ge \|u\|^2_{L_2(\mathbb P_0)} - 2M^2 R^{-2/\theta} - 2\varrho_n^2 R^2 = \tfrac12\|u\|^2_{L_2(\mathbb P_0)} - 2\varrho_n^2 R^2.
\]
Now by choosing
\[
\varrho_n \le \frac{1}{2\sqrt2}\,(2M)^{-\theta}\, \Delta_n^{1+\theta},
\]
we can ensure that
\[
2\varrho_n^2 R^2 \le \tfrac14\|u\|^2_{L_2(\mathbb P_0)},
\]
so that
\[
\inf_{\mathbb P\in\mathcal P(\Delta_n,\theta)} \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0) \ge \inf_{\mathbb P\in\mathcal P(\Delta_n,\theta)} \tfrac14\|u\|^2_{L_2(\mathbb P_0)} \ge \tfrac14\Delta_n^2.
\]

In both cases, with $\varrho_n \le C\Delta_n^{\theta+1}$ for a sufficiently small $C = C(M) > 0$, $\lim_{n\to\infty} \varrho_n^{1/(2s)}\, n\, \Delta_n^2 = \infty$ suffices to ensure that (6.7) holds. Under the condition that $\lim_{n\to\infty} \Delta_n\, n^{2s/(4s+\theta+1)} = \infty$,
\[
\varrho_n = c\, n^{-\frac{2s(\theta+1)}{4s+\theta+1}} \le C\Delta_n^{\theta+1}
\]
for sufficiently large $n$, and $\lim_{n\to\infty} \varrho_n^{1/(2s)}\, n\, \Delta_n^2 = \infty$ holds as well.
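The exponent bookkeeping behind this step can be verified with exact rational arithmetic: writing $\varrho_n = n^{-2s(\theta+1)/(4s+\theta+1)}$ and $\Delta_n^2 = n^{-4s/(4s+\theta+1)}$, the power of $n$ in $\varrho_n^{1/(2s)}\, n\, \Delta_n^2$ is exactly zero, so any extra divergent factor in $\Delta_n^2$ forces the product to infinity. A minimal check (the particular values of $s$ and $\theta$ are arbitrary):

```python
from fractions import Fraction

for s in (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(4)):
    for theta in (Fraction(0), Fraction(1), Fraction(7, 3)):
        # exponent of n in rho_n^{1/(2s)} * n * Delta_n^2:
        #   -(theta+1)/(4s+theta+1)  +  1  -  4s/(4s+theta+1)
        expo = (-(theta + 1)) / (4 * s + theta + 1) + 1 - (4 * s) / (4 * s + theta + 1)
        assert expo == 0
```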


Verifying (6.8). Rewrite $V_1$ as
\[
V_1 = \frac1n \sum_{\substack{1\le i,j\le n\\ i\ne j}} \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, \bigl[\phi_k(X_i) - E_{\mathbb P}\phi_k(X)\bigr]\bigl[\phi_k(X_j) - E_{\mathbb P}\phi_k(X)\bigr]
=: \frac1n \sum_{\substack{1\le i,j\le n\\ i\ne j}} F_n(X_i, X_j).
\]

Then
\[
E_{\mathbb P} V_1^2 = \frac1{n^2} \sum_{\substack{i\ne j\\ i'\ne j'}} E_{\mathbb P} F_n(X_i,X_j)\, F_n(X_{i'},X_{j'})
= \frac{2n(n-1)}{n^2}\, E_{\mathbb P} F_n^2(X,X') \le 2\, E_{\mathbb P} F_n^2(X,X'),
\]

where $X, X' \overset{i.i.d.}{\sim} \mathbb P$. Recall that, for any two random variables $Y_1, Y_2$ such that $EY_1^2 < \infty$,
\[
E[Y_1 - E(Y_1|Y_2)]^2 = EY_1^2 - E[E(Y_1|Y_2)^2] \le EY_1^2.
\]
Together with the fact that
\[
F_n(X,X') = \bar K_n(X,X') - E_{\mathbb P}[\bar K_n(X,X')|X] - E_{\mathbb P}[\bar K_n(X,X')|X'] + E_{\mathbb P}\bar K_n(X,X')
= \bar K_n(X,X') - E_{\mathbb P}[\bar K_n(X,X')|X] - E\bigl[\bar K_n(X,X') - E_{\mathbb P}[\bar K_n(X,X')|X] \,\big|\, X'\bigr],
\]
we have
\[
E_{\mathbb P} F_n^2(X,X') \le E_{\mathbb P}\bigl\{\bar K_n(X,X') - E_{\mathbb P}[\bar K_n(X,X')|X]\bigr\}^2 \le E_{\mathbb P}\bar K_n^2(X,X').
\]


Thus, to prove (6.8), it suffices to show that
\[
\lim_{n\to\infty} \sup_{\mathbb P\in\mathcal P(\Delta_n,\theta)} E_{\mathbb P}\bar K_n^2(X,X') / V_3^2 = 0.
\]

For any $g \in L_2(\mathbb P_0)$ and positive definite kernel $G(\cdot,\cdot)$ such that $E_{\mathbb P_0} G^2(X,X') < \infty$, let
\[
\|g\|_G := \sqrt{E_{\mathbb P_0}[g(X)\, g(X')\, G(X,X')]}.
\]
By the positive definiteness of $G(\cdot,\cdot)$, the triangle inequality holds for $\|\cdot\|_G$, i.e., for any $g_1, g_2 \in L_2(\mathbb P_0)$,
\[
\bigl| \|g_1\|_G - \|g_2\|_G \bigr| \le \|g_1 - g_2\|_G.
\]

Thus by taking $G = \bar K_n^2$, $g_1 = d\mathbb P/d\mathbb P_0$ and $g_2 = 1$, we have
\[
\Bigl| \sqrt{E_{\mathbb P}\bar K_n^2(X,X')} - \sqrt{E_{\mathbb P_0}\bar K_n^2(X,X')} \Bigr| \le \sqrt{E_{\mathbb P_0}[u(X)\, u(X')\, \bar K_n^2(X,X')]}. \tag{6.10}
\]

We now appeal to the following lemma to bound the right hand side of (6.10):

Lemma 3. Let $G$ be a Mercer kernel defined over $\mathcal X \times \mathcal X$ with eigenvalue-eigenfunction pairs $\{(\mu_k, \phi_k) : k \ge 1\}$ with respect to $L_2(\mathbb P)$ such that $\mu_1 \ge \mu_2 \ge \cdots$. If $G$ is a trace kernel in that $EG(X,X) < \infty$, then for any $g \in L_2(\mathbb P)$,
\[
E_{\mathbb P}[g(X)\, g(X')\, G^2(X,X')] \le \mu_1 \Bigl(\sum_{k\ge1}\mu_k\Bigr) \Bigl(\sup_{k\ge1}\|\phi_k\|_\infty\Bigr)^2 \|g\|^2_{L_2(\mathbb P)}.
\]

By Lemma 3, we get
\[
E_{\mathbb P_0}[u(X)\, u(X')\, \bar K_n^2(X,X')] \le C_5 \Bigl(\sum_k \frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr) \|u\|^2_{L_2(\mathbb P_0)} \lesssim \varrho_n^{-1/s}\, \|u\|^2_{L_2(\mathbb P_0)}.
\]


Recall that
\[
E_{\mathbb P_0}\bar K_n^2(X,X') = \sum_k \Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2 \asymp \varrho_n^{-1/s}.
\]
In the light of (6.10), these together imply that
\[
E_{\mathbb P}\bar K_n^2(X,X') \le 2\bigl\{ E_{\mathbb P_0}\bar K_n^2(X,X') + E_{\mathbb P_0}[u(X)u(X')\bar K_n^2(X,X')] \bigr\} \le C_6\, \varrho_n^{-1/s}\bigl[1 + \|u\|^2_{L_2(\mathbb P_0)}\bigr].
\]

On the other hand, as already shown when verifying (6.7), $\varrho_n \lesssim \Delta_n^{\theta+1}$ suffices to ensure that for sufficiently large $n$,
\[
\tfrac14\|u\|^2_{L_2(\mathbb P_0)} \le \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0) \le \|u\|^2_{L_2(\mathbb P_0)}, \qquad \forall\, \mathbb P \in \mathcal P(\Delta_n,\theta).
\]

Thus
\[
\lim_{n\to\infty} \sup_{\mathbb P\in\mathcal P(\Delta_n,\theta)} E_{\mathbb P}\bar K_n^2(X,X')/V_3^2
\le 16 C_6 \biggl\{ \Bigl( \lim_{n\to\infty} \inf_{\mathbb P\in\mathcal P(\Delta_n,\theta)} \varrho_n^{1/s}\, n^2\, \|u\|^4_{L_2(\mathbb P_0)} \Bigr)^{-1}
+ \Bigl( \lim_{n\to\infty} \inf_{\mathbb P\in\mathcal P(\Delta_n,\theta)} \varrho_n^{1/s}\, n^2\, \|u\|^2_{L_2(\mathbb P_0)} \Bigr)^{-1} \biggr\} = 0,
\]
provided that $\lim_{n\to\infty} n^{2s/(4s+\theta+1)}\Delta_n = \infty$. This immediately implies (6.8).

Verifying (6.9). Observe that
\[
E_{\mathbb P} V_2^2 \le \frac4n\, E_{\mathbb P}\Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,[\phi_k(X) - E_{\mathbb P}\phi_k(X)] \Bigr\}^2
\le \frac4n\, E_{\mathbb P}\Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2
= \frac4n\, E_{\mathbb P_0}\biggl( [1+u(X)] \Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2 \biggr).
\]


It is clear that
\[
E_{\mathbb P_0}\Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2
= \sum_{k,k'\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\cdot\frac{\lambda_{k'}}{\lambda_{k'}+\varrho_n^2}\, E_{\mathbb P}\phi_k(X)\, E_{\mathbb P}\phi_{k'}(X)\, E_{\mathbb P_0}[\phi_k(X)\phi_{k'}(X)]
= \sum_{k\ge1} \Bigl(\frac{\lambda_k}{\lambda_k+\varrho_n^2}\Bigr)^2 [E_{\mathbb P}\phi_k(X)]^2 \le \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0).
\]

On the other hand,
\[
E_{\mathbb P_0}\biggl( u(X) \Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2 \biggr)
\le \sqrt{ E_{\mathbb P_0}\biggl( u^2(X) \Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2 \biggr) }
\times \sqrt{ E_{\mathbb P_0}\Bigl\{ \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(X) \Bigr\}^2 }
\]
\[
\le \|u\|_{L_2(\mathbb P_0)}\, \sup_x \Bigl| \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]\,\phi_k(x) \Bigr| \cdot \eta_{\varrho_n}(\mathbb P,\mathbb P_0)
\le \Bigl(\sup_k \|\phi_k\|_\infty\Bigr) \|u\|_{L_2(\mathbb P_0)} \sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, |E_{\mathbb P}\phi_k(X)| \cdot \eta_{\varrho_n}(\mathbb P,\mathbb P_0)
\]
\[
\le \Bigl(\sup_k \|\phi_k\|_\infty\Bigr) \|u\|_{L_2(\mathbb P_0)} \sqrt{\sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}}\; \sqrt{\sum_{k\ge1} \frac{\lambda_k}{\lambda_k+\varrho_n^2}\, [E_{\mathbb P}\phi_k(X)]^2} \cdot \eta_{\varrho_n}(\mathbb P,\mathbb P_0)
\le C_7\, \|u\|_{L_2(\mathbb P_0)}\, \varrho_n^{-\frac1{2s}}\, \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0).
\]

Together, they imply that
\[
\lim_{n\to\infty} \sup_{\mathbb P\in\mathcal P(\Delta_n,\theta)} E_{\mathbb P} V_2^2 / V_3^2
\le 4\max\{1, C_7\} \biggl\{ \Bigl( \lim_{n\to\infty} \inf_{\mathbb P\in\mathcal P(\Delta_n,\theta)} n\, \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0) \Bigr)^{-1}
+ \lim_{n\to\infty} \sup_{\mathbb P\in\mathcal P(\Delta_n,\theta)} \Bigl( \frac{\|u\|_{L_2(\mathbb P_0)}}{\varrho_n^{\frac1{2s}}\, n\, \eta^2_{\varrho_n}(\mathbb P,\mathbb P_0)} \Bigr) \biggr\} = 0,
\]
under the assumption that $\lim_{n\to\infty} n^{2s/(4s+\theta+1)}\Delta_n = \infty$.


Proof of Theorem 4. The overall architecture is now standard in establishing minimax lower bounds for nonparametric hypothesis testing. The main idea is to carefully construct a set of distributions under the alternative hypothesis and argue that a mixture of these alternatives cannot be reliably distinguished from the null. See, e.g., Ingster, 1993; Ingster and Suslina, 2003; Tsybakov, 2008. Without loss of generality, assume $M = 1$ and $\Delta_n = c\, n^{-\frac{2s}{4s+\theta+1}}$ for some $c > 0$.

Let us consider the cases $\theta = 0$ and $\theta > 0$ separately.

The case of $\theta = 0$. We first treat the case when $\theta = 0$. Let $B_n = \lfloor C_8 \Delta_n^{-1/s} \rfloor$ for a sufficiently small constant $C_8 > 0$ and $a_n = \sqrt{\Delta_n^2/B_n}$. For any $\xi_n := (\xi_{n1}, \xi_{n2}, \cdots, \xi_{nB_n})^\top \in \{\pm1\}^{B_n}$, write
\[
u_{n,\xi_n} = a_n \sum_{k=1}^{B_n} \xi_{nk}\, \varphi_k.
\]

It is clear that
\[
\|u_{n,\xi_n}\|^2_{L_2(\mathbb P_0)} = B_n a_n^2 = \Delta_n^2
\]
and
\[
\|u_{n,\xi_n}\|_\infty \le a_n B_n \Bigl(\sup_k \|\varphi_k\|_\infty\Bigr) \lesssim \Delta_n^{\frac{2s-1}{2s}} \to 0.
\]
By taking $C_8$ small enough, we can also ensure
\[
\|u_{n,\xi_n}\|^2_K = a_n^2 \sum_{k=1}^{B_n} \lambda_k^{-1} \le 1.
\]

Therefore, there exists a probability measure $\mathbb P_{n,\xi_n} \in \mathcal P(\Delta_n, 0)$ such that $d\mathbb P_{n,\xi_n}/d\mathbb P_0 = 1 + u_{n,\xi_n}$. Following a standard argument for minimax lower bounds, it suffices to show that
\[
\limsup_{n\to\infty}\; E_{\mathbb P_0}\Biggl( \frac1{2^{B_n}} \sum_{\xi_n \in \{\pm1\}^{B_n}} \Biggl\{ \prod_{i=1}^n [1 + u_{n,\xi_n}(X_i)] \Biggr\} \Biggr)^2 < \infty. \tag{6.11}
\]


Note that
\[
E_{\mathbb P_0}\Biggl( \frac1{2^{B_n}} \sum_{\xi_n \in \{\pm1\}^{B_n}} \prod_{i=1}^n [1 + u_{n,\xi_n}(X_i)] \Biggr)^2
= E_{\mathbb P_0}\Biggl( \frac1{2^{2B_n}} \sum_{\xi_n, \xi'_n \in \{\pm1\}^{B_n}} \Biggl\{\prod_{i=1}^n [1 + u_{n,\xi_n}(X_i)]\Biggr\} \Biggl\{\prod_{i=1}^n [1 + u_{n,\xi'_n}(X_i)]\Biggr\} \Biggr)
\]
\[
= \frac1{2^{2B_n}} \sum_{\xi_n, \xi'_n \in \{\pm1\}^{B_n}} \prod_{i=1}^n E_{\mathbb P_0}\bigl\{ [1 + u_{n,\xi_n}(X_i)][1 + u_{n,\xi'_n}(X_i)] \bigr\}
= \frac1{2^{2B_n}} \sum_{\xi_n, \xi'_n \in \{\pm1\}^{B_n}} \Bigl( 1 + a_n^2 \sum_{k=1}^{B_n} \xi_{nk}\xi'_{nk} \Bigr)^n
\]
\[
\le \frac1{2^{2B_n}} \sum_{\xi_n, \xi'_n \in \{\pm1\}^{B_n}} \exp\Bigl( n a_n^2 \sum_{k=1}^{B_n} \xi_{nk}\xi'_{nk} \Bigr)
= \Biggl\{ \frac{\exp(n a_n^2) + \exp(-n a_n^2)}{2} \Biggr\}^{B_n}
\le \exp\Bigl( \tfrac12 B_n n^2 a_n^4 \Bigr),
\]
where the last inequality is ensured by the fact that
\[
\cosh(t) \le \exp(t^2/2), \qquad \forall\, t \in \mathbb R.
\]
See, e.g., Baraud, 2002. With the particular choice of $B_n$, $a_n$, and the conditions on $\Delta_n$, this immediately implies (6.11).
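Both ingredients of the last two steps, the exact factorization $2^{-2B_n}\sum_{\xi,\xi'}\exp(na^2\langle\xi,\xi'\rangle) = \cosh(na^2)^{B_n}$ and the bound $\cosh(t) \le e^{t^2/2}$, can be checked directly for a small number of coordinates (the values of $B$, $n$, and $a^2$ below are arbitrary):

```python
import itertools
import math

# cosh(t) <= exp(t^2 / 2) on a grid of t values
for t in [x / 10 for x in range(-50, 51)]:
    assert math.cosh(t) <= math.exp(t * t / 2) + 1e-12

# exact identity: 2^{-2B} * sum over sign vectors xi, xi' of exp(n a^2 <xi, xi'>)
# equals cosh(n a^2)^B, because the double sum factorizes over coordinates
B, n, a2 = 4, 7, 0.03
signs = list(itertools.product([-1, 1], repeat=B))
lhs = sum(
    math.exp(n * a2 * sum(s * t for s, t in zip(xi, xip)))
    for xi in signs
    for xip in signs
) / 4 ** B
rhs = math.cosh(n * a2) ** B
assert abs(lhs - rhs) < 1e-9
```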

The case of $\theta > 0$. The main idea is similar to before. To find a set of probability measures in $\mathcal P(\Delta_n, \theta)$, we appeal to the following lemma.

Lemma 4. Let $u = \sum_k a_k \varphi_k$. If
\[
\sup_{B \ge 1} \Bigl( \sum_{k=1}^{B} \frac{a_k^2}{\lambda_k} \Bigr)^{2/\theta} \Bigl( \sum_{k \ge B} a_k^2 \Bigr) \le M^2,
\]
then $u \in \mathcal F(\theta, M)$.

Similar to before, we shall now take $B_n = \lfloor C_{10} \Delta_n^{-\frac{\theta+1}{s}} \rfloor$ and $a_n = \sqrt{\Delta_n^2/B_n}$. By Lemma 4, we can find $\mathbb P_{n,\xi_n} \in \mathcal P(\Delta_n, \theta)$ such that $d\mathbb P_{n,\xi_n}/d\mathbb P_0 = 1 + u_{n,\xi_n}$, for appropriately chosen $C_{10}$. Following the same argument as in the previous case, we can again verify (6.11).

Proof of Theorem 5. Without loss of generality, assume that $\Delta_n(\theta) = c_1 (n^{-1}\sqrt{\log\log n})^{\frac{2s}{4s+\theta+1}}$ for some constant $c_1 > 0$ to be determined later.

Type I Error. We first prove the first statement, which shows that the Type I error converges to 0. Following the same notation as in the proof of Theorem 2, let
\[
N_{n,2} = E\Biggl\{ \sum_{j=2}^n E\bigl(\tilde\zeta_{nj}^2 \,\big|\, \mathcal F_{j-1}\bigr) - 1 \Biggr\}^2, \qquad
L_{n,2} = \sum_{j=2}^n E\tilde\zeta_{nj}^4,
\]
where $\tilde\zeta_{nj} = \sqrt2\, \zeta_{nj}/(n\sqrt{v_n})$. As shown by Haeusler (1988),

\[
\sup_t \bigl| P(T_{n,\varrho_n} > t) - \bar\Phi(t) \bigr| \le C_{11}(L_{n,2} + N_{n,2})^{1/5},
\]
where $\bar\Phi(t)$ is the survival function of the standard normal, i.e., $\bar\Phi(t) = P(Z > t)$ where $Z \sim N(0,1)$. Again by the argument from Hall (1984),

\[
E\Biggl\{ \sum_{j=2}^n E\bigl(\zeta_{nj}^2 \,\big|\, \mathcal F_{j-1}\bigr) - \tfrac12 n(n-1)v_n \Biggr\}^2
\le C_{12} \bigl[ n^4\, E G_n^2(X,X') + n^3\, E \bar K_n^2(X,X') \bar K_n^2(X,X'') \bigr],
\]
where $G_n(\cdot,\cdot)$ is defined in the proof of Theorem 2, and
\[
\sum_{j=2}^n E\zeta_{nj}^4 \le C_{13} \bigl[ n^2\, E \bar K_n^4(X,X') + n^3\, E \bar K_n^2(X,X') \bar K_n^2(X,X'') \bigr],
\]


which ensures
\[
N_{n,2} = \frac{4\, E\Bigl\{ \sum_{j=2}^n E(\zeta_{nj}^2 \,|\, \mathcal F_{j-1}) - \tfrac12 n(n-1)v_n - \tfrac12 n v_n \Bigr\}^2}{n^4 v_n^2}
\le 8\max\Bigl\{C_{12}, \tfrac14\Bigr\} \Biggl\{ \frac{E G_n^2(X,X')}{v_n^2} + \frac{E \bar K_n^2(X,X')\bar K_n^2(X,X'')}{n v_n^2} + \frac1{n^2} \Biggr\},
\]
and
\[
L_{n,2} = \frac{4\sum_{j=2}^n E\zeta_{nj}^4}{n^4 v_n^2}
\le 4 C_{13} \Biggl\{ \frac{E \bar K_n^4(X,X')}{n^2 v_n^2} + \frac{E \bar K_n^2(X,X')\bar K_n^2(X,X'')}{n v_n^2} \Biggr\}.
\]

As shown in the proof of Theorem 2,
\[
\frac{E G_n^2(X,X')}{v_n^2} \le C_3\, \varrho_n^{1/s}, \qquad
\frac{E \bar K_n^4(X,X')}{n^2 v_n^2} \le C_4\, n^{-2} \varrho_n^{-1/s}, \qquad \text{and} \qquad
\frac{E \bar K_n^2(X,X')\bar K_n^2(X,X'')}{n v_n^2} \le C_3\, n^{-1}.
\]

Therefore,
\[
\sup_t \bigl| P(T_{n,\varrho_n} > t) - \bar\Phi(t) \bigr| \le C_{14}\bigl( \varrho_n^{\frac1{5s}} + n^{-\frac15} + n^{-\frac25}\varrho_n^{-\frac1{5s}} \bigr),
\]

which implies that
\[
P\Bigl( \sup_{0\le k\le m^*} T_{n,2^k\varrho^*} > t \Bigr) \le m^*\, \bar\Phi(t) + C_{15}\bigl( 2^{\frac{m^*}{5s}} (\varrho^*)^{\frac1{5s}} + m^* n^{-\frac15} + n^{-\frac25} (\varrho^*)^{-\frac1{5s}} \bigr), \qquad \forall\, t.
\]
It is not hard to see, by the definitions of $m^*$ and $\varrho^*$, that
\[
2^{m^*} \varrho^* \le 2\Bigl( \frac{\sqrt{\log\log n}}{n} \Bigr)^{\frac{2s}{4s+1}}
\]


and
\[
m^* = (\log 2)^{-1}\Bigl\{ 2s\log n - \frac{2s}{4s+1}\log n + o(\log n) \Bigr\}
= (\log 2)^{-1}\, \frac{8s^2}{4s+1}\, \log n + o(\log n) \asymp \log n.
\]

Together with the fact that $\bar\Phi(t) \le \tfrac12 e^{-t^2/2}$ for $t \ge 0$, we get
\[
P\Bigl( \sup_{0\le k\le m^*} T_{n,2^k\varrho^*} > \sqrt{3\log\log n} \Bigr)
\le C_{16}\Biggl[ e^{-\frac32\log\log n}\log n + \Bigl(\frac{\sqrt{\log\log n}}{n}\Bigr)^{\frac{2}{5(4s+1)}} + n^{-\frac15}\log\log n + n^{-\frac25}\Bigl(\frac{\sqrt{\log\log n}}{n}\Bigr)^{-\frac25} \Biggr] \to 0,
\]
as $n \to \infty$.

Type II Error. Next consider the Type II error. To this end, write $\varrho_n(\theta) = \bigl(\frac{\sqrt{\log\log n}}{n}\bigr)^{\frac{2s(\theta+1)}{4s+\theta+1}}$. Let
\[
\bar\varrho_n(\theta) = \sup_{0\le k\le m^*}\bigl\{ 2^k\varrho^* : 2^k\varrho^* \le \varrho_n(\theta) \bigr\}.
\]

It is clear that $T_n \ge T_{n,\bar\varrho_n(\theta)}$ for any $\theta \ge 0$. It therefore suffices to show that
\[
\lim_{n\to\infty} \inf_{\theta\ge0} \inf_{\mathbb P\in\mathcal P(\Delta_n(\theta),\theta)} P\Bigl\{ T_{n,\bar\varrho_n(\theta)} \ge \sqrt{3\log\log n} \Bigr\} = 1.
\]

By Markov's inequality, this can be accomplished by verifying that
\[
\inf_{\theta\in[0,\infty)} \inf_{\mathbb P\in\mathcal P(\Delta_n(\theta),\theta)} E_{\mathbb P} T_{n,\bar\varrho_n(\theta)} \ge \tilde c\sqrt{\log\log n} \tag{6.12}
\]
for some $\tilde c > \sqrt3$; and
\[
\lim_{n\to\infty} \sup_{\theta\ge0} \sup_{\mathbb P\in\mathcal P(\Delta_n(\theta),\theta)} \frac{\operatorname{Var}\bigl(T_{n,\bar\varrho_n(\theta)}\bigr)}{\bigl(E_{\mathbb P} T_{n,\bar\varrho_n(\theta)}\bigr)^2} = 0. \tag{6.13}
\]


We now show that both (6.12) and (6.13) hold with
\[
\Delta_n(\theta) = c_1 \Bigl( \frac{\sqrt{\log\log n}}{n} \Bigr)^{\frac{2s}{4s+\theta+1}}
\]
for a sufficiently large $c_1 = c_1(M, \tilde c)$.

Note that for all $\theta \in [0,\infty)$,
\[
\tfrac12 \varrho_n(\theta) \le \bar\varrho_n(\theta) \le \varrho_n(\theta), \tag{6.14}
\]
which immediately implies
\[
\eta^2_{\bar\varrho_n(\theta)}(\mathbb P,\mathbb P_0) \ge \eta^2_{\varrho_n(\theta)}(\mathbb P,\mathbb P_0). \tag{6.15}
\]

Following the arguments in the proof of Theorem 3,
\[
E_{\mathbb P} T_{n,\bar\varrho_n(\theta)} \ge C_{17}\, n\, [\bar\varrho_n(\theta)]^{1/(2s)}\, \eta^2_{\bar\varrho_n(\theta)}(\mathbb P,\mathbb P_0)
\ge 2^{-1/(2s)} C_{17}\, n\, [\varrho_n(\theta)]^{1/(2s)}\, \eta^2_{\varrho_n(\theta)}(\mathbb P,\mathbb P_0),
\]
and for all $\mathbb P \in \mathcal P(\Delta_n(\theta), \theta)$,
\[
\eta^2_{\varrho_n(\theta)}(\mathbb P,\mathbb P_0) \ge \tfrac14 \|u\|^2_{L_2(\mathbb P_0)} \tag{6.16}
\]
provided that $\Delta_n(\theta) \ge C'(M)\bigl( \frac{\sqrt{\log\log n}}{n} \bigr)^{\frac{2s}{4s+\theta+1}}$.

Therefore,
\[
\inf_{\mathbb P\in\mathcal P(\Delta_n(\theta),\theta)} E_{\mathbb P} T_{n,\bar\varrho_n(\theta)} \ge C_{18}\, n\, [\varrho_n(\theta)]^{1/(2s)}\, \Delta_n^2(\theta) \ge C_{18}\, c_1 \sqrt{\log\log n} \ge \tilde c \sqrt{\log\log n}
\]
if $c_1 \ge C_{18}^{-1}\tilde c$. Hence, to ensure that (6.12) holds, it suffices to take
\[
c_1 = \max\{ C'(M),\, C_{18}^{-1}\tilde c \}.
\]


With (6.14), (6.15) and (6.16), the results in the proof of Theorem 3 imply that for sufficiently large $n$,
\[
\sup_{\mathbb P\in\mathcal P(\Delta^*_n(\theta),\theta)} \frac{\operatorname{Var}\bigl(T_{n,\bar\varrho_n(\theta)}\bigr)}{\bigl(E_{\mathbb P}T_{n,\bar\varrho_n(\theta)}\bigr)^2}
\le C_{19}\Bigl\{ \bigl( [\varrho_n(\theta)]^{\frac1{2s}}\, n\, \Delta^*_n(\theta) \bigr)^{-2} + \bigl( [\varrho_n(\theta)]^{\frac1s}\, n^2\, \Delta^*_n(\theta) \bigr)^{-1} + \bigl( n\Delta^*_n(\theta) \bigr)^{-1} + \bigl( [\varrho_n(\theta)]^{\frac1{2s}}\, n \sqrt{\Delta^*_n(\theta)} \bigr)^{-1} \Bigr\}
\]
\[
\le 2 C_{19}\bigl( [\varrho_n(\theta)]^{\frac1{2s}}\, n\, \Delta^*_n(\theta) \bigr)^{-1} = 2C_{19}(c_1\log\log n)^{-\frac12} \to 0,
\]
which shows (6.13).

Proof of Theorem 6. The main idea of the proof is similar to that for Theorem 4. Nevertheless, in order to show that
\[
\inf_{\Phi_n}\Bigl[ E_{\mathbb P_0}\Phi_n + \sup_{\theta\in[\theta_1,\theta_2]} \beta(\Phi_n; \Delta_n(\theta), \theta) \Bigr]
\]
converges to 1 rather than merely being bounded away from 0, we need to find $\mathbb P_\pi$, the marginal distribution on $\mathcal X^n$ with conditional distribution selected from
\[
\{ \mathbb P^{\otimes n} : \mathbb P \in \cup_{\theta\in[\theta_1,\theta_2]}\mathcal P(\Delta_n(\theta),\theta) \}
\]
and prior distribution $\pi$ on $\cup_{\theta\in[\theta_1,\theta_2]}\mathcal P(\Delta_n(\theta),\theta)$ such that the $\chi^2$ distance between $\mathbb P_\pi$ and $\mathbb P_0^{\otimes n}$ converges to 0. See Ingster (2000).

To this end, assume, without loss of generality, that
\[
\Delta_n(\theta) = c_2 \Bigl( \frac{n}{\sqrt{\log\log n}} \Bigr)^{-\frac{2s}{4s+\theta+1}}, \qquad \forall\, \theta \in [\theta_1, \theta_2],
\]
where $c_2 > 0$ is a sufficiently small constant to be determined later.

where ๐‘2 > 0 is a sufficiently small constant to be determined later.

Let $r_n = \lfloor C_{20}\log n\rfloor$ and $B_{n,1} = \lfloor C_{21}[\Delta_n(\theta_1)]^{-\frac{\theta_1+1}{s}}\rfloor$ for sufficiently small $C_{20}, C_{21} > 0$. Set $\theta_{n,1} = \theta_1$. For $2 \le r \le r_n$, let
\[
B_{n,r} = 2^{r-2} B_{n,1},
\]
and let $\theta_{n,r}$ be selected so that
\[
B_{n,r} = \bigl\lfloor C_{21}[\Delta_n(\theta_{n,r})]^{-\frac{\theta_{n,r}+1}{s}} \bigr\rfloor.
\]

Note that by choosing $C_{20}$ sufficiently small,
\[
B_{n,r_n} = 2^{r_n-2} B_{n,1}
\le C_{21}\, c_2^{-\frac{\theta_1+1}{s}} \Bigl(\frac{n}{\sqrt{\log\log n}}\Bigr)^{\frac{2(\theta_1+1)}{4s+\theta_1+1}} \cdot 2^{r_n-2}
= C_{21}\, c_2^{-\frac{\theta_1+1}{s}} \exp\Bigl( \log\Bigl(\frac{n}{\sqrt{\log\log n}}\Bigr)\cdot \frac{2(\theta_1+1)}{4s+\theta_1+1} + (r_n-2)\log 2 \Bigr)
\]
\[
\le \Bigl\lfloor C_{21} \exp\Bigl( \log\Bigl(\frac{n}{\sqrt{\log\log n}}\Bigr)\cdot \frac{2(\theta_2+1)}{4s+\theta_2+1} \Bigr) \Bigr\rfloor
\le \bigl\lfloor C_{21}[\Delta_n(\theta_2)]^{-\frac{\theta_2+1}{s}} \bigr\rfloor
\]
for sufficiently large $n$. Thus, we can guarantee that $\theta_{n,r} \in [\theta_1, \theta_2]$ for all $1 \le r \le r_n$.

We now construct a finite subset of $\cup_{\theta\in[\theta_1,\theta_2]}\mathcal P(\Delta_n(\theta),\theta)$ as follows. Let $B^*_{n,0} = 0$ and $B^*_{n,r} = B_{n,1} + \cdots + B_{n,r}$ for $r \ge 1$. For each $\xi_{n,r} = (\xi_{n,r,1}, \cdots, \xi_{n,r,B_{n,r}}) \in \{\pm1\}^{B_{n,r}}$, let
\[
f_{n,r,\xi_{n,r}} = 1 + \sum_{k=B^*_{n,r-1}+1}^{B^*_{n,r}} a_{n,r}\, \xi_{n,r,k-B^*_{n,r-1}}\, \varphi_k,
\]
where $a_{n,r} = \sqrt{\Delta_n^2(\theta_{n,r})/B_{n,r}}$. Following the same argument as that in the proof of Theorem 4, we can verify that, with a sufficiently small $C_{21}$, each $\mathbb P_{n,r,\xi_{n,r}} \in \mathcal P(\Delta_n(\theta_{n,r}), \theta_{n,r})$, where $f_{n,r,\xi_{n,r}}$ is the Radon-Nikodym derivative $d\mathbb P_{n,r,\xi_{n,r}}/d\mathbb P_0$. With slight abuse of notation, write

๐‘“๐‘› (๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐‘›) =1๐‘Ÿ๐‘›

๐‘Ÿ๐‘›โˆ‘๐‘Ÿ=1

๐‘“๐‘›,๐‘Ÿ (๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐‘›),


where
\[
f_{n,r}(X_1, X_2, \cdots, X_n) = \frac1{2^{B_{n,r}}} \sum_{\xi_{n,r}\in\{\pm1\}^{B_{n,r}}} \prod_{i=1}^n f_{n,r,\xi_{n,r}}(X_i).
\]

It now suffices to show that
\[
\|f_n - 1\|^2_{L_2(\mathbb P_0)} = \|f_n\|^2_{L_2(\mathbb P_0)} - 1 \to 0, \qquad \text{as } n\to\infty,
\]
where $\|f_n\|^2_{L_2(\mathbb P_0)} = E_{\mathbb P_0} f_n^2(X_1, X_2, \cdots, X_n)$.

Note that
\[
\|f_n\|^2_{L_2(\mathbb P_0)} = \frac1{r_n^2} \sum_{1\le r,r'\le r_n} \langle f_{n,r}, f_{n,r'} \rangle_{L_2(\mathbb P_0)}
= \frac1{r_n^2} \sum_{1\le r\le r_n} \|f_{n,r}\|^2_{L_2(\mathbb P_0)} + \frac1{r_n^2} \sum_{\substack{1\le r,r'\le r_n\\ r\ne r'}} \langle f_{n,r}, f_{n,r'} \rangle_{L_2(\mathbb P_0)}.
\]

It is easy to verify that, for any $r \ne r'$,
\[
\langle f_{n,r}, f_{n,r'} \rangle_{L_2(\mathbb P_0)} = 1.
\]
It therefore suffices to show that
\[
\sum_{1\le r\le r_n} \|f_{n,r}\|^2_{L_2(\mathbb P_0)} = o(r_n^2).
\]

Following the same derivation as that in the proof of Theorem 4, we can show that
\[
\|f_{n,r}\|^2_{L_2(\mathbb P_0)} \le \Biggl( \frac{\exp(n a_{n,r}^2) + \exp(-n a_{n,r}^2)}{2} \Biggr)^{B_{n,r}} \le \exp\Bigl( \tfrac12 B_{n,r} n^2 a_{n,r}^4 \Bigr)
\]


for sufficiently large $n$. By setting $c_2$ in the expression of $\Delta_n(\theta)$ sufficiently small, we have
\[
B_{n,r}\, n^2 a_{n,r}^4 \le \log r_n,
\]
which ensures that
\[
\sum_{1\le r\le r_n} \|f_{n,r}\|^2_{L_2(\mathbb P_0)} \le r_n^{3/2} = o(r_n^2).
\]

Proof of Theorem 7. We begin with (3.2). Note that $\gamma^2_{\nu_n}(\mathbb P, \mathbb P_0)$ is a U-statistic, so we can apply the general techniques for U-statistics to establish its asymptotic normality. In particular, as shown in Hall (1984), it suffices to verify the following four conditions:
\[
\Bigl(\frac{2\nu_n}{\pi}\Bigr)^{d/2} E\bar G^2_{\nu_n}(X_1,X_2) \to \|p_0\|^2_{L_2}, \tag{6.17}
\]
\[
\frac{E\bar G^4_{\nu_n}(X_1,X_2)}{n^2\bigl[E\bar G^2_{\nu_n}(X_1,X_2)\bigr]^2} \to 0, \tag{6.18}
\]
\[
\frac{E\bigl[\bar G^2_{\nu_n}(X_1,X_2)\bar G^2_{\nu_n}(X_1,X_3)\bigr]}{n\bigl[E\bar G^2_{\nu_n}(X_1,X_2)\bigr]^2} \to 0, \tag{6.19}
\]
\[
\frac{E H^2_{\nu_n}(X_1,X_2)}{\bigl[E\bar G^2_{\nu_n}(X_1,X_2)\bigr]^2} \to 0, \tag{6.20}
\]
as $n\to\infty$, where
\[
H_{\nu_n}(x,y) = E\bar G_{\nu_n}(x,X_3)\bar G_{\nu_n}(y,X_3), \qquad \forall\, x,y \in \mathbb R^d.
\]

Verifying Condition (6.17). Note that
\[
E\bar G^2_{\nu_n}(X_1,X_2) = E G^2_{\nu_n}(X_1,X_2) - 2E\bigl\{E[G_{\nu_n}(X_1,X_2)|X_1]\bigr\}^2 + \bigl[E G_{\nu_n}(X_1,X_2)\bigr]^2.
\]


By Lemma 7,
\[
E G_{\nu_n}(X_1,X_2) = \Bigl(\frac{\pi}{\nu_n}\Bigr)^{\frac d2} \int \exp\Bigl(-\frac{\|\omega\|^2}{4\nu_n}\Bigr) \bigl|\mathcal F p_0(\omega)\bigr|^2\, d\omega,
\]
which immediately yields
\[
\Bigl(\frac{\nu_n}{\pi}\Bigr)^{\frac d2} E G_{\nu_n}(X_1,X_2) \to \|p_0\|^2_{L_2}
\]
and
\[
\Bigl(\frac{2\nu_n}{\pi}\Bigr)^{\frac d2} E G^2_{\nu_n}(X_1,X_2) = \Bigl(\frac{2\nu_n}{\pi}\Bigr)^{\frac d2} E G_{2\nu_n}(X_1,X_2) \to \|p_0\|^2_{L_2},
\]
as $\nu_n \to \infty$.

On the other hand,
\[
E\bigl\{E[G_{\nu_n}(X_1,X_2)|X_1]\bigr\}^2
= \int \Bigl( \int G_{\nu_n}(x,x')\, G_{\nu_n}(x,x'')\, p_0(x)\,dx \Bigr) p_0(x')\, p_0(x'')\, dx'\,dx''
= \int \Bigl( \int G_{2\nu_n}\bigl(x, (x'+x'')/2\bigr)\, p_0(x)\,dx \Bigr) G_{\nu_n/2}(x',x'')\, p_0(x')\, p_0(x'')\, dx'\,dx''.
\]

Let $Z \sim N(0, 4\nu_n I_d)$. Then
\[
\int G_{2\nu_n}\bigl(x, (x'+x'')/2\bigr)\, p_0(x)\,dx = (2\pi)^{d/2}\, E\Bigl[ \mathcal F p_0(Z) \exp\Bigl( i\,\frac{x'+x''}{2}\cdot Z \Bigr) \Bigr]
\le (2\pi)^{d/2} \sqrt{E\bigl|\mathcal F p_0(Z)\bigr|^2} \lesssim_d \|p_0\|_{L_2} / \nu_n^{d/4}.
\]
Thus
\[
E\bigl\{E[G_{\nu_n}(X_1,X_2)|X_1]\bigr\}^2 \lesssim_d \|p_0\|^3_{L_2} / \nu_n^{3d/4}.
\]
Condition (6.17) then follows.


Verifying Conditions (6.18) and (6.19). Since
\[
E\bar G^2_{\nu_n}(X_1,X_2) \asymp_{d,p_0} \nu_n^{-d/2}
\]
and
\[
E\bar G^4_{\nu_n}(X_1,X_2) \lesssim E G^4_{\nu_n}(X_1,X_2) \lesssim_d \nu_n^{-d/2},
\]
we obtain
\[
n^{-2}\, E\bar G^4_{\nu_n}(X_1,X_2) \big/ \bigl(E\bar G^2_{\nu_n}(X_1,X_2)\bigr)^2 \lesssim_{d,p_0} \nu_n^{d/2}/n^2 \to 0.
\]

Similarly,
\[
E\bar G^2_{\nu_n}(X_1,X_2)\bar G^2_{\nu_n}(X_1,X_3) \lesssim E G^2_{\nu_n}(X_1,X_2)\, G^2_{\nu_n}(X_1,X_3)
= E G_{2\nu_n}(X_1,X_2)\, G_{2\nu_n}(X_1,X_3) \lesssim_{d,p_0} \nu_n^{-3d/4}.
\]
This implies
\[
n^{-1}\, E\bar G^2_{\nu_n}(X_1,X_2)\bar G^2_{\nu_n}(X_1,X_3) \big/ \bigl(E\bar G^2_{\nu_n}(X_1,X_2)\bigr)^2 \lesssim_{d,p_0} \nu_n^{d/4}/n \to 0,
\]
which verifies (6.19).

Verifying Condition (6.20). We now prove (6.20). It suffices to show that
\[
\nu_n^d\, E\bigl( E(\bar G_{\nu_n}(X_1,X_2)\bar G_{\nu_n}(X_1,X_3)\,|\,X_2,X_3) \bigr)^2 \to 0
\]
as $n\to\infty$. Note that
\[
E\bigl( E(\bar G_{\nu_n}(X_1,X_2)\bar G_{\nu_n}(X_1,X_3)\,|\,X_2,X_3) \bigr)^2
\lesssim E\bigl( E(G_{\nu_n}(X_1,X_2)G_{\nu_n}(X_1,X_3)\,|\,X_2,X_3) \bigr)^2
= E\, G_{\nu_n}(X_1,X_2)\, G_{\nu_n}(X_1,X_3)\, G_{\nu_n}(X_4,X_2)\, G_{\nu_n}(X_4,X_3)
\]
\[
= E\bigl( G_{\nu_n}(X_1,X_4)\, G_{\nu_n}(X_2,X_3)\, E\bigl(G_{\nu_n}(X_1+X_4,\, X_2+X_3)\,\big|\,X_1-X_4,\, X_2-X_3\bigr) \bigr).
\]

Since for any $\delta > 0$,
\[
\nu_n^d\, E\Bigl( G_{\nu_n}(X_1,X_4)\, G_{\nu_n}(X_2,X_3)\, E\bigl(G_{\nu_n}(X_1+X_4,X_2+X_3)\,\big|\,X_1-X_4,X_2-X_3\bigr)
\bigl( 1_{\{\|X_1-X_4\|>\delta\}} + 1_{\{\|X_2-X_3\|>\delta\}} \bigr) \Bigr) \to 0,
\]
it remains to show that
\[
\nu_n^d\, E\Bigl( G_{\nu_n}(X_1,X_4)\, G_{\nu_n}(X_2,X_3)\, E\bigl(G_{\nu_n}(X_1+X_4,X_2+X_3)\,\big|\,X_1-X_4,X_2-X_3\bigr)\, 1_{\{\|X_1-X_4\|\le\delta,\, \|X_2-X_3\|\le\delta\}} \Bigr) \to 0
\]
for some $\delta > 0$, which holds as long as
\[
E\bigl(G_{\nu_n}(X_1+X_4,X_2+X_3)\,\big|\,X_1-X_4,X_2-X_3\bigr) \to 0 \tag{6.21}
\]
uniformly on $\{\|X_1-X_4\|\le\delta,\; \|X_2-X_3\|\le\delta\}$.

Let
\[
Y_1 = X_1 - X_4, \quad Y_2 = X_2 - X_3, \quad Y_3 = X_1 + X_4, \quad Y_4 = X_2 + X_3.
\]


Then
\[
E\bigl(G_{\nu_n}(X_1+X_4, X_2+X_3)\,\big|\,X_1-X_4, X_2-X_3\bigr)
= \Bigl(\frac{\pi}{\nu_n}\Bigr)^{\frac d2} \int \exp\Bigl(-\frac{\|\omega\|^2}{4\nu_n}\Bigr) \mathcal F p_{Y_1}(\omega)\, \mathcal F p_{Y_2}(\omega)\, d\omega
\]
\[
\le \sqrt{ \Bigl(\frac{\pi}{\nu_n}\Bigr)^{\frac d2} \int \exp\Bigl(-\frac{\|\omega\|^2}{4\nu_n}\Bigr) \bigl|\mathcal F p_{Y_1}(\omega)\bigr|^2 d\omega }\;
\sqrt{ \Bigl(\frac{\pi}{\nu_n}\Bigr)^{\frac d2} \int \exp\Bigl(-\frac{\|\omega\|^2}{4\nu_n}\Bigr) \bigl|\mathcal F p_{Y_2}(\omega)\bigr|^2 d\omega },
\]

where
\[
p_y(y') = \frac{p(Y_1 = y,\, Y_3 = y')}{p(Y_1 = y)}
= \frac{ p_0\bigl(\frac{y+y'}{2}\bigr)\, p_0\bigl(\frac{y'-y}{2}\bigr) }{ \int p_0\bigl(\frac{y+y'}{2}\bigr)\, p_0\bigl(\frac{y'-y}{2}\bigr)\, dy' }
\]
is the conditional density of $Y_3$ given $Y_1 = y$. Thus, to prove (6.21), it suffices to show that

โ„Ž๐‘› (๐‘ฆ) :=(๐œ‹

a๐‘›

) ๐‘‘2โˆซ

exp(โˆ’โ€–๐œ”โ€–2

4a๐‘›

) F ๐‘๐‘ฆ (๐œ”) 2๐‘‘๐œ”

= ๐œ‹๐‘‘2

โˆซexp

(โˆ’โ€–๐œ”โ€–2

4

) F ๐‘๐‘ฆ (โˆša๐‘›๐œ”) 2๐‘‘๐œ”

โ†’ 0

uniformly over {๐‘ฆ : โ€–๐‘ฆโ€– โ‰ค ๐›ฟ}.

Note that
\[
h_n(y) = E\, G_{\nu_n}(X, X'),
\]
where $X, X' \overset{i.i.d.}{\sim} p_y$, which implies that $h_n(y) \to 0$ pointwise. To prove the uniform convergence of $h_n(y)$, we only need to show that
\[
\lim_{y_1\to y}\, \sup_n\, |h_n(y_1) - h_n(y)| = 0
\]
for any $y$.

Since $p_0 \in L_2$, $p(Y_1 = y)$ is continuous. Therefore, the almost everywhere continuity of $p_0$ immediately implies that, for every $y$, $p_{y_1}(\cdot) \to p_y(\cdot)$ almost surely as $y_1 \to y$. Considering that $p_{y_1}$ and $p_y$ are both densities, it follows (by Scheffé's lemma) that
\[
\bigl| \mathcal F p_{y_1}(\omega) - \mathcal F p_y(\omega) \bigr| \le (2\pi)^{-d/2} \int \bigl| p_{y_1}(y') - p_y(y') \bigr|\, dy' \to 0,
\]

i.e., $\mathcal F p_{y_1} \to \mathcal F p_y$ uniformly as $y_1 \to y$. Therefore we have
\[
\sup_n\, |h_n(y_1) - h_n(y)| \lesssim \bigl\| \mathcal F p_{y_1} - \mathcal F p_y \bigr\|_{L_\infty} \to 0,
\]
which ensures the uniform convergence of $h_n(y)$ over $\{y : \|y\|\le\delta\}$, and hence (6.20).

Indeed, we have shown that
\[
\frac{ n\, \gamma^2_{\nu_n}(\mathbb P, \mathbb P_0) }{ \sqrt{ 2\, E[\bar G_{\nu_n}(X_1,X_2)]^2 } } \to_d N(0,1).
\]
By Slutsky's theorem, in order to prove (3.3), it suffices to show that
\[
\hat s^2_{n,\nu_n} \big/ E[\bar G_{\nu_n}(X_1,X_2)]^2 \to_p 1,
\]
which is equivalent to
\[
s^2_{n,\nu_n} \big/ E[\bar G_{\nu_n}(X_1,X_2)]^2 \to_p 1 \tag{6.22}
\]
since $1/n^2 = o\bigl(E[\bar G_{\nu_n}(X_1,X_2)]^2\bigr)$.

It follows from
\[
E\bigl(s^2_{n,\nu_n}\bigr) = E[\bar G_{\nu_n}(X_1,X_2)]^2
\]

and
\[
\operatorname{var}\bigl(s^2_{n,\nu_n}\bigr) \lesssim n^{-4}\operatorname{var}\Bigl( \sum_{1\le i\ne j\le n} G^2_{\nu_n}(X_i,X_j) \Bigr)
+ n^{-6}\operatorname{var}\Biggl( \sum_{\substack{1\le i,j_1,j_2\le n\\ |\{i,j_1,j_2\}|=3}} G_{\nu_n}(X_i,X_{j_1})\, G_{\nu_n}(X_i,X_{j_2}) \Biggr)
+ n^{-8}\operatorname{var}\Biggl( \sum_{\substack{1\le i_1,i_2,j_1,j_2\le n\\ |\{i_1,i_2,j_1,j_2\}|=4}} G_{\nu_n}(X_{i_1},X_{j_1})\, G_{\nu_n}(X_{i_2},X_{j_2}) \Biggr)
\]
\[
\lesssim n^{-2}\, E G^4_{\nu_n}(X_1,X_2) + n^{-1}\, E G^2_{\nu_n}(X_1,X_2)\, G^2_{\nu_n}(X_1,X_3) + n^{-1}\bigl(E G^2_{\nu_n}(X_1,X_2)\bigr)^2
= o\Bigl( \bigl(E[\bar G_{\nu_n}(X_1,X_2)]^2\bigr)^2 \Bigr)
\]
that (6.22) holds.

Proof of Theorem 8. Recall that
\[
\hat\gamma^2_{\nu_n}(\mathbb P, \mathbb P_0) = \frac1{n(n-1)} \sum_{i\ne j} \bar G_{\nu_n}(X_i, X_j; \mathbb P_0)
= \gamma^2_{\nu_n}(\mathbb P,\mathbb P_0) + \frac1{n(n-1)}\sum_{i\ne j}\bar G_{\nu_n}(X_i,X_j;\mathbb P)
\]
\[
\quad + \frac2n \sum_{i=1}^n \Bigl( E_{X\sim\mathbb P}[G_{\nu_n}(X_i,X)|X_i] - E_{X\sim\mathbb P_0}[G_{\nu_n}(X_i,X)|X_i]
- E_{X,X'\overset{iid}{\sim}\mathbb P}\, G_{\nu_n}(X,X') + E_{(X,Y)\sim\mathbb P\otimes\mathbb P_0}\, G_{\nu_n}(X,Y) \Bigr).
\]
Denote the last two terms on the rightmost-hand side by $V^{(1)}_{\nu_n}$ and $V^{(2)}_{\nu_n}$, respectively. It is clear that $E V^{(1)}_{\nu_n} = E V^{(2)}_{\nu_n} = 0$. Then it suffices to show that

\[
\sup_{\substack{p\in\mathcal W^{s,2}(M)\\ \|p-p_0\|\ge\Delta_n}} \frac{ E\bigl(V^{(1)}_{\nu_n}\bigr)^2 + E\bigl(V^{(2)}_{\nu_n}\bigr)^2 }{ \gamma^4_{\nu_n}(\mathbb P,\mathbb P_0) } \to 0 \tag{6.23}
\]


and
\[
\inf_{\substack{p\in\mathcal W^{s,2}(M)\\ \|p-p_0\|\ge\Delta_n}} \frac{ n\,\gamma^2_{\nu_n}(\mathbb P,\mathbb P_0) }{ \sqrt{ E\bigl(\hat s^2_{n,\nu_n}\bigr) } } \to \infty \tag{6.24}
\]
as $n\to\infty$.

We first prove (6.23). Note that $\|p\|_{L_2} \le \|p\|_{\mathcal W^{s,2}} \le M$. Following arguments similar to those in the proof of Theorem 7, we get
\[
E\bigl(V^{(1)}_{\nu_n}\bigr)^2 \lesssim n^{-2}\, E G^2_{\nu_n}(X_1,X_2) \lesssim_d M^2 n^{-2} \nu_n^{-d/2},
\]

and
\[
E\bigl(V^{(2)}_{\nu_n}\bigr)^2 \le \frac4n\, E\bigl[ E_{X\sim\mathbb P}[G_{\nu_n}(X_i,X)|X_i] - E_{X\sim\mathbb P_0}[G_{\nu_n}(X_i,X)|X_i] \bigr]^2
= \frac4n \int \Bigl( \int G_{2\nu_n}\bigl(x,(x'+x'')/2\bigr)\, p(x)\,dx \Bigr) G_{\nu_n/2}(x',x'')\, f(x')\, f(x'')\, dx'\,dx''
\]
\[
\lesssim_d \frac{4M}{n\,\nu_n^{d/4}} \int G_{\nu_n/2}(x',x'')\, |f(x')|\,|f(x'')|\, dx'\,dx''
\lesssim_d \frac{4M}{n\,\nu_n^{3d/4}}\, \|f\|^2_{L_2}.
\]

By Lemma 8, there exists a constant $C > 0$ depending on $s$ and $M$ only such that for $f \in \mathcal W^{s,2}(M)$,
\[
\int \exp\Bigl(-\frac{\|\omega\|^2}{4\nu_n}\Bigr) \bigl|\mathcal F f(\omega)\bigr|^2\, d\omega \ge \tfrac14 \|f\|^2_{L_2}
\]
provided that $\nu_n \ge C\|f\|^{-2/s}_{L_2}$. Because $\nu_n \Delta_n^{2/s} \to \infty$, we obtain
\[
\gamma^2_{\nu_n}(\mathbb P,\mathbb P_0) \gtrsim_d \nu_n^{-d/2}\, \|f\|^2_{L_2}
\]


for sufficiently large $n$. Thus
\[
\sup_{\substack{p\in\mathcal W^{s,2}(M)\\ \|p-p_0\|\ge\Delta_n}} \frac{E\bigl(V^{(1)}_{\nu_n}\bigr)^2}{\gamma^4_{\nu_n}(\mathbb P,\mathbb P_0)} \lesssim_d M^2 \bigl( n^2 \nu_n^{-d/2} \Delta_n^4 \bigr)^{-1} \to 0
\]
and
\[
\sup_{\substack{p\in\mathcal W^{s,2}(M)\\ \|p-p_0\|\ge\Delta_n}} \frac{E\bigl(V^{(2)}_{\nu_n}\bigr)^2}{\gamma^4_{\nu_n}(\mathbb P,\mathbb P_0)} \lesssim_d M \bigl( n \nu_n^{-d/4} \Delta_n^2 \bigr)^{-1} \to 0,
\]
as $n\to\infty$.

Next we prove (6.24). It follows from
\[
E\bigl(\hat s^2_{n,\nu_n}\bigr) \le E\max\bigl\{ |s^2_{n,\nu_n}|,\, 1/n^2 \bigr\} \lesssim E G^2_{\nu_n}(X_1,X_2) + 1/n^2 \lesssim_d M^2 \nu_n^{-d/2} + 1/n^2
\]
that (6.24) holds.

Proof of Theorem 9. This result can, in a certain sense, be viewed as an extension of results from Ingster (1987), and the proof proceeds in a similar fashion. While Ingster (1987) considered the case where $p_0$ is the uniform distribution on $[0,1]$, we shall show that similar bounds hold for a wider class of $p_0$.

For any $M > 0$ and $p_0$ such that $\|p_0\|_{\mathcal W^{s,2}} < M$, let
\[
H^{\mathrm{GOF}}_1(\Delta_n; s, M - \|p_0\|_{\mathcal W^{s,2}})^* := \bigl\{ p \in \mathcal W^{s,2} : \|p - p_0\|_{\mathcal W^{s,2}} \le M - \|p_0\|_{\mathcal W^{s,2}},\; \|p - p_0\|_{L_2} \ge \Delta_n \bigr\}.
\]
It is clear that $H^{\mathrm{GOF}}_1(\Delta_n; s) \supset H^{\mathrm{GOF}}_1(\Delta_n; s, M - \|p_0\|_{\mathcal W^{s,2}})^*$. Hence it suffices to prove Theorem 9 with $H^{\mathrm{GOF}}_1(\Delta_n; s)$ replaced by $H^{\mathrm{GOF}}_1(\Delta_n; s, M)^*$ for an arbitrary $M > 0$. We shall abbreviate $H^{\mathrm{GOF}}_1(\Delta_n; s, M)^*$ as $H^{\mathrm{GOF}}_1(\Delta_n; s)^*$ in the rest of the proof.


Since ๐‘0 is almost surely continuous, there exists ๐‘ฅ0 โˆˆ R๐‘‘ and ๐›ฟ, ๐‘ > 0 such that

๐‘0(๐‘ฅ) โ‰ฅ ๐‘ > 0, โˆ€ โ€–๐‘ฅ โˆ’ ๐‘ฅ0โ€– โ‰ค ๐›ฟ.

In light of this, we shall assume ๐‘0(๐‘ฅ) โ‰ฅ ๐‘ > 0, for all ๐‘ฅ โˆˆ [0, 1]๐‘‘ without loss of generality.

Let ๐’‚๐‘› be a multivariate random index. As proved in Ingster (1987), in order to prove the

existence of ๐›ผ โˆˆ (0, 1) such that no asymptotic ๐›ผ-level test can be consistent, it suffices to identify

๐‘๐‘›,๐’‚๐‘› โˆˆ ๐ปGOF1 (ฮ”๐‘›; ๐‘ )โˆ— for all possible values of ๐’‚๐‘› such that

E๐‘0

(๐‘๐‘› (๐‘‹1, ยท ยท ยท , ๐‘‹๐‘›)โˆ๐‘›

๐‘–=1 ๐‘0(๐‘‹๐‘–)

)2= ๐‘‚ (1), (6.25)

where

๐‘๐‘› (๐‘ฅ1, ยท ยท ยท , ๐‘ฅ๐‘›) = E๐’‚๐‘›

(๐‘›โˆ๐‘–=1

๐‘๐‘›,๐’‚๐‘› (๐‘ฅ๐‘–)), โˆ€ ๐‘ฅ1, ยท ยท ยท , ๐‘ฅ๐‘›,

i.e., ๐‘ is the mixture of all ๐‘๐‘›,๐’‚๐‘›โ€™s.

Let 1{๐‘ฅโˆˆ[0,1]๐‘‘}, ๐œ™๐‘›,1, ยท ยท ยท , ๐œ™๐‘›,๐ต๐‘› be an orthonormal sets of functions in ๐ฟ2(R๐‘‘) such that the

supports of ๐œ™๐‘›,1, ยท ยท ยท , ๐œ™๐‘›,๐ต๐‘› are disjoint and all included in [0, 1]๐‘‘ . Let ๐’‚๐‘› = (๐‘Ž๐‘›,1, ยท ยท ยท , ๐‘Ž๐‘›,๐ต๐‘›)

satisfy that ๐‘Ž๐‘›,1, ยท ยท ยท , ๐‘Ž๐‘›,๐ต๐‘› are independent and that

๐‘(๐‘Ž๐‘›,๐‘˜ = 1) = ๐‘(๐‘Ž๐‘›,๐‘˜ = โˆ’1) = 12, โˆ€ 1 โ‰ค ๐‘˜ โ‰ค ๐ต๐‘›.

Define

๐‘๐‘›,๐’‚๐‘› = ๐‘0 + ๐‘Ÿ๐‘›๐ต๐‘›โˆ‘๐‘˜=1

๐‘Ž๐‘›,๐‘˜๐œ™๐‘›,๐‘˜ .

Then๐‘๐‘›,๐’‚๐‘›

๐‘0= 1 + ๐‘Ÿ๐‘›

๐ต๐‘›โˆ‘๐‘˜=1

๐‘Ž๐‘›,๐‘˜๐œ™๐‘›,๐‘˜

๐‘0,

where 1, ๐œ™๐‘›,1๐‘0, ยท ยท ยท , ๐œ™๐‘›,๐ต๐‘›

๐‘0are orthogonal in ๐ฟ2(๐‘ƒ0).


By arguments similar to those in Ingster (1987), we find
\[
E_{p_0}\left(\frac{q_n(X_1,\cdots,X_n)}{\prod_{i=1}^n p_0(X_i)}\right)^2 \le \exp\left(\frac{1}{2}B_n n^2 r_n^4 \max_{1\le k\le B_n}\left(\int \phi^2_{n,k}/p_0\,dx\right)^2\right) \le \exp\left(\frac{1}{2b^2}B_n n^2 r_n^4\right).
\]
In order to ensure (6.25), it suffices to have
\[
B_n^{1/2} n r_n^2 = O(1). \tag{6.26}
\]

Therefore, given $\Delta_n = O\big(n^{-\frac{2s}{4s+d}}\big)$, once we can find proper $r_n$, $B_n$ and $\phi_{n,1},\cdots,\phi_{n,B_n}$ such that $p_{n,\boldsymbol{a}_n}\in H^{\mathrm{GOF}}_1(\Delta_n;s)^*$ for all $\boldsymbol{a}_n$ and (6.26) holds, the proof is finished.

Let $b_n = B_n^{1/d}$, let $\phi$ be an infinitely differentiable function supported on $[0,1]^d$ that is orthogonal to $1_{\{x\in[0,1]^d\}}$ in $L_2$, and for each $x_{n,k}\in\{0,1,\cdots,b_n-1\}^{\otimes d}$, let
\[
\phi_{n,k}(x) = \frac{b_n^{d/2}}{\|\phi\|_{L_2}}\,\phi(b_n x - x_{n,k}), \quad \forall\ x\in\mathbb{R}^d.
\]

Then all the $\phi_{n,k}$'s are supported on $[0,1]^d$ and
\[
\langle\phi_{n,k},1\rangle_{L_2} = \frac{b_n^{d/2}}{\|\phi\|_{L_2}}\int_{\mathbb{R}^d}\phi(b_n x - x_{n,k})\,dx = \frac{1}{b_n^{d/2}\|\phi\|_{L_2}}\int_{\mathbb{R}^d}\phi(x)\,dx = 0,
\]
\[
\|\phi_{n,k}\|^2_{L_2} = \frac{b_n^d}{\|\phi\|^2_{L_2}}\int_{[0,1/b_n]^d}\phi^2(b_n x)\,dx = 1,
\]
\[
\|\phi_{n,k}\|^2_{\mathcal{W}^{s,2}} \le b_n^{2s}\,\frac{\|\phi\|^2_{\mathcal{W}^{s,2}}}{\|\phi\|^2_{L_2}}.
\]
Since for $k\ne k'$ the supports of $\phi_{n,k}$ and $\phi_{n,k'}$ are disjoint,
\[
\|p_{n,\boldsymbol{a}_n}-p_0\|_\infty = r_n b_n^{d/2}\,\frac{\|\phi\|_\infty}{\|\phi\|_{L_2}},
\]


and
\[
\langle\phi_{n,k},\phi_{n,k'}\rangle_{L_2} = 0, \quad \langle\phi_{n,k},\phi_{n,k'}\rangle_{\mathcal{W}^{s,2}} = 0,
\]
from which we immediately obtain
\[
\|p_{n,\boldsymbol{a}_n}-p_0\|^2_{L_2} = r_n^2 b_n^d, \qquad \|p_{n,\boldsymbol{a}_n}-p_0\|^2_{\mathcal{W}^{s,2}} \le r_n^2 b_n^{d+2s}\,\frac{\|\phi\|^2_{\mathcal{W}^{s,2}}}{\|\phi\|^2_{L_2}}.
\]

To ensure $p_{n,\boldsymbol{a}_n}\in H^{\mathrm{GOF}}_1(\Delta_n;s)^*$, it suffices to make
\[
r_n b_n^{d/2}\,\frac{\|\phi\|_\infty}{\|\phi\|_{L_2}} \to 0 \ \text{ as } n\to\infty, \tag{6.27}
\]
\[
r_n^2 b_n^d = \Delta_n^2, \tag{6.28}
\]
\[
r_n^2 b_n^{d+2s}\,\frac{\|\phi\|^2_{\mathcal{W}^{s,2}}}{\|\phi\|^2_{L_2}} \le M^2. \tag{6.29}
\]

Let
\[
b_n = \left(\frac{M\|\phi\|_{L_2}}{\|\phi\|_{\mathcal{W}^{s,2}}}\right)^{1/s}\Delta_n^{-1/s}, \qquad r_n = \frac{\Delta_n}{b_n^{d/2}}.
\]

Then (6.28) and (6.29) are satisfied. Moreover, given $\Delta_n = O\big(n^{-\frac{2s}{4s+d}}\big)$,
\[
B_n^{1/2} n r_n^2 = b_n^{-d/2} n \Delta_n^2 \lesssim_{d,\phi,M} n\Delta_n^{\frac{4s+d}{2s}} = O(1),
\]
and
\[
r_n b_n^{d/2}\,\frac{\|\phi\|_\infty}{\|\phi\|_{L_2}} \lesssim_\phi \Delta_n = o(1),
\]
ensuring both (6.26) and (6.27).


Finally, we show the existence of such a $\phi$. Let
\[
\phi_0(x_1) =
\begin{cases}
\exp\left(-\dfrac{1}{1-(4x_1-1)^2}\right) & 0 < x_1 < \dfrac{1}{2},\\[2mm]
-\exp\left(-\dfrac{1}{1-(4x_1-3)^2}\right) & \dfrac{1}{2} < x_1 < 1,\\[2mm]
0 & \text{otherwise}.
\end{cases}
\]
Then $\phi_0$ is supported on $[0,1]$, infinitely differentiable and orthogonal to the indicator function of $[0,1]$. Let
\[
\phi(x) = \prod_{l=1}^d \phi_0(x_l), \quad \forall\ x = (x_1,\cdots,x_d)\in\mathbb{R}^d.
\]
Then $\phi$ is supported on $[0,1]^d$, infinitely differentiable and $\langle\phi,1\rangle_{L_2} = \langle\phi_0,1\rangle^d_{L_2[0,1]} = 0$.

Proof of Theorem 10. Let $N = m+n$ denote the total sample size. It suffices to prove the result under the assumption that $n/N \to r \in (0,1)$.

Note that under $H_0$,
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j) + \frac{1}{m(m-1)}\sum_{1\le i\ne j\le m}\bar G_{\nu_n}(Y_i,Y_j) - \frac{2}{nm}\sum_{1\le i\le n}\sum_{1\le j\le m}\bar G_{\nu_n}(X_i,Y_j).
\]

Let ๐‘›/๐‘ = ๐‘Ÿ๐‘›. Then we have

๐›พ2a๐‘› (P,Q)

=๐‘โˆ’2 ยฉยญยซ 1๐‘Ÿ๐‘› (๐‘Ÿ๐‘› โˆ’ ๐‘โˆ’1)

โˆ‘1โ‰ค๐‘–โ‰  ๐‘—โ‰ค๐‘›

๏ฟฝ๏ฟฝa๐‘› (๐‘‹๐‘–, ๐‘‹ ๐‘— ) +

1(1 โˆ’ ๐‘Ÿ๐‘›) (1 โˆ’ ๐‘Ÿ๐‘› โˆ’ ๐‘โˆ’1)

โˆ‘1โ‰ค๐‘–โ‰  ๐‘—โ‰ค๐‘š

๏ฟฝ๏ฟฝa๐‘› (๐‘Œ๐‘–, ๐‘Œ ๐‘— ) โˆ’2

๐‘Ÿ๐‘› (1 โˆ’ ๐‘Ÿ๐‘›)โˆ‘

1โ‰ค๐‘–โ‰ค๐‘›

โˆ‘1โ‰ค ๐‘—โ‰ค๐‘š

๏ฟฝ๏ฟฝa๐‘› (๐‘‹๐‘–, ๐‘Œ ๐‘— )ยชยฎยฌ .


Let
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})'
= N^{-2}\Bigg(\frac{1}{r^2}\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j) + \frac{1}{(1-r)^2}\sum_{1\le i\ne j\le m}\bar G_{\nu_n}(Y_i,Y_j) - \frac{2}{r(1-r)}\sum_{1\le i\le n}\sum_{1\le j\le m}\bar G_{\nu_n}(X_i,Y_j)\Bigg).
\]
As we assume $r_n\to r$ as $n\to\infty$, Theorem 7 ensures that
\[
\frac{nm}{\sqrt{2(n+m)}}\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^{-\frac12}\Big(\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) - \hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})'\Big) = o_p(1).
\]

A slight adaptation of arguments in Hall (1984) suggests that
\[
\frac{E\bar G^4_{\nu_n}(X_1,X_2)}{N^2\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^2} + \frac{E\big[\bar G^2_{\nu_n}(X_1,X_2)\bar G^2_{\nu_n}(X_1,X_3)\big]}{N\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^2} + \frac{E H^2_{\nu_n}(X_1,X_2)}{\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^2} \to 0 \tag{6.30}
\]
ensures that
\[
\frac{nm}{\sqrt{2(n+m)}}\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^{-\frac12}\,\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})' \to_d N(0,1).
\]

Following arguments similar to those in the proof of Theorem 7, given $\nu_n\to\infty$ and $\nu_n/n^{4/d}\to 0$, (6.30) holds and therefore
\[
\frac{nm}{\sqrt{2(n+m)}}\big[E\bar G^2_{\nu_n}(X_1,X_2)\big]^{-\frac12}\,\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) \to_d N(0,1).
\]
Additionally, based on the same arguments as in the proof of Theorem 7,
\[
\hat s^2_{n,m,\nu_n}\big/E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2 \to_p 1.
\]
The proof is therefore concluded.


Proof of Theorem 11. With slight abuse of notation, we shall write
\[
\bar G_{\nu_n}(x,y;\mathbb{P},\mathbb{Q}) = G_{\nu_n}(x,y) - E_{Y\sim\mathbb{Q}}G_{\nu_n}(x,Y) - E_{X\sim\mathbb{P}}G_{\nu_n}(X,y) + E_{(X,Y)\sim\mathbb{P}\otimes\mathbb{Q}}G_{\nu_n}(X,Y).
\]
We consider the two parts separately.

Part (i). We first verify the consistency of $\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha}$ with $\nu_n\asymp n^{4/(d+4s)}$ given $\Delta_n\gg n^{-2s/(d+4s)}$. Observe the following decomposition of $\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})$:
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) = \gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q}) + L^{(1)}_{n,\nu_n} + L^{(2)}_{n,\nu_n},
\]

where
\[
L^{(1)}_{n,\nu_n} = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j;\mathbb{P}) - \frac{2}{mn}\sum_{1\le i\le n}\sum_{1\le j\le m}\bar G_{\nu_n}(X_i,Y_j;\mathbb{P},\mathbb{Q}) + \frac{1}{m(m-1)}\sum_{1\le i\ne j\le m}\bar G_{\nu_n}(Y_i,Y_j;\mathbb{Q})
\]
and
\[
L^{(2)}_{n,\nu_n} = \frac{2}{n}\sum_{i=1}^n\Big(E[G_{\nu_n}(X_i,X)\,|\,X_i] - EG_{\nu_n}(X,X') - E[G_{\nu_n}(X_i,Y)\,|\,X_i] + EG_{\nu_n}(X,Y)\Big)
+ \frac{2}{m}\sum_{j=1}^m\Big(E[G_{\nu_n}(Y_j,Y)\,|\,Y_j] - EG_{\nu_n}(Y,Y') - E[G_{\nu_n}(X,Y_j)\,|\,Y_j] + EG_{\nu_n}(X,Y)\Big).
\]

In order to prove the consistency of $\Phi^{\mathrm{HOM}}_{n,\nu_n,\alpha}$, it suffices to show
\[
\sup_{\substack{p,q\in\mathcal{W}^{s,2}(M)\\ \|p-q\|_{L_2}\ge\Delta_n}} \frac{E\big(L^{(1)}_{n,\nu_n}\big)^2 + E\big(L^{(2)}_{n,\nu_n}\big)^2}{\gamma^4_{G_{\nu_n}}(\mathbb{P},\mathbb{Q})} \to 0, \tag{6.31}
\]
\[
\inf_{\substack{p,q\in\mathcal{W}^{s,2}(M)\\ \|p-q\|_{L_2}\ge\Delta_n}} \frac{\gamma^2_{G_{\nu_n}}(\mathbb{P},\mathbb{Q})}{(1/n+1/m)\sqrt{E\big(\hat s^2_{n,m,\nu_n}\big)}} \to \infty, \tag{6.32}
\]


as ๐‘› โ†’ โˆž. We now prove (6.31) and (6.32) with arguments similar to those obtained in the proof

of Theorem 8.

Note that

E(๐ฟ (1)๐‘›,a๐‘›)

2

.Eยฉยญยซ 1๐‘›(๐‘› โˆ’ 1)

โˆ‘1โ‰ค๐‘–โ‰  ๐‘—โ‰ค๐‘›

๏ฟฝ๏ฟฝa๐‘› (๐‘‹๐‘–, ๐‘‹ ๐‘— ;P)ยชยฎยฌ

2

+ E ยฉยญยซ 2๐‘š๐‘›

โˆ‘1โ‰ค๐‘–โ‰ค๐‘›

โˆ‘1โ‰ค ๐‘—โ‰ค๐‘š

๏ฟฝ๏ฟฝa๐‘› (๐‘‹๐‘–, ๐‘Œ ๐‘— ;P,Q)ยชยฎยฌ

2

+ E ยฉยญยซ 1๐‘š(๐‘š โˆ’ 1)

โˆ‘1โ‰ค๐‘–โ‰  ๐‘—โ‰ค๐‘š

๏ฟฝ๏ฟฝa๐‘› (๐‘Œ๐‘–, ๐‘Œ ๐‘— ;Q)ยชยฎยฌ

2

.1๐‘›2E๐บ

2a๐‘›(๐‘‹1, ๐‘‹2) +

1๐‘š2E๐บ

2a๐‘›(๐‘Œ1, ๐‘Œ2).

Given ๐‘, ๐‘ž โˆˆ W๐‘ ,2(๐‘€),

E๐บ2a๐‘›(๐‘‹1, ๐‘‹2) .๐‘‘ ๐‘€

2aโˆ’๐‘‘/2๐‘› , E๐บ2

a๐‘›(๐‘Œ1, ๐‘Œ2) .๐‘‘ ๐‘€

2aโˆ’๐‘‘/2๐‘› .

Hence

E(๐ฟ (1)๐‘›,a๐‘›)

2 .๐‘‘ ๐‘€2a

โˆ’๐‘‘/2๐‘›

(1๐‘›2 + 1

๐‘š2

). (6.33)

Now consider bounding $L^{(2)}_{n,\nu_n}$. Let $f = p - q$. Then we have
\[
E\big(L^{(2)}_{n,\nu_n}\big)^2 \lesssim_d \nu_n^{-\frac{3d}{4}} M\|f\|^2_{L_2}\left(\frac{1}{n}+\frac{1}{m}\right). \tag{6.34}
\]
Since $\nu_n\asymp n^{4/(4s+d)}\gtrsim \Delta_n^{-2/s}$, Lemma 8 ensures that for sufficiently large $n$,
\[
\gamma^2_{G_{\nu_n}}(\mathbb{P},\mathbb{Q}) \gtrsim_d \nu_n^{-d/2}\|f\|^2_{L_2}, \quad \forall\ p,q\in\mathcal{W}^{s,2}(M).
\]


This together with (6.33) and (6.34) gives
\[
\sup_{\substack{p,q\in\mathcal{W}^{s,2}(M)\\ \|p-q\|_{L_2}\ge\Delta_n}} \frac{E\big(L^{(1)}_{n,\nu_n}\big)^2 + E\big(L^{(2)}_{n,\nu_n}\big)^2}{\gamma^4_{G_{\nu_n}}(\mathbb{P},\mathbb{Q})} \lesssim_d \frac{M^2\nu_n^{d/2}}{n^2\Delta_n^4} + \frac{M\nu_n^{d/4}}{n\Delta_n^2} \to 0
\]
as $n\to\infty$, which proves (6.31).

Finally, consider (6.32). It follows from
\[
E\big(\hat s^2_{n,m,\nu_n}\big) \le E\max\big\{\big|s^2_{n,m,\nu_n}\big|,\,1/n^2\big\} \lesssim \max\big\{EG^2_{\nu_n}(X_1,X_2),\,EG^2_{\nu_n}(Y_1,Y_2)\big\} + 1/n^2 \lesssim_d M^2\nu_n^{-d/2} + 1/n^2
\]
that (6.32) holds.

Part (ii). Next, we prove that if $\liminf_{n\to\infty}\Delta_n n^{2s/(d+4s)} < \infty$, then there exists some $\alpha\in(0,1)$ such that no asymptotic $\alpha$-level test can be consistent. To prove this, we shall verify that consistency of the homogeneity test is harder to achieve than that of the goodness-of-fit test.

Consider an arbitrary $p_0\in\mathcal{W}^{s,2}(M/2)$. It immediately follows that
\[
H^{\mathrm{HOM}}_1(\Delta_n;s) \supset \big\{(p,p_0) : p\in H^{\mathrm{GOF}}_1(\Delta_n;s)\big\}.
\]
Let $\{\Phi_n\}_{n\ge1}$ be any sequence of asymptotic $\alpha$-level homogeneity tests, where
\[
\Phi_n = \Phi_n(X_1,\cdots,X_n,Y_1,\cdots,Y_m).
\]
Then if $Y_1,\cdots,Y_m\sim_{\mathrm{iid}} P_0$, $\{\Phi_n\}_{n\ge1}$ can also be treated as a sequence of (random) goodness-of-fit tests in $X_1,\cdots,X_n$ alone, with the randomness coming from $Y_1,\cdots,Y_m$,


whose probabilities of type I error with respect to $P_0$ are controlled at $\alpha$ asymptotically. Moreover,
\[
\mathrm{power}\{\Phi_n;\,H^{\mathrm{HOM}}_1(\Delta_n;s)\} \le \mathrm{power}\{\Phi_n;\,H^{\mathrm{GOF}}_1(\Delta_n;s)\}.
\]
Since $0 < c \le m/n \le C < \infty$, Theorem 9 ensures that there exists some $\alpha\in(0,1)$ such that for any sequence of asymptotic $\alpha$-level tests $\{\Phi_n\}_{n\ge1}$,
\[
\liminf_{n\to\infty}\mathrm{power}\{\Phi_n;\,H^{\mathrm{HOM}}_1(\Delta_n;s)\} \le \liminf_{n\to\infty}\mathrm{power}\{\Phi_n;\,H^{\mathrm{GOF}}_1(\Delta_n;s)\} < 1
\]
given $\liminf_{n\to\infty}\Delta_n n^{2s/(d+4s)} < \infty$.

Proof of Theorem 12. For brevity, we shall focus on the case $k=2$ in the rest of the proof. Our argument, however, can be straightforwardly extended to the more general cases. The proof relies on the following decomposition of $\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})$ under $H^{\mathrm{IND}}_0$:
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^*_{\nu_n}(X_i,X_j) + R_n,
\]
where
\[
G^*_{\nu_n}(x,y) = \bar G_{\nu_n}(x,y) - \sum_{1\le j\le 2}g_j(x^j,y) - \sum_{1\le j\le 2}g_j(y^j,x) + \sum_{1\le j_1,j_2\le 2}g_{j_1,j_2}(x^{j_1},y^{j_2})
\]
and the remainder $R_n$ satisfies
\[
E(R_n)^2 \lesssim EG^2_{\nu}(X_1,X_2)/n^3 \lesssim_d \|p\|^2_{L_2}\nu_n^{-d/2}/n^3.
\]
See Appendix B.4 for more details.
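For intuition, the statistic being decomposed can also be sketched numerically. The snippet below is a hedged sketch: it uses the simpler plug-in (V-statistic) form rather than the exact U-statistic analyzed here, and assumes the Gaussian kernel $G_\nu(x,y)=\exp(-\nu\|x-y\|^2)$, which factors over the two coordinate blocks; the sample sizes and bandwidth are arbitrary:

```python
import numpy as np

def gram(A, nu):
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-nu * d2)

def gamma2_ind(X1, X2, nu):
    # plug-in estimate of gamma^2_nu(P, P^{X1} x P^{X2}); since the kernel
    # factors over blocks, everything reduces to the two block Gram matrices
    n = len(X1)
    K1, K2 = gram(X1, nu), gram(X2, nu)
    t1 = (K1 * K2).mean()            # joint vs joint
    t2 = (K1 @ K2).sum() / n ** 3    # joint vs product of marginals
    t3 = K1.mean() * K2.mean()       # product vs product
    return t1 - 2 * t2 + t3          # a squared RKHS distance, hence >= 0

rng = np.random.default_rng(2)
Z = rng.normal(size=(300, 1))
X2_dep = Z + 0.1 * rng.normal(size=(300, 1))   # strongly dependent on Z
X2_ind = rng.normal(size=(300, 1))             # independent of Z
print(gamma2_ind(Z, X2_ind, nu=1.0) < gamma2_ind(Z, X2_dep, nu=1.0))   # True
```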


Moreover, borrowing arguments from the proof of Lemma 1, we obtain
\[
E\big(G^*_{\nu_n}(X_1,X_2) - \bar G_{\nu_n}(X_1,X_2)\big)^2
\lesssim \sum_{1\le j\le 2}E\big(g_j(X^j_1,X_2)\big)^2 + \sum_{1\le j_1,j_2\le 2}E\big(g_{j_1,j_2}(X^{j_1}_1,X^{j_2}_2)\big)^2
\]
\[
\le \sum_{1\le j_1\ne j_2\le 2}EG^2_{\nu_n}(X^{j_1}_1,X^{j_1}_2)\cdot E\Big\{E\big[G_{\nu_n}(X^{j_2}_1,X^{j_2}_2)\,\big|\,X^{j_2}_1\big]\Big\}^2
+ \sum_{1\le j_1\ne j_2\le 2}EG^2_{\nu_n}(X^{j_1}_1,X^{j_1}_2)\big[EG_{\nu_n}(X^{j_2}_1,X^{j_2}_2)\big]^2
+ 2E\Big\{E\big[G_{\nu_n}(X^1_1,X^1_2)\,\big|\,X^1_1\big]\Big\}^2 E\Big\{E\big[G_{\nu_n}(X^2_1,X^2_2)\,\big|\,X^2_1\big]\Big\}^2
\]
\[
\lesssim_d \nu_n^{-d_1/2-3d_2/4}\|p_1\|^2_{L_2}\|p_2\|^3_{L_2} + \nu_n^{-3d_1/4-d_2/2}\|p_1\|^3_{L_2}\|p_2\|^2_{L_2}.
\]

Together with the fact that
\[
(2\nu_n/\pi)^{d/2}\,E\bar G^2_{\nu_n}(X_1,X_2) \to \|p\|^2_{L_2}
\]
as $\nu_n\to\infty$, we conclude that
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) = D(\nu_n) + o_p\Big(\sqrt{ED^2(\nu_n)}\Big),
\]
where
\[
D(\nu_n) = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j).
\]
Applying arguments similar to those in the proofs of Theorems 7 and 10, we have
\[
\frac{D(\nu_n)}{\sqrt{ED^2(\nu_n)}} \to_d N(0,1).
\]
Since
\[
ED^2(\nu_n) = \frac{2}{n(n-1)}E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2 \quad\text{and}\quad E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2\big/E\big[G^*_{\nu_n}(X_1,X_2)\big]^2 \to 1,
\]


it remains to prove
\[
\hat s^2_{n,\nu_n}\big/E\big[G^*_{\nu_n}(X_1,X_2)\big]^2 \to_p 1,
\]
which immediately follows by observing
\[
s^2_{n,\nu_n}\big/E\big[G^*_{\nu_n}(X_1,X_2)\big]^2 = \prod_{j=1}^2 s^2_{n,j,\nu_n}\big/E\big[\bar G_{\nu_n}(X^j_1,X^j_2)\big]^2 \to_p 1
\]
and $1/n^2 = o\big(E\big[G^*_{\nu_n}(X_1,X_2)\big]^2\big)$. The proof is therefore concluded.

Proof of Theorem 13. We prove the two parts separately.

Part (i). The proof of the consistency of $\Phi^{\mathrm{IND}}_{n,\nu_n,\alpha}$ is very similar to its counterpart in the proof of Theorem 11. It suffices to show
\[
\sup_{p\in H^{\mathrm{IND}}_1(\Delta_n,s)} \frac{\mathrm{var}\big(\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})\big)}{\gamma^4_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})} \to 0, \tag{6.35}
\]
\[
\inf_{p\in H^{\mathrm{IND}}_1(\Delta_n,s)} \frac{n\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})}{E\big(\hat s_{n,\nu_n}\big)} \to \infty, \tag{6.36}
\]
as $n\to\infty$.

We begin with (6.35). Let $f = p - p_1\otimes p_2$. Lemma 8 then implies that there exists $C = C(s,M) > 0$ such that
\[
\gamma^2_{\nu}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) \asymp_d \nu^{-d/2}\|f\|^2_{L_2}
\]
for $\nu \ge C\|f\|^{-2/s}_{L_2}$, which is satisfied by all $p\in H^{\mathrm{IND}}_1(\Delta_n,s)$ given $\nu = \nu_n$ and $\lim_{n\to\infty}\Delta_n n^{\frac{2s}{4s+d}} = \infty$. On the other hand, we can still decompose $\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})$ as in Appendix B.4. We follow the same notation here.


Under the alternative hypothesis, the "first order" term
\[
D_1(\nu_n)
= \frac{2}{n}\sum_{1\le i\le n}\Big(E_{X_i,X\sim_{\mathrm{iid}}\mathbb{P}}\big[G_{\nu_n}(X_i,X)\,\big|\,X_i\big] - E_{X,X'\sim_{\mathrm{iid}}\mathbb{P}}G_{\nu_n}(X,X')\Big)
\]
\[
- \frac{2}{n}\sum_{1\le i\le n}\Big(E_{X_i\sim\mathbb{P},\,Y\sim\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}\big[G_{\nu_n}(X_i,Y)\,\big|\,X_i\big] - E_{X\sim\mathbb{P},\,Y\sim\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}G_{\nu_n}(X,Y)\Big)
\]
\[
- \sum_{1\le j\le 2}\Bigg(\frac{2}{n}\sum_{1\le i\le n}\Big(E_{X_i\sim\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2},\,X\sim\mathbb{P}}\big[G_{\nu_n}(X_i,X)\,\big|\,X^j_i\big] - E_{X\sim\mathbb{P},\,Y\sim\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}G_{\nu_n}(X,Y)\Big)\Bigg)
\]
\[
+ \sum_{1\le j\le 2}\Bigg(\frac{2}{n}\sum_{1\le i\le n}\Big(E_{X_i,Y\sim_{\mathrm{iid}}\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}\big[G_{\nu_n}(X_i,Y)\,\big|\,X^j_i\big] - E_{Y,Y'\sim_{\mathrm{iid}}\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}G_{\nu_n}(Y,Y')\Big)\Bigg)
\]
no longer vanishes, but based on arguments similar to those in the proof of Theorem 8,
\[
ED_1^2(\nu_n) \lesssim_d M n^{-1}\nu_n^{-3d/4}\|f\|^2_{L_2}.
\]

Moreover, the "second order" term $D_2(\nu_n)$ is no longer solely $\sum_{1\le i\ne j\le n}G^*_{\nu_n}(X_i,X_j)/(n(n-1))$, but we still have
\[
ED_2^2(\nu_n) \lesssim n^{-2}\max\big\{EG^2_{\nu_n}(X_1,X_2),\,EG^2_{\nu_n}(X^1_1,X^1_2)\,EG^2_{\nu_n}(X^2_1,X^2_2)\big\} \lesssim_d M^2 n^{-2}\nu_n^{-d/2}.
\]

Similarly, define the third order term $D_3(\nu_n)$ and the fourth order term $D_4(\nu_n)$ as the aggregation of all 3-variate centered components and the aggregation of all 4-variate centered components in $\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})$ respectively, which together constitute $R_n$. Then we have
\[
ED_3^2(\nu_n) \lesssim_d M^2 n^{-3}\nu_n^{-d/2}, \qquad ED_4^2(\nu_n) \lesssim_d M^2 n^{-4}\nu_n^{-d/2}.
\]
Hence we finally obtain
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) = \gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) + \sum_{l=1}^4 D_l(\nu_n)
\]


and
\[
\mathrm{var}\big(\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2})\big) = \sum_{l=1}^4 ED_l^2(\nu_n) \lesssim_d M n^{-1}\nu_n^{-3d/4}\|f\|^2_{L_2} + M^2 n^{-2}\nu_n^{-d/2},
\]
which proves (6.35).

Now consider (6.36). Since
\[
\hat s_{n,\nu_n} \le \max\Bigg\{\prod_{j=1}^2\sqrt{\big|s^2_{n,j,\nu_n}\big|},\ 1/n\Bigg\},
\]
we have
\[
E\big(\hat s_{n,\nu_n}\big) \le \prod_{j=1}^2\sqrt{E\big|s^2_{n,j,\nu_n}\big|} + 1/n,
\]
where
\[
\prod_{j=1}^2 E\big|s^2_{n,j,\nu_n}\big| \lesssim \prod_{j=1}^2 EG^2_{\nu_n}(X^j_1,X^j_2) = E_{Y_1,Y_2\sim_{\mathrm{iid}}\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}}G^2_{\nu_n}(Y_1,Y_2) \lesssim_d M^2\nu_n^{-d/2}.
\]
Therefore (6.36) holds.

Part (ii). Next we verify that $n^{2s/(d+4s)}\Delta_n\to\infty$ is also a necessary condition for the existence of consistent asymptotic $\alpha$-level tests for any $\alpha\in(0,1)$. Similarly to the proof of Theorem 11, the idea is to relate the existence of a consistent independence test to the existence of a consistent goodness-of-fit test.

Let $p_{j,0}\in\mathcal{W}^{s,2}(M_j/\sqrt{2})$ be a density on $\mathbb{R}^{d_j}$ for $j=1,2$ and let $p_0$ be the product of $p_{1,0}$ and $p_{2,0}$, i.e.,
\[
p_0(x_1,x_2) = p_{1,0}(x_1)\,p_{2,0}(x_2), \quad \forall\ x_1\in\mathbb{R}^{d_1},\ x_2\in\mathbb{R}^{d_2}.
\]
Hence $p_0\in\mathcal{W}^{s,2}(M/2)$.

Let
\[
H^{\mathrm{GOF}}_1(\Delta_n;s)' := \big\{p : p\in\mathcal{W}^{s,2}(M),\ p_1 = p_{1,0},\ p_2 = p_{2,0},\ \|p-p_0\|_{L_2}\ge\Delta_n\big\}.
\]


We immediately have
\[
H^{\mathrm{IND}}_1(\Delta_n;s) \supset H^{\mathrm{GOF}}_1(\Delta_n;s)'.
\]

Let $\{\Phi_n\}_{n\ge1}$ be any sequence of asymptotic $\alpha$-level independence tests, where
\[
\Phi_n = \Phi_n(X_1,\cdots,X_n).
\]
Then $\{\Phi_n\}_{n\ge1}$ can also be treated as a sequence of asymptotic $\alpha$-level goodness-of-fit tests with the null density being $p_0$. Moreover,
\[
\mathrm{power}\{\Phi_n;\,H^{\mathrm{IND}}_1(\Delta_n;s)\} \le \mathrm{power}\{\Phi_n;\,H^{\mathrm{GOF}}_1(\Delta_n;s)'\}.
\]
It remains to show that given $\liminf_{n\to\infty} n^{2s/(d+4s)}\Delta_n < \infty$, there exists some $\alpha\in(0,1)$ such that
\[
\liminf_{n\to\infty}\mathrm{power}\{\Phi_n;\,H^{\mathrm{GOF}}_1(\Delta_n;s)'\} < 1,
\]
which cannot be directly obtained from Theorem 9 because of the additional constraints
\[
p_1 = p_{1,0}, \quad p_2 = p_{2,0} \tag{6.37}
\]
in $H^{\mathrm{GOF}}_1(\Delta_n;s)'$.

However, by modifying the proof of Theorem 9, we only need to further require that each $p_{n,\boldsymbol{a}_n}$ in the proof of Theorem 9 satisfies (6.37), or equivalently,
\[
\int_{\mathbb{R}^{d_2}}(p-p_0)(x_1,x_2)\,dx_2 = 0, \qquad \int_{\mathbb{R}^{d_1}}(p-p_0)(x_1,x_2)\,dx_1 = 0.
\]


Recall that each $p_{n,\boldsymbol{a}_n} = p_0 + r_n\sum_{k=1}^{B_n}a_{n,k}\phi_{n,k}$, where
\[
\phi_{n,k}(x) = \frac{b_n^{d/2}}{\|\phi\|_{L_2}}\,\phi(b_n x - x_{n,k}).
\]
Write $x_{n,k} = (x^1_{n,k}, x^2_{n,k}) \in \mathbb{R}^{d_1}\times\mathbb{R}^{d_2}$. Since $\phi$ can be decomposed as
\[
\phi(x_1,x_2) = \phi_1(x_1)\,\phi_2(x_2),
\]
we have
\[
\phi_{n,k}(x) = \frac{b_n^{d/2}}{\|\phi\|_{L_2}}\,\phi_1(b_n x_1 - x^1_{n,k})\,\phi_2(b_n x_2 - x^2_{n,k}).
\]

Hence
\[
\int_{\mathbb{R}^{d_2}}(p_{n,\boldsymbol{a}_n}-p_0)(x_1,x_2)\,dx_2 = r_n\sum_{k=1}^{B_n}a_{n,k}\int_{\mathbb{R}^{d_2}}\phi_{n,k}(x_1,x_2)\,dx_2
= r_n\sum_{k=1}^{B_n}a_{n,k}\,\frac{b_n^{d/2}}{\|\phi\|_{L_2}}\cdot\phi_1(b_n x_1 - x^1_{n,k})\cdot\frac{1}{b_n^{d_2}}\int_{\mathbb{R}^{d_2}}\phi_2(x_2)\,dx_2
= 0
\]
since $\int_{\mathbb{R}^{d_2}}\phi_2(x_2)\,dx_2 = 0$. Similarly, $\int_{\mathbb{R}^{d_1}}(p_{n,\boldsymbol{a}_n}-p_0)(x_1,x_2)\,dx_1 = 0$. The proof is therefore finished.

Proof of Theorem 14. The proof of Theorem 14 consists of two steps. First, we bound $q^{\mathrm{GOF}}_{n,\alpha}$. To be more specific, we show that there exists $C = C(d) > 0$ such that
\[
q^{\mathrm{GOF}}_{n,\alpha} \le C(d)\log\log n
\]
for sufficiently large $n$, which holds if
\[
\lim_{n\to\infty} P\big(T^{\mathrm{GOF(adapt)}}_n \ge C(d)\log\log n\big) = 0 \tag{6.38}
\]
under $H^{\mathrm{GOF}}_0$. Second, we show that there exists $c > 0$ such that
\[
\liminf_{n\to\infty}\Delta_{n,s}\,(n/\log\log n)^{2s/(d+4s)} > c
\]
ensures
\[
\inf_{p\in H^{\mathrm{GOF(adapt)}}_1(\Delta_{n,s}:\,s\ge d/4)} P\big(T^{\mathrm{GOF(adapt)}}_n \ge C(d)\log\log n\big) \to 1 \tag{6.39}
\]
as $n\to\infty$.

Verifying (6.38). In order to prove (6.38), we first show the following two lemmas. The first lemma suggests that $\hat s^2_{n,\nu_n}$ is a consistent estimator of $E\bar G^2_{\nu_n}(X_1,X_2)$ uniformly over all $\nu_n\in[1,n^{2/d}]$. Recall that we have shown in the proof of Theorem 7 that for $\nu_n$ increasing at a proper rate,
\[
\hat s^2_{n,\nu_n}\big/E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2 \to_p 1.
\]
Hence the first lemma is a uniform version of this result.

Lemma 5. We have that $\hat s^2_{n,\nu_n}/E[\bar G_{\nu_n}(X_1,X_2)]^2$ converges to 1 uniformly over $\nu_n\in[1,n^{2/d}]$, i.e.,
\[
\sup_{1\le\nu_n\le n^{2/d}}\Big|\hat s^2_{n,\nu_n}\big/E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2 - 1\Big| = o_p(1).
\]


We defer the proof of Lemma 5 to the appendix. Note that
\[
T^{\mathrm{GOF(adapt)}}_n = \sup_{1\le\nu_n\le n^{2/d}} \frac{n\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\cdot\sqrt{E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2\big/\hat s^2_{n,\nu_n}}
\le \sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{n\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\Bigg| \cdot \sup_{1\le\nu_n\le n^{2/d}}\sqrt{E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2\big/\hat s^2_{n,\nu_n}}.
\]

Lemma 5 first ensures that
\[
\sup_{1\le\nu_n\le n^{2/d}}\sqrt{E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2\big/\hat s^2_{n,\nu_n}} = 1 + o_p(1).
\]
It therefore suffices to show that under $H^{\mathrm{GOF}}_0$, the oracle-normalized statistic
\[
\tilde T^{\mathrm{GOF(adapt)}}_n := \sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{n\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}_0)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\Bigg|
\]
is also of order $\log\log n$. This is the crux of our argument, yet its proof is lengthy. For brevity, we shall state it as a lemma here and defer its proof to the appendix.

Lemma 6. There exists $C = C(d) > 0$ such that
\[
\lim_{n\to\infty} P\big(\tilde T^{\mathrm{GOF(adapt)}}_n \ge C\log\log n\big) = 0
\]
under $H^{\mathrm{GOF}}_0$.

Verifying (6.39). Let
\[
\nu_n(s)' = \left(\frac{\log\log n}{n}\right)^{-4/(4s+d)},
\]
which is smaller than $n^{2/d}$ for $s\ge d/4$. Hence it suffices to show
\[
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{GOF}}_1(\Delta_{n,s};s)} P\big(T^{\mathrm{GOF}}_{n,\nu_n(s)'} \ge C(d)\log\log n\big) \to 1
\]
as $n\to\infty$.

First of all, observe
\[
0 \le E\big(s^2_{n,\nu_n(s)'}\big) \le EG^2_{\nu_n(s)'}(X_1,X_2) \le M^2\big(2\nu_n(s)'/\pi\big)^{-d/2}
\]
and
\[
\mathrm{var}\big(s^2_{n,\nu_n(s)'}\big) \lesssim_d M^3 n^{-1}\big(\nu_n(s)'\big)^{-3d/4} + M^2 n^{-2}\big(\nu_n(s)'\big)^{-d/2}
\]
for any $s$ and $p\in H^{\mathrm{GOF}}_1(\Delta_{n,s},s)$. Further considering that $1/n^2 = o\big(M^2(2\nu_n(s)'/\pi)^{-d/2}\big)$ uniformly over all $s$, we obtain that
\[
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{GOF}}_1(\Delta_{n,s};s)} P\Big(\hat s^2_{n,\nu_n(s)'} \le 2M^2\big(2\nu_n(s)'/\pi\big)^{-d/2}\Big) \to 1.
\]

Let
\[
\Delta_{n,s} \ge c\big(\sqrt{M}+M\big)\big(\log\log n/n\big)^{2s/(d+4s)}
\]
for some sufficiently large $c = c(d)$. Then
\[
E\hat\gamma^2_{\nu_n(s)'}(\mathbb{P},\mathbb{P}_0) = \gamma^2_{\nu_n(s)'}(\mathbb{P},\mathbb{P}_0) \ge \left(\frac{\pi}{\nu_n(s)'}\right)^{d/2}\cdot\frac{\|p-p_0\|^2_{L_2}}{4},
\]
as guaranteed by Lemma 8. Further considering that
\[
\mathrm{var}\big(\hat\gamma^2_{\nu_n(s)'}(\mathbb{P},\mathbb{P}_0)\big) \lesssim_d M^2 n^{-2}\big(\nu_n(s)'\big)^{-d/2} + M n^{-1}\big(\nu_n(s)'\big)^{-3d/4}\|p-p_0\|^2_{L_2},
\]

we immediately have
\[
\lim_{n\to\infty}\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{GOF}}_1(\Delta_{n,s};s)} P\big(T^{\mathrm{GOF}}_{n,\nu_n(s)'} \ge C(d)\log\log n\big)
\ge \lim_{n\to\infty}\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{GOF}}_1(\Delta_{n,s};s)} P\Bigg(\frac{n\gamma^2_{\nu_n(s)'}(\mathbb{P},\mathbb{P}_0)/2}{\sqrt{2\hat s^2_{n,\nu_n(s)'}}} \ge C(d)\log\log n\Bigg) = 1.
\]


Proof of Theorem 15 and Theorem 16. The proofs of Theorem 15 and Theorem 16 are very similar to that of Theorem 14. Hence we only emphasize the main differences here.

For the adaptive homogeneity test: to verify that there exists $C = C(d) > 0$ such that
\[
\lim_{n\to\infty} P\big(T^{\mathrm{HOM(adapt)}}_n \ge C\log\log n\big) = 0
\]
under $H^{\mathrm{HOM}}_0$, observe that
\[
T^{\mathrm{HOM(adapt)}}_n \le \sup_{1\le\nu_n\le n^{2/d}}\sqrt{\frac{E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}{\hat s^2_{n,m,\nu_n}}}\cdot\left(\frac1n+\frac1m\right)^{-1}\sup_{1\le\nu_n\le n^{2/d}}\frac{\big|\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})\big|}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}.
\]

Denote $X_1,\cdots,X_n,Y_1,\cdots,Y_m$ by $Z_1,\cdots,Z_N$. Hence
\[
2\sum_{i=1}^n\sum_{j=1}^m G_{\nu_n}(X_i,Y_j) = \sum_{1\le i\ne j\le N}G_{\nu_n}(Z_i,Z_j) - \sum_{1\le i\ne j\le n}G_{\nu_n}(X_i,X_j) - \sum_{1\le i\ne j\le m}G_{\nu_n}(Y_i,Y_j)
\]

and
\[
\sup_{1\le\nu_n\le n^{2/d}}\frac{\big|\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})\big|}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}
\le \left(\frac{1}{n(n-1)}+\frac{1}{mn}\right)\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{\sum_{1\le i\ne j\le n}\bar G_{\nu_n}(X_i,X_j)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\Bigg|
+ \left(\frac{1}{m(m-1)}+\frac{1}{mn}\right)\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{\sum_{1\le i\ne j\le m}\bar G_{\nu_n}(Y_i,Y_j)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\Bigg|
+ \frac{1}{mn}\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{\sum_{1\le i\ne j\le N}\bar G_{\nu_n}(Z_i,Z_j)}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}}\Bigg|.
\]
Applying Lemma 6 to bound each term on the right-hand side of the above inequality, we conclude that for some $C = C(d) > 0$,
\[
\lim_{n\to\infty} P\Bigg(\left(\frac1n+\frac1m\right)^{-1}\sup_{1\le\nu_n\le n^{2/d}}\frac{\big|\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{Q})\big|}{\sqrt{2E\big[\bar G_{\nu_n}(X_1,X_2)\big]^2}} \ge C\log\log n\Bigg) = 0.
\]

For the adaptive independence test: to verify that there exists $C = C(d) > 0$ such that
\[
\lim_{n\to\infty} P\big(T^{\mathrm{IND(adapt)}}_n \ge C\log\log n\big) = 0 \tag{6.40}
\]
under $H^{\mathrm{IND}}_0$, recall the decomposition
\[
\hat\gamma^2_{\nu_n}(\mathbb{P},\mathbb{P}^{X^1}\otimes\mathbb{P}^{X^2}) = D_2(\nu_n) + R_n = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^*_{\nu_n}(X_i,X_j) + R_n,
\]
where we express $R_n$ as $R_n = D_3(\nu_n) + D_4(\nu_n)$ in the proof of Theorem 13.

Following arguments similar to those in the proof of Lemma 6, we obtain that there exists $C(d) > 0$ such that for sufficiently large $n$,
\[
P\Bigg(\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{nD_2(\nu_n)}{\sqrt{2E\big[G^*_{\nu_n}(X_1,X_2)\big]^2}}\Bigg| \ge C(d)\big(\log\log n + t\log\log\log n\big)\Bigg) \lesssim \exp(-t^{2/3}).
\]
Similarly,
\[
P\Bigg(\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{n^{3/2}D_3(\nu_n)}{\sqrt{2E\big[G^*_{\nu_n}(X_1,X_2)\big]^2}}\Bigg| \ge C(d)\big(\log\log n + t\log\log\log n\big)\Bigg) \lesssim \exp(-t^{1/2}),
\]
\[
P\Bigg(\sup_{1\le\nu_n\le n^{2/d}}\Bigg|\frac{n^2 D_4(\nu_n)}{\sqrt{2E\big[G^*_{\nu_n}(X_1,X_2)\big]^2}}\Bigg| \ge C(d)\big(\log\log n + t\log\log\log n\big)\Bigg) \lesssim \exp(-t^{2/5})
\]
for sufficiently large $n$.


On the other hand, note that
\[
E\big[G^*_{\nu_n}(X_1,X_2)\big]^2 = \prod_{j=1}^2 E\big[\bar G_{\nu_n}(X^j_1,X^j_2)\big]^2,
\]
and based on results in the proof of Lemma 5, $\sup_{1\le\nu_n\le n^{2/d}}\big|s^2_{n,j,\nu_n}/E[\bar G_{\nu_n}(X^j_1,X^j_2)]^2 - 1\big| = o_p(1)$ for $j = 1,2$. Further considering that
\[
1/n^2 = o\big(E\big[G^*_{\nu_n}(X_1,X_2)\big]^2\big)
\]
uniformly over all $\nu_n\in[1,n^{2/d}]$, we obtain
\[
\sup_{1\le\nu_n\le n^{2/d}}\Big|\hat s^2_{n,\nu_n}\big/E\big[G^*_{\nu_n}(X_1,X_2)\big]^2 - 1\Big| = o_p(1).
\]
Combined, these ensure that (6.40) holds.

To show that the detection boundary of $\Phi^{\mathrm{IND(adapt)}}$ is of order $O\big((n/\log\log n)^{-2s/(d+4s)}\big)$, observe that
\[
0 \le E\big(s^2_{n,j,\nu_n(s)'}\big) \le EG^2_{\nu_n(s)'}(X^j_1,X^j_2) \le M_j^2\big(2\nu_n(s)'/\pi\big)^{-d_j/2}
\]
and
\[
\mathrm{var}\big(s^2_{n,j,\nu_n(s)'}\big) \lesssim_{d_j} M_j^3 n^{-1}\big(\nu_n(s)'\big)^{-3d_j/4} + M_j^2 n^{-2}\big(\nu_n(s)'\big)^{-d_j/2}
\]
for $j=1,2$, where $\nu_n(s)' = (\log\log n/n)^{-4/(4s+d)}$ as in the proof of Theorem 14. Therefore,
\[
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{IND}}_1(\Delta_{n,s};s)} P\Big(\big|s^2_{n,j,\nu_n(s)'}\big| \le \sqrt{3/2}\,M_j^2\big(2\nu_n(s)'/\pi\big)^{-d_j/2}\Big) \to 1, \quad j=1,2.
\]
Further considering that $1/n^2 = o\big(M^2(2\nu_n(s)'/\pi)^{-d/2}\big)$ uniformly over all $s$, we obtain that
\[
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{IND}}_1(\Delta_{n,s};s)} P\Big(\hat s^2_{n,\nu_n(s)'} \le 2M^2\big(2\nu_n(s)'/\pi\big)^{-d/2}\Big) \to 1.
\]


References

L. Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi (2010). “On combinatorial testing problems”. In: The Annals of Statistics 38.5, pp. 3063–3092.

N. Ailon, M. Charikar, and A. Newman (2008). “Aggregating inconsistent information: ranking and clustering”. In: Journal of the ACM 55.5, 23:1–23:27.

M. A. Arcones and E. Gine (1993). “Limit Theorems for U-Processes”. In: The Annals of Probability 21.3, pp. 1494–1542.

Y. Baraud (2002). “Non-asymptotic minimax rates of testing in signal detection”. In: Bernoulli 8.5, pp. 577–606.

M. V. Burnashev (1979). “On the minimax detection of an inaccurately known signal in a white Gaussian noise background”. In: Theory of Probability & Its Applications 24.1, pp. 107–119.

N. Dunford and J. T. Schwartz (1963). Linear Operators: Part II: Spectral Theory: Self Adjoint Operators in Hilbert Space. Interscience Publishers.

M. S. Ermakov (1991). “Minimax detection of a signal in a Gaussian white noise”. In: Theory of Probability & Its Applications 35.4, pp. 667–679.

M. Fromont and B. Laurent (2006). “Adaptive goodness-of-fit tests in a density model”. In: The Annals of Statistics 34.2, pp. 680–720.

M. Fromont, B. Laurent, M. Lerasle, and P. Reynaud-Bouret (2012). “Kernels based tests with non-asymptotic bootstrap approaches for two-sample problem”. In: JMLR: Workshop and Conference Proceedings. Vol. 23, pp. 23–1.

M. Fromont, B. Laurent, and P. Reynaud-Bouret (2013). “The two-sample problem for Poisson processes: Adaptive tests with a nonasymptotic wild bootstrap approach”. In: The Annals of Statistics 41.3, pp. 1431–1461.

K. Fukumizu, A. Gretton, G. R. Lanckriet, B. Schölkopf, and B. K. Sriperumbudur (2009). “Kernel choice and classifiability for RKHS embeddings of probability distributions”. In: Advances in Neural Information Processing Systems, pp. 1750–1758.

G. G. Gregory (1977). “Large sample theory for U-statistics and tests of fit”. In: The Annals of Statistics 5.1, pp. 110–123.

A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola (2012a). “A kernel two-sample test”. In: Journal of Machine Learning Research 13.Mar, pp. 723–773.

A. Gretton, O. Bousquet, A. J. Smola, and B. Schölkopf (2005). “Measuring statistical dependence with Hilbert-Schmidt norms”. In: International Conference on Algorithmic Learning Theory. Springer, pp. 63–77.

A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola (2008). “A kernel statistical test of independence”. In: Advances in Neural Information Processing Systems, pp. 585–592.

A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur (2012b). “Optimal kernel choice for large-scale two-sample tests”. In: Advances in Neural Information Processing Systems, pp. 1205–1213.

E. Haeusler (1988). “On the rate of convergence in the central limit theorem for martingales with discrete and continuous time”. In: The Annals of Probability 16.1, pp. 275–299.

P. Hall (1984). “Central limit theorem for integrated square error of multivariate nonparametric density estimators”. In: Journal of Multivariate Analysis 14.1, pp. 1–16.

Z. Harchaoui, F. Bach, and E. Moulines (2007). “Testing for homogeneity with kernel Fisher discriminant analysis”. In: Advances in Neural Information Processing Systems, pp. 609–616.

Y. I. Ingster (1987). “Minimax testing of nonparametric hypotheses on a distribution density in the L_p metrics”. In: Theory of Probability & Its Applications 31.2, pp. 333–337.

— (1993). “Asymptotically minimax hypothesis testing for nonparametric alternatives. I, II, III”. In: Mathematical Methods of Statistics 2.2, pp. 85–114.

— (2000). “Adaptive chi-square tests”. In: Journal of Mathematical Sciences 99.2, pp. 1110–1119.

Y. I. Ingster and I. A. Suslina (2000). “Minimax nonparametric hypothesis testing for ellipsoids and Besov bodies”. In: ESAIM: Probability and Statistics 4, pp. 53–135.

— (2003). Nonparametric Goodness-of-Fit Testing under Gaussian Models. New York, NY: Springer.

P. E. Jupp (2005). “Sobolev tests of goodness of fit of distributions on compact Riemannian manifolds”. In: The Annals of Statistics 33.6, pp. 2957–2966.

E. L. Lehmann and J. P. Romano (2008). Testing Statistical Hypotheses. New York, NY: Springer Science & Business Media.

O. V. Lepski and V. G. Spokoiny (1999). “Minimax nonparametric hypothesis testing: the case of an inhomogeneous alternative”. In: Bernoulli 5.2, pp. 333–358.

R. Lyons (2013). “Distance covariance in metric spaces”. In: The Annals of Probability 41.5, pp. 3284–3305.

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf (2016). “Distinguishing cause from effect using observational data: methods and benchmarks”. In: The Journal of Machine Learning Research 17.1, pp. 1103–1204.

K. Muandet, K. Fukumizu, B. K. Sriperumbudur, and B. Schölkopf (2017). “Kernel mean embedding of distributions: a review and beyond”. In: Foundations and Trends® in Machine Learning 10.1-2, pp. 1–141.

J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf (2014). “Causal discovery with continuous additive noise models”. In: The Journal of Machine Learning Research 15.1, pp. 2009–2053.

N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters (2018). “Kernel-based tests for joint independence”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80.1, pp. 5–31.

D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu (2013). “Equivalence of distance-based and RKHS-based statistics in hypothesis testing”. In: The Annals of Statistics 41.5, pp. 2263–2291.

R. J. Serfling (2009). Approximation Theorems of Mathematical Statistics. New York, NY: John Wiley & Sons.

V. G. Spokoiny (1996). “Adaptive hypothesis testing using wavelets”. In: The Annals of Statistics 24.6, pp. 2477–2498.

B. Sriperumbudur, K. Fukumizu, A. Gretton, G. Lanckriet, and B. Schoelkopf (2009). “Kernel choice and classifiability for RKHS embeddings of probability distributions”. In: Advances in Neural Information Processing Systems 22, pp. 1750–1758.

B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet (2011). “Universality, characteristic kernels and RKHS embedding of measures”. In: Journal of Machine Learning Research 12.Jul, pp. 2389–2410.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. Lanckriet (2010). “Hilbert space embeddings and metrics on probability measures”. In: Journal of Machine Learning Research 11.Apr, pp. 1517–1561.

I. Steinwart and A. Christmann (2008). Support Vector Machines. Springer Science & Business Media.

I. Steinwart (2001). “On the influence of the kernel on the consistency of support vector machines”. In: Journal of Machine Learning Research 2.Nov, pp. 67–93.

D. J. Sutherland, H.-Y. Tung, H. Strathmann, S. De, A. Ramdas, A. Smola, and A. Gretton (2017). “Generative models and model criticism via optimized maximum mean discrepancy”. In: International Conference on Learning Representations.

G. J. Székely and M. L. Rizzo (2009). “Brownian distance covariance”. In: The Annals of Applied Statistics 3.4, pp. 1236–1265.

G. J. Székely, M. L. Rizzo, and N. K. Bakirov (2007). “Measuring and testing dependence by correlation of distances”. In: The Annals of Statistics 35.6, pp. 2769–2794.

M. Talagrand (2014). Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer Science & Business Media.

A. B. Tsybakov (2008). Introduction to Nonparametric Estimation. New York, NY: Springer Science & Business Media.


Appendix A: Some Technical Results and Proofs Related to Chapter 2

Proof of Lemma 3. We have
\[
G^2(x, x') = \sum_{k,l} \mu_k \mu_l \varphi_k(x) \varphi_l(x) \varphi_k(x') \varphi_l(x').
\]
Thus
\[
\begin{aligned}
\int g(x) g(x') G^2(x, x')\, dP(x)\, dP(x')
&= \sum_{k,l} \mu_k \mu_l \left( \int g(x) \varphi_k(x) \varphi_l(x)\, dP(x) \right)^2 \\
&\le \mu_1 \sum_k \mu_k \sum_l \left( \int g(x) \varphi_k(x) \varphi_l(x)\, dP(x) \right)^2 \\
&\le \mu_1 \left( \sum_k \mu_k \int g^2(x) \varphi_k^2(x)\, dP(x) \right) \\
&\le \mu_1 \left( \sum_k \mu_k \right) \left( \sup_k \|\varphi_k\|_\infty \right)^2 \|g\|^2_{L^2(P)}.
\end{aligned}
\]

Proof of Lemma 4. For brevity, write
\[
l_K^2 = \sum_{k=1}^{K} \frac{a_k^2}{\lambda_k}.
\]
By definition, it suffices to show that for every $R > 0$ there exists $f_R \in \mathcal{H}(K)$ such that $\|f_R\|_K^2 \le R^2$ and $\|u - f_R\|^2_{L^2(P_0)} \le M^2 R^{-2/\theta}$.

To this end, let $K$ be such that $l_K^2 \le R^2 \le l_{K+1}^2$, and denote
\[
f_R = \sum_{k=1}^{K} a_k \varphi_k + a^*_{K+1}(R)\, \varphi_{K+1},
\]
where
\[
a^*_{K+1}(R) = \mathrm{sgn}(a_{K+1}) \sqrt{\lambda_{K+1} \big( R^2 - l_K^2 \big)}.
\]
Clearly,
\[
\|f_R\|_K^2 = \sum_{k=1}^{K} \frac{a_k^2}{\lambda_k} + \frac{\big( a^*_{K+1}(R) \big)^2}{\lambda_{K+1}} = R^2,
\]
and
\[
\|u - f_R\|^2_{L^2(P_0)} = \sum_{k > K+1} a_k^2 + \Big( |a_{K+1}| - \sqrt{\lambda_{K+1} \big( R^2 - l_K^2 \big)} \Big)^2 \le \sum_{k \ge K+1} a_k^2.
\]
To ensure $u \in \mathcal{F}(\theta, M)$, it suffices to have
\[
\sup_{l_K^2 \le R^2 \le l_{K+1}^2} \|u - f_R\|^2_{L^2(P_0)}\, R^{2/\theta} \le M^2, \quad \forall\, K \ge 0,
\]
which concludes the proof.


Appendix B: Some Technical Results and Proofs Related to Chapter 3

B.1 Properties of Gaussian Kernel

We collect here a couple of useful properties of the Gaussian kernel that are used repeatedly in the proofs of the main results.

Lemma 7. For any $f \in L^2(\mathbb{R}^d)$,
\[
\int G_\nu(x, y) f(x) f(y)\, dx\, dy = \Big( \frac{\pi}{\nu} \Big)^{d/2} \int \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) |\mathcal{F}f(\omega)|^2\, d\omega.
\]

Proof. Denote by $Z$ a Gaussian random vector with mean $0$ and covariance matrix $2\nu I_d$. Then
\[
\begin{aligned}
\int G_\nu(x, y) f(x) f(y)\, dx\, dy
&= \int \exp\big( -\nu \|x - y\|^2 \big) f(x) f(y)\, dx\, dy \\
&= \int \mathbb{E} \exp\big[ i Z^\top (x - y) \big] f(x) f(y)\, dx\, dy \\
&= \mathbb{E} \left| \int \exp\big( -i Z^\top x \big) f(x)\, dx \right|^2 \\
&= \int \frac{1}{(4\pi\nu)^{d/2}} \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) \left| \int \exp\big( -i \omega^\top x \big) f(x)\, dx \right|^2 d\omega \\
&= \Big( \frac{\pi}{\nu} \Big)^{d/2} \int \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) |\mathcal{F}f(\omega)|^2\, d\omega,
\end{aligned}
\]
which concludes the proof.
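As a quick numerical sanity check of this identity (an illustration, not part of the proof), take $d = 1$ and let $f$ be the standard normal density, for which both sides equal $(1 + 4\nu)^{-1/2}$ in closed form; $\mathcal{F}$ here is the unitary Fourier transform, so $|\mathcal{F}f(\omega)|^2 = e^{-\omega^2}/(2\pi)$.

```python
import numpy as np

def trap(y, dx):
    # trapezoidal rule on a uniform grid (applied along the last axis)
    return (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1])) * dx

nu = 0.7
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # standard normal density
G = np.exp(-nu * (x[:, None] - x[None, :]) ** 2)  # Gaussian kernel G_nu(x, y)
lhs = trap(trap(G * f[None, :], dx) * f, dx)      # double integral of G_nu f f
w = x.copy()
Ff2 = np.exp(-w**2) / (2 * np.pi)                 # |F f(w)|^2 for unitary Fourier transform
rhs = np.sqrt(np.pi / nu) * trap(np.exp(-w**2 / (4 * nu)) * Ff2, dx)
closed = 1.0 / np.sqrt(1 + 4 * nu)                # exact value for this f
assert abs(lhs - closed) < 1e-6
assert abs(rhs - closed) < 1e-6
```

Because the integrands are smooth Gaussians that vanish at the truncation boundary, trapezoidal quadrature here is accurate far beyond the asserted tolerance.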

A useful consequence of Lemma 7 is a close connection between the Gaussian kernel MMD and the $L^2$ norm.


Lemma 8. For any $f \in \mathcal{W}^{s,2}(M)$,
\[
\Big( \frac{\nu}{\pi} \Big)^{d/2} \int G_\nu(x, y) f(x) f(y)\, dx\, dy \ge \frac{1}{4} \|f\|^2_{L^2},
\]
provided that
\[
\nu^s \ge \frac{4^{1-s} M^2}{(\log 3)^s} \cdot \|f\|^{-2}_{L^2}.
\]

Proof. In light of Lemma 7,
\[
\Big( \frac{\nu}{\pi} \Big)^{d/2} \int G_\nu(x, y) f(x) f(y)\, dx\, dy = \int \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) |\mathcal{F}f(\omega)|^2\, d\omega.
\]
By the Plancherel theorem, for any $T > 0$,
\[
\int_{\|\omega\| \le T} |\mathcal{F}f(\omega)|^2\, d\omega = \|f\|^2_{L^2} - \int_{\|\omega\| > T} |\mathcal{F}f(\omega)|^2\, d\omega \ge \|f\|^2_{L^2} - \frac{M^2}{T^{2s}}.
\]
Choosing
\[
T = \Big( \frac{2M}{\|f\|_{L^2}} \Big)^{1/s}
\]
yields
\[
\int_{\|\omega\| \le T} |\mathcal{F}f(\omega)|^2\, d\omega \ge \frac{3}{4} \|f\|^2_{L^2}.
\]
Hence
\[
\int \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) |\mathcal{F}f(\omega)|^2\, d\omega \ge \exp\Big( -\frac{T^2}{4\nu} \Big) \int_{\|\omega\| \le T} |\mathcal{F}f(\omega)|^2\, d\omega \ge \frac{3}{4} \exp\Big( -\frac{T^2}{4\nu} \Big) \|f\|^2_{L^2}.
\]
In particular, if
\[
\nu \ge \frac{(2M)^{2/s}}{4 \log 3} \cdot \|f\|^{-2/s}_{L^2},
\]
then
\[
\int \exp\Big( -\frac{\|\omega\|^2}{4\nu} \Big) |\mathcal{F}f(\omega)|^2\, d\omega \ge \frac{1}{4} \|f\|^2_{L^2},
\]
which concludes the proof.
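The lower bound can likewise be checked numerically (an illustration only, not part of the proof): take $f$ the standard normal density in $d = 1$ and a bandwidth $\nu$ assumed large enough to satisfy the condition of Lemma 8 for this $f$, and verify that the normalized quadratic form stays above $\frac{1}{4}\|f\|^2_{L^2}$.

```python
import numpy as np

def trap(y, dx):
    # trapezoidal rule on a uniform grid (applied along the last axis)
    return (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1])) * dx

nu = 2.0                                          # bandwidth, assumed large enough for this f
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # standard normal density
G = np.exp(-nu * (x[:, None] - x[None, :]) ** 2)  # Gaussian kernel G_nu(x, y)
quad = trap(trap(G * f[None, :], dx) * f, dx)     # double integral of G_nu f f
lhs = np.sqrt(nu / np.pi) * quad                  # (nu/pi)^{1/2} * quadratic form
l2 = trap(f ** 2, dx)                             # ||f||_{L^2}^2
assert lhs >= 0.25 * l2
```

For this $f$ the quadratic form equals $(1+4\nu)^{-1/2}$ exactly, so the left-hand side is roughly $0.27$ against a right-hand side of roughly $0.07$.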

B.2 Proof of Lemma 5

We first prove that $\sup_{1 \le \nu_n \le n^{2/d}} \big| s^2_{n,\nu_n} / \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 - 1 \big| = o_p(1)$, and then show that the difference caused by the modification from $s^2_{n,\nu_n}$ to $\hat{s}^2_{n,\nu_n}$ is asymptotically negligible.

Note that
\[
\sup_{1 \le \nu_n \le n^{2/d}} \big| s^2_{n,\nu_n} / \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 - 1 \big|
\le \Big( \inf_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2}\, \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 \Big)^{-1} \cdot \sup_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2} \big| s^2_{n,\nu_n} - \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 \big|.
\]
For $X \sim \mathbb{P}_0$, denote the distribution of $(X, X)$ by $\mathbb{P}_1$. Then we have
\[
\mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 = \gamma^2_{\nu_n}(\mathbb{P}_1, \mathbb{P}_0 \otimes \mathbb{P}_0).
\]
Hence $\mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 > 0$ for any $\nu_n > 0$ since $G_{\nu_n}$ is characteristic. In addition, $\nu_n^{d/2}\, \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2$ is continuous with respect to $\nu_n$ and
\[
\lim_{\nu_n \to \infty} \nu_n^{d/2}\, \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 = \Big( \frac{\pi}{2} \Big)^{d/2} \|p_0\|^2_{L^2}.
\]
Therefore,
\[
\inf_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2}\, \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 \ge \inf_{\nu_n \in [0, \infty)} \nu_n^{d/2}\, \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 > 0,
\]


and it remains to prove
\[
\sup_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2} \big| s^2_{n,\nu_n} - \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 \big| = o_p(1).
\]
Recall the expression of $s^2_{n,\nu_n}$. It suffices to show that
\[
\sup_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2} \Bigg| \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} G^2_{\nu_n}(X_i, X_j) - \mathbb{E} G^2_{\nu_n}(X_1, X_2) \Bigg| \tag{B.1}
\]
\[
\sup_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2} \Bigg| \frac{2(n-3)!}{n!} \sum_{\substack{1 \le i, j_1, j_2 \le n \\ |\{i, j_1, j_2\}| = 3}} G_{\nu_n}(X_i, X_{j_1}) G_{\nu_n}(X_i, X_{j_2}) - \mathbb{E} G_{\nu_n}(X_1, X_2) G_{\nu_n}(X_1, X_3) \Bigg| \tag{B.2}
\]
\[
\sup_{1 \le \nu_n \le n^{2/d}} \nu_n^{d/2} \Bigg| \frac{(n-4)!}{n!} \sum_{\substack{1 \le i_1, i_2, j_1, j_2 \le n \\ |\{i_1, i_2, j_1, j_2\}| = 4}} G_{\nu_n}(X_{i_1}, X_{j_1}) G_{\nu_n}(X_{i_2}, X_{j_2}) - [\mathbb{E} G_{\nu_n}(X_1, X_2)]^2 \Bigg| \tag{B.3}
\]
are all $o_p(1)$. We shall first control (B.1) and then bound (B.2) and (B.3) in the same way.

Let
\[
\mathbb{E}_n G^2_{\nu_n}(X, X') = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} G^2_{\nu_n}(X_i, X_j).
\]
In the rest of this proof, abbreviate $\mathbb{E}_n G^2_{\nu_n}(X, X')$ and $\mathbb{E} G^2_{\nu_n}(X_1, X_2)$ as $\mathbb{E}_n G^2_{\nu_n}$ and $\mathbb{E} G^2_{\nu_n}$, respectively, when no confusion occurs.

Divide the whole interval $[1, n^{2/d}]$ into $A$ sub-intervals $[u_0, u_1], [u_1, u_2], \cdots, [u_{A-1}, u_A]$ with $u_0 = 1$, $u_A = n^{2/d}$. For any $\nu_n \in [u_{a-1}, u_a]$,
\[
\begin{aligned}
\nu_n^{d/2}\, \mathbb{E}_n G^2_{\nu_n} - \nu_n^{d/2}\, \mathbb{E} G^2_{\nu_n}
&\ge - \nu_n^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big| - \nu_n^{d/2} \big| \mathbb{E} G^2_{u_a} - \mathbb{E} G^2_{u_{a-1}} \big| \\
&\ge - u_a^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big| - u_a^{d/2} \big| \mathbb{E} G^2_{u_a} - \mathbb{E} G^2_{u_{a-1}} \big|
\end{aligned}
\]
and
\[
\nu_n^{d/2}\, \mathbb{E}_n G^2_{\nu_n} - \nu_n^{d/2}\, \mathbb{E} G^2_{\nu_n} \le u_a^{d/2} \big| \mathbb{E}_n G^2_{u_{a-1}} - \mathbb{E} G^2_{u_{a-1}} \big| + u_a^{d/2} \big| \mathbb{E} G^2_{u_a} - \mathbb{E} G^2_{u_{a-1}} \big|,
\]


which together ensure that
\[
\begin{aligned}
&\sup_{1 \le \nu_n \le n^{2/d}} \big| \nu_n^{d/2}\, \mathbb{E}_n G^2_{\nu_n} - \nu_n^{d/2}\, \mathbb{E} G^2_{\nu_n} \big| \\
&\quad \le \sup_{1 \le a \le A} \Big( \frac{u_a}{u_{a-1}} \Big)^{d/2} \cdot \sup_{0 \le a \le A} u_a^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big| + \sup_{1 \le a \le A} u_a^{d/2} \big| \mathbb{E} G^2_{u_a} - \mathbb{E} G^2_{u_{a-1}} \big| \\
&\quad \le \sup_{1 \le a \le A} \Big( \frac{u_a}{u_{a-1}} \Big)^{d/2} \cdot \sup_{0 \le a \le A} u_a^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big| + \sup_{1 \le a \le A} \big| u_a^{d/2}\, \mathbb{E} G^2_{u_a} - u_{a-1}^{d/2}\, \mathbb{E} G^2_{u_{a-1}} \big| \\
&\qquad + \sup_{1 \le a \le A} \Big( \big( u_a^{d/2} - u_{a-1}^{d/2} \big)\, \mathbb{E} G^2_{u_{a-1}} \Big).
\end{aligned}
\]
We bound the three terms on the right-hand side of the last inequality separately.

Let $\{u_a\}_{a \ge 0}$ be a geometric sequence; namely,
\[
A := \inf\{ a \in \mathbb{N} : r^a \ge n^{2/d} \},
\]
and
\[
u_a = \begin{cases} r^a, & 0 \le a \le A - 1, \\ n^{2/d}, & a = A, \end{cases}
\]
with $r > 1$ to be determined later.

Since $\lim_{\nu \to \infty} \nu^{d/2}\, \mathbb{E} G^2_{\nu} = (\pi/2)^{d/2} \|p_0\|^2$ and $\nu^{d/2}\, \mathbb{E} G^2_{\nu}$ is continuous, we obtain that for any $\varepsilon > 0$, there exists a sufficiently small $r > 1$ such that
\[
\sup_{1 \le a \le A} \big| u_a^{d/2}\, \mathbb{E} G^2_{u_a} - u_{a-1}^{d/2}\, \mathbb{E} G^2_{u_{a-1}} \big| \le \varepsilon.
\]
At the same time, we can also ensure
\[
\sup_{1 \le a \le A} \Big( \big( u_a^{d/2} - u_{a-1}^{d/2} \big)\, \mathbb{E} G^2_{u_{a-1}} \Big) \le \big( r^{d/2} - 1 \big) \Big( \frac{\pi}{2} \Big)^{d/2} \|p_0\|^2 \le \varepsilon
\]
by choosing $r$ sufficiently small.


Finally, consider
\[
\sup_{1 \le a \le A} \Big( \frac{u_a}{u_{a-1}} \Big)^{d/2} \cdot \sup_{0 \le a \le A} u_a^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big|.
\]
On the one hand,
\[
\sup_{1 \le a \le A} \Big( \frac{u_a}{u_{a-1}} \Big)^{d/2} \le r^{d/2}.
\]
On the other hand, since
\[
\mathrm{var}\big( \mathbb{E}_n G^2_{\nu_n} \big) \lesssim \frac{1}{n} \mathbb{E}\big[ G^2_{\nu_n}(X, X') G^2_{\nu_n}(X, X'') \big] + \frac{1}{n^2} \mathbb{E} G^4_{\nu_n}(X, X') \lesssim_d \frac{\nu_n^{-3d/4} \|p_0\|^3}{n} + \frac{\nu_n^{-d/2} \|p_0\|^2}{n^2}
\]
for any $\nu_n \in (0, \infty)$, we have
\[
P\Big( \sup_{0 \le a \le A} u_a^{d/2} \big| \mathbb{E}_n G^2_{u_a} - \mathbb{E} G^2_{u_a} \big| \ge \varepsilon \Big) \le \sum_{a=0}^{A} \frac{u_a^d\, \mathrm{var}\big( \mathbb{E}_n G^2_{u_a} \big)}{\varepsilon^2} \lesssim_{d,r} \frac{1}{\varepsilon^2} \Big( u_A^{d/4} \frac{\|p_0\|^3}{n} + u_A^{d/2} \frac{\|p_0\|^2}{n^2} \Big) \to 0
\]
as $n \to \infty$. Hence we conclude $\sup_{1 \le \nu_n \le n^{2/d}} \big| \nu_n^{d/2}\, \mathbb{E}_n G^2_{\nu_n} - \nu_n^{d/2}\, \mathbb{E} G^2_{\nu_n} \big| = o_p(1)$. Considering that

๏ฟฝ๏ฟฝ๏ฟฝ = ๐‘œ๐‘ (1).Considering that

lima๐‘›โ†’โˆž

a๐‘‘/2๐‘› E๐บa๐‘› (๐‘‹1, ๐‘‹2)๐บa๐‘› (๐‘‹1, ๐‘‹3) = 0, lim

a๐‘›โ†’โˆža๐‘‘/2๐‘› [E๐บa๐‘› (๐‘‹1, ๐‘‹2)]2 = 0,

we obtain that (B.2) and (B.3) are also ๐‘œ๐‘ (1), based on almost the same arguments. Hence

sup1โ‰คa๐‘›โ‰ค๐‘›2/๐‘‘

๏ฟฝ๏ฟฝ๐‘ 2๐‘›,a๐‘›/E[๏ฟฝ๏ฟฝa๐‘› (๐‘‹1, ๐‘‹2)]2 โˆ’ 1๏ฟฝ๏ฟฝ = ๐‘œ๐‘ (1).

On the other hand, since E[๏ฟฝ๏ฟฝa๐‘› (๐‘‹1, ๐‘‹2)]2 &๐‘0,๐‘‘ aโˆ’๐‘‘/2๐‘› for a๐‘› โˆˆ [1, ๐‘›2/๐‘‘],

sup1โ‰คa๐‘›โ‰ค๐‘›2/๐‘‘

1๐‘›2E[๏ฟฝ๏ฟฝa๐‘› (๐‘‹1, ๐‘‹2)]2

= ๐‘œ๐‘ (1).


Hence we finally conclude that
\[
\sup_{1 \le \nu_n \le n^{2/d}} \big| \hat{s}^2_{n,\nu_n} / \mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2 - 1 \big| = o_p(1).
\]

B.3 Proof of Lemma 6

Let
\[
K_{\nu_n}(x, x') = \frac{G_{\nu_n}(x, x')}{\sqrt{2\, \mathbb{E} G^2_{\nu_n}(X_1, X_2)}}, \quad \forall\, x, x' \in \mathbb{R}^d,
\]
and accordingly,
\[
\bar{K}_{\nu_n}(x, x') = \frac{\bar{G}_{\nu_n}(x, x')}{\sqrt{2\, \mathbb{E} G^2_{\nu_n}(X_1, X_2)}}.
\]
Hence
\[
T_n^{\mathrm{GOF(adapt)}} = \sup_{1 \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \cdot \sqrt{\frac{\mathbb{E} G^2_{\nu_n}(X_1, X_2)}{\mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2}} \Bigg|.
\]
To finish this proof, we first bound
\[
\sup_{1 \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| \tag{B.4}
\]
and then control $T_n^{\mathrm{GOF(adapt)}}$.

Step (i). There are two main tools that we borrow in this step. First, we apply results in Arcones and Gine (1993) to obtain a Bernstein-type inequality for
\[
\Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_0}(X_i, X_j) \Bigg| \quad \text{and} \quad \Bigg| \frac{1}{n-1} \sum_{i \ne j} \big( \bar{K}_{\nu_n}(X_i, X_j) - \bar{K}_{\nu'_n}(X_i, X_j) \big) \Bigg|
\]
for some $\nu_0$ and arbitrary $\nu_n, \nu'_n \in [1, \infty)$. Based on that, we then borrow Talagrand's techniques for handling Bernstein-type inequalities (e.g., see Talagrand, 2014) to give a generic chaining bound for (B.4).


To be more specific, for any $\nu_0, \nu_n, \nu'_n \in [1, n^{2/d}]$, define
\[
d_1(\nu_n, \nu'_n) = \| \bar{K}_{\nu'_n} - \bar{K}_{\nu_n} \|_{L^\infty}, \qquad d_2(\nu_n, \nu'_n) = \| \bar{K}_{\nu'_n} - \bar{K}_{\nu_n} \|_{L^2}.
\]
Then Proposition 2.3 (c) of Arcones and Gine (1993) ensures that for any $t > 0$,
\[
P\Bigg( \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_0}(X_i, X_j) \Bigg| \ge t \Bigg) \le C \exp\Bigg( -C \min\Bigg\{ \frac{t}{\|\bar{K}_{\nu_0}\|_{L^2}}, \Big( \frac{\sqrt{n}\, t}{\|\bar{K}_{\nu_0}\|_{L^\infty}} \Big)^{2/3} \Bigg\} \Bigg) \tag{B.5}
\]
and
\[
P\Bigg( \Bigg| \frac{1}{n-1} \sum_{i \ne j} \big( \bar{K}_{\nu_n}(X_i, X_j) - \bar{K}_{\nu'_n}(X_i, X_j) \big) \Bigg| \ge t \Bigg) \le C \exp\Bigg( -C \min\Bigg\{ \frac{t}{d_2(\nu_n, \nu'_n)}, \Big( \frac{\sqrt{n}\, t}{d_1(\nu_n, \nu'_n)} \Big)^{2/3} \Bigg\} \Bigg)
\]
for some $C > 0$. Based on a chaining-type argument (see, e.g., Theorem 2.2.28 in Talagrand, 2014), the latter inequality suggests that there exists $C > 0$ such that
\[
P\Bigg( \sup_{1 \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \big( \bar{K}_{\nu_n}(X_i, X_j) - \bar{K}_{\nu_0}(X_i, X_j) \big) \Bigg| \ge C \Big( \frac{\gamma_{2/3}([1, n^{2/d}], d_1)}{\sqrt{n}}\, t + \gamma_1([1, n^{2/d}], d_2) + D_2 t \Big) \Bigg) \lesssim \exp(-t^{2/3}), \tag{B.6}
\]
where $\gamma_{2/3}([1, n^{2/d}], d_1)$ and $\gamma_1([1, n^{2/d}], d_2)$ are the so-called $\gamma$-functionals and
\[
D_2 = \sum_{l \ge 0} e_l([1, n^{2/d}], d_2)
\]
with $e_l$ being the so-called entropy numbers.


A straightforward combination of (B.5) and (B.6) then gives
\[
P\Bigg( \sup_{1 \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| \ge C \Big( \frac{\gamma_{2/3}([1, n^{2/d}], d_1)}{\sqrt{n}}\, t + \gamma_1([1, n^{2/d}], d_2) + D_2 t + \frac{\|\bar{K}_{\nu_0}\|_{L^\infty}}{\sqrt{n}} + \|\bar{K}_{\nu_0}\|_{L^2}\, t \Big) \Bigg) \lesssim \exp(-t^{2/3}).
\]
Therefore, given that the bounds on $\|\bar{K}_{\nu_0}\|_{L^2}$ and $\|\bar{K}_{\nu_0}\|_{L^\infty}$ can be obtained quite directly, e.g., with $\nu_0 = 1$,
\[
\|\bar{K}_{\nu_0}\|_{L^\infty} \le 4 \|K_{\nu_0}\|_{L^\infty} = \frac{4}{\sqrt{2\, \mathbb{E} G^2_1}}, \qquad \|\bar{K}_{\nu_0}\|_{L^2} \le \|K_{\nu_0}\|_{L^2} = \frac{\sqrt{2}}{2},
\]
the main focus is to bound $\gamma_{2/3}([1, n^{2/d}], d_1)$, $\gamma_1([1, n^{2/d}], d_2)$ and $D_2$ properly.

First consider $\gamma_{2/3}([1, n^{2/d}], d_1)$. Note that for any $1 \le \nu_n < \nu'_n < \infty$,
\[
d_1(\nu_n, \nu'_n) \le 4 \| K_{\nu_n} - K_{\nu'_n} \|_{L^\infty} \le 4 \int_{\nu_n}^{\nu'_n} \Big\| \frac{dK_u}{du} \Big\|_{L^\infty} du.
\]
Since for any $\nu_n$,
\[
\frac{dK_{\nu_n}}{d\nu_n} = \big( -\|x - x'\|^2 \big) G_{\nu_n}(x, x') \big( \mathbb{E} G^2_{\nu_n}(X_1, X_2) \big)^{-1/2} - \frac{1}{2} G_{\nu_n}(x, x') \big( \mathbb{E} G^2_{\nu_n}(X_1, X_2) \big)^{-3/2} \frac{d}{d\nu_n} \mathbb{E} G^2_{\nu_n}(X_1, X_2),
\]
where
\[
\big( \mathbb{E} G^2_{\nu_n}(X_1, X_2) \big)^{-1/2} = \Big( \frac{\pi}{2} \Big)^{-d/4} \nu_n^{d/4} \Bigg( \int \exp\Big( -\frac{\|\omega\|^2}{8\nu_n} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega \Bigg)^{-1/2} \lesssim_d \nu_n^{d/4} \Bigg( \int \exp\Big( -\frac{\|\omega\|^2}{8} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega \Bigg)^{-1/2},
\]
\[
\big( \mathbb{E} G^2_{\nu_n}(X_1, X_2) \big)^{-3/2} \lesssim_d \nu_n^{3d/4} \Bigg( \int \exp\Big( -\frac{\|\omega\|^2}{8} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega \Bigg)^{-3/2},
\]
and
\[
\frac{d}{d\nu_n} \mathbb{E} G^2_{\nu_n}(X_1, X_2) = \Big( \frac{\pi}{2} \Big)^{d/2} \nu_n^{-d/2-1} \Bigg( -\frac{d}{2} \cdot \int \exp\Big( -\frac{\|\omega\|^2}{8\nu_n} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega + \int \exp\Big( -\frac{\|\omega\|^2}{8\nu_n} \Big) \Big( \frac{\|\omega\|^2}{8\nu_n} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega \Bigg),
\]
which together ensure
\[
\Big\| \frac{dK_{\nu_n}}{d\nu_n} \Big\|_{L^\infty} \lesssim_{d, p_0} \nu_n^{d/4 - 1}.
\]
Hence
\[
d_1(\nu_n, \nu'_n) \lesssim_{d, p_0} \big| \nu_n^{d/4} - (\nu'_n)^{d/4} \big|,
\]
and $\gamma_{2/3}([1, n^{2/d}], d_1) \lesssim_{d, p_0} \big| (n^{2/d})^{d/4} - 1 \big| \le \sqrt{n}$.

Then consider $\gamma_1([1, n^{2/d}], d_2)$. We have
\[
d_2^2(\nu_n, \nu'_n) \le \| K_{\nu'_n} - K_{\nu_n} \|^2_{L^2} = 1 - \frac{\mathbb{E} G_{\nu_n} G_{\nu'_n}}{\sqrt{\mathbb{E} G^2_{\nu_n}\, \mathbb{E} G^2_{\nu'_n}}} \le - \log \Bigg( \frac{\mathbb{E} G_{\nu_n} G_{\nu'_n}}{\sqrt{\mathbb{E} G^2_{\nu_n}\, \mathbb{E} G^2_{\nu'_n}}} \Bigg).
\]
Let $f_1(\nu_n) = \int \exp\big( -\frac{\|\omega\|^2}{8\nu_n} \big) |\mathcal{F}p_0(\omega)|^2\, d\omega$. Then
\[
\log\big( \mathbb{E} G^2_{\nu_n} \big) = \frac{d}{2} \log\Big( \frac{\pi}{2\nu_n} \Big) + \log f_1(\nu_n),
\]
and hence
\[
- \log \Bigg( \frac{\mathbb{E} G_{\nu_n} G_{\nu'_n}}{\sqrt{\mathbb{E} G^2_{\nu_n}\, \mathbb{E} G^2_{\nu'_n}}} \Bigg) = \frac{d}{2} \Bigg( - \frac{\log \nu_n + \log \nu'_n}{2} + \log\Big( \frac{\nu_n + \nu'_n}{2} \Big) \Bigg) + \Bigg( \frac{\log f_1(\nu_n) + \log f_1(\nu'_n)}{2} - \log f_1\Big( \frac{\nu_n + \nu'_n}{2} \Big) \Bigg).
\]


Note that
\[
\frac{\log f_1(\nu_n) + \log f_1(\nu'_n)}{2} - \log f_1\Big( \frac{\nu_n + \nu'_n}{2} \Big) = \frac{1}{2} \int_0^{\frac{\nu'_n - \nu_n}{2}} \int_{-u}^{u} \Big( \log f_1\Big( \frac{\nu'_n + \nu_n}{2} + v \Big) \Big)''\, dv\, du.
\]
For any $\nu_n \ge 1$,
\[
\big( \log f_1(\nu_n) \big)'' = \frac{f_1(\nu_n) f_1''(\nu_n) - \big( f_1'(\nu_n) \big)^2}{f_1^2(\nu_n)} \le \frac{f_1''(\nu_n)}{f_1(\nu_n)},
\]
and
\[
f_1''(\nu_n) = \int \exp\Big( -\frac{\|\omega\|^2}{8\nu_n} \Big) \Big( \frac{\|\omega\|^4}{64\nu_n^4} - \frac{\|\omega\|^2}{4\nu_n^3} \Big) |\mathcal{F}p_0(\omega)|^2\, d\omega \lesssim \nu_n^{-2} \|p_0\|^2_{L^2}.
\]
Moreover, there exists $\nu_n^* = \nu_n^*(p_0) > 1$ such that $f_1(\nu_n^*) \ge \|p_0\|^2_{L^2}/2$, from which we obtain
\[
\big( \log f_1(\nu_n) \big)'' \lesssim \begin{cases} \nu_n^{-2} \|p_0\|^2_{L^2} / f_1(1), & 1 \le \nu_n \le \nu_n^*, \\ \nu_n^{-2}, & \nu_n^* < \nu_n \le n^{2/d}, \end{cases}
\]
which suggests that for any $\nu_n, \nu'_n \in [1, \nu_n^*]$,
\[
d_2^2(\nu_n, \nu'_n) \lesssim \Big( \frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)} \Big) \Bigg( - \frac{\log \nu_n + \log \nu'_n}{2} + \log\Big( \frac{\nu_n + \nu'_n}{2} \Big) \Bigg) \lesssim \Big( \frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)} \Big) \big| \log \nu_n - \log \nu'_n \big|,
\]
and for any $\nu_n, \nu'_n \in [\nu_n^*, n^{2/d}]$,
\[
d_2^2(\nu_n, \nu'_n) \lesssim \Big( \frac{d}{2} + 1 \Big) \big| \log \nu_n - \log \nu'_n \big|.
\]
Note that in addition to the bound on $d_2$ obtained above, we also have
\[
d_2(\nu_n, \nu'_n) \le \| \bar{K}_{\nu_n} \|_{L^2} + \| \bar{K}_{\nu'_n} \|_{L^2} \le \| K_{\nu_n} \|_{L^2} + \| K_{\nu'_n} \|_{L^2} \le \sqrt{2}.
\]


Therefore,
\[
\begin{aligned}
\gamma_1([1, n^{2/d}], d_2) &\le \sum_{l \ge 0} 2^l e_l([1, n^{2/d}], d_2) \\
&\lesssim e_0([1, n^{2/d}], d_2) + \sum_{l \ge 0} 2^l e_l([1, \nu_n^*], d_2) + \sum_{l \ge 0} 2^l e_l([\nu_n^*, n^{2/d}], d_2) \\
&\lesssim 1 + \sqrt{\frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)}} \sum_{l \ge 0} 2^l \sqrt{\frac{\log \nu_n^* - \log 1}{2^{2^l}}} + \sqrt{\frac{d}{2} + 1}\, \Bigg( \sum_{l \ge 0} 2^l \min\Bigg\{ 1, \sqrt{\frac{\log n^{2/d} - \log \nu_n^*}{2^{2^l}}} \Bigg\} \Bigg) \\
&\lesssim 1 + \sqrt{\frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)}} \sqrt{\log \nu_n^*} + \sqrt{\frac{d}{2} + 1}\, \Bigg( \sum_{l \ge 0} 2^l \min\Bigg\{ 1, \sqrt{\frac{\log n^{2/d}}{2^{2^l}}} \Bigg\} \Bigg) \\
&\lesssim 1 + \sqrt{\frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)}} \sqrt{\log \nu_n^*} + \sqrt{\frac{d}{2} + 1}\, \Bigg( \sum_{0 \le l < l^*} 2^l + \sum_{l \ge l^*} 2^l \sqrt{\frac{\log n^{2/d}}{2^{2^l}}} \Bigg) \\
&\lesssim 1 + \sqrt{\frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)}} \sqrt{\log \nu_n^*} + \sqrt{\frac{d}{2} + 1} \cdot 2^{l^*},
\end{aligned}
\]
where $l^*$ is the smallest $l$ such that $\sqrt{\log n^{2/d} / 2^{2^l}} \le 1$. Hence $2^{l^*} \asymp \log\log n$ and there exists $C = C(d) > 0$ such that
\[
\gamma_1([1, n^{2/d}], d_2) \le C(d) \log\log n
\]
for sufficiently large $n$.

By a similar approach, we get that
\[
D_2 \lesssim 1 + \sqrt{\frac{d}{2} + \frac{\|p_0\|^2_{L^2}}{f_1(1)}} \sqrt{\log \nu_n^*} + \sqrt{\frac{d}{2} + 1} \cdot l^*,
\]
which is upper bounded by $C(d) \log\log\log n$ for sufficiently large $n$.


Therefore, we finally obtain that there exists $C(d) > 0$ such that for sufficiently large $n$,
\[
P\Bigg( \sup_{1 \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| \ge C(d) \big( \log\log n + t \log\log\log n \big) \Bigg) \lesssim \exp(-t^{2/3}). \tag{B.7}
\]

Step (ii). By slight abuse of notation, there exists $\nu_n^* = \nu_n^*(p_0) > 1$ such that
\[
\frac{\mathbb{E} G^2_{\nu_n}(X_1, X_2)}{\mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2} \le 2
\]
for $\nu_n \ge \nu_n^*$. Therefore,
\[
\begin{aligned}
T_n^{\mathrm{GOF(adapt)}} &\le \sup_{1 \le \nu_n \le \nu_n^*} \sqrt{\frac{\mathbb{E} G^2_{\nu_n}(X_1, X_2)}{\mathbb{E}[\bar{G}_{\nu_n}(X_1, X_2)]^2}} \cdot \sup_{1 \le \nu_n \le \nu_n^*} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| + \sqrt{2} \sup_{\nu_n^* \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| \\
&\le C(p_0) \sup_{1 \le \nu_n \le \nu_n^*} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| + \sqrt{2} \sup_{\nu_n^* \le \nu_n \le n^{2/d}} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg|
\end{aligned}
\]
for some $C(p_0) > 0$.

Based on arguments similar to those in the first step,
\[
P\Bigg( \sup_{1 \le \nu_n \le \nu_n^*} \Bigg| \frac{1}{n-1} \sum_{i \ne j} \bar{K}_{\nu_n}(X_i, X_j) \Bigg| \ge C(d, p_0)\, t \Bigg) \lesssim \exp(-t^{2/3})
\]
for some $C(d, p_0) > 0$, and (B.7) still holds when $\nu_n$ is restricted to $[\nu_n^*, n^{2/d}]$. Together, these prove Lemma 6.

B.4 Decomposition of dHSIC and Its Variance Estimation

In this section, we first derive an approximation of $\hat{\gamma}^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k})$ under $H_0$ for general $k$; the approximation of $\mathrm{var}\big( \hat{\gamma}^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k}) \big)$ can then be obtained subsequently.


Note that
\[
\begin{aligned}
G_\nu(x, y) &= \int G_\nu(u, v)\, d(\delta_x - \mathbb{P} + \mathbb{P})(u)\, d(\delta_y - \mathbb{P} + \mathbb{P})(v) \\
&= \bar{G}_\nu(x, y) + \big( \mathbb{E} G_\nu(x, X) - \mathbb{E} G_\nu(X, X') \big) + \big( \mathbb{E} G_\nu(y, X) - \mathbb{E} G_\nu(X, X') \big) + \mathbb{E} G_\nu(X, X').
\end{aligned}
\]
Similarly, write
\[
G_\nu\big( x, (y^1, \cdots, y^k) \big) = \int G_\nu\big( u, (v^1, \cdots, v^k) \big)\, d(\delta_x - \mathbb{P} + \mathbb{P})(u)\, d(\delta_{y^1} - \mathbb{P}^{X^1} + \mathbb{P}^{X^1})(v^1) \cdots d(\delta_{y^k} - \mathbb{P}^{X^k} + \mathbb{P}^{X^k})(v^k)
\]
and expand it as the summation of all $l$-variate centered components with $l \le k + 1$. Do the same expansion for $G_\nu\big( (x^1, \cdots, x^k), (y^1, \cdots, y^k) \big)$ and write it as the summation of all $l$-variate centered components with $l \le 2k$. Plug these expansions into $\hat{\gamma}^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k})$ and denote the summation of all $l$-variate centered components in the resulting expression by $D_l(\nu)$ for $l \le 2k$. Let the remainder be $R_n = \sum_{l=3}^{2k} D_l(\nu)$, so that
\[
\hat{\gamma}^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k}) = \gamma^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k}) + D_1(\nu) + D_2(\nu) + R_n.
\]

Straightforward calculation yields the following facts:

• $\mathbb{E}(R_n)^2 \lesssim_k n^{-3} \Big( \mathbb{E} G^2_\nu(X_1, X_2) + \prod_{l=1}^{k} \mathbb{E} G^2_\nu(X_1^l, X_2^l) \Big)$;

• under the null hypothesis, $D_1(\nu) = 0$ and
\[
D_2(\nu) = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} G^*_\nu(X_i, X_j)
\]


where
\[
G^*_\nu(x, y) = \bar{G}_\nu(x, y) - \sum_{1 \le j \le k} g_j(x^j, y) - \sum_{1 \le j \le k} g_j(y^j, x) + \sum_{1 \le j_1, j_2 \le k} g_{j_1, j_2}(x^{j_1}, y^{j_2}).
\]

Proof of Lemma 1. Observe that under $H_0$,
\[
\mathrm{var}\big( \hat{\gamma}^2_\nu(\mathbb{P}, \mathbb{P}^{X^1} \otimes \cdots \otimes \mathbb{P}^{X^k}) \big) = \mathbb{E}(D_2(\nu))^2 + \mathbb{E}(R_n)^2 = \frac{2}{n(n-1)} \mathbb{E}[G^*_\nu(X_1, X_2)]^2 + \mathbb{E}(R_n)^2,
\]
\[
\mathbb{E}(R_n)^2 \lesssim_k n^{-3}\, \mathbb{E} G^2_\nu(X_1, X_2),
\]
and
\[
\begin{aligned}
\mathbb{E}[G^*_\nu(X_1, X_2)]^2
&= \mathbb{E}\Bigg( \bar{G}_\nu(X_1, X_2) - \sum_{1 \le j \le k} g_j(X_1^j, X_2) \Bigg)^2 - \mathbb{E}\Bigg( \sum_{1 \le j \le k} g_j(X_2^j, X_1) + \sum_{1 \le j_1, j_2 \le k} g_{j_1, j_2}(X_1^{j_1}, X_2^{j_2}) \Bigg)^2 \\
&= \mathbb{E}\bar{G}^2_\nu(X_1, X_2) - 2 \sum_{1 \le j \le k} \mathbb{E}\big( g_j(X_1^j, X_2) \big)^2 + \sum_{1 \le j_1, j_2 \le k} \mathbb{E}\big( g_{j_1, j_2}(X_1^{j_1}, X_2^{j_2}) \big)^2.
\end{aligned}
\]
Together, these conclude the proof.

Below we shall further expand $\mathbb{E}\bar{G}^2_a(X_1,X_2)$, $\mathbb{E}\big(g_j(X^j_1,X_2)\big)^2$ and $\mathbb{E}\big(g_{j_1,j_2}(X^{j_1}_1,X^{j_2}_2)\big)^2$ in Lemma 1, based on which a consistent estimator of $\mathrm{var}\big(\hat{\gamma}^2_a(\mathbb{P},\mathbb{P}_{X^1}\otimes\cdots\otimes\mathbb{P}_{X^k})\big)$ can be derived naturally. First,
$$
\begin{aligned}
\mathbb{E}\bar{G}^2_a(X_1,X_2)
&= \mathbb{E}G^2_a(X_1,X_2) - 2\,\mathbb{E}G_a(X_1,X_2)G_a(X_1,X_3) + \big(\mathbb{E}G_a(X_1,X_2)\big)^2\\
&= \prod_{1\le l\le k}\mathbb{E}G^2_a(X^l_1,X^l_2) - 2\prod_{1\le l\le k}\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3) + \prod_{1\le l\le k}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2.
\end{aligned}
$$
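The product form of these moments rests on the kernel factorizing across the $k$ components, so that expectations split into products over $l$ under the null. Below is a minimal numerical sketch of that factorization, assuming the Gaussian kernel $G_a(x,y)=\exp(-a\|x-y\|^2)$ (a form consistent with the $(\pi/(2a_n))^{d/2}$ scaling used later in this appendix); the helper name `gauss_kernel` is ours, not the thesis's.

```python
import math
import random

def gauss_kernel(x, y, a):
    """Gaussian kernel G_a(x, y) = exp(-a * ||x - y||^2) on vectors."""
    return math.exp(-a * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

random.seed(0)
a = 0.7
# one d = 5 observation split into k = 2 components of dimensions d_1 = 2, d_2 = 3
x = [random.gauss(0, 1) for _ in range(5)]
y = [random.gauss(0, 1) for _ in range(5)]

full = gauss_kernel(x, y, a)  # G_a(x, y) on the full vectors
# the squared distance is additive across coordinates, so the kernel factorizes
per_component = gauss_kernel(x[:2], y[:2], a) * gauss_kernel(x[2:], y[2:], a)
```

Taking expectations of such products under independence of the components is what turns $\mathbb{E}\bar{G}^2_a(X_1,X_2)$ into the products over $l$ displayed above.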


Second,
$$
\begin{aligned}
\mathbb{E}\big(g_j(X^j_1,X_2)\big)^2
&= \mathbb{E}G^2_a(X^j_1,X^j_2)\cdot\prod_{l\ne j}\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3) - \prod_{1\le l\le k}\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3)\\
&\quad - \mathbb{E}G_a(X^j_1,X^j_2)G_a(X^j_1,X^j_3)\cdot\prod_{l\ne j}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2 + \prod_{1\le l\le k}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2.
\end{aligned}
$$

Hence
$$
\begin{aligned}
\sum_{1\le j\le k}\mathbb{E}\big(g_j(X^j_1,X_2)\big)^2
&= \left(\prod_{1\le l\le k}\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3)\right)\left(\sum_{1\le j\le k}\frac{\mathbb{E}G^2_a(X^j_1,X^j_2)}{\mathbb{E}G_a(X^j_1,X^j_2)G_a(X^j_1,X^j_3)} - k\right)\\
&\quad - \left(\prod_{1\le l\le k}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2\right)\left(\sum_{1\le j\le k}\frac{\mathbb{E}G_a(X^j_1,X^j_2)G_a(X^j_1,X^j_3)}{\big(\mathbb{E}G_a(X^j_1,X^j_2)\big)^2} - k\right).
\end{aligned}
$$

Finally,
$$
\mathbb{E}\big(g_{j_1,j_2}(X^{j_1}_1,X^{j_2}_2)\big)^2 =
\begin{cases}
\mathbb{E}\big(\bar{G}_a(X^{j_1}_1,X^{j_1}_2)\big)^2\cdot\prod_{l\ne j_1}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2, & j_1=j_2,\\[6pt]
\displaystyle\prod_{l\in\{j_1,j_2\}}\left(\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3)-\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2\right)\prod_{l\ne j_1,j_2}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2, & j_1\ne j_2.
\end{cases}
$$


Hence
$$
\begin{aligned}
\sum_{1\le j_1,j_2\le k}\mathbb{E}\big(g_{j_1,j_2}(X^{j_1}_1,X^{j_2}_2)\big)^2
= \left(\prod_{1\le l\le k}\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2\right)\Bigg(&\sum_{1\le j_1\le k}\frac{\mathbb{E}\big(\bar{G}_a(X^{j_1}_1,X^{j_1}_2)\big)^2}{\big(\mathbb{E}G_a(X^{j_1}_1,X^{j_1}_2)\big)^2}\\
&+ \sum_{1\le j_1\ne j_2\le k}\prod_{l\in\{j_1,j_2\}}\left(\frac{\mathbb{E}G_a(X^l_1,X^l_2)G_a(X^l_1,X^l_3)}{\big(\mathbb{E}G_a(X^l_1,X^l_2)\big)^2}-1\right)\Bigg).
\end{aligned}
$$

Then the consistent estimator $s^2_{n,a_n}$ of $\mathbb{E}\big(G^*_{a_n}(X_1,X_2)\big)^2$ is constructed by replacing
$$
\mathbb{E}G^2_{a_n}(X^l_1,X^l_2),\qquad \mathbb{E}G_{a_n}(X^l_1,X^l_2)G_{a_n}(X^l_1,X^l_3),\qquad \big(\mathbb{E}G_{a_n}(X^l_1,X^l_2)\big)^2
$$
in the above expansions of
$$
\mathbb{E}\bar{G}^2_{a_n}(X_1,X_2),\qquad \sum_{1\le j\le k}\mathbb{E}\big(g_j(X^j_1,X_2)\big)^2,\qquad \sum_{1\le j_1,j_2\le k}\mathbb{E}\big(g_{j_1,j_2}(X^{j_1}_1,X^{j_2}_2)\big)^2
$$
with the corresponding unbiased estimators
$$
\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^2_{a_n}(X^l_i,X^l_j),\qquad
\frac{(n-3)!}{n!}\sum_{\substack{1\le i,j_1,j_2\le n\\ |\{i,j_1,j_2\}|=3}}G_{a_n}(X^l_i,X^l_{j_1})G_{a_n}(X^l_i,X^l_{j_2}),
$$
$$
\frac{(n-4)!}{n!}\sum_{\substack{1\le i_1,i_2,j_1,j_2\le n\\ |\{i_1,i_2,j_1,j_2\}|=4}}G_{a_n}(X^l_{i_1},X^l_{j_1})G_{a_n}(X^l_{i_2},X^l_{j_2})
$$
for $1\le l\le k$. Again, to avoid a negative estimate of the variance, we can replace $s^2_{n,a_n}$ with $1/n^2$ whenever it is negative or too small. Namely, let
$$
\hat{s}^2_{n,a_n} = \max\big\{s^2_{n,a_n},\,1/n^2\big\},
$$
and estimate $\mathrm{var}\big(\hat{\gamma}^2_{a_n}(\mathbb{P},\mathbb{P}_{X^1}\otimes\cdots\otimes\mathbb{P}_{X^k})\big)$ by $2\hat{s}^2_{n,a_n}/(n(n-1))$.
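The three unbiased estimators above are averages over ordered tuples of distinct indices. The following brute-force sketch computes them for a single scalar component; the helper names are ours, the kernel form $G_a(x,y)=\exp(-a(x-y)^2)$ is an assumption, and the tuple enumeration is $O(n^4)$, so it is for illustration on small $n$ only.

```python
import math
from itertools import permutations

def gauss(x, y, a):
    """Scalar Gaussian kernel G_a(x, y) = exp(-a (x - y)^2)."""
    return math.exp(-a * (x - y) ** 2)

def moment_estimators(xs, a):
    """Unbiased estimators of E G^2(X1, X2), E G(X1, X2) G(X1, X3)
    and (E G(X1, X2))^2 from one component's sample xs."""
    n = len(xs)
    # average of G^2 over ordered pairs of distinct indices
    est_sq = sum(gauss(xs[i], xs[j], a) ** 2
                 for i, j in permutations(range(n), 2)) / (n * (n - 1))
    # (n-3)!/n! times the sum over ordered triples of distinct indices
    est_cross = (math.factorial(n - 3) / math.factorial(n)) * sum(
        gauss(xs[i], xs[j1], a) * gauss(xs[i], xs[j2], a)
        for i, j1, j2 in permutations(range(n), 3))
    # (n-4)!/n! times the sum over ordered quadruples of distinct indices
    est_mean_sq = (math.factorial(n - 4) / math.factorial(n)) * sum(
        gauss(xs[i1], xs[j1], a) * gauss(xs[i2], xs[j2], a)
        for i1, i2, j1, j2 in permutations(range(n), 4))
    return est_sq, est_cross, est_mean_sq

xs = [0.3, -1.2, 0.5, 2.0, 1.1]
est_sq, est_cross, est_mean_sq = moment_estimators(xs, 0.5)
```

Plugging these component-wise estimates into the expansions of $\mathbb{E}\bar{G}^2_{a_n}$, $\sum_j\mathbb{E}(g_j)^2$ and $\sum_{j_1,j_2}\mathbb{E}(g_{j_1,j_2})^2$ yields $s^2_{n,a_n}$; a practical implementation would instead express these sums through $O(n^2)$ operations on the kernel matrix.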


Therefore for general $k$, the single kernel test statistic and the adaptive test statistic are constructed as
$$
T^{\mathrm{IND}}_{n,a_n} = \frac{n}{\sqrt{2}}\,\hat{s}^{-1}_{n,a_n}\,\hat{\gamma}^2_{a_n}(\mathbb{P},\mathbb{P}_{X^1}\otimes\cdots\otimes\mathbb{P}_{X^k})
\qquad\text{and}\qquad
T^{\mathrm{IND(adapt)}}_n = \max_{1\le a_n\le n^{2/d}} T^{\mathrm{IND}}_{n,a_n},
$$
respectively. Accordingly, $\Phi^{\mathrm{IND}}_{n,a_n,\alpha}$ and $\Phi^{\mathrm{IND(adapt)}}$ can be constructed as in the case of $k=2$.
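Once $\hat{\gamma}^2_{a_n}$ and $s^2_{n,a_n}$ are available, assembling the standardized and adaptive statistics is plain arithmetic. A sketch with made-up placeholder inputs (the plug-in values below are invented for illustration, and the function names are ours):

```python
import math

def single_kernel_stat(gamma2_hat, s2, n):
    """T_{n,a} = (n / sqrt(2)) * gamma2_hat / s_hat, where the truncation
    s_hat^2 = max(s2, 1/n^2) guards against negative or tiny variance estimates."""
    s_hat = math.sqrt(max(s2, 1.0 / n ** 2))
    return n / math.sqrt(2) * gamma2_hat / s_hat

n = 200
# hypothetical (gamma2_hat, s2) pairs computed on a bandwidth grid 1 <= a_n <= n^(2/d);
# note the last s2 is negative and gets truncated to 1/n^2
grid_values = [(0.004, 1e-5), (0.006, 4e-5), (0.001, -1e-6)]
stats = [single_kernel_stat(g2, s2, n) for g2, s2 in grid_values]
T_adapt = max(stats)  # the adaptive statistic maximizes over the bandwidth grid
```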

B.5 Theoretical Properties of Independence Tests for General $k$

In this section, with $\Phi^{\mathrm{IND}}_{n,a_n,\alpha}$ and $\Phi^{\mathrm{IND(adapt)}}$ constructed in Appendix B.4 for general $k$, we confirm that Theorem 12, Theorem 13 and Theorem 16 still hold. We shall only emphasize the main differences between the new proofs and the original proofs in the case of $k=2$.

Under the null hypothesis: we only need to re-ensure that $s^2_{n,a_n}$ is a consistent estimator of $\mathbb{E}[G^*_{a_n}(X_1,X_2)]^2$. Specifically, we show that
$$
s^2_{n,a_n}\big/\mathbb{E}[G^*_{a_n}(X_1,X_2)]^2 \to_p 1
$$
given $1\ll a_n\ll n^{4/d}$ for Theorem 12, and
$$
\sup_{1\le a_n\le n^{2/d}}\left|s^2_{n,a_n}\big/\mathbb{E}[G^*_{a_n}(X_1,X_2)]^2 - 1\right| = o_p(1)
$$
for Theorem 16.

To prove the former, since
$$
\frac{\mathbb{E}[G^*_{a_n}(X_1,X_2)]^2}{(\pi/(2a_n))^{d/2}\,\|p\|^2_{L_2}} \to 1
$$
as $a_n\to\infty$, it suffices to show
$$
a_n^{d/2}\left|s^2_{n,a_n} - \mathbb{E}[G^*_{a_n}(X_1,X_2)]^2\right| = o_p(1),
$$


which follows considering that
$$
a_n^{d_l/2}\,\mathbb{E}G^2_{a_n}(X^l_1,X^l_2),\qquad
a_n^{d_l/2}\,\mathbb{E}G_{a_n}(X^l_1,X^l_2)G_{a_n}(X^l_1,X^l_3),\qquad
a_n^{d_l/2}\,\big(\mathbb{E}G_{a_n}(X^l_1,X^l_2)\big)^2 \tag{B.8}
$$
are all bounded and they are estimated consistently by their corresponding estimators. For example,
$$
a_n^{d_l/2}\,\mathbb{E}G^2_{a_n}(X^l_1,X^l_2) \to (\pi/2)^{d_l/2}\,\|p_l\|^2_{L_2}
$$
and
$$
\begin{aligned}
a_n^{d_l}\,\mathbb{E}\left(\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^2_{a_n}(X^l_i,X^l_j) - \mathbb{E}G^2_{a_n}(X^l_1,X^l_2)\right)^2
&= a_n^{d_l}\,\mathrm{var}\left(\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^2_{a_n}(X^l_i,X^l_j)\right)\\
&\lesssim a_n^{d_l}\left(n^{-1}\,\mathbb{E}G^2_{a_n}(X^l_1,X^l_2)G^2_{a_n}(X^l_1,X^l_3) + n^{-2}\,\mathbb{E}G^4_{a_n}(X^l_1,X^l_2)\right)\\
&\lesssim_{d_l} n^{-1}a_n^{d_l/4}\,\|p_l\|^3_{L_2} + n^{-2}a_n^{d_l/2}\,\|p_l\|^2_{L_2} \to 0.
\end{aligned}
$$
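The limit $a_n^{d_l/2}\,\mathbb{E}G^2_{a_n}\to(\pi/2)^{d_l/2}\|p_l\|^2_{L_2}$ admits a closed-form sanity check in one dimension. Assuming the kernel form $G_a(x,y)=e^{-a(x-y)^2}$ (consistent with the $(\pi/(2a_n))^{d/2}$ scaling above) and $X,Y\sim_{iid}N(0,1)$: then $X-Y\sim N(0,2)$, the identity $\mathbb{E}e^{-tZ^2}=(1+2t\sigma^2)^{-1/2}$ gives $\mathbb{E}G_a^2(X,Y)=(1+8a)^{-1/2}$, and $\|p\|^2_{L_2}=1/(2\sqrt{\pi})$, so the predicted limit is $1/(2\sqrt{2})$:

```python
import math

def scaled_second_moment(a):
    """a^(1/2) * E G_a(X, Y)^2 for X, Y iid N(0, 1), d_l = 1.
    Uses E exp(-t Z^2) = (1 + 2 t sigma^2)^(-1/2) with t = 2a, sigma^2 = 2."""
    return math.sqrt(a) * (1.0 + 8.0 * a) ** -0.5

# predicted limit (pi/2)^(1/2) * ||p||_{L2}^2 = 1 / (2 sqrt 2)
limit = (math.pi / 2) ** 0.5 / (2.0 * math.sqrt(math.pi))

gap = abs(scaled_second_moment(1e8) - limit)  # already tiny for large a
```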

The proof of the latter is similar. It suffices to have:

• each term in (B.8) is bounded for $a_n\in[1,\infty)$, which immediately follows since each term is continuous in $a_n$ and converges as $a_n\to\infty$;

• the difference between each term in (B.8) and its corresponding estimator converges to 0 uniformly over $a_n\in[1,n^{2/d}]$, the proof of which is the same as that of Lemma 5.

Under the alternative hypothesis: we only need to re-ensure that $\hat{s}_{n,a_n}$ is bounded. Specifically, we show
$$
\inf_{p\in H^{\mathrm{IND}}_1(\Delta_{n,s})} \frac{n\,\gamma^2_{a_n}(\mathbb{P},\mathbb{P}_{X^1}\otimes\cdots\otimes\mathbb{P}_{X^k})}{\left[\mathbb{E}\big(\hat{s}^2_{n,a_n}\big)^{1/k}\right]^{k/2}} \to \infty
$$


for Theorem 13, and
$$
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{IND}}_1(\Delta_{n,s};s)} P\left(\hat{s}^2_{n,a_n(s)'} \le 2M^2\big(2a_n(s)'/\pi\big)^{-d/2}\right) \to 1 \tag{B.9}
$$
for Theorem 16, where $a_n(s)' = (\log\log n/n)^{-4/(4s+d)}$.

The former holds because
$$
\begin{aligned}
\mathbb{E}\big(\hat{s}^2_{n,a_n}\big)^{1/k}
&\le \mathbb{E}\left(\max\left\{\big|s^2_{n,a_n}\big|,\,1/n^2\right\}\right)^{1/k}
\le \mathbb{E}\big|s^2_{n,a_n}\big|^{1/k} + n^{-2/k}\\
&\lesssim_k \left(\prod_{l=1}^{k}\mathbb{E}G^2_{a_n}(X^l_1,X^l_2)\right)^{1/k} + n^{-2/k}
\le \left(M^2(\pi/(2a_n))^{d/2}\right)^{1/k} + n^{-2/k},
\end{aligned}
$$
where the second-to-last inequality follows from the generalized Hölder inequality. For example,
$$
\mathbb{E}\left(\prod_{l=1}^{k}\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}G^2_{a_n}(X^l_i,X^l_j)\right)^{1/k}
\le \left(\prod_{l=1}^{k}\mathbb{E}G^2_{a_n}(X^l_1,X^l_2)\right)^{1/k}.
$$
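The generalized Hölder step used here, $\mathbb{E}\big(\prod_l V_l\big)^{1/k}\le\prod_l\big(\mathbb{E}V_l\big)^{1/k}$ for nonnegative $V_l$, also holds when the expectation is an empirical average, which is exactly how it applies to the product of the per-component estimators. A quick numerical illustration on made-up nonnegative data:

```python
import math
import random

random.seed(1)
k, n = 3, 1000
# k nonnegative variables observed jointly, e.g. per-component kernel averages
V = [[random.random() for _ in range(n)] for _ in range(k)]

# E[(prod_l V_l)^(1/k)]  vs  prod_l (E V_l)^(1/k): Hölder with exponents (k, ..., k)
lhs = sum(math.prod(V[l][i] for l in range(k)) ** (1.0 / k) for i in range(n)) / n
rhs = math.prod(sum(V[l]) / n for l in range(k)) ** (1.0 / k)
```

The inequality `lhs <= rhs` holds for any such sample, since an empirical average is itself an expectation with respect to the empirical measure of the joint tuples.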

To prove the latter, note that for $a_n = a_n(s)'$, all three terms in (B.8) are bounded by $M^2_l(\pi/2)^{d_l/2}$ and the variances of their corresponding estimators are bounded by
$$
C(d_l)\left(n^{-1}\big(a_n(s)'\big)^{d_l/4} M^3_l + n^{-2}\big(a_n(s)'\big)^{d_l/2} M^2_l\right) = o(1)
$$
uniformly over all $s$. Therefore,
$$
\inf_{s\ge d/4}\ \inf_{p\in H^{\mathrm{IND}}_1(\Delta_{n,s};s)} P\left(\big(a_n(s)'\big)^{d/2}\left|s^2_{n,a_n(s)'} - \mathbb{E}[G^*_{a_n(s)'}(Y_1,Y_2)]^2\right| \le M^2(\pi/2)^{d/2}\right) \to 1,
$$


where ๐‘Œ1, ๐‘Œ2 โˆผiid P๐‘‹1 โŠ— ยท ยท ยท โŠ— P๐‘‹ ๐‘˜ . Further considering that

E[๐บโˆ—a๐‘› (๐‘ ) โ€ฒ (๐‘Œ1, ๐‘Œ2)]2 โ‰ค E[๏ฟฝ๏ฟฝa๐‘› (๐‘ ) โ€ฒ (๐‘Œ1, ๐‘Œ2)]2 โ‰ค ๐‘€2(๐œ‹/(2a๐‘› (๐‘ )โ€ฒ))๐‘‘/2

and that

1/๐‘›2 = ๐‘œ((a๐‘› (๐‘ )โ€ฒ)โˆ’๐‘‘/2)

uniformly over all ๐‘ , we prove (B.9).
