Testing Community Structure for Hypergraphs

Submitted to the Annals of Statistics


By Mingao Yuan‡, Ruiqi Liu§, Yang Feng¶,∗ and Zuofeng Shang‖,†

North Dakota State University‡, Texas Tech University§, New York University¶ and New Jersey Institute of Technology‖

Many complex networks in the real world can be formulated as hypergraphs where community detection has been widely used. However, the fundamental question of whether communities exist or not in an observed hypergraph remains unclear. This work aims to tackle this important problem. Specifically, we systematically study when a hypergraph with community structure can be successfully distinguished from its Erdős-Rényi counterpart, and propose concrete test statistics when the models are distinguishable. The main contribution of this paper is threefold. First, we discover a phase transition in the hyperedge probability for distinguishability. Second, in the bounded-degree regime, we derive a sharp signal-to-noise ratio (SNR) threshold for distinguishability in the special two-community 3-uniform hypergraphs, and derive nearly tight SNR thresholds in the general two-community m-uniform hypergraphs. Third, in the dense regime, we propose a computationally feasible test based on sub-hypergraph counts, obtain its asymptotic distribution, and analyze its power. Our results are further extended to non-uniform hypergraphs in which a new test involving both edge and hyperedge information is proposed. The proofs rely on Janson's contiguity theory [40], a high-moments driven asymptotic normality result by Gao and Wormald [36], and a truncation technique for analyzing the likelihood ratio.

1. Introduction. Community detection is a fundamental problem in network data analysis. For instance, in social networks [25, 38, 63], protein to protein interactions [21], and image segmentation [59], among others, many algorithms have been developed for identifying community structure. Theoretical studies on community detection have mostly focused on the ordinary graph setting in which each possible edge contains exactly two vertices (see [12, 4, 56, 63, 64, 35, 5]). One common assumption made in these references is the existence of communities. Recently, a number of researchers have worked on testing this assumption, e.g., [18, 44, 51, 16, 11, 33, 34, 61]. Besides, hypothesis testing has been used to test the number of communities in a network [18, 44].

Real-world networks are usually more complex than ordinary graphs. Unlike ordinary graphs where the data structure is typically unique, e.g., edges only contain two vertices, hypergraphs demonstrate a number of possibly overlapping data structures. For instance, in coauthorship networks [24, 54, 57, 52], the number of coauthors varies across different papers so that one cannot consider edges consisting of two coauthors only. Instead, a new type of "edge," called hyperedge, must be considered which allows the connectivity of arbitrarily many coauthors. The complex structures of hypergraphs create new challenges in both theoretical and methodological studies. As far as we know, existing hypergraph literature mostly focuses on community detection in algorithmic aspects [58, 20, 12, 56, 4, 30, 41, 45]. Only recently, Ghoshdastidar and Dukkipati [30, 31] provided a statistical study in which a spectral algorithm based on adjacency tensor was proposed for identifying community structure and asymptotic results were developed. Nonetheless, the important problem of testing the existence of community structure in an observed hypergraph remains untreated.

∗Corresponding author. Supported by NSF CAREER Grant DMS-2013789.
†Corresponding author. Supported by NSF DMS-1764280 and NSF DMS-1821157.
AMS 2000 subject classifications: Primary 62G10; secondary 05C80.
Keywords and phrases: hypergraph, stochastic block model, hypothesis testing, contiguity, l-cycle.
http://www.imstat.org/aos/

In this paper, we aim to tackle the problem of testing community structure for hypergraphs. We first consider the relatively simpler but widely used uniform hypergraphs in which each hyperedge consists of an equal number of vertices. For instance, the (user, resource, annotation) structure in folksonomy may be represented as a uniform hypergraph where each hyperedge consists of three vertices [37]; the (user, remote host, login time, logout time) structure in the login-data can be modeled as a uniform hypergraph where each hyperedge contains four vertices [32]; the point-set matching problem is usually formulated as identifying a strongly connected component in a uniform hypergraph [20]. We provide various theoretical or methodological studies ranging from dense uniform hypergraphs to sparse ones and investigate the possibility of a successful test in each scenario. Our testing results in the dense case are then extended to the more general non-uniform hypergraph setting, in which a new test statistic involving both edge and hyperedge is proposed. One important finding is that our new test is more powerful than the classic one involving edge information only, showing the advantage of using hyperedge information to boost the testing performance. A more notable contribution is a nearly tight threshold for signal-to-noise ratio to examine the existence of community structure (Theorem 2.6).

1.1. Review of Hypergraph Model And Relevant Literature. In this section, we review some basic notions in hypergraphs and recent progress in the literature. Let us first review the notion of the uniform hypergraph. An $m$-uniform hypergraph $H_m = (V, \mathcal{E})$ consists of a vertex set $V$ and a hyperedge set $\mathcal{E}$, where each hyperedge in $\mathcal{E}$ is a subset of $V$ consisting of exactly $m$ vertices. Two hyperedges are the same if they are equal as vertex sets. An $l$-cycle in $H_m$ is a cyclic ordering $\{v_1, v_2, \ldots, v_r\}$ of a subset of the vertex set with hyperedges of the form $\{v_i, v_{i+1}, \ldots, v_{i+m-1}\}$ such that any two adjacent hyperedges have exactly $l$ common vertices. An $l$-cycle is loose if $l = 1$ and tight if $l = m-1$. To better illustrate the notion, consider a 3-uniform hypergraph $H_3 = (V, \mathcal{E})$, where $V = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7\}$ and $\mathcal{E} = \{(v_i, v_j, v_t) \mid 1 \le i < j < t \le 7\}$. Then $(\{v_1, v_2, v_3, v_4, v_5, v_6\}, \{(v_1, v_2, v_3), (v_3, v_4, v_5), (v_5, v_6, v_1)\})$ is a loose cycle and $(\{v_1, v_2, v_3, v_4\}, \{(v_1, v_2, v_3), (v_2, v_3, v_4), (v_3, v_4, v_1), (v_4, v_1, v_2)\})$ is a tight cycle (see Figure 1).

Fig 1: Left: a loose cycle of three edges E1, E2, E3. Right: a tight cycle of four edges E1, E2, E3, E4. Both cycles are subgraphs of the 3-uniform hypergraph H3(V, E).
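The cyclic-ordering definition can be made concrete with a short sketch. The helper `l_cycle_edges` below is a hypothetical name, not something from the paper; it assumes consecutive hyperedges start $m - l$ positions apart along the cyclic ordering and that the number of vertices is a multiple of $m - l$. Under those assumptions it reproduces the loose and tight cycles of the example.

```python
def l_cycle_edges(vertices, m, l):
    """List the hyperedges of an l-cycle on a cyclic ordering of `vertices`.

    Assumption (illustrative): consecutive hyperedges start m - l positions
    apart, so that neighbours overlap in exactly l vertices.
    """
    r = len(vertices)
    step = m - l
    if not (1 <= l < m) or r % step != 0:
        raise ValueError("need 1 <= l < m and len(vertices) divisible by m - l")
    return [
        tuple(vertices[(start + j) % r] for j in range(m))
        for start in range(0, r, step)
    ]


def adjacent_overlaps(edges):
    """Sizes of the intersections of cyclically adjacent hyperedges."""
    return [
        len(set(edges[i]) & set(edges[(i + 1) % len(edges)]))
        for i in range(len(edges))
    ]


loose = l_cycle_edges([1, 2, 3, 4, 5, 6], m=3, l=1)
tight = l_cycle_edges([1, 2, 3, 4], m=3, l=2)
print(loose, adjacent_overlaps(loose))
print(tight, adjacent_overlaps(tight))
```

Here vertex `i` stands for $v_i$; the overlap lists confirm that adjacent hyperedges share one vertex in the loose cycle and two in the tight cycle.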

Next, let us review uniform hypergraphs with a planted partitioning structure, also known as the stochastic block model (SBM). For any positive integers $n, m, k$ with $m, k \ge 2$, and positive sequences $0 < q_n < p_n < 1$ (possibly depending on $n$), let $H_m^k(n, p_n, q_n)$ denote an $m$-uniform hypergraph of $n$ vertices and $k$ balanced communities, in which $p_n$ ($q_n$) represents the hyperedge probability within (between) communities. More explicitly, any vertex $i \in [n] \equiv \{1, 2, \ldots, n\}$ is assigned, independently and uniformly at random, a label $\sigma_i \in [k] \equiv \{1, 2, \ldots, k\}$, and then each possible hyperedge $(i_1, i_2, \ldots, i_m)$ is included with probability $p_n$ if $\sigma_{i_1} = \sigma_{i_2} = \cdots = \sigma_{i_m}$ and with probability $q_n$ otherwise. In particular, $H_2^2(n, p_n, q_n)$ (with $m = k = 2$) reduces to the ordinary bisection stochastic block models considered by [49, 61]. Let $A \in \{0, 1\}^{n \times n \times \cdots \times n}$ (with $m$ factors of $n$) denote the symmetric adjacency tensor of order $m$ associated with $H_m^k(n, p_n, q_n)$. By symmetry we mean that $A_{i_1 i_2 \ldots i_m} = A_{\psi(i_1)\psi(i_2)\ldots\psi(i_m)}$ for any permutation $\psi$ of $(i_1, i_2, \ldots, i_m)$. For convenience, assume $A_{i_1 i_2 \ldots i_m} = 0$ if $i_s = i_t$ for some distinct $s, t \in \{1, 2, \ldots, m\}$, i.e., the hypergraph has no self-loops. Conditional on $\sigma_1, \ldots, \sigma_n$, the $A_{i_1 i_2 \ldots i_m}$'s, with $i_1, \ldots, i_m$ pairwise distinct, are assumed to be independent following the distribution below:

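$$\mathbb{P}(A_{i_1 i_2 \ldots i_m} = 1 \mid \sigma) = p_{i_1 i_2 \ldots i_m}(\sigma), \qquad \mathbb{P}(A_{i_1 i_2 \ldots i_m} = 0 \mid \sigma) = q_{i_1 i_2 \ldots i_m}(\sigma), \tag{1}$$

where $\sigma = (\sigma_1, \ldots, \sigma_n)$,

$$p_{i_1 i_2 \ldots i_m}(\sigma) = \begin{cases} p_n, & \sigma_{i_1} = \cdots = \sigma_{i_m}, \\ q_n, & \text{otherwise}, \end{cases} \qquad q_{i_1 i_2 \ldots i_m}(\sigma) = 1 - p_{i_1 i_2 \ldots i_m}(\sigma).$$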

In other words, each possible hyperedge $(i_1, \ldots, i_m)$ is included with probability $p_n$ if the vertices $i_1, \ldots, i_m$ belong to the same community, and with probability $q_n$ otherwise. Let $H_m\left(n, \frac{p_n + (k^{m-1}-1)q_n}{k^{m-1}}\right)$ denote the $m$-uniform hypergraph without community structure, i.e., an Erdős-Rényi model in which each possible hyperedge is included with common probability $\frac{p_n + (k^{m-1}-1)q_n}{k^{m-1}}$. We consider such a special choice of hyperedge probability in order to make the model have the same average degree as $H_m^k(n, p_n, q_n)$. In particular, $H_2\left(n, \frac{p_n + (k-1)q_n}{k}\right)$ with $m = 2$ becomes the traditional Erdős-Rényi model that has been well studied in ordinary graph literature; see [13, 14, 27, 23, 60]. Non-uniform hypergraphs can be simply viewed as a superposition of uniform ones; see Section 3. Throughout this paper, we assume $k$ and $m$ are fixed constants independent of $n$.
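To make the generative description concrete, the following is a minimal simulation sketch of the model in (1) and of the matching null hyperedge probability. The function names `sample_hsbm` and `null_probability` are illustrative choices, not the authors' code, and the brute-force loop over all $\binom{n}{m}$ vertex sets is only meant for small $n$.

```python
import itertools
import random


def sample_hsbm(n, m, k, p, q, seed=None):
    """Sample labels and hyperedges of an m-uniform k-community SBM.

    Each vertex gets a uniform label in {0, ..., k-1}; a vertex set of
    size m becomes a hyperedge with probability p if all its labels agree
    and with probability q otherwise.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    edges = set()
    for idx in itertools.combinations(range(n), m):
        same_community = len({labels[i] for i in idx}) == 1
        if rng.random() < (p if same_community else q):
            edges.add(idx)
    return labels, edges


def null_probability(m, k, p, q):
    """Hyperedge probability of the Erdos-Renyi model with matching average degree."""
    return (p + (k ** (m - 1) - 1) * q) / k ** (m - 1)


labels, edges = sample_hsbm(n=12, m=3, k=2, p=0.6, q=0.1, seed=0)
print(len(edges), null_probability(3, 2, 0.6, 0.1))
```

Sampling from the null model amounts to calling `sample_hsbm` with `p` and `q` both set to `null_probability(m, k, p, q)`.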

Given an observed adjacency tensor $A$, does $A$ represent a hypergraph that exhibits community structure? In the present setting, this problem can be formulated as testing the following hypothesis:

$$H_0 : A \sim H_m\left(n, \frac{p_n + (k^{m-1}-1)q_n}{k^{m-1}}\right) \quad \text{vs.} \quad H_1 : A \sim H_m^k(n, p_n, q_n). \tag{2}$$

When $m = k = 2$, problem (2) has been well studied in the literature. Specifically, for the extremely sparse scenario $p_n \asymp q_n \ll n^{-1}$, [49] show that $H_0$ and $H_1$ are always indistinguishable in the sense that all tests are asymptotically powerless; for the bounded degree case $p_n \asymp q_n \asymp n^{-1}$, the two models are distinguishable if and only if the signal-to-noise ratio (SNR) is greater than 1 [49, 50, 61]; for the dense scenario $p_n \asymp q_n \gg n^{-1}$, $H_0$ and $H_1$ are always distinguishable and a number of algorithms have been developed (see [44, 33, 34, 16, 2, 18]). When $m = 2$ and $k \ge 3$, the above statements remain true for the extremely sparse and dense scenarios; but for the bounded degree scenario, SNR $> 1$ is only a sufficient condition for successfully distinguishing $H_0$ from $H_1$, while a necessary condition remains an open problem (see [2, 17, 62]). Abbe [1] provides a comprehensive review of the recent development in this field. To the best of our knowledge, there is a lack of literature dealing with the testing problem (2) for general $m$. The literature on hypergraph analysis has mainly focused on community detection (see [6, 30, 31, 58, 20, 29, 41, 45, 48]).


1.2. Our Contributions. The aim of this paper is to provide a study on hypergraph testing under a spectrum of hyperedge probability scenarios. Our results consist of four major parts. Section 2.1 deals with the extremely sparse scenario $p_n \asymp q_n \ll n^{-m+1}$, in which we show that $H_0$ and $H_1$ are always indistinguishable in the sense of contiguity. Section 2.2 deals with the bounded degree case $p_n \asymp q_n \asymp n^{-m+1}$, in which we show that $H_0$ and $H_1$ are distinguishable if the SNR of the uniform hypergraph is greater than a certain threshold, but indistinguishable if the SNR is below another threshold. Interestingly, when $k = 2$, the two thresholds are nearly tight in that they are of the same order $2^{-m}$ (up to universal constants). We also construct a powerful test statistic, based on counting the "long loose cycles", for the case where the SNR is greater than one. Section 2.3 deals with the dense scenario $p_n \asymp q_n \gg n^{-m+1}$. We propose a test based on counting the hyperedges, $l$-hypervees, and $l$-hypertriangles with $l$ determined by the order of $p_n$ (or $q_n$), and show that the power of the proposed test approaches one as the number of vertices goes to infinity. In Section 3, we extend some of the previous results to non-uniform hypergraph testing. We propose a new test involving both edge and hyperedge information and show that it is generally more powerful than the classic test using edge information only (see Remark 3.1). The results of the present paper can be viewed as nontrivial extensions of the ordinary graph testing results such as [49, 50, 33]. Section 4 provides numerical studies to support our theory. Possible extensions are discussed in Section 5 and proofs of the main results are collected in Section 6.

Figure 2 displays a phase transition phenomenon in the special 3-uniform hypergraph, based on our results in Sections 2.1 and 2.3. We find that $H_0$ and $H_1$ are indistinguishable if the hyperedge probabilities satisfy $p_n, q_n = o(n^{-2})$ (see red zone), and are distinguishable if $p_n, q_n \gg n^{-2}$ (see green zone), which is consistent with [8] who showed that community detection with weak consistency is possible if and only if $p_n, q_n \gg n^{-2}$. Therefore, the seemingly different perspectives, i.e., hypothesis testing and community detection, appear to coincide here. In contrast, the spectral algorithm proposed by [31] is able to detect communities with strong consistency if $p_n, q_n \gg n^{-2}(\log n)^2$ (later improved to $p_n, q_n \gg n^{-2}\log n$ by [45, 7]). For the bounded degree case $p_n, q_n \asymp n^{-2}$, detection algorithms better than random guess were proposed by [8, 19, 28]. Overall, in the references [8, 19], the SNR conditions are not comparable to our $\kappa > 1$ since unknown constants are involved in their conditions. The SNR condition in [28] seems more restrictive than our condition $\kappa > 1$.

[Figure 2: horizontal axis, number of vertices n (from 1 to ∞); vertical axis, hyperedge probability; the boundary curve is 1/n^2.]

Fig 2: Phase transition for 3-uniform hypergraph. Red: indistinguishable; green: distinguishable.

Figure 3 demonstrates the distinguishable and indistinguishable regions for the two-community graph (left) and the two-community 3-uniform hypergraph (right) in the bounded degree regime, i.e., $p_n = \frac{a}{n^{m-1}}$, $q_n = \frac{b}{n^{m-1}}$. The regions are characterized by $(a, b)$ with $a > b > 0$. The left plot is based on [49] who derived the decision boundary $(a - b)^2 = 2(a + b)$. The right plot is based on our Theorem 2.6 with the decision boundary $(a - b)^2 = 4(a + 3b)$. It can be observed that $m = 3$ yields a larger indistinguishable region than $m = 2$, which reveals a substantial difference for hypothesis testing in the two models.

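[Figure 3: two panels with axes a (horizontal) and b (vertical), both running from 0 to ∞, each marking the line a = b and the region a < b. Left panel (m = 2): decision boundary (a − b)^2 / (2(a + b)) = 1. Right panel (m = 3): decision boundary (a − b)^2 / (4(a + 3b)) = 1.]

Fig 3: Phase transition in bounded degree case. Red: indistinguishable; green: distinguishable.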

2. Main Results. In this section, we present our main results in three parts, organized by the sparsity of the network. The contiguity theory for the extremely sparse case is presented in Section 2.1, followed by the contiguity and orthogonality result for the bounded degree case in Section 2.2. In Section 2.3, we construct a powerful test by counting the hyperedges, $l$-hypervees, and $l$-hypertriangles for the dense case. Throughout this paper, we assume $k$ and $m$ are fixed positive integers.

2.1. A Contiguity Theory for Extremely Sparse Case. In this section, we consider the testing problem (2) with $p_n \asymp q_n \ll n^{-m+1}$, i.e., the hyperedge probability of the hypergraph is extremely low. For technical convenience, we only consider $p_n = \frac{a}{n^{\alpha}}$ and $q_n = \frac{b}{n^{\alpha}}$ with constants $a > b > 0$ and $\alpha > m - 1$. The results in this section may be extended to general orders of $p_n$ and $q_n$ with more cumbersome arguments. We will show that no test can successfully distinguish $H_0$ from $H_1$ in such a situation. The proof proceeds by showing that the probability measures associated with $H_0$ and $H_1$ are contiguous (see Theorem 2.1). We remark that contiguity has also been used to prove indistinguishability for ordinary graphs (see [49, 50]).

Let $P_n$ and $Q_n$ be sequences of probability measures on a common probability space $(\Omega_n, \mathcal{F}_n)$. We say that $P_n$ and $Q_n$ are mutually contiguous if for every sequence of measurable sets $A_n \subset \Omega_n$, $P_n(A_n) \to 0$ if and only if $Q_n(A_n) \to 0$ as $n \to \infty$. They are said to be orthogonal if there exists a sequence of measurable sets $A_n$ such that $P_n(A_n) \to 0$ and $Q_n(A_n) \to 1$ as $n \to \infty$. According to [49], two probability models are indistinguishable if their associated probability measures are mutually contiguous, and two probability models are distinguishable if their associated probability measures are orthogonal. The following theorem shows that $H_0$ and $H_1$ are indistinguishable.

Theorem 2.1. If $\alpha > m - 1$ and $a > b > 0$ are fixed constants, then the probability measures associated with $H_0$ and $H_1$ are mutually contiguous.

The proof of Theorem 2.1 proceeds by showing that the ratio of the likelihood function of $H_1$ over $H_0$ converges in distribution to 1 under $H_0$, which implies the contiguity of $H_1$ and $H_0$ [40]. Theorem 2.1 says that the hypergraphs in $H_0$ and $H_1$ are indistinguishable, and hence, no test can successfully separate the two hypotheses. One intuitive explanation is that when $\alpha > m - 1$, the average degree of both hypergraph models converges to zero. To see this, the average degree is

$$\binom{n}{m-1}\frac{a + (k^{m-1}-1)b}{k^{m-1}n^{\alpha}}, \tag{3}$$

which goes to zero as $n \to \infty$ if $\alpha > m - 1$. Therefore, the signals in both models are not strong enough to support a successful test. It is easy to see that the average degree becomes bounded when $\alpha = m - 1$, which will be investigated in the next section.
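The behaviour of (3) can be checked numerically. The sketch below simply evaluates the expression in (3) for growing $n$; the function name `average_degree` and the parameter values ($m = 3$, $k = 2$, $a = 5$, $b = 1$) are illustrative assumptions rather than values from the paper.

```python
from math import comb


def average_degree(n, m, k, a, b, alpha):
    """Evaluate expression (3): C(n, m-1) * (a + (k^(m-1) - 1) b) / (k^(m-1) n^alpha)."""
    K = k ** (m - 1)
    return comb(n, m - 1) * (a + (K - 1) * b) / (K * n ** alpha)


for n in (10 ** 2, 10 ** 4, 10 ** 6):
    sparse = average_degree(n, m=3, k=2, a=5, b=1, alpha=2.5)   # alpha > m - 1
    bounded = average_degree(n, m=3, k=2, a=5, b=1, alpha=2.0)  # alpha = m - 1
    print(n, sparse, bounded)
```

With $\alpha = 2.5$ the values shrink like $n^{-1/2}$, while with $\alpha = m - 1 = 2$ they settle near $\frac{a + (k^{m-1}-1)b}{k^{m-1}(m-1)!}$, which equals 1 for these parameters.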

2.2. Bounded Degree Case. In this section, we consider $p_n \asymp q_n \asymp n^{-m+1}$, which leads to bounded average degrees for the models in $H_0$ and $H_1$; see (3). For convenience, let us denote $p_n = \frac{a}{n^{m-1}}$ and $q_n = \frac{b}{n^{m-1}}$ for fixed $a > b > 0$. Define the signal-to-noise ratio (SNR) for $H_0$ and $H_1$ as

$$\kappa = \frac{(a-b)^2}{k^{m-1}(m-2)!\,[a + (k^{m-1}-1)b]}. \tag{4}$$

When $m = k = 2$, it is easy to check that $\kappa = \frac{(a-b)^2}{2(a+b)}$, which is the classic SNR of ordinary stochastic block models considered by [49, 3, 2]. Hence, it is reasonable to view $\kappa$ defined in (4) as a generalization of the classic SNR to the hypergraph model $H_m^k\left(n, \frac{a}{n^{m-1}}, \frac{b}{n^{m-1}}\right)$. Like the classic SNR, the value of $\kappa$ characterizes the separability between communities. Intuitively, when $\kappa$ is large, which means that the communities are very different, the testing problem (2) becomes simpler. The following result shows that when $\kappa > 1$, successful testing becomes possible.
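As a small worked example of (4), the sketch below evaluates $\kappa$ for a given $(a, b, m, k)$. The function name `snr` and the sample point $(a, b) = (7, 1)$ are illustrative choices, not taken from the paper.

```python
from math import factorial


def snr(a, b, m, k):
    """Signal-to-noise ratio kappa from (4)."""
    K = k ** (m - 1)
    return (a - b) ** 2 / (K * factorial(m - 2) * (a + (K - 1) * b))


# Same (a, b), two communities: ordinary graph (m = 2) versus 3-uniform hypergraph (m = 3).
print(snr(7, 1, m=2, k=2))  # (a-b)^2 / (2(a+b))
print(snr(7, 1, m=3, k=2))  # (a-b)^2 / (4(a+3b))
```

For this point the value is above one when $m = 2$ and below one when $m = 3$, so the same pair $(a, b)$ can sit on different sides of the $\kappa = 1$ boundary in the two models.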

Theorem 2.2. Suppose that $a > b > 0$ are fixed constants and $m, k \ge 2$. If $\kappa > 1$, then the probability measures associated with $H_0$ and $H_1$ are orthogonal.

We prove Theorem 2.2 by constructing a sequence of events dependent on the number of long loose cycles and showing that the probabilities of the events converge to 1 (or 0) under $H_0$ (or $H_1$), based on the high moments driven asymptotic normality theorem from Gao and Wormald [36]. Theorem 2.2 says that it is possible to distinguish the hypotheses $H_0$ and $H_1$ provided that $\kappa > 1$. Abbe and Sandon [2] obtained relevant results in the ordinary graph setting, i.e., $m = 2$ and $k \ge 2$ in our case; see Corollary 2.8 therein, which states that community detection in polynomial time becomes possible if SNR $> 1$. In contrast, Theorem 2.2 holds for arbitrary $m, k \ge 2$. Hence our result can be viewed as an extension of [2] to the hypergraph setting.

Let us now propose a test statistic based on "long loose cycles" that can successfully distinguish $H_0$ and $H_1$ when $\kappa > 1$. Let $\xi_n$ be a positive integer sequence diverging along with $n$. Let $X_{\xi_n}$ be the number of loose cycles, each consisting of exactly $\xi_n$ edges. Define

$$\mu_{n0} = \frac{\lambda_m^{\xi_n}}{2\xi_n}, \qquad \mu_{n1} = \mu_{n0} + \frac{k-1}{2\xi_n}\left[\frac{a-b}{k^{m-1}(m-2)!}\right]^{\xi_n},$$

where $\lambda_m = \frac{a + (k^{m-1}-1)b}{k^{m-1}(m-2)!}$ for any $m \ge 2$. Note that when $m = 2$, $\lambda_m = \frac{a + (k-1)b}{k}$ is the average degree [17]. Let $P_{H_1}$ denote the probability measure induced by $A$ under $H_1$. We have the following theorem about the asymptotic property of $X_{\xi_n}$.

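Theorem 2.3. Suppose $\kappa > 1$ and $1 \ll \xi_n \le \delta_0 \log_{\lambda_m}\log_{\gamma} n$, where $\gamma > 1$ and $0 < \delta_0 < 2$ are constants. Then, under $H_l$ for $l = 0, 1$, $\frac{X_{\xi_n} - \mu_{nl}}{\sqrt{\mu_{nl}}} \xrightarrow{d} N(0, 1)$ as $n \to \infty$. Furthermore, for any constant $C > 0$, $P_{H_1}\left(\left|\frac{X_{\xi_n} - \mu_{n0}}{\sqrt{\mu_{n0}}}\right| > C\right) \to 1$ as $n \to \infty$.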


The proof is based on the asymptotic normality theory developed by [36]. According to Theorem 2.3, we propose the following test statistic

$$T_{\xi_n} = \frac{X_{\xi_n} - \mu_{n0}}{\sqrt{\mu_{n0}}}.$$

We remark that computation of $T_{\xi_n}$ is typically in super-polynomial time since it requires finding $X_{\xi_n}$, which has complexity $n^{O(\xi_n)}$. By Theorem 2.3, $T_{\xi_n} \xrightarrow{d} N(0, 1)$ under $H_0$. Hence, we construct the following testing rule at significance level $\alpha \in (0, 1)$:

reject $H_0$ if and only if $|T_{\xi_n}| > z_{\alpha/2}$,

where $z_{\alpha/2}$ is the $(1 - \alpha/2)$-quantile of $N(0, 1)$. It follows by Theorem 2.3 that $P_{H_1}(|T_{\xi_n}| > z_{\alpha/2}) \to 1$, i.e., the power of $T_{\xi_n}$ approaches one when $\kappa > 1$.
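The testing rule can be sketched in a few lines once the loose-cycle count is available. The code below is a hedged illustration, not the authors' implementation: it assumes the count $X_{\xi_n}$ has already been computed by some other routine (the expensive step), and the names `lambda_m`, `choose_xi` and `loose_cycle_test` are hypothetical. `choose_xi` takes the floor of the upper bound in Theorem 2.3; its default values of $\delta_0$ and $\gamma$ are illustrative assumptions.

```python
from math import factorial, floor, log, sqrt
from statistics import NormalDist


def lambda_m(a, b, m, k):
    """lambda_m = (a + (k^(m-1) - 1) b) / (k^(m-1) (m-2)!)."""
    K = k ** (m - 1)
    return (a + (K - 1) * b) / (K * factorial(m - 2))


def choose_xi(n, lam, delta0=1.99, gamma=1.01):
    """Floor of delta0 * log_lam(log_gamma(n)), the upper bound in Theorem 2.3."""
    return floor(delta0 * log(log(n, gamma), lam))


def loose_cycle_test(x_count, a, b, m, k, xi, alpha=0.05):
    """Return (T, reject) for the rule: reject H0 iff |T| > z_{alpha/2}.

    x_count is the observed number of loose cycles with exactly xi
    hyperedges; counting them is assumed to be done elsewhere.
    """
    mu0 = lambda_m(a, b, m, k) ** xi / (2 * xi)
    stat = (x_count - mu0) / sqrt(mu0)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return stat, abs(stat) > z


# Illustrative call with a made-up cycle count.
print(loose_cycle_test(x_count=12, a=7, b=1, m=3, k=2, xi=4))
```

The design keeps the statistical rule separate from the cycle counting, since the latter dominates the cost.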

Theorem 2.3 requires $\xi_n \to \infty$ while growing slower than an iterated logarithmic order. This is due to the use of [36], which requires $\xi_n$ to diverge with $\xi_n\lambda_m^{\xi_n} = o(\log n)$. In practice, we suggest choosing $\xi_n = \lfloor \delta_0 \log_{\lambda_m}\log_{\gamma} n \rfloor$ with $\gamma$ close to 1 and $\delta_0$ close to 2. Such $\gamma$ and $\delta_0$ will make $\xi_n$ suitably large so that the test statistic $T_{\xi_n}$ becomes valid. For instance, Table 1 demonstrates the values of $\xi_n$ along with $n$ with $\delta_0 = 1.99$, $\gamma = 1.01$, $\lambda_m = 10$. We can see that, for a moderate range of $n$, the values of $\xi_n$ are sufficiently large to make the test valid. When $\xi_n = l$ is fixed and the exact $\alpha$-level test is needed, we should use the Poisson distribution as the null limiting distribution. In this case, the number of $l$-loose cycles $X_l$ converges in distribution to the Poisson distribution with mean $\mu_0 = \frac{\lambda_m^l}{2l}$ under $H_0$ (this is implied by the proof of Theorem 2.5).

Table 1. Minimal n to achieve a desirable value of ξn.

Desirable ξn    3    4    5     6
Minimal n       2    3    25    29786

It should be mentioned that the calculation of $T_{\xi_n}$ requires known values of $a$ and $b$. When $a$ and $b$ are unknown, motivated by the ordinary graph [49], they can be estimated as follows. Define

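$$\widehat{\lambda}_m = \frac{n^{m-1}|\mathcal{E}|}{(m-2)!\binom{n}{m}}, \qquad \widehat{f} = \left(2\xi_n X_{\xi_n} - \widehat{\lambda}_m^{\xi_n}\right)^{\frac{1}{\xi_n}},$$

where $|\mathcal{E}|$ is the number of observed hyperedges and $X_{\xi_n}$ is the number of loose cycles of length $\xi_n$. Let $\widehat{a}_n = (m-2)!\left[\widehat{\lambda}_m + (k^{m-1}-1)(k-1)^{-\frac{1}{\xi_n}}\widehat{f}\right]$ and $\widehat{b}_n = (m-2)!\left[\widehat{\lambda}_m - (k-1)^{-\frac{1}{\xi_n}}\widehat{f}\right]$. The following theorem says that $\widehat{a}_n$ and $\widehat{b}_n$ are consistent estimators of $a$ and $b$, respectively.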

Theorem 2.4. Suppose $\kappa > 1$ and $\xi_n$ satisfies the condition in Theorem 2.3. Then $\widehat{a}_n \to a$ and $\widehat{b}_n \to b$ in probability.
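The plug-in estimators can be written down directly. The sketch below is illustrative only: the function name `estimate_a_b` is hypothetical, the hyperedge count and loose-cycle count are assumed to be supplied by the caller, and raising an error when $2\xi_n X_{\xi_n} - \widehat{\lambda}_m^{\xi_n}$ is not positive is a choice made here, not something the text specifies.

```python
from math import comb, factorial


def estimate_a_b(n, m, k, num_edges, x_count, xi):
    """Plug-in estimates of (a, b) from the hyperedge count and loose-cycle count."""
    lam_hat = n ** (m - 1) * num_edges / (factorial(m - 2) * comb(n, m))
    inner = 2 * xi * x_count - lam_hat ** xi
    if inner <= 0:
        # Assumption of this sketch: the xi-th root is only taken of a positive number.
        raise ValueError("2 * xi * X - lam_hat ** xi must be positive")
    f_hat = inner ** (1 / xi)
    scaled = (k - 1) ** (-1 / xi) * f_hat
    a_hat = factorial(m - 2) * (lam_hat + (k ** (m - 1) - 1) * scaled)
    b_hat = factorial(m - 2) * (lam_hat - scaled)
    return a_hat, b_hat


# Feeding in the population values for a = 7, b = 1, m = 3, k = 2 recovers (a, b).
n, xi = 1000, 4
expected_edges = comb(n, 3) * (7 + 3 * 1) / (4 * n ** 2)
mu_n1 = 2.5 ** 4 / (2 * xi) + (1 / (2 * xi)) * (6 / 4) ** xi
print(estimate_a_b(n, 3, 2, expected_edges, mu_n1, xi))
```

Replacing the population values by observed counts gives the estimators of Theorem 2.4.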

Another interesting question is to investigate for what values of $\kappa$ a successful test becomes impossible. When $m = k = 2$, [49] showed that no test can successfully distinguish $H_0$ from $H_1$ provided $\kappa < 1$, and a successful test becomes possible provided $\kappa > 1$. It is substantially challenging to obtain such a sharp result when $k$ becomes larger. For instance, in the ordinary graph setting, [53] obtained a (non-sharp) condition in terms of SNR when $k \ge 3$ under which a successful test becomes impossible. In Theorem 2.5 below, we address a similar question in the hypergraph setting. For any integers $m \ge 3$, $k \ge 2$, define

$$\tau_1(m, k) = \binom{m}{2}^{-1}\sum_{i=1}^{\lceil \frac{m}{2} - 1 \rceil}\frac{1}{k^{2i-1}}\binom{m}{i+2} \quad \text{and} \quad \tau_2(m, k) = 1 + \binom{m}{2}^{-1}\sum_{i=1}^{m-2}\frac{1}{k^{2i}}\binom{m}{i+2}.$$

The quantities $\tau_1(m, k)$ and $\tau_2(m, k)$ will jointly characterize a spectrum of $(m, k, \kappa)$ such that a successful test does not exist.

Theorem 2.5. Suppose that $m \ge 3$, $k \ge 2$ are integers satisfying $\tau_1(m, k) \le 1$, $a > b > 0$ are fixed constants and $\alpha = m - 1$. If

$$0 < \kappa < \frac{1}{\tau_2(m, k)(k^2 - 1)}, \tag{5}$$

then the probability measures associated with $H_0$ and $H_1$ are mutually contiguous.
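The quantities in Theorem 2.5 are simple finite sums, so they can be evaluated exactly. The sketch below uses rational arithmetic; the function names `tau1`, `tau2` and `contiguity_threshold` are illustrative, and the code only evaluates the definitions and the right-hand side of (5).

```python
from fractions import Fraction
from math import ceil, comb


def tau1(m, k):
    """tau_1(m, k) = C(m,2)^{-1} * sum_{i=1}^{ceil(m/2 - 1)} C(m, i+2) / k^(2i-1)."""
    upper = ceil(Fraction(m, 2) - 1)
    total = sum(Fraction(comb(m, i + 2), k ** (2 * i - 1)) for i in range(1, upper + 1))
    return total / comb(m, 2)


def tau2(m, k):
    """tau_2(m, k) = 1 + C(m,2)^{-1} * sum_{i=1}^{m-2} C(m, i+2) / k^(2i)."""
    total = sum(Fraction(comb(m, i + 2), k ** (2 * i)) for i in range(1, m - 1))
    return 1 + total / comb(m, 2)


def contiguity_threshold(m, k):
    """Right-hand side of (5): 1 / (tau_2(m, k) (k^2 - 1))."""
    return 1 / (tau2(m, k) * (k * k - 1))


for m in range(3, 7):
    print(m, tau1(m, 2), contiguity_threshold(m, 2))
```

Exact fractions avoid any rounding doubt when checking the condition $\tau_1(m, k) \le 1$ for a given pair $(m, k)$.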

The proof of Theorem 2.5 relies on Janson's contiguity theory [40]. Theorem 2.5 says that when $\tau_1(m, k) \le 1$ and $\kappa$ falls in the range (5), there is no test that can successfully distinguish the hypotheses $H_0$ and $H_1$. It should be emphasized that the condition $\tau_1(m, k) \le 1$ holds for a broad range of pairs $(m, k)$. For instance, such a condition holds for any $k \ge 2$ and $3 \le m \le 6$. To see this, for any $k \ge 2$, $\tau_1(3, k) = \frac{1}{3k} < 1$, $\tau_1(4, k) = \frac{2}{3k} < 1$, $\tau_1(5, k) = \frac{1}{k} + \frac{1}{2k^3} < 1$ and $\tau_1(6, k) = \frac{4}{3k} + \frac{1}{k^3} < 1$. Note that $m \le 6$ covers most of the practical cases (see [31]).

Combining Theorems 2.5 and 2.2, it is still unknown whether $H_0$ and $H_1$ are distinguishable when $\frac{1}{\tau_2(m, k)(k^2 - 1)} \le \kappa \le 1$. Such a result can be further improved for the special case $k = 2$, and we close the gap if in addition $m = 3$, as presented in the following theorem.

Theorem 2.6. For $k = 2$, the following results hold.

1. For any $m \ge 2$, if $0 < \kappa < 2^{2-m}$, then $H_0$ and $H_1$ are indistinguishable. Moreover, for any given constant $\kappa_0$ such that $\kappa_0 > \frac{m(m-1)\log 2}{2^{m-1}-1}$, there exist $a > b > 0$ such that the SNR $\kappa$ for the hypotheses $H_0$ and $H_1$ is equal to $\kappa_0$, and $H_0$ and $H_1$ are distinguishable by the likelihood ratio test.

2. For any $m \ge 2$, if $0 < \kappa < \frac{m(m-1)}{2N_m}$, where $N_m = [3^m + (-1)^m]/4 - 2^{m-1} + 1/2$, then $H_0$ and $H_1$ are indistinguishable.

Specifically, Part 1 indicates that, when SNR is below $2^{2-m}$, $H_0$ and $H_1$ are indistinguishable, while they may be distinguishable when SNR is greater than $\frac{m(m-1)\log 2}{2^{m-1}-1}$. Essentially, Part 1 implies that the derived SNR upper and lower bounds satisfy the following relationship:

$$0 < \sup_{m \ge 1}\frac{1}{m^2}\cdot\frac{\text{SNR upper bound}}{\text{SNR lower bound}} < \infty,$$

and

$$0 < \min_{m \ge 1}\frac{\text{SNR lower bound}}{2^{-m}} \le \max_{m \ge 1}\frac{\text{SNR lower bound}}{2^{-m}} < \infty.$$

Part 2 in Theorem 2.6 provides an SNR interval for $H_1$ and $H_0$ to be indistinguishable. When $m = 3$, $N_3 = 3$ leads to an SNR interval $0 < \kappa < 1$, which is sharp since $\kappa > 1$ implies that $H_0$ and $H_1$ are distinguishable thanks to Theorem 2.2. For $k = 2$ and general $m$, the upper bound $\frac{m(m-1)}{2N_m}$ may be less sharp as $m$ grows. In particular, when $m$ is large, the upper bound is of order $m^2 3^{-m}$, which can actually be improved as shown in Part 1 of Theorem 2.6. Specifically, Part 1 indicates that, when SNR is below $2^{2-m}$, $H_0$ and $H_1$ are indistinguishable, while they may be distinguishable when SNR is greater than $m^2 2^{-m}$ up to a constant in the special case $k = 2$. Note that for $3 \le m \le 8$, $\frac{m(m-1)}{2N_m} > 2^{2-m}$ and for $m \ge 9$, $\frac{m(m-1)}{2N_m} < 2^{2-m}$.
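The comparison between the two indistinguishability bounds of Theorem 2.6 is easy to tabulate. The sketch below evaluates $N_m$, the Part 2 bound $\frac{m(m-1)}{2N_m}$, the Part 1 bound $2^{2-m}$, and the likelihood-ratio threshold $\frac{m(m-1)\log 2}{2^{m-1}-1}$; the function names are illustrative.

```python
from fractions import Fraction
from math import log


def n_m(m):
    """N_m = [3^m + (-1)^m] / 4 - 2^(m-1) + 1/2."""
    return Fraction(3 ** m + (-1) ** m, 4) - 2 ** (m - 1) + Fraction(1, 2)


def part2_bound(m):
    """Upper end of the indistinguishable interval in Part 2."""
    return Fraction(m * (m - 1)) / (2 * n_m(m))


def part1_bound(m):
    """Upper end of the indistinguishable interval in Part 1, i.e. 2^(2-m)."""
    return Fraction(4, 2 ** m)


def lrt_threshold(m):
    """SNR level above which Part 1 guarantees the likelihood ratio test can succeed."""
    return m * (m - 1) * log(2) / (2 ** (m - 1) - 1)


for m in range(3, 11):
    better = "Part 2" if part2_bound(m) > part1_bound(m) else "Part 1"
    print(m, float(part2_bound(m)), float(part1_bound(m)), lrt_threshold(m), better)
```

The printed table shows which part gives the wider indistinguishable interval for each $m$, and for which $m$ the likelihood-ratio threshold drops below one.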

The proof of Theorem 2.6 relies on a truncation technique to show the stochastic boundedness of the likelihood ratio and a delicate derivation of a lower bound for the truncated likelihood ratio. An interesting consequence of Part 1 is that the likelihood ratio test may distinguish $H_0$ and $H_1$ even when $\kappa$ is below 1 (but greater than $\frac{m(m-1)\log 2}{2^{m-1}-1}$). However, the computation of the likelihood ratio is NP-hard. When $\kappa > 1$, the $l$-cycle based test can distinguish $H_0$ and $H_1$ as well (see Theorem 2.3), and is computationally less expensive.

Remark 2.1. We provide more details about why the truncation technique is needed in our setting. The proof of Theorem 2.6 relies on the first moment technique, which requires the analysis of $E_1 Y_n$, where $E_1$ is the expectation taken under $H_1$ and $Y_n = \frac{dP_1}{dP_0}$ is the likelihood ratio of $H_1$ to $H_0$. We find that the expression of $E_1 Y_n$ includes terms like

$$E_{\sigma}\exp\left(\sum_{i_1 < \cdots < i_m}\mathrm{poly}(\sigma_{i_1}, \ldots, \sigma_{i_m})\right) \tag{6}$$

where $\mathrm{poly}(\sigma_{i_1}, \ldots, \sigma_{i_m})$ is an $m$-th-order polynomial of $\sigma_{i_1}, \ldots, \sigma_{i_m} \in \{\pm 1\}$. When $m = 2$, the exponent in (6) is a second-order polynomial which is asymptotically $\chi^2$ by the CLT. And so, (6) is heuristically $E\exp(\text{const} \times \chi^2)$, which is finite. This is why no truncation technique is needed when $m = 2$.

However, when $m = 3$, the above polynomial is third-order, which is asymptotically $Z^3$ where $Z \sim N(0, 1)$. As a result, (6) is heuristically $E\exp(\text{const} \times Z^3)$, which is infinite. This is why we used the truncation technique, i.e., we truncate the likelihood ratio on an event with high probability so that the higher-order polynomials are well controlled, and the truncated likelihood ratio has a finite expectation.

2.3. A Powerful Test for Dense Uniform Hypergraph. In this section, we consider the problem of testing community structure in dense $m$-uniform hypergraphs with $p_n \asymp q_n \gg n^{-m+1}$. Our approach is based on counting the hyperedges, $l$-hypervees, and $l$-hypertriangles in the observed hypergraph. To ensure the success of our test, $l$ needs to be properly selected according to the hyperedge probability of the model. Under such correct selection, we derive asymptotic normality for the test and analyze its power. We also discuss the effect of misspecified $l$ in Remark 2.2. Our method can be viewed as a generalization of [33, 34] from ordinary graph testing. The different features of the hypergraph cycles make our generalization nontrivial.

For convenience, let us denote $p_n = \frac{a_n}{n^{m-1}}$ and $q_n = \frac{b_n}{n^{m-1}}$ with diverging $a_n, b_n$. Therefore, (2) becomes the following hypothesis testing problem:

$$H'_0 : A \sim H_m\left(n, \frac{a_n + (k^{m-1}-1)b_n}{k^{m-1}n^{m-1}}\right) \quad \text{vs.} \quad H'_1 : A \sim H_m^k\left(n, \frac{a_n}{n^{m-1}}, \frac{b_n}{n^{m-1}}\right). \tag{7}$$

We temporarily assume that there exists an integer $1 \le l \le \frac{m}{2}$ such that $n^{l-1} \ll a_n \asymp b_n \ll n^{l-\frac{2}{3}}$. Such a requirement will be relaxed by invoking a sparsification technique. Note that model (7) allows $1 \ll a_n \asymp b_n \ll n^{1/3}$ (with $l = 1$), compared with the spectral algorithm [31] which requires $a_n \gg (\log n)^2$, or $a_n \gg \log n$ in [45].

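We consider the following degree-corrected SBM in [9, 10, 33] which is more general than (1) and generalizes its counterpart in ordinary graphs. Let $W_i$, $i = 1, \ldots, n$, be nonnegative i.i.d. random variables with $E(W_1^2) = 1$ and $\sigma_i$, $i = 1, \ldots, n$, be i.i.d. random variables from the multinomial distribution Mult$(k, 1, 1/k)$. Assume that the $W_i$'s and $\sigma_i$'s are independent. Given the $W_i$'s and $\sigma_i$'s, the $A_{i_1 i_2 \ldots i_m}$'s, with pairwise distinct $i_1, \ldots, i_m$, are conditionally independent satisfying

$$\begin{aligned} \mathbb{P}(A_{i_1 i_2 \ldots i_m} = 1 \mid W, \sigma) &= W_{i_1}\cdots W_{i_m}\,p_{i_1 i_2 \ldots i_m}(\sigma), \\ \mathbb{P}(A_{i_1 i_2 \ldots i_m} = 0 \mid W, \sigma) &= 1 - W_{i_1}\cdots W_{i_m}\,p_{i_1 i_2 \ldots i_m}(\sigma), \end{aligned} \tag{8}$$

where $W = (W_1, \ldots, W_n)$,

$$p_{i_1 i_2 \ldots i_m}(\sigma) = \begin{cases} \frac{a_n}{n^{m-1}}, & \sigma_{i_1} = \cdots = \sigma_{i_m}, \\ \frac{b_n}{n^{m-1}}, & \text{otherwise}. \end{cases}$$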

We call (8) the degree-corrected SBM in the hypergraph setting. The degree-correction weights $W_i$'s can capture the degree of inhomogeneity exhibited in many social networks. When $m = 2$, (8) reduces to the classical degree-corrected SBM for ordinary graphs (see [9, 10, 33]). For ordinary graphs, [33] proposed a test through counting small subgraphs to distinguish the degree-corrected SBM from an Erdős-Rényi model. In what follows, we generalize their results to hypergraphs through counting small sub-hypergraphs, including hyperedges, $l$-hypervees, and $l$-hypertriangles, with definitions given below.

Definition 2.1. An $l$-hypervee consists of two hyperedges with $l$ common vertices. An $l$-hypertriangle is an $l$-cycle consisting of three hyperedges.

For example, in Figure 4, the hyperedge set $\{(v_1, v_2, v_3, v_4), (v_3, v_4, v_5, v_6)\}$ is a 2-hypervee, and $\{(v_1, v_2, v_3, v_4), (v_3, v_4, v_5, v_6), (v_5, v_6, v_1, v_2)\}$ is a 2-hypertriangle.
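Definition 2.1 can be turned into two small membership checks. The sketch below is one reading of the definition, stated as an assumption: a hypertriangle is taken to be three hyperedges that pairwise share exactly $l$ vertices and together cover exactly $3(m - l)$ vertices, which is what an $l$-cycle on a cyclic ordering of $3(m - l)$ vertices gives when $l \le m/2$. The function names are hypothetical.

```python
from itertools import combinations


def is_l_hypervee(e1, e2, l):
    """Two distinct hyperedges of equal size sharing exactly l vertices."""
    s1, s2 = set(e1), set(e2)
    return len(s1) == len(s2) and s1 != s2 and len(s1 & s2) == l


def is_l_hypertriangle(e1, e2, e3, l):
    """Three hyperedges forming an l-cycle (assumed reading, see lead-in).

    Each pair shares exactly l vertices and the union has 3 (m - l)
    vertices, which rules out three hyperedges meeting in a common vertex.
    """
    edges = [set(e1), set(e2), set(e3)]
    m = len(edges[0])
    if any(len(e) != m for e in edges):
        return False
    pairwise_ok = all(len(x & y) == l for x, y in combinations(edges, 2))
    return pairwise_ok and len(edges[0] | edges[1] | edges[2]) == 3 * (m - l)


E1, E2, E3 = (1, 2, 3, 4), (3, 4, 5, 6), (5, 6, 1, 2)
print(is_l_hypervee(E1, E2, 2), is_l_hypertriangle(E1, E2, E3, 2))
```

Here vertex `i` stands for $v_i$ in the example of Figure 4.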

Fig 4: Examples of hypervee (left) and hypertriangle (right) with two common vertices between consecutive hyperedges.

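Consider the following probabilities of hyperedge, hypervee and hypertriangle in $H_m^k\left(n, \frac{a_n}{n^{m-1}}, \frac{b_n}{n^{m-1}}\right)$:

$$\begin{aligned} E &= \mathbb{P}(A_{i_1 i_2 \ldots i_m} = 1), \\ V &= \mathbb{P}(A_{i_1 i_2 \ldots i_m}A_{i_{m-l+1} \ldots i_{2m-l}} = 1), \\ T &= \mathbb{P}(A_{i_1 i_2 \ldots i_m}A_{i_{m-l+1} \ldots i_{2m-l}}A_{i_{2m-2l+1} \ldots i_{3(m-l)} i_1 \ldots i_l} = 1). \end{aligned}$$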

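It follows from direct calculations that

$$\begin{aligned} E &= (EW_1)^m\,\frac{a_n + (k^{m-1}-1)b_n}{n^{m-1}k^{m-1}}, \\ V &= (EW_1)^{2(m-l)}\left(\frac{(a_n - b_n)^2}{n^{2(m-1)}k^{2m-l-1}} + \frac{2(a_n - b_n)b_n}{n^{2(m-1)}k^{m-1}} + \frac{b_n^2}{n^{2(m-1)}}\right), \\ T &= (EW_1)^{3(m-2l)}\left(\frac{(a_n - b_n)^3}{n^{3(m-1)}k^{3(m-l)-1}} + \frac{3(a_n - b_n)^2 b_n}{n^{3(m-1)}k^{2m-l-1}} + \frac{3(a_n - b_n)b_n^2}{n^{3(m-1)}k^{m-1}} + \frac{b_n^3}{n^{3(m-1)}}\right). \end{aligned}$$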


Define $\mathcal{T} = T - \left(\frac{V}{E}\right)^3$. The following result demonstrates a strong relationship between $\mathcal{T}$ and $H'_0$, $H'_1$.

Proposition 2.7. Under $H'_0$, $\mathcal{T} = 0$ and under $H'_1$, $\mathcal{T} \neq 0$.
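The closed forms for $E$, $V$ and $T$ make it possible to check this dichotomy numerically in exact arithmetic. The sketch below is only an illustration at a few parameter values chosen here (they are not from the paper); the names `subgraph_probabilities` and `script_t` are hypothetical, and `ew` stands for $EW_1$.

```python
from fractions import Fraction


def subgraph_probabilities(a, b, n, m, k, l, ew=1):
    """Closed-form E, V, T for hyperedge, l-hypervee and l-hypertriangle."""
    a, b, ew = Fraction(a), Fraction(b), Fraction(ew)
    d = a - b
    s = Fraction(n) ** (m - 1)  # the common factor n^(m-1)
    K = k ** (m - 1)
    E = ew ** m * (a + (K - 1) * b) / (s * K)
    V = ew ** (2 * (m - l)) * (
        d ** 2 / k ** (2 * m - l - 1) + 2 * d * b / K + b ** 2
    ) / s ** 2
    T = ew ** (3 * (m - 2 * l)) * (
        d ** 3 / k ** (3 * (m - l) - 1)
        + 3 * d ** 2 * b / k ** (2 * m - l - 1)
        + 3 * d * b ** 2 / K
        + b ** 3
    ) / s ** 3
    return E, V, T


def script_t(a, b, n, m, k, l, ew=1):
    """The quantity T - (V / E)^3."""
    E, V, T = subgraph_probabilities(a, b, n, m, k, l, ew)
    return T - (V / E) ** 3


print(script_t(5, 5, n=100, m=3, k=2, l=1, ew=Fraction(9, 10)))  # a_n = b_n
print(script_t(8, 2, n=100, m=3, k=2, l=1))                      # a_n > b_n
```

The first call uses $a_n = b_n$ (no community structure) and the second uses $a_n > b_n$.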

Proposition 2.7 says that $H'_0$ holds if and only if $\mathcal{T} = T - \left(\frac{V}{E}\right)^3 = 0$. Hence, it is reasonable to use an empirical version of $\mathcal{T}$, namely, $\widehat{\mathcal{T}}$, as a test statistic for (7).

Prior to constructing $\widehat{\mathcal{T}}$, let us introduce some notation. For convenience, we use $i_1 : i_m$ to represent the ordering $i_1 i_2 \ldots i_m$. Also define $C_{2m-l}(A)$ and $C_{3(m-l)}(A)$ for any adjacency tensor $A$ as follows.

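$$C_{2m-l}(A) = A_{i_1:i_m}A_{i_{m-l+1}:i_{2m-l}} + A_{i_2:i_{m+1}}A_{i_{m-l+2}:i_{2m-l}i_1} + \cdots + A_{i_{2m-l}i_1:i_{m-1}}A_{i_{m-l}:i_{2m-l-1}},$$

$$\begin{aligned} C_{3(m-l)}(A) ={}& A_{i_1:i_m}A_{i_{m-l+1}:i_{2m-l}}A_{i_{2m-2l+1}:i_{3(m-l)}i_1:i_l} + A_{i_2:i_{m+1}}A_{i_{m-l+2}:i_{2m-l+1}}A_{i_{2m-2l+2}:i_{3(m-l)}i_1:i_{l+1}} \\ &+ \cdots + A_{i_{m-l}:i_{2m-l-1}}A_{i_{2(m-l)}:i_{3(m-l)}i_1:i_{l-1}}A_{i_{3(m-l)}i_1:i_{m-1}}. \end{aligned}$$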

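Note that $C_{2m-l}(A)$ is the number of hypervees in the given vertex ordering $i_1 i_2 \ldots i_{2m-l}$, while $C_{3(m-l)}(A)$ counts the number of hypertriangles in the given vertex ordering $i_1 i_2 \ldots i_{3(m-l)}$. Define $\widehat{E}$, $\widehat{V}$, $\widehat{T}$ as the empirical versions of $E$, $V$, $T$:

$$\widehat{E} = \frac{1}{\binom{n}{m}}\sum_{i \in c(m,n)}A_{i_1:i_m}, \qquad \widehat{V} = \frac{1}{\binom{n}{2m-l}}\sum_{i \in c(2m-l,n)}\frac{C_{2m-l}(A)}{2m-l}, \qquad \widehat{T} = \frac{1}{\binom{n}{3(m-l)}}\sum_{i \in c(3(m-l),n)}\frac{C_{3(m-l)}(A)}{m-l}, \tag{9}$$

where, for any positive integers $s, t$, $c(s, t) = \{(i_1, \ldots, i_s) : 1 \le i_1 < \cdots < i_s \le t\}$. Each cyclic sum is divided by its number of terms, $2m - l$ for $C_{2m-l}(A)$ and $m - l$ for $C_{3(m-l)}(A)$. We have the following asymptotic normality result.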

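Theorem 2.8. Suppose $EW_1^4 = O(1)$ and $n^{l-1} \ll a_n \asymp b_n \ll n^{l-\frac{2}{3}}$ for some integer $1 \le l \le \frac{m}{2}$. Moreover, let

$$\delta := \frac{\sqrt{\binom{n}{3(m-l)}(m-l)}}{\sqrt{T}}\left[T - \left(\frac{V}{E}\right)^3\right] \in [0, \infty). \tag{10}$$

Then we have, as $n \to \infty$,

$$\frac{\sqrt{\binom{n}{3(m-l)}(m-l)}\left[\widehat{T} - \left(\frac{\widehat{V}}{\widehat{E}}\right)^3\right]}{\sqrt{\widehat{T}}} - \delta \xrightarrow{d} N(0, 1), \tag{11}$$

$$2\sqrt{\binom{n}{3(m-l)}(m-l)}\left[\sqrt{\widehat{T}} - \left(\frac{\widehat{V}}{\widehat{E}}\right)^{\frac{3}{2}}\right] - \delta \xrightarrow{d} N(0, 1). \tag{12}$$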

When $l = 1$ and $m = 2$, Theorem 2.8 becomes Theorem 2.2 of [33].

Following (11) in Theorem 2.8, we can construct a test statistic for (7) as

$$T_m = \frac{\sqrt{\binom{n}{3(m-l)}(m-l)} \left[ \hat T - \left(\frac{\hat V}{\hat E}\right)^3 \right]}{\sqrt{\hat T}}. \tag{13}$$


In practice, $\hat T$ might be close to zero, which may cause computational instability. An alternative test can be constructed based on (12) as

$$T_m' = 2 \sqrt{\binom{n}{3(m-l)}(m-l)} \left[ \sqrt{\hat T} - \left(\frac{\hat V}{\hat E}\right)^{\frac{3}{2}} \right]. \tag{14}$$
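To make the construction concrete, the following is a minimal sketch of the two statistics in the simplest case $m = 2$, $l = 1$ (an ordinary graph), where hyperedges, hypervees and hypertriangles reduce to edges, two-edge paths and triangles. The function name `graph_test_statistics` and the edge-list input format are assumptions made for illustration; the brute-force loop over vertex triples is only meant for small $n$.

```python
from itertools import combinations
from math import comb, sqrt


def graph_test_statistics(n, edges):
    """Return (T_m, T_m') for m = 2, l = 1 on vertices 0..n-1.

    `edges` is an iterable of vertex pairs. T_m is NaN when no triangle
    is observed, which is the instability the alternative T_m' avoids.
    """
    adj = {frozenset(e) for e in edges}
    if not adj:
        raise ValueError("the graph has no edges")

    def has(u, v):
        return frozenset((u, v)) in adj

    e_hat = len(adj) / comb(n, 2)
    vee_sum = 0.0
    tri_sum = 0.0
    for i, j, k in combinations(range(n), 3):
        x, y, z = has(i, j), has(j, k), has(i, k)
        vee_sum += (x * y + y * z + z * x) / 3  # three rotations of a vee
        tri_sum += x * y * z                     # one triangle per triple
    v_hat = vee_sum / comb(n, 3)
    t_hat = tri_sum / comb(n, 3)

    scale = sqrt(comb(n, 3))  # sqrt(C(n, 3(m-l)) * (m-l)) with m-l = 1
    ratio = v_hat / e_hat
    t_prime = 2 * scale * (sqrt(t_hat) - ratio ** 1.5)
    t_m = scale * (t_hat - ratio ** 3) / sqrt(t_hat) if t_hat > 0 else float("nan")
    return t_m, t_prime


if __name__ == "__main__":
    complete = list(combinations(range(4), 2))
    print(graph_test_statistics(4, complete))
```

On a complete graph all three empirical frequencies equal one, so both statistics vanish; on a graph without triangles only the second statistic remains finite.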

We remark that computation of $T_m$ and $T_m'$ is in polynomial time since the computations of $\hat T$, $\hat V$ and $\hat E$ all have complexity $O(n^{3(m-l)})$. Theorem 2.8 proves asymptotic normality for $T_m$ and $T_m'$ under both $H_0'$ and $H_1'$. Under $H_0'$, i.e., $\delta = 0$, both $T_m$ and $T_m'$ are asymptotically standard normal. Under $H_1'$, both $T_m$ and $T_m'$ are asymptotically normal with mean $\delta > 0$ and unit variance. When $T$ has a large magnitude, both test statistics can be used to construct valid rejection regions.

The following Theorem 2.9 says that the power of our test tends to one if $\delta$ goes to infinity.

Theorem 2.9. Suppose $E W_1^4 = O(1)$ and $n^{l-1} \ll a_n \asymp b_n \ll n^{l-\frac{2}{3}}$ for some integer $1 \le l \le \frac{m}{2}$. Under $H_1'$, as $n, \delta \to \infty$, $P(|T_m| > z_{\alpha/2}) \to 1$. The same result holds for $T_m'$.

Remark 2.2. When there are multiple possible choices for $l$, Theorem 2.8 and Theorem 2.9 may fail if $l$ is misspecified. For example, suppose $m = 4$ and the “correct” value is $l_0 = 2$ (corresponding to the true hyperedge probability), but we count 1-cycles. Then under $H_0$, the test statistic in (11) or (12) is of order $O_p(n^{\frac{3}{2}})$, i.e., the limiting distribution does not exist. In contrast, if the correct value is $l_0 = 1$ but we count 2-cycles, then the test statistic in (11) or (12) has the same limiting distribution (if it exists) under $H_0$ and $H_1$, i.e., the power of the test does not approach one. In practice, we recommend using the hyperedge proportion to get a rough estimate of $l$.

Theorem 2.8 and Theorem 2.9 work for relatively sparse hypergraphs. For denser hypergraphs, we propose a sparsification procedure so that Theorem 2.8 and Theorem 2.9 are valid. For any index $i_1 < i_2 < \cdots < i_m$, generate $\varepsilon_{i_1 i_2 \dots i_m} \overset{iid}{\sim} \text{Bernoulli}(r_n)$. Consider a new hypergraph with adjacency tensor $\tilde A$ defined by $\tilde A_{i_1 i_2 \dots i_m} = \varepsilon_{i_1 i_2 \dots i_m} A_{i_1 i_2 \dots i_m}$, where $A_{i_1 i_2 \dots i_m}$ are the elements of the original observed adjacency tensor. Under $H_0'$, we have

$$E[\tilde A_{i_1 i_2 \dots i_m}] = (E W_1)^m \, \frac{(r_n a_n) + (k^{m-1} - 1)(r_n b_n)}{k^{m-1} n^{m-1}}.$$

Set $\tilde a_n = r_n a_n$ and $\tilde b_n = r_n b_n$. For dense hypergraphs, we could replace $A$, $a_n$ and $b_n$ in (7) by $\tilde A$, $\tilde a_n$ and $\tilde b_n$, respectively. Note that the hypergraphs $A$ and $\tilde A$ have the same global community structure. A properly selected $r_n$ will make Theorem 2.8 and Theorem 2.9 valid.
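The sparsification step can be sketched as follows. Since the Bernoulli mask only matters where the original tensor entry equals one, it suffices to thin the list of observed hyperedges. The function name `sparsify`, the list-of-tuples representation and the `seed` argument are illustrative assumptions.

```python
import random


def sparsify(hyperedges, r_n, seed=None):
    """Keep each observed hyperedge independently with probability r_n.

    This mirrors multiplying every entry of the adjacency tensor by an
    independent Bernoulli(r_n) variable; absent hyperedges stay absent.
    """
    if not 0.0 <= r_n <= 1.0:
        raise ValueError("r_n must lie in [0, 1]")
    rng = random.Random(seed)
    return [e for e in hyperedges if rng.random() < r_n]


if __name__ == "__main__":
    observed = [(0, 1, 2), (1, 2, 3), (0, 3, 4), (2, 4, 5)]
    print(sparsify(observed, 0.5, seed=1))
```

The retained hyperedges form a random subset of the observed ones, so the thinned hypergraph can be passed to the same counting routine as the original.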

Corollary 2.10. Suppose $E W_1^4 = O(1)$ and $1 \ll a_n \asymp b_n \le n^{m-1}$. If $r_n = o(1)$ and $n^{l-1} \ll r_n a_n \asymp r_n b_n \ll n^{l-\frac{2}{3}}$ for some integer $1 \le l \le \frac{m}{2}$, then the results of Theorems 2.8 and 2.9 based on $l$-cycles continue to hold for the sparsified hypergraph $\tilde A$.

Note that Corollary 2.10 is valid for a broad range of hyperedge probabilities $\frac{1}{n^{m-1}} \ll p_n \asymp q_n \le 1$. Since $H_0$ and $H_1$ are indistinguishable when $p_n \asymp q_n \ll \frac{1}{n^{m-1}}$ (see Section 2.1), it covers all density regimes of interest. One just needs to select the sparsification factor $r_n$ to ensure that $r_n a_n$ and $r_n b_n$ fall into the range $n^{l-1} \ll r_n a_n \asymp r_n b_n \ll n^{l-\frac{2}{3}}$, provided that one wants to use $l$-cycles to construct the test. The selection of $l$ has been discussed in Remark 2.2.

Remark 2.3. In some literature, the degree correction variables $W_i$ in (8) are assumed to be deterministic [43, 35, 42]. In this case, Theorem 2.8 still holds under mild conditions and the proof goes through with slight modifications. To illustrate this, we consider $m = 3$. Let $W = (W_1, \dots, W_n)$ be a given and deterministic degree correction vector and denote $\|W\|_t^t = \sum_{i=1}^n W_i^t$ for positive integer $t$. Let $\hat T_1$, $\hat E_1$, $\hat V_1$ be defined as

$$\hat T_1 = \frac{\sum_{i_1, \dots, i_6 : \text{distinct}} A_{i_1 i_2 i_3} A_{i_3 i_4 i_5} A_{i_5 i_6 i_1}}{n^6}, \quad \hat V_1 = \frac{\sum_{i_1, \dots, i_5 : \text{distinct}} A_{i_1 i_2 i_3} A_{i_3 i_4 i_5}}{n^5}, \quad \hat E_1 = \frac{\sum_{i_1, i_2, i_3 : \text{distinct}} A_{i_1 i_2 i_3}}{n^3}.$$

Then we have the following result.

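Proposition 2.11. Suppose $1 \ll \|W\|_t^t = O(\|W\|_1)$ for $2 \le t \le 12$, $\|W\|_1 \asymp \|W\|_2^2 = O(n)$, $p_0 \|W\|_1^2 \gg 1$ and $p_0^2 \|W\|_1^3 = o(1)$. Then under $H_0'$ we have

$$T_3 = \sqrt{\frac{n^6}{\hat T_1}} \left[ \hat T_1 - \left(\frac{\hat V_1}{\hat E_1}\right)^3 \right] \overset{d}{\to} N(0,1). \tag{15}$$

Further, if $1 \ll a_n \asymp b_n \ll n^{\frac{1}{3}}$, then the power of the test $T_3$ goes to 1 as $\delta_1 \to \infty$, where

$$\delta_1 := \sqrt{\frac{n^6}{T_1}} \left[ T_1 - \left(\frac{V_1}{E_1}\right)^3 \right], \quad E_1 = \frac{a_n + (k^2 - 1) b_n}{n^2 k^2} \cdot \frac{\|W\|_1^3}{n^3},$$

$$V_1 = \left( \frac{(a_n - b_n)^2}{n^4 k^4} + \frac{2 (a_n - b_n) b_n}{n^4 k^2} + \frac{b_n^2}{n^4} \right) \frac{\|W\|_2^2 \|W\|_1^4}{n^5},$$

$$T_1 = \left( \frac{(a_n - b_n)^3}{n^6 k^5} + \frac{3 (a_n - b_n)^2 b_n}{n^6 k^4} + \frac{3 (a_n - b_n) b_n^2}{n^6 k^2} + \frac{b_n^3}{n^6} \right) \frac{\|W\|_2^6 \|W\|_1^3}{n^6}.$$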

The proof of Proposition 2.11 is given in the supplement. In Proposition 2.11, the conditions $p_0 \|W\|_1^2 \gg 1$ and $p_0^2 \|W\|_1^3 = o(1)$ require the hypergraph to be moderately sparse. At first glance, the conditions $1 \ll \|W\|_t^t = O(\|W\|_1) = O(n)$ for $2 \le t \le 12$ and $\|W\|_1 \asymp \|W\|_2^2$ seem very restrictive. However, these conditions are easy to satisfy and can accommodate severe degree heterogeneity. For example, when $W_i = \frac{i}{n}$ for $i = 1, 2, \dots, n$, we have $\|W\|_t^t = \frac{n}{t+1}(1 + o(1))$ for any positive integer $t$. In this case, the average degrees $d_1$ and $d_n$ for vertices 1 and $n$ are

$$d_1 = \sum_{1 < j < k} W_1 W_j W_k p_0 = p_0 W_1 \cdot 0.5 \left[ \left( \sum_{j=2}^{n} W_j \right)^2 - \sum_{j=2}^{n} W_j^2 \right] = \frac{n p_0}{8} (1 + o(1)),$$

$$d_n = \sum_{j < k < n} W_n W_j W_k p_0 = p_0 W_n \cdot 0.5 \left[ \left( \sum_{j=1}^{n-1} W_j \right)^2 - \sum_{j=1}^{n-1} W_j^2 \right] = \frac{n^2 p_0}{8} (1 + o(1)).$$

Clearly, $d_n \asymp n d_1$ and hence the hypergraph is highly heterogeneous. Another example is to take $W_i \in [c_1, c_2]$ with positive constants $c_1 < c_2$, which yields a hypergraph with less heterogeneous degrees.
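The heterogeneity example with $W_i = i/n$ can be checked numerically. The sketch below evaluates the two average degrees through the elementary identity $\sum_{j<k} x_j x_k = \frac{1}{2}\left[(\sum_j x_j)^2 - \sum_j x_j^2\right]$ and compares them with the leading terms $n p_0 / 8$ and $n^2 p_0 / 8$. The function name `average_degrees` and the chosen values of $n$ and $p_0$ are assumptions for illustration.

```python
def average_degrees(n, p0):
    """Average degrees of vertex 1 and vertex n when W_i = i / n (m = 3)."""
    w = [i / n for i in range(1, n + 1)]

    def pair_sum(values):
        # Sum of x_j * x_k over all pairs j < k.
        s = sum(values)
        q = sum(v * v for v in values)
        return 0.5 * (s * s - q)

    d_first = p0 * w[0] * pair_sum(w[1:])    # vertex 1, weight 1/n
    d_last = p0 * w[-1] * pair_sum(w[:-1])   # vertex n, weight 1
    return d_first, d_last


if __name__ == "__main__":
    n, p0 = 400, 1e-4
    d1, dn = average_degrees(n, p0)
    print(d1 / (n * p0 / 8), dn / (n ** 2 * p0 / 8), dn / d1)
```

For moderately large $n$ both normalized values are close to one and the ratio of the two degrees is close to $n$, in line with the heterogeneity claim.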

3. Extensions to Non-uniform Hypergraph. A non-uniform hypergraph can be viewed as a superposition of a collection of uniform hypergraphs, as introduced by [31], in which the authors proposed a spectral algorithm for community detection. In this section, we study the problem of testing community structure over a non-uniform hypergraph.


Let $\mathcal{H}^k(n, M)$ be a non-uniform hypergraph over $n$ vertices, with the vertices uniformly and independently partitioned into $k$ communities, where $M \ge 2$ is an integer representing the maximum length of the hyperedges. Following [31], we can write $\mathcal{H}^k(n, M) = \cup_{m=2}^{M} \mathcal{H}^k_m\left(n, \frac{a_{mn}}{n^{m-1}}, \frac{b_{mn}}{n^{m-1}}\right)$, where the $\mathcal{H}^k_m\left(n, \frac{a_{mn}}{n^{m-1}}, \frac{b_{mn}}{n^{m-1}}\right)$ are independent uniform hypergraphs with degree-corrected vertices introduced in Section 2.3. Correspondingly, define $\mathcal{H}(n, M) = \cup_{m=2}^{M} \mathcal{H}_m\left(n, \frac{a_{mn} + (k^{m-1} - 1) b_{mn}}{k^{m-1} n^{m-1}}\right)$ as a superposition of Erdos-Renyi models. Clearly, each Erdos-Renyi model in $\mathcal{H}(n, M)$ has the same average degree as its counterpart in $\mathcal{H}^k(n, M)$, and $\mathcal{H}(n, M)$ has no community structure. Let $A_m$ denote the adjacency tensor of the $m$-uniform sub-hypergraph, and let $A = \{A_m, m = 2, \dots, M\}$ be the collection of the $A_m$'s. We are interested in the following hypotheses:

$$H_0'' : A \sim \mathcal{H}(n, M) \quad \text{vs.} \quad H_1'' : A \sim \mathcal{H}^k(n, M). \tag{16}$$

3.1. Non-uniform homogeneous hypergraphs with bounded degree. To enhance readability, we assume $M = 3$, i.e., $\mathcal{H} = \mathcal{H}_2 \cup \mathcal{H}_3$, and the hypergraphs are homogeneous without degree correction. The results are extendable to arbitrary $M$ with more tedious arguments. The following Corollary 3.1, extending Theorem 2.1, shows that it is impossible to distinguish $H_0''$ and $H_1''$ in the extremely sparse regime. The proof is essentially the same as that of Theorem 2.1, and also relies on the conditional independence of $\mathcal{H}_2$ and $\mathcal{H}_3$.

Corollary 3.1. If $a_{mn} \asymp b_{mn} = o(1)$, then $H_0''$ and $H_1''$ are mutually contiguous.

The following Corollary 3.2 extends the bounded degree results from Section 2.2. Let $a_{mn} = a_m$, $b_{mn} = b_m$ be positive constants, and $\kappa_m = \frac{(a_m - b_m)^2}{k^{m-1} [a_m + (k^{m-1} - 1) b_m]}$.

Corollary 3.2. If $\kappa_2 > 1$ or $\kappa_3 > 1$, then $H_0''$ and $H_1''$ are asymptotically orthogonal. If

$$\left[ \kappa_2 + \frac{\kappa_3}{3} \left( 1 + \frac{1}{3 k^2} \right) \right] (k^2 - 1) < 1,$$

then $H_0''$ and $H_1''$ are mutually contiguous. Furthermore, the results of Theorems 2.3 and 2.4 still hold with the corresponding quantities therein replaced by those in $\mathcal{H}_m$.

3.2. Non-uniform hypergraph with growing degree. Assume that, for $2 \le m \le M$, $a_{mn}$, $b_{mn}$ are proxies of the hyperedge densities satisfying $n^{l_m - 1} \ll a_{mn} \asymp b_{mn} \ll n^{l_m - \frac{2}{3}}$, for some integer $1 \le l_m \le \frac{m}{2}$.

For any $2 \le m \le M$, let $T_m$ and $\delta_m$ be defined as in (13) and (10), respectively, based on the $m$-uniform sub-hypergraph. We define a test statistic for (16) as

$$T = \sum_{m=2}^{M} c_m T_m, \tag{17}$$

where $c_m$ are constants with normalization $\sum_{m=2}^{M} c_m^2 = 1$. As a simple consequence of Theorems 2.8 and 2.9, we get the asymptotic distribution of $T$ as follows.

Corollary 3.3. Suppose that the degree-correction weights satisfy the same conditions as in Theorem 2.8, and for any $2 \le m \le M$, $n^{l_m - 1} \ll a_{mn} \asymp b_{mn} \ll n^{l_m - \frac{2}{3}}$, for some integer $1 \le l_m \le \frac{m}{2}$. Then, as $n \to \infty$, $T - \sum_{m=2}^{M} c_m \delta_m \overset{d}{\to} N(0,1)$. Furthermore, for any constant $C > 0$, under $H_1''$, $P(|T| > C) \to 1$, provided that $\sum_{m=2}^{M} c_m \delta_m \to \infty$ as $n \to \infty$.


Under $H_0''$, i.e., each $m$-uniform sub-hypergraph has no community structure, we have $\delta_m = 0$ by Proposition 2.7. Corollary 3.3 then says that $T$ is asymptotically standard normal. Hence, an asymptotic testing rule at significance level $\alpha$ would be

reject $H_0''$ if and only if $|T| > z_{\alpha/2}$.

The quantity $\sum_{m=2}^{M} c_m \delta_m$ may represent the degree of separation between $H_0''$ and $H_1''$. By Corollary 3.3, under $H_1''$, the test will achieve high power when $\sum_{m=2}^{M} c_m \delta_m$ is large.

Remark 3.1. According to Corollary 3.3, to give $T$ the largest power, we need to maximize the value of $\sum_{m=2}^{M} c_m \delta_m$ subject to $\sum_{m=2}^{M} c_m^2 = 1$. The maximizer is $c_m^* = \frac{\delta_m}{\sqrt{\sum_{m=2}^{M} \delta_m^2}}$, $m = 2, 3, \dots, M$. The corresponding test $T^* = \sum_{m=2}^{M} c_m^* T_m$ is asymptotically the most powerful among tests of the form (17). In particular, $T^*$ is more powerful than $T_m$ for a single $m$. This can be explained by the additional hyperedge information involved in the test. This intuition is further confirmed by numerical studies in Section 4. Note that $T_2$ ($m = 2$) is the classic test proposed by [33] in ordinary graph settings.
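The combination rule and the rejection rule can be sketched in a few lines. The helper names `combine`, `optimal_weights` and `reject` are hypothetical; the weights are rescaled inside `combine` so that their squares sum to one, and the normal quantile comes from the standard library.

```python
from math import sqrt
from statistics import NormalDist


def combine(stats, weights=None):
    """Weighted sum of per-order statistics with weights rescaled to unit norm."""
    if weights is None:
        weights = [1.0] * len(stats)  # equal weights by default
    norm = sqrt(sum(c * c for c in weights))
    return sum(c * t for c, t in zip(weights, stats)) / norm


def optimal_weights(deltas):
    """Weights proportional to the separations delta_m, as in Remark 3.1."""
    norm = sqrt(sum(d * d for d in deltas))
    return [d / norm for d in deltas]


def reject(stat, alpha=0.05):
    """Two-sided rule: reject when |stat| exceeds the upper alpha/2 normal quantile."""
    return abs(stat) > NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    z = combine([8.360, 1.451])  # values reported for the global network in Section 4.2
    print(round(z, 3), reject(z))
```

With equal weights and the two statistics reported for the global coauthorship network (8.360 and 1.451), the combined value is about 6.94, close to the 6.938 listed in Table 4; the small difference comes from rounding of the inputs. In practice the $\delta_m$ are unknown, so the optimal weights are mainly of theoretical interest.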

4. Numerical Studies. In this section, we provide a simulation study in Section 4.1 and a real data analysis in Section 4.2 to assess the finite sample performance of our tests.

4.1. Simulation. We generated a non-uniform hypergraph $\mathcal{H}^2(n, 3) = \mathcal{H}^2_2(n, a_2, b_2) \cup \mathcal{H}^2_3(n, a_3, b_3)$, with $n = 100$ under various choices of $(a_m, b_m)$, $m = 2, 3$. In each scenario, we calculated $Z_2 := T_2'$ and $Z_3 := T_3'$ by (14). Note that $Z_2 = T_2'$ is the test for ordinary graphs considered in [33]. For testing the community structure on the non-uniform hypergraph, we calculated the statistic $Z := T = (T_2' + T_3')/\sqrt{2}$. In addition, we considered a strategy similar to [8] by first reducing the hypergraph to a weighted graph and then applying a test designed for weighted graphs in [65]. Specifically, given an $m$-uniform hypergraph with hyperedges $e_1, e_2, \dots, e_M$, we first transformed it to a weighted graph with an adjacency matrix $A = [A_{ij}]_{1 \le i, j \le n}$ in which $A_{ij} = \sum_{k=1}^{M} I(\{i, j\} \subset e_k)$ for $i \neq j$ and $A_{ij} = 0$ for $i = j$. In other words, $A_{ij}$ is the total number of hyperedges containing vertices $i$ and $j$. Next, we generated a new weighted graph with an adjacency matrix $\tilde A = [\tilde A_{ij}]_{1 \le i, j \le n}$ by zeroing out row $s$ and column $s$ of $A$ if $\sum_{j=1}^{n} A_{sj} > c_{thr} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}$. Here $c_{thr} > 0$ is a prespecified threshold constant (see footnote 1). We then applied the test method proposed by [65] to the weighted graph $\tilde A$, where the test statistic is denoted by $Z_T$.
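The graph transformation described above can be sketched as follows. The function name `hypergraph_to_weighted_graph`, the list-of-lists matrix representation and 0-based vertex labels are assumptions for illustration; the weighted-graph test of [65] itself is not reproduced here.

```python
from itertools import combinations


def hypergraph_to_weighted_graph(n, hyperedges, c_thr):
    """Return (A, A_trimmed) for a hypergraph on vertices 0..n-1.

    A[i][j] counts the hyperedges containing both i and j. A_trimmed zeroes
    row s and column s whenever the degree of s exceeds c_thr times the
    average degree (1/n) * sum_{i,j} A[i][j].
    """
    A = [[0] * n for _ in range(n)]
    for e in hyperedges:
        for i, j in combinations(sorted(set(e)), 2):
            A[i][j] += 1
            A[j][i] += 1
    average_degree = sum(map(sum, A)) / n
    heavy = [s for s in range(n) if sum(A[s]) > c_thr * average_degree]
    trimmed = [row[:] for row in A]
    for s in heavy:
        for j in range(n):
            trimmed[s][j] = 0
            trimmed[j][s] = 0
    return A, trimmed


if __name__ == "__main__":
    edges = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]
    A, trimmed = hypergraph_to_weighted_graph(4, edges, c_thr=1.2)
    print(A)
    print(trimmed)
```

Because two hyperedges that share a vertex pair contribute to the same matrix entry, the entries of the resulting matrix are dependent, which is relevant to the behaviour of $Z_T$ reported in the simulations.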

We examined the size and power of each test by calculating the rejection proportions based on 500 independent replications at the 5% significance level. Let $\delta_m$ denote the quantity defined in (10), which is the main factor that affects power.

Our study consists of two parts. In the first part, we investigated the power change of the four testing procedures when $\delta_2 = \delta_3 = \delta$ increases from 0 to 10. Specifically, we set $b_2 = 10 b_3$, where $b_3 = 0.01, 0.005, 0.001$ represents the dense, moderately dense and sparse network, respectively; $a_m = r_m b_m$ for $m = 2, 3$ with the values of $r_m$ summarized in Table 2. It can be checked that such a choice of $(a_m, b_m)$ indeed makes $\delta$ range from 0 to 10. We also considered both balanced and imbalanced networks, with the probability ($\varsigma$) of the smaller community taking the value 0.5 and 0.3, respectively.
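A data-generating step of this kind can be sketched as below for a two-community $m$-uniform hypergraph, where `a` and `b` are used directly as within-community and between-community hyperedge probabilities. The function name `sample_uniform_sbm`, the argument `small_frac` (playing the role of $\varsigma$) and the brute-force enumeration of all vertex subsets are assumptions; this is not the authors' simulation code.

```python
import random
from itertools import combinations


def sample_uniform_sbm(n, m, a, b, small_frac=0.5, seed=None):
    """Sample labels and hyperedges of a two-community m-uniform hypergraph.

    Each vertex joins community 0 with probability small_frac. A set of m
    vertices becomes a hyperedge with probability a if all its vertices
    share a community, and with probability b otherwise.
    """
    rng = random.Random(seed)
    labels = [0 if rng.random() < small_frac else 1 for _ in range(n)]
    hyperedges = []
    for e in combinations(range(n), m):
        same_community = len({labels[v] for v in e}) == 1
        if rng.random() < (a if same_community else b):
            hyperedges.append(e)
    return labels, hyperedges


if __name__ == "__main__":
    labels, edges3 = sample_uniform_sbm(30, 3, a=0.03, b=0.01, small_frac=0.3, seed=0)
    print(len(edges3))
```

Setting `a == b` gives a draw from the null model, so the same routine can produce replications for both size and power calculations.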

The rejection proportions under various settings are summarized in Figures 5 through 7. Several interesting findings should be discussed. First, the rejection proportions of all test statistics except $Z_T$ (based on the graph transformation) at $\delta = 0$ are close to the nominal level 0.05 under different choices of $\varsigma$ and $b_3$, which demonstrates that these three test statistics are valid. We observe that the size (corresponding to $\delta = 0$) and power (corresponding to $\delta > 0$) of the graph-transformation test are almost 100% regardless of the choice of $b_3$, which implies that the testing procedure $Z_T$ is asymptotically invalid. Second, as expected, the rejection proportions of all tests increase with $\delta$, regardless of the choices of $b_3$ and $\varsigma$. Third, in most cases, the testing procedure based on the non-uniform hypergraph ($Z$) has larger power than the one based only on the 3-uniform hypergraph ($Z_3$) or the ordinary graph ($Z_2$). This agrees with our theoretical finding since more information has been used in the combined test; see Remark 3.1 for a detailed explanation.

Footnote 1: According to the proof of Lemmas 5 and 7 in [8], cthr is a large enough constant such that log(1 + m2 x) − m2 ≥ (1/2) log(x) for all x ≥ cthr. In the simulation studies, we chose cthr to be the largest root of log(1 + m2 x) − m2 = (1/2) log(x).

Remark 4.1. The failure of the graph-transformation-based testing procedure $Z_T$ is possibly due to the dependence between the edges of the transformed graph. Given the number of communities $k$, many existing community detection algorithms do not require the independence assumption about the edges. However, this assumption is important to derive the limit distributions of the corresponding statistics in the hypothesis testing problems about $k$ (e.g., see [18, 33, 44, 65]). The graph-transformation-based method might still be promising for testing hypergraphs, but new asymptotic theory based on dependent edges seems necessary.

Table 2
Choices of r2, r3, b3 for δ to range from 1 to 10.

b3      δ    0   1     2     3      4      5      6      7      8      9      10
0.01    r3   1   2.26  2.65  2.93   3.17   3.38   3.58   3.75   3.91   4.06   4.21
        r2   1   2.07  2.43  2.71   2.95   3.16   3.35   3.53   3.71   3.87   4.02
0.005   r3   1   2.89  3.51  3.98   4.39   4.75   5.08   5.38   5.67   5.94   6.20
        r2   1   2.66  3.29  3.79   4.22   4.61   4.97   5.31   5.64   5.94   6.24
0.001   r3   1   6.50  8.83  10.73  12.41  13.95  15.39  16.76  18.03  19.28  20.48
        r2   1   6.57  9.31  11.59  13.64  15.51  17.26  18.92  20.51  22.00  23.46

[Figure: rejection proportion (vertical axis, 0 to 1) against δ (horizontal axis, 0 to 10) for Z, Z2, Z3 and ZT; panels ς = 0.3 and ς = 0.5.]

Fig 5: Rejection proportions in dense case with b3 = 0.1 × b2 = 0.01.


[Figure: rejection proportion (vertical axis, 0 to 1) against δ (horizontal axis, 0 to 10) for Z, Z2, Z3 and ZT; panels ς = 0.3 and ς = 0.5.]

Fig 6: Rejection proportions in moderately dense case with b3 = 0.1 × b2 = 0.005.

[Figure: rejection proportion (vertical axis, 0 to 1) against δ (horizontal axis, 0 to 10) for Z, Z2, Z3 and ZT; panels ς = 0.3 and ς = 0.5.]

Fig 7: Rejection proportions in sparse case with b3 = 0.1 × b2 = 0.001.

In the second part, we investigated how the powers of the tests change along with the hyperedge probability. For convenience, we report the results based on the log-scale of $b_3$, which ranges from −8 to −6. We chose $\delta = 1$ and 3, $\varsigma = 0.3$ and 0.5, $b_2 = 10 b_3$. Similar to the first part, we set $a_m = r_m b_m$ with $m = 2$ and 3 to guarantee that $\log b_3$ indeed ranges from −8 to −6. The values of $r_m$ are summarized in Table 3. Figures 8 and 9 report the rejection proportions for $\delta = 1$ and 3 under various hyperedge densities. We note that the rejection proportion of $Z_T$ is always 100% under all settings. Moreover, $Z$ is more powerful than $Z_2$ and $Z_3$ in the cases $\varsigma = 0.3, 0.5$ and $\delta = 3$. For the remaining scenarios, all procedures have satisfactory performance.

Table 3
Choices of r2, r3, and δ for log(b3) to range from −8 to −6.

δ    log(b3)   -8      -7      -6
1    r3        14.18   6.88    3.93
     r2        15.78   7.03    3.72
3    r3        26.37   11.51   5.82
     r2        30.68   12.54   5.83


[Figure: rejection proportion (vertical axis, 0 to 1) against log(b3) (horizontal axis, −8 to −6) for Z, Z2, Z3 and ZT; panels ς = 0.3 and ς = 0.5.]

Fig 8: Rejection proportions when δ = 1 and b2 = 10b3.

[Figure: rejection proportion (vertical axis, 0 to 1) against log(b3) (horizontal axis, −8 to −6) for Z, Z2, Z3 and ZT; panels ς = 0.3 and ς = 0.5.]

Fig 9: Rejection proportions when δ = 3 and b2 = 10b3.

4.2. Analysis of Coauthorship Data. In this section, we applied our testing procedure to study the community structure of a coauthorship network dataset, available at https://static.aminer.org/lab-datasets/soinf/. The dataset contains a 2-author ordinary graph and a 3-author hypergraph. After removing vertices with degrees less than ten or larger than 20, we obtained a hypergraph (hereinafter referred to as the global network) with 58 nodes, 110 edges, and 40 hyperedges. The vertex-removal process aims to obtain a suitably sparse network so that our testing procedure is applicable. We examined our procedures based on the global network and subnetworks. To do this, we first performed the spectral algorithm proposed by [31] to partition the global network into four subnetworks which consist of 7, 13, 14, 24 vertices, respectively (see Figure 10). In Figure 11, we plotted the incidence matrices of the 2- and 3-uniform hypergraphs, denoted 2-UH and 3-UH, respectively, as well as their superposition (Non-UH). The black dots represent vertices within the same communities. The red crosses represent vertices between different communities. An edge or hyperedge is drawn between the black dots or red crosses that are vertically aligned. It is observed that the between-community (hyper)edges are sparser than the within-community ones, indicating the validity of the partitioning.

We conducted testing procedures based on $Z_2$, $Z_3$, and $Z$ at significance level 0.05 (similar to Section 4.1) to both the global network and the subnetworks. The values of the test statistics are summarized in Table 4. Observe that $Z_2$ and $Z$ yield very large test values for the global network, indicating strong rejection of the null hypothesis. For subnetwork testing, $Z_2$ rejects the null hypothesis for subnetwork 3, while $Z_3$ and $Z$ do not reject the null hypotheses for any subnetworks. This demonstrates that the community detection results are reasonable in general, and the subnetworks may no longer have finer community structures.

[Figure: network diagrams of the Global Network and of SubNetwork 1, SubNetwork 2, SubNetwork 3 and SubNetwork 4, with vertices labeled by index.]

Fig 10: Global network and four subnetworks based on coauthorship data.

[Figure: three incidence-matrix panels (2-UH, 3-UH, Non-UH) with Vertex Index on the vertical axis and Edge Index or Hyperedge Index on the horizontal axis; the legend distinguishes Hyperedge (Edge) Between Classes from Hyperedge (Edge) Within Classes.]

Fig 11: Incidence matrices based on coauthorship data. Left: 2-uniform hypergraph; Middle: 3-uniform hypergraph; Right: non-uniform hypergraph.

Table 4
Values of test statistics based on global network and four subnetworks. Symbols ∗∗ and ∗ indicate the strength of rejection, i.e., p-value < 0.001 and p-value < 0.05, respectively.

      Global Network   SubNetwork 1   SubNetwork 2   SubNetwork 3   SubNetwork 4
n     58               7              13             14             24
Z2    8.360**          0.161          -0.030         2.667*         1.661
Z3    1.451            -0.100         -0.211         -0.289         -0.052
Z     6.938**          0.043          -0.171         1.682          1.137

5. Discussion. In the context of community testing for hypergraphs, we systematically considered various scenarios in terms of hyperedge densities and investigated distinguishability or indistinguishability of the hypotheses in each scenario. Extensions of our results are possible.

The first line is to extend the test statistic in Section 2.3 to tackle the model selection problem for SBM in hypergraphs. In particular, one possibility is to study the hypothesis testing problem of $H_0 : k = k_0$ vs. $H_1 : k > k_0$ for $k_0 = 1, 2, \dots$ sequentially and stop when observing a rejection. The second line is to extend the current results to the increasingly popular degree-corrected stochastic block models. However, based on the current second-moment technique, the bounded degree results are not easy to establish. The main reason is that the moments of the likelihood ratio do not have an explicit expression in terms of $a$, $b$, $\kappa$. Even in the ordinary graph setting with $m = 2$, this is already very difficult. To see this, when $\sigma_i = \sigma_j$ and $\eta_i = \eta_j$ for all $i < j$, it can be shown that

$$E_W E_{W^*} \prod_{i<j} \left( \frac{p_{ij}(\sigma, W) p_{ij}(\eta, W^*)}{p_0} + \frac{q_{ij}(\sigma, W) q_{ij}(\eta, W^*)}{q_0} \right) \approx E_W E_{W^*} \exp\left\{ \beta \sum_{i<j} (W_i W_j a - d)(W_i^* W_j^* a - d) \right\}, \tag{18}$$

where $\beta = \frac{1}{d n^{\alpha}} + \frac{1}{n^{2\alpha}}$. The expected value (18) seems difficult to analyze under general random weights $W$, $W^*$, even for the above special choice of $\sigma$, $\eta$. Hence, a precise contiguity region in terms of $a$, $b$, $\kappa$ is not available using the current second-moment method.

The third line is to test more general and complicated hypotheses. The current paper only deals with the relatively simple Erdos-Renyi null hypotheses, whereas the proposed methods may be extended to more general settings. For instance, in light of Theorem 2.3, the test statistics based on long loose cycles may also test the null hypothesis that the hypergraph is an SBM with $k$ communities in which $k > 1$ is given; in light of Theorems 2.8 and 2.9, the test statistics based on sub-hypergraph counts may be extended to the null hypothesis that the model is a degree-corrected SBM with $k$ communities. It is a worthwhile project to investigate the validity of these methods, especially in real-world situations.

6. Proof of Main Results. In this section, we prove the main results of this paper. The proofs of Lemmas 6.3, 6.4, 6.5, 6.7, 6.9 and Propositions 2.7, 2.11 are relegated to the supplement.

6.1. Proof of Theorem 2.1. The proof is based on one result in Janson [40] as below.

Proposition 6.1 (Janson, 1995). Suppose that $L_n = \frac{dQ_n}{dP_n}$, regarded as a random variable on $(\Omega_n, \mathcal{F}_n, P_n)$, converges in distribution to some random variable $L$ as $n \to \infty$. Then $P_n$ and $Q_n$ are contiguous if and only if $L > 0$ a.s. and $EL = 1$.

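We prove Theorem 2.1 for $k = 2$. The general case can be proved similarly, with more tedious arguments. For convenience, we use $\sigma_i = +$ or $-$ (rather than $\sigma_i = 1$ or $2$) to represent the potential community label of $i$. We use $i_1 : i_m$ to represent the ordering $i_1 i_2 \dots i_m$, and hence, $A_{i_1:i_m} = A_{i_1 i_2 \dots i_m}$. Define $I[\sigma_{i_1} : \sigma_{i_m}] = I[\sigma_{i_1} = \sigma_{i_2} = \cdots = \sigma_{i_m}]$. Let $d = \frac{a + (2^{m-1} - 1) b}{2^{m-1}}$, $p_0 = \frac{d}{n^{\alpha}}$, $q_0 = 1 - p_0$. Therefore, the hyperedge probabilities $p_{i_1 i_2 \dots i_m}(\sigma)$ and $q_{i_1 i_2 \dots i_m}(\sigma)$ are rewritten as

$$p_{i_1:i_m}(\sigma) = P(A_{i_1:i_m} = 1 \mid \sigma) = \left( \frac{a}{n^{\alpha}} \right)^{I[\sigma_{i_1} : \sigma_{i_m}]} \left( \frac{b}{n^{\alpha}} \right)^{1 - I[\sigma_{i_1} : \sigma_{i_m}]},$$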


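and $q_{i_1:i_m}(\sigma) = 1 - p_{i_1:i_m}(\sigma)$. Let $Y_n = P_{H_1}(A) / P_{H_0}(A)$ be the likelihood ratio of the adjacency tensor $A$, where $P_{H_0}$ and $P_{H_1}$ are the probability measures under $H_0$ and $H_1$, respectively. Then

$$Y_n = 2^{-n} \sum_{\sigma \in \{\pm\}^n} \prod_{i \in c(m,n)} \left( \frac{p_{i_1:i_m}(\sigma)}{p_0} \right)^{A_{i_1:i_m}} \left( \frac{q_{i_1:i_m}(\sigma)}{q_0} \right)^{1 - A_{i_1:i_m}},$$

which leads to

$$Y_n^2 = 2^{-2n} \sum_{\sigma, \eta \in \{\pm\}^n} \prod_{i \in c(m,n)} \left( \frac{p_{i_1:i_m}(\sigma) p_{i_1:i_m}(\eta)}{p_0^2} \right)^{A_{i_1:i_m}} \left( \frac{q_{i_1:i_m}(\sigma) q_{i_1:i_m}(\eta)}{q_0^2} \right)^{1 - A_{i_1:i_m}}.$$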

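The expectation of $Y_n^2$ under $H_0$ is

$$E_0 Y_n^2 = 2^{-2n} \sum_{\sigma, \eta \in \{\pm\}^n} \prod_{i \in c(m,n)} \left( \frac{p_{i_1:i_m}(\sigma) p_{i_1:i_m}(\eta)}{p_0} + \frac{q_{i_1:i_m}(\sigma) q_{i_1:i_m}(\eta)}{q_0} \right). \tag{19}$$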

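For any $\sigma, \eta \in \{\pm\}^n$, define $s_2 = \#\{1 \le i_1 < i_2 < \cdots < i_m \le n : I[\sigma_{i_1} : \sigma_{i_m}] + I[\eta_{i_1} : \eta_{i_m}] = 2\}$, $s_1 = \#\{1 \le i_1 < i_2 < \cdots < i_m \le n : I[\sigma_{i_1} : \sigma_{i_m}] + I[\eta_{i_1} : \eta_{i_m}] = 1\}$ and $s_0 = \#\{1 \le i_1 < i_2 < \cdots < i_m \le n : I[\sigma_{i_1} : \sigma_{i_m}] + I[\eta_{i_1} : \eta_{i_m}] = 0\}$. Note that $s_0, s_1, s_2$ are bounded above by $n^m$. By direct examinations, we have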

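$$\frac{1}{p_0} \left( \frac{a}{n^{\alpha}} \right)^2 + \frac{1}{q_0} \left( 1 - \frac{a}{n^{\alpha}} \right)^2 = 1 + \frac{(a - d)^2}{d n^{\alpha}} + \frac{(a - d)^2}{n^{2\alpha}} + O\left( \frac{1}{n^{3\alpha}} \right),$$

$$\frac{1}{p_0} \frac{a}{n^{\alpha}} \frac{b}{n^{\alpha}} + \frac{1}{q_0} \left( 1 - \frac{a}{n^{\alpha}} \right) \left( 1 - \frac{b}{n^{\alpha}} \right) = 1 + \frac{(a - d)(b - d)}{d n^{\alpha}} + \frac{(a - d)(b - d)}{n^{2\alpha}} + O\left( \frac{1}{n^{3\alpha}} \right),$$

$$\frac{1}{p_0} \left( \frac{b}{n^{\alpha}} \right)^2 + \frac{1}{q_0} \left( 1 - \frac{b}{n^{\alpha}} \right)^2 = 1 + \frac{(b - d)^2}{d n^{\alpha}} + \frac{(b - d)^2}{n^{2\alpha}} + O\left( \frac{1}{n^{3\alpha}} \right).$$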
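These three expansions all follow from one exact identity: for probabilities $p$, $r$ and $p_0$ with $q_0 = 1 - p_0$, one has $\frac{pr}{p_0} + \frac{(1-p)(1-r)}{q_0} = 1 + \frac{(p - p_0)(r - p_0)}{p_0 q_0}$, and expanding $1/q_0$ gives the stated terms. The sketch below verifies the identity and the size of the remainder in exact arithmetic; the function names and the numerical values of $a$, $b$, $m$ and $n^{\alpha}$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def second_moment_term(p, r, p0):
    """One factor of the product in (19): p*r/p0 + (1-p)*(1-r)/q0."""
    return p * r / p0 + (1 - p) * (1 - r) / (1 - p0)


def closed_form(p, r, p0):
    """Exact value 1 + (p - p0)(r - p0) / (p0 * q0)."""
    return 1 + (p - p0) * (r - p0) / (p0 * (1 - p0))


def two_term_expansion(x, y, d, N):
    """1 + (x-d)(y-d)/(d*N) + (x-d)(y-d)/N^2, with N standing for n^alpha."""
    return 1 + (x - d) * (y - d) / (d * N) + (x - d) * (y - d) / N ** 2


if __name__ == "__main__":
    a, b, m, N = Fraction(5), Fraction(2), 3, Fraction(1000)
    d = (a + (2 ** (m - 1) - 1) * b) / 2 ** (m - 1)
    exact = second_moment_term(a / N, a / N, d / N)
    print(exact == closed_form(a / N, a / N, d / N))
    print(float((exact - two_term_expansion(a, a, d, N)) * N ** 3))
```

The last printed value stays bounded as $N$ grows, which is the $O(n^{-3\alpha})$ remainder in the displayed expansions.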

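Then for $\alpha > \frac{m}{2}$, we have by (19) that

$$E_0 Y_n^2 = (1 + o(1)) E_{\sigma\eta} \left( 1 + \frac{(a - d)^2}{d n^{\alpha}} \right)^{s_2} \left( 1 + \frac{(a - d)(b - d)}{d n^{\alpha}} \right)^{s_1} \left( 1 + \frac{(b - d)^2}{d n^{\alpha}} \right)^{s_0}$$

$$= (1 + o(1)) E_{\sigma\eta} \exp\left\{ \frac{(a - d)^2}{d n^{\alpha}} s_2 + \frac{(a - d)(b - d)}{d n^{\alpha}} s_1 + \frac{(b - d)^2}{d n^{\alpha}} s_0 \right\}. \tag{20}$$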

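If $\alpha > m$, then $\frac{s_j}{n^{\alpha}} \to 0$ for $j = 0, 1, 2$. Hence $E_0 Y_n^2 \to 1$. Since $E_0 Y_n = 1$, we have that $Y_n$ converges to 1 in distribution. By Proposition 6.1, $H_0$ and $H_1$ are contiguous.

Next we consider $\alpha = m$. Note that

$$s_2 = \sum_{i \in c(m,n)} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}],$$

$$s_1 = \sum_{i \in c(m,n)} \left( I[\sigma_{i_1} : \sigma_{i_m}] (1 - I[\eta_{i_1} : \eta_{i_m}]) + (1 - I[\sigma_{i_1} : \sigma_{i_m}]) I[\eta_{i_1} : \eta_{i_m}] \right),$$

$$s_0 = \sum_{i \in c(m,n)} (1 - I[\sigma_{i_1} : \sigma_{i_m}]) (1 - I[\eta_{i_1} : \eta_{i_m}]).$$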

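Then the numerator of the exponent in (20) can be written as

$$(a - d)^2 s_2 + (a - d)(b - d) s_1 + (b - d)^2 s_0 = \binom{n}{m} (b - d)^2 + (a - b)^2 \sum_{i \in c(m,n)} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}] + (a - b)(b - d) \left( \sum_{i \in c(m,n)} I[\sigma_{i_1} : \sigma_{i_m}] + \sum_{i \in c(m,n)} I[\eta_{i_1} : \eta_{i_m}] \right). \tag{21}$$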


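For $s, t = +1, -1$, let

$$\rho_{st} = \sum_{i=1}^{n} I[\sigma_i = t] I[\eta_i = s], \quad \rho_{t0} = \sum_{i=1}^{n} I[\sigma_i = t], \quad \rho_{0s} = \sum_{i=1}^{n} I[\eta_i = s],$$

and

$$\tilde\rho_{st} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( I[\sigma_i = t] I[\eta_i = s] - \frac{1}{2^2} \right), \quad \tilde\rho_{t0} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( I[\sigma_i = t] - \frac{1}{2} \right), \quad \tilde\rho_{0s} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( I[\eta_i = s] - \frac{1}{2} \right).$$

It is easy to verify that $\sum_{s,t} \tilde\rho_{st} = 0$, $\sum_{s} \tilde\rho_{s0} = 0$, $\sum_{t} \tilde\rho_{0t} = 0$ and

$$\sum_{1 \le i_1, \dots, i_m \le n} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}] = m! \sum_{i_1 < i_2 < \cdots < i_m} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}] + O(n^{m-1}).$$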

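Then we have

$$\sum_{i \in c(m,n)} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}] = \frac{1}{m!} \sum_{1 \le i_1, \dots, i_m \le n} I[\sigma_{i_1} : \sigma_{i_m}] I[\eta_{i_1} : \eta_{i_m}] + O(n^{m-1})$$

$$= \frac{1}{m!} \sum_{1 \le i_1, \dots, i_m \le n} \sum_{s,t = -1,+1} \prod_{j=1}^{m} I[\sigma_{i_j} = s] I[\eta_{i_j} = t] + O(n^{m-1})$$

$$= \frac{1}{m!} \sum_{s,t = -1,+1} \rho_{st}^m + O(n^{m-1}) \tag{22}$$

$$= \frac{1}{m!} \sum_{s,t = -1,+1} \left( \sqrt{n} \tilde\rho_{st} + \frac{n}{2^2} \right)^m + O(n^{m-1})$$

$$= \frac{1}{m!} \frac{4 n^m}{2^{2m}} + \frac{1}{m!} n^{m-1} \sum_{s,t} \tilde\rho_{st}^2 \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{2(m-k)}} \left( \frac{\tilde\rho_{st}}{\sqrt{n}} \right)^{k-2} + O(n^{m-1}),$$

$$\sum_{i \in c(m,n)} I[\sigma_{i_1} : \sigma_{i_m}] = \frac{1}{m!} \sum_{1 \le i_1, \dots, i_m \le n} I[\sigma_{i_1} : \sigma_{i_m}] + O(n^{m-1})$$

$$= \frac{1}{m!} \sum_{t = -1,+1} \rho_{t0}^m + O(n^{m-1}) \tag{23}$$

$$= \frac{1}{m!} \frac{2 n^m}{2^m} + \frac{n^{m-1}}{m!} \sum_{t} \tilde\rho_{t0}^2 \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{(m-k)}} \left( \frac{\tilde\rho_{t0}}{\sqrt{n}} \right)^{k-2} + O(n^{m-1}),$$

$$\sum_{i \in c(m,n)} I[\eta_{i_1} : \eta_{i_m}] = \frac{1}{m!} \sum_{1 \le i_1, \dots, i_m \le n} I[\eta_{i_1} : \eta_{i_m}] + O(n^{m-1}) \tag{24}$$

$$= \frac{1}{m!} \sum_{s = -1,+1} \rho_{0s}^m + O(n^{m-1}) \tag{25}$$

$$= \frac{1}{m!} \frac{2 n^m}{2^m} + \frac{n^{m-1}}{m!} \sum_{s} \tilde\rho_{0s}^2 \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{(m-k)}} \left( \frac{\tilde\rho_{0s}}{\sqrt{n}} \right)^{k-2} + O(n^{m-1}).$$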

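If $\alpha = m$, by (21), (22), (23), (25), and the law of large numbers, we have

$$(a - d)^2 \frac{s_2}{n^m} + (a - d)(b - d) \frac{s_1}{n^m} + (b - d)^2 \frac{s_0}{n^m} \to (a - b)^2 \frac{4}{2^{2m}} + (a - b)(b - d) \frac{4}{2^m} + (b - d)^2 = \left( \frac{a - b}{2^{m-1}} + (b - d) \right)^2 = 0. \tag{26}$$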

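Combining (20) and (26), we get that $E_0 Y_n^2 \to 1$, which implies that $H_0$ and $H_1$ are contiguous by Proposition 6.1.

Let $\alpha = m - 1 + \delta$, for $0 < \delta < 1$. Note that $\left| \frac{\tilde\rho_{st}}{\sqrt{n}} \right|$, $\left| \frac{\tilde\rho_{s0}}{\sqrt{n}} \right|$, $\left| \frac{\tilde\rho_{0t}}{\sqrt{n}} \right|$ are all bounded by 1. Hence, there is a universal constant $C$ such that

$$\frac{(a - b)^2}{d \, m!} \left| \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{2(m-k)}} \left( \frac{\tilde\rho_{ts}}{\sqrt{n}} \right)^{k-2} \right| \le C,$$

$$\frac{(a - b)(b - d)}{d \, m!} \left| \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{(m-k)}} \left( \frac{\tilde\rho_{t0}}{\sqrt{n}} \right)^{k-2} \right| \le C,$$

$$\frac{(a - b)(b - d)}{d \, m!} \left| \sum_{k=2}^{m} \binom{m}{k} \frac{1}{2^{(m-k)}} \left( \frac{\tilde\rho_{0s}}{\sqrt{n}} \right)^{k-2} \right| \le C.$$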

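Note that $(b - d)^2 + \frac{4}{2^{2m}} (a - b)^2 + \frac{4}{2^m} (a - b)(b - d) = 0$. Then by (20), (21), (22), (23), (25), we have

$$E_0 Y_n^2 \le (1 + o(1)) E_{\sigma\eta} \exp\left\{ \sum_{s,t} \frac{C}{n^{\delta}} \tilde\rho_{st}^2 + \sum_{t} \frac{C}{n^{\delta}} \tilde\rho_{t0}^2 + \sum_{s} \frac{C}{n^{\delta}} \tilde\rho_{0s}^2 + O\left( \frac{1}{n^{\delta}} \right) \right\}. \tag{27}$$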

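By the central limit theorem and Slutsky’s theorem, $\tilde\rho_{st}^2$, $\tilde\rho_{s0}^2$ and $\tilde\rho_{0t}^2$ converge to chi-square distributions, which implies that $\frac{C}{n^{\delta}} \tilde\rho_{st}^2$, $\frac{C}{n^{\delta}} \tilde\rho_{s0}^2$ and $\frac{C}{n^{\delta}} \tilde\rho_{0t}^2$ converge to zero in probability. For any $\gamma > 0$ and $\beta > 0$, by the Hoeffding inequality, we have

$$P\left( \exp\left\{ \frac{C}{n^{\delta}} \tilde\rho_{st}^2 \right\} > \gamma^{\beta} \right) = P\left( \frac{|\tilde\rho_{ts}|}{\sqrt{n}} > \sqrt{\frac{n^{\delta} \log \gamma^{\beta}}{C n}} \right) \le 2 \exp\left\{ - \frac{n^{\delta} \log \gamma^{\beta}}{C n} \, n \right\} = 2 \gamma^{-\frac{\beta n^{\delta}}{C}}.$$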

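Choose an $n_0 > 0$ such that $C < \beta n_0^{\delta}$. For any $n \ge n_0$ and $C_1 > 0$, we have

$$\int_{C_1}^{\infty} P\left( \exp\left\{ \frac{C}{n^{\delta}} \tilde\rho_{st}^2 \right\} > \gamma^{\beta} \right) d\gamma \le \frac{2}{\frac{\beta n_0^{\delta}}{C} - 1} \, C_1^{1 - \frac{\beta n_0^{\delta}}{C}}. \tag{28}$$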

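Notice that there are eight terms in total in the summation $\sum_{s,t} + \sum_{s} + \sum_{t}$, where the sums range over $s, t = \pm$. Therefore, writing $x$ for the threshold, we have

$$P\left( \exp\left\{ \sum_{s,t} \frac{C}{n^{\delta}} \tilde\rho_{st}^2 + \sum_{t} \frac{C}{n^{\delta}} \tilde\rho_{t0}^2 + \sum_{s} \frac{C}{n^{\delta}} \tilde\rho_{0s}^2 \right\} > x \right) \le \sum_{s,t} P\left( \exp\left\{ \frac{C}{n^{\delta}} \tilde\rho_{st}^2 \right\} > x^{\frac{1}{8}} \right) + \sum_{t} P\left( \exp\left\{ \frac{C}{n^{\delta}} \tilde\rho_{t0}^2 \right\} > x^{\frac{1}{8}} \right) + \sum_{s} P\left( \exp\left\{ \frac{C}{n^{\delta}} \tilde\rho_{0s}^2 \right\} > x^{\frac{1}{8}} \right).$$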

Together with (28), the variable on the right side of (27) is uniformly integrable. By $E_0 Y_n^2 \ge 1$, we conclude that $E_0 Y_n^2 \to 1$, hence $H_0$ and $H_1$ are contiguous by Proposition 6.1.

For $k > 2$, let $S = \{1, 2, \dots, k\}$ and $\sigma_i \in S$. It can be checked that

$$Y_n = k^{-n} \sum_{\sigma \in S^n} \prod_{i \in c(m,n)} \left( \frac{p_{i_1:i_m}(\sigma)}{p_0} \right)^{A_{i_1:i_m}} \left( \frac{q_{i_1:i_m}(\sigma)}{q_0} \right)^{1 - A_{i_1:i_m}}.$$

The rest of the proof follows by a line-by-line check of the $k = 2$ case.

6.2. Proof of Theorem 2.2. The key idea in proving Theorem 2.2 is to count the long loose cycles and use Theorem 1 in Gao and Wormald [36]. Here “long” means that the number of hyperedges in the loose cycle diverges along with $n$. Recall Theorem 1 from Gao and Wormald [36] below.

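Theorem 6.2 (Gao and Wormald, 2004). Let $s_n > -\frac{1}{\mu_n}$ and $\sigma_n = \sqrt{\mu_n + \mu_n^2 s_n}$, where $\mu_n > 0$ satisfies $\mu_n \to \infty$. Suppose that $\mu_n = o(\sigma_n^3)$ and $X_n$ is a sequence of nonnegative random variables satisfying

$$E[X_n]_k \sim \mu_n^k \exp\left( \frac{k^2 s_n}{2} \right),$$

uniformly for all integers $k$ in the range $c_1 \mu_n / \sigma_n \le k \le c_2 \mu_n / \sigma_n$ for some constants $c_2 > c_1 > 0$. Then $(X_n - \mu_n)/\sigma_n$ converges in distribution to the standard normal variable as $n \to \infty$. Here $\alpha_n \sim \beta_n$ means $\lim_{n \to \infty} \frac{\alpha_n}{\beta_n} = 1$.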

Let $X_{\xi_n}$ be the number of $\xi_n$-hyperedge loose cycles over the observed hypergraph. We will compute the expectation of $[X_{\xi_n}]_s$ under $H_1$. Consider the $s$-tuple of $\xi_n$-hyperedge loose cycles $(H_{\xi_n 1}, \dots, H_{\xi_n s})$ in which the $H_{\xi_n j}$ are $\xi_n$-hyperedge loose cycles. Let $B$ be the collection of such $s$-tuples with vertex disjoint cycles and $\bar B$ be the collection of tuples in which two cycles have a common vertex. The expectation of $[X_{\xi_n}]_s$ under $H_1$ can be expressed as

$$E_1 [X_{\xi_n}]_s = \sum_{B} E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}} + \sum_{\bar B} E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}}.$$

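Let $\tau$ be a random label assignment. The first term on the right-hand side of the above equation is

$$E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}} = E_1 \prod_{i=1}^{s} I_{H_{\xi_n i}} = E_{\tau} \prod_{i=1}^{s} E_1 I_{H_{\xi_n i}} = E_{\tau} \prod_{i=1}^{s} \prod_{\{i_1, \dots, i_m\} \in \mathcal{E}(H_{\xi_n i})} \frac{M_{i_1 i_2 \dots i_m}(\tau)}{n^{m-1}}$$

$$= \prod_{i=1}^{s} \left( \left[ \frac{a + (k^{m-1} - 1) b}{(kn)^{m-1}} \right]^{\xi_n} + (k - 1) \left[ \frac{a - b}{(kn)^{m-1}} \right]^{\xi_n} \right) = \frac{1}{n^{(m-1) \xi_n s}} \left( \left[ \frac{a + (k^{m-1} - 1) b}{k^{m-1}} \right]^{\xi_n} + (k - 1) \left[ \frac{a - b}{k^{m-1}} \right]^{\xi_n} \right)^{s},$$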

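where $\mathcal{E}(H_{\xi_n i})$ is the hyperedge set of $H_{\xi_n i}$. Note that $\#B = \frac{n!}{(n - M_1)!} \left( \frac{1}{2 \xi_n (m-2)!^{\xi_n}} \right)^{s}$, where $M_1 = (m - 1) \xi_n s$. Then for $M_1 = o(\sqrt{n})$,

$$\sum_{B} E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}} = \#B \times E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}} = \frac{n!}{(n - M_1)!} n^{-M_1} \left( \frac{1}{2 \xi_n} \left[ \frac{a + (k^{m-1} - 1) b}{k^{m-1} (m-2)!} \right]^{\xi_n} + \frac{(k - 1)}{2 \xi_n} \left[ \frac{a - b}{k^{m-1} (m-2)!} \right]^{\xi_n} \right)^{s}$$

$$\sim \left( \frac{1}{2 \xi_n} \left[ \frac{a + (k^{m-1} - 1) b}{k^{m-1} (m-2)!} \right]^{\xi_n} + \frac{(k - 1)}{2 \xi_n} \left[ \frac{a - b}{k^{m-1} (m-2)!} \right]^{\xi_n} \right)^{s}.$$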

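The “∼” is due to the trivial fact that $\frac{n!}{(n - M_1)!} n^{-M_1} \to 1$ as $M_1 = o(\sqrt{n})$. Note that $\#\bar B \le M_1^2 n^{M_1 - 1}$ and $E_1[I_{\cup_{i=1}^{s} H_{\xi_n i}} \mid \tau] \le \left( \frac{a}{n^{m-1}} \right)^{|\mathcal{E}(H)|}$, then

$$\sum_{\bar B} E_1 I_{\cup_{i=1}^{s} H_{\xi_n i}} \le M_1^2 n^{M_1 - 1} \left( \frac{a}{n^{m-1}} \right)^{|\mathcal{E}(H)|} = M_1^2 \frac{a^{M_1}}{n} \to 0,$$

provided that $M_1 \le \delta_1 \log_a n$ for a constant $0 < \delta_1 < 1$.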

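Define $\mu_{n1} = \frac{1}{2 \xi_n} \left[ \frac{a + (k^{m-1} - 1) b}{k^{m-1} (m-2)!} \right]^{\xi_n} + \frac{(k - 1)}{2 \xi_n} \left[ \frac{a - b}{k^{m-1} (m-2)!} \right]^{\xi_n}$ and $\mu_{n0} = \frac{1}{2 \xi_n} \left[ \frac{a + (k^{m-1} - 1) b}{k^{m-1} (m-2)!} \right]^{\xi_n}$. If $M_1 \le \delta_1 \log_a n$, then

$$E_1 [X_{\xi_n}]_s \sim \mu_{n1}^{s}, \tag{29}$$

$$E_0 [X_{\xi_n}]_s \sim \mu_{n0}^{s}. \tag{30}$$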

Note that κ > 1 implies λm > 1. To see this, let a = c + (km−1 − 1)d and b = c − d for someconstants c > d > 0. Then it follows from κ > 1 that c > (m − 2)!, which yields λm > 1. Thenµn1, µn0 →∞ as n→∞. It is obvious that

µn1 ≤

(logγ n

)δ0ξn

, µn0 ≤

(logγ n

)δ0ξn

.

Let σn1 =√µn1, σn0 =

√µn0. For any constant c2 > c1 > 0 and s satisfying c1

µn1σn1≤ s ≤ c2

µn1σn1

orc1µn0σn0≤ s ≤ c2

µn0σn0

, we have for large n

M1 = (m− 1)ξns = (m− 1)

√(logγ n

)δ0logλm

(logγ n

)δ0≤ δ1 loga n,

which implies (29) and (30) hold. By Theorem 6.2, we conclude that $\frac{X_{\xi_n} - \mu_{n1}}{\sqrt{\mu_{n1}}}$ and $\frac{X_{\xi_n} - \mu_{n0}}{\sqrt{\mu_{n0}}}$ converge in distribution to the standard normal variables under $H_1$ and $H_0$, respectively.

Since $\kappa > 1$, there exists a constant $\rho$ satisfying

$$\sqrt{\frac{a + (k^{m-1} - 1) b}{k^{m-1} (m-2)!}} < \rho < \frac{a - b}{k^{m-1} (m-2)!}.$$

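It is easy to verify that $\mu_{n1} = o(\rho^{2 \xi_n})$, $\mu_{n0} = o(\rho^{2 \xi_n})$. Let $A_n = \{X_{\xi_n} \le E_0 X_{\xi_n} + \rho^{\xi_n}\}$. Then we have

$$P_{H_0}(A_n) = P_{H_0}\left( \frac{X_{\xi_n} - \mu_{n0}}{\sqrt{\mu_{n0}}} \le \frac{\rho^{\xi_n}}{\sqrt{\mu_{n0}}} \right) \to \Phi(\infty) = 1. \tag{31}$$

Note that $\frac{\mu_{n1} - \mu_{n0}}{\rho^{\xi_n}} \to \infty$, then for large $n$, we have $\mu_{n1} - \rho^{\xi_n} \ge \mu_{n0} + \rho^{\xi_n}$. Then it yields

$$P_{H_1}(A_n) \le P_{H_1}\left( X_{\xi_n} \le E_1 X_{\xi_n} - \rho^{\xi_n} \right) = P_{H_1}\left( \frac{X_{\xi_n} - \mu_{n1}}{\sqrt{\mu_{n1}}} \le - \frac{\rho^{\xi_n}}{\sqrt{\mu_{n1}}} \right) \to \Phi(-\infty) = 0. \tag{32}$$

By definition, (31) and (32) show that $H_0$ and $H_1$ are orthogonal.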

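6.3. Proof of Theorem 2.4. Let $f = \frac{a-b}{k^{m-1}(m-2)!}$. By the proof of Theorem 2.2, it is easy to show that for any $\varepsilon > 0$,

$$
P_{H_1}\left(\frac{2\xi_nX_{\xi_n} - \lambda_m^{\xi_n} - (k-1)f^{\xi_n}}{(k-1)f^{\xi_n}} > \varepsilon\right)
= P_{H_1}\left(\frac{X_{\xi_n}-\mu_{n1}}{\sqrt{\mu_{n1}}} > \frac{(k-1)f^{\xi_n}\varepsilon}{2\xi_n\sqrt{\mu_{n1}}}\right)
= 1 - \Phi\left(\frac{(k-1)f^{\xi_n}\varepsilon}{2\xi_n\sqrt{\mu_{n1}}}\right) \to 0,
$$

and $P_{H_1}\left(\frac{2\xi_nX_{\xi_n} - \lambda_m^{\xi_n} - (k-1)f^{\xi_n}}{(k-1)f^{\xi_n}} < -\varepsilon\right) \to 0$. Then it follows that $2\xi_nX_{\xi_n} - \lambda_m^{\xi_n} = (1+o_p(1))(k-1)f^{\xi_n}$.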


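Next, we show that $\hat{\lambda}_m^{\xi_n} - \lambda_m^{\xi_n} = o_p(1)$. For simplicity, we only show $\hat{\lambda}_3^{\xi_n} - \lambda_3^{\xi_n} = o_p(1)$; the general case follows similarly. Let $\eta_{ijt} = \frac{(a-b)I[\sigma_i=\sigma_j=\sigma_t]+b}{n^2}$. By Taylor expansion, we have

$$
\hat{\lambda}_3^{\xi_n} - \lambda_3^{\xi_n} = \sum_{i=1}^{\xi_n}\frac{\xi_n(\xi_n-1)\cdots(\xi_n-i+1)}{i!}\,\lambda_3^{\xi_n-i}(\hat{\lambda}_3-\lambda_3)^i,
$$

from which it follows that

$$
E(\hat{\lambda}_3^{\xi_n} - \lambda_3^{\xi_n})^2 = \sum_{i,j=1}^{\xi_n} C_{ij}\,\lambda_3^{2\xi_n-i-j}\,E(\hat{\lambda}_3-\lambda_3)^{i+j}, \tag{33}
$$

where $C_{ij} = \frac{\xi_n(\xi_n-1)\cdots(\xi_n-i+1)}{i!}\cdot\frac{\xi_n(\xi_n-1)\cdots(\xi_n-j+1)}{j!} \le \xi_n^{2\xi_n}$. For any integer $s$ with $2 \le s \le 2\xi_n$, we calculate $E(\hat{\lambda}_3-\lambda_3)^s$ as follows: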

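$$
\begin{aligned}
E(\hat{\lambda}_3-\lambda_3)^s
&= E\left[\frac{n^2}{\binom{n}{3}}\sum_{i<j<t}\left(A_{ijt}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right]^s \\
&= \frac{n^{2s}}{\binom{n}{3}^s}\sum_{\substack{i_r<j_r<t_r,\\ r=1,\ldots,s}} E\left[\left(A_{i_1j_1t_1}-\frac{a+(k^2-1)b}{n^2k^2}\right)\cdots\left(A_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right] \\
&= \frac{n^{2s}}{\binom{n}{3}^s}\sum_{\substack{i_r<j_r<t_r,\\ r=1,\ldots,s}} E\left[\left(A_{i_1j_1t_1}-\eta_{i_1j_1t_1}+\eta_{i_1j_1t_1}-\frac{a+(k^2-1)b}{n^2k^2}\right)\times\cdots\right. \\
&\qquad\qquad\left.\times\left(A_{i_sj_st_s}-\eta_{i_sj_st_s}+\eta_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right] \\
&= \frac{n^{2s}}{\binom{n}{3}^s}\sum_{\substack{i_r<j_r<t_r,\\ r=1,\ldots,s}} E\left[\left(A_{i_1j_1t_1}-\eta_{i_1j_1t_1}\right)\cdots\left(A_{i_sj_st_s}-\eta_{i_sj_st_s}\right)+\cdots\right. \\
&\qquad\qquad\left.+\left(\eta_{i_1j_1t_1}-\frac{a+(k^2-1)b}{n^2k^2}\right)\cdots\left(\eta_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right].
\end{aligned}
\tag{34}
$$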

There are $\binom{n}{3}^s$ choices of the index triples $(i_r, j_r, t_r)$, $1 \le r \le s$, in total. Among them, $\binom{n}{3}\binom{n-3}{3}\cdots\binom{n-3(s-1)}{3}$ are disjoint, that is, $(i_r, j_r, t_r)$ and $(i_u, j_u, t_u)$ are disjoint for any $1 \le r < u \le s$. In the disjoint case, the independence between the $\eta_{i_rj_rt_r}$ ($1 \le r \le s$) yields

$$
E\left[\left(\eta_{i_1j_1t_1}-\frac{a+(k^2-1)b}{n^2k^2}\right)\cdots\left(\eta_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right] = 0.
$$

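Let $C_1 > 1$ be a constant such that $|\eta_{ijt}| \le \frac{C_1}{n^2}$ and $\left|\eta_{ijt} - \frac{a+(k^2-1)b}{n^2k^2}\right| \le \frac{C_1}{n^2}$. Let $C_2 = 18C_1 > 1$; we have

$$
\begin{aligned}
&\frac{n^{2s}}{\binom{n}{3}^s}\sum_{\substack{i_r<j_r<t_r,\\ r=1,\ldots,s}}\left|E\left[\left(\eta_{i_1j_1t_1}-\frac{a+(k^2-1)b}{n^2k^2}\right)\cdots\left(\eta_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right]\right| \\
&\quad\le \frac{n^{2s}}{\binom{n}{3}^s}\left[\binom{n}{3}^s-\binom{n}{3}\binom{n-3}{3}\cdots\binom{n-3(s-1)}{3}\right]\frac{C_1^s}{n^{2s}}
\le \frac{3^{s-1}(s-1)!\,n^{2s+1}C_1^s}{\binom{n}{3}^s}
\le \frac{C_2^s(2\xi_n)^{2\xi_n}}{n}.
\end{aligned}
$$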

Consider the terms in (34) containing $v$ factors of the form $(A_{ijt} - \eta_{ijt})$ for $1 \le v \le s$. Typically they take the following form:

$$
E\left[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})\cdots(A_{i_vj_vt_v}-\eta_{i_vj_vt_v})\left(\eta_{i_{v+1}j_{v+1}t_{v+1}}-\frac{a+(k^2-1)b}{n^2k^2}\right)\cdots\left(\eta_{i_sj_st_s}-\frac{a+(k^2-1)b}{n^2k^2}\right)\right]. \tag{35}
$$

The above term vanishes when $v = 1$ since $E[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})\,|\,\sigma] = 0$. When $v = 2$, if $(i_1, j_1, t_1) \ne (i_2, j_2, t_2)$, then

$$
E\left[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})(A_{i_2j_2t_2}-\eta_{i_2j_2t_2})\,\big|\,\sigma\right] = E\left[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})\,\big|\,\sigma\right]E\left[(A_{i_2j_2t_2}-\eta_{i_2j_2t_2})\,\big|\,\sigma\right] = 0,
$$

since the $A_{ijt}$ are independent conditional on $\sigma$. This implies that (35) vanishes. Hence, (35) can be nonzero only if $(i_1, j_1, t_1) = (i_2, j_2, t_2)$. In this case, we have

n2s(n3


∣∣∣E(Ai1j1t1 − ηi1j1t1) . . . (Aivjvtv − ηivjvtv)

×(ηiv+1jv+1tv+1 −

a+ (k2 − 1)b

n2k2

). . .(ηisjsts −

a+ (k2 − 1)b

n2k2

)∣∣∣=

n2s(n3


∣∣∣E(Ai2j2t2 − ηi2j2t2)2(ηi3j3t3 −

a+ (k2 − 1)b

n2k2

). . .(ηisjsts −

a+ (k2 − 1)b

n2k2

)∣∣∣=

n2s(n3


∣∣∣Eηi2j2t2(1− ηi2j2t2)(ηi3j3t3 −

a+ (k2 − 1)b

n2k2

). . .(ηisjsts −

a+ (k2 − 1)b

n2k2

)∣∣∣≤ n2s(

n3

)s [(n3)s−1C1

n2

Cs−21

n2(s−2)

]=n2Cs−1

1(n3

) ≤ Cs2(2ξn)2ξn

n.

When $3 \le v \le s$, for each $r$ with $1 \le r \le v$, there exists $r_0 \ne r$ such that $(i_{r_0}, j_{r_0}, t_{r_0}) = (i_r, j_r, t_r)$; otherwise the expectation in (35) vanishes. For example, if $v = 4$ and $(i_1, j_1, t_1) \ne (i_2, j_2, t_2) = (i_3, j_3, t_3) = (i_4, j_4, t_4)$, then

$$
E\left[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})(A_{i_2j_2t_2}-\eta_{i_2j_2t_2})^3\,\big|\,\sigma\right] = E\left[(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})\,\big|\,\sigma\right]E\left[(A_{i_2j_2t_2}-\eta_{i_2j_2t_2})^3\,\big|\,\sigma\right] = 0,
$$

which implies that either $(i_1, j_1, t_1) = (i_2, j_2, t_2) = (i_3, j_3, t_3) = (i_4, j_4, t_4)$, or $(i_{r_1}, j_{r_1}, t_{r_1}) = (i_{r_2}, j_{r_2}, t_{r_2})$ and $(i_{r_3}, j_{r_3}, t_{r_3}) = (i_{r_4}, j_{r_4}, t_{r_4})$ for distinct $r_1, r_2, r_3, r_4 \in \{1, 2, 3, 4\}$. In the general case, suppose for some $q$ with $1 \le q < v$ and $p_r \ge 2$ for $1 \le r \le q$ that

$$
(A_{i_1j_1t_1}-\eta_{i_1j_1t_1})\cdots(A_{i_vj_vt_v}-\eta_{i_vj_vt_v}) = (A_{i_1j_1t_1}-\eta_{i_1j_1t_1})^{p_1}\cdots(A_{i_qj_qt_q}-\eta_{i_qj_qt_q})^{p_q}.
$$

Then, after relabeling the indexes, one has

n2s(n3


∣∣∣E(Ai1j1t1 − ηi1j1t1) . . . (Aivjvtv − ηivjvtv)

×(ηiv+1jv+1tv+1 −

a+ (k2 − 1)b

n2k2

). . .(ηisjsts −

a+ (k2 − 1)b

n2k2

)∣∣∣=

n2s(n3

)s ∑ir<jr<tr,r=1,...,s−v+q

∣∣∣E(Ai1j1t1 − ηi1j1t1)p1 . . . (Aiqjqtq − ηiqjqtq)pq

×(ηiq+1jq+1tq+1 −

a+ (k2 − 1)b

n2k2

). . .(ηis−v+qjs−v+qts−v+q −

a+ (k2 − 1)b

n2k2

)∣∣∣=

n2s(n3

)s ∑ir<jr<tr,r=1,...,s−v+q

∣∣∣Eηi1j1t1 . . . ηiqjqtq(ηiq+1jq+1tq+1 −a+ (k2 − 1)b

n2k2

)×

· · · ×(ηis−v+qjs−v+qts−v+q −

a+ (k2 − 1)b

n2k2

)∣∣∣≤ n2s(

n3

)s [(n3)s−v+q Cq1

n2q

Cs−v1

n2(s−v)

]=

(3!)v−qCs−v+q1

nv−q≤ Cs2(2ξn)2ξn

n.


Hence, by (33) and (34) and for some large constant C3 > 1, we conclude that E(λ3 − λ3)s ≤2sCs2(2ξn)2ξn

n and

E(λξn3 − λξn3 )2 ≤ ξ2

nξ2ξnn λ2ξn

3 2ξnCξn2 (2ξn)2ξn

n≤ (C3ξn)C3ξn

n.

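Let $N_n = C_3\xi_n \to \infty$; then $n = \gamma^{\lambda_3^{N_n/(\delta_0C_3)}}$. For large $N_n$, it holds that $\lambda_3^{N_n/(\delta_0C_3)} \ge C_4N_n^2$ for some constant $C_4 > 0$, which implies that

$$
E(\hat{\lambda}_3^{\xi_n} - \lambda_3^{\xi_n})^2 \le \frac{(C_3\xi_n)^{C_3\xi_n}}{n} = \frac{N_n^{N_n}}{\gamma^{\lambda_3^{N_n/(\delta_0C_3)}}} \le \left(\frac{N_n}{\gamma^{C_4N_n}}\right)^{N_n} \to 0,
$$

leading to $\hat{\lambda}_3^{\xi_n} - \lambda_3^{\xi_n} = o_p(1)$.

Now we conclude that $2\xi_nX_{\xi_n} - \hat{\lambda}_m^{\xi_n} = (1+o_p(1))(k-1)f^{\xi_n}$, which implies that $\hat{f} = f + o_p(1)$. Since $\hat{\lambda}_m$ and $\hat{f}$ are consistent estimators of $\lambda_m$ and $f$, it follows that $\hat{a}_n$ and $\hat{b}_n$ are consistent estimators of $a$ and $b$, respectively.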

6.4. Proof of Theorem 2.5. Before proving Theorem 2.5, we need several preliminary results,i.e., Lemmas 6.3, 6.4, 6.5, 6.7 and Proposition 6.6.

Lemma 6.3. Let $M_0$ be the following $k \times k$ matrix

$$
M_0 = \begin{pmatrix}
a+(k^{m-2}-1)b & k^{m-2}b & \cdots & k^{m-2}b \\
k^{m-2}b & a+(k^{m-2}-1)b & \cdots & k^{m-2}b \\
\vdots & \vdots & \ddots & \vdots \\
k^{m-2}b & k^{m-2}b & \cdots & a+(k^{m-2}-1)b
\end{pmatrix}.
$$

Then the trace of $M_0^j$ is

$$
\mathrm{Tr}(M_0^j) = (a+(k^{m-1}-1)b)^j + (k-1)(a-b)^j,
$$

for any positive integer $j$.
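The trace formula can be sanity-checked numerically. The sketch below is an illustration only, not part of the paper: it builds $M_0 = (a-b)I_k + k^{m-2}bJ_k$ with exact integer arithmetic and compares the trace of its powers with the sum of $j$-th powers of its eigenvalues, namely $a+(k^{m-1}-1)b$ (once) and $a-b$ ($k-1$ times). The helper names `build_m0`, `trace_power` and `trace_closed_form` are hypothetical.

```python
def build_m0(a, b, k, m):
    """k x k matrix with a + (k^(m-2) - 1) b on the diagonal and k^(m-2) b elsewhere."""
    off = k ** (m - 2) * b
    return [[(a - b) + off if r == c else off for c in range(k)] for r in range(k)]


def mat_mul(x, y):
    """Plain square-matrix product, exact for integer entries."""
    size = len(x)
    return [[sum(x[r][t] * y[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]


def trace_power(mat, j):
    """Trace of mat^j computed by repeated multiplication (j >= 1)."""
    power = mat
    for _ in range(j - 1):
        power = mat_mul(power, mat)
    return sum(power[r][r] for r in range(len(mat)))


def trace_closed_form(a, b, k, m, j):
    """Sum of j-th powers of the eigenvalues of M0."""
    return (a + (k ** (m - 1) - 1) * b) ** j + (k - 1) * (a - b) ** j


if __name__ == "__main__":
    for k, m, j in [(2, 3, 1), (3, 3, 2), (3, 4, 3)]:
        print(k, m, j, trace_power(build_m0(7, 2, k, m), j), trace_closed_form(7, 2, k, m, j))
```

The eigenvalue form is the one used later in the proof of Theorem 2.5, where $\mathrm{Tr}(M_0^h)/k^{h(m-1)}$ is rewritten in terms of $d^h$.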

Lemma 6.4. For any $1 \le i_1, \ldots, i_m \le k$, let $M_{i_1i_2\ldots i_m} = (a-b)I[i_1 = i_2 = \cdots = i_m] + b$. If $j \ge 1$ and $1 \le i_1, \ldots, i_{jm-j} \le k$, then we have

$$
\sum_{i_1,\ldots,i_{jm-j}=1}^{k} M_{i_1i_2\ldots i_m}M_{i_m\ldots i_{2m-1}}M_{i_{2m-1}\ldots i_{3m-2}}\cdots M_{i_{(j-1)m-(j-2)}\ldots i_{jm-j}i_1} = \mathrm{Tr}(M_0^j),
$$

where $M_0$ is the same as in Lemma 6.3.
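For small parameters, Lemma 6.4 can be checked by brute force. The following sketch is an illustration under stated assumptions, not the authors' code: it enumerates all labelings of the $j(m-1)$ vertices of a loose cycle, multiplies the tensor entries along the $j$ hyperedges (consecutive hyperedges share one vertex, and the last one closes the cycle), and compares the total with the eigenvalue form of $\mathrm{Tr}(M_0^j)$. The function names are hypothetical.

```python
from itertools import product


def m_tensor(labels, a, b):
    """Entry (a - b) * 1[all labels equal] + b of the order-m tensor."""
    return a if len(set(labels)) == 1 else b


def loose_cycle_sum(a, b, k, m, j):
    """Sum over all labelings of the product of tensor entries along a loose cycle."""
    n_vertices = j * (m - 1)
    total = 0
    for lab in product(range(k), repeat=n_vertices):
        term = 1
        for e in range(j):
            start = e * (m - 1)
            # hyperedge e uses m consecutive vertices, wrapping around at the end
            edge = [lab[(start + t) % n_vertices] for t in range(m)]
            term *= m_tensor(edge, a, b)
        total += term
    return total


def trace_closed_form(a, b, k, m, j):
    """Tr(M0^j) written through the eigenvalues of M0."""
    return (a + (k ** (m - 1) - 1) * b) ** j + (k - 1) * (a - b) ** j


if __name__ == "__main__":
    print(loose_cycle_sum(5, 2, 2, 3, 3), trace_closed_form(5, 2, 2, 3, 3))
```

The enumeration has $k^{j(m-1)}$ terms, so it is only practical for very small $k$, $m$ and $j$.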

Lemma 6.5. For any $h \ge 2$, let $X_{hn}$ be the number of $h$-hyperedge loose cycles in $H_m\left(n, \frac{d}{n^{m-1}}\right)$, where $d = \frac{a+(k^{m-1}-1)b}{k^{m-1}}$. Then for any integer $s \ge 2$, $\{X_{hn}\}_{h=2}^{s}$ jointly converge to independent Poisson variables with means $\lambda_h = \frac{d^h}{2h[(m-2)!]^h}$.

The following proposition is useful to prove Theorem 2.5. For any non-negative integer $x$, let $[x]_j$ denote the product $x(x-1)\cdots(x-j+1)$.

Proposition 6.6 (Janson, 1995). Let $\lambda_i > 0$, $i = 1, 2, \ldots$, be constants and suppose that for each $n$ there are random variables $X_{in}$, $i = 1, 2, \ldots$, and $Y_n$ (defined on the same probability space) such that $X_{in}$ is non-negative integer valued and $EY_n \ne 0$ (at least for large $n$), and furthermore the following conditions are satisfied:

(A1) $X_{in} \xrightarrow{d} Z_i$ as $n \to \infty$, jointly for all $i$, where $Z_i \sim \mathrm{Poisson}(\lambda_i)$ are independent Poisson random variables;

(A2) $E\left(Y_n[X_{1n}]_{j_1}\cdots[X_{kn}]_{j_k}\right)/EY_n \to \prod_{i=1}^{k}\mu_i^{j_i}$ as $n \to \infty$, for some $\mu_i \ge 0$ and every finite sequence $j_1, \ldots, j_k$ of non-negative integers;

(A3) $\sum_{i=1}^{\infty}\lambda_i\delta_i^2 < \infty$, where $\delta_i = \mu_i/\lambda_i - 1$;

(A4) $EY_n^2/(EY_n)^2 \to \exp\left(\sum_{i=1}^{\infty}\lambda_i\delta_i^2\right)$.

Then

$$
\frac{Y_n}{EY_n} \xrightarrow{d} W \equiv \prod_{i=1}^{\infty}(1+\delta_i)^{Z_i}\exp(-\lambda_i\delta_i), \quad\text{as } n \to \infty,
$$

and $EW = 1$.
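The normalization $EW = 1$ comes from each factor having mean one: for $Z \sim \mathrm{Poisson}(\lambda)$, $E(1+\delta)^Z = \exp(\lambda\delta)$, which the factor $\exp(-\lambda\delta)$ cancels. A minimal numerical sketch of this single-factor fact, with a hypothetical helper name and a truncated Poisson series (an approximation chosen here for illustration), is:

```python
import math


def poisson_factor_mean(lam, delta, terms=200):
    """Approximate E[(1 + delta)^Z] * exp(-lam * delta) for Z ~ Poisson(lam)."""
    total = 0.0
    pmf = math.exp(-lam)          # P(Z = 0)
    for z in range(terms):
        total += pmf * (1.0 + delta) ** z
        pmf *= lam / (z + 1)      # P(Z = z + 1) from P(Z = z)
    return total * math.exp(-lam * delta)


if __name__ == "__main__":
    for lam, delta in [(0.7, 0.4), (3.0, -0.5), (1.5, 2.0)]:
        print(lam, delta, poisson_factor_mean(lam, delta))
```

Each printed value should be numerically indistinguishable from 1, up to floating-point and truncation error.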

For $u = 1, \ldots, n$, let $\sigma_u = (1[\sigma_u=1], \ldots, 1[\sigma_u=k])^T$ and $\tau_u = (1[\tau_u=1], \ldots, 1[\tau_u=k])^T$. Clearly, $\sigma_u, \tau_u \sim \mathrm{Multinomial}(1, k, p)$ with $p = \frac{1}{k}$. Let $C$ be a $(k^2+2k) \times (k^2+2k)$ diagonal matrix whose first $2k$ diagonal elements equal $c_1$ and whose last $k^2$ diagonal elements equal $c_2$. Let

$$
\rho = (\rho_{10}, \ldots, \rho_{k0}, \rho_{01}, \ldots, \rho_{0k}, \rho_{11}, \rho_{12}, \ldots, \rho_{kk})^T.
$$

Then $Z_n = \rho^TC\rho$. By the central limit theorem, $\rho$ converges in distribution to $N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of $(\sigma_u^T, \tau_u^T, \sigma_u^T \otimes \tau_u^T)^T$.

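Lemma 6.7. The covariance matrix of $(\sigma_u^T, \tau_u^T, \sigma_u^T \otimes \tau_u^T)^T$ has the following expression:

$$
\Sigma = \begin{pmatrix}
V & 0 & V \otimes p^T \\
0 & V & p^T \otimes V \\
V \otimes p & p \otimes V & V_2
\end{pmatrix},
$$

where $V = \mathrm{Var}(\sigma_u) = pI_k - p^2J_k$, $p = E(\sigma_u)$, $V_2 = p^2I_{k^2} - p^4J_{k^2}$, and $J_{k^2}$ is the $k^2 \times k^2$ matrix with all elements equal to 1. Besides, $V^2 = pV$ and $V_2^2 = p^2V_2$. Let

$$
R = \begin{pmatrix}
I_k & 0 & -I_k \otimes p^T \\
0 & I_k & -p^T \otimes I_k \\
0 & 0 & I_{k^2}
\end{pmatrix},\quad
\Lambda = \begin{pmatrix}
V & 0 & 0 \\
0 & V & 0 \\
0 & 0 & \Omega_2
\end{pmatrix},\quad
\Lambda_1 = \begin{pmatrix}
\frac{1}{\sqrt{p}}V & 0 & 0 \\
0 & \frac{1}{\sqrt{p}}V & 0 \\
0 & 0 & \frac{1}{p}\Omega_2
\end{pmatrix},\quad
A = \begin{pmatrix}
c_1I_k & 0 & 0 \\
0 & c_1I_k & 0 \\
0 & 0 & c_2I_{k^2}
\end{pmatrix},
$$

where $\Omega_2 = V_2 - p^2V \otimes J_k - p^2J_k \otimes V$ with $\Omega_2^2 = p^2\Omega_2$. Then $R^T\Sigma R = \Lambda$ and

$$
R^{-1} = \begin{pmatrix}
I_k & 0 & I_k \otimes p^T \\
0 & I_k & p^T \otimes I_k \\
0 & 0 & I_{k^2}
\end{pmatrix},\quad
\Lambda_1R^{-1}A(R^{-1})^T\Lambda_1 = \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & c_2\Omega_2
\end{pmatrix}.
$$

Hence, $Z_n \to c_2p^2\chi^2_{(k-1)^2}$. Furthermore, $\{\exp(Z_n)\}_{n=1}^{\infty}$ is uniformly integrable if $\kappa(k-1)^2 < 1$.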

Proof of Theorem 2.5. We check the conditions of Proposition 6.6. Let $\lambda_h = \frac{1}{2h}\left(\frac{a+(k^{m-1}-1)b}{k^{m-1}(m-2)!}\right)^h$ and $\delta_h = (k-1)\left(\frac{a-b}{a+(k^{m-1}-1)b}\right)^h$. Condition (A1) follows from Lemma 6.5.


Next, we check condition (A2). Let $S = \{1, 2, \ldots, k\}$ and let $H = (H_{hi})_{2 \le h \le s,\, 1 \le i \le j_h}$ be a tuple of $h$-edge loose cycles $H_{hi}$, for any integers $s\ (\ge 2)$ and $j_h\ (\ge 1)$. Define $X_{hn}$ as the number of $h$-edge loose cycles in the hypergraph and $[x]_j = x(x-1)\cdots(x-j+1)$. Note that for any sequence of positive integers $j_2, \ldots, j_s$, we have

$$
E_0Y_n[X_{2n}]_{j_2}\cdots[X_{sn}]_{j_s} = \sum_{H \in B}E_0Y_n1_H + \sum_{H \in \bar{B}}E_0Y_n1_H, \tag{36}
$$

where $B$ is the collection of disjoint tuples $H$ and $\bar{B}$ is its complement, that is, a tuple in $\bar{B}$ contains two cycles with at least one vertex in common. Direct computation yields

E0Yn1H =1

kn

∑σ∈Sn

E01H∏

i∈c(m,n)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im

=1

kn

∑σ∈Sn

E01H∏

(i1,...,im)∈E(H)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im

×E0

∏(i1,...,im)6∈E(H)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im

=1

kn

∑σ∈Sn

E01H∏

(i1,...,im)∈E(H)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im,

where the second equality follows by the independence of Ai1:im . Define σ1hi and σ2hi to be therestrictions of σ on V(Hhi) and [n]\V(Hhi). Similarly, σ1 and σ2 are the restrictions of σ on V(H)and [n]\V(H). Then by the above equation, we have

E0Yn1H =1

kn

∑σ1∈S|V(H)|

∑σ2∈S[n]/|V(H)|

E01H∏

(i1,...,im)∈E(H)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im

= k−|V(H)|∑

σ1∈S|V(H)|

E01H∏

(i1,...,im)∈E(H)

(pi1:im(σ1)

p0

)Ai1:im (qi1:im(σ1)

q0

)1−Ai1:im.

Since Ai1:im = 1 for (i1, . . . , im) ∈ E(H), the above equals

k−M1pM10

∑σ1∈S|V(H)|

∏(i1,...,im)∈E(H)

pi1:im(σ1)

p0= Eσ1

∏(u,v)∈E(H)

pi1:im(σ1)

=

s∏h=2

jh∏i=1

Eσ1hi

∏(i1,...,im)∈E(Hhi)

pi1:im(σ1hi) =

s∏h=2

jh∏i=1

Eσ1hi

∏(i1,...,im)∈E(Hhi)

Mσ1hii1

,...,σ1hiim

n

=s∏

h=2

jh∏i=1

Eτhi

Mτhii1...τhiim

Mτhiim ...τhi2m−1

. . .Mτhii(h−1)(m−1)...τhii1

nh(m−1)

=s∏

h=2

jh∏i=1

Tr(Mh0 )

kh(m−1)nh(m−1),

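where we used Lemma 6.4 for the last equality. Note that $\#B = \frac{n!}{(n-M_1)!}\prod_{h=2}^{s}\left(\frac{1}{2h[(m-2)!]^h}\right)^{j_h}$, where $M_1 = (m-1)\sum_{h=2}^{s}hj_h$. Hence, by Lemma 6.3, the first term in (36) is

$$
\begin{aligned}
\#B \times \prod_{h=2}^{s}\prod_{i=1}^{j_h}\frac{\mathrm{Tr}(M_0^h)}{k^{h(m-1)}n^{h(m-1)}}
&= \frac{n!}{(n-M_1)!\,n^{M_1}}\prod_{h=2}^{s}\left[\frac{1}{2h[(m-2)!]^h}\left(d^h + \frac{(k-1)(a-b)^h}{k^{h(m-1)}}\right)\right]^{j_h} \\
&= \frac{n!}{(n-M_1)!\,n^{M_1}}\prod_{h=2}^{s}[\lambda_h(1+\delta_h)]^{j_h} \to \prod_{h=2}^{s}[\lambda_h(1+\delta_h)]^{j_h}.
\end{aligned}
$$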

For H ∈ B, one has

E0Yn1H = k−n∑σ∈Sn

E01H∏

(i1,...,im)∈E(H)

(pi1:im(σ)

p0

)Ai1:im (qi1:im(σ)

q0

)1−Ai1:im

= k−n∑σ∈Sn

∏(i1,...,im)∈E(H)

pi1:im(σ)

p0

P0(H)

≤ k−np|E(H)|0

∑σ∈Sn

p−|E(H)|0

( a

nm−1

)|E(H)|=( a

nm−1

)|E(H)|.

Then it follows that ∑H′ isomorphic to H

E0Yn1H ≤( a

nm−1

)|E(H)|(

n

|V(H)|

)|V(H)|!→ 0,

and∑

H∈B E0Yn1H → 0. Hence, E0Yn[X2n]j2 . . . [Xsn]js →∏sh=2[λh(1 + δh)]jh .

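Then we check condition (A3). By (A1) and (A2), we have $\frac{\mu_h}{\lambda_h} - 1 = \frac{\lambda_h(1+\delta_h)}{\lambda_h} - 1 = \delta_h$. Besides,

$$
\lambda_h\delta_h^2 = \frac{(k-1)^2}{2h}\left(\frac{(a-b)^2}{k^{m-1}(m-2)!\,(a+(k^{m-1}-1)b)}\right)^h = \frac{(k-1)^2\kappa^h}{2h}.
$$

If $\kappa < 1$, then $\sum_{h=2}^{\infty}\lambda_h\delta_h^2 < \infty$.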

Lastly, we check condition (A4). Note that

E0Y2n = (1 + o(1)) exp

1

nm−1d

((nm

)(b− d)2 + (a− b)2

∑i∈c(m,n)


+(a− b)(b− d)(∑

i∈c(m,n)

I[σi1 : σim ] +∑

i∈c(m,n)

I[ηi1 : ηim ])).

Let C = (i1, . . . , im)|∃is, it : is = it, is′ 6= it′ if s′, t′ /∈ s, t. Then∑

i1,i2,...,im

I[σi1 : σim ]I[ηi1 : ηim ] = m!∑

i∈c(m,n)

I[σi1 : σim ]I[ηi1 : ηim ]+∑C

I[σi1 : σim ]I[ηi1 : ηim ]+O(nm−2).

Direct computation yields∑i∈c(m,n)


=1

m!

∑i1,i2,...,im

I[σi1 : σim ]I[ηi1 : ηim ]− 1

m!

∑C

I[σi1 : σim ]I[ηi1 : ηim ] +O(nm−2)

=1

m!

k∑s,t=1

(√nρst +

n

k2)m − 1

m!

(m

2

) k∑s,t=1

(√nρst +

n

k2)m−1 +O(nm−2)


=1

m!

nm

k2m−2+

1

m!

(m2

)nm−1

k2(m−2)

k∑s,t=1

ρ2st

[1 +

m−2∑i=1

1

k2i

(mi+2

)(m2

) ( ρst√n

)i]

−(m2

)m!

k2nm−1

k2(m−1)−(m2

)nm−1

m!

k∑s,t=1

m−1∑i=1

(m− 1

i

)1

k2(m−1−i)

( ρst√n

)i+O(nm−2).

Similarly, one gets

∑i∈c(m,n)

I[σi1 : σim ] =1

m!

nm

km−1+

1

m!

(m2

)nm−1

k(m−2)

k∑s=1

ρ2s0

[1 +

m−2∑i=1

1

ki

(mi+2

)(m2

) ( ρs0√n

)i]

−(m2

)m!

knm−1

k(m−1)−(m2

)nm−1

m!

k∑s=1

m−1∑i=1

(m− 1

i

)1

k(m−1−i)

( ρs0√n

)i+O(nm−2)

and

∑i∈c(m,n)

I[ηi1 : ηim ] =1

m!

nm

km−1+

1

m!

(m2

)nm−1

k(m−2)

k∑t=1

ρ20t

[1 +

m−2∑i=1

1

ki

(mi+2

)(m2

) ( ρ0t√n

)i]

−(m2

)m!

knm−1

k(m−1)−(m2

)nm−1

m!

k∑t=1

m−1∑i=1

(m− 1

i

)1

k(m−1−i)

( ρ0t√n

)i+O(nm−2).

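Note that $\binom{n}{m} = \frac{n^m}{m!} - \frac{\binom{m}{2}}{m!}n^{m-1} + O(n^{m-2})$ and

$$
\frac{n^m}{m!}\left(\frac{(a-b)^2}{k^{2(m-1)}} + \frac{2(a-b)(b-d)}{k^{m-1}} + (b-d)^2\right) = \frac{n^m}{m!}\left(\frac{a-b}{k^{m-1}} + (b-d)\right)^2 = 0,
$$

$$
\frac{\binom{m}{2}n^{m-1}}{m!}\left(\frac{k^2(a-b)^2}{k^{2(m-1)}} + \frac{2k(a-b)(b-d)}{k^{m-1}} + (b-d)^2\right) = \frac{\binom{m}{2}n^{m-1}}{m!}\cdot\frac{(k-1)^2(a-b)^2}{k^{2(m-1)}}.
$$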

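Let $c_1 = \frac{\binom{m}{2}}{m!\,d}\cdot\frac{(a-b)(b-d)}{k^{m-2}}$ and $c_2 = \frac{\binom{m}{2}}{m!\,d}\cdot\frac{(a-b)^2}{k^{2(m-2)}}$. Note that $\left|\frac{\rho_{st}}{\sqrt{n}}\right| \le 1$, $\left|\frac{\rho_{0t}}{\sqrt{n}}\right| \le 1$, $\left|\frac{\rho_{t0}}{\sqrt{n}}\right| \le 1$, and $\left|\frac{\rho_{st}}{\sqrt{n}}\right| \to 0$, $\left|\frac{\rho_{0t}}{\sqrt{n}}\right| \to 0$, $\left|\frac{\rho_{t0}}{\sqrt{n}}\right| \to 0$ in probability. Hence,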

Zn = c2

k∑s,t=1

ρ2st

[1 +

m−2∑i=1

1

k2i

(mi+2

)(m2

) ( ρst√n

)i]

+c1

( k∑t=1

ρ20t

[1 +

m−2∑i=1

1

ki

(mi+2

)(m2

) ( ρ0t√n

)i]+

k∑s=1

ρ2s0

[1 +

m−2∑i=1

1

ki

(mi+2

)(m2

) ( ρs0√n

)i])and Zn = c2

∑ks,t=1 ρ

2st + c1

(∑kt=1 ρ

20t +

∑ks=1 ρ

2s0

)are asymptotically equivalent.

If τ1(m, k) ≤ 1, then 1 +∑m−2

i=11ki

( mi+2)(m2 )

(ρs0√n

)i≥ 0, hence

Zn ≤ c2

k∑s,t=1

ρ2st

[1 +

m−2∑i=1

1

k2i

(mi+2

)(m2

) ( ρst√n

)i]≤ c2τ2(m)

k∑s,t=1

ρ2st.


Let fj = 1√n

∑ju=1

( (1[σu=1]1[ηu=1] − 1

k2

), . . . ,

(1[σu=k]1[ηu=k] − 1

k2

) )Tand dj = fj − fj−1. Then

‖dj‖2 = 1nk2−1k2

and b2∗ =∑n

j=1 ‖dj‖2 = k2−1k2

. By Theorem 3.5 in Pinelis [55], for any t > 0,

P

(exp

c2τ2(m)‖fn‖2

> t

)= P

(c2τ2(m)‖fn‖2 > log(t)

)= P

(‖fn‖ >

√log(t)

c2τ2(m)

)

≤ 2 exp

(− log(t)

κ(k2 − 1)τ2(m)

)= 2t

− 1κ(k2−1)τ2(m) .

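Hence, the condition $\kappa(k^2-1)\tau_2(m,k) < 1$ implies that $\{\exp(Z_n)\}_{n=1}^{\infty}$ is uniformly integrable. By Lemma 6.7, we conclude that $Z_n$ converges to $\frac{c_2}{k^2}\chi^2_{(k-1)^2}$. Since $\kappa(k^2-1)\tau_2(m,k) < 1$ implies $\kappa(k-1)^2 < 1$ and $\frac{c_2}{k^2} < \frac{1}{2}$, it follows that

$$
\begin{aligned}
E_0Y_n^2 &\to \exp\left\{-\frac{\binom{m}{2}}{m!\,d}\cdot\frac{(k-1)^2(a-b)^2}{k^{2(m-1)}}\right\}E\exp\left\{\frac{c_2}{k^2}\chi^2_{(k-1)^2}\right\} \\
&= \exp\left\{-\frac{\binom{m}{2}}{m!\,d}\cdot\frac{(k-1)^2(a-b)^2}{k^{2(m-1)}}\right\}\exp\left\{-\frac{(k-1)^2}{2}\log\left(1-\frac{2c_2}{k^2}\right)\right\}
= \exp\left\{\sum_{h=2}^{\infty}\lambda_h\delta_h^2\right\},
\end{aligned}
$$

where we used the fact that

$$
\frac{(k-1)^2}{2}\left(\frac{2c_2}{k^2}\right)^h\frac{1}{h} = \frac{(k-1)^2}{2h}\left(\frac{a+(k^{m-1}-1)b}{k^{m-1}(m-2)!}\right)^h\left(\frac{(a-b)^2}{(a+(k^{m-1}-1)b)^2}\right)^h = \lambda_h\delta_h^2.
$$

Obviously, $E_0Y_n = 1$. Hence, $H_0$ and $H_1$ are contiguous.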
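The last step rests on the identity $2c_2/k^2 = \kappa$ and on the series $-\log(1-\kappa) = \sum_{h\ge 1}\kappa^h/h$, whose $h=1$ term is cancelled by the first exponential. The sketch below is a numerical illustration only. It takes $\kappa = \frac{(a-b)^2}{k^{m-1}(m-2)!\,(a+(k^{m-1}-1)b)}$ (the expression appearing in the check of condition (A3)) and the $\lambda_h$, $\delta_h$ defined at the start of the proof; the function names and the truncation at 200 terms are assumptions of this sketch.

```python
import math


def kappa(a, b, k, m):
    """SNR expression as it appears in the check of condition (A3)."""
    return (a - b) ** 2 / (k ** (m - 1) * math.factorial(m - 2) * (a + (k ** (m - 1) - 1) * b))


def lam(a, b, k, m, h):
    """lambda_h from the proof of Theorem 2.5."""
    return ((a + (k ** (m - 1) - 1) * b) / (k ** (m - 1) * math.factorial(m - 2))) ** h / (2 * h)


def delta(a, b, k, m, h):
    """delta_h from the proof of Theorem 2.5."""
    return (k - 1) * ((a - b) / (a + (k ** (m - 1) - 1) * b)) ** h


def series_sum(a, b, k, m, terms=200):
    """Truncated sum over h >= 2 of lambda_h * delta_h^2 (moderate a, b only)."""
    return sum(lam(a, b, k, m, h) * delta(a, b, k, m, h) ** 2 for h in range(2, terms))


def closed_form(a, b, k, m):
    """-(k-1)^2 / 2 * (log(1 - kappa) + kappa), valid for kappa < 1."""
    kap = kappa(a, b, k, m)
    return -((k - 1) ** 2) / 2 * (math.log(1 - kap) + kap)


if __name__ == "__main__":
    for a, b, k, m in [(3, 1, 2, 3), (5, 1, 3, 3), (6, 2, 2, 4)]:
        print(a, b, k, m, series_sum(a, b, k, m), closed_form(a, b, k, m))
```

The two printed columns should agree to floating-point accuracy whenever $\kappa < 1$.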

The proof of Theorem 2.6 relies on the following lemma.

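Lemma 6.8. If $\sigma_i, \tau_i \in \{\pm\}$ for $i = 1, \ldots, n$, then it holds that

$$
\begin{aligned}
&\sum_{i_1,\ldots,i_m=1}^{n}\left[\prod_{l=1}^{m-1}(\sigma_{i_l}\sigma_{i_{l+1}}+1)-1\right]\left[\prod_{l=1}^{m-1}(\tau_{i_l}\tau_{i_{l+1}}+1)-1\right] \\
&\quad= \sum_{\substack{2 \le t,s \le m \\ t,\,s \text{ even}}}\ \sum_{\max\{0,\,t+s-m\} \le c \le \min\{s,t\}}\frac{m!}{c!\,(t-c)!\,(s-c)!\,(m-t-s+c)!} \\
&\qquad\times n^{m-t-s+c}\left(\sum_{i=1}^{n}\sigma_i\tau_i\right)^c\left(\sum_{i=1}^{n}\sigma_i\right)^{t-c}\left(\sum_{i=1}^{n}\tau_i\right)^{s-c}.
\end{aligned}
\tag{37}
$$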

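Proof of Lemma 6.8. Let

$$
J_m := \sum_{i_1,\ldots,i_m=1}^{n}\left[\prod_{l=1}^{m-1}(\sigma_{i_l}\sigma_{i_{l+1}}+1)-1\right]\left[\prod_{l=1}^{m-1}(\tau_{i_l}\tau_{i_{l+1}}+1)-1\right],
$$

which can be expressed as

$$
J_m = \sum_{i_1,\ldots,i_m=1}^{n}\ \sum_{\substack{t,\,s \text{ even} \\ 1 \le l_1 < \cdots < l_t \le m \\ 1 \le h_1 < \cdots < h_s \le m}}\sigma_{i_{l_1}}\cdots\sigma_{i_{l_t}}\tau_{i_{h_1}}\cdots\tau_{i_{h_s}}.
$$

Let $c = \#\left(\{l_1, \ldots, l_t\} \cap \{h_1, \ldots, h_s\}\right)$. Taking the sum over $i_1, \ldots, i_m = 1, \ldots, n$, the terms of $J_m$ with given $(t, s, c)$ have the common expression $n^{m-t-s+c}\left(\sum_{i=1}^{n}\sigma_i\tau_i\right)^c\left(\sum_{i=1}^{n}\sigma_i\right)^{t-c}\left(\sum_{i=1}^{n}\tau_i\right)^{s-c}$, and there are $\binom{m}{c}\binom{m-c}{t-c}\binom{m-t}{s-c}$ such terms. The proof is completed.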
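Identity (37) can be verified by brute force for small $n$ and $m$. The sketch below is an illustration, not part of the paper; it encodes the labels $\pm$ as the integers $\pm 1$ (an assumption of this sketch), evaluates the left-hand side by enumerating all $n^m$ index tuples, and compares it with the right-hand side of (37). The function names `lhs_37` and `rhs_37` are hypothetical.

```python
from itertools import product
from math import factorial


def lhs_37(sigma, tau, m):
    """Left-hand side of (37): sum over all index tuples (i_1, ..., i_m)."""
    n = len(sigma)
    total = 0
    for idx in product(range(n), repeat=m):
        prod_s = 1
        prod_t = 1
        for l in range(m - 1):
            prod_s *= sigma[idx[l]] * sigma[idx[l + 1]] + 1
            prod_t *= tau[idx[l]] * tau[idx[l + 1]] + 1
        total += (prod_s - 1) * (prod_t - 1)
    return total


def rhs_37(sigma, tau, m):
    """Right-hand side of (37): sum over even t, s and admissible overlaps c."""
    n = len(sigma)
    sum_st = sum(x * y for x, y in zip(sigma, tau))
    sum_s = sum(sigma)
    sum_t = sum(tau)
    total = 0
    for t in range(2, m + 1, 2):
        for s in range(2, m + 1, 2):
            for c in range(max(0, t + s - m), min(s, t) + 1):
                coef = factorial(m) // (
                    factorial(c) * factorial(t - c) * factorial(s - c) * factorial(m - t - s + c)
                )
                total += coef * n ** (m - t - s + c) * sum_st ** c * sum_s ** (t - c) * sum_t ** (s - c)
    return total


if __name__ == "__main__":
    sigma = (1, -1, 1, 1)
    tau = (1, 1, -1, 1)
    for m in (2, 3, 4):
        print(m, lhs_37(sigma, tau, m), rhs_37(sigma, tau, m))
```

The check works because, for $\pm 1$ labels, $\prod_{l=1}^{m-1}(\sigma_{i_l}\sigma_{i_{l+1}}+1)$ equals the sum of $\prod_{l \in L}\sigma_{i_l}$ over all even-sized subsets $L$ of the $m$ positions.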


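Proof of Theorem 2.6. For convenience, we prove Part (2) first.

Proof of Part (2). For simplicity, assume that the random labels $\sigma_i$ are i.i.d. uniformly distributed over $\{\pm\}$, similarly to [49]. The likelihood ratio can be written explicitly as

$$
Y_n = 2^{-n}\sum_{\tau \in \{\pm\}^n}\ \prod_{i_1<\cdots<i_m}\left(\frac{p_{i_1:i_m}(\tau)}{p_0}\right)^{A_{i_1:i_m}}\left(\frac{q_{i_1:i_m}(\tau)}{q_0}\right)^{1-A_{i_1:i_m}}.
$$

Recall that

$$
p_{i_1:i_m}(\sigma) = \left(\frac{a}{n^{m-1}}\right)^{1[\sigma_{i_1}=\cdots=\sigma_{i_m}]}\left(\frac{b}{n^{m-1}}\right)^{1-1[\sigma_{i_1}=\cdots=\sigma_{i_m}]},\qquad q_{i_1:i_m}(\sigma) = 1 - p_{i_1:i_m}(\sigma),
$$

$$
p_0 = \frac{a+(2^{m-1}-1)b}{(2n)^{m-1}},\qquad q_0 = 1 - p_0.
$$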

It turns out that when $m \ge 3$ the second moment of $Y_n$ under $H_0$ is hard to control asymptotically, so the second moment method considered by [49] does not work. Instead, we use a truncation technique to show that $Y_n = O_{P_1}(1)$, which implies that $H_1$ is contiguous to $H_0$, where $P_1$ is the probability measure of $A$ under $H_1$. With a slight abuse of notation, we also use $P_1$ to denote the joint probability measure of $A$ and $\sigma$ under $H_1$. Choose a large $C > 0$ so that $P_1\left(\left|\sum_{i=1}^{n}\sigma_i\right| \ge C\sqrt{n}\right)$ is small. It suffices to show that $P_1(Y_n \ge C)$ is small when $C$ is large. Define the truncated likelihood ratio $\tilde{Y}_n = Y_n1[|\sigma| \le C\sqrt{n}]$, where $|\sigma| := \left|\sum_{i=1}^{n}\sigma_i\right|$. To proceed, consider

$$
\begin{aligned}
P_1(Y_n \ge C) &= P_1(Y_n \ge C,\ \sigma \in \{\pm\}^n) \\
&\le P_1(Y_n \ge C,\ |\sigma| \le C\sqrt{n}) + P_1(|\sigma| > C\sqrt{n}) \\
&\le P_1(\tilde{Y}_n \ge C) + P_1(|\sigma| > C\sqrt{n}) \\
&\le E_1\tilde{Y}_n/C + P_1(|\sigma| > C\sqrt{n}).
\end{aligned}
$$

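Next we show that the first term in the last inequality is small when $C$ is large. Note that

$$
\begin{aligned}
E_1\tilde{Y}_n &= 2^{-n}\sum_{\tau \in \{\pm\}^n}E_1\prod_{i_1<\cdots<i_m}\left(\frac{p_{i_1:i_m}(\tau)}{p_0}\right)^{A_{i_1:i_m}}\left(\frac{q_{i_1:i_m}(\tau)}{q_0}\right)^{1-A_{i_1:i_m}}1[|\sigma| \le C\sqrt{n}] \\
&= 4^{-n}\sum_{\tau \in \{\pm\}^n}\ \sum_{|\sigma| \le C\sqrt{n}}E_1\left[\prod_{i_1<\cdots<i_m}\left(\frac{p_{i_1:i_m}(\tau)}{p_0}\right)^{A_{i_1:i_m}}\left(\frac{q_{i_1:i_m}(\tau)}{q_0}\right)^{1-A_{i_1:i_m}}\,\Bigg|\,\sigma\right] \\
&= 4^{-n}\sum_{\tau \in \{\pm\}^n}\ \sum_{|\sigma| \le C\sqrt{n}}\ \prod_{i_1<\cdots<i_m}\left[\frac{p_{i_1:i_m}(\sigma)p_{i_1:i_m}(\tau)}{p_0} + \frac{q_{i_1:i_m}(\sigma)q_{i_1:i_m}(\tau)}{q_0}\right] \\
&= 4^{-n}\sum_{\tau \in \{\pm\}^n}\ \sum_{|\sigma| \le C\sqrt{n}}\ \prod_{i_1<\cdots<i_m}\left[\frac{1}{p_0}\left(\frac{a}{n^{m-1}}\right)^{N^{\sigma\tau}_{i_1:i_m}}\left(\frac{b}{n^{m-1}}\right)^{2-N^{\sigma\tau}_{i_1:i_m}}\right. \\
&\qquad\qquad\left. + \frac{1}{q_0}\left(1-\frac{a}{n^{m-1}}\right)^{N^{\sigma\tau}_{i_1:i_m}}\left(1-\frac{b}{n^{m-1}}\right)^{2-N^{\sigma\tau}_{i_1:i_m}}\right],
\end{aligned}
$$

where $N^{\sigma\tau}_{i_1:i_m} = 1[\sigma_{i_1}=\cdots=\sigma_{i_m}] + 1[\tau_{i_1}=\cdots=\tau_{i_m}]$. Therefore,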

E1Yn = 4−n∑

τ∈±n

∑|σ|≤C

√n

∏Nστi1:im

=0

[1

p0

(b

nm−1

)2

+1

q0

(1− b

nm−1

)2]

×∏

Nστi1:im

=1

[1

p0

( a

nm−1

)( b

nm−1

)+

1

q0

(1− a

nm−1

)(1− b

nm−1

)]

×∏

Nστi1:im

=2

[1

p0

( a

nm−1

)2+

1

q0

(1− a

nm−1

)2]

= 4−n∑

τ∈±n

∑|σ|≤C

√n

[1

p0

(b

nm−1

)2

+1

q0

(1− b

nm−1

)2]Mστ

0

×[

1

p0

( a

nm−1

)( b

nm−1

)+

1

q0

(1− a

nm−1

)(1− b

nm−1

)]Mστ1

×[

1

p0

( a

nm−1

)2+

1

q0

(1− a

nm−1

)2]Mστ

2

,

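where $M^{\sigma\tau}_t = \#\{i_1 < \cdots < i_m : N^{\sigma\tau}_{i_1:i_m} = t\}$ for $t = 0, 1, 2$. Straightforward calculations lead to

$$
\begin{aligned}
\frac{1}{p_0}\left(\frac{b}{n^{m-1}}\right)^2 + \frac{1}{q_0}\left(1-\frac{b}{n^{m-1}}\right)^2 &= 1 + \gamma + O(n^{-3(m-1)}), \\
\frac{1}{p_0}\left(\frac{a}{n^{m-1}}\right)\left(\frac{b}{n^{m-1}}\right) + \frac{1}{q_0}\left(1-\frac{a}{n^{m-1}}\right)\left(1-\frac{b}{n^{m-1}}\right) &= 1 - (2^{m-1}-1)\gamma + O(n^{-3(m-1)}), \\
\frac{1}{p_0}\left(\frac{a}{n^{m-1}}\right)^2 + \frac{1}{q_0}\left(1-\frac{a}{n^{m-1}}\right)^2 &= 1 + (2^{m-1}-1)^2\gamma + O(n^{-3(m-1)}),
\end{aligned}
\tag{38}
$$

where

$$
\gamma = \frac{(m-2)!\,\kappa}{n^{m-1}} + \frac{1}{n^{2(m-1)}}\left(\frac{a-b}{2^{m-1}}\right)^2,
$$

and $\kappa$ is the SNR defined in (4) with $k = 2$ therein. Dropping the $O(n^{-3(m-1)})$ terms in (38), we get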

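$$
\begin{aligned}
E_1\tilde{Y}_n &\sim 4^{-n}\sum_{\tau \in \{\pm\}^n}\ \sum_{|\sigma| \le C\sqrt{n}}(1+\gamma)^{M^{\sigma\tau}_0}\left(1-(2^{m-1}-1)\gamma\right)^{M^{\sigma\tau}_1}\left(1+(2^{m-1}-1)^2\gamma\right)^{M^{\sigma\tau}_2} \\
&\sim 4^{-n}\sum_{\tau \in \{\pm\}^n}\ \sum_{|\sigma| \le C\sqrt{n}}\exp\left(\gamma\left(M^{\sigma\tau}_0 - (2^{m-1}-1)M^{\sigma\tau}_1 + (2^{m-1}-1)^2M^{\sigma\tau}_2\right)\right) \\
&= E_{\tau\sigma}\left[\exp\left(\gamma\left(M^{\sigma\tau}_0 - (2^{m-1}-1)M^{\sigma\tau}_1 + (2^{m-1}-1)^2M^{\sigma\tau}_2\right)\right)1_{|\sigma| \le C\sqrt{n}}\right].
\end{aligned}
\tag{39}
$$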
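The three expansions in (38) can be checked with exact rational arithmetic. The sketch below is an illustration with hypothetical names: it uses $(m-2)!\,\kappa = \frac{(a-b)^2}{2^{m-1}(a+(2^{m-1}-1)b)}$ (the $k=2$ form of the SNR used later in this proof), writes $N = n^{m-1}$ and $d = \frac{a+(2^{m-1}-1)b}{2^{m-1}}$, and returns the remainders multiplied by $N^3$, which should stay bounded as $n$ grows.

```python
from fractions import Fraction


def scaled_remainders(a, b, m, n):
    """Return N^3 * (exact - approximation) for the three lines of (38), N = n^(m-1)."""
    big_n = Fraction(n) ** (m - 1)
    big_k = 2 ** (m - 1)
    p0 = Fraction(a + (big_k - 1) * b, big_k) / big_n
    q0 = 1 - p0
    pa = a / big_n
    pb = b / big_n
    # gamma = (m-2)! kappa / N + ((a - b) / 2^(m-1))^2 / N^2
    gamma = (Fraction((a - b) ** 2, big_k * (a + (big_k - 1) * b)) / big_n
             + Fraction(a - b, big_k) ** 2 / big_n ** 2)
    exact = (
        pb ** 2 / p0 + (1 - pb) ** 2 / q0,
        pa * pb / p0 + (1 - pa) * (1 - pb) / q0,
        pa ** 2 / p0 + (1 - pa) ** 2 / q0,
    )
    approx = (1 + gamma, 1 - (big_k - 1) * gamma, 1 + (big_k - 1) ** 2 * gamma)
    return [(e - g) * big_n ** 3 for e, g in zip(exact, approx)]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, [float(r) for r in scaled_remainders(5, 1, 3, n)])
```

For $a = 5$, $b = 1$, $m = 3$ one has $d = 2$, and the scaled remainders approach $d(d-b)^2$, $d(d-a)(d-b)$ and $d(d-a)^2$, which is consistent with the $O(n^{-3(m-1)})$ error terms.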

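Since $M^{\sigma\tau}_0 + M^{\sigma\tau}_1 + M^{\sigma\tau}_2 = \binom{n}{m}$, we get

$$
M^{\sigma\tau}_0 - (2^{m-1}-1)M^{\sigma\tau}_1 + (2^{m-1}-1)^2M^{\sigma\tau}_2 = -(2^{m-1}-1)\binom{n}{m} + 2^{m-1}\left(M^{\sigma\tau}_0 - M^{\sigma\tau}_2\right) + 2^{2(m-1)}M^{\sigma\tau}_2.
$$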


Rewrite

Mστ0 =

∑i1<···<im

(1− 1[σi1=···=σim ])(1− 1[τi1=···=τim ])

Mστ2 =

∑i1<···<im

1[σi1=···=σim ]1[τi1=···=τim ].

So

Mστ0 − (2m−1 − 1)Mστ

1 + (2m−1 − 1)2Mστ2

= −(2m−1 − 1)

(n

m

)+ 2m−1

[(n

m

)−

∑i1<···<im

1[σi1=···=σim ] −∑

i1<···<im

1[τi1=···=τim ]

]+22(m−1)

∑i1<···<im

1[σi1=···=σim ]1[τi1=···=τim ]

=

(n

m

)− 2m−1

∑i1<···<im

1[σi1=···=σim ] − 2m−1∑

i1<···<im

1[τi1=···=τim ]

+22(m−1)∑

i1<···<im

1[σi1=···=σim ]1[τi1=···=τim ],

and by 1[σi1=···=σim ] = 2−(m−1)∏m−1l=1 (σilσil+1

+ 1), the above is equal to

(n

m

)−

∑i1<···<im

m−1∏l=1

(σilσil+1+ 1)−

∑i1<···<im

m−1∏l=1

(τilτil+1+ 1)

+∑

i1<···<im

m−1∏l=1

(σilσil+1+ 1)

m−1∏l=1

(τilτil+1+ 1)

=∑

i1<···<im

[m−1∏l=1

(σilσil+1+ 1)− 1

][m−1∏l=1

(τilτil+1+ 1)− 1

]

=1

m!

n∑i1,...,im=1

[m−1∏l=1

(σilσil+1+ 1)− 1

][m−1∏l=1

(τilτil+1+ 1)− 1

]+O(nm−1)

≡ 1

m!Jm +O(nm−1).
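The chain of equalities above reduces the weighted count $M^{\sigma\tau}_0 - (2^{m-1}-1)M^{\sigma\tau}_1 + (2^{m-1}-1)^2M^{\sigma\tau}_2$ to a sum of products over ordered index sets $i_1 < \cdots < i_m$. A brute-force check of that reduction, with labels encoded as $\pm 1$ and hypothetical function names (both assumptions of this sketch), can look as follows.

```python
from itertools import combinations


def all_equal(labels, idx):
    """1 if all labels at positions idx coincide, else 0."""
    return int(len({labels[i] for i in idx}) == 1)


def weighted_count(sigma, tau, m):
    """M0 - (2^(m-1) - 1) M1 + (2^(m-1) - 1)^2 M2 over all i_1 < ... < i_m."""
    big_k = 2 ** (m - 1)
    counts = [0, 0, 0]
    for idx in combinations(range(len(sigma)), m):
        counts[all_equal(sigma, idx) + all_equal(tau, idx)] += 1
    return counts[0] - (big_k - 1) * counts[1] + (big_k - 1) ** 2 * counts[2]


def product_form(sigma, tau, m):
    """Sum over i_1 < ... < i_m of [prod(sigma sigma + 1) - 1][prod(tau tau + 1) - 1]."""
    total = 0
    for idx in combinations(range(len(sigma)), m):
        prod_s = 1
        prod_t = 1
        for l in range(m - 1):
            prod_s *= sigma[idx[l]] * sigma[idx[l + 1]] + 1
            prod_t *= tau[idx[l]] * tau[idx[l + 1]] + 1
        total += (prod_s - 1) * (prod_t - 1)
    return total


if __name__ == "__main__":
    sigma = (1, -1, 1, 1, -1, 1)
    tau = (1, 1, -1, 1, -1, -1)
    for m in (2, 3, 4):
        print(m, weighted_count(sigma, tau, m), product_form(sigma, tau, m))
```

Each index set contributes $1$, $-(2^{m-1}-1)$ or $(2^{m-1}-1)^2$ according to whether $N^{\sigma\tau}_{i_1:i_m}$ is $0$, $1$ or $2$, which is exactly what the product form evaluates to.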

Note that Jm has an expression given by Lemma 6.8. Fix even 2 ≤ t, s ≤ m and |σ| ≤ C√n.

Consider the following cases:Case 1. If c satisfies Lemma 6.8 with t− c ≥ 2, then

nm−t−s+c(n∑i=1

σiτi)c(

n∑i=1

σi)t−c(

n∑i=1

τi)s−c ≤ Ct−cnm−(t−c)/2 ≤ Ct−cnm−1.


Case 2. Suppose that c satisfies Lemma 6.8 with t− c = 1, hence, c = t−1 ≤ s. If s = t−1, then

nm−t−s+c|(n∑i=1

σiτi)c(

n∑i=1

σi)t−c(

n∑i=1

τi)s−c| = nm−t|

n∑i=1

σiτi|t−1|n∑i=1

σi|

≤ Cnm−t+1/2|n∑i=1

σiτi|t−1

≤ Cnm−5/2(n∑i=1

σiτi)2.

If s > t− 1, then


σiτi)c(

n∑i=1

σi)t−c(

n∑i=1

τi)s−c| ≤ Cnm−5/2|

n∑i=1

σiτi| · |n∑i=1

τi|.

Case 3. Suppose that c satisfies Lemma 6.8 with t− c = 0. Then


σiτi)c(

n∑i=1

σi)t−c(

n∑i=1

τi)s−c| ≤ nm−2(

n∑i=1

σiτi)2.

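Note that the number of times that Case 3 occurs is

$$
\begin{aligned}
N_m &\equiv \sum_{\substack{2 \le t \le s \le m \\ t,\,s \text{ even}}}\frac{m!}{t!\,(s-t)!\,(m-s)!}
= \sum_{\substack{2 \le s \le m \\ s \text{ even}}}\binom{m}{s}\sum_{\substack{2 \le t \le s \\ t \text{ even}}}\binom{s}{t}
= \sum_{\substack{2 \le s \le m \\ s \text{ even}}}\binom{m}{s}\left(2^{s-1}-1\right) \\
&= [3^m + (-1)^m]/4 - 2^{m-1} + 1/2.
\end{aligned}
$$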
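The closed form for $N_m$ is easy to confirm by direct enumeration. The sketch below (function names are hypothetical) compares the defining double sum with $[3^m + (-1)^m]/4 - 2^{m-1} + 1/2$, rewritten over a common denominator so that only integers are involved.

```python
from math import factorial


def n_m_direct(m):
    """Sum of m! / (t! (s-t)! (m-s)!) over even t, s with 2 <= t <= s <= m."""
    total = 0
    for s in range(2, m + 1, 2):
        for t in range(2, s + 1, 2):
            total += factorial(m) // (factorial(t) * factorial(s - t) * factorial(m - s))
    return total


def n_m_closed(m):
    """[3^m + (-1)^m]/4 - 2^(m-1) + 1/2 written as a single integer quotient."""
    return (3 ** m + (-1) ** m - 2 ** (m + 1) + 2) // 4


if __name__ == "__main__":
    for m in range(2, 9):
        print(m, n_m_direct(m), n_m_closed(m))
```

For instance, $m = 3$ gives $N_3 = 3$ and $m = 4$ gives $N_4 = 13$ under both formulas.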

Based on the above Cases 1-3, we get that

E1Yn ∼ Eστ exp

(γ

[1

m!Jm +O(nm−1)

])1[|σ|≤C

√n]

≤ Eστ exp

(γ

[Nm

m!nm−2(

n∑i=1

σiτi)2 +O(nm−5/2)|

n∑i=1

σi| · |n∑i=1

τi|

+O(nm−5/2)(

n∑i=1

σiτi)2 +O(nm−1)

])

≤ Eστ exp

([κ(m− 2)!Nm

m!+O(n−1/2)

](n−1/2

n∑i=1

σiτi)2 +O(1)|n−1/2

n∑i=1

σiτi| · |n−1n∑i=1

τi|

)

≤ Eστ exp

([κ(m− 2)!Nm

m!+O(n−1/2) + ε)

](n−1/2

n∑i=1

σiτi)2 +

C2

ε(n−1

n∑i=1

τi)2

),


where ε > 0 is small so that κ(m−2)!Nmm! +O(n−1/2)+ε < 1/2. Due to the independence of

∑ni=1 σiτi

and∑n

i=1 τi, we get

E1Yn

≤ Eστ exp

([κ(m− 2)!Nm

m!+O(n−1/2) + ε)

](n−1/2

n∑i=1

σiτi)2

)× Eστ exp

(C2

ε(n−1

n∑i=1

τi)2

).

(40)

By Hoeffding's inequality and uniform integrability, the two expectations in the last display are bounded, which leads to $\tilde{Y}_n = O_{P_1}(1)$ and so $Y_n = O_{P_1}(1)$. Hence $H_1$ is contiguous to $H_0$. Suppose there exists a valid asymptotically powerful test, that is, one satisfying $P_0(\text{reject } H_0) \to 0$ and $P_1(\text{reject } H_0) \to 1$. Since $H_1$ is contiguous to $H_0$, $P_0(\text{reject } H_0) \to 0$ forces $P_1(\text{reject } H_0) \to 0$, a contradiction. This finishes the proof of Part (2).

Proof of Part (1). We conduct a finer analysis on Case 3. Let ρ = 1n

∑ni=1 σiτi and η = 1

n

∑ni=1 τi.

Then ∑2≤t≤s≤mt, s even

(m

s

)(s

t

)nm−s(

n∑i=1

σiτi)t(

n∑i=1

τi)s−t

= nm∑

2≤t≤s≤mt, s even

(m

s

)(s

t

)ρtηs−t

= nmρ2∑

0≤s≤m−2s even

(m

s+ 2


(s+ 2

t+ 2

)ρtηs−t

≤ m(m− 1)

2nmρ2

∑0≤s≤m−2s even

(m− 2

s


(s

t

)ρtηs−t

=m(m− 1)

8nmρ2

[(ρ+ η + 1)m−2 + (−ρ− η + 1)m−2 + (−ρ+ η + 1)m−2 + (ρ− η + 1)m−2

].

Notice that

|ρ+ η + 1| = | 1n

n∑i=1

(σi + 1)τi + 1| ≤ 1 +1

n

n∑i=1

|1 + σi| = 1 +2

n|S+σ |,

where S+σ = i : σi = +. Since |σ| ≤ C

√n, |S+

σ | ∼ n/2, leading to that |ρ+ η + 1| ≤ 2. Similarly,one can show that | − ρ− η + 1|, | − ρ+ η + 1|, |ρ− η + 1| are all less than or equal to 2. So

∑2≤t≤s≤mt, s even

(m

s

)(s

t

)nm−s(

n∑i=1

σiτi)t(

n∑i=1

τi)s−t ≤ m(m− 1)2m

8nmρ2.

Similarly to (40), when $\kappa 2^m/8 < 1/2$, i.e., $\kappa < 2^{2-m}$, $E_1\tilde{Y}_n$ is asymptotically bounded, i.e., $H_1$ is asymptotically contiguous to $H_0$. This shows the desired conclusion.

Next we prove the lower bound for κ so that contiguity may fail. We begin with a slight modifi-


cation of the likelihood ratio. Consider

Yn = 2−n∑

τ∈±n

∏i1<···<im

(pi1:im(τ)

p0

)Ai1:im (qi1:im(τ)

q0

)1−Ai1:im

= 2−n∑τ

∏i∈Sτ

(a

nm−1p0Ai +

1− anm−1

1− p0(1−Ai)

)×∏i/∈Sτ

(b

nm−1p0Ai +

1− bnm−1

1− p0(1−Ai)

),

where Sτ = 1 ≤ i1 < · · · < im ≤ n : τi1 = · · · = τim. Define S = 1 ≤ i1 < · · · < im ≤ n : Ai = 1,then we have

Yn = 2−n∑τ

(a

nm−1p0

)|Sτ∩S|( b

nm−1p0

)|Sτ∩S|(1− anm−1

1− p0

)|Sτ∩S|(1− bnm−1

1− p0

)|Sτ∩S|

= 2−n∑τ

(a

nm−1p0

)|Sτ∩S|( b

nm−1p0

)|Sτ∩S|× exp

(−(2m−1 − 1)(a− b)

(2n)m−1|Sτ ∩ S|+

a− b(2n)m−1

|Sτ ∩ S|).

By Chernoff inequality [15],

P1

( ∑i∈Sτ∩Sσ

(Ai −a

nm−1) ≤ −

√2an2−m

|Sτ ∩ Sσ||Sτ ∩ Sσ|

)≤ e−n,

which leads to

P1

minτ

∑i∈Sτ∩Sσ(Ai − a

nm−1 )√a

nm−1 |Sτ ∩ Sσ|≤ −√

2n

≤ e−n2n → 0.

Similarly,

P1

minτ

∑i∈Sτ∩Sσ(Ai − b

nm−1 )√b

nm−1 |Sτ ∩ Sσ|≤ −√

2n

→ 0,

P1

maxτ


nm−1 )√n+

√n+ 2a

nm−1 |Sτ ∩ Sσ|≥√n

→ 0,

P1

maxτ


nm−1 )√n+

√n+ 2b

nm−1 |Sτ ∩ Sσ|≥√n

→ 0.

Define

E1 =

minτ


nm−1 )√a

nm−1 |Sτ ∩ Sσ|≥ −√

2n,minτ


nm−1 )√b

nm−1 |Sτ ∩ Sσ|≥ −√

2n

,

E2 =

maxτ


nm−1 )√n+

√n+ 2a

nm−1 |Sτ ∩ Sσ|≤√n,max

τ


nm−1 )√n+

√n+ 2b

nm−1 |Sτ ∩ Sσ|≤√n

.


Therefore, P (E1 ∩ E2)→ 1. On E1 ∩ E2, for any τ ∈ ±n,

|Sτ ∩ S| −∑i∈Sτ

qi(σ) = −∑

i∈Sτ∩Sσ

(Ai −a

nm−1)−

∑i∈Sτ∩Sσ

(Ai −b

nm−1)

≥ −√n

(2√n+

√n+

2a

nm−1|Sτ ∩ Sσ|+

√2b

nm−1|Sτ ∩ Sσ|

)= O(n),

and

|Sτ ∩ S| −∑i∈Sτ

qi(σ) = −∑

i∈Sτ∩Sσ

(Ai −a

nm−1)−

∑i∈Sτ∩Sσ

(Ai −b

nm−1)

≤√

2n

(√a

nm−1|Sτ ∩ Sσ|+

√b

nm−1|Sτ ∩ Sσ|

)= O(n).

Therefore, on E1 ∩ E2,

Yn & 2−n∑τ

(a

nm−1p0

)|Sτ∩S|( b

nm−1p0

)|Sτ∩S|

× exp

−(2m−1 − 1)(a− b)(2n)m−1

∑i∈Sτ

qi(σ) +a− b

(2n)m−1

∑i∈Sτ

qi(σ)

≡ Yn.Clearly,

Yn ≥ 2−n(

a

nm−1p0

)|Sσ∩S|( b

nm−1p0

)|Sσ∩S|× exp

−(2m−1 − 1)(a− b)(2n)m−1

∑i∈Sσ

qi(σ) +a− b

(2n)m−1

∑i∈Sσ

qi(σ)

.

It is easy to see that

a

nm−1p0= t ≡ 2m−1a

a+ (2m−1 − 1)b,

b

nm−1p0= r ≡ 2m−1b

a+ (2m−1 − 1)b.

Let Xσ = |Sσ ∩ S|, Yσ = |Sσ ∩ S|, and let θσ ∈ (0, 1) satisfy

− log θσ = (t− 1− log t)E1(Xσ|σ) + (r − 1− log r)E1(Yσ|σ)

+C√

(log t)2E1(Xσ|σ) + (log r)2E1(Yσ|σ).

By Markov inequality,

P1

(tXσrYσ ≤ θσE1(tXσrYσ |σ)|σ

)= P1 ((Xσ − E1(Xσ|σ)) log t+ (Yσ − E1(Yσ|σ)) log r

≤ log θσ + (t− 1− log t)E1(Xσ|σ) + (r − 1− log r)E1(Yσ|σ))

≤ (log t)2E1(Xσ|σ) + (log r)2E1(Yσ|σ)

(log θσ + (t− 1− log t)E1(Xσ|σ) + (r − t− log r)E1(Yσ|σ))2 = C−2,


therefore, one can choose a sufficiently large C > 0 such that P1(E3)→ 1, where

E3 = tXσrYσ ≥ θσE1(tXσrYσ |σ).

On E3,

Yn ≥ 2−nθσE1(tXσrYσ |σ)× exp

−(2m−1 − 1)(a− b)(2n)m−1

∑i∈Sσ

qi(σ) +a− b

(2n)m−1

∑i∈Sσ

qi(σ)

∼ 2−nθσ exp

((2m−1 − 1)(a− b)a+ (2m−1 − 1)b

· a

nm−1|Sσ| −

(2m−1 − 1)(a− b)(2n)m−1

(1− a

nm−1

)|Sσ|

− a− ba+ (2m−1 − 1)b

· b

nm−1|Sσ|+

a− b(2n)m−1

(1− b

nm−1

)|Sσ|

)= 2−nθσ exp

((2m−1 − 1)2(a− b)2

2m−1(a+ (2m−1 − 1)b)· |Sσ|nm−1

+(a− b)2

2m−1(a+ (2m−1 − 1)b)· |Sσ|nm−1

+O(1)

).

Since |S+σ | ∼ n/2 and |S−σ | ∼ n/2, we have

|Sσ| ∼ 2

(n/2

m

)∼ nm

m!2m−1, |Sσ| =

(n

m

)− |Sσ| ∼ (1− 2−m+1)

nm

m!.

Therefore,

(41) Yn & 2−nθσ exp

((2m−1 − 1)(a− b)2

2m−1m!(a+ (2m−1 − 1)b)· n).

It is easy to see that

log θσ ≥ −(t− 1− log t)|Sσ|a

nm−1− (r − 1− log r)|Sσ|

b

nm−1+O(

√n)

∼ −[

(2m−1 − 1)(a− b)a+ (2m−1 − 1)b

− log

(2m−1a

a+ (2m−1 − 1)b

)]an

m!2m−1

+

[a− b

a+ (2m−1 − 1)b+ log

(2m−1b

a+ (2m−1 − 1)b

)](1− 2−m+1)bn

m!

= − (2m−1 − 1)(a− b)2

m!2m−1(a+ (2m−1 − 1)b)· n+ δ(m, a, b) · n,

where

δ(m, a, b) =1

m!2m−1

[a log

(2m−1a

a+ (2m−1 − 1)b

)+ (2m−1 − 1)b log

(2m−1b

a+ (2m−1 − 1)b

)].

Therefore, on E1 ∩ E2 ∩ E3, we have

Yn & 2−n exp(δ(m, a, b) · n).

For convenience, write a = c+ ε and b = c− ε. For any κ0 >m(m−1) log 2

2m−1−1, suppose that the SNR κ

for H0 and H1 is κ0, which leads to that

(a− b)2

2m−1(m− 2)!(a+ (2m−1 − 1)b)=

4ε2

2m−1(m− 2)!(2ε+ 2m−1(c− ε))= κ.


Suppose that both c and ε tend to infinity but ε grows slower than c, then the above equation leadsto that

ε2

c→ (2m−1)2(m− 2)!κ

4.

Meanwhile, it is easy to find that

δ(m, a, b)

=1

m!2m−1

[(c+ ε) log

(2m−1(c+ ε)

2ε+ 2m−1(c− ε)

)+ (2m−1 − 1)(c− ε) log

(2m−1(c− ε)

2ε+ 2m−1(c− ε)

)]=

1

m!2m−1

[(c+ ε) log

(1 +

2ε(2m−1 − 1)

2ε+ 2m−1(c− ε)

)+ (2m−1 − 1)(c− ε) log

(1− 2ε

2ε+ 2m−1(c− ε)

)]∼ 1

m!2m−1

[(c+ ε) · 2ε(2m−1 − 1)

2ε+ 2m−1(c− ε)− (2m−1 − 1)(c− ε) 2ε

2ε+ 2m−1(c− ε)

]∼ 1

m!2m−1

[(c+ ε)

2ε(2m−1 − 1)

2m−1c− (2m−1 − 1)(c− ε) 2ε

2m−1c

]=

4(2m−1 − 1)

m!(2m−1)2· ε

2

c→ (2m−1 − 1)κ

m(m− 1)> log 2.

Therefore, fixing the above large $c, \varepsilon$ so that $\delta(m, a, b) > \log 2$, one has that, with $P_1$-probability approaching one,

$$
Y_n \gtrsim \exp\left((\delta(m, a, b) - \log 2)\,n\right) \to \infty.
$$

Note that $E_0Y_n = 1$, which leads to $Y_n = O_{P_0}(1)$, while under $P_1$, $Y_n$ tends to infinity, so $H_0$ and $H_1$ are distinguishable. This completes the proof of Part (1).

6.5. Proof of Theorem 2.8 and Theorem 2.9. The proof relies on the following lemma.

Lemma 6.9. Under the condition of Theorem 2.8, we have

(42) E(E − E)2 = O(a2

1

n

),

(43) E(V − V )2 = O(a4

1

n

),

(44) E(T − T )2 = O( a3

1

n3(m−l)

),

(45)

√(n

3(m−l))(m− l)(T − T )√T

d→ N(0, 1).

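Proof of Theorem 2.8. It is easy to check the following expansion:

$$
\begin{aligned}
\hat{T} - \left(\frac{\hat{V}}{\hat{E}}\right)^3 &= T - \left(\frac{V}{E}\right)^3 + (\hat{T} - T) \\
&\quad + \left(\frac{V}{E} - \frac{\hat{V}}{\hat{E}}\right)^3 - 3\,\frac{V}{E}\left(\frac{\hat{V}}{\hat{E}} - \frac{V}{E}\right)^2 + 3\left(\frac{V}{E}\right)^2\frac{V - \hat{V}}{E} \\
&\quad - 3\left(\frac{V}{E}\right)^2\left(\frac{1}{\hat{E}} - \frac{1}{E}\right)V - 3\left(\frac{V}{E}\right)^2\left(\frac{1}{\hat{E}} - \frac{1}{E}\right)(\hat{V} - V).
\end{aligned}
\tag{46}
$$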
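Expansion (46) is a purely algebraic identity: it combines $x^3 - y^3 = (x-y)^3 + 3y(x-y)^2 + 3y^2(x-y)$ for $x = \hat{V}/\hat{E}$, $y = V/E$ with the decomposition $\hat{V}/\hat{E} - V/E = (\hat{V}-V)/E + V(1/\hat{E} - 1/E) + (\hat{V}-V)(1/\hat{E} - 1/E)$. The sketch below checks it with exact rational arithmetic; the placement of hats on the estimators and the function names are assumptions of this illustration.

```python
from fractions import Fraction


def lhs_46(t_hat, v_hat, e_hat):
    """Left-hand side: T_hat - (V_hat / E_hat)^3."""
    return t_hat - (v_hat / e_hat) ** 3


def rhs_46(t, v, e, t_hat, v_hat, e_hat):
    """Right-hand side of (46), term by term."""
    y = v / e
    x = v_hat / e_hat
    inv_diff = 1 / e_hat - 1 / e
    return (
        t - y ** 3
        + (t_hat - t)
        + (y - x) ** 3
        - 3 * y * (x - y) ** 2
        + 3 * y ** 2 * (v - v_hat) / e
        - 3 * y ** 2 * inv_diff * v
        - 3 * y ** 2 * inv_diff * (v_hat - v)
    )


if __name__ == "__main__":
    t, v, e = Fraction(2, 7), Fraction(3, 5), Fraction(4, 9)
    t_hat, v_hat, e_hat = Fraction(1, 3), Fraction(5, 8), Fraction(3, 7)
    print(lhs_46(t_hat, v_hat, e_hat) == rhs_46(t, v, e, t_hat, v_hat, e_hat))
```

Because the identity holds for arbitrary nonzero $E$ and $\hat{E}$, the stochastic orders of the individual terms are what matter in the rest of the proof, not the algebra.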


By Lemma 6.9, the first two terms in (46) are the leading terms and hence we have√(n

3(m−l))(m− l)

(T −

(V

E

)3)√T

−

√(n

3(m−l))(m− l)

(T −

(VE

)3)√T

=

√(n

3(m−l))(m− l)

(T − T

)√T

d→ N(0, 1).

Since T = T + oP (1), we have√(n

3(m−l))(m− l)

(T −

(V

E

)3)√T

− δ d→ N(0, 1),

which completes the proof.

Proof of Theorem 2.9. We rewrite the statistic as

2

√(n

3(m− l)

)(m− l)

(√T −

( VE

) 32

)

= 2

√(n

3(m− l)

)(m− l)

T −(VE

)3

√T +

(V

E

) 32

+ 2

√(n

3(m− l)

)(m− l) T − T√

T +(V

E

) 32

+ oP (1).

The first term is of the same order as δ, while the second term is bounded in probability. Hence,we get the desired result.

Acknowledgement. We are grateful to the co-Editor Professor Richard Samworth, the AE,and referees for their insightful comments which greatly improved the quality and scope of the paper.We thank Sumit Mukherjee for suggesting the truncation technique which motivates Theorem 2.6.

REFERENCES

[1] Abbe, E. (2017). Community detection and stochastic block models: recent developments. Journal of MachineLearning Research, 18, 1-86.

[2] Abbe, E. and Sandon, C. (2017). Proof of the achievability conjectures for the general stochastic block model.Communications on Pure and Applied Mathematics, 71(7), 1334-1406.

[3] Abbe, E. and Sandon, C. (2016). Detection in the stochastic block model with multiple clusters: proof of theachievability conjectures, acyclic BP, and the information-computation gap. https://arxiv.org/pdf/1512.09080.pdf.

[4] Agarwal, S., Branson, K. and Belongie, S. (2006). Higher order learning with graphs. Proceedings of the Interna-tional Conference on Machine Learning, 17-24.

[5] Amini, A., Chen, A. and Bickel, P. (2013). Pseudo-likelihood methods for community detection in large sparsenetworks. Annals of Statistics, 41(4), 2097-2122.

[6] Angelini, M., Caltagirone,F., Krzakala, F. and Zdeborova, L. (2015). Spectral detection on sparse hypergraphs.Allerton Conference on Communication, Control, and Computing, 66–73.

[7] Ahn, K., Lee, K. and Suh, C.(2016). Community recovery in hypergraphs, 54th Annual Allerton Conference onCommunication, Control, and Computing (Allerton). DOI: 10.1109/ALLERTON.2016.7852294.

[8] Ahn, K., Lee, K. and Suh, C. (2018). Hypergraph spectral clustering in the weighted stochastic block model. IEEE Journal of Selected Topics in Signal Processing, 12(5).

[9] Dall'Amico, L., Couillet, R. and Tremblay, N. (2019). Revisiting the Bethe-Hessian: Improved community detection in sparse heterogeneous graphs. NIPS 2019.

[10] Dall'Amico, L. and Couillet, R. (2019). Community detection in sparse realistic graphs: Improving the Bethe Hessian. ICASSP 2019: 18778248.

[11] Banerjee, D. (2018). Contiguity and non-reconstruction results for planted partition models: the dense case. Electronic Journal of Probability, 23, 1-28.

[12] Bolla, M. (1993). Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117(1), 19-39.

[13] Bollobas, B. (2001). Random Graphs. Cambridge University Press, second edition.

[14] Bollobas, B. and Erdos, P. (1976). Cliques in random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 80, 419-427.

[15] Boucheron, S., Lugosi, G. and Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.

[16] Banerjee, D. and Ma, Z. (2017). Optimal hypothesis testing for stochastic block models with growing degrees. https://arxiv.org/pdf/1705.05305.pdf.

[17] Banks, J., Moore, C., Neeman, J. and Netrapalli, P. (2016). Information-theoretic thresholds for community detection in sparse networks. JMLR: Workshop and Conference Proceedings 49, 1-34.

[18] Bickel, P. J. and Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society, Series B, 78, 253-273.

[19] Chien, I., Lin, C. and Wang, I. (2018). Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, 84:871-879.

[20] Chertok, M. and Keller, Y. (2010). Efficient high order matching. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(12), 2205-2215.

[21] Chen, J. and Yuan, B. (2006). Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics, 22(18), 2283-2290.

[22] Decelle, A., Krzakala, F., Moore, C. and Zdeborova, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84, 066106.

[23] Erdos, P. and Renyi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5, 17-61.

[24] Estrada, E. and Rodriguez-Velazquez, J. (2005). Complex networks as hypergraphs. https://arxiv.org/ftp/physics/papers/0505/0505137.pdf

[25] Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

[26] Fosdick, B. K. and Hoff, P. D. (2015). Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association, 110, 1047-1056.

[27] Frieze, A. and Karonski, M. (2015). Introduction to Random Graphs. Cambridge University Press.

[28] Florescu, L. and Perkins, W. (2016). Spectral thresholds in the bipartite stochastic block model. 29th Annual Conference on Learning Theory, 49: 943-959.

[29] Govindu, V. M. (2005). A tensor decomposition for geometric grouping and segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1150-1157.

[30] Ghoshdastidar, D. and Dukkipati, A. (2014). Consistency of spectral partitioning of uniform hypergraphs under planted partition model. Advances in Neural Information Processing Systems (NIPS), 397-405.

[31] Ghoshdastidar, D. and Dukkipati, A. (2017). Consistency of spectral hypergraph partitioning under planted partition model. The Annals of Statistics, 45(1), 289-315.

[32] Gibson, D., Kleinberg, J. and Raghavan, P. (2000). Clustering categorical data: An approach based on dynamical systems. VLDB Journal, 8, 222-236.

[33] Gao, C. and Lafferty, J. (2017a). Testing for global network structure using small subgraph statistics. https://arxiv.org/pdf/1710.00862.pdf

[34] Gao, C. and Lafferty, J. (2017b). Testing network structure using relations between small subgraph probabilities. https://arxiv.org/pdf/1704.06742.pdf

[35] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2016). Community detection in degree-corrected block models. https://arxiv.org/pdf/1607.06993.pdf.

[36] Gao, Z. and Wormald, N. (2004). Asymptotic normality determined by high moments, and submap counts of random maps. Probab. Theory Relat. Fields, 130(3), 368–376.

[37] Ghoshal, G., Zlatic, V., Caldarelli, G. and Newman, M. E. J. (2009). Random hypergraphs and their applications. Physical Review E 79.

[38] Goldenberg, A., Zheng, A. X. S., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2), 129-233.

[39] Hall, P. and Heyde, C. C. (2014). Martingale Limit Theory and Its Application. Academic Press.

[40] Janson, S. (1995). Random regular graphs: asymptotic distributions and contiguity. Combinatorics, Probability and Computing, 4, 369–405.

[41] Kim, C., Bandeira, A. and Goemans, M. (2017). Community detection in hypergraphs, spiked tensor models, and sum-of-squares. 2017 International Conference on Sampling Theory and Applications (SampTA), 124-128.

[42] Ke, Z., Shi, F. and Xia, D. (2020). Community detection for hypergraph networks via regularized tensor power iteration. https://arxiv.org/pdf/1909.06503.pdf.

[43] Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1): 016107.

[44] Lei, J. (2016). A goodness-of-fit test for stochastic block models. Annals of Statistics, 44, 401-424.

[45] Lin, C., Chien, I. and Wang, I. (2017). On the fundamental statistical limit of community detection in random hypergraphs. Information Theory (ISIT), 2017 IEEE International Symposium, 2178-2182.

[46] Leskovec, J., Lang, K. L., Dasgupta, A. and Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. Proceedings of the 17th International Conference on World Wide Web, 695-704. ACM.

[47] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1), 215-237.

[48] Michoel, T. and Nachtergaele, B. (2012). Alignment and integration of complex networks by hypergraph-based spectral clustering. Physical Review E 86.

[49] Mossel, E., Neeman, J. and Sly, A. (2015). Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162, 431-461.

[50] Mossel, E., Neeman, J. and Sly, A. (2017). A proof of the block model threshold conjecture. Combinatorica, 1-44.

[51] Montanari, A. and Sen, S. (2016). Semidefinite programs on sparse random graphs and their application to community detection. STOC '16 Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, 814-827.

[52] Newman, M. (2001). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64, 016131.

[53] Neeman, J. and Netrapalli, P. (2014). Non-reconstructability in the stochastic block model. https://arxiv.org/abs/1404.6304

[54] Ouvrard, X., Goff, J. and Marchand-Maillet, S. (2017). Networks of collaborations: hypergraph modeling and visualisation. https://arxiv.org/pdf/1707.00115.pdf.

[55] Pinelis, I. (1994). Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, 22(4), 1679-1706.

[56] Rodriguez, J. A. (2009). Laplacian eigenvalues and partition problems in hypergraphs. Applied Mathematics Letters, 22(6), 916-921.

[57] Ramasco, J., Dorogovtsev, S. N. and Pastor-Satorras, R. (2004). Self-organization of collaboration networks. Phys. Rev. E 70, 036106.

[58] Rota Bulo, S. and Pelillo, M. (2013). A game-theoretic approach to hypergraph clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1312-1327.

[59] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905.

[60] Wormald, N. C. (1999). Models of random regular graphs. London Mathematical Society Lecture Note Series, 239-298. Cambridge University Press.

[61] Yuan, M., Feng, Y. and Shang, Z. (2018a). A likelihood-ratio type test for stochastic block models with bounded degrees. https://arxiv.org/pdf/1807.04426.pdf

[62] Yuan, M., Feng, Y. and Shang, Z. (2018b). Inference on multi-community stochastic block models with bounded degree. Manuscript.

[63] Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natn. Acad. Sci. USA, 108, 7321-7326.

[64] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Annals of Statistics, 40, 2266-2292.

[65] Tokuda, T. (2018). Statistical test for detecting community structure in real-valued edge-weighted graphs. PLoS ONE, 13(3), e0194079.

Submitted to the Annals of Statistics

Supplement to

TESTING COMMUNITY STRUCTURE FOR HYPERGRAPHS

This supplement contains the proofs of Lemmas 6.3, 6.4, 6.5, 6.7, 6.9 and Propositions 2.7, 2.11.

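Proof of Lemma 6.3. Note that $M_0 = (a-b)I + k^{m-2}bJ$, where $I$ is the $k\times k$ identity matrix and $J$ is the $k\times k$ matrix with every entry equal to 1. For any real number $\lambda$, we have

\[
M_0-\lambda I = (a-b-\lambda)I + k^{m-2}bJ = k^{m-2}b\Big(J-\frac{\lambda-a+b}{k^{m-2}b}\,I\Big).
\]

Then $\det(M_0-\lambda I)=0$ implies that $\det\big(J-\frac{\lambda-a+b}{k^{m-2}b}I\big)=0$. The eigenvalues of $J$ are $k$ and $0$, the latter with multiplicity $k-1$. Setting $\frac{\lambda-a+b}{k^{m-2}b}$ equal to $0$ and to $k$ gives $\lambda = a-b$ and $\lambda = a+(k^{m-1}-1)b$, and the desired result follows.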
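The eigenvalue computation above can be checked numerically. The following is a minimal sketch, assuming only the form $M_0=(a-b)I+k^{m-2}bJ$ stated in the proof; the function names `m0` and `predicted_eigenvalues` and the parameter values are illustrative and do not come from the paper.

```python
import numpy as np


def m0(a, b, k, m):
    """Return M0 = (a - b) I + k^(m-2) b J as a dense k x k array."""
    return (a - b) * np.eye(k) + k ** (m - 2) * b * np.ones((k, k))


def predicted_eigenvalues(a, b, k, m):
    """Eigenvalues obtained by solving (lambda - a + b) / (k^(m-2) b) in {0, k}:
    a - b with multiplicity k - 1, and a + (k^(m-1) - 1) b once."""
    top = a + (k ** (m - 1) - 1) * b
    return np.sort(np.array([a - b] * (k - 1) + [top], dtype=float))


# Illustrative parameter values (hypothetical, not taken from the paper).
a, b, k, m = 5.0, 2.0, 3, 4
numeric = np.sort(np.linalg.eigvalsh(m0(a, b, k, m)))
print(numeric)
print(predicted_eigenvalues(a, b, k, m))
```

Since $M_0$ is symmetric, `numpy.linalg.eigvalsh` is the natural routine here; the two printed arrays should agree up to floating-point error.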

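Proof of Lemma 6.4. Let $I_j = (i_{(j-1)m-j+3},\dots,i_{jm-j})$. Then we have

\[
\begin{aligned}
&\sum_{i_1,\dots,i_{jm-j}\in\{1,\dots,k\}} M_{i_1i_2\dots i_m}M_{i_m\dots i_{2m-1}}M_{i_{2m-1}\dots i_{3m-2}}\cdots M_{i_{(j-1)m-(j-2)}\dots i_{jm-j}i_1}\\
&\qquad=\sum_{I_1,I_2,\dots,I_j}\ \sum_{i_1,i_m,i_{2m-1},\dots,i_{(j-1)m-(j-2)}\in\{1,2,\dots,k\}} M_{i_1I_1i_m}M_{i_mI_2i_{2m-1}}\cdots M_{i_{(j-1)m-(j-2)}I_ji_1}\\
&\qquad=\sum_{I_1,I_2,\dots,I_j}\mathrm{Tr}\big(M(I_1)M(I_2)\cdots M(I_j)\big),
\end{aligned}
\]

where $M(I_t) = (M_{iI_ts})_{i,s=1}^k$ is a $k\times k$ matrix. By the definition of $M_{i_1i_2\dots i_m}$, summing over all values of $I_t$ gives

\[
\begin{aligned}
\sum_{I_t}M(I_t) &=
\begin{pmatrix} a & b & \dots & b\\ b & b & \dots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \dots & b\end{pmatrix}
+\begin{pmatrix} b & b & \dots & b\\ b & a & \dots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \dots & b\end{pmatrix}
+\dots+
\begin{pmatrix} b & b & \dots & b\\ b & b & \dots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \dots & a\end{pmatrix}
+\sum_{I_t:\ \text{entries not all equal}}M(I_t)\\
&=\begin{pmatrix} a+(k-1)b & kb & \dots & kb\\ kb & a+(k-1)b & \dots & kb\\ \vdots & \vdots & \ddots & \vdots\\ kb & kb & \dots & a+(k-1)b\end{pmatrix}
+(k^{m-2}-k)\begin{pmatrix} b & b & \dots & b\\ b & b & \dots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \dots & b\end{pmatrix}
= M_0,
\end{aligned}
\]

which completes the proof.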
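The index bookkeeping in this proof is easy to sanity-check by brute force. The sketch below assumes the tensor has entry $a$ when all $m$ indices coincide and $b$ otherwise, which is how the matrices displayed in the proof read; the helper names `community_tensor` and `sum_over_interior` and the parameter values are hypothetical.

```python
import numpy as np


def community_tensor(a, b, k, m):
    """Order-m tensor with entry a when all m indices coincide and b otherwise."""
    M = np.full((k,) * m, float(b))
    for c in range(k):
        M[(c,) * m] = a
    return M


def sum_over_interior(M):
    """Sum an order-m tensor (m >= 3) over its m - 2 interior indices,
    leaving a k x k matrix indexed by the first and last index."""
    return M.sum(axis=tuple(range(1, M.ndim - 1)))


# Illustrative parameter values (hypothetical, not taken from the paper).
a, b, k, m = 5.0, 2.0, 3, 4
S = sum_over_interior(community_tensor(a, b, k, m))
M0 = (a - b) * np.eye(k) + k ** (m - 2) * b * np.ones((k, k))
print(np.allclose(S, M0))
```

With the interior sum equal to $M_0$, the closed chain of $j$ factors reduces to $\mathrm{Tr}(M_0^j)$, which is the quantity the eigenvalues of Lemma 6.3 evaluate.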

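Proof of Lemma 6.5. Let $H$ be a graph on a subset of $[n]$ with vertex set $V(H)$ and edge set $E(H)$. For any sequence of positive integers $j_2, j_3,\dots,j_s$, we have

\[
\prod_{h=2}^s[X_{hn}]_{j_h} = \sum_{(H_{hi})}\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}}.
\]

Then

\[
\mathbb{E}_0\prod_{h=2}^s[X_{hn}]_{j_h} = \sum_{(H_{hi})}\mathbb{E}_0\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}}
= \sum_{(H_{hi})\in B}\mathbb{E}_0\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}} + \sum_{(H_{hi})\in\bar{B}}\mathbb{E}_0\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}}. \tag{47}
\]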


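The summand in the first term of (47) can be calculated as follows:

\[
\mathbb{E}_0\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}} = \mathbb{E}_\tau\mathbb{E}_0\Big[\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}}\,\Big|\,\tau\Big]
= \prod_{h=2}^s\prod_{i=1}^{j_h}\mathbb{E}_{\tau_{hi}}\prod_{(i_1,\dots,i_m)\in E(H_{hi})}\frac{d}{n^{m-1}}
= \prod_{h=2}^s\prod_{i=1}^{j_h}\frac{d^h}{n^{h(m-1)}}.
\]

Note that $\#B = \frac{n!}{(n-M_1)!}\prod_{h=2}^s\Big(\frac{1}{2h((m-2)!)^h}\Big)^{j_h}$, where $M_1 = (m-1)\sum_{h=2}^s hj_h$. Hence the first term on the right-hand side of (47), by Lemma 6.3, is

\[
\#B\times\prod_{h=2}^s\prod_{i=1}^{j_h}\frac{d^h}{n^{h(m-1)}} = \frac{n!}{(n-M_1)!\,n^{M_1}}\prod_{h=2}^s\Big[\frac{d^h}{2h((m-2)!)^h}\Big]^{j_h}\to\prod_{h=2}^s\lambda_h^{j_h}.
\]

For $(H_{hi})\in\bar{B}$, $H=\cup H_{hi}$ has at most $M_1-1$ vertices and $\sum_{h=2}^s hj_h$ hyperedges, and hence $|V(H)| < |E(H)|(m-1)$, and

\[
\mathbb{E}_0\prod_{h=2}^s\prod_{i=1}^{j_h}1_{H_{hi}} = \prod_{(i_1,\dots,i_m)\in E(H)}\Big(\frac{a}{n^{m-1}}\Big)^{1[\tau_u=\tau_v]}\Big(\frac{b}{n^{m-1}}\Big)^{1[\tau_u\neq\tau_v]}\le\Big(\frac{a}{n^{m-1}}\Big)^{|E(H)|}.
\]

There are $\binom{n}{|V(H)|}|V(H)|!$ graphs isomorphic to $H$. Then

\[
\sum_{H'\ \text{isomorphic to}\ H}\mathbb{E}_1[1_{H'}|\tau]\le\Big(\frac{a}{n^{m-1}}\Big)^{|E(H)|}\binom{n}{|V(H)|}|V(H)|!\to0.
\]

Since the number of isomorphism classes is bounded, the second term on the right-hand side of (47) goes to zero. Hence, $\mathbb{E}_0\prod_{h=2}^s[X_{hn}]_{j_h}\to\prod_{h=2}^s\lambda_h^{j_h}$, which completes the proof by Lemma 2.8 in Wormald [60].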
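The convergence of the leading term can be illustrated numerically. The sketch below assumes $\lambda_h = \frac{1}{2h}\big(\frac{d}{(m-2)!}\big)^h$, which is the limit implied by the display in the proof; the function names and the values of $d$, $m$ and $j_h$ are hypothetical choices made only for illustration.

```python
from math import factorial, prod


def lambda_h(h, d, m):
    """Limit constant lambda_h = (d / (m-2)!)^h / (2h), as read off the proof."""
    return (d / factorial(m - 2)) ** h / (2 * h)


def first_term(n, d, m, js):
    """n! / ((n - M1)! n^M1) * prod_h [d^h / (2h ((m-2)!)^h)]^{j_h},
    with M1 = (m - 1) * sum_h h * j_h; js maps h to j_h."""
    M1 = (m - 1) * sum(h * j for h, j in js.items())
    falling_ratio = prod((n - r) / n for r in range(M1))  # n!/((n-M1)! n^M1)
    return falling_ratio * prod(lambda_h(h, d, m) ** j for h, j in js.items())


# Illustrative values (hypothetical): 3-uniform, d = 2, one 2-cycle and two 3-cycles.
d, m, js = 2.0, 3, {2: 1, 3: 2}
limit = prod(lambda_h(h, d, m) ** j for h, j in js.items())
for n in (10**2, 10**4, 10**8):
    print(n, first_term(n, d, m, js), limit)
```

The only $n$-dependence is the ratio $n!/((n-M_1)!\,n^{M_1})$, a product of $M_1$ factors each tending to one, so the convergence is at rate $O(1/n)$ for fixed $j_2,\dots,j_s$.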

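Proof of Lemma 6.7. We only need to find $\mathrm{Cov}(\sigma_u,\sigma_u\otimes\tau_u)$, $\mathrm{Cov}(\tau_u,\sigma_u\otimes\tau_u)$ and $\mathrm{Var}(\sigma_u\otimes\tau_u)$. We have

\[
\begin{aligned}
\mathrm{Cov}(\sigma_u,\sigma_u\otimes\tau_u) &= \mathbb{E}\big[(\sigma_u-p)\,\sigma_u^T\otimes\tau_u^T\big]\\
&=\mathbb{E}\begin{pmatrix}
(1[\sigma_u=1]-p)1[\sigma_u=1]\tau_u^T & (1[\sigma_u=1]-p)1[\sigma_u=2]\tau_u^T & \dots & (1[\sigma_u=1]-p)1[\sigma_u=k]\tau_u^T\\
(1[\sigma_u=2]-p)1[\sigma_u=1]\tau_u^T & (1[\sigma_u=2]-p)1[\sigma_u=2]\tau_u^T & \dots & (1[\sigma_u=2]-p)1[\sigma_u=k]\tau_u^T\\
\vdots & \vdots & \ddots & \vdots\\
(1[\sigma_u=k]-p)1[\sigma_u=1]\tau_u^T & (1[\sigma_u=k]-p)1[\sigma_u=2]\tau_u^T & \dots & (1[\sigma_u=k]-p)1[\sigma_u=k]\tau_u^T
\end{pmatrix}\\
&=\begin{pmatrix}
(p-p^2)p^T & -p^2p^T & \dots & -p^2p^T\\
-p^2p^T & (p-p^2)p^T & \dots & -p^2p^T\\
\vdots & \vdots & \ddots & \vdots\\
-p^2p^T & -p^2p^T & \dots & (p-p^2)p^T
\end{pmatrix}
= V\otimes p^T.
\end{aligned}
\]

Similarly one can get $\mathrm{Cov}(\tau_u,\sigma_u\otimes\tau_u) = p^T\otimes V$. The variance of $\sigma_u\otimes\tau_u$ can be calculated as

\[
\begin{aligned}
\mathrm{Cov}(\sigma_u\otimes\tau_u,\sigma_u\otimes\tau_u) &= \mathbb{E}\big[(\sigma_u\otimes\tau_u-p\otimes p)\,\sigma_u^T\otimes\tau_u^T\big]\\
&=\mathbb{E}\begin{pmatrix}
(1[\sigma_u=1]\tau_u-pp)1[\sigma_u=1]\tau_u^T & \dots & (1[\sigma_u=1]\tau_u-pp)1[\sigma_u=k]\tau_u^T\\
(1[\sigma_u=2]\tau_u-pp)1[\sigma_u=1]\tau_u^T & \dots & (1[\sigma_u=2]\tau_u-pp)1[\sigma_u=k]\tau_u^T\\
\vdots & \ddots & \vdots\\
(1[\sigma_u=k]\tau_u-pp)1[\sigma_u=1]\tau_u^T & \dots & (1[\sigma_u=k]\tau_u-pp)1[\sigma_u=k]\tau_u^T
\end{pmatrix}\\
&=\begin{pmatrix}
p^2I_k-p^4J_k & -p^4J_k & \dots & -p^4J_k\\
-p^4J_k & p^2I_k-p^4J_k & \dots & -p^4J_k\\
\vdots & \vdots & \ddots & \vdots\\
-p^4J_k & -p^4J_k & \dots & p^2I_k-p^4J_k
\end{pmatrix}
= p^2I_{k^2}-p^4J_{k^2}.
\end{aligned}
\]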


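Note that $(I_k\otimes p)V = V\otimes p$, $V(I_k\otimes p^T) = V\otimes p^T$, $(p\otimes I_k)V = p\otimes V$ and $V(p^T\otimes I_k) = p^T\otimes V$. Direct computation yields $R^T\Sigma R=\Lambda$ and

\[
\begin{aligned}
\Lambda_1R^{-1}A(R^{-1})^T\Lambda_1
&=\Lambda_1\begin{pmatrix} I_k & 0 & I_k\otimes p^T\\ 0 & I_k & p^T\otimes I_k\\ 0 & 0 & I_{k^2}\end{pmatrix}
\begin{pmatrix} c_1I_k & 0 & 0\\ 0 & c_1I_k & 0\\ 0 & 0 & c_2I_{k^2}\end{pmatrix}
\begin{pmatrix} I_k & 0 & 0\\ 0 & I_k & 0\\ I_k\otimes p & p\otimes I_k & I_{k^2}\end{pmatrix}\Lambda_1\\
&=\Lambda_1\begin{pmatrix} c_1I_k & 0 & c_2I_k\otimes p^T\\ 0 & c_1I_k & c_2p^T\otimes I_k\\ 0 & 0 & c_2I_{k^2}\end{pmatrix}
\begin{pmatrix} I_k & 0 & 0\\ 0 & I_k & 0\\ I_k\otimes p & p\otimes I_k & I_{k^2}\end{pmatrix}\Lambda_1\\
&=\Lambda_1\begin{pmatrix} (c_1+c_2p)I_k & c_2p^2J_k & c_2I_k\otimes p^T\\ c_2p^2J_k & (c_1+c_2p)I_k & c_2p^T\otimes I_k\\ c_2I_k\otimes p & c_2p\otimes I_k & c_2I_{k^2}\end{pmatrix}\Lambda_1\\
&=\begin{pmatrix}\frac{I_k}{\sqrt p} & 0 & 0\\ 0 & \frac{I_k}{\sqrt p} & 0\\ 0 & 0 & \frac{I_{k^2}}{p}\end{pmatrix}
\begin{pmatrix}(c_1+c_2p)V^2 & c_2p^2VJ_kV & c_2V(I_k\otimes p^T)\Omega_2\\ c_2p^2VJ_kV & (c_1+c_2p)V^2 & c_2V(p^T\otimes I_k)\Omega_2\\ c_2\Omega_2(I_k\otimes p)V & c_2\Omega_2(p\otimes I_k)V & c_2\Omega_2^2\end{pmatrix}
\begin{pmatrix}\frac{I_k}{\sqrt p} & 0 & 0\\ 0 & \frac{I_k}{\sqrt p} & 0\\ 0 & 0 & \frac{I_{k^2}}{p}\end{pmatrix}.
\end{aligned}
\]

Note that $VJ_k = J_kV = 0$,

\[
c_1+c_2p = \frac{\binom{m}{2}}{m!\,d}\,\frac{(b-d)(a-b)}{k^{m-2}} + \frac{\binom{m}{2}}{m!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}}\,\frac1k = 0,
\]

\[
\begin{aligned}
\Omega_2(I_k\otimes p)V &= (V_2-p^2V\otimes J_k-p^2J_k\otimes V)(I_k\otimes p)V\\
&= V_2(I_k\otimes p)V - p(V\otimes p)V = p^2(V\otimes p) - p(V\otimes p)(pI_k-p^2J_k) = 0,
\end{aligned}
\]

and $V(p^T\otimes I_k)\Omega_2 = V(I_k\otimes p^T)\Omega_2 = \Omega_2(p\otimes I_k)V = 0$, which yields the desired result.

Let $Q = (\Lambda_1R^{-1})^T$ and $Z\sim N(0,I_{k^2})$. Then the covariance matrix $\Sigma$ can be decomposed as

\[
\Sigma = (R^{-1})^T\Lambda R^{-1} = (\Lambda_1R^{-1})^T(\Lambda_1R^{-1}) = QQ^T.
\]

Hence

\[
\rho A\rho^T\to Z^TQ^TAQZ = Z^T\Lambda_1R^{-1}A(R^{-1})^T\Lambda_1Z = c_2Z^T\Omega_2Z.
\]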

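Note that $\Omega_2^2 = p^2\Omega_2$ implies that the eigenvalues of $\Omega_2$ are either $0$ or $p^2$, and

\[
\begin{aligned}
\mathrm{Tr}(\Omega_2) &= \mathrm{Tr}\big(V_2-p^2V\otimes J_k-p^2J_k\otimes V\big)
= \mathrm{Tr}\big(p^2I_{k^2}-p^3I_k\otimes J_k-p^3J_k\otimes I_k+p^4J_{k^2}\big)\\
&= k^2p^2-p^3k^2-p^3k^2+p^4k^2 = \frac{(k-1)^2}{k^2}.
\end{aligned}
\]

Hence $\Omega_2$ has $(k-1)^2$ eigenvalues equal to $p^2$, with all other eigenvalues equal to $0$. Then $c_2Z^T\Omega_2Z\sim c_2p^2\chi^2_{(k-1)^2}$.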
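The spectral claims about $\Omega_2$ can be verified directly for small $k$. This is a minimal sketch under the assumptions that $p = 1/k$, $V = pI_k - p^2J_k$ and $V_2 = p^2I_{k^2}-p^4J_{k^2}$, which is how the covariance computations earlier in the proof read; the function name `omega2` is hypothetical.

```python
import numpy as np


def omega2(k):
    """Build Omega_2 = V_2 - p^2 (V kron J_k) - p^2 (J_k kron V) with p = 1/k.
    Assumed definitions: V = p I_k - p^2 J_k and V_2 = p^2 I_{k^2} - p^4 J_{k^2}."""
    p = 1.0 / k
    I, J = np.eye(k), np.ones((k, k))
    V = p * I - p ** 2 * J
    V2 = p ** 2 * np.eye(k * k) - p ** 4 * np.ones((k * k, k * k))
    return V2 - p ** 2 * np.kron(V, J) - p ** 2 * np.kron(J, V)


k = 3  # illustrative number of communities
p = 1.0 / k
Om = omega2(k)
eig = np.linalg.eigvalsh(Om)
print(np.allclose(Om @ Om, p ** 2 * Om))          # Omega_2^2 = p^2 Omega_2
print(np.trace(Om), (k - 1) ** 2 / k ** 2)        # trace identity
print(int(np.sum(np.isclose(eig, p ** 2))))       # number of eigenvalues equal to p^2
```

Under these assumptions $\Omega_2 = p^2\,(I_k-pJ_k)\otimes(I_k-pJ_k)$, a scaled Kronecker product of two rank-$(k-1)$ projections, which is why exactly $(k-1)^2$ eigenvalues equal $p^2$.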

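Note that we can rewrite $Z_n$ as

\[
\begin{aligned}
Z_n &= \frac{\binom{m}{2}}{m!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}}\Big(\sum_{s,t}^{k}\rho_{st}^2-\frac1k\Big[\sum_{s=1}^k\rho_{s0}^2+\sum_{t=1}^k\rho_{0t}^2\Big]\Big)\\
&= \frac{1}{2(m-2)!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}}\sum_{s,t=1}^k\Big(\frac{1}{\sqrt n}\sum_{u=1}^n\Big(I[\sigma_u=s]-\frac1k\Big)\Big(I[\eta_u=t]-\frac1k\Big)\Big)^2.
\end{aligned}
\]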


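Let

\[
f_j = \frac{1}{\sqrt n}\sum_{u=1}^j\Big(\Big(1[\sigma_u=1]-\frac1k\Big)\Big(1[\eta_u=1]-\frac1k\Big),\dots,\Big(1[\sigma_u=k]-\frac1k\Big)\Big(1[\eta_u=k]-\frac1k\Big)\Big)^T
\]

and $d_j = f_j-f_{j-1}$. Then $\|d_j\|^2 = \frac1n\frac{(k-1)^2}{k^2}$ and $b_*^2 = \sum_{j=1}^n\|d_j\|^2 = \frac{(k-1)^2}{k^2}$. By Theorem 3.5 in [55], we have for any $t>0$,

\[
\begin{aligned}
P\Big(\exp\Big\{\frac{1}{2(m-2)!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}}\|f_n\|^2\Big\}>t\Big)
&= P\Big(\frac{1}{2(m-2)!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}}\|f_n\|^2>\log(t)\Big)\\
&= P\Bigg(\|f_n\|>\sqrt{\frac{\log(t)}{\frac{1}{2(m-2)!\,d}\frac{(a-b)^2}{k^{2(m-2)}}}}\Bigg)\\
&\le 2\exp\Big(-\frac{\log(t)}{\kappa(k-1)^2}\Big) = 2t^{-\frac{1}{\kappa(k-1)^2}}.
\end{aligned}
\]

Hence, if $\kappa(k-1)^2<1$, $\{\exp(Z_n)\}_{n=1}^\infty$ is uniformly integrable.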
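The value of $\|d_j\|^2$ used in the concentration bound does not depend on the labels, which can be confirmed by enumeration. The sketch below assumes that the coordinates of $f_j$ range over all $k^2$ pairs $(s,t)$, as the double sum in the expression for $Z_n$ suggests; the function name `increment_sq_norm` and the values of $k$ and $n$ are hypothetical.

```python
import itertools
import numpy as np


def increment_sq_norm(sigma, eta, k, n):
    """Squared norm of one increment d_j for labels (sigma, eta) in {0,...,k-1}^2.
    Assumption: coordinates are indexed by all k^2 pairs (s, t)."""
    centred_sigma = np.eye(k)[sigma] - 1.0 / k   # (1[sigma = s] - 1/k)_s
    centred_eta = np.eye(k)[eta] - 1.0 / k       # (1[eta = t] - 1/k)_t
    d = np.outer(centred_sigma, centred_eta).ravel() / np.sqrt(n)
    return float(d @ d)


k, n = 4, 50  # illustrative values
values = {round(increment_sq_norm(s, t, k, n), 12)
          for s, t in itertools.product(range(k), repeat=2)}
print(values, (k - 1) ** 2 / (n * k ** 2))
```

Because every increment has the same squared norm $(k-1)^2/(nk^2)$, summing over $j=1,\dots,n$ gives $b_*^2=(k-1)^2/k^2$ exactly, with no randomness left in the bound.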

Proof of Proposition 2.7. For convenience, we denote $a_1 = \frac{a_n}{n^{m-1}}$ and $b_1 = \frac{b_n}{n^{m-1}}$. Under $H_0$, we have $a_1=b_1$, and then

\[
T = (\mathbb{E}W_1)^{3(m-2l)}\Big[b_1^3-\Big(\frac{b_1^2}{b_1}\Big)^3\Big] = 0.
\]

Under $H_1$, $k\ge2$ and $a_1>b_1$. For $l=1$, direct computation yields

\[
T = (\mathbb{E}W_1)^{3(m-2)}\,\frac{(k-1)(a_1-b_1)^3}{k^{3(m-1)}}\neq0.
\]

Next we assume l ≥ 2, let E1 = (EW1)−mE, V1 = (EW1)−2(m−l)V and T1 = (EW1)−3(m−2l)T . Then

T = (EW1)3(m−2l)[T1 −

( V1

E1

)3].

We calculate $T_1E_1^3-V_1^3$ to get the following:

\[
\begin{aligned}
T_1E_1^3-V_1^3 &= (a_1-b_1)^6\,\frac{1-k^{-1}}{k^{6m-3l-4}} + 3(a_1-b_1)^5b_1\Big(\frac{k^l-2}{k^{5m-2l-3}}+\frac{1}{k^{5m-l-4}}\Big)\\
&\quad+3(a_1-b_1)^4b_1^2\Big(\frac{1-k^{-l}-k^{-2l+1}}{k^{4m-3l-2}}+\frac{1}{k^{4(m-1)}}\Big) + (a_1-b_1)^3b_1^3\Big(\frac{1-3k^{-2l+1}}{k^{3m-3l-1}}+\frac{2}{k^{3m-3}}\Big).
\end{aligned}
\tag{48}
\]

Clearly, if $k\ge2$, $a_1>b_1>0$ and $l\ge2$, each term on the right-hand side of (48) is positive, which implies that $T_1E_1^3-V_1^3>0$ and hence $T\neq0$.
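The sign of $T_1-(V_1/E_1)^3$ can be explored with exact rational arithmetic. The sketch below is not the paper's derivation: it assumes $k$ equally likely communities and degree weights normalised out, so that $E_1$, $V_1$ and $T_1$ are the probabilities of one hyperedge, two hyperedges sharing $l$ vertices, and three hyperedges forming a cycle with pairwise overlaps of size $l$. The function name `moments` and the numerical values are hypothetical.

```python
from fractions import Fraction


def moments(a1, b1, k, m, l):
    """Return (E1, V1, T1) under the assumed model: a set of r vertices lies in a
    single community with probability k^-(r-1), and eta = (a1 - b1) * 1[same] + b1."""
    x = a1 - b1
    al1 = Fraction(1, k ** (m - 1))            # one hyperedge, m vertices
    al2 = Fraction(1, k ** (2 * m - l - 1))    # two hyperedges, 2m - l vertices
    al3 = Fraction(1, k ** (3 * (m - l) - 1))  # three hyperedges, 3(m - l) vertices
    E1 = al1 * x + b1
    V1 = al2 * x ** 2 + 2 * al1 * x * b1 + b1 ** 2
    T1 = al3 * x ** 3 + 3 * al2 * x ** 2 * b1 + 3 * al1 * x * b1 ** 2 + b1 ** 3
    return E1, V1, T1


def gap(a1, b1, k, m, l):
    """T1 - (V1 / E1)^3, the normalised quantity whose sign is studied in the proof."""
    E1, V1, T1 = moments(a1, b1, k, m, l)
    return T1 - (V1 / E1) ** 3


a1, b1 = Fraction(3, 100), Fraction(1, 100)  # illustrative values
print(gap(a1, b1, k=2, m=3, l=1), (2 - 1) * (a1 - b1) ** 3 / Fraction(2 ** (3 * (3 - 1))))
print(gap(a1, b1, k=2, m=4, l=2) > 0)
print(gap(b1, b1, k=3, m=4, l=2))
```

For $l=1$ the first line reproduces the closed form $(k-1)(a_1-b_1)^3/k^{3(m-1)}$ stated in the proof, and for $a_1=b_1$ the gap vanishes, matching the behaviour under $H_0$.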

Before proving Lemma 6.9, we introduce some notation and preliminaries. For any tensors $A, B, C$, define

\[
C_{2m-l}(A,B) = A_{i_1:i_m}B_{i_{m-l+1}:i_{2m-l}} + A_{i_2:i_{m+1}}B_{i_{m-l+2}:i_{2m-l}i_1} + \dots + A_{i_{2m-l}i_1:i_{m-1}}B_{i_{m-l}:i_{2m-l-1}},
\]

\[
\begin{aligned}
C_{3(m-l)}(A,B,C) &= A_{i_1:i_m}B_{i_{m-l+1}:i_{2m-l}}C_{i_{2m-2l+1}:i_{3(m-l)}i_1:i_l} + A_{i_2:i_{m+1}}B_{i_{m-l+2}:i_{2m-l+1}}C_{i_{2m-2l+2}:i_{3(m-l)}i_1:i_{l+1}}\\
&\quad+\dots+A_{i_{m-l}:i_{2m-l-1}}B_{i_{2(m-l)}:i_{3(m-l)}i_1:i_{l-1}}C_{i_{3(m-l)}i_1:i_{m-1}}.
\end{aligned}
\]

The proof of Lemma 6.9 relies on the following martingale central limit theorem due to Hall and Heyde [39].


Theorem 6.10 (Hall and Heyde, 2014). Suppose that for every $n\in\mathbb{N}$ and $\xi_n\to\infty$ the random variables $X_{n,1},\dots,X_{n,\xi_n}$ are a martingale difference sequence relative to an arbitrary filtration $F_{n,1}\subset F_{n,2}\subset\dots\subset F_{n,\xi_n}$. If (1) $\sum_{i=1}^{\xi_n}\mathbb{E}(X_{n,i}^2|F_{n,i-1})\to1$ in probability, and (2) $\sum_{i=1}^{\xi_n}\mathbb{E}(X_{n,i}^2I[|X_{n,i}|>\varepsilon]\,|F_{n,i-1})\to0$ in probability for every $\varepsilon>0$, then $\sum_{i=1}^{\xi_n}X_{n,i}\to N(0,1)$ in distribution.
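A small simulation can make the two conditions of Theorem 6.10 concrete. The example below is a toy construction and not the statistic studied in Lemma 6.9: with i.i.d. Rademacher signs $\xi_0,\xi_1,\dots$ it sets $X_{n,i}=\xi_i(1+\tfrac12\xi_{i-1})/\sqrt{1.25\,n}$, which is a bounded martingale difference array whose conditional variances sum to $1+\sum_i\xi_{i-1}/(1.25\,n)\to1$. The function name `simulate_sums` and all sizes are hypothetical.

```python
import numpy as np


def simulate_sums(n, reps, rng):
    """Simulate reps copies of sum_{i=1}^n X_{n,i} for the toy array
    X_{n,i} = xi_i * (1 + 0.5 * xi_{i-1}) / sqrt(1.25 * n), xi_i i.i.d. +-1."""
    xi = rng.choice([-1.0, 1.0], size=(reps, n + 1))
    x = xi[:, 1:] * (1.0 + 0.5 * xi[:, :-1]) / np.sqrt(1.25 * n)
    return x.sum(axis=1)


rng = np.random.default_rng(0)
sums = simulate_sums(n=200, reps=10000, rng=rng)
# Sample mean, sample variance, and the fraction of sums inside (-1.96, 1.96).
print(sums.mean(), sums.var(), np.mean(np.abs(sums) < 1.96))
```

The increments are dependent through $\xi_{i-1}$, so the classical i.i.d. central limit theorem does not apply directly, yet the simulated sums should look standard normal, as the theorem predicts. The Lindeberg condition (2) holds trivially here because $|X_{n,i}|\le 1.5/\sqrt{1.25\,n}$.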

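Proof of Lemma 6.9. Let $W_{i_1:i_m} = W_{i_1}W_{i_2}\cdots W_{i_m}$, $\eta_{i_1:i_m} = (a_1-b_1)I[\sigma_{i_1}=\sigma_{i_2}=\dots=\sigma_{i_m}]+b_1$ and $\theta_{i_1:i_m} = \eta_{i_1:i_m}W_{i_1:i_m}$. Clearly $\mathbb{E}(A_{i_1:i_m}|W,\sigma) = \theta_{i_1:i_m}$.

Firstly, we show equation (42). Write $\widehat{E}-E$ as

\[
\widehat{E}-E = \big(\widehat{E}-\mathbb{E}(\widehat{E}|W,\sigma)\big)+\big(\mathbb{E}(\widehat{E}|W,\sigma)-\mathbb{E}(\widehat{E}|\sigma)\big)+\big(\mathbb{E}(\widehat{E}|\sigma)-E\big).
\]

Note that the three terms on the right-hand side are mutually uncorrelated. Hence

\[
\mathbb{E}(\widehat{E}-E)^2 = \mathbb{E}\big(\widehat{E}-\mathbb{E}(\widehat{E}|W,\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{E}|W,\sigma)-\mathbb{E}(\widehat{E}|\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{E}|\sigma)-E\big)^2. \tag{49}
\]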

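It is easy to check that $A_{i_1:i_m}$ and $A_{j_1:j_m}$ are conditionally independent if $i_1:i_m\neq j_1:j_m$. For the first term, we have

\[
\begin{aligned}
\mathbb{E}\big(\widehat{E}-\mathbb{E}(\widehat{E}|W,\sigma)\big)^2 &= \mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}(A_{i_1:i_m}-\theta_{i_1:i_m})\Big)^2\\
&= \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n),\,j\in c(m,n)}\mathbb{E}(A_{i_1:i_m}-\theta_{i_1:i_m})(A_{j_1:j_m}-\theta_{j_1:j_m})\\
&= \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n)}\mathbb{E}(A_{i_1:i_m}-\theta_{i_1:i_m})^2\\
&= \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n)}\mathbb{E}\,\theta_{i_1:i_m}(1-\theta_{i_1:i_m})\\
&\le \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n)}\mathbb{E}\,\theta_{i_1:i_m}\\
&= \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n)}(\mathbb{E}W_1)^m\Big(\frac{a_1+(k^{m-1}-1)b_1}{k^{m-1}}\Big)
= \frac{(\mathbb{E}W_1)^m}{\binom{n}{m}}\Big(\frac{a_1+(k^{m-1}-1)b_1}{k^{m-1}}\Big) = O\Big(\frac{a_1}{n^m}\Big).
\end{aligned}
\tag{50}
\]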

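For the third term in (49), one has

\[
\begin{aligned}
\mathbb{E}\big(\mathbb{E}(\widehat{E}|\sigma)-E\big)^2 &= \mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}(\mathbb{E}W_1)^m(\eta_{i_1:i_m}-\mathbb{E}\eta_{i_1:i_m})\Big)^2\\
&= (\mathbb{E}W_1)^{2m}\,\mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}(a_1-b_1)\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\Big)^2\\
&\le (\mathbb{E}W_1)^{2m}\,2(a_1^2+b_1^2)\,\mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\Big)^2.
\end{aligned}
\tag{51}
\]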


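Note that

\[
\begin{aligned}
&\mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\Big)^2\\
&\qquad=\frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n),\,j\in c(m,n)}\mathbb{E}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\big(I[\sigma_{j_1}:\sigma_{j_m}]-P[\sigma_{j_1}:\sigma_{j_m}]\big).
\end{aligned}
\tag{52}
\]

If there is no repeated index in $i_1:i_m$ and $j_1:j_m$, then

\[
\mathbb{E}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\big(I[\sigma_{j_1}:\sigma_{j_m}]-P[\sigma_{j_1}:\sigma_{j_m}]\big) = 0.
\]

If there is only one repeated index in $i_1:i_m$ and $j_1:j_m$, say, $i_1=j_1$ and the other indices are different, then

\[
\mathbb{E}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\big(I[\sigma_{j_1}:\sigma_{j_m}]-P[\sigma_{j_1}:\sigma_{j_m}]\big) = \frac{k}{k^{2m-1}}-2\,\frac{k}{k^m}\,\frac{1}{k^{m-1}}+\frac{1}{k^{2(m-1)}} = 0.
\]

If two or more indices in $i_1:i_m$ and $j_1:j_m$ are the same, it is easy to verify that

\[
0<\mathbb{E}\big(I[\sigma_{i_1}:\sigma_{i_m}]-P[\sigma_{i_1}:\sigma_{i_m}]\big)\big(I[\sigma_{j_1}:\sigma_{j_m}]-P[\sigma_{j_1}:\sigma_{j_m}]\big)\le1.
\]

Hence, by (51) and (52), we have

\[
\mathbb{E}\big(\mathbb{E}(\widehat{E}|\sigma)-E\big)^2 = O\Big((a_1^2+b_1^2)\frac{1}{\binom{n}{m}^2}\binom{n}{m}\binom{n}{m-2}\Big) = O\Big(\frac{a_1^2}{n^2}\Big). \tag{53}
\]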

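For the second term in (49), we have

\[
\mathbb{E}\big(\mathbb{E}(\widehat{E}|W,\sigma)-\mathbb{E}(\widehat{E}|\sigma)\big)^2 = \mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}\eta_{i_1:i_m}(W_{i_1:i_m}-\mathbb{E}W_{i_1:i_m})\Big)^2. \tag{54}
\]

Note that for some constants $c_{s_1}, c_{s_1s_2},\dots,c_{s_1:s_{m-1}}$ dependent on $\mathbb{E}W_1$, with $1\le s_1,\dots,s_{m-1}\le m$, one has

\[
\begin{aligned}
W_{i_1:i_m}-\mathbb{E}W_{i_1:i_m} &= \sum_{s_1=1}^m c_{s_1}(W_{i_{s_1}}-\mathbb{E}W_{i_{s_1}}) + \sum_{1\le s_1\neq s_2\le m}c_{s_1s_2}(W_{i_{s_1}}-\mathbb{E}W_{i_{s_1}})(W_{i_{s_2}}-\mathbb{E}W_{i_{s_2}})\\
&\quad+\dots+(W_{i_1}-\mathbb{E}W_{i_1})(W_{i_2}-\mathbb{E}W_{i_2})\cdots(W_{i_m}-\mathbb{E}W_{i_m}).
\end{aligned}
\tag{55}
\]

Clearly, the summation terms in (55) are mutually uncorrelated. For $W_{i_1}-\mathbb{E}W_{i_1}$, we have

\[
\begin{aligned}
\mathbb{E}\Big(\frac{1}{\binom{n}{m}}\sum_{i\in c(m,n)}\eta_{i_1:i_m}(W_{i_1}-\mathbb{E}W_{i_1})\Big)^2
&= \frac{1}{\binom{n}{m}^2}\sum_{i\in c(m,n),\,j\in c(m,n)}\mathbb{E}\big(\eta_{i_1:i_m}\eta_{j_1:j_m}(W_{i_1}-\mathbb{E}W_{i_1})(W_{j_1}-\mathbb{E}W_{j_1})\big)\\
&= \frac{1}{\binom{n}{m}^2}\,O\Big(a_1^2\binom{n}{m}\binom{n}{m-1}\Big) = O\Big(\frac{a_1^2}{n}\Big).
\end{aligned}
\tag{56}
\]

It is easy to verify that the terms $\prod_{s=1}^t(W_{i_s}-\mathbb{E}W_{i_s})$ with $t\ge2$ are of higher order. By equation (54),

\[
\mathbb{E}\big(\mathbb{E}(\widehat{E}|W,\sigma)-\mathbb{E}(\widehat{E}|\sigma)\big)^2 = O\Big(\frac{a_1^2}{n}\Big). \tag{57}
\]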


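Combining (50), (53) and (57) yields (42).

Next we prove (43). We can similarly decompose the mean square as

\[
\mathbb{E}(\widehat{V}-V)^2 = \mathbb{E}\big(\widehat{V}-\mathbb{E}(\widehat{V}|W,\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{V}|W,\sigma)-\mathbb{E}(\widehat{V}|\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{V}|\sigma)-V\big)^2. \tag{58}
\]

Firstly we have the following decomposition:

\[
\begin{aligned}
&A_{i_1:i_m}A_{i_{m-l+1}:i_{2m-l}}-\theta_{i_1:i_m}\theta_{i_{m-l+1}:i_{2m-l}}\\
&\qquad= (A_{i_1:i_m}-\theta_{i_1:i_m})(A_{i_{m-l+1}:i_{2m-l}}-\theta_{i_{m-l+1}:i_{2m-l}})\\
&\qquad\quad+(A_{i_1:i_m}-\theta_{i_1:i_m})\theta_{i_{m-l+1}:i_{2m-l}}+\theta_{i_1:i_m}(A_{i_{m-l+1}:i_{2m-l}}-\theta_{i_{m-l+1}:i_{2m-l}}),
\end{aligned}
\]

from which it follows that

\[
\begin{aligned}
\widehat{V}-\mathbb{E}(\widehat{V}|W,\sigma) &= \frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{C_{2m-l}(A)-C_{2m-l}(\theta)}{2m-l}\\
&= \frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{C_{2m-l}(A-\theta)}{2m-l}
+\frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{C_{2m-l}(A-\theta,\theta)+C_{2m-l}(\theta,A-\theta)}{2m-l}.
\end{aligned}
\tag{59}
\]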

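In the last equation of (59), the first summation and the second summation are conditionally uncorrelated. Hence

\[
\begin{aligned}
\mathbb{E}\big(\widehat{V}-\mathbb{E}(\widehat{V}|W,\sigma)\big)^2 &= \mathbb{E}\Big(\frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{C_{2m-l}(A-\theta)}{2m-l}\Big)^2\\
&\quad+\mathbb{E}\Big(\frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{C_{2m-l}(A-\theta,\theta)+C_{2m-l}(\theta,A-\theta)}{2m-l}\Big)^2.
\end{aligned}
\tag{60}
\]

The terms in $C_{2m-l}(A-\theta)$ are also conditionally uncorrelated and

\[
\begin{aligned}
&\mathbb{E}\Big(\frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{(A_{i_1:i_m}-\theta_{i_1:i_m})(A_{i_{m-l+1}:i_{2m-l}}-\theta_{i_{m-l+1}:i_{2m-l}})}{2m-l}\Big)^2\\
&\qquad= \frac{1}{\binom{n}{2m-l}^2}\sum_{i\in c(2m-l,n)}\frac{\mathbb{E}(A_{i_1:i_m}-\theta_{i_1:i_m})^2(A_{i_{m-l+1}:i_{2m-l}}-\theta_{i_{m-l+1}:i_{2m-l}})^2}{(2m-l)^2}\\
&\qquad= \frac{1}{\binom{n}{2m-l}^2}\,O\Big(a_1^2\binom{n}{2m-l}\Big) = O\Big(\frac{a_1^2}{n^{2m-l}}\Big),
\end{aligned}
\tag{61}
\]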

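which is the order of the first term in (60). For the second summand in (60), one has

\[
\begin{aligned}
&\mathbb{E}\Big(\frac{1}{\binom{n}{2m-l}}\sum_{i\in c(2m-l,n)}\frac{(A_{i_1:i_m}-\theta_{i_1:i_m})\theta_{i_{m-l+1}:i_{2m-l}}}{2m-l}\Big)^2\\
&\qquad= \frac{1}{\binom{n}{2m-l}^2}\sum_{i\in c(m,n),\ i_m<j_{m+1}<\dots<j_{2m-l}\le n}\frac{\mathbb{E}(A_{i_1:i_m}-\theta_{i_1:i_m})^2\,\theta_{i_{m-l+1}:i_{2m-l}}\,\theta_{i_{m-l+1}:i_mj_{m+1}:j_{2m-l}}}{(2m-l)^2}\\
&\qquad= \frac{1}{\binom{n}{2m-l}^2}\,O\Big(a_1^3\binom{n}{2m-l}\binom{n}{m-l}\Big) = O\Big(\frac{a_1^3}{n^m}\Big).
\end{aligned}
\tag{62}
\]

Hence, it follows from (61) and (62) that

\[
\mathbb{E}\big(\widehat{V}-\mathbb{E}(\widehat{V}|W,\sigma)\big)^2 = O\Big(\frac{a_1^2}{n^m}\Big). \tag{63}
\]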

For middle term in (58), by definition, it’s equal to

E(E(V |W,σ)− E(V |σ)

)2= E

( 1(n

2m−l) ∑i∈c(2m−l,n)

C2m−l(θ)− E(C2m−l(θ)|σ)

2m− l

)2.

The first term in C2m−l(θ)− E(C2m−l(θ)|σ) is(Wi1:im−lW

2im−l+1:imWim+1:i2m−l − (EW 2

1 )l(EW1)2(m−l))ηi1:imηim−l+1:i2m−l ,

and we only need to bound this term since the remaining 2m−l−1 terms can be similarly bounded.Let δs = 2 if s = m− l + 1, . . . ,m and δs = 1 otherwise. For generic bounded constants cs1 , cs1s2 ,. . . , cs1...s2m−l−1

, the following expansion is true.

Wi1:im−lW2im−l+1:imWim+1:i2m−l − (EW 2

1 )l(EW1)2(m−l)

=2m−l∑s1=1

cs1(Wδs1is1− EW δs1

is1) +

∑1≤s1 6=s2≤2m−l

cs1s2(Wδs1is1− EW δs1

is1)(W

δs2is2− EW δs2

is2)

+ · · ·+2m−l∏s1=1

(Wδs1is1− EW δs1

is1)(64)

Clearly, the summation terms in (64) are mutually uncorrelated. For any s1,

E( 1(

n2m−l

) ∑i∈c(2m−l,n)


is1)ηi1:imηim−l+1:i2m−l

2m− l

)2

=1(n

2m−l)2 ∑

i∈c(2m−l,n),j∈c(2m−l,n)

E(Wδs1is1− EW δs1

is1)(W

δs1js1− EW δs1

js1)O(a4)

(2m− l)2

=1(n

2m−l)2O(a4)E(W

δs1is1− EW δs1

is1)2

(n

2m− l

)(n

2m− l − 1

)= O

(a4

n

).

It’s easy to verify that the product terms of Wδs1is1− EW δs1

is1are of higher order. Hence

(65) E(E(V |W,σ)− E(V |σ)

)2= O

(a41

n

).

The last term in (58) can be expressed as

E(E(V |σ)− V

)2= V ar

( 1(n

2m−l) ∑c(i,2m−l,n)

C2m−l(η)

2m− l

)= O

(V ar

( 1(n

2m−l) ∑c(i,2m−l,n)

ηi1:imηim−l+1:i2m−l

2m− l

)).(66)


To find the variance, let H ⊂ [k]2m−l. We have

E( ∑i∈c(2m−l,n)

∑(his )∈H

( 2m−l∏s=1

I[σis = his ]− E2m−l∏s=1

I[σis = his ]))2

≤ |H|∑

(his )∈H


( 2m−l∏s=1


I[σis = his ]))2

.

Since

2m−l∏s=1


I[σis = his ]

=2m−l∑s1=1

cs1

(I[σis1 = his1 ]− EI[σis1 = his1 ]

)+

∑1≤s1 6=s2≤2m−l

cs1s2


)(I[σis2 = his2 ]− EI[σis2 = his2 ]

)

+ · · ·+2m−l∏s=1

(I[σis = his ]− EI[σis = his ]

),

and



))2

=∑

i∈c(2m−l,n),j∈c(2m−l,n)

E(I[σis1 = his1 ]− EI[σis1 = his1 ]

)(I[σjs1 = hjs1 ]− EI[σjs1 = hjs1 ]

)= O

(n2(2m−l)−1

),

then

(67) E( ∑i∈c(2m−l,n)

∑(his )∈H

( 2m−l∏s=1


I[σis = his ]))2

= O(n2(2m−l)−1

).

Note that

ηi1:imηim−l+1:i2m−l = (a1 − b1)2I[σi1 : σi2m−l ] + (a1 − b1)b1I[σi1 : σim ]

+(a1 − b1)b1I[σim−l+1: σi2m−l ] + b21.


Then by (67) we have

V ar( 1(

n2m−l

) ∑i∈c(2m−l,n)

ηi1:imηim−l+1:i2m−l

2m− l

) V ar

( 1(n

2m−l) ∑i∈c(2m−l,n)

(a1 − b1)2I[σi1 : σi2m−l ]

2m− l

)+V ar

( 1(n

2m−l) ∑i∈c(2m−l,n)

(a1 − b1)b1I[σi1 : σim ]

2m− l

)+V ar

( 1(n

2m−l) ∑i∈c(2m−l,n)

(a1 − b1)b1I[σim−l+1: σi2m−l ]

2m− l

) a4

1(n

2m−l)2n2(2m−l)−1 +

a41(n

2m−l)2n2(2m−l)−1 +

a41(n

2m−l)2n2(2m−l)−1 = O

(a41

n

).(68)

By (66) and (68), one gets

(69) E(E(V |σ)− V

)2= O

(a41

n

).

From (63), (65), (69) and the condition nl−1 an bn, we conclude (43).In the following, we prove (44). Similar to the previous proof, we have

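\[
\widehat{T}-T = \big(\widehat{T}-\mathbb{E}(\widehat{T}|W,\sigma)\big)+\big(\mathbb{E}(\widehat{T}|W,\sigma)-\mathbb{E}(\widehat{T}|\sigma)\big)+\big(\mathbb{E}(\widehat{T}|\sigma)-T\big),
\]

and

\[
\mathbb{E}(\widehat{T}-T)^2 = \mathbb{E}\big(\widehat{T}-\mathbb{E}(\widehat{T}|W,\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{T}|W,\sigma)-\mathbb{E}(\widehat{T}|\sigma)\big)^2+\mathbb{E}\big(\mathbb{E}(\widehat{T}|\sigma)-T\big)^2. \tag{70}
\]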

For the second expectation, one has

E(E(T |W,σ)− E(T |σ)

)2= E

( 1(n

3(m−l)) ∑i∈c(3(m−l),n)

C3(m−l)(θ)− EC3(m−l)(θ)

m− l

)2.

The first term in C3(m−l)(θ)− EC3(m−l)(θ) is

ηi1:imηim−l+1:i2m−lηi2m−2l−1:i3(m−l)i1:il

×(Wi1:imWim−l+1:i2m−lWi2m−2l+1:i3(m−l)i1:il − EWi1:imWim−l+1:i2m−lWi2m−2l+1:i3(m−l)i1:il

),

and there are m− 1 terms in it. Let δs = 2 if s = m− l + 1, . . . ,m or s = 2m− 2l + 1, . . . , 2m− land δs = 1 otherwise. Then following decomposition holds.

Wi1:imWim−l+1:i2m−lWi2m−2l−1:i3(m−l)i1:il − EWi1:imWim−l+1:i2m−lWi2m−2l−1:i3(m−l)i1:il

=

3(m−l)∑s1=1

cs1(Wδs1is1− EW δs1

is1) +

3(m−l)∑1≤s1 6=s2≤3(m−l)

cs1s2(Wδs1is1− EW δs1

is1)(W

δs2is2− EW δs2

is2)

+ · · ·+3(m−l)∏s1=1


is1).


Note that

E( 1(

n3(m−l)

) ∑i∈c(3(m−l),n)

ηi1:imηim−l+1:i2m−lηi2m−2l−1:i3(m−l)i1:il(Wδs1is1− EW δs1

is1)

m− l

)2

=1(n

3(m−l))2 ∑

i∈c(3(m−l),n),j∈c(3(m−l),n)

O(a61)E(W

δs1is1− EW δs1

is1)(W

δs1js1− EW δs1

js1)

(m− l)2= O

(a61

n

),

and the product terms of Wδs1is1− EW δs1

is1are of higher order. Hence,

(71) E(E(T |W,σ)− E(T |σ)

)2= O

(a61

n

).

For the third expectation in (70), similar to (66), one has

E(E(T |σ)− T

)2= V ar

( 1(n

3(m−l)) ∑i∈c(3(m−l),n)

C3(m−l)(η)

m− l

) V ar

( 1(n

3(m−l)) ∑i∈c(3(m−l),n)

ηi1:imηim−l+1:i2m−lηi2m−2l−1:i3(m−l)i1:il

m− l

) O(

a61

n).(72)

For the first expectation in (70), note that

T − E(T |W,σ) =1(n

3(m−l)) ∑i∈c(3(m−l),n)

C3(m−l)(A)− C3(m−l)(θ)

m− l.

The first term in it can be decomposed as

Ai1:imAim−l+1:i2m−lAi2m−2l−1:i3(m−l)i1:il − θi1:imθim−l+1:i2m−lθi2m−2l−1:i3(m−l)i1:il

= (Ai1:im − θi1:im)θim−l+1:i2m−lθi2m−2l−1:i3(m−l)i1:il + (Aim−l+1:i2m−l − θim−l+1:i2m−l)θi1:imθi2m−2l−1:i3(m−l)i1:il

+(Ai2m−2l−1:i3(m−l)i1:il − θi2m−2l−1:i3(m−l)i1:il)θi1:imθim−l+1:i2m−l

+(Ai1:im − θi1:im)(Aim−l+1:i2m−l − θim−l+1:i2m−l)θi2m−2l−1:i3(m−l)i1:il + . . .

+(Ai1:im − θi1:im)(Aim−l+1:i2m−l − θim−l+1:i2m−l)(Ai2m−2l−1:i3(m−l)i1:il − θi2m−2l−1:i3(m−l)i1:il).

Note that

E( 1(

n3(m−l)

) ∑i∈c(3(m−l),n)

(Ai1:im − θi1:im)(Aim−l+1:i2m−l − θim−l+1:i2m−l)θi2m−2l−1:i3(m−l)i1:il

)2

= O( a4

1(n

3(m−l))2n3(m−l)n3(m−l)−(2m−l)

)= O

( a41

n2m−l

),(73)

and

E( 1(

n3(m−l)

) ∑i∈c(3(m−l),n)

(Ai1:im − θi1:im)θim−l+1:i2m−lθi2m−2l−1:i3(m−l)i1:il

)2

= O( a5

1(n

3(m−l))2n3(m−l)n3(m−l)−m

)= O

( a51

nm

).(74)


Let

Gi1:i3(m−l) = (Ai1:im − θi1:im)(Aim−l+1:i2m−l − θim−l+1:i2m−l)(Ai2m−2l−1:i3(m−l)i1:il − θi2m−2l−1:i3(m−l)i1:il).

Then

E( 1(

n3(m−l)

) ∑i∈c(3(m−l),n)

Gi1:i3(m−l)

)2=

1(n

3(m−l))2 ∑

i∈c(3(m−l),n)

EG2i1:i3(m−l)

1(n

3(m−l))2 ∑

i∈c(3(m−l),n)

Eθi1:imθim−l+1:i2m−lθi2m−2l−1:i3(m−l)i1:il

=T(n

3(m−l)) = O

( a31

n3(m−l)

).(75)

Under the condition an bn n3l−23 , by (70) , (71), (72), (73), (74) and (75), we get (44).

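In the end, we show the asymptotic normality by using Theorem 6.10. Let

\[
\mathcal{W}_n = \Big\{\Big|\frac1n\sum_{i=1}^nW_i^2-\mathbb{E}W_1^2\Big|\le n^{-\frac13}\Big\},\qquad
\Theta_n = \sqrt{\mathbb{E}\Big(\sum_{i\in c(3(m-l),n)}G_{i_1:i_{3(m-l)}}\Big)^2}.
\]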

Clearly, Θn √n3(m−l)a3

1 →∞ if nl−1 an bn. Define

\[
S_{n,t} = \frac{\sum_{i\in c(3(m-l),t)}G_{i_1:i_{3(m-l)}}}{\Theta_n},
\]

and let $X_{n,t} = S_{n,t}-S_{n,t-1}$. We show the asymptotic normality by applying the martingale central limit theorem to $X_{n,t}$, conditioning on $W$ and $\sigma$. Simple calculation yields that

\[
X_{n,t} = \frac{\sum_{i\in c(3(m-l)-1,t-1)}G_{i_1:i_{3(m-l)-1}t}}{\Theta_n},
\]

and $\mathbb{E}(X_{n,t}|F_{n,t-1}) = 0$. Hence, $X_{n,t}$ is a martingale difference sequence. Note that

E( n∑t=1

E(Sn,t − Sn,t−1)2|Fn,t−1,W, σ)

(76)

=n∑t=1

(E(S2

n,t|W,σ)− E(S2n,t−1|W,σ)

)= E(S2

n,n|W,σ) = 1,

and

V ar( n∑t=1

E[(Sn,t − Sn,t−1)2|Fn,t−1,W, σ])

=1

Θ4n

V ar( n∑t=1

E( ∑i∈c(3(m−l)−1,t−1)

Gi1:i3(m−l)−1t

)2|Fn,t−1,W, σ

)


=1

Θ4n

V ar( n∑t=1

∑i∈c(3(m−l)−1,t−1)

(Ai1:im − θi1:im)2(Aim−l+1:i2m−l − θim−l+1:i2m−l)2O(a1)|W,σ

)

=O(a2

1)

Θ4n

n∑s,t=1

∑i∈c(3(m−l)−1,s−1),j∈c(3(m−l)−1,t−1)

Cov(

(Ai1:im − θi1:im)2(Aim−l+1:i2m−l − θim−l+1:i2m−l)2,

(Aj1:jm − θj1:jm)2(Ajm−l+1:j2m−l − θjm−l+1:j2m−l)2)

=O(a5

1)

Θ4n

∑s≤t

(t

3(m− l)− 1

)(s

2m− 3l − 1

)

=O(a5

1)

Θ4n

n∑s=1

(s

3(m− l)− 1

) n∑s=1

(s

2m− 3l − 1

)=

O(a51)

Θ4n

n3(m−l)n2m−3l =O(a5

1)

n6(m−l)a61

n3(m−l)n2m−3l =1

a1nm→ 0.

Equations (76) and (77) imply that

n∑t=1

E(

(Sn,t − Sn,t−1)2|Fn,t−1,W, σ)→ 1,

which is condition (1) in Theorem 6.10.Next we check the Lindeberg condition. For any ε > 0, we have

n∑t=1

E(

(Sn,t − Sn,t−1)2I[|Sn,t − Sn,t−1| > ε]∣∣∣Fn,t−1,W, σ

)≤

n∑t=1

√E(

(Sn,t − Sn,t−1)4∣∣∣Fn,t−1,W, σ

)√P[|Sn,t − Sn,t−1| > ε]

∣∣∣Fn,t−1,W, σ)

≤ 1

ε2

n∑t=1

E(

(Sn,t − Sn,t−1)4∣∣∣Fn,t−1,W, σ

)=

1

ε2Θ4n

n∑t=1

E(( ∑

i∈c(3(m−l)−1,t−1)

Gi1:i3(m−l)−1t

)4∣∣∣Fn,t−1,W, σ).(77)

For convenience, let c(i) = c(i, 3(m − l) − 1, t − 1), D1i = Ai1:im − θi1:im , D2i = Aim−l+1:i2m−l −θim−l+1:i2m−l , and D3i = Ai2m−2l−1:i3(m−l)i1:il − θi2m−2l−1:i3(m−l)i1:il . Then

E(( ∑

i∈c(3(m−l)−1,t−1)

Gi1:i3(m−l)−1t

)4|Fn,t−1,W, σ

)= E

( ∑c(i),c(j),c(r),c(s)

D1iD2iD3iD1jD2jD3jD1rD2rD3rD1sD2sD3s

∣∣∣Fn,t−1,W, σ).(78)

For indices i2m−2l−1 : i3(m−l)i1 : il, j2m−2l−1 : j3(m−l)j1 : jl, r2m−2l−1 : r3(m−l)r1 : rl and s2m−2l−1 :s3(m−l)s1 : sl, where i3(m−l) = j3(m−l) = r3(m−l) = s3(m−l) = t, either all of them are the same ortwo of them are the same and the other two are the same. Otherwise, the conditional expectationin (78) given W,σ vanishes. The same is true for the other two sets of indices. We consider the case


i1 : i3(m−l)−1 = j1 : j3(m−l)−1 and r1 : r3(m−l)−1 = s1 : s3(m−l)−1 for example. Then by (78), (77) isequal to

1

ε2Θ4n

n∑t=1

( ∑c(i),c(r)

ED21iD

22iD

23iD

21rD

22rD

23r

∣∣∣Fn,t−1,W, σ)

=nO(a6

1)

ε2n6(m−l)a61

n3(m−l)−1n3(m−l)−1 → 0.

The other cases can be similarly proved. Hence,

n∑t=1

E(

(Sn,t − Sn,t−1)2I[|Sn,t − Sn,t−1| > ε]∣∣∣Fn,t−1,W, σ

)→ 0,

which is condition (2) in Theorem 6.10. Then we conclude that conditional on W ∈ Wn and σ,∑i∈c(3(m−l),n)Gi1:i3(m−l)

Θn=

n∑t=1

(Sn,t − Sn,t−1)→ N(0, 1).(79)

Since Θn √(

n3(m−l)

)T , and(

n

3(m− l)

)(m− l)

(T −T

)=

∑i∈c(3(m−l),n)

Gi1:i3(m−l) + · · ·+∑

i∈c(3(m−l),n)

Gim−l:i3(m−l)i1...im−l−1+o(1),

then (45) follows from the fact the terms in the right hand side of the above equation are uncorre-lated and a similar argument as in proving (79).

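Proof of Corollary 3.1. The proof is similar to that of Theorem 2.1. Note that $H_2$ and $H_3$ are independent given $\sigma$. Suppose $k=2$ for simplicity. Rewrite

\[
p_{ij}(\sigma) = \frac{(a_2-b_2)I[\sigma_i=\sigma_j]+b_2}{n^{\alpha_2}},\qquad
p_{ijk}(\sigma) = \frac{(a_3-b_3)I[\sigma_i=\sigma_j=\sigma_k]+b_3}{n^{\alpha_3}},
\]

\[
q_{ij}(\sigma) = 1-p_{ij}(\sigma),\qquad q_{ijk}(\sigma) = 1-p_{ijk}(\sigma),
\]

\[
Y_n = \mathbb{E}_\sigma\prod_{i<j}\Big(\frac{p_{ij}(\sigma)}{p_{02}}\Big)^{A_{ij}}\Big(\frac{q_{ij}(\sigma)}{q_{02}}\Big)^{1-A_{ij}}\prod_{i<j<k}\Big(\frac{p_{ijk}(\sigma)}{p_{03}}\Big)^{A_{ijk}}\Big(\frac{q_{ijk}(\sigma)}{q_{03}}\Big)^{1-A_{ijk}}.
\]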

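Then, by an analysis similar to that of equation (20) in the paper, we obtain

\[
\begin{aligned}
\mathbb{E}_0Y_n^2 &= \mathbb{E}_{\sigma\eta}\prod_{i<j}\Big(\frac{p_{ij}(\sigma)p_{ij}(\eta)}{p_{02}}+\frac{q_{ij}(\sigma)q_{ij}(\eta)}{q_{02}}\Big)\times\prod_{i<j<k}\Big(\frac{p_{ijk}(\sigma)p_{ijk}(\eta)}{p_{03}}+\frac{q_{ijk}(\sigma)q_{ijk}(\eta)}{q_{03}}\Big)\\
&= (1+o(1))\,\mathbb{E}_{\sigma\eta}\exp\Big\{\frac{(a_2-d_2)^2}{d_2n^{\alpha_2}}s_2^{(2)}+\frac{(a_2-d_2)(b_2-d_2)}{d_2n^{\alpha_2}}s_1^{(2)}+\frac{(b_2-d_2)^2}{d_2n^{\alpha_2}}s_0^{(2)}\Big\}\\
&\qquad\times\exp\Big\{\frac{(a_3-d_3)^2}{d_3n^{\alpha_3}}s_2^{(3)}+\frac{(a_3-d_3)(b_3-d_3)}{d_3n^{\alpha_3}}s_1^{(3)}+\frac{(b_3-d_3)^2}{d_3n^{\alpha_3}}s_0^{(3)}\Big\},
\end{aligned}
\tag{80}
\]

where $s_2^{(m)} = \#\{1\le i_1<\dots<i_m\le n: I[\sigma_{i_1}=\dots=\sigma_{i_m}]+I[\eta_{i_1}=\dots=\eta_{i_m}]=2\}$ for $m=2,3$, $s_1^{(m)}$ and $s_0^{(m)}$ are similarly defined, and $a_m, b_m, d_m$ correspond to $H_m$ for $m=2,3$.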
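Each factor in the product form of $\mathbb{E}_0Y_n^2$ comes from a one-line Bernoulli computation, which the following sketch reproduces exactly. It assumes a single (hyper)edge indicator $A\sim\mathrm{Bernoulli}(p_0)$ under the null and two independent label configurations giving success probabilities `p_sigma` and `p_eta`; all names and numerical values are hypothetical.

```python
from fractions import Fraction


def second_moment_factor(p_sigma, p_eta, p0):
    """E_0 of the product of two single-edge likelihood-ratio factors,
    (p/p0)^A ((1-p)/(1-p0))^(1-A), for A ~ Bernoulli(p0), by direct enumeration."""
    total = Fraction(0)
    for A, prob in ((1, p0), (0, 1 - p0)):
        lr_sigma = (p_sigma / p0) ** A * ((1 - p_sigma) / (1 - p0)) ** (1 - A)
        lr_eta = (p_eta / p0) ** A * ((1 - p_eta) / (1 - p0)) ** (1 - A)
        total += prob * lr_sigma * lr_eta
    return total


def closed_form(p_sigma, p_eta, p0):
    """p(sigma) p(eta) / p0 + q(sigma) q(eta) / q0 with q = 1 - p."""
    return p_sigma * p_eta / p0 + (1 - p_sigma) * (1 - p_eta) / (1 - p0)


# Illustrative probabilities (hypothetical values).
ps, pe, p0 = Fraction(3, 50), Fraction(1, 50), Fraction(1, 25)
print(second_moment_factor(ps, pe, p0), closed_form(ps, pe, p0))
```

The first summand carries the null success probability in its denominator and the second the null failure probability; independence across distinct pairs and triples then yields the two products over $i<j$ and $i<j<k$.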

For $\alpha_m\ge m$, $s_2^{(m)}$, $s_1^{(m)}$ and $s_0^{(m)}$ are bounded by $n^{\alpha_m}$. By the proof of Theorem 2.1, $\mathbb{E}_0Y_n^2\to1$. For $\alpha_m\ge m-1+\delta$ with $\delta\in(0,1)$, after a transformation of $s_2^{(m)}$, $s_1^{(m)}$ and $s_0^{(m)}$ as in the proof of Theorem 2.1, each term in the exponent of (80) is uniformly integrable. Hence, $\mathbb{E}_0Y_n^2\to1$.

Proof of Corollary 3.2. The proof is similar to that of Theorem 2.5. Let

\[
\lambda_h(m) = \frac{1}{2h}\Big(\frac{a+(k^{m-1}-1)b}{k^{m-1}(m-2)!}\Big)^h,\qquad
\delta_h(m) = (k-1)\Big(\frac{a-b}{a+(k^{m-1}-1)b}\Big)^h.
\]

Condition (A1) follows from Lemma 6.5.Next, we check condition (A2). Let S = 1, 2, . . . , k and Hm = (Hmhi)2≤h≤s,1≤i≤js be a sjs-tuple

of h-edge loose cycle Hmhi in Hm (m = 2, 3) for any integers s ≥ 2 and js ≥ 1. Define Xmhn as thenumber of h-edge loose cycles in the hypergraph Hm (m = 2, 3) and [x]j = x(x− 1) . . . (x− j + 1).Note that for any sequence of positive integers j2,. . . , js, we have

E0Yn[X23n]j2 . . . [X2sn]js [X32n]t2 . . . [X3ln]tl(81)

=∑

H2∈B2,H3∈B3

E0Yn1H21H3 +∑

H2∈B2,H3∈B3

E0Yn1H21H3

+∑

H2∈B2,H3∈B3

E0Yn1H21H3 +∑

H2∈B2,H3∈B3

E0Yn1H21H3

=∑

H2∈B2,H3∈B3

E0Yn1H21H3 + o(1)

→s∏

h=3

[λh(2)(1 + δh(2))]jhl∏

h=2

[λh(3)(1 + δh(3))]th .

where Bm is the collection of disjoint tuples Hm and Bm is the complement, that is, for anyHm ∈ Bm, there exist two cycles in Hm that have at least one vertex in common. The secondequality and the last step follows from the proof of Theorem 2.5.

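Then we check condition (A3). By (A1) and (A2), we have $\frac{\mu_h(m)}{\lambda_h(m)}-1 = \frac{\lambda_h(m)(1+\delta_h(m))}{\lambda_h(m)}-1 = \delta_h(m)$. Besides,

\[
\lambda_h(m)\delta_h(m)^2 = \frac{(k-1)^2}{2h}\Big(\frac{(a-b)^2}{k^{m-1}(m-2)!\,(a+(k^{m-1}-1)b)}\Big)^h = \frac{(k-1)^2\kappa_m^h}{2h}.
\]

If $\kappa_m<1$, then $\sum_{h=2}^\infty\lambda_h(m)\delta_h(m)^2<\infty$.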

Lastly, we check condition (A4). Then by a similar analysis of equation (20) in the paper it yieldsthat

E0Y2n = Eση

∏i<j

(pij(σ)pij(η)

q02+qij(σ)qij(η)

q02

)×∏i<j<k

(pijk(σ)pijk(η)

q03+qijk(σ)qijk(η)

q03

)= (1 + o(1))Eση exp

(a2 − d2)2

d2nα2s

(2)2 +

(a2 − d2)(b2 − d2)

d2nα2s

(2)1 +

(b2 − d2)2

d2nα2s

(2)0

× exp

(a3 − d3)2

d3nα3s

(3)2 +

(a3 − d3)(b3 − d3)

d3nα3s

(3)1 +

(b3 − d3)2

d3nα3s

(3)0

,(82)


where s(m)2 = #1 ≤ i1 < · · · < im ≤ n : I[σi1 = · · · = σim ] + I[ηi1 = · · · = ηim ] = 2 for m = 2, 3,

s(m)1 and s

(m)0 are similarly defined and am, bm, dm correspond to Hm for m = 2, 3.

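Let
\[
c_1^{(m)} = \frac{\binom{m}{2}}{m!\,d}\,\frac{(a-b)(b-d)}{k^{m-2}}, \qquad
c_2^{(m)} = \frac{\binom{m}{2}}{m!\,d}\,\frac{(a-b)^2}{k^{2(m-2)}},
\]
\[
\begin{aligned}
Z_n^{(3)} &= c_2^{(3)} \sum_{s,t=1}^{k} \rho_{st}^2 \left[1 + \sum_{i=1}^{m-2} \frac{1}{k^{2i}} \frac{\binom{m}{i+2}}{\binom{m}{2}} \left(\frac{\rho_{st}}{\sqrt{n}}\right)^{i}\right] \\
&\quad + c_1^{(3)} \left(\sum_{t=1}^{k} \rho_{0t}^2 \left[1 + \sum_{i=1}^{m-2} \frac{1}{k^{i}} \frac{\binom{m}{i+2}}{\binom{m}{2}} \left(\frac{\rho_{0t}}{\sqrt{n}}\right)^{i}\right] + \sum_{s=1}^{k} \rho_{s0}^2 \left[1 + \sum_{i=1}^{m-2} \frac{1}{k^{i}} \frac{\binom{m}{i+2}}{\binom{m}{2}} \left(\frac{\rho_{s0}}{\sqrt{n}}\right)^{i}\right]\right) \\
&\le c_2^{(3)} \left(1 + \frac{1}{3k^2}\right) \sum_{s,t=1}^{k} \rho_{st}^2
\end{aligned}
\]
and
\[
Z_n^{(2)} = c_2^{(2)} \sum_{s,t=1}^{k} \rho_{st}^2 + c_1^{(2)} \left(\sum_{t=1}^{k} \rho_{0t}^2 + \sum_{s=1}^{k} \rho_{s0}^2\right).
\]
Then
\[
\begin{aligned}
E_0 Y_n^2 &= (1 + o(1))\, E_{\sigma\eta} \exp\left\{Z_n^{(2)} + Z_n^{(3)}\right\} \\
&= (1 + o(1))\, E_{\sigma\eta} \exp\left\{\big(c_2^{(2)} + c_2^{(3)}\big) \sum_{s,t=1}^{k} \rho_{st}^2 + \big(c_1^{(2)} + c_1^{(3)}\big)\left(\sum_{t=1}^{k} \rho_{0t}^2 + \sum_{s=1}^{k} \rho_{s0}^2\right) + o_P(1)\right\}.
\end{aligned}
\]
Note that
\[
Z_n^{(2)} + Z_n^{(3)} \le \left(c_2^{(2)} + c_2^{(3)}\left(1 + \frac{1}{3k^2}\right)\right) \sum_{s,t=1}^{k} \rho_{st}^2.
\]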

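Denote $\tau = c_2^{(2)} + c_2^{(3)}\big(1 + \frac{1}{3k^2}\big)$. Let
\[
f_j = \frac{1}{\sqrt{n}} \sum_{u=1}^{j} \left(\Big(1[\sigma_u = 1]1[\eta_u = 1] - \frac{1}{k^2}\Big), \dots, \Big(1[\sigma_u = k]1[\eta_u = k] - \frac{1}{k^2}\Big)\right)^{T}
\]
and $d_j = f_j - f_{j-1}$. Then $\|d_j\|^2 = \frac{1}{n}\frac{k^2-1}{k^2}$ and $b_*^2 = \sum_{j=1}^{n} \|d_j\|^2 = \frac{k^2-1}{k^2}$. By Theorem 3.5 in Pinelis ([55]), for any $t > 0$,
\[
\begin{aligned}
P\left(\exp\{c\|f_n\|^2\} > t\right) &= P\left(c\|f_n\|^2 > \log(t)\right) = P\left(\|f_n\| > \sqrt{\frac{\log(t)}{c}}\right) \\
&\le 2\exp\left(-\frac{\log(t)}{2 c b_*^2}\right) = 2\, t^{-\frac{1}{\left(\kappa_2 + \frac{\kappa_3}{3}\left(1 + \frac{1}{3k^2}\right)\right)(k^2-1)}}.
\end{aligned}
\]
Hence, the condition $\big(\kappa_2 + \frac{\kappa_3}{3}\big(1 + \frac{1}{3k^2}\big)\big)(k^2 - 1) < 1$ implies that $\{\exp\{Z_n^{(2)} + Z_n^{(3)}\}\}_{n=1}^{\infty}$ is uniformly integrable. By the proof of Lemma 6.7, we conclude that
\[
\big(c_2^{(2)} + c_2^{(3)}\big) \sum_{s,t=1}^{k} \rho_{st}^2 + \big(c_1^{(2)} + c_1^{(3)}\big)\left(\sum_{t=1}^{k} \rho_{0t}^2 + \sum_{s=1}^{k} \rho_{s0}^2\right)
\]
converges in distribution to $\big(c_2^{(2)} + c_2^{(3)}\big) k^{-2} \chi^2_{(k-1)^2}$. Then it follows that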

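\[
\begin{aligned}
E_0 Y_n^2 &\to \exp\left\{-\frac{\binom{2}{2}}{2!\,d}\frac{(k-1)^2(a-b)^2}{k^{2(2-1)}}\right\} \exp\left\{-\frac{\binom{3}{2}}{3!\,d}\frac{(k-1)^2(a-b)^2}{k^{2(3-1)}}\right\} E\exp\left\{\frac{c_2^{(2)} + c_2^{(3)}}{k^2}\chi^2_{(k-1)^2}\right\} \\
&= \exp\left\{-\frac{\binom{2}{2}}{2!\,d}\frac{(k-1)^2(a-b)^2}{k^{2(2-1)}}\right\} \exp\left\{-\frac{\binom{3}{2}}{3!\,d}\frac{(k-1)^2(a-b)^2}{k^{2(3-1)}}\right\} \\
&\qquad \times \exp\left\{-\frac{(k-1)^2}{2}\log\left(1 - 2\,\frac{c_2^{(2)} + c_2^{(3)}}{k^2}\right)\right\} \\
&= \exp\left\{\sum_{h=3}^{\infty} \lambda_h(2)\delta_h(2)^2\right\} \exp\left\{\sum_{h=2}^{\infty} \lambda_h(3)\delta_h(3)^2\right\},
\end{aligned}
\]
where we used the fact that
\[
\frac{(k-1)^2}{2}\left(\frac{2c_2^{(m)}}{k^2}\right)^{h}\frac{1}{h} = \frac{(k-1)^2}{2h}\left(\frac{a + (k^{m-1}-1)b}{k^{m-1}(m-2)!}\right)^{h}\left(\frac{(a-b)^2}{(a + (k^{m-1}-1)b)^2}\right)^{h} = \lambda_h(m)\delta_h(m)^2.
\]
Obviously, $E_0 Y_n = 1$. Hence, $H_0$ and $H_1$ are contiguous.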
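The step that evaluates $E\exp\{\cdot\,\chi^2_{(k-1)^2}\}$ relies on the chi-square moment generating function, $E\exp(t\chi^2_\nu) = (1-2t)^{-\nu/2}$ for $t < 1/2$, together with the expansion $-\log(1-x) = \sum_{h\ge 1} x^h/h$. The sketch below checks both numerically; the values of `k` and `t` are arbitrary illustrations and are not derived from the model parameters in the proof.

```python
import math
import random

def chi2_mgf_exact(t, nu):
    """Closed form E exp(t * chi2_nu) = (1 - 2t)^(-nu/2), valid for t < 1/2."""
    if t >= 0.5:
        raise ValueError("the moment generating function is finite only for t < 1/2")
    return (1.0 - 2.0 * t) ** (-nu / 2.0)

def chi2_mgf_series(t, nu, terms=500):
    """Same quantity written as exp((nu/2) * sum_{h>=1} (2t)^h / h)."""
    x = 2.0 * t
    return math.exp(0.5 * nu * sum(x ** h / h for h in range(1, terms + 1)))

def chi2_mgf_monte_carlo(t, nu, samples=100_000, seed=0):
    """Monte Carlo estimate using sums of squared standard normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        total += math.exp(t * chi2)
    return total / samples

k = 3               # hypothetical number of communities
nu = (k - 1) ** 2   # degrees of freedom appearing in the limit
t = 0.1             # hypothetical coefficient, must stay below 1/2
print(chi2_mgf_exact(t, nu), chi2_mgf_series(t, nu), chi2_mgf_monte_carlo(t, nu))
```

The $h = 1$ term of the series equals $\nu t$, which is the mean of $t\chi^2_\nu$; the remaining terms are the ones that take the form $x^h/h$ used in the displayed identity.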

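Proof of Proposition 2.11. Under $H_0'$ and the condition $1 \ll \|W\|_t^t = O(\|W\|_1) = O(n)$ for $2 \le t \le 12$, we have
\[
\begin{aligned}
T_1 &= E\left[\frac{1}{n^6}\sum_{i_1,\dots,i_6:\ \text{distinct}} A_{i_1 i_2 i_3} A_{i_3 i_4 i_5} A_{i_5 i_6 i_1}\right] = \frac{\|W\|_2^6 \|W\|_1^3 p_0^3}{n^6} + O\left(\frac{\|W\|_1^5 p_0^3}{n^6}\right), \\
E_1 &= E\left[\frac{1}{n^3}\sum_{i_1,i_2,i_3:\ \text{distinct}} A_{i_1 i_2 i_3}\right] = \frac{\|W\|_1^3 p_0}{n^3}(1 + o(1)), \\
V_1 &= E\left[\frac{1}{n^5}\sum_{i_1,\dots,i_5:\ \text{distinct}} A_{i_1 i_2 i_3} A_{i_3 i_4 i_5}\right] = \frac{\|W\|_2^2 \|W\|_1^4 p_0^2}{n^5}(1 + o(1)).
\end{aligned}
\]
Hence, $T_1 - \big(\frac{V_1}{E_1}\big)^3 = O\big(\frac{\|W\|_1^5 p_0^3}{n^6}\big)$. Writing $\widehat{T}_1$, $\widehat{E}_1$ and $\widehat{V}_1$ for the sample versions of $T_1$, $E_1$ and $V_1$, if $\|W\|_1 \asymp \|W\|_2^2$ and $p_0^2 \|W\|_1^3 = o(1)$, we have
\[
E(\widehat{T}_1 - T_1)^2 = \frac{\|W\|_2^6 \|W\|_1^3 p_0^3}{n^{12}}(1 + o(1)).
\]
Besides, direct calculation yields
\[
\begin{aligned}
E(\widehat{E}_1 - E_1)^2 &= \frac{\|W\|_1^3 p_0}{n^6}(1 + o(1)), \\
E(\widehat{V}_1 - V_1)^2 &= \left(\frac{\|W\|_2^2 \|W\|_1^4 p_0^2}{n^{10}} + \frac{\|W\|_3^3 \|W\|_1^6 p_0^3}{n^{10}}\right)(1 + o(1)).
\end{aligned}
\]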

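Then by equation (46), it is easy to verify that
\[
\widehat{T}_1 - \left(\frac{\widehat{V}_1}{\widehat{E}_1}\right)^3 = T_1 - \left(\frac{V_1}{E_1}\right)^3 + (\widehat{T}_1 - T_1) + o_P\left(\sqrt{\frac{\|W\|_2^6 \|W\|_1^3 p_0^3}{n^{12}}}\right).
\tag{83}
\]
As a result, under the conditions $1 \ll \|W\|_t^t = O(\|W\|_1) = O(n)$ for $2 \le t \le 12$, $\|W\|_1 \asymp \|W\|_2^2$ and $p_0^2 \|W\|_1^3 = o(1)$, we have
\[
\begin{aligned}
\frac{\sqrt{n^6}\Big(\widehat{T}_1 - \big(\frac{\widehat{V}_1}{\widehat{E}_1}\big)^3\Big)}{\sqrt{T_1}}
&= \frac{\sqrt{n^6}\big(\widehat{T}_1 - T_1\big)}{\sqrt{T_1}} + o_P(1) \\
&= \frac{\sum_{i_1,\dots,i_6:\ \text{distinct}} (A_{i_1 i_2 i_3} - W_{i_1}W_{i_2}W_{i_3}p_0)(A_{i_3 i_4 i_5} - W_{i_3}W_{i_4}W_{i_5}p_0)(A_{i_5 i_6 i_1} - W_{i_5}W_{i_6}W_{i_1}p_0)}{\sqrt{n^6 T_1}} + o_P(1),
\end{aligned}
\]
which converges in distribution to $N(0, 1)$ under $H_0'$ by a proof similar to that of Theorem 2.8.

Under $H_1'$, if we further assume $1 \ll a_n \asymp b_n \le n^{1/3}$, then equation (84) still holds. By a proof similar to that of Lemma 6.9, we conclude that $\frac{\sqrt{n^6}(\widehat{T}_1 - T_1)}{\sqrt{T_1}} = O_P(1)$ and
\[
\frac{\sqrt{n^6}\Big(\widehat{T}_1 - \big(\frac{\widehat{V}_1}{\widehat{E}_1}\big)^3\Big)}{\sqrt{T_1}} = \frac{\sqrt{n^6}\Big(T_1 - \big(\frac{V_1}{E_1}\big)^3\Big)}{\sqrt{T_1}} + O_P(1) = \delta_1 + O_P(1).
\]
Then the power of the test goes to 1 if $\delta_1 \to \infty$.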
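For intuition about the three sums in this proof, the sketch below computes the sample quantities by brute force on a small 3-uniform hypergraph: the normalized counts of hyperedges, of pairs of hyperedges sharing one vertex, and of 3-edge loose cycles, all over ordered tuples of distinct vertices. It is an illustration under assumed names (`sample_statistics`, `centered_cycle_statistic`) with a toy random hypergraph, not the authors' implementation, and it omits the studentization by $\sqrt{n^6}/\sqrt{T_1}$.

```python
import random
from itertools import combinations, permutations

def sample_statistics(n, edges):
    """Brute-force (E_hat, V_hat, T_hat) for a 3-uniform hypergraph on 0..n-1.

    `edges` is a set of frozensets of size 3. Each sum runs over ordered
    tuples of distinct vertices and is normalized by n^3, n^5 or n^6.
    """
    def a(i, j, k):
        # Adjacency indicator; symmetric because edges are stored as frozensets.
        return frozenset((i, j, k)) in edges

    e_sum = sum(1 for i1, i2, i3 in permutations(range(n), 3) if a(i1, i2, i3))
    v_sum = sum(
        1 for i1, i2, i3, i4, i5 in permutations(range(n), 5)
        if a(i1, i2, i3) and a(i3, i4, i5)
    )
    t_sum = sum(
        1 for i1, i2, i3, i4, i5, i6 in permutations(range(n), 6)
        if a(i1, i2, i3) and a(i3, i4, i5) and a(i5, i6, i1)
    )
    return e_sum / n ** 3, v_sum / n ** 5, t_sum / n ** 6

def centered_cycle_statistic(n, edges):
    """T_hat - (V_hat / E_hat)^3, the unnormalized numerator of the statistic."""
    e_hat, v_hat, t_hat = sample_statistics(n, edges)
    if e_hat == 0:
        raise ValueError("the hypergraph has no hyperedges")
    return t_hat - (v_hat / e_hat) ** 3

# Toy example: each triple is a hyperedge independently with probability 0.3.
rng = random.Random(1)
n = 8
edges = {frozenset(c) for c in combinations(range(n), 3) if rng.random() < 0.3}
if edges:
    print(sample_statistics(n, edges), centered_cycle_statistic(n, edges))
```

The enumeration costs on the order of $n^6$ operations, so it is only suitable for very small $n$; it is meant to clarify the index sets rather than to serve as a practical counting routine.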
