Date post: 05-Oct-2020
Lower bounds and structural results in property testing of dense combinatorial structures Eyal Rozenberg Technion - Computer Science Department - Ph.D. Thesis PHD-2012-07 - 2012
Lower bounds and structural results in property testing of

dense combinatorial structures

Eyal Rozenberg

Lower bounds and structural results in property testing of

dense combinatorial structures

Research Thesis

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

Eyal Rozenberg

Submitted to the Senate

of the Technion — Israel Institute of Technology

Tevet 5772 Haifa January 2012

This research was carried out under the supervision of Prof. Eldar Fischer, in the Faculty

of Computer Science.

Most results in this thesis have been published as articles by the author and research

collaborators in conferences and journals during the course of the author’s doctoral

research period, the most up-to-date versions of which are:

Eldar Fischer and Eyal Rozenberg. Lower bounds for testing forbidden induced substructures in bipartite-graph-like combinatorial objects. In Proceedings of RANDOM 2007, pages 464–478. Springer, 2007.

Eldar Fischer and Eyal Rozenberg. Inflatable graph properties and natural property tests. In Proceedings of RANDOM 2011, pages 542–554, Berlin, Heidelberg, 2011. Springer-Verlag.

Oded Goldreich, Michael Krivelevich, Ilan Newman, and Eyal Rozenberg. Hierarchy theorems for property testing. In Oded Goldreich, editor, Property Testing, volume 6390 of Lecture Notes in Computer Science, pages 289–294. Springer, 2010.

Acknowledgements

I would like to thank my advisor, Professor Eldar Fischer, for his help and guidance,

and the numerous times he has poked holes in my yet-unsound arguments leading to

actually valid proofs.

I am deeply indebted to Ori Avi-Noam, a dear friend with whom I had the pleasure

to serve in public office. I also wish to thank Gal Tamir, a third member of our Graduate

Student Organization (GSO) executive committee, who has helped me not to take things

too seriously, and who came through when I needed his support. I would also like to

acknowledge others alongside whom I have served at the Technion, for longer or shorter

whiles: Roee Engelberg, Mark Ishay, Moti Ronen, Ida Sivan, Daniel Vainsencher, Nadav

Shragai, Jonathan Braude, Yair Farber and other members of the Technion’s junior

and untenured staff. In this context I also wish to thank the staff of the GSO over

the years, especially Avi Kaufman, Efrat Valensi, Neta Dobrin and Tal Levi. Outside

of the Technion I am indebted to “friendly militants” from other universities, such as

Haifa University’s Gonen Ha-Cohen, Tel-Aviv University’s Ohad Carny, Ben-Gurion

University’s Sion Korren and my good friend Matan Prezma from the Hebrew University

in Jerusalem; and Daniel Mishori and Nitzan Hadas, who have taught me much.

I would like to thank Prof. Oded Goldreich, for his inspiration and advice, both

professional and otherwise, and for lending a sympathetic ear. I am also grateful to

Oded, along with Professors Michael Krivelevich and Ilan Newman as my collaborators

on part of this work. I also wish to thank Ronitt Rubinfeld, Arie Matsliah, Dana Ron

and Yoav Tzur who had been helpful with that same part. I thank Arie in particular,

not just for coming up with a useful counter-example, but also for the occasional couch

discussion at his office.

I also wish to thank Prof. Johann Makowsky, who had almost become my advisor.

Janos’ culture, knowledge, and broad intellectual horizons have been an inspiration for

me, and his door was (almost) always open for an interesting discussion even if you haven’t

come to “talk shop”.

Thanks also go to many denizens of our faculty over the past several years: My

roommates — Firas Swidan, Adi Mano and lastly Yossi Atiya; Raviv El’azar the riddler;

Uri Itai; Yossi Weinstein the quiet activist; Tigran the maintenance superman; and all

the rest.

Finally, I would like to thank Anat Greenstein and Iris Bar for their friendship over

these years; my brother Igal; and last but foremost, my father Jacob and my mother

Veronica, without whose loving support I never would have reached this goal.

The Technion’s funding of this research is hereby acknowledged.

Contents

Abstract
1 Introduction
1.1 Overview of results
2 Preliminaries
2.1 The dense model for property testing
2.1.1 General dense structures
2.2 Features of dense structure property tests
2.3 Features of dense structure properties
2.4 Testing-Reductions between properties
3 Inflatable properties and natural property tests
3.1 Introduction
3.2 Additional preliminaries
3.2.1 On features of properties and of tests
3.2.2 Fixed-order subgraph distributions of graphs
3.3 Overview of results
3.4 Naturalizing tests
3.5 Lower bounds for triangle-freeness testing
3.6 One-sided error and natural tests
3.7 Inflatability and heredity of naturally-testable properties
3.8 Natural testability and proximity-oblivious testing
3.9 Naturalization and inflatability in other dense structures
3.9.1 Generalized preliminaries
3.9.2 Generalization of our main results
4 Query complexity hierarchies for dense graphs and other models
4.1 Introduction
4.2 Hard properties decidable and testable in PTIME
4.2.1 The difficulties deciding hard-to-test properties in [GGR98]
4.2.2 The alternative construction
4.3 A hierarchy of generic function properties

4.3.1 Property construction
4.3.2 Lower and upper query complexity bounds
4.4 An aside: A hierarchy of bounded-degree graph properties
4.4.1 Lower and upper query complexity bounds
4.5 A hierarchy of PTIME-testable properties
4.5.1 Property construction
4.5.2 A query complexity lower bound for the constructed property
4.5.3 A test for the constructed property
4.6 A hierarchy of monotone properties
4.6.1 Property construction
4.6.2 A query complexity lower bound for the constructed property
4.6.3 A test for the constructed property
4.7 A hierarchy of one-sided-testable properties
4.7.1 Property construction
4.7.2 A query complexity lower bound for the constructed property
4.7.3 A one-sided test for the constructed property
5 Lower bounds for testing partite dense structures
5.1 Introduction and overview of results
5.2 Additional preliminaries
5.3 A lower bound for colored bipartite graphs
5.3.1 Representing cyclic partite digraphs by matrices
5.3.2 An initial hard-to-test matrix
5.3.3 Reducing the number of colors
5.3.4 Proof of the lower bound
5.4 A lower bound for k-uniform k-partite hypergraphs
5.4.1 A hard-to-test tensor
5.4.2 Proof of the lower bound
6 Pseudo-testing hypergraph tuple partition properties
6.1 Introduction
6.2 Additional preliminaries
6.2.1 Hypergraph tuple partition functions and named tuple decompositions
6.2.2 Partitions and partition oracles
6.2.3 Multi-colored hypergraph partition properties
6.2.4 Tuple types and type estimators
6.3 An upper bound on pseudo-testing partition properties
6.3.1 Key Lemma: Low-damage tuple redistribution
6.3.2 Generating type estimators and partition oracles
6.3.3 Distinguishing good and bad partition oracles

6.4 A lower bound on testing partition properties
6.4.1 Expressing basic constraints with density characteristics
6.4.2 FOL constraints and density characteristic composition
6.4.3 The reduction from testing triangles
7 Open Questions
7.1 Natural testing and inflatable properties
7.2 Hard properties and complexity hierarchies
7.3 Partite and multi-colored dense structures
7.4 Hypergraph partition properties
7.5 Expanding the testing model via ‘plugging’
7.6 Ordered structures
Hebrew Abstract

Abstract

This thesis endeavors to deepen the understanding of the dense model for testing

properties of combinatorial structures such as graphs, hypergraphs, matrices and tensors.

This is achieved through the development of structural concepts regarding testing in

the dense model, which are then put to use: In formulating new lower bounds on the

query complexity for testing certain classes of such properties; in enhancing known

lower bounds; and in achieving hierarchy results with both upper and lower bounds.

We first focus on dense graphs, and consider natural testing: Property tests which

act entirely independently of the size of the graph being tested. We introduce the

notion of graph properties being inflatable, that is, closed under taking (balanced) blowups,

and show that the query complexity of natural tests is related to the degree to

which a property is approximately hereditary and approximately inflatable. Specifically,

we show that for properties which are almost hereditary and almost inflatable, any

test can be made natural with little penalty in the number of queries. In the reverse

direction, we show that properties admitting natural tests are approximately inflatable

and approximately hereditary, with parameters depending on the test’s number of

queries. Using the technique for naturalization, we restore in part a claim of Goldreich

and Trevisan regarding testing hereditary properties, and generalize the relation between

one-sided and two-sided lower bounds on triangle-freeness testing; we also give a simple

explicit proof of a slight improvement of the best current explicitly-stated lower bound

on triangle-freeness testing. More generally, we explore the relations of the notion of

inflatability and other already-studied features of properties and property tests in the

dense graph model, such as one-sidedness, heredity, and proximity-oblivion. Finally, we

generalize these results to dense structures other than graphs.

From natural testing we turn to study tests which are highly-dependent on the size

of their input graph: We construct a property of dense graphs which is maximally-hard

to test, in terms of the number of queries necessary, but which can be efficiently decided,

and whose test is time-efficient. Using this and some already-established constructions

we prove several hierarchy theorems for the dense graph model, establishing that for

every possible reasonable function of the input graph size, there exists a property with

exactly this function as its query complexity — and with certain desirable features. We

prove a similar hierarchy theorem both for testing generic functions and for testing graphs in the

sparse testing model. As with the results regarding natural tests, in reaching these

results we further explore, and make extensive use of, the concept of graph blowups.

We next present several results regarding testing dense structures which are essentially

different from (the more widely-studied) general graphs.

We give lower bound results regarding testing bipartite graphs with colored edges,

and k-partite k-uniform hypergraphs (which can be seen as testing matrices and tensors

over fixed finite fields, if coordinate order is disregarded). In this context, a previous

positive result showing that bipartite graphs are easily testable for freeness of forbidden

induced subgraphs is shown not to hold when edges can have multiple colors, or when

the ‘dimension’ is increased to k-partite k-uniform hypergraphs with k ≥ 3. A lower

bound is obtained, settling an open question of Alon, Fischer and Newman.

Two final results regard testing properties of general hypergraphs with multiple

edge relations (or colors), and more specifically, properties which are characterized by

partitions of vertex tuples, with density constraints on these partitions. We show that

such properties can be efficiently ‘pseudo-tested’, that is, one can distinguish whether or

not there exist partitions which approximately satisfy the density constraints. However,

this ‘pseudo-testing’, sufficient for obtaining an actual test for partition properties of

graphs, or of partitions of hypergraph vertices only, does not suffice in the general case

— as we are able to demonstrate by proving a lower bound on the query complexity of

such hypergraph properties.

These results are based, for the most part, on articles published by the author and

research collaborators in conferences and journals during the course of the author’s

doctoral research period, the most up-to-date versions of which are [FR07], [GKNR10]

and [FR11].

Chapter 1

Introduction

Studies in Theoretical Computer Science, and specifically in Computational Complexity,

are most often concerned with the following kind of question: How much of a certain

computational resource is necessary, or sufficient, for solving certain computational

problems? The resource of interest can typically be the computation time, or the

number of operations: How fast can one, say, sort an array of numbers, as a function

of its length? Other resources often studied are the amount of memory space for

performing the computation, or the number of bits of communication necessary for

several computers interacting over a network to compute something in collaboration.

The field of Property Testing can be thought of as the study of how much information

from the input instance of a computational problem is necessary for making a valid

decision.

Of course, one can generally not make correct decisions with certainty about an input

object — a string of characters, a graph, a function evaluated over a certain domain

— without reading it in its entirety; but one can very often reach certain conclusions

about the entire input based on samples from it, with high probability of their being

correct. More specifically, a property test is allowed oracle access to some combinatorial

object, and must distinguish with high probability between the case of this object

satisfying a certain property, and the case of the object being far from satisfying it by

some measure of distance. Roughly, when one needs to change at least an ε-fraction of

the representation of the object to make it satisfy the property, it is considered to be

ε-far from satisfying the property. One is interested in devising tests making as few

queries as possible of function values, presence of graph edges, matrix cell values, etc. A

test, therefore, must probabilistically decide the promise problem, in which the input is

guaranteed to either satisfy a property or be far from satisfying it; and it is allowed to

err or fail for inputs which are close to satisfying the property, but do not quite satisfy

it.

Such problems were first studied by Blum, Luby and Rubinfeld in [BLR90], which

was concerned with testing the linearity of functions, and began a long line of inquiry

into testing algebraic properties; one of these works, by Rubinfeld and Sudan [RS96],

first introduced the general formulation of Property Testing as such.

The study of testing properties of combinatorial objects began with the work of

Goldreich, Goldwasser and Ron in [GGR98], regarding properties of graphs. Combi-

natorial property testing has been an active field of research in the decade-and-a-half

since, as is evidenced by the earlier surveys by Fischer [Fis04] and by Ron [Ron01], and

the more recent surveys by Ron [Ron10] and by Goldreich [Gol10] (the latter is in fact

a collection of mini-surveys and articles, including [GKNR10]).

Dense structure testing and other models

One of the important aspects in the study of property testing is the testing model —

that is, exactly what information is the test given in advance; what is the distance

metric between input structures; and what kind of queries it can make regarding the

implicit input structure (or, alternatively, how is the input structure represented). For

example, in the case of graphs, the test might ask “is there an edge between the ith and

jth vertices?” or it might ask “which vertex is the kth neighbor of the ith vertex?” —

with these kinds of queries corresponding to an adjacency-matrix representation of a

graph or an adjacency-list representation, respectively.
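
As a rough illustration of the two query types just described, the following Python sketch wraps a graph in two oracle interfaces. The class names, the `queries` counter and the 0-based indexing are assumptions made for this example only; they are not taken from the works cited.

```python
from typing import Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


class AdjacencyMatrixOracle:
    """Dense-model view: answers 'is there an edge between i and j?'."""

    def __init__(self, n: int, edges: Iterable[Edge]) -> None:
        self.n = n
        self._edges = {frozenset(e) for e in edges}
        self.queries = 0  # number of queries made so far

    def has_edge(self, i: int, j: int) -> bool:
        self.queries += 1
        return frozenset((i, j)) in self._edges


class AdjacencyListOracle:
    """Bounded-degree view: answers 'which vertex is the k-th neighbor of i?'."""

    def __init__(self, n: int, edges: Iterable[Edge], d: int) -> None:
        self.n = n
        self.d = d
        self._neighbors: List[List[int]] = [[] for _ in range(n)]
        for u, v in edges:
            self._neighbors[u].append(v)
            self._neighbors[v].append(u)
        if any(len(ns) > d for ns in self._neighbors):
            raise ValueError("degree bound d exceeded")
        self.queries = 0  # number of queries made so far

    def neighbor(self, i: int, k: int) -> Optional[int]:
        # Returns None when vertex i has fewer than k + 1 neighbors.
        self.queries += 1
        ns = self._neighbors[i]
        return ns[k] if 0 <= k < len(ns) else None
```

Both oracles hide the same graph; a test only ever sees the answers to its queries, and the query complexity is the value of `queries` when it halts.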

The testing model corresponding to an adjacency-matrix representation of a graph is

called the dense model. This was the first model considered for testing graph properties,

introduced in [GGR98]. In this model, graphs on n vertices are ε-close to each other if

one needs to add and/or remove an ε-fraction of all possible $\binom{n}{2}$ edges from one graph

to convert it into the other — an ε-fraction of the representation of the graph. As the

properties concern graphs rather than representations (in which vertices are labeled),

the set of representations of satisfying graphs in the model must be closed under graph

isomorphism, so if a certain labeled graph is considered to satisfy the property, so are

all labeled graphs obtained from it by permuting the labels. In the dense model, sparse

graphs (with $o(n^2)$ edges) are all close to being empty by this definition, hence the

model’s name.
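
The distance between two labeled graphs in this model can be computed directly from the definition. The sketch below is a minimal illustration in Python; the function name `dense_distance` is hypothetical, and it compares two fixed labelings only (the distance of a graph from a property would additionally minimize over all graphs satisfying the property).

```python
from math import comb
from typing import Iterable, Tuple


def dense_distance(n: int,
                   edges_a: Iterable[Tuple[int, int]],
                   edges_b: Iterable[Tuple[int, int]]) -> float:
    """Fraction of the C(n, 2) vertex pairs on which two labeled graphs differ."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    # The symmetric difference counts edges to add plus edges to remove.
    return len(a ^ b) / comb(n, 2)


# A path on 100 vertices has 99 edges out of 4950 possible pairs,
# so it is very close to the empty graph in the dense model.
path_edges = [(i, i + 1) for i in range(99)]
path_to_empty = dense_distance(100, path_edges, [])
```

Here `path_to_empty` is 99/4950, illustrating why graphs with few edges are all nearly indistinguishable from the empty graph under this metric.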

A second model which has been the focus of research is the bounded-degree model,

corresponding to an adjacency-list representation of graphs. In this model, introduced by

Goldreich and Ron in [GR02], each vertex’s degree is bounded by a fixed value d, and the

test can query a vertex to obtain any of its up to d neighbors. Asymptotically, as d ≪ n,

such graphs are all so sparse that in the dense model they would be indistinguishable

from the empty graph, and could be safely treated as empty. In the sparse graph model

also, the distance is the fraction of the total possible edges necessary to convert one

graph into the other — but in sparse graphs, a number of edges linear in n suffices to

make two graphs far from each other.

This difference between the models is not merely ‘fine’ versus ‘coarse’ resolution;

specifically, a property may be non-trivial (and not-maximally-hard) to test, in both

of these models. A telling example is the property of bipartiteness — the vertex set

being divisible into two subsets, with no edges within each set. In the dense model,

the complexity of bipartiteness is $\Omega(\varepsilon^{-1.5})$ (due to Bogdanov and Trevisan in [BT04])

and $O(\varepsilon^{-2})$ (due to Alon and Krivelevich in [AK99]); in the bounded-degree model the

complexity is $\Omega(\sqrt{n})$ (presented with the introduction of the model, in [GR02]) and

$O(\sqrt{n} \cdot \mathrm{poly}(1/\varepsilon))$ (in the subsequent [GR99] by the same authors).

These models do not cover the entire possible range of graph densities, and indeed,

Krivelevich, Kaufman and Ron have considered a model ‘mixing’ the queries possible in

the dense and the sparse models, in [KKR04] (exploring bipartiteness for different graph

densities from sparse to dense) and [AKKR08] with Alon, as well as a graph testing

model with stronger queries in [BEKKR10] with Ben-Eliezer.

This thesis focuses on testing in the dense model. However, dense testing is not

limited merely to graphs, and extends readily to other kinds of structures: A structure’s

representation includes a set or several sets of vertices, as well as a fixed number of

relations (collections of tuples), or collections of sets, with limited arity or set size.

One can thus consider the testing of dense digraphs, partite graphs, graphs with edge

colors, matrices and tensors, or more generally — hypergraphs, with or without edge

orientation, with one or more edge relations (or with edge ‘colors’). The ‘denseness’

carries to different structures through the normalized Hamming distance metric: An

ε-fraction of modifications out of the total number of possible edges/tuples/sets, or

number of matrix/tensor cells etc., makes two structures far from each other, and sparse

structures are regarded as virtually-empty.

As in the case of graphs, properties must be closed under permutations of the

vertices, so that any labeling or ordering of vertices in the representation of the structure

does not carry information; if one is testing binary matrices, for example, the two matrices

$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ represent the same object and both satisfy or fail to satisfy a given

property.
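
The equivalence of the two matrices in this example can be checked mechanically. The following Python sketch is a brute-force illustration only, under the assumption that rows and columns may be permuted independently (treating them as the two vertex sets of a bipartite-graph-like structure); the function name `same_up_to_permutation` is hypothetical.

```python
from itertools import permutations
from typing import List

Matrix = List[List[int]]


def same_up_to_permutation(a: Matrix, b: Matrix) -> bool:
    """True if some row permutation and column permutation of a yields b."""
    rows, cols = len(a), len(a[0])
    if (len(b), len(b[0])) != (rows, cols):
        return False
    for row_perm in permutations(range(rows)):
        for col_perm in permutations(range(cols)):
            if all(a[row_perm[i]][col_perm[j]] == b[i][j]
                   for i in range(rows) for j in range(cols)):
                return True
    return False


# The two matrices from the text differ only by swapping the two rows.
anti_diagonal = [[0, 1], [1, 0]]
diagonal = [[1, 0], [0, 1]]
equivalent = same_up_to_permutation(anti_diagonal, diagonal)
```

The exhaustive search is exponential in the matrix dimensions and is meant only to make the closure requirement concrete for tiny examples.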

The above example immediately leads one to consider another extension of the dense

model, to structures such as ordered matrices and hypergraphs with vertex order. While

some of the research work leading to this thesis concerned such structures, it has

thus far failed to produce any results of note, and they are therefore not explored in

this thesis. However, Fischer and Newman’s [FN07a] studies some specific properties of

multi-dimensional tensors with a partial order on their cells.

Testable and hard-to-test graph properties in the dense model

One wishes to be able to characterize which properties admit which kinds of tests:

What dependencies can one achieve of the necessary number of queries on n and ε, and

what useful features can tests be shown to have. Interestingly, [GGR98] demonstrated

that certain (graph) properties, such as k-colorability, while being NPTIME-hard as

decision problems, admit very efficient tests in the dense model — using a number of

queries independent of the size of the input graph, and depending only on the distance

parameter ε; such properties are referred to as being testable. [GGR98] established

a large class of properties as testable, and posed the characterization of the class of

properties testable in the dense model as an open problem. In the following decade,

a series of results gradually progressed towards this goal, and a characterization was

achieved in Alon, Fischer, Newman and Shapira’s [AFNS09], and independently by Borgs, Chayes,

Lovász, Sós, Szegedy and Vesztergombi in [BCL+06] (in terms of graph limits).

The main technical tool in these works is Szemerédi’s regularity lemma, stating

that large enough graphs can be decomposed into a bounded number of bipartite

graphs most of which are similar to random graphs (see Szemerédi’s own [Sze78] for

the original lemma, Fischer’s [Fis04, Section 5] for basic discussion of its use for testing,

or the characterization result in [AFNS09] itself). Unfortunately, using it incurs a

prohibitive dependence on ε — while many significant properties have a mere polynomial

dependence on ε in the number of queries. Thus the question of the dependence of the

query complexity on ε has remained a significant avenue of research.

On the other end of the spectrum from testable properties are those properties

whose query complexity is ‘maximally’ dependent on n, with query complexity $\Theta(n^2)$;

artificial such properties were presented already in [GGR98]. Between the extremes,

certain properties have been established to have various query complexity functions,

(e.g. constant powers of n below 2, as in [FM06, PRR03]).

Relating features of properties, features of tests and query complexity

Within a given testing model, general results are often derived by further qualifying

the model with certain features and obtaining bounds on query complexity or other

provable consequences. These qualifications are usually features either of the property

itself, or of the test. For example, a notable result on the way to characterizing the class

of testable graph properties in the dense model is Alon and Shapira’s [AS08a]: This

work showed that if a property is hereditary, then it is also testable (that is, it admits

a test whose number of queries is independent of the size of the input); a hereditary

property is such that any induced subgraph of a satisfying graph is also itself a satisfying

graph. In fact, it was established that hereditary properties are not only testable, but

have tests with one-sided error (that is, tests that can never reject inputs satisfying the

property, regardless of which queries they make).

Another example is of strengthening an existing upper or lower bound result on

query complexity by making additional constraints on the property, as in Goldreich

and Trevisan’s [GT03, Theorem 1]: In this improvement of a result in [GGR98], the

existence is demonstrated of properties requiring $\Omega(n^2)$ queries, which are not only in

NPTIME, as was previously known, but also monotone; a graph property is monotone

(increasing) if it is closed under adding edges, i.e. adding edges (but not vertices) to a

satisfying graph results in another satisfying graph.

A third example regards the characterization of the ‘power’ of features of tests. Such

is a result of Goldreich and Ron in [GR10] (following the earlier work of Gonen and Ron

in [GR07]) regarding adaptive tests; a test is adaptive if it considers results of previous

queries when deciding which query to make next. [GR10] finds some testable graph

properties in the dense model that exhibit a polynomial gap between an upper bound

on the query complexity of adaptive tests, and a lower bound on the query complexity

of non-adaptive tests. In the sparse graph model this gap can be exponential (assuming

the test does not have to provide the labels of queried vertices in advance).

A more restricting feature of a test than being non-adaptive is being canonical,

introduced in another section of [GT03]: A canonical test samples a number of vertices,

and queries their entire induced subgraph; it then makes a deterministic decision whether

to accept or reject the graph based on this small subgraph.

This thesis will present several results of a nature similar to these examples, as well

as introduce certain hitherto-unexplored features of properties of dense graphs (and other

dense structures).

Testing triangle-freeness

Perhaps the most studied class of properties in the dense model is that of being free

of certain families of forbidden substructures, and specifically the property of being

triangle-free. This property easily springs to mind once one begins to think up simple

properties of graphs: A first non-trivial such property may be “not having edges”,

distinguishing empty graphs from graphs with many edges; after edges, perhaps paths,

and then, perhaps a small cycle, a triangle. And while the query complexity of being

free of edges or of paths of any fixed length is easy to analyze (the query complexity

is Ω(1/ε) queries), studying triangle-freeness testing is a very challenging endeavor:

While the property is known to be testable, there is a vast gap between the lower and

upper bounds for it.

The best known upper bounds for testing a graph for being free of triangles were

until recently based on applying Szemerédi’s regularity lemma: See [Alo99], a proof

sketch in [Fis04], or a more general treatment covering any family of induced subgraphs

in Alon, Fischer, Krivelevich and Szegedy’s [AFKS00]. This construction yields a query

complexity equal to a tower function of height polynomial in 1/ε (even a double-tower

for general forbidden induced subgraphs); recently, Fox has proven in [Fox11] a tower

function upper bound for forbidden subgraphs, whose height is only logarithmic in

1/ε, by a technique similar to the one used for proving Szemerédi’s Regularity Lemma,

customized to the problem of subgraph-freeness.
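To give a rough sense of scale for these bounds, the following minimal Python sketch evaluates a tower of exponents of a given height. The function name `tower` and the choice of base 2 are illustrative assumptions, not notation taken from the cited works, and no attempt is made to reproduce their actual constants.

```python
def tower(height, base=2):
    """Return base^base^...^base with `height` levels; tower(0) is 1.

    Illustrative helper (hypothetical name): it only shows how fast a
    tower function grows with its height.
    """
    value = 1
    for _ in range(height):
        value = base ** value
    return value


if __name__ == "__main__":
    for h in range(5):
        print(h, tower(h))
    # Height 5 is already too large to print usefully; report its bit length.
    print(5, "bits:", tower(5).bit_length())
```

Since each extra level exponentiates the previous value, a bound whose height is logarithmic in 1/ε is far smaller than one whose height is polynomial in 1/ε, even though both are tower-type.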

The study of the property of triangle-freeness has also seen much use of the relations

between features of properties and features of tests, for obtaining lower bounds. The

standard approach for proving lower bounds on a property’s query complexity is Yao’s

method, named after a principle observed in Yao’s [Yao77]: if any deterministic test can’t

distinguish well enough between two fixed probability distributions, one over satisfying

graphs and one over far graphs, then no probabilistic algorithm (which is a distribution

over deterministic algorithms) can do so either, and a lower bound is established —

usually for non-adaptive tests. If the test is adaptive, proving indistinguishability

becomes more complex, as queries depending on the test’s history of queries already

made can much better distinguish between input distributions.

If we limit our attention to one-sided tests only, things become somewhat simpler:

A test querying a subgraph which in itself contains no triangles would have to accept,

as it is possible that there are no edges in the graph except the queried ones. A bound

therefore requires only constructing a single graph (for every order n) which has very few

triangles, but no small set of edges intersecting all of them. Indeed, such a construction

by Alon in [Alo02] established a bound (mildly) super-polynomial in 1/ε ; this bound

is based on a number-theoretic construction of Behrend in [Beh46] of dense sets of

integers without any three-term arithmetic progression. A recent construction by Elkin

in [Elk11] of larger arithmetic-progression-free sets allows for a slight improvement of

the [Alo02] bound.

If one could convert such one-sided lower bounds into general, two-sided bounds,

this could be a shortcut avoiding a complex adversarial Yao’s-method construction.

And indeed, [GT03] includes a proposition communicated by Noga Alon: Testable

hereditary properties can be tested by merely ensuring that most small induced

subgraphs themselves satisfy the property (with a mild increase in the number of queries).

Consequently, if the property is both hereditary and one-sided, then any test should

imply the existence of a one-sided test — and any bound on one-sided testing becomes

a bound on testing in general. Unfortunately, it later turned out that this proposition

only holds for tests which are ‘natural’: Tests acting independently of the size of the

input graph. This qualification appears in the errata [GT05].

Alon and Shapira worked in [AS06] around the hurdle of not being able to generalize

the one-sided triangle testing lower bound of [Alo02] to the two-sided setting, by proving

the same quasi-polynomial lower bound for any triangle freeness test, directly, using

Yao’s method to obtain specific indistinguishable distributions. However, this method

is limited to a specific kind of constructions, and may not necessarily apply to future

one-sided lower bounds.

1.1 Overview of results

Inflatable properties and natural property tests

In Chapter 3 (based on [FR11]) we establish links between the query complexity of natural

tests and the features of graph properties being inflatable and hereditary. Specifically,

we show that for properties which are almost hereditary and almost inflatable, any test

with query complexity independent of n can be made natural, with a polynomial increase

in its number of queries. The naturalization is carried out as a sort of extension of the

canonicalization due to Goldreich and Trevisan in [GT03], so that natural canonical tests

can be described as strongly canonical. In the reverse direction, we show that properties

admitting natural tests are approximately inflatable and approximately hereditary, with

these parameters depending on the test’s number of queries.

Using the technique for naturalization, we restore in part the claim in [GT03]

mentioned above, regarding testing hereditary properties by ensuring that a small

random subgraph itself satisfies the tested property. This restoration allows us to make

a generalization regarding lower bounds on triangle-freeness testing: Any (future) lower

bound — not only the currently established quasi-polynomial one — on one-sided testing

for triangle freeness holds essentially for two-sided testing as well. We later make use of

this generalization in the lower bounds for testing partite dense structures, in Chapter 5

(see overview below). We also demonstrate the use of this generalization through an

explicit statement and simple proof of the bound implicit in the constructions of [Elk11],

constituting a slight improvement over the best established lower bound of [AS06].

Finally, we prove a characterization of those inflatable properties which admit a

proximity-oblivious test.

Query complexity hierarchies for dense graphs and other models

In Chapter 4 (based on [GKNR10]) we consider the question of the existence of properties

with arbitrary query complexity. We answer this question affirmatively, establishing

hierarchies of query complexity classes for both the sparse and the dense model for

graph testing. Loosely speaking, we prove that for every reasonable function q(n), there

exists a property of graphs which is not testable using o(q(n)) queries, but is testable

using O(q(n)) queries.

For the sparse graph model, we establish the hierarchy theorem using a non-artificial,

easy-to-formulate property for every q(n): The property of being 3-colorable and having

connected components of order at most q(n). The q(n)-query test establishing the upper

bound is one-sided.

For the dense model, we in fact prove three variant hierarchy theorems, each for

some additional feature of the properties or the test:

• A hierarchy of query complexity classes of properties which are PTIME-decidable

(as languages) and PTIME-testable — that is, properties with a test whose

running time is polynomial in q(n).

• A hierarchy for monotone properties (although not in PTIME).

• A hierarchy for properties in which the lower bound q(n) on query complexity is

matched by a one-sided upper bound, i.e. they can be one-sided tested with q(n)

queries.

Lower bounds for partite dense structures

In Chapter 5 (based on [FR07]) we consider dense structures other than general graphs:

Bipartite graphs with colored edges and k-partite k-uniform hypergraphs — which

correspond to matrices and tensors (with no order among rows and columns), binary

or over finite domains. Relating to [AFN07], which established a polynomial upper

bound for testing binary matrices for forbidden subgraph freeness, we prove super-polynomial

lower bounds both for matrices over a trinary domain, and for 3-dimensional

binary tensors; this shows that the upper bound result, and the concept of ‘conditional

regularity’ underlying it, do not immediately extend to larger domains, nor to higher

dimensions. The lower bound is based on a reduction from testing cycle-freeness in

dense digraphs, utilizing also the result re-established in Chapter 3 regarding hereditary

property lower bounds.

Pseudo-testing hypergraph tuple partition properties

In Chapter 6 we consider the prospects of expanding the set of efficiently-testable

properties of hypergraphs with multiple (oriented) edge relations, as dense structures.

Specifically, we consider a generalization of the graph partition properties established

to be easily testable in [GGR98]. Fischer, Matsliah and Shapira show in [FMS07]

that a rudimentary generalization of such partition properties to hypergraphs is also

efficiently testable. We study a stronger and somewhat more expressive generalization,

in which not only vertices are partitioned, but also vertex tuples of higher arity. We

show that such a class of properties, while not being maximally expressive (e.g. it

does not seem to allow expression of the property of having a regular hypergraph

partition) does not have tests which are efficient in terms of ε. On the other hand,

we show that they admit an efficient ‘pseudo-test’, which distinguishes hypergraphs

satisfying such a property from hypergraphs for which every partition is far from being

satisfactory; in other words, the pseudo-test may err for hypergraphs which are far from

the property but have approximately-satisfying partitions. Unlike the case of graphs,

having such an approximately-satisfying partition does not imply closeness to having a

properly-satisfying one.

Chapter 2

Preliminaries

2.1 The dense model for property testing

This thesis concerns testing properties of dense combinatorial structures, with graphs

being the most commonly studied, and for which the testing model is usually defined.

As much of the thesis concerns other ‘dense’ structures (a concept which will be defined

shortly), we first define the model for the case of graphs, and then make definitions for

more general dense structures in Subsection 2.1.1.

In the context of this work, we refer to simple graphs, G = (V,E), with V being a set

of vertices of order n and E an edge set containing unordered pairs of vertices.

Definition 2.1.1. The absolute distance between two graphs G, H of order n is the

number of edges one has to add and/or remove in G to make it into an isomorphic copy

of H; in other words, it is the minimum over all bijections $\varphi : V(G) \to V(H)$ of the

number of edge discrepancies, that is, the size of the symmetric difference

$$\bigl\{ \{u,v\} \in E(G) \mid \{\varphi(u),\varphi(v)\} \notin E(H) \bigr\} \;\dot{\cup}\; \bigl\{ \{u,v\} \in E(H) \mid \{\varphi^{-1}(u),\varphi^{-1}(v)\} \notin E(G) \bigr\}.$$

The (relative) distance $\mathrm{dist}(G,H)$ between G and H is the absolute distance between

them normalized by a factor of $\binom{n}{2}^{-1}$.

Two graphs are said to be ε-far if their distance is at least ε (that is, they have at least

$\varepsilon\binom{n}{2}$ edge discrepancies).
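The distance of Definition 2.1.1 can be computed directly for very small graphs. The following is a minimal brute-force sketch, assuming vertices are labeled 0..n-1 and edges are given as pairs; the function names are hypothetical, and the enumeration of all n! bijections is only meant to mirror the definition, not to be efficient.

```python
from itertools import permutations


def absolute_distance(n, edges_g, edges_h):
    """Minimum number of edge discrepancies over all bijections V(G) -> V(H).

    Both graphs are simple graphs on vertices 0..n-1. Brute force over all
    n! bijections, so this is only usable for very small n.
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    best = None
    for phi in permutations(range(n)):
        # Image of E(G) under the bijection phi.
        mapped = {frozenset(phi[v] for v in e) for e in eg}
        discrepancies = len(mapped ^ eh)  # size of the symmetric difference
        if best is None or discrepancies < best:
            best = discrepancies
    return best


def relative_distance(n, edges_g, edges_h):
    """Absolute distance divided by C(n, 2); assumes n >= 2."""
    return absolute_distance(n, edges_g, edges_h) / (n * (n - 1) // 2)


def is_eps_far(n, edges_g, edges_h, eps):
    """Two graphs are eps-far if their relative distance is at least eps."""
    return relative_distance(n, edges_g, edges_h) >= eps
```

For example, a path and a cycle on four vertices are at absolute distance 1, since adding a single edge to the path yields the cycle.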

Definition 2.1.2. A property of graphs is a set $\Pi = \bigcup_{n \in \mathbb{N}} \Pi_n$ of graphs, closed under

graph isomorphism, where $\Pi_n$ is supported on graphs of order n.

A graph of order n is said to satisfy a property Π if it is an element of $\Pi_n$; a graph is

said to be ε-far from satisfying a property Π if it is ε-far from every graph $H \in \Pi_n$.

Definition 2.1.3. A dense model property test for a graph property Π is a probabilistic

oracle machine which, given the values (n, ε), as well as oracle access to a graph G of

order n, makes a certain number of edge queries (“is there an edge between the vertices

u and v?”), and distinguishes with probability at least 2/3 between the case of G being

in Π and the case of G being ε-far from Π. The (possibly adaptive) number and choice

of queries, as well as the rest of the algorithm, may in general depend on the value of n,

as can the decision to accept or reject.

Note. Many results regard tests for specific values of ε, rather than tests receiving ε

as a parameter. Alon and Shapira prove in [AS08b] that these notions are different,

with some properties only being testable with ε-specific tests rather than a general test

receiving ε as a parameter. (The difference has to do with the computational tractability

of the number of queries as a function of ε; see [Sha06, Chapter 3] for further discussion.)

The results of this thesis hold for both settings. Specifically, all upper bounds are tests

receiving ε as a parameter, while all lower bounds apply to ε-specific tests as well as

tests receiving ε as a parameter.

Definition 2.1.3, the traditional definition of a property test in the dense model,

includes an artificial dependence of the query model on the value of n: Without utilizing

this value it is not possible to make any queries. The results and observations in [GT03,

Section 4] emphasize the artifice of this particular dependence, and lead to an alternative

definition of a test avoiding it:

Definition 2.1.4 (Alternative to Definition 2.1.3). A dense model property test for a

graph property Π is a probabilistic oracle machine which is given the values (n, ε), as

well as access to a graph G of order n, through an oracle which takes two types of requests:

A request to uniformly sample an additional vertex out of the remaining vertices of

G, and an edge query within the subgraph induced by the sampled vertices (“is there

an edge between the ith and jth sampled vertices?”). The machine makes a sequence

of requests to the oracle, and distinguishes with probability at least 2/3 between the

case of G being in Π and the case of G being ε-far from Π. If the test has sampled all

vertices of the graph being tested, additional requests to sample an additional vertex

will indicate that there are none left.
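The oracle of Definition 2.1.4 can be pictured as a small interface with two requests. The sketch below is one possible rendering in Python under stated assumptions: the class name `SamplingOracle`, its method names, and the query counter are all hypothetical, and sampled vertices are addressed by their 0-based position in the sampling order so that the test never sees vertex labels.

```python
import random


class SamplingOracle:
    """Oracle in the style of Definition 2.1.4 (hypothetical class name)."""

    def __init__(self, n, edges, rng=None):
        self._edges = {frozenset(e) for e in edges}
        self._remaining = list(range(n))  # vertices not yet sampled
        self._sampled = []                # hidden labels, in sampling order
        self._rng = rng or random.Random()
        self.edge_queries = 0             # number of edge queries made so far

    def sample_vertex(self):
        """Uniformly sample one of the remaining vertices.

        Returns the index of the new vertex among the sampled ones, or None
        if every vertex of the graph has already been sampled.
        """
        if not self._remaining:
            return None
        pos = self._rng.randrange(len(self._remaining))
        # Swap the chosen vertex to the end and pop it.
        self._remaining[pos], self._remaining[-1] = (
            self._remaining[-1],
            self._remaining[pos],
        )
        self._sampled.append(self._remaining.pop())
        return len(self._sampled) - 1

    def has_edge(self, i, j):
        """Edge query between the i-th and j-th sampled vertices (0-based)."""
        self.edge_queries += 1
        return frozenset((self._sampled[i], self._sampled[j])) in self._edges
```

A test written against this interface never needs the value of n in order to issue a query, which is the point of the alternative definition.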

Definition 2.1.3 and Definition 2.1.4 are not equivalent as computational models in

general, but in the context of testing dense structure properties closed under isomorphism

— they are equivalent. With respect to graphs, this is established for all intents and

purposes in [GT03], albeit not formally stated there. Further discussion of this point

regarding dense structures in general can be found in Section 3.9.

2.1.1 General dense structures

A wide variety of dense structures are studied in this and other works on Property

Testing, so that a “most-general definition” covering them all would make for a sort of

a Swiss Army knife: General and partite graphs and hypergraphs; matrices and tensors,

over binary or other domains; hypergraphs with uniform-arity hyperedges or multiple

arities; structures might have edges as sets of vertices, or as tuples as in the case of

digraphs, or both; there may be a single edge relation, or multiple relations; et cetera.

So as to state at least in a mostly-general way what constitutes a dense structure, we

shall use the following:

Definition 2.1.5. An unconstrained general dense structure is a hypergraph $H =

((V_1, \ldots, V_k), (E_1, \ldots, E_t))$ with k vertex sets or parts, and t (hyper)edge relations

(or “colors”), each being a set of arity-$r_i$ tuples over the union of the vertex sets:

$E_i \subseteq \prod_{j=1}^{r_i} \bigcup_{\ell=1}^{k} V_\ell$.

Definition 2.1.6. A general dense structure class constraint is a sentence in First-

Order Logic without equality, with vocabulary $V_1, \ldots, V_k, E_1, \ldots, E_t$. The arity of

each $V_i$ symbol is 1, and the arity of each $E_i$ is $r_i$. A constraint must have the form

$\forall x_1 \ldots \forall x_s\, \phi(x_1, \ldots, x_s)$, with $\phi$ being unquantified; the formula $\phi$ must be made up

only of edge relation symbols of arity at least s, using all variables $x_1, \ldots, x_s$ at least

once (but with possible repetitions), vertex part containment relation symbols (using a

single variable), and Boolean connectives (including negation).

Definition 2.1.7. A class of general dense structures is the set of all unconstrained

general dense structures with the same specific k, t and arities $(r_i)_{i=1}^{t}$, which satisfy a

specific common set of constraints with the appropriate vocabulary, where the constraints

are interpreted as follows: The domain is $\bigcup_i V_i$; the $V_i$ symbols are interpreted as

containing all vertices of the ith part of the structure; and the $E_i$ symbols are interpreted

as the structure’s own relations $E_i$.

Such constraints allow the expression of the wider variety of structures mentioned above

through multi-relation hypergraphs. Some relevant examples:

• An edge relation may be constrained to be symmetric (permutation of the coordinates

does not change the edge function value). An example: Structures with

$k = 1$, $t = 1$ and $r_1 = 2$, with the constraint $\forall x\, \forall y\, [E_1(x, y) \leftrightarrow E_1(y, x)]$, are the

expression of undirected graphs (with possible self-loops).

• Several edge relations (say, $\ell$) of the same arity may be constrained to only have

some of the $2^\ell$ possible values for a certain tuple; this allows the expression of

structures with colored edges, whose maximum number of colors is not a power of

2, using multiple edge relations.

• A constraint may prevent tuples containing a single vertex more than once.

For example, to prevent self-loops in graphs, the constraint imposed would be:

$\forall x\, [\neg E_1(x, x)]$.

• An edge relation may be constrained to tuples in some specific sequence of vertex

parts $V_{j_1} \times \cdots \times V_{j_{r_i}}$; this allows the expression of bipartite digraphs or k-partite

oriented hypergraphs. For bipartite digraphs (with $k = 2$, $t = 1$, $r_1 = 2$), the

constraint would be: $\forall x_1\, \forall x_2\, [E_1(x_1, x_2) \rightarrow (V_1(x_1) \wedge V_2(x_2))]$.
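Several of the constraints listed above can be checked mechanically on a finite structure. The following minimal sketch assumes a single binary edge relation stored as a Python set of ordered pairs; the function names are hypothetical, and each check simply evaluates the corresponding universally quantified formula over the given domain.

```python
def is_symmetric(vertices, e1):
    """Check: for all x, y, E1(x, y) holds if and only if E1(y, x) holds."""
    return all(
        ((x, y) in e1) == ((y, x) in e1) for x in vertices for y in vertices
    )


def has_no_self_loops(vertices, e1):
    """Check: for all x, E1(x, x) does not hold."""
    return all((x, x) not in e1 for x in vertices)


def is_bipartite_oriented(v1, v2, e1):
    """Check: every edge goes from the first part to the second part.

    That is, for all x1, x2 in the domain, E1(x1, x2) implies that x1 is in
    the first part and x2 is in the second part.
    """
    domain = set(v1) | set(v2)
    return all(
        (x1, x2) not in e1 or (x1 in v1 and x2 in v2)
        for x1 in domain
        for x2 in domain
    )
```

A structure belongs to a class when it passes every check associated with that class; for instance, simple undirected graphs would correspond to passing both the symmetry check and the self-loop check.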

Finally, while some structures do not technically fit even this general definition (e.g.

matrices or tensors which have no vertices) — they can easily be expressed by a general

dense structure with a simple transformation or reinterpretation (e.g. interpreting a

matrix as the adjacency matrix of a bipartite graph). We will thus refer to them as

dense structures as well.

Definition 2.1.8. A property of general dense structures of a certain class is a set Π

of structures, all satisfying the constraints associated with the class, which is closed

under isomorphism (i.e. closed under permutation of the vertices in each part).

Definition 2.1.9. For a class of general dense structures with one edge relation E of

arity r, the absolute distance between structures in that class is the minimum, over all

bijections between the vertex sets, of the number of discrepant tuples with respect to the edge relation.

(This is different than our definition for graphs, in that tuples are counted rather than

sets.) The (relative) distance is the absolute distance normalized by $n^{-r}$.

For classes with multiple edge relations, the absolute distance is not a meaningful

concept, as the number of tuples in each edge relation is of a different order of magnitude

with respect to n. The (relative) distance, with respect to a specific bijection between

the vertices of corresponding parts of the structures, is the maximum over all edge

relations Ei of the number of discrepancies with respect to the bijection in that edge

relation, normalized by $n^{-r_i}$. The overall (relative) distance is the minimum of the

above over all bijections.

Note. One can, as an alternative to the definition above, further normalize the distance

by the maximum possible distance between two structures in the class (as in the case of

simple undirected graphs, where the distance is a fraction of $\binom{n}{2}$).

Definition 2.1.10. A general dense structure with n vertices in each of its parts is

said to be of uniform order n; if the number of vertices in each part differs, then the

structure is said to be of (non-uniform) order (n1, n2, . . . , nk).

Definition 2.1.11. A dense model property test for a property Π of a certain kind of

dense structures is a probabilistic oracle machine which, given the values $(n_1, \ldots, n_k, \varepsilon)$,

as well as oracle access to a structure H with $n_i$ vertices in the ith of its k parts, makes a

certain number of tuple queries (“is the tuple $(x_1, \ldots, x_{r_i})$ in the edge relation $E_i$?”),

and distinguishes with probability at least 2/3 between the case of H being in Π and

the case of H being ε-far from Π.

A dense model uniform-order property test for a property Π is a test as per the

above, except that the structure tested is guaranteed to be of uniform order n, and the

test is given the values (n, ε).

Note. The alternative definition for a property test in Definition 2.1.4, without an

artificial dependence on the number of vertices (in each part), can be made similarly in

the case of a general dense structure, with the oracle receiving either requests to sample

a vertex from one of the parts of the structure, or queries regarding the presence of tuples

of already-sampled vertices in one of the structure’s edge relations.

2.2 Features of dense structure property tests

As discussed in the introduction, it is interesting to distinguish tests not just by their

use of computational resources (queries, running time, etc.) but also by other features

specific to the setting of dense structure property testing or property testing in general.

Definition 2.2.1. A property test is said to be one-sided (or said to have one-sided

error) if it accepts all graphs in Π with probability 1.

Definition 2.2.2. A property test is said to be adaptive if the queries it makes to the

oracle may depend in some way on the results of previous queries. If no query made by

the test depends on previous query results, the test is said to be non-adaptive.

Definition 2.2.3. A test for a graph property Π is said to be canonical if, for some

function $s : \mathbb{N} \times (0, 1) \to \mathbb{N}$ and some sequence of properties $(\Pi^{(i)})_{i \in \mathbb{N}}$ (with $\Pi^{(i)}$

consisting of structures of size $s(i, \varepsilon)$), the test operates as follows: On input n and oracle

access to an n-vertex graph G, the test samples uniformly a set of $s(n, \varepsilon)$ distinct vertices

of G, queries the entire corresponding induced subgraph, and accepts if and only if this

subgraph is in $\Pi^{(n)}$. If the graph has fewer than $s(n, \varepsilon)$ vertices, the test queries the

entire graph and accepts if it is in Π.

For a general dense structure, a canonical uniform-order test samples s(n, ε) vertices

from each one of the k parts, and queries the substructure induced by these sampled

vertices. If the structure has fewer than s(n, ε) vertices per part, the test queries the

entire structure and accepts if it is in Π.
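The shape of a canonical test for graphs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a test with any proven guarantee: the names `canonical_test` and `triangle_free` are hypothetical, the sample size is passed in as a plain parameter rather than derived from ε, and a single acceptance predicate stands in for the whole sequence of properties, which is a simplification suited to a hereditary property such as triangle-freeness.

```python
import random
from itertools import combinations


def canonical_test(n, has_edge, sample_size, accepts_subgraph, rng=None):
    """Sample vertices, query the whole induced subgraph, decide deterministically.

    `has_edge(u, v)` plays the role of the edge-query oracle. If the graph
    has fewer than `sample_size` vertices, the entire graph is queried.
    """
    rng = rng or random.Random()
    s = min(sample_size, n)
    sample = rng.sample(range(n), s)
    induced = {
        frozenset((u, v)) for u, v in combinations(sample, 2) if has_edge(u, v)
    }
    return accepts_subgraph(sample, induced)


def triangle_free(vertices, edges):
    """Acceptance predicate: the queried subgraph contains no triangle."""
    return not any(
        frozenset((a, b)) in edges
        and frozenset((b, c)) in edges
        and frozenset((a, c)) in edges
        for a, b, c in combinations(vertices, 3)
    )
```

With `triangle_free` as the predicate, the sketch never rejects a triangle-free input, since every induced subgraph of such a graph is itself triangle-free; in that sense it has one-sided error.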

Note. For multi-partite dense structures, this definition is somewhat lacking — it does

not cover tests of non-uniform-order structures. See Section 3.9 for further discussion

and a reasoning for limiting the definition’s scope in this work.

Definition 2.2.4 (as appearing in [GT05]). A (graph) property test is said to be natural

if its query complexity is independent of the size of the tested structure, and on

input (n, ε) and oracle access to a graph of order n, the test’s output is based solely

on the sequence of oracle answers it receives, and not on n (while possibly using more

random bits, provided that their number and use is also independent of n).

If our graph property tests are as defined traditionally (Definition 2.1.3), the above

definition of a natural test is flawed, and no test which makes any queries can be natural:

A test cannot make q(ε) queries to an input graph with fewer than $\sqrt{q(\varepsilon)}$ vertices (this

point is also mentioned in [AS08a]). Instead of amending the definition of naturality

to avoid this semantic issue, it seems more reasonable to use the alternative definition

for the dense graph model, Definition 2.1.4, in which the artificial dependence on n is

removed. In this case, Definition 2.2.4 is valid: If the test attempts to sample too many

vertices, the oracle indicates its failure to do so and the test proceeds accordingly. In

fact, in this work the implicit assumption is made that whenever a test attempts to

sample more vertices than the vertex set contains, the oracle indicates that this is the

case, and the test proceeds to query the entire structure, accepting it deterministically

if it satisfies the property being tested.

In Chapter 3 we further develop the notions of canonicality and naturality of tests,

and explore their interrelation.

2.3 Features of dense structure properties

Definition 2.3.1. A property is said to be testable if it has a test whose maximum

number of queries is independent of n, and depends only on ε. If the maximum number

of queries is a polynomial function in 1/ε , the property is said to be polynomially

testable.

Definition 2.3.2. A graph property Π is said to be decidable in complexity class

CLASS if, for some reasonable string encoding of graphs (so that the string length is

polynomial in the order of the graph), the language consisting of these encodings for

the graphs of Π is in CLASS.

Thus a property is in PTIME if the language of Π graph encodings is accepted by a

deterministic Turing machine running in time polynomial in the length of its input, etc.

A similar definition can be made for non-graph structures — dense or otherwise.

Definition 2.3.3. A property of graphs is said to be ε-testable in PTIME, if it has

an ε-test, whose running time is bounded by a polynomial function of its number of

queries (rather than polynomial in n). The property is said to be testable in PTIME

or PTIME-testable if it is ε-testable in PTIME for every ε > 0.

Definition 2.3.4. A property is said to be hereditary if it is closed under the taking

of induced substructures.

Hereditary properties can be characterized by a (possibly infinite) set $F_\Pi$ of forbidden

induced substructures: a structure satisfies a hereditary property Π if and only if it

has no induced substructure from the forbidden set $F_\Pi$.
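For graphs, the characterization by forbidden induced substructures translates into a direct (if exponential) membership check. The sketch below is a brute-force illustration under stated assumptions: the function names are hypothetical, graphs live on vertices 0..n-1, and the forbidden family is a finite list of (order, edge list) pairs, whereas the family in the characterization may be infinite.

```python
from itertools import permutations


def has_induced_copy(n, edges, k, pattern_edges):
    """Does the graph on 0..n-1 contain an induced copy of the pattern on 0..k-1?"""
    g = {frozenset(e) for e in edges}
    p = {frozenset(e) for e in pattern_edges}
    # Try every injective map from pattern vertices to graph vertices.
    for image in permutations(range(n), k):
        if all(
            (frozenset((image[a], image[b])) in g) == (frozenset((a, b)) in p)
            for a in range(k)
            for b in range(a + 1, k)
        ):
            return True
    return False


def satisfies_hereditary(n, edges, forbidden):
    """True if the graph has no induced copy of any (k, pattern_edges) in `forbidden`."""
    return not any(has_induced_copy(n, edges, k, pe) for k, pe in forbidden)
```

Triangle-freeness, for example, corresponds to a forbidden family consisting of the triangle alone.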

Definition 2.3.5. A property of graphs or hypergraphs is said to be downwards monotone

if it is closed under the removal of edges (while maintaining the same number of

vertices). If a property is closed under the addition of edges, it is said to be upwards

monotone.

Note. This notion of monotonicity is wider than that used in [AS05], which combines

the notions of monotonicity and heredity defined here.

Definition 2.3.6. A graph G′ = (V ′, E′) is said to be a blowup of a graph G = (V,E)

if V ′ can be partitioned into |V | clusters of vertices, each corresponding to a vertex in

V , where the edges in E′ between these clusters correspond to the edges of E. In other

words, if (u, v) ∈ E then the bipartite graph between the clusters corresponding to u

and v is complete, and if (u, v) /∈ E then this bipartite graph is empty. G′ must also

have no edges within the clusters of such a partition. A graph blowup is said to be:

an exactly-balanced blowup if the clusters in V ′ (corresponding to the vertices of G)

all have exactly the same size (and, in particular, |V | divides |V ′|). In this case,

for t = |V ′|/|V |, G′ is also said to be a t-factor blowup of G.

a balanced blowup if all clusters are of size either $\lfloor |V'|/|V| \rfloor$ or $\lceil |V'|/|V| \rceil$. The

unqualified term ‘blowup’ indicates a balanced blowup.

a generalized blowup if all clusters in V ′ are non-empty (but have no other restriction

on their sizes).

a relaxed generalized blowup if the clusters in V ′ may have any size, with some possibly

being empty.
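A balanced blowup of a small graph can be constructed explicitly. The following minimal sketch assumes the base graph has vertices 0..n-1 and no self-loops; the function name `balanced_blowup` is hypothetical, and the choice of which clusters receive the extra vertex when the target order is not a multiple of n is arbitrary (here, the lowest-numbered ones).

```python
def balanced_blowup(n, edges, n_prime):
    """Return (clusters, blown_edges) for a balanced blowup to n_prime vertices.

    Cluster sizes are the floor or the ceiling of n_prime / n. Clusters of
    adjacent base vertices are joined completely, clusters of non-adjacent
    base vertices are not joined at all, and no edge lies inside a cluster.
    """
    base, extra = divmod(n_prime, n)
    clusters, next_vertex = [], 0
    for v in range(n):
        size = base + (1 if v < extra else 0)
        clusters.append(list(range(next_vertex, next_vertex + size)))
        next_vertex += size
    blown_edges = {
        frozenset((a, b))
        for u, v in edges
        for a in clusters[u]
        for b in clusters[v]
    }
    return clusters, blown_edges
```

When n divides the target order, the result is an exactly-balanced blowup; for instance, blowing up a triangle to six vertices gives three clusters of size two and twelve edges.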

The above definition requires an explicit statement for classes of general dense

structures:

Definition 2.3.7. A dense structure $H' = ((V'_1, \ldots, V'_k), (E'_1, \ldots, E'_t))$ is said to be
a blowup of a structure $H = ((V_1, \ldots, V_k), (E_1, \ldots, E_t))$ if it satisfies the following.
First, each of its vertex sets $V'_i$ can be partitioned into $|V_i|$ clusters of vertices, with
each cluster $C_v$ corresponding to some vertex $v \in V_i$. Additionally, the tuples in each
$E'_j$ correspond to the tuples in $E_j$: if $x = (x_1, \ldots, x_{r_j}) \in E_j$ then the complete
$r_j$-uniform oriented hypergraph $\prod_{\ell=1}^{r_j} C_{x_\ell}$ is contained in $E'_j$, and if
$x \notin E_j$ then $E'_j$ contains no hyperedge of this hypergraph. In particular, if $H$ has no
hyperedges involving the same vertex more than once, then $H'$ has no hyperedges with
more than one constituent vertex within the same cluster.

A blowup is said to be balanced if the clusters in each V ′i all have the same size up

to a difference of at most 1; and exactly-balanced if the clusters have exactly the same

sizes (and, in particular, |Vi| divides |V ′i |). In this case, for ti = |V ′i |/|Vi|, H ′ is also said

to be a (t1, . . . , tk)-factor blowup of H; if ti = t for all i ∈ [k], the blowup is said to be

a t-factor blowup of H.

Observation 2.3.8. General dense structure classes are, in themselves, closed to taking

blowups: It is easy to verify that any constraint satisfied by a dense structure is also


necessarily satisfied by blowups of structures in the class. General dense structure

classes are also closed to taking induced substructures, by a similar argument.

Definition 2.3.9. A property Π is said to be inflatable if it is closed under (balanced)

blowups, i.e. if G satisfies Π, then so does any blowup of G.

The concept of inflatability, which this thesis introduces, is explored in Chapter 3.

2.4 Testing-Reductions between properties

The following definition is relevant essentially to any model for property testing, not

merely dense graphs or other dense structures.

Definition 2.4.1. Consider two classes CLASS and CLASS′ of combinatorial objects,

each with some distance metric and some measure of object ‘size’ (e.g. the number of

vertices in a graph or the number of bits in the representation of the object). Also, let

fr : R+→ [0, 1] be a continuous function and gr, hr : N→ N. The testing of a property

Π ⊆ CLASS, in some testing model, is said to be (fr, gr, hr)-reducible to the testing of

property Π′ ⊆ CLASS′ in another testing model if, given oracle access to a structure

K ∈ CLASS (with possible queries according to the testing model for Π), one may

simulate an oracle to a structure K ′ ∈ CLASS′ (accepting queries according to the

second testing model) with the oracle satisfying the following:

1. If K is of size n then K ′ is of size at most O(hr(n)).

2. If K ∈ Π then K ′ ∈ Π′.

3. If K is ε-far from Π (according to the CLASS metric) then K ′ is fr(ε)-far from

Π′ (according to the CLASS′ metric).

4. To answer a query regarding K ′, one must make at most gr(n) queries to K.

Abusing the definition somewhat, we shall sometimes describe Π as being reducible to

Π′.
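To make the oracle-simulation part of Definition 2.4.1 concrete, here is a minimal Python sketch in which an oracle to the t-factor blowup K′ of a graph K is simulated from an oracle to K. The class names `GraphOracle` and `BlowupOracle`, and the convention that blowup vertex x lies in the cluster of vertex x // t, are assumptions for illustration. The sketch only exhibits items 1 and 4 (with hr(n) = t·n and gr(n) = 1); whether items 2 and 3 hold depends on the properties Π and Π′ involved, and is not established by the code.

```python
class GraphOracle:
    """Adjacency oracle for a simple graph on vertices 0..n-1, counting queries."""

    def __init__(self, n, edges):
        self.n = n
        self._edges = {frozenset(e) for e in edges}
        self.queries = 0

    def adjacent(self, u, v):
        self.queries += 1
        return frozenset((u, v)) in self._edges


class BlowupOracle:
    """Simulated oracle to the t-factor blowup K' of the graph behind `base`."""

    def __init__(self, base, t):
        self.base = base
        self.t = t
        self.n = base.n * t  # item 1: the size of K' is t * n

    def adjacent(self, x, y):
        u, v = x // self.t, y // self.t  # clusters containing x and y
        if u == v:
            return False  # no edges inside a cluster; no query to K is needed
        return self.base.adjacent(u, v)  # item 4: at most one query to K
```

A tester for Π′ can then be run against `BlowupOracle(base, t)` while `base.queries` records how many queries were actually made to K.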

Lemma 2.4.2. If, in the above settings, the query complexity of Π′ is O(q(n, ε)), then

the query complexity of Π is O(q(hr(n), fr(ε)) · gr(n)).

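Conversely, let $f_r(\varepsilon)$ be continuous, with its image containing some interval $(0, \varepsilon_0)$,
let $h_r$ have an infinite image, and let
\[ h_r^{-1}(n) = \min\{\, n' \in \mathbb{N} \mid h_r(n') = n \,\}, \qquad f_r^{-1}(\varepsilon) = \max\{\, \varepsilon' \in \mathbb{R}^{+} \mid f_r(\varepsilon') = \varepsilon \,\}. \]
If the query complexity of Π is $\Omega(q'(n, \varepsilon))$, then the reducibility implies that the
query complexity of Π′ is
\[ \Omega\!\left( \frac{1}{g_r(h_r^{-1}(n))} \cdot q'\big(h_r^{-1}(n), f_r^{-1}(\varepsilon)\big) \right) \]
(for $\varepsilon < \varepsilon_0$ and the values of $n$ for which $h_r^{-1}(n)$ is defined).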


Proof. For the upper bound claim: Given a structure in CLASS, one applies the

test for Π′ (with hr(n) and fr(ε) instead of n, ε) while simulating oracle access to the

corresponding structure in CLASS′. This is by definition a valid test for Π, making

the claimed number of queries.

For the lower bound one uses the reduction from testing Π to Π′, obtaining a valid
test as in the above. If the Π′ test makes
\[ o\!\left( \frac{1}{g_r(h_r^{-1}(n'))} \cdot q'\big(h_r^{-1}(n'), f_r^{-1}(\varepsilon')\big) \right) \]
queries given $n', \varepsilon'$, then when given $n' = h_r(n)$, $\varepsilon' = f_r(\varepsilon)$, it makes
$o\big( \frac{1}{g_r(n'')} \cdot q'(n'', \varepsilon'') \big)$ queries to the simulated oracle, for
$n'' = h_r^{-1}(h_r(n)) \le n$ and $\varepsilon'' = f_r^{-1}(f_r(\varepsilon)) \ge \varepsilon$,
with each query requiring at most $g_r(n'')$ queries to the real oracle; thus the actual
number of queries is $o(q'(n'', \varepsilon'')) = o(q'(n, \varepsilon))$, contradicting the query complexity lower
bound for Π. These last two steps of our argument can be made since the range of $n''$
is unbounded, and $\varepsilon''$ can be arbitrarily close to 0, so that $\limsup_{n'' \to \infty} n = \infty$ and
$\liminf_{\varepsilon'' \to 0} \varepsilon = 0$.
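The upper bound of Lemma 2.4.2 is simply a composition of the functions involved, which can be spelled out in a few lines of Python. This is only a sketch of the arithmetic: the function name `reduced_query_bound` and the example functions passed to it are hypothetical, and constant factors hidden by the O-notation are ignored.

```python
def reduced_query_bound(q, f, g, h):
    """Return the bound (n, eps) -> q(h(n), f(eps)) * g(n) from Lemma 2.4.2.

    q: query complexity bound of the target property, as a function of (n, eps)
    f, g, h: the functions f_r, g_r, h_r of the reduction
    """
    def bound(n, eps):
        # Run the target test on a structure of size h(n) with distance
        # parameter f(eps); each of its queries costs at most g(n) real queries.
        return q(h(n), f(eps)) * g(n)

    return bound


# Hypothetical example: a target test making 1/eps^2 queries, reached through a
# reduction that halves the distance and needs 3 real queries per simulated one.
example = reduced_query_bound(
    q=lambda n, eps: 1 / eps**2,
    f=lambda eps: eps / 2,
    g=lambda n: 3,
    h=lambda n: n * n,
)
```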

Observation 2.4.3. Reductions defined as per the above preserve one-sided error (in

the construction of Π-testers from Π′-testers), but they do not necessarily preserve

non-adaptivity if the query translation (item 4 above) is not itself non-adaptive.

Observation 2.4.4. If Π1 is (f1, g1, h1)-reducible to testing Π2, and Π2 is (f2, g2, h2)-

reducible to testing Π3, then Π1 is (f1 ∘ f2, g1 · g2, h1 ∘ h2)-reducible to testing Π3

— assuming that h2(n) is monotone increasing (otherwise one has to account more

accurately for the O(h1(n)) structure sizes resulting from the first reduction).


Chapter 3

Inflatable properties and natural

property tests

3.1 Introduction

Goldreich and Trevisan’s [GT03] includes two results tying several features of properties

and tests together. Their article defined the feature of property tests being canonical

(Definition 2.2.3). The article also demonstrates how any test can be made canonical

with at most about a squaring of its number of queries; this immediately implies that the

gap between properties’ adaptive and non-adaptive query complexity (see the discussion

in Chapter 1) is at most quadratic. The second result (due to Noga Alon) was mentioned

in Chapter 1 with respect to triangle-freeness testing: If a property is hereditary, then a

test for it can be replaced with merely ensuring that a small sampled subgraph satisfies

the same property as the large one. However, the proof in [GT03] implicitly assumes

that the test is natural (as in Definition 2.2.4); thus this result must be qualified, and is

not usable as such for deriving lower bounds on testing a property in general.

It seems odd, however, that tests for hereditary properties could circumvent the

argument in [GT03]. Many hereditary properties (specifically, those with finite families

of forbidden graphs) are highly ‘local’ in their definition; wherefore might they benefit

significantly from basing their action on the order of the entire input graph? If we

constrain ourselves to properties with features preventing blatant ‘pathologies’ which

preclude natural tests (e.g. the property of graphs having an odd number of vertices) —

then one tends to believe that property tests are ‘essentially natural’, so that perhaps

one can ‘smooth out’ any non-natural artificial dependence of tests on n.

The relevant features of properties allowing this adjustment will have to do with

their heredity, on the one hand, and their inflatability on the other. For an intuition

for the choice of these features, think of a property test as being canonical, with a

set of acceptable subgraphs for each order n of the input graph; in general, this set

may gain or lose elements as n increases or decreases; we want to ‘fix’ it somehow.

Constraining a property to be hereditary intuitively ‘covers’ one direction of change


in n : As the input order increases, the set of forbidden subgraphs increasingly gains

elements, so one expects the set of subgraphs accepted by the test to shrink gradually,

or at least fail to grow — since the test is supposed to reject query results indicating

the presence of forbidden subgraphs. In the other direction, we would like the test’s

set of accepted subgraphs not to grow as n goes down; now, if whatever it is we accept

at a certain order also appears at higher orders — through blowups — then we do

not expect the set of accepted subgraphs to shrink. Again, this is merely intuition. A

concrete immediate effect of requiring inflatability is precluding the pathology of graphs

going from satisfying a property at order n to being very far from satisfying it by merely

adding a vertex.

With regard to the idea of ‘smoothing out’ non-naturality, a typical example would

be a test which arbitrarily rejects some specific queried subgraph at, say, even orders,

and accepts it at odd ones. If this subgraph is very unlikely to appear in graphs in the

property, then a natural test could be ‘spoiled’ by adding this behavior to it, while still

remaining a valid test. However, this can only be done for a single possible queried

subgraph, or few of them — such behavior is impossible with all acceptable graphs,

or with any subset of them which has an overall high probability of being sampled.

This leads one to recall that, in Alon, Fischer, Newman and Shapira’s [AFNS09], the

characterization of testability uses the set of all subgraphs of a fixed order accepted by

a canonical test. Even more relevant is Fischer and Newman’s [FN07b] (proving that

testable properties are also estimable, a key result necessary for the characterization

in [AFNS09]), where it is observed that if one has a good estimate of the subgraph

distribution, then one knows in particular whether a test querying subgraphs of this

order accepts with high probability or not. In fact, disregarding the heavy use of

Szemerédi’s regularity lemma in [FN07b], its result is based mostly on estimating the

subgraph distribution up to a small variation distance — an approach sometimes referred

to as “meta-testing”.

Indeed, by analyzing tests with a focus on the distribution of subgraphs of a fixed

order and its behavior in subgraphs and blowups, under the constraints of heredity and

inflatability (even with a little relaxation), tests can be made natural, with a polynomial

penalty in the number of queries. This technique, the concept of inflatable properties,

and some of the aspects of our analysis, allow us to achieve several related results —

including a partial restoration of the proposition regarding testing hereditary properties

— and to draw conclusions regarding lower bounds for testing triangle (and other induced

subgraph) freeness.


3.2 Additional preliminaries

3.2.1 On features of properties and of tests

Canonicality

The definition of a property test being canonical appears above, as Definition 2.2.3. Any

test can be made canonical:

Theorem ([GT03, Theorem 2]). If a graph property has a test making q(ε) queries

involving at most s(ε) vertices, independently of the size of the input graph, then it

has a canonical test with queried subgraph order at most 9s(ε) (and query complexity

O(q(ε)²)). If the original test is one-sided, this canonical test’s queried subgraph order

is s(ε) and it is also one-sided.

Note. The theorem, as appearing in [GT03], is not phrased in terms of the order of the

sampled subgraph as in the above; this bound is to be found in the theorem’s proof:

The original test is repeated 9 times, and the majority-vote is used, to reduce the
probability of error from 1/3 to 1/6; see also [GT05, Page 2, Footnote 1]. If one

wishes the canonical test to succeed with higher probability, this can be achieved by

repeating the original pre-canonicalized test additional times (and using a majority vote)

before applying the canonicalization itself; the penalty is a constant-factor increase in

the final order of the queried subgraph.

A canonical test, which accepts a graph G when the queried subgraph on its sampled

vertices is G′, is said to accept G by sample G′.

In this chapter we will be dealing mostly with tests which combine both the features

of canonicality and naturality, focusing on making canonical tests natural as well. For

canonical tests, the feature of naturality means that the ‘internal’ property, the one

for which the sampled subgraph is checked for, does not depend on the order of the

input graph. This observation leads us to use naturality to define several ‘levels’ of

canonicality for a property test:

Definition 3.2.1. Consider a canonical test for graph property Π, with $(\Pi^{(i)})_{i=1}^{\infty}$ being
the sequence of properties the satisfaction of which the test checks for its sampled
order-s subgraph. The test is said to be

perfectly canonical when $\Pi^{(n)} = \Pi$: The test does nothing but ensure that a small
random subgraph satisfies the same property that the larger input graph is being
tested for.

strongly canonical when $\Pi^{(n)} = \Pi'$: The test ensures that a small sampled subgraph
satisfies some fixed property, the same one for any order of the input graph, but
not necessarily Π itself.

weakly canonical for any $(\Pi^{(i)})_{i=1}^{\infty}$: It may be the case that $\Pi^{(n)}$ is different for different
input graph orders n.


Notes.

– Indeed, a test is strongly canonical if and only if it is both canonical and natural.

– In Alon and Shapira’s [AS08a], the term oblivious is used for what we have defined

as a strongly canonical test.

– There is only one perfectly canonical test for any queried subgraph order; of course,

for many properties this will not constitute a test, as it will not distinguish satisfying

graphs from far graphs with sufficient probability.

Approximate inflatability and heredity

We have defined what it means for a property to be inflatable and hereditary, in exact

terms. In this chapter we require relaxations of these definitions, to be able to describe

properties as approximately hereditary or approximately inflatable. These definitions

will concern random subgraphs and “random blowups” of graphs, so we first discuss the

latter briefly.

Definition 3.2.2. A random blowup of a graph from order n to order n′ is the blowup

in which the n′ (mod n) vertices having the larger clusters in the blowup (clusters of
size ⌈n′/n⌉ rather than ⌊n′/n⌋) are chosen uniformly at random.
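The sampling in Definition 3.2.2 amounts to choosing which n′ (mod n) vertices receive the larger clusters. A minimal Python sketch, assuming vertices are the integers 0..n-1 and using a hypothetical function name `random_blowup_cluster_sizes`:

```python
import random


def random_blowup_cluster_sizes(n, n_prime, rng=random):
    """Sample the cluster sizes of a random blowup from order n to order n'.

    Exactly n' mod n vertices, chosen uniformly at random, get clusters of
    size floor(n'/n) + 1; all other vertices get clusters of size floor(n'/n).
    """
    m, k = divmod(n_prime, n)
    larger = set(rng.sample(range(n), k))
    return [m + (1 if v in larger else 0) for v in range(n)]
```

The returned list can be fed to any blowup construction that accepts prescribed cluster sizes; passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.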

Definition 3.2.3. Let G,H be graphs of the same order, let π : V (G)→ V (H) be a

bijection and let G′ be a blowup of G. A blowup H ′ of H to the same order as G′ is

said to correspond to G′ if for every v ∈ V (G), the size of v’s cluster in G′ is the same

as the size of π(v)’s cluster in H ′. In other words, “the same” vertices in G and H get

larger clusters.

Lemma 3.2.4. Let G ≠ H be graphs of order n, let n′ > n, and let π : V (G)→ V (H)
be a bijection achieving dist(G,H), i.e. exhibiting $\mathrm{dist}(G,H) \cdot \binom{n}{2}$ discrepancies. If one
uniformly samples a blowup G′ of G to order n′, and applies a corresponding blowup to
H, then the expected distance between the two blowups is strictly lower than dist(G,H).

Proof. We show that the expected number of discrepancies under a bijection mapping
each vertex in the cluster of v to a vertex in the cluster of π(v) is less than $\mathrm{dist}(G,H) \cdot \binom{n'}{2}$,
implying the claim. By the linearity of expectation, it suffices to show that for every
pair of vertices u, v which exhibits a discrepancy under π before the blowup, the
expected number of discrepancies of the two corresponding clusters in G′ and H′ is
under $(n'/n)^2 < \binom{n'}{2} / \binom{n}{2}$.

Now, let k = n′ (mod n) and m = ⌊n′/n⌋. The number of discrepancies due to
u, v is the product of the sizes of u and v’s clusters (denote their sizes cs(u), cs(v)).


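Each of these clusters has size either m or m + 1; thus
\begin{align*}
\mathrm{Ex}[cs(u) \cdot cs(v)] &= 1 \cdot (m \cdot m) + \Pr[cs(u) = m+1] \cdot (1 \cdot m) \\
&\quad + \Pr[cs(v) = m+1] \cdot (m \cdot 1) \\
&\quad + \Pr[cs(u) = cs(v) = m+1] \cdot (1 \cdot 1) \\
&= m^2 + 2 \cdot m \cdot \frac{k}{n} + \Pr[cs(u) = cs(v) = m+1] \\
&= m^2 + 2 \cdot m \cdot \frac{k}{n} + \left( \frac{k}{n} \cdot \frac{k-1}{n-1} \right) < \left( m + \frac{k}{n} \right)^2 = \left( \frac{n'}{n} \right)^2
\end{align*}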

This completes the proof.
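The expectation computed in the proof above can be checked by brute force for small parameters. The following Python sketch enumerates all equally likely choices of the larger clusters and compares the exact expectation of cs(u) · cs(v) with the closed form m² + 2m·k/n + (k/n)·((k−1)/(n−1)). The function names are hypothetical, u and v are taken to be vertices 0 and 1, and n ≥ 2 is assumed.

```python
from fractions import Fraction
from itertools import combinations


def expected_cluster_product(n, n_prime):
    """Exact Ex[cs(u) * cs(v)] for u = 0, v = 1 under a random blowup."""
    m, k = divmod(n_prime, n)
    total = Fraction(0)
    count = 0
    # Every k-subset of vertices is equally likely to get the larger clusters.
    for larger in combinations(range(n), k):
        cs_u = m + (1 if 0 in larger else 0)
        cs_v = m + (1 if 1 in larger else 0)
        total += cs_u * cs_v
        count += 1
    return total / count


def closed_form(n, n_prime):
    """The closed-form value of the same expectation, as derived in the proof."""
    m, k = divmod(n_prime, n)
    return m * m + 2 * m * Fraction(k, n) + Fraction(k, n) * Fraction(k - 1, n - 1)
```

For n = 5 and n′ = 13 (so m = 2, k = 3) both functions give 67/10, which is below (13/5)² = 169/25.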

Incidentally, Pikhurko has shown in [Pik10, Lemma 14] that the distance between
blowups can’t be very far below the distance between the original graphs:
$\mathrm{dist}(G', H') \ge \frac{1}{3}\,\mathrm{dist}(G, H)$ for exactly-balanced blowups; this non-trivial direction
of the distance bound, however, is only relevant to Chapter 4 of this work (see, specifically,
Subsection 4.5.2), and not to this chapter.

Definition 3.2.5. A graph property Π is said to be (s, δ)-inflatable if for any graph G

satisfying Π, of order at least s, all blowups of G are δ-close to satisfying Π. A property

Π is said to be (s, δ)-inflatable on the average if for any graph G satisfying Π, of order

at least s, the expected distance from Π of blowups of G to any fixed order (a uniform

sampling out of all possible blowups to that order) is less than δ.

As noted above, blowups do not affect graph distances overmuch. This implies that

taking a blowup cannot drive you too far away from an inflatable property:

Proposition 3.2.6. Let property Π be (s, δ)-inflatable on the average, let G be a graph
of order n ≥ s, and let n′ > n. The expected distance of G from the property does not
increase by more than δ with a random blowup, i.e. $\mathrm{Ex}_{G'}[\mathrm{dist}(G', \Pi)] \le \mathrm{dist}(G, \Pi) + \delta$.

Proof. Let H ∈ Π be a graph of the same order as G such that dist(G,Π) = dist(G,H).
Let G′ and H′ be corresponding random blowups of G and H respectively (as per
Definition 3.2.3). Lemma 3.2.4 gives $\mathrm{Ex}_{G'}[\mathrm{dist}(G', H')] < \mathrm{dist}(G, H)$; also, since Π is
(s, δ)-inflatable on the average, since H is of order at least s, and since H′ is also a
random blowup, its own expected distance from Π is less than δ. We can now use the
triangle inequality to conclude that:
\begin{align*}
\mathrm{Ex}_{G'}[\mathrm{dist}(G', \Pi)] &\le \mathrm{Ex}_{G'}[\mathrm{dist}(G', H') + \mathrm{dist}(H', \Pi)] \\
&= \mathrm{Ex}_{G'}[\mathrm{dist}(G', H')] + \mathrm{Ex}_{G'}[\mathrm{dist}(H', \Pi)] \\
&< \mathrm{dist}(G, H) + \delta = \mathrm{dist}(G, \Pi) + \delta
\end{align*}

as claimed.


Having defined the approximate notion of inflatability, let us make a similar definition

of approximate heredity:

Definition 3.2.7. A property Π is said to be (s, δ)-hereditary if, for every graph in Π,

all of its induced subgraphs of order at least s are δ-close to Π. A property Π is said to

be (s, δ)-hereditary on the average if, for every graph in Π, the expected distance from

Π of a uniformly-sampled subgraph of any fixed order s′ ≥ s is less than δ.

3.2.2 Fixed-order subgraph distributions of graphs

Definition 3.2.8. Given a graph G, consider the graph induced by a uniformly sampled
subset of s vertices. We denote the distribution of this induced subgraph by $\mathcal{D}^s_G$, the
order-s subgraph distribution of G; $\mathcal{D}^s_G(G')$ is the relative frequency of a subgraph G′
of order s in G.

Note. In [FN07b], this distribution is called the graph’s q-statistic.

Definition 3.2.9. Let $\mathcal{G}_s$ denote the set of all graphs of order s. The distance between
two distributions D, D′ over graphs of order s, denoted dist(D,D′), is the variation
distance between them, i.e.
\[ \mathrm{dist}(D, D') = \frac{1}{2} \sum_{G \in \mathcal{G}_s} \left| D(G) - D'(G) \right| \]
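For small graphs, the order-s subgraph distribution and the variation distance between two such distributions can be computed directly. The Python sketch below assumes that subgraphs are counted up to isomorphism, and identifies isomorphism classes by a brute-force canonical form (feasible only for small s); the function names and the graph representation as an edge list over vertices 0..n-1 are choices made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def canonical(vertices, edge_set):
    """Canonical form of an induced subgraph: the lexicographically smallest
    sorted edge list over all relabelings of its vertices by 0..s-1."""
    vs = list(vertices)
    best = None
    for perm in permutations(range(len(vs))):
        label = dict(zip(vs, perm))
        form = tuple(sorted(
            tuple(sorted((label[a], label[b])))
            for a, b in combinations(vs, 2)
            if frozenset((a, b)) in edge_set
        ))
        if best is None or form < best:
            best = form
    return best


def subgraph_distribution(n, edges, s):
    """Order-s subgraph distribution: relative frequency of each isomorphism
    class among the subgraphs induced by all s-subsets of the vertices."""
    edge_set = {frozenset(e) for e in edges}
    counts = Counter(canonical(S, edge_set) for S in combinations(range(n), s))
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


def variation_distance(d1, d2):
    """Half the sum of absolute differences between two distributions."""
    keys = set(d1) | set(d2)
    return sum(abs(d1.get(g, 0) - d2.get(g, 0)) for g in keys) / 2
```

For example, with s = 3 the path on 4 vertices induces a single edge or a 2-edge path with probability 1/2 each, while the 4-cycle always induces a 2-edge path, so the distance between their order-3 subgraph distributions is 1/2.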

The distance between two graphs’ distributions of order-s subgraphs cannot exceed

their relative distance as graphs by more than a factor depending on s:

Lemma 3.2.10. If two graphs G, H (of order n ≥ s) are $\delta \binom{s}{2}^{-1}$-close, then their
order-s subgraph distributions are δ-close, i.e. $\mathrm{dist}(\mathcal{D}^s_G, \mathcal{D}^s_H) \le \delta$.

Proof. Let φ : V (G)→ V (H) be a bijection achieving the minimum of the number
of edge discrepancies. The graphs’ being $\delta \binom{s}{2}^{-1}$-close means that there are at most
$\delta \binom{s}{2}^{-1} \cdot \binom{n}{2}$ such discrepancies. Now consider a uniformly-sampled set of s vertices in

V (G), and the subgraph they induce in G and (through φ) in H. Every pair of vertices

in the subgraph is uniformly distributed among the pairs of vertices of G or of H, so

the probability of having any discrepant edges between these two subgraphs under φ is

at most δ. When we condition on the sample not containing any vertex pair discrepant

under φ, the distributions of such an order-s subgraph of G and of H become identical;

the variation distance between the unconditioned distributions cannot, therefore, exceed

δ.

Another feature of the order-s subgraph distribution is that it does not change

overmuch when taking the blowup of a graph.


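Lemma 3.2.11. Let δ > 0, let G be a graph of order $n \ge \frac{2}{\delta}\binom{s}{2}$, let G′ be a random
blowup of G to order n′ > n, and let $\mathcal{H} \subseteq \mathcal{G}_s$. Then
\[ \left| \mathrm{Ex}_{G'}\!\left[ \Pr_{H \sim \mathcal{D}^s_{G'}}[H \in \mathcal{H}] \right] - \Pr_{H \sim \mathcal{D}^s_G}[H \in \mathcal{H}] \right| < \delta \]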

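Proof. Let $\tilde{\mathcal{D}}^s_{G'}$ denote the order-s subgraph distribution of G′, conditioned on the event
that every vertex of the subgraph is in the cluster of a different vertex of G. For any
fixed G′, we have
\[ \left| \Pr_{H \sim \tilde{\mathcal{D}}^s_{G'}}[H \in \mathcal{H}] - \Pr_{H \sim \mathcal{D}^s_{G'}}[H \in \mathcal{H}] \right| \le \mathrm{dist}\big(\tilde{\mathcal{D}}^s_{G'}, \mathcal{D}^s_{G'}\big) \]
This variation distance is bounded by the probability p that multiple vertices in H
sampled uniformly from G′ are in the same cluster of a vertex of G. For a given pair of
vertices of H, the probability of their being in the same cluster is at most the relative
size of a large cluster, which is bounded by 2/n; union-bounding over all pairs, we have,
irrespective of G′,
\[ p < \binom{s}{2} \cdot \frac{2}{n} \le \binom{s}{2} \cdot \frac{2}{\frac{2}{\delta}\binom{s}{2}} = \delta \]
The proof will be complete once we show that
\[ \mathrm{Ex}_{G'}\!\left[ \Pr_{H \sim \tilde{\mathcal{D}}^s_{G'}}[H \in \mathcal{H}] \right] = \Pr_{H \sim \mathcal{D}^s_G}[H \in \mathcal{H}] \]
For this purpose, let us analyze separately the various sets of s vertices in G (corresponding
to sets of s clusters in G′): The probability of sampling H in $\mathcal{H}$ is the probability of
sampling a set S of s vertices, such that the induced graph $H = H_S$ on these vertices
is in $\mathcal{H}$; in G′, it is the probability of sampling vertices from the appropriate sets of s
clusters. Let $\mathcal{S}_{\mathcal{H}}$ be the family of s-vertex sets S with $H_S \in \mathcal{H}$. Denote by $p_S(G')$ the
probability that a set S′ of s vertices of G′, each from a different cluster of a G vertex, is
sampled from exactly the clusters of the vertices of S. Now, by the linearity of expectation,
\[ \mathrm{Ex}_{G'}\!\left[ \Pr_{H \sim \tilde{\mathcal{D}}^s_{G'}}[H \in \mathcal{H}] \right] = \mathrm{Ex}_{G'}\!\left[ \sum_{S \in \mathcal{S}_{\mathcal{H}}} p_S(G') \right] = \sum_{S \in \mathcal{S}_{\mathcal{H}}} \mathrm{Ex}_{G'}\big[ p_S(G') \big] \]
The expectation $\mathrm{Ex}_{G'}[p_S(G')]$ is the same, by symmetry, for all s-subsets S, as the
blowup G′ is sampled uniformly. It must therefore be equal to the inverse of the number
of sets S, i.e. $\binom{n}{s}^{-1}$. Thus
\[ \mathrm{Ex}_{G'}\!\left[ \Pr_{H \sim \tilde{\mathcal{D}}^s_{G'}}[H \in \mathcal{H}] \right] = \sum_{S \in \mathcal{S}_{\mathcal{H}}} \mathrm{Ex}_{G'}\big[ p_S(G') \big] = \sum_{S \in \mathcal{S}_{\mathcal{H}}} \binom{n}{s}^{-1} = \Pr_{H \sim \mathcal{D}^s_G}[H \in \mathcal{H}] \]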


as claimed.

Note that while a single event (or a single order-s subgraph or set of s clusters) has

the same expected probability when taking a random blowup, in specific blowups the

probability of an event or a set of clusters may very well be quite different, even for

n ≫ s, as one may choose to have, say, the higher-degree vertices have bigger clusters,

and the lower-degree vertices have smaller clusters. The following proposition gives a

deterministic bound on the distance between the subgraph distributions using both the

order of the pre-blowup graph n and the ‘imbalance’ of the blowup:

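Proposition 3.2.12. Let G be a graph of order n ≥ s and G′ a blowup of G to order
n′ ≥ n, and let k = n′ (mod n). If n divides n′, then
\[ \mathrm{dist}\big(\mathcal{D}^s_{G'}, \mathcal{D}^s_G\big) < \binom{s}{2} \cdot \frac{1}{n} \]
and for any n′ ≥ n it holds that
\[ \mathrm{dist}\big(\mathcal{D}^s_{G'}, \mathcal{D}^s_G\big) < \binom{s}{2} \cdot \frac{1}{n} + s \cdot \frac{\min\{k, n-k\}}{n'} \le \binom{s}{2} \cdot \frac{1}{n} + s \cdot \frac{n}{2n'} \]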

Proof. Let us first analyze the case of the blowup G′ being exactly-balanced, i.e. n′ = n · t
for some t ∈ N. Consider a sample of an s-vertex subgraph of G′. Conditioning on
the event of every vertex being sampled from the cluster of a different vertex of G, the
distribution of order-s subgraphs of G′ is exactly $\mathcal{D}^s_G$. Thus the unconditioned distance
$\mathrm{dist}(\mathcal{D}^s_{G'}, \mathcal{D}^s_G)$ is at most the probability of sampling at least two of the s vertices from
the same cluster. Since G′ is an exactly-balanced blowup, this probability is less than
1/n for a single pair of vertices. Applying a union bound over the $\binom{s}{2}$ pairs of vertices
yields $\mathrm{dist}(\mathcal{D}^s_{G'}, \mathcal{D}^s_G) < \frac{1}{n}\binom{s}{2}$.

In the general case, G′ is not necessarily exactly-balanced. However, let us choose
one vertex from each of the n′ (mod n) larger clusters to form a set U. The subgraph
of G′ induced by V (G′) \ U is an exactly-balanced blowup of G; and with probability at
least $1 - s \cdot \frac{k}{n'}$, a sample of s vertices from V (G′) is in fact sampled from V (G′) \ U only,
conditioning on which event the above distance bound holds. Alternatively, think of an
exactly-balanced blowup G′′ of G, to order n′ + n − k. The exactly-balanced distance bound
holds for G′′, but when conditioning on the event of no vertices being sampled out of
the n − k additional vertices in G′′, it has the same order-s subgraph distribution as G′;
this event’s probability is at least $1 - s \cdot \frac{n-k}{n'}$.

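In the general case, therefore, we have
\[ \mathrm{dist}\big(\mathcal{D}^s_{G'}, \mathcal{D}^s_G\big) < \min\left\{ \frac{1}{n}\binom{s}{2} + s \cdot \frac{k}{n'},\ \frac{1}{n}\binom{s}{2} + s \cdot \frac{n-k}{n'} \right\} \]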

as claimed.
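To get a feeling for the two terms in the bound of Proposition 3.2.12, it can be evaluated exactly for concrete parameters. The helper below is a hypothetical name introduced for this example; it returns the middle expression of the proposition, C(s,2)/n + s·min{k, n−k}/n′, as an exact fraction.

```python
from fractions import Fraction
from math import comb


def blowup_distribution_bound(n, n_prime, s):
    """Evaluate C(s,2)/n + s * min(k, n-k)/n' with k = n' mod n.

    When n divides n' we have k = 0, and only the first term remains.
    """
    k = n_prime % n
    return Fraction(comb(s, 2), n) + s * Fraction(min(k, n - k), n_prime)
```

For n = 100 and s = 3, an exactly-balanced blowup to order 300 gives the bound 3/100, whereas a blowup to order 250 (the most imbalanced case, k = n/2) gives 63/100; the imbalance term shrinks as n′ grows relative to n.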


3.3 Overview of results

We first state our main result in a simplified manner, for motivation and clarity:

Theorem 3.1. If a hereditary, inflatable graph property has a test making q(ε) queries,

regardless of the size of the input graph, then it has a strongly canonical test (specifically,
a natural test) making O(q(ε)⁴) queries.

We will in fact prove a mildly stronger version, with the above being a special case:

Theorem 3.1 (exact version). Let Π be a graph property that has a test with queries
involving at most s(ε) distinct vertices, regardless of the size of the input graph, and let
$s_1 = 12\binom{31s}{2}$. If Π is $\big(s_1, \frac{1}{6}\binom{s_1}{2}^{-1}\big)$-hereditary on the average and $(s_1, s_1^{-1})$-inflatable
on the average, then it has a strongly canonical test whose queried subgraph order is
$s_1 = O(s(\varepsilon)^2)$.
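The quadratic dependence of s1 on s can be made explicit with a short computation. The sketch below evaluates s1 = 12·C(31s, 2) for a given s; the function name is hypothetical.

```python
from math import comb


def meta_test_sample_order(s):
    """Return s1 = 12 * C(s0, 2) with s0 = 31 * s, as in the exact version
    of Theorem 3.1."""
    s0 = 31 * s
    return 12 * comb(s0, 2)
```

Expanding the binomial coefficient gives s1 = 6 · 31s · (31s − 1) = 5766s² − 186s, so for example s = 1 yields s1 = 5580 and s = 2 yields s1 = 22692.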

Note. This theorem should also hold for properties with weaker inflatability (a
higher threshold value than stated above for ε-inflatability on the average), with some
modifications of our proof, and with a worse dependence of the queried subgraph order
on s.

We also prove a weak converse of Theorem 3.1:

Theorem 3.2. If a graph property Π has a natural (not necessarily canonical) test with
queries involving s(ε) distinct vertices, then for every ε′ > ε, Π is $(s_h, \varepsilon')$-hereditary
on the average and $(s_i, \varepsilon')$-inflatable on the average, for $s_h = O\big(s \cdot \log\frac{1}{\varepsilon' - \varepsilon}\big)$ and
$s_i = O\big(s^2 \cdot (\varepsilon' - \varepsilon)^{-1} \log^2\frac{1}{\varepsilon' - \varepsilon}\big)$ respectively (with the coefficients in $s_h$ and $s_i$ being independent
of the specific property Π).

Let us now recall the proposition from Goldreich and Trevisan discussed in the

introduction:

Proposition ([GT03, proposition D.2], corrected as per [GT05]). Let Π be a hereditary
graph property, with a natural test making q(ε) queries. Then Π has a perfectly
canonical (one-sided) test with queried subgraph order O(q(ε)).

Originally, this proposition was stated without requiring that the test be natural (merely

that the number of queries be independent of the order of the input graph). Combining
this corrected, qualified version with Theorem 3.1, one obtains:

Corollary 3.3. Let Π be a hereditary inflatable graph property, with a test making q(ε)
queries. Then Π has a perfectly canonical (one-sided) test with queried subgraph order
O(q(ε)²).

We use the contrapositive of this corollary to provide a more straightforward proof of

[AS06, Theorem 1], even improving it slightly for the case of triangles (using the recent

result in [Elk11]):


Theorem 3.4. Any ε-test — natural or otherwise, with one-sided or two-sided error

— for the property of being triangle-free makes Ω

((1/ε)c·(log(1/ε))

1+ 2ln(2)·log(1/ε)

)queries,

for some global constant c.

(The lower bound in [AS06, Theorem 1] is $(c/\varepsilon)^{c \cdot \ln(c/\varepsilon)}$.)

Returning to [GT03, proposition D.2], while for hereditary inflatable properties we

have established it with a power-of-four penalty on the number of queries, for properties

with one-sided tests it can be shown to hold as stated:

Proposition 3.3.1. If a hereditary inflatable property Π has a one-sided (not necessarily
natural) test making q(ε) queries, then Π has a perfectly canonical test with queried
subgraph order at most 2q(ε).

Finally, we place the notion of inflatability in the context of proximity-oblivious
testing (see the exposition of this concept in Section 3.8), and prove the following partial
characterization:

Proposition 3.3.2. Let Π be an inflatable hereditary property. Π has a constant-query,

proximity-oblivious test if and only if there exists a constant s such that, for n ≥ s, Πn

consists exactly of those graphs of order n, which are free of order-s graphs outside of

Πs.

3.4 Naturalizing tests

In this section we prove Theorem 3.1.

Let Π be a property meeting the conditions in the statement of the theorem. As Π

has a test with queries involving at most s(ε) vertices (independently of n), by [GT03,

Theorem 2] it has a canonical test, querying a uniformly-sampled subgraph of order at

most 9s, in its entirety. As noted after the citation of this theorem, in Subsection 3.2.1

above, we may assume that the canonical test’s probability of error is at most 1/36 rather
than 1/3, at the cost of increasing the queried subgraph order to s0 = 31s.

One may think of the existence of such a canonical test as meaning that the

membership of a graph in Π is essentially determined by its distribution of (induced)

subgraphs of order s0. This being the case, let us consider a (canonical) ‘meta-test’ for

Π, which estimates whether the subgraph distribution leads to acceptance (of the input

graph G of order n). This meta-test is listed as Algorithm 3.1.

Note. The order s1 of the larger subgraph used for this estimate is chosen so as to ensure

the stability of the distribution under blowups — a consideration which will become

relevant later in this section. On the other hand, s1 is not high enough to properly

estimate the distribution, i.e. estimate the frequency of specific order-s0 subgraphs

(there are $\exp(\Omega(s_0^2))$ of them) in G.


Algorithm 3.1 A Meta-Test for Π

1: Uniformly query a subgraph Gsample of order $s_1 = 12\binom{s_0}{2} = 12\binom{31s(\varepsilon)}{2}$.

2: If at least a 1/6-fraction of the order-s0 subgraphs G′ of Gsample are such that the (canonical) s0-test accepts G by sample G′, accept. Otherwise reject.

Lemma 3.4.1. Algorithm 3.1 is a valid test for property Π, with probability of failure

at most 1/6 .

Proof. Suppose the input graph G either satisfies Π or is ε-far from satisfying Π. Let G′ be one of the $\binom{s_1}{s_0}$ order-s0 subgraphs of Gsample. Let XG′ be the indicator for the s0-test erring (that is, rejecting G in case G satisfies Π, or accepting G in case G is far from Π) by sample G′. Every order-s0 subgraph of Gsample is in fact uniformly sampled from the input graph, thus Ex[XG′] is the probability of the s0-test erring, which is at most 1/36. The expected fraction of order-s0 subgraphs of Gsample by which the s0-test errs is therefore also at most 1/36. Considering the meta-test’s behavior again, it can only err if at least a 1/6-fraction of the subgraphs of Gsample cause the s0-test to err. By Markov’s inequality, the probability of this occurring is at most (1/36)/(1/6) = 1/6.

Let us now modify Algorithm 3.1 to reject samples which are themselves not in the

property at order s1; the result is listed as Algorithm 3.2.

Algorithm 3.2 Modified Meta-Test for Π

1: Uniformly query a subgraph Gsample of order $s_1 = 12\binom{s_0}{2} = 12\binom{31s(\varepsilon)}{2}$.

2: If Gsample is not in Π, reject.

3: If at least a 1/6-fraction of the order-s0 subgraphs G′ of Gsample are such that the s0-test accepts G by sample G′, then accept. Otherwise reject.
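Algorithms 3.1 and 3.2 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the graph is given as an adjacency dictionary, and the callbacks `s0_test_accepts` and `in_property` are hypothetical stand-ins for the canonical s0-test and for membership in Π at order s1. The toy values used in the tests (s0 = 3, s1 = 5) are far smaller than the orders the algorithms prescribe.

```python
import random
from itertools import combinations


def induced_subgraph(adj, vertices):
    """Return the induced subgraph as (sorted vertex tuple, frozenset of edges)."""
    vs = tuple(sorted(vertices))
    edges = frozenset((u, v) for u, v in combinations(vs, 2) if v in adj[u])
    return vs, edges


def meta_test(adj, s0, s1, s0_test_accepts, in_property=None, rng=random):
    """Sketch of Algorithm 3.1; passing in_property adds the check of Algorithm 3.2.

    adj             -- dict mapping each vertex to the set of its neighbours
    s0_test_accepts -- hypothetical callback: does the canonical s0-test
                       accept when shown this order-s0 induced subgraph?
    in_property     -- hypothetical callback: is this order-s1 graph in Pi?
    """
    sample = rng.sample(sorted(adj), s1)  # uniform sample of s1 vertices
    if in_property is not None and not in_property(induced_subgraph(adj, sample)):
        return False  # the sample itself is not in Pi: reject
    total = accepted = 0
    for sub in combinations(sample, s0):  # every order-s0 subgraph of the sample
        total += 1
        accepted += bool(s0_test_accepts(induced_subgraph(adj, sub)))
    return 6 * accepted >= total  # accept iff at least a 1/6-fraction accept


def triangle_free(graph):
    """Illustrative oracle (an assumption): the graph contains no triangle."""
    vs, edges = graph
    return not any({(a, b), (a, c), (b, c)} <= edges for a, b, c in combinations(vs, 3))
```

With triangle-freeness as the illustrative property, the same oracle can play both roles, since an order-3 sample is accepted exactly when it is not a triangle.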

Lemma 3.4.2. Algorithm 3.2 is a valid test for property Π.

Proof. The additional check only increases the probability of rejection of any input graph, so it does not adversely affect the soundness of the modified test (that is, a graph ε-far from Π is still rejected by Algorithm 3.2 with probability at least 5/6 ≥ 2/3).

As for the modified test’s completeness, we recall that Π is $\left(s_1, \frac{1}{6}\binom{s_1}{2}^{-1}\right)$-hereditary on the average. This implies that, for an input graph in Π, the average distance of subgraphs of order s1 from Π is at most $\frac{1}{6}\binom{s_1}{2}^{-1}$; as each order-s1 subgraph not in Π is at least $\binom{s_1}{2}^{-1}$-far from Π, the fraction of order-s1 subgraphs of G which aren’t in Π is at most 1/6. Regardless of these, at most a 1/6-fraction of the order-s1 subgraphs of a satisfying graph cause Algorithm 3.1 to reject. Union bounding over these two sets of subgraphs causing rejection, we find that the probability of the modified meta-test rejecting a graph in Π is at most 2 · 1/6 = 1/3.

Now, Algorithm 3.2 is not necessarily natural, receiving as input the order n of the

graph G being tested, and passing this value to the original s0-test; but if Algorithm 3.2


were somehow also natural, this would complete the proof of Theorem 3.1, as the test

otherwise meets the requirements. Since Algorithm 3.2 is canonical, its naturality means

being strongly canonical: accepting the same set of sampled subgraphs for any input

graph order. Interestingly enough, our modification has indeed made this the case:

Lemma 3.4.3. Let H be a graph of order s1 by which sample Algorithm 3.2 accepts for at least one input graph order n. Then Algorithm 3.2 cannot reject by sample H for any input graph order n′ ≥ s1.

Proof. Assume on the contrary that Algorithm 3.2 rejects by sample H for some n′ ≥ s1.

We first note that Algorithm 3.2 does not reject by H at order n′ on account of H not

being in Π (as samples which aren’t in Π are rejected at all input orders). We will show

that this invariably implies that the original test is incomplete.

Let Π′n′ denote the set of order-s0 subgraphs by which sample the s0-test accepts

an input graph G at order n′. Our assumption is that the probability of the s0-test

accepting a subgraph of H is less than 1/6, or in terms of the subgraph distribution,

\[ \Pr_{H_s \sim D^{s_0}_{H}}\left[\Pi'_{n'}\right] < \frac{1}{6}. \]

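Now, consider a random blowup H′ of H to order n′. Π is $\left(s_1, \frac{1}{12}\binom{s_0}{2}^{-1}\right)$-inflatable on the average, and H is in Π, so

\[ \mathop{\mathrm{Ex}}_{H'}\left[\mathrm{dist}(H', \Pi)\right] < \frac{1}{12}\binom{s_0}{2}^{-1} \]

and by Markov’s inequality,

\[ \Pr_{H'}\left[\mathrm{dist}(H', \Pi) \ge \frac{1}{6}\binom{s_0}{2}^{-1}\right] < \frac{1}{2}. \]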

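Also, let δ = 1/6. Since $s_1 \ge \frac{2}{\delta}\binom{s_0}{2}$, we may apply Lemma 3.2.11 (substituting H and H′ for G and G′, s0 for s, s1 for n) for the event of the s0-test accepting at order n′:

\[
\begin{aligned}
\mathop{\mathrm{Ex}}_{H'}\left[\Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right]\right]
&\le \Pr_{H_s \sim D^{s_0}_{H}}\left[H_s \in \Pi'_{n'}\right]
 + \left|\mathop{\mathrm{Ex}}_{H'}\left[\Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right]\right] - \Pr_{H_s \sim D^{s_0}_{H}}\left[H_s \in \Pi'_{n'}\right]\right| \\
&< \Pr_{H_s \sim D^{s_0}_{H}}\left[H_s \in \Pi'_{n'}\right] + \delta
 < \frac{1}{6} + \frac{1}{6} = \frac{1}{3}
\end{aligned}
\]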

and again by Markov’s inequality

\[ \Pr_{H'}\left[\Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right] \ge \frac{2}{3}\right] < \frac{1}{2}. \]

Combining these two facts, we conclude that with positive probability, H′ is a graph which is both very close to Π and is accepted by the s0-test with probability at most 2/3.


Now, let $\tilde{H}'$ be a graph in Π at distance at most $\frac{1}{6}\binom{s_0}{2}^{-1}$ from H′. By Lemma 3.2.10, these two graphs’ order-s0 subgraph distributions are 1/6-close, implying that

\[ \left|\Pr_{H_s \sim D^{s_0}_{\tilde{H}'}}\left[H_s \in \Pi'_{n'}\right] - \Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right]\right| < \frac{1}{6}. \]

We now use the triangle inequality to bound the probability of the s0-test accepting $\tilde{H}'$:

\[
\Pr_{H_s \sim D^{s_0}_{\tilde{H}'}}\left[H_s \in \Pi'_{n'}\right]
\le \Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right]
 + \left|\Pr_{H_s \sim D^{s_0}_{H'}}\left[H_s \in \Pi'_{n'}\right] - \Pr_{H_s \sim D^{s_0}_{\tilde{H}'}}\left[H_s \in \Pi'_{n'}\right]\right|
< \frac{2}{3} + \frac{1}{6} = \frac{5}{6}
\]

This contradicts the original test’s probability of error: it must accept $\tilde{H}'$, a graph in Π, with probability at least 1 − 1/36 > 5/6. It can therefore not be the case that Algorithm 3.2 rejects H at order n′.
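The constants in this proof are easy to lose track of, so the following sketch re-derives them with exact rational arithmetic. The variable names are ours; each value is the (strict) upper bound used at the corresponding step of the argument.

```python
from fractions import Fraction as F

s0_test_error = F(1, 36)   # amplified error of the canonical s0-test
delta = F(1, 6)            # slack granted by Lemma 3.2.11
accept_on_H = F(1, 6)      # assumed bound: acceptance over subgraphs of H

# Expected acceptance over subgraphs of a random blowup H' stays below 1/3.
expected_accept_on_blowup = accept_on_H + delta

# Markov: Pr[acceptance on H' >= 2/3] is below (1/3) / (2/3) = 1/2.
pr_accept_at_least_two_thirds = expected_accept_on_blowup / F(2, 3)

# Markov: Pr[dist(H', Pi) >= (1/6) / C(s0, 2)] is below (1/12) / (1/6) = 1/2;
# the common factor 1 / C(s0, 2) cancels.
pr_blowup_far = F(1, 12) / F(1, 6)

# Triangle inequality for the nearby graph in Pi: acceptance below 2/3 + 1/6.
accept_on_close_graph = F(2, 3) + F(1, 6)

# The s0-test must accept every graph in Pi with at least this probability.
required_accept = 1 - s0_test_error
```

Both bad events have probability strictly below 1/2, so some blowup avoids both, and the resulting acceptance bound of 5/6 falls short of the required 35/36.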

Proof of Theorem 3.1. Given a property Π satisfying the conditions, we have devised Algorithm 3.2: This is a canonical test for Π, with queried subgraph order $s_1 = 12\binom{31s}{2}$; by Lemma 3.4.3, it accepts and rejects the same set of queried subgraphs for all graph orders n ≥ s1; that is, it is a natural test.
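To get a feel for the orders involved, the small helper below (its name is ours) evaluates s0 = 31s and s1 = 12·C(s0, 2) for a given bound s on the number of vertices touched by the original test.

```python
from math import comb


def meta_test_orders(s):
    """Return (s0, s1): the amplified canonical order and the meta-test sample order."""
    s0 = 31 * s              # canonical test amplified to error at most 1/36
    s1 = 12 * comb(s0, 2)    # sample order used by Algorithms 3.1 and 3.2
    return s0, s1
```

Since s1 grows quadratically in s and the natural test queries all pairs among s1 vertices, the query count grows like the fourth power of s.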

3.5 Lower bounds for triangle-freeness testing

As discussed earlier, part of our interest in the naturalization of tests is obtaining

lower bounds on testing the property of triangle-freeness (or freeness of other induced

substructures), through lower bounds on one-sided testing or other more fundamental

results.

The current state of the art in terms of an explicitly-stated lower bound is:

Theorem ([AS06, Theorem 1]). The query complexity of any ε-test (natural or otherwise, with one-sided or two-sided error) for the property of being triangle-free is at least $(c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$, for some global constant c.

Now, consider the contrapositive of Corollary 3.3:

Corollary. If a hereditary inflatable property has no perfectly canonical test with queried

subgraph order q′(ε), then it has no test whatsoever (natural or otherwise, with one-sided

or two-sided error) making q(ε) queries such that $q(\varepsilon)^2 = o(q'(\varepsilon))$.

[AS06, Theorem 1] can be obtained by combining the one-sided lower bound for testing

triangles of [Alo02] with Corollary 3.3, without requiring the careful use of Yao’s method

in [AS06].


The proof of the one-sided testing lower bound, in [Alo02], is based on a construction

of a large subset of [n], which is free of arithmetic progressions (i.e. tuples x, x +

d, x+ 2d, x+ 3d, . . .). The specific construction used in [Alo02] is that of Behrend, in

[Beh46]. Recently, after 60 years with no progress, an improvement was made over this

construction by Michael Elkin in [Elk11] (with a simpler proof suggested by Green and

Wolf in [GW10]):

Theorem. For every natural number n, there exists a subset Xn ⊆ [n], with

\[ |X_n| = \Omega\left(n \cdot \frac{\sqrt[4]{\log(n)}}{2^{2\sqrt{2}\sqrt{\log(n)}}}\right), \]

which contains no 3-term arithmetic progressions.

Now, this new construction can be translated into a lower bound on testing triangle-

freeness either using our methods, or using the Alon-Shapira Yao-style argument from

[AS06], so that an improved two-sided lower bound can be considered to already be

established as the state of the art. However, as it has not been explicitly stated in the

literature, we sketch the proof below.

Lemma 3.5.1 (implicit in [Alo02] and [AS04b]). Let m(ε) be the highest integer with a subset Xm ⊆ [m] of size εm which contains no non-trivial solutions to the equation $x_1 + \dots + x_{k-1} = (k-1)\cdot x_k$ (for an odd k). Any one-sided-error test for the property of a graph being k-cycle-free makes $\Omega\left((m(\varepsilon))^{k-2}\right)$ queries.

Proof Sketch. One constructs a k-partite graph of size Θ(m(ε)), and connects vertex i in each of the first k − 1 parts to each vertex in the set {i + x | x ∈ Xm} in the next part, for every i. One then connects the vertex i of the kth part with each vertex in the set {i − kx | x ∈ Xm} in the first part, for every i. It can be shown that this graph has Θ(m|X|) k-cycles, all distinct, as two k-cycles can only share an edge if X has a k-term arithmetic progression. As |X| > εm, the graph is far from being k-cycle-free.

One then blows up the graph by a factor of Θ(n/m). The resulting graph can be shown to be far from being k-cycle-free, but only has $\Theta\left((n/m)^k \cdot m \cdot |X|\right) = O\left(n^k/m^{k-2}\right)$ cycles. Now, a one-sided test making $o\left((m(\varepsilon))^{k-2}\right)$ queries will not find any of these cycles in the blown-up graph, and will have to accept (as its queries can be completed into a k-cycle-free graph).
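For the triangle case (k = 3) the first step of this construction is small enough to check by brute force. The sketch below is an illustration rather than the exact construction of [Alo02]: the part sizes m, 2m, 3m and the orientation of the closing edges (from vertex i of the first part to vertex i + 2x of the third) are our own indexing assumptions, chosen so that no index wraps around. A triangle closes exactly when x1 + x2 = 2·x3.

```python
def is_3ap_free(xs):
    """True if xs has no non-trivial solution to a + b = 2c, i.e. no 3-term progression."""
    s = set(xs)
    return not any(a != b and (a + b) % 2 == 0 and (a + b) // 2 in s
                   for a in s for b in s)


def build_tripartite(m, xs):
    """Edge sets between parts A = [0, m), B = [0, 2m), C = [0, 3m); xs is a subset of [1, m]."""
    ab = {(i, i + x) for i in range(m) for x in xs}
    bc = {(j, j + x) for j in range(2 * m) for x in xs}
    ac = {(i, i + 2 * x) for i in range(m) for x in xs}
    return ab, bc, ac


def count_triangles(ab, bc, ac):
    """Count triangles with one vertex in each of A, B and C."""
    bc_by_b = {}
    for j, l in bc:
        bc_by_b.setdefault(j, []).append(l)
    return sum(1 for i, j in ab for l in bc_by_b.get(j, ()) if (i, l) in ac)
```

When the set is progression-free, only the trivial solutions x1 = x2 = x3 close a triangle, so the count is exactly m·|X|; a set containing a progression yields extra triangles.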

(This argument, with some modification and for the case of 4-cycles in digraphs, is made

in detail in Chapter 5.)

Observation 3.5.2 ([Alo02]). If a set of integers is free of 3-term arithmetic progressions, then it is free of non-trivial solutions to the equation x1 + x2 = (3 − 1)x3.

Combining Lemma 3.5.1 and Observation 3.5.2 with the contrapositive form of

Corollary 3.3, we have, for the case of graphs:

Corollary 3.5. Let m(ε) be the highest integer with a subset Xm ⊆ [m] of size εm

which is free of 3-term arithmetic progressions. Any test for the property of a graph

being triangle-free makes $\Omega\left((m(\varepsilon))^{1/4}\right)$ queries.


Now, the progression-free set used in [Alo02] has size

\[ |X_n| > \frac{n}{\exp\left(10\sqrt{\ln(n)\ln(k)}\right)} \]

which implies $m(\varepsilon) \ge (c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$, for an appropriate global constant c. The Elkin construction has size

\[ |X_n| = \Omega\left(n \cdot \frac{\sqrt[4]{\log(n)}}{2^{2\sqrt{2}\sqrt{\log(n)}}}\right) \]

with log(n) being the base-2 logarithm, implying that $m(\varepsilon) \ge \exp\left(c' \cdot \log^{2p}(1/\varepsilon)\right)$, for an appropriate global constant c′, and with $p = 1 + \frac{1}{\ln(2)\cdot\log(1/\varepsilon)}$ (we omit the calculation).
This proves Theorem 3.4.
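As a rough sketch of how a bound of the first kind arises from the set size used in [Alo02] (ignoring rounding, and assuming 0 < ε < 1 and k ≥ 2), note that a set of size at least εm is guaranteed whenever the denominator is at most 1/ε:

```latex
\[
\exp\!\left(10\sqrt{\ln(m)\ln(k)}\right) \le \frac{1}{\varepsilon}
\;\iff\;
\sqrt{\ln(m)\ln(k)} \le \frac{\ln(1/\varepsilon)}{10}
\;\iff\;
m \le \exp\!\left(\frac{\ln^2(1/\varepsilon)}{100\ln(k)}\right)
  = \left(\frac{1}{\varepsilon}\right)^{\frac{\ln(1/\varepsilon)}{100\ln(k)}} .
\]
```

Every m in this range admits a suitable set, so m(ε) is at least quasi-polynomial in 1/ε, which is the shape of the bound $(c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$.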

Note. A generalization of Elkin’s result by Kevin O’Bryant to k-progression-free sets in [Obr11] hints at possible similar lower bounds on testing induced k-cycle freeness. However, the argument in Observation 3.5.2 does not apply to cycles of length over 3 (e.g. 1 + 3 + 5 + 7 = (5 − 1) · 4 is a solution of the 5-term linear equation, but the set {1, 3, 4, 5, 7} has no 5-term progression); one would have to avoid cycles due to such solutions in an alternative construction.
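The counterexample in this note can be confirmed mechanically. The helper `has_progression` below is ours and performs a plain brute-force search over a non-empty set of positive integers.

```python
def has_progression(nums, length):
    """True if nums contains an arithmetic progression with `length` terms (difference >= 1)."""
    s = set(nums)
    top = max(s)
    return any(all(a + j * d in s for j in range(length))
               for a in s for d in range(1, top))


xs = [1, 3, 4, 5, 7]
solves_equation = (1 + 3 + 5 + 7 == (5 - 1) * 4)
```

The set does contain shorter progressions, such as 1, 3, 5, 7, but none with five terms.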

3.6 One-sided error and natural tests

Observation 3.6.1. If a hereditary property has a strongly canonical test, then this

test must be one-sided.

Proof. If the test for the hereditary property Π (deterministically) rejects any sampled

subgraph G′ of a graph G ∈ Π, the test also rejects G′ when it is the entire graph.

But when G′ is the entire graph, it will always be the sampled subgraph, i.e. the test

rejects G′ with probability 1. G′ can therefore not be in Π — a contradiction to Π

being hereditary.

The implication in Observation 3.6.1 can be reversed, in a way — weak approximate

heredity as a consequence of one-sided testability:

Lemma 3.6.2. If a property Π has a one-sided strongly canonical test with queried

subgraph order s(ε) for some ε, then Π is (s(ε), ε)-hereditary.

Proof. Let G ∈ Πn for n ≥ s(ε), and let G′ be a subgraph of G of order at least s(ε).

If G′ is ε-far from Π, then it must have an order-s subgraph G′′ by which sample

the test rejects G′. But the test also rejects G by sample G′′, in contradiction to its

one-sidedness.

Note. This lemma is somewhat similar to the second direction of [AS08a, Theorem 2],

in which the existence of a one-sided natural test is shown to imply ‘semi-heredity’.


One would hope to somehow get rid of the dependence on ε and find conditions under

which the property is hereditary, at least down to some n0; this becomes possible if

the test is proximity-oblivious, but note that if a property Π has a natural proximity-

oblivious test, then Π is simply the property of being free of subgraphs by which

this test rejects (at least for n ≥ s; see discussion in Section 3.8, and specifically

Proposition 3.3.2).

In the proof of Lemma 3.6.2, we used the one-sidedness of the test to obtain

deterministic approximate heredity; Section 3.7 below deals with the general, two-

sided case, and establishes approximate heredity only on the average. Deterministic

approximate heredity may indeed require the test to be one-sided. For example, the

property Πhalf, containing those graphs with at most $\frac{1}{2}\binom{n}{2}$ edges, is (O( ), δ)-hereditary on the average, has a two-sided natural test (in fact, its query complexity can be shown to be $O(1/\varepsilon^2)$), but it is not $\left(s, \frac{1}{2} - \delta\right)$-hereditary for any s and δ > 0 (as there are

satisfying graphs with arbitrarily large complete subgraphs).

Returning again to the direction of Theorem 3.1, let us follow a different line of argument from the one used to prove the theorem, this time for the case of one-sided tests.

Lemma 3.6.3. Let Π be an inflatable property. A one-sided canonical test for Π can

only reject an input graph when it samples a subgraph which is not itself in Π.

Proof. Suppose that, for some input graph G of order n, the test samples a subgraph

G′ ∈ Π. Since Π is inflatable, there exists a blowup G′′ of G′ to order n such that

G′′ ∈ Π. Now, G′ is an induced subgraph of G′′, so it is possible for the test to sample

G′ when G′′ is the input graph. Since the test is one-sided, it can not, therefore, reject

an input graph of order n with G′ as the sample.

Proof of Proposition 3.3.1. By [GT03, Theorem 2], Π has a canonical test with queried subgraph order s(ε) ≤ 2q(ε), which is also one-sided. By Lemma 3.6.3, this

test only rejects sampled subgraphs which are not themselves in Π. Now suppose we

modify the test so as to reject all sampled subgraphs not in Π. As we are only rejecting

additional subgraphs, the test’s soundness can only improve. As for its completeness,

we note that since Π is hereditary, no graph in Π has any subgraphs outside of Π, so the

test still accepts graphs in Π with probability 1. The resulting test is indeed perfectly

canonical.
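The shape of the resulting test is simple enough to sketch in code. The following is an illustration only: the adjacency-dictionary representation and the `in_property` callback are assumptions of ours, and triangle-freeness serves as a stand-in hereditary property.

```python
import random
from itertools import combinations


def triangle_free(vertices, edges):
    """Illustrative membership oracle: no triangle among the given vertices."""
    return not any({(a, b), (a, c), (b, c)} <= edges
                   for a, b, c in combinations(vertices, 3))


def perfectly_canonical_test(adj, s, in_property, rng=random):
    """Sample s vertices uniformly, query the whole induced subgraph,
    and accept exactly when that subgraph is itself in the property."""
    sample = sorted(rng.sample(sorted(adj), min(s, len(adj))))
    edges = frozenset((u, v) for u, v in combinations(sample, 2) if v in adj[u])
    return in_property(sample, edges)
```

Because the property is hereditary, a graph in the property never produces a rejected sample, which is the one-sidedness used in the proof.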

3.7 Inflatability and heredity of naturally-testable properties

Lemma 3.7.1. If a property Π has a strongly canonical test with queried subgraph order s(ε), with probability of error δ ≤ 1/3, then Π is $\left(\frac{2}{\delta}\binom{s}{2}, \varepsilon + 3\delta\right)$-inflatable on the average.


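Proof. Let G be a graph of order n satisfying Π, for $n \ge \frac{2}{\delta}\binom{s}{2}$, and let G′ be a random blowup of G to some higher order. Let Π′ be as in Definition 3.2.1, that is, the set of order-s subgraphs by which sample the test accepts an input graph. By Lemma 3.2.11, we have

\[ \left|\mathop{\mathrm{Ex}}_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \notin \Pi'\right]\right] - \Pr_{H \sim D^{s}_{G}}\left[H \notin \Pi'\right]\right| < \delta \]

so $\mathop{\mathrm{Ex}}_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \notin \Pi'\right]\right] < 2\delta$. By Markov’s inequality

\[ \Pr_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \notin \Pi'\right] > 1 - \delta\right] \le \frac{2\delta}{1-\delta} \le 3\delta \]

Now, if G′ is rejected by the test with probability at most 1 − δ, it cannot be ε-far from Π; if it is rejected with higher probability, we can’t make any assumptions regarding its distance. Thus

\[
\mathop{\mathrm{Ex}}_{G'}\left[\mathrm{dist}(G', \Pi)\right]
< \Pr_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \notin \Pi'\right] \le 1 - \delta\right] \cdot \varepsilon
+ \Pr_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \notin \Pi'\right] > 1 - \delta\right] \cdot 1
\le \varepsilon + 3\delta
\]

meeting the requirement for approximate inflatability.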

Lemma 3.7.2. If a property Π has a strongly canonical test, with queried subgraph order s(ε), with probability of error δ ≤ 1/3, then Π is $\left(s, \varepsilon + \frac{3}{2}\delta\right)$-hereditary on the average.

Proof. Let G be a graph in Π of order at least s, let G′ be a uniformly-sampled subgraph of G of order s′ ≥ s, and let pG′ denote the probability of the test rejecting with G′ rather than G as its input graph. The expectation of pG′ is exactly the probability of the test rejecting G, which is at most δ, as the process of sampling an order-s′ subgraph, then sampling an order-s subgraph out of it, is the same as just sampling an order-s subgraph of G. We can apply Markov’s inequality and bound the probability of pG′ being too high: $\Pr_{G'}\left[p_{G'} \ge 1 - \delta\right] \le \frac{\delta}{1-\delta}$. Since the test is sound, we know that if pG′ is lower than 1 − δ, then G′ cannot be ε-far from Π; if pG′ is higher, we do not assume anything about G′’s distance from Π. Thus

\[
\mathop{\mathrm{Ex}}_{G'}\left[\mathrm{dist}(G', \Pi)\right]
\le \Pr_{G'}\left[p_{G'} < 1 - \delta\right] \cdot \varepsilon + \Pr_{G'}\left[p_{G'} \ge 1 - \delta\right] \cdot 1
\le 1 \cdot \varepsilon + \frac{\delta}{1-\delta} \cdot 1
= \varepsilon + \frac{\delta}{1-\delta}
\le \varepsilon + \frac{3}{2}\delta
\]
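The final inequality, like the bound 2δ/(1 − δ) ≤ 3δ in the proof of Lemma 3.7.1, uses only the assumption δ ≤ 1/3. Spelled out as a short worked step:

```latex
\[
\delta \le \tfrac{1}{3}
\;\Longrightarrow\;
1 - \delta \ge \tfrac{2}{3}
\;\Longrightarrow\;
\frac{\delta}{1-\delta} \le \frac{3}{2}\,\delta
\quad\text{and}\quad
\frac{2\delta}{1-\delta} \le 3\delta .
\]
```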

Proof of Theorem 3.2. Let δ = (1/3)(ε′ − ε). Our first step is the same as in the proof

of Theorem 3.1 — pre-amplifying the probability of success of the natural test and


canonicalizing it. Our modified test remains natural (thus being strongly canonical), with probability of failure at most δ, and its queried subgraph size is $s_h = O\left(s \cdot \log\left(\delta^{-1}\right)\right)$, as per the discussion of canonicalization in Subsection 3.2.1. Now, by Lemma 3.7.2, Π is $\left(s_h, \varepsilon + \frac{3}{2}\delta\right)$-hereditary on the average, and by Lemma 3.7.1, Π is $\left(\frac{2}{\delta}\binom{s_h}{2}, \varepsilon + 3\delta\right)$-inflatable on the average. This meets the claim.

3.8 Natural testability and proximity-oblivious testing

In most works regarding property testing, tests are devised based on a foreknowledge

of the proximity parameter ε: Either the test is given ε as input, or ε is fixed globally.

Goldreich and Ron explore an alternative approach in [GR09]:

Definition 3.8.1. A proximity-oblivious test for property Π with detection probability

ρ(·) is a probabilistic oracle machine, which is given the value n, as well as oracle access to a graph G of order n in the same manner as a usual test. The machine accepts a graph G ∈ Πn with probability 1, and rejects a graph G ∉ Πn with probability at least ρ(dist(G, Πn)).

Notes.

– One can obtain an ε-test in the usual sense by invoking the proximity-oblivious test

Θ(1/ρ(ε)) times.

– A proximity-oblivious test has query complexity which may depend on n, but not on

ε.
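The first note can be made concrete with a short calculation. The sketch below assumes that ρ is non-decreasing, so that every ε-far graph is rejected by a single invocation with probability at least ρ(ε); the function names are ours. Since (1 − ρ)^t ≤ exp(−ρt), taking t = ⌈ln(3)/ρ(ε)⌉ independent invocations and rejecting if any of them rejects leaves an acceptance probability of at most 1/3 for ε-far graphs.

```python
from math import ceil, log


def repetitions(rho_eps, error=1 / 3):
    """Number of independent invocations that suffices to push the
    probability of accepting an eps-far graph down to `error`."""
    if not 0 < rho_eps <= 1:
        raise ValueError("rho_eps must lie in (0, 1]")
    return ceil(log(1 / error) / rho_eps)


def far_graph_accept_bound(rho_eps, t):
    """Upper bound on the probability that all t invocations accept a far graph."""
    return (1 - rho_eps) ** t
```

Graphs in the property are accepted by every invocation, so the repetition does not affect completeness.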

In this section we concern ourselves with proximity-oblivious tests that have query complexity independent of n.

Lemma 3.8.2. If a hereditary, inflatable graph property has a proximity-oblivious test

making c queries, using s ≤ 2c sampled vertices, then it has a perfectly canonical

proximity-oblivious test with queried subgraph order s (making at most $\binom{s}{2}$ queries).

The proof of this lemma is exactly the proof of Proposition 3.3.1, which does not

make any assumptions regarding the test’s use of the value of ε, nor regarding its

probability of rejecting far graphs.

The general results of [GR09] regarding the dense graph model include a char-

acterization of the properties admitting a (not necessarily natural) constant-query

proximity-oblivious test:

Theorem ([GR09, Theorem 4.7]). A property Π has a constant-query proximity-oblivious test if and only if there exists a constant c and a sequence $F = (F_n)_{n \in \mathbb{N}}$ of sets of graphs, such that each Fn contains graphs of size at most c, and Πn is the set of order-n Fn-free graphs.


When limiting our focus to properties which we know to be naturally testable, we

can tighten the characterization:

Proof of Proposition 3.3.2. If Π is the property of being F-free for $F = (\Pi_s)^c$, then Π is proximity-oblivious testable with a constant number of queries: As established by Alon, Fischer, Krivelevich and Szegedy in [AFKS00], any graph G is either close to being F-free, or has $\delta(\varepsilon) \cdot n^s$ induced copies of a forbidden subgraph (with δ being a double-tower function of (1/ε), as this fact is established using a version of Szemerédi’s regularity lemma). In this direction, our argument is the same as in the

proof of the general characterization theorem of proximity-oblivious-testable properties

[GR09, Theorem 4.7].

The other direction follows from Lemma 3.8.2: The existence of a proximity-oblivious

test implies the existence of a perfectly canonical test querying a subgraph of order s

and rejecting if it isn’t in Πs. This test accepts, with probability 1, exactly those graphs

which are free of induced subgraphs outside Πs; as it is one-sided, this implies that Π,

at order s and above, is the set of $(\Pi_s)^c$-free graphs.

3.9 Naturalization and inflatability in other dense structures

The results of this chapter all essentially hold, albeit with different parameters, for any

class of dense structures which fits the general definition in Subsection 2.1.1 — and also

for structures mentioned there which require some trivial reduction to fit that definition,

such as matrices and tensors with no order on their coordinates in each dimension.

There is, however, a subtle point regarding the orders of structures tested: In graphs,

a test whose queries involve s(ε) vertices, when applied to a graph of order under s, can

simply query the entire graph and decide deterministically — using a number of queries

bounded by $\binom{s}{2}$. This is not generally possible in multi-partite dense structures: A test

might require more vertices than are present in one of the parts, but it cannot query

the entire graph without making a number of queries depending on other ni’s, which is

not bounded. Instead, the test may require complex behavior, different than for the

general case, to effectively test structures with some parts being small and others large.

While such behavior is worthy of independent study, we wish to make straightforward

generalizations of this chapter’s results, so we choose to ignore this setting. We will

therefore only be generalizing our results to uniform-order tests; and this choice also

motivates the limited scope of our definition of canonicality in Definition 2.2.3.

We shall not repeat the proofs made above for graphs also for the case of general

dense structure classes, but rather state the generalized results and provide proof

sketches.


3.9.1 Generalized preliminaries

For the rest of this section, we fix a class of general dense structures (as per the definition

in Subsection 2.1.1), letting k denote the number of vertex parts, t the number of edge

relations, and ri the arity of the ith edge relation. We also denote r = max{r1, . . . , rt}.

Observation 3.9.1. Under our assumptions and by Definition 2.1.10, a dense structure

of uniform order s supports up to $\sigma(s, t, k, r) = t \cdot (ks)^r$ potential hyperedges; if the

class of structures is unconstrained, and r1 = . . . = rt = r, then the structure supports

exactly this number.
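As a quick sanity check of this bound, the function below (named after the symbol in the observation) evaluates σ directly. For ordinary graphs, taken here as one part, one relation and arity two, it gives s², which indeed bounds the number of vertex pairs.

```python
from math import comb


def sigma(s, t, k, r):
    """Upper bound on potential hyperedges: t relations, k parts of s vertices, arity at most r."""
    return t * (k * s) ** r


graph_bound = sigma(10, 1, 1, 2)   # ordinary graphs on 10 vertices
graph_pairs = comb(10, 2)          # actual number of vertex pairs
```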

Theorem 3.6 (Generalization of [GT03, Theorem 2]). If a property Π has a uniform-

order test making q(ε) queries involving at most s(ε) vertices from each part of the

input structure, independently of the size of the input structure and its parts, then

Π has a canonical test, sampling a substructure of order at most 9s(ε) (and making

$O(\sigma(9s, t, k, r)) = O(s^r) = O(q^r)$ queries). If the original test is one-sided, then a

queried subgraph of order s(ε) will suffice for such a canonical test, which will also be

one-sided.

Proof Sketch. The transformation of an arbitrary graph test into a canonical one in

[GT03, Section 4] has three steps:

• First, the test is split into two phases: A uniform sampling of vertices, followed by

a (probabilistic) decision based on their induced subgraph, queried in its entirety;

• The second phase of the test is made independent of the labeling of the vertices

of the induced subgraph. In other words, the test is made to accept with the

same probability any two induced subgraphs seen in the second phase which are

isomorphic to each other.

• Finally, the probabilistic aspect of the second phase is discarded by rounding

probabilities, so that induced subgraphs are deterministically either accepted or

rejected.

Considering these three steps, one observes that they do not depend on a graph’s

having two vertices per edge, or on the non-partiteness of general graphs. We can

therefore apply the same transformation to a test of any dense structure: We sample

O(s(ε)) vertices from every part, and query the entire induced substructure on the

sampled vertices (making σ(s, t, k, r) queries). A deterministic decision is now made based on this order-s substructure.

The only point one must take into account when canonicalizing tests of uniform-order
partite structures is that the choice of part from which to sample the next vertex may

depend on previous query results — an aspect missing in the case of graphs. This is

the reason why as many as k · s vertices (the number of vertices in a substructure of

uniform order s) may be required: Instead of adaptively sampling s vertices, choosing


one part or another for each of them, we sample s vertices from every part, and can

thus simulate the original test’s sampling using our already-sampled vertices.

One may verify that the rest of the details of the proof of [GT03, Theorem 2]

indeed hold regardless of the choice of general dense structure class (but assuming that

the structure has enough vertices). The constant factor 9 is due to the repetition of

the original test to amplify the probability of success, an amplification necessary for

rounding the acceptance probabilities (and unnecessary for the case of one-sided tests).

This too is the same for any dense structure.

Definition 3.9.2. For a dense structure G in our chosen class, we denote by $D^s_G$ the distribution of substructures induced by a uniformly-sampled set of s vertices in each part, called the order-s substructure distribution of G; $D^s_G(G')$ is the relative frequency of a substructure G′ of order s in G.

We let Gs denote all structures of uniform order s in our class of dense structures, and

define the distance between distributions similarly to the case of subgraph distributions

(see Definition 3.2.9).

Lemma 3.9.3 (Generalization of Lemma 3.2.10). If two dense structures G, H are δ/σ(s, t, k, r)-close, then their order-s substructure distributions are δ-close, that is, $\mathrm{dist}\left(D^s_G, D^s_H\right) \le \delta$.

Proof Sketch. The proof is the same as in the case of graphs, except that the number of

potential hyperedges in an order-s substructure is bounded by σ(s, t, k, r) rather than $\binom{s}{2}$.

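Lemma 3.9.4 (Generalization of Lemma 3.2.11). Let δ > 0, let G be a structure with $n_i \ge \frac{2}{\delta} k \binom{s}{2}$, for all i ∈ [k]; let G′ be a random blowup of G to some higher order (s1, s2, . . . , sk) (or uniform order s); and let $\mathcal{H} \subseteq \mathcal{G}_s$. Then

\[ \left|\mathop{\mathrm{Ex}}_{G'}\left[\Pr_{H \sim D^{s}_{G'}}\left[H \in \mathcal{H}\right]\right] - \Pr_{H \sim D^{s}_{G}}\left[H \in \mathcal{H}\right]\right| < \delta \]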

Proof Sketch. The difference in this proof from the case of graphs is that there are as many as s vertices in each part of each structure in Gs, so one must union-bound over as many as $k\binom{s}{2}$ pairs of vertices which may be sampled from the same cluster, rather than $\binom{s}{2}$ in graphs or other non-partite structures. Otherwise the proof is the same.
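The union bound over same-cluster pairs can be checked numerically. The sketch below assumes a balanced blowup, in which each of the n vertices of a part becomes a cluster of exactly b vertices; this is a simplifying assumption of ours, and the function names are hypothetical. It computes the exact probability that s uniformly sampled vertices of one part hit some cluster twice, and compares it with the bound of C(s, 2)/n per part.

```python
from fractions import Fraction
from math import comb


def collision_probability(n, b, s):
    """Exact Pr[some two of s uniformly sampled vertices share a cluster],
    for one part blown up into n clusters of b vertices each (s <= n)."""
    no_collision = Fraction(1)
    for j in range(s):
        # the j-th pick must avoid the j clusters already hit
        no_collision *= Fraction((n - j) * b, n * b - j)
    return 1 - no_collision


def partite_collision_bound(n, s, k):
    """Union bound over at most k * C(s, 2) same-part pairs, each colliding w.p. at most 1/n."""
    return Fraction(k * comb(s, 2), n)
```

With n equal to (2/δ)·k·C(s, 2), the bound evaluates to δ/2, which leaves room within the slack δ of the lemma.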

3.9.2 Generalization of our main results

Theorem 3.7 (Generalization of Theorem 3.1). If a hereditary inflatable property has

a uniform-order test making q(ε) queries, regardless of the size of the input structure

and its parts, then it has a strongly canonical uniform-order test (specifically, a natural test) making $O\left(q(\varepsilon)^{2r}\right)$ queries.


Theorem 3.7 (Exact version). Let s : R+ → N. There exist $s_1 = O\left(k\binom{s}{2}\right)$, $\varepsilon_i = \Omega(1/\sigma(s, t, k, r))$ and $\varepsilon_h = \Omega(1/\sigma(s_1, t, k, r))$ for which the following holds: Suppose a property Π of dense structures of a certain kind is $(s_1, \varepsilon_h)$-hereditary on the average and $(s_1, \varepsilon_i)$-inflatable on the average, and that Π has a uniform-order test making queries involving at most s(ε) distinct vertices in each part of the input structure (regardless of the size of the parts). Then Π has a strongly canonical uniform-order test querying a substructure of order s1.

Proof Sketch. The proof for the case of graphs works for whatever dense structure we are

concerned with: We canonicalize the original test; switch to estimating the acceptance

probability of the canonical test over a larger (order-s1) substructure; and finally reject

if the larger substructure is itself not in Π. Using Lemma 3.9.3 and Lemma 3.9.4, analysis shows that this is both a valid test and that it is natural, i.e. the same set of sampled substructures is accepted at any input order.

The only adjustments are in the larger sampled substructure size and the heredity

and inflatability parameters:

• The sampled substructure order must be high enough for Lemma 3.9.4 to yield a sufficiently small constant difference in the distributions of order-s substructures; for our dense structures this is $O\left(k\binom{s}{2}\right)$ instead of the $O\left(\binom{s}{2}\right)$ for the case of graphs, as discussed in the proof of Lemma 3.9.4.

• The heredity parameter must relate to the larger substructure size s1 as per

the above. Also, it must be strong enough so that, on the average, an order-s1

substructure of a structure in Π will itself be in Π, rather than just being close to

Π; this explains the inverse dependence on the number of edges/hyperedges in

the substructure.

• The inflatability parameter must be such that a random blowup of a graph in Π

is close enough to Π for Lemma 3.9.3 to yield a small constant distance between

the order-s substructure distributions.

The parameters appearing in the statement of the generalized theorem (for uniform-order

structures) indeed meet these requirements.

The converse of Theorem 3.7 also admits exactly the same proof as for the case of

graphs, with a tweaking of the inflatability parameter si similarly to Lemma 3.9.4:

Theorem 3.8 (Generalization of Theorem 3.2). If a property Π has a natural (not

necessarily canonical) test which, for structures of order at least s(ε), makes queries

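involving at most s(ε) distinct vertices in each part, then for every ε′ > ε, Π is
$(s_h, \varepsilon')$-hereditary on the average and $(s_i, \varepsilon')$-inflatable on the average, for
$s_h = O\bigl(s \cdot \log\frac{1}{\varepsilon' - \varepsilon}\bigr)$ and $s_i = O\bigl(k s^2 \cdot (\varepsilon' - \varepsilon)^{-1} \log^2\frac{1}{\varepsilon' - \varepsilon}\bigr)$
respectively.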


Note. In this direction, we are not limiting the argument to uniform-order tests.

Alon’s [GT03, Proposition D.2] regarding perfectly canonical testing of hereditary

properties (quoted above with its qualification in [GT05]) applies, with the same proof,

to any class of dense structures; with it, and Theorem 3.7, we derive the following:

Corollary 3.9 (Generalization of Corollary 3.3). If a property Π, which is hereditary

and inflatable, has a uniform-order test making q(ε) queries, then it has a canonical

uniform-order test with queried subgraph order poly(q(ε)).

Proposition 3.9.5 (Generalization of Proposition 3.3.1). If a property Π, which is
hereditary and inflatable, has a one-sided (not necessarily natural) uniform-order test

making q(ε) queries, then Π has a perfectly canonical uniform-order test with queried

subgraph order at most r · q(ε).


Chapter 4

Query complexity hierarchies for

dense graphs and other models

4.1 Introduction

While the rest of this thesis is mainly concerned with properties whose query complexity
depends only on the distance parameter ε, this chapter focuses on properties whose
tests require more queries as graphs grow, and on the dependence of their query
complexity on n.

Goldreich, Goldwasser and Ron’s initial exposition of graph property testing already

considered the question of properties of dense structures that are ‘maximally’-dependent

on n: [GGR98, Proposition 4.1.1] establishes the existence of a property of strings, or

generic functions (from [n] to a finite domain), with Ω(n) query complexity, linear in

the size of the representation; and in [GGR98, Proposition 10.2.3.1], this construction is

built upon to establish the existence of a dense graph property with query complexity

linear in the size of the representation, i.e. $q(n) = \Omega(n^2)$.

There is no reason to assume a gap in the query complexity anywhere on the
‘spectrum’ between q(n) = Θ(1) and $q(n) = \Theta(n^2)$, especially as over time, properties
have been established to have all manner of specific query complexities in between:
Graph isomorphism testing, in different variants, has been shown by Fischer and
Matsliah in [FM06] to have query complexities such as $\Theta(n^{3/2})$ and $\Theta(\sqrt{n})$; Dyck
languages (parenthesis languages) have been shown to require $\Omega(n^{1/11})$ queries and be
testable with $O(n^{2/3}\,\mathrm{polylog}(n))$ queries; et cetera. Indeed, it is natural to expect that there
exist properties of dense graphs (or other dense structures) with any arbitrary query
complexity as a function of n: Properties testable with Θ(q(n)) queries, without being
testable with o(q(n)).

In this chapter we prove the existence of such query complexity hierarchies for three

testing models: Beginning with the simple case of properties of generic Boolean functions

(or equivalently, of binary strings); making an aside for the case of bounded-degree

(sparse) graphs; and finally focusing on dense structures, specifically dense graphs. For


each model, we provide explicit (probabilistic) constructions for such properties. In fact,

all of these hierarchy results are established in a very similar pattern:

• We start with an appropriate maximally-hard property Π′ for our specific setting.

• A property $\Pi^q$ is constructed for an arbitrary choice of q(n), using mostly some
sort of replication or blowup, so that every structure in $\Pi^q_n$ corresponds to some
structure in Π′ of size q(n) or lower.

• Testing Π′ is shown to be reducible (either generally, in the sense of Definition 2.4.1,

or for some subset or distribution) to testing Πq , establishing an Ω(q(n)) lower

bound on the query complexity of Πq .

• A test for $\Pi^q$, making O(q(n)) queries, is explicitly presented; for an input
structure of order n, it essentially determines which smaller structure(s) from Π′,
if any, the input is a blowup or a replication of.

There is, however, some subtlety to the question of the existence of properties of

arbitrary query complexity, and even the existence of maximally-hard properties.

A first aspect to consider in this respect is the kinds of properties we wish to obtain.

A “purely random” property will almost surely be hard to test, but it will also be hard

to decide (and impossible to decide for all n by a single machine only receiving n);

certainly such a property will not be polynomially decidable in general; and it will not

have useful structural features. Such is the hard property for the dense graph model,

constructed in [GGR98] (although [GGR98, Proposition 10.2.3.2] already improves on

this by making the property NPTIME-decidable). Another improvement, in Goldreich

and Trevisan’s [GT03, Theorem 1], is an NPTIME monotone property; to decide

it or to test it, one needs to recognize outputs of a certain pseudorandom generator,

making this an NPTIME problem not likely to be in PTIME. Thus the question

stands whether there are even $\Theta(n^2)$-hard properties which are definitely in PTIME
while exhibiting most or all of these features. Also, features of properties may be more
difficult to establish at $q(n) = o(n^2)$; specifically, a maximally-hard property is one-sided
testable, but in a somewhat meaningless sense: Reading the entire graph meets the
query complexity lower bound, and one can thus obviously make a deterministic decision
with no error; for $q(n) = o(n^2)$, one-sided testability is not at all a trivial matter.

In order to provide hierarchies with these desirable features, we first strengthen the

hardness results from [GGR98], by constructing a maximally-hard property which is

both PTIME-decidable and PTIME-testable, in Section 4.2. We use this particular

hard property, and the original one of [GGR98, Proposition 10.2.3.1], to establish

three hierarchy theorems for the dense graph model, corresponding to three different

combinations of the above features:

• PTIME-decidability + PTIME-testability, in Section 4.5.

• Monotonicity, in Section 4.6.


• PTIME-decidability + one-sided testability, in Section 4.7.

A second subtle aspect regards the reductions in the pattern described above for

proving hierarchy results. As in Chapter 3, all of these dense model results involve

careful use and analysis of graph blowups (see Definition 2.3.6) to relate testing at

higher and lower graph orders. Specific to this chapter is the following question: If a

graph is far from another graph, or from a property, what guarantee is there that it

remains far from it when applying a blowup? The answer is that, in fact, a (balanced)

blowup can bring graphs much closer together, even making them identical in some

cases; we must therefore prove an appropriate bound on this effect, for different settings

in every section, so as to preserve the hardness of properties through blowups. That is

perhaps the key to this chapter’s dense model results.

4.2 Hard properties decidable and testable in PTIME

Several hierarchy results in this chapter involve hard properties decidable in polynomial

time (as per Definition 2.3.2): the result regarding generic functions in Section 4.3,
and two of the three dense graph model results, in Section 4.5 and Section 4.7. As

our construction of the maximally hard-to-test graph property uses the maximally

hard-to-test Boolean function property, we state and establish the existence of both of

them together through a single argument:

Theorem 4.1. There exist a PTIME-decidable property Π of generic Boolean functions,
and a constant $\varepsilon_{4.1} > 0$, such that any ε-test for Π with $\varepsilon \le \varepsilon_{4.1}$ must make Ω(n)
queries, i.e. query at least a constant fraction of the function values.

Theorem 4.2. There exist a PTIME-decidable property Π of dense graphs, and a
constant $\varepsilon_{4.2} > 0$, such that for any sufficiently large n, any ε-test for Π with $\varepsilon \le \varepsilon_{4.2}$
must make at least $c_{4.2} \cdot \binom{n}{2} = \Omega(n^2)$ queries, i.e. query at least a constant fraction of
the potential edges.

4.2.1 The difficulties in deciding hard-to-test properties in [GGR98]

Let us recap the two-step construction of a hard graph property (of query complexity
$\Omega(n^2)$) in [GGR98, Proposition 10.2.3.1]:

• First, a certain small sample space is shown to yield a hard property of Boolean

functions: The sample space is small enough to be sparse, so that a random

function is far from it; the sample space also exhibits strong pseudorandomness,

in that its projection on any (small) constant fraction of the coordinates is close

to a projection of a uniformly-sampled random function. Thus a test making at

most this many queries cannot tell apart functions sampled uniformly from $\{0,1\}^n$


from functions sampled from the small sample space, while it is necessary for it to

usually reject the former and accept the latter.

• Next, the domain of the boolean functions is mapped to the set of (unordered)

pairs of graph vertices, and the set of functions is made closed under graph

isomorphism (i.e., permutations of the vertices), by adding all isomorphic images

of the constituent (labeled) graphs. The result is a graph property, with the

original boolean function values corresponding to adjacency matrix entries. The

parameters are such that, even though the resulting property may contain
as many as n! times as many graphs as the property of Boolean functions, it is still

sparse within the set of all possible graphs; a random graph is still far from it;

and it still has the strong pseudorandomness with respect to projections — so the

indistinguishability is maintained.

There are two difficulties, one in each of the steps of construction, which make the

resulting property hard to test in PTIME:

• The small sample space used in the first step is in NPTIME (that is, one can

decide membership in it with an NPTIME machine), but it is not clear whether

it is in PTIME.

Overcoming this difficulty: Instead of the small sample space used in [GGR98],

we shall use another adequate pseudorandom space, the membership in which is

decidable in PTIME.

• One can easily determine whether a given (labeled) graph is a permutation of

a (labeled) graph in the small sample space — using a short witness, being the

permutation function (i.e., this can be determined in NPTIME). But it is not

clear whether this can be done in PTIME, without the witness.

Overcoming this difficulty: We augment the graphs constructed using the Boolean

functions, so that after applying an isomorphism (permuting the vertices), the

original index of each vertex can be efficiently recovered. Thus the final class

can be recognized in PTIME by reversing the isomorphism, reconstructing the

Boolean function and determining whether it is in the sample space.

4.2.2 The alternative construction

We wish to use a sample space of graphs, the membership in which is efficiently decidable,

such that constant-size fractions of it do not reveal enough to make a decision about

the entire graph. To this end we begin with such a sample space for binary strings,

rather than graphs, which is d-wise independent, i.e. its distribution projected onto

any d coordinates is uniform — for d = Ω(n). The existence of such a space is a

long-established result due to Alon, Babai and Itai:


Proposition ([ABI86]). There exist a global constant $\alpha_{ABI} > 0$ and a linear code,
explicitly constructible by a PTIME Turing machine given n as input, which maps
strings of length n/1000 to strings of length n, such that every $\alpha_{ABI} n$ positions in a
codeword are linearly independent (and consequently, any assignment to them can be
extended to an equal, positive, number of codewords).

Such a code (consisting of evaluations of low-degree polynomials) is constructed using

a parity-check matrix spanning a 0.999n-dimensional vector space (the “dual code”),
in which every vector has Hamming weight at least $\alpha_{ABI} n$. The space of codewords

will be our sample space, and the parity-check matrix can be used to efficiently decide

membership in the code.
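As a toy illustration of deciding membership with a parity-check matrix, the following minimal sketch works over GF(2). It uses the [7,4] Hamming code as a stand-in, which is an assumption made for brevity and is not the code of [ABI86]; the names `PARITY_CHECK` and `is_codeword` are hypothetical.

```python
# Toy stand-in for the code of [ABI86]: the [7,4] Hamming code.
# Column j (1-based) of the parity-check matrix is the binary expansion of j.
PARITY_CHECK = [
    [(j >> bit) & 1 for j in range(1, 8)]
    for bit in range(3)
]


def is_codeword(word, parity_check=PARITY_CHECK):
    """Return True iff `word` satisfies every parity check over GF(2)."""
    if any(len(row) != len(word) for row in parity_check):
        raise ValueError("word length does not match the parity-check matrix")
    return all(
        sum(h * x for h, x in zip(row, word)) % 2 == 0
        for row in parity_check
    )
```

The check costs one pass over the matrix, so for a matrix of polynomial size the decision is polynomial in the word length.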

This result in itself is already sufficient for a construction proving Theorem 4.1; but

as it will be undertaken as a part of the construction and proof regarding dense graph

properties, we shall not set down the proof at this point. Instead, we move from functions

to graphs. Consider the same code for $N = \binom{n}{2}$, and fix some efficiently-computable
well-ordering on the set $\{\{i, j\} \mid 1 \le i, j \le n\}$.

Definition 4.2.1. For a sequence $s = (s_1, \dots, s_N) \in \{0,1\}^N$, we define $G_s = ([n], E_s)$,
the graph corresponding to s, where $\{i, j\} \in E_s$ whenever the $\{i, j\}$th bit of s, by the
fixed order, is 1.

If s is a codeword, Gs is said to be a codeword graph. Obviously, as long as a graph is

labeled, it can be decided in PTIME whether it is a codeword graph or not.
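The correspondence between strings and labeled graphs can be sketched as follows. The text only requires some efficiently computable order on vertex pairs; the lexicographic order used here, and the function names, are assumptions for illustration.

```python
from itertools import combinations


def pair_order(n):
    """Pairs {i, j} with 1 <= i < j <= n, in lexicographic order (an assumed choice)."""
    return list(combinations(range(1, n + 1), 2))


def graph_from_string(s, n):
    """Edge set E_s of G_s = ([n], E_s): the t-th pair is an edge iff s[t] == 1."""
    pairs = pair_order(n)
    if len(s) != len(pairs):
        raise ValueError("s must have length n choose 2")
    return {frozenset(p) for p, bit in zip(pairs, s) if bit == 1}


def string_from_graph(edges, n):
    """Inverse map: read the bits of s back from a labeled graph."""
    return [1 if frozenset(p) in edges else 0 for p in pair_order(n)]
```

Because the map is a bijection for labeled graphs, deciding whether a labeled graph is a codeword graph reduces to reading off its string and running the membership check of the code.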

The set of labeled codeword graphs is not in general closed under isomorphism, and

does not therefore constitute a graph property. As was done in the [GGR98, Proposition

10.2.3.1] construction, we wish to close the set under isomorphism — but first we must

augment the graphs so as to be able to easily recover their original labels. Specifically,

Definition 4.2.2. For a graph $G = ([m], E_s)$ of order m, the (1 mod 4)-separating
augmentation of G is the graph $G' = ([4m+1], E'_s)$, obtained by adding a (3m+1)-vertex
labeled clique to G, and connecting every vertex j ∈ [m] with the first j vertices
of the clique, i.e.

$$E'_s = E_s \mathbin{\dot\cup} \bigl\{\{u, v\} \mid m+1 \le u, v \le 4m+1\bigr\} \mathbin{\dot\cup} \bigl\{\{j, m+\ell\} \mid j \in [m] \wedge \ell \in [j]\bigr\}.$$

We similarly define the (2 mod 4), (3 mod 4) and (0 mod 4) separating augmentations,
in which the large clique is of size 3m+2, 3m+3 and 3m+4 respectively.

The three additional variants of the separating augmentation are defined so that

augmented graphs will not be constrained to have a specific order modulo 4 (order 4m+1

in the basic definition). In most of our analysis below we shall ignore the additional

variants, implicitly using the same argument for them as well.


The hard property Π. Our hard property $\Pi = \bigcup_{n \in \mathbb{N}} \Pi_n$ shall consist, at every
order n ≥ 5, of the isomorphic images of separating augmentations of codeword graphs of
order $\lfloor (n-1)/4 \rfloor$, with the original graph $G_s$ having undergone an (n mod 4)-separating
augmentation.

Lemma 4.2.3. Π is decidable in PTIME.

Proof. Consider some n = 4m + i, for i ∈ {1, 2, 3, 4}. Given a graph of order n, which

is the result of a separating augmentation, we note that the vertices originally in the

(3m+ i)-clique are distinguishable from the rest, as their degree is at least 3m, while

the degree of vertices from the pre-augmented graph is at most m− 1 to other vertices

from the pre-augmented graph, and at most m to vertices in the clique, or 2m − 1

in total. Having separated the clique and the original vertices, the original index of

each original vertex is equal to the number of its neighbors in the clique. We can

thus efficiently reconstruct the (single) original order-m graph corresponding to any

separating augmentation (or determine that our input is not such an augmentation).

Having reconstructed the smaller graph, our earlier arguments imply that we can decide in
PTIME whether the string s corresponding to $E_s$ is a codeword or not.
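The recovery argument in this proof can be sketched in code. The functions `augment` and `recover` below are hypothetical names; graphs are represented as sets of two-element frozensets over vertices 1..n, which is an assumption of this sketch rather than anything fixed by the text.

```python
from itertools import combinations


def augment(edges, m, i=1):
    """(i mod 4)-separating augmentation of a graph on [m]; the result has order 4m + i."""
    n = 4 * m + i
    out = {frozenset(e) for e in edges}
    out |= {frozenset(p) for p in combinations(range(m + 1, n + 1), 2)}  # large clique
    out |= {frozenset((j, m + l)) for j in range(1, m + 1) for l in range(1, j + 1)}
    return out, n


def recover(edges, n):
    """Return the original edge set on [m], or None if the input is not an augmentation."""
    m, i = (n - 1) // 4, (n - 1) % 4 + 1
    nbrs = {v: set() for v in range(1, n + 1)}
    for e in edges:
        u, v = tuple(e)
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Clique vertices have degree at least 3m; original vertices at most 2m - 1.
    clique = {v for v in nbrs if len(nbrs[v]) >= 3 * m}
    if len(clique) != 3 * m + i:
        return None
    # The original index of a vertex is its number of neighbours in the clique.
    label = {v: len(nbrs[v] & clique) for v in set(nbrs) - clique}
    if sorted(label.values()) != list(range(1, m + 1)):
        return None
    # Clique vertex m + l (l <= m) sees exactly m - l + 1 original vertices;
    # the remaining clique vertices see none and are interchangeable.
    spare = iter(range(2 * m + 1, n + 1))
    for v in clique:
        k = len(nbrs[v] - clique)
        label[v] = m + (m - k + 1) if k > 0 else next(spare, None)
    if set(label.values()) != set(range(1, n + 1)):
        return None
    relabeled = {frozenset(label[x] for x in e) for e in edges}
    original = {e for e in relabeled if max(e) <= m}
    # Accept only if re-augmenting the recovered graph reproduces the input.
    return original if augment(original, m, i)[0] == relabeled else None
```

Every step is a degree count or a set comparison, so the whole procedure runs in time polynomial in n; the codeword check on the recovered graph would follow it.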

To complete our analysis, we shall use Yao’s method to demonstrate that Π cannot
be tested using $o(n^2)$ queries. Fix some sufficiently large n, let $m = \lfloor (n-1)/4 \rfloor$ and
$i = ((n-1) \bmod 4) + 1$, and consider two distributions:

Gn: A uniform distribution over the augmentations of codeword graphs of order m (i.e.

over Πn), and

Rn: A uniform distribution over the augmentations of all graphs of order m.

Note that any n beyond some threshold value can be chosen, as our construction allows

for augmentations from any sufficiently large order m to any orders 4m+ 1, . . . , 4m+ 4.

Lemma 4.2.4. If two graphs G1, G2 of order m are δ-far from each other, then (pairs

of isomorphic images of) their separating augmentations to order n = 4m + i are

(δ/32−O(1/m))-far from each other.

Proof. In this proof, as in a few additional ones in this chapter, it will be easier for us

to bound distances by accounting for two-tuple discrepancies with respect to a bijection

between graphs rather than the edge discrepancies, i.e. for every discrepant edge {u, v}
as per the above, we count both (u, v) and (v, u); this allows us to separate the counts

for each vertex in G. As there are no self-loops in our graphs, the number of tuple

discrepancies is exactly double the number of edge discrepancies.

Let $G'_1, G'_2$ denote the augmentations of the two far graphs. Clearly, a bijection
which maps (the copy of) $G_1$ to (the copy of) $G_2$ exhibits at least

$$2\delta\binom{m}{2} = \frac{2\delta}{16}\left(\binom{4m+i}{2} - \frac{(12+8i)m + i(i-1)}{2}\right) = \frac{2\delta}{16}\binom{4m+i}{2}\cdot\left(1 - O\!\left(\frac{1}{m}\right)\right)$$

discrepancies.

Now suppose that some vertex v of G1 in G′1 is mapped to a vertex of the large

clique in G′2. v is connected to at most 2m− 1 vertices in G′1 (m in the large clique and

m− 1 in G1), while the large clique vertex in G′2 is connected to at least 3m vertices.

This mapping of v therefore incurs more than m discrepancies of the form (v, u). We

conclude that by mapping G1 vertices to G′2 large-clique vertices, one can reduce the

number of discrepancies no more than by a factor of 4 + O(1/m). Thus any bijection
between $G'_1$ and $G'_2$ has at least $\frac{2\delta}{16\cdot(4+O(1/m))}\binom{4m+i}{2}\cdot(1 - O(1/m))$ discrepancies, so $G'_1$
is (δ/32 − O(1/m))-far from $G'_2$.
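The counting identity relating the number of vertex pairs of the order-m graph to that of its order-(4m+i) augmentation can be checked numerically. The function names below are hypothetical; the sketch only verifies the arithmetic and is not part of the proof.

```python
from math import comb


def augmented_pairs_gap(m, i):
    """comb(4m + i, 2) - 16 * comb(m, 2): pairs of the augmentation beyond 16x the original."""
    return comb(4 * m + i, 2) - 16 * comb(m, 2)


def closed_form_gap(m, i):
    """The closed form ((12 + 8i) m + i (i - 1)) / 2; the numerator is always even."""
    return ((12 + 8 * i) * m + i * (i - 1)) // 2


def pair_ratio(m, i):
    """16 * comb(m, 2) / comb(4m + i, 2), which behaves like 1 - O(1/m)."""
    return 16 * comb(m, 2) / comb(4 * m + i, 2)
```

The gap grows linearly in m while the number of pairs grows quadratically, which is where the 1 − O(1/m) factor comes from.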

Lemma 4.2.5. The probability of a graph sampled from Rn being δ4.2.5 = 0.4/64-close

to a graph in Πn is o(1).

Proof. Let $R'_m$ denote the uniform distribution over all graphs of order m; a sample
from $R_n$ can be obtained by applying an augmentation to a sample from $R'_m$.

Now, Πn is the set of augmentations of codeword graphs; by Lemma 4.2.4, if a graph

sampled from Rn is 0.4/64-close to a graph in Πn, then its pre-augmentation graph

(that is, its corresponding graph from R′m) is at least 0.4-close to a codeword graph (for

sufficiently large n). It thus suffices to prove that the probability of a graph sampled

from R′m being 0.4-close to a codeword graph is o(1).

Indeed, this follows from the fact that the codeword graphs are a sparse set: Each
codeword graph has at most $m! = 2^{O(m \log m)}$ (labeled) isomorphic images. The
sample space size (the number of codeword graphs) is $2^{0.001\binom{m}{2}}$, so the number of their
isomorphic images is $2^{(0.001+o(1))\binom{m}{2}}$. There are $\sum_{k=0}^{0.4\binom{m}{2}} \binom{\binom{m}{2}}{k}$ graphs which are 0.4-close
to a specific codeword graph (corresponding to the possible choices of $k \le 0.4\binom{m}{2}$ edges
to add or remove); and it holds that

$$\sum_{k=0}^{0.4\binom{m}{2}} \binom{\binom{m}{2}}{k} \le 2^{H_b(0.4)\cdot\binom{m}{2}} = o\left(2^{0.971\cdot\binom{m}{2}}\right) = 2^{(0.972+o(1))\cdot\binom{m}{2}},$$

where $H_b(\cdot)$ denotes the binary entropy function, which satisfies
$H_b(0.4) < 0.971$. Thus, for a sufficiently large n, the total number of order-m graphs
which are 0.4-close to the set of codeword graphs is under $2^{0.973\cdot\binom{m}{2}}$; since $R'_m$ is uniformly
distributed over all $2^{\binom{m}{2}}$ labeled graphs of order m, the claim follows.
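The numerical facts used in this counting argument can be checked with a short sketch. The function names are hypothetical, and the small graph orders tried in the usage are only a sanity check of the standard entropy bound on the size of a Hamming ball, not a substitute for the asymptotic argument.

```python
from math import comb, log2


def binary_entropy(p):
    """H_b(p) = -p log2 p - (1 - p) log2 (1 - p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def ball_size(num_pairs, radius_fraction):
    """Number of labeled graphs reachable from a fixed graph by changing
    at most radius_fraction * num_pairs of its potential edges."""
    radius = int(radius_fraction * num_pairs)
    return sum(comb(num_pairs, k) for k in range(radius + 1))
```

With these, the exponent bookkeeping of the proof reads 0.001 (codewords) plus under 0.971 (ball size), which stays below 0.973 and hence below 1.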

Lemma 4.2.6. Let M be a probabilistic oracle machine, whose number of queries is at
most $d = \alpha_{ABI}\binom{m}{2} > \left(\frac{1}{16}\alpha_{ABI} - o(1)\right)\binom{n}{2}$. It holds that $\Pr\left[M^{R_n} = 1\right] = \Pr\left[M^{G_n} = 1\right]$.

Proof. We establish the claim using two reductions — to distributions over graphs of

order m, then to distributions over strings.

Let G′m denote the uniform distribution over (labeled) codeword graphs of order

m, and let R′m denote the uniform distribution over all graphs of order m. Both

distributions $G_n$ and $R_n$ are obtained by applying the same augmentation to samples
from $G'_m$ and $R'_m$ respectively; and the result of each query to an augmented graph
depends on one or no edges of the original order-m graph. It therefore suffices to prove

the claim assuming queries are made to the original order-m graphs rather than their


augmentations or the isomorphic images thereof — that is, it suffices to prove that one

cannot distinguish between R′m and G′m.

Now, the result of a query of a potential edge {i, j} in the edge set $E_s$ of a basic
graph is the {i, j}th bit of the string s corresponding to $G_s$. $G'_m$ corresponds, therefore,
to a uniform sample from the d-wise independent space of length-$\binom{m}{2}$ strings, and $R'_m$
corresponds to a uniform sample of a string of this length. Thus the claim reduces to

asserting that using d queries, one cannot distinguish between strings sampled from the

d-wise independent sample space and from a uniform distribution, respectively. For

non-adaptive tests, this is the definition of the d-wise independence; but adaptivity does

not offer an advantage, since for any choice of up to d queries already made, and for any

sequence of results for these queries, the conditional distributions for their completion

into d query results are the same (and uniform) regardless of the choice of edges to query.

A rigorous treatment of this transition from a non-adaptive to an adaptive bound may

be found in [Fis04, Section 8].
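The non-adaptive statement can be checked exhaustively on a toy example. The sketch below again uses the [7,4] Hamming code as a stand-in (an assumption; it is not the code of [ABI86]): its dual has minimum weight 4, so its codewords should look uniform on any 3 coordinates but not on every 4. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

# Parity-check matrix of the [7,4] Hamming code (toy stand-in).
H = [[(j >> bit) & 1 for j in range(1, 8)] for bit in range(3)]


def codewords(parity_check):
    """All binary words satisfying every parity check over GF(2)."""
    length = len(parity_check[0])
    return [
        w for w in product((0, 1), repeat=length)
        if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in parity_check)
    ]


def is_d_wise_independent(words, d):
    """True iff the projection of `words` onto every d coordinates is exactly uniform."""
    length = len(words[0])
    for coords in combinations(range(length), d):
        counts = Counter(tuple(w[c] for c in coords) for w in words)
        if len(counts) != 2 ** d or len(set(counts.values())) != 1:
            return False
    return True
```

A tester reading at most d positions of a uniformly sampled codeword therefore sees exactly what it would see on a uniformly random string, which is the indistinguishability the lemma relies on.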

Proof of Theorem 4.2. Our constructed property Π is decidable in PTIME, as established
by Lemma 4.2.3. Now, set $\varepsilon_{4.2} = \delta_{4.2.5}$ and $c_{4.2} = \frac{1}{17}\alpha_{ABI}$, and let n be sufficiently
large for Lemma 4.2.6 to hold with $d = \frac{1}{17}\alpha_{ABI}\binom{n}{2}$. An ε-test for Π accepts with probability
at least 2/3 a graph sampled from $G_n$. By Lemma 4.2.6, if the test makes fewer
than $c_{4.2}\binom{n}{2}$ queries, it will accept a graph sampled from $R_n$ with the same probability.
By Lemma 4.2.5, with probability 1 − o(1), a graph from $R_n$ is $\delta_{4.2.5} = \varepsilon_{4.2}$-far from Π,
so the probability of the test accepting graphs in $R_n$ which are $\varepsilon_{4.2}$-far from Π is at
least 2/3 − o(1). Thus for every sufficiently large n there exists a specific graph which
is $\varepsilon_{4.2} \ge \varepsilon$-far from $\Pi_n$, and is accepted with probability over 1/2, a contradiction.

Proof Sketch for Theorem 4.1. The proof uses a subset of the arguments above — one

need not construct anything from strings or functions in the d-wise independent sample

space, so the membership decision is clearly in PTIME; also, the sample space is itself

sparse enough so that a random Boolean function is ε4.1-far from it with high probability.

One can thus construct appropriate indistinguishable distributions as for the case of

graphs.

4.3 A hierarchy of generic function properties

In the generic function testing model, the objects tested are functions from [n] to a

finite domain; as the elements of the tested functions’ domain are not interchangeable

as in the case of graphs, one can think of such functions as strings. Our construction

will only require Boolean functions (or binary strings).

Definition 4.3.1. The absolute distance between two functions $f, g : [n] \to \{0,1\}$ is

the number of elements of [n] on which they differ. The (relative) distance dist(f, g)

between f and g is the absolute distance normalized by a factor of 1/n.


The definition of a property and of satisfying a property or being ε-far from satisfying

it are the same as in the dense graph model (except that the classes are of functions,

and the distances are as defined above). An ε-test for a property of Boolean functions

is also defined as for the dense model, except that a test’s oracle access is to a generic

function f , with a query being an index i ∈ [n] and a reply being the value of f(i). A

test may, alternatively, receive ε together with n as a parameter, so a single algorithm

is used for all values of ε.

Definition 4.3.2. A function $q : \mathbb{N} \to \mathbb{N}$ is said to be a reasonable query complexity
function for generic functions if q(n) ≤ n, and the image of q(·) is infinite, that is,
$\limsup_{n\to\infty} q(n) = \infty$.

Theorem 4.3. There exists a constant $\varepsilon_{4.3} > 0$, such that for every reasonable q(·),
there exists a property Π of Boolean functions that is testable with one-sided error using
q(n) + O(1/ε) queries and running in time polynomial in its number of queries, but not
ε-testable with o(q(n)) queries, even with two-sided error, for $\varepsilon \le \varepsilon_{4.3}$. Furthermore, if

q(n) is computable from n in poly(n) time, then the property is PTIME-decidable, and

if it is computable in poly(q(n)) time, then the property has a test whose running time

is polynomial in its number of queries.

Note. We assume that the test is given n as input in binary representation rather than

in unary, otherwise the computation of q(n) can only be polynomial in q(n) if n is

polynomial in q(n).

4.3.1 Property construction

For the rest of this section, fix q(·).

Observation 4.3.3. We may assume, without loss of generality, that q(n) ≤ n/2, as
otherwise we could replace q(n) with $q'(n) = \lfloor \max(q(n)/2, 1) \rfloor$, and Theorem 4.3 would
yield a property with the same features but a different constant.

The complexity-q property. Let $\Pi' = \bigcup_{m \in \mathbb{N}} \Pi'_m$ be a property of Boolean functions
which requires Θ(n) queries to test, and is PTIME-decidable as a property of strings;
Theorem 4.1 guarantees that such properties exist.

Now, let m, n be such that m = q(n). For some $f' \in \Pi'_m$, consider the function
$f(i) = f'(1 + ((i-1) \bmod q(n))) = f'(1 + ((i-1) \bmod m))$. The domain of f is [n]; and
it consists of $\lfloor n/q(n) \rfloor$ duplicate copies of f′ with perhaps another final incomplete
copy. With this construction in mind, our property of query complexity q(n) shall be
$\Pi^q = \bigcup_{n \in \mathbb{N}} \Pi^q_n$, with $\Pi^q_n$ consisting of the functions f constructed for all f′ in $\Pi'_m$,
for m = q(n).

Observation 4.3.4. If q(n) is computable from n in poly(n) time, then Πq is decidable

in PTIME: To decide whether f over domain [n] is in Πq , one computes q(n), determines


whether its corresponding f ′ is in Π′m (for m = q(n)), and checks whether f(i) = f(i+m)

for every i ≤ n−m.
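The replication construction and the decision procedure of this observation can be sketched as follows. Functions are represented as Python lists (0-indexed, whereas the text indexes from 1), and `in_pi_prime` is a hypothetical membership predicate standing in for the PTIME decision procedure of the underlying hard property.

```python
def replicate(f_prime, n):
    """Build f over [n] from f' over [m]: repeated copies of f', the last possibly cut short."""
    m = len(f_prime)
    return [f_prime[i % m] for i in range(n)]


def decide_pi_q(f, q, in_pi_prime):
    """Decide membership of f (a list of n bits) in the replicated property.

    q: the query complexity function n -> q(n);
    in_pi_prime: hypothetical decision procedure for the base property.
    """
    n = len(f)
    m = q(n)
    # f must repeat with period m ...
    periodic = all(f[i] == f[i + m] for i in range(n - m))
    # ... and its first m values must form a member of the base property.
    return periodic and in_pi_prime(f[:m])
```

For example, with a toy base property "even number of ones" and q(n) = 3, the list `replicate([1, 0, 1], 8)` is accepted, while flipping any single entry of it leads to rejection.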

4.3.2 Lower and upper query complexity bounds

Lemma 4.3.5. Testing $\Pi'_m$ is (f, 1, h)-reducible to testing $\Pi^q$, for f(ε) = ε/2 and the
partial function $h(m) = \min\{i \in \mathbb{N} \mid q(i) = m\}$ (defined at orders m for which the set is
non-empty).

Proof. Let m ∈ N be such that n = h(m) is defined, and consider some Boolean function

f ′ over domain [m]. One can construct the function f , corresponding to f ′, over domain

[n] as in the construction of $\Pi^q$; if f′ ∈ Π′, then $f \in \Pi^q$, and if f′ is ε-far from Π′, one
must change an ε-fraction of every complete copy of f′ in f to obtain a function in Π′,
so over all values of f, one must change at least a $\frac{\lfloor n/m \rfloor \cdot m}{n} \cdot \varepsilon$ fraction to obtain $\lfloor n/m \rfloor$
duplicate copies of a function in Π′. (It may be the case that fewer or no changes are
necessary to the incomplete copy of f′.) As q(n) ≤ n/2, this fraction is at least ε/2.

Given oracle access to f ′, one can simulate an oracle to f , making one query to f ′ so as

to answer a single query made to f . This meets the requirements of Definition 2.4.1.
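The oracle simulation in this reduction, and the factor by which the distance shrinks, can be sketched as follows. `CountingOracle`, `replicated_oracle` and `replication_distance_factor` are hypothetical names introduced for illustration; oracles take 1-based indices as in the text.

```python
class CountingOracle:
    """Oracle access to a function over [m], counting the queries made."""

    def __init__(self, values):
        self.values = list(values)
        self.queries = 0

    def __call__(self, i):
        self.queries += 1
        return self.values[i - 1]


def replicated_oracle(f_prime_oracle, m):
    """Oracle for the replicated f: each query to f costs exactly one query to f'."""
    def f(i):
        return f_prime_oracle(1 + (i - 1) % m)
    return f


def replication_distance_factor(n, m):
    """Fraction of [n] covered by complete copies of f'; at least 1/2 when m <= n/2."""
    return (n // m) * m / n
```

A test for the replicated property run against `replicated_oracle(...)` thus yields a test for the base property with the same number of queries and a distance parameter smaller by at most a factor of 2.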

The lower bound follows as a corollary of the reduction above, when setting ε4.3 =

f(ε4.1) = ε4.1/2:

Lemma 4.3.6. For ε ≤ ε4.3, any ε-test for Πq makes Ω(q(n)) queries.

Proof. By Lemma 4.3.5 above, testing $\Pi'_m$ is (f, 1, h)-reducible to testing $\Pi^q$, with a
linear f(ε) and the partial function $h(m) = \min\{i \in \mathbb{N} \mid q(i) = m\}$, defined for m in the
image of q(·); since q(·)’s image is infinite, the Ω(n) lower bound for testing Π′ when
$\varepsilon \le \varepsilon_{4.1}$ implies (by Lemma 2.4.2) a lower bound of Ω(q(n)) on the number of queries
required to test $\Pi^q$ when $\varepsilon \le f(\varepsilon_{4.1}) = \varepsilon_{4.3}$.

For the upper bound, we present a straightforward test for Πq , listed as Algorithm 4.1.

Algorithm 4.1 A test for $\Pi^q$

for Θ(1/ε) times do
    Uniformly sample j ∈ [q(n)] and r ∈ [⌈n/q(n)⌉ − 1].
    If r · q(n) + j ≤ n and f(r · q(n) + j) ≠ f(j), reject.
end for
Query all of f(1), . . . , f(q(n)).
Reconstruct the function f′ : [q(n)] → {0, 1} s.t. f′(i) = f(i).
Deterministically decide whether f′ ∈ Π′ and answer accordingly.
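A direct transcription of Algorithm 4.1 might look like the sketch below. The name `run_algorithm_4_1`, the predicate `in_pi_prime` and the parameter `constant` (a concrete stand-in for the hidden constant in Θ(1/ε)) are assumptions of this sketch; the oracle `f` takes 1-based indices. Unlike the listing, the sketch stops at the first violation instead of fixing all queries in advance, which does not change the accept/reject outcome.

```python
import math
import random


def run_algorithm_4_1(f, n, q, in_pi_prime, eps, rng=None, constant=4):
    """Return True to accept and False to reject.

    f: oracle over [n]; q: query complexity function (assumed q(n) <= n/2);
    in_pi_prime: hypothetical decision procedure for the base property.
    """
    rng = rng or random.Random()
    m = q(n)
    copies = math.ceil(n / m) - 1  # r is sampled from [ceil(n / q(n)) - 1]
    # Phase 1: spot-check that f repeats its first m values.
    for _ in range(math.ceil(constant / eps)):
        if copies == 0:
            break
        j = rng.randint(1, m)
        r = rng.randint(1, copies)
        if r * m + j <= n and f(r * m + j) != f(j):
            return False
    # Phase 2: read the first m values and decide membership exactly.
    f_prime = [f(i) for i in range(1, m + 1)]
    return in_pi_prime(f_prime)
```

An input that is an exact replication of a member of the base property passes both phases regardless of the random choices, which is the one-sided error guarantee.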

Lemma 4.3.7. Algorithm 4.1 is a non-adaptive one-sided-error test for Πq making

q(n) +O(1/ε) queries. Furthermore, if q(n) is computable in poly(q(n)) time, then the

test’s running time is polynomial in the number of queries.


Proof. The number of queries of Algorithm 4.1 is clearly as stated. The first part of

the test has running time linear in 1/ε ; then come the steps dependent on n, being

the computation of q(n), then the querying of q(n) values and the decision whether f ′

is in Π′m. The test’s running time is spent computing q(n), determining whether the

first q(n) values are in Π′, and additional work taking time linear in q(n) (ignoring an

addition of O(1/ε)). As Π′ is decidable in PTIME, the decision takes time polynomial

in q(n). Thus if q(n) is computable in poly(q(n)), the test’s overall running time is

polynomial in q(n) +O(1/ε), being its number of queries.

As for completeness and soundness: If f ∈ Πq , then by definition it is a repetition

of some f ′ in Π′ and will therefore be accepted. On the other hand, if f is accepted

with probability at least 2/3, then the f ′ constructed by the test is necessarily in Π′,

and f must be ε-close to a repetition of f ′ — as otherwise the first phase of the test

would reject with probability at least ε/2 at every iteration (again, since q(n) ≤ n/2),

and thus with probability at least 2/3 over all iterations. Thus if f is ε-far from Πq

then the test accepts it with probability lower than 1/3.
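Algorithm 4.1 can be sketched in Python as follows. This is a minimal illustration rather than the thesis's own implementation: the function name `repetition_tester`, the callback `in_base_property` (standing in for a decider of Π′) and the constant 4 hidden in Θ(1/ε) are all assumptions made for the example.

```python
import math
import random


def repetition_tester(f, n, q_n, in_base_property, eps, rng=random):
    """Sketch of Algorithm 4.1 (one-sided error).

    f                -- oracle: callable on 1..n returning 0 or 1
    q_n              -- the value q(n), assumed to satisfy 1 <= q_n <= n/2
    in_base_property -- hypothetical decider for the base property on a list of q_n bits
    """
    blocks = math.ceil(n / q_n)        # number of (possibly partial) copies
    iterations = math.ceil(4 / eps)    # Theta(1/eps); the constant 4 is an assumption
    for _ in range(iterations):
        j = rng.randint(1, q_n)        # position inside a copy
        r = rng.randint(1, blocks - 1) # index of a later copy
        i = r * q_n + j
        if i <= n and f(i) != f(j):
            return False               # periodicity violated: reject
    # Second phase: read the first copy in full and decide the base property.
    prefix = [f(i) for i in range(1, q_n + 1)]
    return in_base_property(prefix)
```

The first phase uses O(1/ε) queries and the second exactly q(n), matching the q(n) + O(1/ε) count of Lemma 4.3.7; a function that is an exact repetition of a member of the base property is never rejected.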

Theorem 4.3 is now proven by a combination of the query complexity lower bound

of Lemma 4.3.6 and the upper bound established through the valid test in Lemma 4.3.7,

and Observation 4.3.4 regarding the PTIME-decidability.

4.4 An aside: A hierarchy of bounded-degree graph properties

This section regards testing bounded-degree graphs, in which any single vertex is

connected to at most d vertices: |Γ(v)| ≤ d. Respecting this bound, E is represented in

this model by a function:

Definition 4.4.1. For a graph G = (V,E) with maximum degree d, an edge function is a function gG : V × [d] → V ∪ {⊥} such that g(v, i) = u ∈ V if u is the ith neighbor of v (by some arbitrary order) and g(v, i) = ⊥ if v has fewer than i neighbors.

The neighbors of v in G are g(v, 1), . . . , g(v,deg(v)).

Definition 4.4.2. The absolute distance between two bounded-degree graphs G, H of

order n is the minimum distance between pairs of edge functions gG, gH representing

them, which is in turn the number of values one has to modify in gG to get gH . The (relative) distance dist(G, H) between G and H is the absolute distance between them normalized by a factor of 1/(dn).

Note. Unless one wishes to test bounded-degree digraphs, it must be the case that

whenever u = g(v, i) for some i, v = g(u, j) for some j; and there are in fact only at

most dn/2 edges. Any modification of the edge function must respect this constraint.


The definition of a property and of satisfying a property or being ε-far from satisfying

it are the same as in the dense graph model, except for the different normalization of

the distance. A property test in the bounded-degree model for a graph property Π is

defined as in the dense model, except that its oracle access is to a graph edge function

gG, and its queries are to specific values of this function (“what is the index of the ith

neighbor of v?”).

Note. As in the dense model, one could alter the definition to remove the artificial

dependence of tests on n through the need to use vertex indices, but since this chapter

is concerned with tests that depend on n, we shall not explore this here.

Theorem 4.4. In the bounded-degree model, there exist constants d ∈ N and ε4.4 > 0,

such that for every q : N→ N with an infinite image, there exists a (downward) monotone

property of degree-d-bounded graphs that is testable with one-sided error using O(q(n)/ε)

queries, but not ε-testable using o(q(n)) queries, even allowing two-sided error, for any

ε ≤ ε4.4. Particularly, the property of degree-d-bounded graphs being 3-colorable, while

having connected components of size at most q(n), is such a property.

To establish any hierarchy theorem for the bounded-degree model in the common

pattern of this chapter (and similarly to Theorem 4.3), we need a property known to be

maximally hard. As implied in the statement of Theorem 4.4 above, for bounded-degree

graphs this shall be the property of being 3-colorable, which is also monotone, and

exhibits some additional features which we shall make use of in the proof. It is known

to be hard, by a result of Bogdanov, Obata and Trevisan:

Theorem ([BOT02, Theorem 2]). There exist constants ε3-COL and d, such that any

ε-test of d-bounded graphs for being 3-colorable makes Ω(n) queries, even when allowed

to have two-sided error, for any ε ≤ ε3-COL.

For the remainder of this section, let us fix d to be as guaranteed by [BOT02,

Theorem 2], and fix also q(·), assuming without loss of generality that q(n) ≤ n/2 (see

Observation 4.3.3 for the justification; here we would be dividing q by 2d rather than 2

to obtain the inequality).

The complexity-q property. Let Π′ be the property of being 3-colorable, and denote Π′ = ⋃m∈N Π′m. Our property is Πq = ⋃n∈N Πqn, with Πqn consisting of all graphs made up of connected components of size at most q(n), which are all 3-colorable, i.e. every connected component itself satisfies Π′.

4.4.1 Lower and upper query complexity bounds

Lemma 4.4.3. Testing 3-colorability is (f, 1, h)-reducible to testing Πq for f(ε) = ε/2 and the partial function h(n) = min{i ∈ N | q(i) = n} (at orders n for which h(n) is defined).


Proof. For n = min{i ∈ N | q(i) = m}, and given a graph G′ of order m and degree at most d, consider a graph G consisting of t = ⌊n/m⌋ vertex-disjoint copies of G′, and an additional n mod m isolated vertices. If G′ ∈ Π′, then G ∈ Πq . For the case

of G′ being ε-far from Π′, we note that due to the downward monotonicity of Πq , the

distance of G from Πq is the number of edges one must remove to achieve a graph in

Πq (i.e. there is no benefit in adding edges).

We also note, that graphs in Πq are themselves 3-colorable (being disjoint unions

of 3-colorable graphs), so the edges removed must make G a 3-colorable graph. This

requires in particular making every induced subgraph of G 3-colorable, including its

(previously) connected components. We conclude that the minimum number of edge

removals necessary is exactly the number of edges whose removal is required to make each

connected component 3-colorable in itself. To make one of the connected components

3-colorable, we must remove at least an ε-fraction of its edges, and the overall fraction of edges to be removed is at least (t · m/n) · ε. As by assumption q(n) ≤ n/2, this fraction is at least ε/2, so G is ε/2-far from Πq .

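Finally, given oracle access to an edge function of G′, one can simulate an oracle to an edge function of G: for a vertex of G that is the jth copy of some vertex v of G′, its ith neighbor is the jth copy of the ith neighbor of v in G′ (the additional isolated vertices have no neighbors), so each query to G is answered using at most one query to G′.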

This meets the requirements of Definition 2.4.1.
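The oracle simulation used in this reduction can be sketched as below. The 0-based vertex numbering, the use of `None` for ⊥ and the name `make_copies_oracle` are assumptions chosen for the illustration; only the one-query-per-query behaviour is taken from the proof.

```python
def make_copies_oracle(g_small, m, n):
    """Return an edge function for G (order n), built from floor(n/m) disjoint
    copies of G' (order m) plus n mod m isolated vertices.

    g_small(v, i) -- edge function of G' on vertices 0..m-1, with i in 1..d;
                     returns the i-th neighbor of v, or None (standing for the
                     'no such neighbor' symbol) when v has fewer than i neighbors.
    """
    t = n // m  # number of complete copies

    def g_big(v, i):
        copy, local = divmod(v, m)
        if copy >= t:
            return None               # padding vertex: isolated, no query needed
        u = g_small(local, i)         # exactly one query to G'
        return None if u is None else copy * m + u

    return g_big
```

For instance, with G′ a triangle and n = 7, vertices 0..2 and 3..5 form the two copies and vertex 6 is isolated.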

The lower bound follows as a corollary of the reduction above, when setting ε4.4 =

f(ε3-COL) = ε3-COL/2:

Lemma 4.4.4. For ε ≤ ε4.4, any ε-test for Πq makes Ω(q(n)) queries.

Proof. By Lemma 4.4.3 above, testing Π′ is (f, 1, h)-reducible to testing Πq , with a linear f(ε) and h(m) = min{i ∈ N | q(i) = m}, for m in the image of q(·); since q(·)’s image is infinite, the Ω(n) lower bound for testing Π′ when ε ≤ ε3-COL implies a lower bound of Ω(q(n)) on the number of queries required to test Πq when ε ≤ f(ε3-COL) = ε4.4.

For the upper bound, we present a test for Πq , listed as Algorithm 4.2. As in the case of generic functions, the test is quite straightforward.

Algorithm 4.2 A test for Πq

for Θ(1/ε) times do
    Uniformly sample v ∈ V (G).
    while the neighbors of vertices known to be in the connected component of v have not all been queried do
        Query another unknown neighbor of a vertex in the connected component of v.
        If v’s component is now known to contain at least q(n) + 1 vertices, reject.
    end while
    If v’s connected component (now fully explored) is not in Π′, reject.
end for
accept.

Lemma 4.4.5. Algorithm 4.2 is a one-sided-error test for Πq making q(n) · O(1/ε) queries.

Proof. For every iteration of the main loop of Algorithm 4.2, we make at most d · q(n) queries before either deciding that the component is too large or querying the entire component; thus the number of queries of Algorithm 4.2 is as stated. If G ∈ Πq , by definition it consists of components of size at most q(n) which are in Π′, and will therefore not be rejected. On the other hand, if G is accepted with probability at least 2/3, then it must be the case that at most an ε-fraction of the vertices lie in components which the test would reject for being too large or for not being in Π′. These can all be made into isolated vertices by removing at most ε · dn edges, i.e. G in this case is close to satisfying Πq . Thus if G is ε-far from Πq , then the test accepts it with probability lower than 1/3.

Theorem 4.4 is now proven by a combination of the query complexity lower bound

of Lemma 4.4.4 and the upper bound established through the valid test in Lemma 4.4.5.
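A runnable sketch of the test for the bounded-degree property follows. It is only an illustration under several assumptions: vertices are numbered 0..n−1, the edge function returns `None` for ⊥, the constant 2 stands in for the Θ(1/ε) iteration count, and 3-colorability of a component is decided by exhaustive search (exponential time, acceptable only for tiny components).

```python
import math
import random
from itertools import product


def is_3_colorable(vertices, edges):
    """Brute-force 3-colorability check; for illustration only."""
    vs = list(vertices)
    for colors in product(range(3), repeat=len(vs)):
        col = dict(zip(vs, colors))
        if all(col[u] != col[v] for u, v in edges):
            return True
    return False


def component_tester(g, n, d, q_n, eps, rng=random):
    """Sketch of the component-exploring test: g(v, i) is the edge-function oracle."""
    for _ in range(math.ceil(2 / eps)):      # Theta(1/eps); constant is an assumption
        start = rng.randrange(n)
        seen, frontier, edges = {start}, [start], set()
        while frontier:
            v = frontier.pop()
            for i in range(1, d + 1):
                u = g(v, i)
                if u is None:
                    break                    # v has fewer than i neighbors
                edges.add((min(u, v), max(u, v)))
                if u not in seen:
                    seen.add(u)
                    if len(seen) > q_n:
                        return False         # component has at least q(n)+1 vertices
                    frontier.append(u)
        if not is_3_colorable(seen, edges):
            return False                     # fully explored component is not 3-colorable
    return True
```

Each iteration issues at most d queries per discovered vertex and stops once q(n) + 1 vertices are seen, in line with the q(n) · O(1/ε) bound of Lemma 4.4.5 for constant d.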

Note. Theorem 4.4 also holds for higher values of d beyond the minimum guaranteed

by [BOT02, Theorem 2], by the same construction, when adjusting ε4.4 to account for

the higher number of possible edges.

Is it possible, as an improvement over Theorem 4.4, to construct the property

with query complexity Θ(q(n)) such that the test’s dependence on ε is additive rather

than multiplicative? i.e. obtain a test with query complexity Θ(q(n) + 1/ε) as is the

case in Theorem 4.3? One can alter the construction above so that the graph is made

up of ‘marked’ components, all being copies of the same 3-colorable graph, and use it

with some out-of-component gadgets for marking a graph over the copies of the same

vertex in the various components. This super-imposed graph could be used to ensure

that every pair of copies of two vertices is connected in all components, or in none of

them. However, one can’t use this sparse graph to check arbitrary pairs of components,

as there would be Ω(n/q(n)) components and one would need a walk of length at least

Ω(log(n/q(n))) in the super-imposed graph to reach all of them, even if the graph were

an expander or a balanced tree. One would also need to ensure the super-imposed graph

to be appropriate — but this in itself may not be an easy task: [GR02] presents a lower

bound of Ω(√n) for testing an order-n bounded-degree graph for the property of having

a certain degree of expansion.

4.5 A hierarchy of PTIME-testable properties

In this and the next two sections we return to the dense model for property testing,

specifically to dense graphs, proving three hierarchy theorems for three possible combinations of features of properties of arbitrary complexity. The first of them, presented

in this section, regards properties which are PTIME-decidable and PTIME-testable,

using the hard property constructed in Theorem 4.2, carrying its PTIME decidability

to the properties themselves at any query complexity and the testability essentially also

to the optimal tests for these properties.

Definition 4.5.1. A function q : N → N is said to be a reasonable query complexity function for dense graphs if q(n) ≤ $\binom{n}{2}$, and the image of q(·) is infinite, that is, lim sup n→∞ q(n) = ∞.

Theorem 4.5. There exists a constant ε4.5 > 0, such that for every reasonable q(·) (in

the sense of Definition 4.5.1), there exists a property of dense graphs that is testable

with two-sided error using O(q(n)/ε²) queries, but not ε-testable with o(q(n)) queries,

even allowing two-sided error, for ε ≤ ε4.5. Furthermore, if q(n) is computable from

n in poly(n) time, then the property is PTIME-decidable, and if it is computable in

poly(q(n)) time, then the property has a test whose running time is polynomial in its

number of queries.

4.5.1 Property construction

Vertex dispersal and pre-blowup construction

Our property Πq will consist of copies, or rather, blow-ups, of graphs from a maximally-

hard property, similarly to the proof of Theorem 4.3. However, in order for us to be

able to tell vertices apart from each other after their having been blown up, we would

like the neighborhoods of different vertices in pre-blown-up graphs to be “substantially

different” from each other:

Definition 4.5.2. Let α > 0. A graph G of order n is said to be α-dispersed if, for

every two different vertices u, v ∈ V (G), their neighbor relations disagree on at least αn

elements. In other words, |(Γ(v) \ Γ(u)) ∪ (Γ(u) \ Γ(v))| ≥ αn. A set of graphs is said

to be dispersed if there exists a single α > 0 such that all graphs constituting the set

are α-dispersed.
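As a concrete reading of Definition 4.5.2, the following sketch computes the largest α for which a given graph is α-dispersed. The adjacency-dictionary representation and the function name `dispersion` are assumptions made for the example.

```python
def dispersion(adj):
    """Largest alpha such that the graph is alpha-dispersed.

    adj -- dict mapping each vertex to the set of its neighbors
           (assumed symmetric, with at least two vertices).
    """
    vs = list(adj)
    n = len(vs)
    # Smallest symmetric difference between two neighborhoods, over all vertex pairs.
    min_diff = min(
        len(adj[u] ^ adj[v])
        for idx, u in enumerate(vs)
        for v in vs[idx + 1:]
    )
    return min_diff / n
```

On the path 0-1-2-3, for example, the end vertex 0 and the inner vertex 2 disagree only on vertex 3, so the path is 1/4-dispersed and no better.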

Note. This notion of dispersion has nothing to do with the notion of dispersers as

relaxed randomness extractors (as surveyed in Shaltiel’s [Sha04]).

We begin with the maximally-hard graph property of Theorem 4.2, denoted here Π′ = ⋃n∈N Π′n, which has query complexity Θ(n²), and is also PTIME-decidable. We now augment the graphs from Π′, so as to make them dispersed:

Definition 4.5.3. Let G = (V,E) be a (labeled) graph of order n. Supposing for ease

of notation that V = [n], the dispersing augmentation of G consists of:

1. Setting $n' = 2^{\lceil \log_2(2n+1) \rceil} \in [2n+1, 4n]$.


2. Adding n′/2 isolated vertices to the graph before the original vertices, making the

vertex set [n′/2 + n] (with vertices n′/2 + 1, . . . , n′/2 + n being the vertices of G).

3. Adding an n′-clique to the graph, making the vertex set [n+ 3n′/2].

4. For every vertex i ∈ [n′/2 + n] (the original and isolated vertices), adding an edge between vertex i and vertex n′/2 + n + j (the jth vertex of the large clique) whenever the inner product (modulo 2) of i − 1 with j − 1, when viewed as log₂(n′)-bit strings, is 1 rather than 0.

Notes.

– A dispersing augmentation is a different operation than the separating augmentation

used in Subsection 4.2.2; however, since dispersing augmentations are the only ones

used in this section, we refer to them throughout the rest of the section merely as

augmentations.

– Graphs of different orders have dispersing augmentations of different orders; this is

the reason why we do not simply augment to size 2n′. Additional motivation for the

specifics of the definition can be found in their use below.
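The four steps of Definition 4.5.3 can be sketched as a small constructor. The representation (1-based labels, edges as ordered pairs with the smaller label first) and the name `dispersing_augmentation` are assumptions for the illustration; the index arithmetic follows the definition.

```python
def dispersing_augmentation(n, edges):
    """Build the dispersing augmentation of a labeled graph on vertices 1..n.

    edges -- iterable of pairs (u, v) with 1 <= u < v <= n.
    Returns (order, edge_set) of the augmented graph, labels 1..order.
    """
    # Step 1: n' = 2^ceil(log2(2n+1)); 2n+1 is odd, so its bit length is that ceiling.
    n_prime = 1 << (2 * n + 1).bit_length()
    half = n_prime // 2
    outside = half + n            # isolated vertices followed by the original ones
    order = outside + n_prime     # the large clique comes last

    aug = set()
    # Step 2: original vertices are shifted past the n'/2 isolated vertices.
    for u, v in edges:
        aug.add((u + half, v + half))
    # Step 3: the n'-clique.
    for a in range(1, n_prime + 1):
        for b in range(a + 1, n_prime + 1):
            aug.add((outside + a, outside + b))
    # Step 4: Hadamard-like bipartite graph, inner product of (i-1) and (j-1) mod 2.
    for i in range(1, outside + 1):
        for j in range(1, n_prime + 1):
            if bin((i - 1) & (j - 1)).count("1") % 2 == 1:
                aug.add((i, outside + j))
    return order, aug
```

For a single-vertex graph this gives n′ = 4 and an augmented graph of order 1 + 3n′/2 = 7.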

The dispersed pre-blowup property We set Π′′ to constitute all isomorphic images

of dispersing augmentations of graphs from Π′.

Pre-blowup construction analysis

The dispersed set Π′′ is a graph property — albeit empty for (infinitely) many graph

orders. Each labeled graph in Π′′ consists of a large clique, a smaller graph from Π′

with some additional isolated vertices and a “Hadamard-like” bipartite graph between

them. Π′′ is not the final property we shall be testing, but in order to complete our

construction we must establish several of its features:

Lemma 4.5.4. Π′′ is 1/8-dispersed, and the minimum degree of graphs in Π′′n is higher

than n/4.

Proof. Let G′′ ∈ Π′′, and let n be such that G′′ is an isomorphic image of the augmen-

tation of a graph of order n. Showing that the neighborhoods of every two vertices in

G′′ differ by at least n′/4 vertices establishes the claim regarding dispersion.

Let us consider the neighborhoods of pairs of vertices based on vertices’ membership

in the large clique:

• For the case of one vertex outside the large clique and another from the large

clique, the large-clique vertex is connected to n′ − 1 other vertices in the large

clique, while the other vertex of the pair is connected to exactly n′/2 of the large

clique vertices (by construction of the Hadamard-like bipartite graph), so the two

neighborhoods differ on at least n′/2− 1 ≥ n′/4 vertices.


• For two vertices i1, i2 outside the large clique, their neighborhoods differ with respect to each large clique vertex j such that the inner product of j with i1 ⊕ i2 is 1, i.e. the neighborhoods differ on n′/2 of the large-clique vertices.

• For pairs of large clique vertices, a claim analogous to the one for pairs outside the clique would have held, had there been exactly n′ non-large-clique vertices; this is

not the case, but let us think of them as having ‘lost’ a single difference of their

neighborhoods for every isolated vertex ‘missing’ from the augmentation: At most

n′ − (n+ n′/2) = (n′/2− n) ≤ (n′/2− n′/4) = n′/4; thus the neighborhoods of

pairs of large-clique vertices differ on at least n′/2− n′/4 = n′/4 vertices.

Also, any vertex, either in or out of the large clique, is connected to at least n′/2

vertices of the large clique, hence the minimum degree is n′/2 ≥ |V (G′′)|/4.

We now wish to show that the dispersing augmentation preserves distances, but

before doing so we require the following simple result:

Lemma 4.5.5. Let G1, G2 be two graphs of order n at distance ε. If one adds n′ − n full-degree vertices, or alternatively n′ − n isolated vertices, to each of the graphs, their distance becomes exactly $\varepsilon \cdot \binom{n}{2} / \binom{n'}{2}$ > ε(n/n′)². Specifically, there are optimal bijections between the augmented graphs in which G1 is mapped to G2.

Proof. We prove for the case of isolated vertices; the case of full-degree vertices is

similar.

Clearly, by taking the optimal bijection between G1 and G2 and extending it by a bijection between the additional isolated vertices, one obtains a bijection with $\varepsilon\binom{n}{2}$ discrepancies.

In the other direction, denote by G′1 and G′2 the graphs with the isolated vertices

added. Suppose a bijection φ : V (G′1)→ V (G′2) maps some vertex v ∈ V (G1) to an

isolated vertex; φ must have some isolated vertex of G′1 mapped to some u ∈ V (G2). If

one remaps v to u, the number of edge discrepancies does not increase: |Γ(v)| + |Γ(u)| discrepancies from mapping v and the isolated vertex respectively are avoided, and

the number of discrepancies added is the size of the symmetric difference of the two

neighborhoods, Γ(v) and Γ(u), which is at most the number avoided. We thus conclude

that there is an optimal φ which maps V (G1) to V (G2), so the minimum number of discrepancies cannot be less than $\varepsilon\binom{n}{2}$.

Lemma 4.5.6. If G is ε-far from Π′, then the dispersing augmentation of G is ε/250-far

from Π′′.

Proof. Let n be the order of G, and let G′ be its dispersing augmentation, having $n'' = n + \frac{3}{2}n' = n + \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil}$ vertices. To be in Π′′n′′ , a graph must be an augmentation of an order-n graph in Π′; specifically, it must have an induced copy of some order-n graph in Π′, an appropriate number of isolated vertices and a large clique. If G is ε-far from Π′, and


one only modifies the copy of G in G′, with the isolated vertices and the large clique

vertices keeping their respective roles, then the fraction of G′ which must be modified

to obtain a graph in Π′′ is $\varepsilon\binom{n}{2} / \binom{n''}{2} > \varepsilon\binom{n}{2} / \binom{7n}{2}$ > ε/50.

However, one might also have vertices of G′ outside of the copy of G take the role of

G vertices. To bound the effect of such mappings, let us consider the tuple discrepancies

(rather than edge discrepancies) in a bijection from G′ to some graph in Π′′.

First suppose that large-clique vertices are mapped to non-large-clique vertices (G or

isolated vertices), and vice-versa. Specifically, let v in G or an isolated vertex be mapped

to a large-clique vertex v′, with some large-clique vertex u mapped to a G-vertex or

isolated vertex u′.

Had the augmentation had n′ non-large-clique vertices, each large-clique vertex

would be connected to n′/2 of them by the Hadamard-like graph. The augmentation

has n′/2 + n non-large-clique vertices, so each large-clique vertex is connected to at

least n of them. A large-clique vertex therefore has degree at least (n′ − 1) + n, while a vertex of G or an isolated vertex has degree at most n′/2 + (n − 1); thus u must have at least n′/2 + 1 neighbors removed and v must have at least n′/2 + 1 neighbors added, i.e. v, v′,

u and u′ contribute at least n′ + 2 discrepancies. If one were to map v to u and v′

to u′, the number of discrepancies would have been at most 4n′ (discrepancies in the

Hadamard-like bipartite graph) plus 2(n− 1) (discrepancies within G), less than 5n′

in total. Thus by altering the mapping as just described, the number of discrepancies

increases by a factor of 5 at most; overall, with the same maximum increase factor

in discrepancies, one can avoid any mapping of large-clique vertices to G or isolated

vertices.

Now, if large-clique vertices are only mapped to large-clique vertices, the discrepancies

under the mapping can be divided into discrepancies within the Hadamard-like bipartite

graph, and discrepancies within the set of G and isolated vertices. By Lemma 4.5.5,

this latter number of discrepancies is no less than $2\varepsilon\binom{n}{2}$ (as the isolated vertices were added to two graphs at distance ε), so the overall number of discrepancies is at least this much. Having increased the overall number of discrepancies by at most a factor of 5 by enforcing no large-clique vertex to be mapped to a non-large-clique vertex, we conclude that an unconstrained bijection has at least $\frac{1}{5} \cdot 2\varepsilon\binom{n}{2} > 2 \cdot \frac{\varepsilon}{250}\binom{7n}{2}$ tuple discrepancies, so the distance of G′ from Π′′ is at least ε/250.

Lemma 4.5.7. Testing Π′ is (f(ε), 1, h′′(n))-reducible to testing Π′′, for f(ε) = ε/250 and $h''(n) = \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil} + n \le 7n$.

Proof. For a graph G, we simulate an oracle to the corresponding augmentation G′ of G, with $\frac{3}{2}n' + n = \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil} + n = h''(n)$ vertices. Queries involving the large clique, the Hadamard-like bipartite graph or the n′/2 isolated vertices of G′ can

be answered without making any queries to G, based on vertex indices only; queries

regarding pairs of vertices from the pre-augmented G are simply passed to the oracle to


G. If G satisfies Π′, then its augmentation satisfies Π′′, by definition; the other direction

is Lemma 4.5.6: If G is ε-far from Π′, then G′ is ε/250 = f(ε)-far from Π′′.
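The oracle simulation in the proof of Lemma 4.5.7 can be sketched as a wrapper that answers adjacency queries on the augmentation and forwards at most one query to G. The 1-based labeling follows Definition 4.5.3; the names `augmented_oracle` and `query_g` are hypothetical.

```python
def augmented_oracle(n, query_g):
    """Adjacency oracle for the dispersing augmentation of an order-n graph G.

    query_g(u, v) -- adjacency oracle for G on labels 1..n (u != v).
    Only pairs of original vertices trigger a call to query_g.
    """
    n_prime = 1 << (2 * n + 1).bit_length()   # 2^ceil(log2(2n+1))
    half = n_prime // 2                       # number of added isolated vertices
    outside = half + n                        # labels 1..outside are non-clique

    def query(a, b):
        if a == b:
            return False
        a, b = min(a, b), max(a, b)
        if a > outside:
            return True                       # both in the large clique
        if b > outside:
            # Hadamard-like edge, decided from the indices alone
            return bin((a - 1) & (b - outside - 1)).count("1") % 2 == 1
        if a > half:
            return query_g(a - half, b - half)  # both original: one query to G
        return False                          # an added isolated vertex is involved

    return query
```

Only the last-but-one branch touches G, which is what makes the reduction query-preserving.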

Lemma 4.5.8. Π′′ is PTIME-decidable.

Proof. Suppose a graph G′ of order n′ is in Π′′. G′ is a dispersing augmentation of some G in Π′ of order n (satisfying $n' = \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil} + n$, an equation from which n can be calculated efficiently). We can easily tell apart the large clique vertices in G′ from all other vertices, as their degree is at least $2^{\lceil \log_2(2n+1) \rceil} + \lfloor n/2 \rfloor$ while the degree of other vertices is at most $2^{\lceil \log_2(2n+1) \rceil - 1} + n - 1 < 2^{\lceil \log_2(2n+1) \rceil}$. We can also tell apart the isolated vertices added to G′, as G itself has no isolated vertices (and we can ignore the edges to the large clique, which we have set apart). Now, having located the (isomorphic) copy of G in G′, we can ensure in PTIME that it is indeed in Π′, as Π′ itself is PTIME-decidable. It remains to ensure that the bipartite graph between the large clique and the other vertices is Hadamard-like, as in the definition of the augmentation.

This would be perfectly immediate with $2^{\lceil \log_2(2n+1) \rceil}$ vertices both inside and outside the large clique: in that case the bipartite graph’s adjacency matrix is a full square Hadamard matrix, and vertex indices can be permuted according to a permutation of the ⌈log₂(2n + 1)⌉ bits in the index of a vertex, with the permutation resulting again in a Hadamard matrix. In this case, one can verify the adjacency matrix by locating ⌈log₂(2n + 1)⌉ large clique vertices, with the ith refining the partition induced by the previous ones from $2^{i-1}$ into $2^i$ cells of equal size. One then ensures that the vertices outside this ‘basis’ each have a neighborhood which is the exclusive-or of a unique subset of the basis vertex neighborhoods.

In our case, we can also successively locate appropriate large-clique vertices, but

more careful accounting is necessary. We proceed from the most to the least significant

bits — a concept which has meaning in our graph, as one cannot simply permute the

vertices outside the clique (there are not enough of them, and isolated and G vertices

are not interchangeable). Thus one begins by finding a clique vertex disconnected from

all isolated vertices and connected to all G vertices. It is necessarily the first vertex of

the large clique (according to the original labeling).

One can now limit the focus to the isolated vertices, and successively locate vertices

splitting the existing partition cells of the isolated vertices into halves, choosing such

a vertex with the minimum number of neighbors in G. At some point one gets the

same number of non-neighbors and neighbors in G for the splitting large-clique vertex,

and from this point on the order of bits within indices of G and isolated vertices is

immaterial, since these bits may be permuted without requiring that vertices be missing

from G′. The successive choice of large-clique vertices for the ‘basis’ can from that point

on ignore the balance of G neighbors and non-neighbors and proceed as in the simple

case of the full Hadamard matrix. Eventually an appropriate ‘basis’ of ⌈log₂(2n + 1)⌉ vertices is obtained and the other large-clique vertices can be verified to each uniquely correspond to an index in $[2^{\lceil \log_2(2n+1) \rceil}]$. Thus G′ ∈ Π′′ will be accepted by a PTIME computation.

Conversely, if the algorithm sketched out above accepts G′, it has appropriately

partitioned it into large-clique, isolated and G vertices; verified that G is in Π′; and

that the bipartite graph to the large clique is valid. This implies G′ is in Π′′.

Blowup construction and analysis

Definition 4.5.9. For n ∈ N, the Πq pre-blowup order for n, denoted m(n, q) or m for short, is the highest integer up to and including ⌊√q(n)⌋ for which Π′′m is not empty, or 1 if there is no such integer.

Observation 4.5.10. m(n, q) is no lower than $\frac{1}{2}\sqrt{q(n)} - 1$ (comparing $\frac{3}{2}n' + n$ for consecutive values of n). Consequently, $\binom{m(n,q)}{2}$ = Θ(q(n)).

The complexity-q property. Πq = ⋃n∈N Πqn is the property for which Πqn contains all (isomorphic images of) blowups of graphs in Π′′m(n,q) to order n. In other words, a graph in Πqn has m clusters of size either ⌊n/m⌋ or ⌈n/m⌉, with complete bipartite

graphs of edges between these cluster pairs corresponding to edges of a graph in Π′′m(n,q).
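A balanced blowup of the kind described above can be sketched as follows. Which clusters receive the extra vertex when m does not divide n is an arbitrary choice here (the first n mod m clusters); the representation and the name `balanced_blowup` are assumptions for the example.

```python
def balanced_blowup(m, edges, n):
    """Blow up a graph on vertices 0..m-1 to order n >= m.

    Every cluster gets floor(n/m) or ceil(n/m) vertices; two blown-up vertices
    are adjacent exactly when their clusters are adjacent in the base graph.
    Returns (cluster_of, blown_edges) on vertices 0..n-1.
    """
    base, extra = divmod(n, m)
    cluster_of = []
    for c in range(m):
        cluster_of.extend([c] * (base + (1 if c < extra else 0)))
    edge_set = {frozenset(e) for e in edges}
    blown = {
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if frozenset((cluster_of[u], cluster_of[v])) in edge_set
    }
    return cluster_of, blown
```

Vertices inside one cluster are never adjacent, since a base graph has no loops; blowing up a single edge to order 5 therefore yields the complete bipartite graph with sides of size 3 and 2.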

Lemma 4.5.11. If q(n) is computable from n in poly(n) time, then Πq is PTIME-

decidable.

Proof. By Lemma 4.5.4, Π′′ is dispersed; specifically, the neighborhoods of each vertex

in a graph in Π′′ are distinct. This holds after blowup; that is, the neighborhoods of

vertices from different clusters are distinct. Thus, given an order-n graph G, one can

cluster it; ensure that G is a blowup, with m(n, q) clusters, all of size ⌊n/m⌋ or ⌈n/m⌉; reconstruct a pre-blown-up graph G′′ of order m; and determine whether G′′ ∈ Π′′. Since Π′′ is PTIME-decidable (as per Lemma 4.5.8 above), q(n) ≤ $\binom{n}{2}$, and the other

tasks can all be carried out in time polynomial in n, we conclude that the total time

necessary is polynomial in n.
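The clustering step of this proof can be sketched by grouping vertices with identical neighborhoods, which is valid under the assumption (supplied by Lemma 4.5.4) that distinct vertices of the pre-blowup graph have distinct neighborhoods. The function names below are hypothetical, and the final membership check in Π′′ is left out.

```python
def recover_pre_blowup(n, edges):
    """Group vertices 0..n-1 by neighborhood and return (clusters, quotient_edges).

    Vertices with equal neighborhoods are never adjacent to each other, and any
    two groups are either completely joined or completely unjoined, so the
    quotient graph on the groups is well defined.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    groups = {}
    for v in range(n):
        groups.setdefault(frozenset(adj[v]), []).append(v)
    clusters = list(groups.values())
    index = {v: c for c, members in enumerate(clusters) for v in members}
    quotient = {tuple(sorted((index[u], index[v]))) for u, v in edges}
    return clusters, quotient


def is_balanced(clusters, n):
    """Check that all cluster sizes are floor(n/m) or ceil(n/m), m = number of clusters."""
    m = len(clusters)
    return all(len(c) in (n // m, -(-n // m)) for c in clusters)
```

A full decision procedure would additionally compare the number of clusters with m(n, q) and run the decider for Π′′ on the quotient graph.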

4.5.2 A query complexity lower bound for the constructed property

As in the case of generic functions, this lower bound uses a reduction to testing Πq at order n from testing Π′ at a lower order, as we have augmented and blown up graphs from Π′. However, unlike replications of generic functions, blowups (even exactly-balanced ones) do not necessarily preserve the distance between graphs; see [GKNR10] for an example due to Arie Matsliah, of a constant factor decrease in distance when blowing up corresponding graphs from two families. When blowups are not exactly balanced, distance can even be nullified:


Example 4.5.12. Let G1 have one isolated vertex and a 2-path on three vertices, and

let G2 have two isolated vertices and two connected vertices; the distance between G1

and G2 is 1/6 — removing one edge of the 2-path makes G1 into G2. Now consider

a blowup of these order-4 graphs to order 5: In G1 the isolated vertex is replicated,

while in G2, one of the connected vertices is replicated. The (unlabeled) result of both

blowups is a graph consisting of two isolated vertices and a 2-path on three vertices;

thus the distance has dropped from 1/6 to 0.
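Example 4.5.12 is small enough to be checked by exhaustive search. The sketch below computes the minimum number of edge discrepancies between two unlabeled graphs by trying all relabelings; the vertex numbering of the four graphs and the name `edge_discrepancies` are assumptions chosen for the illustration.

```python
from itertools import permutations


def edge_discrepancies(n, e1, e2):
    """Minimum, over all relabelings of the second graph, of the number of
    vertex pairs on which the two graphs (both on vertices 0..n-1) disagree."""
    s1 = {frozenset(e) for e in e1}
    best = None
    for p in permutations(range(n)):
        s2 = {frozenset((p[u], p[v])) for u, v in e2}
        d = len(s1 ^ s2)
        best = d if best is None else min(best, d)
    return best


# Order 4: G1 = isolated vertex 0 plus the path 1-2-3; G2 = isolated 0, 1 plus the edge 2-3.
g1 = [(1, 2), (2, 3)]
g2 = [(2, 3)]

# Order 5: in G1 the isolated vertex is replicated (new vertex 4 is isolated);
# in G2 the connected vertex 3 is replicated (new vertex 4 is adjacent to 2).
g1_blown = [(1, 2), (2, 3)]
g2_blown = [(2, 3), (2, 4)]
```

Dividing the first count by the 6 vertex pairs of an order-4 graph gives the distance 1/6 stated in the example, while the blown-up graphs are isomorphic.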

Fortunately, while the distance may decrease, this change can be bounded when

one of the graphs is dispersed, even for the more problematic case of n not dividing n′,

where the blowup cannot be exactly-balanced:

Lemma 4.5.13. There exists a global constant c4.5.13 > 0, such that for every n, ε, α

and every pair of (unlabeled) graphs (G1, G2) of order n, with G1 being α-dispersed, the

following holds: If G1 and G2 are ε-far from each other, then any pair of (balanced)

blowups of G1 and G2 to order n′, are at a distance of at least c4.5.13 · α · ε from each

other.

Note. In the case of exactly-balanced blowups, an even stronger result of Oleg Pikhurko,

published independently of our work, holds: The distance between the blowups is no

lower than a third of the original distance, regardless of their dispersal ([Pik10, Lemma

14]).

Proof of Lemma 4.5.13. Roughly, we argue that the dispersal feature of G1 makes it

approximately optimal to map complete clusters of one graph to complete clusters of

the other to the extent possible, rather than splitting clusters of the first graph among

several clusters of the other graph.

Let us label the vertices of both graphs, so that we may denote V (G1) = V (G2) = [n]

(this induces a labeling of the blowup clusters). Let G′1 and G′2 denote the respective

blowups of the two graphs. Let t = ⌊n′/n⌋; the clusters in G′1 and G′2 all have either t

or t+ 1 vertices. The (relative) weight of a cluster of s vertices in a graph G, denoted ρ,

is the fraction s/|V (G)|; the relative weight of a pair of clusters is the product of their

weights.

Consider a bijection π′ : V (G′1)→ V (G′2) which minimizes the number of discrepan-

cies; in the context of this proof we will be counting the tuple (ordered pair) discrepancies

of π′ rather than the edge discrepancies.

If the blowups were exactly-balanced (that is, with every cluster having exactly

t = n′/n vertices), and every cluster of G′1 were mapped by π′ to a cluster of G′2 (of the

same size), one could construct a corresponding map π : V (G1)→ V (G2), with $t^{-2}$ times as
many discrepancies as π′; and since G1 and G2 are ε-far, this would imply that π′ has
at least $t^2 \cdot 2\varepsilon\binom{n}{2} \approx 2\varepsilon\binom{nt}{2}$ discrepancies (the distance can’t be preserved exactly, since
the fraction of (v, v) tuples is smaller in larger graphs; if one normalizes distances by
$n^2$, then one gets an equality here).


Now, we refer to a cluster of G2 of weight ρ as pegged if it has more than ρ/2 of
its weight, that is, over a half, in vertices which π′ maps from a single cluster of G1, and as
unpegged otherwise. (For example, a cluster of size 2 is pegged if and only if both
of its vertices are mapped from the same cluster.) Also, let ε′ be such that there are
exactly 2ε′n′² discrepancies under π′.
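The pegged/unpegged distinction depends only on where the vertices of a single cluster come from. A minimal sketch, assuming a cluster is described by the list of source-cluster labels of its vertices under π′ (the function name `is_pegged` and this encoding are illustrative assumptions):

```python
from collections import Counter

def is_pegged(source_clusters):
    """source_clusters[k] is the label of the G'_1 cluster from which pi'
    maps the k-th vertex of one fixed G'_2 cluster.  The cluster is pegged
    if strictly more than half of its vertices share a single source."""
    counts = Counter(source_clusters)
    return max(counts.values()) > len(source_clusters) / 2
```

For a cluster of size 2 this returns `True` exactly when both labels coincide, matching the remark above.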

We first show that the total weight P of unpegged clusters in G2 is bounded as a

function of ε′.

Indeed, consider an unpegged cluster in G′2 with weight ρ. This cluster must have

vertices mapped to it from at least two clusters of G1. Order all of its source clusters

by decreasing number of vertices mapped to the G′2 cluster, breaking ties arbitrarily,

and start taking a union of them from the first on. At some point the union of clusters

contains between ρ/3 and 2ρ/3 of the vertices. Now match arbitrarily as many as

possible of these vertices to vertices from the remaining source clusters. The result is

a set of at least ρn′/3 disjoint pairs of vertices mapped to the unpegged G′2 cluster,

each two coming from different clusters of G′1. Every such pair contributes at least

αtn discrepancies to the total count: The two vertices’ neighborhoods disagree on αn

vertices in G1, and at least α · tn in the blowup G′1 (possibly significantly more). When

mapped to the same cluster in G′2, they must be made to have the same neighborhood;

regardless of which neighborhood this is, for every disagreement, one of the two vertices

must have an edge removed or added.
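The prefix-union step in this argument can be sketched as follows. The input is the list of per-source vertex counts for one unpegged cluster, so no count exceeds half the total; the function name `balanced_prefix` is hypothetical and the code is only an illustration of the counting, not part of the proof.

```python
def balanced_prefix(source_counts):
    """Take sources in decreasing order of contribution until the union holds
    at least a third of the cluster.  For an unpegged cluster (no source has
    more than half), the union then holds at most two thirds, so at least a
    third of the cluster can be paired with vertices from other sources."""
    total = sum(source_counts)
    running = 0
    for count in sorted(source_counts, reverse=True):
        running += count
        if 3 * running >= total:
            # (size of the union, number of disjoint cross-source pairs)
            return running, min(running, total - running)
```

The second returned value is the number of disjoint vertex pairs whose two members come from different source clusters, which is the quantity bounded below by ρn′/3 in the text.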

The set of all unpegged clusters, having total weight P, contributes, therefore,
$\frac{Pn'}{3} \cdot \alpha tn = \frac{\alpha P}{3}\, n'(tn) \ge \frac{\alpha P}{3} \cdot \frac{(n')^2}{2} = \frac{\alpha P}{6}(n')^2$ discrepancies. If P > 12ε′/α, this exceeds
2ε′n′², the total number of discrepancies, which is an impossibility.

For a pegged G′2-cluster, consider the G′1-cluster being the source of the majority of

its vertices. Can such a G′1-cluster be the source for two separate G′2 clusters? Indeed,

it can, for t = 1 — a 2-vertex G′1 cluster pegging two 1-vertex G′2 clusters. However, by

the pigeonhole principle, for each such G′1 cluster, there must exist some 2-vertex G′2
cluster whose two vertices come from different G′1 clusters, i.e. an unpegged G′2 cluster

of the same weight. Thus the total weight of two-pegging G′1 -clusters is no higher than

the weight of unpegged G′2 clusters, 12ε′/α; and the weight of the G′2 clusters they peg

is at most 24ε′/α. Let us refer to these clusters as jointly-pegged and to the rest of the

pegged clusters as singularly-pegged.

Now, consider a bijection π between the vertices of G1 and G2, such that for every

singularly-pegged G′2-cluster i2 and its source G′1-cluster i1, π maps vertex i1 to vertex

i2, i.e. π “agrees with the majority mapping” of π′ for singularly-pegged clusters; the

rest of π is set arbitrarily. This definition is consistent, as the singular pegging ensures

that our definition does not make two constraints on the mapping of a single G1 vertex.

We note that discrepancies under π, of (i, j) with (π(i), π(j)), can be ‘charged’ to

discrepancies under π′, if the G′2 clusters corresponding to π(i) and π(j) are singularly-

pegged: If (π(i), π(j)) is discrepant with its source tuple (i, j), then the majority of

vertices in G′2 cluster π(i) form a discrepant tuple with vertices from the majority in the


cluster π(j) — because their sources under π′ are vertices in clusters i and j respectively.

Now, as G1 and G2 are ε-far, π must have at least $2\varepsilon\binom{n}{2} > \varepsilon n^2$ discrepancies
(counting tuples rather than sets). Thus the total number of vertex pairs in pairs of G′2
clusters, whose corresponding G2 vertex pairs are discrepant under π, is at least $\frac{1}{4}\varepsilon n'^2$
(as for a given π discrepancy, the product of the corresponding cluster weights is at
least $(t/n')^2 > (1/(2n))^2$). Up to three quarters of these pairs ($\frac{3}{16}\varepsilon n'^2$) may have at least
one vertex not originating in the pegging G′1 cluster, and are therefore not known to be
discrepant; also, less than $4 \cdot P n'^2$ of these pairs may involve vertices from unpegged or
jointly-pegged clusters. The remaining pairs must be discrepant under π′ as well. We
thus arrive at an inequality relating ε and ε′:

$$\frac{1}{4}\varepsilon n'^2 - \frac{3}{16}\varepsilon n'^2 - 4 \cdot P n'^2 < 2\varepsilon' n'^2$$

$$\frac{1}{16}\varepsilon - 48\,\frac{\varepsilon'}{\alpha} < 2\varepsilon'$$

$$\varepsilon < 32\varepsilon' + 768\,\frac{\varepsilon'}{\alpha}$$

This implies ε′ > αε/800. The fraction of discrepant edges (rather than tuples) under
π′ is therefore $\varepsilon' n'^2 / \binom{n'}{2} > \varepsilon'/2$ and the claim follows for c4.5.13 = 1/1600.
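The passage from the last displayed inequality to the bound on ε′ relies on α ≤ 1 (a dispersal parameter is a fraction of the vertices), which gives 32ε′ ≤ 32ε′/α. Spelled out, as a worked step rather than a quotation from the original proof:

```latex
\varepsilon \;<\; 32\,\varepsilon' + \frac{768\,\varepsilon'}{\alpha}
  \;\le\; \frac{32\,\varepsilon'}{\alpha} + \frac{768\,\varepsilon'}{\alpha}
  \;=\; \frac{800\,\varepsilon'}{\alpha}
\quad\Longrightarrow\quad
\varepsilon' \;>\; \frac{\alpha\,\varepsilon}{800}.
```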

Lemma 4.5.14. Testing Π′′ is (f, 1, h′)-reducible to testing Πq, for f(ε) = (c4.5.13/8) · ε
and $h'(n) = \min\{\, i \in \mathbb{N} \mid \lfloor\sqrt{q(i)}\rfloor = n \,\}$ (at orders n for which h′(n) is defined; in which
case n = m(h′(n), q)).

Proof. Even for orders n for which h′(n) is defined, we only consider those orders for

which Π′′n is non-empty (as otherwise, a trivial test for Π′′ will simply reject).

Given a graph G of appropriate order n, we apply the blowup to order h′(n) as in
the construction of Πq, obtaining a graph G′. By our construction, if G satisfies Π′′, G′
satisfies Πq. In the other direction, consider a graph G which is ε-far from Π′′. Since G
is ε-far from every individual graph in Π′′, and by Lemma 4.5.4, it is also 1/8-dispersed,
we may apply Lemma 4.5.13, and conclude that the blowup G′ of G is (c4.5.13/8) · ε-far
from the blowup of every graph in Π′′, that is, far from every graph in $\Pi^q_{h'(n)}$, and
hence this far from Πq as a property.

Also, given oracle access to G, one can easily simulate an oracle to G′, using at
most one query to G for the answer to any query made to G′.
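The oracle simulation can be sketched as follows. This is an illustration under stated assumptions: blowup vertices are labeled 0, …, n′−1 with clusters occupying contiguous ranges, the first n′ mod n clusters receive the extra vertex, and vertices within a cluster are non-adjacent. The names `make_blowup_oracle` and `cluster_of` are hypothetical.

```python
def make_blowup_oracle(query_g, n, n_prime):
    """Wrap an edge oracle for G (order n) into an edge oracle for a balanced
    blowup G' of order n_prime >= n, using at most one query to G per query."""
    t, extra = divmod(n_prime, n)   # clusters have t or t + 1 vertices
    boundary = extra * (t + 1)      # assumed: larger clusters come first

    def cluster_of(v):
        if v < boundary:
            return v // (t + 1)
        return extra + (v - boundary) // t

    def query_blowup(u, v):
        cu, cv = cluster_of(u), cluster_of(v)
        if cu == cv:
            return False            # same cluster: answered without querying G
        return query_g(cu, cv)      # otherwise exactly one query to G

    return query_blowup

# Usage: G is the path 0-1-2, blown up to order 7; `calls` records queries to G.
path_edges = {(0, 1), (1, 2)}
calls = []

def query_path(u, v):
    calls.append((u, v))
    return (min(u, v), max(u, v)) in path_edges

oracle = make_blowup_oracle(query_path, n=3, n_prime=7)
```

Any other fixed assignment of blowup vertices to clusters would do equally well, provided it can be computed without querying G.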

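Lemma 4.5.15. Testing Π′ is (f(ε), 1, h(n))-reducible to testing Πq, for

$$h(n) = h'(h''(n)) = \min\left\{\, i \in \mathbb{N} \;\middle|\; \left\lfloor\sqrt{q(i)}\right\rfloor = \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil} + n \,\right\}$$

$$f(\varepsilon) = c_{4.5.13}/2000 \cdot \varepsilon$$

(at orders n for which h′(n) is defined).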


Proof. The reduction is a combination of the reductions from testing Π′ to testing Π′′
(as per Lemma 4.5.7), and from testing Π′′ to testing Π (as per Lemma 4.5.14).

We can now prove the lower bound, setting ε4.5 = c4.5.13/2000 · ε4.1:

Lemma 4.5.16. For ε ≤ ε4.5, any ε-test for Πq makes Ω(q(n)) queries.

Proof. By Lemma 4.5.15 above, testing Π′ is (f, 1, h)-reducible to testing Πq, with
f(ε) = c4.5.13/2000 · ε and $h(n) = \min\{\, i \in \mathbb{N} \mid \lfloor\sqrt{q(i)}\rfloor = \frac{3}{2} \cdot 2^{\lceil \log_2(2n+1) \rceil} + n \,\}$. h(n)
has an infinite image, and f(ε) is continuous and its image contains the interval (0, c4.5.13/2000).
Also, the lower bound for testing Π′ with ε ≤ ε4.1 is q′(n) = Ω(n²) queries. Finally,
q(n) has an infinite range, thus so does h(n). We now apply Lemma 2.4.2, and obtain
a lower bound of $\Omega\left((h^{-1}(n))^2\right)$ on the number of queries required to test Πq with
ε ≤ f(ε4.1) = ε4.5. The proof is completed by noting that when h⁻¹(n) is defined, its
value is $\Theta(\sqrt{q(n)})$ (see also Observation 4.5.10).

4.5.3 A test for the constructed property

An adaptive, two-sided error test for Πq is listed as Algorithm 4.3. For clarity of analysis,

the test makes the assumption that n/m is an integer, in which case the graphs in Π

are exactly-balanced blowups, with no need to account for the small difference in cluster

sizes; we later argue that this assumption can be foregone.

Algorithm 4.3 A test for Πq

1: ε′ ← ε/5, m ← ⌊√q(n)⌋.

Phase I: Clustering and representative vertex selection.
2: Ssig ← uniform sample of Θ(log(m)) signature vertices.
3: Scsize ← uniform sample of s′ = Θ(m log(m)/ε′²) cluster size estimation vertices.
4: Query all edges between Scsize and Ssig.
5: Cluster the vertices of Scsize using the known part of their neighborhoods.
6: If the number of clusters is not exactly m, reject.
7: If any cluster has size outside the range (1 ± ε′)s′/m vertices, reject.
8: R ← An arbitrary selection of one representative vertex in Scsize from each cluster.

Phase II: Representative validation.
9: for Θ(1/ε) times: do
10: Uniformly select a pair of vertices u, v.
11: Cluster u and v using their neighborhoods in Ssig.
12: If u or v are in none of the m existing clusters, reject.
13: Let ru, rv ∈ R denote the representative vertices of the two vertices’ clusters.
14: If {u, v} ∈ E and {ru, rv} ∉ E, or vice-versa, reject.
15: end for

Phase III: Checking the pre-blown-up graph.
16: Query the order-m subgraph induced by R.
17: Decide whether the induced subgraph is in Π′, and answer accordingly.
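Phase I of the test (steps 4 to 8) can be sketched as below. This is an illustration only: the samples are passed in explicitly rather than drawn at random, the hidden constants of the Θ(·) sample sizes are not modeled, and the names `phase_one` and `blown_path_edge` are hypothetical.

```python
from collections import defaultdict

def phase_one(has_edge, sample, signature, m, eps_prime):
    """Cluster `sample` by adjacency pattern towards `signature`; return one
    representative per cluster, or None where the test would reject."""
    clusters = defaultdict(list)
    for v in sample:
        key = tuple(has_edge(v, s) for s in signature)  # signature of v
        clusters[key].append(v)
    if len(clusters) != m:                              # step 6
        return None
    expected = len(sample) / m
    for members in clusters.values():                   # step 7
        if not (1 - eps_prime) * expected <= len(members) <= (1 + eps_prime) * expected:
            return None
    return [members[0] for members in clusters.values()]  # step 8

# Usage: exactly-balanced blowup (cluster size 2) of the path 0-1-2-3.
def blown_path_edge(u, v):
    return abs(u // 2 - v // 2) == 1

representatives = phase_one(blown_path_edge, range(8), [0, 2, 6], m=4, eps_prime=0.2)
```

In the usage example the three signature vertices happen to separate all four clusters; in the actual test this separation is only guaranteed with high probability, via the dispersal of the base graph.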


Lemma 4.5.17. Algorithm 4.3 is a valid test for Πq , making O(q(n)) queries. Fur-

thermore, if q(n) is computable in poly(q(n)) time, then the test’s running time is

polynomial in its number of queries.

Proof. The number of queries made by Algorithm 4.3 is dominated by the queries of
edges between Ssig and Scsize, and by the querying of the m-vertex subgraph in the
final step. The number of queries made there is $O(\varepsilon^{-2} \cdot m \log^2(m)) + \binom{m}{2}$; as we are
interested in the dependence on m, this is Θ(m²) = Θ(q(n)).

As for running time, the potentially time-consuming parts of the test are computing

q(n) and deciding whether the order-m subgraph induced by the representatives is in

Π′; if the former task can be carried out in poly(q(n)) time, then by Lemma 4.5.11 the

latter task will require poly(√q(n)) time, which is specifically poly(q(n)), so this part

of the claim holds.

We now turn to the test’s completeness. Let G ∈ Πn be a blowup of G′ ∈ Π′′m. By
Lemma 4.5.4, Π′′ is 1/8-dispersed; thus for every pair of (different) vertices u, v ∈ G′, a
uniformly-sampled vertex is located in only one of their neighborhoods, with probability
at least 1/8. For a uniform sample of Θ(log(m)) vertices, the probability that the
neighborhoods of u and v have the same intersection with all sampled vertices is
less than $\frac{1}{6}\binom{m}{2}^{-1}$. The same is true when u, v are vertices of G, from different clusters,
and the signature vertices are sampled from G rather than from G′. Union-bounding over
all $\binom{m}{2}$ pairs of G′ vertices, we find that with probability at least 1 − 1/6, the signature
vertices induce the m clusters in G corresponding to the vertices of G′, each of size n/m.
Also, the probability that the fraction of sampled validation vertices from a certain
cluster is outside the range (1 ± ε′) · s′/m is less than $\exp(-\Omega(\varepsilon'^2 s'/m)) < 1/(6m)$ (using
a large deviation bound for sums of low-probability indicators; see, e.g. [ASE92, Theorem
A.11]). Thus with probability at least 1 − 1/6, Scsize contains (1 ± ε′) · s′/m vertices
from each cluster. Assuming all of the above occurs, Phase I does not reject; Phase II
cannot reject since u and ru, and v and rv, have the same neighborhoods respectively;
and in Phase III the test correctly reconstructs the pre-blowup graph G′ (regardless of
which representatives were chosen) and accepts, as G′ is in Π′′. Thus the probability of
accepting G is at least 2/3.
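To make the Θ(log(m)) signature sample size concrete, the following sketch computes the smallest number k of samples for which (7/8)^k falls below (1/6)·C(m, 2)⁻¹. It assumes signature vertices are drawn independently (with replacement), and the function name `signature_sample_size` is hypothetical; the thesis does not fix the hidden constant.

```python
import math

def signature_sample_size(m, dispersal=1/8, failure=1/6):
    """Smallest k with (1 - dispersal)**k < failure / C(m, 2), for m >= 2:
    enough independent samples to separate every pair of base vertices,
    after a union bound over all C(m, 2) pairs."""
    target = failure / math.comb(m, 2)
    exact = math.log(target) / math.log(1 - dispersal)
    return math.floor(exact) + 1
```

Since log C(m, 2) < 2 log m, the returned value grows as O(log m), in line with step 2 of Algorithm 4.3.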

It remains to establish the soundness of the test. Let us suppose that an input graph

G is accepted with high constant probability (e.g. 1/3), and show that it cannot be

ε-far from Π.

The signature set Ssig is said to be a good signature if the clustering it induces has

at least m clusters each of size (1± 2ε′)n/m (and a bad signature otherwise). We first

show that the high probability of acceptance implies that G has a good signature, which

is sampled as Ssig. Assume to the contrary that Ssig is bad.

If Ssig induces fewer than m clusters, then the test must reject in Phase I, so we
assume at least m clusters are induced. Suppose some m vertices of Scsize originate in m
of these induced clusters. Now consider the distribution of the s′−m remaining vertices


of Scsize. These must contain between (1 − ε′)s′/m − 1 and (1 + ε′)s′/m − 1 vertices from

each of these induced clusters, for Phase I not to reject G. Specifically, the remainder

of Scsize must contain this number of vertices from some Ssig-induced cluster C, whose

size in G is not in the range (1± 2ε′)n/m. Now consider the n−m remaining vertices

of G, from which the remainder of Scsize is sampled; the fraction of C vertices among

these is outside the range (1± 1.5ε′)(n−m)/m (due to our implicit assumption that

n ≥ s′; see Definition 2.1.4.) The probability of the remainder of Scsize containing the

necessary fraction of vertices from C is therefore $\exp(-\Omega((0.5\varepsilon')^2 s'/m)) < 1/3$. Thus

under our assumption that the signature is bad, Phase I rejects G with probability at

least 2/3 — contradicting our assumption that G is accepted with high probability. It

must therefore be the case that G has good signatures, and one of these is sampled as

Ssig.

Now, a representative set R = {r1, . . . , rm} is said to be well-representing (with
respect to a signature Ssig) if all of the following hold:

(i) Each ri is in a different cluster induced by Ssig.

(ii) All but an ε′-fraction of the vertex pairs {u, v} of G are such that both u and v are in
represented clusters, and are consistent with their representatives with respect to
E (that is, {u, v} ∈ E iff {ru, rv} ∈ E).

(iii) The subgraph of G induced by R is in Π′′m.

If the test does not reject by the end of Phase I, it must have found m clusters

induced by Ssig, so the set R it obtains obeys requirement (i). If R fails to obey (ii), it

will be rejected with probability greater than 2/3 at Phase II, due to an unrepresented

vertex or an inconsistent pair of vertices; if R fails to obey (iii), it will be rejected at

Phase III, deterministically. Thus, under our assumption that the test accepts, there

must exist some well-representing set R, with respect to the good signature set, which

the test obtains.

Fixing a good signature set Ssig and a well-representing set R, let C1, . . . , Cm
denote the set of m clusters induced by Ssig, and let Vnc denote the set of vertices not

belonging to any of the m clusters. One can redistribute the excess vertices in each Ci,

and the vertices of Vnc among the Ci’s, so that each Ci becomes of size n/m exactly

(at most 2ε′n/m additions or removals in each cluster). One then needs to modify

the edges incident on redistributed vertices to match the subgraph induced by R; this

requires at most $5\varepsilon'\binom{n}{2}$ changes: Up to $4\varepsilon'\binom{n}{2}$ vertex pairs whose endpoints have been
reassigned to a different cluster, plus up to $\varepsilon'\binom{n}{2}$ pairs which had been in disagreement
with their representative pair with respect to E. This results in a graph satisfying Π,
and as 5ε′ = ε, we conclude that G is indeed ε-close to Π under our assumption.

The claim follows.

Note. In the above, the large deviation bounds are applied as though the vertices

sampled are independent, while when a set of vertices is sampled without replacement,

this is not the case. However, large deviation bounds do apply to samples without


repetition from a finite set (in fact, even slightly more tightly than to independent

samples). This is established in [Hoe63], in the discussion preceding and following

Theorem 4 there. That theorem implies that the same or similar bounds established

elsewhere (e.g. [ASE92, Appendix A]), through examining the expectation of the

exponent of the sum of independent bounded variables, also apply to samples without

repetition.

Observation 4.5.18. The test in Algorithm 4.3 can be generalized to the case in
which n/m is not an integer. The modification required is to allow for any n mod m
of the clusters to have desired size ⌈n/m⌉ while the others have ⌊n/m⌋ (or actually
ensure that the validation vertices’ intersections with the clusters are of relative sizes
between (1 − ε′)⌊n/m⌋/n and (1 + ε′)⌈n/m⌉/n). In the analysis of the test, the “well-
representing” sets will be respective of specific choices of n mod m larger clusters.

Theorem 4.5 is now proven by combining the lower bound of Lemma 4.5.16 and

the upper bound established through the valid test in Lemma 4.5.17, together with

Lemma 4.5.11 regarding PTIME-decidability (while taking Observation 4.5.18 into

account).

4.6 A hierarchy of monotone properties

This section continues Section 4.5, with a second hierarchy theorem for dense graph

properties. In this section, instead of focusing on the PTIME testability, the additional

feature we ensure for properties of arbitrary query complexity is upwards monotonicity.

The direction of monotonicity is inconsequential, as one notes that an identical result

holds for downwards monotone properties by considering the complements of graphs in

the upwards-monotone property; we hereon in this section refer to upwards-monotone

properties as simply ‘monotone’. Unlike the first and third hierarchy results, the con-

struction here does not utilize the PTIME-decidable hard-to-test property constructed

in Section 4.2, but rather the hard property of [GGR98, Proposition 10.2.3.1], which is

generally very hard to decide deterministically, but whose simpler construction better

allows us to place other relevant constraints on its constituent graphs.

Theorem 4.6. There exists a constant ε4.6 > 0, such that for every reasonable q(·)
(in the sense of Definition 4.5.1), there exists an (upwards) monotone property of
dense graphs that is testable with two-sided error using $O(q(n)\,\varepsilon^{-4}\log^2(\varepsilon^{-1}))$ queries (or
O(q(n)) if one ignores the dependence on ε), but is not ε-testable using o(q(n)) queries,
even allowing two-sided error, for ε ≤ ε4.6.

4.6.1 Property construction

Our construction of a property which is both monotone and hard-to-test will effectively

involve the taking of what is at the same time a blowup and a monotone closure of


another property. This operation must maintain not only an Ω(q) lower bound on

testing, but also the upper bound, the possibility of testing with O(q) queries. This is

a challenge, as the possible addition of edges can ‘drown out’ much of the structure

of the graph. We shall overcome this difficulty with a combination of two measures:

The first is that if “too many” edges have been added relative to the original graph,

then we will allow ourselves to always accept, thus limiting the hardness to graphs with

average degree in a certain range; the second measure is constraining the graphs to have

additional structural features which are robust enough, so that few edge additions do

not disrupt them overmuch. This second measure is achieved through the choice of our

initial, hard-to-test property.

Revisiting the hard property of Goldreich, Goldwasser and Ron

We wish to begin our construction with a hard property satisfying several additional

constraints:

Lemma 4.6.1. There exists a (not generally monotone) graph property $\Pi' = \bigcup_{n\in\mathbb{N}} \Pi'_n$
with the following features. First, a probabilistic oracle machine making o(n²) queries
can only distinguish with probability o(1) between a uniformly-sampled graph from Π′n
and a graph sampled from distribution G(n, 1/2). Also, for every δ and sufficiently large
n (as a function of δ), every graph G ∈ Π′n satisfies:

1. Every vertex in G has degree (1/2 ± δ)n.

2. For every pair of vertices in G, the union of their neighborhoods contains (3/4 ± δ)n
vertices.

Also, every two graphs G1, G2 ∈ Π′n satisfy:

3. If G1, G2 are non-isomorphic, then they are 0.4-far from each other.

4. If G1, G2 are isomorphic, but their isomorphism fixes less than 0.9n of the vertices,
then they differ on at least $0.01\binom{n}{2}$ of their edges. In other words, and letting
[n] denote the graphs’ vertex set, if the isomorphism π : [n] → [n] is such that
|{i ∈ [n] | π(i) ≠ i}| > 0.1n, then the identity bijection between G1 and G2 induces
at least $0.01\binom{n}{2}$ edge discrepancies.

Finally, in addition to the above, an n-vertex graph, sampled from the G(n, 1/2) distribution
(i.e. each vertex pair being an edge with probability 1/2, independently of the
others), is 0.4-far from Π′ with probability 1 − o(1).

Proof. Let Π′ be the property constructed in the proof of [GGR98, Proposition 10.2.3.1],
with two slight modifications. Π′ is obtained there by sampling $K = 2^{\frac{1}{10}\binom{n}{2}}$ (labeled)
graphs using the G(n, 1/2) distribution, and closing the resulting set to isomorphism
by taking all isomorphic images of the sampled labeled graphs. Our first modification
will be setting K differently, to $2^{\frac{1}{1000}\binom{n}{2}}$; the construction remains the same with the
alternate K, except that the query complexity lower bound for distinguishing between


Π′ and a random graph with probability 1/2 drops from some c · n² to some c′ · n² for
some global constants 0 < c′ < c; see the proofs of [GGR98, Proposition 10.2.3.1] and
[GGR98, Proposition 4.1.1] for details. We also note that if c′n² queries are required to
distinguish between the distributions with probability 1/2, then with o(n²) queries, one
can only distinguish between them with probability o(1).

We now turn to the degree constraints and our second modification of the property.

The probability of a single graph out of the K failing to satisfy either of the first
two degree constraints is at most O(n²) · exp(−Ω(δn)) (using standard large-deviation
bounds, and for n sufficiently large to ignore the neighborhood size being n − 1 rather
than n). We introduce a second modification to the construction, which is the removal
of these unsatisfactory graphs (and their isomorphic images) from Π′; few enough graphs
are removed so that the argument regarding distinguishing graphs sampled from Π′ and
from G(n, 1/2) still holds; and the density of Π′ remains very close to the original.

With probability 1 − o(1), Π′ (as a set of labeled graphs) has close to n!K constituent
graphs at order n. By a union bound, the probability of a G(n, 1/2) graph being 0.4-close
to it is therefore less than this number times the probability of a G(n, 1/2) graph being
close to a specific (labeled) graph. This latter probability is equal to the probability of
a graph having at most $0.4\binom{n}{2}$ edges (considering these edges as changed edges from
the original graph), which is less than $\exp(-0.02\binom{n}{2})$; it is therefore the case that a
G(n, 1/2) graph is 0.4-far from Π′ with probability 1 − o(1).

A similar argument can be used to establish the third constraint: Fixing graphs

G1, . . . , Gs−1 in Π′, the graph Gs sampled into Π′ (before any removal of graphs) is
merely a sample from G(n, 1/2). The probability of this sample being 0.4-close to any of
the previous s − 1 graphs sampled into Π′ or their isomorphic images is o(1/K) (see

the similar argument in Lemma 4.2.5); union-bounding over all K samples, we conclude

that the third constraint is indeed met with probability 1− o(1).

It remains to establish the fourth constraint in the statement of the lemma. Consider

an arbitrary permutation π over [n], fixing less than 0.9n vertices. We wish to show that

a large enough subset E′ of the pairs in G1 satisfies E′ ∩ π(E′) = ∅; if this is the case,

we can use the fact that G1 is sampled from G(n, 1/2) and conclude that the number of
discrepancies of E′ by an identity bijection between G1 and G2 is close to $\frac{1}{2}|E'|$ with
high probability.

Indeed, let U = {i ∈ [n] | π(i) ≠ i} be the set of unfixed elements, with |U| = αn.
Let I ⊆ U be a subset of them such that |I| = ⌊αn/3⌋ and π(I) ∩ I = ∅. Such
a set exists, as a greedy algorithm can construct it by repeatedly adding another
unfixed element i ∈ U, and marking π(i) as unacceptable for addition. We now set
E′ = {{u, v} | u ∈ I ∧ v ∈ V \ (I ∪ π(I))}. These edges have no endpoint in π(I), and
are mapped by π to pairs with one endpoint in π(I), so that E′ ∩ π(E′) = ∅ as desired.

Thus under the identity bijection between G1 and G2, every pair in E′ is mapped to a

pair out of E′. As the edges of G1 are chosen to exist independently of each other and

with probability 1/2, the expected number of discrepancies of the identity bijection is


therefore at least

$$\frac{1}{2}\left|E'\right| = \frac{1}{2} \cdot |I| \cdot (|V| - 2|I|) = \frac{1}{2} \cdot \left\lfloor \frac{1}{3}\alpha n \right\rfloor \cdot \left(n - 2\left\lfloor \frac{1}{3}\alpha n \right\rfloor\right) = \frac{1}{18}\alpha n^2 \cdot (3 - 2\alpha) - O(n)$$

In the range 0.1 ≤ α ≤ 1, the minimum of the first term is achieved at 0.1 and the
expression is $\frac{1}{18} \cdot 0.1 n^2 \cdot (3 - 0.2) - O(n) > 0.015\binom{n}{2}$ for sufficiently large n. Now, the
probability that one specific G1 and one specific isomorphic image G2 = π(G1) have
less than $0.01\binom{n}{2}$ discrepancies in E′ is at most $\exp(-2 \cdot 0.005\binom{n}{2})$. Union-bounding
over all K initial graphs in Π′ and all their permutations fixing less than 0.9n of the
vertices, we conclude that with probability 1 − o(1), all such pairs indeed have at least
$0.01\binom{n}{2}$ discrepancies.
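The greedy construction of the set I used in this argument can be sketched as follows. One detail is made explicit here as an assumption of the sketch: when an element i is added, both π(i) and π⁻¹(i) are blocked, so each addition rules out at most three unfixed elements and the result has at least a third of them. The function name `greedy_disjoint_set` is hypothetical.

```python
def greedy_disjoint_set(pi):
    """pi is a permutation of range(len(pi)), given as a list of images.
    Returns a list I of unfixed elements with pi(I) disjoint from I and
    len(I) >= (number of unfixed elements) / 3."""
    inverse = [0] * len(pi)
    for i, image in enumerate(pi):
        inverse[image] = i
    chosen, blocked = [], set()
    for i in range(len(pi)):
        if pi[i] == i or i in blocked:
            continue                        # fixed point, or ruled out earlier
        chosen.append(i)
        blocked.update((i, pi[i], inverse[i]))
    return chosen

# Usage: a 3-cycle on {0, 1, 2}, a fixed point 3, and a transposition of 4 and 5.
pi = [1, 2, 0, 3, 5, 4]
I = greedy_disjoint_set(pi)
image_of_I = {pi[i] for i in I}
others = set(range(len(pi))) - set(I) - image_of_I
E_prime = {frozenset((u, v)) for u in I for v in others}
mapped_E_prime = {frozenset(pi[x] for x in pair) for pair in E_prime}
```

Truncating the returned list to ⌊αn/3⌋ elements yields a set of exactly the size used in the proof, and preserves the disjointness property.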

By another union bound using the arguments above, all constraints hold simultane-

ously with probability 1− o(1).

Note. Regarding the deterministic computational complexity of Π′, it may not even

be deterministically computable, due to the random sampling. We could replace, in

the construction of Π′n, the sampling with an exhaustive search of the first set (by a

lexicographic order) of graphs which satisfies the requirements and is of appropriate size;

this would ensure computability. Also, it may be possible to devise a construction based

on a small NPTIME-decidable sample space, as in [GGR98, Proposition 10.2.3.2]; but

we do not explore this possibility in this work.

Property construction via approximate monotone blowups

Definition 4.6.2. Let G = (V,E) be a graph of order n and G′ = (V ′, E′) a graph of

order n′. G′ is said to be a β-threshold approximate monotone blowup of G if V ′ can

be partitioned into |V | + 1 clusters of vertices, as follows: The last cluster contains

n′ mod n vertices with full degree n′−1; the rest of the clusters are all of size t = ⌊n′/n⌋,
and each corresponds to a vertex v ∈ V ; for every u′, v′ ∈ V ′ in clusters corresponding
to u, v ∈ V , such that {u, v} ∈ E, either {u′, v′} ∈ E′, or at least one of u′ and v′ is

a heavy vertex, having degree at least n′ mod n + β(tn − 1) (i.e. at least β(tn − 1)

neighbors within the first |V | clusters).

Note. The condition on edges in E and E′ ensures that, ignoring high-degree vertices,

clusters in the blowup are in monotone agreement with vertices in the original graph:

Pairs of clusters in G′ corresponding to connected pairs in G have a complete bipartite

graph between them; and the bipartite graph between clusters corresponding to a

disconnected pair may, or may not, be empty. As we will see below, however, our

concern will be with graphs whose overall number of edges is not too high, so that these

bipartite graphs cannot ‘fill out’ overmuch.

Let us now fix some parameters, so as to be able to construct a specific Πq . As our

construction utilizes Lemma 4.6.1, it depends on the value of δ for which we apply that


lemma. We fix $\Delta = 10^{-11} c_{4.5.13}$. This value, the ‘leeway’ for ‘informative’ vertex degrees
in graphs of our monotone properties, is set low enough to meet certain constraints
which come up in the analysis of the construction, and in the proof of the validity of
a test in Subsection 4.6.3 further below.

Now let us fix q(·) for the rest of this section. We assume that, for sufficiently large
n, $q(n) > q_0 = 100\Delta^{-4}\log^4(\Delta^{-1})$. This (non-optimized) bound is necessary for some of
our arguments below, as the orders of graphs with which we will be concerned depend
on q(n). There is no loss of generality in this assumption regarding q: Recall that q is
required to satisfy $q(n) \le \binom{n}{2}$, and for its image to be infinite. Thus if our additional
assumption does not hold, we replace q with $q'(n) = \min\{\binom{n}{2}, \max\{q(n), q_0\}\}$. This is still

a valid function with respect to the statement of Theorem 4.6, and when plugged in

there it yields the same result, albeit with a different threshold distance ε4.6 for hardness.
Note that q0 does not depend on ε.

A complexity-q property. Let $m(n, q) = \lfloor \sqrt{q(n)} \rfloor$. We set $\Pi^q = \bigcup_{n \in \mathbb{N}} \Pi^q_n$, with $\Pi^q_n$ containing all graphs $G = (V, E)$ satisfying at least one of the following two conditions:

(C1) The graph has at least $(0.5 + 2\Delta) \cdot \binom{n}{2}$ edges.

(C2) Each vertex in G has degree at least $(0.5 - \Delta)n$ and G is a 0.52-threshold approximate monotone blowup of a graph in $\Pi'_{m(n,q)}$.
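The counting parts of this definition are easy to state as code. The sketch below assumes a graph given as a list of neighbor sets (a representation chosen here for illustration only) and checks condition (C1) and the minimum-degree half of (C2). It does not attempt the blowup half of (C2), which depends on the specific property Π′.

```python
import math


def m_of(q_n: int) -> int:
    """m(n, q) = floor(sqrt(q(n))), computed exactly on integers."""
    return math.isqrt(q_n)


def satisfies_c1(adj: list, Delta: float) -> bool:
    """(C1): the graph has at least (0.5 + 2*Delta) * C(n,2) edges."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj) // 2  # each edge is counted from both endpoints
    return edges >= (0.5 + 2 * Delta) * math.comb(n, 2)


def satisfies_c2_degrees(adj: list, Delta: float) -> bool:
    """Degree half of (C2): every vertex has degree at least (0.5 - Delta) * n."""
    n = len(adj)
    return all(len(nb) >= (0.5 - Delta) * n for nb in adj)
```

Both checks only become more likely to pass when edges are added, which matches the monotonicity noted in Observation 4.6.3.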

Observation 4.6.3. Πq is monotone (as each of the two conditions is itself monotone).

4.6.2 A query complexity lower bound for the constructed property

The hard property Π′ we use as the base of our construction is proven to be hard, in

[GGR98, Proposition 10.2.3.1], using Yao’s method, with the far distribution consisting of $G(n, \frac{1}{2})$ graphs. As graphs in $\Pi^q$ are constructed by transforming graphs in Π′,

our lower bound will use distributions of transformed graphs, in a similar manner to

Section 4.2:

Rn: An exactly-balanced blowup of a graph sampled from distribution $G(m(n, q), \frac{1}{2})$, to order n − (n mod m), to which are added n mod m additional vertices of full degree.

Gn: An exactly-balanced blowup of a graph sampled uniformly from Π′m, to order

n− (n mod m), to which are added n mod m additional vertices of full degree.

(Recall that these distributions are only defined for n sufficiently large so that the

constraints on ∆ and q may be satisfied, with Lemma 4.6.1 holding.)
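A sample from Rn can be sketched as follows. The sketch assumes the usual reading of an exactly-balanced blowup: every base vertex becomes a cluster of equal size, adjacent base vertices yield a complete bipartite graph between their clusters, and there are no other edges. The function names and the adjacency-set representation are hypothetical. Sampling from Gn would differ only in drawing the base graph from Π′m, which is not reproduced here.

```python
import random


def sample_half_density_graph(m: int, rng: random.Random) -> list:
    """Sample G(m, 1/2): each pair is an edge independently with probability 1/2."""
    adj = [set() for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def augment(base_adj: list, n: int) -> list:
    """Blow the order-m base graph up to order n - (n mod m), then add n mod m full-degree vertices."""
    m = len(base_adj)
    size = n // m                 # vertices per cluster
    n_prime = size * m            # equals n - (n mod m)
    adj = [set() for _ in range(n)]
    for u in range(n_prime):
        for v in range(u + 1, n_prime):
            # u and v lie in clusters u // size and v // size
            if (v // size) in base_adj[u // size]:
                adj[u].add(v)
                adj[v].add(u)
    for w in range(n_prime, n):   # the n mod m extra vertices
        for v in range(n):
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return adj


def sample_R(n: int, m: int, rng: random.Random) -> list:
    """One sample from R_n, under the assumptions stated above."""
    return augment(sample_half_density_graph(m, rng), n)
```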

Lemma 4.6.4. The graphs of Gn are all in Πqn.

Proof. A graph of Gn is an exactly-balanced blowup of a $\Pi'_m$ graph, with additional full-degree vertices, which constitutes an approximate monotone blowup regardless of the threshold value: The n mod m full-degree vertices count as a separate cluster, and the rest of the graph is in monotone agreement with the $\Pi'_m$ graph. Also, the minimum degree of a $\Pi'_m$ graph is at least (0.5−∆)m; the minimum degree of the exactly-balanced blowup

is therefore at least (0.5−∆)(n− (n mod m)), and adding full-degree vertices makes

the minimum degree no less than (0.5−∆)(n− (n mod m)) + (n mod m) ≥ (0.5−∆)n.

This meets condition (C2).

Lemma 4.6.5. With probability 1 − o(1), a graph sampled from Rn is $(0.08 \cdot c_{4.5.13})$-far

from the support of Gn.

Proof. By Lemma 4.6.1, a graph G sampled from distribution $G(m, \frac{1}{2})$ is 0.4-far from $\Pi'_m$ with probability 1 − o(1). Also, with probability 1 − o(1) G is 0.4-dispersed. When

both these events occur, Lemma 4.5.13 guarantees that any exactly-balanced blowup of

G is $(c_{4.5.13} \cdot 0.4 \cdot 0.4)$-far from all exactly-balanced blowups of graphs in $\Pi'_m$. Finally,
adding n mod m full-degree vertices to all exactly-balanced blowups both of G
and of a graph in $\Pi'_m$ can reduce the distance between them by a factor of no more than
$n/(n - (n \bmod m)) < 2$ (by Lemma 4.5.5). Thus after applying the entire transformation
of the definition of Rn to G, we have a graph sampled from distribution Rn, which
with probability 1 − o(1) is $(0.08 \cdot c_{4.5.13})$-far from the transformed $\Pi'_m$ graphs;

the proof is completed, as these transformed graphs are the support of Gn.

Lemma 4.6.6. Let δ ≤ ∆/4, let G′ be a graph of order n′ = n − (n mod m) with

maximum degree at most (0.5 + δ)n′, and let G be the result of adding n mod m vertices

to G′ with full degree n − 1. If G is δ-close to Πqn, then it is (63δ + ∆)-close to the

support of the Gn distribution.

Proof. We first consider values of n which are multiples of m, in which case there are

no full-degree vertices added to blowups in the construction of Πqn and no full-degree

vertices in G. Also, for this case we only assume δ ≤ ∆.

Let H ∈ Πq be the satisfying graph closest to G. The number of edges in H is less

than $(0.5 + \delta)\binom{n}{2} + \delta\binom{n}{2} = (0.5 + 2\delta) \cdot \binom{n}{2}$ (edges in G plus an upper bound on edges

added); H must therefore satisfy condition (C2) rather than (C1) in the definition of

Πq . Let H ′ ∈ Π′m be the graph of which H is an approximate monotone blowup, and

let H ′′ be an exactly-balanced blowup of H ′ to order n. We show that H ′′ — which is

in Πq , and particularly in the support of Gn — is close to G.

Denote by Vheavy the set of heavy vertices in H, that is, the vertices which have

degree greater than 0.52n (note that since n is a multiple of m, these do not include

any full-degree vertices added to the monotone blowup). Also, for ease of notation we

assume V (G) = V (H) = V (H ′′) = V .

The discrepancies between corresponding edges of H and H ′′ can be attributed

to one of two causes: Having a heavy vertex (in Vheavy) for an endpoint; or the edge

being in a bipartite graph between clusters of H, whose corresponding H ′ vertices are

disconnected (there is no case of H ′′ having an edge between non-heavy vertices of H,

which is not also present in H — by the definition of an approximate monotone blowup).

Regarding heavy-vertex-endpoint edges, we note that G has no 0.52-heavy vertices,

having maximum degree $(0.5 + \delta)n$; thus $V_{heavy}$ contains only as many vertices as is made possible by the up to $\delta\binom{n}{2}$ edges which H may have in addition to those of G; thus $|V_{heavy}| \le \delta\binom{n}{2} / \big((0.52 - (0.5 + \delta))n\big) < 30\delta(n - 1)$. At most $(n - 1)$ edge discrepancies between H and H′′ may be attributed to each such vertex, for a total of less than $60\delta\binom{n}{2}$ edge discrepancies over all of $V_{heavy}$.

Now suppose we correct all discrepancies in H with H ′′ involving heavy vertices,

i.e. modify the neighborhoods of vertices in Vheavy to their values in H ′′. These vertices

now obey the constraints on non-heavy vertices in an approximate monotone blowup —

and if we were now to remove all edges existing in H but not in H ′′, we would get H ′′

exactly: We would be ‘cleaning out’ the bipartite graphs corresponding to disconnected

H ′-vertices. The number of edges we would need to remove would be the difference in

the total number of edges between the modified H and H ′′. Correcting heavy vertices

necessarily involves removing more edges to them than are added, so after this correction

to H it still has at most $(0.5 + 2\delta) \cdot \binom{n}{2}$ edges. The number of edges in H′′ is at least $\frac{1}{2}(0.5 - \Delta)n^2 > (0.5 - \Delta)\binom{n}{2}$, since the average degree of H′′, as a graph in $\Pi^q$, is at least $(0.5 - \Delta)n$. The number of edges remaining to be removed to make the modified H into H′′ is therefore no higher than $(2\delta + \Delta)\binom{n}{2}$.

Altogether, H and H ′′ are therefore at a distance of no more than 62δ + ∆, so G

and H ′′ are at a distance of no more than 63δ + ∆.
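For reference, the distance bookkeeping of this case can be collected in one place. All quantities are fractions of $\binom{n}{2}$, and the last step uses the triangle inequality together with the assumption that G is δ-close to H:

```latex
\begin{align*}
\operatorname{dist}(H, H'') &\le \underbrace{60\delta}_{\text{heavy-vertex discrepancies}}
  + \underbrace{(2\delta + \Delta)}_{\text{surplus edges removed}} = 62\delta + \Delta, \\
\operatorname{dist}(G, H'') &\le \operatorname{dist}(G, H) + \operatorname{dist}(H, H'')
  \le \delta + 62\delta + \Delta = 63\delta + \Delta .
\end{align*}
```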

Now let us consider the general case, in which n is not necessarily divisible by m;

we wish to reduce it to the case of n divisible by m, for which no full-degree vertices

are added.

Let H be as in the previous case: A graph satisfying Πq at minimum distance from

G. H has n mod m full-degree vertices; without loss of generality, we may assume that

the n mod m full-degree vertices in G are mapped to these, and that G′ (the exactly-

balanced blowup being an induced subgraph of G) is mapped to an induced subgraph H ′

of H, of order n′ = n− (n mod m); this assumption is possible by Lemma 4.5.5. H is an

approximate monotone blowup of some graph in Π′m — and H ′ is also an approximate

monotone blowup of the same graph (with the cluster of full-degree vertices being

empty). Thus H ′ satisfies Πqn′ and, in fact, it is in the support of Gn′ . The distance

of G′ from H′ is $\delta' = \delta\binom{n}{2} / \binom{n'}{2} < 4\delta \le \Delta$. The argument for the previous case now

applies to G′ (as even though G′ does not meet the requirements of the lemma, it meets

the relaxed requirements of the first case discussed above); thus G′ is (63δ′ + ∆)-close

to Gn′ .

Finally, let H ′′ denote the Gn′ graph closest to G′. Adding n mod m full-degree

vertices to H′′ results in a graph which is in Gn, and its distance from G is $(63\delta' + \Delta) \cdot \binom{n'}{2} / \binom{n}{2} \le 63\delta + \Delta$.

Lemma 4.6.7. With probability 1 − o(1), a graph sampled from Rn is ∆/4-far from $\Pi^q_n$.

Proof. Let δ = ∆/4, and let n be sufficiently large so that with probability 1 − o(1), a uniformly sampled graph of order m(n, q) has maximum degree at most (0.5 + δ)m; its blowup to order n − (n mod m) then has maximum degree at most (0.5 + δ)(n − (n mod m)). Now, suppose that the graph sampled from Rn observes this bound (before the addition of the n mod m full-degree vertices), and that it is also δ-close to $\Pi^q_n$. We may now apply Lemma 4.6.6 to conclude that the graph is (63δ + ∆)-close to the support of Gn. But since $\Delta = 10^{-11} c_{4.5.13}$, we have $63\delta + \Delta < 0.08 \cdot c_{4.5.13}$. By Lemma 4.6.5, this can be the case only with probability o(1).
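The numeric gap used in this proof can be checked exactly. The sketch below works with the ratio ∆/c, where c stands for the constant c4.5.13, and uses exact rationals. The helper names are hypothetical. With δ = ∆/4 the bound 63δ + ∆ equals 16.75∆, so the inequality holds for any ∆ below roughly 0.0047·c, and the tiny constant fixed in the text satisfies it with a great deal of room.

```python
from fractions import Fraction


def lemma_466_bound(Delta: Fraction) -> Fraction:
    """63*delta + Delta for delta = Delta/4, i.e. 16.75 * Delta."""
    delta = Delta / 4
    return 63 * delta + Delta


def gap_holds(Delta_over_c: Fraction) -> bool:
    """Check 63*delta + Delta < 0.08*c after dividing both sides by c."""
    return lemma_466_bound(Delta_over_c) < Fraction(8, 100)
```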

We consequently set ε4.6 = ∆/4.

Lemma 4.6.8. Any probabilistic oracle machine making $o(m^2)$ queries has probability o(1) of distinguishing between inputs from Rn and from Gn.

Proof. The proof is by the same argument as in Lemma 4.2.6: Let $R'_m$ and $G'_m$ be the uniform distributions over all graphs of order m and over $\Pi'_m$ respectively; distributions Rn and Gn are obtained by applying the same augmentation to samples from $R'_m$ and $G'_m$ respectively. The result of each query to an augmented graph depends on one or no edges of the original order-m graph. It therefore suffices to prove the claim assuming queries are made to the original order-m graphs rather than their augmentations or the isomorphic images thereof; in other words, it suffices to show that the probability of an oracle machine distinguishing between inputs from $R'_m$ and from $G'_m$, using $o(m^2)$ queries, is o(1). This is guaranteed by the choice of Π′ in Lemma 4.6.1.
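The reduction in this proof rests on the fact that one query to the augmented graph costs at most one query to the order-m graph. A minimal sketch of such a simulated oracle follows. It assumes the identity labeling (consecutive blocks of vertices form the clusters, and the extra full-degree vertices come last) and the blowup convention of no edges inside a cluster. The names `augmented_query` and `base_query` are hypothetical.

```python
from typing import Callable


def augmented_query(base_query: Callable[[int, int], bool],
                    n: int, m: int, u: int, v: int) -> bool:
    """Answer the edge query {u, v} on the augmented order-n graph using at most one base query."""
    if u == v:
        raise ValueError("a query needs two distinct vertices")
    size = n // m
    n_prime = size * m
    if u >= n_prime or v >= n_prime:
        return True               # an added full-degree vertex is adjacent to everything
    cu, cv = u // size, v // size
    if cu == cv:
        return False              # no edges inside a cluster (assumed blowup convention)
    return base_query(cu, cv)     # exactly one query to the order-m graph
```

An arbitrary relabeling of the augmented graph would only add a fixed permutation in front of this function, which does not change the number of base queries.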

Lemma 4.6.9. Any ε-test for Πq , for ε ≤ ε4.6, makes Ω(q(n)) queries.

Proof. Let n be sufficiently large for Lemma 4.6.7 to hold, and let $\varepsilon \le \varepsilon_{4.6}$. An ε-test for $\Pi^q$ accepts with probability at least 2/3 a graph sampled from Gn. By Lemma 4.6.8, if the test makes $o(m^2) = o(q(n))$ queries, then for a sufficiently large n it will accept a graph sampled from Rn with probability at least 2/3 − o(1). By Lemma 4.6.7, with probability 1 − o(1), a graph from Rn is $\varepsilon_{4.6}$-far from $\Pi^q$ (recall $\varepsilon_{4.6} = \Delta/4$), so the probability of the test accepting graphs in Rn which are $\varepsilon_{4.6}$-far from $\Pi^q$ is also at least 2/3 − o(1). Thus for every sufficiently large n there exists a specific graph which is $\varepsilon_{4.6}$-far, and hence ε-far, from $\Pi^q_n$, and is accepted with probability over 1/2, which is a contradiction.

4.6.3 A test for the constructed property

In this subsection we present a test for Πq . As in Section 4.5, we assume, for the

sake of simplicity, that n is an integer multiple of m, in which case graphs in Πq are

approximate monotone blowups, with no full-degree vertices added to them; we later

argue that this assumption can be foregone.

Definition 4.6.10. Let $\Xi = \bigcup_{n \in \mathbb{N}} \Xi_n$ denote the property of graphs satisfying a relaxed version of the conditions of (C2) of $\Pi^q$, and failing to satisfy a relaxation of condition (C1). Specifically, a graph G is in Ξ if its minimum degree is at least $(0.5 - 10^5\Delta)n$ (rather than $(0.5 - \Delta)n$ in (C2)), its average degree is lower than $(0.5 + 10^5\Delta)n$ (rather than $(0.5 + 2\Delta)n$, the threshold for (C1)), and it is a 0.52-threshold approximate monotone blowup of a graph in $\Pi'_{m(n,q)}$.

The motivation for this definition is that, when testing a graph satisfying (C2)

but not (C1), we hope to reconstruct, by querying poly(1/ε) edges, a graph which

approximately satisfies (C2) with the above parameters. Before proceeding to present an

actual test, we wish to establish the fact that, having reconstructed a Ξ graph, we can

also determine the pre-blowup graph of which it is an approximate monotone blowup.

This is less than trivial, due to Ξ graphs having some heavy vertices, as well as some

edges between clusters which are disconnected in the pre-blowup graph.

Lemma 4.6.11. Let G = (V, E) be a graph in $\Xi_n$, for sufficiently large n. There exists a graph G′′ = (V′′, E′′) of order at most m and a corresponding partition of the non-heavy vertices into m cells (denoted $V'_1, \ldots, V'_m$), so that the following holds:

1. V′′ has a vertex for each non-empty cell $V'_i$, i.e. $|V''| = |\{i \in [m] : V'_i \neq \emptyset\}|$ (and specifically, $|V''| \le m$).

2. G′′ is an induced subgraph of some graph in $\Pi'_m$.

3. Each $V'_i$ contains at most n/m vertices.

4. G′′ is in monotone agreement with the partition, i.e. for every $\{i, j\} \in E''$ and every $(u, v) \in V'_i \times V'_j$, it holds that $\{u, v\} \in E$.

5. At most 0.01m sets $V'_i$ are empty.

6. Neighborhoods of different vertices within the same partition cell agree on all but at most 0.05n vertices.

7. Neighborhoods of vertices from different partition cells disagree on at least 0.45n vertices.

Proof. Let Vheavy ⊆ V denote the set of heavy vertices of G (those with degree exceeding

0.52n). Before considering the seven requirements, we first bound from above the number

of heavy vertices in G, using the constraint on the minimum and the average degrees:

A bound is obtained by assuming that every non-heavy vertex contributes only the minimum degree towards the overall average, and the heavy vertices contribute only 0.52n each. In this case we recall the bounds on the average and the minimum degree in Ξ, and find that a sum over the vertex degrees yields $0.52n \cdot |V_{heavy}| + (0.5 - 10^5\Delta)n \cdot (n - |V_{heavy}|) \le (0.5 + 10^5\Delta)n^2$; thus $(0.02 + 10^5\Delta) \cdot |V_{heavy}| \le (2 \cdot 10^5\Delta)n$; as $(2 \cdot 10^5)/(0.02 + 10^5\Delta) < 10^7$, this implies $|V_{heavy}| < 10^7\Delta n$.

A partition and a graph G′′ satisfying the requirements above are the obvious ones: G is a 0.52-threshold approximate monotone blowup of G′ = ([m], E′) of order m, and G′′ is chosen as the subgraph of G′ induced by those vertices i with $V'_i \neq \emptyset$; this satisfies requirements 1 and 2. The partition chosen is the clustering of V in the approximate monotone blowup, i.e. $V'_i$ is the cluster originating in i ∈ G′, excluding any heavy vertices. This satisfies requirements 3 and 4 (by definition of an approximate monotone blowup). Regarding requirement 5, empty partition cells correspond to clusters with n/m vertices which are all heavy, and there can be at most $|V_{heavy}|/(n/m) < 10^7\Delta m < 0.01m$ of these (for sufficiently large n).

Regarding requirement 6: The neighborhoods of a pair of vertices in the same cluster $V_i$ of the blowup must agree on at least those edges mandated by G′, whose endpoints are not heavy; i, as a vertex of G′, has degree no lower than (0.5 − ∆)m, so there are (0.5 − ∆)n edges which both vertices must have, minus up to $|V_{heavy}|$ edges to heavy vertices which are not constrained to be present: At least $(0.5 - (10^7 + 1)\Delta)n$. On top of these, every one of the two vertices can have at most $0.52n - (0.5 - (10^7 + 1)\Delta)n = 0.02n + (10^7 + 1)\Delta n < 0.021n$ additional neighbors so as not to exceed the maximum degree of a non-heavy vertex. Thus the two neighborhoods can differ by at most $2 \cdot 0.021n < 0.05n$ of their neighbors.

Regarding requirement 7: If the blowup had been exactly-balanced rather than

monotone, that is, had G not had any heavy vertices, and had G contained only those

edges corresponding to edges in G′, a pair of vertices in different clusters $V'_i$ and $V'_j$ would each have at least (0.5−∆)n neighbors, of which at least (0.25−∆)n were

shared with the other vertex and at least (0.25−∆)n not shared. Thus G′ mandates

a (0.5− 2∆) fraction of difference between the neighborhoods. As argued above, the

heavy vertices and the leeway with respect to the degree of non-heavy vertices can alter

the number of vertices in disagreement by at most 2 · 0.021n for each vertex. Thus the

difference between the neighborhoods is at least (0.5− 2∆)n− 2 · 0.021n > 0.45n.

Lemma 4.6.12. Let G, G′ be as in Lemma 4.6.11; the graph G′′ and the partition $V'_1, \ldots, V'_m$ guaranteed to exist by Lemma 4.6.11 are unique up to isomorphism (relabeling), and for a given $V'_1, \ldots, V'_m$, the labeling of G′′ is unique.

Proof. Consider an arbitrary graph and partition of $V \setminus V_{heavy}$ which satisfy all the requirements of Lemma 4.6.11. Now, two vertices from the same G′-cluster cannot be assigned different cells $V_i$, $V_j$: vertices in different cells must have highly different neighborhoods by requirement 7 (of Lemma 4.6.11), whereas the neighborhoods of two non-heavy vertices from the same G′-cluster mostly agree, as shown in the proof of that lemma. For a similar reason, a pair of vertices from different clusters cannot be assigned the same cell in the partition: their neighborhoods differ too much, while requirement 6 demands that they mostly agree. It must therefore be the case that the partition is exactly $V'_1, \ldots, V'_m$, up to a reordering. Let us

assume without loss of generality this is exactly the partition (without reordering). It

now remains to show G′′ must be as chosen in the proof of Lemma 4.6.11.

If a pair of non-heavy vertices $u \in V'_i$ and $v \in V'_j$ are not connected in G, their cluster vertices in G′′ cannot be connected, i.e. it must be the case that $\{i, j\} \notin E''$. If all pairs with disconnected clusters in G′′ were disconnected, it would be uniquely

determined, proving the claim. G is a monotone blowup, so as mentioned earlier, we

must show that the adverse effect of heavy vertices and unnecessary edges cannot bring

it into monotone agreement with a graph in Π′m other than G′′.

To do so, we show that the minimum number of (ordered) pairs of clusters, whose bipartite graph is not full, is high enough to practically determine G′′. To obtain this minimum number, we bound the total number of disconnected (ordered) pairs of non-heavy vertices in different clusters: There are at most $(0.5 + 10^5\Delta)n^2$ ordered pairs in G connected by an edge, overall; and at most $2|V_{heavy}|n$ pairs are incident upon heavy vertices, so the total number of disconnected pairs in different clusters is at least

$(0.5 - 10^5\Delta)n^2 - 2|V_{heavy}|n - \sum_i |V_i|^2 > (0.5 - (2 \cdot 10^7 + 10^5)\Delta)n^2 - \sum_i (n/m)^2 = (0.5 - 201 \cdot 10^5\Delta)n^2 - n^2/m$.

As $m > \Delta^{-1}$ and $(201 \cdot 10^5 + 1)\Delta < 0.001$, this is at least $(0.5 - 0.001)n^2$. There must therefore be at least $0.499m^2$ (ordered) pairs of different clusters with missing edges between non-heavy vertices. As the average degree of G′′ is no lower than (0.5 − ∆)m, there are at most $m^2 - ((0.5 - \Delta)m^2 + 0.499m^2) < (0.001 + \Delta)m^2$ additional (ordered) pairs of clusters which may fail to be present as edges in G′′. Thus any two graphs of order m, with subgraphs which can serve as G′′, have no more than $2(0.001 + \Delta)m^2$ discrepant pairs, corresponding to a choice of the potential additional missing edges. Their distance does therefore not exceed $0.003\binom{|V''|}{2}$.

We recall that Π′m graphs are 0.4-far from each other (condition 3 in Lemma 4.6.1 met

by graphs in Π′). Combining these two facts we conclude that the order-m graph, of

which G′′ is a subgraph, is determined up to isomorphism; and as G′ can be such a

graph, the order-m graph is necessarily some relabeling of G′.

Now, since the distance between any two potential G′′ graphs is less than 0.01, the

labeling of G′ is determined up to an isomorphism fixing over 0.9m of the vertices (as

per condition 4 of Lemma 4.6.1). It remains to show that an isomorphism on G′, which

fixes over 0.9m of the vertices, but moves at least one vertex of a cluster i with $V'_i \neq \emptyset$, makes G′′ incompatible with $V'_1, \ldots, V'_m$.

Consider, therefore, two subgraphs $G''_1$ and $G''_2$ of relabelings $G'_1$ and $G'_2$ of G′, satisfying the requirements with respect to the clustering $V'_1, \ldots, V'_m$, and such that the mapping between $G'_1$ and $G'_2$ fixes 0.9m of the vertices, but replaces a vertex $i \in V(G''_1)$ by some other vertex $j \neq i$ of G′ (not necessarily a vertex in $G'_2$). The neighborhood of each non-heavy vertex in $V'_i$ must contain all non-heavy vertices in clusters $V'_k$ such that $\{i, k\} \in E'_1$, due to the monotone agreement with $G''_1$; and it must contain all vertices in clusters $V'_k$ such that $\{j, k\} \in E'_2$, due to the monotone agreement with $G''_2$. Now, for at least a 0.9-fraction of the clusters $V'_k$, it is the case that $\{j, k\} \in E'_2$ if and only if $\{j, k\} \in E'_1$, so the neighborhood of the $V'_i$ vertex must contain these non-heavy vertices.

Consequently, the degree of a vertex of $V'_i$ must be at least (0.5 − ∆)n (edges mandated by $G''_1$), plus (0.5 − ∆ − 0.1)n (edges mandated by $G''_2$ to vertices fixed by the isomorphism), minus at most (0.25 + ∆)n (the maximum intersection of the neighborhoods of pairs of vertices in G′, by Lemma 4.6.1), minus $|V_{heavy}|$ (heavy neighbors). Recalling that $|V_{heavy}| \le 10^7\Delta n$, this sum is at least $(0.65 - 3\Delta - 10^7\Delta)n > 0.64n$, which is impossible

for a non-heavy vertex. Thus the isomorphism cannot replace any i ∈ V (G′′1), so G′′1 and

G′′2 must be exactly the same as labeled graphs, i.e. G′′ is indeed uniquely determined,

as claimed.

We now have all the machinery necessary for presenting a test for Πq , listed as

Algorithm 4.4, and establishing its validity.

Algorithm 4.4 A test for $\Pi^q$

1: $\varepsilon' \leftarrow \min\{\varepsilon, \Delta/1000\}/20$, $m \leftarrow \lfloor\sqrt{q(n)}\rfloor$.

Phase I: Graph edge density estimation
2: Estimate the edge density of G, using $\Theta(1/\varepsilon'^2)$ independent edge queries.
3: If the estimated edge density exceeds $0.5 + 2\Delta - 2\varepsilon'$, accept.

Phase II: Vertex degree estimation
4: $S_{min\text{-}deg} \leftarrow$ uniform sample of $\Theta(1/\varepsilon')$ vertices.
5: $S_{sig} \leftarrow$ uniform sample of $\Theta(\log(|S_{min\text{-}deg}|)/\varepsilon'^2)$ signature vertices.
6: for each vertex $v \in S_{min\text{-}deg}$ do
7:     Estimate the degree of v using $S_{sig}$ (by querying the potential edges from v to $S_{sig}$).
8:     If v has estimated degree under $(0.5 - \Delta - \varepsilon')n$, reject.
9: end for

Phase III: Finding representatives for a clustering
10: $S_{rep} \leftarrow$ uniform sample of $\Theta(m/\varepsilon'^2)$ vertices.
11: $S_{sig} \leftarrow$ uniform sample of $\Theta(\log(|S_{rep}|)/\varepsilon'^2)$ signature vertices.
12: $S'_{rep} \leftarrow \emptyset$
13: for each vertex $v \in S_{rep}$ do
14:     Estimate v’s degree using $S_{sig}$.
15:     If v’s estimated degree is less than $(0.52 - \varepsilon')n$, add v to $S'_{rep}$.
16: end for
17: If $|S'_{rep}| < 0.99|S_{rep}|$, reject.
18: $S_{sig} \leftarrow$ uniform sample of $\Theta(\log(|S_{rep}|))$ signature vertices.
19: $m' \leftarrow 0$, $R \leftarrow \emptyset$
20: for each $v \in S'_{rep}$ do
21:     for each $i \in [m']$ do
22:         Estimate the size of the difference between the neighborhoods of v and $r_i$ by the difference of their neighborhoods in $S_{sig}$.
23:         If the neighborhoods of v and $r_i$ differ by no more than 0.06s, add v to $V''_i$ and continue to the next iteration at line 20.
24:     end for
25:     $m' \leftarrow m' + 1$, $r_{m'} \leftarrow v$, $V''_{m'} \leftarrow \{v\}$, $R \leftarrow R \cup \{r_{m'}\}$
26:     If $m' > m$, reject.
27: end for

... (continued) ...

Observation 4.6.13. The queries made by Algorithm 4.4 are dominated by those in Phases IV and V: $\binom{m}{2} = \Theta(q(n))$ and $\Theta(t \cdot \log(t/\varepsilon)/\varepsilon^2) = \Theta(m \cdot \log^2(m) \cdot \varepsilon^{-4}\log(1/\varepsilon))$ respectively. Thus the overall number of queries, ignoring the dependence on ε, is $\Theta(q(n))$.

Algorithm 4.4 A test for $\Pi^q$ (continued)

Phase IV: Determining G′ and G′′
28: Query the graph $G_R$, induced by R and labeled accordingly.
29: Let $G' = ([m], E') \in \Pi'_m$ be such that $G_R$ is in monotone agreement with the subgraph induced by its first m′ vertices: If $\{i, j\} \in E'$ then $\{r_i, r_j\} \in E$.
30: Let G′′ be the subgraph of G′ induced on the first m′ vertices.
31: If there exists no appropriate G′, or if G′′ is not uniquely determined by $G_R$, reject.

Phase V: Estimating cluster sizes
32: $S_{csize} \leftarrow$ uniform sample of $t = \Theta(m\log(m) \cdot \log(1/\varepsilon')/\varepsilon'^2)$ vertices.
33: $S_{sig} \leftarrow$ uniform sample of $\Theta(\log(t/\varepsilon)/\varepsilon'^2)$ signature vertices.
34: for each $v \in S_{csize}$ do
35:     Estimate v’s degree using $S_{sig}$.
36:     If v has estimated degree over $(0.52 - 2\varepsilon')n$, remove it from $S_{csize}$ and continue to the next v.
37:     $\pi(v) \leftarrow \bot$
38:     for each $i \in [m']$ do
39:         Estimate the size of the difference between the neighborhoods of v and $r_i$ using $S_{sig}$.
40:         If the neighborhoods of v and $r_i$ differ by less than 0.06s, let $\pi(v) = i$.
41:     end for
42: end for
43: If any cluster i has over $(1 + \varepsilon'/2)t/m$ vertices in $S_{csize}$ with $\pi(v) = i$, reject.
44: If more than an $\varepsilon'/2$-fraction of the vertices remaining in $S_{csize}$ have $\pi(v) = \bot$, reject.

Phase VI: Ensuring the monotone agreement of G with G′′
45: $S_{sig} \leftarrow$ uniform sample of $\Theta(\log(1/\varepsilon')/\varepsilon'^2)$ signature vertices.
46: for $\Theta(1/\varepsilon')$ times do
47:     Sample a pair of vertices u, v and query $\{u, v\}$.
48:     Estimate the degrees of u and v using $S_{sig}$.
49:     If u or v have estimated degree over $(0.52 - \varepsilon')n$, continue to the next pair.
50:     Cluster u and v as in Phase V.
51:     If $\pi(u) = \bot$ or $\pi(v) = \bot$, continue to the next pair.
52:     If $\{\pi(u), \pi(v)\} \in E''$ but $\{u, v\} \notin E$, reject.
53: end for
54: accept.
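The signature-vertex primitives that Phases II, III, V and VI share, together with the greedy loop of lines 19 to 27, can be sketched as follows. This is an illustration under stated assumptions and not the algorithm's definitive form: the graph is a list of neighbor sets, a membership test in such a set stands for one edge query, the symbol s in the threshold 0.06s is read as the number of signature vertices, and all function names are hypothetical.

```python
def estimate_degree(adj: list, v: int, sig: list) -> float:
    """Estimate deg(v) by scaling the number of neighbors of v among the signature vertices."""
    n = len(adj)
    hits = sum(1 for w in sig if w in adj[v])
    return hits * n / len(sig)


def estimate_nbhd_difference(adj: list, u: int, v: int, sig: list) -> int:
    """Count the signature vertices on which the neighborhoods of u and v disagree."""
    return sum(1 for w in sig if (w in adj[u]) != (w in adj[v]))


def greedy_cluster(adj: list, candidates: list, sig: list, m: int,
                   threshold: float = 0.06):
    """Lines 19-27: return (representatives, clusters), or None where the test rejects."""
    reps, clusters = [], []
    for v in candidates:
        for i, r in enumerate(reps):
            if estimate_nbhd_difference(adj, v, r, sig) <= threshold * len(sig):
                clusters[i].append(v)     # v joins the cluster of representative r_i
                break
        else:
            reps.append(v)                # v becomes a new representative
            clusters.append([v])
            if len(reps) > m:
                return None               # more than m clusters: reject
    return reps, clusters
```

On an exactly-balanced blowup with the full vertex set used as the signature, vertices of one cluster show zero disagreement and vertices of clusters with different base neighborhoods show a large one, so the greedy loop recovers the clusters.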

In proving Algorithm 4.4’s validity as a test, we will separate the arguments for completeness and for soundness, both of which are non-trivial.

Completeness of the test

We will again denote by $V_{heavy}$ the set of heavy vertices of G; we also refer to vertices with degree under $(0.52 - 2\varepsilon')n$ as light vertices; the rest are non-light vertices, and the set of these vertices is denoted $V_{nl}$.

Lemma 4.6.14. A graph of order n which satisfies (C2) but not (C1) has at most

150∆n vertices of degree over (0.52− 3ε′)n. In particular, less than a 150∆-fraction of

its vertices are non-light, i.e. |Vnl| < 150∆n.

Proof. The argument is similar to that made for heavy vertices earlier in this section: Let U denote the set of vertices with degree over $(0.52 - 3\varepsilon')n$. Since the average degree of G is less than $(0.5 + 2\Delta)n$, and the minimum degree is at least $(0.5 - \Delta)n$, U satisfies $(0.52 - 3\varepsilon')n \cdot |U| + (0.5 - \Delta)n \cdot (n - |U|) \le (0.5 + 2\Delta)n^2$; thus $(0.02 + \Delta - 3\varepsilon')n \cdot |U| \le (3\Delta)n^2$; this yields the claim, as $0.02 + \Delta - 3\varepsilon' > 0.02$.

Lemma 4.6.15. If G ∈ Πq , then it is accepted by Algorithm 4.4 with probability at

least 2/3.

Proof. If G satisfies condition (C1), then it will be accepted with high probability

by Phase I of the test; in fact, this will be true if G’s average degree is at least

(0.5 + 2∆− ε′)n. We thus focus on the case of G satisfying Πq but having less than this

average degree, thus satisfying (C2). Let G′ = ([m], E′) ∈ $\Pi'_m$ be the graph of which G

is an approximate monotone blowup. To prove that the test accepts G with high enough

probability, we show that each of the following ‘desirable’ events is likely to occur:

1. The graph is not determined to have low minimum degree (in Phase II).

2. Almost all vertices sampled into Srep are light vertices, which are then placed in

S′rep (hence Phase III does not reject on account of S′rep being too small).

3. The clustering of S′rep in Phase III is valid, i.e. the vertices assigned to each cluster

are all those vertices of S′rep in the cluster of some single G′ vertex.

4. By the end of Phase III, R contains only non-heavy vertices, and its light vertices

represent almost all clusters of G′.

5. The graph GR, induced by the cluster representatives in R, is such that its

corresponding G′′ is uniquely determined (hence Phase IV does not reject).

6. All cluster size estimates in Phase V are about 1/m of the total size of Scsize

(hence Phase V does not reject on account of cluster size imbalance).

7. The clustering of Scsize is valid, and all heavy vertices are discarded (hence Phase

V does not reject on account of there being too many unclusterable vertices).

8. The fraction of Scsize discarded for having high degree is not excessively high.

9. The clustering of pairs in Phase VI is valid, i.e. all pairs with a heavy endpoint

are discarded, and all vertices in pairs assigned π(v) = i are non-heavy vertices

from the same G′ vertex cluster as the representative ri.

10. Phase VI finds no monotone disagreement between G and G′′ (and hence does

not reject).

If all of these events occur, G is indeed accepted.

Phase II degree estimates. The degree estimate of a single vertex $v \in S_{min\text{-}deg}$ is ε′-close to its actual value with probability $1 - \exp(-\Omega(\varepsilon'^2 \cdot |S_{sig}|)) = 1 - \exp(-\Omega(\varepsilon'^2 \cdot \varepsilon'^{-2} \cdot \log(|S_{min\text{-}deg}|))) > 1 - 0.01/|S_{min\text{-}deg}|$. (This last argument uses a large-deviation bound on the vertices in $S_{min\text{-}deg}$, which are uniform samples without repetition; see the note on page 70, following the proof of Lemma 4.5.17.) Union-bounding over all vertices in $S_{min\text{-}deg}$ we conclude that with probability greater than 1 − 0.01, all of their degree estimates are correct to within ε′. As G’s minimum degree is at least (0.5 − ∆)n, the estimates are all at least (0.5 − ∆ − ε′)n, so G is not rejected at Phase II in this event.
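To see where a signature set of size Θ(log|S|/ε′²) comes from, the following sketch computes one sufficient size explicitly. It assumes the standard two-sided Hoeffding inequality, which also applies to sampling without replacement. The constants are illustrative only, since the text states the sample sizes up to Θ(·), and the function names are hypothetical.

```python
import math


def hoeffding_failure(k: int, eps: float) -> float:
    """Hoeffding bound on P(|empirical fraction - true fraction| >= eps) for k samples."""
    return 2.0 * math.exp(-2.0 * k * eps * eps)


def signature_size(num_vertices: int, eps: float, total_failure: float = 0.01) -> int:
    """A sample size for which a union bound over num_vertices estimates stays below total_failure."""
    per_vertex = total_failure / num_vertices
    return math.ceil(math.log(2.0 / per_vertex) / (2.0 * eps * eps))
```

The returned size grows logarithmically in the number of vertices being estimated and quadratically in 1/ε, matching the shape of the bound used in Phase II.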

Light vertices in $S_{rep}$ and $S'_{rep}$. In Phase III, the degree estimate of a single vertex of $S_{rep}$ is ε′-close to its actual value with probability $1 - \exp(-\Omega(\log(|S_{rep}|)))$ (independently of the choice of $S'_{rep}$), so with probability greater than 1 − 0.005 all s vertices in the set have estimates correct to within ε′. Also, with probability greater than $1 - \exp(-\Omega(m)) > 1 - 0.005$, the fraction of non-light vertices in $S_{rep}$ is at most ε′ higher than their fraction in G (which is under 150∆, by Lemma 4.6.14); in this event, $S_{rep}$ has at most a 150∆ + ε′ < 151∆ < 0.01 fraction of non-light vertices. If both

events occur, more than 0.99|Srep| light vertices are placed in S′rep, so the test does

not reject on account of S′rep being too small. Note also that these events occur with

probability greater than 1− 0.01, independently of the choice of light vertices in Srep

given their total number; in other words, with probability greater than 1− 0.01 these

events occur and, additionally, if we condition on the specific number s of light vertices

in Srep, these light vertices are distributed uniformly over all sets of s light vertices in

Srep.

Validity of the clustering of $S'_{rep}$. Let $V'_1, \ldots, V'_m$ be as in Lemma 4.6.11. Now, two non-heavy vertices in $S'_{rep}$ in different $V'_i$’s have neighborhoods differing on at least 0.45n vertices, by item 7 in Lemma 4.6.11, and two non-heavy vertices in the same $V'_i$ have neighborhoods differing on at most 0.05n vertices by item 6 in Lemma 4.6.11. Thus with probability $1 - \exp(-\Omega(\log(|S_{rep}|)))$, a pair of non-heavy vertices will be estimated to have neighborhoods with under 0.06n differences, if the pair of vertices are in the same cluster, and over 0.06n if they are in different clusters, using the set of s′ signature vertices. Thus with probability greater than 1 − 0.005, all decisions of whether the pairs

of non-heavy vertices in S′rep are in the same cluster will be correct — independently

of which non-heavy vertices make up S′rep. As argued above, with probability greater

than 1− 0.005 all degree estimates of Srep are correct, so no heavy vertices are placed

in S′rep. Thus with probability greater than 1− 0.01 all clustering decisions regarding

pairs of vertices in S′rep are correct, the clustering is valid, and the test will not reject G

on account of having more than m clusters. This, independently of the choice of light

vertices in Srep given their total number (see comment above).

R represents most clusters well Suppose that all degree estimates in Phase III

are correct to within ε′. In this case no heavy vertex is placed in S′rep and consequently

no heavy vertex is placed in R. It then remains to show that few clusters are missing

representatives in R.

Suppose, additionally, that the clustering of S′rep is valid. In this case, if a cluster is

represented in S′rep, it will be represented in R and not have its constituent non-heavy

vertices represented by some ri from another cluster. Finally, suppose that light vertices

in Srep are all placed in S′rep. With this supposition it suffices to show that only few

clusters have no light vertices in Srep.

Now, there are less than 150∆n non-light vertices in G, and thus less than 150∆/(1−

ε′) < 151∆− ε′ clusters have fewer than ε′n/16m light vertices (we refer to such clusters

as light-poor). A cluster with at least ε′n/16m light vertices (a light-rich cluster) has

at least one light vertex in Srep with probability at least 1− exp(−Ω(1/ε′)), thus the

expected number of light-rich clusters having no light vertices in Srep is less than

ε′m/3200. By Markov's inequality, with probability greater than 1 − 0.005 the actual

number of such light-rich clusters is then less than ε′m/16, making the total number of

clusters missing from R lower than 151∆m, i.e. m − m′ < 151∆m. When the above

holds, and R has no heavy vertices, we refer to R as being well-representing.
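
The step from the expectation ε′m/3200 to the threshold ε′m/16 can be read as an application of Markov's inequality. As a sketch, writing X for the number of light-rich clusters with no light vertex in Srep (the symbol X is our own shorthand, not notation used elsewhere in the text):

```latex
\[
\Pr\!\left[X \ge \frac{\varepsilon' m}{16}\right]
\;\le\; \frac{\mathrm{Ex}[X]}{\varepsilon' m/16}
\;<\; \frac{\varepsilon' m/3200}{\varepsilon' m/16}
\;=\; \frac{16}{3200}
\;=\; 0.005 .
\]
```

The constant 3200 is thus exactly 200 times 16, matching the failure probability of 0.005 used for this event.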

Thus R is well-representing with probability greater than 1− 0.03 overall, and with

probability greater than 1− 0.005 conditioning on relevant previous desirable events;

and the occurrence of the event with probability greater than 1− 0.03 is independent of

the distribution of light vertices within each cluster given their number in that cluster.

(Note, however, that the number of light vertices in each cluster is not independent of

the event of R being well-representing).

The uniqueness of G′′ We would like to show that the graph GR will be found to

uniquely determine G′′, the subgraph of G′ corresponding to the clusters represented

in R. We condition on the event of R being well-representing, with the choice of light

vertices in Srep being uniform given their total number. When this event occurs, the

appropriate subgraph of G′ is necessarily a possible choice for G′′ at Phase IV, as it is

in monotone agreement with GR — but we wish to show that it is the only such choice,

using Lemma 4.6.12.

We cannot apply Lemma 4.6.12 to G itself, which is unknown to the test, nor to GR.

Instead, consider a graph G̃ (distinct from the input graph G) obtained as follows: We blow up GR by a factor of n/m; for

any unrepresented cluster in G, we add a cluster of n/m heavy vertices, with full degree

n − 1. We will demonstrate that G̃ is in Ξ, so that Lemma 4.6.11 and Lemma 4.6.12

apply to it. This will establish the uniqueness of our desired G′′, as G̃ is its approximate

monotone blowup.

Regarding the minimum degree of G̃: Had R represented all clusters, the minimum

degree would have been (0.5−∆)n, as G̃ would have been a monotone blowup of G′

with no heavy vertices. Since we replace missing clusters with full-degree vertices,

edges are only added relative to the case of having more clusters represented. Thus the

minimum degree is no less than (0.5−∆)n in G̃ as well.

Now let us bound the average degree of G̃, letting dH denote the average degree

of a graph H. As G̃ is a random graph based on the choice of R, let us consider the

distribution of a single representative ri. The representative is not a uniform sample

from Vi, as a uniform sample may fail to be estimated as non-heavy even with our

having conditioned on R being well-representing. But with our conditioning, if ri is

a light vertex, its distribution is uniform over all light vertices in the cluster. Thus

the variation distance between the distribution of any ri, and the uniform distribution

over its cluster Vi, is at most the probability of the uniformly-sampled Vi vertex being

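non-light. This implies, specifically, that for two represented clusters i and j, with ri

and rj being their (random-variable) representatives,

\[
\begin{aligned}
\Pr_{r_i, r_j}\bigl[\{r_i, r_j\} \in E\bigr]
&\le \Pr_{\substack{(v_i,v_j) \in V_i \times V_j \\ \text{uniform}}}\bigl[\{v_i, v_j\} \in E\bigr]
 + \Pr_{\substack{(v_i,v_j) \in V_i \times V_j \\ \text{uniform}}}\bigl[v_i \text{ or } v_j \text{ non-light}\bigr] \\
&\le \Pr_{\substack{(v_i,v_j) \in V_i \times V_j \\ \text{uniform}}}\bigl[\{v_i, v_j\} \in E\bigr]
 + \Pr_{\substack{v_i \in V_i \\ \text{uniform}}}\bigl[v_i \text{ non-light}\bigr]
 + \Pr_{\substack{v_j \in V_j \\ \text{uniform}}}\bigl[v_j \text{ non-light}\bigr].
\end{aligned}
\]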

Also, had all clusters been represented in R, and had ri been sampled uniformly from

its cluster, the expected average degree in GR would be exactly the average degree of G

(normalized by m/n), i.e. is at most (0.5 + 2∆− ε′)m.

Bearing the above in mind, we can obtain a bound on the number of tuples in GR

(which is m′ times its average degree) if we account for non-light vertices. Recall that

the expectation is under our conditioning of R to be well-representing.

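\[
\begin{aligned}
\mathrm{Ex}\bigl[m' \cdot d_{G_R}\bigr]
&= \sum_{i,j \in [m]} \Pr[i, j \text{ represented}] \cdot \Pr_{r_i, r_j}\bigl[\{r_i, r_j\} \in E \mid i, j \text{ represented}\bigr] \\
&\le \sum_{i,j \in [m]} \Bigl( \Pr_{\substack{(v_i,v_j) \in V_i \times V_j \\ \text{uniform}}}\bigl[\{v_i, v_j\} \in E\bigr]
 + \Pr_{\substack{v_i \in V_i \\ \text{uniform}}}\bigl[v_i \text{ non-light}\bigr]
 + \Pr_{\substack{v_j \in V_j \\ \text{uniform}}}\bigl[v_j \text{ non-light}\bigr] \Bigr) \\
&= \sum_{i,j \in [m]} \Pr_{\substack{(v_i,v_j) \in V_i \times V_j \\ \text{uniform}}}\bigl[\{v_i, v_j\} \in E\bigr]
 + 2m \cdot \sum_{i \in [m]} \Pr_{\substack{v_i \in V_i \\ \text{uniform}}}\bigl[v_i \text{ non-light}\bigr] \\
&\le \frac{m^2}{n}\, d_G + 2 \cdot \Bigl( \frac{m^2}{n} \cdot |V_{\mathrm{nl}}| \Bigr)
 \le m \cdot \Bigl( \frac{m}{n}\, d_G + 300\Delta m \Bigr).
\end{aligned}
\]

Here G is the input graph; below, G̃ denotes the graph constructed above (the blowup of GR by a factor of n/m, with a full-degree cluster added for every unrepresented cluster).

The expected degree of G̃ is n/m times that of GR, plus less than n/m for every

one of the m − m′ clusters unrepresented in R. As R is well-representing, m − m′ < 151∆m, so the contribution of unrepresented clusters to the expected degree is at most

(n/m) · (151∆m) = 151∆n. Thus,

\[
\begin{aligned}
\mathrm{Ex}\bigl[d_{\tilde{G}}\bigr]
&< \mathrm{Ex}\Bigl[ \frac{n}{m} \cdot d_{G_R} + 151\Delta n \Bigr]
 = \mathrm{Ex}\Bigl[ \frac{n}{m \cdot m'} \cdot \bigl( m' \cdot d_{G_R} \bigr) + 151\Delta n \Bigr] \\
&< \frac{n}{(1 - 151\Delta)\, m^2} \cdot \mathrm{Ex}\bigl[ m' \cdot d_{G_R} \bigr] + 151\Delta n \\
&< (1 + 302\Delta)\, \frac{n}{m^2} \cdot m \cdot \Bigl( \frac{m}{n}\, d_G + 300\Delta m \Bigr) + 151\Delta n \\
&= (1 + 302\Delta) \cdot (d_G + 300\Delta n) + 151\Delta n \\
&< (1 + 302\Delta) \cdot \bigl( (0.5 + 2\Delta - \varepsilon')n + 300\Delta n \bigr) + 151\Delta n \\
&< \bigl( 0.5 + (2 + 302 \cdot 0.5 + 302 \cdot 2\Delta + 300 + 302 \cdot 300\Delta + 151)\Delta \bigr) n \\
&< (0.5 + 605\Delta)\, n .
\end{aligned}
\]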

As the average degree of the constructed graph G̃ cannot be lower than its minimum degree, which is at least

(0.5−∆)n, we can apply Markov's inequality to the difference between the average and

the minimum degrees of G̃ to conclude that, with probability greater than 1 − 0.2,

the average degree of G̃ is under (0.5 + 5000∆)n. When this occurs, Lemma 4.6.11 and

Lemma 4.6.12 apply to G̃.
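
The Markov step can be spelled out as follows. This is a sketch under our own shorthand: d stands for the (random) average degree of the constructed blown-up graph, and X for its excess over the guaranteed minimum degree; neither symbol appears in the text. It uses only the expectation bound (0.5 + 605∆)n and the minimum degree bound (0.5 − ∆)n derived above.

```latex
\[
X := d - (0.5 - \Delta)n \;\ge\; 0,
\qquad
\mathrm{Ex}[X] \;<\; (0.5 + 605\Delta)n - (0.5 - \Delta)n \;=\; 606\Delta n ,
\]
\[
\Pr\bigl[d \ge (0.5 + 5000\Delta)n\bigr]
\;=\; \Pr\bigl[X \ge 5001\Delta n\bigr]
\;\le\; \frac{\mathrm{Ex}[X]}{5001\Delta n}
\;<\; \frac{606}{5001}
\;<\; 0.2 .
\]
```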

Thus an appropriate G′ and G′′ exist, with G′′ determined uniquely, and the test

doesn’t reject in Phase IV — with probability greater than 1− 0.23 overall, and with

probability greater than 1− 0.2 conditioned on those relevant previous desirable events

occurring.

Validity of the clustering of Scsize We can employ the same argument as for the

clustering of S′rep, except that the set size is t: With probability greater than 1 − 0.005 all

degree estimates are correct to within ε′, and with probability greater than 1− 0.005, all

decisions of whether the pairs of non-heavy vertices in S′rep and in S′csize are in the same

cluster will be correct. Thus supposing that the representatives in R are all non-heavy,

with probability greater than 1− 0.01 all heavy vertices are discarded and all clustering

decisions are correct; that is, for every vertex v remaining in Scsize, π(v) = i if v is

in the cluster of ri, and π(v) = ⊥ if its cluster is unrepresented.

Thus the clustering is valid with probability greater than 1− 0.04 overall, and with

probability greater than 1− 0.01 conditioning on relevant previous desirable events.

Cluster sizes in Scsize As every cluster in G corresponding to a vertex of G′ has

size n/m, the expected fraction of Scsize from each cluster is 1/m. We apply a large

deviation bound for sums of low-probability indicators, to conclude that the probability

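of a cluster having more than (1 + ε′/2)t/m vertices is less than

\[
\exp\!\left( -\frac{(\varepsilon' t/2m)^2}{2(t/m)} + \frac{(\varepsilon' t/2m)^3}{2(t/m)^2} \right)
= \exp\!\left( -\Bigl(\varepsilon'^2 - \varepsilon'^3/2\Bigr) \cdot \frac{t}{8m} \right)
< \exp\!\left( -\varepsilon'^2 \cdot \frac{t}{10m} \right)
= \exp\!\left( -\Omega\!\left( \log(m) \log\!\left( \frac{1}{\varepsilon'} \right) \right) \right)
< \frac{0.01}{m} .
\]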

Union-bounding over all m clusters we find that with probability greater than 1− 0.01,

all clusters have less than (1 + ε′/2)t/m vertices in Scsize. When this event occurs and

the clustering is also valid, no cluster has more than (1 + ε′/2)t/m vertices assigned the

same π(·) value.
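
For reference, the simplification of the exponent in the large deviation bound can be worked out term by term. The following is a sketch, writing a = ε′t/2m for the deviation and μ = t/m for the expected count (both symbols are our own shorthand), and using ε′ < 0.01, which follows from the inequalities 150∆ + ε′ < 151∆ < 0.01 used earlier in the proof:

```latex
\[
\frac{a^2}{2\mu} = \frac{\varepsilon'^2 t^2}{4m^2} \cdot \frac{m}{2t} = \frac{\varepsilon'^2 t}{8m},
\qquad
\frac{a^3}{2\mu^2} = \frac{\varepsilon'^3 t^3}{8m^3} \cdot \frac{m^2}{2t^2} = \frac{\varepsilon'^3 t}{16m},
\]
\[
-\frac{a^2}{2\mu} + \frac{a^3}{2\mu^2}
= -\Bigl(\varepsilon'^2 - \frac{\varepsilon'^3}{2}\Bigr)\frac{t}{8m}
= -\varepsilon'^2 \Bigl(1 - \frac{\varepsilon'}{2}\Bigr)\frac{t}{8m}
< -\varepsilon'^2 \cdot \frac{0.8\, t}{8m}
= -\varepsilon'^2 \cdot \frac{t}{10m} .
\]
```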

Thus the cluster sizes are all determined to be no higher than ε′/2 over their expected

value, and the test does not reject on account of cluster size imbalance, with probability

greater than 1 − (0.04 + 0.01) = 1 − 0.05 overall, and with probability greater than

1− 0.01 conditioning on relevant previous desirable events.

High-degree vertices in Scsize Vertices in Scsize may only be discarded if they are

estimated to have degree over 0.52− 2ε′. With probability greater than 1− 0.005, all

degree estimates of vertices in Scsize are correct to within ε′ (by an argument similar to

previous phases) — independently of the choice of vertices in Scsize. When this event

occurs, only vertices with degree over 0.52−3ε′ may be discarded. By Lemma 4.6.14, the

fraction of these is at most 150∆. By a similar argument as for Phase III, the fraction

of these vertices in Scsize is at most ε′ higher than their fraction in G, with probability

greater than 1 − 0.005. Thus with probability greater than 1 − 0.01 (regardless of

previous desirable events), less than a 150∆ + ε′ < 151∆-fraction of the vertices of Scsize

are discarded for having overly high degree, independently of the choice of these vertices

in Scsize.

Note that this event is not one of the desirable events listed earlier in the proof, but

it is useful to condition upon; see below.

Vertex clustering failures in Scsize We wish to bound the fraction of vertices in

Scsize which are not discarded for overly high degree, but are assigned π(v) = ⊥; we

suppose that R is well-representing.

To do so, we begin by bounding the fraction of light vertices in Scsize without a

representative in R. There are at most ε′m/16 light-rich clusters unrepresented in R;

thus the total number of light vertices whose clusters are unrepresented in R does not

exceed m · ε′n/16m in light-poor clusters, plus ε′m/16 · n/m in light-rich clusters, or

ε′n/8 overall. Consequently, the expected fraction of light vertices in Scsize unrepresented

in R is at most ε′/8; with probability greater than 1 − exp(−Ω(m · log(1/ε))) > 1 − 0.005, the actual

fraction is under ε′/4.

Suppose that the vertex degree estimates in Phase V are correct to within ε′ (this

happens with probability greater than 1− 0.005 independently of the choice of Scsize).

Suppose also that the fraction of vertices with degree over 0.52− 3ε′ in Scsize (before

any discards) is at most 151∆; as argued above, this occurs with probability greater

than 1− 0.005. When both these events occur, the number of light vertices discarded

for having overly high estimated degree is at most a 151∆-fraction of the vertices of

Scsize. Thus the fraction of unrepresented light vertices after the discard is at most

1/(1− 151∆) < 2 times the original fraction, i.e. under ε′/2. As all vertices remaining

after the discard are light, the fraction of unrepresented light vertices is the fraction of

unrepresented vertices remaining in Scsize. Finally, if we suppose that the clustering

in Phase V is valid, this fraction is the fraction of vertices v remaining in Scsize with

π(v) = ⊥.

Thus less than an ε′/2-fraction of the vertices remaining in Scsize are assigned

π(v) = ⊥, with probability greater than 1−0.06 altogether, and with probability greater

than 1− 0.01 conditioning on relevant previous desirable events.

Validity of the clustering in Phase VI The argument regarding the clustering in

Phase III applies also to the clustering of vertices in Phase VI, using the representatives

in R. If the representatives in R are all non-heavy, then with probability greater than

1 − 0.005, every one of the non-heavy vertices being clustered in Phase VI will be

assigned the correct cluster, or assigned ⊥ if their cluster is not represented in R. With

probability greater than 1−0.005 all vertices sampled in Phase VI have degree estimates

correct to within ε′, so no clustering is attempted of heavy vertices. Thus the clustering

in Phase VI is valid with probability greater than 1− (0.03 + 0.01) = 1− 0.04 overall,

and with probability greater than 1− 0.01 conditioned on relevant previous desirable

events occurring.

Monotone agreement of G with G′′ Suppose that G′′ is determined uniquely, and

that the clustering in Phase VI is valid. When both these events occur, Phase VI does

not reject, because non-heavy vertices in G are in monotone agreement with G′, and

the test only checks vertices (correctly) determined to be non-heavy.

Thus G is not rejected in Phase VI with probability greater than 1− 0.24 overall,

and with probability 1 conditioning on relevant previous desirable events.

The conjunction of all desirable events above occurs with probability greater than

1− 0.305 > 2/3, so the test indeed accepts with sufficient probability.

Note. As in Section 4.5, the large deviation bounds are applied as though the vertex

samples are independent, while when a set of vertices is sampled without repetitions, this

is not the case. However, such bounds are even tighter for samples without repetitions,

so such use is justified.

Soundness of the test

Lemma 4.6.16. If G is ε-far from Πq, then it is rejected by Algorithm 4.4 with probability

at least 2/3.

Proof. We prove that if the test accepts with probability at least 1/3, then G cannot

be ε-far from Πq .

If G has average degree over (0.5 + 2∆− ε)n, then it isn’t ε-far from Πq and the

claim holds trivially. We thus assume G’s average degree is under (0.5 + 2∆− ε)n. In

this case, G is accepted with at most a small constant probability in Phase I; it thus

remains to prove that if the other phases accept with probability at most slightly lower

than 1/3, then G cannot be ε-far from Πq .

If G has more than ε′n vertices with degree under (0.5−∆− 2ε′)n, then it is rejected

by Phase II, the vertex degree estimation phase, with probability greater than 3/4, and

the claim holds. Let us also assume, therefore, that G has at most ε′n vertices with

degree under (0.5−∆− 2ε′)n.

Let S′rep, R and G′′ be as determined in Phases III and IV. The clustering π they

induce is a clustering of at least (1− ε′)n of the vertices with degree at most 0.52− 3ε′

(that is, at most ε′ of these have π(v) = ⊥), as otherwise Phase V rejects with high

probability. Also, each cluster contains at most (1+2ε′) ·n/m such vertices, as otherwise

Phase VI rejects with high probability. Finally, the number of edges missing between

clusters, which are connected in G′′, is at most $\varepsilon'\binom{n}{2}$, as otherwise, again, Phase V

rejects with high probability.

To complete the proof we show that under the above conditions, the graph is close

to satisfying either (C1) or (C2), and consequently close to Πq . Indeed, suppose that

we modify G as follows:

1. Add the edges missing between clusters connected in G′′.

2. Move vertices with degree at most 0.52− 3ε′ from clusters with more than n/m

vertices to smaller clusters (at most 2ε′n such vertices need be moved), adding to

their neighborhoods those edges mandated by G′ for the new cluster.

3. Move vertices with degree at most 0.52 − 3ε′, which have π(v) = ⊥, into

clusters with fewer than n/m vertices (including possibly clusters not

represented in R), adding to their neighborhoods those edges mandated by G′ for

the new cluster.

4. Arbitrarily add edges to vertices with degree at least 0.52 − 3ε′ to make them

heavy (i.e. increase their degree to 0.52).

5. Distribute heavy vertices among clusters so that each cluster has exactly n/m

vertices (without making any actual edge changes).

The result of the above modifications is a partition into m equal-size cells, which

constitutes an approximate monotone blowup of G′. Finally, we connect heavy vertices

to low-degree vertices until they meet the minimum degree requirement in (C2). This

is possible, due to the fact that in a monotone blowup of G′ to order n with no heavy

vertices, each vertex has degree at least (0.5−∆)n, so any vertex in the modified G

with degree lower than (0.5−∆)n must be missing edges necessitated by G′; these

cannot be missing between pairs of non-heavy vertices, due to the monotone agreement

of G with G′, so they must be missing between heavy and non-heavy vertices.

Now, if at any point in the above operations we have added so many edges that the

average degree of the modified G surpasses 0.5 + 2∆, the graph satisfies Πq by condition

(C1), and we leave it as it is; otherwise, after all these operations, the graph must satisfy

(C2). Either way, the number of edges we have added is at most: $\varepsilon'\binom{n}{2}$ for the first

operation; $(2\varepsilon' n)\cdot(n-1) = 4\varepsilon'\binom{n}{2}$ for the second operation; $(\varepsilon' n)\cdot(n-1) = 2\varepsilon'\binom{n}{2}$ for

the third operation; $3\varepsilon' n^2 < 7\varepsilon'\binom{n}{2}$ for the fourth operation (for sufficiently large n); no

changes for the fifth operation on the list; and less than $\varepsilon' n\cdot(0.5-\Delta)n + n\cdot 2\varepsilon' n < 6\varepsilon'\binom{n}{2}$

for the final minimum degree increase (for sufficiently large n). In total,

less than $(1+4+2+7+6)\,\varepsilon'\binom{n}{2} = 20\varepsilon'\binom{n}{2} \le \varepsilon\binom{n}{2}$ edge additions are necessary to make

G satisfy either (C1) or (C2). G is therefore not ε-far from Πq.
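
The two "sufficiently large n" conditions in the tally above can be made explicit. The following is a sketch of the arithmetic only; the explicit thresholds are our own computation, and the last remark assumes that the parameters are chosen so that ε′ ≤ ε/20, as the closing inequality of the proof requires.

```latex
\[
3\varepsilon' n^2 < 7\varepsilon'\binom{n}{2} = \tfrac{7}{2}\,\varepsilon' n(n-1)
\iff 3n < \tfrac{7}{2}(n-1)
\iff n > 7,
\]
\[
\varepsilon' n\,(0.5-\Delta)n + 2\varepsilon' n^2 = (2.5-\Delta)\,\varepsilon' n^2
< 6\varepsilon'\binom{n}{2} = 3\varepsilon' n(n-1)
\iff (0.5+\Delta)\,n > 3 ,
\]
\[
\text{so both hold for } n \ge 8, \text{ and then }
(1+4+2+7+6)\,\varepsilon'\binom{n}{2} = 20\varepsilon'\binom{n}{2} \le \varepsilon\binom{n}{2}
\text{ whenever } \varepsilon' \le \varepsilon/20 .
\]
```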

Observation 4.6.17. The test in Algorithm 4.4 can be generalized to the case in which

n/m is not an integer. The modification required, in broad terms, is to check that about

n mod m of the vertices have full-degree, account for them as an (m+ 1)th cluster, and

set the other cluster sizes accordingly. This may also necessitate special handling of

heavy vertices with full or almost-full degree (which cannot be told apart from the ‘real’

n mod m full-degree vertices), separately from heavy vertices with significantly lower degree.

We do not go into the details.

Theorem 4.6 is now proven by combining the lower bound of Lemma 4.6.9, and the

upper bound established through the test in Algorithm 4.4 — valid as per Lemma 4.6.15

and Lemma 4.6.16 — if m divides n, and by their variations taking Observation 4.6.17

into account, otherwise.

4.7 A hierarchy of one-sided-testable properties

We continue Section 4.5 and Section 4.6 with a third hierarchy theorem for dense graph

properties. In this section, we modify the construction in Section 4.5, so as to make

the properties amenable to a one-sided test at an arbitrary query complexity, while

any significant reduction in the number of queries precludes even two-sided testing —

in a sense, a tighter hierarchy. Unfortunately, while the construction maintains the

PTIME-decidability of the property itself, it seems to make testing the property less

computationally efficient, that is, we are not able to present a test whose running time

is polynomial in its number of queries — as a test seems to need to decide what is

essentially a subgraph isomorphism problem.

Theorem 4.7. There exists a constant ε4.7 > 0, such that for every reasonable q(·) (in

the sense of Definition 4.5.1), there exists a property of dense graphs that is testable

with one-sided error using O(q(n)/ε²) queries (or O(q(n)) queries ignoring ε), but not

ε-testable using o(q(n)) queries, even allowing two-sided error, for ε ≤ ε4.7. Furthermore,

if q(n) is computable from n in poly(n) time, then the property is PTIME-decidable.

4.7.1 Property construction

Thinking about how to obtain a one-sided-testing hierarchy theorem, we naturally ask

ourselves whether Algorithm 4.3, the test used for the upper bound in Section 4.5, can

be made one-sided. The reason it cannot is that we require the clusters in the blown-up

graphs there to be of equal or almost-equal size; and if the cluster sizes are off, with

some clusters being significantly larger than others, then the graph would be far from

an appropriate blow-up (since vertices cannot be moved from one cluster to another

without many edge changes). A test cannot avoid, therefore, having to estimate these

sizes — and this estimate can be invalid, as the test’s sampled vertices may come from

just a few of the clusters. In light of this fact, let us forego the strict requirement on

cluster sizes, and only require that a graph be a generalized blowup (see Definition 2.3.6),

with potentially highly-disparate cluster sizes, but keeping all clusters present for easy

deterministic decision. (This modification will also allow us to handle more cleanly

the possibility of n not being an integer multiple of the size of the pre-blown-up

graph). We can then make a similar argument to that in Section 4.5, with the necessary

allowance for this generalization:

Lemma 4.7.1. There exists a global constant c4.7.1 > 0, such that for every n, ε, α and

every pair of (unlabeled) graphs (G1, G2) of order n, with G1 being α-dispersed, the

following holds: If G1 and G2 are ε-far from each other, then any (relaxed or proper)

generalized blowup of G2 to order n is at least c4.7.1 · α² · ε-far from any balanced blowup of

G1.

Before proceeding to the proof, we recall having established in Lemma 4.5.13 that

a balanced blowup (rather than merely a relaxed generalized blowup) of G2 must be

c4.5.13 · α · ε-far from a balanced blowup of G1. To prove that a relaxed generalized

blowup G′2 of G2 is also far from G′1, we will want to relate the “degree of imbalance”

of a blowup to its distance from any balanced blowup. To do so, we first formalize this

concept.

Definition 4.7.2. Let G be a labeled graph of order n and G′ a relaxed generalized

blowup of G to order n′. The (relative) weight ρi of the cluster Vi of the ith vertex of G

is the fraction |Vi|/n′.

Definition 4.7.3. Let G be a graph of order n and G′ a relaxed generalized blowup of

G to order n′, with t = ⌊n′/n⌋. G′ is said to be a δ-balanced blowup of G if the variation

distance between the relative weights of clusters in the blowup, and the relative weights

of clusters in a balanced blowup, is at most δ (minimizing over all possible choices of the n′ mod n

larger clusters in a balanced blowup), i.e.

\[
\min\left\{ \sum_{i=1}^{n} \left| \rho_i - \frac{s_i}{n'} \right| \;\middle|\; s_i \in \{t, t+1\} \;\wedge\; \sum_{i=1}^{n} s_i = n' \right\} \le 2\delta .
\]

Notes.

– A balanced blowup is 0-balanced, and any relaxed generalized blowup is 1-balanced.

– If n divides n′, the condition for δ-balance is merely $\sum_i \left|\rho_i - \frac{t}{n'}\right| = \sum_i \left|\rho_i - \frac{1}{n}\right| \le 2\delta$.
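
To make Definition 4.7.3 concrete, the following minimal sketch computes the smallest δ for which a given list of cluster sizes is δ-balanced. The function name `imbalance` and the use of exact fractions are our own choices for illustration. Since exactly n′ mod n clusters must receive size t + 1 and the sum is separable, the minimum is attained by giving size t + 1 to the clusters that benefit most from it.

```python
from fractions import Fraction


def imbalance(cluster_sizes):
    """Return the smallest delta such that the blowup with these cluster
    sizes is delta-balanced in the sense of Definition 4.7.3."""
    n = len(cluster_sizes)            # order of the pre-blowup graph
    n_prime = sum(cluster_sizes)      # order of the blowup
    t, extra = divmod(n_prime, n)     # t = floor(n'/n), extra = n' mod n
    rho = [Fraction(size, n_prime) for size in cluster_sizes]
    low = Fraction(t, n_prime)        # weight of a size-t cluster
    high = Fraction(t + 1, n_prime)   # weight of a size-(t+1) cluster
    # Cost of each cluster if it is designated size t.
    base = [abs(r - low) for r in rho]
    # Saving obtained by designating a cluster size t+1 instead of t;
    # exactly `extra` clusters get size t+1, so take the largest savings.
    saving = sorted((abs(r - low) - abs(r - high) for r in rho), reverse=True)
    l1_distance = sum(base) - sum(saving[:extra])
    return l1_distance / 2            # variation distance is half the L1 distance
```

For instance, cluster sizes (3, 2, 2) form a balanced blowup and give imbalance 0, while sizes (4, 1, 1) give imbalance 1/3.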

Proof of Lemma 4.7.1. Let G′1 be a balanced blowup of G1 (with clusters of size either

t or t + 1) and G′2 be a relaxed generalized blowup of G2. Let us label the vertices of

both graphs, so that we may denote V (G1) = V (G2) = [n] (this also induces a labeling

of the blowup clusters).

We distinguish two cases, based on the “degree of imbalance” in the blowup of G2

into G′2. Our threshold δ-balance value for the analysis will be δ = c4.5.13 · α · ε/5.

Suppose, first, that G′2 is a δ-balanced blowup of G2. If that is the case, G′2 is in fact

4δ-close to a balanced blowup of G2: For the choice of si achieving the variation distance,

one simply moves a 2δ fraction of the vertices between clusters of G′2, so that the cluster

sizes become exactly the chosen si values. Switching the cluster of a single vertex entails

as many as n′ − 1 edge changes, for a total of $2\delta n'(n'-1) = 4\delta\binom{n'}{2}$ over all vertices moved.

We now use the triangle inequality to conclude that the δ-balanced blowup is at least

(c4.5.13 · α · ε − 4δ)-far from G′1; and we note that c4.5.13 · α · ε − 4δ = δ = c4.5.13 · α · ε/5.
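
The triangle inequality step can be written out as follows. This is a sketch in which B denotes the balanced blowup of G2 obtained by moving vertices as described, and dist(·,·) the normalized edit distance between graphs; both symbols are our own shorthand.

```latex
\[
\mathrm{dist}(G'_2, G'_1)
\;\ge\; \mathrm{dist}(B, G'_1) - \mathrm{dist}(G'_2, B)
\;\ge\; c_{4.5.13}\cdot\alpha\cdot\varepsilon - 4\delta
\;=\; 5\delta - 4\delta
\;=\; \delta ,
\qquad \text{where } \delta = \frac{c_{4.5.13}\cdot\alpha\cdot\varepsilon}{5} .
\]
```

The first term is bounded using Lemma 4.5.13 (two balanced blowups of ε-far graphs), and the second by the 4δ-closeness just established.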

We are therefore left with the case of G′2 not being δ-balanced. If, for every bijection

between G′1 and G′2, we were somehow able to pair the ‘excess’ vertices with other

distinct vertices, so that every pair is in the same cluster of G′2 but with different origins

in G′1, our proof would be concluded, as any such pair entails many discrepancies with

respect to the bijection.

Towards this end, note first that due to the δ-imbalance of G′2, for every choice of

blowup cluster sizes, more than a δ-fraction of the weight of clusters is excess weight

beyond the designated cluster weight, i.e. for every choice of $(s_i)_{i=1}^{n}$ corresponding to a

blowup, and denoting $I = \{\, i \in [n] \mid \rho_i > s_i/n' \,\}$, we have $\sum_{i \in I} (\rho_i - s_i/n') > \delta$.

Now consider some bijection π′ between the two blowups. Let $s^{\pi'}_i = t + 1$ if any G′1

clusters of size t + 1 are mapped to i in their entirety and t otherwise. Clearly, there

are at most n′ mod n indices i such that $s^{\pi'}_i = t + 1$, so there exists some choice of si's

corresponding to a blowup for which $s_i \ge s^{\pi'}_i$ for every i. Now, since for this choice we

have $\sum_{i \in I} (\rho_i - s_i/n') > \delta$, we also have, for the same I, $\sum_{i \in I} (\rho_i - s^{\pi'}_i/n') > \delta$.

We now wish to ‘pair up’ vertices from different G′1 clusters within clusters of G′2.

Consider some cluster i of G′2, with ρin′ vertices. The largest set of vertices in this

cluster with the same origin in G′1 is of size at most $s^{\pi'}_i$; consequently, cluster i has

at least $\tfrac{1}{2}(\rho_i n' - s^{\pi'}_i)$ pairs of vertices from different clusters. (To see why this is the

case, think about repeatedly removing arbitrary pairs of vertices in G′2 originating in

different clusters of G′1; eventually one is left with vertices in G′2 all from the same

cluster in G′1, and their number cannot exceed $s^{\pi'}_i$.) Over all clusters in I, we have

$\sum_{i \in I} \tfrac{1}{2}(\rho_i n' - s^{\pi'}_i) > \delta n'/2$ such pairs. Each pair is the cause of αtn > α · n′/2 distinct

discrepancies (as discussed in the proof of Lemma 4.5.13: the neighborhoods of the

two vertices must be made the same); the total number of discrepancies under π′ due to

all these pairs is at least $\delta/2 \cdot \alpha n'^2/2$. π′ was chosen arbitrarily, so the same minimum

number of discrepancies exists under any bijection between G′1 and G′2; thus the distance

between the two graphs is at least δα/4.
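
The pair-removal argument in the proof can be illustrated by a small simulation. This is only a sketch: the function name `cross_origin_pairs` and the representation of a cluster by a list of per-origin vertex counts are our own assumptions. It removes pairs in one particular order (always from the two largest remaining origin groups), whereas the argument in the text allows arbitrary pairs.

```python
import heapq


def cross_origin_pairs(origin_counts):
    """Repeatedly remove a pair of vertices with different origins from one
    cluster, and return the number of disjoint pairs removed.

    origin_counts[k] is the number of vertices in the cluster that
    originate in the k-th cluster of the other blowup."""
    # Max-heap of remaining group sizes (negated for heapq's min-heap).
    heap = [-count for count in origin_counts if count > 0]
    heapq.heapify(heap)
    pairs = 0
    while len(heap) >= 2:
        a = -heapq.heappop(heap)      # largest remaining origin group
        b = -heapq.heappop(heap)      # second largest origin group
        pairs += 1                    # remove one vertex from each
        if a > 1:
            heapq.heappush(heap, -(a - 1))
        if b > 1:
            heapq.heappush(heap, -(b - 1))
    return pairs
```

When the loop stops, all remaining vertices share one origin, so their number is at most the size of the largest origin group; hence the number of pairs is at least half of the cluster size minus that largest group, which is the bound used in the proof.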

The claim is now proven by setting c4.7.1 = c4.5.13/20 and noting that min(δ, δα/4) =

δα/4 = c4.5.13 · α² · ε/20.

A complexity-q property. Let Π′′ be as constructed in Subsection 4.5.1, a dispersed

PTIME-decidable property requiring Ω(n²) queries, and let m(n, q) be as in Definition 4.5.9.

We set $\Pi^q = \bigcup_{n \in \mathbb{N}} \Pi^q_n$, with $\Pi^q_n$ containing all (proper) generalized blowups

of graphs in $\Pi''_{m(n,q)}$. In other words, a graph in $\Pi^q_n$ has m non-empty clusters with

complete bipartite graphs between cluster pairs corresponding to pre-blowup edges.

Lemma 4.7.4. If q(n) is computable from n in poly(n) time, then Πq is PTIME-decidable.

Proof. The proof is very similar to that of Lemma 4.5.11: Since no two vertices of a

graph in Π′′ have the same neighborhood, one can easily reconstruct the original graph

given m non-empty clusters, regardless of their sizes (but assuming that q(n) itself can

be computed in polynomial time). Since Π′′ is in PTIME, one can then efficiently

decide whether the pre-blown-up graph satisfies it or not. Note that the fact that Πq

contains generalized blowups rather than relaxed generalized blowups is critical to this

argument, as without a vertex from every cluster, one would only be able to reconstruct

a subgraph of the original order-m graph, and might then need to decide an instance of

subgraph isomorphism.

4.7.2 A query complexity lower bound for the constructed property

Consider again the hard-to-test PTIME-decidable property Π′′ constructed in

Section 4.5.1 using dispersing augmentations. When used in Section 4.5, its query complexity

and features were sufficient for establishing a lower bound on testing its blowups. Our

analysis here will have to be a bit finer, as we will not be using a reduction proper —

neither from Πq of Section 4.5 nor from Π′′ of Section 4.5.1.

We go back, in fact, to the hard-to-test property guaranteed in Theorem 4.2 (from

which Π′′ is constructed using dispersal augmentation) denoting it Π′. We recall that

by Lemma 4.2.6, Ω(n²) queries are required to distinguish between distributions Gn,

a uniform distribution over graphs in Π′n, and Rn, a separating augmentation (as per

Definition 4.2.2) of a uniformly sampled graph of order ⌊(n − 1)/3⌋. Let us now carry

this result over to dispersal-augmented graphs.

Before stating our lemma, we first note that our graphs of order m(n, q) are now

the results of dispersing augmentations. Recalling the definition of these augmentations

(Definition 4.2.2), and denoting by m′(n, q) the order of a pre-augmented graph, we have

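$m = m' + \tfrac{3}{2} \cdot 2^{\lceil \log(2m'+1) \rceil}$. Since $m < 2^{1 + \lceil \log(2m'+1) \rceil}$, we have $\lfloor \log(m) \rfloor = \lceil \log(2m'+1) \rceil$,

so $m' = m - \tfrac{3}{2} \cdot 2^{\lfloor \log(m) \rfloor}$.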
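
The relation between m and m′ can be checked numerically. The sketch below assumes the reading m = m′ + (3/2) · 2^⌈log(2m′+1)⌉ with base-2 logarithms; the function names `augmented_order` and `pre_augmented_order` are hypothetical, introduced only for this illustration.

```python
def augmented_order(m_pre: int) -> int:
    """Order m of the dispersal-augmented graph, for pre-augmentation order m_pre >= 1."""
    # For any integer x >= 1, ceil(log2(x)) == (x - 1).bit_length(); here x = 2*m_pre + 1.
    k = (2 * m_pre).bit_length()
    # (3/2) * 2**k == 3 * 2**(k - 1), an integer since k >= 2 when m_pre >= 1.
    return m_pre + 3 * 2 ** (k - 1)


def pre_augmented_order(m: int) -> int:
    """Recover m_pre from m, assuming m is the order of some dispersal-augmented graph."""
    k = m.bit_length() - 1            # floor(log2(m))
    return m - 3 * 2 ** (k - 1)
```

For example, m′ = 1 gives ⌈log 3⌉ = 2 and m = 1 + 6 = 7, and conversely ⌊log 7⌋ = 2 recovers m′ = 7 − 6 = 1. The inverse is only meaningful for values of m that actually arise as augmented orders.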

Now, the indistinguishable distributions for our Πq are (for sufficiently large n):

R′n: A graph sampled from distribution Rm′(n,q), dispersal-augmented to order m(n, q),

then blown up to order n.

G′n: A graph sampled from distribution Gm′(n,q), dispersal-augmented to order m(n, q),

then blown up to order n.

Lemma 4.7.5. With R′n and G′n as per the above, any probabilistic oracle machine M

making o(q(n)) queries to its input graph satisfies $\Pr[M^{R'_n} = 1] = \Pr[M^{G'_n} = 1]$.

Proof. We repeat an argument used in proving Lemma 4.2.6: As both distributions

R′n and G′n are obtained by applying the same dispersing augmentation and blowup to

samples from Rm′ and Gm′ respectively, and as the result of each query to a dispersing

augmented graph depends on one or no edges of the original pre-augmented graph, and

the result of each query to a blowup depends on one or no edges of the pre-blown-up

graph — it suffices to prove the claim assuming queries are made to the original order-

m′ graphs from Rm′ or Gm′ respectively — rather than to blowups of their dispersing

augmentations. Lemma 4.2.6 establishes, specifically, that if o(q(n)) queries are made,

a machine has the same probability of accepting graphs from these two distributions.

Lemma 4.7.6. A graph sampled from R′n is $c_{4.7.1}\delta_{4.2.5}/(250 \cdot 64)$-far from Πq with probability 1 − o(1).

Proof. By Lemma 4.2.5, with probability at least 1 − o(1) a graph G sampled from Rm′ is $\delta_{4.2.5}$-far from Π′′m′. Supposing this is the case, consider some graph H ∈ Π′′m′. By Lemma 4.5.6, the dispersing augmentation of G will be $\delta_{4.2.5}/250$-far from the dispersing augmentation of H. Now, the blowup of the dispersing augmentation of G to order n is a balanced blowup of a 1/8-dispersed graph, so by Lemma 4.7.1, it is $c_{4.7.1}(1/8)^2 \cdot \delta_{4.2.5}/250$-far from any generalized blowup of the dispersing augmentation of H. The claim follows when recalling that Πqn is the set of all generalized blowups of dispersing augmentations of graphs in Π′′m′.

We can now prove the lower bound, setting $\varepsilon_{4.7} = c_{4.7.1}\delta_{4.2.5}/(250 \cdot 64)$:

Lemma 4.7.7. Any ε-test for Πq, for $\varepsilon \le \varepsilon_{4.7}$, makes Ω(q(n)) queries.

Proof. Let n be sufficiently large for Lemma 4.7.5 to hold, and suppose to the contrary that for some $\varepsilon \le \varepsilon_{4.7}$ there is an ε-test for Πq making o(q(n)) queries. Such a test accepts graphs in Πq, and in particular graphs sampled from G′n, with probability at least 2/3; by Lemma 4.7.5 it therefore also accepts a graph sampled from R′n with probability at least 2/3. Now, by Lemma 4.7.6, with probability 1 − o(1), a graph from R′n is $c_{4.7.1}\delta_{4.2.5}/(250 \cdot 64) = \varepsilon_{4.7}$-far from Πq, so the probability that the test accepts a graph sampled from R′n which is $\varepsilon_{4.7}$-far from Πq is at least 2/3 − o(1). Thus for every sufficiently large n there exists a specific graph in the support of R′n which is $\varepsilon_{4.7}$-far (and hence ε-far, as $\varepsilon \le \varepsilon_{4.7}$) from Πqn, and is accepted with probability over 1/2, a contradiction.

4.7.3 A one-sided test for the constructed property

Algorithm 4.5 will be the test achieving the upper bound.

Algorithm 4.5 A test for Πq

1: Compute m(n, q).
2: Uniformly sample a set S of Θ(m/ε) vertices.
3: Query the subgraph GS induced by S.
4: If GS is a relaxed generalized blowup of a graph in Π′′m to order |S|, accept. Otherwise reject.
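The sample-and-check structure of the algorithm can be sketched as follows. This is only an illustration under stated assumptions: `query` is a hypothetical edge oracle, `is_relaxed_blowup` is a hypothetical decision procedure for step 4 (the text does not give one in code form), and `c` stands in for the unspecified constant hidden in Θ(m/ε).

```python
import itertools
import math
import random


def blowup_test(n, m, eps, query, is_relaxed_blowup, c=4, rng=random):
    """Sketch of the sample-and-check test.

    n: order of the input graph; m: the value m(n, q), assumed precomputed.
    query(u, v) -> bool: hypothetical edge oracle for the input graph.
    is_relaxed_blowup(S, edges) -> bool: hypothetical procedure deciding whether
        the sampled subgraph is a relaxed generalized blowup of a graph in Pi''_m.
    c: hypothetical constant for the Theta(m / eps) sample size.
    """
    size = min(n, math.ceil(c * m / eps))
    S = rng.sample(range(n), size)            # uniform sample without repetition
    edges = {
        frozenset(pair)
        for pair in itertools.combinations(S, 2)
        if query(*pair)                        # one query per sampled pair
    }
    return is_relaxed_blowup(S, edges)         # accept iff the check passes
```

The number of queries is the number of pairs inside S, which is quadratic in the sample size.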

We stress that the test does not expect its sample to be a proper generalized blowup

of a graph in Π′′ to order n; that is, it may include merely a subset of the clusters of

such a blowup. Looking at the test, one may wonder whether it doesn’t, perhaps, accept

graphs which, despite being relaxed generalized blowups of graphs in Π′′, are still far

from proper generalized blowups (with all m clusters present); before proceeding to

proving the test’s validity, we allay this concern:

Lemma 4.7.8. Suppose n > 4m/ε, and let G1 and G2 be graphs of order m and n

respectively, such that G2 is ε-far from any proper generalized blowup of G1. Then G2

is ε/2-far from any relaxed generalized blowup of G1.

Proof. Let G′1 be a relaxed generalized blowup of G1 at minimum distance from G2. To

make G′1 into a proper generalized blowup of G1, one must ‘populate’ the missing G1-

vertex clusters with vertices from other clusters, which now have more than one vertex.

There are at most m−1 missing clusters, and moving a vertex from one cluster to another

requires at most n− 1 edge modifications. Thus the total number of edge modifications

required to populate all clusters is less than $(n-1)(m-1) < (n-1)\left(\frac{\varepsilon n}{4} - 1\right) < \frac{1}{2}\varepsilon\binom{n}{2}$, i.e. G′1 is ε/2-close to a proper generalized blowup of G1. It must therefore be the case that G2 and G′1 are at least ε/2-far.

As the test samples more than 4m/ε vertices, the case of graphs with less than

this many vertices is covered by the default behavior on graphs with too few vertices —

querying the entire graph and deciding deterministically (see Definition 2.1.3 and the

following discussion). It thus suffices if the test rejects graphs of higher order which are

ε/2-far from relaxed generalized blowups of graphs in Π′′.

Lemma 4.7.9. Algorithm 4.5 is a valid test for Πq with one-sided error, making O(q(n))

queries.

Proof. Clearly, a graph G satisfying Πq is accepted with probability 1, as G is in particular such a blowup, and being a relaxed generalized blowup of one of a set of graphs is hereditary: losing vertices simply means having smaller clusters (due to the relaxation the clusters may be reduced to having 0 vertices).

Now suppose that G is ε/2-far from Πqn; we assume without loss of generality that n > 4m/ε. Think of S as being sampled in 2m iterations, each adding O(1/ε) newly-sampled vertices to S. Let $S_i$ denote the sample in the ith iteration and let $S_{\le i} = \bigcup_{j \in [i]} S_j$. Consider $G_{S_{\le i}}$, after the ith iteration; suppose that it is a relaxed generalized blowup of a graph in Π′′m. In this case, Lemma 4.7.10 below guarantees that a uniformly sampled pair of vertices, when added to S, increases the number of clusters over the number in $G_{S_{\le i}}$ with probability Ω(ε); when this pair is sampled from V (G) \ S, the probability can only be higher. Thus with probability at least 2/3, at least one of the O(1/ε) pairs increases the number of clusters. Consequently, over all 2m iterations, our sampled subgraph has probability at least 1 − exp(−Ω(m)) > 2/3 of being rejected either for reaching more than m clusters in the subgraph, or for having

an induced subgraph which is itself not a relaxed generalized blowup of any graph in

Π′′m, discovered already in an early iteration.

Finally, the number of queries is $\binom{\Theta(m(n,q)/\varepsilon)}{2} = \Theta(m^2/\varepsilon^2) = \Theta(q(n)/\varepsilon^2)$ (see

Observation 4.5.10 regarding the last transition). Ignoring the dependence on ε, this is

indeed O(q(n)).

Lemma 4.7.10. Let G be a graph of order n > 2m/ε which is ε-far from Πqn, and let GS′ be the subgraph of G induced by a set of vertices S′ ⊆ V (G). Let m′ denote the number of clusters in GS′. Suppose that m′ ≤ m(n, q) and that GS′ is a relaxed generalized blowup of some graph in Π′′m. Then for a uniformly sampled pair of vertices u′, v′, there is a probability of at least ε/8 that S′ ∪ {u′, v′} induces a graph with more than m′ clusters.

Proof. We first apply Lemma 4.7.8: Since G is ε-far from any proper generalized blowup

of a graph in Π′′m, it is ε/2-far from any relaxed generalized blowup of a graph in Π′′m.

Now, let G′ ∈ Π′′m be the graph of which GS′ is a relaxed generalized blowup. We

note that, specifically, GS′ is a proper generalized blowup of an induced subgraph G′′ of

G′, with |V (G′′)| = m′.

Consider a clustering of all vertices of G using S′ as a signature, i.e. vertices with

the same neighbors in S′ are in the same cluster. Some of these clusters contain vertices

from S′ (let Cv denote the cluster containing v ∈ S′), and some may be new, with S′

neighborhoods differing from all existing vertices in S′. If G has at least εn/8 vertices

in new clusters, one of them is sampled with probability at least ε/8, and the claim

follows, since it will constitute a new cluster in the sampled subgraph.
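The clustering by S′-signatures used in this argument can be sketched as follows. This is a minimal illustration, assuming the graph is given as a dictionary of neighbor sets; the function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_signature(adj, S):
    """Group all vertices by their neighbourhood inside S.

    adj: dict mapping each vertex to the set of its neighbours.
    S: iterable of sampled vertices (the signature set).
    Returns a dict: signature (frozenset of neighbours in S) -> list of vertices.
    """
    S = set(S)
    clusters = defaultdict(list)
    for v in adj:
        clusters[frozenset(adj[v] & S)].append(v)
    return dict(clusters)


def new_cluster_vertices(adj, S):
    """Vertices outside S whose signature differs from that of every vertex of S."""
    S = set(S)
    existing = {frozenset(adj[u] & S) for u in S}
    return [v for v in adj if v not in S and frozenset(adj[v] & S) not in existing]


# Toy example: a blowup of a single edge (clusters {0, 1} and {2, 3})
# plus an isolated vertex 4, with signature set S' = {0, 2}.
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}, 4: set()}
clusters = cluster_by_signature(adj, [0, 2])
fresh = new_cluster_vertices(adj, [0, 2])
```

In the toy example, vertex 4 has an empty signature that matches neither sampled vertex, so sampling it would create a new cluster.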

If, on the other hand, there are few new clusters, the clustering is at “risk” of contradicting our assumption regarding G: If clusters Cu and Cv, for most pairs u, v ∈ S′, are mostly consistent with {u, v} with respect to the edge relation, then G can be made into a relaxed generalized blowup of G′′ using few modifications. Specifically, it must be the case that at least $\frac{1}{4}\varepsilon\binom{n}{2}$ pairs {u′, v′} with u′ ∈ Cu and v′ ∈ Cv for the corresponding u, v ∈ S′ have {u′, v′} ∈ E(G′′) iff {u, v} ∉ E(G′′); otherwise one can correct all these discrepancies, then move all new-cluster vertices to S′-vertex clusters, with at most n − 1 edge changes per vertex, for a total of $\frac{1}{8}\varepsilon(n-1)n = \frac{1}{4}\varepsilon\binom{n}{2}$ additional changes.

Consequently, when sampling two new additional vertices u′, v′ from S′ clusters (denoted Cu and Cv), with probability at least ε/4 we find that they do not agree with their clusters with respect to being an edge. It must then be the case that the number of clusters in $G_{S' \cup \{u', v'\}}$ increases when clustering according to the neighborhoods in S′ ∪ {u′, v′}.

Theorem 4.7 is now proven by a combination of the query complexity lower bound

of Lemma 4.7.7, the upper bound established through the valid test in Lemma 4.7.9,

and Lemma 4.7.4 regarding the decidability of Πq .

Chapter 5

Lower bounds for testing partite dense structures

5.1 Introduction and overview of results

While testing graphs has received the most attention in the research of combinatorial

property testing (specifically, testing graphs in the dense model), other dense structures

are also of interest. Some are strictly more expressive than graphs (see the discussion

of hypergraph partition properties in Chapter 6 below), some strictly less expressive,

such as bipartite graphs, and some have both restrictions and extensions of the power

of expression. This chapter considers the latter case: Bipartite graphs, but with edges

in multiple colors; and k-uniform hypergraphs which are also k-partite (referred to as

k-graphs for short throughout this section).

For strictly less expressive structures — in the same testing model essentially, the

dense model in our case — upper bounds on testing more expressive structures generally

apply, while lower bound results for more expressive structures come into question, as one

may expect to provide stronger upper bounds by exploiting the structural restrictions.

Such expectation was indeed justified for the case of bipartite graphs, with properties

defined by a family of forbidden subgraphs. In general graphs, testing arbitrary such

properties (without relying on the size of the input graph) requires the use of Szemerédi’s

regularity lemma, resulting in extremely poor dependence of the query complexity on

ε. While the known lower bounds are not at all close to the tower functions incurred

by the use of regularity, they are super-polynomial, and there is certainly reason to

suppose that the minimum query complexity of such tests is in fact much higher. As it

turns out, in bipartite graphs this is not the case.

Fischer and Newman showed, in [FN01], a first upper bound for testing forbidden

induced subgraphs in bipartite graphs (studying them as binary matrices, see below)

— although this was doubly-exponential in 1/ε and was not known to contradict the

established lower bounds, it was a non-regularity based technique, that could not be

applied as such to general graphs. It was improved upon in Alon, Fischer and Newman’s

[AFN07], yielding a polynomial upper bound on such tests. Interestingly, the main

tool of [AFN07] is a sort of a regularity lemma — but with a conditional: Either a

bipartite graph has a relatively small “regular” partition (we shall not go into the

details of the definition of regularity here) of size polynomial in the regularity parameter,

or every possible small induced subgraph appears in the graph, in significantly many

copies. In the former case one can construct a ‘signature’ of the regular partition using

polynomially many queries, and decide based on this signature; in the latter case, a

uniformly-sampled subgraph will, with high probability, contain a forbidden structure.

One would hope that such a technique may apply in somewhat more general contexts:

Instead of just bipartite graphs, graphs with edge colors (or alternatively, matrices

over a finite domain larger than {0, 1}); or in higher dimension: k-edges instead of

two-edges, k-partite hypergraphs instead of bi-partite graphs (or alternatively, binary

tensors instead of matrices). This was an open question posed in [AFN07].

After some efforts attempting to extend the upper bound further, research has

yielded the opposite — an establishment of lower bounds, precluding this possibility:

Theorem 5.1. There exists a 2-colored bipartite graph F with two vertices per part, such that any ε-test of 3-colored bipartite graphs for being free of having F as an induced subgraph, performs no less than $(c/\varepsilon)^{c \cdot \ln(c/\varepsilon)}$ queries for some global constant c.

Theorem 5.2. There exists a 3-uniform tripartite hypergraph F with two vertices in each part, such that every ε-tester of 3-uniform tripartite hypergraphs for being free of copies of F, as an induced sub-hypergraph, performs no less than $(c/\varepsilon)^{c \cdot \ln(c/\varepsilon)}$ queries for some global constant c.

The proofs use constructions based on adaptations of known lower bounds for testing

forbidden subgraphs (specifically, triangles) for general graphs; see also Section 3.5 in

this work, for a slight improvement which is also applicable here.

5.2 Additional preliminaries

The following table summarizes the specifics of dense model testing, for colored bipartite

graphs and for k-graphs, in comparison with the case of (general, non-partite) graphs:

Structures | Graphs | σ-Colored (Complete) Bipartite Graphs | k-Graphs
query | “{x, y} ∈ E?” | “what is col(x, y)?” with col values ranging in {0, . . . , σ − 1} | “(x1, . . . , xk) ∈ E?”
maximum distance between structures | $\binom{n}{2}$ | $n^2$ | $n^k$
vertex set(s) | V | U, V | V1, . . . , Vk
meaning of the order n | $\lvert V \rvert$ | $\lvert U \rvert = \lvert V \rvert$ | $\lvert V_1 \rvert = \ldots = \lvert V_k \rvert$

A matrix M over {0, 1} can be associated with the (labeled) bipartite graph G = (U, V, E), with

$U = \{u_1, \ldots, u_n\}$

$V = \{v_1, \ldots, v_n\}$

$E(G) = \{(u_i, v_j) \in U \times V \mid M(i, j) = 1\}$

that is, the bipartite graph whose adjacency matrix is M. Similarly, a matrix over a larger domain {0, . . . , σ − 1} can be associated with an appropriate σ-colored bipartite graph; and a k-dimensional tensor T over {0, 1} can be associated with the k-graph of which it constitutes the adjacency tensor.
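The association between matrices and (colored) bipartite graphs can be made concrete with a short sketch. It assumes 0-based indices, so the pair (i, j) stands for $(u_{i+1}, v_{j+1})$; the function names are illustrative only.

```python
def bipartite_edges(M):
    """Edge set {(i, j) : M[i][j] == 1} of the bipartite graph with adjacency matrix M."""
    return {(i, j) for i, row in enumerate(M) for j, val in enumerate(row) if val == 1}


def colored_bipartite(M):
    """Colour function col(u_i, v_j) = M[i][j] for a matrix over {0, ..., sigma - 1}."""
    return {(i, j): val for i, row in enumerate(M) for j, val in enumerate(row)}


binary = [[0, 1],
          [1, 1]]
three_colored = [[0, 2],
                 [1, 0]]
edges = bipartite_edges(binary)
colors = colored_bipartite(three_colored)
```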

The conceptual similarity between matrices or tensors, and representations of colored

bipartite graphs or of k-graphs, will be used implicitly throughout this chapter. Note,

however, that properties are closed under relabeling, i.e. a permutation of the indices on

the axes of the matrix/tensor in all dimensions. Thus, when we refer to ‘submatrices’

of a bipartite graph’s adjacency matrix, we are actually referring to subgraphs — the

submatrix coordinates may be selected irrespectively of the order of coordinates in the

adjacency matrix.

Finally, our lower bound constructions also involve the following:

Definition 5.2.1. A cyclic k-partite digraph $G = (V_1, \ldots, V_k, E)$ is a k-partite digraph in which every edge in E extends from $V_i$ to $V_{(i \bmod k)+1}$ for some i ∈ [k].

5.3 A lower bound for colored bipartite graphs

Our proof for Theorem 5.1 will be based on the argument that any test (not just tests

with one-sided error) must, in some sense, find copies of forbidden subgraphs; see the

discussion in Section 3.3, and specifically, Corollary 3.3. We will thus be proving the

following key lemma, regarding the scarcity of forbidden subgraphs:

Lemma 5.3.1. There exists a (2, 2) bipartite graph F, such that for every ε and for every $n > 16(c/\varepsilon)^{-c \cdot \ln(c/\varepsilon)}$, there exists a 3-colored bipartite graph G which is ε-far from being free of F, while the fraction of (2, 2) subgraphs of G which are copies of F is at most $(c/\varepsilon)^{-c \cdot \ln(c/\varepsilon)}$, for some global constant c.

In leading up to a proof of this lemma we shall begin with a simple and rough

construction: Describing how the adjacency matrix of a colored bipartite graph can

represent partite cyclic digraphs, with the representation preserving the distributions of

induced substructures in the digraph; we shall then construct 4-partite cyclic digraphs

in which induced directed 4-cycles are super-polynomially rare.

Such a construction will prove a weaker version of Lemma 5.3.1: For one, we will

have used many more than 3 colors — the representation of a digraph will not be very

terse; also, we will have used numerous forbidden submatrices, as a submatrix of the

digraph representation will also contain information about additional edges to those

constituting a 4-cycle. We will then proceed to make several refinements:

1. One may construct digraphs with few 4-cycles as described above, with the

additional constraint that the first three edge layers are identical.

2. One may construct 4-partite digraphs as described in item 1, with the additional

constraint that the edge layers are symmetric with respect to a relevant ordering

of the vertices in each part.

3. The construction for item 2 can be shown to satisfy the additional constraint

that no pair of vertex indices is such that its corresponding pairs of vertices are

connected in all four edge layers.

These successive refinements will bring every pair of vertex indices j1, j2 to have

only three possible edge configurations; consequently, we will only need three colors for

the matrix representation of the digraph, and only one forbidden submatrix (i.e. only

one forbidden subgraph).

5.3.1 Representing cyclic partite digraphs by matrices

Given a cyclic k′-partite digraph, we decompose its edges into k′ bipartite digraphs

between pairs of cycle-consecutive parts. The edge relation between each of these pairs

can be thought of in terms of its binary adjacency matrix, leading to the following

representation:

Definition 5.3.2. Let $G = (V_1, \ldots, V_{k'}, E)$ be a cyclic k′-partite digraph, with k′ vertex sets of size n each, where $V_i = (v_{i,1}, \ldots, v_{i,n})$. The matrix representation of G, denoted CM(G), is the matrix of order n, over a domain of size $2^{2k'}$ (the set of cell colors), corresponding to all possible combinations of the following 2k′ binary values: For M = CM(G), each cell $M(j_1, j_2)$ has a distinct color bit for each one of the k′ edges $(v_{1,j_1}, v_{2,j_2}), \ldots, (v_{k'-1,j_1}, v_{k',j_2}), (v_{k',j_1}, v_{1,j_2})$, and another bit for each one of the k′ edges $(v_{1,j_2}, v_{2,j_1}), \ldots, (v_{k'-1,j_2}, v_{k',j_1}), (v_{k',j_2}, v_{1,j_1})$. Each bit is set to 1 if its respective edge exists, and to 0 otherwise.
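The matrix representation can be sketched in code as follows. The bit layout is an assumption made for illustration (the definition only requires 2k′ distinct bits per cell): bit i records the layer-i edge from index j1 to index j2, and bit k + i records the layer-i edge from j2 to j1. Parts and vertex indices are 0-based here.

```python
def matrix_representation(k, n, edges):
    """Build CM(G) for a cyclic k-partite digraph with parts of size n.

    edges: set of (i, a, b), meaning an edge from vertex a of part i to vertex b
           of part (i + 1) % k, with 0-based part and vertex indices.
    Cell M[j1][j2] packs 2k bits (assumed layout):
      bit i     is set iff (i, j1, j2) is an edge,
      bit k + i is set iff (i, j2, j1) is an edge.
    """
    M = [[0] * n for _ in range(n)]
    for j1 in range(n):
        for j2 in range(n):
            cell = 0
            for i in range(k):
                if (i, j1, j2) in edges:
                    cell |= 1 << i
                if (i, j2, j1) in edges:
                    cell |= 1 << (k + i)
            M[j1][j2] = cell
    return M


# A single edge from vertex 0 of the first part to vertex 1 of the second part.
M = matrix_representation(4, 2, {(0, 0, 1)})
```

Each edge is recorded in two cells, once in M[j1][j2] and once in M[j2][j1], which is why modifying one cell touches the representation of several edges.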

Our lower bound construction utilizes cyclic 4-partite digraphs which are far from not

containing a (directed) 4-cycle, yet have few copies of it; we consequently set henceforth

k′ = 4. The reason for this choice of the number of parts is that 4 is the lowest even

number of parts with an induced subgraph for which testing freeness is hard — as

described in Alon and Shapira’s [AS04b]. Our matrix representations CM(·) therefore have cells with $2^{2k'} = 2^8 = 256$ possible values.

The forbidden submatrices

Querying a matrix cell at (j1, j2) yields information about edges in all four layers;

querying a 2× 2 submatrix with coordinates (j1, j3)× (j2, j4) yields information about

several directed 4-cycles, one of which is C = (v1,j1 , v2,j2 , v3,j3 , v4,j4). We note that for

every 4-cycle of G there is a choice of j1, . . . , j4 for which C as defined above corresponds

to that cycle. We thus only need to forbid 2× 2 submatrices witnessing the existence

of the four edges of the single directed cycle C associated with a given submatrix.

There are many possible such 2 × 2 submatrices, as the existence of any of the rest of the $(k'/2)^2 \cdot 2k' - k' = 28$ edges represented in the submatrix cells does not affect the presence of C. The forbidden submatrices are therefore the $2^{28}$ matrices in which the four color bits for the edges of C are set.
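The check that a 2 × 2 submatrix witnesses the cycle C, and the count of forbidden submatrices, can be sketched as follows. The bit layout is an assumption for illustration (0-based layers; bit i of cell M[r][c] records the layer-i edge from index r to index c, and bit K + i the layer-i edge from c to r), and the function name is hypothetical.

```python
K = 4  # number of parts k'


def witnesses_cycle(M, j1, j2, j3, j4):
    """True iff the submatrix on rows (j1, j3) and columns (j2, j4) has the four
    colour bits of C = (v_{1,j1}, v_{2,j2}, v_{3,j3}, v_{4,j4}) set."""
    e12 = (M[j1][j2] >> 0) & 1          # forward bit, layer 1: v_{1,j1} -> v_{2,j2}
    e23 = (M[j3][j2] >> (K + 1)) & 1    # flipped bit, layer 2: v_{2,j2} -> v_{3,j3}
    e34 = (M[j3][j4] >> 2) & 1          # forward bit, layer 3: v_{3,j3} -> v_{4,j4}
    e41 = (M[j1][j4] >> (K + 3)) & 1    # flipped bit, layer 4: v_{4,j4} -> v_{1,j1}
    return bool(e12 and e23 and e34 and e41)


bits_in_submatrix = 4 * 2 * K        # four cells with 2k' bits each
free_bits = bits_in_submatrix - K    # bits that do not belong to C
num_forbidden = 2 ** free_bits       # one forbidden matrix per setting of the free bits

# A matrix containing exactly the four bits of the cycle on indices (0, 1, 2, 3).
M = [[0] * 4 for _ in range(4)]
M[0][1] |= 1 << 0
M[2][1] |= 1 << (K + 1)
M[2][3] |= 1 << 2
M[0][3] |= 1 << (K + 3)
```

Each of the four cells of the submatrix contributes exactly one bit of C, and the remaining 28 bits are unconstrained.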

Note that in some cycles of G it may be the case that j1 = j3 and/or j2 = j4. We

refer to such cycles as degenerate; our construction and our arguments below will only

involve graphs with no degenerate cycles, so we may disregard these.

For every copy of a (non-degenerate) 4-cycle in G, there exists exactly one order-2

forbidden submatrix in CM(G) (recall that the submatrix may appear permuted in rows

or columns). This is true despite the fact that it is possible to infer the existence of a

4-cycle also from other submatrices of CM(G). In other words, a selection of an order-2 submatrix of CM(G), and a check of whether its C exists, correspond to a selection of four vertices in the four parts of G and a check of whether they form a (non-degenerate) cycle. With n = |Vi| as the size of each Vi, there are $(n(n-1))^2$ such possible choices.

5.3.2 An initial hard-to-test matrix

Definition 5.3.3. The trivial integer solutions to the equation $x_1 + x_2 + \ldots + x_r = r \cdot x_{r+1}$ are those in which all of $x_1, \ldots, x_r$ are equal.

Lemma 5.3.4 ([Alo02, Lemma 3.1] and [AS04b, Lemma 6.1]). For every natural r ≥ 2, and for every positive integer m, there exists a subset $X_m \subseteq [m]$, of size at least $\exp\left(-10\sqrt{\ln(m)\ln(r)}\right) \cdot m$, with no non-trivial solution to the equation $x_1 + x_2 + \ldots + x_r = r \cdot x_{r+1}$.
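For small sets, the absence of non-trivial solutions can be checked by brute force. The following sketch is only a verification aid with an illustrative function name; it does not construct the large sets whose existence the lemma guarantees.

```python
from itertools import combinations_with_replacement


def has_nontrivial_solution(X, r):
    """True iff some x_1, ..., x_r, x_{r+1} in X, with x_1, ..., x_r not all equal,
    satisfy x_1 + ... + x_r = r * x_{r+1}."""
    targets = {r * x for x in X}
    for combo in combinations_with_replacement(sorted(X), r):
        if len(set(combo)) > 1 and sum(combo) in targets:
            return True
    return False


# {1, 2, 4} fails for r = 3, since 1 + 1 + 4 = 3 * 2; {1, 2, 5} passes.
bad = has_nontrivial_solution({1, 2, 4}, 3)
good = has_nontrivial_solution({1, 2, 5}, 3)
```

For r = 2 the condition reduces to containing no three-term arithmetic progression.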

Fix r = 3 and ε′ = 8ε. Let m be the maximum possible satisfying $\varepsilon' m < 7 \cdot 2^{-1} \cdot 4^{-4} |X_m|$, obtaining, for an appropriate constant c, the bound $m \ge (c/\varepsilon')^{c \cdot \ln(c/\varepsilon')}$.

Using such a set X = Xm, we construct a cyclic 4-partite digraph T : The four parts

of T ’s vertex set, V1, . . . , V4, have cardinalities m, 2m, 3m, 4m respectively. For every

i ∈ {1, 2, 3}, j ∈ [im] and x ∈ X, T has the edge $(v_j, v_{j+x})$ between $V_i$ and $V_{i+1}$; T also has the edges between $V_4$ and $V_1$ of the form $(v_{j+3x}, v_j)$, for every x ∈ X and j ∈ [m].

As one may verify (see [AS04b, Lemma 6.2]), E(T ) contains m|X| edge-disjoint

copies of the directed 4-cycle, formed by 4m|X| edges, and no other directed 4-cycles;

T’s total number of edges is $(1 + 2 + 3 + 1) \cdot |X| \cdot m > 2 \cdot 4^4 \varepsilon' m^2$. For our purposes we

would like all parts Vi to have the same size, so we add isolated vertices making every

Vi of size exactly 4m. Let T1 be the graph resulting from this addition. We note that

all cycles of T1 are non-degenerate.
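The construction of T and the claim about its directed 4-cycles can be checked on a toy instance. This is a brute-force sketch with illustrative function names; the choice m = 5 and X = {1, 2, 5} is an assumption for the example (a small set with no non-trivial solution for r = 3), not the set used in the proof.

```python
def build_T(m, X):
    """Edges of T as (i, a, b): an edge from vertex a of V_i to vertex b of the
    next part (V_{i+1}, or V_1 when i == 4); parts and vertices are 1-based."""
    edges = set()
    for i in (1, 2, 3):
        for j in range(1, i * m + 1):
            for x in X:
                edges.add((i, j, j + x))
    for j in range(1, m + 1):
        for x in X:
            edges.add((4, j + 3 * x, j))
    return edges


def directed_4_cycles(edges):
    """All (j1, j2, j3, j4) with v_{1,j1} -> v_{2,j2} -> v_{3,j3} -> v_{4,j4} -> v_{1,j1}."""
    out = {i: {} for i in (1, 2, 3, 4)}
    for i, a, b in edges:
        out[i].setdefault(a, []).append(b)
    cycles = []
    for j1, successors in out[1].items():
        for j2 in successors:
            for j3 in out[2].get(j2, []):
                for j4 in out[3].get(j3, []):
                    if j1 in out[4].get(j4, []):
                        cycles.append((j1, j2, j3, j4))
    return cycles


m, X = 5, {1, 2, 5}
edges = build_T(m, X)
cycles = directed_4_cycles(edges)
cycle_edges = {
    e
    for (j1, j2, j3, j4) in cycles
    for e in ((1, j1, j2), (2, j2, j3), (3, j3, j4), (4, j4, j1))
}
```

On this instance there are 7m|X| edges and m|X| cycles, every cycle uses a single step x throughout, and no edge is shared between two cycles.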

Lemma 5.3.5 (special case of [AS04b, Lemma 6.3]). Let K = (V(K), E(K)) be a digraph and let T = (V(T), E(T)) be an s-factor blowup of K. Let R be a subset of the set of edges of T, and suppose that each copy of K in T contains at least one edge of R. Then $|R| > |E(T)|/|E(K)|^2 > |E(T)|/|V(K)|^4$.

Now let G1 denote an s-factor blowup of T1, with $s = \lfloor n/(4m) \rfloor$. We have $|E(G_1)| \ge s^2 \cdot 2 \cdot 4^4 \varepsilon' m^2 > 4^2 \varepsilon' n^2$. Since E(T1) consists of edge-disjoint 4-cycles, E(G1) consists of edge-disjoint s-blown-up 4-cycles. By Lemma 5.3.5, with a 4-cycle as K, at least a $\frac{1}{|E(K)|^2} = \frac{1}{4^2}$-fraction of the edges of each of these s-blown-up 4-cycles must be removed so as to remove all 4-cycles from G1; G1 is thus ε′-far from being 4-cycle-free. On the other hand, as |X| ≤ m, G1 has $m|X| \cdot s^4 \le m^2 s^4 < 4^4 n^4/m^2$ copies of the 4-cycle. One can also verify that all cycles of G1 are non-degenerate.

We must now transform the argument regarding the scarcity of forbidden subgraphs

in G1 to apply to forbidden submatrices in CM(G1).

Proposition 5.3.6. For $\sigma = 2^8$ there exists a finite set F of σ-colored order-2 matrices, such that for every ε and $n > (c/\varepsilon)^{c \cdot \ln(c/\varepsilon)}$, there exists a σ-colored matrix M which is ε-far from being free of members of F, and yet, the fraction of order-2 submatrices of M which are copies of a member of F is at most $(c/\varepsilon)^{-c \cdot \ln(c/\varepsilon)}$ for some global constant c.

Proof. Let M = CM(G1), and set the family of forbidden matrices to be the $2^{28}$ matrices defined above.

To prove the second part of the claim we recall that there is only one copy of a forbidden matrix in CM(G1) for every copy of a 4-cycle in G1. Only $c_1 n^4/m^2$ of the $(n(n-1))^2$ possible directed non-degenerate 4-cycles with vertices in consecutive parts appear in G1, so no more than an $8c_1/m^2$ fraction of the $(n(n-1))^2$ submatrices of CM(G1) of order 2 are copies of forbidden matrices.

For the first part of the claim, we note that by modifying a matrix cell one affects

the representation of at most 8 edges of G1. Thus, unless at least $\varepsilon' n^2/8 = \varepsilon n^2$ cells are modified, more than $(1 - \varepsilon')n^2$ of the edges of G1 have their two representing color bits (i.e. in both the cells CM(G1)(i, j) and CM(G1)(j, i)) unmodified. In this case, G1 still

has a 4-cycle with its representing order-2 submatrix intact, i.e. CM(G1) still contains

a copy of a forbidden matrix.

5.3.3 Reducing the number of colors

As mentioned above, 256 colors are more than is necessary to construct a hard to test

matrix. We now reduce this number by refining our construction; as we do so, we

lose the expressivity of matrices; we maintain, however, the ability to represent the

particular graphs we construct for proving the lower bound.

Making most edge layers identical

We note that in the graph T1, the edge sets in the ‘first’ 3 layers, those between Vi and

Vi+1 for 1 ≤ i ≤ 3, are quite similar: vi,j is connected to vi,j+x. The difference is that

in each of the Vi’s, only the first im vertices are connected onwards to vertices in Vi+1.

We now add “+x” edges for all vertices in each Vi, not just the first im vertices — that

is, we make (vi,j , vi+1,j+x) an edge whenever j + x ≤ 4m and x ∈ X.

Let T2 denote this new graph. As with the graph T1, every directed 4-cycle

(v1,j1 , v2,j2 , v3,j3 , v4,j4) in T2 satisfies

(j2 − j1) + (j3 − j2) + (j4 − j3) = (j4 − j1)

so when denoting

$x_1 = j_2 - j_1, \quad x_2 = j_3 - j_2, \quad x_3 = j_4 - j_3, \quad x_4 = (j_4 - j_1)/3$

the equation becomes $x_1 + x_2 + x_3 = 3x_4$; since $x_1, \ldots, x_4 \in X$, all four x values must be equal. Also, if such a cycle begins with $j_1 > m$ in $V_1$, then

$(j_1 - m\lfloor j_1/m \rfloor,\ j_2 - m\lfloor j_1/m \rfloor,\ j_3 - m\lfloor j_1/m \rfloor,\ j_4 - m\lfloor j_1/m \rfloor)$

is another cycle in T2 (the vertex indices all remain positive), which begins with j1 ≤ m,

i.e. it corresponds to a cycle in the original T1. It follows that the total number of cycles

has increased by no more than a factor of 4, and that all cycles are still non-degenerate.

Since all cycles are edge-disjoint in T2 as well, the number of cycles increases with the s-factor blowup of T2 into G2 by a factor of $s^4$, as in the case of G1. G2 has the same vertex sets as G1, and a superset of the edges of G1, making it at least as far from being 4-cycle free as G1. As for the number of cycles, T1 had at most $m|X| < m^2$ 4-cycles, T2 has at most $4m|X| < 4m^2$ 4-cycles, and G2 has at most $4m^2 s^4 < c_2 n^4/m^2$ 4-cycles, for some constant $c_2$.

We can now use our different construction of T2 to reduce the number of colors

necessary for its representation: As the bits for the three Vi → Vi+1 edge layers are the

same, we only need two bits for each type of layer (one for the j1 → j2 edge and one for

the ‘flip’ edge j2 → j1), times two types of layers (Vi → Vi+1 and V4 → V1): in total we now use only $2^4 = 16$ colors. This property of T2’s first three layers carries over to G2

with the blowup.

Our observations thus lead us to conclude that Proposition 5.3.6 also holds for

$\sigma = 2^4$, with a different choice of the constants.

Making the edge layers symmetric

The number of color bits may be further reduced — halved — if we ensure that whenever

(vi1,j1 , vi2,j2) is an edge, so is (vi1,j2 , vi2,j1). To achieve this, we could add the ‘flip’ edges

to T2 – in addition to the edge (j1, j1 +x) between Vi and Vi+1, and the edge (j1 +3x, j1)

between V4 and V1, we could add (j1 + x, j1) between Vi and Vi+1, and (j1, j1 + 3x)

between V4 and V1 respectively.

The addition of the ‘flip’ edges may, however, result in an excessive increase in

the number of cycles, and possibly also result in intersections of the edges of different

cycles. To avoid this, we again modify our pre-blowup graph T . Let us first consider a

replacement of T2 by the following T′3: Each of the four vertex sets is now {1, . . . , 4 · 6 · m}. The edges in the first three layers (which continue to be uniform) are (j1, j1 + x + 5m) for

all x ∈ X and j1 ∈ [24m− x− 5m]; the edges between V4 and V1 are (j1 +3(x+5m), j1)

for all x ∈ X and j1 ∈ [24m− 3(x+ 5m)]. (The choice of a 5m offset is made with

foresight of the argument in Section 5.4.) Each directed 4-cycle (v1,j1 , v2,j2 , v3,j3 , v4,j4)

must still satisfy

(j2 − j1) + (j3 − j2) + (j4 − j3) = (j4 − j1)

We denote

$x_1 = j_2 - j_1 - 5m, \quad x_2 = j_3 - j_2 - 5m, \quad x_3 = j_4 - j_3 - 5m, \quad x_4 = (j_4 - j_1 - 15m)/3$

and this yields again the equation x1 + x2 + x3 = 3x4. Thus as in the case of T2

above, cycles only exist when the edge x-values are all equal, and they must begin at

j1 ≤ 9m − 3 so that the three additions do not exceed 24m. Thus T ′3 has less than

9m|X| copies of a 4-cycle.

We now add all flip edges to T′3: the edges of the form $(v_{i,j_1+x+5m}, v_{i+1,j_1})$ are added in the first three layers, and the edges of the form $(v_{4,j_1}, v_{1,j_1+3(x+5m)})$ are added in the fourth layer. Let T3 denote the resulting graph.

Lemma 5.3.7. Every cycle in T3 is either a cycle in T ′3 (a no-flip-edge cycle) or a

cycle consisting only of flip edges.

Proof. Consider first some tuple (j1, j2, j3, j4) of vertex indices in the four parts where

the first two edges are non-flip while the third one is a flip edge. In this case, we find

that j4 cannot be very far from j1:

|j1 − j4| = |(j2 − j1) + (j3 − j2) − (j3 − j4)| ≤ 2 · (5m + m) − (5m + 1) < 7m

however, for (j4, j1) to be an edge in the fourth layer (either a non-flip or a flip edge),

we must have |j4 − j1| = 15m+ 3x for some x ∈ X. No such edges exist, proving that

such a cycle is impossible. The remaining cases where one of three Vi → Vi+1 edges

is in the direction opposite to the other two edges are similarly impossible, implying

that the edges in the first three layers are in the same ‘direction’ for every cycle of T3.

If these three edges are non-flip edges, the j’s are an increasing sequence, and so the

fourth edge must have j4 > j1, i.e. it must also be a non-flip; if the edges in first three


layers are flip edges, the j’s are a decreasing sequence, and j4 < j1, i.e. the fourth edge

must also be a flip edge.

As for the number of cycles with all-flip or all-non-flip edges: If (vi1,j1 , vi2,j2) is a non-

flip edge, then (vi1,24m−j1+1, vi2,24m−j2+1) is a flip edge, and (vi1,24m−j2+1, vi2,24m−j1+1)

is a non-flip edge. Thus if

(v1,j1 , v2,j2 , v3,j3 , v4,j4)

is a cycle with no flip edges, then

(v1,24m−j1+1, v2,24m−j2+1, v3,24m−j3+1, v4,24m−j4+1)

is an all-flip-edge cycle, and vice-versa. This bijective correspondence, together with

the lemma above, leads us to conclude that there are exactly twice as many cycles in
T3 as there are in T′3, and that they are all edge-disjoint. Furthermore, the fact that
the first three edges must be in the same direction means that j1 ≠ j3 and j2 ≠ j4, so all

cycles are still non-degenerate.
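The doubling and edge-disjointness claims can likewise be checked on a small instance. The sketch below builds T3 (non-flip and flip edges in all four layers) and classifies its cycles. As before, m = 2 and X = {1, 2} are illustrative assumptions, and the helper names are hypothetical.

```python
def build_t3(m, X):
    """Edges of T_3 as triples (layer, a, b), a in V_layer and b in the next part."""
    n = 24 * m
    edges = set()
    for x in X:
        d = x + 5 * m
        for layer in (1, 2, 3):
            for j in range(1, n - d + 1):
                edges.add((layer, j, j + d))   # non-flip edge
                edges.add((layer, j + d, j))   # flip edge
        for j in range(1, n - 3 * d + 1):
            edges.add((4, j + 3 * d, j))       # non-flip edge from V_4 to V_1
            edges.add((4, j, j + 3 * d))       # flip edge from V_4 to V_1
    return n, edges


def cycles_with_edges(m, X):
    """Map each 4-cycle (j1, j2, j3, j4) of T_3 to its four layered edges."""
    n, edges = build_t3(m, X)
    succ = {layer: {} for layer in (1, 2, 3)}
    for layer, a, b in edges:
        if layer != 4:
            succ[layer].setdefault(a, []).append(b)
    found = {}
    for j1 in range(1, n + 1):
        for j2 in succ[1].get(j1, []):
            for j3 in succ[2].get(j2, []):
                for j4 in succ[3].get(j3, []):
                    if (4, j4, j1) in edges:
                        found[(j1, j2, j3, j4)] = [
                            (1, j1, j2), (2, j2, j3), (3, j3, j4), (4, j4, j1)]
    return found


m, X = 2, [1, 2]  # toy parameters, assumed for illustration only
found = cycles_with_edges(m, X)
increasing = [c for c in found if c[0] < c[1] < c[2] < c[3]]  # no flip edges
decreasing = [c for c in found if c[0] > c[1] > c[2] > c[3]]  # only flip edges
all_edges = [e for cycle_edges in found.values() for e in cycle_edges]
print(len(found), len(increasing), len(decreasing),
      len(all_edges) == len(set(all_edges)))  # 54 27 27 True
```

On this instance every cycle is either strictly increasing or strictly decreasing in its indices, the two kinds are equally numerous, and no layered edge is shared between two cycles.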

T3 is a graph with 24m vertices in each part and no more than 18m|X| 4-cycles, all

edge-disjoint. Blowing it up by a factor of s = n/(24m), we obtain a graph G3 with n
vertices per part and $18m|X| \cdot s^4 \le c_3 n^4 / m^2$ cycles for an appropriate constant c3. G3
is also ε′-far from being cycle-free, by an argument similar to the case of G1, with a
proper choice of $m(\varepsilon') = (c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$ reflecting the change in the constants used in the

construction of T3 and the blowup.

To represent G3, we only need two bits of color: One bit for the first three layers (a

single bit now suffices for both the ‘non-flip’ and the ‘flip’ edge), and one bit for the

V4 → V1 layer. We have thus brought down σ, the domain size for matrix cell values for

which Proposition 5.3.6 holds, to $2^2 = 4$ (again, with a different choice of a constant c).

Mutual exclusion between the edge layers

How can we further reduce the number of colors? The upper bound result of [AFN07]

implies that it is impossible to reduce the number of bits per cell from two to one,

without making the matrix easy to test for the presence of forbidden submatrices. Still,

a decrease from four to three colors is possible. In fact, if we review the construction of

T3 and G3 carefully, we find that for any (j1, j2), we only have three edge combinations

represented for (j1, j2) (and the now-symmetric (j2, j1)):

1. (j1, j2) is an edge in the Vi → Vi+1 layers, but not in V4 → V1.

2. (j1, j2) is not an edge in Vi → Vi+1 layers, but is an edge in V4 → V1.

3. (j1, j2) is not an edge in any layer.

No (j1, j2) can be an edge in all four layers, since edges in V4 → V1 correspond to index

differences |j1 − j2| of at least 15m+ 1 (before the blowup of T3 into G3), while edges


Vi → Vi+1 correspond to differences of at most 6m. Thus Proposition 5.3.6 holds for

CM(G3) as a 3-colored matrix as well. In fact, we are now able to prove Lemma 5.3.1:

Proof of Lemma 5.3.1. G is CM(G3) constructed above. Indeed, there is now only one

possible order-2 submatrix (up to permutations) of CM(G3) witnessing the presence

of its corresponding cycle C in G3: $M_F = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ (this is a matrix over {0, 1, 2}). One

may verify that in all other order-2 submatrices, at least one of the cycle edges must be

missing. Thus F is the subgraph with adjacency matrix MF.
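As an illustration of the three-color encoding, the following Python sketch builds a 3-colored matrix for the pre-blowup graph T3 and counts the order-2 submatrices equal to MF. It assumes the encoding suggested by the discussion above (color 1 for an edge in the first three layers, color 2 for an edge in the V4 → V1 layer, 0 otherwise). The toy parameters m = 2, X = {1, 2} and the function names are hypothetical.

```python
def colored_matrix(m, X):
    """3-colored matrix of T_3 (before the blowup), 1-based indices."""
    n = 24 * m
    short = {x + 5 * m for x in X}         # index differences in layers 1-3
    long_ = {3 * (x + 5 * m) for x in X}   # index differences in the 4th layer
    assert not (short & long_)             # mutual exclusion of the two layers
    M = [[0] * (n + 1) for _ in range(n + 1)]  # row and column 0 are unused
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            if abs(a - b) in short:
                M[a][b] = 1
            elif abs(a - b) in long_:
                M[a][b] = 2
    return M


def count_mf_copies(M):
    """Count submatrices on rows (j1, j3), columns (j2, j4) equal to [[1, 2], [1, 1]]."""
    n = len(M) - 1
    count = 0
    for j1 in range(1, n + 1):
        ones = [b for b in range(1, n + 1) if M[j1][b] == 1]
        twos = [b for b in range(1, n + 1) if M[j1][b] == 2]
        for j2 in ones:
            for j4 in twos:
                for j3 in range(1, n + 1):
                    if M[j3][j2] == 1 and M[j3][j4] == 1:
                        count += 1
    return count


copies = count_mf_copies(colored_matrix(2, [1, 2]))
print(copies)  # 54, one copy per 4-cycle of the toy T_3
```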

5.3.4 Proof of the lower bound

Observation 5.3.8. The property of colored bipartite graphs being free of the forbidden

subgraph F , of the proof of Lemma 5.3.1, is hereditary — like any property of being

free of forbidden induced substructures. It is also inflatable, as F is not a blowup of a

smaller graph.

With our construction in Lemma 5.3.1 and the above observation, we can now

proceed to proving the lower bound theorem. As our proof makes use of the general

results regarding dense structures (in Section 3.9), we make a final observation regarding

the testing model to justify this use:

Observation 5.3.9. 3-colored bipartite graphs can be expressed as a class of general

dense structures (as per Subsection 2.1.1): Two vertex sets V1 = U , V2 = V , and two

edge relations E1, E2, with appropriate constraints. The constraints would be: Every

edge of any of the two relations has the first vertex in U , and the second in V ; and

whenever E1(u, v) is an edge, E2(u, v) can’t be an edge. A query of an edge of the

3-colored bipartite graph corresponds to two queries, of the values of E1 and E2, for

the appropriate tuple. Also, the bipartite graphs we consider are of uniform order —

both parts have the same number of vertices.

Proof of Theorem 5.1. Consider an ε-test of 3-colored bipartite graphs for being free

of the forbidden subgraph F , which makes at most q(ε) queries. As this property is

hereditary and inflatable, we may apply Corollary 3.9 to this test, which is specifically

a uniform-order test, to obtain a perfectly canonical one-sided test for F -freeness with

queried subgraph order q′(ε) ∈ poly(q(ε)).

By Lemma 5.3.1, there exists (for any sufficiently large n) a graph G and a forbidden

subgraph F , such that G is ε-far from being free of F , but only a $(c'/\varepsilon)^{-c'\cdot\ln(c'/\varepsilon)}$

fraction of its order-2 subgraphs are copies of F , for some global constant c′.

The expected number of copies of F in the subgraph of G queried by the perfectly

canonical test is no more than $O(q'^4) / (c'/\varepsilon)^{c'\cdot\ln(c'/\varepsilon)}$ (the expected number of copies
of CM(F ) in a submatrix of CM(G) of order O(q′)). Thus if $q(\varepsilon) < (c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$, for an

appropriate constant c, then the expected number of forbidden subgraphs discovered is

o(1), so the test accepts G with probability 1− o(1) — a contradiction.
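The final step is an application of Markov's inequality. Spelled out as a sketch, with N denoting the number of copies of F in the queried subgraph (a symbol introduced here only for illustration), and using the fact that a one-sided test may reject only upon finding a copy of F:

```latex
\[
\Pr[\text{the test rejects } G]
\;\le\; \Pr[N \ge 1]
\;\le\; \mathbb{E}[N]
\;\le\; \frac{O(q'^4)}{(c'/\varepsilon)^{c'\cdot\ln(c'/\varepsilon)}} .
\]
```

Since q′ is polynomial in q, the right-hand side is o(1) whenever q(ε) is below the stated bound for a suitable constant c, whereas an ε-test must reject the ε-far graph G with probability at least 2/3.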


5.4 A lower bound for k-uniform k-partite hypergraphs

5.4.1 A hard-to-test tensor

Fix ε. Let M be as in the proof of Lemma 5.3.1, but with distance parameter ε′ = 2ε.

Let us again think of the 3-colored matrix M as having two color bit layers: One

bit-layer for the first three edge-layers of the 4-cycle (Vi to Vi+1), and another bit-layer

for the 4th edge-layer (V4 to V1); it is still the case that no matrix cell M(j1, j2) has

both of its bits set.

Let us separate M into two binary matrices M ′ and M ′′, with M ′(j1, j2) being the

first color bit of M(j1, j2) and M ′′(j1, j2) being the second color bit. Using these two

matrices, we construct a 3-dimensional tensor T of order n:

\[
T(x, y, z) =
\begin{cases}
M'(x, y) & 1 \le z \le n/2 \\
M''(x, y) & n/2 < z \le n
\end{cases}
\]

We split the forbidden order-2 matrix MF of Lemma 5.3.1 in a similar fashion, to obtain

a forbidden order-2 subtensor TF:
\[
T_F = \left[ \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \right]
\]

(the two matrices are the layers for the two values in the z coordinate).
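The construction of T from M can be sketched in a few lines of Python. The 4 × 4 matrix used below is a hypothetical toy input, not CM(G3). Indices are 0-based, so the condition z < n/2 plays the role of 1 ≤ z ≤ n/2 above.

```python
T_F = (((1, 0), (1, 1)),   # layer for z1: first color bit
       ((0, 1), (0, 0)))   # layer for z2: second color bit


def split_color_bits(M):
    """Split a matrix over {0, 1, 2} into the binary matrices M' and M''."""
    M1 = [[1 if v == 1 else 0 for v in row] for row in M]
    M2 = [[1 if v == 2 else 0 for v in row] for row in M]
    return M1, M2


def build_tensor(M):
    """T[z][x][y]: the first n/2 layers are M', the remaining layers are M''."""
    n = len(M)
    M1, M2 = split_color_bits(M)
    return [M1 if z < n // 2 else M2 for z in range(n)]


def subtensor(T, rows, cols, zs):
    """Order-2 subtensor on the given row, column and z coordinates."""
    return tuple(tuple(tuple(T[z][r][c] for c in cols) for r in rows) for z in zs)


# Toy 3-colored matrix containing M_F on rows (0, 1) and columns (0, 1).
M = [[1, 2, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
T = build_tensor(M)
print(subtensor(T, (0, 1), (0, 1), (0, 2)) == T_F)  # True: z1 low, z2 high
print(subtensor(T, (0, 1), (0, 1), (2, 0)) == T_F)  # False: z coordinates swapped
print(subtensor(T, (0, 1), (0, 1), (0, 1)) == T_F)  # False: both z in the same half
```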

Lemma 5.4.1. Let T′ be a subtensor of T with coordinates (j1, j3) × (j2, j4) × (z1, z2).
T′ = TF if and only if the following holds: (j1, j2, j3, j4) are vertex indices of a cycle in
G3, $z_1 \in \{1, \ldots, n/2\}$ and $z_2 \in \{n/2 + 1, \ldots, n\}$.

Proof. If $z_1, z_2 \le n/2$ or $z_1, z_2 > n/2$, then T′ is invariant along the z-axis and is therefore
not a copy of TF. Now suppose that $z_2 \in \{1, \ldots, n/2\}$ and $z_1 \in \{n/2 + 1, \ldots, n\}$; in
this case, if T′ = TF, then all of (vj1 , vj2), (vj3 , vj2) and (vj3 , vj4) are edges in the fourth edge layer
of G3 and (vj4 , vj1) is an edge in the first three edge layers. We recall that G3 is a
blowup of the graph T3, thus there exist vertices vj′1 , . . . , vj′4 ∈ T3 such that (vj′1 , vj′2),
(vj′3 , vj′2) and (vj′3 , vj′4) are edges in T3’s fourth edge layer, and (vj′4 , vj′1) is an edge
in its first three edge layers. Now, the edges in the fourth layer correspond to index
differences |j′1 − j′2|, |j′3 − j′2| and |j′3 − j′4| of at least 15m + 3. Thus either j′1 ≤ 9m − 3
or j′1 ≥ 15m + 4. In the first case, j′2 ≥ 15m + 4, j′3 ≤ 9m − 3 and j′4 ≥ 15m + 4, thus
|j′1 − j′4| ≥ 6m + 7, which makes it impossible for (j′4, j′1) to be an edge in the first three
layers (in which the index differences are of the form 5m + x with x ≤ m). The second
case is similar. Thus whenever $z_2 \in \{1, \ldots, n/2\}$ and $z_1 \in \{n/2 + 1, \ldots, n\}$, it is impossible
that T′ = TF.

Finally, suppose $(z_1, z_2) \in \{1, \ldots, n/2\} \times \{n/2 + 1, \ldots, n\}$. In this case T′(·, ·, z1) is the
first color bit of an order-2 submatrix of M , and T′(·, ·, z2) is the second color bit thereof.


If (j1, j2, j3, j4) are not vertex indices of a cycle of G3, then at least one of the four ‘1’
bits of TF must be missing from T′, so again T′ ≠ TF.

For the second direction of the lemma, let (j1, j2, j3, j4) be vertex indices of a cycle in

G3, and let $(z_1, z_2) \in \{1, \ldots, n/2\} \times \{n/2 + 1, \ldots, n\}$. The existence of the cycle constrains

the four subtensor cells corresponding to the four edges to be 1, and the fact that no

edge can exist both in the first three edge layers of G3 and in its fourth layer constrains

the other four bits to 0, so indeed T ′ = TF.

5.4.2 Proof of the lower bound

Lemma 5.4.2. There exists a single 3-dimensional binary tensor TF of order 2, such

that for every n, ε there exists a tensor T , which is ε-far from being free of TF, yet the

fraction of order-2 subtensors of T which are copies of TF is at most $(c/\varepsilon)^{-c\cdot\ln(c/\varepsilon)}$, for

some global constant c.

Proof. Let T , TF, M , M ′, M ′′ and ε′ be as in Subsection 5.4.1. Lemma 5.4.1 ensures

that for every choice of z-axis coordinates z1, z2, either no choices of (j1, j3)× (j2, j4)

yield a copy of TF (for the case of $z_1 > n/2$ or $z_2 \le n/2$), or at most a $(c'/\varepsilon')^{-c'\cdot\ln(c'/\varepsilon')}$
fraction of these choices yield such a copy (due to the properties of M). Setting c = c′/2
we conclude that at most a $\frac{1}{4} \cdot (c'/\varepsilon')^{-c'\cdot\ln(c'/\varepsilon')} < (c/\varepsilon)^{-c\cdot\ln(c/\varepsilon)}$ fraction of the order-2

subtensors of T are copies of the forbidden subtensor.

As for the distance from being TF-free, for every z1 ∈ [n/2], one must modify enough

cells of T(·, ·, z1) = M′ and $T(\cdot, \cdot, z_1 + n/2) = M''$ to affect all copies of TF located in
this pair of layers. These copies are in bijective correspondence with the copies of the
forbidden order-2 matrix in M , and the number of x, y coordinate pairs in which M has
to be changed to remove all copies of the forbidden submatrix is at least $\varepsilon' n^2$; thus at
least $2\varepsilon n^2$ changes are necessary to remove all copies of TF in T(·, ·, z1), $T(\cdot, \cdot, z_1 + n/2)$.
There are n/2 disjoint pairs of such layers, so at least $\varepsilon n^3$ changes are needed in total. T

is therefore ε-far from being TF-free.

Observation 5.4.3. The property of 3-graphs of being free of the forbidden sub-3-

graph, whose adjacency tensor is TF, of the proof of Lemma 5.4.2, is hereditary — like

any property of being free of forbidden induced substructures. It is also inflatable, as

TF is not a blowup of a smaller tensor, so F is not a blowup of a smaller 3-graph.

Before proceeding to the proof, we note that 3-graphs can be expressed as a class of

general dense structures (as per Subsection 2.1.1): 3-partite structures, with a single

ternary edge relation, constrained to only have edges with the first vertex of the tuple

in the first vertex part, the second in the second part and the third in the third

vertex part. As the 3-graphs also have uniform order (the same number of vertices in

each part), this implies that the results in Section 3.9 apply for 3-graphs.


Proof of Theorem 5.2. The proof is virtually the same as for Theorem 5.1.

Consider an ε-test of 3-graphs, making at most q(ε) queries, for being free of the

forbidden 3-hypergraph F whose adjacency tensor is TF from Lemma 5.4.2. As by

Observation 5.4.3 this property is hereditary and inflatable, we may apply Corollary 3.9

to this test, which is specifically a uniform-order test, and obtain a perfectly canonical

one-sided test for F -freeness with queried subgraph order s(ε) = poly(q(ε)).

By Lemma 5.4.2, there exists a 3-dimensional tensor T of order n that is ε-far from

being free of TF, but only a $(c'/\varepsilon)^{-c'\cdot\ln(c'/\varepsilon)}$ fraction of its order-2 subtensors are copies

of TF, for some global constant c′; let H be a 3-graph whose adjacency tensor is T .

The expected number of copies of F in a uniformly sampled sub-hypergraph of

H is no more than $O(s^6) / (c'/\varepsilon)^{c'\cdot\ln(c'/\varepsilon)}$ (the expected number of copies of TF in
a uniformly sampled subtensor of T of order s). Thus if $q(\varepsilon) < (c/\varepsilon)^{c\cdot\ln(c/\varepsilon)}$, for an

appropriate constant c, then the expected number of copies of F discovered is o(1), so

the test accepts H with probability 1− o(1) — a contradiction.


Chapter 6

Pseudo-testing hypergraph tuple partition properties

6.1 Introduction

In this chapter we seek to further chart the territory of efficiently-testable properties

of dense structures — specifically, hypergraphs with multiple edge relations or colors.

For the case of dense graphs, [GGR98] established several specific properties to be

testable using poly(1/ε) queries — bipartiteness and colorability, max-clique, bisection,

etc. — concluding with graph partition properties which can express all of these. Such

properties are defined using a partition of the vertices (not the edges), with constraints

on the sizes of the partition cells, and on the density of the bipartite graphs between

pairs of cells (see [GGR98, Section 9] for the details). [GGR98] establishes their polynomial

testability (albeit with running time exponential in the number of queries and 1/ε).

To date, this is the widest known “naturally-arising” class of polynomially-testable

properties of dense graphs.

In [FMS07], Fischer, Matsliah and Shapira extended the polynomial testability of

partition properties to hypergraphs with multiple edge relations. The constraints in

this setting are very similar to the graph case: constraints on the densities of vertex

partition cells, and on the densities of the uniform hypergraphs with vertices originating

in combinations of these partition cells.

A noteworthy use of this extension is its application to testing regular partitions in

graphs (rather than hypergraphs): [GGR98] partition properties are not rich enough to

express the constraint on a bipartite graph between two vertex sets of being regular in

the sense of Szemerédi’s regularity lemma. With hypergraphs, this constraint becomes

expressible: Using the terminology of Gowers in [Gow07], a bipartite graph is regular

if it has few ‘combinatorial octahedra’ — quadruples of vertices, two from each set,

supporting a length-4 cycle. (This alternative view appears implicitly already in

Alon, Duke, Lefmann, Rödl and Yuster’s [ADL+94].) One can construct an auxiliary

hypergraph for a given graph, with an appropriate quaternary relation, and constrain it


to have few such octahedra; thus one can test, with a number of queries polynomial

in 1/ε, whether or not a graph has a regular partition with at most m cells. But while

the broadening of the scope of testing to hypergraphs allowed for regular partitions of

graphs, this kind of partition properties is not rich enough to express the constraints

necessary to test for regular partitions of hypergraphs.

The motivation for considering a generalization of [GGR98] and [FMS07] partition

properties is therefore double: The question of whether one can expand further the

class of efficiently-testable properties (in hypergraphs, but possibly with implications

on other structures); and the prospects for testing hypergraphs for regular partitions of

a fixed size.

The generalization we make is that of partitioning not just the vertices of a hyper-

graph, but also tuples of multiple vertices — one partition of the singletons, another of

the pairs, another of the triples, etc. Of course, this is meaningless unless the constraints

on edges regard these partitions of tuples, so let us illustrate what this entails. All

(hyper)edge constraints have the form “the density of k-vertex tuples, being edges of a

certain color which satisfy some condition relating to the partition(s), out of the total

$n^k$ such tuples, is such-and-such”. In [GGR98], the constraints are on 2-tuples (or on

2-sets, depending on whether the graph is directed), and the condition is “one vertex

is in partition cell j1, and the other vertex is in partition cell j2”. In [FMS07], the

constraint is on any one of the hypergraph’s edge relations, with its appropriate arity,

but the condition is the same: The tuple is broken up into its s constituent vertices,

and the origin of every one of them in the partition is constrained. Conditions regarding

partitions of tuples will not always decompose a k-tuple into k single vertices, but rather

make any sort of decomposition into tuples of arity up to k — designating certain pairs,

triplets etc. of the elements of the tuple, with the condition being that each sub-tuple

in the decomposition comes from some specified cell in the partition of tuples of its arity.

Thus, taking 3-tuples for example, the condition may be that the pair of the first and

third vertices come from cell no. 4 in the partition of pairs, while the second vertex of

the 3-tuple comes from cell no. 5 of the partition of vertices.

While this generalization is not the broadest possible, it is the focus of this chapter,

and it is already rich enough so that the results of [GGR98] and [FMS07] do not fully

apply.

A key point in both these previous works is the following: If a (hyper)graph has a

partition which approximately satisfies the density constraints, then the hypergraph

is close to having a partition satisfying them exactly; that is, one can add or remove

a small fraction of the edges so that a perfectly-satisfying partition is obtained. (In

[FMS07], this point is made immediately after the statement of Theorem 2.) We show

that this is not the case for partitions of hypergraph vertex tuples — at least not with

a polynomial relation between the distance of the hypergraph from being satisfactory

and the differences in densities of its best partition.


This difficulty is a corollary of two results, an upper bound and a lower bound, on

testing such properties:

The upper bound result, in Section 6.3, is that one can ‘test’ whether a graph has

a satisfying partition or whether all partitions are far from satisfying the constraints.

This is not an actual test: While we accept graphs satisfying the property, we reject

graphs not on account of their being far from satisfying it, but rather on account of their

partitions being far from satisfactory. This ‘pseudo-testing’ can be done with a number

of queries polynomial in 1/ε, using a generalization of a technique from [GGR98]

and [FMS07]: Beginning with an unknown satisfying partition, one can repeatedly

redistribute more and more small sets of vertices and tuples, using ‘type estimators’

which minimize the ‘damage’ of this redistribution, so that the unknown partition which

satisfies the constraints perfectly is gradually transformed into a partition which only

approximately satisfies them, but is wholly known to the test. The validity of this final

partition can be ensured with high probability of success. As pointed out above, for

vertex-partition properties, this constitutes an actual test, but not so for tuple-partition

properties.

The lower bound result, in Section 6.4, shows that some tuple partition properties

are not polynomially testable. We demonstrate how tuple partition constraints are

actually rich enough, already as we study them, to express the property of a graph being

triangle-free. This gives a super-polynomial lower bound for testing tuple partition

properties — at least in the general case. This lower bound does not rely on any specific

triangle-testing lower bound construction (unlike the result in Chapter 5, which relies

on a lower bound in partite graphs). We cannot even say for a fact that these partition

properties are testable at all (that is, have tests independent of the size of the graph).

As mentioned above, the generalization in this work of partition properties is only

partial. The expressive power it lacks is that of involving vertices and sub-tuples

of constrained tuples in multiple conditions regarding the tuple partitions — cross-

constraining, so to speak. For example, given a tuple x = (x1, . . . , x5), we might wish

to constrain both the origin of (x2, x3, x4, x1) and at the same time also the origin of,

say, (x5, x3, x4). Such constraints are necessary for expressing hypergraph regularity, as

the regular sub-entities of a hypergraph are simplicial complexes, and their regularity

involves densities of tuples supported by intersecting lower-arity tuples; for details, see

one of the variant definitions of hypergraph regularity: [Gow07, NRS06, Ish09].

Attempts to establish the pseudo-testability of such properties have not met with

success thus far. Some further discussion of the prospects for these more expressive

partition properties is found in Chapter 7.


6.2 Additional preliminaries

6.2.1 Hypergraph tuple partition functions and named tuple decompositions

Definition 6.2.1. For some vertex set X and maximum arity k, partition functions

with respect to X are in fact a single function over the domain [k], but such that

each function P(s) is a partition of the tuples of a certain arity into m sets, or cells:

$P(s) : \prod_{i=1}^{s} X \to [m]$.
We denote the jth partition cell of arity s, with respect to partition functions P, by
$X^{P,s}_j = (P(s))^{-1}(j)$.

Such partitions of the sets of tuples induce partitions of a hypergraph’s set of edges,

through the concept of edge decompositions which we define below.

Let $[s_1]^{\le s_2}$ denote the set of all non-empty sequences, with length up to s2, of distinct
elements of [s1]. For a sequence $A \in [s_1]^{\le s_2}$, we denote by ⋃A the (unordered) set of
all elements in A and by |A| the length of A. Thus $A = (A_1, \ldots, A_{|A|})$.

Definition 6.2.2. Let $x = (x_1, \ldots, x_s) \in \prod_{i=1}^{s} X$ and $A \in [s]^{\le s}$. For tuple x and index
sequence A, the A-projection of x, denoted x(A), is the tuple y, of arity |A|, such that
$y_i = x_{A_i}$.
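A minimal Python sketch of the A-projection follows. Tuples and index sequences are represented as Python tuples with 1-based indices as in the definition, and the function name is hypothetical.

```python
def project(x, A):
    """A-projection x(A): the tuple (x[A_1], ..., x[A_|A|]), with 1-based indices."""
    if len(A) == 0 or len(set(A)) != len(A):
        raise ValueError("A must be a non-empty sequence of distinct indices")
    if any(a < 1 or a > len(x) for a in A):
        raise ValueError("every index of A must lie in [s]")
    return tuple(x[a - 1] for a in A)


x = (7, 1003, 21)
print(project(x, (1, 3)))  # (7, 21)
print(project(x, (3, 1)))  # (21, 7): the order of A matters
print(project(x, (2,)))    # (1003,)
```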

Definition 6.2.3. A function φ with domain $\mathrm{Dom}(\varphi) \subseteq [k']^{\le k'}$ and range [m]
constitutes a k′-named tuple decomposition (or k′-NTD for short) if every pair of its
constituent sequences A, A′ ∈ Dom(φ) are disjoint, i.e. ⋃A ∩ ⋃A′ = ∅, while on
the other hand, all possible elements appear within some sequence in φ’s domain:
$\bigcup \{ \bigcup A \mid A \in \mathrm{Dom}(\varphi) \} = [k']$.
In other words, a k′-NTD constitutes: a partition of [k′] (the coordinates in a k′-tuple);

an ordering of the cells in this partition of the coordinates into sequences; and an

indication for each such sequence of its intended origin in P.

A tuple x is said to observe an NTD φ (by partition functions P) if for every A ∈ Dom(φ),

(P(|A|))(x(A)) = φ(A), i.e. the partition cell of the subtuple x(A) of x is the one

indicated by φ for A.

Example 6.2.4. Let n = 2000, s = 3, let m = 10 and let x = (7, 1003, 21). Let φ be the
NTD with domain Dom(φ) = {(2), (1, 3)}, such that φ maps (2) ↦ 5 and (1, 3) ↦ 4. The NTD
φ represents the constraint on 3-tuples of their second element coming from the fifth
partition cell (in a partition of individual vertices) and the subtuple of a 3-tuple, made
up of its first and third element, coming from the fourth partition cell (of the partition
of 2-tuples, which may be entirely unrelated to the partition of individual vertices). For
x to observe φ with some partition functions P, these must satisfy (P(1))(1003) = 5
and (P(2))(7, 21) = 4.
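The example can be replayed in code. In the sketch below an NTD is a dictionary from index sequences to cell numbers, and partition functions are a dictionary from arity to a function on tuples. The concrete partition functions are hypothetical and chosen only so that x observes φ.

```python
def is_ntd(phi, k, m):
    """Check that phi (dict: index sequence -> cell) is a k-NTD with range [m]."""
    seen = []
    for A, cell in phi.items():
        if len(A) == 0 or len(set(A)) != len(A) or not 1 <= cell <= m:
            return False
        seen.extend(A)
    # The sequences must be pairwise disjoint and together cover [k].
    return sorted(seen) == list(range(1, k + 1))


def observes(x, phi, P):
    """True if, for every A in Dom(phi), (P(|A|))(x(A)) equals phi(A)."""
    return all(P[len(A)](tuple(x[a - 1] for a in A)) == cell
               for A, cell in phi.items())


phi = {(2,): 5, (1, 3): 4}
x = (7, 1003, 21)
# Hypothetical partition functions, standing in for P(1) and P(2).
P = {
    1: lambda t: 5 if t[0] > 1000 else 1,
    2: lambda t: 4 if sum(t) % 2 == 0 else 3,
}
print(is_ntd(phi, 3, 10))           # True
print(observes(x, phi, P))          # True: P(1)(1003) = 5 and P(2)(7, 21) = 4
print(observes((7, 3, 21), phi, P)) # False: the second element is not in cell 5
```

A dictionary such as {(2,): 5, (1, 2): 4} is rejected by is_ntd, since coordinate 2 would be involved in two constraints and coordinate 3 in none.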


Note. NTDs will be used to impose constraints on vertex tuples, and offer a certain

power of expressing such constraints. As discussed earlier, their expressive power is not

maximal: No sub-tuple x(A) of a tuple x has elements involved in two constraints of the

same NTD simultaneously. In the above example, since the second element of the tuple

is constrained as a singleton, no constraints on pairs can involve it. Thus instead of

having Dom(φ) = {(1), (2), (3), (1, 2), (2, 3), (1, 3), (1, 2, 3)} we only have the sequence

of tuple indices [s] decomposed into disjoint subsequences, each with its own constraint.

We denote by Φs the set of all s-NTDs, and their union of all arities up to k by

$\Phi_{[k]} = \bigcup_{s \le k} \Phi_s$.

6.2.2 Partitions and partition oracles

While we are interested in partitions of (multi-colored) hypergraphs, the objects we

are testing are the hypergraphs themselves, rather than possible partitions; we will be

constructing ‘partition oracles’ using queries to classify vertex tuples, thus simulating

these hypothetical partitions.

Definition 6.2.5. A (q, m, k) partition oracle is a mapping $\pi : \bigcup_{s=1}^{k} \prod_{i=1}^{s} X \to [m]$
such that for $x \in \prod_{i=1}^{s} X$, π(x) may be computed using q queries of the hypergraph. A
partial partition oracle is defined similarly, but provides answers only for some subset
$Y \subseteq \bigcup_{s=1}^{k} \prod_{i=1}^{s} X$.

Definition 6.2.6. A set of (possibly partial) oracles, sharing the same domain, is said

to have shared query complexity q if, for any element of their domain, the set of all

queries necessary for all the oracles to return an answer is of size at most q (where each

of the oracles might use as many as all q queries).

Definition 6.2.7. Fix P(s), let $Y \subseteq \prod_{i=1}^{s} X$ and let Q(s) be a partition function for
this subset. The modification of P(s) according to Q(s), is the function
\[
(P(s)\ Q(s))(x) =
\begin{cases}
(Q(s))(x) & x \in Y \\
(P(s))(x) & x \notin Y
\end{cases}
\]
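In code, the modification is a simple override on the subset Y. The sketch below represents partition functions of a fixed arity as Python callables; the names and the toy partition are hypothetical.

```python
def modify(P_s, Q_s, Y):
    """Modification of P_s according to Q_s, where Q_s is defined on Y only."""
    Y = set(Y)

    def modified(x):
        # Tuples in Y take their cell from Q_s; all others keep their cell under P_s.
        return Q_s(x) if x in Y else P_s(x)

    return modified


P2 = lambda t: 1            # hypothetical partition of pairs: everything in cell 1
Y = {(1, 2), (2, 1)}
Q2 = lambda t: 2            # partition of Y: everything in cell 2
R2 = modify(P2, Q2, Y)
print(R2((1, 2)), R2((3, 4)))  # 2 1
```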

Definition 6.2.8. For a partial partition oracle π for some set, the modification of

P(s) according to π, denoted by P(s) π, is the modification of P(s) according to the

partition induced by π.

6.2.3 Multi-colored hypergraph partition properties

Partition density features and density characteristics

As in the case of graph partition properties (studied in [GGR98]) and hypergraph vertex

partition properties (studied in [FMS07]), we concern ourselves with the intersection of


the edge set(s) with sets of tuples obeying partition-related constraints. In the case of

graphs, a constraint on an edge {x1, x2} is “x1 is in some certain specified cell of the

vertex partition, and x2 is in some certain specified cell”; for our partition properties,

constraints correspond to NTDs.

Definition 6.2.9. For a hypergraph H, partition functions P (with maximum arity

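$k \ge \max\{ r(c) \mid c \in C(H) \}$), a color c ∈ C(H) and an NTD $\varphi \in \Phi_{r(c)}$, let
\[
H_\varphi(c) = \left\{ y \in H(c) \;\middle|\; \forall B \in \mathrm{Dom}(\varphi)\ \left[ P(|B|)(y(B)) = \varphi(B) \right] \right\}
\]
that is, Hφ(c) is the set of all hyperedges in H(c) which observe the NTD φ: For a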

subtuple of [s] which φ constrains to some partition cell, P assigns the corresponding

subtuple of y to the same partition cell.

Definition 6.2.10. An (m, k, C)-density characteristic is a tuple τ = (ρ, µ) of density
functions
\[
\rho : [k] \times [m] \to [0, 1], \qquad
\mu : \left\{ (c, \varphi) \;\middle|\; c \in C \text{ and } \varphi \in \Phi_{r(c)} \right\} \to [0, 1]
\]

ρ values shall be referred to as the characteristic’s tuple densities, and µ values as its

edge densities.

Definition 6.2.11. The density characteristic corresponding to a hypergraph H and
partition functions P, denoted $\psi^{H,P} = (\rho^{H,P}, \mu^{H,P})$, is the one satisfying, for each
k′ ∈ [k] and j ∈ [m],
\[
\rho^{H,P}(k', j) = \frac{1}{n^{k'}} \left| X^{P,k'}_j \right|
\]
and for each c ∈ C(H) and $\varphi \in \Phi_{r(c)}$,
\[
\mu^{H,P}(c, \varphi) = \frac{1}{n^{r(c)}} \left| H_\varphi(c) \right|
\]

In other words, ρH,P(k′, j) denotes the density of the jth partition cell of k′-tuples

within the entire set of k′-tuples; and µH,P(c, φ) denotes the density of the hypergraph’s

c-colored edges with the named decomposition φ.
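To make the two kinds of densities concrete, the following Python sketch computes them for a toy hypergraph with a single color of arity 2. The vertex set, the edge set, the partition functions and the function names are all hypothetical. Exact fractions are used so that the values can be compared directly with the definitions.

```python
from fractions import Fraction
from itertools import product


def tuple_densities(X, P, k, m):
    """rho(k', j): fraction of all k'-tuples over X that P[k'] sends to cell j."""
    n = len(X)
    rho = {}
    for s in range(1, k + 1):
        counts = {j: 0 for j in range(1, m + 1)}
        for t in product(X, repeat=s):
            counts[P[s](t)] += 1
        for j in range(1, m + 1):
            rho[(s, j)] = Fraction(counts[j], n ** s)
    return rho


def edge_density(X, edges, arity, phi, P):
    """mu(c, phi): fraction of all n^r(c) tuples that are c-edges observing phi."""
    n = len(X)
    hits = sum(1 for y in edges
               if all(P[len(B)](tuple(y[b - 1] for b in B)) == cell
                      for B, cell in phi.items()))
    return Fraction(hits, n ** arity)


X = [1, 2, 3, 4]
P = {
    1: lambda t: 1 if t[0] <= 2 else 2,    # partition of single vertices
    2: lambda t: 1 if t[0] < t[1] else 2,  # partition of pairs, unrelated to the above
}
edges = {(1, 3), (1, 4), (2, 3), (3, 4), (4, 1)}  # edges of the single color

rho = tuple_densities(X, P, k=2, m=2)
mu_vertices = edge_density(X, edges, 2, {(1,): 1, (2,): 2}, P)
mu_pairs = edge_density(X, edges, 2, {(1, 2): 1}, P)
print(rho[(1, 1)], rho[(2, 1)], mu_vertices, mu_pairs)  # 1/2 3/8 3/16 1/4
```

Note that both edge densities are taken out of all 16 pairs, in line with the 'absolute' convention.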

Note. The edge density figures are ‘absolute’: fractions of the $n^{k'}$ possible tuples for some
k′, rather than fractions of the number of tuples with the same NTD.

Observation 6.2.12. The total number Ndc of vertex and edge density values in a
single density characteristic is less than $k \cdot (m + c \cdot k^k \cdot m^k)$.
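The NTD count behind this bound can be explored by direct enumeration for small parameters. The sketch below lists all s-NTDs with range [m] (set partitions of [s], an ordering of each block, and a cell label per block) and compares their number with s^s · m^s. Reading the k^k · m^k term as a per-arity bound on the number of NTDs is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import permutations, product


def set_partitions(elements):
    """Yield all partitions of a list into non-empty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def enumerate_ntds(s, m):
    """All s-NTDs with range [m], each as a dict: index sequence -> cell."""
    ntds = []
    for partition in set_partitions(list(range(1, s + 1))):
        orderings = [list(permutations(block)) for block in partition]
        for sequences in product(*orderings):
            for labels in product(range(1, m + 1), repeat=len(sequences)):
                ntds.append(dict(zip(sequences, labels)))
    return ntds


for s, m in [(1, 2), (2, 2), (3, 2)]:
    print(s, m, len(enumerate_ntds(s, m)), s ** s * m ** s)
# 1 2 2 2
# 2 2 8 16
# 3 2 44 216
```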

A hypergraph H and partition function P are said to satisfy a density characteristic ψ

if ψH,P = ψ. A hypergraph H is said to satisfy a density characteristic ψ if there exist

partition functions which, together with H, satisfy ψ.


Observation 6.2.13. A density characteristic ψ is satisfiable by hypergraphs on $n$ vertices only if all tuple and edge densities of ψ are multiples of $n^{-k'}$ for the respective values of $k'$.

Definition 6.2.14. The distance between two density characteristics is the maximum

difference between corresponding ρP(k′, j) and µP(c, φ) values of the two characteristics.

Partition-based properties

Let Ψ be a set of (m, k, C)-density characteristics, and let H be a hypergraph on vertex

set X with color set C. H is said to satisfy the set Ψ if it satisfies some specific density

characteristic ψ ∈ Ψ (with some partition functions P). H is said to ε-approximately

satisfy Ψ if there exist partition functions P , and some ψ ∈ Ψ, which is in itself satisfiable

and of distance at most ε from ψH,P .

Definition 6.2.15. For a density characteristic set Ψ, the property ΠΨ of hypergraphs

is defined as the set of all hypergraphs which satisfy Ψ.

A hypergraph is said to ε-approximately satisfy ΠΨ if it ε-approximately satisfies Ψ.

As discussed in the introduction to this section, the fact that a hypergraph approximately satisfies ΠΨ does not necessarily imply that it is also close to satisfying ΠΨ. We thus make a definition analogous to that of a property test.

Definition 6.2.16. Let $\Pi_\Psi$ be a partition property as per the above. A pseudo-test for $\Pi_\Psi$ is a probabilistic oracle machine with the same input and oracle as a (dense model) property test, which distinguishes with probability at least 2/3 between the case of $H$ being in $\Pi_\Psi$ and the case in which, for every choice of partition functions $P$, $(\rho_{H,P}, \mu_{H,P})$ is ε-far from Ψ (rather than the case of $H$ being far from $\Pi_\Psi$).

Pseudo-testing can be seen as testing under a different distance metric — the minimum

distance of the hypergraphs’ partition functions.

6.2.4 Tuple types and type estimators

What is the effect of reassigning a hypergraph’s vertex tuple a different partition cell

on the partition’s density characteristic? As in [GGR98] and [FMS07], we will need to

estimate this effect and cluster tuples accordingly, so as to be able to redistribute tuples

among partition cells without affecting the partition density characteristic overmuch.

Let $s \le k' \le k$, let $x \in \prod_{i=1}^{s} X$ and let $A$ be a sequence in $[k']^s$ (that is, a sequence of $s$ distinct elements between 1 and $k'$). We say that $x$ takes the role of $A$ in a tuple $y \in \prod_{i=1}^{k'} X$ if $y(A) = x$. We’re interested in characterizing the effect on edge densities of having $x$ taking the role of different sequences $A$, for every possible decomposition of the rest of the tuple besides $x$:

Definition 6.2.17. A partial function $\phi : [k']^{\le k'-s} \to [m]$ constitutes an $A$-less $k'$-named tuple decomposition for arity $k'$ (or $(k', A)$-NTD for short) if for every two

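distinct sets $B, B' \in \mathrm{Dom}(\phi)$, $\bigcup B \cap \bigcup B' = \emptyset$, and every index not in $A$ is covered by some $B$, while no index in $A$ is covered, i.e. $\bigcup \{ \bigcup B \mid B \in \mathrm{Dom}(\phi) \} = [k'] \setminus \bigcup A$.

The set of all $(k', A)$-NTDs shall be denoted $\Phi_{k',A}$, with $\Phi_{[k],A}$ denoting their union over all $k' \in [k]$.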

Definition 6.2.18. For a hypergraph $H$, partition functions $P$, a color $c \in C(H)$, an $(r(c), A)$-NTD $\phi \in \Phi_{r(c),A}$, and an $|A|$-tuple $x$, we define

\[ H^{A,x}_\phi(c) = \left\{ y \in H(c) \;\middle|\; y(A) = x \text{ and } \forall B \in \mathrm{Dom}(\phi) \left[ P^{(|B|)}(y(B)) = \phi(B) \right] \right\} \]

In other words, $H^{A,x}_\phi(c)$ is the set of all hyperedges in $H$ of color $c$, in which $x$ takes the role of $A$, and which also observe $\phi$.

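Definition 6.2.19. An $s$-tuple type is a function $\tau : \mathrm{TypeDom}(s) \to [0, 1]$, with its domain being

\[ \mathrm{TypeDom}(s) = \bigcup_{k'=s}^{k} \left\{ (A, c, \phi) \in [k']^s \times C(H) \times \Phi_{[s],A} \;\middle|\; r(c) = k' \text{ and } \phi \in \Phi_{r(c),A} \right\} \]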

Definition 6.2.20. The type of an $s$-tuple $x$ with respect to a hypergraph $H$ and partition functions $P$ is the $s$-tuple type $\tau_{H,P,x}$, whose values are the relative sizes of all of the constrained edge sets of the various arities and for the various roles $x$ can take in such edge sets:

\[ \tau_{H,P,x}(A, c, \phi) = \frac{1}{n^{r(c)-s}} \left| H^{A,x}_\phi(c) \right| \]

We denote by $\mathrm{Type}(s)$ the set of all possible types of $s$-tuples (with respect to $m$ and $k$).
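For intuition, the type of a single vertex in an ordinary graph ($s = 1$, $r(c) = 2$, one color) reduces to the fraction of vertices in each partition cell that are adjacent to it, recorded once per role. The sketch below assumes this special case; the function name `vertex_type`, the 4-cycle, and the parity partition are hypothetical, and cells are indexed from 0.

```python
def vertex_type(n, edges, cell_of_vertex, m, x):
    """Type of vertex x for s = 1 and r(c) = 2.

    For each role (position 0 or 1 of the ordered edge) and each cell j, count the
    edges in which x fills that role while the other endpoint lies in cell j,
    normalized by n ** (r(c) - s) = n.
    """
    tau = {}
    for role in (0, 1):
        for j in range(m):
            hits = sum(1 for y in edges
                       if y[role] == x and cell_of_vertex(y[1 - role]) == j)
            tau[(role, j)] = hits / n
    return tau

# Hypothetical toy instance: the 4-cycle with both orientations of each edge,
# vertices partitioned into two cells by parity.
n = 4
edges = {(u, (u + 1) % n) for u in range(n)} | {((u + 1) % n, u) for u in range(n)}
tau_0 = vertex_type(n, edges, lambda v: v % 2, 2, 0)
```

Both neighbors of vertex 0 are odd, so all of its edge mass sits in cell 1 for either role.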

Definition 6.2.21. The distance $\mathrm{dist}(\tau_1, \tau_2)$ between two $s$-tuple types is the maximum over $(A, c, \phi) \in \mathrm{TypeDom}(s)$ of the absolute differences $|\tau_1(A, c, \phi) - \tau_2(A, c, \phi)|$.

In our arguments below we shall be using rounded estimates of tuples’ type for

clustering. For this purpose, we define:

Definition 6.2.22. The tuple type ε-net for s-tuples is the set TypeNets,ε of all types

τ = (ρ, µ) supported on exact multiples of ε.

Lemma 6.2.23. The size of the ε-net for $s$-tuples is polynomial in $1/\varepsilon$:

\[ \left| \mathrm{TypeNet}_{s,\varepsilon} \right| < (1 + 1/\varepsilon)^{|C(H)| \cdot m^{(2ek)^{2k}}} = \mathrm{poly}(1/\varepsilon) \]

Proof. For every one of the |C(H)| colors, TypeDom (s) has elements for any possible

choice of a sequence A of length s and an A-less NTD in Φr(c). The number of such

choices is m to the power of possible decompositions of an r(c)-tuple into a sequence A

and additional subsequences of [r(c)] \A. The number of such decompositions can be

upper-bounded as follows: Order the elements of [r(c)]; the first s elements constitute

A; as for the rest, one has to choose sizes for as many as $r(c) - s$ additional sequences (which contain the remaining elements of the $r(c)$-tuple as ordered); assuming $s < r(c)$, the number of such choices is $\binom{2(r(c)-s)-1}{r(c)-s}$. The number of decompositions is therefore under $(r(c))! \cdot \binom{2(r(c)-s)-1}{r(c)-s} < (2e\,r(c))^{2r(c)} \le (2ek)^{2k}$. The claim follows.
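The counting bound at the end of this proof is easy to sanity-check numerically for small arities. The sketch below is only an illustration under the reading that the count is $r! \cdot \binom{2(r-s)-1}{r-s}$ for $1 \le s < r \le k$; the function names are hypothetical.

```python
from math import comb, e, factorial

def decomposition_bound(r, s):
    """r! * C(2(r-s)-1, r-s): the count used in the proof, valid for s < r."""
    return factorial(r) * comb(2 * (r - s) - 1, r - s)

def bound_holds_up_to(k):
    """Check r! * C(2(r-s)-1, r-s) < (2er)^(2r) <= (2ek)^(2k) for all 1 <= s < r <= k."""
    return all(
        decomposition_bound(r, s) < (2 * e * r) ** (2 * r) <= (2 * e * k) ** (2 * k)
        for r in range(2, k + 1)
        for s in range(1, r)
    )

ok = bound_holds_up_to(8)
```

The check is far from tight; it only confirms that the stated inequality chain is consistent for the arities tried.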

$\mathrm{TypeNet}_{s,\varepsilon}$ is a $\tfrac{1}{2}\varepsilon$-net of tuple types: any type is within a distance of $\tfrac{1}{2}\varepsilon$ of a type in $\mathrm{TypeNet}_{s,\varepsilon}$. It can therefore induce a clustering of types, associating each possible type with one of those in $\mathrm{TypeNet}_{s,\varepsilon}$: we first impose some arbitrary order on $\mathrm{TypeNet}_{s,\varepsilon}$, then define:

Definition 6.2.24. For any s-tuple type τ , the TypeNets,ε type corresponding to τ is

the first type among those TypeNets,ε types which is at the minimum distance from τ .

Abusing notation, we refer to this corresponding type in $\mathrm{TypeNet}_{s,\varepsilon}$ as “τ’s cluster” with respect to $\mathrm{TypeNet}_{s,\varepsilon}$. Given a specific hypergraph and partition functions $P^{(\cdot)}$, this clustering of tuple types also induces a clustering of a hypergraph’s tuples: a clustering according to type.

Aside from the single type with which a tuple $x$ is associated, $x$ is said to be compatible with any tuple type $\tau \in \mathrm{TypeNet}_{s,\varepsilon}$ whose distance from the type of $x$ is less than ε.
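The clustering and compatibility notions above can be sketched as follows. This is a minimal illustration, assuming that $1/\varepsilon$ is an integer and representing a type as a dictionary of exact fractions; ties are broken by Python's `round`, which stands in for the arbitrary order imposed on the net. All names are hypothetical.

```python
from fractions import Fraction

def dist(t1, t2):
    """Maximum coordinate-wise absolute difference between two types."""
    return max(abs(t1[key] - t2[key]) for key in t1)

def cluster_of(tau, eps):
    """Round every coordinate of tau to a nearest multiple of eps."""
    return {key: round(value / eps) * eps for key, value in tau.items()}

def compatible(tau, net_type, eps):
    """A tuple with type tau is compatible with net_type if they are less than eps apart."""
    return dist(tau, net_type) < eps

eps = Fraction(1, 5)
tau = {"a": Fraction(3, 10), "b": Fraction(47, 100)}  # hypothetical type values
net_type = cluster_of(tau, eps)
```

Rounding moves each coordinate by at most $\tfrac{1}{2}\varepsilon$, so the cluster of a type is always compatible with it, while other net types within distance ε remain compatible as well.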

Definition 6.2.25. A type estimator for a set $Y \subseteq \prod_{i=1}^{s} X$ with respect to $\mathrm{TypeNet}_{s,\varepsilon}$

is a probabilistic machine which, given some tuple y ∈ Y as input, makes certain queries

to the hypergraph, and then returns an element of TypeNets,ε as the estimated cluster

of y.

The concept of shared query complexity for type estimators is defined similarly to the

case of partition oracles (see Definition 6.2.6).

6.3 An upper bound on pseudo-testing partition properties

We begin with our positive result regarding tuple-partition properties: The possibility

of efficiently distinguishing hypergraphs with satisfying partitions from hypergraphs

which do not ε-approximately satisfy the given constraints with any partition:

Theorem 6.1. Let Ψ be a set of density characteristics for hypergraphs with colors C,

regarding partitions with m cells in each arity. One can pseudo-test ΠΨ with a number

of queries polynomial in 1/ε.

Note. For the purpose of this theorem, we assume that the set Ψ is ‘easy’, in the sense

that one can efficiently compute the distance of a specific density characteristic from Ψ

(and specifically, whether it is in Ψ or not). We omit an exact definition of this notion.

The key to the proof (similarly to the argument in [GGR98, Section 9.1] and [FMS07,

Sections 6,7]) is the following: One can arbitrarily redistribute small sets of tuples

among the m partition cells (of the appropriate arity in our case), with a relatively

small effect on the partition’s density characteristic — provided that the elements in

the small set being redistributed all have very similar types, and that the overall size of

every partition cell remains almost the same after redistribution. This is established

in Subsection 6.3.1. Given a small set $Y \subseteq \prod_{i=1}^{s} X$, and assuming that the rest of the partition is known to us, we can rebuild another, similar, partition resulting from the small-set redistribution. We can do so repeatedly for a chosen partition of all the vectors in $\prod_{i=1}^{1} X, \ldots, \prod_{i=1}^{k} X$ into such small redistribution sets, so that, in fact, we eventually need not have any output depend on knowledge of the original partition; the overall deviation from the original partition’s density characteristic will still be relatively small.

The problem with this procedure is that for every small set Y we examine, we do

not actually know the rest of the partition, nor the fraction of the elements in Y of each

type within each of the partition cells. We overcome this ignorance by simply trying all

possibilities, i.e. when sampling tuples with which to determine the type of the elements

of Y , we will ‘branch’ our computation for all m partition cells to which any tuple may

belong. Similarly, when choosing how to redistribute the elements of Y of similar type,

we will in fact branch our computation for all possible sizes for distribution among the

m partition cells (rounded to multiples of some fraction depending on ε). We will thus

construct, in fact, a large number of partition oracles — exponential in 1/ε — but their

shared query complexity will still be polynomial in 1/ε , as they all use the same set of

queries. This construction of partition oracles and tuple type estimators is described

and analyzed in Subsection 6.3.2.

If an appropriate partition exists, then one of these oracles will simulate it relatively

well. The pseudo-test will be able to determine whether this is indeed the case by

estimating the partition’s density characteristic using the oracle. This is demonstrated

in Subsection 6.3.3, and allows us to complete the proof.

Throughout this section we assume that $k \ge \max\{r(c) \mid c \in C(H)\}$, and without loss of generality that $k = \max\{r(c) \mid c \in C(H)\}$. Also, our query complexity expressions treat $m$ and $k$ as constants rather than parameters (e.g. the $O(\cdot)$ notations hide coefficients depending only on $m$ and $k$).

6.3.1 Key Lemma: Low-damage tuple redistribution

Our proof hinges on repeatedly estimating the types of vertex tuples — with respect to

initial partitions P — and then modifying P by redistributing tuples of similar type

among the various cells at a given arity, while ensuring that ψH,P does not change

overmuch.

Suppose, then, that we have a small set of tuples to redistribute. Formally, let $\tau \in \mathrm{Type}(s)$ be a type of $s$-tuples. Let $Y^\tau \subseteq \prod_{i=1}^{s} X$ be a small set of tuples with

$|Y^\tau| < \zeta n^s$, whose types are all ε′-compatible with τ with respect to the partition $P^{(s)}$, except perhaps for at most a ξ-fraction. Also, let $Q : Y^\tau \to [m]$ be a redistribution of $Y^\tau$, that is, a function partitioning it into $m$ cells, which maintains fairly well the number of elements in each partition cell: the partition cell sizes of $Q$ and $P^{(s)}$ (with respect to $Y^\tau$) differ by at most an η-fraction of the total size, that is, for every $j \in [m]$,

\[ \left| \, \left| Q^{-1}(j) \right| - \left| X^{P,s}_j \cap Y^\tau \right| \, \right| < \eta \, |Y^\tau| \]

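Lemma 6.3.1. Let $P$, $Y^\tau$, $Q$ be as per the above, and denote by $\tilde{P}$ the partition functions obtained from $P$ by reassigning the tuples of $Y^\tau$ according to $Q$. Then

\[ \mathrm{dist}\left( \psi_{H,P}, \psi_{H,\tilde{P}} \right) < k^2 \left( \varepsilon' + \eta + \zeta + \xi \right) \cdot \frac{|Y^\tau|}{n^s} \]

Proof. We must bound the change in both the tuple densities and the edge densities of $\psi_{H,\tilde{P}}$ relative to $\psi_{H,P}$.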

As regards the tuple densities, the claim follows from the fact that each set $X^{P,s}_j$ has lost $\left| X^{P,s}_j \cap Y^\tau \right|$ elements and gained $\left| Q^{-1}(j) \right|$ elements; the density $\rho(s, j)$ has therefore changed by no more than $\eta \cdot n^{-s} |Y^\tau|$ as per the constraint on $Q$.

Moving to edge densities, fix some color $c \in C$ and an NTD $\phi \in \Phi_{r(c)}$; we must bound the change in the density of $H_\phi(c)$. We do so by considering various kinds of $s$-tuples in $H_\phi(c)$ with respect to before and after the redistribution:

First consider those $r(c)$-tuples $x$ containing at least two $s$-subtuples from $Y^\tau$: $x(A) \in Y^\tau$ and $x(B) \in Y^\tau$ for two different (and disjoint) sequences $A, B \in \mathrm{Dom}(\phi)$. The fraction of these tuples within all $r(c)$-tuples is at most $n^{-2s} |Y^\tau|^2 < \zeta \cdot n^{-s} |Y^\tau|$ for every choice of disjoint sequences $A$ and $B$ in $\phi$; the number of such choices is less than $(r(c)/s)^2 \le r(c)^2$, so the total contribution of such tuples to the change in density is less than $r(c)^2 \cdot \zeta \cdot n^{-s} |Y^\tau|$.

Next, consider some maximal 1:1 relation between $Q^{-1}(j)$ and $X^{P,s}_j \cap Y^\tau$; we can think of the sources of this relation as tuples being replaced by tuples of similar type (with the rest of the tuples being removed-only or added-only). Consider such a pair of $s$-tuples, $y$ and $y'$, and assume that both have a type which is ε′-compatible with τ. This replacement affects $H_\phi(c)$ through $r(c)$-tuples in which either $y$ or $y'$ appears at least once. Consider some $A \in \mathrm{Dom}(\phi)$ of length $s$, and let $\phi' \in \Phi_{r(c),A}$ be the corresponding $(r(c), A)$-NTD (obtained by removing $A \mapsto j$ from $\phi$). We have $\left| \tau_{P,y}(A, c, \phi') - \tau_{P,y'}(A, c, \phi') \right| < \varepsilon'$, so $H_\phi(c)$ gains or loses at most $\varepsilon' n^{r(c)-s}$ tuples by replacing $H^{A,y}_{\phi'}(c)$ with $H^{A,y'}_{\phi'}(c)$. Summing over all possible choices of $A \in \mathrm{Dom}(\phi)$ (at most $r(c)/s \le r(c)$ of these) and all pairs $y, y'$ in the matching, we find that $H_\phi(c)$ gains or loses at most $r(c) \cdot \varepsilon' \cdot n^{r(c)-s} \cdot |Y^\tau|$ tuples, i.e. its density changes by at most $r(c) \cdot \varepsilon' \cdot n^{-s} |Y^\tau|$.

In this last estimate we have disregarded the effect of $r(c)$-tuples with more than one $s$-subtuple from $Y^\tau$ taking the place of some $A \in \mathrm{Dom}(\phi)$; these may behave differently than what the type of an individual $Y^\tau$ $s$-tuple suggests, but the aggregate contribution of all such tuples to the change in density has already been accounted for with the bound

involving ζ above. We have also disregarded the effect of tuples with incompatible type,

which will be considered below.

Now consider those $s$-tuples in $Q^{-1}(j)$ or $X^{P,s}_j \cap Y^\tau$ which are neither sources nor targets in the above-mentioned 1:1 relation. Their number is at most the difference in size between $Q^{-1}(j)$ and $X^{P,s}_j \cap Y^\tau$, which by our assumptions does not exceed $\eta |Y^\tau|$; thus the fraction of $r(c)$-tuples in which they take the role of some $A \in \mathrm{Dom}(\phi)$, and their effect on the density of $H_\phi(c)$, is at most $r(c) \cdot \eta \cdot n^{-s} |Y^\tau|$.

Finally, for every $A \in \mathrm{Dom}(\phi)$, there are at most $\xi \cdot |Y^\tau| \cdot n^{r(c)-s}$ $r(c)$-tuples in which the role of $A$ is taken by a $Y^\tau$ element whose type is incompatible with τ. The total contribution of these tuples over all $A \in \mathrm{Dom}(\phi)$ to the change in $H_\phi(c)$ density is at most $r(c) \cdot \xi \cdot n^{-s} |Y^\tau|$.

All other $r(c)$-tuples do not involve $s$-tuples from $Y^\tau$, are neither introduced into $H_\phi(c)$ nor removed from it by the redistribution of $Y^\tau$, and do not affect changes in its density.

Summing up the above (and recalling that r(c) ≤ k) yields the claim regarding the

edge density µ(c, φ), for any possible choice of c and φ.
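For reference, the four contributions collected in the proof can be written side by side, in units of $n^{-s}|Y^\tau|$. This is only a restatement of the bounds derived above, using $1 \le r(c) \le k$ for the final step:

```latex
\[
\underbrace{r(c)^2\,\zeta}_{\text{two or more } Y^\tau \text{ subtuples}}
\;+\; \underbrace{r(c)\,\varepsilon'}_{\text{matched pairs}}
\;+\; \underbrace{r(c)\,\eta}_{\text{unmatched tuples}}
\;+\; \underbrace{r(c)\,\xi}_{\text{incompatible types}}
\;\le\; k^2\left(\varepsilon' + \eta + \zeta + \xi\right)
\]
```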

6.3.2 Generating type estimators and partition oracles

The first two of the following three lemmata each requires the next one in its proof; the

proofs appear after the statement of all three. Note that the complexity bounds in all

these lemmata treat k and m as constants rather than parameters.

Lemma 6.3.2. Let Ψ be a set of density characteristics, and let $\delta, \varepsilon' > 0$. One may generate a set $S_{oracles}$ of $\exp(\mathrm{poly}(1/\varepsilon') \cdot \ln(1/\delta))$ partition oracles for $H$ with shared query complexity $q_{6.3.2}(\varepsilon') = \mathrm{poly}(1/\varepsilon') \cdot O(\ln(1/\delta))$, such that if the hypergraph satisfies Ψ, then with probability at least $1 - \delta$ at least one of these oracles induces partition functions which $\tfrac{1}{2}\varepsilon'$-approximately satisfy Ψ. This, without making any queries to $H$, and independently of Ψ.

Lemma 6.3.3. Let $P$ be partition functions for a hypergraph $H$, let $Y \subseteq \prod_{i=1}^{s} X$ be a set of normalized size at most $1/\ell$ and let $\delta' > 0$. One may generate a set $S^{Y}_{oracles}$ of $\exp(\mathrm{poly}(\ell) \ln(1/\delta'))$ partial oracles for $s$-tuples in $Y$, with shared query complexity $q_{6.3.3}(\ell) = \mathrm{poly}(\ell) \cdot O(\ln(1/\delta'))$, so that with probability at least $1 - \delta'$, at least one of these oracles (say, π) is such that the partition functions obtained from $P$ by reassigning the tuples of $Y$ according to π $(6k^2/\ell^2)$-approximately satisfy $\psi_{H,P}$. This, without making any queries to the hypergraph and independently of $P$.

Lemma 6.3.4. Let $P$ be partition functions for a hypergraph $H$, let $Y \subseteq \prod_{i=1}^{s} X$ and let $\delta'', \varepsilon'', \xi > 0$. One may generate a set of at most $\exp(\mathrm{poly}(1/\varepsilon'') \cdot \ln(1/(\delta''\xi)))$ type estimators for the tuples in $Y$, all using a single uniformly-sampled sequence $U$ of $\mathrm{poly}(1/\varepsilon'') \cdot O(\ln(1/(\delta''\xi)))$ vertices, such that at least one of these oracles suggests a compatible cluster with respect to $\mathrm{TypeNet}_{s,\varepsilon''}$ and $P$ for all but a ξ-fraction of the tuples

in each of the clusters induced by TypeNets,ε′′, and such that all oracles only query

tuples involving vertices from the input tuple and from U , for a shared query complexity

of Θ(|U |). This, with probability at least 1− δ′′ over the choice of U , independently of

P, and with no queries made in advance so as to obtain the oracles.

Proof of Lemma 6.3.2. Set $\ell = 12k^3/\varepsilon'$. Assume that $H$ does indeed satisfy the density characteristic Ψ with partition functions $P$. At every arity $s \le k$, we choose an arbitrary partition of the $s$-tuples into $\ell$ equal-size sets: $Y_{s,1}, \ldots, Y_{s,\ell}$.

We generate small-set oracles as described in Lemma 6.3.3, with $\delta' = \delta/(2k\ell)$, obtaining sets $S^{Y}_{oracles}$ of partial oracles for each subset at each arity.

We will now transition from the initial partition functions P through a sequence of

intermediary partition functions, up to the final partition functions $P_{k,\ell}$, which will still

approximately-satisfy Ψ, even though their assignment of cells to tuples is based wholly

on the partitions into Ys,j sets. At every subsequent transition, we apply Lemma 6.3.3

regarding one of the Ys,j sets, to obtain partial oracles for this set — but with respect

to the previous intermediary partition functions, rather than with respect to the initial

partitions P . This is possible due to the fact that Lemma 6.3.3 applies regardless of the

partition for which partial oracles are sought.

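Indeed, set $P_{0,\ell}$ to $P$, satisfying Ψ exactly. With probability at least $1 - \delta/(2k\ell)$, one of the oracles for $Y_{1,1}$, call it $\pi_{1,1}$, is such that $P_{1,1}$ (the partition functions obtained from $P_{0,\ell}$ by reassigning $Y_{1,1}$ according to $\pi_{1,1}$) $(6k^2/\ell^2)$-approximately satisfies $\psi_{P_{0,\ell}}$ (specifically, $P_{1,1}$ $\varepsilon'/(2k\ell)$-approximately satisfies $\psi_{H,P}$; and this will hold for subsequent partition functions at any arity, not just arity 1, by our choice of $\ell$). Similarly, with probability at least $1 - \delta/(2k\ell)$, one of the oracles for $Y_{1,2}$, call it $\pi_{1,2}$, is such that $P_{1,2}$, obtained from $P_{1,1}$ by reassigning $Y_{1,2}$ according to $\pi_{1,2}$, $\varepsilon'/(2k\ell)$-approximately satisfies $\psi_{P_{1,1}}$ (thus $\varepsilon'/(k\ell)$-approximately satisfying $\psi_{P_{0,\ell}}$), and so on until $P_{1,\ell}$, which $\varepsilon'/(2k)$-approximately satisfies $\psi_{P_{0,\ell}}$. We implicitly construct similar partition functions $P_{s,1}, \ldots, P_{s,\ell}$ for the sets of 2-tuples, 3-tuples, and every arity $s$, beginning each time with $P_{s-1,\ell}$ from the previous phase. Eventually, with probability at least $1 - \delta/2$, some sequence of oracles $(\pi_{1,1}, \ldots, \pi_{k,\ell})$ yields a complete partition $P_{k,\ell}$ which $\tfrac{1}{2}\varepsilon'$-approximately satisfies $\psi_{P_{0,\ell}} = \psi_{H,P}$.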
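The parameter bookkeeping in this argument can be checked with exact arithmetic. The following sketch assumes sample values for $k$ and $\varepsilon'$ (both hypothetical) and verifies that the per-step error $6k^2/\ell^2$ coincides with $\varepsilon'/(2k\ell)$ when $\ell = 12k^3/\varepsilon'$, so that the $k\ell$ redistribution steps accumulate to $\tfrac{1}{2}\varepsilon'$.

```python
from fractions import Fraction

def step_parameters(k, eps):
    """Return (ell, per-step error) for ell = 12 k^3 / eps and error 6 k^2 / ell^2."""
    ell = 12 * Fraction(k) ** 3 / eps
    per_step = 6 * Fraction(k) ** 2 / ell ** 2
    return ell, per_step

k, eps = 3, Fraction(1, 10)          # hypothetical sample values
ell, per_step = step_parameters(k, eps)

# One redistribution step per set Y_{s,j}: ell sets at each of k arities.
total_error = k * ell * per_step
```

The same count of $k\ell$ steps, each failing with probability at most $\delta/(2k\ell)$, gives the overall failure probability of at most $\delta/2$ by a union bound.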

Consequently, our set of oracles for the entire hypergraph is the set of all combinations of $Y_{s,1}, \ldots, Y_{s,\ell}$ oracles for all $s \in [k]$, constituting $\left( \exp(\mathrm{poly}(\ell) \cdot \ln(1/\delta')) \right)^{k\ell} = \exp(\mathrm{poly}(1/\varepsilon') \cdot \ln(1/\delta))$ oracles in total. Their shared query complexity is the same as for the small-set partial oracles, since in order to get the oracles’ output for a given vertex tuple, one in fact uses only the small-set oracles ‘covering’ the tuple in question.

Proof of Lemma 6.3.3. Our partition oracles will be based on the principle of ‘low-damage’ small-set redistribution, embodied in Lemma 6.3.1: we will estimate the types of the various tuples of $Y$, and redistribute them accordingly.

Assume initially that $P$ is known to us; we will later forego this assumption.

First, we choose one of the type estimators of Lemma 6.3.4, with parameters $\xi = 1/\ell$, $\delta'' = \delta'$ and $\varepsilon'' = 1/\ell$; the estimator induces a clustering of the $s$-tuples in $Y$ by their estimated type, which we denote $\left\{ Y^\tau \mid \tau \in \mathrm{TypeNet}_{s,\varepsilon''} \right\}$. Our redistribution will

respect the lexicographic order of $s$-tuples, so that all tuples in a $Y^\tau$ up to some tuple $x_1$ are reassigned to cell 1, tuples between $x_1$ and $x_2$ are reassigned to cell 2 etc. This will later allow us to forego the knowledge of $P$; for now note that the decision of how to redistribute tuples with a similar type does not have any effect on the applicability of Lemma 6.3.1.

We must thus decide, for each $Y^\tau$ and each partition cell $j < m$, at which tuples to make the transition from cell $j$ to cell $j + 1$. This is clearly dictated by the size of the intersection of $Y^\tau$ with each cell $j$; but instead of using the exactly appropriate tuple range, we set the ranges differently: we only choose as boundary tuples for cell reassignment such tuples whose positions are multiples of $n^s / (\ell^3 |\mathrm{TypeNet}_{s,\varepsilon''}|)$, between 0 and $n^s$. For each cell $j$ the choice is either of the highest multiple of $n^s / (\ell^3 |\mathrm{TypeNet}_{s,\varepsilon''}|)$ below $\left| X^{P,s}_j \cap Y^\tau \right|$ (the original intersection size), or the lowest multiple above $\left| X^{P,s}_j \cap Y^\tau \right|$. The decision of which of these options to choose is made so that the ranges cover all $n^s$ tuples exactly. (This choice is possible, since we can begin by always choosing the lower multiple of $n^s / (\ell^3 |\mathrm{TypeNet}_{s,\varepsilon''}|)$ for the differences in boundaries, ending up not covering all $n^s$ tuples, and then gradually increasing the differences to the higher multiples; at some point we will hit $n^s$ exactly.)
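The rounding argument in the parenthetical remark can be sketched as a small routine: round every cell size down to the grid, then raise sizes to the next multiple one at a time until the total is restored. This is an illustration only; the function name is hypothetical, and the sketch assumes the total size is itself a multiple of the grid step.

```python
def round_cell_sizes(sizes, step):
    """Round each size to an adjacent multiple of `step` while preserving the total.

    Assumption of this sketch: sum(sizes) is a multiple of step.
    """
    total = sum(sizes)
    if total % step != 0:
        raise ValueError("total size must be a multiple of the grid step")
    # Start with the lower multiple for every cell.
    rounded = [(size // step) * step for size in sizes]
    deficit = (total - sum(rounded)) // step
    # Switch cells to the higher multiple, one at a time, until the total is hit.
    for index, size in enumerate(sizes):
        if deficit == 0:
            break
        if size % step != 0:
            rounded[index] += step
            deficit -= 1
    return rounded

sizes = [7, 12, 5, 6]                 # hypothetical intersection sizes with 4 cells
rounded = round_cell_sizes(sizes, 5)
```

Each rounded size differs from the original by less than one grid step, which is what lets the redistribution satisfy the cell-size constraint of Lemma 6.3.1 for sets that are not too small.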

We now redistribute $Y^\tau$ according to the boundaries we have set. If $Y^\tau$ is very small, it is possible that we’ve changed the intersection sizes by a significant fraction of the size of $Y^\tau$, perhaps even placed all of it in a single partition cell. But for most $Y^\tau$ this is not the case: denoting by $T_{small}$ the set of types $\tau \in \mathrm{TypeNet}_{s,\varepsilon''}$ for which $|Y^\tau| < n^s / (\ell^2 |\mathrm{TypeNet}_{s,\varepsilon''}|)$, we have $\left| \bigcup \{ Y^\tau \mid \tau \in T_{small} \} \right| < n^s/\ell^2$, as $T_{small}$ has no more than $|\mathrm{TypeNet}_{s,\varepsilon''}|$ elements. For a $\tau \notin T_{small}$, the size of each of its intersections with each partition cell changes by at most $|Y^\tau|/\ell$ relative to the original partition.

This redistribution has in general an adverse effect on $P$’s satisfaction of Ψ: even if $Y^\tau$ is not very small, and if the types of all tuples in $Y^\tau$ were exactly τ, and the redistribution would not be changing the sizes of partition cells’ intersections with $Y^\tau$ sets at all, there would still be the effect of tuples involving multiple elements from $Y^\tau$ which have now changed cells. And of course, the type estimators may not be perfectly exact; and the types in $Y^\tau$ are only close to τ; and the redistribution intersection sizes are only close to the original sizes. Still, we can apply Lemma 6.3.1 to bound the effect of the redistribution on the density characteristic: for a $Y^\tau$ with $\tau \in T_{small}$, Lemma 6.3.1 applies with parameters $\zeta = 1/(\ell^2 |\mathrm{TypeNet}_{s,\varepsilon''}|)$, $\xi = \varepsilon' = 1/\ell$ and $\eta \le 1$; and for $Y^\tau$ with $\tau \notin T_{small}$, the lemma applies with $\zeta = \xi = \eta = \varepsilon' = 1/\ell$.

Let us sum up the total effect of these redistributions as a bound on the distance from the original partition (using the triangle inequality). The contribution of the redistribution of $Y^\tau$ with $\tau \in T_{small}$ is at most $k^2 \left( 1/(\ell^2 |\mathrm{TypeNet}_{s,\varepsilon''}|) + 1/\ell + 1 + 1/\ell \right) \cdot n^{-s} |Y^\tau| < 2k^2 n^{-s} |Y^\tau|$; over all such sets $Y^\tau$ the total contribution is at most $2k^2 n^{-s} \cdot (n^s/\ell^2) = 2k^2/\ell^2$. The contribution of the redistribution of a $Y^\tau$ set with $\tau \notin T_{small}$ is at most $k^2 (1/\ell + 1/\ell + 1/\ell + 1/\ell) \cdot n^{-s} |Y^\tau|$, and over all such $Y^\tau$, at most $4k^2/\ell^2$.

Thus, if one of the type estimators of Lemma 6.3.4 clusters most vertices in each cluster into compatible clusters (which happens with probability at least $1 - \delta'$), then the choice of this type estimator yields a partition (our initial partition $P$ following the redistribution of the sets $Y^\tau$ by the oracle π) which $(6k^2/\ell^2)$-approximately satisfies $\psi_{H,P}$.
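Spelled out, the two totals combine as follows; this merely restates the sums above, using the bound $|Y| \le n^s/\ell$ on the normalized size of $Y$ and the bound $n^s/\ell^2$ on the total size of the small clusters:

```latex
\[
\sum_{\tau \in T_{small}} 2k^2\, n^{-s} |Y^\tau|
\;+\; \sum_{\tau \notin T_{small}} \frac{4k^2}{\ell}\, n^{-s} |Y^\tau|
\;\le\; 2k^2\, n^{-s} \cdot \frac{n^s}{\ell^2}
\;+\; \frac{4k^2}{\ell}\, n^{-s} \cdot \frac{n^s}{\ell}
\;=\; \frac{6k^2}{\ell^2}
\]
```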

We need, however, to achieve the above without foreknowledge of $P$. We note that the only use of the knowledge of $P$ in the argument above was the choice of boundary values for the redistribution of each $Y^\tau$, and even those were only multiples of $|Y| / (\ell^2 |\mathrm{TypeNet}_{s,\varepsilon''}|)$. Thus instead of relying on our knowledge of the partition, we will have multiple oracles, one for every possible setting of boundary values for $Y^\tau$, for every type $\tau \in \mathrm{TypeNet}_{s,\varepsilon''}$, and every one of the $m$ partition cells in $P^{(s)}$. (Note that each such oracle for $Y$ can readily compute the redistribution cell for a given tuple using the type estimate and its predefined boundary values.) The total number of such configurations is less than $\left( \ell^3 |\mathrm{TypeNet}_{s,\varepsilon''}| + 1 \right)^{m \cdot |\mathrm{TypeNet}_{s,\varepsilon''}|}$, so the total number of oracles for establishing the claim is this number, times the number of possibilities for a choice of the type estimator from Lemma 6.3.4. By Lemma 6.2.23, the first multiplicand is $O(\exp(\mathrm{poly}(\ell)))$; by Lemma 6.3.4, the second multiplicand is $p_{6.3.4}(1/\ell, \delta', 1/\ell) = \exp(\mathrm{poly}(\ell) \cdot \ln(\ell/\delta')) = \exp(\mathrm{poly}(\ell) \cdot \ln(1/\delta'))$, so the product is $\exp(\mathrm{poly}(\ell) \cdot \ln(1/\delta'))$ oracles overall, as claimed. With probability at least $1 - \delta'$, at least one of the choices of the type estimator and one of the choices of rounded intersection values correspond well enough to the actual partition so that Lemma 6.3.1 applies with the above parameters. The oracles maintain the same shared query complexity as that of a single oracle, since they do not differ with respect to the queries made for a given tuple, so the same queries can be used by all oracles; this query complexity is, in turn, merely that of using the type estimator, i.e. $O(\ln(\ell/\delta')) \cdot \mathrm{poly}(\ell) = \mathrm{poly}(\ell) \cdot O(\ln(1/\delta'))$ as claimed.

Proof of Lemma 6.3.4. Our type estimators will base their output on the clustering

induced by TypeNets,ε′′ — applied to an estimate of a tuple’s type rather than its actual

type. We assume initially that the partition functions P are known, and describe a

single oracle clustering the tuples.

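Let $y \in Y$ be the tuple to have its type clustered. Set

\[ t = \ln\left( \frac{1}{\delta''\xi} \cdot N_{dc} \cdot \left| \mathrm{TypeNet}_{s,\varepsilon''} \right| \right) \cdot \frac{2}{\varepsilon''^2} \]

where $N_{dc}$ is the number of density values in a density characteristic (see Observation 6.2.12).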

Let (A, c, φ) ∈ TypeDom (s). The estimator samples, uniformly, a sequence of

r(c)−s vertices from X, which complete y into an r(c)-tuple x with x(A) = y. Using its

knowledge of the partition, the estimator determines whether or not this tuple observes

the NTD φ, and queries the hypergraph to determine whether x is an edge of H(c).

This is repeated t times, independently, and the density value estimate τU,y(A, c, φ) is

the fraction of samples x in Hφ(c).

This estimate is made for all $(A, c, \phi) \in \mathrm{TypeDom}(s)$. As we will be union-bounding the probability of any of the estimates deviating overmuch, we have the estimator use the same samples for all choices of $(A, c, \phi)$, that is, a sequence of $k - s$ vertices is sampled $t$ times, with only the first $r(c) - s$ vertices in each sample being used for $\tau_{U,y}(A, c, \phi)$ estimates. This entire sequence of $t \cdot (k - s)$ vertices is our choice of $U$.

The estimation of a tuple’s type may fail (with the estimator suggesting an incompatible cluster) only if one of the estimated density values is $\tfrac{1}{2}\varepsilon''$-far from the actual value. The probability that a uniformly sampled completion of $y$ into an $r(c)$-tuple will be in $H_\phi(c)$ is, by definition, $\tau_{P,y}(A, c, \phi)$, and the estimate $\tau_{U,y}(A, c, \phi)$ is an average of $t$ independent indicators with this probability. We may therefore apply a large deviation bound to conclude that

\[ \Pr\left[ \left| \tau_{U,y}(A, c, \phi) - \tau_{P,y}(A, c, \phi) \right| \ge \frac{\varepsilon''}{2} \right] < 2 \cdot \exp\left( -2 \left( \frac{\varepsilon''}{2} \right)^2 t \right) = \frac{\delta''\xi}{N_{dc} \cdot \left| \mathrm{TypeNet}_{s,\varepsilon''} \right|} \]

Union-bounding over all density values in the tuple’s type, we conclude that the probability that any estimate is $\tfrac{1}{2}\varepsilon''$-far from the real value, i.e. the probability of failure to output a compatible cluster, is less than $\delta''\xi / |\mathrm{TypeNet}_{s,\varepsilon''}|$.

We wish to ensure a high enough probability of outputting compatible clusters for most vertices in each cluster induced by $\mathrm{TypeNet}_{s,\varepsilon''}$. Consider some such cluster. The expected fraction of $Y$ tuples from this cluster, for which the estimator outputs an incompatible cluster, is less than $\delta''\xi / |\mathrm{TypeNet}_{s,\varepsilon''}|$. Applying Markov’s inequality to the tuples in this cluster, we conclude that with probability greater than $1 - \delta'' / |\mathrm{TypeNet}_{s,\varepsilon''}|$, the estimator outputs a compatible cluster for all but a ξ-fraction of them. We now union-bound again, this time over all clusters in $\mathrm{TypeNet}_{s,\varepsilon''}$, to conclude that with probability greater than $1 - \delta''$, the clustering is indeed correct for all but a ξ-fraction of the tuples in each cluster.
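The sample-size calculation behind the large deviation step can be sketched numerically. The code below uses the two-sided Hoeffding bound $2\exp(-2(\varepsilon''/2)^2 t)$ and, as a choice of this sketch rather than of the text, places the leading factor 2 inside the logarithm when solving for $t$. The function names and the sample parameter values are hypothetical.

```python
from math import ceil, exp, log

def hoeffding_tail(eps, t):
    """Two-sided Hoeffding bound on an average of t indicators deviating by eps/2."""
    return 2 * exp(-2 * (eps / 2) ** 2 * t)

def samples_needed(eps, fail):
    """Smallest integer t for which hoeffding_tail(eps, t) <= fail."""
    return ceil(2 / eps ** 2 * log(2 / fail))

eps = 0.1      # plays the role of epsilon''
fail = 1e-3    # plays the role of delta'' * xi / (N_dc * |TypeNet|)
t = samples_needed(eps, fail)
```

The dependence on the accuracy is quadratic in $1/\varepsilon''$ and only logarithmic in the failure probability, which is what keeps the size of the shared sample $U$ at $\mathrm{poly}(1/\varepsilon'') \cdot O(\ln(1/(\delta''\xi)))$.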

Finally, we must contend with the fact that the estimator does not actually know $P$. Instead of using the (single) estimator’s knowledge of $P$ to decide which tuples within every sample originate in which partition cell in $P$, we will have multiple estimators: there will be one estimator for every possible assignment of each subtuple of each $(k - s)$-tuple used in the type estimation sample; each of these estimators assumes knowledge of a different $P$. Now, the number of possible choices of partition cells for subtuples of a single $(k - s)$-tuple is at most $m^{2^{(k-s)}}$; and over all $t$ tuples, the number $p_{6.3.4}(\varepsilon'', \delta'', \xi)$ of such choices is less than $m^{2^{(k-s)} \cdot t} = m^{O(\ln(1/(\delta''\xi)) \cdot \mathrm{poly}(1/\varepsilon''))} = \exp(\mathrm{poly}(1/\varepsilon'') \cdot \ln(1/(\delta''\xi)))$, thus the total number of type estimators is as claimed.

The estimators all share the same sequence $U$ of sampled vertices as the single estimator assuming knowledge of the partition: $t \cdot (k - s) = O(\ln(1/(\delta''\xi))) \cdot \mathrm{poly}(1/\varepsilon'')$ vertices are sampled, as claimed.


6.3.3 Distinguishing good and bad partition oracles

We have established that an unknown partition satisfying a certain density characteristic

can be replaced with a (large) set of partition oracles of our construction, one of whose

induced partitions satisfies the density characteristic approximately. If no approximately-

satisfying partition exists, our construction will still yield a set of oracles, but they will

be useless — none of them will satisfy the density characteristic even approximately;

we need to be able to tell these two cases apart.

Lemma 6.3.5. Suppose one is given a set S of (q, m, k) partition oracles for a hypergraph H, with shared query complexity q. There exists a probabilistic algorithm making at most $q_{6.3.5}(\varepsilon', \delta, q) = O\left(\varepsilon'^{-2} \log(1/\delta) \cdot \log(|S|) \cdot q\right)$ queries to H for which the following holds:

• If one of the oracles’ induced partitions $\tfrac{1}{2}\varepsilon'$-approximately satisfies Ψ, then the algorithm outputs accept with probability at least 1 − δ.

• If none of the oracles’ induced partitions ε′-approximately satisfy Ψ, then the algorithm outputs reject with probability at least 1 − δ.

Proof. Essentially, we can obtain good estimates of the density characteristic of each

oracle, and decide accordingly.

Consider a single oracle π ∈ S, inducing partitions $P_\pi$; our estimate of its density characteristic shall be denoted $\psi_U$. Set, with foresight,

$$t = 8 \cdot \log(2) \cdot \frac{1}{\varepsilon'^2} \cdot \left( \log\left(\frac{1}{\delta}\right) + \log\left(N_{dc}\right) + \log(|S|) \right)$$

(recalling that $N_{dc}$ is the number of density values in a density characteristic; see Observation 6.2.12). We sample t sequences of k vertices each: $\left((x_{h,1}, \ldots, x_{h,k})\right)_{h=1}^{t}$; let $x_h$ denote the hth k-tuple.

Now, for the partition set vertex density estimates, and for s ≤ k, we use the first s elements of each sampled tuple to estimate the densities for that arity; we set (abusing notation somewhat)

$$\rho_U(s, j) = \frac{1}{t} \left| \{ h \in [t] \mid \pi(x_h) = j \} \right|$$

As for the edge density estimates, for every color c ∈ C(H) and $\phi \in \Phi_{r(c)}$ we let

$$\mu_U(c, \phi) = \frac{1}{t} \left| \{ h \in [t] \mid x_h \in H(c) \text{ and observes } \phi \} \right| = \frac{1}{t} \left| \{ h \in [t] \mid x_h \in H(c) \text{ and for every } A = (j_1, \ldots, j_s) \in \mathrm{Dom}(\phi),\ \pi(x_h(A)) = \phi(A) \} \right|$$

that is, $\mu_U(c, \phi)$ is the fraction of the t samples whose first r(c) elements support an H(c) hyperedge and have sub-tuples which the oracle places in the partition cells indicated by φ.


To bound the probability of the estimates being overly far from the actual density values, note that, for every sample-set index j, we have

$$\Pr\left[\pi((x_{h,1}, \ldots, x_{h,s})) = j\right] = \rho_{P_\pi}(s, j)$$

$$\Pr\left[(x_{h,1}, \ldots, x_{h,s}) \in H(c) \text{ and observes } \phi\right] = \mu_{P_\pi}(c, \phi)$$

since the tuple vertices, and hence also the tuples, are sampled uniformly and independently. The estimates $\rho_U(s, j)$ and $\mu_U(c, \phi)$ each admit, therefore, a large deviation bound:

$$\Pr\left[ \left| \rho_U(s, j) - \rho_{P_\pi}(s, j) \right| > \frac{\varepsilon'}{4} \right] < 2 \cdot \exp\left( -2 \left( \frac{\varepsilon'}{4} \right)^2 t \right) = \frac{\delta}{N_{dc} \cdot |S|}$$

and the bound for $\mu_U(c, \phi)$ is the same. Union-bounding over all $N_{dc}$ density values in the characteristic, we find that with probability greater than 1 − δ/|S|, our estimates will indeed all be within less than ε′/4 of the correct values. Union-bounding again over all oracles in S, we find that, with probability greater than 1 − δ, all oracle density characteristic estimates are correct to within less than ε′/4, independently of which density characteristic these are.

Conditioning on this event, if any of the oracles’ partitions ε′/2-approximately satisfies Ψ, this oracle’s estimate will be at distance under 3ε′/4 from Ψ; while if no oracle’s partition even ε′-approximately satisfies Ψ, all estimates’ distances from Ψ will be higher than 3ε′/4. In the former case, we accept, while in the latter, we reject. This completes a valid algorithm meeting the requirement of the claim, with probability of success greater than 1 − δ.

Finally, the number of (single) oracle invocations in making the estimate is t times the number of subsequences of elements of the k-tuples, which is less than $k! \cdot 2^k$. An additional t · |C(H)| queries to the hypergraph are made. As the oracles have shared query complexity q, the total number of queries made for estimating all of their density characteristics is: $t \cdot \left(k! \cdot 2^k \cdot q + |C(H)|\right) = O\left(\varepsilon'^{-2} \log(1/\delta) \cdot q\right)$, as claimed.
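The sampling and decision steps of this proof can be illustrated with a minimal Python sketch. It is not the thesis's construction: the function names are hypothetical, and the sample size is obtained by inverting the Hoeffding bound directly (with natural logarithms), so its constant differs slightly from the "with foresight" choice of t above.

```python
import math


def sample_count(eps: float, delta: float, n_dc: int, n_oracles: int) -> int:
    """Smallest t with 2*exp(-2*(eps/4)**2 * t) <= delta / (n_dc * n_oracles).

    Assumption: derived by inverting the Hoeffding bound directly, so the
    constant is not identical to the choice of t made in the proof.
    """
    target = delta / (n_dc * n_oracles)
    return math.ceil(8 * math.log(2 / target) / eps ** 2)


def deviation_bound(eps: float, t: int) -> float:
    """Two-sided Hoeffding bound on one estimate being off by more than eps/4."""
    return 2 * math.exp(-2 * (eps / 4) ** 2 * t)


def decide(estimated_distances, eps: float) -> bool:
    """Accept iff some oracle's estimated distance from Psi is below 3*eps/4."""
    return any(d < 0.75 * eps for d in estimated_distances)
```

With every estimate within ε′/4 of its true value, an oracle whose partition ε′/2-approximately satisfies Ψ falls below the 3ε′/4 threshold, while oracles that are not even ε′-approximately satisfying stay above it, which is the separation `decide` relies on.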

With the ability to generate an appropriate set of oracles, and to distinguish whether

any of them induces an acceptable partition, we can now proceed to prove the upper

bound:

Proof of Theorem 6.1. Set δ = 1/6 and ε′ = ε. Our algorithm acts as follows: The test

generates a set S of oracles as described in Lemma 6.3.2, applies the distinguishing

algorithm of Lemma 6.3.5 to these oracles, and accepts if and only if the algorithm

accepts.

If the hypergraph satisfies Ψ, then by Lemma 6.3.2, with probability at least 5/6, one of the oracles induces a partition which $\tfrac{1}{2}\varepsilon$-approximately satisfies Ψ; such an oracle will be accepted by the algorithm of Lemma 6.3.5 with probability at least 5/6, so with probability at least 2/3 overall, the test accepts.


If the hypergraph does not ε-approximately satisfy Ψ, then no oracle is such that its induced partition ε-approximately satisfies Ψ, so all the oracles will be rejected with probability at least 5/6.

The oracles’ shared query complexity is q = poly(1/ε) · O(ln(1/δ)) = poly(1/ε), and the number of oracles is $|S| = \exp(\mathrm{poly}(1/\varepsilon)) \cdot \delta^{-O(1/\varepsilon)}$, so the total number of queries made by the distinguishing algorithm of Lemma 6.3.5 is $O\left(\varepsilon^{-2} \log(|S|) \cdot q\right) = \mathrm{poly}(1/\varepsilon)$.
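The 2/3 acceptance probability in the first case of this proof follows from a union bound over the two possible failure events. Spelled out, with event descriptions worded here for illustration only:

```latex
\Pr[\text{test accepts}]
  \;\ge\; 1 - \Pr[\text{no oracle in } S \text{ is } \tfrac{1}{2}\varepsilon\text{-approximately satisfying}]
            - \Pr[\text{the distinguishing algorithm errs}]
  \;\ge\; 1 - \tfrac{1}{6} - \tfrac{1}{6} \;=\; \tfrac{2}{3}.
```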

6.4 A lower bound on testing partition properties

In this section we show that Theorem 6.1 of the previous section cannot be strengthened

from polynomial pseudo-testing to polynomial testing, by the following:

Theorem 6.2. There exists a density characteristic Ψ for hypergraphs of maximum arity 3, such that testing $\Pi_\Psi$ requires as many queries as testing a digraph for being triangle-free, up to a constant factor (specifically, $\Omega\left((c'/\varepsilon)^{c' \cdot \ln(c'/\varepsilon)}\right)$ queries are required for some global constant c′).

The combination of the upper bound Theorem 6.1 and this lower bound implies

immediately that pseudo-testing is significantly weaker than actual testing:

Corollary 6.3. The testing query complexity of some partition properties is not bounded

by a polynomial function of their pseudo-testing query complexity.

The lower bound Theorem 6.2 will be proven via a reduction (in the sense of Defini-

tion 2.4.1) from testing triangle-freeness to testing a partition property which we shall

construct. Our construction will use the density characteristic to ‘align’ a partition

of the vertex pairs with a partition of the 2-tuples into edges and non-edges; having

done so, we will constrain every 3-tuple to contain at least one pair of vertices which

is a non-edge, that is, a pair that resides in the 2-tuple partition cells containing only

non-edges. This will make for a straightforward reduction from triangle-freeness testing

to testing the satisfaction of the set of density characteristics corresponding to the above

constraints.

6.4.1 Expressing basic constraints with density characteristics

To express the constraints necessary for the reduction from triangle-freeness, we shall

explore the expressive power of partition properties, gradually establishing its expansion.

The first obvious constraints that we can express using a density characteristic set are the equality of a density value, for single (vertex or edge) density values, e.g. μ(c, φ) = α, where α = 0 means “there are no edges respecting a certain NTD” and α = 1 means “all tuples are edges respecting this NTD”. One can also constrain the sum of several density values. An important example of this would be $\sum_{\phi \in \Phi_s} \mu(c, \phi) = \alpha$, constraining the total density of the edge relation of color c to be α.


We may also constrain relations of µ values or ρ values to each other, thus expressing

the constraint of partition sets having certain sets of equal size, or sizes which are

functions of each other.

We would like to make finer and more elaborate constraints regarding the hypergraph

edge relations. Efforts in this direction may bear some fruit, e.g.:

Observation 6.4.1. If two (sets of) constraints on hypergraphs (without loss of gener-

ality, having the same set of colors) are partition-expressible, then so is their disjunction

— using the union of the density characteristic sets expressing each of them (and perhaps

promoting first the density characteristics for one of the constraints to a higher value of

m, by constraining the gratuitous sets to be empty).

But it may not be possible to achieve much more than the basics described above.

However, this section focuses on a lower bound rather than expressivity in general,

and for this purpose we may avail ourselves of ‘easy’ auxiliary relations, added to our

hypergraphs, to increase the expressive power using combinations of density constraints.

It will later become clear how such relations are useful for our lower bound construction;

for now let us describe the mechanism for their use:

A partition cell $X^{P,r(c)}_j$ with respect to (m, k, C)-partition functions P is said to capture the color c ∈ C if $X^{P,r(c)}_j = H(c)$, i.e. the partition cell contains exactly those r(c)-tuples which are edges of color c. A set of partition cells is said to capture c if their union contains exactly those tuples being edges of color c.

Lemma 6.4.2. Assume m > 1. Fix a color c and let S ⊆ [m]. There exists a density characteristic set Ψ1 (respectively, Ψ2) expressing the constraint of $\{X^{P,r(c)}_j \mid j \in S\}$ capturing H(c) (respectively, capturing $H(c)^c = \prod_{i=1}^{r(c)} X \setminus H(c)$).

Proof. For any j ∈ S, let $\phi_j$ be the NTD mapping $[r(c)] \overset{\phi_j}{\longmapsto} j$, with $\phi_j$ not being defined for any other subsequence of [k]. We make the constraints $\mu(c, \phi_j) = \rho(r(c), j)$ for every j ∈ S, and $\mu(c, \phi_j) = 0$ for all j ∈ [m] \ S. This ensures that all tuples in each $X^{P,r(c)}_j$ with j ∈ S are in H(c), and prevents any tuples in H(c) from originating in other cells of arity r(c), thus achieving the desired overall constraint.

For capturing $H(c)^c$, we constrain $\mu(c, \phi_j) = 0$ for all j ∈ S, and use the sum constraint $\sum_{j \notin S} \mu(c, \phi_j) = 1 - \sum_{j \in S} \rho(r(c), j)$.
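The two constraint sets of this proof can be checked by brute force on a small example. The following Python sketch is illustrative only: the function names are hypothetical, cells are indexed from 0 rather than from 1, and densities are taken as exact fractions of all r-tuples over the vertex set.

```python
from fractions import Fraction
from itertools import product


def densities(vertices, r, cell_of, edges, m):
    """Return (rho, mu) as lists indexed by cell: rho[j] is the fraction of all
    r-tuples lying in cell j; mu[j] is the fraction of all r-tuples that lie in
    cell j and are also edges of the color under consideration."""
    tuples = list(product(vertices, repeat=r))
    unit = Fraction(1, len(tuples))
    rho = [Fraction(0)] * m
    mu = [Fraction(0)] * m
    for x in tuples:
        j = cell_of(x)
        rho[j] += unit
        if x in edges:
            mu[j] += unit
    return rho, mu


def cells_capture_edges(rho, mu, S):
    """Constraints for capturing H(c): mu = rho on S, and mu = 0 outside S."""
    m = len(rho)
    return (all(mu[j] == rho[j] for j in S)
            and all(mu[j] == 0 for j in range(m) if j not in S))


def cells_capture_complement(rho, mu, S):
    """Constraints for capturing the complement of H(c): mu = 0 on S, and the
    edge density outside S equals 1 minus the total size of the cells in S."""
    m = len(rho)
    outside = sum((mu[j] for j in range(m) if j not in S), Fraction(0))
    inside = sum((rho[j] for j in S), Fraction(0))
    return all(mu[j] == 0 for j in S) and outside == 1 - inside
```

For instance, with vertex set {0, 1, 2}, edges {(0, 1), (1, 2)} and a partition placing exactly the edges in cell 0, the first check passes for S = {0} and the second for S = {1}.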

In essence, the above describes a ‘sacrifice’ of an edge relation, as it will not hold

any ‘information’ other than our choice of a partition cell, or union of cells, at the

appropriate arity. Having made this sacrifice, however, we have increased our expressive

power regarding the captured partition cells:

Observation 6.4.3. One may constrain intersections of NTD-respecting tuple sets

not merely with edge relations (i.e. constrain the density of Hφ(c) sets) but also the


intersections of NTD tuple sets with other partition cells (or unions of partition cells).

Thus the set $\left\{ y \in \bigcup_{j \in S} X^{P,s}_j \;\middle|\; \forall B \in \mathrm{Dom}(\phi)\ \left[ y(B) \in \phi(B) \right] \right\}$ can be constrained by adding a color $c_S$ with $r(c_S) = s$, constraining $\{X^{P,s}_j \mid j \in S\}$ to capture $c_S$, and then using constraints on $H_\phi(c_S)$ (which can be made ‘directly’ through the density characteristic set).

6.4.2 FOL constraints and density characteristic composition

Definition 6.4.4. For a hypergraph H with colors C(H), the vocabulary $\tau_C$ consists of a symbol $R_c$ for every color c ∈ C(H), with arity r(c), and no constants or function symbols.

Throughout the rest of the section, we refer to formulae and sentences of First-Order Logic without equality, with some fixed vocabulary $\tau_C$; hypergraphs having color set C are said to respect $\tau_C$.

Definition 6.4.5. Consider some partition functions P of a hypergraph H, some formula $\varphi(x_1, \ldots, x_s)$ and some S ⊆ [m]. The set of partition cells $\{X^{P,s}_j \mid j \in S\}$ is said to capture ϕ if $\bigcup \{X^{P,s}_j \mid j \in S\}$ contains exactly those s-tuples which satisfy ϕ.

Definition 6.4.6. Consider a function f from the labeled hypergraphs of order s with color set C to {0, 1}. We denote by $D_f$ the set of all hypergraphs with color set $C \mathbin{\dot\cup} \{c'\}$, and with r(c′) = s, such that for every $H \in D_f$, H(c′) contains exactly those tuples $x = (x_1, \ldots, x_s)$ for which f returns 1 when applied to the labeled hypergraph of H induced by $x_1, \ldots, x_s$. For such hypergraphs we call f a deriving function for color c′. Similarly, for a color set C and a set of functions $F = \{f_{c'} \mid c' \in C'\}$, $D_F$ is the set of hypergraphs with color set $C \mathbin{\dot\cup} C'$ for which each $f_{c'}$ is a deriving function for the hypergraph’s c′ relation.

Definition 6.4.7. A formula $\varphi(x_1, \ldots, x_s)$ (with respect to vocabulary $\tau_C$) is said to be partition-expressible with auxiliary color set C′ if C′ contains relations of arity at most s, and if there exists an integer m, a set S ⊆ [m], and a set Ψ of $(m, k, C \mathbin{\dot\cup} C')$ density characteristics, such that the following holds. First, the hypergraphs satisfying Ψ have uniform deriving functions for the colors in C′; that is, there exists a set of functions $F = \{f_{c'} \mid c' \in C'\}$, such that a hypergraph with color set $C' \mathbin{\dot\cup} C$ satisfies Ψ if and only if it is in $D_F$. Second, for a hypergraph H satisfying Ψ, the partitions with which it satisfies Ψ are those in which S captures ϕ. A formula is said to be partition-expressible if there exists an auxiliary color set C′ with which it is partition-expressible.

The first requirement for partition-expressibility is of importance to us, as we are

considering hypergraphs in which only the C relations are known, not any auxiliary


relations. With deriving functions, we are able to complete the missing relations using

the existing ones.

Observation 6.4.8. If a formula ϕ (with at least one free variable) is partition-

expressible, then so is its negation, with the same number of partition sets per arity

and the same auxiliary color set: If Ψ is a density characteristic set expressing the

constraint of S capturing ϕ, then Sc = [m] \ S captures ¬ϕ with respect to Ψ, with the

same auxiliary color set and deriving functions.

We would ideally like to establish the partition-expressibility of as large a fragment

of FOL as possible; we come up against a problem, however, already for mere atomic

formulae, before considering connectives or quantifiers: When we capture a relation

with a partition cell index (or a set of indices), we are able to set aside those tuples

satisfying, say, Rc(x1, x2, x3) or Rc(x1, x3, x2); but what about Rc(x1, x1, x3)? Density

constraints do not allow us to distinguish tuples with element repetitions. Bearing

in mind that our objective is merely expressing triangle-freeness, we shall choose to

circumvent the issue and express formulae which are free of such repetition:

Definition 6.4.9. A repeat-free FOL formula is one in which no variable appears twice

within the tuple of arguments for a relation symbol.

Lemma 6.4.10. A repeat-free atomic FOL formula $\varphi(x_1, \ldots, x_s)$ (with respect to $\tau_C$) is partition-expressible by a partition with m = 2 with an auxiliary color set $C' = \{c_\varphi\}$, with $r(c_\varphi) = s$.

Proof. As $\tau_C$ has no function symbols or constants, the repeat-free atomic formulae are all of the form $R_c(x_{j_1}, \ldots, x_{j_{r(c)}})$, for some color c, with the $j_i$’s all distinct. (Note, however, that it may be the case that r(c) < s, i.e. some variables may be unused.) Fix some such formula ϕ. By Lemma 6.4.2, there exists a set of density characteristics Ψ1 (with our choice of m = 2 and vocabulary $\tau_{C \mathbin{\dot\cup} C'}$) constraining $X^{P,r(c)}_1 = H(c)$; there similarly exists Ψ2 constraining $X^{P,s}_1 = H(c_\varphi)$.

Now, consider the set $\Phi_\ell = \left\{ \phi \in \Phi_s \;\middle|\; (j_1, \ldots, j_{r(c)}) \overset{\phi}{\longmapsto} \ell \right\}$. This set of NTDs is satisfied by those s-tuples whose subtuples corresponding to ϕ originate in $X^{P,r(c)}_\ell$. We impose the sum constraints $\sum_{\phi \in \Phi_1} \mu(c_\varphi, \phi) = \rho(s, 1)$ and $\sum_{\phi \in \Phi_2} \mu(c_\varphi, \phi) = 0$ (recall that m = 2, so $\Phi_k = \Phi_1 \cup \Phi_2$). The combination of these implies that $H(c_\varphi)$ contains exactly the set of s-tuples respecting some NTD from $\Phi_1$. We now conjunct our constraints with those of Ψ2 (that is, take the intersection of the density characteristic sets), so that the s-tuples in $X^{P,s}_1$ are exactly those respecting some NTD from $\Phi_1$; finally, we conjunct our constraints with those of Ψ1, so that respecting an NTD in $\Phi_1$ means having $(x_{j_1}, \ldots, x_{j_{r(c)}}) \in H(c)$; hence the s-tuples in $X^{P,s}_1$ are exactly those with $(x_{j_1}, \ldots, x_{j_{r(c)}}) \in H(c)$. Thus $X^{P,s}_1$ captures ϕ exactly (without having imposed any other constraint on other sets $X^{P,s}_j$ for j ≠ 1).


Finally, a deriving function for $c_\varphi$ would be the function which returns 1 if a hypergraph H′ with vertex set $\{x_1, \ldots, x_s\}$ satisfies $(x_{j_1}, \ldots, x_{j_{r(c)}}) \in H'(c)$, and 0 otherwise.

Note. The ‘formal’ number of variables of ϕ is significant: It is a different thing to

express, say, E(x1, x2) as a formula of 2 or of 3 variables. In the former case, in fact,

one does not even need an auxiliary relation, as the same set of 2-tuples constrained to

capture E also captures the atomic formula E(x1, x2).

Lemma 6.4.11. If formulae ϕ1(x1, . . . , xs1) and ϕ2(x1, . . . , xs2) are both partition-

expressible with partitions of size m1 and m2, respectively, and (disjoint) auxiliary color

sets C1 and C2, respectively, then the formulae (ϕ1 ∨ ϕ2), (ϕ1 ∧ ϕ2) are also partition-

expressible, with m = m1 ·m2 and auxiliary color set C′ = C1 ·∪ C2.

To prove this, we will require the ability to refine the constraints inducing any set of

density characteristics with the constraints inducing any other set:

Definition 6.4.12. Let ψ1 and ψ2 be $(m_1, k, C \mathbin{\dot\cup} C_1)$ and $(m_2, k, C \mathbin{\dot\cup} C_2)$ density characteristics (C1 and C2 are disjoint). The composition of the two density characteristics, denoted $\Psi_{\psi_1 \otimes \psi_2}$, is an $(m, k) = (m_1 \cdot m_2, k)$ density characteristic set with respect to the color set $C \mathbin{\dot\cup} C_1 \mathbin{\dot\cup} C_2$. Abusing our earlier definition somewhat, denote $P(s) = P_1 \times P_2$ and think of the partition functions for an (m, k)-partition as though the $m_1 \cdot m_2$ cells have pairs of indices rather than a single index: $(P_1 \times P_2)(s) : \prod_{i=1}^{s} X \to [m_1] \times [m_2]$. Now let $P_1(s) : \prod_{i=1}^{s} X \to [m_1]$ and $P_2(s) : \prod_{i=1}^{s} X \to [m_2]$ be the projections of P(s) onto the first and second coordinates, respectively, i.e. $x \overset{P(s)}{\longmapsto} \left((P_1(s))(x), (P_2(s))(x)\right)$. Now, partition functions P satisfy $\Psi_{\psi_1 \otimes \psi_2}$ if the projected $P_1$ and $P_2$ partition functions satisfy ψ1 and ψ2 respectively. In other words, $\Psi_{\psi_1 \otimes \psi_2}$ contains all density characteristics ψ meeting sum constraints on ρ and µ ‘gathering’ the refined partition cells in an entire cell of $P_1$ or of $P_2$. For ρ, these constraints are:

$$\sum_{j_1 \in [m_1]} \rho_\psi\left(k', (j_1, j_2)\right) = \rho_{\psi_2}\left(k', j_2\right) \qquad \sum_{j_2 \in [m_2]} \rho_\psi\left(k', (j_1, j_2)\right) = \rho_{\psi_1}\left(k', j_1\right)$$

for every $j_2 \in [m_2]$ and $j_1 \in [m_1]$ respectively. For µ values, we need a bit more machinery. Every NTD φ′ in $\Phi_{k'}$ with respect to $m_1 \cdot m_2$ corresponds to two NTDs $\phi_{\phi',1}, \phi_{\phi',2}$ with respect to $m_1$ and $m_2$ respectively, with the same domain as φ′, such that $\phi'(A) = \left(\phi_{\phi',1}(A), \phi_{\phi',2}(A)\right)$, namely the projections of φ′ onto the first and second coordinates. Now, for some $\phi_1$, let $\Phi'_1$ be the set of all NTDs φ′ in $\Phi_{k'}$ with respect to $m_1 \cdot m_2$ for which $\phi_{\phi',1} = \phi_1$, and let $\Phi'_2$ be defined similarly for any $\phi_2$. The sum constraints on $\Psi_{\psi_1 \otimes \psi_2}$ for µ values are:

$$\sum_{\phi' \in \Phi'_1} \mu_\psi\left(c, \phi'\right) = \mu_{\psi_1}(c, \phi_1)$$


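for every $c \in C \mathbin{\dot\cup} C_1$ and $\phi_1$ in $\Phi_{k'}$ with respect to $m_1$, and

$$\sum_{\phi' \in \Phi'_2} \mu_\psi\left(c, \phi'\right) = \mu_{\psi_2}(c, \phi_2)$$

for every $c \in C \mathbin{\dot\cup} C_2$ and $\phi_2$ in $\Phi_{k'}$ with respect to $m_2$.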

Definition 6.4.13. Let Ψ1 and Ψ2 be $(m_1, k, C \mathbin{\dot\cup} C_1)$ and $(m_2, k, C \mathbin{\dot\cup} C_2)$ density characteristic sets (with C1 and C2 disjoint). The composition of the two density characteristic sets, denoted $\Psi_1 \otimes \Psi_2$, is the union of all compositions of pairs of characteristics from Ψ1 and Ψ2, i.e. $\Psi_1 \otimes \Psi_2 = \bigcup_{\psi_1 \in \Psi_1} \bigcup_{\psi_2 \in \Psi_2} \Psi_{\psi_1 \otimes \psi_2}$.

Proof of Lemma 6.4.11. Let Ψ1, Ψ2 be the density characteristic sets expressing the two formulae (and their negations), respectively, with capturing cell index sets $S_1 \subseteq [m_1]$ and $S_2 \subseteq [m_2]$ respectively. Consider the composition $\Psi_1 \otimes \Psi_2$ and some partition functions P with respect to this composition: $x \in \prod_{i=1}^{s} X$ satisfies ϕ1 if and only if $x \in X^{P,s}_{(j_1, j'_2)}$ for some $j_1 \in S_1$ and some $j'_2 \in [m_2]$; $x \in \prod_{i=1}^{s} X$ satisfies ϕ2 if and only if $x \in X^{P,s}_{(j'_1, j_2)}$ for some $j_2 \in S_2$ and some $j'_1 \in [m_1]$. Thus, the composed partition cells with index set $S_1 \times S_2$ capture (ϕ1 ∧ ϕ2); and by De Morgan’s law, the cells with index set $\left((S_1)^c \times (S_2)^c\right)^c$ capture (ϕ1 ∨ ϕ2). The expressibility is maintained, as the auxiliary relations with colors in C1 and C2 are unaffected by the composition (we simply keep the deriving functions for the relations in both auxiliary relation sets).
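The index-set bookkeeping in this proof can be sketched in Python. This is an illustration under simplifying assumptions rather than the thesis's formal construction: partition functions are plain Python callables on tuples, cells are indexed from 0, and all function names are hypothetical.

```python
from itertools import product


def compose(cell1, cell2):
    """Refined partition function: each tuple gets a pair of cell indices."""
    return lambda x: (cell1(x), cell2(x))


def conjunction_cells(S1, S2):
    """Index set S1 x S2, intended to capture the conjunction."""
    return set(product(S1, S2))


def disjunction_cells(S1, S2, m1, m2):
    """Complement of (S1^c x S2^c) inside [m1] x [m2], as per De Morgan."""
    all_pairs = set(product(range(m1), range(m2)))
    comp1 = set(range(m1)) - set(S1)
    comp2 = set(range(m2)) - set(S2)
    return all_pairs - set(product(comp1, comp2))


def captures(cells, cell_of, formula, tuples):
    """True iff the union of the given cells is exactly the set of tuples
    satisfying the formula."""
    return all((cell_of(x) in cells) == bool(formula(x)) for x in tuples)
```

Given two partitions whose cells with index 1 capture formulae ϕ1 and ϕ2 respectively, `captures(conjunction_cells({1}, {1}), compose(cell1, cell2), ...)` holds for the conjunction, and the analogous call with `disjunction_cells` holds for the disjunction.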

Lemma 6.4.14. If a formula $\varphi(x_1, \ldots, x_s)$ is partition-expressible with auxiliary color set C′, and with deriving functions F, then there exist density characteristic sets $\Psi_{\varphi,\exists}$ and $\Psi_{\varphi,\forall}$, which are only satisfied by hypergraphs in $D_F$, and whose satisfying hypergraphs are those whose sub-hypergraph, obtained by considering the C relations only, satisfies

$$\exists x_1 \ldots \exists x_s \left[ \varphi(x_1, \ldots, x_s) \right] \qquad \forall x_1 \ldots \forall x_s \left[ \varphi(x_1, \ldots, x_s) \right]$$

respectively. In other words, at least one of the sub-hypergraph’s s-tuples satisfies ϕ if the graph satisfies $\Psi_{\varphi,\exists}$, and all of the sub-hypergraph’s s-tuples satisfy ϕ if the graph satisfies $\Psi_{\varphi,\forall}$.

Proof. Constrain a set S of partition cells to capture $\varphi(x_1, \ldots, x_s)$; now constrain the set $\bigcup \{X^{P,s}_j \mid j \in S\}$ to be non-empty (for an ∃ constraint) or full (for a ∀ constraint), i.e. constrain either $\sum_{j \in S} \rho(s, j) > 0$ or $\sum_{j \in S} \rho(s, j) = 1$. The density characteristic set Ψ for these constraints is indeed a set satisfied by exactly those pairs of a hypergraph H with auxiliary relations as per the deriving function, with partition functions with which S captures ϕ, and with a tuple of H satisfying ϕ (or with all tuples of H satisfying ϕ in the case of a ∀ constraint). Thus the two sentences are partition-expressible.


6.4.3 The reduction from testing triangles

Let C = {c} with r(c) = 2. The property of the binary relation H(c) being triangle-free (in other words, 3-cycle free) is the property of all hypergraphs which satisfy the following FOL sentence:

$$\varphi_{\text{triangle-free}} = \forall x_1, x_2, x_3 \left[ \neg\left( R_{c,2}(x_1, x_2) \wedge R_{c,2}(x_2, x_3) \wedge R_{c,2}(x_3, x_1) \right) \right]$$

With this fact at hand, we can proceed to prove our lower bound.

Note. The formula above forbids degenerate triangles as well, i.e. ones in which two

or more of the vertices are the same. Regarding these we can either use the fact

that the known lower bound of [AS04a] uses a tri-partite graph with no degenerate

triangles, or better still, note that a degenerate triangle must contain a self-loop, while

non-degenerate triangles do not contain them; thus if a graph is free from having non-

degenerate triangles, then it is 1/n-close to being altogether triangle-free, and a graph

is at least as far from being triangle-free as it is from being non-degenerate-triangle free.

Consequently, a test for degenerate-triangle-freeness in digraphs making q queries is a

valid test for triangle-freeness for n = Ω(1/ε). We may therefore disregard the issue of

degenerate triangles.
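As a small illustration of the sentence and of the remark on degenerate triangles, the following Python sketch checks the sentence by brute force over all vertex triples, repetitions included. The function name is hypothetical, and the check is only meant for tiny digraphs.

```python
from itertools import product


def satisfies_triangle_free_sentence(vertices, edges):
    """Brute-force check of the triangle-freeness sentence: no triple
    (a, b, c), with repetitions allowed, has all of (a, b), (b, c), (c, a)
    as edges."""
    edges = set(edges)
    return not any(
        (a, b) in edges and (b, c) in edges and (c, a) in edges
        for a, b, c in product(vertices, repeat=3)
    )
```

A single self-loop already violates the sentence (take all three variables equal), while a 2-cycle without self-loops does not, in line with the observation that a degenerate triangle must contain a self-loop.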

Proof of Theorem 6.2. By Lemma 6.4.14, ϕtriangle-free is a partition-expressible con-

straint, if we add three auxiliary relations of arity 3 (one for each of the relation symbols

appearing in the sentence), each of which with a deriving function. Let Ψ denote the

density characteristic set guaranteed by the lemma (expressing this constraint using the

above-mentioned auxiliary relations) and consider some hypergraph test for ΠΨ making

q(ε) queries.

Given oracle access to a digraph input with edge set E, we simulate an oracle to a

hypergraph with the color set of Ψ, as follows: Queries to H(c) are answered as queries

to the digraph; when a query to an auxiliary relation is made about a certain tuple,

the oracle queries the subgraph induced by the tuple vertices, and reports whether a

hyperedge of the auxiliary relation exists by applying the appropriate deriving function to

the (labeled) queried subgraph. If the input digraph is triangle-free, then the simulated

hypergraph satisfies ΠΨ; if the input digraph is ε-far from being triangle-free, then the

simulated hypergraph is at least ε-far from ΠΨ, as, in particular, one must alter at least

an ε-fraction of E in order to satisfy ϕtriangle-free.

This oracle meets the requirements of Definition 2.4.1, with f(ε) = ε, h(n) = n and g(n) = 9 (as each query to an auxiliary relation requires at most $3^2$ queries to E). The property of testing triangle-freeness is therefore reducible to testing an (arbitrary) hypergraph partition property; we now apply Lemma 2.4.2: Since, by [AS04a], the triangle-freeness of a digraph cannot be tested using fewer than $(c'/\varepsilon)^{c' \cdot \ln(c'/\varepsilon)}$ queries for some global constant c′, so is the case for hypergraph partition properties (up to a constant factor).
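The oracle simulation used in this reduction can be sketched as follows. The class and method names are hypothetical, the digraph is held as an in-memory edge set standing in for oracle access to E, and a deriving function is modeled as a callable on a dictionary describing the labeled induced subgraph (keys are pairs of tuple positions).

```python
class SimulatedHypergraphOracle:
    """Answers hypergraph queries using only edge queries to a digraph."""

    def __init__(self, digraph_edges, deriving_functions):
        self._edges = set(digraph_edges)            # stands in for oracle access to E
        self._deriving = dict(deriving_functions)   # auxiliary color -> deriving function
        self.edge_queries = 0                       # number of queries made to E so far

    def query_edge(self, u, v):
        """A query to H(c) is answered by one query to the digraph."""
        self.edge_queries += 1
        return (u, v) in self._edges

    def query_auxiliary(self, color, x):
        """Query the subgraph induced by the tuple's vertices (len(x)**2
        ordered pairs, labeled by position), then apply the deriving
        function of the requested auxiliary color."""
        s = len(x)
        induced = {
            (i, j): self.query_edge(x[i], x[j])
            for i in range(s)
            for j in range(s)
        }
        return bool(self._deriving[color](induced))
```

For a 3-tuple, `query_auxiliary` makes 9 edge queries, matching the bound g(n) = 9 above; a deriving function for the atomic formula on the first two variables would simply read off `induced[(0, 1)]`.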


Chapter 7

Open Questions

Some of the research work resulting in this thesis has fully resolved the questions it had

set out to address; other questions were given essential answers with a gap between

what has been established and a potential for future improvement or tightening; and

others have been given only partial answers indicating a way for future research. All

of these, however, bring up additional questions, either regarding their continuation

and extension, or on issues only touched upon which may have independent interest.

Additionally, some questions reflect objects of the author’s research efforts, in the

context of the previous chapters, which have not yielded concrete results as of yet. All such questions have been concentrated in this chapter, mostly grouped by the chapter which inspired them.

7.1 Natural testing and inflatable properties

Naturalization without canonization. Chapter 3 explores natural tests entirely

through the prism of canonical (and more generally, non-adaptive) testing; so much

so that it can be seen as a further study of canonical tests rather than of naturalness

in testing. What can be said regarding the naturality of non-canonical and possibly

adaptive tests? Can such tests be made natural without incurring the double penalty of

canonization followed by naturalization of a canonical test?

‘Natural’ testing with an n-dependent number of queries. What kind of prop-

erties have tests whose number of queries depends on n, but whose decision, in some

sense, does not? For example, we might consider a test which accepts if the query

results satisfy a sentence in some appropriate logic (e.g. First-Order Logic or Monadic

Second-Order Logic, with a vocabulary allowing for unqueried edges).

Note that the above two issues are particularly relevant to the question of natural testing

in the sparse graph model, in which non-adaptivity is costly to impose, and where many

interesting properties investigated thus far actually have n-dependent query complexity.


The “heredity and inflatability gap” for natural testing. Our test natural-

ization procedure requires much stronger approximate heredity and approximate in-

flatability than we can deduce in the reverse direction from the existence of a natural

test. Can the requirements be somehow relaxed, or alternatively, can it be shown that

naturally-testable properties have stronger approximate inflatability and approximate

heredity?

Testing a large graph by testing small subgraphs. Goldreich and Trevisan posed

in [GT05] the question of whether any test for a hereditary property can be replaced

with merely ensuring that a random small induced subgraph (not much larger than the

subgraph queried by the original test) has the property — as was originally claimed in

[GT03, Proposition D.2]. We’ve shown that being hereditary and inflatable, or having

an original test with one-sided error, is a sufficient condition for this to hold. Are these

conditions, or similar ones, also necessary? (Note that this question differs from the

previous one, at least in that such a test need not be natural and the tested subgraph

size might depend on n.)

The benefit of non-natural testing. Some testable properties have a non-constant-factor gap in query complexity between their adaptive and non-adaptive tests; such a gap may also exist between natural and n-dependent tests. As with adaptivity, it will

be bounded by the penalty of naturalizing the test when at all possible. Can one find

specific properties exhibiting such a gap, or ‘non-contrived’ properties for which there is

no gap (similarly to Goldreich and Ron’s work in [GR10] regarding adaptivity in tests)?

A more appropriate notion of inflatability. Our choices for the definition of a

blowup and of (perfect) inflatability are somewhat arbitrary. For example, the property

of being the empty graph is inflatable, but the property of being the complete graph is

not — since the clusters in a blowup are empty rather than, say, supporting a clique.

Also, the property of being H-free, when H itself is a (generalized) blowup of a smaller

graph, is not inflatable. However, these properties are all (s(δ), δ)-inflatable on the

average (even though for the case of subgraph freeness, s(δ) is exceedingly high). Can

one devise a more appropriate, perhaps more relaxed notion of inflatability, which covers

such properties as well, while still allowing for naturalization with the same polynomial

penalty as in Theorem 3.1? We are uncertain whether one can devise a useful notion of

graph blowups under which all such properties would be considered ‘perfectly’ inflatable.

Of course, this is not much of an issue with regard to (s, δ)-inflatability, as at high

orders the edges within the clusters have a negligible effect on the distance.
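The empty-graph versus complete-graph example can be checked directly with a small sketch. It assumes the blowup convention described in the paragraph above (each vertex becomes a cluster of t vertices with no edges inside the cluster, and each edge becomes a complete bipartite graph between clusters); the function names and the adjacency-set representation are hypothetical.

```python
def balanced_blowup(adj, t):
    """Exactly-balanced t-blowup: copy i of vertex v gets index v * t + i.

    Clusters are independent sets (assumed convention); copies of adjacent
    vertices are fully connected to each other.
    """
    return [
        {u * t + j for u in adj[v] for j in range(t)}
        for v in range(len(adj))
        for _ in range(t)
    ]


def is_empty(adj):
    return all(not nbrs for nbrs in adj)


def is_complete(adj):
    return all(len(nbrs) == len(adj) - 1 for nbrs in adj)
```

Blowing up an empty graph yields an empty graph, whereas blowing up a triangle with t = 2 yields six vertices of degree 4, which is not a complete graph because the two copies of each original vertex are not adjacent.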

Testability of inflatable graph properties. Alon and Shapira have shown in

[AS08a] that any hereditary property is testable (albeit with a prohibitively high query

complexity). Is this also the case for properties which are only known to be inflatable?

That is, can one use the closure under blowups, rather than the closure under taking induced

subgraphs, to devise a test? Perhaps Goldreich and Avigad’s recent work in [AG11] can

shed some light on this question.

7.2 Hard properties and complexity hierarchies

Hard functions with a combination of desirable features. The two construc-

tions of hard properties in Chapter 4, namely, in Section 4.2 and Subsection 4.6.1,

immediately beg the question of whether one can combine the desirable features of two

or all three of the constructed properties. Specifically, are there hard graph properties

(requiring Ω(n²) queries) which are

• both monotone and decidable in PTIME?

• monotone, and with a test whose running time is polynomial in n?

Note that one-sided-error testing is a feature of all hard properties, since reading the

entire input constitutes a one-sided test with a minimum number of queries up to a

constant. Also, it seems likely that the use of an NPTIME-decidable small sample

space for constructing a hard-to-test property, as in [GGR98, Proposition 10.2.3.2], can

yield a monotone property decidable in NPTIME at least.
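The remark that reading the entire input is a one-sided test can be made concrete with a short sketch. The names `read_everything_test`, `query` and `decide` are hypothetical, and `decide` stands for any exact decision procedure for the property; nothing here is taken from the constructions of Chapter 4.

```python
import itertools


def read_everything_test(n, query, decide):
    """Trivial test: query every vertex pair, then decide membership exactly.

    query(u, v) -> bool answers whether {u, v} is an edge (u < v);
    decide(n, edges) -> bool is an assumed exact decision procedure.
    Returns the verdict and the number of queries made.
    """
    edges = set()
    queries = 0
    for u, v in itertools.combinations(range(n), 2):
        queries += 1
        if query(u, v):
            edges.add((u, v))
    return decide(n, edges), queries
```

Such a test always accepts inputs that have the property and rejects all others, so its error is one-sided (in fact zero), and it makes n(n-1)/2 queries, which matches an Ω(n²) lower bound up to a constant factor.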

Complexity hierarchies with a combination of desirable features. Assuming

that appropriate hard properties can be constructed, is it also the case that the desirable

features of the three query complexity hierarchy results can be combined? Specifically,

for any reasonable q(n), is there a dense graph property requiring Θ(q(n)) queries which

• is both monotone and one-sided-testable with Θ(q(n)) queries?

• is monotone, and has a test making Θ(q(n)) queries with running time polynomial

in q(n)?

• has a Θ(q(n))-query, poly(q(n))-time test which is also one-sided?

Towards this end, it may be useful to consider whether one can use a permutation-

invariant LDPC code in the initial construction (see Subsection 4.2.2).

Decoupling the dependence on n and ε. For the case of generic functions, Chap-

ter 4 establishes the existence of properties with query complexity c · q(n) + f(ε) where

c is independent of ε. Can this be established in other models? A discussion of this

possibility for the case of bounded-degree graphs (with no answer) can be found in the

conclusion of Section 4.4. What about properties of dense graphs? This question can be

asked, of course, for any combination of the desirable features in the different hierarchy

theorems.

Hard properties that are ‘self-similar’ at different values of n. The constructions

in Chapter 4 make no guarantees regarding the relation between Π_{n1} and Π_{n2} for

n1 ≤ n2, not even if n1 = n2 ± 1. Can hard properties be constructed, and hierarchies

be shown to exist, for properties in which the property is ‘similar’ for different values of

n (e.g., where adding or removing a vertex from a satisfying graph puts it at a relatively

small distance from the property)? This question can be asked also for any combination

of the desirable features in the different hierarchy theorems.

Tighter bounds on the effect of graph blowups on distances. As mentioned

in Chapter 3 and Chapter 4, the distance between graphs does not change overmuch

when applying an exactly-balanced blowup: it does not increase (an easy observation)

and does not drop by a factor higher than 3 (the result of [Pik10, Lemma 14]). An

example by Arie Matsliah shows that the distance can drop to as low as 10/11 of the

original distance. It would be interesting to tighten both the upper and lower bounds on

the potential drop in distance, and to gain a better understanding of this drop.
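In symbols, the bounds just described can be summarized as below. The notation is an assumption made for this summary and is not taken from the thesis: d denotes the normalized distance between two graphs on the same number of vertices, G^{(t)} denotes the exactly-balanced t-blowup of G, and c* denotes the worst-case ratio between the distance after and before the blowup.

```latex
\[
  \tfrac{1}{3}\, d(G, H) \;\le\; d\bigl(G^{(t)}, H^{(t)}\bigr) \;\le\; d(G, H)
  \qquad \text{for all } G, H \text{ and } t \ge 1,
\]
\[
  c^{*} \;=\; \inf_{G, H, t} \frac{d\bigl(G^{(t)}, H^{(t)}\bigr)}{d(G, H)},
  \qquad
  \tfrac{1}{3} \;\le\; c^{*} \;\le\; \tfrac{10}{11}.
\]
```

The lower bound on c* corresponds to [Pik10, Lemma 14] and the upper bound to Matsliah's example; the question raised above is where in this interval c* actually lies.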

The effect of hypergraph blowups on distances. Does Pikhurko’s result regarding

the preservation of distance under blowup carry over to hypergraphs? Also, what about

an extension to hypergraphs of the similar lemma for the case of dispersed graphs and

imperfectly-balanced blowups (Lemma 4.5.13)? One tends to believe that both of these

should hold. Establishing the latter should also allow proving hierarchy theorems for

hypergraphs, or any dense structure.

7.3 Partite and multi-colored dense structures

Subgraph-freeness testing in partite vs general graphs. The state-of-the-art

lower bounds on induced subgraph freeness testing (specifically, triangle testing) are

based on using the arithmetic-progression-free set constructions in a partite graph

(tri-partite for the case of triangles); the fixed number of parts is what allows us to

apply this lower bound to the case of colored bipartite graphs or matrices, which can

simulate a higher number of parts — but not a general graph. Do better lower bounds

hold for testing induced subgraph freeness in general graphs, rather than for testing in

partite ones? Can constructions rely specifically on the “non-partiteness” of a graph? It

seems that this can be answered negatively, so that lower bounds in the general settings

are translatable to partite graph and colored bipartite graph lower bounds (by methods

similar to those used in Chapter 5), and partite graph tests can translate to tests of

general graphs (through the partitioning of general graphs and the testing of partite

subgraphs).

Expressive power of subgraph-freeness with multiple colors. The results pre-

sented in this thesis mean that three colors are more expressive than two in bipartite

graphs, in that properties which are harder to test can be expressed as freeness of

certain induced subgraphs. What about three-vs-two colors in three-dimensional tensors

(i.e. 3-partite 3-uniform hypergraphs)? Perhaps it can be shown that tensors can be

carved up into test-identifiable regions, so as to simulate additional colors (in which case

the query complexity of testing freeness of an arbitrary family of subtensors will not

be higher when allowing multiple cell colors). Also, is there additional such expressive

power in allowing more than three different colors in bipartite graphs?

7.4 Hypergraph partition properties

In studying tuple partition properties, the initial hope was to obtain a proof that a

wider class of tuple partition properties is pseudo-testable — rich enough to essentially

capture the property of a hypergraph having a certain regular partition. To express

the constraints necessary for representing such a regular partition, it is necessary to

cross-constrain elements and subtuples of a single tuple (see discussion in Section 6.1).

Attempts to establish the pseudo-testability of such properties have not met with success

thus far; had they succeeded, a test for a regular partition would be at hand, due to the

following lemma, which we present informally and without proof here:

Lemma 7.4.1. Consider the property of a (uniform) hypergraph having an ε-regular

partition with a fixed maximum number of partition sets m′. If a hypergraph has a

partition with m′ sets, whose densities (with respect to the appropriately expressive

definitions of partition densities) are close to those densities corresponding to a regular

partition, then the hypergraph is f(ε)-close to having a g(ε)-regular partition with m′ sets.

Efficiently testing for regular partitions in hypergraphs. Is the specific prop-

erty of a hypergraph having a regular partition, with a fixed maximum number of

partition sets, testable with poly(1/ε) queries? If not, what lower and upper bounds

can one establish for the query complexity of this property? We note that the super-

polynomial lower bound, established for testing a partition property even with limited

expressibility, does not necessarily apply to this particular property.

Pseudo-testing vs. actual testing of rich-constraint partition properties.

With the limited expressibility imposed in this work, we’ve shown that testing a

hypergraph partition property is harder than pseudo-testing it. Does this hold for

rich-constraint partition properties? That is, can one show that pseudo-testing is, say,

polynomial in 1/ε? Or more generally, establish that the query complexity of pseudo-testing

is q1(ε, n) and find a rich-constraint partition property requiring q2(ε, n) queries

with q1 = o(q2)?

Possible hardness of non-rich partition properties. We’ve established that hy-

pergraph partition properties, even without ‘rich’ constraints, can capture a property

with query complexity super-polynomial in 1/ε. But this construction was not overly

complex, and only utilized a maximum arity of 3. Can a more involved construction

be shown to require a super-polynomial number of queries in 1/ε, significantly higher

than the bound due to the reduction from triangle-freeness testing? We have not even

ruled out the existence of partition properties of this kind whose query complexity must

depend on n. Such a possibility seems unlikely, as a small random subgraph should

exhibit about the same partition as the large graph, and such properties are clearly

inflatable; but this is not much more than intuition.

7.5 Expanding the testing model via ‘plugging’ testable relations and functions

Consider questions of the following type: “Let E′ be all vertex tuples of a certain arity,

which satisfy a certain condition. Now, given a set of tuples, what fraction of it intersects

E′?” or “what is its distance from E′?”. One can think of this as a “formula-type

property” rather than a “sentence-type property” as in formal logic. Now, suppose one

has an oracle which answers questions of this type with certain query complexity to

the input structure. It would be interesting to consider property tests which use such

oracles as subroutines; in the case of such a subroutine giving an “is in E′ / is not in

E′” answer, one could think of the test having temporarily or locally added a new edge

relation to the structure (in the same way that, in formal logic, a quantified relation

variable is used just as relation symbols from the vocabulary would be used).

In fact, this is done implicitly by many tests in the literature and some in this

work, e.g. when obtaining an approximate clustering of vertices using a signature

(Algorithm 4.3 in Chapter 4). One could think of such a test as constructing or learning

a probably-approximately-correct partition function, and then applying another test to

a structure which has both an edge relation and a partition function. If the construction

is valid and the richer-vocabulary structure test is valid, then so is the test of the

original structure. This conceptual approach links different testing models in a more

general way than mere reductions (Definition 2.4.1), and its study may yield some

“meta-results” regarding testing. Thus when given a property whose query complexity is

to be ascertained, one could approach the problem by augmenting the input structure

with “testable relations” or “testable functions”, and only need to consider the modified

problem as though these relations or functions were provided perfectly rather than

through a test.

7.6 Ordered structures

The dense structure testing models studied in this work all share the requirement that

properties be invariant to permutations of the vertices — that is, that properties not

relate to any ordering of the vertices. (An alternative definition of a test is proposed

in this work — Definition 2.1.4 — explicitly adopting the implications of this fact.)

Testing models in which vertices are ordered, with no permutation possible, have not

been the object of much study thus far.

Efficient testing of induced (ordered) submatrices. [AFN07] show that it is easy

to test a matrix for being free of a fixed set of small submatrices and their permutations.

What about a set of forbidden submatrices not closed under permutations? The answer to

this question regards testing matrices without ignoring the coordinate order. As part of

the research work leading to this thesis, efforts were made to apply the upper bounds

of [AFN07] in this context, using a ‘conditional regularity’ lemma for forbidden small

submatrices (see Section 5.1); unfortunately, these efforts have not met with success.

On the other hand, there seems to be no indication against the unordered-case result

carrying to ordered matrices.

Applicability of unordered results to the ordered setting. Generalizing the

previous question, which results carry over from the unordered to the ordered-vertex

setting? Some can be seen to easily carry over, such as lower bounds on testing

forbidden subgraph freeness — using closure under permutations and a reduction to the

unordered case. What about results such as regularity-based (and other) upper bounds?

Canonization, adaptivity gaps, etc.? Also, what kind of upper and lower bounds can

one obtain in the ordered setting for specific properties with known n-dependent query

complexity in the unordered setting?

Bibliography

[ABI86] Noga Alon, Laszlo Babai, and Alon Itai. A fast and simple random-

ized parallel algorithm for the maximal independent set problem.

Journal of Algorithms, 7(4):567–583, 1986.

[ADL+94] Noga Alon, Richard A. Duke, Hanno Lefmann, Vojtech Rodl, and

Raphael Yuster. The algorithmic aspects of the regularity lemma.

Journal of Algorithms, 16:80–109, 1994.

[AFKS00] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy.

Efficient testing of large graphs. Combinatorica, 20:451–476, 2000.

[AFN07] Noga Alon, Eldar Fischer, and Ilan Newman. Efficient testing of

bipartite graphs for forbidden induced subgraphs. SIAM Journal on

Computing, 37(3):959–976, 2007.

[AFNS09] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A

combinatorial characterization of the testable graph properties: It’s

all about regularity. SIAM Journal on Computing, 39(1):143–167,

2009. An earlier version appeared in the proceedings of the 38th

STOC, 2006.

[AG11] Lidor Avigad and Oded Goldreich. Testing graph blow-up. In Oded

Goldreich, editor, Studies in Complexity and Cryptography, volume

6650 of Lecture Notes in Computer Science, pages 156–172. Springer,

2011.

[AK99] Noga Alon and Michael Krivelevich. Testing k-colorability. SIAM

Journal on Discrete Mathematics, 15:211–227, 1999.

[AKKR08] Noga Alon, Tali Kaufman, Michael Krivelevich, and Dana Ron.

Testing triangle-freeness in general graphs. SIAM Journal on Discrete

Mathematics, 22:786–819, 2008.

[Alo99] Noga Alon. Private communication, 1999.

[Alo02] Noga Alon. Testing subgraphs in large graphs. Random Structures

and Algorithms, 21(3-4):359–370, 2002.

[AS04a] Noga Alon and Asaf Shapira. A characterization of easily testable

induced subgraphs. In Proceedings of the 15th SODA, pages 942–951,

2004.

[AS04b] Noga Alon and Asaf Shapira. Testing subgraphs in directed graphs.

Journal of Computer and System Sciences, 69(3):354–382, 2004.

[AS05] Noga Alon and Asaf Shapira. Every monotone graph property is

testable. In Proceedings of the 37th STOC, pages 128–137, New

York, NY, USA, 2005. ACM Press.

[AS06] Noga Alon and Asaf Shapira. A characterization of easily testable

induced subgraphs. Combinatorics, Probability and Computing,

15(6):791–805, 2006. An earlier version appeared in the proceedings

of the 15th SODA, 2004.

[AS08a] Noga Alon and Asaf Shapira. A characterization of the (natural)

graph properties testable with one-sided error. SIAM Journal on

Computing, 37(6):1703–1727, 2008. An earlier version appeared in

the proceedings of the 46th FOCS, 2005.

[AS08b] Noga Alon and Asaf Shapira. A separation theorem in property

testing. Combinatorica, pages 261–281, 2008.

[ASE92] Noga Alon, Joel H. Spencer, and Paul Erdos. The Probabilistic

Method. Wiley-Interscience Series in Discrete Mathematics and

Optimization. John Wiley and Sons, Inc., 1992.

[BCL+06] Christian Borgs, Jennifer Chayes, Laszlo Lovasz, Vera T. Sos, Balazs

Szegedy, and Katalin Vesztergombi. Graph limits and parameter

testing. In Proceedings of the 38th STOC, pages 261–270, New York,

NY, USA, 2006. ACM Press.

[Beh46] Felix A. Behrend. On sets of integers which contain no three terms

in arithmetical progression. Proceedings of the National Academy of

Sciences of the USA, 32:331–332, 1946.

[BEKKR10] Ido Ben-Eliezer, Tali Kaufman, Michael Krivelevich, and Dana Ron.

Comparing the strength of query types in property testing: The case

of testing k-colorability. In Goldreich [Gol10], pages 253–259.

[BLR90] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-tes-

ting/correcting with applications to numerical problems. In Proceed-

ings of the 22nd STOC, pages 73–83, New York, NY, USA, 1990.

ACM Press.

[BOT02] Andrej Bogdanov, Kenji Obata, and Luca Trevisan. A lower bound

for testing 3-colorability in bounded-degree graphs. In Proceedings

of the 43rd FOCS, 2002.

[BT04] Andrej Bogdanov and Luca Trevisan. Lower bounds for testing

bipartiteness in dense graphs. Proceedings of CCC 2004, pages

75–81, 2004.

[Elk11] Michael Elkin. An improved construction of progression-free sets.

Israel Journal of Mathematics, 184:93–128, 2011.

[Fis04] Eldar Fischer. The art of uninformed decisions: A primer to property

testing. In G. Paun, G. Rozenberg, and A. Salomaa, editors, Current

Trends in Theoretical Computer Science: The Challenge of the New

Century, volume 1, pages 229–264. World Scientific Publishing, 2004.

[FM06] Eldar Fischer and Arie Matsliah. Testing graph isomorphism. In

Proceedings of the 17th SODA, pages 299–308, New York, NY, USA,

2006. ACM Press.

[FMS07] Eldar Fischer, Arie Matsliah, and Asaf Shapira. Approximate hy-

pergraph partitioning and applications. In Proceedings of the 48th

FOCS, pages 579–589, 2007.

[FN01] Eldar Fischer and Ilan Newman. Testing of matrix properties. In

Proceedings of the 33rd STOC, pages 286–295, New York, NY, USA,

2001. ACM Press.

[FN07a] Eldar Fischer and Ilan Newman. Testing of matrix-poset properties.

Combinatorica, 27(3):293–327, 2007.

[FN07b] Eldar Fischer and Ilan Newman. Testing versus estimation of graph

properties. SIAM Journal on Computing, 37(2):482–501, 2007.

[Fox11] Jacob Fox. A new proof of the graph removal lemma. Annals of

Mathematics, 174(1):561–579, 2011. available at the following URL:

http://math.mit.edu/~fox/paper-removal.pdf.

[FR07] Eldar Fischer and Eyal Rozenberg. Lower bounds for testing for-

bidden induced substructures in bipartite-graph-like combinatorial

objects. In Proceedings of RANDOM 2007, pages 464–478, Berlin,

Heidelberg, 2007. Springer-Verlag.

[FR11] Eldar Fischer and Eyal Rozenberg. Inflatable graph properties

and natural property tests. In Proceedings of RANDOM 2011,

pages 542–554, Berlin, Heidelberg, 2011. Springer-Verlag. Full

version available at http://www.cs.technion.ac.il/~eyalroz/

publications/FR2011.pdf.

[GGR98] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing

and its connection to learning and approximation. Journal of the

ACM, 45(4):653–750, 1998.

[GKNR10] Oded Goldreich, Michael Krivelevich, Ilan Newman, and Eyal Rozen-

berg. Hierarchy theorems for property testing. In Goldreich [Gol10],

pages 289–294.

[Gol10] Oded Goldreich, editor. Property Testing - Current Research and

Surveys (outgrow of a workshop at the Institute for Computer Sci-

ence (ITCS) at Tsinghua University, January 2010), volume 6390 of

Lecture Notes in Computer Science. Springer, 2010.

[Gow07] William Timothy Gowers. Hypergraph regularity and the multidimensional

Szemerédi theorem. Annals of Mathematics, 166(3):897–946,

2007.

[GR99] Oded Goldreich and Dana Ron. A sublinear bipartitness tester for

bounded degree graphs. Combinatorica, 19(3):335–373, 1999.

[GR02] Oded Goldreich and Dana Ron. Property testing in bounded de-

gree graphs. Algorithmica, 32(2):302–343, 2002. An earlier version

appeared in the proceedings of the 29th STOC, 1999.

[GR07] Mira Gonen and Dana Ron. On the benefits of adaptivity in property

testing of dense graphs. In Proceedings of RANDOM 2007, pages

525–539, Berlin, Heidelberg, 2007. Springer-Verlag.

[GR09] Oded Goldreich and Dana Ron. On proximity oblivious testing. In

Proceedings of the 41st STOC, pages 141–150, New York, NY, USA,

2009. ACM.

[GR10] Oded Goldreich and Dana Ron. Algorithmic aspects of property

testing in the dense graphs model. In Goldreich [Gol10], pages

295–305.

[GT03] Oded Goldreich and Luca Trevisan. Three theorems regarding testing

graph properties. Random Structures and Algorithms, 23(1):23–57,

2003.

[GT05] Oded Goldreich and Luca Trevisan. Errata for [GT03], 2005. available

at the following URL:

http://www.wisdom.weizmann.ac.il/~oded/PS/tt-err.ps.

[GW10] Ben Green and Julia Wolf. A note on Elkin’s improvement of Behrend’s

construction. In David Chudnovsky and Gregory Chudnovsky, editors,

Additive Number Theory, pages 141–144. Springer, 2010.

[Hoe63] Wassily Hoeffding. Probability inequalities for sums of bounded

random variables. Journal of the American Statistical Association,

58:13–30, 1963.

[Ish09] Yoshiyasu Ishigami. A simple regularization of hypergraphs, 2009.

available from the following URL:

http://arxiv.org/abs/math/0612838.

[KKR04] Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for

testing bipartiteness in general graphs. SIAM Journal on Computing,

33(6):1441–1483, June 2004.

[NRS06] Brendan Nagle, Vojtech Rodl, and Mathias Schacht. The counting

lemma for regular k-uniform hypergraphs. Random Structures &

Algorithms, 28(2):113–179, 2006.

[Obr11] Kevin O’Bryant. Sets of integers that do not contain long arithmetic

progressions. The Electronic Journal of Combinatorics, 18:59–73,

2011. available at the following URL:

http://www.emis.ams.org/journals/EJC/Volume_18/PDF/

v18i1p59.pdf.

[Pik10] Oleg Pikhurko. An analytic approach to stability. Discrete Mathe-

matics, 310(21):2951–2964, 2010.

[PRR03] Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Testing member-

ship in parenthesis languages. Random Structures and Algorithms,

22(1):98–138, 2003.

[Ron01] Dana Ron. Property testing (a tutorial). In Sanguthevar Rajasekaran,

Panos M. Pardalos, John H. Reif, and Jose D. P. Rolim, editors,

Handbook of Randomized Computing. Kluwer Press, 2001.

[Ron10] Dana Ron. Algorithmic and analysis techniques in property testing.

Foundations and Trends in Theoretical Computer Science, 5(2):73–

205, 2010.

[RS96] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of

polynomials with applications to program testing. SIAM Journal on

Computing, 25(2):252–271, 1996.

[Sha04] Ronen Shaltiel. Recent developments in explicit constructions of

extractors. In G. Paun, G. Rozenberg, and A. Salomaa, editors,

Current Trends in Theoretical Computer Science: The Challenge

of the New Century, volume 1, pages 229–264. World Scientific

Publishing, 2004.

[Sha06] Asaf Shapira. Graph Property Testing and Related Problems. PhD

thesis, Tel Aviv University, 2006.

[Sze78] Endre Szemeredi. Regular partitions of graphs. In J. C. Bermond,

J. C. Fournier, M. Las Vergnas, and D. Sotteau, editors, Proc. Colloque

Inter. CNRS, pages 399–401, 1978.

[Yao77] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified

measure of complexity (extended abstract). In 18th FOCS, pages

222–227, 1977.

Pseudo-testing of tuple partition properties in hypergraphs

Our last two results concern the question of whether one can extend the known classes of properties of hypergraphs, with multiple edge relations, which are efficiently testable (with a number of queries polynomial in 1/ε). Specifically, we consider a generalization of graph partition properties: properties satisfied by a graph when there is a partition of its vertices that satisfies constraints both on the sizes of the sets and on the edge densities between different sets. Properties in this class are known to be efficiently testable for graphs, and earlier work has shown that a basic generalization of such properties to hypergraphs is efficiently testable.

We examine a more expressive generalization, in which the partition is not only of the vertices but also of vertex tuples (ordered pairs, ordered triples, and so on up to ordered k-tuples). We show that the properties in this class are not, in general, efficiently testable, even when the class is defined in a way that limits its expressiveness (for example, in a way that apparently does not allow such a property to express a hypergraph having a regular partition, in the sense of the hypergraph regularity lemma). On the other hand, we show that such properties admit an efficient ‘pseudo-test’; that is, one can efficiently distinguish whether the tested hypergraph has partitions that approximately satisfy the density constraints. However, this ‘pseudo-test’, which in the case of graphs suffices for carrying out an actual test, does not suffice in the general case, as shown by the lower bound we present on the query complexity of efficient testing.

Query complexity hierarchies for dense graphs and other structures

From natural tests we turn to the study of tests which depend significantly on the size of their input graph. Do such properties exist for every arbitrary function of dependence on n, the size of the graph? We answer this question in the affirmative, by proving the existence of hierarchies of classes of properties according to their query complexity, both for the dense model and for the sparse model of graph property testing. Roughly speaking, we prove that for every reasonable function q(n) there exists a graph property which cannot be tested with o(q(n)) queries (significantly fewer than q(n)), but can be tested with O(q(n)) queries.

In the sparse model, we establish a hierarchy theorem by means of a non-contrived property which is easy to state for every q(n): the property of a graph being 3-colorable (that is, colorable so that no edge connects vertices of the same color) and having connected components of size at most q(n). This property has a test making O(q(n)) queries, which has one-sided error.

In the dense model, we in fact prove three different hierarchy theorems, each of which adds interesting features of the property or of the test:

• A hierarchy of query complexity classes for properties which are deterministically decidable in time polynomial in the input size (as languages of graphs), and whose testing algorithm runs in time polynomial in the number of queries.

• A hierarchy of query complexity classes for monotone properties (although not necessarily ones which are deterministically decidable in polynomial time).

• A hierarchy of query complexity classes for properties which are deterministically decidable in time polynomial in the input size (as languages of graphs), and whose test has one-sided error.

Lower bounds on testing partite structures

A partite structure is a structure whose vertices are divided into several sets, called the ‘sides’ (parts) of the graph. Thus, for example, a bipartite graph has two sets of vertices, and every edge runs between a vertex in one set and a vertex in the other; there are no edges ‘within the same side’.

We move from focusing on properties of graphs to focusing on properties of dense partite structures. We present lower bounds for testing bipartite graphs with multi-colored edges, as well as for k-partite, k-uniform hypergraphs (which can be thought of as matrices and tensors over fixed finite fields, disregarding the relative order of the coordinates of the matrix or tensor in the different dimensions). An earlier positive result, based on a principle of ‘conditional regularity’ in bipartite graphs, established that one can test for the absence of forbidden small substructures in bipartite graphs with a number of queries polynomial in the parameter 1/ε. We prove that this result does not hold when edges may have several colors, or when the dimension is increased by moving to k-partite, k-uniform hypergraphs for k ≥ 3; this is done by means of lower bounds which are super-polynomial in the parameter 1/ε. These bounds resolve an open question of Alon, Fischer and Newman.

גודל. בכל גרפים עבור זהה חישוב

Just as one can distinguish features of testers, one can also define features of properties and of classes of properties. The most natural features concern, of course, query complexity: for some function q(n, ε), we are interested in the properties that have a test making O(q(n, ε)) queries (that is, a number of queries asymptotically bounded by a constant times q). Another feature of properties that has received research attention is heredity: a property of dense structures is hereditary if every substructure of a structure in the property, induced by a subset of the vertices of the larger structure, itself satisfies the property.
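
The definition of heredity can be illustrated by a small Python sketch, offered here as an aid and not as part of the thesis. It assumes graphs are given as symmetric Boolean adjacency matrices and exhaustively checks all induced subgraphs of one small graph; the function names are hypothetical, and triangle-freeness is used only as a familiar example of a hereditary property.

```python
from itertools import combinations


def cycle_graph(n):
    """Adjacency matrix of the cycle on n vertices."""
    adj = [[False] * n for _ in range(n)]
    for v in range(n):
        adj[v][(v + 1) % n] = adj[(v + 1) % n][v] = True
    return adj


def induced_subgraph(adj, vertices):
    """Adjacency matrix of the subgraph induced by the given vertices."""
    vs = list(vertices)
    return [[adj[u][v] for v in vs] for u in vs]


def is_triangle_free(adj):
    """True if no three vertices are pairwise adjacent."""
    n = len(adj)
    return not any(
        adj[a][b] and adj[b][c] and adj[a][c]
        for a, b, c in combinations(range(n), 3)
    )


def all_induced_subgraphs_satisfy(adj, prop):
    """Check `prop` on every induced subgraph of `adj` (exponential time)."""
    n = len(adj)
    return all(
        prop(induced_subgraph(adj, subset))
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    )
```

On the 5-cycle, every induced subgraph is again triangle-free, as heredity requires. A property such as "contains at least one edge" fails the same check, since the empty induced subgraph has no edges.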

This thesis aims to deepen our understanding of the dense model of property testing, both for testing properties of specific kinds of structures within it, such as simple graphs and multi-colored bipartite graphs, and in general. It does so, among other means, by developing notions of inflation (blow-up) of dense structures; by developing new and useful features of properties in the dense model; and by applying all of these to, and relating them with, known notions and features in property testing.

Most of the results in this thesis, surveyed below, appeared in papers published by the author and his research collaborators in various conferences and journals during the author's doctoral research. The most recent versions of these publications are [FR07], [GKNR10] and [FR11].

Inflatable properties and natural property tests

We first focus on dense graphs and study their natural tests. We introduce a new feature of properties, that of being inflatable: a property is inflatable if it is closed under the (balanced) blow-up operation on graphs. We show that the query complexity of natural tests for a property is closely related to the degree to which the property is approximately hereditary and approximately inflatable. In particular, we show that tests for properties that are almost hereditary and almost inflatable can be made natural without paying a very high price in the number of queries; this builds on the known "canonicalization" technique for tests of dense graphs. In the opposite direction, we show that properties that have a natural test are approximately hereditary and approximately inflatable, to degrees that depend on the number of queries made by the test.
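
The abstract does not spell out the blow-up operation. The sketch below assumes the standard definition of a balanced blow-up, in which every vertex is replaced by an independent set of t vertices and every edge by a complete bipartite graph between the corresponding sets. The function names and the adjacency-matrix representation are illustrative assumptions, and triangle-freeness serves only as an example of a property that this operation preserves.

```python
from itertools import combinations


def balanced_blowup(adj, t):
    """Blow up each vertex of a simple graph into t copies.

    Copy number u of the result belongs to the cluster of vertex u // t.
    Copies within one cluster are never adjacent, because a simple graph
    has adj[v][v] == False.
    """
    n = len(adj)
    return [[adj[u // t][v // t] for v in range(n * t)] for u in range(n * t)]


def is_triangle_free(adj):
    """True if no three vertices are pairwise adjacent."""
    n = len(adj)
    return not any(
        adj[a][b] and adj[b][c] and adj[a][c]
        for a, b, c in combinations(range(n), 3)
    )


def cycle_graph(n):
    """Adjacency matrix of the cycle on n vertices."""
    adj = [[False] * n for _ in range(n)]
    for v in range(n):
        adj[v][(v + 1) % n] = adj[(v + 1) % n][v] = True
    return adj
```

Blowing up the 5-cycle by a factor of 2 yields a 10-vertex graph that is still triangle-free, since any triangle in the blow-up would project onto three pairwise adjacent, and hence distinct, vertices of the original graph.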

Using the "naturalization" technique that we develop, which turns tests into natural ones, we partially restore the validity of a claim made by Goldreich and Trevisan regarding the testing of hereditary properties, and use it to generalize the relation between bounds on one-sided and two-sided testing of triangle-freeness in graphs. We also give a simple explicit proof of a slight strengthening of the best known lower bound for such a test.

We also examine the relations between the notion of inflatability and other features of properties and of property tests in the dense model that have long been discussed in the literature, such as one-sided error, heredity, and proximity-oblivious testing.

Finally, we generalize the canonicalization and naturalization techniques to general dense structures other than graphs, and use them to establish our claims for the dense model as a whole rather than for graphs alone.


Abstract

This thesis is concerned with the probabilistic testing of properties of combinatorial structures. In the study of computational complexity, much of the research addresses questions of the form: "How much of a given kind of computational resource is needed to solve a computational problem?" Property-testing computations rely on asking as few queries as possible about some combinatorial structure, without reading all of it directly; based on the answers to the queries, the testing algorithm must decide whether the structure satisfies some property or is far from satisfying it. (If the structure is not far from satisfying the property, a small number of queries may fail to reveal the difference between it and a structure that does satisfy the property.) The decision must be correct with high probability.

Research in property testing focuses on the number of queries as the computational resource, and studies the query complexity of specific properties and of classes of properties, as a function of the size of the tested structures (denoted n) and of the distance parameter within which the test is allowed to err (denoted ε).

There are various models of property testing. The models differ mainly in the following questions: which kinds of structures are tested; which queries may be asked about the tested structure; and what it means for a structure to be "far", that is, what the distance function (metric) between possible input structures is.

The model that has been the most common subject of study in property testing is called the "dense model", or originally the "dense model for graph testing". It was indeed first defined for undirected, simple graphs (without self-loops), in which any of the possible edges may appear, from empty graphs to complete graphs with (n choose 2) edges; accordingly, a query in this model is "is there an edge between the i-th vertex and the j-th vertex of the graph?". The distance between two graphs is the number of edges that must be added to or removed from one of them in order to obtain the other; this choice of distance function earns the model the name "dense", since a graph with a sub-quadratic number of edges, o(n^2) (such as a tree), is arbitrarily close (for sufficiently large values of n) to the empty graph.
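
As a minimal sketch of these definitions (not part of the thesis itself), the following Python fragment models a graph as a symmetric Boolean adjacency matrix, answers the edge query, and counts the edge modifications that separate two graphs on the same vertex set. The function names, the matrix representation and the normalization of the distance by n^2 are illustrative assumptions.

```python
from itertools import combinations


def edge_query(adj, i, j):
    """Dense-model query: is there an edge between vertex i and vertex j?"""
    return adj[i][j]


def edit_distance(adj_a, adj_b):
    """Number of vertex pairs on which the two graphs disagree, i.e. the
    number of edges to add or remove to turn one graph into the other."""
    n = len(adj_a)
    return sum(
        1 for i, j in combinations(range(n), 2) if adj_a[i][j] != adj_b[i][j]
    )


def relative_distance(adj_a, adj_b):
    """Edit distance divided by n^2 (an assumed normalization)."""
    n = len(adj_a)
    return edit_distance(adj_a, adj_b) / (n * n)


def empty_graph(n):
    """Adjacency matrix of the graph on n vertices with no edges."""
    return [[False] * n for _ in range(n)]


def path_graph(n):
    """Adjacency matrix of a path on n vertices: a tree with n - 1 edges."""
    adj = empty_graph(n)
    for v in range(n - 1):
        adj[v][v + 1] = adj[v + 1][v] = True
    return adj
```

A path on n vertices differs from the empty graph in exactly n - 1 edges, so under this normalization its relative distance from the empty graph, (n - 1) / n^2, tends to zero as n grows. This is the sense in which sparse graphs are indistinguishable from the empty graph in the dense model.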

The dense model is also relevant to many other structures: directed graphs or graphs with colored edges, hypergraphs, and matrices and tensors (that is, two-dimensional and multi-dimensional arrays).

Even within a single model of property testing, testing algorithms can be classified more finely by various features that bear on their computational power and on the query complexity they require in different settings. For example, a property test may have one-sided error: such a test never rejects the tested structure when it satisfies the property, and can reject only structures that do not satisfy it. Another feature of testers is naturalness: a property test is called natural if its operation is entirely independent of the size of the tested graph, that is, the testing algorithm asks the same number of queries and performs


The research was carried out under the supervision of Professor Eldar Fischer, in the Faculty of Computer Science.

Most of the results in this thesis were published as papers by the author and his research collaborators in conferences and journals during the author's doctoral research. Their most recent versions are:

Eldar Fischer and Eyal Rozenberg. Lower bounds for testing forbidden induced substructures in bipartite-graph-like combinatorial objects. In Proceedings of RANDOM 2007, pages 464–478. Springer, 2007.

Eldar Fischer and Eyal Rozenberg. Inflatable graph properties and natural property tests. In Proceedings of RANDOM 2011, pages 542–554, Berlin, Heidelberg, 2011. Springer-Verlag.

Oded Goldreich, Michael Krivelevich, Ilan Newman, and Eyal Rozenberg. Hierarchy theorems for property testing. In Oded Goldreich, editor, Property Testing, volume 6390 of Lecture Notes in Computer Science, pages 289–294. Springer, 2010.

Acknowledgements

I wish to thank my advisor, Prof. Eldar Fischer, for his help and guidance, and for all the times he found gaps in shaky arguments in a way that eventually led to valid proofs.

I am grateful to Uri Avinoam, a dear friend with whom I had the pleasure of serving in public office. I also wish to thank Gal Tamir, the third member of the Graduate Students Organization committee, who helped me not to take things too heavily and came to my aid when I needed his support. I am thankful as well to the others with whom I served at the Technion, for shorter or longer periods: Roy Engelberg, Mark Ishai, Moti Ronen, Ida Sivan, Daniel Vainsencher, Nadav Shragai, Yonatan Braude, Yair Farber and others. In this context I also thank the staff of the organization over the years, and in particular Avi Kaufman, Efrat Valansi, Neta Dovrin and Tal Levy. Outside the Technion, I owe thanks to friendly activists from other universities, such as Gonen Hacohen of the University of Haifa, Ohad Karni of Tel Aviv University, Sion Koren of Ben-Gurion University and my good friend Matan Prezma of the Hebrew University; and also to Daniel Mishori and Nitzan Hadas, who taught me a great deal.

I thank Prof. Oded Goldreich for his inspiration and his advice, professional and otherwise, and for lending an attentive ear. I am indebted to Oded, as well as to Professors Michael Krivelevich and Ilan Newman, as partners in developing some of the results in this work. I also wish to thank Ronitt Rubinfeld, Arie Matsliah, Dana Ron and Yoav Tzur, who helped with that part. I thank Arie in particular, not only for finding a useful counterexample, but also for the occasional armchair conversation in his office.

I also wish to thank Prof. Johann Makowsky, who was almost my advisor. Janos's wealth of culture and knowledge and his breadth of horizons have been an inspiration to me, and his door was (almost) always open for an interesting discussion, even if you had not come to talk about a research idea.

Thanks also to the many "residents" of the faculty over these years: my office mates, Firas Swidan, later Adi Mano and finally Yossi Atia; Raviv Elazar, the riddle man; Uri Itai; Yossi Weinstein, the quiet activist; Tigran, the maintenance superman; and all the rest.

Finally, I wish to thank Anat Grinstein and Iris Bar for their friendship throughout these years; my brother Yigal; and, last but not least, my father Yaakov and my mother Veronica, without whose loving support I could not have come this far.

I am sincerely grateful to the Technion for funding this research.


Lower bounds and structural results in property testing of dense combinatorial structures

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Eyal Rozenberg

Submitted to the Senate of the Technion - Israel Institute of Technology

Tevet 5772 Haifa January 2012


Lower bounds and structural results in property testing of dense combinatorial structures

Eyal Rozenberg
