
FURSTENBERG’S ERGODIC THEORY PROOF OF SZEMEREDI’S THEOREM

ZIJIAN WANG

Abstract. We introduce the basics of ergodic theory and present Furstenberg’s proof of Szemeredi’s theorem.

Contents

1. Introduction
2. A brief introduction to ergodic theory
2.1. Ergodicity and weak mixing
2.2. Compact systems
2.3. Factor and extension
2.4. Conditional measures
2.5. Weak mixing and compactness for extensions
2.6. The structure theorem
3. Furstenberg’s proof of Szemeredi’s theorem
3.1. General Strategy
3.2. Szemeredi’s theorem
3.3. Correspondence
3.4. Two fundamental systems
3.5. Extension principles
3.6. Conclusion
Acknowledgments
4. Bibliography
References

1. Introduction

The statement of Szemeredi’s theorem is very simple.

Theorem 1.1 (Szemeredi). A subset of the integers with positive upper Banach density contains arbitrarily long arithmetic progressions.
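Szemeredi’s theorem concerns infinite sets, so no finite computation can confirm it, but a small search makes the conclusion concrete on a finite window. The Python sketch below is only an illustration: the name `longest_ap` is our own, and it uses plain brute force rather than anything from the proof.

```python
from typing import Iterable, List


def longest_ap(values: Iterable[int]) -> List[int]:
    """Return a longest arithmetic progression (with positive difference) in a finite set.

    Brute force: for every pair a < b in the set, keep extending a, b, 2b - a, ...
    while the next term is still in the set, and remember the longest run found.
    """
    s = sorted(set(values))
    members = set(s)
    best: List[int] = s[:1]  # a single element counts as a trivial progression
    for i, a in enumerate(s):
        for b in s[i + 1:]:
            d = b - a
            prog = [a, b]
            while prog[-1] + d in members:
                prog.append(prog[-1] + d)
            if len(prog) > len(best):
                best = prog
    return best
```

For example, the even numbers from 0 to 20 form a single progression of length 11, while the powers of two `[1, 2, 4, 8, 16]` contain no progression of length 3.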

It was first proved by Szemeredi in 1975 using a combinatorial and completely elementary approach. Although his method was extremely complicated, some important ideas, such as Szemeredi’s regularity lemma in graph theory, came out of his proof. Two years later, Furstenberg introduced a totally different approach: he turned Szemeredi’s theorem, a problem that looks extremely combinatorial, into an ergodic puzzle about the multiple recurrence of a measure-preserving system. Later, in 2002, Gowers gave a Fourier-analytic proof. The fact that the question originally asked by Erdos and Turan in 1936 has been answered in three completely distinct ways already makes this problem highly interesting. In this paper, we discuss Furstenberg’s ergodic proof of Szemeredi’s theorem. Besides being elegant, this proof has value far beyond solving the problem per se: it sheds light on many important topics in ergodic theory, for instance the classification of dynamical systems, conditional measures, and extensions.

Date: AUGUST 28, 2018.

2. A brief introduction to ergodic theory

Ergodic theory studies dynamical systems. By a dynamical system we mean a “good” action on a measure space that exhibits interesting long-term behavior. In the simplest case, the dynamics is given by a single map $T$ from the space to itself, which we iterate; for a $\mathbb Z$-action, $T$ is invertible and $n\in\mathbb Z$ acts as $T^n$. Obviously, not all maps are “well-behaved”. In this section, we cover the basics of ergodic theory to set the foundation for our later discussion.

Definition 2.1. A measure space $(X,\mathcal B,\mu)$ consists of a set $X$, a σ-algebra $\mathcal B$ of measurable subsets of $X$, and a measure $\mu$ defined on $\mathcal B$.

Sometimes we suppress the σ-algebra and just write $(X,\mu)$ when it plays no role. However, the σ-algebra must be treated with great care when dealing with conditional measures, which will be discussed later in this paper.

Remark 2.2. In this paper, we mostly assume that we are dealing with probability spaces, i.e. spaces in which the entire space has measure 1.

Definition 2.3. A measurable map $T:(X,\mathcal B_X,\mu)\to(Y,\mathcal B_Y,\nu)$ is measure-preserving if $\mu(T^{-1}A)=\nu(A)$ for every $A\in\mathcal B_Y$.

Definition 2.4. A measure-preserving map $\varphi$ is invertible if its inverse is well-defined almost everywhere and measurable.

Definition 2.5. We call $(X,\mathcal B,T,\mu)$ a measure-preserving system, or equivalently a dynamical system, if $T:X\to X$ is a measure-preserving map.

Example 2.6. Let $2^{\mathbb Z}=\{0,1\}^{\mathbb Z}$ be the infinite product of copies of $\{0,1\}$. This space is compact by Tychonoff’s theorem. Given $x\in2^{\mathbb Z}$, we denote the $k$th coordinate of $x$ by $x[k]$, and we let $\pi_j$ be the projection onto the $j$th coordinate. We define a measure $\mu$ on $2^{\mathbb Z}$ as an infinite product. On each copy of $\{0,1\}$ (a finite set, so every subset is measurable) we use the “half-half” measure $\nu$:
$$\nu(A)=\begin{cases}1&\text{if }A=\{0,1\},\\0&\text{if }A=\emptyset,\\\tfrac12&\text{otherwise.}\end{cases}$$
For measurable rectangles $B=\prod_{j\in\mathbb Z}A_j$, where each $A_j\subseteq\{0,1\}$ is measurable in its own copy of $\{0,1\}$ and $A_j=\{0,1\}$ for all but finitely many $j$, we set $\mu(B)=\prod_{n\in\mathbb Z}\nu(\pi_nB)=\prod_{n\in\mathbb Z}\nu(A_n)$, and we extend $\mu$ to the σ-algebra generated by these rectangles (the extension exists and is unique by the Kolmogorov extension theorem). Moreover, for any integer $k$ one can define the Bernoulli shift $T^k$ on this space. It acts on $x\in2^{\mathbb Z}$ by shifting every coordinate of $x$ to the left by $k$ places, i.e. $(T^kx)[s]=x[s+k]$ for all $s\in\mathbb Z$. By Remark 2.8, to see that $T^k$ is measure-preserving it is enough to check this on rectangles, and $T^{-k}$ maps a rectangle to another rectangle with the same factors $A_j$ moved to new coordinates.
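To make the shift concrete, here is a small Python sketch under the simplifying assumption that we only ever look at finitely many coordinates. It encodes a cylinder set (a rectangle in which finitely many $A_j$ are single bits and the rest are $\{0,1\}$) as a dictionary from coordinates to required bits; the helper names `measure`, `shift` and `shift_preimage` are hypothetical. The point is that $T^{-k}$ only relabels the constrained coordinates, so the measure is unchanged.

```python
from typing import Dict

# A cylinder set {x : x[j] = c[j] for every key j of c} is encoded by the dict c
# (coordinate -> required bit); coordinates not in c are unconstrained.
Cylinder = Dict[int, int]


def measure(c: Cylinder) -> float:
    """Half-half product measure: each constrained coordinate contributes a factor 1/2."""
    return 0.5 ** len(c)


def shift(x: Dict[int, int], k: int) -> Dict[int, int]:
    """Apply T^k to a finite window of x, using (T^k x)[s] = x[s + k]."""
    return {s - k: bit for s, bit in x.items()}


def shift_preimage(c: Cylinder, k: int) -> Cylinder:
    """Return T^{-k} C: x lies in it iff x[j + k] = c[j] for every constrained j."""
    return {j + k: bit for j, bit in c.items()}
```

For instance, the cylinder `{0: 1, 3: 0}` has measure 1/4, and its preimage under $T^2$ is the cylinder `{2: 1, 5: 0}`, which again has measure 1/4.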

Example 2.7. Let $\mathbb T=\mathbb R/\mathbb Z$ be the circle equipped with the Haar (Lebesgue) measure $\mu$. We can define the rotation $R_\alpha$ acting by addition, $R_\alpha:x\mapsto x+\alpha$. This gives a measure-preserving system. To prove that $R_\alpha$ is measure-preserving, it suffices to show that $R_\alpha$ preserves the measure of all intervals.³ For any interval $(a,b)\subset\mathbb T$,⁴
$$\mu\big(R_\alpha^{-1}(a,b)\big)=\mu\big((a-\alpha,b-\alpha)\big)=(b-\alpha)-(a-\alpha)=b-a=\mu\big((a,b)\big).$$
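The interval computation in Example 2.7 can be checked exactly with rational arithmetic. In the sketch below (function names are our own), $\alpha$ is taken rational only so that `Fraction` gives exact answers; measure preservation holds for every $\alpha$, and irrationality matters only for ergodicity later on. The preimage arc may wrap around $0$, in which case it splits into two pieces whose lengths still add up to $b-a$.

```python
from fractions import Fraction


def rotation_preimage(a: Fraction, b: Fraction, alpha: Fraction):
    """Return R_alpha^{-1}((a, b)) for R_alpha(x) = x + alpha (mod 1), assuming 0 <= a < b <= 1.

    The preimage is the arc (a - alpha, b - alpha) reduced mod 1; if it runs past 1
    it is split into two intervals inside [0, 1).
    """
    lo = (a - alpha) % 1
    hi = lo + (b - a)
    if hi <= 1:
        return [(lo, hi)]
    return [(lo, Fraction(1)), (Fraction(0), hi - 1)]


def total_length(intervals) -> Fraction:
    """Lebesgue measure of a finite union of disjoint intervals."""
    return sum((r - l for l, r in intervals), Fraction(0))
```

For example, with $(a,b)=(\tfrac1{10},\tfrac12)$ and $\alpha=\tfrac3{10}$ the preimage is $(\tfrac45,1)\cup(0,\tfrac15)$, of total length $\tfrac25=b-a$.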

Remark 2.8. Proving the measure-preserving property directly for every measurable set can be painful. However, it suffices to prove it for a collection of sets that generates the σ-algebra and is closed under finite intersections; this follows from the π-λ theorem. This is a standard trick that we will use repeatedly.

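Example 2.9. Instead of a rotation, we can define a different dynamics on the circle $\mathbb T$, namely the circle doubling map $M_2$, where $M$ stands for multiplication: $M_2:\mathbb T\to\mathbb T$ is defined by $M_2(a)=2a\bmod1$. For an arbitrary interval $(a,b)\subset[0,1)$,
$$\mu\big(M_2^{-1}(a,b)\big)=\mu\Big(\big(\tfrac a2,\tfrac b2\big)\cup\big(\tfrac{a+1}2,\tfrac{b+1}2\big)\Big)=\Big(\tfrac b2-\tfrac a2\Big)+\Big(\tfrac{b+1}2-\tfrac{a+1}2\Big)=b-a=\mu\big((a,b)\big).$$
Therefore, the circle doubling map $M_2$ is also measure-preserving on $\mathbb T$. In fact, $M_k$, multiplication by $k$, is measure-preserving for every natural number $k$: the preimage of $(a,b)$ is the union of the $k$ disjoint intervals $\big(\tfrac{a+j}k,\tfrac{b+j}k\big)$, $j=0,\dots,k-1$, each of length $\tfrac{b-a}k$.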
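The same kind of exact check works for $M_k$. The sketch below (hypothetical helper names, exact `Fraction` arithmetic) lists the $k$ preimage intervals of $(a,b)$ and lets us confirm that their total length is $b-a$.

```python
from fractions import Fraction


def multiplication_preimage(a: Fraction, b: Fraction, k: int):
    """Return M_k^{-1}((a, b)) for M_k(x) = k*x (mod 1), assuming 0 <= a < b <= 1 and k >= 1.

    For x in [0, 1), k*x (mod 1) lies in (a, b) exactly when k*x lies in (a + j, b + j)
    for some j in {0, ..., k-1}, so the preimage is the union of ((a+j)/k, (b+j)/k).
    """
    return [((a + j) / k, (b + j) / k) for j in range(k)]


def total_length(intervals) -> Fraction:
    """Lebesgue measure of a finite union of disjoint intervals."""
    return sum((r - l for l, r in intervals), Fraction(0))
```

For $k=2$ and $(a,b)=(\tfrac14,\tfrac12)$ this returns $(\tfrac18,\tfrac14)$ and $(\tfrac58,\tfrac34)$, matching the formula for $M_2^{-1}$ above.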

Remark 2.10. We have now defined two different dynamics on the same space $\mathbb T$ (that is, $\mathbb R/\mathbb Z$). A natural question is whether these two systems are equivalent. Here it is quite clear that they are different, since the action $a\mapsto a+\alpha$ is not even close to the action $a\mapsto2a$ (for instance, $R_\alpha$ is one-to-one while $M_2$ is two-to-one). However, when two dynamical systems live on different spaces, it is hard to tell whether they are different or “behave in similar ways”. Therefore we introduce the notion of measurable isomorphism.

Definition 2.11. Let $(X,\mathcal B,\mu)$ be a probability space and $A\in\mathcal B$. We say that $A$ is null if $\mu(A)=0$, and conull if $\mu(A)=1$.

This gives a convenient way to talk about sets of zero or full measure, which we will encounter often in our discussion of ergodic theory.

Definition 2.12. Let $(X,\mathcal B,T,\mu)$ be a dynamical system and $A\in\mathcal B$. We call $A$ $T$-invariant, or invariant under $T$, if $TA\subseteq A$. Moreover, if $TA=A$, we say that $A$ is strictly $T$-invariant, or strictly invariant under $T$.

Definition 2.13. Two systems $(X,\mathcal B_X,T_X,\mu)$ and $(Y,\mathcal B_Y,T_Y,\nu)$ are measurably isomorphic if there exist conull sets $X'\in\mathcal B_X$ invariant under $T_X$ and $Y'\in\mathcal B_Y$ invariant under $T_Y$, and an invertible measure-preserving map $f:X'\to Y'$ such that $f(T_Xx)=T_Y(f(x))$ for every $x\in X'$, i.e. the following diagram commutes:
$$\begin{array}{ccc}X'&\xrightarrow{\;T_X\;}&X'\\{\scriptstyle f}\big\downarrow&&\big\downarrow{\scriptstyle f}\\Y'&\xrightarrow{\;T_Y\;}&Y'\end{array}$$
Notice that this commutative diagram may only be defined on a set of full measure.

³ See Remark 2.8.
⁴ Here we view the circle as the interval [0, 1) with endpoints identified.


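Example 2.14. $(\mathbb T,\mathcal B_{\mathbb T},M_4,\mu)$ is isomorphic to $(\mathbb T^2,\mathcal B_{\mathbb T^2},M_2\otimes M_2,\mu\otimes\mu)$, where $\mu\otimes\mu$ is the product measure and $M_2\otimes M_2:\mathbb T^2\to\mathbb T^2$ is defined by $M_2\otimes M_2(t_1,t_2)=(2t_1,2t_2)$. It is clear that $M_2\otimes M_2$ is a measure-preserving map on $\mathbb T^2$. Here we construct a measure-preserving map $\varphi$ from $\mathbb T$ to $\mathbb T^2$ such that the diagram below commutes:
$$\begin{array}{ccc}\mathbb T&\xrightarrow{\;M_4\;}&\mathbb T\\{\scriptstyle\varphi}\big\downarrow&&\big\downarrow{\scriptstyle\varphi}\\\mathbb T^2&\xrightarrow{\;M_2\otimes M_2\;}&\mathbb T^2\end{array}$$
We construct a sequence $\{\varphi_n\}_{n\in\mathbb N}$ of maps from $\mathbb T$ to $\mathbb T^2$, where each $\varphi_n$ is measure-preserving with respect to a “small” σ-algebra. When $n=1$, we let $\mathcal C_1\subset\mathcal B_{\mathbb T}$ be the trivial σ-algebra, which contains only the entire interval $[0,1)$ and the empty set. Similarly, we let $\mathcal D_1\subset\mathcal B_{\mathbb T^2}$ be the σ-algebra that contains only the unit square and the empty set. We define $\varphi_1$ to be some bijective map⁵ from $[0,1)$ to $[0,1)^2$. It is clearly measure-preserving when viewed as a map from $(\mathbb T,\mathcal C_1)$ to $(\mathbb T^2,\mathcal D_1)$. When $n=2$, we divide the interval into the four subintervals $[0,\tfrac14)$, $[\tfrac14,\tfrac12)$, $[\tfrac12,\tfrac34)$, $[\tfrac34,1)$ and let $\mathcal C_2\subset\mathcal B_{\mathbb T}$ be the σ-algebra generated by them. Similarly, we divide $\mathbb T^2$ into four squares and define the σ-algebra $\mathcal D_2$. The map $\varphi_2$ sends the four subintervals of $\mathbb T$ to the four subsquares of $\mathbb T^2$ in counterclockwise order, starting at the top left square. Again, we can use bijective maps to ensure that $\varphi_2:(\mathbb T,\mathcal C_2)\to(\mathbb T^2,\mathcal D_2)$ is measure-preserving. In general, $\mathcal C_n$ is generated by the $4^{n-1}$ intervals of length $4^{-(n-1)}$ and $\mathcal D_n$ by the $4^{n-1}$ squares of side $2^{-(n-1)}$. By construction, the sequence $\{\varphi_n\}_{n\in\mathbb N}$ converges to a measurable function $\varphi$, and the limit $\varphi:(\mathbb T,\mathcal B_{\mathbb T})\to(\mathbb T^2,\mathcal B_{\mathbb T^2})$ is measure-preserving. We can prove that such a map is an isomorphism by viewing points of $\mathbb T$ in base-4 expansions and points of $\mathbb T^2$ in binary expansions, which is another standard trick in the theory of dynamical systems.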
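The digit trick at the end of Example 2.14 can be sketched on finite expansions. In the Python below, which is our own illustration and is not claimed to be the limit map $\varphi$ constructed above, each base-4 digit $d=2b+c$ is split into the bits $b$ and $c$, giving the binary digits of the two coordinates. Multiplying by 4 drops the first base-4 digit, and multiplying by 2 drops the first binary digit of each coordinate, so the splitting map intertwines $M_4$ with $M_2\otimes M_2$. It also sends a base-4 cylinder of length $n$ (measure $4^{-n}$) onto a product of two binary cylinders of length $n$ (measure $2^{-n}\cdot2^{-n}$), which is why it preserves measure.

```python
from fractions import Fraction
from typing import List, Tuple


def value(digits: List[int], base: int) -> Fraction:
    """The point 0.d1 d2 d3 ... (a finite expansion) in the given base."""
    return sum((Fraction(d, base ** (i + 1)) for i, d in enumerate(digits)), Fraction(0))


def phi(digits4: List[int]) -> Tuple[List[int], List[int]]:
    """Split each base-4 digit d = 2*b + c into a pair of bits (b, c)."""
    return [d // 2 for d in digits4], [d % 2 for d in digits4]


def times4(digits4: List[int]) -> List[int]:
    """M_4 on a base-4 expansion: multiplying by 4 mod 1 drops the first digit."""
    return digits4[1:]


def times2(bits: List[int]) -> List[int]:
    """M_2 on a binary expansion: multiplying by 2 mod 1 drops the first digit."""
    return bits[1:]
```

For the digits `[3, 1, 2, 0, 3]`, applying `times4` and then `phi` gives the same pair of bit lists as applying `phi` and then `times2` to each coordinate.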

Remark 2.15. It is easier to think of the bijection in the other direction, i.e. $[0,1)^2\to[0,1)$. There are many constructions of such a bijective map. One of the most direct ways is to interleave the terms of the continued fraction expansions of the two coordinates to get a single number in $[0,1)$; however, it is not immediately clear that this map is onto. Actually, we only need the existence of a bijective map, not its explicit form, so we can use the Cantor-Schroder-Bernstein theorem. The map $x\mapsto(x,0)$ is an injection from $[0,1)$ to $[0,1)^2$. Interleaving the digits of the decimal expansions of the two coordinates, i.e. $(0.a_1a_2a_3\ldots,\,0.b_1b_2b_3\ldots)\mapsto0.a_1b_1a_2b_2\ldots$, gives an injection from $[0,1)^2$ to $[0,1)$. When a decimal expansion is not unique, we always choose the one that does not end in an infinite string of 9s; an arbitrary choice could break injectivity.
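The interleaving injection in Remark 2.15 is easy to write down for finite decimal expansions, where the non-uniqueness issue does not arise. The sketch below (function names are our own) works on the digit strings after the decimal point; `deinterleave` recovers both coordinates up to trailing zeros, which do not change the numbers, and this is exactly why the map is injective.

```python
def interleave(a_digits: str, b_digits: str) -> str:
    """Digits of 0.a1 b1 a2 b2 ... from the digit strings of 0.a1 a2 ... and 0.b1 b2 ...

    Only finite decimal expansions are handled; the shorter one is padded with zeros,
    which does not change the number it represents.
    """
    n = max(len(a_digits), len(b_digits))
    a, b = a_digits.ljust(n, "0"), b_digits.ljust(n, "0")
    return "".join(x + y for x, y in zip(a, b))


def deinterleave(c_digits: str):
    """Recover the two digit strings: positions 1, 3, 5, ... give a; positions 2, 4, ... give b."""
    return c_digits[0::2], c_digits[1::2]
```

For example, the pair $(0.123,\,0.45)$ is sent to $0.142530$.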

As we have seen in the previous examples, the measure-preserving property is already highly nontrivial, and there is a lot to say about these systems directly from the definition. Here we prove some elementary but useful results about measure-preserving maps.

Definition 2.16. Given a measure-preserving system $(X,\mathcal B,T,\mu)$, we define the operator associated to $T$, denoted $U_T$, by $U_Tf(x)=f(Tx)$. Moreover, we let $U_T^*$ be the adjoint of $U_T$, characterized by $\langle U_Tf,g\rangle=\langle f,U_T^*g\rangle$ for all $f,g\in L^2(X)$.

⁵ See Remark 2.15 for more details.


Remark 2.17. Notice that in Definition 2.16 we did not specify the spaces on which the operator acts. This is a rather loose definition, since the actual spaces depend on the context. For example, when we are dealing with $L^2$ spaces, as in Theorem 2.27, we assume that $U_T:L^2(X)\to L^2(X)$.

Theorem 2.18. Given a measure-preserving system $(X,\mathcal B,T,\mu)$, for any $L^1$ function $f$ we have
$$\int_XU_Tf\,d\mu=\int_Xf\,d\mu.$$

Proof. For a characteristic function $\chi_B$,
$$\int_XU_T\chi_B\,d\mu=\mu(T^{-1}B)=\mu(B)=\int_X\chi_B\,d\mu,$$
since $T$ is measure-preserving. By linearity, the identity holds for simple functions. For a nonnegative $L^1$ function $f$, we take an increasing sequence of nonnegative simple functions $s_k\uparrow f$; then $U_Ts_k\uparrow U_Tf$, and the monotone convergence theorem gives the identity for $f$. Finally, a general $L^1$ function can be written as $f=f^+-f^-$, where both parts are nonnegative and integrable, and we apply the previous case to each part.

Remark 2.19. Another way to state Theorem 2.18 is to say that $U_T:L^1(X)\to L^1(X)$ is an isometry, i.e. $\|U_Tf\|_1=\|f\|_1$: apply Theorem 2.18 to $|f|$ and note that $|U_Tf|=U_T|f|$.

Theorem 2.20. (Poincare Recurrence) Given a dynamical system (X,B, T, µ) andA ∈ B, almost every point in A returns to A infinitely often.

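Proof. We need to show that there exists a set $A'\subseteq A$ with $\mu(A')=\mu(A)$ such that for every point $a\in A'$ there is a strictly increasing sequence of positive integers $\{n_i\}_{i\in\mathbb N}$ with $T^{n_i}a\in A$. To find such an $A'$, we remove from $A$, step by step, a countable collection of sets of measure zero. Let
$$N_1=\{x\in A:T^nx\notin A\text{ for all }n\ge1\}=\bigcap_{i=1}^{\infty}T^{-i}A^c\cap A\subseteq A.$$
We claim that $N_1$ has measure zero. Indeed, $T^{-n}N_1=\bigcap_{i=n+1}^{\infty}T^{-i}A^c\cap T^{-n}A$. For $0\le m<n$, we have $T^{-n}N_1\subseteq T^{-n}A$ but $T^{-m}N_1\subseteq T^{-n}A^c$, since $n\ge m+1$. Therefore the countable family $\{T^{-k}N_1\}_{k\ge0}$ consists of mutually disjoint sets, and
$$\begin{aligned}1&\ge\mu\Big(\bigcup_{i=0}^{\infty}T^{-i}N_1\Big)\\&=\sum_{i=0}^{\infty}\mu(T^{-i}N_1)&&\text{(the sets are disjoint)}\\&=\sum_{i=0}^{\infty}\mu(N_1)&&\text{($T$ is measure-preserving)},\end{aligned}$$
so $\mu(N_1)=0$. We now define $A_1=A\setminus N_1$. By construction, every point in $A_1$ returns to $A$ at least once, and $\mu(A_1)=\mu(A)$. Next, we apply the same argument to $A_1$ in place of $A$: let
$$N_2=\{x\in A_1:T^nx\notin A_1\text{ for all }n\ge1\}=\bigcap_{i=1}^{\infty}T^{-i}A_1^c\cap A_1$$
and $A_2=A_1\setminus N_2$. As before, $\mu(N_2)=0$, so $\mu(A_2)=\mu(A_1)=\mu(A)$. Moreover, every point of $A_2$ visits $A_1\subseteq A$ at some time $n\ge1$ and, from there, returns to $A$ at least once more, so it returns to $A$ at least twice. Continuing inductively, we obtain sets $A\supseteq A_1\supseteq A_2\supseteq\cdots$, all of measure $\mu(A)$, such that every point of $A_k$ returns to $A$ at least $k$ times. Let $A'=\bigcap_{k=1}^{\infty}A_k$. Then $A'$ has the same measure as $A$, and every point in $A'$ returns to $A$ infinitely often.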
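Poincare recurrence only promises returns for almost every point of $A$; for an irrational rotation every point in fact returns. The floating-point sketch below (names and parameters are our own choices) lists the return times to $A=[0,0.1)$ of the point $x=0.05$ under $R_\alpha$ with $\alpha=\sqrt2-1$. It illustrates the statement for one system and is of course not a proof.

```python
import math


def return_times(x: float, alpha: float, a: float, b: float, n_max: int):
    """Times 1 <= n <= n_max at which R_alpha^n(x) = x + n*alpha (mod 1) lies in [a, b)."""
    return [n for n in range(1, n_max + 1) if a <= (x + n * alpha) % 1.0 < b]


alpha = math.sqrt(2) - 1  # an irrational rotation number (up to floating point)
times = return_times(0.05, alpha, 0.0, 0.1, 1000)
```

Here the first return happens at $n=12$, and roughly one in ten of the first thousand iterates lands back in $A$.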

Remark 2.21. Theorem 2.20 does not require any high-level techniques. We can generalize Poincare’s theorem from the recurrence of individual points to the simultaneous recurrence of a positive-measure part of the set $A$ at several times $n,2n,\dots,(k-1)n$. We call this the multiple recurrence property. As we will see later, this generalization holds for arbitrary measure-preserving systems, just like Poincare recurrence.


Definition 2.22. A measure-preserving system $(X,\mathcal B,T,\mu)$ has multiple recurrence of order $k$ if for each $A\in\mathcal B$ with positive measure there exists $n\in\mathbb N$ such that
$$\mu\Big(\bigcap_{i=0}^{k-1}T^{-in}A\Big)>0.$$

Despite the similarity between Theorem 2.20 and Definition 2.22, the latter is much more sophisticated and involves some of the deeper results in ergodic theory. In fact, the whole point of this paper is to prove the multiple recurrence property for a general measure-preserving system. In Theorem 3.9, we will show that this is equivalent to Szemeredi’s theorem.

2.1. Ergodicity and weak mixing.

Definition 2.23. A measure-preserving system $(X,\mathcal B,T,\mu)$ is ergodic if every set $A\in\mathcal B$ with $T^{-1}A=A$ satisfies $\mu(A)=0$ or $\mu(A)=1$.

In the above definition of ergodicity, we can “replace” sets with functions and obtain a different characterization of ergodicity. The proof is similar to that of Theorem 2.18: we prove the statement for simple functions first and then pass to limits.

Theorem 2.24. A measure-preserving system $(X,\mathcal B,T,\mu)$ is ergodic if and only if every measurable function $f$ with $f(Tx)=f(x)$ for almost every $x\in X$ is constant almost everywhere.

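Example 2.25. Consider the dynamical system $(\mathbb T,R_\alpha)$, where $\mathbb T$ is the circle, $R_\alpha:x\mapsto x+\alpha$, and $\alpha$ is irrational. Let $f\in L^2(\mathbb T)$ be $R_\alpha$-invariant, and let $\sum_{n=-\infty}^{\infty}c_ne^{2\pi int}$ be the Fourier expansion of $f(t)$. Then
$$f(R_\alpha t)=\sum_{n=-\infty}^{\infty}c_ne^{2\pi in(t+\alpha)}=\sum_{n=-\infty}^{\infty}c_ne^{2\pi in\alpha}e^{2\pi int}.$$
Since $f(R_\alpha t)=f(t)$ almost everywhere, the uniqueness of Fourier coefficients gives $c_n=c_ne^{2\pi in\alpha}$ for every $n$. Because $\alpha$ is irrational, $e^{2\pi in\alpha}\neq1$ for $n\neq0$, so $c_n=0$ for all $n\neq0$; for $n=0$ the equation holds trivially since $e^{0}=1$. Therefore $f$ is constant almost everywhere. Applying this to indicator functions of invariant sets (which lie in $L^2$) shows that the irrational rotation on the circle is ergodic.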

Example 2.26. Now consider the circle doubling system $(\mathbb T,M_2)$, where $M_2(x)=2x$. We use the same technique as in the previous example. Suppose $f\in L^2(\mathbb T)$ is $M_2$-invariant with Fourier expansion $\sum_{n=-\infty}^{\infty}c_ne^{2\pi int}$. Then
$$f(2t)=\sum_{n=-\infty}^{\infty}c_ne^{2\pi in(2t)}=\sum_{n=-\infty}^{\infty}c_ne^{2\pi i(2n)t}.$$
Comparing the coefficients of $e^{2\pi i(2k)t}$ in $f(2t)=f(t)$ gives $c_{2k}=c_k$, hence $c_k=c_{2k}=c_{4k}=\cdots$ for every integer $k$. Since $f\in L^2$, the sequence $\{c_n\}_{n\in\mathbb Z}$ is square summable, which is impossible if $c_k\neq0$ for some $k\neq0$ (the infinitely many terms $c_{2^jk}$, $j\ge0$, would all equal $c_k$). As a result, $f$ is constant almost everywhere and the system $(\mathbb T,M_2)$ is ergodic.

Theorem 2.27. (Mean ergodic theorem) Given a measure-preserving system $(X,\mathcal B,T,\mu)$, let $P$ be the orthogonal projection onto the closed subspace
$$V=\{h\in L^2(X):U_Th=h\}\subseteq L^2(X).$$
Then
$$\frac1N\sum_{n=0}^{N-1}U_T^nf\to Pf$$
in $L^2$ for all $f\in L^2(X)$.

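Proof. Let $W=\{U_Tg-g:g\in L^2(X)\}\subseteq L^2(X)$. We first observe that $L^2(X)=V\oplus\overline W$, where $\overline W$ is the closure of $W$. To see this, it suffices to show that $W^\perp=V$. If $h\in V$, then for every $g\in L^2(X)$
$$\langle h,U_Tg-g\rangle=\langle h,U_Tg\rangle-\langle h,g\rangle=\langle U_Th,U_Tg\rangle-\langle h,g\rangle=\langle h,g\rangle-\langle h,g\rangle=0,$$
where we used $h=U_Th$ and the fact that $U_T$ preserves inner products (by Theorem 2.18 applied to $h\bar g$). Conversely, if $h\in W^\perp$, we need to show that $h=U_Th$. Indeed, $0=\langle h,U_Tg-g\rangle=\langle h,U_Tg\rangle-\langle h,g\rangle=\langle U_T^*h,g\rangle-\langle h,g\rangle$ for all $g\in L^2(X)$. This means that
$$h=U_T^*h,\tag{2.28}$$
where $U_T^*$ is the adjoint of $U_T$ defined in Definition 2.16. Therefore
$$\begin{aligned}\|U_Th-h\|_2^2&=\langle U_Th,U_Th\rangle+\langle h,h\rangle-2\,\mathrm{Re}\langle U_Th,h\rangle\\&=2\langle h,h\rangle-2\,\mathrm{Re}\langle h,U_T^*h\rangle&&\text{(by Theorem 2.18)}\\&=2\langle h,h\rangle-2\langle h,h\rangle&&\text{(by (2.28))}\\&=0.\end{aligned}$$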

Now we have proved our observation, and we are ready to prove the mean ergodic theorem. Given any $f\in L^2(X)$, we can write $f=Pf+h$ with $h\in\overline W$, so there exists a sequence $\{h_i\}_{i\in\mathbb N}\subseteq W$ with $h_i\to h$.⁶ Notice that
$$\begin{aligned}\frac1N\sum_{n=0}^{N-1}U_T^nf&=\frac1N\sum_{n=0}^{N-1}U_T^nPf+\frac1N\sum_{n=0}^{N-1}U_T^nh\\&=\frac1N\sum_{n=0}^{N-1}Pf+\frac1N\sum_{n=0}^{N-1}U_T^nh&&(Pf\in V)\\&=Pf+\frac1N\sum_{n=0}^{N-1}U_T^nh.\end{aligned}$$

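It suffices to show that $\frac1N\sum_{n=0}^{N-1}U_T^nh\to0$. Write $h_i=U_Tg_i-g_i$ with $g_i\in L^2(X)$. Since the sum telescopes, we have
$$\Big\|\frac1N\sum_{n=0}^{N-1}U_T^nh_i\Big\|_2=\Big\|\frac1N\sum_{n=0}^{N-1}U_T^n(U_Tg_i-g_i)\Big\|_2=\frac1N\big\|U_T^Ng_i-g_i\big\|_2\le\frac{2\|g_i\|_2}{N},$$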

⁶ Whenever we talk about convergence in this proof, we mean convergence in the $L^2$ norm.


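which goes to 0 as $N\to\infty$. For any $\varepsilon>0$, we choose $i$ large enough that $\|h-h_i\|_2<\frac\varepsilon2$, and then $N$ large enough that $\big\|\frac1N\sum_{n=0}^{N-1}U_T^nh_i\big\|_2<\frac\varepsilon2$. Finally,
$$\begin{aligned}\Big\|\frac1N\sum_{n=0}^{N-1}U_T^nh\Big\|_2&\le\Big\|\frac1N\sum_{n=0}^{N-1}U_T^n(h-h_i)\Big\|_2+\Big\|\frac1N\sum_{n=0}^{N-1}U_T^nh_i\Big\|_2\\&<\frac1N\sum_{n=0}^{N-1}\big\|U_T^n(h-h_i)\big\|_2+\frac\varepsilon2\\&=\frac1N\sum_{n=0}^{N-1}\|h-h_i\|_2+\frac\varepsilon2&&\text{(by Theorem 2.18)}\\&<\varepsilon.\end{aligned}$$
∎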
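For an ergodic system, the mean ergodic theorem together with Remark 2.29 says that the averages $\frac1N\sum_{n=0}^{N-1}U_T^nf$ approach the constant $\int_Xf\,d\mu$. The sketch below, with hypothetical names, evaluates these averages at one point for the irrational rotation with $\alpha=\sqrt2-1$ and $f(t)=\cos(2\pi t)+t^2$, whose integral is $1/3$. Evaluating at a single point is a simplification: it is justified here because $f$ is continuous and the system is an irrational rotation, whereas the theorem itself only asserts $L^2$ convergence.

```python
import math


def ergodic_average(f, x: float, alpha: float, n_terms: int) -> float:
    """(1/N) * sum_{n=0}^{N-1} f(R_alpha^n x), i.e. (1/N) sum U^n f evaluated at the point x."""
    return sum(f((x + n * alpha) % 1.0) for n in range(n_terms)) / n_terms


alpha = math.sqrt(2) - 1


def f(t: float) -> float:
    # the integral of cos(2*pi*t) over [0, 1) is 0 and the integral of t**2 is 1/3
    return math.cos(2 * math.pi * t) + t * t


avg = ergodic_average(f, 0.2, alpha, 100_000)
```

With $10^5$ terms the average agrees with $1/3$ to about three decimal places.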

Remark 2.29. Even though Theorem 2.27 bears the name “mean ergodic”, it applies to all measure-preserving systems, ergodic or not. When the system is ergodic, we get a more explicit result: by Theorem 2.24, the only $T$-invariant functions in $L^2$ are the (almost everywhere) constants, so $Pf=\int_Xf\,d\mu$ and
$$\frac1N\sum_{n=0}^{N-1}U_T^nf\xrightarrow{\;L^2\;}\int_Xf\,d\mu$$
for every $f\in L^2(X)$. Combining this with the fact that $L^2$ convergence implies convergence in the weak topology of the Hilbert space $L^2$, we obtain the following weaker version of the mean ergodic theorem.

Corollary 2.30. Given an ergodic system $(X,\mathcal B,T,\mu)$ and $f,g\in L^2(X)$, we have
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\int_Xf\,U_T^ng\,d\mu=\int_Xf\,d\mu\int_Xg\,d\mu.$$

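Proof. By the mean ergodic theorem and Remark 2.29,
$$\frac1N\sum_{n=0}^{N-1}U_T^ng\xrightarrow{\;L^2\;}\int_Xg\,d\mu.$$
Actually, we only need that
$$\frac1N\sum_{n=0}^{N-1}U_T^ng\rightharpoonup\int_Xg\,d\mu$$
weakly in $L^2$, which is true since $L^2$ convergence implies weak convergence. We take $f,g$ real-valued and write $\langle u,v\rangle=\int_Xu\,v\,d\mu$ (the complex case follows by splitting into real and imaginary parts). Now we have
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\langle f,U_T^ng\rangle=\Big\langle f,\int_Xg\,d\mu\Big\rangle=\langle f,1\rangle\int_Xg\,d\mu=\int_Xf\,d\mu\int_Xg\,d\mu.$$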

Corollary 2.31. A measure-preserving system $(X,\mathcal B,T,\mu)$ is ergodic if and only if
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T^{-n}B)=\mu(A)\mu(B)\quad\text{for all }A,B\in\mathcal B.$$

Proof. If $(X,\mathcal B,T,\mu)$ is ergodic, we apply Corollary 2.30 with $f=\chi_A$ and $g=\chi_B$; since $\int_X\chi_A\,U_T^n\chi_B\,d\mu=\mu(A\cap T^{-n}B)$, this gives $\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T^{-n}B)=\mu(A)\mu(B)$. Conversely, suppose the limit identity holds for all $A,B\in\mathcal B$. Recall that to show $(X,\mathcal B,T,\mu)$ is ergodic, we need to show that every set $V\in\mathcal B$ with $T^{-1}V=V$ has zero or full measure. Take $A=V^c$ and $B=V$. Then $T^{-n}B=B$, so $\mu(A\cap T^{-n}B)=\mu(V^c\cap V)=0$ for every $n$. Therefore $\mu(V^c)\mu(V)=\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T^{-n}B)=0$, so either $V^c$ or $V$ has measure zero, i.e. $T$ is ergodic. ∎
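Corollary 2.31 can be checked exactly on a finite example. The rotation $x\mapsto x+1$ on $\mathbb Z/m\mathbb Z$ with the uniform measure is ergodic, since its only invariant sets are $\emptyset$ and the whole space. When $N$ is a multiple of $m$, every translate of $B$ is counted equally often, so the Cesaro average equals $\mu(A)\mu(B)$ exactly; for other $N$ it only converges. The Python sketch below (our own helper names) verifies this with exact fractions.

```python
from fractions import Fraction
from typing import Set


def mu(A: Set[int], m: int) -> Fraction:
    """Uniform probability measure on Z/mZ."""
    return Fraction(len(A), m)


def preimage(B: Set[int], n: int, m: int) -> Set[int]:
    """T^{-n} B for T(x) = x + 1 (mod m), i.e. {x : x + n (mod m) in B}."""
    return {(b - n) % m for b in B}


def cesaro_correlation(A: Set[int], B: Set[int], m: int, N: int) -> Fraction:
    """(1/N) * sum_{n=0}^{N-1} mu(A ∩ T^{-n} B)."""
    return sum((mu(A & preimage(B, n, m), m) for n in range(N)), Fraction(0)) / N
```

With $m=12$, $A=\{0,1,5\}$ and $B=\{2,3,7,11\}$, the average over one full period is exactly $\tfrac3{12}\cdot\tfrac4{12}=\tfrac1{12}$.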

While the mean ergodic theorem only applies to $L^2$ functions, the theorem below is more general, since it holds for all $L^1$ functions, and it gives convergence almost everywhere as well as in $L^1$.

Theorem 2.32. (Birkhoff) Given an ergodic measure-preserving system $(X,\mathcal B,T,\mu)$ and any $L^1$ function $f$, we have
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}U_T^nf=\int_Xf\,d\mu$$
almost everywhere and in $L^1$.

Definition 2.33. A measure-preserving system $(X,\mathcal B,T,\mu)$ is mixing, or strong mixing, if $\lim_{n\to\infty}\mu(A\cap T^{-n}B)=\mu(A)\mu(B)$ for all $A,B\in\mathcal B$.

Definition 2.34. A measure-preserving system $(X,\mathcal B,T,\mu)$ is weak mixing if
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\big|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)\big|=0\quad\text{for all }A,B\in\mathcal B.$$

Remark 2.35. Recall the fact from analysis that if a sequence of real numbers $\{a_n\}\subset\mathbb R$ converges to some $a\in\mathbb R$, then its Cesaro means have the same limit, i.e. $\frac1N\sum_{n=0}^{N-1}a_n\to a$. Applying this to $a_n=|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)|\to0$ shows that strong mixing implies weak mixing.

Theorem 2.36. If a measure-preserving system $(X,\mathcal B,T,\mu)$ is weak mixing, then it is ergodic.

Proof. Take a set $A\in\mathcal B$ with $T^{-1}A=A$. By Definition 2.34,
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\big|\mu(A\cap T^{-n}A)-\mu(A)\mu(A)\big|=0.$$
Since $T^{-n}A=A$ for every $n\ge0$, each term equals $|\mu(A)-\mu(A)^2|$, so $\mu(A)=\mu(A)^2$. Therefore $\mu(A)$ is either 0 or 1, which proves that $(X,\mathcal B,T,\mu)$ is ergodic.


Remark 2.37. We can finally conclude that:

Mixing⇒Weak mixing⇒ Ergodic.
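The second implication cannot be reversed: the irrational rotation is ergodic (Example 2.25) but not weak mixing. The sketch below (our own names, floating point) takes $A=[0,a)$, so that $T^{-n}A$ is the arc starting at $s=-n\alpha\bmod1$, and computes $c_n=\mu(A\cap T^{-n}A)$ exactly for each $n$. For $a=\tfrac12$ one has $c_n=|s-\tfrac12|$, so the signed averages of $c_n-\mu(A)^2$ tend to $0$, as Corollary 2.31 predicts, while the averages of $|c_n-\mu(A)^2|$ tend to $\int_0^1\big||s-\tfrac12|-\tfrac14\big|\,ds=\tfrac18\neq0$.

```python
import math


def overlap(a: float, s: float) -> float:
    """Length of [0, a) ∩ (the arc [s, s + a) taken mod 1), for 0 < a <= 1 and 0 <= s < 1."""
    # the part [s, min(s + a, 1)) meets [0, a) in [s, a) when s < a;
    # if the arc wraps past 1, its part [0, s + a - 1) lies inside [0, a).
    return max(0.0, a - s) + max(0.0, s + a - 1.0)


def averages(a: float, alpha: float, n_terms: int):
    """Signed and absolute Cesaro averages of c_n - a**2, with c_n = mu(A ∩ T^{-n} A), A = [0, a)."""
    signed = absolute = 0.0
    for n in range(n_terms):
        s = (-n * alpha) % 1.0  # T^{-n} A is the arc [s, s + a) mod 1
        c = overlap(a, s)
        signed += c - a * a
        absolute += abs(c - a * a)
    return signed / n_terms, absolute / n_terms


signed_avg, abs_avg = averages(0.5, math.sqrt(2) - 1, 20_000)
```

The first average settles near $0$ and the second near $0.125$, so the weak mixing condition of Definition 2.34 fails for this $A$.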

Now we provide some characterizations of weak mixing systems via product spaces. The following lemma, which characterizes bounded nonnegative real sequences with zero Cesaro mean, is a fundamental and elementary result in real analysis.

Lemma 2.38. If $\{a_n\}\subset\mathbb R$ is a nonnegative bounded sequence with zero Cesaro mean, i.e. $\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}a_n=0$, then there exists an index set $J\subseteq\mathbb N$ of density⁷ zero such that
$$\lim_{n\to\infty,\,n\notin J}a_n=0.$$
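A concrete instance of Lemma 2.38: the sequence below (our own example) equals 1 on the perfect squares and $1/(n+1)$ elsewhere. It does not converge to 0, yet its Cesaro means do, and the exceptional set $J$ of perfect squares has density zero. The Python sketch only checks these facts numerically up to $N=10^5$.

```python
import math


def is_square(n: int) -> bool:
    r = math.isqrt(n)
    return r * r == n


def a(n: int) -> float:
    """A bounded nonnegative sequence: 1 on perfect squares, 1/(n+1) elsewhere."""
    return 1.0 if is_square(n) else 1.0 / (n + 1)


N = 100_000
cesaro_mean = sum(a(n) for n in range(N)) / N
density_of_J = sum(1 for n in range(N) if is_square(n)) / N  # J = perfect squares
max_off_J = max(a(n) for n in range(1000, N) if not is_square(n))
```

Outside $J$ the terms are already below $10^{-3}$ from $n=1000$ on, while $a_n=1$ keeps recurring on $J$.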

Taking $a_n=|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)|$, this lemma yields a direct corollary on weak mixing systems.

Corollary 2.39. Given a weak mixing system $(X,\mathcal B_X,T_X,\mu)$ and $A,B\in\mathcal B_X$, there exists an index set $J\subseteq\mathbb N$ of density zero such that
$$\lim_{n\to\infty,\,n\notin J}\big|\mu(A\cap T_X^{-n}B)-\mu(A)\mu(B)\big|=0.$$

Remark 2.40. Corollary 2.39 shows that weak mixing systems behave very much like strong mixing systems once we ignore an index set of density zero.

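Theorem 2.41. Given a measure-preserving system $(X,\mathcal B_X,T_X,\mu)$, the following are equivalent:
(1) $(X,\mathcal B_X,T_X,\mu)$ is weak mixing.
(2) For any ergodic system $(Y,\mathcal B_Y,T_Y,\nu)$, the product $(X\times Y,\mathcal B_X\otimes\mathcal B_Y,T_X\times T_Y,\mu\otimes\nu)$ is ergodic.
(3) $(X\times X,\mathcal B_X\otimes\mathcal B_X,T_X\times T_X,\mu\otimes\mu)$ is weak mixing.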

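Proof. (3) ⇒ (1): Take $A,B\in\mathcal B_X$ and consider the sets $A\times X,B\times X\in\mathcal B_X\otimes\mathcal B_X$. For any measurable $C\in\mathcal B_X$ we have $\mu\otimes\mu(C\times X)=\mu(C)$, and $(A\times X)\cap(T_X\times T_X)^{-n}(B\times X)=(A\cap T_X^{-n}B)\times X$. Hence the weak mixing condition for $A\times X$ and $B\times X$ in $X\times X$ is exactly the weak mixing condition for $A$ and $B$ in $X$, so $X$ is weak mixing.

(1) ⇒ (3): It suffices to check the definition of weak mixing for rectangles $A\times B,C\times D\in\mathcal B_X\otimes\mathcal B_X$ (finite disjoint unions of rectangles then follow from the triangle inequality, and general sets by approximation). By Corollary 2.39, there exist index sets $K,L\subseteq\mathbb N$ of density zero such that
$$\lim_{n\to\infty,\,n\notin K}\big|\mu(A\cap T_X^{-n}C)-\mu(A)\mu(C)\big|=0,\qquad\lim_{n\to\infty,\,n\notin L}\big|\mu(B\cap T_X^{-n}D)-\mu(B)\mu(D)\big|=0.$$
A finite union of sets of density zero has density zero, so we can take $J=K\cup L$ and obtain
$$\lim_{n\to\infty,\,n\notin J}\big|\mu(A\cap T_X^{-n}C)-\mu(A)\mu(C)\big|=0,\qquad\lim_{n\to\infty,\,n\notin J}\big|\mu(B\cap T_X^{-n}D)-\mu(B)\mu(D)\big|=0.$$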

⁷ Density is defined in Remark 3.2.


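Therefore,
$$\begin{aligned}&\lim_{n\to\infty,\,n\notin J}\big|\mu\otimes\mu\big((A\times B)\cap(T_X\times T_X)^{-n}(C\times D)\big)-\mu\otimes\mu(A\times B)\,\mu\otimes\mu(C\times D)\big|\\&=\lim_{n\to\infty,\,n\notin J}\big|\mu\otimes\mu\big((A\cap T_X^{-n}C)\times(B\cap T_X^{-n}D)\big)-\mu(A)\mu(B)\mu(C)\mu(D)\big|\\&=\lim_{n\to\infty,\,n\notin J}\big|\mu(A\cap T_X^{-n}C)\,\mu(B\cap T_X^{-n}D)-\mu(A)\mu(C)\mu(B)\mu(D)\big|\\&=0.\end{aligned}$$
Since these terms are bounded by 1 and $J$ has density zero, their Cesaro means also tend to 0. This proves that $T_X\times T_X$ is weak mixing.

(1) ⇒ (2): It suffices to verify the criterion of Corollary 2.31 for rectangles in $X\times Y$ (general sets again follow by approximation). For any pair of rectangles $A\times B,C\times D\in\mathcal B_X\otimes\mathcal B_Y$, we need to show that
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu\otimes\nu\big((A\times B)\cap(T_X\times T_Y)^{-n}(C\times D)\big)=\mu\otimes\nu(A\times B)\,\mu\otimes\nu(C\times D).$$
Indeed,
$$\begin{aligned}&\frac1N\sum_{n=0}^{N-1}\mu\otimes\nu\big((A\times B)\cap(T_X\times T_Y)^{-n}(C\times D)\big)=\frac1N\sum_{n=0}^{N-1}\mu(A\cap T_X^{-n}C)\,\nu(B\cap T_Y^{-n}D)\\&=\mu(A)\mu(C)\,\frac1N\sum_{n=0}^{N-1}\nu(B\cap T_Y^{-n}D)+\frac1N\sum_{n=0}^{N-1}\big(\mu(A\cap T_X^{-n}C)-\mu(A)\mu(C)\big)\,\nu(B\cap T_Y^{-n}D).\end{aligned}$$
As $N\to\infty$, the first term converges to $\mu(A)\mu(C)\nu(B)\nu(D)$ by the ergodicity of $(Y,\mathcal B_Y,T_Y,\nu)$ and Corollary 2.31. Since $0\le\nu(B\cap T_Y^{-n}D)\le1$, the second term is bounded in absolute value by
$$\frac1N\sum_{n=0}^{N-1}\big|\mu(A\cap T_X^{-n}C)-\mu(A)\mu(C)\big|,$$
which tends to 0 because $(X,\mathcal B_X,T_X,\mu)$ is weak mixing. Hence the limit equals $\mu(A)\mu(C)\nu(B)\nu(D)=\mu\otimes\nu(A\times B)\,\mu\otimes\nu(C\times D)$. This shows that $(X\times Y,\mathcal B_X\otimes\mathcal B_Y,T_X\times T_Y,\mu\otimes\nu)$ is ergodic.

(2) ⇒ (1): Assume that for any ergodic system $(Y,\mathcal B_Y,T_Y,\nu)$, the product $(X\times Y,\mathcal B_X\otimes\mathcal B_Y,T_X\times T_Y,\mu\otimes\nu)$ is ergodic. We first consider the ergodic system $(Y',\mathcal B_{Y'},\mathrm{id}_{Y'},\nu)$, where $Y'$ contains a single element $e$ and $\mathrm{id}_{Y'}$ is the identity map on $Y'$. There is a canonical isomorphism $\varphi:X\times Y'\to X$ that sends $(x,e)$ to $x$. Therefore $(X,T_X)$ is ergodic, since it is isomorphic to $(X\times Y',T_X\times\mathrm{id}_{Y'})$, which is ergodic by (2). Applying (2) again, now with $Y=X$, we deduce that $(X\times X,T_X\times T_X)$ is ergodic. Recall that to show that $(X,\mathcal B_X,T_X,\mu)$ is weak mixing, we need to prove that $\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}|\mu(A\cap T_X^{-n}B)-\mu(A)\mu(B)|=0$ for all $A,B\in\mathcal B_X$. Actually, it suffices to show that
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\big(\mu(A\cap T_X^{-n}B)-\mu(A)\mu(B)\big)^2=0,$$
since by the Cauchy-Schwarz inequality $\Big(\frac1N\sum_{n=0}^{N-1}|b_n|\Big)^2\le\frac1N\sum_{n=0}^{N-1}b_n^2$ for any real numbers $b_n$. Expanding the square (the limits on the right exist, as we show next), we get
$$\begin{aligned}\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\big(\mu(A\cap T_X^{-n}B)-\mu(A)\mu(B)\big)^2&=\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T_X^{-n}B)^2+\mu(A)^2\mu(B)^2\\&\quad-2\mu(A)\mu(B)\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T_X^{-n}B).\end{aligned}\tag{2.46}$$
To compute the two remaining limits, we apply the ergodicity of $(X\times X,T_X\times T_X)$ together with Corollary 2.31 and deduce that
$$\begin{aligned}\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T_X^{-n}B)&=\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu\otimes\mu\big((A\times X)\cap(T_X\times T_X)^{-n}(B\times X)\big)\\&=(\mu\otimes\mu)(A\times X)\,(\mu\otimes\mu)(B\times X)=\mu(A)\mu(B),\\\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\mu(A\cap T_X^{-n}B)^2&=\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}(\mu\otimes\mu)\big((A\times A)\cap(T_X\times T_X)^{-n}(B\times B)\big)\\&=(\mu\otimes\mu)(A\times A)\,(\mu\otimes\mu)(B\times B)=\mu(A)^2\mu(B)^2.\end{aligned}$$
Now we can apply (2.46) and obtain
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\big(\mu(A\cap T_X^{-n}B)-\mu(A)\mu(B)\big)^2=\mu(A)^2\mu(B)^2+\mu(A)^2\mu(B)^2-2\mu(A)\mu(B)\,\mu(A)\mu(B)=0.$$
∎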

Corollary 2.47. If $(X,\mathcal B_X,T_X,\mu)$ and $(Y,\mathcal B_Y,T_Y,\nu)$ are both weak mixing, then $(X\times Y,\mathcal B_X\otimes\mathcal B_Y,T_X\times T_Y,\mu\otimes\nu)$ is weak mixing.

Proof. Take an arbitrary ergodic system $(Z,\mathcal B_Z,T_Z,\kappa)$. Since $(Y,\mathcal B_Y,T_Y,\nu)$ is weak mixing, the equivalence established in Theorem 2.41 shows that $(Y\times Z,\mathcal B_Y\otimes\mathcal B_Z,T_Y\times T_Z,\nu\otimes\kappa)$ is ergodic. Applying Theorem 2.41 again, now using that $(X,\mathcal B_X,T_X,\mu)$ is weak mixing, we see that $(X\times Y\times Z,\mathcal B_X\otimes\mathcal B_Y\otimes\mathcal B_Z,T_X\times T_Y\times T_Z,\mu\otimes\nu\otimes\kappa)$ is ergodic. Since $(Z,\mathcal B_Z,T_Z,\kappa)$ was arbitrary, condition (2) of Theorem 2.41 holds for $X\times Y$, so $(X\times Y,\mathcal B_X\otimes\mathcal B_Y,T_X\times T_Y,\mu\otimes\nu)$ is weak mixing. ∎

2.2. Compact systems. We have already seen some compact systems, but here we give the formal definition of what we mean by a compact system.

Definition 2.48. A system $(X,\mathcal B,T,\mu)$ is compact if, for every $f\in L^2(X)$, the orbit $\mathrm{Orb}(f)=\{U_T^nf:n\in\mathbb N\}$ is precompact in $L^2(X)$.

Remark 2.49. Functions with precompact orbits are sometimes called almost periodic functions. Another way of saying that a dynamical system is compact is that every $L^2$ function on it is almost periodic. We now give a different characterization of compactness, which we will use to prove that compact systems are SZ.

Theorem 2.50. A system $(X,\mathcal B,T,\mu)$ is compact if and only if for any $f\in L^2(X)$ and $\varepsilon>0$ the set $\{k\in\mathbb N:\|f-U_T^kf\|_2<\varepsilon\}$ has bounded gaps.

Remark 2.51. Sets of integers with bounded gaps are called syndetic sets. Below is the formal definition, but it is much easier to think of syndetic sets simply as sets with bounded gaps.

Definition 2.52. A set $A\subseteq\mathbb N$ is syndetic if there exists a finite set of integers $\{a_i\}_{1\le i\le k}$ such that $\mathbb N\subseteq\bigcup_{i=1}^{k}(A-a_i)$.

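Proof of Theorem 2.50 (“if” direction). Here we assume that $T$ is invertible, so that $U_T$ is unitary. Given an arbitrary $L^2$ function $f$ and $\varepsilon>0$, let $A_\varepsilon=\{k\in\mathbb N:\|f-U_T^kf\|_2<\varepsilon\}$. It suffices to show that $\mathrm{Orb}(f)$ is totally bounded. Since $A_\varepsilon$ is syndetic, there is a finite set of integers $\{a_i\}_{1\le i\le k}$ such that $\mathbb N\subseteq\bigcup_{i=1}^{k}(A_\varepsilon-a_i)$. Thus every $n\in\mathbb N$ can be written as $n=m-a_i$ with $m\in A_\varepsilon$, and then
$$\|U_T^nf-U_T^{-a_i}f\|_2=\|U_T^{-a_i}(U_T^mf-f)\|_2=\|U_T^mf-f\|_2<\varepsilon.$$
This means that $\mathrm{Orb}(f)\subseteq\bigcup_{i=1}^{k}B(U_T^{-a_i}f,\varepsilon)$.⁸ Since $\varepsilon$ is arbitrary, $\mathrm{Orb}(f)$ is totally bounded. ∎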

⁸ $B(x,r)$ denotes the open ball of radius $r$ centered at $x$.


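Example 2.53. It is helpful to think of rotations when dealing with compact systems. Recall the rotation system $(\mathbb T,R_\alpha,\mu)$ on the circle, where $\alpha\notin\mathbb Q$. Given an $L^2$ function $f$, its orbit is $\mathrm{Orb}(f)=\{f(\cdot+n\alpha)\}_{n\in\mathbb N}$. It may not be obvious at first glance that $\mathrm{Orb}(f)$ is precompact. However, the conclusion becomes clear if we use the characterization of compactness given in Theorem 2.50. Since translation is continuous in $L^2$, there is $\delta>0$ such that $\|f-f(\cdot+t)\|_2<\varepsilon$ whenever $t$ is within $\delta$ of $0$ modulo 1. Because the orbit $\{n\alpha\}$ is dense in the circle under an irrational rotation, finitely many points $n_1\alpha,\dots,n_L\alpha$ come within $\delta$ of every point of the circle. Hence for every $n$ some $n+n_j$ has $(n+n_j)\alpha$ within $\delta$ of $0$, so the set $\{k:\|f-U_{R_\alpha}^kf\|_2<\varepsilon\}$ has gaps of length at most $\max_jn_j$.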
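The bounded-gap behavior used in Example 2.53 can be observed numerically. The sketch below (names are our own; $\alpha=\sqrt2-1$ and $\delta=0.01$ are arbitrary choices) collects the times $k$ at which $k\alpha$ lies within $\delta$ of an integer; for such $k$, $\|f-U_{R_\alpha}^kf\|_2<\varepsilon$ by the choice of $\delta$ in the example. A finite computation cannot prove syndeticity; it only shows the gaps staying small on the window examined.

```python
import math


def return_times(alpha: float, delta: float, n_max: int):
    """All 0 <= k < n_max such that k*alpha is within delta of an integer."""
    times = []
    for k in range(n_max):
        frac = (k * alpha) % 1.0
        if min(frac, 1.0 - frac) < delta:
            times.append(k)
    return times


def max_gap(times):
    """Largest difference between consecutive elements of a sorted list."""
    return max(b - a for a, b in zip(times, times[1:]))


alpha = math.sqrt(2) - 1
times = return_times(alpha, 0.01, 20_000)
```

On this window the gaps never exceed 99, even though only about 2% of the times $k$ qualify.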

Remark 2.54. Rotations need not be on the circle. In fact, we can define a rotation on any compact abelian group.

Definition 2.55. A Kronecker system is a system $(G,R_\alpha)$, where $G$ is a compact abelian group equipped with its Haar measure and $R_\alpha$ acts on $G$ by translation by a fixed element $\alpha\in G$, i.e. $R_\alpha(g)=g\alpha$.

We conclude this introduction to basic dynamical systems by stating two useful results about compact systems and weak mixing systems [1].

Theorem 2.56. If a compact system is ergodic, then it is isomorphic to a Kronecker system.

Remark 2.57. We have now seen the two fundamental structures of dynamical systems, namely weak mixing systems and compact systems. Interestingly, these two structures are “opposite” to each other in the following sense.

Theorem 2.58. A dynamical system is weak mixing if and only if it has no non-trivial compact factors.

Remark 2.59. In other words, a measure-preserving system is either weak mixing or has at least one nontrivial compact factor.

2.3. Factor and extension. We have already encountered a special kind of extension, namely an isomorphism, in Example 2.14. Here, we generalize the idea of an isomorphism and introduce the notion of an extension.

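Definition 2.60. Given two measure-preserving systems $\mathbf X=(X,\mathcal B_X,\mu,T_X)$ and $\mathbf Y=(Y,\mathcal B_Y,\nu,T_Y)$, we say that $\mathbf Y$ is a factor of $\mathbf X$ if there is a measure-preserving map $\varphi:X\to Y$, defined almost everywhere and called a factor map, such that the diagram below commutes almost everywhere:
$$\begin{array}{ccc}X&\xrightarrow{\;T_X\;}&X\\{\scriptstyle\varphi}\big\downarrow&&\big\downarrow{\scriptstyle\varphi}\\Y&\xrightarrow{\;T_Y\;}&Y\end{array}$$
In this case we also say that $\mathbf X$ is an extension of $\mathbf Y$.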

Remark 2.61. A factor map is weaker than an isomorphism in the sense that we do not require the factor map to be invertible.

Example 2.62. Every measure-preserving system has a trivial factor, namely the factor consisting of a single point. Factors may also be created by taking sub σ-algebras. Given a measure-preserving system $(X,\mathcal B,\mu,T)$ and a proper sub σ-algebra $\mathcal B'\subsetneq\mathcal B$ with $T^{-1}\mathcal B'\subseteq\mathcal B'$, we can view the system $(X,\mathcal B',\mu,T)$ as a factor of $(X,\mathcal B,\mu,T)$, where the factor map is the identity map $\mathrm{id}_X$. Although $\mathrm{id}_X^{-1}$ is defined everywhere, it is not measurable as a map from $(X,\mathcal B')$ to $(X,\mathcal B)$, since $\mathcal B'$ is strictly smaller than $\mathcal B$; hence $\mathrm{id}_X$ is not an isomorphism. We will see a generalization of this example later in Example 2.71, where we discuss conditional measures.

2.4. Conditional measures. The concept of a conditional measure is useful when we are working with more than one dynamical system at the same time, e.g. with relatively weak mixing extensions. It allows us to decompose a measure relative to a given sub σ-algebra. In our setting, the smaller σ-algebra is usually generated by the fibers of the measure-preserving map we are working with, i.e. it is the pullback of the σ-algebra of the factor, as we will see in Example 2.71. Before turning to conditional measures, we first look at conditional expectations.

Definition 2.63. Given a probability space $(X,\mathcal B,\mu)$ and an integrable function $f$, the expectation of $f$ is $\mathbb E(f)=\int_Xf\,d\mu$.⁹ If $\mathcal C\subseteq\mathcal B$ is a sub σ-algebra, the conditional expectation of $f$ with respect to $\mathcal C$ is the unique (up to null sets) element $E(f|\mathcal C)\in L^1(X,\mathcal C,\mu)$ such that:
(1) $E(f|\mathcal C)$ is a measurable function on $(X,\mathcal C,\mu)$;
(2) for each $C\in\mathcal C$, $\int_CE(f|\mathcal C)\,d\mu=\int_Cf\,d\mu$.

Remark 2.64. The existence and uniqueness of conditional expectations are consequences of the Radon-Nikodym theorem. We will use the notions in this section only as black boxes, since they are not the focus of this paper.

Example 2.65. Let $(X,\mathcal B,\mu)$ be a probability space and $\{P_i\}_{1\le i\le n}$ a partition of $X$.¹⁰ Let $\mathcal C$ be the finite sub σ-algebra generated by this partition. The conditional expectation of an integrable function $f$ is given by
$$E(f|\mathcal C)(x)=\frac{\int_{P_i}f\,d\mu}{\mu(P_i)}\quad\text{if }x\in P_i.$$
This is clearly measurable with respect to $\mathcal C$. Moreover, $\int_CE(f|\mathcal C)\,d\mu=\int_Cf\,d\mu$ for every $C\in\mathcal C$: this is immediate for each $C\in\{P_i\}_{1\le i\le n}$, and every set in $\mathcal C$ is a finite union of elements of the partition. Note that the conditional expectation is only defined almost everywhere, so it does not matter if some element of the partition has measure zero; on such an element the formula is undefined and any value may be used.
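Example 2.65 translates directly into code on a finite probability space. In the sketch below (hypothetical names; points are integers and measures are exact `Fraction` weights), `cond_exp` averages $f$ over each cell of the partition, and `integral_over` lets us check property (2) of Definition 2.63 on unions of cells.

```python
from fractions import Fraction
from typing import Dict, List, Set


def cond_exp(f: Dict[int, Fraction], mu: Dict[int, Fraction],
             partition: List[Set[int]]) -> Dict[int, Fraction]:
    """E(f | C) for C generated by a finite partition of a finite probability space.

    On a cell P with mu(P) > 0 the value is (integral of f over P) / mu(P).
    On a null cell we use 0; any value would do, since E(f | C) is only defined a.e.
    """
    out: Dict[int, Fraction] = {}
    for cell in partition:
        mass = sum((mu[x] for x in cell), Fraction(0))
        avg = sum((f[x] * mu[x] for x in cell), Fraction(0)) / mass if mass else Fraction(0)
        for x in cell:
            out[x] = avg
    return out


def integral_over(g: Dict[int, Fraction], mu: Dict[int, Fraction], C: Set[int]) -> Fraction:
    """The integral of g over the set C with respect to mu."""
    return sum((g[x] * mu[x] for x in C), Fraction(0))
```

For instance, if the cell $\{0,1\}$ has weights $\tfrac1{10},\tfrac2{10}$ and $f$ takes the values $1,4$ there, then $E(f|\mathcal C)=\tfrac{0.1+0.8}{0.3}=3$ on that cell.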

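Theorem 2.66. Given a probability space $(X,\mathcal B,\mu)$, sub σ-algebras $\mathcal D\subseteq\mathcal C\subseteq\mathcal B$, and functions $f\in L^\infty(X,\mathcal C,\mu)$, $g\in L^\infty(X,\mathcal B,\mu)$, the following are true:
(1) $E(fg|\mathcal C)=fE(g|\mathcal C)$;
(2) $E(E(g|\mathcal C)|\mathcal D)=E(g|\mathcal D)$.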

Remark 2.67. The first statement in Theorem 2.66 says that a bounded function which is measurable with respect to the smaller sub σ-algebra $\mathcal C$ can be treated as a “constant” and pulled out of the conditional expectation. One can also think of a σ-algebra as a collection of information: the conditional expectation with respect to a given sub σ-algebra is the expectation of a quantity given the information at hand. If we have no information at all, i.e. the sub σ-algebra is trivial, then the best guess we can give is the usual expectation.

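Example 2.68. When $\mathcal{D}$ is the trivial σ-algebra, the conditional expectation with respect to $\mathcal{D}$ is the usual expectation, so statement (2) of Theorem 2.66 gives $\mathbb{E}(E(f\mid\mathcal{C})) = \mathbb{E}(f)$, which is exactly property (2) of Definition 2.63 with $C = X$.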
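Both statements of Theorem 2.66, and the special case in Example 2.68, can be checked by hand on nested finite partitions. The sketch below does this under the assumption that $\mathcal{C}$ and $\mathcal{D}$ are generated by finite partitions with every $\mathcal{D}$-block a union of $\mathcal{C}$-blocks; the space, the weights, the functions and the helper name `cond_exp` are hypothetical.

```python
from fractions import Fraction


def cond_exp(g, mu, partition):
    """E(g | sigma-algebra generated by a finite partition), as a dict point -> value."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        # on a null block any value is allowed; we use 0
        value = sum(g[x] * mu[x] for x in block) / mass if mass else Fraction(0)
        for x in block:
            out[x] = value
    return out


# Hypothetical example data.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
C = [[0, 1], [2, 3], [4, 5]]          # finer partition
D = [[0, 1, 2, 3], [4, 5]]            # coarser: each D-block is a union of C-blocks
g = {x: Fraction(x * x) for x in X}
f = {0: 2, 1: 2, 2: -1, 3: -1, 4: 7, 5: 7}   # constant on C-blocks, hence C-measurable

# Statement (2): E(E(g|C)|D) = E(g|D)
tower_ok = cond_exp(cond_exp(g, mu, C), mu, D) == cond_exp(g, mu, D)

# Statement (1): E(fg|C) = f E(g|C) for C-measurable f
E_g_C = cond_exp(g, mu, C)
pull_out_ok = cond_exp({x: f[x] * g[x] for x in X}, mu, C) == {x: f[x] * E_g_C[x] for x in X}

# Example 2.68: with the trivial sigma-algebra we recover the usual expectation
E_trivial = cond_exp(g, mu, [list(X)])
print(tower_ok, pull_out_ok, E_trivial[0])
```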

⁹We use $\mathbb{E}$ and $E$ to distinguish the usual expectation from the conditional expectation.
¹⁰A collection of subsets of a given space is called a partition if the elements of the collection are mutually disjoint and their union is the entire space.


Definition 2.69. Let $(X,\mathcal{B},\mu)$ be a probability space and $\mathcal{C} \subset \mathcal{B}$ a sub σ-algebra. Then for almost every $x \in X$ we can define a probability measure $\mu_x$ on $X$; we call the family $\{\mu_x\}_{x\in X}$ the conditional measures.¹¹

The conditional measures satisfy the following properties:
(1) $E(f\mid\mathcal{C})(x) = \int_X f(t)\,d\mu_x(t)$ for almost every $x$,
(2) the map $x \mapsto \mu_x$ is measurable with respect to $\mathcal{C}$,¹²
(3) when $\mathcal{C}$ is countably generated, $\mu_x = \mu_y$ if and only if $x$ and $y$ are in the same atom of $\mathcal{C}$.¹³

Remark 2.70. Notice that some regularity condition on the sub σ-algebra is required to make sense of statement (3) in Definition 2.69. Countable generation of the sub σ-algebra is a fair assumption to make, e.g. in Example 2.71, but conditional measures can still be defined without it. However, (3) would not work if the sub σ-algebra were not countably generated, simply because the intersection of uncountably many measurable sets, such as the one defining an atom, is not necessarily measurable.

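Example 2.71. As mentioned at the start of this subsection, we are going to use conditional measures when we need to work with extensions. Suppose $(X,\mathcal{B}_X,\mu,T_x) \xrightarrow{\ \varphi\ } (Y,\mathcal{B}_Y,\nu,T_y)$ is an extension between two dynamical systems. Then $\mathcal{C} = \varphi^{-1}\mathcal{B}_Y$ is a sub σ-algebra of $\mathcal{B}_X$. Using $\mathcal{C}$, we can define a family of conditional measures $\{\mu_x\}_{x\in X}$ for almost every $x$ and a $\mathcal{C}$-measurable map $\delta_x\colon x \mapsto \mu_x$. Since we are working with $\varphi^{-1}\mathcal{B}_Y$, we have $\mu_x = \mu_{x'}$ whenever $x$ and $x'$ are in the same fiber of $\varphi$. Hence we can also think of the conditional measures as $\{\mu_y\}_{y\in Y}$, with a canonical measurable map $\delta_y\colon y \mapsto \mu_y$ given by the diagram below.
$$\begin{array}{ccc} X & \xrightarrow{\ \delta_x\ } & \{\mu_x\}_{x\in X} \\ \Big\downarrow{\scriptstyle\varphi} & & \Big\downarrow{\scriptstyle\varphi'} \\ Y & \xrightarrow{\ \delta_y\ } & \{\mu_y\}_{y\in Y} \end{array}$$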

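Since the spaces $L^2(X,\mathcal{C},\mu)$ and $L^2(Y,\mathcal{B}_Y,\nu)$ are isomorphic (via $h \mapsto h\circ\varphi$), we can introduce the following notation used by Furstenberg [1]. Given a function $f \in L^2(X,\mathcal{B},\mu)$, we define the “conditional” expectation $E(f\mid\mathcal{Y})(y) = \int f\,d\mu_y$. Recall that in order for $\varphi$ to be a factor map, we need the following diagram to commute almost everywhere, i.e. $\varphi\circ T_x = T_y\circ\varphi$ a.e.:
$$\begin{array}{ccc} X & \xrightarrow{\ T_x\ } & X \\ \Big\downarrow{\scriptstyle\varphi} & & \Big\downarrow{\scriptstyle\varphi} \\ Y & \xrightarrow{\ T_y\ } & Y \end{array}$$
Therefore, we have the following identities; the first is deduced from Theorem 2.66, and the second from the commutativity of the diagram together with the invariance of $\mu$. Given a bounded $\mathcal{C}$-measurable function $g$ (viewed as a function on $Y$),
$$E(gf\mid\mathcal{Y}) = g\,E(f\mid\mathcal{Y}), \qquad U_{T_y}E(f\mid\mathcal{Y}) = E(U_{T_x}f\mid\mathcal{Y}).$$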

¹¹Note that the family of measures depends on the choice of $\mathcal{C}$.
¹²We can think of the space of measures on $X$ as linear functionals on $L^\infty(X)$.
¹³Given a measure space $(X,\mathcal{B},\mu)$ where $\mathcal{B}$ is countably generated, the atom containing the point $x$ is defined to be the intersection of all the measurable sets containing $x$.


2.5. Weak mixing and compactness for extensions. We can generalize the compactness and weak mixing properties of dynamical systems to relative compactness and relative weak mixing with the help of extensions and conditional measures. All the definitions below assume that $X = (X,\mathcal{B}_X,\mu,T_x)$ is an extension of $Y = (Y,\mathcal{B}_Y,\nu,T_y)$.

Definition 2.72. A function $f \in L^2(X,\mu)$ is almost periodic with respect to $Y$ if for every $\varepsilon > 0$ there exists a finite collection of functions $\{g_i\}_{1\le i\le k} \subset L^2(X,\mu)$ such that
$$\min_{1\le i\le k}\|U_T^n f - g_i\|_{L^2(\mu_y)} < \varepsilon$$
for all $n \ge 1$ and almost every $y \in Y$.

Remark 2.73. Note that the definition of relative almost periodicity involves the conditional measures $\mu_y$ constructed in Example 2.71. They are probability measures on $X$ rather than on $Y$, but there is a measurable map $y \mapsto \mu_y$.

Definition 2.74. The extension X → Y is compact if the set of functions almost periodic with respect to Y is dense in $L^2(X,\mu)$.

In order to generalize weak mixing, we first need to introduce the notion of relatively independent joining. There are several different constructions of the relatively independent joining; here we use the formulation of Einsiedler and Ward [2].

Definition 2.75. Given two measure-preserving systems $(X,\mathcal{B}_X,\mu,T_X)$ and $(Y,\mathcal{B}_Y,\nu,T_Y)$, a joining is a $T_X\times T_Y$-invariant measure $\delta$ on $(X\times Y,\mathcal{B}_X\otimes\mathcal{B}_Y)$ whose projections onto the $X$ and $Y$ coordinates are $\mu$ and $\nu$ respectively, i.e.
(1) $\delta(A_X\times Y) = \mu(A_X)$ for all $A_X \in \mathcal{B}_X$,
(2) $\delta(X\times A_Y) = \nu(A_Y)$ for all $A_Y \in \mathcal{B}_Y$.

Example 2.76. The product measure µ× ν is always a joining by construction.

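Definition 2.77. Given two invertible measure-preserving systems $(X,\mathcal{B}_X,\mu,T_X)$ and $(Y,\mathcal{B}_Y,\nu,T_Y)$ that share a common nontrivial factor $(Z,\mathcal{B}_Z,\delta,T_Z)$ via factor maps $\varphi_X$ and $\varphi_Y$, we define the relatively independent joining $\mu\times_\delta\nu$ (or $X\times_Z Y$) in the following way. Let $\mathcal{B}'_X = \varphi_X^{-1}\mathcal{B}_Z$ and $\mathcal{B}'_Y = \varphi_Y^{-1}\mathcal{B}_Z$. We can define conditional measures $\{\mu_x\}$ on $X$ and $\{\nu_y\}$ on $Y$ using the sub σ-algebras $\mathcal{B}'_X$ and $\mathcal{B}'_Y$. As in Example 2.71, they only depend on the image in $Z$, so we may set $\mu'_{\varphi_X(x)} = \mu_x$ and $\nu'_{\varphi_Y(y)} = \nu_y$, and define
$$\mu\times_\delta\nu = \int_Z \mu'_z\times\nu'_z\,d\delta(z).$$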
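On finite spaces the integral in Definition 2.77 becomes a finite sum, and the joining gives mass $\mu(x)\nu(y)/\delta(z)$ to a pair $(x,y)$ with $\varphi_X(x) = \varphi_Y(y) = z$. The sketch below, with hypothetical helper names and a hypothetical example, builds this measure and checks the two marginal conditions of Definition 2.75; it ignores the transformations, so invariance is not tested. With a one-point factor $Z$ it returns the product measure $\mu\times\nu$ of Example 2.76.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(measure, factor_map):
    """Image of `measure` under `factor_map` (both given as dicts)."""
    image = defaultdict(Fraction)
    for point, weight in measure.items():
        image[factor_map[point]] += weight
    return dict(image)


def relatively_independent_joining(mu, nu, phi_X, phi_Y):
    """Finite version of mu x_delta nu = integral over z of mu'_z x nu'_z d delta(z)."""
    delta = pushforward(mu, phi_X)
    if delta != pushforward(nu, phi_Y):
        raise ValueError("mu and nu must have the same image measure on Z")
    joining = {}
    for x, mx in mu.items():
        for y, ny in nu.items():
            z = phi_X[x]
            if z == phi_Y[y] and delta[z] > 0:
                # conditional measures: mu'_z = mu( . | phi_X = z), nu'_z = nu( . | phi_Y = z)
                joining[(x, y)] = (mx / delta[z]) * (ny / delta[z]) * delta[z]
    return joining


def marginals(joining):
    left, right = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), w in joining.items():
        left[x] += w
        right[y] += w
    return dict(left), dict(right)


# Hypothetical example: both spaces map onto Z = {"a", "b"} with delta = (1/2, 1/2).
mu = {"x1": Fraction(1, 4), "x2": Fraction(1, 4), "x3": Fraction(1, 2)}
nu = {"y1": Fraction(1, 2), "y2": Fraction(1, 4), "y3": Fraction(1, 4)}
phi_X = {"x1": "a", "x2": "a", "x3": "b"}
phi_Y = {"y1": "a", "y2": "b", "y3": "b"}
joining = relatively_independent_joining(mu, nu, phi_X, phi_Y)
print(joining)
```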

Definition 2.78. The extension X → Y is weak mixing relative to Y if the system $(X\times X,\mu\times_Y\mu,T\times T)$ is ergodic.

Remark 2.79. Recall that in Theorem 2.41 we have shown that a dynamical system $(Z,\mathcal{C},\delta,S)$ is weak mixing if and only if the product system $(Z\times Z,\mathcal{C}\otimes\mathcal{C},\delta\times\delta,S\times S)$ is ergodic. Now back to our definition of relative weak mixing. Note that the relatively independent joining $\mu\times_Y\mu$ is exactly $\mu\times\mu$ if Y is the trivial factor of X. In other words, X is relatively weak mixing with respect to its trivial factor if and only if the system $(X\times X,\mathcal{B}_X\otimes\mathcal{B}_X,\mu\times\mu,T_X\times T_X)$ is ergodic, i.e. X is weak mixing in the usual sense.


2.6. The structure theorem. The theorem that we are going to introduce is usually referred to as the Furstenberg-Zimmer structure theorem. It is a general result in ergodic theory and has nothing to do with Szemeredi’s theorem at first glance. Instead of proving the structure theorem, we will only state the simplified version of it used by Furstenberg in his proof of Szemeredi’s theorem [1].

Theorem 2.80. Let $(X,\mathcal{B}_X,\mu_X,T_X)$ be a measure-preserving system with a proper factor $(Y,\mathcal{B}_Y,\mu_Y,T_Y)$. Then one of the following is true:
(1) The extension X → Y is relatively weak mixing.
(2) There exists some intermediate factor $(Z,\mathcal{B}_Z,\mu_Z,T_Z)$, i.e. X → Z → Y, such that the extension Z → Y is compact and proper (i.e. Y and Z are not isomorphic).

Remark 2.81. The structure theorem can be viewed as a way of decomposing an arbitrary dynamical system: the decomposition results in a tower of extensions in which each extension is either relatively weak mixing or compact, as the picture below shows.

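$$X \xrightarrow{\ \varphi_\infty\ } \cdots \xrightarrow{\ \varphi_2\ } X_2 \xrightarrow{\ \varphi_1\ } X_1 \xrightarrow{\ \varphi_0\ } X_0$$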

Each $X_i$ represents a factor and each extension $\varphi_i$ is either relatively weak mixing or compact. The structure theorem becomes very powerful with the help of transfinite induction. If we need to prove some property for a general dynamical system, we only need to check that the property lifts through both relatively weak mixing and compact extensions, and that there exists a maximal factor with the property. This is exactly how we are going to prove Szemeredi’s theorem.

3. Furstenberg’s proof of Szemeredi’s theorem

In this section, we illustrate Furstenberg’s proof of Szemeredi’s theorem. The proof consists of three parts: showing that multiple recurrence of every measure-preserving system implies Szemeredi’s theorem, proving the multiple recurrence property for two basic classes of dynamical systems, and finally using the structure theorem from Section 2.6, together with the extension principles that we are going to establish, to prove Szemeredi’s theorem.

3.1. General Strategy. In order to prove Szemeredi’s theorem using ergodic theory, the first step is to establish a correspondence between a sequence of integers and a measure-preserving system. This is done by the correspondence principle in Theorem 3.9, which reduces our problem to proving that every measure-preserving system has the property SZ defined in Definition 3.12. As we have seen in Theorem 2.80, an arbitrary dynamical system can be decomposed into a tower of factors, where each extension is either relatively weak mixing or compact.

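$$X \xrightarrow{\ \varphi_\infty\ } \cdots \xrightarrow{\ \varphi_2\ } X_2 \xrightarrow{\ \varphi_1\ } X_1 \xrightarrow{\ \varphi_0\ } X_0$$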

We then show that the SZ property passes through both relatively weak mixing extensions and compact extensions, by the extension principles stated in Theorem 3.27 and Theorem 3.29. Finally, using Zorn’s lemma and a lemma of Furstenberg, we will be able to prove Szemeredi’s theorem.


3.2. Szemeredi’s theorem. Before starting to prove the theorem, we introduce some conventions and background.

Definition 3.1. Given a set of integers $A \subset \mathbb{Z}$, the upper Banach density of $A$ is defined to be
$$\limsup_{N\to\infty}\frac{\mu(A\cap[-N,N])}{2N+1},$$
where $\mu$ is the usual counting measure on the integers.

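Remark 3.2. This notion of upper Banach density might seem weird at first glance. There is also a notion of natural density, which replaces the lim sup with a limit. Obviously there exist sets that do not possess a natural density, but the upper Banach density is defined for all sets. One can define the upper Banach density in a different and more conventional way, e.g.
$$\limsup_{M-N\to\infty}\frac{\mu(A\cap[N,M])}{M-N}.$$
Strictly speaking, the quantity in Definition 3.1 is usually called the upper density, and only this second quantity is the upper Banach density. The second quantity is always at least the first, so a set with positive density in the sense of Definition 3.1 also has positive upper Banach density in the second sense; the converse fails, e.g. for $A = \bigcup_{k\ge1}[4^k, 4^k+k]$, which has density $0$ in the first sense but $1$ in the second. We do not care about the exact numerical value of the density, only about its positivity. Throughout this paper we use the first definition; the argument of Theorem 3.9 also goes through for the second one if we average over intervals $[N_j, M_j]$ instead of $[-N_j, N_j]$.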
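The difference between the two density notions in Remark 3.2 shows up on a set made of sparse but increasingly long blocks, such as the hypothetical example $A = \bigcup_{k\ge1}[4^k, 4^k+k]$. The sketch below (helper names are hypothetical, and only a finite truncation of $A$ is built) compares the two ratios: the symmetric ratio of Definition 3.1 shrinks as $N$ grows, while the ratio over the window $[4^k, 4^k+k]$ stays above 1.

```python
def build_set(K):
    """Hypothetical example: the union of the integer intervals [4^k, 4^k + k], 1 <= k <= K."""
    A = set()
    for k in range(1, K + 1):
        A.update(range(4 ** k, 4 ** k + k + 1))
    return A


def symmetric_ratio(A, N):
    """|A ∩ [-N, N]| / (2N + 1), the quantity inside the lim sup of Definition 3.1."""
    return sum(1 for a in A if -N <= a <= N) / (2 * N + 1)


def window_ratio(A, N, M):
    """|A ∩ [N, M]| / (M - N), the quantity inside the lim sup of the alternative definition."""
    return sum(1 for n in range(N, M + 1) if n in A) / (M - N)


A = build_set(8)
for k in range(4, 9):
    # the symmetric ratio keeps shrinking, while the window [4^k, 4^k + k] lies entirely in A
    print(k, symmetric_ratio(A, 4 ** k + k), window_ratio(A, 4 ** k, 4 ** k + k))
```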

Example 3.3. The set of odd numbers has natural density $\frac12$, and hence the same upper density.

Definition 3.4. Given a set of integers $A \subset \mathbb{Z}$, we say that it contains a $k$-term arithmetic progression if there exist integers $n, d \in \mathbb{Z}$ with $d \ne 0$ such that $n + id \in A$ for $i \in [0, k-1]$. Equivalently, we say that the set $A$ contains a $k$-AP.
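Definition 3.4 is easy to test by brute force on a finite set. The sketch below (the function name `find_k_ap` and the example sets are hypothetical) searches for a starting point $n$ and a common difference $d > 0$; restricting to $d > 0$ loses nothing, because a progression with $d < 0$ read backwards has difference $-d > 0$.

```python
def find_k_ap(S, k):
    """Return (n, d) with d > 0 and n + i*d in S for 0 <= i < k, or None if there is none."""
    if not S:
        return None
    if k <= 1:
        return (min(S), 1)
    elems = sorted(S)
    top = elems[-1]
    for n in elems:
        # n + (k - 1) * d must not exceed the largest element
        for d in range(1, (top - n) // (k - 1) + 1):
            if all(n + i * d in S for i in range(k)):
                return (n, d)
    return None


odds = {n for n in range(-20, 21) if n % 2 != 0}
print(find_k_ap(odds, 5))
print(find_k_ap({1, 2, 4, 8, 16}, 3))
```

For example, the second call finds nothing, since powers of 2 contain no 3-term progression.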

Theorem 3.5 (Szemeredi). Any set of integers $A \subset \mathbb{Z}$ that has positive upper density contains a $k$-AP for every $k \in \mathbb{N}$.

3.3. Correspondence. In this part, we introduce the notion of multiple recurrence. In fact, multiple recurrence of every order for every measure-preserving system is equivalent to Szemeredi’s theorem. However, for the sake of our argument, it suffices to show that multiple recurrence of all orders implies Szemeredi’s theorem. Once we have established this correspondence, we can use the ergodic-theoretic tools introduced in Section 2 to prove Szemeredi’s theorem.

To get started, we first recall the definition of multiple recurrence.

Definition 3.6. A measure-preserving system $(X,\mathcal{B},T,\mu)$ has multiple recurrence of order $k$ if for each $A \in \mathcal{B}$ that has positive measure, there exists $n \in \mathbb{N}$ such that
$$\mu\Big(\bigcap_{i=0}^{k-1}T^{-in}A\Big) > 0.$$

In order to prove the correspondence principle, we need a classical result from functional analysis.

Theorem 3.7 (Banach-Alaoglu). Given a separable normed linear space X, the closed unit ball in the dual space X* is sequentially compact in the weak* topology.


Remark 3.8. In Euclidean spaces, closed unit balls are closed and bounded, and therefore compact. However, a theorem of Riesz states that the closed unit ball of an infinite-dimensional normed space is not compact in the norm topology. Compactness of the unit ball of a dual space is regained in the weak* topology by the Banach-Alaoglu theorem. This theorem is extremely useful here since we work with the infinite-dimensional spaces $C(Y)$, where $Y$ is a compact metric space; such a space $C(Y)$ is separable, and by the Riesz representation theorem its dual is the space of finite signed Borel measures on $Y$.

Theorem 3.9. Multiple recurrence of every order for all measure-preserving systems implies Szemeredi’s theorem.

Proof. Given an integer sequence $\{a_n\} \subset \mathbb{Z}$ with positive upper Banach density and a fixed natural number $k \in \mathbb{N}$, we show that $\{a_n\}$ contains a $k$-AP. Recall the dynamical system $(2^{\mathbb{Z}}, T)$, where $T$ is the left Bernoulli shift. The sequence $\{a_n\}$ naturally corresponds to a point $x$ in the space $2^{\mathbb{Z}}$, where¹⁴

$$x[j] = \begin{cases} 1 & \text{if } j \in \{a_n\},\\ 0 & \text{otherwise.}\end{cases} \tag{3.10}$$

Let $A = \overline{\{T^n x\}_{n\in\mathbb{Z}}}$ be the closure of the two-sided¹⁵ orbit of $x$. The subspace $A$ is still compact since it is closed and $2^{\mathbb{Z}}$ is compact. Let $B \subset A$ be the set of elements whose $0$th coordinate is $1$. The usual measure defined in Section 2 is of no use here: $A$ may well be a null set for it, and in any case it carries no information about the density of $\{a_n\}$. Therefore, we need to construct a measure $\mu$ on $A$ with $\mu(B) > 0$ in order to use the multiple recurrence property. The construction we give here is a standard one; moreover, it relates the measure to the upper Banach density of the corresponding sequence. For any point $y \in A$, we define $\delta_y$ to be the delta measure supported at $y$. Since $(T^n x)[0] = x[n]$, we observe that
$$\frac{1}{2N+1}\sum_{n=-N}^{N}\delta_{T^n x}(B) = \frac{|\{a_j\}\cap[-N,N]|}{2N+1}.$$

We denote the average of these delta measures by
$$\nu_N = \frac{1}{2N+1}\sum_{n=-N}^{N}\delta_{T^n x}.$$

¹⁴Here $x[j]$ denotes the $j$th coordinate of $x$.
¹⁵Usually when we refer to orbits, we assume that the number of iterations is always positive, i.e. we consider sets of the form $\{T^n x\}_{n\in\mathbb{N}}$.


Since the sequence $\{a_n\}$ has positive upper Banach density, we can choose a sequence of natural numbers $\{N_j\}$ such that
$$\lim_{j\to\infty}\nu_{N_j}(B) = \lim_{j\to\infty}\frac{1}{2N_j+1}\sum_{n=-N_j}^{N_j}\delta_{T^n x}(B) = \lim_{j\to\infty}\frac{|\{a_n\}\cap[-N_j,N_j]|}{2N_j+1} = \limsup_{N\to\infty}\frac{|\{a_n\}\cap[-N,N]|}{2N+1} > 0.$$

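By construction, every $\nu_{N_j}$ is a probability measure on $A$, so $\{\nu_{N_j}\}$ lies in the closed unit ball of $C(A)^*$, and $C(A)$ is separable since $A$ is a compact metric space. Therefore, by Theorem 3.7, there is a subsequence $\{N_{j_l}\}_{l\in\mathbb{N}}$ such that $\{\nu_{N_{j_l}}\}_{l\in\mathbb{N}}$ converges to some probability measure $\mu$ in the weak* topology. Moreover, $\mu(B) > 0$: the set $B$ is both closed and open in $A$, so its indicator function is continuous and $\mu(B) = \lim_{l\to\infty}\nu_{N_{j_l}}(B) > 0$. We also need to verify that $\mu$ makes the system measure-preserving. Indeed, writing $U_T\nu$ for the push-forward of $\nu$ under $T$, so that $U_T\delta_{T^n x} = \delta_{T^{n+1}x}$,
$$\nu_{N_{j_l}} - U_T\nu_{N_{j_l}} = \frac{1}{2N_{j_l}+1}\Big(\sum_{n=-N_{j_l}}^{N_{j_l}}\delta_{T^n x} - \sum_{n=-N_{j_l}+1}^{N_{j_l}+1}\delta_{T^n x}\Big) = \frac{1}{2N_{j_l}+1}\big(\delta_{T^{-N_{j_l}}x} - \delta_{T^{N_{j_l}+1}x}\big),$$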

which goes to 0 as $l \to \infty$ (its total variation is at most $\frac{2}{2N_{j_l}+1}$). Therefore, the limit $\mu$ is invariant under $T$. Now we are ready to use the multiple recurrence hypothesis on the dynamical system $(A, T, \mu)$ and the set $B$. By multiple recurrence of order $k$, there exists some positive integer $n$ such that $\mu\big(\bigcap_{i=0}^{k-1}T^{-in}B\big) > 0$; in particular this set is nonempty. Since $B$ is open in $A$ and $T$ is continuous, $\bigcap_{i=0}^{k-1}T^{-in}B$ is a nonempty open subset of $A$, and since the orbit $\{T^m x\}_{m\in\mathbb{Z}}$ is dense in $A$, it contains some point $y = T^m x$. Then $T^{m+in}x \in B$, i.e. $x[m+in] = 1$, for $0 \le i \le k-1$. In other words, $\{a_n\}$ contains the $k$-term arithmetic progression $\{m+in\}_{0\le i\le k-1}$. Since $k$ is arbitrary, we have Szemeredi’s theorem. □
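The measures $\nu_N$ in the proof can be evaluated directly on the sets appearing in Definition 3.6. Since $T^n x \in T^{-id}B$ exactly when $n + id \in \{a_j\}$, the value $\nu_N\big(\bigcap_{i=0}^{k-1}T^{-id}B\big)$ is the proportion of $n \in [-N, N]$ that start a $k$-term progression with difference $d$. The sketch below uses hypothetical helper names, represents the sequence as a finite Python set, and also computes the invariance defect $|\nu_N(T^{-1}B) - \nu_N(B)|$, which is at most $\frac{1}{2N+1}$.

```python
def nu_N_of_progressions(A, N, k, d):
    """nu_N of B ∩ T^{-d}B ∩ ... ∩ T^{-(k-1)d}B for the point x = indicator of A.

    T^n x lies in T^{-id}B exactly when n + i*d is in A, so this is the fraction
    of n in [-N, N] that start a k-term progression in A with difference d.
    """
    hits = sum(1 for n in range(-N, N + 1) if all(n + i * d in A for i in range(k)))
    return hits / (2 * N + 1)


def invariance_defect(A, N):
    """|nu_N(T^{-1}B) - nu_N(B)|; by the telescoping identity it is at most 1/(2N+1)."""
    shifted = sum(1 for n in range(-N, N + 1) if n + 1 in A)
    plain = sum(1 for n in range(-N, N + 1) if n in A)
    return abs(shifted - plain) / (2 * N + 1)


# Hypothetical example: the odd numbers, truncated far outside the window [-N, N].
odds = {n for n in range(-1000, 1001) if n % 2 != 0}
N = 100
print(nu_N_of_progressions(odds, N, 1, 1))   # nu_N(B), the proportion of odd n in [-N, N]
print(nu_N_of_progressions(odds, N, 3, 2))   # 3-term progressions with difference 2
print(nu_N_of_progressions(odds, N, 2, 1))   # two consecutive integers are never both odd
print(invariance_defect(odds, N))
```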

Remark 3.11. Recall that we have proved the Poincare recurrence theorem (Theorem 2.20). It is a very elementary result, yet it resembles the much harder Szemeredi’s theorem: multiple recurrence of order 2 is exactly the statement that $\mu(A\cap T^{-n}A) > 0$ for some $n \ge 1$, which Poincare recurrence provides for every measure-preserving system. For weak mixing systems, multiple recurrence of order 2 can also be read off directly from the definition, and weak mixing systems seem easier to deal with. Therefore, we prove a stronger result which looks quite similar to the weak mixing property, since it also involves Cesaro averages. We call this property SZ for simplicity, which is short for Szemeredi.

Definition 3.12. We call a measure-preserving system $(X,\mathcal{B},T,\mu)$ SZ if for each $A \in \mathcal{B}$ that has positive measure and any positive integer $k$,
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{i=0}^{k-1}T^{-in}A\Big) > 0.$$

Remark 3.13. Notice that in the proof of Szemeredi’s theorem we use several different criteria that characterize the “Szemeredi” property, for instance Definition 3.12. Some of these criteria are equivalent but others are not. The following theorem shows that being SZ is stronger than having multiple recurrence of every order.

Theorem 3.14. A measure-preserving system $(X,\mathcal{B},T,\mu)$ has multiple recurrence of every order if it is SZ.

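Proof. Let $A \in \mathcal{B}$ have positive measure. For any positive integer $k$, the SZ property of $X$ gives
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\mu\Big(\bigcap_{i=0}^{k-1}T^{-in}A\Big) > 0.$$
Then there must exist some integer $n \ge 1$ such that $\mu\big(\bigcap_{i=0}^{k-1}T^{-in}A\big) > 0$. If not, $\frac{1}{N}\sum_{n=1}^{N}\mu\big(\bigcap_{i=0}^{k-1}T^{-in}A\big)$ would be $0$ for all $N$, and so would its lim inf as $N \to \infty$, a contradiction. □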

Now we give an alternative characterization of the SZ property using functions. We will use it to prove that weak mixing systems are SZ. As we will see later, this formulation also allows us to exploit special properties of compactness and relative compactness, since they are characterized by almost periodic functions.

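Theorem 3.15. A measure-preserving system $(X,\mathcal{B},T,\mu)$ is SZ if and only if for any function $f \in L^\infty$ such that $f$ is nonnegative almost everywhere and $\int_X f\,d\mu > 0$, and any positive integer $k$, we have
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu > 0, \tag{3.16}$$
where $U_T$ is the operator associated with $T$, i.e. $U_T f = f\circ T$.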

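Proof. Suppose (3.16) holds and let $f = \chi_A$ be the characteristic function of a set $A$ with positive measure. Since $\prod_{i=0}^{k-1}U_T^{in}\chi_A$ is the characteristic function of $\bigcap_{i=0}^{k-1}T^{-in}A$, this gives the SZ property (shifting the range of summation by one term does not affect the lim inf). On the other hand, the SZ property guarantees (3.16) for all characteristic functions of sets with positive measure. To pass to a general nonnegative $f \in L^\infty$ with positive integral, choose $c > 0$ such that $A = \{f \ge c\}$ has positive measure, which is possible since $\int_X f\,d\mu > 0$. Then $f \ge c\chi_A$ almost everywhere, and since $f$ is nonnegative¹⁶ the same comparison holds for the products: $\prod_{i=0}^{k-1}U_T^{in}f \ge c^k\prod_{i=0}^{k-1}U_T^{in}\chi_A$. Hence the lim inf in (3.16) for $f$ is at least $c^k$ times the positive lim inf for $\chi_A$. □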
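One way to pass from characteristic functions to a general nonnegative $f$ in Theorem 3.15 is the pointwise bound $f \ge c\chi_{\{f\ge c\}}$ for $c > 0$. The sketch below checks this on a small example, assuming a rotation $Tx = x + r$ of the finite cyclic group $\mathbb{Z}/m$ with uniform measure (a measure-preserving system chosen only for illustration; $m$, $r$, $f$ and the helper names are hypothetical). It verifies that the function average for $\chi_A$ equals the corresponding average of $\mu\big(\bigcap_{i=0}^{k-1}T^{-in}A\big)$ (with $n$ running from $0$ to $N-1$), and that the average for $f$ dominates $c^k$ times the one for $\chi_A$.

```python
from fractions import Fraction


def function_average(f, m, r, k, N):
    """(1/N) sum_{n<N} of the integral of prod_{i<k} U_T^{in} f on Z/m, with T x = x + r."""
    total = Fraction(0)
    for n in range(N):
        for x in range(m):
            prod = Fraction(1)
            for i in range(k):
                prod *= f[(x + i * n * r) % m]   # U_T^{in} f (x) = f(x + i n r)
            total += prod / m                    # uniform measure on Z/m
    return total / N


def set_average(A, m, r, k, N):
    """(1/N) sum_{n<N} of mu(A ∩ T^{-n}A ∩ ... ∩ T^{-(k-1)n}A) on Z/m."""
    total = Fraction(0)
    for n in range(N):
        hits = [x for x in range(m) if all((x + i * n * r) % m in A for i in range(k))]
        total += Fraction(len(hits), m)
    return total / N


# Hypothetical example data.
m, r, k, N = 12, 5, 3, 12
f = [Fraction(v) for v in (0, 3, 1, 0, 2, 2, 0, 1, 3, 0, 0, 1)]
c = Fraction(2)
A = {x for x in range(m) if f[x] >= c}          # the level set {f >= c} = {1, 4, 5, 8}
chi_A = [Fraction(1 if x in A else 0) for x in range(m)]
print(function_average(f, m, r, k, N), set_average(A, m, r, k, N))
```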

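Example 3.17. Consider the dynamical system $(\mathbb{T}, R_\alpha)$, where $\mathbb{T}$ is the circle with Lebesgue measure $\mu$ and $R_\alpha\colon x \mapsto x + \alpha$ is an irrational rotation. We use the function formulation of SZ in Theorem 3.15. For a fixed $N$, the Furstenberg ergodic average is
$$\frac{1}{N}\sum_{n=0}^{N-1}\int_{\mathbb{T}}\prod_{i=0}^{k-1}f(t+in\alpha)\,d\mu(t).$$
We have proved in Example 2.25 that irrational rotations are ergodic; in fact the sequence $\{n\alpha\}_{n\in\mathbb{N}}$ is equidistributed on the circle. Since the inner integral depends continuously on $n\alpha$, the ergodic average converges, as $N \to \infty$, to
$$\int_{\mathbb{T}}\int_{\mathbb{T}}\prod_{i=0}^{k-1}f(x+it)\,d\mu(x)\,d\mu(t).$$
We can see that this double integral is positive by considering sufficiently small $t$. When $t$ is sufficiently small, the inner integral is close to $\int_{\mathbb{T}}f^k\,d\mu$, which is at least $\big(\int_{\mathbb{T}}f\,d\mu\big)^k > 0$ by Hölder’s inequality, since the expectation of $f$ is strictly positive by assumption. Hence the double integral is at least $C\,\mathbb{E}(f)^k$, where $C$ is some positive constant depending on the range of $t$ that we allow.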
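The limit in Example 3.17 can be checked numerically for an indicator function. For $f = \chi_{[0,\ell)}$ with $\ell \le \frac14$, the double integral can be computed by hand: it equals $\ell^2$ for $k = 2$ and $\ell^2/2$ for $k = 3$. The sketch below computes each integral $\int_{\mathbb{T}}\prod_{i<k}f(x + in\alpha)\,d\mu(x)$ exactly, by cutting the circle at the points where one of the constraints can change, and averages over $n < N$. The choice of $\alpha$ (the golden-ratio conjugate), of $\ell$ and $N$, and the helper names are illustrative assumptions.

```python
import math


def arc_intersection_measure(shifts, ell):
    """Lebesgue measure of {x in [0, 1): (x + s) mod 1 < ell for every s in shifts}."""
    cuts = {0.0}
    for s in shifts:
        # the condition (x + s) mod 1 < ell can only change at these two points
        cuts.add((-s) % 1.0)
        cuts.add((ell - s) % 1.0)
    cuts = sorted(cuts) + [1.0]
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = 0.5 * (a + b)  # membership is constant between consecutive cuts
        if all((mid + s) % 1.0 < ell for s in shifts):
            total += b - a
    return total


def rotation_average(alpha, ell, k, N):
    """(1/N) sum_{n<N} of the integral of prod_{i<k} f(x + i n alpha), f = indicator of [0, ell)."""
    total = 0.0
    for n in range(N):
        shifts = [(i * n * alpha) % 1.0 for i in range(k)]
        total += arc_intersection_measure(shifts, ell)
    return total / N


ALPHA = (math.sqrt(5) - 1) / 2   # illustrative irrational rotation number
ELL = 0.25
avg2 = rotation_average(ALPHA, ELL, 2, 20000)
avg3 = rotation_average(ALPHA, ELL, 3, 20000)
print(avg2, ELL ** 2)
print(avg3, ELL ** 2 / 2)
```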

¹⁶This is the reason that we require f to be nonnegative a.e.


3.4. Two fundamental systems. We prove that two very well-structured classes of dynamical systems, namely weak mixing systems and compact systems, have the SZ property.

3.4.1. Weak mixing systems. In order to prove that weak mixing systems are SZ, we need the following lemma, which is a classical result in analysis. It is sometimes known as the van der Corput trick.

Lemma 3.18 (van der Corput). Suppose $\{v_n\}_{n\ge1}$ is a bounded sequence in some Hilbert space $\mathcal{H}$. Define a sequence of real numbers $\{a_i\}_{i\ge0}$ by
$$a_i = \limsup_{N\to\infty}\Big|\frac{1}{N}\sum_{n=1}^{N}\langle v_{n+i}, v_n\rangle\Big|.$$
If $\lim_{N\to\infty}\frac{1}{N}\sum_{i=0}^{N-1}a_i = 0$, then
$$\lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}v_n\Big\| = 0.$$

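Theorem 3.19. Given a weak mixing system $(X,\mathcal{B},T,\mu)$ and a finite set of $k$ functions $\{f_i\}_{1\le i\le k} \subset L^\infty(X)$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\prod_{i=1}^{k}U_T^{in}f_i = \prod_{i=1}^{k}\int_X f_i\,d\mu$$
in $L^2$.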

Remark 3.20. Theorem 3.19 implies the function formulation of SZ in Theorem 3.15, where we take a single function instead of $k$ different functions. It is not surprising that for weak mixing systems we not only know that the limit is positive when all the functions are nonnegative with positive expectation, but also know exactly what the limit is. Indeed, if $(X,\mathcal{B},T,\mu)$ is weak mixing, then
$$\frac{1}{N}\sum_{n=0}^{N-1}\big|\mu(A\cap T^{-n}B) - \mu(A)\mu(B)\big| \to 0$$
for any measurable sets $A, B \in \mathcal{B}$.

Proof. The proof goes by induction. When $k = 1$, by the mean ergodic theorem and the fact that weak mixing implies ergodicity, we know that $\frac{1}{N}\sum_{n=0}^{N-1}U_T^n f_1$ converges in $L^2$ to the constant $\int_X f_1\,d\mu$. Since $T^2$ is again weak mixing, the same holds with $T^2$ in place of $T$. For simplicity, we only show how the $k = 1$ case implies the $k = 2$ case instead of doing a general inductive step. Therefore, we need to show that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}U_T^n f_1\,U_T^{2n}f_2 = \int_X f_1\,d\mu\int_X f_2\,d\mu.$$

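Notice that if $f_1 = c$ for some constant $c$, then
$$\begin{aligned}
\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}U_T^n f_1\,U_T^{2n}f_2 &= c\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}U_T^{2n}f_2\\
&= c\int_X f_2\,d\mu \qquad(\text{by the } k = 1 \text{ case})\\
&= \int_X f_1\,d\mu\int_X f_2\,d\mu.
\end{aligned}$$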


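We can similarly reduce to the base case when $f_2$ is constant. In order to apply the van der Corput lemma, we need some normalization to make sure that $\int_X f_1\,d\mu\int_X f_2\,d\mu = 0$. We can assume without loss of generality that $f_1$ has expectation zero: indeed, by linearity and the constant case above, we can always replace $f_1$ by $f_1 - \int_X f_1\,d\mu$. It then suffices to show that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}U_T^n f_1\,U_T^{2n}f_2 = 0$. We define a sequence $\{v_n\}_{n\ge0}$ in the Hilbert space $L^2$ by $v_n = U_T^n f_1\,U_T^{2n}f_2$. Now we compute the real numbers $\{a_i\}_{i\ge0}$ defined in the van der Corput lemma; since $a_i$ involves an absolute value, the order of the two arguments of the inner product does not matter, and we have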

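$$\begin{aligned}
\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\langle v_n, v_{n+i}\rangle
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X U_T^n f_1\,U_T^{2n}f_2\,U_T^{n+i}f_1\,U_T^{2n+2i}f_2\,d\mu\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X f_1\,U_T^{n}f_2\,U_T^{i}f_1\,U_T^{n+2i}f_2\,d\mu \qquad(T \text{ is measure-preserving})\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X (f_1U_T^if_1)\,U_T^n(f_2U_T^{2i}f_2)\,d\mu\\
&= \int_X f_1U_T^if_1\,d\mu\int_X f_2U_T^{2i}f_2\,d\mu \qquad(\text{by Corollary 2.30}),
\end{aligned}$$
so in particular the limit exists and $a_i = \big|\int_X f_1U_T^if_1\,d\mu\int_X f_2U_T^{2i}f_2\,d\mu\big|$.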

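By the characterizations of weak mixing systems in Section 2, we know that $(X\times X, T\times T^2, \mu\otimes\mu)$ is also weak mixing.¹⁷ We define $g \in L^2(X\times X)$ by $g(x_1,x_2) = f_1(x_1)f_2(x_2)$. Recall that in the van der Corput lemma, we require that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}a_n = 0$. Indeed, we have
$$\begin{aligned}
\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}a_n
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\Big|\int_X f_1U_T^nf_1\,d\mu\int_X f_2U_T^{2n}f_2\,d\mu\Big|\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\Big|\int_X\int_X g(x_1,x_2)\,U_{T\times T^2}^n g(x_1,x_2)\,d\mu(x_1)\,d\mu(x_2)\Big|\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\Big|\int_{X\times X}g\,U_{T\times T^2}^n g\,d(\mu\otimes\mu)\Big|\\
&= \Big(\int_{X\times X}g\,d(\mu\otimes\mu)\Big)^2 \qquad(\text{since } T\times T^2 \text{ is weak mixing})\\
&= \Big(\int_X f_1\,d\mu\int_X f_2\,d\mu\Big)^2\\
&= 0 \qquad(\text{since the expectation of } f_1 \text{ is } 0).
\end{aligned}$$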

¹⁷We will need $(X^k, T\times T^2\times\cdots\times T^k)$ to be weak mixing in the general inductive step, but this is also clear directly from the properties of weak mixing systems.


By the van der Corput lemma, we have
$$0 = \lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=0}^{N-1}v_n\Big\|_2 = \lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=0}^{N-1}U_T^nf_1\,U_T^{2n}f_2\Big\|_2.$$
This completes the proof. □

Theorem 3.21. If a measure-preserving system $(X,\mathcal{B},T,\mu)$ is weak mixing, then it is SZ.

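Proof. In order to show that $X$ is SZ, we need to prove that for any function $f \in L^\infty$ such that $f$ is nonnegative a.e. and $\int_X f\,d\mu > 0$, and any positive integer $k$,
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu > 0.$$
The case $k = 1$ is trivial. For $k \ge 2$, write $\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu = \big\langle f, \prod_{i=1}^{k-1}U_T^{in}f\big\rangle$. By Theorem 3.19 applied to $f_1 = \cdots = f_{k-1} = f$, the averages of $\prod_{i=1}^{k-1}U_T^{in}f$ converge in $L^2$ to the constant $\big(\int_X f\,d\mu\big)^{k-1}$, and $L^2$ convergence implies convergence of inner products. Hence
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu = \Big(\int_X f\,d\mu\Big)^k > 0.$$
□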

3.4.2. Compact systems. Recall that Theorem 2.56 gives a strong and useful characterization of compact systems. Kronecker systems are much easier to visualize since they can be viewed as rotations. However, we will prove that compact systems are SZ directly from the definition.

Theorem 3.22. If the measure-preserving system $(X,\mathcal{B},T,\mu)$ is compact, then it is SZ.

Proof. Given any nonnegative $L^\infty$ function $f$ with positive expectation and a fixed integer $k$, we need to show that
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu > 0.$$

Without loss of generality, we may assume that $0 < \|f\|_\infty \le 1$; indeed, we can always replace $f$ by $f/\|f\|_\infty$. As before, define $A_\varepsilon \subset \mathbb{N}$ to be $\{m : \|f - U_T^m f\|_2 < \varepsilon\}$. By Theorem 2.50, for each $0 < \varepsilon < 1$¹⁸ there is some finite integer $g_\varepsilon \in \mathbb{N}$ that is larger than any gap in the set $A_\varepsilon$.

¹⁸It is not strictly required for $\varepsilon$ to be smaller than 1; this is just a technical convenience for later in the proof.


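If $n \in A_{\varepsilon/(k2^k)}$, then for $1 \le i \le k$ we have
$$\begin{aligned}
\|f - U_T^{in}f\|_2 &\le \sum_{j=0}^{i-1}\|U_T^{jn}f - U_T^{(j+1)n}f\|_2 \qquad(\text{triangle inequality})\\
&\le \sum_{j=0}^{k-1}\|U_T^{jn}f - U_T^{(j+1)n}f\|_2\\
&= \sum_{j=0}^{k-1}\|f - U_T^nf\|_2 \qquad(T \text{ is measure-preserving})\\
&= k\,\|f - U_T^nf\|_2 \le \frac{\varepsilon}{2^k}.
\end{aligned}$$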

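Let $h_i = U_T^{in}f - f$. It has $L^2$ norm at most $\frac{\varepsilon}{2^k}$,¹⁹ and $|h_i| \le 1$ almost everywhere since $0 \le f \le 1$. Observe that for each fixed $1 \le s \le k$ and each index set $\{j_l\}_{1\le l\le s} \subset [1,k]$,
$$\Big|\int_X f^{k-s}\prod_{l=1}^{s}h_{j_l}\,d\mu\Big| \le \int_X\prod_{l=1}^{s}|h_{j_l}|\,d\mu \qquad(\text{because } 0 \le f \le 1) \tag{3.23}$$
$$= \Big\langle |h_{j_1}|, \prod_{l=2}^{s}|h_{j_l}|\Big\rangle \tag{3.24}$$
$$\le \|h_{j_1}\|_2\,\Big\|\prod_{l=2}^{s}h_{j_l}\Big\|_2 \le \frac{\varepsilon}{2^k} \qquad(\text{by Hölder's inequality, since } |h_j| \le 1), \tag{3.25}$$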

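where we assume $s \ge 2$ in (3.24), since for $s = 1$ the bound $\int_X|h_{j_1}|\,d\mu \le \|h_{j_1}\|_2$ is immediate. Now, expanding the product $\prod_{i=0}^{k-1}(f + h_i)$ into $2^k$ terms, one of which is $f^k$, we have
$$\Big|\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu - \int_X f^k\,d\mu\Big| = \Big|\int_X\prod_{i=0}^{k-1}(f+h_i)\,d\mu - \int_X f^k\,d\mu\Big| \le (2^k-1)\frac{\varepsilon}{2^k} < \varepsilon$$
by (3.23)-(3.25).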

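In other words, $\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu$ is at least $\int_X f^k\,d\mu - \varepsilon$. Since $f \ge 0$ and $\int_X f\,d\mu > 0$, we have $\int_X f^k\,d\mu > 0$, so we can take $\varepsilon$ smaller than $\int_X f^k\,d\mu$; then
$$\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu > \int_X f^k\,d\mu - \varepsilon > 0 \qquad\text{for every } n \in A_{\varepsilon/(k2^k)}.$$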

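Recall that since $f$ is nonnegative a.e., $\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu \ge 0$ even when $n$ is not in $A_{\varepsilon/(k2^k)}$. The biggest gap in $A_{\varepsilon/(k2^k)}$ is smaller than the constant $g_{\varepsilon/(k2^k)}$ by the definition of $g_\varepsilon$, so every block of $g_{\varepsilon/(k2^k)}$ consecutive integers (beyond the first element of $A_{\varepsilon/(k2^k)}$) contains an element of $A_{\varepsilon/(k2^k)}$. Therefore, taking the average, we get
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_T^{in}f\,d\mu \ge \frac{\int_X f^k\,d\mu - \varepsilon}{g_{\varepsilon/(k2^k)}} > 0.$$
□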

¹⁹Notice that $h_0 = 0$.


Remark 3.26. From the proof of Theorem 2.50 we can see that compactness is actually a very strong property, since it allows us to approximate a large portion of the points on the orbit. One may observe that the proof above is quite similar to Example 3.17, which proves SZ for irrational rotations.
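The syndetic set $A_\varepsilon$ behind Theorem 3.22 can be seen in the irrational rotation of Example 3.17, a standard example of a compact system. For $f = \chi_{[0,\ell)}$ with $\ell \le \frac12$ one checks that $\|f - U^m f\|_2^2 = \mu\big([0,\ell)\,\triangle\,([0,\ell) - m\alpha)\big) = 2\min(d(m\alpha,\mathbb{Z}), \ell)$, where $d(m\alpha,\mathbb{Z})$ is the distance from $m\alpha$ to the nearest integer. The sketch below lists the elements of $A_\varepsilon$ up to a cut-off and measures the largest gap between consecutive ones; $\alpha$, $\ell$, $\varepsilon$, the cut-off and the helper names are illustrative assumptions.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # illustrative irrational rotation number


def dist_to_integers(t):
    """Distance from t to the nearest integer."""
    return abs(t - round(t))


def l2_distance(m, alpha, ell):
    """||chi_I - U^m chi_I||_2 for I = [0, ell), ell <= 1/2, under x -> x + alpha."""
    return math.sqrt(2 * min(dist_to_integers(m * alpha), ell))


def return_times(eps, alpha, ell, cutoff):
    """Elements of A_eps = {m >= 1 : ||f - U^m f||_2 < eps} up to the cut-off."""
    return [m for m in range(1, cutoff + 1) if l2_distance(m, alpha, ell) < eps]


def largest_gap(times):
    return max(b - a for a, b in zip(times, times[1:]))


times = return_times(0.2, ALPHA, 0.25, 5000)
print(len(times), times[:5], largest_gap(times))
```

The largest gap stays small compared with the cut-off, which is the bounded-gap behavior that the constant $g_\varepsilon$ captures in the proof of Theorem 3.22.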

3.5. Extension principles. Now we introduce the two extension principles. We will not prove them in full detail, since their proofs are analogous to the proofs for the two fundamental systems.

Theorem 3.27. If $X = (X,\mathcal{B}_X,\mu,T_x)$ is a compact extension of $Y = (Y,\mathcal{B}_Y,\nu,T_y)$ and Y is SZ, then X is also SZ.

Remark 3.28. Just as in the proof for compact systems, given a function $f \in L^2(X)$, we try to find a set $B_n$ for each $n$ such that the integral of $\prod_{i=0}^{k-1}U_{T_x}^{in}f$ over $B_n$ is very close to $\int_{B_n}f^k\,d\mu$. This is strictly positive, so the ergodic average
$$\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_{T_x}^{in}f\,d\mu$$
stays positive as $N \to \infty$. The fact that the extension X → Y is compact ensures that this lower bound does not get too close to zero, just as the gap $g_\varepsilon$ is bounded in Theorem 3.22. However, since we are working with relative compactness instead of compactness, we need to make some adjustments. We consider the space $\bigoplus_{i=0}^{k-1}L^2(X,\mu_y)$, where the norm is given by $\|(f_0,f_1,\dots,f_{k-1})\| = \max_j\|f_j\|_{L^2(\mu_y)}$.

Next, we investigate the set $S = \{(f, U_{T_x}^nf, \dots, U_{T_x}^{(k-1)n}f)\}_{n\in\mathbb{Z}} \subset \bigoplus_{i=0}^{k-1}L^2(X,\mu_y)$. We are now able to use the hypothesis that Y is SZ, since there is a measurable map $y \mapsto \mu_y$. We start with some set $B \subset X$ on which $E(f\mid\mathcal{Y})$²⁰ is not too small. Then we keep shrinking $B$ if necessary, and finally reach some set $B_n$ such that some subset of $S$ is at most $\varepsilon$-separated for some small $\varepsilon$. Finally, we are able to find some integer $\varepsilon_n$ depending on $B_n$ and approximate $\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_{T_x}^{i\varepsilon_n}f\,d\mu$ by $\int_{B_n}f^k\,d\mu$, which is positive.

Similar to Theorem 3.22, the set $\{n_m\}$ has bounded gap $d$, and we obtain that $\liminf_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\int_X\prod_{i=0}^{k-1}U_{T_x}^{in}f\,d\mu$ is larger than $\frac{C}{d}\int_X f^k\,d\mu$, where $C$ is some positive constant. This is strictly larger than zero since $f$ has positive expectation. As we can see, this theorem is just a relative version of Theorem 3.22.

Theorem 3.29. If $X = (X,\mathcal{B}_X,\mu,T_x)$ is a weak mixing extension of $Y = (Y,\mathcal{B}_Y,\nu,T_y)$ and Y is SZ, then X is also SZ.

Remark 3.30. In the case of weak mixing extensions, we do not know a priori that the limit exists, as we did in Theorem 3.21. However, we are able to approximate the product $\prod_{i=0}^{k-1}U_{T_x}^{in}f$ by $\prod_{l=0}^{k-1}U_{T_y}^{ln}E(f\mid\mathcal{Y})$ on average. This uses the following lemma stated in Furstenberg’s proof [1]. Theorem 3.29 is just a relative version of Theorem 3.21.

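Lemma 3.31. Let $(X,\mathcal{B}_X,\mu,T_X)$ be a relatively weak mixing extension of $(Y,\mathcal{B}_Y,\nu,T_Y)$. Fix any integer $k \in \mathbb{N}$. Then if $f_l \in L^\infty(X,\mathcal{B}_X,\mu)$, $l = 0,1,\dots,k$, we have the following two equalities:
(1) $$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int_Y\Big|E\Big(\prod_{l=0}^{k}U_{T_X}^{ln}f_l\,\Big|\,\mathcal{Y}\Big) - \prod_{l=0}^{k}U_{T_Y}^{ln}E(f_l\mid\mathcal{Y})\Big|^2\,d\nu = 0,$$
(2) $$\lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}\Big(\prod_{l=1}^{k}U_{T_X}^{ln}f_l - \prod_{l=1}^{k}U_{T_X}^{ln}E(f_l\mid\mathcal{Y})\Big)\Big\|_{L^2} = 0.$$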

²⁰This is defined in Example 2.71.


3.6. Conclusion. To complete the proof of Szemeredi’s theorem, we need the following lemma proved by Furstenberg [1].

Lemma 3.32. Let $X = (X,\mathcal{B}_X,\mu,T_x)$ be a measure-preserving system and let $\mathcal{S}$ be the family of factors of $X$ that are SZ. Suppose $\mathcal{R} \subset \mathcal{S}$ is a totally ordered²¹ family of factors. Then $\sup\mathcal{R} \in \mathcal{S}$, i.e. $\sup\mathcal{R}$ is SZ.

A direct application of Zorn’s lemma, combined with the above result, proves the existence of a maximal factor (the family $\mathcal{S}$ is nonempty, since the trivial one-point factor is SZ).

Theorem 3.33. Let $X = (X,\mathcal{B}_X,\mu,T_x)$ be a measure-preserving system. The family of factors of $X$ that are SZ contains a maximal element.

Theorem 3.34. Every measure-preserving system is SZ.

Proof. Let $X$ be an arbitrary measure-preserving system. If $X$ is weak mixing, then by Theorem 3.21, $X$ is already SZ. Suppose $X$ is not weak mixing. We know that $X$ has some nontrivial compact factor $Y'$ by Theorem 2.58, and $Y'$ is SZ by Theorem 3.22. Let $Y$ be a maximal SZ factor of $X$, which exists by Theorem 3.33, and suppose $Y$ is a proper factor of $X$. If the extension $X \to Y$ is relatively weak mixing, then $X$ is also SZ by Theorem 3.29, so $Y$ cannot be maximal. On the other hand, if the extension $X \to Y$ is not relatively weak mixing, then by Theorem 2.80 there exists an intermediate factor $Z$, $X \to Z \to Y$, such that $Z \to Y$ is a proper compact extension. By Theorem 3.27, $Z$ is SZ, so again $Y$ is not maximal. We reach a contradiction in both situations, so $Y = X$. Therefore, every measure-preserving system is SZ, and by Theorems 3.14 and 3.9 this proves Szemeredi’s theorem. □

Acknowledgments

It is a pleasure to thank my mentor Brian Chung for his time and effort in guiding me and helping me understand the material. I have been interested in Szemeredi’s theorem for a long time, but I had no experience with either the theorem or ergodic theory before this summer. I would not have been able to see the elegance of Furstenberg’s proof without the help of Brian. I also want to thank Professor May for helping me with my paper writing skills and letting me be a part of the REU program.

4. bibliography

References

[1] H. Furstenberg, Y. Katznelson, D. Ornstein. The Ergodic Theoretical Proof of Szemeredi’s Theorem. 1982.
[2] Manfred Einsiedler, Thomas Ward. Ergodic Theory with a View Towards Number Theory. Springer, London, Dordrecht, Heidelberg, New York, 2011.
[3] Yufei Zhao. Szemeredi’s Theorem via Ergodic Theory. 2011.

²¹Given factors $P$ and $Q$, we write $P < Q$ if there exists some extension map $\varphi_{P,Q}\colon P \to Q$, i.e. $Q$ is a factor of $P$.
