

METABIOLOGY:

LIFE AS EVOLVING

SOFTWARE

METABIOLOGY: a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.


“The chance that higher life forms might have emerged in this way [by Darwinian evolution] is comparable to the chance that a tornado sweeping through a junkyard might assemble a Boeing 747 from the materials therein.” -- Fred Hoyle.

“In my opinion, if Darwin’s theory is as simple, fundamental and basic as its adherents believe, then there ought to be an equally fundamental mathematical theory about this, that expresses these ideas with the generality, precision and degree of abstractness that we are accustomed to demand in pure mathematics.” -- Gregory Chaitin, Speculations on Biology, Information and Complexity.

“Mathematics is able to deal successfully only with the simplest of situations, more precisely, with a complex situation only to the extent that rare good fortune makes this complex situation hinge upon a few dominant simple factors. Beyond the well-traversed path, mathematics loses its bearings in a jungle of unnamed special functions and impenetrable combinatorial particularities. Thus, the mathematical technique can only reach far if it starts from a point close to the simple essentials of a problem which has simple essentials. That form of wisdom which is the opposite of single-mindedness, the ability to keep many threads in hand, to draw for an argument from many disparate sources, is quite foreign to mathematics.” -- Jacob Schwartz, The Pernicious Influence of Mathematics on Science.

“It may seem natural to think that, to understand a complex system, one must construct a model incorporating everything that one knows about the system. However sensible this procedure may seem, in biology it has repeatedly turned out to be a sterile exercise. There are two snags with it. The first is that one finishes up with a model so complicated that one cannot understand it: the point of a model is to simplify, not to confuse. The second is that if one constructs a sufficiently complex model one can make it do anything one likes by fiddling with the parameters: a model that can predict anything predicts nothing.” -- John Maynard Smith & Eörs Szathmáry, The Origins of Life.


Course Notes

METABIOLOGY: LIFE AS EVOLVING

SOFTWARE

G. J. Chaitin

Draft October 1, 2010


To my wife Virginia who played an essential role in this research

Contents

Preface

1 Introduction: Building a theory

2 The search for the perfect language

3 Is the world built out of information? Is everything software?

4 The information economy

5 How real are real numbers?

6 Speculations on biology, information and complexity

7 Metaphysics, metamathematics and metabiology

8 Algorithmic information as a fundamental concept in physics, mathematics and biology

9 To a mathematical theory of evolution and biological creativity
  9.1 Introduction
  9.2 History of Metabiology
  9.3 Modeling Evolution
    9.3.1 Software Organisms
    9.3.2 The Hill-Climbing Algorithm
    9.3.3 Fitness
    9.3.4 What is a Mutation?
    9.3.5 Mutation Distance
    9.3.6 Hidden Use of Oracles
  9.4 Model A (Naming Integers) Exhaustive Search
    9.4.1 The Busy Beaver Function
    9.4.2 Proof of Theorem 1 (Exhaustive Search)
  9.5 Model A (Naming Integers) Intelligent Design
    9.5.1 Another Busy Beaver Function
    9.5.2 Improving Lower Bounds on Ω
    9.5.3 Proof of Theorem 2 (Intelligent Design)
  9.6 Model A (Naming Integers) Cumulative Evolution at Random
  9.7 Model B (Naming Functions)
  9.8 Remarks on Model C (Naming Ordinals)
  9.9 Conclusion

10 Parsing the Turing test

11 Should mathematics be done differently because of Gödel’s incompleteness theorem?

Bibliography

Books by Chaitin

Preface

Biology and mathematics are like oil and water, they do not mix. Nevertheless this course will describe my attempt to express some basic biological principles mathematically. I’ll try to explain the raison d’être of what I call my “metabiological” approach, which studies randomly evolving computer programs rather than biological organisms.

I want to thank a number of people and organizations for inviting me to lecture on metabiology; the interaction with audiences was extremely stimulating and helped these ideas to evolve.

Firstly, I thank the IBM Watson Research Center, Yorktown Heights, where I gave two talks on this, including the world premiere talk on metabiology. Another talk on metabiology in the United States was at the University of Maine.

In Argentina I thank Veronica Becher of the University of Buenos Aires and Victor Rodriguez of the University of Cordoba for their kind invitations. And I am most grateful to the University of Cordoba, currently celebrating its 400th anniversary, for the honorary doctorate that they were kind enough to bestow on me.

In Chile I spoke on metabiology several times at the Valparaiso Complex Systems Institute, and in Brazil I included metabiology in courses I gave at the Federal University of Rio de Janeiro and in a talk at the Federal University in Niteroi.

Furthermore I thank Bernd-Olaf Küppers for inviting me to a very stimulating meeting at his Frege Centre for Structural Sciences at the University of Jena.

And I thank Ilias Kotsireas for organizing a Chaitin-in-Ontario lecture series in 2009 in the course of which I spoke on metabiology at the University of Western Ontario in London, at the Institute for Quantum Computing in Waterloo, and at the Fields Institute at the University of Toronto. The chapter of this book on Ω is based on a talk I gave at Wilfrid Laurier University in Waterloo.

Finally, I should mention that the chapter on “The Search for the Perfect Language” was first given as a talk at the Hebrew University in Jerusalem in 2008, then at the University of Campinas in Brazil, and finally at the Perimeter Institute in Waterloo, Canada.

The chapter on “Is Everything Software?” was originally a talk at the Technion in Haifa, where I also spoke on metabiology at the University of Haifa, one of a series of talks I gave there as the Rothschild Distinguished Lecturer for 2010.

These were great audiences, and their questions and suggestions were extremely valuable.

-- Gregory Chaitin, August 2010

Chapter 1

Introduction: Building a theory

• This is a course on biology that will spend a lot of time discussing Kurt Gödel’s famous 1931 incompleteness theorem on the limits of formal mathematical reasoning. Why? Because in my opinion the ultimate historical perspective on the significance of incompleteness may be that Gödel opens the door from mathematics to biology.

• We will also spend a lot of time discussing computer programs and software for doing mathematical calculations. How come? Because DNA is presumably a universal programming language, which is a language that is rich enough that it can express any algorithm. The fact that DNA is such a powerful programming language is a more fundamental characteristic of life than mere self-reproduction, which anyway is never exact, for if it were, there would be no evolution.

• Now a few words on the kind of mathematics that we shall use in this course. Starting with Newton, mathematical physics is full of what are called ordinary differential equations, and starting with Maxwell, partial differential equations become more and more important. Mathematical physics is full of differential equations, that is, continuous mathematics.

But that is not the kind of mathematics that we shall use here. The secret of life is not a differential equation. There is no differential equation for your spouse, for an organism, or for biological evolution. Instead we shall concentrate on the fact that DNA is the software, it’s the programming language for life.

• It is true that there are (ordinary) differential equations in a highly successful mathematical theory of evolution, Wright-Fisher-Haldane population genetics. But population genetics does not say where new genes come from, it assumes a fixed gene pool and discusses the change of gene frequencies in response to selective pressure, not biological creativity and the major transitions in evolution, such as the transition from unicellular to multicellular organisms, which is what interests us.

• If we are no longer going to use the differential equations that populate mathematical physics, what kind of math are we going to use? It will be discrete math, new math, the math of the 20th century dealing with computation, with algorithms. It won’t be traditional continuous math, it won’t be the calculus. As Dorothy says in The Wizard of Oz, “Toto, we’re not in Kansas anymore!”

More in line with our life-as-evolving-software viewpoint are three hot new topics in 20th century mathematics: computation, information and complexity. These have expanded into entire theories, called computability theory, information theory and complexity theory, theories which superficially appear to have little or no connection with biology.

In particular, our basic tool in this course will be algorithmic information theory (AIT), a mixture of Turing computability theory with Shannon information theory, which features the concept of program-size complexity. The author was one of the people who created this theory, AIT, in the mid 1960’s and then further developed it in the mid 1970’s; the theory of evolution presented in this course could have been done then, since all the necessary tools were available.

Why then the delay of 35 years? My apologies; I got distracted working on computer engineering and thinking about metamathematics. I had published notes on biology occasionally on and off since 1969, but I couldn’t find the right way of thinking about biology, I couldn’t figure out how to formulate evolution mathematically in a workable manner. Once I discovered the right way, this new theory I call metabiology went from being a gleam in my eye to a full-fledged mathematical theory in just two years.

• Also, it would be nice to be able to show that in our toy model hierarchical structure will evolve, since that is such a conspicuous feature of biological organisms.


What kind of math can we use for that? Well, there are places in pure math and in software engineering where you get hierarchical structures: in Mandelbrot fractals, in Cantor transfinite ordinal numbers, in hierarchies of fast growing functions, and in software levels of abstraction.

Fractals are continuous math and therefore not suitable for our discrete models, but the three others are genuine possibilities, and we shall discuss them all. One of our models of evolution does provably exhibit hierarchical structure.

• Here is the big challenge: Biology is extremely complicated, and every rule has exceptions. How can mathematics possibly deal with this?

We will outline an indirect way to deal with it, by studying a toy model I call metabiology (= life as evolving software, computer program organisms, computer program mutations), not the real thing. We are using Leibnizian math, not Newtonian math.

By modeling life as software, as computer programs, we get a very rich space of possible designs for organisms, and we can discuss biological creativity = where new genes come from (where new biological ideas such as multicellular organization come from), not just changes in gene frequencies in a population as in conventional evolutionary models.

• Some simulations of evolution on the computer (in silico, as contrasted with in vivo, in the organism, and in vitro, in the test tube) such as Tierra and Avida do in fact model organisms as software. But in these models there is only a limited amount of evolution followed by stagnation.[1]

Furthermore I do not run my models on a computer, I prove theorems about them. And one of these theorems is that evolution will continue indefinitely, that biological creativity (or what passes for it in my model) is endless, unceasing.

[1] As for genetic algorithms, they are intended to “stagnate” when they achieve an optimal solution to an engineering design problem; such a solution is a fixed point of the process of simulated evolution used by genetic algorithms.

• The main theme of Darwinian evolution is competition, survival of the fittest, “Nature red in tooth and claw.” The main theme of my model is creativity: Instead of a population of individuals competing ferociously with each other in order to spread their individual genes (as in Richard Dawkins’ The Selfish Gene), instead of a jungle, my model is like an individual Buddhist trying to attain enlightenment, a monk who is on the path to enlightenment, it is like a mystic or a kabbalist who is trying to get closer and closer to God.

More precisely, the single mutating organism in my model attains greater and greater mathematical knowledge by discovering more and more of the bits of Ω, which is, as we shall see in the part of the course on Ω, Course Topic 5, a very concentrated form of mathematical knowledge, of mathematical creativity. My organisms strive for greater mathematical understanding, for purely mathematical enlightenment.

I model where new mathematical knowledge is coming from, where new biological ideas are coming from; it is this process that I model and prove theorems about.

• But my model of a single mutating organism is indeed Darwinian: I have a single organism that is subjected to completely random mutations until a fitter organism is found, which then replaces my original organism, and this process continues indefinitely. The key point is that in my model progress comes from combining random mutations and having a fitness criterion (which is my abstract encapsulation of both competition and the environment).

The key point in Darwin’s theory was to replace God by randomness; organisms are not designed, they emerge at random, and that is also the case in my highly simplified toy model.
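
The loop just described (mutate at random, keep the mutant only if it is fitter, repeat) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the organism is a short list of bits rather than an arbitrary program, the hypothetical `fitness` function simply reads the bits as a binary integer, and the hypothetical `mutate` function flips one bit, whereas in the model described in these notes both organisms and mutations are arbitrary programs.

```python
import random


def fitness(organism):
    """Toy fitness (an assumption): read the bit list as a binary integer."""
    return int("".join(map(str, organism)), 2)


def mutate(organism, rng):
    """Toy mutation (an assumption): flip one randomly chosen bit."""
    i = rng.randrange(len(organism))
    mutant = list(organism)
    mutant[i] ^= 1
    return mutant


def hill_climb(organism, steps, seed=0):
    """Try random mutations; a mutant replaces the organism only if it is fitter."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    history = [fitness(organism)]
    for _ in range(steps):
        mutant = mutate(organism, rng)
        if fitness(mutant) > fitness(organism):
            organism = mutant
            history.append(fitness(organism))
    return organism, history


if __name__ == "__main__":
    final, history = hill_climb([0] * 8, steps=200, seed=1)
    print("final organism:", final)
    print("fitness after each accepted mutation:", history)
```

Unlike the model in these notes, this toy has a best possible organism (all ones), so it must eventually stagnate; it is meant to show only the accept-if-fitter loop, not the endless evolution that the theorems are about.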

• Does this highly abstract game have any relevance to real biology? Probably not, and if so, only very, very indirectly. It is mathematics itself that benefits most, because we begin to have a mathematical theory inspired by Darwin, to have mathematical concepts that are inspired by biology.

The fact that I can prove that evolution occurs in my model does not in any way constitute a proof that Darwinians are correct and Intelligent Design partisans are mistaken.

But my work is suggestive and it does clarify some of the issues, by furnishing a toy model that is much easier to analyze than the real thing. The real thing is what is actually taking place in the biosphere, not in my toy model, which consists of arbitrary mutation computer programs operating on arbitrary organism computer programs.

• More on creativity, a key word in my model: Something is mechanical if there is an algorithm for doing it; it is creative if there is no such algorithm.

This notion of creativity is basic to our endeavor, and it comes from the work of Gödel on the incompleteness of formal axiomatic theories, and from the work of Turing on the unsolvability of the so-called halting problem.[2] Their work shows that there are no absolutely general methods in mathematics and theoretical computer science, that creativity is essential, a conclusion that Paul Feyerabend with his book Against Method would have loved had he been aware of it: What Feyerabend espouses for philosophical reasons is in fact a theorem, that is, is provably correct in the field of mathematics.

So before we get to Darwin, we shall spend a lot of time in this course with Gödel and Turing and the like, preparing the groundwork for our model of evolution. Without this historical background it is impossible to appreciate what is going on in our model.

• My model therefore mixes mathematical creativity and biological creativity. This is both good and bad. It’s bad, because it distances my model from biology. But it is good, because mathematical creativity is a deep mathematical question, a fundamental mystery, a big unknown, and therefore something important to think about, at least for mathematicians, if not for biologists.

Further distancing my model from biology, my model combines randomness, a very Darwinian feature, with Turing oracles, which have no counterpart in biology; we will discuss this in due course.

Exploring such models of randomly evolving software may well develop into a new field of mathematics. Hopefully this is just the beginning, and metabiology will develop and will have more connection with biology in the future than it has at present.

[2] Turing’s halting problem is the question of deciding whether or not a computer program that is self-contained, without any input, will run forever, or will eventually finish.


• The main difference between our model and the DNA software in real organisms is their time complexity: the amount of time the software can run. I can prove elegant theorems because in my model the time allowed for a program to run is finite, but unlimited.

Real DNA software must run quickly: 9 months to produce a baby, 70 years in total, more or less. A theory of the evolution of programs with such limited time complexity, with such limited run time, would be more realistic but it will not contain the neat results we have in our idealized version of biology.

This is similar to the thermodynamics arguments which are taken in the “thermodynamic limit” of large amounts of time, in order to obtain more clear-cut results, when discussing the ideal performance of heat engines (e.g., steam engines). Indeed, AIT is a kind of thermodynamics of computation, with program-size complexity replacing entropy. Instead of applying to heat engines and telling us their ideal efficiency, AIT does the same for computers, for computations.

You have to go far from everyday biology to find beautiful mathematical structure.

• It should be emphasized that metabiology is Work in Progress. It may be mistaken. And it is certainly not finished yet. We are building a new theory. How do you create a theory?

“Beauty” is the guide. And this course will give a history of ideas for metabiology with plenty of examples. An idea is beautiful when it illuminates you, when it connects everything, when you ask yourself, “Why didn’t I see that before!,” when in retrospect it seems obvious.

AIT has two such ideas: the idea of looking at the size of a computer program as a complexity measure, and the idea of self-delimiting programs. Metabiology has two more beautiful ideas: the idea of organisms as arbitrary programs with a difficult mathematical problem to solve, and the idea of mutations as arbitrary programs that operate on an organism to produce a mutated organism.

Once you have these ideas, the rest is just uninspired routine work, lots of hard work, but that’s all. In this course we shall discuss all four of these beautiful ideas, which were the key inspirations required for creating AIT and metabiology.


Routine work is not enough, you need a spark from God. And mostly you need an instinct for mathematical beauty, for sensing an idea that can be developed, for the importance of an idea. That is, more than anything else, a question of aesthetics, of intuition, of instinct, of judgement, and it is highly subjective.

I will try my best to explain why I believe in these ideas, but just as in artistic taste, there is no way to convince anyone. You either feel it somewhere deep in your soul or you don’t. There is nothing more important than experiencing beauty; it’s a glimpse of transcendence, a glimpse of the divine, something that fewer and fewer people believe in nowadays. But without that we are mere machines.

And I may have the beginnings of a mathematical theory of evolution and biological creativity, but a mathematical theory of beauty is nowhere in sight.

• Incompleteness goes from being threatening to provoking creativity and being applied in order to keep our organisms evolving indefinitely. Evolution stagnates in most models because the organisms achieve their goals. In my model the organisms are asked to achieve something that can never be fully achieved because of the incompleteness phenomenon. So my organisms keep getting better and better at what they are doing; they can never stop, because stopping would mean that they had a complete answer to a math problem to which incompleteness applies. Indeed, the three mathematical challenges that my organisms face, naming large integers, fast growing functions, and large transfinite ordinals, are very concrete, tangible examples of the incompleteness phenomenon, which at first seemed rather mysterious.

Incompleteness is the reason that our organisms have to keep evolving forever, as they strive to become more and more complete, less and less incomplete. . . Incompleteness keeps our model of evolution from stagnating, it gives our organisms a mission, a raison d’être.

You have to go beyond incompleteness; incompleteness gives rise to creativity and evolution. Incompleteness sounds bad, but the other side of the coin is creativity and evolution, which are good.

Now we give an outline of the course, consisting of Course Topics 1–9:

1. This introduction.


2. The Search for the Perfect Language. (My talk at the Perimeter Institute in Waterloo.)

Umberto Eco, Lull, Leibniz, Cantor, Russell, Hilbert, Gödel, Turing, AIT, Ω. Kabbalah, Key to Universal Knowledge, God-like Power of Creation, the Golem!

Mathematical theories are all incomplete (Gödel, Turing, Ω), but programming languages are universal. Most concise programming languages, self-delimiting programs.

3. Is the world built out of information? Is everything software? (My talk at the Technion in Haifa.)

Physics of information: Quantum Information Theory; general relativity and black holes, Bekenstein bound, holographic principle = every physical system contains a finite number of bits of information that grows as the surface area of the physical system, not as its volume (Lee Smolin, Three Roads to Quantum Gravity); derivation of Einstein’s field equations for gravity from the thermodynamics of black holes (Ted Jacobson, “Thermodynamics of Spacetime: The Einstein Equation of State”).

The first attempt to construct a truly fundamental mathematical model for biology: von Neumann self-reproducing automata in a cellular automata world, a world in which magic works, a plastic world.

See also: Edgar F. Codd, Cellular Automata. Konrad Zuse, Rechnender Raum (Calculating Space). Fred Hoyle, Ossian’s Ride. Freeman Dyson, The Sun, the Genome, and the Internet (1999), green technology. Craig Venter, genetic engineering, synthetic life.

Technological applications: Seeds for houses, seeds for jet planes! Plant the seed in the earth, just add water and sunlight. Universal constructors, 3D printers = matter printers = printers for objects. Flexible manufacturing. Alchemy, Plastic reality.

4. Artificial Life: Evolution Simulations.

Low Level: Thomas Ray’s Tierra, Christoph Adami’s Avida, Walter Fontana’s ALchemy (Algorithmic Chemistry), Genetic algorithms.

High Level: Exploratory concept-formation based on examining lots of examples in elementary number theory, experimental math with no proofs: Douglas Lenat (1984), “Automated theory formation in mathematics,” AM.

After a while these stop evolving. What about proofs instead of simulations? Seems impossible (see the frontispiece quotes facing the title page, especially the one by Jacob Schwartz), but there is hope. See Course Topic 5 arguing that Ω is provably a bridge from math to biology.

5. How Real Are Real Numbers? A History of Ω. (My talk at WLU in Waterloo.)

Course Topic 3 gives physical arguments against real numbers, and this course topic gives mathematical arguments against real numbers. These considerations about paradoxical real numbers will lead us straight to the halting probability Ω. That is not how Ω was actually discovered, but it is the best way of understanding Ω. It’s a Whig history: how it should have been, not how it actually was.

The irreducible complexity real number Ω proves that math is more biological than biology; this is the first real bridge between math and biology. Biology is extremely complicated, and pure math is infinitely complicated.

The theme of Ω as concentrated mathematical creativity is introduced here; this is important because Ω is the organism that emerges through random evolution in Course Topic 8.
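
For orientation, the halting probability mentioned above is usually written as follows in algorithmic information theory. The symbols U, p and |p| are notation assumed here for illustration; this outline itself does not fix them.

```latex
\Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}
```

Here U is a universal computer whose programs p are self-delimiting bit strings (no valid program is a prefix of another), and |p| is the length of p in bits. The self-delimiting requirement is what makes the sum converge to a real number strictly between 0 and 1, so that Ω can be read as the probability that a program chosen by coin tossing eventually halts.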

Now let’s get to work in earnest to build a mathematical theory of evolution and biological creativity.

6. Metabiology: Life as Evolving Software.

Stephen Wolfram, NKS: the origin of life as the physical implementation of a universal programming language; the Ubiquity of Universality. François Jacob, bricolage, Nature is a cobbler, a tinkerer. Neil Shubin, Your Inner Fish. Stephen Gould, Wonderful Life, on the Cambrian explosion of body designs. Murray Gell-Mann, frozen accidents. Ernst Haeckel, ontogeny recapitulates phylogeny. Evo-devo.

Note that a small change in a computer program (one bit!) can completely wreck it. But small changes can also make substantial improvements. This is a highly nonlinear effect, like the famous butterfly effect of chaos theory (see James Gleick’s Chaos). Over the history of this planet, covering the entire surface of the earth, there is time to try many small changes. But not enough time according to the Intelligent Design book Signature in the Cell. In the real world this is still controversial, but in my toy model evolution provably works.

A blog summarized one of my talks on metabiology like this: “We are all random walks in program space!” That’s the general idea; in Course Topics 7 and 8 we fill in the details of this new theory.

7. Creativity in Mathematics. We need to challenge our organisms into evolving. We need to keep them from stagnating. These problems can utilize an unlimited amount of mathematical creativity:

• Busy Beaver problem: Naming large integers: $10^{10}$, $10^{10^{10}}$ . . .

• Naming fast-growing functions: $N^2$, $2^N$ . . .

• Naming large transfinite Cantor ordinals: $\omega$, $\omega^2$, $\omega^\omega$ . . .
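
The first two challenges can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the dictionary `named_integers` and the helper names are hypothetical, and the expressions are ordinary Python arithmetic rather than programs in the language used later in the course. The point it shows is that a very short name can denote a very large integer, and that $2^N$ soon outgrows $N^2$.

```python
# Hypothetical examples: short expressions (the keys) that name large integers.
named_integers = {
    "10**10": 10**10,
    "10**100": 10**100,
    "10**(10**3)": 10**(10**3),
}


def digit_count(n):
    """Number of decimal digits of a positive integer."""
    return len(str(n))


def square(n):
    """The slower-growing function N^2."""
    return n**2


def power_of_two(n):
    """The faster-growing function 2^N."""
    return 2**n


if __name__ == "__main__":
    for name, value in named_integers.items():
        print(f"{name!r} ({len(name)} characters) names a "
              f"{digit_count(value)}-digit integer")
    for n in (1, 5, 10, 20):
        print(n, square(n), power_of_two(n))
```

A name of eleven characters already denotes an integer with more than a thousand digits; longer programs can name integers far beyond anything that could be written out in full.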

8. Creativity in Biology. Single mutating organism. Hill-climbing algorithm on a fitness landscape. Hill-climbing random walks in software space. Evolution of mutating software. What is a mutation? Exhaustive search. Intelligent design. Cumulative evolution at random. Ω as concentrated creativity, Ω as an evolving organism. Randomness yields intelligence.

We have a proof that evolution works, at least in this toy model; in fact, surprisingly it is nearly as fast as intelligent design, as deliberately choosing the mutations in the best possible order. But can we show that random evolution is slower than intelligent design? Otherwise the theory collapses onto a point, it cannot distinguish, it does not make useful distinctions. We also get evolution of hierarchical structure in non-universal programming languages.

So we seem to have evolution at work in these toy models. But to what extent is this relevant to real biological systems?

9. Conclusion: On the plasticity of the world. Is the universe mental? Speculation where all this might possibly lead.

Chapter 2

The search for the perfect language

I will tell how the story given in Umberto Eco’s book The Search for the Perfect Language continues with modern work on logical and programming languages. Lecture given Monday, 21 September 2009, at the Perimeter Institute for Theoretical Physics in Waterloo, Canada.[1]

Today I’m not going to talk much about Ω. I will focus on that at Wilfrid Laurier University tomorrow. And if you want to hear a little bit about my current enthusiasm, which is what I’m optimistically calling metabiology (it’s a field with a lovely name and almost no content at this time), that’s on Wednesday at the Institute for Quantum Computing.

I thought it would be fun here at the Perimeter Institute to repeat a talk, to give a version of a talk, that I gave in Jerusalem a year ago. To understand the talk it helps to keep in mind that it was first given in Jerusalem. I’d like to give you a broad sweep of the history of mathematical logic. I’m a mathematician who likes physicists; some mathematicians don’t like physicists. But I do. Before I became a mathematician I wanted to be a physicist.

So I’m going to talk about mathematics, and I’d like to give you a broad overview, most definitely a non-standard view of some intellectual history. It will be a talk about the history of work on the foundations of mathematics as seen from the perspective of the Middle Ages. So here goes. . .

[1] This lecture was published in Portuguese in São Paulo, Brazil, in the magazine Dicta & Contradicta, No. 4, 2009. See http://www.dicta.com.br/.

This talk = Umberto Eco + Hilbert, Gödel, Turing. . .

Outline at: http://www.cs.umaine.edu/~chaitin/hu.html

There is a wonderful book by Umberto Eco called The Search for the Perfect Language, and I recommend it highly to all of you.

In The Search for the Perfect Language you can see that Umberto Eco likes the Middle Ages; I think he probably wishes we were still there. And this book talks about a dream that Eco believes played a fundamental role in European intellectual history, which is the search for the perfect language.

What is the search for the perfect language? Nowadays a physicist would call this the search for a Theory of Everything (TOE), but in the terms in which it was formulated originally, it was the idea of finding, shall we say, the language of creation, the language before the Tower of Babel, the language that God used in creating the universe, the language whose structure directly expresses the structure of the world, the language in which concepts are expressed in their direct, original format.

You can see that this idea is a little bit like the attempt to find a foundational Theory of Everything in physics.

The crucial point is that knowing this language would be like having a key to universal knowledge. If you’re a theologian, it would bring you closer, very close, to God’s thoughts, which is dangerous. If you’re a magician, it would give you magical powers. If you’re a linguist, it would tell you the original, pure, uncorrupted language from which all languages descend. One can go on and on. . .

This very fascinating book is about the quest to find this language. If you find it, you’re opening a door to absolute knowledge, to God, to the ultimate nature of reality, to whatever.

And there are a lot of interesting chapters in this intellectual history. One of them is Raymond Lull, around 1200, a Catalan.

Raymond Lull ≈ 1200

He was a very interesting gentleman who had the idea of mechanically combining all possible concepts to get new knowledge. So you would have a wheel with different concepts on it, and another wheel with other concepts on it, and you would rotate them to get all possible combinations. This would be a systematic way to discover new concepts and new truths. And if you remember Swift’s Gulliver’s Travels, there Swift makes fun of an idea like this, in one of the parts of the book that is not for children but definitely only for adults.
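
In modern terms, rotating two wheels against each other enumerates a Cartesian product. Here is a minimal Python sketch of that idea; the concept labels on the two wheels are placeholders invented for this example, not a claim about what Lull actually wrote on his wheels.

```python
from itertools import product

# Placeholder concept labels (assumptions for illustration only).
wheel_a = ["goodness", "greatness", "eternity"]
wheel_b = ["power", "wisdom", "will"]


def combine(*wheels):
    """Return every combination that takes exactly one concept from each wheel."""
    return [" + ".join(combo) for combo in product(*wheels)]


combinations = combine(wheel_a, wheel_b)

if __name__ == "__main__":
    for line in combinations:
        print(line)
```

Two wheels with three concepts each give nine combinations, and each added wheel multiplies the count, which is why purely mechanical combination quickly produces far more candidates than anyone could examine.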

Let’s leave Lull and go on to Leibniz. In The Search for the Perfect Language there is an entire chapter on Leibniz. Leibniz is a transitional figure in the search for the perfect language. Leibniz is wonderful because he is universal. He knows all about Kabbalah, Christian Kabbalah and Jewish Kabbalah, and all kinds of hermetic and esoteric doctrines, and he knows all about alchemy, he actually ghost-authored a book on alchemy. Leibniz knows about all these things, and he knows about ancient philosophy, he knows about scholastic philosophy, and he also knows about what was then called mechanical philosophy, which was the beginning of modern science. And Leibniz sees good in all of this.

And he formulates a version of the search for the perfect language, which is firmly grounded in the magical, theological original idea, but which is also fit for consumption nowadays, that is, acceptable to modern ears, to contemporary scientists. This is a universal language he called the characteristica universalis that was supposed to come with a crucial calculus ratiocinator.

Leibniz: characteristica universalis, calculus ratiocinator

The idea, the goal, is that you would reduce reasoning to calculation, to computation, because the most certain thing is that 2 + 5 = 7. In other words, the way Leibniz put it, perhaps in one of his letters, is that if two people have an intellectual dispute, instead of dueling they could just sit down and say, “Gentlemen, let us compute!”, and get the correct answer and find out who was right.

So this is Leibniz’s version of the search for the perfect language. How far did he get with this?

Well, Leibniz is a person who gets bored easily, and flies like a butterfly from field to field, throwing out fundamental ideas, rarely taking the trouble to develop them fully.

One case of the characteristica universalis that Leibniz did develop is called the calculus. This is one case where Leibniz worked out his ideas for the perfect language in beautiful detail.

Leibniz’s version of the calculus differs from Newton’s precisely because it is part of Leibniz’s project for the characteristica universalis. Christian Huygens hated the calculus.


Christian Huygens taught Leibniz mathematics in Paris at a relatively late age, when Leibniz was in his twenties. Most mathematicians start very, very young. And Christian Huygens hated Leibniz’s calculus because he said that it was mechanical, it was brainless: Any fool can just calculate the answer by following the rules, without understanding what he or she is doing.

Huygens preferred the old, synthetic geometry proofs where you have to be creative and come up with a diagram and some particular reason for something to be true. Leibniz wanted a general method. He wanted to get the formalism, the notation, right, and have a mechanical way to get the answer.

Huygens didn’t like this, but that was precisely the point. This was precisely what Leibniz was looking for, for everything!

The idea was that if you get absolute truth, if you have found the truth, it should mechanically enable you to determine what’s going on, without creativity. This is good, this is not bad.

This is also precisely how Leibniz’s version of the calculus differed from Newton’s. Leibniz saw clearly the importance of having a formalism that led you automatically to the answer.

Let’s now take a big jump, to David Hilbert, about a century ago. . . No, first I want to tell you about an important attempt to find the perfect language: Cantor’s theory of infinite sets.

Cantor: Infinite Sets

This late 19th century theory is interesting because it’s firmly in the Middle Ages and also, in a way, the inspiration for all of 20th century mathematics.

This theory of infinite sets was actually theology. This is mathematical theology. Normally you don’t mention that fact. To be a field of mathematics, the price of admission is you throw out all the philosophy, and you just end up with something technical. So all the theology has been thrown out.

But Cantor’s goal was to understand God. God is transcendent. The theory of infinite sets has this hierarchy of bigger and bigger infinities, the alephs, the ℵ’s. You have ℵ_0, ℵ_1, the infinity of integers, of real numbers, and you keep going. Each one of these is the set of all subsets of the previous one. And very far out you get mind-boggling infinities like ℵ_ω; this is the first infinity after

ℵ_0, ℵ_1, ℵ_2, ℵ_3, ℵ_4, . . .

Then you can continue with

ω + 1, ω + 2, ω + 3, . . . 2ω + 1, 2ω + 2, 2ω + 3, . . .

These so-called ordinal numbers are subscripts for the ℵ’s, which are cardinalities. Let’s go farther:

ℵ_{ω^2}, ℵ_{ω^ω}, ℵ_{ω^{ω^ω}}, . . .

And there’s an ordinal called epsilon-nought

ε_0 = ω^{ω^{ω^{ω^{...}}}}

which is the smallest solution of the equation

x = ω^x.

And the corresponding cardinal

ℵ_{ε_0}

is pretty big!

You know, God is very far off, since God is infinite and transcendent. We can try to go in His direction. But we’re never going to get there, because after every cardinal, there’s a bigger one, the cardinality of the set of all subsets. And after any infinite sequence of cardinals that you get, you just take the union of all of that, and you get a bigger cardinal than is in the sequence. So this thing is inherently open-ended. And contradictory, by the way!

There’s only one problem. This is absolutely wonderful, breathtaking stuff. The only problem is that it’s contradictory.

The problem is very simple. If you take the universal set, the set of everything, and you consider the set of all its subsets, by Cantor’s diagonal argument this should have a bigger cardinality, but how can you have anything bigger than the set of everything?

This is the paradox that Bertrand Russell discovered. Russell looked at this and asked why do you get this bad result. And if you look at the Cantor diagonal argument proof that the set of all subsets of everything is bigger than everything, it involves the set of all sets that are not members of themselves,

{x : x ∉ x},

which can neither be in itself nor not be in itself. This is called the Russell paradox.
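The diagonal construction can be checked exhaustively on a small finite set. The following is a minimal Python sketch, not something from the talk; the names `power_set`, `diagonal_set` and `diagonal_is_ever_hit` are hypothetical. For every way f of assigning a subset of S to each element of S, it builds D = {x : x ∉ f(x)} and confirms that f never produces D. If S were the set of everything and f sent each set to itself, D would be exactly the set of all sets that are not members of themselves.

```python
from itertools import combinations, product

def power_set(S):
    """All subsets of S, as frozensets."""
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def diagonal_set(S, f):
    """D = {x in S : x not in f(x)}, the set the diagonal argument builds."""
    return frozenset(x for x in S if x not in f[x])

def diagonal_is_ever_hit(S):
    """Try every map f from S to its subsets; report whether some f reaches D."""
    subsets = power_set(S)
    for images in product(subsets, repeat=len(S)):
        f = dict(zip(S, images))
        if diagonal_set(S, f) in images:
            return True
    return False

if __name__ == "__main__":
    # With 3 elements there are 8 subsets and 8**3 = 512 maps to try.
    print(diagonal_is_ever_hit([0, 1, 2]))
```

The search always fails for the reason given in the argument itself: if f(x) were D, then x would be in D exactly when x is not in D.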

Cantor was aware of the fact that this happens, but Cantor wasn’t bothered by these contradictions, because he was doing theology. We’re finite but God is infinite, and it’s paradoxical for a finite being to try to comprehend a transcendent, infinite being, so paradoxes are okay. But the math community is not very happy with a theory which leads to contradictions.

However, these ideas are so wonderful, that what the math community has done is forget about all this theology and philosophy and try to sweep the contradictions under the rug. There is an expurgated version of all this called Zermelo-Fraenkel set theory, with the axiom of choice, usually: ZFC. This is a formal axiomatic theory which you develop using first-order logic, and it is an expurgated version of Cantor’s theory believed not to contain any paradoxes.

Anyway, Bertrand Russell was inspired by all of this to attempt a general critique of mathematical reasoning, and to find a lot of contradictions, a lot of mathematical arguments that lead to contradictions.

Bertrand Russell: mathematics is full of contradictions.

I already told you about his most famous one, the Russell paradox.

Russell was an atheist who was searching for the absolute, who believed in absolute truth. And he loved mathematics and wanted mathematics to be perfect. Russell went around telling people about these contradictions in order to try to get them fixed.

Besides the paradox that there’s no biggest cardinal and that the set of subsets of everything is bigger than everything, there’s also a problem with the ordinal numbers that’s called the Burali-Forti paradox, namely that the set of all the ordinals is an ordinal that’s bigger than all the ordinals. This works because each ordinal can be defined as the set of all the ordinals that are smaller than it is. (Then an ordinal is less than another ordinal if and only if it is contained in it.)

Russell is going around telling people that reason leads to contradictions. So David Hilbert about a century ago proposes a program to put mathematics on a firm foundation. And basically what Hilbert proposes is the idea of a completely formal axiomatic theory, which is a modern version of Leibniz’s characteristica universalis and calculus ratiocinator:

David Hilbert: mathematics is a formal axiomatic theory.


This is the idea of making mathematics totally objective, of removing all the subjective elements.

So in such a formal axiomatic theory you would have a finite number of axioms, axioms that are not written in an ambiguous natural language. Instead you use a precise artificial language with a simple, regular artificial grammar. You use mathematical logic, not informal reasoning, and you specify the rules of the game completely precisely. It should be mechanical to decide whether a proof is correct.

Hilbert was a conservative. He believed that mathematics gives absolute truth, which is an idea from the Middle Ages. You can see the Middle Ages whenever you mention absolute truth. Nevertheless, modern mathematicians remain enamored with absolute truth. As Gödel said, we pure mathematicians are the last holdout of the Middle Ages. We still believe in the Platonic world of ideas, at least mathematical ideas, when everyone else, including philosophers, now laughs at this notion. But pure mathematicians live in the Platonic world of ideas, even though everyone else stopped believing in this a long time ago.

So math gives absolute truth, said Hilbert. Every mathematician somewhere deep inside believes this. Then there ought to exist a finite set of axioms, and a precise set of rules for deduction, for inference, such that all of mathematical truth is a consequence of these axioms. You see, if mathematical truth is black or white, and purely objective, then if you fill in all the steps in a proof and carefully use an artificial language to avoid ambiguity, you should be able to have a finite set of axioms we can all agree on, that in principle enable you to deduce all of mathematical truth. This is just the notion that mathematics provides absolute certainty; Hilbert is analyzing what this means.

What Hilbert says is that the traditional view that mathematics provides absolute certainty, that in the Platonic world of pure mathematics everything is black or white, means that there should be a single formal axiomatic theory for all of math. That was a very important idea of his.

An important consequence of this idea goes back to the Middle Ages. This perfect language for mathematics, which is what Hilbert was looking for, would in fact give a key to absolute knowledge, because in principle you could mechanically deduce all the theorems from the axioms, simply by running through the tree of all possible proofs. You start with the axioms, then you apply the rules of inference once, and get all the theorems that have one-step proofs, you apply them two times, and you get all the theorems that have two-step proofs, and like that, totally mechanically, you would get all of mathematical truth, by systematically traversing the tree of all possible proofs.
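The level-by-level traversal of the tree of proofs can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "formal system" here is a made-up toy whose theorems are strings, with one hypothetical axiom `"A"` and two hypothetical rules of inference (append `B`, or double the string). The names `one_step` and `theorems_by_proof_length` are likewise invented for the sketch.

```python
def one_step(theorem):
    """Hypothetical rules of inference for a toy system of strings."""
    return {theorem + "B", theorem + theorem}

def theorems_by_proof_length(axioms, max_steps):
    """Yield (k, theorems first reached by a k-step proof), level by level."""
    known = set(axioms)
    frontier = set(axioms)
    yield 0, sorted(frontier)
    for k in range(1, max_steps + 1):
        derived = set()
        for t in frontier:
            derived |= one_step(t)      # apply every rule to every new theorem
        frontier = derived - known      # keep only theorems not seen before
        known |= frontier
        yield k, sorted(frontier)

if __name__ == "__main__":
    for k, new_theorems in theorems_by_proof_length(["A"], 3):
        print(k, new_theorems)
```

Even in this toy the number of theorems per level grows quickly, which is the practical objection raised next: the enumeration is mechanical, but the interesting theorems are buried.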

This would not put all mathematicians out of work, not at all. In practice this process would take an outrageous amount of time to get to interesting results, and all the interesting theorems would be overwhelmed by uninteresting theorems, such as the fact that 1 + 1 = 2 and other trivialities.

It would be hard to find the interesting theorems and to separate the wheat from the chaff. But in principle this would give you all mathematical truths. You wouldn’t actually do it, but it would show that math gives absolute certainty.

By the way, it was important to make all mathematicians agree on the choice of formal axiomatic theory, and you would use metamathematics to try to convince everyone that this formal axiomatic theory avoids all the paradoxes that Bertrand Russell had noticed and contains no contradictions.

Okay, so this was the idea of putting mathematics on a firm foundation and removing all doubts. This was Hilbert’s idea, about a century ago, and metamathematics studies a formal axiomatic theory from the outside, and notice that this is a door to absolute truth, following the notion of the perfect language.

So what happens with this program, with this proposal of Hilbert’s? Well, there’s some good news and some bad news. Some of the good news I already mentioned: The thing that comes the closest to what Hilbert asked for is Zermelo-Fraenkel set theory, and it is a beautiful axiomatic theory. I want to mention some of the milestones in the development of this theory.

One of them is the von Neumann integers, so let me tell you about that. Remember that Spinoza has a philosophical system in which the world is built out of only one substance, and that substance is God, that’s all there is. Zermelo-Fraenkel set theory is similar. Everything is sets, and every set is built out of the empty set. That’s all there is: the empty set, and sets built starting with the empty set.

So zero is the empty set, that’s the first von Neumann integer, and in general n + 1 is defined to be the set of all integers less than or equal to n:

von Neumann integers: 0 = {}, n + 1 = {0, 1, 2, . . . , n}.

So if you write this out in full, removing all the abbreviations, all you have are curly braces, you have set formation starting with no content, and the full notation for n grows exponentially in n, if you write it all out, because everything up to that point is repeated in the next number. In spite of this exponential growth, this is a beautiful conceptual scheme.
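The exponential growth is easy to see by actually writing the braces out. Here is a small Python sketch; the function name `von_neumann` is hypothetical, and the choice to write elements side by side with no commas is an assumption made so that the lengths come out as exact powers of two.

```python
def von_neumann(n: int) -> str:
    """Fully expanded brace notation for the von Neumann integer n."""
    parts = []  # expanded notations of 0, 1, ..., in order
    for _ in range(n):
        # the next integer is the set of all the integers built so far
        parts.append("{" + "".join(parts) + "}")
    return "{" + "".join(parts) + "}"

if __name__ == "__main__":
    for n in range(6):
        s = von_neumann(n)
        shown = s if len(s) <= 32 else s[:29] + "..."
        print(n, len(s), shown)
```

With this convention the notation for n + 1 consists of an outer pair of braces around the notations of 0 through n, so each length is twice the previous one and the notation for n has 2^(n+1) characters.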

Then you can define rational numbers as pairs of these integers, you can define real numbers as limit sequences of rationals, and you get all of mathematics, starting just with the empty set. So it’s a lovely piece of ontology. Here’s all of mathematical creation just built out of the empty set.

And other people who worked on this are of course Fraenkel and Zermelo, because it is called Zermelo-Fraenkel set theory, and an approximate notion of what they did was to try to avoid sets that are too big. The universal set is too big, it gets you into trouble. Not every property determines a set.

So this is a formal theory that most mathematicians believe enables you to carry out all the arguments that normally appear in mathematics, maybe if you don’t include category theory, which is very difficult to formalize, and even more paradoxical than set theory, from what I hear.

Okay, so that’s some of the positive work on Hilbert’s program. Now some of the negative work on Hilbert’s program (I’d like to tell you about it, you’ve all heard of it) is of course Gödel in 1931 and Turing in 1936.

Gödel, 1931; Turing, 1936

What they show is that you can’t have a perfect language for mathematics, you cannot have a formal axiomatic theory like Hilbert wanted for all of mathematics, because of incompleteness, because no such system will include all of mathematical truth, it will always leave out truths, it will always be incomplete.

And this is Gödel’s incompleteness theorem of 1931, and Gödel’s original proof is very strange. It’s basically the paradox of “this statement is false,”

“This statement is false!”

which is a paradox of course because it can be neither true nor false. If it’s false that it’s false, then it’s true, and if it’s true that it’s false, then it’s false. That’s just a paradox. But what Gödel does is say “this statement is unprovable.”

“This statement is unprovable!”

So if the statement says of itself it’s unprovable, there are two possibilities: it’s provable, or it isn’t.

If it’s provable, then we’re proving something that’s false, because it says it’s unprovable. So we hope that’s not the case; by hypothesis, we’ll eliminate that possibility. If we prove things that are false, we have a formal axiomatic theory that we’re not interested in, because it proves false things.

The only possibility left is that it’s unprovable. But if it’s unprovable then it’s true, because it asserts it’s unprovable, therefore there’s a hole. We haven’t captured all of mathematical truth in our theory.

This proof of incompleteness shocks a lot of people, but my personal reaction to it is, okay, it’s correct, but I don’t like it.

A better proof of incompleteness, a deeper proof, comes from Turing in 1936. He derives incompleteness from a more fundamental phenomenon, which is uncomputability, the discovery that mathematics is full of stuff that can’t be calculated, of things you can define, but which you cannot calculate, because there’s no algorithm.

Uncomputability ⇒ Incompleteness

And in particular, the uncomputable thing that he discovers is the halting problem, a very simple question: Does a computer program that’s self-contained halt or does it go on forever? There is no algorithm to answer this in every individual case, therefore there is no formal axiomatic theory that enables you to always prove in individual cases what the answer is.

Why not? Because if there were a formal axiomatic theory that’s complete for the halting problem, that would give you a mechanical procedure for deciding, by running through the tree of all possible proofs, until you find a proof that an individual program you’re interested in halts, or you find a proof that it doesn’t. But that’s impossible because this is not a computable function.
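Why halting cannot be computed is usually shown with a diagonal construction, which the talk does not spell out. The Python sketch below is one hedged way to picture it: `halts` stands for any proposed decider (assumed deterministic, and assumed to take a zero-argument Python function as "the program"), and `make_diagonal` and `refute` are hypothetical helper names. Whatever the decider answers about the diagonal program, that program does the opposite.

```python
def make_diagonal(halts):
    """Build a program that does the opposite of what `halts` predicts for it."""
    def diagonal():
        if halts(diagonal):
            while True:       # predicted to halt, so run forever instead
                pass
        return "halted"       # predicted to run forever, so halt at once
    return diagonal

def refute(halts):
    """Describe how the candidate decider `halts` fails on its diagonal program."""
    diagonal = make_diagonal(halts)
    if halts(diagonal):
        # We do not run it: by construction it would never return.
        return "decider said 'halts', but the diagonal program loops forever"
    diagonal()                # safe to run: it returns immediately
    return "decider said 'loops', but the diagonal program halted"

if __name__ == "__main__":
    print(refute(lambda program: True))
    print(refute(lambda program: False))
```

The two lambdas are deliberately trivial deciders; the same construction defeats any other candidate, which is the sense in which no algorithm answers the halting question in every case.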

So Turing’s insight in 1936 is that incompleteness, that Gödel found in 1931, for any formal axiomatic theory, comes from a deeper phenomenon, which is uncomputability. Incompleteness is an immediate corollary of uncomputability, a concept which does not appear in Gödel’s 1931 paper.

But Turing’s paper has both good and bad aspects. There’s a negative aspect of his 1936 paper, which I’ve just told you about, but there’s also a positive aspect. You get another proof, a deeper proof of incompleteness, but you also get a kind of completeness. You find a perfect language.

There is no perfect language for mathematical reasoning. Gödel showed that in 1931, and Turing showed it again in 1936. But what Turing also showed in 1936 is that there are perfect languages, not for mathematical reasoning, but for computation, for specifying algorithms.

What Turing discovers in 1936 is that there’s a kind of completeness called universality and that there are universal Turing machines and universal programming languages.

Universal Turing Machines / Programming Languages

What “universal” means, what a universal programming language or a universal Turing machine is, is a language in which every possible algorithm can be written.

So on the one hand, Turing shows us in a deeper way that any language for mathematical reasoning has to be incomplete, but on the other hand, he shows us that languages for computation can be universal, which is just another name, a synonym, for completeness.

There are perfect languages for computation, for writing algorithms, even though there aren’t any perfect languages for mathematical reasoning. This is the positive side, this is the completeness side, of Turing’s 1936 paper.

Now, what I’ve spent most of my professional life on, is a subject I call algorithmic information theory

Algorithmic Information Theory (AIT)

that derives incompleteness from uncomputability by taking advantage of a deeper phenomenon, by considering an extreme form of uncomputability, which is called algorithmic randomness or algorithmic irreducibility.

AIT: algorithmic randomness, algorithmic irreducibility

There’s a perfect language again, and there’s also a negative side, the halting probability Ω, whose bits are algorithmically random, algorithmically irreducible mathematical truths.

Ω = .010010111 . . .

This is a place in pure mathematics where there’s no structure. If you want to know the bits of the numerical value of the halting probability, this is a well-defined mathematical question, and in the world of mathematics all truths are necessary truths, but these look like accidental, contingent truths. They look random, they have irreducible complexity.

This is a maximal case of uncomputability, this is a place in pure mathematics where there’s absolutely no structure at all. Although it is true that you can in a few cases actually know some of the first bits. . .

There are actually an infinite number of halting probabilities depending on your choice of programming language. After you choose a language, then you ask what is the probability that a program generated by coin tossing will eventually halt. And that gives you a different halting probability. The numerical value will be different; the paradoxical properties are the same.

Okay, there are cases for which you can get a few of the first bits. For example, if Ω starts with 1s in binary or 9s in decimal, you can know those bits or digits, if Ω is .11111. . . base two or .99999. . . base ten. So you can get a finite number of bits, perhaps, of the numerical value, but if you have an N-bit formal axiomatic theory, then you can’t get more than N bits of Ω. That’s sort of the general result. It’s irreducible logically and computationally. It’s irreducible mathematical information. It’s a perfect simulation in pure math, where all truths are necessary, of contingent, accidental, maximal entropy truths.

So that’s the bad news from AIT. But just like in Turing’s 1936 work, there is a positive side. On the one hand we have maximal uncomputability, maximal entropy, total lack of structure, of any redundancy, in an information-theoretic sense, but there’s also good news.

AIT, the theory of program-size complexity, the theory where Ω is the crown jewel, goes further than Turing, and picks out from Turing’s universal Turing machines, from Turing’s universal languages, maximally expressive programming languages. Because those are the ones that you have to use to develop this theory where you get to Ω.

AIT has the notion of a maximally expressive programming language in which programs are maximally compact, and deals with a very basic complexity concept which is the size of the smallest program to calculate something:

H(x) is the size in bits of the smallest program to calculate x.
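H(x) itself cannot be computed, but any concrete way of describing x gives an upper bound on it. As a rough, hedged illustration only, the Python sketch below uses zlib as a stand-in: the compressed data plus a fixed decompressor is one particular program for x, so its size bounds H(x) from above up to an additive constant. zlib is not a maximally expressive language in the sense of AIT, and the function name `description_length_bits` is hypothetical.

```python
import random
import zlib

def description_length_bits(data: bytes) -> int:
    """Bits in a zlib-compressed description of data (a crude upper bound, not H)."""
    return 8 * len(zlib.compress(data, 9))

if __name__ == "__main__":
    patterned = b"01" * 5000                      # 10000 bytes with an obvious pattern
    rng = random.Random(0)                        # fixed seed so runs are repeatable
    random_like = bytes(rng.getrandbits(8) for _ in range(10000))
    for name, data in [("patterned", patterned), ("random-like", random_like)]:
        print(name, 8 * len(data), "bits raw,", description_length_bits(data), "bits compressed")
```

The patterned string shrinks to a small fraction of its raw size, while the pseudo-random one does not shrink at all under zlib. That second case is only suggestive: the pseudo-random bytes come from a short program and a seed, so their true program-size complexity is small even though this compressor cannot see the pattern.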

And we now have a better notion of perfection. The perfect languages that Turing found, the universal programming languages, are not all equally good. We now concentrate on a subset, the ones that enable us to write the most concise programs. These are the most expressive languages, the ones with the smallest programs.

Now let me tell you, this definition of complexity is a dry, technical way of expressing this idea in modern terms. But let me put this into Medieval terminology, which is much more colorful. The notion of program-size complexity (which by the way has many different names: algorithmic complexity, Kolmogorov complexity, algorithmic information content) in Medieval terms, what we’re asking is, how many yes/no decisions did God have to make to create something?, which is obviously a rather basic question to ask. That is, if you consider that God is calculating the universe.

I’m giving you a Medieval perspective on these modern developments. Theology is the fundamental physics, it’s the theoretical physics of the Middle Ages.

I have a lot of time left (I’ve been racing through this material), so maybe I should explain in more detail how AIT contributes to the quest for the perfect language.

The notion of universal Turing machine that is used in AIT is Turing’s very basic idea of a flexible machine. It’s flexible hardware, which we call software. In a way, Turing in 1936 creates the computer industry and computer technology. That’s a tremendous benefit of a paper that mathematically sounds at first rather negative, since it talks about things that cannot be calculated, that cannot be proved. But on the other hand there’s a very positive aspect (I stated it in theoretical terms), which is that programming languages can be complete, can be universal, even though formal axiomatic theories cannot be complete.

Okay, so you get this technology, there’s this notion of a flexible machine, this notion of software, which emerges in this paper. Von Neumann, the same von Neumann who invented the von Neumann integers, credited all of this to Turing. At least Turing is responsible for the concept; the hardware implementation is another matter.

Now, AIT, where you talk about program-size complexity, the size of the smallest program, how many yes/no decisions God has to make to calculate something, to create something, picks out a particular class of universal Turing machines U.

What are the universal computers U like that you use to define program-size complexity and talk about Ω? Well, a universal computer U has the property that for any other computer C and its program p, your universal computer U will calculate the same result if you give it the original program p for C concatenated to a prefix π_C which depends only on the computer C that you want to simulate. π_C tells U which computer to simulate. In symbols,

U(π_C p) = C(p).

In other words, π_C p is the concatenation of two pieces of information. It’s a binary string. You take the original program p, which is also a binary string, and in front of it you put a prefix that tells you which computer to simulate.

Which means that these programs π_C p for U are only a fixed number of bits larger than the programs p for any individual machine C.

These U are the universal Turing machines that you use in AIT. These are the most expressive languages. These are the languages with maximal expressive power. These are the languages in which programs are as concise as possible. This is how you define program-size complexity. God will naturally use the most perfect, most powerful programming languages, when he creates the world, to build everything.

I should point out that Turing’s original universality concept was not careful about counting bits; it didn’t really care about the size of programs. All a universal machine U had to do was to be able to simulate any other machine C, but one did not study the size of the program for U as a function of the size of the program for C. Here we are careful not to waste bits.

AIT is concerned with particularly efficient ways for U to be universal. The original notion of universality in Turing was not this demanding.

The fact that you can just add a fixed number of bits to a program for C to get one for U is not completely trivial. Let me tell you why.

After you put π_C and p together, you have to know where the prefix ends and the program that is being simulated begins. There are many ways to do this.

A very simple way to make the prefix π_C self-delimiting is to have it be a sequence of 0’s followed by a 1:

π_C = 0^k 1.

And the number k of 0’s tells us which machine C to simulate. That’s a very wasteful way to indicate this.
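The relation U(π_C p) = C(p) with the prefix 0^k 1 can be mimicked in a toy setting. The Python sketch below is an illustration under strong assumptions: the three "computers" `echo`, `reverse` and `flip` are made-up string functions, not real Turing machines, and `MACHINES`, `prefix_for` and `U` are hypothetical names. The point is only the bookkeeping: U counts leading 0s to pick machine number k, then hands the rest of the bits to it unchanged.

```python
def echo(p: str) -> str:        # hypothetical machine 0: outputs its program
    return p

def reverse(p: str) -> str:     # hypothetical machine 1: outputs the program reversed
    return p[::-1]

def flip(p: str) -> str:        # hypothetical machine 2: swaps 0s and 1s
    return "".join("1" if b == "0" else "0" for b in p)

MACHINES = [echo, reverse, flip]

def prefix_for(k: int) -> str:
    """The self-delimiting prefix 0^k 1 that names machine number k."""
    return "0" * k + "1"

def U(program: str) -> str:
    """Toy universal machine: read 0^k 1, then run machine k on the remaining bits."""
    k = program.index("1")      # number of leading 0s
    return MACHINES[k](program[k + 1:])

if __name__ == "__main__":
    p = "1011"
    for k, C in enumerate(MACHINES):
        print(k, prefix_for(k) + p, U(prefix_for(k) + p), C(p))
```

The overhead is k + 1 bits for machine number k, independent of p, which is the fixed-number-of-bits property. It is wasteful because writing k in binary would need only about log_2 k bits.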

The prefix π_C is actually an interpreter for the programming language C. AIT’s universal languages U have the property that you give U an interpreter plus the program p in this other language C, and U will run the interpreter to see what p does.

If you think of this interpreter π_C as an arbitrary string of bits, one way to make it self-delimiting is to just double all the bits. 0 goes to 00, 1 goes to 11, and you put a pair of unequal bits 01 as punctuation at the end:

Arbitrary π_C: 0 → 00, 1 → 11, 01 at the end.

This is a better way to have a self-delimiting prefix that you can concatenate with p. It only doubles the size, whereas the 0^k 1 trick increases the size exponentially.
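The bit-doubling scheme is simple enough to write down directly. In this Python sketch the function names `encode_doubled` and `decode_doubled` are hypothetical; the decoder reads pairs of bits until it meets the unequal pair 01 and returns both the recovered prefix and whatever follows it, which plays the role of p.

```python
def encode_doubled(bits: str) -> str:
    """Double every bit (0 -> 00, 1 -> 11) and close with the punctuation 01."""
    return "".join(b + b for b in bits) + "01"

def decode_doubled(stream: str):
    """Read one self-delimiting prefix from the front of stream; return (prefix, rest)."""
    payload = []
    i = 0
    while True:
        pair = stream[i:i + 2]
        i += 2
        if pair == "01":                      # punctuation: the prefix ends here
            return "".join(payload), stream[i:]
        if pair not in ("00", "11"):          # covers a truncated stream as well
            raise ValueError("malformed self-delimiting prefix")
        payload.append(pair[0])

if __name__ == "__main__":
    prefix, p = "101", "0110"
    stream = encode_doubled(prefix) + p
    print(stream, decode_doubled(stream))
```

An n-bit prefix costs 2n + 2 bits this way, and no blank or other end marker outside the alphabet {0, 1} is needed to find where it stops.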

And there are more efficient ways to make the prefix self-delimiting. For example, you can put the size of the prefix in front of the prefix. But it’s sort of like Russian dolls, because if you put the size |π_C| of π_C in front of π_C, |π_C| also has to be self-delimiting:

U(. . . ||π_C|| |π_C| π_C p) = C(p).
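One level of the Russian doll can be sketched as follows, as an assumption-laden illustration rather than the scheme used in AIT: write the length of the prefix in binary, make that short header self-delimiting by bit doubling with 01 as punctuation, and then append the prefix itself undoubled. The names `encode_with_length` and `decode_with_length` are hypothetical.

```python
def encode_with_length(bits: str) -> str:
    """Self-delimiting code: doubled binary length, the marker 01, then the raw bits."""
    header = format(len(bits), "b")                  # length written in binary
    return "".join(b + b for b in header) + "01" + bits

def decode_with_length(stream: str):
    """Read one length-prefixed item from the front of stream; return (item, rest)."""
    header = []
    i = 0
    while stream[i:i + 2] != "01":
        pair = stream[i:i + 2]
        if pair not in ("00", "11"):                 # also catches a truncated header
            raise ValueError("malformed length header")
        header.append(pair[0])
        i += 2
    i += 2                                           # skip the 01 marker
    n = int("".join(header), 2)
    if len(stream) < i + n:
        raise ValueError("stream shorter than announced length")
    return stream[i:i + n], stream[i + n:]

if __name__ == "__main__":
    stream = encode_with_length("101") + "0110"
    print(stream, decode_with_length(stream))
    print(len(encode_with_length("1" * 1000)))
```

For an n-bit prefix the cost is now about n + 2 log_2 n + 2 bits instead of 2n + 2. Nesting the idea again, by giving the header its own length header, trims the overhead further, which is where the sequence of dolls comes from.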

Anyway, picking U this way is the key idea in the original 1960s version of AIT that Solomonoff, Kolmogorov and I independently proposed. But ten years later I realized that this is not the right approach. You actually want the whole program π_C p for U to be self-delimiting, not just the prefix π_C. You want the whole thing to be self-delimiting to get the right theory of program-size complexity.

Let me compare the 1960s version of AIT and the 1970s version of AIT. Let me compare these two different theories of program-size complexity.

In the 1960s version, an N-bit string will in general need an N-bit program, if it’s irreducible, and most strings are algorithmically irreducible. Most N-bit strings need an N-bit program. These are the irreducible strings, the ones that have no pattern, no structure. Most N-bit strings need an N-bit program, because there aren’t enough smaller programs.
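The phrase "there aren’t enough smaller programs" is a counting argument, which can be written out as follows. This is a standard sketch for the 1960s setting, where any bit string may serve as a program; the parameter k (how many bits a string is compressed by) is introduced here for illustration.

```latex
% Number of programs shorter than N - k bits:
\[
  \sum_{j=0}^{N-k-1} 2^{j} \;=\; 2^{N-k} - 1 \;<\; 2^{N-k}.
\]
% Each program outputs at most one string, so among the 2^N strings of N bits,
% the fraction that can be compressed by more than k bits is
\[
  \frac{\#\{\, x : |x| = N,\ H(x) < N - k \,\}}{2^{N}}
  \;<\; \frac{2^{N-k}}{2^{N}} \;=\; 2^{-k}.
\]
```

So fewer than one string in a thousand can be compressed by more than 10 bits, whatever N is.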

But in the 1970s version of AIT, you go from N bits to N + log_2 N bits, because you want to make the programs self-delimiting. An N-bit string will usually need an N + log_2 N bit program:

Most N-bit strings:
AIT1960: N bits of complexity,
AIT1970: N + log_2 N bits of complexity.

Actually, in AIT1970 it’s N plus H(N), which is the size of the smallest self-delimiting program to calculate N, that’s exactly what that logarithmic term is. In other words, in the 1970s version of AIT, the size of the smallest program for calculating an N-bit string is usually N bits plus the size in bits of the smallest self-delimiting program to calculate N, which is roughly

log N + log log N + log log log N + . . .

bits long. That’s the Russian dolls aspect of this.

The 1970s version of AIT, which takes the idea of being self-delimiting from the prefix and applies it to the whole program, gives us even better perfect languages. AIT evolved in two stages. First we concentrate on those U with

U(π_C p) = C(p)

with π_C self-delimiting, and then we insist that the whole thing π_C p has also got to be self-delimiting. And when you do that, you get important new results, such as the sub-additivity of program-size complexity,

H(x, y) ≤ H(x) + H(y),

which is not the case if you don’t make everything self-delimiting. This just says that you can concatenate the smallest program for calculating x and the smallest program for calculating y to get a program for calculating x and y.

And you can’t even define the halting probability Ω in AIT1960. If you allow all N-bit strings to be programs, then you cannot define the halting probability in a natural way, because the sum for defining the probability that a program will halt

Ω = ∑_{p halts} 2^{−(size in bits of p)}

diverges to infinity instead of being between zero and one. This is the key technical point in AIT.

I want the halting probability to be finite. The normal way of thinking about programs is that there are 2^N N-bit programs, and the natural way of defining the halting probability is that every N-bit program that halts contributes 1/2^N to the halting probability. The only problem is that for any fixed size N there are roughly order of 2^N programs that halt, so if you sum over all possible sizes, you get infinity, which is no good.
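The divergence can be made explicit in one line. The sketch below reads "roughly order of 2^N programs that halt" as the assumption that, for every size N, at least c 2^N of the N-bit programs halt for some fixed constant c > 0; that constant is introduced here for illustration and does not appear in the talk.

```latex
% |p| denotes the size in bits of the program p.
\[
  \sum_{p \text{ halts}} 2^{-|p|}
  \;=\; \sum_{N} \#\{\, p : |p| = N,\ p \text{ halts} \,\}\; 2^{-N}
  \;\ge\; \sum_{N} c\, 2^{N} \cdot 2^{-N}
  \;=\; \sum_{N} c \;=\; \infty .
\]
```

Each program size contributes at least the same fixed amount c, so the total cannot stay below one.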

In order to get the halting probability to be between zero and one

0 < Ω = ∑_{p halts} 2^{−(size in bits of p)} < 1

you have to be sure that the total probability summed over all programs p is less than or equal to one. This happens automatically if we force p to be self-delimiting. How can we do this? Easy! Pretend that you are the universal computer U. As you read the program bit by bit, you have to be able to decide by yourself where the program ends, without any special punctuation, such as a blank, at the end of the program.

This implies that no extension of a valid program is a valid program, and that the set of valid programs is what’s called a prefix-free set. Then the fact that the sum that defines Ω must be between zero and one, is just a special case of what’s called the Kraft inequality in Shannon information theory.
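The Kraft inequality is easy to check numerically on small examples. In the Python sketch below, the helper names `is_prefix_free` and `kraft_sum` are hypothetical, and the two sets of "programs" are chosen purely for illustration: the words 0^k 1, which form a prefix-free set, and all bit strings of length 1 to 4, which do not.

```python
from fractions import Fraction
from itertools import product

def is_prefix_free(codes):
    """True if no string in codes is a proper prefix of another string in codes."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def kraft_sum(codes):
    """Sum of 2^(-length) over the given strings, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(c)) for c in codes)

if __name__ == "__main__":
    self_delimiting = ["1", "01", "001", "0001"]      # the words 0^k 1
    all_strings = ["".join(t) for n in range(1, 5) for t in product("01", repeat=n)]
    for name, codes in [("0^k 1 words", self_delimiting), ("all strings", all_strings)]:
        print(name, is_prefix_free(codes), kraft_sum(codes))
```

For the prefix-free set the sum stays at or below one however many words are added, while allowing every string of each length adds a full unit per length, mirroring the divergence that self-delimiting programs avoid.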

But this technical machinery isn’t necessary. That 0 < Ω < 1 follows immediately from the fact that as you read the program bit by bit you are forced to decide where to stop without seeing any special punctuation. In other words, in AIT1960 we were actually using a three-symbol alphabet for programs: 0, 1 and blank. The blank told us where a program ends. But that’s a symbol that you’re wasting, because you use it very little. As you all know, if you have a three-symbol alphabet, then the right way to use it is to use each symbol roughly one-third of the time.

So if you really use only 0s and 1s, then you have to force the Turing machine to decide by itself where the program ends. You don’t put a blank at the end to indicate that.

So programs go from N bits in size to $N + \log_2 N$ bits, because you’ve got to indicate in each program how big it is. On the other hand, you can just take subroutines and concatenate them to make a bigger program, so program-size complexity becomes sub-additive. You run the universal machine U to calculate the first object x, and then you run it again to calculate the second object y, and then you’ve got x and y, and so

$$H(x, y) \le H(x) + H(y).$$
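One simple way to make a bit string self-delimiting is to put a length header in front of it. The sketch below is an illustration, not the encoding used in AIT: the names `encode` and `decode` are hypothetical, and this particular header costs about $2 \log_2 N + 1$ extra bits rather than the roughly $\log_2 N$ quoted above (more careful headers get closer to that). What it does show is that two encoded programs can be concatenated and still be read back without any blank between them.

```python
def encode(payload: str) -> str:
    """Prefix a bit string with its own length so a reader knows where it ends."""
    length_bits = bin(len(payload))[2:]      # length in binary, e.g. 5 -> "101"
    # Unary header: one '1' per bit of the length field, closed by a '0'.
    header = "1" * len(length_bits) + "0"
    return header + length_bits + payload


def decode(stream: str) -> tuple[str, str]:
    """Read one self-delimiting string from the front of `stream`.

    Returns the payload and whatever follows it in the stream.
    """
    k = stream.index("0")                    # number of bits in the length field
    length = int(stream[k + 1 : 2 * k + 1], 2)
    start = 2 * k + 1
    return stream[start : start + length], stream[start + length :]


# Two "programs" concatenated with no separator between them.
x, y = "10110", "0011"
stream = encode(x) + encode(y)
first, rest = decode(stream)
second, leftover = decode(rest)
print(first, second, repr(leftover))
```

Because the reader recovers `x` and then `y` from the concatenation on its own, a program for the pair can be built from a program for each, which is the idea behind the sub-additivity inequality.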

These self-delimiting binary languages are the ones that the study of program-size complexity has led us to discriminate as the ideal languages, the most perfect languages. We got to them in two stages, AIT1960 and AIT1970. These are languages for computation, for expressing algorithms, not for mathematical reasoning. They are universal programming languages that are maximally expressive, maximally concise. We already knew how to do that in the 1960s, but in the 1970s we realized that programs should be self-delimiting, which made it possible to define the halting probability Ω.

Okay, so that’s the story, and now maybe I should summarize all of this, this saga of the quest for the perfect language. As I said, the search for the perfect language has some negative conclusions and some positive conclusions.


Hilbert wanted to find a perfect language giving all of mathematical truth, all mathematical knowledge; he wanted a formal axiomatic theory for all of mathematics. This was supposed to be a Theory of Everything for the world of pure math. And this cannot succeed, because we know that every formal axiomatic theory is incomplete, as shown by Gödel, by Turing, and by my halting probability Ω. Instead of finding a perfect language, a perfect formal axiomatic theory, we found incompleteness, uncomputability, and even algorithmic irreducibility and algorithmic randomness.

So that’s the negative side of this story, which is fascinating from an epistemological point of view, because we found limits to what we can know, we found limits of formal reasoning.

Now interestingly enough, the mathematical community couldn’t care less. They still want absolute truth! They still believe in absolute truth, and that mathematics gives absolute truth. And if you want a proof of this, just go to the December 2008 issue of the Notices of the American Mathematical Society. That’s a special issue of the Notices devoted to formal proof.

The technology has been developed to the point where they can run real mathematics, real proofs, through proof-checkers, and get them checked. A mathematician writes the proof out in a formal language, and fills in the missing steps and makes corrections until the proof-checker can understand the whole thing and verify that it is correct. And these proof-checkers are getting smarter and smarter, so that more and more of the details can be left out. As the technology improves, the job of formalizing a proof becomes easier and easier.

The formal-proof extremists are saying that in the future all mathematics will have to be written out formally and verified by proof-checkers.

The engineering has been worked out to the point that you can formally prove real mathematical results and run them through proof-checkers for verification. For example, this has been done with the proof of the four-color conjecture. It was written out as a formal proof that was run through a proof-checker.

And the position of these extremists is that in the future all mathematics will have to be written out in a formal language, and you will have to get it checked before submitting a paper to a human referee, who will then only have to decide if the proof is worth publishing, not whether the proof is correct. And they want a repository of all mathematical knowledge, which would be a database of checked formal proofs of theorems.

This is a substantial community, and to learn more, go to the December 2008 AMS Notices, which is available on the web for free on the AMS website. This is being worked on by a sizeable community, and the Notices devoted a special issue to it, which means that mathematicians still believe in absolute truth.

I’m not disparaging this extremely interesting work, but I am saying that there’s a wonderful intellectual tension between it and the incompleteness results that I’ve discussed in this talk. There’s a wonderful intellectual tension between incompleteness and the fact that people still believe in formal proof and absolute truth. People still want to go ahead and carry out Hilbert’s program and actually formalize everything, just as if Gödel and Turing had never happened!

I think this is an extremely interesting and, at least for me, a quite unexpected development.

These were the negative conclusions from this saga. Now I want to wrap this talk up by summarizing the positive conclusions.

There are perfect languages, for computing, not for reasoning. They’re computer programming languages. And we have universal Turing machines and universal programming languages, and although languages for reasoning cannot be complete, these universal programming languages are complete. Furthermore, AIT has picked out the most expressive programming languages, the ones that are particularly good to use for a theory of program-size complexity.

So there is a substantial practical spinoff. Furthermore, since I’ve worked most of my professional career on AIT, I view AIT as a substantial contribution to the search for the perfect language, because it gives us a measure of expressive power, and of conceptual complexity and the complexity of ideas. Remember, I said that from the perspective of the Middle Ages, that’s how many yes/no decisions God had to make to create something, which obviously He will do in an optimum manner.²

From the theoretical side, however, this quest was disappointing due to Gödel incompleteness and because there is no Theory of Everything for pure math. Provably there is no TOE for pure math. In fact, if you look at the bits of the halting probability Ω, they show that pure mathematics contains infinite irreducible complexity, and in this precise sense is more like biology, the domain of the complex, than like theoretical physics, where there is still hope of finding a simple, elegant TOE.³

² Note that program-size complexity = size of smallest name for something.

So this is the negative side of the story, unless you’re a biologist. The positive side is we get this marvelous programming technology. So this dream, the search for the perfect language and for absolute knowledge, ended in the bowels of a computer, it ended in a Golem.

In fact, let me end with a Medieval perspective on this. How would all this look to someone from the Middle Ages? This quest, the search for the perfect language, was an attempt to obtain magical, God-like powers.

Let’s bring someone from the 1200s here and show them a notebook computer. You have this dead machine, it’s a machine, it’s a physical object, and when you put software into it, all of a sudden it comes to life!

So from the perspective of the Middle Ages, I would say that the perfect languages that we’ve found have given us some magical, God-like powers, which is that we can breathe life into some inanimate matter. Observe that hardware is analogous to the body, and software is analogous to the soul, and when you put software into a computer, this inanimate object comes to life and creates virtual worlds.

So from the perspective of somebody from the year 1200, the search for the perfect language has been successful and has given us some magical, God-like abilities, except that we take them entirely for granted.

Thanks very much!⁴

³ Incompleteness can be considered good rather than bad: It shows that mathematics is creative, not mechanical.

⁴ Twenty minutes of questions and discussion followed. These have not been transcribed, but are available via digital streaming video at http://pirsa.org/09090007/.

Chapter 3

Is the world built out of information? Is everything software?

From Chaitin, Costa, Doria, After Gödel, in preparation. Lecture, the Technion, Haifa, Thursday, 10 June 2010.

Now for some even weirder stuff! Let’s return to The Thirteenth Floor and to the ideas that we briefly referred to in the introductory section of this chapter.

Let’s now turn to ontology: What is the world built out of, made out of?

Fundamental physics is currently in the doldrums. There is no pressing, unexpected, new experimental data (or if there is, we can’t see that it is!). So we are witnessing a return to pre-Socratic philosophy with its emphasis on ontology rather than epistemology. We are witnessing a return to metaphysics. Metaphysics may be dead in contemporary philosophy, but amazingly enough it is alive and well in contemporary fundamental physics and cosmology.

There are serious problems with the traditional view that the world is a space-time continuum. Quantum field theory and general relativity contradict each other. The notion of space-time breaks down at very small distances, because extremely massive quantum fluctuations (virtual particle/antiparticle pairs) should provoke black holes and space-time should be torn apart, which doesn’t actually happen.

Here are two other examples of problems with the continuum, with very small distances:

• the infinite self-energy of a point electron in classical Maxwell electrodynamics,

• and in quantum field theory, renormalization, which Dirac never accepted.

And here is an example of renormalization: the infinite bare charge of the electron which is shielded by vacuum polarization via virtual pair formation and annihilation, so that far from an electron it only seems to have finite charge. This is analogous to the behavior of water, which is a highly polarized molecule forming micro-clusters that shield charge, with many of the highly positive hydrogen-ends of H₂O near the highly negative oxygen-ends of these water molecules.

In response to these problems with the continuum, some of us feel that the traditional

Pythagorean ontology: God is a mathematician, the world is built out of mathematics,

should be changed to this more modern

→ Neo-Pythagorean ontology: God is a programmer, the world is built out of software.

In other words, all is algorithm!

There is an emerging school, a new viewpoint named digital philosophy. Here are some key people and key works in this new school of thought: Edward Fredkin, http://www.digitalphilosophy.org, Stephen Wolfram, A New Kind of Science, Konrad Zuse, Rechnender Raum (Calculating Space), John von Neumann, Theory of Self-Reproducing Automata, and Chaitin, Meta Math!.¹

These may be regarded as works on metaphysics, on possible digital worlds. However there have in fact been parallel developments in the world of physics itself.

¹ Lesser known but important works on digital philosophy: Arthur Burks, Essays on Cellular Automata, Edgar Codd, Cellular Automata.

Quantum information theory builds the world out of qubits, not matter. And phenomenological quantum gravity and the theory of the entropy of black holes suggest that any physical system contains only a finite number of bits of information that grows, amazingly enough, as the surface area of the physical system, not as its volume; hence the name holographic principle. For more on the entropy of black holes, the Bekenstein bound, and the holographic principle, see Lee Smolin, Three Roads to Quantum Gravity.

One of the key ideas that has emerged from this research on possible digital worlds is to transform the universal Turing machine, a machine capable of running any algorithm, into the universal constructor, a machine capable of building anything:

Universal Turing Machine → Universal Constructor.

And this leads to the idea of an information economy: worlds in which everything is software, worlds in which everything is information and you can construct anything if you have a program to calculate it. This is like magic in the Middle Ages. You can bring something into being by invoking its true name. Nothing is hardware, everything is software!²

A more modern version of this everything-is-information view is presented in two green-technology books by Freeman Dyson: The Sun, the Genome and the Internet, and A Many-Colored Glass. He envisions seeds to grow houses, seeds to grow airplanes, seeds to grow factories, and imagines children using genetic engineering to design and grow new kinds of flowers! All you need is water, sun and soil, plus the right seeds!

From an abstract, theoretical mathematical point of view, the key concept here is an old friend from Chapter 2:

H(x) = the size in bits of the smallest program to compute x.

H(x) is also = to the minimum amount of algorithmic information needed to build/construct x, = in Medieval language the number of yes/no decisions God had to make to create x, = in biological terms, roughly the amount of DNA needed for growing x.

It requires the self-delimiting programs of Chapter 2 for the following intuitively necessary condition to hold:

$$H(x, y) \le H(x) + H(y) + c.$$

² On magic in the Middle Ages, see Umberto Eco, The Search for the Perfect Language, and Allison Coudert, Leibniz and the Kabbalah.


This says that algorithmic information is sub-additive: If it takes H(x) bits of information to build x and H(y) bits of information to build y, then the sum of that suffices to build both x and y. Furthermore, the mutual information, the information in common, has this important property:

$$H(x) + H(y) - H(x, y) = H(x) - H(x|y^*) + O(1) = H(y) - H(y|x^*) + O(1).$$

Here

H(x|y) = the size in bits of the smallest program to compute x from y.

This triple equality tells us that the extent to which it is better to build x and y together rather than separately (the bits of subroutines that are shared, the amount of software that is shared) is also equal to the extent that knowing a minimum-size program y* for y helps us to know x and to the extent to which knowing a minimum-size program x* for x helps us to know y. (This triple equality is an idealization; it holds only in the limit of extremely large compute times for x and y.)
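H is uncomputable, so the mutual information cannot be calculated exactly. A rough feel for the left-hand side, H(x) + H(y) − H(x, y), can still be had by substituting the output size of an ordinary compressor for H. The sketch below does this with Python’s standard `zlib` module; the function names `compressed_bits` and `shared_bits`, and the use of plain concatenation to represent the pair (x, y), are assumptions made for illustration. Compressed size is only a crude upper-bound stand-in for program-size complexity, so the numbers are indicative at best.

```python
import zlib


def compressed_bits(data: bytes) -> int:
    """Compressed size in bits: a crude, computable stand-in for H(data)."""
    return 8 * len(zlib.compress(data, 9))


def shared_bits(x: bytes, y: bytes) -> int:
    """Proxy for H(x) + H(y) - H(x, y), with x + y standing in for the pair."""
    return compressed_bits(x) + compressed_bits(y) - compressed_bits(x + y)


text = b"the quick brown fox jumps over the lazy dog " * 20

# An object shares essentially all of its information with a copy of itself,
# so compressing the two together is much cheaper than compressing each alone.
print(compressed_bits(text), shared_bits(text, text))
```

For unrelated inputs the same quantity comes out near zero (or even slightly negative, because of compressor overhead), which is a reminder that this is an approximation and not the O(1)-precise statement of the theory.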

These results about algorithmic information/complexity H are a kind of economic meta-theory for the information economy, which is the asymptotic limit, perhaps, of our current economy in which material resources (petroleum, uranium, gold) are still important, not just technological and scientific know-how.

But as astrophysicist Fred Hoyle points out in his science fiction novel Ossian’s Ride, the availability of unlimited amounts of energy, say from nuclear fusion reactors, would make it possible to use giant mass spectrometers to extract gold and other chemical elements directly from sea water and soil. Material resources would no longer be that important.

If we had unlimited energy, all that would matter would be know-how, information, knowing how to build things. And so we finally end up with the idea of a printer for objects, a more plebeian term for a universal constructor. There are already commercial versions of such devices. They are called 3D printers and are used for rapid prototyping and digital fabrication. They are not yet universal constructors, but the trend is clear...³

In Medieval terms, results about H(x) are properties of the size of spells, they are about the complexity of magic incantations! The idea that everything is software is not as new as it may seem.

³ One current project is to build a 3D printer that can print a copy of itself. See http://reprap.org.

Bibliography

[1] A. Burks, Essays on Cellular Automata, University of Illinois Press (1970).

[2] G. J. Chaitin, Meta Math!, Pantheon (2005).

[3] E. Codd, Cellular Automata, Academic Press (1968).

[4] A. Coudert, Leibniz and the Kabbalah, Kluwer (1995).

[5] F. Dyson, The Sun, the Genome and the Internet, Oxford University Press (1999).

[6] F. Dyson, A Many-Colored Glass, University of Virginia Press (2007).

[7] U. Eco, The Search for the Perfect Language, Blackwell (1995).

[8] E. Fredkin, http://www.digitalphilosophy.org.

[9] F. Hoyle, Ossian’s Ride, Harper (1959).

[10] J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press (1966).

[11] L. Smolin, Three Roads to Quantum Gravity, Basic Books (2001).

[12] S. Wolfram, A New Kind of Science, Wolfram Media (2002).

[13] K. Zuse, Rechnender Raum (Calculating Space), Vieweg (1969).

Chapter 4

The information economy

S. Zambelli, Computable, Constructive and Behavioural Economic Dynamics, Routledge, 2010, pp. 73–78.

In honor of Kumaraswamy Velupillai’s 60th birthday

Abstract: One can imagine a future society in which natural resources are irrelevant and all that counts is information. I shall discuss this possibility, plus the role that algorithmic information theory might then play as a metatheory for the amount of information required to construct something.

Introduction

I am not an economist; I work on algorithmic information theory (AIT). This essay, in which I present a vision of a possible future information economy, should not be taken too seriously. I am merely playing with ideas and trying to provide some light entertainment of a kind suitable for this festschrift volume, given Vela’s deep appreciation of the relevance of foundational issues in mathematics for economic theory.

In algorithmic information theory, you measure the complexity of something by counting the number of bits in the smallest program for calculating it:

program → Universal Computer → output.

If the output of a program could be a physical or a biological system, then this complexity measure would give us a way to measure the difficulty of explaining how to construct or grow something, in other words, measure either traditional smokestack or newer green technological complexity:

software → Universal Constructor → physical system,
DNA → Development → biological system.

And it is possible to conceive of a future scenario in which technology is not natural-resource limited, because energy and raw materials are freely available, but is only know-how limited.

In this essay, I will outline four different versions of this dream, in order to explain why I take it seriously:

1. Magic, in which knowing someone’s secret name gives you power over them,

2. Astrophysicist Fred Hoyle’s vision of a future society in his science-fiction novel Ossian’s Ride,

3. Mathematician John von Neumann’s cellular automata world with its self-reproducing automata and a universal constructor,

4. Physicist Freeman Dyson’s vision of a future green technology in which you can, for example, grow houses from seeds.

As these four examples show, if an idea is important, it’s reinvented, it keeps being rediscovered. In fact, I think this is an idea whose time has come.

Secret/True Names and the Esoteric Tradition

“In the beginning was the Word, and the Word was with God, and the Word was God.” John 1:1

Information (knowing someone’s secret/true name) is very important in the esoteric tradition [1, 2]:

• Recall the German fairy tale in which the punch line is “Rumpelstiltskin is my name!” (the Brothers Grimm).

• You have power over someone if you know their secret name.

• You can summon a demon if you know its secret name.

• In the Garden of Eden, Adam acquired power over the animals by naming them.

• God’s name is never mentioned by Orthodox Jews.

• The golem in Prague was animated by a piece of paper with God’s secret name on it.

• Presumably God can summon a person or thing into existence by calling its true name.

• Leibniz was interested in the original sacred Adamic language of creation, the perfect language in which the essence/true nature of each substance or being is directly expressed, as a way of obtaining ultimate knowledge. His project for a characteristica universalis evolved from this, and the calculus evolved from that. Christian Huygens, who had taught Leibniz mathematics in Paris, hated the calculus [3], because it eliminated mathematical creativity and arrived at answers mechanically and inelegantly.

Fred Hoyle’s Ossian’s Ride

The main features in the future economy that Hoyle imagines are:

• Cheap and unlimited hydrogen to helium fusion power,

• Therefore raw materials readily available from sea-water, soil and air (for example, using extremely large-scale and energy intensive mass spectrometer-like devices [Gordon Lasher, private communication]).

• And with essentially free energy and raw materials, all that counts is technological know-how, which is just information.

Perhaps it’s best to let Hoyle explain this in his own words [4]:

[T]he older established industries of Europe and America. . . grew up around specialized mineral deposits: coal, oil, metallic ores. Without these deposits the older style of industrialization was completely impossible. On the political and economic fronts, the world became divided into “haves” and “have-nots,” depending whereabouts on the earth’s surface these specialized deposits happened to be situated. . .

In the second phase of industrialism. . . no specialized deposits are needed at all. The key to this second phase lies in the possession of an effectively unlimited source of energy. Everything here depends on the thermonuclear reactor. . . With a thermonuclear reactor, a single ton of ordinary water can be made to yield as much energy as several hundred tons of coal, and there is no shortage of water in the sea. Indeed, the use of coal and oil as a prime mover in industry becomes utterly inefficient and archaic.

With unlimited energy the need for high-grade metallic ores disappears. Low-grade ones can be smelted, and there is an ample supply of such ores to be found everywhere. Carbon can be taken from inorganic compounds, nitrogen from the air, a whole vast range of chemical from sea water.

So I arrived at the rich concept of this second phase of industrialization, a phase in which nothing is needed but the commonest materials: water, air and fairly common rocks. This was a phase that can be practiced by anybody, by any nation, provided one condition is met: provided one knows exactly what to do. This second phase was clearly enormously more effective and powerful than the first.

Of course this concept wasn’t original. It must have been at least thirty years old. It was the second concept that I was more interested in. The concept of information as an entity in itself, the concept of information as a violently explosive social force.

In Hoyle’s fantasy, this crucial information (including the design of thermonuclear reactors) that suddenly propels the world into a second phase of industrialization comes from another world. It is a legacy bequeathed to humanity by a nonhuman civilization desperately trying to preserve anything it can when being destroyed by the brightening of its star.


John von Neumann’s Cellular Automata World

This cellular automata world first appeared in lectures and private working notes by von Neumann. These ideas were advertised in an article in Scientific American in 1955 that was written by John Kemeny [5]. Left unfinished because of von Neumann’s death in 1957, his notes were edited by Arthur Burks and finally published in 1966 [6]. Burks then presented an overview in [7]. Key points:

• World is a discrete crystalline medium.

• Two-dimensional world, graph paper, divided into square cells.

• Each square has 29 states.

• Time is quantized as well as space.

• State of each square is the same universal function of its previous state and the previous state of its 4 immediate neighbors (square itself plus up, down, left, right immediate neighbors).

• Universal constructor can assemble any quiescent array of states.

• Then you have to start the device running.

• The universal constructor is part of von Neumann’s self-reproducing automata.
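The update scheme in the key points above (discrete cells, discrete time, next state a fixed function of the cell and its four immediate neighbors) can be sketched in a few lines. This is only an illustration of the mechanics: the names `step` and `parity_rule` are hypothetical, the grid is finite with everything outside it treated as quiescent, and the toy two-state parity rule shown is not von Neumann’s 29-state transition function.

```python
from typing import Callable

Grid = list[list[int]]
# A rule maps (center, up, down, left, right) to the cell's next state.
Rule = Callable[[int, int, int, int, int], int]


def step(grid: Grid, rule: Rule, quiescent: int = 0) -> Grid:
    """Advance every cell by one time step, all cells updating at once."""
    rows, cols = len(grid), len(grid[0])

    def at(r: int, c: int) -> int:
        # Cells outside the finite grid are treated as quiescent.
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        return quiescent

    return [
        [rule(at(r, c), at(r - 1, c), at(r + 1, c), at(r, c - 1), at(r, c + 1))
         for c in range(cols)]
        for r in range(rows)
    ]


def parity_rule(center: int, up: int, down: int, left: int, right: int) -> int:
    """Toy 2-state rule, for illustration only: parity of the four neighbors."""
    return (up + down + left + right) % 2


world = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
for row in step(world, parity_rule):
    print(row)
```

Swapping in a different `rule` function changes the physics of the toy world without touching `step`, which mirrors the point that the same universal function is applied at every square.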

The crucial point is that in von Neumann’s toy world, physical systems are merely discrete information, that is all there is. And there is no difference between computing a string of bits (as in AIT) and “computing” (constructing) an arbitrary physical system.

I should also mention that starting from scratch, Edgar Codd came up with a simpler version of von Neumann’s cellular automata world in 1968 [8]. In Codd’s model cells have 8 states instead of 29.


Freeman Dyson’s Green Technology

Instead of Hoyle’s vision of a second stage of traditional smokestack heavy industry, Dyson [9, 10] optimistically envisions a green-technology small-is-beautiful do-it-yourself grass-roots future.

The emerging technology that may someday lead to Dyson’s utopia is becoming known as “synthetic biology” and deals with deliberately engineered organisms. This is also referred to as “artificial life,” the development of “designer genomes.” To produce something, you just create the DNA for it. Here are some key points in Dyson’s vision:

• Solar electrical power obtained from modified trees. (Not from thermonuclear reactors!)

• Other useful devices/machines grown from seeds. Even houses grown from seeds?!

• School children able to design and grow new plants, animals.

• Mop up excessive carbon dioxide or produce fuels from sugar (actual Craig Venter projects [11]).

On a much darker note, to show how important information is, there presumably exists a sequence of a few-thousand DNA bases (A, C, G, T) for the genome of a virus that would destroy the human race, indeed, most life on this planet. With current or soon-to-be-available molecular biology technology, genetic engineering tools, anyone who knew this sequence could easily synthesize the corresponding pathogen. Dyson’s utopia can easily turn into a nightmare.

AIT as an Economic Metatheory

So one can imagine scenarios in which natural resources are irrelevant and all that counts is technological know-how, that is, information. We have just seen four such scenarios. In such a world, I believe, AIT becomes, not an economic theory, but perhaps an economic metatheory, since it is a theory of information, a theory about the properties of technological know-how, as I will now explain.

The main concept in AIT is the amount of information H(X) required to compute (or construct) something, X. This is measured in bits of software, the number of bits in the smallest program that calculates X. Briefly, one refers to H(X) as the complexity of X. For an introduction to AIT, please see [12, 13].

In economic terms, H(X) is a measure of the amount of technological know-how needed to produce X. If X is a hammer, H(X) will be small. If X is a sophisticated military aircraft, H(X) will be quite large.

Two other concepts in AIT are the joint complexity H(X, Y) of producing X and Y together, and the relative complexity H(X|Y) of producing X if we are given Y for free.

Consider now two objects, X and Y. In AIT,

$$H(X) + H(Y) - H(X, Y)$$

is referred to as the mutual information in X and Y. This is the extent to which it is cheaper to produce X and Y together than to produce X and Y separately, in other words, the extent to which the technological know-how needed to produce X and Y can be shared, or overlaps. And there is a basic theorem in AIT that states that this is also

$$H(X) - H(X|Y),$$

which is the extent to which being given the know-how for Y helps us to construct X, and it’s also

$$H(Y) - H(Y|X),$$

which is the extent to which being given the know-how for X helps us to construct Y. This is not earth-shaking, but it’s nice to know.

(For a proof of this theorem about mutual information, please see [14].)

One of the reasons that we get these pleasing properties is that AIT is like classical thermodynamics in that time is ignored. In thermodynamics, heat engines operate very slowly, for example, reversibly. In AIT, the time or effort required to construct something is ignored, only the information required is measured. This enables both thermodynamics and AIT to have clean, simple results. They are toy models, as they must be if we wish to prove nice theorems.


Conclusion

Clearly, we are not yet living in an information economy. Oil, uranium, gold and other scarce, precious limited natural resources still matter. But someday we may live in an information economy, or at least approach it asymptotically. In such an economy, everything is, in effect, software; hardware is comparatively unimportant. This is a possible world, though perhaps not yet our own world.

References

1. A. Coudert, Leibniz and the Kabbalah, Kluwer, Dordrecht, 1995.

2. U. Eco, The Search for the Perfect Language, Blackwell, Oxford, 1995.

3. J. Hofmann, Leibniz in Paris 1672–1676, Cambridge University Press, 1974, p. 299.

4. F. Hoyle, Ossian’s Ride, Harper & Brothers, New York, 1959, pp. 157–158.

5. J. Kemeny, “Man viewed as a machine,” Scientific American, April 1955, pp. 58–67.

6. J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, 1966. (Edited and completed by Arthur W. Burks.)

7. A. Burks (ed.), Essays on Cellular Automata, University of Illinois Press, Urbana, 1970.

8. E. Codd, Cellular Automata, Academic Press, New York, 1968.

9. F. Dyson, The Sun, the Genome, & the Internet, Oxford University Press, New York, 1999.

10. F. Dyson, A Many-Colored Glass, University of Virginia Press, Charlottesville, 2007.

11. C. Venter, A Life Decoded, Viking, New York, 2007.

12. G. Chaitin, Meta Maths, Atlantic Books, London, 2006.

13. G. Chaitin, Thinking about Gödel and Turing, World Scientific, Singapore, 2007.

14. G. Chaitin, Exploring Randomness, Springer-Verlag, London, 2001, pp. 95–96.

1 July 2008


Chapter 5

How real are real numbers?

We discuss mathematical and physical arguments against continuity and in favor of discreteness, with particular emphasis on the ideas of Émile Borel (1871–1956). Lecture given Tuesday, 22 September 2009, at Wilfrid Laurier University in Waterloo, Canada.

I’m not going to give a tremendously serious talk on mathematics today. Instead I will try to entertain and stimulate you by showing you some really weird real numbers.

I’m not trying to undermine what you may have learned in your mathematics classes. I love the real numbers. I have nothing against real numbers. There’s even a real number (Ω) that has my name on it.¹ But as you will see, there are some strange things going on with the real numbers.

Let’s start by going back to a famous paper by Turing in 1936. This is Turing’s famous 1936 paper in the Proceedings of the London Mathematical Society; mathematicians proudly claim it creates the computer industry, which is not quite right of course.

But it does have the idea of a general-purpose computer and of hardware and software, and it is a wonderful paper.

This paper is called “On computable numbers, with an application to the Entscheidungsproblem.” And what most people forget, and is the subject of my talk today, is that when Turing talks about computable numbers, he’s talking about computable real numbers.

¹ See the chapter on “Chaitin’s Constant” in Steven Finch, Mathematical Constants, Cambridge University Press, 2003.

Turing, 1936: “On computable numbers. . . ”

But when you work on a computer, the last thing on earth you’re ever going to see is a real number, because a real number has an infinite number of digits of precision, and computers only have finite precision. Computers don’t quite measure up to the exalted standards of pure mathematics.

One of the important contributions of Turing’s paper, not to computer technology but to pure mathematics, and even to philosophy and epistemology, is that Turing’s paper distinguishes very clearly between real numbers that are computable and real numbers that are uncomputable.

What is a real number? It’s just a measurement made with infinite precision. So if I have a straight line one unit long, and I want to find out where a point is, that corresponds to a real number. If it is all the way to the left in this unit interval, it’s 0.0000. . . If the point is all the way to the right, it’s 1.0000. . . If it is exactly in the middle, that’s .50000. . . And every point on this line corresponds to precisely one real number. There are no gaps.

0.0 ——– 0.5 ——– 1.0

So, if you just tell me exactly where a point is, that’s a real number. From the point of view of geometrical intuition a real number is something very simple: it is just a point on a line. But from an arithmetical point of view, if you want to calculate its numerical value digit by digit, or bit by bit if you’re using binary, it turns out that real numbers are problematical.

Even though to geometrical intuition points are the most natural and elementary thing you can imagine, if you want to actually calculate the value of a real number with infinite precision, you can get into big trouble. Actually, you never calculate it with infinite precision. What Turing says is that you calculate it with arbitrary precision.

His notion of a computable real number is a real number that you can calculate as accurately as you may wish.

I guess he actually says it is an infinite calculation. You start calculating its numerical value, if you’re using decimal, digit by digit, or if you’re using binary, bit by bit. You have the integer part, the decimal point, and then you have an infinite string of bits or digits, depending on your base, and the computer will grind away gradually giving you more and more bits or more and more digits of the numerical value of the number.


So that’s a computable real number. According to Turing, that means it is a real number for which there is an algorithm, a mechanical procedure, for calculating its value with arbitrary accuracy, with more and more precision.

For example, π is a computable real number, √2 is a computable real number, and e is a computable real number. In fact, every real number you’ve ever encountered in your math classes, every individual real number that you’ve ever encountered, is a computable real number.

Computable reals: π, √2, e, 1/2, 3/4 . . .
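To make "calculate as accurately as you may wish" concrete, here is a minimal Python sketch for √2. The function name `sqrt2_digits` is only illustrative, and exact integer square roots are just one convenient way to do it, not anything taken from Turing's paper.

```python
from math import isqrt

def sqrt2_digits(n: int) -> str:
    """Return sqrt(2) truncated to n decimal places, using exact integer arithmetic."""
    # isqrt(2 * 10**(2n)) equals floor(sqrt(2) * 10**n), so no rounding error creeps in.
    scaled = isqrt(2 * 10 ** (2 * n))
    s = str(scaled)
    return s[0] + "." + s[1:] if n > 0 else s

# Ask for as many digits as you like; the procedure is the same every time.
print(sqrt2_digits(10))
print(sqrt2_digits(50))
```

The point is that one finite program stands for the whole infinite expansion: you choose the precision, and the program delivers it.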

These are the familiar real numbers, the computable ones, but surprisingly enough, Turing points out that there are also lots of uncomputable real numbers.

Dramatically enough, the moment Turing comes up with the computer as a mathematical concept (mathematicians call this a universal Turing machine) he immediately points out that there are things no computer can do. And one thing no computer can do is calculate the value of an uncomputable real number.

How does Turing show that there are uncomputable reals? Well, the first argument he gives goes back to Cantor’s theory of infinite sets, which tells us that the set of real numbers is an infinite set that is bigger, that is infinitely more numerous, than the set of computer programs.

The possible computer programs are just as numerous as the positive integers, as the whole numbers 1, 2, 3, 4, 5. . . but the set of real numbers is a much bigger infinity.

So in fact there are more uncomputable reals than computable reals. From Cantor’s theory of infinite sets, we see that the set of uncomputable reals is just as big as the set of all reals, while the set of computable reals is only as big as the set of whole numbers. The set of uncomputable reals is much bigger than the set of computable reals.

#uncomputable reals = #all reals = 2^ℵ0,
#computable reals = #computer programs = #whole numbers = ℵ0,

2^ℵ0 > ℵ0.

The set of computable reals is as numerous as the computer programs, because each computable real needs a computer program to calculate it. And the computer programs are as numerous as the whole numbers 1, 2, 3, 4, 5. . . because you can think of a computer program as a very big whole number. In base-two a whole number is just a string of bits, which is all a computer program is.
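Here is a small Python sketch of that correspondence between bit strings and whole numbers. The function names are made up for illustration, and the trick of prepending a 1 (so that programs starting with zeros get distinct numbers) is my own assumption about how to make the numbering one-to-one.

```python
def program_to_number(bits: str) -> int:
    """Read a program (a bit string) as a positive whole number."""
    # Prepend a 1 so that leading zeros in the program are not lost.
    return int("1" + bits, 2)

def number_to_program(n: int) -> str:
    """Recover the bit string from its number (n >= 1)."""
    # bin(n) looks like '0b1...'; drop the '0b' prefix and the leading 1.
    return bin(n)[3:]

n = program_to_number("0010")
print(n, number_to_program(n))
```

With this numbering every bit string gets exactly one positive integer and vice versa, which is all that the counting argument needs.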

That most reals are uncomputable was quite a surprise, but Turing doesn’t stop with that. In his famous paper he uses a technique from set theory called Cantor’s diagonal argument to exhibit an individual example of an uncomputable real.
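The diagonal idea itself is easy to show on a toy scale. The following Python sketch is a finite illustration under my own assumptions, not Turing's construction: `listing` is a hypothetical short list of digit sequences, each given as a function from a position to the digit there, and `diagonal` builds digits that disagree with the nth sequence at position n.

```python
from typing import Callable, List

DigitFn = Callable[[int], int]  # maps a 0-based position to the digit at that position

def diagonal(listing: List[DigitFn]) -> List[int]:
    """Digits of a number that differs from the n-th listed number at position n."""
    # Only the digits 1 and 2 are used, which avoids the 0.4999... = 0.5000... ambiguity.
    return [2 if f(n) == 1 else 1 for n, f in enumerate(listing)]

listing = [
    lambda n: 1,        # 0.1111...
    lambda n: n % 10,   # 0.0123456789...
    lambda n: 2,        # 0.2222...
]
print(diagonal(listing))
```

Applied to a list of all the computable reals, the same recipe yields a real that is not on the list, so it cannot be computable.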

Turing’s 1936 paper has an intellectual impact and a technological impact. From a technological point of view his paper is fantastic because as we all know the computer has changed our lives; we can’t imagine living without it. In 1936 Turing has the idea of a flexible machine, of flexible hardware, of what we now call software. A universal machine changes to be like any other machine when you insert the right software. This is a very deep concept: a flexible digital machine. You don’t need lots of special-purpose machines, you can make do with only one machine. This is the idea of a general-purpose computer, and it is in Turing’s wonderful 1936 paper, before anybody built such a device.

But then immediately Turing points out that there are real numbers that can’t be calculated, that no machine can furnish you with better and better approximations for; in fact there are more uncomputable reals than computable reals.

So in a sense this means that most individual real numbers are like mythical beasts, like Pegasus or unicorns; name your favorite mythical object that unfortunately doesn’t exist in the real world.

In this talk I will concentrate on the uncomputable reals, not the reals that are familiar to all of us like π, √2 and 3/4. I’ll tell you about some surprising real numbers, ones that are in the shadow, that we cannot compute, and whose numerical values are quite elusive.

And the first thing I’d like to say on this topic is that there actually was an example of an uncomputable real before Turing’s 1936 paper. It was in a short essay by a wonderful French mathematician, now largely forgotten, called Emile Borel. In 1927 Borel, without anticipating in any way Turing’s wonderful paper, does point out a real number that we can now recognize as an uncomputable real.

Borel’s 1927 number is a very paradoxical real number, and I’d like to tell you about it. Let’s see if you enjoy it as much as I do!

Borel’s idea is to have a know-it-all real number: It’s an oracle that knows the answer to every yes/no question.

Borel was a Frenchman, and he imagined writing all possible yes/no questions in French in a numbered list. So each question has its number, and then what you do is consider the real number with no integer part, and whose Nth decimal, the Nth digit after the decimal point of this number, answers the Nth question in the list of all possible questions.

Borel’s 1927 know-it-all real: The Nth digit answers the Nth question.

If the answer to the Nth question is “yes,” then the Nth digit is, say, a 1, and if the answer is “no” then that digit will be a 2. You have an infinite number of digits, so you can pack an infinite number of answers in Borel’s number. If you can imagine a list of all possible yes/no questions, this number will be like an oracle that will answer them all.

If you could know the value of this magical number with sufficient accuracy, you could answer any particular, individual question.
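The packing trick is simple enough to try on a finite scale. This Python sketch is only a toy: it packs a short, made-up list of answers into an exact fraction and reads them back digit by digit. The names `pack_answers` and `nth_answer` and the sample answers are assumptions for illustration.

```python
from fractions import Fraction

def pack_answers(answers):
    """Pack a finite list of yes/no answers into a fraction: digit 1 = yes, digit 2 = no."""
    x = Fraction(0)
    for n, ans in enumerate(answers, start=1):
        x += Fraction(1 if ans else 2, 10 ** n)
    return x

def nth_answer(x, n):
    """Read the n-th digit after the decimal point of x (n >= 1) and decode it."""
    digit = int(x * 10 ** n) % 10
    return digit == 1

x = pack_answers([True, False, False, True])   # the number 0.1221
print(x, [nth_answer(x, n) for n in range(1, 5)])
```

Borel's number does the same thing with infinitely many digits, which is exactly where it stops being something we could ever write down.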

Needless to say, this number is a bit fantastic. So why did Borel come up with this crazy example? To show us that there are real numbers that are not for real.

Borel’s amazing number will give you the answer to every yes/no question in mathematics and in physics, and about the past and the future.

You can ask Borel’s oracle paradoxical questions, like

“Is the answer to this question no?”

And then you have a problem, because whether you answer “no” or you answer “yes,” it will always be the wrong answer. There is no correct answer.

Another problem is if you ask

“Will I drink coffee tomorrow?”

And depending on what you find in Borel’s number, you do the opposite. You will just have to put up with not drinking coffee for one day, perhaps, to refute the know-it-all number.

So these are some paradoxical aspects of Borel’s idea. Another problem is, how do you make a list of all possible yes/no questions in order to give a number to each question?

Actually, it is easy to number each question. What you do is make a list of all possible texts drawn from the alphabet of the language you’re interested in. You have ten possibilities for each digit in Borel’s oracle number, and so far we’ve only used the digits 1 and 2. So you can use the other digits to say that that sequence of characters from that particular national alphabet isn’t grammatical, or it’s not a question, or it’s a question but it’s not a yes/no question, or it’s a yes/no question which has no answer because it’s paradoxical, or maybe it’s a yes/no question which has an answer but I don’t want to tell you about the future because I want to avoid the coffee problem.

Another way to try to fix Borel’s number is to restrict it to just give the answer to mathematical questions. You can imagine an infinite list of mathematical questions, you pick a formal language in which you ask yes/no mathematical questions, you pick a notation for asking such questions, and there is certainly no problem with having Borel’s know-it-all number answer only mathematical questions. That will get rid of the paradoxes. You can make that work simply because you can pack an infinite amount of information in a real number.2

And this magical number could be represented, in principle, by two metal rods, if you believe in infinite precision lengths. You have a rod that is exactly one unit long, one meter long, and you have another rod that is smaller, whose length is precisely the know-it-all number. You have your standard meter, and you have something less than a meter long whose length is precisely the magical know-it-all number.

If you could measure the size of the smaller rod relative to the standard meter with arbitrary accuracy, you could answer every mathematical question, if somebody gave you this magical metal rod.3

Of course, we are assuming that you can make measurements with infinite precision, which any physicist who is here will say is impossible. I think that the most accurate measurement that has ever been made has under twenty digits of precision.

But in theory having these two rods would give us an oracle. It would be like having Borel’s know-it-all number.

And now you’re not going to be surprised to hear that if you fix Borel’s number so that it is at least well-defined, not paradoxical, it is in fact uncomputable. For otherwise you could compute the answer to every question, which is implausible.

2 As long as we avoid self-reference, i.e., giving this know-it-all number a name and then having a digit ask about itself in a way that requires that very digit to be different. E.g., is the digit of Borel’s know-it-all number that corresponds to this very question a 2?

3 This is an argument against infinite divisibility of space. In 1932 Hermann Weyl gave an argument against the infinite divisibility of time. If time is really infinitely divisible, then a machine could perform one step of a calculation in one second, then another step in half a second, the next in 1/4 of a second, then in 1/8 of a second, then 1/16 of a second, and in this manner would perform an infinite number of steps in precisely 1 + 1/2 + 1/4 + . . . = 2 seconds. But no one believes that such so-called Zeno machines are actually possible.

But Borel did not really have the notion of computability nor of what a computer is. He was working with the idea of computation intuitively, informally, without defining it. He said that as far as he was concerned this number is conceivable but not really legitimate. He felt that his real number was too wild.

Borel had what is now called a constructive attitude. His personal view, which he states in that little 1927 essay with the oracle number, is that he believes in a real number if in principle it can be calculated, if in theory there’s some way to do that. And he talks about this counter-example, this number which is conceivable but in his opinion is not really legitimate.

If we remove Borel’s number, that will leave a hole in the line, in the unit interval [0, 1] = {x : 0 ≤ x ≤ 1}. So we’ve broken the unit interval into two pieces because we just eliminated one point, which is very unfortunate geometrically. But let’s go on.

Okay, so this was 1927, this was Emile Borel with his paradoxical know-it-all real number that answers every yes/no question, and it is a mathematical fantasy, not a reality. Now let’s go back to Turing.

In 1936 Turing points out that there are more uncomputable reals than computable ones. Now I’d like to tell you something that Turing doesn’t point out, which uses ideas from Emile Borel, who was one of the inventors of what’s called measure theory or probability theory.

It turns out that if you choose a real number x between zero and one, x ∈ [0, 1], and you have uniform probability of picking any real number between zero and one, then the probability is unity that the number x will be uncomputable. The probability is zero that the number will be computable.

Prob{uncomputable reals} = 1, Prob{computable reals} = 0.

That’s not too difficult to see. Now I’ll give you a mathematical proof ofthis.

You want to cover all the computable reals in the unit interval with a covering which can be arbitrarily small. That’s the way to prove that the computable reals have measure zero, which means that they are an infinitesimal part of the unit interval, that they have zero probability. Technically, you say they have measure zero.

Remember that every computable real corresponds to a program, the program to calculate it, and the programs are essentially positive integers, so there’s a first program, a second program, a third program. . . So you can imagine all the computable reals in a list. There will be a first computable real, a second, a third. . .

And I cover the first computable real with an interval. I put on top of it an interval of length ε/2. And then I cover the second computable real with an interval of length ε/4. I cover the third computable real with an interval of length ε/8 . . . then ε/16, then ε/32 . . .

So the total size of the covering is

ε/2 + ε/4 + ε/8 + ε/16 + ε/32 + . . . = ε.

Some of these intervals may overlap; it doesn’t really matter. What does matter is that the lengths of all these intervals add up to exactly ε, so the region they cover has total length at most ε, and you can make ε as small as you want.
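The covering can be checked on a finite scale with exact arithmetic. In this Python sketch a handful of rationals stands in for the first few computable reals in the list; the particular points, the value of `eps`, and the function name `covering` are all assumptions chosen for illustration.

```python
from fractions import Fraction

def covering(points, eps):
    """Cover the k-th point (k = 1, 2, ...) with an interval of length eps / 2**k centred on it."""
    intervals = []
    for k, p in enumerate(points, start=1):
        half = eps / 2 ** (k + 1)          # half of the interval's length eps / 2**k
        intervals.append((p - half, p + half))
    return intervals

eps = Fraction(1, 100)
points = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4), Fraction(3, 4)]
cover = covering(points, eps)
total = sum(hi - lo for lo, hi in cover)
print(total, total < eps)
```

However many points you add to the list, the sum of the lengths stays below ε, and shrinking `eps` shrinks the whole covering with it.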

So this is just a proof that something is of measure zero. I’m taking the trouble to show that you can corner all the computable real numbers in the unit interval by covering them. You can do it with a covering that you can make as small as you please, which means that the computable reals have zero probability, they occupy zero real-estate in the unit interval.

This is a proof that the computable reals really are exceptional. But they are exactly the exceptions that make up our normal, everyday experience. The fact that uncomputable reals have probability unity doesn’t help us to find any concrete examples!

To repeat, if I pick a real number at random between zero and one, it is possible to get a computable real, but it is infinitely unlikely. Probability zero in this circumstance doesn’t mean impossibility, it just means that it’s an infinitesimal probability, it is infinitely unlikely. It is possible. It would be miraculous, but it can happen.

The way mathematicians say this is that real numbers are almost surely uncomputable.

This is a bit discouraging to those of us who prefer computable reals to uncomputable ones. Or maybe it is a bit surprising that all the individual reals in our normal experience are exceptional. It does make me think that perhaps real numbers are problematic and cannot be taken for granted. What do you think?


What’s another way to put it? In other words, the real numbers are like a Swiss cheese with a lot of holes. In fact, it’s all holes! It’s like looking at the night sky. There are stars in the night sky, those are the computable reals, but the background is always black; the stars are the exception.

So that is how the real numbers look!

All the reals we know and love are exceptional.

And Borel goes a little farther in his last book, written when he was in his eighties. In 1952 he published a book called Les nombres inaccessibles (Inaccessible Numbers) in which he points out that most real numbers can’t even be referred to or named individually in any way. The real numbers that you can somehow name, or pick out as individuals, even without being able to compute them, have to be the exception, with total probability zero.

Most real numbers cannot even be named as individuals in any way, constructive or non-constructive. The way somebody put it is, most reals are wall-flowers, they’ll never be invited to dance!

Prob{individually nameable reals} = 0, Prob{un-nameable reals} = 1.

Okay, so what I’d like to do in the rest of this talk is to take Borel’s crazy, know-it-all, oracle real number, and try to make it as realistic as possible.

We’ve gotten ourselves into a bit of a quandary. I tend to believe in something if I can calculate it; if so, that mathematical object has concrete meaning for me. So I have a sort of constructive attitude in math.

But there is this surprising fact that in some sense most mathematical facts or objects seem to be beyond our reach. Most real numbers can never be calculated, they’re uncomputable, which suggests that mathematics is full of things that we can’t know, that we can’t calculate.

This is related to something famous called Gödel’s incompleteness theorem from 1931, five years before Turing. Gödel’s 1931 theorem says that given any finite set of axioms, there will be true mathematical statements that escape, that are true but can’t be proven from those axioms. So mathematics resists axiomatization. There is no Theory of Everything for pure mathematics.

Gödel, 1931: No TOE for pure mathematics!

And what Turing shows in 1936 is that there are a lot of things in mathematics that you can never calculate, that are beyond our reach because there’s no way to calculate the answer. And in fact real numbers are an example: most real numbers, with probability one, cannot be calculated.


So it might be nice to try to come up with an example of a particular real number that can’t be calculated, and try to make these strange, mysterious, hidden real numbers as concrete as possible.

I’d like to show you a real number, Ω, which is as real as I can make it, but is nevertheless uncomputable. That’s my goal.

In other words, there is what you can calculate or what you can prove in mathematics, and then there is this vast cloud of unknowing, of things that you can’t know or calculate. And I would like to try to find something right at the border between the knowable and the unknowable. I’m going to make it as real as possible, but it’s going to be just beyond our reach, just beyond what we can calculate.

I want to show you a real number that can almost be calculated, which is as close as possible to seeming real, concrete, but in fact escapes us and is an example of this ubiquitous phenomenon of uncomputability that Turing discovered in 1936, of numbers which cannot be computed.

How can we come up with a number like this? I’ll do it by combining ideas from Turing with ideas from Borel, and then using compression to eliminate all the redundancy. And the result will be my Ω number. This is not how I actually discovered Ω, but I think it is a good way to understand Ω.

In his 1936 paper Turing discovered what’s called the halting problem. This famous paper took years to digest, and it was a while before mathematicians realized how important the halting problem is. Another important idea in this paper is the notion of a universal Turing machine. Of course, he doesn’t call it a Turing machine, that name came later. So if you look there you don’t find the words “Turing machine.”

Another thing that is in this paper, though you won’t find it if you look for it by name, is a very famous result called the unsolvability of the halting problem, which I will explain now. If you look at the paper, it’s not easy to spot, it’s not called that, but the idea is certainly there.

It took years of work on this paper by a community to extract the essential ideas, give them catchy names, and start waving flags with those names on them.

So let me tell you about the halting problem, which is a very fundamental thing that Turing came up with.

Remember that Turing has the idea of a general-purpose computer, and then since he’s a pure mathematician, he immediately starts pointing out that there are things that no computer can calculate. There are things that no algorithm can achieve, which there is no mechanical way to calculate.


One of these things is called the halting problem. What is the halting problem? It’s a very simple question. Let’s say you’re given a computer program, and it’s a computer program that is self-contained, so it cannot ask for input, it cannot read in any data. If there is any data it needs, all that data has to be included in the program as a constant.

And the program just starts calculating, it starts executing. And there are two possibilities: Does the program go on forever, or at some point does it get a result and say “I’m finished,” and halt? That’s the question.

Does a self-contained computer program ever halt?

So you’re given a program that’s self-contained, and want to know what will happen. It’s a self-contained program, you just start running it (the process is totally mechanical) and there are two possibilities: The first possibility is that this process will go on forever, that the program is searching for something that it will never find, and is in some kind of an infinite loop.

The other possibility is that the program will eventually find what it is looking for, and maybe produce a result; at any rate, it will halt and stop and it is finished.

And you can try to find out which is the case by running it. You run the program, and if it stops eventually you are going to discover that, if you are patient enough.

The problem is what if it never stops; running the program cannot determine that. You can give up after running the program for a day or for a week, but you can’t be sure it is never going to stop.

So Turing asks the very deep question, “Is there a general procedure, given a program, for deciding in advance, without running it, whether it is going to go on forever or whether it is eventually going to stop?” You want an algorithm for deciding this. You want an algorithm which will take a finite amount of time to decide, and will always give you the correct answer.

And what Turing shows is that there is no general method for deciding, there is no algorithm for doing this; deciding whether a program halts or not isn’t a computable function.

This is a very simple question involving computer programs that always has an answer (the program either goes on forever or not) but there’s no mechanical procedure for deciding, there’s no algorithm which always gives you the correct answer, there’s no general way, given a computer program, to tell what is going to happen.

For individual programs you can sometimes decide, you can even settle infinitely many cases, but there’s no general way to decide.

This is the famous result called the unsolvability of the halting problem.

Turing, 1936: Unsolvability of the halting problem!

However if you are a practical person from the business school, you may say, “What do I care?” And I would have to agree with you. You may well say, “All I care is will the program stop in a reasonable amount of time, say, a year. Who is going to wait more than a year?”

But if you want to know if a program will stop in a fixed amount of time, that’s very easy to do, you just run it for that amount of time and see what happens. There is no unsolvability, none at all.
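The bounded version really is that easy, as this Python sketch suggests. It models a self-contained program as a Python generator function in which each `yield` counts as one step; that modelling choice, and the names `halts_within`, `counts_to_ten` and `loops_forever`, are my assumptions for the sake of the example.

```python
def halts_within(program, max_steps):
    """Run `program` for at most max_steps steps.

    Returns True if it halted in that time, and None if it was still running,
    which tells us nothing about whether it would halt later.
    """
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)            # execute one more step
        except StopIteration:
            return True            # the program finished
    return None

def counts_to_ten():
    for _ in range(10):
        yield

def loops_forever():
    while True:
        yield

print(halts_within(counts_to_ten, 100))    # True
print(halts_within(loops_forever, 1000))   # None
```

Notice that the function never answers "no": with a time limit you can confirm halting, but giving up is not a proof that the program runs forever.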

You only get into trouble when there’s no time limit.

So you may say that this is sort of a fantasy, because in the real world there is always a time limit: We’re not going to live forever, or you’re going to run out of power, or the computer is going to break down, or be crushed by glaciers, or the continents will shift and a volcano will melt your computer, or the sun will go nova, whatever horror you want to contemplate!

And I agree with you. The halting problem is a theoretical question. It is not a practical question. The world of mathematics is a toy world where we ask fantasy questions, which is why we have nice theories that give nice answers. The real world is messy and complicated. The reason you can use reasoning and prove things in pure mathematics is because it’s a toy world, it’s much simpler than the real world.

Okay, so this question, the halting problem, is not a real question, it’s an abstract, philosophical question. If you suspected that, I agree with you, you were right! But I like to live in the world of ideas. It’s a game, you may say, it’s a fantasy world, but that’s the world of pure mathematics.

So let’s start with the question Turing proved in his 1936 paper is unsolvable. There is no general method, no mechanical procedure, no algorithm to answer this question that will always work. So what I do, is I play a trick of the kind that you use in a field called statistical mechanics, a field of physics, which is to take an individual problem and embed it in a space, an ensemble of all possible problems of that type. That’s a well-known strategy.

In other words, instead of asking if an individual program halts or not, let’s look at the probability that this will happen, taken over the ensemble of all possible programs. . .


But first let me tell you why it is sometimes very important to know whether a program halts or not. You may say, “Who cares?” Well, in pure mathematics it is important because there are famous mathematical conjectures which it turns out are equivalent to asking whether a program halts or not.

There’s a lovely example from ancient Greece. There’s something called a perfect number. A number is perfect if it is the sum of all its divisors other than itself. (Or, if you include the number itself as one of the divisors, the sum is twice the number.) So 6 is a perfect number, because its divisors are 1, 2 and 3, and

6 = 1 + 2 + 3.

That’s a perfect number.

If the sum of the divisors is more than the number, then it’s abundant; if the sum of the divisors is less than the number, then it is deficient; and if the sum of the divisors is exactly equal to the number, then it is perfect. Furthermore, two numbers are amicable if each one is the sum of the divisors of the other.

And there are lots of perfect numbers. The next perfect number is 28.

28 = 1 + 2 + 4 + 7 + 14.

The question is, are there any odd perfect numbers? This is a question that goes back to ancient Greece, to Pythagoras, Euclid and Plato.

Are there odd perfect numbers?

So the question is, are there any odd perfect numbers? And the answer, amazingly enough, is that nobody knows. It’s a very simple question, the concepts go back two millennia, but all the perfect numbers that have been found are even, and nobody knows if there’s an odd perfect number.

Now, in principle you could just start a computer program going, have it look at each odd number, find its divisors, add them, and see whether the sum is exactly the number. So if there’s an odd perfect number, we’re eventually going to find it.

If the program never ends, then all the perfect numbers are even. It searches for an odd perfect number and either it halts because it found one, or goes on forever without ever finding what it is looking for.
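Such a search program might look like the following Python sketch. The function names are illustrative, and the optional `limit` parameter is my addition so that the example can actually be run; with `limit=None` it is the unbounded search described above, which halts only if an odd perfect number exists.

```python
def sum_proper_divisors(n: int) -> int:
    """Sum of the divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:        # do not count a square root twice
                total += n // d
        d += 1
    return total

def is_perfect(n: int) -> bool:
    return n > 1 and sum_proper_divisors(n) == n

def search_odd_perfect(limit=None):
    """Try odd numbers 3, 5, 7, ...; return the first perfect one, or None if limit is reached."""
    n = 3
    while limit is None or n <= limit:
        if is_perfect(n):
            return n
        n += 2
    return None

print([n for n in range(1, 1000) if is_perfect(n)])
print(search_odd_perfect(limit=10_000))
```

The first line prints the even perfect numbers below a thousand, and the bounded search comes back empty-handed. Whether the unbounded search would ever stop is precisely the open question.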

It turns out that most of the famous conjectures in mathematics, but not all, are equivalent to asking whether a computer program halts. The general idea is that most down-to-earth mathematical questions are instances of the halting problem. However whether or not there are infinitely many perfect numbers, which is also unknown, is not a case of the halting problem.

On the other hand, a famous conjecture called the Riemann hypothesis is an instance of the halting problem. And there’s Fermat’s Last Theorem, actually a three-century old conjecture which has now been proven by Andrew Wiles, stating that there is no solution of

x^N + y^N = z^N (x, y, z integers > 0, N integer ≥ 3).

These are all conjectures which if false can be refuted by a numerical counter-example. You can search systematically for a counter-example using a computer program, hence that kind of mathematical conjecture is equivalent to asking whether a program halts or not.

There’s a program that systematically looks for solutions of x^N + y^N = z^N and there’s a program that systematically looks for zeros of the Riemann zeta function that are in the wrong place. The Riemann hypothesis is complicated, but if it’s false, there is a finite calculation which refutes it, and you can search systematically for that. (The Riemann hypothesis is important because if it’s true, then the prime numbers are smoothly distributed in a certain precise technical sense. This seems to be the case but no one can prove it.)

What I’m trying to say is that a lot of famous mathematical conjectures are equivalent to special cases of the halting problem. If you had a way of solving the halting problem that would be pretty nifty. It would be great to have an oracle for the halting problem. Which by the way is Turing’s terminology, but not in that famous 1936 paper. In another paper he talks about oracles, which is a lovely term to use in pure mathematics.

Following Borel 1927, we know how to pack the answers to all possible cases of the halting problem into one real number, and this gives us a more realistic version of Borel’s magical know-it-all oracle number. You use the successive bits of a real number to give the answer to every individual case of the halting problem.

Remember that you can think of a computer program as a whole number, as an integer. You can number all the programs. In binary machine language a computer program is just a long bit string, and you can think of it as the base-two numeral for a big whole number. So every program is also a whole number.

And then if a program is the number N, the Nth program in a list of all possible programs, you use the Nth bit of a real number to tell us whether or not that program halts. If the Nth program halts, the Nth bit will be a 1; if it doesn’t halt, the Nth bit will be a 0.

Halting-problem oracle number: The Nth bit answers the Nth case of the halting problem.

This is a more realistic version of Borel’s 1927 oracle number. And following Turing’s 1936 paper it is uncomputable. Why?

Because if you could compute this real number, you could solve the halting problem, you could decide whether any self-contained program will halt, and this would enable you to settle a lot of famous mathematical conjectures, for instance the Riemann hypothesis. The Clay Mathematics Institute has offered a million dollar prize to the person who settles the Riemann hypothesis, but only if they settle it positively, I think. But it would also be very interesting to refute the Riemann hypothesis.

There is a bit in this real number which corresponds to the program that looks for a refutation of the Riemann hypothesis. If you could know what this particular bit is, that wouldn’t actually be worth a million dollars, because it wouldn’t give you a proof. Nevertheless this is a piece of information that a lot of mathematicians would like to know because the Riemann hypothesis is a famous problem in pure mathematics having to do with the prime numbers and how smoothly they are distributed.

So a halting-problem oracle would be a valuable thing to have. This number wouldn’t tell you about history or the future; it wouldn’t answer every yes/no question in French. But Borel’s 1927 number is paradoxical. Our halting-problem oracle is a much more down-to-earth number. In spite of being more down to earth, it is an uncomputable real that would also be very valuable.

But we can do even better! This halting-problem oracle packs a lot of mathematical information into one real number, but it doesn’t do it in the best, most economical way. This real number is redundant, it repeats a lot of information, it’s not the most compact, concise way to give the answer to every case of the halting problem. You’re wasting a lot of bits, you’re wasting a lot of space in this real number, you’re repeating a lot of information.

Let me tell you why. We want to know whether individual programs halt or not. Now I’ll give the second and last proof in this talk.

Suppose that we are given a lot of individual cases of the halting problem. Suppose we have a list of a thousand or a million programs, and want to know if each one halts or not. These are all self-contained programs.


If you have a thousand programs or a million programs, you might think that to know whether each of these programs halts or not is a thousand or a million bits of mathematical information. And it turns out that it’s not, it’s actually only ten or twenty bits of mathematical information.

N cases of halting problem = only $\log_2 N$ bits of information.

Why isn’t it a thousand or a million bits of information?

Well, you don’t need to know the answer in every individual case. You don’t want to ask the oracle too many questions. Oracles should be used sparingly.

Do we really need to ask the oracle about each individual program? Not at all! It is enough to know how many of the programs halt; I don’t need to know each individual case.

And that’s a lot less information. If there are $2^N$ programs, you only need N bits of information, not $2^N$ bits. You don’t need to know about each individual case. As I said, you just need to know how many of the programs halt. If there are N programs, that’s just $\log_2 N$ bits of information, which is much less than N bits of information.

How come we get this huge savings?

Let’s say you are given a finite set of programs, you have a finite collection of programs, and you want to know whether each one halts or not. Why does it suffice to know how many of these programs halt? You just start running all of them in parallel, and they start halting, and eventually all the programs that will ever halt, have halted. And if you know exactly how many that is, you don’t have to wait any longer, you can stop at that point. You know that all the other programs will never halt. All the ones that haven’t halted yet are never going to halt.
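The counting argument can be tried out on toy "programs". In the sketch below each program is modeled as a Python generator that yields once per step and returns when it halts; this modeling, and the names `halts_after`, `loops_forever` and `decide_halting`, are assumptions made for illustration. The only information taken from the oracle is the number of programs that halt.

```python
from math import ceil, log2

def halts_after(steps):
    """Build a toy program that runs for `steps` steps and then halts."""
    def program():
        for _ in range(steps):
            yield
    return program

def loops_forever():
    """Toy program that never halts."""
    while True:
        yield

def decide_halting(programs, number_that_halt):
    """Run all programs in parallel until `number_that_halt` have halted.

    Returns a list of booleans, True where the program halts.
    If the supplied count were too large, this loop would never end.
    """
    running = {i: make() for i, make in enumerate(programs)}
    halted = set()
    while len(halted) < number_that_halt:
        for i in list(running):
            try:
                next(running[i])       # advance program i by one step
            except StopIteration:      # program i has halted
                halted.add(i)
                del running[i]
    # Everything still running at this point can never halt.
    return [i in halted for i in range(len(programs))]

programs = [halts_after(3), loops_forever, halts_after(50), loops_forever, halts_after(0)]
answers = decide_halting(programs, number_that_halt=3)

# The count lies between 0 and N inclusive, so it fits in ceil(log2(N + 1)) bits.
bits_needed = ceil(log2(len(programs) + 1))
```

Here five yes/no answers are recovered from a count that fits in three bits; for a million programs the count fits in twenty.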

In other words, the answers to individual instances of the halting problem are never independent, they are always correlated. These are not independent mathematical facts. That’s why we don’t really need to ask an oracle in each individual case whether a program halts. We can compress this information a great deal. This information has a lot of redundancy. There are a lot of correlations in the answers to individual instances of the halting problem.

Okay, so you don’t need to use a bit for each program to get a real number that’s an oracle for the halting problem. I just told you how to do much better if you are only interested in a finite set of programs. But what if you are interested in all possible programs, what then? Well, here’s how you handle this.


You don’t ask whether individual programs halt or not; you ask what is the probability that a program chosen at random will halt.

Halting probability Ω = Prob{random program halts}.

That’s a real number between zero and one, and it is a real number I’m very proud of. I like to call it Ω, which is the last letter in the Greek alphabet, because it’s sort of a maximally unknowable real number.

Let me explain first how you define Ω, and then I’ll talk about its remarkable properties.

The idea is this: I’m taking Turing’s halting problem and I’m making it into the halting probability. Turing is interested in individual programs and asks whether or not they halt. I take all possible programs, I put them into a bag, a big bag that contains every possible computer program, I close my eyes, I shake the bag, I reach in and pull out a program and ask, “What is the probability that this program will halt?”

If every program halts, this probability would be one. If no program halts, the probability of halting would be zero. Actually some programs halt and some don’t, so the halting probability is going to be strictly between zero and one,

0 < Ω = .11011100 . . . < 1,

with an exact numerical value depending on the choice of programming language.

And it turns out that if you do things properly (there are some technical problems that I don’t want to talk about) you don’t really need to know for every individual program whether it halts or not. What you really need to know is what is the probability that a program will halt.

And the way it works is this: If I know the numerical value of the halting probability Ω with N bits of precision (I’m writing it in binary, in base two), then I know for every program up to N bits in size whether or not it halts.

Can you see why? Try thinking about it for a while.

Knowing N bits of Ω ⇒ Knowing which ≤ N bit programs halt.
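One way to see why: run all programs in parallel, adding $2^{-|p|}$ to a running total each time a program p halts. Once the total reaches the N-bit truncation of Ω, no further program of size at most N bits can halt, because its contribution of at least $2^{-N}$ would push Ω past the known N bits. The sketch below plays this out on a made-up machine; the table `TOY_MACHINE` (a prefix-free set of bit-string programs with the step at which each halts, or `None` if it never does) is purely hypothetical and stands in for actually running programs.

```python
from fractions import Fraction
from itertools import count

# Hypothetical toy "computer": prefix-free programs mapped to the step at
# which they halt, or None if they never halt. Its Omega is
# 1/2 + 1/8 + 1/16 = 11/16 = .1011 in binary.
TOY_MACHINE = {"0": 2, "10": None, "110": 5, "1110": None, "1111": 1}

def omega_lower_bound(t):
    """Sum of 2^-|p| over the programs observed to halt within t steps."""
    return sum((Fraction(1, 2 ** len(p)) for p, h in TOY_MACHINE.items()
                if h is not None and h <= t), Fraction(0))

def halting_from_omega_bits(omega_bits):
    """Decide halting for every program of size <= N from the first N bits of Omega.

    `omega_bits` is the string of bits after the binary point. If the bits
    supplied were wrong (too large), the search below would never stop.
    """
    n = len(omega_bits)
    omega_n = Fraction(int(omega_bits, 2), 2 ** n)   # Omega truncated to N bits
    for t in count(1):
        if omega_lower_bound(t) >= omega_n:
            break
    # Any program of size <= N that has not halted by step t never will.
    return {p: (h is not None and h <= t)
            for p, h in TOY_MACHINE.items() if len(p) <= n}
```

With the first three bits of this toy Ω, `halting_from_omega_bits("101")` settles the three programs of size at most 3 bits. For a real universal machine the same procedure works in principle, but the waiting time grows faster than any computable function of N.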

This is a very compact, compressed way (in fact, it is the most compressed, compact way) of giving the answers to Turing’s halting problem. You can show this is the best possible compression, the best possible oracle, this is the most economical way to do it, this is the algorithmic information content of the halting problem.

Let me try to explain this. Do you know about file compression programs? There are lots of compression programs on your computer, and I’m taking all the individual answers to the halting problem and compressing them.

So whatever your favorite compression program is, let’s use it to compress all the answers to the halting problem. If you could compress it perfectly, you’d get something that has absolutely no redundancy, something that couldn’t be compressed any more.

So you get rid of all the redundancy in individual answers to the halting problem, and what you get is this number I call the halting probability Ω. This is just the most compact, compressed way to give you the answer to all the individual cases of Turing’s famous 1936 halting problem.

Even though Ω is a very valuable number because it solves the halting problem, the interesting thing about it is that it is algorithmically and logically irreducible. In other words, Ω looks random, it looks like it has no structure, the bits of its numerical value look like independent tosses of a fair coin.

The bits of Ω are irreducible mathematical information.

Why is this? The answer is, basically, that any structure in something disappears when you compress it. If there were any pattern in the bits of Ω, for example, if 0s and 1s were not equally likely, then Ω would not be maximally compressed. In other words, when you remove all the redundancy from something, what you’re left with looks random, but it isn’t, because it’s full of valuable information.

What you get when you compress Turing’s halting problem, Ω, isn’t noise, it’s very valuable mathematical information, it gives you the answers to Turing’s halting problem, but it looks random, accidental, arbitrary, simply because you’ve removed all the redundancy. Each bit is a complete surprise.

This may seem paradoxical, but it is a basic result in information theory that once you compress something and get rid of all the redundancy in it, if you take a meaningful message and do this to it, afterwards it looks just like noise.

Let me summarize what we’ve seen thus far.

We have the halting probability Ω that is an oracle for Turing’s halting problem. It depends on your programming language and there are technical details that I don’t want to go into, but if you do everything properly you get this probability that is greater than zero and less than one. It’s a real number, and if you write it in base two, there’s no integer part, just a “.” and then a lot of bits. These bits look like they have absolutely no structure or pattern; they look random, they look like the typical result of independent tosses of a fair coin. They are sort of maximally unknowable, maximally uncomputable. Let me try to explain what this means.

At this point I want to make a philosophical statement. In pure mathematics all truths are necessary truths. And there are other truths that are called contingent or accidental like historical facts. That Napoleon was the emperor of France is not something that you expect to prove mathematically, it just happened, so it’s an accidental or a contingent truth.

And whether each bit of the numerical value of the halting probability Ω is a 0 or a 1 is a necessary truth, but looks like it’s contingent. It’s a perfect simulation of a contingent, accidental, random truth in pure mathematics, where all truths are necessary truths.

The bits of Ω are necessary but look accidental, contingent.

This is a place where God plays dice. I don’t know if any of you remember the dispute many years ago between Niels Bohr and Albert Einstein about quantum mechanics? Einstein said, “God doesn’t play dice!”, and Bohr said, “Well, He does in quantum mechanics!” I think God also plays dice in pure mathematics.

I do believe that in the Platonic world of mathematics the bits of the halting probability are fully determined. It’s not arbitrary, you can’t choose them at random. In the Platonic world of pure mathematics each bit is determined. Another way to put it is that God knows what each bit is.

But what can we know down here at our level with our finite means? Well, seen from our limited perspective the bits of Ω are maximally unknowable, they are a worst case.

The precise mathematical statement of why the bits of the numerical value of Ω are difficult to know, difficult to calculate and difficult to prove (to determine what they are by proof) is this: In order to be able to calculate the first N bits of the halting probability you need to use a program that is at least N bits in size. And to be able to prove what each of these N bits is starting from a set of axioms, you need to have at least N bits of axioms.

So the bits of Ω are irreducible mathematical facts, they are computationally and logically irreducible. Essentially the only way to get out of a formal mathematical theory what these bits are, is to put that in as a new axiom. But you can prove anything by adding it as a new axiom.

So this is a place where mathematical truth has no structure, no pattern, where logical reasoning doesn’t work, because these are sort of accidental mathematical facts.

Let me explain this another way. Leibniz talks about something called the principle of sufficient reason. He was a rationalist, and he believed that if anything is true, it must be true for a reason. In pure math the reason that something is true is called a proof. However, the bits of the halting probability Ω are truths that are true for no reason; more precisely, they are true for no reason simpler than themselves. The only way to prove what they are is to take that as a new postulate. They seem to be completely contingent, entirely accidental.

The bits of Ω are mathematical facts that are true for no reason.

They look a lot like independent tosses of a fair coin, even though they are determined mathematically. It’s a perfect simulation within pure math of independent tosses of a fair coin.

So to give an example, 0s and 1s are going to be equally likely. If you knew all the even bits, it wouldn’t help you to get any of the odd bits. If you knew the first million bits, it wouldn’t help you to get the next bit. It’s a place where mathematical truth just has no structure or pattern.

But the bits of Ω do have a lot of statistical structure. For example, in the limit there will be exactly as many 0s as 1s, the ratio of their occurrences will tend to unity. Also, all blocks of two bits are equally likely. 00, 01, 10 and 11 each have limiting relative frequency exactly 1/4; you can prove that. More generally, Ω is what Borel called a normal number, which means that in each base b, every possible block of K base-b “digits” will have exactly the same limiting relative frequency $1/b^K$. That’s provably the case for Ω. Ω is provably Borel normal.
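What Borel normality asks of a bit sequence can be checked empirically by counting blocks. The bits of Ω cannot be computed, so the sketch below uses pseudo-random coin tosses as a stand-in; that substitution, the seed, and the function name `block_frequencies` are assumptions for illustration only, and a finite sample can only suggest, never prove, the limiting frequencies.

```python
from collections import Counter
import random

def block_frequencies(bits, k):
    """Relative frequency of each k-bit block among all overlapping windows."""
    windows = [bits[i:i + k] for i in range(len(bits) - k + 1)]
    counts = Counter(windows)
    return {block: c / len(windows) for block, c in counts.items()}

rng = random.Random(0)  # fixed seed so the experiment is repeatable
coin_tosses = "".join(rng.choice("01") for _ in range(100_000))

# For a normal sequence each of 00, 01, 10, 11 tends to frequency 1/4.
freqs = block_frequencies(coin_tosses, 2)
```

The same function with `k=3` should give eight blocks near 1/8 each, matching the $1/b^K$ rule with b = 2.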

Another thing you can show is that the Ω number is transcendental; it’s not algebraic, it’s not the solution of an algebraic equation with integer coefficients. Actually any uncomputable number must be transcendental; it can’t be algebraic. But Ω is more than uncomputable, it’s maximally uncomputable. This is a place where mathematical truth has absolutely no structure or pattern. This is a place where mathematical truth looks contingent or accidental or random.


Now if I may go one step further, I’d like to end this talk by comparing pure mathematics with theoretical physics and with biology.

Pure mathematics developed together with theoretical physics. A lot of wonderful pure mathematicians of the past were also theoretical physicists, Euler for example, or more recently Hermann Weyl. The two fields are rather similar. And physicists are still hoping for a theory of everything (TOE), which would be a set of simple, elegant equations that give you the whole universe, and which would fit on a T-shirt.

So that’s physics. On the other hand we have biology. Molecular biology is a very complicated subject. An individual cell is like a city. Every one of us has $3 \times 10^9$ bases in our DNA, which is $6 \times 10^9$ bits. There is no simple equation for a human being. Biology is the domain of the complicated.

How does pure mathematics compare with these two other fields? Normally you think pure math is closer to physics, since they grew together, they co-evolved. But what the bits of the halting probability Ω show is that in a certain sense pure math is closer to biology than it is to theoretical physics, because pure mathematics provably contains infinite irreducible complexity. Math is even worse than biology, which has very high but only finite complexity. The human genome is $6 \times 10^9$ bits, which is a lot, but it’s finite. But pure mathematics contains the bits of Ω, which is an infinite number of bits of complexity!

Human = $6 \times 10^9$ bits, Ω = infinite number of bits.

Thanks very much!


Chapter 6

Speculations on biology, information and complexity

Bulletin of the European Association for Theoretical Computer Science 91 (February 2007), pp. 231–237.

Abstract: It would be nice to have a mathematical understanding of basic biological concepts and to be able to prove that life must evolve in very general circumstances. At present we are far from being able to do this. But I’ll discuss some partial steps in this direction plus what I regard as a possible future line of attack.

Can Darwinian evolution be made into a mathematical theory?

Is there a fundamental mathematical theory for biology?

Darwin = math ?!

In 1960 the physicist Eugene Wigner published a paper with a wonderful title, “The unreasonable effectiveness of mathematics in the natural sciences.” In this paper he marveled at the miracle that pure mathematics is so often extremely useful in theoretical physics.

To me this does not seem so marvelous, since mathematics and physics co-evolved. That however does not diminish the miracle that at a fundamental level Nature is ruled by simple, beautiful mathematical laws, that is, the miracle that Nature is comprehensible.

I personally am much more disturbed by another phenomenon, pointed out by I.M. Gel’fand and propagated by Vladimir Arnold in a lecture of his that is available on the web, which is the stunning contrast between the relevance of mathematics to physics, and its amazing lack of relevance to biology!

Indeed, unlike physics, biology is not ruled by simple laws. There is no equation for your spouse, or for a human society or a natural ecology. Biology is the domain of the complex. It takes $3 \times 10^9$ bases = $6 \times 10^9$ bits of information to specify the DNA that determines a human being.

Darwinian evolution has acquired the status of a dogma, but to me as a mathematician it seems woefully vague and unsatisfactory. What is evolution? What is evolving? How can we measure that? And can we prove, mathematically prove, that with high probability life must arise and evolve?

In my opinion, if Darwin’s theory is as simple, fundamental and basic as its adherents believe, then there ought to be an equally fundamental mathematical theory about this, that expresses these ideas with the generality, precision and degree of abstractness that we are accustomed to demand in pure mathematics.

Look around you. We are surrounded by evolving organisms, they’re everywhere, and their ubiquity is a challenge to the mathematical way of thinking. Evolution is not just a story for children fascinated by dinosaurs. In my own lifetime I have seen the ease with which microbes evolve immunity to antibiotics. We may well live in a future in which people will again die of simple infections that we were once briefly able to control.

Evolution seems to work remarkably well all around us, but not as a mathematical theory!

In the next section of this paper I will speculate about possible directions for modeling evolution mathematically. I do not know how to solve this difficult problem; new ideas are needed. But later in the paper I will have the pleasure of describing a minor triumph. The program-size complexity viewpoint that I will now describe to you does have some successes to its credit, even though they only take us an infinitesimal distance in the direction we must travel to fully understand evolution.


A software view of biology:

Can we model evolution via evolving software?

I’d like to start by explaining my overall point of view. It is summarized here:

Life = Software ?

program → COMPUTER → output

DNA → DEVELOPMENT/PREGNANCY → organism

(Size of program in bits) ≈ (Amount of DNA in bases) × 2

So the idea is firstly that I regard life as software, biochemical software. In particular, I focus on the digital information contained in DNA. In my opinion, DNA is essentially a programming language for building an organism and then running that organism.

More precisely, my central metaphor is that DNA is a computer program, and its output is the organism. And how can we measure the complexity of an organism? How can we measure the amount of information that is contained in DNA? Well, each of the successive bases in a DNA strand is just 2 bits of digital software, since there are four possible bases. The alphabet for computer software is 0 and 1. The alphabet of life is A, C, G, and T, standing for adenine, cytosine, guanine, and thymine. A program is just a string of bits, and the human genome is just a string of bases. So in both cases we are looking at digital information.

My basic approach is to measure the complexity of a digital object by the size in bits of the smallest program for calculating it. I think this is more or less analogous to measuring the complexity of a biological organism by 2 times the number of bases in its DNA.
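The factor of 2 is just the cost of naming one of four letters in binary. A minimal sketch, in which the particular assignment of bit pairs to bases is an arbitrary assumption (any one-to-one assignment would do):

```python
# Assumed, arbitrary 2-bit code for each base.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def dna_to_bits(dna):
    """Translate a string of bases into a bit string, 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in dna)

def bits_to_dna(bits):
    """Inverse translation: read the bit string two bits at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

# Size of the human genome under this measure.
genome_bases = 3 * 10**9
genome_bits = 2 * genome_bases   # 6 * 10**9 bits
```

This raw count is only an upper bound on the program-size complexity discussed in the text, since a genome with repeated or redundant stretches could be described by a shorter program.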

Of course, this is a tremendous oversimplification. But I am only searching for a toy model of biology that is simple enough that I can prove some theorems, not for a detailed theory describing the actual biological organisms that we have here on earth. I am searching for the Platonic essence of biology; I am only interested in the actual creatures we know and love to the extent that they are clues for finding ideal Platonic forms of life.

How to go about doing this, I am not sure. But I have some suggestions.

It might be interesting, I think, to attempt to discover a toy model for evolution consisting of evolving, competing, interacting programs. Each organism would consist of a single program, and we would measure its complexity in bits of software. The only problem is how to make the programs interact! This kind of model has no geometry, it leaves out the physical universe in which the organisms live. In fact, it omits bodies and retains only their DNA. This hopefully helps to make the mathematics more tractable. But at present this model has no interaction between organisms, no notion of time, no dynamics, and no reason for things to evolve. The question is how to add that to the model.

Hopeless, you may say. Perhaps not! Let’s consider some other models that people have proposed. In von Neumann’s original model creatures are embedded in a cellular automata world and are largely immobile. Not so good! There is also the problem of dissecting out the individual organisms that are embedded in a toy universe, which must be done before their individual complexities can be measured. My suggestion in one of my early papers that it might be possible to use the concept of mutual information (the extent to which the complexity of two things taken together is smaller than the sum of their individual complexities) in order to accomplish this, is not, in my current opinion, particularly fruitful.

In von Neumann’s original model we have the complete physics for a toy cellular automata universe. Walter Fontana’s ALChemy = algorithmic chemistry project went to a slightly higher level of abstraction. It used LISP S-expressions to model biochemistry. LISP is a functional programming language in which everything (programs as well as data) is kept in identical symbolic form, namely as what are called LISP S-expressions. Such programs can easily operate on each other and produce other programs, much in the way that molecules can react and produce other molecules.

I have a feeling that both von Neumann’s cellular automata world and Fontana’s algorithmic chemistry are too low-level to model biological evolution. (A model with perhaps the opposite problem of being at too high a level, is Douglas Lenat’s AM = Automated Mathematician project, which dealt with the evolution of new mathematical concepts.) So instead I am proposing a model in which individual creatures are programs. As I said, the only problem is how to model the ecology in which these creatures compete. In other words, the problem is how to insert a dynamics into this static software world.¹

¹ Thomas Ray’s Tierra project did in fact create an ecology with software parasites and hyperparasites. The software creatures he considered were sequences of machine language instructions coexisting in the memory of a single computer and competing for that machine’s memory and execution time. Again, I feel this model was too low-level. I feel that too much micro-structure was included.

Since I have not been able to come up with a suitable dynamics for the software model I am proposing, I must leave this as a challenge for the future and proceed to describe a few biologically relevant things that I can do by measuring the size of computer programs. Let me tell you what this viewpoint can buy us that is a tiny bit biologically relevant.

Pure mathematics has infinite complexity and is therefore like biology

Okay, program-size complexity can’t help us very much with biological complexity and evolution, at least not yet. It’s not much help in biology. But this viewpoint has been developed into a mathematical theory of complexity that I find beautiful and compelling (since I’m one of the people who created it) and that has important applications in another major field, namely metamathematics. I call my theory algorithmic information theory, and in it you measure the complexity of something X via the size in bits of the smallest program for calculating X, while completely ignoring the amount of effort which may be necessary to discover this program or to actually run it (time and storage space). In fact, we pay a severe price for ignoring the time a program takes to run and concentrating only on its size. We get a beautiful theory, but we can almost never be sure that we have found the smallest program for calculating something. We can almost never determine the complexity of anything, if we choose to measure that in terms of the size of the smallest program for calculating it!

This amazing fact, a modern example of the incompleteness phenomenon first discovered by Kurt Gödel in 1931, severely limits the practical utility of the concept of program-size complexity. However, from a philosophical point of view, this paradoxical limitation on what we can know is precisely the most interesting thing about algorithmic information theory, because that has profound epistemological implications.

The jewel in the crown of algorithmic information theory is the halting probability Ω, which provides a concentrated version of Alan Turing’s 1936 halting problem. In 1936 Turing asked if there was a way to determine whether or not individual self-contained computer programs will eventually stop. And his answer, surprisingly enough, is that this cannot be done. Perhaps it can be done in individual cases, but Turing showed that there could be no general-purpose algorithm for doing this, one that would work for all possible programs.

The halting probability Ω is defined to be the probability that a program that is chosen at random, that is, one that is generated by coin tossing, will eventually halt. If no program ever halted, the value of Ω would be zero. If all programs were to halt, the value of Ω would be one. And since in actual fact some programs halt and some fail to halt, the value of Ω is greater than zero and less than one. Moreover, Ω has the remarkable property that its numerical value is maximally unknowable. More precisely, let’s imagine writing the value of Ω out in binary, in base-two notation. That would consist of a binary point followed by an infinite stream of bits. It turns out that these bits are irreducible, both computationally and logically:

• You need an N-bit program in order to be able to calculate the first N bits of the numerical value of Ω.

• You need N bits of axioms in order to be able to prove what are the first N bits of Ω.

• In fact, you need N bits of axioms in order to be able to determine the positions and values of any N bits of Ω, not just the first N bits.
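In symbols, these statements are usually written with an additive constant, which the informal wording above suppresses. The notation here is an assumption rather than something the text introduces: $H(x)$ is the size in bits of the smallest program for calculating $x$, $\Omega_N$ is the string of the first $N$ bits of $\Omega$, $A$ is a formal axiomatic theory, and $c$, $c'$ are constants depending only on the choice of programming language.

```latex
\begin{align*}
  H(\Omega_N) &> N - c
      && \text{for every } N, \\
  \#\{\text{bits of } \Omega \text{ whose position and value } A \text{ can prove}\}
      &\le H(A) + c'.
\end{align*}
```

So "an N-bit program" and "N bits of axioms" should be read as "N bits, give or take a fixed number that does not grow with N".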

Thus the bits of Ω are, in a sense, mathematical facts that are true for no reason, more precisely, for no reason simpler than themselves. Essentially the only way to determine the values of some of these bits is to directly add that information as a new axiom.

And the only way to calculate individual bits of Ω is to separately add each bit you want to your program. The more bits you want, the larger your program must become, so the program doesn’t really help you very much. You see, you can only calculate bits of Ω if you already know what these bits are, which is not terribly useful. Whereas with π = 3.1415926 . . . we can get all the bits or all the digits from a single finite program, that’s all you have to know. The algorithm for π compresses an infinite amount of information into a finite package. But with Ω there can be no compression, none at all, because there is absolutely no structure.
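For contrast, here is what "a single finite program" for π can look like. The text does not say which algorithm it has in mind; the sketch below uses the unbounded spigot algorithm published by Jeremy Gibbons, chosen only because it is short. The generator never stops by itself, so the caller decides how many digits to take.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi one at a time, forever.

    Spigot algorithm (Gibbons): all state is kept in exact integers,
    so no precision has to be fixed in advance.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # not enough information yet: absorb one more term of the series
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

first_digits = list(islice(pi_digits(), 8))
```

These few lines stand in for infinitely many digits. No program of fixed size can do the same for the bits of Ω.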

Furthermore, since the bits of Ω in their totality are infinitely complex, we see that pure mathematics contains infinite complexity. Each of the bits of Ω is, so to speak, a complete surprise, an individual atom of mathematical creativity. Pure mathematics is therefore, fundamentally, much more similar to biology, the domain of the complex, than it is to physics, where there is still hope of someday finding a theory of everything, a complete set of equations for the universe that might even fit on a T-shirt.

In my opinion, establishing this surprising fact has been the most important achievement of algorithmic information theory, even though it is actually a rather weak link between pure mathematics and biology. But I think it’s an actual link, perhaps the first.

Computing Ω in the limit from below as a model for evolution

I should also point out that Ω provides an extremely abstract (much too abstract to be satisfying) model for evolution. Because even though Ω contains infinite complexity, it can be obtained in the limit of infinite time via a computational process. Since this extremely lengthy computational process generates something of infinite complexity, it may be regarded as an evolutionary process.

How can we do this? Well, it’s actually quite simple. Even though, as I have said, Ω is maximally unknowable, there is a simple but very time-consuming way to obtain increasingly accurate lower bounds on Ω. To do this simply pick a cut-off t, and consider the finite set of all programs p up to t bits in size which halt within time t. Each such program p contributes $1/2^{|p|}$, 1 over 2 raised to p’s size in bits, to Ω. In other words,

$$\Omega = \lim_{t \to \infty} \; \sum_{|p| \,\le\, t \;\&\; p \text{ halts within time } t} 2^{-|p|}.$$
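The stages of this limit can be tabulated for a made-up machine. In the sketch below the table `TOY_MACHINE` is hypothetical: it lists a prefix-free set of bit-string programs together with the step at which each halts (`None` meaning never), standing in for actually running programs on a universal computer. The function name `omega_stage` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical toy machine: program -> step at which it halts (None = never).
TOY_MACHINE = {"0": 2, "10": None, "110": 5, "1110": None, "1111": 1}

def omega_stage(t):
    """Lower bound on Omega from programs of size <= t that halt within time t."""
    return sum((Fraction(1, 2 ** len(p)) for p, halt_step in TOY_MACHINE.items()
                if len(p) <= t and halt_step is not None and halt_step <= t),
               Fraction(0))

# Stages t = 1, 2, ..., 7: the bounds only ever increase, creeping up on Omega.
stages = [omega_stage(t) for t in range(1, 8)]
```

For this toy table the stages rise from 0 to 11/16 and then stay there. For a real universal machine the sequence also rises monotonically to Ω, but there is no computable way to tell how close a given stage is to the limit.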

This may be cute, and I feel compelled to tell you about it, but I certainly do not regard this as a satisfactory model for biological evolution, since there is no apparent connection with Darwin’s theory.

References

The classical work on a theoretical mathematical underpinning for biology is von Neumann’s posthumous book [2]. (An earlier account of von Neumann’s thinking on this subject was published in [1], which I read as a child.) Interestingly enough, Francis Crick, who probably contributed more than any other individual to creating modern molecular biology, for many years shared an office with Sydney Brenner, who was aware of von Neumann’s thoughts on theoretical biology and self-reproduction. This interesting fact is revealed in the splendid biography of Crick [3].

For a book-length presentation of my own work on information and complexity, see [4], where there is a substantial amount of material on molecular biology. This book is summarized in my recent article [5], which however does not discuss biology. A longer overview of [4] is my Alan Turing lecture [6], which does touch on biological questions.

For my complete train of thought on biology extending over nearly four decades, see also [7,8,9,10,11].

For information on Tierra, see Tom Ray’s home page at http://www.his.atr.jp/~ray/. For information on ALChemy, see http://www.santafe.edu/~walter/AlChemy/papers.html. For information on Douglas Lenat’s Automated Mathematician, see [12] and the Wikipedia entry http://en.wikipedia.org/wiki/Automated_Mathematician.

For Vladimir Arnold’s provocative lecture, the one in which Wigner and Gel’fand are mentioned, see http://pauli.uni-muenster.de/~munsteg/arnold.html. Wigner’s entire paper is itself on the web at http://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html.

1. J. Kemeny, “Man viewed as a machine,” Scientific American, April 1955, pp. 58–67.

2. J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, 1967.

3. M. Ridley, Francis Crick, Eminent Lives, New York, 2006.

4. G. Chaitin, Meta Math!, Pantheon Books, New York, 2005.

5. G. Chaitin, “The limits of reason,” Scientific American, March 2006, pp. 74–81.

6. G. Chaitin, “Epistemology as information theory: from Leibniz to Ω,” European Computing and Philosophy Conference, Västerås, Sweden, June 2005.

7. G. Chaitin, “To a mathematical definition of ‘life’,” ACM SICACT News, January 1970, pp. 12–18.

8. G. Chaitin, “Toward a mathematical definition of ‘life’,” R. Levine, M. Tribus, The Maximum Entropy Formalism, MIT Press, 1979, pp. 477–498.

9. G. Chaitin, “Algorithmic information and evolution,” O. Solbrig, G. Nicolis, Perspectives on Biological Complexity, IUBS Press, 1991, pp. 51–60.

10. G. Chaitin, “Complexity and biology,” New Scientist, 5 October 1991, p. 52.

11. G. Chaitin, “Meta-mathematics and the foundations of mathematics,” Bulletin of the European Association for Theoretical Computer Science, June 2002, pp. 167–179.

12. D. Lenat, “Automated theory formation in mathematics,” pp. 833–842 in volume 2 of R. Reddy, Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, William Kaufmann, 1977.


Chapter 7

Metaphysics, metamathematics and metabiology

To be published in H. Zenil, Randomness Through Computation, World Scientific, 2011.

Abstract: In this essay we present an information-theoretic perspective on epistemology using software models. We shall use the notion of algorithmic information to discuss what is a physical law, to determine the limits of the axiomatic method, and to analyze Darwin’s theory of evolution.

Weyl, Leibniz, complexity and the principle of sufficient reason

The best way to understand the deep concept of conceptual complexity and algorithmic information, which is our basic tool, is to see how it evolved, to know its long history. Let’s start with Hermann Weyl and the great philosopher/mathematician G. W. Leibniz. That everything that is true is true for a reason is rationalist Leibniz’s famous principle of sufficient reason. The bits of Ω seem to refute this fundamental principle and also the idea that everything can be proved starting from self-evident facts.



What is a scientific theory?

The starting point of algorithmic information theory, which is the subject of this essay, is this toy model of the scientific method:

theory/program/010 → Computer → experimental data/output/110100101.

A scientific theory is a computer program for exactly producing the experimental data, and both theory and data are a finite sequence of bits, a bit string. Then we can define the complexity of a theory to be its size in bits, and we can compare the size in bits of a theory with the size in bits of the experimental data that it accounts for.

That the simplest theory is best means that we should pick the smallest program that explains a given set of data. Furthermore, if the theory is the same size as the data, then it is useless, because there is always a theory that is the same size as the data that it explains. In other words, a theory must be a compression of the data, and the greater the compression, the better the theory. Explanations are compressions, comprehension is compression!

Furthermore, if a bit string has absolutely no structure, if it is completely random, then there will be no theory for it that is smaller than it is. Most bit strings of a given size are incompressible and therefore incomprehensible, simply because there are not enough smaller theories to go around.
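The counting argument behind "not enough smaller theories to go around" can be made concrete. The sketch below is only an illustration, and the function names are made up for it: it counts how many bit strings are shorter than a given size and bounds the fraction of n-bit strings that could possibly have a theory several bits shorter than themselves.

```python
from fractions import Fraction


def count_strings_shorter_than(n: int) -> int:
    """Number of bit strings of length 0..n-1 (candidate programs under n bits)."""
    return sum(2**k for k in range(n))  # equals 2**n - 1


def max_fraction_compressible(n: int, saved_bits: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that can have a program
    at least `saved_bits` bits shorter than the string itself."""
    # Programs of length at most n - saved_bits, i.e. shorter than n - saved_bits + 1.
    shorter_programs = count_strings_shorter_than(n - saved_bits + 1)
    return Fraction(shorter_programs, 2**n)


if __name__ == "__main__":
    n = 64
    for saved in (1, 8, 16):
        bound = max_fraction_compressible(n, saved)
        print(f"save {saved:2d} bits: at most {float(bound):.6%} of {n}-bit strings")
```

Each program produces at most one output, so the bound holds whatever programming language is used; saving even 8 bits is possible for fewer than 1 string in 128.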

This software model of science is not new. It can be traced back via Hermann Weyl (1932) to G. W. Leibniz (1686)! Let’s start with Weyl. In his little book on philosophy The Open World: Three Lectures on the Metaphysical Implications of Science, Weyl points out that if arbitrarily complex laws are allowed, then the concept of law becomes vacuous, because there is always a law! In his view, this implies that the concept of a physical law and of complexity are inseparable; for there can be no concept of law without a corresponding complexity concept. Unfortunately he also points out that in spite of its importance, the concept of complexity is a slippery one and hard to define mathematically in a convincing and rigorous fashion.

Furthermore, Weyl attributes these ideas to Leibniz, to the 1686 Discours de métaphysique. What does Leibniz have to say about complexity in his Discours? The material on complexity is in Sections V and VI of the Discours.

In Section V, Leibniz explains why science is possible, why the world is comprehensible, lawful. It is, he says, because God has created the best possible, the most perfect world, in that the greatest possible diversity of phenomena are governed by the smallest possible set of ideas. God simultaneously maximizes the richness and diversity of the world and minimizes the complexity of the ideas, of the mathematical laws, that determine this world. That is why science is possible!

A modern restatement of this idea is that science is possible because the world seems very complex but is actually governed by a small set of laws having low conceptual complexity.

And in Section VI of the Discours, Leibniz touches on randomness. He points out that any finite set of points on a piece of graph paper always seems to follow a law, because there is always a mathematical equation passing through those very points. But there is a law only if the equation is simple, not if it is very complicated. This is the idea that impressed Weyl, and it becomes the definition of randomness in algorithmic information theory.¹
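Leibniz's observation that some equation always passes through any finite set of points can be checked directly with Lagrange interpolation. This is a minimal sketch, assuming points with distinct x-coordinates; the function name and the sample points are invented for the illustration and are not taken from the Discours.

```python
from fractions import Fraction


def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through every (xi, yi) in points (xi must be distinct)."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


if __name__ == "__main__":
    # Arbitrary "ink blots" on the graph paper: no pattern was intended.
    points = [(0, 3), (1, -1), (2, 8), (3, 2), (4, 7)]
    # The interpolating polynomial reproduces every point exactly.
    print(all(lagrange_interpolate(points, xi) == yi for xi, yi in points))
```

The polynomial needs as many coefficients as there are points, so this "law" is no smaller than the data it describes, which is exactly why it does not count as a law.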

Finding elegant programs

So the best theory for something is the smallest program that calculates it. How can we be sure that we have the best theory? Let’s forget about theories and just call a program elegant if it is the smallest program that produces the output that it does. More precisely, a program is elegant if no smaller program written in the same language produces the same output.

So can we be sure that a program is elegant, that it is the best theory for its output? Amazingly enough, we can’t: It turns out that any formal axiomatic theory A can prove that at most finitely many programs are elegant, in spite of the fact that there are infinitely many elegant programs. More precisely, it takes an N-bit theory A, one having N bits of axioms, having complexity N, to be able to prove that an individual N-bit program is elegant. And we don’t need to know much about the formal axiomatic theory A in order to be able to prove that it has this limitation.

What is a formal axiomatic theory?

All we need to know about the axiomatic theory A, is the crucial requirement emphasized by David Hilbert that there should be a proof-checking algorithm, a mechanical procedure for deciding if a proof is correct or not. It follows that we can systematically run through all possible proofs, all possible strings of characters in the alphabet of the theory A, in size order, checking which ones are valid proofs, and thus discover all the theorems, all the provable assertions in the theory A.²

¹ Historical Note: Algorithmic information theory was first proposed in the 1960s by R. Solomonoff, A. N. Kolmogorov, and G. J. Chaitin. Solomonoff and Chaitin considered this toy model of the scientific method, and Kolmogorov and Chaitin proposed defining randomness as algorithmic incompressibility.
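The procedure of running through all strings in size order and keeping those that pass the proof checker can be sketched for a deliberately tiny theory. Everything below is a hypothetical toy chosen for the illustration: the three-character alphabet, the checker (which accepts exactly the true unary sums such as `11+1=111`), and the function names.

```python
from itertools import count, islice, product

ALPHABET = "1+="  # hypothetical toy alphabet of the theory


def check_proof(candidate: str) -> bool:
    """Toy proof checker: accept exactly the true unary sums, e.g. '11+1=111'."""
    left, eq, right = candidate.partition("=")
    a, plus, b = left.partition("+")
    return (
        eq == "="
        and plus == "+"
        and all(part and set(part) == {"1"} for part in (a, b, right))
        and len(a) + len(b) == len(right)
    )


def theorems():
    """Run through all strings in size order, yielding those the checker accepts."""
    for size in count(1):
        for chars in product(ALPHABET, repeat=size):
            candidate = "".join(chars)
            if check_proof(candidate):
                yield candidate


if __name__ == "__main__":
    print(list(islice(theorems(), 3)))  # ['1+1=11', '11+1=111', '1+11=111']
```

The generator never terminates on its own, which matches the description of theorem enumeration as a never-ending mechanical computation; only the caller decides when to stop reading from it.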

That’s all we need to know about a formal axiomatic theory A, that there is an algorithm for generating all the theorems of the theory. This is the software model of the axiomatic method studied in algorithmic information theory. If the software for producing all the theorems is N bits in size, then the complexity of our theory A is defined to be N bits, and we can limit A’s power in terms of its complexity H(A) = N. Here’s how:

Why can’t you prove that a program is elegant?

Suppose that we have an N-bit theory A, that is, that H(A) = N, and that it is always possible to prove that individual elegant programs are in fact elegant, and that it is never possible to prove that inelegant programs are elegant. Consider the following paradoxical program P:

P runs through all possible proofs in the formal axiomatic theory A, searching for the first proof in A that an individual program Q is elegant for which it is also the case that the size of Q in bits is larger than the size of P in bits. And what does P do when it finds Q? It runs Q and then P produces as its output the output of Q.

In other words, the output of P is the same as the output of the first provably elegant program Q that is larger than P. But this contradicts the definition of elegance! P is too small to be able to calculate the output of an elegant program Q that is larger than P. We seem to have arrived at a contradiction!

But do not worry; there is no contradiction. What we have actually proved is that P can never find Q. In other words, there is no proof in the formal axiomatic theory A that an individual program Q is elegant, not if Q is larger than P. And how large is P? Well, just a fixed number of bits c larger than N, the complexity H(A) of the formal axiomatic theory A. P consists of a small, fixed main program c bits in size, followed by a large subroutine H(A) bits in size for generating all the theorems of A.

² Historical Note: The idea of running through all possible proofs, of creativity by mechanically trying all possible combinations, can be traced back through Leibniz to Ramon Llull in the 1200s.

The only thing tricky about this proof is that it requires P to be able to know its own size in bits. And how well we are able to do this depends on the details of the particular programming language that we are using for the proof. So to get a neat result and to be able to carry out this simple, elegant proof, we have to be sure to use an appropriate programming language. This is one of the key issues in algorithmic information theory, which programming language to use.³

Farewell to reason: The halting probability Ω⁴

So there are infinitely many elegant programs, but there are only finitely many provably elegant programs in any formal axiomatic theory A. The proof of this is rather straightforward and short. Nevertheless, this is a fundamental information-theoretic incompleteness theorem that is rather different in style from the classical incompleteness results of Gödel, Turing and others.

An even more important incompleteness result in algorithmic information theory has to do with the halting probability Ω, the numerical value of the probability that a program p whose successive bits are generated by independent tosses of a fair coin will eventually halt:

$$\Omega = \sum_{p \text{ halts}} 2^{-(\text{size in bits of } p)}.$$

To be able to define this probability Ω, it is also very important how you choose your programming language. If you are not careful, this sum will diverge instead of being ≤ 1 like a well-behaved probability should.
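Why the choice of language matters for convergence can be seen with a small experiment. The sketch below is an assumption-laden illustration, not the language used in AIT: it compares the sum of 2^(-size) over all bit strings, which grows without bound, with the same sum over a hypothetical self-delimiting encoding (k ones, a zero, then k payload bits), where no program is a prefix of another and the sum stays below 1.

```python
from fractions import Fraction
from itertools import product


def weight(program: str) -> Fraction:
    """Probability that fair coin tosses spell out exactly this program."""
    return Fraction(1, 2 ** len(program))


def is_prefix_free(programs) -> bool:
    """True if no program in the collection is a proper prefix of another."""
    ordered = sorted(set(programs))
    # In sorted order, a prefix is always immediately followed by an extension.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def all_bit_strings(max_len: int):
    """Every bit string of length 1..max_len (not prefix-free)."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def self_delimiting(max_payload: int):
    """Hypothetical encoding: k ones, a zero, then k payload bits."""
    for k in range(max_payload + 1):
        for bits in product("01", repeat=k):
            yield "1" * k + "0" + "".join(bits)


if __name__ == "__main__":
    for n in (2, 4, 8):
        naive = sum(weight(p) for p in all_bit_strings(n))
        careful = sum(weight(p) for p in self_delimiting(n))
        print(n, naive, careful)
```

With all bit strings allowed, each length contributes exactly 1 to the sum, so it diverges; with the self-delimiting encoding the partial sums are 1 - 2^(-(k+1)) and never exceed 1. Here every string is treated as a halting program, which is the worst case for the sum.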

Turing’s fundamental result is that the halting problem is unsolvable. In algorithmic information theory the fundamental result is that the halting probability Ω is algorithmically irreducible or random. It follows that the bits of Ω cannot be compressed into a theory less complicated than they are. They are irreducibly complex. It takes N bits of axioms to be able to determine N bits of the numerical value

Ω = .1101011 . . .

of the halting probability. If your formal axiomatic theory A has H(A) = N, then you can determine the values and positions of at most N + c bits of Ω.

³ See the chapter on “The Search for the Perfect Language” in Chaitin, Mathematics, Complexity and Philosophy, in press.

⁴ Farewell to Reason is the title of a book by Paul Feyerabend, a wonderfully provocative philosopher. We borrow his title here for dramatic effect, but he does not discuss Ω in this book or any of his other works.

In other words, the bits of Ω are logically irreducible, they cannot be proved from anything simpler than they are. Essentially the only way to determine what are the bits of Ω is to add these bits to your theory A as new axioms. But you can prove anything by adding it as a new axiom. That’s not using reasoning!

So the bits of Ω refute Leibniz’s principle of sufficient reason: they are true for no reason. More precisely, they are not true for any reason simpler than themselves. This is a place where mathematical truth has absolutely no structure, no pattern, for which there is no theory!

Adding new axioms: Quasi-empirical mathematics⁵

So incompleteness follows immediately from fundamental information-theoretic limitations. What to do about incompleteness? Well, just add new axioms, increase the complexity H(A) of your theory A! That is the only way to get around incompleteness.

In other words, do mathematics more like physics, add new axioms not because they are self-evident, but for pragmatic reasons, because they help mathematicians to organize their mathematical experience just like physical theories help physicists to organize their physical experience. After all, Maxwell’s equations and the Schrödinger equation are not at all self-evident, but they work! And this is just what mathematicians have done in theoretical computer science with the hypothesis that P ≠ NP, in mathematical cryptography with the hypothesis that factoring is hard, and in abstract axiomatic set theory with the new axiom of projective determinacy.⁶

⁵ The term quasi-empirical is due to the philosopher Imre Lakatos, a friend of Feyerabend. For more on this school, including the original article by Lakatos, see the collection of quasi-empirical philosophy of math papers edited by Thomas Tymoczko, New Directions in the Philosophy of Mathematics.

⁶ See the article on “The Brave New World of Bodacious Assumptions in Cryptography” in the March 2010 issue of the AMS Notices, and the article by W. Hugh Woodin on “The Continuum Hypothesis” in the June/July 2001 issue of the AMS Notices.


Mathematics, biology and metabiology

We’ve discussed physical and mathematical theories; now let’s turn to biology, the most exciting field of science at this time, but one where mathematics is not very helpful. Biology is very different from physics. There is no simple equation for your spouse. Biology is the domain of the complex. There are not many universal rules. There are always exceptions. Math is very important in theoretical physics, but there is no fundamental mathematical theoretical biology.

This is unacceptable. The honor of mathematics requires us to come up with a mathematical theory of evolution and either prove that Darwin was wrong or right! We want a general, abstract theory of evolution, not an immensely complicated theory of actual biological evolution. And we want proofs, not computer simulations! So we’ve got to keep our model very, very simple.

That’s why this proposed new field is metabiology, not biology.

What kind of math can we use to build such a theory? Well, it’s certainly not going to be differential equations. Don’t expect to find the secret of life in a differential equation; that’s the wrong kind of mathematics for a fundamental theory of biology.

In fact a universal Turing machine has much more to do with biology than a differential equation does. A universal Turing machine is a very complicated new kind of object compared to what came previously, compared with the simple, elegant ideas in classical mathematics like analysis. And there are self-reproducing computer programs, which is an encouraging sign.

There are in fact three areas in our current mathematics that do have some fundamental connection with biology, that show promise for math to continue moving in a biological direction:

Computation, Information, Complexity.

DNA is essentially a programming language that computes the organism and its functioning; hence the relevance of the theory of computation for biology.

Furthermore, DNA contains biological information. Hence the relevance of information theory. There are in fact at least four different theories of information:

• Boltzmann statistical mechanics and Boltzmann entropy,

• Shannon communication theory and coding theory,


• algorithmic information theory (Solomonoff, Kolmogorov, Chaitin), which is the subject of this essay, and

• quantum information theory and qubits.

Of the four, AIT (algorithmic information theory) is closest in spirit to biology. AIT studies the size in bits of the smallest program to compute something. And the complexity of a living organism can be roughly (very roughly) measured by the number of bases in its DNA, in the biological computer program for calculating it.

Finally, let’s talk about complexity. Complexity is in fact the most distinguishing feature of biological as opposed to physical science and mathematics. There are many computational definitions of complexity, usually concerned with computation times, but again AIT, which concentrates on program size or conceptual complexity, is closest in spirit to biology.

Let’s emphasize what we are not interested in doing. We are certainly not trying to do systems biology: large, complex realistic simulations of biological systems. And we are not interested in anything that is at all like Fisher-Wright population genetics that uses differential equations to study the shift of gene frequencies in response to selective pressures.

We want to use a sufficiently rich mathematical space to model the space of all possible designs for biological organisms, to model biological creativity. And the only space that is sufficiently rich to do that is a software space, the space of all possible algorithms in a fixed programming language. Otherwise we have limited ourselves to a fixed set of possible genes as in population genetics, and it is hopeless to expect to model the major transitions in biological evolution such as from single-celled to multicellular organisms, which is a bit like taking a main program and making it into a subroutine that is called many times.

Recall the cover of Stephen Gould’s Wonderful Life on the Burgess shale and the Cambrian explosion? Around 250 primitive organisms with wildly differing body plans, looking very much like the combinatorial exploration of a software space. Note that there are no intermediate forms; small changes in software produce vast changes in output.

So to simplify matters and concentrate on the essentials, let’s throw away the organism and just keep the DNA. Here is our proposal:

Metabiology: a field parallel to biology that studies the random evolution of artificial software (computer programs) rather than natural software (DNA), and that is sufficiently simple to permit rigorous proofs or at least heuristic arguments as convincing as those that are employed in theoretical physics.

This analogy may seem a bit far-fetched. But recall that Darwin himself was inspired by the analogy between artificial selection by plant and animal breeders and natural selection imposed by Malthusian limitations.

Furthermore, there are many tantalizing analogies between DNA and large, old pieces of software. Remember bricolage, that Nature is a cobbler, a tinkerer? In fact, a human being is just a very large piece of software, one that is 3 × 10^9 bases = 6 × 10^9 bits ≈ one gigabyte of software that has been patched and modified for more than a billion years: a tremendous mess, in fact, with bits and pieces of fish and amphibian design mixed in with that for a mammal.⁷ For example, at one point in gestation the human embryo has gills. As time goes by, large human software projects also turn into a tremendous mess with many old bits and pieces.
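The size estimate in the paragraph above can be spelled out. The only assumption added here is the standard one that each of the four DNA bases carries two bits:

```latex
3 \times 10^{9}\ \text{bases} \times 2\ \frac{\text{bits}}{\text{base}}
  = 6 \times 10^{9}\ \text{bits}
  = \frac{6 \times 10^{9}}{8}\ \text{bytes}
  = 7.5 \times 10^{8}\ \text{bytes}
  \approx 0.75\ \text{gigabytes},
```

which rounds to the order of one gigabyte.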

The key point is that you can’t start over, you’ve got to make do with what you have as best you can. If we could design a human being from scratch we could do a much better job. But we can’t start over. Evolution only makes small changes, incremental patches, to adapt the existing code to new environments.

So how do we model this? Well, the key ideas are:

Evolution of mutating software,

and:

Random walks in software space.

That’s the general idea. And here are the specifics of our current model, which is quite tentative.

We take an organism, a single organism, and perform random mutations on it until we get a fitter organism. That replaces the original organism, and then we continue as before. The result is a random walk in software space with increasing fitness, a hill-climbing algorithm in fact.⁸
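The single-organism hill climb described above can be sketched in a few lines. This is only a toy under loudly stated assumptions: the organisms are bit strings, the fitness is a computable stand-in (the string read as a binary integer) rather than the Busy Beaver fitness of the actual model, and the mutation rule and all function names are hypothetical. The last function computes the mutation distance in bits as minus log base 2 of a mutation probability.

```python
import math
import random


def fitness(organism: str) -> int:
    """Toy stand-in for fitness: read the bit string as a binary integer."""
    return int(organism, 2)


def mutate(organism: str, rng: random.Random) -> str:
    """Hypothetical mutation model: flip one random bit or append one random bit."""
    if rng.random() < 0.5:
        i = rng.randrange(len(organism))
        flipped = "1" if organism[i] == "0" else "0"
        return organism[:i] + flipped + organism[i + 1:]
    return organism + rng.choice("01")


def evolve(organism: str, steps: int, seed: int = 0):
    """Try random mutations; keep a mutant only if it is strictly fitter."""
    rng = random.Random(seed)
    history = [fitness(organism)]
    for _ in range(steps):
        candidate = mutate(organism, rng)
        if fitness(candidate) > fitness(organism):
            organism = candidate
            history.append(fitness(organism))
    return organism, history


def mutation_distance_bits(probability: float) -> float:
    """Mutation distance in bits: -log2 of the probability of the mutation."""
    return -math.log2(probability)


if __name__ == "__main__":
    final, history = evolve("1", steps=200)
    print(len(final), history[:5])
    print(mutation_distance_bits(0.25))  # 2.0
```

Because a mutant replaces the organism only when it is fitter, the recorded fitness values are strictly increasing, which is the hill-climbing behavior the text describes. A faithful simulation would also need every organism to be reachable from every other one with non-zero probability, which this toy mutation rule does not guarantee in a single step.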

⁷ See Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body.

⁸ In order to avoid getting stuck on a local maximum, in order to keep evolution from stopping, we stipulate that there is a non-zero probability to go from any organism to any other organism, and − log2 of the probability of mutating from A to B defines an important concept, the mutation distance, which is measured in bits.


Finally, a key element in our proposed model is the definition of fitness. For evolution to work, it is important to keep our organisms from stagnating. It is important to give them something challenging to do.

The simplest possible challenge to force our organisms to evolve is what is called the Busy Beaver problem, which is the problem of providing concise names for extremely large integers. Each of our organisms produces a single positive integer. The larger the integer, the fitter the organism.⁹

The Busy Beaver function of N, BB(N), that is used in AIT is defined to be the largest positive integer that is produced by a program that is less than or equal to N bits in size. BB(N) grows faster than any computable function of N and is closely related to Turing’s famous halting problem, because if BB(N) were computable, the halting problem would be solvable.¹⁰

Doing well on the Busy Beaver problem can utilize an unlimited amount of mathematical creativity. For example, we can start with addition, then invent multiplication, then exponentiation, then hyper-exponentials, and use this to concisely name large integers:

$$N + N \;\to\; N \times N \;\to\; N^{N} \;\to\; N^{N^{N}} \;\to\; \cdots$$
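The ladder from addition to hyper-exponentials can be tried out numerically. The function names below are made up for this sketch; each one is a slightly more powerful way of naming a large integer starting from the same small N.

```python
def double(n: int) -> int:
    """N + N."""
    return n + n


def square(n: int) -> int:
    """N x N."""
    return n * n


def self_power(n: int) -> int:
    """N to the power N."""
    return n ** n


def tower(n: int, height: int) -> int:
    """Power tower of `height` copies of n, evaluated from the top down."""
    result = n
    for _ in range(height - 1):
        result = n ** result
    return result


if __name__ == "__main__":
    n = 3
    print([double(n), square(n), self_power(n), tower(n, 3)])
    # [6, 9, 27, 7625597484987]
```

Each definition is only a few symbols longer than the previous one, yet the integer it names grows enormously, which is the sense in which inventing a new operation is a concise way to name a larger number.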

There are many possible choices for such an evolving software model: You can vary the computer programming language and therefore the software space, you can change the mutation model, and eventually you could also change the fitness measure. For a particular choice of language and probability distribution of mutations, and keeping the current fitness function, it is possible to show that in time of the order of 2^N the fitness will grow as BB(N), which grows faster than any computable function of N and shows that genuine creativity is taking place, for mechanically changing the organism can only yield fitness that grows as a computable function.¹¹

⁹ Alternative formulations: The organism calculates a total function f(n) of a single non-negative integer n and f(n) is fitter than g(n) if f(n)/g(n) → ∞ as n → ∞. Or the organism calculates a (constructive) Cantor ordinal number and the larger the ordinal, the fitter the organism.

¹⁰ Consider BB′(N) defined to be the maximum run-time of any program that halts that is less than or equal to N bits in size.

¹¹ Note that to actually simulate our model an oracle for the halting problem would have to be employed to avoid organisms that have no fitness because they never calculate a positive integer. This also explains how the fitness can grow faster than any computable function. In our evolution model, implicit use is being made of an oracle for the halting problem, which answers questions whose answers cannot be computed by any algorithmic process.


So with random mutations and just a single organism we actually do get evolution, unbounded evolution, which was precisely the goal of metabiology!

This theorem may seem encouraging, but it actually has a serious problem. The times involved are so large that our search process is essentially ergodic, which means that we are doing an exhaustive search. Real evolution is not at all ergodic, since the space of all possible designs is much too immense for exhaustive search.

It turns out that with this same model there is actually a much quicker ideal evolutionary pathway that achieves fitness BB(N) in time of the order of N. This path is however unstable under random mutations, plus it is much too good: Each organism adds only a single bit to the preceding organism, and immediately achieves near optimal fitness for an organism of its size, which doesn’t seem to at all reflect the haphazard, frozen-accident nature of what actually happens in biological evolution.¹²

So that is the current state of metabiology: a field with some promise, but not much actual content at the present time. The particular details of our current model are not too important. Some kind of mutating software model should work, should exhibit some kind of basic biological features. The challenge is to identify such a model, to characterize its behavior statistically,¹³ and to prove that it does what is required.

¹² The Nth organism in this ideal evolutionary pathway is essentially just the first N bits of the numerical value of the halting probability Ω. Can you figure out how to compute BB(N) from this?

¹³ For instance, will some kind of hierarchical structure emerge? Large human software projects are always written that way.


Bibliography

[1] G. J. Chaitin, Thinking about Gödel and Turing: Essays on Complexity, 1970–2007, World Scientific, 2007.

[2] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas, in press. (Draft at http://www.cs.umaine.edu/~chaitin/midas.html.)

[3] S. Gould, Wonderful Life, Norton (1989).

[4] N. Koblitz and A. Menezes, “The brave new world of bodacious assumptions in cryptography,” AMS Notices 57, 357–365 (2010).

[5] G. W. Leibniz, Discours de métaphysique, suivi de Monadologie, Gallimard (1995).

[6] N. Shubin, Your Inner Fish, Pantheon (2008).

[7] T. Tymoczko, New Directions in the Philosophy of Mathematics, Princeton University Press (1998).

[8] H. Weyl, The Open World, Yale University Press (1932).

[9] W. H. Woodin, “The continuum hypothesis, Part I,” AMS Notices 48, 567–576 (2001).



Chapter 8

Algorithmic information as a fundamental concept in physics, mathematics and biology

In Memoriam Jacob T. “Jack” Schwartz (1930–2009)

The concept of information is not only fundamental in quantum mechanics, but also, when formulated as program-size complexity, helps in understanding what is a law of nature, the limitations of the axiomatic method, and Darwin’s theory of evolution. Lecture given Wednesday, 23 September 2009, at the Institute for Quantum Computing in Waterloo, Canada.

I’m delighted to be here at IQC, an institution devoted to two of my favorite topics, information and computation.

Information & Computation

The field I work in is algorithmic information theory, AIT, and in a funny way AIT is precisely the dual of what the IQC is all about, which is quantum information and quantum computing.

Let me compare and contrast the two fields: First of all, I care about bits of software, not qubits, I care about the size of programs, not about compute times; I look at the information content of individual objects, not at ensembles, and my computers are classical, not quantum. You care about what can be known in physical systems and I care about what can be known in pure mathematics. You care about practical applications, and I care about philosophy.

And strangely enough, we both sort of end up in the same place, because God plays dice both in quantum mechanics and in pure math: You need quantum randomness for cryptography, and I find irreducible complexity in the bits of the halting probability Ω.

So my subject will be algorithmic information, not qubits, and I’d like to show you three different applications of the notion of algorithmic information: in physics, in math and in biology. I’ll show you three software models, three toy models of what goes on in physics, in math and in biology, in which you get insight by considering the amount of software, the size of programs, the algorithmic information content.

But first of all, I should say that these are definitely toy models, highly simplified models, of what goes on in physics, in math and in biology. In fact, my motto for today is taken from Picasso, who said, “Art is a lie that helps us see the truth!” Well, that also applies to theories:

Theories are lies that help us see the truth! (Picasso)

You see, in order to create a mathematical theory in which you can prove neat theorems, you have to concentrate on the essential features of a situation. The models that I will show you are highly simplified toy models. You have to eliminate all the distractions, brush away all the inessential features, so that you can see the ideal case, the ideal situation. I am only interested in the Platonic essence of a situation, so that I can weave it into a beautiful mathematical theory, so that I can lay bare its inner soul. I have no interest in complicated, realistic models of anything, because they do not lead to beautiful theories.

You will have to judge for yourself if my models are oversimplified, and whether or not they help you to understand what is going on in more realistic situations.

So let’s start with physics, with what you might call AIT’s version of the Leibniz/Weyl model of the scientific method. What is a law of nature, what is a scientific law? The AIT model of this is given in this diagram:

Theory / Program / 011 → Computer → Experimental Data / World / 1101101


On the left-hand side we have a scientific theory, which is a computer program, a finite binary sequence, a bit string, for calculating exactly your experimental data, perhaps the whole time-evolution of the universe, which in this discrete model is also a bit string.

In other words, in this model a program is a theory for its output. I am not interested in prediction and I am not interested in statistical theories, only in deterministic theories that explain the data perfectly. And I don’t care how much work it takes to execute the program, to run the theory, to actually calculate the world from it, as long as the amount of time required to do this is finite. And I assume the world is discrete, not continuous. Remember Picasso. Remember I’m a pure mathematician!

The best theory is the smallest, the most concise program that calculates precisely your data. And a theory is useless unless it is a compression, unless it has a much smaller number of bits than the data it accounts for, the data it explains. Why? Because there is always a theory with the same number of bits, even if the data is completely random.

In fact, this gives you a way to distinguish between situations where there is a law, and lawless situations, ones where there is no theory, no way to understand what is happening.

Amazingly enough, aspects of this approach go back to Leibniz in 1686 and to Weyl in 1932. Let me tell you about this. In fact, it’s all here:

If arbitrarily complex laws are permitted, then the concept of law becomes vacuous, because there is always a law!

Leibniz, 1686: Discours de métaphysique V, VI

Hermann Weyl, 1932: The Open World
1949: Philosophy of Mathematics & Natural Science

So you see, the concepts of law of nature and of complexity are inseparable. Remember, 1686 was the year before Newton’s Principia. What Leibniz discusses in Sections V and VI of his Discours is why is science possible, how can you distinguish a world in which science applies from one where it doesn’t, how can you tell if the world is lawful or not. According to AIT, that’s only if there’s a theory with a much smaller number of bits than the data.

Okay, that’s enough physics, let’s go on to mathematics. What if you want to prove that you have the best theory, the most concise program for something? What if you try to use mathematical reasoning for that?


What metamathematics studies are so-called formal axiomatic theories, in which there has to be a proof-checking algorithm, and therefore there is an algorithm for checking all possible proofs and enumerating all the theorems in your formal axiomatic theory. And that’s the model of an axiomatic theory studied in AIT. AIT concentrates on the size in bits of the algorithm for running through all the proofs and producing all theorems, a very slow, never-ending computation, but one that can be done mindlessly. So how many bits of information does it take to do this? That is all AIT cares about. To AIT a formal axiomatic theory is just a black box; the inner structure, the details of the theory, are unimportant.

Formal Axiomatic Theory / Hilbert

Axioms, Rules of Inference → Computer → Theorem₁, Theorem₂, Theorem₃, . . .

How many bits of software are you putting into the computer to get the theorems out?

So that’s how we measure the information content, the complexity of a mathematical theory. The complexity of a formal axiomatic theory is the size in bits of the program that generates all the theorems. An N-bit theory is one in which the program to generate all the theorems is N bits in size.

And my key result is that it takes an N-bit axiomatic theory to enable you to prove that an N-bit program is elegant, that is, that it is the most concise explanation for its output. More precisely, a program is elegant if no smaller program written in the same language produces the same output.

It takes an N-bit theory to prove that an N-bit program is elegant.

How can you prove this metatheorem?Well, that’s very easy. Consider a formal axiomatic mathematical theory

T and this paradoxical program P :

P computes the same output as the first provably elegant program Qwhose size in bits is larger than P .

In other words, P runs through all proofs in T , checking them all, until itgets to the first proof that a program Q that is larger than P is elegant,then P runs Q and produces Q’s output. But if P finds Q and does that,it produces the same output as a provably elegant program Q that is larger


than P , which is impossible, because an elegant program is the most conciseprogram that yields the output it does.

Hence P never finds Q, which means that in T you can’t prove that Q iselegant if Q is larger in size than P .

So now the key question is this: How many bits are there in P? Well, just a fixed number of bits more than there are in T. In other words, there is a constant c such that for any formal axiomatic theory T, if you can prove in T that a program Q is elegant only if it is actually elegant, then you can prove in T that Q is elegant only if Q’s size in bits, |Q|, is less than or equal to c plus the complexity of T, which is by definition the number of bits of software that it takes to enumerate all the theorems of your theory T. Q.E.D.

“Q is elegant” ∈ T only if Q is elegant
⇒ “Q is elegant” ∈ T only if |Q| ≤ |T| + c.

That’s the first major result in AIT: that it takes an N-bit theory to prove that an N-bit program is elegant. The second major result is that the bits of the base-two numerical value of the halting probability Ω are irreducible, because it takes an N-bit theory to enable you to determine N bits of Ω.

This I won’t prove, but let me remind you how Ω is defined. And of course Ω = Ω_L also depends on your choice of programming language L, which I discussed in my talk on Monday at the Perimeter Institute:

Halting Probability

$$0 < \Omega_L = \sum_{p \text{ halts}} 2^{-|p|} < 1, \qquad \Omega_L = .1101100\ldots$$

N-bit theory ⇒ at most N + c bits of Ω.

By the way, the fact that in general a formal axiomatic theory can’t enable you to prove that individual programs Q are elegant, except in finitely many cases, has as an immediate corollary that Turing’s halting problem is unsolvable. Because if we had a general method, an algorithm, for deciding if a program will ever halt, then it would be trivial to decide if Q is elegant: You’d just run all the programs that halt that are smaller than Q to see if any of them produce the same output.

Corollary: There is no algorithm for solving the halting problem.
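
The reduction just described can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "language" is a small hypothetical lookup table (`TOY_SEMANTICS`), and the halting oracle `halts` is a table lookup rather than anything uncomputable. None of these names come from the lecture.

```python
from itertools import product

# Toy "language": a finite table from programs (bit strings) to outputs.
# None stands for a program that never halts. Entirely hypothetical.
TOY_SEMANTICS = {
    "0": None, "1": 7,
    "00": 7, "01": None, "10": 42, "11": None,
    "000": 42, "111": 99,
}

def halts(program):
    """Toy halting oracle: a table lookup instead of an uncomputable test."""
    return TOY_SEMANTICS.get(program) is not None

def run(program):
    """Output of a program that is already known to halt."""
    return TOY_SEMANTICS[program]

def is_elegant(q):
    """True if no strictly smaller program produces the same output as q."""
    target = run(q)
    for n in range(1, len(q)):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            if halts(p) and run(p) == target:   # only halting programs are run
                return False
    return True
```

In this toy table `is_elegant("10")` holds, while `is_elegant("000")` does not, because the shorter program `"10"` has the same output. With a real universal machine the only step that cannot be carried out is `halts`.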

And we have also proved in two different ways that the world of pure math is infinitely complex, that no theory of finite complexity can enable you to determine all the elegant programs or all the bits of the halting probability Ω.

Corollary: The world of pure math has infinite complexity, and is therefore more like biology, the domain of the complex, than like physics, where there is still hope of a simple, elegant theory of everything.

Those are my software models of physics and math. Now for a software model of evolution! These are some new ideas of mine. I’ve just started working on this. I call it metabiology.

Our starting point is the fact that, as Jack Schwartz used to tell me, DNA is just digital software. So we model organisms only as DNA, that is, we consider software organisms. Remember Picasso! We’ll mutate these software organisms, have a fitness function, and see what happens!

The basic ideas are summarized here:

Metabiology
Random Walks in Software Space
Evolution of Mutating Software

Organism = Program? Single Organism

Fitness Function = Busy Beaver Problem

Mutation Distance[A, B] = −log_2 probability of mutating from A to B

It will take a while to explain all this, and to tell you how far I’ve been able to get with this model. Basically, I have only two and a half theorems. . .

Let me start by reminding you that Darwin was inspired by the analogy between artificial selection by animal and plant breeders and natural selection by Nature. Metabiology exploits the analogy between natural software (DNA) and artificial software (computer programs):

METABIOLOGY: a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough to make it possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.

Next, I’d like to tell you how I came up with this idea. There are two key components. One I got by reading David Berlinski’s polemical book The Devil’s Delusion which discusses some of the arguments against Darwinian evolution, and the other is the Busy Beaver problem, which gives our organisms something challenging to do. And I should tell you about Neil Shubin’s book Your Inner Fish. Let me explain each of these in turn.

Berlinski has an incisive discussion of perplexities with Darwinian evolution. One is the absence of intermediate forms, the other is major transitions in evolution such as that from single-celled to multicellular organisms. But neither of these is a problem if we consider software organisms.

Darwin himself worried about the eye; he thought that partial eyes were useless. In fact, as a biologist explained to me, eye-like organs have evolved independently many different times.

Anyway, we know very well that small changes in the software can produce drastic changes in the output. A one-bit change can destroy a program! So the absence of intermediate forms is not a problem.

How about the transition from unicellular to multicellular? No problem, that’s just the idea of a subroutine. You take the main program and make it into a subroutine that you call many times. Or you fork it and run it simultaneously in many parallel threads.

And Berlinski discusses the neutral theory of evolution and talks about evolution as a random walk.

So that was my starting point.

But to make my software organisms evolve, I need to give them something challenging to do. The Busy Beaver problem to the rescue! That’s the problem of finding small, concise names for extremely large positive integers. A program names a positive integer by calculating it and then halting.

BB(N) = the largest positive integer you can name with a program of size ≤ N bits.

And the BB problem can utilize an unlimited amount of mathematical creativity, because it’s equivalent to Turing’s halting problem, since another way to define BB of N is as follows:

BB(N) = the longest runtime of any program that halts that is ≤ N bits in size.

These two definitions of BB(N) are essentially equivalent.1

1 On naming large numbers, see Archimedes’ The Sand Reckoner, described in Gamow, One, Two, Three. . . Infinity. I thank Ilias Kotsireas for reminding me of this early work on the BB problem.

The key point is that BB(N) grows faster than any computable function of N, because otherwise there would be an algorithm for solving the halting problem, which earlier in this lecture we showed is impossible using an information-theoretic argument.

BB(N) grows faster than any computable function of N .

At the level of abstraction I am working with in this model, there is no essential difference between mathematical creativity and biological creativity.

Now let’s turn to Neil Shubin’s book Your Inner Fish, that he summarized in his article in the January 2009 special issue of Scientific American devoted to Darwin’s bicentennial.

I don’t know much about biology, but I do have a lot of experience with software. Besides my theoretical work, during my career at IBM I worked on large software projects, compilers, operating systems, that kind of stuff. You can’t start a large software project over from scratch, you just patch it and add new function as best you can.

And as Shubin spells it out, that’s exactly what Nature does too. Think of yourself as an extremely large piece of software that has been patched and modified for more than a billion years to adapt us to new ecological niches. We were not designed from scratch to be human beings, to be bipedal primates. Some of the design is like that of fish, some is like that of an amphibian.

If you could start over, you could design human beings much better. But you can’t start over. You have to make do with what you have as best you can. Evolution makes the minimum changes needed to adapt to a changing environment.

As Francois Jacob used to emphasize, Nature is a cobbler, a handyman, a bricoleur. We are all random walks in program space!

So those are the main ideas, and now let me present my model. I have a single software organism and I try making random mutations. These could be high-level language mutations (copy a subroutine with a change), or low-level mutations.

Initially, I have chosen point mutations: insert, delete or change one or more bits. As the number of bits that are mutated increases, the probability of the mutation drops off exponentially. Also I favor the beginning of the program. The closer to the beginning a bit is, the more likely it is to change. Again, that drops off exponentially.
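
A point-mutation scheme of this kind might be sketched as follows. The geometric distributions, the parameter values `p_len` and `p_pos`, and the function names are assumptions made purely for illustration; the text only says that both probabilities drop off exponentially.

```python
import random

def geometric(rng, p):
    """Return k >= 0 with probability (1 - p)**k * p (exponential drop-off)."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k

def point_mutation(bits, rng, p_len=0.5, p_pos=0.5):
    """Insert, delete or change a run of adjacent bits in the string `bits`.

    Run length and start position are both geometric, so short runs and
    positions near the beginning of the program are favored.
    """
    length = 1 + geometric(rng, p_len)            # number of bits mutated
    pos = min(geometric(rng, p_pos), len(bits))   # distance from the start
    kind = rng.choice(["insert", "delete", "change"])
    if kind == "insert":
        fresh = "".join(rng.choice("01") for _ in range(length))
        return bits[:pos] + fresh + bits[pos:]
    if kind == "delete":
        return bits[:pos] + bits[pos + length:]
    # "change": flip the affected bits in place, keeping the length fixed
    old = bits[pos:pos + length]
    flipped = "".join("1" if b == "0" else "0" for b in old)
    return bits[:pos] + flipped + bits[pos + length:]

rng = random.Random(0)
mutants = [point_mutation("1011001110", rng) for _ in range(5)]
```

Because the run length is unbounded, every bit string can in principle be reached from every other in a single step, though with a probability that shrinks exponentially with the size of the change.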


So I try a random mutation and I see what is the fitness of the resulting organism. I am only interested in programs that calculate a single positive integer then halt, and the bigger the integer the fitter the program. So if the mutated organism is more fit, it becomes my current organism. Otherwise I keep my current organism and continue trying mutations.

By the way, to actually do this you would need to have an oracle for the halting problem, since you have to skip mutations that give you a program that never halts.

There is a non-zero probability that a single mutation will take us from an organism to any other, which ensures we will not get stuck at a local maximum.

For the details, you can see my article in the February 2009 EATCS Bulletin, the Bulletin of the European Association for Theoretical Computer Science. I don’t think that the details matter too much. There are a lot of parameters that you can vary in this model and probably still get things to work. For example, you can change the programming language or you can change the mutation model.

As a matter of fact, the programming language I’ve used is one of the universal Turing machines in AIT1960 that I discussed in my Monday lecture at Perimeter. I picked it not because it is the right choice, but because it is a programming language I know well.

Okay, we have a random walk in software space with increasing fitness. How well will this do? I’m not sure but I’ll tell you what I can prove.

First of all, the mutation model is set up in such a way that in time of the order of 2^N a single mutation will try adding a self-contained N-bit prefix that calculates BB(N) and ignores the rest of the program; the time is the number of mutations that have been considered. The size of the organism in bits will grow at most as 2^N, the fitness will grow as BB(N). So the fitness will grow faster than any computable function, which shows that biological creativity is taking place; for if an organism is improved mechanically via an algorithm and without any creativity, then its fitness will only increase as a computable function.

Theorem 1
With high probability
fitness[time order of 2^N] ≥ BB(N)
which grows faster than any computable function of N.


We can prove that evolution will occur but our proof is not very interesting since the time involved is large enough for evolution to randomly try all the possibilities. And, most important of all, in this proof evolution is not cumulative. In effect, we are starting from scratch each time.

Now, I think that the behavior of this model will actually be cumulative but I can’t prove it.

However, I can show that there is what might be called an ideal evolutionary pathway, which is a sequence of organisms having fitness that grows as BB(N) and size in bits that grows as N, and the mutation distance between successive organisms is bounded.

That is encouraging. It shows that there are intermediate forms that are fitter and fitter. The only problem is that this pathway is unstable and not a likely random walk; it’s an ideal evolutionary pathway.

This is my second theorem, and the organisms are reversed initial segments of the bits of the halting probability Ω (plus a fixed prefix). If you are given an initial portion of Ω, K bits in fact, then you can find which ≤ K bit programs halt and see which one produces the largest positive integer, which is by definition BB(K). Furthermore the mutation distance between reversed initial segments of Ω is not large because you are only adding one bit at a time.

Theorem 2
There is a sequence of organisms O_K with the property that:
O_K = the first K bits of Ω,
fitness[O_K] = BB(K),
mutation-distance[O_K, O_{K+1}] < c.

(If we could shield the prefix from mutations, and we picked as successor to each organism the fittest organism within a certain fixed mutation distance neighborhood, then this ideal evolutionary pathway would be followed.)

These are my two theorems. The half-theorem is the fact that a sequence of software organisms with increasing fitness and bounded mutation distance does not depend on the choice of universal Turing machine, because adding a fixed prefix to each organism keeps the mutation distance bounded.

Theorem 2.5
That a sequence of organisms O_K has the property that
mutation-distance[O_K, O_{K+1}] < c
does not depend on the choice of universal Turing machine.


Okay, so at this point there are only these 2 1/2 theorems. Not very impressive. As I said, at this time metabiology is a field with a lovely name but not much content.

However, I am hopeful. I feel that some kind of evolving software model should work. There are a lot of parameters to vary, a lot of knobs to tweak. The question is, how biological will the behavior of these models be? In particular I would like to know if hierarchical structure will emerge.

The human genome is a very large piece of software: about a gigabyte of DNA.

Large computer programs must be structured, they cannot be spaghetti code; otherwise they cannot be debugged and maintained. Software organisms, I suspect, can also benefit from such discipline. Then a useful mutation is likely to be small and localized, rather than involve many coordinated changes scattered throughout the organism, which is much less likely.

Software engineering practice has a lot of experience with large software projects, which may also be relevant to randomly evolving software organisms, and perhaps indirectly to biology.

Clearly, there is a lot of work to be done.

As Dobzhansky said, nothing in biology makes sense except in the light of evolution, and I think that a randomly evolving software approach can give us some insight.

Thank you very much!


Chapter 9

To a mathematical theory of evolution and biological creativity

To be published in H. Zenil, Computation in Nature & The Nature of Computation, World Scientific, 2012.

Abstract: We present an information-theoretic analysis of Darwin’s theory of evolution, modeled as a hill-climbing algorithm on a fitness landscape. Our space of possible organisms consists of computer programs, which are subjected to random mutations. We study the random walk of increasing fitness made by a single mutating organism. In two different models we are able to show that evolution will occur and to characterize the rate of evolutionary progress, i.e., the rate of biological creativity.

Key words and phrases: metabiology, evolution of mutating software, random walks in software space, algorithmic information theory

9.1 Introduction

For many years we have been disturbed by the fact that there is no fundamental mathematical theory inspired by Darwin’s theory of evolution [1, 2, 3, 4, 5, 6, 7, 8, 9]. This is the fourth paper in a series [10, 11, 12] attempting to create such a theory.

In a previous paper [10] we did not yet have a workable mathematical framework: We were able to prove two not very impressive theorems, and then the way forward was blocked. Now we have what appears to be a good mathematical framework, and have been able to prove a number of theorems. Things are starting to work, things are starting to get interesting, and there are many technical questions, many open problems, to work on.

So this is a working paper, a progress report, intended to promote interest in the field and get others to participate in the research. There is much to be done.

In order to present the ideas as clearly as possible and not get bogged down in technical details, the material is presented more like a physics paper than a math paper. Estimates are at times rather sloppy. We are trying to get an idea of what is going on. The arguments concerning the basic math framework are however very precise; that part is done more or less like a math paper.

9.2 History of Metabiology

In the first paper in this series [10] we proposed modeling biological evolution by studying the evolution of randomly mutating software; we call this metabiology. In particular, we proposed considering a single mutating software organism following a random walk in software space of increasing fitness. Besides that, the main contribution of [10] was to use the Busy Beaver problem to challenge organisms into evolving. The larger the positive integer that a program names, the fitter the program.

And we measured the rate of evolutionary progress using the Busy Beaver function BB(N) = the largest integer that can be named by an N-bit program. Our two results employing the framework in [10] are that

• with random mutations, random point mutations, we will get to fitness BB(N) in time exponential in N (evolution by exhaustive search) [10, 11],

• whereas by choosing the mutations by hand and applying them in the right order, we will get to fitness BB(N) in time linear in N (evolution by intelligent design) [11, 12].

We were unable to show that cumulative evolution will occur at random; exhaustive search starts from scratch each time.1

This paper advances beyond the previous work on metabiology [10, 11, 12, 13] by proposing a better concept of mutation. Instead of changing, deleting or inserting one or more adjacent bits in a binary program, we now have high-level mutations: we can use an arbitrary algorithm M to map the organism A into the mutated organism A′ = M(A). Furthermore, the probability of the mutation M is now furnished by algorithmic information theory: it depends on the size in bits of the self-delimiting program for M. It is very important that we now have a natural, universal probability distribution on the space of all possible mutations, and that this is such a rich space.

Using this new notion of mutation, these much more powerful mutations, enables us to accomplish the following:

• We are now able to show that random evolution will become cumulative and will reach fitness BB(N) in time that grows roughly as N^2, so that random evolution behaves much more like intelligent design than it does like exhaustive search.2

• We also have a version of our model in which we can show that hierarchical structure will evolve, a conspicuous feature of biological organisms that previously [10] was beyond our reach.

This is encouraging progress, and suggests that we may now have the correct version of these biology-inspired concepts. However there are many serious lacunae in the theory as it currently stands. It does not yet deserve to be called a mathematical theory of evolution and biological creativity; at best, it is a sketch of a possible direction in which such a theory might go.

On the other hand, the new results are encouraging, and we feel it would be inappropriate to sit on these results until all the lacunae are filled. After all, that would take an entire book, since metabiology is, or will hopefully become, a rich and entirely new field.

That said, the reader will understand that this is a working paper, a progress report, to show the direction in which the theory is developing, and to indicate problems that need to be solved in order to advance, in order to take the next step. We hope that this paper will encourage others to participate in developing metabiology and exploring its potential.

1 The Busy Beaver function BB(N) grows faster than any computable function. That evolution is able to “compute” the uncomputable function BB(N) is evidence of creativity that cannot be achieved mechanically. This is possible only because our model of evolution/creativity utilizes an uncomputable Turing oracle. Our model utilizes the oracle in a highly constrained manner; otherwise it would be easy to calculate BB(N).

2 Most unfortunately, it is not yet demonstrated that random evolution cannot be as fast as intelligent design.

9.3 Modeling Evolution

9.3.1 Software Organisms

In this paper we follow a metabiological [10, 11, 12, 13] approach: Instead of studying the evolution of actual biological organisms we study the evolution of software subjected to random mutations. In order to do this we use tools from algorithmic information theory (AIT) [13, 14, 15, 16, 17, 18, 19]; to fully understand this paper expert understanding of AIT is unfortunately necessary (see the outline in the Appendix).

As our programming formalism we employ one of the optimal self-delimiting binary universal Turing machines U of AIT [14], and also, but only in Section 9.7, a primitive FORTRAN-like language that is not universal.

So our organisms consist on the one hand of arbitrary self-delimiting binary programs p for U, or on the other hand of certain FORTRAN-like computer programs. These are the respective software spaces in which we shall be working, and in which we will study hill-climbing random walks.

9.3.2 The Hill-Climbing Algorithm

In our models of evolution, we define a hill-climbing random walk as follows: We start with a single software organism A and subject it to random mutations until a fitter organism A′ is obtained, then subject that organism to random mutations until an even fitter organism A′′ is obtained, etc. In one of our models, organisms calculate natural numbers, and the bigger the number, the fitter the organism. In the other, organisms calculate functions that map a natural number into another natural number, and the faster the function grows, the fitter the organism.

In this connection, here is a useful piece of terminology: A mutation M succeeds if A′ = M(A) is fitter than A; otherwise M is said to fail.
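
The shape of such a hill-climbing walk can be sketched in Python. Everything concrete here is a toy assumption: `toy_run` stands in for running an organism on U with the help of an oracle (strings starting with `0` play the role of non-halting programs), and `flip_mutation` is a hypothetical stand-in for a random mutation, not the mutation model of this paper.

```python
import random

def toy_run(bits):
    """Toy stand-in for running an organism with a halting oracle.

    Returns the natural number named by the organism, or None for
    "never halts". Assumption: empty strings and strings starting
    with '0' never halt; the rest name their own binary value.
    """
    if not bits or bits[0] == "0":
        return None
    return int(bits, 2)

def flip_mutation(bits, rng):
    """Toy mutation: flip one random bit, or append one random bit."""
    if rng.random() < 0.5:
        i = rng.randrange(len(bits))
        return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]
    return bits + rng.choice("01")

def hill_climb(organism, steps, rng):
    """Try `steps` mutations; a mutation succeeds only if the mutant is fitter."""
    fitness = toy_run(organism)
    history = [fitness]
    for _ in range(steps):
        mutant = flip_mutation(organism, rng)
        value = toy_run(mutant)               # None: mutant never halts
        if value is not None and value > fitness:
            organism, fitness = mutant, value  # the mutation succeeds
            history.append(fitness)
        # otherwise the mutation fails and the current organism is kept
    return organism, history

final, history = hill_climb("1", steps=200, rng=random.Random(0))
```

Here `steps` counts every mutation tried, while `len(history) - 1` counts only the successful ones, so `history` is a strictly increasing sequence of fitness values.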


9.3.3 Fitness

In order to get our software organisms to evolve it is important to present them with a challenge, to give them something difficult to do. Three well-known problems requiring unlimited amounts of mathematical creativity are:

• Model A: Naming large natural numbers (non-negative integers) [20, 21, 22, 23],

• Model B: Defining extremely fast-growing functions [24, 25, 26],

• Model C: Naming large constructive Cantor ordinal numbers [26, 27].

So a software organism will be judged to be more fit if it calculates a larger integer (our Model A, Sections 9.4, 9.5, 9.6), or if it calculates a faster-growing function (our Model B, Section 9.7). Naming large Cantor ordinals (Model C) is left for future work, but is briefly discussed in Section 9.8.

9.3.4 What is a Mutation?

Another central issue is the concept of a mutation. Biological systems are subjected to point mutations, localized changes in DNA, as well as to high-level mutations such as copying an entire gene and then introducing changes in it. Initially [10] we considered mutating programs by changing, deleting or adding one or more adjacent bits in a binary program, and postponed working with high-level source language mutations.

Here we employ an extremely general notion of mutation: A mutation is an arbitrary algorithm that transforms, that maps the original organism into the mutated organism. It takes as input the organism, and produces as output the mutated organism. And if the mutation is an n-bit program, then it has probability 2^{-n}. In order to have the total probability of mutations be ≤ 1 we use the self-delimiting programs of AIT [14].3
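
Why self-delimiting programs keep the total probability at most 1 can be checked on a small example. The two sets of bit strings below are hypothetical and chosen only for illustration; the underlying fact is the Kraft inequality for prefix-free sets.

```python
from fractions import Fraction

def is_prefix_free(programs):
    """True if no program in the set is a prefix of another one."""
    progs = sorted(programs)
    # In sorted order, a prefix is always immediately followed by one of
    # its extensions, so checking adjacent pairs is enough.
    return all(not b.startswith(a) for a, b in zip(progs, progs[1:]))

def total_probability(programs):
    """Sum of 2**-|p| over the set, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(p)) for p in programs)

self_delimiting = {"00", "01", "100", "101", "110", "1110"}
not_self_delimiting = {"0", "1", "00"}
```

For the first set the total is 15/16, which is below 1, while the second set, in which `0` is a prefix of `00`, already sums to 5/4 and so cannot be read as a probability distribution.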

3 The total probability of mutations is actually < 1, so that each time we pick a mutation at random, there is a fixed probability that we will get the null mutation M(A) = A, which always fails.

9.3.5 Mutation Distance

A second crucial concept is mutation distance, how difficult it is to get from organism A to organism B. We measure this distance in bits and it is defined to be −log_2 of the probability that a random mutation will change A to B. Using AIT [14, 15, 16], we see that this is nearly H(B|A), the size in bits of the smallest self-delimiting program that takes A as input and produces B as output.4 More precisely,

$$H(B|A) = -\log_2 P(B|A) + O(1) = -\log_2 \left[ \sum_{U(p|A) = B} 2^{-|p|} \right] + O(1). \tag{9.1}$$

Here |p| denotes the size in bits of the program p, and U(p|A) denotes the output produced by running p given input A on the computer U until p halts.

The definition of H(B|A) that we employ here is somewhat different from the one that is used in AIT: a mutation is given A directly, it is not given a minimum-size program for A. Nevertheless, (9.1) holds [14].

Interpreting (9.1) in words, it is nearly the same to consider the simplest mutation from A to B, which is H(B|A) bits in size and has probability 2^{-H(B|A)}, as to sum the probability over all the mutations that carry A into B.

Note that this distance measure is not symmetric. For example, it is easy to change (X, Y) into Y, but not vice versa.
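
The relation between the summed probability and the simplest mutation can be illustrated numerically. The table `MUTATIONS_FROM_A` is a hypothetical prefix-free set of mutation programs together with the organism each one produces from a fixed A; it is not derived from any real universal machine.

```python
import math

# Hypothetical toy table: mutation program -> organism produced from a fixed A.
MUTATIONS_FROM_A = {"00": "B", "01": "C", "100": "B", "1010": "B", "11": "A"}

def probability(target):
    """P(target | A): total probability of the mutations that carry A into target."""
    return sum(2.0 ** -len(p) for p, out in MUTATIONS_FROM_A.items() if out == target)

def mutation_distance(target):
    """-log2 of the probability that a random mutation changes A into target."""
    return -math.log2(probability(target))

def simplest_mutation_size(target):
    """Size in bits of the smallest toy mutation carrying A into target."""
    return min(len(p) for p, out in MUTATIONS_FROM_A.items() if out == target)
```

In this table three mutations lead to B, with total probability 1/4 + 1/8 + 1/16 = 7/16, so the distance is about 1.19 bits while the simplest mutation has 2 bits. The two quantities differ by less than one bit here, in the spirit of the O(1) term in (9.1).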

9.3.6 Hidden Use of Oracles

There are two hidden assumptions here. First of all, we need to use an oracle to compare the fitness of an organism A with that of a mutated organism A′. This is because a mutated program may not halt and thus never produces a natural number. Once we know that the original organism A and the mutated organism A′ both halt, then we can run them to see what they calculate and which is fitter.

In the case of fast-growing computable functions, an oracle is definitely needed to see if one grows faster than another; this cannot be determined by running the primitive recursive functions [29] calculated by the FORTRAN-like programs that we will study later, in Section 9.7.

Just as oracles would be needed to actually find fitter organisms, they are also necessary because a random mutation may never halt and produce a mutated organism. So to actually apply our random mutations to organisms we would need to use an oracle in order to avoid non-terminating mutations.

4 Similarly, H(B) denotes the size in bits of the smallest self-delimiting program for B that is not given A. H(B) is called the complexity of B, and H(B|A) is the relative complexity of B given A.

9.4 Model A (Naming Integers) Exhaustive Search

9.4.1 The Busy Beaver Function

The first step in this metabiological approach is to measure the rate of evolution. To do that, we introduce this version of the Busy Beaver function:

BB(N) = the biggest natural number named by a ≤ N-bit program.

More formally,

$$BB(N) = \max_{H(k) \le N} k.$$

Here the program-size complexity or the algorithmic information content H(k) of k is the size in bits of the smallest self-delimiting program p without input for calculating k:

$$H(k) = \min_{U(p) = k} |p|.$$

Here again |p| denotes the size in bits of p, and U(p) denotes the output produced by running the program p on the computer U until p halts.
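
The two definitions can be made concrete on a toy machine. The table `TOY_U` below is a hypothetical, prefix-free stand-in for U (with `None` marking programs that never halt); the real H and BB are of course uncomputable.

```python
# Hypothetical toy machine: program (bit string) -> output, None = never halts.
TOY_U = {"00": 3, "01": None, "100": 3, "101": 20, "110": None, "1110": 500}

def H(k):
    """Size in bits of the smallest toy program that outputs k (None if there is none)."""
    sizes = [len(p) for p, out in TOY_U.items() if out == k]
    return min(sizes) if sizes else None

def BB(n):
    """Biggest number output by a toy program of at most n bits (None if there is none)."""
    outs = [out for p, out in TOY_U.items() if out is not None and len(p) <= n]
    return max(outs) if outs else None
```

In this table H(3) = 2 even though a 3-bit program also outputs 3, and BB takes the values 3, 20, 500 for N = 2, 3, 4, which agrees with taking the maximum k with H(k) ≤ N.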

9.4.2 Proof of Theorem 1 (Exhaustive Search)

Now, for the sake of definiteness, let’s start with the trivial program that directly outputs the positive integer 1, and apply mutations at random.5

Let’s define the mutation time to be n if we have tried n mutations, and the organism time to be n if there are n successive organisms of increasing fitness so far in our infinite random walk.

From AIT [14] we know that there is an N + O(1)-bit mutation that ignores its input and produces as output a ≤ N-bit program that calculates BB(N). This mutation M has probability 2^{-N+O(1)} and on the average, it will occur at random every 2^{N+O(1)} times a random mutation is tried. Therefore:

Theorem 1 The fitness of our organism will reach BB(N) by mutation time 2^N. In other words, we will achieve N bits of biological/mathematical creativity by time 2^N. Each successive bit of creativity takes twice as long as the previous bit did.6

5 The choice of initial organism is actually unimportant.

More precisely, the probability that this should fail to happen, the probability that M has not been tried by time 2^N, is

$$\left(1 - \frac{1}{2^N}\right)^{2^N} \to e^{-1} \approx \frac{1}{2.7} < \frac{1}{2}.$$

And the probability that it will fail to happen by mutation time K 2^N is < 1/2^K.
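
These two estimates are easy to check numerically. The sketch below assumes, for simplicity, that the mutation has probability exactly 2^{-N}, ignoring the O(1) term; the function name is hypothetical.

```python
import math

def failure_probability(n_bits, k=1):
    """Chance that a mutation of probability 2**-n_bits is never picked
    in k * 2**n_bits independent random tries."""
    p = 2.0 ** -n_bits
    return (1.0 - p) ** (k * 2 ** n_bits)

# For N = 20 the value is already within 1e-6 of 1/e.
gap = abs(failure_probability(20) - math.exp(-1))
```

Since each block of 2^N tries fails with probability below 1/2, K independent blocks all fail with probability below 1/2^K, which is the bound stated above.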

This is the worst that evolution can do. It is the fitness that organisms will achieve if we are employing exhaustive search on the space of all possible organisms. Actual biological evolution is not at all like that. The human genome has 3 × 10^9 bases, but in the mere 4 × 10^9 years of life on this planet only a tiny fraction of the total enormous number 4^{3 × 10^9} of sequences of 3 × 10^9 bases can have been tried. In other words, evolution is not ergodic.

9.5 Model A (Naming Integers) Intelligent Design

9.5.1 Another Busy Beaver Function

If we could choose our mutations intelligently, evolution would be much more rapid. Let’s use the halting probability Ω [19] to show just how rapid. First we define a slightly different Busy Beaver function BB′ based on Ω. Consider a fixed recursive/computable enumeration p_i : i = 0, 1, 2, . . . without repetitions of all the programs without input that halt when run on U. Thus

$$0 < \Omega = \Omega_U = \sum_i 2^{-|p_i|} < 1 \tag{9.2}$$

and we get the following sequence Ω_0 = 0 < Ω_1 < Ω_2 . . . of lower bounds on Ω:

$$\Omega_N = \sum_{i < N} 2^{-|p_i|}. \tag{9.3}$$

6 Instead of bits of creativity one could perhaps refer to bits of inspiration; said inspiration of course is ultimately coming through/from our oracle, which keeps us from getting stuck on non-terminating programs.


In (9.2) and (9.3) |p| denotes the size in bits of p, as before.

We define BB′(K) to be the least N for which the first K bits of the base-two numerical value of Ω_N are correct, i.e., the same as the first K bits of the numerical value of Ω. BB′(K) exists because we know from AIT [14] that Ω is irrational, so Ω = .010000 is impossible and there is no danger that Ω_N will be of the form .0011111 with 1’s forever.
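
The definitions of Ω_N and BB′(K) can be tried out on a finite toy enumeration. The list `SIZES` of program sizes is hypothetical, and the resulting toy Ω is a dyadic rational, unlike the real Ω, so this is only a sketch of the bookkeeping and says nothing about irrationality.

```python
from fractions import Fraction

# Hypothetical sizes |p_i| of the halting programs, in enumeration order.
SIZES = [1, 3, 4, 6]

def omega_lower_bound(n):
    """Omega_N: the sum of 2**-|p_i| for i < n, as in (9.3)."""
    return sum((Fraction(1, 2 ** s) for s in SIZES[:n]), Fraction(0))

def first_bits(x, k):
    """First k bits after the binary point of 0 <= x < 1, as a string."""
    return format(int(x * 2 ** k), "0{}b".format(k)) if k else ""

def bb_prime(k):
    """Least N for which the first k bits of Omega_N agree with the toy Omega."""
    omega = omega_lower_bound(len(SIZES))
    for n in range(len(SIZES) + 1):
        if first_bits(omega_lower_bound(n), k) == first_bits(omega, k):
            return n
```

Here the toy Ω is 45/64 = .101101 in base two, and BB′(K) for K = 1, ..., 6 comes out as 1, 1, 2, 3, 3, 4: each further correct bit requires waiting for at least as many halting programs as the previous one.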

Note that BB and BB′ are approximately equal. For we can calculate BB′(N) if we are given N and the first N bits of Ω. Therefore

BB′(N) ≤ BB(N + H(N) + c) = BB(N + O(log N)).

Furthermore, if we knew N and any M ≥ BB′(N), we could calculate the string ω of the first N bits of Ω, which according to AIT [14] has complexity H(ω) > N − c′, so

N − c′ < H(ω) ≤ H(N) + H(M) + c′′.

Therefore BB′(N) and all numbers M ≥ BB′(N) have complexity H(M) > N − H(N) − c′ − c′′, so BB′(N) must be greater than the biggest number M_0 with complexity H(M_0) ≤ N − H(N) − c′ − c′′. Therefore

BB′(N) > BB(N − H(N) − c′ − c′′) = BB(N + O(log N)).

9.5.2 Improving Lower Bounds on Ω

Our model consists of arbitrary mutation computer programs operating on arbitrary organism computer programs. To analyze the behavior of this system (Model A), however, we shall focus on a select subset: Our organisms are lower bounds on Ω, and our mutations increase these lower bounds.

We are going to use these same organisms and mutations to analyze both intelligent design (Section 9.5.3) and cumulative evolution at random (Section 9.6). Think of Section 9.5.3 versus Section 9.6 as counterpoint.

Organism P_ρ: Lower Bound ρ on Ω

Now we use a bit string ρ to represent a dyadic rational number in [0, 2) = {x : 0 ≤ x < 2}; ρ consists of the base-two units “digit” followed by the base-two expansion of the fractional part of this rational number.

There is a self-delimiting prefix π_Ω that given a bit string ρ that is a lower bound on Ω, calculates the first N such that Ω > Ω_N ≥ ρ, where Ω_N is defined as in (9.3).7 If we concatenate the prefix π_Ω with the string of bits ρ, and insert 0^{|ρ|}1 in front of ρ in order to make everything self-delimiting, we obtain a program P_ρ for this N.

We will now analyze the behavior of Model A by using these organisms of the form

$$P_\rho = \pi_\Omega \, 0^{|\rho|} 1 \rho. \tag{9.4}$$

To repeat, the output of P_ρ, and therefore its fitness φ_{P_ρ}, is determined as follows:

$$U(P_\rho) = \text{the first } N \text{ for which } \sum_{i < N} 2^{-|p_i|} = \Omega_N \ge \rho. \tag{9.5}$$

This fitness will be ≥ BB′(K) if ρ < Ω and the first K bits of ρ are the correct base-two numerical value of Ω. P_ρ will fail to halt if ρ > Ω.8

Mutation $M_k$: Lower Bound $\rho$ on $\Omega$ Increased by $2^{-k}$

Consider the mutations $M_k$ that do the following. First of all, $M_k$ computes the fitness $\phi$ of the current organism $A$ by running $A$ to determine the integer $\phi = \phi_A$ that $A$ names. All that $M_k$ takes from $A$ is its fitness $\phi_A$. Then $M_k$ computes the corresponding lower bound on $\Omega$:

\[ \rho = \sum_{i<\phi} 2^{-|p_i|} = \Omega_\phi. \]

Here $p_i$ is the standard enumeration of all the programs that halt when run on $U$ that we employed in Section 9.5.1. Then $M_k$ increments the lower bound $\rho$ on $\Omega$ by $2^{-k}$:

\[ \rho' = \rho + 2^{-k}. \]

In this way $M_k$ obtains the mutated program

\[ A' = P_{\rho'}. \]

$A'$ will fail to halt if $\rho' > \Omega$. If $A'$ does halt, then $A' = M_k(A) = P_{\rho'}$ will have fitness $N$ (see (9.5)) greater than $\phi_A = \phi$ because $\rho' > \rho = \Omega_\phi$, so more halting programs are included in the sum (9.3) for $\Omega_N$, which therefore has been extended farther:

\[ [\Omega_N \ge \rho' > \rho = \Omega_\phi] \implies [N > \phi]. \]

7 That $\rho \ne \Omega$ follows from the fact that $\Omega$ is irrational.

8 That $\rho \ne \Omega$ follows from the fact that $\Omega$ is irrational.

Therefore if $\Omega > \rho' = \rho + 2^{-k}$, then $M_k$ increases the fitness of $A$. If $\rho' > \Omega$, then $P_{\rho'} = M_k(A)$ never halts and is totally unfit.
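As a rough illustration of how $M_k$ acts, here is a toy sketch. It assumes a made-up finite list of program sizes in place of the real enumeration $p_i$, so every name and number in it is hypothetical; a return value of `None` stands for a mutated organism that never halts.

```python
from fractions import Fraction

# Invented sizes |p_i| standing in for the enumeration of halting programs.
TOY_SIZES = [2, 3, 3, 4, 5, 5, 6]


def omega_n(n: int) -> Fraction:
    """Partial sum Omega_N over the first n toy programs."""
    return sum((Fraction(1, 2 ** s) for s in TOY_SIZES[:n]), Fraction(0))


def first_n_reaching(rho: Fraction):
    """Fitness of P_rho: first N with Omega_N >= rho, or None (no halt)."""
    for n in range(len(TOY_SIZES) + 1):
        if omega_n(n) >= rho:
            return n
    return None


def mutate(k: int, phi: int):
    """M_k: use only the fitness phi of the current organism.

    Forms rho' = Omega_phi + 2^-k and returns the fitness of P_rho',
    or None when rho' overshoots the toy Omega.
    """
    rho_prime = omega_n(phi) + Fraction(1, 2 ** k)
    return first_n_reaching(rho_prime)
```

For example, starting from fitness 3 (lower bound 1/2), `mutate(2, 3)` overshoots and fails, while `mutate(3, 3)` succeeds and yields a strictly larger fitness.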

9.5.3 Proof of Theorem 2 (Intelligent Design)

Please note that in this toy world, the “intelligent designer” is the author of this paper, who chooses the mutations optimally in order to get his creatures to evolve.

Let’s now start with the computer program $P_\rho$ with $\rho = 0$. In other words, we start with a lower bound on $\Omega$ of zero.

Then for $k = 1, 2, 3, \ldots$ we try applying $M_k$ to $P_\rho$. The mutated organism $P_{\rho'} = M_k(P_\rho)$ will either fail to halt, or it will have higher fitness than our previous organism and will replace it. Note that in general $\rho' \ne \rho + 2^{-k}$, although it could conceivably have that value. $M_k$ takes from $P_\rho$ only its fitness, which is the first $N$ such that $\Omega_N \ge \rho$, so

\[ \rho' = \Omega_N + 2^{-k} \ge \rho + 2^{-k}. \]

So $\rho'$ is actually equal to a lower bound on $\Omega$, $\Omega_N$, plus $2^{-k}$. Thus $M_k$ will attempt to increase a lower bound on $\Omega$, $\Omega_N$, by $2^{-k}$. $M_k$ will succeed if $\Omega > \rho'$. $M_k$ will fail if $\rho' > \Omega$. This is the situation at the end of stage $k$. Then we increment $k$ and repeat. The lower bounds on $\Omega$ will get higher and higher.

More formally, let $O_0 = P_\rho$ with $\rho = 0$. And for $k \ge 1$ let

\[ O_k = \begin{cases} O_{k-1} & \text{if } M_k \text{ fails,} \\ M_k(O_{k-1}) & \text{if } M_k \text{ succeeds.} \end{cases} \]

Each $O_k$ is a program of the form $P_\rho$ with $\Omega > \rho$.

At the end of stage $k$ in this process the first $k$ bits of $\rho$ will be exactly the same as the first $k$ bits of $\Omega$, because at that point all together we have tried adding $1/2 + 1/4 + 1/8 + \cdots + 1/2^k$ to $\rho$. In essence, we are using an oracle to determine the value of $\Omega$ by successive interval halving.$^{9}$

9 That this works is easy to see visually. Think of the unit interval drawn vertically, with 0 below and 1 above. The intervals are being pushed up after being halved, but it is still the case that $\Omega$ remains inside each halved interval, even after it has been pushed up.

In other words, at the end of stage $k$ the first $k$ bits of $\rho$ in $O_k$ are correct. Hence:


Theorem 2 By picking our mutations intelligently rather than at random, we obtain a sequence $O_N$ of software organisms with non-decreasing fitness$^{10}$ for which the fitness of each organism is $\ge BB'(N)$. In other words, we will achieve $N$ bits of biological/mathematical creativity in mutation time linear in $N$. Each successive bit of creativity takes about as long as the previous bit did.
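The interval-halving idea behind this proof can be sketched in a few lines. This is a simplification under stated assumptions: the true $\Omega$ is irrational and uncomputable, so the sketch substitutes an arbitrary non-dyadic rational, replaces the halting oracle with a direct comparison, and tracks $\rho$ itself rather than the slightly larger bound $\Omega_N \ge \rho$ that $M_k$ actually uses. The function names are hypothetical.

```python
import math
from fractions import Fraction


def intelligent_design(omega: Fraction, stages: int) -> Fraction:
    """Apply M_1, M_2, ..., M_stages in order, keeping each success."""
    rho = Fraction(0)
    for k in range(1, stages + 1):
        candidate = rho + Fraction(1, 2 ** k)
        # Stand-in for the oracle: P_candidate halts iff candidate < Omega.
        if candidate < omega:
            rho = candidate
    return rho


def first_bits(x: Fraction, n: int) -> int:
    """The first n bits after the binary point of x in [0, 1), as an integer."""
    return math.floor(x * 2 ** n)
```

With the stand-in value `Fraction(5, 7)`, after `n` stages the first `n` bits of the returned lower bound agree with those of the stand-in, one new bit per stage, which is the linear rate the theorem states.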

However, successive mutations must be tried at random in our evolution model; they cannot be chosen deliberately. We see in these two theorems two extremes: Theorem 1, brainless exhaustive search, and Theorem 2, intelligent design. What can real, random evolution actually achieve? We shall see that the answer is closer to Theorem 2 than to Theorem 1. We will achieve fitness $BB'(N)$ in time roughly order of $N^2$. In other words, each successive bit of creativity takes an amount of time which increases linearly in the number of bits.

Open Problem 1 Is this the best that can be done by picking the mutations intelligently rather than at random? Or can creativity be even faster than linear? Does each use of the oracle yield only one bit of creativity?$^{11}$

Open Problem 2 In Theorem 2 how fast does the size in bits of the organism $O_N$ grow? By using entirely different mutations intelligently, would it be possible to have the size in bits of the organism $O_N$ grow linearly, or, alternatively, for the mutation distance between $O_N$ and $O_{N+1}$ to be bounded, and still achieve the same rapid growth in fitness?

Open Problem 3 In Theorem 2 how many different organisms will there be by mutation time $N$? I.e., on the average how fast does organism time grow as a function of mutation time?

9.6 Model A (Naming Integers) Cumulative Evolution at Random

Now we shall achieve what Theorem 2 achieved by intelligent design, by using randomness instead. Since the order of our mutations will be random, not intelligent, there will be some duplication of effort and creativity is delayed, but not overmuch.

10 Note that this is actually a legitimate fitness increasing (non-random) walk because the fitness increases each time that $O_N$ changes, i.e., each time that $O_{N+1} \ne O_N$.

11 Yes, only one bit of creativity, otherwise $\Omega$ would be compressible. In fact, the sequence of oracle replies must be incompressible.

In other words, instead of using the mutations $M_k$ in a predetermined order, they shall be picked at random, and also mixed together with other mutations that increase the fitness.

As you will recall (Section 9.5.2), a larger and larger positive integer is equivalent to a better and better lower bound on $\Omega$. That will be our clock, our memory. We will again be evolving better and better lower bounds $\rho$ on $\Omega$ and we shall make use of the organisms $P_\rho$ as before ((9.4), Section 9.5.2). We will also use again the mutations $M_k$ of Section 9.5.2.

Let’s now study the behavior of the random walk in Model A if we start with an arbitrary program $A$ that has a fitness, for example, the program that is the constant 0, and apply mutations to it at random, according to the probability measure on mutations determined by AIT [14], namely that $M$ has probability $2^{-H(M)}$.$^{12}$ So with probability one, every mutation will be tried infinitely often; $M$ will be tried roughly every $2^{H(M)}$ mutation times.

At any given point in this random walk, we can measure our progress to $\Omega$ by the fitness $\phi = \phi_A$ of our current organism $A$ and the corresponding lower bound $\Omega_\phi = \Omega_{\phi_A}$ on $\Omega$. Since the fitness $\phi$ can only increase, the lower bound $\Omega_\phi$ can only get better.

In our analysis of what will happen we focus on the mutations $M_k$; other mutations will have no effect on the analysis. They are harmless and can be mixed in together with the $M_k$. By increasing the fitness, they can only make $\Omega_\phi$ converge to $\Omega$ more quickly.

We also need a new mutation $M^*$. $M^*$ doesn’t get us much closer to $\Omega$, it just makes sure that our random walk will contain infinitely many of the programs $P_\rho$. $M^*$ will be tried roughly periodically during our random walk. $M^*$ takes the current lower bound $\Omega_\phi = \Omega_{\phi_A}$ on $\Omega$, and produces

\[ A' = M^*(A) = P_{\Omega_{1+\phi_A}}. \]

$A'$ has fitness 1 greater than the fitness of $A$ and thus mutation $M^*$ will always succeed, and this keeps lots of organisms of the form $P_\rho$ in our random walk.

Let’s now return to the mutations $M_k$, each of which will also have to be tried infinitely often in the course of our random walk.

12 This is a convenient lower bound on the probability of a mutation. A more precise value for the probability of jumping from $A$ to $A'$ is $2^{-H(A'|A)}$.


The mutation $M_k$ will either have no effect because $M_k(A)$ fails to halt, which means that we are less than $2^{-k}$ away from $\Omega$, that is, $\Omega_{\phi_A}$ is less than $2^{-k}$ away from $\Omega$, or $M_k$ will have the effect of incrementing our lower bound $\Omega_{\phi_A}$ on $\Omega$ by $2^{-k}$. As more and more of these mutations $M_k$ are tried at random, eventually, purely by chance, more and more of the beginning of $\Omega_{\phi_A}$ will become correct (the same as the initial bits of $\Omega$). Meanwhile, the fitness $\phi_A$ will increase enormously, passing $BB'(n)$ as soon as the first $n$ bits of $\Omega_{\phi_A}$ are correct. And soon afterwards, $M^*$ will package this in an organism $A' = P_{\Omega_{1+\phi_A}}$.

How long will it take for all this to happen? I.e., how long will it take to try the $M_k$ for $k = 1, 2, 3, \ldots, n$ and then try $M^*$? We have

\[ H(M_k) \le H(k) + c. \]

Therefore mutation $M_k$ has probability

\[ \ge 2^{-H(k)-c} > \frac{1}{c' k (\log k)^{1+\varepsilon}} \tag{9.6} \]

since

\[ \sum_k \frac{1}{k (\log k)^{1+\varepsilon}} \]

converges.$^{13}$ The mutation $M_k$ will be tried in time proportional to 1 over the probability of its being tried, which by (9.6) is approximately upper bounded by

\[ \xi(k) = c'' k (\log k)^{1+\varepsilon}. \tag{9.7} \]

On the average, from what point on will the first $n$ bits of $\Omega_\phi = \Omega_{\phi_A}$ be the same as the first $n$ bits of $\Omega$? We can be sure this will happen if we first try $M_1$, then afterwards $M_2$, then $M_3$, etc. through $M_n$, in that order. Note that if these mutations are tried in the wrong order, they will not have the desired effect. But they will do no harm either, and eventually will also be tried in the correct order. Note that it is conceivable that none of these $M_k$ actually succeed, because of the other random mutations that were in the mix, in the melee. These other mutations may already have pushed us within $2^{-k}$ of $\Omega$. So these $M_k$ don’t have to succeed, they just have to be tried. Then $M^*$ will make sure that we get an organism of the form $P_\rho$ with at least $n$ bits of $\rho$ correct.

13 We are using here one of the basic theorems of AIT [14].


Hence:

Expected time to try $M_1$ $\le \xi(1)$

Expected time to then afterwards try $M_2$ $\le \xi(2)$

Expected time to then afterwards try $M_3$ $\le \xi(3)$

. . .

Expected time to then afterwards try $M_n$ $\le \xi(n)$

Expected time to then afterwards try $M^*$ $\le c'''$

∴ Expected time to try $M_1, M_2, M_3, \ldots, M_n, M^*$ in order $\le \sum_{k \le n} \xi(k) + c'''$

Using (9.7), we see that this is our extremely rough “ball-park” estimate on a mutation time sufficiently big for the first $n$ bits of $\rho$ in $P_\rho = M^*(A)$ to be the correct bits of $\Omega$:

\[ \sum_{k \le n} \xi(k) + c''' = \sum_{k \le n} c'' k (\log k)^{1+\varepsilon} + c''' = O\!\left(n^2 (\log n)^{1+\varepsilon}\right). \tag{9.8} \]

Hence we expect that in time $O(n^2 (\log n)^{1+\varepsilon})$ our random walk will include an organism $P_\rho$ in which the first $n$ bits of $\rho$ are correct, and so $P_\rho$ will compute a positive integer $\ge BB'(n)$, and thus at this time the fitness will have to be at least that big:

Theorem 3 In Model A with random mutations, the fitness of the organisms $P_\rho = M^*(A)$ will reach $BB'(N)$ by mutation time roughly $N^2$.
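One way to see where the order of growth in (9.8) comes from is to bound each of the $n$ terms of the sum by the largest one. This is only a crude worked step, using the same constants $c''$ and $c'''$ as (9.7) and (9.8) and assuming $n \ge 2$:

```latex
\[
\sum_{k \le n} \xi(k) + c'''
  = c'' \sum_{k \le n} k (\log k)^{1+\varepsilon} + c'''
  \le c'' \cdot n \cdot n (\log n)^{1+\varepsilon} + c'''
  = O\!\left(n^{2} (\log n)^{1+\varepsilon}\right).
\]
```

The logarithmic factor is what the word “roughly” in Theorem 3 absorbs.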

Note that since the bits of $\rho$ in the organisms $P_\rho = M^*(A)$ are becoming better and better lower bounds on $\Omega$, these organisms in effect contain their evolutionary history. In Model A, evolution is cumulative, it does not start over from scratch as in exhaustive search.

It should be emphasized that in the course of such a hill-climbing random walk, with probability one every possible mutation will be tried infinitely often. However the mutations $M_k$ will immediately recover from perturbations and set the evolution back on course. In a sense the system is self-organizing and self-repairing. Similarly, the initial organism is irrelevant.

Also note that with probability one the time history or evolutionary pathway (i.e., the random walk in Model A) will quickly grow better and better approximations to all possible halting probabilities $\Omega_{U'}$ (see (9.2)) determined by any optimal universal self-delimiting binary computer $U'$, not just for our original $U$. Furthermore, some mutations will periodically convert our organism into a numerical constant for its fitness $\phi$, and there will even be arbitrarily long chains of successive numerical constant organisms $\phi, \phi+1, \phi+2, \ldots$ The microstructure and fluctuations that will occur with probability one are quite varied and should perhaps be studied in detail to unravel the full zoo of organisms and their interconnections; this is in effect a kind of miniature mathematical ecology.

Open Problem 4 Study this mathematical ecology.

Open Problem 5 Improve the estimate (9.8) and get a better upper bound on the expected time it will take to try $M_1$, $M_2$, $M_3$ through $M_n$ and $M^*$ in that order. Besides the mean, what is the variance?

Open Problem 6 Separate random evolution and intelligent design: We have shown that random evolution is fast, but can you prove that it cannot be as fast as intelligent design? I.e., we have a lower bound on the speed of random evolution, and now we also need an upper bound. This is probably easier to do if we only consider random mutations $M_k$ and keep other mutations from mixing in.

Open Problem 7 In Theorem 3 how fast does the size in bits of the organism $P_\rho$ grow? Is it possible to have the size in bits of the organism $P_\rho$ grow linearly and still achieve the same rapid growth in fitness?

Open Problem 8 It is interesting to think of Model A as a conventional random walk and to study the average mutation distance between an organism $A$ and its successor $A'$, its second successor $A''$, etc. In organism time $\Delta t$ how far will we get from $A$ on the average? What will the variance be?

9.7 Model B (Naming Functions)

Let’s now consider Model B. Why study Model B? Because hierarchical structure is a conspicuous feature of actual biological organisms, but it is impossible to prove that such structure must emerge by random evolution in Model A.

Why not? Because the programming language used by the organisms in Model A is so powerful that all structure in the programs can be hidden. Consider the programs $P_\rho$ defined in Section 9.5.2 and used to prove Theorems 2 and 3. As we saw in Theorem 3, these programs $P_\rho$ evolve without limit at random. However, $P_\rho$ consists of a fixed prefix $\pi_\Omega$ followed by a lower bound on $\Omega$, $\rho$, and what evolves is the lower bound $\rho$, data which has no visible hierarchical structure, not the prefix $\pi_\Omega$, code which has fixed, unevolving, hierarchical structure.

So in Model A it is impossible to prove that hierarchical structure will emerge and increase in depth. To be able to do this we must utilize a less powerful programming language, one that is not universal and in which the hierarchical structure cannot be hidden: the Meyer-Ritchie LOOP language [28].

We will show that the nesting depth of LOOP programs will increase without limit, due to random mutations. This also provides a much more concrete example of evolution than is furnished by our main model, Model A.

Now for the details.

We study the evolution of functions $f(x)$ of a single integer argument $x$; faster growing functions are taken to be fitter. More precisely, if $f(x)$ and $g(x)$ are two such functions, $f$ is fitter than $g$ iff $g/f \to 0$ as $x \to \infty$. We use an oracle to decide if $A' = M(A)$ is fitter than $A$; if not, $A$ is not replaced by $A'$.$^{14}$ The programming language we are using has the advantage that program structure cannot be hidden. It’s a programming language that is powerful enough to program any primitive recursive function [29], but it’s not a universal programming language.

To give a concrete example of hierarchical evolution, we use the extremely simple Meyer-Ritchie LOOP programming language, containing only assignment, addition by 1, do loops, and no conditional statements or subroutines. All variables are natural numbers, non-negative integers. Here is an example of a program written in this language:

14 An oracle is needed in order to decide whether $g(x)/f(x) \to 0$ as $x \to \infty$ and also to avoid mutations $M$ that never produce an $A' = M(A)$. Furthermore, if a mutation produces a syntactically invalid LOOP program $A'$, $A'$ does not replace $A$.


    // Exponential: 2 to the Nth power
    // with only two nested do loops!
    function(N) // Parameter must be called N.
       M = 1
       //
       do N times
          M2 = 0
          // M2 = 2 * M
          do M times
             M2 = M2 + 1
             M2 = M2 + 1
          end do
          M = M2
       end do
       // Return M = 2 to the Nth power.
       return_value = M
       // Last line of function must
       // always set return_value.
    end function
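For readers who want to run it, the LOOP program above can be transliterated into Python. This is a sketch rather than a LOOP interpreter: the function name `two_to_the` is hypothetical, and `range(m)` is used because it fixes the iteration count when the loop is entered, which matches the LOOP rule that changing a variable inside a `do` loop does not alter the number of repetitions.

```python
def two_to_the(n: int) -> int:
    """Compute 2**n using only assignment, +1, and bounded loops."""
    m = 1
    for _ in range(n):          # do N times
        m2 = 0
        for _ in range(m):      # do M times: m2 becomes 2 * m
            m2 = m2 + 1
            m2 = m2 + 1
        m = m2
    return m                    # return_value = M
```

For instance, `two_to_the(5)` evaluates to 32.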

More generally, let’s start with $f_0(x) = 2x$:

    function(N) // f_0(N)
       M = 0
       // M = 2 * N
       do N times
          M = M + 1
          M = M + 1
       end do
       return_value = M
    end function // end f_0(N)

Note that the nesting depth of $f_0$ is 1.

And given a program for the function $f_k$, here is how we program

\[ f_{k+1}(x) = f_k^{\,x}(2) \tag{9.9} \]

(that is, $f_k$ applied $x$ times starting from 2) by increasing the nesting depth of the program for $f_k$ by 1:


    function(N) // f_(k+1)(N)
       M = 2
       // do M = f_k(M) N times
       do N times
          N_ = M
          // Insert program for f_k here
          // with "function" and "end function"
          // stripped and all variable names
          // renamed to variable name_
          M = return_value_
       end do
       return_value = M
    end function // end f_(k+1)(N)

So following (9.9) we now have programs for

\[ f_0(x) = 2x, \quad f_1(x) = 2^x, \quad f_2(x) = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \text{ with } x \text{ 2's}, \quad \ldots \]
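The recursion (9.9) is easy to experiment with for very small arguments. The sketch below is a direct Python reading of the LOOP template (start from 2, apply $f_k$ a total of $x$ times); the function name `f` is a hypothetical label. Read literally, this gives $f_1(x) = 2^{x+1}$, which has the same growth rate as $2^x$ up to a constant factor, so the listed closed forms are best taken as orders of growth.

```python
def f(k: int, x: int) -> int:
    """f_0(x) = 2x, and f_{k+1}(x) = f_k applied x times starting from 2."""
    if k == 0:
        return 2 * x
    m = 2
    for _ in range(x):   # do N times: M = f_{k-1}(M)
        m = f(k - 1, m)
    return m
```

Only tiny inputs are practical: `f(2, 2)` is already 512, and values at level 3 explode almost immediately.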

Note that a program in this language which has nesting depth 0 (no do loops) can only calculate a function of the form ($x$ + a constant), and that the depth 1 function $f_0(x) = 2x$ grows faster than all of these depth 0 functions. More generally, it can be proven by induction [29] that a program in this language with do loop nesting depth $\le k$ defines functions that grow more slowly than $f_k$, which is defined by a depth $k+1$ LOOP program. This is the basic theorem of Meyer and Ritchie [28] classifying the primitive recursive functions according to their rates of growth.

Now consider the mutation $M$ that examines a software organism $A$ written in this LOOP language to determine its nesting depth $n$, and then replaces $A$ by $A' = f_n(x)$, a function that grows faster than any LOOP function with depth $\le n$. Mutation $M$ will be tried at random with probability $\ge 2^{-H(M)}$. And so:

Theorem 4 In Model B, the nesting depth of a LOOP function will increase by 1 roughly periodically, with an estimated mutation time of $2^{H(M)}$ between successive increments. Once mutation $M$ increases the nesting depth, it will remain greater than or equal to that increased depth, because no LOOP function with smaller nesting depth can grow as fast.
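The mutation $M$ is simple enough to sketch as a text transformation. The code below is an illustration under assumptions, not the construction used in the proof: LOOP programs are represented as lists of source lines with the `function` wrapper omitted, comments are left out, and the helper names (`rename`, `body`, `nesting_depth`, `mutate`) are hypothetical.

```python
import re

KEYWORDS = {"do", "times", "end"}


def rename(line: str) -> str:
    """Append '_' to every variable name, leaving LOOP keywords alone."""
    def fix(match):
        word = match.group(0)
        return word if word in KEYWORDS else word + "_"
    return re.sub(r"[A-Za-z_]\w*", fix, line)


def body(k: int) -> list[str]:
    """Source lines for f_k, built by the nesting template for (9.9)."""
    if k == 0:
        return ["M = 0", "do N times", "    M = M + 1", "    M = M + 1",
                "end do", "return_value = M"]
    inner = ["    " + rename(line) for line in body(k - 1)]
    return (["M = 2", "do N times", "    N_ = M"] + inner +
            ["    M = return_value_", "end do", "return_value = M"])


def nesting_depth(lines: list[str]) -> int:
    """Maximum number of do loops open at the same time."""
    depth = deepest = 0
    for line in lines:
        text = line.strip()
        if text.startswith("do "):
            depth += 1
            deepest = max(deepest, depth)
        elif text == "end do":
            depth -= 1
    return deepest


def mutate(lines: list[str]) -> list[str]:
    """M: measure the nesting depth n of A and replace A by f_n."""
    return body(nesting_depth(lines))
```

Because `body(n)` has nesting depth `n + 1`, every application of `mutate` raises the depth by exactly one, which is the clock that Theorem 4 relies on.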

Note that this theorem works because the nesting depth of a primitive recursive function is used as a clock; it gives Model B memory that can be used by intelligent mutations like $M$.

Open Problem 9 In the proof of Theorem 4, is the mutation $M$ primitive recursive, and if so, what is its LOOP nesting depth?

Open Problem 10 $M$ can actually increase the nesting depth extremely fast. Study this.

Open Problem 11 Formulate a version of Theorem 4 in terms of subroutine nesting instead of do loop nesting. What is a good computer programming language to use for this?

9.8 Remarks on Model C (Naming Ordinals)

Now let’s briefly turn to programs that compute constructive Cantor ordinal numbers $\alpha$ [27]. From a biological point of view, the evolution of ordinals is piquant, because they certainly exhibit a great deal of hierarchical structure. Here the structure is not in the genotype, where Section 9.7 showed it must emerge; it is automatically present in the phenotype.

Ordinals also seem like an excellent choice for an evolutionary model because of their fundamental role in mathematics$^{15}$ and because of the mystique associated with naming large ordinals, a problem which can utilize an unlimited amount of mathematical creativity [26, 27]. Conventional ordinal notations can only handle an initial segment of the constructive ordinals.

However there are two fundamentally different ways [27] to use algorithms to name all such ordinals $\alpha$:

• An ordinal is a program that given two positive integers, tells us which is less than the other in a well-ordering of the positive integers with order type $\alpha$.

• An ordinal $\alpha$ is a program for obtaining that ordinal from below: If it is a successor ordinal, as $\beta + 1$; if it is a limit ordinal, as the limit of a fundamental sequence $\beta_k$ ($k = 0, 1, 2, \ldots$).

This yields two different definitions of the algorithmic information content or program-size complexity of a constructive ordinal:

$H(\alpha)$ = the size in bits of the smallest self-delimiting program for calculating $\alpha$.

15 As an illustration of this, ordinals may be used to extend the function hierarchy $f_k$ of Section 9.7 to transfinite $k$. For example, $f_\omega(x) = f_x(x)$, $f_{\omega+1}(x) = f_\omega^{\,x}(2)$, $f_{\omega+2}(x) = f_{\omega+1}^{\,x}(2)$, \ldots, $f_{\omega \times 2}(x) = f_{\omega+x}(x)$, etc., an extension of (9.9).

We can now define this beautiful new version of the Busy Beaver function:

\[ BB_{\mathrm{ord}}(N) = \max_{H(\alpha) \le N} \alpha. \]

In order to make programs for ordinals $\alpha$ evolve, we now need to use a very sophisticated oracle, one that can determine if a program computes an ordinal and, given two such programs, can also determine if one of these ordinals is less than the other. Assuming such an oracle, we get the following version of Theorem 1, merely by using brainless exhaustive search:

Theorem 5 The fitness of our ordinal organism $\alpha$ will reach $BB_{\mathrm{ord}}(N)$ by mutation time $2^N$.

Can we do better than this? The problem is to determine if there is some kind of $\Omega$ number or other way to compress information about constructive ordinals so that we can improve on Theorem 5 by proving that evolution will probably reach $BB_{\mathrm{ord}}(N)$ in an amount of time which does not grow exponentially.

We suspect that Model C may be an example of a case in which cumulative evolution at random does not occur. On the other hand, we are given an extremely powerful oracle; maybe it is possible to take advantage of that. The problem is open.

Open Problem 12 Improve on Theorem 5 or show that no improvement is possible.

9.9 Conclusion

At this point we should look back and ask why this all worked. Mainly for the following reason: We used an extremely rich space of possible mutations, one that possesses a natural probability distribution: the space of all possible self-delimiting programs studied by AIT [14]. But the use of such powerful mutational mechanisms raises a number of issues.

Presumably DNA is a universal programming language, but how sophisticated can mutations be in actual biological organisms? In this connection, note that evo-devo views DNA as software for constructing the embryo, and that the change from single-celled to multicellular organisms is roughly like taking a main program and making it into a subroutine, which is a fairly high-level mutation. Could this be the reason that it took so long, on the order of $10^9$ years, for this to happen?$^{16}$

The issue of balance between the power of the organisms and the power of the mutations is an important one. In the current version of the theory, both have equal power, but as a matter of aesthetics it would be bad form for a proof to overemphasize the mutations at the expense of the organisms. In future versions of the theory perhaps it will be desirable to limit the power of mutations in some manner by fiat.

In this connection, note that there are two uses of oracles in this theory, one to decide which of two organisms is fitter, and another to eliminate non-terminating mutations. It is perfectly fine for a proof to be based on taking advantage of the oracle for organisms, but taking advantage of the oracle for mutations is questionable.

We have by no means presented in this paper a mathematical theory of evolution and biological creativity comme il faut. But at this point in time we believe that metabiology is still a possible contender for such a theory. The ultimate goal must be to find in the Platonic world of mathematical ideas that ideal model of evolution by natural selection which real, messy biological evolution can but approach asymptotically in the limit from below.

We thank Prof. Cristian Calude of the University of Auckland for reading a draft of this paper, for his helpful comments, and for providing the paper by Meyer and Ritchie [28].

Appendix. AIT in a Nutshell

Programming languages are commonly universal, that is to say, capable of expressing essentially any algorithm.

In order to be able to combine subroutines, i.e., for algorithmic information to be subadditive,

size of program to calculate $x$ and $y$ $\le$ size of program to calculate $x$ + size of program to calculate $y$,

it is important that programs be self-delimiting. This means that the universal computer $U$ reads a program bit by bit as required and there is no special delimiter to mark the end of the program; the computer must decide by itself where to stop reading.

16 During most of the history of the earth, life was unicellular.

More precisely, if programs are self-delimiting we have

\[ H(x, y) \le H(x) + H(y) + c, \]

where $H(\ldots)$ denotes the size in bits of the smallest program for $U$ to calculate $\ldots$, and $c$ is the number of bits in the main program that reads and executes the subroutine for $x$ followed by the subroutine for $y$.
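To see why self-delimiting strings can simply be concatenated, here is a small sketch. It borrows the framing $0^{|\rho|}1\rho$ that Section 9.5.2 uses in (9.4); that framing costs $2|\rho| + 1$ bits and is chosen here only because it is easy to parse, not because it is the most economical encoding. The function names are hypothetical.

```python
def frame(bits: str) -> str:
    """Self-delimiting framing: |bits| zeros, a one, then the bits."""
    return "0" * len(bits) + "1" + bits


def read_one(stream: str, pos: int = 0):
    """Read one framed string starting at pos.

    Returns the payload and the position just past it. No end marker is
    needed: the run of zeros announces how many payload bits follow.
    """
    n = 0
    while stream[pos + n] == "0":
        n += 1
    start = pos + n + 1
    return stream[start:start + n], start + n


def read_pair(stream: str):
    """Recover x and y from the concatenation frame(x) + frame(y)."""
    x, pos = read_one(stream)
    y, _ = read_one(stream, pos)
    return x, y
```

The reader never looks past the end of the first framed string to find its boundary, so two such strings placed back to back can be separated again, and the combined length is just the sum of the two framed lengths.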

Besides giving us subadditivity, the fact that programs are self-delimiting also enables us to talk about that probability $P(x)$ that a program that is generated at random will compute $x$ when run on $U$.

Let’s now consider how expressive different programming languages can be. Given a particular programming language $U$, two important things to consider are the program-size complexity $H(x)$ as a function of $x$, and the corresponding algorithmic probability $P(x)$ that a program whose bits are chosen using independent tosses of a fair coin will compute $x$.

We are thus led to select a subset of the universal languages that minimize $H$ and maximize $P$; one way to define such a language is to consider a universal computer $U$ that runs self-delimiting binary computer programs $\pi_C\, p$ defined as follows:

\[ U(\pi_C\, p) = C(p). \]

In other words, the result of running on $U$ the program consisting of the prefix $\pi_C$ followed by the program $p$, is the same as the result of running $p$ on the computer $C$. The prefix $\pi_C$ tells $U$ which computer $C$ to simulate.

Any two such maximally expressive universal languages $U$ and $V$ will necessarily have

\[ |H_U(x) - H_V(x)| \le c \]

and

\[ P_U(x) \ge P_V(x) \times 2^{-c}, \qquad P_V(x) \ge P_U(x) \times 2^{-c}. \]

It is in this precise sense that such a universal $U$ minimizes $H$ and maximizes $P$.

For such languages $U$ it will be the case that

\[ H(x) = -\log_2 P(x) + O(1), \]

which means that most of the probability of calculating $x$ is concentrated on the minimum-size program for doing this, which is therefore essentially unique. $O(1)$ means that the difference between the two sides of the equation is order of unity, i.e., bounded by a constant.

Furthermore, we have

\[ H(x, y) = H(x) + H(y|x) + O(1). \]

Here $H(y|x)$ is the size of the smallest program to calculate $y$ from $x$.$^{17}$ This tells us that essentially the best way to calculate $x$ and $y$ is to calculate $x$ and then calculate $y$ from $x$. In other words, the joint complexity of $x$ and $y$ is essentially the same as the absolute complexity of $x$ added to the relative complexity of $y$ given $x$.

This decomposition of the joint complexity as a sum of absolute and relative complexities implies that the mutual information content

\[ H(x : y) \equiv H(x) + H(y) - H(x, y), \]

which is the extent to which it is easier to compute $x$ and $y$ together rather than separately, has the property that

\[ H(x : y) = H(x) - H(x|y) + O(1) = H(y) - H(y|x) + O(1). \]

In other words, $H(x : y)$ is also the extent to which knowing $y$ helps us to know $x$ and vice versa.
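The step from the decomposition to this property can be written out explicitly. The only extra ingredient, assumed here, is that the roles of $x$ and $y$ may be exchanged in the decomposition, so that $H(x, y) = H(y) + H(x|y) + O(1)$ as well:

```latex
\begin{align*}
H(x : y) &= H(x) + H(y) - H(x, y) \\
         &= H(x) + H(y) - \bigl(H(y) + H(x|y)\bigr) + O(1) = H(x) - H(x|y) + O(1), \\
H(x : y) &= H(x) + H(y) - \bigl(H(x) + H(y|x)\bigr) + O(1) = H(y) - H(y|x) + O(1).
\end{align*}
```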

Last but not least, using such a maximally expressive $U$ we can define the halting probability $\Omega$, for example as follows:

\[ \Omega = \sum 2^{-|p|} \]

summed over all programs $p$ that halt when run on $U$, or alternatively

\[ \Omega' = \sum 2^{-H(n)} \]

summed over all positive integers $n$, which has a slightly different numerical value but essentially the same paradoxical properties.
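The reason the first sum is a probability is that the programs are self-delimiting: no halting program is a proper prefix of another, so the weights $2^{-|p|}$ cannot add up to more than 1. A toy check of this, using an invented set of bit strings in place of real halting programs (both the set and the function names are hypothetical):

```python
from fractions import Fraction


def is_prefix_free(programs) -> bool:
    """True if no program is a proper prefix of a different program."""
    return not any(p != q and q.startswith(p)
                   for p in programs for q in programs)


def halting_weight(programs) -> Fraction:
    """Sum of 2^(-|p|) over the given programs."""
    return sum((Fraction(1, 2 ** len(p)) for p in programs), Fraction(0))


# Invented stand-in for a finite set of halting self-delimiting programs.
TOY_HALTING = ["00", "010", "011", "1100"]
```

For `TOY_HALTING` the weight is 9/16; adding more prefix-free programs can only push the total up toward a limit that never exceeds 1.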

What are these properties? $\Omega$ is a form of concentrated mathematical creativity, or, alternatively, a particularly economical Turing oracle for the halting problem, because knowing $n$ bits of the dyadic expansion of $\Omega$ enables one to solve the halting problem for all programs $p$ up to $n$ bits in size that compute a positive integer. It follows that the bits of the dyadic expansion of $\Omega$ are irreducible mathematical information; they cannot be compressed into a theory smaller than they are.$^{18}$

17 It is crucial that we are not given $x$ directly. Instead we are given a minimum-size program for $x$.

From a philosophical point of view, however, the most striking thing about $\Omega$ is that it provides a perfect simulation in pure mathematics, where all truths are necessary truths, of contingent, accidental truths, i.e., of truths such as historical facts or biological frozen accidents.

Furthermore, $\Omega$ opens a door for us from mathematics to biology. The halting probability $\Omega$ contains infinite irreducible complexity and in a sense shows that pure mathematics is even more biological than biology itself, which merely contains extremely large finite complexity. For each bit of the dyadic expansion of $\Omega$ is one bit of independent, irreducible mathematical information, while the human genome is merely $3 \times 10^9$ bases $= 6 \times 10^9$ bits of information.

18 More precisely, it takes a formal axiomatic theory of complexity $\ge n - c$ (one requiring a $\ge n - c$ bit program to enumerate all its theorems) to enable us to determine $n$ bits of $\Omega$.


Bibliography

[1] D. Berlinski, The Devil’s Delusion, Crown Forum, 2008.

[2] S. J. Gould, Wonderful Life, Norton, 1990.

[3] N. Shubin, Your Inner Fish, Pantheon, 2008.

[4] M. Mitchell, Complexity, Oxford University Press, 2009.

[5] J. Fodor, M. Piattelli-Palmarini, What Darwin Got Wrong, Farrar, Straus and Giroux, 2010.

[6] S. C. Meyer, Signature in the Cell, HarperOne, 2009.

[7] J. Maynard Smith, Shaping Life, Yale University Press, 1999.

[8] J. Maynard Smith, E. Szathmary, The Origins of Life, Oxford University Press, 1999; The Major Transitions in Evolution, Oxford University Press, 1997.

[9] F. Hoyle, Mathematics of Evolution, Acorn, 1999.

[10] G. J. Chaitin, “Evolution of mutating software,” EATCS Bulletin 97 (February 2009), pp. 157–164.

[11] G. J. Chaitin, “Metaphysics, metamathematics and metabiology,” in H. Zenil, Randomness Through Computation, World Scientific, in press. (Draft at http://www.umcs.maine.edu/~chaitin/lafalda.pdf.)

[12] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas, in press. (Draft at http://www.umcs.maine.edu/~chaitin/midas.html.) (See Chapter 3, “Algorithmic Information as a Fundamental Concept in Physics, Mathematics and Biology.”)

[13] G. J. Chaitin, Chapter “Complexity, Randomness” in Chaitin, Costa, Doria, After Godel, in preparation. (Draft at http://www.umcs.maine.edu/~chaitin/bookgoedel_2.pdf.)

[14] G. J. Chaitin, “A theory of program size formally identical to information theory,” J. ACM 22 (1975), pp. 329–340.

[15] G. J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987.

[16] G. J. Chaitin, Exploring Randomness, Springer, 2001.

[17] C. S. Calude, Information and Randomness, Springer-Verlag, 2002.

[18] M. Li, P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 2008.

[19] C. Calude, G. Chaitin, “What is a halting probability?,” AMS Notices 57 (2010), pp. 236–237.

[20] H. Steinhaus, Mathematical Snapshots, Oxford University Press, 1969, pp. 29–30.

[21] D. E. Knuth, “Mathematics and computer science: Coping with finiteness,” Science 194 (1976), pp. 1235–1242.

[22] A. Hodges, One to Nine, Norton, 2008, pp. 246–249; M. Davis, The Universal Computer, Norton, 2000, pp. 169, 235.

[23] G. J. Chaitin, “Computing the Busy Beaver function,” in T. M. Cover, B. Gopinath, Open Problems in Communication and Computation, Springer, 1987, pp. 108–112.

[24] G. H. Hardy, Orders of Infinity, Cambridge University Press, 1910. (See Theorem of Paul du Bois-Reymond, p. 8.)

[25] D. Hilbert, “On the infinite,” in J. van Heijenoort, From Frege to Godel, Harvard University Press, 1967, pp. 367–392.

[26] J. Stillwell, Roads to Infinity, A. K. Peters, 2010.


[27] H. Rogers, Jr., Theory of Recursive Functions and Effective Computability, MIT Press, 1987. (See Chapter 11, especially Sections 11.7, 11.8 and the exercises for these two sections.)

[28] A. R. Meyer, D. M. Ritchie, “The complexity of loop programs,” Proceedings ACM National Meeting, 1967, pp. 465–469.

[29] C. Calude, Theories of Computational Complexity, North-Holland, 1988. (See Chapters 1, 5.)


Chapter 10

Parsing the Turing test

Journal of Scientific Exploration 23 (2009), pp. 530–534.

Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer edited by Robert Epstein, Gary Roberts and Grace Beber. Springer, 2009. xxiii + 517 pp. $199.00 (hardcover). ISBN 9781402067082.

This big, expensive book offers much food for thought. This review will be a reaction to the first editor’s introduction, plus the clever reverse Turing test in Chapter 28 by Charles Platt with machines attempting to determine if humans have any intelligence. Basically, based on my sample of these two chapters, this book is a celebration of the coming extinction of the human race. I shall play the devil’s advocate, and also take a meta perspective on the book, analyzing its significance as a social phenomenon instead of considering its contents.

Turing’s famous paper on the imitation game (reprinted and annotated in this book), a remote conversation with a computer attempting to prove it is human, in addition to its intellectual fireworks, reflects the fact that Turing, as the French say, “felt uncomfortable in his skin,” both as a male and as a human being. As this book indicates, this has now become part of the zeitgeist and a general social problem.

The general attitude I see here reminds me of remarks by Marvin Minsky I heard many years ago, when he called human beings “meat machines,” and described the human race as a carbon-based life-form that was creating a silicon-based life-form that would replace it. At the time, his remarks seemed a bit mad, but now many people seem to feel that way.



Why is this? Well, our current society attempts to make people into machines; it behaves as if human beings were ants or bees. We are being forced to live in an anthill, beehive society. Obviously machines are better at being machines than we are, and humans feel ill-suited for anthill or beehive life. Human beings are made to feel obsolete, has-beens.

Robert Epstein’s introduction argues that a super-human intelligence is inevitable and not far off in time, and that at best we shall be slaves or pets for the machine, at worst exterminated as annoying insects.

The authors are well aware of the amazing advances in computer technology that they believe make this possible, but perhaps they are less aware of the fact that the more we understand about organisms, the more molecular biology progresses, the more amazing living beings seem. The cells in the human body were originally autonomous living beings that have now banded together much like the citizens in a nation or the employees in a corporation. An individual cell is amazingly sophisticated, and, it seems to me, is best compared with a computer or even with an entire city.

So our artificial machines may not catch up with Nature’s machines for a while. Can a century of human engineering compare with billions of years of evolution, essentially an immense parallel-processing molecular-level computation going on throughout the entire biosphere?

In a more optimistic scenario we are not exterminated; the machines will be our servants. Isaac Asimov thought that in the future human beings might live like ancient Greek aristocrats with robotic slaves.

Yes, machines can calculate better than we can, and remember things better than we can. Should we be very upset? Railroad trains go faster than a person can run, a steam-shovel can move earth quicker than a person, an airplane can fly. But human beings made those machines, and should be proud of it. Are we upset about the fact that we need to wear clothing in the winter? Not at all. People are not very fast, not very strong, they do not have fur or a tough hide, but they are extremely curious, clever, and imaginative, flexible and adaptable. Like the universal Turing machine, we are generalists, not specialists. We are not optimized for any particular little ecological niche.

It is also possible that eventually enhanced humans and humanized machines will become nearly indistinguishable, which doesn’t sound too bad to me. It’s much like wearing clothing or using a can-opener.

But maybe none of this will happen. Another possibility is that machine intelligences will remain unconscious zombies, monstrous golems lacking a divine spark, a human soul. For we are products of George Bernard Shaw’s life-force, of Henri Bergson’s “élan vital”, and machines are not. This is of course not a fashionable view in our secular times, but let me try to give a contemporary version of this argument, one designed for modern sensibilities.

First of all, quantum mechanics, a branch of fundamental physics, has been telling us that the Schrödinger Psi function is real, more real than the particles it describes. Electrons in atoms are expressed as probability waves that interfere constructively and destructively. Atoms are like musical instruments.

Whatever the Psi function is, it is not material. It is more like an idea, and therefore gives support to those Platonic idealist philosophies that view spirit as more fundamental, more real, than matter. Of course, this is not a fashionable interpretation. Nonetheless Nature is giving us this hint loud and clear, even if we refuse to listen.

The latest version of quantum mechanics, now called quantum information theory, reformulates “classical” 1920s quantum mechanics in terms of qubits of information; information is certainly not matter. In my opinion quantum information theory is even less materialist than classical quantum mechanics.

Consciousness, quite mysterious at this time, is also more about information than about matter, I think. Could consciousness reflect some currently unknown level of physical reality? Could our current science be radically incomplete? Indeed, it may well be so. There may be many scientific mysteries yet to solve.

It is true that during the three-century plus history of modern science, each period thinks it has a nearly final answer, only to discover 25 or 50 years later some totally unexpected phenomenon that provokes a complete paradigm shift. Let me invoke a temporal rather than a spatial “Copernican principle.” Why should our epoch be especially favored? Why should we have the final answers?

A simple linear extrapolation of the history of science suggests that a century from now things will look remarkably different. What did we know of quantum mechanics a century ago? Is it possible that, to use Wolfgang Pauli’s trenchant phrase, our current scientific world-view “is not even wrong?” For our grand-children and great-grand children’s sake I hope so. How boring if it should happen that there will be no fundamental changes in our scientific world view in the future. Why should Nature’s imagination be as limited as ours?


So if our current scientific world view is not at all final, perhaps living beings do have something special that machines cannot attain, something that science will some day understand as well as we currently understand quantum mechanics, a scientific version, perhaps, of the soul or what the spiritual would refer to as a divine spark. How otherwise to understand cases of amazing human creativity? Pick your own favorite examples. I pick the composer Johann Sebastian Bach, and the mathematicians Leonhard Euler, Srinivasa Ramanujan and Georg Cantor. Can machines have that kind of creativity, that kind of inspiration? These men seem to have had a direct link to the source of new ideas.

Believers in Darwinian evolution by natural selection will argue that no vital spark, no élan vital, nothing at all divine is needed, just random mutations. I myself am a believer in Darwinian evolution. I am currently trying to develop a theory I optimistically have dubbed “metabiology.” The purpose of metabiology is to prove mathematically that Darwinian evolution works. But I am open to the possibility that this may not be achievable. It would also be delightful to be able to prove that evolution by natural selection doesn’t, cannot work. I would be happy either way, as long as I can prove it. Most likely my metabiological ideas will lead nowhere, but I feel my honor as a mathematician demands that I should give it a try.

And why have human beings become so defeatist? Is it more fun to work in a factory that produces robots than to conceive and raise one’s own children? Or look at cars. I have been in remote corners of Argentina, where people seem almost completely divorced from the modern world economy and do everything themselves. They manage splendidly without cars, with horses and donkeys. These are self-reproducing cars, vegetarian cars, not ones that need petroleum.

No wonder that the contributors to this book have given up on human beings. People are ill-used in our modern society, and sensitive scientific intellectuals feel it. Scientists are now micro-managed. The refereeing and grant systems, with everything decided by committees, favor safe, conservative, incremental science. Can radical new ideas have a chance with our current “factory” science? I doubt it. Would Galileo, Newton, Maxwell, Darwin and Einstein be able to work in the current system? Would Euler, Ramanujan and Cantor? I think not.

As I said, human beings are not ants, they are not bees, they were not designed to be slaves. Let’s look at particularly creative periods in human history, for example ancient Greece and the Italian Renaissance.


How come the ancient Greeks were so creative? I asked a Greek intellectual that once, in Mykonos, and he told me that the ancient Greeks discussed this, and noted that ancient Egypt was largely stable and un-innovative for millennia, the contrary of the ancient Greeks, because Greek city-states were small and separated by mountains or isolated on islands, and so imaginative individuals could be creative and affect things, while Egyptian geography permitted strong central, unified control of an empire, creativity was suppressed, and talented individuals could have little or no effect.

Similarly, the creativity of the Italian Renaissance probably had something to do with the fact that, even now, there is no Italian nation-state. Italians are first of all Tuscans or Sicilians, they are individualists, not Italians!

In both cases, ancient Greece and renaissance Italy, chaos and anarchy encouraged creativity, and kept it from being suppressed by the authorities.

What can we learn from this? That strong central control is bad for us. Immediate corollaries: The European Community was not a good idea. And the United States would be better off as fifty separate states. At least that’s the case if you want to maximize creativity. I’ve already said what I think of the current refereeing and grant systems.

Let me wrap up my argument. People are not machines. It is time for people to stop trying to be like machines, because we have machines for that now. We should stop worshipping the machine, and instead unleash our creative, curious, passionate, inspired, intuitive, irrational individualistic humanity.


Chapter 11

Should mathematics be done differently because of Gödel’s incompleteness theorem?

Speech on the occasion of being granted an honorary doctorate by the University of Cordoba, founded in 1613. Lecture given Monday, 23 November 2009, in Cordoba, Argentina.1

Good afternoon.

First of all, I want to thank the university authorities who are present, to thank the University of Cordoba, and to thank the Faculty of Philosophy and Humanities, for this honor which I find really moving. I consider myself an Argentinean-American, and I cannot imagine anything nicer than receiving an honorary doctorate from the oldest university in Argentina and one of the oldest in the Americas.

I’m really very moved. It’s a great pleasure for me and my wife to be here in Cordoba, especially for such a nice reason, and for us to become acquainted with this city and its intellectual and scientific traditions.

So thank you very much.

Furthermore, in spite of what has just been said here by Professor Victor Rodriguez about my achievements, I don’t think that I have accomplished very much. What I see constantly before me are the challenging questions that I have not been able to answer, the big holes in what we can understand. Very basic questions, such as whether it is possible to prove mathematically that Darwin’s theory of evolution works or that it doesn’t work; either way it would be very interesting. Or the subject that I want to talk about today, which I will now introduce for you.

1 This speech was delivered in Spanish and translated into English by the author.

I used to work as a computer programmer; I wrote computer software and did theory as a hobby. So I’m an amateur mathematician and a professional programmer. That’s how I used to earn a living.

People normally think that mathematics is a dry, serious subject where nothing dramatic ever happens. But in the past century math went through a revolution as serious as the one that took place in physics because of the theory of relativity and quantum theory. This fact is not well-known outside the math community, but it is becoming better known now.

In particular, I’m referring to a controversy over how mathematics should be done. There is a struggle for the soul of mathematics. I exaggerate a bit, but not too much. There is a struggle for the soul of mathematics between two different groups, two tendencies, two opposing viewpoints.

On one side there is the famous French mathematician Poincaré who spoke of the importance of intuition in mathematics. On the other side we have the German mathematician Hilbert who emphasized formalism and the role of the axiomatic method. The conflict is between intuition and formalism. In other words, is mathematics creative or is it mechanical? Stating it that way, I indicate my own biases.

You can see which side I am on: the romantic side. But the debate is still very much alive and I want to give you a concise history of this conflict.

About a century ago Hilbert proposed formalizing all of mathematics, dropping the use of natural language and making math into a formal axiomatic theory using an artificial language and mathematical logic. The key point is that Hilbert thought that math gives absolute certainty and that this implies that you can formalize mathematics completely in such a way that there is an algorithm, a mechanical procedure, for checking whether or not a proof is correct.

In other words, Hilbert believed that if math is objective not subjective, if it really is absolutely certain, this is equivalent to saying that there are rules of the game for carrying out proofs (if no steps are left out and we use a completely formal language) which provide us with a completely mechanical way to check if a proof is correct, that is, whether it obeys the rules. According to Hilbert, this is what it means to say that math gives absolute certainty, which is what most mathematicians believe, because math is a way of fleeing from the real world to a toy world where truth is black or white and proofs are absolutely convincing.

This is what Hilbert proposed about a century ago. And most people thought that it could actually be done, that one could formalize everything. Hilbert represented the orthodox, conservative position within the math community. People thought that it ought to be possible. In fact, some very pretty work was done trying to achieve what Hilbert had proposed, trying to fulfill his dream of formalizing mathematics completely and obtaining absolute certainty and total objectivity.2

But in 1931 and in 1936 there were two big surprises. In 1931 Kurt Gödel showed that Hilbert’s project could never work, and in 1936 Alan Turing showed this completely differently and found a deeper reason why Hilbert’s dream was unattainable.

These two pieces of work are greatly admired, but in my opinion the math community has a very ambiguous position about these two achievements. Gödel and Turing are heroes, but nobody wants to face the disturbing implications of their work.

What Gödel showed in 1931 is that Hilbert’s dream is impossible because any formalization of mathematics (any formal axiomatic system of the kind that Hilbert sought for all of math, to give absolute certainty, to show that the truth is black or white) will necessarily have to be incomplete because some true results will be missing. In other words, no finite formal axiomatic theory can give us all mathematical truths, some of them will always escape us. In fact, an infinity of true math results will be missing from any formal axiomatic theory proposed to achieve Hilbert’s dream. Formal axiomatic theories are always incomplete, they do not enable us to demonstrate all possible mathematical truths.

Gödel shows how to construct assertions which are true but cannot be demonstrated within a given formal axiomatic system. The way he does it is very surprising. He constructs a mathematical assertion (in fact, an arithmetical assertion) which states that it itself cannot be demonstrated.

“I’m unprovable!”

2 In particular, I’m thinking of Zermelo-Fraenkel set theory and of the von Neumann integers.


If you can construct an assertion that states that it’s unprovable, there are two possibilities: that it’s provable and that it isn’t. If it’s provable and it asserts that it isn’t, we’re demonstrating something that’s false, which is terrible. So by hypothesis we eliminate this possibility. If a formal axiomatic system enables us to prove things that are false, it doesn’t interest us, it’s a complete waste of time. Therefore “I’m unprovable” cannot be proved, which means that it is true.

So you either demonstrate things that are false, or there are indemonstrable truths, truths that escape us. This is the alternative that Gödel confronts us with. Assuming that the formal axiomatic system doesn’t enable you to prove things that are false, there must be true mathematical assertions that cannot be proved.
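
The two cases just described can be written compactly. What follows is a sketch in standard textbook notation, not a reproduction of the 1931 construction: the symbols T (the formal axiomatic system), Prov_T (its provability predicate) and G (the assertion “I’m unprovable!”) are introduced here for illustration only.

```latex
% Requires amssymb for \ulcorner, \urcorner and \nvdash.
% G says of itself that it is not provable in T.
T \vdash \; G \leftrightarrow \neg\,\mathrm{Prov}_T(\ulcorner G \urcorner)

% Case 1: T proves G. Then G is provable, so what G asserts is false,
% and T has demonstrated a falsehood.
T \vdash G \;\Longrightarrow\; G \text{ is false}

% Case 2: T proves only true statements. Case 1 is then excluded,
% so G is unprovable, which is exactly what G asserts.
T \text{ proves only truths} \;\Longrightarrow\; T \nvdash G \;\Longrightarrow\; G \text{ is true}
```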

Gödel’s incompleteness theorem was a big surprise at the time, and while not provoking panic, it did lead to some rather emotional reactions, for example, from Hermann Weyl. Weyl said that his faith in pure mathematics was badly affected, and that at first it was difficult for him to continue with his research. And Weyl was a very fine mathematician.

Now my story splits in two. On the one hand, there is more research on incompleteness, on Gödel’s remarkable discovery. On the other hand, the math community begins to lose interest in these philosophical questions and continues with its everyday work.

First I’ll tell you about Turing.

In 1936 Turing goes beyond Gödel and finds a much deeper reason for incompleteness. But I should emphasize that pioneering work is always the most difficult. Before Gödel nobody was courageous enough to imagine that Hilbert might be wrong. Turing found a deeper reason for incompleteness.

Turing discovered that there are many things in mathematics that can be defined but for which there is no mechanical procedure, no algorithm, for calculating them; they are not computable functions. Math is full of things that can be defined but cannot be calculated. And uncomputability is a new source of incompleteness.

If we consider a mathematical question such as Turing’s famous halting problem for which there is no general method for calculating the answer, we get the immediate corollary that there cannot be a formal axiomatic theory that always enables us to prove what the answer is.

Why not?

One of the most basic properties of a formal axiomatic theory is that in principle there is a mechanical procedure for systematically traversing the tree of all possible proofs and eliminating the ones that are incorrect. It would be very slow, but in principle it would enable us to find all the theorems. So if we have a theory that enables us to demonstrate in individual cases whether or not a program eventually halts, this would give us a mechanical procedure, an algorithm, that always gives the correct answer, which Turing showed in 1936 is impossible.
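
The shape of this argument can be sketched in a few lines of Python. Everything in the sketch is a hypothetical toy: the "theory" is a lookup table in which exactly two bit strings count as valid proofs, and the names `TOY_PROOFS`, `check_proof`, `all_candidate_proofs` and `decide_halting` are invented for illustration. The point is only the structure: enumerate every candidate proof in order of size, discard the invalid ones, and stop at the first theorem that settles the program in question.

```python
from itertools import count, product
from typing import Iterator, Optional, Tuple

Theorem = Tuple[str, str]  # (program name, "halts" or "never halts")

# Hypothetical toy theory: only these two bit strings are valid proofs.
TOY_PROOFS = {
    "01": ("count_to_ten", "halts"),
    "110": ("loop_forever", "never halts"),
}


def check_proof(candidate: str) -> Optional[Theorem]:
    """Mechanical proof checker: return the theorem proved, or None if invalid."""
    return TOY_PROOFS.get(candidate)


def all_candidate_proofs() -> Iterator[str]:
    """Enumerate every bit string, shortest first (the 'tree of all proofs')."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def decide_halting(program: str) -> bool:
    """Search all proofs until one settles whether `program` halts.

    This terminates only if the theory settles the question for `program`.
    A theory that settled it for every program would make this function a
    general halting decider, which Turing showed cannot exist.
    """
    for candidate in all_candidate_proofs():
        theorem = check_proof(candidate)
        if theorem is not None and theorem[0] == program:
            return theorem[1] == "halts"
    raise AssertionError("unreachable: the enumeration is infinite")


if __name__ == "__main__":
    print(decide_halting("count_to_ten"))
    print(decide_halting("loop_forever"))
```

For a program the toy theory says nothing about, `decide_halting` searches forever. That is the incompleteness in miniature: since no algorithm decides halting in general, any real theory must leave some programs in this position.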

So Turing deduces Gödel incompleteness from a more fundamental idea, uncomputability, which is the fact that math is full of things that can be defined but cannot be calculated.

Now World War II begins and the generation that was interested in these philosophical questions disappears from the scene. The math community goes forward forgetting the crisis that was provoked by Gödel’s theorem which had been such a big surprise.

My problem is that I didn’t go forward. I remained obsessed with Gödel’s theorem. I thought it had to be very important. I bet my professional career on the idea that it was a mistake to ignore Gödel’s result.

What the math community did, since they are mathematicians and not philosophers, is to continue with their daily work, with the problems that interested them. The consensus was that yes in theory there are limits to what can be demonstrated using any particular formal axiomatic theory, but not in practice, not with the kinds of questions that interest us, not in our own particular field. This was more or less the community’s reaction.

In other words, while there may be mathematical facts that are true but unprovable, these are highly artificial pathological cases. The consensus was that in practice this does not occur. At least that is what mathematicians preferred to think in order to be able to carry on with their work.

People have an amazing ability to avoid thinking about unpleasant subjects such as death. If we think about death all the time it is impossible to function. And if mathematicians think all the time about incompleteness they can’t function either, since there will always be doubt about whether the matter at hand can be settled by means of a proof. Why am I wasting years of my life trying to prove something if there may not even be a proof?

Let’s consider an alternative course of action. Instead of ignoring Gödel’s theorem, what if we take it very seriously? I don’t believe in going to extremes, but if one took Gödel’s result very, very seriously, how might one proceed? Consider the Riemann hypothesis. This is an important mathematical conjecture that has a lot of significant consequences. But unfortunately in a hundred and fifty years of effort nobody has succeeded in proving the Riemann hypothesis. Mathematicians don’t know what to do; the way forward is blocked. But physicists would just consider the Riemann hypothesis to be a mathematical fact that has been corroborated empirically.

In other words, I think that a possible reaction to Gödel’s result is to make math a little bit more like theoretical physics. In physics axioms don’t have to be self-evident. Maxwell’s equations and the Schrödinger equation are not self-evident but they help us to organize, to unify a large body of experimental data.

One could do mathematics in a similar fashion, taking Gödel as justification for behaving as if math were an empirical science in which one doesn’t try to demonstrate everything from self-evident principles, but instead one only seeks to organize mathematical experience like physicists organize their physics lab experience. One could proceed pragmatically and adopt unproven hypotheses as new basic principles because they are extremely fruitful and have many useful consequences even though they aren’t at all self-evident. This is what I think we should do if we take Gödel’s theorem seriously.

In my opinion mathematics is different from physics, but maybe not as different as most people think. My work on metamathematics using complexity and information-theoretic ideas suggests to me that perhaps we should emphasize the similarities between the world of mathematics and the world of physics instead of emphasizing the differences.

In this connection, there is a highly pertinent remark by the Russian mathematician Vladimir Arnold. In his opinion the only difference between mathematics and physics is that in mathematics the experiments are cheaper, since one can carry them out on a computer instead of having to have a laboratory full of expensive equipment! So math experiments are easier than physics experiments.
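
As an illustration of how cheap such an experiment can be, here is a small Python sketch that gathers empirical evidence for the Riemann hypothesis. It relies on Lagarias’s elementary criterion, a known equivalent of the hypothesis: for every n ≥ 1, the sum of the divisors of n is at most H_n + exp(H_n) ln(H_n), where H_n is the n-th harmonic number. The choice of this criterion, the function names and the search limit are assumptions made for illustration; none of them comes from the speech.

```python
import math
from typing import List


def sum_of_divisors_table(limit: int) -> List[int]:
    """Return a table where entry n is sigma(n), the sum of the divisors of n."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def lagarias_counterexamples(limit: int) -> List[int]:
    """List every n <= limit that violates sigma(n) <= H_n + exp(H_n) * ln(H_n).

    A single genuine violation would disprove the Riemann hypothesis.
    The small tolerance guards against floating-point rounding at n = 1,
    where the two sides are equal.
    """
    sigma = sum_of_divisors_table(limit)
    failures = []
    harmonic = 0.0
    for n in range(1, limit + 1):
        harmonic += 1.0 / n
        bound = harmonic + math.exp(harmonic) * math.log(harmonic)
        if sigma[n] > bound + 1e-9:
            failures.append(n)
    return failures


if __name__ == "__main__":
    # An empty list means no counterexample was found in the range searched.
    print(lagarias_counterexamples(5000))
```

An empty result proves nothing about larger n. It is corroboration of the kind a physicist would accept as evidence and a mathematician would not accept as proof.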

How do I try to justify this new “quasi-empirical” view of mathematics? Well, like most mathematicians, I do in fact believe in the Platonic world of math ideas in which the truth is totally black or white. But I also believe that we are denied direct access to this Platonic world and that down here at our level it may be helpful to work a bit more quasi-empirically.

It may look like my mixed, hybrid, Platonic-empiricist position is inconsistent, but I don’t think that this is actually the case. Indeed, it is sometimes very fruitful to take ideas that seem to be inconsistent and show that in fact they aren’t.

Okay, so where do I find arguments in favor of this quasi-empirical view of mathematics? The key question is whether the incompleteness phenomenon that was discovered by Gödel and further explored by Turing is exceptional or widespread. How pervasive is incompleteness? That’s the basic question, and it is quite controversial.

My contribution to this discussion is that I’ve found tools for measuring the complexity or the information content of a formal axiomatic mathematical theory. And by using the concept of complexity in algorithmic information theory one can see that incompleteness is natural, not surprising. In fact, it’s inevitable, it’s unavoidable.

Using algorithmic information theory, one can see that the world of mathematical truths, the Platonic world of mathematical ideas, is infinitely complex. But any formal axiomatic system made by human beings necessarily has only finite complexity. Indeed, rather low complexity, since the axioms and rules of inference normally fit on a couple of pages.

So seen from this perspective, incompleteness is natural, inevitable. The world of mathematical ideas is infinitely complex, but our theories only have low, finite complexity; otherwise they wouldn’t fit in a mathematician’s brain nor would they be regarded as self-evident. But I’m against the idea that in mathematics axioms have to be self-evident, because in physics self-evidence of axioms is not required.

I’ve used complexity and information theory to argue that since the amount of information in pure mathematics is infinite, incompleteness is only to be expected, since a formal axiomatic theory can capture at most a finite amount of this mathematical information, an infinitesimal portion in fact.
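
One standard way to make this precise is sketched below, in notation that does not appear in the speech: H(s) is the program-size complexity of a bit string s, H(T) is the complexity of a formal axiomatic theory T (the size in bits of the smallest program that enumerates its theorems), and c is a constant that depends only on the choice of programming language. All three symbols are assumptions introduced for illustration.

```latex
% Requires amssymb for \nvdash.
% If T has complexity N and proves only true statements, then T cannot
% prove that any specific string s has complexity greater than N + c,
H(T) = N \;\Longrightarrow\; T \nvdash \text{``} H(s) > N + c \text{''}
  \quad \text{for every individual string } s,

% even though all but finitely many strings do have such high complexity,
% because there are fewer than 2^{N+c+1} programs of at most N + c bits.
\#\{\, s : H(s) \le N + c \,\} < 2^{\,N + c + 1}
```

Infinitely many true statements of the form H(s) > N + c therefore lie beyond the reach of T, which is the sense in which a theory of finite complexity captures only a finite amount of mathematical information.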

This more or less summarizes an entire lifetime of research. But you will not be surprised to learn that the mathematics community has not accepted my quasi-empirical proposal. The immune system of an intellectual community is very strong, and my ideas are rejected as foreign, as alien to the math community.

Logicians don’t care much for computability, for complexity, for information and for randomness. Randomness is a nightmare for a logician, because randomness is irrational. Random events happen for no reason, they are incomprehensible from a logical point of view.

However the physics community has some interest in my work. They like the idea of using a physics-inspired approach in pure mathematics. They like the idea that math isn’t that different from physics. They like the idea that a mathematics proof may be more convincing than the heuristic arguments that are accepted in physics, but that this is only a matter of degree, not an absolute black or white difference. They have always felt that mathematicians believe too much in absolute truth, and do not appreciate theoretical physics enough.

But the coin is two-sided, and the conflict between intuition and formalism has become much more acute because of the computer. Computer technology is a powerful argument against creativity and in favor of mechanization and formalization.

Just take a look at the December 2008 issue of the Notices of the American Mathematical Society which you can find for free on the web. This is a special issue devoted to formal proof. While I, a poor theoretician, have been trying to convince mathematicians to pay attention to Gödel’s theorem and work slightly differently, these people (I didn’t realize what was happening until they did it) have nearly succeeded in carrying out Hilbert’s dream.

They’ve constructed tools for formalizing almost all of mathematics. They’ve done a superb piece of software engineering.

This community, which is a group of fine mathematicians and software engineers, believes that in the future all mathematical proofs should be formal proofs. In their opinion, there will soon be no reason for accepting informal proofs. We can start demanding formal proofs and re-writing all of mathematics in a formal language so that it can be checked by verification software.

There are now interactive proof checkers for verifying mathematical proofs. This is how these work: If I’m a mathematician and I have an informal proof that I want to formalize, I give it to the proof checker. It will say, “Well, there’s a particular step in this proof that I don’t understand yet. Can you please explain this better?” And you keep filling in the proof, providing more details, until the software says, “Now I understand everything. It’s all fine. I have a complete formal proof.”

You didn’t have to write all the steps in the formal proof yourself; that would be a big job. You write part of it, and the software provides the rest. The final result of this joint effort is a complete formal proof that has been checked and verified by reliable software, software that you trust because it was carefully developed using the best available software engineering methodology.
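
To give a feeling for what a machine-checked proof looks like, here is a tiny sketch in Lean 4. The choice of Lean is an assumption made for illustration; the speech does not name a particular proof checker, and real formalizations are vastly larger than these two toy statements.

```lean
-- A statement about natural numbers, justified by citing a library lemma.
-- The checker accepts it only if the cited lemma proves exactly this claim.
example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A small logical fact proved from scratch: from "p and q" conclude "q and p".
-- The human supplies the two components; the checker verifies that they fit.
example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩
```

If either proof term were wrong, for instance with the two components swapped in the second example, the checker would reject it and point at the offending step, which is the dialogue described above.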

And this verification technology has advanced to the point where you don’t just verify toy proofs, you can verify complicated proofs of really important theorems, for example the four color theorem, which states that four colors suffice for coloring maps without having neighboring countries with the same color.

This was a rather complicated proof that not only was formalized, but the mathematician who did it did not complain and even stated that going through this process enabled him to substantially improve the proof. So this formal proof business is getting really serious.

Hilbert never thought that mathematicians should be required to use detailed formal proofs in their daily work. But this community does. Furthermore, they envision an official repository for formal proofs that have been put through this verification process. Proofs will have to be accepted by this repository to be used by the mathematics community; everything that has been formalized and checked will be there, in one place.

So amazingly enough, the lines of research opened up by Hilbert’s formalization proposal and by Gödel’s work on the limitations of formal systems are both progressing dramatically. I think there is a wonderful intellectual tension between the work advancing formalization and the one criticizing it. Both of these lines of research are going forward splendidly in parallel!

In mathematics this circumstance is striking because one thinks that the truth is black or white. But in philosophy this situation doesn’t seem so strange because philosophers understand that ideas that seem contradictory are often in fact complementary.

I won’t try to predict the final outcome of this conflict; probably there will be no final outcome. In philosophy there are no final answers. Each generation does its best to resolve the fundamental questions to their own satisfaction, and then the next generation goes off in a different direction.

So I won’t try to predict the future. I don’t know if mathematicians will eventually think that incompleteness implies that they should do math differently, or if formalization will win.

Perhaps we don’t have to choose between quasi-empiricism and formalization. Both of these approaches can contribute something to mathematics and to mathematical practice.

My late friend the mathematician Gian-Carlo Rota, whose provocative ideas I greatly enjoy, has bequeathed us a collection of his essays entitled Indiscrete Thoughts. He thinks that formal axiomatization is a cemetery.

When a theory is completely finished, then you can formalize it. But when you are creating a new theory, you have to work with vague intuitions, with imprecise ideas, and formalization is deadly. Premature formalization stifles creativity; once a theory is formalized it becomes stiff and rigid and no new ideas can get in.


So I think that quasi-empiricism and formalism can both contribute some-thing of value. Furthermore, both are advancing step by step.

In 1974 I proposed accepting new math axioms the way that this is donein physics,3 and nobody took me seriously, but in the past thirty-five yearsthis has actually happened.

It has happened in set theory, where there’s a new axiom called “projec-tive determinacy.” It has happened in theoretical computer science, whereyou use the hypothesis that P is not equal NP, which everyone believesbut nobody can prove. And it has happened in mathematical cryptogra-phy, which is based on the assumption that you can’t factorize big numbersquickly.

In these fields mathematicians are behaving as if they were physicists.They’ve found new principles that enable them to organize the experiencesof each of these communities. These are principles that are not self-evident,that have not been demonstrated, but that are accepted by consensus as newfundamental principles, at least until they are disproven or counter-examplesare encountered.

Each of these mathematical communities is behaving as if they were theoretical physicists; they are doing what I call quasi-empirical mathematics. So I’ve been delighted to witness these developments, but not so delighted to see the striking advance of formalism in recent years.

These questions are still open, and they are very difficult ones. I’ve tried to argue in favor of a quasi-empirical stance, in favor of creativity and against formalism, but I myself am not completely convinced by my own arguments. More work is needed. We still do not know to what extent math is mechanical or creative.

Thank you very much!

3 “Information-theoretic limitations of formal systems,” J. ACM 21, 1974, pp. 403–424.

Bibliography

1. David Berlinski, The Devil’s Delusion: Atheism and its Scientific Pretensions

2. Stephen Jay Gould, Wonderful Life: The Burgess Shale and the Nature of History

3. Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body

4. Melanie Mitchell, Complexity: A Guided Tour

5. Jerry Fodor and Massimo Piattelli-Palmarini, What Darwin Got Wrong

6. Stephen C. Meyer, Signature in the Cell: DNA and the Evidence for Intelligent Design



Books by Chaitin

• Algorithmic Information Theory, Cambridge University Press, 1987.

• Information, Randomness and Incompleteness: Papers on Algorithmic Information Theory, World Scientific, 1987, 2nd edition, 1990.

• Information-Theoretic Incompleteness, World Scientific, 1992.

• The Limits of Mathematics: A Course on Information Theory and the Limits of Formal Reasoning, Springer, 1998. Also in Japanese.

• The Unknowable, Springer, 1999. Also in Japanese.

• Exploring Randomness, Springer, 2001.

• Conversations with a Mathematician: Math, Art, Science and the Limits of Reason, Springer, 2002. Also in Portuguese and Japanese.

• From Philosophy to Program Size: Key Ideas and Methods. Lecture Notes on Algorithmic Information Theory from the 8th Estonian Winter School in Computer Science, EWSCS ’03, Tallinn Institute of Cybernetics, 2003.

• Meta Math! The Quest for Omega, Pantheon, 2005. Also UK, French, Italian, Portuguese, Japanese and Greek editions.

• Teoria algoritmica della complessita, Giappichelli, 2006.

• Thinking about Gödel and Turing: Essays on Complexity, 1970–2007, World Scientific, 2007.

• Mathematics, Complexity & Philosophy: Lectures in Canada and Argentina, Midas, in press. This is an English/Spanish bilingual edition.



• G. Chaitin, N. da Costa, F. A. Doria, After Gödel: Exploits into an undecidable world, in preparation.