Some consequences of recursion in human affairs

MANAGEMENT


G.G. Scarrott, F.Eng., F.B.C.S., M.I.E.E.

Indexing terms: Engineering administration and management, Information and communication theory, Computer applications, Operational research

Abstract: Meaningful sequences of symbols in natural languages, and also in the most useful artificial computer languages, are constructed following rules whose applicability is independent of the scale of the construct. Thus, for example, a single adjective, an adjectival clause, a sentence or a complete paper can all be used in the same way as a qualifier, and similarly for other basic grammatical components. This scale-independent assembly technique has been named 'recursive' by linguistic theorists. The same recursive technique is used in the reproduction and growth of many living organisms and, indeed, in the growth and operation of organised human society, which can be regarded as a living organism in this general sense. The scale independence of such recursively defined organised systems creates statistical correlations at every scale; these can account for empirical statistical distributions such as Zipf's law for the use of words and Pareto's law for the distribution of income. The large-scale correlations also invalidate the 'ergodic process' as a model of a generator of meaningful text, so that widely used measures of information technology such as 'cost per bit' are of questionable value.

So naturalists observe a flea
Hath smaller fleas that on him prey;
And these have smaller fleas to bite 'em,
And so proceed ad infinitum.
Thus every poet, in his kind,
Is bit by him that comes behind.

Jonathan Swift 1667-1745

1 Introduction

A proper objective of an engineer working in a new field is to help the practice of his art to evolve into a mature branch of engineering, supported by a body of valid and relevant theory.

Accordingly, the considerations and speculations that resulted in this paper were originally an ambitious and perhaps foolhardy attempt to distil, from experience of computer design and use, an understanding of the natural structures and behaviour of 'information', to guide the deployment of information technology more effectively to meet human needs.

It quickly became clear that to make any progress at all it was necessary for the information engineer to trample over fields such as anthropology and sociology, which are remote from the normal competence of an engineer, so that some of the assumptions and generalisations may need critical examination by professionals in such fields.

In the event, only limited progress has been made on the primary objective, but the endeavour to create sound conceptual foundations for information engineering has shed a somewhat unexpected light on human social-organisation techniques and some of their consequences, so it is hoped that the paper will be of interest to an audience that is not restricted to computer professionals.

The classic paper by Shannon and Weaver that initiated the analysis widely known as 'information theory' was originally entitled 'The mathematical theory of communication' [11]. The authors made very clear that they distinguished between communicable information and meaningful information, and they made no claim to have formulated a valid model of a source of meaningful information. Unfortunately, the wide use of the term 'information theory' has concealed this situation. Despite the ubiquity of computers in business and government, there is still no theory of meaningful information. Consequently, common measures of information technology such as storage capacity and processing power cannot easily be interpreted in users' terms.

Paper 1681 A, first received 20th May and in revised form 25th September 1981. Mr Scarrott was formerly with International Computers Limited, Research & Advanced Development Centre, Fairview Road, Stevenage, Herts. SG1 2DX, and is now retired at 34 Parkway, Welwyn Garden City, Herts. AL8 6HQ, England

At first sight, computer languages and programs appear to be arbitrary and artificial but, in fact, they are shaped by the same human information-handling habits that have shaped natural languages. Natural languages in turn have evolved to provide a vehicle for the traffic in meaningful information that bonds human society. Hence, by studying natural languages and human organisational practices together, each can shed light on the other. We can thus gain insight into the natural requirements for information-handling aids and the intrinsic utility of information devices. We can also gain insight into broader issues such as the natural structures and statistical properties of human society.

Pioneers of linguistics, e.g. Chomsky, have long recognised that in every natural language meaningful information of unlimited complexity can be conveyed by a few thousand word symbols and quite simple combinatorial rules used recursively. Mankind uses the same technique to effect complex social cooperation, following quite simple rules for making and meeting commitments, also used recursively. Thus recursion should be regarded as a fundamental element both in human affairs generally and in the information that brings them to life. Accordingly, this paper presents arguments and supporting evidence for this proposition, together with some of its consequences.

2 Recursion in nature

Much of our present understanding of the natural world has been based on the explicit recognition that nature is neither totally predictable nor totally random, so that any useful model of the natural world must embody the interplay of order and disorder.

Thus disorder, in the sense of intrinsic unpredictability in detail, is an essential feature of the inanimate world, e.g. in thermodynamics and radioactivity. It also characterises the chance mutations and disordered competition that are essential features of the Darwinian evolution of living organisms.

Likewise, the ultimate source of geometrical order in both living and dead matter is the chemical bond, which enforces an identical form of local order in each case. If, however, we consider large-scale order, we find a qualitative contrast. An inanimate object, such as a crystal, shows comparatively simple order over a limited distance; a typical living organism shows complex order at every scale, from protein molecules to a 200 ft tree. A plausible explanation of the contrast is that the 'order' of a living organism includes recursively defined structures. These account for many of the familiar features of living organisms and do not occur in inanimate matter. The essence of the general concept of 'recursion' is exposed if we consider how we define our understanding of the word 'organisation', e.g. as used in the term 'organised structure'. Our specifically human skill is the creation and operation of organised assemblies of people. To maintain such an organisation, the people concerned interchange 'information', represented in natural language by organised strings of semantic symbols, i.e. words.

0143-702X/82/010066 +10 $01.50/0   IEE PROC., Vol. 129, Pt. A, No. 1, JANUARY 1982

A typical organised structure, either of people or of words, can be analysed to reveal components that are themselves organised structures. We are therefore forced to define, formally, the meaning of the term 'organised structure' by a statement that applies to all but the simplest constructs:

'organised structure' = an organised assembly of 'organised structures'

In this statement, the interpretation of the word 'organised' in 'an organised assembly' is unavoidably subjective, as one man's 'organisation' is another man's 'chaos'. However, if we consider structures on a large enough scale, the consequential consensus interpretation by a large number of people is well defined.

The other essential property of the definition of an 'organised structure' is that the object class being defined also appears in the definition. The definition is therefore of a class known as a 'recursive definition', and this paper is concerned with the consequences of that recursive property. Computer programmers are already familiar with the concept of an iterative computation process arising from a recursive definition of a function. However, this familiarity with the mechanics of recursively defined computation has concealed the ubiquity of recursively defined structures in nature, by presenting recursion as no more than a programmer's trick. A useful way to grasp the broad significance of recursion in nature is to consider the structure and growth technique of a tree. A typical branch puts forth twigs, some of which in due course become branches with the same power to initiate twigs, and so on. Thus, formally, a 'branch' must be defined as something that grows out of a 'branch': a recursive definition.
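Programmers will recognise the definition of a 'branch' as a self-referential type. The following is a minimal Python sketch. For simplicity it assumes that every branch puts forth the same number of twigs (`fanout`, a hypothetical parameter), and the names `Branch`, `grow` and `count_branches` are illustrative only:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Branch:
    """A branch is something that grows out of a branch: the type refers to itself."""
    twigs: list[Branch] = field(default_factory=list)


def grow(depth: int, fanout: int = 2) -> Branch:
    """Grow a branch recursively; every twig is itself a branch with the same power to grow."""
    if depth == 0:
        return Branch()  # the simplest construct: a twig with nothing growing from it
    return Branch([grow(depth - 1, fanout) for _ in range(fanout)])


def count_branches(branch: Branch) -> int:
    """Count every branch in the structure by applying the same rule at every scale."""
    return 1 + sum(count_branches(t) for t in branch.twigs)


print(count_branches(grow(3)))  # 1 + 2 + 4 + 8 = 15
```

The same few lines of 'control information' generate a structure of any size. Only the `depth` argument changes.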

The recursively defined growth process enables the tree to take full advantage of the chance combinations of light and soil nutrient where it happens to be growing. It also provides massive redundancy, which confers decisive survival value. Moreover, all this requires only an elementary technique for replicating the information that controls growth. The complete growth technique can therefore be regarded, essentially, as a method of creating large-scale order by the effective use of a manageable quantity of control information.

Every living organism has the same problem, because large-scale order is an essential feature of life. Hence, regardless of the ultimate chemical mechanisms for storing and reproducing information, one may reasonably speculate that the recursive technique for building large living organisms with adequate redundancy must have been adopted by necessity at an early stage in the evolution of life. The growth of trees is a familiar example of this technique. However, the information-handling mechanisms in living organisms are slightly imperfect, so that some of their structure arises by chance.

We therefore find that structures characterised by a combination of recursively defined order with a measure of disorder occur in all living things. The 'order' is necessary for survival in a static environment, and the 'disorder' is an essential component of the Darwinian adaptation to a changing environment. Certainly, such structures can be observed in many living organisms.

Our human speciality is dynamically adaptive social co-operation, so that human social organisations can legitimately be regarded as living organisms in this general sense. It is therefore no accident that the recursively defined hierarchical structures so common in human affairs are widely known as 'tree structures', and that their interactions and changes constitute the framework of history. These structures are built by the recursive technique and bonded by large-scale interchange of information in the form of commitments between individuals and groups. We may reasonably speculate that natural language evolved primarily to meet the need to interchange such commitments. It thereby enabled the flexible social co-operation that has, in the event, accounted for the survival and present dominance of mankind.

We are therefore not surprised to find that natural language is also characterised by recursion, since we use the same technique to make an organised string of words that we have evolved to organise ourselves.

The recursive technique in the use of natural language has been adopted, quite naturally, in artificial programming languages, no doubt for the same reasons. One may conclude that recursion should be regarded as quite fundamental in living organisms generally, in organised human affairs, and in the information that bonds and operates human society.

The concept of recursion has been well illustrated by Mandelbrot in his essay 'Fractals, form, chance and dimension' [1]. He refers to a great variety of recursively defined geometrical objects. He even offers an object lesson in the recursive use of natural language by marking the digressions and subdigressions in his text. An example of such an object is the 'snake' (Fig. 1), which illustrates how, within the confines of the recursively defined construction rules, ever more elaborate objects can be drawn indefinitely. The diagram also illustrates a curious property of a recursively defined object, termed 'self-similarity' by Mandelbrot: it has no sense of scale, because at any magnification it will always look the same. There are obvious limits to the range of the self-similarity property in the case of an object such as a physical tree, since the tree must exist in only three dimensions. However, a recursively defined abstract object, such as the structure of society or the information that bonds society, is not subject to such a physical limitation. In that case the self-similarity property can apply over a wide range of scales spanning several decades.

Fig. 1 A 'snake'
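The construction rules for the 'snake' itself are not given here. As an assumed stand-in, the following Python sketch uses the classic Koch curve, a different but similarly recursive construction. One rule ('replace a segment by four segments one third as long') is applied at every scale. The function name `koch` and its arguments are illustrative:

```python
import math


def koch(p0, p1, depth):
    """Vertices of a Koch curve from p0 towards p1 (p1 itself excluded).

    Each call applies the same rule: replace a segment by four segments one
    third as long, the middle two forming an equilateral bump.
    """
    if depth == 0:
        return [p0]
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = (x1 - x0) / 3, (y1 - y0) / 3
    a = (x0 + dx, y0 + dy)          # one third of the way along
    b = (x0 + 2 * dx, y0 + 2 * dy)  # two thirds of the way along
    cos60, sin60 = math.cos(math.pi / 3), math.sin(math.pi / 3)
    c = (a[0] + dx * cos60 - dy * sin60,  # apex: middle third rotated by +60 degrees
         a[1] + dx * sin60 + dy * cos60)
    points = []
    for q0, q1 in ((p0, a), (a, c), (c, b), (b, p1)):
        points += koch(q0, q1, depth - 1)
    return points


curve = koch((0.0, 0.0), (1.0, 0.0), 3) + [(1.0, 0.0)]  # append the end point for drawing
```

Self-similarity shows up directly in the output. The first quarter of the depth-d curve is the whole depth-(d-1) curve reduced to one third of its size. Each extra level multiplies the number of segments by 4 and the total length by 4/3, so ever more elaborate versions can be drawn indefinitely.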

The 'snake' is highly regular as well as recursive, but there is no need for this to be so. Mandelbrot's use of the word 'chance' in the title of his essay refers to recursively defined structures that still have no sense of scale despite incorporating random elements. Such a random recursive structure could, for example, be constructed by playing a game akin to Patience [16]. In this game each move, chosen at random from a short list of possible moves, causes part of the diagram to be elaborated by a technique analogous to that illustrated in the 'snake'. As Mandelbrot clearly recognised (Reference 1, Chap. 10), the resulting structure combines recursive order with a measure of disorder. It therefore offers a convincing model of large-scale human society and of the information that bonds and operates it.
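The following sketch shows a much simpler analogue of such a game. It is offered only as an illustrative assumption and is not Mandelbrot's construction. The recursive rule is fixed (a branch is a list of branches), but the number of twigs at each step is chosen at random. The names `grow_random`, `size` and `height`, and the limit `max_twigs`, are hypothetical:

```python
import random


def grow_random(rng: random.Random, depth: int, max_twigs: int = 3) -> list:
    """A branch is a list of branches; the number of twigs at each step is chosen at random."""
    if depth == 0:
        return []
    return [grow_random(rng, depth - 1, max_twigs)
            for _ in range(rng.randint(0, max_twigs))]


def size(branch: list) -> int:
    """Total number of branches, counted by the same rule at every scale."""
    return 1 + sum(size(t) for t in branch)


def height(branch: list) -> int:
    """Number of levels from this branch down to its furthest twig."""
    return 1 + max((height(t) for t in branch), default=0)


tree = grow_random(random.Random(1), depth=6)
print(size(tree), height(tree))
```

Each run gives a different shape, yet every shape obeys the same recursive rule. This is the combination of order and disorder described above.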

Fig. 2 Wealthy US citizens 1961 (rank/size plot on logarithmic axes)

3 Statistical properties of an organisation

From a structural description of an organisation it is possible to deduce statistical properties, but the statistical properties do not define the organisational structure. Nevertheless, statistical properties are relevant and valuable, because they can easily be measured and they restrict possible structures to those with the measured statistical properties. An empirical statistical law that refers equally to organisational structures and to information has been known for a long time. A typical example of such a relationship is shown in Fig. 2, in which the points represent wealthy American citizens tabulated in the American almanac 1973 [2], Table 551. On this graph, both horizontal and vertical scales are logarithmic; the horizontal scale represents rank and the vertical scale represents size. The rank is derived simply by putting the sizes in descending order, so that rank 1 is the largest, rank 2 the second largest, and so on. The diagram is therefore a series of points whose co-ordinates refer to the rank and wealth of each individual.
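The ranking procedure just described can be sketched in a few lines of Python. The function names are illustrative, and sizes are assumed to be positive so that their logarithms exist:

```python
import math


def rank_size(sizes):
    """Return (rank, size) pairs with sizes in descending order, so that rank 1 is the largest."""
    return list(enumerate(sorted(sizes, reverse=True), start=1))


def log_log_points(sizes):
    """Co-ordinates for double-logarithmic axes: (log10 rank, log10 size); sizes must be positive."""
    return [(math.log10(r), math.log10(s)) for r, s in rank_size(sizes)]


print(rank_size([3, 10, 1]))  # [(1, 10), (2, 3), (3, 1)]
```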

Fig. 3 shows a rank/frequency diagram for the use of words in English. Zipf [3] and others studied many such diagrams and noticed that some of them approximate to a straight line, implying a hyperbolic distribution, as shown in these examples. This relationship, sometimes called the rank/size rule, has been known for quite a long time. It was first pointed out by J.B. Estoup [4] in 1916, followed by S.C. Bradford [5] in 1934. G.K. Zipf made a comprehensive study of such distributions, so 'Zipf's Law' is probably the name by which the relationship is most widely known. More recently it has been observed that the use patterns of computer instructions, routines, operating system chapters and even magnetic tapes follow Zipf's law.
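A hyperbolic law, size proportional to 1/rank, plots as a straight line of slope -1 on double-logarithmic paper. More generally, size proportional to rank^(-a) gives slope -a. As a sketch, the slope could be estimated by an ordinary least-squares fit of log10(size) against log10(rank). The function name is hypothetical, and the paper does not say how its own slopes were fitted. In practice one would fit only the straight section of a plot, e.g. the Pareto portion of Fig. 4:

```python
import math


def log_log_slope(sizes):
    """Least-squares slope of log10(size) against log10(rank); needs at least two positive sizes."""
    ranked = sorted(sizes, reverse=True)
    xs = [math.log10(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log10(s) for s in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


zipf_like = [1000 / r for r in range(1, 101)]  # an exactly hyperbolic toy example
print(log_log_slope(zipf_like))  # slope of -1, up to rounding
```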

Fig. 3 Word frequency (relative frequency against word order on logarithmic axes; the highest-ranked word is 'the')

Fig. 4 Measured distribution (Whitaker's almanac): UK income rank/size plot on logarithmic axes. It shows a straight section following Pareto's law, a region following the Whitworth distribution, and Whitworth distributions of £27 200 million over 22.1 million people and of £34 000 million over 30 million people

Fig. 4 shows a rank/size curve for the distribution of income in the UK for 1968-69 [6]. The large, almost straight line segment represents a remarkable form of income distribution, first published by Vilfredo Pareto in 1897 and known as Pareto's Law.

Many attempts have been made to explain Zipf's and Pareto's Laws. Fairthorne [7] discussed the matter at length in 1969 and recognised clearly that self-similarity leads to a hyperbolic distribution. However, he glossed over the essence of the matter by concluding that 'the hyperbolic distributions (of which Zipf's and Pareto's Laws are examples) are the inevitable result of combinatorial necessity'. Zipf's explanation is unconvincing because he did not state his proposed 'Principle of least effort' clearly enough to show how it leads to the observed distribution. Mandelbrot's first explanation, in terms of the statistical pattern of letter usage, can be criticised as a circular argument. His later explanation, as a consequence of the self-similarity of recursively defined structures, is the basis of the argument presented in this paper.

3.1 Whitworth distribution

Evidently, the first question one must ask about Zipf's Law is whether the evidence demands an explanation at all. Several authors have pointed out that, armed with logarithmic paper and a lively imagination, it is possible to prove almost anything. Some light may be cast on this problem by plotting a rank/size diagram for the pieces that result if a unit whole is chopped up at random. This problem was tackled by Whitworth [8]. The rank/size diagram for random numbers generated in this way to have a defined sum turns out to be the curve illustrated in Fig. 5 (see also Appendix 8.1). Moreover, although the chopping up is random, putting the pieces in rank order makes the size distribution for most of the ranked pieces remarkably well defined. The diagrams show that the Whitworth distribution fits the frequency distribution of individual letters in English and Polish quite well (Figs. 5 and 6). The frequencies of specific letters in the two languages are substantially different. Their rank/size diagrams, however, are remarkably similar and in close accord with the Whitworth distribution, a fact first pointed out by I.J. Good [9] in the early 1960s. This is not at all surprising, as the statistics of letter usage would be expected to be dominated by the chance accidents of spelling conventions, and the Whitworth distribution confirms this hypothesis.
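The following sketch uses the standard 'broken stick' result for a whole cut at n - 1 independent uniformly random points. The expected size of the r-th largest of the n pieces is (total / n) multiplied by the sum of 1/k for k = r to n. It is assumed here that this is the form of the Whitworth distribution intended (see Appendix 8.1). The function names are illustrative, and `chop` checks the formula by simulation:

```python
import random


def whitworth(n, total=1.0):
    """Expected size of the r-th largest piece (r = 1..n) when `total` is cut at n - 1 uniform random points."""
    # E[r-th largest] = (total / n) * sum_{k=r}^{n} 1/k; accumulate the tail sums from k = n downwards.
    tail, expected = 0.0, []
    for k in range(n, 0, -1):
        tail += 1.0 / k
        expected.append(total * tail / n)
    return expected[::-1]  # reorder so that index 0 is rank 1


def chop(rng, n, total=1.0):
    """One random chopping of `total` into n pieces, returned in descending (rank) order."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return sorted((total * (b - a) for a, b in zip(edges, edges[1:])), reverse=True)


print(whitworth(2))  # [0.75, 0.25]: the longer of two random pieces averages three quarters
```

Averaging many calls of `chop(rng, n)` rank by rank converges on `whitworth(n)`. This illustrates how ranking alone imposes a well-defined curve on purely random pieces.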

3.2 Zipf and Pareto distributions as a consequence of recursion

The diagrams make clear that the Zipf and Pareto distributions are quite different from the Whitworth distribution, and indeed from the other well-known stochastically generated distributions, so they cannot be explained as arising by chance. Two distinct features of the Zipf distribution have to be explained: the fact that the points lie approximately on a straight line, and the slope of that line. Although the straight-line property is surprising at first sight, a very simple explanation can be proposed for it as a consequence of recursive organisational structure.

Fig. 5 Letter probabilities in English (rank/probability on logarithmic axes, compared with the Whitworth distribution; the highest-ranked symbol is the space)

It has already been pointed out that a recursively defined object such as the 'snake' (Fig. 1) has no sense of scale, because an enlarged version of it at any magnification looks essentially the same. This property, termed 'self-similarity' by Mandelbrot, is a direct and simple consequence of any recursive definition. Such a definition, by its very nature, refers equally to features of all sizes and is therefore intrinsically incapable of saying anything specific about any particular size. Suppose, then, that a rank/size plot on double-logarithmic graph paper neither approximates to the broadly curved Whitworth distribution nor is straight. It would then have to have a characteristic, such as a bend, whose position on the graph paper would imply a sense of scale. The recursive definition, however, has no means of specifying where such a bend would be, and so there can be no bend: the line must be straight.

Fig. 6 Letter probabilities in Polish (rank/probability on logarithmic axes, compared with the Whitworth distribution)
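The no-bend argument can also be stated in symbols. The formalisation below is an illustrative assumption rather than the paper's own derivation. It takes 'no sense of scale' to mean that rescaling every rank by a factor λ multiplies every size s(r) by a factor g(λ) that does not depend on r. A continuous law with this property can only be a power law, which is a straight line of slope -a on double-logarithmic paper (a ≈ 1 for Zipf's law):

```latex
% s(r): size at rank r.  Scale-free assumption: for all \lambda, r > 0,
s(\lambda r) = g(\lambda)\, s(r)
% Put r = 1 to get s(\lambda) = g(\lambda)\, s(1); then, for all \lambda, \mu > 0,
g(\lambda\mu)\, s(1) = s(\lambda\mu) = g(\lambda)\, s(\mu) = g(\lambda)\, g(\mu)\, s(1)
\;\Longrightarrow\; g(\lambda\mu) = g(\lambda)\, g(\mu)
% The only continuous solutions are powers, g(\lambda) = \lambda^{-a}, hence
s(r) = s(1)\, r^{-a}, \qquad \log s(r) = \log s(1) - a \log r
```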

This argument is the essence of those used by Mandelbrot and Fairthorne, and it applies to any rank/size plot of a recursively defined structure. It explains the linear rank/frequency plot for single words. Moreover, it predicts that the log-rank/log-frequency plot for n-word groups in natural-language text would also be a straight line, whose slope would tend to zero for large n. This is because the large number of possible messages that can be communicated by many words must be equally usable. This prediction has been tested experimentally for word pairs, both adjacent and non-adjacent, as shown in Fig. 7 and in greater detail in Appendix 8.2. It is difficult to verify the prediction for larger values of n, because a very large sample of natural language in electronic form would be required to ensure statistically significant measurements. Further supporting evidence for the recursion hypothesis comes from the work of Prof. M.F. Lynch [10]. He studied the rank/frequency diagrams for single characters, groups of four and groups of eight, and showed that the plot changes from a Whitworth distribution to a straight line as the group size increases (Fig. 8).

Fig. 7 Section R of Brown corpus: rank/frequency plots for single words (slope = -0.984), adjacent word pairs (slope = -0.587) and word pairs two apart (slope = -0.522)
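Measurements of the kind summarised in Fig. 7 start from counts of single words and of word pairs at a fixed separation. The following sketch uses Python's `collections.Counter`. The function names and the `gap` parameter (1 for adjacent pairs, 2 for pairs two apart) are illustrative. The Brown corpus itself is not included, and tokenisation is reduced to splitting on spaces:

```python
from collections import Counter


def pair_counts(words, gap=1):
    """Count ordered word pairs (words[i], words[i + gap]); gap=1 gives adjacent pairs."""
    return Counter(zip(words, words[gap:]))


def rank_frequency(counter):
    """Relative frequencies in descending order; position i holds rank i + 1."""
    total = sum(counter.values())
    return [c / total for _, c in counter.most_common()]


words = "a b a b a".split()
print(rank_frequency(Counter(words)))  # single words: [0.6, 0.4]
print(pair_counts(words, gap=2))       # ('a', 'a') twice, ('b', 'b') once
```

Feeding these frequencies to a log-log slope fit, for single words, adjacent pairs and pairs two apart in turn, would give slopes comparable with those in Fig. 7.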

Fig. 8 Rank/frequency distribution for character strings of lengths 1, 4 and 8 (logarithmic axes)

The choice of adjacent words in natural-language text is not statistically independent. If words were selected independently to obey the single-word probability spectrum, the most common multiword message in English would be a long string of 'the's. The argument can be extended. For example, the most common word pair is 'of the', but 'of the of the' never occurs, so adjacent word pairs cannot be statistically independent either. In fact, repetition does not occur at any scale in natural-language text, so correlations must exist at every scale.
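The first step of this argument can be checked by brute force. Under statistical independence, the probability of a word string is the product of the single-word probabilities. The most probable n-word string is then the commonest word repeated n times. The probabilities in the sketch below are illustrative values, not measured ones:

```python
from itertools import product
from math import prod


def most_probable_sequence(probs, n):
    """Under independence P(w1..wn) = P(w1)...P(wn); search every n-word string for the maximum."""
    return max(product(probs, repeat=n), key=lambda seq: prod(probs[w] for w in seq))


illustrative = {"the": 0.07, "of": 0.035, "and": 0.028}  # hypothetical single-word probabilities
print(most_probable_sequence(illustrative, 3))  # ('the', 'the', 'the')
```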

The experimental confirmation of the straight-line rank/frequency plots for n = 2 cannot be regarded as a proof of the recursion hypothesis. If an explanation of the straight-line plot for n = 1 could be devised without invoking recursion, mechanisms based on it could no doubt be devised to account for the straightness of the plot for word pairs. However, the recursion hypothesis offers an elegant explanation of all the experimental evidence, in the sense that it requires the minimum number of assumptions. The new evidence for n > 1 therefore adds weight to this argument.

There are other mechanisms, in addition to recursion, that control the choice of words in natural language, so there is scope for deeper studies than those outlined in Appendix 8.2. Nevertheless, the simplest explanation of Zipf's Law clearly requires statistical correlations between the occurrence of words and word groups in natural-language text. These correlations exist on every scale and derive from the recursive technique used to make meaningful text.

4 Consequences of recursion in human affairs

4.1 Recursion in information theory

Shannon and Weaver [11] were well aware that, to enlarge their theory to encompass 'information', it would be necessary to take into account 'meaning', and hence 'context', as well as the combinatorial hazards of communication. Indeed, Weaver wrote, 'The idea of utilising the powerful body of theory concerning Markov processes seems particularly promising for semantic studies since this theory is specifically adapted to handle one of the most significant but difficult aspects of meaning, namely the influence of context'. He recognised clearly that 'meaning' requires context. Unfortunately, the choice of the Markov process to model the generation of information limited the context dependency to the immediately preceding symbols.

In the event, communication theory was founded on the concept of an 'ergodic process', which Weaver described as 'a poll-taker's dream because any reasonably large sample tends to be representative of the sequence as a whole'. Clearly implicit in this definition is the assumption that one can take a sample of the sequence that is large compared with the range of any internal statistical correlations. The consequential property of an 'ergodic process' is that, on a large-enough scale, events are statistically independent. This property is an essential foundation for the convenience of the logarithmic 'entropy' concept. The probability of a combination of statistically independent events is simply the product of their individual probabilities, so their logarithms, and hence their entropies, simply add.
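The additivity that makes entropy convenient holds only for statistically independent events. The following sketch computes Shannon entropy in bits for small hand-made distributions. The function names and the distributions are illustrative. For an independent joint distribution, the joint entropy equals the sum of the parts. A correlated joint distribution with the same marginals has less entropy than the sum:

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def independent_joint(px, py):
    """Joint distribution of two statistically independent variables: P(x, y) = P(x) P(y)."""
    return {(x, y): px[x] * py[y] for x in px for y in py}


px = {"a": 0.5, "b": 0.5}
py = {"0": 0.25, "1": 0.75}
print(entropy(independent_joint(px, py)), entropy(px) + entropy(py))  # equal

correlated = {("a", "a"): 0.5, ("b", "b"): 0.5}  # both marginals are still 50/50
print(entropy(correlated))  # 1.0 bit, not the 2.0 bits that additivity would give
```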

Thus, adopting the 'ergodic process' as the foundation for communication theory allowed the elegant concept of 'entropy' to be brought to bear on the matter. The resulting restriction on validity was of no consequence to the communication engineers, who were concerned only with the conveyance of messages and not with their meaning. However, now that electronic computer engineering is no longer in its infancy, there is a growing recognition that computers and communications are complementary aspects of a new branch of engineering. This branch is beginning to be known as 'information engineering'. In this phrase, the interpretation of the word 'information' must be derived from its primary role as the bonding agent in organised human society. 'Information' must therefore be formally defined as symbol strings constructed by people to convey 'meaning' to other people or, quite recently, to machines. As we have seen, 'information' so defined is constructed by a recursive process that creates correlations at every scale. Text in both natural language and artificial computer language therefore cannot be regarded as the result of an 'ergodic process'. Indeed, I.J. Good [9] pointed out the invalidity of the ergodic process as a generator of natural-language text as long ago as 1966.

It follows that the number of meaningful messages that can be conveyed by N communication bits is very much less than 2^N. The reduction in entropy due to short-range correlations is already well recognised in established communication theory and has been assessed. However, that assessment was derived from Zipf's law regarded as an empirical fact. It ignored the correlations at every scale, of which Zipf's Law is only a symptom.

It appears to be necessary to introduce the concept of usable entropy (UE), based on a recursively structured model of meaningful information, and clearly distinguished from communicable entropy (CE), based on a linear Markov chain.

The UE of n bits of CE is defined as the binary logarithm of the number of distinct meaningful messages that can be constructed by the human recursive process from n bits of CE. UE is unlikely to be precisely definable as a function of CE, but it is possible to shed light on the form of the function. The evidence of correlations at every range in natural-language text implies that UE cannot be proportional to CE. UE would be expected to increase monotonically as CE → ∞, but at an ever-decreasing rate arising from larger-scale correlations, so that UE/CE → 0 as CE → ∞.
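The definition and the expected behaviour of UE can be written compactly. In the following restatement, M(n) is a symbol introduced here for the number of distinct meaningful messages that can be constructed from n bits of CE. The bound follows because n bits can distinguish at most 2^n messages. The limit is the conjecture stated in the text, not a proven result:

```latex
% M(n): number of distinct meaningful messages constructible from n bits of CE (so CE = n)
\mathrm{UE}(n) = \log_2 M(n)
% n bits distinguish at most 2^n messages:
M(n) \le 2^{n} \;\Longrightarrow\; \mathrm{UE}(n) \le n = \mathrm{CE}
% Conjectured: UE increases with n, at an ever-decreasing rate, so that
\lim_{n \to \infty} \frac{\mathrm{UE}(n)}{n} = 0
```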

These arguments would be expected to apply equally to computer programs. They could have contributed to the notorious contrast between triumphant device technology and faltering systems technology. Suppose recursion is indeed a fundamental reality both in human affairs generally and in the 'information' that co-ordinates human activities. It then offers a credible justification for the economists' 'law of decreasing returns'. It also explains the common experience that dazzling achievements in device technology have too often led to only marginal improvements in the satisfaction of the ultimate user.

Even if these conjectures turn out to be well founded, they would not alter the immediate objectives of device and system development. However, deeper studies of the semantic structures [12] and statistical properties of natural-language text could be expected to enable communication theory to be enlarged to encompass 'information' in the true human sense of the word. This would be done by replacing the linear Markov chain with a recursively defined structure. Such a theory would be of great value to designers and users of information systems. Information systems themselves are designed by a recursive process. There could therefore be a parallel between extravagant use of elementary logical devices in an organised information system and extravagant use of words in organised text.

4.2 Recursion in human society

The distribution of income in human society is a matter of central importance to economists. Despite the abundance of relevant statistical observations, collected over many years, there is still no widely accepted theoretical model to account for them. Fig. 4 shows the rank/income plot for 22 million people in the UK in 1968-69 [13]. Also shown is the plot that would be observed if the same total income, £27 200 million, were distributed over 22 million people at random with a Whitworth distribution. The two distributions are similar for rank numbers higher than 12 million, i.e. the poorest 10 million, who account for about one quarter of the total income. The remaining three quarters of the total income is distributed over twelve million people. This part departs substantially from the Whitworth distribution and follows the Pareto distribution [6], first observed as an empirical fact at the turn of the twentieth century. When Pareto formulated his law, the concept of recursion had not been widely recognised.
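The Whitworth comparison curve in Fig. 4 can be approximated without summing millions of terms. The exact expectation is (total / people) multiplied by the sum of 1/k from k = rank to k = people. For large populations that sum is close to ln((people + 0.5) / (rank - 0.5)). The following sketch assumes this approximation is adequate at the scale of Fig. 4. The function name is hypothetical:

```python
import math


def whitworth_income(rank, people, total):
    """Approximate expected income at `rank` (1 = richest) if `total` is split among `people` at random.

    Exact form: (total / people) * sum_{k=rank}^{people} 1/k; the sum is approximated here by
    ln((people + 0.5) / (rank - 0.5)), which is accurate when rank is not tiny.
    """
    return total / people * math.log((people + 0.5) / (rank - 0.5))


# Figures quoted in the text: £27 200 million over 22 million people, and the adjusted
# £34 000 million over 30 million people, each evaluated at rank 12 million.
print(whitworth_income(12e6, 22e6, 27.2e9), whitworth_income(12e6, 30e6, 34e9))
```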

Consequently, although Pareto's observations were well founded, his attempts to explain them in terms of a natural elite were unconvincing and led to controversy rather than understanding. Pareto's Law is therefore still widely regarded as a source of more heat than light. Now, in 1981, it is almost obvious that a recursively defined model can explain Pareto's observations on the distribution of income, without any need to assume a natural elite class. This is because organised human society everywhere is constructed by the same recursive technique, regardless of local myths.

Accordingly, it is now quite easy to make a credible start on interpreting the observations of UK income distributions summarised in Fig. 4. Human individuals and their situations are all different in uncountable ways. Human society, bonded and shaped by the economic production and consumption of all these individuals, is a living organism and is therefore characterised by a combination of order and disorder. 'Order' is represented by about 75% of the income and 60% of the population. Their economic activities are organised in a recursively defined large-scale structure that accounts for the straight-line part of the rank/income plot. The 'disorder' accounts for 25% of the income and 40% of the population. Their activities reflect only short-range social co-operation, so the fixed total income available is distributed at random according to the Whitworth Law. Both the 'order' and the 'disorder' are essential for the operation and survival of human society, but their relative proportions vary with time and place.

It is noteworthy that the observed distribution from rank 12 million to rank 22 million matches very closely a Whitworth distribution of £34 000 million between 30 million people, as shown in Fig. 4. At first sight, the additional £7000 million and 8 million people over the figures recorded in Whitaker's may appear somewhat artificial. Deeper consideration, however, suggests that it may be the published figures that are artificial. The essential reality represented by the rank/income plot is the distribution of the production of society between its consumers, a meaningful concept long before the introduction of money. Moreover, published money figures are no doubt derived from tax returns that do not refer to all the production or all the relevant population. Thus, although no independent assessment has been made to justify £7000 million and 8 million people, such an adjustment is not unreasonable.

This explanation of the straight-line rank/size plot for incomes as a consequence of recursive social structure was foreshadowed by D.G. Champernowne [14], H.F. Lydall [15] and others, who proposed recursively defined mechanisms to account for Pareto's Law but did not emphasise that the essential ingredient was simply recursion.

The model accounts for the random distribution of small incomes and the systematic distribution of the larger incomes represented by the straight-line plot, but it does not account for the slope of the straight line. It seems reasonable to speculate that in a small primitive society the straight-line portion would be short and of higher slope, as the techniques of recursively structured social co-operation are of such recent origin that they are still evolving. If this is indeed the case, the slope of the Pareto line and the proportion of the population that it represents would be expected to vary somewhat between one society and another, and, indeed, from one time to another in the same society. Fig. 9 shows how the slope of the Pareto line for the UK has varied over the last 40 years. During this period the slope has indeed fallen, but at a wildly varying rate, so that there is good reason to suspect that there have been periods in recorded history when the slope increased. Moreover, it would be expected that the interplay of order and disorder, represented by the combination of the Pareto and Whitworth distributions, would apply on a scale larger than the nation state, particularly as technological advance causes physical boundaries to crumble. Thus, although the recursion hypothesis offers a useful foundation for the analysis of human society, there is much scope for deeper study.

Fig. 9 UK income distributions

IEE PROC., Vol. 129, Pt. A, No. 1, JANUARY 1982, p. 71
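The text does not say how the slope of the Pareto line was measured. One common approach, assumed here, is a least-squares fit of log10(income) against log10(rank) over the straight-line portion. The sketch below checks such a fit on hypothetical power-law data with a known slope of -0.6, an arbitrary illustrative value and not a UK figure.

```python
import math

def loglog_slope(ranks, incomes):
    """Least-squares slope of log10(income) against log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(v) for v in incomes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical straight-line (power-law) incomes: income = C * rank ** (-b), b = 0.6
ranks = [10 ** k for k in range(1, 7)]
incomes = [50_000 * r ** -0.6 for r in ranks]
print(round(loglog_slope(ranks, incomes), 3))
```

Applied to real data, the fit would be restricted to the ranks that lie on the straight line, because the Whitworth tail of small incomes would otherwise bias the slope.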

5 Conclusions

(i) Meaningful messages in natural language or artificial computer languages are constructed by a recursive process in which symbol groups of every size, from a single word to a whole chapter, may be transplanted or copied, as necessary, by the originator to convey his meaning. The concept of entropy as a measure of information, with the valuable property that information flows can simply be added, is a consequence of the selection of the 'ergodic process' based on a Markov process in the foundations of communication theory, and the consequential assumption that large-scale correlations do not occur. However, the recursive process in the formulation of meaningful messages creates statistical correlations at every scale, so that the 'ergodic process' model is invalid for natural-language text.

(ii) It is therefore necessary to introduce the concept of usable entropy UE of N bits of communicable entropy CE, defined as the binary logarithm of the number of distinct messages that can be formulated by the human recursive process using N bits of CE. Superficial consideration suggests that $UE \to \infty$ as $CE \to \infty$, but $UE/CE \to 0$ as $CE \to \infty$.

(iii) These arguments are derived from consideration of natural-language text, but they apply equally to programs written in artificial computer languages and to the data upon which they operate.

At first sight, it would appear that tabulated operands ought not to be subject to the dilution of intrinsic utility arising from recursion. However, deeper consideration suggests that, just as nouns and verbs play an equal part in the recursive structure of natural language, so the distinction between instructions and operands in computer operation is irrelevant to large-scale structures. In natural language, tabulated information is a legitimate component of natural text, so that UE is the proper measure of the intrinsic utility of all textual information. Video information would appear to be an exception because it is generated by a mechanical process without human intervention. Nevertheless, the reconstructed picture is normally intended for human use, and it could well be that picture-compression techniques can be regarded as exploitation of the distinction between UE and CE.

(iv) It follows that if the W bits, simultaneously accessed, can be put to good use, then the intrinsic utility of a store of N words of W bits and access time T is

$$\frac{W}{T}\,f(N), \qquad \text{where } \frac{f(N)}{N} \to 0 \text{ as } N \to \infty$$

The cost of a large store in any technology tends to be proportional to W × N, so that it is more cost effective to increase W and devise ways of using W bits than to increase N.
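A minimal sketch of this argument, taking the utility in (iv) to be (W/T) f(N) with a hypothetical choice f(N) = log2 N (any f with f(N)/N → 0 would behave similarly) and a cost exactly proportional to W × N. Utility per unit cost is then independent of W but falls as N grows, which is the sense in which widening the word is the better buy.

```python
import math

def utility(w_bits, n_words, access_time, f=math.log2):
    """Intrinsic utility (W / T) * f(N); f = log2 is a hypothetical stand-in."""
    return (w_bits / access_time) * f(n_words)

def cost(w_bits, n_words, cost_per_bit=1.0):
    """Store cost, taken as proportional to W x N."""
    return cost_per_bit * w_bits * n_words

def utility_per_cost(w_bits, n_words, access_time=1.0):
    return utility(w_bits, n_words, access_time) / cost(w_bits, n_words)

base = utility_per_cost(32, 2**20)
wider = utility_per_cost(64, 2**20)    # double the word length W
deeper = utility_per_cost(32, 2**21)   # double the number of words N
print(f"base {base:.3e}, wider W {wider:.3e}, larger N {deeper:.3e}")
```

Doubling W doubles both utility and cost, so the ratio is unchanged; doubling N doubles the cost but raises f(N) only from 20 to 21.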

(v) The conclusion that large stores are not cost effective may seem contrary to experience. However, the paradox can be resolved by recognising that a set of small stores, each deployed in an organisational position close to the intrinsic storage requirements, is more useful than a single store of capacity equal to the sum of the capacities of the small stores in an overcentralised organisational position.

(vi) Since it appears that the natural way that people manipulate and use information incorporates recursion as a fundamental element, a useful technique for making progress in the design of information engines would be given by the following:

(a) Design a high-level language whose power is derived from a few simple and natural operations of maximum relevance, used recursively.

(b) Design an automaton to obey the high-level language instructions with maximum cost/effectiveness and adequate error control by making optimum use of up-to-date storage and processing technology, following conclusion (iv), together with appropriate use of interpretative and translation techniques. As the recursive technique inevitably shapes data and process structures, such structures represent much of the UE, so that they must be represented explicitly at a low level in the automaton.
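A toy illustration of (a), not the author's design: a hypothetical language with only two operations, 'sum' and 'max', in which any operand may be either a number or a complete expression of any size. A single evaluation rule serves at every scale, which is the recursive property referred to above.

```python
from typing import Union

# An expression is an integer atom or a list [operator, operand, ...];
# every operand may itself be a whole expression of any size.
Expr = Union[int, list]

# The entire (hypothetical) language: two simple operations.
OPS = {"sum": sum, "max": max}

def evaluate(expr: Expr) -> int:
    """Evaluate an expression; the same rule applies at every scale."""
    if isinstance(expr, int):
        return expr
    op, *operands = expr
    return OPS[op]([evaluate(e) for e in operands])

program = ["sum", 1, ["max", 2, ["sum", 3, 4]], 5]
print(evaluate(program))
```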

(vii) Pareto's Law for the distribution of incomes in organised society can be partly explained as a consequence of recursion in human social structures. However, there is scope for further study to identify the social factors that determine the slope of the rank/income plot, and the ratio of ordered to disordered income.

(viii) When Zipf [3] entitled his book 'Human behaviour and the principle of least effort', his implication that there was a common factor behind the wide range of statistical distributions that he studied was correct, although he did not specify the nature of the common factor. It would appear that Zipf's 'principle of least effort' is simply recursion.

(ix) If there is a grain of validity in these conjectures, there is a need to study consequential questions, e.g. the following:

(a) What is the rank/frequency plot for Chinese ideograms?

(b) The slope of a rank/size plot is related to Mandelbrot's 'dimension' D. Is D then a measure of mean social 'valency'?

(c) Is it possible to propose a recursive information model well enough defined to enable UE to be expressed as a function of CE?

(d) Is it possible to regard common structures in the arts, such as sonata form in music, simply as examples of recursion in 'information' used as a plaything?

(e) Is it a useful hypothesis to regard recursively defined structure as an essential feature of every living organism?

6 Acknowledgments

The main threads of the argument have been spun with the help of many colleagues over a long period. The essential trigger that caused the paper to be written was a remark by E.C.P. Portman of ICL that 'Zipf's law is a consequence of recursion'. S.S. Roy undertook the measurement of word-pair statistics using the Brown corpus as experimental material, with the agreement of Prof. Francis of Brown University and the co-operation of Prof. M.F. Lynch and Dr. D. Cooper of Sheffield University, who also provided supporting evidence from their own work. O.V.D. Evans read an early version of the draft and made valuable suggestions. K.C. Johnson helped with the analysis of the Whitworth distribution.

7 References

1 MANDELBROT, B.: 'Fractals, form, chance and dimension' (W.H. Freeman & Co., San Francisco, 1977)

2 'American almanac', 1973, Table 551, p. 336

3 ZIPF, G.K.: 'Human behaviour and the principle of least effort' (Addison-Wesley, 1949)

4 ESTOUP, J.B.: 'Gammes stenographiques' (Gammes Stenographiques, Paris, 1916, 4th edn.)

5 BRADFORD, S.C.: 'Sources of information on specific subjects', Engineering, 1934, 137, pp. 85-86

6 PARETO, V.: 'Cours d'economie politique', Vol. 2 (Lausanne, 1897), Section 3

7 FAIRTHORNE, R.A.: 'Progress in documentation', J. Doc., 1969, 25, pp. 320-343


8 WHITWORTH, W.A.: 'Choice and chance' (5th edn., 1901, reprinted Stechert, 1942), pp. 567-580

9 GOOD, I.J.: 'The encyclopedia of linguistics, information and control', in METHAN (Ed.)

10 LYNCH, M.F.: 'Variable length character string analysis of three databases and their application for file compression'. Proceedings of ASLIB Co-ordinate Indexing Group, in Informatics 1, 1974

11 SHANNON, C.E., and WEAVER, W.: 'The mathematical theory of communication' (University of Illinois Press, 1949)

12 ADDIS, T.R.: 'A theory of knowledge for man-computer communication'. Ph.D. Thesis, Brunel University, 1980

13 'Whitaker's almanac' (William Clowes & Sons, 1972)

14 CHAMPERNOWNE, D.G.: 'A model of income distribution', Economic Journal, 1953, 63, p. 318

15 LYDALL, H.F.: 'The distribution of employment income', Econometrica, 1959, 27, pp. 110-115

16 SCARROTT, G.G.: 'Will Zipf join Gauss?', New Sci., 1974, 62, pp. 402-404

8 Appendix

8.1 Whitworth distributions

Since many distributions of interest must sum to a well-defined total, it is useful to consider the rank/size diagram that would result if a straight line of unit length were cut into N pieces by N − 1 cuts, such that each cut has a uniform probability density of occurring at any point along the line. This model can be regarded as a generator of random numbers whose sum is predetermined. Whitworth [8] showed that if the pieces are sorted into order, starting with the largest, the expectation, that is the mean value, of the Mth rank is

$$E(N,M) = \frac{1}{N}\sum_{r=M}^{N}\frac{1}{r}$$

Thus, the expectation of the largest piece (M = 1) is $\frac{1}{N}\sum_{r=1}^{N}\frac{1}{r}$, the expectation of the smallest piece (M = N) is $1/N^2$, and

$$\sum_{M=1}^{N} E(N,M) = 1$$

because the sum of the expectations must be unity. The statistical distribution of the size of each of the N pieces, computed for $2 \le N \le 5$, is shown in Fig. 10a as a set of histograms. They show that as N increases, the histograms approximate to Gaussian, as expected, for all M except the extremes, and that as $N \to \infty$ the distribution of the size of a typical rank becomes narrow compared with its mean value, so that the properties of the Whitworth distribution are defined by the distribution of expectations.
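These expectations can be checked directly. The sketch below computes E(N, M) exactly with rational arithmetic and compares it with a Monte Carlo simulation of the cutting process described above (N − 1 uniform cuts of a unit line, pieces sorted largest first). The function names, seed and trial count are illustrative choices.

```python
import random
from fractions import Fraction

def whitworth_expectation(n, m):
    """Exact E(N, M) = (1/N) * sum_{r=M}^{N} 1/r for the Mth largest of N pieces."""
    return Fraction(1, n) * sum(Fraction(1, r) for r in range(m, n + 1))

def simulate(n, trials=50_000, seed=1):
    """Monte Carlo: cut [0, 1] at n - 1 uniform points, sort the pieces largest
    first, and average the size found at each rank over many trials."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n - 1))
        edges = [0.0] + cuts + [1.0]
        pieces = sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)
        for i, p in enumerate(pieces):
            totals[i] += p
    return [t / trials for t in totals]

N = 5
exact = [whitworth_expectation(N, m) for m in range(1, N + 1)]
print([round(float(e), 4) for e in exact])
print([round(s, 4) for s in simulate(N)])
```

The exact values sum to 1 and the smallest equals 1/N², in line with the statements above; the simulated averages agree with them to within sampling error.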

The properties of the Whitworth distribution can be illustrated by a graph (Fig. 10b) normalised with respect to N, so that N can be allowed to increase without limit. The vertical scale y represents the sum of the P largest expectations, i.e.

$$y = \sum_{M=1}^{P} E(N,M)$$

and it is plotted at abscissa x = P/N, so that both co-ordinates are in the range 0 to 1 regardless of N. Then, as illustrated, the curve converges rapidly to $y = x(1 - \ln x)$ as $N \to \infty$. The discrepancy in the distribution of expectations is small, even for N as small as 20, but the value of N determines the statistical uncertainty and, therefore, the precision with which an experimental random distribution would be expected to fit the asymptotic curve.
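A short numerical check of the normalised curve, assuming only the definition of E(N, M) given above: the sketch accumulates the P largest expectations and reports the largest gap from y = x(1 − ln x) for N = 20 and N = 1000.

```python
import math

def cumulative_shares(n):
    """y values for P = 1..N: the sum of the P largest expectations E(N, M)."""
    # tails[M] = sum_{r=M}^{N} 1/r, so that E(N, M) = tails[M] / N
    tails = [0.0] * (n + 2)
    for r in range(n, 0, -1):
        tails[r] = tails[r + 1] + 1.0 / r
    shares, running = [], 0.0
    for m in range(1, n + 1):
        running += tails[m] / n
        shares.append(running)
    return shares

def asymptote(x):
    """Limiting curve y = x(1 - ln x), for 0 < x <= 1."""
    return x * (1.0 - math.log(x))

def max_gap(n):
    """Largest |y - x(1 - ln x)| over the points x = P/N, P = 1..N."""
    shares = cumulative_shares(n)
    return max(abs(y - asymptote(p / n)) for p, y in enumerate(shares, start=1))

for n in (20, 1000):
    print(n, round(max_gap(n), 4))
```

The gap is already only about 0.02 at N = 20 and shrinks roughly as 1/(2N), consistent with the statement that the discrepancy is small even for small N.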

The distribution of expectations can also be plotted as a rank/size distribution on log10/log10 graph paper for a range of values of N, as shown in Fig. 10c. The diagram illustrates a property of Whitworth distributions that enables the rank/size plot to be estimated easily for large values of N. The largest expectation is given by

$$E(N,1) = \frac{1}{N}\sum_{r=1}^{N}\frac{1}{r}$$

The second largest expectation is given by

$$E(N,2) = \frac{1}{N}\left(\sum_{r=1}^{N}\frac{1}{r} - 1\right)$$

Fig. 10 Whitworth distributions
a Statistical distribution of size, for $2 \le N \le 5$; each distribution has unit area (e.g. expectation E(3, 2) for N = 3, M = 2)
b Normalised distribution
c Whitworth rank/size with respect to the largest

Expectation $E(N,M) = \frac{1}{N}\sum_{r=M}^{N}\frac{1}{r}$. With increasing N the distributions approximate to Gaussian, with the standard deviation small compared with the mean expectation.


Fig. 11 Rank/frequency relationships in natural language texts (frequency against rank, log/log axes)
a Section D of the Brown corpus on religion
b Section M of the Brown corpus on science fiction
c Section R of the Brown corpus on humour
16 000 words, 4000 distinct (approximately)

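Therefore

$$\frac{E(N,2)}{E(N,1)} = 1 - \frac{1}{\sum_{r=1}^{N} 1/r} \to 1 - \frac{1}{\ln N} \quad \text{as } N \to \infty$$

Since rank N lies $\ln N/\ln 2$ doublings of rank beyond rank 1 on the log/log plot, the ratio projected to the Nth rank is

$$\left(1 - \frac{1}{\ln N}\right)^{\ln N/\ln 2} \to e^{-1/\ln 2} = 0.23629$$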

Thus, as shown, the line joining the first and second ranks, tangential to the curve defined by the rank/size points, extrapolated to the Nth rank has the value 0.236 relative to the largest expectation. The curve can be sketched for large values of M and N using the approximations

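$$E(N,M) \approx \frac{1}{N}\ln\frac{N}{M}, \qquad E(N,1) \approx \frac{1}{N}\ln(1.781N)$$

where $1.781 \approx e^{\gamma}$ and $\gamma = 0.5772$ is Euler's constant.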
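The limit 0.236 and the two large-N approximations can be checked numerically. In the sketch below the projected value is computed as (E(N,2)/E(N,1)) raised to the power log2 N, i.e. one factor of the rank-1-to-rank-2 ratio per doubling of rank on the log/log plot. The harmonic-number helper and its switch-over point are implementation choices, not taken from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    """H_n = sum_{r=1}^{n} 1/r: exact sum for small n, asymptotic series for large n."""
    if n < 10_000:
        return math.fsum(1.0 / r for r in range(1, n + 1))
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)

def projected_ratio(n):
    """Value at rank N of the log/log line through ranks 1 and 2, relative to E(N, 1)."""
    step = 1.0 - 1.0 / harmonic(n)   # E(N, 2) / E(N, 1)
    return step ** math.log2(n)      # one factor per doubling of rank

limit = math.exp(-1.0 / math.log(2.0))
print(f"limit e^(-1/ln 2) = {limit:.5f}")
for n in (10**3, 10**6, 10**12):
    print(f"N = {n:.0e}: projected ratio = {projected_ratio(n):.4f}")

# Large-N approximations compared with exact values, for N = 1000 and M = 100
n, m = 1000, 100
exact_first = harmonic(n) / n
approx_first = math.log(1.781 * n) / n        # 1.781 is close to e^gamma
exact_m = (harmonic(n) - harmonic(m - 1)) / n
approx_m = math.log(n / m) / n
print(f"E(N,1): exact {exact_first:.6f}, approx {approx_first:.6f}")
print(f"E(N,M): exact {exact_m:.6f}, approx {approx_m:.6f}")
```

Convergence to 0.236 is slow, because the ratio E(N,2)/E(N,1) depends on N only through ln N.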

8.2 Empirical studies of rank/frequency relationships in natural language texts

Our measurements were carried out on the Brown University standard corpus of present-day American English (abbreviated to the Brown corpus). It comprises over one million words of running text of edited English prose, printed in the USA during the calendar year 1961.

The attached graphs are the results of analyses of three of its sections, i.e. section D on religion (Fig. 11a), section M on science fiction (Fig. 11b) and section R on humour (Fig. 11c). These sections comprise texts of approximately 34 000, 16 000 and 12 000 words, respectively.

The rank/frequency distribution is, essentially, a collection of discrete points, there being an ordinate (a frequency) corresponding to each integral value, up to the maximum, of the abscissa (the rank). Some of these points are plotted, and the line joining them serves no more than to guide the eye over the ensemble of points belonging to one family.
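For readers wishing to reproduce such a plot, the sketch below builds a rank/frequency list from raw text. The tokenisation (lower-cased runs of letters and apostrophes) and the alphabetical tie-breaking are assumptions; the text does not say how words in the Brown corpus were delimited or how ties in frequency were ranked.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Ties in frequency are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, w, f) for rank, (w, f) in enumerate(ordered, start=1)]

sample = "the cat sat on the mat and the dog sat by the door"
for rank, word, freq in rank_frequency(sample)[:4]:
    print(rank, word, freq)
```

Plotting log10(frequency) against log10(rank) for these triples gives a diagram of the kind shown in Fig. 11.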

The data on the rank/frequency distribution of single words corroborates earlier published evidence on different texts in the cited references, whereas that on word pairs is an extension. The data on word pairs is collected for contiguous as well as noncontiguous pairs: d (for distance) = 0 corresponds to adjacent word pairs, d = 1 to word pairs once removed, and so on. The highest-frequency words and word pairs, up to about rank 5, reflect language-specific peculiarities of grammar such as the use of the definite article. It is known that the single-word plot for rank > 10 is language independent, and there is reason to expect that the word-pair plots would also be independent of language, but this has not been checked.
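The distance convention for word pairs can be made concrete with a small counter in which pairs are taken as (word i, word i + d + 1), so that d counts the intervening words. This is a hypothetical helper for illustration, not the procedure used in the measurements.

```python
from collections import Counter

def word_pair_counts(words, d):
    """Count ordered word pairs separated by d intervening words.
    d = 0 gives adjacent pairs, d = 1 pairs once removed, and so on."""
    gap = d + 1
    return Counter(zip(words, words[gap:]))

words = "it was the best of times it was the worst of times".split()
adjacent = word_pair_counts(words, 0)
once_removed = word_pair_counts(words, 1)
print(adjacent.most_common(3))
print(once_removed[("it", "the")])
```

Ranking the pair counts by frequency, exactly as for single words, gives the word-pair curves for each value of d.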
