
Linguistic Society of America

Review by Charles F. Hockett of The Mathematical Theory of Communication, by Claude L. Shannon and Warren Weaver.
Source: Language, Vol. 29, No. 1 (Jan.-Mar., 1953), pp. 69-93
Published by: Linguistic Society of America
Stable URL: http://www.jstor.org/stable/410457


REVIEWS

The mathematical theory of communication. By CLAUDE L. SHANNON and WARREN WEAVER. Pp. vii, 117. Urbana: University of Illinois Press, 1949.
Reviewed by CHARLES F. HOCKETT, Cornell University

Most of this book (1-91) consists of an article by Shannon, bearing the same title as the volume, which first appeared in the Bell System Technical Journal for July and October 1948. The remaining section, by Weaver, is entitled Recent contributions to the mathematical theory of communication; a more condensed version appeared in the Scientific American for July 1949. Weaver's paper is less technical than Shannon's, and might well have been placed first in the book, as an introduction to Shannon's exposition. In this review explicit references to Weaver will be few, but this is deceptive: the reviewer found Weaver's more discursive treatment of great value in grasping Shannon's often highly technical presentation, and the reader who chooses to pursue the subject further will do well to read Weaver's paper before attempting Shannon's.

A number of other contributions to the theory of communication have appeared in recent years. Two by Robert M. Fano are worth mentioning here: The transmission of information, Technical reports No. 65 (17 March 1949) and No. 149 (6 February 1950) of the Research Laboratory of Electronics, Massachusetts Institute of Technology. Fano's discussion is helpful because of a difference of approach, though his results are substantially the same as Shannon's.1

The appearance of the term 'communication' or 'information' in the title of an article or a book is naturally no guarantee that its contents are of any concern to linguists. Shannon's work stems in the first instance from engineering considerations-telegraph, teletype, telephone, radio, television, radar, and the like-which would seem rather remote. But the theory is rendered so general in the course of mathematicizing that it may turn out to apply, in part or with

1 These and other contributions to information theory refer constantly to the work of Norbert Wiener: the famous Cybernetics (1948), and The extrapolation, interpolation, and smoothing of stationary time series with engineering applications (1949; earlier printed as an NDRC Report, MIT, 1942). Cybernetics consists of chapters of extremely difficult prose alternating with chapters of even more difficult mathematics; the other volume is reported to consist almost completely of the latter. The reviewer had managed to absorb some odd bits of the prose parts of the first of these before Shannon's articles appeared; the mathematical parts are entirely beyond his capacity.

A fairly popular discussion of some aspects of information theory will be found in E. C. Berkeley, Giant brains (1949), particularly the earlier chapters.

In June and July 1951, a grant from the Committee on the Language Program of the ACLS enabled the reviewer to attend the first intensive summer course at MIT on Modern Communications; the various lectures and discussions in this course were of considerable help in following the work of Shannon and others in this field. However, the reader should be warned that the reviewer's training in mathematics is very slight, and that as a result parts of this review may be based on serious misunderstandings.


per message, on the average, than with only two alternatives. That is, the larger the repertory of possible messages, the larger, in general, is the informational CAPACITY of the system.

For various reasons the measure actually chosen is not the raw count of the number of messages in the repertory, but rather the logarithm of this number to the base 2-providing that the number of messages in the repertory is finite, and that they are all equally likely to be chosen. If either or both of these conditions is lacking, then the measure of amount of information becomes more intricate; but the more complicated formula reduces to the one described above when the conditions are met.3

The unit of information thus measured and quantified is called the BINARY DIGIT, BIGIT, BINIT, or BIT; the last term is currently the most favored, but we shall use BINIT.4 A term is needed for the unit of capacity; we define one SHANNON as a capacity of one binit of information per second.

Thus, in the scheme outlined above, where I ask you questions which must be answered with 'yes' or 'no', and where those answers are equally probable, we have a system with a capacity of one binit per message. The fundamental assumption in the game Twenty Questions is that any animal, vegetable, or mineral can be specified unambiguously, on the average, by twenty successive dichotomizings of the appropriate kingdom; that is, that twenty binits of information will usually suffice for such specification. Skill at interrogation in this game consists in so phrasing one's questions that the region specified by the answers to all previous questions is divided into two essentially equiprobable subregions.

However, there is an important but peculiar restriction on the use of information-theoretical methods. They serve to measure the entropy of an information-source or the capacity of a channel (the terms will be defined in a moment), but they afford no means whereby we can state how much information is actually conveyed by the actual selection and transmission of any specific message. Your equiprobable yesses and noes transmit ON THE AVERAGE one binit of information each, but how much information is carried by any one specific yes or no is undefined.5

A concrete example will serve to introduce more of the necessary terms.
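Before turning to that example, the measure just described can be put into a few lines of code (a sketch in Python; the figures are invented for illustration and nothing here is taken from the book):

```python
import math

def information_equiprobable(n_messages):
    """Binits per message when all n_messages are equally likely."""
    return math.log2(n_messages)

def entropy(probabilities):
    """The more general measure, in binits per message: -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two equally likely answers ('yes'/'no'): one binit per message.
print(information_equiprobable(2))        # 1.0
# Twenty equiprobable yes/no questions single out one of 2**20 alternatives.
print(information_equiprobable(2 ** 20))  # 20.0
# The general formula agrees with the simple one when choices are equiprobable ...
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 binits
# ... and gives less when they are not.
print(entropy([0.7, 0.1, 0.1, 0.1]))      # about 1.36 binits
```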

A general at a teleconference6 writes out a message for transmission. In so doing he functions, from the viewpoint of communications, as a SOURCE. The MESSAGE

3 Much of Fano's first report (see the first paragraph of this review) is devoted to a slow building up of this measure of capacity. He makes it eminently clear why one chooses the logarithm to the base 2. (For readers who have forgotten their high-school arithmetic: the logarithm of n to the base b is a number x such that b^x = n.)

4 Because the assignment of a technical meaning to a word which is frequently used as part of our common-sense vocabulary proves constantly embarrassing in more informal discussion. The replacement will be made even in (otherwise) direct quotations from Shannon's text. Similarly 'shannon' will usually thus replace 'bit per second'.

5 Wiener's approach is somewhat different, and specifies at least some circumstances under which we can state exactly how much information is conveyed in a given message. See his Cybernetics, ch. 3, esp. 75-6. But it is not certain that Wiener is dealing with the same 'information' as Shannon.

6 A type of conference, common in military operations, in which participants at widely distant points communicate by teletype.


consists of a linear sequence of SYMBOLS (MESSAGE-UNITS), each one of which is selected from a repertory of 32 possible symbols: the 26 letters of the alphabet (with no distinction between capital and lower case) and six supplementary punctuation marks, one of which is a space.

At the keyboard of the teletype TRANSMITTER, an operator strikes keys in the order required by the message. This TRANSDUCES the message (or ENCODES it) into a SIGNAL, in this case a set of electrical impulses which will travel along a wire until they reach the teletype receiver. The wire, or alternatively a bandwidth of frequencies of electrical impulse used on the wire, constitutes a CHANNEL. Teletypy operates in terms of a stock of 32 SIGNAL-UNITS (or SYMBOLS)-different patterns of voltage variation-assigned in a one-to-one way to the 32 letters and punctuation marks allowable in messages so to be transmitted. These signal-units all require the same amount of transmission time. So far as teletypy itself is concerned, therefore, a transmission rate of n signal-units per second would imply the possible transmission of 5n binits of information per second, or a capacity of 5n shannons-since the logarithm of 32 to the base 2 is 5. For reasons which we shall examine shortly, teletypy never attains this maximum.
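As a quick check on the arithmetic (a sketch with an invented transmission rate, not a figure from the review):

```python
import math

def teletype_ceiling(repertory_size, units_per_second):
    """Upper bound on capacity, in shannons, if every signal-unit were
    equally probable at every moment (the condition the text says
    teletypy never actually meets)."""
    binits_per_unit = math.log2(repertory_size)
    return binits_per_unit * units_per_second

# 32 letters-plus-punctuation, transmitted at, say, 60 signal-units per second:
print(teletype_ceiling(32, 60))  # 300.0 shannons (5 binits/unit * 60 units/sec)
```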

At the teletype RECEIVER, the incoming signal is retransduced (or DECODED), producing once again a message. This message will show nothing of the general's handwriting, of course, but normally it will be 'literally' the same-that is, it will consist of the same letters and other symbols in the same linear order-as the message produced by the general. The RECOVERED message is then handed to a colonel, let us say, who from the viewpoint of information theory is a DESTINATION or SINK.

In order for teletypy to operate at maximum capacity, it would be necessary for each one of the 32 signal-units to be equally probable at all times during the transmission, regardless of which signal-units had already been transmitted. Now so long as the 32 signal-units are assigned in a one-to-one fashion to English letters and punctuation marks, this condition cannot be met, since all the limitations on sequence of English spelling are directly carried over to the signal. Since the letter-sequences QL, TSR, SSS, and the like, never occur in English spelling, the corresponding sequences of signal-units will never leave the transmitter. Since #CHL (# = 'space') is relatively infrequent, while #CHE is rather more common, the same differences of frequency will appear in the utilization of the various signal-units. After the signal-unit assigned to T, the probability of the one assigned to H will be higher than that of the one assigned to C. All such deviations from constant equiprobability of the signal-units represent inefficiencies-the use of more transmission-time than is necessary for the amount of information to be sent.

Greater efficiency can be attained by changing the code which assigns signal-units to message-units. A first step would be to use signal-units of varying durations (though with the same average duration as before), and to assign the shortest signal-units to the most frequent message-units, the longest to the least frequent. Or instead of considering the message-units one at a time, one could determine the relative frequency of all possible sequences of two message-units, and assign the shortest sequences of two signal-units to the sequences most


frequently used. If one does not care how complicated the transmitter and receiver have to be-if this is a sufficiently trivial consideration relative to the cost of transmission of the signal from transmitter to receiver-then such change of code can be continued until the maximum capacity inherent in teletype (5n shannons) is approached. It is worth noting that more efficient coding of message into signal in general requires a delay at the transmitter, which must collect a number of successive message-units to be encoded all at once, and a similar delay at the receiver.

Most communicative systems involve at least some constraints on sequence; that is, some symbols are not followed by certain others, or are followed by various others with different relative frequencies. To handle this, 'we imagine a number of possible states [of a source or a transmitter] .... For each state only certain symbols . . . can be transmitted (different subsets for different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted' (8). The matter of relative frequency is easily added to this, by considering that each state is characterized by a set of relative probabilities as to which symbol will next be transmitted and, consequently, which new state will ensue.

We can illustrate with English phonemics. Having begun an utterance by pronouncing a /p/, a speaker of English is in a 'post-/p/' state: he can choose any vowel, or /r, l, y, w/, as next 'symbol', but not, for example, /t/ or /k/. The various possible choices have various relative probabilities. If he chooses /r/, he is then in a 'post-/pr/' state, with new limitations and new probabilities: any vowel, but not, for example, another /r/ or /p/ or /l/. And so on. It is to be noted that the post-/pr/ state is not identical with the post-/r/ state, established when a speaker begins his utterance with /r/.

With this scheme, attention can focus on the whole set of possible interstitial states, instead of on the symbols; the latter can be regarded as elements 'emitted' by the source (or transmitter) as it passes from one state to another. Mathematically the great advantage of this way of viewing the matter is that there is a well-understood set of machinery at hand, the theory of Markoff chains, which is immediately applicable.7 Any Markoff chain can be described by a square array of probabilities, the entry in a given row and column being the probability that the state corresponding to the row will be next followed by that corresponding to the column. To facilitate further deductions, some limitations have to be imposed on the variety of Markoff chains allowed; a very general limitation, which both renders further deduction possible and also subsumes a wide variety of cases, is that the chain be ERGODIC: that is, the probabilities must be such that there is no state which can never recur.8 This seems to me to correspond to the fundamental (and not always overtly expressed) assumption involved in synchronic linguistic analysis: the assumption that we as analysts, like the speakers

7 A good new reference on this is W. Feller, An introduction to probability theory and its applications, chs. 14-17 (1950).

8 More strictly, there is no state which has probability zero of recurrence. Impossibility implies probability zero, but not conversely. Note that the term 'ergodic' is currently used in a variety of more or less closely related senses, of which the present use is one of the simpler.


of a language themselves, can ignore the short-term (hourly, daily, yearly) results of continuous linguistic change, and still get valid results; the extent to which this assumption is false is a measure of the rate of linguistic change.

A source (or a transmitter) which emits its symbols with constant equiprobability is generating information at the maximum rate possible within the limits of the finite repertory of symbols it uses and of the rate at which those symbols are emitted. The actual rate at which a source generates information, on the average, is the ENTROPY of the source; the ratio of this to the theoretical maximum is the RELATIVE ENTROPY.9 'One minus the relative entropy is the REDUNDANCY. The redundancy of ordinary English [writing], not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language [i.e. of the language and of the writing system] and half is chosen freely. The figure 50% was found by several independent methods which all gave results in this neighborhood.' One method 'is to delete a certain fraction of the letters from a sample of English text and then let someone attempt to restore them. If they can be restored when 50% are deleted [at random] the redundancy must be greater than 50%' (25-6).10
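These three quantities are easy to compute for a toy source (a sketch; the symbol probabilities below are invented and are not Shannon's figures for English):

```python
import math

# Invented probabilities for a four-symbol repertory.
probs = {"A": 0.5, "B": 0.25, "C": 0.15, "D": 0.10}

H = -sum(p * math.log2(p) for p in probs.values())  # entropy, binits per symbol
H_max = math.log2(len(probs))                        # maximum: all symbols equiprobable
relative_entropy = H / H_max
redundancy = 1 - relative_entropy

print(f"entropy          = {H:.3f} binits per symbol")
print(f"relative entropy = {relative_entropy:.1%}")
print(f"redundancy       = {redundancy:.1%}")
# With these figures: entropy 1.743, relative entropy about 87%, redundancy about 13%.
```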

Shannon's first major result (towards his aim, summarized in the first paragraph of this section) is the following 'fundamental theorem for a noiseless channel' (28):

Let a source have entropy H [binits per symbol] and a channel have a capacity C [shannons]. Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H - ε symbols per second over the channel, where ε is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.
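The re-coding discussed earlier (short signal-units for the most frequent message-units) is what allows a transmitter to approach this bound: over a channel carrying one binary digit per second (C = 1 shannon), the ceiling of C/H symbols per second amounts to saying that no code averages fewer than H binary digits per symbol. The sketch below builds such a code with Huffman's procedure, a standard construction which the review does not mention by name, using the invented frequencies of the previous sketch:

```python
import heapq
import math

# Invented relative frequencies (the same toy repertory as above).
freqs = {"A": 0.5, "B": 0.25, "C": 0.15, "D": 0.10}
H = -sum(p * math.log2(p) for p in freqs.values())  # source entropy, binits/symbol

def huffman(frequencies):
    """Build a binary code giving the shortest codewords to the most frequent
    symbols, by repeatedly merging the two least frequent entries."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman(freqs)
avg = sum(freqs[s] * len(c) for s, c in code.items())
print(code)  # {'A': '0', 'B': '10', 'D': '110', 'C': '111'}
print(f"average {avg:.2f} binary digits per symbol, versus H = {H:.3f}")
```

With these frequencies the code averages 1.75 binary digits per symbol against an entropy of about 1.74, illustrating how closely, though never beyond which, the bound can be approached.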

The reader will recall our earlier discussion of the efficiency of teletypy and methods of increasing it by modification of code. The theorem establishes the outer limits within which such improvement can be brought about. But there is a factor, not yet discussed, which sets narrower limits: NOISE. In engineering parlance, noise is anything which operates on the signal, as it travels along the channel, in such a way that the received signals are not always the same as the transmitted ones. To be noise, the effect must be random, and thus only statistically predictable. For if one knew in advance, for example, that

9 Entropy can be measured in terms of time or in terms of symbols; the latter is useful in dealing with cases such as writing (or other forms of 'information storage'), where the rate per unit time depends on the rate at which the written symbols are read. If the entropy in terms of symbols is H', and n symbols per second is the average rate of transmission, emission, or reading, then the entropy in terms of time is nH'.

10 Some of Shannon's discussion preparatory to this is fascinating, particularly on the subject of successive artificial (statistically controlled) approximations to written English (13-5), which underlie another method of determining the redundancy of written English. Those of us interested in such matters as literary style are particularly apt to enjoy the following paragraph (26): 'Two extremes of redundancy in English prose are represented by Basic English and by James Joyce's book Finnegans Wake. The Basic English vocabulary is limited to 850 words and the redundancy is very high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content.'


precisely every fifth signal-unit would be distorted in the channel, then it would be easy simply to avoid those moments of distortion, and to transmit the entire message during the noiseless intervals.11

This recalls the necessary indeterminacy (for receiver and destination) in messages themselves, if any information is to be transmitted. If the receiver or destination knows in advance what message is going to be transmitted, its transmission conveys no information; if the receiver knows in advance what distortions are going to be imposed on the signal in the channel, those distortions are not noise and do not interfere with the transmission of the message. In fact, since noise is necessarily random, it is possible to characterize a 'noise source' in precisely the same way that we characterize an information source: a noise source emits an undesired 'message', with a statable entropy, and this undesired 'message' interferes with the reception of the desired one. Put another way, if part of the capacity of a channel is used for the transmission of noise (undesired 'message'), then just that much of the capacity is unavailable for the transmission of the desired message.

Occasional misprints in an article, or errors of transmission in a telegram, do not usually interfere with the intelligibility of the article or telegram for him who reads or receives it. Such misprints or errors of transmission are the result of noise (or, with a slight change of emphasis, can be said to constitute noise). The reason for usual intelligibility despite such noise is perfectly clear: the redundancy of written English. Here, then, is the importance of redundancy: channel noise is never completely eliminable, and redundancy is the weapon with which it can be combatted.

The capacity of a noisy channel is obviously not definable in the same way as that of a noiseless channel. If at the receiver the entropy of the incoming signal is actually equal to the capacity of the channel on the assumption of no noise, then a certain portion of that entropy is in fact due to noise, and only the remainder constitutes the effective maximum capacity of the channel-assuming that at the transmitter the message is being encoded into the signal in the optimum way. This defines the capacity C of a noisy channel (38), and this definition proves to be a generalization of the earlier one for a noiseless channel, in that if zero be assigned to the noise factor in the new definition, the result is the old one.

It may seem surprising that we should define a definite capacity C for a noisy channel since we can never send certain information in such a case. It is clear, however, that by sending the information in a redundant form the probability of errors can be reduced. For example, by repeating the message many times and by a statistical study of the different received versions of the message the probability of errors could be made very small. One would expect, however, that to make this probability of errors approach zero, the redundancy of the encoding must increase indefinitely, and the rate of transmission therefore approach zero. This is by no means true. If it were, there would not be a very well defined capacity, but only a capacity for a given frequency of errors ...; the capacity going down as

11 On page 48 Shannon gives an example of noise which is indeterminate within certain precisely defined (determinate) limits, and of a method of counteracting its effect completely; this is, in a sense, only a more complex example of 'determinate' distortion than that given here.
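Shannon's point that redundant encoding drives the error rate down can be seen in a toy simulation (a sketch; the 10% flip probability and everything else below are invented, and this naive repetition scheme, unlike the codes his theorem promises, pays for reliability with rate):

```python
import random

def send_with_repetition(bits, repeats, flip_prob, rng):
    """Repeat each binit, flip each transmitted digit with probability
    flip_prob, then recover each binit by majority vote at the receiver."""
    received = []
    for b in bits:
        votes = sum((b ^ 1) if rng.random() < flip_prob else b
                    for _ in range(repeats))
        received.append(1 if votes > repeats / 2 else 0)
    return received

rng = random.Random(1)
message = [rng.randint(0, 1) for _ in range(10_000)]
for repeats in (1, 3, 5, 9):
    got = send_with_repetition(message, repeats, flip_prob=0.1, rng=rng)
    errors = sum(a != b for a, b in zip(message, got))
    print(f"{repeats} repetitions: error rate {errors / len(message):.4f}")
```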


directly, as sound-waves passing through the air. The acoustician currently makes use of oscillographs and spectrographs, both of which transduce the speech signal into a visual report which can be examined at leisure. The transductions involved are quite complex, and do not always give facsimile-type accuracy; but whatever difficulties may be implied by this, at least one thing is certain: oscillographs and spectrographs do not impose a spurious appearance of continuity on a signal that is actually discrete.

The linguist, in phonemicizing, uses no hardware; but he, also, is unable to examine the speech-signal directly. The ear and the associated tracts of the central nervous system constitute a transducer of largely unknown characteristics; in what follows we shall attempt to deduce at least a few of these.

A continuum can be transformed into a discrete sequence by any of various QUANTIZING operations; the notion of quantizing is familiar enough to communications engineers, though the quantizing operations used in electronic communications are all quite arbitrary. Similarly, a discrete sequence can be transformed into a continuum by what might be called a CONTINUIZING operation. Now if the continuum-report of the acoustician and the discrete-report of the linguist are both correct, then there must be, for any given shared body of raw material, a quantizing operation which will convert the acoustician's description of the raw material into that of the linguist, and a continuizing operation which will do the reverse; the desired quantizing and continuizing operations must be inverses of each other.

Joos affords a point of departure for the search for these operations:12 'Let us agree to neglect the least important features of speech sound [the speech-signal], so that at any moment we can describe it sufficiently well with n measurements, a point in n-dimensional continuous space, n being not only finite but a fairly small number, say six.... Now the quality of the sound becomes a point which moves continuously in this 6-space, sometimes faster and sometimes slower, so that it spends more or less time in different regions, or visits a certain region more or less often. In the long run, then, we get a probability-density for the presence of the moving point anywhere in the 6-space. This probability-density varies continuously all over the space. Now wherever [one] ... finds a local maximum of probability density,' there the linguist finds an allophone; and 'there will be not only a finite but a fairly small number of such points, say less than a hundred.'

By regarding the moving point as input, and adding certain further specifications, we shall convert Joos's description into that of a transducer. Forgetting for the moment about the probability-density, let us imagine, in the 6-space, a honeycomb of cells, with boundaries between them, such that every point in the space (save those on the boundaries) is in one or another cell. In each cell there is a trigger, which is activated whenever the moving point stays within the boundaries of that cell for a sufficiently long time. Each trigger, when activated, transmits a single output signal, so that the continuous operation of the transducer produces a discrete series of outputs. Finally, imagine that the boundaries

12 Description of language design, Journal of the Acoustical Society of America 22.701-8 (1950). Joos's number 6 is purely arbitrary; one might better replace it throughout by n.


are made of perfectly flexible rubber, so that the location of the different boundaries is not fixed; indeed, at a given time one cell may be enlarged so as to include almost the entire relevant volume of the 6-space, compressing the others to very small size. In a given short interval of time, the output of the system is a function of the input and of the location of the boundaries between the cells. Now we shall specify that the location of the cell boundaries, at any given moment, is a function of the immediately preceding succession of N output signals (NOT input), where N is some fairly large number.

Such a system will indeed map a continuous input into a discrete output. If the details of its construction are based on both the acoustic analysis and the phonemic analysis of the speech-signal of a given language, then the system will transduce the acoustician's description into the linguist's description, or, what amounts to the same thing, will transduce the speech-signal in the physical sense into a linear sequence of allophones in the linguist's sense.
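A one-dimensional caricature of such a transducer, reduced to the voicing feature as in the illustration that follows, can be written out directly; every figure in it (thresholds, memory length, the sliding rule for the boundary) is invented for illustration and is not proposed as a model of the ear:

```python
def quantize(signal, n_memory=5, low=0.0, high=1.0):
    """Map a continuous sequence of voicing prominences (numbers between
    low and high) onto a discrete sequence of 'V' (voiced) / 'F' (voiceless)
    outputs.  The boundary between the two cells moves: the more of the last
    n_memory OUTPUTS were voiced, the further down the boundary slides, so
    that a voiced continuation is easier to trigger."""
    outputs = []
    for x in signal:
        recent = outputs[-n_memory:]
        voiced_share = (sum(1 for o in recent if o == "V") / len(recent)
                        if recent else 0.5)
        # The boundary slides between 0.3 and 0.7 of the scale: low when
        # voiced output has been frequent, high when voiceless output has.
        boundary = low + (high - low) * (0.7 - 0.4 * voiced_share)
        outputs.append("V" if x > boundary else "F")
    return "".join(outputs)

# A made-up stretch of input: prominence of the voice-bar at successive moments.
print(quantize([0.9, 0.8, 0.55, 0.4, 0.2, 0.45, 0.6, 0.1, 0.05, 0.7]))
```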

[Diagram: the continuous input enters from the left; fixed boundaries stand at the top and bottom of the scale, with a movable boundary between them; input above the movable boundary triggers the VOICED output, input below it the voiceless output.]

A one-dimensional reduction will serve as illustration. Suppose for a moment that the only relevant feature of speech is voicing versus voicelessness. Even for just this one feature the acoustician tells us that the voice-bar on a spectrogram may have virtually an infinity of degrees of prominence (or could have on an ideal spectrogram), while the linguist knows that, in French or English for example, there are two and only two degrees of contrast along this particular scale. In the appended diagram the continuous input arrives from the left. The curve represents the probability-density along this one dimension; our reason for giving the curve its particular shape will appear later. At the top and bottom of the scale are two fixed boundaries. Between them is a movable boundary. If a sufficiently long stretch of input falls above this movable boundary, the top trigger is activated and the voiced output signal is transmitted; similarly below the movable boundary. At a given moment, if the probability of voiced as the next signal is extremely high, then the movable boundary will slide very far down the scale,


so that almost any next input-stretch will activate the voiced trigger rather than the voiceless trigger; and vice versa. The output will then be a discrete succession of signals, each of them either voiced or voiceless.

Let us return to the way in which the location of the movable boundaries is determined. It is determined, in the first instance, by the preceding N output signals. Since this preceding output depends, in turn, on input, one might suspect that the preceding N output signals, as a factor conditioning what happens to a given bit of input, could eventually be bypassed, as we consider the progressively earlier history of the transducer. But each of these preceding N output signals was also dependent, because of the principle which governs boundary locations, on the N output signals which preceded it, so that there can be no hope of bypassing the conditioning effect of 'previous output' save by tracing the operation of the system back to some initial state, before any input at all. So far, such an initial state is undefined, but we will return to it in a moment. Examining the system in current operation, what the acoustician is able to do is to describe probability-densities. What the linguist can do is to state, on the one hand, the topological structure of the honeycomb of cells, and on the other, the contingent probabilities of each output signal after each sequence of N preceding output signals. Part of this is what the linguist currently does when he determines and describes phonemic structure; the remainder is what he could do with no further technique but some simple statistics. What the acoustician and the linguist working together can do is to determine the metric structure of the honeycomb

under various conditions, including the relation of boundaries to local maxima of probability-density.

Our assumption is that, in a way, the above description of a transducer and a type of transduction applies to the processing to which the incoming speech-signal, after it impinges on the ear of a hearer, is subjected, within the central nervous system of the hearer. Because the linguist has such a transducer within himself, which operates on the speech-signal before he can begin to examine and analyze it, he cannot (using no hardware) perceive the continuous nature of the speech-signal, but necessarily interprets it as discrete.

It is known in information theory that if a continuous signal is quantized, no transduction is available which will recover the original continuous signal exactly.13 Our assumption is that this is irrelevant, since the linguistic information carried by the speech signal accounts for only a small part of the total entropy thereof; the quantizing transduction performed within the hearer need only recover as much from the speech signal as was put into it by the speaker. The act of speaking is also a transduction; it converts an inner discrete flow of allophones, inside the speaker, into the continuous speech signal. The apparent discrepancy between the acoustician's report and that of the linguist is then due to the fact that they are tapping the same complex communications system at different points: the acoustician taps it in the external channel, where the

13 Shannon's Theorem 13 (p. 53) does not contradict this assertion. It allows us to transduce a band-limited continuous signal into a discrete signal, but requires for complete recoverability the use of an infinite repertory of discrete signal-units. All that can be done with a finite repertory is to approximate complete recoverability as closely as it is desired.


information indeed flows in a continuous form; the linguist taps it after the quantizing transduction of hearing, or before the continuizing transduction of speaking, or both. Edward Sapir characterized these two transductions beautifully, though in a different terminology, when he wrote in 1921: 'In watching my Nootka informant write his language, I often had the curious feeling that he was transcribing an ideal flow of phonetic elements which he heard, inadequately from a purely objective standpoint, as the intention of the actual rumble of speech.'14

Since the linguist does not investigate these postulated 'inner flows' with scalpel and electroencephalograph, it is proper to ask just what procedure he does use. To answer this we ask what might be meant by our earlier reference to an 'initial state' of a quantizing transducer. If our requisite quantizing and continuizing transducers exist in human nervous systems, then the 'initial state' of either is the state to be found in a child before he has begun to learn his native language. As he learns his native language, the child has access to only certain kinds of evidence to decide whether two stretches of incoming speech-signal are instances of 'the same' signal (that is, are phonemically the same) or not: the physical properties of that speech-signal, insofar as human physiology can measure them, and the conditions under which the speech-signal arrives-in short, physical similarity and similarity of meaning. The building of the quantizing transducer in the child proceeds by trial and error, errors being corrected by failure to adapt appropriately, as with any learning process.

If the linguist cannot open up the transducers in the head of a native speaker of a language, he can do something just as effective: he can himself become as a little child (insofar as the specific language is concerned), and build within his own nervous system a reasonable facsimile of the transducer that the native speaker has. A reasonable facsimile is enough, for if a language is spoken by more than one person, the transducer inside any one person is no more than a reasonable facsimile of that inside another. This is what the linguist can do, and does. In addition, he does something that the child cannot do: he keeps an overt record of the structure of the system which is being built internally. This overt record in due time becomes his description of the phonemic system which has come to exist inside his own nervous system.

We can now see why local maxima of probability-density, determinable by acoustics without the help of the linguist, will show some correlation with the linguist's allophones. If there were no such correlation, the process of learning a language, and equally the linguist's problem of phonemicizing a language, would be impossible. That is why the curve of probability-density, in the simplified one-dimensional case of the diagram, was drawn with two local maxima.

The above considerations, if valid, define the problem of acoustic phonetics: accepting the phonemic findings of the linguist and the acoustical findings of spectrograms or oscillograms, it is the task of acoustic phonetics to determine and describe the very complicated nature of the two transductions which relate the two sets of findings.

14 Language 58 fn. 16 (1921).


possible sequences of this kind being allowable in the system. Given equiprobability, the system is capable of transmitting on an average 10 binits of information per message. Call this binary code (a).

Now suppose that we decide, quite arbitrarily, to replace a random selection of the occurrences of the symbol '1' in these messages by the symbol 'I'. The message which in code (a) would appear always as '1211212221' will now be transmitted in that same shape, or alternatively in the shape 'I211212221', '12I1212221', 'I2112I222I', and so on. This modification gives us code (b). Nothing essential in code (a) has been changed; the entropy per message is the same as before; but we have a case of what the linguist calls 'free alternation'. That is, '1' and 'I' are physically distinct ('phonetically different', different 'allophones'), but do not contrast. Yet if one began the study of a system which used code (b) by examining sample messages in it, one's first impression would be that the repertory contained three different symbols rather than two. Only statistical analysis would show that the code was essentially binary rather than ternary. The linguist would conclude, in due time, that '1' and 'I', though 'phonetically' different, were 'phonemically' the same: in free alternation, and 'phonetically similar' in that the shapes '1' and 'I' resemble each other more than either of them resembles '2'.

Code (b') is formed from code (b) by using the shape '3' instead of the shape 'I'; the sample message given above might now be transmitted as '3211232223', '1213212221', and so on, all equivalent. With respect to information, this is still an inessential change, and the code is fundamentally binary. But the linguist at this point would balk at phonemically identifying '1' and '3', despite the fact that they are in free alternation, since the shapes '1' and '3' do not resemble each other any more than either of them resembles the shape '2'. The factor of 'phonetic similarity' is lacking.

Next, suppose we keep three physically distinct symbols, '1', '2', and 'I' for code (c), '1', '2', and '3' for code (c'), but set up an instance of complementary distribution instead of free alternation. Starting with the messages of code (a), we replace '1' respectively by 'I' and '3' wherever in code (a) '1' is immediately followed by '2',

but otherwise keep the symbol-shape '1'. Once again, there is no essential change in the code from an information point of view. But in code (c) the linguist will note the complementary distribution and the phonetic similarity, and will call '1' and 'I' allophones of a single phoneme; while in code (c') the presence of the former and absence of the latter will lead him to set up three phonemes, despite the lack of contrast between two of them.

Finally, we devise codes (d) and (d') by a rather different step, based on messages as they appear in codes (c) and (c') respectively: we delete from the messages all occurrences of 'I' or '3' which are immediately preceded by the shape '1', but retain those occurrences of 'I' or '3' which are initial in a message or which are immediately preceded by '2'. For example:

code (a)    1221221112    1121211121
code (c)    I22I2211I2    1I2I211I21
code (c')   3223221132    1323211321
code (d)    I22I22112     12I21121
code (d')   322322112     12321121
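The replacements just described are mechanical enough to be checked by machine; a short sketch (plain Python written only to verify the example, not anything from the review) derives codes (c) and (d) from the two sample code (a) messages and confirms that code (a) is recoverable:

```python
def to_code_c(msg):
    """Replace '1' by 'I' wherever the '1' is immediately followed by '2'."""
    out = []
    for i, ch in enumerate(msg):
        nxt = msg[i + 1] if i + 1 < len(msg) else ""
        out.append("I" if ch == "1" and nxt == "2" else ch)
    return "".join(out)

def to_code_d(msg_c):
    """Delete every 'I' immediately preceded by '1'; keep an 'I' that is
    initial or immediately preceded by '2'."""
    out = []
    for i, ch in enumerate(msg_c):
        if ch == "I" and i > 0 and msg_c[i - 1] == "1":
            continue
        out.append(ch)
    return "".join(out)

def recover_code_a(msg_d):
    """Wherever '12' occurs insert an 'I' between the two digits, then turn
    every 'I' back into '1'."""
    expanded = msg_d.replace("12", "1I2")
    return expanded.replace("I", "1")

for message in ("1221221112", "1121211121"):
    c = to_code_c(message)
    d = to_code_d(c)
    print(message, "->", c, "->", d, "-> recovered:", recover_code_a(d))
    assert recover_code_a(d) == message
```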


The messages of code (d) or (d') stand in one-to-one relation to those in code (c) or (c'), and hence to those in code (a). The latter can be recovered from those in (d) or (d') by operating on them as follows: wherever the sequence '12' occurs, insert an 'I' (or a '3') between the '1' and the '2'; then replace all instances of 'I' (or '3') by '1'. In (d) and (d') the ENTROPY PER MESSAGE is the same as in code (a). However, neither the information theorist nor the linguist would claim that codes (d) and (d') are identifiable with code (a) in quite the same way as codes (b) and (c). In (d) and (d'), the ENTROPY PER SYMBOL is different from that in (a); one has here a real ternary code, in which there are considerable restrictions in sequence, restrictions which reduce the overall entropy of the code to that of a binary code of type (a). From the linguistic point of view, '1' and 'I' are in contrast in (d), and '3' and '1' in (d'), because there are such initial sequences as 12... versus I2... (or 32...), and such medial sequences as ...212... versus ...2I2... (or ...232...).

Long before we get this far, the communications engineers will have asked us: Why bother to make such replacements and introduce such complications? Why not stick to just one shape for each symbol in the repertory? The answer to this is clear from the first paragraph of this subsection: such complications, arbitrary though they may be, are of the sort encountered by the linguist as he examines languages; they are not invented by him for the pleasure of complexity, but simply have to be accepted. When the linguist works with a language, what he finds accessible to study is in the first instance a body of messages exhibiting a large variety of such complications and limitations of sequence. His task, as phonemicist, is essentially to determine the precise nature of the allophonic units, the nature of the interrelations between them (free alternation, complementation, contrast, and the like), and in these terms to ascertain the various statistical restraints on sequence. So much, and only so much, is his LINGUISTIC task. He may do this in terms of allophones (cells in the 6-space described above), or he may break allophones down into simultaneous components and regard them as his fundamental units (equivalent to taking the coordinates of the local maxima of probability-density in the 6-space and handling those in each dimension as separate units).17

Essentially irrelevant orthographic considerations come in as soon as the linguist proceeds to the point of phonemic identification. When, in code (b) or (c), the linguist says that '1' and 'I' are 'the same phoneme', he is simply summarizing the following facts: '1' and 'I' are, within the system, phonetically similar; they do not contrast; messages in this code can therefore be transduced, in a manner that preserves the entropy, into a code with one fewer symbol; in devising a writing-system one can make use of this fact and eliminate a symbol needed earlier. When, for code (b') or (c'), the linguist refuses to make a similar statement, he is reflecting an aspect of the meaning of the term 'phoneme' which

17 As described in the parenthesis, the result would be acoustic componential analysis. What we normally do is to change coordinates in such a way as to get articulatory components. The properties of the system are in theory invariant under any such change of coordinates, no matter how complex; if in practice they are not, it is because we do not yet understand well enough how to change the coordinates. This is part of the task of acoustic phonetics.


is irrelevant with respect to information-the requirement of phonetic similarity-and is therefore simply choosing a different terminology to report the remaining facts of the case. Some linguists would be tempted to 'phonemicize' code (d) by saying that where overtly one has the allophonic sequence '12' there is 'really' a variety-a zero alternant-of 'I' between them; to base a writing-system on this consideration would clearly be feasible in an unambiguous way, but within phonemics such a step is not valid.

The communications engineer is right in not understanding fully what linguists mean by phonemics, for we linguists have been fairly muddy in our phonemic thinking. The establishment of phonemic units can be rendered relatively non-arbitrary by accepting the criteria of phonetic similarity and of contrast versus no contrast, and by preferring that otherwise valid phonemicization which maximizes average entropy per symbol. But the selection of these criteria is itself arbitrary. A redefinition of the aims and procedures of phonemic analysis along the lines suggested above, and a clearer segregation of purely orthographic considerations, is a desideratum.

C. The Entropy and Redundancy of Speech

Speech, examined linguistically, is discrete, but it is not necessarily linear. In telegraphy, which is linear, if two signal-units are to be transmitted, one of them can be sent either before or after the other, but there is no third alternative. In speech there is a third possibility: the signal-units may be simultaneous. This arrangement is possible, of course, only for certain phonemic units relative to certain others, not freely for all, even if we go to the extreme of componential analysis and say that at practically any time more than one phonemic component is being transmitted. Nevertheless, this greatly complicates the problem of measuring the entropy of speech. The mathematical frame of reference worked out by Shannon for such measurements can be applied only if we interpret all simultaneous bundles of components, or of short components and temporal segments of long components, as distinct phonological units, a procedure which does indeed portray speech, phonologically, as linear. In English, for example,

/á/ (the vowel with loud stress), /â/, /à/, and /a/ would thus have to be interpreted as four distinct units, rather than as one vowel and four different accompanying and simultaneous stresses; and this set of four would have to be multiplied by four again to account for differences in phonemic tone. Such a procedure is obviously highly inefficient for most linguistic purposes.

An alternative is to modify Shannon's mathematical machinery so as to take care of a set of several linear sequences of symbols transmitted in parallel, where there are statistical restraints not only among those in the same linear sequence, but also between those in one linear sequence and those in others. I have no idea how complicated the necessary mathematical machinery might be, but I suspect that it would be very complicated indeed.

In the face of these difficulties, it may seem absurd to give any figure at all for the entropy of speech; we shall nevertheless state that the entropy of English, at normal conversational speed, seems to be very roughly in the neighborhood of 50 shannons. This figure may be off by as much as one hundred per cent, and


is more likely to be an overestimate than an underestimate. For our immediate purpose this rather gross inaccuracy does not count for much, since we want to compare the entropy of speech, analyzed phonemically, with the capacity of the channel used by the speech signal. This channel is a bandwidth of acoustic frequencies; if fully utilized, it could carry 50,000 shannons.18

The discrepancy is astonishing. Neglecting noise, it would imply a relative entropy of the source of only 0.1%, a redundancy of 99.9%. This would reveal human speech as one of the least efficient communicative systems in existence. But there are other factors in the situation which render it a bit less striking.

A speech signal carries more information than just that imposed on it by the phonological structure of what the speaker is saying. Some of this information serves to identify the speaker, since we do manage somehow to tell people apart by their voices. Some of it tells the hearer about the mood or state of health of the speaker-whether he is angry or contented, whether or not he has a spring cold.19 Of course, linguistically relevant portions of the speech signal may also carry information, indirectly, about all of these matters, but that is part of the 0.1% and does not concern us. In the normal face-to-face situation of speech communication, there is a good deal of interchange of information which is not carried by the speech-signals at all, but by the continuous train of socially conditioned bodily movement and gesture which both accompanies speech and goes on during silence.20 If we could measure the capacity of this channel-for certainly it is one-and add that to the outside capacity of the channel of the speech signal, the relative figures would be even greater.

No one knows how much of the capacity of the speech signal is taken up by metalinguistic and speaker-identification information, but it may be a good deal. It is for all these reasons that the linguist has very little useful advice for telephone and radio engineers. Their job is to deliver the whole speech signal, with as high a fidelity as the public wants, or as high as is possible in terms of the cost the public will put up with. Measurement of fidelity has to be made psychoacoustically in terms of the whole speech-signal, not just in terms of its linguistic content; it may be important to be able to tell over the telephone that someone has a cold in the head.

Furthermore, language sometimes must operate under extremely noisy conditions. The high linguistically relevant redundancy of the speech signal can be interpreted not as a sign of low efficiency, but as an indication of tremendous flexibility of the system to accommodate to the widest imaginable variety of noise conditions. And here we mean not only 'channel noise', which is noise in Shannon's sense, but 'semantic noise', discrepancies between the codes used by

18 R. M. Fano, The information theory point of view in speech communication, Jour. Acoust. Soc. Am. 22.691-6, esp. 694 (1950).

19 A certain proportion of the articulatory and auditory but nonlinguistic milieu of linguistic signalling is highly organized and culturally transmitted. This portion constitutes what G. L. Trager and H. L. Smith Jr. take as the object of METALINGUISTIC study; see now Smith, An outline of metalinguistic analysis (1952).

20 The significance of gesture has traditionally been underestimated, and its 'naturalness'-the extent to which various cultures agree on gestures and their meanings-has been vastly overestimated. See R. L. Birdwhistell, Introduction to kinesics (1952).


transmitter and receiver, the kind of noise despite which we often understand someone with a speech-pattern widely different from our own.21

It is worth while also to consider the ratio of the amount of information which can be carried by any one phonemic contrast in a language, given the statistical structure of the whole phonemic system, to the total entropy of the phonemic system. Different contrasts obviously carry different amounts of 'functional load'; just as obviously, no single contrast ever carries any very high proportion of the whole load. The redundancy of a phonemic system is so high that most of the time a hearer need receive accurately only a small percentage of the phonemic units transmitted by the speaker, in order to reconstruct the whole message. This bears on the problem of phonetic change. Any single contrast in a phonemic system can be lost, by phonetic change, without the speakers' being any the wiser. This also militates against Hoenigswald's theory that coalescence of phonemes can be brought about only by dialect borrowing.22
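A number can be attached to 'functional load' in the spirit of this paragraph, though the review itself offers no formula; the sketch below (with an invented toy text) takes the load of a contrast to be the share of the entropy that disappears when the two phonemes in question are merged:

```python
import math
from collections import Counter

def entropy_per_symbol(text):
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# An invented toy "phonemic text"; the p/b contrast is artificially heavy here,
# so its share comes out far larger than it would for a real language.
text = "pat bat pad bad tap tab pap bab " * 25

H = entropy_per_symbol(text)
# Merge /p/ and /b/: every 'b' rewritten as 'p', wiping out the contrast.
H_merged = entropy_per_symbol(text.replace("b", "p"))

functional_load = (H - H_merged) / H
print(f"entropy {H:.3f}, after merging p/b {H_merged:.3f}, "
      f"functional load of the contrast about {functional_load:.1%}")
```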

D. Phonology and Tactics

When the linguist goes beyond the phonological level to the tactical (or 'grammatical' or 'morphemic') level, he finds another way in which to regard utterances as discrete messages, the units in this case being not phonemes but morphemes. It is by no means certain that calculation of the entropy of speech in terms of morphemes will give the same results as calculation in terms of phonemes-though it is certain that the computation is vastly more difficult.

There is a way to approach the relationship between tactical pattern and phonemic pattern which may be physically and physiologically meaningless (though not necessarily so), but which nevertheless has some utility. This is to

say that, just as the external speech-signal represents a continuizing transduction of an internal discrete phoneme flow, so this phoneme flow itself represents a transduction of an even deeper morpheme flow. The morphemes of a language, in this portrayal, cease to be CLASSES of morphs, and become rather message-units on this deeper level which are REPRESENTED by morphs on the phonemic level. The morphophonemics of a language is then a set of rules for transducing morpheme-sequences to phoneme-sequences. And in the hearer, after the more superficial quantizing of the arriving speech-signal into a discrete phoneme flow, the morphophonemic principles of the language are applied backwards to the phoneme flow to recover a morpheme flow.

To make this concrete, let us imagine a unit called a tactics box in the brain. This tactics box passes through a series of states. Each passage from one state to another is accompanied by the emission of a morpheme, and the new state depends both on the old state and on the morpheme emitted, as well as on two additional factors to be mentioned presently. When the box is in any given

21 Semantic noise is discussed very briefly by Weaver towards the end of his paper in the volume under review. The reviewer's paper An approach to the quantification of semantic noise, Philosophy of science 19.257-60 (1952), though set up in terms of a highly oversimplified model, perhaps shows how communication can take place despite failure to agree completely on code-conventions.

22 See for example his review of Hall's Leave your language alone, Classical weekly 42.250 (1949).


state, there are various probabilities that it will pass next to each of the various other states, and thus the same probabilities that it will emit next each of the various morphemes which constitute its repertory. Insofar as these probabilities are determined by previously emitted morphemes, they constitute the tactical structure of the language. The emitted stream of morphemes gets encoded into a stream of phonemes; there is delay at the transducer which does this, since the proper stream of phonemes for a given string of morphemes often depends on several successive morphemes, not just on one (wife is encoded into /wayv/ if the next morpheme is the noun-plural -s, and -s is encoded into /z/, rather than /s/, when the preceding morpheme is wife). The stream of phonemes is smeared by articulation into a speech signal; this, entering someone else's ear, is quantized again into a stream of phonemes, and then decoded into a stream of morphemes, which is fed into the tactics box of the hearer. For a tactics box is a combined source and sink; the impact of incoming morphemes is a third factor conditioning the sequence in which the box passes from one state to another. We can add the specification that on some occasions emitted morphemes are shunted directly back into the emitting box, or are converted into phonemes and then decoded back into morphemes in the same brain, instead of breaking all the way out in the form of overt speech; this is 'thinking in words'.

The last factor that conditions the probabilities of change of state in the tactics box is all that goes on in the rest of the central nervous system: the constant feeding in of a stream of perceptions, the retention of some of these, the reorganizing of others, the delaying of still others. We can say that the conditioning of the tactics box by this factor is the SEMANTICS of the language.

Since there is currently no way in which all this can be disproved, it does not qualify as a scientific hypothesis; it is merely a terminology. It should be apparent, however, that as a terminology it affords us a way to bring existing techniques in linguistics and existing methods in communication theory jointly to bear on the workings of language.
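The wife-plus-plural example lends itself to a small sketch of the morphophonemic transduction; the shapes and rules below are an invented toy fragment (keyed, for simplicity, to the final phoneme rather than to the specific morpheme wife) and are not a claim about English morphophonemics in general:

```python
# A toy fragment of morphophonemics: morphemes in, phonemes out.
# Base shapes and rules are invented for illustration only.
BASE_SHAPES = {"wife": "wayf", "knife": "nayf", "cat": "kaet", "dog": "dog"}

def encode(morphemes):
    """Transduce a morpheme sequence into a phoneme string, looking ahead
    one morpheme to pick the right alternant."""
    phonemes = []
    for i, m in enumerate(morphemes):
        nxt = morphemes[i + 1] if i + 1 < len(morphemes) else None
        if m == "PLURAL":
            # -s comes out as /z/ after a voiced segment, /s/ after a voiceless one.
            last = phonemes[-1][-1] if phonemes else ""
            phonemes.append("z" if last in "bdgvmnlraeiouwy" else "s")
        else:
            shape = BASE_SHAPES[m]
            # wife, knife: final /f/ becomes /v/ when the plural follows.
            if nxt == "PLURAL" and shape.endswith("f"):
                shape = shape[:-1] + "v"
            phonemes.append(shape)
    return "/" + "".join(phonemes) + "/"

print(encode(["wife", "PLURAL"]))   # /wayvz/
print(encode(["cat", "PLURAL"]))    # /kaets/
print(encode(["dog", "PLURAL"]))    # /dogz/
print(encode(["wife"]))             # /wayf/
```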

E. Immediate Constituents

Usually in tactical analysis (less often in phonological) linguists make use of the procedure of immediate constituents. Sometimes this is regarded merely as a way of attaining descriptive statements in economical form; sometimes it is regarded as an objective factor in linguistic structure which must be ascertained in the course of analysis. In either view, there would appear at first sight to be nothing in information theory to parallel it. As we shall show, however, there is.

At various points in an utterance in the course of transmission, say at the ends of successive morphemes, the degree of indeterminacy as to what may come next can in theory be computed. The indeterminacy is greater if the number of possible next morphemes is greater; with a given number, the indeterminacy is greater if
the probabilities are nearly equal, less if they diverge widely from equality. Now generally in current practice, and universally in a theoretically possible optimum procedure, a linguist makes his primary cut of a composite form at that point where the indeterminacy is greatest. The form red hats (we ignore suprasegmental features) consists of three morphemes, with a cut between red and hat
and another between hat and -s. It would seem that the indeterminacy after red is greater than that after red hat; certainly it is if we regard the singular form hat as involving a morpheme of zero shape 'singular', since in that case there is only a small handful of morphemes which can immediately follow red hat-: this singular morpheme, or -s, or -ed (red-hatted), or perhaps one or two others. Some forms, it may be, are not actually so cut, because the pressure of their similarity to large numbers of forms where the cutting is unambiguous may lead us to go counter to this first and fundamental criterion. Thus there is probably greater indeterminacy after the hermetic- of hermetically sealed than there is after the whole first word, but we would all choose to cut first between the words.

Shannon has conducted experiments in ordinary English orthography,23 and the reviewer has conducted similar ones, with the proper audiences, in terms of phonemic notation, the results of which bear on the stated correlation between IC-analysis and information theory. One decides on a sentence which is totally unknown to the audience, and writes it down. Then one has the audience guess - without any hints - the first letter (or phonemic symbol) of the sentence, and records the number of guesses made, up to and including the right guess. Then the second letter (or phonemic symbol) is guessed, and the third; spaces in the orthographic form, and open junctures in transcription, count as symbols and have to be guessed along with the others. As might be imagined, the number of guesses necessary for each successive symbol varies considerably; this number decreases sharply within the body of a word or morpheme, and in general increases when any word boundary or morpheme boundary is reached. And one can discern some tendency for a larger number of guesses to be required at some cuts between morphemes or words than at others; the greater number usually correlates with a more elementary cut between ICs.
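This guessing procedure can itself be mechanized. In the sketch below the audience is replaced by a simple statistical guesser: it ranks candidate symbols by how often they followed the same short context in a sample text, and the number of guesses needed for each successive symbol of a target sentence is recorded. The sample text, the context length, and the target are all invented for the purpose; they are not the materials of the experiments just mentioned.

    from collections import Counter

    SAMPLE = 'the red hats and the red hens were hermetically sealed '

    def ranked_guesses(context, history=SAMPLE, order=2):
        """Candidate symbols, most plausible first, judged by what followed
        the same short context in the sample text; unseen symbols come last."""
        key = context[-order:]
        follows = Counter()
        for i in range(len(history) - len(key)):
            if history[i:i + len(key)] == key:
                follows[history[i + len(key)]] += 1
        seen = [s for s, _ in follows.most_common()]
        unseen = sorted(set(history) - set(seen))
        return seen + unseen

    def guess_counts(sentence):
        """Number of guesses needed for each successive symbol, spaces included."""
        counts = []
        for i, symbol in enumerate(sentence):
            guesses = ranked_guesses(sentence[:i])
            counts.append(guesses.index(symbol) + 1 if symbol in guesses
                          else len(guesses) + 1)
        return counts

    print(list(zip('red hats', guess_counts('red hats'))))

So small a sample will not reproduce the effect reliably; with a larger body of text and longer contexts the counts would be expected to fall within words and to rise at word and morpheme boundaries, as described above.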
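The correlation with IC-analysis can be stated just as directly. At each point between morphemes one computes the indeterminacy of the next-morpheme distribution (the negative of the sum of p log2 p, in binits) and makes the primary cut where that quantity is greatest. In the sketch below the distributions are invented merely to mirror the red hats example; in any real application they would have to be estimated from a corpus.

    from math import log2

    # Hypothetical next-morpheme distributions at two points in 'red hats'.
    NEXT_MORPHEME = {
        'red':     {'hat': 0.2, 'coat': 0.2, 'hen': 0.2, 'wine': 0.2, 'tape': 0.2},
        'red hat': {'-s': 0.5, '-SINGULAR': 0.4, '-ed': 0.1},
    }

    def indeterminacy(dist):
        """Entropy -sum(p * log2 p): larger when there are more alternatives,
        largest when the alternatives are equally probable."""
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def primary_cut(form):
        """The point of greatest indeterminacy is taken as the primary cut."""
        points = {after: indeterminacy(dist)
                  for after, dist in NEXT_MORPHEME.items()
                  if form.startswith(after)}
        return max(points, key=points.get)

    for after, dist in NEXT_MORPHEME.items():
        print('after %r: %.2f binits' % (after, indeterminacy(dist)))
    print('primary cut of red hats falls after %r' % primary_cut('red hats'))

On these invented figures the point after red carries about 2.3 binits of indeterminacy and the point after red hat- only about 1.4, so the primary cut falls between red and hats, as in the analysis above.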

F. Writing

When one writes, one transduces a linguistic message into a different form; reading is the inverse transduction. In some writing systems (Chinese) the linguistic message is taken in morphemic shape for such transduction; in others (Finnish), it is taken in phonemic shape; in most - including actually both of the
extremes named - both elements are involved in various proportions.24

Most traditional writing systems provide no machinery for the indication of certain relevant features of speech. In English writing, for example, there are no conventions for indicating stresses and intonations. The information carried in speech by stresses and intonations is therefore either lost in the transduction to writing, or is carried by the use of more morphemes of other types.

23 Shannon, Prediction and entropy of printed English, Bell System technical journal 30.50-65 (1951).
24 This is worked out in more detail in the reviewer's paper for the Third Annual Conference on Linguistics and Language Teaching (Georgetown University), to be published soon.

A short count seems to indicate that about one-sixth of the morphemes of English are lost in the act of writing in just this way. This was determined by counting the number of morphemes graphically indicated in several passages of written English, then counting the number which occurred when the same passages were read aloud. The specific morphemes added in reading aloud may not match the ones spoken by the writer preparatory to writing, but certainly the number of occurrences of such morphemes must be about the same.
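The arithmetic behind the one-sixth estimate is elementary; the sketch below uses invented tallies simply to make the ratio explicit.

    def fraction_lost(written_morphemes, spoken_morphemes):
        """Fraction of the morphemes of the spoken version not recorded in writing."""
        return (spoken_morphemes - written_morphemes) / spoken_morphemes

    # e.g. a passage whose reading aloud yields 120 morpheme occurrences,
    # of which only 100 are graphically indicated (figures invented):
    print(round(fraction_lost(100, 120), 2))    # 0.17, roughly one-sixth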

The fact that written English is intelligible despite this loss, and that it can often be read aloud in a way which restores the lost morphemes with reasonable accuracy, implies, of course, a degree of redundancy in spoken English that we already know to be there. Nevertheless, not all passages of written English are intelligible. Given a writing-system which forces such loss, a good WRITING STYLE is definable as one which compensates for the loss by the use of more segmental morphemes. All of us have seen, particularly in newspapers, passages which were not in good writing style.

The devising of utterances in a good writing style to be transduced into written form finds its analog in many other communicative situations. In preparing a written message for telegraphic transmission, for example, we refrain from any significant use of the contrast between lower-case and capital letters, since that distinction is not maintained in the transduction to telegraphic signal. The general at the teleconference of our first section imposed the same restraint on what he wrote, for the same reason. When it is necessary to transmit via telegraph or teletype a chemical formula or a mathematical equation, the task cannot be done by direct transduction; instead, we do the same thing that must be done in sending information about a building-plan or the like: we phrase and transmit a DESCRIPTION of the formula, equation, or building-plan, on the basis of which the destination can construct a more or less reasonable facsimile. Compare with this the difference between a dialog in a novel and on the stage: the novelist writes, 'You can't do that!' he rasped; the actor on the stage omits the last two words but rasps as he delivers the rest of it.

GENERAL IMPLICATIONS

We have demonstrated that a certain number of problems of concern to linguists can be phrased in the terminology of information theory. We have not proved that such rephrasing makes for added clarity, or leads to the posing of
relevant new questions. It is always challenging to discover that a systematization will subsume a larger variety of phenomena than those for which it was in the first instance devised; the discussion in the last section implies not only that information theory is applicable to language, but also that linguistic theory is applicable, with certain restrictions or modifications, to other varieties of communicative behavior. But in our exhilaration we must always guard against misleading analogy and the invalid identification of concepts because the words for them are identical. Otherwise we may think we have obtained genuinely new knowledge when all we have done is to formulate old knowledge in a new terminology.

'Information' and 'meaning' must not be confused. Meaning might be said, in a sense, to be what information is about. For the relatively simple communicative systems that Shannon handles, it is easy to introduce a realistic definition of the meaning of a signal or a portion of a signal: the meaning of a stretch of signal is the stretch of message which was transduced into it and into which, at
the receiver (barring noise), it will be retransduced. In telegraphy the meaning of two dots is the letter I. The meaning of the English written word 'man' is the spoken form man. The meaning of the speech-signal produced when someone says man is the sequence of phonemic units which compose that word linguistically. The meaning of the sequence of phonemic units is a certain morpheme. But if we inquire into the meaning of the morpheme, information theory cannot help us. Information theory does not deal with the way in which a source maps a noncommunicative stimulus into communicative output, nor with the way in which a sink maps communicative input into a noncommunicative response. Precisely this is the fundamental problem of semantics, and on this score the speculations of linguists such as Bloomfield or of psychologists such as Weiss have much more to tell us than information theory. It is possible that these speculations could afford the basis for an expansion of information theory in valuable directions.

There is a partial analogy between information and energy, which extends also, as a matter of fact, to money; the tendency is strong to make more of this analogy than is justified. Energy-flow is POWER; information-flow is ENTROPY; money-flow (at least in one direction) is INCOME. Energy is measured in ergs or watt-seconds or kilowatt-hours, power in watts or kilowatts; information is measured in binits, entropy in shannons; money is measured, say, in dollars, income in dollars-per-month. In all three cases it is, in a sense, the rate of flow that counts; energy, information, and money are substantialized (the last actually in the form of pieces of metal or paper, the other two only in words) primarily because we find it easier to think about problems in that way - perhaps because we think in languages of the type that Whorf called Standard Average European.

But there is a law of conservation of energy, while there is no law of conservation of information (I cannot speak for money). At this point the parallelism breaks down. Proof is easy. To supply one hundred-watt light bulb, a generator must transmit one hundred watts of power, plus a bit more to make up for line-loss. To supply ten such bulbs, the generator must transmit ten times as much power. If all the bulbs are turned off, the generator is forced to cease transmitting power - either it is turned off also, or it burns itself out. To supply a receiver with one hundred shannons, a source must transmit information at that rate, plus enough more to counteract noise. But to supply ten receivers with one hundred shannons each, the source need not increase its entropy at all (unless the hook-up produces more noise to counteract). The entire output, minus that lost through noise, reaches each receiver. And if all receivers are turned off, the source can continue to produce and transmit at the same rate. The information in this case does not dissipate (as energy might in the form of heat); it simply disappears. We have all had the experience of continuing to talk over the telephone after the person at the other end of the line has hung up.25

25 The contrast between energy and information appears in biological study: the physiologist is concerned primarily with energy-flow, the psychologist (even the physiological psychologist) primarily with information-flow.

This defect in the analogy has proved uncomfortable for some investigators, who have also, perhaps, been misled by the homophony of 'information' as a technical term and 'information' in everyday speech. One writer tries to differentiate between ABSOLUTE INFORMATION, 'which exists as soon as one person has it, and should be counted as the same given amount of information, whether it is known to one man or to millions', and DISTRIBUTED INFORMATION, 'defined as the product of the amount of absolute information and the number of people who share that information'.26 Whatever validity there may be in the distinction, neither of the items distinguished can be identified with Shannon's information, and Shannon's work affords no method for quantifying either. If it is necessary to maintain some analogy between an information-system and a power-system, then entropy can better be compared to voltage, since a single generator can supply current at a specified voltage to any number of outlets.

It helps in avoiding this particular error to note that a transducer is a kind of complicated trigger. When a marksman aims and fires a pistol, the energy that he must expend bears no necessary relation to the amount of energy produced by the burning powder in expelling the bullet. When the operator depresses a key on a teletype transmitter, the energy that he uses is not the energy which is transmitted along the wire, and bears no necessary quantitative relation to this. The human brain operates on a power-level of five watts; such a brain guides a derrick which lifts a stone weighing tons. In micrurgy, on the other hand, the operator expends much more energy on his apparatus than it expends on its object.

The distinction between trigger action and direct action is of fundamental importance at least in human life, possibly in the entire physical universe, and one line of argument implies that 'communication' is ultimately definable only in terms of trigger action. In the world of man, the trigger effect of language is too obvious to need discussion. Human artifacts can be classed into tools, which involve no triggers or transducers, and machines, which do. Some artifacts are used for SENSORY PROSTHESIS: telescopes and microscopes are tools for sensory prosthesis, while radar, electron-microscopes, Geiger counters, and the like are machines. Other artifacts are used for MOTOR PROSTHESIS: spades, shovels, wrenches, bows and arrows are all tools, while steam shovels, firearms, and draft animals are machines. Still other artifacts are used for COMMUNICATIVE PROSTHESIS: language itself (or rather, the vibrating air which is the artifaction produced by speech) is a tool, while writing, smoke-signals, and electronic apparatus of various kinds (including mechanical computers) are all machines. This ties in with White's notion of measuring human evolution in terms of the increasing amounts of energy controlled and utilized by human beings.27 It is clear that human tools developed before human machines, and the simpler machines before the more complex.

26 L. Brillouin, Thermodynamics and information theory, American scientist 38.594-9, esp. 595 (1950).
27 L. A. White, Energy and the evolution of culture, American anthropologist 45.335-56 (1943). White gives credit in turn to Morgan.

A very late development is the practice of coupling a device for sensory prosthesis, directly or through one for communicative prosthesis, to a device for motor prosthesis, so as to produce an apparatus which will perform as it is supposed to perform without human participation - e.g. an electric refrigerator or a radar-controlled anti-aircraft battery. On the level of human understanding, much of man's progress has consisted of a slow clarification as to what can be triggered, and by what means. If you can trigger a fire into cooking your meat, why can't you trigger the sky into giving you rain? The difference between the rites of a shaman and the seeding of clouds with dry ice is that the latter sometimes triggers a release of rain, whereas the former never does. All of this has an important inverse: a deeper understanding of man's role in the evolution of the universe.

The argument which attributes such cosmic significance to communication is based on an identification which may be as fallacious as that of Shannon's 'information' and the common-vocabulary term 'information'. This is the assumption that the entropy of communication theory is physically the same thing as the entropy of thermodynamics. The latter is a measure of the degree of randomness of energy-distribution in a closed physical system; the second law of thermodynamics states that in any such system the entropy increases, by and large, until it is maximum. If information-theory entropy is actually the same thing, rather than something else which can be handled by the same mathematical machinery, then the transfer of information from a physical system represents a local decrease in entropy - an increase in orderliness and pattern. Since the only completely closed physical system in the universe is the universe itself, local decreases in entropy do not controvert the second law of thermodynamics. It is none the less valuable to study the mechanisms by which local and temporary decreases are brought about. With Wiener's elegant discussion of this28 we close our review:

A very important idea in statistical mechanics is that of the Maxwell demon. Let us suppose a gas in which the particles are moving around with the distribution of velocities in statistical equilibrium for a given temperature. For a perfect gas, this is the Maxwell distribution. Let this gas be contained in a rigid container with a wall across it, containing an opening spanned by a small gate, operated by a gatekeeper, either an anthropomorphic demon or a minute mechanism. When a particle of more than average velocity approaches the gate from compartment A or a particle of less than average velocity approaches the gate from compartment B, the gatekeeper opens the gate, and the particle passes through; but when a particle of less than average velocity approaches from compartment A or a particle of greater than average velocity approaches from compartment B, the gate is closed. In this way, the concentration of particles of high velocity is increased in compartment B and is decreased in compartment A. This produces an apparent decrease in entropy; so that if the two compartments are now connected by a heat engine, we seem to obtain a perpetual-motion machine of the second kind.

It is simpler to repel the question posed by the Maxwell demon than to answer it. Nothing is easier than to deny the possibility of such beings or structures. We shall actually find that Maxwell demons in the strictest sense cannot exist in a system in equilibrium, but if we accept this from the beginning, we shall miss an admirable opportunity to learn something about entropy and about possible physical, chemical, and biological systems.

For a Maxwell demon to act, it must receive information from approaching particles, concerning their velocity and point of impact on the wall. Whether these impulses involve a transfer of energy or not, they must involve a coupling of the demon and the gas. Now, the law of the increase of entropy applies to a completely isolated system, but does not apply to a non-isolated part of such a system. Accordingly, the only entropy which concerns us is that of the system gas-demon, and not that of the gas alone. The gas entropy is merely one term in the total entropy of the larger system. Can we find terms involving the demon as well which contribute to this total entropy?

Most certainly we can. The demon can only act on information received, and this information ... represents a negative entropy. The information must be carried by some physical process, say some form of radiation. It may very well be that this information is carried at a very low energy level, and that the transfer of energy between particle and demon is for a considerable time far less significant than the transfer of information. However, under the quantum mechanics, it is impossible to obtain any information giving the position or the momentum of a particle, much less the two together, without a positive effect on the energy of the particle examined, exceeding a minimum dependent on the frequency of the light used for examination. Thus all coupling is strictly a coupling involving energy; and a system in statistical equilibrium is in equilibrium both in matters concerning entropy and those concerning energy. In the long run, the Maxwell demon is itself subject to a random motion corresponding to the temperature of its environment, and as Leibnitz says of some of his monads, it receives a large number of small impressions, until it falls into 'a certain vertigo', and is incapable of clear perceptions. In fact, it ceases to act as a Maxwell demon.

Nevertheless, there may be a quite appreciable interval of time before the demon is deconditioned, and this time may be so prolonged that we may speak of the active phase of the demon as metastable. There is no reason to suppose that metastable demons do not in fact exist; indeed, it may well be that enzymes are metastable Maxwell demons, decreasing entropy, perhaps not by the separation between fast and slow particles, but by some other equivalent process. We may well regard living organisms, such as Man himself, in this light. Certainly the enzyme and the living organism are alike metastable: the stable state of an enzyme is to be deconditioned, and the stable state of a living organism is to be dead. All catalysts are ultimately poisoned: they change rates of reaction, but not true equilibrium. Nevertheless, catalysts and Man alike have sufficiently definite states of metastability to deserve the recognition of these states as relatively permanent conditions.

28 Cybernetics 71-3.

A comparative grammar of the Hittite language, revised edition. By EDGAR H. STURTEVANT (and E. ADELAIDE HAHN). (William Dwight Whitney linguistic series.) Vol. 1, pp. xx, 199. New Haven: Yale University Press, 1951.

Reviewed by HOLGER PEDERSEN, University of Copenhagen

This book is a revised and completely rewritten version of a work first published in 1933. In spite of the appearance of E. Adelaide Hahn's name on the title page, the book is wholly by Sturtevant; Miss Hahn is to be the author of a projected second volume, which will treat the syntax of Hittite.1 This first volume contains

1 [Cf. E. Adelaide Hahn, Lg. 28.422 fn. 19.]
