
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-22, NO. 4, JULY 1976

Basic Limits on Protocol Information in Data Communication Networks

ROBERT G. GALLAGER, FELLOW, IEEE

Abstract: We consider basic limitations on the amount of protocol information that must be transmitted in a data communication network to keep track of source and receiver addresses and of the starting and stopping of messages. Assuming Poisson message arrivals between each communicating source-receiver pair, we find a lower bound on the required protocol information per message. This lower bound is the sum of two terms, one for the message length information, which depends only on the distribution of message lengths, and the other for the message start information, which depends only on the product of the source-receiver pair arrival rate and the expected delay for transmitting the message. Two strategies are developed which, in the limit of large numbers of sources and receivers, almost meet the lower bound on protocol information.

I. INTRODUCTION

A data communication network, for the purposes of this paper, is a finite collection of nodes; a finite collection of noiseless two-way communication links, each connecting some pair of nodes; and a finite collection of sources and receivers, each source and each receiver being connected to a node (see Fig. 1). We view each source as being paired with a receiver (typically at a different node), and the purpose of the network is to transmit messages from each source to its paired receiver. Messages are assumed to arrive from a source at randomly chosen instants of time and consist of binary strings of random length.

It seems intuitively obvious that the network must not only transmit the messages from source to receiver but must also transmit a certain amount of control information indicating, for example, the beginning, the end, and the destination of each message. It is customary in data networks to refer to such control information as protocol information and to refer to the conventions for representing it as protocols. To an information theorist, then, a protocol is a source code for representing control information.

Our major objective in this paper is to define and calculate the amount of protocol information (in the information-theoretic sense) required in the type of network described above. We shall find, rather surprisingly, that this information is related solely to the starting and stopping of messages and has nothing to do with addressing. We interpret addresses later as a particular type of source code for representing some of the information about the starting of messages. We shall use our results on protocol information to derive a lower bound on the required transmission rate of binary digits throughout the network. Finally we shall demonstrate some source coding strategies (i.e., protocols) which come very close to our lower bounds.

Manuscript received October 10, 1974; revised August 18, 1975. This work was supported in part by NASA under Grant NGL 22-099-013 and in part by the National Science Foundation under Grant GK-37582.

The author is with the Department of Electrical Engineering and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139.

The following examples provide some idea of the wide variety of ways in which protocol information can be represented in different types of networks.

In a message switching network, each message is generally preceded by a binary encoding of the source and receiver address and of the message length. Messages, with their preceding protocol bits, are queued at the individual nodes and passed from one node to another by a variety of routing algorithms which will be of no concern here.

In a packet switching network, the messages are first divided into packets of some fixed length. Each packet is preceded by an encoding of the source and receiver address and the position of the packet within the message. The final packet, of course, might have fewer message bits than the others, and this information must also be encoded. The packets are transmitted through the network independently and are reconstructed into a message at the receiver's node. The amount of protocol information is somewhat greater than for message switching, but the delay in transmission is frequently reduced.

Another possible system is to use multiplexers on each link of the network and to assign to each source-receiver pair a fixed fraction of the capacity of the links on some path from source to receiver. Frequently in such systems, there is a special string of bits called an idle character, which is used repeatedly when there is no message to be sent, and a special string of bits called a flag, which precedes and follows each message. The messages themselves are slightly encoded to prevent the occurrence of the flag character within the encoded message. The idle characters and flags should be regarded as protocol codewords which together indicate the beginning and the end of each message. Such a system is usually either very inefficient in its use of link capacities or subject to very long queues since it has no flexibility to allocate the network resources to messages as needed. If all the messages from a source are of equal length and if they arrive equally spaced in time, then the multiplexer becomes very efficient and, in fact, it is possible to eliminate all control characters.
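As a concrete illustration of the flag-and-idle convention, the sketch below frames a message HDLC-style, stuffing a 0 after every run of five 1's so that the flag pattern can never occur inside the encoded message. The specific flag pattern, idle character, and stuffing rule are illustrative assumptions; the paper says only that messages are "slightly encoded" to exclude the flag.

```python
# Illustrative flag/idle framing (assumed HDLC-style scheme, not from the paper).
FLAG = "01111110"  # hypothetical flag codeword
IDLE = "11111111"  # hypothetical idle character

def stuff(bits: str) -> str:
    """Insert a 0 after every run of five 1's so FLAG cannot appear inside."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")
            run = 0
    return "".join(out)

def frame(message: str) -> str:
    """A flag precedes and follows each (stuffed) message."""
    return FLAG + stuff(message) + FLAG

print(frame("0111110111111"))  # no run of six 1's survives between the flags
```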

In comparing the multiplexer approach with the message switching or packet switching approach, we see that the former transmits message start information but no addresses, while the latter transmits addresses but no message start information. We shall later see more clearly how message start information and addressing are related.


Fig. 1. Simple model of data communication network. (N denotes node, S denotes source, R denotes receiver.)

It should be clear that our data network model is a highly simplified version of a real data network. We have left out of consideration such questions as how new sources and receivers are connected to the network, how they are paired, and how they establish conventions for interpreting the bit-string messages between themselves. Similarly, we have ignored questions of record keeping, of privacy and security, and of removing malfunctioning nodes or links from the network and of adding new nodes and links. The control messages within the network to accomplish these functions can be, and should be, regarded as ordinary messages within the type of network we have described, with special conceptual sources and receivers within the nodes sending and receiving these messages. In data network parlance, these control functions are generally called higher level protocols, which means simply that they can be regarded as messages within the basic network described. Thus, although protocols to implement these higher level functions are important, these problems are separable from the problems of transmitting messages in our data network model.

We have also ignored the problem of errors on the communication links. These problems can be, and usually are, handled by error detection and retransmission strategies applied to individual links. Aside from some variable delay introduced into the communication links, these problems are essentially separable from those of concern here. Finally, we have excluded the possibility of a source communicating with several receivers and of a receiver getting messages from several sources. We shall see later that these latter situations can be easily incorporated into our model.

The data network model here has been chosen to focus clearly on what we view as the most important aspect of data networks, namely, the sporadic nature of the sources. This sporadic nature of data sources, generating messages occasionally but being idle most of the time, is the major reason for the practical importance of data networks (see, for example, [7] and [1]), but it is also the major cause of their complexity.

The sporadic arrivals of messages, of necessity, generate queues either within the network or at the sources awaiting message entry. These queues introduce both delay and the danger of buffer overflow. Complex flow control protocols are required to prevent buffer overflows, and frequently alternative or dynamic routing is attempted to reduce queueing delays. Alternative or dynamic routing, in turn, introduces a wide variety of protocols needed to get the information required for routing decisions and to handle the attendant problems of lost messages or packets and out-of-order arrivals at the receivers.

The sporadic arrivals of messages and the random lengths of messages also generate the need for message start-stop protocols and/or message addressing, as discussed earlier. These latter protocols are the subject of this paper. We shall see that, as the number of sources at each node in the network grows (with a proportional increase in the link capacities), the queueing delays decrease and hence the need for protocols to handle queueing delays decreases. Thus our results can be interpreted as the limiting amount of protocol required for very large networks.

To our knowledge, this is the first theoretical study of protocol in data networks. We hope to convince the reader that it is indeed a subject amenable to theoretical analysis. The subject is also of practical importance. For example, the Arpanet is an example of a well engineered and researched network. A nonnegligible amount of traffic in this network comes from simple character-at-a-time transmission from interactive terminals, and the amount of protocol to handle such traffic is orders of magnitude greater than the information traffic itself. A more thorough conceptual understanding of protocol issues could undoubtedly give rise to much higher efficiencies in networks and, perhaps more importantly, to reduced complexity.

II. SUMMARY OF RESULTS

In Section III, we start with a very simple situation, included primarily for pedagogic reasons. The network consists of only one link with K sources connected to the first node and K receivers to the second. Source k sends messages only to receiver k (1 ≤ k ≤ K). The sources are all synchronous. When a source is producing a message, it produces one binary digit per time unit. The lengths of all messages are independent geometrically distributed random variables with a mean denoted by 1/ε; the lengths of all idle periods between successive messages from a source are independent geometrically distributed random variables with a mean denoted by 1/δ. We assume that each message digit must be delivered to the appropriate receiver with a fixed delay. It is then shown that the average protocol information per message is

$$\frac{\mathcal{H}(\epsilon)}{\epsilon} + \frac{\mathcal{H}(\delta)}{\delta} \qquad \text{bits} \tag{1}$$

where

$$\mathcal{H}(x) = -x \log_2 x - (1 - x) \log_2 (1 - x).$$
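For a quick numeric reading of (1), the sketch below evaluates the bound for a hypothetical source with mean message length 100 bits (ε = 0.01) and mean idle period 10,000 time units (δ = 10⁻⁴); the function names and parameter values are ours, not the paper's.

```python
import math

def binary_entropy(x: float) -> float:
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def protocol_bits_per_message(eps: float, delta: float) -> float:
    """Lower bound (1): H(eps)/eps + H(delta)/delta bits per message."""
    return binary_entropy(eps) / eps + binary_entropy(delta) / delta

# Mean message length 1/eps = 100 bits, mean idle period 1/delta = 10,000.
print(protocol_bits_per_message(0.01, 1e-4))  # ~ 8.1 + 14.7 = 22.8 bits/message
```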

Furthermore, it is shown that as the number of sources K becomes large, strategies exist for which the average number of binary digits used for protocol approaches the amount in (1) as a lower bound arbitrarily closely. It is also shown that the above strategies work equally well for any distribution on the lengths of messages and idle periods which have means 1/ε and 1/δ, respectively, but that the lower bound can be reduced for other distributions. In other words, the geometric distributions assumed on message lengths and idle periods are of interest not only for their simplicity but also for their extremal character.

In Section IV, we consider an arbitrary network with an arbitrary set of source-receiver pairs. For each source-receiver pair, we assume that the message arrivals are modeled by a Poisson process with rate α (α does not have to be the same for each pair). We assume that the entire message arrives at a single time and that the length of the message is again geometrically distributed with mean 1/ε. It is shown that, if the average transmission delay is d, the transmitted protocol information per message must be at least

$$-\log_2 \left(1 - e^{-\alpha d}\right) + \frac{\mathcal{H}(\epsilon)}{\epsilon}.$$

The first term above is interpreted as message start information and approaches zero as the allowable delay becomes larger. The second term is identified as stop information, or more appropriately as message length information, and does not depend on delay.

In Section V, two strategies are developed for transmitting protocol information. Both of them consist of various ways of queueing the messages and sending them at more regular intervals, thus reducing the required information about starting time. The behavior of these strategies, for very large networks, is shown in Fig. 5.

III. ANALYSIS OF A SINGLE LINK WITH IDENTICAL SYNCHRONOUS SOURCES

We start off with a very simple "network" consisting of two nodes connected by one link. There are K sources connected to the first node, and K receivers connected to the second node. Source k, 1 ≤ k ≤ K, sends messages only to receiver k. Such a network might more appropriately be considered as a simple-minded concentrator. The sources are synchronous with a given basic time unit. When a source is producing a message, it produces one binary digit per time unit; and, when a source is idle, it produces nothing for some integral number of time units. Naturally, the node must be able to distinguish a nothing output from the binary symbol outputs so that, when the source is producing nothing, we say that it produces an idle symbol i.

Assume that each source is an independent Markov source as defined in Fig. 2. The connection of sources and receivers is indicated in Fig. 3. Each transition between states of the Markov chain is described in Fig. 2, first by the probability of that transition and second by the source output corresponding to that transition; ε and δ are arbitrary probabilities but are the same for each source. The two states, B and I, represent the busy state and the idle state, respectively.

Fig. 2. Markov message source.

Fig. 3. Example of synchronous concentrator.

After producing an idle symbol i, the source is in the idle state, and after producing an information symbol of a message (zero or one), the source is in the busy state. The steady state probability of the busy state is δ/(δ + ε), and this is the fraction of time that the source is actually producing message symbols. A message is a string of zeros and ones with at least one idle symbol i on each side. The length of a message (the number of zeros and ones) is a geometrically distributed random variable with expectation 1/ε. Similarly, the length of each idle sequence separating messages is a geometrically distributed random variable with expectation 1/δ. The mean recurrence time between messages is thus 1/ε + 1/δ. Finally, the entropy of the Markov source is easily calculated to be (see [4, sec. 3.6])

$$H_\infty(U) = \frac{\delta}{\delta + \epsilon} + \frac{\delta}{\delta + \epsilon}\,\mathcal{H}(\epsilon) + \frac{\epsilon}{\delta + \epsilon}\,\mathcal{H}(\delta) \qquad \text{bits/time unit.} \tag{2}$$

Since each message symbol contains 1 bit of information and since the source is producing message symbols a fraction δ/(δ + ε) of the time, the first term on the right side of (2) may be interpreted as the entropy of the messages. Similarly, the second term may be interpreted as the entropy of the ending of the message or as the entropy in the length of the message. This information could be usefully employed by the source as part of the message to the receiver and thus could be interpreted as part of the message entropy. However, this specification of message length is traditionally regarded as protocol, and we arbitrarily define it as such here. Finally, the last term in (2) may be interpreted as the entropy in the starting time of a message or the entropy in the length of the idle sequences. Again this information can, and often does, provide useful information to the receiver, but we again define it as protocol. Note that if there is a variable delay in the transmission of messages to the receiver, then this information might be distorted at the receiver.
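A short simulation can sanity-check these quantities. The sketch below runs the two-state source of Fig. 2 and reports the empirical busy fraction and mean message length, which should approach δ/(δ + ε) and 1/ε; the transition-per-time-unit model is our reading of Fig. 2, which is not reproduced here.

```python
import random

def simulate_source(eps, delta, steps=1_000_000, seed=0):
    """Two-state Markov source: Busy emits a message bit, then ends with
    probability eps; Idle emits the idle symbol, then starts a message
    with probability delta."""
    rng = random.Random(seed)
    busy = False
    busy_time = messages = 0
    for _ in range(steps):
        if busy:
            busy_time += 1
            if rng.random() < eps:
                busy = False
        else:
            if rng.random() < delta:
                busy = True
                messages += 1
    return busy_time / steps, busy_time / max(messages, 1)

frac, mean_len = simulate_source(0.1, 0.01)
print(frac, mean_len)  # ~ 0.01/0.11 = 0.0909 and ~ 1/0.1 = 10 bits
```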

Next, assume that each binary digit emitted by each source is to be delivered to the corresponding receiver at some fixed delay d after the source emission time. What this means in effect is that in each time unit each receiver either must receive the binary digit emitted by its source d time units earlier or must receive no binary digit if none were emitted. Implicitly, therefore, whether or not the receiver is interested in the idle symbols, the idle symbols must be reconstructible from the input to the receiver. Since the output of the Markov source, idle symbols and all, must be reconstructible from the receiver input, the average mutual information between a source and its receiver is the same as the source entropy. Since we have K sources that are independent of each other, the average mutual information between the set of sources and the set of receivers is given by K times the entropy of a single source. The channel between the two nodes must have at least this much channel capacity by the converse to the coding theorem. We summarize our results thus far in the following theorem.

Theorem 1: Let K synchronous Markov sources be statistically independent of each other and each be described by Fig. 2. Let the sources be connected to a common channel with K receivers at the other end. In order for each binary digit from each source to be transmitted to the corresponding receiver with a fixed delay, it is necessary for the channel to have a capacity C satisfying

$$C \ge K H_\infty(U) \tag{3}$$

where H∞(U) is the source entropy given in (2).

In order to interpret this result, we define the protocol information overhead ratio η as the ratio of the protocol information to the message information,

$$\eta = \mathcal{H}(\epsilon) + \frac{\epsilon}{\delta}\,\mathcal{H}(\delta). \tag{4}$$

For δ small, this is approximately

$$\eta \approx \mathcal{H}(\epsilon) + \epsilon \log_2 \frac{e}{\delta}. \tag{5}$$

As shown in Fig. 4, a significant fraction of the channel capacity must be used for protocol information when either the expected message length is short (ε large) or the idle sequences are very much longer than the message sequences. We shall return to this interpretation later.

Fig. 4. Ratio of protocol to useful information as a function of ε and δ (horizontal axis: δ = (mean idle period)⁻¹; curves parameterized by ε).

A strategy will now be described for encoding the source outputs at the transmitter node. We shall see that, in the limit of large K, the lower bound to the link capacity in (3) suffices to transmit the required information. The strategy is not intended to be practical, although practical systems could be devised using this approach. Assume that the transmitter node and receiver node each maintain a list, in numerical order, of the sources that are in the busy state at each unit of time. Each time unit, the transmitter node will send encoded protocol information to the receiver node, allowing this list at the receiver to be updated. Assume that each source emits a symbol (zero, one, or i) at each integer time j. Let K_B(j) denote the number of sources in the busy state immediately after the transition at time j.

At time j, the transmitter first encodes the identities of the set of sources which have gone from the busy to the idle state at time j. Each of the K_B(j − 1) sources in the busy state immediately preceding time j independently makes a transition to the idle state with probability ε. Thus this information can be represented by a binary prefix condition source code with an expected codeword length of at most K_B(j − 1)ℋ(ε) + 1 (assume a different source code for each value of K_B). Since the receiver node knows K_B(j − 1), it can decode this source code and remove those sources that have gone idle at time j from its busy list.

The transmitter next encodes the identities of the set of sources that have made a transition from idle to busy at time j. There are K − K_B(j − 1) sources in the idle state immediately before time j, and each independently has probability δ of going to the busy state. Thus the required expected length of a binary prefix condition source code here is at most [K − K_B(j − 1)]ℋ(δ) + 1. Upon receipt of this second codeword, the receiver adds the new busy sources to its busy list, which is now updated to be the list of busy sources after the transition at time j.

Finally, the transmitter node sends the K_B(j) binary digits which have been emitted by the busy sources at time j (this includes both the sources that have just become busy and those that have remained busy). These binary digits are sent in order according to the numbering of the sources (1 ≤ k ≤ K). Since the receiver node has the corresponding numbering of the receivers, it can route these bits, in order, to the receivers corresponding to the busy sources.

The steady state probability of a source being in the busy state is δ/(δ + ε), and thus the expected value of K_B in the steady state is Kδ/(δ + ε). It follows then that the expected number of digits n̄ encoded by the source node in the interval between time j and time j + 1 is

$$\bar{n} = \frac{K\delta}{\delta + \epsilon} + \frac{K\delta}{\delta + \epsilon}\,\mathcal{H}(\epsilon) + 1 + \frac{K\epsilon}{\delta + \epsilon}\,\mathcal{H}(\delta) + 1 = K H_\infty(U) + 2. \tag{6}$$

Note that, when K is large, the extra two bits that appear in this encoding are quite insignificant. Naturally, one could reduce this extra term to one bit by jointly coding the transitions into the busy state and the transitions out of it. One could also employ run length coding for the transitions, at a slight increase in codeword length and a large decrease in complexity.
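The sketch below simulates this list strategy, charging each time unit the ideal expected code lengths K_B(j − 1)ℋ(ε) + 1 and [K − K_B(j − 1)]ℋ(δ) + 1 for the two transition codes plus one bit per busy source, and compares the long-run average with K H∞(U) + 2 from (6). It is a schematic model (entropy-length proxies rather than actual prefix codes), with parameters chosen by us.

```python
import math, random

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x*math.log2(x) - (1-x)*math.log2(1-x)

def average_bits_per_time_unit(K, eps, delta, steps=20_000, seed=1):
    """Charge the ideal expected lengths of the busy->idle and idle->busy
    prefix codes, plus one message bit per busy source, each time unit."""
    rng = random.Random(seed)
    busy, total = set(), 0.0
    for _ in range(steps):
        kb_prev = len(busy)
        total += kb_prev * H(eps) + 1 + (K - kb_prev) * H(delta) + 1
        went_idle = {k for k in busy if rng.random() < eps}
        went_busy = {k for k in range(K) if k not in busy and rng.random() < delta}
        busy = (busy - went_idle) | went_busy
        total += len(busy)          # the K_B(j) message digits
    return total / steps

K, eps, delta = 100, 0.1, 0.01
h_inf = delta/(delta+eps)*(1 + H(eps)) + eps/(delta+eps)*H(delta)
print(average_bits_per_time_unit(K, eps, delta), K*h_inf + 2)  # should agree
```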

It should be stressed that what we are doing here is a little different from just single letter source coding over a set of parallel sources. Each source has memory, and thus the entropy of a single source letter is much greater than the conditional entropy of a source letter given the previous state of the source. The encoding here is encoding the conditional information, which is why both the transmitter and receiver need a list of busy states and why the encoding is virtually as efficient as possible.

In showing that the source outputs can be efficiently encoded into binary digits, we have really come to grips with only half the problem. The other half is that the code is a variable length code, and sending such a code over a fixed capacity channel will generate a queueing problem. Intuition suggests that, if the number of sources is great enough, the variability in codeword length should become negligible. This, of course, is the reason for sharing a channel among many sources; the channel only serves the busy sources, and with enough sources one needs little more capacity than the average rate. The following theorem, which is proved in Appendix A, bears out our intuition in this regard.

Theorem 2: Let the output from K sources, each defined by Fig. 2, be encoded by the strategy just described. Assume that at each time unit, a maximum of KH∞(U) + f(K) bits can be transmitted, where f(K) satisfies

$$\lim_{K \to \infty} \frac{f(K)}{K} = 0, \qquad \lim_{K \to \infty} \frac{f(K)}{\sqrt{K}} = \infty.$$

Assume that the codewords are queued for transmission in a first-in first-out queue. Then the probability that a codeword will not be entirely transmitted within its own time unit approaches zero with increasing K.

Note that, as K gets large, the channel capacity per source required in the theorem approaches H∞(U), which is the minimum possible as shown by Theorem 1.
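A crude Monte Carlo check of the theorem's flavor: using the same schematic model as above and the illustrative choice f(K) = √K (our choice, not the paper's), count how often the bits generated in a single time unit exceed K H∞(U) + f(K). This ignores queue carryover, so it is only a proxy for the theorem's first-in first-out statement.

```python
import math, random

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x*math.log2(x) - (1-x)*math.log2(1-x)

def overflow_fraction(K, eps=0.1, delta=0.01, steps=5_000, seed=2):
    """Fraction of time units whose (ideal-length) codeword exceeds
    K*Hinf(U) + sqrt(K) bits; the fraction should shrink as K grows."""
    rng = random.Random(seed)
    h_inf = delta/(delta+eps)*(1 + H(eps)) + eps/(delta+eps)*H(delta)
    cap = K*h_inf + math.sqrt(K)
    busy, overflows = set(), 0
    for _ in range(steps):
        kb_prev = len(busy)
        bits = kb_prev*H(eps) + 1 + (K - kb_prev)*H(delta) + 1
        went_idle = {k for k in busy if rng.random() < eps}
        went_busy = {k for k in range(K) if k not in busy and rng.random() < delta}
        busy = (busy - went_idle) | went_busy
        bits += len(busy)
        overflows += bits > cap
    return overflows / steps

for K in (50, 200, 800):
    print(K, overflow_fraction(K))
```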

In one sense this theorem is quite fragile. Real sources do not have the statistical regularity that we have assumed (for example, there are busy times of the day), and the number of sources that can share a channel is not effectively infinite. For these reasons, queuing delays are important; we shall return to the question of delay in the next section.

In a number of other senses, Theorem 2 and the coding strategy just described are very robust and depend very little on the detailed behavior of the sources and the channel. Note first that we have tacitly been assuming a noiseless binary channel. One may employ coding for error detection and retransmission or for error correction to transmit the information reliably, but this introduces delay. On the other hand, with many sources sharing one channel, many bits will be transmitted over the channel in each basic time unit, so that the delay required for reliable transmission becomes negligible as the number of sources sharing the channel increases.

Next consider what happens if the sources do not quite obey the statistical characterization of Fig. 2. As long as the fraction of sources which are busy remains close to δ/(δ + ε), and the fraction of sources making a transition to the busy state (or to the idle state) in each time interval remains close to εδ/(δ + ε), then the average codeword lengths are given approximately by (6).

One implication of this is that, if each source has a mean message length of 1/ε and a mean idle sequence length of 1/δ but an otherwise arbitrary distribution of message lengths and idle sequence lengths, the average codeword length for the strategy will not change appreciably. Since we have seen that a strategy designed for (and essentially optimal for) one probabilistic message source model works equally well for a whole class of models, we must conclude that the model of Fig. 2 has an interesting extremal property; it generates the most protocol information of any source model with the same mean message length and mean idle sequence length.

This extremal property of Fig. 2 is, in fact, easily demonstrated analytically. Consider a source model with the alphabet zero, one, and i. Assume that the expected length of runs of the idle symbol (i) is 1/δ and that the expected length of runs of busy symbols (zero and one) is 1/ε. The entropy of the source can be maximized subject to these constraints, and it is seen that the maximum occurs when the idle runs and busy runs are each geometrically distributed, leading to the entropy in (2).

If one is dealing with sources which do not have the Markov property of Fig. 2, then it is possible to design more efficient source codes for those particular sources than the ones we have described here. For example, one frequently deals with sources for which the messages all have a fixed length. One could then omit the transmission of transitions from the busy to the idle state, since the receiver would already know when these transitions had to occur.

IV. THE EFFECT OF DELAY ON PROTOCOL INFORMATION

When messages can be delayed in passing through a network, there is an opportunity to send less protocol information than our results in Section III indicated. First we consider a simple example. Suppose that we modify the encoding strategy of Section III in the following way. We shall allow the transmission of messages to begin only at even numbered time intervals. If a source starts a message at an odd interval of time, then that message is delayed by one time unit before being encoded. For small δ, this reduces the average codeword length for start information (per source per time unit) from [ε/(δ + ε)]ℋ(δ) to approximately [ε/(2(δ + ε))]ℋ(2δ). An equivalent way to view this is that for each message the receiver has approximately one bit of uncertainty as to whether the message from the source started at an odd time or an even time; we are saving protocol by not resolving the intermessage time delay as precisely as before.
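Numerically, the per-message start information ℋ(δ)/δ drops by about one bit under this batching, as the sketch below shows for an assumed δ = 10⁻³ (the value of δ is ours).

```python
import math

def H(x):
    return -x*math.log2(x) - (1-x)*math.log2(1-x)

delta = 1e-3
per_message_unbatched = H(delta) / delta          # starts resolved to 1 time unit
per_message_batched = H(2*delta) / (2*delta)      # starts resolved to 2 time units
print(per_message_unbatched, per_message_batched) # ~11.41 vs ~10.41 bits: ~1 bit saved
```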

It is also instructive to look at this same type of example in a different context. Suppose we take the source model in Fig. 2 and scale down the basic time interval of synchronization while keeping the mean message length and the mean message interarrival time fixed. As the time interval shrinks, the parameter ε stays fixed, but the parameter δ decreases. As a result, the protocol information per message due to start information, ℋ(δ)/δ, increases without bound; more and more protocol information is being used to resolve message arrival times with more and more precision.

The above situation brings out a fundamental limitation to the point of view that we adopted in Section III. In some sense, the rate at which a source produces the binary digits in a message is a detail which should not be inextricably linked to the required amount of protocol information. Our point of view in this section is that the important parameter in specifying protocol information is the average delay that messages experience. When the delay is small, then the receiver must receive considerable information about message starting times; when the delay is large, the receiver needs less information.

In order to focus on the problem of delay in the simplest context, we abandon the synchronous model of Fig. 2 and assume instead that the message arrival times of each source form a Poisson process of a given rate α. The parameter α is the expected number of arrivals per second, and 1/α is the mean interarrival time. We assume that an entire message arrives instantaneously, rather than one bit per time unit, but it will be seen from the analysis that the results are not critically dependent upon this assumption. The message lengths have some given probability mass function $P_M(m)$ and an entropy $H(M) = -\sum_{m=1}^{\infty} P_M(m) \log_2 P_M(m)$. We assume that the messages must have their identity preserved in transmission to the receiver. That is, if a source produces a message 011 followed by a message 1011, then it is not acceptable for them to be combined and presented to the receiver as a single message 0111011. This means that the protocol information about message lengths must be transmitted to the receiver. Our only problem is to find out how much information about arrival times must be transmitted.

In the analysis that follows, we consider an arbitrary network as described in the Introduction. We consider an arbitrary source-receiver pair in the network and derive a lower bound on the average information per message that must be sent to the receiver in order to limit the average delay per message to a given value d. This lower bound is independent of the network topology, but in the next section we shall describe protocols that approach the bound quite closely for networks with very large numbers of source-receiver pairs. This rather surprising independence from network topology will be discussed more fully in Section VI.

We assume that different source-receiver pairs may have different message arrival rates and different distributions on message length. We also generalize the model in the Introduction slightly to allow a source to send messages to several different receivers (and vice versa). We simply regard such a source as being several virtual sources, one virtual source being paired with, and sending messages to, each of the receivers.

We are now ready to analyze a given source-receiver pair. Let X_i, i = 1, 2, ..., be the message arrival times and Y_i, i = 1, 2, ..., be the times at which the messages are delivered to the receiver. Since we assume the message arrivals to be a Poisson process with rate α, the interarrival times T_i = X_i − X_{i−1}, i = 1, 2, ..., (with X_0 = 0) are independent and each have the probability density αe^{−αt}.

For any given network and network protocol, and for any given integer N, there will be a joint probability measure P_N on X^N = (X_1, ..., X_N) and Y^N = (Y_1, ..., Y_N). There are two obvious constraints on the probability measure P_N: first, the marginal distribution of X^N must satisfy the Poisson process assumption, and second, for each message, the delay D_i = Y_i − X_i must be nonnegative with probability one. For a given P_N, there is an average mutual information I_{P_N}(X^N; Y^N) between the sequences X^N and Y^N. This mutual information is a lower bound, given P_N, on the information provided to the receiver about message arrival times at the source. Note that we are not asserting that the receiver should be interested in this information, nor that it is in a form suitable for any particular use; we are simply asserting that the very existence of P_N and the delivery of the messages provides this information whether we want to provide it or not.

For a given P_N, there is also an expected delay per message, D̄_N, given by

$$\bar{D}_N = \frac{1}{N} \sum_{i=1}^{N} E(D_i) \tag{7}$$

where D_i = Y_i − X_i and E(D_i) is the expected value of D_i for the probability measure P_N. Define 𝒫_N(d) as the class of probability measures P_N which both satisfy the above constraints and have an expected delay per message D̄_N ≤ d. We can obtain a lower bound on the amount of transmitted protocol information about message arrival times, subject to a constraint on expected delay, by minimizing I_{P_N}(X^N; Y^N) over P_N ∈ 𝒫_N(d). This will be recognized by information theorists as a rate-distortion problem, and we define the Nth order rate-distortion function by

$$R_N(d) = \frac{1}{N} \min_{P_N \in \mathcal{P}_N(d)} I_{P_N}(X^N; Y^N). \tag{8}$$

The rate-distortion function R(d) is then defined by

$$R(d) = \liminf_{N \to \infty} R_N(d). \tag{9}$$

From the definitions it is clear that R(d) is a lower bound to the average protocol information about message arrival times that must be transmitted to limit the expected delay per message to d. It is not clear whether, for any given network, protocols exist which actually transmit this minimum amount of information. Even if such protocols did exist, it is not clear that the information could be efficiently encoded. In other applications of rate-distortion theory, one can demonstrate the existence of efficient encodings by introducing delay and complexity; that approach does not work here since delay is the distortion measure.

There may be some question as to why we have used mean delay as our criterion here rather than, let us say, mean-square delay or maximum delay. Mean-square delay is both less tractable analytically and less conventional in data networks than mean delay and has no clear-cut advantages. Maximum delay appears to be quite intractable and has the further disadvantage that, in a finite network with finite capacities and Poisson arrivals, there is clearly no possibility of guaranteeing a finite maximum delay for all messages, since there is no maximum on the number of message arrivals within a given interval.

The following theorem, which is proved in Appendix B, provides a lower bound to R(d).

Theorem 3: R_1(d), as given by (8) with N = 1, is a lower bound to R_N(d) for all N > 1, to R(d), and to the average protocol information per message about message arrival times between a source-receiver pair for Poisson message arrivals of rate α and expected delay d. Furthermore, R_1(d) is given by

$$R_1(d) = -\log_2 \left(1 - e^{-\alpha d}\right) \qquad \text{bits/message.} \tag{10}$$

The probability measure P_1 that achieves R_1(d) is defined implicitly by

$$Y_1 = \max(X_1, d) + Z \tag{11}$$

where Z is a nonnegative random variable, independent of X_1, with probability density p_Z(z) = (α + ρ)e^{−(α+ρ)z}, where ρ is given by

$$\rho = \frac{\alpha e^{-\alpha d}}{1 - e^{-\alpha d}}. \tag{12}$$
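The optimizing channel in (11)-(12) can be sampled directly. The sketch below draws X₁ ~ Exp(α) and Z ~ Exp(α + ρ) and checks that the empirical mean delay E[Y₁ − X₁] comes out to d, alongside the value of R₁(d) from (10); α and d are arbitrary test values.

```python
import math, random

def R1(alpha, d):
    """Eq. (10): R1(d) = -log2(1 - exp(-alpha d)) bits/message."""
    return -math.log2(1 - math.exp(-alpha*d))

def empirical_mean_delay(alpha, d, n=200_000, seed=3):
    """Sample Y1 = max(X1, d) + Z with Z ~ Exp(alpha + rho), rho from (12);
    the mean of Y1 - X1 should converge to d."""
    rho = alpha*math.exp(-alpha*d) / (1 - math.exp(-alpha*d))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(alpha)
        z = rng.expovariate(alpha + rho)
        total += max(x, d) + z - x
    return total / n

alpha, d = 2.0, 0.5                                   # alpha*d = 1
print(R1(alpha, d), empirical_mean_delay(alpha, d))   # ~0.662 bits, delay ~0.5
```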

Note that R_1(d) is a function only of the product αd (it is not hard to see that R_N(d) and R(d) also must be functions only of αd). The asymptotic behavior of R_1(d), for small d and large d, is given by the approximations

$$R_1(d) \approx -\log_2 \alpha d, \qquad \alpha d \ll 1 \tag{13}$$

$$R_1(d) \approx e^{-\alpha d} \log_2 e, \qquad \alpha d \gg 1. \tag{14}$$

Small αd is the case of major interest for most network applications since messages are usually relatively infrequent. One possible exception to this is that of a teletype source in which each character is modeled as a message, but even here it is often desirable to keep αd small.

Fig. 5 compares the lower bound R_1(d) with the average number of binary digits used for message arrival time protocol in the strategies of Section V. It is seen that the difference between the strategies and the bound is always less than 0.6 bits, and that the ratio approaches one for small αd. We conjecture that, for large αd, R(d) (as contrasted with R_1(d)) approaches zero as (αd)^{−2} rather than as e^{−αd}, but our reason for this conjecture will be explained later.

Fig. 5. Message start protocol information as a function of average message delay (horizontal axis: αd = message arrival rate × average message delay).

Next we look at the total average mutual information per message between a source-receiver pair. Assume that the message lengths and message arrival times are statistically independent and assume that the messages are composed of independent equiprobable binary digits. Then the total mutual information per message, I, is the sum of the message length information H(M), the message arrival time information, lower bounded by −log₂(1 − e^{−αd}), and the actual message information 1/ε, where 1/ε is defined as the mean message length,

$$I \ge H(M) - \log_2 \left(1 - e^{-\alpha d}\right) + \frac{1}{\epsilon}. \tag{15}$$

If the message lengths are geometrically distributed, this becomes

$$I \ge \frac{\mathcal{H}(\epsilon)}{\epsilon} - \log_2 \left(1 - e^{-\alpha d}\right) + \frac{1}{\epsilon}. \tag{16}$$

The first two terms in each of the above expressions are classified as protocol information, and the final term is classified as message information. The mutual information per unit time is, of course, just αI.

If all of the virtual sources, corresponding to the source-receiver pairs in the network, are statistically independent, then the data processing theorem asserts that the information for each source-receiver pair must be transmitted through the links of the network, thus imposing constraints on the channel capacities of the links. These capacity constraints have been intensively studied in the literature as multicommodity flow problems [3]. (We are interested not in queueing delays here but only in link capacities that are capable of carrying the required traffic; we discuss the implication of queueing delays in Section VI.) The only difference between the classical problem and that here is that the information flow here includes the protocol terms in (15). Also, since (15) underbounds the information flow, the multicommodity flow solution will yield lower bounds on the feasible capacities.

V. STRATEGIES FOR MINIMIZING PROTOCOL INFORMATION WITH A DELAY CONSTRAINT

In this section we discuss two strategies for encoding protocol information. For each of the strategies, we regard the sources and receivers in the network as being split into virtual sources and virtual receivers such that each virtual source sends messages to only one virtual receiver and vice versa.

The set of virtual source-receiver pairs is partitioned in such a way that, within any block of the partition, all the sources are connected to one node and all the receivers to some other node. Also all sources within one block of the partition have the same message arrival rate α and the same desired average message delay d. All of the sources within one block of the partition will be encoded together into a bit stream which is then transmitted through a single path in the network.

Frequently, in order to avoid overloading particular links in the network, it is necessary to split the traffic between a given pair of nodes into several paths. In this case we simply further subdivide the blocks above, each smaller block containing the proper traffic for a given path. In the limit of an arbitrarily large number of source-receiver pairs in each block, this subdivision can approximate the desired fraction of traffic for each path arbitrarily closely.

For simplicity we assume that, when the bit streams from several blocks must pass through the same link in the network, they are simply multiplexed together (see Fig. 6). From a practical point of view, such multiplexing is usually not a good idea because of the nonsteady flow on each bit stream. From a conceptual point of view, however, the argument of Theorem 2 can be applied to each of the strategies suggested here to show that, as the number of sources within a block of the partition becomes large, the encoded data from that block becomes steady. Our objective is to focus on the tradeoff between protocol information and the delay introduced by reducing protocol information. The assumption of many sources, leading to steady flow, is an artifice for studying this relationship independently of network queuing problems.

The first strategy is the simplest. The output of each source within a block is separately queued at the input to the encoder for that block. The encoder samples the source queues cyclically, sampling each queue once each 2d seconds. When the encoder samples a queue, it removes all the messages from the queue, encoding first the number of messages in the queue, then the length of each message, and then the messages themselves.
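A schematic rendering of one cyclic pass of this first strategy appears below; the record format (count, lengths, payload bits) mirrors the order described in the text, while the actual binary encoding of counts and lengths is left abstract.

```python
def strategy1_sample_pass(queues):
    """One cyclic pass of the strategy-1 encoder: for each source queue,
    emit (message count, message lengths, message bits) and empty the
    queue; each queue is visited once every 2d seconds."""
    records = []
    for q in queues:
        records.append((len(q), [len(m) for m in q], list(q)))
        q.clear()
    return records

# Toy usage: three sources; messages are the bit strings that arrived
# since the previous sample, 2d seconds ago.
queues = [["0110"], [], ["1", "001101"]]
print(strategy1_sample_pass(queues))
# [(1, [4], ['0110']), (0, [], []), (2, [1, 6], ['1', '001101'])]
```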

Consider the entropy of the contents of a queue when it is sampled. On the assumption that the message arrivals form a Poisson process with rate α, the probability that n messages arrive in the 2d second interval between inspections is given by the Poisson distribution

$$P_{2\alpha d}(n) = \frac{e^{-2\alpha d}\,(2\alpha d)^n}{n!}, \qquad n \ge 0. \tag{17}$$

Thus the entropy of the number of message arrivals in the interval is

$$H(P_{2\alpha d}) = \sum_{n=0}^{\infty} \frac{e^{-2\alpha d}\,(2\alpha d)^n}{n!} \log_2 \frac{n!\,e^{2\alpha d}}{(2\alpha d)^n} \qquad \text{bits.} \tag{18}$$

Fig. 6. Data network with source-receiver pairs partitioned into blocks.

Assuming as before that the message lengths are independent of the arrivals and are independent identically distributed random variables of mean 1/ε and entropy H(M), we see that the entropy of the message lengths in the interval is 2αdH(M). On the assumption that the messages are composed of independent equiprobable binary digits, their entropy is 2αd/ε. The entropy of the queue contents per sample is thus

$$H(P_{2\alpha d}) + 2\alpha d\,H(M) + 2\alpha d/\epsilon. \tag{19}$$

Assuming an arbitrarily large number of sources per encoder, the encoder can generate a bit stream with arbitrarily little more than the above number of bits per source per sample and with arbitrarily little delay. Since the message arrivals have a uniform distribution between sample points, the average delay of a message before being sampled by the encoder, and thus essentially before appearing on the bit stream going to the receiver, is d.

The number of bits transmitted per source per sample in (19) can, as usual, be separated into protocol information about message arrivals (H(P_{2αd})), protocol information about message lengths (2αdH(M)), and useful information (2αd/ε). In order to compare the protocol information about message arrivals with the lower bound in Section IV, we observe that the number of messages per sample approaches 2αd over a long period of time. Thus the number of protocol bits about message arrival times transmitted per message in this strategy is

$$\frac{H(P_{2\alpha d})}{2\alpha d}. \tag{20}$$

This value is compared with the lower bound −log₂(1 − e^{−αd}) in Fig. 5. For small αd, the expression in (20) approaches log₂(e/(2αd)), which is larger than the bound by log₂(e/2) bits per message.
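The comparison behind Fig. 5 is easy to reproduce numerically: the sketch below evaluates the Poisson entropy of (18) term by term and prints H(P_{2αd})/(2αd) from (20) next to the lower bound (10) for a few assumed values of αd.

```python
import math

def poisson_entropy_bits(lam, tol=1e-12):
    """Entropy (bits) of a Poisson(lam) variable, summing (18) term by term."""
    h, n, p = 0.0, 0, math.exp(-lam)
    while True:
        if p > 0.0:
            h -= p * math.log2(p)
        n += 1
        p *= lam / n
        if n > lam and p < tol:
            return h

for ad in (0.02, 0.1, 1.0, 10.0):
    strategy = poisson_entropy_bits(2*ad) / (2*ad)   # eq. (20)
    bound = -math.log2(1 - math.exp(-ad))            # eq. (10)
    print(ad, round(strategy, 3), round(bound, 3))
```

For small αd the two columns differ by about log₂(e/2) ≈ 0.44 bits, as noted above, while for large αd this first strategy falls well above the bound, which motivates the second strategy described below.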

The delay d in these results ignores the propagation delay on the path through the network. Thus we call d the mean protocol induced delay; the total expected delay will be d plus the propagation delay. If the number of source-receiver pairs is not sufficiently large, there will, of course, also be a nonnegligible queuing delay. The delay in the lower bound of Section IV was also a protocol-induced delay (i.e., total delay less propagation delay), as can be seen by defining time zero at the receiver in Section IV to be one propagation delay later than time zero at the source.


Observe that the protocol-induced delay in this strategy has a maximum value of 2d, and thus this strategy yields an upper bound on the required protocol information given a constraint on maximum delay (again assuming enough source-receiver pairs to ignore improbably large queuing delays).

It is instructive to compare the message arrival time protocol in (20) with the corresponding term ℋ(δ)/δ in Section III. For δ ≪ ε, the model in Section III can be viewed as Poisson message arrivals which are sampled each time unit. Thus the average delay before an arriving message is sampled is 1/2 time unit and the arrival rate is δ, so that αd = δ/2. Finally, ℋ(δ)/δ ≈ log₂(e/δ) for δ small, which is the same as (20) with αd = δ/2 ≪ 1.

It is also instructive to compare the message arrival time protocol here with addressing in a message switching network. First we investigate what happens in the strategy here as αd becomes small. In this limit, using (17), the probability of finding one message in a source queue approaches 2αd and the probability of finding more than one message is negligible. Most source queues are empty when sampled, which suggests that an efficient way to encode the presence or absence of a message is to use run length coding across the set of sources in the block. As shown in [5], the mean number of bits per message used in this encoding is very close to the entropy in (20), which is approximately log₂(e/(2αd)).

Now let K be the number of source-receiver pairs in the block and let the source-receiver pairs be assigned addresses from zero to K − 1. Then an alternative strategy to the run length coding above would be to take each message as it arrives from a source, prefix it with the encoded address of the source-receiver pair and the encoded message length, and enter it directly onto the queue for transmission to the receiver node. The number of protocol bits per message for this addressing is about log₂ K. Since the messages are transferred to the encoder output queue immediately upon arrival, the protocol induced delay in this addressing can be interpreted to be zero. This appears to contradict the lower bound of Section IV, but it does not for two reasons: first, the queueing delay cannot be neglected here, and second, the capacity of the path from source node to receiver node must be greater than the information rate, and the idle bits when the queue is empty carry information about message arrival times.

We see from the above that addressing can be regarded as the low protocol delay, high protocol information end of the tradeoff between delay and protocol information. The use of addressing is preferable when 2αd < e/K, that is, when log₂ K is smaller than the approximately log₂(e/(2αd)) bits per message of the run length approach. The design question of where a network with given requirements should be on that tradeoff curve is difficult and unsolved except in the limit of arbitrarily large numbers of source-receiver pairs.
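The crossover is easy to tabulate. Under the small-αd approximation of (20), run length coding costs about log₂(e/(2αd)) bits per message against log₂ K for explicit addresses; the block size K = 1024 below is an arbitrary example.

```python
import math

K = 1024
address_bits = math.log2(K)                       # explicit pair address
for ad in (1e-4, 1e-3, 1e-2):
    runlength_bits = math.log2(math.e / (2*ad))   # small-ad approximation to (20)
    best = "address" if address_bits < runlength_bits else "run length"
    print(ad, round(runlength_bits, 2), best)     # crossover near 2*ad = e/K
```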

Most message switched and packet switched networks use separate addresses for source and receiver (one notable exception is Tymnet [8], which assigns addresses only to communicating source-receiver pairs). This requires more protocol bits than addressing within a block and, from an information-theoretic point of view, most of these extra bits are wasted. This loss of efficiency is sometimes justified because, first, separate full addresses simplify dynamic routing, and second, the loss of efficiency is unimportant if the messages are long.

Fig. 7. Algorithm for strategy 2.

We next describe a strategy designed for large values of αd. The encoder for a given block of sources samples the source queues cyclically, as in the first strategy, but each queue is now sampled once each 1/α seconds. Let N be a positive integer and assume that we want to have an average message delay of about N/α. The strategy then, with two exceptions, is to remove one message from the queue at each sample time and encode it. The first exception is that if no message is in the queue then no message is taken out for N consecutive sample times. The other exception is that if more than 2N messages are in the queue then N + 1 messages are removed from the queue and encoded. This description leaves a few fine points uncovered, such as what happens during an N sample idle period if more than 2N messages accumulate (we transmit N of them), and what happens if, after transmitting an extra N messages, there are still more than 2N − 1 messages on the queue (we transmit another extra N). The flow diagram in Fig. 7 describes the strategy in detail. The idea behind the strategy is to minimize the protocol information by making the two exceptional events relatively rare. Waiting for N sample times when the queue is empty, or transmitting N + 1 messages when the queue is too full, has the tendency of leaving the queue half full (with N messages) after each exceptional event. Since messages normally enter and leave the queue at the same rate, the exceptional events will not happen very often.
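The sketch below renders one sample time of this second strategy as code, following the prose description (including the first fine point); Fig. 7 is not reproduced here, so the exact tie-breaking of the remaining fine points is our reading rather than the paper's.

```python
from collections import deque

def take(queue, n):
    """Remove up to n messages from the queue; return the number removed."""
    n = min(n, len(queue))
    for _ in range(n):
        queue.popleft()
    return n

def strategy2_step(queue, state, N):
    """One sample time (one per 1/alpha seconds). state['idle'] counts the
    forced-idle samples still to run after an empty queue was found."""
    if state["idle"] > 0:
        state["idle"] -= 1
        if len(queue) > 2*N:           # fine point: big backlog during idle run
            return take(queue, N)
        return 0
    if not queue:                      # exception 1: begin an N-sample idle run
        state["idle"] = N
        return 0
    if len(queue) > 2*N:               # exception 2: drain N+1 to recenter near N
        return take(queue, N + 1)
    return take(queue, 1)              # normal case: one message per sample

q, st = deque("mmmmm"), {"idle": 0}    # five queued messages, N = 2
print([strategy2_step(q, st, N=2) for _ in range(6)])  # [3, 1, 1, 0, 0, 0]
```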

This strategy is analyzed in Appendix C. It is shown that, for any given N, the protocol information per message about message arrivals is bounded by

$$2\,\mathcal{H}\!\left(\frac{1}{e(e-1)}\right), \qquad N = 1$$

$$\frac{2\,\mathcal{H}(1/e) + 1}{2N(N + 4/3)}, \qquad N \ge 2 \tag{21}$$

and the average delay per message by

$$\frac{5N^2 - 3}{12N + 16}. \tag{22}$$

These expressions are compared with the lower bound in Fig. 5.

We note that these expressions have the protocol information per message going to zero with increasing d roughly as $(\alpha d)^{-2}$, whereas the lower bound approaches zero as $e^{-\alpha d}$. The standard deviation of the time required for $N^2$ arrivals, however, is $N/\alpha$. Thus it appears that somewhere on the order of one bit of information should be required by the receiver every $N^2$ messages to track the arrivals within $N/\alpha$ seconds. Since $d = N/\alpha$, we conjecture that the lower bound is not tight for large $\alpha d$ and that this upper bound has at least the right qualitative behavior.
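As a numerical illustration of this comparison (an added sketch, not a reproduction of Fig. 5), one can evaluate the upper bound (21) at the delay guaranteed by (22) and set it against the Section IV lower bound in binary units, $-\log_2(1 - e^{-\alpha d})$ bits per message:

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def upper_bound_bits(N):
    # protocol information per message about arrivals, from (21)
    p = 1 / (math.e * (math.e - 1)) if N == 1 else 1 / (2 * N * (N + 4 / 3))
    return 2 * binary_entropy(p)

def lower_bound_bits(alpha_d):
    # the Section IV lower bound in binary units, from (B27)
    return -math.log2(1 - math.exp(-alpha_d))

for N in (1, 2, 4, 8, 16):
    alpha_d = N - (5 * N - 3) / (12 * N + 16)   # delay bound (22)
    print(N, round(upper_bound_bits(N), 3), round(lower_bound_bits(alpha_d), 6))
```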

VI. DISCUSSION

The preceding two sections have derived lower and upper bounds on the number of protocol bits per message required in a data network as a function of delay (excluding propagation delays). Both bounds assumed that the message arrivals from each source formed a Poisson process. The lower bound also assumed statistical independence between different source-receiver pairs and between message lengths and arrival times for each pair. The upper bound also assumed an arbitrarily large number of source-receiver pairs at each pair of nodes and assumed noiseless channels. In this section, we give a qualitative discussion of these assumptions.

The most restrictive assumption, of course, is that of an arbitrarily large number of source-receiver pairs, which was the assumption that allowed us to ignore queueing delays. Existing and planned data networks use orders of magnitude more protocol bits per message than we have here. Part of this extra protocol is devoted to higher level protocols, and part to error control on the channels, which are largely separable issues. A major part, however, is devoted to flow control and to protocols required for alternative or dynamic routing. This paper does not address these flow control and routing protocols except to show that they become wasteful in the limit of many source-receiver pairs per node pair. It is hoped, however, that the analysis given here of addressing or message arrival protocols will lay the groundwork for future analysis of flow control and routing.

The results here do suggest one connection between protocol delay and queueing delay. In existing networks, the protocol per message increases beyond a certain point of network loading. This causes throughput, in messages per second, to decrease beyond a certain delay, thus causing a further increase in congestion and an instability similar to that with automobile traffic. Our results here suggest that, at least as far as addressing or message arrival information is concerned, this effect can be partly counteracted in a well designed system by using less protocol per message for addressing as delay increases.

Next we look at the assumption that different source-receiver pairs are independent of each other. The major situation in which this is a poor assumption is that of interactive communication, which we would model as two source-receiver pairs going in opposite directions. In this case, a receiver will have some side information about when messages arrive at the distant source node, and the results of Section IV do not quite apply. If we assume, as we should from the standpoint of functional modularity, that the network protocols must be independent of the detailed interactive characteristics of the sources, then the results of Section IV still apply to the data through the network.

Finally, we consider the assumption of Poisson arrivals. We have not been able to establish that, for a given arrival rate, the rate-distortion function of Section IV is maximized by Poisson arrivals. However, it is not difficult to see that, for small $\alpha d$, the performance of the first strategy in Section V is virtually independent of the source arrival statistics and depends primarily on the arrival rate.

APPENDIX A

Proof of Theorem 2

The strategy described in the text uses separate source codes for encoding transitions out of the busy state, transitions into the busy state, and binary data. The length of each codeword for the transitions can be chosen to exceed the conditional self-information of the transitions by at most one, and the length of the codewords for the binary data is precisely equal to the conditional self-information of the binary data. Thus, in one time unit, the length of the entire encoded sequence for that time unit exceeds the conditional self-information of the outputs of the $K$ sources by at most two. Since $KH_\infty(U) + f(K)$ bits can be transmitted within the time unit, the probability that the codeword requires more than one time unit for transmission is at most the probability that the conditional self-information exceeds $KH_\infty(U) + f(K) - 2$. Unfortunately, a codeword may be short enough to be transmitted within the time unit but still might be delayed because of previous codewords that spilled over into the present time unit (in other words, we have a queueing problem). If the combined length of the $m$ codewords ($m \ge 1$) ending at the present time exceeds $m[KH_\infty(U) + f(K)]$, then the present codeword will be delayed; moreover, the present codeword will be delayed only if the above condition is satisfied for some $m \ge 1$. Let $I_{m,K}$ be the conditional self-information from the $K$ sources over the $m$ time units up to and including the present. Then we can upper bound the probability $P_D$ that the current codeword is delayed by

$$P_D \le \sum_{m=1}^{\infty} P\left(I_{m,K} \ge m[KH_\infty(U) + f(K) - 2]\right). \tag{A1}$$

Since the $K$ sources are statistically independent, the conditional self-information $I_{m,K}$ is the sum of $K$ independent identically distributed random variables, namely, the conditional self-informations for the individual sources. Let $\mu_m(s)$ be the semi-invariant moment generating function for the conditional self-information from a single source over $m$ time units,

$$\mu_m(s) = \ln E\left[\exp(sI_{m,1})\right]. \tag{A2}$$

Using the Chernoff bound, we then have, for any $s \ge 0$,

$$P_D \le \sum_{m=1}^{\infty} \exp\left(K\mu_m(s) - sm[KH_\infty(U) + f(K) - 2]\right). \tag{A3}$$

We need the following lemma about $\mu_m(s)$; we will use the lemma to complete the proof of the theorem and then prove the lemma.

Lemma: For any source parameters $0 < \epsilon < 1$, $0 < \delta < 1$, there exist $s_0 > 0$ and a finite number $a$ such that, for all $m \ge 1$ and all $s$, $0 \le s \le s_0$,

$$\mu_m(s) \le smH_\infty(U) + ams^2. \tag{A4}$$

Substituting (A4) into (A3), we have, for $0 \le s \le s_0$,

$$P_D \le \sum_{m=1}^{\infty} \exp\left(amKs^2 - sm[f(K) - 2]\right). \tag{A5}$$

Let $s = K^{-1/2}$; for large enough $K$, this is within the range $0 \le s \le s_0$. Then

$$P_D \le \sum_{m=1}^{\infty} \exp\left(m\left[a - (f(K) - 2)K^{-1/2}\right]\right) = \frac{\exp\left[a - (f(K) - 2)K^{-1/2}\right]}{1 - \exp\left[a - (f(K) - 2)K^{-1/2}\right]}. \tag{A6}$$

The limit of the right side as $K \to \infty$ is zero, completing the proof.

Proof of Lemma: Let $A(s)$ be the $2 \times 2$ matrix with components

$$A_{ij} = P(j|i)\,e^{sI(j|i)}$$

in which $P(j|i)$ is the probability of a transition to state $j$ given state $i$ for the Markov chain defining the source, and $I(j|i)$ is the conditional self-information of the source output for the same transition. Thus

$$A(s) = \begin{bmatrix} (1-\epsilon)^{1-s}2^s & \epsilon^{1-s} \\ \delta^{1-s}2^s & (1-\delta)^{1-s} \end{bmatrix}.$$

The semi-invariant moment generating function $\mu_m(s)$ in (A2) is then given by

$$\mu_m(s) = \ln\left[\pi A^m(s)\,\mathbf{1}^T\right] \tag{A7}$$

in which the row vector $\pi = [\delta/(\delta+\epsilon),\ \epsilon/(\delta+\epsilon)]$ is the steady-state probability assignment for the source and $\mathbf{1}^T$ is the column vector of two ones. The eigenvalues of $A(s)$ are easily calculated to be

$$\lambda_i(s) = \frac{1}{2}\left\{(1-\epsilon)^{1-s}2^s + (1-\delta)^{1-s} \pm \sqrt{\left[(1-\delta)^{1-s} - 2^s(1-\epsilon)^{1-s}\right]^2 + 4(\epsilon\delta)^{1-s}2^s}\right\} \tag{A8}$$

where the plus sign corresponds to $\lambda_1(s)$ and the minus to $\lambda_2(s)$. Expanding $A(s)$ in terms of eigenvalues and eigenvectors and multiplying the terms of (A7), we obtain

$$\mu_m(s) = \ln\left[(1 - \beta(s))\lambda_1^m(s) + \beta(s)\lambda_2^m(s)\right] \tag{A9}$$

in which the coefficient $\beta(s)$ is given by

$$\beta(s) = \frac{1}{2} - \frac{(\epsilon-\delta)\left[(1-\delta)^{1-s} - 2^s(1-\epsilon)^{1-s}\right] + 2\left[\delta\epsilon^{1-s} + \epsilon\delta^{1-s}2^s\right]}{2(\epsilon+\delta)\sqrt{\left[(1-\delta)^{1-s} - 2^s(1-\epsilon)^{1-s}\right]^2 + 4(\epsilon\delta)^{1-s}2^s}}. \tag{A10}$$

Since $|\lambda_2(s)| < \lambda_1(s)$, we can overbound (A9) by

$$\mu_m(s) \le \ln\left[(1 + 2|\beta(s)|)\lambda_1^m(s)\right] \le 2|\beta(s)| + m\ln\lambda_1(s). \tag{A11}$$

By direct calculation, $\beta(s)$ and $d\beta(s)/ds$ have the value zero at $s = 0$. The second derivative of $\beta(s)$ is messy but easily seen to be finite and continuous for $0 \le s < 1$. Thus, given $0 < s_0 < 1$, there exists $a_1$ such that $|\beta(s)| \le a_1 s^2$, for $0 \le s \le s_0$. Similarly, $\ln\lambda_1(s)$ and its first derivative have the values zero and $H_\infty(U)$, respectively, at $s = 0$. The second derivative of $\ln\lambda_1(s)$ is finite and continuous for $0 \le s < 1$ so that, for large enough $a_2$, $\ln\lambda_1(s) \le sH_\infty(U) + a_2 s^2$, for $0 \le s \le s_0$. Choosing $a = 2a_1 + a_2$, we have (A4), completing the proof.

APPENDIX B

Proof of Theorem 3

Let $X^N = (X_1, X_2, \cdots, X_N)$ and $Y^N = (Y_1, \cdots, Y_N)$ be a joint ensemble in which $X_n$ ($1 \le n \le N$) is the arrival time of the $n$th message at the source node and $Y_n$ ($1 \le n \le N$) is the time of delivery of the $n$th message to the destination. Let $P_N$ be a joint probability measure on $X^N, Y^N$ and assume that $P_N \in \mathcal{P}_N(d)$. First, we must show that the average mutual information between $X^N$ and $Y^N$ satisfies

$$I_{P_N}(X^N; Y^N) \ge NR_1(d) \tag{B1}$$

and then we must calculate $R_1(d)$.

Using standard relations between average mutual informations and entropies (see [4, ch. 2]), we have

$$I_{P_N}(X^N; Y^N) = H(X^N) - H(X^N|Y^N), \tag{B2}$$

$$H(X^N|Y^N) = H(X_1|Y^N) + \sum_{n=2}^{N} H(X_n|X_1, \cdots, X_{n-1}, Y^N) \le H(X_1|Y_1) + \sum_{n=2}^{N} H(X_n|X_{n-1}, Y_n) = H(X_1|Y_1) + \sum_{n=2}^{N} H(X_n - X_{n-1}|X_{n-1}, Y_n). \tag{B3}$$

Here we have used the chain rule, then the fact that conditioning cannot increase entropy, and finally the fact that, conditioned on $X_{n-1}$, the interarrival interval $X_n - X_{n-1}$ is simply a translation of $X_n$. Now define the random variable $Z_n$ by

$$Z_n = Y_n - X_{n-1}. \tag{B4}$$

Since $Z_n$ is a deterministic function of $Y_n$ and $X_{n-1}$,

$$H(X_n - X_{n-1}|X_{n-1}, Y_n) = H(X_n - X_{n-1}|Z_n, X_{n-1}, Y_n) \le H(X_n - X_{n-1}|Z_n), \tag{B5}$$

$$H(X^N|Y^N) \le H(X_1|Y_1) + \sum_{n=2}^{N} H(X_n - X_{n-1}|Z_n). \tag{B6}$$

Since the interarrival times $X_n - X_{n-1}$ are independent of each other, we also have

$$H(X^N) = H(X_1) + \sum_{n=2}^{N} H(X_n - X_{n-1}), \tag{B7}$$

$$I_{P_N}(X^N; Y^N) \ge I(X_1; Y_1) + \sum_{n=2}^{N} I(X_n - X_{n-1}; Z_n) \tag{B8}$$


where (B8) results from substituting (B6) and (B7) into (B2). Now observe from (B4) that the delay for the $n$th message, $D_n = Y_n - X_n$, also satisfies

$$D_n = Z_n - (X_n - X_{n-1}), \qquad 2 \le n \le N. \tag{B9}$$

Let the expected value of $D_n$ be $d_n$ (which is determined by the joint probability measure $P_N$). Since $X_n - X_{n-1}$ has the same distribution as $X_1$ and since $D_n$ satisfies (B9), we have

$$I(X_n - X_{n-1}; Z_n) \ge R_1(d_n), \qquad n \ge 2 \tag{B10}$$

by the definition of the function $R_1$. Letting $d_1 = E(Y_1 - X_1)$ and substituting (B10) into (B8), we obtain

$$I_{P_N}(X^N; Y^N) \ge \sum_{n=1}^{N} R_1(d_n) \tag{B11}$$

$$\ge NR_1\left(\frac{1}{N}\sum_{n=1}^{N} d_n\right) \tag{B12}$$

where we have used the convexity of $R_1$. Finally, since $P_N \in \mathcal{P}_N(d)$, $(1/N)\sum_{n=1}^{N} d_n \le d$. Since $R_1$ is a nonincreasing function, (B12) implies (B1). Since $P_N \in \mathcal{P}_N(d)$ is arbitrary, we have shown that $R_N(d) \ge R_1(d)$.

We now calculate $R_1(d)$; we use natural logarithms here and convert back to binary units in Section IV. Let $X$ denote the arrival time of the message at the source and $Y$ the delivery time at the destination. Let $P$ be a probability measure on $X, Y$ with the given marginal density on $X$,

$$p_X(x) = \alpha e^{-\alpha x}, \qquad x \ge 0. \tag{B13}$$

Let the distortion measure be given by

$$D(x, y) = \begin{cases} y - x, & y \ge x \\ \infty, & y < x. \end{cases} \tag{B14}$$

The infinite distortion for y < x is simply an artifice to exclude probability measures that allow Y < X.

Define the Lagrange multiplier functions

$$R_0(\rho, P) = I_P(X; Y) + \rho E_P[D(X, Y)], \tag{B15}$$

$$R_0(\rho) = \inf_P R_0(\rho, P). \tag{B16}$$

Because of the convexity of $R_1(d)$, we have

$$R_1(d) = \sup_{\rho \ge 0}\left[R_0(\rho) - \rho d\right]. \tag{B17}$$

Thus $R_0(\rho)$ may be interpreted as the ordinate-axis intercept of the tangent of slope $-\rho$ to $R_1(d)$.

There is a well-known upper bound [6] to the function $R_0(\rho)$ given by

$$R_0(\rho) \le -\int dx\, p_X(x)\ln\left[\int dy\, w(y)e^{-\rho D(x,y)}\right] = -\int dx\, p_X(x)\ln\left[e^{\rho x}\int_x^{\infty} dy\, w(y)e^{-\rho y}\right] \tag{B18}$$

where $w(y)$ is an arbitrary probability density. Similarly, there is a well-known lower bound ([4, theorem 9.4.1] or [2]) given by

$$R_0(\rho) \ge \int dx\, p_X(x)\ln\frac{f(x)}{p_X(x)} \tag{B19}$$

where $f(x) \ge 0$ is any function satisfying the inequality

$$\int dx\, f(x)e^{-\rho D(x,y)} = e^{-\rho y}\int_0^{y} dx\, f(x)e^{\rho x} \le 1 \tag{B20}$$

for all $y \ge 0$. Furthermore, (B19) (and (B18)) are satisfied with equality if

$$e^{\rho x}\int_x^{\infty} dy\, w(y)e^{-\rho y} = \frac{p_X(x)}{f(x)} \tag{B21}$$

and, under these conditions, (B20) is also satisfied with equality for almost all $y$ for which $w(y) > 0$.

We now find a probability density $w(y)$ and a function $f(x)$ satisfying (B20) and (B21). We note that the restriction $X \le Y$ means that small values of $Y$ yield large mutual informations about $X$, and thus we hypothesize that there is some number $y_0$ such that $w(y) > 0$, for $y \ge y_0$, and $w(y) = 0$, for $y < y_0$. Differentiating (B20) with respect to $y$, and assuming equality for $y \ge y_0$, we find that $f(x) = \rho$, for $x > y_0$. Using this in (B21) and differentiating with respect to $x$ yields

$$w(y) = \begin{cases} \dfrac{\alpha(\alpha+\rho)}{\rho}\,e^{-\alpha y}, & y \ge y_0 \\ 0, & y < y_0, \end{cases} \tag{B22}$$

$$y_0 = \frac{1}{\alpha}\ln\frac{\alpha+\rho}{\rho} \tag{B23}$$

where (B23) follows from the constraint that $w(y)$ is a probability density.

Substituting this value for $w(y)$ into (B21) and integrating, we have

$$f(x) = \begin{cases} \rho\exp\left[(\alpha+\rho)(y_0 - x)\right], & x \le y_0 \\ \rho, & x \ge y_0. \end{cases} \tag{B24}$$

Finally, we must demonstrate that this $f(x)$ satisfies (B20) for all $y \ge 0$. Defining $h(y)$ as the integral in (B20) and integrating, we get

$$h(y) = e^{-\rho y}\int_0^{y} dx\, f(x)e^{\rho x} = \frac{\rho}{\alpha}\,e^{(\alpha+\rho)y_0 - \rho y}\left[1 - e^{-\alpha y}\right], \qquad y \le y_0.$$

Using (B23), we see that $h(y_0) = 1$. Also we see that the derivative of $h(y)$ is positive for $y < y_0$, from which it follows that $h(y) < 1$, for $y < y_0$. A similar integration verifies that $h(y) = 1$, for $y > y_0$.

Since $f(x)$ in (B24) has now been shown to satisfy (B19) with equality, we can integrate (B19) to obtain

$$R_0(\rho) = \ln\frac{\alpha+\rho}{\alpha} + \frac{\rho}{\alpha}\ln\frac{\alpha+\rho}{\rho}. \tag{B25}$$

Maximizing over $\rho$ in (B17), we get

$$d = \frac{dR_0(\rho)}{d\rho} = y_0, \qquad \rho = \frac{\alpha e^{-\alpha d}}{1 - e^{-\alpha d}} \tag{B26}$$

$$R_1(d) = -\ln\left(1 - e^{-\alpha d}\right). \tag{B27}$$

The transition probability density that achieves $R_1(d)$ is related to $w(y)$ and $f(x)$ by (see [4] or [2])

$$p(y|x) = \frac{w(y)f(x)e^{-\rho D(x,y)}}{p_X(x)} \tag{B28}$$

$$p(y|x) = \begin{cases} (\alpha+\rho)e^{-(\alpha+\rho)(y-x)}, & y \ge x \ge d \\ (\alpha+\rho)e^{-(\alpha+\rho)(y-d)}, & y \ge d > x \\ 0, & \text{elsewhere.} \end{cases}$$

This is equivalent to (12), completing the proof.
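As a sanity check on (B26)-(B28), the following sketch (an added illustration with arbitrary parameter values) samples the test channel directly: given $X = x$, $Y$ is $\max(x, d)$ plus an independent exponential of rate $\alpha + \rho$, so the empirical mean of $Y - X$ should converge to the target delay $d$.

```python
import math, random

def sample_test_channel(alpha, d, n=200_000, seed=1):
    """Sample X ~ exponential(alpha) and Y from the transition density
    (B28): Y = max(X, d) + exponential(alpha + rho), with rho chosen
    from (B26). Returns the empirical mean delay and R1(d) in nats."""
    rng = random.Random(seed)
    rho = alpha * math.exp(-alpha * d) / (1 - math.exp(-alpha * d))
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(alpha)
        y = max(x, d) + rng.expovariate(alpha + rho)
        total += y - x
    return total / n, -math.log(1 - math.exp(-alpha * d))

# With alpha = 1 and d = 2, the mean delay is near 2.0 and R1(d) ~ 0.145.
print(sample_test_channel(alpha=1.0, d=2.0))
```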


APPENDIX C

Analysis of Strategy 2

Observe from the flow diagram describing strategy 2 that, at the end of each sample, Mq is the number of messages on the queue and Nomq is the nominal number of messages on the queue; Nomq is normally the same as Mq but, on an idle sequence, it is equal to the sum of Mq and the number of samples for which the queue remains in the idle state. Note that Nomq, from one sample to the next, is incremented by the number of message arrivals and decremented by one. If the above operation would leave Nomq negative, Nomq is set to $N - 1$; whereas if it would leave Nomq greater than $2N - 1$, then Nomq is successively decremented by $N$ until it is less than or equal to $2N - 1$.

With these observations, we can describe Nomq by a Markov chain. The probability of $i$ arrivals during a sample period is $(e\,i!)^{-1}$, and thus the transition probabilities for Nomq are given by

$$P(i|j) = \begin{cases} \left[e(i-j+1)!\right]^{-1}, & -1 \le i-j < N-1,\ i \le N-1 \\ e^{-1} + (eN!)^{-1}, & i = N-1,\ j = 0 \\ \displaystyle\sum_{l=0}^{\infty}\left[e(i-j+1+lN)!\right]^{-1}, & i \ge j-1,\ i \ge N \\ \displaystyle\sum_{l=1}^{\infty}\left[e(i-j+1+lN)!\right]^{-1}, & i < j-1,\ i \ge N \\ 0, & \text{otherwise.} \end{cases}$$

The above chain is clearly ergodic, and we let $Q_N(i)$ denote the steady-state probability of state $i$ ($0 \le i \le 2N-1$) for a given $N$. These equations are not as difficult to solve as they appear. The simplification comes from the fact that state 0 can be reached only from itself and state 1, leading to the equation

$$Q_N(1) = (e-1)Q_N(0).$$

Similarly, state 1 can be reached only from states 0, 1, and 2, so that

$$Q_N(2) = (e-1)Q_N(1) - \frac{1}{2!}Q_N(0).$$

Similar equations can be written for each state $i < N-1$, leading to the following lemma.

Lemma: Let the sequence of numbers $C_0, C_1, \cdots$ be defined by the recurrence

$$C_i = (e-1)C_{i-1} - \sum_{j=0}^{i-2}\frac{C_j}{(i-j)!}, \qquad i \ge 1 \tag{C1}$$

with the initial condition $C_0 = 1$. Then, for all $N \ge 1$ and all $i \le N-1$,

$$Q_N(i) = Q_N(0)C_i. \tag{C2}$$

The numbers $C_i$ in (C1) can be calculated for any given $i$, but it is useful to have an asymptotic expression for $C_i$. To obtain this, let $g(x)$ be the generating function for the sequence,

$$g(x) = \sum_{i=0}^{\infty} C_i x^i. \tag{C3}$$

By multiplying both sides of (C1) by $x^i$ and summing over $i$, we find that

$$g(x) = \frac{1}{e^x - ex}. \tag{C4}$$

Regarding $g(x)$ as a function of a complex variable, we see that it is analytic within and on the unit circle except for a second order pole at $x = 1$. Evaluating the residues of the pole, we find

$$g(x) = \frac{2}{e(x-1)^2} - \frac{2}{3e(x-1)} + g_1(x) \tag{C5}$$

where $g_1(x)$ is analytic within and on the unit circle. Let $g_1(x) = \sum_{i=0}^{\infty} \epsilon_i x^i$. Because of the analyticity, $\lim_{i\to\infty} \epsilon_i = 0$. Expanding the other two terms in (C5), we find that

$$C_i = \frac{2}{e}(i+1) + \frac{2}{3e} + \epsilon_i = \frac{2}{e}\left(i + \frac{4}{3}\right) + \epsilon_i. \tag{C6}$$

Numerically, $\epsilon_i$ converges remarkably quickly. We find that $\epsilon_2 \approx -4 \times 10^{-5}$, and, for $i > 2$, $|\epsilon_i|$ is even smaller.

We now return to the transition probabilities of the Markov chain for Nomq. Note that, for all $i \le N-1$ and all $j$,

$$P(i|j) + P(i+N|j) = \sum_{l}\left[e(i-j+1+lN)!\right]^{-1}$$

where the sum is over all integers $l$ for which $i-j+1+lN \ge 0$. This quantity remains the same if $j$ is increased or decreased by $N$. This means that a Markov chain also describes (Nomq) mod $N$, and, because of the modular symmetry,

$$Q_N(i) + Q_N(i+N) = \frac{1}{N}, \qquad 0 \le i \le N-1. \tag{C7}$$

This equation is used below in finding the steady-state probability of state $N-1$. We have

$$Q_N(N-1) = \frac{1}{e}Q_N(N) + \frac{1}{e}Q_N(0) + \sum_{j=0}^{N-1}\frac{Q_N(j)}{e(N-j)!} = \frac{1}{eN} + \sum_{j=0}^{N-1}\frac{Q_N(j)}{e(N-j)!},$$

so that

$$\frac{1}{N} = (e-1)Q_N(N-1) - \sum_{j=0}^{N-2}\frac{Q_N(j)}{(N-j)!}. \tag{C8}$$

Substituting (C2) into this equation and comparing with (C1), we find that

$$\frac{1}{N} = C_N Q_N(0). \tag{C9}$$

This equation, (C1), (C2), and (C7) allow us to calculate the steady-state probabilities. These in turn allow us to calculate the protocol information required by the receiver about message arrivals. The protocol information is denoted by PI in the flow diagram for the algorithm. When an idle sequence starts, PI = $-1$ is sent. This happens when the queue is in the state Nomq = 0 and no messages enter for a sample interval. Thus

$$\Pr(PI = -1) = \frac{Q_N(0)}{e} = \frac{1}{eNC_N}. \tag{C10}$$

For $N \ge 2$, this is given with negligible error from (C6) as

$$\Pr(PI = -1) = \frac{1}{2N(N+4/3)}. \tag{C11}$$

Special protocol information is also required when extra messages are sent for a particular sample. We have PI = $l$ if $lN$ extra messages are sent at the sample time. Since the number of message arrivals divided by the number of sample times approaches one over a long interval of time, we expect the number of extra batches of $N$ messages to balance the number of idle periods, yielding

$$\Pr(PI = -1) = \sum_{l=1}^{\infty} l\Pr(PI = l). \tag{C12}$$


To derive this formally, observe that, for Nomq to make a transition from $j$ to $i$ with PI = $l$, we must have $i - j + 1 + lN$ arrivals in the intersample interval. Thus

$$\Pr(PI = l) = \sum_{j=0}^{2N-1} Q_N(j) \sum_{i=N}^{2N-1} \frac{1}{e(i-j+1+lN)!}. \tag{C13}$$

Next observe that

$$\sum_{j=0}^{2N-1}\sum_{i=0}^{2N-1} Q_N(j)P(i|j)(i-j) = 0. \tag{C14}$$

This equation simply states that the mean value of Nomq is constant in the steady state and follows from the equation $Q_N(i) = \sum_j Q_N(j)P(i|j)$. Substituting the equation for $P(i|j)$ into (C14), we get

$$\sum_{j=0}^{2N-1} Q_N(j) \sum_{i=\max(j-1,0)}^{2N-1} \frac{i-j}{e(i-j+1)!} + \frac{(N-1)Q_N(0)}{e} + \sum_{j=0}^{2N-1} Q_N(j) \sum_{i=N}^{2N-1}\sum_{l=1}^{\infty} \frac{i-j}{e(i-j+1+lN)!} = 0.$$

If we extend the first sum on $i$ to start at $j-1$, we include the additional term $-Q_N(0)/e$. We can also rewrite the final sum over $i$ and $l$ by the change of variable $k = i + lN$. Combining part of the final sum with the first sum, we find

$$\sum_{j=0}^{2N-1} Q_N(j) \sum_{i=j-1}^{\infty} \frac{i-j}{e(i-j+1)!} + \frac{NQ_N(0)}{e} - \sum_{j=0}^{2N-1} Q_N(j) \sum_{i=N}^{2N-1}\sum_{l=1}^{\infty} \frac{lN}{e(i-j+1+lN)!} = 0. \tag{C15}$$

The first sum over $i$ is clearly zero for every $j$ and, comparing the final term with (C13), we see that (C15) is equivalent to (C12).

It is rather messy to calculate $\Pr(PI = l)$ for each $l$. On the other hand, given a random variable PI which takes on integer values from $-1$ to $\infty$ and which satisfies (C12), it is easy to obtain an upper bound on the entropy of PI. We simply maximize the entropy of PI subject to the constraint in (C12), getting $2\mathcal{H}(\Pr(PI = -1))$. The entropy of PI is the protocol information about message arrivals, and using the value for $\Pr(PI = -1)$ in (C10) and (C11), we have completed the derivation of (21).

We turn now to the average message delay. We split this delay into two terms: first, the delay of each message before the first sample time following the message arrival, and second, the integral number of sample times that the message waits until the sample at which it is transmitted. Since the message arrivals are Poisson, the average delay before the next sample is half a sample time, or $1/(2\alpha)$. For the second component of average delay, we use the law of large numbers. The accumulated delay in this component over all the messages that arrive before the $L$th sample time is simply $1/\alpha$ times the sum of the number of messages in the queue over the first $L-1$ sample times. By the law of large numbers, this sum divided by $L$ approaches the mean length of the queue with probability one. Similarly, the number of message arrivals divided by $L$ approaches one with probability one. Thus the average message delay $d$ is given by

$$d = \frac{1}{2\alpha} + \frac{1}{\alpha}E(\text{Mq}) \tag{C16}$$

where Mq is the length of the queue at the end of a sample.

The length of the queue is related to the nominal queue length Nomq. Whenever PI = $-1$ occurs, Nomq jumps from zero to $N-1$ and Mq remains at zero. On each subsequent sample time, the difference between Nomq and Mq is reduced by one until the difference is zero. Thus the mean value of Nomq exceeds the mean value of Mq by $\Pr(PI = -1)N(N-1)/2$, and

$$d = \frac{1}{2\alpha} - \Pr(PI = -1)\frac{N(N-1)}{2\alpha} + \frac{1}{\alpha}E(\text{Nomq}), \tag{C17}$$

$$E(\text{Nomq}) = \sum_{j=0}^{2N-1} j\,Q_N(j).$$

Using the fact that $Q_N(j) + Q_N(j+N) = 1/N$, for $0 \le j \le N-1$,

$$E(\text{Nomq}) = N + \frac{N-1}{2} - N\sum_{j=0}^{N-1} Q_N(j).$$

From (C9) and (C2),

$$Q_N(j) = \frac{C_j}{NC_N}.$$

Using (C6) and overbounding by ignoring the error terms $\epsilon_i$, we obtain

$$E(\text{Nomq}) \le N - \frac{2(N+1)}{3N+4}.$$

Substituting this into (C17) and simplifying, we have

$$d \le \frac{1}{\alpha}\left[N - \frac{5N-3}{12N+16}\right]. \tag{C18}$$
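The steps above are easy to check numerically. The following sketch (an added illustration, with $N = 4$ as an arbitrary choice) evaluates the recurrence (C1), compares it with the asymptote (C6), and evaluates the idle probability (C10) against the approximation (C11):

```python
import math

def C_sequence(n):
    # C_0..C_n from the recurrence (C1) with C_0 = 1
    C = [1.0]
    for i in range(1, n + 1):
        tail = sum(C[j] / math.factorial(i - j) for j in range(i - 1))
        C.append((math.e - 1) * C[i - 1] - tail)
    return C

C = C_sequence(12)
for i, c in enumerate(C):
    asym = (2 / math.e) * (i + 4 / 3)          # asymptotic value from (C6)
    print(i, round(c, 6), f"{c - asym:+.1e}")  # error term eps_i -> 0

N = 4
pr_idle = 1 / (math.e * N * C[N])  # exact Pr(PI = -1), from (C10)
approx = 1 / (2 * N * (N + 4 / 3)) # approximation (C11)
print(pr_idle, approx)             # both ~ 0.0234
```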

REFERENCES

[1] N. Abramson, "Packet switching with satellites," in AFIPS Conf. Proc. (1973 National Computer Conference), vol. 42, 1973.
[2] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[3] L. R. Ford and D. R. Fulkerson, Flows in Networks. Princeton, NJ: Princeton University Press, 1962.
[4] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[5] R. G. Gallager and D. C. Van Voorhis, "Optimal source codes for geometrically distributed alphabets," IEEE Trans. Inform. Theory, vol. IT-21, pp. 228-230, Mar. 1975.
[6] B. Haskell, "The computation and bounding of rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-15, pp. 525-531, Sept. 1969.
[7] R. Kahn, "Resource-sharing computer communication networks," Proc. IEEE, vol. 60, Nov. 1972.
[8] L. Tymes, "Tymnet: A terminal oriented communications network," in AFIPS Conf. Proc., vol. 38, 1971.

