Information Theory
Chapter 1

Introduction

1.1 Introduction

The modern age is the age of information. Information is everywhere, encapsulated as letters, words, sentences, prose, sounds, signals and pictures. Everyone sees information differently, but in general information is the level of abstraction succeeding data and preceding knowledge. Human beings perceive information as a critical tool for decision making, especially in uncertain environments. The human ability to gather and process information, coupled with the ability to communicate ideas to other similar entities, played a major part in establishing human dominance over species that are physically stronger and sharper.

Information can be seen as -

Level of abstraction - Information is the level of abstraction obtained after processing data. The data can be any sensory input, and the processing can be done by mapping the data onto a context.

Measure of disorder - Information measures the disorder of a system. The measure of information is entropy, which closely parallels the measure of disorder in physical systems.

Reduction in complexity - Gathering information reduces the complexity of a system. Information is thus a tool for representing complex systems and their processes.

Tool for control and transformation - Information about a system provides knowledge of its state and of changes to that state. When a system undergoes a change it produces information; conversely, providing specific information to the context can influence behaviour and produce a required change.

Mode of communication - Information is a mode of communicating ideas, thoughts and expectations.

    1.2 Information Theory

Information theory is a branch of applied science which uses techniques from mathematics, physics and engineering to quantify information and to devise methods of communicating, storing, retrieving and processing information.

Early attempts to describe information on the basis of philosophy and science can be dated back to Leibniz; however, the major breakthrough for information theory came when C. E. Shannon published his seminal paper A Mathematical Theory of Communication in the Bell System Technical Journal in 1948.

Shannon's work built on earlier attempts by R. V. Hartley and H. Nyquist, published in the same journal some twenty years earlier. Shannon's work defined the subject in a new, intuitive way and provided a solid theoretical foundation on which many applications and theories have been built.

Other notable attempts to define information came from Kolmogorov, Wiener and Fisher.

Shannon's paper not only provided a mathematical basis for communicating information but also proposed a systems approach to the communication model. In 1949 W. Weaver published an article describing Shannon's work, and it was this interpretation of Shannon's theory that extended it to other fields such as philosophy and management.

1.3 Weaver's Interpretation of Shannon's Work

Shannon's background was in electrical engineering, and hence he was concerned only with the technical aspect of the communication problem. Weaver's article, however, presented the whole idea of communicating information in different ways, including the philosophical implications of communicating information.

Weaver classified the problems of communication into three categories: technical, semantic and control.

Problem Class A: The Technical Problem - Problems related to communicating a message from one entity (the sender) to another entity (the receiver). (Shannon's work)

Problem Class B: The Semantic Problem - Problems related to the semantic aspects of a message or of information, i.e. the meaning of the message received. (Floridi; Zadeh - fuzzy information theory)

Problem Class C: The Control Problem - Problems concerning the effect of communicated information on the behaviour of the receiver, i.e. does the communicated message induce the desired behaviour in the receiver or not? (Wiener, Kolmogorov, Chaitin and Vigo)

In the above classification the technical problems concern the accuracy of transferring sets of symbols from sender to receiver. The set can be a discrete set (like written speech), a continuously varying signal (like voice or music) or a continuously varying two-dimensional pattern (like television).

The second class of problems, the semantic problem, has deep significance in daily life. Almost all human communication is based on semantic information, such as language (a set of symbols which produce different meanings in different combinations) or body gestures. The semantic problem is also difficult to handle on the basis of the technical aspects of information theory alone.

Class C problems are a logical extension of Class B problems, in the sense that, after understanding the content of the message, does the receiver behave in the desired way? The question of control through information is a topic of intense research in the humanities and the cognitive sciences.

Weaver (page 5) also argues that all three levels of the problem are interrelated and that the technical side is at the heart of any solution we propose to the communication problem. Hence sending the message correctly and completely, without errors, is the first important step.

    1.4 Structure of this Review

Information theory has, since its conception, become a potent subject of study and research. It now intersects with other areas of science, such as computer science, fuzzy theory, electronics engineering and systems theory, to address some very old problems of various applied subjects; in a similar manner it is merging with the cognitive sciences, psychology, communication, management, information systems and economics to develop theories of human interaction and nature. The present review of information theory is an attempt to trace its history, describe its present status and assess its future potential both in the applied sciences and in the humanities. The structure of this review is based on Weaver's classification (see Section 1.3), and the particulars are as follows -

Chapter I - Introduction - Presents a general overview of information theory and a classification of the problems related to it.

Chapter II - The Technical Problem - The technical problem and its description. The works of Shannon, Kolmogorov and Wiener. Entropy and its relation to information. Improvements on Shannon's theory. Applications in other areas such as cryptography and coding.

Chapter III - The Semantic Problem - Overview of the semantic problem. Various views on the meaning of a message. Attempts at quantification of semantic content - semantic and fuzzy information theory.

Chapter IV - The Control Problem - The future of information theory. Controlling aspects of information. Information in general - representation of the complexity of a system. Application of the various theories to problems in science, engineering and the humanities.

*************

Chapter 2

The Technical Problem

In 1948 C. E. Shannon published A Mathematical Theory of Communication in the Bell System Technical Journal. It was not the first attempt to propose a mathematical framework for communication, but it was by far the most complete and comprehensive theory of communication. According to Verdu, Shannon's discovery of the fundamental laws of data compression and transmission marks the birth of information theory. Shannon's theory was an attempt to unify various fields such as probability, statistics and computer science, and it continues to inspire new findings in the field of information communication. In this chapter we examine Shannon's main achievements and the applications of his theory in other fields.

2.1 Attempts to Define Information Mathematically

Before 1948 the major existing modes of communication were the telephone, the telegraph (Morse code), wireless telegraphy, television, and FM and AM radio. Each of these methods contains elements of information theory; Morse code, for example, encodes information efficiently by assigning shorter codes to the more frequent symbols.

In 1924 H. Nyquist first proposed the use of the logarithm for defining the transmission rate,

W = K log m

where W is the speed of transmission of intelligence (the number of characters which can be transmitted in a given length of time), m is the number of current values, and K is a constant.

Nyquist also discussed the possibility of a transmission gain obtained by replacing Morse code with an optimum code. Similar studies were conducted by Kuppfmuller, Kotelnikov, Whittaker and Gabor, targeted towards signalling speed or bandwidth manipulation. R. Hartley's paper of 1928 introduced terms such as the capacity to communicate information and the rate of communication and, most importantly, a quantitative measure of information.

Hartley defines the amount of information on the basis of the choices of messages available to the sender. If there are n selection states and s symbols available in each selection, then the amount of information H is defined by

H = n log s

Two features are prominent in these early attempts: first, information is defined statistically, on the basis of the frequencies of letters and words (which was useful for cryptography); and second, there was no study of the effect of noise (i.e. unwanted information or disturbance caused by the medium of communication).
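As a quick numerical illustration of these two early measures (a sketch added here for concreteness, not part of the original papers; the function names and the choice of base-2 logarithms are mine, and the base only fixes the unit of measurement):

```python
import math

def nyquist_rate(m: int, K: float = 1.0) -> float:
    """Nyquist's speed of 'intelligence' W = K log m,
    where m is the number of distinct current values."""
    return K * math.log2(m)

def hartley_information(n: int, s: int) -> float:
    """Hartley's measure H = n log s for n selections from an
    alphabet of s symbols (base 2, so the unit is the bit)."""
    return n * math.log2(s)

# With 4 current values a single selection carries log2(4) = 2 units;
# 10 selections from a 26-letter alphabet carry about 47 bits.
print(nyquist_rate(m=4))            # 2.0
print(hartley_information(10, 26))  # ~47.0
```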

During the Second World War the need to transmit information at higher rates and with better security was immense. There were several attempts to build a theory of communication, including the trade-offs between transmission rate, reliability, bandwidth and signal-to-noise ratio. Prominent names were Norbert Wiener, A. N. Kolmogorov, R. A. Fisher and C. E. Shannon.

In his ground-breaking book Cybernetics, Wiener classifies the interests of his peers. He writes:

The idea of [probabilistic] information occurred at about the same time to several writers, among them the statistician R. A. Fisher, Dr. Shannon of Bell Labs and the author. Fisher's motive in studying the subject is to be found in classical statistical theory, that of Shannon in the problem of coding information and that of the author in the problem of noise and message in electrical filters. Let it be remarked parenthetically that some of my speculations in this direction attach themselves to the earlier work of Kolmogorov...

Shannon's and Wiener's works appeared almost simultaneously, delayed by the war but accepted and popularized very quickly in both theoretical and applied realms. Today information theory is synonymous with Shannon's communication theory, although in the early years it was also known as the Wiener-Shannon communication theory.

2.2 The Birth of Information Theory - Shannon's Work

In 1945 Shannon wrote a technical report on information systems and the use of cryptography. This report contains the phrases "entropy" and "information theory", which were later defined in the paper published in 1948 entitled A Mathematical Theory of Communication.

The publication was at that time the most comprehensive of its kind and had a strong basis in mathematical theory. At the outset of the paper Shannon asserted that the semantic aspects (the meaning) of communication are irrelevant to the engineering problem. However, he made an important note of how the basic structure of language helps in the coding and communication of information. In Shannon's words:

We can think of a discrete source as generating the message symbol by symbol. It chooses successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question.

Thus Shannon had found a way to exploit the redundancy of language and, based on this, he proposed a mathematical model of a communication system which produces a sequence of symbols according to probabilities (as a stochastic process, specifically a Markov process).

Shannon's major breakthrough was to define the measure of information, i.e. the rate at which a Markov process generates information. Consider a single random variable with n possible outcomes with probabilities p_1, p_2, ..., p_n; then the measure of choice, uncertainty or information is given by

H = -\sum_{i=1}^{n} p_i \log p_i

The quantity H is known as entropy [1]. Entropy measures the average uncertainty in the message. The unit of measure of information is known as the bit [2]. The important properties of this measure H are as follows -

1. H is continuous in the p_i.

2. H is non-negative, since 0 <= p_i <= 1 implies log p_i <= 0 and hence -p_i log p_i >= 0.

3. H = 0 if and only if exactly one of the p_i equals one and all the rest are 0, i.e. if we are certain about the outcome then the entropy is zero.

4. If all p_i are equal to 1/n then H attains its maximum value log n, i.e. when every outcome is equally likely the entropy is maximal.

5. Let x and y be two events with m and n possibilities respectively. If p(i, j) is the probability of the joint occurrence of x_i and y_j, then the entropies of the joint and marginal events are given by

H(x, y) = -\sum_{i,j} p(i, j) \log p(i, j)
H(x) = -\sum_{i,j} p(i, j) \log \sum_{j} p(i, j)
H(y) = -\sum_{i,j} p(i, j) \log \sum_{i} p(i, j)

6. Any operation which equalizes the probabilities increases H.

7. The conditional entropy of y is defined by H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j). We can also write H(x, y) = H(x) + H_x(y), where x and y are not necessarily independent.

[1] The name was suggested by John von Neumann.
[2] Suggested by John Wilder Tukey.
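A short numerical sketch of these definitions (added here for illustration; the variable names and example distributions are mine, and logarithms are taken to base 2 so the unit is the bit):

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy H = -sum p_i log2 p_i; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def joint_and_conditional(joint):
    """Given a joint distribution p(i, j) as a dict {(i, j): prob},
    return H(x, y), H(x), H(y) and the conditional entropy H_x(y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (i, j), p in joint.items():
        px[i] += p
        py[j] += p
    Hxy = entropy(joint.values())
    Hx, Hy = entropy(px.values()), entropy(py.values())
    return Hxy, Hx, Hy, Hxy - Hx          # H_x(y) = H(x, y) - H(x)

# A fair coin has the maximal entropy log2(2) = 1 bit; a biased coin has less.
print(entropy([0.5, 0.5]), entropy([0.9, 0.1]))   # 1.0, ~0.469

# Two correlated binary variables; property 7 holds by construction.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(joint_and_conditional(joint))
```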

    2.2.1 Model of Communication System

Shannon also presented a model of a general communication system, which comprises five parts (see Figure 2.1).

Figure 2.1: Schematic Diagram

Source - An information source is the producer of the messages to be sent. A message can be a discrete sequence such as written letters, a function of time such as a telephone signal, or a function of several variables or several functions, such as a television signal.

Transmitter - The transmitter converts the message into signals which can be transmitted over the channel. For example, in telegraphy words are changed into sequences of spaces, dashes and dots.

Channel - The channel is the medium used to transmit the signal from the transmitter to the receiver; examples are the transmission lines used for telegraph and telephone. During transmission the signal may be disturbed by unwanted signals known as noise.

Receiver - The receiver performs the reverse operation on the incoming signal to convert it back into a message; it reconstructs the message from the received signal.

Destination - The destination is the ultimate end of the communication system, the person or entity for which the message was sent.
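The five-part model can be paraphrased as a toy pipeline. The sketch below is my own illustration, not Shannon's: it encodes a text message into bits, passes them through a binary symmetric channel that flips each bit with a small probability (the noise), and reconstructs the text at the destination; without an error-correcting code, flipped bits surface as corrupted characters.

```python
import random

def transmitter(message: str) -> list[int]:
    """Encode text into a bit sequence (8 bits per character)."""
    return [int(b) for ch in message for b in format(ord(ch), "08b")]

def channel(bits: list[int], flip_prob: float = 0.01) -> list[int]:
    """Binary symmetric channel: each bit is flipped with probability flip_prob."""
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def receiver(bits: list[int]) -> str:
    """Reconstruct the message from the (possibly corrupted) bit stream."""
    return "".join(chr(int("".join(map(str, bits[i:i + 8])), 2))
                   for i in range(0, len(bits), 8))

message = "HELLO"                                     # source
received = receiver(channel(transmitter(message)))    # what the destination sees
print(message, "->", received)
```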

    2.2.2 Information and Entropy

The quantity which measures the information in any message is known as entropy, H = -\sum p_i \log p_i. The quantity is similar to its homonym, the measure of randomness in physics. Entropy in the physical sciences was introduced by Clausius in the nineteenth century, and the works of Boltzmann and Gibbs then firmly established the concept in statistical mechanics and thermodynamics. In the words of A. Eddington:

The second law of thermodynamics - the law that entropy always increases - holds the supreme position among the laws of nature.

The resemblance between physical entropy, H = -\sum f_i \log f_i, and informational entropy, H = -\sum p_i \log p_i, has deep significance in information theory. In the physical sciences entropy measures the degree of randomness or disorder of a system. It is a fundamental property of any dynamic system to become more and more disordered or shuffled; this property gives time its arrow. Similarly, informational entropy measures the average surprise in a message. If a message consists of an easily discernible pattern of letters repeated over and over, then it contains very little information, because we can predict the next string of letters.

2.2.3 Some Important Theorems from Shannon

Shannon not only classified the sources of messages but also provided efficient ways to encode communication and compress data on the basis of entropy. According to Shannon, H is approximately the logarithm (base 2) of the reciprocal probability of a long typical sequence, divided by the number of symbols in the sequence. The following theorem is given -

Theorem 2.2.1 (Shannon) Given any \epsilon > 0 and \delta > 0, we can find an N_0 such that the sequences of any length N \geq N_0 fall into two classes:

1. A set whose total probability is less than \epsilon.

2. The remainder, all of whose members have probabilities satisfying the inequality

\left| \frac{\log p^{-1}}{N} - H \right| < \delta

Coding and Data Compression

In the above theorem the second class of sequences are known as typical sequences. The probability of the atypical set decreases exponentially with increasing block length. This fact is useful in data compression and coding: we can neglect the atypical sequences and code the typical sequences, treating them as equiprobable. The resulting encoding of a string of N source symbols is a binary string of roughly HN bits, and increasing the block length makes the probability of failing to recover the signal as small as desired. The above theorem provides a suboptimal coding method; in the next theorem Shannon gives a criterion for optimal coding.

Theorem 2.2.2 (Strong Converse Source Coding Theorem) Define n(q) to be the number of most probable sequences of length N which accumulate to give total probability q. Then

\lim_{N \to \infty} \frac{\log n(q)}{N} = H

where q is not equal to 0 or 1.

In other words, as long as we require a probability of error strictly less than 1, we cannot asymptotically encode at rates below the entropy. The weak converse theorem states that the error probability cannot vanish if the compression rate is below the entropy.
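The concentration described by Theorem 2.2.1 can be observed numerically. The following sketch (my own illustration, for a memoryless binary source only, whereas Shannon's theorem covers more general ergodic sources) draws a length-N sequence from a Bernoulli(p) source and compares -log2(p(sequence))/N with the entropy H:

```python
import math, random

def entropy_bernoulli(p: float) -> float:
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def empirical_rate(p: float, N: int) -> float:
    """Draw one length-N sequence from a Bernoulli(p) source and return
    -log2(probability of that sequence) / N, which for large N is close to H
    (the sequence is almost surely 'typical')."""
    seq = [1 if random.random() < p else 0 for _ in range(N)]
    log_prob = sum(math.log2(p) if s == 1 else math.log2(1 - p) for s in seq)
    return -log_prob / N

p, H = 0.2, entropy_bernoulli(0.2)
for N in (10, 100, 10_000):
    print(N, round(empirical_rate(p, N), 3), "vs H =", round(H, 3))
```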

    Channel Capacity

The channel capacity is the maximum possible rate at which a system can transmit information. Mathematically, the capacity of a discrete channel is defined by

C = \lim_{T \to \infty} \frac{\log N(T)}{T}

where N(T) is the number of allowed signals of duration T.

Theorem 2.2.3 (The Fundamental Theorem of the Discrete Noiseless Channel) Let a source have entropy H (bits per symbol) and a channel have capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate of C/H - \epsilon symbols per second over the channel, where \epsilon is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.
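As a worked instance of the capacity definition (an illustration added here, not taken from the original text): for a noiseless channel that can carry one of two symbols per unit time, with no constraints on which sequences are allowed, the number of distinct signals of duration T is N(T) = 2^T, so

C = \lim_{T \to \infty} \frac{\log_2 N(T)}{T} = \lim_{T \to \infty} \frac{\log_2 2^{T}}{T} = 1 \text{ bit per unit time}

Constraints on the allowed signals (for example, symbols of unequal duration, as in telegraphy) make N(T) grow more slowly and lower the capacity accordingly.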

    Discrete Channel With Noise

When we transmit a signal, the receiver gets the signal plus some perturbations due to the medium, known as noise; thus the received signal cannot be completely reconstructed into the original transmitted signal. Noise can be a distortion, a disruption or an overlapping of the incoming signal. Distortion is easy to remove because the distorting function is invertible; in the cases where the signal does not always undergo the same change in transmission, the noise has to be compensated for by other methods. If E is the received signal, S the transmitted signal and N the noise, then E = f(S, N), i.e. E is a function of S and N.

When we feed a source into a noisy channel there are two statistical processes at work: the source and the noise. If H(x) is the entropy of the source and H(y) the entropy of the receiver, then when there is no noise H(x) = H(y). If H(x, y) is the joint entropy, then the following relation holds for a noisy channel -

H(x, y) = H(x) + H_x(y) = H(y) + H_y(x)

If the channel is noisy then the incoming signal cannot be reconstructed into the transmitted signal with certainty by any operation on E. However, there are ways to guess or fill in the missing information. According to Shannon the conditional entropy is the apt measure of this missing information, and thus the actual rate of transmission R is given by

R = H(x) - H_y(x)

which is the amount of information sent less the uncertainty about what was sent. The quantity H_y(x) is known as the equivocation; it measures the average ambiguity of the received signal. The following theorem uses the equivocation and proposes the use of a correction channel, which enables the receiver to measure the noise and correct the errors.

Theorem 2.2.4 (Correction Channel) If the correction channel has a capacity equal to H_y(x), it is possible to so encode the correction data as to send it over this channel and correct all but an arbitrarily small fraction of the errors. This is not possible if the channel capacity is less than H_y(x).

Channel Capacity of a Noisy Channel

The channel capacity of a noisy channel is the maximum possible rate of transmission when the source is matched to the channel -

C = \max \left( H(x) - H_y(x) \right)

where the maximum is taken over all possible inputs. If the channel is noiseless then H_y(x) = 0.

Theorem 2.2.5 (Fundamental Theorem of the Discrete Channel with Noise) Let a discrete channel have capacity C and a discrete source entropy per second H. If H \leq C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors, or an arbitrarily small equivocation. If H > C it is possible to encode the source so that the equivocation is less than H - C + \epsilon, for arbitrarily small \epsilon. There is no method of encoding which gives an equivocation less than H - C.

2.3 Applications of Shannon's Theory and Improvements

In this section we discuss the applications of Shannon's theory and subsequent improvements by other significant contributors. In 1949 Shannon's other paper, Communication Theory of Secrecy Systems, was made public. With these two papers Shannon had established a deep theoretical foundation for various branches of digital communication - data compression, data encryption and error correction (Gappmair).

    Coding Theory

The fundamental problem in communication engineering is ideal encoding, i.e. how to transmit as many bits of information as possible while maintaining high reliability (Jan Kahre). Coding theory is the study of codes and their various applications. Codes are used for data compression (known as source coding), cryptography, digital communication, network communication and error correction (channel coding).

Data encoding involves conflicting goals: more information can be sent, but accuracy will be compromised. The second part of Shannon's paper [MTC] addresses the issue of encoding information in an optimum way. According to Shannon, the objective of encoding is to introduce redundancy so that, even if some of the information is lost or corrupted, it will still be possible to recover the message at the receiver (error-correcting codes).

Shannon proposed (1957) a further measure, mutual information, which measures the information gained about one random variable by observing another variable. If x and y are two random variables, then the mutual information of x relative to y is given by

I_y(x) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}

The following relation holds for mutual information: I_y(x) = H(x) - H_y(x), where H_y(x) is the equivocation. Thus the mutual information is the gain in the information coding of x when we know y, compared with when we do not know y.
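A small numerical check of this relation (my own sketch; the joint distribution below, a binary symmetric channel with crossover probability 0.1 and a uniform input, is chosen only for illustration):

```python
import math
from collections import defaultdict

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I_y(x) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
Hx, Hy, Hxy = H([0.5, 0.5]), H([0.5, 0.5]), H(joint.values())
equivocation = Hxy - Hy                     # H_y(x)
print(mutual_information(joint))            # ~0.531 bits
print(Hx - equivocation)                    # same value: I_y(x) = H(x) - H_y(x)
```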

Shannon completed the theory of coding with the following two theorems -

Theorem 2.3.1 (Source Coding Theorem) The number of bits necessary to uniquely describe any data source can approach the corresponding information content as closely as desired. If there are N independently distributed random variables, each with entropy H, then these data can be compressed into NH bits with negligible risk of information loss (as N \to \infty); conversely, if the data are compressed into fewer than NH bits, some data will be lost.

Theorem 2.3.2 (Channel Coding Theorem) The error rate of data transmitted over a band-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity.

Shannon did not himself propose algorithms for encoding information, but in subsequent years the growth of digital communication saw the rise of coding theory as a separate branch of investigation in engineering and science. Shannon's channel coding theorem anticipated the forward error correction schemes invented in 1967 by A. J. Viterbi. Similarly, further data compression methods (both lossless and lossy) have been found since the emergence of mobile communication. The following figure (adapted from Gappmair) presents an overview of various coding schemes.

Figure 2.2: History of Coding

    2.3.1 Entropy and its Generalization

The relation between physical entropy and informational entropy goes far back in history. In the 1860s Maxwell proposed a thought experiment now known as Maxwell's demon. Maxwell imagined a bifurcated chamber with gas molecules moving randomly. The partition has a hinged gate which can be moved by a demon with negligible work. The demon watches the molecules, stopping the fast-moving ones from crossing from side A to side B and stopping the slow-moving ones from crossing from B to A. After a finite time side A contains only fast-moving molecules and side B only slow-moving molecules. The whole system is then more ordered, and hence has lower entropy. This violates the second law of thermodynamics, which states that the entropy of a system does not decrease without work being done (see Figure 2.3).

Figure 2.3: Maxwell's Demon - A Thought Experiment

Maxwell's thought experiment was controversial at the time and was interpreted in various ways to overcome the violation of the second law. One interpretation was given by Leo Szilard, who proposed that the demon has to gather information about the speeds of the gas molecules in the chamber; gathering and using this information accounts for the work needed to reduce the entropy of the system.

Shannon proposed not only the measure of information H = -\sum p_i \log p_i but also identified its maximum and minimum cases. The maximum possible entropy, H_max = \log n, is attained when each of the n possible messages has equal probability 1/n; this case is also known as white noise. The opposite case (a single message) is obtained when one message has probability 1 and all other n - 1 messages have zero probability, i.e. we can send only one message; then H_min = 0.

Informational entropy gives a value only for the potential for information as we go from one letter of the text to the next. The value of the entropy denotes the average choice available to the sender about the message. A high value of H does not by itself denote high information content, because the higher uncertainty may be caused by noise or by the message being entirely gibberish. As discussed earlier in the chapter, the semantic aspects of the message are neglected in the technical theory of communication, but another article by Shannon, in 1951, describes how entropy (uncertainty) is counterbalanced by an inherent property of the communication medium, i.e. language, known as redundancy. If the sender selects some part of the message freely, then the remaining part is filled in by redundancy (language structure, grammatical rules and rules of word formation).

Thus the communication process, as described by Shannon, is essentially a statistical process of selecting and sending messages from a set of possible messages, and hence it is suitably described by a measure based on probability. Other entropies have been defined by various researchers, some of which are refinements of Shannon's entropy and some generalizations of the entropy function or of the communication process. The following is a list of various (information) entropies -

Shannon entropy, continuous case - If information is coming from a continuous source with probability density P = f(x), then the continuous equivalent of Shannon's entropy is given by

H(P) = -\int f(x) \log f(x) \, dx

Renyi entropy - The Renyi entropy is a generalization of Shannon's entropy, used for measuring information from weak signals (of lower probabilities) that are overlapped by stronger signals (of higher probabilities). The Renyi entropy of order \alpha is defined by

H_\alpha = \frac{1}{1 - \alpha} \log \left( \sum_{i=1}^{n} p_i^{\alpha} \right)

Apart from the parameter \alpha, the Renyi entropy behaves like Shannon's entropy (and tends to Shannon's entropy as \alpha \to 1). It has maximum value H_max = \log n at p_i = 1/n and satisfies the additivity property. The inclusion of the parameter \alpha makes it more sensitive to selected frequencies.

Tsallis entropy - The Tsallis entropy of order q is defined by

T_q = \frac{1}{q - 1} \left( 1 - \sum_{i=1}^{n} p_i^{q} \right)

Min-entropy - The min-entropy is the limiting value of the Renyi entropy as \alpha \to \infty. It is given by

H_\infty = -\log \max_{x \in X} p(x)
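The following sketch (added for illustration; the distribution and parameter values are arbitrary) evaluates these entropies on one distribution and checks the stated limiting behaviour, i.e. that the Renyi entropy approaches the Shannon entropy as alpha tends to 1 and approaches the min-entropy as alpha grows large:

```python
import math

def shannon(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, alpha):
    """Renyi entropy H_alpha = 1/(1 - alpha) * log2( sum p_i**alpha ), alpha != 1."""
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

def tsallis(p, q):
    """Tsallis entropy T_q = 1/(q - 1) * (1 - sum p_i**q), q != 1."""
    return (1 - sum(x ** q for x in p)) / (q - 1)

def min_entropy(p):
    """Limit of the Renyi entropy as alpha -> infinity: -log2 max p_i."""
    return -math.log2(max(p))

p = [0.7, 0.2, 0.1]
print(shannon(p))           # ~1.157 bits
print(renyi(p, 1.0001))     # ~1.157: approaches the Shannon value as alpha -> 1
print(renyi(p, 2), renyi(p, 20))
print(tsallis(p, 2))        # 0.46
print(min_entropy(p))       # ~0.515, close to renyi(p, 20)
```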

    2.3.2 End Note on the Chapter -

As is clear from Shannon's theory and its subsequent improvements, by Shannon and by other scientists, this body of work has strengthened digital communication and established information theory as an important subject of research and study. Before Shannon's initial attempt there was no useful definition of information, and no yardstick with which to measure the information content of a communication. Shannon introduced terms such as entropy, channel capacity and encoding, and produced results which became the pillars of the subject. Shannon titled his paper A Mathematical Theory of Communication, but later referred to the theory as information theory. Shannon also noted that the theory was gaining immense, even overwhelming, popularity, and cautioned the research community about the resulting bandwagon effect. The theory itself left open questions about the meaning and control aspects of communication.

Shannon declared that the mathematical/technical aspect is independent of the semantic aspect of information theory; however, to apply Shannon's theory in fields such as economics, psychology or biology one cannot overlook these aspects. Shannon set the semantic aspects aside because of his belief that they play no part in the technical side of the theory. Weaver, in his paper, clarified the interconnection between the technical, semantic and control sides of information theory and how they are related to each other, with the technical or mathematical aspect of communication at the heart of it.

    *************


Chapter 3

The Semantic Problem

The credit for establishing information theory as a discipline of study goes to C. E. Shannon and Norbert Wiener. However, making the theory famous and accessible to the general scientific community was the work of Warren Weaver, who published The Mathematics of Communication in Scientific American in 1949. Weaver defines communication in a broad sense, as a process by which one entity may affect another entity. This process may be of three types: technical - sending and receiving messages; semantic - the meaning of the message; and influential - controlling the behaviour of the receiver of the message through communication.

Shannon's theory is an attempt to establish rules for the syntactic part, but it neglects the semantic and control aspects. Semantics is an important aspect of communication, especially in human communication, where we have to convey our thoughts to similar entities or have to understand the meaning of what others are communicating.

In 1938 C. W. Morris published a book on the theory of signs titled Foundations of the Theory of Signs. In it he describes the communication process, through signs and symbols, not only of humans but of all organisms capable of communication. Human communication depends partly on language. A language is a collection of symbols, groups of symbols and rules for grouping the symbols. Language can be understood as a coding method which a human uses to encode his or her thoughts and communicate them. Other modes of communication are the performing arts (music, dance, acting, painting, etc.) and body language.

Morris defines the process of communication through signs, semiotics [1], as a universal process and an interdisciplinary undertaking. Morris's foundation divides semiotics into three interrelated disciplines -

1. Syntactics - the study of the methods by which signs may be combined to form compound signs.

2. Semantics - the study of the significance of signs.

3. Pragmatics - the study of the origins, uses and effects of signs.

Thus semantics is the study of the combinations of signs which indicate some meaning to the receiver of the message. The success of communication depends on whether or not it generates the desired behaviour on the part of the receiver.

[1] A version of the word semiosis, given by Ch. S. Peirce.

3.1 Understanding the Semantic Problem

Weaver writes in his paper -

The semantic problems are concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender. This is a very deep and involved situation, even when one deals only with the relatively simpler problems of communicating through speech.

The above quotation sums up the semantic problem adequately: it concerns understanding on the side of the receiver of the meaning intended by the sender. The complexity of this problem can be illustrated by a simple example. Suppose A asks B "Do you understand me?", and B replies "No"; then we cannot be certain whether B has not understood the question or has not understood the meaning of the question.

Thus the semantic problem of communication can be interpreted as a problem of approximation (understanding as closely as possible the intended meaning) and of coding (encoding understanding/meaning in one mode of communication into another mode). Weaver also introduces elements of meaning and effectiveness into the schematic diagram proposed by Shannon (see Figure 3.1).

Figure 3.1: Modified Communication Scheme For Semantic Messaging

The schematic diagram now contains the following new blocks -

Semantic coding - Semantic coding becomes the first coding of information into a message. It is the mode of expression of the meaning intended by the sender; generally it is language, but it can be any means of expression, such as music, art, painting or writing.

Semantic noise - The unwanted (or essential) additions to the information content of the message that make it receivable or understandable. It works like the engineering noise present in the medium, but it affects the semantic content of the message. This noise is produced on the part of the sender, and often takes the form of redundancy.

Semantic receiver/decoder - The semantic receiver works on the side of the receiver; it takes the message and decodes it on a semantic basis. For example, the voice of a known person affects us differently from a stranger shouting our name.

The theory of the semantic problem thus works differently from that of the technical problem. The technical problem is independent of the mode of the message (language, music, pictures, etc.); it is concerned with how the content (the message) can be sent from one point (the sender) to another point (the receiver) efficiently and quickly. The semantic problem, however, starts with the selection of the communication mode and continues until the intended message is understood by the receiver and the required response is generated (at which point it merges with the influential problem). The main questions in semantic information theory are -

1. How can semantic information be quantified?

2. How can semantic information theory help in data compression and encryption (reliable communication)?

3. How is engineering coding related to semantic coding?

4. How can semantic noise be identified and quantified?

5. Are there bounds in semantic coding analogous to those of Shannon's classical information theory?

6. How can we improve semantic communication, where the content of the message matters?

Compared with Shannon's theory (information as a sequence of bits), semantic theory is a wholly different approach (the meaning or content of the message). In the next section we shall see some existing theories of semantic information.

    3.2 Theories of Semantic Information -

Theories of semantic information are applied in those communication cases where the semantic content is important. The endeavour of formulating a theory of semantic information started soon after Shannon proposed his communication theory (generally known as classical information theory). The most notable contributors were C. S. Peirce (1918), Donald MacKay (1948), Bar-Hillel & Carnap (1952), L. A. Zadeh (1971-) and Floridi (2000-). The following are some theories which try to capture the essence of semantics -

3.2.1 Classical Semantic Theory - Bar-Hillel & Carnap

In 1952 Yehoshua Bar-Hillel and Rudolf Carnap presented their Theory of Semantic Information. It was the first attempt to define the semantic content of a message. The highlight of (classical) semantic theory (CST) is that it uses ideas similar to Shannon's and defines measures of semantic content useful for application to technical as well as non-technical problems.

CST applies to a formal system of language, denoted L^\pi_n. It describes a universe which includes entities, formal statements (predicates), logical connectives and rules for forming compound formal statements. Statements can be basic or ordinary. A basic statement is a predicate applied to an entity ("Ram is tall"), and an ordinary statement is a combination of basic statements ("Ram is tall and Laxman is handsome"). Furthermore, there are state descriptions, which are statements about the universe in terms of a predicate applied to every entity of the universe. There are infinitely many possible state descriptions. If we take any ordinary statement and count the number of state descriptions which are made false by it, then this number gives the information content of the statement.

The more state descriptions are made false by a statement, the higher its information content. A tautology (true by definition) rules out nothing and thus contains zero information. A self-contradictory statement rules out everything and hence contains maximal information content. Statements which are logically indeterminate rule out some possibilities and so have some level of content: the more particular a statement, the more possible states it rules out and the more information content it has. [2]

To measure information content, Bar-Hillel and Carnap used logical probability instead of the classical probability used by Shannon. For a hypothesis H_i which is probably true given empirical evidence E, the degree of support of H_i by E is known as the logical probability of H_i given E. Logical probabilities are based on logical relations between propositions. The logical probability is denoted by m(A) = T_A / T_S, where T_A is the total number of logical states in which A is true and T_S is the total number of logical states.

Using the logical probabilities of state descriptions, CST proposes two information measures based on the inverse relationship property. These measures are defined as follows -

1. Content - Denoted Cont(A), the content measure measures the amount of content in the logical statement A. It is defined by Cont(A) = 1 - m(A), where m(A) is the logical probability of A.

2. Information - Denoted In(A), the information measure measures the semantic information of the logical statement. It is defined by

In(A) = \log \frac{1}{1 - Cont(A)} = \log \frac{1}{m(A)} = -\log m(A)

[2] The inverse of the information content is the range: the number of states which are confirmed (implied) by a statement.

The following are some properties of these two functions -

1. log denotes the logarithm to base 2.

2. For a basic statement B, Cont(B) = 1/2 and In(B) = 1.

3. For a conjunction C_n of n basic statements, Cont(C_n) = 1 - (1/2)^n.

4. For a disjunction D_n of n basic statements, Cont(D_n) = (1/2)^n.

5. 0 \leq In(A) \leq \infty.

6. In(A) = 0 if and only if A is logically true.

7. In(A) = \infty if and only if A is logically false.

8. In(A) is positive and finite if and only if A is factual.

9. If A logically implies B, then In(A) \geq In(B).

10. In(A) = In(B) if and only if A is logically equivalent to B.

11. In(A \wedge B) = -\log Cont(\neg A \vee \neg B).

12. In(A \vee B) = -\log Cont(\neg A \wedge \neg B).

13. If A and B are inductively independent, then In(A \wedge B) = In(A) + In(B).

14. The relative content of B with respect to A is defined by Cont(B/A) = Cont(A \wedge B) - Cont(A).

15. If A and B are inductively independent, then In(B/A) = In(B).

Bar-Hillel and Carnap also define an estimate of the amount of information. If H = {h_1, h_2, ..., h_n} is an exhaustive system on a given evidence e, then the estimate of the information carried by H with respect to e is given by the formula

Est(In, H, e) = \sum_{i=1}^{n} c(h_i, e) \, In(h_i / e)

where c(h_i, e) is the relative inductive probability.
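These definitions can be made concrete on a toy universe. The sketch below (my own illustration; the two-entity, one-predicate language and the statement names are assumptions, not from Bar-Hillel and Carnap) enumerates the state descriptions, computes the logical probability m, and reproduces properties 2, 3, 6 and 7 above:

```python
import math
from itertools import product

# Two entities (a, b) and one predicate P: a state description assigns P or
# not-P to every entity, so there are 2**2 = 4 equally weighted states.
states = list(product([True, False], repeat=2))   # (P(a), P(b))

def m(statement):
    """Logical probability: fraction of state descriptions in which the statement holds."""
    return sum(statement(s) for s in states) / len(states)

def cont(statement):
    return 1 - m(statement)

def inf(statement):
    prob = m(statement)
    return math.inf if prob == 0 else -math.log2(prob)

basic  = lambda s: s[0]                 # "P(a)"            -> Cont 1/2, In 1
conj   = lambda s: s[0] and s[1]        # "P(a) and P(b)"   -> Cont 3/4, In 2
tauto  = lambda s: s[0] or not s[0]     # tautology         -> Cont 0,   In 0
contra = lambda s: s[0] and not s[0]    # contradiction     -> Cont 1,   In infinity

for name, st in [("basic", basic), ("conjunction", conj),
                 ("tautology", tauto), ("contradiction", contra)]:
    print(name, cont(st), inf(st))
```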

    Critique on CST

Bar-Hillel and Carnap proposed CST along the lines of Shannon's classical information theory. However, the theory has the following limitations -

1. CST applies only to a formal universe containing only logical statements; it does not cover communication between two entities.

2. CST provides clear definitions, but it does not engage with the practical aspects of communication.

3. CST tries to measure the possible meanings of a set of propositions rather than their actual meanings.

4. In general CST applies only to a very restricted domain of formal statements; it cannot be applied to real-world languages.

5. CST assumes an "ideal receiver" which can understand all of the logic and consequences of an incoming message; the theory has no scope for misinformation or wrong information.

6. CST assigns infinite information content to a contradiction, which real-life situations do not confirm.

Some improvements to CST were made by J. Hintikka, through the inclusion of polyadic first-order logic and the definition of a new, non-zero information content measure for tautologies. Hintikka also addressed the problem of the maximal information content assigned to contradictions.

3.2.2 Floridi's Strongly Semantic Theory

In the classical semantic theory of Bar-Hillel and Carnap, a statement which is less probable or less possible contains more information (the element of surprise). However, we cannot make a sentence more and more wrong in order to increase its information content. Making a statement less likely to be true increases its information, but at a certain point it implodes, i.e. it becomes too informative to be true. Bar-Hillel and Carnap address this problem by saying that people do not communicate in self-contradictory statements because such statements assert too much. This phenomenon is commonly known as the Bar-Hillel-Carnap Paradox (BCP). The occurrence of the BCP shows the need for consistency conditions in any measure of semantic content. In 2004 L. Floridi provided a possible solution to the BCP in his strongly semantic theory.

Data-Based Definition of Information

Floridi starts by defining information on the basis of data and meaning. The definition, known as the General Definition of Information (GDI), states that \sigma is an instance of information, understood as semantic content, if and only if -

1. \sigma consists of n data (d), for n \geq 1.

2. the data are well-formed (wfd).

3. the wfd are meaningful (mwfd).

Postulate 1 affirms data as the building blocks of information; a single datum is an uninterpreted variable whose domain is left open to further interpretation. Postulate 2 states that the data are clustered together correctly according to the rules of the language chosen for communication. The third postulate requires that the clustering of symbols be meaningful (it must conform semantically to the chosen language), although meaning is not always conveyed by words alone.

According to Floridi, data that are well-formed and meaningful constitute semantic information. Semantic information can be of two types: instructional information, a piece of instruction conveying the need for a specific action, or factual information, which represents a fact (a proposition). An instance of factual information can put constraints on an information agent's behaviour; these are known as constraining affordances. One type of affordance is the alethic value, or truthfulness, of the data.

    The Strongly Semantic Theory of Information

The occurrence of the Bar-Hillel-Carnap paradox shows that classical semantic theory is a weak theory. The weakness is due to the fact that truth values and semantic information are taken to be independent of each other. The strongly semantic theory (SST) instead defines semantic-factual information in terms of well-formed, meaningful and truthful data. The data must first qualify as truthful; only then is the quantity of semantic information calculated. Let w be a situation (Devlin defines a situation as a topologically connected, structured region of space-time) and \sigma an information instance. Let W be the set of all situations and E the set of all information instances. An information instance \sigma_i logically conforms to its situation w_i (the maximal informativeness property); however, each \sigma_i can also correspond partially to other situations in W.

The informativeness of \sigma_i is a function of (i) the alethic value of \sigma_i (its polarity) and (ii) the degree of discrepancy between \sigma_i and the elements of the set W, measured in terms of the degree of semantic deviation \theta. A statement can be full of information but false, or it may carry no information but be true; the maximally informative pair (\sigma_i, w_i) provides a benchmark against which to calculate the boolean truth value and the degree of discrepancy. To express both positive and negative discrepancies (of true and false values respectively), let f be a function from E into the set of real numbers with range [-1, 1]. The function associates with each \sigma a real value of discrepancy, depending on its truth value and its deviation from w, denoted \theta = f(\sigma).

The resulting set contains a continuum of ordered pairs (\sigma, \theta). If we plot \theta on the X axis and a composite function \iota (the measure of informativeness) on the Y axis, then zero denotes complete conformity, and the left and right sides denote negative and positive discrepancies respectively. The value of \theta denotes the distance of an information instance \sigma from a situation w (it can be read as the degree of support of w by \sigma). The estimation of \theta is done through a metric function which should satisfy the following properties -

1. If \sigma is true and confirms w completely and precisely, then \theta = f(\sigma) = 0.

2. If \sigma is true and confirms the complete set W, then \theta = f(\sigma) = 1.

3. If \sigma is false and confirms no situation at all, then \theta = f(\sigma) = -1.

4. If \sigma is false in some cases, then 0 > \theta > -1.

5. If \sigma is true in some cases and does not confirm any w completely, then 0 < \theta < 1.

According to Floridi the degree of informativeness can be estimated by the parabolic function defined by \iota(\sigma) = 1 - \theta(\sigma)^2 (see Figure 3.2).

Figure 3.2: Estimation of Semantic Information

This measure satisfies all the properties stated above. If \sigma has a very high degree of informativeness \iota, then it has a very low \theta and we can deduce that it contains a very large amount of semantic information; equivalently, a low value of \iota corresponds to a high value of \theta and therefore very little semantic information. The amount of information \alpha is the area delimited by the estimate equation \iota(\sigma), given by

\alpha = \int_{a}^{b} \iota(\sigma) \, d\theta

where a and b are the bounds of the area.

Floridi also describes a possible solution of the Bar-Hillel-Carnap paradox. The content measure Cont(\sigma) of Bar-Hillel and Carnap measures only the quantity of data in \sigma, and thus deals only in completely uninterpreted data; without confirming the truthfulness of \sigma, the Cont measure falls into the trap of the BCP. In the strongly semantic theory, however, \iota(\sigma) is based on the alethic value (truthfulness) of \sigma and its value is proportional to the semantic content; thus it avoids the BCP.
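A numerical sketch of the two formulas above (my own illustration; the integration bounds below are arbitrary examples, not values given by Floridi):

```python
def informativeness(theta: float) -> float:
    """Floridi's degree of informativeness iota = 1 - theta**2 for theta in [-1, 1]."""
    return 1 - theta ** 2

def amount_of_information(a: float, b: float, steps: int = 100_000) -> float:
    """Numerically integrate iota between discrepancy bounds a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (informativeness(a) + informativeness(b))
    total += sum(informativeness(a + i * h) for i in range(1, steps))
    return total * h

print(informativeness(0.0))             # 1.0: sigma fits the situation exactly
print(informativeness(1.0))             # 0.0: maximal positive discrepancy, no information
print(amount_of_information(0.0, 0.5))  # ~0.458 = integral of (1 - t**2) from 0 to 0.5
```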

    Critique of Strongly Semantic Theory

Floridi's SST provides a sound basis for evaluating the semantic content of information; however, the theory has the following limitations -

1. SST applies to a restricted formal system: the set W of possible situations (world states) must be listed a priori.

2. The theory concentrates more on discrepancies than on semantic content itself.

3. The estimation of \iota(\sigma) is not possible for every situation instance; similarly, \theta denotes the distance of \sigma from w, but it can be negative as well as positive.

4. Fitting the estimation curve to real data is highly difficult, and approximating \theta may introduce errors.

5. To apply SST, \sigma needs to possess an alethic value, but the measurement of alethic values is not well defined.

3.2.3 Some Technical Aspects of Semantic Theory

In 2011 Jie Bao and a group of researchers published a technical report on semantic communication which improves on the classical semantic theory of Bar-Hillel and Carnap. Some important elements of this model are as follows -

A semantic information source is denoted by the tuple (W_s, K_s, I_s, M_s), and similarly the destination is denoted by (W_r, K_r, I_r, M_r), where -

W is the world model of the source or receiver (how they observe the world);
K is the background knowledge base of the source or receiver;
I is the inference procedure of the source or destination;
M is the message generator at the source side, or the message interpreter at the destination side.

A semantic source is an entity that can send messages using a syntax such that these messages are true, i.e. semantically valid messages; a semantic source sends the most accurate, minimum-cost messages. Similarly, a semantic destination is an entity which receives the messages and interprets them using K_r. A semantic error occurs when a message which is true with respect to the source's W_s, K_s, I_s is received as a message which is false with respect to W_r, K_r, I_r. This may be due to source coding, noise in the channel, losses in decoding, or a combination of these.

To define semantic entropy, let w \in W_s be an observed value of the world model for which the message generator M_s generates information x, in terms of propositions. If m(x) is the logical probability that x describes w well and truly (i.e. that the proposition x is true), then the semantic entropy is defined by H_s(x) = -\log_2 m(x). Similarly, the conditional entropy is defined using the conditional logical probability m(x|K), which gives the restricted probability of x being true given K; hence H_s(x|K) = -\log_2 m(x|K).

    Semantic Source Coding

Since the set of all syntactically valid messages can be infinite if the length of messages is not restricted, we assume that the interface language allows only a subset of all possible messages. Let X be the finite set of allowed messages. A semantic coding strategy is a conditional probability distribution P(X|W). If \mu(w) is the probability of w \in W, then the distribution of expressed messages P(X) is determined by

P(x) = \sum_{w} \mu(w) P(x|w)

The Shannon entropy of the messages X with distribution P(X) is given by

H(X) = -\sum_{x} P(x) \log P(x)

A relation between the semantic entropy and the syntactic entropy of the source is given by

H(X) = H(W) + H(X|W) - H(W|X)

where H(X|W) is a measure of the semantic redundancy of the coding and H(W|X) is a measure of the semantic ambiguity of the coding. The relation is a direct consequence of the definitions of the respective entropies; it states that the message entropy can be larger or smaller than the model entropy, depending on whether the redundancy or the ambiguity is higher.

The maximum entropy is reached when the messages are descriptions of the models themselves, i.e. the most informative coding is the full state description:

H_{max} = -\sum_{w} \mu(w) \log \mu(w) = H(W)
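The relation H(X) = H(W) + H(X|W) - H(W|X) can be checked on a toy example. In the sketch below the two-state world model and the coding strategy are hypothetical choices of mine, picked so that the coding is both slightly redundant and slightly ambiguous:

```python
import math
from collections import defaultdict

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical world model and coding strategy P(x | w): two messages for w1
# (redundancy) and message "x2" shared by both states (ambiguity).
mu = {"w1": 0.5, "w2": 0.5}
P_x_given_w = {"w1": {"x1": 0.5, "x2": 0.5}, "w2": {"x2": 1.0}}

joint = {(w, x): mu[w] * p for w, dist in P_x_given_w.items() for x, p in dist.items()}
Px = defaultdict(float)
for (w, x), p in joint.items():
    Px[x] += p

H_W, H_X, H_WX = H(mu.values()), H(Px.values()), H(joint.values())
H_X_given_W = H_WX - H_W        # semantic redundancy of the coding
H_W_given_X = H_WX - H_X        # semantic ambiguity of the coding

print(H_X, H_W + H_X_given_W - H_W_given_X)   # both sides agree (~0.811 bits)
```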

    A Note on Semantic Noise

Semantic noise arises when incoming content is evaluated wrongly through K_r. Semantic noise can affect the message both at the sender's and at the receiver's end, as opposed to technical noise, which affects the message only in the medium during transmission. Let X be the input to the channel and Y the output. The technical aspect deals with minimizing the difference between x and y, where x \in X and y \in Y; in semantic communication we try to keep intact the meaning of x when it is converted into y. Technical noise does not always create semantic noise, but reducing technical noise also helps to reduce semantic noise. In general semantic errors are of two types: unsoundness - the sent message is true but the received message is false; and incompleteness - the sent message is false but the received message is true. In lossless coding the sent message is always true (no compression), and the goal is to reduce unsoundness, i.e. to find a joint distribution of (w, x, y) such that

\sum_{w \models y} p(w, x, y)

(the total probability that the received message y is true in the world state w) is maximized.

Another way of reducing semantic noise is to introduce semantic redundancy into the message, or to reformulate it in an equivalent way, for example by replacing longer messages with shorter ones or by adding pictures or graphics.

    3.2.4 Other Attempts to Define Semantic Information

Semantics relates to the language of communication; therefore attempts to define natural languages mathematically invariably also define the semantics of messaging. The most important work here is due to L. A. Zadeh, the founding father of fuzzy theory. Day-to-day communication is mostly fuzzy in its meaning. To capture this fuzziness we can use possibilistic logic instead of probabilistic logic. This approach is formalized as quantitative fuzzy semantics, or PRUF (Possibilistic Relational Universal Fuzzy).

Possibilistic logic applies to propositions of the type "x is F", where x is an object and F is a fuzzy subset of a universe of discourse U. PRUF uses possibility functions to represent such statements, which provide meaning in terms of the fuzzy set F of objects x. The benefit of possibilistic logic is that it removes the need for truth values; by using fuzzy logic we can use linguistic constructs for relations and logic. These attempts are dealt with at greater length in (Ref.)

    3.3 End Note to the Chapter

The problem of semantic information theory is an interesting and inspiring challenge in various fields of study. Semantic theory applies to our daily life, in which we communicate meaning to similar entities. Diverse problems in information theory, economics, psychology, mathematics and computer science can be addressed with the help of semantic information theory.

    ***********


Chapter 4

The Future of Information Theory

The discussion presented in the previous chapters shows the importance and applicability of information theory in engineering and technology. Problems in diverse fields can be translated into, and solved by, the methods of information theory. The subject has given birth to a new discipline of study termed information science, which borrows techniques from mathematics, physics, computer science and the engineering disciplines and studies all problems related to the transmission, collection, analysis and storage of information.

In this chapter we look at the control problem and examples of it in other subjects. We also discuss endeavours to define information in general, and the use of information theory in dealing with system complexity and uncertainty. The chapter closes by listing some open problems in the field.

    4.1 The Control Problem

    The prime objective of communication is to influence the behaviour of re-ceiver. A logical extension of semantic problem, control problems concernswith the control of behaviour (or influence it) by information. The successof information exchange is measured by the fact that it generates the desiredbehaviour on behalf of the receiver. The application of this fact is in AI(Artificial Intelligence), Electronics and humanities (like Social Science andEconomics).

The control problem is hard to define, because it intersects the technical problem and the semantic problem in a vague way. To generate the behaviour from the receiver, the information must first reach it (the technical problem), and the receiver must understand the meaning intended by the sender (the semantic problem). Most theories of information communication deal either with the technical problem (Shannon's communication theory) or the semantic problem (Bar-Hillel & Carnap, Floridi), but there are few examples dealing with the control problem.

The controlling theory must address the following points of importance (a minimal sketch of the resulting feedback loop is given after the list) -

- The receiver must get the information complete and without distortion (noise reduction, coding and compression).

- The receiver must understand the information (semantics, language, theory of signs and symbols).

- Observation and measurement of the behaviour of the system (observation system).

- Reporting of the result of observation to the sender (feedback system).

- Correction of the communicated signal (correction system).
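The five points above can be read as the stages of a feedback loop. The following minimal Python sketch (all names and constants are hypothetical) wires them together: a noisy channel, a receiver whose behaviour is observed, and a sender that corrects its signal until the observed behaviour approaches the intended one.

import random

# Sketch of the control loop implied by the list above (hypothetical names):
# send -> (noisy channel) -> receiver acts -> observe -> feed back -> correct.

def channel(signal: float, noise_level: float = 0.5) -> float:
    """Technical problem: transmission adds noise."""
    return signal + random.uniform(-noise_level, noise_level)

def receiver_behaviour(received: float) -> float:
    """Semantic/control problem: the receiver acts on what it understood."""
    return 0.8 * received            # the receiver under-reacts slightly

def control_loop(target: float, rounds: int = 20) -> float:
    signal = target
    behaviour = 0.0
    for _ in range(rounds):
        behaviour = receiver_behaviour(channel(signal))   # observation
        error = target - behaviour                        # feedback
        signal += 0.5 * error                             # correction
    return behaviour

if __name__ == "__main__":
    random.seed(0)
    print("intended behaviour: 10, achieved:", round(control_loop(10.0), 2))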

4.1.1 Examples of the Control Problem

In this section we list some examples of the control problem -

    In Economics

Economics deals with the pricing of resources, with the goal of efficiency and effectiveness in resource allocation. The main information related to economics is the level of prices; it is an indicator of unused resources or of extra pressure on resources.

Previous work on information theory and economics deals with the measurement of information rather than its transmission. The first instance of an application of Shannon's information theory was in J. F. Muth's paper. Muth and Stigler propound that markets transmit information through prices. Information asymmetry also plays an important role in competitive equilibrium: if everyone has the same information about future prices, there is no incentive to buy or sell the product. However, if different individuals hold different information, there will be some economic activity which will shift the equilibrium and change it.

How information is transmitted and interpreted affects the behaviour of economic agents, which is the essence of the control problem. According to W. O'Neill, when information is free there can be no equilibrium in prices, because information on prices is freely exchanged between economic agents; however, when information is costly (available only to selected individuals), there will be a certain equilibrium.


In Ecology

In 1955 R. MacArthur proposed a measure of the stability of a diverse biome on the basis of ecological processes or flows. He cites the view of Eugene Odum: "The amount of choice which the energy has in following the paths up through the food web is a measure of the stability of the community." Even if a species is present in large numbers, if its energy is distributed in the food web among different predators, it will not affect the whole ecology of the system.

The measure of stability is calculated by the Shannon-Weaver index, given by

S = -\sum_i p_i \log(p_i)

where p_i = f_i / F is the contribution (fraction) of the energy flow f_i of species i in the sum F of all flows. The measure is very similar to entropy in information theory. The theory states that a higher diversity index indicates higher stability of the ecological system.
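A short Python sketch of the index as defined above; the energy flows are invented numbers used only to show that evenly spread flows give a higher index than concentrated ones.

import math

def shannon_weaver_index(flows):
    """S = -sum_i p_i log(p_i), where p_i = f_i / F is the share of flow i."""
    total = sum(flows)
    return -sum((f / total) * math.log(f / total) for f in flows if f > 0)

if __name__ == "__main__":
    concentrated = [90, 5, 5]        # most energy follows one path
    spread_out   = [34, 33, 33]      # energy spread evenly over three paths
    print("concentrated web:", round(shannon_weaver_index(concentrated), 3))
    print("even web:        ", round(shannon_weaver_index(spread_out), 3))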

In Operations Management

Operations management is concerned with the smooth running of the operations of any system which takes inputs, processes them and produces outputs. Information flow in this type of system is most important because it facilitates feedback and correction. Automation of production processes and the use of PLC (Programmable Logic Controller) machines to control the process is an example of information exchange. PLC machines gather information about the production line and evaluate the possibility of a bottleneck (a stoppage of production); this makes the production line smooth and less flawed.

    In Artificial Intelligence

Artificial Intelligence deals with intelligence in machines. Intelligence can be defined as the assimilation of information and decision making about a particular context. Officially AI was born in 1956, and it has since been applied to many areas. Recent literature on AI shows an increasing trend towards producing human-level intelligence in AI systems. According to Zadeh, the capability of reasoning and decision making on the basis of possibility and partial information is the most remarkable ability of humans; achieving this quality in a machine is still an unachieved goal of AI.

Zadeh has also proposed a solution to the problem in terms of Computing with Words. Computing with Words, or CWW, is a method of using words as computing tools (as humans do), as opposed to the bivalent logic which machines use. The fundamental idea of CWW is that words are converted into mathematical representations using fuzzy sets (Type-1 or Type-2), a CWW engine solves the problem, and the results are converted back into words. The whole of CWW depends on the logic of information theory, more specifically the semantic theory of information, because the CWW engine must understand a word and its meaning and relation to other words in order to calculate the solution.
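A toy Python sketch of the pipeline just described, with an invented vocabulary of temperature words: each word is mapped to a Type-1 fuzzy set, a small "engine" applies the hedge "very" by squaring memberships, and the result is translated back into the closest word. All names, membership functions and the hedge rule are illustrative assumptions, not Zadeh's own formulation.

# Toy sketch of Computing with Words: 1) words -> fuzzy sets over a universe,
# 2) a small engine operates on the fuzzy sets, 3) the result is translated
# back into the closest vocabulary word.

UNIVERSE = range(0, 41)  # temperatures in degrees Celsius

def tri(a, b, c):
    """Triangular membership function on the universe."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

VOCAB = {"cold": tri(-1, 5, 15), "warm": tri(10, 20, 30), "hot": tri(25, 35, 41)}

def very(mu):
    """Linguistic hedge 'very' as concentration (membership squared)."""
    return lambda x: mu(x) ** 2

def closest_word(mu):
    """Translate a fuzzy set back into the vocabulary word it most resembles."""
    def distance(word):
        return sum(abs(mu(x) - VOCAB[word](x)) for x in UNIVERSE)
    return min(VOCAB, key=distance)

if __name__ == "__main__":
    result = very(VOCAB["warm"])               # compute with the word "warm"
    print("'very warm' is closest to:", closest_word(result))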

    In Psychology

Psychology is a complex science with no single agreed definition; it concerns the mind, mental processes and the functions of the brain. Information theory can be applied to the cognitive processes of the mind, to methods of learning and to the training process. The learning process can be defined as the gathering of information and the making of patterns of the world, known as perception and intelligence. Information theory can help to improve learning and to devise training programmes which help individuals learn and understand better.

4.2 Other Related Concepts in Information Theory - Uncertainty, Complexity and Representation

The field of information theory has progressed by leaps and bounds from its conception in the 1940s. Not only has the communication part been in rapid development; researchers are also finding answers to other problems within information theory. Following are some examples -

    4.2.1 Algorithmic Information Theory

Algorithmic Information Theory (AIT) stems from the works of Gödel and Turing. Gödel's incompleteness theorem states that infinitely many statements in mathematics are true but cannot be proved. Turing, on the other hand, showed that no one can decide, in general, whether a computing machine halts on a given input or continues in an infinite loop.

The credit for formalizing AIT is given to three mathematicians - Kolmogorov, G. Chaitin and Solomonoff. Solomonoff proposed definitions which are used in AIT, G. Chaitin composed the framework of AIT, and Kolmogorov proved concepts related to algorithmic complexity and its measures. According to AIT, algorithms encapsulate information: if a programme (running an algorithm) generates a string, then it is producing information in uncompressed form. The information content of a string is equivalent to the length of the most compressed self-contained representation of that string; this self-contained representation is a programme which generates the string when run. An example of a compressed sentence can be given by removing the vowels from any sentence in common English - a sufficiently knowledgeable person can still complete the sentence. This is the same as the concept of redundancy in Shannon's information theory.
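Algorithmic complexity itself is uncomputable, but a general-purpose compressor gives a crude upper bound on it and illustrates the point made above: regular strings admit short descriptions while random strings do not. The following Python sketch uses zlib purely as such a proxy (an assumption of this illustration, not part of AIT).

import os
import zlib

def compressed_length(data: bytes) -> int:
    """Length of a self-contained compressed description (a crude upper
    bound on algorithmic information content; K(s) itself is uncomputable)."""
    return len(zlib.compress(data, 9))

if __name__ == "__main__":
    regular = b"ab" * 5000                 # highly regular: short description
    random_ = os.urandom(10000)            # incompressible with high probability
    print("regular string :", compressed_length(regular), "bytes")
    print("random string  :", compressed_length(random_), "bytes")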

Let U be a universal Turing machine, p a binary program and |p| its length. The halting probability of U is Chaitin's constant

Ω = \sum_{U(p) \text{ halts}} 2^{-|p|}

The algorithmic complexity of a finite string s, denoted by K(s), is defined by

K(s) = \min\{|p| : U(p) = s\}

A finite string is said to be random if it cannot be compressed to a shorter programme, i.e. its complexity is equal to its length. The famous result of Chaitin is given by the following theorem -

Theorem 1 (Chaitin) Any recursively axiomatizable formalized theory enables one to determine only finitely many digits of Ω.

    4.2.2 Representational Information Theory

Representational information is the information carried by a finite non-empty set of objects R about its origin, i.e. a superset S. Representational Information Theory (RIT) is a method of defining information in terms of subsets of a larger set. It uses elements of category theory and measure theory to define the information represented by subsets of the parent universal set.

A category is a set of objects which are related in some well-defined way (more specifically, by a Boolean algebraic rule). A categorical structure is a category together with a concept function defined on the category. Concept functions are useful for defining attributes of the elements in the set and describe the logical structure of the set adequately. For example, if x = Blue, x' = Red, y = Circle and y' = Square, then the concept function given by the Boolean expression xy + x'y denotes a category consisting of {blue circle, red circle}.
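A small Python sketch of this example (the representation of objects as attribute pairs is a hypothetical choice, not part of RIT itself): the concept function xy + x'y picks out exactly the blue circle and the red circle.

from itertools import product

# Objects are (colour, shape) pairs; x stands for "Blue" (x' for "Red"),
# y for "Circle" (y' for "Square").

OBJECTS = list(product(["Blue", "Red"], ["Circle", "Square"]))

def concept(colour: str, shape: str) -> bool:
    """Boolean concept function xy + x'y."""
    x = (colour == "Blue")
    y = (shape == "Circle")
    return (x and y) or ((not x) and y)

if __name__ == "__main__":
    category = [obj for obj in OBJECTS if concept(*obj)]
    print(category)   # [('Blue', 'Circle'), ('Red', 'Circle')]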


The use of category theory in the study of human cognition and learning has a rich history. RIT uses similar concepts to develop a theory of information which can be used in human learning, machine learning, psychology, the behavioural sciences and AI.

Representational Information Theory is based on the following principles -

1. Human communication is based on mental representations of categories (of objects).

2. The mental representation, or formally the concept, is a well-defined category which acts as a mediator of information.

3. A concept can be a qualitative definition of any environmental variable (moving, non-moving, a thing, or a person).

4. The extent of non-distinguishable objects is characterized (quantitatively) by the degree of categorical invariance and its cardinality.

In RIT the information carried by R is measured by the change in structural complexity of the superset S after removing R. If R carries a large amount of information, then the removal of R affects the structural complexity of S greatly. Thus, once again, information is equated with the element of surprise in an information instance. RIT differs from Shannon's theory in that it uses concept structures rather than symbolic sequences, and it is based on category theory rather than probability theory. In RIT an event is not a source of information; information is represented by a set of objects, and this structure is helpful in problems associated with cognition, learning applications and modelling.

    4.2.3 General Information Principle

In 2015, L. A. Zadeh published The Information Principle. It was a first attempt to define information in a generalised way - in terms of restrictions. A restriction is defined for a variable and delimits the values the variable can take; it is usually stated in natural language. There can be possibilistic restrictions, probabilistic restrictions or bimodal restrictions (a combination of possibilistic and probabilistic restrictions). If x is a variable, then the possibilistic restriction of x is defined by

    R(x) = x is A

where A is a fuzzy set in the universe of discourse U, with membership function μ_A.


Similarly, the probabilistic restriction of x is a delimitation of the values of x by a probability distribution, i.e.

    R(x) = x is p

where p is a probability distribution. A bimodal restriction of x is a combination of a possibilistic and a probabilistic restriction; it is defined by

    R(x) = Prob(x is A) is B

where A and B are fuzzy sets, usually defined in natural language. According to Zadeh, restrictions are the most general form of information and can be used to represent semantic information. Thus a possibilistic restriction on a variable x is possibilistic information about x; similarly, probabilistic information about x is a probabilistic restriction on x. The information is fuzzy or crisp depending on whether the restriction set is fuzzy or crisp, and the same holds for bimodal information.
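As a rough illustration only, the three kinds of restriction can be sketched as simple Python data structures; the membership functions and the probability table below are invented for the example and are not taken from Zadeh's paper.

from dataclasses import dataclass
from typing import Callable, Dict

# Schematic sketch of the three kinds of restriction (illustrative only).

Membership = Callable[[float], float]      # mu_A : U -> [0, 1]

@dataclass
class PossibilisticRestriction:            # R(x): x is A
    fuzzy_set: Membership

@dataclass
class ProbabilisticRestriction:            # R(x): x is p
    distribution: Dict[float, float]       # values of x -> probabilities

@dataclass
class BimodalRestriction:                  # R(x): Prob(x is A) is B
    event: Membership                      # fuzzy event A
    fuzzy_probability: Membership          # fuzzy set B over [0, 1]

if __name__ == "__main__":
    small = lambda v: max(0.0, 1.0 - v / 10.0)          # "x is small"
    likely = lambda p: max(0.0, (p - 0.5) / 0.5)        # "is likely"
    r1 = PossibilisticRestriction(small)
    r2 = ProbabilisticRestriction({1: 0.2, 2: 0.5, 3: 0.3})
    r3 = BimodalRestriction(small, likely)              # Prob(x is small) is likely
    print(r1.fuzzy_set(2.0), sum(r2.distribution.values()), r3.fuzzy_probability(0.9))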

    The general information principle consists of three important axioms -

1. Information is a restriction.

2. There are different types of information depending on the restriction we choose over the variable - possibilistic, probabilistic or bimodal.

3. Possibilistic and probabilistic information are orthogonal to each other, i.e. one is not derivable from the other.

Equating information with a restriction provides new insights into information theory, and it can be applied to human and machine learning problems and to AI.

4.3 Some Open Problems in Information Theory

In 1948, C. E. Shannon's article gave birth to a new area of study - information theory. It has since grown from an unknown subject into a scientific discipline, and its concepts are widely used in science, the humanities and engineering. Contrary to Shannon's view that the theory should remain strictly in the technical domain, it has been accepted and applied in non-technical subjects such as psychology, the life sciences, management and economics. The theory of information is applicable in every field where data are exchanged and information is communicated.

Some problems remain unanswered throughout this progress; some are very old and some are the result of new searching in the field. Following is a list which is by no means complete or exhaustive -

- Improvement of the data compression limit, extending it beyond the limit of entropy.

- Improvement of the speed of transmission of information.

- A theory of multi-channel and multi-network information flow.

- Breaking the limits of computation (calculation of Ω).

- The search for a complete theory of information, which includes the technical, semantic and influential aspects of information.

- The search for a semantic information theory which can be applied to natural language.

- A theory of human perception based on information, and its duplication in a machine.

- Information reduces uncertainty; thus, the exploration of a generalized uncertainty principle using information-theoretic concepts.

- The development of human capabilities of calculation and approximation on uncertain information in a machine.

    4.3.1 End Note to the Chapter

It is clear from the above discussion that information theory has a rich history, a solid ground of concepts and theories, and tools and techniques which can be applied to a plethora of both technical and non-technical problems. There are avenues of further research and application which are promising and fruitful.

Information lies at the heart of a basic human need, communicating with similar entities; the exchange of information makes humans the most prominent species on this earth. Thus, knowing more about information flow and its communication is both necessary and important for the scientific community.

    ***********
