2_BasicInformationTheory


Transcript
  • Slide 1/31

    Lecture 2:

    Basic Information Theory

    TSBK01 Image Coding and Data Compression

    Jörgen Ahlberg

    Div. of Sensor Technology

    Swedish Defence Research Agency (FOI)

  • Slide 2/31

    Today

    1. What is information theory about?

    2. Stochastic (information) sources.

    3. Information and entropy.

    4. Entropy for stochastic sources.

    5. The source coding theorem.

  • Slide 3/31

    Part 1: Information Theory

    Claude Shannon: A Mathematical Theory of Communication

    The Bell System Technical Journal, 1948.

    Sometimes referred to as Shannon-Weaver, since the
    standalone publication has a foreword by Weaver.
    Be careful!

  • Slide 4/31

    Quotes about Shannon

    What is information? Sidestepping questions about meaning,
    Shannon showed that it is a measurable commodity.

    Today, Shannon's insight helps shape virtually all systems
    that store, process, or transmit information in digital form,
    from compact discs to computers, from facsimile machines
    to deep space probes.

    Information theory has also infiltrated fields outside
    communications, including linguistics, psychology,
    economics, biology, even the arts.

  • Slide 5/31

  • Slide 6/31

  • Slide 7/31

  • Slide 8/31

    Part 2: Stochastic sources

    A source outputs symbols X1, X2, ...

    Each symbol takes its value from an alphabet
    A = (a1, a2, ...). Model: P(X1, ..., XN) is assumed to be
    known for all combinations.

    [Figure: a box labelled "Source" emitting X1, X2, ...]

    Example 1: A text is a sequence of symbols,
    each taking its value from the alphabet
    A = (a, ..., z, A, ..., Z, 1, 2, ..., 9, !, ?, ...).

    Example 2: A (digitized) grayscale image is a
    sequence of symbols, each taking its value
    from the alphabet A = (0, 1) or A = (0, ..., 255).

  • Slide 9/31

    Two Special Cases

    1. The Memoryless Source

    Each symbol is independent of the previous ones.

    P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)

    2. The Markov Source

    Each symbol depends on the previous one.

    P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X2) ... P(Xn|Xn-1)

  • Slide 10/31

    The Markov Source

    A symbol depends only on the previous symbol, so the
    source can be modelled by a state diagram.

    [State diagram: three states a, b, c with transition
    probabilities 1.0, 0.5, 0.7, 0.3, 0.3, 0.2 on the edges.]

    A ternary source with alphabet A = (a, b, c).

  • Slide 11/31

    The Markov Source

    Assume we are in state a, i.e., Xk = a.

    The probabilities for the next symbol are:

    [Same state diagram as on the previous slide.]

    P(Xk+1 = a | Xk = a) = 0.3

    P(Xk+1 = b | Xk = a) = 0.7

    P(Xk+1 = c | Xk = a) = 0

  • Slide 12/31

    The Markov Source

    So, if Xk+1 = b, we know that Xk+2 will equal c.

    [Same state diagram as on slide 10.]

    P(Xk+2 = a | Xk+1 = b) = 0

    P(Xk+2 = b | Xk+1 = b) = 0

    P(Xk+2 = c | Xk+1 = b) = 1

  • Slide 13/31

  • Slide 14/31

    Analysis and Synthesis

    Stochastic models can be used for analysing a source.

    Find a model that represents the real-world source well,
    and then analyse the model instead of the real world.

    Stochastic models can be used for synthesizing a source.

    Use a random number generator in each step of a Markov
    model to generate a sequence simulating the source, as in
    the sketch below.
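    A minimal Python sketch of this synthesis step (my own illustration, not
    from the slides), using the ternary Markov source of the previous slides.
    The transition probabilities out of states a and b are the ones stated on
    slides 11-12; the row for state c is an assumption read off the remaining
    edge labels in the diagram.

      import random

      # Transition probabilities P(next | current) for the ternary Markov source.
      # Rows for 'a' and 'b' follow slides 11-12; the row for 'c' is an assumed
      # reading of the remaining edge labels (0.5, 0.3, 0.2) in the state diagram.
      TRANSITIONS = {
          "a": {"a": 0.3, "b": 0.7, "c": 0.0},
          "b": {"a": 0.0, "b": 0.0, "c": 1.0},
          "c": {"a": 0.5, "b": 0.3, "c": 0.2},  # assumption
      }

      def synthesize(n, state="a", seed=None):
          # Generate n symbols by walking the state diagram with a RNG.
          rng = random.Random(seed)
          out = []
          for _ in range(n):
              symbols, probs = zip(*TRANSITIONS[state].items())
              state = rng.choices(symbols, weights=probs)[0]
              out.append(state)
          return "".join(out)

      print(synthesize(40, seed=1))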

  • Slide 15/31

    Show plastic slides!

  • Slide 16/31

    Part 3: Information and Entropy

    Assume a binary memoryless source, e.g., a flip of a coin.
    How much information do we receive when we are told that
    the outcome is heads?

    If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say
    that the amount of information is 1 bit.

    If we already know that it will be (or was) heads, i.e.,
    P(heads) = 1, the amount of information is zero!

    If the coin is not fair, e.g., P(heads) = 0.9, the amount of
    information is more than zero but less than one bit!

    Intuitively, the amount of information received is the same
    if P(heads) = 0.9 or P(heads) = 0.1.

  • Slide 17/31

    Self Information

    So, let's look at it the way Shannon did.

    Assume a memoryless source with

      alphabet A = (a1, ..., an)
      symbol probabilities (p1, ..., pn).

    How much information do we get when finding out that the
    next symbol is ai?

    According to Shannon, the self-information of ai is -log pi,
    i.e., the logarithm of 1/pi.

  • Slide 18/31

    Why?

    Assume two independent events A and B, with
    probabilities P(A) = pA and P(B) = pB.

    For both events to happen, the probability is pA pB.
    However, the amounts of information should be added,
    not multiplied.

    Logarithms satisfy this!

    Now, we want the information to increase with decreasing
    probabilities, so let's use the negative logarithm.
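    Spelling out why the negative logarithm does the job (a standard identity,
    added here for completeness rather than taken from the slide): for two
    independent events,

      -log(pA pB) = -log pA - log pB,

    so the self-informations add; and since 0 < p <= 1, the quantity -log p is
    non-negative and grows as p decreases.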

  • Slide 19/31

    Self Information

    Example 1:

    Example 2:

    Which logarithm? Pick the one you like! If you pick the
    natural log, you'll measure in nats; if you pick the base-10
    log, you'll get hartleys; if you pick the base-2 log (like
    everyone else), you'll get bits.
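    As a quick check of the units, here is a small Python sketch (my own, not
    from the slides) that evaluates the self-information -log p in bits, nats,
    and hartleys:

      import math

      def self_information(p, base=2):
          # Self-information -log_base(p) of an outcome with probability p.
          return -math.log(p, base)

      for p in (0.5, 0.9, 0.1):
          bits = self_information(p, 2)
          nats = self_information(p, math.e)
          hartleys = self_information(p, 10)
          print(f"p={p}: {bits:.3f} bits, {nats:.3f} nats, {hartleys:.3f} hartleys")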

  • Slide 20/31

  • Slide 21/31

    Entropy

    Example: Binary Memoryless Source

    BMS -> 0 1 1 0 1 0 0 0 ...

    Let P(1) = p and P(0) = 1 - p.

    Then the entropy is

      H(X) = -p log p - (1 - p) log(1 - p)   (the binary entropy function).

    [Plot of H(X) versus p: zero at p = 0 and p = 1, maximum 1 bit at p = 0.5.]

    The uncertainty (information) is greatest when p = 0.5.
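    A minimal numerical check of the binary entropy function (my own sketch,
    not part of the slides): it evaluates H on a grid of p values and confirms
    that the maximum of 1 bit is reached at p = 0.5.

      import math

      def binary_entropy(p):
          # H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0.
          if p in (0.0, 1.0):
              return 0.0
          return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

      grid = [i / 100 for i in range(101)]
      best = max(grid, key=binary_entropy)
      print(f"maximum H = {binary_entropy(best):.3f} bits at p = {best}")
      print(f"H(0.9) = {binary_entropy(0.9):.3f} bits")   # the unfair coin from slide 16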

  • Slide 22/31

    Entropy: Three properties

    1. It can be shown that 0 ≤ H ≤ log N.

    2. Maximum entropy (H = log N) is reached when all
    symbols are equiprobable, i.e., pi = 1/N.

    3. The difference log N - H is called the redundancy
    of the source (see the sketch below).
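    To make the three properties concrete, here is a short Python sketch (my
    own illustration; the 4-symbol distribution is made up) computing H, log N,
    and the redundancy:

      import math

      def entropy(probs):
          # H = -sum p*log2(p), over symbols with p > 0.
          return -sum(p * math.log2(p) for p in probs if p > 0)

      probs = [0.5, 0.25, 0.125, 0.125]        # hypothetical 4-symbol source
      N = len(probs)
      H = entropy(probs)
      print(f"H          = {H:.3f} bits/symbol")
      print(f"log N      = {math.log2(N):.3f} bits/symbol")
      print(f"redundancy = {math.log2(N) - H:.3f} bits/symbol")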

  • Slide 23/31

    Part 4: Entropy for Memory Sources

    Assume a block of source symbols (X1, ..., Xn) and define
    the block entropy:

      H(X1, ..., Xn) = - Σ P(x1, ..., xn) log P(x1, ..., xn)

    That is, the summation is done over all possible
    combinations of n symbols.

    The entropy for a memory source is then defined as:

      H = lim (1/n) H(X1, ..., Xn)   as n goes to infinity.

    That is, let the block length go towards infinity, and
    divide by n to get the number of bits/symbol.

  • Slide 24/31

    Entropy for a Markov Source

    The entropy for a state Sk can be expressed as

      H(Sk) = - Σ Pkl log Pkl   (sum over l)

    Averaging over all states (weighted by their stationary
    probabilities), we get the entropy for the Markov source as

      HM = Σ P(Sk) H(Sk)   (sum over k)

    Pkl is the transition probability from state k to state l.
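    As an illustration (my own sketch, not from the slides), the following
    Python code computes the per-state entropies and HM for the ternary example,
    weighting the states by the stationary distribution. The transition row for
    state c is again an assumed reading of the diagram's remaining edge labels.

      import numpy as np

      # P[k][l] = P(next state l | current state k), states ordered a, b, c.
      # Rows for a and b follow slides 11-12; the row for c is an assumption.
      P = np.array([
          [0.3, 0.7, 0.0],   # from a
          [0.0, 0.0, 1.0],   # from b
          [0.5, 0.3, 0.2],   # from c (assumed)
      ])

      def state_entropy(row):
          # H(Sk) = -sum over l of Pkl * log2(Pkl), skipping zero entries.
          nz = row[row > 0]
          return float(-(nz * np.log2(nz)).sum())

      # Stationary distribution w solves w P = w with sum(w) = 1.
      eigvals, eigvecs = np.linalg.eig(P.T)
      w = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
      w = w / w.sum()

      H_states = np.array([state_entropy(row) for row in P])
      print("stationary distribution:", np.round(w, 3))
      print("per-state entropies:   ", np.round(H_states, 3))
      print(f"Markov source entropy HM = {float(w @ H_states):.3f} bits/symbol")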

  • Slide 25/31

    The Run-length Source

    Certain sources generate long runs or bursts of equal symbols.

    Example:

    Probability for a burst of length r: P(r) = β(1 - β)^(r-1)

    Entropy: HR = - Σ P(r) log P(r)   (sum over r = 1, 2, ...)

    If the average run length is µ, then HR/µ = HM.

    [Two-state diagram: states A and B, each staying in the same
    state with probability 1 - β and switching with probability β.]
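    A quick numerical sanity check of HR/µ = HM (my own sketch; the switching
    probability β = 0.2 is an arbitrary choice, and the geometric run-length
    law P(r) = β(1-β)^(r-1) is the one reconstructed above):

      import math

      beta = 0.2                     # assumed switching probability
      mu = 1 / beta                  # average run length of the geometric law

      # Entropy per run, HR = -sum_r P(r) log2 P(r), truncated at a large r.
      HR = 0.0
      for r in range(1, 2000):
          p = beta * (1 - beta) ** (r - 1)
          HR -= p * math.log2(p)

      # Entropy per symbol of the two-state source: each state switches with
      # probability beta, so HM equals the binary entropy of beta.
      HM = -beta * math.log2(beta) - (1 - beta) * math.log2(1 - beta)

      print(f"HR / mu = {HR / mu:.6f} bits/symbol")
      print(f"HM      = {HM:.6f} bits/symbol")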

  • Slide 26/31

    Part 5: The Source Coding Theorem

    The entropy is the smallest number of bits per symbol
    allowing error-free representation of the source.

    Why is this? Let's take a look at typical sequences!

  • Slide 27/31

    Typical Sequences

    Assume a long sequence from a binary memoryless source
    with P(1) = p.

    Among n bits, there will be approximately w = n*p ones.
    Thus, there are M = (n over w) (a binomial coefficient)
    such typical sequences!

    Only these sequences are interesting. All other sequences
    will appear with smaller probability the larger n is.

  • Slide 28/31

    How many are the typical sequences?

      (1/n) log M ≈ -p log p - (1 - p) log(1 - p) = H(X)   bits/symbol

    Enumeration needs log M bits, i.e., approximately H(X)
    bits per symbol!
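    Where does the approximation come from? By Stirling's formula,
    log n! ≈ n log n - n, one gets (1/n) log M → -p log p - (1-p) log(1-p)
    as n grows. A small Python check (my own, not from the slides):

      import math

      def h(p):
          # Binary entropy function in bits.
          return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

      p = 0.1
      for n in (100, 1000, 10000):
          w = round(n * p)                # typical number of ones
          M = math.comb(n, w)             # number of typical sequences
          print(f"n={n}: (1/n) log2 M = {math.log2(M) / n:.4f},  H(X) = {h(p):.4f}")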

  • Slide 29/31

    How many bits do we need?

    Thus, we need H(X) bits per symbol to code any typical sequence!

  • Slide 30/31

    The Source Coding Theorem

    Does tell us

    that we can represent the output from a source X
    using H(X) bits/symbol.

    that we cannot do better.

    Does not tell us

    how to do it.

  • Slide 31/31

    Summary

    The mathematical model of communication.

      Source, source coder, channel coder, channel, ...

      Rate, entropy, channel capacity.

    Information-theoretical entities

      Information, self-information, uncertainty, entropy.

    Sources

      BMS, Markov, RL.

    The Source Coding Theorem