Today:
Entropy
<break>
Information Theory
Claude Shannon, Ph.D. (1916-2001)
I(x;y) = H(x) - H(x|y)

H(x) = - Σ p(x_i) log2 p(x_i)
Entropy

H(x) = - Σ p(x_i) log2 p(x_i)
A measure of the disorder in a system
Entropy

The (average) number of yes/no questions needed to completely specify the state of a system.
What if there were two coins?
2 states. 1 question.
4 states. 2 questions.
8 states. 3 questions.
16 states. 4 questions.
number of states = 2^(number of yes-no questions)

log2(number of states) = number of yes-no questions
H = log2(n)

H is entropy, the number of yes/no questions required to specify the state of the system.
n is the number of states of the system, assumed (for now) to be equally likely.
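For equally likely states this is a one-liner; a quick Python sketch checking the coin examples above:

```python
import math

# Entropy (in bits) of a system with n equally likely states:
# H = log2(n), the number of yes/no questions needed.
def entropy_uniform(n):
    return math.log2(n)

print(entropy_uniform(2))   # 1 coin,  2 states -> 1.0 question
print(entropy_uniform(4))   # 2 coins, 4 states -> 2.0 questions
print(entropy_uniform(16))  # 4 coins, 16 states -> 4.0 questions
```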
Consider Dice
The Six Sided Die
H = log2(6) = 2.585 bits
The Four Sided Die
H = log2(4) = 2.000 bits
The Twenty Sided Die
H = log2(20) = 4.322 bits
What about all three dice?

H = log2(4 * 6 * 20) = log2(480)
H = log2(4) + log2(6) + log2(20)
H = 8.907 bits
Entropy from independent elements of a system adds.
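A quick numerical check of the additivity claim:

```python
import math

# Entropy from independent elements adds: the joint system of a
# 4-, 6-, and 20-sided die has 4 * 6 * 20 = 480 equally likely states.
H_joint = math.log2(4 * 6 * 20)
H_sum = math.log2(4) + math.log2(6) + math.log2(20)

print(round(H_joint, 3))  # 8.907 bits
print(round(H_sum, 3))    # 8.907 bits
```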
H = log2(n)

Let's rewrite this a bit...

Trivial Fact 1: log2(x) = - log2(1/x)

H = - log2(1/n)
Trivial Fact 2: if there are n equally likely possibilities, p = 1/n

H = - log2(p)
What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability:

H = - Σ_{i=1}^{n} p_i log2 p_i
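A minimal Python sketch of this weighted-average entropy, with the usual convention that 0 * log2(0) = 0:

```python
import math

# Entropy of an arbitrary discrete distribution, in bits:
# H = -sum_i p_i * log2(p_i), with the convention 0 * log2(0) = 0.
def entropy(p):
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([1/6] * 6))    # fair six-sided die: ~2.585 bits
print(entropy([0.9, 0.1]))   # biased coin: less than 1 bit
```

For equally likely states this reduces to log2(n), matching the earlier formula.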
Let's do a simple example: n = 2. How does H change as we vary p1 and p2?

H = - Σ_{i=1}^{n} p_i log2 p_i ,  with n = 2 and p1 + p2 = 1
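Sweeping p1 numerically shows the shape of the binary entropy curve: H peaks at 1 bit when p1 = p2 = 1/2 and falls to zero at the certain extremes.

```python
import math

# Binary entropy: H(p1) = -p1*log2(p1) - p2*log2(p2), with p2 = 1 - p1.
def H2(p):
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p1 = {p1:.2f}  H = {H2(p1):.3f} bits")
# H is largest (1 bit) at p1 = p2 = 0.5 and zero at the certain ends.
```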
How about n = 3?

H = - Σ_{i=1}^{n} p_i log2 p_i ,  with n = 3 and p1 + p2 + p3 = 1
The bottom line intuitions for Entropy:
• Entropy is a statistic for describing a probability distribution.
• Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
• Probability distributions which are peaked, sharp, narrow, compact etc. have LOW entropy.
• Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
• Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
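These intuitions are easy to check numerically; the distributions below are made up for illustration:

```python
import math

def entropy(p):
    # H = -sum p_i log2 p_i, with 0 * log2(0) = 0
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

flat     = [0.25, 0.25, 0.25, 0.25]  # broad/flat -> HIGH entropy
peaked   = [0.85, 0.05, 0.05, 0.05]  # sharp/peaked -> LOW entropy
definite = [1.0, 0.0, 0.0, 0.0]      # definite state -> zero entropy

print(entropy(flat))      # 2.0 bits
print(entropy(peaked))    # well under 2 bits
print(entropy(definite))  # 0.0 bits
```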
Pop Quiz:

[figure: four probability distributions, labeled 1-4]
Entropy

The (average) number of yes/no questions needed to completely specify the state of a system.
At 11:16 am (Pacific) on June 29th, 2001, there were approximately 816,119 words in the English language.

H(english) = log2(816,119) ≈ 19.6 bits
Twenty Questions: 2^20 = 1,048,576
What’s a winning 20 Questions Strategy?
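One winning strategy (the usual answer): ask questions that halve the remaining candidates each time, so every answer removes one full bit of entropy. A quick sanity check of the numbers above:

```python
import math

# Each yes/no question that halves the candidate set removes one bit
# of entropy, so 20 questions can distinguish 2**20 possibilities.
words = 816_119                      # word count quoted above
print(math.log2(words))             # ~19.6 bits = H(english)
print(2 ** 20)                      # 1048576 > 816119
print(math.ceil(math.log2(words)))  # 20 questions suffice
```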
<break>
I(x;y) = H(x) - H(x|y)
So, what is information?
It’s a change in what you don’t know.
It’s a change in the entropy.
Information as a measure of correlation
[figure: P(Y) and P(Y | x = heads), both uniform over {heads, tails}]

H(Y) = 1    H(Y | x = heads) = 1
I(X;Y) = H(Y) - H(Y|X) = 0 bits
Information as a measure of correlation
[figure: P(Y) uniform over {heads, tails}; P(Y | x = heads) concentrated on heads]

H(Y) = 1    H(Y | x = heads) ≈ 0
I(X;Y) = H(Y) - H(Y|X) ≈ 1 bit
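The two coin examples can be reproduced numerically; the 0.99/0.01 bias below is an invented stand-in for "almost perfectly correlated":

```python
import math

def entropy(p):
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# Independent coins: knowing x = heads leaves P(Y) unchanged.
H_Y = entropy([0.5, 0.5])            # 1.0 bit
H_Y_given_X = entropy([0.5, 0.5])    # P(Y | x) is still uniform
print(H_Y - H_Y_given_X)             # I(X;Y) = 0.0 bits

# Correlated coins: y almost always copies x (0.99 is an invented bias).
# By symmetry H(Y | x = tails) matches H(Y | x = heads), so this
# conditional entropy is also the average H(Y|X).
H_Y_given_X = entropy([0.99, 0.01])
print(H_Y - H_Y_given_X)             # I(X;Y) ~ 0.92 bits, close to 1
```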
Information Theory in Neuroscience
The Critical Observation:
Information is Mutual
I(X;Y) = I(Y;X)
H(Y)-H(Y|X) = H(X)-H(X|Y)
The Critical Observation:
What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike.
I(Stimulus;Spike) = I(Spike;Stimulus)
The Critical Observation:
What our stimulus choice tells us about the likelihood of a spike.
stimulus response
This, we can measure....
How to use Information Theory:

Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
Estimate: P( neural response )
From that, compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)

How to screw it up:

Choose stimuli which are not representative. Measure the "wrong" aspect of the response. Don't take enough data to estimate P( ) well. Use a crappy method of computing H( ). Calculate I( ) and report it without comparing it to anything...
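A minimal plug-in sketch of the recipe above, with invented stimulus/response trials. Note that with this little data the estimate is badly biased, which is exactly the "don't take enough data" failure mode:

```python
import math
from collections import Counter

def entropy(p):
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# Invented toy experiment: (stimulus presented, neural response) per trial.
trials = [("A", "spike"), ("A", "spike"), ("A", "no_spike"), ("A", "spike"),
          ("B", "no_spike"), ("B", "no_spike"), ("B", "spike"), ("B", "no_spike")]
n = len(trials)

# Plug-in estimate of P(response) and H(response)
p_resp = [c / n for c in Counter(r for _, r in trials).values()]
H_resp = entropy(p_resp)

# Plug-in estimate of H(response | stimulus) = sum_s P(s) * H(response | s)
H_resp_given_stim = 0.0
for s in ("A", "B"):
    resp = [r for si, r in trials if si == s]
    p_resp_s = [c / len(resp) for c in Counter(resp).values()]
    H_resp_given_stim += (len(resp) / n) * entropy(p_resp_s)

I = H_resp - H_resp_given_stim
print(f"I(response; stimulus) = {I:.3f} bits")
```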
Here’s an example of Information Theory applied appropriately
Temporal Coding of Visual Information in the Thalamus Pamela Reinagel and R. Clay Reid
J. Neurosci. 20(14):5392-5400. (2000)
LGN responses are very reliable.
Is there information in the temporal pattern of spikes?
Patterns of Spikes in the LGN

Binarize the spike train and read off words of increasing length:

…0…  …1…
…00…  …10…  …01…  …11…
…000…  …101…  …011…  …100…
…000100…  …101101…  …011110…  …010001…
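A sketch of the word-counting idea, using a made-up binary spike train; real analyses must also correct for limited-sampling bias, which this toy code ignores:

```python
import math
from collections import Counter

def entropy(p):
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# Made-up binarized spike train (1 = spike in a time bin, 0 = no spike).
spikes = "0011010001101101001110100010110100011010"

# Slide a window along the train, count each binary word, and compute
# the entropy of the word distribution.
def word_entropy(train, word_len):
    words = [train[i:i + word_len] for i in range(len(train) - word_len + 1)]
    counts = Counter(words)
    total = len(words)
    return entropy([c / total for c in counts.values()])

for L in (1, 2, 3, 6):
    print(f"{L}-bit words: H = {word_entropy(spikes, L):.3f} bits")
```

Comparing the word distributions with and without the stimulus (as in the two P( ) estimates above) then gives the information carried by temporal patterns.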
P( spike pattern)
P( spike pattern | stimulus )
There is some extra information in temporal patterns of spikes.
Claude Shannon, Ph.D. (1916-2001)
Prof. Tom Cover, EE376A & B