Page 1: Molecular Information Theory

Molecular Information Theory

Niru Chennagiri

Probability and Statistics

Fall 2004

Dr. Michael Partensky

Page 2: Molecular Information Theory

Overview

Why do we study Molecular Information Theory?
What are molecular machines?
The Power of the Logarithm
Components of a Communication System
The Discrete Noiseless System
Channel Capacity
Molecular Machine Capacity

Page 3: Molecular Information Theory

Motivation

A needle-in-a-haystack situation.
How will you go about looking for the needle?
How much energy do you need to spend?
How fast can you find the needle?
Haystack = DNA, Needle = binding site, You = ribosome.

Page 4: Molecular Information Theory

What is a Molecular Machine?

One or more molecules or a molecular complex: not a macroscopic reaction.
Performs a specific function.
Is energized before the reaction.
Dissipates energy during the reaction.
Gains information.
Is an isothermal engine.

Page 5: Molecular Information Theory

Where is the candy?

Is it in the left four boxes? Is it in the bottom four boxes? Is it in the front four boxes?

You need the answers to three questions to find the candy.

Box labels: 000, 001, 010, 011, 100, 101, 110, 111

Need $\log_2 8 = 3$ bits of information.

Page 6: Molecular Information Theory

More candies…

Box labels: 00, 01, 10, 11, 00, 01, 10, 11. The candy is in both boxes labeled 01.
Need only $\log_2 8 - \log_2 2 = 2$ bits of information.

In general, m boxes with n candies need $\log_2 m - \log_2 n$ bits of information.

Page 7: Molecular Information Theory

Ribosomes

2600 binding sites from 4.7 million base pairs.

Need $\log_2(4.7 \times 10^6) - \log_2 2600 \approx 10.8$ bits of information.
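As a sanity check, a minimal Python sketch of this box-counting arithmetic (the function name `bits_needed` is illustrative, not from the slides):

```python
from math import log2

def bits_needed(m, n=1):
    """Bits needed to locate n equivalent targets among m equally likely choices."""
    return log2(m) - log2(n)

print(bits_needed(8))            # one candy in 8 boxes: 3.0 bits
print(bits_needed(8, 2))         # candy in 2 of 8 boxes: 2.0 bits
print(bits_needed(4.7e6, 2600))  # ribosome binding sites: ~10.8 bits
```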

Page 8: Molecular Information Theory

Communication System

Page 9: Molecular Information Theory

Information Source

Represented by a stochastic process.
Mathematically, a Markov chain.
We are interested in ergodic sources: every sequence is statistically the same as every other sequence.

Page 10: Molecular Information Theory

How much information is produced?

A measure of uncertainty H should be:
Continuous in the probabilities.
A monotonically increasing function of the number of events.
Such that, when a choice is broken down into two successive choices, the total H is the weighted sum of the individual H values.

Page 11: Molecular Information Theory

Enter Entropy

$$H = -K \sum_{i=1}^{n} p_i \log p_i$$

[Plot: H as a function of probability, both axes running from 0 to 1]
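A minimal Python sketch of the formula above, taking K = 1 and logs base 2 so that H comes out in bits:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits (K = 1)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: two equally likely events
print(entropy([0.99, 0.01]))  # ~0.081 bits: one nearly certain event
print(entropy([1.0]))         # 0.0 bits: no uncertainty
```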

Page 12: Molecular Information Theory

Properties of Entropy

H is zero iff all but one of the p_i are zero.
H is never negative.
H is maximum when all the events are equally probable.
If x and y are two events,

$$H(x, y) \le H(x) + H(y)$$

Conditional entropy:

$$H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j)$$

$$H_x(y) \le H(y)$$
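These properties are easy to check numerically. A sketch with a small, made-up joint distribution p(i, j):

```python
from math import log2

def H(ps):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Hypothetical joint distribution p(i, j): rows are x, columns are y.
p_xy = [[0.3, 0.2],
        [0.1, 0.4]]
p_x = [sum(row) for row in p_xy]        # marginal distribution of x
p_y = [sum(col) for col in zip(*p_xy)]  # marginal distribution of y

H_joint = H([p for row in p_xy for p in row])
# H_x(y) = -sum over i,j of p(i,j) * log2(p_i(j)), with p_i(j) = p(i,j)/p(i)
H_cond = -sum(p_xy[i][j] * log2(p_xy[i][j] / p_x[i])
              for i in range(2) for j in range(2))

print(H_joint <= H(p_x) + H(p_y))  # True: H(x,y) <= H(x) + H(y)
print(H_cond <= H(p_y))            # True: H_x(y) <= H(y)
```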

Page 13: Molecular Information Theory

Why is entropy important?

Entropy is a measure of uncertainty.

$$\Delta H = H_{\text{after}} - H_{\text{before}}$$

Entropy relation from thermodynamics:

$$\Delta S = (k_B \ln 2)\,\Delta H = -(k_B \ln 2)\,R$$

where R is the information gained in bits. Also from thermodynamics:

$$\Delta S = \frac{q}{T}$$

For every bit of information gained, the machine dissipates $k_B T \ln 2$ joules.
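A quick check of the dissipation figure, assuming room temperature (T = 300 K is an assumption, not from the slides):

```python
from math import log

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assumed room temperature, K

# Minimum dissipation per bit of information gained: k_B * T * ln 2
print(k_B * T * log(2))  # ~2.87e-21 joules
```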

Page 14: Molecular Information Theory

Ribosome binding sites

Page 15: Molecular Information Theory

Information in sequence

Position   p                 H before (bits)   H after (bits)   Change in H
1          A: 1/2, G: 1/2    2                 1                1
2          U: 1              2                 0                2
3          G: 1              2                 0                2

Page 16: Molecular Information Theory

Information curve

Information gain for site $l$ is

$$R_{\text{sequence}}(l) = 2 - H(l), \qquad H(l) = -\sum_{b \in \{A, C, G, T\}} f(b, l) \log_2 f(b, l)$$

A plot of this across the sites gives the information curve.
For E. coli, the total information is about 11 bits... the same as what the ribosome needs.
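A minimal sketch of the per-position calculation, using the frequencies from the table two slides back (RNA bases, hence the U; the dictionary-based function is illustrative):

```python
from math import log2

def r_sequence(freqs):
    """R_sequence(l) = 2 - H(l), given the base frequencies f(b, l) at position l."""
    h = -sum(f * log2(f) for f in freqs.values() if f > 0)
    return 2 - h

print(r_sequence({'A': 0.5, 'G': 0.5}))  # position 1: 1.0 bit
print(r_sequence({'U': 1.0}))            # position 2: 2.0 bits
print(r_sequence({'G': 1.0}))            # position 3: 2.0 bits
```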

Page 17: Molecular Information Theory

Sequence Logo

Page 18: Molecular Information Theory

Channel capacity

A source transmits 0s and 1s at 1000 symbols/sec.
1 in 100 symbols has an error.
What is the rate of transmission?
We need to apply a correction: the correction is the uncertainty in x for a given value of y, which is the conditional entropy

$$H_y(x) = -(0.99 \log_2 0.99 + 0.01 \log_2 0.01) \approx 0.081 \text{ bits/symbol}$$

This is a correction of 81 bits/sec, so the rate of transmission is 1000 - 81 = 919 bits/sec.
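The same correction, worked out in Python:

```python
from math import log2

rate = 1000  # symbols per second
# Conditional entropy H_y(x) per received symbol for the 1%-error channel.
H_y_x = -(0.99 * log2(0.99) + 0.01 * log2(0.01))

print(H_y_x)               # ~0.081 bits/symbol
print(rate * H_y_x)        # ~81 bits/sec correction
print(rate * (1 - H_y_x))  # ~919 bits/sec effective transmission rate
```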

Page 19: Molecular Information Theory

Channel capacity contd.

$$C = \max\{H(x) - H_y(x)\}$$

Shannon's theorem: as long as the rate of transmission is below C, the number of errors can be made as small as needed.

For a continuous channel with white noise,

$$C = W \log_2\left(1 + \frac{P}{N}\right)$$

where W is the bandwidth and P/N is the signal-to-noise ratio.
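A quick numeric example of this formula, with made-up values for the bandwidth and signal-to-noise ratio:

```python
from math import log2

W = 3000.0    # hypothetical bandwidth, Hz
snr = 1000.0  # hypothetical signal-to-noise ratio P/N (30 dB)

print(W * log2(1 + snr))  # ~29,900 bits/sec capacity
```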

Page 20: Molecular Information Theory

Molecular Machine Capacity

A lock-and-key mechanism.
Each pin on the ribosome is a simple harmonic oscillator in a thermal bath.
The velocity of each pin is represented by a point in a 2-D velocity space.
More pins -> more dimensions.
The distribution of points is spherical.

Page 21: Molecular Information Theory

Machine capacity

For large dimensions:
All points lie in a thin spherical shell.
The radius of the shell is the velocity, and hence the square root of the energy.

Before binding:

$$r_{\text{before}} = \sqrt{P_y + N_y}$$

After binding:

$$r_{\text{after}} = \sqrt{N_y}$$

where $P_y$ is the signal (binding) energy and $N_y$ is the thermal noise energy.

Page 22: Molecular Information Theory

Number of choices = number of 'after' spheres that can fit in the 'before' sphere = (volume of the before sphere) / (volume of the after sphere).

Machine capacity = logarithm of the number of choices:

$$C = d \log_2\left(1 + \frac{P}{N}\right)$$

where d is the number of pins (dimensions).
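A sketch of the machine-capacity formula with hypothetical numbers (the function name and values are illustrative):

```python
from math import log2

def machine_capacity(d, snr):
    """C = d * log2(1 + P/N): d pins, signal-to-noise ratio P/N.
    Mirrors Shannon's C = W * log2(1 + P/N), with pins in place of bandwidth."""
    return d * log2(1 + snr)

print(machine_capacity(10, 3))  # hypothetical: 10 pins, P/N = 3 -> 20.0 bits
```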

Page 23: Molecular Information Theory

References

Claude Shannon, Mathematical Theory of communication, Reprinted with

corrections from The Bell System Technical Journal,Vol. 27, pp. 379–423, 623–656,

July, October, 1948.

Mathematical Theory of Communication by Claude E. Shannon, Warren Weaver

T. D. Schneider, Sequence Logos, Machine/Channel Capacity, Maxwell's Demon,

and Molecular Computers: a Review of the Theory of Molecular Machines,

Nanotechnology, 5: 1-18, 1994

T. D. Schneider, Theory of Molecular Machines. I. Channel Capacity of Molecular

Machines J. Theor. Biol., 148:, 83-123, 1991

How (and why) to find a needle in a haystack Article in The Economist (April 5th-

11th 1997, British version: p. 105-107, American version: p. 73-75, Asian version: p.

79-81).

http://www.math.tamu.edu/~rahe/Math664/gene1.html

http://www.lecb.ncifcrf.gov/~toms/

