7/25/2019 Kirk20130318 Book Club StochProcesses
Stochastic Processes for Physicists: Understanding Noisy Systems
Chapter 1: A review of probability theory
Paul Kirk, Division of Molecular Biosciences, Imperial College London
19/03/2013
1.1 Random variables and mutually exclusive events
Random variables
Suppose we do not know the precise value of a variable, but may have an idea of the relative likelihood that it will have one of a number of possible values.

Let us call the unknown quantity X.

This quantity is referred to as a random variable.
Probability
Consider a 6-sided die. Let X be the value we get when we roll the die.

Describe the likelihood that X will have each of the values 1, . . . , 6 by a number between 0 and 1: the probability.

If Prob(X = 3) = 1, then we will always get a 3.

If Prob(X = 3) = 2/3, then we expect to get a 3 about two-thirds of the time.
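This frequency interpretation is easy to check by simulation. The following is a minimal sketch (not from the slides) using only Python's standard library, with a fixed seed so the run is reproducible:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed for reproducibility

# Roll a fair 6-sided die many times; each face has Prob(X = k) = 1/6.
rolls = [random.randint(1, 6) for _ in range(100_000)]
counts = Counter(rolls)

# Empirical relative frequencies should approach 1/6 ≈ 0.1667.
freqs = {face: counts[face] / len(rolls) for face in range(1, 7)}
```

With 100,000 rolls, each empirical frequency lands within about a percent of 1/6.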
Paul Kirk 1 of 22
1.1 Random variables and mutually exclusive events
Mutually exclusive events
The various values of X are an example of mutually exclusive events.

X can take precisely one of the values between 1 and 6.

Mutually exclusive probabilities sum

Prob(X = 3 or X = 4) = Prob(X = 3) + Prob(X = 4).
1.1 Random variables and mutually exclusive events
Note: in mathematics texts it is customary to denote the unknown quantity using a capital letter, say X, and a variable that specifies one of the possible values that X may have as the equivalent lower-case letter, x. We will use this convention in this chapter, but in the following chapters we will use a lower-case letter for both the unknown quantity and the values it can take, since it causes no confusion.

So, rather than writing Prob(X = 3) or Prob(X = x), we will (in later chapters) tend to write Prob(3) or Prob(x).

Warning: may cause confusion.
1.1 Random variables and mutually exclusive events
Continuous random variables
For continuous random variables, the probability for X to be within a range is found by integrating the probability density function:

Prob(a < X ≤ b) = ∫_{a}^{b} P(x) dx.
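As a concrete check of this integral, here is a minimal sketch (not from the slides) that integrates an exponential density P(x) = exp(−x) numerically with the trapezoidal rule and compares against the closed form e^(−a) − e^(−b):

```python
import math

# Density of an exponential random variable with rate 1: P(x) = exp(-x), x >= 0.
def P(x):
    return math.exp(-x)

def prob_between(a, b, n=100_000):
    """Prob(a < X <= b) via the trapezoidal rule applied to the density."""
    h = (b - a) / n
    s = 0.5 * (P(a) + P(b)) + sum(P(a + i * h) for i in range(1, n))
    return s * h

p = prob_between(0.5, 2.0)
exact = math.exp(-0.5) - math.exp(-2.0)
```

The numeric and exact answers agree to many decimal places for this smooth density.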
1.1 Random variables and mutually exclusive events
Expectation
The expectation of an arbitrary function, f(X), with respect to the probability density function P(X) is

⟨f(X)⟩_{P(X)} = ∫ P(x) f(x) dx.

The mean or expected value of X is ⟨X⟩.
1.1 Random variables and mutually exclusive events
Variance
The variance of X is the expectation of the squared difference from the mean:

V[X] = ∫ P(x) (x − ⟨X⟩)² dx
     = ∫ P(x) (x² + ⟨X⟩² − 2x⟨X⟩) dx
     = ∫ P(x) x² dx + ⟨X⟩² ∫ P(x) dx − 2⟨X⟩ ∫ P(x) x dx
     = ⟨X²⟩ + ⟨X⟩² − 2⟨X⟩²
     = ⟨X²⟩ − ⟨X⟩².
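The identity V[X] = ⟨X²⟩ − ⟨X⟩² can be verified directly for the fair die; a minimal discrete sketch (not from the slides), where the integrals become weighted sums:

```python
# Fair die: P(x) = 1/6 for x in 1..6.
xs = range(1, 7)
p = 1 / 6

mean = sum(p * x for x in xs)                      # ⟨X⟩ = 3.5
mean_sq = sum(p * x * x for x in xs)               # ⟨X²⟩ = 91/6
var_direct = sum(p * (x - mean) ** 2 for x in xs)  # Σ P(x)(x − ⟨X⟩)²
var_identity = mean_sq - mean ** 2                 # ⟨X²⟩ − ⟨X⟩²
```

Both routes give V[X] = 35/12 ≈ 2.917, as the derivation promises.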
1.2 Independence
Independence
Independent probabilities multiply

For independent variables, P_{X,Y}(x, y) = P_X(x) P_Y(y).

For independent variables, ⟨XY⟩ = ⟨X⟩⟨Y⟩.
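Both facts can be checked exhaustively for two independent fair dice; a minimal sketch (not from the slides), building the joint distribution as the product of the marginals:

```python
from itertools import product

# Two independent fair dice: joint P(x, y) = P_X(x) * P_Y(y) = 1/36.
px = {x: 1 / 6 for x in range(1, 7)}
py = {y: 1 / 6 for y in range(1, 7)}
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

exy = sum(p * x * y for (x, y), p in joint.items())  # ⟨XY⟩
ex = sum(p * x for x, p in px.items())               # ⟨X⟩
ey = sum(p * y for y, p in py.items())               # ⟨Y⟩
```

Here ⟨XY⟩ = ⟨X⟩⟨Y⟩ = 3.5 × 3.5 = 12.25, exactly as independence requires.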
1.3 Dependent random variables
Dependence
If X and Y are dependent, then P_{X,Y}(x, y) does not factor as the product of P_X(x) and P_Y(y).

If we know P_{X,Y}(x, y) and want to know P_X(x), then it is obtained by integrating out (or marginalising) the other variable:

P_X(x) = ∫ P_{X,Y}(x, y) dy.
1.3 Dependent random variables
Conditional probability densities
The probability density for X given that we know that Y = y is written P(X = x | Y = y) or P(x|y), and is referred to as the conditional probability density for X given Y:

P_{X|Y}(X = x | Y = y) = P_{X,Y}(X = x, Y = y) / P_Y(Y = y).

Explanation: To see how to calculate this conditional probability, we note first that P(x, y) with y = a gives the relative probability for different values of x given that Y = a. To obtain the conditional probability density for X given that Y = a, all we have to do is divide P(x, a) by its integral over all values of x.
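The discrete analogues of marginalising and conditioning are just sums and a division by the marginal. A minimal sketch (not from the slides), using a deliberately dependent pair where Y records the parity of a die roll X:

```python
# A dependent pair: X is a fair die, Y = X mod 2 (Y is determined by X).
joint = {}
for x in range(1, 7):
    y = x % 2
    joint[(x, y)] = 1 / 6

# Marginalise ("integrate out") one variable by summing the joint over it.
pX = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in range(1, 7)}
pY = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Conditional density of X given Y = 1: joint divided by the marginal P_Y(1).
pX_given_1 = {x: joint.get((x, 1), 0.0) / pY[1] for x in range(1, 7)}
```

Conditioning on Y = 1 (an odd roll) puts probability 1/3 on each of 1, 3, 5 and zero on the even faces, and the conditional correctly renormalises to 1.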
1.4 Correlations and correlation coefficients
The covariance of X and Y is:

cov(X, Y) = ⟨(X − ⟨X⟩)(Y − ⟨Y⟩)⟩ = ⟨XY⟩ − ⟨X⟩⟨Y⟩.

Idea:

1. How can we define what it means for a value x to be bigger than usual? Well, we can see if x > ⟨X⟩, i.e. if x − ⟨X⟩ > 0.
2. Similarly, we can say that a value x is smaller than usual if x < ⟨X⟩, i.e. if x − ⟨X⟩ < 0.

The correlation is just a normalised version of the covariance, which takes values in the range −1 to 1:

C_{XY} = (⟨XY⟩ − ⟨X⟩⟨Y⟩) / √(V[X] V[Y]).
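The normalisation is easy to see in an extreme case. A minimal sketch (not from the slides): for a fair die X and Y = 7 − X, every above-average x pairs with a below-average y, so the correlation should come out exactly −1:

```python
import math

# Fair die X and Y = 7 - X: perfectly anti-correlated.
xs = range(1, 7)
p = 1 / 6
y_of = {x: 7 - x for x in xs}

ex = sum(p * x for x in xs)
ey = sum(p * y_of[x] for x in xs)
exy = sum(p * x * y_of[x] for x in xs)
vx = sum(p * (x - ex) ** 2 for x in xs)
vy = sum(p * (y_of[x] - ey) ** 2 for x in xs)

cov = exy - ex * ey                 # ⟨XY⟩ − ⟨X⟩⟨Y⟩ = −35/12
corr = cov / math.sqrt(vx * vy)     # normalised: exactly −1
```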
1.5 Adding random variables together

When we have two continuous random variables, X and Y, with probability densities P_X and P_Y, it is often useful to be able to calculate the probability density of the random variable whose value is the sum of them: Z = X + Y. It turns out that the probability density for Z is given by

P_Z(z) = ∫ P_{X,Y}(z − s, s) ds = ∫ P_{X,Y}(s, z − s) ds.

If X and Y are independent, this becomes the convolution:

P_Z(z) = ∫ P_X(z − s) P_Y(s) ds = (P_X ∗ P_Y)(z).
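The discrete version of the convolution, P_Z(z) = Σ_s P_X(z − s) P_Y(s), can be evaluated directly for the sum of two fair dice; a minimal sketch (not from the slides):

```python
from collections import defaultdict

# Sum of two independent fair dice, via the discrete convolution
# P_Z(z) = Σ_s P_X(z - s) P_Y(s).
pX = {x: 1 / 6 for x in range(1, 7)}
pY = dict(pX)

pZ = defaultdict(float)
for s, py in pY.items():
    for z in range(2, 13):
        if (z - s) in pX:
            pZ[z] += pX[z - s] * py
```

This reproduces the familiar triangular distribution: P_Z(7) = 6/36 is the peak, P_Z(2) = P_Z(12) = 1/36 the tails.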
1.5 Adding random variables together
If X₁ and X₂ are random variables and X = X₁ + X₂, then

⟨X⟩ = ⟨X₁⟩ + ⟨X₂⟩,

and if X₁ and X₂ are independent, then

V[X] = V[X₁] + V[X₂].

Mysterious (?) assertion

Averaging the results of a number of independent measurements produces a more accurate result. This is because the variances of the different measurements add together. Does this make sense?
1.5 Adding random variables together
Explanation

Assume all measurements have expectation μ and variance σ².

By the independence assumption, the variance of the average is:

V[(1/N) Σ_{n=1}^{N} X_n] = Σ_{n=1}^{N} V[X_n / N].

Moreover,

V[X_n / N] = E[X_n²]/N² − (E[X_n]/N)² = (E[X_n²] − (E[X_n])²)/N² = V[X_n]/N².

So,

V[(1/N) Σ_{n=1}^{N} X_n] = Σ_{n=1}^{N} V[X_n]/N² = (1/N²) Σ_{n=1}^{N} σ² = σ²/N.
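The σ²/N scaling can be checked by Monte Carlo; a minimal sketch (not from the slides), averaging N uniform "measurements" (σ² = 1/12) many times and looking at the spread of the averages, with a fixed seed for reproducibility:

```python
import random
import statistics

random.seed(42)  # reproducible

# Each "measurement" is uniform on [0, 1], so σ² = 1/12.
# The variance of the average of N measurements should be σ²/N.
N = 10
trials = 20_000
averages = [statistics.fmean(random.random() for _ in range(N))
            for _ in range(trials)]

var_of_avg = statistics.pvariance(averages)
expected = (1 / 12) / N  # σ²/N ≈ 0.00833
```

With 20,000 trials the empirical variance of the average sits very close to σ²/N, resolving the "mysterious" assertion: the variances add, but the 1/N² prefactor from averaging wins.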
1.6 Transformations of a random variable

Key assertion: If Y = g(X), then:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx = ∫_{y=g(a)}^{y=g(b)} P_Y(y) f(y) dy.

Given this assumption, everything else falls out automatically:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx = ∫_{y=g(a)}^{y=g(b)} P_X(g⁻¹(y)) f(y) (dx/dy) dy
       = ∫_{y=g(a)}^{y=g(b)} [P_X(g⁻¹(y)) / g′(g⁻¹(y))] f(y) dy.

General result (for invertible g):

P_Y(y) = P_X(g⁻¹(y)) / |g′(g⁻¹(y))|.
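The change-of-variables formula can be sanity-checked numerically; a minimal sketch (not from the slides) with X uniform on (0, 1) and g(x) = x², so P_Y(y) = 1/(2√y), computing ⟨Y⟩ both in x-space and in y-space:

```python
import math

# X ~ Uniform(0, 1), Y = g(X) = X².  For this invertible g on (0, 1),
# P_Y(y) = P_X(g⁻¹(y)) / |g′(g⁻¹(y))| = 1 / (2·√y).
def P_Y(y):
    return 1.0 / (2.0 * math.sqrt(y))

def integrate(f, a, b, n=200_000):
    # Midpoint rule (avoids the integrable singularity of P_Y at y = 0).
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# ⟨Y⟩ in x-space: ∫₀¹ P_X(x)·g(x) dx = ∫₀¹ x² dx = 1/3
mean_x_space = integrate(lambda x: x * x, 0.0, 1.0)
# ⟨Y⟩ in y-space: ∫₀¹ P_Y(y)·y dy — should give the same 1/3
mean_y_space = integrate(lambda y: P_Y(y) * y, 0.0, 1.0)
```

Both integrals return 1/3, confirming that the transformed density carries the same expectations.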
1.7 The distribution function
The probability distribution function, which we will call D(x), of a random variable X is defined as the probability that X is less than or equal to x. Thus

D(x) = Prob(X ≤ x) = ∫_{−∞}^{x} P(z) dz.

In addition, the fundamental theorem of calculus tells us that

P(x) = (d/dx) D(x).
1.8 The characteristic function
The characteristic function is defined as the Fourier transform of the probability density:

φ(s) = ∫ P(x) exp(isx) dx.

The inverse transform gives:

P(x) = (1/2π) ∫ φ(s) exp(−isx) ds.

The Fourier transform of the convolution of two functions, P(x) and Q(x), is the product of their Fourier transforms, P̃(s) and Q̃(s).

For discrete random variables, the characteristic function is a sum. In general (for both discrete and continuous r.v.s), we have:

φ(s) = ⟨exp(isX)⟩_{P(X)}.
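The convolution theorem says that for independent X and Y, φ_Z(s) = φ_X(s)·φ_Y(s) when Z = X + Y. A minimal discrete sketch (not from the slides) checking this for two dice at an arbitrary s:

```python
import cmath

# Discrete characteristic function: φ(s) = Σ_x P(x)·exp(i·s·x).
def char_fn(dist, s):
    return sum(p * cmath.exp(1j * s * x) for x, p in dist.items())

pX = {x: 1 / 6 for x in range(1, 7)}
pY = dict(pX)

# Distribution of Z = X + Y by direct enumeration (independence assumed).
pZ = {}
for x, px in pX.items():
    for y, py in pY.items():
        pZ[x + y] = pZ.get(x + y, 0.0) + px * py

# Convolution theorem: φ_Z(s) = φ_X(s)·φ_Y(s) for every s.
s = 0.7
lhs = char_fn(pZ, s)
rhs = char_fn(pX, s) * char_fn(pY, s)
```

Note also that φ(0) = ⟨1⟩ = 1 for any distribution, which the code confirms.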
1.9 Moments and cumulants
Moment generating function (departure from the book)

The moment generating function is defined as:

M(t) = ⟨exp(tX)⟩,

where X is a random variable, and the expectation is with respect to some density P(X), so that

M(t) = ∫ exp(tx) P(x) dx
     = ∫ (1 + tx + (1/2!) t²x² + …) P(x) dx
     = 1 + t m₁ + (1/2!) t² m₂ + … + (1/r!) tʳ mᵣ + …,

where mᵣ = ⟨Xʳ⟩ is the r-th (raw) moment.
1.9 Moments and cumulants
Moment generating function (continued)

M(t) = 1 + t m₁ + (1/2!) t² m₂ + … + (1/r!) tʳ mᵣ + …

It follows from the above expansion that:

M(0) = 1
M′(0) = m₁
M″(0) = m₂
...
M⁽ʳ⁾(0) = mᵣ
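For a distribution with finitely many values, M(t) is an exact finite sum, so the derivative relations can be checked with finite differences. A minimal sketch (not from the slides) for the fair die, where m₁ = 3.5 and m₂ = 91/6:

```python
import math

# M(t) = ⟨exp(tX)⟩ for a fair die, evaluated exactly as a finite sum.
def M(t):
    return sum(math.exp(t * x) / 6 for x in range(1, 7))

h = 1e-4
# Central finite differences approximate M'(0) and M''(0).
m1 = (M(h) - M(-h)) / (2 * h)             # should be m₁ = ⟨X⟩ = 3.5
m2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2   # should be m₂ = ⟨X²⟩ = 91/6
```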
1.9 Moments and cumulants
Cumulant generating function (continued departure from the book)

The log of the moment generating function is called the cumulant generating function, R(t) = ln(M(t)).

By the chain rule of differentiation, we can write down the derivatives of R(t) in terms of the derivatives of M(t), e.g.

R′(t) = M′(t)/M(t)
R″(t) = (M″(t) M(t) − (M′(t))²) / (M(t))²

Note that R(0) = ln(M(0)) = 0, R′(0) = M′(0) = m₁ = μ, R″(0) = M″(0) − (M′(0))² = m₂ − m₁² = σ², . . . These are the cumulants.
1.9 Moments and cumulants
The moments can be calculated from the derivatives of the characteristic function, evaluated at s = 0. We can see this by expanding the characteristic function as a Taylor series:

φ(s) = Σ_{n=0}^{∞} φ⁽ⁿ⁾(0) sⁿ/n!,

where φ⁽ⁿ⁾(s) is the n-th derivative of φ(s). But we also have:

φ(s) = ⟨e^{isX}⟩ = ⟨Σ_{n=0}^{∞} (isX)ⁿ/n!⟩ = Σ_{n=0}^{∞} iⁿ ⟨Xⁿ⟩ sⁿ/n!.

Equating the two expressions, we get: ⟨Xⁿ⟩ = φ⁽ⁿ⁾(0)/iⁿ.
1.9 Moments and cumulants
Cumulants

The n-th order cumulant of X is (up to a factor of iⁿ) the n-th derivative of the log of the characteristic function, evaluated at s = 0.

For independent random variables, X and Y, if Z = X + Y then the n-th cumulant of Z is the sum of the n-th cumulants of X and Y.

The Gaussian distribution is also the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero.
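Additivity of cumulants is easy to verify for the first two (the mean and the variance). A minimal sketch (not from the slides), summing a fair die and an independent Bernoulli(0.3) variable:

```python
# First and second cumulants (mean and variance) are additive
# for independent random variables.
def mean(dist):
    return sum(p * x for x, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

pX = {x: 1 / 6 for x in range(1, 7)}  # fair die
pY = {0: 0.7, 1: 0.3}                 # Bernoulli(0.3)

# Distribution of Z = X + Y, assuming independence.
pZ = {}
for x, px in pX.items():
    for y, py in pY.items():
        pZ[x + y] = pZ.get(x + y, 0.0) + px * py
```

Both the mean and the variance of Z come out as the sums of the corresponding cumulants of X and Y, with no cross-term.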
1.10 The multivariate Gaussian
Let x = [x₁, . . . , x_N]; then the general form of the Gaussian pdf is:

P(x) = (1/√((2π)^N det(Σ))) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ)),

where μ is the mean vector and Σ is the covariance matrix.

All higher moments of a Gaussian can be written in terms of the means and covariances. Defining ΔX ≡ X − ⟨X⟩, for a 1-dimensional Gaussian we have:

⟨ΔX²ⁿ⟩ = (2n − 1)! (V[X])ⁿ / (2ⁿ⁻¹ (n − 1)!)
⟨ΔX²ⁿ⁻¹⟩ = 0.
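The coefficient (2n − 1)!/(2ⁿ⁻¹(n − 1)!) is the double factorial (2n − 1)!! = 1, 3, 15, 105, …, which is the usual statement of Wick's theorem for Gaussian moments. A minimal sketch (not from the slides) checking this identity:

```python
import math

# Coefficient of (V[X])^n in ⟨ΔX^{2n}⟩ for a Gaussian:
# (2n-1)! / (2^{n-1}·(n-1)!), which should equal (2n-1)!!.
def gaussian_coeff(n):
    return math.factorial(2 * n - 1) // (2 ** (n - 1) * math.factorial(n - 1))

def double_factorial(k):
    # k·(k-2)·(k-4)·… down to 1
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

coeffs = [gaussian_coeff(n) for n in range(1, 6)]
```

So ⟨ΔX²⟩ = V[X], ⟨ΔX⁴⟩ = 3 V[X]², ⟨ΔX⁶⟩ = 15 V[X]³, and so on.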