Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
Student’s T Distribution
Prof. Nicholas Zabaras
School of Engineering
University of Warwick
Coventry CV4 7AL
United Kingdom
Email: [email protected]
URL: http://www.zabaras.com/
August 7, 2014
Contents
Gamma Distribution as a Conjugate Prior for the Precision of a Gaussian
Student’s T, Student’s T Approaching a Gaussian
Robustness of Student’s T to Outliers
Multivariate Student’s T
The Laplace Distribution
• Following closely Chris Bishop’s PRML book, Chapter 2
• Kevin Murphy’s Machine Learning: A Probabilistic Perspective, Chapter 2
Gamma as a Conjugate Prior
We have seen that the conjugate prior for the precision of a Gaussian
is given by a Gamma distribution. If we have a univariate Gaussian
N(x|μ, τ⁻¹) together with a prior Gamma(τ|a, b) and we integrate out the
precision, we obtain the marginal distribution of x:
\[
p(x|\mu,a,b) = \int_0^\infty \mathcal{N}(x|\mu,\tau^{-1})\,\mathrm{Gamma}(\tau|a,b)\,d\tau
= \int_0^\infty \frac{b^a e^{-b\tau}\tau^{a-1}}{\Gamma(a)} \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left\{-\frac{\tau}{2}(x-\mu)^2\right\} d\tau
\]
To simplify, introduce the change of variables
\[
z = \tau A, \qquad A = b + \frac{1}{2}(x-\mu)^2
\]
so that
\[
p(x|\mu,a,b) = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} \int_0^\infty \tau^{a+1/2-1} \exp\{-z\}\,d\tau
= \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} A^{-a-1/2} \int_0^\infty z^{a+1/2-1} \exp\{-z\}\,dz
\]
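Student’s T Distribution
With A = b + (x − μ)²/2, the marginal reads:
\[
p(x|\mu,a,b) = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} A^{-a-1/2} \int_0^\infty z^{a+1/2-1} \exp\{-z\}\,dz
= \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} \left[b + \frac{(x-\mu)^2}{2}\right]^{-a-1/2} \int_0^\infty z^{a-1/2} \exp\{-z\}\,dz
\]
Recalling the definition of the Gamma function:
\[
\Gamma(a) = \int_0^\infty z^{a-1} \exp\{-z\}\,dz
\]
we obtain
\[
p(x|\mu,a,b) = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} \left[b + \frac{(x-\mu)^2}{2}\right]^{-a-1/2} \Gamma\left(a + \frac{1}{2}\right)
\]
It is common to redefine the parameters in this distribution as:
\[
\nu = 2a, \qquad \lambda = \frac{a}{b}
\]
which gives
\[
p(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}
\]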
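The marginalization above can be checked numerically. The sketch below integrates the Gaussian-Gamma product over τ with a midpoint rule and compares the result with the closed-form Student's T density under ν = 2a, λ = a/b. The function names, the truncation point of the integral, the step count and the parameter values are illustrative assumptions, not part of the lecture.

```python
import math


def student_t_pdf(x, mu, lam, nu):
    """Student's T density with location mu, precision lam and nu degrees of freedom."""
    log_norm = (math.lgamma(0.5 * nu + 0.5) - math.lgamma(0.5 * nu)
                + 0.5 * math.log(lam / (math.pi * nu)))
    log_kernel = -0.5 * (nu + 1.0) * math.log1p(lam * (x - mu) ** 2 / nu)
    return math.exp(log_norm + log_kernel)


def gaussian_gamma_marginal(x, mu, a, b, n_steps=20000):
    """Midpoint-rule estimate of int_0^inf N(x|mu, 1/tau) Gamma(tau|a, b) dtau."""
    A = b + 0.5 * (x - mu) ** 2      # decay rate of the integrand in tau
    upper = 50.0 / A                 # assumed cutoff: beyond it exp(-A*tau) < e^-50
    h = upper / n_steps
    log_const = a * math.log(b) - math.lgamma(a) - 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for i in range(n_steps):
        tau = (i + 0.5) * h
        # integrand: b^a / Gamma(a) * (2 pi)^(-1/2) * tau^(a - 1/2) * exp(-A tau)
        total += math.exp(log_const + (a - 0.5) * math.log(tau) - A * tau)
    return total * h


# Hypothetical parameter values chosen only for the check.
a, b, mu, x = 2.0, 1.5, 0.2, 0.7
numeric = gaussian_gamma_marginal(x, mu, a, b)
closed = student_t_pdf(x, mu, lam=a / b, nu=2.0 * a)
print(numeric, closed)
```

The two printed numbers should agree to many digits; the density is evaluated in log space so that large ν does not overflow the Gamma functions.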
Student’s T Distribution
The parameter λ is called the precision of the t-distribution, even
though it is not in general equal to the inverse of the variance (see
below on behavior as ν →∞).
The parameter ν is called the degrees of freedom.
For the particular case of ν = 1, the T-distribution reduces to the
Cauchy distribution.
In the limit ν → ∞, the t-distribution T(x|μ, λ, ν) becomes a Gaussian
N(x|μ, λ⁻¹) with mean μ and precision λ.
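\[
p(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}
\]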
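The ν = 1 special case can be verified directly: the Student's T density with precision λ should coincide with a Cauchy density whose scale is γ = λ^(-1/2). The sketch below is a minimal check under that identification; the function names, the grid and the parameter values are illustrative assumptions.

```python
import math


def student_t_pdf(x, mu, lam, nu):
    """Student's T density with location mu, precision lam and nu degrees of freedom."""
    log_norm = (math.lgamma(0.5 * nu + 0.5) - math.lgamma(0.5 * nu)
                + 0.5 * math.log(lam / (math.pi * nu)))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(lam * (x - mu) ** 2 / nu))


def cauchy_pdf(x, mu, gamma):
    """Cauchy (Lorentz) density with location mu and scale gamma."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - mu) / gamma) ** 2))


mu, lam = 0.5, 4.0                    # hypothetical values for the check
gamma = 1.0 / math.sqrt(lam)          # Cauchy scale implied by the precision
grid = [-10.0 + 0.5 * k for k in range(41)]
max_gap = max(abs(student_t_pdf(x, mu, lam, 1.0) - cauchy_pdf(x, mu, gamma))
              for x in grid)
print(max_gap)
```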
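For ν → ∞, T(x|μ, λ, ν) Becomes a Gaussian
\[
p(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}
\]
We first write the distribution as follows:
\[
\mathcal{T}(x|\mu,\lambda,\nu) \propto \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}
= \exp\left\{-\frac{\nu+1}{2} \ln\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]\right\}
\]
For large ν we can approximate the log using ln(1 + ε) = ε + O(ε²):
\[
\mathcal{T}(x|\mu,\lambda,\nu) \propto \exp\left\{-\frac{\nu+1}{2} \left[\frac{\lambda(x-\mu)^2}{\nu} + O(\nu^{-2})\right]\right\}
= \exp\left\{-\frac{\lambda(x-\mu)^2}{2} + O(\nu^{-1})\right\}
\]
In the limit ν → ∞, the T-distribution T(x|μ, λ, ν) is indeed a Gaussian
N(x|μ, λ⁻¹) with mean μ and precision λ. The normalization of the T is
valid in this limit as well (so the Gaussian obtained is normalized).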
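The limit can also be observed numerically. The sketch below measures the largest pointwise gap between the Student's T density and the Gaussian N(x|μ, λ⁻¹) on a grid as ν grows. The grid, the chosen values of ν and the function names are assumptions made for illustration.

```python
import math


def student_t_pdf(x, mu, lam, nu):
    """Student's T density with location mu, precision lam and nu degrees of freedom."""
    log_norm = (math.lgamma(0.5 * nu + 0.5) - math.lgamma(0.5 * nu)
                + 0.5 * math.log(lam / (math.pi * nu)))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(lam * (x - mu) ** 2 / nu))


def gaussian_pdf(x, mu, lam):
    """Gaussian density with mean mu and precision lam."""
    return math.sqrt(lam / (2.0 * math.pi)) * math.exp(-0.5 * lam * (x - mu) ** 2)


mu, lam = 0.0, 1.0
grid = [-5.0 + 0.1 * k for k in range(101)]
gaps = []
for nu in (1.0, 10.0, 100.0, 1000.0):
    gap = max(abs(student_t_pdf(x, mu, lam, nu) - gaussian_pdf(x, mu, lam))
              for x in grid)
    gaps.append(gap)
    print(nu, gap)   # the gap shrinks roughly like 1/nu
```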
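Student’s T Distribution
\[
p(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}
\]
[Figure: "student distribution", densities for ν = 10, ν = 1.0 and ν = 0.1 with μ = 0, λ = 1, plotted for x from −5 to 5.]
For ν → ∞ we obtain N(x|μ, λ⁻¹).
\[
\text{Mean: } \mu \;(\nu > 1), \qquad \text{Mode: } \mu, \qquad \text{Var: } \frac{\nu\sigma^2}{\nu-2} = \frac{\nu}{\lambda(\nu-2)} \;(\nu > 2), \quad \sigma^2 = \lambda^{-1}
\]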
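The mean and variance formulas can be checked by direct quadrature. The sketch below integrates x p(x) and (x − μ)² p(x) over a wide symmetric window around μ with a midpoint rule. The window half-width, the step count and the parameter values are assumptions; the heavy tails mean the window must be wide for the second moment to be accurate.

```python
import math


def student_t_pdf(x, mu, lam, nu):
    """Student's T density with location mu, precision lam and nu degrees of freedom."""
    log_norm = (math.lgamma(0.5 * nu + 0.5) - math.lgamma(0.5 * nu)
                + 0.5 * math.log(lam / (math.pi * nu)))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(lam * (x - mu) ** 2 / nu))


def moments_by_quadrature(mu, lam, nu, half_width=100.0, n_steps=200000):
    """Midpoint-rule estimates of E[x] and E[(x - mu)^2] on [mu - L, mu + L]."""
    h = 2.0 * half_width / n_steps
    mean = 0.0
    second_central = 0.0
    for i in range(n_steps):
        x = mu - half_width + (i + 0.5) * h
        w = student_t_pdf(x, mu, lam, nu) * h   # probability mass of this cell
        mean += x * w
        second_central += (x - mu) ** 2 * w
    return mean, second_central


mu, lam, nu = 1.0, 2.0, 5.0           # hypothetical values with nu > 2
mean_num, var_num = moments_by_quadrature(mu, lam, nu)
var_formula = nu / (lam * (nu - 2.0))
print(mean_num, var_num, var_formula)
```

For ν ≤ 2 the second integral grows without bound as the window widens, which is the numerical signature of the undefined variance.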
Student’s T Vs the Gaussian
We plot:
\[
\mathcal{N}(x|0,1), \qquad \mathcal{T}(x|0,1,1), \qquad \mathrm{Lap}(x|0,1/\sqrt{2})
\]
[Figure: two panels for x from −4 to 4, "prob. density functions" and "log prob. density functions", each showing Gauss, Student and Laplace.]
The mean and variance of the Student’s T are undefined for ν = 1.
The second panel shows the logs of the PDFs. The Student’s T is NOT
log-concave.
When ν = 1, the distribution is known as the Cauchy or Lorentz
distribution. Due to its heavy tails, the integral defining the mean
does not converge.
It is recommended to use ν = 4.
Run MatLab function studentLaplacePdfPlot from Kevin Murphy’s PMTK.
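Log-concavity can be probed with a second difference of the log-density: a positive value at any point rules out log-concavity. The sketch below evaluates this for the three plotted distributions at x = 2. The evaluation point, the step size and the function names are illustrative assumptions. Note that the Laplace scale b = 1/√2 gives variance 2b² = 1, matching the unit-variance Gaussian.

```python
import math


def log_gauss(x):
    """log N(x|0, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x


def log_student(x, nu=1.0):
    """log T(x|0, 1, nu)."""
    return (math.lgamma(0.5 * nu + 0.5) - math.lgamma(0.5 * nu)
            - 0.5 * math.log(math.pi * nu)
            - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


def log_laplace(x, b=1.0 / math.sqrt(2.0)):
    """log Lap(x|0, b)."""
    return -math.log(2.0 * b) - abs(x) / b


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


sd_gauss = second_difference(log_gauss, 2.0)      # about -1: concave
sd_student = second_difference(log_student, 2.0)  # positive: not concave here
sd_laplace = second_difference(log_laplace, 2.0)  # about 0: linear away from the peak
print(sd_gauss, sd_student, sd_laplace)
```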
Student’s T Distribution
\[
p(x|\mu,a,b) = \int_0^\infty \mathcal{N}(x|\mu,\tau^{-1})\,\mathrm{Gamma}(\tau|a,b)\,d\tau
= \int_0^\infty \frac{b^a e^{-b\tau}\tau^{a-1}}{\Gamma(a)} \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left\{-\frac{\tau}{2}(x-\mu)^2\right\} d\tau
\]
As the equation above shows, the Student’s T distribution is an
infinite mixture of Gaussians, each with a different precision.
The result is a distribution that in general has longer ‘tails’
than a Gaussian.
This gives the T-distribution robustness, i.e. the T-distribution is
much less sensitive than the Gaussian to the presence of outliers.
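The mixture view suggests a simple sampler: draw a precision τ from the Gamma distribution, then draw x from a Gaussian with that precision. The sketch below uses this to estimate the tail mass P(|x| > 3) and compares it with the same probability under N(0, 1). The seed, the sample size and the choice ν = 3, λ = 1 are assumptions for illustration. Note that Python's `random.gammavariate` takes a scale parameter, so the rate b enters as 1/b.

```python
import math
import random


def sample_student_t(mu, lam, nu, rng):
    """Draw one Student's T sample as a Gaussian with a Gamma-distributed precision."""
    a = 0.5 * nu                 # from nu = 2a
    b = 0.5 * nu / lam           # from lam = a / b
    tau = rng.gammavariate(a, 1.0 / b)   # shape a, scale 1/b (rate b)
    return rng.gauss(mu, 1.0 / math.sqrt(tau))


rng = random.Random(0)           # fixed seed so the run is repeatable
n = 100000
samples = [sample_student_t(0.0, 1.0, 3.0, rng) for _ in range(n)]
tail_t = sum(abs(s) > 3.0 for s in samples) / n
tail_gauss = math.erfc(3.0 / math.sqrt(2.0))   # exact P(|x| > 3) under N(0, 1)
print(tail_t, tail_gauss)
```

The Student's T puts far more mass beyond three units from the centre than the Gaussian does, which is the longer-tail behaviour described above.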
Robustness of Student’s T Distribution
The robustness of the T-distribution is illustrated here by comparing
the maximum likelihood solutions for a Gaussian and a T-distribution
(30 data points from the Gaussian are used). The effect of a small
number of outliers is less significant for the T-distribution than for the
Gaussian.
[Figure: two panels showing the data with the fitted densities, x from −10 to 12. In the first panel the t-distribution and Gaussian nearly overlap; in the second panel the T-distribution and Gaussian fits are labelled separately.]
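A comparison of this kind can be sketched with a short EM iteration for the Student's T. This is not the lecture's MatLab code: it is a minimal sketch that assumes ν is held fixed at 4, uses synthetic data from a seeded generator, and places three hypothetical outliers at 8, 8.5 and 9. The EM weights (ν + 1)/(ν + λ(x − μ)²) shrink for points far from μ, which is where the robustness comes from.

```python
import random


def fit_gaussian_ml(data):
    """Maximum likelihood mean and variance of a Gaussian."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var


def fit_student_t_em(data, nu=4.0, n_iter=200):
    """EM for the location mu and precision lam of a Student's T with fixed nu."""
    mu, var = fit_gaussian_ml(data)      # initialise from the Gaussian fit
    lam = 1.0 / var
    n = len(data)
    for _ in range(n_iter):
        # E-step: expected precision scaling per point; outliers get small weights
        w = [(nu + 1.0) / (nu + lam * (x - mu) ** 2) for x in data]
        # M-step: weighted mean, then weighted scale around the new mean
        mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
        lam = n / sum(wi * (x - mu) ** 2 for wi, x in zip(w, data))
    return mu, lam


rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(30)]   # 30 Gaussian points
outliers = [8.0, 8.5, 9.0]                         # hypothetical outliers
data = clean + outliers

clean_mean = sum(clean) / len(clean)
gauss_mean, gauss_var = fit_gaussian_ml(data)
t_mean, t_lam = fit_student_t_em(data)
print(clean_mean, gauss_mean, t_mean)
```

The Gaussian mean is pulled toward the outliers, while the Student's T location stays close to the mean of the clean points.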
Robustness of Student’s T Distribution
The earlier simulation is repeated here with the PMTK toolbox.
[Figure: two panels comparing the fitted gaussian, student T and laplace densities, x from −5 to 10.]
Run MatLab function robustDemo from Kevin Murphy’s PMTK.
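Multivariate Student’s T Distribution
If we return to the prior above,
\[
p(x|\mu,a,b) = \int_0^\infty \mathcal{N}(x|\mu,\tau^{-1})\,\mathrm{Gamma}(\tau|a,b)\,d\tau,
\]
substitute
\[
\nu = 2a, \qquad \lambda = a/b, \qquad \eta = \tau b/a,
\]
and use
\[
\mathrm{Gamma}(\tau|a,b) = \frac{1}{\Gamma(a)}\, b^a \tau^{a-1} e^{-b\tau},
\]
we can write the Student’s T distribution as:
\[
\mathcal{T}(x|\mu,\lambda,\nu) = \int_0^\infty \mathcal{N}\left(x|\mu,(\eta\lambda)^{-1}\right) \mathrm{Gamma}(\eta|\nu/2,\nu/2)\,d\eta
\]
This form is useful in providing a generalization to a multivariate
Student’s T:
\[
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \int_0^\infty \mathcal{N}\left(\mathbf{x}|\boldsymbol{\mu},(\eta\boldsymbol{\Lambda})^{-1}\right) \mathrm{Gamma}(\eta|\nu/2,\nu/2)\,d\eta
\]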
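The substitution can be spelled out as a change of variables. The following is a sketch of the intermediate step, assuming the definitions ν = 2a, λ = a/b and η = τb/a, and the Gamma density with shape a and rate b:

```latex
\begin{align*}
\tau &= \eta\,\frac{a}{b} = \eta\lambda, \qquad d\tau = \frac{a}{b}\,d\eta, \\
\mathrm{Gamma}(\tau|a,b)\,d\tau
  &= \frac{b^a}{\Gamma(a)} \left(\frac{\eta a}{b}\right)^{a-1} e^{-a\eta}\,\frac{a}{b}\,d\eta
   = \frac{a^a}{\Gamma(a)}\,\eta^{a-1} e^{-a\eta}\,d\eta
   = \mathrm{Gamma}(\eta|\nu/2,\nu/2)\,d\eta, \\
\mathcal{N}(x|\mu,\tau^{-1}) &= \mathcal{N}\left(x|\mu,(\eta\lambda)^{-1}\right).
\end{align*}
```

The integration limits are unchanged because η is a positive multiple of τ.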
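Multivariate Student’s T Distribution
\[
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \int_0^\infty \mathcal{N}\left(\mathbf{x}|\boldsymbol{\mu},(\eta\boldsymbol{\Lambda})^{-1}\right) \mathrm{Gamma}(\eta|\nu/2,\nu/2)\,d\eta
\]
This integral can be computed analytically as:
\[
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2},
\qquad \Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Lambda} (\mathbf{x}-\boldsymbol{\mu})
\]
where Δ² is the squared Mahalanobis distance.
One can derive the above form of the distribution by substitution in the
equation at the top:
\[
\begin{aligned}
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu)
&= \int_0^\infty \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \eta^{\nu/2-1} e^{-\nu\eta/2}\, \frac{\eta^{D/2}|\boldsymbol{\Lambda}|^{1/2}}{(2\pi)^{D/2}}\, e^{-\eta\Delta^2/2}\,d\eta \\
&= \frac{(\nu/2)^{\nu/2}|\boldsymbol{\Lambda}|^{1/2}}{\Gamma(\nu/2)(2\pi)^{D/2}} \left(\frac{\nu}{2}+\frac{\Delta^2}{2}\right)^{-D/2-\nu/2} \int_0^\infty \tau^{D/2+\nu/2-1} e^{-\tau}\,d\tau \\
&= \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2}
\end{aligned}
\]
Use τ = η(ν/2 + Δ²/2) in the second step.
The normalization proof is immediate from the normalization of the normal
and Gamma distributions.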
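The analytic result can be compared with a direct numerical evaluation of the mixture integral. The sketch below does this for D = 2 only, so that the determinant and the Mahalanobis distance can be written out by hand without a linear algebra library. The function names, the truncation point, the step count and the parameter values are illustrative assumptions.

```python
import math


def mahalanobis2_and_det(x, mu, Lam):
    """Squared Mahalanobis distance and determinant for a 2x2 precision matrix."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    delta2 = (Lam[0][0] * d0 * d0 + (Lam[0][1] + Lam[1][0]) * d0 * d1
              + Lam[1][1] * d1 * d1)
    det = Lam[0][0] * Lam[1][1] - Lam[0][1] * Lam[1][0]
    return delta2, det


def mvt_pdf_2d(x, mu, Lam, nu):
    """Closed-form bivariate Student's T density."""
    D = 2
    delta2, det = mahalanobis2_and_det(x, mu, Lam)
    log_norm = (math.lgamma(0.5 * (D + nu)) - math.lgamma(0.5 * nu)
                + 0.5 * math.log(det) - 0.5 * D * math.log(math.pi * nu))
    return math.exp(log_norm - 0.5 * (D + nu) * math.log1p(delta2 / nu))


def mvt_pdf_2d_by_mixture(x, mu, Lam, nu, n_steps=20000):
    """Midpoint-rule estimate of int N(x|mu, (eta Lam)^-1) Gamma(eta|nu/2, nu/2) d eta."""
    D = 2
    delta2, det = mahalanobis2_and_det(x, mu, Lam)
    rate = 0.5 * (nu + delta2)           # decay rate of the integrand in eta
    upper = 50.0 / rate                  # assumed cutoff for the infinite range
    h = upper / n_steps
    log_const = (0.5 * nu * math.log(0.5 * nu) - math.lgamma(0.5 * nu)
                 + 0.5 * math.log(det) - 0.5 * D * math.log(2.0 * math.pi))
    total = 0.0
    for i in range(n_steps):
        eta = (i + 0.5) * h
        total += math.exp(log_const + (0.5 * (nu + D) - 1.0) * math.log(eta)
                          - rate * eta)
    return total * h


x, mu = [1.0, -0.5], [0.2, 0.1]          # hypothetical test point and location
Lam = [[2.0, 0.3], [0.3, 1.0]]           # symmetric positive definite precision
closed = mvt_pdf_2d(x, mu, Lam, nu=5.0)
numeric = mvt_pdf_2d_by_mixture(x, mu, Lam, nu=5.0)
print(closed, numeric)
```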
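Multivariate Student’s T Distribution
Some useful results for the multivariate Student’s T
\[
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2},
\qquad \Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Lambda} (\mathbf{x}-\boldsymbol{\mu})
\]
are given below:
\[
\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu} \;\text{ if } \nu > 1, \qquad
\mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu-2}\,\boldsymbol{\Lambda}^{-1} \;\text{ if } \nu > 2, \qquad
\mathrm{mode}[\mathbf{x}] = \boldsymbol{\mu}
\]
One can easily show the expression for the mean by using x = z + μ:
\[
\mathbb{E}[\mathbf{x}] = \int \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\mathbf{z}^T \boldsymbol{\Lambda} \mathbf{z}}{\nu}\right]^{-D/2-\nu/2} (\mathbf{z} + \boldsymbol{\mu})\,d\mathbf{z}
\]
The first term drops out since T(z|0, Λ, ν) is even. The second term
gives μ from the normalization of the distribution.
The covariance is computed as:
\[
\begin{aligned}
\mathrm{cov}[\mathbf{x}]
&= \int_0^\infty \left[\int \mathcal{N}\left(\mathbf{x}|\boldsymbol{\mu},(\eta\boldsymbol{\Lambda})^{-1}\right) (\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T d\mathbf{x}\right] \mathrm{Gamma}(\eta|\nu/2,\nu/2)\,d\eta \\
&= \boldsymbol{\Lambda}^{-1} \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \int_0^\infty \eta^{-1}\, \eta^{\nu/2-1} e^{-\nu\eta/2}\,d\eta \\
&= \boldsymbol{\Lambda}^{-1} \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \frac{\Gamma(\nu/2-1)}{(\nu/2)^{\nu/2-1}}
 = \frac{\nu/2}{\nu/2-1}\,\boldsymbol{\Lambda}^{-1} = \frac{\nu}{\nu-2}\,\boldsymbol{\Lambda}^{-1}
\end{aligned}
\]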
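The key step in the covariance calculation is that the expectation of 1/η under Gamma(η|ν/2, ν/2) equals ν/(ν − 2). The sketch below checks this by quadrature. The function name, the cutoff and the step count are assumptions, and the midpoint rule is only used for ν ≥ 4, where the integrand stays bounded at η = 0.

```python
import math


def expected_inverse_eta(nu, n_steps=20000):
    """Midpoint-rule estimate of E[1/eta] for eta ~ Gamma(nu/2, nu/2); use nu >= 4."""
    a = 0.5 * nu
    upper = 60.0 / a                     # assumed cutoff: exp(-a*eta) < e^-60 beyond
    h = upper / n_steps
    log_const = a * math.log(a) - math.lgamma(a)
    total = 0.0
    for i in range(n_steps):
        eta = (i + 0.5) * h
        # integrand: (1/eta) * a^a / Gamma(a) * eta^(a-1) * exp(-a*eta)
        total += math.exp(log_const + (a - 2.0) * math.log(eta) - a * eta)
    return total * h


results = {nu: expected_inverse_eta(nu) for nu in (6.0, 10.0)}
for nu, value in results.items():
    print(nu, value, nu / (nu - 2.0))    # numeric estimate next to nu / (nu - 2)
```

Multiplying this scalar by Λ⁻¹ gives the covariance, which is why the covariance exceeds Λ⁻¹ for every finite ν > 2.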
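Multivariate Student’s T Distribution
\[
\mathcal{T}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2},
\qquad \Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Lambda} (\mathbf{x}-\boldsymbol{\mu})
\]
\[
\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu} \;\text{ if } \nu > 1, \qquad
\mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu-2}\,\boldsymbol{\Lambda}^{-1} \;\text{ if } \nu > 2, \qquad
\mathrm{mode}[\mathbf{x}] = \boldsymbol{\mu}
\]
Differentiation with respect to x also shows the mode being μ.
The Student’s T has fatter tails than a Gaussian. The smaller ν is, the
fatter the tails.
For ν → ∞, the distribution approaches a Gaussian. Indeed note that:
\[
\left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2}
= \exp\left\{-\frac{D+\nu}{2} \ln\left[1 + \frac{\Delta^2}{\nu}\right]\right\}
= \exp\left\{-\frac{D+\nu}{2} \left[\frac{\Delta^2}{\nu} + O(\nu^{-2})\right]\right\}
= \exp\left\{-\frac{\Delta^2}{2} + O(\nu^{-1})\right\}
\]
The distribution can also be written in terms of Σ = Λ⁻¹ (the scale
matrix, not the covariance) or V = νΣ.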
The Laplace Distribution
Another distribution with heavy tails is the Laplace distribution, also
known as the double-sided exponential distribution. It has the following
pdf:
\[
\mathrm{Lap}(x|\mu,b) = \frac{1}{2b} \exp\left\{-\frac{|x-\mu|}{b}\right\}
\]
Here μ is a location parameter and b > 0 is a scale parameter.
\[
\text{Mean} = \mu, \qquad \text{Mode} = \mu, \qquad \text{Var} = 2b^2
\]
It is robust to outliers (see an earlier demonstration).
It puts more probability density at 0 than the Gaussian. This property
is a useful way to encourage sparsity in a model.
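Two of the properties above are easy to see in a few lines. The maximum likelihood location of a Laplace distribution is the sample median (with scale equal to the mean absolute deviation from it), which is why outliers have little effect on it. The sketch below also compares the peak heights of Lap(x|0, 1/√2) and N(x|0, 1). The data values and function names are hypothetical and chosen only for illustration.

```python
import math
import statistics


def laplace_pdf(x, mu, b):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def fit_laplace_ml(data):
    """ML estimates for the Laplace: median location, mean absolute deviation scale."""
    mu = statistics.median(data)
    b = sum(abs(x - mu) for x in data) / len(data)
    return mu, b


clean = [-1.2, -0.4, 0.1, 0.3, 0.9]      # hypothetical inliers
data = clean + [8.0, 9.0]                # two hypothetical outliers
mu_hat, b_hat = fit_laplace_ml(data)
mean_hat = sum(data) / len(data)         # Gaussian ML location, for comparison
print(mu_hat, b_hat, mean_hat)

# Peak heights at 0 for unit-variance Laplace and Gaussian densities.
peak_laplace = laplace_pdf(0.0, 0.0, 1.0 / math.sqrt(2.0))
peak_gauss = 1.0 / math.sqrt(2.0 * math.pi)
print(peak_laplace, peak_gauss)
```

The median moves from 0.1 to 0.3 when the outliers are added, while the sample mean moves from about −0.06 to about 2.39.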