Sampling Bayesian Networks
ICS 295
2008
Algorithm Tree
Sampling Fundamentals

Given a set of variables X = {X1, X2, ..., Xn}, a joint probability distribution P(X), and some function g(X), we can compute the expected value of g(X):

$E[g] = \int_{x \in D(X)} g(x)\,p(x)\,dx$

For discrete X, summing over the domain D(X):

$E[g] = \sum_{x \in D(X)} g(x)\,p(x)$
Sampling From P(X)
Given independent, identically distributed (i.i.d.) samples S1, S2, ..., ST from P(X), it follows from the Strong Law of Large Numbers that

$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$

converges to E[g]. A sample S^t is an instantiation of all variables:

$S^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$
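The estimator above takes only a few lines. A minimal sketch, where the distribution and the function g are hypothetical choices for illustration (not from the slides):

```python
import random

def mc_estimate(g, sample, T):
    """Estimate E[g(X)] by averaging g over T i.i.d. samples S1..ST."""
    return sum(g(sample()) for _ in range(T)) / T

random.seed(0)
# Hypothetical example: X uniform on {0,...,9}, g(x) = x, so E[g] = 4.5.
est = mc_estimate(lambda x: x, lambda: random.randrange(10), 100_000)
```

By the Strong Law of Large Numbers, `est` approaches 4.5 as T grows.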
Sampling Challenge

It is hard to generate samples from P(X). Trade-offs:
Generate samples from a proposal Q(X): Forward Sampling, Likelihood Weighting, Importance Sampling (IS). Try to find Q(X) close to P(X).
Generate dependent samples forming a Markov chain from P'(X), which converges to P(X): Metropolis, Metropolis-Hastings, Gibbs. Try to reduce dependence between samples.
Markov Chain

A sequence of random values x0, x1, ..., defined on a finite state space D(X), is called a Markov chain if it satisfies the Markov property:

$P(x_{t+1}=y \mid x_t, x_{t-1}, \ldots, x_0 = z) = P(x_{t+1}=y \mid x_t)$

If P(x_{t+1}=y | x_t) does not change with t (time-homogeneous), then it is often expressed as a transition function A(x,y), where

$\sum_{y} A(x,y) = 1$

Liu, Ch. 12, p. 245
Markov Chain Monte Carlo

First, define a transition probability P(x_{t+1}=y | x_t).
Pick an initial state x0; the choice is usually not important because it becomes "forgotten".
Generate samples x1, x2, ..., sampling each next value from P(X | x_t): x0 -> x1 -> ... -> x_t -> x_{t+1} -> ...
If we choose a proper P(x_{t+1} | x_t), we can guarantee that the distribution represented by the samples x0, x1, ... converges to P(X).
Markov Chain Properties

Irreducibility
Periodicity
Recurrence
Reversibility
Ergodicity
Stationary Distribution
Irreducible

A state x is said to be irreducible if under the transition rule one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps.
If one state is irreducible, then all the states must be irreducible.
Liu, Ch. 12, pp. 249, Def. 12.1.1
Aperiodic

A state x is aperiodic if the greatest common divisor of {n : A^n(x,x) > 0} is 1.
If state x is aperiodic and the chain is irreducible, then every state must be aperiodic.

Liu, Ch. 12, pp. 249-250, Def. 12.1.1
Recurrence

A state x is recurrent if the chain returns to x with probability 1.
State x is recurrent if and only if:

$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$

Let M(x) be the expected number of steps to return to state x.
State x is positive recurrent if M(x) is finite.
The recurrent states in a finite-state chain are positive recurrent.
Ergodicity
A state x is ergodic if it is aperiodic and positive recurrent.
If all states in a Markov chain are ergodic then the chain is ergodic.
Reversibility

A Markov chain is reversible if there is a $\pi$ satisfying the detailed balance condition:

$P(x_{t+1}=j \mid x_t=i)\,P(x_t=i) = P(x_{t+1}=i \mid x_t=j)\,P(x_t=j)$

that is, $\pi_i p_{ij} = \pi_j p_{ji}$.

For a reversible Markov chain, $\pi$ is always a stationary distribution.
Stationary Distribution

If the Markov chain is time-homogeneous, then the vector $\pi(X)$ is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum up to 1 and satisfy:

$\pi(x) = \sum_{y \in D(X)} \pi(y)\,A(y,x)$

An irreducible chain has a stationary distribution if and only if all of its states are positive recurrent. The distribution is then unique.
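The fixed-point property can be seen concretely by powering a transition matrix. A small sketch with a hypothetical 2-state matrix (not from the slides): repeated multiplication drives A^n toward a rank-one matrix whose rows equal the stationary distribution.

```python
# Hypothetical 2-state transition matrix; each row sums to 1.
A = [[0.9, 0.1],
     [0.4, 0.6]]

def matmul(M, N):
    """Multiply two 2x2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

An = A
for _ in range(60):     # compute a high power of A; the second eigenvalue (0.5) decays geometrically
    An = matmul(An, A)

# Solving pi = pi * A by hand for this A gives pi = (0.8, 0.2);
# both rows of A^n converge to pi.
```

This is exactly the rank-one convergence described on the next slide for irreducible, aperiodic chains.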
Stationary Distribution In Finite State Space

In a finite state space, a stationary distribution always exists but may not be unique.
If a finite-state Markov chain is irreducible and aperiodic, the stationary distribution $\pi$ is guaranteed to be unique, and $A^{(n)} = P(x_n = y \mid x_0)$ converges to a rank-one matrix in which each row is the stationary distribution $\pi$.
Thus, the initial state x0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution.
However, it is important how long it takes to forget it!
Convergence Theorem

Given a finite-state Markov chain whose transition function is irreducible and aperiodic, $A^n(x_0, y)$ converges to its invariant distribution $\pi(y)$ geometrically in variation distance: there exist $0 < r < 1$ and $c > 0$ such that

$\lVert A^n(x_0, \cdot) - \pi \rVert_{var} \le c\,r^n$
Eigen-Value Condition

Convergence to the stationary distribution is driven by the eigen-values of the matrix A(x,y).
"The chain will converge to its unique invariant distribution if and only if matrix A's second largest eigen-value in modulus is strictly less than 1."
Many proofs of convergence center on analyzing the second eigen-value.

Liu, Ch. 12, p. 249
Convergence In Finite State Space

Assume a finite-state Markov chain is irreducible and aperiodic.
The initial state x0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution.
However, it is important how long it takes to forget it! This is known as the burn-in time.
Since the first k states are not drawn exactly from $\pi$, they are often thrown away. Open question: how big should k be?
Sampling in BN

Same idea: generate a set of T samples, then estimate P(Xi|E) from the samples.
Challenge: X is a vector and P(X) is a huge distribution represented by the BN.
Need to know:
How to generate a new sample?
How many samples T do we need?
How to estimate P(E=e) and P(Xi|e)?
Sampling Algorithms

Forward Sampling
Gibbs Sampling (MCMC): Blocking, Rao-Blackwellised
Likelihood Weighting
Importance Sampling
Sequential Monte-Carlo (Particle Filtering) in Dynamic Bayesian Networks
Gibbs Sampling

Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994).
The transition probability equals the conditional distribution.
Example: for (X,Y), $A(x_{t+1} \mid y_t) = P(x \mid y)$ and $A(y_{t+1} \mid x_t) = P(y \mid x)$, producing the chain x0, y0, x1, y1, ...
Gibbs Sampling for BN

Samples are dependent and form a Markov chain.
Sample from P'(X|e), which converges to P(X|e).
Guaranteed to converge when all P > 0.
Methods to improve convergence: Blocking, Rao-Blackwellised.
Error bounds: lag-t autocovariance; multiple chains with Chebyshev's inequality.
Gibbs Sampling (Pearl, 1988)

A sample t in [1, 2, ...] is an instantiation of all variables in the network:

$x^t = \{X_1 = x_1^t, X_2 = x_2^t, \ldots, X_N = x_N^t\}$

Sampling process:
Fix values of observed variables e.
Instantiate node values in sample x0 at random.
Generate samples x1, x2, ..., xT from P(x|e).
Compute posteriors from the samples.
Ordered Gibbs Sampler

Generate sample x_{t+1} from x_t, processing all variables in some order:

$X_1 = x_1^{t+1} \leftarrow \text{sampled from } P(x_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$
$X_2 = x_2^{t+1} \leftarrow \text{sampled from } P(x_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$
...
$X_N = x_N^{t+1} \leftarrow \text{sampled from } P(x_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$

In short, for i = 1 to N: $X_i = x_i^{t+1} \leftarrow \text{sampled from } P(x_i \mid x^t \setminus x_i, e)$
Gibbs Sampling (cont'd) (Pearl, 1988)

Important: $P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov_i^t)$, i.e., the conditional of $X_i$ given all other variables reduces to its Markov blanket:

$P(x_i \mid x \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$

Markov blanket:

$M_i = pa_i \cup ch_i \cup \bigcup_{X_j \in ch_i} pa_j$

Given its Markov blanket (parents, children, and children's parents), $X_i$ is independent of all other nodes.
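As a concrete sketch of this product form, consider a hypothetical collider A -> C <- B with invented CPT numbers (not from the slides). The conditional of B given the rest reduces to P(B) times the child's CPT; P(A) appears in the joint but cancels in the normalization:

```python
# Hypothetical CPTs for a collider A -> C <- B, all variables binary.
def cpt_b(b):
    return 0.6 if b == 1 else 0.4                      # P(B)

def cpt_c(c, a, b):
    p1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}[(a, b)]
    return p1 if c == 1 else 1 - p1                    # P(C=c | A=a, B=b)

def blanket_conditional_b(a, c):
    """P(B | A=a, C=c), proportional to P(B) * P(C=c | A=a, B)."""
    w = {b: cpt_b(b) * cpt_c(c, a, b) for b in (0, 1)}
    z = sum(w.values())
    return {b: v / z for b, v in w.items()}

d = blanket_conditional_b(a=1, c=1)
```

Here B's Markov blanket is {A, C}: its child C and C's other parent A.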
Ordered Gibbs Sampling Algorithm

Input: X, E. Output: T samples {x^t}.
Fix evidence E, then generate samples from P(X | E):
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     $X_i \leftarrow$ sample $x_i^t$ from $P(X_i \mid markov^t \setminus X_i)$

Answering Queries

Query: P(xi | e) = ?
Method 1: count the number of samples where $X_i = x_i$ (histogram estimator):

$\hat{P}(X_i = x_i) = \frac{\#\,\text{samples}(X_i = x_i)}{T}$

Method 2: average conditional probability (mixture estimator):

$\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov^t \setminus X_i)$
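Putting the ordered sampler and the mixture estimator together, here is a minimal sketch on a hypothetical binary chain X1 -> X2 -> X3 with evidence X3 = 1 (all CPT numbers invented for illustration); the estimate is checked against exact enumeration:

```python
import random

random.seed(1)

# Hypothetical CPTs for a binary chain X1 -> X2 -> X3.
def P1(x1):
    return 0.6 if x1 == 1 else 0.4                       # P(X1)

def P2(x2, x1):
    p = 0.7 if x1 == 1 else 0.2
    return p if x2 == 1 else 1 - p                       # P(X2 | X1)

def P3(x3, x2):
    p = 0.8 if x2 == 1 else 0.3
    return p if x3 == 1 else 1 - p                       # P(X3 | X2)

def normalize(w):
    z = sum(w.values())
    return {v: p / z for v, p in w.items()}

def draw(dist):
    r, acc = random.random(), 0.0
    for v, p in dist.items():
        acc += p
        if r < acc:
            return v
    return v                                             # guard against rounding

x1, x2, x3 = 0, 0, 1                                     # evidence: X3 = 1
T, burn, mix_sum = 20_000, 1_000, 0.0
for t in range(T + burn):
    # P(x1 | Markov blanket) proportional to P(x1) * P(x2 | x1)
    d1 = normalize({v: P1(v) * P2(x2, v) for v in (0, 1)})
    x1 = draw(d1)
    # P(x2 | Markov blanket) proportional to P(x2 | x1) * P(x3 | x2)
    d2 = normalize({v: P2(v, x1) * P3(x3, v) for v in (0, 1)})
    x2 = draw(d2)
    if t >= burn:
        mix_sum += d1[1]          # mixture estimator for P(X1 = 1 | x3)
est = mix_sum / T

# Exact P(X1 = 1 | X3 = 1) by enumeration, for comparison.
num = sum(P1(1) * P2(b, 1) * P3(1, b) for b in (0, 1))
den = sum(P1(a) * P2(b, a) * P3(1, b) for a in (0, 1) for b in (0, 1))
exact = num / den
```

With these numbers the exact posterior is 0.39/0.55, roughly 0.709, and the Gibbs mixture estimate should agree to about two decimal places.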
Gibbs Sampling Example - BN

X = {X1, X2, ..., X9}, E = {X9}
(Figure: Bayesian network over X1, ..., X9 with evidence node X9.)
Gibbs Sampling Example - BN

Initialize: X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0, X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
(Figure: the same network with all unobserved nodes instantiated.)
Gibbs Sampling Example - BN

$X_1 \leftarrow P(X_1 \mid x_2^0, \ldots, x_8^0, x_9)$, with E = {X9}
(Figure: the network, highlighting the resampled node X1.)
Gibbs Sampling Example - BN

$X_2 \leftarrow P(X_2 \mid x_1^1, x_3^0, \ldots, x_8^0, x_9)$, with E = {X9}
(Figure: the network, highlighting the resampled node X2.)
Gibbs Sampling: Illustration
Gibbs Sampling Example - Init

Initialize nodes with random values:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0, X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
Initialize running sums:
SUM1 = 0, SUM2 = 0, SUM3 = 0, SUM4 = 0, SUM5 = 0, SUM6 = 0, SUM7 = 0, SUM8 = 0
Gibbs Sampling Example - Step 1

Generate Sample 1:
compute SUM1 += P(x1 | x2^0, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign new value X1 = x1^1
compute SUM2 += P(x2 | x1^1, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign new value X2 = x2^1
compute SUM3 += P(x3 | x1^1, x2^1, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign new value X3 = x3^1
...
At the end, we have a new sample: S1 = {x1^1, x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9}
Gibbs Sampling Example - Step 2

Generate Sample 2:
compute SUM1 += P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign new value X1 = x1^2
compute SUM2 += P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign new value X2 = x2^2
compute SUM3 += P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign new value X3 = x3^2
...
New sample: S2 = {x1^2, x2^2, x3^2, x4^2, x5^2, x6^2, x7^2, x8^2, x9}
Gibbs Sampling Example - Answering Queries

With T = 2 samples:
P(x1|x9) = SUM1 / 2
P(x2|x9) = SUM2 / 2
P(x3|x9) = SUM3 / 2
P(x4|x9) = SUM4 / 2
P(x5|x9) = SUM5 / 2
P(x6|x9) = SUM6 / 2
P(x7|x9) = SUM7 / 2
P(x8|x9) = SUM8 / 2
Gibbs Convergence

The stationary distribution equals the target sampling distribution.
MCMC converges to the stationary distribution if the network is ergodic.
The chain is ergodic if all probabilities are positive: for any states S_i, S_j, p_ij > 0.
If there exist i, j such that p_ij = 0, then we may not be able to explore the full sampling space!
Gibbs Sampling: Burn-In

We want to sample from P(X | E), but the starting point is random.
Solution: throw away the first K samples, known as "burn-in".
What is K? Hard to tell; use intuition.
Alternative: initialize the first sample with values drawn from an approximation of P(x|e) (for example, run IBP first).
Gibbs Sampling: Performance
+Advantage: guaranteed to converge to P(X|E)-Disadvantage: convergence may be slow
Problems:
Samples are dependent ! Statistical variance is too big in high-
dimensional problems
Gibbs: Speeding Convergence

Objectives:
1. Reduce dependence between samples (autocorrelation): skip samples; randomize variable sampling order.
2. Reduce variance: Blocking Gibbs Sampling; Rao-Blackwellisation.
Skipping Samples

Pick only every k-th sample (Geyer, 1992).
Can reduce dependence between samples!
But it increases variance and wastes samples!
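Thinning is trivially expressed as a slice; a minimal sketch (function name hypothetical):

```python
def thin(samples, k):
    """Keep every k-th sample (indices 0, k, 2k, ...) to reduce autocorrelation.
    The discarded draws are wasted, so the resulting estimate can have higher variance."""
    return samples[::k]

kept = thin(list(range(10)), 3)
```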
Randomized Variable Order

Random Scan Gibbs Sampler: pick each next variable $X_i$ for update at random with probability $p_i$, where $\sum_i p_i = 1$.
(In the simplest case, the $p_i$ are distributed uniformly.)
In some instances, this reduces variance (MacEachern, Peruggia, 1999, "Subsampling the Gibbs Sampler: Variance Reduction").
Blocking

Sample several variables together, as a block.
Example: Given three variables X, Y, Z, with domains of size 2, group Y and Z together to form a variable W = {Y,Z} with domain size 4. Then, given sample (x_t, y_t, z_t), compute the next sample:

$x_{t+1} \leftarrow P(x \mid y_t, z_t) = P(x \mid w_t)$
$w_{t+1} = (y_{t+1}, z_{t+1}) \leftarrow P(y, z \mid x_{t+1}) = P(w \mid x_{t+1})$

+ Can improve convergence greatly when two variables are strongly correlated!
- Domain of the block variable grows exponentially with the number of variables in a block!
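The block variable's domain is the Cartesian product of its members' domains, which is where the exponential growth comes from. A small sketch (variable names hypothetical):

```python
from itertools import product

# Domains of the variables grouped into a block W = (Y, Z).
dom_y, dom_z = (0, 1), (0, 1)
dom_w = list(product(dom_y, dom_z))       # |D(W)| = 2 * 2 = 4

# Blocking k binary variables gives a block domain of size 2**k.
dom_big = list(product(*[(0, 1)] * 10))   # 10 binary variables -> 1024 joint values
```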
Blocking Gibbs Sampling

Jensen, Kong, Kjaerulff, 1993, "Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems".
Select a set of subsets E1, E2, E3, ..., Ek such that:
$E_i \subseteq X$
$\bigcup_i E_i = X$
$A_i = X \setminus E_i$
Sample $P(E_i \mid A_i)$
Rao-Blackwellisation

Do not sample all variables: sample a subset!
Example: Given three variables X, Y, Z, sample only X and Y, summing out Z. Given sample (x_t, y_t), compute the next sample:

$x_{t+1} \leftarrow P(x \mid y_t)$
$y_{t+1} \leftarrow P(y \mid x_{t+1})$
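Summing out Z means the sampler only ever needs the collapsed conditionals. A minimal sketch with a hypothetical normalized joint table P(x, y, z) (numbers invented for illustration):

```python
# Hypothetical normalized joint P(x, y, z) over three binary variables.
P = {(0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
     (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.15, (1, 1, 1): 0.20}

def collapsed_conditional_x(y):
    """P(x | y) with z summed out: marginalize z, then normalize over x."""
    w = {x: sum(P[(x, y, z)] for z in (0, 1)) for x in (0, 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

d = collapsed_conditional_x(y=0)
```

The Rao-Blackwellised sampler alternates draws from such collapsed conditionals for X and Y only.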
Rao-Blackwell Theorem

Bottom line: reducing the number of variables in a sample reduces variance!
Blocking vs. Rao-Blackwellisation

(Figure: three-variable network over X, Y, Z.)
Standard Gibbs: P(x|y,z), P(y|x,z), P(z|x,y)  (1)
Blocking: P(x|y,z), P(y,z|x)  (2)
Rao-Blackwellised: P(x|y), P(y|x)  (3)
Var3 < Var2 < Var1 [Liu, Wong, Kong, 1994, "Covariance structure of the Gibbs sampler..."]
Rao-Blackwellised Gibbs: Cutset Sampling

Select $C \subseteq X$ (possibly a cycle-cutset), |C| = m.
Fix evidence E.
Initialize nodes with random values: for i = 1 to m, set $C_i = c_i^0$.
For t = 1 to n, generate samples: for i = 1 to m,

$C_i = c_i^{t+1} \leftarrow P(c_i \mid c_1^{t+1}, \ldots, c_{i-1}^{t+1}, c_{i+1}^t, \ldots, c_m^t, e)$
Cutset Sampling

Select a subset $C = \{C_1, \ldots, C_K\} \subseteq X$. A sample t in [1, 2, ...] is an instantiation of C:

$c^t = \{C_1 = c_1^t, C_2 = c_2^t, \ldots, C_K = c_K^t\}$

Sampling process:
Fix values of observed variables e.
Generate sample c0 at random.
Generate samples c1, c2, ..., cT from P(c|e).
Compute posteriors from the samples.
Cutset Sampling: Generating Samples

Generate sample c_{t+1} from c_t:

$C_1 = c_1^{t+1} \leftarrow \text{sampled from } P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e)$
$C_2 = c_2^{t+1} \leftarrow \text{sampled from } P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e)$
...
$C_K = c_K^{t+1} \leftarrow \text{sampled from } P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)$

In short, for i = 1 to K: $C_i = c_i^{t+1} \leftarrow \text{sampled from } P(c_i \mid c^t \setminus c_i, e)$
Rao-Blackwellised Gibbs: Cutset Sampling

How to compute $P(c_i \mid c^t \setminus c_i, e)$?
Compute the joint $P(c_i, c^t \setminus c_i, e)$ for each $c_i \in D(C_i)$, then normalize:

$P(c_i \mid c^t \setminus c_i, e) = \alpha\, P(c_i, c^t \setminus c_i, e)$

Computational efficiency depends on the choice of C.
Rao-Blackwellised Gibbs: Cutset Sampling

How to choose C?
Special case: C is a cycle-cutset, O(N).
General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed.
Pick C wisely so as to minimize w: the notion of a w-cutset.
w-cutset Sampling

C = w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w.
Complexity of exact inference is then bounded by w!
A cycle-cutset is a special case.
Cutset Sampling: Answering Queries

Query $P(c_i \mid e)$ for $c_i \in C$: same as Gibbs, computed while generating sample t:

$\hat{P}(c_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$

Query $P(x_i \mid e)$ for $x_i \notin C$: computed after generating sample t:

$\hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)$
Cutset Sampling Example

C = {X2, X5}, E = {X9}; initial sample $c^0 = \{x_2^0, x_5^0\}$.
(Figure: Bayesian network over X1, ..., X9 with cutset nodes X2, X5 and evidence X9.)
Cutset Sampling Example

Sample a new value for X2, given $c^0 = \{x_2^0, x_5^0\}$:

$x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9) = \frac{BTE(x_2, x_5^0, x_9)}{\sum_{x_2'} BTE(x_2', x_5^0, x_9)}$

(Figure: the same network.)
Cutset Sampling Example

Sample a new value for X5:

$x_5^1 \leftarrow P(x_5 \mid x_2^1, x_9) = \frac{BTE(x_2^1, x_5, x_9)}{\sum_{x_5'} BTE(x_2^1, x_5', x_9)}$

New sample: $c^1 = \{x_2^1, x_5^1\}$.
(Figure: the same network.)
Cutset Sampling Example

Query P(x2 | e) for sampled node X2, averaging over the samples:
Sample 1: $x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9)$
Sample 2: $x_2^2 \leftarrow P(x_2 \mid x_5^1, x_9)$
Sample 3: $x_2^3 \leftarrow P(x_2 \mid x_5^2, x_9)$

$\hat{P}(x_2 \mid x_9) = \frac{1}{3}\left[P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9)\right]$

(Figure: the same network.)
Cutset Sampling Example

Query P(x3 | e) for non-sampled node X3:
Sample 1: $c^1 = \{x_2^1, x_5^1\}$, giving $P(x_3 \mid x_2^1, x_5^1, x_9)$
Sample 2: $c^2 = \{x_2^2, x_5^2\}$, giving $P(x_3 \mid x_2^2, x_5^2, x_9)$
Sample 3: $c^3 = \{x_2^3, x_5^3\}$, giving $P(x_3 \mid x_2^3, x_5^3, x_9)$

$\hat{P}(x_3 \mid x_9) = \frac{1}{3}\left[P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9)\right]$

(Figure: the same network.)
Gibbs: Error Bounds

Objectives: estimate the needed number of samples T; estimate the error.
Methodology:
1 chain: use lag-k autocovariance to estimate T.
M chains: use standard sampling variance to estimate the error.
Gibbs: Lag-k Autocovariance

Let $P_t = P(x_i \mid x^t \setminus x_i)$ and $\hat{P} = \hat{P}(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P_t$.

Lag-k autocovariance:

$\hat{\gamma}(k) = \frac{1}{T} \sum_{t=1}^{T-k} (P_t - \hat{P})(P_{t+k} - \hat{P})$

Estimated sampling variance:

$\widehat{Var}(\hat{P}) = \frac{1}{T}\left[\hat{\gamma}(0) + 2\sum_{i=1}^{2\delta+1}\hat{\gamma}(i)\right]$
Gibbs: Lag-k Autocovariance

$\widehat{Var}(\hat{P}) = \frac{1}{T}\left[\hat{\gamma}(0) + 2\sum_{i=1}^{2\delta+1}\hat{\gamma}(i)\right]$

Here, $\delta$ is the smallest positive integer satisfying $\hat{\gamma}(2\delta) + \hat{\gamma}(2\delta+1) > 0$.

Effective chain size:

$\hat{T} = \frac{\hat{\gamma}(0)}{\widehat{Var}(\hat{P})}$

In the absence of autocovariance, $\hat{T} = T$.
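These quantities are straightforward to compute from the chain of P_t values. A sketch on a hypothetical AR(1) chain standing in for the P_t (its true lag-1 autocorrelation is 0.5 and its integrated autocorrelation time is 3); for brevity it truncates the autocovariance sum at a fixed lag rather than using the delta rule above:

```python
import random

def autocov(P, k):
    """Lag-k autocovariance of a sequence of scalar estimates P_1..P_T."""
    T = len(P)
    mean = sum(P) / T
    return sum((P[t] - mean) * (P[t + k] - mean) for t in range(T - k)) / T

# Hypothetical AR(1) chain x_t = 0.5 * x_{t-1} + noise, as a stand-in for P_t.
random.seed(0)
chain, x = [], 0.0
for _ in range(5_000):
    x = 0.5 * x + random.gauss(0, 1)
    chain.append(x)

g0 = autocov(chain, 0)
# Variance of the chain mean, truncating the autocovariance sum at lag 50.
var_mean = (g0 + 2 * sum(autocov(chain, k) for k in range(1, 50))) / len(chain)
T_eff = g0 / var_mean     # effective chain size; roughly T / 3 for this chain
```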
Gibbs: Multiple Chains

Generate M chains, each of size K.
Each chain m produces an independent estimate:

$P_m = \hat{P}(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$

Treat the $P_m$ as independent random variables and estimate $P(x_i \mid e)$ as their average:

$\hat{P} = \frac{1}{M} \sum_{m=1}^{M} P_m$
Gibbs: Multiple Chains

The $\{P_m\}$ are independent random variables. Therefore:

$\widehat{Var}(\hat{P}) = \frac{1}{M-1} \sum_{m=1}^{M} (P_m - \hat{P})^2, \qquad S^2 = \widehat{Var}(\hat{P})$

and the sampling error can be bounded:

$\hat{P} \pm t_{\alpha/2,\,M-1}\,\frac{S}{\sqrt{M}}$
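A sketch of the multiple-chains estimate; the per-chain estimates here are simulated stand-ins (draws around a known value 0.7), not real Gibbs output:

```python
import random

random.seed(2)

# Stand-ins for M per-chain estimates P_m of the same posterior (true value 0.7).
M = 20
Pm = [0.7 + random.gauss(0, 0.02) for _ in range(M)]

P_hat = sum(Pm) / M                                    # combined estimate
var_hat = sum((p - P_hat) ** 2 for p in Pm) / (M - 1)  # sample variance of the P_m
stderr = (var_hat / M) ** 0.5                          # standard error of P_hat
```

A (1 - alpha) confidence interval is then P_hat plus or minus t_{alpha/2, M-1} times stderr.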
Geman & Geman, 1984

Geman, S. & Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721-741.
Introduced Gibbs sampling; placed the idea of Gibbs sampling in a general setting in which the collection of variables is structured in a graphical model and each variable has a neighborhood corresponding to a local region of the graphical structure. Geman and Geman use the Gibbs distribution to define the joint distribution on this structured set of variables.
Tanner & Wong, 1987

Tanner and Wong (1987): data augmentation; convergence results.
Pearl, 1988

Pearl, 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
In the case of Bayesian networks, the neighborhoods correspond to the Markov blanket of a variable, and the joint distribution is defined by the factorization of the network.
Gelfand & Smith, 1990

Gelfand, A.E. and Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
Showed variance reduction from using the mixture estimator for posterior marginals.
Neal, 1992

R. M. Neal, 1992. Connectionist learning of belief networks. Artificial Intelligence, v. 56, pp. 71-118.
Stochastic simulation in Noisy-OR networks.
CPCS54 Test Results

MSE vs. #samples (left) and time (right).
Ergodic: |X| = 54, D(Xi) = 2, |C| = 15, |E| = 4.
Exact time = 30 sec using Cutset Conditioning.
(Plots: MSE of Cutset sampling vs. Gibbs on CPCS54, n=54, |C|=15, |E|=3, over 0-5000 samples and 0-25 sec.)
CPCS179 Test Results

MSE vs. #samples (left) and time (right).
Non-Ergodic (1 deterministic CPT entry): |X| = 179, |C| = 8, 2 <= D(Xi) <= 4, |E| = 35.
Exact time = 122 sec using Loop-Cutset Conditioning.
(Plots: MSE of Cutset sampling vs. Gibbs on CPCS179, n=179, |C|=8, |E|=35, over 100-4000 samples and 0-80 sec.)
CPCS360b Test Results

MSE vs. #samples (left) and time (right).
Ergodic: |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36.
Exact time > 60 min using Cutset Conditioning; exact values obtained via Bucket Elimination.
(Plots: MSE of Cutset sampling vs. Gibbs on CPCS360b, n=360, |C|=21, |E|=36, over 0-1000 samples and 1-60 sec.)
Random Networks

MSE vs. #samples (left) and time (right).
|X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20.
Exact time = 30 sec using Cutset Conditioning.
(Plots: MSE of Cutset sampling vs. Gibbs on random networks, n=100, |C|=13, |E|=15-20, over 0-1200 samples and 0-11 sec.)
Coding Networks

MSE vs. time.
Non-Ergodic: |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50.
Sample the ergodic subspace U = {U1, U2, ..., Uk}.
Exact time = 50 sec using Cutset Conditioning.
(Figure: coding network with input bits u1-u4, parity bits p1-p4, and outputs x1-x4, y1-y4; plot compares IBP, Gibbs, and Cutset on coding networks, n=100, |C|=12-14, over 0-60 sec.)
Non-Ergodic Hailfinder

MSE vs. #samples (left) and time (right).
Non-Ergodic: |X| = 56, |C| = 5, 2 <= D(Xi) <= 11, |E| = 0.
Exact time = 2 sec using Loop-Cutset Conditioning.
(Plots: Cutset sampling vs. Gibbs on HailFinder, n=56, |C|=5, |E|=1, log-scale MSE over 1-10 sec and 0-1500 samples.)
Non-Ergodic CPCS360b - MSE

MSE vs. time.
Non-Ergodic: |X| = 360, |C| = 26, D(Xi) = 2.
Exact time = 50 min using BTE.
(Plot: cpcs360b, N=360, |E|=[20-34], w*=20; MSE over 0-1600 sec for Gibbs, IBP, and cutset sampling with |C|=26, fw=3 and |C|=48, fw=2.)
Non-Ergodic CPCS360b - MaxErr

(Plot: cpcs360b, N=360, |E|=[20-34]; maximum error over 0-1600 sec for Gibbs, IBP, and cutset sampling with |C|=26, fw=3 and |C|=48, fw=2.)
Importance vs. Gibbs

Gibbs: samples are drawn from $\tilde{P}(x \mid e)$, which converges to $P(x \mid e)$:

$x^t \sim \tilde{P}(x \mid e), \qquad \hat{f} = \frac{1}{T} \sum_{t=1}^{T} f(x^t)$

Importance: samples are drawn from a proposal $Q(x \mid e)$ and reweighted:

$x^t \sim Q(x \mid e), \qquad \hat{f} = \frac{1}{T} \sum_{t=1}^{T} f(x^t)\,\frac{P(x^t \mid e)}{Q(x^t \mid e)}$

where $w^t = P(x^t \mid e)/Q(x^t \mid e)$ is the importance weight.
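The importance-weighted estimator can be checked on a toy discrete example (the target, proposal, and f(x) = x are invented for illustration): estimate E_p[X] for p uniform on {0,...,9} using a skewed proposal q.

```python
import random

random.seed(3)

p = [0.1] * 10                        # target: uniform on {0,...,9}; E_p[X] = 4.5
q = [0.05] * 5 + [0.15] * 5           # proposal: biased toward large values

def draw(dist):
    """Draw an index according to the probability list dist."""
    r, acc = random.random(), 0.0
    for v, pr in enumerate(dist):
        acc += pr
        if r < acc:
            return v
    return len(dist) - 1               # guard against rounding

T = 100_000
total = 0.0
for _ in range(T):
    x = draw(q)
    total += x * p[x] / q[x]           # f(x) times the importance weight w = p(x)/q(x)
est = total / T
```

Despite sampling from q rather than p, the reweighted average converges to 4.5.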