Probabilistic Reasoning Over Time 2
CISC 453
Amy VanBerlo
Adapted from Ch 15 AIMA3e
Overview
15.3 Hidden Markov Models – HMM
◦ + Power of Linear Algebra
15.4 Kalman Filters
◦ + Gaussian Distributions revisited
15.5 Dynamic Bayesian Networks
◦ The MEGAVARIABLE & Particle Filtering
Hidden Markov Models – HMM(15.3)
Recall:
initial state model: $P(X_0)$
transition model: $P(X_i \mid X_{i-1})$
sensor model: $P(E_i \mid X_i)$
Discrete State Variables!
Hidden Markov Models – HMM(15.3)
transition model: $P(X_i \mid X_{i-1})$ => Transition Matrix
sensor model: $P(E_i \mid X_i)$ => Diagonal Matrices
Hidden Markov Models – HMM(15.3)
Transition Matrix
state variable: $X_t$ takes an integer value from 1 to S, where S = number of possible states
transition model: $P(X_t \mid X_{t-1})$ becomes an S×S matrix T:
$T_{ij} = P(X_t = j \mid X_{t-1} = i)$
$T_{ij}$ is the probability of a transition from state i to state j.
Hidden Markov Models – HMM(15.3)
Transition Matrix
EXAMPLE:
From Umbrella World:
$T_{ij} = P(X_t = j \mid X_{t-1} = i)$
$T = P(X_t \mid X_{t-1}) = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}$
Hidden Markov Models – HMM(15.3)
Diagonal Matrices (Evidence Variable / Sensor Model)
$E_t$ is known at time t => call its value $e_t$
Need $P(e_t \mid X_t = i)$ for each state i => diagonal entries of an S×S matrix $O_t$
Hidden Markov Models – HMM(15.3)
Diagonal Matrices
EXAMPLE:
From Umbrella World:
$U_1 = true$; $U_3 = false$;
$O_1 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}$, $O_3 = \begin{pmatrix} 0.1 & 0 \\ 0 & 0.8 \end{pmatrix}$
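A minimal sketch of how these umbrella-world matrices might be built in plain Python. The helper name `observation_matrix` and the state ordering (index 0 = rain, index 1 = no rain) are assumptions made for illustration; the probabilities are the ones on the slides.

```python
# State order (an assumption for this sketch): index 0 = rain, index 1 = no rain.
T = [[0.7, 0.3],
     [0.3, 0.7]]   # T[i][j] = P(X_t = j | X_t-1 = i)

# Sensor model: P(umbrella | rain) = 0.9, P(umbrella | no rain) = 0.2.
P_UMBRELLA = [0.9, 0.2]


def observation_matrix(umbrella_seen):
    """Return the diagonal matrix O_t for the evidence value seen at time t."""
    likelihood = [p if umbrella_seen else 1.0 - p for p in P_UMBRELLA]
    size = len(likelihood)
    return [[likelihood[i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]


O1 = observation_matrix(True)    # U1 = true
O3 = observation_matrix(False)   # U3 = false
```

Only the diagonal of each O matrix depends on the evidence, so a new observation changes S numbers, not S×S.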
Hidden Markov Models – HMM(15.3)
Forward Eq => column vector f
$f_{1:t+1} = \alpha\, O_{t+1} T^{\top} f_{1:t}$
Backward Eq => column vector b
$b_{k+1:t} = T\, O_{k+1}\, b_{k+2:t}$
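The forward equation can be sketched as two matrix-vector products followed by a normalization. The helper names (`mat_vec`, `transpose`, `normalize`, `forward_step`) are hypothetical, and the uniform prior is an assumption taken from the textbook umbrella example.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def normalize(v):
    """Apply the constant alpha so that the entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def forward_step(f, T, O_next):
    """One filtering update: f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}."""
    predicted = mat_vec(transpose(T), f)
    return normalize(mat_vec(O_next, predicted))


T = [[0.7, 0.3], [0.3, 0.7]]
O_umbrella = [[0.9, 0.0], [0.0, 0.2]]   # umbrella observed

f = [0.5, 0.5]                # assumed uniform prior P(X0)
for _ in range(2):            # umbrella seen on day 1 and on day 2
    f = forward_step(f, T, O_umbrella)
print([round(x, 3) for x in f])   # [0.883, 0.117]
```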
Hidden Markov Models – HMM(15.3)
Advantages
All computations become simple matrix-vector operations!
Complexity
Forward-backward algorithm: for a sequence of length t, time is $O(S^2 t)$
Improved Smoothing Algorithm:…
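As a rough illustration of where the cost comes from, here is a sketch of the plain forward-backward pass (not the improved smoothing algorithm) in matrix form. Function and variable names are hypothetical; the umbrella numbers and uniform prior are assumptions carried over from the earlier example.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def forward_backward(prior, T, observations):
    """Return smoothed estimates P(X_k | e_{1:t}) for k = 1..t.

    observations holds the diagonal matrices O_1..O_t.
    """
    T_transposed = transpose(T)
    forward = [prior]                       # forward[k] = f_{1:k}
    for O in observations:
        forward.append(normalize(mat_vec(O, mat_vec(T_transposed, forward[-1]))))

    b = [1.0] * len(prior)                  # b_{t+1:t} is a vector of ones
    smoothed = [None] * len(observations)
    for k in range(len(observations), 0, -1):
        smoothed[k - 1] = normalize([fi * bi for fi, bi in zip(forward[k], b)])
        b = mat_vec(T, mat_vec(observations[k - 1], b))   # b_{k:t}
    return smoothed


T = [[0.7, 0.3], [0.3, 0.7]]
O_umbrella = [[0.9, 0.0], [0.0, 0.2]]
smoothed = forward_backward([0.5, 0.5], T, [O_umbrella, O_umbrella])
print([round(x, 3) for x in smoothed[0]])   # day 1 given both days: [0.883, 0.117]
```

Each of the t forward and t backward steps is an S×S matrix-vector product, which gives the $O(S^2 t)$ time bound; this version also stores all t forward messages.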
Hidden Markov Models – HMM(15.3)
Example: Localization
Vacuum World, simplified:
Original: robot with obstacle sensors (N, S, E, W)
Action: Move (N, S, E or W)
Belief state: set of all possible locations the robot could be in
Additions:
Allow for sensor noise
Probabilistic model of the robot's motion
Domain: set of empty squares $\{s_1, \dots, s_n\}$
Neighbours(s): set of empty squares adjacent to s; N(s): size of that set
Hidden Markov Models – HMM(15.3)
Example: Localization
Vacuum World, simplified:
Transition Model for Move: $P(X_t = j \mid X_{t-1} = i) = T_{ij} = 1/N(i)$ if $j \in \text{Neighbours}(i)$, else 0
Unknown start state, so assume a uniform distribution over all squares: $P(X_0 = i) = 1/n$
$E_t$: 16 possible values (4 sensor bits)
$\epsilon$: each sensor's error rate
Hidden Markov Models – HMM(15.3)
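Example: Localization
Vacuum World, simplified:
$E_t$: 16 possible values (4 sensor bits)
$\epsilon$: each sensor's error rate; probability of getting all four bits right is $(1-\epsilon)^4$, all four wrong is $\epsilon^4$
Discrepancy $d_{it}$: number of bits that differ between the true values for square i and the reading $e_t$
Probability that a robot in square i would receive a sensor reading $e_t$:
$P(E_t = e_t \mid X_t = i) = (O_t)_{ii} = (1-\epsilon)^{4-d_{it}}\,\epsilon^{d_{it}}$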
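A small sketch of the localization model on a made-up map. The 2×3 grid, the helper names, and the choice to treat the map boundary as an obstacle are all assumptions for illustration; the formulas are the transition and sensor models stated on the slides.

```python
# Hypothetical map: '.' is an empty square, '#' is an obstacle.
GRID = ["..#",
        "..."]
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

squares = [(r, c) for r, row in enumerate(GRID)
           for c, cell in enumerate(row) if cell == "."]


def is_empty(r, c):
    """True if (r, c) is on the map and not an obstacle."""
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."


def neighbours(square):
    """Empty squares adjacent to the given square."""
    r, c = square
    return [(r + dr, c + dc) for dr, dc in DIRECTIONS.values()
            if is_empty(r + dr, c + dc)]


def transition_matrix():
    """T[i][j] = 1/N(i) if j is a neighbour of i, else 0."""
    n = len(squares)
    T = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(squares):
        adjacent = neighbours(s)
        for nb in adjacent:
            T[i][squares.index(nb)] = 1.0 / len(adjacent)
    return T


def true_reading(square):
    """Noise-free 4-bit reading (N, S, E, W): 1 where an obstacle or edge is adjacent."""
    r, c = square
    return tuple(0 if is_empty(r + dr, c + dc) else 1
                 for dr, dc in DIRECTIONS.values())


def sensor_likelihood(square, reading, eps):
    """(1 - eps)^(4 - d) * eps^d, where d is the number of mismatched bits."""
    d = sum(a != b for a, b in zip(true_reading(square), reading))
    return (1 - eps) ** (4 - d) * eps ** d


T = transition_matrix()
diagonal_of_O = [sensor_likelihood(s, (1, 0, 0, 1), eps=0.2) for s in squares]
```

`diagonal_of_O` holds the diagonal of $O_t$ for one reading; plugging it and `T` into the forward equation gives the robot's location estimate.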
Kalman Filters (15.4)
Where HMMs deal with DISCRETE variables, Kalman Filter variables are CONTINUOUS
Ex: tracking a bird flying through the forest
◦ $X_t$ = (X, Y, Z, X velocity, Y velocity, Z velocity)
Kalman Filters (15.4)
Updating Gaussian Distributions
Current distribution: $P(X_t \mid e_{1:t})$
Prediction: $P(X_{t+1} \mid e_{1:t}) = \int_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})\, dx_t$
Sensor model: $P(e_{t+1} \mid X_{t+1})$
Updated distribution:
$P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})$
Kalman Filters (15.4)
Updating Gaussian Distributions
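If $P(X_t \mid e_{1:t})$ is Gaussian, then the prediction $P(X_{t+1} \mid e_{1:t})$ is Gaussian.
If $P(X_{t+1} \mid e_{1:t})$ is Gaussian, then the updated distribution $P(X_{t+1} \mid e_{1:t+1})$ is Gaussian.
therefore:
$P(X_t \mid e_{1:t})$ is a multivariate Gaussian $N(\mu_t, \Sigma_t)$ for all t
This requires a LINEAR Gaussian model => $X_{t+1}$ is a linear function of $X_t$ plus Gaussian noise (and likewise for the sensor model)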
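Because the distribution stays Gaussian, the filter only has to carry a mean and a variance from step to step. The sketch below shows this for a one-dimensional random walk, which is an assumed special case not worked on these slides: the state moves with Gaussian noise of variance `var_x` and is observed with Gaussian noise of variance `var_z`. All names are hypothetical.

```python
def kalman_step_1d(mu, var, z, var_x, var_z):
    """One predict-and-update cycle for a 1-D random walk with a noisy sensor.

    mu, var : mean and variance of P(X_t | e_{1:t})
    z       : new observation at time t+1
    var_x   : variance of the transition noise (assumed model)
    var_z   : variance of the sensor noise (assumed model)
    """
    # Prediction: the mean is unchanged, the uncertainty grows.
    var_pred = var + var_x
    # Update: move the mean toward the observation in proportion to the gain.
    gain = var_pred / (var_pred + var_z)
    mu_new = mu + gain * (z - mu)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new


mu, var = kalman_step_1d(mu=0.0, var=1.0, z=3.0, var_x=1.0, var_z=2.0)
```

The new mean is a weighted average of the old mean and the observation, and the new variance does not depend on the observed value at all.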
Kalman Filters (15.4)
Multivariate Gaussian Implication:
Filtering with a linear Gaussian model produces a Gaussian state distribution for all time
Why so important?
For general continuous-variable systems, the description of the state distribution grows without bound over time
Modelling with normal distributions keeps the calculations exact and of bounded complexity: only a mean and a covariance matrix are updated
Kalman Filters (15.4)
Where it Breaks:
Cannot be applied if the transition model is nonlinear
ex: bird evading a tree
Extended Kalman Filter models the transition as locally linear; fails if the system is not locally smooth.
Dynamic Bayesian Networks (15.5)
In general: each ‘slice’ of a DBN can have any number of:
state variables Xt
Sensor/evidence variables Et
Assume the variables and their relationships are preserved/replicated from time t to t+1
Dynamic Bayesian Networks (15.5)
vs Hidden Markov Models
Every HMM can be represented as a DBN with a single $X_t$ and a single $E_t$
Every discrete-variable DBN can be represented as an HMM: combine all $X_t$ into one MEGAVARIABLE
◦ Values: all possible tuples of values of the individual $X_t$
Dynamic Bayesian Networks (15.5)
vs Hidden Markov Models
◦ If interchangeable, where does the difference lie?
“Sparseness”: Ex. suppose a DBN has 20 Boolean state variables, each with 3 parents
DBN transition model: $20 \times 2^3 = 160$ probabilities
HMM transition matrix: $2^{20}$ states, $2^{40}$ probabilities (~trillion!!)
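The counts in this example can be checked with a few lines of arithmetic (variable names are illustrative only):

```python
num_vars = 20      # Boolean state variables per slice
num_parents = 3    # parents of each variable in the previous slice

# DBN: one conditional probability per parent assignment, per variable.
dbn_parameters = num_vars * 2 ** num_parents

# HMM: one megavariable whose values are all tuples of the 20 Booleans.
hmm_states = 2 ** num_vars
hmm_parameters = hmm_states ** 2   # entries in the S x S transition matrix

print(dbn_parameters)   # 160
print(hmm_states)       # 1048576
print(hmm_parameters)   # 1099511627776
```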
Dynamic Bayesian Networks (15.5)
vs Kalman Filters
Every KF can be represented as a DBN with continuous variables and linear Gaussian conditional distributions
Ex: tracking a bird flying through the forest
◦ $X_t$ = (X, Y, Z, X velocity, Y velocity, Z velocity)
Dynamic Bayesian Networks (15.5)
vs Kalman Filters
Every KF can be represented as a DBN, but few DBNs are KFs
DBNs can model arbitrary distributions
A KF always models a single multivariate Gaussian distribution
Aspects of the real world (obstacles) introduce non-linearities and require a combination of discrete and continuous variables
Dynamic Bayesian Networks (15.5)
Constructing DBNs
Must specify:
1. prior distribution over state variables P(X0)
2. transition model P(Xt+1 | Xt)
3. sensor model P(Et | Xt)
Must also specify connections between slices
RECALL: the model assumes the variables and their relationships are preserved/replicated from time t to t+1
Simply specify first slice and copy!
Dynamic Bayesian Networks (15.5)
Exact Inference in DBNs
Have seen inference in Bayesian networks before
Given a sequence of observations, can construct the full Bayesian network representation of a DBN by replicating slices until the network is large enough: Unrolling
Then apply an inference algorithm (ch. 14): variable elimination, clustering, etc.
Dynamic Bayesian Networks (15.5)
Exact Inference in DBNs
Unrolling
Problem:
inference cost for each update grows with t
Rollup Filtering: add slice t+1, “sum out” slice t using variable elimination
Largest factor is $O(d^{n+1})$, update cost $O(d^{n+2})$
Dynamic Bayesian Networks (15.5)
Exact Inference in DBNs
Unrolling
Can use DBNs to represent very complex temporal processes with many sparsely connected variables
But CANNOT reason efficiently and exactly about those processes!
Dynamic Bayesian Networks (15.5)
Approximate Inference in DBNs
Likelihood Weighting, adapted from (14.5)
sample the non-evidence nodes of the network in topological order, weighting each sample by the likelihood it gives the evidence
to avoid the growth problem seen in Exact Inference, can simply run all N samples together through the DBN, one slice at a time
Dynamic Bayesian Networks (15.5)
Approximate Inference in DBNs
Likelihood Weighting STILL FLAWED!
LW samples pay no attention to the evidence
◦ Fraction “agreeing” falls exponentially with t
◦ Number of samples required grows exponentially with t
Idea: focus the set of samples on the high-probability regions of the state space: Particle Filtering
Dynamic Bayesian Networks (15.5)
Inference in DBNs- Particle Filtering
A population of N initial-state samples is created from the prior distribution $P(X_0)$
Update cycle, repeated for each time step:
1. propagate each sample forward: given its current state value $x_t$, sample $x_{t+1}$ from the transition model $P(X_{t+1} \mid x_t)$
2. weight each sample by the likelihood it assigns to the new evidence, $P(e_{t+1} \mid x_{t+1})$
3. resample the population to produce N new unweighted samples, each chosen with probability proportional to its weight
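The three-step update cycle can be sketched for the umbrella world as follows. The dictionary names, the uniform prior, the sample count, and the fixed seed are assumptions for illustration; the probabilities are the umbrella-world values used earlier in the deck.

```python
import random

# P(rain at t+1 | rain at t) for each current state value.
TRANSITION = {True: 0.7, False: 0.3}
# P(umbrella seen | rain) for each state value.
SENSOR = {True: 0.9, False: 0.2}


def particle_filter_step(particles, umbrella_seen, rng):
    """One update cycle: propagate, weight, resample."""
    # 1. Propagate each sample through the transition model.
    propagated = [rng.random() < TRANSITION[x] for x in particles]
    # 2. Weight each sample by the likelihood of the new evidence.
    weights = [SENSOR[x] if umbrella_seen else 1.0 - SENSOR[x]
               for x in propagated]
    # 3. Resample N unweighted samples in proportion to the weights.
    return rng.choices(propagated, weights=weights, k=len(particles))


rng = random.Random(0)        # fixed seed so the sketch is repeatable
N = 10000
particles = [rng.random() < 0.5 for _ in range(N)]   # assumed uniform prior
for evidence in [True, True]:                        # umbrella on days 1 and 2
    particles = particle_filter_step(particles, evidence, rng)

estimate = sum(particles) / N   # fraction of samples in which it is raining
```

With this many samples the estimate should land close to the exact filtered value of about 0.883 for two umbrella observations.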
Dynamic Bayesian Networks (15.5)
Inference in DBNs- Particle Filtering
Assume consistent at time t:
◦ $N(x_t \mid e_{1:t}) / N = P(x_t \mid e_{1:t})$
Propagate forward: populations of $x_{t+1}$ are
◦ $N(x_{t+1} \mid e_{1:t}) = \sum_{x_t} P(x_{t+1} \mid x_t)\, N(x_t \mid e_{1:t})$
Weight samples by their likelihood for $e_{t+1}$:
◦ $W(x_{t+1} \mid e_{1:t+1}) = P(e_{t+1} \mid x_{t+1})\, N(x_{t+1} \mid e_{1:t})$
Resample to obtain populations proportional to W:
◦ $N(x_{t+1} \mid e_{1:t+1}) / N$ = ……
◦ = $P(x_{t+1} \mid e_{1:t+1})$
Dynamic Bayesian Networks (15.5)
Inference in DBNs- Particle Filtering
Performance:
Approximation error of PF remains bounded over time :D
At least empirically! – Theoretical analysis difficult.
Summary
Temporal models use state and sensor variables replicated over time
Hidden Markov Models have a single discrete state variable
Kalman Filters allow n continuous state variables with a linear Gaussian model; the state distribution is a multivariate Gaussian
Dynamic Bayesian Networks can represent both HMMs and Kalman Filters
◦ Particle Filtering is a good approximate inference (filtering) algorithm for DBNs
Thanks!
Questions?