Transcript

Lecture 13: Hidden Markov Model

Shuai Li

A Markov system

• There are $N$ states $S_1, S_2, \dots, S_N$, and the time steps are discrete, $t = 0, 1, 2, \dots$

• On the $t$-th time step the system is in exactly one of the available states. Call it $q_t$

• Between each time step, the next state is chosen only based on the information provided by the current state

• The current state determines the probability distribution for the next state
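A minimal sketch of such a Markov system in Python. The 3-state transition matrix `A` and the function name `simulate_chain` are illustrative assumptions, not values from the lecture. Each step samples the next state from the row of `A` selected by the current state alone.

```python
import random

# Hypothetical transition matrix: A[i][j] = P(next state = j | current state = i).
# States are 0-indexed here, so index 0 stands for S_1.
A = [
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.2, 0.4, 0.4],
]

def simulate_chain(A, start, steps, rng):
    """Return the visited states q_0, q_1, ..., q_steps."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # Only the current state is consulted: row A[current] is the
        # probability distribution over the next state.
        nxt = rng.choices(range(len(A)), weights=A[current])[0]
        path.append(nxt)
    return path

path = simulate_chain(A, start=0, steps=10, rng=random.Random(0))
print(path)
```

Passing a seeded `random.Random` keeps the run reproducible; any other source of randomness would do.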


Example

Example (cont.)

Markovian property

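• $q_{t+1}$ is independent of $q_{t-1}, q_{t-2}, \dots, q_0$ given $q_t$

• In other words: $P(q_{t+1} = S_j \mid q_t = S_i) = P(q_{t+1} = S_j \mid q_t = S_i, \text{any earlier history})$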

Example

• A human and a robot wander around randomly on a grid


Example (cont.)

• Each time step the human/robot moves randomly to an adjacent cell

• Typical questions:

• “What’s the expected time until the human is crushed like a bug?”

• “What’s the probability that the robot will hit the left wall before it hits the human?”

• “What’s the probability that the robot crushes the human on the next time step?”

Example (cont.)

• The current time is $t$, and the human remains uncrushed. What’s the probability of crushing occurring at time $t + 1$?

• If the robot is blind: we can compute this in advance

• If the robot is omnipotent (i.e., if the robot knows the current state): we can compute it directly

• If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable

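$P(q_t = S_i)$ -- A clumsy solution

• Step 1: Work out how to compute $P(Q)$ for any path $Q = q_1 q_2 \dots q_t$

• Step 2: Use this knowledge to get $P(q_t = S_i)$ by summing $P(Q)$ over all paths $Q$ of length $t$ that end in $S_i$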
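The two steps can be sketched as follows, under stated assumptions: the transition matrix `A` and the start distribution `pi` are hypothetical values, paths are indexed from $q_0$ (matching $t = 0, 1, 2, \dots$), and the function names are made up for illustration. The point is that Step 2 enumerates every path, so the work grows exponentially with $t$.

```python
from itertools import product

# Hypothetical model: A[i][j] = P(q_{t+1} = j | q_t = i), pi[i] = P(q_0 = i).
A = [
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.2, 0.4, 0.4],
]
pi = [0.5, 0.5, 0.0]

def path_probability(path, pi, A):
    """Step 1: P(Q) is the start probability times one transition per step."""
    prob = pi[path[0]]
    for prev, nxt in zip(path, path[1:]):
        prob *= A[prev][nxt]
    return prob

def state_probability_by_enumeration(s, t, pi, A):
    """Step 2: sum P(Q) over every path q_0 ... q_t with q_t = s."""
    n = len(A)
    total = 0.0
    # N**t candidate prefixes q_0 ... q_{t-1}: exponential in t.
    for prefix in product(range(n), repeat=t):
        total += path_probability(prefix + (s,), pi, A)
    return total

probs = [state_probability_by_enumeration(s, 4, pi, A) for s in range(3)]
print(probs)
```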


$P(q_t = S_i)$ -- A cleverer solution

• For each state $S_i$, define $p_t(i) = P(q_t = S_i)$ to be the probability of state $S_i$ at time $t$

• Easy to do inductive computation
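A sketch of the inductive computation, assuming the update $p_{t+1}(j) = \sum_i a_{ij}\, p_t(i)$ with $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ and a known distribution over the start state. The matrix `A`, the vector `pi`, and the name `state_distribution` are hypothetical choices for illustration.

```python
# Hypothetical model: A[i][j] = a_ij, pi[i] = p_0(i).
A = [
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.2, 0.4, 0.4],
]
pi = [0.5, 0.5, 0.0]

def state_distribution(t, pi, A):
    """Return [p_t(0), ..., p_t(N-1)] by stepping forward from p_0."""
    n = len(A)
    p = list(pi)  # base case: p_0(i)
    for _ in range(t):
        # Inductive step: p_{t+1}(j) = sum_i p_t(i) * a_ij.
        p = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
    return p

p1 = state_distribution(1, pi, A)
print(p1)
```

Each step fills $N$ entries with a sum over $N$ terms, so $t$ steps cost on the order of $tN^2$ operations.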


Complexity comparison

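• Cost of computing $p_t(i)$ for all states $S_i$ is now $O(tN^2)$

• Why? Each of the $t$ steps updates $N$ probabilities, and each update sums over $N$ predecessor states

• The clumsy solution costs on the order of $N^t$

• Why? It enumerates every path of length $t$

• This is the power of dynamic programming, which is widely used in HMMs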

Example (cont.)

• It’s currently time $t$, and the human remains uncrushed. What’s the probability of crushing occurring at time $t + 1$?

• If the robot is blind: we can compute this in advance

• If the robot is omnipotent (i.e., if the robot knows the state at time $t$): we can compute it directly

• If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable

Hidden state

• The previous example tries to estimate $P(q_t = S_i)$ unconditionally (no other information)

• Suppose we can observe something that’s affected by the true state

Noisy observation of hidden state

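• Let’s denote the observation at time $t$ by $O_t$

• $O_t$ is noisily determined depending on the current state

• Assume that $O_t$ is conditionally independent of $q_{t-1}, q_{t-2}, \dots, q_0, O_{t-1}, O_{t-2}, \dots, O_1, O_0$ given $q_t$

• In other words: $P(O_t = X \mid q_t = S_i) = P(O_t = X \mid q_t = S_i, \text{any earlier history})$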

• The robot with noisy sensors is a good example

• Question 1: (Evaluation) State estimation: what is $P(q_t = S_i \mid O_1, \dots, O_t)$?

• Question 2: (Inference) Most probable path: given $O_1, \dots, O_T$, what is the most probable path of the states? And what is the probability?

• Question 3: (Learning) Learning HMMs: given $O_1, \dots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)

• Speech recognition/understanding: phones → words, signal → phones

• Human genome project

• Consumer decision modeling

• Economics and finance

Basic operations in HMMs

• For an observation sequence $O = O_1, \dots, O_T$, the three basic HMM operations are evaluation, inference, and learning

• $T$ = number of time steps, $N$ = number of states

Formal definition of HMM

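• The states are labeled $S_1, S_2, \dots, S_N$

• For a particular trial, let

• $T$ be the number of observations

• $N$ be the number of states

• $M$ be the number of possible observations

• $\{\pi_1, \pi_2, \dots, \pi_N\}$ be the starting state probabilities

• $O = O_1 \dots O_T$ be a sequence of observations

• $Q = q_1 q_2 \dots q_T$ be a path of states

• Then $\lambda = \langle N, M, \{\pi_i\}, \{a_{ij}\}, \{b_i(j)\} \rangle$ is the specification of an HMM

• The definition of $a_{ij}$ and $b_i(j)$: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ is the transition probability, and $b_i(j) = P(O_t = j \mid q_t = S_i)$ is the probability of emitting observation $j$ in state $S_i$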
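One possible way to hold the specification in code, as a sketch only. The class name `HMM`, its field names, and the numeric values are assumptions for illustration. `Fraction` is used so that the "sums to 1" checks are exact; with floats a tolerance would be needed instead.

```python
from dataclasses import dataclass
from fractions import Fraction as F

@dataclass(frozen=True)
class HMM:
    pi: list  # pi[i]   = probability of starting in state i
    A: list   # A[i][j] = a_ij   = P(q_{t+1} = j | q_t = i)
    B: list   # B[i][k] = b_i(k) = P(O_t = k | q_t = i)

    def __post_init__(self):
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("pi, A and B must all describe N states")
        if any(len(row) != n for row in self.A):
            raise ValueError("A must be an N x N matrix")
        rows = [self.pi] + list(self.A) + list(self.B)
        if any(p < 0 for row in rows for p in row):
            raise ValueError("probabilities must be non-negative")
        if any(sum(row) != 1 for row in rows):
            raise ValueError("every distribution must sum to 1")

    @property
    def N(self):
        """Number of states."""
        return len(self.pi)

    @property
    def M(self):
        """Number of possible observations."""
        return len(self.B[0])

# Hypothetical 3-state, 3-symbol model.
hmm = HMM(
    pi=[F(1, 2), F(1, 2), F(0)],
    A=[[F(0), F(1, 3), F(2, 3)],
       [F(1, 3), F(0), F(2, 3)],
       [F(1, 3), F(1, 3), F(1, 3)]],
    B=[[F(1, 2), F(1, 2), F(0)],
       [F(0), F(1, 2), F(1, 2)],
       [F(1, 2), F(0), F(1, 2)]],
)
print(hmm.N, hmm.M)
```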


Example

• Start randomly in state 1 or 2

• Choose one of the output symbols in each state at random


• Let’s generate a sequence of observations:
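Generating a sequence can be sketched as below. The slide's diagram did not survive in this transcript, so the 3-state, 3-symbol parameters here are assumptions chosen only to match the two rules above: start in state 1 or 2 with equal probability, and emit one of the symbols allowed in the current state uniformly at random. The symbol names `X`, `Y`, `Z` and the function name `generate` are likewise hypothetical.

```python
import random

SYMBOLS = ["X", "Y", "Z"]          # assumed output alphabet
pi = [0.5, 0.5, 0.0]               # start randomly in state 1 or 2
A = [[0.0, 1 / 3, 2 / 3],          # assumed transition probabilities a_ij
     [1 / 3, 0.0, 2 / 3],
     [1 / 3, 1 / 3, 1 / 3]]
B = [[0.5, 0.5, 0.0],              # assumed emission probabilities b_i(k)
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

def generate(T, pi, A, B, rng):
    """Sample T hidden states and T observations (0-indexed)."""
    states, observations = [], []
    q = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(q)
        # The observation depends only on the current hidden state.
        observations.append(rng.choices(range(len(B[q])), weights=B[q])[0])
        # The next hidden state depends only on the current hidden state.
        q = rng.choices(range(len(A)), weights=A[q])[0]
    return states, observations

states, observations = generate(5, pi, A, B, random.Random(0))
print([s + 1 for s in states])                # hidden path, 1-indexed
print([SYMBOLS[o] for o in observations])     # what an observer sees
```

Only the second printed line would be visible to an observer; the first is the hidden path.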


Probability of a series of observations

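• What is $P(O) = P(O_1 O_2 O_3)$, the probability that $O_1$, $O_2$, and $O_3$ take their observed values?

• Slow, stupid way: $P(O) = \sum_{Q} P(O \wedge Q) = \sum_{Q} P(O \mid Q)\, P(Q)$, summing over all paths $Q$ of length 3

• How do we compute $P(Q)$ for an arbitrary path $Q$?

• How do we compute $P(O \mid Q)$ for an arbitrary path $Q$?

• $P(Q)$ for an arbitrary path $Q$: by the Markov property, $P(q_1 q_2 q_3) = P(q_1)\, P(q_2 \mid q_1)\, P(q_3 \mid q_2)$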
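The slow way can be written out directly. This is a sketch under assumptions: the model parameters and the observation sequence (symbols 0, 0, 2, standing for something like X, X, Z) are hypothetical, since the slide's example values are not in the transcript. It relies on $P(O \mid Q)$ being a product of per-step emission probabilities, which follows from the conditional independence assumption on observations.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 3-state, 3-symbol model (0-indexed states and symbols).
pi = [F(1, 2), F(1, 2), F(0)]
A = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 3), F(1, 3), F(1, 3)]]
B = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]

def path_probability(Q, pi, A):
    """P(Q) = P(q_1) * P(q_2 | q_1) * ... * P(q_T | q_{T-1})."""
    prob = pi[Q[0]]
    for prev, nxt in zip(Q, Q[1:]):
        prob *= A[prev][nxt]
    return prob

def observation_probability(O, Q, B):
    """P(O | Q) = product over t of b_{q_t}(O_t)."""
    prob = F(1)
    for q, o in zip(Q, O):
        prob *= B[q][o]
    return prob

def brute_force_likelihood(O, pi, A, B):
    """P(O) = sum over all N**T paths Q of P(O | Q) * P(Q)."""
    n = len(pi)
    return sum(
        observation_probability(O, Q, B) * path_probability(Q, pi, A)
        for Q in product(range(n), repeat=len(O))
    )

O = [0, 0, 2]  # assumed observation sequence
print(brute_force_likelihood(O, pi, A, B))  # 1/36 for these assumed values
```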


Probability of a series of observations (cont.)

• Computational complexity of the slow, stupid answer: $P(O)$ would require 27 evaluations of $P(Q)$ and 27 of $P(O \mid Q)$

• A sequence of 20 observations would need $3^{20} \approx 3.5$ billion evaluations of $P(Q)$ and 3.5 billion of $P(O \mid Q)$

• So we have to find some smarter answer
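The counts above are simply $N^T$ with $N = 3$ states. A quick check, with function names invented for illustration, alongside a rough $TN^2$ count for a table-filling (dynamic programming) approach:

```python
def brute_force_terms(num_states, num_observations):
    """Number of paths Q that the sum over all paths must visit."""
    return num_states ** num_observations

def dynamic_programming_terms(num_states, num_observations):
    """Rough multiply-add count for a table with T columns of N entries,
    each entry filled by summing over N predecessors."""
    return num_observations * num_states ** 2

print(brute_force_terms(3, 3))            # 27
print(brute_force_terms(3, 20))           # 3486784401, about 3.5 billion
print(dynamic_programming_terms(3, 20))   # 180
```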


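• Smart answer (based on dynamic programming)

• Given observations $O_1 O_2 \dots O_T$, define: $\alpha_t(i) = P(O_1 O_2 \dots O_t \wedge q_t = S_i \mid \lambda)$, where $1 \le t \le T$

• In the example, what is $\alpha_2(3)$?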
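A sketch of the dynamic programming computation, assuming the standard forward recursion $\alpha_1(i) = \pi_i\, b_i(O_1)$ and $\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_i \alpha_t(i)\, a_{ij}$. The model parameters and the observation sequence are hypothetical stand-ins for the slide's example, so the printed value of $\alpha_2(3)$ applies to these assumed numbers only.

```python
from fractions import Fraction as F

# Hypothetical 3-state, 3-symbol model (0-indexed states and symbols).
pi = [F(1, 2), F(1, 2), F(0)]
A = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 3), F(1, 3), F(1, 3)]]
B = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]

def forward(O, pi, A, B):
    """Return alpha, where alpha[t][i] = P(O_1 .. O_{t+1}, state i) (0-indexed t)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]      # base case
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([
            B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha

O = [0, 0, 2]                      # assumed observation sequence
alpha = forward(O, pi, A, B)
print(alpha[1][2])                 # alpha_2(3) = 1/12 for these assumed values
print(sum(alpha[-1]))              # P(O) = 1/36, summing alpha_T(i) over i

# State estimation (Question 1): P(q_t = S_i | O_1 .. O_t) is alpha_t(i)
# divided by the sum of alpha_t over all states.
total = sum(alpha[-1])
posterior = [a / total for a in alpha[-1]]
print(posterior)
```

The table has $T$ columns of $N$ entries, each a sum over $N$ terms, which gives the $O(TN^2)$ cost.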


• The robot with noisy sensors is a good example

• Question 1: (Evaluation) State estimation: what is $P(q_t = S_i \mid O_1, \dots, O_t)$?

• Question 2: (Inference) Most probable path: given $O_1, \dots, O_T$, what is the most probable path of the states? And what is the probability?

• Question 3: (Learning) Learning HMMs: given $O_1, \dots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)

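• We’re going to compute the following variables: $\delta_t(i) = \max_{q_1 q_2 \dots q_{t-1}} P(q_1 q_2 \dots q_{t-1} \wedge q_t = S_i \wedge O_1 \dots O_t)$

• It’s the probability of the path of length $t - 1$ with the maximum chance of doing all these things: occurring, ending up in state $S_i$, and producing output $O_1 \dots O_t$

• Define: $\mathrm{mpp}_t(i)$ = that path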
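These variables are what the Viterbi algorithm computes. A sketch, assuming the standard recursion $\delta_{t+1}(j) = b_j(O_{t+1}) \max_i \delta_t(i)\, a_{ij}$ with back-pointers to recover the path; the model parameters, observation sequence, and the name `viterbi` are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical 3-state, 3-symbol model (0-indexed states and symbols).
pi = [F(1, 2), F(1, 2), F(0)]
A = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 3), F(1, 3), F(1, 3)]]
B = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]

def viterbi(O, pi, A, B):
    """Return (most probable state path, its joint probability with O)."""
    n = len(pi)
    delta = [pi[i] * B[i][O[0]] for i in range(n)]   # delta_1(i)
    backpointers = []
    for o in O[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state: this recovers mpp_T plus its end state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

O = [0, 0, 2]                      # assumed observation sequence
best_path, best_prob = viterbi(O, pi, A, B)
print([s + 1 for s in best_path], best_prob)
```

With these assumed values two paths tie for the maximum, so which one is returned depends on tie-breaking.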

• The robot with noisy sensors is a good example

• Question 1: (Evaluation) State estimation: what is $P(q_t = S_i \mid O_1, \dots, O_t)$?

• Question 2: (Inference) Most probable path: given $O_1, \dots, O_T$, what is the most probable path of the states? And what is the probability?

• Question 3: (Learning) Learning HMMs: given $O_1, \dots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)

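• That “$\lambda$” is the notation for our HMM parameters

• Now we want to estimate $\lambda$ from the observations

• As usual, we could use maximum likelihood: $\lambda^* = \arg\max_{\lambda} P(O_1, \dots, O_T \mid \lambda)$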

EM for HMMs

• If we knew $\lambda$, we could estimate EXPECTATIONS of quantities such as

• Expected number of times in state $i$

• Expected number of transitions $i \to j$

• If we knew quantities such as

• Expected number of times in state $i$

• Expected number of transitions $i \to j$

• We could compute the MAX LIKELIHOOD estimate of $\lambda = \langle \{a_{ij}\}, \{b_i(j)\}, \{\pi_i\} \rangle$
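Alternating these two steps is the Baum-Welch form of EM. Below is a minimal sketch of one iteration for a single observation sequence, assuming the usual forward and backward tables. The starting parameters, the observation sequence, and all function names are hypothetical, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward(O, pi, A, B):
    n = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def backward(O, A, B):
    n = len(A)
    beta = [[1.0] * n]                       # beta_T(i) = 1
    for o in reversed(O[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(O, pi, A, B):
    """One EM iteration; returns (new_pi, new_A, new_B, P(O | old parameters))."""
    n, m, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    likelihood = sum(alpha[-1])
    # E-step: gamma[t][i] = P(q_t = i | O), xi[t][i][j] = P(q_t = i, q_{t+1} = j | O).
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: expected counts become the new maximum likelihood estimates.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood

# Hypothetical 2-state, 2-symbol initial guess; A[1][0] = 0 forbids that link.
O = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.6, 0.4], [0.3, 0.7]]
likelihoods = []
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(O, pi, A, B)
    likelihoods.append(likelihood)
print(likelihoods)   # should be non-decreasing from one iteration to the next
```

A transition that starts at zero has zero expected count in every iteration, so it stays at zero.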


• Bad news: there are lots of local optima

• Good news: the local optima are usually adequate models of the data

• Notice: EM does not estimate the number of states. That must be given.

• Often, HMMs are forced to have some links with zero probability. This is done by setting $a_{ij} = 0$ in the initial estimate $\lambda(0)$

• Easy extension of everything seen today: HMMs with real-valued outputs
