Date post: | 18-Dec-2015 |
Category: |
Documents |
Upload: | isabel-griffith |
View: | 219 times |
Download: | 0 times |
2
Weather:
• raining today 40% rain tomorrow
60% no rain tomorrow
• not raining today 20% rain tomorrow
80% no rain tomorrow
Markov ProcessSimple Example
rain no rain
0.60.4 0.8
0.2
Stochastic Finite State Machine:
3
Weather:
• raining today 40% rain tomorrow
60% no rain tomorrow
• not raining today 20% rain tomorrow
80% no rain tomorrow
Markov ProcessSimple Example
8.02.0
6.04.0P
• Stochastic matrix:Rows sum up to 1
• Double stochastic matrix:Rows and columns sum up to 1
The transition matrix:
Rain No rain
Rain
No rain
Markov Process
• Markov Property: Xt+1, the state of the system at time t+1 depends only on the state of the system at time t
X1 X2 X3 X4 X5
x | X x X x x X | X x X tttttttt 111111 PrPr
• Stationary Assumption: Transition probabilities are independent of time (t)
1Pr t t ab X b | X a p
Let Xi be the weather of day i, 1 <= i <= t. We may decide the probability of Xt+1 from Xi, 1 <= i <= t.
5
– Gambler starts with $10 (the initial state)
- At each play we have one of the following:
• Gambler wins $1 with probability p
• Gambler looses $1 with probability 1-p
– Game ends when gambler goes broke, or gains a fortune of $100
(Both 0 and 100 are absorbing states)
0 1 2 99 100
p p p p
1-p 1-p 1-p 1-pStart (10$)
Markov ProcessGambler’s Example
1-p
6
• Markov process - described by a stochastic FSM
• Markov chain - a random walk on this graph (distribution over paths)
• Edge-weights give us
• We can ask more complex questions, like
Markov Process
1Pr t t ab X b | X a p
?Pr 2 b a | X X tt
0 1 2 99 100
p p p p
1-p 1-p 1-p 1-pStart (10$)
7
• Given that a person’s last cola purchase was Coke, there is a 90% chance that his next cola purchase will also be Coke.
• If a person’s last cola purchase was Pepsi, there is an 80% chance that his next cola purchase will also be Pepsi.
coke pepsi
0.10.9 0.8
0.2
Markov ProcessCoke vs. Pepsi Example
8.02.0
1.09.0P
transition matrix:
coke pepsi
coke
pepsi
8.02.0
1.09.0P
8
Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?
Pr[ Pepsi?Coke ] =
Pr[ PepsiCokeCoke ] + Pr[ Pepsi Pepsi Coke ] =
0.2 * 0.9 + 0.8 * 0.2 = 0.34
66.034.0
17.083.0
8.02.0
1.09.0
8.02.0
1.09.02P
Markov ProcessCoke vs. Pepsi Example (cont)
Pepsi ? ? Coke
9
Given that a person is currently a Coke purchaser, what is the probability that he will buy Pepsi at the third purchase from now?
Markov ProcessCoke vs. Pepsi Example (cont)
562.0438.0
219.0781.0
66.034.0
17.083.0
8.02.0
1.09.03P
10
•Assume each person makes one cola purchase per week
•Suppose 60% of all people now drink Coke, and 40% drink Pepsi
•What fraction of people will be drinking Coke three weeks from now?
Markov ProcessCoke vs. Pepsi Example (cont)
8.02.0
1.09.0P
562.0438.0
219.0781.03P
Pr[X3=Coke] = 0.6 * 0.781 + 0.4 * 0.438 = 0.6438
Qi - the distribution in week i
Q0= (0.6,0.4) - initial distribution
Q3= Q0 * P3 =(0.6438,0.3562)
11
Simulation:
Markov ProcessCoke vs. Pepsi Example (cont)
week - i
Pr[
Xi =
Cok
e]
2/3
31
32
31
32
8.02.0
1.09.0
stationary distribution
coke pepsi
0.10.9 0.8
0.2
How to obtain Stochastic matrix?
Solve the linear equations, e.g.,
Learn from examples, e.g., what letters follow what letters in English words: mast, tame, same, teams, team, meat, steam, stem.
12
31
32
2,21,2
2,111
31
32
xx
xx ,
How to obtain Stochastic matrix?
Counts table vs Stochastic Matrix
13
P a s t m e \0
a 0 1/7 1/7 5/7 0 0
e 4/7 0 0 1/7 0 2/7
m 1/8 1/8 0 0 3/8 3/8
s 1/5 0 3/5 0 0 1/5
t 1/7 0 0 0 4/7 2/7
@ 0 3/8 3/8 2/8 0 0
Application of Stochastic matrix
Using Stochastic Matrix to generate a random word: Generate most likely first letter For each current letter generate most likely next letter
14
A a s t m e \0
a - 1 2 7 - -
e 4 - - 5 - 7
m 1 2 - - 5 8
s 1 - 4 - - 5
t 1 - - - 5 7
@ - 3 6 8 - -
If C[r,j] > 0, let A[r,j] = C[r,1]+C[r,2]+…+C[r,j]
C
Application of Stochastic matrix Using Stochastic Matrix to generate a random word:
Generate most likely first letter: Generate a random number x between 1 and 8. If 1 <= x <= 3, the letter is ‘s’; if 4 <= x <= 6, the letter is ‘t’; otherwise, it’s ‘m’. For each current letter generate
most likely next letter: Suppose
the current letter is ‘s’ and we
generate a random number x
between 1 and 5. If x = 1, the next
letter is ‘a’; if 2 <= x <= 4, the next
letter is ‘t’; otherwise, the current
letter is an ending letter.
15
A a s t m e \0
a - 1 2 7 - -
e 4 - - 5 - 7
m 1 2 - - 5 8
s 1 - 4 - - 5
t 1 - - - 5 7
@ - 3 6 8 - -
If C[r,j] > 0, let A[r,j] = C[r,1]+C[r,2]+…+C[r,j]
Supervised vs Unsupervised
Decision tree learning is “supervised learning” as we know the correct output of each example.
Learning based on Markov chains is “unsupervised learning” as we don’t know which is the correct output of “next letter”.
16
K-Nearest Neighbor
Features All instances correspond to points in an n-
dimensional Euclidean space Classification is delayed till a new instance
arrives Classification done by comparing feature
vectors of the different points Target function may be discrete or real-valued
K-Nearest Neighbor
An arbitrary instance is represented by (a1(x), a2(x), a3(x),.., an(x))
ai(x) denotes features Euclidean distance between two instances
d(xi, xj)=sqrt (sum for r=1 to n (ar(xi) - ar(xj))2) Continuous valued target function
mean value of the k nearest training examples
Distance-Weighted Nearest Neighbor Algorithm
Assign weights to the neighbors based on their ‘distance’ from the query point
Weight ‘may’ be inverse square of the distances
All training points may influence a particular instance
Shepard’s method