Page 1: Chapter 17 2nd Part Making Complex Decisions --- Decision-theoretic Agent Design

Chapter 17 2nd Part Making Complex Decisions

--- Decision-theoretic Agent Design

Xin Lu, 11/04/2002

Page 2:

POMDP: UNCERTAINTY

- Uncertainty about the action outcome
- Uncertainty about the world state due to imperfect (partial) information

--- Huang Hui

Page 3:

Outline

- POMDP agent: constructing a new MDP in which the current probability distribution over states plays the role of the state variable. This makes the new state space one of real-valued probability vectors, and hence infinite.

- Decision-theoretic agent design for POMDPs: a limited lookahead using the technology of decision networks.

Page 4:

Decision cycle of a POMDP agent

1. Given the current belief state b, execute the action a = π*(b)
2. Receive observation o
3. Set the current belief state to SE(b, a, o) and repeat.

[Diagram: inside the agent, the state estimator SE maintains the belief state b and feeds the policy; actions go out to the world, and observations come back from the world.]
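The three-step cycle above can be sketched as a loop. This is a minimal sketch; `policy`, `state_estimator`, and `world` are hypothetical stand-ins for the π*, SE, and environment components:

```python
def pomdp_agent_loop(b, policy, state_estimator, world, n_steps):
    """Run the POMDP decision cycle: act, observe, update belief."""
    for _ in range(n_steps):
        a = policy(b)                  # 1. execute the action a = pi*(b)
        o = world.step(a)              # 2. receive observation o
        b = state_estimator(b, a, o)   # 3. set b <- SE(b, a, o) and repeat
    return b
```

The agent never sees the true state; it acts only on the belief state b, which is exactly what makes the belief MDP construction on the next slides possible.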

Page 5:

Belief state

b(s) is the probability assigned to the actual state s by belief state b.

In the 4x3 world with no initial knowledge, the belief state assigns probability 1/9 to each of the nine non-terminal squares and 0 to the two terminal squares:

0.111  0.111  0.111  0.000
0.111         0.111  0.000
0.111  0.111  0.111  0.111

b = (1/9, 1/9, 1/9, 1/9, 1/9, 1/9, 1/9, 1/9, 1/9, 0, 0)

The state estimator computes the new belief state b' = SE(b, a, o):

b'(s') = α P(o | s', a) Σ_s P(s' | s, a) b(s)

where α is a normalizing constant that makes the belief state sum to 1.
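The state-estimation equation translates directly into code. A minimal sketch for a generic finite state set, with the transition model P(s'|s,a) and sensor model P(o|s',a) passed in as dictionaries (the names are illustrative, not from the slides):

```python
def state_estimate(b, a, o, trans, sensor, states):
    """SE(b, a, o): b'(s') = alpha * P(o|s',a) * sum_s P(s'|s,a) * b(s)."""
    b_new = {}
    for s2 in states:
        # prediction: probability of reaching s2 under action a
        pred = sum(trans[(s, a, s2)] * b[s] for s in states)
        # weight by the likelihood of observing o in s2
        b_new[s2] = sensor[(s2, a, o)] * pred
    alpha = 1.0 / sum(b_new.values())   # normalize so the belief sums to 1
    return {s2: alpha * p for s2, p in b_new.items()}
```

Note that the normalization constant α falls out for free: compute the unnormalized products first, then divide by their sum.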

Page 6:

Belief MDP

A belief MDP is a tuple <B, A, ρ, P>:

B = infinite set of belief states
A = finite set of actions
ρ(b, a) = Σ_s b(s) R(s, a)   (reward function)
P(b' | b, a) = Σ_o P(b' | b, a, o) P(o | a, b)   (transition function)

where P(b' | b, a, o) = 1 if SE(b, a, o) = b', and P(b' | b, a, o) = 0 otherwise.

Example: moving West once transforms the belief state b (left) into b' (right):

b:                              b':
0.111  0.111  0.111  0.000      0.222  0.111  0.000  0.000
0.111         0.111  0.000      0.111         0.111  0.000
0.111  0.111  0.111  0.111      0.222  0.111  0.111  0.000
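Both quantities defined above are short sums over the underlying model. A sketch with dictionary-based models, matching the equations on this slide (names are illustrative):

```python
def belief_reward(b, a, R, states):
    """rho(b, a) = sum_s b(s) * R(s, a)."""
    return sum(b[s] * R[(s, a)] for s in states)

def obs_prob(o, a, b, trans, sensor, states):
    """P(o | a, b) = sum_{s'} P(o|s',a) * sum_s P(s'|s,a) * b(s)."""
    return sum(sensor[(s2, a, o)] *
               sum(trans[(s, a, s2)] * b[s] for s in states)
               for s2 in states)
```

`obs_prob` is exactly the weight P(o | a, b) that the transition function assigns to the successor belief SE(b, a, o); it is also the normalization constant 1/α from the state-estimation equation.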

Page 7:

Solutions for POMDPs

The belief MDP reduces the POMDP to an MDP, but the MDP obtained has a continuous state space.

Methods based on value and policy iteration: a policy π(b) can be represented as a set of regions of belief state space, each of which is associated with a particular optimal action. The value function associates a distinct linear function of b with each region. Each value or policy iteration step refines the boundaries of the regions and may introduce new regions.

A method based on lookahead search: decision-theoretic agents.

Page 8:

Decision Theory = probability theory + utility theory

The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action.

Page 9:

A decision-theoretic agent

function DECISION-THEORETIC-AGENT(percept) returns action
    calculate updated probabilities for current state based on available evidence,
        including current percept and previous action
    calculate outcome probabilities for actions,
        given action descriptions and probabilities of current states
    select action with highest expected utility,
        given probabilities of outcomes and utility information
    return action

Page 10:

Basic elements of decision-theoretic agent design

- Dynamic belief network --- the transition and observation models
- Dynamic decision network (DDN) --- decision and utility
- A filtering algorithm (e.g. Kalman filtering) --- incorporates each new percept and action and updates the belief state representation
- Decisions are made by projecting forward possible action sequences and choosing the best action sequence

Page 11:

Definition of Belief

The belief about the state at time t is the probability distribution over the state given all available evidence:

Bel(X_t) = P(X_t | E_1, ..., E_t, A_1, ..., A_{t-1})    (1)

where X_t is the state variable, referring to the current state of the world, and E_t is the evidence variable.

Page 12:

Calculation of Belief (1)

Assumption 1: the problem is Markovian,

P(X_t | X_1, ..., X_{t-1}, A_1, ..., A_{t-1}) = P(X_t | X_{t-1}, A_{t-1})    (2)

Assumption 2: each percept depends only on the state at the time,

P(E_t | X_1, ..., X_t, E_1, ..., E_{t-1}, A_1, ..., A_{t-1}) = P(E_t | X_t)    (3)

Assumption 3: the action taken depends only on the percepts the agent has received to date,

P(A_t | A_1, ..., A_{t-1}, E_1, ..., E_t) = P(A_t | E_1, ..., E_t)    (4)

Page 13:

Calculation of Belief (2)

Prediction phase:

Bel'(X_t) = Σ_{x_{t-1}} P(X_t | X_{t-1} = x_{t-1}, A_{t-1}) Bel(X_{t-1} = x_{t-1})    (5)

where x_{t-1} ranges over all possible values of the state variables X_{t-1}.

Estimation phase:

Bel(X_t) = α P(E_t | X_t) Bel'(X_t)    (6)

where α is a normalization constant.
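Equations (5) and (6) together form one filtering step. A sketch for a discrete state variable, with the transition and sensor models passed in as dictionaries (illustrative names, not from the slides):

```python
def filter_step(bel, a, e, trans, sensor, states):
    """One belief update: prediction (5) followed by estimation (6)."""
    # (5) prediction: Bel'(x_t) = sum_{x_{t-1}} P(x_t | x_{t-1}, a) Bel(x_{t-1})
    pred = {x: sum(trans[(xp, a, x)] * bel[xp] for xp in states)
            for x in states}
    # (6) estimation: Bel(x_t) = alpha * P(e | x_t) * Bel'(x_t)
    unnorm = {x: sensor[(x, e)] * pred[x] for x in states}
    alpha = 1.0 / sum(unnorm.values())
    return {x: alpha * p for x, p in unnorm.items()}
```

Calling this once per time step, with the latest action and percept, keeps Bel(X_t) current without ever storing the full history E_1, ..., E_t.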

Page 14:

Design for a decision-theoretic agent

function DECISION-THEORETIC-AGENT(E_t) returns an action
    inputs: E_t, the percept at time t
    static: BN, a belief network with nodes X
            Bel(X), a vector of probabilities, updated over time

    Bel'(X_t) ← Σ_{x_{t-1}} P(X_t | X_{t-1} = x_{t-1}, A_{t-1}) Bel(X_{t-1} = x_{t-1})
    Bel(X_t) ← α P(E_t | X_t) Bel'(X_t)
    action ← argmax_{A_t} Σ_{x_t} Bel(X_t = x_t) Σ_{x_{t+1}} P(X_{t+1} = x_{t+1} | X_t = x_t, A_t) U(x_{t+1})
    return action
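The final, action-selecting line of the pseudocode (a one-step lookahead over actions) can be sketched as follows; `trans` and `U` are dictionary stand-ins for the transition model and utility function:

```python
def select_action(bel, actions, trans, U, states):
    """argmax_a  sum_{x_t} Bel(x_t) * sum_{x_{t+1}} P(x_{t+1}|x_t,a) * U(x_{t+1})."""
    def expected_utility(a):
        # average the one-step-ahead utility over the current belief state
        return sum(bel[x] * sum(trans[(x, a, x2)] * U[x2] for x2 in states)
                   for x in states)
    return max(actions, key=expected_utility)
```

This is exactly the principle of maximum expected utility from slide 8, applied over the belief state rather than a known state.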

Page 15:

Sensing in uncertain worlds

- Sensor model: P(E_t | X_t), describes how the environment generates the sensor data (vs. the observation model O(s, o)).
- Action model: P(X_t | X_{t-1}, A_{t-1}), describes the effects of actions (vs. the transition model T(s, a, s')).
- Stationary sensor model: P(E_t | X_t) = P(E | X), where E and X are random variables ranging over percepts and states. Advantage: the same model can be used at each time step.

Page 16:

A sensor model in a belief network

(a) Belief network fragment showing the general relationship between state variables and sensor variables: Burglary and Earthquake are parents of Alarm; JohnCalls and MaryCalls are the sensor nodes, children of Alarm.

P(B) = .001        P(E) = .002

B  E  | P(A)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A | P(J)           A | P(M)
T | .90            T | .70
F | .05            F | .01

Next step: break apart the generalized state and sensor variables into their components.
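With the CPTs above, any full joint probability is just a product along the network. A sketch using exactly the numbers from this slide's tables:

```python
def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) for the burglary network above."""
    P_B = {True: 0.001, False: 0.999}
    P_E = {True: 0.002, False: 0.998}
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}   # P(J = true | A)
    P_M = {True: 0.70, False: 0.01}   # P(M = true | A)
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# e.g. P(no burglary, no earthquake, alarm, John calls, Mary calls)
p = joint(False, False, True, True, True)
```

Summing such products over the hidden variables is exactly how the sensor nodes (the calls) are turned into evidence about the state nodes.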

Page 17:

(b) An example with pressure and temperature gauges

(c) Measuring temperature using two separate gauges

Page 18:

Sensor Failure

In order for the system to handle sensor failure, the sensor model must include the possibility of failure.

[Diagram: a sensor model for lane position, with nodes LanePosition, SensorAccuracy, Weather, SensorFailure, Terrain, and PositionSensor.]

Page 19:

Dynamic Belief Network

Markov chain (state evolution model): a sequence of values X_t where each one is determined solely by the previous one:

P(X_t | X_{t-1})

Dynamic belief network (DBN): a belief network with one node for each state and sensor variable for each time step.

Page 20:

Generic structure of a dynamic belief network

[Diagram: a chain of state nodes State.t-2 → State.t-1 → State.t → State.t+1 → State.t+2 (the state evolution model), each with a child percept node Percept.t-2, ..., Percept.t+2 (the sensor model).]

Two tasks of the network:

- Calculate the probability distribution for the state at time t
- Probabilistic projection: concerned with how the state will evolve into the future

Page 21:

Prediction:

Bel'(X_t) = Σ_{x_{t-1}} P(X_t | X_{t-1} = x_{t-1}, A_{t-1}) Bel(X_{t-1} = x_{t-1})

Rollup: remove slice t-1

Estimation:

Bel(X_t) = α P(E_t | X_t) Bel'(X_t)

[Diagram: prediction adds a new slice for time t; rollup sums out and removes the slice for t-1; estimation conditions the slice for t on Percept.t, after which a slice for t+1 is added.]

Page 22:

[Diagram: two slices (t and t+1) of the lane-position DBN, with the nodes LanePosition, PositionSensor, SensorAccuracy, Weather, Terrain, and SensorFailure replicated at each time step.]

Page 23:

Dynamic Decision Networks

Dynamic Decision Networks: add utility nodes and decision nodes for actions into dynamic belief networks.

Page 24:

The generic structure of a dynamic decision network

[Diagram: decision nodes D.t-1, D.t, D.t+1, D.t+2 feed the state chain State.t → State.t+1 → State.t+2 → State.t+3; each state has a sensor node Sense.t, ..., Sense.t+3, and a utility node U.t+3 is attached to the final state.]

The decision problem involves calculating the value of D_t that maximizes the agent's expected utility over the remaining state sequence.

Page 25:

Search tree of the lookahead DDN

[Diagram: the tree alternates decision nodes (D_t, D_{t+1}, D_{t+2}) with chance nodes for observations (E_{t+1}, E_{t+2}, E_{t+3}); the levels carry the belief states P(X_t | E_{1:t}), P(X_{t+1} | E_{1:t+1}), P(X_{t+2} | E_{1:t+2}), and the leaves hold utilities U(X_{t+3}) (e.g. 10, -4, -6, 3).]

Page 26:

Some characteristics of the DDN search tree

The search tree of a DDN is very similar to the EXPECTIMINIMAX algorithm for game trees with chance nodes, except that:

1. There can also be rewards at non-leaf states.
2. The decision nodes correspond to belief states rather than actual states.

The time complexity is O(|D|^d · |E|^d), where d is the depth, |D| is the number of available actions, and |E| is the number of possible observations.
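The lookahead computation on this tree can be sketched as a recursion that alternates a max over actions with an expectation over observations, stopping at depth d. This is a sketch assuming finite action and observation sets; the belief update `se`, observation probability `obs_prob`, and leaf utility `U_belief` are passed in as parameters (illustrative names):

```python
def lookahead_value(b, depth, actions, obs, se, obs_prob, U_belief):
    """Expected utility of belief b under depth-limited DDN lookahead.

    se(b, a, o)       -- belief update SE
    obs_prob(o, a, b) -- P(o | a, b)
    U_belief(b)       -- utility estimate at the leaves
    """
    if depth == 0:
        return U_belief(b)
    # max over decision nodes, expectation over observation (chance) nodes
    return max(
        sum(obs_prob(o, a, b) *
            lookahead_value(se(b, a, o), depth - 1,
                            actions, obs, se, obs_prob, U_belief)
            for o in obs)
        for a in actions)
```

Each level multiplies the work by |D| branches for the decision and |E| branches for the observation, which is where the O(|D|^d · |E|^d) complexity comes from.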

Page 27:

Discussion of DDNs

The DDN promises potential solutions to many of the problems that arise as AI systems are moved from static, accessible, and above all simple environments to dynamic, inaccessible, complex environments that are closer to the real world.

The DDN provides a general, concise representation for large POMDPs, so DDNs can be used as inputs for any POMDP algorithm, including value and policy iteration methods.

Page 28:

Perspectives for reducing DDN complexity

- Combine the lookahead with a heuristic estimate for the utility of the remaining steps.
- Many approximation techniques:
  1. Using less detailed state variables for states in the distant future.
  2. Using a greedy heuristic search through the space of decision sequences.
  3. Assuming "most likely" values for future percept sequences rather than considering all possible values.
  ...

