Date post: | 29-Dec-2015 |
Upload: | amberlynn-tyler |
Abstract Architecture
[Figure: block diagram of the Agent in the World. Labels: Sensors, Filters, Effectors (Body, Speech, Facial expressions), World and Agent model, Model of self (Emotions), Model of others, "memories", objects, agents, actions, properties, Appraisal, Concerns, Reactions, Action tendencies, Emotional Signals, Planning + Coping, Deliberated Actions, Agent Mind.]
Planning and Coping Module
The Planning and Coping module incorporates the action-selection mechanism of the agent
Conventional approaches require the programmer to anticipate every possible context and state and tune the mechanism to produce the right action
To overcome this complexity, we can adopt a learning approach
Hybrid approaches make use of both neural network and symbolic structures to learn sensory-motor correlations and abstract concepts through experience
We propose one way to deal with action sequencing, viewed as a type of motor reasoning, in a fully neural architecture
Basic Mechanisms - Bistables
[Figure sequence: a tree of nodes labeled 1F, 11F, 12F, 121F, 122F, 13F, and 2F, laid out across the Frontal Cortex and the Associative Cortex. Successive frames show 1F, 11F, and 12F ("Stack"); then 121F and 122F added ("Stack"); then a "Pop"; then 13F added; then a final "Pop".]
Schemas
Schemas are functional units (intermediate between overall behavior and neural function) for analysis of cooperative competition in the brain
A perceptual schema embodies the process whereby the system determines whether a given domain of interaction is present in the environment.
Current plans are made up of motor schemas.
Simplified Architecture
[Figure: block diagram. Labels: WORLD, External Perceptions, Perceptions, Actions, Planning and Coping Network, Internal Perceptions, Motivational System.]
Neural Architecture
The Planning network is composed of:
Nodes
    Environmental states are expressed through the activation of state nodes
    The agent’s needs are reflected in the activation of drive nodes
    The agent’s actions are determined by the activation of action nodes
Links are one-way communication channels between nodes
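The node and link vocabulary above can be pictured with a small data structure. This is only a sketch: the class names, the `weight` field, and the example node names (`food_visible`, `hunger`, `eat`) are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    STATE = "state"    # expresses an environmental state
    DRIVE = "drive"    # reflects one of the agent's needs
    ACTION = "action"  # determines one of the agent's actions


@dataclass
class Node:
    name: str
    kind: NodeKind
    activation: float = 0.0


@dataclass
class Link:
    source: Node          # activity flows from source to target only
    target: Node
    weight: float = 1.0   # hypothetical field, not described in the slides


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_node(self, name, kind):
        node = Node(name, kind)
        self.nodes[name] = node
        return node

    def connect(self, source, target, weight=1.0):
        link = Link(self.nodes[source], self.nodes[target], weight)
        self.links.append(link)
        return link

    def inputs_of(self, name):
        """Return the links that deliver activity to the named node."""
        return [link for link in self.links if link.target.name == name]


net = Network()
net.add_node("food_visible", NodeKind.STATE)
net.add_node("hunger", NodeKind.DRIVE)
net.add_node("eat", NodeKind.ACTION)
net.connect("food_visible", "eat")
net.connect("hunger", "eat")
```

Because links are one-way, `inputs_of("eat")` returns both links while `inputs_of("hunger")` returns none.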
Internal State
Internal perceptions are defined by a set of internal variables that evolve in time.
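A general internal variable $v_i$ lies in the range $[v_i^{min}, v_i^{max}]$ and evolves in time according to a piecewise differential equation
[Equation: $dv_i/dt$ is given by two conditional cases and is 0 otherwise. The expression involves $v_i$, the other internal variables $v_j$ ($j \neq i$), the bounds $v_i^{min}$ and $v_i^{max}$, and a variation of $v_i$ caused by external events.]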
Drives
Internal variables are homeostatic variables.
Each internal variable has a comfort zone and two drives associated with it, whose activity measures the need of increase or decrease.
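The comfort zone of $v_i$ is delimited by the thresholds $th_i^{min}$ and $th_i^{max}$
Drive for increase, active when $v_i < th_i^{min}$:
$$d_i^{excit} = k_e \exp\left( \log\left(\frac{1}{k_e}\right) \frac{th_i^{min} - v_i}{th_i^{min} - v_i^{min}} \right)$$
Drive for decrease, active when $v_i > th_i^{max}$:
$$d_i^{inib} = k_i \exp\left( \log\left(\frac{1}{k_i}\right) \frac{v_i - th_i^{max}}{v_i^{max} - th_i^{max}} \right)$$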
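A numeric sketch may help to see how the two drives behave around the comfort zone. It assumes one reading of the slide's formulas: each drive rises exponentially from a small constant (`k_e` or `k_i`) at the comfort-zone threshold to 1 at the limit of the variable's range. Returning 0 inside the comfort zone is also an assumption, as are the function names.

```python
import math


def drive_increase(v, v_min, th_min, k_e):
    """Drive for increase: nonzero only when v is below the comfort zone.

    Assumed form: k_e at the threshold th_min, growing to 1 at v_min.
    """
    if v >= th_min:
        return 0.0  # assumption: no drive inside the comfort zone
    ratio = (th_min - v) / (th_min - v_min)
    return k_e * math.exp(math.log(1.0 / k_e) * ratio)


def drive_decrease(v, v_max, th_max, k_i):
    """Drive for decrease: nonzero only when v is above the comfort zone.

    Assumed form: k_i at the threshold th_max, growing to 1 at v_max.
    """
    if v <= th_max:
        return 0.0  # assumption: no drive inside the comfort zone
    ratio = (v - th_max) / (v_max - th_max)
    return k_i * math.exp(math.log(1.0 / k_i) * ratio)


# Example: a variable in [0, 1] with comfort zone [0.3, 0.7].
low = drive_increase(0.1, v_min=0.0, th_min=0.3, k_e=0.1)
high = drive_decrease(0.9, v_max=1.0, th_max=0.7, k_i=0.1)
```

Under this reading both drives grow monotonically as the variable moves away from the comfort zone, so the further the variable drifts, the more pressing the need.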
Global Drive and Reward
All the base nodes receive a global drive signal whose intensity is a function of the most pressing need
All the base nodes also receive a global reward signal corresponding to the satisfaction of one of the needs
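$$d = \max_{1 \le i \le n} \left( d_i^{inib}, d_i^{excit} \right)$$
$$r = \max_{1 \le i \le n} \left( f_i^{inib}(t), f_i^{excit}(t), 0 \right)$$
$$f_i^{j}(t) = \begin{cases} \dfrac{d_i^{j}(t-1) - d_i^{j}(t)}{d_i^{j}(t-1)}, & \text{if } d_i^{j}(t-1) \neq 0 \\ 0, & \text{if } d_i^{j}(t-1) = 0 \end{cases} \qquad j \in \{inib, excit\}$$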
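The two global signals can be sketched in a few lines. The sketch assumes that the global drive is the largest individual drive and that the global reward is the largest relative drop of any drive between two consecutive time steps, floored at 0. The function names and the pair-based data layout are illustrative choices.

```python
def global_drive(drives):
    """drives: list of (d_inib, d_excit) pairs, one pair per internal variable."""
    return max(max(pair) for pair in drives)


def relative_drop(previous, current):
    """Relative decrease of one drive between t-1 and t (0 if it was already 0)."""
    if previous == 0:
        return 0.0
    return (previous - current) / previous


def global_reward(previous_drives, current_drives):
    """Largest relative drop over all drives, never below 0."""
    drops = [0.0]
    for prev_pair, cur_pair in zip(previous_drives, current_drives):
        for prev, cur in zip(prev_pair, cur_pair):
            drops.append(relative_drop(prev, cur))
    return max(drops)


# Two internal variables; the second drive of the first variable is being satisfied.
before = [(0.0, 0.4), (0.2, 0.0)]
after = [(0.0, 0.1), (0.2, 0.0)]
d = global_drive(before)
r = global_reward(before, after)
```

A drive that grows between two steps yields a negative drop, which the floor at 0 discards, so the reward only signals satisfaction of a need.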
Learning - Detection of Goals
A base action node learns the correlation between the success of the corresponding command and the global reward
When such correlation is strong enough, the node splits, producing a specialized node that plays the role of a goal
Action success + reward = GOAL
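One simple way to picture goal detection is a node that tracks how well command success lines up with the global reward and splits once the correlation passes a threshold. The sketch below uses a plain Pearson correlation over a stored history; that estimator, the threshold, the minimum sample count, and the `goal:` naming are all assumptions made for illustration, not the learning rule of the architecture.

```python
import math


class BaseActionNode:
    def __init__(self, name, split_threshold=0.8, min_samples=5):
        self.name = name
        self.split_threshold = split_threshold  # hypothetical parameter
        self.min_samples = min_samples          # hypothetical parameter
        self.history = []                       # (success, reward) pairs
        self.goal = None                        # specialized node, once created

    def correlation(self):
        """Pearson correlation between command success and global reward."""
        n = len(self.history)
        if n < 2:
            return 0.0
        xs = [s for s, _ in self.history]
        ys = [r for _, r in self.history]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        if var_x == 0 or var_y == 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)

    def observe(self, success, reward):
        """Record one outcome and split into a goal node if warranted."""
        self.history.append((1.0 if success else 0.0, reward))
        enough = len(self.history) >= self.min_samples
        if self.goal is None and enough and self.correlation() >= self.split_threshold:
            self.goal = f"goal:{self.name}"
        return self.goal


eat = BaseActionNode("eat")
for step in range(6):
    succeeded = step % 2 == 0
    eat.observe(succeeded, reward=1.0 if succeeded else 0.0)
```

Here reward arrives exactly when the command succeeds, so the node splits as soon as the minimum number of samples is reached.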
Context Learning
Specialized nodes compute the context of certain events
The context is defined by the ensemble configuration of the activities of the group of nodes linked to one node
The context is learned on the occurrence of certain events
    Event good is an event we wish to be able to predict
    Event bad corresponds to the situation where the event good does not occur, contrary to what was expected
The events good and bad are associated with two distinct weights that are used to compute the context value
Base Nodes Functioning
Excitation of a node codes the detection of a perceptive event
Action nodes can also have a call activity that will trigger the associated action
The call activity is initiated at random, except when it is regulated by the specialized nodes, but the number of action nodes that can be simultaneously calling by means of spontaneous activity is limited
The call activity is maintained during a random period of time
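The spontaneous call activity described above can be simulated with a small scheduler. This sketch covers only the unregulated case: calls start at random, last for a random number of steps, and a cap limits how many action nodes call at once. The class name, the start probability, the duration range, and the cap value are illustrative assumptions.

```python
import random


class SpontaneousCalls:
    def __init__(self, actions, max_calling=1, start_prob=0.3,
                 max_duration=5, seed=0):
        self.rng = random.Random(seed)
        self.remaining = {name: 0 for name in actions}  # steps left per call
        self.max_calling = max_calling      # cap on simultaneous callers
        self.start_prob = start_prob        # hypothetical parameter
        self.max_duration = max_duration    # hypothetical parameter

    def calling(self):
        """Names of the action nodes whose call activity is currently on."""
        return [name for name, left in self.remaining.items() if left > 0]

    def step(self):
        # Running calls get one step closer to their end.
        for name, left in self.remaining.items():
            if left > 0:
                self.remaining[name] = left - 1
        # Idle nodes may start calling at random, in random order,
        # as long as the cap on simultaneous callers is respected.
        idle = [name for name, left in self.remaining.items() if left == 0]
        self.rng.shuffle(idle)
        for name in idle:
            if len(self.calling()) >= self.max_calling:
                break
            if self.rng.random() < self.start_prob:
                self.remaining[name] = self.rng.randint(1, self.max_duration)
        return self.calling()


calls = SpontaneousCalls(["walk", "grasp", "eat"], max_calling=2)
history = [calls.step() for _ in range(200)]
```

Shuffling the idle nodes before drawing avoids systematically favouring the first action in the list when the cap is tight.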
Specialized Nodes Functioning
Specialized nodes are used primarily to organize the calls sent to the motor module
These nodes can be seen as schemas that can be chained in a plan
The bistable activity of specialized nodes implements a stack mechanism
The specialized nodes make use of local notions of drive and reward
    The local drive corresponds to the necessity of using the associated command
    The local reward corresponds to the successful execution of that command
A competition mechanism determines the nodes that can perform a call
The behaviour of a specialized node depends on its bistable activity
    If the node is off, the call is transmitted to the parent base node, where it will trigger action execution
    If the node is on, the local drive is stored and transmitted to the node’s subgoals
The on transition is triggered by the event bad
The off transition happens in two situations
    After a given period of time
    When the right context can now be obtained
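The on/off behaviour above can be read as a small state machine. In the sketch below a node that is off forwards its call for execution, while a node that is on keeps the local drive and passes it to its subgoals, which is what gives the stack-like effect. The class, the tuple-based messages, the `timeout` parameter, and the node names are hypothetical.

```python
class SpecializedNode:
    def __init__(self, name, subgoals=(), timeout=10):
        self.name = name
        self.subgoals = list(subgoals)
        self.timeout = timeout     # hypothetical: steps before forced switch-off
        self.on = False            # bistable state
        self.stored_drive = 0.0
        self.time_on = 0

    def handle_call(self, local_drive):
        """Route a call according to the bistable state."""
        if not self.on:
            # Off: the parent base node receives the call and executes the action.
            return [("execute", self.name, local_drive)]
        # On: keep the drive and hand it down to the subgoals.
        self.stored_drive = local_drive
        return [("call", sub.name, local_drive) for sub in self.subgoals]

    def event_bad(self):
        """The call failed to bring reward: switch on."""
        self.on = True
        self.time_on = 0

    def tick(self, context_ok):
        """Switch off after a timeout or once the right context is available."""
        if not self.on:
            return
        self.time_on += 1
        if context_ok or self.time_on >= self.timeout:
            self.on = False


get_key = SpecializedNode("get_key")
open_door = SpecializedNode("open_door", subgoals=[get_key])
first = open_door.handle_call(0.5)    # off: execute directly
open_door.event_bad()                 # the door was locked
second = open_door.handle_call(0.5)   # on: delegate to the subgoal
open_door.tick(context_ok=True)       # the key was obtained
third = open_door.handle_call(0.5)    # off again: execute
```

A node that stays on while its subgoals run behaves like a pushed stack frame, and switching off corresponds to the pop.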
Context learning is triggered when the events good or bad occur
    The event good corresponds to the occurrence of reward (successful action execution) in response to a call that was not stored
    The event bad corresponds to the case where the call of the node fails to lead to reward
Specialized nodes also learn the time needed for the execution of the associated command
    The occurrence of the event good means that this time can be reduced
    The occurrence of the event bad when the context is favourable means that this time should increase
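The timing rule can be illustrated with a simple estimator. The slides only give the direction of the adjustment, so the update formulas, the learning rate, and the class name below are assumptions.

```python
class ExecutionTimer:
    def __init__(self, expected_time=10.0, rate=0.5):
        self.expected_time = expected_time  # learned execution time
        self.rate = rate                    # hypothetical learning rate

    def on_good(self, elapsed):
        """Reward arrived: the expected time can shrink towards what was observed."""
        if elapsed < self.expected_time:
            self.expected_time += self.rate * (elapsed - self.expected_time)

    def on_bad(self, context_favourable):
        """No reward although the context was favourable: allow more time."""
        if context_favourable:
            self.expected_time *= 1.0 + self.rate


timer = ExecutionTimer(expected_time=10.0, rate=0.5)
timer.on_good(elapsed=4.0)               # expected time moves halfway towards 4
after_good = timer.expected_time
timer.on_bad(context_favourable=False)   # unfavourable context: no change
timer.on_bad(context_favourable=True)    # favourable context: increase
after_bad = timer.expected_time
```

A failure in an unfavourable context leaves the estimate alone, since the likely cause is the context and not the time allowed.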
Contexts
There are two types of context
    Excitation context evaluates the excitation of context nodes
    Call context is only computed by action nodes and evaluates the call activity of the other action nodes
Excitation context leads to the reinforcement of the weights corresponding to base nodes whose detection activity is predictive of the success of the command generated by the specialized node
When a base node is strongly implicated in the context, it splits to create a subgoal of the specialized node
Contention Scheduling
The local drive determines the dominant schemas and its computation favours the nodes whose excitation context is more coherent with the current activity
The reward obtained by the subgoals during plan execution is propagated upwards in the hierarchy, in the direction of the goal
A dominant specialized node inhibits the base nodes whose commands are incompatible with its own command according to the learned call context
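The selection and inhibition steps can be sketched as two small functions. The scoring rule (local drive multiplied by context coherence), the use of negative call-context weights to mark incompatible commands, and all names are assumptions made for the sake of the example.

```python
def select_dominant(candidates):
    """candidates: dict mapping node name -> (local_drive, context_coherence).

    Assumed scoring: the local drive weighted by how coherent the node's
    excitation context is with the current activity.
    """
    return max(candidates, key=lambda name: candidates[name][0] * candidates[name][1])


def inhibited_actions(dominant, call_context, threshold=0.0):
    """call_context: dict mapping (specialized_node, base_action) -> learned weight.

    Assumption: a weight below the threshold marks a base action whose
    command is incompatible with the dominant node's own command.
    """
    return sorted(
        action
        for (node, action), weight in call_context.items()
        if node == dominant and weight < threshold
    )


candidates = {"eat": (0.8, 0.5), "drink": (0.6, 0.9)}
call_context = {
    ("drink", "walk"): -0.7,
    ("drink", "grasp"): 0.4,
    ("eat", "walk"): -0.2,
}
dominant = select_dominant(candidates)
blocked = inhibited_actions(dominant, call_context)
```

In this example the node with the weaker drive still wins because its context is far more coherent with the current activity.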
Conclusions
Several aspects of the simplified architecture have been tested successfully in a text-like world
    Goal creation
    Creation and chaining of subgoals
    Inhibition mechanism
The integration of the agents in a 3D world has raised some technical problems
Goal creation was tested, other aspects need more work
We have to design carefully the internal state of the agents and do some bootstrapping if we want the characters to exhibit the right behaviour