Holland and Goodman – Caltech – Banbury 2001
(Autonomous robots) + (Dynamic environment) + (Intelligent control) = Consciousness?
Owen Holland and Rod Goodman
California Institute of Technology
The Banbury Center, Cold Spring Harbor Laboratory
May 13-16 2001
Some facts (for engineers)
Animals are autonomous embodied entities with a particular mission – propagation of their genes
Animal brains are controllers evolved to achieve that mission
Humans are the most intelligent animals
Humans are the only animals known to be conscious
Hypothesis
When an autonomous embodied system, with a difficult animal-like mission in a difficult environment, has a sufficiently high level of intelligence (i.e. is able to achieve that mission well), then it may exhibit consciousness, either as a necessary component for achieving the mission, or as a by-product.
Strategies for building a conscious machine
(A) Build the most intelligent robot we can, with the right sort of mission in the right sort of environment, and see if it’s conscious
Why (A) is not a good idea
We don’t know how difficult the mission or the environment would have to be
We don’t know whether the robot would be intelligent enough
And even if the robot turned out to be conscious, and we could prove that it was, we wouldn’t know exactly why
Strategies for building a conscious machine
(B) Build a dumb robot, with a simple mission, in a simple environment.
If it can cope, make the mission and environment more difficult until it can’t. Then make the robot smarter until it can cope.
Repeat until conscious.
Why (B) is a better idea than (A)
It probably reflects what happened during our evolution.
We can start now, because we know how to build robots too dumb to be conscious.
If we detect the appearance of consciousness after increasing the robot’s intelligence, we have a chance of identifying what underlies consciousness.
What brains do (1)
Simple brains (nervous systems) perform mappings from sensory inputs to motor outputs; the interaction of this process with the environment produces behavior.
This can be a very effective strategy, especially when coupled with the use of the environment as an external memory (stigmergy).
We can build excellent real robots (behavior-based, collective) using these principles.
What brains do (2)
There seems to be little doubt that complex brains build and exploit models.
Craik (1943) ‘The brain models external reality’
But we still know very little about how the models are built, what they are like, and how they are used.
What intelligent engineering control systems do
The most powerful control systems for controlling unknown state-rich plants (e.g. chemical plants) in complex, dynamic, uncertain environments are adaptive model-based predictive controllers.
They build internal models of the behavior of the plant in the environment, and use them to predict plant behavior and compute appropriate control actions.
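The receding-horizon idea behind such controllers can be sketched in a few lines of Python. Everything here is illustrative (the function names, the toy plant, and the exhaustive search over a small discrete action set are ours, not a real industrial MPC): the learned model is rolled forward over a short horizon, and the first action of the cheapest predicted trajectory is executed before re-planning.

```python
import itertools

def mpc_action(model, state, actions, horizon, cost):
    """Model-based predictive control, minimal sketch: evaluate every
    action sequence of length `horizon` against the learned plant model
    and return the first action of the cheapest predicted trajectory
    (receding horizon: execute it, observe, then re-plan)."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = model(s, a)        # predict plant behavior with the model
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]

# Toy plant: a scalar level that drifts upward; the mission is to hold it at 0.
model = lambda s, a: s + 0.5 + a   # stands in for a learned, adapted model
action = mpc_action(model, 2.0, [-1.0, 0.0, 1.0], horizon=3, cost=abs)
```

Real adaptive MPC also re-estimates `model` online from observed plant behavior; the control loop itself stays the same.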
Where do models come from?
Where does a control system’s model of something come from?
- it’s built-in
- it’s partially built-in, and is modified by experience
- it’s taught
- it’s built from scratch
Improvement by exercise
Once a model has been acquired or updated by exposure to the world, it can often be improved by exercising or ‘running’ it.
- e.g. Sutton’s Dyna architectures for efficient reinforcement learning in maze environments
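The Dyna idea fits in a small sketch (this toy maze, the parameters, and all names are ours, not Sutton's code): every real step both updates the value function directly and trains a one-step model, and the model is then "exercised" offline on remembered state-action pairs.

```python
import random

def dyna_q(env_step, n_states, n_actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    """Dyna-style reinforcement learning in miniature: each real experience
    updates Q and a one-step world model; the model is then 'run' to
    generate extra, simulated updates."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}                                    # (s, a) -> (reward, s')
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            best = max(Q[s])
            a = (random.randrange(n_actions) if random.random() < eps else
                 random.choice([x for x in range(n_actions) if Q[s][x] == best]))
            r, s2, done = env_step(s, a)          # one real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)               # learn the world model
            for _ in range(planning_steps):       # exercise the model offline
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q

# Toy maze: a 5-cell corridor; action 1 moves right, 0 left; reward at cell 4.
def corridor(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == 4 else 0.0, s2, s2 == 4)

random.seed(0)
Q = dyna_q(corridor, n_states=5, n_actions=2)
```

The planning loop is where the efficiency gain comes from: each real maze step is reused many times in simulation.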
Robots and models
What can a robot use a model for?
- augmenting sensory information
- feedback and feedforward control
- detecting novelty or anomaly
- planning
Planning with models
What gets planned? Action sequences.
(State t) + (Action t) => (State t+1)
(State t+1) + (Action t+1) => (State t+2) etc
For optimal planning: find the sequence of actions likely to make the greatest contribution to the success of the mission, and execute it. For useful planning, do better than is possible with no planning.
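The chaining above can be written directly as a recursive search (a sketch with invented names and a toy mission, not the talk's implementation): apply (State t) + (Action t) => (State t+1) at each step and return the sequence with the greatest predicted contribution.

```python
def plan(model, reward, state, actions, depth):
    """Enumerate action sequences by chaining the model,
    (state_t, action_t) -> state_{t+1}, and return the sequence
    with the highest total predicted reward plus that total."""
    if depth == 0:
        return [], 0.0
    best_seq, best_val = [], float("-inf")
    for a in actions:
        nxt = model(state, a)                 # (State t) + (Action t) => (State t+1)
        tail, val = plan(model, reward, nxt, actions, depth - 1)
        if reward(nxt) + val > best_val:
            best_seq, best_val = [a] + tail, reward(nxt) + val
    return best_seq, best_val

# Toy mission: reach position 3 on a line; reward only at the goal state.
model = lambda s, a: s + a
reward = lambda s: 1.0 if s == 3 else 0.0
seq, val = plan(model, reward, 0, [-1, +1], depth=3)
```

"Useful" rather than optimal planning would prune this search (shorter horizons, fewer candidate actions), accepting any plan that beats acting without one.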
What we want to know
For each robot, environment, and mission:
- What is the model like?
- How well does it correspond to the real world?
- Can it be used for control? If so, how good is it?
- Can it be used for planning? If so, how good is it?
- Are there any behavioral phenomena reminiscent of consciousness-related human behavior?
- Are there any phenomena connected with internal processes reminiscent of conscious human experience?
Robots
A Simple Robot
•The Khepera miniature robot (5.5 cm)
•Features:
• 8 IR sensors that allow it to detect objects
• Two independently controlled motors
Webots – Khepera Embodied Simulator
Simulators allow faster operation than real robots – particularly if learning is involved.
Simulator complexity is manageable for a simple robot like the Khepera, but for more complex robots, the simulator may be too complex or may not simulate the real world accurately.
A Generic Robot Controller Architecture
[Diagram: Recurrent Neural Machine – input units, state units, hidden units, and output units]
• The controller of the robot is an artificial neural network with recurrent feedback, capable of forming internal representations of sensory information in the form of a neural state machine.
•Sensory inputs (vision, sound, smell, etc) from sensors are fed to this structure
•Sensory inputs also include feedback from the motors and effectors.
•Controller outputs drive the locomotion and manipulators of the robot.
•The neural controller learns to perform a task, using neural network and genetic algorithm techniques.
•But - the internal model of the controller is implicit and therefore hidden from us.
Sensory inputs, including motor and effector feedback
Controller outputs to motors and effectors
Understanding the internal model
Sensory inputs, including feedback from motors & effectors
Motor & effector drive outputs
[Diagram: the Recurrent Neural Machine (input, state, hidden, and output units) paired with an INVERSE Recurrent Neural Machine whose outputs we OBSERVE]
•This mechanism will allow us to represent the hidden internal state of the controller in terms of the sensory inputs that correspond to that state.
•Thus we may claim to know something of “what the robot is thinking”.
•We assume that the controller is learned first and that, once it is learned and reasonably stable, the inverse can then be learned.
•Introduce a second recurrent neural network, separate from the first system, which learns the inverse relationship between the internal activity of the controller and the sensory input space.
The outputs of the inverse lie in the same sensory space as the inputs of the forward controller.
Simplified Inverse
•In this experiment we use a controller model that is much less powerful than the recurrent controllers described above, but it allows us to illustrate the principle, and in particular makes "inversion" of the forward controller extremely simple.
•The crucial simplification is that the controller learns its representation directly in the input space. Thus there is no inverse to learn – the internal representation learned by the robot is directly visible as an input-space vector.
•The first phase is to learn or program the forward model, i.e. the robot controller. In this simple experiment we program in a simple reactive wall-following behavior rather than learning a complex behavior. The robot starts with no internal model and learns its internal representation adaptively, in an unsupervised manner, as it performs its wall-following behavior.
The Learning Algorithm (based on the Linaker and Niklasson 2000 ARAVQ algorithm)
• A 10-dimensional feature space is formed from the 8 Khepera IR sensor signals plus the 2 motor drive signals.
• Clusters feature vectors by change detection, to form prototype feature-vector "models".
• Unsupervised.
• Adds new models based on two criteria:
• Novelty: large distance from existing models
• Stability: low variance in the buffered history of features
• Adapts existing models over time.
• We program in a simple "wall following" behavior to act as a "teacher".
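The novelty/stability logic can be sketched as follows. This is our own simplified rendering of the ARAVQ idea, not the Linaker and Niklasson code; the thresholds, buffer length, and learning rate are illustrative.

```python
import numpy as np
from collections import deque

def aravq(stream, buf_len=5, novelty=0.3, stability=0.1, lr=0.01):
    """ARAVQ-style clustering sketch: buffer recent 10-d feature vectors
    (8 IR readings + 2 motor drives); when the buffer is stable, either
    add its mean as a new prototype (if novel) or adapt the nearest one."""
    models = []
    buf = deque(maxlen=buf_len)
    for x in stream:
        buf.append(np.asarray(x, float))
        if len(buf) < buf_len:
            continue
        mean = np.mean(buf, axis=0)
        spread = max(np.linalg.norm(b - mean) for b in buf)
        if spread < stability:                       # input is stable...
            if not models:
                models.append(mean.copy())
            else:
                d = [np.linalg.norm(mean - m) for m in models]
                if min(d) > novelty:                 # ...and novel: new model
                    models.append(mean.copy())
                else:                                # ...else adapt the winner
                    m = models[int(np.argmin(d))]
                    m += lr * (mean - m)
    return models

# Toy stream: two stable sensor contexts in succession.
stream = [[0.0] * 10] * 20 + [[1.0] * 10] * 20
protos = aravq(stream)
```

Change detection falls out of the stability test: during the transition between contexts the buffer has high variance, so no prototype is created or adapted until the input settles.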
Learning in action
Colors show learned concepts:
Black – right wall
Blue – ahead wall
Green – 45 degree right wall
Red – corridor
Light Blue – outside corner
Running with the model
• Switch off the wall follower
• The robot "sees" features as it moves
• Choose the closest learned model vector at each tick
• Use the model vector's motor drive values to actually drive the motors
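One tick of this model-driven control loop looks like the following sketch (the prototype values and function name are invented here; the vector layout, 8 sensor components followed by 2 motor components, follows the feature space above).

```python
import numpy as np

def model_driven_step(sensors, models):
    """With the wall-follower switched off, drive from the model alone:
    pick the learned prototype closest to the current 8-d sensor reading
    and reuse its stored motor-drive components as the actual command."""
    d = [np.linalg.norm(sensors - m[:8]) for m in models]
    best = models[int(np.argmin(d))]
    return best[8:]                     # motor drive values actually sent out

# Toy prototypes: a 'right wall' concept and an 'ahead wall' concept.
right_wall = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0.5, 0.4])
ahead_wall = np.array([0, 0, 1, 1, 1, 0, 0, 0, -0.3, 0.3])
drive = model_driven_step(np.array([0, 0, 0, 0, 0, .9, 1, .8]),
                          [right_wall, ahead_wall])
```

Because the prototypes were learned while the wall-follower was acting as "teacher", replaying their motor components reproduces an approximation of that behavior.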
Running with the model
Color indicates the current "best" model feature
Run the model in the real robot
Invert the motor signals back to sensory signals to infer an egocentric "map" of the environment as "seen" by the robot.
Keeping it Real
• Mapping with the real robot
Manipulating the model "mentally" to make a decision – "planning"
• Take the sequence of learned model feature vectors and cluster sub-sequences into higher-level concepts
• For example:
• Blue-Green-Black = left corner
• Red = corridor
• Black = right wall
• At any instant, ask the robot to go "home"
• Run the model forwards mentally to decide whether it is shorter to go ahead or to go back
• Take the appropriate action
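As a toy illustration of this forward-versus-backward mental run (the route, labels, and function are invented here, not taken from the experiment): treat the learned higher-level concepts as a loop and count model steps to "home" in each direction.

```python
def choose_direction(route, position, home):
    """Run the concept-sequence model 'mentally' both ways around the
    loop and act on whichever direction reaches `home` in fewer steps."""
    n = len(route)
    ahead = next(k for k in range(1, n + 1) if route[(position + k) % n] == home)
    back = next(k for k in range(1, n + 1) if route[(position - k) % n] == home)
    return ("flash LEDs (home is ahead)" if ahead <= back
            else "rotate (home is behind)")

# Hypothetical learned loop of higher-level concepts around the arena.
route = ["corridor", "left corner", "right wall", "home corner", "right wall"]
action = choose_direction(route, position=2, home="home corner")
```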
Decision Time
Corridor corner is home
Rotate = Home is behind me
Flash LEDs = Home is ahead of me
Inverse Predictor Architecture
[Diagram: real-world sense signals and model-world sense signals feed the CONTROLLER through a switch; the CONTROLLER's internal state feeds the INVERSE, whose outputs form the model-world sense signals; a second switch routes motor signals to the real robot]
•We now allow the inverse to be fed back into the controller via the switch
•Thus the controller has an image of its internal hidden state or “self” in the same feature space as its real sensory inputs
•Thus it can “see” what it “itself” is thinking.
•As before “we” can also observe what the machine is “thinking”.
Consequences of the architecture
•In "normal" mode the controller produces motor signals based on the sensory input it "sees" (including motor/effector feedback), and we can observe what it is seeing. The inverse allows a mismatch between a predicted and an actual sensory input to be detected – indicating a novel experience, which in turn could focus attention and learning in the main controller. Noisy, ambiguous, and partial inputs can be "completed".
•In "thinking" or "planning" mode the real world is disconnected from the controller input, and the mental images output by the inverse are fed to the controller instead. Thus sequences of planned action towards a goal can take place in mental space and then be executed as action. Note that by switching between normal mode and "thinking" mode we can emulate the robot doing reactive control and thinking at the same (really multiplexed) time – much as humans drive a car on "automatic" while "thinking" of something else.
•In "sleeping" mode we shut off the sensory input and feed in noise instead. The inverse then outputs "mental images", which can themselves be fed back into the input (because they have the same representation), producing a complex series of "imagined" mental images, or "dreams". Note that we can use this "sleeping" mode to actually learn (or at least update) the inverse: the input noise vector is a "sensory input" vector like any other (whether it is structured accordingly or not), so the inverse should be able to reproduce it from the state and motor signals, and the error can be used to update the inverse.
•If we do not disconnect the motors during "dreaming", we will get "sleepwalking" or "twitching". If we assume that the controller is continually learning, then the inverse must be continually updated; if the two get too far out of synchronization, we could get irrational sequences in "thinking" mode, or worse in execution mode – an analog of "madness".
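The three modes reduce to a switch on what the controller "sees". A minimal sketch, with dummy stand-ins for the controller and inverse (all names and shapes here are ours):

```python
import numpy as np

def step(mode, controller, inverse, real_sensors, state, rng):
    """One tick of the switched architecture. The switch selects the
    controller's input: real-world sense signals in 'normal' mode, the
    inverse's mental image in 'thinking' mode, and noise in 'sleeping'
    mode, so the inverse free-runs and 'dreams'."""
    if mode == "normal":
        seen = real_sensors
    elif mode == "thinking":
        seen = inverse(state)                 # mental image, same sensory space
    else:                                     # "sleeping"
        seen = rng.standard_normal(real_sensors.shape)
    state, motors = controller(seen, state)
    drive_real_robot = (mode == "normal")     # disconnect motors while dreaming,
    return state, motors, drive_real_robot    # or we get 'sleepwalking'

rng = np.random.default_rng(0)
controller = lambda seen, st: (0.9 * st + 0.1 * seen, seen[:2])  # dummy stand-in
inverse = lambda st: st                       # identity stand-in for the inverse
state = np.zeros(8)
state, motors, drive = step("sleeping", controller, inverse, np.zeros(8), state, rng)
```

Multiplexing "normal" and "thinking" modes tick by tick gives the driving-on-automatic behavior described above.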
Where's the Consciousness?
• Not there yet
• More complex robots
• More complex environments
• More complex architecture
Head: 2 degrees of freedom
Body: 2 degrees of freedom
Arms: 4 degrees of freedom (x2)
Legs: 6 degrees of freedom (x2)
(Total of 24 degrees of freedom)
SONY DREAM ROBOT
Increasing complexity

Environment                     Agent
Fixed environment               Movable body
Moving objects                  More sensors
Movable objects                 Effectors
Objects with different values   Articulated body
Other agents – prey             Metabolic state
Other agents – predators        Acquired skills
Other agents – competitors      Tools
Other agents – collaborators    Imitative learning
Other agents – mates            Language
Etc                             Etc
Multi-stage planning
At each step:
- what actions could it take?
- what actions should it take?
- what actions would it take?
The planning system needs:
- a good and current model of the world
- a good and current model of the agent's abilities, expressible in terms of their effects on the model world
- an associated executive system to use the information generated by the planning system
A framework?
[Diagram: a Self Model and an Environment Model, each receiving updates, with outputs sent to the executive]
Speculation…
There may be something it is like to be such a self-model linked to such a world model in a robot with a mission