1
Decision Making for Autonomous Navigation
Abstract
Robots can automate a wide range of physical tasks from warehouse management to
space exploration. Many of these tasks require robots to navigate autonomously. Such a
robot should continuously choose and execute actions from a set of available actions until
it reaches the destination. The robot’s decision should avoid collisions and other
damaging outcomes while it ensures favorable ones, such as reduced travel distance and
time. This paper reviews decision-making techniques for effective navigation. In
particular, it reviews different methods to acquire information about the environment and
how this information can be used for effective decision making.
1. Introduction
Robots are machines that effect changes in the world inspired by human or animal
physical capabilities. A robot is autonomous if it can complete an assigned task with no
human intervention. Autonomous robots are increasingly used to automate routine tasks
in areas such as industrial automation and warehouse management, or tasks in areas that
are dangerous for people such as space exploration, military missions, and search and
rescue operations.
Mobile robots are robots that can move in a two-dimensional (2-D) or a three-
dimensional (3-D) world. This paper will focus on mobile robots that move in a 2-D
world. The success of a mobile robot depends on its ability to navigate, that is, to
ascertain its current position in the world and reach an assigned location. Mobile robots
that navigate autonomously are used to move goods in a warehouse (Sun et al. 2014),
2
carry out mundane tasks such as vacuuming (Forlizzi & Disalvo 2006), or transport
people in vehicles (Petrovskaya & Thrun 2009). Mobile robots are also used to explore
dangerous or unknown locations in space missions (Maimone et al. 2007) or search and
rescue operations (Davids 2002). Although these applications have different requirements
and criteria for success, they all rely on autonomous navigation.
Most commercially available mobile robots have a hardware/firmware layer and a
software layer. The hardware layer includes sensors and actuators. Sensors measure or
detect quantities in the environment, (e.g., light, heat) and convert them into digital
information. Actuators are motors that cause physical actions. The software layer accepts
sensor information as input, and generates commands to the robot actuators such that the
robot completes its assigned task. This separation of hardware systems from the
autonomous decision-making software has significant advantages. It allows robot
manufacturers to build standardized robots without application-specific details, and
allows robot users the flexibility to implement customized decision-making algorithms
based on the application. This paper focuses on such decision-making algorithms for
autonomous navigation.
In computer science, decision making involves several subfields of artificial
intelligence; including planning, reasoning, and machine learning. Planning organizes
future actions to achieve an assigned goal. Reasoning infers the truth-value of statements
from known facts. Learning uses past experience to improve decision making, so that the
performance of a robot on a task improves with its experience.
The remainder of this paper describes decision making, planning, learning, and
other techniques used to build autonomous navigation systems for mobile robots, with a
3
particular focus on how knowledge about the environment is acquired and used. Section 2
describes the basic hardware components of a mobile robot and classifies various mobile
robot settings. It also provides an overview of different decision-making approaches and
describes the role of knowledge acquisition in decision making. Section 3 focuses on
knowledge acquisition, that is, how raw sensor data can be converted into information
useful for decision making. Sections 4, 5, and 6 address deliberative, reactive, and hybrid
decision-making approaches to navigation, respectively. Section 7 briefly describes
spatial knowledge and decision making in humans. Finally, Section 8 discusses important
challenges in current autonomous navigation systems.
2. Mobile robots
This section introduces basic components of a mobile robot and describes the different
class of mobile robot systems based on their hardware capabilities, the environment they
inhabit, the tasks they are assigned, and the decision-making approach they adopt. Mobile
robots use a broad variety of sensors and actuators. Sensors can be classified as active or
passive. Active sensors use their own energy source. They emit radiation that is directed
toward the target to be investigated. The radiation reflected from that target is then
detected and measured by the sensor. The key advantage of an active sensor is that it is
less sensitive to external changes, such as time of day and season. Range-finding sensors
(e.g., LiDAR, SONAR, and UltraSonic sensors) are active sensors. They measure the
distance to the nearest obstacle by the emission of light or sound waves. LiDAR uses
light waves; SONAR and Ultrasonic sensors uses sound waves. The reflected light or
sound wave is detected, and the distance between the sensor and the obstacle is calculated
4
based on the delay until the arrival of the reflected wave. In robotics, range-finding
sensors are used to detect obstacles, map the environment, and avoid collision.
A passive sensor measures the naturally available signals in the environment.
Motion sensors and vision-based sensors are the commonly used passive sensors. Motion
sensors are primarily used as an odometer to estimate the position of the robot. Motion
sensors measure various aspects of robot movements. Inertial motion sensors measure the
inertial force experienced by a moving object; wheel encoders estimate the rotation of the
wheel. Vision-based sensors measure electromagnetic radiation in the environment. They
are most commonly used to identify landmarks along a route, but they can also serve as
odometers. There are also hybrid sensors that are both passive and active. Kinnect™ is
one such hybrid sensor. Kinnect™ passively captures visual information with a camera. It
also makes active depth measurements by its emission and detection of infrared radiation.
Kinnect™ sensors are often used to localize and identify landmarks (Veloso et al. 2012).
(a) (b) (c)
Figure 1: Robot models with different actuators and sensors. (a) Surveyor with treaded wheels. (b)
Turtlebot with a Kinnect™ Sensor. (c) Sony’s AiBO robot with legs.
5
Wheels and legs are the two primary kinds of actuators used in mobile robots. As
in Figure 1, Sony’s AiBO1 robots have legs where each joint is controlled by a motor.
Other robots (e.g., Surveyors2 and Turtlebots3) use motorized wheels. Indeed, any
standard vehicle (e.g., a car or a truck) can also be viewed as a wheeled robot. Wheeled
robots predominate among mobile robots because of their simplicity. Compared to a
robot with legs, a wheeled robot has fewer moving parts, which makes decision making
more tractable. Robots with legs often have more motors, whose simultaneous control
requires complex decision making.
Mobile robot systems can be classified, based on the number of robots, as single-
robot or multi-robot systems. Single-robot systems do not require task allocation,
teamwork, or collaboration. Multi-robot systems can parallelize tasks and hence reduce
the task completion time. They are also more robust because hardware or software failure
on a single robot does not necessarily result in task failure. Multi-robot systems are
classified based on whether the robots complete tasks individually or through
collaboration. They can also be classified by whether a central server makes decisions for
all robots or if each robot controls itself.
The sensing capabilities of a robot can be classified as local or global. Local
sensing captures information about the immediate environment through onboard sensors
(e.g., cameras, range-finding sensors). Global sensing captures a global perspective of the
environment with an external sensor like GPS or an overhead camera. In many cases,
mobile robot systems rely on both kinds of sensing.
1 http://www.sony-aibo.com 2 http://www.robotmarketplace.com/products/IL-SRV1Q.html 3 http://www.turtlebot.com
6
Navigational tasks assigned to the robots can be classified as exploratory or
target-based. Exploratory tasks focus on the exploration of a given area without any
particular target location. Target-based tasks expect the robot to travel to one or more
given targets. In navigation tasks with multiple targets, either the target set is known in
advance and is given to the robot before it begins execution, or the target set is not known
in advance and is assigned to robots as and when new targets arrive.
Finally, a robot’s environment can be classified as structured or unstructured.
Structured environments, such as offices or warehouses, have certain geometric
regularities (e.g., rooms, corridors) that allow a robot to make simplifying assumptions
that assist in navigation. In an unstructured setting these assumptions do not hold, and
more domain-independent approaches are required. Environments can also be classified
as static or dynamic. Environments where no features change during task execution are
called static environments. If features of the environment change during the execution of
the task the environment is called dynamic.
The acquisition of knowledge is central to decision making in autonomous
navigation. The knowledge required for autonomous navigation can be classified as
knowledge about the environment, knowledge about the position and orientation of the
robot in the environment, knowledge about an effective set of parameters for a robot in
the given environment, and knowledge about the robot’s goal. This knowledge can be
acquired continuously with sensors. Such knowledge will be up to date and need not be
stored internally. Another approach is one-time acquisition of knowledge that is then
stored internally for future use.
7
Reactive systems use only continuous sensor data without any stored internal
knowledge. Reactive systems are rule-based, fast, and can quickly respond to changes in
the environment. They can also be sub-optimal because they make decisions based only
on the immediate situation and ignore the future implications of a decision.
In contrast, deliberative systems require an internal representation of knowledge.
That knowledge can be either pre-defined or learned through sensor data. A deliberative
system uses its knowledge to predict the future states of a robot and to select actions that
would lead the robot to its goal. These methods work well only when the external
environment remains consistent with robot’s internal representation. Most approaches are
hybrids; they acquire some components of knowledge continuously from sensors and
store others. Whether or not it stores knowledge, every navigation system must acquire
and apply knowledge about the environment, about the position and orientation of the
robot, about its parameter settings, and about the goal.
3 Knowledge acquisition
This section describes the components of knowledge required for navigation. Mapping
generates a map of the environment. Localization determines the position and orientation
of the robot (robot state) in the environment. Simultaneous localization and mapping is a
knowledge acquisition problem where both mapping and localization is done
simultaneously. Estimation of parameter values that optimize robot performance in a
given environment is also discussed. Finally, this section ends with a survey of different
approaches to order tasks assigned to a robot.
8
3.1 Mapping
A map is a diagrammatic representation of an environment that represents features useful
for navigation. Metric maps capture the geometric properties of the environment;
topological maps capture the connectivity among different regions.
Two important approaches to mapping based on metric maps are those that use
geometric shapes (Chatila & Laumond 1985) and those that use probabilistic grids called
occupancy grids (Elfes 1989). Early mapping work used convex polygons to model
empty space (Chatila & Laumond 1985). Service robots in museums or factories could
use these techniques to build a map of the environment. The environment can be initially
controlled to ensure that there are no moving obstacles; once an accurate map is built, the
robot can then start to navigate. This approach has two main difficulties: geometric,
shape-based mapping provides no mechanism to incorporate uncertainties due to sensor
or actuator errors or changes in the environment, and it applies only to environments with
regular geometric structures.
In contrast, occupancy grids try to address both these problems with a
probabilistic grid that models a maps (Elfes 1989). This approach partitions the map into
a grid, each of whose cells stores a probabilistic estimate of whether it contains an
obstacle. This allows the use of techniques such as Bayesian estimation, where an initial
belief about the occupancy of the cell is continuously revised as more evidence is
collected from sensor measurements.
Topological approaches collect information about the connections among the
various regions in the map (Mataric 1992; Kuipers & Byun 1991). In one approach,
robots use sensors to detect and follow walls (Mataric 1992). A compass mounted on the
9
robot detects the orientation of the walls. As the robot tries to follow one wall after
another, it creates a network of such landmarks, and uses the network as a topological
map for navigation. In another approach, the robot randomly explores the environment
while it simultaneously identifies places with distinctive characteristics based on its
sensor measurements (Kuipers & Byun 1991). For example, a robot in a room could use
“equal-distance-from-nearby-obstacles” as its metric to correctly identify the middle of
the room. After the robot reaches a distinctive place, it randomly chooses and follows an
applicable control strategy such as “follow-the-midline-in-a-corridor” or “walk-along-
the-edge-of-a-large-space”. If a new distinctive place is found, then the two distinctive
places are treated as nodes and the control strategy between them is recorded as the edge
that connects the nodes.
Hybrid approaches use both topological and metric information for navigation
(Thrun 1998). Metric information is stored as an occupancy grid; a trained neural
network converts the sensor measurements into the values in the grid. Planning over an
occupancy grid depends on the granularity and can be suboptimal for fine granularity. A
topological map makes planning more efficient, since it has fewer regions and hence a
smaller state space compared to an occupancy grid. A Voronoi diagram converts the
occupancy grid into a topological map. A Voronoi diagram (Figure 2(a)) is a set of
points in free space that have at least two distinct nearest points. Critical points are points
in the set where the distance to its nearest neighbor reaches a local minimum. Critical
lines connect critical points to its nearest neighbors (Figure 2(b)). These critical lines
partition a map into topological regions (Figure 2(c)). These topological regions are then
10
converted into a graph where nodes represent the regions and edges represent adjacencies
between regions (Figure 2(d)).
(a) Voronoi diagram (b) Critical lines
(c) Regions (d) Topological graph
Figure 2: Extracting the topological graph from the map (Thrun 1998, p. 44).
3.2 Localization
Localization is a fundamental ability for navigation. There are two kinds of localization:
position tracking, which finds the position of the robot relative to its initial position, and
global localization, which finds the global co-ordinates of the robots.
Dead reckoning can be used for position tracking. It continuously measures the
motion of the robot and estimates a position relative to the robot’s initial position. The
robot’s motion can be estimated either by odometers that measure the speed of the wheels
or by inertial sensors, such as accelerometers, that measure the inertial forces experienced
11
by the robot during motion. Probabilistic approaches can improve the performance of
dead reckoning. One approach is to estimate the probability distribution of the next
position of the robot in the map, given the sensor measurements and its previous
positions. The Markov assumption reduces this dependency to only the previous position.
Kalman filters are commonly used to estimate this probability distribution (Montemerlo
et al. 2002). Kalman filters assume that a Gaussian distribution can model the actuator
and sensor errors.
The two most common approaches to global localization are landmarks and
sensor matching. A landmark is defined as a distinctive and recognizable feature of the
environment. The robot identifies and stores landmarks encountered in a map. The robot
is localized when a new landmark matches a stored landmark. Landmark-based
approaches are common in robots that use vision-based sensors. The robot can use scale-
invariant features (Lowe 1999) of the environment as landmarks (Se et al. 2001). Scale-
invariant features do not change under image translation, scaling, or rotation. This makes
them suitable landmarks for mobile robot localization. Given a database of such features,
a matching algorithm can be used to find the global coordinates of the robot.
Sensor matching uses all available sensor information to estimate the robot’s
global position. Monte Carlo Localization (MCL) is one such global localization
technique. It computes the probability distribution of the robot’s position in the
environment given all sensor measurements and a global map (Thrun et al. 2001). MCL
applies sampling-based methods to approximate the distribution. The key idea is to
maintain a probability distribution of the robot’s position as a set of random position
samples (particles) drawn from the probability distribution. Each particle is a possible
12
position for the robot. The set of particles constitute a discrete approximation of the
probability distribution. Weights are then assigned to each particle based on the sensor
information. These weights reflect changes in the probabilities of the possible positions
of the robot, given the new evidence from the sensors. Once the robot moves, it computes
a new particle set that represents the robot’s latest position. Both the old particle set and
its weights are used to compute the new particle set. This ensures that the new particle set
incorporates the evidence from the sensors. Because the number of particles can be
adjusted, MCL allows a tradeoff between accuracy and computational cost, which makes
it a popular localization algorithm.
3.3 Simultaneous localization and mapping
Simultaneous localization and mapping (SLAM) generates a map of the environment and
simultaneously determines the robot’s location in that map. This is accomplished by
detection of new landmarks and identification of observed landmarks as the robot moves
in the environment. When a robot moves, a noisy odometer estimates the new position.
Landmarks from the environment are extracted from this new position. The robot
attempts to associate these just-identified landmarks with earlier observed landmarks. Re-
observed landmarks are then used to improve the estimation of robot’s position.
The initial approach to SLAM proposed the use of an extended Kalman filter
(EKF) to estimate the distribution of the robot’s position and the distribution of landmark
positions (Smith et al. 1987). Key limitations of EKF-based approaches are that they are
slow and scale quadratically with the number of landmarks (Montemerlo et al. 2002).
FASTSLAM, a popular SLAM method based on particle filters, has been shown to scale
logarithmically with the number of landmarks (Montemerlo et al. 2002). An important
13
requirement of the SLAM-based approach is the detection and identification of enough
landmarks to ensure accurate localization. A large number of landmarks, however, imply
a large computation time, which might be inappropriate for real-time applications. This
inherent tradeoff between speed and accuracy remains an important issue in SLAM.
3.4 Parameter estimation
The identification of optimal parameter values to control robot navigation is called
parameter estimation. In reactive navigation systems, the parameters are often used to
weight different behaviors (Maes & Brooks 1990; Ram et al. 1994). In one approach, the
correlation between behaviors and conditions that result in positive reinforcements are
learned, and the behaviors with a high probability of positive reinforcement are preferred
(Maes & Brooks 1990).
Another approach to parameter estimation uses genetic algorithms (Ram et al.
1994). Initially, random weights are assigned to the behaviors of the robots. The robots
are run in simulation and their performance on a navigational task is recorded. Many such
sets of weights are generated and their performance is evaluated in simulation. A new
generation of weights is created from the combination of weights with better performance
from the previous generation and they are again tested in simulation. The process is
repeated until a termination condition is reached; then the weights with best performance
are selected.
A robot that learns to adapt to changes in its environment greatly improves its
utility. For example, a robot that can switch from one kind of sensor to another to account
for variations in lighting conditions (e.g., learn to use a vision camera during the day and
an infrared sensor at night) can be more useful. A robot that can adapt its behavior based
14
on feedback from its users is also useful. Such a robot could learn to navigate differently
in different settings, and optimize for time or distance travelled.
3.5 Goal Knowledge
Navigation goals set targets and order them, that is, they specify when a visit to one target
should precede a visit to another target. For a single robot with multiple unordered
targets, target ordering seeks an ordering that maximizes efficiency. In such cases, target
ordering can be modeled as a traveling salesperson problem. In other cases, target sets are
partially ordered or completely ordered. In scenarios such as manufacturing where the
robot has to move to different parts of a factory floor to follow a predefined workflow
that combines components to build products, the target set might be completely ordered
and the robot’s task is only to reach the targets in the specified order. Other scenarios
might specify the target set as a partial order. For example, if the goal is to shift identical
chairs from one room to another, the robots must visit the first room to pick a chair up
and then visit the destination room to place that chair. There are no restrictions, however
on which chair to pick up first or where in the destination room to place the chair.
A partially or unordered target set can also be dynamic. The target set is dynamic
either if the target’s position changes over time (the moving target tracking problem)
(Huang 2009) or if targets are dynamically added and deleted. To operate successfully in
such an environment, the robot should be able to re-compute the optimal target order
during the task. Another way to address dynamism is to rely on heuristics (e.g., visit the
closest point first) that can provide possibly suboptimal but quick solutions.
In multi-robot systems, target allocation partitions a set of targets among the
robots, in a way that improves performance. Optimal target allocation is computationally
15
expensive, because it must find both an optimal partition and an optimal target order for
every allocation. When both the robot’s environment and the set of targets are static,
combinatorial auctions can be used to allocate targets (Berhault et al. 2003). Each robot
bids for a bundle of targets. The targets are allocated to the lowest bidder. Since the
number of possible bundles increases exponentially with the targets, finding an optimal
bidding strategy is a hard problem.
4. Deliberative decision making
This section describes several planning mechanisms relevant for navigation. Given an
internal representation of the map, the robot state, parameter values, and a goal, a planner
should be able to generate a sequence of actions (a plan) that will accomplish its goal,
provided the robot’s actuators execute the action accurately. When the task is navigation,
a plan can be a sequence of waypoints on a map. The identification of this sequence of
waypoints is called pathfinding. There are many different ways to plan including classical
planning, graph-based approaches, sampling-based approaches, and planning in non-
deterministic settings.
4.1 Classical Planning
Classical planning uses formal logic to represent and create a plan. In classical planning
methods, the map, robot state information, robot action repertoire, and goal information
are stored as logical formulas, and the planner uses theorem proving to prove that
execution of the plan will satisfy the goal conditions. Classical planning can address
planning in a state space, a plan space, a planning graph, or as a constraint satisfaction
problem.
16
In state-space planning, the search space is a set of world states, transitions
between states are actions, and a path through the state space that begins at the start state
and ends in a goal state is a plan. STRIPS was an early a state-space planner that used
theorem proving to test if the state that eventually results from the execution of actions
listed in a plan would satisfy the goal condition (Fikes & Nilsson 1971). STRIPS used
means-ends analysis as a heuristic to guide search. ABSTRIPS improved on STRIPS; it
built an abstract plan first, and moved towards a more detailed version of the plan only if
the abstract plan succeeded (Sacerdoti 1974). This detection and elimination of
impossible search spaces allowed ABSTRIPS to plan faster.
In plan-space planning, the search space is a set of plans (including partial plans).
The initial state is a null plan and transitions are plan operators that add or reorder actions
to modify plans. Plan-space planning methods follow the principle of least commitment
and create a partial order of actions. Actions are ordered only when necessary; this
reduces backtracking, as it avoids a situation seen in state space planning called
Sussman’s anomaly (Sacerdoti 1975). For example, NOAH, a non-linear planner, fulfills
unsatisfied preconditions by adding actions to the plan (Sacerdoti 1975). It then uses
procedures called critics to confirm that the additions do not harm previously satisfied
preconditions. Actions that cause conflicts are ordered to resolve any conflicts. Because
actions are ordered only when required (in this case is to resolve conflicts), NOAH
follows the principle of least commitment. TWEAK is an improvement over NOAH; it is
a simpler, provably complete and correct non-linear planner that used a STRIPS-like
action representation (Chapman 1987). TWEAK, however, is restrictive; its
representation does not allow conditions and universal quantifiers. UCPOP, another
17
partial-order planner, allows universal quantification and conditional effects (Penberthy
& Weld 1992). The efficiency of UCPOP is further improved in RePOP, where heuristics
ensure that the partial-order plan most likely to lead to a solution is explored first
(Nguyen & Kambhampati 2001).
A planning graph is a directed graph where levels alternate between proposition
nodes and action nodes. An example appears in Figure 3. An edge connects an action in
one level to a proposition in the next level whenever the proposition forms a post-
condition of the action. An edge also connects a proposition in one level to an action in
the next whenever a proposition satisfies a pre-condition of an action. Two actions
interfere with one another if the outcome of one action reverses the outcome of the other,
or if the outcome of one action negates the preconditions of the other. Planning graphs
are more compact than their corresponding state-space based graphs, that is, they have
fewer nodes because they allow actions to interfere. To produce a plan, a search
algorithm is run on the planning graph in stages.
Figure 3: Example of a planning graph with alternate levels of actions and propositions (Blum 1997, p. 5).
18
GraphPlan is a planner based on a planning graph and a STRIPS-like
representation (Blum 1997). GraphPlan begins in stage 1 with a planning graph whose
single proposition level represents the initial conditions. In stage i, GraphPlan takes the
planning graph from stage i−1, extends it one time step (the next action level and the
following propositional level), and then searches the extended planning graph for a valid
plan of length i. GraphPlan’s search either finds a valid plan (in which case it halts) or
determines that the goals are not all achievable by time i (in which case it iterates to the
next stage). In each iteration through this extend/search loop, the algorithm either
discovers a plan or else proves that no plan with that many time steps or fewer is
possible.
Planners based on heuristic-search focus on how to guide the search through the
state space. Heuristics evaluate future states on their distance to the goal; the state closest
to the goal is explored first. The number of actions in a plan that solves a relaxed version
of the planning problem is used as the measure distance. The Fast-Forward (FF) planning
system uses planning graphs to construct relaxed planning problems whose solutions are
then used to estimate the distance to the goal (Hoffmann & Nebel 2001). The Fast-
Downward (FD) planning system combines heuristic search with hierarchal problem
decomposition (Helmert 2006). Like FF, FD computes plans by heuristic search in the
state space. Unlike FF, however, FD’s heuristic function uses hierarchal decomposition
of the planning task to construct relaxed planning problems. The number of actions in the
plans for these relaxed planning problems is then used to estimate the distance from a
state to the goal. The LAMA planner extends FD with landmarks, propositional formulas
19
that must be true at some point in every plan for a given task (Richter & Westphal 2010).
LAMA then uses landmarks to guide the search to states with many landmarks.
Some approaches have sought to transform the planning problem into a constraint
satisfaction problem, to exploit existing constraint satisfaction solvers. A planning graph
can be formulated as a SAT problem, and general-purpose SAT solvers can be used to
solve the problem (Kautz et al. 1996). In another approach, Local Planning Graph (LPG),
the planning graph is formulated as a propositional constraint satisfaction problem
(Gerevini & Serina 2003). LPG treats the stages of the planning graph as variables, and
the detection of a partial plan in a stage as variable assignment. LPG relies on heuristics
to improve its speed.
4.2 Graph-based Search
In graph-based search, an undirected graph represents the topology and distance
information of a fixed map. The map is divided into a regular grid, where nodes represent
cells and edges connect adjacent cells. Edge weights represent obstacles between the
cells. A route through the map from the initial position of the robot to the goal can be
represented as a path in the undirected graph. This reduces the path-planning problem to
a shortest path problem.
A* is the most common approach to search for a path to the goal state (Nilsson
1980). A* estimates the total cost from the start node to the goal as the sum of the cost to
get to the current node from the start node and a heuristic estimate of the cost from the
current node to the goal node. The algorithm returns an optimal path from the start to the
goal when the heuristic always underestimates the true distance. A* search keeps a queue
of states to be explored and the state with the lowest total cost is explored first.
20
On large maps, A* consumes considerable memory, because the number of states
in the queue grows at an exponential rate. In contrast, Iterative Deepening A* (IDA*)
only requires memory linear in the length of the solution that it constructs (Korf 1996).
IDA* is an iterative algorithm that begins with an initial cutoff for the total estimated cost
of the solution. Each iteration begins with a depth-first search for a solution whose
estimated total cost is less than the cutoff. If a solution is not found, the cutoff is
incremented and depth first search is repeated. Hierarchal planning A* (HPA*) is a near-
optimal algorithm that uses the hierarchal properties of path planning to save computation
(Botea et al. 2004). HPA* partitions the map into neighborhoods, and finds a path to the
border of the neighborhood that contains the start location. Next, it finds a path from the
border of the start neighborhood to the border of the neighborhood around the goal.
Because there are fewer neighborhoods than nodes in a grid-based graph, computation of
such a path is fast. Finally, HPA* finds a path from the border of the goal neighborhood
to the goal. All three paths combined forms a solution that is both quick to compute and
near-optimal.
4.3 Sampling-based approaches
Unlike simple path planning, motion planning does not assume that the robot is a single
point in space. Instead, motion planning considers the problems that can arise when
robots are treated as solid-bodied objects. The high dimensionality of motion planning
renders several of the standard planning approaches ineffective. Sampling-based
approaches are commonly used instead. Rather that an explicit representation of the
environment, sampling-based algorithms rely on collision checking to test whether two
randomly sampled states have a feasible trajectory. A graph of feasible trajectories is
21
built from multiple sampled states, which is then used to construct a solution to the
original motion-planning problem. Although sampling-based algorithms are not
complete, they provide probabilistic completeness guarantees, that is, the probability that
the planner fails to return a solution, if one exists, decays to zero as the number of
sampled points approaches infinity.
There are two principal sampling-based approaches: Probabilistic RoadMaps
(PRMs) (Kavraki et al. 1996) and Rapidly exploring Random Trees (RRT) (Zucker et al.
2007; Ferguson et al. 2006). As shown in Figure 4, PRM is a graph whose nodes
represent valid states, chosen at random from a state space. In motion planning, a state
includes not only the position of the robot, but also the orientation of the robot’s body.
Each state is represented as a node in a graph. A set of such nodes is chosen randomly. A
local planner checks for a path from one state to other nearby states (within distance d
from the given state, where d is pre-defined) and, if a path is found, the nodes that
represent the states are connected with an edge. The resultant graph becomes the input to
a graph search algorithm that then computes the shortest path from the initial state to the
goal state.
The RRT algorithm randomly builds a tree to search a non-convex, high-
dimensional space. RRT chooses a state at random, and connects it to the nearest state in
the tree, provided that a collision-free path exists between them. A local planner that
searches for a plan of pre-defined length l is used to test for the existence of a collision-
free path. Each iteration adds a new state to the tree, and the tree grows with a bias
toward unexplored areas of the search space. An example appears in Figure 5. Growth of
the tree stops when the goal state is randomly chosen and added to the tree. The tree is
22
then used as an input to a search algorithm, which returns a path from the initial state to a
goal state.
Figure 4: PRMs (Choset 2005, p. 245). The circles denote randomly sampled states and the lines denote
feasible trajectory. The dark lines form a path from initial state (qinit) to final state (qgoal).
Figure 5: Growth of RRTs biased towards unexplored spaces (Kuffner and LaValle 2000, p. 3).
4.4 Planning in non-deterministic settings
In many domains a robot’s actuators are imperfect, which leads to uncertain outcomes. In
such domains the state of a robot after an action can be represented as a probability
23
distribution over the possible states. A Markov decision process (MDP) can be used to
model such domains. An MDP framework, however, assumes that there is complete
observability (i.e. the robot can accurately detect its current state).
An MDP can be described as a tuple <S, A, T, R, π >, where:
• S is a finite set of states of the world. A continuous world can be converted into a
finite set of states by discretization.
• A is a finite set of actions.
• T : S × A → p(S) is a state-transition function, which for every state s ∈ S and
action a ∈ A, the function returns a probability distribution over the world states.
• R : S × A → ℝ is the reward function that returns expected reward gained by the
robot when it chooses an action in a specific state.
• 𝜋: S → A is policy function that specifies the action to be chosen in any given
state.
The objective of an MDP framework is to find an optimal policy function that results in
the maximum reward accumulation, as defined by the reward function. Value Iteration
(VI) is an iterative algorithm to find such an optimal policy. A value function returns the
maximum reward that the robot could get from a particular state if it is allowed to make a
limited number of decisions. VI initially computes the value function assuming the robot
is allowed make only one decision. In the next step, it iteratively computes the value
function if the robot is allowed to make two decisions with the previously computed
value function. This is repeated until the value function is found for the desired number
of decisions. The optimal policy is then to choose actions that should lead to states with a
maximum value as defined by the value function.
24
In many domains, a robot with noisy sensors may not able to determine its current
state with complete reliability. Partially Observable MDP (POMDP) provides a
framework to obtain optimal policies in such domains (Kaelbling et al. 1998). A POMDP
can be described as a tuple <S, A, T, R, O, Of> where,
• S, A, T and R describe a Markov decision process;
• O is a finite set of observations the robot can experience through its sensors. A
continuous valued sensor can be discretized to generate a finite observation set.
• Of : S × A → p(O) is the observation function which returns for each action and
resulting state a probability distribution over possible observations
Because POMDPs do not know the current state accurately, they maintain a probability
distribution over states called a belief state. Some approaches have a separate component
called State Estimator (SE) that updates the belief state based on the last action, the
current observation, and the previous belief state (Figure 6). The goal is to generate a
policy 𝜋 that could suggest optimal actions from any belief state. An important aspect of
POMDPs is that there is no difference between an action chosen to change the state of the
world and action chosen to gain information. POMDPs have several drawbacks, however.
First, the generation of an optimal policy is computationally expensive, so such a policy
cannot be generated for large domains. Second, POMDPs are based on a stochastic model
of the transition and reward function, so any policy generated is optimal only for that
model; any change in the stochastic nature of the environment results in suboptimal
results.
25
Figure 6: Relationship between state estimator, policy and the world (Kaelbling, Littman, and Cassandra
1998, p. 106).
Online POMDPs, unlike offline POMDPs, interleave policy construction and
execution (Figure 7) (Ross et al. 2008). The online approach tries only to find a good
local policy for the current belief state of the robot, so it considers only states reachable
from the current belief state. This focuses computation on a small number of states. Once
this local planning phase terminates, the execution phase executes the best action from
the local policy and updates the current belief based on the observation obtained.
Figure 7: Policy construction and execution in online and offline POMDPs (Ross et al. 2008, p. 671).
An improvement in performance of online POMDPs is made by the application of
Monte Carlo sampling (Silver & Veness 2010). Instead of search over all possible
observations, a search over a sampled set of observations, and beliefs reached by the
sampled observations allows for a deeper search within a given planning time.
26
5. Reactive decision making
Figure 8: Subsumption Architecture (R. Brooks 1986, p. 7).
This section reviews various architectures built for reactive decision-making. Rod
Brooks, in 1991, argued that intelligence can be achieved without any internal
representation (Brooks 1991), based on two important observations. First, the acquisition
of accurate world information through noisy sensors is difficult. Second, even if an
accurate internal representation were built, it would have to be updated constantly
because most real-world robotics deals with dynamic environments. Given these
arguments, Brooks’ approach was to replace internal representations with sensor
information and predefined rules that computed the decision reactively. Without any
internal representation, the problems of accurate knowledge acquisition and stale
knowledge are avoided. His reactive subsumption architecture is based on behaviors
organized into layers, where decisions from higher-layer behaviors override decisions
from lower-layer behaviors, as shown in Figure 8 (Brooks 1986). Behaviors at each layer
receive sensor information and make independent decisions that are sent to the actuators.
27
Toto, a mobile robot system, was built on the subsumption architecture (Brooks
1986). Its low-level behaviors enabled it to wander without collision. An intermediate
level of behavior recognized such landmarks as walls and clutters. Each time it
recognized a landmark, a behavior allocated itself as the 'place' of that particular
landmark. Behaviors that corresponded to physically adjacent landmarks had neighbor
relationship links activated between them. A graph structure was thus formed, where the
nodes of the graph were active computational elements rather than static data structures.
As the robot moved in the environment, the nodes tracked its location. Nodes became
active if they could detect that they corresponded to the robot’s current location. Thus the
robot had both a map, and a sense of where it was on the map, from a distributed
computational model.
AuRA also builds robot decision systems as a collection of parallel behaviors
(Arkin 1990). It includes motor schemas that are primitive movement behaviors, and
perceptual schemas that are behaviors to sense actively the required information from the
environment. Each motor schema recommends a possible move in the form a vector with
direction and magnitude. The sum of all such vectors represents the final decision that is
then executed by the robot.
Pengi consists of a collection of rules (Agre & Chapman 1987). These rules were
guided by the properties of the immediate situation. With its simple scheme of rules,
Pengi plays a fairly complicated game without an explicit internal representation of the
environment. Although reactive methods are fast and can handle dynamism well, the fact
that they do not exploit available knowledge to make long-range plans limits their
efficiency.
28
6. Hybrid decision making
This section reviews different approaches used for robot navigation in a dynamic
environment such as dynamic obstacles (e.g., people moving on a factory floor) and
dynamic maps (e.g., roads blocked for construction). Given a navigation plan as a
sequence of subgoals, changes in the environment could render it useless. Because not all
environmental changes can be predicted, robots have to rely on their sensors to gather
information about them. Thus the reliance on only an internal representation does not
serve well in realistic scenarios. Instead, hybrid architectures try to adapt deliberative
behavior in the light of sensor information about the changes in the world. Such
architectures are also reviewed in this section.
6.1 Obstacle Avoidance
If there are moving obstacles in the environment, the robot must avoid them (obstacle
avoidance) and continue to follow its earlier plan, revise it, or create a new plan. It is
typically assumed that the robot can sense the location and the velocity of the obstacle. A
common approach to obstacle avoidance is first to plan a path to avoid static obstacles
and then use local information about dynamic obstacles to divert minimally from that
plan. How to minimize the cost of diversion is treated as an optimization problem
(Chhugani et al. 2009). Other approaches to obstacle avoidance include reactive rules
(Sun et al. 2014) and potential fields (Khatib 1985). Although such rules are
computationally fast, they suffer from local minima and do not guarantee optimal
solutions (Koren et al. 1991).
6.2 Dynamic Maps
29
When maps are partially known or change over time, A*-generated plans can fail. Using
A* to replan whenever a plan fails is computationally expensive. To address this need for
repeated planning, incremental search algorithms like D* (Stentz 1994), focused D*
(Stentz 1995), LPA* (Koenig et al. 2004), D* lite (Koenig & Likhachev 2002), and
MPGAA* (Hernández, Baier, and Asin 2014) have been proposed. Incremental search
algorithms assume that the unknown parts of the map have no obstacles, and find a
shortest path to the goal under this assumption. The robot then follows the path. When
the robot observes new map information it adds the information to the map and, if
necessary, generates a new shortest path to the target from the current position. This new
path generation is more efficient because it uses information from the earlier planning
process.
D* search starts from the goal state and explores until the robot’s initial state is
reached. D* search maintains a list of states to be evaluated, along with the shortest
distance from each state to the goal state. The list begins with the goal state. In every
iteration, D* chooses the state closest to the goal state from the list and adds its neighbors
into the list. When the start state is added to the list, the search stops, and the path to the
goal can be found by backtracking. The robot then follows this path to the goal. When the
robot encounters a new obstacle, cost updates are propagated from the position of the
obstacle to the robot. D* is efficient because most changes in the map are detected near
the robot, and therefore require fewer cost updates.
Focused D* improves upon D* with a heuristic that focuses the propagation of
cost changes toward the robot. LPA* returns the shortest path in a changing graph, but
with its start and end positions held constant. Unlike D*, which is similar to uniform cost
30
search, LPA* is similar to A* search. Each node in LPA* uses heuristics to maintain an
estimate of the path length through that node. When edge costs change, these estimates
are updated and the updated estimates are used to re-compute the plan. D* lite adapts
LPA* to work when the initial position of the robot changes as the robot moves.
MPGAA*, introduced in 2014, is an empirically faster yet simpler algorithm than
D* lite. MPGAA* initially finds an optimal path from the robot to the goal using the A*
algorithm. After the robot’s environment changes, MPGAA* efficiently re-plans with
two main ideas. First, it uses the information from the initial A* search procedure to
improve the heuristic estimate to the goal. This makes subsequent searches faster.
Second, it saves all the previous optimal paths and tries to reuse them instead of
repeatedly planning.
Another important search variation is anytime A*, an incremental algorithm that
allows the user to query for a possibly suboptimal solution at any time during search.
This is useful for robots with real-time requirements (Likhachev et al. 2005).
6.3 Hybrid Architectures
Many hybrid architectures incorporate both sensor information and static internal
representations of the environment in decision making (Firby 1987; Connell 1992; Gat
1997; Goodwin et al. 1997; Ranganathan & Koenig 2003; Rosenblatt 1997; Arkin 1990).
Reactive planning selects plans based only on the current situation without any lookahead
(Firby 1987). Reactive planning uses a set of pre-defined procedures called Reactive
Action Planners (RAP). Every RAP is a specially designed search procedure that chooses
actions only based on the current world state, until the assigned goal state is reached. The
goal-dependent design of its search procedures makes the search more efficient than brute
31
force, while dependency on only the current state ensures that the search algorithm works
despite changes in the environment.
Many three-tier hybrid architectures have been proposed with the following
structure: a reactive tier of behaviors that directly map sensor values to actuators with no
internal state, a sequencer whose algorithms sequence the behaviors to achieve complex
objectives, and a slow deliberative planner to find a long-term plan. In 3T, for example,
the reactive layer has primitive behaviors such as wall finding, wall following, and
collision avoidance (Gat 1997). Algorithms in its sequence layer combined the reactive
behaviors to achieve complex objectives. For example, self-localization is achieved
through a combination of wall following and collision avoidance.
Figure 9: DAMN Architecture (Rosenblatt 1997, p. 2).
Rather than a top-down structure to achieve the desired symbiosis of reactive and
deliberative behavior, DAMN, shown in Figure 9, takes a horizontal approach, where
reactive and deliberative modules concurrently vote to control the robot (Rosenblatt
1997). DAMN’s slower deliberative modules vote at a slower rate, while the faster,
reactive modules vote at a faster rate. Votes from different modules are combined based
on a pre-defined set of weights given to the modules. These weights control the reactive
and deliberative tendencies of decision making in DAMN.
32
7. Human Navigation
Figure 10: (a) Graph. (b) Labelled graph. (c) Survey knowledge (Chrastil and Warren 2013, p. 2).
This section briefly reviews relevant research in psychology about how people capture
spatial information and navigate. Evidence from psychology suggests that the spatial
knowledge that underlies human navigation includes routes, topological graphs, labeled
graphs, and surveys arranged in increasing level of detail (Chrastil & Warren 2013). A
more detailed representation allows superior navigation, but it is much more difficult to
learn. A route is a list of place-action associations that links an action at each
recognizable place. Routes can guide navigation to a known location along a previously
traveled path. A topological graph, of an environment consists of a network of
recognizable place nodes linked by path edges (traversable paths between nodes). An
example appears in Figure 10(a). It represents the known connectivity of the environment
and enables travel between any two nodes in the graph. In contrast, route knowledge does
not accommodate path intersections or junctions, and restricts travel to only known routes
between specific locations.
In addition to topological information, a labeled graph, (as in Figure 10(b))
incorporates approximate local metric information about distances between known places
(edge weights) and/or angles between known paths (node labels). This local metric
33
information can help find efficient routes and detours, and can permit approximate,
distance-based shortcuts through the local integration of information. Survey knowledge
(as in Figure 10(c)) consists of information about environmental locations, their
coordinates in a Euclidian space. The metric distances between these locations can be
computed from the coordinates. If such a map were built by path integration, it would
enable direct and accurate shortcuts between known locations, instead of just a single
possible route.
Wayfinding identifies a route from the source to the destination. It has been
hypothesized that humans use different wayfinding techniques for different purposes
(Golledge 1999). For example, a trip home after work retraces a learned route, while a
trip to a new location in a known city requires a search over more detailed representations
like a labeled graph or survey knowledge. The use of different wayfinding techniques
requires that people learn multiple spatial representations simultaneously to form a spatial
collage (Tversky 1993). This ability to maintain multiple spatial representations of our
environment and use a combination of them as the situation demands is an important
feature of human navigation.
8. Challenges in Autonomous Navigation
This section identifies significant research challenges in autonomous navigation and
argues that solutions to these problems would improve autonomous navigation.
8.1 Partially ordered targets
Most robots in dynamic environments address target ordering and path planning with two
separate algorithms (Veloso et al. 2012; Berhault et al. 2003). Techniques, such as
34
auctions, generate a target order, and then graph-based dynamic planners (e.g., D* lite)
use the target order as input and generate a plan for the next target in sequence. Although
dynamic planners claim to be optimal, they assume a total order for targets. In many
robotic applications, however, navigational tasks are partially ordered. For example,
moving a box on a factory floor might require the robot to carry the box to one of many
inventory rooms, but not necessarily to a specific room. In scenarios where targets are not
fully ordered, the approach of “first order, then plan” may produce suboptimal results.
When a robot’s environment changes, it might be more efficient to reorder target points
rather than to re-plan. Current graph-based dynamic planners fail to do this.
8.2 Learning relevant information
Most state-of-the-art autonomous navigation systems have a predefined model of the
environment (Veloso et al. 2012; Petrovskaya & Thrun 2009). A path planner then finds
the shortest possible path in that model. An accurate model either requires expensive
sensors or forces the robot to work in a static environment. An ideal autonomous
navigation system should learn, from experience, how to ignore information about the
world that is irrelevant to navigation, and how to extract only the relevant information.
Plans in such an ideal navigation system would allow for inaccuracies of the world
model. If the plan fails due to an inaccurate model the robot should be able to identify the
part of the model caused the failure and improve it accordingly.
8.3 Multi-robot navigation
In many applications where redundancy is important and the task is parallelizable, a team
of robots is more appropriate than a single robot. For example, in a search and rescue
scenario a team of robots might be better able to survive harsh conditions. There, failure
35
of a single robot does not mean failure of the entire mission, because the job of a failed
robot can be transferred to a working robot. Although A* is effective in moderate-sized
maps, multi-robot planning even in a static environment is computationally expensive
and does not scale well. The dynamic version of multi-robot planning is even more
difficult.
Another problem in multi-robot navigation is the selection of the correct team size
for the task at hand. For example, on a factory floor with multiple homogenous robots
that move material, too many robots can cause congestion and slow the process, while too
few can result in long task completion time.
8.4 Human Interaction
Current autonomous navigation systems that interact with humans either solicit input
from the user or communicate the result of a computation (Petrovskaya & Thrun 2009).
In applications of autonomous navigation, such as autonomous driving, people often have
useful navigation information beyond the destination location. For example, a passenger
in an autonomously driven vehicle might know that a particular street in the city is
blocked or likely to be blocked. As another example, a passenger might wish to avoid a
neighborhood that she considers unsafe. Another example is in mission-critical
applications, such as search and rescue, where additional human knowledge can direct the
robots in unforeseen situations. All these examples demonstrate the usefulness of a
navigation system that can be easily understood by people and can incorporate human
intuition and understanding for better decision making.
8.5 Changing environments
36
Many existing autonomous navigation systems are hybrids that rely on a mixture of
reactive and deliberative approaches. The reactive components of the architecture ensure
that the system is responsive in a dynamic world, whereas the deliberative components
seek to harness potential efficiency gains from long-term planning. The balance between
the two approaches is then determined manually, based on the nature of the domain.
Domains that experience rapid changes might be biased toward reactive approaches,
whereas domains that experience limited changes might be biased towards deliberative
approaches.
In many domains where robots could be deployed, however, (e.g., in airports to
move cargo, on factory floors to move goods, or on roads for autonomous driving) there
are periods of time when the environment changes are rapid and others periods when the
environment remains static. For example, an optimal strategy for rush hour may be
different from an optimal strategy during other, less busy hours. The ability to focus on
deliberative decision making when the world is static and reactive decision making when
the world is dynamic remains an important challenge for navigation.
9. Conclusion
Autonomous navigation by robots helps automate physical tasks that require robots to
move to different places in an environment. This paper has reviewed different approaches
to automate decision making for robot navigation, with the acquisition and use of spatial
knowledge as the central theme. Reactive methods’ dependence on continuous sensor
data allows them to adapt quickly to changes in the environments, while deliberative
methods can make long term decisions which leads to significant improvements in
efficiency. Hybrid approaches, attempt to capitalize on the best of both. Research on
37
human navigation is relevant because many robot applications are in domains where
humans navigate with relative ease. The limitations in the current decision-making
approaches for navigation restrict the widespread use of mobile autonomous robots, and
present interesting research challenges.
10. References
Agre, P.E. & Chapman, D., 1987. Pengi : Implementation of a Theory of Activity. In Proceedings of AAAI 1987. pp. 268–272.
Arkin, R.C., 1990. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems, 6, pp.105–122.
Berhault, M., Huang, H., Keskinocak, P., Koenig, S., Elmaghraby, W., Griffin, P. & Kleywegt, A., 2003. Robot exploration with combinatorial auctions. In Proceedings of 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp.1957–1962.
Blum, A., 1997. Fast planning through planning graph analysis. Artificial Intelligence, 90, pp.281–300.
Botea, A., Müller, M. & Schaeffer, J., 2004. Near optimal hierarchical path-finding. Journal of game development, pp.1–30.
Brooks, R., 1986. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2(1), pp.14–23.
Brooks, R.A., 1991. Intelligence without representation. Artificial Intelligence, 47(1-3), pp.139–159.
Chapman, D., 1987. Planning for conjunctive goals. Artificial Intelligence, 32, pp.333–377.
Chatila, R. & Laumond, J., 1985. Position referencing and consistent world modeling for mobile robots. In Proceedings of IEEE International Conference on Robotics and Automation 1985, 2, pp.138 – 145.
Chhugani, J., Kim, C., Satish, N., Lin, M., Manocha, D. & Dubey, P., 2009. ClearPath: Highly Parallel Collision Avoidance for Multi-Agent Simulation. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp.177–187.
38
Choset, H.M., 2005. Principles of robot motion: theory, algorithms, and implementation, MIT press.
Chrastil, E. & Warren, W., 2013. From cognitive maps to cognitive graphs. PloS one, 9(11), e112544.
Connell, J.H., 1992. SSS: a hybrid architecture applied to robot navigation. In Proceedings of 1992 IEEE International Conference on Robotics and Automation, pp.2719–2724.
Davids, A., 2002. Urban search and rescue robots: From tragedy to technology. IEEE Intelligent Systems and Their Applications, 17(2), pp.81–83.
Elfes, A., 1989. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6), pp.46–57.
Ferguson, D., Kalra, N. & Stentz, A., 2006. Replanning with RRTs. In Proceedings of IEEE International Conference on Robotics and Automation 2006. pp. 1243–1248.
Fikes, R.E. & Nilsson, N.J., 1971. Strips: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2, pp.189–208.
Firby, R., 1987. An investigation into reactive planning in complex domains. In Proceedings of AAAI - 87. pp. 202–206.
Forlizzi, J. & Disalvo, C., 2006. Service Robots in the Domestic Environment: A Study of the Roomba Vacuum in the Home. In Design. ACM Press, pp. 258–265.
Gat, E., 1997. On Three-Layer Architectures. Artificial intelligence and mobile robots, pp.195–210.
Gerevini, A. & Serina, I., 2003. Planning as Propositional CSP: From Walksat to Local Search Techniques for Action Graphs. Constraints, 8, pp.389–413.
Golledge, R.G., 1999. Human Wayfinding and Cognitive Maps. In Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes. pp. 5–45.
Goodwin, R., Goodwin, R., Simmons, R.G., Simmons, R.G., Durrant-Whyte, H.F., Durrant-Whyte, H.F., Koenig, S., Koenig, S., O’Sullivan, J. & O’Sullivan, J., 1997. A layered architecture for office delivery robots. In Proceedings of International conference on Autonomous agents 1997, pp.245–252.
Helmert, M., 2006. The fast downward planning system. Journal of Artificial Intelligence Research, 26, pp.191–246.
39
Hernández, C., Baier, J.A. & Asin, R., 2014. Making A* Run Faster than D*-Lite for Path-Planning in Partially Known Terrain. In Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS), pp.504-508.
Hoffmann, J. & Nebel, B., 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14, pp.263–312.
Huang, L., 2009. Velocity planning for a mobile robot to track a moving target - a potential field approach. Robot and Autonomous Systems, 57(1), pp.55–63.
Kaelbling, L.P., Littman, M.L. & Cassandra, A.R., 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2), pp.99–134.
Kautz, H., McAllester, D. & Selman, B., 1996. Encoding plans in propositional logic. In Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR-96). pp. 374–384.
Kavraki, L.E., Švestka, P., Latombe, J.C. & Overmars, M.H., 1996. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), pp.566–580.
Khatib, O., 1985. Real-time obstacle avoidance for manipulators and mobile robots. In Proceedings of IEEE International Conference on Robotics and Automation 1985, 2, pp.41–59.
Koenig, S. & Likhachev, M., 2002. D* Lite. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp.476–483.
Koenig, S., Likhachev, M. & Furcy, D., 2004. Lifelong Planning A*. Artificial Intelligence, 155, pp.93–146.
Koren, Y., Member, S., Borenstein, J. & Arbor, A., 1991. Potential Field Methods and Their Inherent Limitations for Mobile Robot Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation, pp.1398–1404.
Korf, R.E., 1996. Artificial Intelligence Search Algorithms M. J. Atallah, ed., In Algorithms and Theory of Computation Handbook.
Kuffner, J.J. & LaValle, S.M., 2000. RRT-connect: An efficient approach to single-query path planning. In Proceedings of IEEE International Conference on Robotics and Automation 2000, pp. 995–1001.
Kuipers, B. & Byun, Y.-T., 1991. A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations. Robotics and Autonomous Systems, 8(1-2), pp.47–63.
40
Likhachev, M., Ferguson, D., Gordon, G., Stentz, A. & Thrun, S., 2005. Anytime Dynamic A *: An Anytime , Replanning Algorithm. Science, pp.262–271.
Lowe, D.G., 1999. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision. pp. 1150–1157.
Maes, P. & Brooks, R., 1990. Learning to coordinate behaviors. In Proceedings of the Eighth National Conference on Artificial Intelligence. AAAI Press, pp. 1–7.
Maimone, M., Cheng, Y. & Matthies, L., 2007. Two years of visual odometry on the Mars Exploration Rovers. Journal of Field Robotics, 24(3), pp.169–186.
Mataric, M.J., 1992. Integration of Representation Into Goal-driven Behavior-based Robots. In Proceedings 1992 IEEE International Conference on Robotics and Automation. pp. 304–312.
Montemerlo, M., Thrun, S., Koller, D. & Wegbreit, B., 2002. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of 8th National Conference on Artificial Intelligence/14th Conference on Innovative Applications of Artificial Intelligence 1999, pp. 593–598.
Nguyen, X. & Kambhampati, S., 2001. Reviving partial order planning. In Proceedings of IJCAI International Joint Conference on Artificial Intelligence- 2001, pp.459–464.
Nilsson, N.J., 1980. Principles of Artificial Intelligence, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Penberthy, J.S. & Weld, D., 1992. UCPOP: A Sound, Complete, Partial Order Planner for ADL. In Proceedings of the third international conference on knowledge representation and reasoning, pp.103–114.
Petrovskaya, A. & Thrun, S., 2009. Model based vehicle detection and tracking for autonomous urban driving. Autonomous Robots, 26(2-3), pp.123–139.
Ram, A., Boone, G., Arkin, R. & Pearce, M., 1994. Using Genetic Algorithms to Learn Reactive Control Parameters for Autonomous Robotic Navigation. Adaptive Behavior, 2(3), pp.277–305.
Ranganathan, A. & Koenig, S., 2003. A reactive robot architecture with planning on demand. In Proceedings of 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp.1462–1468.
Richter, S. & Westphal, M., 2010. The LAMA Planner: Guiding Cost-based Anytime Planning with Landmarks. Journal of Artificial Intelligence Research., 39(1), pp.127–177.
41
Rosenblatt, J.K., 1997. DAMN: a distributed architecture for mobile navigation. Journal of Experimental & Theoretical Artificial Intelligence, 9, pp.339–360.
Ross, S., Pineau, J., Paquet, S. & Chaib-draa, B., 2008. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, pp.663–704.
Sacerdoti, E.D., 1974. Planning in a hierarchy of abstraction spaces. Artificial intelligence, 5(2), pp.115–135.
Sacerdoti, E.D., 1975. The nonlinear nature of plans. In Proceedings of the 4th International Joint Conference on Artificial Intelligence. IJCAI’75. Morgan Kaufmann Publishers Inc., pp. 206–214.
Se, S., Lowe, D. & Little, J., 2001. Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features. In Proceedings of IEEE International Conference on Robotics and Automation 2001. pp. 2051 – 2058.
Silver, D. & Veness, J., 2010. Monte-Carlo Planning in Large POMDPs. Advances in neural information processing systems. pp. 2164–2172.
Smith, R., Self, M. & Cheeseman, P., 1987. Estimating uncertain spatial relationships in robotics. In Proceedings of IEEE International Conference on Robotics and Automation 1987, 4, pp.435–461.
Stentz, A., 1994. Optimal and efficient path planning for partially-known environments. In Proceedings of IEEE International Conference on Robotics and Automation 1994, pp.3310–3317.
Stentz, A., 1995. The focussed D* algorithm for real-time replanning. In Proceedings of 14th International Joint Conference on Artificial intelligence (IJCAI). Morgan Kaufmann Publishers Inc., pp. 1652–1659.
Sun, D., Kleiner, A. & Nebel, B., 2014. Behavior-based Multi-Robot Collision Avoidance. In Proceedings of IEEE International Conference on Robotics and Automation 2014, pp. 1668–1673.
Thrun, S., 1998. Learning metric-topological maps for indoor mobile robot navigation. Artificial Intelligence, 99(1), pp.21–71.
Thrun, S., Fox, D., Burgard, W. & Dellaert, F., 2001. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1-2), pp.99–141.
Tversky, B., 1993. Cognitive maps, cognitive collages, and spatial mental models. Spatial Information Theory A Theoretical Basis for GIS, pp.14–24.
42
Veloso, M., Biswas, J., Coltin, B., Rosenthal, S., Kollar, T., Mericli, C., Samadi, M., Brandao, S. & Ventura, R., 2012. CoBots: Collaborative robots servicing multi-floor buildings. In Proceedings of IEEE International Conference on Intelligent Robots and Systems, pp. 5446–5447.
Zucker, M., Kuffner, J. & Branicky, M., 2007. Multipartite RRTs for rapid replanning in dynamic environments. In Proceedings of IEEE International Conference on Robotics and Automation 2007, pp. 1603–1609.