L04 - SEARCH (MORE SEARCH STRATEGIES)
Outline of this Lecture: Informed Search Strategies
- Best-first search
- Greedy best-first search
- A* search
- Heuristics
- Local search algorithms
- Hill-climbing search
- Simulated annealing search
- Beam search
- Local beam search
Informed Search
Relies on additional knowledge about the problem or domain, frequently expressed through heuristics ("rules of thumb")
Used to distinguish more promising paths towards a goal; may be misled, depending on the quality of the heuristic
In general, performs much better than uninformed search, but is frequently still exponential in time and space for realistic problems
Review: Tree search
Tree search algorithm:
function TREE-SEARCH(problem, fringe) returns a solution, or failure
    fringe ← INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
    loop do
        if fringe is empty then return failure
        node ← REMOVE-FRONT(fringe)
        if GOAL-TEST[problem] applied to STATE(node) succeeds then return node
        fringe ← INSERTALL(EXPAND(node, problem), fringe)
A search strategy is defined by picking the order of node expansion
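The tree-search loop above can be sketched in Python. This is a minimal illustration, not the lecture's own code; `Problem` is a hypothetical interface with `initial`, `goal`, and `successors` attributes (names assumed for illustration), and a FIFO fringe is used, which makes this instance breadth-first:

```python
from collections import deque

class Problem:
    """Minimal problem interface (names assumed for illustration)."""
    def __init__(self, initial, goal, successors):
        self.initial, self.goal, self.successors = initial, goal, successors

def tree_search(problem):
    """Generic tree search with a FIFO fringe (i.e. breadth-first);
    swapping the fringe discipline yields the other strategies."""
    fringe = deque([problem.initial])
    while fringe:
        node = fringe.popleft()                  # Remove-Front
        if node == problem.goal:                 # Goal-Test
            return node
        fringe.extend(problem.successors(node))  # InsertAll(Expand(...))
    return None                                  # failure: fringe exhausted

# tiny example: reach 5 from 0 by repeatedly adding 1 or 2
p = Problem(0, 5, lambda n: [n + 1, n + 2] if n < 5 else [])
```

The only difference between the uninformed strategies and the informed ones below is the order in which the fringe hands back nodes.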
Best-First search
Relies on an evaluation function f(n) that estimates the "desirability" of expanding each node; the most desirable unexpanded node is expanded first
A family of search methods with various evaluation functions; the evaluation function usually gives an estimate of the distance to the goal, often referred to as a heuristic in this context
The node with the lowest value is expanded first
The name is a little misleading: the node with the lowest value for the evaluation function is not necessarily one that is on an optimal path to a goal; if we really knew which node is the best, there would be no need to search
The following is an algorithm for Best-First search:
function BEST-FIRST-SEARCH(problem, EVAL-FN) returns solution
    fringe ← queue with nodes ordered by EVAL-FN
    return TREE-SEARCH(problem, fringe)
Special cases: greedy best-first search, A* search
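A minimal Python sketch of best-first search using a priority-queue fringe; the toy graph and heuristic values below are illustrative assumptions, not taken from the lecture:

```python
import heapq

def best_first_search(start, goal, successors, eval_fn):
    """Always expand the unexpanded node with the lowest eval_fn value."""
    fringe = [(eval_fn(start), start, [start])]   # entries: (f, state, path)
    while fringe:
        f, state, path = heapq.heappop(fringe)    # node with lowest f
        if state == goal:
            return path
        for succ in successors(state):
            heapq.heappush(fringe, (eval_fn(succ), succ, path + [succ]))
    return None                                   # failure

# toy graph and heuristic (illustrative values only)
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
path = best_first_search('A', 'D', graph.__getitem__, h.__getitem__)
```

With these values the search prefers C (h = 1) over B (h = 2) and returns the path A → C → D.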
Romania with step costs in km
Greedy Best-First search
Minimizes the estimated cost to a goal: expands the node that appears to be closest to a goal
Utilizes a heuristic function as evaluation function: f(n) = h(n) = estimated cost from the current node to a goal
Heuristic functions are problem-specific; for route-finding and similar problems, often the straight-line distance is used
For example, hSLD(n) = straight-line distance from n to Bucharest
Greedy best-first search is often better than depth-first search, although its worst-case time and space complexities are equal or worse
The following is an algorithm for Greedy Best-First search:
function GREEDY-SEARCH(problem) returns solution
    return BEST-FIRST-SEARCH(problem, h)
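Greedy search on the Romania route-finding problem can be sketched as follows. The road connections and straight-line distances to Bucharest are those of the standard AIMA Romania map (only a subset of cities is included here); a visited set is added to avoid the loop problem discussed below:

```python
import heapq

# Romania road-map subset and straight-line distances (km) to Bucharest,
# as in the standard AIMA map
roads = {
    'Arad': ['Sibiu', 'Timisoara', 'Zerind'],
    'Sibiu': ['Arad', 'Fagaras', 'Oradea', 'Rimnicu Vilcea'],
    'Fagaras': ['Sibiu', 'Bucharest'],
    'Rimnicu Vilcea': ['Sibiu', 'Pitesti', 'Craiova'],
    'Pitesti': ['Rimnicu Vilcea', 'Bucharest', 'Craiova'],
    'Timisoara': ['Arad', 'Lugoj'],
    'Zerind': ['Arad', 'Oradea'],
    'Oradea': ['Zerind', 'Sibiu'],
    'Craiova': ['Rimnicu Vilcea', 'Pitesti'],
    'Lugoj': ['Timisoara'],
    'Bucharest': [],
}
h_sld = {'Arad': 366, 'Sibiu': 253, 'Fagaras': 176, 'Rimnicu Vilcea': 193,
         'Pitesti': 100, 'Timisoara': 329, 'Zerind': 374, 'Oradea': 380,
         'Craiova': 160, 'Lugoj': 244, 'Bucharest': 0}

def greedy_search(start, goal):
    """Best-first search with f(n) = h(n): always expand the city
    that appears closest (straight-line) to the goal."""
    fringe = [(h_sld[start], start, [start])]
    visited = set()                    # guards against Iasi-Neamt-style loops
    while fringe:
        _, city, path = heapq.heappop(fringe)
        if city == goal:
            return path
        if city in visited:
            continue
        visited.add(city)
        for nxt in roads[city]:
            heapq.heappush(fringe, (h_sld[nxt], nxt, path + [nxt]))
    return None

route = greedy_search('Arad', 'Bucharest')
```

The search follows Arad → Sibiu → Fagaras → Bucharest, which is not optimal: the route through Rimnicu Vilcea and Pitesti is shorter, illustrating why greedy best-first search is not optimal.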
Greedy best-first search example – first step
Greedy best-first search example – second step
Greedy best-first search example – third step
Greedy best-first search example – fourth step
Properties of greedy best-first search
Complete? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt
Time? O(b^m), but a good heuristic can give dramatic improvement
Space? O(b^m) – keeps all nodes in memory
Optimal? No
A* search
Idea: avoid expanding paths that are already expensive
Combines greedy and uniform-cost search to find the (estimated) cheapest path through the current node
Evaluation function: f(n) = g(n) + h(n), the estimated total cost of the path through n to the goal
g(n) = cost so far to reach n (path cost up to n)
h(n) = estimated cost from n to the goal
The heuristic must be admissible: h(n) ≤ h*(n), where h*(n) is the true cost from n to the goal (also require h(n) ≥ 0, so h(G) = 0 for any goal G); it must never overestimate the cost to reach the goal (e.g., hSLD(n) never overestimates the actual road distance)
It is a very good search method, but with complexity problems
The following is an algorithm for A* search:
function A*-SEARCH(problem) returns solution
    return BEST-FIRST-SEARCH(problem, g + h)
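A* on the same Romania problem can be sketched as below. Road distances and straight-line distances are those of the standard AIMA map (subset of cities only); the `best_g` table is an implementation detail added here to skip re-expansions along worse paths:

```python
import heapq

# Road distances (km) and straight-line distances to Bucharest,
# as in the standard AIMA Romania map (subset)
roads = {
    'Arad': [('Sibiu', 140), ('Timisoara', 118), ('Zerind', 75)],
    'Sibiu': [('Arad', 140), ('Fagaras', 99), ('Oradea', 151),
              ('Rimnicu Vilcea', 80)],
    'Fagaras': [('Sibiu', 99), ('Bucharest', 211)],
    'Rimnicu Vilcea': [('Sibiu', 80), ('Pitesti', 97), ('Craiova', 146)],
    'Pitesti': [('Rimnicu Vilcea', 97), ('Bucharest', 101), ('Craiova', 138)],
    'Timisoara': [('Arad', 118), ('Lugoj', 111)],
    'Zerind': [('Arad', 75), ('Oradea', 71)],
    'Oradea': [('Zerind', 71), ('Sibiu', 151)],
    'Craiova': [('Rimnicu Vilcea', 146), ('Pitesti', 138)],
    'Lugoj': [('Timisoara', 111)],
    'Bucharest': [],
}
h_sld = {'Arad': 366, 'Sibiu': 253, 'Fagaras': 176, 'Rimnicu Vilcea': 193,
         'Pitesti': 100, 'Timisoara': 329, 'Zerind': 374, 'Oradea': 380,
         'Craiova': 160, 'Lugoj': 244, 'Bucharest': 0}

def a_star(start, goal):
    """Expand nodes in order of f(n) = g(n) + h(n)."""
    fringe = [(h_sld[start], 0, start, [start])]   # (f, g, state, path)
    best_g = {}                                    # cheapest g seen per city
    while fringe:
        f, g, city, path = heapq.heappop(fringe)
        if city == goal:
            return path, g
        if g >= best_g.get(city, float('inf')):
            continue                               # already reached cheaper
        best_g[city] = g
        for nxt, cost in roads[city]:
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h_sld[nxt], g2, nxt, path + [nxt]))
    return None, float('inf')

route, cost = a_star('Arad', 'Bucharest')
```

Unlike greedy search, A* finds the optimal route Arad → Sibiu → Rimnicu Vilcea → Pitesti → Bucharest (418 km), even though the Fagaras path looks closer at first.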
A* search example – first step
A* search example – second step
A* search example – third step
A* search example – fourth step
A* search example – fifth step
A* search example – sixth step
Admissible heuristics
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic Example: hSLD(n) (never overestimates the actual road distance)
Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal
Optimality of A* (proof)
Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n be an unexpanded node in the fringe such that n is on a shortest path to an optimal goal G.
We then have:
f(G2) = g(G2)    since h(G2) = 0
g(G2) > g(G)     since G2 is suboptimal
f(G) = g(G)      since h(G) = 0
f(G2) > f(G)     from the above
Also, since h is admissible, h(n) ≤ h*(n), and since n lies on a shortest path to G, g(n) + h*(n) = g(G); therefore
f(n) = g(n) + h(n) ≤ g(n) + h*(n) = g(G) = f(G)
Hence f(G2) > f(G) ≥ f(n), and A* will never select G2 for expansion
Consistent heuristics
A heuristic is consistent if for every node n and every successor n' of n generated by any action a,
h(n) ≤ c(n, a, n') + h(n')
If h is consistent, we have
f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') ≥ g(n) + h(n) = f(n)
that is, f(n) is non-decreasing along any path.
Theorem: If h(n) is consistent, A* using GRAPH-SEARCH is optimal
Optimality of A* search
A* expands nodes in order of increasing f value
Gradually adds "f-contours" of nodes: contour i contains all nodes with f = fi, where fi < fi+1
A* will find the optimal solution: the first solution found is the optimal one
A* is optimally efficient: no other algorithm is guaranteed to expand fewer nodes than A*
A* is not always "the best" algorithm: optimality refers to the expansion of nodes, and other criteria might be more relevant
A* generates and keeps all nodes in memory; this is improved in variations of A*
Complexity of A*
The number of nodes within the goal contour of the search space is still exponential in the length of the solution; better than other algorithms, but still problematic
Frequently, space complexity is more severe than time complexity: A* keeps all generated nodes in memory
Properties of A* search
The value of f never decreases along any path starting from the initial node
This is also known as monotonicity of the function; almost all admissible heuristics show monotonicity, and those that don't can be modified through minor changes
This property can be used to draw contours: regions where the f-cost is below a certain threshold
With uniform-cost search (h = 0), the contours are circular; the better the heuristic h, the narrower the contour around the optimal path
Complete? Yes (unless there are infinitely many nodes with f ≤ f(G))
Time? Exponential, O(b^d)
Space? Keeps all nodes in memory, O(b^d)
Optimal? Yes
Admissible heuristics
For example, for the 8-puzzle:
h1(n) = number of misplaced tiles
h2(n) = total Manhattan distance (that is, the sum of the horizontal and vertical distances of each tile from its desired location)
h1(S) = 8
h2(S) = 3+1+2+2+2+3+3+2 = 18
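Both heuristics are easy to compute directly. The start state S below is assumed to be the standard AIMA 8-puzzle example (read row by row, with 0 marking the blank), which yields exactly the values above:

```python
def h1(state, goal):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Total Manhattan distance of each tile from its goal square."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                      # skip the blank
        j = goal.index(tile)              # goal position of this tile
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

# start state S and goal configuration (standard AIMA example, assumed)
S    = (7, 2, 4, 5, 0, 6, 8, 3, 1)
goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
```

For this state, h1(S) = 8 and h2(S) = 18, matching the figures quoted above.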
Dominance
If h2(n) ≥ h1(n) for all n (both admissible) then h2 dominates h1
h2 is better for search
Typical search costs (average number of nodes expanded):
d = 12: IDS = 3,644,035 nodes; A*(h1) = 227 nodes; A*(h2) = 73 nodes
d = 24: IDS = too many nodes; A*(h1) = 39,135 nodes; A*(h2) = 1,641 nodes
Relaxed problems
A problem with fewer restrictions on the actions is called a relaxed problem
The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution
Heuristics for Searching
For many tasks, a good heuristic is the key to finding a solution: it prunes the search space and guides the search towards the goal
Relaxed problems have fewer restrictions on the successor function (operators); the exact solution of a relaxed problem may be a good heuristic for the original problem
8-Puzzle Heuristics
Level of difficulty: around 20 steps for a typical solution, with a branching factor of about 3
Exhaustive search would examine about 3^20 ≈ 3.5 × 10^9 states, although there are only 9!/2 = 181,440 distinct reachable states (distinct arrangements of the 9 squares)
Candidates for heuristic functions: number of tiles in the wrong position; sum of distances of the tiles from their goal positions (city-block or Manhattan distance)
Generation of heuristics is possible from formal specifications
Local Search and Optimization
For some problem classes, it is sufficient to find a solution; the path to the solution is not relevant
Memory requirements can be dramatically relaxed by modifying the current state: all previous states can be discarded
Since only information about the current state is kept, such methods are called local
Local search algorithms
In many optimization problems, the path to the goal is irrelevant; the goal state itself is the solution
State space = set of "complete" configurations
Find a configuration satisfying constraints, e.g., n-queens
In such cases, we can use local search algorithms: keep a single "current" state and try to improve it
Example: n-queens – put n queens on an n × n board with no two queens on the same row, column, or diagonal
Iterative Improvement Search
For some problems, the state description provides all the information required for a solution; path costs become irrelevant
A global maximum or minimum corresponds to the optimal solution
Iterative improvement algorithms start with some configuration and try modifications to improve its quality
8-queens: number of un-attacked queens; VLSI layout: total wire length
Analogy: state space as a landscape with hills and valleys
Hill-climbing search
"Like climbing Everest in thick fog with amnesia"
Continually moves uphill, i.e., in the direction of increasing value of the evaluation function; gradient descent search is a variation that moves downhill
Very simple strategy with low space requirements: stores only the current state and its evaluation, no search tree
There are some problems:
local maxima – the algorithm can't go higher, but is not at a satisfactory solution
plateaus – areas where the evaluation function is flat
ridges – the search may oscillate and advance only slowly
General problem: depending on the initial state, hill climbing can get stuck in local maxima
Hill-climbing search: 8-queens problem
h = number of pairs of queens that are attacking each other, either directly or indirectly
h = 17 for the above state
A local minimum with h = 1
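Hill climbing on the 8-queens problem, using h = number of attacking pairs, can be sketched as below. This is an illustrative steepest-descent variant (we minimize h rather than maximize its negation); the board representation, one queen per column, is an assumption:

```python
import random

def attacking_pairs(board):
    """h = number of pairs of queens attacking each other.
    board[c] is the row of the queen in column c (one queen per column)."""
    n, h = len(board), 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            if (board[c1] == board[c2] or               # same row
                    abs(board[c1] - board[c2]) == c2 - c1):  # same diagonal
                h += 1
    return h

def hill_climb(board):
    """Steepest descent on h: move one queen within its column to the
    neighbour with the lowest h; stop at a local (possibly non-global)
    minimum, keeping only the current state in memory."""
    board = list(board)
    while True:
        current = attacking_pairs(board)
        best, best_move = current, None
        for col in range(len(board)):
            original = board[col]
            for row in range(len(board)):
                if row == original:
                    continue
                board[col] = row                        # try this move
                h = attacking_pairs(board)
                if h < best:
                    best, best_move = h, (col, row)
            board[col] = original                       # undo trial move
        if best_move is None:
            return board, current     # stuck: local minimum (or a solution)
        board[best_move[0]] = best_move[1]

random.seed(0)
start = [random.randrange(8) for _ in range(8)]
final, h_final = hill_climb(start)
```

From a random start the loop strictly decreases h at every step and therefore always terminates, but it may stop at a state like the h = 1 local minimum mentioned above rather than at h = 0.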
Simulated annealing search
Idea: escape local maxima by allowing some "bad" moves, but gradually decrease their frequency
Similar to hill-climbing, but allows some downhill movement: a random move is made instead of the best move
Depends on two parameters: ΔE, the energy difference between moves, and T, the temperature; the temperature is slowly lowered, making bad moves less likely
Analogy to annealing: the gradual cooling of a liquid until it freezes
Will find the global optimum if the temperature is lowered slowly enough
Applied to routing and scheduling problems
Properties of simulated annealing search
One can prove: If T decreases slowly enough, then simulated annealing search will find a global optimum with probability approaching 1
Widely used in VLSI layout, airline scheduling, etc
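The acceptance rule and cooling schedule can be sketched as below. The toy energy landscape, the geometric cooling schedule, and the parameter values (t0, cooling rate, t_min) are all illustrative assumptions; real applications tune these carefully:

```python
import math
import random

def simulated_annealing(state, energy, neighbour, t0=10.0, cooling=0.995,
                        t_min=1e-3):
    """Accept every downhill move; accept an uphill move with
    probability e^(-dE/T), where the temperature T is slowly lowered."""
    current = best = state
    t = t0
    while t > t_min:
        nxt = neighbour(current)
        delta_e = energy(nxt) - energy(current)   # dE < 0 means improvement
        if delta_e < 0 or random.random() < math.exp(-delta_e / t):
            current = nxt                          # move (possibly uphill)
        if energy(current) < energy(best):
            best = current                         # remember best state seen
        t *= cooling                               # geometric cooling schedule
    return best

# toy 1-D energy landscape: a local minimum near x = 2, global near x = -3
def energy(x):
    return (x - 2) ** 2 * (x + 3) ** 2 + 0.5 * x

def neighbour(x):
    return x + random.uniform(-0.5, 0.5)           # small random move

random.seed(1)
best = simulated_annealing(0.0, energy, neighbour)
```

Early on, when T is large, almost any move is accepted, letting the search wander out of the basin around x = 2; as T falls the process degenerates into hill climbing around whatever basin it occupies.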
Beam search
Beam search is a heuristic search algorithm that is an optimization of best-first search that reduces its memory requirement. Best-first search is a graph search which orders all partial solutions (states) according to some heuristic which attempts to predict how close a partial solution is to a complete solution (goal state). In beam search, only a predetermined number of best partial solutions are kept as candidates.
Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level, sorting them in order of increasing heuristic values. However, it only stores a predetermined number of states at each level (called the beam width). The smaller the beam width, the more states are pruned. Therefore, with an infinite beam width, no states are pruned and beam search is identical to breadth-first search. The beam width bounds the memory required to perform the search, at the expense of risking completeness (possibility that it will not terminate) and optimality (possibility that it will not find the best solution). The reason for this risk is that the goal state could potentially be pruned.
The beam width can either be fixed or variable. In a fixed beam width, a maximum number of successor states is kept. In a variable beam width, a threshold is set around the current best state. All states that fall outside this threshold are discarded. Thus, in places where the best path is obvious, a minimal number of states is searched. In places where the best path is ambiguous, many paths will be searched.
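A fixed-width beam search over a toy state space can be sketched as follows; the successor function and heuristic are illustrative assumptions, and a visited set is added so states are not regenerated:

```python
import heapq

def beam_search(start, goal, successors, h, beam_width=2):
    """Breadth-first, level by level, keeping only the beam_width states
    with the lowest heuristic values at each level."""
    level = [start]
    visited = {start}
    while level:
        if goal in level:
            return goal
        candidates = []
        for state in level:
            for s in successors(state):
                if s not in visited:
                    visited.add(s)
                    candidates.append(s)
        # prune: keep only the beam_width most promising successors
        level = heapq.nsmallest(beam_width, candidates, key=h)
    return None    # the goal may have been pruned: beam search is incomplete

# toy example: reach 0 from 9 by subtracting 1, 2 or 3; h = distance to 0
found = beam_search(9, 0,
                    lambda n: [n - 1, n - 2, n - 3] if n > 0 else [],
                    h=abs, beam_width=2)
```

With an infinite `beam_width` the pruning step keeps everything and this reduces to breadth-first search, as described above.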
Local beam search
A variation of beam search: a path-based method that looks at several paths "around" the current one
Keeps track of k states rather than just one; information between the states can be shared, so the search moves to the most promising areas
Start with k randomly generated states
At each iteration, all the successors of all k states are generated
If any one is a goal state, stop; otherwise select the k best successors from the complete list and repeat
Stochastic local beam search selects the k successor states randomly, with a probability determined by the evaluation function
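The k-state loop above can be sketched as follows; the toy problem (climb from random integers towards 10), the value function, and the iteration cap are illustrative assumptions:

```python
import random

def local_beam_search(k, random_state, successors, value, goal_test,
                      max_iters=100):
    """Keep k states; each iteration pools all successors of all k states
    and keeps the k best, so information is shared through the pool."""
    states = [random_state() for _ in range(k)]   # k random starting states
    for _ in range(max_iters):
        for s in states:
            if goal_test(s):
                return s
        pool = [s2 for s in states for s2 in successors(s)]
        if not pool:
            break                                  # no successors anywhere
        pool.sort(key=value, reverse=True)         # best successors first
        states = pool[:k]                          # keep the k best overall
    return None

# toy: reach exactly 10 from random starts by adding 1 or 2;
# value rewards states closest to 10
random.seed(2)
found = local_beam_search(
    k=3,
    random_state=lambda: random.randint(0, 5),
    successors=lambda n: [n + 1, n + 2] if n < 10 else [],
    value=lambda n: -abs(n - 10),
    goal_test=lambda n: n == 10,
)
```

The crucial difference from running k independent hill climbs is the shared pool: all k slots can migrate into the single most promising region. The stochastic variant would replace the deterministic `pool[:k]` selection with sampling weighted by `value`.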