External Memory Value Iteration
Stefan Edelkamp, Shahid Jabbar
Chair for Programming Systems, University of Dortmund, Germany
Blai Bonet
Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela
External Memory Value Iteration
Edelkamp, Jabbar & Bonet 2
Motivation: Reinforcement Learning
Aim: Write Controller to act successfully in the environment
Minimize Cost/Maximize Rewards
[Figure: Agent and Environment interaction loop, labeled a_t, c_t, s_t]
Motivation: External Reinforcement Learning
Cover deterministic, non-deterministic, probabilistic environments (and games)
But what to do if the agent’s state space or policy space is too large to be computed and stored in RAM?
Disk Space is Cheap (500 GB ~ 100$)
External Memory Algorithm
Overview
Uniform Search Model
Internal Memory Value Iteration
Existing External Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook
Uniform Search Model:
Deterministic
Non-Deterministic
Probabilistic
Internal Memory Value Iteration
ε-Optimal for solving MDPs, AND/OR trees…
Problem: Needs to hold the whole state space in main memory.
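To make the memory problem concrete, here is a minimal sketch of in-memory value iteration for a cost-based MDP. The function names (`actions`, `outcomes`, `cost`) and the three-state toy chain are assumptions made for illustration, not taken from the slides. Note that the dictionary `h` holds one value per state, which is exactly what stops fitting in RAM for large state spaces.

```python
def value_iteration(states, goals, actions, outcomes, cost, epsilon=1e-4):
    """Synchronous value iteration; returns a dict state -> expected cost-to-goal."""
    h = {s: 0.0 for s in states}          # the whole value table lives in RAM
    while True:
        new_h = {}
        for s in states:
            if s in goals:
                new_h[s] = 0.0            # goal states cost nothing
                continue
            # Bellman backup: cheapest action by immediate cost plus expected successor value
            new_h[s] = min(
                cost(s, a) + sum(p * h[t] for t, p in outcomes(s, a))
                for a in actions(s)
            )
        residual = max(abs(new_h[s] - h[s]) for s in states)
        h = new_h
        if residual < epsilon:            # stop once no value changes by epsilon or more
            return h


# Hypothetical toy chain 0 -> 1 -> 2 (goal); each move succeeds with probability 0.9.
STATES = [0, 1, 2]
GOALS = {2}

def actions(s):
    return ["go"]

def outcomes(s, a):
    return [(s + 1, 0.9), (s, 0.1)]       # (successor, probability)

def cost(s, a):
    return 1.0

values = value_iteration(STATES, GOALS, actions, outcomes, cost)
```

In this toy chain each step takes 1/0.9 attempts on average, so the values approach 2/0.9 for state 0 and 1/0.9 for state 1.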
Why External Memory Algorithms?
Search algorithms perform well only as long as they fit in RAM!
Virtual memory slows down the performance!
[Figure: virtual address space from 0x000…000 to 0xFFF…FFF, divided into memory pages; example annotated with 7 I/Os]
External Memory Model [Vitter and Shriver, 94]
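If the input size is very large, the running time depends on the number of I/Os rather than on the number of instructions.
[Figure: internal memory of size M, disk accessed in blocks of size B, input of size N >> M]
$\mathrm{scan}(N) = O(N/B)$
$\mathrm{sort}(N) = O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$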
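As a rough illustration of the two bounds, the sketch below counts block transfers for scanning and for a multiway merge sort. The function names are hypothetical, and the constant factors hidden by the O-notation are ignored, so the results are orders of magnitude rather than exact I/O counts.

```python
import math


def scan_ios(n, b):
    """Blocks touched when reading n items in blocks of b items."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """(N/B) * ceil(log_{M/B}(N/B)) block transfers, constants dropped."""
    blocks = math.ceil(n / b)
    fan_in = m // b                       # number of blocks that fit in memory
    if fan_in < 2:
        raise ValueError("memory must hold at least two blocks")
    passes, reach = 0, 1
    while reach < blocks:                 # integer computation of ceil(log_fan_in(blocks))
        reach *= fan_in
        passes += 1
    return blocks * max(1, passes)


# Example: 10^6 items, memory for 10^4 items, blocks of 100 items.
example_scan = scan_ios(10**6, 100)
example_sort = sort_ios(10**6, 10**4, 100)
```

With these example parameters the input occupies 10,000 blocks and the sort needs two passes over it.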
External Breadth-First Search (Munagala and Ranade, SODA’99)
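[Figure: external BFS on an example graph with nodes A, B, C, D, E. Open(0) = {A}, Open(1) = {B, C}. The successors of Open(1), D A A D E, are externally sorted to A A D D E, compacted to A D E, and duplicates w.r.t. the 2 previous layers are removed, giving Open(2) = {D, E}.]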
For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99].
For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
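The layer construction can be sketched as follows. This is an in-memory stand-in, so Python lists play the role of sorted files on disk and `sorted` plays the role of the external sort; the function names are assumptions. The subtraction is written as a two-pointer scan over sorted sequences, which is the access pattern that keeps the disk version sequential.

```python
def subtract_sorted(layer, previous):
    """Remove from sorted `layer` every item that occurs in sorted `previous`."""
    result, j = [], 0
    for item in layer:
        while j < len(previous) and previous[j] < item:
            j += 1                        # advance the scan of the older layer
        if j == len(previous) or previous[j] != item:
            result.append(item)           # not seen in the older layer: keep it
    return result


def next_layer(successors, previous_layers):
    """Sort, compact, then subtract each of the given previous layers."""
    ordered = sorted(successors)          # stand-in for the external sort
    compact = [s for i, s in enumerate(ordered) if i == 0 or s != ordered[i - 1]]
    for previous in previous_layers:
        compact = subtract_sorted(compact, sorted(previous))
    return compact


# Data from the slide: successors of Open(1), with Open(1) and Open(0) as previous layers.
open_2 = next_layer(["D", "A", "A", "D", "E"], [["B", "C"], ["A"]])
```

For an undirected graph two previous layers are passed in; for a directed graph the list would have to cover as many layers as the longest back-edge requires.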
External Memory Algorithms for Implicit Graphs
Frontier Search [Korf, 03]
External A* [Edelkamp, Jabbar, Schrödl, 04]
Structured Duplicate Detection [Zhou & Hansen, 04]
Cost-Optimal External Planning [Edelkamp, Jabbar, 06]
Model Checking for Linear Temporal Logic
  [Jabbar & Edelkamp, 05] for safety error detection
  [Edelkamp & Jabbar, 06] for liveness detection (cycle)
  [Barnat, Brim, Simecek, 07] for liveness detection (cycle)
Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]
External Memory Algorithm for Value Iteration
What makes value iteration different from the usual external memory search algorithms?
Answer: Propagation of information from states to predecessors!
Edges are more important than the states.
Ext-VI works on Edges: $(u, v, a, h(v))$ where $u \xrightarrow{a} v$
External Memory Value Iteration
Phase I: Generate the edge space by External BFS.
  Open(0) = Init; i = 1
  while (Open(i-1) != empty)
    Open(i) = Succ(Open(i-1))
    Externally-Sort-and-Remove-Duplicates(Open(i))
    for loc = 1 to Locality(Graph)
      Open(i) = Open(i) \ Open(i - loc)    (remove previous layers)
    i++
  endwhile
Merge all BFS layers into one edge list on disk!
  Opent = Open(0) U Open(1) U … U Open(DIAM)
  Temp = Opent
  Sort Opent wrt. the successors; Sort Temp wrt. the predecessors
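A minimal in-memory sketch of Phase I, assuming edges are plain pairs (u, v) and that a pseudo-predecessor `ROOT = 0` stands in for Ø. The names `generate_edge_space`, `SUCC` and `ROOT` are hypothetical; the example graph is the ten-state graph whose edge list appears on the Phase-II slide. Python sets and lists replace the external sort and the disk files.

```python
ROOT = 0  # assumed placeholder for the empty predecessor of the initial state


def generate_edge_space(init, succ, locality):
    """Return the BFS layers of edges (u, v), starting with [(ROOT, init)]."""
    layers = [[(ROOT, init)]]
    while layers[-1]:
        # Expand: one edge (v, w) for every successor w of every state v reached last
        generated = [(v, w) for (_, v) in layers[-1] for w in succ(v)]
        layer = sorted(set(generated))    # sort and remove duplicates
        for loc in range(1, locality + 1):
            if loc <= len(layers):        # subtract layer i - loc if it exists
                previous = set(layers[-loc])
                layer = [e for e in layer if e not in previous]
        layers.append(layer)
    return layers[:-1]                    # drop the final empty layer


SUCC = {1: [2, 3, 4], 2: [3, 5], 3: [4, 8], 4: [6], 5: [6, 7],
        6: [9], 7: [8, 10], 8: [], 9: [8, 10], 10: []}

layers = generate_edge_space(1, lambda s: SUCC[s], locality=2)
all_edges = [e for layer in layers for e in layer]
temp = sorted(all_edges)                                  # sorted on predecessors
opent = sorted(all_edges, key=lambda e: (e[1], e[0]))     # sorted on successors
```

The loop only terminates if `locality` is large enough for the graph, as discussed on the external BFS slide; this sketch does not check that.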
Working of Ext-VI Phase-II
[Figure: example graph with states 1 to 10; I marks the initial state 1 (h=3), T marks the terminal states; the nodes carry the h-values 3, 2, 2, 2, 1, 1, 1, 1, 0, 0]
Temp: Edge List on Disk, Sorted on Predecessors
{(Ø, 1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)}
h  = 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0
Opent: Edge List on Disk, Sorted on Successors
{(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)}
h  = 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0
h' = 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0
$h'(u) = 1 + \min_{v \in Succ(u)} h(v)$
Alternate sorting and update until residual < epsilon
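One backward update can be sketched as a pair of sequential scans over the two sorted edge lists. The sketch assumes unit edge costs and the deterministic update rule, represents each edge as a triple (u, v, h(v)), and uses 0 for Ø; the names `backup_pass` and `solve` are hypothetical. For brevity the per-predecessor minima are collected in a Python list, whereas a disk-based version would stream them.

```python
from itertools import groupby


def backup_pass(edges, cost=1):
    """One update over edges (u, v, h(v)); returns (updated edges, residual)."""
    temp = sorted(edges, key=lambda e: e[0])     # sorted on predecessors
    opent = sorted(edges, key=lambda e: e[1])    # sorted on successors (states)
    # Scan Temp: for each predecessor u, compute cost + min over its successors
    backups = [(u, cost + min(h for _, _, h in group))
               for u, group in groupby(temp, key=lambda e: e[0])]
    updated, residual, j = [], 0, 0
    for u, v, h in opent:                        # co-scan Opent with the backups
        while j < len(backups) and backups[j][0] < v:
            j += 1
        if j < len(backups) and backups[j][0] == v:
            new_h = backups[j][1]
        else:
            new_h = h                            # no outgoing edge: value stays
        residual = max(residual, abs(new_h - h))
        updated.append((u, v, new_h))
    return updated, residual


def solve(edges, epsilon=1e-4):
    """Alternate sorting and update until the residual drops below epsilon."""
    while True:
        edges, residual = backup_pass(edges)
        if residual < epsilon:
            return edges


# Edge list and h-values of the slide example (Temp order, 0 standing in for Ø).
PAIRS = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H = [3, 2, 2, 2, 2, 1, 2, 0, 1, 1, 1, 1, 0, 0, 0, 0]
edges = [(u, v, h) for (u, v), h in zip(PAIRS, H)]

first_pass, first_residual = backup_pass(edges)
final_h = {v: h for _, v, h in solve(edges)}
```

On the slide data the first pass reproduces the h' row; the re-sorting at the start of each pass corresponds to the alternation between sort and update.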
Complexity Analysis Phase-I: External Memory Breadth-First Search
Expansion: Scanning the red bucket: O(scan(|E|))
Duplicates Removal: Sorting the green bucket, which has one state for every edge from the red bucket, followed by scanning and compaction: O(sort(|E|))
Subtraction: Removing states of the blue buckets (duplicate-free) from the green one: O(l x scan(|E|))
Complexity of Phase-I: O(l x scan(|E|) + sort(|E|)) I/Os
Complexity Analysis Phase-II: Backward Update
Update: Simple block-wise scanning. Scanning time for red and green files: O(scan(|E|)) I/Os
External Sort: Sorting the blue file with the updated values to be used as red file later: O(sort(|E|)) I/Os
Fast External Sort: If |E| / M < Max file pointers: O(scan(|E|)) I/Os
Total Complexity of Phase-II: For tmax iterations, O(tmax x sort(|E|)) I/Os
With Fast External Sort: O(tmax x scan(|E|)) I/Os
[Figure: three files on disk, labeled "Sorted on preds", "Sorted on states", "Updated h-values"]
Experiments: 3x3 Sliding Tiles Puzzle
p=1.0; heuristic = 0
Alg.     |S|/|E|   RAM   #Iterations   Time
VI       181,440   21M   27            6.3
Ext-VI   483,839   11M   32            71.5
p=0.9; heuristic = Manhattan distance
Alg.     |S|/|E|   RAM   #Iterations   Time
VI       181,440   21M   35            8.3
Ext-VI   967,677   12M   43            237.4
The numbers of iterations differ!
3x4 Sliding Tile Puzzle with p=0.9 (State space: 12!/2 = 239 x 10^6)
On 2 Gigabytes, VI could not generate the state space.
External VI finished:
Took 45 GB of disk space for the edges.
Total 1,357,171,197 edges.
Took 437 hours and 72 iterations to converge.
ε = 0.0001
RAM used: 1.4 Gigabytes
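The reported figures can be cross-checked with a little arithmetic. The sketch below assumes decimal gigabytes (10^9 bytes) and, for the last line, that the whole 437 hours is attributed to the 72 iterations; the slides state neither, so the derived numbers are rough estimates only.

```python
import math

states = math.factorial(12) // 2          # reachable 3x4 puzzle states, 12!/2
edges = 1_357_171_197                     # edge count reported on the slide
disk_bytes = 45 * 10**9                   # assumption: 45 GB means 45 * 10^9 bytes

bytes_per_edge = disk_bytes / edges       # average disk footprint of one edge
edges_per_state = edges / states          # average number of stored edges per state
hours_per_iteration = 437 / 72            # assumption: run time spread over iterations only

print(f"{states:,} states")
print(f"{bytes_per_edge:.1f} bytes per edge")
print(f"{edges_per_state:.2f} edges per state")
print(f"{hours_per_iteration:.1f} hours per iteration")
```

Under these assumptions an edge takes a little over 33 bytes on disk and each state contributes between five and six edges on average.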
Race Track Domain
Example:
Alg.     150x300 RaceTrack
VI       Out of mem.; > 2 GB
LRTDP    Out of mem.; > 2 GB; 12 hours
LDFS     Out of time; > 1.5 GB; 118 hours
Ext-VI   Converged! 1.6 GB; 91 hours
Summary
Achievements
  First I/O-efficient disk-based algorithm for solving Markov Decision Processes.
  I/O Complexity Analysis.
Features
  General Cost Model
  Can Pause-and-Resume Execution to add more Hard Disks.
Refinements
  Disk Space eaten by Duplicate States: Start “Early” Delayed Duplicate Detection
Outlook
Application to Bellman-Ford
Parallel External Value Iteration: During the time of internal update, the hard disk is not in use.
Thank You! Questions?