
A Multi-Domain Evaluation of Scaling in Soar’s Episodic Memory

Nate Derbinsky, Justin Li, John E. Laird University of Michigan


Motivation

Prior Work
• Nuxoll & Laird (‘12): integration and capabilities
• Derbinsky & Laird (‘09): efficient algorithms

Core Question
To what extent is Soar’s episodic memory effective and efficient for real-time agents that persist for long periods of time across a variety of tasks?

20 June 2012 Soar Workshop 2012 - Ann Arbor, MI 2


Approach: Multi-Domain Evaluation

• Existing agents from diverse tasks (49)
  – Linguistics, planning, games, robotics

• Long agent runs
  – Hours to days RT (10^5 – 10^8 episodes)

• Evaluate at each X episodes
  – Memory consumption
  – Reactivity for >100 task-relevant cues
    • Is the maximum time for cue matching < 50 msec.?
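The evaluation loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' harness: `ToyMemory`, `store`, `query`, and `evaluate` are hypothetical names, the toy memory is a linear scan standing in for Soar's EpMem, and only reactivity (not memory consumption) is measured.

```python
import time

REACTIVITY_LIMIT_MSEC = 50.0  # threshold named on the slide


class ToyMemory:
    """Hypothetical stand-in for an episodic store (a list of feature sets)."""

    def __init__(self):
        self.episodes = []

    def store(self, features):
        self.episodes.append(frozenset(features))

    def query(self, cue):
        # Scan newest to oldest; return the index of the first episode that
        # contains every cue feature, or None if there is no such episode.
        cue = frozenset(cue)
        for index in range(len(self.episodes) - 1, -1, -1):
            if cue <= self.episodes[index]:
                return index
        return None


def evaluate(memory, episode_stream, cues, every):
    """Store episodes; after each `every` episodes, time every cue once."""
    samples = []
    for count, features in enumerate(episode_stream, start=1):
        memory.store(features)
        if count % every == 0:
            worst_msec = 0.0
            for cue in cues:
                start = time.perf_counter()
                memory.query(cue)
                elapsed_msec = (time.perf_counter() - start) * 1000.0
                worst_msec = max(worst_msec, elapsed_msec)
            reactive = worst_msec < REACTIVITY_LIMIT_MSEC
            samples.append((count, worst_msec, reactive))
    return samples


# Synthetic episode stream and cues, made up for the example.
stream = ({f"f{i % 7}", f"g{i % 3}"} for i in range(3000))
samples = evaluate(ToyMemory(), stream, cues=[{"f1"}, {"f2", "g1"}], every=1000)
for count, worst_msec, reactive in samples:
    print(count, round(worst_msec, 3), reactive)
```

Recording the maximum rather than the mean matches the slide's framing: a single slow retrieval is enough to make an agent unreactive.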


Outline

• Overview of Soar’s EpMem
• Word Sense Disambiguation (WSD)
• Planning
• Video Games & Robotics


Episodic Memory Problem Formulation

Representation
• Episode: connected di-graph
• Store: temporal sequence

Encoding/Storage
• Automatic
• No dynamics

Retrieval
• Cue: acyclic graph
• Semantics: desired features in context
• Find the most recent episode that shares the most leaf nodes in common with the cue
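The retrieval rule can be made concrete with a small Python sketch. It makes a strong simplifying assumption: episodes and cues are flattened to sets of leaf features, so the graph structure described above is ignored. `retrieve` is a hypothetical name, not part of Soar.

```python
def retrieve(episodes, cue_leaves):
    """Return (index, score) of the best match, or None if nothing matches.

    `episodes` is ordered oldest to newest; each episode is a set of leaf
    features. The score is the number of leaves shared with the cue.
    """
    cue_leaves = set(cue_leaves)
    best = None
    for index, episode in enumerate(episodes):
        score = len(cue_leaves & episode)
        # ">=" lets a later (more recent) episode win ties.
        if score > 0 and (best is None or score >= best[1]):
            best = (index, score)
    return best


episodes = [
    {"color=red", "shape=box"},
    {"color=red", "shape=ball", "size=big"},
    {"color=blue", "shape=ball"},
    {"color=red", "shape=ball"},
]
print(retrieve(episodes, {"color=red", "shape=ball"}))  # (3, 2)
```

Episodes 1 and 3 both share two leaves with the cue; the tie goes to episode 3 because it is more recent.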

[Figure: block diagram relating Working Memory, the Cue, and Episodic Memory (Encoding, Storage, Matching)]

Episodic Memory Algorithmic Overview

Storage
– Capture WM-changes as temporal intervals

Cue Matching (reverse walk of cue-relevant Δ’s)
– 2-phase search
  • Only graph-match episodes that have all cue features independently
– Only evaluate episodes that have changes relevant to cue features
– Incrementally re-score episodes
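A toy Python sketch of the interval idea and the reverse walk follows. It is not Soar's implementation: `IntervalStore` and `query` are hypothetical names, features are flat strings rather than graph elements, and the graph-match phase of the 2-phase search is omitted. It shows only that scores need updating at episodes where a cue feature starts or stops being present.

```python
from collections import defaultdict


class IntervalStore:
    """Toy store: each feature maps to [start, end) episode intervals."""

    def __init__(self):
        self.closed = defaultdict(list)  # feature -> [(start, end), ...]
        self.open = {}                   # feature -> start of its open interval
        self.now = 0                     # number of episodes stored so far

    def store(self, features):
        """Record one episode; only added or removed features change the store."""
        features = set(features)
        for feature in set(self.open) - features:   # removed from WM
            self.closed[feature].append((self.open.pop(feature), self.now))
        for feature in features - set(self.open):   # added to WM
            self.open[feature] = self.now
        self.now += 1

    def intervals(self, feature):
        result = list(self.closed.get(feature, []))
        if feature in self.open:
            result.append((self.open[feature], self.now))
        return result


def query(store, cue):
    """Return (episode, score) for the most recent best-scoring episode, or None."""
    cue = set(cue)
    deltas = defaultdict(int)  # episode number -> change in score at that point
    for feature in cue:
        for start, end in store.intervals(feature):
            deltas[start] += 1  # feature present from `start`
            deltas[end] -= 1    # feature absent from `end` onward
    points = sorted(deltas, reverse=True)
    score = 0      # no cue feature is present at or after the last change point
    best = None
    for upper, lower in zip(points, points[1:]):
        score -= deltas[upper]  # step backward across the change at `upper`
        # Every episode in [lower, upper) has this score; upper - 1 is newest.
        if score > 0 and (best is None or score > best[1]):
            best = (upper - 1, score)
        if score == len(cue):   # perfect match: older episodes cannot beat it
            break
    return best


store = IntervalStore()
for features in [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}, {"a"}]:
    store.store(features)
print(query(store, {"a", "b"}))  # (3, 2)
```

The walk visits change points rather than episodes, so long stretches in which no cue feature changes cost nothing, which is the intuition behind the "search distance" terminology used later in the talk.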


Episodic Memory Storage Characterization

[Figure: Avg. Bytes per Episode (y-axis, 0 to 6000) vs. Avg. Working Memory Changes (x-axis, 0 to 160); fit with R² = 0.9825; annotations "1 week, 8GB" and "1 day, 8GB"]
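The "1 day, 8GB" and "1 week, 8GB" annotations relate per-episode storage to how long a fixed memory budget lasts. The arithmetic is sketched below in Python. The episode rate is not stated on the slide, so `ASSUMED_RATE` is a purely hypothetical placeholder, and reading "8GB" as 8 GiB is likewise an assumption.

```python
def seconds_to_fill(budget_bytes, bytes_per_episode, episodes_per_second):
    """Time until a store that grows linearly with episodes reaches the budget."""
    return budget_bytes / (bytes_per_episode * episodes_per_second)


BUDGET_BYTES = 8 * 1024**3   # "8GB", read here as 8 GiB (assumption)
ASSUMED_RATE = 100.0         # episodes per second; hypothetical, not from the slides

# Per-episode sizes reported on the per-domain data slides of this talk.
for bytes_per_episode in (113, 813, 2646):
    days = seconds_to_fill(BUDGET_BYTES, bytes_per_episode, ASSUMED_RATE) / 86400
    print(f"{bytes_per_episode} bytes/episode -> {days:.1f} days")
```

Because the relationship is a simple ratio, halving the bytes per episode doubles the time a given budget lasts, whatever the actual episode rate is.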


Episodic Memory Retrieval Characterization

Assumptions
– Few changes per episode (temporal contiguity)
– Representational re-use (structural regularity)
– Small cue

Scaling
– Search distance (# changes to walk)
  • Temporal Selectivity: how often does a WME change
  • Feature Co-Occurrence: how often do WMEs co-occur within a single episode (related to search-space size)
– Episode scoring (similar to rule matching)
  • Structural Selectivity: how many ways can a cue WME match an episode (i.e. multi-valued attributes)
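As a rough illustration of what these three properties measure, the Python sketch below computes simple proxies over a toy episode log. The definitions are simplified readings of the bullets above, not the measures used in the evaluation, and all function names are hypothetical.

```python
def change_count(episodes, feature):
    """Temporal selectivity proxy: how many times `feature` appears or disappears."""
    changes, present = 0, False
    for episode in episodes:
        now_present = feature in episode
        if now_present != present:
            changes += 1
        present = now_present
    return changes


def co_occurrence(episodes, features):
    """Fraction of episodes in which all `features` are present together."""
    features = set(features)
    if not episodes:
        return 0.0
    return sum(features <= episode for episode in episodes) / len(episodes)


def match_count(episode_wmes, attribute):
    """Structural selectivity proxy: number of values `attribute` has in one episode."""
    return sum(1 for attr, _value in episode_wmes if attr == attribute)


log = [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}]
print(change_count(log, "a"))          # 3
print(co_occurrence(log, {"a", "b"}))  # 0.5
print(match_count([("block", "A"), ("block", "B"), ("hand", "empty")], "block"))  # 2
```

A feature that changes often lengthens the reverse walk, features that rarely co-occur push the first full match further back in time, and a multi-valued attribute multiplies the ways a single cue element can bind within one episode.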


Word Sense Disambiguation Experimental Setup

• Input: <“word”, POS>; Output: sense #; Result
  – Corpus: SemCor (~900K eps/exposure)

• Agent
  – Maintain context as n-gram
  – Query EpMem for context
    • If success, get next episode, output result
    • If failure, null
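The agent's control loop can be sketched in Python as follows. This is a toy under stated assumptions, not the Soar agent: `NGramAgent` and `step` are hypothetical names, each input is stored as one episode followed by a second episode holding the corpus result, and the query is an exact match on the n-gram context rather than EpMem's cue matching.

```python
from collections import deque


class NGramAgent:
    def __init__(self, n):
        self.context = deque(maxlen=n)  # last n (word, POS) inputs
        self.episodes = []              # toy episodic store, oldest first

    def step(self, word, pos, correct_sense):
        """Process one input; return the predicted sense # or None."""
        self.context.append((word, pos))
        cue = tuple(self.context)
        prediction = None
        # Query: most recent episode whose stored context equals the cue.
        for index in range(len(self.episodes) - 1, -1, -1):
            if self.episodes[index].get("context") == cue:
                # Success: the next episode holds the result seen last time.
                if index + 1 < len(self.episodes):
                    prediction = self.episodes[index + 1].get("result")
                break
        # Record this input, then the corpus feedback, as two episodes.
        self.episodes.append({"context": cue})
        self.episodes.append({"result": correct_sense})
        return prediction


agent = NGramAgent(n=2)
corpus = [("bank", "NN", 1), ("open", "VB", 3)]  # made-up inputs and sense numbers
first = [agent.step(*item) for item in corpus]
second = [agent.step(*item) for item in corpus]
print(first, second)  # [None, None] [None, 3]
```

On the first pass every query fails because nothing has been stored yet. On the second pass a context that was seen before retrieves the stored result, while a context that spans the boundary between passes is new and still returns null.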


Accuracy   First     Second
2-gram     14.57%    92.82%
3-gram      2.32%    99.47%


Word Sense Disambiguation Results

Storage
– Avg. 234 bytes/episode

Cue Matching
– All 1-, 2-, and 3-gram cues reactive
– 0.2% of 4-grams exceed 50 msec.


N-gram Retrieval Scaling
Retrieval Time (msec) vs. Episodes (x1000)

[Figure: two panels, each plotting retrieval time (msec) against episodes (x1000, 0 to 4000). One panel (y-axis 0 to 2.5) shows the cues {be, say} (69), {say, group} (6), {friday, say} (1), and {say}. The other (y-axis 0 to 25) shows the cues {well, be, say}, {friday, say, group}, {friday, say}, and {say}. Annotations: Feature Co-Occurrence, Temporal Selectivity.]

Planning Experimental Setup

• 12 automatically converted PDDL domains
  – Logistics, Blocksworld, Eight-puzzle, Grid, Gripper, Hanoi, Maze, Mine-Eater, Miconic, Mystery, Rockets, and Taxi
  – 44 distinct problem instances (e.g. # blocks)

• Agent: randomly explore state space
  – 50K episodes, measure every 1K


Planning Results

Storage
– Reactive: <12.04 msec./episode
– Memory: 562 – 5454 bytes/episode

Cue Matching (reactive: < 50 msec.)
1. Full State: only smallest state + space size (12)
2. Relational: none
3. Schema: all (max = 0.08 msec.)


Video Games & Mobile Robotics Experimental Setup

• Hand-coded cues (per domain)


Domain           Agent                             Duration   Eval. Rate
TankSoar         mapping-bot                       3.5M       50K
Eaters           advanced-move                     3.5M       50K
Infinite Mario   [Mohan & Laird ‘11]               3.5M       50K
Rooms World      [Laird, Derbinsky & Voigt ‘11]    12 hours   300K


Data: Eaters

[Figure: Retrieval Time (msec, 0 to 50) vs. Episodes (x1 Million, 0 to 3.5); legend entries 1 to 7]

813 bytes/episode

Data: Infinite Mario

[Figure: Retrieval Time (msec, 0 to 50) vs. Episodes (x1 Million, 0 to 3); legend entries 1 to 11; annotation: Structural Selectivity]

2646 bytes/episode

Data: TankSoar

[Figure: Retrieval Time (msec, 0 to 50) vs. Episodes (x1 Million, 0 to 3); legend entries 1 to 11; annotation: Co-Occurrence]

1035 bytes/episode

Data: Mobile Robotics

[Figure: Retrieval Time (msec, 0 to 50) vs. Episodes (x1 Million, 0 to 110); legend entries 1 to 6; annotation: Temporal Selectivity]

113 bytes/episode

Summary of Results

Generality
– Demonstrated 7 cognitive capabilities
  • Virtual sensing, action modeling, long-term goal management, …

Reactivity
– <50 msec. storage time for all tasks (ex. temporal discontiguity)
– <50 msec. cue matching for many cues

Scalability
– No growth in cue matching for many cues (days!)
  • Validated predictive performance models
– 0.18 - 4 kb/episode (days – months)


Evaluation

Nuggets
• Unprecedented evaluation of general episodic memory
  – Breadth, temporal extent, analysis
• Characterization of EpMem performance via task-independent properties
• Soar’s EpMem (v9.3.2) is effective and efficient for many tasks and cues!
• Domains and cues available

Coal
• Still easy to construct domain/cue that makes Soar unreactive
• Unbounded memory consumption (given enough time)


For more details, see the paper in the proceedings of AAAI 2012.
