Dimensions of Knowledge Transfer

[Figure: problems facing a problem solver arranged along two axes, difference in content and difference in representation, defining four regimes:]
• Memorization: we have already solved these problems.
• Knowledge Reuse (similar representations, e.g., within-domain transfer): we have not solved this before, but we know other pertinent information about this domain that uses the same representation.
• Isomorphism (different representations, e.g., most cross-domain transfer): we know the solution to a similar problem with a different representation, possibly from another domain.
• First-Principles Reasoning: we have not solved similar problems, and are not familiar with this domain and problem representation.

Knowledge transfer complexity is determined primarily by differences in the knowledge content and representation between the source and target problems.
Claims about Transfer Learning
Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable
Test: Design source/target scenarios that involve shared relational structures satisfying specified classes of transformations
Example: Draw source and target problems from branches of physics with established relations among statements and solutions

Claim: Deep transfer depends on the ability to discover mappings between superficially different representations
Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals
Example: Define two games in GGP that are nearly equivalent but have no superficial relationship

Meta-Claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference
Test: Demonstrate deep transfer in testbeds that exercise these aspects of cognitive systems
Example: Develop transfer learning agents for Urban Combat, GGP, and Physics
ISLE Team Y2 Technology Components

We will explore four paths to deep transfer:
• Predicate invention for representation mapping in Markov logic (Washington)
• Goal-directed solution analysis for hierarchical skill mapping (ISLE)
• Representation mapping through deep structural analogy (Northwestern)
• Semantic learning augmented with procedural chunking (Michigan)
[Figure: block diagrams of the four component architectures.]

• The Soar Architecture: a Body (Perception, Action), Short-Term Memory, a Decision Procedure, and Long-Term Memories (Procedural, Episodic, Semantic), with Chunking, Episodic Learning, Semantic Learning, and Reinforcement Learning.

• Markov Logic Networks: Markov Logic with Weighted Satisfiability, Markov Chain Monte Carlo, Inductive Logic Programming, and Weight Learning, mapping a Source Domain to a Target Domain.

• The ICARUS Architecture: Long-Term and Short-Term Conceptual Memories, Short-Term Goal/Skill Memory, Long-Term Skill Memory, Conceptual Inference, Skill Execution, Skill Retrieval, Problem Solving / Skill Learning, Perception with a Perceptual Buffer, a Motor Buffer, and the Environment. Annotated extensions: replace with Alchemy inference software (Washington); augment with the CYC knowledge base (Cycorp); incorporate HTN planning methods (Maryland); add methods for learning value functions (UT Austin).

• The Companions Architecture: on the user's Windows box, a nuSketch GUI, Relational Concept Map, and Session Manager under a Facilitator and Executive; on a cluster, a Session Reasoner, Interaction Manager, Visual/Spatial Reasoner, Interactive Explanation Interface, Offline Learning, and MAC/FAC Ticklers with SEQL Generalizers for the domain, self, and user models.
Scientific Claims for Icarus

FLEXIBLE INFERENCE OVER RICH COGNITIVE STRUCTURES
• Improve transfer by making conceptual inference and skill retrieval more robust
• Approach: Combine inference over Markov logic networks (Alchemy), which unify relational logic and probabilistic reasoning, with goal-indexed retrieval of hierarchical task networks (Icarus)

ANALYTICAL DISCOVERY OF CROSS-DOMAIN MAPPINGS
• Improve transfer by specifying mappings between concepts and skills in source and target domains
• Approach: Analyze problem-solving traces to identify similar structures, then use them to generate candidate mappings

PROBABILISTIC LEARNING OF HIDDEN PREDICATES
• Improve transfer by specifying mappings between concepts and skills in source and target domains
• Approach: Use regularities in relational data to postulate hidden predicates that map onto concepts in each domain, then use Markov logic to make inferences for the target based on the source

[Figure: source concepts and skills (e.g., Ammunition, Surround_enemy) map to target concepts and skills (e.g., Combustible_material, Contain_fire).]
ANALYTICAL INVENTION OF HIGH-UTILITY PREDICATES
• Improve transfer by exploiting symbolic domain knowledge and statistics from experience to generate new concepts; use these to enhance value functions and refine hierarchical skills
• Regression through operators and clause definitions generates new useful features
• Knowledge compilation and abstraction operations can produce abstract features
• The entire concept derivation tree is available for use in transfer domains
Example relational data for hidden-predicate learning:

• Academic domain: Prof(Ray), Student(Lily), Prof(Pedro), Student(Stan), Paper(Relearning MLN), Paper(Path-Finding in MLN), Paper(SPI), Paper(Structure Learning), Project(Transfer Learning), Project(Alchemy), Agency(DARPA), Agency(ONR), Agency(NSF), FundedBy(researcher, agency), AdvisedBy(student, researcher)

• Movie domain: Director(Scorsese), Actor(Damon), Director(Spielberg), Actor(Barrymore), Scene(Infiltration), Scene(Contact), Scene(Bicycle), Scene(Exchange), Movie(Departed), Movie(ET), Company(Paramount), Company(Universal), InvestorOf(director, company), DirectedBy(actor, director)

• Medical domain (observed facts): HasSymptom(Alan, Palpitations), HasSymptom(Alan, Angina), HasSymptom(Alan, ShortBreath), HasDisease(Alan, HeartDisease), HasSymptom(Bob, Angina), HasSymptom(Bob, ShortBreath), HasDisease(Bob, HeartDisease), HasSymptom(Charles, Palpitation), HasDisease(Charles, HeartDisease), FatherOf(Alan, Bob), FatherOf(Bob, Charles), hasGeneticBasis(HeartDisease)
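In Markov logic, regularities over such data become weighted clauses whose contribution to the log-linear score counts satisfied groundings. A minimal sketch of that counting over the medical facts above (the rule, its weight, and the encoding are illustrative assumptions, not taken from the slides):

```python
# Ground atoms from the medical example, encoded as tuples.
facts = {
    ("FatherOf", "Alan", "Bob"), ("FatherOf", "Bob", "Charles"),
    ("HasDisease", "Alan", "HeartDisease"),
    ("HasDisease", "Bob", "HeartDisease"),
    ("HasDisease", "Charles", "HeartDisease"),
    ("hasGeneticBasis", "HeartDisease"),
}

# Illustrative soft rule:
#   FatherOf(x, y) ^ hasGeneticBasis(d) ^ HasDisease(x, d) => HasDisease(y, d)
satisfied = total = 0
for (_, x, y) in (f for f in facts if f[0] == "FatherOf"):
    for (_, d) in (f for f in facts if f[0] == "hasGeneticBasis"):
        total += 1
        body = ("HasDisease", x, d) in facts
        head = ("HasDisease", y, d) in facts
        if not body or head:   # an implication fails only when its body holds and its head does not
            satisfied += 1

weight = 1.5                   # illustrative; systems like Alchemy learn weights from data
print(satisfied, total, weight * satisfied)  # 2 2 3.0
```

Both groundings of the rule are satisfied here; in a full MLN these weighted counts define the probability of each possible world.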
Scientific Claims for Soar

Deliberate Reflection Constructs Declarative and Procedural Generalizations and Abstractions
• Analyze episodes, detecting commonalities across multiple situations and across multiple games
• Store abstractions with expected results in semantic memory (declarative structures) and as chunks (procedures) for future retrieval

"Does this new game have the concept of pinned in it?"
"I used pinned in multiple games to help me win. I should use it whenever I can."

Episodic Memory Holds Behavior History for Future Analysis
• Automatically records experiences for future efficient retrieval and playback
• Supports post-hoc analysis and detection of generalizations that occur across multiple states and transfer to new situations
• Supports detection, comparison, and generalization of regularities that occur across multiple game episodes

"If my enemy pins my piece, I can't move it without losing another piece."

Automatic and Deliberate Retrieval of Stored Results Creates Mappings that Direct Behavior
• Automatically elaborate new situations with simple abstractions
• Deliberately analyze new tasks, attempting to detect previously learned complex abstractions (mappings)
• Retrieve results tied to abstractions that are stored in semantic memory or as chunks
• Direct behavior using retrieved results: transfer!

Use RL to Tune Abstractions and Generalizations
• Use reinforcement learning to learn when mappings actually help problem solving
• Over time, avoid mappings that only appear to be useful and use mappings that lead to success in the target domain
Scientific Claims for Companions

Self-modeling capabilities will promote transfer
• Learn more robust generalizations through focused, off-line analysis
• Approach: Detailed analysis and comparison of game rules and records of played games to evaluate learned knowledge and formulate new learning goals

"I keep getting pinned! I need to work on that!"

Analogical Encoding will promote transfer at levels 7-10
• Automatic elaboration and reformulation of game descriptions to achieve better gameplay with fewer instance-level transfers
• Approach: Aggressive re-representation to improve productivity of the match, in terms of predictions

[Figure: an original distant analogy; get traction by figuring out which non-identical predicates best align.]

Persistent Mappings for reflection about transfer
• Improve transfer at levels 7-10 by learning what does and doesn't transfer
• Approach: Reify mappings; keep track of which inferences are and are not productive

"These don't look alike yet; I need a different perspective."
"I can use pawn promotion, which is like crowning!"

Metamappings will improve performance in far transfer
• More robust cross-domain analogies, even in distant (level 10) transfer
• Approach: Recursively find analogies between properties of non-identical predicates
Program Requirements
• Go/No-go tests vs. known transfer targets for five transfer types (redefined transfer levels 6-10)
• Conduct careful science; systematic exploration of transfer
• Showcase tasks in MadRTS
• Demonstrate DARPA relevance and technology appeal
• Same performance metric, statistical test, and score aggregation across architectures

Regret Metric
• Shared with Berkeley; basis for both Go/No-go decisions at end of year
• Ratio of the area between learning curves to the area of the bounding box:
  Benefit = (Area between curves) * 100 / (y-range * x-range)
• Naturally scales with problem difficulty; has an easy interpretation as a percentage improvement

Example: Type-1 Benefit (x = 1 through 5): 30.9218; Type-2 Benefit (x = 6 through 45): 28.2873; Type-3 Benefit (x = 46 through 50): 83.689; Overall Benefit (x = 1 through 50): 23.4336

The x-ranges are just the 10%-80%-10% numbers for the type 1/2/3 metrics. There is some art to choosing the bounding box. For example, the ranges can be adjusted to diminish the effect of outliers, and restricted to remove data after the apparent asymptote.
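The benefit formula can be applied directly to a pair of learning curves; a minimal sketch, assuming trapezoidal integration of the pointwise curve difference and an agreed y-range (the sample curves are invented for illustration):

```python
# Benefit = (area between curves) * 100 / (y-range * x-range),
# i.e., the percentage of the bounding box swept out by the transfer gain.
def benefit(xs, transfer_ys, baseline_ys, y_min, y_max):
    area = 0.0
    for i in range(len(xs) - 1):
        # trapezoidal rule on the pointwise difference between the curves
        d0 = transfer_ys[i] - baseline_ys[i]
        d1 = transfer_ys[i + 1] - baseline_ys[i + 1]
        area += 0.5 * (d0 + d1) * (xs[i + 1] - xs[i])
    return area * 100.0 / ((y_max - y_min) * (xs[-1] - xs[0]))

xs = [1, 2, 3, 4, 5]              # matches within one transfer type
transfer = [40, 55, 65, 70, 72]   # transfer-condition rewards
baseline = [20, 35, 50, 60, 68]   # non-transfer rewards
print(benefit(xs, transfer, baseline, 0, 100))  # → 14.25
```

Adjusting `y_min`/`y_max` or the x-range implements the bounding-box choices described above (trimming outliers, cutting data past the asymptote).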
[Figure: learning curves of Average Reward, with the x-axis divided into Type 1, Type 2, and Type 3 regions.]
Transfer Targets
• Required Regret scores:

  TL    Year 2    Year 3
   6      30        40
   7      30        40
   8      20        30
   9      20        30
  10      20        30

• Targets concern the learning curve after the first element
• Aggregation method: average score >= target; two architectures must each report scores at all levels
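The aggregation rule can be stated as a small check; a sketch, assuming each architecture reports one averaged benefit score per transfer level (the architecture names and sample scores are invented):

```python
# Go/No-go aggregation: at least two architectures must report a score at
# each level, and the average must meet the Year 2 target for that level.
Y2_TARGETS = {6: 30, 7: 30, 8: 20, 9: 20, 10: 20}

def passes(scores_by_arch, targets):
    """scores_by_arch maps architecture name -> {transfer level: score}."""
    for level, target in targets.items():
        reported = [s[level] for s in scores_by_arch.values() if level in s]
        if len(reported) < 2:                       # two architectures must report
            return False
        if sum(reported) / len(reported) < target:  # average must meet the target
            return False
    return True

scores = {
    "Icarus": {6: 35, 7: 31, 8: 22, 9: 25, 10: 21},
    "Soar":   {6: 33, 7: 30, 8: 24, 9: 21, 10: 23},
}
print(passes(scores, Y2_TARGETS))  # → True
```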
GGP Tasks
• Goals:
  • Foster careful, measurable science
  • Eliminate mechanical program risk
• Problem characteristics:
  • One-player game with >= 1 deterministic (but unknown) adversary
  • Defined within GGP ("internally simulated")
  • Turn-taking format
  • Deterministic dynamics
  • Fully observable state
  • Reliable sensing
Showcase Tasks
• Goals:
  • Establish DARPA relevance of the TL program
  • Showcase novel (and riskier) technology
• Domain characteristics:
  • Externally simulated domain (MadRTS)
  • >= 1 deterministic (but unknown) adversary
  • Pause/go format
  • Primarily deterministic dynamics (except combat)
  • Partial GDL descriptions of state and dynamics
  • Aggregate actions, actions over time
  • Fully or partially observable state (choice)
  • Reliable or non-reliable sensing (choice)
• What we don't know about domain characteristics:
  • What portion of the domain will GDL capture, and when?
Showcase Task Questions
1. Will showcase tasks have Go/No-go status?
   • If it is financially feasible to construct scenarios and interfaces, include them as Go/No-go tests
2. What are the showcase tasks?
   • Tasks known to agents; domain engineering allowed
   • Dan O. suggests one core problem for each of three transfer types, with syntactic variants to supply statistical relevance; ISLE suggests two core tasks, total
   • Core tasks cooperatively defined with the evaluation team; variants defined by the evaluation team, within agreed parameters
   • The goal of the statistical test is to verify that agents robustly solve core problems; coverage of the transfer type in MadRTS is not an issue
3. Will all architectures address the showcase task?
   • Yes, pending a solution to the feasibility issue
Evaluation Architecture
• GameMaster+ supplies access to internal and external domains

[Figure: the Experimentation Manager sends experiment specifications and commands (e.g., start, pause, analyze; which scenario; what length pause) and receives status messages and analysis results from GameMaster+ (GameManager, GDL Simulator, Game Database). GameMaster+ exchanges game rules, percepts, and actions with the TL Agent (through Liet) and with External Simulators (e.g., MadRTS).]

• Will the external/internal distinction be transparent to agent developers?
Experimental Protocol Outline

[Figure: agents are run on source and target problems under a transfer condition and on target problems alone under a non-transfer condition. The resulting learning curves feed the TL metrics (transfer difference, scaled transfer difference, ratio, ARR (narrow), ARR (wide), transfer ratio, truncated transfer ratio, jump start, asymptotic advantage), producing agent transfer scores and a statistical analysis of significance with respect to the null hypothesis (benefit metrics). Human and team (ISLE, UM, UT) results are compared across TL levels.]
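Two of the curve-based metrics above are simple to state directly; a sketch, assuming each curve is a list of per-trial scores (these definitions follow common transfer-learning usage, since the slides do not spell them out):

```python
# Jump start: advantage of the transfer curve at the very first trial.
# Asymptotic advantage: advantage near the end of learning.
def jump_start(transfer_curve, baseline_curve):
    return transfer_curve[0] - baseline_curve[0]

def asymptotic_advantage(transfer_curve, baseline_curve, tail=3):
    """Compare the means of the final `tail` trials of each curve."""
    t = sum(transfer_curve[-tail:]) / tail
    b = sum(baseline_curve[-tail:]) / tail
    return t - b

transfer = [40, 55, 65, 70, 72, 73]
baseline = [20, 35, 50, 60, 68, 70]
print(jump_start(transfer, baseline))  # → 20
print(asymptotic_advantage(transfer, baseline))
```

The area-based metrics (benefit, ARR, transfer ratio) integrate the whole curve instead of sampling its endpoints.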
Protocol Constraints
• The protocol requires >= 2 targets per scenario:
  • >= 1 to discover mappings (a cost)
  • Others to show the transfer benefit
• The protocol requires >= 2 sources per scenario:
  • >= 1 to ground all facets of the mapping
  • Others to ensure the mapping is persistent
Claim: Deep transfer requires technology for detecting and exploiting persistent mappings between source and target domains.
Protocol Details

(O1, O2, …, Oo) (T1, T2, …, TT)
(O1, O2, …, Oo) (T1, T2, …, TT)
…
(O1, O2, …, Oo) (T1, T2, …, TT)

S = scenarios; O = sources per scenario; T = targets per scenario; M = matches per source or target; K = trials per scenario

Protocol time complexity = M(O + 2T)SK

• Primary difference from Y1: scenarios contain sets of problems
• Each line above is one scenario
• Three source and three target problems per scenario; seven scenarios per transfer type
Protocol Pseudocode

S = number of scenarios; O = sources per scenario; T = targets per scenario; M = matches per source or target; K = trials per scenario
Protocol problem (time) complexity = M(O + 2T)SK

run_protocol(S, O, T, M, K, PO, PT) =
  for n = 1 to S                              ; For each scenario
    for k = 1 to K                            ; Number of permutations (trials)
      Randomize the order of source problems PO and target problems PT
      foreach c in {no_transfer, transfer}    ; For each experiment condition
        KB = {}                               ; Clear the knowledge base
        if (c == transfer)                    ; If transfer condition, train on source problems
          for i = 1 to O
            for m = 1 to M                    ; Number of matches
              KB = train(PO[i], KB)
        endif
        for j = 1 to T                        ; Train on target problems
          for m = 1 to M
            {KB, results[n, k, c, j, m]} = train_and_record(PT[j], KB)  ; Update KB, record result
  output(results)
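The loop structure above can be rendered as runnable code; a sketch in which the stub training functions and the results keying are assumptions, with only the protocol structure taken from the slides:

```python
import random

def run_protocol(sources, targets, M, K, train, train_and_record):
    """Transfer vs. no-transfer protocol over a list of scenarios.

    sources[n], targets[n]: the source and target problems of scenario n.
    M: matches per problem; K: randomized trials per scenario.
    """
    results = []
    for n, (src, tgt) in enumerate(zip(sources, targets)):  # each scenario
        for k in range(K):                                  # each trial (permutation)
            src = random.sample(src, len(src))              # randomize problem order
            tgt = random.sample(tgt, len(tgt))
            for condition in ("no_transfer", "transfer"):
                kb = {}                                     # clear the knowledge base
                if condition == "transfer":                 # transfer: train on sources first
                    for p in src:
                        for _ in range(M):
                            kb = train(p, kb)
                for p in tgt:                               # both conditions train on targets
                    for _ in range(M):
                        kb, score = train_and_record(p, kb)
                        results.append((n, k, condition, p, score))
    return results

# Illustrative stubs: the knowledge base counts problem exposures, and the
# recorded score is total experience, so transfer runs start with a head start.
def train(p, kb):
    kb = dict(kb)
    kb[p] = kb.get(p, 0) + 1
    return kb

def train_and_record(p, kb):
    kb = train(p, kb)
    return kb, sum(kb.values())
```

Per scenario and trial this presents M·O source matches plus 2·M·T target matches, for M(O + 2T)SK problem presentations in total, matching the complexity figure above.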
Development Plan
• Separate Go/No-go and showcase paths
• Incremental model for each: build initial agent technology, evaluate against initial problems, repeat
• Key Go/No-go dates:
  • Initial experiments: March 31
  • Revised problems: April 30
  • Go/No-go tests completed: August 24, 2007
• Key demonstration dates:
  • Initial demonstration: April 30
  • Revised problems: June 1
  • Final demonstrations: September 15

Integration Plan (Icarus)
• [Graphic showing technology from Maryland, UW, Rutgers, and Austin going into Icarus]
Go/No-go Task Flow

[Figure: Dec-Sep timeline for the Evaluation Team and the ISLE Team, covering Scenario (& TL) Definition (O,T,S 12/31 and 4/30; GDL 12/31, 1/31, 2/28, 4/30, 5/31), GameMaster+ Development (GameMaster 12/31; Experimentation Manager 2/28; Problem Generator 6/30), Experiment Design (Who demos? 1/15; Sim & Agent/LIET Interfaces 2/15; Feedback 4/15), and Agent Development (Initial Test 3/31; Go/No-go 8/24).]
Demonstration Task Flow

[Figure: Dec-Sep timeline for the Evaluation Team and the ISLE Team, covering Domain Choice (12/31), Simulator (1/15), Evaluation Plans (1/31, 5/31), Scenario Design (2/15, 3/15, 5/15, 6/15, 7/15), GameMaster+ and GDL+ (2/28, 3/31, 6/1, 7/1, 8/1), Sim & Agent/LIET Interfaces (6/15), Scenario Construction and Domain Engineering, Feedback (5/15), Initial Demo (4/30), and Final Demo (9/15).]
Summary of Work Products, Dates (ISLE Team and Evaluation Team)

Demonstration dates:
• Domain Choice 12/31; Evaluation Plan 1/31; Interface Engineering 2/15; Scenario Design 3/15; Initial Demonstration 4/30; Scenario Design 5/15, 6/15, 7/15; Evaluation Plan 5/31; Interface Engineering 6/15; Final Demonstrations 9/15
• Simulator 1/15; Interface Engineering 2/15; GameMaster+ 2/28, 3/31; GDL+ for Scenarios 2/28, 3/31; Feedback on first pass 5/15; GameMaster+ 6/1, 7/1, 8/1; GDL+ for Scenarios 6/1, 7/1, 8/1; Interface Engineering 6/15

Go/No-go dates:
• Current GameMaster 12/31; GDL for Scenarios 12/31, 1/31, 2/28; Experimentation Manager 2/28; Feedback on first pass 4/15; GDL for Scenarios 4/30, 5/31; Problem Generator 6/30
• Experimental Design (O,T,S) 12/31; Initial Tests 3/31; Feedback on first pass 4/15; Experimental Design (O,T,S) 4/30; Go/No-go tests 8/24
Darpa's Proposed Schedule

Test Harness:
• First cut of GGP/LIET API released: 29-Dec-06
• Final version of extended GameMaster w/ LIET: 30-Mar-07

GDL+:
• Roadmap and 1st version of extension: 8-Jan-07
• Final version of GDL: 30-Mar-07

Scenarios:
• Initial scenarios defined: 22-Dec-06
• Final scenario defined: 30-Mar-07
• Scenario generators spec: 2-Feb-07
• First scenario generator: 28-Feb-07
• Bulk of scenario generators delivered: 30-Mar-07
• Final scenario deadline: 4-May-07

Testing:
• Test Readiness Review: 29-Jun-07
• Go/No-go testing begins: 6-Aug-07
• Go/No-go testing completed: 24-Aug-07
• Go/No-go test results analyzed and reported: 18-Sep-07

Programmatic:
• Year 2 kickoff
• Y3 Go/No-go briefing: 28-Sep-07
Go/No-go Experiment Questions
• There are alternate experimental designs:
  • Learning curves track performance across scenarios; each score averages across target problems (O=3, T=3, S=4, K=7)
  • Learning curves track performance across iterations of a single match
  • Expect transfer on the Tth target problem (O, T=3, S=7, K=1, M>>1)
  • Seek transfer on any target problem (O, T=3, T*S>=7, K=1, M>>1)
• Scenario requirements may be problematic:
  • The Stanford Logic Group may be able to construct a scenario generator for Go/No-go tests; otherwise, scenarios must be hand-coded
• We'd prefer that the evaluation methodology not drive technology:
  • CPU time for evaluation is problematic if S, K, or M >> 1 and time per cycle is slow
  • Current Icarus cycle times: UCT: 0.2-2 s; GGP: 3-90 s
  • All three tech teams are using the same pool of resources (GameMaster and opponents) of the Stanford Logic Group
Demonstration Experiment Questions
• Issues:
  • GDL descriptions have a different status and may be less complete than for Go/No-go tasks
  • Domain scenarios may arrive later
  • Their numbers will be constrained (no generator)
  • The current statistical evaluation methodology may interfere with the demonstration goal (this might be an opportunity to introduce qualitative metrics in Y3)
• Evaluation options:
  • Measure transfer per the same protocol and metric
  • and/or use an entirely more qualitative method