Learning Agents
Presented by: Huayan Gao ([email protected]), Thibaut Jahan ([email protected]), David Keil ([email protected]), Jian Lian ([email protected])

Students in CSE 333, Distributed Component Systems
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department, The University of Connecticut
Outline

- Agents
- Distributed computing agents
- The JADE platform
- Reinforcement learning
- UML design of agents
- The maze problem
- Conclusion and future work
Agents

Some general features characterizing agents:
- autonomy
- goal-orientedness
- collaboration
- flexibility
- ability to be self-starting
- temporal continuity
- character
- adaptiveness
- mobility
- capacity to learn
Classification of agents

- Interface agents: use AI techniques to provide assistance to the user
- Mobile agents: capable of moving around networks gathering information
- Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment
- Reactive agents: "react" to a stimulus or input that is governed by some state or event in the environment
Distributed Computing Agents

- Common learning goal (strong sense)
- Separate goals, but with information sharing (weak sense)
The JADE Platform
Java Agent DEvelopment Framework
- Java software framework
- Middleware platform
- Simplifies implementation and deployment of multi-agent systems (MAS)
Services provided:
- AMS (Agent Management System): registration, directory, and management
- DF (Directory Facilitator): yellow-pages service
- ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)
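To ground the later slides, a minimal sketch of JADE's programming model: an agent extends jade.core.Agent and schedules its work as Behaviour objects. The agent name and printed text here are illustrative, not taken from our project code.

    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;

    // Minimal JADE agent skeleton: setup() runs once at creation,
    // and behaviours carry out the agent's ongoing work.
    public class HelloAgent extends Agent {
        @Override
        protected void setup() {
            System.out.println("Agent " + getLocalName() + " started.");
            addBehaviour(new CyclicBehaviour(this) {
                @Override
                public void action() {
                    // A real agent would receive and handle ACL messages here.
                    block(); // suspend until a message arrives
                }
            });
        }
    }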
Agents and Markov processes

                   Agent type
Environment type   Deterministic   Stochastic
Accessible         Reflex          Solves MDPs
Inaccessible       Policy-based    Solves non-Markov POMDPs*

*Partially observable Markov decision problems
Learning from the environment

The environment, especially a distributed one, may be complex and may change

Hence the need to learn dynamically, without supervision

Reinforcement learning:
- used in adaptive systems
- involves finding a policy

Q-learning, a special case of RL:
- computes Q-values into a Q-table
- finds an optimal policy
Policy search

A policy is a mapping from states to actions

A policy contrasts with a fixed action sequence

Agents that precompute action sequences cannot respond to new sensory information

An agent that follows a policy incorporates sensory information about the current state into its choice of action (see the sketch below)
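As an illustrative sketch (all types and names here are ours, not the project's), a policy can literally be a lookup from state to action; a greedy policy derives that lookup from a Q-table by picking the highest-valued action in each state.

    import java.util.Map;

    // Hypothetical grid-world types, for illustration only.
    enum Action { UP, DOWN, LEFT, RIGHT }
    record State(int x, int y) {}

    // A policy maps each state to an action.
    interface Policy {
        Action actionFor(State s);
    }

    // Greedy policy: in state s, choose the action with the highest Q-value.
    class GreedyPolicy implements Policy {
        private final Map<State, Map<Action, Double>> qTable;

        GreedyPolicy(Map<State, Map<Action, Double>> qTable) {
            this.qTable = qTable;
        }

        public Action actionFor(State s) {
            Action best = null;
            double bestQ = Double.NEGATIVE_INFINITY;
            for (Map.Entry<Action, Double> e : qTable.get(s).entrySet()) {
                if (e.getValue() > bestQ) {
                    bestQ = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }
    }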
Components of a learner

In learning, percepts may help improve the agent's future success in interaction

Components:
- Learning element: improves the policy
- Performance element: executes the policy
- Critic: applies a fixed performance measure
- Problem generator: suggests experimental actions that will provide information to the learning element
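One possible decomposition of these components as Java interfaces; the names and signatures below are illustrative assumptions, not taken from any project code.

    // Hypothetical domain types.
    interface Percept {}
    interface LearnerAction {}

    interface PerformanceElement {
        LearnerAction execute(Percept p);       // act on the current policy
    }

    interface Critic {
        double critique(Percept p);             // fixed performance measure
    }

    interface LearningElement {
        void improve(Percept p, double score);  // refine the policy from feedback
    }

    interface ProblemGenerator {
        LearnerAction suggestExperiment();      // exploratory action for new information
    }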
Temporal difference learning

- Uses observed transitions and differences between utilities of successive states to adjust utility estimates
- Update rule based on a transition from state i to state j:

U(i) ← U(i) + α(R(i) + U(j) − U(i))

where U is the estimated utility, R is the reward, and α is the learning rate
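For instance (numbers are illustrative): with α = 0.5, U(i) = 0.0, R(i) = 1, and U(j) = 0.5, the update gives U(i) ← 0.0 + 0.5 · (1 + 0.5 − 0.0) = 0.75.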
Q-learning

Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards

The agent modifies the values in the table to refine its estimates.

Using the temporal-difference learning approach, the update applied after the learner goes from state i to state j is:

Q(a, i) ← Q(a, i) + α(R(i) + max_a′ Q(a′, j) − Q(a, i))
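A minimal tabular sketch of this update in Java; the integer state/action encodings are our assumptions, and there is no discount factor, matching the formula above.

    // Illustrative tabular Q-learning, following the update rule above.
    public class QLearner {
        private final double[][] q;   // q[state][action], i.e., Q(a, i)
        private final double alpha;   // learning rate

        public QLearner(int numStates, int numActions, double alpha) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
        }

        // Apply Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i)).
        public void update(int i, int a, double reward, int j) {
            double maxNext = Double.NEGATIVE_INFINITY;
            for (double v : q[j]) maxNext = Math.max(maxNext, v);
            q[i][a] += alpha * (reward + maxNext - q[i][a]);
        }

        // Utility of a state: U(i) = max_a Q(a, i).
        public double utility(int i) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : q[i]) best = Math.max(best, v);
            return best;
        }
    }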
Q-values

Definition: Q-values are values Q(a, i) of expected utility associated with a given action in a given state

Utility of a state: U(i) = max_a Q(a, i)

Q-values permit decision making without a transition model

Q-values are directly learnable from reward percepts
UML design of agents

Standard UML does not provide a complete solution for depicting the design of multi-agent systems

Because agents are at once actors and software, the design of multi-agent systems does not follow typical UML practice

Goals, complex strategies, knowledge, etc. are often missed
A maze problem

A simple example consists of a maze for which the learner must find a policy, where the reward is determined by eventually reaching (or failing to reach) a goal location in the maze.

The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment.
Cat and Mouse problem

An example of reinforcement learning

The rules of the Cat and Mouse game are (a possible reward mapping is sketched below):
- cat catches mouse;
- mouse escapes cat;
- mouse catches cheese;
- the game is over when the cat catches the mouse.
Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse
Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploration of a multiagent version
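One way the rules above might map to scalar rewards for the mouse's learner; the event names and magnitudes below are assumptions for illustration, not values from the cited code.

    // Hypothetical reward assignment for the mouse agent, one value per
    // game event from the rules above. Magnitudes are illustrative only.
    public final class MouseRewards {
        enum Event { CAUGHT_BY_CAT, ESCAPED_CAT, CAUGHT_CHEESE, STEP }

        static double rewardFor(Event e) {
            switch (e) {
                case CAUGHT_BY_CAT: return -100.0; // game over: strong penalty
                case ESCAPED_CAT:   return   10.0; // evading the cat is good
                case CAUGHT_CHEESE: return   50.0; // primary positive goal
                default:            return   -1.0; // small step cost per move
            }
        }
    }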
JADE

The cat looks up the maze via the AMS and DF services
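A sketch of that lookup using JADE's standard DFService.search API; the "maze" service type is an assumption about how the maze agent registers itself.

    import jade.core.AID;
    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Sketch: the cat queries the DF (yellow pages) for agents that
    // registered a "maze" service and keeps the first match's AID.
    public class CatAgent extends Agent {
        private AID mazeServer;

        @Override
        protected void setup() {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze"); // assumed service type used at registration
            template.addServices(sd);
            try {
                DFAgentDescription[] results = DFService.search(this, template);
                if (results.length > 0) {
                    mazeServer = results[0].getName(); // AID of the maze agent
                }
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }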
JADE

Mouse agent creation and registration
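The corresponding creation-and-registration step for the mouse, again with an assumed service type; deregistration on takeDown() keeps the yellow pages consistent.

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Sketch: on startup the mouse publishes itself in the DF so the
    // maze (master) agent can find it.
    public class MouseAgent extends Agent {
        @Override
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());
            ServiceDescription sd = new ServiceDescription();
            sd.setType("mouse");        // assumed service type
            sd.setName(getLocalName()); // e.g., "mouse1"
            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }

        @Override
        protected void takeDown() {
            try {
                DFService.deregister(this);
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }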
Game begins
Game begins, and the Maze (master) and Mouse agents exchange information via ACL messages
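A sketch of that exchange with JADE's ACLMessage API; the INFORM performative and the "position x y" content format are assumptions for illustration.

    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;

    // Sketch of one side of the maze<->mouse exchange: receive a position
    // update from the maze (master) agent and reply with a chosen move.
    public class PositionUpdateBehaviour extends CyclicBehaviour {
        public PositionUpdateBehaviour(Agent a) { super(a); }

        @Override
        public void action() {
            ACLMessage msg = myAgent.receive();    // non-blocking receive
            if (msg == null) { block(); return; }  // wait for the next message
            if (msg.getPerformative() == ACLMessage.INFORM) {
                String[] parts = msg.getContent().split(" ");
                // ... update local game state from parts ...
                ACLMessage reply = msg.createReply();
                reply.setPerformative(ACLMessage.INFORM);
                reply.setContent("move UP");       // illustrative content
                myAgent.send(reply);
            }
        }
    }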
Remote deployment of learning agents

Using JADE, we can deploy maze, mouse, and cat agents:
  Jademaze maze1
  Jademouse mouse1
  Jadecat cat1

Jademaze, jademouse, and jadecat are batch file names that deploy the maze, mouse, and cat agents. To create them from a remote PC, we use the following commands:
  Jademaze -host hostname mazename
  Jademouse -host hostname mousename
  Jadecat -host hostname catname
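The batch files presumably wrap JADE's standard jade.Boot launcher; a direct invocation would look roughly like the following (the agent class names are assumptions):

    java jade.Boot -gui maze1:MazeAgent
    java jade.Boot -container -host hostname mouse1:MouseAgent
    java jade.Boot -container -host hostname cat1:CatAgent

Here -container joins an existing platform whose main container runs on the given host, and each agent is specified as name:class.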
Cat-Mouse in JADE

JADE allows services to be hosted and discovered in a distributed, dynamic environment.

On top of those "basic" services, mouse and cat agents can discover the maze/mouse/cat services provided, and can join or quit the maze server they discover via the DF service.
Innovation

- A backbone for a core platform encouraging other agents to connect and join
- Access to ontologies and service descriptions, to move towards interoperability at the service level
- A baseline set of deployed agent services that application developers can use as building blocks to create innovative value-added services
- A practical test for a learning agent system complying with FIPA standards
Deployment Scenario

Infrastructure deployment:
- enable agents to interact with service agents developed by others
- test applications in a realistic, distributed, open environment

Agent and service deployment:
- FIPA ACL messages to exchange information
- standard FIPA-ACL-compatible content languages
- FIPA-defined agent management services (directories, communication, and naming)
Conclusions
Demonstration of a feasible research approach exploring the relationship between reinforcement learning and deployment of component-based distributed agents
Communication between agents
Issues with the space complexity of Q-learning: where n = grid size, m = # mice, and c = # cats, the space complexity is 64·n^(2(m+c+1))

1 mouse + 1 cat => 481 MB of memory for the Q-table (consistent with 64·n^6 bytes on a 14×14 grid: 64 · 14^6 ≈ 4.8 × 10^8)
Future work
Learning in environments that change in response to the learning agent
Communication among learning agents; multiagent learning
Overcoming problems of table size under multiagent conditions
Security in message-passing