
Slide 1: Learning Agents

Presented by: Huayan Gao ([email protected]), Thibaut Jahan ([email protected]), David Keil ([email protected]), Jian Lian ([email protected])

Students in CSE 333, Distributed Component Systems
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department, The University of Connecticut

Slide 2: Outline

- Agents
- Distributed computing agents
- The JADE platform
- Reinforcement learning
- UML design of agents
- The maze problem
- Conclusion and future work

Slide 3: Agents

Some general features characterizing agents:

- autonomy
- goal-orientedness
- collaboration
- flexibility
- ability to be self-starting
- temporal continuity
- character
- adaptiveness
- mobility
- capacity to learn

Slide 4: Classification of agents

- Interface agents: use AI techniques to provide assistance to the user
- Mobile agents: capable of moving around networks, gathering information
- Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment
- Reactive agents: react to a stimulus or input governed by some state or event in the environment

Slide 5: Distributed Computing Agents

- Common learning goal (strong sense)
- Separate goals but information sharing (weak sense)

Slide 6: The JADE Platform

JADE (Java Agent DEvelopment Framework)
- Java software framework
- Middleware platform
- Simplifies implementation and deployment of multi-agent systems (MAS)

Services provided:
- AMS (Agent Management System): registration, directory, and management
- DF (Directory Facilitator): yellow-pages service
- ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)
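To make these services concrete, here is a minimal sketch of a JADE agent that registers itself with the DF yellow pages at startup and deregisters on takedown (our illustration, not code from the project; the agent class name and the "maze" service type are hypothetical):

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Hypothetical agent that advertises a "maze" service in the DF (yellow pages).
    public class MazeAgent extends Agent {
        @Override
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());                 // AMS-assigned agent identifier

            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze");                    // assumed service type
            sd.setName(getLocalName() + "-maze");  // assumed service name

            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);     // publish in the yellow pages
            } catch (FIPAException e) {
                e.printStackTrace();
                doDelete();                        // give up if registration fails
            }
        }

        @Override
        protected void takeDown() {
            try {
                DFService.deregister(this);        // remove the DF entry on exit
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }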

Slide 7: JADE Platforms for distributed agents

Slide 8: Agents and Markov processes

Agent type (columns) by environment type (rows):

                  Deterministic    Stochastic
    Accessible    Reflex           Solves MDPs
    Inaccessible  Policy-based     Solves non-Markov POMDPs*

* Partially observable Markov decision problems

Slide 9: Learning from the environment

- The environment, especially a distributed one, may be complex and may change
- Hence the need to learn dynamically, without supervision
- Reinforcement learning (RL)
  - used in adaptive systems
  - involves finding a policy
- Q-learning, a special case of RL
  - computes Q-values into a Q-table
  - finds an optimal policy

Slide 10: Policy search

- Policy: a mapping from states to actions
- A policy stands in contrast to a precomputed action sequence
- Agents that precompute action sequences cannot respond to new sensory information
- An agent that follows a policy incorporates sensory information about the current state into its choice of action

Slide 11: Components of a learner

In learning, percepts may help improve the agent's future success in interaction.

Components:
- Learning element: improves the policy
- Performance element: executes the policy
- Critic: applies a fixed performance measure
- Problem generator: suggests experimental actions that will provide information to the learning element
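As a sketch of how these four components might be separated in code (our illustration; all interface, method, and type-parameter names are hypothetical):

    // Hypothetical interfaces for the four learner components (S = state, A = action).
    interface PerformanceElement<S, A> {
        A selectAction(S state);                 // executes the current policy
    }

    interface Critic<S> {
        double evaluate(S state, double reward); // applies a fixed performance measure
    }

    interface LearningElement<S, A> {
        void update(S state, A action, double critique, S nextState); // improves the policy
    }

    interface ProblemGenerator<S, A> {
        A suggestExploratoryAction(S state);     // proposes informative experiments
    }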

Slide 12: A learning agent and its environment

Slide 13: Temporal difference learning

- Uses observed transitions and the differences between utilities of successive states to adjust utility estimates
- Update rule, applied after a transition from state i to state j:

    U(i) ← U(i) + α (R(i) + U(j) − U(i))

  where U is the estimated utility, R is the reward, and α is the learning rate
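A minimal sketch of this update rule in Java (our illustration; the integer state ids and the constant learning rate are assumptions):

    // One temporal-difference update, following the slide's rule:
    //   U(i) <- U(i) + alpha * (R(i) + U(j) - U(i))
    public final class TdLearner {
        private final double[] utility;   // U, indexed by state id
        private final double alpha;       // learning rate (assumed constant here)

        public TdLearner(int numStates, double alpha) {
            this.utility = new double[numStates];
            this.alpha = alpha;
        }

        /** Adjust U(i) after observing a transition i -> j with reward R(i). */
        public void update(int i, int j, double reward) {
            utility[i] += alpha * (reward + utility[j] - utility[i]);
        }
    }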

Slide 14: Q-learning

- Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards
- The agent modifies the values in the table to refine its estimates
- Using the temporal-difference learning approach, the update is applied after the learner goes from state i to state j:

    Q(a, i) ← Q(a, i) + α (R(i) + max_a′ Q(a′, j) − Q(a, i))
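A minimal tabular Q-learning sketch implementing this update (our illustration; the integer state/action encoding and the epsilon-greedy exploration helper are our additions, not from the slides):

    import java.util.Random;

    // Tabular Q-learning following the slide's update:
    //   Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i))
    public final class QLearner {
        private final double[][] q;    // q[state][action]
        private final double alpha;    // learning rate
        private final double epsilon;  // exploration probability (our addition)
        private final Random rng = new Random();

        public QLearner(int numStates, int numActions, double alpha, double epsilon) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
            this.epsilon = epsilon;
        }

        /** Update Q(a, i) after taking action a in state i, reaching state j with reward R(i). */
        public void update(int i, int a, double reward, int j) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : q[j]) best = Math.max(best, v);  // max_a' Q(a', j)
            q[i][a] += alpha * (reward + best - q[i][a]);
        }

        /** Epsilon-greedy action choice: mostly exploit, occasionally explore. */
        public int selectAction(int i) {
            if (rng.nextDouble() < epsilon) return rng.nextInt(q[i].length);
            int best = 0;
            for (int a = 1; a < q[i].length; a++) if (q[i][a] > q[i][best]) best = a;
            return best;
        }
    }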

Slide 15: Q-values

- Definition: Q-values are values Q(a, i) of the expected utility associated with a given action in a given state
- Utility of a state: U(i) = max_a Q(a, i)
- Q-values permit decision making without a transition model
- Q-values are directly learnable from reward percepts
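Both the utility U(i) and a greedy action can be read straight off the Q-table, which is what makes a transition model unnecessary; a standalone sketch (ours; the q[state][action] array layout is assumed):

    // Per the slide: U(i) = max_a Q(a, i).
    final class QTableView {
        static double utility(double[][] q, int i) {
            double u = Double.NEGATIVE_INFINITY;
            for (double v : q[i]) u = Math.max(u, v);   // max over actions
            return u;
        }

        static int greedyAction(double[][] q, int i) {
            int best = 0;
            for (int a = 1; a < q[i].length; a++)
                if (q[i][a] > q[i][best]) best = a;     // argmax_a Q(a, i); no model needed
            return best;
        }
    }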

Slide 16: UML design of agents

- Standard UML does not provide a complete solution for depicting the design of multi-agent systems
- Because multi-agent systems are both actors and software, their design does not follow typical UML practice
- Goals, complex strategies, knowledge, etc. are often missed

Slide 17: Reactive use cases

Slide 18: A maze problem

- A simple example consisting of a maze for which the learner must find a policy; the reward is determined by whether the learner eventually reaches a goal location in the maze
- The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment

Slide 19: Cat and Mouse problem

An example of reinforcement learning. The rules of the Cat and Mouse game:
- the cat catches the mouse;
- the mouse escapes the cat;
- the mouse catches the cheese;
- the game is over when the cat catches the mouse.

Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse

Our project included modifying the existing Java code to enable remote deployment of learning agents and to begin exploring a multiagent version.

Slide 20: Cat-Mouse GUI

Slide 21: Use cases in the Cat-Mouse problem

Slide 22: Classes for the Cat-Mouse problem

Slide 23: Sequence diagram

Slide 24: Maze creation and registration

Slide 25: Cat creation and registration

Slide 26: JADE

The cat looks up the maze via the AMS and DF services.

Slide 27: JADE

Mouse agent creation and registration.

Slide 28: Mouse agent joins the game

Slide 29: Game begins

The game begins, and the Maze (master) and Mouse agents exchange information via ACL messages.

Slide 30: Remote deployment of learning agents

Using JADE, we can deploy maze, mouse, and cat agents:

    jademaze maze1
    jademouse mouse1
    jadecat cat1

jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. To create the agents from a remote PC, we use the following commands:

    jademaze -host hostname mazename
    jadecat -host hostname catname
    jademouse -host hostname mousename
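Such batch files typically wrap JADE's jade.Boot launcher. A plausible sketch of what jademaze might contain (our reconstruction, not the project's actual script; the jar names, package, and class name are hypothetical):

    rem Start a peripheral container on this machine, join the main container
    rem on %2 (hostname), and launch a maze agent named %1 there.
    java -cp jade.jar;catmouse.jar jade.Boot -container -host %2 %1:catmouse.MazeAgent

Here -container starts an agent container that attaches to the JADE main container running on the given host, so the agent is deployed into the existing distributed platform.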

Slide 31: Cat-Mouse in JADE

- JADE allows services to be hosted and discovered in a distributed, dynamic environment.
- On top of those "basic" services, the mouse and cat agents can discover the maze/mouse/cat services provided, and can join or quit the maze server they find through the DF service.

Slide 32: Innovation

- A backbone for a core platform that encourages other agents to connect and join
- Access to ontologies and service descriptions, moving toward interoperability at the service level
- A baseline set of deployed agent services that application developers can use as building blocks for innovative value-added services
- A practical test of a learning agent system complying with FIPA standards

Slide 33: Deployment scenario

Infrastructure deployment:
- Enables agents to interact with service agents developed by others
- Tests applications in a realistic, distributed, open environment

Agent and service deployment:
- FIPA ACL messages to exchange information
- Standard FIPA-ACL-compatible content languages
- FIPA-defined agent management services (directories, communication, and naming)

Slide 34: Conclusions

- Demonstration of a feasible research approach exploring the relationship between reinforcement learning and the deployment of component-based distributed agents
- Communication between agents
- Issues with the space complexity of Q-learning: where n = grid size, m = number of mice, and c = number of cats, the space complexity is 64 n^(2(m+c+1))
- 1 mouse + 1 cat => about 481 MB of storage for the Q-table
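As a rough check of that figure (our arithmetic, under two assumptions: the factor 64 comes from 8 actions times 8-byte Q-values per state, and the grid is 14 x 14, i.e. n = 14): with m = c = 1 the exponent is 2(m+c+1) = 6, and 64 x 14^6 = 481,890,304 bytes, about 481 MB, matching the slide.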

Slide 35: Future work

- Learning in environments that change in response to the learning agent
- Communication among learning agents; multiagent learning
- Overcoming problems of table size under multiagent conditions
- Security in message passing

Slide 36: Partial list of references

- S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS'2001.
- T. Mitchell. Machine learning. McGraw-Hill, 1997.
- A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf
- S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.
- S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.
- R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998.
- K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, 12/96.

