A Study on Automatic Fixture Design Using Reinforcement Learning
Darren Low Wei Wen1, Dennis Neo Wee Keong2 and A Senthil Kumar1
1Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive
1, Block EA, Singapore 117576, Singapore
2Singapore Institute of Manufacturing Technology (SIMTech), 73 Nanyang Drive, Singapore
Email: darren.low@u.nus.edu; mpeask@nus.edu.sg
Abstract
Fixtures are used to locate and secure workpieces for subsequent machining or measurement processes.
Designing these fixtures remains costly because of the significant technical know-how required.
Automated fixture design can mitigate much of this cost by reducing the dependence on skilled
labour, making it an attractive endeavour. Historical attempts at automated fixture design have
predominantly relied on case-based reasoning (CBR) to generate fixtures by extrapolating from
previously proven designs. These approaches are limited by their dependence on a fixturing library.
Rule-based reasoning (RBR) has also been attempted, but has proven difficult to implement
comprehensively. Reinforcement learning, on the other hand, requires no fixturing library; instead,
the agent builds experience by interacting with an environment. This paper discusses the use of
reinforcement learning to generate optimised fixturing solutions through a proposed reinforcement
learning driven fixture design (RL-FD) framework. In response to the fixturing environment,
adjustments to the exploration phase of the reinforcement learning process are studied. A case study
is presented, comparing a conventional exploration method with an adjusted one. Both agents show
improved average results over time, with the adjusted exploration model learning faster.
Key words
Fixture design, reinforcement learning, deep learning, locator positions
1. Introduction
Fixtures are an integral aspect of manufacturing, providing the essential workpiece locating and
clamping elements prior to subsequent processes. Well designed fixtures are crucial to achieving
consistent and accurate manufacturing outcomes. Designing these fixtures is challenging due to the
many engineering considerations and optimisation strategies involved. Conventionally, fixture design
relies on the vast heuristic experience of human designers [1], which demands years of apprenticeship
to acquire. Reliance on these skilled professionals is essential to creating optimal fixtures, which
makes fixtures costly. Moreover, the experience-heavy nature of fixture design creates a
significant knowledge gap for junior fixture design engineers, which can limit the effectiveness of
their contributions. Much research has therefore been done on automating fixture design, which
would potentially reduce design costs, human error and lead time.
1.1. Automated fixture design systems
There has been significant research into the ideal model of an automated fixture design solution.
CBR has been the predominant approach in these attempts [2-5]. CBR matches a given workpiece to
similar proven fixturing designs, then provides the necessary design adjustments [2]. In essence,
CBR requires feature recognition, an indexed design library, design retrieval and evaluation, and
final adjustments to work.
The need to properly index a design library poses a major limitation of CBR. McSherry [6]
argues that CBR is limited by inseparability cases, whereby poor definition of indexing parameters
could result in two equally weighted solutions. This rare scenario could result in a less optimal
solution being used as the reference design. McSherry [6] suggests that this occurs due to
inadequate representation of the indexing mechanisms used. Therefore, the selection of CBR
indexing parameters can be difficult to optimise.
Another limitation of CBR is its reliance on extensive and comprehensive fixture design
libraries to produce fixturing solutions, which limits its ability to process unique
and unorthodox workpieces. If a given workpiece is significantly different from those found in the
library, CBR may not produce a valid result [7]. In other words, although CBR has been shown to be
effective at solving experience-based problems through inference from a library, it cannot adapt
to significantly different situations. This inflexibility limits CBR's potential in handling edge
cases, and CBR ultimately still depends on fixture designers to provide high-quality examples.
Apart from CBR approaches, RBR has also been studied for generating fixture designs [1,
8-10]. Unlike CBR, RBR uses a set of defined rules to convert geometric information into
suggested positions for fixture locating elements.
RBR's main disadvantage is the difficulty of accurately and comprehensively defining the
rules needed to encompass all possible fixture designs. Prentzas and Hatzilygeroudis [11] describe
the complications of converting a domain expert's knowledge and experience into rules precisely
and exhaustively. Additionally, too few rules would result in poor coverage of possible
fixturing problems, whereas too many rules would make the code significantly more complex.
Zhang, et al. [12] published a case study on the use of a combination of CBR and RBR to
generate fixture design solutions. Such approaches attempt to combine the benefits of both
methods, but still suffer from inheriting their individual limitations as discussed previously.
Machine learning has also been explored for automating fixture design [13]. However, much
of the reported work remains conceptual.
1.2. Reinforcement learning
Research into AI has greatly accelerated in the past decade, spurred on by widespread adoption
and availability of machine learning.
Reinforcement learning is a subset of machine learning in which an artificial agent learns
through interaction with an environment. Conventional reinforcement learning agents leverage
deep convolutional neural networks (CNNs) to mimic neurological vision processing [14]. CNNs
are highly successful neural networks capable of processing 2-dimensional (2D) data such as
images or videos [15]. In essence, this neural network receives a 2D input and produces a
relevant agent action as the output.
In reinforcement learning, the agent is trained using a reward function for performance
feedback. This reward function evaluates the action chosen by the agent, resulting in a positive or
negative reward. The agent ultimately strives to achieve the maximum cumulative reward, which
mimics a human performing all the optimal actions required for the task.
Reinforcement learning has been shown to be effective at training agents to accomplish
complex tasks. Classic Atari games have been used extensively to demonstrate effective
reinforcement learning [14, 16]. Reinforcement learning has also shown successes in real-world
problems like traffic engineering [17] and supply-chain management [18].
1.3. Discussion on reinforcement learning driven automated fixture design
The use of reinforcement learning to automate fixture design has significant advantages over
current approaches in CBR and RBR driven fixture design.
Reinforcement learning removes the requirement for a fixturing library and for fixture design
rules, which are prerequisites for CBR and RBR based approaches respectively. Instead,
reinforcement learning enables the adoption of a single fundamental rule: the workpiece must not
move. The agent then trains itself to adapt to the given environment under that rule. This
approach therefore avoids many of the limitations and assumptions affecting CBR and RBR based
fixture design.
The use of reinforcement learning also brings more flexibility and dynamism to generating
fixture designs. As the agent improves over time through interaction with the fixturing
environment, novel solutions could be generated for even the most unorthodox workpieces. CBR
on the other hand, struggles in generating fixturing solutions for workpieces that are significantly
different from those present in the fixture library. Workpieces that require fundamentally different
design rules would also struggle with the rigidity of RBR’s rules. Therefore, reinforcement
learning can be used to train a smart and adaptive agent in generating novel fixturing solutions for
even the most unconventional of workpieces.
In summary, automatic fixture design using CBR and RBR has been well researched, but
critical assumptions in their fundamental mechanisms limit their practicality for fixture design
use. Reinforcement learning provides a radically different approach to generating fixturing
solutions, whereby an agent is progressively trained to make better fixturing decisions.
2. Application Framework
In this paper, we propose a reinforcement learning driven fixture design (RL-FD) framework as a
novel approach to automating fixture design decisions. The RL-FD architecture is shown in
Figure 1. A 2D silhouette of the given workpiece is first generated, after which the initial
locator positions are generated. Using an interactive physics-based silhouette-locator environment,
the fixture's ability to constrain the workpiece is tested. A reinforcement learning agent,
built on a CNN, is trained against this environment. With rewards given for acceptable locator
removals, the agent should progressively get better at fixturing decision making. Eventually, the
agent is able to create highly optimised fixturing solutions, which are stored and listed out to
the user for selection. The selected solution can then be used to automatically generate a
computer-aided design (CAD) fixture.
Figure 1: RL-FD framework
Several key requirements are necessary for reinforcement learning to function; these are
discussed in the following sections.
2.1. Generating workpiece silhouette
In any 3-dimensional fixture system, the outer surface tends to be the most important for locating
and clamping. This surface can be defined using a 2D top-down silhouette of the workpiece, as shown
in Figure 2. Using a 2D silhouette also simplifies the physics and improves compatibility with
readily available CNNs.
Figure 2: Conversion of workpiece to silhouette (a) Isometric view (b) Top view (c) Workpiece Silhouette
2.2. Physics simulation and state representation
To determine if a fixture is properly constraining a particular workpiece silhouette, it is necessary
to represent physics interactions in the environment. This paper utilises pybox2d, a rigid-body
physics simulation library created by Catto [19] and later adapted for Python by Lauer [20]. The
physics environment was programmed to be compatible with OpenAI's Gym interface, so it can be
easily used with other Gym-compatible agents.
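The Gym-style interface this implies can be sketched as follows. This is a plain-Python stub mirroring the `reset`/`step` contract (the paper does not publish its code, so the class and method names here are illustrative, and the pybox2d physics test is stubbed out):

```python
import numpy as np

class FixtureEnv:
    """Sketch of the fixture environment's Gym-style interface.

    The real environment wraps a pybox2d world containing the workpiece
    silhouette (a dynamic body) and the locators (static circles); here
    the physics test is stubbed out so only the interface is shown.
    """
    KEEP, REMOVE = 0, 1

    def __init__(self, n_locators=12, frame_shape=(84, 84)):
        self.n_locators = n_locators
        self.frame_shape = frame_shape
        self.reset()

    def reset(self):
        self.active = set(range(self.n_locators))          # all locators present
        self.pending = list(np.random.permutation(self.n_locators))
        return self._render_state()

    def step(self, action):
        locator = self.pending.pop(0)                      # randomly ordered locator
        reward, done = 0, False
        if action == self.REMOVE:
            self.active.discard(locator)
            if self._workpiece_constrained():
                reward = 1                                 # successful removal
            else:
                done = True                                # fixture invalid: reset
        if not self.pending or len(self.active) <= 3:
            done = True                                    # episode termination rules
        return self._render_state(), reward, done, {}

    def _workpiece_constrained(self):
        return True  # placeholder for the pybox2d force/torque test

    def _render_state(self):
        # placeholder for the rendered, downscaled greyscale frame
        return np.zeros(self.frame_shape, dtype=np.uint8)
```

A Gym-compatible agent would then call `reset()` once per episode and `step(action)` repeatedly, exactly as with any other Gym environment.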
This programmed environment contains two types of objects, as shown in Figure 3a. First, the
silhouette is represented as a dynamic object, in this case a rectangle with a pair of opposing
chamfers. The locators are represented as static circle shapes to replicate point contacts on
the silhouette.
Figure 3: Observation scaling to reduce input size (a) Original (b) Scaled down (c) Greyscale
Reinforcement learning also requires information on the current state of the simulation, which is
used to determine the agent's subsequent actions and behavioural adjustments. In this paper,
pyglet is used to render a 2D image from the current physics state reported by pybox2d. The
rendered image is then scaled down and greyscaled to significantly reduce the state size, as shown
in Figure 3.
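The downscale-and-greyscale step can be sketched with a simple block-average over the rendered RGB frame (the paper does not give its exact resolutions or method, so the scale factor and channel-averaging here are assumptions):

```python
import numpy as np

def preprocess(frame_rgb, scale=4):
    """Downscale an RGB frame by block-averaging and convert to greyscale.

    frame_rgb: uint8 array of shape (H, W, 3) as rendered by pyglet.
    scale: integer downscaling factor (H and W assumed divisible by it).
    """
    h, w, _ = frame_rgb.shape
    grey = frame_rgb.mean(axis=2)                      # greyscale: average the channels
    small = grey.reshape(h // scale, scale,
                         w // scale, scale).mean(axis=(1, 3))
    return small.astype(np.uint8)                      # compact state for the CNN
```

For example, a 160x160x3 render becomes a 40x40 single-channel array, shrinking the state by a factor of 48.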
2.3. Action space and reward feedback
Only two actions are available to the agent: remove or keep the currently selected locator.
These two actions represent the absolute minimum necessary for proper interaction with the
physics simulation, which helps speed up the learning process.
Table 1: Fixture testing (tests 1-6; the force and torque diagrams for each test appear as images in the original)
To encourage the agent to reach a more optimal fixture (one with the fewest locators), a
positive reward is given for each successful remove action. Successful removals are evaluated by
applying a series of forces and torques, each for 30 simulation steps, to the workpiece
silhouette, as summarised in Table 1. The evaluation simply identifies significant linear or
angular movement beyond acceptable limits, which indicates that the workpiece is no longer secure
and the fixture is therefore invalid.
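This force/torque validity check can be sketched against the pybox2d interface (`world.Step`, `body.ApplyForceToCenter`, `body.ApplyTorque`). The displacement and rotation limits and the test list below are illustrative values, not those used in the paper:

```python
import math

def fixture_is_valid(world, body, tests, steps=30,
                     max_disp=0.05, max_rot=math.radians(2)):
    """Apply each test force/torque for `steps` simulation steps and check
    that the silhouette body stays within acceptable movement limits.

    `world` and `body` follow the pybox2d interface; `tests` is a list of
    ((fx, fy), torque) pairs, as summarised in Table 1.
    """
    for force, torque in tests:
        x0, y0, a0 = body.position[0], body.position[1], body.angle
        for _ in range(steps):
            body.ApplyForceToCenter(force, True)
            body.ApplyTorque(torque, True)
            world.Step(1.0 / 60, 6, 2)               # pybox2d step signature
        dx = math.hypot(body.position[0] - x0, body.position[1] - y0)
        da = abs(body.angle - a0)
        if dx > max_disp or da > max_rot:
            return False                             # workpiece moved: fixture invalid
    return True
```

The function returns as soon as any single test moves the workpiece beyond the limits, since one failed test is enough to invalidate the fixture.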
Table 2: Example of possible actions and rewards

Frame              | 1    | 2      | 3      | 4
Action             | Keep | Remove | Remove | -
Reward             | 0    | 1      | 0      | -
Cumulative Reward  | 0    | 1      | 1      | 0

(The Image row of environment snapshots is not reproducible in text.)
In RL-FD, the decision to perform either action is made by the neural network, discussed
subsequently in Section 2.5. To better illustrate the working mechanisms of the environment,
Table 2 demonstrates an example where the following user-selected actions are made:
Frame 1
- A random locator is chosen and highlighted green.
- The agent chooses the keep action, which provides no reward.
Frame 2
- A different randomly chosen locator is highlighted green.
- The agent chooses the remove action, which removes the circle and begins the fixture test.
- The test identifies a constrained workpiece silhouette. The fixture solution is therefore
acceptable, and a reward of 1 is given.
Frame 3
- A different randomly chosen locator is highlighted green.
- The agent chooses the remove action, which removes the circle and begins the fixture test.
- The test identifies an unconstrained workpiece silhouette. The fixture solution is therefore
unacceptable, and a reward of 0 is given. The cumulative reward for this episode is therefore 1.
Frame 4
- As the fixture is considered invalid, the simulation is reset to its original state and
another random locator is chosen.
The simulation resets when any one of the following conditions is met:
a) The fixture is unable to constrain the workpiece.
b) All listed locators have been acted upon by the agent.
c) Only 3 locators remain¹.
Randomised selection of locators provides even representation of all test locators. If the
locators were iterated in a fixed sequence, locators towards the end of the list would have fewer
chances to be acted upon by the agent. Randomising therefore provides wider case representation.
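This randomised presentation order amounts to shuffling the locator list once per episode; a minimal sketch (the function name and optional seed are illustrative):

```python
import random

def locator_order(locators, seed=None):
    """Return the locators in a random order, so every locator has an equal
    chance of being presented early to the agent, rather than the tail of a
    fixed list being starved of decisions."""
    order = list(locators)
    random.Random(seed).shuffle(order)
    return order
```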
¹ Three locators are typically the optimal result in most scenarios, but this goal can be user-adjusted for special cases.
2.4. Automatic generation of initial locators
Figure 4: Locator geometries
Workpiece silhouette geometries are defined using the X- and Y-coordinates of the polygon
vertices within the physics engine. The list of given coordinates can be used to generate the
initial locators around the silhouette using the following equations:

\theta = \arctan2\left(y_B - y_A,\; x_B - x_A\right) \quad (1)

n \in \left[0,\; \frac{\sqrt{(x_B - x_A)^2 + (y_B - y_A)^2}}{2r}\right] \quad (2)

x_n = x_A + r(2n+1)\cos\theta \qquad y_n = y_A + r(2n+1)\sin\theta \quad (3)

x'_n = x_n + r\cos(90^\circ - \theta) \qquad y'_n = y_n - r\sin(90^\circ - \theta) \quad (4)

where (x_A, y_A) and (x_B, y_B) represent the initial and subsequent silhouette geometrical
coordinates respectively, r is the locator radius, and (x_n, y_n) are coordinates between A and B,
where n is a range of integers defined by Eq. (2). (x'_n, y'_n), the centre of each circular
locator, can then be determined using Eq. (4). The conversion continues until all edges of the
given polygon have been acted upon. These final converted coordinates are used to automatically
populate the initial locator positions based on the given geometry.
Locator geometries can also be inserted manually, which is necessary in cases where specific
edges of the workpiece are involved in machining processes.
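Equations (1)-(4) can be implemented directly. This sketch walks each polygon edge A->B in steps of one locator diameter and offsets each centre by the radius normal to the edge (the function name and vertex-list convention are illustrative):

```python
import math

def generate_locators(vertices, r):
    """Place candidate locators of radius r along each polygon edge,
    following Eqs. (1)-(4): walk edge A->B in steps of one locator
    diameter (Eqs. 1-3), then offset each centre by r normal to the
    edge (Eq. 4) so the circle touches the silhouette at a point."""
    locators = []
    m = len(vertices)
    for i in range(m):
        (xa, ya), (xb, yb) = vertices[i], vertices[(i + 1) % m]
        theta = math.atan2(yb - ya, xb - xa)                     # Eq. (1)
        count = int(math.hypot(xb - xa, yb - ya) / (2 * r))      # Eq. (2)
        for n in range(count):
            xn = xa + r * (2 * n + 1) * math.cos(theta)          # Eq. (3)
            yn = ya + r * (2 * n + 1) * math.sin(theta)
            xp = xn + r * math.cos(math.radians(90) - theta)     # Eq. (4)
            yp = yn - r * math.sin(math.radians(90) - theta)
            locators.append((xp, yp))
    return locators
```

For a 4x4 square traversed counter-clockwise with r = 1, this yields two locators per edge, each centred one radius outside the silhouette.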
2.5. Neural network
This paper studies the use of CNNs to arrive at effective fixturing decisions. CNNs are
convenient in this study as a 2D pixel array (Section 2.2) can be used as the input.
The CNN used in this study is shown in Figure 5. This neural network was adapted from
Muntean [21] and was originally designed for Atari game environments. Changes include widening
the neural network input layer to accommodate the larger pixel array and adjustments to grey
scaling.
Figure 5: Neural network (Not drawn to scale)
In this paper, the converted image from Figure 3 is passed through the CNN, which consists of
three convolutional layers. These convolutions reduce the size of the input image between layers,
which reduces downstream computational load while maintaining the spatial relationships of the
original image. A flatten operation is then performed on the output, transforming the resulting
image into 1-dimensional data. The flattened output is then passed through two fully connected
layers of 512 and 2 nodes respectively, with the final 2 nodes representing the decision to
perform either Keep or Remove.
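A network of this shape can be sketched in PyTorch. The three-conv / flatten / 512 / 2 structure follows the description above; the kernel sizes and strides are taken from the Atari DQN of Mnih et al. [14], from which the paper's network was adapted, since the exact values used here are not reported:

```python
import torch
import torch.nn as nn

class FixtureDQN(nn.Module):
    """Sketch of the decision network: three convolutional layers, a
    flatten, then fully connected layers of 512 and 2 nodes (Keep/Remove).
    Kernel sizes and strides are assumptions borrowed from the Atari DQN."""

    def __init__(self, in_size=84):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),   # conv 1
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # conv 2
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # conv 3
            nn.Flatten(),
        )
        with torch.no_grad():                       # infer the flattened size
            n_flat = self.conv(torch.zeros(1, 1, in_size, in_size)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 512), nn.ReLU(),
            nn.Linear(512, 2),                      # Q-values: Keep, Remove
        )

    def forward(self, x):
        return self.head(self.conv(x / 255.0))      # scale uint8 pixels to [0, 1]
```

Computing the flattened size with a dummy forward pass keeps the head correct if the input layer is widened, mirroring the paper's adjustment for its larger pixel array.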
A key consideration in reinforcement learning is the incorporation of an exploration method,
which provides adequate and representative training on all possible actions available to the
agent. At the start of the learning process, the agent picks actions from this exploration
function instead of the neural network. Over time, dependence on this exploration mechanism
decreases and the neural network is used to make decisions instead.
Figure 6: Exploration mechanism. (a) Equal chance of keep or remove actions in the initial exploration phase. (b) 75% chance of remove and 25% chance of keep in the initial exploration phase.
A widely adopted exploration method is to select actions randomly, with each possible action
having an equal chance of being selected. For the environment in this paper, the agent only
needs to decide between keep and remove. The exploration mechanism for these two actions is
therefore as illustrated in Figure 6a, where both actions are chosen 50% of the time.
However, in the proposed fixture design environment, it can be observed that an ideal agent
would perform the remove action frequently in order to maximise its cumulative reward. This
paper therefore also compares a chance-adjusted model in which the remove and keep actions are
performed 75% and 25% of the time respectively, as shown in Figure 6b. This adjustment should
theoretically allow the agent to learn faster by bringing it closer to an ideal state, while
still maintaining sufficient representation of the keep action.
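The chance-adjusted exploration fits naturally into a standard epsilon-greedy scheme; a minimal sketch (the function names and the epsilon decay being handled by the caller are assumptions):

```python
import random

def explore_action(p_remove=0.75):
    """Biased exploration: choose Remove with probability p_remove and
    Keep otherwise. p_remove=0.5 gives the conventional 50-50 scheme;
    p_remove=0.75 gives the chance-adjusted scheme of Figure 6b."""
    KEEP, REMOVE = 0, 1
    return REMOVE if random.random() < p_remove else KEEP

def select_action(q_values, epsilon, p_remove=0.75):
    """Epsilon-greedy wrapper: explore with probability epsilon (decayed
    over training by the caller), otherwise act greedily on the
    network's Q-values."""
    if random.random() < epsilon:
        return explore_action(p_remove)
    return max(range(len(q_values)), key=q_values.__getitem__)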
3. Case study
This paper utilises the workpiece shown in Figure 2a as a case study. Using the proposed RL-FD
framework, a feasible fixture is generated for machining the top surface of this workpiece.
As discussed in Section 2.5, two different agents were trained on this environment until
5,000,000 steps² were performed. These are:
a) 50-50 Agent: performs 50% Remove and 50% Keep when exploring
b) 75-25 Agent: performs 75% Remove and 25% Keep when exploring
\text{Average Final Reward} = \frac{\sum \text{Final Rewards}}{\text{Number of runs}} \quad (5)
The average final reward is used to evaluate learning performance and is calculated at every
step using Eq. (5). Final rewards are recorded when the minimum number of active constraints is
achieved, or when the solution is invalid. In both cases, the physics environment is then reset.
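Eq. (5) amounts to a running average of recorded final rewards; a minimal sketch (the class name is illustrative):

```python
class RewardTracker:
    """Running average of final episode rewards per Eq. (5): the sum of
    recorded final rewards divided by the number of completed runs."""

    def __init__(self):
        self.total = 0.0
        self.runs = 0

    def record(self, final_reward):
        self.total += final_reward
        self.runs += 1

    @property
    def average(self):
        return self.total / self.runs if self.runs else 0.0
```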
² A step counts as a single action performed by the agent. 5,000,000 steps was chosen as the basis of comparison between the two agents, as preliminary runs showed stable results prior to this step count.
Figure 7: Average Final Rewards obtained during training; a single step refers to either a Remove or Keep action.
As shown in Figure 7, both agents show performance improvements over time. This demonstrates
their capability to learn from the fixture design environment and to progressively make better
fixture design decisions. However, the 50-50 Agent initially started with a noticeably lower
average reward and took significantly longer to reach higher scores than the 75-25 Agent. This
observation can be attributed to the more optimal exploration phase of the 75-25 Agent, which
resulted in more favourable learning performance.
Using the same computer in both instances, faster learning was observed from the 75-25 Agent
compared to the 50-50 Agent. The 75-25 Agent also obtained the optimal fixture (highest reward)
in 20 hours, compared to 35 hours for the 50-50 Agent. The 43% time reduction exhibited by the
75-25 Agent is preferable, resulting in an overall faster fixture optimisation process which
consequently reduces fixture lead times.
Figure 8: Selection of training results.
Generated solutions that receive high cumulative rewards are saved and listed out as shown
in Figure 8. The presented solutions have passed the fixturing test described in Section 2.3 and
therefore represent the valid and optimal fixturing solutions for the given workpiece.
Figure 9: Generated fixture (a) Raw workpiece prior to machining (b) Workpiece during intermediate machining processes
Upon selecting any of the fixturing solutions in Figure 8, the user selects which of the given
locators should be converted into clamping elements. The completed CAD model can then be
generated, as shown in Figure 9.
3.1. Discussions
The case study has demonstrated the use of reinforcement learning to automate desirable fixture
designs. As neither fixture design rules nor prior fixture design cases were required, the
previously discussed limitations posed by RBR and CBR respectively were avoided. The adaptive
nature of reinforcement learning empowers the agent to work with varying workpiece shapes,
progressively becoming better at making fixturing decisions through training.
It is also important to plan the relevant machining processes prior to performing RL-FD.
In the case study demonstrated, only the machining of the top surface was considered. Therefore,
the silhouette was obtained with reference to the top surface. RL-FD would need to be performed
again for machining processes that are performed on other surfaces with differing silhouettes.
Optimisation of the exploration method has also exhibited an improvement in learning
performance. Adjustments to the decisions made during the exploration phase led to a significant
reduction in the time required to generate an optimal solution. Such training optimisation
strategies are vital, as solution lead time is a significant consideration for practical use.
Results generated by the proposed automated RL-FD framework can also be applied to robot
grippers. Solutions generated through the studied method can be interpreted as gripper finger
positions, ensuring that a picked-up object remains secure during subsequent movements.
Figure 10: Difficulty in accurately evaluating fixtures
One limitation observed in the described automatic RL-FD framework is the inaccuracy of the
fixturing test performed. As illustrated in Figure 10, when Force A is applied, the workpiece
remains in position and the fixture is therefore valid. On the other hand, Force B would cause a
counter-clockwise rotation of the workpiece, resulting in an invalid fixture. The number,
location and direction of the forces applied during the test are therefore essential to accurate
evaluation of the fixture. This problem could be addressed by modifying the physics engine to
perform a more comprehensive physics test, which would be more computationally expensive if not
defined optimally. Consideration of the required machining processes is a possible avenue for
optimising this fixturing test.
4. Conclusions
Fixture design relies heavily on experience and awareness of a multitude of engineering
considerations. Automating this process is therefore vital, as it can reduce the reliance on
experienced designers and thereby improve overall efficiency. Historical attempts at automated
fixture design have largely adopted CBR and RBR, which exhibit certain fundamental limitations
in practical use. This paper demonstrates an automated RL-FD framework using reinforcement
learning. The framework was described and demonstrated through a case study. Reinforcement
learning was found to be capable of training a neural network to make better fixture design
decisions over time. Optimisation of the exploration mechanism was also shown to be effective in
improving the training time. Further research is ongoing to encompass more complex workpieces
using non-silhouette-based approaches.
References
[1] A. Y. C. Nee and A. S. Kumar, "A Framework for an Object/Rule-Based Automated
Fixture Design System," CIRP Annals, vol. 40, no. 1, pp. 147-151, 1991.
[2] A. S. Kumar and A. Y. C. Nee, "A framework for a variant fixture design system using
case-based reasoning technique," Manufacturing Science and Engineering, ASME, vol. 3,
pp. 763-775, 1995.
[3] S. H. Sun and J. L. Chen, "A fixture design system using case-based reasoning,"
Engineering Applications of Artificial Intelligence, vol. 9, no. 5, pp. 533-540, 1996.
[4] W. Li, P. Li, and Y. Rong, "Case-based agile fixture design," Journal of Materials
Processing Technology, vol. 128, no. 1, pp. 7-18, 2002.
[5] H. Hashemi, A. M. Shaharoun, and I. Sudin, "A case-based reasoning approach for design
of machining fixture," The International Journal of Advanced Manufacturing Technology,
vol. 74, no. 1, pp. 113-124, 2014.
[6] D. McSherry, "The Inseparability Problem in Interactive Case-Based Reasoning," in
Research and Development in Intelligent Systems XVIII, London: Springer, 2002, pp. 109-122.
[7] P. S. Szczepaniak and A. Duraj, "Case-Based Reasoning: The Search for Similar Solutions
and Identification of Outliers," Complexity, vol. 2018, Art. no. 9280787, 2018.
[8] A. Y. C. Nee, A. S. Kumar, S. Prombanpong, and K. Y. Puah, "A Feature-Based
Classification Scheme for Fixtures," CIRP Annals, vol. 41, no. 1, pp. 189-192, 1992.
[9] X. Dong, W. R. DeVries, and M. J. Wozny, "Feature-Based Reasoning in Fixture Design,"
CIRP Annals, vol. 40, no. 1, pp. 111-114, 1991.
[10] A. S. Kumar, A. Y. C. Nee, and S. Prombanpong, "Expert fixture-design system for an
automated manufacturing environment," Computer-Aided Design, vol. 24, no. 6, pp. 316-326, 1992.
[11] J. Prentzas and I. Hatzilygeroudis, "Categorizing approaches combining rule-based and
case-based reasoning," Expert Systems, vol. 24, no. 2, pp. 97-122, 2007.
[12] F. P. Zhang, D. Wu, T. H. Zhang, Y. Yan, and S. I. Butt, "Knowledge component-based
intelligent method for fixture design," The International Journal of Advanced
Manufacturing Technology, vol. 94, no. 9, pp. 4139-4157, 2018.
[13] A. S. Kumar, V. Subramaniam, and T. Boon Teck, "Conceptual design of fixtures using
machine learning techniques," The International Journal of Advanced Manufacturing
Technology, vol. 16, no. 3, pp. 176-181, 2000.
[14] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves,
M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I.
Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level
control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, 2015.
[15] I. Arel, D. C. Rose, and T. P. Karnowski, "Deep Machine Learning - A New Frontier in
Artificial Intelligence Research [Research Frontier]," IEEE Computational Intelligence
Magazine, vol. 5, no. 4, pp. 13-18, 2010.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M.
Riedmiller, "Playing atari with deep reinforcement learning," arXiv preprint
arXiv:1312.5602, 2013.
[17] M. D. Pendrith, "Distributed reinforcement learning for a traffic engineering application,"
in Proceedings of the Fourth International Conference on Autonomous Agents, 2000, pp. 404-411.
[18] S. K. Chaharsooghi, J. Heydari, and S. H. Zegordi, "A reinforcement learning model for
supply chain ordering management: An application to the beer game," Decision Support
Systems, vol. 45, no. 4, pp. 949-959, 2008.
[19] E. Catto. (2011). Box2D. Available: https://github.com/erincatto/Box2D
[20] K. Lauer. (2011). pybox2d. Available: https://github.com/pybox2d/pybox2d
[21] A. Muntean. (2017). Deep Q-Learning. Available:
https://github.com/andreimuntean/Deep-Q-Learning