Multi Agent Based Energy Management Control for Commercial Buildings
M. Godoy Simoes Colorado School of Mines Golden, CO 80401-1887
Abstract - This paper discusses the use of Multi-Agent-Systems to control various systems in a commercial building in order to achieve maximum energy efficiency while maintaining comfort for the occupants and allowing a possible interconnection with a smart-grid. An approximate optimal control is proposed, where on-line training of a Bayesian state-machine learns the system for a given utility function. Different aspects and challenges associated with the control of a building are discussed, and a control scheme using Multi-Agent-Systems technology is proposed.
Index Terms - agents, control systems, distributed computing, energy efficiency, multi agent systems
I. INTRODUCTION
Energy usage in commercial and residential buildings
accounts for nearly 40% of the total energy consumption in
the US. Therefore, even small improvements in energy
efficiency may provide a tremendous opportunity to reduce
total energy consumption on a national scale. A great portion
of such energy is associated with low-efficiency space
heating, air-conditioning, and domestic hot-water heating.
Advanced modeling and control, together with the flexibility
of smart-grid technology, allow the integration of renewable
energy, energy storage facilities, and customer participation
in such systems, improving the overall efficiency [1].
A holistic approach considering the many different systems
in a commercial or residential building is required in order to
incorporate the decision making process to achieve maximum
energy efficiency [2]. Thus, a multi-agent-system (MAS)
based control mechanism is proposed in this paper to improve
energy management performance.
A MAS approach is used in large, complex problems, where
agents pursue global goals while operating on local knowledge
and possessing limited abilities [3]. MAS based controllers
have shown considerable promise in systems requiring
nonlinear dynamics and large-scale distributed computing
resources [4], where the speed, reliability and scalability of
such distributed systems make them well suited, for example,
to control a complex system such as a building.
II. CONTROLLING BUILDING SYSTEMS
Buildings are very complex structures consisting of
multiple interconnected systems and layers of abstraction. An
advanced energy management control system should
integrate and fuse data based on thermal behavior, user's
occupancy, electric load, light, and some predictions based on
Saurav Bhattarai Colorado School of Mines Golden, CO 80401-1887
scheduling of rooms and halls. In addition, modern buildings
need to include on-site distributed generation technologies,
demand response management as well as energy storage
technologies. Fig. 1 shows some systems and networks of a
modern building and their interconnections.
Fig. 1: A building as a collection of interacting networks [5]
The building control approach incorporates multiple inputs
to act on multiple outputs (a MIMO system). In addition, the
controller needs to maintain communication among such
different systems and networks. Therefore, a control structure
is depicted in Fig. 2, where the controller receives various
sensor data (temperature, humidity, electric demand,
occupancy of the building) as inputs, processes them and
sends set-points to various actuators in the building to
achieve the goal of energy efficiency, such as a furnace for
maintaining the temperature, lighting controlled in
accordance with usage, or the combined heating/power cycle
of a fuel cell system. Since it is very difficult to quantify
and define comfort, the MAS control uses fuzzy logic
statements in the controller to implement the user interface
input, so that levels of comfort can be defined considering
temperature, air flow, and humidity as variables that affect
the comfort of the occupants within an acceptable range.
[Fig. 2 block labels: Multi Agent Controller; Air Flow Control; Lighting Control; Furnace Control; Building; Sensor Network]
Fig. 2: MAS based feedback control for a building
978-1-4244-9500-9/11/$26.00 © 2011 IEEE
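The fuzzy description of comfort mentioned above can be illustrated with a minimal sketch. The triangular membership functions, their break points, and the names `triangular` and `comfort_level` are assumptions made for illustration only; the paper does not specify its fuzzy rules.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def comfort_level(temp_c, humidity_pct, air_flow_ms):
    """Fuzzy AND (minimum) of three hypothetical 'comfortable' sets."""
    # All break points below are assumed values, not taken from the paper.
    temp_ok = triangular(temp_c, 18.0, 22.0, 26.0)
    humidity_ok = triangular(humidity_pct, 20.0, 45.0, 70.0)
    flow_ok = triangular(air_flow_ms, 0.0, 0.15, 0.5)
    return min(temp_ok, humidity_ok, flow_ok)


print(comfort_level(22.0, 45.0, 0.15))  # every variable at its peak
print(comfort_level(24.0, 45.0, 0.15))  # warmer room, lower comfort
```

Taking the minimum of the memberships is one common choice for a fuzzy AND; a real controller would tune both the sets and the combination rule to occupant feedback.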
III. MULTI AGENT SYSTEM BASED CONTROLLER
Multi-Agent Systems (MAS) consist of a network of
autonomous algorithms, called agents, which are situated in a
particular environment in order to achieve a design objective
using distributed computing resources [4]. Extensive
communication between agents enables decision making that
considers the entire system. Agents have the ability to learn
from past actions, or through communication with other
agents. Agents also have the ability to work with each other
and interact with humans to monitor events and perform tasks [6].
MAS are a distributed computing paradigm, i.e. instead of
using a central computer to process inputs and make
decisions, multiple and less powerful computing resources,
dedicated to individual control agents, can achieve a very
complex goal.
Agents can be designed to take global goal seeking actions
as well as reactive actions. In large systems, emergency
scenarios may arise where an agent needs to take an action
immediately without seeking information from other agents.
With such multiple interdependent systems, a MAS based
control is very appropriate for energy management in modern
buildings. Maintaining energy efficiency and comfort are
functions of various aspects of a building as well as other
factors such as weather forecasts and electricity costs. Agents
can easily incorporate legacy systems in buildings and can be
validated for further advanced performance requirements [7].
This work introduces a MAS based controller (MABC) as
a hybrid structure that uses reinforcement learning, dynamic
programming and Bayesian learning [10-21] together with a
fuzzy logic kernel. Dynamic programming is a very useful tool
for solving nonlinear MIMO control cases, which can be
formulated as either a cost minimization or a maximization
problem. It is well known that the backward numerical process
required for running dynamic programming makes the
computation and storage very problematic, especially for
high-order nonlinear systems (the curse of dimensionality).
However, the literature of the last few years indicates that
many options are possible, such as Heuristic Dynamic
Programming (HDP), Dual Heuristic Programming (DHP) and
Globalized Dual Heuristic Programming (GDHP), and their
action dependent (AD) versions, namely Action Dependent
Heuristic Dynamic Programming (ADHDP), Action Dependent
Dual Heuristic Programming (ADDHP) and Action Dependent
Globalized Dual Heuristic Programming (ADGDHP) [21]. The
basic idea in this MABC is to adapt a Bayesian structure to
approximate the future reward-to-go function J(t) such that it
satisfies a modified Bellman equation, used in dynamic
programming, where instead of finding the exact minimum, an
approximate solution is sought for the following dynamic
programming equation:
$$J^*(X(t)) = \min_{u(t)} \{ J^*(X(t+1)) + g(X(t), X(t+1)) - U_0 \} \qquad (1)$$
where X(t) is the state of the system, g(X(t), X(t+1)) is
the immediate cost incurred by the control action u(t) at time t,
and U_0 is a heuristic balance term.
A proposed structure for a MABC is presented in Fig. 3 where for the sake of simplification only two systems of the
building are considered: the electrical system, and the
heating/cooling system. They are connected to their
respective agents, allowing inputs to the agents as well as
output actions to the systems. A centralized mechanism
reduces redundancy; cost functions and global goals are
identical and do not need to be included in each agent. A
communication channel enables the agents to have
information related to the cost functions and global goals and
to permit agent cooperation for meeting goals.
[Fig. 3 block labels: Electrical System; Heating System; agent layers: communication, strength, behavior, quick actions]
Fig. 3: A multi-agent based control (MABC) system for a building
Fig. 3 also shows the internal implementation of the agents.
The communication layer handles the communication of the
agent with other agents as well as with the controller. The
behavior layer contains information about the global goal
seeking actions of the agent, whereas the quick actions layer
handles emergency needs. For example, a sensor input
corresponding to a slight change in temperature will be
handled in the behavior layer, whereas an input corresponding
to a fire in the building will be handled by the quick actions
layer.
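The split between the behavior layer and the quick actions layer can be sketched as a simple dispatch. The class names, event labels and return strings below are hypothetical and only illustrate the routing idea, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class SensorEvent:
    kind: str      # e.g. "temperature" or "fire" (hypothetical labels)
    value: float


class Agent:
    """Hypothetical agent that routes events to one of two layers."""

    EMERGENCY_KINDS = {"fire"}

    def handle(self, event: SensorEvent) -> str:
        if event.kind in self.EMERGENCY_KINDS:
            return self.quick_action(event)
        return self.behavior(event)

    def quick_action(self, event: SensorEvent) -> str:
        # React immediately, without consulting other agents.
        return f"quick_action:{event.kind}"

    def behavior(self, event: SensorEvent) -> str:
        # Placeholder for goal-seeking logic (utility-based planning).
        return f"behavior:{event.kind}"


agent = Agent()
print(agent.handle(SensorEvent("temperature", 0.3)))
print(agent.handle(SensorEvent("fire", 1.0)))
```

Keeping the emergency path free of inter-agent communication mirrors the requirement that some actions must be taken immediately.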
Agents move through various states in order to achieve
goals, and utility functions are used to measure the fitness of
a certain state to a particular agent goal. The strength layer
contains information about the result of the utility function of
the agent, in a particular state, and the results of the utility
function for moving to another state, the expected utility. The
utility function is defined as follows [3]:
$$u_i : S \rightarrow \mathbb{R} \qquad (2)$$
Equation (2) defines u_i, the utility function of the ith
agent, where S is the set of states. When the agent has to
change states, it calculates the expected utility of moving to
a new state by taking a certain action. The probability of
reaching state s' from state s while taking action a is
P(s, a, s'). For a given state and action, the probabilities of
reaching all possible states s' add to one. So, the expected
utility of taking a certain action a in state s is [3]:
$$E[u_i, s, a] = \sum_{s' \in S} P(s, a, s') \, u_i(s') \qquad (3)$$
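As a small illustration of Equation (3), the expected utility is a probability-weighted sum over successor states. The state names, probabilities and utilities below are made-up numbers used only to show the arithmetic.

```python
def expected_utility(P, u, s, a):
    """E[u, s, a] = sum over s' of P(s, a, s') * u(s')."""
    return sum(prob * u[s_next] for s_next, prob in P[(s, a)].items())


# Hypothetical numbers, for illustration only.
P = {("cold", "heat"): {"ideal": 0.8, "cold": 0.2}}
u = {"ideal": 1.0, "cold": 0.0}

# The outgoing probabilities of a state-action pair must add to one.
assert abs(sum(P[("cold", "heat")].values()) - 1.0) < 1e-12
print(expected_utility(P, u, "cold", "heat"))
```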
The expected utility is very important to determine whether
the agent will move to another state, or whether it is better
to stay in the current state. Fig. 4 shows the electrical agent
in more detail, including the inputs and outputs of the agent.
Inputs such as real time price information and electric demand
determine the behavior of the electrical agent, which decides
whether the building should ramp up electric storage to
minimize costs or else use energy from the utility. Table I has
further information about the individual agents with their
goals and actions. Lighting can also be controlled by the
electric agent proportionally to the human occupancy of
building areas.
[Fig. 4 block labels: Inputs; Outputs; agent layers: communication, strength, behavior, quick actions]
Fig. 4: Electrical agent
TABLE I AGENT CHARACTERISTICS
Agent Type | Inputs | Actions | Goals
Electric | Electric Demand, Utility Pricing, Local Storage Information, Occupancy Info | Control electrical system | Reduce cost of electricity for building while maintaining comfort
Heating/Cooling | Temperature, Humidity | Control heating/cooling system | Maintain comfort for occupants while maintaining energy efficiency
IV. TESTING AND SIMULATION METHODOLOGY
Initially, a simple building structure was defined based on
the EnergyPlus building database and energy balance simulator
[22]. The heating/cooling agent was set to control specific
areas in the building within a defined temperature range. The
simulation and testing performed for this work focused on the
agent behavior layer; four states were defined in order to
achieve the control goal, along with three actions for each
one of those states. In order to have a Markov decision
process, the system was designed such that the choice of
moving to a new state depends only on the agent's current
state and its action [3]. Table II shows the system states and
actions for this simulation scenario.
TABLE II SYSTEM STATES AND ACTIONS
State | Description
1 | Ideal Temperature
2 | Too Cold
3 | Too Hot
4 | Intermediate
Action | Description
1 | Decrease Temperature
2 | Increase Temperature
3 | Do Nothing
It is necessary to define the transition function T(s, a, s'),
where s is the current state, a is the action taken by the
agent and s' is the new state. For this scenario a
deterministic environment is assumed, as in the real world,
where the agent action has a predictable effect and the same
input and sequence of states will always have the same
response. Table III describes the transition functions for the
states and actions previously defined in Table II. Fig. 5
depicts all possible actions and the probabilities in the
overall data set that cause transitions. It can be observed
that the total probability of every action taken in a
particular state (State 1 in the data from Table III) equals 1,
removing any ambiguity in taking such an action. To calculate
the optimal policy (the preferred action) of each state, one
must maximize the expected utility function given in equation
(3); equation (4) defines the policy of a state, represented
by π*, whose preferred action is the one that attains the
maximum.
$$\pi^*(s) = \max_a E[u_i, s, a] = \max_a \sum_{s'} T(s, a, s') \, u_i(s') \qquad (4)$$
TABLE III SYSTEM TRANSITION FUNCTIONS
s | a | s' | T(s,a,s')
1 | 1 | 2 | 0.3
1 | 1 | 4 | 0.7
1 | 2 | 3 | 0.3
1 | 2 | 4 | 0.7
1 | 3 | 1 | 1
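One way to hold the transition function in code is a mapping from each state-action pair to a distribution over next states. The sketch below encodes the State 1 rows as read from Table III (taking the state column to be 1 throughout, which is an interpretation of the table) and checks that each distribution sums to one. The names `build_transition` and `is_normalized` are illustrative.

```python
from collections import defaultdict

# (s, a, s_next, probability) rows for State 1, as read from Table III.
ROWS = [
    (1, 1, 2, 0.3),
    (1, 1, 4, 0.7),
    (1, 2, 3, 0.3),
    (1, 2, 4, 0.7),
    (1, 3, 1, 1.0),
]


def build_transition(rows):
    """Return T[(s, a)] -> {s_next: probability}."""
    T = defaultdict(dict)
    for s, a, s_next, prob in rows:
        T[(s, a)][s_next] = prob
    return dict(T)


def is_normalized(T, tol=1e-9):
    """True if the outgoing probabilities of every (s, a) sum to 1."""
    return all(abs(sum(dist.values()) - 1.0) < tol for dist in T.values())


T = build_transition(ROWS)
print(is_normalized(T))
```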
Fig. 5: State-machine showing actions and transition probabilities
If an agent has as its goal to reach a pre-defined state, it
will be rewarded when reaching it. In this model, State 1
(S1) was considered to be the ideal state condition, rewarded
with a value of 1 when reached, and 0 for other states. A
possible stability problem that must be carefully studied
arises when an agent unnecessarily switches between states
just to receive a larger reward. To overcome this problem,
each movement of the agent causes a depreciation in the
utility of reaching the next state, represented by the
discount factor (γ). Therefore, the Bellman equation (5) must
be solved, and a value-iteration algorithm is used to obtain an
approximate solution for real-time control. While iterating,
the maximum allowed error value can be set to achieve a
pre-defined accuracy, and Equ. (6) is used for updating.
Arbitrary values are assigned as initial values of u(s) for all
states and, after several iterations, the values converge in
accordance with the probabilities of the transition functions
specified in Table III and Fig. 5.
$$u(s) = r(s) + \gamma \, \pi^*(s) \qquad (5)$$
$$u_{t+1}(s) = r(s) + \gamma \max_a \sum_{s'} T(s, a, s') \, u_t(s') \qquad (6)$$
The iteration stops depending on the maximum change (δ)
allowed between successive utility values of the states; this
maximum value is related to the maximum error (ε) allowed in
the system along with its discount factor (γ). Equation (7)
gives the convergence threshold at which a stable solution is
considered reached and the iteration stops. After the utility
function converges, the policies of the states (π) are
calculated using Equ. (4), in order to decide the preferred
action of the agent.
$$\max \delta = \frac{\varepsilon (1 - \gamma)}{\gamma} \qquad (7)$$
V. RESULTS AND ANALYSIS
The MABC was implemented in MATLAB and will eventually
be integrated with a dynamic model of a building. The initial
values for u(s) were set to zero, and a maximum allowed error
of 5% was specified along with a discount factor of 0.5 (a
discount factor of 0.5 means the utility obtained by the agent
when moving between states is reduced by 50%). Using
Equ. (7) to define the convergence limit, the maximum allowed
δ is 0.05. Table IV shows the output iterative values. It can
be observed that the maximum δ for the calculated utility
function is 0.0313 (the change in the utility value of State 1
between iterations 5 and 6), which is below the maximum
allowed δ = 0.05. Therefore, the algorithm stops after 6
iterations. A graphical representation of the changes in δ
through all those iterations is presented in Fig. 6.
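As a quick check of the numbers quoted above, Equ. (7) can be evaluated with ε = 0.05 and γ = 0.5, and the stopping test applied to the State 1 utilities reported for iterations 5 and 6:

```latex
\max \delta = \frac{\varepsilon (1-\gamma)}{\gamma}
            = \frac{0.05 \times (1 - 0.5)}{0.5} = 0.05,
\qquad
|u_6(1) - u_5(1)| = 1.9688 - 1.9375 = 0.0313 < 0.05 .
```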
TABLE IV OUTPUT ITERATIVE VALUES
Iteration | u(s), s = 1 | s = 2 | s = 3 | s = 4
0 | 0 | 0 | 0 | 0
1 | 1 | 0 | 0 | 0
2 | 1.5 | 0.1 | 0.1 | 0.25
3 | 1.75 | 0.22 | 0.22 | 0.4
4 | 1.875 | 0.2990 | 0.2990 | 0.4925
5 | 1.9375 | 0.3458 | 0.3458 | 0.5435
6 | 1.9688 | 0.3716 | 0.3716 | 0.5708
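The update of Equ. (6) with the stopping rule of Equ. (7) can be sketched in a few lines. This is a minimal Python sketch, not the authors' MATLAB code. The State 1 transitions follow Table III; the transitions for States 2 to 4 are assumptions made for illustration, since Fig. 5 is not reproduced in the text, chosen so that either temperature action from State 4 reaches State 1 with probability 0.5 as stated in the analysis.

```python
GAMMA = 0.5        # discount factor used in the first scenario
EPSILON = 0.05     # maximum allowed error (5%)
REWARD = {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}

# T[s][a] -> {s_next: probability}. State 1 follows Table III;
# States 2-4 are ASSUMED values for illustration only.
T = {
    1: {1: {2: 0.3, 4: 0.7}, 2: {3: 0.3, 4: 0.7}, 3: {1: 1.0}},
    2: {1: {2: 1.0}, 2: {1: 0.2, 2: 0.4, 4: 0.4}, 3: {2: 1.0}},
    3: {1: {1: 0.2, 3: 0.4, 4: 0.4}, 2: {3: 1.0}, 3: {3: 1.0}},
    4: {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {4: 1.0}},
}


def expected(u, s, a):
    """Expected utility of taking action a in state s."""
    return sum(p * u[s2] for s2, p in T[s][a].items())


def value_iteration(gamma=GAMMA, epsilon=EPSILON):
    """Apply Equ. (6) until the largest change drops below Equ. (7)."""
    threshold = epsilon * (1.0 - gamma) / gamma
    u = {s: 0.0 for s in T}
    history = [dict(u)]
    while True:
        new_u = {s: REWARD[s] + gamma * max(expected(u, s, a) for a in T[s])
                 for s in T}
        delta = max(abs(new_u[s] - u[s]) for s in T)
        u = new_u
        history.append(dict(u))
        if delta < threshold:
            return u, history


u, history = value_iteration()
for i, row in enumerate(history):
    print(i, [round(row[s], 4) for s in sorted(row)])
```

Under these assumed probabilities the loop stops after 6 updates with utilities close to the last row of Table IV, but the assumed entries should not be read as the transition model actually used in the paper.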
[Fig. 6 plot: Maximum Change in Utility Values of States (vertical axis) versus # of Iterations (horizontal axis)]
Fig. 6: Change in δ through iterations (ε = 5%, γ = 0.5)
Another scenario was considered where the discount factor
was set to 0.7 (the agent is penalized only 30% per move
between states), which changes the maximum allowed δ to
0.0214. The simulation was run again and the results are
presented in Fig. 7. The utility function converges after 12
iterations; the decrease in penalty for movement between
states results in the system trying to find more combinations
of movements to maximize the reward, and the maximum
allowed δ is lower than in the previous case. With the
final utility values obtained in the first case, the policy of the
agent was calculated. The results are presented in Table V.
This simulation study suggests that the agent, while in the
Ideal Temperature state (State 1), would do nothing (Action 3),
and when in State 2 would choose to increase the temperature
(Action 2); this simple automatic decision is exactly what an
ad hoc solution would suggest.
[Fig. 7 plot: Maximum Change in Utility Values of States (vertical axis) versus # of Iterations (horizontal axis); annotation: Max. allowed δ = 0.0214]
Fig. 7: Change in δ through iterations (ε = 5%, γ = 0.7)
TABLE V AGENT POLICIES
State | Preferred Action
1 | 3
2 | 2
3 | 1
4 | 1
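Extracting the preferred action from converged utilities amounts to comparing the expected utility of each action. The sketch below uses the final utilities from Table IV and lists every action that ties for the maximum. The State 1 transitions follow Table III, while the State 4 transitions are assumptions based on the stated 0.5 probability; the function names are illustrative.

```python
# Final utilities from the last row of Table IV.
U = {1: 1.9688, 2: 0.3716, 3: 0.3716, 4: 0.5708}

# T[s][a] -> {s_next: probability}. State 1 follows Table III; the
# State 4 entries are ASSUMED (0.5 to reach State 1 under either action).
T = {
    1: {1: {2: 0.3, 4: 0.7}, 2: {3: 0.3, 4: 0.7}, 3: {1: 1.0}},
    4: {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {4: 1.0}},
}


def action_values(s):
    """Expected utility of each action available in state s."""
    return {a: sum(p * U[s2] for s2, p in dist.items())
            for a, dist in T[s].items()}


def preferred_actions(s, tol=1e-9):
    """All actions whose expected utility ties for the maximum."""
    values = action_values(s)
    best = max(values.values())
    return sorted(a for a, v in values.items() if best - v <= tol)


print(preferred_actions(1))
print(preferred_actions(4))
```

Returning the full set of tied actions, rather than a single one, makes a symmetric case such as State 4 visible instead of hiding it behind an arbitrary tie-break.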
Analysis of the system behavior shows that while in State
4, the agent would decide to decrease the temperature (Action
1). Upon further investigation, it can be seen that the
utility values of moving from State 4 to State 2 or to State 3
are the same, and so the policy could be to move to either
one of those states. Therefore, the action could equally have
been Action 2 (as opposed to Action 1, as listed in Table V).
Such ambiguity is caused by the definition of the transition
function probabilities (Fig. 5), since the probability for
taking Action 1 and Action 2 from State 4 is the same (0.5),
and thus the utility calculation results in the same value.
This ambiguity can be easily avoided when transition function
probabilities are assigned in real building situations. It is
very unlikely that the heating and cooling systems of a
building or area will have exactly the same characteristics,
and thus different probabilities would result for Action 1 and
Action 2. Another option to improve the performance would be
to add a 5th state, changing State 4 to intermediate cold and
State 5 to intermediate hot, or vice-versa.
VI. PATH FORWARD
The simulation and analysis results for the behavior layer
of a single agent have shown very promising performance
and support the continuation of this Multi-Agent-Systems
energy management control for applications in a real physical
environment. A first step in improving the accuracy of the
system would be to implement an online learning scheme,
where the characteristics of the physical system (heating,
cooling, and insulation) are constantly monitored. This would
enable the probabilities of the transition functions to be
constantly updated, allowing real-time tracking of
the utility function for different states.
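One simple way such an online update could work is to keep running counts of observed transitions and use relative frequencies as the estimate of T(s, a, s'). The sketch below is an assumption about how this might be done, not the scheme proposed by the authors; the class `TransitionEstimator` and the sample observations are hypothetical.

```python
from collections import defaultdict


class TransitionEstimator:
    """Running frequency estimate of T(s, a, s') from observations."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s_next):
        """Record one observed transition."""
        self.counts[(s, a)][s_next] += 1

    def probability(self, s, a, s_next):
        """Relative frequency of s_next among transitions seen from (s, a)."""
        total = sum(self.counts[(s, a)].values())
        if total == 0:
            return 0.0  # no data yet for this state-action pair
        return self.counts[(s, a)][s_next] / total


est = TransitionEstimator()
for s_next in [1, 1, 3, 1]:       # made-up observations from (State 4, Action 2)
    est.observe(4, 2, s_next)
print(est.probability(4, 2, 1))
```

With estimates refreshed in this way, the value iteration could be re-run periodically so that the utilities track changes in the physical system.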
In order to improve the overall efficiency of the building
operation, agents for the other systems of the building
(electrical, sensing, monitoring and so on) will have to be
developed, along with a communication layer among the
agents. This would allow calculation of the utilities of the
states of the agents, with the global goals of reducing energy
usage, improving storage capabilities for renewable energy
sources, reducing electricity bills, increasing the comfort of
people and many other possible variables.
VII. CONCLUSION
This paper introduced a Multi-Agent-Systems based
controller for performance improvement and energy
management of modern buildings. The characteristics of
agents dedicated to controlling a building, the strategy of
communication and a methodology for implementing an
optimization algorithm have been demonstrated. Two agents,
electrical and heating/cooling, were analyzed and their
individual behaviors were discussed. The simulation results
are very promising, and a future full-fledged system for
controlling a building will be reported. In order to improve
the accuracy of this control approach, one possibility is to
implement an online learning scheme, where the
characteristics of the physical system (heating, cooling, and
insulation) can be constantly monitored. This will enable the
probabilities of the transition functions to be updated
constantly, which in turn allows for constant updates to the
utility values of the different states. To improve the overall
efficiency of buildings, agents for the other systems of the
building (electrical, sensing, etc.) will be developed, along
with a communication layer for the agents. The controller will
calculate the utility function for the global goal of improving
energy efficiency and including renewable energy sources
in the energy mix. Future work on the development of the
multi agent based control system includes the development of
a complete simulation environment covering all the systems
of a building, in order to consider different decision making
abilities for the control system and the energy management
operation.
ACKNOWLEDGEMENTS
This material is based upon work supported by the
National Science Foundation under Grant No. 0931748.
REFERENCES
[1] M. Ilic, L. Xie, U. Khan, J. Moura, "Modeling of Future Cyber-Physical Energy Systems for Distributed Sensing and Control," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 40, no. 4, July 2010.
[2] P. Zhao, "" Master's Thesis, Dept. Eng., Colorado School of Mines, 2010.
[3] J. Vidal, Fundamentals of Multi agent Systems, 2007, pp. 9-11.
[4] G. Weiss, Multiagent Systems, Massachusetts: The MIT Press, 1999, pp. 6-8, 29.
[5] R. Braun, B. Hoff, D. Mehta, K. Moore, M. Simoes, S. Suryanarayanan, T. Vincent, CPS: Medium: Cyber-Enabled Efficient Energy Management of Structures (CEEMS), NSF Award: 0931748.
[6] P. Maes, "Agents that Reduce Work and Information Overload," Communications of the ACM, July 1994.
[7] K. Y. Lee, R. M. Edwards and P. D. McDaniel, "Survivable power plant operation through multi-agent system-based distributed and intelligent control systems," [Online]. Available: http://www.ece.cmu.edu/~nsf-cps/. [Accessed: Oct. 5, 2010].
[8] Peng Zhao, M. Godoy Simoes, Siddharth Suryanarayanan, "A Conceptual Scheme for Cyber-Physical Systems Based Energy Management in Building Structures," 9th IEEE/IAS INDUSCON International Conference on Industry Applications, São Paulo (Brazil), November 2010.
[9] Peng Zhao, S. Suryanarayanan, M. G. Simoes, "An Energy Management System for Building Structures Using a Multi-Agent Decision-Making Control Methodology," Industry Applications Society Annual Meeting (IAS), 2010 IEEE, Oct. 2010.
[10] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25-38, 1977.
[11] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Belmont, MA: Athena Scientific, 1996.
[12] G. G. Lendaris and C. Paintz, "Training strategies for critic and action neural networks in dual heuristic programming method," Proceedings of the 1997 IEEE International Conference on Neural Networks, Houston, TX, June 1997, pp. 712-717.
[13] D. V. Prokhorov, R. A. Santiago and D. C. Wunsch, "Adaptive critic designs: A case study for neurocontrol," Neural Networks, vol. 8, pp. 1367-1372, 1995.
[14] D. V. Prokhorov and D. C. Wunsch, "Adaptive Critic Designs," IEEE Transactions on Neural Networks, vol. 8, pp. 997-1007, Sept. 1997.
[15] S. Chakraborty, M. Godoy Simoes, "Neural dynamic programming based online controller with a novel trim approach," IEE Proceedings Control Theory & Applications, Jan./Mar. 2005, vol. 152, no. 1, pp. 95-104.
[16] F. W. Lewis, J. Campos and R. Selmic, "On adaptive critic architectures in feedback control," Invited session on Neural Control Systems, IEEE CDC, Phoenix, AZ, 5-10 Dec. 1999.
[17] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control (Chapter 3), edited by W. T. Miller, R. S. Sutton, and P. J. Werbos, Cambridge, MA: The MIT Press, 1990.
[18] P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches (Chapter 13), edited by D. A. White and D. A. Sofge, New York, NY: Van Nostrand Reinhold, 1992.
[19] S. Ferrari and R. F. Stengel, "An Adaptive Critic Global Controller," American Control Conference, May 2002.
[20] T. T. Shannon and G. G. Lendaris, "Adaptive critic based design of a fuzzy motor speed controller," Proceedings of the 2001 IEEE International Symposium on Intelligent Control, Mexico City, Mexico, pp. 359-363, Sept. 2001.
[21] P. E. M. Almeida, M. Godoy Simoes, "Neural optimal control of PEM fuel cells with parametric CMAC networks," IEEE Transactions on Industry Applications, Jan./Feb. 2005, vol. 41, no. 1, pp. 237-245. (IEEE IAS Annual Meeting Paper Prize)
[22] P. Zhao, S. Suryanarayanan, M. Godoy Simoes, "An Energy Management System for Building Structures Using a Multi-agent Decision-Making Control Methodology," IEEE Industry Applications Society Annual Meeting (IAS), 2010, pp. 1-7, 3-7 Oct. 2010.
[23] P. Maes, "Agents that Reduce Work and Information Overload," Communications of the ACM, July 1994.
[24] E. A. Lee, Cyber Physical Systems: Design Challenges. [Online]. http://chess.eecs.berkeley.edu/pubs/427/Lee_CyberPhysical_ISORC.pdf
[25] K. Y. Lee, R. M. Edwards and P. D. McDaniel. (2010, October) Survivable power plant operation through multi-agent system-based distributed and intelligent control systems. [Online]. http://www.ece.cmu.edu/~nsf-cps/
[26] F. Y. Wang, "The Emergence of Intelligent Enterprises: From CPS to CPSS," IEEE Computer Society, July 2010.
[27] S. Suryanarayanan, P. F. Ribeiro, M. G. Simoes, "Grid modernization efforts in the USA and Brazil - some common lessons based on the Smart Grid Initiative," in IEEE Power and Energy Society General Meeting, 2010, pp. 1-5.
[28] J. Lagorse, M. G. Simoes, A. Miraoui, "A Multiagent Fuzzy-Logic-Based Energy Management of Hybrid Systems," IEEE Transactions on Industry Applications, vol. 45, no. 6, pp. 2123-2129, Nov. 2009.
[29] M. Ilic, L. Xie, U. Khan, J. Moura, "Modeling of Future Cyber-Physical Energy Systems for Distributed Sensing and Control," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 40, no. 4, July 2010.
[30] J. Kleissl, Y. Agarwal, "Cyber-physical energy systems: focus on smart buildings," in Proceedings of the 47th Design Automation Conference, Anaheim, 2010.
[31] R. Nathuji and K. Schwan, "VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems," ACM SIGOPS Operating Systems Review, vol. 41, no. 6, p. 278, 2007.
[32] Y. Agarwal, T. Weng, and R. Gupta, "The Energy Dashboard: Improving the Visibility of Energy Consumption at a Campus-Wide Scale," in ACM BuildSys, 2009.