
European Journal of Operational Research 223 (2012) 739–751


Decision Support

Unifying temporal and organizational scales in multiscale decision-making

Christian Wernz a,*, Abhijit Deshmukh b

a Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061, USA
b School of Industrial Engineering, Purdue University, West Lafayette, IN 47907, USA

Article info

Article history:
Received 21 March 2011
Accepted 26 June 2012
Available online 6 July 2012

Keywords:
Distributed decision-making
Game theory
Markov processes
Multi-agent systems
OR in service industries

0377-2217/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.ejor.2012.06.038

* Corresponding author. Tel.: +1 540 231 9772; fax: +1 540 231 3322. E-mail address: [email protected] (C. Wernz).

Abstract

In enterprise systems, making decisions is a complex task for agents at all levels of the organizational hierarchy. To calculate an optimal course of action, an agent has to include uncertainties and the anticipated decisions of other agents, recognizing that they also engage in a stochastic, game-theoretic reasoning process. Furthermore, higher-level agents seek to align the interests of their subordinates by providing incentives. Incentive-giving and receiving agents need to include the effect of the incentive on their payoffs in the optimal strategy calculations. In this paper, we present a multiscale decision-making model that accounts for uncertainties and organizational interdependencies over time. Multiscale decision-making combines stochastic games with hierarchical Markov decision processes to model and solve multi-organizational-scale and multi-time-scale problems. This is the first model that unifies the organizational and temporal scales and can solve a 3-agent, 3-period problem. Solutions can be derived as analytic equations with low computational effort. We apply the model to a service enterprise challenge that illustrates the applicability and relevance of the model. This paper makes an important contribution to the foundation of multiscale decision theory and represents a key step towards solving the general X-agent, T-period problem.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Decision-making in enterprise systems is complex. Reasons for the complexity are that (1) decisions across various organizational levels are interdependent, (2) decisions are made over time with past and future decisions affecting each other, and (3) decisions are made under uncertainty. An action by a decision-making agent at one level of the organization can affect the actions and their outcomes at other organizational levels. Consequentially, a decision maker, who seeks a good or optimal course of action, must consider these interdependencies. In addition, decisions are made over time, with short-term decisions and outcomes affecting long-term decisions and their outcomes. Information about future periods must be taken into consideration when determining an optimal course of action. The organizational and temporal scales of decision-making in complex systems are intertwined, and these scales must not be considered in isolation.

To find an optimal decision resulting in the highest expected reward, an agent has to consider how its actions affect other agents at different organizational levels and at different time scales, and how the decisions of other agents at other organizational and temporal scales affect its own decisions and outcomes. This challenging task is further complicated by the uncertainty that exists between decisions and their outcomes. Lastly, agents may have limited access to information because information is not available, restricted or too costly to obtain.

The organizational and temporal scales are tightly coupled. Decisions at lower levels of the organization are made with higher frequency than decisions at higher levels. Top-level agents make strategic decisions, mid-level agents make tactical decisions and low-level agents make operational decisions. For example, an operational decision in a production environment, such as deciding on order quantities, is made on a daily basis; decisions on production programs, which are made higher up in the hierarchy, are made on a weekly or monthly basis; and, even further up the hierarchy, decisions on the purchase of new production equipment are made with even lower frequency, e.g., quarterly or annually. Thus, the decision frequency is correlated with the hierarchical level. Only by recognizing the interconnectedness of the organizational level and the decision frequency can accurate models be developed that connect short-term, low-level decisions with long-term, high-level decisions.

Another aspect of organizational decision-making that needs to be considered is that of goal alignment and incentives. Decision makers in organizations, particularly in large, anonymous systems, may seek to maximize their own payoff and not necessarily that of the organization. An effective incentive structure ensures that self-interested agents make decisions that are also beneficial to the organization. The alignment of organizational and individual interests can be achieved through incentives that are passed down by superiors to their subordinates. By providing incentives that, for example, depend on the performance of the next higher-level agent, the subordinate's conflict of interest between personal gain and enterprise benefit can be resolved. The incentive has to compensate for the difference in rewards between choosing the action that is most beneficial for the agent and the action that is most beneficial for its superior. Through a hierarchical chain of superiors and subordinates, an incentive structure can ensure that the interest of the top-most agent, who acts in the interest of the organization, is met and that the top-level organizational goals trickle down through the hierarchy.

In this paper, we develop a multiscale decision-making model that integrates the organizational and temporal scales into a unified model, with the goal of determining optimal decision strategies under uncertainty and identifying optimal incentives so that subordinates act in the interest of their superiors. The contribution of this paper is that it combines, in a novel way, game-theoretic modeling of hierarchical interactions with multi-time-scale Markov decision processes. This new approach results in the unification of the organizational and temporal scales for hierarchical multi-agent systems. Previous research has focused either on the organizational or the temporal scale, but a model that can include both scales has not yet been developed. The paper lays the foundation for a comprehensive multiscale decision theory, which recognizes the interactions within and across multiple scales. Scales other than time and hierarchy, for example geography and information, may be included in future model developments.

The paper is structured as follows: in Section 2, we review the existing literature. Section 3 provides a service enterprise example that illustrates a prototypical multiscale decision-making challenge. In Section 4, we introduce the model formulation, and Section 5 describes the solution approach. Section 6 discusses and summarizes the paper.

2. Literature review

Multiscale decision theory combines two streams of research: (1) hierarchical and multi-organizational-scale models and (2) multi-time-scale models. We present the relevant literature in the following sections.

2.1. Multi-organizational-scale interactions

The foundations of multiscale decision theory were laid by Wernz (2008) in his dissertation. The initial model, which served as a building block for further extensions, was a two-agent, one-period model that described the hierarchical interactions between the two agents through their influence on rewards and chances of success (Wernz and Deshmukh, 2007a). In a first extension, Wernz and Deshmukh (2007b) analyzed the effect of multiple agents on the lower organizational level interacting with one superior agent. The generalization of the model that accounts for multi-organizational-scale interactions with multiple agents at each level was presented in Wernz and Deshmukh (2010b). The latter paper was the first to introduce the multiscale decision-making concept and coined the model's name. Multiscale decision-making models have been applied to production planning (Wernz and Deshmukh, 2007a, 2010b), a service enterprise system (Wernz and Henry, 2009) and a "flat world" management challenge (Wernz and Deshmukh, 2007b).

The multiscale framework takes into consideration numerous prior modeling approaches. Schneeweiss (1995, 2003a,b) developed a taxonomy to classify and formally describe hierarchical agent interactions. A unified notation for various distributed decision-making systems was developed. Schneeweiss' approach was applied to production planning (Heinrich and Schneeweiss, 1986) and supply chain management (Schneeweiss and Zimmer, 2004), among others.

The mathematical challenge of multi-level or multiscale systems has been addressed by numerous authors in different disciplines. In systems engineering, Mesarovic et al. (1970) developed a framework for multi-level hierarchical systems consisting of interdependent units. A mathematical theory of coordination was developed based on the conceptual and formal aspects of hierarchical systems. The following interaction relationship between lower and higher levels was proposed: the success of the supremal unit depends on the performance of the infimal unit. This type of hierarchical interaction is also used in multiscale decision theory. The model by Mesarovic et al. is a control theory approach, which requires continuous-time system equations that may be difficult to obtain for real-world systems with human decision makers. To mitigate this problem, the multiscale decision-making model uses a discrete-time approach, which makes obtaining uncertainty data more feasible.

Haimes (1981) proposed a hierarchical holographic model (HHM), a mathematical theory to represent multiple objectives and constraints at various levels of large-scale systems. His approach is similar to bi-level and multi-level optimization (Bard, 1998). A summary of Haimes' work can be found in his book on hierarchical multi-objective analysis (Haimes et al., 1990). HHM has been extensively used in risk management (Haimes et al., 1995).

In hierarchical production planning (HPP), pioneered by Hax and Meal (1975), the hierarchical levels of production planning are connected through a sequential top-down decision process. The higher-level decisions form the constraints for the next lower levels. Similar models and further references are presented in Gfrerer and Zäpfel (1995), Özdamar et al. (1998) and Stadtler (2005).

Game theory has been used to analyze systems with hierarchically interacting decision makers. Deng and Papadimitriou (1999) developed a game-theoretic model with hierarchically arranged decision makers and proposed a linear program to solve conflicting objective functions. Cruz et al. (2001) modeled a military air operation using game theory and employed a two-level hierarchy of command and control for each of the two opposing forces. Okada and Mikami (1992) developed a game-theoretic model to resolve conflicts in a hierarchical acid rain abatement process.

Principal-agent models (Laffont, 1990; Vetschera, 2000) and Stackelberg games (Stackelberg, 1952) have been used to describe strategic interactions between hierarchically interacting agents. Stackelberg games can be modeled as bi-level optimization problems; see for example Nie et al. (2006).

In addition to non-cooperative game theory, cooperation between agents has been considered and modeled. Groves (1973) and Geanakoplos and Milgrom (1991) analyzed hierarchical agent interactions using the economic theory of teams (Marschak and Radner, 1972), a cooperative game theory approach. Brânzei et al. (2002) used cooperative game theory to analyze tree-connected peer group games.

In computer science, multi-agent systems (MAS) are used to simulate the interactions between agents (Weiss, 1999; Monostori et al., 2006). The Defense Advanced Research Projects Agency (DARPA) called for more research in the area of multiscale modeling of agent behavior (Barber, 2007). Dolgov and Durfee (2004) developed a graphical model based on dependency graphs to describe the interactions between agents in MAS. A bipartite graph with agents and their objectives has been proposed by Darabi et al. (2010). MAS have also been used by other disciplines, including industrial engineering, to model systems with distributed decision makers (Krothapalli and Deshmukh, 1999; Middelkoop and Deshmukh, 1999).


2.2. Multi-time-scale interactions

Most hierarchical models are one-period models, while most multi-period models consider only one agent. Even if multi-period models consider two or more agents, the agents' decisions are assumed to occur on the same time scale. The paper at hand is the first to develop a comprehensive model for bridging temporal scales and hierarchical agent interactions. Leading up to this work are the contributions by Wernz and Deshmukh (2009, 2010a). Wernz and Deshmukh (2009) analyzed a multi-period, hierarchical interaction between two agents that make decisions on the same time scale. Wernz and Deshmukh (2010a) extended this work to a multi-time-scale interaction model between two agents, recognizing that the superior agent makes decisions at a lower frequency than its subordinate.

The multi-period, game-theoretic formulation in multiscale decision-making has similarities with stochastic games. Stochastic games build upon classical game theory and model strategic and probabilistic agent interactions over multiple periods. Stochastic games are a combination of Markov decision processes (MDPs) and strategic games. Shapley (1953) was the first to propose an algorithm to solve stochastic games. Zachrisson (1964) coined the term "Markov games" to emphasize the connection to MDPs. For survey papers on stochastic games and solution algorithms, see Howard (1960), Pollatschek and Avi-Itzhak (1969), Raghavan and Filar (1991), Mertens (1992), Filar and Vrieze (1996) and Bowling and Veloso (2000).

Reinforcement learning (Sutton and Barto, 1998) is an alternative approach to solving stochastic games and has been popular in computer science. Agents are assumed to learn over time and to develop a model of the environment. Prominent reinforcement learning solution techniques are Minimax-Q (Littman, 1994) for zero-sum stochastic games and Nash-Q (Hu and Wellman, 1998) for the general-sum problem.

Fictitious play (Vrieze, 1987) is a learning rule in which agents update their beliefs as they play a stochastic game. Fictitious play lies at the intersection of traditional game theory and reinforcement learning. Similar to reinforcement learning, agents simulate the game to refine their strategy; however, the algorithm that selects the best strategy is a game-theoretic one. Fictitious play was named and introduced by Brown (1951); see also Robinson (1951) for an early contribution.

Researchers have developed various approaches to model multi-time-scale phenomena using MDPs. One can distinguish between two categories: (1) multi-time-scale systems that are modeled with MDPs and (2) one-time-scale systems with imposed hierarchical, multi-time-scale structures. Category (2) is mostly used to increase the computational efficiency of solution algorithms; see for example Sutton (1995), Hauskrecht et al. (1998), Parr (1998) and Sutton et al. (1999). For this paper, we are primarily interested in models of category (1), which are described in the following paragraphs.

Chang et al. (2003) proposed multi-time-scale Markov decision processes (MMDPs). The authors' model is an extension of Muppala et al. (1996) and Goseva-Popstojanova and Trivedi (2000), in which the lower level is modeled as a Markov chain and the upper level is described as a Markov reward process.

Jacobson et al. (2003) proposed periodically time-inhomogeneous Markov decision processes (PTMDPs). The evolution of the system is described by (N + 1)-periodic sequences of reward functions and transition probabilities. The first N epochs are fast-scale epochs and the interval N + 1 is a slow-scale cycle.

In their monograph, Sethi and Zhang (1994) focused on hierarchical decision-making in stochastic manufacturing systems. Different hierarchical levels have different time frames associated with their decisions. However, hierarchy is only understood in a temporal sense and the models do not include multi-organizational aspects.

3. Service enterprise example

This section provides an example to demonstrate a prototypical multiscale decision-making challenge. This example is a multi-time-scale extension of a one-period example introduced in Wernz and Henry (2009).

We consider an enterprise that produces, installs, and maintains escalators. The maintenance service division of this enterprise is a three-level hierarchical organization consisting of an account manager, a maintenance supervisor, and a maintenance worker. The manager is responsible for overall customer satisfaction and seeks service contract renewals. The supervisor sets the schedule of the workers, who carry out the maintenance work at the customer site. Each decision maker chooses one of two possible actions, which results in one of two possible outcomes. The outcomes at lower levels affect the chances of success of their superiors, i.e., the probabilities that the superiors reach their preferred states.

Starting at the bottom of the hierarchy, the worker is responsible for preventive maintenance, e.g., lubrication of bearings and gears. Additionally, the worker is tasked with inspecting the escalator for problems, such as abnormal wear, which could jeopardize the equipment's performance. During the visual inspection, the worker chooses to conduct either a superficial inspection or a thorough inspection. The inspection decision of the worker will result, with a certain probability, in the escalator being under-serviced or well-serviced. Given a thorough inspection, the outcome of a well-serviced escalator is more likely than that of an under-serviced one, but both states are possible for either of the worker's decisions. A well-serviced escalator is associated with a larger effort by the worker. A superficial inspection is more likely to result in an under-serviced state with low effort expended by the worker. The worker seeks to expend as little effort as possible and thus prefers the superficial inspection over the thorough inspection.

The maintenance supervisor is in charge of the maintenance operation. She is tasked with setting the daily job schedule for the worker and as such has a longer decision time horizon than the maintenance worker. When choosing the schedule, the supervisor must decide between a heavy workload and a light workload for her subordinate. A heavy workload requires the worker to visit and service a large number of jobsites in a day. A heavy workload has a higher probability of causing the worker to rush, not spending enough time at each job. The worker having to rush will likely result in a future malfunction of the escalator, which will contribute to low service satisfaction of the customer. In contrast, a light workload allows the worker to spend more time at each customer site, which will lead with greater probability to high service satisfaction. Again, both states (high and low service satisfaction) are possible given either decision (light and heavy workload). The supervisor is evaluated on a cost-per-job basis, and her goal is to keep costs low. Therefore, even though service satisfaction is likely to suffer, the decision for a heavy workload is preferred by the supervisor, as the expected cost is lower than with the light workload. In terms of decision outcomes, a well-serviced escalator increases the probability of high service satisfaction and high cost; in contrast, an under-serviced escalator increases the probability of low service satisfaction and low cost.

At the top level of the maintenance services division is the account manager. The manager is responsible for negotiating service contracts with the customer. The manager receives a commission for customers renewing their maintenance service contracts, and his strategic perspective has the largest time scale among the three agents. The service satisfaction outcome influences the probability of contract renewal. High service satisfaction increases the probability of contract renewal, while low service satisfaction reduces the probability. The manager considers a reward-sharing incentive to align the supervisor's goals with his own.

The rewards at the account manager level are larger than those at the supervisor or worker level. The supervisor's and the worker's preferred decisions (heavy workload and superficial inspection, respectively) do not support the manager's goal of contract renewal. To incentivize his subordinates to take cooperative actions, i.e., light workload and thorough inspection, the manager is willing to share a part of his reward to obtain cooperation from the lower levels. Manager, supervisor and worker are represented by agents A1, A2 and A3 in the following general model formulation.

4. Multiscale decision-making model

4.1. Model and notation

The agents’ decisions and their consequences are described as acompetitive Markov decision process (Filar and Vrieze, 1996), acombination of stochastic games and Markov decision processes(MDPs). MDPs model an agent’s actions and the consequences,which include the transition to a new state and the associated re-ward. The extension of MDPs to competitive MDPs accounts for theagents’ stochastic game interactions.

In this paper, we will use and extend the notation first introduced by Wernz and Deshmukh (2007a), which is based on the standard MDP formulation (Puterman, 1994). We begin by describing the MDP formulation for individual agents before introducing the game-theoretic component that models the agents' hierarchical interactions.

We denote agents by A1, A2, ..., Ax, ... with x = 1, ..., X. Agent A1 is the top-most agent, agent A2 its direct subordinate, and so on. The interaction over time is modeled as a discrete-time problem with N − 1 periods. Agents make decisions at the beginning of a period, which is referred to as the decision epoch. Time begins with decision epoch 1, the start of period 1, and ends with period N − 1, followed by the final epoch N. At every decision epoch t = 1, ..., N − 1, each agent carries out an action $a^{Ax}_{m,t} \in A$, where index m = 1, 2, ... denotes the different actions. No action is carried out at the final epoch N, which is merely needed to describe the agents' final states. An agent Ax's current state is denoted by $s^{Ax}_{i,t} \in S$, where index i = 1, 2, ... denotes the different states. Given its current state $s^{Ax}_{i,t}$ and decision $a^{Ax}_{m,t}$, the agent moves to a new state $s^{Ax}_{j,t+1}$ with probability $p^{Ax}_t(s^{Ax}_{j,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{m,t})$, the so-called transition probability. Each agent receives a reward $r^{Ax}_t(s^{Ax}_{j,t+1})$ that depends on the state that is reached in the next period. We assume that the action has no cost. In each period t = 1, ..., N − 1, agents take actions, transition to new states and receive the corresponding rewards.

We present the model formulation for three agents on three different organizational scales, but the model formulation allows for an arbitrarily large number of hierarchical levels and agents. Furthermore, we assume that an agent has the choice between two possible actions and is in either of two states. Consequentially, the action spaces for agents A1–A3 are denoted by $\{a^{Ax}_{1,t}, a^{Ax}_{2,t}\}$ with x = 1, 2, 3 and t = 1, ..., N − 1, and their state spaces by $\{s^{Ax}_{1,t}, s^{Ax}_{2,t}\}$ with x = 1, 2, 3 and t = 1, ..., N. In each period, each agent has a distinct set of two actions and two states.

In the context of our example, the model formulation can be interpreted as follows: agent A3's action $a^{A3}_{1,t}$ corresponds to the thorough inspection and action $a^{A3}_{2,t}$ refers to the superficial inspection. The resulting state $s^{A3}_{1,t+1}$ signifies a well-serviced escalator and state $s^{A3}_{2,t+1}$ denotes an under-serviced escalator. Agent A2's action $a^{A2}_{1,t}$ corresponds to the assignment of a light workload and action $a^{A2}_{2,t}$ refers to a heavy workload. The resulting state $s^{A2}_{1,t+1}$ denotes high service satisfaction and state $s^{A2}_{2,t+1}$ denotes low service satisfaction. In our example, agent A1 does not make a decision, but we include this possibility in our model formulation. The two possible states at the highest level are contract renewal ($s^{A1}_{1,t+1}$) and no contract renewal ($s^{A1}_{2,t+1}$).

Returning to the model notation, the initial rewards for agents Ax are

$r^{Ax}_t(s^{Ax}_{1,t+1}) := q^{Ax}_{1,t}, \qquad r^{Ax}_t(s^{Ax}_{2,t+1}) := q^{Ax}_{2,t},$   (1)

where $r^{Ax}_t$ is the reward function and $q^{Ax}_{i,t}$ the actual reward. Rewards can also be compactly represented as a vector:

$\mathbf{r}^{Ax}_t := \begin{pmatrix} q^{Ax}_{1,t} \\ q^{Ax}_{2,t} \end{pmatrix}.$   (2)

The initial state-dependent transition probabilities for agents Ax are

$p^{Ax}_t(s^{Ax}_{1,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{1,t}) := \alpha^{Ax}_{i \cdot 1,t}, \qquad p^{Ax}_t(s^{Ax}_{2,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{1,t}) := 1 - \alpha^{Ax}_{i \cdot 1,t},$   (3)

$p^{Ax}_t(s^{Ax}_{1,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{2,t}) := 1 - \alpha^{Ax}_{i \cdot 2,t}, \qquad p^{Ax}_t(s^{Ax}_{2,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{2,t}) := \alpha^{Ax}_{i \cdot 2,t},$   (4)

with i = 1, 2, t = 1, ..., N − 1, and $0 \le \alpha^{Ax}_{i \cdot m,t} \le 1$, where m = 1, 2. Similar to the reward notation, $p^{Ax}_t$ represents the transition probability function. Parameter $\alpha^{Ax}_{i \cdot m,t}$ is the actual transition probability. Transition probabilities can also be represented by a matrix

$\mathbf{p}^{Ax}_t(s^{Ax}_{i,t}) := \begin{pmatrix} \alpha^{Ax}_{i \cdot 1,t} & 1 - \alpha^{Ax}_{i \cdot 1,t} \\ 1 - \alpha^{Ax}_{i \cdot 2,t} & \alpha^{Ax}_{i \cdot 2,t} \end{pmatrix}.$   (5)

The matrix rows describe the action, and the columns describe the state. These rewards and transition probabilities are initial values. Next, we show how the initial values are affected by the agent interaction.
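To make this notation concrete, the reward vector (2) and the transition matrices (5) map directly onto arrays. The following sketch is ours, not the authors'; the numerical values are placeholders, and the row/column convention follows Eq. (5).

```python
import numpy as np

# Reward vector r_t^{Ax} (Eq. (2)): entry j-1 holds q_{j,t}, the reward for
# reaching state s_{j,t+1}. Placeholder values.
r = np.array([10.0, 30.0])

# Transition matrices p_t^{Ax}(s_{i,t}) (Eq. (5)), one 2x2 matrix per current
# state i: rows index the action m taken in period t, columns the state j
# reached in period t+1.
p = {
    1: np.array([[0.8, 0.2],    # action a_1 in state s_1
                 [0.4, 0.6]]),  # action a_2 in state s_1
    2: np.array([[0.6, 0.4],
                 [0.2, 0.8]]),
}

# Every row is a probability distribution over next states.
for i, mat in p.items():
    assert np.allclose(mat.sum(axis=1), 1.0)
```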

As stated earlier, lower-level agents influence higher-level agents' chances of success. We operationalize this by transforming the initial transition probabilities into final transition probabilities. The final transition probability of agent A2 depends on the state to which agent A3 transitions. Similarly, agent A1's transition probability is directly affected by agent A2, but is also indirectly affected by agent A3, since agent A3 had an effect on agent A2. Agent A3's transition probability is unaffected since agent A3 is the lowest-level agent.

We model the influence on transition probabilities using an additive influence function f. Function f is described by a constant $c_{x,t}$, which we call the change coefficient. The final transition probability of agent A2 is

$p^{A2}_{final,t}(s^{A2}_{k,t+1} \mid s^{A2}_{h,t}, s^{A3}_{l,t+1}, a^{A2}_{n,t}) := p^{A2}_t(s^{A2}_{k,t+1} \mid s^{A2}_{h,t}, a^{A2}_{n,t}) + f^{A2}_t(s^{A2}_{k,t+1} \mid s^{A3}_{l,t+1})$   (6)

with

$f^{A2}_t(s^{A2}_{k,t+1} \mid s^{A3}_{l,t+1}) := \begin{cases} c_{2,t} & \text{if } k = l, \\ -c_{2,t} & \text{if } k \neq l, \end{cases}$   (7)

for k, h, l, n = 1, 2 and t = 1, ..., N − 1. The final transition probability for agent A1 is

$p^{A1}_{final,t}(s^{A1}_{j,t+1} \mid s^{A1}_{g,t}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1}, a^{A1}_{m,t}) := p^{A1}_t(s^{A1}_{j,t+1} \mid s^{A1}_{g,t}, a^{A1}_{m,t}) + f^{A1}_t(s^{A1}_{j,t+1} \mid s^{A2}_{k,t+1}) \cdot \big(1 + f^{A2}_t(s^{A2}_{k,t+1} \mid s^{A3}_{l,t+1})\big)$   (8)

with

$f^{A1}_t(s^{A1}_{j,t+1} \mid s^{A2}_{k,t+1}) := \begin{cases} c_{1,t} & \text{if } j = k, \\ -c_{1,t} & \text{if } j \neq k, \end{cases}$   (9)

for g, h, j, k, l, m = 1, 2, t = 1, ..., N − 1, and $c_{x,t} > 0$. Since probabilities can neither be negative nor exceed unity, $0 \le p^{Ax}_{final,t}(\cdot) \le 1$ must hold, which restricts the range of the change coefficients to

$0 < c_{2,t} \le \min\{\alpha^{A2}_{i \cdot 1,t}, \alpha^{A2}_{i \cdot 2,t}, 1 - \alpha^{A2}_{i \cdot 1,t}, 1 - \alpha^{A2}_{i \cdot 2,t}\}$ and   (10)

$0 < c_{1,t} \cdot (1 + c_{2,t}) \le \min\{\alpha^{A1}_{i \cdot 1,t}, \alpha^{A1}_{i \cdot 2,t}, 1 - \alpha^{A1}_{i \cdot 1,t}, 1 - \alpha^{A1}_{i \cdot 2,t}\}$   (11)

for i = 1, 2 and t = 1, ..., N − 1.

The meaning and impact of the chosen change coefficient structure is as follows: for t = 2, ..., N, state $s^{A3}_{1,t}$ increases the probability of state $s^{A2}_{1,t}$ and consequentially reduces the probability of state $s^{A2}_{2,t}$. Similarly, given state $s^{A3}_{2,t}$, state $s^{A2}_{2,t}$ becomes more likely and state $s^{A2}_{1,t}$ less likely. The same effect on transition probabilities takes place between agents A1 and A2. In other words, states with the same index correspond to each other, and the transition probability of agent Ax to reach the state with the same index as the next lower-level agent A(x + 1) is additively affected by $c_{x,t}$.
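A minimal sketch (our own, following Eqs. (6)–(9)) of how the additive influence shifts the superiors' transition probabilities. Here `p_A2` and `p_A1` are assumed to be dictionaries of matrices in the format of Eq. (5), and all indices are 1-based as in the text.

```python
def f(k: int, l: int, c: float) -> float:
    """Influence function (Eqs. (7) and (9)): +c if the subordinate's next
    state index matches the superior's target state index, -c otherwise."""
    return c if k == l else -c

def p_final_A2(p_A2, h, k, l, n, c2):
    """Eq. (6): A2's initial probability of moving from state h to state k
    under action n, shifted by A3's next state l."""
    return p_A2[h][n - 1, k - 1] + f(k, l, c2)

def p_final_A1(p_A1, g, j, k, l, m, c1, c2):
    """Eq. (8): A2's next state k shifts A1's probability by +/- c1, and the
    factor (1 + f^{A2}) lets A3's state l amplify or dampen that shift."""
    return p_A1[g][m - 1, j - 1] + f(j, k, c1) * (1 + f(k, l, c2))
```

Conditions (10) and (11) are exactly what keeps these sums inside [0, 1]: with the example data used later in Section 5.1, $\min\{\alpha, 1 - \alpha\} = 0.2$, so $c_{2,t} = 0.15$ and $c_{1,t}(1 + c_{2,t}) = 0.1725$ both satisfy the bounds.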

This type of effect on transition probabilities applies to situations where agent Ax's state supports or hinders the next higher-level agent A(x − 1) reaching a specific state. In the context of our example, a well-serviced escalator increases the chance of high customer satisfaction, which in turn increases the chance of contract renewal.

In addition to transition probabilities, the final agent rewards affect each other since agents pay incentives to their subordinates to motivate cooperation. To model the incentive, we introduce the share coefficient $b_{x,t}$. The share coefficient is the portion of a superior agent's reward that this agent will share with its subordinate. Agent A1 gives an incentive of $b_{1,t} \cdot q^{A1}_{j,t}$ to agent A2. Agent A2 then passes on the share $b_{2,t}$ of its final reward to agent A3. Thus, the final rewards for the agents in period t are

$r^{A1}_{final,t}(s^{A1}_{j,t+1}) := (1 - b_{1,t}) \cdot q^{A1}_{j,t},$   (12)

$r^{A2}_{final,t}(s^{A1}_{j,t+1}, s^{A2}_{k,t+1}) := (1 - b_{2,t})\big[r^{A2}_t(s^{A2}_{k,t+1}) + b_{1,t} \cdot r^{A1}_t(s^{A1}_{j,t+1})\big] = (1 - b_{2,t})\big(q^{A2}_{k,t} + b_{1,t} \cdot q^{A1}_{j,t}\big),$   (13)

$r^{A3}_{final,t}(s^{A1}_{j,t+1}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1}) := r^{A3}_t(s^{A3}_{l,t+1}) + b_{2,t}\big[r^{A2}_t(s^{A2}_{k,t+1}) + b_{1,t} \cdot r^{A1}_t(s^{A1}_{j,t+1})\big] = q^{A3}_{l,t} + b_{2,t} \cdot q^{A2}_{k,t} + b_{1,t} \cdot b_{2,t} \cdot q^{A1}_{j,t}.$   (14)

Fig. 1 graphically summarizes the agent interactions, depicting two agents and two periods.

Fig. 1. Schematic model representation.
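The incentive cascade of Eqs. (12)–(14) is a three-line computation. The sketch below is our transcription, not the authors' code; `q1`, `q2`, `q3` stand for the rewards realized by A1, A2, A3, and `b1`, `b2` for the share coefficients.

```python
def final_rewards(q1, q2, q3, b1, b2):
    """Eqs. (12)-(14): A1 passes the share b1 of its reward down to A2,
    and A2 passes the share b2 of its final reward down to A3."""
    r1 = (1 - b1) * q1               # Eq. (12)
    r2 = (1 - b2) * (q2 + b1 * q1)   # Eq. (13)
    r3 = q3 + b2 * (q2 + b1 * q1)    # Eq. (14)
    return r1, r2, r3

# The cascade only redistributes rewards; the total payout is conserved.
assert sum(final_rewards(450, 10, -1, 0.25, 0.5)) == 450 + 10 - 1
```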

Next, we show how we model that higher-level agents make decisions more strategically compared to lower-level agents, who make more operative decisions. The time horizon from the point at which an agent makes a decision until it makes its next decision is referred to as the decision scale horizon. The lowest-level agent has a decision scale horizon of one period, whereas the next higher-level agent has a decision scale horizon of two or more periods. The strategic nature of higher-level decisions requires these agents to determine their actions for the entire decision scale horizon and to select one time-homogeneous share coefficient value that applies to the entire time span. We denote the share coefficient that agent Ax chooses for the decision scale horizon of periods t1 through t2 by $b_{x,t1 \ldots t2}$.

Lastly, the following assumptions are made:

$q^{A1}_{1,t} > q^{A1}_{2,t}, \qquad q^{A2}_{1,t} < q^{A2}_{2,t}, \qquad q^{A3}_{1,t} < q^{A3}_{2,t},$   (15)

$\alpha^{Ax}_{i \cdot m,t} > \tfrac{1}{2}.$   (16)

The inequalities in (15) express that agent A1 prefers state $s^{A1}_{1,t+1}$ over $s^{A1}_{2,t+1}$, while agents A2 and A3 reversely prefer $s^{Ax}_{2,t+1}$ over $s^{Ax}_{1,t+1}$, at least initially. In the context of our example, this means that the manager receives a higher reward for contract renewal, the supervisor initially receives a higher reward for assigning a heavy workload and the worker initially receives a higher reward for a superficial inspection.

Expression (16) states that an action is linked to the state with the same index. In other words, there is a corresponding action for every state, which is the most likely outcome of the respective action. This restriction circumvents redundant cases in the analysis, but does not limit the generality of the model.

Table 1 provides an overview of the notation introduced in this section.


Table 1
Overview of model notation.

$a^{Ax}_{m,t}$ : Action with index m by agent Ax in period t
$s^{Ax}_{i,t}$ : State with index i of agent Ax in period t
$r^{Ax}_t(s^{Ax}_{j,t+1})$ : Reward function of agent Ax in period t, which depends on the state with index j that the agent reaches in period t + 1
$q^{Ax}_{j,t}$ : Reward of agent Ax in period t when the agent transitions to the state with index j in period t + 1
$\mathbf{r}^{Ax}_t$ : Reward vector of agent Ax in period t
$p^{Ax}_t(s^{Ax}_{j,t+1} \mid s^{Ax}_{i,t}, a^{Ax}_{m,t})$ : Transition probability function of agent Ax for moving from the state with index i to the state with index j when taking the action with index m in period t
$\alpha^{Ax}_{i \cdot m,t}$ : Transition probability of agent Ax in period t for moving from the state with index i to the state that corresponds to the action with index m
$\mathbf{p}^{Ax}_t(s^{Ax}_{i,t})$ : Transition probability matrix for agent Ax when in the state with index i in period t. Matrix rows refer to actions in period t and columns to states reached in period t + 1
$r^{Ax}_{final,t}(\cdot)$ : Final reward function of period t, which includes incoming and outgoing incentive payments
$p^{Ax}_{final,t}(\cdot)$ : Final transition probability function of period t, which includes the influence of other agents
$f^{Ax}_t(s^{Ax}_{j,t+1} \mid s^{A(x+1)}_{k,t+1})$ : Influence function of period t, which describes how agent Ax's transition probability is affected by subordinate agent A(x + 1)
$c_{x,t}$ : Change coefficient of period t, which determines the strength of the additive influence on agent Ax's transition probability based on agent A(x + 1)'s state reached in period t + 1
$b_{x,t}$ : Share coefficient of period t, which determines the share of agent Ax's reward that is passed down to subordinate agent A(x + 1)
$b_{x,t1 \ldots t2}$ : Share coefficient of the decision scale horizon from period t1 to t2, which determines the time-homogeneous share of agent Ax's reward in each period t that is passed down to agent A(x + 1)

4.2. Agents’ decision criteria

We assume that agents are risk-neutral and rational, i.e., agents maximize their expected utilities, or equivalently their expected rewards. Rational agents are able to calculate their own and other agents' expected rewards, can analyze the strategic interdependency of their actions with the actions of other agents, and thus can decide which actions yield the highest expected rewards for themselves. Hence, agents will engage in a game-theoretic reasoning process. The expected reward for agent Ax for a given period t is

$E\big(r^{Ax}_{final,t} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big) := \sum_{j=1}^{2} \sum_{k=1}^{2} \sum_{l=1}^{2} r^{Ax}_{final,t}(\cdot) \cdot p_{final,t}\big(s^{A1}_{j,t+1}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big)$   (17)

with

$p_{final,t}\big(s^{A1}_{j,t+1}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big) := p^{A1}_{final,t}\big(s^{A1}_{j,t+1} \mid s^{A1}_{g,t}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1}, a^{A1}_{m,t}\big) \cdot p^{A2}_{final,t}\big(s^{A2}_{k,t+1} \mid s^{A2}_{h,t}, s^{A3}_{l,t+1}, a^{A2}_{n,t}\big) \cdot p^{A3}_t\big(s^{A3}_{l,t+1} \mid s^{A3}_{i,t}, a^{A3}_{o,t}\big).$   (18)

Eq. (18) relies on the insight that once the effect of the agents' interdependence has been included in the final transition probabilities, each of the multipliers is independent from one another.
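Since the couplings are folded into the final transition probabilities, Eq. (18) is a plain product and Eq. (17) a triple sum. A sketch under the assumption that the current states and actions are fixed inside the supplied closures; all names here are illustrative, not from the paper.

```python
from itertools import product

def expected_period_reward(r_final, pA1, pA2, pA3):
    """Eq. (17): sum the final reward over all 2 x 2 x 2 joint next states
    (j, k, l), weighted by the product-form joint probability of Eq. (18).

    r_final(j, k, l) -> final reward of the agent under consideration;
    pA1(j, k, l), pA2(k, l), pA3(l) -> final transition probabilities."""
    total = 0.0
    for j, k, l in product((1, 2), repeat=3):
        p_joint = pA1(j, k, l) * pA2(k, l) * pA3(l)   # Eq. (18)
        total += r_final(j, k, l) * p_joint
    return total
```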

Agents seek to maximize their cumulative rewards, i.e., the sum of all $r^{Ax}_{final,t}$ over all periods. The cumulative reward function is denoted by $r^{Ax}_{final(t)}(\cdot)$. For agent A3, the cumulative reward is

$r^{A3}_{final(t)}\big(s^{A1}_{j,t}, s^{A2}_{k,t}, s^{A3}_{l,t}\big) := \sum_{\tau=t}^{N-1} r^{A3}_{final,\tau}\big(s^{A1}_{g,\tau+1}, s^{A2}_{h,\tau+1}, s^{A3}_{i,\tau+1}\big)$   (19)

for t = 1, ..., N − 1.

The cumulative rewards for agents A1 and A2 are denoted accordingly and are calculated in the same way. Notice the difference between $r^{Ax}_{final,t}$, which is the reward in period t, and $r^{Ax}_{final(t)}$, which is the cumulative reward from period t until the end of the time horizon.

To calculate the expected cumulative reward, which agents seek to maximize, the backward induction principle (Bellman, 1957; Puterman, 1994) is applied. Starting in the last period of the time horizon and working backwards to period t, the recursive Bellman equation is

$E\big(r^{Ax}_{final(t)} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big) = E\big(r^{Ax}_{final,t} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big) + \sum_{j=1}^{2} \sum_{k=1}^{2} \sum_{l=1}^{2} p_{final,t}\big(s^{A1}_{j,t+1}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big) \cdot E^*\big(r^{Ax}_{final(t+1)} \mid s^{A1}_{j,t+1}, s^{A2}_{k,t+1}, s^{A3}_{l,t+1}\big)$   (20)

with

$E^*\big(r^{Ax}_{final(N)} \mid s^{A1}_{j,N}, s^{A2}_{k,N}, s^{A3}_{l,N}\big) := 0.$

To derive the optimal expected cumulative rewards $E^*\big(r^{Ax}_{final(t)} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}\big)$ from the expected cumulative rewards $E\big(r^{Ax}_{final(t)} \mid s^{A1}_{g,t}, s^{A2}_{h,t}, s^{A3}_{i,t}, a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big)$, agents determine their optimal actions $\big(a^{A1}_{m,t}, a^{A2}_{n,t}, a^{A3}_{o,t}\big)^*$ for t = 1, ..., N − 1.
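A sketch of the backward induction in Eq. (20). It treats the joint action choice as a single maximization, which merely stands in for the game-theoretic selection the paper performs via incentives and scenario comparison; `E_period` and `p_joint` are assumed to implement Eqs. (17) and (18), and all names are ours.

```python
from itertools import product

def backward_induction(N, E_period, p_joint):
    """Eq. (20): V[t][s] plays the role of E*(r_final(t) | s), computed
    backwards from the boundary condition E*(r_final(N) | .) = 0.

    E_period(t, s, a) -> expected final reward of period t (Eq. (17));
    p_joint(t, s, a, s2) -> joint transition probability (Eq. (18));
    s, s2 are joint state triples, a is a joint action triple."""
    states = list(product((1, 2), repeat=3))
    actions = list(product((1, 2), repeat=3))
    V = {N: {s: 0.0 for s in states}}
    for t in range(N - 1, 0, -1):
        V[t] = {
            s: max(
                E_period(t, s, a)
                + sum(p_joint(t, s, a, s2) * V[t + 1][s2] for s2 in states)
                for a in actions
            )
            for s in states
        }
    return V
```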

Without further calculations, we can determine that agent A1's optimal action is $a^{A1}_{1,t}$, since it best supports reaching the preferred state $s^{A1}_{1,t}$ regardless of the actions and outcomes of the subordinates.

Still, agent A1 needs to evaluate whether paying an incentive to agent A2 to ensure cooperation, i.e., action $a^{A2}_{1,t}$, is beneficial. Initially, agent A2 prefers to choose the non-cooperative action $a^{A2}_{2,t}$, which leads with probability greater than 0.5 to a transition to state $s^{A2}_{2,t+1}$. This state, however, reduces agent A1's probability of reaching its preferred state $s^{A1}_{1,t+1}$ due to the influence function $f^{A1}_t\big(s^{A1}_{1,t+1} \mid s^{A2}_{2,t+1}\big) = -c_{1,t}$.

Given a sufficiently large incentive, agent A2 will prefer action $a^{A2}_{1,t}$. Once it prefers the cooperative action, agent A2 will benefit from agent A3 also choosing the cooperative action $a^{A3}_{1,t}$. Agent A1 can incentivize agent A2 by choosing a sufficiently large share coefficient, and, in the same way, agent A2 can incentivize agent A3. We assume that an agent always takes the action which it (weakly) prefers, and in particular chooses the cooperative action if indifferent between two actions.

In the next section, we determine the optimal incentives and the resulting optimal decision strategies.

5. Analysis

The following list shows the key steps of how the multiscale decision-making problem can be solved:

1. Analyze the problem and determine the number of organizational levels and periods, including the length of the decision scale horizons at each level.
2. Determine all possible scenarios that can result from agents' actions by identifying all applicable permutations of agents' decisions and incentive payments.
3. Calculate the cooperation conditions, which describe the minimal value of the share coefficients $b_{x,t}$ that motivate cooperative behavior by the subordinate agent.
4. Determine each scenario's optimal share coefficients $b_{x,t1 \ldots t2}$.
5. Calculate the agents' expected rewards for each scenario's optimal share coefficients $b_{x,t1 \ldots t2}$.
6. Identify the scenario that is the subgame perfect Nash equilibrium. The optimal share coefficients $b^*_{x,t1 \ldots t2}$ and the corresponding decision strategies are found.

Next, we discuss the details of each step and demonstrate the solution procedure alongside a 3-agent, 3-period example (X = 3, N = 4).

5.1. Analyze problem

In general, one starts by analyzing the problem, which includes identifying the agents, their interactions over time, their decision scale horizons and the relevant data. We consider our introductory example, in which three agents form a superior-subordinate chain.

Agent A1 is the highest-level agent and as such makes long-term, strategic decisions. Its decision scale horizon is three periods, i.e., four epochs. In epoch 1, agent A1 determines the share coefficient $b_{1,1 \ldots 3}$, which is the fixed share coefficient for all periods of its decision scale horizon. Agent A1 also determines one type of action that applies to the entire decision scale horizon. Agent A2 at the tactical level has a two-period decision scale horizon for t = 1, 2, followed by a single-period decision scale horizon in t = 3. Its share coefficients are $b_{2,1 \ldots 2}$ and $b_{2,3}$. Agent A3 makes operative decisions in every decision epoch. Lastly, we assume that agent A2 chooses its share coefficient first, followed by agent A1's choice. This order prevents agent A2 from free-riding, i.e., choosing zero for its share coefficients after agent A1 has committed to a positive share coefficient.

Fig. 2 illustrates the 3-agent, 3-period model. The shading of the circles indicates in which periods agents make decisions (stripes) and in which periods decisions are merely executed (bricks).

To demonstrate the results of the analysis, we have chosen the following data:

$\mathbf{r}^{A1}_t = \begin{pmatrix} 450 \\ 50 \end{pmatrix}, \quad \mathbf{r}^{A2}_t = \begin{pmatrix} 10 \\ 30 \end{pmatrix}, \quad \mathbf{r}^{A3}_t = \begin{pmatrix} -1 \\ 0 \end{pmatrix},$

$\mathbf{p}^{Ax}_t(s^{Ax}_{1,t}) = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}, \quad \mathbf{p}^{Ax}_t(s^{Ax}_{2,t}) = \begin{pmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{pmatrix},$

$c_{1,t} = 0.15, \quad c_{2,t} = 0.15 \quad \text{for } t, x = 1, 2, 3.$

Fig. 2. Multiscale model with three agents.

The data is stationary, i.e., time-homogeneous, but the parameters and coefficients could change from period to period.
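For later reference, this example data transcribes directly into arrays (a plain transcription of the values above, using the row/column convention of Eq. (5)):

```python
import numpy as np

# Example data of Section 5.1, stationary for t = 1, 2, 3.
r_A1 = np.array([450.0, 50.0])   # A1 prefers state 1 (contract renewal)
r_A2 = np.array([10.0, 30.0])    # A2 initially prefers state 2
r_A3 = np.array([-1.0, 0.0])     # A3 initially prefers state 2

# Transition matrices, identical for all agents and periods:
# rows = action taken in t, columns = state reached in t+1.
p_s1 = np.array([[0.8, 0.2],
                 [0.4, 0.6]])    # current state s_1
p_s2 = np.array([[0.6, 0.4],
                 [0.2, 0.8]])    # current state s_2

c1 = c2 = 0.15                   # change coefficients c_{1,t} = c_{2,t}
```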

5.2. Determining scenarios

In the next step, one identifies all possible permutations of cooperative and non-cooperative actions the agents can take. In Fig. 3, we show a graphical tool to identify and keep track of all feasible scenarios. Similar to Fig. 2, each circle represents a decision by an agent. The rows of circles in each scenario refer to the agents at the different hierarchical levels. The columns represent the different periods. White circles indicate cooperative actions and black circles non-cooperative actions. Half-black, half-white circles subsume two sub-scenarios, where in the first period the circle is white, while in the second period it is black, or vice versa. We can combine these two sub-scenarios into one scenario since their analysis is identical. The result of the analysis shows which of the two mutually exclusive sub-scenarios applies.

For the 3-level, 3-period problem with the chosen decision scale horizons, twelve scenarios capture all possible permutations of cooperative (white circles) and non-cooperative (black circles) agent behavior.

Fig. 3. Twelve decision scenarios.

A white circle above another white circle indicates that a cooperation-inducing incentive payment by the higher-level agent is made. A white circle above a black circle indicates that no incentive payment is made. No scenario exists where a black circle is above a white circle. Such a scenario would be infeasible, since a superior agent that chooses the non-cooperative action prefers the subordinate agent to also choose the non-cooperative action. Consequentially, the superior agent would not pay an incentive to change the subordinate's already preferred non-cooperative action.

As a reminder, the type of actions within a decision scale horizon must be the same, either cooperative or non-cooperative. The consequence is that both circles in agent A2's decision scale horizon over periods 1 and 2 must be either black or white. A combination of black and white circles within a decision scale horizon is, per definition, not possible.

In all twelve scenarios, agent A1 chooses the cooperative actions, represented by three white circles in the first row. Agent A1 always prefers cooperative actions to non-cooperative actions. Thus, there are no scenarios with black circles in the first row.

For a larger number of agents, periods and decision scale horizons, the number of scenarios that need to be considered increases. The 3-agent, 3-period problem, however, can serve as a starting point and represents a building block to derive the additional scenarios. For example, when extending the model to a 3-agent, 4-period problem, with a 4-period decision scale horizon at the top, two 2-period decision scale horizons at the middle level, and period-by-period decision-making at the bottom, four additional scenarios need to be considered. Scenarios 2, 5, 9 and 11 show a white circle for agent A2 over a black circle for agent A3 in period 3. For the 4-period problem, these scenarios branch into two scenarios, respectively. In one scenario, two white circles of agent A2 in periods 3 and 4 are over two black circles, associated with agent A3. In the other scenario, two white circles are over two half-black, half-white circles. The constellations for all other scenarios extend to four periods by merely repeating the circle colors of the third period in the fourth. Consequentially, only four new scenarios would need to be considered.
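The count of twelve can be checked by enumeration. The sketch below is ours; it encodes the three structural rules just described: agent A1 always cooperates, agent A2 plays one action per decision scale horizon, and a non-cooperative superior is never placed above a cooperative subordinate. Mixed A3 patterns over periods 1 and 2 are counted once, matching the half-black, half-white circles.

```python
from itertools import product

def count_scenarios():
    """Enumerate the feasible cooperation patterns of Fig. 3 ('C' =
    cooperative/white, 'N' = non-cooperative/black). A1 cooperates in all
    periods; A2 picks one action for periods 1-2 and one for period 3; A3
    may only cooperate in periods where A2 cooperates."""
    count = 0
    for a2_12, a2_3 in product("CN", repeat=2):
        # A3 over periods 1-2: C-C, mixed (one combined scenario), or N-N.
        a3_12 = ["CC", "mixed", "NN"] if a2_12 == "C" else ["NN"]
        a3_3 = ["C", "N"] if a2_3 == "C" else ["N"]
        count += len(a3_12) * len(a3_3)
    return count

assert count_scenarios() == 12   # the twelve scenarios of Fig. 3
```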

5.3. Calculate cooperation conditions

One calculates the so-called cooperation conditions, which describe the share coefficients $b_{x,t}$ for which the subordinate agent would choose the cooperative action. This step builds upon the methods previously developed in multiscale decision theory (Wernz and Deshmukh, 2010a; Wernz and Henry, 2009). One obtains inequality conditions for the share coefficients $b_{x,t}$, which are intermediate results that form the basis for the next step of the solution procedure.

Following the backward induction solution approach for MDPs, we first compute the share coefficients of the last period. The share coefficients of agents A1 and A2 that would result in cooperative behavior by their subordinates in the final decision epoch N − 1 are determined in the following two theorems.

Theorem 1. To motivate cooperative behavior by agent A2, the share coefficient offered by agent A1 in the last decision epoch must satisfy

$b_{1,N-1} \ge \frac{q^{A2}_{2,N-1} - q^{A2}_{1,N-1}}{2 c_{1,N-1} \big(q^{A1}_{1,N-1} - q^{A1}_{2,N-1}\big)}.$   (21)

Proof. Agent A2 (weakly) prefers the cooperative action $a^{A2}_{1,N-1}$ if its expected reward is larger than for the non-cooperative action $a^{A2}_{2,N-1}$, i.e., if

$E\big(r^{A2}_{final(N-1)} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{1,N-1}, a^{A3}_{o,N-1}\big) \ge E\big(r^{A2}_{final(N-1)} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{2,N-1}, a^{A3}_{o,N-1}\big).$   (22)

We solve inequality (22) using Bellman's principle for the expected reward calculation shown in (20). For the final period, the backward induction Eq. (20) simplifies and only $E\big(r^{A2}_{final,N-1} \mid \cdot\big)$, the last period's expected reward, needs to be evaluated. Thus, (22) is equivalent to

$E\big(r^{A2}_{final,N-1} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{1,N-1}, a^{A3}_{o,N-1}\big) \ge E\big(r^{A2}_{final,N-1} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{2,N-1}, a^{A3}_{o,N-1}\big).$   (23)

The evaluation of the left and right side according to the expected value calculation (17) shows that the influence of agent A3's decisions on the expected value is the same on both sides of (23). The respective probability terms cancel each other out, i.e., agent A3's action $a^{A3}_{o,N-1}$ does not affect inequality (22). Furthermore, all other transition probabilities of agents A1 and A2 cancel each other out. This means that none of the initial states of any agent affects the result. What remain are the reward differences and the coefficients. Solving (23) and eliminating all terms as just described results in

$2 c_{1,N-1} b_{1,N-1} \big(q^{A1}_{1,N-1} - q^{A1}_{2,N-1}\big) \ge q^{A2}_{2,N-1} - q^{A2}_{1,N-1}.$   (24)

Solving (24) for the share coefficient $b_{1,N-1}$ results in (21), which we set out to find. □

We determine $b_{2,N-1}$ in the following theorem, which shows that it depends on the incentive paid by agent A1, but is still independent of the transition probabilities.

Theorem 2. To motivate cooperative behavior by agent A3, the share coefficient offered by agent A2 in the last decision epoch must satisfy

$b_{2,N-1} \ge \frac{q^{A3}_{2,N-1} - q^{A3}_{1,N-1}}{2 c_{2,N-1} \big(q^{A2}_{1,N-1} - q^{A2}_{2,N-1} + 3 b_{1,N-1} c_{1,N-1} \big(q^{A1}_{1,N-1} - q^{A1}_{2,N-1}\big)\big)}.$   (25)

Proof. Agent A3 (weakly) prefers the cooperative action $a^{A3}_{1,N-1}$ if its expected reward is larger than for the non-cooperative action $a^{A3}_{2,N-1}$, i.e., if

$E\big(r^{A3}_{final(N-1)} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{1,N-1}, a^{A3}_{1,N-1}\big) \ge E\big(r^{A3}_{final(N-1)} \mid s^{A1}_{g,N-1}, s^{A2}_{h,N-1}, s^{A3}_{i,N-1}, a^{A1}_{1,N-1}, a^{A2}_{1,N-1}, a^{A3}_{2,N-1}\big).$   (26)

Notice that agent A2 takes the cooperative action $a^{A2}_{1,N-1}$. Only if agent A2 has been incentivized to take the cooperative action would the agent consider passing on an incentive. The ensuing analysis steps are identical to Theorem 1 and the observations are similar, i.e., the transition probabilities on both sides cancel each other out, and what remains are merely reward differences, share and change coefficients. Solving for $b_{2,N-1}$ results in (25). □

The next step is to calculate the cooperation conditions for the earlier periods t = 1, ..., N − 2, using the backward induction principle. The mathematical steps are the same as described in Theorems 1 and 2. One compares the expected values of the cooperative decisions with the non-cooperative decisions of the subordinate agent to determine the cooperation conditions.

Usually, when solving discrete stochastic programs, all possible state paths need to be analyzed, leading to an exponentially growing solution tree and the curse of dimensionality (Bellman, 1957). Our model has the rare property that we do not have to consider the states the agents are in when calculating the cooperation conditions. As shown in Theorems 1 and 2, the agents' states and transition probabilities do not affect the cooperation conditions. Consequentially, we do not have to distinguish between various cases in the analysis, and the curse of dimensionality does not apply to our model. This fact makes the multiscale decision-making model scalable with respect to the number of periods and results in low computational complexity.

However, the analytic equations of the cooperation conditions grow larger with each step backwards in time. We used the mathematical software Maple® to determine the analytic equations of the remaining cooperation conditions. In the appendix, we provide the analytic equations for t = N − 2. Writing out all analytic equations would not be instructive. Yet, the computational complexity to derive the solutions remains low.

In the next section, we show, using our example data, how the cooperation conditions discussed in this section are used to identify the optimal share coefficients.


Table 2
Share coefficients for scenarios 1–12.

Scenario    b_{1,1...3}    b_{2,1...2}    b_{2,3}
1           0.1667         0.1630         0
2           0.1667         0.4            0
4           0.1667         0.1333         0.3333
5           0.1291         0.2296         0
6           0.1667         0.5414         0
7           0.1667         0              0.3333
9           0.1667         0              0
10          0.1291         0              0
12          0              0              0

Table 3
Initial conditions reference keys.

Reference key    Initial condition
1                (s^{A1}_{1,1}, s^{A2}_{1,1}, s^{A3}_{1,1})
2                (s^{A1}_{1,1}, s^{A2}_{1,1}, s^{A3}_{2,1})
3                (s^{A1}_{1,1}, s^{A2}_{2,1}, s^{A3}_{1,1})
4                (s^{A1}_{1,1}, s^{A2}_{2,1}, s^{A3}_{2,1})
5                (s^{A1}_{2,1}, s^{A2}_{1,1}, s^{A3}_{1,1})
6                (s^{A1}_{2,1}, s^{A2}_{1,1}, s^{A3}_{2,1})
7                (s^{A1}_{2,1}, s^{A2}_{2,1}, s^{A3}_{1,1})
8                (s^{A1}_{2,1}, s^{A2}_{2,1}, s^{A3}_{2,1})


5.4. Determining each scenario’s optimal share coefficient

For scenario 1, we show in detail how the optimal share coefficients are determined. For the other scenarios, the analysis is similar and, in the interest of conciseness, omitted in this paper.

The first step is to compute the candidates for the share coefficients based on the cooperation conditions determined in the previous step. For agent A1, for example, this means that we compute b_{1,1}, b_{1,2}, b_{1,3} as the basis for b_{1,1...3}, the share coefficient of the decision scale horizon.

Starting with the final decision epoch, we evaluate inequalities (21) and (25) of Theorems 1 and 2 to determine the candidates for the share coefficients b_{1,1...3} and b_{2,3} offered by agents A1 and A2, respectively. Since the offering agents have to pay the incentive, they will choose the smallest share coefficient that meets the cooperation conditions, i.e., fulfills the cooperation conditions as an equality. Given this insight and by applying the chosen data, inequalities (21) and (25) result in

b_{1,3} = 0.1\overline{6},   (27)

b_{2,3} = \frac{0.1\overline{6}}{9b_{1,1\ldots3} - 1}.   (28)

Continuing the backward induction and moving on to t = 2, we determine the candidates for the period-2 share coefficients:

b_{1,2a/b} = \frac{6.25b_{2,1\ldots2} - 8.1\overline{6} \pm \sqrt{1.5625(b_{2,1\ldots2})^2 - 3.\overline{3}\,b_{2,1\ldots2} + 1.9\overline{4}}}{45b_{2,1\ldots2} - 63},   (29)

b_{2,2} = \frac{4b_{1,1\ldots3} - 5.\overline{3}}{3240(b_{1,1\ldots3})^2 - 720b_{1,1\ldots3} + 40}.   (30)

Finally, we evaluate the first period, t = 1, and the share coefficients are

b_{1,1a/b} = \frac{16b_{2,1\ldots2} - 17 \pm \sqrt{4(b_{2,1\ldots2})^2 - 8.6b_{2,1\ldots2} + 4.76}}{126b_{2,1\ldots2} - 136.8},   (31)

b_{2,1} = \frac{18.72b_{1,1\ldots3} - 2.24}{1490.4(b_{1,1\ldots3})^2 - 316.8b_{1,1\ldots3} + 16.8}.   (32)

The functional form of the share coefficients is the same for t = 1 and t = 2; merely the numerical values are different. On the right side of Eqs. (28)–(32), we replaced b_{1,3} with b_{1,1...3}, as well as b_{2,1} and b_{2,2} with b_{2,1...2}. In addition, Eq. (28), which expresses b_{2,3} in terms of b_{1,1...3}, has been plugged into (29)–(32), and thus b_{2,3} no longer appears in the equations. For the general form of these equations, without plugged-in data and substitutions, see the appendix.

Eqs. (29) and (31) have two solutions, indicated by the subscript a/b, which refers to solutions a (+ sign) and b (− sign), respectively. The two solutions stem from the fact that b_{1,2} and b_{1,1} are quadratic in their respective equations. The quadratic terms resulted from substituting share coefficient b_{2,3} according to (28).

Now that the candidates for b_{1,1...3} and b_{2,1...2} have been determined, we have to identify the right one. In scenario 1, we seek cooperation in all periods, which is achieved for the maximum value, i.e., when

b_{1,1\ldots3} = \max\{b_{1,1a/b}, b_{1,2a/b}, b_{1,3}\}, \quad b_{2,1\ldots2} = \max\{b_{2,1}, b_{2,2}\}.   (33)

We are now confronted with a system of equations (27)–(33). The most effective way to find a solution in this case is to pick a value for b_{1,1...3} among the five possible values and determine if it solves Eqs. (27)–(33). If not, another one of the remaining values needs to be tried. We assume

b_{1,1\ldots3} = b_{1,3} = 0.1\overline{6}.   (34)

Knowing b_{1,1...3}, we can calculate

b_{2,3} = 0.\overline{3} \quad \text{and}   (35)

b_{2,1\ldots2} = \max\{b_{2,1}, b_{2,2}\} = \max\{0.1630, 0.1\overline{3}\} = 0.1630.   (36)

With knowledge of b_{2,1...2}, we can now confirm that our first pick in (34) was indeed correct:

b_{1,1\ldots3} = \max\{0.1078, 0.1398, 0.1068, 0.1500, 0.1\overline{6}\} = 0.1\overline{6}.   (37)
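This guess-and-verify step can be automated. The sketch below reproduces the scenario-1 check in Python with the example data already substituted into Eqs. (27)–(29), (31) and (32); the candidate b_{2,2} = 0.1\overline{3} from Eq. (30) is omitted since, per (36), it is dominated by b_{2,1}:

```python
from math import sqrt

# Guess-and-verify check of the scenario-1 fixed point, Eqs. (27)-(37).
b13 = 1 / 6    # Eq. (27): candidate b_{1,3}
b1_123 = b13   # trial pick for b_{1,1...3}, Eq. (34)

b23 = b13 / (9 * b1_123 - 1)  # Eq. (28): b_{2,3} = 1/3
b21 = (18.72 * b1_123 - 2.24) / (1490.4 * b1_123**2
                                 - 316.8 * b1_123 + 16.8)  # Eq. (32)
b2_12 = b21    # Eq. (36): b_{2,1} = 0.1630 dominates b_{2,2}

def candidates(n1, n0, d2, d1, d0, t1, t0, b):
    """Both roots (a: '+', b: '-') of the quadratic candidates (29)/(31)."""
    disc = sqrt(d2 * b**2 + d1 * b + d0)
    return ((n1 * b + n0 + disc) / (t1 * b + t0),
            (n1 * b + n0 - disc) / (t1 * b + t0))

b12 = candidates(6.25, -(8 + 1/6), 1.5625, -10/3, 1 + 17/18, 45, -63, b2_12)  # Eq. (29)
b11 = candidates(16, -17, 4, -8.6, 4.76, 126, -136.8, b2_12)                  # Eq. (31)

print(round(b23, 4), round(b2_12, 4))            # 0.3333 0.163
print([round(x, 4) for x in (*b11, *b12, b13)])  # [0.1078, 0.1398, 0.1068, 0.15, 0.1667]
print(max(*b11, *b12, b13) == b13)               # True: Eq. (37) confirms the pick
```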

This concludes the analysis of scenario 1. All share coefficients have been determined. We analyzed the remaining scenarios and found that scenarios 3, 8 and 11 do not have feasible solutions, i.e., for the selected data no share coefficient existed that led to the agent behavior described by these scenarios. For the feasible scenarios, Table 2 summarizes the results by showing the values of the share coefficients.

Having determined the share coefficients for all scenarios, we can now calculate the expected rewards, which will enable us to identify the scenario that describes the agents' optimal strategies.

5.5. Calculate expected rewards

The expected rewards depend on the initial conditions. Eight initial conditions have to be considered, since each of the three agents can start in either of two states (2^3 = 8). Table 3 lists all possible initial conditions and, for reference purposes, a key is assigned to each.
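The enumeration order of these keys is easy to reproduce programmatically; in the short sketch below, the tuples of 1s and 2s are stand-ins for the state labels s^{Ak}_{1,1} and s^{Ak}_{2,1} of Table 3:

```python
from itertools import product

# Enumerate the 2^3 = 8 initial conditions in the order of Table 3;
# entry k of each tuple is the starting state index of agent Ak.
keys = {i + 1: cond for i, cond in enumerate(product((1, 2), repeat=3))}
print(keys[4])  # (1, 2, 2): A1 starts in state 1, A2 and A3 in state 2
```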

For each of the eight initial conditions, Table 4 lists the cumulative expected rewards of agents A1 and A2 for all feasible scenarios. The cumulative expected rewards E[r^{A1}_{final(1)} | ·] and E[r^{A2}_{final(1)} | ·] are the sum of the agents' rewards from periods 1–3. The values are rounded to one decimal place.

5.6. Identify Nash equilibrium scenario

In the sixth and final step of the analysis, we determine which scenario the agents will choose. Agents A1 and A2 choose their optimal share coefficients – and thereby the winning scenario – by analyzing a sequential game of selecting share coefficients. The agents' behavior can be predicted by finding the subgame perfect Nash equilibrium. As a reminder, agent A2 chooses its share coefficients first, followed by agent A1.


Table 4
Expected rewards in each feasible scenario. For each scenario, the two columns give E[r^{A1}_{final(1)} | ·] and E[r^{A2}_{final(1)} | ·].

Initial      Scenario 1         Scenario 2         Scenario 4
condition    A1        A2       A1        A2       A1        A2
1            1035.5    191.5    1017.5    178.4    1004.3    194.1
2            1019.9    190.4    1001.9    177.5    988.7     193.1
3            1005.1    190.6    987.1     177.6    973.9     193.2
4            989.5     189.5    971.5     176.7    958.3     192.2
5            952.8     177.7    934.8     168.3    921.6     179.9
6            937.2     176.7    919.2     167.4    906.0     178.8
7            922.4     176.8    904.4     167.5    891.2     179.0
8            906.8     175.8    888.8     166.6    875.6     177.9

             Scenario 5         Scenario 6         Scenario 7
             A1        A2       A1        A2       A1        A2
1            989.9     204.5    992.7     127.4    976.7     213.9
2            974.3     203.4    976.4     127.1    961.1     212.7
3            959.5     203.5    960.9     127.4    946.3     212.9
4            943.8     202.5    944.6     127.1    930.7     211.7
5            907.2     191.6    906.3     121.3    894.0     197.5
6            891.6     190.6    890.0     121.0    878.4     196.3
7            876.8     190.7    874.5     121.3    863.6     196.5
8            861.2     189.6    858.2     121.3    848.0     195.3

             Scenario 9         Scenario 10        Scenario 12
             A1        A2       A1        A2       A1        A2
1            958.7     239.3    960.1     197.9    962.2     75.1
2            943.1     238.0    943.8     197.3    943.5     76.9
3            928.3     238.2    928.3     198.1    925.8     80.0
4            912.7     236.9    912.0     197.5    907.0     81.9
5            876.0     222.8    873.7     185.1    863.0     75.1
6            860.4     221.5    857.4     184.5    844.3     76.9
7            845.6     221.6    841.9     185.3    826.6     80.0
8            830.0     220.3    825.6     184.7    807.8     81.9

Table 5
Expected rewards E[r^{A1}_{final(1)} | ·] of agent A1 for scenarios 9, 10 and 12. The highest value per initial condition is marked with an asterisk.

Initial condition    Scenario 9    Scenario 10    Scenario 12
1                    958.7         960.1          962.2*
2                    943.1         943.8*         943.5
3                    928.27        928.30*        925.8
4                    912.7*        912.0          907.0
5                    876.0*        873.7          863.0
6                    860.4*        857.4          844.3
7                    845.6*        841.9          826.6
8                    830.0*        825.6          807.8



Typically, sequential games are analyzed by pruning the game tree, i.e., by starting with the agent who makes the last decision (A1) and then determining the decision of the first agent (A2). This backward induction would be a cumbersome process in our case, since all nine remaining scenarios with eight initial conditions would have to be evaluated. Instead, we propose a forward analysis, which is more efficient since fewer scenarios and initial conditions need to be evaluated.

We start with agent A2's best scenario, which is the scenario that results in the highest expected rewards. Given the first mover's decision, the best response of the second mover, agent A1, can be determined. If agent A1's response – i.e., its choice of share coefficients – results in the best scenario for agent A2, we have found the subgame perfect Nash equilibrium. If agent A1's response does not give agent A2 its highest expected reward, we do not have an equilibrium and we repeat the analysis for agent A2's next best scenario. We proceed in this fashion until an equilibrium is found.
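The sketch below implements this forward search for initial condition 1. The reward pairs are those of Table 4, and the scenario groups follow Table 2 (scenarios that share agent A2's coefficient values), with only the groups needed for this initial condition spelled out:

```python
# Forward search for the subgame perfect Nash equilibrium of the
# sequential share-coefficient game, shown for initial condition 1.
# Reward pairs are (E[r^A1], E[r^A2]) from Table 4; scenario 12 is
# agent A1's zero-coefficient response in each listed group.
rewards = {1: (1035.5, 191.5), 2: (1017.5, 178.4), 4: (1004.3, 194.1),
           5: (989.9, 204.5), 6: (992.7, 127.4), 7: (976.7, 213.9),
           9: (958.7, 239.3), 10: (960.1, 197.9), 12: (962.2, 75.1)}

# Scenarios agent A1 can still induce once A2 has fixed its coefficients.
groups = {9: {9, 10, 12}, 10: {9, 10, 12}, 12: {9, 10, 12}, 7: {7, 12}}

def forward_search(rewards, groups):
    # Try agent A2's scenarios from highest to lowest expected reward.
    for s in sorted(rewards, key=lambda k: rewards[k][1], reverse=True):
        if s not in groups:
            continue  # group not spelled out in this sketch; skip
        # Agent A1's best response within the group A2's choice allows.
        best_response = max(groups[s], key=lambda k: rewards[k][0])
        if best_response == s:
            return s  # A1 confirms A2's pick: equilibrium found
    return None

print(forward_search(rewards, groups))  # -> 7, as derived below
```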

We apply these steps of the analysis to our data. The scenario with the highest expected rewards for agent A2 is scenario 9. In scenario 9, agent A2 chooses its share coefficients to be zero, which leads to non-cooperative behavior of agent A3 in all three periods. The only other feasible scenarios that describe non-cooperative behavior by agent A3 in all three periods are scenarios 10 and 12. Consequentially, we consider the three scenarios 9, 10 and 12 when determining agent A1's response. In general, the relevant group of scenarios can be found by identifying the scenarios that have the same share coefficient values for agent A2.

Given agent A2's share coefficients, agent A1 can respond with three different share coefficient choices: b_{1,1...3} = 0.1667, b_{1,1...3} = 0.1291 and b_{1,1...3} = 0, corresponding to scenarios 9, 10 and 12. To determine which scenario, and thus which share coefficient, agent A1 will choose, the expected rewards of agent A1 are compared. Agent A1 will choose the highest expected reward.

Table 5 compares the expected rewards of agent A1 for the three scenarios for all initial conditions. The highest expected reward for each initial condition is marked with an asterisk.

For initial conditions 4–8, agent A1's best response is scenario 9 with b_{1,1...3} = 0.1\overline{6}. For these initial conditions, the subgame perfect Nash equilibrium is found, since scenario 9 was agent A2's highest reward scenario. Consequentially, agents choose incentives and take actions according to scenario 9 for initial conditions 4–8.

For initial conditions 1–3, agent A1's response does not provide agent A2 with the highest expected reward, and the equilibrium for these initial conditions is not yet found. We continue the forward analysis with agent A2's next best option, scenario 7.

In scenario 7, agent A2 chooses b_{2,1...2} = 0 and b_{2,3} = 0.\overline{3}, which would result in non-cooperative behavior by agent A3 in periods 1 and 2, and cooperative behavior in period 3. Scenario 7 is the only remaining scenario describing this behavior; however, one more scenario needs to be considered: agent A1 could threaten to respond with a share coefficient of zero, which would lead to scenario 12. A comparison of the expected rewards for agent A1 in scenarios 7 and 12 (Table 6) shows that this threat is not credible, since agent A1 receives a higher expected reward in scenario 7. Consequentially, scenario 7 represents the solution for initial conditions 1–3.


Table 6
Expected rewards E[r^{A1}_{final(1)} | ·] of agent A1 for scenarios 7 and 12.

Initial condition    Scenario 7    Scenario 12
1                    976.7         962.2
2                    961.1         943.5
3                    946.3         925.8



Table 7 summarizes the results by showing the expected rewards for agents A1 and A2, the agents' optimal actions as pictograms, and the corresponding share coefficients for each initial condition.

Table 7
Result summary.

We next discuss these results in the context of our example. For initial conditions 1–3, the manager offers 16.67% of his reward to his subordinate, the maintenance supervisor, which ensures cooperative behavior, i.e., assignment of a light workload. The supervisor does not pass on an incentive to her subordinate worker in periods 1 and 2, resulting in a superficial inspection. In period 3, the supervisor gives 33.33% of her reward to the worker, which motivates a thorough inspection. For initial conditions 4–8, the supervisor does not provide an incentive to the worker in the third period. All other incentives and actions are the same as for initial conditions 1–3.

The results are sensitive to the data, and small variations can lead to different results. For example, by increasing reward q^{A3}_{1,t} from −1 to −0.1 for t = 1–3, scenario 1 would be the solution for all initial conditions. In the opposite direction, given a too low reward difference q^{A1}_{1,t} − q^{A1}_{2,t} for agent A1, no share coefficient would exist that is feasible and incentivizes cooperative behavior. Scenario 12 (no cooperation) would be the result.

6. Discussion and conclusions

We developed a multiscale decision-making model that integrates the organizational and temporal scales into one unified model. Using the model and its solution procedure, optimal decisions and incentives were derived in the form of analytic equations.

We modeled and analyzed the non-cooperative interactions between self-interested agents using stochastic game theory. We coupled the game-theoretic model with multi-time-scale, hierarchical Markov decision processes to derive optimal decision strategies. To model the hierarchical interactions, we considered that, in the bottom-up direction, lower-level states influence the transition probabilities of higher-level states. In the top-down direction, incentives were used to align the goals of lower-level agents with the goals of the highest-level agent.

We outlined a general six-step solution procedure for X-agent, T-period models. We analyzed a specific 3-agent, 3-period model and provided a service enterprise example with data to illustrate the analysis and results.

For more than three agents and three periods, the six-step solution procedure remains valid, though each step will require further research to be readily applicable to many-agent, many-period problems. In particular, future research should develop procedures that can automatically generate all possible decision and incentive scenarios in step 2; one possible starting point is sketched below. Not only does each step need to be generalized for the X-agent, T-period case, but an integrated algorithm should also be developed that can automatically and efficiently compute the solution. This algorithm would also provide insights on the computational scalability of the model with respect to the number of agents and periods.
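One plausible starting point, offered here only as an assumption about how step 2 might be automated and not as a procedure from this paper, is to represent a raw scenario as a per-period cooperation pattern for each incentive-receiving agent, enumerate all such patterns, and filter afterwards for feasibility:

```python
from itertools import product

# Enumerate raw incentive scenarios for step 2: for each of the two
# subordinate agents, a tuple states whether it behaves cooperatively
# in each of the T periods. Feasibility and dominance checks, as in
# Section 5, would then prune this set (the example of Section 5
# works with 12 scenarios).
T = 3
raw_scenarios = list(product(product((True, False), repeat=T), repeat=2))
print(len(raw_scenarios))  # 2^3 * 2^3 = 64 raw candidates
```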

The paper makes a number of important contributions. It advances multiscale decision theory by combining multi-level and multi-period methods into a unified model. Prior models focused on either the multi-organizational-scale or the multi-temporal-scale aspect. Similarly, models outside of multiscale decision theory have been limited to a single multiscale dimension. To the best of our knowledge, no model has been published that can analytically derive decision strategies for multi-organizational and multi-temporal scale problems.

Another notable contribution is that the results are in the form of analytic equations that can be determined independently of the data. As expected, the equations quickly become large, but still prove to be effective when applied to data. In general, equation-based solutions are useful as they allow for sensitivity analyses with little computational effort. Multiscale decision-making models are an alternative to simulation or mathematical programming methods for solving optimization problems in complex systems. The goal of future research is to provide an algorithmic solution approach for the X-agent, T-period problem.

Acknowledgements

This research has been funded in part by NSF grants DMI-0122173, IIS-0325168, DMI-0330171 and CMMI-0944670.

Appendix A

In Theorems 1 and 2, we showed how we derived the cooperation conditions for the final decision epoch N − 1. As the backward induction moves to earlier periods, the equations become more complex. Here, we show the analytic equations of the cooperation conditions for period t = N − 2.

Similar to Theorem 1, we determine the condition under which agent A2 (weakly) prefers the cooperative action a^{A2}_{1,N-2} over the non-cooperative action a^{A2}_{2,N-2}. Mathematically, this condition can be expressed by

E\left[r^{A2}_{final,N-2} \mid s^{A1}_{g,N-2}, s^{A2}_{h,N-2}, s^{A3}_{i,N-2}, a^{A1}_{1,N-2}, a^{A2}_{1,N-2}, a^{A3}_{o,N-2}\right] \geq E\left[r^{A2}_{final,N-2} \mid s^{A1}_{g,N-2}, s^{A2}_{h,N-2}, s^{A3}_{i,N-2}, a^{A1}_{1,N-2}, a^{A2}_{2,N-2}, a^{A3}_{o,N-2}\right].   (38)

Solving (38), we notice again that the decision of agent A3 (index o) does not play a role, as the related terms cancel each other out. Solving for agent A1's share coefficient, we get


b_{1,N-2} \geq \frac{(1 - b_{2,N-1})\left(q^{A2}_{2,N-1} - q^{A2}_{1,N-1}\right)\left(a^{A2}_{1.1,N-1} - a^{A2}_{2.1,N-1}\right)}{2c_{1,N-2}(1 - b_{2,N-2})\left(q^{A1}_{1,N-2} - q^{A1}_{2,N-2}\right)}
 + \frac{\left(q^{A2}_{2,N-2} - q^{A2}_{1,N-2}\right)(1 - b_{2,N-2})}{2c_{1,N-2}\left(q^{A1}_{1,N-2} - q^{A1}_{2,N-2}\right)}
 - \frac{b_{1,N-1}(1 - b_{2,N-1})\left(q^{A1}_{1,N-1} - q^{A1}_{2,N-1}\right)}{(1 - b_{2,N-2})\left(q^{A1}_{1,N-2} - q^{A1}_{2,N-2}\right)} \left(a^{A1}_{1.1,N-1} - a^{A1}_{2.1,N-1} + \frac{c_{1,N-1}}{c_{1,N-2}}\left(a^{A2}_{1.1,N-1} - a^{A2}_{2.1,N-1}\right)\right).   (39)

The share coefficients on the right side of the inequality have to be replaced by the share coefficients of the decision scale horizon for step 4 of the solution procedure (Section 5.4). Given our example, b_{1,N-1} needs to be replaced by b_{1,1...3}, b_{2,N-2} by b_{2,1...2}, and b_{2,N-1} by b_{2,3}.

Moving on to agent A2's share coefficient b_{2,N-2}, we evaluate

E\left[r^{A3}_{final,N-2} \mid s^{A1}_{g,N-2}, s^{A2}_{h,N-2}, s^{A3}_{i,N-2}, a^{A1}_{1,N-2}, a^{A2}_{1,N-2}, a^{A3}_{1,N-2}\right] \geq E\left[r^{A3}_{final,N-2} \mid s^{A1}_{g,N-2}, s^{A2}_{h,N-2}, s^{A3}_{i,N-2}, a^{A1}_{1,N-2}, a^{A2}_{1,N-2}, a^{A3}_{2,N-2}\right],   (40)

which results in

b_{2,N-2} \geq \frac{1}{2c_{2,N-2}\left(3b_{1,N-2}c_{1,N-2}\left(q^{A1}_{1,N-2} - q^{A1}_{2,N-2}\right) - \left(q^{A2}_{2,N-2} - q^{A2}_{1,N-2}\right)\right)}
 \Big[ 2b_{2,N-1}\left(q^{A2}_{2,N-1} - q^{A2}_{1,N-1}\right)\left(c_{2,N-1}\left(a^{A3}_{1.1,N-1} - a^{A3}_{2.1,N-1}\right) + c_{2,N-2}\left(a^{A2}_{1.1,N-1} - a^{A2}_{2.1,N-1}\right)\right)
 + \left(q^{A3}_{2,N-1} - q^{A3}_{1,N-1}\right)\left(a^{A3}_{1.1,N-1} - a^{A3}_{2.1,N-1}\right) + \left(q^{A3}_{2,N-2} - q^{A3}_{1,N-2}\right)
 - 2b_{1,N-1}b_{2,N-1}c_{1,N-2}\left(q^{A1}_{1,N-1} - q^{A1}_{2,N-1}\right)\left(3c_{2,N-2}\left(a^{A1}_{1.1,N-1} - a^{A1}_{2.1,N-1}\right) + 2c_{2,N-2}\left(a^{A2}_{1.1,N-1} - a^{A2}_{2.1,N-1}\right) + 3c_{2,N-1}\left(a^{A3}_{1.1,N-1} - a^{A3}_{2.1,N-1}\right)\right) \Big].   (41)

For the next period in the backward induction, N − 3, the cooperation conditions for b_{1,N-3} and b_{2,N-3} continue to grow in size, as parameters and coefficients of all following periods play a role. It is no longer practical to represent them in analytical form. However, the equations can still be solved using mathematical software, and concrete results can be derived based on data.
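As an illustration of this last point, the sketch below substitutes purely hypothetical numbers (not the example data of Section 5) into the symbolic last-epoch bound (25):

```python
import sympy as sp

# Evaluate the symbolic bound (25) on b_{2,N-1} for concrete data.
# Symbols abbreviate the paper's coefficients; the substituted
# numbers are hypothetical placeholders.
b1, c1, c2 = sp.symbols('b1 c1 c2', positive=True)
q11, q12, q21, q22, q31, q32 = sp.symbols('q11 q12 q21 q22 q31 q32', real=True)

bound = (q32 - q31) / (2 * c2 * (q21 - q22 + 3 * b1 * c1 * (q11 - q12)))
print(bound.subs({q32: 2, q31: 1, q21: 3, q22: 1, q11: 4, q12: 1,
                  b1: sp.Rational(1, 6), c1: 2, c2: 1}))  # 1/10
```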

References

Barber, K.S., 2007. Multi-Scale Behavioral Modeling and Analysis Promoting a Fundamental Understanding of Agent-Based System Design and Operation. Final Technical Report (AFRL-IF-RS-TR-2007-58). <http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA465613&Location=U2&doc=GetTRDoc.pdf> (accessed 10.09.09).
Bard, J.F., 1998. Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic, Boston.
Bellman, R.E., 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.
Bowling, M., Veloso, M., 2000. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning. School of Computer Science, Carnegie Mellon University.
Brânzei, R., Fragnelli, V., Tijs, S., 2002. Tree-connected peer group situations and peer group games. Mathematical Methods of Operations Research 55 (1), 93–106.
Brown, G.W., 1951. Iterative solution of games by fictitious play. In: Koopmans, T.C. (Ed.), Activity Analysis of Production and Allocation. Wiley, New York.
Chang, H.S., Fard, P.J., Marcus, S.I., Shayman, M., 2003. Multitime scale Markov decision processes. IEEE Transactions on Automatic Control 48 (6), 976–987.
Cruz Jr., J.B., Simaan, M.A., Gacic, A., Jiang, H., Letellier, B., Li, M., Liu, Y., 2001. Game-theoretic modeling and control of a military air operation. IEEE Transactions on Aerospace and Electronic Systems 37 (4), 1393–1405.
Darabi, H., Mansouri, M., Andalibi, N., Para, E., 2010. A framework for decision making in extended enterprises: the FAA NextGen case. In: International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT). IEEE, pp. 662–667.
Deng, X., Papadimitriou, C.H., 1999. Decision-making by hierarchies of discordant agents. Mathematical Programming 86 (2), 417–431.
Dolgov, D., Durfee, E., 2004. Graphical models in local, asymmetric multi-agent Markov decision processes. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, vol. 2, pp. 956–963.
Filar, J.A., Vrieze, K., 1996. Competitive Markov Decision Processes. Springer, New York.
Geanakoplos, J., Milgrom, P., 1991. A theory of hierarchies based on limited managerial attention. Journal of the Japanese and International Economies 5 (3), 205–225.
Gfrerer, H., Zäpfel, G., 1995. Hierarchical model for production planning in the case of uncertain demand. European Journal of Operational Research 86 (1), 142–161.
Goseva-Popstojanova, K., Trivedi, K.S., 2000. Stochastic modeling formalisms for dependability, performance and performability. In: Haring, G., Lindemann, C., Reiser, M. (Eds.), Performance Evaluation – Origins and Directions. Springer, New York.
Groves, T., 1973. Incentives in teams. Econometrica 41 (4), 617–631.
Haimes, Y.Y., 1981. Hierarchical holographic modeling. IEEE Transactions on Systems, Man and Cybernetics 11 (9), 606–617.
Haimes, Y.Y., Tarvainen, K., Shima, T., Thadathil, J., 1990. Hierarchical Multiobjective Analysis of Large-scale Systems. Hemisphere Pub. Corp., New York.
Haimes, Y.Y., Lambert, J., Duan, L., Schooff, R., Tulsiani, V., 1995. Hierarchical holographic modeling for risk identification in complex systems. IEEE International Conference on Systems, Man and Cybernetics, 1027–1032.
Hauskrecht, M., Meuleau, N., Kaelbling, L.P., Dean, T., Boutilier, C., 1998. Hierarchical solution of Markov decision processes using macro-actions. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. University of Wisconsin Business School, Madison, WI, pp. 220–229.
Hax, A.C., Meal, H.C., 1975. Hierarchical integration of production planning and scheduling. In: Geisler, M.A. (Ed.), Studies in the Management Sciences. North Holland, Amsterdam.
Heinrich, C.E., Schneeweiss, C., 1986. Multi-stage lot-sizing for general production systems. In: Axsäter, S., Schneeweiss, C., Silver, E. (Eds.), Multistage Production Planning and Inventory Control. Lecture Notes in Economics and Mathematical Systems, vol. 266. Springer, Berlin.
Howard, R.A., 1960. Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
Hu, J., Wellman, M.P., 1998. Multiagent reinforcement learning: theoretical framework and an algorithm. In: Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, pp. 242–250.
Jacobson, M., Shimkin, N., Shwartz, A., 2003. Markov decision processes with slow scale periodic decisions. Mathematics of Operations Research 28 (4), 777–800.
Krothapalli, N.K.C., Deshmukh, A., 1999. Design of negotiation protocols for multi-agent manufacturing systems. International Journal of Production Research 37 (7), 1601–1624.
Laffont, J.J., 1990. Analysis of hidden gaming in a three-level hierarchy. Journal of Law, Economics, and Organization 6 (2), 301–324.
Littman, M.L., 1994. Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on Machine Learning. Morgan Kaufmann, Rutgers University, New Brunswick, NJ, pp. 157–163.
Marschak, J., Radner, R., 1972. Economic Theory of Teams. Yale University Press, New Haven.
Mertens, J.F., 1992. Stochastic games. In: Aumann, R.J., Hart, S. (Eds.), Handbook of Game Theory with Economic Applications. North-Holland, Amsterdam.
Mesarovic, M.D., Macko, D., Takahara, Y., 1970. Theory of Hierarchical, Multilevel, Systems. Academic Press, New York.
Middelkoop, T., Deshmukh, A., 1999. Caution! Agent based systems in operation. International Journal of Complex Systems, 256.
Monostori, L., Váncza, J., Kumara, S., 2006. Agent-based systems for manufacturing. CIRP Annals – Manufacturing Technology 55 (2), 697–720.
Muppala, J.K., Malhotra, M., Trivedi, K.S., 1996. Markov dependability models of complex systems: analysis techniques. In: Özekici, S. (Ed.), Reliability and Maintenance of Complex Systems. Springer, Berlin, Germany.
Nie, P., Chen, L., Fukushima, M., 2006. Dynamic programming approach to discrete time dynamic feedback Stackelberg games with independent and dependent followers. European Journal of Operational Research 169 (1), 310–328.
Okada, N., Mikami, Y., 1992. A game-theoretic approach to acid rain abatement: conflict analysis of environmental load allocation. Journal of the American Water Resources Association 28 (1), 155–162.
Özdamar, L., Bozyel, M.A., Birbil, S.I., 1998. A hierarchical decision support system for production planning (with case study). European Journal of Operational Research 104 (3), 403–422.
Parr, R.E., 1998. Hierarchical Control and Learning for Markov Decision Processes. PhD Thesis, University of California, Berkeley.
Pollatschek, M.A., Avi-Itzhak, B., 1969. Algorithms for stochastic games with geometrical interpretation. Management Science 15 (7), 399–415.
Puterman, M.L., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
Raghavan, T.E.S., Filar, J.A., 1991. Algorithms for stochastic games – a survey. Mathematical Methods of Operations Research (ZOR) 35 (6), 437–472.
Robinson, J., 1951. An iterative method of solving a game. The Annals of Mathematics 54 (2), 296–301.
Schneeweiss, C., 1995. Hierarchical structures in organizations: a conceptual framework. European Journal of Operational Research 86 (1), 4–31.
Schneeweiss, C., 2003a. Distributed Decision Making. Springer, Berlin.
Schneeweiss, C., 2003b. Distributed decision making – a unified approach. European Journal of Operational Research 150 (2), 237–252.
Schneeweiss, C., Zimmer, K., 2004. Hierarchical coordination mechanisms within the supply chain. European Journal of Operational Research 153 (3), 687–703.
Sethi, S.P., Zhang, Q., 1994. Hierarchical Decision Making in Stochastic Manufacturing Systems. Birkhäuser Verlag, Basel, Switzerland.
Shapley, L.S., 1953. Stochastic games. Proceedings of the National Academy of Sciences 39 (10), 1095–1100.
Stackelberg, H.v., 1952. The Theory of the Market Economy. Oxford University Press, New York.
Stadtler, H., 2005. Supply chain management and advanced planning – basics, overview and challenges. European Journal of Operational Research 163 (3), 575–588.
Sutton, R.S., 1995. TD models: modeling the world at a mixture of time scales. In: Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, pp. 531–539.
Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
Sutton, R.S., Precup, D., Singh, S., 1999. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112 (1–2), 181–211.
Vetschera, R., 2000. A multi-criteria agency model with incomplete preference information. European Journal of Operational Research 126 (1), 152–165.
Vrieze, O.J., 1987. Stochastic Games with Finite State and Action Spaces. CWI Tracts, Amsterdam.
Weiss, G., 1999. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, MA.
Wernz, C., 2008. Multiscale Decision Making: Bridging Temporal and Organizational Scales in Hierarchical Systems. Ph.D. Dissertation, Mechanical and Industrial Engineering, University of Massachusetts Amherst.
Wernz, C., Deshmukh, A., 2007a. Decision strategies and design of agent interactions in hierarchical manufacturing systems. Journal of Manufacturing Systems 26 (2), 135–143.
Wernz, C., Deshmukh, A., 2007b. Managing hierarchies in a flat world. In: Proceedings of the 2007 Industrial Engineering Research Conference, Nashville, TN, pp. 1266–1271.
Wernz, C., Deshmukh, A., 2009. An incentive-based, multi-period decision model for hierarchical systems. In: Proceedings of the 3rd Annual Conference of the Indian Subcontinent Decision Sciences Institute Region (ISDSI), Hyderabad, India.
Wernz, C., Deshmukh, A., 2010a. Multi-time-scale decision making for strategic agent interactions. In: Proceedings of the 2010 Industrial Engineering Research Conference, Cancun, Mexico.
Wernz, C., Deshmukh, A., 2010b. Multiscale decision-making: bridging organizational scales in systems with distributed decision makers. European Journal of Operational Research 202 (3), 828–840.
Wernz, C., Henry, A., 2009. Multilevel coordination and decision-making in service operations. Service Science 1 (4), 270–283.
Zachrisson, L.E., 1964. Markov games. In: Advances in Game Theory. Annals of Mathematics Studies 52, 211–253.

