
1608 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 5, OCTOBER 2012

Low-Complexity Scheduling Algorithms for Multichannel Downlink Wireless Networks

Shreeshankar Bodas, Sanjay Shakkottai, Member, IEEE, Lei Ying, Member, IEEE, and R. Srikant, Fellow, IEEE

Abstract—This paper considers the problem of designing scheduling algorithms for multichannel (e.g., OFDM-based) wireless downlink networks, with a large number of users and proportionally large bandwidth. For this system, while the classical MaxWeight algorithm is known to be throughput-optimal, its buffer-overflow performance is very poor (formally, it is shown that it has zero rate function in our setting). To address this, a class of algorithms called iterated Heaviest matching with Longest Queues First (iHLQF) is proposed. The algorithms in this class are shown to be throughput-optimal for a general class of arrival/channel processes, and also rate-function-optimal (i.e., exponentially small buffer overflow probability) for certain arrival/channel processes. iHLQF, however, has higher complexity than MaxWeight ( versus , respectively). To overcome this issue, a new algorithm called Server-Side Greedy (SSG) is proposed. It is shown that SSG is throughput-optimal, results in a much better per-user buffer overflow performance than the MaxWeight algorithm (positive rate function for certain arrival/channel processes), and has a computational complexity that is comparable to the MaxWeight algorithm. Thus, it provides a nice tradeoff between buffer-overflow performance and computational complexity. These results are validated by both analysis and simulations.

Index Terms—Large deviations, low complexity, scheduling algorithms, small buffer.

Manuscript received May 18, 2010; revised December 31, 2010; July 25, 2011; and December 25, 2011; accepted January 04, 2012; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor S. Borst. Date of publication February 29, 2012; date of current version October 11, 2012. This work was supported in part by the NSF under Grants CNS-0721286, CNS-0347400, CNS-0721380, 09-53165, and HDTRA1-09-1-005 and the DARPA ITMANET program. A shorter version of this paper appeared in the Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), San Diego, CA, March 15–19, 2010.

S. Bodas was with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA. He is now with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

S. Shakkottai is with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: [email protected]).

L. Ying is with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011 USA (e-mail: [email protected]).

R. Srikant is with the Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNET.2012.2185709

1063-6692/$31.00 © 2012 IEEE

I. INTRODUCTION

THE EMERGENCE of 4G orthogonal frequency-division multiplexing (OFDM)-based wireless systems (e.g., WiMAX [2] and LTE [3]) has led to an increasing interest in the design of scheduling algorithms for the downlink of wireless networks. In these systems, the downlink bandwidth is partitioned into several (tens to hundreds) parallel channels. Scheduling decisions are made every time-slot (2–5 ms), where in each time-slot, each of the channels is allocated to a potentially different user. The data rate that a channel can support is time-slot-, user- and channel-dependent. This scenario translates into a multiuser multiserver system, where a given server (channel) can serve only one user per time-slot. For typical values of these parameters (number of channels, data rates, etc.), please see [4, Section 3].

From a networking perspective, the goal is to allocate resources in this multichannel system to satisfy certain objectives. The fundamental objective is to design scheduling algorithms that provide the maximum network throughput. It is well known that the MaxWeight algorithm ([5]; also see Definition 2) is throughput-optimal for this system. The MaxWeight algorithm has received considerable attention from the researchers, and has been analyzed in a variety of scenarios such as the large queues [6]–[8] or the heavy-load [9]–[11] regime.

However, performance (e.g., small per-user queues) is equally important in order to ensure small user-perceived delays. In this paper, we develop low-complexity and throughput-optimal algorithms that also provide good performance. A key outcome of our study is that the algorithm design insights from a buffer-overflow performance viewpoint are different from the design insights that follow from a throughput-optimality viewpoint, as discussed in the following section.

Fig. 1. System model.

A. Main Contributions

To put this section in context, we briefly describe the system model. For more details, please see Section II. We consider a multiqueue, multiserver queuing system as shown in Fig. 1. There are external packet arrivals to the queues, and the channels connecting the queues to the servers support time-varying data-rates. As in an OFDM system, a given server (frequency band) can be allocated to serve only one queue in a given time-slot. The goal is to design a scheduling rule for allocating the servers to the queues that, in addition to throughput optimality, also guarantees small per-user queues. We look at this “small-queues” or “small-buffer at the base-station” problem from a large deviations perspective, where the number of users in the system and the available bandwidth is large. In [4], we have shown that a class of algorithms called iterated Longest Queues First (iLQF), under certain technical conditions, is rate-function-optimal for the small buffer overflow event. However, the results regarding iLQF were derived assuming symmetric, i.i.d., ON–OFF traffic and i.i.d. ON–OFF channels. In particular, for more general arrival and channel processes, a number of fundamental questions were left unanswered, such as the following.

1) Are the algorithms in the iLQF class throughput-optimal for the system?

2) The well-known MaxWeight algorithm [5] is throughput-optimal for the system. What is its performance for the small buffer overflow problem?

3) The iLQF-class algorithms typically have a higher computational complexity than the MaxWeight algorithm. Can we design an algorithm with lower complexity without compromising either throughput optimality or small-queue performance?

In this paper, we show that the answers to these questions are yes, poor, and yes, respectively. The following is a summary of our main contributions in this paper.

• We show that a generalization of the iLQF-class rules, namely iHLQF, is throughput-optimal for very general arrival and channel process models (Section III).

• We show that the classical MaxWeight rule is very poor at keeping the per-user queues small (Section IV). Formally, we show that this rule results in a zero rate function (to be defined) for the small buffer overflow event defined in Section II.

• We propose a new scheduling algorithm called the Server-Side Greedy (SSG) service rule, which is an iterative version of the MaxWeight rule, where the queue lengths are updated after each server (OFDM subchannel) finishes its service. We show that this rule is throughput-optimal under general arrival and channel processes, results in a strictly positive rate function for the small buffer overflow event (implying small per-user queues), and has complexity comparable to that of the MaxWeight rule and much less than the iLQF-class rules (Section V).

An important design insight that emerges from the above results is the following: For throughput optimality, the MaxWeight algorithm argues that the scheduling algorithm should maximize the sum of channel-rate-weighted queue lengths at each time-slot. However, from a small-queue performance viewpoint, our results indicate that as long as we are “close” to the maximum weighted sum, the scheduling algorithm’s objective should shift to equalizing the queues, and that this allocation of channel resources should proceed in an iterative manner.

B. Related Work

Scheduling for multiuser wireless networks is a well-investigated problem [12]–[15]. Researchers have analyzed this problem from a variety of angles: heavy traffic limits [9]–[11], tail probabilities of queue lengths [6]–[8], and energy-delay tradeoffs for wireless downlink [16]. Another line of inquiry investigates the problem of delay or packet-loss in input-queued switches [17], [18].

These results provide very useful guidelines for designing scheduling algorithms, but strictly speaking, a majority of the earlier results are valid only in the large queues regime, i.e., as the queue lengths tend to infinity. In a recent paper [19], a model similar to our model was considered, and optimality results were derived for the two-user case. To the best of our knowledge, the first paper to consider the small buffer overflow problem in a large deviations setting was [4], where we considered a restricted version of the model in this paper and derived rate function optimality results in the many-user, small-queues regime.

TABLE I. NOTATION

II. SYSTEM MODEL

We consider a multiqueue, multiserver, discrete-time queuing system as shown in Fig. 1. Table I summarizes the notation used in this paper. The system has queues and servers, connected by time-varying channels that can potentially change from time-slot to time-slot. Here, is a fixed constant, independent of .¹ This range of values of is of interest because in typical OFDM-based wireless downlink systems, the number of orthogonal channels is much larger than the number of active users. If no confusion is possible, we denote by . We assume that take values in the set of nonnegative integers. We make the following assumptions regarding the arrival and channel processes.

¹ We ignore the issues involving not being an integer. The reason is that in the limit as can be replaced by with cosmetic changes to the proofs. For finite , the reader may consider a sequence that approaches such that is an integer for all , and a system with servers. Hereafter in this paper, we do not discuss this issue.

Assumption 1 (Motivated by [20]):

Channel State Process:

1) Let denote the set of possible channel states. The channel state process has a stationary distribution , and for all .

2) Let denote the channel state in time-slot . Given , there exists a positive integer such that for all , all , and all , we have

3) There exists such that for all .

Arrival Process:

1) The arrivals to each queue form a stationary process, with mean .

2) The given set of arrival rates lies inside the throughput region of the system, i.e., there exists a static service split rule that can stabilize the system under the given arrival process.

3) Given , there exists a positive integer such that for all , and for all , we have

4) The arrival process satisfies, for all

Our aim is to design a service policy for this system that meets certain performance metrics that will be subsequently defined. A service policy is essentially a rule that allocates the servers to the queues in every time-slot by defining the random variables , where

 if is allocated to serve in time-slot
 otherwise.

If no confusion is possible, we denote by . We impose the condition that in a given time-slot, a given server can be allocated to serve at most one queue. This condition translates to the following: For all and all , any valid service policy must obey . The individual queues in the system evolve according to the following equation: For

Each queue stores the incoming packets in a buffer of infinite size, so that no packets are ever dropped. Our objective is to design a service rule that allocates the servers to the queues in each time-slot based on the information of the entire history of queue lengths, arrival and channel realizations and allocation decisions, and also the arrivals and channel realizations in the current time-slot, and any amount of external randomness (if necessary).

Our first goal is to identify policies other than MaxWeight-type policies that are throughput-optimal under Assumption 1; the reason being that while throughput optimality (or system stability) is of prime importance, we want to also guarantee small per-user delay under “well-behaved” arrival and channel processes. The very general conditions of Assumption 1 allow for a wide class of arrival and channel processes with correlations across users or channels, bursty arrivals with unbounded supports, etc., and make the delay (or queue size) analysis intractable. Thus, we study the queue-length performance of different algorithms under the following more restrictive set of assumptions, with the hope of developing general principles for algorithm design for the wireless downlink systems.

Assumption 2: The number of packet arrivals to queue in time-slot is the random variable , where

 with probability
 with probability

where is an integer with . In time-slot , the server can potentially serve packets from , where are modeled as Bernoulli random variables with

 with probability
 with probability

with . All the random variables and are assumed to be mutually independent for all possible values of the involved parameters.

Note that Assumption 2 is subsumed by Assumption 1. Our goal is to design policies that, under Assumption 2 and for every integer , result in a strictly positive value of

Here, is the length of at the end of time-slot 0. The probability measure in the above expression is the stationary measure of the queue-length process (it may be helpful to imagine that the queuing system is started at time ).

Furthermore, we require that the complexity of the policy must be computations per time-slot. The reason we choose this benchmark is that, as we later show, the MaxWeight algorithm has a complexity computations per time-slot.

We refer to the event as the small buffer overflow event or simply the overflow event. The probability term in the above expression can thus be thought of as the probability of the overflow event under the stationary distribution of the queue-length process (provided one exists).

The function is called the rate function in the large deviations theory. If a scheduling algorithm results in a positive value of , then for large values of , we have . Thus, the probability of the small buffer overflow event decays to zero rapidly with , and it is desirable to have as large a value of as possible.

To summarize, our objective is to design an algorithm that is throughput-optimal under Assumption 1, results in a strictly positive value of the rate function under Assumption 2, and has a complexity of computations per time-slot. We conclude this section with a simple result that relates the system parameters to stability under any algorithm.

Lemma 1: Under Assumption 2, if , then the system is unstable under any scheduling algorithm. If , then there exists a constant such that for all , the system is stable under some algorithm.

Proof: Please see Appendix A.
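The per-slot dynamics of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The queue-evolution equation is not legible in this copy, so the update below assumes the common form in which arrivals are added first, the allocated service is then subtracted, and the result is truncated at zero. The function names `sample_slot` and `step`, and the parameter names `n`, `m`, `p`, `burst`, and `q` (number of queues, number of servers, arrival probability, arrival batch size, and channel ON probability in the spirit of Assumption 2) are hypothetical labels chosen for this example.

```python
import random


def sample_slot(n, m, p, burst, q, rng):
    """Draw one slot of arrivals and ON-OFF channel rates (assumed model).

    Each queue receives `burst` packets with probability p, else none.
    Each (queue, server) channel has rate 1 with probability q, else 0.
    """
    arrivals = [burst if rng.random() < p else 0 for _ in range(n)]
    rates = [[1 if rng.random() < q else 0 for _ in range(m)] for _ in range(n)]
    return arrivals, rates


def step(queues, arrivals, rates, alloc):
    """Advance every queue by one time-slot.

    queues[i]   : length of queue i at the end of the previous slot
    arrivals[i] : packets arriving to queue i in this slot
    rates[i][j] : packets server j could serve from queue i in this slot
    alloc[j]    : index of the queue served by server j, or None if idle
    """
    served = [0] * len(queues)
    for j, i in enumerate(alloc):
        if i is not None:
            served[i] += rates[i][j]
    # Assumed update: add arrivals, subtract service, never go below zero.
    return [max(q_len + a - s, 0) for q_len, a, s in zip(queues, arrivals, served)]


if __name__ == "__main__":
    rng = random.Random(0)
    arrivals, rates = sample_slot(n=3, m=4, p=0.5, burst=2, q=0.7, rng=rng)
    # A deliberately naive policy: every server serves queue 0.
    print(step([0, 0, 0], arrivals, rates, alloc=[0, 0, 0, 0]))
```

Storing the allocation as one queue index per server means the constraint that a server serves at most one queue per slot holds by construction, so `step` does not need to validate it.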

III. iHLQF CLASS OF ALGORITHMS

In this section, we present a class of scheduling rules called iterated Heaviest matching with Longest Queues First (iHLQF). We show that any algorithm in this class is throughput-optimal (Theorem 1) and relate the iLQF with PullUp algorithm in [4] to the iHLQF class.

We consider a class of algorithms called iHLQF, which is a generalization of the iLQF algorithms presented in [4]. The iLQF class of algorithms is defined for systems with ON–OFF channels. A particular algorithm in the iLQF-class, called iLQF with PullUp, was shown to be rate-function-optimal for the small buffer overflow event under Assumption 2 with [4, Theorem 4] and was shown to have a positive rate function for all [4, Corollary 2]. Here, we present its generalization (namely, iHLQF) to multirate channels (i.e., can take values other than 0 and 1). In every time-slot, the iHLQF rule proceeds in multiple rounds of server allocation as explained below.

Definition 1 (The iHLQF Rule): The iHLQF rule allocates servers to queues in time-slot according to the following procedure.

Input:

1) The queue lengths for .

2) The arrival vector for .

3) The channel rates for .

Steps:

1) Update the queue-length vector to account for the arrivals. Compute for all . Initialize and for .

2) For all , define

and

For any server and all , if , then redefine , and use this updated value of throughout the rest of the time-slot . Let denote the length of the longest queue(s) immediately after arrivals. Throughout the description of this algorithm, let denote the set of unallocated servers.

3) In round , define a bipartite graph , where the set of nodes represents the set of queues of length (i.e., ), the set of nodes represents the servers in , and is the set of weighted edges. The edge between nodes representing queue and server has a weight equal to . Find a maximum weight matching in the graph , breaking ties arbitrarily. Allocate the servers to the queues according to the matching , update the queue lengths and remove the used servers from further consideration. In particular, if is allocated , then define , remove the server from , and define .

4) If , then define and stop. Else, decrement by 1. If , then define and stop. Else, increment by 1, and go to Step 3.

Output:

1) The server allocations, for .

2) The queue lengths for .
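The round structure of Definition 1 can be illustrated with a small Python sketch. It is one possible reading of the procedure, not the authors' implementation, and it rests on several assumptions: the rate-threshold redefinition of Step 2 is omitted because its formula is not legible in this copy; each round considers the queues whose current length equals the current level, and the level drops by one per round; a served queue loses as many packets as the channel rate, truncated at zero. The names `max_weight_matching` and `ihlqf_slot` are hypothetical, and the matching is found by brute force, which is only practical for tiny instances.

```python
def max_weight_matching(queue_ids, server_ids, rates):
    """Brute-force maximum-weight bipartite matching (exponential time).

    Returns (weight, {server: queue}). Only edges with a positive rate are used.
    """
    if not queue_ids:
        return 0, {}
    i, rest = queue_ids[0], queue_ids[1:]
    # Option 1: leave queue i unmatched.
    best_w, best_m = max_weight_matching(rest, server_ids, rates)
    # Option 2: match queue i to some server j with a usable channel.
    for j in server_ids:
        if rates[i][j] > 0:
            remaining = [s for s in server_ids if s != j]
            w, m = max_weight_matching(rest, remaining, rates)
            if w + rates[i][j] > best_w:
                best_w, best_m = w + rates[i][j], {**m, j: i}
    return best_w, best_m


def ihlqf_slot(queues, arrivals, rates):
    """One time-slot of an iHLQF-style rule (sketch; Step 2 threshold omitted)."""
    lengths = [q + a for q, a in zip(queues, arrivals)]  # account for arrivals
    free = list(range(len(rates[0])))                     # unallocated servers
    alloc = {}                                            # server -> queue
    level = max(lengths, default=0)                       # longest queue length
    while free and level > 0:
        longest = [i for i, q in enumerate(lengths) if q == level]
        _, matching = max_weight_matching(longest, free, rates)
        for j, i in matching.items():
            alloc[j] = i
            lengths[i] = max(lengths[i] - rates[i][j], 0)
            free.remove(j)
        level -= 1
    return alloc, lengths


if __name__ == "__main__":
    # Two servers, three queues with post-arrival lengths (3, 3, 1).
    print(ihlqf_slot([3, 2, 1], [0, 1, 0], [[2, 1], [1, 2], [1, 1]]))
    # -> ({1: 1, 0: 0}, [1, 1, 1])
```

Because ties between equally heavy matchings are broken by enumeration order here, this sketch is a single member of the class; a different tie-breaking order would give another valid member.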

Here is a description of the algorithm in words. In every time-slot, run multiple rounds of server allocation. In each round, choose the heaviest (edge-weight) matching between the set of longest queues and available servers, breaking ties between multiple heaviest matchings arbitrarily. Here, the weight of an edge is the corresponding channel rate. Allocate servers to queues according to the matching, update the queue lengths, remove the allocated servers from further consideration, and proceed to the next round. When choosing the heaviest matching, the iHLQF rule allocates a server to a queue only if its channel to that queue has a high enough rate, as evident from Step 2 in the definition.

Note that iHLQF is a class of rules and not a single rule, as a result of the arbitrary tie-breaking between largest weight matchings. We establish a crucial property of the iHLQF-class rules that is useful in proving their throughput optimality. First, we present some terminology. Fix any time-slot . At the beginning of the time-slot, let the state of the system be denoted by the queue lengths , channel realizations , and arrivals . Let

be called the weight contributed by server to a schedule. Note that at most one term in the above summation is nonzero since for at most one value of . Let the weight of a schedule in time-slot (for the given state) be defined as

that is, the sum of the weight-contributions of the individual servers. Note that depends upon the rule , although we do not show the explicit dependence for simplicity of notation. We now establish that given the same queue lengths, the same number of arrivals (to each queue), and the same channel realizations, the weight of the schedule selected by the MaxWeight rule is at most an additive constant more than that under any iHLQF-class rule, where this constant depends upon the system size and the maximum channel rate , but is independent of the queue lengths.

Lemma 2: Fix any iHLQF-class rule . Fix any time-slot and the state of the system. Let the length of the queue immediately after arrivals be . Let be as defined in Step 2 in the definition of the iHLQF-class rules (Definition 1). Then, .

Proof: Please see Appendix B.

Theorem 1 (Throughput Optimality of iHLQF): Under Assumption 1 on the arrival and channel processes, any iHLQF-class rule makes the system stable in the mean, i.e.,

. In addition, if the arrival and channel state processes are such that the iHLQF rule makes the queuing system an aperiodic Markov chain with a single communicating class, then the stability in the mean implies that the Markov chain is positive recurrent [21].

Proof: In view of Lemma 2, this proof is identical to that of Theorem 5 presented in Section V and has been omitted to avoid repetition.

We now turn our attention to the iLQF with PullUp rule, introduced in [4]. This rule is an iHLQF-class rule for the system under Assumption 2. The iLQF with PullUp rule employs a particular form of tie-breaking between the heaviest matchings, and in its original form, it terminates after the round when it cannot find a matching that serves all the queues under consideration (queues of length ). This stopping rule ensures that the complexity of the rule is limited to computations per time-slot. If we instead allow the iLQF with PullUp rule to terminate in the same way as an iHLQF-class rule (namely, when no more servers are left or when all the nonempty queues have been considered for allocation), then it retains its rate-function optimality for the system under Assumption 2 with , yields a strictly positive rate function for all , and is also throughput-optimal for the system under Assumption 1 with Bernoulli (0–1) channels. We refer to this rule as the Modified iLQF with PullUp rule. The following theorem formally summarizes these properties.

Theorem 2 (Properties of Modified iLQF With PullUp): The Modified iLQF with PullUp belongs to the iHLQF class of rules, and the conclusions of Theorem 1 apply. The Modified iLQF with PullUp rule is rate-function-optimal for the system under Assumption 2 with and results in

for all integers . Furthermore, it yields a strictly positive rate function under Assumption 2 for all , and can be implemented in computations per time-slot.

Proof: For the proof of rate-function optimality for the case , we refer the reader to [4]. The proof for the case is similar to the case , as follows. Assume the result for the case and consider a system with queues and servers, i.e., a system with but with a larger “ .” Then, is the rate function for this system, and a system with fewer queues and servers can only have a larger rate function (formally, by the sample-path dominance property of the PullUp operation; see [4, Theorem 5]). The quantity is also a lower bound on the rate function: The proof is based on constructing an event that leads to overflow under any algorithm (the same construction as [4, Theorem 1]). Thus, the small buffer overflow event has the claimed rate function for all . The proof of computational complexity is elementary and has been omitted.

Thus, the answer to question 1 in Section I-A is “Yes, the iLQF rules (and their generalization, iHLQF, for multirate channels) are throughput-optimal for the system.”

IV. MAXWEIGHT RULE

We now analyze the MaxWeight algorithm from differentangles and show that it performs poorly for the multichannelwireless downlink setting under consideration. The classicMaxWeight rule [5] results in the following service allocationrule for our system, with a particular tie-breaking rule for thesake of concreteness.Definition 2 (The MaxWeight Rule, [5]): In every time-slot,

each server independently picks a queue that maximizes theproduct of queue length and channel rate, breaking ties in favorof the smallest queue index.This service allocation rule is throughput-optimal for the

system. However, as the next theorem shows, it yields a zerorate function for the small buffer overflow event, implying avery poor small-queue performance.

Theorem 3 (MaxWeight Gives Zero Rate Function): UnderAssumption 2 with , with the MaxWeight rule for allo-cating servers to queues, and for any given integer , thereexists an integer such that for all , we have

Consequently, for every integer , we have

Proof: Please see Appendix C.The result of Theorem 3 can be strengthened to the following.Theorem 4: Consider any function such

that . Then, under Assumption 2, with theMaxWeight rule for allocating the servers to the queues, and forany fixed integer

Proof: The result for the case follows from the proof of Theorem 3, which shows that under Assumption 2 with , the MaxWeight rule results in at least a constant probability for the small buffer overflow event, for large enough. The proof for the case is almost identical.

Thus, the answer to question 2 in Section I-A is “the MaxWeight algorithm is very inefficient at keeping the per-user queues small.” The main reason behind these negative results is that the MaxWeight rule potentially assigns all the available servers to serve the longest queue, essentially treating a slightly shorter queue as if it were empty. When a large number of servers are available, this results in draining the longest queue(s) much more than is warranted by good load-balancing, and also leads to the following situation: When the MaxWeight rule runs into a state when a significant fraction of the queues is long (note that such a state is reached infinitely often, almost surely, because the system is positive recurrent under MaxWeight), then it is very difficult to leave this state “quickly.” Therefore, the MaxWeight rule is not effective in keeping the queues really small.

Lemma 3 (Complexity of MaxWeight): Under Assumption 2, implementing the MaxWeight rule requires computations per time-slot.

Proof: Please see Appendix D.

Note that Lemma 3 holds under much weaker assumptions on the channel process. This motivates us to design a scheduling rule that, in addition to throughput optimality, also guarantees a good delay performance and has a computational complexity comparable to that of the MaxWeight rule.

V. SSG SCHEDULING RULE

In this section, we propose the SSG service rule for the problem stated in Section II. This service rule can be thought of as a recursive version of the MaxWeight rule, where the queue lengths are updated after each server finishes its service. We show that the SSG rule is throughput-optimal for the system (Theorem 5). Under Assumption 2, it results in a strictly positive value of the rate function, for every


Fig. 2. Example of the SSG rule.

integer (Theorems 6 and 7). It can be implemented in computations per time-slot (Theorem 8).

This (SSG) rule proceeds in multiple rounds of service allocation in every time-slot, explained as follows.

Definition 3 (The SSG Rule): The SSG rule allocates servers to queues in time-slot according to the following procedure.

Input:
1) The queue lengths for .
2) The arrival vector for .
3) The channel rates for .

Steps:
1) Update the queue length vector to account for the arrivals, i.e., compute for all . Initialize and for .
2) In the th round of service allocation, search for the queue index

breaking ties in favor of the smaller queue index. Allocate the server to serve thus found, i.e., define and for all . Update the length of so that

and the lengths of all other queues are left unchanged, i.e., for all .
3) If , then stop. Else, increment by 1, go to step 2.

Output:
1) The server allocations, for .
2) The queue lengths for .
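The procedure in Definition 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `ssg_allocate`, `queues`, and `rates` are hypothetical, the caller is assumed to have already added the arrivals (Step 1), the queue update is assumed to subtract the channel rate and floor at zero, and a server whose best product is zero is assumed to stay idle.

```python
def ssg_allocate(queues, rates):
    """One time-slot of SSG on post-arrival queue lengths.

    queues[i]   -- length of queue i after arrivals (hypothetical name)
    rates[i][j] -- channel rate from server j to queue i (hypothetical name)
    Returns (allocation per server, queue lengths after service).
    """
    q = list(queues)
    alloc = []
    for j in range(len(rates[0])):  # round j allocates server j
        # Products use the *current* lengths; ties go to the smaller index.
        best = max(range(len(q)), key=lambda i: (q[i] * rates[i][j], -i))
        # Assumption: no service is recorded when the best product is zero.
        served = rates[best][j] if q[best] * rates[best][j] > 0 else 0
        alloc.append(best if served else None)
        # Assumed update: subtract the offered service, never below zero.
        q[best] = max(q[best] - served, 0)
    return alloc, q


print(ssg_allocate([3, 3, 2], [[1, 1, 1]] * 3))  # ([0, 1, 0], [1, 2, 2])
```

On the same three-queue instance where every server would pick queue 0 from the initial lengths, the per-round update spreads the servers over queues 0 and 1.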

Unlike the MaxWeight rule, the SSG rule updates queue lengths after each server finishes its service. An example of the SSG rule is shown in Fig. 2. Now, we analyze the SSG service rule in detail. Our first aim is to establish the throughput optimality of the SSG rule. We first establish that in every time-slot, the weight of the service schedule selected by the SSG rule is at most an additive constant away from the maximum possible. That is, under the same queue lengths at the beginning of the time-slot, the same channel realizations and the same arrivals to the queues, the sum of the channel-rate-weighted queue lengths selected for service under the MaxWeight rule is at most an additive constant larger than that under the SSG rule.

Lemma 4: Fix any time-slot , and let the queue lengths immediately after arrivals in that time-slot be . Then, .

Proof: Please see Appendix E.

Theorem 5 (Throughput Optimality of SSG): Under Assumption 1 on the arrival and channel processes, the SSG rule makes the system stable in the mean, i.e., . In addition, if the arrival and channel state processes are such that the SSG rule makes the queuing system an aperiodic Markov chain with a single communicating class, then the stability in the mean implies that the Markov chain is positive recurrent [21].

Proof: Please see Appendix F.

Remark 1: Lemma 4 and Theorem 5 hold even if the SSG rule chooses an arbitrary tie-breaking rule in Step 2.

Next, we show that if two queuing systems have sample-path coupled arrivals and channels, both implement the SSG rule, and at the end of a time-slot, one system has queues that are respectively longer than the corresponding queues in the second system, then this property continues to hold for all the future time-slots.

Lemma 5 (Sample-Path Dominance): Under Assumption 2, consider two queuing systems and with queues and , such that at the end of some time-slot , we have for all . Both the systems have the same arrivals and channel realizations for all times, and in particular for the time-slot . Both implement the SSG rule. Then, .

Proof: Please see Appendix G.

As in the case of the proof of Theorem 3 in [4], this sample-path property is the key to obtaining rate function positivity results. The next technical lemma provides a sufficient condition for all the longest queues to be served by the SSG rule.

Lemma 6: Under Assumption 2, let the set of longest queues after arrivals be of cardinality . If in that time-slot, each one of the longest queues is connected to at least servers, then all the longest queues are served at least once under SSG.

Proof: Let be the set of longest queues. If is connected to server , then it is a longest queue connected to . Since is connected to at least servers, there are only other longest queues in the system, and the queue lengths are updated after service, is served by at least one server.

We now show that under the SSG rule, the probability that the maximum queue in the system increases in a given time-slot is extremely small for large .
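Before moving on, the sufficient condition of Lemma 6 can be exercised numerically on small random instances. The sketch below assumes unit-rate ON-OFF channels; the helper names (`ssg_allocate`, `check_lemma6`), the instance sizes, and the connection probability 0.6 are all illustrative choices, not values from the paper.

```python
import random


def ssg_allocate(queues, conn):
    """SSG on unit-rate ON-OFF channels; conn[i][j] is 1 if server j reaches queue i."""
    q = list(queues)
    alloc = []
    for j in range(len(conn[0])):
        best = max(range(len(q)), key=lambda i: (q[i] * conn[i][j], -i))
        if q[best] * conn[best][j] > 0:
            alloc.append(best)
            q[best] -= 1
        else:
            alloc.append(None)  # assumption: idle when nothing can be served
    return alloc, q


def check_lemma6(queues, conn):
    """Return None if the hypothesis fails, else whether every longest queue is served."""
    top = max(queues)
    longest = [i for i, length in enumerate(queues) if length == top]
    k = len(longest)
    # Hypothesis: each of the k longest queues is connected to at least k servers.
    if top == 0 or any(sum(conn[i]) < k for i in longest):
        return None
    alloc, _ = ssg_allocate(queues, conn)
    return all(i in alloc for i in longest)


rng = random.Random(0)  # fixed seed, only for reproducibility
outcomes = []
for _ in range(2000):
    queues = [rng.randint(0, 3) for _ in range(6)]
    conn = [[int(rng.random() < 0.6) for _ in range(6)] for _ in range(6)]
    outcomes.append(check_lemma6(queues, conn))

applicable = [o for o in outcomes if o is not None]
print(len(applicable), all(applicable))
```

Instances that do not satisfy the hypothesis are skipped rather than counted, so the check only says something about the cases the lemma covers.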


Lemma 7: Let Assumption 2 hold with . Fix any and . In particular, let

and . Then, under the SSG rule, for large enough, and for any given

Proof: Please see Appendix H.

Next, we establish that under the SSG rule and for large , the maximum queue length in the system decreases in a constant number of time-slots with at least a constant probability.

Lemma 8: Under Assumption 2 with , there exists a constant integer , independent of , such that for all large enough, and for all

Proof: Please see Appendix I.

As a consequence of Lemmas 7 and 8, the maximum queue length in the system, has the following behavior: Over a constant number of time-slots, it increases by a finite amount with very low probability, and decreases with at least a constant probability. Thus, it is reasonable to expect that the stationary distribution of is strongly concentrated around 0. Indeed, this is the essence of our next claim.

Theorem 6 (Positive Rate Function Under SSG): Let Assumption 2 hold with . Fix any and . In particular, let

and . Fix any constant . If the system uses the SSG rule for allocating servers to queues, then

Proof: The proof is similar to that of [4, Theorem 3]. In particular, under the SSG rule, the system has the sample-path dominance property (Lemma 5), an exponentially decaying probability of increasing in a time-slot (Lemma 7), and a constant probability of decreasing in a constant number of time-slots (Lemma 8). Thus, following the proof of [4, Theorem 3], the exponent of decay for increasing in a time-slot is a lower bound on the rate function for , and the lower bound linearly scales with , concluding (an outline of) the proof. We omit the details.

The only difference between the iLQF with PullUp algorithm in [4] and the SSG algorithm is the decay exponent in the upper bound on increasing in a given time-slot. As a result, we get a weaker result than in [4], namely, that the exponent is possibly suboptimal in the case of SSG. Next, we show that the SSG rule returns a positive rate function for the small buffer overflow event under a bursty arrival process (Assumption 2) for any fixed integer .

Theorem 7: Let Assumption 2 hold. Fix any

and . In particular, let

and . Fix any constant . If the system uses the SSG rule, then

Proof: We use Theorem 6 to establish the result. Start with a system where all the queues have length (before arrivals). After packet arrivals, with high probability, no more than queues have a length . Partition the set of servers into sets, with the set

. Using the result of Theorem 6, with high probability, when the servers in the set are allocated, the longest queue in the system is of length (or less), and similarly for . We omit the details.

An immediate strengthening of the above result is obtained by maximizing the right-hand side (RHS) with respect to and over the appropriate ranges. We now turn our attention to the computational complexity of implementing the SSG rule.

Theorem 8 (Complexity of the SSG Rule): The SSG rule can be implemented in computations per time-slot.

Proof: The proof is elementary and has been omitted.

Thus, we have designed a scheduling algorithm (the SSG rule) that is throughput-optimal, yields a positive rate function for the small buffer overflow event, and has a computational complexity , which is no larger than the MaxWeight rule. This answers question 3 in Section I-A in the affirmative. In view of these results, the new intuition that emerges from this work is that we should not allocate service to maximize the channel-weighted sum of queue lengths. We should allocate resources in an iterative fashion, taking into account the effects of prior allocations. This results in good performance (small per-user queues) in addition to throughput optimality.

VI. SIMULATION RESULTS

We compare the performance of the proposed SSG rule with the MaxWeight rule and the Modified iLQF with PullUp rule studied in [4]. Throughout this section, we assume because that is the case with the fewest system resources (as compared to the number of users) and therefore gives a conservative estimate of performance.

In the first set of simulations, we consider a system with bursty arrivals as per Assumption 2 with and , i.e., a system with 95% of the maximum symmetric load. The channel ON probability is set to . We run the simulations for time-slots, vary the number of queues and servers in the system, and study the empirical probability of buffer overflow and the empirical delay distribution of packets. The results are summarized in Figs. 3 and 4. It can be seen that depending upon the system size, the MaxWeight algorithm needs about 3–7 times as much buffer as SSG. Furthermore, the performance of the MaxWeight algorithm actually degrades


Fig. 3. SSG versus MaxWeight: bursty load, buffer overflow probabilities.

Fig. 4. SSG versus MaxWeight: bursty load, packet delay profiles.

Fig. 5. SSG versus Modified iLQF with PullUp: asymmetric arrivals, buffer overflow probabilities.

with the system size, while that of the SSG improves. Similar conclusions hold for the per-packet delay under the two algorithms.

In the second set of simulations (Fig. 5), we compare the performance of the SSG algorithm against the Modified iLQF with PullUp algorithm. We analyze a system with asymmetric arrival rates to the queues. Of the queues, we choose three queues to receive much higher mean loads compared to the others. In our simulations, queues , and receive, in every time-slot, a random number of packets that is uniformly distributed in , while the other queues each receive a packet with probability 0.12, all independently of each other. We set the channel ON probability to to ensure that the system is stable but heavily loaded (about 95.7% for ). We run the simulations for 10 time-slots. As can be seen from the plots (Figs. 5 and 6), the proposed SSG algorithm and the Modified iLQF with PullUp algorithm give very similar performance, both for the small buffer overflow probabilities

Fig. 6. SSG versus Modified iLQF with PullUp: asymmetric arrivals, packet delay profiles.

Fig. 7. SSG versus Modified MaxWeight: bursty load, buffer overflow probabilities.

Fig. 8. SSG versus Modified MaxWeight: bursty load, packet delay profiles.

and delay profiles. In fact, we can hardly distinguish the two curves in each pair. Even if the -axis scale is changed from logarithmic to linear, the two algorithms result in almost overlapping curves.

The third set of simulations has the same setup as the first set, but we modify the MaxWeight rule as follows: Instead of choosing the longest queue with the smallest index, each server now selects a longest queue that has received the minimum service so far (from the previous server allocation rounds) in the current time-slot. The servers are allocated sequentially, from to . We compare the buffer and delay performance of this Modified MaxWeight algorithm and the SSG algorithm. The results are summarized in Figs. 7 and 8. It can be seen that even under this implementation, the MaxWeight algorithm continues to perform much worse than the SSG algorithm.

In the fourth set of simulations, we considered a system with

queues and servers. We set the channel ON probability


Fig. 9. SSG versus Modified iLQF with PullUp: bursty load, buffer overflow probabilities.

to and run the simulations for time-slots. The arrival process is according to Assumption 2, with . We vary the probability of packet arrivals and compare the performance of the SSG and the Modified iLQF with PullUp algorithms. At , the system operates at 96% of the maximum stable load. As can be seen from Fig. 9, the two algorithms result in almost identical (empirical) buffer overflow probabilities.

As mentioned before, the simulations were (each) run for 10 time-slots. Some typical numbers for the busy periods simulated for the different algorithms, under similar loading (about 95% of capacity), are as follows: iLQF: ; SSG: ; and MaxWeight: . This fits well with the discussion so far, as the iLQF and SSG algorithms empty the queues much more aggressively than the MaxWeight algorithm. All the simulation results presented in this section vouch for the conclusion that the SSG and the Modified iLQF with PullUp algorithms perform very similarly to each other, and consistently better than the MaxWeight algorithm, under a variety of arrival and channel processes.
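For readers who want to reproduce the flavor of these experiments, the following Python sketch sets up a much smaller coupled simulation of MaxWeight and SSG. It is not the authors' simulator: it assumes at most one Bernoulli arrival per queue per slot, unit-rate ON-OFF channels, and equal numbers of queues and servers, and the parameter values (`n=20`, `p_arrival=0.9`, `p_on=0.5`, `buffer_size=2`) and all function names are placeholders chosen for illustration only.

```python
import random


def serve_maxweight(queues, conn):
    """All servers decide from the same post-arrival lengths."""
    q = list(queues)
    for j in range(len(conn[0])):
        best = max(range(len(queues)), key=lambda i: (queues[i] * conn[i][j], -i))
        if queues[best] * conn[best][j] > 0:
            q[best] = max(q[best] - 1, 0)  # surplus service is simply wasted
    return q


def serve_ssg(queues, conn):
    """Each server decides from the lengths left by the previous servers."""
    q = list(queues)
    for j in range(len(conn[0])):
        best = max(range(len(q)), key=lambda i: (q[i] * conn[i][j], -i))
        if q[best] * conn[best][j] > 0:
            q[best] -= 1
    return q


def simulate(serve, n, p_arrival, p_on, slots, buffer_size, seed=0):
    """Return (fraction of slots with max queue > buffer_size, arrived, served, final queues)."""
    rng = random.Random(seed)  # same seed -> same arrivals and channels for both rules
    queues = [0] * n
    arrived = served = overflow_slots = 0
    for _ in range(slots):
        for i in range(n):
            if rng.random() < p_arrival:
                queues[i] += 1
                arrived += 1
        conn = [[int(rng.random() < p_on) for _ in range(n)] for _ in range(n)]
        before = sum(queues)
        queues = serve(queues, conn)
        served += before - sum(queues)
        if max(queues) > buffer_size:
            overflow_slots += 1
    return overflow_slots / slots, arrived, served, queues


for name, serve in (("MaxWeight", serve_maxweight), ("SSG", serve_ssg)):
    frac, arrived, served, final = simulate(serve, n=20, p_arrival=0.9, p_on=0.5,
                                            slots=500, buffer_size=2)
    print(name, frac, arrived - served == sum(final))
```

Because the random draws do not depend on the service rule, both runs see identical arrivals and channel states, so any difference in the printed overflow fraction comes from the allocation rule alone.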

VII. CONCLUSION AND FUTURE WORK

We considered the problem of designing scheduling algorithms for OFDM-based wireless downlink from the point of view of minimizing per-user delay, which is closely related to having small per-user queues at the base station. We first considered a class of algorithms called iHLQF and showed that all the algorithms in this class are throughput-optimal for the system. In addition, under a certain technical condition, these algorithms are rate-function-optimal for the small buffer overflow event. However, the computational complexity of these algorithms is somewhat large. We then considered the classic MaxWeight algorithm and showed that it results in a very poor small-buffer performance for our system. We proposed a new algorithm called SSG (Server-Side Greedy) that is throughput-optimal for the system, results in a positive rate function for the small buffer overflow event under symmetric, i.i.d., ON–OFF (possibly with multiple arrivals per time-slot) traffic and i.i.d. ON–OFF channels, and has a small computational complexity. We verified the results through simulations.

The new intuition that emerges from our work is that maximizing the sum of channel-rate-weighted queue lengths, although throughput-optimal, is not good for small per-user queues. Instead, approximately maximizing this sum while paying attention to the finer queuing dynamics results in good small-buffer performance in addition to throughput optimality.

APPENDIX A
PROOF OF LEMMA 1

If , then the mean number of packet arrivals to the system in a given time-slot is more than the maximum number of packets that can be served in that time-slot since one server can serve at most one packet. Thus, under any algorithm, the system is unstable.

Consider the case . A server can potentially serve a queue in time-slot if . Consider a scheduling rule that, in every time-slot, allocates a server uniformly at random to one of the queues that it can potentially serve. Elementary calculations [4, Theorem 2] show that the expected number of packets that can be served from any given queue in a given time-slot is units. Define . Then, for all , we have , and every queue (and thus, the system) is stable.
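The randomized rule used in this proof is simple enough to state as code. The sketch below is only an illustration: `random_allocate` and `conn` are hypothetical names, and `conn[i][j] > 0` stands in for the condition under which a server can potentially serve a queue.

```python
import random


def random_allocate(conn, rng):
    """Allocate each server uniformly at random among the queues it can serve.

    conn[i][j] > 0 means server j can potentially serve queue i (hypothetical layout).
    Returns one queue index per server, or None if the server reaches no queue.
    """
    alloc = []
    for j in range(len(conn[0])):
        reachable = [i for i in range(len(conn)) if conn[i][j] > 0]
        alloc.append(rng.choice(reachable) if reachable else None)
    return alloc


rng = random.Random(1)  # seeded only so that repeated runs agree
conn = [[1, 0, 1],
        [0, 0, 1]]
print(random_allocate(conn, rng))
```

Here server 0 can only reach queue 0, server 1 reaches no queue, and server 2 chooses between the two queues with equal probability.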

APPENDIX B
PROOF OF LEMMA 2

Fix any . We prove that the contribution of the server to the weights of the schedules under the iHLQF rule and the MaxWeight rule cannot differ by more than . Once we prove this, the desired result follows by taking a summation over to .

Case 1: Suppose that under the iHLQF rule , the server is allocated to some queue, . If , then we have

and the server contributes the same weight to both the rules, by the definition of . Hence, we focus on the case . Furthermore, without loss of generality, let .

We first show that under the iHLQF rule, the server is allocated to serve a queue of length at least . Consider the case when , i.e., the round in the algorithm when the queues of length are considered for server allocation. (If the algorithm never reaches a stage when the queues of length are considered for service, then as condition 1 below makes clear, there is nothing to prove.) The following is an exhaustive list of possibilities regarding at this stage.
1) is not available for allocation in this round.
2) is allocated in the round when .
3) is available for allocation, but not allocated to serve in this round (when ).

Under condition 1, the server has already been allocated to a queue with length more than , and there is nothing to prove. Under condition 2, it serves a queue of length , and the claim holds. Under condition 3, the queue must be allocated a server, else the weight of the matching can be strictly increased by allocating to . As a result of allocation, the (updated) length of is at least .

Consider the round when is again considered for service. Repeating the above argument, either is allocated to serve a queue of length , or is allocated a server, which reduces its length further but no less than . Since there are only servers available for allocation, the server is allocated to serve a queue of length at least .


Furthermore, since (by construction) the server serves a queue only if , we have

since for all .

Case 2: Consider the case when the iHLQF rule does not allocate the server to any queue. This can only happen if and if all the packets in the queue are served by the allocations under the rule . For if not, then the server can be allocated to , which is a nonempty queue, contradicting the termination condition of the algorithm. However, the “weight” that the server contributes to the MaxWeight schedule is , by definition of . Since and , it follows that the server ’s contribution to the weight of the MaxWeight schedule is at most .

Thus, we have demonstrated that the contribution of server to the weights of the schedules under the iHLQF rule and the MaxWeight rule cannot differ by more than , completing the proof.

APPENDIX C
PROOF OF THEOREM 3

Fix any integer , and consider the queues at the end of time-slot . Define , and .

Time-slot : By the Chernoff bound, there exists an integer such that for all , with probability at least , at least queues see arrivals in time-slot . Define . Fix an integer such that for all , we have . Define , and consider the first queues in the order of priority for service (after arrivals). In particular, the first queue in the order of priority is the longest queue with the smallest index, the second one is the longest queue with the second smallest index or the second longest queue with the smallest index, and so on. Let the set of these queues be . Let denote the event that server is not connected with any of the queues in . Then, since , we have

Thus, with probability at least , each one of the servers is connected to a queue in . By the definition of the MaxWeight rule, a server connected to one of the queues in is allocated to one of the queues in . Since and at least queues had packet arrivals, it follows that at the end of time-slot , the system has at least queues at length 1. By the union bound (for ), the probability of this event is at least . Let this set of queues (of length at least 1) be called .

Time-slot : The arrivals in the time-slot are independent of all the random variables involved in the definition of the set . Thus, by appropriately using the Chernoff bound, there exists an integer such that for all , with probability at least , at least fraction of queues in the set see arrivals in time-slot . By an argument similar to that for Time-slot , it follows that, with probability at least , no more than of the queues receive service. Combining the results for the time-slots and using the union bound, we have the following conclusion: For all , with probability at least , there exists a set of queues such that we have the following.
• .
• Each queue in has a length at least 2.

Continuing this way (formally, by induction), the following claim holds: At the end of time-slot , for , with probability at least , there exists a set of queues such that we have the following.
• , implying .
• Each queue in has a length at least .

There exists an integer such that for all , we have and . Hence, for a system with , with at least a probability 1/2 and starting with any initial configuration of queue lengths, we have a queue of length at the end of a further time-slots. Thus, even for large , the small buffer overflow event occurs with at least a constant probability, and the proof is complete.

APPENDIX D
PROOF OF LEMMA 3

Let Assumption 2 hold. In a given time-slot , the MaxWeight rule needs to find, for every server , a queue that maximizes the product of the instantaneous channel rate and the length of queue after packet arrivals (if any). Thus, for every server, the rule needs to perform multiplications. All the multiplications are necessary: Not performing even one of these multiplications (in the worst case) can lead to an incorrect allocation. Furthermore, since all the channels are mutually independent, each server needs separate computations, i.e., the calculations not involving are useless for . Since there are servers in the system, any implementation of the MaxWeight rule requires computations per time-slot.
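The counting argument above can be made concrete with an instrumented version of the rule. In the sketch below, `maxweight_with_count` is a hypothetical helper that performs one multiplication per (queue, server) pair and reports the total; it illustrates the count only and says nothing about the lower-bound part of the argument.

```python
def maxweight_with_count(queues, rates):
    """MaxWeight allocation plus the number of multiplications performed.

    queues[i] is the post-arrival length of queue i and rates[i][j] the channel
    rate from server j to queue i (both names are illustrative).
    """
    mults = 0
    alloc = []
    for j in range(len(rates[0])):
        best_i, best_w = None, 0
        for i in range(len(queues)):
            w = queues[i] * rates[i][j]  # one multiplication per (queue, server) pair
            mults += 1
            if w > best_w:  # strict '>' keeps the smallest index on ties
                best_i, best_w = i, w
        alloc.append(best_i)
    return alloc, mults


alloc, mults = maxweight_with_count([2, 1, 4], [[1, 1]] * 3)
print(alloc, mults)  # [2, 2] 6
```

With three queues and two servers the count is 3 × 2 = 6, i.e., the number of queues times the number of servers.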

APPENDIX E
PROOF OF LEMMA 4

Fix any . We prove that the contribution of the server to the weights of the schedules under the SSG rule and the MaxWeight rule cannot differ by more than . Once we prove this, the desired result follows by taking a summation over to .

Case 1: Suppose that under the SSG rule, the server is allocated to some queue , while under the MaxWeight rule, it is allocated to . If , then we have , and the server contributes the same weight to both the rules. Hence, we focus on the case . In this case, is connected to server , but does not get served by under the SSG rule. It follows that

Here, the inequality holds because

since queue lengths can only monotonically decrease


as the successive rounds proceed (recall that the arrivals occur before round 1). The inequality holds because in round , the SSG rule allocates server to a queue that maximizes the product of the channel-rate and queue length. The inequality holds because any given queue can receive at most units of service in a given round (of SSG), implying that the length of after rounds is at least . It follows that

Case 2: Consider the case when the SSG rule does not allocate the server to any queue, but the MaxWeight rule assigns it to the queue . This can happen only if , and if all the packets in the queue (in fact, in each of the queues connected to ) are served by the allocations under the SSG rule, before the server is considered for service in the th round. For if not, then the server can be allocated to , which is a nonempty queue, contradicting the definition of the algorithm. However, the “weight” that the server contributes to the MaxWeight schedule is . Since and , it follows that the server ’s contribution to the weight of the MaxWeight schedule is at most .

Thus, we have demonstrated that the contribution of server to the weights of the schedules under the SSG rule and the MaxWeight rule cannot differ by more than , completing the proof.
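To see the quantity bounded in Lemma 4 on a concrete instance, the sketch below computes the weight of the MaxWeight schedule and of the SSG schedule from the same post-arrival queue lengths. All names are illustrative, channels are taken to be unit-rate, and idle servers are assumed to contribute zero weight.

```python
def maxweight_allocate(queues, rates):
    """Every server maximizes (post-arrival length) x (rate); ties go to the smaller index."""
    alloc = []
    for j in range(len(rates[0])):
        best = max(range(len(queues)), key=lambda i: (queues[i] * rates[i][j], -i))
        alloc.append(best if queues[best] * rates[best][j] > 0 else None)
    return alloc


def ssg_allocate(queues, rates):
    """Same choice rule, but lengths are updated after each server is allocated."""
    q = list(queues)
    alloc = []
    for j in range(len(rates[0])):
        best = max(range(len(q)), key=lambda i: (q[i] * rates[i][j], -i))
        served = rates[best][j] if q[best] * rates[best][j] > 0 else 0
        alloc.append(best if served else None)
        q[best] = max(q[best] - served, 0)
    return alloc


def schedule_weight(queues, rates, alloc):
    """Sum of rate-weighted post-arrival queue lengths over the allocated servers."""
    return sum(queues[i] * rates[i][j] for j, i in enumerate(alloc) if i is not None)


queues = [2, 1]
rates = [[1, 1, 1], [1, 1, 1]]  # two queues, three fully connected servers
w_mw = schedule_weight(queues, rates, maxweight_allocate(queues, rates))
w_ssg = schedule_weight(queues, rates, ssg_allocate(queues, rates))
print(w_mw, w_ssg, w_mw - w_ssg)  # 6 5 1
```

The MaxWeight schedule can never have smaller weight, since each server maximizes its own term; the lemma concerns how large the gap can get.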

APPENDIX F
PROOF OF THEOREM 5

We use the following result [20, Theorem 1]. Consider a system of queues . Let there be possible channel states with strictly positive stationary probabilities, denoted by . Let denote the offered service rate to queue under a candidate algorithm, when the channel state is . Let denote the achievable rate region when the channel state is . That is

The queue can be offered

a service rate for all

The authors of [20] make the following assumptions about , which are easily verifiable for our system: Each is convex; bounded; if , then ; and given any , for all , there exists such that for any satisfying for all and , we have

where and .

The last property of is nontrivial, and we establish for our system at the end of this proof. Define

If the scheduling algorithm is such that for any given there exists such that the service rates offered to the queues, satisfy

(1)

whenever , then it makes the queuing system stable in the mean, i.e., . In addition, if the arrival and channel state processes are such that the candidate rule makes the queuing system an aperiodic Markov chain with a single communicating class, then the stability in the mean implies that the Markov chain is positive recurrent [21].

We use this result to establish the throughput optimality of the proposed SSG algorithm. Fix any time-slot , a channel realization , and let denote the length of after arrivals in that time-slot (if any). Let denote the amount of service offered (in terms of the number of packets) to under SSG. We know from Lemma 4 that there exists a constant such that in the channel state , we have

Furthermore, we have

for some

Combining the last two inequalities

Note that by norm equivalence, we can equivalently satisfy (1) whenever for some constant . Let . Then, for , we have

Since the constant can be chosen to be arbitrarily large, the ratio can be made arbitrarily small. In particular, for any given we can choose so that (1) holds for our algorithm. Thus, conditioned on the validity of , the proof is complete.

Proving : Again by norm equivalence, we consider the case when for some . Without loss of generality, let . For each


, we have , implying for all . Hence

Here, the inequality holds because maximizes the RHS of inequality over all choices of for the given queue lengths (by definition of the function ). Equality holds because, in our system, adding the same constant to each of the queue lengths does not change the weight-maximizing schedule. The last inequality holds because is the maximum channel rate and , is the total number of servers, implying that the maximum number of packets that can be served in a given time-slot is . Thus, given any , we can choose to guarantee

whenever , establishing and completing the proof.

APPENDIX G
PROOF OF LEMMA 5

We need to prove that for all . Suppose that for some , we have

We need to prove that for all , and the proof would be complete by the principle of mathematical induction.

Let in the system , server be allocated to serve queue . If (in the system ) the server is allocated to serve , then there is nothing to prove. If is not allocated to serve , it must be because of one of the following reasons.
1) for some , and is connected to , i.e., .
2) for some , and the queues are both among the longest queues connected to , so according to the tie-breaking rule, queue is served in the th round.

In case 1, we have (by hypothesis)

and by the definition of the SSG rule. (Otherwise, would have been allocated to serve because .) Hence, irrespective of the allocation of (in the system ), we have ,

and consequently for all .

In case 2, we must have

because if , then

and implies that in the system , server must have been allocated to and not . Therefore, we have , and hence

, implying for all .
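The dominance property proved above can also be checked empirically on coupled random instances. The sketch below assumes unit-rate ON-OFF channels and the smallest-index tie-break; `ssg_step`, the instance size, and the sampling ranges are illustrative choices rather than anything taken from the paper.

```python
import random


def ssg_step(queues, arrivals, conn):
    """One SSG time-slot on unit-rate ON-OFF channels; returns end-of-slot lengths."""
    q = [length + a for length, a in zip(queues, arrivals)]
    for j in range(len(conn[0])):
        best = max(range(len(q)), key=lambda i: (q[i] * conn[i][j], -i))
        if q[best] * conn[best][j] > 0:
            q[best] -= 1
    return q


rng = random.Random(0)  # fixed seed for reproducibility only
violations = 0
for _ in range(2000):
    n = 5
    small = [rng.randint(0, 3) for _ in range(n)]
    large = [s + rng.randint(0, 2) for s in small]  # componentwise at least as long
    arrivals = [rng.randint(0, 1) for _ in range(n)]  # shared by both systems
    conn = [[int(rng.random() < 0.5) for _ in range(n)] for _ in range(n)]  # shared
    after_small = ssg_step(small, arrivals, conn)
    after_large = ssg_step(large, arrivals, conn)
    if any(x > y for x, y in zip(after_small, after_large)):
        violations += 1

print(violations)  # Lemma 5 predicts no violations
```

A single time-slot is enough for the check, because the lemma's conclusion for one slot is exactly the hypothesis for the next.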

APPENDIX H
PROOF OF LEMMA 7

We first consider the case . The result for the case follows as an immediate consequence.

Define . For some time-slot , let . Adding dummy packets if necessary, we consider a system where all the queues are of length at the end of time-slot , and by doing so, we get only a “worse” system, a system with sample-pathwise longer queues for all future times, thanks to Lemma 5.

Now, fix any . By the Chernoff bound for the binomial random variables, for all , the probability that in time-slot , more than queues have arrivals is no more than . We condition on the (high probability) event of having at most queues with a nonzero number of arrivals. Again, adding packets if necessary, ensure that exactly queues see arrivals in time-slot , so that after arrivals, the queue-length profile is the following:

Queue length Number of queues

Fix , so that

and . Let denote the event that the first servers each serve a longest queue (i.e., a queue of length ). Then

does not serve a longest queue

served a longest queue

All the channels connecting to the

longest queues are OFF

Therefore, for large enough, is a high-probability event. Conditioned on , the queue-length profile at the end of exactly rounds of service is the following:

Queue length Number of queues

Consider the system configuration after rounds of service. There are servers left for allocation, which (because of the choice of ) is more than . Consider any


one of the remaining longest queues, say . The probability that is connected to one of the remaining servers, say , is exactly since the decisions in the earlier rounds of the SSG algorithm are independent of the channel realizations for . Let denote the event that is connected to less than of the remaining servers. By the Chernoff bound, and since

implying that (by the union bound) the probability that any one of the longest queues is connected to less than servers is at most . Thus, by Lemma 6, with probability at least , all the remaining longest queues are served.

Therefore, by the union bound, the probability that in a given time-slot, at least one of the longest queues is not served is at most , completing the proof for .

For the case , consider a system with servers and queues. This system has the same number of servers and queues, and the earlier result (for the case ) applies. Therefore, the probability that in a given time-slot, at least one of the longest queues is not served is no more than . The same upper bound clearly holds if there are servers and queues, since by Lemma 5, the queue lengths in this system are sample-path dominated by the system with queues and servers. (Informally, it is easier to keep a smaller number of queues from “overflowing.”) This completes the proof.
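The proof leans repeatedly on the Chernoff bound for binomial random variables. As a generic numerical illustration (the specific parameters used in the proof are not reproduced here), the sketch below compares the exact upper tail of a Binomial(n, p) variable with the standard relative-entropy form of the bound, exp(-n D(a || p)) for a > p. The function names and the values `p = 0.3`, `a = 0.5` are arbitrary choices for the example.

```python
import math


def binom_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, m) * p**m * (1 - p)**(n - m) for m in range(k, n + 1))


def kl_bernoulli(a, p):
    """Relative entropy D(a || p) between Bernoulli(a) and Bernoulli(p), with 0 log 0 = 0."""
    first = a * math.log(a / p) if a > 0 else 0.0
    second = (1 - a) * math.log((1 - a) / (1 - p)) if a < 1 else 0.0
    return first + second


def chernoff_bound(n, p, a):
    """Chernoff upper bound on P(X >= a * n), valid for a > p."""
    return math.exp(-n * kl_bernoulli(a, p))


p, a = 0.3, 0.5  # illustrative values only
for n in (10, 50, 200):
    exact = binom_upper_tail(n, p, math.ceil(a * n))
    print(n, exact, chernoff_bound(n, p, a))
```

Both columns shrink exponentially in n, which is the only feature of the bound that the proof relies on.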

APPENDIX IPROOF OF LEMMA 8

As before, we first prove the claim for the case , and then argue for the case . We first prove an auxiliary claim.

Claim 1: Let the queue-length profile at the beginning of

time-slot be the following:

Queue length Number of queues

Fix and . In particular,

let and . Define

. Then, there exists and an integer such that for all , with probability at least (and adding dummy packets if necessary), the queue-length profile at the beginning of the time-slot is

Queue length Number of queues

The proof of this claim is based upon an intersection of a finite number of high-probability events (each of which occurs with probability at least for some ), which is itself a high-probability event.
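The reason a finite intersection of high-probability events is itself a high-probability event is the union bound applied to the complements. As a generic sketch, with $A_1, \dots, A_k$ and $\varepsilon_i$ used as illustrative labels (assumptions, not the paper's notation) for the events and their failure probabilities:

```latex
\[
\Pr\Bigl(\bigcap_{i=1}^{k} A_i\Bigr)
  \;=\; 1 - \Pr\Bigl(\bigcup_{i=1}^{k} A_i^{c}\Bigr)
  \;\ge\; 1 - \sum_{i=1}^{k} \Pr\bigl(A_i^{c}\bigr)
  \;\ge\; 1 - \sum_{i=1}^{k} \varepsilon_i
  \;\ge\; 1 - k \max_{1 \le i \le k} \varepsilon_i ,
  \qquad \text{where } \Pr(A_i) \ge 1 - \varepsilon_i .
\]
```

Because $k$ is fixed while each $\varepsilon_i$ vanishes as the system grows, the total failure probability also vanishes.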

Proof of Claim 1: Fix . By the Chernoff bound, there exists an integer such that for all , with probability at least , the number of

queues that see arrivals in time-slot is no more than . Adding dummy packets if necessary, we ensure that the number of queues seeing arrivals is exactly . Let of the queues already at length have packet arrivals. Thus, the queue-length profile after arrivals is

Queue length Number of queues

Following an argument similar to that in the proof of Lemma 7, it follows that there exists and an integer such that if , then with probability at least , at the end of rounds of service, the queue-length profile is the following (or, can be obtained by adding dummy packets):

Queue length Number of queues

Thus, by an argument similar to that in the proof of Lemma 7, there exists and an integer such that if , then with probability at least , after a further

rounds of service, the queue-length profile is the following (or, can be obtained by adding dummy packets):

Queue length Number of queues

At this stage, we have servers left for allocation. Therefore, by an argument similar to that in the proof of Lemma 7, there exists and an integer such that if

, then with probability at least , at least of the longest queues are served (unless

, in which case all the of the longest queues are served). Thus, by the union bound, the probability that we have (or can obtain by adding dummy packets) the desired queue-length profile is at least . Define

. Then, there exists an integer such that for all , we have

Thus, is a valid choice for , completing the proof.

Proof of Lemma 8: Without loss of generality, let , and let . By Lemma 5, we can add dummy packets to any of the queues and get a dominant system for all future time-slots. Hence, adding dummy packets if necessary, we consider a system where at the beginning of the first time-slot, all the queues have a length. We make no further references to the original queue lengths

and refer to the queues and the queue lengths in the “new” system by and , respectively. Thus, at the beginning of the first time-slot, the queue-length profile is as desired for Claim 1 to hold, with . Therefore, at the end of at most time-slots and (by the union bound) with probability at least

, the maximum queue length in the system is no more than . Since is independent of and , we can choose large enough so that . This completes the proof for the case . The proof for the case follows an argument similar to that in the proof of Lemma 7. In particular, consider a system with queues and servers.


This system has the same number of queues and servers, so the earlier result implies that there exists a constant such that the probability that the longest queue length decreases in consecutive time-slots is at least 1/2. This system sample-path dominates a system with queues and servers (Lemma 5), so the same constant works for the system with queues and servers, completing the proof.

REFERENCES

[1] S. Bodas, S. Shakkottai, L. Ying, and R. Srikant, “Low-complexity scheduling algorithms for multi-channel downlink wireless networks,” in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.

[2] W. Forum, “Mobile WiMAX Part I: A technical overview and performance evaluation,” White Paper, Mar. 2006.

[3] Requirements for Evolved UTRA (E-UTRA) and Evolved UTRAN (E-UTRAN), G. T. 25.913, Mar. 2006.

[4] S. Bodas, S. Shakkottai, L. Ying, and R. Srikant, “Scheduling in multi-channel wireless networks: Rate function optimality in the small-buffer regime,” in Proc. SIGMETRICS/Performance Conf., Jun. 2009, pp. 121–132.

[5] L. Tassiulas and A. Ephremides, “Dynamic server allocation to parallel queues with randomly varying connectivity,” IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 466–478, Mar. 1993.

[6] L. Ying, R. Srikant, A. Eryilmaz, and G. Dullerud, “A large deviations analysis of scheduling in wireless networks,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5088–5098, Nov. 2006.

[7] A. Stolyar, “Large deviations of queues sharing a randomly time-varying server,” Queueing Syst., vol. 59, pp. 1–35, 2008.

[8] V. J. Venkataramanan and X. Lin, “Structural properties of LDP for queue-length based wireless scheduling algorithms,” in Proc. Annu. Allerton Conf. Commun., Control, Comput., Monticello, IL, Sep. 2007.

[9] A. Stolyar, “MaxWeight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic,” Ann. Appl. Prob., vol. 14, no. 1, 2004.

[10] S. Shakkottai, R. Srikant, and A. Stolyar, “Pathwise optimality of the exponential scheduling rule for wireless channels,” Ann. Appl. Prob., vol. 36, no. 4, pp. 1021–1045, Dec. 2004.

[11] S. Meyn, “Stability and asymptotic optimality of generalized maxweight policies,” SIAM J. Control Optimiz., vol. 47, no. 6, pp. 3259–3294, 2009.

[12] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Trans. Autom. Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.

[13] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar, and P. Whiting, “CDMA data QoS scheduling on the forward link with variable channel conditions,” Bell Labs Tech. Memo, Apr. 2000.

[14] S. Shakkottai and A. Stolyar, “Scheduling for multiple flows sharing a time-varying channel: The exponential rule,” Ann. Math. Statist., vol. 207, pp. 185–202, 2002.

[15] M. J. Neely, E. Modiano, and C. E. Rohrs, “Power and server allocation in a multi-beam satellite with time varying channels,” in Proc. IEEE INFOCOM, New York, NY, Jun. 2002, vol. 3, pp. 1451–1460.

[16] M. J. Neely, “Optimal energy and delay tradeoffs for multiuser wireless downlinks,” IEEE Trans. Inf. Theory, vol. 53, no. 9, pp. 3095–3113, Sep. 2007.

[17] M. Neely, E. Modiano, and Y.-S. Cheng, “Logarithmic delay for N × N packet switches under the crossbar constraint,” IEEE/ACM Trans. Netw., vol. 15, no. 3, pp. 657–668, Jun. 2007.

[18] S. Sarkar, “Optimum scheduling and memory management in input queued switches with finite buffer space,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3197–3220, Dec. 2004.

[19] S. Kittipiyakul and T. Javidi, “Delay-optimal server allocation in multiqueue multiserver systems with time-varying connectivities,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2319–2333, May 2009.

[20] A. Eryilmaz, R. Srikant, and J. Perkins, “Stable scheduling policies for fading wireless channels,” IEEE/ACM Trans. Netw., vol. 13, no. 2, pp. 411–424, Apr. 2005.

[21] P. R. Kumar and S. P. Meyn, “Stability of queueing networks and scheduling policies,” IEEE Trans. Autom. Control, vol. 40, no. 2, pp. 251–260, Feb. 1995.

Shreeshankar Bodas received the B.Tech. degree in electrical engineering from the Indian Institute of Technology (IIT), Madras, India, in 2005, and the Ph.D. degree in electrical and computer engineering from The University of Texas at Austin in 2010. His Ph.D. dissertation focused on the design of low-complexity scheduling algorithms for wireless downlink networks that guarantee low per-user delay in addition to network stability. He is currently a Postdoctoral Associate with the Massachusetts Institute of Technology, Cambridge. His research interests include design and analysis of algorithms for wireless networks and data center computations.

Sanjay Shakkottai (M’02) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign in 2002. He is with The University of Texas at Austin, where he is currently an Associate Professor with the Department of Electrical and Computer Engineering. His current research interests include network architectures, algorithms, and performance analysis for wireless and sensor networks. Dr. Shakkottai received the NSF CAREER Award in 2004.

Lei Ying (M’08) received the B.E. degree from Tsinghua University, Beijing, China, in 2001, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana–Champaign in 2003 and 2007, respectively. During Fall 2007, he worked as a Postdoctoral Fellow with The University of Texas at Austin. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Iowa State University, Ames. His research interest is broadly in the area of information networks, including wireless networks, mobile ad hoc networks, P2P networks, and social networks. Dr. Ying received a Young Investigator Award from the Defense Threat Reduction Agency (DTRA) in 2009, an NSF CAREER Award in 2010, and is named the Northrop Grumman Assistant Professor (formerly the Litton Industries Assistant Professor) with the Department of Electrical and Computer Engineering, Iowa State University, for 2010–2012.

R. Srikant (S’90–M’91–SM’01–F’06) received the B.Tech. degree from the Indian Institute of Technology, Madras, India, in 1985, and the M.S. and Ph.D. degrees from the University of Illinois at Urbana–Champaign in 1988 and 1991, respectively, all in electrical engineering. He was a Member of Technical Staff with AT&T Bell Laboratories, Holmdel, NJ, from 1991 to 1995. He is currently with the University of Illinois at Urbana–Champaign, where he is the Fredric G. and Elizabeth H. Nearing Professor with the Department of Electrical and Computer Engineering and a Research Professor with the Coordinated Science Laboratory. His research interests include communication networks, stochastic processes, queueing theory, information theory, and game theory. Prof. Srikant was an Associate Editor of Automatica, the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, and the IEEE/ACM TRANSACTIONS ON NETWORKING. He has also served on the Editorial Boards of special issues of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS and the IEEE TRANSACTIONS ON INFORMATION THEORY. He was the Chair of the 2002 IEEE Computer Communications Workshop in Santa Fe, NM, and a Program Co-Chair of IEEE INFOCOM 2007. He is a Distinguished Lecturer for the IEEE Communications Society for 2011–2012.

