Scheduling Algorithms for OFDMA Broadband Wireless

Networks

Guy Grebla

Research Thesis

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

Guy Grebla

Submitted to the Senate

of the Technion — Israel Institute of Technology

Shevat 5773 Haifa February 2013

This research was carried out under the supervision of Prof. Reuven Cohen, in the

Faculty of Computer Science.

Some results in this thesis have been published as articles by the author and research

collaborators in conferences and journals during the course of the author’s doctoral

research period, the most up-to-date versions of which being:

Reuven Cohen and Guy Grebla. Efficient allocation of CQI channels in broadband wireless networks. In IEEE INFOCOM, pages 96–100, 2011.

Reuven Cohen, Guy Grebla, and Liran Katzir. Cross-layer hybrid FEC/ARQ reliable multicast with adaptive modulation and coding in broadband wireless networks. In IEEE/ACM Transactions on Networking, volume 18, pages 1908–1920, 2010.

The generous financial help of the Technion is gratefully acknowledged.

Contents

List of Figures
Abstract
Abbreviations and Notations
1 Introduction
2 Cross-Layer Hybrid FEC/ARQ Reliable Multicast with Adaptive Modulation and Coding in Broadband Wireless Networks
  2.1 Introduction
  2.2 Related Work
  2.3 Preliminaries
    2.3.1 Reliable multicast streaming service model
    2.3.2 Using one or more rounds
    2.3.3 The effect of AMC on schedule efficiency
    2.3.4 Combining multiple rounds and multiple MCSs
    2.3.5 1-round RM-AMC(OC-1) is NP-hard
  2.4 Algorithms for 1-round RM-AMC(OC-1)
    2.4.1 Verifying the correctness of a solution
    2.4.2 An optimal algorithm for 1-round RM-AMC(OC-1) with a small number of MCSs
    2.4.3 A heuristic for 1-round RM-AMC(OC-1) based on the Unbounded Knapsack Problem
  2.5 Extending RM-AMC(OC-1) to Multiple Rounds
    2.5.1 The R-rounds RM-AMC(OC-1) problem
    2.5.2 An optimal algorithm for R-round RM-AMC(OC-1) with a small number of MCSs
    2.5.3 A heuristic for R-round RM-AMC(OC-1) with a large number of MCSs
    2.5.4 Unbounded number of rounds
  2.6 Simulation Study of the Various Algorithms
  2.7 Extensions to other Optimization Criteria
  2.8 Conclusions
3 Joint Scheduling and Fast Cell Selection in OFDMA Wireless Networks
  3.1 Introduction
  3.2 Related Work
  3.3 Frequency Reuse Model
  3.4 The OFDMA Joint Scheduling Problem
  3.5 OFDMA Joint Scheduling With Dynamic MCS Selection
  3.6 Simulation Study
    3.6.1 Network Model
    3.6.2 The Simulated Joint Scheduling Algorithms
    3.6.3 Simulation Results
  3.7 Conclusions
4 Multi-Dimensional OFDMA Scheduling in a Wireless Network with Relay Nodes
  4.1 Introduction
  4.2 Related Work
  4.3 Network Model
    4.3.1 Inband vs. Outband Relaying
    4.3.2 Our Scheduling Model
    4.3.3 Frequency Reuse Models
  4.4 The Scheduling Problem
    4.4.1 Preliminaries
    4.4.2 d-MCKP vs. Sparse d-MCKP
  4.5 Scheduling Algorithms
    4.5.1 A Pseudo-Polynomial Time Algorithm
    4.5.2 A Water-Filling Algorithm
  4.6 Adapting Our Algorithms to Model-2
  4.7 OFDMA Joint Scheduling with Relays in a sectorized cell
  4.8 Simulation Study
    4.8.1 Network Model
    4.8.2 Interference Model
    4.8.3 Simulation Results
  4.9 Conclusions
5 Efficient Allocation of Periodic Feedback Channels in Broadband Wireless Networks
  5.1 Introduction
  5.2 Related Work
  5.3 Preliminaries
    5.3.1 CSI channels
    5.3.2 Power of 2 allocation
    5.3.3 CSI Allocation Framework
  5.4 Algorithms for CSI Allocation
    5.4.1 Optimization Criterion
    5.4.2 CSI Allocation When the Tree Is Empty
    5.4.3 CSI Allocation with No Change to Previously Allocated CSI channels
  5.5 Simulation Study and a Complete BS Scheme
    5.5.1 The Performance of Algorithm 5.1 and Algorithm 5.2
    5.5.2 A Complete BS Scheme
  5.6 Conclusions
A Proofs for the Theorems of Chapter 2
  A.1 The Proof of Theorem 2.1
B Simulation Interference Model
C Dynamic Programming Algorithm for d-MCKP
Hebrew Abstract

List of Figures

1.1 Example of a network with RNs and their donor BSs
2.1 The various algorithms proposed for RM-AMC(OC-1)
2.2 Probability that the designated receiver will correctly decode a data block vs. the SNR it experiences for Algorithm 2.1, 2.2 and 2.6
… vs. the bandwidth limitation for Algorithm 2.1 and 2.2
… vs. the bandwidth limitation for Algorithm 2.2 and 2.6
… vs. the SNR it experiences for Algorithm 2.1, 2.3 and 2.5 with K = 6
… vs. the SNR it experiences for Algorithm 2.1, 2.3 and 2.5 with K = 30 when B_max is sufficient for 29 packets of the most bandwidth consuming MCS and 1 packet of the second most bandwidth consuming MCS
… vs. the SNR it experiences for Algorithm 2.1, 2.3 and 2.5 with K = 30 when B_max is sufficient for 25 packets of the most bandwidth consuming MCS and 5 packets of the second most bandwidth consuming MCS
… vs. the bandwidth limitation for Algorithm 2.1, 2.3 and 2.5
2.9 The bandwidth for transmitting a data block vs. the SNR experienced by the designated receiver for OC-2
2.10 The bandwidth for transmitting a data block vs. the SNR experienced by the designated receiver from the lower-quality subgroup for OC-3
3.1 A cell of a cellular network, divided into three sectors using antennas A1, A2, A3
3.2 An abstract structure of the LTE frame and subframe
3.3 A cell with 3 sectors and 3 users
3.4 The OFDMA subframes of a cell transmitted in the 3 sectors by antenna A1, A2 and A3
3.5 Entries of item i in the profit matrices used in our new MC-GAP algorithm
3.6 Simulation network model
3.7 Total profit improvement ratio over water-filling algorithm for the 4 algorithms
4.1 Cell containing R = 3 RNs
4.2 An abstract structure of the LTE frame
4.3 An abstract structure of the LTE subframe (F1 and F2 are two orthogonal OFDMA subbands)
4.4 The frequency reuse models considered in this chapter
4.5 Comparative difficulty of the various problems related to this chapter
4.6 FFR in a cluster of 3 sectorized cells
4.7 The profit with 3 RNs divided by the profit with no RNs for two UE distributions
4.8 The profit with 3 RNs divided by the maximum profit for the two algorithms
… distributions
… distributions
5.1 (a) A CSI super-channel consists of the same slot in every uplink OFDMA frame; (b) a CSI channel consists of the same slot in every τ = 2^i frames
5.2 An example of a labeled CSI allocation tree for a super-channel
5.3 Examples for two collision-free allocations
5.4 Fragmentation of a CSI channel
5.5 Consecutive packets transmitted to MS_j using correct CSI value
5.6 A CSI tree with its 4 max-free subtrees (black nodes are occupied)
5.7 Normalized profit of Algorithm 5.1 vs. the number of MSs (load)
5.8 Total profit of Algorithm 5.1 divided by total profit of Algorithm 5.2 vs. the number of MSs (load)
5.9 The complete BS scheme
5.10 The profit achieved by the proposed scheme divided by the maximum profit that can be achieved using Algorithm 5.1, as a function of the threshold t
5.11 Average number of changes per event of the proposed scheme as a function of the threshold t
5.12 Average number of changes per event of the proposed scheme as a function of the average number of MSs for t = 0.94

Abstract

In this thesis, we define and study problems related to scheduling in OFDMA wireless

networks. In such networks, the BS (Base Station) receives packets destined for its

mobile stations. Downlink bandwidth is used to transmit the packets, and since this

bandwidth is a limited resource, a careful optimization is required.

We start by addressing the problem that arises when the BS wishes to multicast

information to a large group of nodes and to guarantee a certain level of reliability. The

problem is to determine which MCS (Modulation and Coding Scheme) should be used

by the BS for each packet. We present several variants of this problem, which differ in

the number of rounds during which the information delivery must be completed.

A crucial step in the evolution of broadband wireless (cellular) networks is reducing

the size of the cells and increasing their number. This target is usually obtained using

cell sectorization, where the omni-directional antenna at each BS is replaced by 3 or 6

directional antennas. While each sector can run its own scheduling algorithm, bandwidth

utilization can be significantly increased if a joint scheduler makes these decisions for all

the sectors. This gives rise to the “joint scheduling” problem, addressed in this thesis

for the first time.

LTE-advanced and other 4G OFDMA standards allow relay nodes (RNs) to be

deployed as a substitute for BSs. Each RN is associated with a donor BS, to which

it is connected through the wireless link. In a network with RNs, packet scheduling

decisions must be made in each cell not only for the BS, but also for the RNs. Because

the scheduler in a network with RNs must take into account the transmission resources

of the BS and the RNs, it needs to find a feasible schedule that does not exceed the

resources of a multi-dimensional resource pool. This makes the scheduling problem

computationally harder than in a network without RNs. In this thesis we define and

study this scheduling problem for the first time.

Advanced OFDMA technologies such as MIMO require each mobile station to

send many feedback messages to the BS. This feedback consumes much of the uplink

bandwidth, mainly because it is sent periodically. Therefore, the uplink bandwidth for

this feedback must be allocated very carefully, while achieving certain optimization

objectives. We propose a framework for the allocation of periodic feedback channels

to the nodes of a wireless network, and scheduling algorithms that allow the BS to

optimize this allocation.

Abbreviations and Notations

ACK : Acknowledgement

AMC : Adaptive Modulation and Coding

ARQ : Automatic Repeat reQuest

BS : Base Station

CoMP : Coordinated Multipoint Transmission

CQI : Channel Quality Indication

CSI : Channel Status Information

d-KP : d-dimensional Knapsack Problem

d-MCKP : d-dimensional Multiple-Choice Knapsack Problem

FEC : Forward Error Correction

FFR : Fractional Frequency Reuse

GAP : Generalized Assignment Problem

LTE : Long Term Evolution

MCKP : Multiple-Choice Knapsack Problem

MC-GAP : Multiple-Choice GAP

MC-MKP : Multiple-Choice Multiple Knapsack Problem

MCS : Modulation and Coding Scheme

MDS : Maximum Distance Separable

MIMO : Multiple Input Multiple Output

NACK : Negative Acknowledgement

OFDMA : Orthogonal Frequency Division Multiple Access

QoS : Quality of Service

RM-AMC : Reliable Multicast using Adaptive Modulation and Coding

RN : Relay Node

scheduled : The minimum allocated block

scheduling : Reuse-1 or reuse-1/3 areas

SFR : Soft Frequency Reuse

SINR : Signal to Interference plus Noise Ratio

SNR : Signal to Noise Ratio

UE : User Equipment

Chapter 1

Introduction

Reliable Multicast using Adaptive Modulation and Coding

A prominent feature of advanced wireless technologies such as WiMax/802.16 [45] and

3GPP/LTE [2] is the base station’s ability to transmit a single copy of a packet to

a group of receivers, a concept known as multicast. Indeed, streaming multicast is

considered one of the most important applications in such networks.

To ensure a certain level of reliability, streaming multicast often uses application

layer FEC (Forward Error Correction) codes, with or without ARQ (Automatic Repeat

reQuest). In a typical FEC-based multicast, the sender creates from each data block

K + n packets, and every receiver must receive any K + ε of these packets in order to

correctly decode the data block [61]. In rateless erasure codes, the value of n can be

different for different data blocks.

Application layer FEC codes can be classified into two main groups: near-optimal

codes and optimal codes. In near-optimal codes, (1 + ε) · K packets are required in order

to correctly decode the data block, while in optimal codes, K packets are required. We

assume that an MDS (Maximum Distance Separable) code is used in the application

layer FEC [31, 62]. MDS is a family of optimal codes that includes the well known

Reed-Solomon code.
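
To illustrate the MDS property, the following minimal sketch computes the probability that a receiver collects at least K of the K + n packets and can therefore decode the data block. It assumes, for illustration only, that packets are lost independently and that every packet has the same success probability p. The function name and this loss model are assumptions made here and do not appear in the thesis.

```python
from math import comb


def decode_probability(k: int, n: int, p: float) -> float:
    """Probability that at least k of the k + n transmitted packets arrive.

    Assumption (not from the thesis): each packet is received independently
    with the same probability p, so the number of received packets is
    binomially distributed.
    """
    total = k + n
    return sum(
        comb(total, r) * p**r * (1 - p) ** (total - r)
        for r in range(k, total + 1)
    )


if __name__ == "__main__":
    # Adding repair packets (larger n) can only raise the decode probability.
    for extra in range(4):
        print(extra, decode_probability(4, extra, 0.9))
```

Under these assumptions the decode probability grows with n, which is the trade-off between reliability and bandwidth that the rest of the thesis optimizes.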

In a hybrid FEC/ARQ-based scheme [6, 36, 58, 67], receivers that have not received

enough packets notify the sender by sending a NACK message [5], and the sender may

send additional repair packets. The number of such repair rounds is, in practice, limited

by real-time, buffer space, and similar considerations.

Adaptive modulation and coding (AMC) is crucial for increasing the performance of

broadband wireless networks. With AMC, the base station usually uses higher order

modulation (such as 16- or 64-QAM) and higher code rate (such as R=3/4 turbo code)

when transmitting unicast packets to nearby receivers, and lower order modulation

(such as QPSK) and code rate when transmitting unicast packets to distant receivers.

Multicast packets, however, are usually transmitted using low order modulation and

coding, because of the very high probability that at least one of the receivers is not

close enough to the base station.
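
The conventional AMC behavior described above can be sketched as follows. The MCS table, its SNR thresholds, and the function names are hypothetical values chosen only for illustration; real thresholds depend on the standard and on the target error rate. The sketch shows why a multicast packet sent with a single MCS ends up using the MCS of the worst receiver.

```python
# Hypothetical table: (name, minimum SNR in dB, information bits per symbol),
# ordered from the most robust to the most efficient MCS.
# The SNR thresholds are illustrative assumptions, not values from the thesis.
MCS_TABLE = [
    ("QPSK 1/2", 5.0, 1.0),
    ("16-QAM 1/2", 11.0, 2.0),
    ("64-QAM 3/4", 20.0, 4.5),
]


def pick_unicast_mcs(snr_db: float) -> str:
    """Return the most efficient MCS whose SNR threshold is met.

    Falls back to the most robust MCS when no threshold is met.
    """
    best = MCS_TABLE[0]
    for entry in MCS_TABLE:
        if snr_db >= entry[1]:
            best = entry
    return best[0]


def pick_multicast_mcs(snrs_db: list[float]) -> str:
    """A single MCS for the whole group is dictated by the worst receiver."""
    return pick_unicast_mcs(min(snrs_db))


if __name__ == "__main__":
    print(pick_unicast_mcs(25.0))                 # nearby receiver
    print(pick_multicast_mcs([25.0, 12.0, 6.0]))  # one distant receiver
```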

In Chapter 2, we show that the base station can improve the performance of multicast

by optimizing the selection of an MCS for each individual packet. We are not aware of any

previous work that has addressed this cross-layer combination of application layer hybrid

FEC/ARQ with physical layer Adaptive Modulation and Coding (AMC). Therefore, to

the best of our knowledge, not only are the theoretical results and algorithms presented

new, but so is the problem itself.

Joint Scheduling in OFDMA Wireless Networks

A crucial step in the evolution of broadband wireless (cellular) networks is reducing

the size of the cells and increasing their number, in order to address the fast growing

demand for bandwidth. The major expenditure in the deployment of a wireless network

is installing base stations (BSs) and connecting them to the backbone. Thus, it is

important to increase the number of cells without the concomitant cost associated with

the deployment of many new BSs. This goal can be attained in one of the following two

ways, or by a combination thereof.

(a) Using cell sectorization: the omni-directional antenna at each BS is replaced by 3

antennas of 120 degrees, or 6 antennas of 60 degrees, all operated by the same BS.

(b) Using relay nodes: such relay nodes are governed by low-cost BSs that have only

wireless connectivity to the backbone through their “parent” (regular) BS.

In Chapter 3, we consider the first approach. A cell is divided into multiple sectors,

each served by a directional antenna, and all the antennas are governed by the same

BS. We define the new OFDMA (Orthogonal Frequency Division Multiple Access)

scheduling problem encountered by a BS in the proposed architecture as “OFDMA joint

scheduling,” because a single entity (the BS) needs to make scheduling decisions for

multiple transmitting sectors/antennas. This is a new OFDMA scheduling problem,

defined and solved for the first time in this thesis.

Multi-Dimensional Scheduling in a Wireless Network with Relay Nodes

The advent of sophisticated mobile devices and new applications has made spectral opti-

mization crucial for wireless networks. New 4G technologies, such as LTE Advanced [4],

employ OFDMA in their physical layer and use new concepts such as MIMO, CoMP

and Relay Nodes (RNs) [8, 55, 60, 65, 66, 91] to increase the network throughput.

Deploying long-range wireless networks with good coverage is a complex task, one

that introduces a trade-off between cost and performance. One example of this trade-off

is the desire to decrease the size of the cells in order to increase the network bandwidth

available to every user. But decreasing cell size by adding more base stations (BSs)

increases installation costs substantially, because the most expensive factor in the

installation of a new BS is connecting it to the optical backbone.

Figure 1.1: Example of a network with RNs and their donor BSs

To overcome this barrier, 4G cellular standards allow RNs to be deployed as a

substitute for BSs. Unlike a BS, an RN is not directly connected to the backbone.

Rather, each RN is associated with a donor BS, to which it is connected through the

OFDMA wireless link (see Figure 1.1). Each user equipment (UE¹) receives its data

packets either directly from the BS, or indirectly over the BS→RN→UE route. The

performance benefits from the deployment of RNs are three-fold: (a) increased network

density; (b) increased network coverage; (c) increased network roll-out speed.

An important task in the operation of a wireless network is packet scheduling. This

task comprises all real-time decisions that must be made by the BS before transmitting

data on the downlink channel: which data packets to transmit during the next OFDMA

subframe, which modulation and coding scheme (MCS) to use for each packet, whether

to transmit a packet directly to the UE or via an RN, and so on. In a network with

RNs, such scheduling decisions must be made for the RNs as well. In Chapter 4, we

propose the first packet-level scheduling algorithm for such networks.

Adding RNs to the network makes the scheduling problem computationally harder.

Without RNs, the BS needs to decide which packets to transmit and which MCS to use

for each transmitted packet. Each transmission of a packet using some MCS requires a

certain amount of bandwidth in the next subframe and is associated with a certain utility

function. The goal is to maximize the total profit without exceeding the total bandwidth.

Therefore, without RNs, the scheduling problem is equivalent to the known NP-hard

Multiple-Choice Knapsack Problem (MCKP) [49], for which excellent approximations,

heuristics and dynamic programming algorithms exist.
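
The scheduling problem without RNs can be sketched as a small dynamic program. This is a generic textbook-style MCKP recurrence, not necessarily the algorithm of [49] or one used later in the thesis. It assumes that bandwidth is measured in integer units, that each packet offers one (bandwidth, profit) option per MCS, and that a packet may be left unscheduled; the function name and data layout are illustrative.

```python
def mckp_max_profit(packets: list[list[tuple[int, float]]], capacity: int) -> float:
    """Maximum total profit when at most one option is chosen per packet.

    packets[i] lists the (bandwidth, profit) options of packet i, one per MCS.
    The chosen options must use at most `capacity` bandwidth units in total.
    Assumption: bandwidth values are positive integers.
    """
    # best[b] = maximum profit achievable with at most b bandwidth units.
    best = [0.0] * (capacity + 1)
    for options in packets:
        new_best = best[:]  # copying keeps the "do not transmit" choice
        for bandwidth, profit in options:
            for b in range(bandwidth, capacity + 1):
                candidate = best[b - bandwidth] + profit
                if candidate > new_best[b]:
                    new_best[b] = candidate
        best = new_best
    return best[capacity]


if __name__ == "__main__":
    # Two packets, two MCS options each, 3 bandwidth units in the subframe.
    demo = [[(2, 3.0), (1, 2.0)], [(2, 4.0), (3, 5.0)]]
    print(mckp_max_profit(demo, 3))
```

The running time is proportional to the capacity times the total number of options, which is pseudo-polynomial in the size of the input.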

Efficient Allocation of Periodic Feedback Channels in Broadband Wire-

less Networks

In order to achieve high throughput in wireless networks, the transmitter needs to

obtain up-to-date information about the channel quality observed by the receiver. To

this end, in advanced wireless standards each mobile station (MS) periodically transmits

to the base station (BS) its Channel Quality Indicator (CQI). CQI is a measure of the

downlink mobile channel, and is used by the BS to adapt the modulation and coding

parameters to the channel status of the corresponding node. These measurements also

play a major role in the BS’s scheduling algorithm [22, 25].

¹ Throughout this thesis, the terms UE and MS are equivalent and are used interchangeably.

When Multiple Input Multiple Output (MIMO) technology is incorporated into 4G

wireless networks, the amount of feedback that must be transmitted from the MSs to

the BS increases dramatically. In the MIMO closed-loop spatial multiplexing mode, for

example, this feedback includes the Rank Indicator (RI), the Precoding Matrix Indicator

(PMI), and the Channel Quality Indicator (CQI). The BS uses the PMI reports to

determine how the precoding matrix should be configured for transmission. The RI

reports indicate the number of MIMO transmission layers available to the reporting

MS. All these indicators require a lot of expensive uplink bandwidth, mainly because

they are sent periodically as long as there is transmission on the downlink channel.

This expensive bandwidth is very often viewed as a major obstacle to the deployment

of MIMO and other advanced closed-loop wireless technologies. Therefore, the uplink

bandwidth for these indicators must be allocated very carefully, while achieving certain

optimization objectives.

Our framework, presented in Chapter 5, encompasses all common indicators, includ-

ing CQI, RI and PMI. However, we do not distinguish between the various indicators

and view them collectively as CSI (Channel Status Information) channels. Both

3GPP/LTE [3] and WiMax/802.16 [45] support periodic and aperiodic CSI feedback.

While aperiodic CSI feedback requires the BS to send a signaling message each time

it wants to receive a CSI report from an MS, periodic CSI feedback requires only one

signaling message for the allocation of a CSI channel and one for its release. The

allocation message indicates the location and periodicity of the CSI slots that comprise

the allocated CSI channel. Once a CSI channel is allocated, the MS transmits CSI

messages on the slots of this channel until it receives a deallocation message.
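
The difference between the two feedback modes can be sketched as follows. The frame indexing, the (offset, period) representation of a CSI channel, and the function names are assumptions made for illustration; they are not taken from the 3GPP/LTE or WiMax specifications.

```python
def csi_report_frames(offset: int, period: int, num_frames: int) -> list[int]:
    """Uplink frames (0-based) in which the MS reports on its CSI channel.

    Assumption: the channel is described by the first frame it occupies
    (offset) and by its periodicity in frames (period).
    """
    return list(range(offset, num_frames, period))


def signaling_messages(num_reports: int, periodic: bool) -> int:
    """Signaling messages the BS sends in order to obtain num_reports reports.

    Periodic feedback needs one allocation and one release message;
    aperiodic feedback needs one request per report.
    """
    return 2 if periodic else num_reports


if __name__ == "__main__":
    frames = csi_report_frames(offset=1, period=4, num_frames=12)
    print(frames)
    print(signaling_messages(len(frames), periodic=True))
    print(signaling_messages(len(frames), periodic=False))
```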

Chapter 2

Cross-Layer Hybrid FEC/ARQ

Reliable Multicast with Adaptive

Modulation and Coding in

Broadband Wireless Networks

2.1 Introduction

In this chapter, we show that the base station can improve the performance of multicast

by optimizing the selection of an MCS for each individual packet. We are not aware

of any previous work that has addressed this cross-layer combination of Application

layer hybrid FEC/ARQ with physical layer Adaptive Modulation and Coding (AMC).

Therefore, to the best of our knowledge, not only are the theoretical results and

algorithms presented new, but so is the problem itself.

The new problem we define is referred to as RM-AMC (Reliable Multicast using

Adaptive Modulation and Coding). RM-AMC has two main variants: for a pure FEC

scheme, where only one round is used for the delivery of every data block, and for a

hybrid FEC/ARQ scheme, where multiple rounds can be used. With one round, the

base station sends K + n packets for every data block and must decide:

• what the value of n should be;

• what MCS should be used for each of these K + n packets.

With multiple rounds, the sender needs to address these issues not only for the first

round, but for every additional one.

It is important to note that in the considered model FEC is used at the application

layer and MCS at the PHY layer. Therefore, the Application layer of the receiver can

correctly decode the data block if it receives any K packets, regardless of the MCSs

used to transmit these packets in the PHY layer.

The RM-AMC problem defined in this chapter and the algorithms for solving it

rely heavily on the concept of cross-layer optimization. That is, information retrieved

by a lower layer (PHY) is used by an upper layer (Application/Transport) in order to

improve the performance of the upper layer’s protocol.

The rest of this chapter is organized as follows. In Section 2.2, we discuss related

work. In Section 2.3, we describe the considered multicast service model, define the

RM-AMC problem, and prove that it is NP-hard. In Section 2.4, we present several

algorithms for RM-AMC. In Section 2.5, we extend RM-AMC to multiple rounds, and

in Section 2.6 we present a simulation study of the various algorithms. In Section 2.7, we

extend our results to more optimization criteria. Finally, Section 2.8 concludes the

chapter.

2.2 Related Work

In recent years, the number of important applications for multicast in broadband access

wireless networks has been growing steadily. One such application is Internet Protocol

Television (IPTV) over WiMax [75, 80], which is supposed to enable mobile users to

receive streaming video content.

The concept of reliable multicast for streaming and other applications has been

addressed by the IETF RMT (Reliable Multicast Transport) working group. This

working group has published several RFCs on large-scale multicast. The main proto-

col developed by the RMT WG for large-scale reliable streaming multicast is called

NORM (NACK-Oriented Reliable Multicast) [5], which employs the concept of hybrid

FEC/ARQ [36, 58, 67, 76, 77]. For a good overview of the RMT WG, see [2].

In [44], problems related to MAC layer multicast are studied. This paper does not

study Application layer hybrid FEC/ARQ for reliable multicast, but is more concerned

with Physical layer transmission codes. When the sender wants to send a message, it

splits it into several hierarchical layers and transmits each layer using its own MCS

(modulation and coding scheme). The MCS depends on the importance of the encoded

layer. Similar ideas are also presented in [50].

In [51], three schemes to adaptively change the MCS of multicast packets are

discussed. In each scheme, the sender uses the channel conditions of the receivers

to determine, for every packet, which MCS to use. The three schemes have different

reliability and throughput. However, unlike our work, [51] does not use FEC or ARQ.

In [36], Application layer FEC/ARQ is used, but without AMC. The sender encodes

every data block into multiple packets. It is then supposed to get feedback messages

from the receivers in order to decide how many more packets to send for the same data

block. This is the standard Application layer hybrid FEC/ARQ proposed by NORM.

In [72], convolutional coding and nonuniform PSK modulation are combined to provide

greater efficiency. Nonuniform PSK is used to transmit additional information to the

more capable receivers.

In [81] and [82], the authors introduce and analyze a cross-layer framework for

video multicast. Several video layers are generated and b_i data blocks, each of K_i bytes,

are used for every layer i. Each data block is encoded and expanded into N bytes using

an (N, K_i) Reed-Solomon code. Then, a packet composed of one byte from every data

block is generated using a modified multiple description coding scheme (MDC) in which

superposition coding is used to encode each layer using a different MCS. In [82], an

analysis is performed for the worst receiver in a Rayleigh channel. Neither [81] nor

[82] considers ARQ. In addition, in both frameworks, every data block is encoded into

N bytes, where N is the same for all data blocks.

In [16], a layered coding approach that uses error correction coding within each

packet and erasure correction coding across the packets is proposed. The authors

consider a Nakagami wireless channel and optimize the transmission assuming the

transmission rate is continuous. They show that the performance is close to optimal

when the transmission is performed using a set of known MCSs. ARQ and multicast

transmission are not considered in this paper.

In [92], optimal partitioning of receivers into groups for multirate multicast is studied.

A dynamic programming algorithm that finds an optimal partition is presented. In [47],

algorithms for the problem of maximizing the aggregate receiver utility for the case of

multirate multicast sessions are presented. As in our work, several MCSs are used in

order to increase performance. However, Application layer FEC/ARQ is not applied.

2.3 Preliminaries

2.3.1 Reliable multicast streaming service model

In this chapter we consider a streaming multicast service for which full reliability is

neither possible nor essential. It is not possible due to: (a) occasionally bad wireless

channel conditions and intermittent disconnection introduced by mobility of the hosts;

(b) the streaming nature of the broadcast data, which puts hard limits on the time by which the

delivery of every data block must be completed. Full reliability of streaming multicast

is not essential because streaming applications (audio and video) can tolerate data loss.

If the loss is temporary, it might not even be noticed by the user due to the robustness

of the audio/video codecs. If the loss is long in duration, e.g., due to a physical obstacle

between a mobile node and the base station, the user will probably want to continue

receiving the audio/video multicast despite the blackout period.

For the RM-AMC problem defined in this chapter, one may consider several opti-

mization criteria, all of which are related to the “designated group.” This group does

not include all the nodes that join the multicast group, but only those whose wireless

channel is “not too bad” because satisfying nodes whose wireless channel is too bad

would consume too much bandwidth. The designated group contains only nodes whose

SNR is above some threshold. Those are the nodes to which some level of QoS has to be

guaranteed. The optimization criterion considered throughout most of this chapter is:

OC-1 Let pi be the probability that the ith receiver of the designated group will

correctly decode the data block. Maximize mini(pi), while guaranteeing that the

total bandwidth is not larger than Bmax.

In Section 2.7, we address other optimization criteria as well.

Consider a multicast packet sent by the base station. The probability that a certain

receiver will correctly receive this packet is determined by the receiver’s SNR (signal-to-

noise ratio). Throughout the chapter we assume that for two receivers a and b, if the

SNR of a is higher than the SNR of b, then the probability that a will correctly receive

a multicast packet is not smaller than the probability that b will correctly receive the

same packet. This is true regardless of the MCS used by the base station for the PHY

layer encoding of this packet. This implies that in OC-1, the minimum probability

should only be guaranteed to the receiver with the worst SNR from the designated

group. For the rest of the chapter, such a receiver will be referred to as the designated

receiver.

2.3.2 Using one or more rounds

An optimal solution for RM-AMC(OC-1) depends on the number of rounds the sender

can use for sending the packets of a certain data block. If only one round is possible,

the sender needs to decide how many packets should be sent in this round and what

MCS should be used for each of them. These packets are then transmitted, and no

more packets can be used for this data block.

If R > 1 rounds are possible, we assume that after every round of transmission

the sender receives a feedback message about the outcome of the previous round. The

sender will use this information to decide how many new packets should be broadcast

in the next round for the same data block, and what MCS should be used for each.

The exact feedback the base station should receive in every round depends on the

optimization criteria we want to address. Receiving a feedback message from every

individual receiver is impractical because it leads to the well-known feedback implosion

problem. For OC-1 it is sufficient to receive a feedback message from only one receiver,

as discussed in Section 2.5.1.

2.3.3 The effect of AMC on schedule efficiency

In what follows, we give some examples of the relationship between the PHY layer AMC

and the schedule efficiency. Consider two MCSs, MCS-1 and MCS-2. Suppose that

when a packet is encoded using MCS-1, it requires twice the bandwidth required by

MCS-2. On the other hand, suppose that the probability that the designated receiver

will correctly receive an MCS-1 packet is 1 − ε, where ε is very close to 0, and the

probability that it will correctly receive an MCS-2 packet is only 1/2. Suppose also that

K = 2 and that the bandwidth B is sufficient for (a) 2 MCS-1 packets, or (b) 4 MCS-2

packets, or (c) 1 MCS-1 packet and 2 MCS-2 packets. With only one round, the best

choice is (a). Using this option, the probability that the designated receiver will correctly

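decode the data block is $(1-\varepsilon)^2$, compared to

\[ \left[\binom{4}{2}+\binom{4}{3}+\binom{4}{4}\right]\cdot\left(\tfrac{1}{2}\right)^{4} = \tfrac{11}{16} \]

for option (b), or

\[ 2(1-\varepsilon)\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} + \varepsilon\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} + (1-\varepsilon)\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{4}-\tfrac{\varepsilon}{2} \]

for option (c).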

If K = 2 and the available bandwidth B is sufficient for transmitting only 1 MCS-1

packet or 2 MCS-2 packets, the best choice is of course the latter, because the success

probability is 1/4 compared to 0.

Finally, suppose that the available bandwidth B is sufficient for transmitting 3

MCS-2 packets or 1 MCS-1 packet and 1 MCS-2 packet. In this case, the best choice is

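to transmit 3 MCS-2 packets. The probability that the designated receiver will correctly

decode the data block is $\left[\binom{3}{2}+\binom{3}{3}\right]\cdot\left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{2}$, compared to $(1-\varepsilon)\cdot\tfrac{1}{2} = \tfrac{1}{2}-\tfrac{\varepsilon}{2}$ using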

only 1 MCS-1 packet and 1 MCS-2 packet.
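
The comparisons in this subsection can be checked numerically. The following Python sketch enumerates every reception outcome of a transmission configuration and adds up the probability of the outcomes in which at least K packets arrive. The function name decode_probability and the value ε = 0.001 are illustrative assumptions, not part of the thesis, and packet losses are assumed to be independent.

```python
from itertools import product

def decode_probability(packet_probs, K):
    """Probability that at least K of the independently sent packets arrive."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(packet_probs)):
        if sum(outcome) >= K:
            p = 1.0
            for received, q in zip(outcome, packet_probs):
                p *= q if received else 1.0 - q
            total += p
    return total

eps = 1e-3                 # illustrative value for the small loss probability
p1, p2 = 1.0 - eps, 0.5    # per-packet success probability for MCS-1 and MCS-2

option_a = decode_probability([p1, p1], K=2)        # 2 MCS-1 packets
option_b = decode_probability([p2] * 4, K=2)        # 4 MCS-2 packets
option_c = decode_probability([p1, p2, p2], K=2)    # 1 MCS-1 and 2 MCS-2 packets
```

Exhaustive enumeration is exponential in the number of packets, so it is only suitable for examples of this size.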

2.3.4 Combining multiple rounds and multiple MCSs

To see how we can increase the performance by increasing the number of rounds, suppose

that K = 2 and that the available bandwidth B is sufficient for transmitting 3 MCS-2

packets or 1.5 MCS-1 packets. Suppose also that the probability that the designated

receiver will correctly receive an MCS-1 packet is 1− ε, and the probability that it will

correctly receive an MCS-2 packet is 1/2.

Definition 2.3.1. A transmission configuration is a vector τ = (τ1, . . . , τN ) of N

integers that describes the packets transmitted by the sender for a given data block.

Element τj in this vector indicates the number of packets transmitted using MCS-j.

The optimal 1-round transmission configuration is to transmit 3 MCS-2 packets,

in which case the probability of the designated receiver to correctly decode the data

block is $\left[\binom{3}{2}+\binom{3}{3}\right]\cdot\left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{2}$. The optimal 2-round protocol starts with a single MCS-2

packet. If the packet is correctly received by the designated receiver, the base station

transmits a single MCS-1 packet in the next round. If the first transmission fails, the

base station transmits two MCS-2 packets in the next round. The probability that the

designated receiver will correctly decode the data block is $\tfrac{1}{2}\cdot(1-\varepsilon)+\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{5}{8}-\tfrac{\varepsilon}{2}$,

which is higher than for the 1-round optimal transmission configuration (1/2).

For the rest of this subsection, we generalize the above example and show that

when one uses MCS-1 and MCS-2 as defined above, the probability that the designated

receiver will correctly decode the data block converges to 1 when the number of rounds

increases. This is not obvious, because the bandwidth allocated for the

transmission of each data block is limited.

Let K = n + 1, and suppose that the available bandwidth B is sufficient for

transmitting 2n+ 1 MCS-2 packets or n+ 0.5 MCS-1 packets. The optimal 1-round

transmission configuration is 2n+ 1 MCS-2 packets, in which case the probability that

the designated receiver will correctly decode the data block is 1/2.

The optimal (2n + 1)-round schedule is to transmit a single MCS-2 packet in every

round until the number of packets correctly received by the designated receiver is strictly

larger than the number of incorrectly received packets. Then, in the next (and last)

round, the sender should transmit as many MCS-1 packets as possible. Let r be the last

transmission round of an MCS-2 packet. Since r must be odd, let r = 2k+1. At the end

of round r, there are k + 1 correctly received packets, and K − k − 1 = n + 1 − k − 1 = n − k

more packets are required to correctly decode the data block. The remaining bandwidth

is sufficient for transmitting n− k MCS-1 packets, which guarantees (with probability

1− ε) that the receiver will be able to correctly decode the data block.

Denote the transmission results as a binary vector, where the ith bit indicates

whether the designated receiver correctly received the packet transmitted in the ith

round. The probability that the designated receiver will not be able to decode the data

block is equal, up to an ε, to the probability that every prefix of this vector will not

contain more 1s than 0s. Note that in this case the vector is of size 2n+ 1 and all the

packets are transmitted using MCS-2.

We now show that the number of binary vectors of size 2n+ 1 for which no prefix

contains more 1s than 0s is $\binom{2n+1}{n+1}$. Let A be the set of binary vectors of size 2n + 1 that

have n+ 1 0s and n 1s. Let B be the set of binary vectors of size 2n+ 1 for which every

prefix contains no more 1s than 0s. Clearly, $|A| = \binom{2n+1}{n+1}$. We now present a bijection

f : A→ B. Given a ∈ A, if a ∈ B then f(a) = a. Otherwise, there is a prefix in a that

has more 1s than 0s. Consider the following transformation g on a: find the shortest

prefix that has more 1s than 0s and flip every bit in this prefix. Clearly, the number of

1s decreases by exactly 1. Note that g is reversible (simply find the first prefix that holds

more 0s than 1s and flip its bits). If g(a) ∈ B then f(a) = g(a); otherwise continue

to apply g on a until receiving a vector that belongs to B. The function f is bijective

since g is reversible and one can tell how many times g has been applied by the total

number of 1s. Thus, the number of vectors in B is equal to the number of vectors in A.

Now, note that the probability of receiving each of the $\binom{2n+1}{n+1}$ vectors is $1/2^{2n+1}$.

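Thus, the probability that the designated receiver will correctly decode the data

block is $1 - \binom{2n+1}{n+1}/2^{2n+1}$. Since

\[
\begin{aligned}
\binom{2n+1}{n+1} = \frac{(2n+1)!}{(n+1)!\,n!}
&\le \frac{\sqrt{2}\left(\frac{2n+1}{e}\right)^{2n+1}\sqrt{2\pi(2n+1)}}{\left(\frac{n+1}{e}\right)^{n+1}\sqrt{2\pi(n+1)}\,\left(\frac{n}{e}\right)^{n}\sqrt{2\pi n}} \\
&\le \frac{\left(\frac{2n+1}{e}\right)^{2n+1}\sqrt{2n+1}}{\left(\frac{n+1}{e}\right)^{n+1}\sqrt{n+1}\,\left(\frac{n}{e}\right)^{n}\sqrt{\pi}\sqrt{n}} \\
&= \frac{2^{2n+1}\left(n+\frac{1}{2}\right)^{n+1}\left(n+\frac{1}{2}\right)^{n}\sqrt{2}\sqrt{n+\frac{1}{2}}}{(n+1)^{n+1}\sqrt{n+1}\;n^{n}\sqrt{\pi}\sqrt{n}} \\
&\le \left(1+\frac{1}{2n}\right)^{n}\frac{2^{2n+1}}{\sqrt{n}} \le \frac{2^{2n+1}\sqrt{e}}{\sqrt{n}},
\end{aligned}
\]

we get that $1 - \binom{2n+1}{n+1}/2^{2n+1} \ge 1 - O(1/\sqrt{n})$, which converges to 1 as n grows.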
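
The counting argument and the resulting bound can be sanity-checked by brute force for small n. The sketch below is only an illustration under the assumptions of this example (all packets sent with MCS-2 and received independently with probability 1/2); the function names are hypothetical.

```python
from itertools import product
from math import comb

def count_never_ahead(length):
    """Count 0/1 vectors of the given length in which no prefix has more 1s than 0s."""
    count = 0
    for bits in product([0, 1], repeat=length):
        lead = 0          # (#1s - #0s) in the current prefix
        valid = True
        for bit in bits:
            lead += 1 if bit else -1
            if lead > 0:  # this prefix holds more 1s than 0s
                valid = False
                break
        if valid:
            count += 1
    return count

def failure_probability(n):
    """Probability, ignoring the epsilon term, that the data block is not decoded."""
    return comb(2 * n + 1, n + 1) / 2 ** (2 * n + 1)
```

For each small n, count_never_ahead(2 * n + 1) can be compared with comb(2 * n + 1, n + 1), and failure_probability(n) with the bound sqrt(e / n) derived above.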

2.3.5 1-round RM-AMC(OC-1) is NP-hard

We start by formally defining the 1-round RM-AMC(OC-1) problem:

Problem 1 (1-round RM-AMC(OC-1)):

Instance: The number K of packets required to correctly decode a data block, an

SNR for the designated receiver (the worst receiver in the designated group), an

upper bound Bmax on the bandwidth the sender can use for every data block,

and a collection of N MCSs: MCS-1, . . ., MCS-N . Each MCS-j is a pair (bj , fj),

where bj is the bandwidth cost for transmitting a packet using MCS-j and fj is

the function that translates from an SNR value to the probability that a receiver

with such an SNR will receive an MCS-j packet with no error. Without loss of

generality, we assume that bj ≤ bk holds for every j < k and that b1 = 1.

Objective: Find a transmission configuration such that the total bandwidth used for

all the packets is not larger than Bmax and the probability that the designated

receiver will correctly decode the data block is maximized.

Theorem 2.1. The decision version of RM-AMC(OC-1) for 1-round is NP-hard.

The proof is presented in Appendix A.

2.4 Algorithms for 1-round RM-AMC(OC-1)

2.4.1 Verifying the correctness of a solution

We now show how the sender can efficiently check whether OC-1 holds for a given

transmission configuration to the 1-round RM-AMC(OC-1) problem. Let t be the

number of packets in the transmission configuration and K be the number of packets a

receiver needs to correctly decode a data block. Let MCS(h) be the index of the MCS

used for the hth packet in the transmission configuration. Let V (h) be a vector with

two elements: V (h) = (1 − pMCS(h), pMCS(h)), where pMCS(h) is the probability that the

designated receiver will correctly receive an MCS(h) packet. Denote by U = (u0, . . . , ut)

the convolution of V (1), . . . , V (t). U is a vector of length t + 1, where ul, 0 ≤ l ≤ t

is the probability that the designated receiver will correctly receive exactly l packets.

Hence, the probability that this receiver will correctly decode the data block is $\sum_{l=K}^{t} u_l$.

To efficiently compute the convolution of V (1), . . . , V (t), we divide this set of vectors

into 2 equal sets. We recursively compute the convolution of the vectors in each of the

2 sets and get two new vectors. Then, we compute the convolution of the returned new

vectors. We use the fact that the convolution of 2 vectors with size n can be computed

in O(n · log(n)) time using the Fast Fourier Transform [27]. Hence, each level of the recursion takes

O(t · log(t)) time and the total computation takes O(t · log^2(t)) time.

The O(t · log^2(t)) computational complexity can be improved using the following

observation. When the convolution of two vectors creates a vector with more than

K elements, the resulting vector can be replaced by a short vector with exactly K

elements. The first K − 1 elements of the short vector are identical to those of the long

one. The Kth element is set to $\sum_{i \ge K} y_i$, where $y_i$ is the ith element of the long vector.

Consequently, the Kth element indicates the probability that the designated receiver

will be able to correctly decode the data block. The information we lose in this process,

namely, how many packets the designated receiver will be able to decode in addition to

the required K packets, is not relevant.

If T (x) is the time required for computing the convolution of x short vectors, then

the following recursive equation holds:

\[
T(x) \le
\begin{cases}
2\cdot T(x/2) + x\log x & \text{if } x < K\\
2\cdot T(x/2) + K\log K & \text{otherwise.}
\end{cases}
\tag{2.1}
\]

Thus, for t ≥ K we get T (t) = O(t · log(K) + K · log^2(K)).
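
The divide-and-conquer computation with truncation can be sketched as follows. This is a minimal illustration, not the thesis implementation: it uses a direct quadratic convolution instead of the FFT, and it keeps K + 1 entries per vector, where index l < K holds the probability of exactly l received packets and index K aggregates the probability of K or more. With this indexing a single packet with success probability p contributes the vector (1 − p, p). All names are hypothetical.

```python
def truncated_convolve(x, y, K):
    """Convolve two count distributions, folding every count >= K into index K."""
    out = [0.0] * (min(len(x) + len(y) - 2, K) + 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[min(i + j, K)] += xi * yj
    return out

def received_distribution(packet_probs, K):
    """Distribution of the number of correctly received packets, capped at K."""
    if not packet_probs:
        return [1.0]
    if len(packet_probs) == 1:
        p = packet_probs[0]
        return truncated_convolve([1.0], [1.0 - p, p], K)
    mid = len(packet_probs) // 2
    left = received_distribution(packet_probs[:mid], K)
    right = received_distribution(packet_probs[mid:], K)
    return truncated_convolve(left, right, K)

def decode_probability(packet_probs, K):
    """Probability that at least K packets are correctly received."""
    dist = received_distribution(packet_probs, K)
    return dist[K] if len(dist) > K else 0.0
```

Replacing truncated_convolve by an FFT-based convolution followed by the same folding step would give the asymptotic running time discussed above.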

2.4.2 An optimal algorithm for 1-round RM-AMC(OC-1) with a small

number of MCSs

Definition 2.4.1. An MCS is said to be unacceptable for a given SNR if the probability

that a packet will be correctly received by a receiver with such an SNR is almost 0.

Definition 2.4.2. MCS-1 is said to dominate MCS-2 for a given SNR if the probabilities

that a receiver with such an SNR will correctly receive an MCS-1 packet and an MCS-2

packet are almost identical, but the bandwidth used for transmitting an MCS-1 packet

is smaller than that used for transmitting an MCS-2 packet.

A transmission configuration that uses an unacceptable MCS is not optimal because

the contribution of the packets transmitted using this MCS does not justify their

bandwidth cost. A transmission configuration that uses a dominated MCS is not

optimal because it can be replaced with the dominating MCS that uses less bandwidth

without affecting the probability that a receiver will correctly decode the data block.

In many practical applications, there are at most 3 MCSs that are acceptable

and are not dominated by other MCSs. For such applications a brute-force search is

sufficient. Therefore, Algorithm 2.1 can be used to find an optimal solution for 1-round

RM-AMC(OC-1).

Algorithm 2.1 An optimal algorithm for 1-round RM-AMC(OC-1) with a small number of MCSs

1: Set the list Lp to contain all possible transmission configurations whose bandwidth

≤ Bmax.

2: Find in Lp the transmission configuration m that maximizes the probability that

the designated receiver will correctly decode the data block, and store it in solval.

3: Return solval.

The running time of Algorithm 2.1 is O(β · (Bmax)^N), where β is the time complexity

for verifying that OC-1 holds and N is the number of MCSs.
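
A brute-force search in the spirit of Algorithm 2.1 might look as follows. The sketch assumes integer bandwidth costs and independent packet losses; the function names and the tuple representation of a transmission configuration are illustrative choices, not taken from the thesis.

```python
from itertools import product

def decode_probability(packet_probs, K):
    """Probability that at least K packets arrive (index K aggregates 'K or more')."""
    dist = [1.0] + [0.0] * K
    for p in packet_probs:
        new = [0.0] * (K + 1)
        for l, mass in enumerate(dist):
            new[l] += mass * (1.0 - p)
            new[min(l + 1, K)] += mass * p
        dist = new
    return dist[K]

def best_configuration(costs, probs, K, b_max):
    """Try every configuration with total bandwidth <= b_max and keep the best one."""
    best_tau, best_value = None, -1.0
    ranges = [range(b_max // cost + 1) for cost in costs]
    for tau in product(*ranges):
        if sum(n * cost for n, cost in zip(tau, costs)) > b_max:
            continue
        packet_probs = [p for n, p in zip(tau, probs) for _ in range(n)]
        value = decode_probability(packet_probs, K)
        if value > best_value:
            best_tau, best_value = tau, value
    return best_tau, best_value
```

Here tau[j] is the number of packets sent with the jth MCS in the input lists. The number of enumerated configurations grows as (Bmax)^N, which is why this approach is only reasonable for a small number of MCSs.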

2.4.3 A heuristic for 1-round RM-AMC(OC-1) based on the Unbounded

Knapsack Problem

We now present a heuristic for 1-round RM-AMC(OC-1), based on a reduction to the

Unbounded Knapsack Problem (UKP) [49]. UKP is an extension of USSP [49]. The

instance is a set S of item types s1, s2, . . . , sm and a capacity C. Each type si has a

weight w(si) and a profit p(si). The objective is to find a vector S′ = (s′1, . . . , s′m) of

items whose aggregated profit $\sum_{i=1}^{m} s'_i \cdot p(s_i)$ is maximum and whose aggregated weight

$\sum_{i=1}^{m} s'_i \cdot w(s_i)$ is not larger than C.

To reduce an instance of 1-round RM-AMC(OC-1) to an instance of UKP, each MCS is represented

by an item type, and the bandwidth limitation Bmax is translated into the capacity C.

The weight of a type is the bandwidth cost of the corresponding MCS, and the profit of

each type is the probability that a packet of the corresponding MCS will be correctly

received by the designated receiver. To transform a solution S′ = (s′1, . . . , s′m) for the

reduced UKP problem to a solution for RM-AMC(OC-1), we construct a transmission

configuration with s′i packets transmitted using MCS-i for every i.

Observation 2.4.3. The expected number of correctly received packets for a given

transmission configuration in the 1-round RM-AMC(OC-1) problem is equal to the

aggregated profit in the corresponding UKP problem.

UKP has a simple 2-approximation greedy algorithm whose running time is O(m · log(m))

using sorting and O(m) using linear selection [49]. It also has a pseudopolynomial-time

optimal dynamic programming algorithm whose running time is O(m · C) [49] and

an FPTAS [49].

When the number of MCSs is small, the number of UKP types is also small. Small

instances can be optimally solved in polynomial time [57]. This gives rise to the following

heuristic for the 1-round RM-AMC(OC-1) problem.

Algorithm 2.2 A heuristic for 1-round RM-AMC(OC-1) with a large number of MCSs

1: Reduce the 1-round RM-AMC(OC-1) instance to a UKP instance as described

above.

2: Run an algorithm for finding a solution S′ = (s′1, . . . , s′m) for the UKP instance.

3: Translate S′ to a solution for 1-round RM-AMC(OC-1), where the number of

packets transmitted using MCS-i is s′i.

The running time of Algorithm 2.2 is equal to the running time of the algorithm

used to solve the UKP problem in step 2. Note, however, that Algorithm 2.2 has no

performance guarantee even if UKP is solved optimally. To see this, consider two MCSs:

MCS-1 and MCS-2. Suppose that a packet encoded using MCS-1 requires twice the

bandwidth required by MCS-2. On the other hand, suppose that the probability that the

designated receiver will correctly receive an MCS-1 packet is 1− ε, and the probability

that it will correctly receive an MCS-2 packet is 1/4. Suppose that the available bandwidth

B is sufficient for transmitting 1 MCS-1 packet or 2 MCS-2 packets and that K = 2. In

this case the transmission configuration returned by Algorithm 2.2 is composed of a

single MCS-1 packet. Consequently, the probability that the designated receiver will

correctly decode the data block is 0. In contrast, the optimal transmission configuration

for this instance is to send 2 MCS-2 packets, which results in probability 1/16.

The table in Figure 2.1 summarizes the algorithms proposed in this section.

Problem                          Algorithm   Performance   Time complexity
1-round RM-AMC(OC-1)             Alg. 2.1    Optimal       O(β · (Bmax)^N)
                                 Alg. 2.2    Heuristic     The time for solving the reduced UKP problem
R-round RM-AMC(OC-1)             Alg. 2.3    Optimal       O((Bmax)^(N+3) · K · R)
                                 Alg. 2.4    Heuristic     O((Bmax)^3 · N · K · R)
RM-AMC(OC-1) with an unbounded   Alg. 2.5    Optimal       O(N · K · Bmax)
number of rounds

Figure 2.1: The various algorithms proposed for RM-AMC(OC-1)

2.5 Extending RM-AMC(OC-1) to Multiple Rounds

We now describe how to extend 1-round RM-AMC(OC-1) to multiple rounds.

2.5.1 The R-rounds RM-AMC(OC-1) problem

The R-round RM-AMC(OC-1) problem is similar to the 1-round RM-AMC(OC-1),

except that there are up to R transmission rounds for the same data block. The number

of rounds R is chosen in advance to meet the delay constraint. If the application can

tolerate a higher delay, the sender will use a larger value of R. This will increase the

probability for successfully decoding the data block for a given value of Bmax. After

every round of transmission, the sender receives a feedback message about the number

of packets correctly received by the designated receiver during this round. Since the

base station does not know which node is the designated receiver, it should run an

algorithm similar to that proposed by NORM [5], where a receiver reports the number

of missing packets only if this report is not superseded by the reports already sent by

other receivers.

We now formally define the R-round RM-AMC(OC-1) problem:

Problem 2 (R-round RM-AMC(OC-1)):

Instance: The same as for 1-round RM-AMC(OC-1).

Objective: Find a transmission configuration to be used in each round, based on the

outcome of previous rounds, such that the total bandwidth used is not larger than

Bmax and the probability that the designated receiver will correctly decode the

data block is maximized.

Implementation Note: In practice, no node is “nominated” as the designated

receiver. A practical way for the base station to know which node is the designated

receiver is to ask those receivers whose SNR is above the desired threshold (i.e., the

designated group) to report how many packets they are missing and their SNR. (This

is done after the first round of packets is broadcast.) Each such receiver draws a

random backoff time from a truncated exponential distribution. The random backoff

time depends also on the SNR, such that a receiver with a larger SNR will be likely to

wait longer. A receiver whose timer expires checks whether its SNR is smaller than the

smallest SNR reported so far. (Thus, each feedback sent by a receiver on the uplink

should be reflected by the base station on the downlink.) If it is not smaller, the receiver

suppresses its feedback. If it is smaller, the receiver sends a feedback message that

contains its SNR and the number of missing packets. The last reporting receiver is

considered to be the designated receiver. If more than 2 rounds are necessary, this

receiver will be explicitly queried by the base station in the next feedback rounds.
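
The suppression procedure in this note can be simulated with a short sketch. The thesis does not specify how the backoff depends on the SNR, so the scaling of the mean below, the parameter values, and all names are assumptions made purely for illustration.

```python
import math
import random

def truncated_exponential(rng, mean, cutoff):
    """Inverse-CDF sample of an exponential variable conditioned on being <= cutoff."""
    u = rng.random()
    return -mean * math.log(1.0 - u * (1.0 - math.exp(-cutoff / mean)))

def collect_reports(receivers, rng, base_mean=1.0, cutoff=5.0):
    """receivers: list of (name, snr_db, missing_packets) for the designated group.

    Returns the feedback messages actually sent, in the order they are sent.
    """
    lowest = min(snr for _, snr, _ in receivers)
    timers = []
    for name, snr, missing in receivers:
        # Assumed SNR dependence: the mean backoff grows with the SNR, so that
        # receivers with a larger SNR are likely to wait longer.
        mean = base_mean * (1.0 + (snr - lowest))
        timers.append((truncated_exponential(rng, mean, cutoff), name, snr, missing))

    reports, smallest_reported = [], math.inf
    for _, name, snr, missing in sorted(timers):
        # The base station echoes every report on the downlink, so each receiver
        # knows the smallest SNR reported so far when its own timer expires.
        if snr < smallest_reported:
            smallest_reported = snr
            reports.append((name, snr, missing))
    return reports

group = [("a", 9.0, 1), ("b", 6.5, 3), ("c", 7.2, 2), ("d", 8.1, 0)]
reports = collect_reports(group, random.Random(0))
```

Whatever the timer values are, the last entry of reports belongs to the receiver with the smallest SNR, which is the receiver the base station then treats as the designated receiver.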

Theorem 2.2. The decision version of R-round RM-AMC(OC-1) is NP-hard.¹

Observation 2.5.1. From the proof of Theorem 2.2 it follows that for every solution

for the R-round RM-AMC(OC-1) problem with K = 1, there is a solution with the

same performance guarantee and the same bandwidth limitation that uses only a single

round. The only benefit in using more than a single round in this case (K = 1) is the

possible reduction in total bandwidth cost.

2.5.2 An optimal algorithm for R-round RM-AMC(OC-1) with a small

number of MCSs

Let Gτ [≥ k] be the probability that at least k packets will be correctly received by

the designated receiver for a transmission configuration τ in 1-round RM-AMC(OC-1),

and let Gτ [k] be the probability that exactly k packets will be correctly received. In

Section 2.4.1 we showed how to find Gτ [≥ k] and Gτ [k]. We now assume their values

are given in an array of size O((Bmax)^N · K), where Bmax is the total bandwidth allowed

for the transmission of the data block, N is the number of available MCSs, and K

is the number of packets required for decoding the data block. Let H(k, b, r) be the

maximum probability that the receiver will correctly receive at least k packets using a

protocol of r rounds whose total bandwidth consumption is b, and let Tb be the set of

all transmission configurations whose bandwidth consumption is b. We now define the

following equation for computing H(K,Bmax, R) using dynamic programming:

\[
\begin{aligned}
H(k,b,1) &= \max_{\tau\in T_b} G_\tau[\ge k],\\
H(k,b,r) &= \max_{0\le c\le b,\ \tau\in T_c}\ \sum_{i=0}^{c} G_\tau[i]\cdot H(\max(k-i,0),\,b-c,\,r-1).
\end{aligned}
\tag{2.2}
\]

¹ Following Theorem 2.1, Theorem 2.2 is trivial for an arbitrary R. However, here we consider a constant R > 1 that is known in advance and is not a part of the input.

Theorem 2.3. Eq. 2.2 calculates H(k, b, r) as defined earlier.

There are K · Bmax · R entries to compute. Each entry takes $O\left(\sum_{i=0}^{B_{max}} |T_i| \cdot B_{max}\right) = O\left((B_{max})^{N+2}\right)$

time. Therefore, the total time complexity is $O\left((B_{max})^{N+3} \cdot K \cdot R\right)$.

To return the transmission configuration that corresponds to the value of H(k, b, r), we

create an array T . During the computation of H(k, b, r) we update entry T [k, b, r] to

contain the transmission configuration used to obtain the value of H(k, b, r). This idea

is summarized in the following algorithm.

Algorithm 2.3 An optimal algorithm for R-round RM-AMC(OC-1) with a small number of MCSs

1: Using Eq. 2.2 and dynamic programming, compute H(k, b, r) for 0 ≤ k ≤ K,

0 ≤ b ≤ Bmax, 1 ≤ r ≤ R.

2: curk ← K, curb← Bmax.

3: for i = 1 to R do

4: Use T [curk, curb,R− i+ 1] as the transmission configuration in the ith round

and subtract its bandwidth cost from curb.

5: After getting a feedback message about the outcome of the ith round,

subtract from curk the number of packets correctly received by the designated

receiver in that round.

6: end for
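
The recursion behind Algorithm 2.3 can be sketched with memoization. The code below is an illustration only: it assumes integer bandwidth costs, independent packet losses, and that some MCS has cost 1 (so that every set T_b is non-empty). It enumerates T_c explicitly, so it is practical only for tiny instances. The name make_solver and the returned (value, configuration) pairs are hypothetical; the configuration plays the role of the entry T[k, b, r].

```python
from functools import lru_cache
from itertools import product

def make_solver(costs, probs):
    """costs[j], probs[j]: integer bandwidth cost and success probability of MCS j."""

    def exact_dist(tau):
        # dist[i] = probability that exactly i packets of configuration tau arrive
        dist = [1.0]
        for n, p in zip(tau, probs):
            for _ in range(n):
                new = [0.0] * (len(dist) + 1)
                for i, mass in enumerate(dist):
                    new[i] += mass * (1.0 - p)
                    new[i + 1] += mass * p
                dist = new
        return dist

    @lru_cache(maxsize=None)
    def configs(c):
        # T_c: all configurations whose bandwidth consumption is exactly c
        ranges = [range(c // cost + 1) for cost in costs]
        return [(tau, exact_dist(tau)) for tau in product(*ranges)
                if sum(n * cost for n, cost in zip(tau, costs)) == c]

    @lru_cache(maxsize=None)
    def H(k, b, r):
        """Return (H(k, b, r), configuration to send in the first of the r rounds)."""
        if r == 1:
            return max((sum(dist[k:]), tau) for tau, dist in configs(b))
        best = (-1.0, None)
        for c in range(b + 1):
            for tau, dist in configs(c):
                value = sum(g * H(max(k - i, 0), b - c, r - 1)[0]
                            for i, g in enumerate(dist))
                best = max(best, (value, tau))
        return best

    return H

eps = 1e-3                                  # illustrative value
H = make_solver((2, 1), (1.0 - eps, 0.5))   # MCS-1 costs 2 units, MCS-2 costs 1 unit
one_round = H(2, 3, 1)
two_rounds = H(2, 3, 2)
```

With these illustrative parameters the sketch reproduces the example of Section 2.3.4: the 1-round value is 1/2 with three MCS-2 packets, and the 2-round value is 5/8 − ε/2 with a single MCS-2 packet in the first round.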

2.5.3 A heuristic for R-round RM-AMC(OC-1) with a large number

of MCSs

When the value of N is larger than 2 or 3, or the value of Bmax is on the order of several

hundreds, the running time complexity of Algorithm 2.3 renders it impractical. We now

describe a heuristic whose running time is much better.

In the beginning of every round, the algorithm is given the remaining bandwidth

and the number of packets the designated receiver has already correctly received. The

algorithm returns the transmission configuration for this round.

During every step of its execution the algorithm determines: (a) the amount of

bandwidth to be used in the next round, and (b) whether to use this bandwidth as

an input to Algorithm 2.2 or to use it for a transmission configuration that contains a

single MCS. If Algorithm 2.2 is used in every round and the UKP problem in Algorithm

2.2 is solved optimally, the solution produced by the heuristic has the same probability

as a solution for a single round with the same Bmax. We will see, in Section 2.6, that

combining Algorithm 2.2 with an algorithm that uses a single MCS is a good heuristic

for 1-round, and therefore it makes sense to use a similar rationale for multiple rounds.

Let τb be the transmission configuration returned by Algorithm 2.2 when running

with Bmax = b. Let $\tau_b^j$ be the transmission configuration containing only MCS-j packets

that uses the maximum possible bandwidth under bandwidth limitation b. Recall that

Gτ [≥ k] is defined as the probability that at least k packets will be correctly received by

the designated receiver for a transmission configuration τ in 1-round RM-AMC(OC-1),

and Gτ [k] is defined as the probability that exactly k packets will be correctly received.

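We also define $U_b[k]$ and $S_b^j[k]$ as follows:

\[
U_b[k] =
\begin{cases}
G_{\tau_b}[k] & \text{if } \tau_b \text{ uses bandwidth } b\\
0 & \text{otherwise,}
\end{cases}
\qquad
S_b^j[k] =
\begin{cases}
G_{\tau_b^j}[k] & \text{if } \tau_b^j \text{ uses bandwidth } b\\
0 & \text{otherwise.}
\end{cases}
\tag{2.3}
\]

Similarly, we define $U_b[\ge k]$ and $S_b^j[\ge k]$ as follows:

\[
U_b[\ge k] =
\begin{cases}
G_{\tau_b}[\ge k] & \text{if } \tau_b \text{ uses bandwidth } b\\
0 & \text{otherwise,}
\end{cases}
\qquad
S_b^j[\ge k] =
\begin{cases}
G_{\tau_b^j}[\ge k] & \text{if } \tau_b^j \text{ uses bandwidth } b\\
0 & \text{otherwise.}
\end{cases}
\tag{2.4}
\]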

When Ub[k] and Ub[≥ k] are computed using Eq. 2.3 and Eq. 2.4, τb is found using

Algorithm 2.2.

Let M(k, b, r) be the maximum probability that the designated receiver will correctly

receive at least k packets using an r-round algorithm whose total bandwidth consumption

is b when we use in every round Algorithm 2.2 or a single MCS. We now define the

following equations for computing M(K,Bmax, R) using dynamic programming:

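\[
\begin{aligned}
M(k,b,1) &= \max\{U_b[\ge k],\ S_b^1[\ge k],\ \ldots,\ S_b^N[\ge k]\},\\
M(k,b,r) &= \max\left\{
\sum_{i=0}^{c} U_c[i]\cdot M(\max(k-i,0),\,b-c,\,r-1),\ \
\sum_{i=0}^{c} S_c^j[i]\cdot M(\max(k-i,0),\,b-c,\,r-1)
\right\}
\end{aligned}
\tag{2.5}
\]

for 1 ≤ j ≤ N and 0 ≤ c ≤ b.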

Theorem 2.4. Eq. 2.5 calculates M(k, b, r) as defined earlier.

Note that in the computation of M(k, b, r) there are N + 1 elements from which

the maximum is taken. There are K ·Bmax ·R entries to compute. Each entry takes

O((Bmax)^2 · N) time. Assuming that the UKP instance in Algorithm 2.2 is solved using a 2-approximation

polynomial time algorithm [49], the total time complexity is O((Bmax)^3 · N · K · R).

Using a similar idea to that presented in Section 2.5.2, we create an array T whose [k, b, r]

entry contains the transmission configuration used to obtain the value of M(k, b, r). We

now summarize the whole algorithm.

Algorithm 2.4 A heuristic for R-round RM-AMC(OC-1) with a large number of MCSs

1: Use dynamic programming to compute M(k, b, r) for 0 ≤ k ≤ K, 0 ≤ b ≤ Bmax,

1 ≤ r ≤ R, according to Eq. 2.5.

2: curk ← K, curb ← Bmax.

3: for i=1 to R do

4: Use T [curk, curb,R− i+ 1] as the transmission configuration in the ith round

and subtract the bandwidth cost of this transmission configuration from curb.

5: After getting a feedback message about the outcome of the ith round,

subtract from curk the number of packets correctly received by the designated

receiver in that round.

6: end for

2.5.4 Unbounded number of rounds

The case where the number of rounds is unbounded is interesting not only because of the

theoretical analysis, but also because it allows us to find the number of rounds for which

the performance is very close to the maximum possible with an unbounded number

of rounds. Since the bandwidth limit still holds, the number of rounds is, in practice,

limited by the maximum number of packets the sender can send. Thus, the unbounded

number of rounds is equivalent to R-round RM-AMC(OC-1) with R = Bmax.

Observation 2.5.2. Every optimal transmission configuration for the RM-AMC(OC-1)

with an unbounded number of rounds that uses more than one packet in any round can

be replaced by an optimal transmission configuration that uses exactly one packet in

every round.

Let F (k, b) be the maximum probability that at least k packets will be correctly

received by the designated receiver using a bandwidth cost of b. The following equation

is used for computing F (K,Bmax) using dynamic programming.

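\[
F(k,0) =
\begin{cases}
0 & \text{if } k > 0\\
1 & \text{otherwise,}
\end{cases}
\qquad
F(k,b) =
\begin{cases}
F(k,0) & \text{if } \forall j,\ b_j > b\\
\max_{b_j\le b}\bigl(p_j\cdot F(k-1,\,b-b_j) + (1-p_j)\cdot F(k,\,b-b_j)\bigr) & \text{otherwise.}
\end{cases}
\tag{2.6}
\]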

Theorem 2.5. Eq. 2.6 calculates F (k, b) as defined earlier.

The computation of each entry requires O(N) operations and the total number of

entries is O(K ·Bmax). Thus, the total running time is O(N ·K ·Bmax).

During the computation of F (k, b) we update entry A[k, b] to contain the MCS used

to obtain the value of F (k, b). From Observation 2.5.2 we note that there exists an

optimal solution for the RM-AMC(OC-1) problem with an unbounded number of rounds

that uses a single packet whose MCS is A[i, b] in every round. The value of i indicates

the number of packets the designated receiver has to receive in order to correctly decode

the data block. It is equal to K minus the number of packets correctly received in all

previous rounds. The value of b indicates the bandwidth available for transmission of

this data block, namely, Bmax minus the bandwidth used in previous rounds.

We now present an optimal algorithm for RM-AMC(OC-1) with an unbounded

number of rounds.

Algorithm 2.5 An optimal algorithm for RM-AMC(OC-1) with an unbounded number of rounds

1: Use dynamic programming to compute F (k, b) for 0 ≤ k ≤ K, 0 ≤ b ≤ Bmax,

according to Eq. 2.6.

2: curk ← K, curb ← Bmax.

3: while curk > 0 and curb ≥ b1 (i.e., the designated receiver has not yet correctly

received K packets and there is enough bandwidth for a new packet) do

4: In the ith round, transmit 1 packet using the MCS indicated in A[curk, curb]

and subtract the bandwidth cost of this packet from curb.

5: If the designated receiver correctly receives the last transmitted packet, curk ← curk − 1.

6: end while

Since there are at most Bmax rounds, the running time of Algorithm 2.5 is

O(N · K · Bmax + Bmax) = O(N · K · Bmax).
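
The table computation of Algorithm 2.5 can be sketched as below. The code assumes integer bandwidth costs and treats F(0, b) as 1 for every b; the function name and the list-of-lists layout of F and A are illustrative choices, not taken from the thesis.

```python
def plan_unbounded(costs, probs, K, b_max):
    """Fill F[k][b] and A[k][b] (index of the MCS to send in state (k, b))."""
    F = [[0.0] * (b_max + 1) for _ in range(K + 1)]
    A = [[None] * (b_max + 1) for _ in range(K + 1)]
    for b in range(b_max + 1):
        F[0][b] = 1.0                       # nothing left to receive
    for k in range(1, K + 1):
        for b in range(1, b_max + 1):
            for j, (cost, p) in enumerate(zip(costs, probs)):
                if cost <= b:
                    value = p * F[k - 1][b - cost] + (1.0 - p) * F[k][b - cost]
                    if value > F[k][b]:
                        F[k][b], A[k][b] = value, j
    return F, A

eps = 1e-3                                  # illustrative value
F, A = plan_unbounded([2, 1], [1.0 - eps, 0.5], K=2, b_max=3)
```

During transmission the sender looks up A[curk][curb], sends one packet with that MCS, and updates curk and curb according to the feedback, as in steps 3 to 6 of the algorithm.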

2.6 Simulation Study of the Various Algorithms

The goal of this section is threefold:

• To compare the benefit of using multiple MCSs for the considered reliable multicast

application to the current practice of using only one MCS, for OC-1.

• To compare the performance of the various algorithms presented in this chapter

for OC-1.

• To evaluate the benefit of using multiple rounds.

Throughout this section we consider 7 possible MCSs. These MCSs and the cor-

responding probabilities of the designated receiver to correctly receive a packet for a

certain SNR are computed according to [12]. To compare the results of using multiple

MCSs with those of a single MCS, we now present the optimal single MCS algorithm:

Algorithm 2.6 An optimal algorithm for 1-round RM-AMC(OC-1) with a single MCS

1: For every MCS, build a transmission configuration with as many packets as can be

accommodated using a bandwidth of Bmax.

2: From all these transmission configurations, use the one that maximizes the proba-

bility that the designated receiver will correctly decode the data block.

The time complexity of Algorithm 2.6 is O(N ·Bmax · β).
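A minimal Python sketch of Algorithm 2.6 for a single designated receiver is given below. It assumes independent packet losses, so that the decoding probability is a binomial tail; the chapter's own verification procedure is not reproduced here and may differ. The function names and the costs/probs lists are hypothetical.

```python
from math import comb


def decode_prob(n, p, K):
    """P[at least K of n independent packets arrive], each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(K, n + 1))


def best_single_mcs(K, Bmax, costs, probs):
    """Return (MCS index, packet count, decode probability) of the best single-MCS plan."""
    best = None
    for m, (c, p) in enumerate(zip(costs, probs)):
        n = Bmax // c  # as many packets as Bmax accommodates
        prob = decode_prob(n, p, K)
        if best is None or prob > best[2]:
            best = (m, n, prob)
    return best
```

When fewer than K packets fit, the sum in decode_prob is empty and the probability is 0, which is the situation discussed later for small Bmax values.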

The results reported in this section are for SNR values between 6dB and 10dB.

However, we saw similar results for different SNR values. Some of our graphs show the

probability that the designated receiver will correctly decode the data block vs. the

SNR it experiences. The SNR values displayed in those graphs are between 6dB and a

value for which the success probability is very close to 1 (> 0.999), because increasing

the SNR value further does not affect the success probability. In the considered SNR

range, there are up to 4 relevant MCSs, for which the success probability is greater than

Throughout this section we consider K = 6 packets per data block. We saw no

substantial differences when we increased K to 10. For every SNR value, we set

Bmax to be sufficient for 5 packets of the MCS that consumes the highest bandwidth,

plus 1 packet of the MCS that consumes the second-highest bandwidth. Figure 2.2

shows the probability that the designated receiver will correctly decode the data block

vs. the SNR it experiences for three algorithms: Algorithm 2.1 (optimal), Algorithm

2.2 and Algorithm 2.6 (optimal for 1 MCS). For Algorithm 2.2 we used an optimal

pseudopolynomial algorithm to solve the UKP problem. However, solving UKP using

the greedy algorithm instead of the optimal pseudopolynomial algorithm only slightly

reduces the performance of Algorithm 2.2.

Algorithm 2.2 performs very much like Algorithm 2.1 (the optimal algorithm) and

both are represented by a single curve. When we use a single MCS, the performance is

significantly worse. This is because the value of Bmax is not large enough for transmitting

6 packets using the best MCS for the designated receiver.

Figure 2.2: Probability that the designated receiver will correctly decode a data block vs. the SNR it experiences for Algorithms 2.1, 2.2 and 2.6. [Plot omitted; SNR axis from 6 to 9.5 dB; curves: Algorithm 2.1 (optimal) and Algorithm 2.2 (UKP-based heuristic) as one curve, and Algorithm 2.6 (single MCS).]

In Figure 2.3 and Figure 2.4 we concentrate on a single SNR, of 7.5dB, but consider

different Bmax values. In both graphs we show the probability that the designated

receiver will correctly decode the data block vs. Bmax. In Figure 2.3 we see that

Algorithm 2.2 performs very much like Algorithm 2.1 for most of the Bmax values.

However, there is a range of Bmax where Algorithm 2.2 is suboptimal because it sends

fewer than 6 packets, which results in a probability 0 that the designated receiver will

correctly decode the data block.

In Figure 2.4 we see that Algorithm 2.6 (the optimal single MCS algorithm) performs

very much like Algorithm 2.1 exactly in the same Bmax values for which Algorithm 2.2

performs poorly. This is because Algorithm 2.6 uses at least 6 packets as soon as the

bandwidth allows it.

We conclude that for 1-round RM-AMC(OC-1) with a small number of MCSs,

Algorithm 2.1 is recommended. For more than 3 MCSs, running both Algorithms 2.2

and 2.6 is recommended. From the two returned transmission configurations, the one

that maximizes the probability for the designated receiver to correctly decode the data

block should be chosen. This algorithm will be close to optimal for all Bmax values, and

its time complexity is equal to the time complexity of Algorithm 2.2. In addition, in

Algorithm 2.2 the greedy 2-approximation procedure is sufficient for solving the UKP

problem.

We now present simulation results for multiple rounds. We used K = 6, and set

Bmax to be sufficient for exactly 5 packets of the most bandwidth consuming MCS (for

every considered SNR) and 1 packet of the second most bandwidth consuming MCS.

Figure 2.3: Probability that the designated receiver will correctly decode a data block vs. the bandwidth limitation for Algorithms 2.1 and 2.2. [Plot omitted; x-axis: bandwidth limitation (Bmax), 130 to 220; curves: Algorithm 2.1 (optimal) and Algorithm 2.2 (pseudopolynomial UKP-based heuristic).]

Figure 2.4: Probability that the designated receiver will correctly decode a data block vs. the bandwidth limitation for Algorithms 2.2 and 2.6. [Plot omitted; x-axis: bandwidth limitation (Bmax), 130 to 220; curves: Algorithm 2.6 (single MCS) and Algorithm 2.2 (pseudopolynomial UKP-based heuristic).]

We compare the performance of 1 round to 2 rounds and to an unbounded number of

rounds. For 1 round we used Algorithm 2.1, for 2 rounds we used Algorithm 2.3 with

R = 2, and for the theoretical unbounded number of rounds we used Algorithm 2.5.

Recall that all these algorithms are optimal.

Figure 2.5 shows the probability that the designated receiver will correctly decode

the data block vs. the SNR it experiences. We see that the 2-round protocol performs

better than the 1-round protocol and very close to the unbounded number of rounds

protocol.

We increased K to 30, and set Bmax to be sufficient for 29 packets of the most

bandwidth consuming MCS and 1 packet of the second most bandwidth consuming

MCS. The results are depicted in Figure 2.6. We see that for larger values of K, the

benefit of using more rounds increases. This is because the bandwidth used by the

sender for each data block is greater, which allows more combinations of MCSs using

the information collected during every round.

Another interesting case is when K = 30 and Bmax is sufficient for 5 · 5 = 25

packets of the most bandwidth consuming MCS and 1 · 5 = 5 packets of the second

most bandwidth consuming MCS. The results for this case are depicted in Figure 2.7.

Compared to Figure 2.6, the success probability is smaller for SNRs < 8.5dB. The

reason is that less bandwidth is used and therefore more packets are transmitted using

the second most bandwidth consuming MCS. For low SNRs, the probability to receive

such a packet is very small. As the SNR increases, the probability increases until it

reaches 1 for SNR of 9dB, because for this SNR value the probability to receive a packet

transmitted using the second most bandwidth consuming MCS is close to 1. We can

see that for this case the benefit of using more rounds is greater than for the case where

K = 6.

Figure 2.5: Probability that the designated receiver will correctly decode a data block vs. the SNR it experiences for Algorithms 2.1, 2.3 and 2.5 with K = 6. [Plot omitted; SNR axis from 6 to 9.5 dB; curves: Algorithm 2.5 (unbounded number of rounds), Algorithm 2.3 (2 rounds), Algorithm 2.1 (1 round).]

Figure 2.6: Probability that the designated receiver will correctly decode a data block vs. the SNR it experiences for Algorithms 2.1, 2.3 and 2.5 with K = 30 when Bmax is sufficient for 29 packets of the most bandwidth consuming MCS and 1 packet of the second most bandwidth consuming MCS. [Plot omitted; SNR axis from 6 to 9 dB.]

We also compared the performance of 1 round, 2 rounds, and the theoretical

unbounded number of rounds with different values of Bmax for K = 6 and constant SNR

of 8.5dB. The results, depicted in Figure 2.8, are similar to those reported in Figure 2.5.

We also compared the performance of Algorithm 2.3 to the performance of Algorithm

2.4 with 2 rounds, for K = 6 and K = 30. Bmax is set to be sufficient for K − 1 packets

of the most bandwidth consuming MCS and 1 packet of the second most bandwidth

consuming MCS. Despite the fact that Algorithm 2.3 is optimal while Algorithm 2.4

is only heuristic, we found no difference in their performance. We made a similar

observation when we used other sets of parameters.

Figure 2.7: Probability that the designated receiver will correctly decode a data block vs. the SNR it experiences for Algorithms 2.1, 2.3 and 2.5 with K = 30 when Bmax is sufficient for 25 packets of the most bandwidth consuming MCS and 5 packets of the second most bandwidth consuming MCS. [Plot omitted; SNR axis from 6 to 9 dB.]

Figure 2.8: Probability that the designated receiver will correctly decode a data block vs. the bandwidth limitation for Algorithms 2.1, 2.3 and 2.5. [Plot omitted; bandwidth axis from 100 to 220.]

In summary, we saw that using multiple MCSs improves the performance for OC-1.

We also saw that when the optimal algorithm (Algorithm 2.1) cannot be used, Algorithm

2.2 is the second best. In addition, we saw that increasing the number of rounds from 1

to 2 for OC-1 improves the performance significantly for some SNR values. However,

increasing the number of rounds further adds no significant improvement. Finally, we

saw that the polynomial time heuristic for multiple rounds (Algorithm 2.4) yields results

similar to those of the pseudopolynomial optimal algorithm.

2.7 Extensions to other Optimization Criteria

While OC-1 is an important optimization criterion for reliable multicast, other opti-

mization criteria are relevant as well. In this section we describe two such criteria:

OC-2 Minimize the total bandwidth, while guaranteeing that the probability of every

receiver from the designated group to decode a data block is ≥ P .

OC-3 Like OC-2, except that P might be different for different subgroups. For example,

one may define two subgroups: one with a good channel and one with a bad one,

and assign to the first subgroup a higher target probability.

We note that all the problems defined earlier, except for the unbounded number

of rounds case, are NP-hard for OC-2 and OC-3 as well. To see why, observe that

the decision version of RM-AMC(OC-2) is identical to the decision version of RM-

AMC(OC-1), and therefore RM-AMC(OC-2) is also NP-hard. To prove that OC-3 is

also NP-hard, it is sufficient to show that every OC-2 instance can be reduced to an

OC-3 instance. The reduction is trivial: select for every receiver the same probability

threshold considered for OC-2.

That OC-2 holds for a given transmission configuration can be verified using our

OC-1 verification procedure, described in Section 2.4.1. To verify that OC-3 holds for a

given transmission configuration, we divide the receivers into groups according to the

OC-3 thresholds. Then, we verify that OC-2 holds for each subgroup.

In practical applications there exists an MCS for which the probability of the receiver

with the worst SNR in the designated group to successfully receive a packet encoded

using this MCS is very close to 1. Let bm be the bandwidth cost of a packet sent using

this MCS. Transmitting only K packets using this MCS guarantees, with probability

very close to 1, that every receiver in the designated group of OC-2 and OC-3 will

successfully receive the data block. Therefore, an optimal solution for RM-AMC(OC-2)

and RM-AMC(OC-3) will have a bandwidth ≤ K · bm. This rationale can be used for

finding an optimal solution for OC-2/OC-3 using the following algorithm:

1. Let Lp contain all possible transmission configurations whose bandwidth is ≤ K · bm.

2. Find in Lp the transmission configuration m that satisfies OC-2/OC-3 and has

the minimal bandwidth requirements. Store it in solval.

3. Return solval.

The running time of this algorithm is O(β · (K · bm)^N).
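The exhaustive search above can be sketched in Python as follows. The sketch assumes independent packet losses and that a receiver decodes the block once it has at least K packets; the verification procedure of Section 2.4.1 is not reproduced here, so decode_prob is a stand-in for it. Each receiver carries its own threshold, so the same code covers OC-2 (all thresholds equal) and OC-3. All function and parameter names are hypothetical.

```python
from itertools import product


def decode_prob(counts, probs, K):
    """P[receiver gets >= K packets] when counts[m] packets are sent with MCS m.

    dist[j] is the probability of j successes so far, with j capped at K.
    """
    dist = [1.0] + [0.0] * K
    for n, p in zip(counts, probs):
        for _ in range(n):
            new = [0.0] * (K + 1)
            for j, q in enumerate(dist):
                new[min(j + 1, K)] += q * p      # packet received
                new[j] += q * (1 - p)            # packet lost
            dist = new
    return dist[K]


def min_bandwidth_config(K, costs, receivers, b_m):
    """Search all configurations with bandwidth <= K * b_m.

    receivers: list of (per-MCS success probabilities, target probability).
    Returns (bandwidth, counts) of the cheapest satisfying configuration, or None.
    """
    budget = K * b_m
    ranges = [range(budget // c + 1) for c in costs]
    best = None
    for counts in product(*ranges):
        bw = sum(n * c for n, c in zip(counts, costs))
        if bw > budget or (best is not None and bw >= best[0]):
            continue  # over budget, or no better than the current best
        if all(decode_prob(counts, probs, K) >= P for probs, P in receivers):
            best = (bw, counts)
    return best
```

The product over per-MCS packet counts is what makes the search exponential in the number N of MCSs, in line with the running time stated above.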

As we did for the optimal single MCS algorithm for OC-1 (Algorithm 2.6), we can

also define polynomial time optimal single MCS algorithms for OC-2 and OC-3, as

follows:

1. For every MCS, build a transmission configuration with as many packets as can

be accommodated using a bandwidth of ≤ K · bm.

2. From all these transmission configurations, choose the one that minimizes the

bandwidth and satisfies the relevant probability threshold(s).

We compared the performance of the OC-2 optimal algorithm and the OC-2 single

MCS algorithm for K = 6 and K = 10 with probability threshold of P = 0.99. The

results are depicted in Figure 2.9, which describes the bandwidth used for transmitting

a data block vs. the SNR. It turns out that both algorithms have the same performance for this set of parameters, as well as for every other set we used. Therefore, the graph shows only one curve.

Figure 2.9: The bandwidth for transmitting a data block vs. the SNR experienced by the designated receiver for OC-2. [Plot omitted; axis ticks from 5 to 25; single curve: optimal and single MCS for OC-2.]

In Figure 2.9 we see that as the SNR value of the designated receiver increases, the

bandwidth remains constant up to a certain point and then drops until it reaches a new

step. This discrete drop between two constant bandwidth points can be explained as

follows. Although we see an increase in the SNR, a new MCS is relevant only when the

probability to receive a packet using this MCS becomes sufficiently high. As the SNR

further increases, it becomes sufficiently high for transmitting fewer packets using this

MCS. If we can transmit K packets using this MCS, the bandwidth remains constant

until the next MCS becomes relevant.

To simulate our OC-3 algorithms, we used K = 6 and two groups of receivers. The

group with the lower SNRs is assigned a probability threshold of 0.95 and the group

with the higher SNRs is assigned a probability threshold of 0.99. Figure 2.10 shows the

bandwidth vs. the SNR experienced by the designated receiver in the group with the

lower SNRs. The SNR value of the designated receiver in the group with the higher

SNRs was set to be higher by 2.0dB. We saw, again, no differences between the optimal

algorithm and the single MCS algorithm. Thus, only one curve is shown in the graph.

Similar results were obtained when we increased the value of K to 10.

Figure 2.10: The bandwidth for transmitting a data block vs. the SNR experienced by the designated receiver from the lower-quality subgroup for OC-3. [Plot omitted; axis ticks from 5 to 25; single curve: optimal and single MCS for OC-3.]

When we used 3 OC-3 subgroups with probability thresholds of 0.95, 0.99 and 0.999

and a 2.0dB difference between the SNRs of the designated receivers, we got very similar

results.

For both OC-2 and OC-3, we found very specific scenarios for which the optimal

algorithm reduces the bandwidth consumed by the single MCS algorithm. Since the

maximum improvement we found was smaller than 8%, and since in most scenarios no

difference was found, we believe that the polynomial time single MCS algorithm should

be preferred for OC-2 and OC-3, unless the number N of MCSs is very small (2-3).

2.8 Conclusions

We defined a new problem, called RM-AMC, that arises when a base station in a

broadband wireless network wishes to multicast information to a large group of nodes

and to guarantee some level of reliability using Application layer FEC codes with or

without ARQ. The problem is to determine which PHY layer MCS the base station

should use for each packet. RM-AMC was shown to have several variants, depending

on the number of transmission rounds the sender can use. We defined an optimization

criterion, referred to as OC-1. We showed that RM-AMC(OC-1) is NP-hard for any

fixed number of rounds. We then presented several algorithms for one or more rounds

and studied their performance under different conditions. We then considered two other

optimization criteria, referred to as OC-2 and OC-3. We showed that RM-AMC remains

NP-hard, and compared the performance of the optimal and the single MCS algorithms

for OC-2 and OC-3 under different conditions.

Chapter 3

Joint Scheduling and Fast Cell Selection in OFDMA Wireless Networks

3.1 Introduction

In this chapter, a cell is divided into multiple sectors, each served by a directional

antenna, and all the antennas are governed by the same BS (Fig. 3.1). We define the

new OFDMA (Orthogonal Frequency Division Multiple Access) scheduling problem

encountered by a BS in the proposed architecture as “OFDMA joint scheduling,” because

a single entity (the BS) needs to make scheduling decisions for multiple transmitting

sectors/antennas¹.

In contrast to a “regular” scheduling algorithm, which only needs to decide which

packet should be transmitted in the next OFDMA 1ms subframe, the joint scheduling

algorithm also needs to determine which sector/antenna is the best for serving each

packet. This will not necessarily be the one with which the target user has the best

SINR. For example, if the user has the best SINR with antenna A1 but reasonable SINR

with A2, and the sector of A1 is more heavily loaded than A2, then a global optimum

is likely to be obtained by scheduling the transmission of this packet using the OFDMA resources of A2 rather than those of A1.

¹ Throughout this chapter we use the words antenna and sector interchangeably.

Figure 3.1: A cell of a cellular network, divided into three sectors using antennas A1, A2, A3.

We show that the joint scheduling problem is equivalent to the known NP-hard

problem called Generalized Assignment Problem (GAP) if the scheduler does not have to

choose an MCS (Modulation and Coding Scheme) for each packet. However, to improve

the performance of joint scheduling we generalize it to also select the most appropriate

MCS for each packet. In this case we get a new theoretical NP-hard problem,

which combines two known NP-hard problems: GAP [26] and MCKP (Multiple Choice

Knapsack Problem) [49]. In addition to formulating this problem for the first time, we

also develop an efficient approximation for solving it.

The fact that the scheduler determines the transmitting sector for each user can be

viewed as an implementation of a concept sometimes known as “Fast Cell Selection.”

While this concept is currently not standardized by LTE (Long Term Evolution), we

believe that the results of this chapter can play an important role in the integration of

joint scheduling and fast cell selection into LTE.

The rest of the chapter is organized as follows. In Section 3.2 we discuss related work.

In Section 3.3 we present our OFDMA joint scheduling network model. In Section 3.4

we define the joint scheduling problem and show its equivalence to the NP-hard GAP

problem. In Section 3.5 we extend the joint scheduling problem to allow dynamic MCS

selection of each packet. This results in a new NP-hard problem to which we present a

new approximation. Section 3.6 presents an extensive simulation study and Section 3.7

concludes the chapter.

3.2 Related Work

To the best of our knowledge, we are the first to define a packet-level joint scheduling

scheme for an OFDMA wireless network. The problem of deciding which BS should

transmit to which user has been addressed [9, 20, 28, 64, 73, 74, 95]. We refer to this

problem as user-level fast cell selection, which is different from our packet-level fast

sector selection.

In [9], the authors formalize the cell selection problem as an optimization problem

and show that the problem is NP-hard. They propose approximation algorithms for

special cases of this problem and compare them to a greedy algorithm that selects

for every user device the BS with which it has the highest SINR. There are several

important differences between [9] and our work. First, the algorithms in [9] are for the

user-level and are therefore more appropriate for admission control. In contrast, our

algorithms are for packet-level, and are therefore appropriate for a real-time scheduler

that needs to make packet-level decisions once every 1ms subframe. Second, we allow

different MCSs to be used for every packet, while in [9] only one MCS is considered.

Finally, in [9] the profit associated with a (user, BS) pair is fixed, while in our work the

profit is dynamically determined.

In [95], an adaptive resource allocation scheme is proposed for OFDM networks.

The proposed scheme involves cell selection and adaptive modulation. Unlike in our

work, cell selection decisions and adaptive modulation decisions are made separately. In

addition, packet-level optimization is not performed. The target of the scheme in [95]

is to maximize the overall throughput. Thus, different QoS for different users is not

supported.

In [32], a joint scheduling scheme for joint processing fast cell selection is proposed

and evaluated. The scheme applies muting to the strongest neighbor cell for decreasing

interference to cell edge users. The scheme improves cell edge user throughput and cell

average user throughput, but overall optimization is not considered. In addition, this is

not a packet-level scheduling scheme, because it allocates the scheduling blocks on a

per user basis.

Earlier papers that mention the relationship between wireless scheduling and GAP

are [10, 39, 71]. In [10], the scheduling problem in MIMO wireless networks is formulated

as a GAP problem, and a general solution that uses adaptive proportional fair scheduling

is proposed. In [39], the multi-carrier proportional fair scheduling problem is shown to

be equivalent to GAP when each user always has data to transmit. In [71], the authors

address the problem of providing minimum rate guarantees to different service classes

in an OFDMA network.

In [28], the advantage of fast cell selection in High Speed Downlink Packet Access

(HSDPA) is investigated. Such a selection scheme is proposed and evaluated. Like the

scheme in [9], the proposed scheme is a user-level admission control scheme and not a

packet-level scheduling scheme.

In [20], two basic cell selection schemes are considered and a new handover decision

algorithm for improving cell edge throughput is proposed. In contrast to our scheme,

the algorithm in [20] aims at improving cell edge throughput, while we optimize overall

network performance.

In [34], several downlink scheduling schemes combined with fast cell selection are

proposed for WCDMA. Our work differs from [34] mainly in that our algorithms are

for OFDMA networks. Other differences are that in [34] (a) cell selection and MCS

selection are performed separately; (b) the scheduling is user- and not packet-level;

(c) a fast Rayleigh fading channel is assumed; (d) different QoS for different users is

not considered; and (e) throughput and fairness are improved but not overall network

performance.

In [73], a new cell selection strategy is proposed. In this scheme a node is more likely

to select a low power relay node as its serving station in order to reduce the interference

caused by this transmission. The proposed scheme is suitable for networks with low

power nodes. This scheme does not schedule the transmissions and its main goal is to

improve spectral efficiency.

While much work has been done on scheduling in wireless networks, only a few

papers address resource allocation in OFDMA networks [24, 42]. In contrast to our

work, these papers do not consider joint scheduling. In [24], the authors formulate the

OFDMA scheduling problem in the context of WiMax, and propose efficient algorithms

for solving it. The BS determines which packets will be transmitted in each OFDMA

frame, using which MCS, and how the OFDMA frame matrix will be constructed. This

paper is probably the first to propose to model the MCS selection as an instance of

MCKP. When the BS needs to make scheduling decisions for multiple consecutive frames

rather than for each frame separately, the packet selection problem is also shown to be

similar to GAP.

3.3 Frequency Reuse Model

In general, algorithms for joint scheduling depend to a large extent on the network

model, and in particular on the frequency reuse model employed by the network. In

order to make our contribution more concrete, we present our algorithms in the context

of the FFR (Fractional Frequency Reuse) model [53, 68], which is the most common

frequency reuse model in wireless networks. However, the algorithms are applicable even

if another frequency reuse model is used, including SFR (Soft Frequency Reuse) [53]

and reuse-1.

Throughout the chapter we consider a cell with 3 sectors. However, all our results

are applicable to cells with 6 or any other number of sectors. In Fig. 3.1 we showed a

division of an OFDMA cell into 3 sectors. Fig. 3.3 shows a schematic description of such

a division, but this time with the implementation of FFR. Bandwidth is partitioned

into N + 1 subbands: F0, F1, F2 and F3 (N = 3 in the figure). Subband F0 is used

by all three sectors at the same time (reuse-1) and is intended for users who can get

a relatively good SINR from this band despite interference from neighboring sectors.

Subband F1 is used only by sector 1. Therefore, users receiving their transmission in

this subband will not suffer from interference due to neighboring sectors. Similarly,

subband F2 is used only by sector 2 and subband F3 only by sector 3. Thus, the reuse

factor of F1, F2 and F3 is 1/3.

As an example for a typical frame in an OFDMA network, Fig. 3.2 shows a schematic

structure of an LTE 10 ms frame². The frame is divided into ten 1 ms subframes, and

the scheduler needs to make a scheduling decision for each. The frame can be logically

viewed as divided into the 4 subbands mentioned above. Each subband consists of

several scheduled blocks³. The total number of scheduled blocks in a subframe depends

on the system capacity; it is 100 in a 20MHz system, for example.

Throughout the chapter, each reuse-1/3 or reuse-1 area that corresponds to a sector

within a cell, is referred to as a scheduling area. In our case, there are 6 such areas:

F01, F02, F03, F11, F22, and F33 (see Fig. 3.3), where Fij indicates that this scheduling area is in the Fi bands and the transmitting antenna is Aj. Before the transmission of every subframe, the joint scheduler needs to decide how to fill up the 6 scheduling areas. Its output is 3 subframes: one for transmission by antenna A1 in sector 1, for which F01 and F11 are used; one for transmission by antenna A2 in sector 2, for which F02 and F22 are used; and one for transmission by antenna A3 in sector 3, for which F03 and F33 are used.

² We are trying to abstract the problem in the most generic way. Therefore, we skip some of the LTE physical layer details that are not directly relevant to the description of the problem and algorithms.

³ A scheduled block is the minimum allocation unit. Its size is equal to 12 · 14 = 168 OFDMA symbols. The bit capacity of a symbol depends on the MCS of the packet; e.g., with a modulation of 16-QAM and a coding rate of 3/4, each symbol accommodates 4 · 3/4 = 3 bits.

Figure 3.2: An abstract structure of the LTE frame and subframe. [Diagram omitted; labels: 10 ms frame, 1 ms subframe.]

Figure 3.3: A cell with 3 sectors and 3 users. [Diagram omitted; each sector is used by one antenna (A1, A2 or A3); labeled scheduling areas include F22 and F01.]
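As a small illustration of the naming scheme described above, the following Python sketch lists the scheduling areas that make up the subframe of each antenna. The function name scheduling_areas is hypothetical, and the string labels simply mirror the Fij notation for single-digit indices.

```python
def scheduling_areas(num_sectors=3):
    """Map antenna j to its scheduling areas: reuse-1 area F0j and reuse-1/3 area Fjj."""
    return {j: [f"F0{j}", f"F{j}{j}"] for j in range(1, num_sectors + 1)}


areas = scheduling_areas()
# Antenna A2 transmits one subframe built from F02 and F22.
all_areas = sorted(a for per_antenna in areas.values() for a in per_antenna)
```

For 3 sectors this yields the 6 scheduling areas named in the text, grouped into the 3 subframes the joint scheduler outputs.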

As in standard wireless networks, in the considered model the BS receives periodic

channel state indicators (CSIs) [21] from the users. Using these reports, the BS is able

to predict the SINR for the transmission to the user using each scheduling area.

The joint scheduler not only needs to determine which packet will be sent by which

antenna and in what scheduling area, but also what MCS should be used for each packet.

By selecting the appropriate MCS for every packet, the scheduler can significantly

increase bandwidth utilization. For example, suppose that the transmission of a certain

packet requires 1.3 scheduled blocks using the default MCS. In such a case, the scheduler

must allocate 2 scheduled blocks because only integral numbers of blocks can be allocated

to each packet. Now, suppose that the scheduler is given the option to use other MCSs

for this packet. Specifically, it can choose a more efficient but less robust MCS, which

requires only 0.9 scheduled blocks and reduces the probability for successful transmission

from 0.97 to 0.9. By choosing this MCS, the scheduler reduces the transmission cost of

this packet by 50%, because only 1 scheduled block is needed rather than 2. This is

accomplished with a profit reduction of only 7.22% (from 0.97 to 0.9).
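The arithmetic of this example can be checked with a few lines of Python. The sketch assumes, as the example does, that the profit is proportional to the success probability; the variable names are illustrative only.

```python
from math import ceil


def allocated_blocks(required_blocks):
    """Only whole scheduled blocks can be allocated, so round up."""
    return ceil(required_blocks)


default_cost = allocated_blocks(1.3)            # default MCS
alt_cost = allocated_blocks(0.9)                # more efficient, less robust MCS
cost_reduction = 1 - alt_cost / default_cost    # relative saving in scheduled blocks
profit_reduction = (0.97 - 0.9) / 0.97          # relative loss in success probability
```

The rounding step is what creates the gain: the fractional requirement drops by only about 31% (from 1.3 to 0.9), but the allocated cost is halved.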

3.4 The OFDMA Joint Scheduling Problem

In this section we define and study the basic problem of OFDMA joint scheduling,

where we assume that each packet can be transmitted in every scheduling area using at

most one MCS. This default MCS is chosen in the following way:

• If the SINR enables the user to receive a packet with a probability not smaller

than 1 − ε, then the MCS that consumes minimum bandwidth and guarantees

this probability is chosen.

• Else, the most robust MCS (which guarantees the highest success probability) is

chosen.
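The two-case rule above can be sketched in Python as follows. The tuple layout (name, bandwidth cost, success probability) and the function name default_mcs are assumptions made for illustration; the MCS names and numbers in the usage lines are made up.

```python
def default_mcs(candidates, eps):
    """Pick the default MCS for one packet in one scheduling area.

    candidates: list of (name, bandwidth_cost, success_probability).
    """
    good = [c for c in candidates if c[2] >= 1 - eps]
    if good:
        return min(good, key=lambda c: c[1])     # cheapest MCS that meets 1 - eps
    return max(candidates, key=lambda c: c[2])   # otherwise the most robust MCS


options = [("QPSK-1/2", 3, 0.999), ("16QAM-1/2", 2, 0.97), ("64QAM-3/4", 1, 0.6)]
relaxed = default_mcs(options, eps=0.05)     # two MCSs qualify; the cheaper one wins
strict = default_mcs(options, eps=0.0001)    # none qualifies; fall back to most robust
```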

The value of ε may vary from one packet to another depending on the application QoS

requirements. In our scheduling model, we assume that the transmission of a packet in

each scheduling area is associated with a profit that depends on the following parameters

(see [23] for more details): (a) the importance of this packet for the sending application;

(b) the importance of transmitting the packet in this subframe, rather than in a future

one; and (c) the probability that this packet will be successfully received by the user.

The success probability for transmitting a given packet varies from one scheduling

area to another. Thus, the profit of a packet is also likely to be different for different

scheduling areas.

As an example, consider Fig. 3.3 with three users: a, b and c. Suppose that:

• packet1 of user a can be transmitted either in the reuse-1/3 area of sector 1 (F11) or

in the reuse-1/3 area of sector 3 (F33). Suppose that in the former case, the default

MCS that guarantees the 1 − ε success probability is 16-QAM with a coding rate of 1/2,

which is translated into 0.9 scheduled blocks. Since allocation is possible using only

integral numbers of scheduled blocks, 1 scheduled block is actually needed. Suppose

that in the case where the packet is transmitted by sector 3 in F33, the default MCS

that guarantees the 1 − ε success probability is QPSK with a coding rate of 2/3, which

is translated into ⌈1.35⌉ = 2 scheduled blocks.

• packet2 of user b can be transmitted either in the reuse-1/3 area of sector 3 (F33)

using [64-QAM, 5/6], or in the reuse-1 area of sector 3 (F03) using [16-QAM, 3/4].

• packet3 of user c can be transmitted in the reuse-1/3 area of sector 1 (F11) using

[64-QAM, 5/6], or in the reuse-1/3 area of sector 2 (F22) using [16-QAM, 2/3], or in

the reuse-1 area of sector 1 (F01) using [16-QAM, 3/4].
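The block counts quoted for packet1 can be reproduced from the scheduled-block size given in footnote 3. In the Python sketch below, the packet size of 302 bits is a hypothetical value chosen because it approximately yields the 0.9 and 1.35 blocks of the example; the source does not state the packet size. The function name blocks_needed is also an assumption.

```python
from fractions import Fraction
from math import ceil

SYMBOLS_PER_BLOCK = 12 * 14  # 168 OFDMA symbols per scheduled block


def blocks_needed(packet_bits, bits_per_symbol, coding_rate):
    """Return (fractional, allocated) scheduled blocks for one packet."""
    bits_per_block = SYMBOLS_PER_BLOCK * bits_per_symbol * coding_rate
    fractional = Fraction(packet_bits) / bits_per_block
    return fractional, ceil(fractional)  # allocation uses whole blocks only


# 16-QAM carries 4 bits per symbol, QPSK carries 2.
in_f11 = blocks_needed(302, 4, Fraction(1, 2))   # F11: 16-QAM, rate 1/2
in_f33 = blocks_needed(302, 2, Fraction(2, 3))   # F33: QPSK, rate 2/3
```

Exact fractions are used so that the rounding-up step is not affected by floating-point error.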

Based on the input above, and on the input regarding other waiting packets of

all users, the scheduler should determine which packet will be transmitted in each

scheduling area (F01, F02, F03, F11, F22 or F33) during the next OFDMA subframe.

The decision regarding the MCS to be used for every packet is a consequence of the

selected scheduling area. To this end, the scheduler needs to solve the following problem:

Problem 3 (OFDMA Joint Scheduling):

Instance: The scheduler is given a set of scheduling areas for OFDMA joint scheduling,

and the number of scheduled blocks to be allocated in each. The scheduler is also

given a set of packets that are awaiting transmission in the next subframe. For

each packeti, the scheduler is given a set of feasible scheduling areas for which the

packet’s receiver has sufficiently good SINR. From this information, the scheduler

determines the default MCS and the success probability for transmitting the

packet in each scheduling area. Then, the scheduler determines the number of

scheduled blocks required for transmitting the packet in each scheduling area, i.e.,

the transmission cost, and the profit for each transmission. All this information is

considered as input for the OFDMA joint scheduling problem.

Objective: Find a feasible schedule that maximizes the total profit for the next

subframe. A feasible schedule is a mapping between waiting packets and scheduling

areas such that: (a) at most one scheduling area is chosen for each packet; (b)

the number of scheduled blocks available in every scheduling area is not exceeded;

and (c) for each user no two packets are scheduled to be transmitted by different

antennas in the same subbands at the same time.
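The three feasibility conditions can be checked mechanically. The Python sketch below is one possible encoding, not the thesis's implementation: scheduling areas are written as strings such as "F01" (band index, then antenna index, both single digits), and the dictionaries cost, capacity and user_of are hypothetical input structures.

```python
def is_feasible(schedule, cost, capacity, user_of):
    """Check conditions (a)-(c) of a feasible schedule.

    schedule: dict packet -> scheduling area (unscheduled packets are omitted),
              so each packet has at most one area, which is condition (a).
    cost:     dict (packet, area) -> scheduled blocks needed; a missing key
              means the area is not feasible for that packet.
    capacity: dict area -> scheduled blocks available.
    user_of:  dict packet -> user.
    """
    used = {}
    antenna_in_subband = {}  # (user, subband) -> antenna already serving that user
    for packet, area in schedule.items():
        if (packet, area) not in cost:
            return False
        used[area] = used.get(area, 0) + cost[(packet, area)]
        if used[area] > capacity[area]:              # condition (b)
            return False
        subband, antenna = area[1], area[2]          # "Fij": band i, antenna j
        key = (user_of[packet], subband)
        if antenna_in_subband.setdefault(key, antenna) != antenna:  # condition (c)
            return False
    return True
```

Condition (c) is enforced by remembering, per user and subband, the first antenna that was assigned and rejecting any later packet of the same user that would use a different antenna in that subband.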

To understand restriction (c), consider Fig. 3.4. This figure shows the OFDMA

subframe transmitted by each antenna when a cell is divided into 3 sectors. Recall that

subband F0 is the reuse-1 area, occupied by all 3 sectors. If the scheduler decides that

A1 and A2 have to transmit to the same user using subband F0, i.e., one packet is

scheduled in F01 and another in F02, the user will be able to decode at most one of

these packets. We avoid such a collision using restriction (c).

Figure 3.4: The OFDMA subframes of a cell transmitted in the 3 sectors by antennas A1, A2 and A3
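Restrictions (a)-(c) of Problem 3 can be checked mechanically. The following Python sketch is only an illustration: the AREAS mapping from area names to (antenna, subband) pairs, the function name is_feasible and the dictionary-based encoding of the input are assumptions made here, not part of the problem definition.

```python
from collections import defaultdict

# Hypothetical encoding: each scheduling area maps to (antenna, subband).
AREAS = {
    "F01": ("A1", "F0"), "F02": ("A2", "F0"), "F03": ("A3", "F0"),
    "F11": ("A1", "F1"), "F22": ("A2", "F2"), "F33": ("A3", "F3"),
}

def is_feasible(schedule, cost, capacity, user_of):
    """schedule: dict packet -> chosen area (unscheduled packets are absent).
    cost[(packet, area)]: scheduled blocks needed (absent if the area is infeasible).
    capacity[area]: scheduled blocks available; user_of[packet]: receiving user."""
    used = defaultdict(int)
    antennas = defaultdict(set)  # (user, subband) -> antennas transmitting to that user
    # A dict holds one area per packet, so restriction (a) holds by construction.
    for packet, area in schedule.items():
        if (packet, area) not in cost:
            return False  # the area is not in the packet's feasible set
        used[area] += cost[(packet, area)]
        antenna, subband = AREAS[area]
        antennas[(user_of[packet], subband)].add(antenna)
    # Restriction (b): no scheduling area is over-allocated.
    if any(used[a] > capacity[a] for a in used):
        return False
    # Restriction (c): a user is never served by two antennas in the same subband.
    return all(len(s) <= 1 for s in antennas.values())
```

With this encoding, scheduling two packets of the same user in F01 and F02 is rejected, which is exactly the collision discussed above.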

Lemma 3.4.1. The set of feasible scheduling areas for each packet contains at most

one reuse-1 scheduling area.

Proof: Let p(Ai) be the power received by a user from the transmission of antenna Ai

in reuse-1 area F0i for i ∈ {1, 2, 3}. Suppose that the set of scheduling areas for a given

user contains the reuse-1 area F01. Thus, the SINR for the transmission of A1 is bigger

than 1 and therefore p(A1) > p(A2) + p(A3) (see Eq. (B.1) in Appendix B for more

details). This implies that the SINR for the transmission of A2 and A3 is smaller than 1

and therefore it is not possible for the transmission of A2 or A3 to have a good SINR.
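The argument can be written out explicitly. The derivation below is a sketch that assumes the SINR has the usual signal-over-interference-plus-noise form; the noise term N ≥ 0 is a symbol introduced here for illustration and is not taken from Eq. (B.1), which is not reproduced in this section.

```latex
\begin{align*}
\mathrm{SINR}_1 = \frac{p(A_1)}{p(A_2) + p(A_3) + N} > 1
  &\;\Longrightarrow\; p(A_1) > p(A_2) + p(A_3) + N \ge p(A_2), \\
\mathrm{SINR}_2 = \frac{p(A_2)}{p(A_1) + p(A_3) + N}
  &\le \frac{p(A_2)}{p(A_1)} < 1 .
\end{align*}
```

The same bound holds for A3 by symmetry, so at most one of the three reuse-1 areas can offer an SINR above 1.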

Corollary 3.1. Restriction (c) is always met.

Lemma 3.4.2. Under the considered FFR model, Problem 3 is equivalent to the Generalized Assignment Problem (GAP). Thus, (a) the problem is NP-hard; (b) any α-approximation algorithm for the Knapsack problem can be transformed into a (1 + α)-approximation⁴ algorithm for Problem 3.

Proof: GAP is defined as follows [26]. The instance is a pair (B, I) and a 2D profit

matrix P, where B is a set of bins (knapsacks), I is a set of items, and P is a |B| × |I| matrix that indicates the profit and size for each item in each bin. The objective is to find a subset U ⊆ I of items that has a feasible packing in B, such that the profit is

maximized. A feasible packing is a mapping of each item to at most one bin such that

the capacity of each bin is not exceeded.

We first show how to transform an instance of GAP into an instance of Problem 3

in polynomial time. Without loss of generality, we assume that the bin sizes are of

the same size S. Every bin is transformed into a reuse-(1/|B|) scheduling area with S

scheduled blocks. Every GAP item is transformed into a waiting packet whose size and

profit for each scheduling area are equal to the size and profit of the GAP’s item in

the corresponding bin. Note that condition (c) of Problem 3 holds for the constructed

instance.

Next, we present a polynomial time transformation of a Problem 3 instance into a

GAP instance. Every scheduling area is considered as a GAP bin whose size is equal to

the number of scheduled blocks in that area. Every packet is transformed into a GAP

item. For a given scheduling area, the size and profit are determined according to the

default MCS and the target success probability of the packet.

In [26] it is shown that GAP is NP-hard and that any α-approximation algorithm

for the Knapsack problem can be transformed into a (1 + α)-approximation algorithm

for GAP.

Knapsack is one of the most studied problems in combinatorial optimization [49].

Although it is NP-hard, it has many efficient algorithms. From Lemma 3.4.2 it follows

that the well-known polynomial time greedy 2-approximation for Knapsack can be

transformed into a 3-approximation algorithm for Problem 3. Similarly, the algorithm for Knapsack described in [54] can be transformed into a (2 + ε)-approximation algorithm for Problem 3 that runs in poly(n, 1/ε) time, where n is the total input length.
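For reference, the greedy 2-approximation for Knapsack mentioned above can be sketched as follows. This is the textbook variant (fill by decreasing profit density, then compare with the best single item); the function name and the tuple-based item encoding are assumptions made for illustration, and this is not necessarily the exact variant used in [26] or [49].

```python
def greedy_knapsack(items, capacity):
    """items: list of (name, size, profit) with size > 0.
    Returns (total profit, list of chosen names)."""
    fitting = [it for it in items if it[1] <= capacity]
    if not fitting:
        return 0, []
    # Candidate 1: take items by decreasing profit/size, skipping those that no longer fit.
    chosen, used, total = [], 0, 0
    for name, size, profit in sorted(fitting, key=lambda it: it[2] / it[1], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
            total += profit
    # Candidate 2: the single most profitable item that fits on its own.
    best = max(fitting, key=lambda it: it[2])
    if best[2] > total:
        return best[2], [best[0]]
    return total, chosen
```

Returning the better of the two candidates is what yields the factor of 2; the density-ordered fill alone can be arbitrarily bad when one large item carries most of the profit.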

3.5 OFDMA Joint Scheduling With Dynamic MCS Selection

In the previous section we assumed that a packet is transmitted using a default MCS based on the target success probability. The performance of the joint scheduler can be improved if it is permitted to choose the MCS for every packet in every scheduling area instead. When the scheduler chooses a more efficient but less robust MCS for a packet, it reduces the cost of the assignment but also reduces the profit, because the profit is proportional to the transmission success probability.

⁴ Let p_opt be the total profit of the optimal solution and α ≥ 1. An α-approximation returns a solution whose profit is at least p_opt/α.

            [QPSK, 1/2]               [16-QAM, 3/4]
            Length   Success prob.    Length   Success prob.
packet1     3        (1 − ε)          1        0.5
packet2     3        (1 − 2ε)         1        0.4

Table 3.1: An example of the advantage of joint scheduling and MCS selection

As an example, suppose that there are 2 scheduling areas: SA1, which contains 3

scheduled blocks, and SA2, which contains 1 scheduled block. Suppose there are two

waiting packets whose scheduling parameters are identical in both scheduling areas, and

are shown in Table 3.1. If every packet can only be transmitted using its default MCS

[QPSK, 1/2], then only one packet can be accommodated in the next subframe. The

extension proposed in this section allows the joint scheduler to choose [16-QAM, 3/4]

for packet1 and to schedule both packets: one in SA1 and one in SA2. The new problem

is called “OFDMA Joint Scheduling with Dynamic MCS Selection,” and is formally

defined as follows.

Definition 3.5.1. A transmission instance is a combination of a scheduling area and

an MCS as determined by the scheduler for a given waiting packet.

Problem 4 (OFDMA Joint Scheduling with Dynamic MCS Selection):

Instance: Same as Problem 3, except that for each packet_i, we are not given a set of feasible scheduling areas but a set of feasible transmission instances, packet_i^1, packet_i^2, …, packet_i^M. Each such set may contain transmission instances from the same scheduling area but with different MCSs.

Objective: Find a feasible schedule that maximizes the total profit for the next

subframe. A feasible schedule is a mapping between the waiting packets and their

transmission instances, such that: (a) at most one transmission instance is chosen for

each packet; (b) the number of scheduled blocks available in every scheduling

area is not exceeded; and (c) for each user no two packets are scheduled to be

transmitted by different antennas in the same subbands at the same time.
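The two-packet example of Table 3.1 is small enough to verify by exhaustive search. The sketch below is illustrative: the value chosen for ε, the area names and the helper best_schedule are assumptions introduced here, and restriction (c) is ignored because the example does not involve it.

```python
from itertools import product

EPS = 0.01  # illustrative value for the small constant ε of Table 3.1
CAPACITY = {"SA1": 3, "SA2": 1}
# (length in scheduled blocks, success probability) per MCS, identical in both areas
OPTIONS = {
    "packet1": {"QPSK 1/2": (3, 1 - EPS), "16-QAM 3/4": (1, 0.5)},
    "packet2": {"QPSK 1/2": (3, 1 - 2 * EPS), "16-QAM 3/4": (1, 0.4)},
}

def best_schedule(allowed_mcs):
    """Try every assignment of each packet to an (area, MCS) pair or to 'not scheduled'."""
    choices = [None] + [(a, m) for a in CAPACITY for m in allowed_mcs]
    best = (0.0, {})
    for combo in product(choices, repeat=len(OPTIONS)):
        used = dict.fromkeys(CAPACITY, 0)
        profit = 0.0
        for packet, choice in zip(OPTIONS, combo):
            if choice is None:
                continue
            area, mcs = choice
            length, prob = OPTIONS[packet][mcs]
            used[area] += length
            profit += prob  # profit is the success probability
        if all(used[a] <= CAPACITY[a] for a in CAPACITY) and profit > best[0]:
            best = (profit, dict(zip(OPTIONS, combo)))
    return best

default_only = best_schedule(["QPSK 1/2"])            # Problem 3 setting
dynamic = best_schedule(["QPSK 1/2", "16-QAM 3/4"])   # Problem 4 setting
```

Exhaustive search is exponential in the number of packets and is used here only to confirm the example, not as a scheduling algorithm.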

To solve Problem 4, we define a new general theoretical problem, which extends

GAP to allow multiple choices from each item. The new problem is called Multiple

Choice GAP (MC-GAP), and is defined as follows.

Problem 5 (MC-GAP):

Instance: A triplet (B, I, C) and a 3D profit matrix P , where B is a set of bins

(knapsacks), I is a set of items, C is a set of configurations, and P is a |B|×|I|×|C|

matrix that indicates the profit and size for each item in each bin using each

configuration.

Objective: Find a subset U ⊆ (I ×C) of [item, configuration] pairs that has a feasible

packing in B, such that each item is packed at most once, using one of its

configurations, and the profit is maximized.

Lemma 3.5.2. Problem 4 can be transformed into an instance of MC-GAP in linear time.

Proof: Every scheduling area of Problem 4 can be considered as a bin whose size is equal

to the number of scheduled blocks in that area. Every packet is mapped to an MC-GAP

item. Each MCS is an MC-GAP configuration. If the packet has a transmission instance

for a given scheduling area and a given MCS, the size and the profit are determined

according to this instance.
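The transformation in the proof amounts to a single pass over the transmission instances. A minimal sketch follows, assuming a hypothetical tuple encoding (packet, area, MCS, blocks, profit) for each transmission instance and sparse dictionaries keyed by (item, configuration, bin) in place of the 3D matrix; entries that are absent correspond to infeasible combinations.

```python
def to_mc_gap(area_blocks, instances):
    """area_blocks: dict area -> number of scheduled blocks.
    instances: list of (packet, area, mcs, blocks_needed, profit).
    Returns (bins, items, configs, size, profit)."""
    bins = dict(area_blocks)  # bin capacity = scheduled blocks in the area
    items = list(dict.fromkeys(inst[0] for inst in instances))    # packets, order kept
    configs = list(dict.fromkeys(inst[2] for inst in instances))  # MCSs, order kept
    size, profit = {}, {}
    for packet, area, mcs, blocks, gain in instances:  # one pass over the input
        size[(packet, mcs, area)] = blocks
        profit[(packet, mcs, area)] = gain
    return bins, items, configs, size, profit
```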

MC-GAP is a combination of two known NP-hard problems: GAP and MCKP

(Multiple Choice Knapsack Problem). In MCKP there is only one knapsack, i.e., only

one scheduling area, whereas in GAP there is only one choice (one MCS) for selecting an

item (a packet) into a knapsack (scheduling area). Although MCKP is NP-hard [49], it

has efficient approximations [13] and an optimal pseudo-polynomial time algorithm [49].

We now present an algorithm for solving MC-GAP. The algorithm extends the one

presented in [26] for solving GAP. Using the local-ratio technique [14], our algorithm

transforms any α-approximation algorithm for MCKP into a (1 + α)-approximation

algorithm for MC-GAP.

The local-ratio argument is as follows. Let F be a set of constraints and let p(), p_1(), p_2() be profit functions such that p() = p_1() + p_2(). Then, if x is an r-approximate solution with respect to (F, p_1()) and with respect to (F, p_2()), it is also an r-approximate solution with respect to (F, p()). Here r ≤ 1 is the guaranteed fraction of the optimal profit, so an α-approximation in the sense of footnote 4 corresponds to r = 1/α. The proof is very simple [14]. Let x^*, x_1^* and x_2^* be optimal solutions for (F, p()), (F, p_1()), and (F, p_2()) respectively. Then

p(x) = p_1(x) + p_2(x) ≥ r · p_1(x_1^*) + r · p_2(x_2^*) ≥ r · (p_1(x^*) + p_2(x^*)) = r · p(x^*).

An informal description of the algorithm is as follows. Let ALGMCKP be an α-approximation algorithm for MCKP. Our algorithm first invokes ALGMCKP with respect to the first bin of MC-GAP. Let S̄ be the output of ALGMCKP. If there is only one bin, S̄ is the final output of the algorithm. Otherwise, the algorithm partitions the profit matrix p into two profit matrices, p1 and p2, whose sum equals p. The partition is explained in the following and demonstrated in Fig. 3.5 for item i. The profit matrix p1 leaves the profit of i in the first bin unchanged. For any other bin, if i does not appear in S̄, its profit is set to 0 in p1 (Fig. 3.5(b)); otherwise, there is some configuration c for which (i, c) ∈ S̄, and the profit of i is set to p(i, c, 1) in p1 for every configuration (Fig. 3.5(c)). Matrix p2 is defined as p2 = p − p1. The algorithm then ignores the first bin and continues recursively with p2 as the new profit matrix. Let S be the solution returned by the recursive call. For every (i, c) ∈ S̄, if i is not already in S, it is added to S in the first bin. Finally, the algorithm returns S.

Figure 3.5: Entries of item i in the profit matrices used in our new MC-GAP algorithm. Each panel is a table with one row per configuration (1, …, |C|) and one column per bin (1, …, |B|): (a) entries of item i in the original profit matrix p; (b) entries of item i in p1 for the case where i is not in S̄; (c) entries of item i in p1 for the case where (i, c) ∈ S̄, in which every entry outside the first bin equals p(i, c, 1).

For a single bin, the returned solution of ALGMCKP is clearly a (1+α)-approximation.

If there are more bins, each time the algorithm returns from the recursive call and

considers another bin, the obtained profit increases by some amount X, while the profit

of the optimal solution increases by at most (1 +α) ·X. Therefore, the updated solution

is also a (1 + α)-approximation.

We now give a formal description of the algorithm.

A (1 + α)-approximation algorithm for MC-GAP

Recall that B is the set of bins, I is the set of items, C is the set of configurations, and p is a |B| × |I| × |C| profit matrix. The value of p[i, c, j] indicates the profit of item i in bin j using configuration c. We now construct from ALGMCKP a recursive algorithm for MC-GAP. Since our algorithm dynamically updates the profit function, we use p_j to indicate the profit matrix at the beginning of the jth recursive call. Initially we set p_1 ← p, and we invoke the following Next-Bin procedure with j = 1:

Procedure Next-Bin(j)

1: Run ALGMCKP on bin j using p_j as the profit function. Let S̄_j be the set of selected (item, configuration) pairs returned by ALGMCKP.

2: Decompose the profit function p_j into two profit functions p_j^1 and p_j^2 such that for every i, c and k,

\[
p_j^1[i, c, k] =
\begin{cases}
p_j[i, c', j] & \text{if } k \neq j \text{ and } \exists c' \text{ such that } (i, c') \in \bar{S}_j \\
p_j[i, c, k] & \text{if } k = j \\
0 & \text{otherwise}
\end{cases}
\]

and p_j^2 = p_j − p_j^1.

3: if j < |B| then

4: Set p_{j+1} ← p_j^2, and remove the column of bin j from p_{j+1}.

5: Invoke Next-Bin(j + 1). Let S_{j+1} be the returned assignment list.

6: Let S_j be the same as S_{j+1} except that for each item i, if i is selected in S̄_j with some configuration c, i.e., (i, c) ∈ S̄_j, and it is not assigned in ∪_{k=j+1}^{|B|} S_k, then the assignment of (i, c) to bin j is added to S_j.

7: Return S_j.

8: else return S_j = S̄_j.

9: end if
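The Next-Bin procedure can be unrolled into a loop over the bins followed by a backward pass that builds the final assignment. The Python sketch below is an illustration under several assumptions that are not part of the original text: bins are numbered from 0, sizes are non-negative integers, the profit and size matrices are sparse dictionaries keyed by (item, configuration, bin), and an exact dynamic program (solve_mckp, a name chosen here) stands in for ALGMCKP, which corresponds to α = 1.

```python
def solve_mckp(cap, options):
    """Exact MCKP by dynamic programming over integer capacities (stand-in for ALG_MCKP).
    options: dict item -> list of (config, size, profit). Returns dict item -> config."""
    best = [(0.0, {})] * (cap + 1)  # best[w] = (profit, choice) using capacity at most w
    for item, opts in options.items():
        new = list(best)  # default: the item is not packed
        for w in range(cap + 1):
            for config, size, profit in opts:
                if profit > 0 and size <= w:  # entries with non-positive profit are ignored
                    cand = best[w - size][0] + profit
                    if cand > new[w][0]:
                        new[w] = (cand, {**best[w - size][1], item: config})
        best = new
    return best[cap][1]

def mc_gap(capacity, size, profit):
    """capacity: list of bin capacities; size, profit: dicts keyed by (item, config, bin).
    Returns dict item -> (config, bin)."""
    p = dict(profit)  # p_1 <- p (working copy)
    picks = []        # picks[j] plays the role of the MCKP output for bin j
    for j, cap in enumerate(capacity):
        options = {}
        for (i, c, k), value in p.items():
            if k == j:
                options.setdefault(i, []).append((c, size[(i, c, k)], value))
        chosen = solve_mckp(cap, options)  # step 1
        picks.append(chosen)
        # Steps 2-4: on later bins, subtract the profit the item earned in bin j;
        # the column of bin j itself is simply never read again.
        for (i, c, k) in list(p):
            if k > j and i in chosen:
                p[(i, c, k)] -= p[(i, chosen[i], j)]
    # Steps 5-8 unrolled: an item keeps the assignment of the latest bin that selected it.
    assignment = {}
    for j in reversed(range(len(capacity))):
        for i, c in picks[j].items():
            assignment.setdefault(i, (c, j))
    return assignment
```

Because each bin keeps only a subset of the items that its own MCKP run selected, the returned assignment never exceeds a bin's capacity. Storing a full choice dictionary per capacity value keeps the sketch short at the cost of extra memory.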

Theorem 3.2. If ALGMCKP is an α-approximation for MCKP, then our new MC-GAP

algorithm is a (1 + α) approximation for MC-GAP.

Proof: We use the notation p(S) to indicate the profit gained by assignment S. The proof is by induction on the number of bins available when the algorithm is invoked. For a single bin, S_{|B|} = S̄_{|B|} is an α-approximation solution due to ALGMCKP, and therefore it is a (1 + α)-approximation with respect to p_{|B|}. For the inductive step, assume that S_{j+1} is a (1 + α)-approximation with respect to p_{j+1}. Matrix p_j^2 is identical to p_{j+1} except that it contains a column with profit 0. Thus, S_{j+1} is also a (1 + α)-approximation with respect to p_j^2. Since S_j contains the items assigned by S_{j+1}, it is also a (1 + α)-approximation with respect to p_j^2.

Profit matrix p_j^1 has three components: (1) items in bin j, whose profit is the same as in p_j; (2) items not in bin j, which belong to S̄_j; their profit in any configuration is identical to their profit in S̄_j (using the configuration specified in S̄_j); (3) the remaining entries are all 0. Only components (1) and (2) of p_j^1 can contribute profit to an assignment. By the validity of ALGMCKP, S̄_j is an α-approximation with respect to component (1). Therefore, the best solution with respect to component (1) will gain a profit of at most α · p_j^1(S̄_j). Moreover, the best solution with respect to component (2) will gain a profit of at most p_j^1(S̄_j), since the profit of these items is the same regardless of where they are assigned and which configuration they use. This implies that S̄_j is a (1 + α)-approximation with respect to p_j^1. According to the last step of the algorithm, p_j^1(S_j) = p_j^1(S̄_j), and S_j is a (1 + α)-approximation with respect to both p_j^1 and p_j^2. Since p_j = p_j^1 + p_j^2, by the local-ratio argument, S_j is also a (1 + α)-approximation with respect to p_j.

The lower bound proven in Theorem 3.2 is tight. Namely, there are instances of

MC-GAP such that the profit returned by our new MC-GAP algorithm equals 1/(1 +α)

of the maximum profit. This is because instances of MC-GAP for which |C| = 1, i.e.,

there is only one configuration per item, are identical to instances of GAP. Furthermore,

our new MC-GAP algorithm on such instances is identical to the algorithm for GAP

presented in [26]. Since in [26] it is shown that the approximation ratio of the algorithm

for GAP is tight, the approximation ratio of our new MC-GAP algorithm is also tight.

Our new MC-GAP algorithm can be implemented by an iterative algorithm whose

running time is O(|B| · f(|I|, |C|) + |B| · |I| · |C|), where f(|I|, |C|) is the running time

of ALGMCKP.

From Theorem 3.2 it follows that the performance of our new MC-GAP algorithm

depends on the performance of ALGMCKP. The most efficient ALGMCKP is the algorithm

described in [13]. This algorithm finds a (1 + ε)-approximate solution in O(|I|^2 · |C|/ε) time. Thus, it can be transformed into a (2 + ε)-approximation algorithm for MC-GAP and Problem 4 whose running time is O(|B| · (|I|^2 · |C|/ε) + |B| · |I| · |C|). In [37], a

(5/4)-approximation algorithm for MCKP whose running time is O(|I| · |C| · log |I|) is

proposed. This algorithm can be transformed into a (9/4)-approximation algorithm for

MC-GAP whose running time is O(|B| · (|I| · |C| log |I|) + |B| · |I| · |C|).

Figure 3.6: Simulation network model

3.6 Simulation Study

In this section we present Monte-Carlo simulation results for the algorithms proposed

in the chapter. The purpose of this section is three-fold. First, we evaluate our

approximation for the new MC-GAP problem by comparing its performance to that

of an exponential-time optimal algorithm. Since the problem is NP-hard, this part

of the study is conducted for small instances only. Second, we use the results of a

water-filling algorithm, which fills the scheduling areas in each sector, as a benchmark

to which we compare the performance of the algorithms proposed in Section 3.4 and

Section 3.5 under various network parameters. Third, we evaluate the performance

gain from considering both joint scheduling and dynamic MCS selection (MC-GAP)

compared to using only joint scheduling (GAP).

3.6.1 Network Model

Fig. 3.6 shows the LTE network considered in the simulation study. Scheduling is

performed for the cell in the center of the network, while the surrounding cells are

considered for the calculations of the SINR experienced by each receiver. Our interference

model and parameters are based on the 3GPP specifications [1] and on the work presented

in [85]. These parameters are summarized in Table 3.2. The number of reuse-1 blocks

in a 1-ms subframe is 40 in each sector and the number of reuse-(1/3) blocks is 20.

As proposed in [85], each antenna is 20 meters high and has a vertical tilt of 16°. The distance between two antennas in neighboring cells is 1700 meters.

The average size of each packet is 3.5 scheduled blocks if it is transmitted using [QPSK,

1/2], which is the most robust MCS out of 7 possible MCSs. The success probability

for every [scheduling area, user, MCS] triplet is determined from the corresponding

SINR value using data taken from [12]. The profit from transmitting a packet to a user

using a particular MCS is taken as the corresponding success probability. Thus, our

utility function in this section aims at maximizing the expected number of successfully

delivered packets. The cost of transmitting a packet is equal to the discrete number of

scheduled blocks used for the transmission, which depends on the length of the packet

and the chosen MCS. The interference model of the network is described in Appendix B.
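The dependence of the cost on the chosen MCS can be illustrated with a short sketch. It assumes that the number of scheduled blocks scales inversely with the spectral efficiency of the MCS (modulation bits per symbol times code rate) and is rounded up to a whole block; this scaling rule, the EFFICIENCY table and the function name are assumptions made here, and only three of the 7 MCSs are listed. The rule is consistent with Table 3.1, where a packet of length 3 under [QPSK, 1/2] has length 1 under [16-QAM, 3/4].

```python
import math
from fractions import Fraction

# Assumed spectral efficiency: modulation bits per symbol times code rate.
EFFICIENCY = {
    ("QPSK", "1/2"): Fraction(2) * Fraction(1, 2),
    ("16-QAM", "3/4"): Fraction(4) * Fraction(3, 4),
    ("64-QAM", "5/6"): Fraction(6) * Fraction(5, 6),
}
BASE = EFFICIENCY[("QPSK", "1/2")]

def blocks_needed(blocks_at_base, mcs):
    """Scale the packet size under [QPSK, 1/2] by the relative efficiency of `mcs`
    and round up, because scheduled blocks are allocated in whole units."""
    return math.ceil(Fraction(blocks_at_base) * BASE / EFFICIENCY[mcs])
```

Exact fractions are used instead of floating point so that the rounding up never depends on representation error.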

3.6.2 The Simulated Joint Scheduling Algorithms

We compare the performance of our algorithms to a standard water-filling algorithm,

which works as follows. Each user device is associated with the sector whose antenna

yields the best SINR. When a new packet is introduced, the algorithm first tries to

schedule the packet in the reuse-1 area of this sector using the default MCS. If there

are not enough scheduled blocks available in the reuse-1 area, the algorithm tries to

schedule the packet using the default MCS in the reuse-1/3 area of the same BS.
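The water-filling baseline described above can be sketched in a few lines. The packet encoding (dictionaries with 'id', 'sector', 'cost' and 'profit' fields) and the treatment of an area missing from 'cost' as infeasible are assumptions made for illustration; the area names follow the F0s / Fss convention used earlier in the chapter.

```python
def water_filling(packets, capacity):
    """packets: list of dicts with keys 'id', 'sector' (the best-SINR sector, e.g. 1),
    'cost' (dict area -> blocks needed with the default MCS) and
    'profit' (dict area -> profit with the default MCS).
    capacity: dict area -> available scheduled blocks.
    Returns (schedule, total profit)."""
    free = dict(capacity)
    schedule, total = {}, 0.0
    for pkt in packets:  # packets are handled in arrival order
        s = pkt["sector"]
        for area in (f"F0{s}", f"F{s}{s}"):  # reuse-1 area first, then reuse-1/3 area
            need = pkt["cost"].get(area)
            if need is not None and need <= free[area]:
                free[area] -= need
                schedule[pkt["id"]] = area
                total += pkt["profit"][area]
                break  # the packet is scheduled; move on to the next one
    return schedule, total
```

Note that the baseline never reconsiders an earlier decision and never looks at the profit when choosing between the two areas, which is what the joint schedulers improve upon.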

The benefit of our joint scheduling algorithms compared to this water-filling algorithm

can be divided into two parts. First, for each sector we solve the problem for both

1/3- and 1-reuse areas together, which can be viewed as intra-sector joint scheduling.

Second, we solve the problem for all the sectors in the cell together, which can be

viewed as inter-sector joint scheduling. To distinguish between the benefit from each

part, we implement two versions of each algorithm: one that uses only intra-sector joint

scheduling and one that uses both inter- and intra-sector joint scheduling. Thus, for the

rest of this section we refer to the following 4 algorithms:

• Alg-1: a GAP algorithm, used for inter-sector joint scheduling using only a default

MCS for each packet.

• Alg-2: the new MC-GAP algorithm proposed in this chapter, used for inter-sector

joint scheduling with dynamic MCS selection.

• Alg-3: a GAP algorithm, used for intra-sector joint scheduling using only a default

MCS for each packet.

• Alg-4: the new MC-GAP algorithm proposed in this chapter, used for intra-sector

joint scheduling and dynamic MCS selection.

For the simulations, we implemented modified versions of the approximation algo-

rithm for GAP (from [26]) and for MC-GAP (from Sections 3.4 and 3.5). The purpose

of these modifications is to improve the average-case performance of these algorithms

without affecting their lower bounds. Instead of considering the bins (scheduling areas)

in some arbitrary order, we consider 4 specific orderings, and choose the one that yields

the maximum profit. The considered orderings are as follows:

(a) an ordering where a reuse-1 bin is chosen before a reuse-(1/3) bin of the same

antenna.

(b) an ordering where a reuse-(1/3) bin is chosen before a reuse-1 bin of the same

antenna.

(c) an ordering where all reuse-1 bins are chosen before all reuse-(1/3) bins.

(d) an ordering where all reuse-(1/3) bins are chosen before all reuse-1 bins.
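The four orderings can be generated and compared as sketched below. The interpretation of orderings (a) and (b) as per-antenna interleavings, the area names, and the helper best_over_orderings (which expects a hypothetical solve(order) callable returning a (profit, schedule) pair) are assumptions made for illustration.

```python
def bin_orderings(sectors=(1, 2, 3)):
    """Return the four bin orderings (a)-(d) as lists of scheduling-area names."""
    reuse1 = [f"F0{s}" for s in sectors]    # reuse-1 area of each antenna
    reuse13 = [f"F{s}{s}" for s in sectors]  # reuse-(1/3) area of each antenna
    return {
        "a": [area for pair in zip(reuse1, reuse13) for area in pair],
        "b": [area for pair in zip(reuse13, reuse1) for area in pair],
        "c": reuse1 + reuse13,
        "d": reuse13 + reuse1,
    }

def best_over_orderings(solve, orderings):
    """Run the approximation algorithm once per ordering and keep the best result.
    solve(order) must return a (profit, schedule) pair."""
    return max((solve(order) for order in orderings.values()), key=lambda r: r[0])
```

Since every ordering yields a solution with the same worst-case guarantee, taking the maximum over the four runs cannot weaken the bound; it only multiplies the running time by a constant.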

For the GAP and MC-GAP algorithms invoked for solving Problem 3 and Problem 4

respectively, we use as a procedure the optimal pseudopolynomial time algorithm for

MCKP [49]. Since this procedure is optimal (α = 1), both the GAP and the MC-GAP algorithms are 2-approximations.

3.6.3 Simulation Results

Throughout this section, to draw one point on a graph, 100 random instances are

generated and the results are averaged. First, we want to compare the performance of

our new MC-GAP algorithm to the optimal solution. Since MC-GAP is NP-hard, we

use an exponential time algorithm for finding the optimal solution for small instances

(15 packets) and compare this solution to the one found by our new MC-GAP algorithm.

We test different network parameters and the results show that the actual profit obtained

by our new MC-GAP algorithm is only 4-6% lower than that of the optimal solution.

This suggests that, at least for small instances, the new algorithm performs close to optimal.

We now compare the performance of our algorithms to the standard water-filling

algorithm described in Section 3.6.2. We use 2 different running sets, which differ in

how user devices are distributed across a scheduling cluster. In Fig. 3.7(a) the user

devices are uniformly distributed, while in Fig. 3.7(b) the probability of a user device to

be in sector 1 is 20 times greater than its probability to be in sector 2 or sector 3. Both

figures show the ratio between the profit of each of the four algorithms described in

Section 3.6.2 and the profit of the water-filling algorithm, as a function of the normalized

load. The load is defined as the number of waiting packets divided by the total number

of scheduled blocks in the cell. The number of users is identical to the number of waiting

packets because we assign to each user one packet on average. In general, we see that

all 4 algorithms perform much better than the water-filling algorithm, and that the

performance gain increases when the load increases.

In Fig. 3.7(a) we see that the performance of Alg-1 is equal to that of Alg-3, and

the performance of Alg-2 is equal to that of Alg-4. This implies that in this setting, all

the benefit compared to the water-filling algorithm is attributed to intra-sector joint

scheduling. The reason is that when the users are uniformly distributed, there is no

advantage from scheduling a user using the resources of a remote sector. This is in

contrast to Fig. 3.7(b), where user distribution is not uniform; thus, Alg-1 is significantly

better than Alg-3 and Alg-2 is significantly better than Alg-4.

In the next set of simulations we investigate how user distribution affects the benefit

obtained by the various algorithms. The x-axis in Fig. 3.7(c) shows the ratio between

the probability of a user to be in sector 1 and the probability of a user to be in sector 2

or sector 3. As before, all 4 algorithms perform better than the water-filling algorithm,

and the gain increases when the unbalanced ratio increases. As expected, we can see that the contribution of inter-sector joint scheduling is significantly greater than the contribution of intra-sector joint scheduling for higher values of the unbalanced ratio.

Parameter                                 Value        Parameter                                   Value
network layout                            7 BSs        TX power                                    39 dBm
system bandwidth                          20 MHz       inter-site distance                         1700 m
BS antenna height                         20 m         user height                                 1.5 m
propagation loss model                    Hata model   system frequency                            1,500 MHz
TX antenna gain                           18.9 dBi     vertical tilt                               −16°
vertical half power beamwidth (θ3dB)      +10°         horizontal half power beamwidth (ϕ3dB)      +70°
side lobe attenuation (SLAv)              20 dB        front-to-back attenuation (Am)              25 dB

Table 3.2: Simulation network parameters

3.7 Conclusions

We addressed the new OFDMA joint scheduling problem encountered by a base station

(BS) that controls multiple sectors, and showed that it is equivalent to the well-known

NP-hard GAP problem. In order to further improve the joint scheduler’s performance,

we extended its role to also determine the MCS to be used for each packet. This resulted

in a new NP-hard problem, which we called MC-GAP, and for which we proposed an

efficient and practical approximation scheme. We conducted an extensive system level

simulation study of the various algorithms, under various network parameters, and

showed that the performance of the new MC-GAP algorithm is very close to optimal

and that our proposed joint scheduling algorithms significantly increase the throughput

of an OFDMA network.

Figure 3.7: Total profit improvement ratio over water-filling algorithm for the 4 algorithms: (a) uniform user distribution; (b) non-uniform user distribution; (c) as a function of the unbalanced ratio

Chapter 4

Multi-Dimensional OFDMA Scheduling in a Wireless Network with Relay Nodes

4.1 Introduction

In a network with RNs, the scheduler must also take into account the bandwidth

available to each RN. Thus, each packet transmission now has a 2-dimensional size: the

first dimension indicates the bandwidth resources required for the BS→RN transmission

and the second indicates the bandwidth resources required for the RN→UE transmission.

Thus, the scheduler must find a feasible schedule that does not exceed the resources of

a multi-dimensional resource pool, whose number of dimensions depends on the number

of RNs. This makes the scheduling problem in a network with RNs more similar to

an extension of MCKP into multiple dimensions, a problem known as d-dimensional

Multiple-Choice Knapsack (d-MCKP), which is computationally harder than MCKP. In

order to solve this problem for a network with RNs, we transform it into a less general

case of d-MCKP, called sparse d-MCKP, and propose efficient algorithms to solve this

new problem. One of our algorithms is proven to have a performance guarantee, and

can also be optimal for realistic input size.

For ease of presentation, we explain the main concepts of our proposed algorithms

for a BS with one omnidirectional antenna, although in many cellular networks that

employ RNs the BS uses multiple directional antennas (also known as sectors). For

such multi-sector networks, the algorithms proposed in this chapter can be invoked

independently for each sector, in which case the BS in each sector would also run an

independent scheduler, for its directional antenna and for each RN in its sector. If

this option is chosen, no changes are required to the proposed algorithms. A second

option would be to use one scheduler for all the sectors. In that case, the algorithms

must be combined with the algorithm proposed in Chapter 3 for joint scheduling in a multi-sector cellular network (Chapter 3 does not consider RNs). In Section 4.7 we explain how this can be done.

Acronym    Meaning
BS         base station
CQI        channel quality indication
d-KP       d-dimensional knapsack problem
d-MCKP     d-dimensional multiple-choice knapsack problem
GAP        generalized assignment problem
MCKP       multiple-choice knapsack problem
MCS        modulation and coding scheme
RN         relay node
SINR       signal to interference plus noise ratio
UE         user equipment (the mobile host)

Table 4.1: Abbreviations and acronyms used in the chapter

The rest of the chapter is organized as follows. In Section 4.2 we discuss related

work. In Section 4.3 we present our network and scheduling model. In Section 4.4, we define the

new “OFDMA Scheduling with Relays and Dynamic MCS Selection” problem, which is

the core of this chapter. We show that it is NP-hard and equivalent to a special case of

d-MCKP. In Section 4.5 we present efficient algorithms for solving this new problem. In

Section 4.6 we show how to adapt our algorithms to inband relaying. In Section 4.7

we show how our algorithms can be extended to address the case where each cell has

multiple sectors, each with a BS and one or more RNs. Section 4.8 presents an extensive

simulation study and Section 4.9 concludes the chapter.

Table 4.1 summarizes the main abbreviations and acronyms used throughout the

chapter.

4.2 Related Work

We are the first to propose packet level scheduling algorithms for an OFDMA/LTE

network with relay nodes (RNs). We therefore classify the papers described in this

section into two groups. The first includes papers that propose packet level scheduling

for an OFDMA/LTE network without RNs. The second includes papers that address

scheduling related issues in a network with RNs.

Papers belonging to the first group are [9, 20, 73]. Both Chapter 3 and this one

propose a packet level scheduling algorithm to be employed by a scheduling logic at

the BS once every OFDMA subframe. Both chapters solve the most basic and most

important scheduling question: which transmitter will transmit which packet and using

what modulation and coding scheme (MCS). However, the two chapters solve different

problems. In Chapter 3, the scheduling decisions are made for multiple independent

sectors. This problem is shown to be related to the theoretical generalized assignment

problem (GAP). In contrast, in this chapter, scheduling decisions are made for a BS

and its RNs. The resulting problem is shown to be more similar to the theoretical

d-dimensional knapsack problem (d-KP) or d-MCKP problem. This is because every

packet to be transmitted via an RN must be allocated resources from 2 pools, and

these packets can be transmitted using any one of several possible MCSs. This makes

for a different and computationally harder problem than GAP. Section 4.7 links the

two chapters by combining the algorithm in Chapter 3 with the ones proposed here for

efficient scheduling in a cell with a multi-sector BS, assuming that each sector also has

one or more RNs.

Papers in the second group include [66], in which the authors consider a cell with

RNs, and assume that there is no direct wireless link between the BS and the UEs. The

UEs are either delay sensitive or non-delay sensitive, and algorithms that select one

of four possible transmission strategies for each UE are presented. In [93], user-level

admission control algorithms are proposed for inband relaying in OFDMA networks.

Two algorithms for utilizing spatial reuse are developed and are shown to improve the

throughput. In [84], the throughput of a network with RNs is improved through adaptive

frame segmentation and employing spatial reuse by RNs. In [38], the implementation

aspects and constraints of the simplest network coding schemes for a two-way relay

channel in LTE are considered.

In [17], [48], [65], and [70] relay strategies are compared. In [65], the downlink

performance for Layer-3 and Layer-1 relays is investigated. System-level simulations are

used to demonstrate the impact of several relay conditions. In [70], the performance of

several emerging half-duplex relay strategies in interference-limited cellular systems is

analyzed. The performance of each strategy as a function of location, sectoring, and

frequency reuse is compared with localized base station coordination. In [48], the

performance of an infrastructure based multi-antenna relay network in the absence

of a direct link is studied. As expected, their results show that the performance

depends on the location of the RNs. Finally, in [17], the authors evaluate relay based

heterogeneous deployment within the LTE-Advanced uplink framework. Different power

control optimization strategies are proposed for 3GPP urban and suburban scenarios.

4.3 Network Model

4.3.1 Inband vs. Outband Relaying

We consider a cell with a BS at its center and R RNs, as shown in Figure 4.1. Figure 4.2

shows a schematic structure of a 10-ms LTE frame, divided into ten 1-ms subframes¹.

Each RN is connected to its BS by an OFDMA wireless link, using either inband or

outband relaying. In outband relaying, BS and RN transmissions use different subbands.

¹ We are trying to abstract the problem in the most generic way. Therefore, we skip some of the LTE physical layer details that are not directly relevant to the description of the problem and algorithms.

Figure 4.1: Cell containing R = 3 RNs

Figure 4.2: An abstract structure of the LTE frame (labels: 1 scheduled block, 1 ms subframe, 10 ms frame)

Therefore, they can transmit simultaneously in each subframe, with no interference

(Figure 4.3(a)). In inband relaying, however, the transmissions from the BS to the RNs

or to the UEs are performed over the same subbands as those from the RNs to the UEs.

Thus, simultaneous transmissions by the BS and RNs are not possible unless sufficient

isolation in time or in space is ensured. Figure 4.3(b) assumes such isolation: in every

two consecutive subframes, one is dedicated for transmissions from the BS and one for

transmissions by the BS and RNs to UEs (isolation in time). The BS and the RNs can

transmit together in every second subframe only if they are located far enough from

each other (isolation in space). Otherwise, only the RNs can transmit in every second

subframe.

4.3.2 Our Scheduling Model

In an LTE network with RNs, one may distinguish between distributed and centralized

scheduling. In distributed scheduling, each transmission entity, namely, a BS or an RN,

autonomously decides what to transmit in every subframe. In centralized scheduling,

all transmission decisions for the BS and the RNs are performed by the BS. In this

chapter we focus on centralized scheduling, because it has an important advantage over

distributed scheduling [41]: the scheduler has a global view of the network resources

and can optimize their usage. For instance, if an RN is overloaded, the BS can decide

to transmit to some UEs directly, even if these UEs have better SINR with the busy

RN than with the BS.

Figure 4.3: An abstract structure of the LTE subframe (F1 and F2 are two orthogonal OFDMA subbands): (a) one LTE subframe for outband relaying; (b) two LTE subframes for inband relaying

In the considered model, the BS receives periodic channel quality indications (CQI) [21] from the UEs and RNs. Using these reports, the BS is able to estimate

the SINR for transmissions from the BS to each UE or RN. The BS also receives CQI

reports on the SINR between each UE and its closest RN. These reports are either

transmitted directly by every UE to the BS, or forwarded by the RNs to the BS over an

RN→BS control channel. The BS uses this information to make the following scheduling

decisions:

• which packet to transmit;

• whether to transmit the packet directly or through an RN;

• if the packet is not transmitted directly, through which RN to forward it;

• which MCS (Modulation and Coding Scheme) to use for each transmission.

The scheduler determines how many scheduled blocks to allocate to every packet

according to the chosen MCS. Some MCSs are more efficient, i.e., require fewer scheduled

blocks, but are less robust to transmission errors. Other MCSs are less efficient but

more robust. Since there are several “pools” of scheduled blocks that the scheduler uses,

a more formal discussion will require the following definitions:

Definition 4.3.1. A scheduling area is a set of scheduled blocks to be assigned for

transmission by the same transmission entity (a BS or an RN).

In the outband relaying model, the scheduler needs to make a scheduling decision

every 1ms subframe for 1 BS scheduling area and R RN scheduling areas (Figure 4.3(a)).

Thus, the scheduler has to allocate resources from R + 1 scheduling areas (pools) every

1ms. In the inband relaying model, the scheduler needs to make a scheduling decision

every 2ms for two consecutive 1ms subframes (Figure 4.3(b)). In the first 1ms subframe,

the scheduler allocates resources only from the BS scheduling area. In the second

1ms subframe, the scheduler allocates resources from the BS scheduling area and R

RN scheduling areas. Thus, in the inband relaying model, the scheduler has to make

decisions for R + 2 scheduling areas every 2ms.
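The two counts above can be summarized in a short sketch. The function name and the mode strings are illustrative only and are not part of the model.

```python
def scheduling_areas_per_decision(num_rns: int, mode: str) -> tuple:
    """Return (decision period in ms, number of scheduling areas per decision)."""
    if mode == "outband":
        # One BS area plus one area per RN, decided every 1 ms subframe.
        return 1, num_rns + 1
    if mode == "inband":
        # Subframe 1: BS area only. Subframe 2: BS area plus one area per RN.
        return 2, 1 + (1 + num_rns)
    raise ValueError(f"unknown relaying mode: {mode}")
```

For a cell with R = 3 RNs, as in Figure 4.1, this gives 4 areas every 1 ms for outband relaying and 5 areas every 2 ms for inband relaying.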

Definition 4.3.2. A transmission instance of a packet is a triple [packet, path, MCSs],

where path is either BS→UE or BS→RNi→UE, and MCSs is a list that indicates the

MCS to be used for the transmission of the packet over each link along the path (1

link if the path is BS→UE; 2 links if it is BS→RNi→UE). Each transmission instance

requires allocation of scheduled blocks from the corresponding scheduling area(s).

We adopt the profit-based scheduling model proposed in [23]. Thus, each transmission

instance of a data packet at time t is associated with a profit and a cost. The profit

depends on the following parameters:

(a) How important it is to the application that the packet be delivered at t.

(b) The probability that this packet will be successfully received by the UE. This probability can be computed by the BS by taking into account (i) the SINR on each wireless link (BS→UE or BS→RNi and RNi→UE); (ii) the length of the packet; and (iii) the MCS used for transmitting this packet [12, 63].

While the profit of a packet is a scalar, the cost is a vector that has one or more

dimensions: one for each link over which the packet is scheduled. The cost on each link

is equal to the number of scheduled blocks required for transmitting the packet in the

scheduling area associated with this link. It depends on the length of the packet and

the MCS, and it is what makes the scheduling problem for a BS with RNs intractable.
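A minimal sketch of a transmission instance and its cost vector is given below. The class name, the field names, and the table of bits carried by one scheduled block under each MCS are assumptions made only for illustration; the real values depend on the LTE physical layer.

```python
import math
from dataclasses import dataclass

# Hypothetical payload (in bits) of one scheduled block under each MCS index.
BITS_PER_BLOCK = {1: 100, 2: 200, 3: 400}

@dataclass(frozen=True)
class TransmissionInstance:
    packet_bits: int   # length of the packet
    path: tuple        # ("BS", "UE") or ("BS", "RN2", "UE")
    mcss: tuple        # one MCS index per link along the path

def cost_vector(inst: TransmissionInstance) -> dict:
    """Map each transmitting entity on the path to the scheduled blocks it needs."""
    senders = inst.path[:-1]  # the entity at the head of each link
    return {
        sender: math.ceil(inst.packet_bits / BITS_PER_BLOCK[mcs])
        for sender, mcs in zip(senders, inst.mcss)
    }
```

A direct instance yields a one-dimensional cost, while a relayed instance yields a cost in both the BS scheduling area and the scheduling area of the RN on its path.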

4.3.3 Frequency Reuse Models

In addition to the decision whether to use inband or outband relaying, the frequency

reuse model must also be decided upon. In order to describe our algorithms in a specific

context, we focus on two models. However, these algorithms are easily adaptable to

other frequency reuse models as well. The first model, called model-1, is shown in

Figure 4.4(a) and is relevant for outband relaying. Here, bandwidth is partitioned into

N + 1 subbands: F0, F1, F2 and F3 (N = 3 in this figure). The BS in every cell uses

subband F0 (i.e., the BSs work using frequency reuse 1), while all the RNs in every

cell use either F1, F2 or F3. This guarantees that close RNs in neighboring cells use

different subbands. This combination of reuse-1 by the BSs and reuse 1/3 by the RNs

can be viewed as an implementation of FFR (Fractional Frequency Reuse), which is very

common in networks with no RNs [19, 53, 68, 88]. Since outband relaying is considered

for this model, the BSs and RNs use different orthogonal subbands. Thus, the BSs

transmit using high power, and they can reach the cell-edge UEs with no interference

from/to their RNs.

Figure 4.4: The frequency reuse models considered in this chapter: (a) model-1 (for outband relaying), where different subbands are used; (b) model-2 (for inband relaying), where only one subband is used everywhere

The second model, called model-2, is shown in Figure 4.4(b) and is relevant for

inband relaying. This model employs a complete reuse-1, where every BS and every

RN uses all subbands. To avoid interference with their RNs, the BSs transmit using

lower power than in model-1. But this power is sufficient to allow each BS to reach its

RNs with good SINR. The transmission power of the RNs is strong enough to reach their cell-edge UEs, but not so strong as to interfere with other RNs in the same or adjacent cells.

We emphasize that this chapter does not claim that either of the considered frequency reuse models is the best for an LTE network with RNs. The decision about which model to use

depends on many factors and regulations that are beyond the scope of this chapter. We

use model-1 and model-2 because we believe that they are general enough for presenting

our ideas and algorithms in a concrete context.

4.4 The Scheduling Problem

This section is divided into two subsections. In the first subsection, we define the

scheduling problem in OFDMA networks with RNs and show hardness results. In the

second subsection, we define a new theoretical problem called sparse d-MCKP and show

that it is equivalent to our OFDMA scheduling problem.

4.4.1 Preliminaries

Throughout the section, the following lemma will be used in order to reduce the number

of transmission instances the scheduler considers for each data packet.

Lemma 4.4.1. If the scheduler is configured to use only links with an SINR > 0 dB, then: (a) a packet can be transmitted either directly or through the RN with which the user has the best SINR, and not through any other RN; (b) each packet can be associated with at most $(M^2 + M)$ transmission instances, where $M$ is the number of MCSs. (The threshold SINR > 0 dB is chosen because the transmission success probability at or below it is very low [12].)

Proof: The SINR of a UE for a packet received from a BS or an RN is equal to $\frac{S}{I + N_0}$, where $S$ is the received power at this UE from the transmitter, $I$ is the interference power of other simultaneous transmissions, and $N_0$ is the noise power. In both model-1 and model-2, all RNs in the same cell transmit using the same subbands. Therefore, when calculating the SINR of a UE for RNk, $S = P_k$ and $I \ge \sum_{j \ne k} P_j$, where $P_j$ is the power of the signal received by this UE from RNj. This implies that for each UE there can be only one RN for which the SINR > 1 (i.e., above 0 dB). This RN is the one for which the received

power at this UE is the highest, since in the expression of the SINR for any other RN

this power is considered as interference. To prove (b), note that by (a) a packet is either

transmitted directly, or through a specific RN, say RNi. In the first case, the MCS to be

used on the BS→UE link is one of the M possible MCSs. In the latter case, there are

M MCSs for the link BS→RNi, and M for the link RNi→UE. Thus, there are at most $M^2$ possible combinations, and the total number of transmission instances is $(M^2 + M)$.
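The counting argument in part (b) can be illustrated by enumerating the instances explicitly. This is only a sketch: the function name and the tuple layout [packet, path, MCSs] follow Definition 4.3.2, but the concrete representation is an assumption.

```python
from itertools import product

def transmission_instances(packet, default_rn, num_mcs):
    """Enumerate all [packet, path, MCSs] triples allowed by Lemma 4.4.1."""
    mcs = range(1, num_mcs + 1)
    # M direct instances: one MCS for the single BS->UE link.
    direct = [(packet, ("BS", "UE"), (m,)) for m in mcs]
    # M^2 relayed instances: one MCS for BS->RN and one for RN->UE.
    relayed = [
        (packet, ("BS", default_rn, "UE"), pair) for pair in product(mcs, repeat=2)
    ]
    return direct + relayed
```

With M = 4 MCSs, the list contains 4 direct and 16 relayed instances, i.e., M^2 + M = 20 in total.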

We now define the “OFDMA Scheduling with Relays and Dynamic MCS Selection”

problem, which is the core of this chapter.

Problem 6 (OFDMA Scheduling with Relays and Dynamic MCS Selection):

Instance: The scheduler is given the number of scheduled blocks to be allocated in

each scheduling area. For each packet i, the scheduler determines the RN with which the UE has the best SINR, say RNj. It then considers at most $(M + M^2)$ transmission instances for transmitting this packet to the UE: $M$ instances are for the direct BS→UE transmission and $M^2$ for transmissions through RNj, where $M$ is the number of MCSs. Each transmission instance is associated with a profit and with a 2-dimensional size: one that indicates the number of scheduled blocks for the transmission by the BS, and one that indicates the number of scheduled blocks for the transmission by the default RN.² The latter is 0 if the packet is

transmitted directly over the BS→UE path.

Objective: Find a feasible schedule that maximizes the total profit. A feasible schedule

is one for which the number of scheduled blocks available in each scheduling area

is not exceeded.

² The default RN is the RN for which the UE has an SINR > 0 dB. By Lemma 4.4.1 there is only one such RN for each UE.

As an example, consider a BS that has 3 packets waiting for transmission: packet1, packet2 and packet3 to UE1, UE2 and UE3 respectively. Suppose that the default RNs for these UEs are RN1, RN2 and RN3 respectively. Two possible schedules are:

(schedule 1) packet1 is transmitted using MCS-1 to RN1 and then using MCS-2 to

UE1; packet2 is transmitted using MCS-1 to RN2 and then using MCS-1 to UE2;

packet3 is transmitted using MCS-3 directly to UE3;

(schedule 2) packet1 is transmitted using MCS-1 directly to UE1; packet2 is transmitted

using MCS-2 to RN2 and then using MCS-1 to UE2; packet3 is not transmitted

(it might either be transmitted during one of the next subframes or dropped by

the BS due to lack of bandwidth).

Technically, there are $(M^2 + M)$ different ways to transmit packet1, $(M^2 + M)$ ways to transmit packet2, and $M$ ways to transmit packet3. Thus, the total number of different schedules is $(M^2 + M + 1)(M^2 + M + 1)(M + 1)$. The “+1” covers the

case where the packet is not transmitted during this schedule. Obviously, the number

of possible schedules grows exponentially with the number of packets.
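The count can be reproduced numerically. The sketch below uses the per-packet numbers of ways stated above; the function name and the choice M = 3 are illustrative.

```python
from math import prod

def count_schedules(ways_per_packet):
    """Multiply (ways + 1) over all packets; the +1 is the 'not transmitted' option."""
    return prod(ways + 1 for ways in ways_per_packet)

M = 3  # illustrative number of MCSs
total = count_schedules([M * M + M, M * M + M, M])
```

Since every additional packet multiplies the count by at least 2, exhaustive enumeration quickly becomes infeasible.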

We start by showing hardness results for Problem 6 using a reduction from the NP-hard two-dimensional knapsack problem (2-knapsack) [49]. An instance for 2-knapsack is a set of n items and a 2-dimensional knapsack. Each item i has a profit $p_i \ge 0$ and a 2-dimensional size: $s_i[1]$ and $s_i[2]$. The knapsack’s 2-dimensional size is $[K_1, K_2]$, where $K_1$ and $K_2$ are integers > 0. The objective is to find a feasible set of items with a maximum profit. A feasible set of items is a set for which the total size of the selected items in each dimension d is at most $K_d$.

When we compare a transmission instance in Problem 6 to an item of 2-knapsack,

we see clear similarity: both are associated with a scalar profit and with a 2-dimensional

cost. However, there is also one difference: in Problem 6 the 2nd element of the

2-dimensional cost refers to one of R different scheduling areas (i.e., R “knapsacks”),

while in 2-knapsack it refers to the same knapsack. For example, the scheduler may

have the following two transmission instances for two different packets:

1. A transmission instance [packet1, BS→RN1→UE1, MCSs1]. This instance has a

cost >0 in the scheduling areas of the BS and RN1.

2. A transmission instance [packet2, BS→RN2→UE2, MCSs2]. This instance has a

cost >0 in the scheduling areas of the BS and RN2.

This implies that while we can use 2-knapsack to show that Problem 6 is NP-hard,

we cannot simply run an algorithm for 2-knapsack to solve Problem 6. Note also that

2-knapsack is not a special case of Problem 6 with R = 1. This is because each packet

in Problem 6 can have more than one transmission instance from which at most one is

selected, while in 2-knapsack each item has only one configuration.

Lemma 4.4.2. Problem 6 is NP-hard. Moreover, it admits no EPTAS.³

Proof: In [49] it is shown that 2-knapsack is NP-hard. In [52] it is shown that

it is also unlikely to have an EPTAS. We first show how to transform an instance of

2-knapsack into an instance of Problem 6 in polynomial time such that a solution for

the transformed instance will also solve the 2-knapsack instance. We use a network

with a single RN (i.e., R = 1). Without loss of generality, let the knapsack size

be K for each dimension. The first dimension is transformed into a BS scheduling

area with K · (n + 1) + n scheduled blocks, where n is the number of items in the

2-knapsack instance. The 2nd dimension is transformed into an RN scheduling area with

K scheduled blocks. Every 2-knapsack item i is transformed into a transmission instance

of a data packet whose profit is set to pi. The path and MCSs for this transmission

instance are determined as follows:

• If $s_i[1] > 0$ and $s_i[2] > 0$, the path for the transmission instance is BS→RN→UE. The data packet size and MCSs are chosen such that $(n+1) \cdot s_i[1]$ scheduled blocks are required in the BS scheduling area and $s_i[2]$ scheduled blocks are required in the RN scheduling area.

• If $s_i[1] > 0$ and $s_i[2] = 0$, the path is BS→UE. The data packet size and MCS are chosen such that $(n+1) \cdot s_i[1]$ scheduled blocks are required in the BS scheduling area.

• If $s_i[1] = 0$ and $s_i[2] > 0$, the path for the transmission instance is BS→RN→UE. The data packet size and MCSs are chosen such that one scheduled block is required in the BS scheduling area and $s_i[2]$ scheduled blocks are required in the RN scheduling area.

Note that this is a valid input to Problem 6, since each packet is either transmitted only

in the BS scheduling area or in the scheduling areas of the BS and the RN. The above

transformation can be performed in polynomial time.

Given an instance of 2-knapsack and a solution to the corresponding transformed

instance of Problem 6, we can easily convert this solution to a solution for the original

2-knapsack problem. In fact, the selected set of items is also an optimal solution for

2-knapsack. This is because the only change in the item sizes is made in the first

dimension, and at most n items whose size in this dimension is 1 are selected. Each

other item i has a size of at least zi · (n+ 1), for some integer zi > 0.
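The transformation used in the proof can be sketched as follows. The function name and the dictionary layout are assumptions made for illustration; items of size [0, 0] are rejected because the construction above does not cover them.

```python
def reduce_2knapsack(items, K):
    """items: list of (profit, s1, s2); K: knapsack size in both dimensions.

    Returns (BS capacity, RN capacity, transmission instances), where each
    instance records its path and the blocks it needs in each scheduling area.
    """
    n = len(items)
    bs_capacity = K * (n + 1) + n  # leaves room for n one-block BS allocations
    rn_capacity = K
    instances = []
    for profit, s1, s2 in items:
        if s1 > 0 and s2 > 0:
            path, bs_blocks = "BS->RN->UE", (n + 1) * s1
        elif s1 > 0:
            path, bs_blocks = "BS->UE", (n + 1) * s1
        elif s2 > 0:
            # A relayed packet must use the BS, so it is charged one BS block.
            path, bs_blocks = "BS->RN->UE", 1
        else:
            raise ValueError("items of size [0, 0] are not covered")
        instances.append(
            {"profit": profit, "path": path, "bs_blocks": bs_blocks, "rn_blocks": s2}
        )
    return bs_capacity, rn_capacity, instances
```

The scaling by (n + 1) is what keeps the two problems aligned: the at most n one-block allocations can never add up to a full unit of the original first dimension.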

4.4.2 d-MCKP vs. Sparse d-MCKP

Our algorithms for Problem 6 are presented in Section 4.5. They first transform an instance of Problem 6 into an instance of another well-known theoretical problem, called d-dimensional multiple-choice knapsack (d-MCKP [69]). This problem differs from 2-knapsack in two important ways: (a) each item has several configurations, from which at most one can be chosen for the solution; (b) the size of each item is a D-dimensional vector, where D is an integer > 0 (i.e., D = 2 does not necessarily hold). These differences make d-MCKP more similar to our problem, and allow a solution for d-MCKP to be transformed into a solution to Problem 6.

³ An EPTAS (Efficient Polynomial-Time Approximation Scheme) is an algorithm which takes an instance of an optimization problem and a parameter ε > 0 and, in time $O(f(1/\varepsilon) \cdot n^c)$, where n is the problem size and c > 0 is a constant, produces a solution that is within a factor 1 + ε of being optimal.

An instance of d-MCKP consists of a D-dimensional knapsack and a set of n items,

each with m or fewer D-dimensional configurations. Each configuration j of item i has a D-dimensional size vector $s_i^j \in (\mathbb{N}^+)^D$, in which the size in the d-th dimension, $s_i^j[d]$, is an integer ≥ 0. Each configuration j of item i has profit $p_i^j \ge 0$. The size of the D-dimensional

knapsack is also a vector, [K[1], . . . ,K[D]], where K[i] is an integer > 0. The objective

is to find a feasible set of configurations such that the profit is maximized. A feasible set

of configurations is a set for which the total size of the selected configurations in each

dimension d is at most K[d] and at most one configuration of each item is selected. It

is important to note that despite their similarity, d-MCKP and Problem 6 are different

because a configuration of d-MCKP may have a size > 0 in each of the D-dimensions,

whereas a configuration (transmission instance) in Problem 6 may have a size > 0 in at

most two dimensions: that of the BS and that of one RN. We take advantage of this

difference in order to develop efficient algorithms for Problem 6.

Lemma 4.4.3. Any algorithm for d-MCKP can be transformed into an algorithm for

Problem 6 with the same running time and performance guarantees.

Proof: A transformation similar to that presented in the proof of Lemma 4.4.2 can

be used to transform an instance of Problem 6 into an instance of d-MCKP in linear time.

Many heuristics exist for d-MCKP [7, 18], but they do not provide a known performance guarantee. In [69], a (1 + ε)-approximation⁴ for d-MCKP is given for ε > 0.

However, this algorithm is impractical for Problem 6 for two reasons: (a) it requires

solving a linear program, which is impractical for a BS that needs to solve Problem 6

once every 1ms; (b) its running time becomes impractical for large values of D. In [87], a

dynamic programming algorithm for solving d-KP (the d-dimensional knapsack problem)

is presented. This problem is similar to d-MCKP except that each item has only one

configuration. Using similar ideas to those in [87], a dynamic programming algorithm for d-MCKP

can be devised. We present this dynamic programming algorithm in Appendix C. It

returns an optimal solution, but its running time renders it impractical when the number

of RNs grows. However, we later show that it can be invoked as a procedure on small

d-MCKP instances (D = 2 and D = 3) to solve Problem 6.

A closer look at Problem 6 reveals an important difference between it and d-MCKP:

in Problem 6 each item has at most two size dimensions while in d-MCKP there are D.

⁴ Let $p_{opt}$ be the total profit of the optimal solution and α ≥ 1. An α-approximation returns a solution whose profit is at least $p_{opt}/\alpha$.

Figure 4.5: Comparative difficulty of the various problems related to this chapter

This difference allows us to define a new theoretical problem called “sparse d-MCKP,” which will be shown to be equivalent to Problem 6. This problem is more difficult than 2-knapsack but less difficult than d-MCKP (Figure 4.5). An instance of sparse d-MCKP consists of a D-dimensional knapsack and a set of n items, each with at most m configurations. Each configuration j of item i has a profit $p_i^j \ge 0$ and a 2-dimensional size $s_i^j[1]$ and $s_i^j[2]$, where $s_i^j[1]$ is the size of this configuration in the 1st dimension and $s_i^j[2]$ is the size of this configuration in some other dimension $d_i$, where $d_i \in \{2, \ldots, D\}$. In addition, $s_i^j[2] > 0$ implies that $s_i^j[1] > 0$ must hold. The size

of the D-dimensional knapsack is a vector, [K[1], . . . ,K[D]], where each component is

an integer > 0. The objective is to find a feasible set of configurations, with at most

one configuration for each item, such that the profit is maximized. A feasible set of

configurations is a set for which the total size of the selected configurations in each

dimension does not exceed the knapsack size.
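One possible in-memory representation of a sparse d-MCKP instance, together with a feasibility check, is sketched below. The data layout is an assumption: each item is a pair (d_i, configurations), each configuration is a triple (profit, s[1], s[2]), and the knapsack is a dictionary from dimension to capacity.

```python
def is_feasible(selection, items, K):
    """Check a candidate solution of a sparse d-MCKP instance.

    selection: {item index: configuration index}; using a dict guarantees
               that at most one configuration is selected per item.
    items[i]:  (d_i, configs), with configs[j] = (profit, s1, s2).
    K:         {dimension: capacity} for dimensions 1..D.
    """
    used = {d: 0 for d in K}
    for i, j in selection.items():
        d_i, configs = items[i]
        _, s1, s2 = configs[j]
        if s2 > 0 and s1 == 0:
            return False  # violates the rule that s[2] > 0 implies s[1] > 0
        used[1] += s1     # every configuration is charged in dimension 1
        used[d_i] += s2   # and possibly in the item's own dimension d_i
    return all(used[d] <= K[d] for d in K)
```

In terms of Problem 6, dimension 1 plays the role of the BS scheduling area and d_i the role of the scheduling area of the default RN.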

Lemma 4.4.4. Problem 6 is equivalent to sparse d-MCKP.

Proof: We prove this by showing how to transform an instance of Problem 6

into an instance of sparse d-MCKP in polynomial time such that a solution for the

transformed instance will also solve the instance of Problem 6. The other direction,

namely, transforming an instance of sparse d-MCKP into an instance of Problem 6, can

be proven in a similar way.

By Lemma 4.4.1, each transmission instance of a packet is associated with the path

BS→UE or BS→RNi→UE, where UE is the destination of the packet and RNi is the

default RN for this UE. Using the same transformation as in the proof of Lemma 4.4.3,

we create an instance of d-MCKP for which all configurations of the same item have

size > 0 in the first dimension, size ≥ 0 in the default RN dimension, and size 0 in every

other dimension. Note that if the size in the default RN dimension is > 0, so is the

size in the first (BS) dimension. The resulting d-MCKP instance is now transformed

into a sparse d-MCKP instance in the following way. Each D-dimensional d-MCKP configuration j of item i with a size vector $[s_i^j[1], \ldots, s_i^j[D]]$ is transformed into a sparse d-MCKP configuration in which $d_i$ is set to the dimension d > 1 whose size is > 0. The two sizes of the sparse configuration are set to $s_i^j[1]$ and $s_i^j[d_i]$, respectively.
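The last step of the transformation can be sketched as follows. The function name and data layout are assumptions; in particular, choosing dimension 2 for an item that is never relayed is an arbitrary convention adopted here, since any RN dimension would do.

```python
def to_sparse_item(configs):
    """Convert one d-MCKP item into a sparse d-MCKP item.

    configs: list of (profit, sizes), where sizes is a length-D list and
             sizes[0] is the size in dimension 1 (the BS dimension).
    Returns (d_i, sparse_configs) with sparse_configs[j] = (profit, s1, s2).
    """
    rn_dims = {
        d + 1
        for _, sizes in configs
        for d in range(1, len(sizes))
        if sizes[d] > 0
    }
    if len(rn_dims) > 1:
        raise ValueError("item uses more than one RN dimension")
    d_i = rn_dims.pop() if rn_dims else 2  # arbitrary when no RN is ever used
    sparse = [(profit, sizes[0], sizes[d_i - 1]) for profit, sizes in configs]
    return d_i, sparse
```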

4.5 Scheduling Algorithms

This section is divided into two subsections. In the first subsection we present a

pseudo-polynomial time algorithm, which uses algorithms for MCKP and 2-MCKP as

procedures, and prove that this algorithm returns an approximation for Problem 6. In

the second subsection we present a water-filling algorithm for Problem 6. This algorithm

does not have a performance guarantee, but has a better running time and is simpler

to implement. Both algorithms are first developed in the context of model-1 described

in Section 4.3.3. However, as we show in Section 4.6, they can be easily adapted for

model-2 as well.

4.5.1 A Pseudo-Polynomial Time Algorithm

We now propose a pseudo-polynomial time algorithm, which transforms any α-approximation algorithm for 2-MCKP (A2-MCKP) and any β-approximation algorithm for MCKP

(AMCKP) into an (α · β)-approximation algorithm for sparse d-MCKP. The algorithm

divides the items into D − 1 disjoint sets and solves an instance of 2-MCKP for each

set separately. Then, an MCKP (which is equivalent to 1-MCKP) instance is generated,

in which an item configuration corresponds to a solution for a 2-MCKP instance. The

MCKP instance is solved and all corresponding item configurations are returned as a

solution.

Algorithm 4.1 An (α · β)-approximation algorithm for sparse d-MCKP

1: Divide the items into D − 1 disjoint sets according to their $d_i$ dimension ($d_i \in \{2, \ldots, D\}$). The set corresponding to $d_i$ is denoted $M[d_i]$.

2: for d = 2 . . . D do

3: for k = 0 . . .K[1] do

4: run A2-MCKP on M[d] with knapsack size [k, K[d]]. Let $SOL_k^d$ be the returned solution.

5: end for

6: end for

7: Create a new MCKP instance as follows:

• The knapsack size is K[1].

• Each M [d] is transformed into an MCKP item with K[1] + 1 configurations.

The size of configuration j (j ∈ {0, . . . , K[1]}) is the total size in the 1st dimension of $SOL_j^d$ and its profit is the total profit of this solution. Thus, in

the resulting MCKP instance, the total number of items is (D − 1) and each

item has (K[1] + 1) configurations.

8: Run AMCKP to solve the MCKP instance. Each configuration in the solution

corresponds to a subset of the configurations given in the original sparse d-MCKP

instance. Return the union of all those subsets.
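A compact sketch of Algorithm 4.1 is given below, assuming that both sub-procedures are exact dynamic programs (so α = β = 1). All function names and the data layout are illustrative: items[i] = (d_i, configs) with configs[j] = (profit, s1, s2), and K maps each dimension to its capacity. Because the 2-MCKP table is indexed by budget, a single run with knapsack size [K[1], K[d]] already contains the solution for every k.

```python
def solve_2mckp(members, cap1, cap2):
    """Exact DP; table[a][b] = (profit, chosen) for knapsack size [a, b]."""
    table = [[(0, []) for _ in range(cap2 + 1)] for _ in range(cap1 + 1)]
    for item_id, configs in members:
        new = [row[:] for row in table]  # default: skip this item
        for a in range(cap1 + 1):
            for b in range(cap2 + 1):
                for j, (p, s1, s2) in enumerate(configs):
                    if s1 <= a and s2 <= b:
                        prev_p, prev_c = table[a - s1][b - s2]
                        if prev_p + p > new[a][b][0]:
                            new[a][b] = (prev_p + p, prev_c + [(item_id, j)])
        table = new
    return table

def solve_mckp(groups, cap):
    """Exact DP; each group offers configurations (profit, size, payload)."""
    table = [(0, []) for _ in range(cap + 1)]
    for configs in groups:
        new = table[:]  # default: take nothing from this group
        for a in range(cap + 1):
            for p, s, payload in configs:
                if s <= a and table[a - s][0] + p > new[a][0]:
                    new[a] = (table[a - s][0] + p, table[a - s][1] + payload)
        table = new
    return table[cap]

def algorithm_4_1(items, K):
    """Return (total profit, list of (item index, configuration index))."""
    K1 = K[1]
    groups = []
    for d in sorted(K):
        if d == 1:
            continue
        # Step 1: the set M[d] of items whose second dimension is d.
        members = [(i, cfgs) for i, (d_i, cfgs) in enumerate(items) if d_i == d]
        # Steps 2-6: solutions SOL_k^d for all k, read from one DP table.
        table = solve_2mckp(members, K1, K[d])
        configs = []
        for k in range(K1 + 1):
            profit, chosen = table[k][K[d]]
            used1 = sum(items[i][1][j][1] for i, j in chosen)
            configs.append((profit, used1, chosen))  # step 7
        groups.append(configs)
    return solve_mckp(groups, K1)  # step 8
```

The payload carried through the MCKP table is the list of original configurations, so the value returned in step 8 is directly the union of the selected subsets.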

Lemma 4.5.1. If A2-MCKP is an α-approximation for 2-MCKP and AMCKP is a β-

approximation for MCKP, Algorithm 4.1 is an (α ·β)-approximation for sparse d-MCKP.

Proof: Let OPT be an optimal solution for the sparse d-MCKP instance. The

chosen configurations in OPT can be partitioned into D − 1 sets according to their di

dimension. For each set i, i = 2 . . . D, let si be the total size in the first dimension of

all configurations in this set, and let pi be their total profit. In step 3 of Algorithm 4.1,

when the inner loop reaches k = si (because si ≤ K[1]) and the outer loop reaches d = i

(i.e., the items in M [i] are considered), A2-MCKP is invoked on a 2-MCKP instance with

item set M [i] (all configurations of these items from the sparse d-MCKP instance are

considered) and a knapsack size [si, K[i]]. Therefore, in the MCKP instance created in step 7 of the algorithm, the item transformed from M[i] has a configuration whose size in the first dimension is at most $s_i$ and whose profit is ≥ $p_i/\alpha$, where α is the approximation ratio of A2-MCKP.

We now show that the solution returned in step 8 of Algorithm 4.1 is an (α · β)-

approximation. First, note that the solution returned by Algorithm 4.1 is feasible,

because (a) the knapsack size of the MCKP instance created in step 7 is K[1] and thus

the total size in the first dimension does not exceed the capacity; (b) for any other

dimension (d > 1), the MCKP instance created in step 7 contains at most one item

whose size in this dimension in one or more configurations is > 0 but does not exceed

the knapsack size. Let P(OPT) be the total profit of OPT (the optimal solution for the sparse d-MCKP instance). The profit of an optimal solution to the MCKP instance created in step 7 is at least $\sum_i \frac{p_i}{\alpha} = \frac{P(\mathrm{OPT})}{\alpha}$. Since AMCKP is invoked in step 8, its output is a β-approximation and thus its total profit is at least $\frac{P(\mathrm{OPT})}{\alpha \cdot \beta}$. Finally, in step 8, the union of all original configurations has the same total profit. Thus, the approximation ratio of the solution for the sparse d-MCKP instance holds.
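The chain of bounds used in the proof can be written out in one line. Here $\mathrm{OPT}_{\mathrm{MCKP}}$ denotes an optimal solution of the MCKP instance created in step 7 and $\mathrm{ALG}$ the solution returned in step 8; both symbols are introduced here only for illustration.

```latex
\begin{align*}
P(\mathrm{ALG})
  \;\ge\; \frac{P(\mathrm{OPT}_{\mathrm{MCKP}})}{\beta}
  \;\ge\; \frac{1}{\beta} \sum_{i=2}^{D} \frac{p_i}{\alpha}
  \;=\; \frac{P(\mathrm{OPT})}{\alpha \cdot \beta}.
\end{align*}
```

The middle inequality holds because selecting, for each set $M[i]$, the configuration obtained for $k = s_i$ is feasible for the MCKP instance: the selected sizes sum to at most $\sum_i s_i \le K[1]$.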

We now analyze the running time of Algorithm 4.1, which depends on the running

time of the procedures it uses in step 4 and step 8 . Let T (A2-MCKP, n,m,K[1],K[2])

be the running time of A2-MCKP on a 2-MCKP instance with n items, each with at most

m configurations, and a 2-dimensional knapsack size [K[1],K[2]]. Algorithm 4.1 invokes

A2-MCKP ((D− 1) · (K[1] + 1)) times. Let T (AMCKP, n,m,K[1]) be the running time of

AMCKP on an MCKP instance with n items, each with at most m configurations and

a knapsack size K[1]. The MCKP in step 7 has (D − 1) items, each with (K[1] + 1)

configurations and a knapsack size K[1]. The total running time of Algorithm 4.1 is therefore $O(D \cdot K[1] \cdot T(A_{2\text{-MCKP}}, n, m, K[1], \max_{i \ge 2}\{K[i]\}) + T(A_{\text{MCKP}}, D, K[1], K[1]))$, where n is the number of items, each with at most m configurations, in the sparse d-MCKP instance. This running time remains practical even when the number of RNs in Problem 6, and hence D, grows.

The dynamic programming algorithm for 2-MCKP presented in Appendix C can be

used by Algorithm 4.1 in step 4, and the dynamic programming algorithm presented

in [49] can be used by Algorithm 4.1 in step 8. In this case both AMCKP and A2-MCKP

are optimal and thus Algorithm 4.1 is an optimal algorithm whose running time is $O(D \cdot (K[1])^2 \cdot \max_{i \ge 2}\{K[i]\} \cdot n \cdot m)$, where n is the number of items (i.e., the number of packets waiting for transmission in Problem 6) and m is the maximum number of item configurations (i.e., the maximum number of transmission instances for a packet in Problem 6). By Lemma 4.4.1, $m \le (M^2 + M)$.

An additional improvement can be applied when the dynamic programming algorithm

for 2-MCKP is used by Algorithm 4.1. We can avoid the loop in step 3 and instead

generate the required K[1] + 1 solutions for each M [d] set by running the dynamic

programming algorithm for knapsack size [K[1],K[d]], and then finding the solution

using the corresponding entry in the dynamic programming array. This reduces the

time complexity of the algorithm to $O(D \cdot K[1] \cdot \max_{i \ge 2}\{K[i]\} \cdot n \cdot m)$.
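To get a feeling for the saving, the two bounds can be compared on a numerical example. The sketch below drops all constant factors, and the parameter values are purely hypothetical.

```python
def dp_cell_updates(D, K1, K_rn_max, n, m, reuse_table):
    """Order-of-magnitude count of DP table updates in Algorithm 4.1."""
    per_run = K1 * K_rn_max * n * m      # one 2-MCKP dynamic program
    runs = D if reuse_table else D * K1  # with or without the loop of step 3
    return runs * per_run

with_loop = dp_cell_updates(4, 50, 50, 100, 12, reuse_table=False)
with_reuse = dp_cell_updates(4, 50, 50, 100, 12, reuse_table=True)
```

For these values the count drops from 600,000,000 to 12,000,000, i.e., by the factor K[1] = 50.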

4.5.2 A Water-Filling Algorithm

We now present a new polynomial time algorithm for sparse d-MCKP, which is based on

the heuristic for d-KP presented in [33]. Unlike Algorithm 4.1, the new algorithm does

not have a theoretical performance guarantee. However, it is simple to implement and

its running time is better than that of Algorithm 4.1 when the latter uses the dynamic

programming algorithms as its sub-procedures. To describe the new algorithm we need

the following definition [33]:

Definition 4.5.2. The efficiency of a sparse d-MCKP configuration j of item i is $\frac{p_i^j}{s_i^j[1] + s_i^j[2]}$, where $p_i^j$ is the profit of the corresponding configuration, and $s_i^j[1]$ and $s_i^j[2]$ are its 2-dimensional size.

The algorithm first sorts the configurations in decreasing order of their efficiency,

and then considers them for the solution in this order. Each configuration is added to

the final schedule if: (a) no previous configuration for the corresponding item is already

in the solution; and (b) the resource pool in each dimension is not exceeded.

Algorithm 4.2 A water-filling algorithm for sparse d-MCKP

1: Compute the efficiency for configuration j of item i for each pair (i, j), i ∈ {1, . . . , n}, j ∈ {1, . . . ,m}.

2: Sort all the configurations of all items in decreasing order of efficiency.

3: Go over the configurations list from the most efficient to the least efficient; add each

configuration to the solution if (a) its item has not been selected yet (in previous

configurations); (b) it does not exceed the resource pool in any dimension.

4: Return the resulting schedule.

Given that D ≤ n, sorting the configurations dominates the running time of this

algorithm, and its time complexity is O(n ·m · log(n ·m)).
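Algorithm 4.2 can be sketched in a few lines. The function name and data layout are assumptions: items[i] = (d_i, configs) with configs[j] = (profit, s1, s2), and K maps each dimension to its capacity. The sketch also assumes s1 > 0 for every configuration, which holds in Problem 6 because every transmission uses the BS, so the efficiency is always well defined.

```python
def water_filling(items, K):
    """Greedy heuristic; returns (total profit, {item index: config index})."""
    # Steps 1-2: compute the efficiency of every configuration and sort.
    ranked = []
    for i, (_, configs) in enumerate(items):
        for j, (p, s1, s2) in enumerate(configs):
            ranked.append((p / (s1 + s2), i, j))
    ranked.sort(key=lambda entry: entry[0], reverse=True)

    # Step 3: scan from most to least efficient.
    remaining = dict(K)
    chosen, total = {}, 0
    for _, i, j in ranked:
        if i in chosen:
            continue  # (a) this item already has a selected configuration
        d_i, configs = items[i]
        p, s1, s2 = configs[j]
        if s1 <= remaining[1] and s2 <= remaining[d_i]:  # (b) pools not exceeded
            remaining[1] -= s1
            remaining[d_i] -= s2
            chosen[i] = j
            total += p
    return total, chosen  # step 4
```

The scan itself touches each configuration once, which is why the sort dominates the running time.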

4.6 Adapting Our Algorithms to Model-2

We have shown that Problem 6 is equivalent to sparse d-MCKP for model-1. But this

is not the case for model-2 since here there are two BS scheduling areas. Thus, in each

configuration of each item (packet), there are at most 3 dimensions whose size can be

larger than 0: two correspond to the two BS scheduling areas (the first and second dimensions) and one to the default RN’s scheduling area. We can solve such

instances by applying a small change to Algorithm 4.1, as follows:

• In step 1 of Algorithm 4.1, the items are divided according to the dimension of

their default RN’s scheduling area. Thus, D − 2 item sets are created: M [d], for

d ∈ {3, . . . , D}.

• In step 3 we loop over all pairs (k1, k2) for k1 ∈ {0, . . . , K[1]} and k2 ∈ {0, . . . , K[2]}. In each iteration, a 3-MCKP instance, whose item set is M [d] and knapsack size

is [k1, k2], is solved using an algorithm for 3-MCKP, A3-MCKP.

• In step 7 we create an instance of 2-MCKP instead of MCKP. Here, the knapsack

size is [K[1],K[2]] and each item has (K[1] + 1) · (K[2] + 1) configurations. Each

configuration in step 7 is created while taking into account two dimensions (the

two that correspond to the BS scheduling areas).

• We run A2-MCKP to solve the 2-MCKP instance in step 8.

Lemma 4.6.1. If A3-MCKP is an α-approximation for 3-MCKP and A2-MCKP is a

β-approximation for 2-MCKP, the adapted version of Algorithm 4.1 is an (α · β)-

approximation for sparse d-MCKP in model-2.

Proof: The proof is similar to that of Lemma 4.5.1.

We now analyze the time complexity of the algorithm. Let T (A3-MCKP, n,m, k[1],

k[2], k[3]) be the running time of A3-MCKP on a 3-MCKP instance with n items, each

with at most m configurations, and knapsack size [k[1], k[2], k[3]]. The time complexity

of the algorithm is O(D ·K[1] ·K[2] · T (A3-MCKP, n,m,K[1],K[2],max{K[i]})), since

3-MCKP is solved instead of 2-MCKP. When the dynamic programming algorithm is

used as a procedure for solving 3-MCKP in step 4, an additional improvement can be

made, similar to the one made for Algorithm 4.1, which reduces the time complexity by

a factor of K[1] ·K[2].

Next, we explain how to adapt Algorithm 4.2 for model-2. For sparse d-MCKP

under model-2, an item i has at most 3 non-zero dimensions. However, since in each

configuration there are at most 2 dimensions whose size can be larger than 0, we can

run Algorithm 4.2, except that D + 2 dimensions should be considered. It is easy to see

that the running time and correctness analysis remain valid.

Figure 4.6: FFR in a cluster of 3 sectorized cells

4.7 OFDMA Joint Scheduling with Relays in a Sectorized Cell

So far we have considered a BS with an omni-directional antenna. However, relay nodes

(RNs) are often used in a sectorized cell, where the BS has several directional antennas.

Each antenna transmits in one sector, either to an RN or to a UE. In this section we

consider a cell with N sectors and R RNs in each sector, and assume that model-1

is used in each sector. However, the following discussion applies to model-2 as well.

Figure 4.6 shows a network with 3 cells, for which N = 3 and R = 2.

Assuming that a centralized BS scheduler serves all its sectors, as proposed in

Chapter 3, the problem this scheduler faces for every packet is which sector to use,

whether or not to route the packet through the RN of the chosen sector, and which MCS

to use over each link. It is shown in Chapter 3 that without RNs, the scheduling problem

of the BS in a sectorized cell is equivalent to a new problem called Multiple-Choice

GAP (MC-GAP). In this problem, which combines MCKP and GAP, each GAP bin

(sector) is an instance of MCKP. When each sector has one or more RNs, the centralized

scheduling problem is solved by defining an extension of MC-GAP, where each GAP

bin is an instance of sparse d-MCKP. The extended problem is called d-MC-GAP. It

is shown in Chapter 3 that any α-approximation to MCKP can be transformed into

an (α + 1)-approximation to MC-GAP. This transformation uses an approximation

of MCKP as a procedure. Using similar ideas, we can invoke an α-approximation to

sparse d-MCKP as a procedure for solving each GAP bin of d-MC-GAP, to obtain an

(α+ 1)-approximation to d-MC-GAP.

Thus, when Algorithm 4.1 uses the dynamic programming algorithm presented

in Appendix C as a procedure for solving 2-MCKP and the dynamic programming

algorithm presented in [49] as a procedure for solving MCKP, it can be transformed

into a 2-approximation for d-MC-GAP.

4.8 Simulation Study

In this section we present Monte-Carlo simulation results for the algorithms proposed in

the chapter. The purpose of this section is three-fold: (1) to compare the performance of

Algorithm 4.1 and Algorithm 4.2; (2) to study the impact of various network parameters

on the performance of our algorithms; and (3) to study the performance gain from using RNs.

4.8.1 Network Model

We consider a hexagonal network cell and its 2-hop neighboring cells (total of 19 cells).

Scheduling is performed in this cell, while the surrounding cells are considered for

the calculations of the SINR experienced by each receiver. Our interference model

and parameters are based on the 3GPP specifications [1] and on the work presented

in [85, 94], except that omni-directional antennas are considered instead of directional

antennas. These parameters are summarized in Table 4.2.

Parameter            Value        Parameter       Value
-------------------  -----------  --------------  --------
network layout       19 BSs       UE/RN height    1.5 m
system frequency     1,500 MHz    TX power        39 dBm
BS antenna height    20 m         TX ant. gain    18.9 dBi
inter-site distance  1,700 m      RN power        30 dBm
number of MCSs       7            system bw.      20 MHz

Table 4.2: Simulation network parameters

The average size of each data packet is 3.5 scheduled blocks if it is transmitted using

[QPSK, 1/2], which is the most robust MCS out of the 7 MCSs considered in this study.

For each MCS and link (BS→RN, RN→UE and BS→UE), the success probability of a

transmitted packet is determined from the corresponding SINR value at the receiver

using data taken from [12]. Our utility function in this section aims at maximizing the

number of successfully delivered packets. Thus, the profit from transmitting a packet to

a user using a particular MCS is taken as the probability that the packet is successfully

received over the BS→UE or the BS→RN→UE links. The cost of transmitting a packet

is equal to the number of scheduled blocks used in each link, which depends on the

length of the packet and the chosen MCS for each link. This cost is always rounded up

to the nearest integer.
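The profit and cost of a single transmission configuration can be sketched as below. The sketch rests on two assumptions that are not stated in the text: the success probability of a two-hop BS→RN→UE transmission is taken as the product of the per-link probabilities (independent links), and the block count of an MCS is obtained by dividing the [QPSK, 1/2] packet size by a hypothetical relative-efficiency factor.

```python
import math
from typing import Optional


def blocks_needed(size_at_qpsk_half: float, rel_efficiency: float) -> int:
    """Scheduled blocks used on one link, rounded up to an integer.

    rel_efficiency is a hypothetical ratio: bits per block of the chosen
    MCS divided by bits per block of [QPSK, 1/2].
    """
    return math.ceil(size_at_qpsk_half / rel_efficiency)


def delivery_profit(p_first_hop: float, p_second_hop: Optional[float] = None) -> float:
    """Probability that the packet reaches the UE.

    One argument models a direct BS->UE transmission. Two arguments model
    BS->RN->UE, assuming (for illustration) that the hops succeed independently.
    """
    if p_second_hop is None:
        return p_first_hop
    return p_first_hop * p_second_hop
```

With these helpers, a packet of average size 3.5 blocks sent with [QPSK, 1/2] costs `blocks_needed(3.5, 1.0)` blocks, which rounds up as described above.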

4.8.2 Interference Model

We start by describing how the SINR of each user is calculated. Let pt(u) be the power

received by UE u from transmitter t, where t is either a BS or an RN. In addition, let

T (t) be the set of transmitters, other than t, that transmit over the same subband used

by t. The SINR experienced by u is defined by:

$$\gamma_t(u) = \frac{p_t(u)}{\sum_{t' \in T(t)} p_{t'}(u) + n_0 w},$$

where w is the system bandwidth (20 MHz), n0 is the thermal noise over the bandwidth

w, and pt(u) is the end power given by the following equation [85]:

$$p_t(u) = p_t - PL_t(u) + g_t \quad \text{(dBm)}.$$

In this equation, pt is the dBm power of antenna t, and gt is the gain of this antenna.

PLt(u) is the path loss, estimated using the Hata propagation model. It is calculated

using the following equation [40]:

$$PL_t(u) = 69.55 + 26.16 \log_{10}(f_0) - 13.82 \log_{10}(z_t) - a(z_u) + \left(44.9 - 6.55 \log_{10}(z_t)\right) \log_{10}(d_t(u)),$$

where f0 = 1,500MHz is the transmission frequency, zt is the height (meters) of t’s

antenna, zu is the height (meters) of user u, dt(u) is the distance (kilometers) between

u and the antenna of t, and a(zu) = 0.8 + (1.1 · log10(f0)− 0.7) · zu − 1.56 log10(f0) is a

function that fits a small or medium sized city.
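The interference model can be sketched in Python as follows. The sketch uses the standard Hata form, in which the distance slope term depends on the transmitter antenna height, and it converts the dBm powers to milliwatts before forming the SINR ratio. The function names are hypothetical, and the noise power is left as a parameter because the text does not give a numeric value for it.

```python
import math
from typing import Iterable


def hata_path_loss_db(f_mhz: float, z_t: float, z_u: float, d_km: float) -> float:
    """Hata path loss (dB) for a small or medium-sized city."""
    log_f = math.log10(f_mhz)
    a = 0.8 + (1.1 * log_f - 0.7) * z_u - 1.56 * log_f  # correction term a(z_u)
    return (69.55 + 26.16 * log_f - 13.82 * math.log10(z_t) - a
            + (44.9 - 6.55 * math.log10(z_t)) * math.log10(d_km))


def received_power_dbm(p_t_dbm: float, gain_dbi: float, path_loss_db: float) -> float:
    """End power p_t(u) = p_t - PL_t(u) + g_t, in dBm."""
    return p_t_dbm - path_loss_db + gain_dbi


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def sinr(signal_dbm: float, interferers_dbm: Iterable[float], noise_mw: float) -> float:
    """Linear SINR: signal over (sum of co-channel interference + noise)."""
    interference_mw = sum(dbm_to_mw(p) for p in interferers_dbm)
    return dbm_to_mw(signal_dbm) / (interference_mw + noise_mw)
```

For instance, with the values of Table 4.2, `hata_path_loss_db(1500, 20, 1.5, 0.5)` gives the loss from the BS to a UE 500 meters away, and `received_power_dbm(39, 18.9, ...)` gives the corresponding end power.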

4.8.3 Simulation Results

To draw one point on each of the graphs presented in this section, we generate 100

random instances with different seeds and average their results.

We start by evaluating the performance gain from adding RNs. Throughout this

section, the normalized load is defined as the number of scheduled blocks required to

transmit all pending packets, if they are all transmitted directly by the BS using the

most efficient MCS, divided by the total number of scheduled blocks in a subframe over

all subbands.

We first consider model-1. The number of scheduled blocks in the reuse-1 subband

at the BS is set to 55 and the number of scheduled blocks in the reuse-1/3 subband

available to each RN is set to 15. We found that for the considered network parameters

(Table 4.2), placing the RNs at a distance of 500 meters from the BS results in a

reasonable SINR for a BS→RN transmission and a reasonable SINR for RN to cell-edge

UE transmissions.

To see the benefit from adding RNs to a network, we compare the performance

to that of a network that employs the same FFR scheme but does not employ RNs.

For a fair comparison, similar parameters are used with and without RNs. Specifically,

the same number of scheduled blocks for reuse-1/3 and reuse-1 subbands is considered.

Under these parameters, the maximum number of packets that can be scheduled in a

subframe with no RNs is 70 (15 in the reuse-1/3 subband and 55 in the reuse-1 subband),

and the normalized load is calculated according to this number. A UE is viewed as a

cell-edge UE if its distance from the BS is more than 700 meters, and its distance from

some RN is shorter than the distance of this RN from the cell edge. In this case, the

SINR for direct BS→UE transmission is very low. This allows us to simulate practical

scenarios where RNs are placed in areas where many UEs have a poor SINR for direct

transmission by the BS.

Figure 4.7 shows the performance gain when 3 RNs are placed in every cell. The

y-axis shows the ratio between the total profit obtained by Algorithm 4.1 for a network

with 3 RNs and the total profit obtained without RNs. The latter is determined by a

dynamic programming algorithm that obtains an optimal solution. The x-axis in this

figure is the normalized load as defined earlier. The figure shows 2 curves: in the lower

curve a UE is 5 times more likely to be a cell-edge UE than to be uniformly located

in the cell; in the upper curve this ratio increases to 10. As expected, the increase in

performance is greater when there are more cell-edge UEs. In addition, we can see that

with RNs the performance of the network increases by up to 60%. For small loads, the

increase is small since there are not many pending packets and they can be scheduled

in the reuse-1/3 subband of the BS when no RNs are used. But, as the load increases,

there is not enough reuse-1/3 bandwidth to accommodate all these packets. When these

packets are transmitted using the BS reuse-1 bandwidth, they acquire a small profit

due to a poor SINR. With RNs, however, these packets can be transmitted with good

SINR through the RNs. As the load increases further, there are more UEs closer to the

BS; these UEs do not require the assistance of the RNs and thus the performance gain

decreases.

For the parameters used for Figure 4.7, Algorithm 4.2 performs very close to

Algorithm 4.1. Thus, the same curves shown in Figure 4.7 for Algorithm 4.1 also

represent Algorithm 4.2. The reason for this is that with this set of parameters, the

efficiency of the transmission configurations that use the RNs is very high, which makes

them attractive for selection by Algorithm 4.2.

In Figure 4.8, we reduce the distance beyond which a UE is considered a cell-edge

UE from 700 to 500 meters. A UE is now 5 times more likely to be a cell-edge UE than to

be uniformly located within the cell. Other than that, we use the same parameters

as for Figure 4.7. The y-axis shows the ratio between the total profit obtained by the

algorithm (Algorithm 4.1 or Algorithm 4.2) and the maximum profit obtained if all

packets are transmitted directly by the BS using the most efficient MCS. The x-axis in

this figure is the normalized load as defined earlier.

Figure 4.7: The profit with 3 RNs divided by the profit with no RNs for two UE distributions (curves: UE in the edge factor = 10, UE in the edge factor = 5)

Figure 4.8: The profit with 3 RNs divided by the maximum profit for the two algorithms (curves: Algorithm 1, Algorithm 2)

This time, Algorithm 4.1 exhibits better performance than Algorithm 4.2 for high

loads. This is because for such loads more cell-edge UEs have a reasonable SINR for

the direct BS transmissions. Therefore, such configurations have a higher efficiency.

Algorithm 4.2 is more likely to choose direct BS→UE configurations, and it obtains a

smaller profit.

The performance gain due to the addition of RNs in the setting of Figure 4.8 is

smaller compared to the gain in the settings of Figure 4.7. This is expected, since more

UEs can be reached directly by the BS.

We now consider model-2. Recall that the decision about whether model-1 or

model-2 should be used depends on many factors and regulations that are beyond the

scope of this chapter. However, we compare the performance of our algorithms in two

different models to show that they are generic and they work well in different models.

Since the interferences are different for these two models, some network parameters,

such as the location of the RNs, are determined separately for each model.

As we did for model-1, we start with 3 RNs. All the parameters remain the same

as in model-1, except that now, because reuse-1 is employed by the BS and RNs, each

BS and each RN has 100 scheduled blocks. In addition, the distance of the RNs from


the BS increases to 700 meters. The RNs are required to be closer to the cell edge in

order to have a reasonable SINR for the transmissions from the RNs to cell-edge UEs.

Figure 4.9 shows the performance gain when 3 RNs are placed in every cell. The y-axis

shows the ratio between the total profit obtained by Algorithm 4.1 for a network with 3

RNs and the total profit obtained when RNs are not used. The latter is determined by

a dynamic programming algorithm that obtains an optimal solution. The x-axis in this

figure is the normalized load as defined earlier. Since model-2 schedules 2 consecutive

subframes together (see Figure 4.3(b)), in each subframe we set the number of packets

for the cell with no RNs to be half of the number of packets when the cell has RNs. This

is necessary in order to fairly evaluate how the addition of RNs affects performance.

We can see in Figure 4.9 that for low loads, the addition of RNs results in much

higher profit gain compared to Figure 4.7. This is because when reuse-1 is employed,

cell-edge UEs experience significant interference from neighboring cells. Their SINR for

direct BS transmission is so low that they cannot be reached without the assistance of

an RN. When the load increases, some UEs are closer to the BS and thus can receive

their packets directly. Therefore, the profit ratio decreases.

The next scenario for model-2 is obtained by reducing the distance beyond which a

UE is considered a cell-edge UE from 700 to 500 meters. In contrast to model-1, it turns out

that even in this case the performance of Algorithm 4.2 is very close to the optimal

solution (no graph is presented for this scenario). This is because the RNs and BS use

the same subband, and they interfere with each other. Consequently, a packet can have

a reasonable SINR for direct BS→UE transmission or for BS→RN→UE transmission,

but not for both. This leads to a good transmission success probability and to good

efficiency only for one configuration, which is likely to be selected by Algorithm 4.2.

Finally, we increase the number of RNs to 6. For model-1 we set the number of

scheduled blocks in the reuse-1 subband to 70 and the number of scheduled blocks

in the reuse-1/3 subband of each RN to 10. These parameters are chosen such that

there are sufficient scheduled blocks for the BS to transmit enough packets to each RN

simultaneously in the same subframe. In model-1 the RN distance is 500 meters and in model-2 it is 700 meters. In both models a UE is considered a cell-edge UE if its distance from the BS is at least 700 meters, and its distance from the RN is not more than the distance of the RN from the cell edge. The results are shown in Figure 4.10(a) for model-1 and Figure 4.10(b) for model-2. While increasing the number of RNs allows better spatial reuse, it also increases the interference experienced by each RN. Therefore, with 6 RNs each RN uses a more robust and less efficient MCS for its transmissions, and the performance gain is similar to that experienced for 3 RNs.

Figure 4.10: (a) model-1; (b) model-2

4.9 Conclusions

We defined the scheduling problem for an OFDMA cell with relay nodes (RNs) as

a new optimization problem called sparse d-MCKP and proved it is NP-hard. We

proposed an algorithm with a performance guarantee and also developed a water-filling

algorithm with simple implementation and low time complexity. We focused on a specific

model and evaluated the performance of our algorithms for this model. Although the

algorithms were presented in the context of this model, they can be easily adapted to

other FFR models of an OFDMA wireless network with RNs. We used an extensive

simulation study to compare the two algorithms. Our main conclusion is that our

water-filling heuristic usually performs as well as our approximation algorithm, even if the latter is

implemented such that its results are optimal. We also showed that increasing the

network throughput with RNs is not a trivial task, and it depends on the location of

the RNs and the UEs, and on the number of scheduled blocks available to each RN.

Chapter 5

Efficient Allocation of Periodic

Feedback Channels in Broadband

Wireless Networks

5.1 Introduction

The contribution of this chapter is threefold. It is, to the best of our knowledge, the

first to present a formal framework for the allocation of periodic CSI channels. It also

defines, again for the first time, several problems relevant to this framework and presents

efficient algorithms for solving them. Finally, it presents a holistic scheme that indicates

when the BS should invoke each of the proposed algorithms.

The framework proposed in this chapter defines a profit/utility function for the

allocation of a CSI channel to each MS. While the proposed framework and algorithms

are general enough to address every profit function, we propose and discuss a specific

function, for which the profit is equal to the expected number of packets transmitted to

an MS using a correct CSI value due to the allocation of a CSI channel with a certain

bandwidth.

The rest of this chapter is organized as follows. In Section 5.2, we discuss related

work. In Section 5.3, we show how to allocate slots to CSI channels using a complete

binary tree, in order to guarantee an efficient collision-free allocation, and describe

the considered CSI channel allocation model. Section 5.4 is the core of the chapter.

It defines the CSI allocation problems and presents efficient algorithms for them. In

Section 5.5 we study the performance of the various algorithms and present a complete

BS scheme for the allocation of CSI channels. Finally, Section 5.6 concludes the chapter.

5.2 Related Work

Previous works have addressed aspects of the problem other than the one we address here. For example, with the exception of [59] and [90], previous works have not attempted to adjust the periodicity of the CQI reports to the specific needs of each MS. Rather, they have tried to reduce the cost of the CQI reports by: (i) not sending CQI reports if the channel condition has not significantly changed [29, 35, 43, 46, 83, 89]; (ii) sending a single CQI report for a group of MSs [56]; or (iii) sending a single CQI report for a subset of OFDM subchannels [83, 86]. All these works are orthogonal to the scheme and algorithms presented in this chapter.

Figure 5.1: (a) A CSI super-channel consists of the same slot in every uplink OFDMA frame; (b) a CSI channel consists of the same slot in every $\tau = 2^i$ frames

In [59, 90], the authors propose a CQI allocation scheme for 802.16. Their scheme

views the CQI bandwidth as a “toy brick.” In contrast to these works, we represent the

CSI bandwidth as a binary tree, which allows us to minimize the number of changes

for allocating a CSI channel when the available CSI bandwidth is fragmented. We also

allow different channels to have different profit functions and seek to optimize the total

profit of the BS.

In [30], the authors address the OVSF code assignment problem. While their work

does not target the allocation of CSI channels, some of their results are relevant to us.

In particular, the allocation framework proposed in this chapter is based on a complete

binary tree that is similar to the OVSF tree used in [30]. However, OVSF codes in [30]

can only be assigned to a specific level in the tree whereas we allow each CSI channel to

be associated with different levels and profit.

In [46], an adaptive CQI scheme is proposed, where a node reports the CQI value

only if it has changed since the last report or if a timer expires. With the proposed

scheme, battery capacity of the MS is conserved and uplink interference is reduced.

While [46] also considers periodic CQI channels, it does not, in contrast to our scheme,

(a) attempt to change the periodicity of the CQI reports; (b) address the case where

the CQI bandwidth is insufficient for all the CQI channels.

In [89], the problem of getting too many CQI reports at the BS is studied. The goal

of the proposed scheme is to reduce the number of these reports by careful selection of

the specific OFDM subchannels for which such reports are required. In [43] a similar

scheme is proposed, which also takes into account the QoS requirements of each MS.

In [29], a new metric for the performance of CQI schemes is proposed and studied. It

takes into account the total resources consumed by each CQI scheme. It is then used

for comparing different, periodic and aperiodic, CQI schemes with different SNR values.

In [86], the authors propose to reduce the CQI bandwidth cost by reporting a single

CQI value for a subset of sufficiently proximate OFDM subchannels. A hierarchical

tree is used to create groups of subchannels. In [83], a similar hierarchical mechanism is

used, but only CQI values with sufficient quality are reported. It is claimed that the

proposed scheme can significantly reduce the CQI feedback overhead at the expense of

a little downlink performance degradation. In [56], proximate MSs are considered as a

“CQI feedback group,” and only one representative node is asked to send a CQI report.

This chapter deals with the allocation of feedback channels, and not with how and

when the nodes send feedback information. This important topic is addressed by many

papers, some of which are mentioned in what follows.

In [79], the authors present an efficient method for calculating the PMI at the

receiver. The method is based on maximizing the mutual information between the

transmitted and received symbols with respect to the precoding matrix applied at the

transmitter.

In [78], the authors present an efficient method for calculating the PMI, RI and CQI

at the MS. To reduce the MS computational burden, the proposed method decomposes

the problem into two separate steps: jointly evaluating the PMI and RI using a mutual

information metric, and choosing the CQI value to achieve a given target block error

ratio constraint.

In [15], the authors discuss the suitability of two options for the closed loop precoded

MIMO transmission in LTE uplink. The first option is to use the same codebook

of precoding matrices defined for LTE downlink, while the second option exploits

the singular value decomposition of the channel matrix. Qualitative benefits of both

solutions are discussed.

Finally, in [11] the authors give a brief overview of the LTE and LTE-advanced

system downlink transmission and discuss different precoding matrix selection criteria.

Following the analytical and numerical results, the authors conclude that the “minimum

post-mean squared error” based criterion is a good candidate for precoding matrix

selection at the receivers.

5.3 Preliminaries

5.3.1 CSI channels

Decision-making schemes that might decide not to send certain CSI reports [29, 35,

43, 46, 83, 89], e.g., if the channel condition has not changed notably, cannot easily

take advantage of the unused slots. This is because these slots are too short for regular

packets and because the MS cannot rely on their availability. The approach taken in this

chapter is different in the sense that the BS allocates different bandwidth to different CSI

channels in accordance with each channel’s individual profit function. Using the scheme

we propose, the BS views the CSI bandwidth as a shared resource, to be dynamically

allocated to the MSs. The BS can also adjust the size of this resource. For instance,

when it realizes that there are not so many dynamic MSs in its cell, the BS can decrease

the total CSI bandwidth and use it for other purposes. The CSI bandwidth is divided

into several super-channels. A super-channel consists of one slot in every uplink frame

(Figure 5.1a). Therefore, the number of such super-channels is equal to the number

of CSI slots in every frame. Each super-channel is divided into multiple CSI channels

(Figure 5.1b). This chapter presents algorithms for the division of a super-channel into

multiple channels and for the allocation and deallocation of these CSI channels. To

allocate a CSI channel, the BS sends to an MS a control message with the following

parameters:

(a) The sequence number of the first frame that contains a slot of this channel.

(b) The number of frames τ between two consecutive slots of this channel.

(c) The time during which this CSI channel is allocated to the MS. The BS can also

allocate the channel with no expiration time, and then explicitly request it back.

A CSI channel Cj is denoted αj |τj , where αj is the sequence number of the first

frame that contains a slot of this channel and τj is the periodicity of the slots. A smaller

value of τj means more frequent CSI reports, which provide the BS with more accurate

information about the channel state of the corresponding MS. However, if τj is too

small, the BS is likely to receive too many identical CSI reports. Therefore, the optimal

value of τj depends on the stability of the channel, which is affected by many factors

such as MS mobility speed, physical obstacles, weather conditions, interference from

other BSs/MSs or other wireless networks.

5.3.2 Power of 2 allocation

A power of 2 allocation is an allocation of CSI channels for which $\tau = 2^i$ holds for every

channel, where i is an integer between 0 and C. Such an allocation is useful because it

can prevent collisions between slots of two different CSI channels.

Definition 5.3.1. Two or more CSI channels are said to collide if they contain the

same slot. In other words, a collision occurs between α1|τ1 and α2|τ2 if for some integers

x > 0 and y > 0, α1 + τ1 · x = α2 + τ2 · y.

We now show how a power of 2 allocation can be performed when the bandwidth of each super-channel is maintained using a complete binary tree $T_C$ whose height is C. We refer to such a tree as a CSI allocation tree. Then we shall see how such an allocation can be guaranteed to be collision-free. The leaves of $T_C$ are in level 0, their parents are in level 1, and so on. We assign a label to every tree node in the following way. For a node in level l, the assigned label consists of C − l digits, of which the first C − l − 1 are the same as those of the node's parent and the last digit is set to 0 for a left child or to 1 for a right child. Figure 5.2 gives an example.

Figure 5.2: An example of a labeled CSI allocation tree for a super-channel (node labels by level: level 3 (C): the unlabeled root; level 2: 0, 1; level 1: 00, 01, 10, 11; level 0: 000, 001, 010, 011, 100, 101, 110, 111)

Figure 5.3: Examples for two collision-free allocations

Let r be the reversed label of a node v in level l of the tree, and d(r) be the decimal value of r.

Then, node v is the root of a subtree whose height is l, associated with the CSI slots

$d(r)\,|\,2^{C-l}$. For example, node v1 in the tree of Figure 5.2 is the root of a subtree whose

height is 2 associated with the CSI slots 0|2, while node v2 is the root of a subtree

whose height is 1 associated with the slots 1|4. We now prove that if each root-to-leaf

path in the allocation tree has at most one allocated node, then the CSI channels

represented by the tree do not collide. For instance, consider the two trees in Figure 5.3

and suppose that the black nodes indicate allocated slots. In both trees there is at most

one allocated node on every root-to-leaf path. By the lemma below, this indicates that

the CSI channels represented by the allocated tree nodes are collision-free. The fraction

near every black node indicates the fraction of the super-channel bandwidth assigned to

the corresponding CSI channel.
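The mapping from a tree node to its CSI channel can be sketched as follows. The string representation of labels and the function name `node_channel` are assumptions made for this illustration; the root is represented by the empty label.

```python
from typing import Tuple


def node_channel(label: str, C: int) -> Tuple[int, int]:
    """Return (alpha, tau) for the node with the given binary label.

    A node in level l has a label of C - l digits, so its level is
    recovered from the label length.
    """
    level = C - len(label)
    reversed_label = label[::-1]                       # r
    alpha = int(reversed_label, 2) if reversed_label else 0  # d(r); root -> 0
    tau = 2 ** (C - level)                             # periodicity 2^(C - l)
    return alpha, tau
```

In a tree of height C = 3, the level-2 node labeled `0` maps to the channel 0|2 and the level-1 node labeled `10` maps to 1|4, which agrees with the two examples given above; the eight leaves map to the eight distinct channels 0|8 through 7|8.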

Lemma 5.3.2. Two nodes of a CSI allocation tree are on the same root-to-leaf path if

and only if their corresponding slots collide.

Proof: Consider node v1 in level l1 and node v2 in level l2 of the tree. Without loss of generality, let l1 > l2. Recall that the corresponding CSI slots of v1 and v2 are $d(r_1)\,|\,2^{C-l_1}$ and $d(r_2)\,|\,2^{C-l_2}$, where r1 and r2 are the reverse labels of v1 and v2 respectively.

If v1 and v2 are on the same root-to-leaf path, the last C − l1 digits of r1 and r2 are identical. Therefore, there exists an integer x such that $d(r_1) + 2^{C-l_1} \cdot x = d(r_2)$, implying that the corresponding CSI slots of v1 and v2 collide. If v1 and v2 are not on the same root-to-leaf path, the last C − l1 digits of r1 and r2 are different. Therefore, for every two integers x and y, $d(r_1) + 2^{C-l_1} \cdot x \neq d(r_2) + 2^{C-l_2} \cdot y$ holds.

Figure 5.4: Fragmentation of a CSI channel

When a CSI channel is allocated two nodes that are not in each other’s subtree,

channel fragmentation occurs. For example, assigning the slots 0|16 and 1|4 to the same

MS is translated to the allocation of slots shown in Figure 5.4. We can see that the

slots are not uniformly distributed along the time axis. This is a suboptimal allocation

because some of the slots are too close to previous slots and are not useful. For this

reason, we do not allow channel fragmentation.
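Lemma 5.3.2 and the fragmentation example can be checked by brute force on small trees. The sketch below is only an illustration under stated assumptions: labels are strings, a channel is a pair (alpha, tau) with alpha < tau, and slots are compared over one period of the slower channel, which suffices because the shorter period divides the longer one.

```python
from itertools import product
from typing import List, Tuple

Channel = Tuple[int, int]  # (alpha, tau)


def channel(label: str) -> Channel:
    """Channel of a tree node: alpha = d(reversed label), tau = 2^(label length)."""
    return (int(label[::-1], 2) if label else 0, 2 ** len(label))


def on_same_path(l1: str, l2: str) -> bool:
    """Two nodes share a root-to-leaf path iff one label is a prefix of the other."""
    return l1.startswith(l2) or l2.startswith(l1)


def collide(c1: Channel, c2: Channel) -> bool:
    """True if the two channels share a frame within one period of the slower one."""
    window = max(c1[1], c2[1])
    return bool(set(range(c1[0], window, c1[1])) & set(range(c2[0], window, c2[1])))


def lemma_holds(C: int) -> bool:
    """Check 'same path <=> collide' for every pair of nodes in a tree of height C."""
    labels = ["".join(b) for n in range(C + 1) for b in product("01", repeat=n)]
    return all(on_same_path(a, b) == collide(channel(a), channel(b))
               for a in labels for b in labels)


def gaps(channels: List[Channel], horizon: int) -> List[int]:
    """Distances between consecutive slots when one MS holds all the channels."""
    frames = sorted(set().union(*(range(a, horizon, t) for a, t in channels)))
    return [later - earlier for earlier, later in zip(frames, frames[1:])]
```

Calling `gaps([(0, 16), (1, 4)], 16)` reproduces the fragmented pattern discussed above: the first two slots are one frame apart while the remaining ones are four frames apart, whereas a single channel such as 1|4 yields uniform gaps.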

5.3.3 CSI Allocation Framework

Following the discussion above, we now describe our requirements from a CSI allocation

framework:

(R1) Collisions and fragmentation of CSI channels are not allowed. Therefore, (a) a

super-channel is divided into multiple CSI channels using a complete binary tree;

(b) each CSI channel consists of at most one tree node, which is the root of a

subtree; (c) subtrees allocated to different CSI channels are mutually disjoint.

(R2) For each tree level l and MSj , a profit function Pj(l) indicates the “profit of the

system” from allocating this CSI channel to this MS.

While our framework is general enough to address every Pj function, throughout

the chapter we focus on the following specific function:

$$P_j(l) = \begin{cases} E_j \cdot 2^{l_j^{MAX}} & \text{if } l > l_j^{MAX}, \\ E_j \cdot 2^{l} & \text{otherwise.} \end{cases} \tag{5.1}$$

As shown later, this function guarantees that the profit is equal to the number of packets

expected to be transmitted to MSj using a correct CSI value.

With the proposed Pj(l) function, the BS estimates how dynamic the downlink

channel of MSj is. This estimation is translated into the average time window wj during

which the CSI value of MSj changes. The BS also calculates the average data packet

rate rj for MSj , and sets Ej ← wj · rj . Consequently, Ej represents the average number

of consecutive packets transmitted to MSj using a correct CSI value (Figure 5.5).

For example, if a CSI channel is allocated to MSj in level 0 (a leaf tree node), then

the periodicity of this CSI channel is $2^C$, and therefore Pj(0) = Ej is the desired value.

If we allocate to MSj a CSI channel in a higher level, then Ej is multiplied by the

number of CSI reports in a time window of $2^C$ subframes. When the time between two

CSI reports becomes close to wj , the BS is likely to receive from MSj many identical

CSI reports. Thus, there is an upper bound lMAXj on the level of the tree for which

extra profit is obtained.

Figure 5.5: Consecutive packets transmitted to MSj using a correct CSI value. On average, rj · wj packets are transmitted by the BS using a correct CSI value during the window from t to t + wj.
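
The profit function of Eq. (5.1) can be written as a short helper. This is a minimal sketch, not the implementation used in the thesis: the function name and argument names are hypothetical, and the case `level = -1` follows the convention that an MS without a CSI channel (beyond its initial one) has zero profit.

```python
def profit(e_j, level, l_max):
    """Profit P_j(l) of Eq. (5.1) for one MS (hypothetical helper).

    e_j   -- E_j = w_j * r_j, with w_j and r_j expressed in the same time unit
    level -- tree level of the allocated node (0 is a leaf, -1 means no channel)
    l_max -- l_j^MAX, the level above which no extra profit is obtained
    """
    if level < 0:
        return 0  # no CSI channel allocated
    # Above l_max the profit saturates at E_j * 2^l_max.
    return e_j * 2 ** min(level, l_max)


if __name__ == "__main__":
    for lvl in range(-1, 5):
        print(lvl, profit(3, lvl, 2))
```

For an MS with Ej = 3 and lMAXj = 2, the profit doubles with each level up to level 2 and stays constant above it.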

5.4 Algorithms for CSI Allocation

5.4.1 Optimization Criterion

In this section we address the following problems related to the CSI allocation framework

described in Section 5.3:

1. How to allocate bandwidth to CSI channels when a tree (super-channel) is empty.

2. How to reallocate the bandwidth of a released CSI channel.

3. How to allocate a CSI channel to a new MS when the available CSI bandwidth is

fragmented.

4. How to change the bandwidth of a CSI channel in order to take into account

changes in the profit function of some MS(s), e.g., due to a new mobility pattern.

We first present algorithms for the various cases and then combine them into a scheme

that indicates when each algorithm should be executed by the BS.

When a new MS enters the cell, the BS needs to determine its corresponding profit

function. To this end, the BS allocates a basic (minimum bandwidth) CSI channel to

every active MS. The bandwidth dedicated for the initial CSI channels is assumed to

be sufficient for all active MSs. This channel is used by the BS in order to determine

the initial Ej value for MSj , and to allocate a broader CSI channel when necessary.

Since the BS can easily determine the expected number of packets transmitted to MSj

between two CSI reports, it can also determine Ej and Pj(l). To simplify the discussion,

if no CSI channel is allocated to MSj (except the initial channel), we say that MSj is

allocated a tree node at level lj = −1, and that Pj(−1) = 0.

In all the problems defined below, the optimization criterion is maximizing the total

profit of the system, i.e., $\sum_{j=1}^{n} P_j(l_j)$, where MS1, . . ., MSn are the active MSs. As

explained earlier, since our profit function is equal to the expected number of packets

transmitted when the BS has a correct CSI value, maximizing the total profit is

equivalent to maximizing the total number of packets transmitted to all active MSs

using a correct CSI value. This is also equivalent to minimizing the expected number of

packets transmitted by the BS when it has an incorrect CSI value.

5.4.2 CSI Allocation When the Tree Is Empty

We start with the basic problem, where we assume that the tree is empty and the goal

is to find the best allocation for a given set of active MSs. This problem is referred to

as CF-CSI-E (Collision Free CSI allocation in an Empty tree), and is formally defined

as follows:

Problem 7 (CF-CSI-E):

Instance: The height of the allocation tree C and the profit function Pj for every

active MSj , 1 ≤ j ≤ n.

Objective: Find an allocation of CSI channels to the active MSs such that the total

profit is maximized.

We now show that CF-CSI-E can be reduced to the Multiple Choice

Knapsack Problem (MCKP) [49]. An MCKP instance is a set of m mutually disjoint

classes N1, . . . , Nm of items to be packed into a knapsack of capacity B. Each item

i ∈ Nj has a profit pij and a weight wij . The objective is to choose at most one item

from each class such that the aggregated profit is maximized and the aggregated weight

is not larger than B.

To reduce an instance of CF-CSI-E to an instance of MCKP, each CSI channel Cj is

represented by a class Nj , and for each level i CSI subtree that can be allocated to Cj

(1 ≤ i ≤ lMAXj ) there is an item i ∈ Nj . The knapsack capacity is set to $B = 2^C$. The

weight of i ∈ Nj is set to $w_{ij} = 2^i$ and the profit is set to $p_{ij} = P_j(i)$.

The above reduction gives rise to the following algorithm for CF-CSI-E:

Algorithm 5.1 An algorithm for CF-CSI-E

1: Reduce the CF-CSI-E instance to an MCKP instance as described above.

2: Run an algorithm, AMCKP, that finds a solution to the MCKP instance.

3: Translate the solution returned by AMCKP to a solution for CF-CSI-E, such that a

CSI channel Cj is allocated a tree node in level i if item i in class Nj is chosen for

the MCKP solution.

Lemma 5.4.1. If AMCKP is an α-approximation to MCKP, Algorithm 5.1 is an α-

approximation to CF-CSI-E.

Proof: Let POPT be the total profit of the optimal solution to the CF-CSI-E

instance, and PS be the total profit of the solution returned by Algorithm 5.1. The

translations performed in steps 1 and 3 maintain the same total profit with respect to

the optimal solution. Therefore, since in step 2 AMCKP is invoked, α · PS ≥ POPT.

MCKP has a simple 2-approximation greedy algorithm whose running time is

O(I log I), where I is the total number of items [49]. Using linear selection, the running

time can be improved to O(I) [49]. It also has a pseudopolynomial time optimal dynamic

programming algorithm whose running time is O(B · I) [49]. For practical instances,

$B \le 2^{10} = 1024$ holds for CF-CSI-E, because this allows a periodicity of up to 1 second.

Thus, the running time for the optimal dynamic programming algorithm is O(I). This

algorithm is converted into an optimal polynomial time algorithm for CF-CSI-E.
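
The reduction and the pseudopolynomial dynamic program can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: the function name is hypothetical, the profit function is the one of Eq. (5.1), each MS is described by a pair (Ej, lMAXj), and levels 0 through lMAXj are offered to each MS (level 0 being a leaf). A returned level of -1 means that no CSI channel is allocated.

```python
def solve_cf_csi_e(C, stations):
    """Optimal CF-CSI-E levels via the O(B * I) MCKP dynamic program.

    C        -- height of the allocation tree (knapsack capacity B = 2^C)
    stations -- list of (E_j, l_max_j) pairs, one per active MS
    Returns (total_profit, levels) with levels[j] in {-1, 0, ..., l_max_j}.
    """
    B = 2 ** C
    best = [0] * (B + 1)  # best[b]: max profit with total weight <= b
    choices = []          # choices[j][b]: level picked for MS j at capacity b
    for e_j, l_max in stations:
        new = best[:]     # default: pick no item from this class
        pick = [-1] * (B + 1)
        for level in range(min(l_max, C) + 1):
            weight = 2 ** level          # a level-l node covers 2^l leaves
            gain = e_j * 2 ** level      # P_j(level) for level <= l_max
            for b in range(weight, B + 1):
                candidate = best[b - weight] + gain
                if candidate > new[b]:
                    new[b] = candidate
                    pick[b] = level
        best = new
        choices.append(pick)
    # Backtrack from the full capacity to recover the chosen levels.
    levels = [-1] * len(stations)
    b = B
    for j in range(len(stations) - 1, -1, -1):
        levels[j] = choices[j][b]
        if levels[j] >= 0:
            b -= 2 ** levels[j]
    return best[B], levels


if __name__ == "__main__":
    print(solve_cf_csi_e(2, [(3, 0), (1, 2)]))
```

The result gives only the level of each CSI channel; mapping levels to specific tree nodes is a separate step.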

The solution found by Algorithm 5.1 indicates only the tree level of each CSI channel

and not the specific tree node. However, given such a solution, we use the following

result, stated in [30] in the context of OVSF code assignment, to convert this information

into a concrete allocation:

There exists a collision-free allocation of the tree nodes if and only if $\sum_{u \in V} 2^{l(u)} \le 2^C$, where V is the set of all allocated nodes in the tree,

l(u) is the level of an allocated node u in the tree, and C is the height of

the tree.

Figure 5.2 shows how a specific level l node is represented by a label consisting of C − l digits. To obtain a concrete allocation, we sort the nodes in descending order of their

level and for each node in level l, we find the smallest (in lexicographic order) label of

C − l digits that is still available. This process takes O(|V | log |V |) time, where V is

the set of all allocated nodes in the tree.
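
The conversion from levels to concrete nodes can be sketched as below. This is a hedged illustration: it assumes that a node label is the binary string of its root-to-leaf path prefix (C − l binary digits for a level-l node), which may differ in detail from the labeling of Figure 5.2, and the function name is hypothetical. Because nodes are processed in descending order of level, the smallest available label is always the one that starts at the current leaf offset.

```python
def assign_nodes(C, levels):
    """Assign a binary label of C - l digits to each requested level-l node.

    Raises ValueError if the levels violate sum(2^l) <= 2^C, in which case
    no collision-free allocation exists.
    """
    if sum(2 ** l for l in levels) > 2 ** C:
        raise ValueError("requested nodes do not fit in the tree")
    order = sorted(range(len(levels)), key=lambda j: -levels[j])
    labels = [None] * len(levels)
    offset = 0  # index of the first leaf not yet covered by an allocated node
    for j in order:
        l = levels[j]
        digits = C - l
        # offset is a multiple of 2^l here, so offset >> l is the node index
        # within its level; the root (digits == 0) gets the empty label.
        labels[j] = format(offset >> l, "b").zfill(digits) if digits else ""
        offset += 2 ** l
    return labels


if __name__ == "__main__":
    print(assign_nodes(3, [1, 2, 0]))
```

No label produced this way is a prefix of another, which is exactly the condition that the allocated subtrees are mutually disjoint.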

The reduction of a CF-CSI-E instance to an MCKP instance in step 1 can be

performed in O(C · n) time. If the running time of the MCKP algorithm used in step 2

is TMCKP(I,B), the total running time of Algorithm 5.1 is $O(C \cdot n + T_{MCKP}(C \cdot n, 2^C))$.

5.4.3 CSI Allocation with No Change to Previously Allocated CSI Channels

We now define the second problem, referred to as CF-CSI-NC (Collision Free CSI

allocation with No Change to previously allocated CSI channels). Here, some bandwidth

of a super-channel tree becomes available following the release of a CSI channel when

an active MS leaves the cell or becomes inactive. This bandwidth can be allocated by

the BS to improve the total profit gained by the current active MSs.

Problem 8 (CF-CSI-NC):

Instance: The height of the allocation tree C, the profit function Pj for every active

MSj , 1 ≤ j ≤ n, and information about already allocated CSI channels.

Objective: Allocate the unused CSI bandwidth such that the gained profit is maximized.

Definition 5.4.2.

(a) A free subtree in T is a subtree that contains only free nodes.

(b) A free subtree is max-free if the subtree rooted at its parent is not free.

Figure 5.6: A CSI tree with its 4 max-free subtrees (black nodes are occupied)

For example, Figure 5.6 shows 4 max-free subtrees (one of which is a leaf).
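
Definition 5.4.2 can be made concrete with a small routine that lists the max-free subtrees of a partially occupied tree. This is a sketch under assumptions that are not taken from the thesis: nodes are identified by the binary string of their path from the root (the empty string is the root), occupied nodes are given as such strings, and the function name is hypothetical. The example tree is made up and is not the tree of Figure 5.6.

```python
def max_free_subtrees(C, occupied):
    """Return the labels of the roots of all max-free subtrees.

    C        -- height of the tree (leaves have labels of C binary digits)
    occupied -- iterable of labels of allocated nodes (mutually disjoint subtrees)
    """
    occ = set(occupied)
    roots = []

    def is_free(label):
        """True if the subtree rooted at label contains no occupied node."""
        if label in occ:
            return False
        if not any(o.startswith(label) for o in occ):
            return True
        # Some strict descendant is occupied: the free children (if any)
        # are max-free, because the subtree rooted here is not free.
        for child in (label + "0", label + "1"):
            if is_free(child):
                roots.append(child)
        return False

    if is_free(""):
        roots.append("")  # the whole tree is empty
    return roots


if __name__ == "__main__":
    for label in max_free_subtrees(3, ["000", "10"]):
        print(label, "height", 3 - len(label), "capacity", 2 ** (3 - len(label)))
```

A max-free subtree with a label of d digits has height C − d and therefore holds 2^(C−d) leaves.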

We now present an algorithm for CF-CSI-NC that is based on a reduction to the

Multiple Choice Multiple Knapsack Problem (MC-MKP), which is an extension of

MCKP to multiple knapsacks [49]. The instance of MC-MKP is a set of m mutually

disjoint classes N1, . . . , Nm of items and a set B = (B1, . . . , B|B|) of knapsack capacities.

Each item i ∈ Nj has a profit pij and a weight wij . The objective is to choose at most

one item from each class and pack it in one of the knapsacks such that the total profit

is maximized and the aggregated weight in each knapsack does not exceed its capacity.

As an example, we now define an MC-MKP instance and show its optimal solution.

Our instance consists of two knapsacks with capacities B1 = 1 and B2 = 2, and 3 classes

of items. Each class contains 2 items whose weights and profits are: w11 = w12 = w13 = 1,

p11 = p12 = 2, w21 = w22 = w23 = 2, p21 = p22 = 4, p13 = 3, p23 = 6. An optimal

solution for this instance is to pack item 1 from class N1 in knapsack B1 and item 2

from class N3 in knapsack B2. This solution has a total profit of 8.
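
The optimum of this small instance can be checked by exhaustive search. The sketch below is only a verification aid with hypothetical names, not an algorithm proposed in the thesis; it enumerates, for every class, either no item or one (item, knapsack) pair, and keeps the best feasible packing.

```python
from itertools import product


def mc_mkp_bruteforce(classes, capacities):
    """Exhaustive MC-MKP solver for tiny instances.

    classes    -- list of classes, each a list of (weight, profit) items
    capacities -- list of knapsack capacities
    Returns the maximum total profit.
    """
    k = len(capacities)
    # Per class: skip it (None) or put item i into knapsack z.
    options = [
        [None] + [(i, z) for i in range(len(items)) for z in range(k)]
        for items in classes
    ]
    best = 0
    for combo in product(*options):
        load = [0] * k
        total = 0
        for items, selection in zip(classes, combo):
            if selection is None:
                continue
            i, z = selection
            weight, gain = items[i]
            load[z] += weight
            total += gain
        if all(load[z] <= capacities[z] for z in range(k)):
            best = max(best, total)
    return best


if __name__ == "__main__":
    # Classes N1, N2, N3 of the example; knapsacks B1 = 1 and B2 = 2.
    example = [[(1, 2), (2, 4)], [(1, 2), (2, 4)], [(1, 3), (2, 6)]]
    print(mc_mkp_bruteforce(example, [1, 2]))  # prints 8
```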

To reduce an instance of CF-CSI-NC to an instance of MC-MKP, each CSI channel

Cj is represented by a class Nj , and for each level i subtree, which can be allocated

to Cj (1 ≤ i ≤ lMAXj ), there is an item i ∈ Nj . Each max-free subtree of height h is

represented by a knapsack of capacity $2^h$ in B.

As an example of the reduction of CF-CSI-NC into MC-MKP, consider the CSI

tree in Figure 5.6. This tree is translated into an MC-MKP instance that consists of 4

knapsacks whose capacities are 1, 2, 2, and 4. Suppose that there is one active MS with

Ej = 3 and lMAXj = 2. This MS is translated into a class with 3 items: item 1 has a

profit of Ej = 3 and weight 1, item 2 has a profit of 6 and weight 2, and item 3 has a

profit of 12 and weight 4.

The above reduction gives rise to the following algorithm for CF-CSI-NC:

Algorithm 5.2 An algorithm for CF-CSI-NC

1: Reduce the CF-CSI-NC instance to an MC-MKP instance as described above.

2: Run an algorithm, AMC-MKP, that finds a solution for the MC-MKP instance.

3: Translate the solution returned by AMC-MKP to a solution for CF-CSI-NC, such

that a CSI channel Cj is allocated a tree node in level i of subtree z if item i in

class Nj is packed in knapsack Bz of the MC-MKP solution.

Lemma 5.4.3. If AMC-MKP is an α-approximation to MC-MKP, Algorithm 5.2 is an

α-approximation to CF-CSI-NC.

Proof: Since the translations in steps 1 and 3 maintain the same total profit with

respect to the optimal solution, the proof is similar to that of Lemma 5.4.1.

We now present a 2-approximation greedy algorithm for MC-MKP. This algorithm

combines the 2-approximation greedy algorithm for MCKP [49] and the 2-approximation

algorithm for MKP [49].

Algorithm 5.3 A 2-approximation greedy algorithm for MC-MKP

1: For each class Nj with m items, create m new items, the first of which is the first

item from Nj . For each item i > 1, the weight and profit are wij − w(i−1)j and

pij − p(i−1)j , respectively. From now on the algorithm relates to the I newly generated

items.

2: Sort the new items in decreasing order of their efficiencies (profit divided by weight).

3: Go over the knapsacks in increasing order of capacity. For each knapsack try

to pack items in decreasing order of their efficiencies (only items whose weight is

smaller than the current knapsack capacity are considered). The first item that

does not fit into knapsack z is called the split item for z and denoted sz.

4: Return the maximum between the items packed so far and the solution obtained

by packing of sz in knapsack z.

Lemma 5.4.4. Algorithm 5.3 is a 2-approximation to MC-MKP.

Proof: Consider the linear relaxation of a given MC-MKP instance. The total

profit of the optimal solution for the linear relaxation is not smaller than that of the

MC-MKP instance. Therefore, proving that Algorithm 5.3 is a 2-approximation with

respect to the linear relaxation will complete the proof.

Let G be the total profit of the items packed at the end of step 3, S the total profit

of the solution obtained by packing all the split items, and OPT the total profit of

the optimal solution for the linear relaxation. Since the algorithm considers items in

decreasing order of their efficiency, S + G ≥ OPT holds. The profit of the solution

returned by Algorithm 5.3 in step 4 equals max{S,G}, and therefore it is at least OPT/2.

The running time of Algorithm 5.3 is O(I log I +K · I), where I is the total number

of items and K is the number of knapsacks. Using linear selection, the running time

can be improved to O(K · I).

As an example of the execution of Algorithm 5.3, recall the MC-MKP instance

considered earlier. Running the greedy algorithm on this instance first chooses item 1 of

class 3 for knapsack 1, since this is the item with highest efficiency. Next, the algorithm

proceeds to knapsack 2 and since all remaining items have an efficiency of 2, the profit

of the returned solution is 7.

The solution found by Algorithm 5.3 is converted into a concrete allocation of CSI

channels in the same way described earlier for Algorithm 5.1. However, this time sorting

is performed for each max-free subtree (knapsack) separately.

Since there are at most O(n ·C) max-free subtrees, the CF-CSI-NC instance in step 1

can be reduced to an MC-MKP instance in O(n · C) time. If the running time of the

MC-MKP algorithm used in step 2 is TMC-MKP, the total running time of Algorithm 5.2

is O(C · n+ TMC-MKP).

5.5 Simulation Study and a Complete BS Scheme

We now present Monte Carlo simulation results of the various algorithms introduced in

the chapter. The goal of this section is twofold:

• To investigate the performance of these algorithms.

• Use the simulation results to develop a complete BS allocation scheme that

indicates when the BS should invoke each algorithm.

Throughout this section we consider CSI allocation trees whose heights are C = 10

and C = 8. The average time window wj between SINR changes is randomly selected

between 32 and 1,024 subframes. Therefore, for each MS, 0 ≤ lMAXj ≤ 5 holds. The

average data packet rate rj for each MS is uniformly chosen between 50 and 1,000

packets/second. For every MSj , Ej is set to rj · wj , and the profit function is as described

in Eq. (5.1). An optimal pseudopolynomial time algorithm is used to solve the reduced

MCKP instance in Algorithm 5.1 and a 2-approximation algorithm is used to solve the

reduced MC-MKP instance in Algorithm 5.2.

5.5.1 The Performance of Algorithm 5.1 and Algorithm 5.2

We first compare Algorithm 5.1 to an algorithm that allocates only level-0 CSI channels

(i.e., only tree leaves). We consider up to 1,600 active MSs. For each number of

MSs, we repeat the simulation 1,000 times with different seeds and average the results.

In Figure 5.7 the x-axis indicates the number of active MSs (load) and the y-axis

indicates the normalized profit obtained by Algorithm 5.1, i.e., the profit obtained by

Algorithm 5.1 divided by the profit obtained by an algorithm that allocates only a level-0

CSI channel to each MS.

As expected, when the number of MSs is small, allocating each of them a level-0

CSI channel leaves most of the allocation tree unused. Therefore, the normalized profit

of Algorithm 5.1 is high. As the number of MSs increases, more of the tree can be used

by allocating only level-0 nodes and the profit ratio decreases. Since a tree whose height

is C = 8 has fewer leaves (bandwidth) than a tree with C = 10, level-0 allocation takes

a bigger portion of the CSI tree. Therefore, the normalized profit is smaller for C = 8

than for C = 10.

Figure 5.7: Normalized profit of Algorithm 5.1 vs. the number of MSs (load)

Next, we compare the performance of Algorithm 5.1 with that of Algorithm 5.2. The

total profit obtained by Algorithm 5.2 is expected to be smaller due to the fragmentation

that might result because we do not allow this algorithm to delete already allocated

CSI channels. For example, consider the CSI allocation tree in Figure 5.6 and assume

that all MSs are allocated a CSI channel in their maximal level. Assume that a new

MS whose maximal level is 3 becomes active. Since the height of the highest max-free

subtree is 2, the new MS can be allocated a node in level ≤2. In contrast, Algorithm 5.1

deletes the currently allocated CSI channels and returns an allocation where all MSs

get their maximum level CSI channel, thereby obtaining a greater profit. If the new MS

has a larger Ej value, the difference in profit is larger.

In the next trial, we start with an initial list of MSs and invoke Algorithm 5.1 to

allocate them CSI channels. Then, we simulate 1,000 random events of adding or

deleting randomly chosen MSs. Thus, the average load is proportional to the initial

number of MSs. We maintain two separate CSI allocation trees. After each MS insertion

or deletion, we invoke Algorithm 5.1 on the first tree and Algorithm 5.2 on the second.

The results are shown in Figure 5.8, where the x-axis indicates the average number

of initial MSs (load) and the y-axis indicates the total profit ratio between the tree

maintained by Algorithm 5.1 and the tree maintained by Algorithm 5.2. Again, we

present two curves: one for C = 8 and one for C = 10. When the number of MSs is

small, there is enough CSI bandwidth to accommodate each arriving MS in its maximal

level. Therefore, the profit ratio is very close to 1. As the number of MSs increases,

Algorithm 5.2 is unable to allocate CSI channels at the optimal levels and the profit

ratio increases. When the number of MSs increases further, both Algorithm 5.2 and

Algorithm 5.1 are able to allocate CSI channels (at low levels) only to MSs whose Ej is

high, and the profit ratio decreases back to 1.

5.5.2 A Complete BS Scheme

We now combine Algorithm 5.1 and Algorithm 5.2 into a complete allocation scheme for

the BS. An action is required from the BS in the following cases: (a) a new MS becomes

active; (b) an active MS leaves the cell or becomes inactive; (c) the profit function of an

active MS changes (e.g., due to a change in the user mobility speed). Algorithm 5.2

allows an increase in the profit without the overhead associated with the removal of

existing CSI channels. However, Algorithm 5.2 is often unable to allocate a CSI channel

not because the bandwidth is insufficient, but because it is fragmented. In such cases it

might be more beneficial for the BS to clear the CSI allocation tree and invoke Algorithm

5.1. Thus, Algorithm 5.1 brings two important benefits to the scheduler. First, it serves

as a benchmark for Algorithm 5.2, because it indicates the maximum total profit that

can be obtained at every moment. Second, it can be occasionally invoked by the BS

in order to replace the existing tree with a new one for the purpose of maximizing the

profit.

Figure 5.8: Total profit of Algorithm 5.1 divided by total profit of Algorithm 5.2 vs. the number of MSs (load)

All these considerations are combined into the complete BS scheme presented in

Figure 5.9. The scheme is invoked when a new event is triggered at the BS. When a new

MS becomes active or an active MS becomes inactive, the BS checks the ratio between

the profit obtained by updating the current tree using Algorithm 5.2 and that obtained

by building a new tree using Algorithm 5.1. If this ratio is smaller than a certain

threshold t (0 < t ≤ 1), then the new tree built by Algorithm 5.1 is used. Otherwise,

the current tree is updated using Algorithm 5.2. This ensures that the obtained profit

is never smaller than a fraction t of the maximum possible. However, as t approaches

1, the number of CSI control (allocation and deallocation) messages sent to the MSs

increases.
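
The threshold rule described above can be sketched as a small decision function. The names are hypothetical and the handling of a zero rebuild profit is an assumption added only to keep the sketch well defined; the two profits are assumed to have been computed beforehand by Algorithm 5.2 (update) and Algorithm 5.1 (rebuild).

```python
def choose_allocation(profit_update, profit_rebuild, t):
    """Decide which tree the BS should use after an event.

    profit_update  -- total profit after updating the current tree (Algorithm 5.2)
    profit_rebuild -- total profit of a tree built from scratch (Algorithm 5.1)
    t              -- threshold, 0 < t <= 1
    Returns "rebuild" or "update".
    """
    if not 0 < t <= 1:
        raise ValueError("t must satisfy 0 < t <= 1")
    if profit_rebuild <= 0:
        return "update"  # assumption: nothing to gain from rebuilding
    ratio = profit_update / profit_rebuild
    return "rebuild" if ratio < t else "update"


if __name__ == "__main__":
    print(choose_allocation(80.0, 100.0, 0.9))
    print(choose_allocation(95.0, 100.0, 0.9))
```

A larger t triggers rebuilds more often, which raises the profit but also the number of allocation and deallocation messages.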

We evaluate the above scheme for a CSI allocation tree with C = 10. We set the

average number of MSs to 250, which is where, as Figure 5.8 shows, the ratio between

the profit obtained by Algorithm 5.1 and that obtained by Algorithm 5.2 is very high

for C = 10. For each value of t, 1,000 random events are considered and averaged.

Figure 5.10 shows the ratio between the profit achieved by the scheme and the

(maximal) profit achieved by Algorithm 5.1 as a function of t. As expected, when the

value of t increases, the profit of the scheme is closer to the optimal because Algorithm 5.1

is invoked more often. To study the cost of invoking Algorithm 5.1 more frequently, we

measure the average number of changes per event for the proposed scheme as a function of t. The

results are shown in Figure 5.11. We can see that a good tradeoff between efficiency

and cost can be obtained for 0.84 ≤ t ≤ 0.94.

Figure 5.9: The complete BS scheme. (Flowchart: when the BS receives a new event, it runs Algorithm 5.1 and Algorithm 5.2 and compares the solution of Algorithm 5.1 to that of Algorithm 5.2. If the solution of Algorithm 5.1 is "significantly better", the allocation of Algorithm 5.1 is used; otherwise, the allocation of Algorithm 5.2 is used.)

Finally, we enforce an upper bound of t = 0.94 and test the performance of the

proposed scheme for different numbers of active MSs. The results are shown in

Figure 5.12, where the x-axis indicates the average number of MSs (load) and the y-axis

indicates the average number of changes per event. The maximum number of changes

per event occurs when the average number of MSs is ≈ 250, which is expected because,

as Figure 5.8 shows, this is where the maximum profit ratio between Algorithm 5.1 and

Algorithm 5.2 is obtained.

5.6 Conclusions

We presented a formal framework for the allocation of periodic CSI channels. In the

proposed framework, the allocated bandwidth is maintained as a tree. Every MS is

associated with a profit function that indicates the “profit of the system” from allocating

a CSI channel of certain bandwidth to this MS. We defined two optimization problems

for this framework, and proposed optimal polynomial-time algorithms for them. Our

simulation study shows how the proposed algorithms can be combined into a unified

scheme, to be invoked by the BS when a new event takes place.

Figure 5.10: The profit achieved by the proposed scheme divided by the maximum profit that can be achieved using Algorithm 5.1, as a function of the threshold t (x-axis: t from 0.78 to 1)

Figure 5.11: Average number of changes per event of the proposed scheme as a function of the threshold t (x-axis: t from 0.78 to 1)

Figure 5.12: Average number of changes per event of the proposed scheme as a function of the average number of MSs for t = 0.94

Appendix A

Proofs for the Theorems of Chapter 2

A.1 The Proof of Theorem 2.1

To prove that 1-round RM-AMC(OC-1) is NP-hard, we present the Unbounded Subset

Sum Problem (USSP) [49]. The instance of USSP is a set S of item types s1, s2, . . . , sm

and a capacity C. Each type si has a weight w(si). The objective is to find a vector

(s′1, . . . , s′m) of items whose aggregated weight $\sum_{i=1}^{m} s'_i \cdot w(s_i)$ is maximum but not

larger than C.

USSP is known to be NP-hard in the weak sense¹ [49]. We reduce the decision version

of USSP into the decision version of 1-round RM-AMC(OC-1). Recall that in a decision

version of a problem, an algorithm is only expected to tell whether or not a solution

with a specified value exists. In the decision version of 1-round RM-AMC(OC-1), the

algorithm only needs to tell whether there is a transmission configuration whose total

bandwidth cost is B and the probability that the designated receiver will correctly

decode the data block is P .

Given an input for USSP, we translate it into an input for 1-round RM-AMC(OC-1)

in the following way. Every item type si ∈ S is transformed into a function

$f_i(SNR) = 1 - 2^{-w(s_i)/C}$ that determines the probability to correctly receive an MCS-i

packet for a given SNR value. The size of a packet encoded using MCS-i is equal to

w(si). Note that with this transformation we have N = |S| = m MCSs. In addition, for

the RM-AMC(OC-1) instance, we set P ← 1/2, K ← 1 and B ← C.

¹ A problem is NP-hard in the weak sense if it is NP-hard when the input is represented as a binary string.

We now show that there exists a USSP solution that uses the entire capacity C

if and only if there exists a transmission configuration with bandwidth B = C such

that the probability that the designated receiver will correctly decode the data block is

P = 1/2. Let τi be the number of packets transmitted using MCS-i in the solution for

RM-AMC(OC-1). The probability that the designated receiver will not receive the data

block correctly is equal to the probability that it will not receive any packet, namely,

$\prod_{j=1}^{N} (1 - f_j(SNR))^{\tau_j} = \prod_{j=1}^{N} 2^{-w(s_j) \cdot \tau_j / C} = 2^{-\sum_{j=1}^{N} w(s_j) \cdot \tau_j / C}$. If there is a solution

for USSP that uses the entire capacity C, then there is a transmission configuration

that uses the entire bandwidth B = C. In such a case, $\sum_{j=1}^{N} w(s_j) \cdot \tau_j = C$ and

the probability that the designated receiver will correctly decode the data block is 1/2.

If a solution for USSP does not exist, then for every transmission configuration,

$\sum_{j=1}^{N} w(s_j) \cdot \tau_j < C$, and the probability that the designated receiver will correctly

decode the data block is < 1/2.
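
The arithmetic of the reduction can be checked numerically. The following sketch uses made-up weights and a hypothetical function name; it evaluates the success probability from the per-MCS functions fi and confirms that it equals 1/2 exactly when the selected packets fill the capacity C, and is smaller otherwise.

```python
def success_probability(weights, counts, capacity):
    """Probability that at least one packet is received (K = 1).

    weights  -- w(s_i), the packet size of MCS-i
    counts   -- tau_i, the number of MCS-i packets transmitted
    capacity -- the USSP capacity C
    """
    fail = 1.0
    for w, tau in zip(weights, counts):
        f_i = 1.0 - 2.0 ** (-w / capacity)  # per-packet success probability
        fail *= (1.0 - f_i) ** tau          # all tau packets are lost
    return 1.0 - fail


if __name__ == "__main__":
    # Illustrative instance: weights 3 and 5, capacity C = 11.
    print(success_probability([3, 5], [2, 1], 11))  # 2*3 + 5 = 11 fills C
    print(success_probability([3, 5], [1, 1], 11))  # 3 + 5 = 8 < C
```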

In Theorem 2.1, we proved that 1-round RM-AMC(OC-1) is NP-hard even if K = 1. We

now reduce the instance of the 1-round RM-AMC(OC-1) decision problem considered

in Theorem 2.1 to an instance of the R-round RM-AMC(OC-1) decision problem for a

constant R > 1. The reduction is trivial: the input remains the same and the decision to

be made by an R-round RM-AMC(OC-1) algorithm is whether there exists an R-round

algorithm with total bandwidth cost B for which the probability that the designated

receiver will correctly decode the data block is P .

If there is a solution for the considered 1-round RM-AMC(OC-1) instance, the

same transmission configuration can be used in the first round of the R-round RM-

AMC(OC-1) problem, and in the remaining R− 1 rounds no packet is sent. If there is a

solution for the reduced R-round RM-AMC(OC-1) problem, then, since K = 1, there

is an algorithm composed of R transmission configurations where the ith transmission

configuration is used in the ith round. The event that occurs if in the ith round the

receiver correctly receives at least 1 packet is denoted Di. Note that since K = 1, this

is equivalent to the event that the receiver correctly decodes the data block after the

ith round. The probability that the receiver will correctly decode the data block is

$\Pr(D_1 \vee D_2 \vee \cdots \vee D_R) = 1 - \Pr(D_1^c \wedge D_2^c \wedge \cdots \wedge D_R^c)$, where $D_i^c$ is the complement of $D_i$.

Let τ(i) be the transmission configuration used in the ith round and let $\tau = \sum_{i=1}^{R} \tau(i)$.

The probability for correctly decoding the data block when using τ in the considered

1-round RM-AMC(OC-1) instance is exactly $1 - \Pr(D_1^c \wedge D_2^c \wedge \cdots \wedge D_R^c)$. The bandwidth of

τ is equal to the bandwidth used in the solution for the reduced R-round RM-AMC(OC-1)

problem. Hence, there is a solution for the considered 1-round RM-AMC(OC-1)

instance and the reduction holds.

We prove the theorem by induction on the number of rounds r. For r = 1, the correctness

of Eq. 2.2 is straightforward, because we select the maximum value of Gτ [≥ k] over all

transmission configurations with bandwidth b. By the induction hypothesis, Eq. 2.2

calculates H(k, b, r) correctly for every r < l for a given l > 1 and for every k, b. We

now show that Eq. 2.2 also calculates H(k, b, l) correctly. By the induction hypothesis,

for every i Gτ [i] · H(max(k − i, 0), b − c, l − 1) is equal to the maximum probability

that the designated receiver will correctly receive at least k packets using l rounds and

bandwidth b, assuming that in round l the transmission configuration τ is used and

exactly i packets are correctly received during this round. Since the summation in Eq.

2.2 is performed for 0 ≤ i ≤ c, it is equal to the probability that the designated receiver

will correctly receive at least k packets using l rounds and bandwidth b assuming that

in round l the transmission configuration τ is used. Since the selected summation value

is the maximum over all transmission configurations with bandwidth ≤ b, the theorem

holds.

We prove the theorem by induction on the number of rounds r. For r = 1, the correctness

of Eq. 2.5 is straightforward, because we select the maximum between the performance

of Alg. 2.2 and that of a single MCS (for every MCS-j) in the single round. Assume

that Eq. 2.5 calculates M(k, b, r) correctly for every r < l for a given l > 1 and for every

k, b. We now show that Eq. 2.5 also calculates M(k, b, l) correctly. By the induction

hypothesis, $\sum_{i=0}^{c} U_c[i] \cdot M(\max(k-i, 0), b-c, l-1)$ is equal to the maximum probability

that the designated receiver will correctly receive at least k packets using l rounds and

bandwidth b, while Alg. 2.2 is used with bandwidth c in the lth round. Now, note that

$\sum_{i=0}^{c} S_c^j[i] \cdot M(\max(k-i, 0), b-c, l-1)$ is equal to the maximum probability that the

designated receiver will correctly receive at least k packets using l rounds and bandwidth

b where in round l only MCS-j packets are transmitted and the total bandwidth of this

round is c. Since in Eq. 2.5 we consider every 0 ≤ c ≤ b and every 1 ≤ j ≤ N and select

the maximal value, the theorem holds.

We prove the theorem by induction on the bandwidth b. For b = 0, the correctness of

Eq. 2.6 is straightforward because no packet can be transmitted. By the induction

hypothesis we assume that Eq. 2.6 calculates F (k, b) correctly for every b < l where

l > 0 and for every k. We now show that Eq. 2.6 calculates F (k, l) correctly. First,

note that by Observation 2.5.2 it is sufficient to consider the cases where in every

round a single packet is transmitted. We first consider the case where for every j,

bj > l. Although the bandwidth is > 0, in this case we still cannot transmit any packet.

Thus, F (k, l) = F (k, 0), and Eq. 2.6 holds. If bj ≤ l holds for some j, the expression

$p_j \cdot F(k-1, l-b_j) + (1-p_j) \cdot F(k, l-b_j)$ is equal to the maximum probability that the

designated receiver will correctly receive at least k packets using bandwidth l, where in

round l an MCS-j packet is transmitted. Since we select the maximum value over all

values of j for which a packet can be sent using MCS-j (bj ≤ l), Eq. 2.6 calculates

F (k, l) correctly in this case as well.

Appendix B

Simulation Interference Model

We start by describing how the SINR of each user is calculated as a function of the

end power it experiences. Recall that the bandwidth of each cell is partitioned into

4 subbands: F0, F1, F2 and F3 (Fig. 3.3). Let Si be the set of scheduling areas that

use subband Fi, for i ∈ {0, 1, 2, 3}. For example, in the 7-cell network presented in

Fig. 3.6, |S1| = |S2| = |S3| = 7 and |S0| = 21. Let ps(u) be the power received by user

u in scheduling area s ∈ Si. The SINR experienced by u is defined by:

$$\gamma_s(u) = \frac{p_s(u)}{\sum_{s' \neq s,\, s' \in S_i} p_{s'}(u) + n_0 w}, \qquad (B.1)$$

where w is the total bandwidth used in the sector, n0 is the thermal noise over the

bandwidth w, and the end power ps(u) is given by the following equation [85]:

$$p_s(u) = p_s - PL_s(u) + g_s - a_s^{ver}(\theta_s(u)) - a_s^{hor}(\varphi_s(u)) \quad \text{(dBm)},$$

where ps is the power, in dBm, of the antenna transmitting to scheduling area s, and gs

is the gain of this antenna. In addition, $a_s^{ver}$ and $a_s^{hor}$ are the vertical and horizontal

radiation pattern due to the position of the user in relation to that of the transmitting

antenna. Thus, they are a function of the vertical angle θs(u) and horizontal angle ϕs(u)

between the user and the antenna main beam. The path loss is estimated using the

Hata propagation model for small to medium-sized cities and is denoted PLs(u).
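
Eq. (B.1) can be evaluated as in the sketch below. It rests on assumptions that the text does not state explicitly: the received powers, which are given in dBm, are converted to milliwatts before they are summed, and the noise term n0 · w is passed in as a single power value in milliwatts. The function names are hypothetical.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def sinr_db(signal_dbm, interferers_dbm, noise_mw):
    """SINR of Eq. (B.1), returned in dB.

    signal_dbm      -- p_s(u), received power from the serving scheduling area
    interferers_dbm -- p_s'(u) for the other scheduling areas using the same subband
    noise_mw        -- n_0 * w, total thermal noise power in milliwatts
    """
    signal = dbm_to_mw(signal_dbm)
    interference = sum(dbm_to_mw(p) for p in interferers_dbm)
    return 10.0 * math.log10(signal / (interference + noise_mw))


if __name__ == "__main__":
    # Illustrative values only: one serving signal and two equal interferers.
    print(round(sinr_db(-70.0, [-80.0, -80.0], 0.0), 2))
```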

The vertical and horizontal radiation patterns are calculated using the following

equations [85]:

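$$a_s^{ver}(\theta_s(u)) = -\min\left(12\left(\frac{\theta_s(u)}{\theta_{3dB}}\right)^2,\ SLA_v\right)$$

$$a_s^{hor}(\varphi_s(u)) = -\min\left(12\left(\frac{\varphi_s(u)}{\varphi_{3dB}}\right)^2,\ A_m\right)$$

where SLAv = 20 dB is the side lobe attenuation, Am = 25 dB is the front-to-back

attenuation, and θ3dB, ϕ3dB are the half-power beam widths in the vertical and horizontal

planes, respectively. The Hata propagation model for urban areas is calculated using the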

following equation [40]:

\[
PL_s(u) = 69.55 + 26.16 \log_{10}(f_0) - 13.82 \log_{10}(z_s) - a(z_u) + \left(44.9 - 6.55 \log_{10}(z_s)\right) \log_{10}(d_s(u)),
\]

where $f_0 = 1500$ MHz is the transmission frequency, $z_s$ is the height (in meters) of the antenna used for scheduling area $s$, $z_u$ is the height (in meters) of user $u$, $d_s(u)$ is the distance (in kilometers) between $u$ and the antenna of scheduling area $s$, and $a(z_u) = 0.8 + (1.1 \log_{10}(f_0) - 0.7) \cdot z_u - 1.56 \log_{10}(f_0)$ for a small or medium-sized city.
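As an illustration, the path loss and the radiation patterns could be computed as in the Python sketch below. It follows the standard form of the Hata formula [40], in which the distance term is scaled by the base antenna height z_s, and it assumes the common 3GPP-style pattern with a factor of 12 and a squared angle ratio. The function names, the default frequency argument, and the sample values in the usage note are hypothetical.

```python
import math

def hata_correction_small_city(f_mhz, z_u):
    """Mobile antenna correction a(z_u) for a small or medium-sized city."""
    return 0.8 + (1.1 * math.log10(f_mhz) - 0.7) * z_u - 1.56 * math.log10(f_mhz)

def hata_path_loss_db(z_s, z_u, d_km, f_mhz=1500.0):
    """Hata urban path loss in dB: heights in meters, distance in kilometers."""
    return (69.55
            + 26.16 * math.log10(f_mhz)
            - 13.82 * math.log10(z_s)
            - hata_correction_small_city(f_mhz, z_u)
            + (44.9 - 6.55 * math.log10(z_s)) * math.log10(d_km))

def radiation_pattern_db(angle_deg, beamwidth_3db_deg, max_attenuation_db):
    """Pattern value -min[12 (angle / beamwidth)^2, cap]; the factor 12 and
    the square are assumptions taken from the usual 3GPP antenna model."""
    return -min(12.0 * (angle_deg / beamwidth_3db_deg) ** 2, max_attenuation_db)
```

For a 30 m antenna and a user at 1.5 m, `hata_path_loss_db(30.0, 1.5, 1.0)` gives roughly 132 dB at a distance of 1 km, and `radiation_pattern_db(35.0, 70.0, 25.0)` evaluates to -3 dB at half the beamwidth.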

Appendix C

Dynamic Programming Algorithm for d-MCKP

We now present a dynamic programming algorithm for d-MCKP. Recall that the input consists of a $D$-dimensional knapsack size $[K[1], \ldots, K[D]]$ and a set of $n$ items $\{1, \ldots, n\}$, each with at most $m$ configurations. Let $Z_i(k[1], \ldots, k[D])$ be the total profit of an optimal solution for a d-MCKP instance with item set $\{1, \ldots, i\}$ and knapsack size $[k[1], \ldots, k[D]]$. Let $s_i^j[k]$ be the size of item $i$ using configuration $j$ in the $k$th dimension, and let $p_i^j$ be the profit of item $i$ using configuration $j$. We now define the equations required for computing $Z_n(K[1], \ldots, K[D])$ using dynamic programming. Note that, by definition, $Z_n(K[1], \ldots, K[D])$ is the total profit of the optimal solution for the d-MCKP instance.

For clarity, the equations for computing Zi (k[1], . . . , k[D]) are presented separately

for the following disjoint cases:

(case-1) The knapsack size [k[1], . . . , k[D]] is illegal, namely, ∃j, k[j] < 0.

(case-2) The knapsack size is legal and the item set is empty, namely, ∀j, k[j] ≥ 0 and i = 0.

(case-3) The knapsack size is legal and the item set is not empty, namely, ∀j, k[j] ≥ 0 and i > 0.

We start with the equation for case-1. Since the knapsack size is illegal, there is no

feasible solution in this case. Therefore, the total profit is set to the worst case, namely

−∞, and the resulting equation for case-1 is:

Zi (k[1], . . . , k[D]) = −∞ if ∃j, k[j] < 0.

In case-2, the knapsack size is legal and the item set is empty. Thus, the empty solution

is optimal and its total profit is 0, which yields:

Z0 (k[1], . . . , k[D]) = 0 if ∀j, k[j] ≥ 0.

For case-3, the knapsack size is legal and the item set is not empty. If the item set is

{1, . . . , i}, the total profit of the optimal solution can be obtained by going over each

possible configuration of item i and finding the maximum total profit if this configuration

is selected. The following equation takes each configuration of item i and computes the

total profit after adding this configuration’s profit and updating the knapsack size. Out

of all these configurations, the one that yields maximum total profit is selected:

\[
Z_i(k[1], \ldots, k[D]) = \max
\begin{cases}
Z_{i-1}\left(k[1] - s_i^1[1], \ldots, k[D] - s_i^1[D]\right) + p_i^1 \\
\quad \vdots \\
Z_{i-1}\left(k[1] - s_i^m[1], \ldots, k[D] - s_i^m[D]\right) + p_i^m.
\end{cases}
\]

The above equations compute the total profit of the optimal solution. However, a simple extension can be used to obtain the optimal solution itself, that is, a feasible solution with maximal profit. To this end, an array of size $n \cdot \prod_{d=1}^{D} K[d]$ is maintained, where each entry contains the total profit and the configuration selected when computing the corresponding value $Z_i(k[1], \ldots, k[D])$. The optimal solution is obtained by calculating $Z_n(K[1], \ldots, K[D])$ and then using this entry to find the configuration selected for item $n$. The size of this configuration is subtracted from the knapsack size, which leaves $n - 1$ items and an updated knapsack size. The process is then repeated recursively until no items are left. The set of all selected configurations is the optimal solution.
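The three cases and the backtracking step can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the thesis: it memoizes only the states that are actually reached instead of filling a full array, and the function name `solve_dmckp` and the input layout (a list of `(profit, sizes)` configurations per item) are assumptions made for the example.

```python
from functools import lru_cache

def solve_dmckp(capacity, items):
    """capacity: tuple of D knapsack sizes.
    items[i-1]: list of configurations (profit, sizes) of item i, where
    sizes is a tuple of D integers. Returns (total profit, chosen configs)."""
    n = len(items)

    @lru_cache(maxsize=None)
    def Z(i, k):
        if any(c < 0 for c in k):   # case-1: illegal knapsack size
            return float("-inf"), None
        if i == 0:                  # case-2: empty item set
            return 0.0, None
        best, best_j = float("-inf"), None
        for j, (profit, sizes) in enumerate(items[i - 1]):   # case-3
            remaining = tuple(c - s for c, s in zip(k, sizes))
            value = Z(i - 1, remaining)[0] + profit
            if value > best:
                best, best_j = value, j
        return best, best_j

    total = Z(n, tuple(capacity))[0]
    if total == float("-inf"):
        return total, None          # no feasible solution exists
    # Backtrack: read the stored configuration, shrink the knapsack, repeat.
    chosen, k = [], tuple(capacity)
    for i in range(n, 0, -1):
        j = Z(i, k)[1]
        chosen.append(j)
        k = tuple(c - s for c, s in zip(k, items[i - 1][j][1]))
    chosen.reverse()
    return total, chosen
```

With a knapsack of size (4, 3) and two items with configurations `[(3, (2, 1)), (5, (3, 2))]` and `[(4, (2, 2)), (1, (1, 1))]`, the sketch selects the first configuration of each item. Memoizing on demand keeps the example short; a preallocated array, as described in the text, gives the same results.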

We now analyze the time complexity of this dynamic programming algorithm. As described earlier, an array of size $n \cdot \prod_{d=1}^{D} K[d]$ is used. Each entry is computed at most once, and each computation takes $O(m)$ time. Therefore, the time complexity of the dynamic programming algorithm is $O\left(\left(\prod_{d=1}^{D} K[d]\right) \cdot n \cdot m\right)$. The optimal solution is found by going over the array as described, which does not increase the time complexity.
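To make the bound concrete, consider a small hypothetical instance (the numbers are chosen only for illustration) with $n = 3$ items, $m = 2$ configurations per item, and a knapsack of dimension $D = 2$ with sizes $K[1] = 4$ and $K[2] = 5$:

```latex
\[
\underbrace{n \cdot K[1] \cdot K[2]}_{\text{array entries}} = 3 \cdot 4 \cdot 5 = 60,
\qquad
\underbrace{\left(K[1] \cdot K[2]\right) \cdot n \cdot m}_{\text{elementary steps}} = 20 \cdot 3 \cdot 2 = 120 .
\]
```

Because the bound depends on the numeric values $K[1], \ldots, K[D]$ and not only on the number of items, the running time is pseudo-polynomial and grows multiplicatively with every added dimension.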

Bibliography

[1] 3GPP. Evolved Universal Terrestrial Radio Access (E-UTRA); Further Advancements for E-UTRA Physical Layer Aspects, TR 36.814.

[2] 3GPP. Introduction of the Multimedia Broadcast/Multicast Service (MBMS) in the Radio Access Network (RAN), March 2006.

[3] 3GPP. Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer procedures (Release 10), 3GPP TS 36.213, June 2010.

[4] 3GPP. Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation (Release 10), 3GPP TS 36.211, 2011.

[5] B. Adamson, C. Bormann, M. Handley, and J. Macker. Negative-acknowledgment (NACK)-oriented reliable multicast (NORM) building blocks. RFC-3941, November 2004.

[6] B. Adamson, C. Bormann, M. Handley, and J. Macker. Negative-acknowledgment (NACK)-oriented reliable multicast (NORM) protocol. RFC-3940, November 2004.

[7] Md Mostofa Akbar, M. Sohel Rahman, M. Kaykobad, E.G. Manning, and G.C. Shoja. Solving the multidimensional multiple-choice knapsack problem by constructing convex hulls. Computers & Operations Research, 33(5):1259–1273, 2006.

[8] I. Akyildiz, D. Gutierrez-Estevez, and E. Reyes. The evolution to 4G cellular

systems: LTE-Advanced. Physical Communication, 3(4), 2010.

[9] David Amzallag, Reuven Bar-Yehuda, Danny Raz, and Gabriel Scalosub.

Cell selection in 4G cellular networks. IEEE INFOCOM, 2008.

[10] Ghassane Aniba and Sonia Aïssa. Adaptive scheduling for MIMO wireless networks: cross-layer approach and application to HSDPA. IEEE Transactions on Wireless Communications, (1), 2007.

[11] Zijian Bai, C. Spiegel, G.H. Bruck, P. Jung, M. Horvat, J. Berkmann, C. Drewes, and B. Gunzelmann. Closed loop transmission with precoding selection in LTE/LTE-Advanced system. International Symposium on Applied Sciences in Biomedical and Communication Technologies, pages 1–5, November 2010.

[12] Krishna Balachandran et al. Design and analysis of an IEEE 802.16e-based

OFDMA communication system. BLTJ, 11(4), 2007.

[13] M. Bansal and V. Venkaiah. Improved fully polynomial time approximation

scheme for the 0-1 multiple-choice knapsack problem. SIAM Conference on

Discrete Mathematics, 2004.

[14] Reuven Bar-Yehuda and Shimon Even. A local-ratio theorem for approximating the weighted vertex cover problem. Annals of Discrete Mathematics, 25:27–45, 1985.

[15] G. Berardinelli, T.B. Sørensen, P. Mogensen, and K. Pajukoski. SVD-based

vs. release 8 codebooks for single user MIMO LTE-A uplink. IEEE Vehicular

Technology Conference, pages 1–5, May 2010.

[16] Christian R. Berger, Shengli Zhou, Yonggang Wen, Peter Willett, and Krishna

Pattipati. Optimizing joint erasure- and error-correction coding for wireless

packet transmissions. IEEE Transactions on Wireless Communications,

7(11):4586–4595, Nov. 2008.

[17] Omer Bulakci, Simone Redana, Bernhard Raaf, and Jyri Hamalainen. Impact

of power control optimization on the system performance of relay based

LTE-advanced heterogeneous networks. Journal of Communications and

Networks, 13(4):345–359, 2011.

[18] N. Cherfi and M. Hifi. A column generation method for the multiple-choice multi-dimensional knapsack problem. Computational Optimization and Applications, 46:51–73, 2010.

[19] Che-Sheng Chiu and Chia-Chi Huang. Improving inter-sector handover user throughput by using partial reuse and softer handover in 3GPP LTE downlink. ICACT, 1:463–467, Feb. 2008.

[20] Hyun-Ho Choi, Jong Bu Lim, Hyosun Hwang, and Kyunghun Jang. Optimal

handover decision algorithm for throughput enhancement in cooperative

cellular networks. IEEE VTC Fall, pages 1–5, 2010.

[21] R. Cohen and G. Grebla. Efficient allocation of CQI channels in broadband wireless networks. IEEE INFOCOM, pages 96–100, April 2011.

[22] Reuven Cohen, Guy Grebla, and Liran Katzir. Cross-layer hybrid FEC/ARQ

reliable multicast with adaptive modulation and coding in broadband wireless

networks. IEEE/ACM Trans. Netw., 18(6):1908–1920, December 2010.

[23] Reuven Cohen and Liran Katzir. A generic quantitative approach to the

scheduling of synchronous packets in a shared uplink wireless channel.

IEEE/ACM Trans. Netw., 15(4):932–943, August 2007.

[24] Reuven Cohen and Liran Katzir. Computational analysis and efficient algorithms for micro and macro OFDMA downlink scheduling. IEEE/ACM Trans. Netw., 18(1):15–26, 2010.

[25] Reuven Cohen and Liran Katzir. Computational analysis and efficient algorithms for micro and macro OFDMA scheduling. IEEE/ACM Trans. Netw., 18(1):15–26, February 2010.

[26] Reuven Cohen, Liran Katzir, and Danny Raz. An efficient approximation

for the generalized assignment problem. Information Processing Letters,

100(4):162–166, November 2006.

[27] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction

to Algorithms. The MIT Press, 1990.

[28] A. Das, Krishna Balachandran, F. Khan, Ashwin Sampath, and H.J. Su.

Network controlled cell selection for the high speed downlink packet access in

UMTS. IEEE WCNC, 4:1975–1979, March 2004.

[29] M. Dottling, B. Raaf, and J. Michel. Efficient channel quality feedback

schemes for adaptive modulation and coding of packet data. IEEE Vehicular

Technology Conference, 2:1243–1247, September 2004.

[30] Thomas Erlebach, Riko Jacob, Matus Mihalak, Marc Nunkesser, Gabor

Szabo, and Peter Widmayer. An algorithmic view on OVSF code assignment.

Algorithmica, 47(3):269–298, 2007.

[31] S. Fashandi, S. Oveisgharan, and A.K. Khandani. Coding over an erasure

channel with a large alphabet size. IEEE International Symposium on

Information Theory, pages 1053–1057, July 2008.

[32] Minghai Feng, Xiaoming She, Lan Chen, and Yoshihisa Kishiyama. Enhanced

dynamic cell selection with muting scheme for DL CoMP in LTE-A. IEEE

VTC Spring, pages 1–5, 2010.

[33] G.E. Fox and G.D. Scudder. A heuristic with tie breaking for certain 0-1

integer programming models. Naval Research Logistics Quarterly, 32:613–623,

[34] Hua Fu and Dong In Kim. Downlink scheduling with AMC and FCS in

WCDMA networks. IEEE GLOBECOM, 2007.

[35] Noriyuki Fukui. Study of channel quality feedback in UMTS HSDPA. IEEE International Symposium on Personal, Indoor and Mobile Radio Communication Proceedings, 1:336–340, September 2003.

[36] G.D. Papadopoulos, G. Koltsidas, and F.N. Pavlidou. Two hybrid ARQ algorithms for reliable multicast communications in UMTS networks. IEEE Communications Letters, 10(4):260–262, 2006.

[37] George Gens and Eugene Levner. An approximate binary search algorithm

for the multiple-choice knapsack problem. Information Processing Letters,

67(5):261–265, 1998.

[38] Hassan Hamdoun, Pavel Loskot, Timothy O’Farrell, and Jianhua He. Practical

network coding for two way relay channels in LTE networks. In VTC Spring,

pages 1–5. IEEE, 2011.

[39] A.X. Han and I-Tai Lu. Optimizing beyond the carrier by carrier proportional fair scheduler. IEEE Sarnoff Symposium, pages 1–5, May 2011.

[40] M. Hata. Empirical formula for propagation loss in land mobile radio services.

IEEE Transactions on Vehicular Technology, 29(3):317–325, Aug. 1980.

[41] Fan Huang, Jian Geng, Guoxing Wei, Yafeng Wang, and Dacheng Yang. Performance analysis of distributed and centralized scheduling in two-hop relaying cellular system. IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications, pages 1337–1341, Sept. 2009.

[42] Jianwei Huang, Vijay G. Subramanian, Rajeev Agrawal, and Randall A. Berry.

Downlink scheduling and resource allocation for OFDM systems. IEEE Trans.

on Wireless Communic., 8(1):288–296, 2009.

[43] Tian Hui, Sun Qiaoyun, Dong Kun, and Li Xingmin. A hybrid CQI feedback

scheme for 4G wireless systems. IEEE Vehicular Technology Conference,

pages 1–4, April 2009.

[44] Chan Soo Hwang and Yungsoo Kim. An adaptive modulation method for

multicast communications of hierarchical data in wireless networks. In ICC,

pages 896–900, 2002.

[45] Institute of Electrical and Electronics Engineers Inc. IEEE Draft Standard for

Local and Metropolitan Area Networks – Part 16: Air Interface for Broadband

Wireless Access Systems, June 2008.

[46] Soo Yong Jeon and Dong Ho Cho. Channel adaptive CQI reporting schemes

for HSDPA systems. IEEE Communications Letters, 10:459–461, 2006.

[47] Koushik Kar, Saswati Sarkar, and Leandros Tassiulas. Optimization based

rate control for multirate multicast sessions. In INFOCOM, pages 123–132,

[48] Himanshu Katiyar and R. Bhattacharjee. Outage performance of multi-antenna relay cooperation in the absence of direct link. IEEE Communications Letters, 15(4):398–400, 2011.

[49] Hans Kellerer, Ulrich Pferschy, and David Pisinger. Knapsack Problems.

Springer, 2004.

[50] Junu Kim, Jinsung Cho, and Heonshik Shin. Resource allocation for scalable

video broadcast in wireless cellular networks. In WiMob, volume 2, pages

174–180, August 2005.

[51] JuYeop Kim and Dong Ho Cho. Enhanced adaptive modulation and coding

schemes based on multiple channel reportings for wireless multicast systems.

In Vehicular Technology Conference, pages 725–729, 2005.

[52] Ariel Kulik and Hadas Shachnai. There is no EPTAS for two-dimensional

knapsack. Inf. Process. Lett, 110(16):707–710, 2010.

[53] Young Min Kwon, Ok Kyung Lee, Ju Yong Lee, and Min Young Chung.

Power control for soft fractional frequency reuse in OFDMA system. ICCSA,

6018:63–71, 2010.

[54] Eugene L. Lawler. Fast approximation algorithms for knapsack problems.

Math. Oper. Res, 4(4):339–356, 1979.

[55] Daewon Lee, Hanbyul Seo, Bruno Clerckx, Eric Hardouin, David Mazzarese, Satoshi Nagata, and Krishna Sayana. Coordinated multipoint transmission and reception in LTE-advanced: deployment scenarios and operational challenges. IEEE Communications Magazine, 50(2):148–155, 2012.

[56] Woongsup Lee and Dong Ho Cho. CQI feedback reduction based on spatial

correlation in OFDMA system. IEEE Vehicular Technology Conference, pages

1–5, September 2008.

[57] H.W. Lenstra Jr. Integer programming with a fixed number of variables. Math. Oper. Res., 8:538–548, 1983.

[58] Dan Li and David R. Cheriton. Evaluating the utility of FEC with reliable

multicast. In ICNP, pages 97–105, 1999.

[59] Wang Lilei, Wang Xiaoyi, and Xu Huimin. Strategies to improve the utilization

ratio of CQI channels in IEEE 802.16 systems. WiCOM ’08, pages 1–4.

[60] Lingjia Liu, Runhua Chen, Stefan Geirhofer, Krishna Sayana, Zhihua Shi,

and Yongxing Zhou. Downlink MIMO in LTE-advanced: SU-MIMO vs.

MU-MIMO. IEEE Communications Magazine, 50(2):140–147, 2012.

[61] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. Crowcroft.

The use of Forward Error Correction (FEC) in reliable multicast. RFC-3453,

December 2002.

[62] F. J. MacWilliams and N. J. A. Sloane. The theory of Error-Correcting Codes.

Amsterdam, The Netherlands, 1977.

[63] V.P. Mhatre and C.P. Rosenberg. The impact of link layer model on the capacity of a random ad hoc network. In IEEE International Symposium on Information Theory, pages 1688–1692, July 2006.

[64] Patric Mitran, Catherine Rosenberg, John Sydor, Jun Luo, and Samat Shabdanov. On the capacity and scheduling of a multi-sector cell with co-channel interference knowledge. Med-Hoc-Net, pages 1–8, 2010.

[65] Satoshi Nagata, Yuan Yan, Xinying Gao, Anxin Li, Hidetoshi Kayama,

Tetsushi Abe, and Takehiro Nakamura. Investigation on system performance

of L1/L3 relays in LTE-advanced downlink. IEEE VTC, 2011.

[66] Derrick Wing Kwan Ng, Ernest S. Lo, and Robert Schober. Dynamic resource

allocation in MIMO-OFDMA systems with full-duplex and hybrid relaying.

IEEE Trans. on Communic., 60(5):1291–1304, 2012.

[67] Jorg Nonnenmacher, Ernst Biersack, and Donald F. Towsley. Parity-based

loss recovery for reliable multicast transmission. IEEE/ACM Trans. Netw.,

6(4):349–361, 1998.

[68] Thomas David Novlan, Jeffrey G. Andrews, Illsoo Sohn, Radha Krishna Ganti,

and Arunabha Ghosh. Comparison of fractional frequency reuse approaches

in the OFDMA cellular downlink. IEEE GLOBECOM, 2010.

[69] Boaz Patt-Shamir and Dror Rawitz. Vector bin packing with multiple-choice.

Discrete Applied Mathematics, 160(10-11):1591–1600, 2012.

[70] Steven W. Peters, Ali Y. Panah, Kien T. Truong, and Robert W. Heath Jr.

Relay architectures for 3GPP LTE-advanced. EURASIP J. Wireless Comm.

and Networking, 2009.

[71] Razvan Pitic and Antonio Capone. An opportunistic scheduling scheme with

minimum data-rate guarantees for OFDMA. IEEE WCNC, pages 1716–1721,

[72] M.B. Pursley and J.M. Shea. Multimedia multicast wireless communications

with phase-shift-key modulation and convolutional coding. IEEE Journal on

Selected Areas in Communications, 17(11):1999–2010, November 1999.

[73] Tongwei Qu, Dengkun Xiao, and Dongkai Yang. A novel cell selection method

in heterogeneous LTE-advanced systems. IC-BNMT, October 2010.

[74] Tongwei Qu, Dengkun Xiao, Dongkai Yang, Wei Jin, and Yuan He. Cell

selection analysis in outdoor heterogeneous networks. ICACTE, 2010.

[75] F.E. Retnasothie, M.K. Ozdemir, T. Yucek, H. Celebi, J. Zhang, and

R. Muththaiah. Wireless IPTV over WiMAX: Challenges and applications.

In WAMICON, pages 1–5, December 2006.

[76] L. Rizzo and L. Vicisano. RMDP: An FEC-based reliable multicast protocol for wireless environments. ACM SIGMOBILE Mobile Computing and Communications Review, 2(2), April 1998.

[77] Dan Rubenstein, James F. Kurose, and Donald F. Towsley. A study of

proactive hybrid FEC/ARQ and scalable feedback techniques for reliable,

real-time multicast. Computer Communications, 24(5-6):563–574, 2001.

[78] Stefan Schwarz, Christian Mehlführer, and Markus Rupp. Calculation of the spatial preprocessing and link adaption feedback for 3GPP UMTS/LTE. IEEE Conference on Wireless Advanced, pages 1–6, June 2010.

[79] Stefan Schwarz, Martin Wrulich, and Markus Rupp. Mutual information

based calculation of the precoding matrix indicator for 3GPP UMTS/LTE.

International ITG Workshop on Smart Antennas, pages 52–58, February

[80] J. She, Fen Hou, Pin Han Ho, and Liang Liang Xie. IPTV over WiMAX: Key

success factors, challenges, and solutions [advances in mobile multimedia].

IEEE Communications Magazine, 45(8):87–93, August 2007.

[81] James She and Pin Han Ho. Cooperative coded video multicast for IPTV

services under EPON-WiMAX integration. IEEE Communications Magazine,

pages 104–110, Aug. 2008.

[82] James She, Xiang Yu, Pin Han Ho, and En Hui Yang. A cross-layer design

framework for robust IPTV services over IEEE 802.16 networks. IEEE

Journal on Selected Areas in Communications, 27(2):235–245, Feb. 2009.

[83] Jian Su, Bin Fan, Kan Zheng, and Wenbo Wang. A hierarchical selective CQI

feedback scheme for 3GPP long-term evolution system. IEEE International

Symposium on Microwave, Antenna, Propagation and EMC Technologies for

Wireless Communications, pages 5–8, August 2007.

[84] Karthikeyan Sundaresan and Sampath Rangarajan. Adaptive resource

scheduling in wireless OFDMA relay networks. In Albert G. Greenberg

and Kazem Sohraby, editors, INFOCOM, pages 1080–1088. IEEE, 2012.

[85] N. Tabia, A. Gondran, O. Baala, and A. Caminada. Interference model and

evaluation in LTE networks. (WMNC), Oct. 2011.

[86] Rath Vannithamby, Guangjie Li, Hujun Yin, and Sassan Ahmadi. Proposal

for IEEE 802.16m CQI feedback framework. May 2008.

[87] H.M. Weingartner and D.N. Ness. Methods for the solution of the multidi-

mensional 0/1 knapsack problem. Operations Research, 15:83–103, 1967.

[88] Weiwei Wu and T. Sakurai. Capacity of reuse partitioning schemes for OFDMA wireless data networks. IEEE International Symposium on Indoor and Mobile Radio Communications, pages 2240–2244, Sept. 2009.

[89] Xiaoxin Wu, Juejia Zhou, Guangjie Li, and May Wu. Low overhead CQI

feedback in multi-carrier systems. GLOBECOM, pages 371–375, 2007.

[90] Wang Xiaoyi and Wang Lilei. A new mathematical model for analyzing

CQI channel allocation mechanism in IEEE 802.16 systems. ICCSC, pages

177–181, 2008.

[91] Xiao Xue, Ji hong Zhao, and Hua Qu. Inter-cell interference coordination scheme based on CoMP. pages 33–36, Feb. 2012.

[92] Yang Richard Yang, Min Sik Kim, and Simon S. Lam. Optimal partitioning

of multicast receivers. In ICNP, pages 129–140, 2000.

[93] Z. Yang, Q. Zhang, and Z. Niu. Throughput improvement by joint relay selection and link scheduling in relay-assisted cellular networks. IEEE Transactions on Vehicular Technology, 61(6):2824–2835, July 2012.

[94] O. Yilmaz, S. Hamalainen, and J. Hamalainen. System level analysis of vertical sectorization for 3GPP LTE. ISWCS, pages 453–457, Sept. 2009.

[95] Ying Jun Zhang and Khaled Ben Letaief. Multiuser adaptive subcarrier-and-bit allocation with adaptive cell selection for OFDM systems. IEEE Trans. Wireless Communic., 3(5):1566–1575, 2004.

