IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 12, JUNE 15, 2013 3081

Pruned Bit-Reversal Permutations: Mathematical Characterization, Fast Algorithms and Architectures

Mohammad M. Mansour, Senior Member, IEEE

Abstract—A mathematical characterization of serially pruned permutations (SPPs) employed in variable-length permuters and their associated fast pruning algorithms and architectures are proposed. Permuters are used in many signal processing systems for shuffling data and in communication systems as an adjunct to coding for error correction. Typically, only a small set of discrete permuter lengths are supported. Serial pruning is a simple technique to alter the length of a permutation to support a wider range of lengths, but results in a serial processing bottleneck. In this paper, parallelizing SPPs is formulated in terms of recursively computing sums involving integer floor functions using integer operations, in a fashion analogous to evaluating Dedekind sums. A mathematical treatment for bit-reversal permutations (BRPs) is presented, and closed-form expressions for BRP statistics including descents/ascents, major index, excedances/descedances, inversions, and serial correlations are derived. It is shown that BRP sequences have weak correlation properties. Moreover, a new statistic called permutation inliers that characterizes the pruning gap of pruned interleavers is proposed. Using this statistic, a recursive algorithm that computes the minimum inliers count of a pruned BR interleaver (PBRI) in logarithmic time is presented. This algorithm enables parallelizing a serial PBRI algorithm by any desired parallelism factor by computing the pruning gap in lookahead rather than a serial fashion, resulting in significant reduction in interleaving latency and memory overhead. Extensions to 2-D block and stream interleavers, as well as applications to pruned fast Fourier transforms and LTE turbo interleavers, are also presented. Moreover, hardware-efficient architectures for the proposed algorithms are developed. Simulation results of interleavers employed in modern communication standards demonstrate three to four orders of magnitude improvement in interleaving time compared to existing approaches.

Index Terms—Bit-reversal, permutation polynomials, permutation statistics, pruned interleavers, turbo interleavers.

I. INTRODUCTION AND MOTIVATION

PERMUTERS are devices that reorder a sequence of symbols according to some permutation [1]. They have a variety of applications in communication systems, signal processing, networking, and cryptography. In communication systems, permuters are used as an adjunct to coding for error correction [1], [2] and are more commonly known as interleavers. Interleavers are a subclass of permuters with carefully chosen permutations to break certain patterns in the input sequence, and strategically reposition symbols according to their relevance in protecting the overall sequence against errors. Examples include interleavers in turbo codes [3], edge permuters in Tanner graphs [4] for low-density-parity check (LDPC) codes [5], channel interleavers in bit-interleaved coded modulation schemes [6], and carrier interleaving for diversity gain in multi-carrier wireless systems with frequency-selective fading and multiple-access interference [7].

Manuscript received November 11, 2011; revised September 06, 2012 and December 15, 2012; accepted January 30, 2013. Date of publication February 07, 2013; date of current version May 20, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ut-Va Koc. The author is with the Department of Electrical and Computer Engineering at the American University of Beirut, Beirut 1107 2020, Lebanon (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2013.2245656. 1053-587X/$31.00 © 2013 IEEE

In signal processing, permuters are used to shuffle streaming data [8] into a particular order such as in signal transform (e.g., discrete cosine [9], Hartley [10], and fast Fourier transforms (FFT) [11], [12]), matrix transposition [13], [14], and matrix decomposition algorithms [15]. In networking, permuters are widely used as interconnection and sorting networks for switching and routing [16]. In cryptography, permuters are commonly used in cipher algorithms for encryption [17].

The theory of interleavers has been established in the classic papers [1], [2] and more recently in [18]. Interleavers can be implemented using hard-wired connections, reconfigurable interconnection networks, or memory buffers with address generators depending on the desired throughput, reconfigurability, and resource requirements. A class of computationally efficient interleavers with simple address generation are block interleavers [18] of power-of-2 length . They are expressed in closed-form by , where and are basic permutations of lengths and , respectively, and . Here the symbols are written row-wise into a array and read column-wise after permuting the rows by and the columns by . Example permutations proposed in the literature or adopted in modern communications standards [19]–[21] include the bit-reversal permutation (BRP) [20] which reverses the order of bits in , and polynomial-based permutations where is a degree- permutation polynomial (PP) over the ring [22]. Commonly used polynomials include circular shift by a constant (e.g., [23], the parity and column twist (PCT) interleaver [24]), linear PPs (e.g., [20], [25], almost regular permutations (ARP) [26], dithered relative prime (DRP) interleavers [23]), and quadratic PPs (QPPs) , where are appropriately chosen integers (e.g., [19], [22], [27]).

Many practical interleavers are limited to a small set of discrete lengths. Pruning is a technique used to support more flexible block lengths [28]–[30]. Communication standards [19]–[21] typically vary depending on the input data rate requirements and channel conditions. To support any length , interleaving is done using a mother interleaver with smallest such that outlier interleaved addresses are excluded. However, pruning alters the spread characteristics of the mother interleaver, and creates a serial bottleneck since interleaved indices become address-dependent. Hence permuting streaming data in parallel on the fly is no longer practically feasible [8]. Expensive buffering of the data is required to maintain a desired system throughput. Hence it is essential to characterize the pruned permutation structure to study its spread characteristics, and to parallelize the pruning operation to reduce latency and memory overhead by interleaving an address without interleaving all its predecessors.

Alternatively, pruning can also be employed to design efficient FFTs by eliminating redundant or vacuous computations when the input vector has many zeros and/or when the required outputs may be very sparse compared to the transform length.

Pruning interleavers has motivated the following problem. Given a set of integers and a permutation on , determine how many of the first integers in are mapped to indices less than some in the permuted sequence. For example, for the permutation , and , out of the first five integers only {1,3,4} map to positions less than six. Surprisingly, this problem has largely been unattempted before in the literature. In [31], a solution for linear PPs based on Dedekind sums [25], [32] was proposed.

In this paper, we propose a mathematical formulation of this problem for general permutations using sums involving integer floor and the so-called “saw-tooth” functions (Section II). The arithmetic properties of these sums are analyzed in Section III, and a set of mathematical identities used to solve the problem recursively are derived. We specialize to BRPs and give a mathematical characterization of these permutations, which have been mainly treated before using numerical techniques to speed up radix-2 FFT computations and related transforms (e.g., see [10], [33]–[43]). In [44] a combinatorial solution based on bit manipulations was proposed. Here we derive in Section IV closed-form expressions for BRP statistics including descents/ascents, major index, excedances/descedances, inversions, serial correlations, and show that BRP sequences have weak correlation properties (i.e., a permuted index strongly depends on the unpermuted index ). We propose a new statistic called permutation inliers, and prove that it characterizes the pruning gap of pruned interleavers. Using this statistic, we derive a recursive algorithm in Section V to compute the minimum inliers count in a pruned BR interleaver (PBRI) in logarithmic time, and apply it to parallelize a serial PBRI and reduce its latency and memory overhead. In Section VI we extend the discussion to block and stream interleavers that are composed of two or more permutations. In Section VII, we apply the inliers problem to design parallel BRPs for pruned FFTs, as well as parallel pruned interleavers for LTE turbo codes. In Section VIII, we consider implementation aspects of the proposed algorithms and present hardware-efficient architectures. We perform simulations using several practical examples to demonstrate the advantages of the proposed algorithms. Finally, Section IX provides concluding remarks. All proofs are included in the Appendix.
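The serial bottleneck described in this introduction can be illustrated with a minimal Python sketch. It assumes a bit-reversal mother interleaver whose size is the smallest power of 2 that is at least the requested length, and it drops every address that maps outside the requested length. The names `bit_reverse`, `serial_pruned_bri`, `pruned_address`, and `length` are hypothetical and chosen here for illustration; they are not taken from the paper.

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)  # move the lowest bit of i onto the end of r
        i >>= 1
    return r


def serial_pruned_bri(length):
    """Interleaved addresses of a serially pruned bit-reversal interleaver."""
    n = max(1, (length - 1).bit_length())  # mother size k = 2**n >= length
    return [y for y in (bit_reverse(i, n) for i in range(1 << n)) if y < length]


def pruned_address(j, length):
    """Map one index j by scanning all predecessors (the serial bottleneck).

    Returns the interleaved address and the number of skipped addresses
    (the pruning gap accumulated up to that point).
    """
    if not 0 <= j < length:
        raise ValueError("j must satisfy 0 <= j < length")
    n = max(1, (length - 1).bit_length())
    gap = 0
    i = 0
    while True:
        y = bit_reverse(i, n)
        if y < length:
            if i - gap == j:  # i - gap counts the valid addresses seen so far
                return y, gap
        else:
            gap += 1          # outlier address: skip it
        i += 1


print(serial_pruned_bri(6))   # [0, 4, 2, 1, 5, 3]
print(pruned_address(3, 6))   # (1, 1)
```

In this sketch the cost of `pruned_address` grows with `j` because the gap is only known after visiting every earlier address, which is the dependency that a lookahead computation of the gap is meant to remove.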

II. PRELIMINARIES AND PROBLEM FORMULATION

Consider the set of integers and let be a permutation on . Denote by the -bit binary representation of , where and , . The bit-reversal of is defined as . Note that and hence .
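A short Python sketch of the bit-reversal map may help here. It assumes the standard definition (reverse the fixed-width binary string of the index) and also shows a doubling construction of the whole BRP sequence. The function names `bit_reverse` and `brp` are illustrative choices, not notation from the paper.

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary string of i (n >= 1)."""
    return int(format(i, f"0{n}b")[::-1], 2)


def brp(n):
    """BRP sequence on n bits, built by doubling.

    If seq holds the reversal on m bits, then the reversal on m + 1 bits is
    2 * seq for the lower half of the indices and 2 * seq + 1 for the upper half.
    """
    seq = [0]
    for _ in range(n):
        seq = [2 * x for x in seq] + [2 * x + 1 for x in seq]
    return seq


print(brp(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying `bit_reverse` twice with the same width returns the original index, so the permutation is its own inverse.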

The goal is to characterize the so-called permutation statistics of when is the BRP. The subject of permutation statistics dates back to Euler [45], but was formally established as a discipline of mathematics by MacMahon in [46], [47]. We start with some definitions.

A fixed point of is an integer such that . An excedance [46] of is an integer such that . Denote by and the sets consisting of all fixed points and all excedances of , respectively, and by and the number of fixed points and excedances of (sometimes called excedance number). An element of a permutation that is neither a fixed point nor an excedance is called a descedance. For example, the permutation has fixed points and excedances , and hence and .

We say that is a descent [46] of if . Similarly, is an ascent of if . Denote by and the sets of descents and ascents of , respectively, and by and the number of descents and ascents of . The major index [46] of , , is the sum of the descents, i.e., . In our example, the descents are and hence , the ascents are and hence , and the major index is .

A pair is called an inversion [46] of if and . The set of all inversions is denoted by and its size by . Continuing our example, the inversions are {(0,1), (0,3), (0,8), (1,8), (2,3), (2,4), (2,6), (2,7), (2,8), (3,8), (4,7), (4,8), (5,6), (5,7), (5,8), (6,7), (6,8), (7,8)}, and .

The spread of entries with span of measures how far are spread apart after permuting. The minimum spread [48] of all distinct entries of with a span is defined as . For our example, no two consecutive entries map into consecutive entries, but entries 0, 1 map to , hence .

Often it is convenient to represent a permutation on by a array with a cross in each of the squares . Fig. 1 shows the array representation of the permutation in our example. Fixed points correspond to crosses on the main diagonal, excedances to crosses to the right of this diagonal, while descedances are represented by crosses on the left.

In this paper we introduce a new permutation statistic useful for analyzing pruned interleavers called permutation inliers. An integer is called an -inlier of if and . Let denote the set of all -inliers,

(1)

and the number of -inliers of . We call determining for arbitrary the permutation inliers problem. Similarly, an integer is called an -outlier if and . denotes the set of all -outliers, and their number: , or equivalently

(2)


Fig. 1. Array representation of .

where ‘ ’ is the set-difference operator. Referring to the array diagram of in Fig. 1, the -inliers correspond to the crosses included in the rectangle with diagonal vertices (0,0) and . In the previous example, the (5,7)-inliers are , while the outliers are the complement set .

The more general case of counting inliers in a bounded region and , , reduces to the original problem in (1) by observing that . Hence without loss of generality, we focus on (1) in the remainder of this paper.

There are no known number theoretic techniques to analyze the structure of INL for arbitrary permutations as presented above. However, with the help of Lemma 1, we can recast the problem into one of evaluating a summation that involves integer floors, a device which is well-studied in number theory.

Lemma 1: The number of -inliers of is given by

(3)

Proof: The floor function is the largest integer less than or equal to the real number . The first floor function in (3) evaluates to 1 for and 0 otherwise, while the second evaluates to 1 for and 0 otherwise. Hence the sum of their product counts the number of elements in . The outliers count in the complement set is simply

(4)
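The idea behind Lemma 1 can be checked numerically with a small Python sketch. The exact floor expressions of (3) are not legible in this transcript, so the sketch uses one floor-based 0/1 indicator that has the property stated in the proof; the particular form `1 + floor((bound - 1 - x) / k)` is an assumption made here, not the paper's formula. The inlier definition (an index below `alpha` whose image is below `beta`) follows the text above, and the names `alpha`, `beta`, and `indicator_less` are illustrative.

```python
def brp(n):
    """BRP sequence on n bits (doubling construction)."""
    seq = [0]
    for _ in range(n):
        seq = [2 * x for x in seq] + [2 * x + 1 for x in seq]
    return seq


def inliers_brute(perm, alpha, beta):
    """Count indices i < alpha whose image perm[i] is < beta."""
    return sum(1 for i in range(alpha) if perm[i] < beta)


def indicator_less(x, bound, k):
    """Floor-based indicator: 1 if x < bound, else 0.

    Valid for 0 <= x < k and 0 <= bound <= k; Python's // is floor division.
    """
    return 1 + (bound - 1 - x) // k


def inliers_floor_sum(perm, alpha, beta):
    """Sum over all indices of the product of two floor-based indicators."""
    k = len(perm)
    return sum(indicator_less(perm[i], beta, k) * indicator_less(i, alpha, k)
               for i in range(k))


perm = brp(3)                       # [0, 4, 2, 6, 1, 5, 3, 7]
print(inliers_brute(perm, 5, 6))    # 4
print(inliers_floor_sum(perm, 5, 6))
print(5 - inliers_brute(perm, 5, 6))  # outliers among the first 5 indices
```

The last line uses the complement relation stated in the proof: the outliers among the first `alpha` indices are those that are not inliers.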

Lemma 2 (Properties of ): If , then and hence

(5)

Moreover, if , then for , .

Proof: Since , if maps to , then maps to . Hence the two sums in (5) count the same elements. For , substitute in (3) and use .

Similarly, we can recast the inversions problem into floor sums as follows. First observe that inversions are the union of the outlier sets , where the elements of each set are paired with :

(6)

The notation is the set of pairs, .
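The running example can be explored with a short Python sketch. The example permutation itself is not legible in this transcript, so the sketch recovers a candidate from the inversion list printed above, assuming that list is complete, that indices run from 0 to 8, and that a pair (i, j) is an inversion when i < j and the image of i exceeds the image of j. It then computes the statistics defined in this section and counts inversions as a sum of outlier counts, in the spirit of (6). All function names are illustrative.

```python
INVERSIONS = [(0, 1), (0, 3), (0, 8), (1, 8), (2, 3), (2, 4), (2, 6), (2, 7),
              (2, 8), (3, 8), (4, 7), (4, 8), (5, 6), (5, 7), (5, 8), (6, 7),
              (6, 8), (7, 8)]


def perm_from_inversions(inversions, k):
    """Rebuild a permutation from its inversion set via the Lehmer code."""
    code = [0] * k
    for i, _ in inversions:
        code[i] += 1                  # number of later entries smaller than perm[i]
    remaining = list(range(k))
    return [remaining.pop(c) for c in code]  # pick the c-th smallest unused value


def statistics(perm):
    """Fixed points, excedances, descents, ascents, and inversions of perm."""
    k = len(perm)
    return {
        "fixed_points": [i for i in range(k) if perm[i] == i],
        "excedances": [i for i in range(k) if perm[i] > i],
        "descents": [i for i in range(k - 1) if perm[i] > perm[i + 1]],
        "ascents": [i for i in range(k - 1) if perm[i] < perm[i + 1]],
        "inversions": [(i, j) for i in range(k) for j in range(i + 1, k)
                       if perm[i] > perm[j]],
    }


def inversions_via_outliers(perm):
    """For each j, count the indices i < j with perm[i] >= perm[j]."""
    return sum(sum(1 for i in range(j) if perm[i] >= perm[j])
               for j in range(len(perm)))


perm = perm_from_inversions(INVERSIONS, 9)
stats = statistics(perm)
print(perm)                           # [3, 1, 7, 2, 5, 8, 6, 4, 0]
print(stats["fixed_points"], stats["excedances"])
print(stats["descents"], sum(stats["descents"]))  # descents and major index
print(inversions_via_outliers(perm))  # 18
```

Under these assumptions the recovered permutation reproduces the printed inversion list exactly, and its first two entries map to 3 and 1, which is consistent with the remark above that no two consecutive entries map into consecutive entries.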

Lemma 3 (Inversions): The number of inversions is given by

(7)

Proof: From (6), is the sum of for . Also, . Summing (4) with for , the result follows.

For certain permutations such as , , it is possible to obtain closed form expressions for (3) and (7). First we need the following lemma.

Lemma 4: For any integers , we have

(8)

Proof: Write and, then substitute in summation (8).

Now applying (8) in (3) for , we have

(9)

since , . For example, for , , ,, the number of (15,19)-inliers is .

To count the inversions, we substitute (9) within (4), then (7). It is easy to verify that , and that when , and 0 otherwise. Then from (7) and (4)

For other types of permutations such as polynomial-based permutations or BRPs, finding a closed form expression for (3) is not as simple due to the presence of the floor functions. Fortunately, such sums can be more conveniently manipulated by replacing with the “saw-tooth” function

(10)

where if is an integer, and 0 otherwise.

It will be shown in this paper that for any permutation that fixes the zero element (i.e., ),

(11)

where is a constant. Hence in the remainder of this paper, we focus on evaluating summations of the form

(12)

when is the BRP, which are reminiscent of Dedekind sums [25]. Evaluating (12) for arbitrary permutations is still an open problem. For BRPs, we show that these sums can be evaluated recursively in steps using only integer addition and shift operations. This result is extended to evaluate sums (3) and (7) using simple mathematical recursions.

Moreover, for the purposes of characterizing the randomness of pseudo-random numbers generated by BRPs, we study the serial correlations between an entry in the bit-reversed sequence and its successors. These correlations require evaluating related sums of the form for all successors. We propose a simple recursive algorithm to evaluate these sums in logarithmic complexity.

III. RECURSIVE RELATIONSHIPS

In this section, we derive recursive expressions for sums involving that are useful for computing permutation statistics. The following properties immediately follow from (10):

; ; ;for integers , real ; and

(13)

(14)

(15)
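The saw-tooth function and a few of its elementary properties can be checked with exact rational arithmetic. The sketch below assumes the standard form used with Dedekind sums: the function equals x minus its floor minus one half when x is not an integer, and zero when x is an integer. This matches the description of (10) given earlier, but the displayed equation itself is not legible in this transcript. The name `saw` is an illustrative choice.

```python
from fractions import Fraction
from math import floor


def saw(x):
    """Saw-tooth function ((x)) for a rational x, in exact arithmetic."""
    x = Fraction(x)
    if x.denominator == 1:      # x is an integer
        return Fraction(0)
    return x - floor(x) - Fraction(1, 2)


k = 16
x = Fraction(5, k)
print(saw(x), saw(-x), saw(x + 3))            # 3/16 vs. its negation; period 1
print(sum(saw(Fraction(i, k)) for i in range(k)))  # 0 over a complete residue system
# The floor can be recovered from the saw-tooth value:
print(floor(x) == x - saw(x) - Fraction(1, 2))
```

The three checks correspond to oddness, periodicity with period one, and the vanishing sum over a complete residue system, all of which follow directly from the assumed definition.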

Next, consider the sum of product of the -power integers and the bit-reversed integers , :

(16)

Theorem 1: can be evaluated as

with initial conditions and , where are the Bernoulli numbers. Also, since in (16) the order in which integers are summed is irrelevant and , we have

.

Corollary 1: For we have . For , 2, we have

(17)

Moreover, the function possesses many interesting properties when is a rational number , especially when is summed over a complete residue system (RS) modulo . Lemma 5 summarizes some of these identities:

Lemma 5 (Sum of Saw-Fractions Over a Complete RS):

(18)

(19)

(20)

(21)

Further properties are derived when or are summed over half a RS for shift values.

Lemma 6 (Sum of Saw-Fractions Over Half a RS): Let be a non-negative integer and set , then

;.

(22)

(23)

In particular, when , .

Sums of and involving squared integers have never been attempted before. Below we derive an interesting identity for these sums over a complete RS.

Lemma 7:

even;
odd. (24)

even;
odd. (25)

For the arithmetic analysis of BRPs , sums involving and their variations are of particular interest.

Lemma 8: For ,

(26)

More generally, sums of products for shift integers can also be evaluated efficiently.

Lemma 9: Let if is odd and if is even. Define and . Then

odd;
even. (27)

. (28)

Next, we investigate summations that involve products of differences of saw-functions similar to those in (11):

(29)

Lemma 10: If or , then . Else, let , , be as defined in Lemma 9. Then (see (30) at the bottom of the page). This recursion (and (27)) can be evaluated using integer arithmetic in at most steps since .

Note that the recursive solution in (30) is similar to that for linear permutation polynomials involving Dedekind sums [31].

Specifically, when in (29), a closed form expression for the sum can be derived. These sums appear in equations similar to (7) for counting inversions.

Lemma 11:

(31)

We next consider sums of products of saw-fractions involving and their -th successors . These sums are used in studying the serial correlation properties of BRPs.

Lemma 12: Consider and let be the position of the least-significant ‘1’ in . Set . Then

(32)

For example, when , and, we have and

(33)

and when , we have and

. An algorithm for computing using integer operations is shown below. Note that is an integer since , , and are divisible by 3 since is a power of 2 (proved by induction).

Another related sum is one involving shifted saw-fractions and their first-successors , for shift values. These sums are used in studying the probability of consecutive BRP terms falling within specific intervals. Let

(34)

Algorithm 1: Integer Algorithm to Compute

for to do

end for

in (32)

Lemma 13: satisfies the recursion

odd;

even.(30)


where if is even and if is odd, if is even and if is odd, if is even and if is odd, if is even and if is odd, and if both are odd or both even and otherwise.

Finally, generalizing (34) into products of differences as given in (35), we have the following lemma:

(35)

Lemma 14: satisfies the recursion

(36)

where if is even and 1 if is odd,, , ,

, and .

IV. PERMUTATION STATISTICS OF BRPS

We next derive permutation statistics for BRPs and present a solution for the permutation inliers problem using the results from Section III. Let be the sequence , generated by the BRP on bits, and let .

A. Descents and Major Index

We start by enumerating descents induced by a BRP.

Lemma 15: The probability that is and the number of descents is . More generally, the probability that is 0 for .

Proof: Let , . We enumerate the occurrences in , with taken . An even has while an odd has . Then and when is odd since and . Hence occurs exactly times, which gives . When , then cannot occur because is even for at least one and hence .

The number of ascents is . The major index of is the sum of the indices of the first element in each pair that is a descent.

Lemma 16: The major index of a BRP is .

Proof: From Lemma 15, descents occur at odd indices, hence the major index is .
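The observation in the proof of Lemma 16 can be checked by brute force. The sketch below assumes 0-based indices, no wrap-around from the last entry back to the first, and the usual definition of a descent (an index whose image exceeds the image of the next index). The closed-form values in Lemmas 15 and 16 are not legible in this transcript, so the printed numbers are what this sketch computes under those assumptions rather than a quotation of the paper's formulas.

```python
def brp(n):
    """BRP sequence on n bits (doubling construction)."""
    seq = [0]
    for _ in range(n):
        seq = [2 * x for x in seq] + [2 * x + 1 for x in seq]
    return seq


def descents(perm):
    """Indices i with perm[i] > perm[i + 1] (no wrap-around)."""
    return [i for i in range(len(perm) - 1) if perm[i] > perm[i + 1]]


def major_index(perm):
    """Sum of the descent indices."""
    return sum(descents(perm))


for n in range(2, 6):
    p = brp(n)
    print(n, descents(p), major_index(p))
```

For every width tried, the descents are exactly the odd indices that have a successor. An even index always gains the most significant bit when stepping to its odd neighbour, so it is an ascent, while an odd index has its top bit set and its even successor does not.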

B. Fixed Points, Excedances, and Descedances

The number of fixed points of a BRP is the number of palindromes (when ):

(37)

The sum of all fixed points as well as their squares can be evaluated using the following lemma:

Lemma 17 (Sum of Fixed Points):

even;

odd.(38)

even;

odd.

(39)

An excedance of is an integer such that .

Lemma 18: The excedance number of a BRP is . The probability that is .

Proof: Let , . These representations can be grouped such that , , or . From (37), the number of palindromes is . Since there are an equal number of remaining representations corresponding to and , the number of occurrences of or is , and the probability that is .

Corollary 2:

Proof: The floor functions evaluate to 1 when , hence their sum is the excedance number.
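The counting argument in the proof of Lemma 18 can be checked by enumeration. The sketch below counts fixed points (binary palindromes) and excedances of the BRP directly. The comparison values in the printout, namely 2 raised to the ceiling of half the bit width for the palindromes and half of the remaining indices for the excedances, are derived here from the symmetry argument in the proof and are not copied from (37), which is not legible in this transcript.

```python
def brp(n):
    """BRP sequence on n bits (doubling construction)."""
    seq = [0]
    for _ in range(n):
        seq = [2 * x for x in seq] + [2 * x + 1 for x in seq]
    return seq


def fixed_points(perm):
    """Indices mapped to themselves (binary palindromes for a BRP)."""
    return [i for i, p in enumerate(perm) if p == i]


def excedances(perm):
    """Indices mapped to a strictly larger value."""
    return [i for i, p in enumerate(perm) if p > i]


for n in range(1, 7):
    p = brp(n)
    palindromes = 2 ** ((n + 1) // 2)          # free choice of the first ceil(n/2) bits
    print(n, len(fixed_points(p)), palindromes,
          len(excedances(p)), (2 ** n - palindromes) // 2)
```

Because the BRP is its own inverse, every non-palindromic index pairs with its image, and exactly one member of each pair is an excedance, which is why the two counts in each row agree.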

Next we consider the sum of all excedances of ,

.

Lemma 19: The sum of excedances is given by

even;

odd(40)

Corollary 3: The sum of descedances is given by

even;

odd.

Proof: Apply

, and note that

which sums all such that .


In fact the sum of the squares of all excedances can be similarly evaluated.

Lemma 20 (Sum of Squares of Excedances):

even;

odd.

Proof: The proof is similar to Lemma 19. When is even, we obtain the recursion

when . Similarly, when is odd, we obtain

when .

Corollary 4 (Sum of Squares of Descedances):

even;

odd.

Proof: The proof is similar to that of Corollary 3.

C. Minimum Spread

Lemma 21: The minimum spread, of a

BRP with is

Proof: For , if is even. For odd, the minimum occurs when , in which case

. For ,. For ,

when and , we have, hence .

D. Inliers and Outliers

Theorem 2: For any permutation that fixes the zero element (i.e., ), the number of inliers is given by

(41)

where is given by

(42)

Corollary 5: Specifically, for BRPs, reduces to

(43)

where is given in (30), and reduces to

(44)

Equation (43) can be evaluated recursively in steps using (30). Note that since , only integer shift and add operations are needed to evaluate (30) and (43), assuming the product of the constants is computed off-line.

Example 1: Let , , , . Then and since and . Using (30), we have and . Next we have . These steps are repeated using (30), resulting in . Therefore, using (43) we have .

Corollary 6: For BRPs and , , it follows that .

Theorem 3: Let . Then the probability that for is

where is given by (43).

Proof: enumerates all such that , while enumerates all such that . Hence the difference counts all such that . Similarly for . Therefore counts all such that .

We can similarly enumerate all that have successive inliers, i.e., those such that and :

(45)

Theorem 4: For , , reduces to

(46)


where is given in (35), (36),

(47)

, and . Moreover, if and , then .

Proof: Expand (45) using similar to Theorem 2, multiply out terms and then simplify the expression using (20), (21) to obtain (46) and (47). Further, it is easy to show that evaluates to depending on . The details of the proof are omitted. Finally, if both and since either or is odd which implies either or , and hence all .

Example 2: Continuing Example 1, we have . Also ; ; ; . Therefore . Next, using (36) we have ; ; ; ; . Summing all terms, we get . Therefore .

Theorem 5: Let . Then the probability that and is

where is given by (46). This is similar to Theorem 3.

E. Inversions

Lemma 22 (Inversions): The number of inversions is

(48)

Proof: Using (3), (41) in (7) with , we have

where are given in (17) and (31), respectively, and from (44) when , and 0 otherwise. Substituting (17) and (31) and simplifying terms, (48) follows.

F. Serial Correlations

A necessary condition for the apparent randomness ofis the small size of the serial correlation statistic :

(49)

for , where is the expectation operator. measures the extent to which depends on its -th successor

. To compute , we first determine the variance : , and

. Hence

. The only difficult part of (49) is the covariance:

Theorem 6 (Covariance):

(50)

where is given by (32), and is the position of the least-significant one-bit in (starting from 0).

Corollary 7 (Serial Correlations for ):

Proof: Substitute (33) for and in (50), then divide by the variance .

A correlation coefficient always lies between −1 and 1. A small coefficient indicates that and are almost independent. Hence it is desirable to have . Since , it follows that BRPs have weak correlation properties.
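A serial correlation coefficient can also be estimated numerically for a given sequence. The sketch below is an assumption-laden illustration rather than the statistic in (49): it uses the sample Pearson correlation over the non-cyclic pairs (seq[j], seq[j+s]), and the names `serial_correlation` and `bit_reverse` are hypothetical. The normalization in (49) may differ, so values need not match Corollary 7 exactly.

```python
from math import sqrt


def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def serial_correlation(seq: list[int], s: int) -> float:
    """Sample Pearson correlation between seq[j] and seq[j + s].

    Assumption: pairs are taken without wrap-around, j = 0 .. len(seq)-s-1.
    """
    if not 1 <= s <= len(seq) - 2:
        raise ValueError("lag s must satisfy 1 <= s <= len(seq) - 2")
    x, y = seq[:-s], seq[s:]
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    n = 8
    perm = [bit_reverse(j, n) for j in range(1 << n)]
    for s in (1, 2, 3, 4):
        print(s, round(serial_correlation(perm, s), 4))
```

As a sanity check, the identity sequence 0, 1, 2, ... gives a coefficient of 1 at every lag under this definition.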

V. SERIALLY-PRUNED BRIS AND MINIMAL INLIERS

The permutation inliers problem is applied to study pruned bit-reversal interleavers (BRIs). A BRI maps an -bit integer into -bit integer such that , where , and

is the interleaver size. A serially-pruned BRI (PBRI) of size and pruning length , with , is defined by , such that: 1)

, and 2) is the serial pruning function where is the pruning gap of defined to be the minimum

such that (i.e., for , is satisfied exactly times). The domain and range of are and . Pruned interleavers are used when blocks of arbitrary lengths (other than powers-of-2) are needed. To interleave a block of size , a mother interleaver whose size is the smallest power-of-2 that is is selected and pruned. Hence, in the following, we assume that .

There are several ways to prune addresses from the mother interleaver. One method is to ignore positions beyond in the permuted sequence, which we consider in this work (see also [29], [30]). Other methods prune addresses beyond


Fig. 2. (a) Flowchart of the serial pruning algorithm. (b) Smallest interval of addresses that has exactly inliers with respect to .

in the original sequence, or prune a mixture of addresses from both the original and permuted sequences [30]. Hence any address that maps to an address is dropped and the next consecutive address is tried instead. To determine where an address is mapped, a serial PBRI (S-PBRI) starts from and maintains the number of invalid mappings (pruning gap) that have been skipped along the way (see Fig. 2(a)). If maps to a valid address (i.e., ), then is incremented by 1. If maps to an invalid address (i.e.,

), is incremented by 1. These steps are repeated until reaches and , and hence

. Therefore, . Algorithm 2 shows the pseudo-code of a generic cascadable S-PBRI with

, , the pruning gap up to , and up to . The parameters are set to to compute .

Algorithm 2: Serial PBRI Algorithm:

while do

if then

else

end if

end while
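The serial scan described above can be sketched in a few lines. This is an illustrative reading of the prose, not a transcription of Algorithm 2 (whose body did not survive extraction): the names `serial_pruning_gap` and `serial_pbri`, and the convention that an address is valid when its bit-reversed image is below the pruning length `beta`, are assumptions made for the example.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def serial_pruning_gap(n: int, alpha: int, beta: int) -> int:
    """Count invalid addresses skipped until alpha valid ones are found.

    Address i + phi is valid if bit_reverse(i + phi, n) < beta. On return,
    the first alpha + phi addresses contain exactly alpha valid mappings.
    """
    if not 0 <= alpha <= beta <= (1 << n):
        raise ValueError("require 0 <= alpha <= beta <= 2**n")
    i = 0    # number of valid mappings found so far
    phi = 0  # number of invalid mappings skipped so far (pruning gap)
    while i < alpha:
        if bit_reverse(i + phi, n) < beta:
            i += 1
        else:
            phi += 1
    return phi


def serial_pbri(n: int, beta: int) -> list[int]:
    """Serially pruned bit-reversal sequence: keep only images below beta."""
    return [y for y in (bit_reverse(x, n) for x in range(1 << n)) if y < beta]


if __name__ == "__main__":
    print(serial_pbri(3, 5))            # pruned sequence for 3 bits, beta = 5
    print(serial_pruning_gap(3, 4, 5))  # gap after 4 valid addresses
```

The loop visits every address up to the answer, which is the serial bottleneck that the inliers formulation is meant to remove.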

The time complexity to determine is . However, using the inliers problem formulation, is simply the minimum non-negative integer to be added to such that has exactly inliers: such that (see Fig. 2(b)). Out of the first addresses, there are outliers . Hence . Next consider the expanded interval of addresses . This set contains outliers. Hence again . This process is repeated by expanding the interval into

and determining the corresponding number of outliers. The process terminates when at some step when there are no more outliers, and hence

. This process for computing the minimum number of inliers is implemented in Algorithm 3.

Example 3: Let

. Applying the MI algorithm, we have using (2), (43). Next we expand to and recompute

. At step 3 we have . The steps are repeated

until with .

Algorithm 3: Minimal Inliers Algorithm:

repeat

until
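The interval-expansion idea behind the minimal inliers iteration can be illustrated as follows. This is a hedged sketch under stated assumptions: `inliers` counts by brute force and merely stands in for the closed-form evaluation via (43), and the function names are hypothetical. The iteration repeatedly sets the gap to the number of outliers found in the current interval until that number stops changing.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def inliers(n: int, m: int, beta: int) -> int:
    """Number of addresses j < m whose bit-reversed image is below beta.

    Brute-force placeholder for a closed-form inlier count.
    """
    return sum(1 for j in range(m) if bit_reverse(j, n) < beta)


def minimal_inliers_gap(n: int, alpha: int, beta: int) -> int:
    """Smallest phi such that the first alpha + phi addresses hold alpha inliers."""
    if not 0 <= alpha <= beta <= (1 << n):
        raise ValueError("require 0 <= alpha <= beta <= 2**n")
    phi = 0
    while True:
        # Outliers inside the current interval [0, alpha + phi).
        outliers = (alpha + phi) - inliers(n, alpha + phi, beta)
        if outliers == phi:  # interval already contains alpha inliers
            return phi
        phi = outliers       # expand the interval by the missing count


if __name__ == "__main__":
    print(minimal_inliers_gap(3, 4, 5))
    print(minimal_inliers_gap(3, 5, 5))
```

Each pass can only move the interval end forward and never past the true answer, since an interval missing k inliers must grow by at least k addresses. With a fast inlier count, the number of passes rather than the gap itself determines the running time.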

The convergence rate of the MI algorithm is as shown in Theorem 7. The proof is based on deriving exact expressions for tight lower and upper bounds on . Fig. 3 plots these bounds and for when .

Theorem 7 (Rate of Convergence): The minimal inliers algorithm converges at a rate .

Using the MI algorithm a parallel PBRI of length with

a parallelism factor of over the S-PBRI can be designed by employing (or if ) S-PBRIs of size

and pruning length as shown in Algorithm 4.

Algorithm 4: Parallel PBRI Algorithm:

for all do

end for

if then

end if
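One way to read the parallel construction is that each worker first obtains the pruning gap at its own starting index and then scans serially from there. The sketch below follows that reading and is not a transcription of Algorithm 4: the chunking rule, the names `parallel_pbri` and `minimal_inliers_gap`, and the brute-force `inliers` helper (standing in for (43)) are all assumptions. The chunks are processed in a plain loop here, but each depends only on its own start index.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def inliers(n: int, m: int, beta: int) -> int:
    """Brute-force count of addresses j < m with bit_reverse(j, n) < beta."""
    return sum(1 for j in range(m) if bit_reverse(j, n) < beta)


def minimal_inliers_gap(n: int, alpha: int, beta: int) -> int:
    """Smallest phi such that the first alpha + phi addresses hold alpha inliers."""
    phi = 0
    while True:
        outliers = (alpha + phi) - inliers(n, alpha + phi, beta)
        if outliers == phi:
            return phi
        phi = outliers


def parallel_pbri(n: int, alpha: int, beta: int, p: int) -> list[int]:
    """First alpha outputs of the pruned sequence, computed in p chunks."""
    if not 0 <= alpha <= beta <= (1 << n) or p < 1:
        raise ValueError("require 0 <= alpha <= beta <= 2**n and p >= 1")
    width = -(-alpha // p)  # ceiling of alpha / p
    out: list[int] = []
    for s in range(p):
        first = s * width
        last = min(first + width, alpha)
        if first >= last:
            break
        # Lookahead: jump straight to the address where this chunk begins.
        addr = first + minimal_inliers_gap(n, first, beta)
        produced = 0
        while produced < last - first:
            y = bit_reverse(addr, n)
            if y < beta:
                out.append(y)
                produced += 1
            addr += 1
    return out


if __name__ == "__main__":
    print(parallel_pbri(3, 5, 5, 2))
```

Because the per-chunk lookahead replaces the scan over all earlier addresses, the chunks could be dispatched to separate workers and concatenated in order.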


Fig. 3. (a) Lower and upper bounds on the pruning gap for , , and (b) convergence rate of the MI algorithm for .

VI. EXTENSIONS TO OTHER INTERLEAVERS

We extend the discussion in this section to composite interleavers that employ smaller interleavers to construct a larger interleaver, such as 2D block and stream interleavers.

A. 2D Block Interleavers

A 2D block interleaver [18], [28] of size is defined by a permutation composed of two smaller permutations and of size and , respectively, where . Let ,

, and . Then . Alternatively, we say is mapped

to . This is equivalent to writing the sequence of integers into a array row-wise, permuting the entries in each column by and in each row by , then reading the entries from the array column-wise. Such an interleaver is referred to as row-by-column. The reversal of dimensions in general improves the spread properties of . If identical permutations are applied to all columns and identical applied to all rows, then the order of applying the permutations does not matter. Otherwise, if is column-specific, say , and is row-specific, say , then the order matters.

In a row-first block interleaver, an entry maps to row then to column , while in a

column-first interleaver, it maps to column then to row . For simplicity, we assume identical ’s and identical ’s below.

A pruned 2D block interleaver of size

and pruning length , with , is defined by the map where similar to a pruned 1D block interleaver. Here, ,

, , , and the integer maps to ,

where , , , and is a 2D permutation. In a pruned 2D block BRI (P2BRI), and are BRPs on and bits, respectively, and , .

To count the -inliers

in a pruned 2D interleaver, we count the number of

times for . These are satisfied if: 1a) ,

or 1b) and ; and 2a) , or 2b) and . Conditions 1a), 2a) are both satisfied

times. 1a), 2b) count the -inliers for , which are . Similarly 1b), 2a) count the -inliers

for , which are . Finally, 1b), 2b) are satisfied once if and . Adding the results:

if ; otherwise.

(51)

Example 4: Consider a P2BRI with , , , . We have ;

, , , , ; , ;

, . Using (43), we compute , . Since , conditions 1b), 2b)

are not satisfied. Hence .

The minimal inliers algorithm can be applied to compute the pruning gap of a P2BRI with outliers

computed using (51). A parallel P2BRI can be realized as well using Algorithm 4. Extensions to multi-dimensional hyper-block pruned interleavers can be similarly defined, but the details are omitted due to lack of space.
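The row-by-column construction described at the start of this subsection (write row-wise, permute within columns and within rows, read column-wise) can be sketched as below. The function name `block_interleave_2d`, its argument order, and the convention that an entry at (r, c) is moved to (col_perm[r], row_perm[c]) are assumptions for illustration; the paper's exact index convention may be the inverse of this one. Identical permutations are applied to all columns and to all rows.

```python
def block_interleave_2d(
    rows: int, cols: int, col_perm: list[int], row_perm: list[int]
) -> list[int]:
    """Row-by-column block interleaver on rows * cols integers.

    col_perm (length rows) permutes the entries within every column;
    row_perm (length cols) permutes the entries within every row.
    """
    if sorted(col_perm) != list(range(rows)):
        raise ValueError("col_perm must be a permutation of range(rows)")
    if sorted(row_perm) != list(range(cols)):
        raise ValueError("row_perm must be a permutation of range(cols)")
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Integer r*cols + c was written row-wise at (r, c); move it.
            grid[col_perm[r]][row_perm[c]] = r * cols + c
    # Read the array column-wise.
    return [grid[r][c] for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    print(block_interleave_2d(2, 3, [0, 1], [0, 1, 2]))
    print(block_interleave_2d(2, 3, [1, 0], [0, 1, 2]))
```

With identity permutations the result is a plain matrix transpose read-out, which makes the effect of the two constituent permutations easy to isolate.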

B. Stream Interleavers

In some communication systems (e.g. [20], [49]), a block of information bits is divided into sub-blocks each of which is interleaved independently. Interleaved bits out of each sub-block are treated as streams that are concatenated (or even further interleaved) to form the interleaved bits of the original block. For example, a 2-stream interleaver divides a block of length into two sub-blocks of size , interleaves sub-block 0 using and sub-block 1 using , and then combines the output bits


Fig. 4. 8-way parallel contention-free mapping for (a) an unpruned, and (b) a pruned FFT bit-reversal mapping with , , , .

from both streams in an alternating fashion. The resulting permutation is given by

where . A 2-stream bit-reversal interleaver uses bit-reversal maps on bits to interleave the sub-blocks, i.e., . A pruned 2-stream bit-reversal interleaver is defined similar to a PBRI.

To count the -inliers for any pruned 2-stream interleaver, we simply count the -inliers of and the -inliers of , to obtain

In fact, the above formula can be generalized to a pruned -stream interleaver employing generic constituent permutations of size , where the output bits from the streams are permuted according to some permutation of size . The resulting permutation is given by

for . To count its -inliers, we simply count the -inliers of for and add the results:

where .
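The additivity of inlier counts across streams can be checked numerically on a small case. The sketch below assumes a specific 2-stream layout that is not confirmed by the extracted text: sub-block 0 occupies the first half of the input and lands on even output positions, sub-block 1 occupies the second half and lands on odd positions. The names `two_stream`, `inliers` and `two_stream_inliers`, and the ceiling/floor split of the pruning length, follow from that assumed layout only.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def two_stream(sigma0: list[int], sigma1: list[int]) -> list[int]:
    """Assumed 2-stream layout: stream 0 -> even outputs, stream 1 -> odd."""
    half = len(sigma0)
    return [2 * sigma0[j] for j in range(half)] + [
        2 * sigma1[j] + 1 for j in range(half)
    ]


def inliers(perm: list[int], alpha: int, beta: int) -> int:
    """Number of j < alpha with perm[j] < beta (brute force)."""
    return sum(1 for j in range(alpha) if perm[j] < beta)


def two_stream_inliers(
    sigma0: list[int], sigma1: list[int], alpha: int, beta: int
) -> int:
    """Inliers of the composite as a sum of constituent inlier counts."""
    half = len(sigma0)
    # 2*s < beta      <=>  s < ceil(beta / 2)
    # 2*s + 1 < beta  <=>  s < floor(beta / 2)
    return inliers(sigma0, min(alpha, half), (beta + 1) // 2) + inliers(
        sigma1, max(alpha - half, 0), beta // 2
    )


if __name__ == "__main__":
    s = [bit_reverse(j, 3) for j in range(8)]
    pi = two_stream(s, s)
    print(inliers(pi, 11, 9), two_stream_inliers(s, s, 11, 9))
```

Under this layout each constituent count could be replaced by a closed-form evaluation, so the composite inherits the cost of its constituents.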

VII. APPLICATIONS OF SERIALLY-PRUNED PERMUTERS

In this section, we apply the inliers problem to design parallel bit-reversal permuters for pruned FFTs, as well as parallel pruned interleavers for LTE turbo codes.

A. Pruned FFT Algorithm

The FFT algorithm is widely used in signal processing and communications applications such as digital filtering, spectral analysis, and polyphase filter multicarrier demultiplexing (MCD) [50]–[53]. In some of these FFT applications, there exist cases where the input vector has many zeros or the required outputs may be very sparse compared to the transform length. For example, in digital filtering, one may only require the spectrum corresponding to certain frequency windows of the FFT, or in MCD, only a few carriers out of the overall range of available carriers are needed. In digital image processing, only part of the images are of interest to certain applications. In such cases, most of the FFT outputs are not required. Several FFT pruning algorithms have been proposed [53]–[57] to avoid redundant computations on zero inputs or for unused outputs. However, most of these algorithms do not consider the cost of pruned bit reversal reordering of the inputs or outputs when performing in-place FFT computations.

For simplicity of exposition, we assume in the following that only a narrow spectrum is of interest, but the resolution within that band has to be very high. Hence, the DFT has some input values , but fewer than outputs are needed. We also assume a radix-2 FFT algorithm with in-place FFT computations using a set of butterflies that compute the final outputs in a set of memory banks in bit-reversed order. A subsequent stage performs BRP re-ordering of the outputs back to natural order. Note that since a BRP is an involution (i.e., ), re-ordering a bit-reversed output is analogous to bit-reversed ordering of the output in natural order. Hence, we assume that the FFT outputs are written in natural order in the output memory banks, and the BRP stage does bit-reversal ordering. Fig. 4(a) illustrates the BRP stage for the unpruned FFT case, which reads from the FFT memory banks and writes to the input memory banks at the receiving end. We show next that this BRP stage (both unpruned or serially pruned) can be parallelized to match the parallelism degree of the FFT, eliminating its serial bottleneck on throughput.

A permutation of length in general is said to be contention-free (CF) [58] with degree if an array of data elements stored in one set of read memory banks each of size can be permuted and written into another set of write memory banks, such that at each step, data elements are read in parallel from the read banks and written in parallel into the write banks without reading or writing more than one element from/to any bank (see Fig. 4(a)). Data is stored sequentially in the read banks such that linear address

corresponds to location in bank , where


and . To permute any data entries at linear addresses in parallel, the CF property stipulates that

for all and , where

the bank addressing function is either or

. This is a more general condition than [58], and effectively uses either the most or least significant bits (MSBs/LSBs) of as a permuted bank address.

It is easy to prove that the BRP is contention-free for any

, , , where and , using the property . For any pair of distinct windows , we have

for . Hence the LSBs designate a permuted bank address. Fig. 4(a) illustrates the CF property of the BRP for a 32-point unpruned FFT block whose outputs are stored in

read memory banks. The BRP stage permutes the 8 banks in steps and stores the data in the write memory banks in parallel without any stalls due to address collisions.

An arbitrary pruning of a permutation does not preserve its CF property. However, serial pruning does, and a CF pruned permuter can be designed as shown in Theorem 8. First, the serial-pruning map itself is CF. To show this, take two addresses and that correspond to banks and . Then

for any since is monotonically increasing and hence .

Theorem 8: Any serially-pruned, contention-free permutation (interleaver) remains contention free.

Proof: One scenario is to insert zero filler bits in the pruned positions while storing the data sequentially in memory across the banks. This requires comparing with serially for every before writing to memory. Hence the CF property applies for the pruned interleaver across all the banks if the mother interleaver is CF.

Another scenario is to store the data across the banks without filler bits as shown in Fig. 4(b). To interleave properly, we need to keep track of the inliers that fall within each window. First, since the number of inliers up to window is

, data located between address and are stored sequentially in bank .

We know that addresses map to distinct windows under . Address in window , which might be pruned, actually corresponds to the unpruned address

, where is defined as:

(52)

with initial condition . Then, for and , we have

Hence a serially-pruned interleaver is CF when the banks are accessed sequentially using a counter from , if the mother interleaver is CF.

The pruning gaps in (52) can be computed efficiently using Algorithm 3 together with any scheme for enumerating inliers depending on the permutation. In Fig. 4(b), Theorem 8 is applied to parallelize a pruned BRP stage of a 32-point FFT algorithm pruned to points and permute its outputs in parallel without contention when accessing the memory banks. Pruned locations are marked as . Each read memory bank is initialized with the appropriate using (52), and accessed by a counter that runs from 0 to . When reading bank at step , the actual address corresponds to . If , the read is successful. Otherwise, the location is pruned, reading from bank is stalled and is incremented. The FFT results are written in parallel in PBRP order in the write memory banks in 3 steps.
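The contention-free property of the unpruned BRP stated earlier in this subsection can be checked numerically for the 32-point, 8-bank case. The sketch below assumes that read bank t holds the consecutive addresses t·W, ..., t·W + W − 1 (W being the window size) and that the write bank is selected by the least-significant bits of the permuted address; the name `is_contention_free_lsb` is hypothetical, and only the LSB banking variant is tested.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the n least-significant bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def is_contention_free_lsb(perm: list[int], banks: int) -> bool:
    """Check the CF property with write bank = permuted address mod banks.

    Assumption: read bank t stores addresses t*window .. t*window + window - 1,
    so step j reads addresses j + t*window for t = 0 .. banks - 1 in parallel.
    """
    if len(perm) % banks != 0:
        raise ValueError("length must be a multiple of the number of banks")
    window = len(perm) // banks
    for j in range(window):
        write_banks = {perm[j + t * window] % banks for t in range(banks)}
        if len(write_banks) != banks:  # two reads target the same write bank
            return False
    return True


if __name__ == "__main__":
    brp32 = [bit_reverse(x, 5) for x in range(32)]
    print(is_contention_free_lsb(brp32, 8))
    print(is_contention_free_lsb(list(range(32)), 8))
```

The identity permutation fails this particular test because addresses j and j + 8 share the same low three bits, which illustrates why the choice between MSB and LSB banking matters for a given permutation.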

B. Pruned LTE Turbo Interleavers

Serial pruning is also valuable in turbo coding applications because it allows for flexible codeword lengths. In a typical communication system employing adaptive modulation and coding, only a small set of discrete codeword lengths is supported. Bits are either punctured or filled in to match the nearest supported length. For a pruned interleaver of length to be useful, it is desirable to have the following characteristics: 1) It does not require extra memory to store the pruned indices, 2) pruning preserves the CF property [58], [59] of its mother interleaver (if present), and 3) its spread factor [60] degrades gracefully with the number of pruned indices , and hence the impact on BER is limited.

Serial pruning satisfies properties 1 and 2 as shown in Section VII-A. The implications are that serially-pruned contention-free interleavers are parallelizable at a low implementation cost using the schemes proposed in this work to enumerate inliers. When coupled with windowing techniques to parallelize the constituent a posteriori probability (APP) decoders (see Fig. 4(b) with APP decoders instead of FFT blocks), a turbo decoder can be efficiently parallelized to meet throughput requirements in 4G wireless standards and beyond. We next show that serial pruning also satisfies property 3.

The spread factor of an interleaver is a popular measure of merit for turbo codes [60]. The spread measures of and associated with two indices are

and

. The minimum spreads of and are defined

as and

, . The following theorem shows that remains close to when is small.

Theorem 9: The minimum spread of a serially-pruned interleaver of length is at least

(53)

where is a small positive constant and .

The proof relies on the fact that , where are such that

. The difference


Fig. 5. (a) Minimum spread of pruned QPP interleavers in LTE. (b) BER of LTE turbo codes with pruned 2D QPP-BRP interleavers of length .

is upper bounded as , assuming , since is a monotonically increasing function. Since cannot

be separated by more than positions, we need to find the maximum of when . This is equivalent to finding the maximum expansion of an interval of length such that it contains at least inliers. From Theorem 2, this expansion leads to finding the minimum that satisfies , from which (53) follows. For example, the QPP interleaver

has and . If positions are pruned, then . In fact, the

actual is 62.

Fig. 5(a) plots the minimum spread of serially-pruned QPP interleavers as a function of , for several mother interleavers. The lower bound in (53) is also plotted. The length , minimum spread and constant of the mother interleavers are shown in brackets. As shown, of the pruned interleavers remains very close to when up to indices are pruned. Hence the bounds predicted by (53) are rather tight.

To assess the impact of serial pruning on error-correction performance, the BER of 3GPP LTE turbo codes employing serially-pruned 2D QPP-BRP interleavers was simulated over an AWGN channel. The 2D mother interleaver of length is a concatenation of a QPP and a BRP defined by , where , , , , and

. 500,000 frames were simulated assuming BPSK modulation and log-MAP decoding with up to 6 decoding iterations. Fig. 5(b) shows the results using the 2D mother interleaver and 11 serially-pruned interleavers of the indicated lengths. Also shown for comparison are results for three other 1D QPP interleavers of lengths 2048, 2016 and 1664 used in LTE (the other 9 lengths from 2016 to 2046 are not supported). In almost all cases, the pruned interleavers perform very close to the 2D mother QPP-BRP and 1D QPP interleavers.

Fig. 6. Architecture for computing in (43) for bit-reversal permutations using the function in (30).

VIII. PRACTICAL IMPLEMENTATION ASPECTS

Fig. 6 shows the architecture for computing in (43) for BRPs using the function in (30) using elementary logic gates. The block is clocked for clock cycles to produce the result. The three shift registers on the left are initialized with . The register with symbol % drops out the MSB every cycle and stores the resulting contents back in the register, while registers with symbols perform a left shift by one position every cycle. Block reverses the bits of or depending on whether is odd or even. The multiplexer logic simply selects what the expression in the

recursion in (30) evaluates to (see proof of Lemma 10 in Appendix). The block with symbol multiplies by 2 to generate or . After clock cycles, the output is then divided by using the block, and then (block divides by ) and are added to generate .

Fig. 7 shows the implementation of the minimal inliers algorithm in Algorithm 3. The architecture can be used to compute the minimal inliers of any permutation by using the appropriate

block. For the BRP, the block in Fig. 6 is used. For linear congruential permutations, the block proposed in [31] can be used. For a generic permutation, a table lookup implementation can be used when the size is small. A parallel pruned interleaver can be realized simply by cascading several minimal inliers blocks according to Algorithm 4.

TABLE I
PARAMETERS OF 1D, 2D, AND 2-STREAM INTERLEAVERS CONSIDERED

Fig. 7. Architecture of the minimal inliers algorithm in Algorithm 3.

Fig. 8. (a) 1D, (b) 2D block, and (c) 2-stream interleavers.

A. Practical Examples

To demonstrate the performance advantage of the proposed schemes in this paper, several pruned interleavers were constructed and simulated using the proposed pruning algorithms as well as existing serial pruning algorithms in the literature. 1D, 2D block, and 2-stream interleavers are considered (see Fig. 8). For the 1D case, bit-reversal (brev) and linear congruential sequence (lcs) [25] are considered (see Table I). For the 2D case, four combinations of permutations across the two dimensions are considered: brev across both, brev across the first and reversed brev across the second, lcs across the first and brev across the second, lcs across the first and a quadratic permutation polynomial (qpp) across the second. The lcs permutations

vary from column to column by changing

(odd). The qpp permutation has size 32 and its inliers are implemented using a look-up table. These interleavers are used in practice for example in [19]–[21], [49].

For the 2-stream case, three combinations of permutations across the two streams are considered: brev across the first dimension and reversed brev across the second, lcs across the first and brev across the second, lcs across both dimensions. The parameters of all interleavers are listed in Table I.

Fig. 9(a) plots the normalized time of the proposed pruning algorithm for 1D and 2D interleavers as a function of interleaver size. Also shown are the corresponding times of serial pruning algorithms. Fig. 9(b) shows the results for 2-stream interleavers. The plots demonstrate a reduction of 3 to 4 orders of magnitude in pruning time compared to the serial case.

IX. CONCLUSIONS AND REMARKS

A mathematical formulation for analyzing the pruning of bit-reversal permutations has been presented. Pruning a permutation has been cast mathematically in terms of evaluating sums involving integer floors and saw-tooth functions. Bit-reversal permutations have been characterized in terms of permutation statistics, and shown to possess weak correlation properties. Moreover, using a new permutation statistic called permutation inliers that characterizes the pruning gap of BRPs, a computationally efficient algorithm for parallelizing serially-pruned bit-reversal interleavers has been proposed. Extensions to block and stream interleavers have been considered as well. The efficiency of this algorithm in terms of reducing interleaving latency and memory overhead has been demonstrated in the context of LTE turbo codes and pruned FFTs. The importance of this algorithm further lies in that it enables flexible and high speed implementations of PBRIs and other pruned permutations employed in communication standards that support multiple data rates and variable-length codewords.

The work proposed in this paper can be applied to more general block interleavers that involve generic permutations. We are investigating the class of interleavers based on permutation polynomials of general degree [19], [61]. Similar to (12), these permutations require evaluating sums of the form


Fig. 9. Normalized pruned interleaving time as a function of interleaver size for (a) 1D and 2D interleavers, and (b) 2-stream interleavers.

with constant coefficients , including the class of second-degree QPPs. We conjecture that there exist recursive Euclidean-like algorithms to evaluate these sums that are analogous to those used for evaluating sums for linear permutation polynomials based on generalized Dedekind sums [25].

APPENDIX

PROOF OF THEOREM 1

Apply the Binomial theorem to expand , followed by the Bernoulli expansion

, where are the Bernoulli numbers .

PROOF OF LEMMA 5

For (18), we have . For (19), let for integer and real . Then

, which follows since .

For (20), since the two sum the same elements but in a different order. The proof of (21) is similar to (19) by noting that is a

permutation and that . Finally for the

second part of (18), let , , ,

then

using (14). Since is a

permutation on , then by (20).

PROOF OF LEMMA 6

If , then and

using (14). Hence

we assume . If , then using (10),

reduces to , while if

, it simplifies to . For (23), we use the following property that relates on bits to on bits:

;.

(54)

Then

which equals zero using (13).

PROOF OF LEMMA 7

We first show that the sum in (24)

satisfies the recursion for by enumerating the quadratic residues modulo , which are integers of the form and . From number theory, these residue classes are the odd integers in of the form , . Since there are a total of odd integers in , and odd integers are

(equally distributed), then the number of quadratic residues is . Moreover, if the odd integer maps to the residue , then so do

. Hence,

which is independent of .

Therefore,

which

proves , with initial conditions ,. For , this recursion can be rewritten as

if is even, and


otherwise, with . Solving the recursion, (24) follows.

Finally, substituting (10) in (24), and noting that

when , if is even, and when if is odd, (25) follows.

PROOF OF LEMMA 8

Applying (54), we can split in (26) as follows:

(55)

(56)

where in (56), property (15) and (61) with are applied. Applying (22) with and simplifying terms, we obtain

. Solving the recurrence with

yields .

PROOF OF LEMMA 9

Using (54) we can split similar to (55). If is odd, set . Then using

(61), reduces to

. Otherwise, if is

even, set . Using (61),

reduces to

. Evaluating the first sum in both cases using (22), (27) and (28) follow.

PROOF OF LEMMA 10

From (29), obviously if or , then . Hence we assume , . Using (54) we can split

similar to (55), then apply (61) with to obtain

Case 1: When is odd, applying (61) again we obtain

and , where . Adding the two and using

(15), we get

. Simplifying the saw functions using (10), the first equation in (30) follows. Moreover, it is easy to show that these saw functions evaluate to either depending on and .

Case 2: When is even, still simplifies as shown above but with , while becomes

.

Hence

. Again, simplifying the saw functions, the second equation in (30) follows. Moreover, it is easy to show that these saw functions evaluate to either depending on and .

PROOF OF LEMMA 11

We split similar to (55) with respect to both and , and then apply (61). First note that for ,

then is even. Therefore . On the other hand, for , then is

odd, and hence . After applying (61) twice for even and odd, the splitting results in ,

where: ; ; ;

; and

; ;

. Substituting

back for and applying (54) on , and simplifying terms, reduces to . Solving the recurrence similar to Lemma 8 with , (31) follows.

PROOF OF LEMMA 12

For , we have

. For , we split and apply (54) to obtain . Next, for

, we split and apply (54) to and . Similar to earlier proofs, it can be shown that

satisfies the recursion . Let denote the position of the

least significant ‘1’ in . Then

(57)

Substituting back in the recursion we obtain . Finally, when

, a similar derivation results in the recursion .


PROOF OF LEMMA 13

The proof is lengthy. Due to lack of space, we outline the proof only. Assume are both even. Other cases are similar. We split (34) similar to earlier proofs, apply (54) to and

, and then adjust the missing terms for and . Next, apply Lemma 23 twice, multiply out terms,

and then use property (21).

PROOF OF LEMMA 14

The proof is similar to Lemma 13. The following identity is also applied: since both give 1 when . Due to lack of space, details are omitted.

PROOF OF LEMMA 17

Consider for . When is even,

we have

. When is odd,

. Simplifying both expressions, (38) follows. Similarly for , when is

even, we have

. When is

odd,

. Simplifying both expressions and using (17), (39) follows.

PROOF OF LEMMA 19

The first equality follows since the floor functions are 1 when and 0 otherwise. Assume is even and consider the binary representation of . Patterns that lead to excedances are , ,

, where in and , the middle patterns are also excedances. For example, and are excedances. The sum of all integers with binary pattern is , with pattern

is , and with pattern is , where is the excedance number in Lemma 18 for . Collecting terms, we get the recursion for , with . Similarly, when is odd we obtain

for , with . Solving both recursions, (40) follows.

PROOF OF THEOREM 2

First write (3) as

, then replace with . Expanding

the sum and using (20), (21) we obtain

, where

(58)

and . Equation (58) can be further simplified by expanding in terms of floor functions, resulting in (42). The condition slightly simplifies the expression for the constant but does not make the result less general.

PROOF OF COROLLARY 6

after change of variables. The th term of the 2nd sum is 0 since . Adding and subtracting the 0th term, (since ), the result follows.

PROOF OF THEOREM 6

for . Writing the summand terms using , multiplying out terms, and using in (32), the above sum reduces to

. Let be the position of the least-significant ‘1’ in . Substituting similar to (57), (50) follows.

PROOF OF THEOREM 7

We first derive bounds on in (29). Table II lists the first few terms of the minimum and maximum values of empirically. It is easy to show by induction that and satisfy the recursions

and for with initial

conditions and , where and represent cases when is even/odd. Solving the recursions, we obtain the bound:

(59)

Let be the minimum integer added to in Algorithm 3 at iteration . Then at iteration ,


TABLE II
MINIMUM AND MAXIMUM VALUES OF

from (2), (43). Substituting the maximum and minimum values from (59) in this equation, and using the maximum and minimum values of in (44), we obtain

(60)

where and . To determine the convergence rate, we study the convergence of the bounds in (60). The solution of the lower-bound recursion

is the sum of a geometric series which

converges to at a rate . Similar results hold for the upper bound recursion and with subscripts replaced by . Hence converges at a rate .

Lemma 23: Let be even and . Then for

, we have

odd; even.

(61)

Proof: First write . Case odd: Then (odd)

and (even). If or , then , so . Otherwise if ,

then . Therefore, .

Case 2 even: Then . If or , then . Otherwise, if and , or , then , so . Therefore, .

REFERENCES

[1] J. Ramsey, “Realization of optimum interleavers,” IEEE Trans. Inf. Theory, vol. 16, no. 3, pp. 338–345, May 1970.

[2] G. D. Forney, Jr., “Burst-correcting codes for the classic bursty channel,” IEEE Trans. Commun. Technol., vol. 19, no. 5, pp. 772–781, Oct. 1971.

[3] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” in Proc. IEEE Conf. Commun. (ICC), Geneva, Switzerland, May 1993, vol. 2, pp. 1064–1070.

[4] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. 27, pp. 533–547, Sep. 1981.

[5] R. Gallager, Low-Density Parity-Check Codes. Cambridge, MA, USA: MIT Press, 1963.

[6] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Commun., vol. 40, no. 5, pp. 873–884, May 1992.

[7] J. Bingham, “Multicarrier modulation for data transmission: An idea whose time has come,” IEEE Commun. Mag., vol. 28, no. 5, pp. 5–14, May 1990.

[8] A. Parsons, “The symmetric group in data permutation, with applications to high-bandwidth pipelined FFT architectures,” IEEE Signal Process. Lett., vol. 16, no. 6, pp. 477–480, Jun. 2009.

[9] A. Skodras and A. Constantinides, “Efficient input-reordering algorithms for fast DCT,” IEE Electron. Lett., vol. 27, no. 21, pp. 1973–1975, Oct. 1991.

[10] D. Evans, “An improved digit-reversal permutation algorithm for the fast Fourier and Hartley transforms,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 8, pp. 1120–1125, Aug. 1987.

[11] J. Cooley and J. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput., vol. 19, no. 90, pp. 297–301, 1965.

[12] C. Burrus, “Unscrambling for fast DFT algorithms,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 7, pp. 1086–1087, Jul. 1988.

[13] K. Kim, “Shuffle memory system,” in Proc. 13th IEEE Int. Parallel Process. Symp./10th Symp. Parallel Distrib. Process. (IPPS/SPDP), San Juan, Puerto Rico, Apr. 12–16, 1999, pp. 268–272.

[14] M. Portnoff, “An efficient parallel-processing method for transposing large matrices in place,” IEEE Trans. Image Process., vol. 8, no. 9, pp. 1265–1275, Sep. 1999.

[15] I. Verbauwhede et al., “In-place memory management of algebraic algorithms on application specific ICs,” J. VLSI Signal Process., vol. 3, pp. 193–200, 1991.

[16] G. Chang, F. Hwang, and L.-D. Tong, “Characterizing bit permutation networks,” Networks, vol. 33, no. 4, pp. 261–267, 1999.

[17] R. Lee et al., “On permutation operations in cipher design,” in Proc. IEEE Conf. Inf. Technol., Coding Comput. (ITCC), Los Alamitos, CA, USA, 2004, vol. 2, pp. 569–577.

[18] R. Garello, G. Montorsi, S. Benedetto, and G. Cancellieri, “Interleaver properties and their applications to the trellis complexity analysis of turbo codes,” IEEE Trans. Commun., vol. 49, no. 5, pp. 793–807, May 2001.

[19] Evolved Universal Terrestrial Radio Access (E-UTRA): Multiplexing and Channel Coding, 3GPP TS 36.212, 3rd Generation Partnership Project (3GPP), Sep. 2008.

[20] IEEE Standard For Local And Metropolitan Area Networks—Part 20: Air Interface for Mobile Broadband Wireless Access Systems Supporting Vehicular Mobility, IEEE Standard 802.20, 2008.

[21] IEEE Standard for Local and Metropolitan Area Networks—Part 16: Air Interface For Broadband Wireless Access Systems, IEEE Standard 802.16, 2009.

[22] J. Sun and O. Takeshita, “Interleavers for turbo codes using permutation polynomials over integer rings,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 101–119, Jan. 2005.

[23] S. Crozier and P. Guinand, “High-performance low-memory interleaver banks for turbo-codes,” in Proc. IEEE Veh. Tech. Conf. (VTC), Newark, NJ, USA, Oct. 2001, vol. 4, pp. 2394–2398.

[24] ETSI Std. EN 302 755 v1.2.1, “Frame structure channel coding and modulation for a second generation digital terrestrial television broadcasting system (DVB-T2),” ETSI, 2011.

[25] D. Knuth, The Art of Computer Programming, 3rd ed. Reading, MA, USA: Addison-Wesley, 1998, vol. 2.

[26] C. Berrou, Y. Saouter, C. Douillard, S. Kerouedan, and M. Jezequel, “Designing good permutations for turbo codes: Towards a single model,” in Proc. IEEE Conf. Commun. (ICC), Paris, France, Jun. 2004, vol. 1, pp. 341–345.

[27] A. Nimbalker, Y. Blankenship, B. Classon, and T. K. Blankenship, “ARP and QPP interleavers for LTE turbo coding,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Las Vegas, NV, USA, Apr. 2008, pp. 1032–1037.

[28] M. Eroz and A. R. Hammons, Jr., “On the design of prunable interleavers for turbo codes,” in Proc. IEEE Veh. Tech. Conf. (VTC), Houston, Texas, USA, May 1999, vol. 2, pp. 1669–1673.

[29] M. Ferrari, F. Scalise, and S. Bellini, “Prunable S-random interleavers,” in Proc. IEEE Conf. Commun. (ICC), New York, NY, USA, Apr. 2002, vol. 3, pp. 1711–1715.

[30] L. Dinoi and S. Benedetto, “Design of fast-prunable S-random interleavers,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2540–2548, Sep. 2005.

[31] M. M. Mansour, “Parallel lookahead algorithms for pruned interleavers,” IEEE Trans. Commun., vol. 57, no. 11, pp. 3188–3194, Nov. 2009.

[32] U. Dieter and J. Ahrens, “An exact determination of serial correlations of pseudo-random numbers,” Numerische Math., vol. 17, pp. 101–123, 1971.

[33] R. Polge, B. Bhagavan, and J. Carswell, “Fast computational algorithms for bit reversal,” IEEE Trans. Comput., vol. C-23, no. 1, pp. 1–9, Jan. 1974.

[34] J. Rodriguez, “An improved bit-reversal algorithm for the fast Fourier transform,” in Proc. IEEE Conf. Acoust., Speech, Signal Process. (ICASSP), New York, NY, USA, Apr. 1988, vol. 3, pp. 1407–1410.

[35] A. Elster, “Fast bit-reversal algorithms,” in Proc. IEEE Conf. Acoust., Speech, Signal Process. (ICASSP), Glasgow, Scotland, U.K., May 1989, vol. 2, pp. 1099–1102.

[36] A. Biswas, “Bit reversal in FFT from matrix viewpoint,” IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1415–1418, Jun. 1991.

[37] A. Yong, “A better FFT bit-reversal algorithm without tables,” IEEE Trans. Signal Process., vol. 39, no. 10, pp. 2365–2367, Oct. 1991.

[38] M. Orchard, “Fast bit-reversal algorithms based on index representations in GF(2^b),” IEEE Trans. Signal Process., vol. 40, no. 4, pp. 1004–1008, Apr. 1992.

[39] J. Jeong and W. Williams, “A unified fast recursive algorithm for data shuffling in various orders,” IEEE Trans. Signal Process., vol. 40, no. 5, pp. 1091–1095, May 1992.

[40] J. Rius and R. De Porrata-Doria, “New FFT bit-reversal algorithm,” IEEE Trans. Signal Process., vol. 43, no. 4, pp. 991–994, Apr. 1995.

[41] K. Drouiche, “A new efficient computational algorithm for bit reversal mapping,” IEEE Trans. Signal Process., vol. 49, no. 1, pp. 251–254, Jan. 2001.

[42] J. Prado, “A new fast bit-reversal permutation algorithm based on a symmetry,” IEEE Signal Process. Lett., vol. 11, no. 12, pp. 933–936, Dec. 2004.

[43] S.-C. Pei and K.-W. Chang, “Efficient bit and digital reversal algorithm using vector calculation,” IEEE Trans. Signal Process., vol. 55, no. 3, pp. 1173–1175, Mar. 2007.

[44] M. M. Mansour, “A parallel pruned bit-reversal interleaver,” IEEE Trans. VLSI Syst., vol. 17, no. 8, pp. 1147–1151, Aug. 2009.

[45] R. Clarke, E. Steingrímsson, and J. Zeng, “New Euler-Mahonian permutation statistics,” Adv. Appl. Math., vol. 18, pp. 237–270, 1997.

[46] P. MacMahon, Combinatory Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1915, vol. 1–2, (Reprint. by Chelsea, New York, 1955).

[47] P. MacMahon, “The indices of permutations and the derivation therefrom of functions of a single variable associated with the permutations of any assemblage of objects,” Amer. J. Math., vol. 35, no. 3, pp. 281–322, 1913.

[48] D. Divsalar and F. Pollara, “Multiple turbo codes,” in Proc. IEEE Mil. Commun. Conf. (MILCOM), San Diego, CA, USA, Nov. 1995, vol. 1, pp. 279–285.

[49] IEEE Standard for Local and Metropolitan Area Networks—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Enhancements for Higher Throughput, IEEE Standard 802.11n, 2009.

[50] S. Holm, “FFT pruning applied to time domain interpolation and peak localization,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 12, pp. 1776–1778, Dec. 1987.

[51] S. He and M. Torkelson, “Computing partial DFT for comb spectrum evaluation,” IEEE Signal Process. Lett., vol. 3, no. 6, pp. 173–175, Jun. 1996.

[52] T. Sreenivas and P. Rao, “High-resolution narrow-band spectra by FFT pruning,” IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 2, pp. 254–257, Apr. 1980.

[53] Z. Hu and H. Wan, “A novel generic fast Fourier transform pruning technique and complexity analysis,” IEEE Trans. Signal Process., vol. 53, no. 1, pp. 274–282, Jan. 2005.

[54] J. Markel, “FFT pruning,” IEEE Trans. Audio Electroacoust., vol. 19, no. 4, pp. 305–311, Dec. 1971.

[55] T. Sreenivas and P. Rao, “FFT algorithm for both input and output pruning,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 3, pp. 291–292, Jun. 1979.

[56] H. V. Sorensen and C. S. Burrus, “Efficient computation of the DFT with only a subset of input or output points,” IEEE Trans. Signal Process., vol. 41, no. 3, pp. 1184–1200, Mar. 1993.

[57] L. Wang, X. Zhou, G. E. Sobelman, and R. Liu, “Generic mixed-radix FFT pruning,” IEEE Signal Process. Lett., vol. 19, no. 3, pp. 167–170, Mar. 2012.

[58] A. Nimbalker, T. K. Blankenship, B. Classon, T. E. Fuja, and D. J. Costello, Jr., “Contention-free interleavers for high-throughput turbo decoding,” IEEE Trans. Commun., vol. 56, no. 8, pp. 1258–1267, Aug. 2008.

[59] O. Takeshita, “On maximum contention-free interleavers and permutation polynomials over integer rings,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1249–1253, Mar. 2006.

[60] S. Dolinar and D. Divsalar, “Weight distributions for turbo codes using random and nonrandom permutations,” JPL TDA Progress Rep. 42-122, Aug. 1995.

[61] O. Y. Takeshita, “Permutation polynomial interleavers: An algebraic-geometric perspective,” IEEE Trans. Inf. Theory, vol. 53, no. 6, pp. 2116–2132, Jun. 2007.

Mohammad M. Mansour (SM’08) received the B.E. degree with distinction and the M.E. degree, both in computer and communications engineering, from the American University of Beirut (AUB), Lebanon, in 1996 and 1998, respectively, and the M.S. degree in mathematics and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA, in 2002 and 2003, respectively.

In 1997, he was a Research Assistant at the Electrical and Computer Engineering (ECE) Department at AUB, and in 1996, he was a Teaching Assistant at the same department. From 1998 to 2003, he was a Research Assistant at the Coordinated Science Laboratory (CSL) at UIUC. During summer 2000, he worked at National Semiconductor Corporation, San Francisco, CA, USA, with the wireless research group. He joined the faculty at AUB in October 2003. He is currently an Associate Professor of electrical and computer engineering with the ECE Department at AUB. From December 2006 to August 2008, he was on research leave with QUALCOMM Flarion Technologies, Bridgewater, NJ, USA, where he worked on modem design and implementation for 3GPP-LTE, 3GPP-UMB, and peer-to-peer wireless networking PHY layer standards. His research interests are VLSI design and implementation for embedded signal processing and wireless communications systems, coding theory and its applications, digital signal processing systems, and general purpose computing systems.

Prof. Mansour is a member of the Design and Implementation of Signal Processing Systems Technical Committee of the IEEE Signal Processing Society, and a Senior Member of the IEEE. He has been serving as an Associate Editor (AE) for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II since April 2008, AE for the IEEE TRANSACTIONS ON VLSI SYSTEMS since January 2011, and AE for the IEEE SIGNAL PROCESSING LETTERS since January 2012. He served as the Technical Co-Chair of the IEEE Workshop on Signal Processing Systems (SiPS 2011), and as a member of the Technical Program Committee of various international conferences. He is the recipient of the PHI Kappa PHI Honor Society Award twice in 2000 and 2001, and the recipient of the Hewlett Foundation Fellowship Award in March 2006.

