Reduced Complexity Software Receivers for TD-SCDMA Downlink

Sanyogita Shamsunder and John Glossner
Sandbridge Technologies
White Plains, NY 10601

Email: {sshamsunder, glossner}@sandbridgetech.com

Abstract: Evolving 3G standards such as TD-SCDMA use multi-user detection (MUD) at the base station to enhance the link budget in the uplink. Several sub-optimal block-based techniques have been proposed for detecting the user signals in multipath. These methods have been prohibitively expensive for implementation at the mobile. In this paper, we examine some of the most promising algorithms and consider their suitability for implementation in software on the Sandblaster platform. It is shown that joint detection at the mobile is well within the capabilities of the Sandbridge processor.

I. INTRODUCTION

TD-SCDMA belongs to the family of 3G wireless standards as defined by the 3GPP and will be deployed by many carriers in their TDD mode. The 3GPP has adopted two different chip rates for this mode, high (3.84 Mcps) and low (1.28 Mcps). The former is compatible with WCDMA FDD. Both versions of TD-SCDMA include sophisticated physical layer techniques such as smart antennas and joint detection to increase system capacity even with inter-chip and multi-user interference.

Joint or multi-user detection at the baseband tremendously improves performance over a conventional Rake receiver. The latter deteriorates in performance when multipath reduces the orthogonality of the spreading codes. Multi-user detection techniques for the uplink have been advocated even during the TD-SCDMA standardization process, e.g., [2] and [3]. However, the complexity of these detection techniques has prevented their widespread use, especially on the downlink at the mobile. Because of the slotted TDMA scheme with short PN sequences and a small number of simultaneous users, and the availability of powerful baseband processors, it becomes possible to use MUD even in the downlink.

II. TD-SCDMA SYSTEM AND SIGNAL MODEL

The TD-SCDMA system as defined in the 3GPP standard combines aspects of TDMA with CDMA. It allows for the existence of multiple users in a given uplink or downlink time-slot. The structure of a time-slot is shown in Figure 1. In the low chip rate mode, TD-SCDMA defines 7 slots (5 ms) with the first slot reserved for downlink and multiple switching points for reversing the direction of transmission. Each user data vector is spread and the composite downlink data vector is scrambled using a scrambling code of length 16. A downlink burst consists of two data blocks with a user-specific midamble that is used for various tasks by the receiver. There is also a guard interval between neighbouring bursts. Based on the supported data rates, the modulation can be either QPSK, 8PSK or 16QAM.

Let $K$ be the number of synchronous downlink codes in a time-slot. A user can have multiple codes assigned to it. Let $N$ be the number of symbols/block and $1 \le Q \le 16$ be the spreading factor. It is assumed that there is a single receive antenna at the mobile. Let $c_k$, $d_k(i)$ and $Q_k$ be the spreading code, data symbols and the spreading factor for the $k$th user. The discrete-time signal transmitted by the base station in TD-SCDMA is

$$y(n) = z(n) \sum_{k=1}^{K} \Big[ \sum_{i} d_k(i)\, c_k(n - iQ_k) \Big], \tag{1}$$

where $z(n)$, the base-station specific scrambling code, will be dropped in the remainder of the paper as it is not relevant to the algorithm comparison. Let $\mathbf{h}$ be the downlink channel impulse response with duration $W$ chips. The filtered and sampled signal at the receiver is:

$$\mathbf{x} = \mathbf{A}\mathbf{d} + \mathbf{v}. \tag{2}$$

The vector $\mathbf{x}$ denotes the received chips (1 sample/chip) in a slot and the columns of the $(NQ + W - 1) \times NK$ sized matrix $\mathbf{A}$ are the convolution of $\mathbf{h}$ and $c_k$, see e.g., [4]. The block-banded $\mathbf{A}$ is as shown in Figure 2. The noise vector $\mathbf{v}$ is approximately additive, white Gaussian (AWGN) and includes interference. In a typical TD-SCDMA deployment, for the downlink $K \le 8$, $Q \in \{1, 2, 4, 8, 16\}$, $2N = 704/Q$ and $W \approx 20$.
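The construction of the system matrix in (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_system_matrix`, the symbol-major ordering of the data vector and the toy parameter values are all assumptions, and a single spreading factor $Q$ is assumed for every code.

```python
import numpy as np

def build_system_matrix(h, codes, N):
    """Return A of shape (N*Q + W - 1, N*K) so that x = A d + v.

    h     : channel impulse response, length W (same for all downlink codes)
    codes : array of shape (K, Q), one spreading code per row
    N     : number of symbols per data block
    Assumed ordering of d: [d_1(0), ..., d_K(0), d_1(1), ..., d_K(N-1)].
    """
    codes = np.asarray(codes)
    K, Q = codes.shape
    W = len(h)
    A = np.zeros((N * Q + W - 1, N * K), dtype=complex)
    # b_k = h * c_k is the combined channel/code response, length Q + W - 1.
    B = np.stack([np.convolve(h, c) for c in codes], axis=1)
    for i in range(N):
        # Symbol i of every code starts at chip i*Q.
        A[i * Q : i * Q + Q + W - 1, i * K : (i + 1) * K] = B
    return A

# Toy values (hypothetical): K = 2 codes, Q = 4, N = 3 symbols, W = 3 chips.
codes = np.array([[1, 1, 1, 1], [1, -1, 1, -1]])
h = np.array([1.0, 0.5, 0.25])
A = build_system_matrix(h, codes, N=3)
print(A.shape)
```

With this ordering, multiplying `A` by the data vector gives the same chips as spreading each symbol with its code, summing over the codes, and filtering the result with `h`.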

Without loss of generality, let $d_1(n)$ be the desired user symbols. A receiver matched to this user treats the interfering user signals as additive noise and extracts the desired signal based on the user-specific spreading or channelization code. Therefore, a Rake receiver, which is based on the matched-filter bank concept, is susceptible to near-far effects and deteriorates in performance when stronger undesired users are present. However, due to its simplicity, this is the method of choice in most downlink receivers for cellular CDMA.

0 The channel here includes the transmit and receive filters as well


On the other hand, a joint detector exploits the structure of the interfering signals to jointly estimate all the user symbols $\mathbf{d}$. MUDs are near-far resistant but are not practical for most cellular CDMA receivers. The time-slotted nature of TD-SCDMA enables the use of MUDs at the base-station side. Several sub-optimal and computationally efficient techniques for inverting the system matrix $\mathbf{A}$ in (2) have been proposed ([2], [3] and references therein). The resulting algorithms are amenable to hardware/software implementation at the base station, where complexity and power constraints are not critical; they are still considered complex for use at the handset. In addition, most commercial applications usually employ dedicated hardware for such complex receiver blocks.

Traditional communications systems have typically been implemented using custom hardware solutions. Chip rate, symbol rate, and bit rate co-processors are often coordinated by programmable DSPs, but the DSP processor does not typically participate in computationally intensive tasks. As typically happens in a modern-day receiver, when multiple system requirements are considered, both silicon area and design validation are major inhibitors to commercial success. A software-based platform capable of dynamically reconfiguring communications systems enables elegant reuse of silicon area and dramatically reduces time to market through software modifications instead of time-consuming hardware redesigns. SDR solutions based on the Sandblaster platform have been proposed for WCDMA, 802.11 and other wireless baseband receivers [6] and [8].

In this paper, we will examine three different reduced complexity techniques that have been proposed for joint detection. We will consider their suitability for implementation on a fixed-point, multi-threaded Sandblaster platform [7]. The algorithms are briefly described in the next Section and the implementation issues are discussed in Section IV.

III. MULTI-USER DETECTION ALGORITHMS

To solve for $\mathbf{d}$ in (2), we need the system matrix $\mathbf{A}$, which in turn can be easily computed from estimates of the channel coefficients $\mathbf{h}$ and the spreading codes. In this paper, we assume that the channel coefficients are known. The Maximum Likelihood solution for (2) involves searching for $\mathbf{d}$ over a multi-dimensional space and is thus impractical. The sub-optimal least-squares estimate of the $NK$ data symbols is given by:

$$\hat{\mathbf{d}} = \mathbf{T}^{-1}\mathbf{y}, \quad \text{where } \mathbf{T} = \mathbf{A}^H\mathbf{A}, \quad \mathbf{y} = \mathbf{A}^H\mathbf{x}. \tag{3}$$

There are a number of approaches for implementing joint detection in a CDMA system. The zero-forcing (ZF) equalizer or decorrelating detector applies the inverse of the system matrix to separate the user signals and eliminate multi-access interference (MAI). This scheme is very popular and was considered for the early TD-SCDMA demos [1]. A Minimum-Mean Square Error (MMSE) detector minimizes the error between the weighted received signal and the desired bits and can result in lower BER at high SNR levels. The computational costs of the ZF equalizer are smaller than those of the MMSE detector because the latter requires an estimate of the noise covariance matrix; however, the implementation issues are very similar. This paper only addresses the zero-forcing detector.
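As a rough illustration of the zero-forcing estimate in (3), the sketch below solves the normal equations directly with a general-purpose solver. The helper name `zf_joint_detect`, the real-valued toy codes and channel, and the noiseless setting are assumptions made for the example; the reduced-complexity factorizations discussed in the following subsections would replace the direct solve.

```python
import numpy as np

def zf_joint_detect(A, x):
    """Zero-forcing block estimate d_hat = (A^H A)^{-1} A^H x, as in (3)."""
    AH = A.conj().T
    T = AH @ A            # Hermitian NK x NK matrix
    y = AH @ x            # matched-filter outputs, still containing MAI
    return np.linalg.solve(T, y)

# Toy values (hypothetical): K = 2 codes, Q = 4, N = 3 symbols, W = 3 chips.
codes = np.array([[1, 1, 1, 1], [1, -1, 1, -1]], dtype=float)
h = np.array([1.0, 0.5, 0.25])
K, Q = codes.shape
N, W = 3, len(h)
A = np.zeros((N * Q + W - 1, N * K))
for i in range(N):
    for k in range(K):
        A[i * Q : i * Q + Q + W - 1, i * K + k] = np.convolve(h, codes[k])

d = np.array([1, -1, -1, -1, 1, 1], dtype=float)   # real symbols for simplicity
x = A @ d                                           # noiseless received chips
d_hat = zf_joint_detect(A, x)
mf_out = A.T @ x                                    # matched filter alone
```

In this noiseless toy case `d_hat` reproduces `d`, whereas the matched-filter output `mf_out` is distorted by the multipath and by the other code.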

A. Complexity of Cholesky Decomposition

In theory, the pseudo-inverse of $\mathbf{A}$ can be computed via the Singular Value Decomposition (SVD). However, due to its complexity, SVD is not a practical approach. For Hermitian matrices, the Cholesky factorization, which results in a lower triangular matrix, requires fewer computations. The $NK \times NK$ matrix $\mathbf{T} = \mathbf{A}^H\mathbf{A}$ in (3) is Hermitian. The matrices $\mathbf{A}$ and $\mathbf{T}$ are block banded with $2\nu - 1$ non-zero diagonal blocks, where $\nu = \lceil (Q + W - 1)/Q \rceil - 1$. Direct computation of the Cholesky factors of $\mathbf{T}$ requires $O(N^3K^3)$ operations [10], which is prohibitively expensive for a typical TD-SCDMA handset. Let $\mathbf{R}$ denote the Cholesky factor of $\mathbf{T}$. The computations can be reduced by a factor of $N^2$ by exploiting the block-banded property of $\mathbf{T}$. Also, the Cholesky factor $\mathbf{R}$ is approximately block Toeplitz for $N \gg \nu$ [9]. Only the first few block rows $a_r \ge \nu$ of $\mathbf{R}$ need to be computed; the remaining blocks are simply copies of the last computed block. This approximation error is acceptable as long as $N \gg \nu$ and $a_r \ge \nu$. Additional details on complexity reduction for this approach can be found in [2] and [5].
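The block-row copying idea can be sketched as below. This is only one possible reading of the approximation described above: the function names, the use of a dense factorization of the leading block, and the toy parameters are assumptions, and a practical version would exploit the band structure and use dedicated triangular solves.

```python
import numpy as np

def approx_block_cholesky(T, K, a_r):
    """Approximate lower Cholesky factor of a block-Toeplitz, block-banded T.

    Only the first a_r block rows (blocks of size K) are factored exactly;
    every later block row is a copy of block row a_r - 1, shifted right.
    """
    n = T.shape[0]
    N = n // K
    m = a_r * K
    L = np.zeros_like(T)
    # Exact Cholesky of the leading (a_r*K) x (a_r*K) part of T.
    L[:m, :m] = np.linalg.cholesky(T[:m, :m])
    last = L[m - K : m, :m].copy()        # last exactly computed block row
    for i in range(a_r, N):
        shift = (i - a_r + 1) * K
        L[i * K : (i + 1) * K, shift : shift + m] = last
    return L

def solve_with_factor(L, y):
    """Solve (L L^H) d = y in two steps (generic solver used for brevity)."""
    z = np.linalg.solve(L, y)
    return np.linalg.solve(L.conj().T, z)

# Toy block-Toeplitz T (hypothetical): K = 2 codes, Q = 4, N = 6, W = 3.
codes = np.array([[1, 1, 1, 1], [1, -1, 1, -1]], dtype=float)
h = np.array([1.0, 0.5, 0.25])
K, Q = codes.shape
N, W = 6, len(h)
A = np.zeros((N * Q + W - 1, N * K))
for i in range(N):
    for k in range(K):
        A[i * Q : i * Q + Q + W - 1, i * K + k] = np.convolve(h, codes[k])
T = A.T @ A

L_exact = np.linalg.cholesky(T)
L_approx = approx_block_cholesky(T, K, a_r=3)
print(np.max(np.abs(L_exact - L_approx)))   # size of the approximation error
```

The printed value gives a feel for how quickly the block rows of the exact factor settle for a given channel; the paper's criterion for an acceptable error is $N \gg \nu$ and $a_r \ge \nu$.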

B. Complexity of Schur Decomposition

Since $\mathbf{T}$ is sparse and block Toeplitz, its Schur decomposition is an efficient route to the Cholesky factorization. Working with a low-redundancy representation based on the generators of $\mathbf{T}$, rather than $\mathbf{T}$ itself, leads to a more efficient algorithm. Generator computation involves multiplications, square-roots and reciprocals for each of the $NK$ elements of the generator vectors $\alpha_i$, $1 \le i \le K$. The algorithm proceeds by computing the lower triangular Cholesky factor of the generator matrix $\mathbf{G}$, e.g., [11] and [4]. The resulting factor is also the Cholesky factor of $\mathbf{T}$.

For example, if Givens rotations are used to reduce $\mathbf{G}$, then as elements are zeroed out and rows are eliminated, $\mathbf{G}$ shrinks progressively as the lower triangular matrix $\mathbf{R}$ is built. The algorithm may be terminated as soon as $a_r \ge \nu$ block rows are computed. The algorithm as applied to the TD-SCDMA uplink is described in detail in [4].

C. Complexity of Block Fourier Algorithm

Another approach to solving the least-squares problem for (2) uses the fact that $\mathbf{A}$ is approximately block circulant. The eigenvectors of circulant matrices are the columns of the Discrete Fourier Transform (DFT) matrix [10]. Thus, systems of equations can be solved via a diagonalization using the Fourier Transform. In [3] a frequency domain approach was suggested for performing joint detection. The block Toeplitz matrix $\mathbf{T}$ is made block circulant (and thus square) by padding it with rows and columns. This approach to joint detection requires multiple DFTs, reciprocal and square-root operations. As $NK$ increases, the complexity can be managed by applying the FFT to smaller overlapping blocks of data. As long as the block size is greater than $N + \nu - 1$, this approximation leads to acceptable error. It was shown in [3] that the block Fourier approach requires fewer real multiplications than either the approximate Cholesky or block Schur techniques.
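The diagonalization property can be illustrated with an ordinary (scalar) circulant system, which is a deliberate simplification of the block-circulant case treated in [3]: there, the per-frequency scalar division below becomes a small matrix solve. The function name `solve_circulant` and the numeric values are assumptions for the example.

```python
import numpy as np

def solve_circulant(c, b):
    """Solve C x = b, where C is circulant with first column c.

    C x is the circular convolution of c and x, so in the DFT domain the
    system decouples into one scalar division per frequency bin.
    """
    return np.fft.ifft(np.fft.fft(b) / np.fft.fft(c))

c = np.array([4.0, 1.0, 0.0, 1.0])
# Explicit circulant matrix, used only to check the result.
C = np.column_stack([np.roll(c, j) for j in range(len(c))])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = solve_circulant(c, b)
print(np.allclose(C @ x, b))
```

The cost is dominated by the three FFTs plus one reciprocal per frequency bin, which is why the reciprocal count matters when the operation totals are compared in Section IV.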

IV. SOFTWARE-DEFINED RADIO IMPLEMENTATION

The increasing need to support multiple wireless standards and continually evolving standards has led to the adoption of software-defined radios in mobile terminals. Further, DSPs are increasingly powerful, providing billions of operations per second with power efficiency levels that are appropriate for handset deployment.

Sandbridge Technologies has designed a multi-threaded processor capable of executing DSP, Control, and Java code in a single compound instruction set optimized for handset radio applications. The Sandbridge design overcomes the deficiencies of previous approaches by providing substantial parallelism and throughput for high-performance DSP applications while maintaining fast interrupt response, high-level language programmability, and very low power dissipation.

As shown in Figure 9, the design includes a unique combination of modern techniques such as a SIMD Vector/DSP unit, a parallel reduction unit, and a RISC-based integer unit. Instruction space is conserved through the use of compounded instructions that are grouped into packets for execution. The resulting combination provides for efficient Control Code, DSP, and Java processing execution.

The Sandblaster™ platform consists of the fixed-point Sandblaster multi-threaded DSP processor, see Figure 9, which does the baseband processing. The software tool chain is primarily dedicated towards generating and simulating efficient code for this processor. The Sandbridge compiler analyzes the C code, automatically extracts the DSP operations and synthesizes optimized DSP code without the excess operations required to specify DSP arithmetic in C code. This technique has a significant software productivity gain over intrinsic functions.

The Sandbridge vectorizing compiler is efficient at extracting this parallelism using vectorizing optimizations. The Sandbridge compiler also handles the difficult problem of outer loop vectorization, which is often a requirement for inner loop optimizations.

The common steps in all three algorithms are the matrix multiplications involving the block diagonal $\mathbf{A}$ required to generate $\mathbf{T}$ and $\mathbf{y}$. These operations can be easily vectorized and implemented in parallel on the multi-threaded platform. For the Schur decomposition, the generator matrix computation and some steps in the Givens rotations involve matrix multiplies; these are again implemented efficiently as vector operations on the Sandbridge platform.

The Fourier domain technique operates on chunks of received samples, thus it lends itself well to low latency applications. The block Fourier algorithm can be split up so that multiple threads implement the FFT on the data blocks. Note also that the FFT block size is dictated by the number of data symbols in a TD-SCDMA burst. Small block sizes also lead to larger implementation overheads, while larger block sizes lead to greater FFT complexity and latency. The current FFT implementation requires $N \log_2(N)$ MACs for the typical block sizes encountered in TD-SCDMA. Finally, all the algorithms involve several scalar inversions (reciprocals) and square-root operations, which are implemented via iterative techniques requiring several cycles.
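The paper does not say which iterative techniques are used for reciprocals and square-roots. One common choice is Newton-Raphson iteration, sketched here in floating point purely for illustration; the function names, initial guesses, iteration counts and the assumed input range $0.5 \le a < 1$ (a normalized mantissa) are all assumptions, and a fixed-point implementation would differ in detail.

```python
def newton_reciprocal(a, iterations=4):
    """Approximate 1/a for 0.5 <= a < 1 with the update x <- x * (2 - a*x)."""
    x = 48.0 / 17.0 - (32.0 / 17.0) * a   # linear initial guess on [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - a * x)             # two multiplies and one subtract
    return x

def newton_rsqrt(a, iterations=6):
    """Approximate 1/sqrt(a) for 0.5 <= a < 1 with x <- x * (1.5 - 0.5*a*x*x)."""
    x = 1.0                               # crude initial guess
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * a * x * x)
    return x

print(newton_reciprocal(0.75), newton_rsqrt(0.5))
```

Each reciprocal or square-root therefore costs several dependent multiply steps rather than a single MAC, which is the reason these operations are counted separately from multiplications in the complexity estimates.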

Figures 3-8 compare the estimated processing power required for a software implementation of the joint detector on the Sandblaster platform. Other implementation overheads, such as those due to synchronization and memory access, are not included here. The techniques used by Vollmer et al. [3] are employed to estimate the number of operations in each case. In all cases, the block size for the Fourier algorithm is constant at 32. Reference [3] compared the algorithms taking into account only the number of real multiplications, while here other compute-intensive operations such as reciprocals and square-roots are also taken into account. It was also shown in [3] that the Fourier-based approach required the least number of multiplications. However, as is evident from the results here, the gains in a practical implementation are smaller (or even non-existent). This is primarily due to the greater number of reciprocal and square-root operations needed in the block Fourier method when compared to either the approximate Cholesky or Schur. These operations consume many more cycles in a typical processor. We are currently optimizing our FFT performance; however, the Fourier method will still be inefficient in a few cases.

Note that the complexity of the approximate Cholesky and Schur methods depends on the number of rows $a_r$ that are computed. The approximation error is small as long as $a_r > \nu$. Thus the complexity goes up with the delay spread. For example, in Figure 5 with $a_r = 6$, the Schur and Fourier are comparable in complexity. But if $a_r = 4$, the complexity of the Schur algorithm drops ($a_r$ is not relevant to the Fourier method).
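To make the dependence on delay spread concrete, the snippet below evaluates $\nu$ using the definition as printed in Section III-A, together with $N$ from $2N = 704/Q$, for a few parameter sets taken from the text and figure legends. The function name `nu` is an assumption for the example.

```python
def nu(Q, W):
    """nu = ceil((Q + W - 1) / Q) - 1, using integer arithmetic for the ceiling."""
    return -(-(Q + W - 1) // Q) - 1

for Q, W in [(16, 20), (16, 40), (4, 20)]:
    N = 704 // (2 * Q)                    # from 2N = 704/Q
    print(f"Q={Q:2d}, W={W:2d}: nu={nu(Q, W)}, N={N}")
# Q=16, W=20: nu=2, N=22
# Q=16, W=40: nu=3, N=22
# Q= 4, W=20: nu=5, N=88
```

Under this definition, doubling the channel memory at $Q = 16$ raises $\nu$ from 2 to 3, and the values $a_r = 4$ and $a_r = 6$ used in the figure legends exceed $\nu$ for $Q = 16$ and $Q = 4$ respectively.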

The Cholesky and Schur decompositions require $O(\nu^2 N K^3)$ and $O(\nu N K^3)$ operations respectively. However, the constants involved in the Schur are much larger than those in Cholesky. Therefore, unless $\nu$ is large, the complexity of the two algorithms is comparable. However, the orthogonal operations used in the Schur algorithm are less susceptible to numerical errors than are the row operations used in the Cholesky decomposition [11]. Thus, depending on the given scenario, one of the three algorithms may be used for joint detection. For example, for longer channel delay spreads, the Fourier method, since it is a frequency domain approach, offers lower complexity. While support of multiple algorithms is expensive to implement in hardware, it is certainly feasible in a software platform such as the Sandblaster.

Fig. 1. Structure of a TD-SCDMA burst. [Diagram: Data Block 1 ($N$ symbols or $NQ$ chips), midamble for user $k$, Data Block 2 ($N$ symbols or $NQ$ chips), guard period; 7 slots per 5 ms sub-frame (1.28 Mcps, low chip rate option); slot duration = 5/7 ms.]

Fig. 2. Structure of the system matrix $\mathbf{A}$. The columns $\mathbf{b}_k = \mathbf{h}_k * \mathbf{c}_k$. [Diagram: block diagonal matrix $\mathbf{A}$ of size $(NQ + W - 1) \times NK$, composed of repeated blocks $\mathbf{V}$; matrix $\mathbf{V}$ has $Q + W - 1$ rows and columns $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k$.]

V. CONCLUSIONS

We have shown that it is possible to implement a sub-optimal TD-SCDMA joint or multi-user detector in a software-based downlink receiver. The candidate algorithms considered each offer different advantages under varying operating conditions. Since the implementation is in software, it is possible, without additional costs, to switch to the algorithm that best suits the given operating scenario.

Fig. 3. Complexity as a function of number of spreading codes, $N = 22$. [Plot: complexity in MHz versus number of spreading codes $K$ for Cholesky, Schur and Fourier; $N = 22$, $Q = 16$, $W = 20$; Cholesky and Schur with $a_r = 4$.]

Fig. 4. Complexity as a function of number of spreading codes, $N = 88$. [Plot: complexity in MHz versus number of spreading codes $K$ for Cholesky, Schur and Fourier; $N = 88$, $Q = 4$, $W = 20$; Cholesky and Schur with $a_r = 6$.]

Fig. 5. Complexity as a function of channel memory, $N = 22$. [Plot: complexity in MHz versus channel memory $W$ (in chips) for Cholesky, Schur and Fourier; $N = 22$, $Q = 16$, $K = 8$; Cholesky and Schur with $a_r = 6$.]

Fig. 6. Complexity as a function of channel memory, $N = 88$. [Plot: complexity in MHz versus channel memory $W$ (in chips) for Cholesky, Schur and Fourier; panel legend reads $N = 22$, $Q = 16$, $K = 8$; Cholesky and Schur with $a_r = 4$.]

Fig. 7. Complexity as a function of number of user symbols/block, 8 parallel users. [Plot: complexity in MHz versus number of symbols $N$ for Cholesky, Schur and Fourier; $K = 8$, $W = 20$, $N$ and $Q$ varying; Cholesky and Schur with 6 row approximation.]

Fig. 8. Complexity as a function of number of user symbols/block, 4 parallel users. [Plot: complexity in MHz versus number of symbols $N$ for Cholesky, Schur and Fourier; $K = 4$, $W = 20$, $N$ and $Q$ varying; Cholesky and Schur with 6 row approximation.]

Fig. 9. Sandblaster™ Multithreaded Processor. [Block diagram: instruction decode, thread cache, branch unit, integer and vector instruction queues, register file, vector files, multiply (MPY), add (ADD), saturate (SAT) and accumulate (ACC) units, data buffer, data memory, external memory and inter-chip connection.]

REFERENCES

[1] "Zero-forcing and minimum-mean square error equalization for multi-user detection in code-division multiple access channels," IEEE Trans. on Veh. Tech., pp. 276-287, May 1996.

[2] N. W. Anderson, H. R. Karimi, and P. Mangold, "Software-Definable Implementation of a TDMA/CDMA Transceiver," Proc. of ICSPAT, 1998.

[3] M. Vollmer, M. Haardt, J. Gotze, "Comparative Study of Joint-Detection Techniques for TD-CDMA Based Mobile Radio Systems," IEEE J. Selected Areas of Communications, pp. 1461-1475, Aug. 2001.

[4] M. Vollmer, M. Haardt, J. Gotze, "Schur Algorithms for joint-detection in TD-CDMA based mobile radio systems," Annals of Telecommunications, pp. 365-378, 1999.

[5] M. Beretta, A. Colamonico, M. Nicoli, V. Rampa, U. Spagnolini, "Space-Time multi-user detectors for TDD-UTRA: design and optimization," Proc. IEEE of VTC, pp. 375-379, 2001.

[6] J. Glossner, D. Iancu, J. Lu, E. Hokenek, and M. Moudgill, "Software Defined Communications Baseband Design," IEEE Communications Magazine, Vol. 41, No. 1, pp. 120-128, January 2003.

[7] S. Jinturkar, J. Glossner, E. Hokenek, and M. Moudgill, "Programming the Sandbridge Multithreaded Processor," Proceedings of the 2003 Global Signal Processing Expo (GSPx) and International Signal Processing Conference (ISPC), March 31-April 3, 2003, Dallas, Texas.

[8] J. Glossner, D. Iancu, G. Nacer, S. Stanley, E. Hokenek, and M. Moudgill, "Multiple Communication Protocols for Software Defined Radio," IEE Colloquium on DSP Enable Radio, pp. 227-236, September 22-23, 2003, ISIL, Livingston, Scotland.

[9] J. Rissanen, "Algorithms for Triangular Decomposition of Block Hankel and Toeplitz Matrices with Application to Factoring Positive Matrix Polynomials," Math. Computations, Vol. 27, pp. 147-154, Jan 1973.

[10] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1991.

[11] G. Golub and I. Mitchell, "Matrix Factorizations in Fixed Point on the C6x VLIW Architecture," TI report, 1998, http://sccm.stanford.edu/students/mitchell/reportTI.ps

