Reconfigurable Baseband Blocks for Wireless Multistandard Transceivers
Department of Electrical and Computer Engineering
Faculty of Engineering and Architecture American University of Beirut
Final Year Project Spring 2005-2006
Advisors: Prof. Mazen Saghir
Prof. Walid Ali Ahmad
Members: Abdul Hadi Al-Sayed 200300531
Hasan Khalifeh 200301843
Houssam Hayek 200302327
Submitted On: 23.5.2006
Acknowledgements
This report would not have been possible without the generous people who gave us
their invaluable time, information and goodwill.
First, we would like to thank our supervisors, Prof. Mazen Saghir and Prof. Walid Ali-Ahmad,
for finding and providing essential documents for our project. They
motivated us to achieve our goals, regularly reviewed our latest progress and
gave us their valuable comments. Second, great thanks to Mr. Khaled Joujou,
who supported us on a daily basis and helped us install the PCI 5640
Labview8.0 board on our computer in the communication lab. Third, we would like to
thank National Instruments, “a technology pioneer and leader in virtual
instrumentation”, for providing the PCI 5640 Labview8.0 board, which is an essential
device in our project.
Table of Contents
List of Illustrations
List of Tables
Abstract
1- Introduction
1.1- Problem Definition
1.2- Report Structure
2- Literature Survey
2.1- Overview of Proposed Wireless Standards
2.1.1- WIMAX
2.1.2- WCDMA
2.2- Software Radio Concept
2.3- FIR Filters
2.3.1- Variable FIR Filters
2.3.1.1- Design Methods for Variable FIR Filters
2.3.1.2- Tap Design With Variable Frequency Response Filters
2.3.2- Area Considerations in FIR Design Schemes
2.4- Hardware Platforms
2.4.1- FPGA features
2.4.2- LABVIEW 8.0 System Board: PCI 5640
2.5- Error Vector Magnitude (EVM) Metric
3- Design Alternatives
3.1- FIR vs. IIR: Advantages and Disadvantages
3.2- FPGA vs. DSP Chips
4- Project Design & Analysis
4.1- System Definition
4.2- FIR Filter Coefficients Design
4.2.1- FIR System Definition
4.2.2- Filter Coefficients Generation
4.2.2.1- WIMAX Channel FIR Filter Design
4.2.2.2- WCDMA Channel FIR Filter Design
4.3- LABVIEW Simulation of 3G Channels
4.3.1- WCDMA Channel Simulation
4.3.1.1- WCDMA Acquire Input Signal
4.3.1.2- Noise Introduction VI
4.3.1.3- Channel Filter VI
4.3.1.4- EVM VI
4.3.2- WIMAX Channel Simulation
4.3.2.1- WIMAX VIs Explanation
4.4- Reconfigurable System Architecture
4.5- Hardware Implementation
4.5.1- Reconfigurability Aspect
4.5.2- Data Links and Communications
4.5.3- Host-FPGA synchronization
4.5.4- FPGA-HOST synchronization
4.5.5- Memory Component
4.5.6- Number Representation
4.5.7- Convolution Process
4.5.8- Host Application
4.5.9- FPGA Process
4.6- Design Assessment
4.6.1- Testing Scheme
4.6.2- Results Assessment
5- System Design Constraints
Conclusion
Appendix
I- Digital Filter Coefficients Design
II- Area Considerations for Variable FIR Design
III- LABVIEW 8.0 System Board: PCI 5640
IV- Virtex II Pro FPGA Capabilities
V- Fixed Point Notation
Bibliography
List of Illustrations
-Figure 2.1: WIMAX Block Diagram
-Figure 2.2: WCDMA Block Diagram
-Figure 2.3: Ideal root raised cosine filters
-Figure 2.4: Standard FIR Filter
-Figure 2.5: Magnitude and Phase Error
-Figure 4.1: WIMAX FIR baseband filter
-Figure 4.2: FS10 WIMAX FIR
-Figure 4.3: WIMAX FIR Filter Response using FS10
-Figure 4.4: WCDMA FIR baseband filter
-Figure 4.5: WCDMA FIR baseband filter using FS10
-Figure 4.6: Simplified WCDMA channel
-Figure 4.7: LABVIEW based WCDMA transceiver block diagram
-Figure 4.8: WCDMA transmitter using ADS
-Figure 4.9: WCDMA transmitted signal and its constellation using ADS
-Figure 4.10: Top level View of the WCDMA Channel
-Figure 4.11: WCDMA Acquire Input Signal VI
-Figure 4.12: PSD of a WCDMA Signal
-Figure 4.13: Constellation of the Input Data
-Figure 4.14: Noise Introduction VI
-Figure 4.15: Noisy PSD
-Figure 4.16: Noisy QPSK Constellation
-Figure 4.17: Channel Filter LABVIEW VI
-Figure 4.18: PSD of the Filtered Signal
-Figure 4.19: Constellation after Filtering
-Figure 4.20: EVM LABVIEW VI
-Figure 4.21: LABVIEW WIMAX Transceiver Block Diagram
-Figure 4.22: WIMAX transmitter using ADS
-Figure 4.23: WIMAX transmitted signal using ADS
-Figure 4.24: Top level View of the WIMAX Channel in LABVIEW
-Figure 4.25: WIMAX PSD in LABVIEW
-Figure 4.26: WIMAX EVM LABVIEW Module
-Figure 4.27: WIMAX filter response. a) no noise b) noisy WIMAX signal for 0.3 noise standard deviation c) WIMAX filter response for 0.3 noise standard deviation
-Figure 4.28: Reconfigurable Transceiver Block Diagram
-Figure 4.29: Case Structure (Choose WCDMA / WIMAX)
-Figure 4.30: Host-FPGA Link
-Figure 4.31: FPGA Read Process
-Figure 4.32: DMA FIFO Read Method
-Figure 4.33: FIR implementation Using Arrays and FIFOs
-Figure 4.34: Write a 32-bit coefficient in the memory
-Figure 4.35: Convolution Process
-Figure 4.36: HOST VI
-Figure 4.37: FPGA VI
-Figure 4.38: Frequency Domain of the Input WCDMA signal
-Figure 4.39: Frequency Domain of the Input WCDMA signal
-Figure 4.40: WCDMA Initial Constellations
-Figure 4.41: WCDMA Filtered Constellations
-Figure 4.42: Frequency Domain of the Input WIMAX signal
-Figure 4.43: Frequency Domain of the filtered WIMAX signal
-Figure A.1: Transposed FIR
-Figure A.2: Transposed FIR with multiplier block
-Figure A.3: High level FPGA_ I/O architecture
-Figure A.5: High level Diagram of the PCI 5640
List of Tables
-Table 4.1: MATLAB code to generate FIR coefficients of WIMAX
-Table 4.2: MATLAB Testing for Fixed Point Notation
Abstract
As new wireless communication standards are introduced to the market, reconfigurable
systems are becoming essential for solving the problems posed by the
coexistence of multiple standards. This report proposes an implementation
technique for a transceiver that can be reconfigured between WIMAX and WCDMA. The
technique highlights the design considerations related to the channel
FIR filter present in the receiver of each of these standards. The report also
discusses the features of the PCI5640 Labview8.0 device onto which the proposed design is
downloaded. In addition, a common architecture is proposed in order to facilitate
future work on reconfiguring the other modules in the transceiver. The implementation,
its performance, the testing of the reconfigurable FIR filter, and the results are discussed in detail.
1- Introduction
Since the early 1980s, wireless communication standards have evolved
remarkably, especially in the migration from analog communication systems to their
digital equivalents. Later, industrial competition between Asia, Europe,
and America encouraged the development of a single mobile system standardized all
over the world, which would be of great benefit to the market [1]. Until such a
standard is deployed, the market will face many problems
due to the coexistence of multiple communication standards. Many
researchers are therefore working on short-term solutions for the period before the
transition to a worldwide standard takes place. One leading solution, the subject of our project, is the dynamic
reconfiguration of the different modules in the system to suit the specifications of as many
standards as possible.
1.1- Problem Definition
The heterogeneity found at the different layers of the wireless communication
channel is increasing as new standards are introduced. Despite this problem, many
countries, such as the European countries and Japan, are willing to install new base stations
that support a multitude of communication standards such as GSM, EDGE, UMTS-FDD and
Bluetooth. Designing such base stations efficiently requires studying the reconfigurable
aspects of the different modules in order to avoid duplication of resources. The
system can then dynamically reconfigure itself to suit its environment as needed. This
solution benefits both the end user and the manufacturers. The
end user benefits from a higher quality of service, better connectivity, and
enhanced roaming. Manufacturers profit from easier
introduction of new types of services, a shorter time to market, and a lower cost of
adding new standards.
In this report, we present the design and implementation of some blocks of a
reconfigurable transceiver that is adaptable to WCDMA and WIMAX.
1.2- Report Structure
Our report is organized as follows. Chapter 2 introduces our topic with a general
survey of the subjects related to our design: an overview of the WIMAX and WCDMA
wireless standards, with their specifications, is given, followed by a general introduction to the SDR
concept and then by FIR filters and FPGA-related topics. In chapter 3,
the different design alternatives that were studied throughout our survey are presented. A
more detailed description of our system design and analysis, including simulation and
hardware implementation, is then introduced in chapter 4. In this chapter, we also
present the testing scheme used and a detailed description of the results
obtained. In chapter 5, we present the system design constraints from different perspectives,
such as economic, social, sustainability and political. Finally, a conclusion section is added.
An appendix covering further details about the fixed point notation, the PCI 5640, area
considerations in FIR design, Virtex II Pro FPGA capabilities and filter coefficients design
is included for further information.
2- Literature Survey
In this chapter, we present a literature survey of the topics needed for the design
and implementation of the final year project. An overview of the proposed 3G wireless
standards, WIMAX and WCDMA, including their specifications, is presented. A general
description of the software radio (SDR) concept is also introduced. A survey of FIR
filters, in particular variable FIR filters and some related design techniques, follows.
Finally, we present the features of the Virtex-II Pro FPGA used, as well as the definition
of the Error Vector Magnitude (EVM) metric.
2.1- Overview of Proposed Wireless Standards
As technology evolves, users' needs are becoming more demanding, especially in
the field of wireless communication. The user is no longer satisfied with using a mobile phone
for voice communication only, but also expects high data rate communication
through SMS or even multimedia. For all these reasons, new wireless
standards have evolved in order to meet these and other user requirements. Among these
standards are WIMAX and WCDMA.
2.1.1- WIMAX
WIMAX (Worldwide Interoperability for Microwave Access), also referred to as
802.16, is the current standard for broadband wireless MAN networks. It aims to
provide a wireless alternative to cable, DSL and T1/E1 for last-mile broadband access. It
will also be used to connect hot-spots to the internet [2]. It has the potential for very long
range (5 - 30 miles) and high speeds [3]. The first version of WIMAX was approved as
IEEE Standard 802.16-2001 and was published in 2002. This standard, however, had
the drawback of addressing only fixed line-of-sight connections by focusing on licensed
frequencies in the range of 10-66 GHz; it could reach a maximum distance of
5 km [2].
Because of these drawbacks, the standard was enhanced, leading
to a new standard, 802.16a, that addresses the lower 2-11 GHz frequency range; it
can reach a maximum distance of 50 km (ten times farther) with a bit rate of up to
75 Mbit/s. The most important advantage of this standard, in addition to those
already mentioned, is that it supports non-line-of-sight operation. This is because it
runs on lower frequency bands than the high frequency bands
(10-66 GHz) used by the previous standard [2].
WIMAX has a higher capacity with a lower cost than DSL or any cable for extending
fiber networks. It also has the advantage of supporting multimedia and fast internet
applications. The block diagram of 802.16a is illustrated in figure 2.1.
Figure 2.1: WIMAX Block Diagram
As shown in the above diagram, WIMAX uses OFDM, which makes
non-line-of-sight (NLOS) operation possible [2].
WIMAX specifications are summarized below:
- Selectable channel bandwidths of 1.5, 1.75, 3, 3.5, 5.5, 7, 10, 14 and 20 MHz
- 256-point FFT / IFFT
- 10-bit AGC with fully programmable outputs for interface to any type of attenuator
- Supports a maximum of 128 dB of attenuation with 0.5 dB step resolution
- Includes interpolation and decimation filters for 2x oversampling
Moreover, the filter characteristics depend on the ADC (Analog to digital converter)
dynamic range and sample rate [5].
It is useful at this point to explain some of the blocks that appear in the figure above:
Convolutional Encoder: This encoder takes a stream of binary input vectors (K) and
outputs a usually larger stream of output vectors (K*L), where L is a positive
integer chosen to suit the design specifications [4]. The encoder plays an important
role in a fading or noisy environment because it makes it possible to correct some of the errors
introduced by such an environment.
Interleaver: This block further improves the performance of the encoder at the transmitter
side and the decoder at the receiver side [4]. Its presence becomes more important in a fading
environment, where it spreads the errors across many (K*L)-bit output blocks, leaving only a few
errors in each block, which the decoder is then able to correct.
Modulator: The modulator (802.16a) uses OFDM (orthogonal frequency division multiplexing), which
divides the given bandwidth into many subcarriers and sends the data (for one user
or multiple users) on each subcarrier. It uses shifted pulse shaping filters at the transmitter
and the receiver, which filter out some of the crosstalk between the subcarriers [4].
Time guard: This block further improves the modulator by decreasing the effect
of multipath propagation. As a result, OFDM needs only one multiplication per
subcarrier for equalization [4].
Puncturer: This block decreases the bit rate to match the rate of the interleaver in
such a way that no information is lost.
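The time guard and the one-multiplication-per-subcarrier equalization described above can be sketched numerically. The following is a minimal illustration rather than the 802.16a physical layer: it assumes the guard is a cyclic prefix and uses the 256-point FFT size from the specification list, while the guard length, the QPSK payload and the channel taps are hypothetical values chosen only for the example.

```python
import numpy as np

N_FFT = 256     # FFT size from the specification list above
GUARD_LEN = 32  # hypothetical guard (cyclic prefix) length, in samples

def ofdm_modulate(subcarrier_symbols, guard_len=GUARD_LEN):
    """IFFT the subcarrier symbols and prepend a cyclic-prefix guard."""
    time_signal = np.fft.ifft(subcarrier_symbols)
    return np.concatenate([time_signal[-guard_len:], time_signal])

def ofdm_demodulate(received, n_fft=N_FFT, guard_len=GUARD_LEN):
    """Drop the guard and FFT back to the subcarrier domain."""
    return np.fft.fft(received[guard_len:guard_len + n_fft])

# Hypothetical payload: one random QPSK symbol per subcarrier.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(N_FFT, 2))
tx_symbols = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

# Hypothetical multipath channel, shorter than the guard interval.
channel = np.array([1.0, 0.5, 0.2j])

tx = ofdm_modulate(tx_symbols)
rx = np.convolve(tx, channel)[:len(tx)]

# One complex division per subcarrier undoes the channel.
channel_response = np.fft.fft(channel, N_FFT)
equalized = ofdm_demodulate(rx) / channel_response

print(np.allclose(equalized, tx_symbols))
```

Because the guard is longer than the channel memory, the linear convolution with the channel looks circular inside the FFT window, which is why a single multiplication (here a division) per subcarrier is enough in this noise-free sketch.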
2.1.2- WCDMA
Wideband CDMA is a third-generation (3G) wireless standard. It uses a 5 MHz
channel for both voice and data, offering an initial data speed of 384 kbps [3]. It can also
reach speeds of up to 2 Mbps for voice, video, data and image transmission.
WCDMA is also referred to as UMTS; the two terms have become interchangeable [3].
This standard is based on code division multiple access (CDMA), which
supports multi-user scenarios. However, this leads to inter-symbol
and intra-symbol interference (ISI). Thus, it uses a spread spectrum (SS) modulation
technique that is capable of reducing interference by a factor “L” called the
“spreading gain” or the “spreading factor”. This modulation technique has other
advantages as well. For example, it spreads the signal over a larger bandwidth with the same energy
(thus reducing the amplitude of the signal), so the signal can escape any deliberate
jamming that has a certain noise threshold, because the information signal lies below
this threshold. This technique also allows multiple users to access the same frequency
band at the same time. This, however, leads to some problems, especially
interference, which WCDMA addresses with techniques such as soft handover
and softer handover. The WCDMA
block diagram is shown in figure 2.2.
As shown in figure 2.2, the mapping used is QPSK, which maps every
two bits into one symbol. Upsampling is then performed, usually by a factor of 4, to
increase the sample rate. Upsampling is performed so that the output of our
processing blocks reaches the DAC at the rate the DAC expects and is processed correctly [6].
Upsampling is performed by inserting (4-1) zeros between consecutive input samples, and the
net result is a DTFT compressed by a factor of 4 [7]. However, upsampling
adds undesired spectral images to the original signal, centered on multiples of
the original sampling rate. Accordingly, we have to perform some kind of filtering to
remove the undesired spectral images (this is the interpolation filtering at the end of the
chain) [6].
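The mapping and upsampling steps can be illustrated with a short sketch. The bit-to-symbol (Gray) mapping and the bit pattern below are assumptions made for the example and are not taken from the report; only the factor of 4 and the zero insertion come from the text. The DFT comparison shows the spectral images: the spectrum of the zero-stuffed signal is the original spectrum repeated 4 times.

```python
import numpy as np

def qpsk_map(bits):
    """Map each pair of bits to one unit-energy QPSK symbol (assumed Gray mapping)."""
    pairs = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)

def upsample(samples, factor=4):
    """Insert (factor - 1) zeros after every input sample."""
    out = np.zeros(len(samples) * factor, dtype=complex)
    out[::factor] = samples
    return out

bits = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1])  # arbitrary example bits
symbols = qpsk_map(bits)        # 16 bits -> 8 symbols
upsampled = upsample(symbols)   # 8 symbols -> 32 samples

spectrum_in = np.fft.fft(symbols)
spectrum_up = np.fft.fft(upsampled)

# The upsampled spectrum is the input spectrum tiled 4 times (the images).
print(np.allclose(spectrum_up, np.tile(spectrum_in, 4)))
```

The repeated copies in `spectrum_up` are the images that the interpolation filter later in the chain has to remove.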
Figure 2.2: WCDMA Block Diagram
The standard uses a root raised cosine (RRC) filter which, in
addition to being a pulse shaping filter, cancels ISI in an ideal channel when matched
RRC filters are used at the transmitter and the receiver, since at each sampling instant
the peak of only one pulse coincides with the zero crossings of all the other pulses.
The other pulses therefore have no impact on that pulse, and it
does not suffer from ISI. The FIR root raised cosine time function is given in Figure 2.3.
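The function obeys the following equation:
\[ h(t) = \frac{4\alpha}{\pi\sqrt{T}} \cdot \frac{\cos\left(\frac{(1+\alpha)\pi t}{T}\right) + \frac{T}{4\alpha t}\sin\left(\frac{(1-\alpha)\pi t}{T}\right)}{1 - \left(\frac{4\alpha t}{T}\right)^{2}} \]
where α is the rolloff factor and T is the symbol period.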
Figure 2.3: Ideal root raised cosine filters
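As a rough numerical check of the root raised cosine pulse, the sketch below samples the standard unit-energy RRC expression and convolves it with itself, which should give a pulse with near-zero values at nonzero multiples of the symbol period. The roll-off of 0.22, the span of 8 symbols and the function name are assumptions made for this illustration; the two removable singularities (t = 0 and |t| = T/(4α)) are filled in with their limits.

```python
import numpy as np

def rrc_impulse_response(t, symbol_period=1.0, rolloff=0.22):
    """Unit-energy root raised cosine pulse; rolloff=0.22 is an assumed default."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    T, a = symbol_period, rolloff
    h = np.empty_like(t)

    at_zero = np.isclose(t, 0.0)
    at_pole = np.isclose(np.abs(t), T / (4 * a))
    regular = ~(at_zero | at_pole)

    tr = t[regular]
    numerator = (np.cos((1 + a) * np.pi * tr / T)
                 + T / (4 * a * tr) * np.sin((1 - a) * np.pi * tr / T))
    denominator = 1 - (4 * a * tr / T) ** 2
    h[regular] = 4 * a / (np.pi * np.sqrt(T)) * numerator / denominator

    # Limits at the removable singularities.
    h[at_zero] = (1 - a + 4 * a / np.pi) / np.sqrt(T)
    h[at_pole] = a / np.sqrt(2 * T) * (
        (1 + 2 / np.pi) * np.sin(np.pi / (4 * a))
        + (1 - 2 / np.pi) * np.cos(np.pi / (4 * a))
    )
    return h

SAMPLES_PER_SYMBOL = 4  # same factor as the upsampling step in the text
SPAN_SYMBOLS = 8        # hypothetical truncation of the ideal (infinite) pulse

t = np.arange(-SPAN_SYMBOLS * SAMPLES_PER_SYMBOL,
              SPAN_SYMBOLS * SAMPLES_PER_SYMBOL + 1) / SAMPLES_PER_SYMBOL
taps = rrc_impulse_response(t)

# Transmit RRC followed by receive RRC approximates a raised cosine pulse.
raised_cosine = np.convolve(taps, taps) / SAMPLES_PER_SYMBOL
center = len(raised_cosine) // 2
print(raised_cosine[center], raised_cosine[center + SAMPLES_PER_SYMBOL])
```

The first printed value is the pulse peak and the second is the value one symbol period away; truncating the pulse to a finite span is what keeps the second value from being exactly zero.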
Upconversion follows. It converts the low frequency signal into an RF
signal. A filter called the “interpolation filter” is then applied; as
discussed previously, it removes the undesired spectral images caused by upsampling and
upconversion. The output then has the same rate as the DAC, so it can
be fed to the DAC block and then transmitted.
The receiver contains the inverse of the blocks explained above.
2.2- Software Radio Concept
In the transition from 2G to 3G standards, we need a common implementation
platform which can group all wireless standards in one way or another.
One evolving technique for the variable FIR implementation is to use the SDR concept.
“SDR is a rapidly evolving technology that is receiving enormous recognition and
generating widespread interest in the telecommunication industry. It is the focus of
research in the communication field world wide” [8].
“SDR refers to the technology where software modules running on a generic
hardware platform are used to implement radio functions” [8, 9]. In other words, the
hardware platform supports multiple software modules. So the system can switch
between different standards by running the corresponding software.
SDR tries to achieve two main goals [10]:
- to move the digital part of the transmitter and the receiver as close as possible
to the antenna (RF end)
- to replace ASICs with DSPs, since DSPs are able to process baseband signals and thus
implement radio functionalities in software
As illustrated above, most solutions are trying to rely on software to solve different
problems. These solutions, however, need not be the optimal ones. In addition to
software, hardware programmable devices such as FPGAs need to be used in parallel to
reach optimal performance.
SDR offers many advantages to the user, the most important of which is
reconfigurability. Reconfigurability has been extensively researched in the
last decade, especially in the field of mobile communications, and different groups have
worked on reconfiguring the different modules of the channel. For example, [11]
presents a software implementation of the different modulation/demodulation
schemes for GSM, UMTS, EDGE and Bluetooth on a single hardware platform.
UMTS uses QPSK as its modulation technique, GSM uses GMSK,
EDGE uses 8PSK, and Bluetooth uses GFSK. The need to implement these different systems on one
hardware platform led the researchers to look into the mathematical representation of
these types of modulation. They observed that all of these modulation schemes can be
expressed by a quadrature decomposition, which encouraged them to build a common
architecture called digital-IF. Their implementation is done entirely in software and is
downloaded onto a DSP; the transition in the frequency band is also done in software. This
implementation takes advantage of the common mathematical structure of the different
modulation schemes and benefits from a DSP board that allows the work to be done in the
software domain.
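The quadrature decomposition mentioned above can be illustrated with a small sketch: once a scheme is written as an in-phase component I and a quadrature component Q, the same modulator s[n] = I[n]cos(ωn) - Q[n]sin(ωn) serves every scheme. This is not the digital-IF architecture of [11]; the sample rate, carrier frequency, symbol indices and function names are hypothetical, and only plain PSK constellations are shown (the Gaussian filtering of GMSK and GFSK is left out).

```python
import numpy as np

def psk_symbols(order, indices, phase_offset=0.0):
    """Complex baseband points of an M-PSK constellation."""
    return np.exp(1j * (2 * np.pi * np.asarray(indices) / order + phase_offset))

def quadrature_modulate(baseband, carrier_hz, sample_rate_hz):
    """s[n] = I[n] cos(w n) - Q[n] sin(w n); identical for every scheme."""
    n = np.arange(len(baseband))
    phase = 2 * np.pi * carrier_hz * n / sample_rate_hz
    return baseband.real * np.cos(phase) - baseband.imag * np.sin(phase)

SAMPLE_RATE = 8000.0    # hypothetical
CARRIER = 1000.0        # hypothetical intermediate frequency
SAMPLES_PER_SYMBOL = 8  # hypothetical

# QPSK and 8PSK differ only in how the I and Q streams are generated.
qpsk = np.repeat(psk_symbols(4, [0, 1, 2, 3], np.pi / 4), SAMPLES_PER_SYMBOL)
psk8 = np.repeat(psk_symbols(8, [0, 3, 5, 6]), SAMPLES_PER_SYMBOL)

qpsk_if = quadrature_modulate(qpsk, CARRIER, SAMPLE_RATE)
psk8_if = quadrature_modulate(psk8, CARRIER, SAMPLE_RATE)
print(qpsk_if.shape, psk8_if.shape)
```

Switching standards in such a scheme amounts to swapping the routine that produces I and Q while the quadrature stage stays unchanged.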
SDR offers further advantages. These include “multi service”: the SDR
system can theoretically operate in multi-service environments without being
constrained to a particular standard; “multi band”: SDR systems can theoretically
function on any radio frequency band; and the “update feature”: software modules that
implement new characteristics can be downloaded to the hardware platform, so the
system can be kept up to date [1,8,12,13].
Some drawbacks of SDR are that the system can have higher power consumption,
a higher processing power (MIPS) requirement, or higher initial costs, depending on the
design.
2.3- FIR Filters
The digital filter is one of the basic blocks in digital signal processing and
communication systems. Its design, and thus its operation, can affect the
performance of the whole system. FIR filters are characterized by several parameters,
including their order and the values of their coefficients, which depend on the desired
frequency response. Generally, there are two kinds of digital filters: IIR and FIR. Infinite
impulse response (IIR) filters are filters whose impulse response can be infinite in
duration. A finite impulse response (FIR) filter, on the other hand, is a special kind that
contains a finite number of taps in its impulse response [14, 15, 16]. It is one of the most
widely used modules in DSP applications. “It performs a moving, weighted average on a
discrete input signal, x(n), to produce an output signal” [17]. Thus, the FIR output
depends only on the current input and the previous N inputs, where N is the filter order (the filter has N + 1 taps).
The basic operation in the filtering process is to convolve the input with the filter weights,
or taps, as given by the following equation:
\[ y(n) = \sum_{k=0}^{N} h(k)\, x(n-k) \qquad (1) \]
Figure 2.4 shows the architecture of a standard fully pipelined FIR filter that
implements the above formula.
Figure 2.4: Standard FIR Filter
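The convolution sum can be written out directly as two nested loops. The sketch below is a plain software illustration, not the pipelined hardware architecture of the figure; the function name and the moving-average taps are chosen only for the example, and input samples before n = 0 are assumed to be zero.

```python
import numpy as np

def fir_filter(x, h):
    """Direct evaluation of y(n) = sum_k h(k) x(n - k), with x(n) = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:           # samples before the start are taken as zero
                y[n] += h[k] * x[n - k]
    return y

# A 4-tap moving average applied to a unit step.
step = np.ones(6)
moving_average = np.full(4, 0.25)
print(fir_filter(step, moving_average))  # ramps 0.25, 0.5, 0.75, then settles at 1.0
```

Each output sample costs one multiplication and one addition per tap, which is why the multiplier count drives the hardware cost of a fully parallel implementation.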
FIR filter design involves two main stages: coefficients design and architecture design.
The following section deals with the first stage, coefficients design; for more
information about it, refer to section I in the
Appendix. The second stage, architecture design (the FIR implementation), is covered in
chapter 4 of the report, Project Design & Analysis.
2.3.1- Variable FIR filters
Variable digital filters are digital filters whose frequency characteristics depend on
control or tuning parameters. The most common variable parameters include:
• Variable cutoff frequency
• Adjustable Passband Width
• Adjustable Stopband Width
• Controllable Fractional Delay
• Magnitude and number of ripples
• Attenuation level in various bands
Varying any of the above parameters results in a change of the order of the filter i.e. the
number of taps or coefficients of the filter and of course their values.
2.3.1.1- Design Methods for Variable FIR Filters
Methods for designing variable digital filters can be classified into two main
categories: the transformation based methods and the spectral parameter approximation
methods [18]. The transformation methods are based on first designing a filter with
certain fixed frequency characteristics and then applying a certain transformation to
obtain the new filter with new desired frequency characteristics based on predesigned
parameters. Generally, this method is applied to filters with variable cutoff frequencies.
The spectral parameter approximation methods, on the other hand, approximate either
the impulse response or the poles and the zeros of the filter by polynomials that are
functions of certain spectral parameters [18, 19, 20, 21]. One such technique is curve
fitting, which is examined in the following section.
2.3.1.2- FIR Tap Design with Variable Frequency Response
Different approaches in each category have been proposed for the design of
variable digital filter taps. In this report, we focus on the most widely used approaches.
One old but still evolving technique, which belongs to the second category, expresses the FIR
impulse response as a linear combination of basis functions. Another technique
relies on the frequency masking concept. Note that the frequency masking approach is a
mix of the two categories, as will be illustrated later.
In the former, each filter coefficient is a multidimensional function or polynomial of the
spectral parameter. The well-known algorithms for the optimal approximation of the filter
coefficients include the least squares (LS) method, the weighted least squares (WLS) method and
the curve fitting approach, as shown below:
Least Square Approximation Method
By expressing the impulse response of the filter as a linear combination of basis
functions, the optimal LS (Least squares) solution for designing the filter then reduces to
solving a system of linear equations.
The impulse response of the variable filter, h(n,Φ), is considered as a linear combination
of the functions ψm (Φ), which depend on the variable parameter (Φ). This is illustrated in
the following equation.
\[ h(n, \Phi) = \sum_{m=0}^{M} c_{n,m}\, \psi_m(\Phi) \]
The functions ψm(Φ) constitute the basis functions and are most often chosen to be
orthonormal, although this is not necessary. The cn,m values are the expansion coefficients. The
aim is to determine the expansion coefficients, given the basis functions, such that the
frequency response of h(n, Φ) approximates a desired variable frequency response as a
function of the spectral parameter Φ. The approximation error E(ω,Φ), which is the difference
between the desired frequency response and the approximated response in the frequency
domain, is a function of the expansion coefficients cn,m [18]. The L2 norm of
E(ω,Φ) is given by:
\[ E = \int_{\Phi_s} \int_{\Omega_s} W(\omega, \Phi)\, \left| E(e^{j\omega}, \Phi) \right|^{2} \, d\omega \, d\Phi \]
where “W” is some weighting function that controls the amount of approximation error
in the frequency space. Ωs is the frequency space and Φs is the parameter space over
which the spectral parameter vector Φ is to be varied. The L2 norm of E(ω,Φ) is a
quadratic function of the expansion coefficients and has a unique minimum characterized
by a system of linear equations. Writing “E” in the appropriate form, differentiating
with respect to the expansion coefficients, and setting the result to zero, one obtains a
system of linear equations whose solution gives the optimal coefficients [18].
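To make the "system of linear equations" concrete, the derivation can be sketched in a discretized form. The notation below is an assumption introduced for illustration and is not taken from [18]: the integrals are replaced by sums over sample points, d is the vector of sampled desired responses, A is the matrix of sampled basis responses, c is the vector of expansion coefficients, and W is a diagonal matrix of non-negative weights.

```latex
% Discretized weighted L2 error
E(c) = (d - A c)^{H} \, W \, (d - A c)

% Setting the gradient with respect to c to zero gives the normal equations
\frac{\partial E}{\partial c} = 0
\quad\Longrightarrow\quad
A^{H} W A \, c = A^{H} W \, d

% For real-valued coefficients, the real part of both sides is taken
\operatorname{Re}\!\left(A^{H} W A\right) c = \operatorname{Re}\!\left(A^{H} W \, d\right)
```

When the columns of A are linearly independent and the weights are positive, the matrix on the left is positive definite, which corresponds to the unique minimum mentioned in the text.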
Weighted Least Square Approximation
Using the weighted least squares approximation method, the designer can control the
frequency response error by minimizing it in the passband frequency region. The cost
is an increased error in the other regions.
By using an adaptive weight function (see footnote 1), the norm of the frequency response error can be
minimized.
If ρ is a real number representing the parameter to be varied (e.g. bandwidth, resonance
frequency or group delay), then the actual filter can be written in the following form:
\[ F(z, \rho) = \sum_{n=0}^{N} a_n(\rho)\, z^{-n} \]
where a_n(ρ) is a polynomial function in ρ. Thus F(z, ρ) can be written as the
product of three matrices, F(z, ρ) = Z^T(z) A_M P(ρ), such that
\[ Z(z) = [1, z^{-1}, z^{-2}, \ldots, z^{-N}]^{T}, \qquad P(\rho) = [1, \rho, \rho^{2}, \ldots, \rho^{k}]^{T} \]
where A_M is the amplitudes matrix. F(z, ρ) can also be written as
\[ F(z, \rho) = A_v^{T} \left( P(\rho) \otimes Z(z) \right) \]
where ⊗ is the Kronecker product and A_v is a row vector denoting the concatenation of
the rows of the A_M matrix, i.e. [r1 r2 ... rN].
Footnote 1: Changing the weights at run time
Thus F is a linear expression in function of z and ρ . Note that the implementation
cost is proportional to the number of elements in the AM matrix.
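The matrix and Kronecker forms of F(z, ρ) can be checked numerically. In the sketch below the matrix entries are random placeholders and the sizes N and K are hypothetical; the point is only that the three expressions agree. Note that the ordering of the Kronecker factors has to match the way the matrix is flattened: with P ⊗ Z the matching vector stacks the columns of A_M, while stacking the rows pairs with Z ⊗ P.

```python
import numpy as np

N, K = 4, 2  # hypothetical filter order and polynomial degree
rng = np.random.default_rng(1)
A_M = rng.standard_normal((N + 1, K + 1))  # placeholder coefficient matrix

def basis_vectors(z, rho):
    Z = complex(z) ** (-np.arange(N + 1))  # [1, z^-1, ..., z^-N]
    P = float(rho) ** np.arange(K + 1)     # [1, rho, ..., rho^K]
    return Z, P

def F_direct(z, rho):
    """Sum over n of a_n(rho) z^-n, with a_n(rho) = sum_k A_M[n, k] rho^k."""
    return sum(
        sum(A_M[n, k] * rho ** k for k in range(K + 1)) * complex(z) ** (-n)
        for n in range(N + 1)
    )

def F_matrix(z, rho):
    Z, P = basis_vectors(z, rho)
    return Z @ A_M @ P

def F_kron(z, rho):
    Z, P = basis_vectors(z, rho)
    A_v = A_M.flatten(order="F")  # columns of A_M stacked, to match P (x) Z
    return A_v @ np.kron(P, Z)

z0, rho0 = np.exp(1j * 0.3), 0.7  # arbitrary evaluation point
print(np.isclose(F_direct(z0, rho0), F_matrix(z0, rho0)),
      np.isclose(F_matrix(z0, rho0), F_kron(z0, rho0)))
```

The length of `A_v` is (N + 1)(K + 1), the number of elements of the matrix, which is the quantity the text ties the implementation cost to.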
The cost function is defined so that the frequency error function is minimized along both the v
and ρ axes:
\[ J = \sum_{\rho_l \in C} \sum_{v_i \in L} \left| F(e^{j 2\pi v_i}, \rho_l) - F_{id}(v_i, \rho_l) \right|^{2} w(v_i, \rho_l) \]
where “id” denotes the ideal filter and w is the weight function. Assuming that w is
separable in v and ρ, then
w(v, ρ) = w(v) · w(ρ).
The weights are then designed such that J is minimum [22].
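The trade-off described for the weighted least squares method (less passband error at the cost of more error elsewhere) can be seen in a simplified sketch. This is not the variable-filter design of [22]: it fits a single fixed lowpass filter, i.e. one value of ρ, and the band edges, the number of cosine terms, the weights and the function name are all hypothetical.

```python
import numpy as np

def wls_lowpass(num_terms, pass_edge, stop_edge, pass_weight, grid_size=512):
    """Weighted least squares fit of A(w) = sum_k b_k cos(k w) to an ideal lowpass."""
    omega = np.linspace(0.0, np.pi, grid_size)
    omega = omega[(omega <= pass_edge) | (omega >= stop_edge)]  # skip transition band
    in_pass = omega <= pass_edge
    desired = in_pass.astype(float)
    weight = np.where(in_pass, pass_weight, 1.0)

    basis = np.cos(np.outer(omega, np.arange(num_terms)))
    root_w = np.sqrt(weight)
    # Scaling rows by sqrt(weight) turns the weighted problem into ordinary LS.
    b, *_ = np.linalg.lstsq(basis * root_w[:, None], desired * root_w, rcond=None)

    error = basis @ b - desired
    return b, np.sum(error[in_pass] ** 2), np.sum(error[~in_pass] ** 2)

PASS_EDGE, STOP_EDGE = 0.4 * np.pi, 0.5 * np.pi  # hypothetical band edges

_, pass_uniform, stop_uniform = wls_lowpass(11, PASS_EDGE, STOP_EDGE, pass_weight=1.0)
_, pass_heavy, stop_heavy = wls_lowpass(11, PASS_EDGE, STOP_EDGE, pass_weight=100.0)

print(pass_heavy < pass_uniform, stop_heavy > stop_uniform)
```

Raising the passband weight lowers the squared passband error and raises the squared stopband error, which is the behaviour the text describes.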
Curve Fitting Technique
Conventional techniques suffer from the fact that the edge frequencies of the various
bands cannot be independently controlled. The proposed technique removes this
restriction by expressing the filter coefficients as analytical functions of the frequency
specifications through curve fitting, and thus offers more flexibility than the
transformation-based techniques. The technique also belongs to the spectral
parameter approximation category described above. Its main drawback is that it
requires a long design time. The main idea is to optimize several fixed filters having
different spectral parameter values and then to apply curve fitting in order to fit an
analytic function to the coefficient values. Assuming that the frequency response
changes only slightly between the given fixed filter responses, and thus that the filter
coefficients change only slightly, curve fitting can give highly accurate results. The time
required to obtain the designed coefficients depends strongly on the number of given
fixed frequency responses (of the various fixed filters) as well as on the number of
points selected from each response. A tradeoff must therefore be made between the
accuracy of the filter taps and the design time: increasing the number of selected points
per filter response increases the accuracy of the filter taps, but also the execution time
of the convolution process. Moreover, the degrees of the chosen polynomials have a
considerable effect on the accuracy of the filter coefficients [21].
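The idea of fitting a function to the taps of several fixed filters can be sketched as follows. This is not the method of [21]: the fixed filters here are Hamming-windowed sinc low-pass designs rather than optimized ones, the fit is an interpolating quadratic through three designs rather than a least-squares fit, and all names and cutoff values are assumptions made for illustration.

```python
import math


def lowpass_taps(fc, num_taps=11):
    """Hamming-windowed sinc low-pass; fc is the cutoff in cycles per sample."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def lagrange(xs, ys, x):
    """Value at x of the polynomial passing through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


cutoffs = [0.10, 0.12, 0.14]                    # spectral parameter of each fixed filter
fixed = [lowpass_taps(fc) for fc in cutoffs]    # one fixed design per parameter value


def fitted_taps(fc):
    """Evaluate, tap by tap, the quadratic fitted through the three fixed designs."""
    return [lagrange(cutoffs, [f[n] for f in fixed], fc) for n in range(len(fixed[0]))]


# Compare the fitted taps with a direct design at an intermediate cutoff.
max_err = max(abs(a - b) for a, b in zip(fitted_taps(0.11), lowpass_taps(0.11)))
```

Because the three fixed responses are close to each other, `max_err` stays small; widening the spacing of `cutoffs` or lowering the polynomial degree makes the fitted taps less accurate.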
2.3.2- Area Considerations in FIR Design Schemes
There is a constant demand for decreasing the amount of hardware used in a
system. This leads to substantial benefits such as reduced cost and power consumption,
increased application functionality¹, and thus better utilization of FPGA resources.
In most FIR implementations, hardware consumption is mainly due to the multiplier
blocks rather than the adder modules [23]. Different algorithms have been proposed for
the efficient implementation of multiplier blocks. Earlier algorithms aimed at
minimizing the adder hardware cost, since from a VLSI point of view the adder cost was
assumed to dominate the area requirement. However, with the FPGA as the hardware
platform, minimizing the adders’ complexity no longer works, since the “FPGA has a
fixed architecture for implementing digital logic”. Instead, it is the architecture design
that minimizes such cost [23].
Commonly used approaches and architectures that increase resource utilization are
considered below.
Consider first the standard FIR implementation shown in Fig 2.5. The figure shows a
fully parallel, fixed-coefficient FIR filter. For each tap, the filter requires one multiplier,
one adder and one delay element. Thus the resource usage is proportional to the number
of coefficients [14, 24, 25]. Other enhanced architectures and techniques with higher
complexity include array multiplication, multipliers using add and shift operations, the
transposed FIR, the transposed FIR architecture with a multiplier block, the MAG
(minimized adder graph) algorithm for multiplier design, and architectures based on
computational sharing multipliers (CSHM). For more details about these methods, refer
to appendix II.
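A software model of the direct-form structure makes the per-tap cost visible. The sketch below is only a behavioural illustration (the function name and the counter are assumptions); on hardware the multiplications of one output happen in parallel, one per tap.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: one multiply and one add per tap for every output sample."""
    delay_line = [0.0] * len(coeffs)         # one delay element per tap
    outputs = []
    multiplies = 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0.0
        for c, d in zip(coeffs, delay_line):
            acc += c * d                     # one multiplier and one adder per tap
            multiplies += 1
        outputs.append(acc)
    return outputs, multiplies


# Impulse input: the output reproduces the coefficients.
out, count = fir_direct([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0])
```

For this 3-tap example with 4 input samples, `count` is 12, i.e. the number of taps times the number of outputs.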
1 by using the extra available area
2.4- Hardware Platforms
“An FPGA is an array of gates with programmable interconnect and logic functions
that can be redefined/reconfigured after manufacture” [24]. It is characterized by its
small size and high resource utilization.
Designers usually use VHDL or Verilog to define the various hardware resources. To
access these hardware resources, designers implement driver modules. Drivers form the
interface between software and hardware and thus narrow the gap between hardware
and software design as much as possible.
2.4.1 – FPGA Features
One interesting advantage of the FPGA is its suitability for DSP applications. FPGAs,
for example, can perform MAC (multiply and accumulate) operations very quickly. One
way to implement the MAC operation on an FPGA is to use array multiplication with a
pipelined structure, resulting in high throughput. Another design can use look-up tables
(LUTs), which can execute operations at high speeds [24].
Thus, FPGAs are increasingly becoming the implementation platform for high-speed
DSP systems. They offer many advantages that fulfill the needs of DSP applications.
FPGA design can reduce design time and thus the time to market. Most current
FPGA tools have been created for fast ASIC prototyping since they are very efficient in
the use of engineering time [24]. The reason is that the FPGA offers a high degree of
flexibility, which facilitates the testing process: when errors are found, modifications
can be made at nearly no cost.
Another characteristic is the dynamic reconfigurability of the FPGA. Usually, the
FPGA is configured when the system is powered on, and it then performs a fixed
operation until the system power is turned off. However, recent FPGAs offer dynamic
reconfiguration, where the user can reconfigure the FPGA during processing time [24].
2.4.2 – LABVIEW 8.0 System Board: PCI 5640
The LABVIEW-based IF RIO reconfigurable system board, manufactured by NI,
contains a reconfigurable FPGA surrounded by fixed I/O resources, such as an ADC and a
DAC, that can be controlled through software. The RIO board uses the LABVIEW
software to create VI modules that can run on the FPGA, also known as FPGA VIs.
The advantage of such a board is that it uses relatively easy software, LABVIEW, and
thus hides the complexity of the HDLs, VHDL and Verilog, that are commonly used to
design hardware components. Moreover, it also supports designs that are created using
an HDL, so modules created using VHDL or some other HDL can be imported into
LABVIEW as custom VIs.
For these reasons, we propose to use the LABVIEW 8.0 System Board: PCI 5640 as
the underlying hardware platform. For more information about the LABVIEW board,
refer to appendices III & IV.
2.5- Error Vector Magnitude (EVM) Metric
The error vector magnitude (EVM) metric is used to evaluate the quality of the
design of modulators, filters and similar blocks. This measure has shown excellent
performance in testing communication systems and is widely used in research centers
today, for example in IS-54 TDMA digital cellular systems to verify the accuracy
specifications of π/4-DQPSK modulators.
The algorithm involves the following steps:
1. Read an input signal with I and Q components.
2. Compute the ideal constellation positions, such as ±0.7 ± 0.7j for an ideal QPSK
modulator, to use as a reference for comparison.
3. For every input signal (I and Q combination):
a. Calculate the distance between this point and the 4 possible reference
constellation points (the distance between A and B in Figure 2.5, where B is a
reference constellation point and A is an input signal).
b. Choose the minimum distance, so that this input signal is mapped to this
reference constellation point in the demodulator.
c. Square the minimum distance and normalize it with respect to the square
of the distance between the origin and the reference constellation point
(AB²/OB²).
4. Add all the normalized squared distances.
5. Divide the sum by the number of inputs and take the square root to get the
(RMS) EVM metric.
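The steps above can be sketched in a few lines. The sketch assumes the RMS form of the metric (mean of the normalized squared errors, followed by a square root) and represents each I/Q pair as a complex number; the function and variable names are illustrative only.

```python
import math

# Ideal QPSK reference points +/-0.7 +/- 0.7j (step 2).
QPSK_REF = [complex(i, q) for i in (0.7, -0.7) for q in (0.7, -0.7)]


def evm_rms(points, refs=QPSK_REF):
    """RMS EVM of received I/Q points against the nearest reference point."""
    total = 0.0
    for p in points:
        ref = min(refs, key=lambda r: abs(p - r))     # steps 3a and 3b
        total += abs(p - ref) ** 2 / abs(ref) ** 2    # step 3c: AB^2 / OB^2
    return math.sqrt(total / len(points))             # steps 4 and 5 (RMS form)


# A single point displaced by 0.07 along I from its ideal position.
example = evm_rms([complex(0.77, 0.7)])
```

For the single displaced point, the normalized squared error is 0.0049 / 0.98 = 0.005, so `example` is about 0.0707, i.e. roughly 7%.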
Figure 2.5: Magnitude and Phase Error
3- Design Alternatives
Even though the literature survey covers various design issues, it is still important
to present some other design alternatives, evaluate their performance, and explain how
they affected our choices throughout the design. The two main design alternatives
studied are presented in sections 3.1 and 3.2. Other alternatives and algorithms were
presented previously in the literature survey as illustrative examples.
3.1- FIR vs. IIR: Advantages and Disadvantages
FIR filters are characterized by their simple architecture and thus lower
implementation complexity. For example, an FIR filter can be implemented using only a
single multiplier and an accumulator. In addition, the FIR filter can use fewer bits than
the IIR filter because it has no feedback loop, which would introduce more errors¹.
In contrast to IIR filters, whose output can sometimes be unstable, the FIR filter can
always be designed such that its output is stable. In addition, the FIR filter has a linear
phase if the filter coefficients are symmetrical or anti-symmetrical around the center
tap [14]. This feature is essential for data transmission, video processing and
high-quality audio systems [14, 16].
Another advantage of FIR filters is that errors introduced by quantizing the filter
coefficients have a low impact on the filter output, provided the quantization process is
properly handled. This is a very important property when a low bit-error-rate is
desired [14].
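The link between coefficient symmetry and linear phase can be checked numerically. For taps that are symmetric about the middle tap M, the response factors as H(e^{jω}) = e^{-jωM} times a real amplitude, so H(e^{jω}) e^{jωM} should have no imaginary part. The sketch below tests this on two small, made-up coefficient sets; the names are illustrative.

```python
import cmath
import math


def freq_response(taps, omega):
    """H(e^{j omega}) = sum over n of h[n] e^{-j omega n}."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))


def residual_imag(taps, num_points=64):
    """Largest |Im{H(e^{jw}) e^{jwM}}| on a grid, with M = (N - 1) / 2."""
    mid = (len(taps) - 1) / 2
    worst = 0.0
    for k in range(num_points):
        w = math.pi * k / num_points
        value = freq_response(taps, w) * cmath.exp(1j * w * mid)
        worst = max(worst, abs(value.imag))
    return worst


symmetric = [0.1, 0.2, 0.4, 0.2, 0.1]     # h[n] = h[N-1-n]: linear phase
asymmetric = [0.1, 0.2, 0.4, 0.3, 0.0]    # symmetry broken: phase is not linear
res_sym = residual_imag(symmetric)
res_asym = residual_imag(asymmetric)
```

`res_sym` is zero up to rounding, which corresponds to a constant group delay of M = 2 samples, while `res_asym` reaches 0.1.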
Even though the FIR filter possesses many advantages, it also has disadvantages
compared to the IIR filter. FIR filters usually have a higher order than IIR filters for a
given spectral characteristic. Thus, FIR filters require a higher number of multipliers
than IIR filters when the implementation is fully pipelined², in which case every output
needs one iteration. On the other hand, if the implementation is not pipelined, the FIR
filter takes more time than the IIR filter.
These disadvantages translate into larger memory requirements and more
computational resources. In addition, “FIR coefficients must be designed using an
iterative method since the required filter length to satisfy a given filter specification can
only be estimated” [16]. In other words, the designer specifies the order of the filter,
given certain specs, and then simulates the frequency response. If it does not meet the
desired response, the designer re-estimates a new order based on the previous results
and repeats the process.
1 Given that the IIR filter output relies on previous outputs, errors propagate to future
outputs and thus more bits are needed to get the desired accuracy
2 By pipelined, we mean that every tap is assigned one multiplier
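The estimate, simulate and re-estimate loop can be sketched as below. The design routine is a stand-in (a Hamming-windowed sinc low-pass, not an equiripple design), and the cutoff, stopband edge and 40 dB requirement are hypothetical values chosen only to make the loop concrete. The loop counts taps; the filter order is the number of taps minus one.

```python
import cmath
import math


def design(num_taps, fc=0.15):
    """Hamming-windowed sinc low-pass (stand-in for the real design routine)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    return taps


def stopband_attenuation_db(taps, f_stop=0.2, grid=200):
    """Smallest attenuation (dB, relative to the DC gain) between f_stop and 0.5."""
    dc = abs(sum(taps))
    worst = 0.0
    for k in range(grid + 1):
        f = f_stop + (0.5 - f_stop) * k / grid
        mag = abs(sum(h * cmath.exp(-2j * math.pi * f * n) for n, h in enumerate(taps)))
        worst = max(worst, mag)
    return -20 * math.log10(max(worst, 1e-12) / dc)


def find_num_taps(required_db=40.0, start=5, limit=101):
    """Raise the (odd) tap count until the simulated response meets the spec."""
    for num_taps in range(start, limit + 1, 2):
        if stopband_attenuation_db(design(num_taps)) >= required_db:
            return num_taps
    return None


num_taps = find_num_taps()
```

A dedicated estimator gives a better starting point than `start=5`, but the simulated response still has to be checked against the specification before the order is accepted.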
3.2 FPGA vs. DSP Chips
Digital signal processors (DSPs) are processors optimized for the mathematical
operations of signal processing. They have been used extensively in the market
during the last three decades. Nowadays, however, after the introduction of FPGAs,
interest in DSPs has receded. The choice between a DSP and an FPGA depends
on several factors, as illustrated below.
DSPs are characterized by their flexibility and ease of programming relative to the
FPGA. In a DSP system, the programmer does not need to understand the hardware
architecture [24]; the hardware implementation is hidden from the user. The DSP
programmer uses either C or assembly, whereas the FPGA designer usually uses VHDL
or Verilog.
With respect to performance, the speed of the DSP chip is limited by the clock speed
of the board, given that the DSP processor operates sequentially and accordingly cannot
be fully parallelized. FPGAs, on the other hand, can work very fast if an appropriate
parallel architecture is designed; however, they offer less flexibility than DSP
processors.
Reconfigurability in DSPs is achieved by changing the memory content of the
program. This is in contrast to FPGAs, where reconfiguration is performed by
downloading reconfiguration data to the RAM.
Power consumption in a DSP depends on the number of memory elements used,
regardless of the size of the executable program. In an FPGA, the power consumption
depends on the circuit design.
FPGAs are important when there is a need to implement a parallel algorithm, that is,
when different components operate in parallel to implement the system functionality.
Thus the speed of execution is independent of the number of modules. This is in
contrast to DSP systems, where the execution speed is inversely proportional to the
number of functionalities.
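The contrast between sequential and parallel execution can be put into a very simple cycle-count model. This is a deliberately crude sketch: it assumes one multiply-accumulate per multiplier per clock cycle and ignores memory access, pipelining latency and clock-rate differences between devices. The function name and the 53-tap example (a filter of order 52) are illustrative.

```python
import math


def cycles_per_output(num_taps, num_multipliers):
    """Clock cycles for one FIR output when the taps share the available multipliers."""
    return math.ceil(num_taps / num_multipliers)


sequential = cycles_per_output(53, 1)    # one MAC unit, taps processed one by one
parallel = cycles_per_output(53, 53)     # one multiplier per tap
```

Under these assumptions the sequential case needs 53 cycles per output and the fully parallel case needs one, which is why the parallel execution time does not grow with the number of taps.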
In conclusion, in most engineering projects it is the application that dictates which
device and platform to use in order to achieve optimal performance at a low cost. FPGAs
outperform DSP systems in the implementation of filters, convolvers, correlators, FFTs
and similar blocks, whereas DSPs are more practical for signal processing programs of a
sequential nature.
Given that we are designing a reconfigurable hardware platform that operates in a
parallel fashion, an FPGA is more suitable than a DSP board. A complete description
of the implementation process is presented in the following chapters.
4-Project Design & Analysis
After conducting a literature survey, we move to the design and analysis phase.
The chapter presents the details of the system definition, the design of the variable FIR
filter, the performed simulations, the reconfigurable architecture, as well as the hardware
implementation and the project assessment.
4.1 System Definition
Our system is primarily a reconfigurable transceiver supporting two of the 3G
standards: WCDMA and WIMAX. The transceiver is optimized to support these two
standards, whose channels include different modules. Initially, we studied each scheme
alone by looking into its channel and requirements, and then we tried to build a common
architecture that emphasizes reconfiguring the common modules, such as the channel
filters, pulse shaping filters and modulation, on both sides of the transceiver. This
reconfigurability scheme helps in developing new systems that give the designer both
design flexibility and reduced hardware usage, especially since the hardware platforms
used in such implementations have limited resources. For example, implementing three
different FIR channel filters would require 3n multipliers, instead of n multipliers for
one reconfigurable filter. The hardware platform used in our project is the PCI 5640 IF
RIO system board, manufactured by National Instruments. This board is well suited to
our design, providing us with high data rates, AD and DA converters, and the Virtex
II-PRO v3000 FPGA. Our system has two types of inputs: the control/switch input,
which chooses between the two standards, and the data input port, which receives a
sequence of I & Q modulated values representing a given message sent over one of these
standards. At the output side, we get another set of I & Q values, and we target a low
EVM degradation.
4.2- FIR Filter Coefficients Design
This section discusses the design of an FIR filter, in terms of order and coefficient
values design. First, however, we define the FIR system and then move to the generation
of the filter coefficients for the WCDMA and the WIMAX channel filters.
4.2.1- FIR System Definition
A reconfigurable FIR filter with a variable frequency response, supporting the
WCDMA and WIMAX standards, is proposed. This reconfigurability has the clear
advantage of removing unneeded hardware. Our system is part of a larger system that
aims to adapt itself, by means of a common hardware platform, to a large variety of
already standardized wireless systems. We will first generate the impulse responses of
the different filters using MATLAB. Then, we will use the Virtex-II V2MB 1000 system
board as the hardware platform for our system. We will instantiate different hardware
blocks on the system board, including the interrupt controller (to choose the type of
signal), BRAM memory (to store the coefficients) and on-chip multipliers, and then
connect the different components to build the whole system. Our system includes
control as well as data inputs. In order for the filter to distinguish between the proposed
standards, the system checks a boolean control: TRUE for WCDMA and FALSE for
WIMAX. The data input is the “InputSignal”, a sequence of I & Q values that need to be
filtered. Based on the control, the system then uses the corresponding impulse response
in the convolution process. The output of our system, the “OutputSignal”, holds the
filtered values.
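The behaviour described above, one filter structure with a coefficient bank selected by a boolean control, can be modelled in software as follows. The coefficient values are placeholders rather than the designed taps, and the function and constant names are assumptions made for this sketch.

```python
WCDMA_COEFFS = [0.05, 0.25, 0.40, 0.25, 0.05]   # placeholder values, not the designed taps
WIMAX_COEFFS = [0.10, 0.20, 0.40, 0.20, 0.10]   # placeholder values, not the designed taps


def reconfigurable_fir(input_signal, is_wcdma):
    """Filter input_signal with the coefficient bank selected by the boolean control."""
    coeffs = WCDMA_COEFFS if is_wcdma else WIMAX_COEFFS   # TRUE: WCDMA, FALSE: WIMAX
    history = [0.0] * len(coeffs)                         # delay line shared by both modes
    output_signal = []
    for x in input_signal:
        history = [x] + history[:-1]
        output_signal.append(sum(c * h for c, h in zip(coeffs, history)))
    return output_signal


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
wcdma_out = reconfigurable_fir(impulse, True)
wimax_out = reconfigurable_fir(impulse, False)
```

Only the coefficient bank changes between the two modes; the delay line and the multiply-accumulate structure are shared, which is where the hardware saving comes from.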
4.2.2- Filter Coefficients Generation
The filter design is mainly divided into two steps: order and coefficients. The order
of the filter varies according to the specs. In order to achieve certain specs, there is a
minimum order that we need to satisfy to get an acceptable frequency response.
Increasing the order above this target leads to a sharper response, but at the cost of
additional delay and hardware usage. First, we wrote a MATLAB function that estimates
the required order of the filter, generates its coefficients, and then plots its frequency
response based on the rejections in the different bands and the pass-band, as shown in
Table 4.1. We also used the FS10 program to generate the FIR filter coefficients and
compared them with those generated by the MATLAB simulation. In the subsequent
section, we generate the FIR coefficients for the WIMAX standard. Coefficients for the
WLAN can be similarly generated and shall be included in the spring final project report.
4.2.2.1- WIMAX Channel FIR Filter Design
Section 2.1.1, WIMAX standard specification, presents the different bit rates that
WIMAX can support. Of these, we consider an RF bandwidth of 7 MHz. That is,
WIMAX can send a data rate of up to 7 Mbits/sec at RF. Accordingly, at baseband the
bandwidth of the WIMAX signal is 3.5 MHz¹. Reference [5] presents the FIR baseband
specifications of WIMAX. Based on these specifications, we wrote MATLAB code to
generate the FIR coefficients, as shown in Table 4.1. The order of the filter came out to
be 52. Figure 4.1 shows the corresponding frequency response of the WIMAX FIR filter.
As can be seen, the center frequency is approximately 3 MHz and the specs are met.
% The WIMAX has a bandwidth of 7MHz at RF frequency. Therefore, at baseband, it
% has a bandwidth of 7/2 = 3.5MHz. At Adj. Ch (7MHz), the attenuation is at least
% 38dB. At Alt. Ch (14MHz) the attenuation is at least 57dB
freq_band = [0 2000 3500 7000 14000 16000];  % define the frequency bands of WIMAX
attenuation_dB = [0 0 -38 -57];              % define the attenuation at each band in (dB)
attenuation = 10 .^ (attenuation_dB/10);     % transform the attenuation into linear scale
Ripple_Ratio = [0.001 0.001 0.001 0.001];    % define the percentage ripple at each band
Sampling_Frequency = 32e3;                   % the sampling time is given by ADS to be 0.03125us
[N, fpts, mag, wt] = firpmord(freq_band, attenuation, Ripple_Ratio, Sampling_Frequency);
b = remez(N, fpts, mag);                     % apply the remez algorithm given the order
                                             % and frequency bands with their attenuations
[H, f] = freqz(b, 1, 512);
plot(f/pi*11500, 20*log10(abs(H)))
axis([0 10000 -100 1])
xlabel('Frequency (Hz)'); ylabel('Attenuation (dB)')
title('FIR Baseband filter of WIMAX')
1 7M/2
Table 4.1: MATLAB code to generate FIR coefficients of WIMAX
Figure 4.1: WIMAX FIR baseband filter
To validate our results, the specs were supplied to the FS10 program as shown in
Figure 4.2. The sampling rate is 32 MHz, the same value used by the MATLAB code. The
same holds for the order of the filter and the center frequency. Figure 4.3 shows the
corresponding frequency response, which came out as expected. The only difference is
the presence of higher-amplitude ripples, which are characteristic of the built-in
function of the program. The group delay, also shown in the figure, is constant. This
implies a linear-phase filter, a common characteristic of FIR filters.
Figure 4.2: FS10 WIMAX FIR
Figure 4.3: WIMAX FIR Filter Response using FS10
4.2.2.2- WCDMA Channel FIR Filter Design
The WCDMA signal is a wideband signal with a bandwidth of 3.84 MHz at radio
frequency. Accordingly, at baseband, the cutoff frequency is 1.92 MHz. Reference [26]
gives the baseband filter specifications of WCDMA. Following the same procedure as in
the previous section, the order of the filter turned out to be 48, with the frequency
response shown in figure 4.4. As shown in the figure, the cutoff frequency is
approximately 1.92 MHz at the -3 dB point. This frequency response looks very similar
to the one shown in figure 4.5, which is simulated using the FS10 program. Again, the
group delay is constant.
[Plot: “FIR Baseband filter of WCDMA”; x-axis: Frequency (Hz), 0 to 10000; y-axis: Attenuation (dB), -70 to 10; data cursor at X: 1914, Y: -3.83]
Figure 4.4: WCDMA FIR baseband filter
Figure 4.5: WCDMA FIR baseband filter using FS10
4.3- LABVIEW Channel Simulation
The LABVIEW simulation is intended to test the validity of our design before
implementing it on a hardware platform. The metric used to test the reliability of the
channel filter design is the constellation error vector magnitude (EVM). Since we
currently care about testing the channel filter, and since this metric does not require
the design of a whole transceiver channel, we can compare the IQ channel input of the
filter with its IQ channel output and then calculate the EVM. Given that hardware
implementation is time consuming, this LABVIEW software simulation is essential for
testing and debugging, in order to find and fix errors as early as possible in the design
process. It also allows the user to easily optimize the design by choosing the best
possible set of parameters.
4.3.1- Simulation of WCDMA channel using LABVIEW
The LABVIEW simulation of the WCDMA channel aims at establishing a reference
against which we can compare the output of the WCDMA blocks implemented on the PCI
IF RIO 5640 system board. The presented chain has a simpler block diagram than the
one in section 2.1.2, because our design does not take into account the effect of fading
(only noise). Accordingly, the simplified WCDMA chain is given in figure 4.6.
[Block diagram. Transmitter blocks: Source (stream of bits), QPSK Modulator (basic modulation), Spread Spectrum (advanced modulation), Root Raised Cosine Filter (pulse shaping filter), DAC. Receiver blocks: ADC, Channel Filter (FIR), Root Raised Cosine Filter (pulse shaping filter), Despreading (advanced demodulation), QPSK Demodulator (basic demodulation), Sink.]
Figure 4.6: Simplified WCDMA channel
As shown in figure 4.6, the complete transceiver starts with a bit generator on the
transmitter side and ends with the BER module on the receiver side. The BER metric is
used to evaluate the performance of our channel filter and the correctness of the whole
transceiver design in general. Figure 4.7 shows the high-level block diagram of the
WCDMA transceiver designed in LABVIEW.
Figure 4.7: LABVIEW based WCDMA transceiver block diagram
After testing each module separately, we connected all the blocks together as
shown in the above diagram. A technical problem, however, arose when executing the
whole chain on the host: the virtual memory ran too low, resulting in a fatal error that
halted the execution of the LABVIEW software. Even though the host memory capacity
is not low, it was not enough to run this simulation. This is because simulating such
transceivers requires a large number of input bits in order to observe a clear spectrum
of the signal (time–frequency uncertainty principle). In our design, for example, we
used a spreading factor of 16 and QPSK modulation. Thus, in order to reach 3.84 Mcps
at the transmitter’s output, the number of generated bits, n, at the beginning of the
transmitter is:
\[ n = \frac{3.84 \times 10^{6}\ \text{chips}}{16\ \text{chips/symbol}} \times 2\ \text{bits/symbol} = 480{,}000\ \text{bits} \]
a number on which a host with 250 MB of RAM cannot perform heavy processing through
the different channel stages. As a result, and since the promised upgrade of the host’s
memory was not carried out until the last weeks of the semester, we had to find another
way to perform the simulation.
Therefore, we decided to use the ADS (Advanced Design System) software,
developed by Agilent, to simulate the transmitter part. Accordingly, we generated the
transmitted I and Q (inphase and quadrature) components of the signal using ADS,
wrote these values into a text file, and then read them from LABVIEW and passed them
to the receiver side. This technique is possible for two main reasons. First, the generated
files to be read from LABVIEW are not very large: they contain around 45000 values
(enough to represent a WCDMA signal), far fewer than the 480,000 values used to
generate a WCDMA signal in the previous case. Second, the critical path of the channel,
and thus the processing load, is greatly reduced. Thus, we place the designed channel
filter at the front end of the receiver side, in the LABVIEW software, to test whether it
meets the specifications. Figure 4.8 shows the WCDMA transmitter using ADS:
[ADS schematic. Variables: SamplesPerChip = 8, ChipsPerSlot = 2560, NumSlotMeasured = 15, StartSlot = 0, TimeStep = 1/(3840000*SamplesPerChip), TimeStart = StartSlot*666.6667e-6, TimeStop = (StartSlot+NumSlotMeasured)*(666.6667e-6), FilterLength = 16*SamplesPerChip+1. Blocks: 3GPPFDD_DPCH source (SymbolRate = 30ksps, SpreadCode = 0, ScrambleCode = 0, DataPattern = Random); RaisedCosineCx filter (SquareRoot = YES, ExcessBW = 0.22, Length = FilterLength, Interpolation = SamplesPerChip); EVM measurement (ModType = QPSK, MeasType = EVM RMS, Constellation = "(1,1) (1,-1) (-1,-1) (-1,1)", SymBurstLen = 2560); CxToRect; CxToTimed (FCarrier = 2140 MHz); FloatToTimed; TimedDataWrite sinks with FileName = "3GPPFDD_BS_DL_I_Data" and "3GPPFDD_BS_DL_Q_Data" (Start = 666.6667 usec, Stop = 2 msec); TimedSink I and Q; two SpectrumAnalyzer blocks (Window = Kaiser 7.865).]
Figure 4.8: WCDMA transmitter using ADS
The above figure shows that the EVM metric (see section 2.5) is now used to evaluate
the performance of our digital channel filter. This is because we are no longer
generating bits and comparing them with the output bits using the BER metric. Instead,
we compute the transmitted constellation points and compare them with the output
constellation points using the EVM metric. The transmitted WCDMA signal and its
constellation are shown in Figure 4.9, which shows the 3.84 Mcps bandwidth and the
QPSK basic modulation respectively.
[Constellation plot: Q versus I]
Figure 4.9: WCDMA transmitted signal and its constellation using ADS
The WCDMA channel LABVIEW VIs can be organized in a hierarchical manner.
It includes four main components:
• WCDMA Acquire Input Signal
• Noise Introduction
• Channel Filtering
• EVM Evaluation
Figure 4.10 illustrates the top-level connections between the four major
components. A detailed explanation of each VI (component) is presented in the
subsequent sections.
Figure 4.10: Top level View of the WCDMA Channel
4.3.1.1- WCDMA Acquire Input Signal
This VI represents the whole ADS-based transmitter shown in the previous
section. Thus, it contains the spreader, the QPSK modulator, and the root raised cosine
pulse shaping filter blocks. The VI acquires the I and Q waveforms from the data files
generated by ADS. Each file contains coded and spread 3G I/Q signals taken over two
3GPP time slots. Each WCDMA frame has 15 slots with a total duration of 10 ms, so each
slot lasts 666.667 us and, at 3.84 Mcps, carries 2560 chips. Accordingly, two 3GPP time
slots take a total time of 2 x 666.667 x 1e-6 sec, each running at 3.84 Mcps but sampled
at 8x the chip rate (4x Nyquist), so every 8 samples in the time waveform represent one
chip. These data are based on a 30 ksps coded voice rate, spread x128 by spreading code
number 0 (which represents 128 streams of ones). A more detailed view of the VI is shown
in figure 4.11. As can be seen, the in-phase and quadrature components of the WCDMA
signal are stored in two files, which are fetched at run time and passed to the
corresponding VIs. In the VI, we used a time increment dt = 3.255E-8 sec, which is
calculated as follows:
$$dt = \frac{t_{chip}}{8} = \frac{1}{3.84 \times 10^{6} \times 8} = 3.255 \times 10^{-8}\ \text{sec}$$
The power spectral density of the signal and its constellation are shown in
figures 4.12 and 4.13, respectively.
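The timing arithmetic above can be checked with a short sketch. The chip rate, the 8x oversampling factor, and the 15-slot, 10 ms frame come from the text; the function and constant names are only illustrative and are not part of the LABVIEW VI.

```python
CHIP_RATE_HZ = 3.84e6          # WCDMA chip rate quoted in the text
SAMPLES_PER_CHIP = 8           # oversampling factor quoted in the text
SLOT_DURATION_S = 10e-3 / 15   # 15 slots in a 10 ms frame


def time_step(chip_rate_hz=CHIP_RATE_HZ, samples_per_chip=SAMPLES_PER_CHIP):
    """Sample spacing dt = chip time / samples per chip."""
    return 1.0 / (chip_rate_hz * samples_per_chip)


def samples_in_slots(n_slots, chip_rate_hz=CHIP_RATE_HZ,
                     samples_per_chip=SAMPLES_PER_CHIP):
    """Number of samples stored for n_slots time slots."""
    chips = round(n_slots * SLOT_DURATION_S * chip_rate_hz)
    return chips * samples_per_chip


if __name__ == "__main__":
    print(f"dt = {time_step():.4e} s")
    print("samples in two slots:", samples_in_slots(2))
```

Under these assumptions, each of the two data files would hold 2 x 2560 chips at 8 samples per chip.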
Figure 4.11: WCDMA Acquire Input Signal VI
Figure 4.12: PSD of a WCDMA Signal
Figure 4.13: Constellation of the Input Data
Notice from figure 4.12 that the signal falls 3 dB below its peak value at about
1.92 MHz, which is the baseband bandwidth of the WCDMA signal. The constellation of the
signal, as shown in figure 4.13, is QPSK, which is the type of modulation applied to the
input data bit stream. QPSK is one of several valid constellation types for a WCDMA
signal.
In a QPSK constellation, the ideal constellation points are centered at four different
locations, two on the x-axis and the other two on the y-axis. This is roughly shown in
figure 4, where we notice four dense clusters, representing the four ideal points, in
addition to some other points that deviate from their ideal locations. These deviations
are due to the pulse shaping filter, which spreads the points and changes their I and Q
values.
4.3.1.2- Noise Introduction VI
In real wireless communication scenarios, the channel is rarely ideal. The signal
often experiences several types of impairment, including path loss, shadowing, and
fading, due to the transmission environment. Adding noise to our channel is therefore
important for better modeling of real channel parameters. In our simulation, however,
because modeling all types of noise sources is complex and because noise is not the
main aspect of our project, we limited the effect of noise to additive white Gaussian
noise (AWGN). Thus the “Noise Introduction” VI, shown in figure 4.14, uses the “Add
AWGN” VI to add noise to both the I and Q components of the WCDMA signal. This AWGN
block has a variable input that determines the standard deviation of the Gaussian noise,
and thus the amount of noise added to the signal. The corresponding power spectral
density of the noisy signal and its constellation are shown in figures 4.15 and 4.16.
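As a rough illustration of what the noise stage does, the sketch below adds zero-mean Gaussian noise with a chosen standard deviation to the I and Q sample lists independently. It is a minimal Python stand-in, not the LABVIEW "Add AWGN" VI; the function names and the seed parameter are assumptions made for the example.

```python
import random


def add_awgn(samples, noise_std, rng):
    """Return a copy of samples with zero-mean Gaussian noise added."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def noise_introduction(i_samples, q_samples, noise_std, seed=0):
    """Add independent noise to the I and Q components.

    noise_std plays the role of the standard deviation input of the VI;
    seed is a hypothetical parameter used only to make runs repeatable.
    """
    rng = random.Random(seed)
    noisy_i = add_awgn(i_samples, noise_std, rng)
    noisy_q = add_awgn(q_samples, noise_std, rng)
    return noisy_i, noisy_q


if __name__ == "__main__":
    i_in = [0.5, -0.5, 0.5, -0.5]
    q_in = [0.5, 0.5, -0.5, -0.5]
    print(noise_introduction(i_in, q_in, noise_std=0.05))
```

A larger standard deviation spreads the constellation points further from their ideal locations, which is the effect described for the noisy constellation.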
Figure 4.14: Noise Introduction VI
Figure 4.15: Noisy PSD
Figure 4.16: Noisy QPSK Constellation
As can be noticed from figure 4.15, the WCDMA signal now includes some distortion
components. In addition, the corresponding constellation, shown in figure 4.16, gets
noisier as well. The above constellation contains only 400 samples of the noisy signal.
Notice how the points spread away from the four main locations due to the added
noise.
4.3.1.3- Channel Filter VI
The Channel Filter VI is one of the most important VIs in our project. It evaluates
the WCDMA filter designed previously. A detailed view of the channel filter is shown in
figure 4.17. The VI first reads the filter coefficients, previously designed using
FS_10, from a file and convolves them with the noisy WCDMA signal. The convolution is
done separately for the in-phase and quadrature components of the signal.
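The filtering step described above amounts to a direct-form FIR convolution applied to each component on its own. The sketch below is an illustrative Python equivalent under that reading; the coefficient values in the usage lines are placeholders, not the coefficients designed with FS_10.

```python
def fir_filter(coeffs, signal):
    """Full linear convolution of the coefficients with the signal."""
    out = [0.0] * (len(coeffs) + len(signal) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(coeffs):
            out[n + k] += h * x
    return out


def channel_filter(coeffs, i_samples, q_samples):
    """Filter the I and Q components separately with the same taps."""
    return fir_filter(coeffs, i_samples), fir_filter(coeffs, q_samples)


if __name__ == "__main__":
    taps = [0.25, 0.5, 0.25]  # placeholder taps for illustration only
    filt_i, filt_q = channel_filter(taps, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    print(filt_i)
    print(filt_q)
```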
Figure 4.17: Channel Filter LABVIEW VI
The power spectral density of the filtered signal and the corresponding constellation
are shown in figures 4.18 and 4.19.
Figure 4.18: PSD of the Filtered Signal
Figure 4.19: Constellation after Filtering (Quadrature vs. In-phase)
In Figure 4.18, the filtered WCDMA signal is far cleaner than the noisy WCDMA
signal shown in figure 4.15. This validates the design of the channel filter. Notice
that the passband amplitude stays as it was before filtering, at about 10^-6 units. The
stopband level, on the other hand, has been reduced from around 10^-11 to 10^-14 (a
factor of 1000). The constellation in figure 4.19 confirms this analysis. The
constellation points cluster more tightly around the four main centers, i.e. closer to
the ideal transmitted WCDMA constellation. The deviation that remains after filtering is
used to measure the EVM of the signal, as discussed below.
4.3.1.4- Error Vector Magnitude (EVM) VI
The error vector magnitude (EVM) metric is used to evaluate the quality of the
design of modulators, filters, and similar blocks. Refer to section 2.5 for more
explanation of the EVM metric algorithm.
The EVM measure is implemented using LABVIEW. A detailed view of the block
diagram is shown in figure 4.20. The VI has three inputs: the signal, its size, and the
reference positions of the constellation points. Since the generated WCDMA signal is
upsampled by 8, which means that every point is sent 8 times, we decided to take the
best of the eight sent points to calculate the EVM. To test the efficiency of the
designed channel filter, we compare the EVM before adding noise with the EVM after
adding noise and filtering the signal with our channel filter. The former value came to
10.0417%, and it degraded to 10.2858% after filtering. The degradation percentage is
given by:
$$Degradation\_Percentage = \sqrt{New\_EVM^{2} - Old\_EVM^{2}}$$
$$Degradation\_Percentage = \sqrt{0.102858^{2} - 0.100417^{2}} = 2.227\%$$
In other words, the degradation percentage after adding noise and filtering with our
channel filter is 2.227%, which is acceptable since it is less than 5%. We can therefore
deduce that our channel filter works efficiently, and we are ready to synthesize it on
the board.
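The degradation figure can be reproduced with a few lines of arithmetic. The sketch below applies the root-difference-of-squares formula to the two EVM values quoted above (expressed as fractions); the function name and the 5% threshold constant are illustrative names, with the threshold value taken from the text.

```python
import math

MAX_DEGRADATION = 0.05  # 5% acceptance limit stated in the text


def evm_degradation(old_evm, new_evm):
    """Degradation = sqrt(new_evm^2 - old_evm^2), with EVMs as fractions."""
    if new_evm < old_evm:
        raise ValueError("new EVM is smaller than old EVM")
    return math.sqrt(new_evm ** 2 - old_evm ** 2)


if __name__ == "__main__":
    wcdma = evm_degradation(0.100417, 0.102858)
    print(f"WCDMA degradation: {100 * wcdma:.3f}%")
    print("acceptable:", wcdma < MAX_DEGRADATION)
```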
4.3.2- WIMAX Channel Simulation
The arguments presented earlier in the WCDMA channel simulation section also apply
to the WIMAX channel simulation. Similarly, we designed the whole WIMAX transceiver,
but the host couldn't simulate the true number of bits needed…
The high-level LABVIEW-based WIMAX transceiver block diagram is shown in figure
4.21.
Figure 4.21: LABVIEW WIMAX Transceiver Block Diagram
Using the specifications of the WIMAX standard, see section 2.1.1, we simulated the
WIMAX transmitter using ADS. Figure 4.22 shows the ADS simulated WIMAX
transmitter:
Note: Rate_ID settings
Rate_ID   Modulation   RS-CC
0         BPSK         1/2
1         QPSK         1/2
2         QPSK         3/4
3         16QAM        1/2
4         16QAM        3/4
5         64QAM        2/3
6         64QAM        3/4
[ADS schematic parameters]
Signal_Generation_VAR: CyclicPrefix = 1/4, Bandwidth = 7.0 MHz, OversamplingOption = 2,
Rate_ID = 3, DataLength = 864, SignalPower = 10, BurstWithFEC = 1, NumberOfBurst = 1
RF_Channel_VARs: Prx = -102+SNR_Rx+10*log(200/256*RF_Bandwidth*10e-6), SNR_Rx = 16.4,
IF_Freq1 = 380 MHz
Measurement_VARs: Frame = 200
VAR2: RSS_Power = Prx; OutVar = "RSS_Power"
Components: WMAN_DL_SignalSrc_RF (DL_source, DataPattern = S_16-QAM,
Power = dbmtow(SignalPower), FCarrier = IF_Freq1), SplitterRF (S1), TimedToCx (T1),
CxToTimedIQ (C1, TStep = 0.03125 usec), SpectrumAnalyzerResBW (Spec_Q, ResBW = 3 kHz,
Window = Hanning 0.50), TimedDataWrite (T3, FileName = "i"; T2, FileName = "q"),
WMAN_OFDM_DL_RxSensitivity_Info (WMAN 802.16-2004 Design Information)
Figure 4.22: WIMAX transmitter using ADS
The transmitted WIMAX signal is an OFDM signal. Thus, it comprises the peaks of a
series of shifted sinc functions in the frequency domain. This phenomenon is further
illustrated in figure 4.23.
[Plot: dBm(Spec_Q) versus frequency (MHz), 360 to 400 MHz]
Figure 4.23: WIMAX transmitted signal using ADS
The WIMAX transceiver LABVIEW VIs are organized hierarchically, as in the WCDMA
case, into four main components:
• WIMAX Acquire Input Signal
• Noise Introduction
• Channel Filtering
• EVM Evaluation
Figure 4.24 illustrates the top-level connections between the four major components.
A detailed explanation of each VI is presented in the subsequent sections.
Figure 4.24: Top level View of the WIMAX Channel in LABVIEW
4.3.2.1- WIMAX VIs Explanation
The WIMAX Acquire Input Signal VI represents the transmitter designed in ADS and
shown in Figure 4.22. Thus, it contains the basic modulation and OFDM blocks, with no
pulse shaping filter, as stated in the WIMAX specs. WIMAX was simulated with a 7 MHz RF
bandwidth. The VI acquires the I and Q waveforms from the ADS generated files. Each file
contains coded I/Q signals taken over a 4.32 ms time slot (containing 108 symbols at
40 us per symbol). The modulation used is 16QAM with rate 1/2 convolutional coding and
256 FFT subcarriers. The sampling interval is 0.03125 us.
To test this VI, we computed the power spectral density of the signal generated from
the given files and compared the result with the WIMAX transmitted signal from ADS shown
in Figure 4.23. For the comparison to be fair, we split the ADS WIMAX signal into two
symmetric parts and compared each half of the RF WIMAX signal with the baseband WIMAX
LABVIEW PSD. The results were quite similar, as shown in Figure 4.25. Note that the
baseband bandwidth is half the RF bandwidth (3.5 MHz = 7 MHz / 2).
Figure 4.25: WIMAX PSD in LABVIEW (Amp. Versus freq)
The noise introduction and the EVM evaluation blocks are quite similar to their
counterparts in WCDMA. Regarding the channel filtering block, it has the same
architecture as that in WCDMA, but obviously with different coefficient values and
number of taps.
We chose the Hamming filter rather than the rectangular or Remez designs because it
proved effective when a large amount of noise is added. The effectiveness of this filter
type is illustrated in the diagrams shown in figure 4.27. For a noise standard deviation
of 0.3, the original signal gets highly distorted. However, after Hamming filtering, the
signal is successfully recovered.
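For readers unfamiliar with Hamming-windowed FIR design, the sketch below shows the textbook windowed-sinc construction of a lowpass filter. It is not the procedure or tool used to produce the project's coefficients; the tap count of 51, the 3.5 MHz cutoff, and the 32 MHz sample rate in the usage lines are assumed values chosen to be in the range the report mentions.

```python
import math


def hamming_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc lowpass taps, Hamming window, unity gain at DC.

    num_taps must be at least 2.
    """
    fc = cutoff_hz / sample_rate_hz      # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        m = n - mid
        if m == 0:
            ideal = 2.0 * fc             # limit of the sinc at its center
        else:
            ideal = math.sin(2.0 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


if __name__ == "__main__":
    taps = hamming_lowpass(51, 3.5e6, 32e6)  # assumed example parameters
    print(len(taps), taps[25])
```

The Hamming window trades a wider transition band for lower stopband ripple than a rectangular window.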
As stated previously, EVM is the metric we use to evaluate the performance of the
channel filter. Unlike the EVM metric for the WCDMA QPSK constellation, the EVM for
WIMAX has to take into account the 16QAM constellation. The EVM mathematical formula
therefore changes, and this affects the EVM VI, which is shown in figure 4.26.
Figure 4.26- WIMAX EVM LABVIEW Module
Applying the above EVM metric, the EVM of the signal before applying the channel
filter turns out to be 51.9898%, which is a very high value. This is because the 16QAM
constellation points are randomly distributed after the OFDM operation is applied. We
could have applied the OFDM demodulator before calculating the EVM, which would give a
much smaller EVM; but, as explained previously, we aim at lowering the EVM degradation
after applying the channel filter and not the value itself, so the high value is not a
concern. The EVM after applying the filter is 52.0834%. Accordingly, the degradation
percentage is given by:
$$Degradation\_Percentage = \sqrt{0.520834^{2} - 0.519898^{2}} = 3.12\%$$
Again, the degradation is less than 5%, which encourages us to move on to the
implementation part.
Figure 4.27: WIMAX filter response. A) no noise b) noisy WIMAX signal for 0.3 noise standard
deviation c) WIMAX filter response for 0.3 noise standard deviation (Amp vs. freq)
Project Design& Analysis 4.4- Reconfigurable System Architecture
4.4- Reconfigurable System Architecture
Given the block diagrams for each of WLAN, WCDMA, and WIMAX in the previous
chapter, we need a way to combine these standards in order to facilitate
reconfiguration between them. As you may have noticed, the previous block diagrams
include the blocks that deal with a real channel scenario (noise and fading). Given
that, as a first step, no fading simulation has to be done, our channel depends on noise
alone. This lets us remove some blocks from the previous block diagrams, such as the
interleaver, the Cyclic Prefix Insertion block, the convolutional encoder, and some
other blocks. This leaves us with only the blocks dealing with modulation/demodulation
(“basic” such as QPSK and “advanced” such as OFDM and spread spectrum) and filtering.
However, we kept the original block diagrams (those accounting for the real channel
scenario) as a reference for future work, in case we want to improve our project further
and account for real fading channels in addition to noise. In figure 4.28, we show the
common block diagram for noisy channels, reconfigurable for the aforementioned
standards: WCDMA and WIMAX.
Notice that the interpolation and decimation filters are not included explicitly in
this block diagram; however, the ADC and the DAC on the PCI 5640R board we are working
with include some specifications for decimation and interpolation filtering,
respectively. Briefly, the design starts by inputting a certain number of bits. The bit
rate may be dealt with in either of two ways: either we input a fixed number of bits and
then upsample them along the chain of blocks to meet the bit rate specified for each
standard, or we input a variable bit rate so that the same upsampling factor is used.
Afterwards, we have to modulate these bits. As you may have noticed from the
specifications of each standard, QPSK is the basic modulation scheme common to the three
standards. Then, we have to perform either DSSS (direct sequence spread spectrum) for
WCDMA or OFDM (orthogonal frequency division multiplexing) for both WIMAX and WLAN. The
specifications of WIMAX and WLAN differ in how OFDM is performed, which is why this
block must be reconfigurable between the two. Afterwards, a 256 IFFT is performed for
both WIMAX and WLAN, which is a very efficient discrete implementation that replaces the
pulse shaping filter. Finally, a root raised cosine filter is applied for WCDMA. While
WIMAX doesn’t use any pulse shaping filter, WLAN uses a shifted Gaussian shaping filter
at each of the carriers specified by the OFDM operation. The output is then converted
into the analog domain using the DAC block with the appropriate interpolation and
decimation factor, as discussed previously.
Given that the channel is noisy, the “Reconfigurable Channel Filter (FIR)” block is
implemented as shown in section 4.5.
Figure 4.28: Reconfigurable Transceiver Block Diagram
The filter depends on the specifications of each standard, such as the adjacent
channel attenuation, the baseband frequency, the attenuation at certain frequencies,
etc.
Refer to section 2.3 for additional information about the “channel Filter (FIR)” block,
for each case, with its corresponding specifications.
Project Design& Analysis 4.5- Hardware Implementation
4.5- Hardware Implementation
In this section, we present the implementation details of the FPGA-based
reconfigurable FIR filter. We discuss how reconfigurability of the filter is implemented.
Moreover, we present the different problems encountered at the different stages of the
process, their solutions and some other proposed alternatives including their advantages
and disadvantages from an implementation point of view.
4.5.1- Reconfigurability Aspect
The LABVIEW-based board (PCI-5640) simplifies the implementation of a
reconfigurable system because of its two-level hierarchical model: host and FPGA. The
host VI can read and modify the different parameters (controls and indicators) of the
FPGA by reading their values from, or passing them to, the FPGA VI through built-in
read/write control methods. It also transfers the digitized input signal from a file on
the host to a pre-assigned memory on the FPGA through the same methods, which serve as
the means of data communication between the host and the FPGA. These features help us
implement an efficient reconfigurable FIR filter in which the number of taps is
controlled from the host level and the coefficient values are downloaded and stored in
the FPGA memory at run time. So, instead of storing the coefficients of all the filters
for the different standards in the FPGA memory, we download the coefficients of only one
filter at run time, depending on a user control. This method is preferred because it
scales well when several standards are included: it needs less memory space. In sum, we
can switch between the two standards by just changing a boolean input control on the
host, as shown in Figure 4.29. The boolean control is wired to the selector of a case
structure. When the selector is true, the WCDMA coefficient and data files and other
parameters are loaded; when it is false, the WIMAX files are loaded.
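The run-time selection can be pictured with a small software model. In the sketch below, a boolean picks which coefficient set is downloaded into a single shared coefficient memory, so only one filter is resident at a time. The class, the function names, and the tap values are hypothetical and stand in for the LABVIEW case structure and read/write control methods.

```python
class FpgaModel:
    """Toy model of the FPGA side: one coefficient memory, one tap count."""

    def __init__(self, capacity):
        self.memory = [0] * capacity
        self.num_taps = 0

    def download(self, coeffs):
        """Overwrite the coefficient memory with a new filter."""
        if len(coeffs) > len(self.memory):
            raise ValueError("filter does not fit in coefficient memory")
        for address, value in enumerate(coeffs):
            self.memory[address] = value
        self.num_taps = len(coeffs)


def reconfigure(fpga, use_wcdma, filters):
    """Host-side case structure: True selects WCDMA, False selects WIMAX."""
    name = "WCDMA" if use_wcdma else "WIMAX"
    fpga.download(filters[name])
    return name


if __name__ == "__main__":
    # Placeholder integer taps, not the project's designed coefficients.
    filters = {"WCDMA": [1, 2, 3, 2, 1], "WIMAX": [4, 5, 4]}
    fpga = FpgaModel(capacity=8)
    print(reconfigure(fpga, True, filters), fpga.num_taps)
    print(reconfigure(fpga, False, filters), fpga.num_taps)
```

Because the tap count is stored alongside the coefficients, stale values left in memory from a longer filter are simply never read.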
Figure 4.29: Case Structure (Choose WCDMA / WIMAX)
An alternative method for reconfiguring the FIR filter is to store the coefficients
of the different standards in the memory blocks of the FPGA before run time. Then, when
the host VI is run, only a boolean control is transferred from the host to the FPGA and
the corresponding coefficients are used for filtering. Even though this alternative has
a shorter initialization phase, it is inefficient from a memory usage point of view.
Targeting an efficient design, we decided to use the first method.
4.5.2- Data Links and Communications
As previously mentioned, the PCI-5640 board hierarchy is divided into two levels:
host and FPGA. The former is used to keep track of the controls and indicators that are
sent to and received from the FPGA. It is also used to configure (in LABVIEW) some
hardware resources that are already synthesized on the FPGA, such as the ADC and the
DACs. At the FPGA level (the FPGA VI), the designed blocks, such as the registers and
memory blocks of the design, are compiled and then synthesized on the Virtex II Pro
FPGA. After the configuration parameters are set, the FPGA listens for data arriving at
its inputs and sends results to a DMA FIFO according to the logic of the design. Figure
4.30 shows a high-level diagram of the communication process between the host and the
FPGA.
Figure 4.30: Host-FPGA Link
The communication link shown in figure 4.30 is asynchronous, meaning that there is
no guarantee that a value sent from the host VI will be received by the FPGA at a
particular time. In other words, a received value may be read several times before a new
value arrives, or a sent value may not be read at all. The incoming signal then gets
distorted and wrong results are obtained. Because of this, we had a very hard time
synchronizing the two levels; the solution is presented below.
4.5.3- Host-FPGA Synchronization
The PCI 5640 board is a very important tool for implementing communications
systems, especially since it is LABVIEW based, so most of the needed components can be
implemented easily. Yet this board is still under test and we, as students, are helping
in this procedure. As mentioned earlier, we faced a major problem during the filter
implementation: how to read, on the FPGA, values that are sent continuously from the
host. This issue is critical in any application, since we always need to send data from
the host level to the FPGA level. We contacted NI via the available online forums, in
particular the NI IF-RIO forum, until we finally reached a solution. Figure 4.31 shows
the process of reading a value on the FPGA from the coefficient control. This read
process guarantees that the sent element is read only once. It saves the previous value
in a shift register and, on every iteration of the while loop, compares the previous
value (the one in the shift register) with the one available in the coefficients buffer.
If they are not equal, we recognize that a new value was received. We then read it, save
it in the shift register for comparison in the next iteration of the while loop, and
pass it to the next stage. Note that the value must also be different from zero, because
the default value of any parameter on the FPGA is zero; without this condition, we would
lose iterations initially. In other words, the FPGA writes a new value if and only if
the output of the AND gate is true, as seen in Figure 4.31.
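The read process can be modelled in software as a loop that polls a control and accepts a value only when it differs from the last accepted value and is non-zero. The sketch below is a Python stand-in for the LABVIEW diagram of Figure 4.31; the list of polled values is invented for illustration.

```python
def fpga_read(polled_values):
    """Accept each new, non-zero value exactly once.

    polled_values holds what the FPGA loop sees on the control at each
    iteration; the same host value can appear several times in a row.
    """
    previous = 0        # shift register; FPGA controls default to zero
    accepted = []
    for current in polled_values:
        is_new = current != previous
        is_nonzero = current != 0
        if is_new and is_nonzero:   # the AND gate in the diagram
            accepted.append(current)
            previous = current      # remembered for the next iteration
    return accepted


if __name__ == "__main__":
    # The loop runs faster than the host writes, so values repeat.
    print(fpga_read([0, 0, 5, 5, 5, 7, 7, 0, 3]))
```

The same model also shows the limitation: a zero, or a value identical to the one before it, is never accepted.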
Figure 4.31: FPGA Read Process
One may ask: does this mean we can’t send a zero, or the same value twice in a row,
to the FPGA? A more complex implementation can be used to allow the same value to be
sent more than once. A “time stamp” is attached to each value sent from the host to the
FPGA. Since a time stamp is never equal to the one from the previous iteration, we can
compare the time stamps instead of the coefficient values. A zero can also be sent
successfully, since its time stamp is different from zero. However, to implement this
design, we need to encode the time stamp into each value sent to the FPGA so that every
value and its time stamp stay mapped to each other. Several bits must therefore be
reserved for the time stamp, and several bits of precision are lost from the input
values. The signal distortion would then increase dramatically due to error accumulation
when adding and multiplying signal terms.
Since adjacent values of our inputs and coefficients are different, we decided to go
with the simple design without the time stamp to gain higher precision in our results.
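To make the trade-off concrete, the sketch below packs a time stamp into the upper bits of a 32-bit word and the value into the remaining bits. The 8-bit stamp width is an assumption chosen for illustration; the report does not state how many bits such a design would reserve.

```python
STAMP_BITS = 8                    # assumed width of the time stamp
VALUE_BITS = 32 - STAMP_BITS      # bits left for the signed value
VALUE_MASK = (1 << VALUE_BITS) - 1


def pack(value, stamp):
    """Pack a signed value and a time stamp into one 32-bit word."""
    low, high = -(1 << (VALUE_BITS - 1)), (1 << (VALUE_BITS - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value needs more bits than are left for it")
    return ((stamp % (1 << STAMP_BITS)) << VALUE_BITS) | (value & VALUE_MASK)


def unpack(word):
    """Recover (value, stamp) from a packed 32-bit word."""
    stamp = word >> VALUE_BITS
    raw = word & VALUE_MASK
    if raw >= 1 << (VALUE_BITS - 1):   # restore the sign
        raw -= 1 << VALUE_BITS
    return raw, stamp


def receive(polled_words):
    """Accept a word whenever its time stamp changes."""
    accepted, last_stamp = [], None
    for word in polled_words:
        value, stamp = unpack(word)
        if stamp != last_stamp:
            accepted.append(value)
            last_stamp = stamp
    return accepted


if __name__ == "__main__":
    words = [pack(5, 1), pack(5, 1), pack(5, 2), pack(0, 3), pack(-7, 4)]
    print(receive(words))
```

With 8 bits reserved, only 24 bits remain for the value, which is the precision loss the text refers to.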
4.5.4- FPGA- Host Synchronization
The link from the FPGA to the host is much simpler than the link in the opposite
direction. The reason is that the FPGA supports the use of DMA FIFOs in this direction
but not the other way around. Using DMA FIFOs, we can store values in the FIFO and then
read them from the host VI. We control the write operation to the FIFO in the FPGA VI,
so every value is written to the FIFO once; when a value is read from the host VI, it is
popped (removed from the FIFO and thus not read again); and reading the FIFO on the host
does not return an empty (null) value. There is therefore no need to synchronize the two
blocks. Putting the FIFO read operation in a loop guarantees that the values are read
correctly. This FIFO read operation is used to retrieve the results of the filtering
process, which are read from the host VI and saved in an array for later evaluation.
Figure 4.32 shows a block diagram of a DMA FIFO. The inputs of the FIFO include the
Timeout value, usually set to zero in order not to cause any delays. The PCI 5640 board
currently in use supports only a zero timeout; any other value causes a compilation
error. A second input is the “Number of Elements”: the number of data values that should
be read from the FIFO in every iteration. Since we write one value per iteration, only
one value needs to be transferred from the FPGA to the host, so we set the number of
elements to be read to 1. Also note that the FIFO supports a 32-bit number
representation, which is in accordance with our design as described in 4.5.6.
Figure 4.32: DMA FIFO Read Method
4.5.5- Memory Component
In designing the FIR filter, we need to save the coefficients in memory in order to
use them repeatedly in the convolution process. In addition, we need to keep track of
the last N input values, where N is the number of coefficients, because every output
depends on the last N inputs. Thus every new input needs to replace the oldest input in
memory before the convolution process takes place. This is described in detail in
section 4.5.7.
First, we tried to implement our FIR filter using arrays and FIFOs (local and
DMA). However, the compilation took around 15 hours and failed with a fatal error. After
daily contact with the NI forums, we learned that using arrays is not encouraged on this
type of board (as advised by NI) and that other users of the same board had experienced
the same problem. The reason is that arrays at the software level are mapped to
registers on the board, and there are not enough registers for large arrays such as the
ones we are using. The design of the project using arrays is shown in figure 4.33. This
figure shows a preliminary implementation of our filter in which we were trying to do a
simple convolution with fixed coefficients and filter order. It failed for the reasons
described above.
Figure 4.33: FIR implementation Using Arrays and FIFOs
So, we decided to migrate to a more practical solution: using BRAM memory blocks.
These memory blocks are organized as 8k x 16 bits. But since we are using a 32-bit
representation, as explained in section 4.5.6, a new problem arises: storing and reading
a 32-bit number in a 16-bit memory.
To solve this problem, we split each 32-bit number into two 16-bit numbers on write
and concatenated them again on read, writing or reading each 16-bit half separately and
thus using 2 memory addresses. Every 32-bit write/read operation therefore involves two
16-bit write/read operations, one for the 16 MSBs and another for the 16 LSBs; see
figure 4.34. Although this doubles the size of the needed memory, it is necessary to get
the required precision of our numbers.
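The split and join operations can be illustrated in a few lines. The sketch below stores each signed 32-bit number as two 16-bit words at consecutive addresses; placing the MSB half at the lower address is an assumption made for the example, since the text does not state the order.

```python
def split32(value):
    """Split a signed 32-bit integer into (msb16, lsb16) unsigned halves."""
    word = value & 0xFFFFFFFF
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def join32(msb16, lsb16):
    """Concatenate two 16-bit halves back into a signed 32-bit integer."""
    word = (msb16 << 16) | lsb16
    return word - (1 << 32) if word & 0x80000000 else word


def write32(memory, index, value):
    """Store entry number index at addresses 2*index and 2*index + 1."""
    memory[2 * index], memory[2 * index + 1] = split32(value)


def read32(memory, index):
    """Read entry number index back from its two 16-bit words."""
    return join32(memory[2 * index], memory[2 * index + 1])


if __name__ == "__main__":
    bram = [0] * 8                 # tiny stand-in for an 8k x 16 block
    write32(bram, 0, 0x12345678)
    write32(bram, 1, -123456)
    print([hex(w) for w in bram[:4]], read32(bram, 1))
```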
Figure 4.34: Write a 32-bit coefficient in the memory
4.5.6- Number Representation
One of the important decisions in any digital design process is which number
representation to use. Using MATLAB, we first computed the error that results from
rounding a given number to fixed point notation. All the numbers are mapped to a single
exponent, chosen in accordance with the given data range so as to minimize the loss in
precision. A 16-bit representation resulted in an unacceptable error value, whereas a
32-bit representation was sufficient. Using this representation, we found with MATLAB
that if we scale the output of the convolution process by more than 10^9, we face an
overflow problem because more than 32 bits would be needed. The highest possible scaling
is therefore 10^9, so the product of the coefficient scaling and the input scaling
should not surpass this value.
Table 4.2 shows the MATLAB code that rounds the coefficients and input values to
different exponents between 10^3 and 10^6 to get the best combination for the case of
the WCDMA filter design. The results show that multiplying the coefficients by 982 and
the input values by (10^9/982) results in the minimum error. Based on these results, we
decided to scale up the coefficients by 10^3 and the input values by 10^6 before sending
them to the FPGA for the filtering process.
coeff_ideal = load('WIMAX_Ham.txt');            % load the filter coefficient file
input_ideal = load('WIMAX_Q.txt');              % load the input signal file
output_ideal = conv(coeff_ideal, input_ideal);  % FIR operation on the ideal values
scales = 1:10:1000;                             % candidate scaling factors
err = inf(size(scales));                        % mean absolute error per candidate
for k = 1:numel(scales)
    s = scales(k);
    coeff_round = round(coeff_ideal * s * 1000);           % coefficients scaled by s*10^3
    input_round = round(input_ideal * (1000 / s) * 1000);  % inputs scaled by 10^6/s
    output_round = conv(coeff_round, input_round) / 1e9;   % undo the combined 10^9 scaling
    err(k) = mean(abs(output_ideal - output_round));
end
[min_err, idx] = min(err);   % smallest error and its position
best_scale = scales(idx)     % scaling factor that gives the smallest error
Table 4.2: MATLAB Testing for Fixed Point Notation
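The two constraints discussed in this section, small rounding error and no 32-bit overflow, can be checked together. The Python sketch below quantizes coefficients and inputs with given scale factors, convolves the integers, and reports both the mean absolute error and whether every output fits in a signed 32-bit word. The sample data are made up for illustration and are not the project's files.

```python
INT32_MAX = 2 ** 31 - 1


def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for n, x in enumerate(a):
        for k, y in enumerate(b):
            out[n + k] += x * y
    return out


def fixed_point_check(coeffs, inputs, coeff_scale, input_scale):
    """Return (mean absolute error, fits_in_32_bits) for the given scaling."""
    ideal = convolve(coeffs, inputs)
    q_coeffs = [round(c * coeff_scale) for c in coeffs]
    q_inputs = [round(x * input_scale) for x in inputs]
    q_out = convolve(q_coeffs, q_inputs)            # integer arithmetic
    fits = all(abs(v) <= INT32_MAX for v in q_out)
    total = coeff_scale * input_scale
    err = sum(abs(i - q / total) for i, q in zip(ideal, q_out)) / len(ideal)
    return err, fits


if __name__ == "__main__":
    coeffs = [0.25, 0.5, 0.25]           # made-up example taps
    inputs = [0.1, -0.2, 0.3, 0.05]      # made-up example samples
    print(fixed_point_check(coeffs, inputs, 1e3, 1e6))
    print(fixed_point_check(coeffs, inputs, 1e4, 1e7))
```

Whether a given total scaling overflows depends on the amplitude of the data, which is why the report determined the limit from its own signals.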
4.5.7- Convolution Process
The convolution process represents the bulk of the FIR filter, so implementing it
efficiently is very important for the performance of the system. The main efficiency
metrics to consider in an FIR design and implementation are minimum latency and minimum
hardware usage. Since the order of the filter is around 50, as described in section 2.3,
we have two choices. The first is a parallel filter, which performs the convolution in
parallel. This choice, however, translates into fetching 2 x 50 values (coefficients and
inputs) from memory on every iteration, which is far from practical: this number of
simultaneous fetches is not supported by the board, so we would have to fetch them
serially anyway. Such an implementation would be beneficial if registers were used
instead of memory blocks, because register reads are independent of each other. But
since we are obliged to use memory blocks, as described in section 4.5.5, the parallel
filter is not an option. We therefore decided to use a serial implementation of the FIR
filter. Figure 4.35 shows how the convolution process takes place on our hardware
platform.
Figure 4.35: Convolution Process
The filtering process consists first of finding the address of the data to be fetched
from memory in the current iteration, fetching the corresponding values twice (at the
found address and at address + 1), and then concatenating the two as shown in figure 4.
Then the multiplication and accumulation process runs until it reaches the last
iteration (equal to “Number of Coefficients - 1”). At the last iteration, the output of
the convolution is written once to the DMA FIFO, from which it is read at the host level
at run time. The accumulator in our design is a 32-bit register initialized to zero.
The critical path of our hardware design, consisting mainly of “number of
coefficients” multiplications and additions, takes less time than the period at which
the host sends data values to the FPGA.
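A serial multiply-accumulate filter of this kind can be sketched in software with a circular buffer that holds the last N inputs, where each new sample overwrites the oldest one. The class below is an illustrative Python model, not the LABVIEW FPGA VI; the address arithmetic is one possible scheme and the 16-bit split of each stored word is omitted for brevity.

```python
class SerialFir:
    """Serial FIR: one multiply-accumulate per loop iteration."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.history = [0] * len(self.coeffs)  # last N inputs
        self.newest = 0                        # address of the newest input

    def push(self, sample):
        """Store a new input over the oldest one and return one output."""
        n = len(self.coeffs)
        self.newest = (self.newest + 1) % n    # the oldest slot comes next
        self.history[self.newest] = sample
        acc = 0                                # accumulator starts at zero
        for k in range(n):                     # k = 0 .. N - 1
            address = (self.newest - k) % n    # walk back through history
            acc += self.coeffs[k] * self.history[address]
        return acc                             # written once per input


if __name__ == "__main__":
    fir = SerialFir([1, 2, 3])
    print([fir.push(x) for x in [1, 0, 0, 0]])
```

Feeding a single 1 followed by zeros returns the coefficients in order, which is a quick way to check the addressing.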
[Figure 4.35 block labels: Find Address, Memory Read, Split/join numbers, Accumulator]
4.5.8- Host Application
At the host level, the host reads the output of the FIR filter from the DMA FIFO
before displaying and evaluating the results. A block diagram of the host VI is shown in
figure 4.36. The host VI then plots the frequency domain representation of the filter
output and calculates the corresponding EVM values. The frequency domain representation
is important to show that specific frequency contributions to the signal were preserved,
but it is only a visual evaluation. The EVM metric, in contrast, gives a numerical value
for the degradation or deviation of the signal from the original one. Since we are
designing a channel filter, a degradation of less than 5% is a good indicator that the
filter has little effect on the distribution of the constellation points.
Figure 4.36: HOST VI
4.5.9- FPGA Process
The FPGA platform includes the hardware implementation of our reconfigurable FIR
filter. On each iteration of the host’s clock, it receives a new value that needs to go
through the filtering process. This value is read as described in section 4.5.2, where
the whole FPGA VI loops continuously in order to guarantee that every new value is read.
Since the signal has an in-phase and a quadrature component, we decided to filter the
two channels in parallel, as shown in figure 4.37. Our FPGA design acts on 32-bit
numbers and uses BRAM memory blocks and DMA FIFOs for data transfer, as described
throughout this section.
Figure 4.37: FPGA VI
Project Design& Analysis 4.6- Project Assessment
4.6- Design Assessment
In this section, we present the testing scheme used throughout the design process and
the way it helps in reaching the final results. We cover both the software and the
hardware levels of the testing scheme, and we compare the results
obtained from the hardware implementation with those of the software simulation.
4.6.1- Testing Scheme
The testing process for the implementation of the FIR filter is divided into two main
phases, presented below. The aim of this testing procedure is to limit possible errors
and check the feasibility of each phase as early as possible in the design/implementation
process, in order to optimize the whole system at the minimum possible cost.
The first phase in the testing scheme involves simulating the FIR filter in LABVIEW and
testing the different metrics used to qualify our design. In our case, the two main metrics
used to assess our design are the frequency domain response of the output and the error
vector magnitude (EVM). Once this design phase is successful and we are satisfied with
the results, we can fix the designed filter coefficients and start working on the hardware
implementation. The results for this phase are presented in section 4.3, where we show how
we passed this phase for the WCDMA and WIMAX channels. As mentioned
previously, the EVM value is a useful metric for filter testing because it shows
how our filter affects the distribution of the constellations, especially in our case
where we cannot measure the bit error rate of the received bits. In order to assess the efficiency
of our filter design, we need to guarantee that the degradation in the EVM value between
the input signal and the filtered one stays below 5%, for example 2.227% in the case of
WCDMA. Also, concerning the output frequency domain, we need to ensure that
frequencies higher than the transmission frequency experience a deep rejection that
would cancel the effect of any jammer. Once we had reached the optimum design
(lowest EVM degradation) via the LABVIEW simulation, we decided on the
optimum type of FIR filter (rectangular for WCDMA and Hamming for WIMAX).
The second phase is the hardware implementation testing, where we test the
FIR filter after downloading it to the FPGA. This phase is the hardest part of the testing
scheme because it requires recompiling the FPGA VI for every single change,
and such runs may sometimes take hours. During this phase, we went
through different design alternatives before reaching our final design. For example, we
started by designing our reconfigurable filter using only arrays and FIFOs (local and
DMA), but this VI failed to compile after 15 hours of running time (see figure 4.33). We then
shifted to an implementation using BRAM memory blocks, which usually do not cause large
latencies because of their negligible fetching times. In the next section, we present
the results achieved with this implementation and how they compare to the
simulated results.
4.6.2- Results Assessment
In this section, we present the strengths of our implementation and how it
helps in saving hardware resources. First, the PCI 5640 board helps in exploring the
reconfigurability aspect by presenting a two-level hierarchy, Host and FPGA. This
hierarchy permits the Host to use the same hardware resources for different channels by
using some control parameters, as described in section 4.5. Starting with the WCDMA
channel, we were able to reach successful results based on our metrics. The first metric is
the frequency domain response of our filtered signal compared to the input, as shown in
figures 4.38 and 4.39. The results show that the frequencies below 1.92 MHz conserve
their power, while any kind of jamming at high frequencies is highly attenuated, from the
range 10^-11 to 10^-12 down to the range 10^-13 to 10^-19, as seen in the graphs. Moreover,
compared to the software simulation, the attenuation level is better because in the LABVIEW
simulation we reached lower rejections, of the order of 10^-13 to 10^-15.
Figure 4.38: Frequency Domain of the Input WCDMA signal (Amp vs. freq)
Figure 4.39: Frequency Domain of the Filtered WCDMA signal (Amp vs. freq)
Concerning the second metric, the EVM value increased from 0.102317
before filtering to 0.112394 after filtering, which is equivalent to a percentage degradation of:
\[ \text{Degradation\_Percentage} = \sqrt{EVM_{New}^{2} - EVM_{Old}^{2}} \]
\[ \text{Degradation\_Percentage} = \sqrt{0.112394^{2} - 0.102317^{2}} = 4.65\,\% \]
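The degradation formula above can be checked numerically with a few lines of Python. This is a minimal sketch; the helper name `evm_degradation` is a hypothetical choice and is not part of the LABVIEW host VI.

```python
import math


def evm_degradation(evm_old: float, evm_new: float) -> float:
    """Return sqrt(evm_new^2 - evm_old^2), the EVM contribution of the filter."""
    return math.sqrt(evm_new ** 2 - evm_old ** 2)


# WCDMA values measured before and after filtering
wcdma = evm_degradation(0.102317, 0.112394)
print(f"WCDMA degradation: {100 * wcdma:.2f} %")  # prints 4.65 %
```

The same helper can be applied to any pair of EVM values measured before and after the filter.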
The distributions of the 4-QAM constellations before and after filtering are shown in
figures 4.40 and 4.41 respectively. Moreover, it is worth mentioning that the initial value
of the EVM differs from the one obtained in the LABVIEW simulation because, in the
hardware case, we use fixed point notation and therefore round each value
before measuring the EVM. This is done in order to isolate the
degradation caused by the filter alone, excluding other effects such as
fixed point notation errors. The degradation is higher than in the
software simulation, where it was only about 2.27 %. This difference
is due to the fact that the software simulation uses higher-precision values, which are
truncated in the hardware implementation.
Figure 4.40: WCDMA Initial Constellations (Quad vs. in-phase)
Figure 4.41: WCDMA Filtered Constellations (Quad vs. in-phase)
The second standard in our reconfigurable system, WIMAX, exhibits similar results
under testing. The EVM value increases from 0.521008 to 0.524038, which is equivalent to
a degradation of:
\[ \text{Degradation\_Percentage} = \sqrt{0.524038^{2} - 0.521008^{2}} = 5.62\,\% \]
This degradation is also higher than the one obtained from the simulation, which is
3.12%. We should mention that the value of the EVM in the case of WIMAX is
higher than the one for WCDMA because the transmitter part of the WIMAX channel
does not include a pulse shaping filter as in the WCDMA case; the pulse
shaping filter plays a major role in distributing the constellations well, and so in reducing the
EVM value. The frequency domain representations of the input signal and the
output signal are shown in figures 4.42 and 4.43 respectively. We easily notice that
the transition phase in the PSD, shown in figure 4.42, is chopped off, as seen in figure 4.43.
Figure 4.42: Frequency Domain of the Input WIMAX signal (Amp vs. Freq)
Figure 4.43: Frequency Domain of the filtered WIMAX signal (Amp vs. Freq)
After presenting our results, we believe that the LABVIEW simulation and the
hardware implementation have reached quite similar results, with some differences: the
EVM is in favor of the simulation, and the frequency response is in favor of the
hardware implementation. So, implementing a reconfigurable FIR filter on an FPGA is a
good option, and it would be able to compete with the implementation of any other type
of FIR filter.
5- System Design Constraints
For any project, designers should be realistic and address as many as possible of the
constraints that their system poses at every level, ranging from economic levels,
through social, political and ethical levels, to more technical
constraints such as manufacturability and sustainability.
Our project is an important step towards achieving a uniform global
system in which, although different communication languages coexist (WIMAX,
WLAN, WCDMA, GSM, etc.), there would exist a unique communication language
capable of connecting all such non-uniform standards. Accordingly, there would no
longer be a need for the present so-called “Roaming Service”, which is a relatively costly
service. Therefore, economically, there would be huge savings due to the absence of such a
service, especially for businessmen who keep traveling from one region to
another. In addition, such a reconfigurable system has the clear advantage of
reducing hardware resources and adapting to different standards.
However, such a system poses some economic constraints. Since
this reconfigurable system has to group multiple standards together, base
stations, mobile systems and wireless cards have to include new blocks such
as switches (to switch between the reconfigurable standards) or blocks that remove any
possible RF coupling between the signals; such blocks are not needed in a
single-standard wireless device.
Socially, our project would help increase social interaction between different
cultures because it would allow people from different regions using different
communication standards to communicate with each other at low cost. Moreover, this
would lead to a decrease in the price of international calls, so parents can easily
contact their relatives and keep track of their news.
Also, the manufacturing process would take a long time, especially
since installing a reconfigurable hardware design on today's cell phones and base stations
would take a long time. This transition phase between the two generations is going to
slow the rapid advancement in the latest facilities and applications integrated in our
phones, such as cameras and video streaming, because research in this phase would be
mainly concentrated on building area- and power-efficient reconfigurable blocks. As a
result, this transition involves a trade-off between unifying the world and having
a low development rate in design and facilities.
Our project would be a typical competitor to roaming services, replacing them at
very low cost. However, this would create a security issue, since communications between
different countries would be allowed without a sufficient level of control, which may have
its own effects, especially nowadays when many countries have been encouraged by
globalization to dominate others.
Projecting the issue of security, discussed previously, onto an individual basis may lead
to some undesirable ethical consequences. People may take advantage of
this new system to spy on other systems, and one can imagine what such an unethical act
would result in, and at whom it might be directed, if this issue were not solved.
Sufficient attention needs to be given to this serious problem, and its effects
need to be seriously considered and dealt with.
Concerning sustainability, our project imposes a common hardware platform on
which any new standard can be integrated. However, this would not be applicable to new,
highly developed standards, especially if they follow a completely different architecture.
Thus, two possibilities are left for our reconfigurable design: either to survive with the
present standards at the expense of the new ones, or to undergo major changes in its architecture
to support the new standards, which would surely mean sacrificing some old ones.
Conclusion
The objective of our FYP is to provide a solution to the problem that the
commercial wireless communication industry is currently facing due to the different
link-layer evolution steps that each wireless standard has undergone. The existence, in
many countries, of incompatible wireless network technologies, ranging from
2G to 4G, has imposed many difficulties in the deployment of global roaming facilities
and problems in rolling out new features or services due to the presence of widespread
legacy subscriber handsets [26]. Our project concept promises to solve such problems by
implementing the radio functionality as software modules (using LABVIEW) running on
a generic hardware platform, the PCI5640 Labview8.0 board.
We started with a general overview of the design by giving a literature survey of the
related subjects such as FIR, SDR, EVM, etc. We have seen some proposed
common reconfigurable architectures and, more importantly, introduced our proposed
reconfigurable block diagram for WIMAX and WCDMA in a noisy environment.
We have simulated the WCDMA and WIMAX chains using LABVIEW to
have a theoretical reference for our implementation. Moreover, the FIR design was
established and, most importantly, a reconfigurable FIR module was implemented on the
PCI5640 Labview8.0 board. The results were quite pleasing, especially when compared
with the theoretical outcomes of the LABVIEW simulation. There was only a small
difference between the EVM degradation of the simulation and that of the implementation, which
supports the validity of both our design and implementation.
We believe that this project is a very important step towards achieving a fully
reconfigurable transceiver. Therefore, we suggest that future FYP students continue
where we stopped and try to reconfigure the other blocks in the whole channel
transceiver.
APPENDIX
I- Digital Filter Coefficients Design
Filter coefficients can be designed using an automated tool, a software application that
generates the taps based on various user-defined parameters. One example of such tools is
the “Filter Solutions 10.0” software (“FS10.0”).
The following algorithm explains the tap design of an FIR filter. The frequency response
of an FIR filter is periodic in the frequency domain, with a period equal to the sampling frequency.
Since it is periodic, it can be represented by a Fourier series as shown below:
\[ H(e^{j\Omega}) = \sum_{k=-\infty}^{\infty} h(k)\, e^{-jk\Omega} \]
where h(k) are the impulse response coefficients that describe the digital FIR filter [16].
These coefficients can be determined from the frequency response using the following
equation:
\[ h(n) = \frac{1}{2\pi} \int_{\Omega_o - \pi}^{\Omega_o + \pi} H(e^{j\Omega})\, e^{jn\Omega}\, d\Omega \]
Notice that only a finite number of the coefficients given by the above formula can be
implemented. The number of coefficients (N) should be chosen according to time delay and
implementation cost. The indices above range between –M and M and accordingly, we are
assuming the number of coefficients is equal to N = 2M+1. By making this selection, we are
effectively setting all other coefficients to zero [15].
The frequency response can be determined using the following formula:
\[ H(e^{j\Omega}) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-jn\Omega} \]
Plotting the desired frequency response and the one based on the designed coefficients
allows us to check if the design is acceptable. Thus, the user can adjust several
parameters (allowed ripple, transition band, etc.) and accordingly increase or decrease the
number of taps that implement the filter [15].
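The Fourier series design steps described above can be sketched in Python. The example assumes an ideal low-pass desired response with cutoff Ωc, for which the coefficient integral evaluates to h(n) = sin(Ωc n)/(π n) with h(0) = Ωc/π. The function names `lowpass_taps` and `frequency_response` and the chosen cutoff are illustrative assumptions, not output of the FS10.0 tool.

```python
import cmath
import math


def lowpass_taps(cutoff: float, m: int) -> list[float]:
    """Coefficients h(n), n = -m..m, of an ideal low-pass response with the
    given cutoff (radians/sample), truncated to N = 2m + 1 taps."""
    taps = []
    for n in range(-m, m + 1):
        if n == 0:
            taps.append(cutoff / math.pi)
        else:
            taps.append(math.sin(cutoff * n) / (math.pi * n))
    return taps


def frequency_response(taps: list[float], m: int, omega: float) -> complex:
    """H(e^{j omega}) = sum over n of h(n) * exp(-j n omega), n = -m..m."""
    return sum(h * cmath.exp(-1j * n * omega)
               for n, h in zip(range(-m, m + 1), taps))


M = 20
h = lowpass_taps(math.pi / 2, M)          # N = 41 coefficients
print(len(h))                             # prints 41
print(abs(frequency_response(h, M, 0.0)))      # close to 1 (passband)
print(abs(frequency_response(h, M, math.pi)))  # close to 0 (stopband)
```

Evaluating `frequency_response` over a grid of frequencies gives the curve that would be compared against the desired response when deciding whether to change the number of taps.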
II-Area Considerations for Variable FIR Design
As mentioned in section 2.3.2, there exists a never-ending demand for decreasing the
amount of hardware used in a system. This leads to substantial benefits like reduced cost
and power consumption, increased application functionality1, and thus increased
utilization of FPGA resources …
In most FIR implementations, hardware consumption is mainly due to the multiplier
blocks rather than the adder modules [23]. Different algorithms have been proposed for
efficient implementation of multiplier blocks. Previously, different algorithms were
proposed for minimizing adder hardware cost since, from a VLSI point of view, it was
assumed that the adder cost dominates the area requirement. However, after the
introduction of the FPGA as the hardware platform, minimizing the adders’
complexity no longer works, since the “FPGA has a fixed architecture for
implementing digital logic”. Instead, it is the architecture design that minimizes such cost
[23].
Different commonly used approaches and architectures that increase resource
utilization are considered below:
Consider first the standard FIR implementation shown in figure 2.4. The figure shows a
full parallel, fixed coefficient FIR filter. For each tap, the filter requires one multiplier,
one adder and one delay element. Thus the resource usage is proportional to the number
of coefficients [14, 24, 25].
Other enhanced architectures and techniques with higher complexity include array
multiplication, multipliers using add and shift operations, the transposed FIR, the transposed FIR
architecture with multiplier block, the MAG (Minimized Adder Graph) algorithm for
multiplier design, and the architecture based on computational sharing multipliers (CSHM).
These techniques are explained in more detail below:
1 by using the extra available area
Array Multiplication: this is one of the most commonly used techniques when fast
multiplication is needed. It is mainly used to implement MAC (Multiply and Accumulate)
operations. In array multiplication, rows of adders are placed in parallel. Multiplexers
then decide whether to add partial products or not, based on the corresponding bit of the
multiplicand [24]. A pipeline structure can be implemented by inserting flip-flops or
registers between the different rows of adders or stages. The drawback of array
multiplication is that it needs a large number of logic blocks, even for a small number of
multipliers.
Multipliers Using Add and Shift Operations: this technique is also called distributed
arithmetic. It differs from the previously mentioned technique in the order in which it
performs the steps in a MAC operation. Consider the FIR example: one typical operation
in the FIR is the multiplication of a_i by b_j, that is, the multiplication of the ith tap (a_i) by the jth
input (b_j). By breaking a_i into its n bits a_{i0} … a_{i,n-1}, and writing s(k) for a shift by k bit
positions, a_i b_j can be represented as follows:
\[ a_i b_j = (a_{i0} b_j) + (a_{i1} b_j)\,s(1) + \dots + (a_{i,n-1} b_j)\,s(n-1) \]
In distributed arithmetic, however, a_i b_j is modified to:
\[ a_i b_j = (a_{i0} b_j) + \Big( (a_{i1} b_j) + \big( (a_{i2} b_j) + \dots + (a_{i,n-1} b_j)\,s(1) \dots \big)\,s(1) \Big)\,s(1) \]
In other words, each step only shifts the running sum by one position and then adds the
next partial product (i.e. shift and then add). This helps reduce FPGA resources [24].
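The nested shift-and-add form can be illustrated with a short Python sketch. The function name `shift_add_multiply` is hypothetical, and the coefficient is assumed to be an unsigned n-bit integer; the code only models the arithmetic, not the FPGA resource mapping.

```python
def shift_add_multiply(a: int, b: int, n_bits: int) -> int:
    """Multiply b by the unsigned n_bits-wide coefficient a using only
    one-position shifts and additions (nested form of the sum above)."""
    result = 0
    # Start from the most significant bit of a; at each step shift the
    # running sum left by one and add b if the current bit of a is set.
    for k in reversed(range(n_bits)):
        result <<= 1
        if (a >> k) & 1:
            result += b
    return result


print(shift_add_multiply(0b00110111, 13, 8))  # prints 715, i.e. 55 * 13
```

No general multiplier is used: every partial product is either 0 or b, selected by one bit of the coefficient.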
Transposed FIR filter: another commonly used architecture is the transposed FIR
architecture, shown in figure A.1. This architecture is mathematically
identical to the standard FIR implementation. However, it pipelines more efficiently
than the standard one because of its reduced latency; all taps receive the input sample
simultaneously, and thus identical tap coefficient magnitudes can share multiplication
resources [27].
Figure A.1: Transposed FIR
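The claim that the transposed structure is mathematically identical to the standard one can be checked with a small Python model. Both function names are hypothetical, and the model describes only the arithmetic of the two structures, not their timing or resource usage.

```python
def direct_fir(x: list[int], h: list[int]) -> list[int]:
    """Standard form: y[n] = sum over k of h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


def transposed_fir(x: list[int], h: list[int]) -> list[int]:
    """Transposed form: every tap multiplies the same current input sample,
    and the delay registers sit between the adders."""
    regs = [0] * (len(h) - 1)
    y = []
    for sample in x:
        products = [c * sample for c in h]  # all taps see the same input
        y.append(products[0] + (regs[0] if regs else 0))
        # Each register takes its tap product plus the old value of the
        # next register; ascending order keeps regs[i + 1] un-updated.
        for i in range(len(regs)):
            nxt = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = products[i + 1] + nxt
    return y


x = [1, 2, 3, 4, 5]
h = [1, -2, 3]
print(direct_fir(x, h))      # prints [1, 0, 2, 4, 6]
print(transposed_fir(x, h))  # prints [1, 0, 2, 4, 6]
```

Because `products` is computed once per input sample, taps with equal coefficient magnitudes could reuse the same product, which is the sharing opportunity mentioned above.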
Transposed FIR architecture with multiplier block: this is an enhanced version of the
transposed FIR filter. The architecture is shown in figure A.2. This architecture
introduces a multiplier block that is based on cascaded additions, subtractions or shifts.
The complexity of the multiplication process is hidden inside the block and is
independent of the other operations. The multiplier block, thus, determines the
efficiency of the filter implementation [23].
Figure A.2: Transposed FIR with multiplier block
MAG (Minimized Adder Graph) algorithm for multiplier design: the algorithm was
proposed by Dempster and Macleod. It generates minimum adder graphs that minimize
the number of adders used for implementing integer multiplication. These adder
reductions reduce hardware cost. In brief, the algorithm first finds different graphs that
can perform the required multiplication. It then chooses between the different graphs
according to the minimum number of required single-bit full adders [23].
Architecture based on Computational Sharing Multipliers (CSHM): the architecture
takes advantage of the computational reuse of different partial vector products in order to
enhance resource utilization. It aims to reduce redundant computations in the
convolution process. The main idea is to decompose the sequence of bits that represents
each coefficient into a smaller set of sequences called alphabets. For example, if
c0 = 00110111, then c0·x can be rewritten as 2^4·(0011)·x + (0111)·x. Thus the coefficient is
composed of two alphabets, 0011 and 0111. Note that an alphabet space should span all
the available coefficients. So if a coefficient contains an alphabet that also appears in another
coefficient, it can reuse the previously computed multiplication result. Moreover, the
entire multiplication process is reduced to a set of add and shift operations [12,27]. The
approach can also be applied to FIR filters with programmable coefficients.
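The alphabet decomposition above can be sketched in Python for 8-bit coefficients split into two 4-bit alphabets. The function name `cshm_multiply` and the dictionary used as a cache are illustrative assumptions. For brevity the alphabet products are computed here with an ordinary multiplication, whereas in hardware they would themselves be built from shifts and adds.

```python
def cshm_multiply(x: int, coeffs: list[int], alphabet_bits: int = 4):
    """Multiply x by each 8-bit coefficient, reusing alphabet products.

    Returns the list of products and the cache of alphabet products.
    """
    cache = {}  # alphabet value -> alphabet * x, computed only once
    mask = (1 << alphabet_bits) - 1

    def alphabet_product(alphabet: int) -> int:
        if alphabet not in cache:
            cache[alphabet] = alphabet * x  # simplification, see lead-in
        return cache[alphabet]

    results = []
    for c in coeffs:
        low = c & mask                      # e.g. 0111 for 00110111
        high = (c >> alphabet_bits) & mask  # e.g. 0011 for 00110111
        # c * x = 2^alphabet_bits * (high * x) + (low * x)
        results.append((alphabet_product(high) << alphabet_bits)
                       + alphabet_product(low))
    return results, cache


products, cache = cshm_multiply(5, [0b00110111, 0b01110011])
print(products)       # prints [275, 575]
print(sorted(cache))  # prints [3, 7]
```

Both coefficients in the usage example are built from the alphabets 0011 and 0111, so only two alphabet products are computed for the two multiplications.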
III-LABVIEW 8.0 System Board: PCI 5640
The PCI 5640 device, a LABVIEW 8.0 RIO system board, is mainly based on a
reconfigurable FPGA and some fixed I/O resources, i.e. an IF transceiver. Unlike
traditional IF digitizers where the functionality of the system is completely
predetermined, the FPGA allows the user to configure the behavior of various modules to
meet the system requirements. The FPGA is built around a reconfigurable architecture
where the user can define I/O resources or create new ones. Figure A.3 shows a high level
diagram of the reconfigurable architecture [28].
Figure A.3: High level FPGA_ I/O architecture
The I/O resources can be outputs of the ADC and DAC, digital input lines,
digital output lines, and so on. Software modules access the device through the bus interface,
while the FPGA provides the logic needed for the connectivity (timing, triggering, processing,
custom I/O) between the bus interface and the I/O resources. Figure A.4 illustrates the FPGA
logic for the IF transceiver [28].
Figure A.5 shows a high level diagram of the PCI 5640 device. Note that the DC power
control and the memory modules are hidden to simplify the diagram [28].
Figure A.4: FPGA logic for the IF transceiver
Figure A.5: High level Diagram of the PCI 5640
The PCI 5640 device has two analog input (AI) ports. The input signal is passed
through a low pass filter, converted to a differential signal and then passed to the ADC
(an AD6654 component from Analog Devices). The signal is then downconverted and passed
to the FPGA.
The device also includes two analog output (AO) ports. The pulse shaping filter maps
bits to signals that are passed to a compensation filter, and then to an interpolation filter.
The device then performs upconversion in the digital domain and finally passes the signal
to the DAC (an AD9857 component from Analog Devices).
The RTSI (real time system integration) bus allows multiple RIO PCI-5640 devices to
share the same trigger and event synchronization signals.
The PCI bus provides the bus interface for the PCI 5640 device, with bus mastering
capabilities. It allows data to be transferred efficiently between the host PC and the
5640 device.
The PCI 5640 device also includes onboard memory of 2 MB (SRAM) in addition to
the RAM available in the Virtex-II Pro (XC2VP30) FPGA. Section IV in the Appendix
gives more details about the capabilities of the FPGA.
The advantage of such a board in our project is that it uses relatively easy software,
LABVIEW, and thus hides the complexity of the common HDL languages, VHDL and
Verilog, that are commonly used to design hardware components. Moreover, it also
supports designs that are created using HDL. So, modules created using VHDL or some
other HDL language can be imported into LABVIEW as custom VIs.
IV- Virtex II Pro FPGA Capabilities
The LABVIEW 8.0 PCI 5640 System board contains a Virtex II Pro device that is
connected to all resources on the device (ADC, DAC, clk distribution circuit (CDC),
external trigger…). The Virtex II Pro is a platform FPGA based on IP cores and
customized modules [29]. The device present in our system board is the XC2VP30 FPGA.
This kind of device incorporates many resources and features some of which are:
- RocketIO transceiver blocks: full duplex serial transceivers whose baud rates range
from 600 Mbits/sec to 3.125 Gbits/sec. They are flexible serial-to-parallel and parallel-to-serial
embedded transceiver cores used to interconnect buses, backplanes or other subsystems
with high bandwidth [29]. Our device supports up to 8 RocketIO transceiver blocks.
- PowerPC Processor blocks: an embedded processor block running at 300 MHz or more, with a
Harvard architecture. It can execute instructions at a sustained rate of 1 instruction per cycle. Our
device can support up to 2 PowerPC processors [29].
- 30816 logic cells, where a logic cell is defined as
Logic cell = (1) 4-input LUT + (1) FF + Carry logic
- 18x18 multiplier block: an 18 bit x 18 bit multiplier block. The block is a two’s
complement signed multiplier and is characterized by a very efficient structure. The
device can hold up to 136 multiplier blocks.
- SelectRAM+ block: this block contains memory resources of 18 Kb of True Dual Port
RAM. It can be cascaded to implement large memory blocks. Our device supports 136
18 Kb blocks, with a maximum block RAM of 2448 Kb.
- Max User I/O pads of 644
- DCM: digital clock manager; provides self-calibrating, fully digital solutions for clock
distribution delay compensation, clock multiplication and division, and fine and coarse
clock phase shifting. Our device can support up to 8 DCMs.
V- Fixed Point Notation
In computer arithmetic, a number can be represented by a pair of
integers (n, e), the mantissa and the exponent. The pair represents the value n·2^-e. If ‘e’
is a variable quantity, then the pair (n, e) represents a floating point number. On the
other hand, if e is known in advance, at compile time, then the pair is said to be a fixed
point number.
The following operations are needed in fixed point notation:
• Conversion: converting a number to fixed point notation simply means dividing this number by 2^-e,
where ‘e’ is a fixed parameter. In our design, the mantissa is represented
as a 16-bit number stored in registers, without the known exponent.
• Addition / Subtraction: addition or subtraction of the mantissas without changing the exponent:
\[ n\,2^{-e} \pm m\,2^{-e} = (n \pm m)\,2^{-e} \]
• Multiplication / Division: multiplication of the two mantissas, then shifting the
answer to the right ‘e’ times:
\[ n\,2^{-e} \times m\,2^{-e} = mn\,2^{-2e} = (mn\,2^{-e})\,2^{-e} = (mn \gg e)\,2^{-e} \]
The above argument was given for an exponent value of 2. The argument, however, can
be generalized to any exponent; e = 10, for example, is a valid choice.
Note that the more bits are used to represent the mantissa and the exponent, the better
the resolution of the output is. The designer, however, wants to represent the
coefficients with the least possible number of bits that gives good accuracy for the output;
the aim of the designer is to make the best possible use of the resources.
Unfortunately, the fewer bits are chosen, the larger the magnitude of
the quantization error becomes.
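The fixed point operations listed above can be sketched in Python as follows. The exponent e = 10 is taken from the example in the text; the function names are hypothetical, and the 16-bit register width used in our design is not enforced in this sketch.

```python
E = 10  # fixed exponent, known in advance


def to_fixed(value: float, e: int = E) -> int:
    """Mantissa n such that value is approximately n * 2**-e."""
    return round(value * (1 << e))  # same as dividing by 2**-e


def to_float(n: int, e: int = E) -> float:
    """Value represented by the mantissa n with exponent e."""
    return n / (1 << e)


def fx_add(n: int, m: int) -> int:
    """Addition: add the mantissas, the exponent is unchanged."""
    return n + m


def fx_mul(n: int, m: int, e: int = E) -> int:
    """Multiplication: multiply the mantissas, then shift right e times."""
    return (n * m) >> e


a = to_fixed(0.5)    # 512
b = to_fixed(0.25)   # 256
print(to_float(fx_add(a, b)))  # prints 0.75
print(to_float(fx_mul(a, b)))  # prints 0.125
```

The right shift in `fx_mul` discards the low-order bits of the product, which is one source of the quantization error discussed above.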
Bibliography
1. Buracchini, E. “The software radio concept”. IEEE, Communications Magazine.
Volume 38, Issue 9, Page(s):138 – 143. Sept, 2000.
2. H. Córdova, P. Boets, L. Van Biesen. “Insight Analysis into WI-MAX Standard
and its Trends”. Vrije Universiteit Brussel, Belgium.
3. PhoneScoop Co. “WIMAX 802.16”. Retrieved in 2005 from
http://www.phonescoop.com/glossary/term.php?gid=187
4. Amine Sobh, Mohammad Boulmalf, Shakil Akhtar, “Physical Layer Performance of
802.11g WLAN” Applied Telecommunication Symposium, UAE University
5. Hassan Yaghobi. “WIMAX2 :802.16, Broadband Wireless Access: the next big thing
in Wireless”. Intel. Sept 16, 2003
6. “Interpolation” Retrieved in 2005 from
http://www.dspguru.com/info/faqs/multrate/interp.htm
7. Phil Schniter. “Upsampling”. Connexions SM. Retrieved on Oct, 2005 from
http://cnx.rice.edu/content/m10403/latest/
8. Wipro Technologies. “Software-Defined Radio, White paper. A technology
Overview”. Available at: http://www.wipro.com/dsp. Aug, 2002.
9. G. Girau, M. Martina, A. Molino, A. Terreno, and F. Vacca. “FPGA Digital Down
Converter IP for SDR Terminals”. IEEE, Signals, Systems and Computers. Vol. 2,
pages: 1010-1014. November 2002.
10. Enrico Buracchini. “The Software Radio Concept”. IEEE, Communication Magazine.
Vol. 38, pages 138-43. September 2000
11. Apostolos A. Kountouris, Christophe Moy, Luc Rambaud, Pascal Le Corre. “A
Reconfigurable Radio Case Study: A Software based Multi-Standard Transceiver for
UMTS, GSM, EDGE, and Bluetooth”. IEEE, Vehicular Technology Conference. Vol. 2,
pages: 1196 – 1200. October 2001.
12. Jerry C.-Y. Kao, C.-F Su and Allen C.-H Wu. “High Performance FIR Generation
Based on a Timing Driven Architecture and Component Selection Method”. IEEE,
Circuits and Systems. Vol. 4, pages:759-762. May 2005
13. Jun Seo Lee, Jong Hyun Park, Sang Woo Kim, Ying Shan Lee and Heung Gyoon Ryu.
“Implementation of DSP-Based Digital Receiver for the SDR Application”. IEEE,
Communications Society. Vol. 1, pages: 6-10. Aug-Sept 2004.
14. Litwin, Louis. “FIR and IIR digital Filter, the effects of finite bit precision”. IEEE
Potentials. Vol. 19, Issue 4, pages: 28-31. Oct-Nov 2000.
15. Mitra, Sanjit Kumar. “Digital Signal Processing: a computer-based approach”. 2nd
edition, 2001. McGraw-Hill series in electrical and computer engineering, Boston
16. Thede, Les. “Practical Analog and Digital Filter Design”. Artech House Inc. 2004
17. Hoffman, M.W, Stewart, R. W. “Digital Signal Processing From A to Z”. Blue Box
Multimedia Inc. 1998
18. Carson K. S. Pun, S.C. Chan, K.S. Yeung, and K.L. Ho. “On the Design and
Implementation of FIR and IIR Filters With Variable Frequency Characteristics”.
IEEE, Trans. On Circuits and Systems: Analog And Digital Signal Processing. Vol. 2,
pages 185-188. May 2002.
19. Khalid H. Abed, Vivek Venugopal, Shailesh B. Nerurkar. “High Speed Digital Filter
Design Using Minimal Signed Digit Representation”. IEEE, South East Conference.
Pages 105-110. April 2005
20. Tian Bo Deng. “Variable 2-D FIR Digital Filter Design and Parallel Implementation”.
IEEE, Trans on Circuits and Systems: Analog and Digital Signal Processing. Vol. 46,
pages: 631-635. May 1999.
21. Rachid Zarour, Moustafa M. Fahmy. “A Design Technique for Variable Digital
Filters”. IEEE, Trans. On Circuits and Systems. Vol. 36, pages 1473-1478. November
1989
22. Cain, G.D, Hermanowicz, E, Rojewski, M, Tarczynski, A. “WLS design of variable
frequency response FIR filters”. Circuits and Systems, Proceedings of 1997 IEEE
International Symposium. Volume 4, Page(s):2244 – 2247. June 1997
23. Macpherson, K.N.; Stewart, R.W. “Low FPGA area multiplier blocks for full parallel
FIR filters”, Proceedings of the IEEE International Conference on Field-
Programmable Technology. Page(s):247 – 254. 2004.
24. Mark Cummings, Shinichiro Haruyama. “FPGA in the Software Radio”. IEEE,
Communication Magazine. Vol. 37, pages: 108-112. February 1999
25. David A. Parker, Keshab K. Parhi. “Area-Efficient Parallel FIR digital
Implementation”. IEEE, Application Specific Systems, Architectures and Processors
(ASAP). Pages: 93-111. August 1996.
26. Xiaopeng Li, “Architectures and specs help analysis of multi-standard receivers”.
Available at http://www.ece.osu.edu/vlsi/architecture_multi_standard_receivers.htm.
March 12, 2003
27. Jongsun Park, Woopyo Jeong, Hunso Choo, Hamid Mahmoodi-Meimand, Kaushik
Roy. “High Performance and Low Power FIR Filter Design Based on Sharing
Multiplication”. ACM, Low Power Electronics and Design, USA. Aug. 2002.
28. National Instruments. “NI5640R User manual”. Retrieved in 2005 from www.ni.com
29. Xilinx. “Virtex II Pro and Virtex II Pro X Platform FPGA: Complete Data Sheet”
Retrieved on October 10, 2005 from www.xilinx.com