Outline
1. Overview
  1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation
  1.2 Parallel Programming
2. Types of Parallel Computers
  2.1 Shared Memory Multiprocessor System
  2.2 Message Passing Multiprocessor with Local Memory
3. Parallel Programming Scenarios
  3.1 Ideal Parallel Computations
  3.2 Partitioning and Divide-and-Conquer Strategies
  3.3 Pipelined Computation
  3.4 Synchronous Computation
  3.5 Load Balancing
  3.6 Multiprocessor with Shared Memory
4. Progress of the Project
1. Overview

1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation
• In mobile communication, advanced signal processing techniques such as smart antennas and multiuser detection (MUD) can improve system performance, but evaluating them requires signal- or system-level simulation.
• Simulation is an important tool for gaining insight into the problem; however, simulating these signal processing algorithms is often very time-consuming.
• It is therefore necessary to speed up the simulation, and parallel programming is one of the best techniques for doing so.
1.2 Parallel Programming
Parallel programming can speed up the execution of a program by dividing it into multiple fragments that can be executed simultaneously, each on its own processor.
Parallel programming involves:
♦ Decomposing an algorithm or data into parts
♦ Distributing the sub-tasks to be processed by multiple processors simultaneously
♦ Coordinating work and communication between those processors
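As a minimal sketch of these three steps, assuming an MPI message passing environment (the problem size N and the work() sub-task are hypothetical placeholders):

/* Minimal MPI skeleton: decompose an array into parts, distribute
 * the parts, and coordinate by gathering the partial results.
 * Build with: mpicc skeleton.c -o skeleton */
#include <mpi.h>
#include <stdlib.h>

#define N 1024                      /* total problem size (hypothetical) */

static double work(double x) { return x * x; }   /* placeholder sub-task */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;           /* decompose: assumes size divides N */
    double *all  = malloc(N * sizeof(double));
    double *part = malloc(chunk * sizeof(double));
    if (rank == 0)
        for (int i = 0; i < N; i++) all[i] = (double)i;

    /* distribute one part to each processor */
    MPI_Scatter(all, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < chunk; i++) /* each processor computes its part */
        part[i] = work(part[i]);

    /* coordinate: collect the partial results back on processor 0 */
    MPI_Gather(part, chunk, MPI_DOUBLE, all, chunk, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    free(part); free(all);
    MPI_Finalize();
    return 0;
}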
1.2 Parallel Programming (cont.)
Requirements for Parallel Programming
♦ A parallel architecture:
• Multiple processors
• A network connecting them
♦ An environment to create and manage parallel processing
♦ A parallel algorithm and a parallel program
2. Types of Parallel Computers

2.1 Shared Memory Multiprocessor System
♦ Multiple processors operate independently but share the same memory resources.
♦ Only one processor can access a given shared memory location at a time.
♦ Synchronisation is achieved by controlling the reads from and writes to the shared memory.
[Figure: several CPUs connected to one shared memory.]
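A minimal sketch of this kind of synchronisation, assuming POSIX threads on a shared memory machine (the shared counter is a hypothetical stand-in for any shared memory location):

/* One shared memory location protected by a mutex: only one
 * thread (processor) can read-modify-write it at a time.
 * Build with: cc -pthread mutex.c -o mutex */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long shared_sum = 0;                        /* lives in shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* gain exclusive access */
        shared_sum++;                  /* the protected read-modify-write */
        pthread_mutex_unlock(&lock);   /* let another processor in */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %ld\n", shared_sum); /* always NTHREADS * 100000 */
    return 0;
}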
2.1 Shared Memory Multiprocessor System (cont.)
♦ Advantages
• Easy for the user to use efficiently.
• Data sharing among tasks is fast (at memory access speed).
♦ Disadvantages
• The size of memory may be a limiting factor: increasing the number of processors without increasing the memory size can cause severe bottlenecks.
• The user is responsible for establishing synchronisation (as in the mutex sketch above).
2.2 Message Passing Multiprocessor with Local Memory
♦ Multiple processors operate independently, but each has its own local memory.
♦ Data are shared across a communication network using message passing.
♦ The user is responsible for synchronisation using message passing.
[Figure: several CPUs, each with its own local memory, connected by a network.]
2.2 Message Passing Multiprocessor with Local Memory (cont.)
♦ Advantages
• Memory is scalable with the number of processors: adding processors, each with its own memory, increases the total memory size, unlike in the shared memory multiprocessor system.
• Each processor can rapidly access its own memory without interference.
♦ Disadvantages
• Existing data structures based on a global memory can be difficult to map onto this organisation.
• The user is responsible for sending and receiving data among processors.
• To minimise overhead and latency, data should be accumulated into large blocks before being sent to the nodes that need them, as in the sketch below.
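A sketch of this blocking strategy, assuming MPI (the block size is illustrative): one large message pays the per-message latency once, where many small messages would pay it repeatedly.

/* Send one large block instead of many small messages, to
 * amortise per-message overhead and latency.
 * Run with at least two processes, e.g. mpirun -np 2 ... */
#include <mpi.h>

#define BLOCK 4096                    /* samples per message (illustrative) */

int main(int argc, char **argv)
{
    int rank;
    double buf[BLOCK];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < BLOCK; i++) buf[i] = i;  /* stack up a block */
        MPI_Send(buf, BLOCK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}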
3. Parallel Programming Scenarios

3.1 Ideal Parallel Computations
• The computation can be readily divided into completely independent parts that can be executed simultaneously.
• Example:
In the simulation of uplink WCDMA (single user), the signal processing at the transmitter and the receiver is divided into smaller parts, each executed by a separate processor.
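A minimal sketch of an ideal decomposition, assuming MPI. Here the independent parts are illustrated as independent simulation trials rather than the transmitter/receiver split, and simulate_trial() is a hypothetical stand-in for one self-contained run:

/* Ideal parallelism: each processor runs completely independent
 * trials; the only communication is one final reduction. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one independent link-level trial;
 * returns 1 if a (simulated) bit error occurred. */
static int simulate_trial(unsigned int *seed)
{
    return (rand_r(seed) % 100) < 5;   /* placeholder 5% error rate */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    unsigned int seed = 1234u + rank;  /* independent stream per processor */
    const long trials = 100000;
    long errors = 0, total = 0;

    for (long i = 0; i < trials; i++)  /* no inter-processor dependence */
        errors += simulate_trial(&seed);

    /* the single point of coordination */
    MPI_Reduce(&errors, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("BER estimate: %g\n", (double)total / (trials * size));
    MPI_Finalize();
    return 0;
}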
3.1 Ideal Parallel Computations (cont.)
Example: the simulation of wireless communication with ideal parallel computation
[Figure: the WCDMA link mapped onto ten processors. Transmitter: CPU 1, source data generation (traffic/packet); CPU 2, channel coding and rate matching; CPU 3, modulation; CPU 4, spreading and scrambling; CPU 5, pulse shaping filtering. Radio channel: CPU 6, reconstruction of the composite signal (signal, channel, AWGN). Receiver: CPU 7, matched filtering; CPU 8, Rake combining; CPU 9, demodulation; CPU 10, channel decoding.]
3.2 Partitioning and Divide-and-Conquer Strategies
• Partitioning: the problem is simply divided into separate parts, and each part is computed separately.
• Divide-and-conquer: the task is divided repeatedly into smaller and smaller subtasks; the smallest parts are solved and their results combined.
• Example:
In the simulation of the Rake combining technique in WCDMA, the problem can first be divided among the different fingers. Within each finger, the problem can be further divided into correlating, delay equalising, and MRC/EGC combining.
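A minimal serial sketch of the divide-and-conquer pattern over Rake fingers (process_finger() and combine() are hypothetical placeholders; in a parallel version each half of the recursion would be handed to a different processor):

/* Divide-and-conquer over Rake fingers: split the finger set in
 * half until a single finger remains, solve it, then combine the
 * results on the way back up. */
#include <stdio.h>

#define NFINGERS 8

/* Hypothetical per-finger work: correlate, delay-equalise, weight. */
static double process_finger(int finger)
{
    return 1.0 / (finger + 1);        /* placeholder finger output */
}

/* Hypothetical MRC/EGC-style combining of two partial results. */
static double combine(double a, double b)
{
    return a + b;
}

static double rake(int lo, int hi)
{
    if (lo == hi)
        return process_finger(lo);    /* smallest subtask: one finger */
    int mid = (lo + hi) / 2;
    double left  = rake(lo, mid);     /* divide... */
    double right = rake(mid + 1, hi);
    return combine(left, right);      /* ...and combine the results */
}

int main(void)
{
    printf("combined output: %g\n", rake(0, NFINGERS - 1));
    return 0;
}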
3.2 Partitioning and Divide-and-Conquer Strategies (cont.)
Example: the simulation of wireless communication with a divide-and-conquer strategy
[Figure: Rake combining divided among fingers 1 to K; within each finger, CPU 1 correlating, CPU 2 weighting by the channel estimate, CPU 3 MRC/EGC combining.]
3.3 Pipelined Computation
• The problem is divided into a series of tasks that have to be completed one after the other.
• Each task is executed by a separate processor.
• Partially sequential in nature.
• Example:
In the simulation of the WCDMA transmitter and receiver, each signal processing block needs the output of the previous block as its input. In this case, the pipelining technique is adopted to parallelise the sequential source code.
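A sketch of the pipelining pattern, assuming MPI: processor r receives each block from processor r-1, applies its stage of the chain, and forwards the block to processor r+1 (the stage() function is a hypothetical placeholder).

/* Pipeline: each rank is one stage of the processing chain.  With
 * many blocks in flight, all stages compute concurrently even
 * though each block still passes through the stages in order. */
#include <mpi.h>

#define BLOCK   256
#define NBLOCKS 100

static void stage(int rank, double *b, int n)    /* hypothetical stage */
{
    for (int i = 0; i < n; i++) b[i] += rank;    /* placeholder work */
}

int main(int argc, char **argv)
{
    int rank, size;
    double buf[BLOCK] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int k = 0; k < NBLOCKS; k++) {
        if (rank > 0)                /* receive from the previous stage */
            MPI_Recv(buf, BLOCK, MPI_DOUBLE, rank - 1, k,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        stage(rank, buf, BLOCK);     /* this processor's part of the chain */
        if (rank < size - 1)         /* forward to the next stage */
            MPI_Send(buf, BLOCK, MPI_DOUBLE, rank + 1, k, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}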
3.3 Pipelined Computation (cont.)
Example: the simulation of wireless communication with pipelined computation
[Figure: the same ten-processor transmitter/radio channel/receiver chain as in Section 3.1, here operated as a pipeline.]
3.4 Synchronous Computation
• Processors need to exchange data among themselves.
• All processes start at the same time, in a lock-step manner.
• Each process must wait until all processes have reached a particular reference point (barrier) in their computation.
• Example: WCDMA system
Smart antenna (SA): the signal processing in each antenna-element branch must be finished before the branches are combined.
Rake combining: the signal processing in each finger must be finished before the fingers are combined.
Multiuser detection (MUD): since MUD for each user's signal needs the other users' signals, the processing of all users' signals must be finished before MUD.
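A minimal sketch of barrier synchronisation, assuming MPI (finger_work() is a hypothetical placeholder for the per-finger or per-branch processing):

/* Barrier synchronisation: every processor must finish its finger
 * (or antenna-branch) processing before any of them may combine. */
#include <mpi.h>
#include <stdio.h>

static double finger_work(int rank)   /* hypothetical per-finger work */
{
    return rank + 0.5;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double partial = finger_work(rank);

    /* no process passes this point until all have reached it */
    MPI_Barrier(MPI_COMM_WORLD);

    /* combining is now safe (the reduction itself also synchronises;
     * the explicit barrier just makes the reference point visible) */
    double combined = 0.0;
    MPI_Reduce(&partial, &combined, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) printf("combined = %g\n", combined);
    MPI_Finalize();
    return 0;
}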
3.4 Synchronous Computation (cont.)
Example: the simulation of wireless communication with synchronous computation
[Figure: for each of users 1 to N, received signal reconstruction (AWGN) and matched filtering feed finger CPUs 1 to K (correlating, then weighting with the channel estimate); the fingers synchronise before Rake combining, and beamforming CPUs combine the antenna branches; the beamforming/combining outputs of all N users synchronise again before multiuser detection (MUD).]
3.4 Synchronous Computation (cont.)
Example: the simulation of wireless communication with synchronous computation
[Figure: multiuser detection. One CPU per user takes the output of that user's beamforming/combining together with that user's signature waveform; all of them must finish before detection proceeds.]
3.5 Load Balancing
• The aim is to distribute the computation load fairly across processors in order to obtain the highest possible execution speed.
• Example: WCDMA system
Smart antenna (SA): the speed of direction-of-arrival (DOA) variation can differ between user signals, so the beamforming processors for different users may face different numbers of operations.
Rake combining: the number of multipath signals can differ between users.
In both cases, the load can be balanced fairly across processors by detecting whether the solution has been reached on each processor.
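A sketch of one common way to realise this, assuming MPI: a master/worker scheme (an illustrative choice, not stated in the slides) where processors that finish early simply request more work, so users with more multipaths or faster-varying channels do not stall the rest.

/* Dynamic load balancing: a master hands out per-user tasks one at
 * a time; fast workers come back for more, so uneven task sizes
 * balance out.  Run with at least two processes. */
#include <mpi.h>

#define NTASKS   64
#define TAG_WORK 1
#define TAG_DONE 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master */
        int next = 0, active = size - 1, task = 0, req;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st); /* a worker asks for work */
            if (next < NTASKS) {
                task = next++;
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            } else {                       /* nothing left: dismiss worker */
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                active--;
            }
        }
    } else {                               /* worker */
        int task, req = 0;
        MPI_Status st;
        for (;;) {
            MPI_Send(&req, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            /* ...process user `task`: Rake fingers, beamforming, ... */
        }
    }
    MPI_Finalize();
    return 0;
}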
3.5 Load Balancing (cont.)
Example: the simulation of wireless communication with load balancing
[Figure: computation times per processor. Rake combining: CPUs 1 to N, one per user; CPU 2 takes longer because user 2 has more multipath signals than the other users. Beamforming: CPUs N+1 to 2N; CPU N+2 takes longer because user 2's channel parameters vary faster than the other users'.]
3.6 Multiprocessor with Shared Memory
• A multiprocessor with shared memory can speed up the simulation by storing the executable code and data in shared memory, where every processor can access them.
• Example
In the simulation of WCDMA with multiple users, each part of the signal processing model may offer a number of algorithms, for example:
Adaptive beamforming: RLS, LMS, CMA, conjugate gradient method
Multiuser detection: decorrelating detector, MMSE detector, adaptive MMSE detection, etc.
All the code for these algorithms is stored in the shared memory, and the processing for every user shares it: the processor for each user accesses the same executable code in shared memory, which speeds up the simulation.
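A minimal sketch of this code sharing, assuming POSIX threads (the algorithm bodies are hypothetical placeholders): one copy of each algorithm's code sits in the shared address space, and one thread per user calls into it.

/* A single shared copy of each algorithm (RLS, MMSE, ...) is held
 * in shared memory; the per-user threads all execute that same
 * code on their own data. */
#include <pthread.h>
#include <stdio.h>

#define NUSERS 4

/* Hypothetical placeholders for the shared algorithm code. */
static double rls_beamform(int user) { return user * 1.0; }
static double mmse_detect(int user)  { return user * 2.0; }

static void *per_user(void *arg)
{
    int user = (int)(long)arg;
    /* every thread runs the same shared code on its own user */
    double w = rls_beamform(user);
    double d = mmse_detect(user);
    printf("user %d: beamforming %g, detection %g\n", user, w, d);
    return NULL;
}

int main(void)
{
    pthread_t t[NUSERS];
    for (long u = 0; u < NUSERS; u++)
        pthread_create(&t[u], NULL, per_user, (void *)u);
    for (int u = 0; u < NUSERS; u++)
        pthread_join(t[u], NULL);
    return 0;
}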
3.6 Multiprocessor with Shared Memory (cont.)
Example: the simulation of wireless communication by a multiprocessor with shared memory
[Figure: Beamforming: CPUs 1 to N (one per user), each with its own cache, sharing memory modules that hold the RLS, CMA, ... code. Multiuser detection: CPUs 1 to N, each with its own cache, sharing memory modules that hold the decorrelating detector, MMSE, ... code.]
4. Progress of the Project
The following models of the WCDMA system have been developed and integrated into the simulator:
- Spreader/despreader
- Spatial processing
- RAKE receiver
- Fading radio channel
Some simulation results have been obtained for verification of the models.
There have been interactions with SARG at Stanford on verification of the Rake receiver model.
Work on translating the MATLAB code into C, with further parallelisation, has been accomplished at UCLA.