Outline
1. Overview
  1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation
  1.2 Parallel Programming
2. Types of Parallel Computers
  2.1 Shared Memory Multiprocessor System
  2.2 Message Passing Multiprocessor with Local Memory
3. Parallel Programming Scenarios
  3.1 Ideal Parallel Computations
  3.2 Partitioning and Divide-and-Conquer Strategies
  3.3 Pipelined Computation
  3.4 Synchronous Computation
  3.5 Load Balancing
  3.6 Multiprocessor with Shared Memory
4. Progress of the Project
1. Overview

1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation
• In mobile communication, advanced signal processing techniques such as smart antennas and multiuser detection (MUD) can improve system performance, but evaluating them requires signal- or system-level simulation.
• Simulation is an important tool for gaining insight into the problem; however, simulating these signal processing algorithms is often very time-consuming.
• It is therefore necessary to speed up the simulation, and parallel programming is one of the best techniques for doing so.
1.2 Parallel Programming
Parallel programming can speed up the execution of a program by dividing it into multiple fragments that can be executed simultaneously, each on its own processor.
Parallel programming involves:
♦ Decomposing an algorithm or data into parts
♦ Distributing the sub-tasks to be processed by multiple processors simultaneously
♦ Coordinating work and communication between those processors
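As a minimal sketch of these three steps, assuming an MPI message passing environment (the problem size N and the work() sub-task are hypothetical placeholders):

/* Minimal MPI skeleton: decompose an array into parts, distribute
 * the parts, and coordinate by gathering the partial results.
 * Build with: mpicc skeleton.c -o skeleton */
#include <mpi.h>
#include <stdlib.h>

#define N 1024                      /* total problem size (hypothetical) */

static double work(double x) { return x * x; }   /* placeholder sub-task */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;           /* decompose: assumes size divides N */
    double *all  = malloc(N * sizeof(double));
    double *part = malloc(chunk * sizeof(double));
    if (rank == 0)
        for (int i = 0; i < N; i++) all[i] = (double)i;

    /* distribute one part to each processor */
    MPI_Scatter(all, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < chunk; i++) /* each processor computes its part */
        part[i] = work(part[i]);

    /* coordinate: collect the partial results back on processor 0 */
    MPI_Gather(part, chunk, MPI_DOUBLE, all, chunk, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    free(part); free(all);
    MPI_Finalize();
    return 0;
}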
1.2 Parallel Programming (cont.)
Requirements for Parallel Programming
♦ A parallel architecture:
• Multiple processors
• A network connecting them
♦ An environment to create and manage parallel processing
♦ A parallel algorithm and a parallel program
2. Types of Parallel Computers

2.1 Shared Memory Multiprocessor System
♦ Multiple processors operate independently but share the same memory resources.
♦ Only one processor can access a given shared memory location at a time.
♦ Synchronisation is achieved by controlling the reads from and writes to the shared memory.
[Figure: several CPUs connected to one shared memory.]
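A minimal sketch of this kind of synchronisation, assuming POSIX threads on a shared memory machine (the shared counter is a hypothetical stand-in for any shared memory location):

/* One shared memory location protected by a mutex: only one
 * thread (processor) can read-modify-write it at a time.
 * Build with: cc -pthread mutex.c -o mutex */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long shared_sum = 0;                        /* lives in shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* gain exclusive access */
        shared_sum++;                  /* the protected read-modify-write */
        pthread_mutex_unlock(&lock);   /* let another processor in */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %ld\n", shared_sum); /* always NTHREADS * 100000 */
    return 0;
}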
2.1 Shared Memory Multiprocessor System (cont.)
♦ Advantages
• Easy for the user to use efficiently.
• Data sharing among tasks is fast (at memory access speed).
♦ Disadvantages
• The size of memory may be a limiting factor: increasing the number of processors without increasing the memory size can cause severe bottlenecks.
• The user is responsible for establishing synchronisation (as in the mutex sketch above).
2.2 Message Passing Multiprocessor with Local Memory
♦ Multiple processors operate independently, but each has its own local memory.
♦ Data are shared across a communication network using message passing.
♦ The user is responsible for synchronisation using message passing.
[Figure: several CPUs, each with its own local memory, connected by a network.]
2.2 Message Passing Multiprocessor with Local Memory (cont.)
♦ Advantages
• Memory is scalable with the number of processors: adding processors, each with its own memory, increases the total memory size, unlike in the shared memory multiprocessor system.
• Each processor can rapidly access its own memory without interference.
♦ Disadvantages
• Existing data structures based on a global memory can be difficult to map onto this organisation.
• The user is responsible for sending and receiving data among processors.
• To minimise overhead and latency, data should be accumulated into large blocks before being sent to the nodes that need them, as in the sketch below.
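A sketch of this blocking strategy, assuming MPI (the block size is illustrative): one large message pays the per-message latency once, where many small messages would pay it repeatedly.

/* Send one large block instead of many small messages, to
 * amortise per-message overhead and latency.
 * Run with at least two processes, e.g. mpirun -np 2 ... */
#include <mpi.h>

#define BLOCK 4096                    /* samples per message (illustrative) */

int main(int argc, char **argv)
{
    int rank;
    double buf[BLOCK];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < BLOCK; i++) buf[i] = i;  /* stack up a block */
        MPI_Send(buf, BLOCK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}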
3. Parallel Programming Scenarios

3.1 Ideal Parallel Computations
• The computation can be readily divided into completely independent parts that can be executed simultaneously.
• Example:
In the simulation of uplink WCDMA (single user), the signal processing at the transmitter and the receiver is divided into smaller parts, each executed by a separate processor.
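A minimal sketch of an ideal decomposition, assuming MPI. Here the independent parts are illustrated as independent simulation trials rather than the transmitter/receiver split, and simulate_trial() is a hypothetical stand-in for one self-contained run:

/* Ideal parallelism: each processor runs completely independent
 * trials; the only communication is one final reduction. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one independent link-level trial;
 * returns 1 if a (simulated) bit error occurred. */
static int simulate_trial(unsigned int *seed)
{
    return (rand_r(seed) % 100) < 5;   /* placeholder 5% error rate */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    unsigned int seed = 1234u + rank;  /* independent stream per processor */
    const long trials = 100000;
    long errors = 0, total = 0;

    for (long i = 0; i < trials; i++)  /* no inter-processor dependence */
        errors += simulate_trial(&seed);

    /* the single point of coordination */
    MPI_Reduce(&errors, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("BER estimate: %g\n", (double)total / (trials * size));
    MPI_Finalize();
    return 0;
}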
3.1 Ideal Parallel Computations (cont.)
Example: the simulation of wireless communication with ideal parallel computation
[Figure: the WCDMA link mapped onto ten processors. Transmitter: CPU 1, source data generation (traffic/packet); CPU 2, channel coding and rate matching; CPU 3, modulation; CPU 4, spreading and scrambling; CPU 5, pulse shaping filtering. Radio channel: CPU 6, reconstruction of the composite signal (signal, channel, AWGN). Receiver: CPU 7, matched filtering; CPU 8, Rake combining; CPU 9, demodulation; CPU 10, channel decoding.]
3.2 Partitioning and Divide-and-Conquer Strategies
• Partitioning: the problem is simply divided into separate parts, and each part is computed separately.
• Divide-and-conquer: the task is divided repeatedly into smaller and smaller subtasks; the smallest parts are solved and their results combined.
• Example:
In the simulation of the Rake combining technique in WCDMA, the problem can first be divided among the different fingers. Within each finger, the problem can be further divided into correlating, delay equalising, and MRC/EGC combining.
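A minimal serial sketch of the divide-and-conquer pattern over Rake fingers (process_finger() and combine() are hypothetical placeholders; in a parallel version each half of the recursion would be handed to a different processor):

/* Divide-and-conquer over Rake fingers: split the finger set in
 * half until a single finger remains, solve it, then combine the
 * results on the way back up. */
#include <stdio.h>

#define NFINGERS 8

/* Hypothetical per-finger work: correlate, delay-equalise, weight. */
static double process_finger(int finger)
{
    return 1.0 / (finger + 1);        /* placeholder finger output */
}

/* Hypothetical MRC/EGC-style combining of two partial results. */
static double combine(double a, double b)
{
    return a + b;
}

static double rake(int lo, int hi)
{
    if (lo == hi)
        return process_finger(lo);    /* smallest subtask: one finger */
    int mid = (lo + hi) / 2;
    double left  = rake(lo, mid);     /* divide... */
    double right = rake(mid + 1, hi);
    return combine(left, right);      /* ...and combine the results */
}

int main(void)
{
    printf("combined output: %g\n", rake(0, NFINGERS - 1));
    return 0;
}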
3.2 Partitioning and Divide-and-Conquer Strategies (cont.)
Example: the simulation of wireless communication with a divide-and-conquer strategy
[Figure: Rake combining divided among fingers 1 to K; within each finger, CPU 1 correlating, CPU 2 weighting by the channel estimate, CPU 3 MRC/EGC combining.]
3.3 Pipelined Computation
• The problem is divided into a series of tasks that have to be completed one after the other.
• Each task is executed by a separate processor.
• Partially sequential in nature.
• Example:
In the simulation of the WCDMA transmitter and receiver, each signal processing block needs the output of the previous block as its input. In this case, the pipelining technique is adopted to parallelise the sequential source code.
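A sketch of the pipelining pattern, assuming MPI: processor r receives each block from processor r-1, applies its stage of the chain, and forwards the block to processor r+1 (the stage() function is a hypothetical placeholder).

/* Pipeline: each rank is one stage of the processing chain.  With
 * many blocks in flight, all stages compute concurrently even
 * though each block still passes through the stages in order. */
#include <mpi.h>

#define BLOCK   256
#define NBLOCKS 100

static void stage(int rank, double *b, int n)    /* hypothetical stage */
{
    for (int i = 0; i < n; i++) b[i] += rank;    /* placeholder work */
}

int main(int argc, char **argv)
{
    int rank, size;
    double buf[BLOCK] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int k = 0; k < NBLOCKS; k++) {
        if (rank > 0)                /* receive from the previous stage */
            MPI_Recv(buf, BLOCK, MPI_DOUBLE, rank - 1, k,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        stage(rank, buf, BLOCK);     /* this processor's part of the chain */
        if (rank < size - 1)         /* forward to the next stage */
            MPI_Send(buf, BLOCK, MPI_DOUBLE, rank + 1, k, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}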
3.3 Pipelined Computation (cont.)
Example: the simulation of wireless communication with pipelined computation
[Figure: the same ten-processor transmitter/radio channel/receiver chain as in Section 3.1, here operated as a pipeline.]
3.4 Synchronous Computation
• Processors need to exchange data among themselves.
• All processes start at the same time, in a lock-step manner.
• Each process must wait until all processes have reached a particular reference point (barrier) in their computation.
• Example: WCDMA system
Smart antenna (SA): the signal processing in each antenna-element branch must be finished before the branches are combined.
Rake combining: the signal processing in each finger must be finished before the fingers are combined.
Multiuser detection (MUD): since MUD for each user's signal needs the other users' signals, the processing of all users' signals must be finished before MUD.
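A minimal sketch of barrier synchronisation, assuming MPI (finger_work() is a hypothetical placeholder for the per-finger or per-branch processing):

/* Barrier synchronisation: every processor must finish its finger
 * (or antenna-branch) processing before any of them may combine. */
#include <mpi.h>
#include <stdio.h>

static double finger_work(int rank)   /* hypothetical per-finger work */
{
    return rank + 0.5;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double partial = finger_work(rank);

    /* no process passes this point until all have reached it */
    MPI_Barrier(MPI_COMM_WORLD);

    /* combining is now safe (the reduction itself also synchronises;
     * the explicit barrier just makes the reference point visible) */
    double combined = 0.0;
    MPI_Reduce(&partial, &combined, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) printf("combined = %g\n", combined);
    MPI_Finalize();
    return 0;
}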
3.4 Synchronous Computation (cont.)
Example: the simulation of wireless communication with synchronous computation
[Figure: for each of users 1 to N, received signal reconstruction (AWGN) and matched filtering feed finger CPUs 1 to K (correlating, then weighting with the channel estimate); the fingers synchronise before Rake combining, and beamforming CPUs combine the antenna branches; the beamforming/combining outputs of all N users synchronise again before multiuser detection (MUD).]
3.4 Synchronous Computation (cont.)
Example: the simulation of wireless communication with synchronous computation
[Figure: multiuser detection. One CPU per user takes the output of that user's beamforming/combining together with that user's signature waveform; all of them must finish before detection proceeds.]
3.5 Load Balancing
• The aim is to distribute the computation load fairly across processors in order to obtain the highest possible execution speed.
• Example: WCDMA system
Smart antenna (SA): the speed of direction-of-arrival (DOA) variation can differ between user signals, so the beamforming processors for different users may face different numbers of operations.
Rake combining: the number of multipath signals can differ between users.
In both cases, the load can be balanced fairly across processors by detecting whether the solution has been reached on each processor.
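A sketch of one common way to realise this, assuming MPI: a master/worker scheme (an illustrative choice, not stated in the slides) where processors that finish early simply request more work, so users with more multipaths or faster-varying channels do not stall the rest.

/* Dynamic load balancing: a master hands out per-user tasks one at
 * a time; fast workers come back for more, so uneven task sizes
 * balance out.  Run with at least two processes. */
#include <mpi.h>

#define NTASKS   64
#define TAG_WORK 1
#define TAG_DONE 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master */
        int next = 0, active = size - 1, task = 0, req;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st); /* a worker asks for work */
            if (next < NTASKS) {
                task = next++;
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            } else {                       /* nothing left: dismiss worker */
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                active--;
            }
        }
    } else {                               /* worker */
        int task, req = 0;
        MPI_Status st;
        for (;;) {
            MPI_Send(&req, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            /* ...process user `task`: Rake fingers, beamforming, ... */
        }
    }
    MPI_Finalize();
    return 0;
}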
3.5 Load Balancing (cont.)
Example: the simulation of wireless communication with load balancing
[Figure: computation times per processor. Rake combining: CPUs 1 to N, one per user; CPU 2 takes longer because user 2 has more multipath signals than the other users. Beamforming: CPUs N+1 to 2N; CPU N+2 takes longer because user 2's channel parameters vary faster than the other users'.]
3.6 Multiprocessor with Shared Memory
• A multiprocessor with shared memory can speed up the simulation by storing the executable code and data in shared memory, where every processor can access them.
• Example
In the simulation of WCDMA with multiple users, each part of the signal processing model may offer a number of algorithms, for example:
Adaptive beamforming: RLS, LMS, CMA, conjugate gradient method
Multiuser detection: decorrelating detector, MMSE detector, adaptive MMSE detection, etc.
All the code for these algorithms is stored in the shared memory, and the processing for every user shares it: the processor for each user accesses the same executable code in shared memory, which speeds up the simulation.
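A minimal sketch of this code sharing, assuming POSIX threads (the algorithm bodies are hypothetical placeholders): one copy of each algorithm's code sits in the shared address space, and one thread per user calls into it.

/* A single shared copy of each algorithm (RLS, MMSE, ...) is held
 * in shared memory; the per-user threads all execute that same
 * code on their own data. */
#include <pthread.h>
#include <stdio.h>

#define NUSERS 4

/* Hypothetical placeholders for the shared algorithm code. */
static double rls_beamform(int user) { return user * 1.0; }
static double mmse_detect(int user)  { return user * 2.0; }

static void *per_user(void *arg)
{
    int user = (int)(long)arg;
    /* every thread runs the same shared code on its own user */
    double w = rls_beamform(user);
    double d = mmse_detect(user);
    printf("user %d: beamforming %g, detection %g\n", user, w, d);
    return NULL;
}

int main(void)
{
    pthread_t t[NUSERS];
    for (long u = 0; u < NUSERS; u++)
        pthread_create(&t[u], NULL, per_user, (void *)u);
    for (int u = 0; u < NUSERS; u++)
        pthread_join(t[u], NULL);
    return 0;
}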
3.6 Multiprocessor with Shared Memory (cont.)
Example: the simulation of wireless communication by a multiprocessor with shared memory
[Figure: Beamforming: CPUs 1 to N (one per user), each with its own cache, sharing memory modules that hold the RLS, CMA, ... code. Multiuser detection: CPUs 1 to N, each with its own cache, sharing memory modules that hold the decorrelating detector, MMSE, ... code.]
4. Progress of the Project
The following models of the WCDMA system have been developed and integrated into the simulator:
- Spreader/despreader
- Spatial processing
- RAKE receiver
- Fading radio channel
Some simulation results have been obtained for verification of the models.
There have been interactions with SARG at Stanford on verification of the Rake receiver model.
Work on translating the MATLAB code into C, with further parallelisation, has been accomplished at UCLA.