Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy...

Date post: 15-Jan-2016

Juanjo Noguera
Xilinx Research Labs, Dublin, Ireland

Ahmed Al-Wattar

Irwin O. Kennedy
Alcatel-Lucent, Dublin, Ireland

We introduce a new approach to reducing FPGA power consumption.

The approach exploits the time-varying nature of a system's environment,

closely tracking environmental changes and adapting the implementation accordingly using partial reconfiguration.

Partial Reconfiguration (PR) allows part of the device to be reconfigured while the rest of the FPGA continues operating.

There have been multiple hardware enhancements to Xilinx FPGAs to better support partial reconfiguration.

Smaller units of reconfiguration granularity
◦ From full-device-height reconfiguration frames in the Virtex-II and Virtex-II Pro families to frames 16 CLBs high in the Virtex-4 family

Increased bandwidth in the internal configuration access port
◦ From 50 Mbytes/s in the Virtex-II and Virtex-II Pro families to 400 Mbytes/s in the Virtex-4 family

Early Access Partial Reconfiguration (EAPR)
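
The bandwidth figures above translate directly into reconfiguration latency. The following minimal sketch estimates a lower bound on the time needed to load a partial bitstream through the configuration port. The bitstream size is a hypothetical value chosen only for illustration, and the function name is not taken from any Xilinx tool.

```python
def reconfiguration_time_ms(bitstream_bytes: int, port_bytes_per_s: float) -> float:
    """Lower bound on partial bitstream load time, ignoring any controller overhead."""
    return bitstream_bytes / port_bytes_per_s * 1e3

# Configuration port bandwidths quoted on the slide (1 Mbyte taken as 1e6 bytes).
VIRTEX2_PORT = 50e6    # bytes/s, Virtex-II and Virtex-II Pro
VIRTEX4_PORT = 400e6   # bytes/s, Virtex-4

# Hypothetical partial bitstream size; not a figure from the presentation.
PARTIAL_BITSTREAM_BYTES = 200_000

t_v2 = reconfiguration_time_ms(PARTIAL_BITSTREAM_BYTES, VIRTEX2_PORT)
t_v4 = reconfiguration_time_ms(PARTIAL_BITSTREAM_BYTES, VIRTEX4_PORT)
print(f"Virtex-II: {t_v2:.2f} ms, Virtex-4: {t_v4:.2f} ms")
```

For the same bitstream, the eightfold bandwidth increase gives an eightfold shorter load time, which matters when the implementation has to follow a changing environment.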

Traditionally, partial reconfiguration has been used to time-multiplex multiple mutually exclusive functions, hence reducing cost and static power consumption.
◦ It does not present any benefit in applications where all application functions are required on the FPGA at the same time

Use partial reconfiguration to time-multiplex different implementations of the same function.
◦ This reduces the FPGA's dynamic power consumption

By specializing the implementation to the current subset of requirements, we can reduce average power consumption.
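
The effect on average power can be sketched as a time-weighted sum over the implementations that are loaded during a period. Every number below (durations, per-implementation power, energy per reconfiguration) is a hypothetical placeholder rather than a measurement from this work, and `average_power_w` is an illustrative helper.

```python
def average_power_w(schedule, reconfig_energy_j=0.0, reconfigs=0):
    """Average power over a schedule of (duration_s, power_w) intervals.

    The energy spent on each reconfiguration is added as a fixed overhead.
    """
    total_time = sum(duration for duration, _ in schedule)
    energy = sum(duration * power for duration, power in schedule)
    energy += reconfigs * reconfig_energy_j
    return energy / total_time

HOUR = 3600.0

# Hypothetical: a single worst-case implementation stays loaded all day.
static = average_power_w([(24 * HOUR, 1.0)])

# Hypothetical: worst-case implementation for 8 busy hours, a specialized
# lower-power implementation for the remaining 16 hours, two swaps per day.
adaptive = average_power_w(
    [(8 * HOUR, 1.0), (16 * HOUR, 0.4)],
    reconfig_energy_j=0.01,
    reconfigs=2,
)
print(f"static: {static:.3f} W, adaptive: {adaptive:.3f} W")
```

The saving depends on how much of the time the specialized implementation is sufficient and on how small the reconfiguration overhead is relative to the interval lengths.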

We have applied this idea of adapting the implementation for power savings to the networking application domain
◦ using a forward error correction core (i.e., a Viterbi decoder)

Most of the dynamic power dissipation in an FPGA fabric is due to the programmable interconnects and clocking resources.

Power consumption can be reduced by increasing the number of pipeline stages in an FPGA design.

Several authors have proposed low-power implementations of the Viterbi decoder.

The environment is the stimulus the system receives from external sources
◦ e.g. number of users in a system, communication channel conditions, or total throughput
◦ The number of users in a wireless base-station changes throughout the day
◦ The signal-to-noise ratio at a wireless base-station changes with the location of the mobile phone
◦ The mixture of voice and data users on a cellular base-station changes throughout the day

Cost of electricity
◦ Google warned that the cost of electricity used to power their equipment could soon be greater than the cost of the equipment itself

Reliability
◦ Average heat energy is the greatest determinant of digital electronics reliability

Thermal Engineering
◦ Thermal engineering is concerned with removing excess heat energy from a system

Application-level partial reconfiguration

Architecture-level partial reconfiguration
◦ the bit width of the data path or the number of pipeline stages in an arithmetic block implementation

Device-level partial reconfiguration
◦ loading the unused function's FPGA area with the most power-efficient idle configuration, or directly controlling the FPGA clocking resources (i.e., clock buffers or DCM modules) from the configuration memory

Forward error correction codes such as convolutional codes limit the effects of noise in digital communication.

The Viterbi algorithm is used for decoding convolutional codes.

It is widely applied in networking applications due to its good noise tolerance.
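
To make the decoding step concrete, here is a minimal software sketch of a rate-1/2 convolutional encoder and a hard-decision Viterbi decoder. The constraint length K = 3 and the generator polynomials (7, 5) in octal are a common textbook choice assumed here for illustration. They are not parameters taken from the presentation, and the code is not a model of the Xilinx core.

```python
K = 3                          # constraint length (illustrative assumption)
GENERATORS = (0b111, 0b101)    # rate-1/2 generators (7, 5) octal, assumed
N_STATES = 1 << (K - 1)

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def branch(state: int, bit: int):
    """Return (output symbols, next state) when `bit` enters the encoder."""
    reg = (bit << (K - 1)) | state          # newest bit in the top position
    out = tuple(parity(reg & g) for g in GENERATORS)
    return out, reg >> 1

def encode(bits):
    state, coded = 0, []
    for b in bits:
        out, state = branch(state, b)
        coded.extend(out)
    return coded

def viterbi_decode(received):
    """Hard-decision decoding: keep the minimum Hamming distance path per state."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    n = len(GENERATORS)
    for i in range(0, len(received), n):
        rx = received[i:i + n]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue                    # state not reachable yet
            for bit in (0, 1):
                expected, nxt = branch(state, bit)
                # Add-compare-select: extend the path, keep the better one.
                m = metrics[state] + sum(a != b for a, b in zip(expected, rx))
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]

# Usage: the two trailing zeros flush the encoder back to state 0.
message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
noisy = encode(message)
noisy[3] ^= 1                               # inject one channel bit error
decoded = viterbi_decode(noisy)
print(decoded == message)
```

The inner loop over states is the part that a hardware decoder can execute either one state at a time or for all states at once.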

Adapting the Viterbi decoder implementation to two kinds of environmental change
◦ changes in the signal-to-noise ratio
◦ changes in the required throughput

Xilinx provides a Viterbi decoder core in Coregen.

Running at 100 MHz

Dual-port memory blocks (32 Kbytes) implemented using on-chip BRAMs

We connected a power supply with an integrated ammeter to the FPGA internal core

The Viterbi algorithm’s constraint length (K) greatly impacts the decoder’s Bit Error Rate (BER) performance

We verified this assumption experimentally using three implementations of the parallel Viterbi decoder with different constraint lengths.

The constraint length parameter has a significant impact on the number of FPGA resources used
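
One reason the resource count is so sensitive to K is that the number of trellis states grows exponentially with the constraint length: a code with constraint length K has 2^(K-1) states, and a fully parallel decoder needs one add-compare-select operation per state for every decoded bit. The sketch below tabulates this. The K values are illustrative and are not necessarily the three values used in the experiments.

```python
def trellis_states(constraint_length: int) -> int:
    """Number of trellis states for a convolutional code with the given K."""
    return 2 ** (constraint_length - 1)

# Illustrative constraint lengths (assumed, not taken from the experiments).
for k in (3, 5, 7, 9):
    print(f"K={k}: {trellis_states(k)} states")
```

Each step of 2 in K multiplies the state count by 4, so a smaller K is worth loading whenever the channel conditions allow the BER target to be met with it.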

The Xilinx Viterbi core has a parameter that lets the user select between a serial and a parallel architecture

Power consumption measurements reveal that, for this example, the parallel architecture is more power-efficient than the serial architecture.

At the sample points for the 8.3 Mbps throughput, we can observe a difference of approximately 200 mW.

Reducing the number of LUTs and routing resources required to implement a function effectively reduces its capacitance.

Dynamic power consumption is also proportional to the switching activity of all nodes in the design.

The serial architecture requires 12 clock cycles for each decoding process, while the parallel architecture only requires a single clock cycle
◦ Serial: average power consumption of approximately 0.7 W, with peaks around 1 W
◦ Parallel: average power of approximately 0.5 W, with peaks of 2.5 W
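
The cycle counts and power figures above can be combined into a rough comparison. The sketch below assumes that the 100 MHz clock from the experimental setup drives the decoder, that each decoding process yields one decoded bit, and that both average power figures were taken at the same throughput. These assumptions are not stated explicitly on the slides, so the result is indicative only.

```python
CLOCK_HZ = 100e6        # assumed decoder clock, from the experimental setup
SERIAL_CYCLES = 12      # clock cycles per decoding process, serial
PARALLEL_CYCLES = 1     # clock cycles per decoding process, parallel

def max_throughput_bps(clock_hz: float, cycles_per_bit: int) -> float:
    """Peak decoded bit rate, assuming one decoded bit per decoding process."""
    return clock_hz / cycles_per_bit

def energy_per_bit_nj(avg_power_w: float, throughput_bps: float) -> float:
    return avg_power_w / throughput_bps * 1e9

serial_max = max_throughput_bps(CLOCK_HZ, SERIAL_CYCLES)
# Fraction of cycles the parallel decoder is busy at the serial peak rate.
parallel_duty = PARALLEL_CYCLES / SERIAL_CYCLES

serial_nj = energy_per_bit_nj(0.7, serial_max)      # 0.7 W average, serial
parallel_nj = energy_per_bit_nj(0.5, serial_max)    # 0.5 W average, parallel

print(f"serial peak rate: {serial_max / 1e6:.2f} Mbps")
print(f"parallel busy fraction at that rate: {parallel_duty:.3f}")
print(f"energy per bit: serial {serial_nj:.0f} nJ, parallel {parallel_nj:.0f} nJ")
```

Under these assumptions the serial peak rate comes out at about 8.3 Mbps. The parallel decoder is busy in only one cycle out of twelve at that rate, which is consistent with its higher peaks but lower average power.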
