
Parallel Deposit (bit scatter)

Date post: 05-Feb-2016

Parallel Deposit (bit scatter)

Deposits in the result register, at positions flagged by 1’s in r3, the right-justified bits from r2
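The deposit semantics described here can be modeled in software as a bit-by-bit loop. This is only a minimal reference sketch: the function name `pdep`, the argument order, and the default 64-bit width are assumptions made for illustration, and a hardware instruction would do this in one operation rather than a loop.

```python
def pdep(r2, r3, n=64):
    """Software model of parallel deposit (bit scatter) on n-bit words.

    r2 holds the right-justified source bits, r3 is the mask.
    (Name, argument order and default width are illustrative assumptions.)
    """
    result = 0
    k = 0  # index of the next right-justified bit to take from r2
    for i in range(n):
        if (r3 >> i) & 1:                    # position i is flagged by a 1 in r3
            result |= ((r2 >> k) & 1) << i   # deposit bit k of r2 at position i
            k += 1
    return result


# The four low bits 1001 are scattered to mask positions 1, 3, 6 and 7.
print(format(pdep(0b1001, 0b11001010, n=8), "08b"))  # 10000010
```

Positions where the mask is 0 are left as 0 in the result in this model.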

Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors,” to appear in Journal of VLSI Signal Processing Systems.

Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions,” Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006 (Best Paper Award).

Advanced Bit Manipulation Instructions for Commodity Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Laboratory for Multimedia and Security
Department of Electrical Engineering, Princeton University

Background and Motivation

Advanced bit manipulations are not well supported by commodity microprocessors
These operations are performed using “programming tricks” (see Hacker’s Delight)
Bit manipulations play a role in applications of increasing importance
We propose adding direct support for a few key bit manipulation operations to accelerate these applications

Example Applications
New Instructions
  Butterfly and Inverse Butterfly
  Parallel Extract and Parallel Deposit
  Bit Matrix Multiply
Summary and Conclusions
Ongoing and Future Work
Applications (and Speedup)

New Instructions

Permutation: Butterfly and Inverse Butterfly
Bit Gather and Bit Scatter: Parallel Extract and Parallel Deposit
Bit Matrix Multiply
Other bit manipulation instructions (not covered here): Bit matrix transpose, Population count

Summary and Conclusions

Advanced bit manipulations play an important role in many applications
We have introduced a few select bit manipulation instructions that speed up these applications
We have evolved the shifter to a new design using butterfly and inverse butterfly datapaths to support basic and advanced bit manipulation instructions
Advanced bit manipulations are no longer esoteric “programming tricks” but rather supported directly by microprocessors at only a marginal cost

Applications (and Speedup)

Cryptography
Random number generation: Von Neumann Extractor, Toeplitz Matrix Multiply
Steganography
Cryptanalysis (Gaussian elimination)
Other applications: Binary compression, Binary image morphology, Bioinformatics, Communications coding, FFT, Finite field arithmetic, Integer compression, Pattern matching
Other applications suggested by you!
Speedups shown, in the order they appear: up to 2.24×, 9.9×, 14.9×, 2.92×

Ongoing and Future Work

Identify new applications where bit manipulation instructions are useful (e.g., LFSR and FCSR RNGs, software radio)
Implementation: refine current circuit implementation; integrate new shifter in scalable crypto co-processor (PAX)

Butterfly

lg(n) stages of n 2:1 MUXes, split into n/2 pairs that either pass through or swap their inputs
bfly + ibfly = general permutation network
Any of the n! permutations of n bits can be done with one pass of both instructions
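The stage structure described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the circuit itself: the names `swap_stage`, `bfly` and `ibfly`, the 8-bit default width, and the control encoding (one mask per stage, with a 1 at the lower bit position of each pair that should swap) are all chosen here for illustration. It also does not show how to derive the control bits for an arbitrary permutation.

```python
def swap_stage(x, mask, d):
    """One stage: swap bit i with bit i+d wherever mask has bit i set.

    mask may only have 1s at the lower position of each pair (assumed encoding).
    """
    t = ((x >> d) ^ x) & mask      # 1 where the two bits of a selected pair differ
    return x ^ t ^ (t << d)        # flipping both differing bits swaps them


def bfly(x, masks, n=8):
    """Butterfly: lg(n) stages with pair distances n/2, n/4, ..., 1."""
    d = n // 2
    for mask in masks:
        x = swap_stage(x, mask, d)
        d //= 2
    return x


def ibfly(x, masks, n=8):
    """Inverse butterfly: the same stages with distances 1, 2, ..., n/2."""
    d = 1
    for mask in masks:
        x = swap_stage(x, mask, d)
        d *= 2
    return x


# Swapping every pair in every stage reverses the bits of the word.
reverse_masks = [0x0F, 0x33, 0x55]
print(format(bfly(0b11010010, reverse_masks), "08b"))  # 01001011
```

Because each stage is its own inverse, running `ibfly` with the same masks in reversed order undoes a `bfly` pass in this model.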

Inverse Butterfly

Parallel Extract (bit gather)

Extracts the bits of r2 flagged by 1’s in r3, then compresses and right-justifies them in the result register
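A loop-based software model makes the gather semantics concrete. As a hedged sketch only: the function name `pex`, the argument order, and the default 64-bit width are illustrative assumptions, and the loop stands in for what the instruction would do in a single operation.

```python
def pex(r2, r3, n=64):
    """Software model of parallel extract (bit gather) on n-bit words.

    r2 is the source word, r3 is the mask.
    (Name, argument order and default width are illustrative assumptions.)
    """
    result = 0
    k = 0  # next free right-justified position in the result
    for i in range(n):
        if (r3 >> i) & 1:                    # bit i of r2 is flagged by the mask
            result |= ((r2 >> i) & 1) << k   # pack it at position k
            k += 1
    return result


# Bits 1, 3, 6 and 7 of the source are gathered into the four low bits.
print(format(pex(0b10110110, 0b11001010, n=8), "04b"))  # 1001
```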

[Figure: register diagrams labeled r2, r1, r3]

Cryptography – permutations in ciphers and hash functions, e.g., TDES:

Random Number Generators – extract bits from source of entropy

Von Neumann Extractor (Intel RNG) – given bit-pair sequence $\{x_{2i}, x_{2i+1}\}$ from entropy pool, extract $x_{2i}$ if the bits differ:
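One plausible way to express this rule with a bit gather is sketched below in Python. The mapping is an assumption made for illustration: pair i is taken to occupy bit positions 2i and 2i+1 of a word, a mask marks the even positions whose pair differs, and a loop-based `pex` model packs the selected bits. The helper names and the returned (bits, count) tuple are hypothetical.

```python
def pex(r2, r3, n):
    """Loop-based model of parallel extract (bit gather)."""
    result, k = 0, 0
    for i in range(n):
        if (r3 >> i) & 1:
            result |= ((r2 >> i) & 1) << k
            k += 1
    return result


def von_neumann_extract(w, n):
    """Keep x_2i for every pair (x_2i, x_2i+1) whose two bits differ.

    Returns (extracted bits packed from bit 0, number of bits extracted).
    """
    even = sum(1 << i for i in range(0, n, 2))   # mask of positions 0, 2, 4, ...
    differ = (w ^ (w >> 1)) & even               # 1 at position 2i if the pair differs
    return pex(w, differ, n), bin(differ).count("1")


# Pairs from bit 0 upward: (0,1) (0,0) (1,1) (1,0) (0,1) -> keep 0, 1, 0
bits, count = von_neumann_extract(0b1001110010, 10)
print(format(bits, "03b"), count)  # 010 3
```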

Toeplitz Matrix Multiply Extractor – multiply bit string from entropy pool by a binary Toeplitz matrix:
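To make the multiplication concrete, here is a small Python sketch of a binary Toeplitz matrix applied to a bit string over GF(2). The representation is an assumption for illustration: an m × n Toeplitz matrix is described by its m + n - 1 diagonal values in a list `diag_bits`, with entry (i, j) taken from `diag_bits[i - j + n - 1]`, and bits are held in plain Python lists. The function name is hypothetical.

```python
def toeplitz_extract(x_bits, diag_bits, m):
    """Multiply an m x n binary Toeplitz matrix by the bit vector x_bits, mod 2.

    Entry (i, j) depends only on i - j, so the whole matrix is given by
    m + n - 1 diagonal values (assumed layout: diag_bits[i - j + n - 1]).
    """
    n = len(x_bits)
    if len(diag_bits) != m + n - 1:
        raise ValueError("need m + n - 1 diagonal bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            acc ^= diag_bits[i - j + n - 1] & x_bits[j]  # AND, then XOR-accumulate
        out.append(acc)
    return out


# Rows of this 2 x 3 matrix are [1, 0, 1] and [1, 1, 0].
print(toeplitz_extract([1, 1, 0], [1, 0, 1, 1], m=2))  # [1, 0]
```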

LSB Steganography – embed secret message in least significant bits of image or audio file:
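As an illustration of how bit scatter and bit gather could apply here, the Python sketch below hides one message byte in the least significant bits of eight cover bytes packed into a 64-bit word, then recovers it. The word layout, the mask constant, and the helper names are assumptions for this example; `pdep` and `pex` are loop-based software models.

```python
LSB_MASK = 0x0101010101010101  # least significant bit of each of 8 bytes


def pdep(r2, r3, n=64):
    """Loop-based model of parallel deposit (bit scatter)."""
    result, k = 0, 0
    for i in range(n):
        if (r3 >> i) & 1:
            result |= ((r2 >> k) & 1) << i
            k += 1
    return result


def pex(r2, r3, n=64):
    """Loop-based model of parallel extract (bit gather)."""
    result, k = 0, 0
    for i in range(n):
        if (r3 >> i) & 1:
            result |= ((r2 >> i) & 1) << k
            k += 1
    return result


def embed(cover, message_byte):
    """Clear each byte's LSB in the cover word, then scatter the message bits in."""
    return (cover & ~LSB_MASK & (2**64 - 1)) | pdep(message_byte, LSB_MASK)


def recover(stego):
    """Gather the eight LSBs back into one message byte."""
    return pex(stego, LSB_MASK)


stego = embed(0xFFFFFFFFFFFFFFFF, 0b10110010)
print(hex(stego))             # 0xfffefffffefefffe
print(bin(recover(stego)))    # 0b10110010
```

Only the lowest bit of each cover byte changes; the upper seven bits are preserved.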

bmm.n C = B, A
A, B, C: n × n bit matrices; C = A × B mod 2
for i from 1 to n
  for j from 1 to n
    $c_{i,j} = a_{i,1} b_{1,j} \oplus a_{i,2} b_{2,j} \oplus \cdots \oplus a_{i,n} b_{n,j}$
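The definition C = A × B mod 2 can be checked with a short software model. In this Python sketch each matrix is a list of row integers with column j stored in bit j; that storage convention and the function name `bmm` are assumptions for illustration. Multiplying mod 2 means the products are ANDs and the sum is an XOR, so row i of C is the XOR of the rows of B selected by the 1 bits in row i of A.

```python
def bmm(a_rows, b_rows):
    """Bit matrix multiply over GF(2): C = A x B mod 2.

    a_rows, b_rows: lists of n integers, one per row, column j in bit j
    (assumed storage convention).
    """
    n = len(a_rows)
    c_rows = []
    for i in range(n):
        row = 0
        for k in range(n):
            if (a_rows[i] >> k) & 1:   # a[i][k] == 1
                row ^= b_rows[k]       # add row k of B, mod 2
        c_rows.append(row)
    return c_rows


# 2 x 2 check: A = [[1, 1], [0, 1]], B = [[1, 0], [1, 1]] gives C = [[0, 1], [1, 1]]
print([format(r, "02b") for r in bmm([0b11, 0b10], [0b01, 0b11])])  # ['10', '11']
```

In the printed strings the leftmost character is column 1 and the rightmost is column 0, following the bit-j convention above.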

bmm.8 unit (pictured above) can be directly incorporated into the ALU (<¼ size)

Yedidya Hilewitz and Ruby B. Lee, “Achieving Very Fast Bit Matrix Multiplication in Commodity Microprocessors,” Princeton University Department of Electrical Engineering Technical Report CE-L2007-006, August 2007.

New Shifter Architecture

A new shifter architecture replaces the conventional shifter with a unit that directly supports bit manipulation operations

New shifter performs:
basic shifter operations: shift, rotate, extract and deposit
multimedia shift-permute operations: mix
advanced bit manipulation operations: bfly, ibfly, pex, pdep

Yedidya Hilewitz and Ruby B. Lee, “A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations,” to appear in IEEE Transactions on Computers.

Yedidya Hilewitz and Ruby B. Lee, “Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors,” Proceedings of 18th IEEE Symposium on Computer Arithmetic (ARITH-18), June 2007.
