Parallel Deposit (bit scatter): deposits the right-justified bits from r2 into the result register, at the positions flagged by 1’s in r3
Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors,” to appear in Journal of VLSI Signal Processing Systems.
Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions,” Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006 (Best Paper Award).
Advanced Bit Manipulation Instructions for Commodity Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Laboratory for Multimedia and Security
Department of Electrical Engineering, Princeton University
Background and Motivation
Advanced bit manipulations are not well supported by commodity microprocessors. These operations are performed using “programming tricks” (see Hacker’s Delight).
Bit manipulations play a role in applications of increasing importance.
We propose adding direct support for a few key bit manipulation operations to accelerate these applications.
Example Applications
New Instructions
Butterfly and Inverse Butterfly
Parallel Extract and Parallel Deposit
Bit Matrix Multiply
Summary and Conclusions
Ongoing and Future Work
Applications (and Speedup)
Permutation: Butterfly and Inverse Butterfly
Bit Gather and Bit Scatter: Parallel Extract and Parallel Deposit
Bit Matrix Multiply
Other bit manipulation instructions (not covered here): Bit matrix transpose, Population count
Advanced bit manipulations play an important role in many applications
We have introduced a few select bit manipulation instructions that speed up these applications
We have evolved the shifter to a new design using butterfly and inverse butterfly datapaths to support basic and advanced bit manipulation instructions
Advanced bit manipulations are no longer esoteric “programming tricks” but rather supported directly by microprocessors at only a marginal cost
Cryptography
Random number generation: Von Neumann Extractor, Toeplitz Matrix Multiply
Steganography
Cryptanalysis (Gaussian elimination)
Other applications: binary compression, binary image morphology, bioinformatics, communications coding, FFT, finite field arithmetic, integer compression, pattern matching
Other applications suggested by you!
(up to 2.24× speedup)
(9.9× speedup) (14.9× speedup) (2.92× speedup)
Identify new applications where bit manipulation instructions are useful (e.g., LFSR and FCSR RNGs, software radio)
Implementation: refine the current circuit implementation; integrate the new shifter into the scalable crypto coprocessor (PAX)
Butterfly
lg(n) stages, each of n 2:1 MUXes grouped into n/2 pairs that either pass through or swap their two inputs
bfly + ibfly = general permutation network: any of the n! permutations of n bits can be done with one pass of both instructions
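A minimal software model may help show what one bfly or ibfly pass does. The sketch below assumes n = 32, assumes the butterfly stages pair bits 16, 8, 4, 2 and 1 positions apart (the inverse butterfly uses the reverse order), and packs each stage's n/2 swap controls into a 32-bit mask whose set bits mark the lower bit of each pair to swap. The names bfly32, ibfly32 and bfly_stage are hypothetical; each stage is written as the standard "delta swap" trick.

```c
#include <stdint.h>

/* One stage as a delta swap: for every bit i set in swap_mask (i must be
 * the lower bit of a pair, i.e. have bit d clear), bits i and i + d are
 * exchanged; all other pairs pass straight through. */
static uint32_t bfly_stage(uint32_t x, unsigned d, uint32_t swap_mask)
{
    uint32_t t = ((x >> d) ^ x) & swap_mask;
    return x ^ t ^ (t << d);
}

/* Hypothetical bfly32: lg(32) = 5 stages at distances 16, 8, 4, 2, 1.
 * masks[k] holds the 16 swap controls for the stage at distance 16 >> k. */
uint32_t bfly32(uint32_t x, const uint32_t masks[5])
{
    for (unsigned k = 0; k < 5; k++)
        x = bfly_stage(x, 16u >> k, masks[k]);
    return x;
}

/* Hypothetical ibfly32: the same stages in reverse order (distances 1..16). */
uint32_t ibfly32(uint32_t x, const uint32_t masks[5])
{
    for (int k = 4; k >= 0; k--)
        x = bfly_stage(x, 16u >> k, masks[k]);
    return x;
}
```

Because each stage is its own inverse, ibfly32 with the same masks undoes bfly32. Setting every control bit in every stage ({0x0000FFFF, 0x00FF00FF, 0x0F0F0F0F, 0x33333333, 0x55555555}) maps bit i to bit 31 − i, so bfly32(0x00000001, masks) yields 0x80000000.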
Inverse Butterfly
Parallel Extract (bit gather): extracts the bits of r2 flagged by 1’s in r3, then compresses and right-justifies them in the result register
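As a reference for the semantics of pex and pdep (the latter described at the top of this section), the bit-serial C loops below give one possible software model for 32-bit words. The names pex32 and pdep32 are hypothetical; src plays the role of r2 and mask the role of r3. The proposed instructions compute the same result in one operation rather than a 32-iteration loop.

```c
#include <stdint.h>

/* Parallel extract (bit gather): bits of src at positions where mask is 1
 * are packed, in order, into the low-order end of the result. */
uint32_t pex32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    unsigned out = 0;                      /* next result bit to fill */
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i))
                result |= 1u << out;
            out++;
        }
    }
    return result;
}

/* Parallel deposit (bit scatter): low-order bits of src are placed, in
 * order, at positions where mask is 1; all other result bits are 0. */
uint32_t pdep32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    unsigned in = 0;                       /* next source bit to consume */
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << in))
                result |= 1u << i;
            in++;
        }
    }
    return result;
}
```

For example, pex32(0xABCD1234, 0x0000FF00) returns 0x12 and pdep32(0x12, 0x0000FF00) returns 0x1200; applying pex32 with the same mask after pdep32 recovers the deposited bits.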
[Figure: pex and pdep examples showing registers r1, r2 and r3]
Cryptography – permutations in ciphers and hash functions, e.g., TDES:
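Without permutation instructions, an arbitrary bit permutation such as the 64-bit initial permutation of TDES can be coded as a table-driven loop costing one shift, mask and OR per bit; per the butterfly discussion, one pass each of bfly and ibfly can replace it. The sketch below shows that baseline under stated assumptions: permute64 is a hypothetical name, and src_of[i] gives the input bit that lands in output bit i.

```c
#include <stdint.h>

/* Naive table-driven permutation of a 64-bit word:
 * output bit i is copied from input bit src_of[i] (0 = least significant). */
uint64_t permute64(uint64_t x, const uint8_t src_of[64])
{
    uint64_t y = 0;
    for (unsigned i = 0; i < 64; i++)
        y |= ((x >> src_of[i]) & 1u) << i;
    return y;
}
```

With src_of[i] = i the word is unchanged; with src_of[i] = 63 − i the word is bit-reversed.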
Random Number Generators – extract bits from a source of entropy
Von Neumann Extractor (Intel RNG) – given a bit-pair sequence $\{x_{2i}, x_{2i+1}\}$ from the entropy pool, extract $x_{2i}$ if the two bits differ:
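With pex, the Von Neumann extractor needs no per-pair branching: build a mask that is 1 at bit 2i exactly when $x_{2i} \ne x_{2i+1}$, then gather those bits. The sketch assumes 16 bit pairs packed in a 32-bit word with bit 0 as the least significant bit; von_neumann32 is a hypothetical name, and pex32 is a bit-serial stand-in for the pex instruction.

```c
#include <stdint.h>

/* Bit-serial stand-in for pex: gather bits of src flagged by mask. */
static uint32_t pex32(uint32_t src, uint32_t mask)
{
    uint32_t r = 0;
    unsigned out = 0;
    for (unsigned i = 0; i < 32; i++)
        if ((mask >> i) & 1u)
            r |= ((src >> i) & 1u) << out++;
    return r;
}

/* Von Neumann extractor over the pairs {x_{2i}, x_{2i+1}} of x.
 * Returns the extracted x_{2i} bits right-justified; *count = how many. */
uint32_t von_neumann32(uint32_t x, unsigned *count)
{
    /* bit 2i of diff is 1 exactly when x_{2i} != x_{2i+1} */
    uint32_t diff = (x ^ (x >> 1)) & 0x55555555u;
    unsigned n = 0;
    for (uint32_t m = diff; m != 0; m &= m - 1)   /* population count */
        n++;
    *count = n;
    return pex32(x, diff);
}
```

For example, x = 0x6 has two differing pairs, (x_0, x_1) = (0, 1) and (x_2, x_3) = (1, 0), so the function returns 0x2 with *count set to 2.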
Toeplitz Matrix Multiply Extractor – multiply a bit string from the entropy pool by a binary Toeplitz matrix:
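A Toeplitz matrix is constant along each diagonal, so a 32 × 32 binary matrix is fixed by 63 bits, and each output bit is the parity (XOR) of one matrix row ANDed with the input. The sketch assumes a square 32 × 32 matrix with T[j][k] = t_{k−j+31}, where t_0 through t_62 are the low 63 bits of a 64-bit word; toeplitz32 and parity32 are hypothetical names. It is a plain software loop, not an instruction-accelerated version.

```c
#include <stdint.h>

/* Parity of a 32-bit word: 1 if an odd number of bits are set. */
static unsigned parity32(uint32_t v)
{
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* y = T x over GF(2), with T[j][k] = t_{k-j+31} and t_m = bit m of diag. */
uint32_t toeplitz32(uint32_t x, uint64_t diag)
{
    uint32_t y = 0;
    for (unsigned j = 0; j < 32; j++) {
        uint32_t row = (uint32_t)(diag >> (31 - j));  /* bit k = t_{k-j+31} */
        y |= (uint32_t)parity32(row & x) << j;        /* dot product mod 2 */
    }
    return y;
}
```

Setting only t_31 makes T the identity, so toeplitz32(x, 1ull << 31) returns x; setting only t_32 gives output bit j = x_{j+1}, that is, x >> 1.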
LSB Steganography – embed a secret message in the least significant bits of an image or audio file:
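pdep and pex map directly onto LSB embedding and extraction: depositing message bits under a mask with one 1 per byte places one message bit in the least significant bit of each sample, and gathering under the same mask reads them back. The sketch assumes eight 8-bit samples packed into a uint64_t and one message byte per word; lsb_embed, lsb_extract, pex64 and pdep64 are hypothetical names, with pex64 and pdep64 as bit-serial stand-ins for the instructions.

```c
#include <stdint.h>

/* LSB of each of the 8 packed cover bytes (e.g. pixel or audio samples). */
#define LSB_MASK 0x0101010101010101ull

/* Bit-serial stand-ins for pex and pdep on 64-bit words. */
static uint64_t pex64(uint64_t src, uint64_t mask)
{
    uint64_t r = 0;
    unsigned out = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((mask >> i) & 1u)
            r |= ((src >> i) & 1u) << out++;
    return r;
}

static uint64_t pdep64(uint64_t src, uint64_t mask)
{
    uint64_t r = 0;
    unsigned in = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((mask >> i) & 1u)
            r |= ((src >> in++) & 1u) << i;
    return r;
}

/* Embed msg (bit b goes into the LSB of byte b); other bits are kept. */
uint64_t lsb_embed(uint64_t cover, uint8_t msg)
{
    return (cover & ~LSB_MASK) | pdep64(msg, LSB_MASK);
}

/* Recover the 8 message bits from the LSBs of a stego word. */
uint8_t lsb_extract(uint64_t stego)
{
    return (uint8_t)pex64(stego, LSB_MASK);
}
```

lsb_extract(lsb_embed(cover, m)) returns m for any cover word, and only the eight LSBs of cover change.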
bmm.n C = B, A
A, B, C: n × n bit matrices, C = A × B mod 2:
for i from 1 to n, for j from 1 to n:
$c_{i,j} = a_{i,1}b_{1,j} \oplus a_{i,2}b_{2,j} \oplus \cdots \oplus a_{i,n}b_{n,j}$
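The definition above can be modeled in software for n = 8 by storing an 8 × 8 bit matrix in one 64-bit word. The sketch assumes row i is byte i and element (i, j) is bit 8i + j, with 0-based indices instead of the 1-based ones in the formula; bmm8 is a hypothetical name, and its arguments follow C = A × B, which may differ from the bmm.n operand order. Row i of C is the XOR of the rows of B selected by the 1 bits in row i of A.

```c
#include <stdint.h>

/* Hypothetical software model of bmm.8: C = A x B over GF(2).
 * Row i of a matrix is byte i; element (i, j) is bit 8*i + j. */
uint64_t bmm8(uint64_t a, uint64_t b)
{
    uint64_t c = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t a_row = (uint8_t)(a >> (8 * i));
        uint8_t c_row = 0;
        for (unsigned k = 0; k < 8; k++)
            if ((a_row >> k) & 1u)                   /* a_{i,k} = 1 */
                c_row ^= (uint8_t)(b >> (8 * k));    /* XOR in row k of B */
        c |= (uint64_t)c_row << (8 * i);
    }
    return c;
}
```

With the identity 0x8040201008040201 in this layout, bmm8(identity, B) == B and bmm8(A, identity) == A. The loop performs up to 64 row XORs, which a bmm.8 unit computes as a single instruction.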
bmm.8 unit (pictured above) can be directly incorporated into the ALU (<¼ size)
Yedidya Hilewitz and Ruby B. Lee, “Achieving Very Fast Bit Matrix Multiplication in Commodity Microprocessors,” Princeton University Department of Electrical Engineering Technical Report CE-L2007-006, August 2007.
New Shifter Architecture
A new shifter architecture that replaces the conventional shifter with a unit that directly supports bit manipulation operations
The new shifter performs:
basic shifter operations: shift, rotate, extract and deposit
multimedia shift-permute operations: mix
advanced bit manipulation operations: bfly, ibfly, pex, pdep
Yedidya Hilewitz and Ruby B. Lee, “A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations,” to appear in IEEE Transactions on Computers.
Yedidya Hilewitz and Ruby B. Lee, “Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors,” Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18), June 2007.