Date post: 24-May-2017
Upload: drshakeelkhan
Blowfish
Kevin Allison, Keith Feldman, Ethan Mick

Introduction
Blowfish is a Feistel network block cipher with a 64-bit block size and a variable key size of up to 448 bits. The Blowfish algorithm is unencumbered by patents and is free for anyone to use in any situation.

Blowfish consists of two parts: key expansion and data encryption. During key expansion, the input key is converted into several subkey arrays totaling 4168 bytes. These are the P-array, which holds eighteen 32-bit entries, and four S-boxes, each an array of 256 32-bit entries. All of these arrays are first initialized with a fixed string: the hexadecimal digits of the fractional part of pi (that is, pi without the leading 3).
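The 4168-byte figure breaks down as 18 P-array words plus 4 × 256 S-box words, which gives 1042 words of 4 bytes each. Below is a minimal C++ sketch of that state. The struct name BlowfishState is a hypothetical illustration, not the class used in our code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical container for Blowfish's subkeys (illustrative, not the authors' class).
struct BlowfishState {
    std::array<std::uint32_t, 18> P;                  // P1..P18 at indices 0..17
    std::array<std::array<std::uint32_t, 256>, 4> S;  // S1..S4 at indices 0..3
};

// 18 P-array words + 4 * 256 S-box words = 1042 words of 4 bytes each.
constexpr std::size_t kSubkeyWords = 18 + 4 * 256;
static_assert(kSubkeyWords == 1042, "P-array plus S-boxes hold 1042 words");
static_assert(kSubkeyWords * sizeof(std::uint32_t) == 4168, "subkeys total 4168 bytes");
```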

After this initialization, the first 32 bits of the key are XORed with P1 (the first 32-bit entry in the P-array), and the second 32 bits of the key are XORed with P2. This continues until all of the key bits (448 or fewer) have been used. The key is then cycled from the beginning until every entry of the P-array has been XORed with key bits.
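Below is a minimal C++ sketch of the key-mixing step. It is not our implementation: the function name xorKeyIntoPArray is hypothetical. The sketch also works on whole key bytes rather than arbitrary bit lengths, packing them big-endian as common Blowfish implementations do. Whenever the key runs out, it wraps around to the first key byte.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: XOR the key into the 18-entry P-array.
// Key bytes are packed big-endian into 32-bit words; when the key runs out,
// packing continues from the first key byte, so every P entry gets key bits.
void xorKeyIntoPArray(std::array<std::uint32_t, 18>& P,
                      const std::vector<std::uint8_t>& key) {
    if (key.empty() || key.size() > 56) {  // 56 bytes = 448 bits
        throw std::invalid_argument("key must be 1 to 56 bytes");
    }
    std::size_t pos = 0;
    for (std::size_t i = 0; i < P.size(); ++i) {
        std::uint32_t word = 0;
        for (int b = 0; b < 4; ++b) {
            word = (word << 8) | key[pos];
            pos = (pos + 1) % key.size();  // cycle back to the start of the key
        }
        P[i] ^= word;
    }
}
```

For example, a 5-byte key 01 02 03 04 05 XORs 0x01020304 into P1, 0x05010203 into P2, and 0x04050102 into P3.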

(Figure: Key-Expansion. The key, taken 32 bits at a time, is XORed (⊕) into successive 32-bit entries of the P-array; in the example the key ends with a 10-bit fragment.)

Saturday, May 12, 12

(XORing bits once the key has been traversed through once)

Next, encrypt the all-zero 64-bit block with the Blowfish algorithm, using the modified P-array above. Replace P1 with the first 32 bits of the output and P2 with the second 32 bits. Then encrypt that 64-bit output again, this time with the updated subkeys, and replace P3 and P4 with the new block. Continue in this way until every entry of the P-array, and then all four S-boxes in order, have been replaced.
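The subkey-replacement loop can be sketched as follows. To keep the focus on the chaining, the block cipher is passed in as a callback (EncryptFn, a hypothetical type), and each call sees the subkeys as updated so far. Filling 18 P entries and 4 × 256 S-box entries two words at a time takes 9 + 512 = 521 encryptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical subkey container (not the authors' class).
struct Subkeys {
    std::array<std::uint32_t, 18> P{};
    std::array<std::array<std::uint32_t, 256>, 4> S{};
};

// Encrypts one 64-bit block (two 32-bit halves) in place with the given subkeys.
using EncryptFn = std::function<void(const Subkeys&, std::uint32_t&, std::uint32_t&)>;

// Replace every P and S entry, two words at a time, with the output of
// repeatedly encrypting a block that starts as all zeros.
// Returns the number of encryptions performed.
int regenerateSubkeys(Subkeys& keys, const EncryptFn& encrypt) {
    std::uint32_t L = 0, R = 0;
    int calls = 0;
    for (std::size_t i = 0; i < keys.P.size(); i += 2) {
        encrypt(keys, L, R);  // uses the subkeys as modified so far
        ++calls;
        keys.P[i] = L;
        keys.P[i + 1] = R;
    }
    for (auto& box : keys.S) {  // then S1, S2, S3, S4 in order
        for (std::size_t j = 0; j < box.size(); j += 2) {
            encrypt(keys, L, R);
            ++calls;
            box[j] = L;
            box[j + 1] = R;
        }
    }
    return calls;
}
```

With a real Blowfish encryption routine plugged in, calling this after the key has been XORed into the P-array would complete key expansion.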

(Figure: Key-Expansion. Each 64-bit Blowfish output is split into two 32-bit halves that replace the next two entries of the P-array.)

(The second 64-bit block is dropped into the P-array)

The Blowfish algorithm is now ready for encryption. Encryption is a simple Feistel network of 16 rounds. For a 64-bit input x, do:

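Divide x into two 32-bit halves: xL, xR
For i = 1 to 16:
    xL = xL XOR Pi
    xR = F(xL) XOR xR
    Swap xL and xR
Next i
Swap xL and xR (undo the last swap)
xR = xR XOR P17
xL = xL XOR P18
Recombine xL and xR

(Figure: Data-Encryption. The 64-bit input is split into two 32-bit halves; in each round, Pi is XORed into the left half, the result is passed through F and XORed into the right half, and the halves are swapped. This is done 16 times.)

(The 16 rounds)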
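The pseudocode above translates almost line for line into C++. This sketch assumes a hypothetical BlowfishKeys struct holding the P-array and S-boxes, with P1 to P18 stored at indices 0 to 17. The round function F is written out so that the example is self-contained:

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical subkey container (not the authors' class).
struct BlowfishKeys {
    std::array<std::uint32_t, 18> P{};
    std::array<std::array<std::uint32_t, 256>, 4> S{};
};

// Round function F; unsigned 32-bit additions wrap modulo 2^32.
std::uint32_t F(const BlowfishKeys& k, std::uint32_t x) {
    return ((k.S[0][x >> 24] + k.S[1][(x >> 16) & 0xFF]) ^ k.S[2][(x >> 8) & 0xFF])
           + k.S[3][x & 0xFF];
}

// The 16-round Feistel loop, following the pseudocode step by step.
void encryptBlock(const BlowfishKeys& k, std::uint32_t& xL, std::uint32_t& xR) {
    for (int i = 0; i < 16; ++i) {
        xL ^= k.P[i];       // xL = xL XOR Pi
        xR ^= F(k, xL);     // xR = F(xL) XOR xR
        std::swap(xL, xR);  // swap xL and xR
    }
    std::swap(xL, xR);      // undo the last swap
    xR ^= k.P[16];          // xR = xR XOR P17
    xL ^= k.P[17];          // xL = xL XOR P18
}
```

As a sanity check, with all subkeys zero F always returns 0, so the output is simply the two input halves swapped.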

The F function is $F(x_L) = \big((S_{1,a} + S_{2,b} \bmod 2^{32}) \oplus S_{3,c}\big) + S_{4,d} \bmod 2^{32}$, where a, b, c and d are the four 8-bit quarters of $x_L$, from most to least significant.

(Figure: Function-F. The 32-bit input is split into four 8-bit quarters that index S1, S2, S3 and S4. Their four 32-bit outputs are combined with addition modulo 2^32 and XOR to give a 32-bit result.)

(The F function)
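Below is a C++ sketch of the F function on its own. The names SBoxes and blowfishF are hypothetical. Because std::uint32_t arithmetic is unsigned, both additions wrap modulo 2^32 without an explicit mod:

```cpp
#include <array>
#include <cstdint>

using SBoxes = std::array<std::array<std::uint32_t, 256>, 4>;  // S1..S4

// F(xL) = ((S1[a] + S2[b]) XOR S3[c]) + S4[d], additions mod 2^32,
// where a is the most significant byte of xL and d the least significant.
std::uint32_t blowfishF(const SBoxes& S, std::uint32_t xL) {
    const std::uint8_t a = static_cast<std::uint8_t>(xL >> 24);
    const std::uint8_t b = static_cast<std::uint8_t>(xL >> 16);
    const std::uint8_t c = static_cast<std::uint8_t>(xL >> 8);
    const std::uint8_t d = static_cast<std::uint8_t>(xL);
    std::uint32_t h = S[0][a] + S[1][b];  // wraps modulo 2^32
    h ^= S[2][c];
    return h + S[3][d];                   // wraps modulo 2^32
}
```

For example, take S1[0x01] = 0xFFFFFFFF, S2[0x02] = 2, S3[0x03] = 0x10 and S4[0x04] = 5. Then F(0x01020304) = ((0xFFFFFFFF + 2 mod 2^32) XOR 0x10) + 5 = (1 XOR 0x10) + 5 = 0x16.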

Decryption is the same as encryption, except that the P-array entries are used in reverse order (P18 first, P1 last).
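Decryption can therefore reuse the encryption network unchanged, with only the order of the P-array reversed. The following is a minimal sketch with hypothetical names (feistel, encryptBlock, decryptBlock). It is not part of our code, which has no decryption function.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <utility>

using PArray = std::array<std::uint32_t, 18>;
using SBoxes = std::array<std::array<std::uint32_t, 256>, 4>;

// Round function F from the specification.
std::uint32_t roundF(const SBoxes& S, std::uint32_t x) {
    return ((S[0][x >> 24] + S[1][(x >> 16) & 0xFF]) ^ S[2][(x >> 8) & 0xFF]) + S[3][x & 0xFF];
}

// The 16-round network; only the order of P differs between directions.
void feistel(const PArray& P, const SBoxes& S, std::uint32_t& xL, std::uint32_t& xR) {
    for (int i = 0; i < 16; ++i) {
        xL ^= P[i];
        xR ^= roundF(S, xL);
        std::swap(xL, xR);
    }
    std::swap(xL, xR);
    xR ^= P[16];
    xL ^= P[17];
}

void encryptBlock(const PArray& P, const SBoxes& S, std::uint32_t& xL, std::uint32_t& xR) {
    feistel(P, S, xL, xR);
}

// Same network with P18 used first and P1 last.
void decryptBlock(const PArray& P, const SBoxes& S, std::uint32_t& xL, std::uint32_t& xR) {
    PArray reversed = P;
    std::reverse(reversed.begin(), reversed.end());
    feistel(reversed, S, xL, xR);
}
```

A quick sanity check is to fill P and S with arbitrary values, then encrypt a block and decrypt it with the same tables. This should return the original halves.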

Output
Some example input and output of the Blowfish algorithm:

$ ./Blowfish 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Key: 0 0 0 0 0 0 0 0
Plaintext: 0 0 0 0 0 0 0 0
Ciphertext: 4e f9 97 45 61 98 dd 78

$ ./Blowfish FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
Key: ff ff ff ff ff ff ff ff
Plaintext: ff ff ff ff ff ff ff ff
Ciphertext: 51 86 6f d5 b8 5e cb 8a

$ ./Blowfish 37 D0 6B B5 16 CB 75 46 16 4D 5E 40 4F 27 52 32
Key: 37 d0 6b b5 16 cb 75 46
Plaintext: 16 4d 5e 40 4f 27 52 32
Ciphertext: 5f 99 d0 4f 5b 16 39 69

Some example input and output for the file encryption program.

[ Other program IO here]

Design
Our implementation of the Blowfish algorithm was written in C++. We chose this language to avoid the hassle Java creates when dealing with bytes and casting between types, since Java's byte and int types are always signed. In C++, stdint.h provides fixed-width types such as uint8_t and uint32_t, which made writing and debugging the code easier.

The original design followed the specification closely, especially in the encryption section. The encryption method takes an array of 8 bytes (64 bits) and splits it into two 32-bit integers. It then performs the Feistel network computation 16 times in a loop.
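Splitting the 8-byte block into two 32-bit halves might look like this in C++. This is a sketch with hypothetical names (bytesToHalves, halvesToBytes), not our pack32BitWord. It uses big-endian byte order, the order the standard test vectors assume. For example, the plaintext bytes 16 4d 5e 40 4f 27 52 32 give xL = 0x164d5e40 and xR = 0x4f275232.

```cpp
#include <array>
#include <cstdint>

// Split an 8-byte block into two big-endian 32-bit halves (xL, xR).
void bytesToHalves(const std::array<std::uint8_t, 8>& block,
                   std::uint32_t& xL, std::uint32_t& xR) {
    xL = (std::uint32_t{block[0]} << 24) | (std::uint32_t{block[1]} << 16) |
         (std::uint32_t{block[2]} << 8) | std::uint32_t{block[3]};
    xR = (std::uint32_t{block[4]} << 24) | (std::uint32_t{block[5]} << 16) |
         (std::uint32_t{block[6]} << 8) | std::uint32_t{block[7]};
}

// Inverse: write the halves back out as 8 bytes, most significant byte first.
std::array<std::uint8_t, 8> halvesToBytes(std::uint32_t xL, std::uint32_t xR) {
    std::array<std::uint8_t, 8> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint8_t>(xL >> (24 - 8 * i));
        out[4 + i] = static_cast<std::uint8_t>(xR >> (24 - 8 * i));
    }
    return out;
}
```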

(The encryption loop)

However, for key setup our implementation actually computed the digits of Pi using the Bailey-Borwein-Plouffe (BBP) formula. Every time a key was set, the code recalculated the digits of Pi and used them to initialize the P-array and S-boxes. While accurate, this made setting the key very slow.
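The BBP formula, $\pi = \sum_{k \ge 0} 16^{-k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6}\right)$, lets the hexadecimal digits of Pi be computed starting at any position. Below is a C++ sketch of the standard digit-extraction technique, modeled on Bailey's published approach rather than taken from our code. The names modPow16, bbpSeries and hexPiWord are hypothetical.

- modPow16 is the binary-exponentiation step. It is done here in integer arithmetic rather than with fmod.
- Each hexPiWord call re-sums series whose length grows with the digit position. This is why computing all 1042 words on every key change is slow.

```cpp
#include <cmath>
#include <cstdint>

// 16^exp mod m by binary (square-and-multiply) exponentiation.
std::uint64_t modPow16(std::uint64_t exp, std::uint64_t m) {
    if (m == 1) return 0;
    std::uint64_t result = 1, base = 16 % m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Fractional part of sum over k of 16^(n-k) / (8k + j).
double bbpSeries(int j, std::uint64_t n) {
    double s = 0.0;
    for (std::uint64_t k = 0; k <= n; ++k) {  // terms with 16^(n-k) >= 1: modular arithmetic
        const std::uint64_t denom = 8 * k + j;
        s += static_cast<double>(modPow16(n - k, denom)) / static_cast<double>(denom);
        s -= std::floor(s);
    }
    for (std::uint64_t k = n + 1;; ++k) {     // tail: each term shrinks by about 16x
        const double term = std::pow(16.0, static_cast<double>(n) - static_cast<double>(k)) /
                            static_cast<double>(8 * k + j);
        if (term < 1e-17) break;
        s += term;
        s -= std::floor(s);
    }
    return s;
}

// 32-bit word i of Pi's hexadecimal fraction (hex digits 8i+1 .. 8i+8).
std::uint32_t hexPiWord(std::uint64_t i) {
    const std::uint64_t n = 8 * i;
    double x = 4.0 * bbpSeries(1, n) - 2.0 * bbpSeries(4, n) - bbpSeries(5, n) - bbpSeries(6, n);
    x -= std::floor(x);  // bring into [0, 1)
    std::uint32_t word = 0;
    for (int d = 0; d < 8; ++d) {
        x *= 16.0;
        const int digit = static_cast<int>(x);
        word = (word << 4) | static_cast<std::uint32_t>(digit);
        x -= digit;
    }
    return word;
}
```

hexPiWord(0) and hexPiWord(1) give 0x243f6a88 and 0x85a308d3, the first two values of the HexPi array. Double precision is generally sufficient at these positions, but a production generator would still verify its output against published tables.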

Profiling
The raw dump of gprof is:

% Time   Cumulative s   Self s   Calls      Total s/call   Name
84.75    4.08           4.08     17355552   0              BinaryExp
12.08    4.66           0.58     4168       0              series
3.02     4.80           0.15     17354511   0              std::pow
0.31     4.82           0.02     1          0.02           BlockCipher
0        4.82           0        8368       0              F
0        4.82           0        2088       0              pack32BitWord
0        4.82           0        1042       0              computeHexPi
0        4.82           0        523        0              encrypt
0        4.82           0        63         0              blockSize
0        4.82           0        20         0              keySize
0        4.82           0        12         0              std::operator&
0        4.82           0        6          0              std::setf
0        4.82           0        6          0              std::operator&=
0        4.82           0        6          0              std::operator~
0        4.82           0        6          0              std::operator|=
0        4.82           0        6          0              std::operator|
0        4.82           0        3          0              print_uint8_hex
0        4.82           0        1          0              global constructors
0        4.82           0        1          0              static_init
0        4.82           0        1          4.80           setKey
0        4.82           0        1          0.02           BlowFish

Image of callgrind information.

Analysis
The gprof profiling tool gave a less in-depth view than callgrind, but it was an excellent place to start. It reported that 84.75% of the program's time was spent in the BinaryExp() method. This method performs binary exponentiation, and its cost grows the further into Pi's digits we compute. The total time spent in this method was 4.08 of the 4.82 seconds, an absurd amount.

The next method, series, took 12.08% of the time and is also used in the calculation of Pi. It calls the pow function from the C math library (std::pow), which ranks third in the gprof output.

Finally, ranked at number 5, we have the F function, which is used in encryption. The encryption method is down at number 8, with very little time being used in it.

The callgrind stack trace confirms this while also going deeper into the library calls. BinaryExp calls fmod, which ranks first, followed by BinaryExp itself, confirming gprof's information. Looking through the rankings, the two profilers agree on which methods consume the most time.

From here, the best choice is to look at the Pi generation. Why are we generating Pi at all? We need to set the P-array and S-boxes to the same string every time a new key is set. However, regenerating it each time makes no sense, because the string never changes. To speed up the algorithm, we can generate the digits of Pi once, store them in an external file, and reference that file. The arrays can then be set to these static digits without any generation, speeding up the code.

With all of the Pi generation code gone, the most expensive methods should be the F function and the encrypt function.

Re-Design
We noticed that the majority of the time was spent calculating the first 8336 hexadecimal digits of Pi (4168 bytes). To combat this, we moved the Pi generation code out of the main Blowfish file and into its own executable. When run, this executable outputs a syntactically correct header file that defines the values of Pi in a class called HexPi. The main Blowfish code includes this header to access the class, which led to a large speedup.

Pi[0] = 0x243f6a88;
Pi[1] = 0x85a308d3;
Pi[2] = 0x13198a2e;
Pi[3] = 0x3707344;
...
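A generator in the spirit of the one described might emit the header text as shown below. This is only a sketch, with several assumptions:

- makeHexPiHeader is a hypothetical name.
- The exact header layout is a guess.
- A real generator would compute the words with a BBP routine instead of receiving them as an argument.

The sketch zero-pads each word to 8 hex digits. The sample above prints 0x3707344 without padding, which is the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generator: build a C++ header defining a HexPi class whose
// constructor fills the Pi array with the given 32-bit words.
std::string makeHexPiHeader(const std::vector<std::uint32_t>& words) {
    std::ostringstream out;
    out << "#pragma once\n#include <cstdint>\n\n"
        << "class HexPi {\npublic:\n"
        << "    std::uint32_t Pi[" << words.size() << "];\n"
        << "    HexPi() {\n";
    for (std::size_t i = 0; i < words.size(); ++i) {
        out << "        Pi[" << std::dec << i << "] = 0x"
            << std::hex << std::setw(8) << std::setfill('0') << words[i] << ";\n";
    }
    out << "    }\n};\n";
    return out.str();
}
```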

The array holds 1042 32-bit words of Pi (18 for the P-array plus 4 × 256 for the S-boxes), which can be accessed during the key-expansion part of the algorithm. The digits of Pi can simply be assigned with:

pArray[i] = hexPi.Pi[i];
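Filling all of the subkeys from such an array is a plain copy. This sketch assumes the 1042 words are available as an ordinary array, and initSubkeysFromPi is a hypothetical helper. The first 18 words go to the P-array, and the remaining 1024 go to S1 through S4, 256 words each, which is the order the standard algorithm uses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: words 0..17 -> P1..P18, then 256 words each to S1..S4.
void initSubkeysFromPi(const std::uint32_t (&pi)[1042],
                       std::array<std::uint32_t, 18>& P,
                       std::array<std::array<std::uint32_t, 256>, 4>& S) {
    for (std::size_t i = 0; i < P.size(); ++i) {
        P[i] = pi[i];  // same idea as pArray[i] = hexPi.Pi[i];
    }
    for (std::size_t box = 0; box < 4; ++box) {
        for (std::size_t j = 0; j < 256; ++j) {
            S[box][j] = pi[18 + 256 * box + j];
        }
    }
}
```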

Once the Pi generation has been separated out,

Profiling (Again)
After re-designing our code, we ran gprof and kcachegrind again.

% Time   Cumulative s   Self s   Calls      Total s/call   Name
0        0.00           0.00     8352       0.00           F
0        0.00           0.00     2086       0.00           pack32BitWord
0        0.00           0.00     522        0.00           encrypt
0        0.00           0.00     21         0.00           keySize
0        0.00           0.00     13         0.00           BlockSize
0        0.00           0.00     12         0.00           operator&
0        0.00           0.00     6          0.00           std::setf
0        0.00           0.00     6          0.00           operator&=
0        0.00           0.00     6          0.00           operator~
0        0.00           0.00     6          0.00           operator|=
0        0.00           0.00     6          0.00           operator|
0        0.00           0.00     3          0.00           print_uint8_hex
0        0.00           0.00     1          0.00           BlockCipher
0        0.00           0.00     1          0.00           HexPi
0        0.00           0.00     1          0.00           setKey
0        0.00           0.00     1          0.00           BlowFish

kcachegrind:

Again, kcachegrind goes into more detail and examines some of the library calls. However, callgrind reports that the encrypt method is more expensive than the pack32BitWord method, whereas gprof lists those two the other way around. In both profiles, the two are near the top of the list.

Analysis
The gprof analysis was less useful than we had hoped. Every function now shows 0.00 seconds, so gprof confirms that the program is faster, but its timing resolution is too coarse to show how much faster. Still, it is safe to say that the changes we made sped up the program.

Callgrind goes into more detail. The first of our own methods in its list is the F function, which is called only from the encrypt method. F splits its input into four 8-bit quarters, uses each one to index an S-box, and then runs the arithmetic. This method is short and rather efficient. To make it faster we could unroll the S-boxes, but with each one 256 entries long and array indexing being extremely convenient, it didn't seem worth it.

Second, the encrypt method follows the specification, but to speed it up we unrolled the loop that runs the Feistel network. We also unrolled the P-array to work with this, editing the key-expansion code as well.

As the profiling results show, these changes drastically sped up the algorithm.
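Unrolling the 16 rounds removes both the loop counter and the swaps. The two halves simply take turns being updated, and each line combines one round's F output with the next round's P entry. Below is a hedged sketch of the technique, with the straightforward loop kept alongside for comparison. The names are hypothetical, and this is not necessarily how the speed-improvements branch is written.

```cpp
#include <array>
#include <cstdint>
#include <utility>

using PArray = std::array<std::uint32_t, 18>;
using SBoxes = std::array<std::array<std::uint32_t, 256>, 4>;

inline std::uint32_t F(const SBoxes& S, std::uint32_t x) {
    return ((S[0][x >> 24] + S[1][(x >> 16) & 0xFF]) ^ S[2][(x >> 8) & 0xFF]) + S[3][x & 0xFF];
}

// Reference version: the loop exactly as in the pseudocode.
void encryptLoop(const PArray& P, const SBoxes& S, std::uint32_t& xL, std::uint32_t& xR) {
    for (int i = 0; i < 16; ++i) {
        xL ^= P[i];
        xR ^= F(S, xL);
        std::swap(xL, xR);
    }
    std::swap(xL, xR);
    xR ^= P[16];
    xL ^= P[17];
}

// Unrolled version: no counter and no swaps; L and R alternate roles.
void encryptUnrolled(const PArray& P, const SBoxes& S, std::uint32_t& xL, std::uint32_t& xR) {
    std::uint32_t L = xL ^ P[0], R = xR;
    R ^= F(S, L) ^ P[1];   L ^= F(S, R) ^ P[2];
    R ^= F(S, L) ^ P[3];   L ^= F(S, R) ^ P[4];
    R ^= F(S, L) ^ P[5];   L ^= F(S, R) ^ P[6];
    R ^= F(S, L) ^ P[7];   L ^= F(S, R) ^ P[8];
    R ^= F(S, L) ^ P[9];   L ^= F(S, R) ^ P[10];
    R ^= F(S, L) ^ P[11];  L ^= F(S, R) ^ P[12];
    R ^= F(S, L) ^ P[13];  L ^= F(S, R) ^ P[14];
    R ^= F(S, L) ^ P[15];  L ^= F(S, R) ^ P[16];  // 16th F call, plus P17
    xL = R ^ P[17];  // halves come out swapped; P18 goes into the new left half
    xR = L;
}
```

Comparing the two functions on the same inputs is a cheap way to confirm that unrolling did not change the cipher's output.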

Developer Manual
The Blowfish project is hosted on GitHub at https://github.com/Wayfarer247/Blowfish482 and can be cloned using Git.

$ git clone https://github.com/Wayfarer247/Blowfish482

The repository has two branches: master, the original branch without any speed improvements, and speed-improvements, which has the faster version of the algorithm. To switch between the two:

$ git checkout master
or
$ git checkout speed-improvements

Once a branch has been chosen, build the project:

$ make

The makefile in the directory compiles the project into three executables. Blowfish is the main program. Profile runs the encryption algorithm N times with an all-zero key and all-zero plaintext, then prints the total running time. Crypter reads in a file, encrypts it with the provided key, and saves the encrypted file.
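The core of a timing driver like Profile, which runs the encryption N times and reports the total time, can be sketched with std::chrono. Here timeIterations is a hypothetical helper, and the callback would wrap the all-zero encryption. In a real benchmark the encryption output should also be consumed, so that the compiler cannot optimize the work away.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper: run `work` the given number of times and return
// the elapsed wall-clock time in seconds.
double timeIterations(std::uint64_t iterations, const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```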

User Manual
Our Blowfish implementation is entirely command-line based. The program will probably not run on Windows unless gcc (g++) is installed. Once it has been compiled (see Developer Manual), it can be run:

$ ./Blowfish

On the master branch, this runs the program. On the speed-improvements branch (the final version), the key and plaintext must be passed as command-line arguments:

$ ./Blowfish 00 00 00 00 00 00 00 00 11 11 11 11 11 11 11 11

Here, the first eight bytes (sixteen hex zeros) are the key and the last eight bytes (sixteen hex ones) are the plaintext, both entered as hex.
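Arguments such as 37 D0 6B could be turned into bytes as shown below. This is a sketch with a hypothetical name (parseHexBytes), not our argument handling. It rejects anything that is not a single hex byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser: convert arguments like "37" or "D0" into bytes.
std::vector<std::uint8_t> parseHexBytes(const std::vector<std::string>& args) {
    std::vector<std::uint8_t> bytes;
    for (const std::string& arg : args) {
        std::size_t used = 0;
        const unsigned long value = std::stoul(arg, &used, 16);  // throws on non-hex input
        if (used != arg.size() || value > 0xFF) {
            throw std::invalid_argument("not a single hex byte: " + arg);
        }
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```

On the speed-improvements command line, the first eight parsed bytes would form the key and the next eight the plaintext.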

To use profile, do:

$ ./Profile <Number of Iterations>
e.g. $ ./Profile 100000

To use crypter, do:

$ ./Crypter <Input File> <Output File> <Key bytes>
e.g. (key size: 8 bytes) $ ./Crypter pt.txt ct.txt 00 00 00 00 00 00 00 00

Discussion
The choice of C++ over Java had interesting repercussions. C++ was chosen because of how it handles bit operations and how it stores bytes, integers, and long variables. Its fixed-width types are unsigned and exactly the sizes we needed, so operating on them did not involve any Java witchcraft.

However, it did bring up some issues with pointers and how variables are stored. We finally got all these issues ironed out, but it took quite a bit longer and probably would have gone faster had it been done in Java, which we are more comfortable with.

When we started writing the algorithm, we divided up the parts to work on, but didn’t spend too much time designing how the algorithm overall would work. Because of this, the key-expansion part generated Pi from scratch, rather than simply using a static variable. While it worked, the effort required to generate Pi could have been better used elsewhere.

Of course, this did mean that we learned how to generate Pi effectively.

If we had done some planning and design beforehand, we could also have written the algorithm in a way that made later optimization easier. Since we didn't, the algorithm was a good example of an unoptimized implementation, but optimizing it later took much more work.

Future Work
To further improve the Blowfish implementation, we could unroll all the loops to remove the overhead of loop control and the associated counter variables. Further, parts of the program, such as the F function, could be rewritten in a lower-level language such as C or assembly instead of C++. This could decrease the running time, at the cost of the implementation's portability (particularly with assembly). We could also add a decryption function to decrypt the encrypted bytes with a given key.

Work
Ethan Mick
• Presentation 1
• Paper
• Presentation 2
• Code maintenance

Kevin Allison
• Pi Generation
• Static Pi Generation
• Key Expansion
• Paper
• Code maintenance

Keith Feldman
• Encryption Method
• Main Method
• Usage
• Speed-Improvements
• Code maintenance

References

Schneier, Bruce. "Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish)." Blowfish Paper. 1993. Web. 18 Mar. 2012. <http://www.schneier.com/paper-blowfish-fse.html>.

Morgan, Mike. "Blowfish Bug." Schneier on Security. 8 July 1996. Web. 18 Mar. 2012. <http://www.schneier.com/blowfish-bug.txt>.

Schneier, Bruce. "The Blowfish Encryption Algorithm." Blowfish. 1993. Web. 18 Mar. 2012. <http://www.schneier.com/blowfish.html>.

"Standard Cryptographic Algorithm Naming." Zetnet Meta Refresh. Web. 18 Mar. 2012. <http://www.users.zetnet.co.uk/hopwood/crypto/scan/cs.html>.

