A Robust File Encryption Approach Using the MinHash Technique
Mehdi Ebady Manaa¹ and Rasha Hussein Jwdha¹
¹College of Information Technology, University of Babylon, Babylon, Iraq.
Abstract
With the fast development of network applications and big data, encryption and key generation techniques play an important role in protecting the sensitive information of large organizations that have access to the Internet. Information security has become an important issue and a source of concern because of the growth of information in areas such as social media, e-commerce, and banking. Encryption protects information against various attacks by means of cryptographic algorithms. This paper aims to overcome weaknesses in generating the cipher keys for these algorithms and to produce a robust block cipher using one of the principles of the MinHash technique from the data mining field. The block keys are generated using k-shingles, which are mainly used in the MinHash technique to convert a text file into sequences of consecutive words; the length of each shingle depends on the value of k. The MinHash technique uses many hash functions to generate the cipher keys, which are then used to encrypt text files with the DES, Triple DES, AES, and Blowfish cipher algorithms. The results, obtained in terms of encryption time, throughput, memory usage, and avalanche effect, show that AES and Blowfish perform excellently, which supports the robustness of the proposed approach against attackers.
Keywords: Minhash algorithm, Cryptography, Encryption algorithms, Network
Security, Symmetric algorithms.
International Journal of Pure and Applied Mathematics, Volume 119, No. 15, 2018, pp. 169-183. ISSN: 1314-3395 (on-line version). URL: http://www.acadpubl.eu/hub/ (Special Issue)
1. Introduction
Cryptography is the art and science of converting essential information into an unintelligible form that cannot be understood by a third party or an attacker. The word cryptography is of Greek origin: crypto means secret and graph means writing. In this science, a message is converted from a readable form into an unreadable form and then sent to the recipient, who is the only party authorized to decrypt the encrypted message back into the original message using the agreed key.
Cryptography is used to meet the security criteria of confidentiality, integrity, and availability, protecting the data of large organizations that have access to the Internet. It also provides authentication and non-repudiation. It draws on different disciplines such as mathematics, electrical engineering, and computer science, and it supports many applications such as computer passwords, ATM cards, message integrity techniques, digital signatures, secure computation, interactive proofs, identity authentication, and electronic commerce. The simplest form of encryption is carried out between two parties that share a key, as depicted in the figure. This form provides secure communication between Bob and Alice against adversaries, hackers, or attackers. Figure (1) shows the classification of modern cipher techniques into two categories:
Figure 1: Classification of the Modern Ciphers Techniques.
1. Symmetric algorithms, which use the same key in the encryption and decryption processes. Their main advantages are that they are easy to use and fast to implement, but they have disadvantages when used in large networks, where a shared secret key must be distributed to every pair of communicating parties. Examples of symmetric-key algorithms are AES, DES, 3DES, and others [1].
2. Asymmetric key encryption (public key encryption), in which each user has a pair of keys: a public key and a private key. The private key remains secret, while the public key is shared between the parties. The advantage of this type is that a message encrypted with the public key can only be decrypted with the corresponding private key.

When repetition occurs in the text, a mode of operation is used to hide it. Common modes include ECB, CBC, and OFB, although ECB encrypts identical plaintext blocks into identical ciphertext blocks and therefore does not hide repetition. Some modes require an initialization vector (IV), a binary sequence that is unique for each encryption operation. The IV ensures that distinct ciphertexts are produced even when the same plaintext is encrypted multiple times independently with the same key [2].
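The large-network drawback of symmetric algorithms noted in item 1 can be made concrete by counting keys. If every pair among n parties needs its own shared secret key, n(n-1)/2 keys are required, whereas with public-key encryption each party needs only one key pair (2n keys in total). The short Python sketch below computes both counts; the function names are illustrative and not from the paper.

```python
def symmetric_keys_needed(n: int) -> int:
    """Shared secret keys needed so that every pair of n parties has its own key."""
    return n * (n - 1) // 2


def public_keys_needed(n: int) -> int:
    """Keys needed when each of n parties holds one public/private key pair."""
    return 2 * n


for n in (10, 100, 1000):
    print(n, symmetric_keys_needed(n), public_keys_needed(n))
```

For 1,000 parties this gives 499,500 shared secret keys versus 2,000 keys, which is why key distribution becomes the limiting factor for purely symmetric schemes in large networks.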
In this paper, we use block ciphers with keys based on k-shingles and the MinHash technique to encrypt and decrypt text files that contain the sensitive information of large organizations. K-shingling converts a text file into consecutive tokens whose length depends on the value of k. The MinHash technique is then used to generate many keys from the shingles, forming groups of keys. These keys serve as the cipher keys for the symmetric algorithms used in this paper: AES, DES, 3DES, and Blowfish. The remainder of this paper is organized as follows. Section (2) presents the related work. Section (3) gives background on the cipher methods. Section (4) describes the key generation method. Section (5) illustrates the proposed system. The results and discussion are presented in Section (6). Finally, Section (7) concludes the paper.
2. Related Work
This section reviews the works most closely related to ours. Many of them use symmetric algorithms to encrypt sensitive data and then use the same key to decrypt it back to its original form.
A cost and performance evaluation of cryptographic algorithms in terms of encryption time, memory used, avalanche effect, entropy, and number of bits required for optimal encoding is proposed in [3]. The authors used the DES, 3DES, AES, RSA, and Blowfish algorithms, with keys generated by the "KeyGenerator" object from the Java security and Java crypto packages. Their experiments concluded that Blowfish is the most appropriate algorithm regarding memory used and encryption time, DES is the best in bandwidth, and AES is the best in cryptographic strength.
AES (Rijndael), Triple DES, DES, RC6, and Blowfish are compared under different settings in [4], using evaluation parameters such as data size, data type, encryption and decryption time, key size, and power consumption. The simulation results give a comprehensive evaluation of each algorithm. In the same direction, the paper in [5] applied DES, 3DES, AES, and Blowfish in ECB (Electronic Codebook) mode and measured their encryption times in seconds for input files of varying sizes. The results showed that Blowfish performed best among the algorithms.
The evaluation of symmetric and asymmetric algorithms such as DES, Blowfish, AES, and RSA is presented in [6]. The algorithms are compared using parameters such as encryption/decryption time and throughput for different types and sizes of data, and the study shows that AES is the best in terms of encryption/decryption time and throughput. In addition, the work in [7] evaluates the AES and DES cipher algorithms using processing time, CPU usage, and throughput on two platforms, a laptop with a 2.5 GHz Core i5 CPU running Windows 7 and a Mac platform, for different file sizes.
A modified Blowfish algorithm is designed and implemented in [8] for networking and communication applications, to enhance network security, and for defence applications. The authors use a single Blowfish round instead of many rounds and evaluate the work on the Xilinx ISE platform using the VHDL language. In addition, the Blowfish and Skipjack algorithms are compared by encrypting input files of varying contents and sizes, and the results show that Blowfish is the best performing algorithm [9].
The AES algorithm is implemented on five different execution platforms by D. F. García in [11]. The main objective of this work is to provide a model for the configuration parameters of AES implementations. In another work, Gaurav and Aparna [12] compare various encryption algorithms (AES, Blowfish, Twofish, DES, RSA, and Diffie-Hellman) based on different parameters to choose the best data encryption algorithm. Simulation results are given to show the effectiveness of each algorithm in terms of encryption and decryption time, encryption speed, and throughput.
3. The Block Cipher Algorithms
The block cipher algorithms used in this work are presented in the following subsections. The work is implemented and analyzed in Java. Although the Java compiler produces bytecode that is executed by the Java Virtual Machine (JVM) rather than being compiled directly to native machine code, the cipher algorithms show good results in terms of many performance metrics when used with the MinHash-based keys.
3.1 Advanced Encryption Standard (AES)
AES, originally known as Rijndael, is a symmetric-key block cipher developed by Joan Daemen and Vincent Rijmen. Its block size is 128 bits and it supports three key sizes: 128, 192, and 256 bits. AES uses 10 rounds with a 128-bit key, and 12 and 14 rounds with 192-bit and 256-bit keys, respectively. AES uses key expansion to derive the round keys from the cipher key; the key expansion is performed before both the encryption and the decryption processes. AES was adopted as the successor to DES and 3DES [12].
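The key expansion step can be illustrated with a minimal, unoptimized Python sketch of the standard FIPS-197 key schedule. It is not the authors' Java code (which relied on Java's built-in AES); it derives the S-box arithmetically instead of hard-coding it and returns one 16-byte round key per round plus one, that is 11, 13, or 15 round keys for 128-, 192-, or 256-bit keys.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _sbox_entry(x: int) -> int:
    # Multiplicative inverse in GF(2^8) as x^254, with 0 mapped to 0.
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
    # Affine transformation: b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63.
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]
ROUNDS = {16: 10, 24: 12, 32: 14}  # key length in bytes -> number of rounds


def expand_key(key: bytes) -> list:
    """Return the AES round keys: Nr + 1 keys of 16 bytes each."""
    if len(key) not in ROUNDS:
        raise ValueError("AES keys must be 16, 24 or 32 bytes")
    nk, nr = len(key) // 4, ROUNDS[len(key)]
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord
            temp[0] ^= rcon                   # add the round constant
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]    # extra SubWord for 256-bit keys
        words.append([w ^ t for w, t in zip(words[i - nk], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(nr + 1)]


round_keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(round_keys), round_keys[-1].hex())
```

The sample key is the AES-128 example key from FIPS-197, so the printed last round key can be checked against that standard.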
3.2 Data Encryption Standard (DES)
DES was published by the U.S. National Bureau of Standards (now the National Institute of Standards and Technology, NIST) as the first encryption standard. It was designed by IBM based on its Lucifer cipher and is a block cipher with a 64-bit block and a 64-bit key. The NSA restricted the effective key length to 56 bits, so 8 of the 64 key bits are used as parity bits and discarded. DES is flexible because it can be used with different modes of operation such as OFB and CBC.
The main drawback of DES is its 56-bit key length. In 1998 the EFF's DES Cracker recovered a DES key by brute force, and in 1999 the DES Cracker, working together with about 100,000 distributed PCs on the Internet, broke a DES key in about 22 hours. For this reason, DES was extended to 3DES, which applies DES three times to increase the strength of the algorithm at the cost of speed [13].
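The relation between the 64-bit DES key and its 56 effective bits follows the DES parity convention: each key byte carries 7 key bits plus one odd-parity bit in its least significant position. The hypothetical helpers below are a minimal Python sketch (not from the paper) that spreads 56 key bits over 8 bytes with odd parity and strips the parity bits again.

```python
def add_des_parity(key56: bytes) -> bytes:
    """Expand 7 bytes (56 key bits) to an 8-byte DES key with odd-parity bits."""
    if len(key56) != 7:
        raise ValueError("expected 7 bytes (56 bits) of key material")
    bits = int.from_bytes(key56, "big")
    out = bytearray()
    for i in range(8):
        seven = (bits >> (49 - 7 * i)) & 0x7F      # next 7 key bits, most significant first
        parity = 0 if bin(seven).count("1") % 2 else 1
        out.append((seven << 1) | parity)          # low bit makes the byte's 1-count odd
    return bytes(out)


def strip_des_parity(key64: bytes) -> bytes:
    """Drop the parity bit of each byte, recovering the 56 effective key bits."""
    bits = 0
    for byte in key64:
        bits = (bits << 7) | (byte >> 1)
    return bits.to_bytes(7, "big")


key = add_des_parity(bytes.fromhex("123456789abcde"))
print(key.hex(), strip_des_parity(key).hex())
```

Because only 56 bits carry key material, an exhaustive search covers 2**56 (about 7.2 × 10^16) keys, which is what made the brute-force attacks described above feasible.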
3.3 Triple Data Encryption Standard (3DES)
3DES is an enhanced version of DES and is also a symmetric-key block cipher. It strengthens the cipher by applying DES three times (in encrypt-decrypt-encrypt order). The main problem of 3DES is that it is slower than DES. Its main advantage is the increased key size, which makes the encrypted data safer: 3DES uses 112-bit or 168-bit keys on a 64-bit block [12].
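The encrypt-decrypt-encrypt (EDE) composition used by 3DES can be sketched in Python as below. To stay self-contained, the sketch plugs in a toy invertible 64-bit mapping (`toy_encrypt`/`toy_decrypt`, hypothetical and not secure) in place of DES; substituting a real DES block function for the two toy functions gives 3DES. Setting K1 = K2 = K3 collapses EDE to a single encryption, which keeps 3DES backward compatible with DES, and setting K3 = K1 gives the two-key (112-bit) variant.

```python
MASK64 = (1 << 64) - 1
MULT = 0x9E3779B97F4A7C15            # odd constant, hence invertible modulo 2**64
MULT_INV = pow(MULT, -1, 1 << 64)    # modular inverse (Python 3.8+)


def toy_encrypt(key: int, block: int) -> int:
    """Toy stand-in for DES encryption: invertible on 64-bit blocks, NOT secure."""
    return ((block ^ key) * MULT) & MASK64


def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt."""
    return ((block * MULT_INV) & MASK64) ^ key


def ede_encrypt(k1, k2, k3, block, enc=toy_encrypt, dec=toy_decrypt):
    """3DES-style encrypt-decrypt-encrypt: C = E_k3(D_k2(E_k1(P)))."""
    return enc(k3, dec(k2, enc(k1, block)))


def ede_decrypt(k1, k2, k3, block, enc=toy_encrypt, dec=toy_decrypt):
    """Inverse composition: P = D_k1(E_k2(D_k3(C)))."""
    return dec(k1, enc(k2, dec(k3, block)))


k1, k2, k3 = 0x0123456789ABCDEF, 0x23456789ABCDEF01, 0x456789ABCDEF0123
ciphertext = ede_encrypt(k1, k2, k3, 0x4E6F772069732074)
print(hex(ciphertext), hex(ede_decrypt(k1, k2, k3, ciphertext)))
```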
3.4 Blowfish Block Cipher Algorithm
Blowfish is a symmetric-key block cipher algorithm with a 64-bit block size and a variable key size from 32 to 448 bits. It was introduced by Bruce Schneier in 1993. The details of this algorithm can be found in [3].
4. Key Generation Method
The method used to generate the keys in this work is described in the following subsections.
4.1 K-Shingle
K-shingling is heavily used in document similarity. In this technique, a document is split into a set of tokens depending on the length k. For example, consider a document containing the string "The weather is nice and the sky is blue". If we choose k = 3, the number of generated tokens is n - k + 1, where n is the total number of words in the document and k is the shingle length (here n = 9, giving 9 - 3 + 1 = 7 tokens). The generated tokens are {"The weather is", "weather is nice", "is nice and", "nice and the", "and the sky", "the sky is", "sky is blue"}. In this paper, we split the text files into shingles based on the length k and then apply the hash functions (MD5, SHA-512, and SHA-1) to each shingle. The MinHash technique is then applied to the hashed tokens.
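The shingling example above can be reproduced with a short Python sketch. It builds word-level shingles as in the example and hashes a shingle with any of the three hash functions named in the text through the standard `hashlib` module; the function names are illustrative, not taken from the paper.

```python
import hashlib


def k_shingles(text: str, k: int) -> list:
    """Word-level k-shingles: every run of k consecutive words."""
    if k < 1:
        raise ValueError("k must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def hash_shingle(shingle: str, algorithm: str = "sha512") -> str:
    """Hex digest of a shingle with 'md5', 'sha1' or 'sha512'."""
    return hashlib.new(algorithm, shingle.encode("utf-8")).hexdigest()


shingles = k_shingles("The weather is nice and the sky is blue", 3)
print(len(shingles), shingles)
print(hash_shingle(shingles[0], "md5"))
```

For the nine-word example and k = 3, the list has 9 - 3 + 1 = 7 entries, matching the tokens listed above.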
4.2. MinHash Function
The principle of the MinHash technique is applied in this work. The general form of the hash function is given in equation (1):

$$h(x) = (a \cdot x + b) \bmod c \qquad (1)$$

where a and b are two random values, x is the hash value of a token, and c is a prime number greater than the maximum value of x [15]. For example, suppose we use 4 randomly generated hash functions, given in equations (2) to (5), for a file of size ( ) containing the text ( ):
… (2)
These hash functions are used to calculate the value of each hashed token of the file text, as depicted in Table (1).
Table 1: The proposed MinHash work
#   Hashed shingle (value)   Hash_1   Hash_2   Hash_3   Hash_4
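Equation (1) and the per-shingle layout of Table 1 can be sketched in Python as follows. Following Section 4.1, each shingle is hashed with SHA-512 and the digest is read as the integer x. As the prime c we assume the Mersenne prime 2^521 - 1, which exceeds every 512-bit SHA-512 value, as equation (1) requires. The random a and b values, the seed, and the function names are illustrative assumptions. `hash_table` produces one row per shingle and one column per hash function, and `minhash_signature` takes the minimum of each column, which is the standard MinHash signature [15].

```python
import hashlib
import random

C = 2**521 - 1  # assumed prime c: a Mersenne prime larger than any SHA-512 value


def shingle_value(shingle: str) -> int:
    """x in equation (1): the SHA-512 digest of a shingle read as an integer."""
    return int.from_bytes(hashlib.sha512(shingle.encode("utf-8")).digest(), "big")


def make_hash_functions(count: int, seed=None) -> list:
    """Draw random (a, b) pairs for h(x) = (a*x + b) mod c."""
    rng = random.Random(seed)
    return [(rng.randrange(1, C), rng.randrange(0, C)) for _ in range(count)]


def hash_table(shingles, funcs) -> list:
    """One row per shingle, one column per hash function (cf. Table 1)."""
    return [[(a * shingle_value(s) + b) % C for a, b in funcs] for s in shingles]


def minhash_signature(shingles, funcs) -> list:
    """Column-wise minimum of the hash table: the MinHash signature."""
    rows = hash_table(shingles, funcs)
    return [min(column) for column in zip(*rows)]


funcs = make_hash_functions(4, seed=1)   # four functions, as in the example above
shingles = ["The weather is", "weather is nice", "is nice and"]
print(minhash_signature(shingles, funcs))
```

Because the signature is a column-wise minimum, it depends only on the set of shingles, not on their order.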
5. The Proposed System
In this work, we have implemented and compared four block cipher algorithms: DES, 3DES, AES, and Blowfish. The algorithms are implemented in Java and tested on files of different sizes. The keys are generated using k-shingles and MinHash. The performance of the algorithms is evaluated in terms of encryption time, memory usage, entropy, throughput, and avalanche effect. The system results are explained in detail in the following subsections. The proposed system is illustrated in figure (XXXXXXXXXXXX).
5.1 The key generation phase
In this phase, k-shingling, a hash function, and the MinHash technique are used. K-shingling splits the input data into substrings according to the specified length k. Punctuation marks and repeated spaces are removed before the shingling process is applied. The SHA-512 hash function is applied to each generated shingle, and the MinHash function is then applied to each hashed shingle. In this work, we choose 10 hash functions and test the results on text files such as .pdf, .docx, .doc, and .xls files of different sizes. The values of a and b are chosen randomly. The value of x represents the minimum value obtained after applying the 10 hash functions with randomly chosen a and b values. The output of this stage is many keys for each shingle, which are later used in the cipher process. Figure 2 illustrates the main steps of the key generation phase.
Figure 2: The key generation phase for the proposed system
[Figure stages: input files; choose k value; preprocess text (remove punctuation and double spaces); hash the shingles; apply the MinHash technique; encryption phase; decryption phase]
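The key generation steps of Section 5.1 can be strung together in a Python sketch. Several details are not specified in the text and are assumptions here: shingles are taken over words (as in the Section 4.1 example), each shingle's key material is the minimum of its 10 hash values (our simplified reading of "the minimum value after applying the 10 hash functions", producing one key per shingle), and that value is turned into a key by keeping its low-order bytes (16 bytes here, an AES-128-sized key). The prime c = 2^521 - 1 and all function names are hypothetical.

```python
import hashlib
import random
import re

C = 2**521 - 1  # assumed prime, larger than any SHA-512 value


def preprocess(text: str) -> str:
    """Remove punctuation and collapse repeated whitespace."""
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def k_shingles(text: str, k: int) -> list:
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def generate_keys(text: str, k: int = 5, num_hashes: int = 10,
                  key_len: int = 16, seed=None) -> list:
    """Return one key of key_len bytes per shingle (assumed key derivation)."""
    rng = random.Random(seed)
    funcs = [(rng.randrange(1, C), rng.randrange(0, C)) for _ in range(num_hashes)]
    keys = []
    for shingle in k_shingles(preprocess(text), k):
        x = int.from_bytes(hashlib.sha512(shingle.encode("utf-8")).digest(), "big")
        m = min((a * x + b) % C for a, b in funcs)     # minimum over the hash functions
        keys.append(m.to_bytes(66, "big")[-key_len:])  # low-order bytes as the key
    return keys


for key in generate_keys("The weather is nice, and the sky is blue.", k=5, seed=7):
    print(key.hex())
```

The key length would be 8 bytes for DES, 24 for 3DES, 16, 24, or 32 for AES, and 4 to 56 for Blowfish; the default k = 5 mirrors the setting of Table 2.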
5.2. Encryption/Decryption Phase
The keys generated in the previous phase are applied to the text blocks: key 1 is applied to block 1, key 2 to block 2, and so on. In decryption, the same keys are applied to the encrypted blocks to convert them back to their original form.
Figure 3:
The main pseudocode steps of the proposed system are illustrated in Algorithm (1).
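The block-to-key assignment of Section 5.2 (key 1 for block 1, key 2 for block 2, and so on) can be sketched as follows. The sketch only pairs blocks with keys; the block encryption itself would be done by DES, 3DES, AES, or Blowfish. Two details are assumptions not stated in the text: the last block is padded with PKCS#7 padding, and keys are reused cyclically when there are more blocks than keys. The block size is 8 bytes for DES, 3DES, and Blowfish and 16 bytes for AES.

```python
def pkcs7_pad(data: bytes, block_size: int) -> bytes:
    """Append n bytes of value n, where n is 1..block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes) -> bytes:
    """Remove PKCS#7 padding, rejecting malformed input."""
    n = data[-1]
    if n < 1 or n > len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


def pair_blocks_with_keys(plaintext: bytes, keys: list, block_size: int) -> list:
    """Split padded plaintext into blocks; block i gets keys[i % len(keys)]."""
    if not keys:
        raise ValueError("at least one key is required")
    padded = pkcs7_pad(plaintext, block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    return [(block, keys[i % len(keys)]) for i, block in enumerate(blocks)]


for block, key in pair_blocks_with_keys(b"A" * 20, [b"key-1", b"key-2"], 8):
    print(block, key)
```

In decryption, the same pairing is applied to the ciphertext blocks, and the padding is removed after the last block is decrypted.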
6. Results and Discussion
The main criteria used to evaluate the algorithms implemented in this work (DES, Triple DES, AES, and Blowfish) are encryption time, memory usage, entropy, throughput, and avalanche effect.
6.1 Encryption Time
Encryption time is the time an algorithm needs to convert plaintext into ciphertext, and it depends on the key size and the data block size. The less time an algorithm takes, the better suited it is for embedding in other applications such as e-commerce, banking, and online transaction processing. In this work, we measure the time in milliseconds for input files of different sizes in bytes.
Table 2: Encryption time (ms) when K = 5
Input size (bytes)   DES          Triple DES   AES    Blowfish
1,164                408510       6104568      143    80
2,492                1930512      78119334     218    192
12,571               90647150     271941450    648    404
114                  8882         11429        19     17
24,490               181294300    543882900    1159   464
Figure 4: Encryption time (ms) versus input file size (bytes) for DES, Triple DES, AES and Blowfish.
As Figure 4 shows, Blowfish and AES take much less time for encryption than DES and 3DES; for the 24,490-byte file, for example, Blowfish needs 464 ms while DES needs 181294300 ms.
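Encryption time as reported in Table 2 can be measured with a simple wall-clock harness. The paper's measurements were made in Java and the timer used is not stated; the Python sketch below is an illustrative analogue that takes any callable `encrypt` (a hypothetical parameter standing for one of the four ciphers) and reports the best of several runs in milliseconds, a common way to reduce noise from other processes.

```python
import time


def encryption_time_ms(encrypt, data: bytes, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock time of encrypt(data), in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encrypt(data)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


# A trivial stand-in for a cipher call, only to show the harness in use.
print(encryption_time_ms(lambda d: bytes(b ^ 0x5A for b in d), b"x" * 24490))
```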
6.2 Throughput
Throughput is the input size divided by the encryption time:

$$\text{Throughput (bytes/ms)} = \frac{\text{Input size (bytes)}}{\text{Encryption time (ms)}}$$

Table 3 shows the throughput values of the four symmetric block cipher algorithms.
Table 3: Throughput values (bytes/ms) for DES, 3DES, AES and Blowfish
Input size (bytes)   DES       Triple DES   AES        Blowfish
1,164                0.00284   0.000190     8.13986    14.55
2,492                0.00129   0.000031     11.43119   12.97916
12,571               1.38680   0.000046     19.39969   31.11633
114                  0.01306   0.00997      6.10526    6.82352
24,490               0.00013   0.000045     21.13028   52.78017
Figure 5: Throughput (bytes/ms) versus input file size (bytes) for DES, Triple DES, AES and Blowfish.
As Figure 5 shows, Blowfish and AES achieve the highest throughput values.
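Dividing the file sizes in Table 2 by the corresponding encryption times reproduces the Table 3 entries (for example, 1,164 bytes / 80 ms = 14.55 bytes/ms for Blowfish), which confirms that throughput here is measured in bytes per millisecond. A minimal Python sketch of the calculation, using the Table 2 times for the 1,164-byte file:

```python
def throughput(input_size_bytes: int, encryption_time_ms: float) -> float:
    """Throughput in bytes per millisecond."""
    if encryption_time_ms <= 0:
        raise ValueError("encryption time must be positive")
    return input_size_bytes / encryption_time_ms


# Encryption times (ms) for the 1,164-byte file, taken from Table 2.
TIMES_1164 = {"DES": 408510, "Triple DES": 6104568, "AES": 143, "Blowfish": 80}

for name, ms in TIMES_1164.items():
    print(f"{name}: {throughput(1164, ms):.5f} bytes/ms")
```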
6.3 Memory used
Memory usage is the space reserved for running an algorithm, which depends on the number of operations in the algorithm, the key size, the initialization vectors, and the mode of operation. The memory usage results show that Blowfish has low memory consumption, as shown in Table 4.
Table 4: Memory usage (bytes) of the cryptographic algorithms
Input size (bytes)   DES         Triple DES   AES        Blowfish
1,164                4112704     8873136      4603416    4231824
2,492                33792936    201033047    12960288   6498872
12,571               201350208   604050624    78744264   14345040
114                  79310016    977580       576632     0
24,490               300762111   904050622    1708720    21142864
Figure 6: Memory usage (bytes) versus input file size (bytes) for the cryptographic algorithms.
Figure 6 shows the advantage of Blowfish over the other algorithms in this work: Blowfish has the lowest memory consumption for most input sizes, while DES and Triple DES have the highest.
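The paper does not state how memory usage was sampled in Java. As a Python analogue, the standard-library `tracemalloc` module reports the peak memory allocated while a function runs. It tracks only allocations made through Python's allocators, so its numbers are not directly comparable with the JVM figures in Table 4. The helper name is illustrative.

```python
import tracemalloc


def peak_memory_bytes(func, *args) -> int:
    """Peak bytes allocated (as seen by tracemalloc) while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Allocating a 1 MB buffer should register a peak of at least 1,000,000 bytes.
print(peak_memory_bytes(bytearray, 1_000_000))
```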
6.4 Avalanche effect
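The avalanche effect is calculated from the Hamming distance between the ciphertext of the original plaintext and the ciphertext of a plaintext with some bits flipped [3]:

$$HD(C_1, C_2) = \sum_{i=1}^{n} C_{1,i} \oplus C_{2,i}$$

$$\text{Avalanche effect} = \frac{HD(C_1, C_2)}{n} \times 100\%$$

where C_1 and C_2 are the two ciphertexts and n is the number of ciphertext bits. Table 5 shows that AES has the highest values and gives the best results, while DES has the lowest value.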
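The two avalanche formulas translate directly into code. In the Python sketch below, `flip_bits` prepares the modified plaintext, and the ciphertexts of the original and modified plaintexts (encrypted with the same key by one of the four ciphers, not shown) are compared with `avalanche_effect`. The function names are illustrative. For reference, the strict avalanche criterion expects each ciphertext bit to change with probability 1/2 when a single input bit is flipped.

```python
def hamming_distance(c1: bytes, c2: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(c1) != len(c2):
        raise ValueError("ciphertexts must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))


def avalanche_effect(c1: bytes, c2: bytes) -> float:
    """Percentage of ciphertext bits that changed."""
    return 100.0 * hamming_distance(c1, c2) / (8 * len(c1))


def flip_bits(data: bytes, positions) -> bytes:
    """Copy of data with the given bit positions inverted (0 = MSB of byte 0)."""
    out = bytearray(data)
    for p in positions:
        out[p // 8] ^= 0x80 >> (p % 8)
    return bytes(out)


print(avalanche_effect(bytes(8), b"\xff" * 4 + bytes(4)))
```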
Table 5: Average avalanche effect (%) of the proposed method when flipping bits in the plaintext
No. of flipped bits   DES       3DES     AES      Blowfish
1                     17.1875   98.437   98.437   93.229
2                     17.708    95.312   98.437   95.833
4                     0         98.437   100      93.229
19                    0         97.368   98.437   92.083
35                    33.680    98.958   98.863   93.442
45                    26.988    97.5     96.093   92.329
Average               15.927    97.669   98.378   93.357
The proposed method achieves good results for these cipher algorithms when compared with the results in [16], [17], and [18].
6.5 Entropy Value
Entropy is calculated using Shannon's formula:

$$H = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where p_i is the probability of the i-th symbol and m is the number of distinct symbols. Entropy is a measure of randomness, which indicates the strength of a cipher algorithm when an attacker tries to break it. Table 6 shows that AES and Blowfish, together with Triple DES, have the highest entropy values.
Table 6: Average entropy values for the cipher algorithms
Input size (bytes)   DES        Triple DES   AES        Blowfish
1,164                7.45121    7.45121      7.45121    7.45121
2,492                8.50779    8.50779      8.50779    8.50779
12,571               10.86936   10.87574     10.87574   10.87574
114                  3.96981    4.08746      4.08746    4.08746
24,490               10.54109   11.54109     11.54109   11.54109
7. Conclusion
The strength of a cipher algorithm depends on many factors, such as the strength of the generated key and the key length. This paper proposes a robust key generation method based on the principles of k-shingling and the MinHash technique. The AES, DES, 3DES, and Blowfish algorithms are implemented in this work with keys generated by the MinHash technique, and they are evaluated in terms of encryption time, throughput, memory usage, and avalanche effect. AES and Blowfish outperform the other algorithms, while DES and 3DES give weaker results in encryption time, throughput, and memory consumption because they require much more time to execute. In addition, the results show that Blowfish is more suitable for applications that need speed, and AES is better for applications in which confidentiality and integrity have the highest priority. The entropy values are nearly equal for all algorithms, which indicates the randomness strength of these cipher algorithms.
References
[1] W. Stallings, Cryptography and Network Security: Principles and Practice, 6th ed. Englewood Cliffs, NJ: Prentice-Hall, 2006.
[2] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. 1996.
[3] P. Patil, P. Narayankar, D. G. Narayan, and S. M. Meena, "A Comprehensive Evaluation of Cryptographic Algorithms: DES, 3DES, AES, RSA and Blowfish," Procedia Comput. Sci., vol. 78, pp. 617-624, 2016.
[4] D. S. Abdul Elminaam, H. M. Abdual Kader, and M. M. Hadhoud, "Performance Evaluation of Symmetric Encryption Algorithms," Int. J. Netw. Secur., vol. 8, no. 12, 2009.
[5] P. Patil, P. Narayankar, D. G. Narayan, and S. M. Meena, "A Comprehensive Evaluation of Cryptographic Algorithms: DES, 3DES, AES, RSA and Blowfish," Procedia Comput. Sci., no. September, pp. 84-89.
[6] M. Panda, "Performance Analysis of Encryption Algorithms for Security," pp. 840-844, 2016.
[7] S. D. Rihan and S. E. F. Osman, "A Performance Comparison of Encryption Algorithms AES and DES," Int. J. Eng. Res. Technol., vol. 4, no. 12, pp. 151-154, 2015.
[8] S. Manku and K. Vasanth, "Blowfish Encryption Algorithm for Information Security," ARPN J. Eng. Appl. Sci., vol. 10, no. 10, pp. 4717-4719, 2015.
[9] A. A. Milad, H. Z. M., Z. A. B. M. Noh, and M. A. A., "Comparative Study of Performance in Cryptography Algorithms (Blowfish and Skipjack)," J. Comput. Sci., vol. 8, no. 7, pp. 1191-1197, 2012.
[10] T. Sathya and K. Sudhadevi, "Secure Data Transfer Using Generic Data Lineage Framework & Accountability Mechanism," International Journal of Innovations in Scientific and Engineering Research (IJISER), vol. 4, no. 1, pp. 11-17, 2017.
[11] D. F. García, "Performance Evaluation of Advanced Encryption Standard Algorithm," pp. 247-252, 2015.
[12] G. Yadav and M. A. Majare, "A Comparative Study of Performance Analysis of Various Encryption Algorithms," Int. Conf. Emanations Mod. Technol. Eng., vol. 5, no. 3, pp. 70-73, 2017.
[13] B. A. Forouzan, Cryptography and Network Security, vol. 721. New York: McGraw-Hill, 2007.
[14] C. Paar and J. Pelzl, Understanding Cryptography: A Textbook for Students and Practitioners. Springer-Verlag New York, Inc., 2010.
[15] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets. 2014.
[16] A. Ganesan, "User Awareness of INFLIBNET in ETD Shodhganga - A View," International Research Journal of Multidisciplinary Science & Technology, vol. 1, no. 1, 2006.
[17] S. R. Nagireddy, "Scalable Techniques for Similarity Search," 2015.
[18] C. Echeverri, "Visualization of the Avalanche Effect in CT2," vol. 2016, 2017.
[19] H. Agrawal and M. Sharma, "Implementation and Analysis of Various Symmetric Cryptosystems," Indian J. Sci. Technol., vol. 3, no. 12, 2010.
[20] A. Kumar and N. Tiwari, "Effective Implementation and Avalanche Effect of AES," Int. J. Secur. Priv. Trust Manag., vol. 1, no. 3, pp. 31-35, 2012.
[21] C. Cachin, "Entropy Measures and Unconditional Security in Cryptography," Swiss Federal Institute of Technology Zurich, 1997.